Automatic segmentation of clinical target volume

Key Points

  1. A U-ResNet model can automatically delineate the CTV and OARs for breast conservative radiotherapy.
  2. The CTV and OAR contours generated by our model can meet clinical requirements.
  3. AI assistance can effectively improve contouring consistency in the radiotherapy workflow.

Introduction

Breast cancer (BC) is one of the most common cancers among women worldwide.1 Radiotherapy after breast-conserving surgery (BCS) is an essential treatment for patients with early breast cancer.2,3 Radiotherapy of tumors requires accurate, individualized contouring of the clinical target volume (CTV) and organs at risk (OARs) so that high radiation doses can be delivered to the target while healthy tissues are spared.4 Because manual contouring is labor-intensive and varies both between and within observers, computer-assisted automatic segmentation techniques are highly desirable: they relieve radiation oncologists of this work and reduce the considerable inter- and intra-observer variability in the delineation of regions of interest (ROIs).5,6

Current automatic approaches fall into two general groups: atlas-based auto-segmentation (ABAS) and convolutional neural network (CNN) based segmentation. Acceptable results have been reported using ABAS for OARs in head and neck cancer and prostate cancer.7–9 However, the CTV is not a region with clear boundaries; it includes tissues with potential tumor or subclinical disease that are barely detectable in CT images.10 Moreover, body shape, organ size, and the density of mammary glandular tissue vary widely from person to person.11,12 Therefore, various CNN models13–16 have been presented for different cancers,8,16–20 showing better performance than ABAS.

A deep dilated residual network (DD-ResNet) was previously proposed by Men et al.16 to perform automatic breast CTV contouring. A Dice similarity coefficient (DSC) of 0.91 was reported for both the right and left breast CTV, but no clinical evaluation was performed. Moreover, that method focused on CTV contouring only; the OARs were not considered.

Here, we constructed a new CNN model based on the 2D U-Net model to handle the large inconsistencies between source and target images, even with a scarce amount of labelled training data. The proposed model was trained and then compared against U-Net. Its accuracy and effectiveness were evaluated both by performance metrics and by qualified radiation oncologists.

Materials and Methods

Data Acquisition

CT scans of patients with early-stage BC who underwent BCS at Peking Union Medical College Hospital were collected from January 2019 to December 2019. This study was approved by the Institutional Review Board of Peking Union Medical College Hospital. Informed consent/assent from the patient and/or parent/guardian, as appropriate, was obtained before enrollment. This study was conducted in accordance with the Declaration of Helsinki. The inclusion criteria were as follows: (1) patients were diagnosed with early-stage BC and underwent BCS; (2) patients met the indication for radiotherapy and received whole-breast irradiation. Patients who received axillary or supraclavicular lymph node radiotherapy were excluded.

In total, 12,640 CT slices were collected from 160 patients; 79 patients had left-sided BC and the remainder had right-sided BC. All CT scans were acquired on a Philips Brilliance Big Bore CT scanner and followed the Digital Imaging and Communications in Medicine (DICOM) protocol. CT images were reconstructed with a matrix size of 512×512 and a slice thickness of 5 mm. The pixel spacing was 1.1543 mm × 1.1543 mm.

The CTV and OARs (contralateral breast, lungs, heart, and spinal cord) were delineated manually by trained radiation oncologists following the European Society for Radiotherapy and Oncology (ESTRO)21 and the Radiation Therapy Oncology Group (RTOG)22 protocols. The specific delineation standards for the CTV are shown in Table 1. All contours were reviewed and approved by two professional radiation oncologists with more than 10 years’ experience in our center.

Table 1 The Standard Delineation of CTV After BCS

Network Architecture

Our model, called U-ResNet, is derived from the 2D U-Net model, which is composed of encoder and decoder paths. Segmentation for BC radiotherapy, especially CTV segmentation, calls for a deeper network within the U-Net to extract features at different abstraction levels. At the same time, the vanishing gradients of deep convolutional networks should be avoided. Therefore, ResNet is used as the encoder. It encodes low-, middle-, and high-level features and passes them to the decoder via four shortcut connections. In the decoder, upscaling is performed by nearest-neighbour interpolation, followed by a convolutional layer and a residual block. In this way, multi-level features from the encoder and decoder are concatenated. The overall architectures of DD-ResNet and our proposed method are shown in Figure 1. DD-ResNet has no shortcut connections between the encoder and the decoder, and the output of its sum layer is interpolated to the original size by a factor of 8, which may result in information loss.

Figure 1 Architecture of (A) deep dilated convolutional neural network (DDCNN), (B) our proposed network, and (C) the residual block used in decoder part of our network.

The breast is a continuous and smooth surface. A 2D architecture may result in a rough segmentation result in a 3D view. To obtain the 3D information of CT scans, the network is designed as a 2.5D architecture by assigning three adjacent slices into three channels as the input.
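The 2.5D input described above can be sketched in plain Python as follows. This is only an illustration: the function names are hypothetical, and repeating the edge slice where a neighbour is missing is an assumption, since the text does not say how the first and last slices are handled.

```python
def slice_triplets(num_slices):
    """Return (previous, current, next) slice indices for every slice.

    Edge slices reuse their own index for the missing neighbour.
    This clamping is an assumption, not something the text specifies.
    """
    last = num_slices - 1
    return [(max(i - 1, 0), i, min(i + 1, last)) for i in range(num_slices)]


def stack_adjacent(volume):
    """Turn a sequence of 2D slices into a list of 3-channel inputs.

    `volume` is any sequence of slices (for example 512x512 arrays);
    each output item is [previous, current, next], one slice per channel.
    """
    return [[volume[a], volume[b], volume[c]]
            for a, b, c in slice_triplets(len(volume))]
```

Each input still yields a 2D prediction for the centre slice only; the two neighbours provide through-plane context without the memory cost of a full 3D network.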

Implementation Details

The dataset of 160 patients was randomly split in an 8:1:1 ratio into three cohorts: (1) a training set of 128 patients, used to construct the segmentation model; (2) a validation set of 16 patients, used to optimize the parameters; and (3) a testing set of 16 patients, used to obtain artificial intelligence-generated contours for performance assessment. During the testing phase, all CT slices of the 16 testing cases were tested individually.
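A patient-level 8:1:1 split of this kind might look like the sketch below. The function name, the fixed seed, and the use of integer patient IDs are illustrative assumptions; the text only states the ratio and the resulting cohort sizes.

```python
import random


def split_patients(patient_ids, seed=0):
    """Shuffle patient IDs and split them 8:1:1 into train/val/test.

    Splitting by patient (not by slice) keeps all slices of one
    patient in the same cohort. The seed is an arbitrary choice.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 8 // 10
    n_val = len(ids) // 10
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# 160 patients give cohorts of 128, 16, and 16 patients.
train, val, test = split_patients(range(160))
```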

We constructed our model using Python 3.6 and PyTorch 1.0. The Adam optimization algorithm23 was used with a learning rate of 0.001. We trained and evaluated our model on a GTX 1080 GPU. The proposed model was trained for 50 epochs, and the model with the lowest validation loss was selected. The convolutional layers were initialized using Xavier uniform initialization, and batch normalization layers were added after the convolutional layers to improve training speed and to prevent overfitting.24

Performance Measurement

Performance of the proposed method was evaluated using the Dice similarity coefficient (DSC) and the 95th percentile Hausdorff distance (95HD). The mean and standard deviation of each metric were also calculated.

The DSC measures the spatial overlap between the AI-generated and ground-truth (GT) contours and is defined in Equation (1).

$$ \mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \tag{1} $$

where A represents the volume of the human-generated contour, B is the volume of the AI contour, and A ∩ B is the intersection volume that A and B have in common. The DSC ranges from 0 to 1 (0 = no overlap, 1 = complete overlap).
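As a small worked example of the DSC, the sketch below represents each contour as a set of voxel coordinates. That representation, the function name, and the convention of returning 1.0 for two empty masks are assumptions made for illustration, not details from the study.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # both empty: treated as perfect agreement (assumption)
    return 2 * len(a & b) / (len(a) + len(b))


# Two 4-voxel masks sharing 2 voxels: DSC = 2*2 / (4+4) = 0.5
human = {(0, 0), (0, 1), (1, 0), (1, 1)}
ai = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(dice(human, ai))  # 0.5
```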

The 95HD is defined as follows:

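$$ \mathrm{95HD}(A, B) = \max\bigl( h_{95}(A, B),\ h_{95}(B, A) \bigr) \tag{2} $$

$$ h_{95}(A, B) = \operatorname*{P_{95}}_{a \in A}\ \min_{b \in B} \lVert a - b \rVert \tag{3} $$

$$ h_{95}(B, A) = \operatorname*{P_{95}}_{b \in B}\ \min_{a \in A} \lVert b - a \rVert \tag{4} $$

where A represents the human-generated contour, B is the AI contour, ||·|| is the Euclidean norm (so that ||a - b|| is the distance between a point a of A and a point b of B), and P95 denotes the 95th percentile. The 95HD is therefore the 95th percentile of the mismatch between A and B, which makes it less sensitive to outlier points than the maximum mismatch. The smaller the 95HD, the greater the similarity between A and B.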
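One common way to compute a 95th percentile Hausdorff distance is sketched below: take the nearest-neighbour distance from every point of one contour to the other, take the 95th percentile in each direction, and keep the larger value. Implementations differ in details (for example, how the percentile is interpolated, or whether the two directions are pooled), so this brute-force version with hypothetical function names is an assumption about the method rather than the study's own code.

```python
import math


def percentile(values, q):
    """q-th percentile using linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def directed_distances(src, dst):
    """Distance from each point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hd95(points_a, points_b):
    """Symmetric 95th percentile Hausdorff distance of two point sets.

    Points must already be in physical units (e.g. mm), i.e. voxel
    indices multiplied by the pixel spacing and slice thickness.
    """
    return max(percentile(directed_distances(points_a, points_b), 95),
               percentile(directed_distances(points_b, points_a), 95))
```

The nested loop is quadratic in the number of contour points; for full volumes a spatial index or distance transform would normally replace it.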

Oncologist Evaluation

OARs Evaluation

Because evaluation metrics alone cannot show whether contours would need to be modified in clinical practice, another 20 clinical cases from our center were randomly collected. For each case, GT and AI contours of the OARs were prepared and distributed to two radiation oncologists with more than 10 years of clinical experience for further evaluation. Each slice was carefully evaluated, and the results were graded on four levels: 3 points (no editing needed), 2 points (≤4 layers need editing), 1 point (≥4 layers need editing), and 0 points (not acceptable).

CTV Evaluation

CTV segmentations generated by AI and GT were also evaluated blindly, slice by slice. The test data comprised 10 patients and 650 slices in total (AI: 327 slices; GT: 323 slices). The results were graded on four levels: 3 points (acceptable for subsequent treatment), 2 points (minor revision), 1 point (major revision), and 0 points (not acceptable for treatment). A score ≥2 was defined as suitable for clinical application.

Furthermore, to verify the consistency of the two oncologists' judgments, we collected the CTV score that each oncologist assigned to each slice, for a total of 650 slices. Slices to which both oncologists assigned the same CTV score were classified into the same group. We calculated the weighted kappa coefficient to analyze consistency.
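A weighted kappa for two raters on the 0 to 3 scale can be sketched as below. The text does not say whether linear or quadratic weights were used; this example assumes linear disagreement weights |i - j| / (k - 1), and the function name is hypothetical.

```python
def weighted_kappa(scores_a, scores_b, categories=(0, 1, 2, 3)):
    """Linear-weighted Cohen's kappa for two raters (weights are an assumption)."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(scores_a)

    # Observed contingency table: rows = rater A, columns = rater B.
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(scores_a, scores_b):
        observed[index[x]][index[y]] += 1

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at maximum disagreement
            weighted_obs += w * observed[i][j]
            weighted_exp += w * row[i] * col[j] / n

    if weighted_exp == 0:
        return float("nan")  # undefined when both raters use a single category
    return 1.0 - weighted_obs / weighted_exp
```

A value of 1 means the two raters agree on every slice, and 0 means agreement no better than chance; near-misses (for example 3 vs 2) are penalised less than distant scores (3 vs 0).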

Time Cost

Processing time was measured for the AI tool itself and for the delineation of CTV and OARs for BC radiotherapy before and after the introduction of AI assistance.

Statistical Analysis

The Wilcoxon matched-pairs signed-rank test was used to compare DSC and 95HD between our proposed model and U-Net, and to compare the scores given by the two oncologists in the evaluation of CTV and OAR segmentation. The McNemar test and the kappa test were used to assess the consistency between the two oncologists. Statistical significance was set at a two-tailed P<0.05.
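To illustrate how a McNemar test compares two raters on paired slices, the sketch below uses the exact (binomial) form on yes/no judgments. Both the exact variant and the idea of binarizing scores at the ≥2 "suitable for clinical application" threshold are assumptions made for this example; the text does not state which variant or which dichotomy was used.

```python
from math import comb


def mcnemar_exact(flags_a, flags_b):
    """Two-sided exact McNemar p-value for paired boolean judgments.

    Only discordant pairs matter: b = A yes / B no, c = A no / B yes.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    b = sum(1 for x, y in zip(flags_a, flags_b) if x and not y)
    c = sum(1 for x, y in zip(flags_a, flags_b) if y and not x)
    n = b + c
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical scores from two raters, binarized at the >=2 threshold.
scores_a = [3, 3, 2, 2, 3, 1, 3]
scores_b = [1, 1, 0, 1, 1, 0, 3]
p = mcnemar_exact([s >= 2 for s in scores_a], [s >= 2 for s in scores_b])
print(p)  # 0.0625 (5 discordant pairs, all in one direction)
```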

Results

Performance of U-ResNet and Comparison with U-Net

The median age of the 160 patients in the dataset was 49 years [42, 58]. The average CTV volume was 494.41 ± 198.51 cm³.

For CTV segmentation, the average DSC was 0.94 for U-ResNet vs 0.93 for U-Net (P=0.001), and the average 95HD was 4.31 mm vs 4.88 mm, respectively (P=0.030). Both differences were statistically significant, indicating more accurate CTV contouring by U-ResNet.

Among the OARs, significant differences between U-ResNet and U-Net were found for the spinal cord (DSC: 0.93 vs 0.92, P=0.015; 95HD: 4.37 mm vs 5.07 mm, P=0.003) and the contralateral breast (DSC: 0.95 vs 0.93, P<0.001; 95HD: 3.59 mm vs 4.15 mm, P=0.010). Right lung contouring also showed a statistically significant difference in 95HD (3.18 mm vs 2.98 mm, P=0.041).

The results of the comparison are summarized in Table 2 and Figure 2.

Table 2 DSC and 95HD for CTV and All OARs

Figure 2 Boxplots obtained for DSC and 95HD analyses of U-ResNet and U-Net. (A) DSC analyses, (B) 95HD analyses.

Figure 3 shows sample segmentations from GT, U-ResNet, and U-Net. The contours auto-segmented by U-ResNet were in good concordance with the GT contours.

Figure 3 CTV and OAR contours generated by (A) GT, (B) U-ResNet, and (C) U-Net after breast conservative surgery.

Oncologist Evaluation

Tables 3 and 4 show the oncologist evaluation results of OAR and CTV contours. Scores ≥2 were defined as suitable for clinical application. When using our grading criteria for contour evaluation, the majority of AI- and GT-generated OAR contours were deemed acceptable by the experts. Only one contour (5%) of the heart was assessed to require major revision by oncologist A.

Table 3 Evaluation for CTV and OARs by Oncologist A

Table 4 Evaluation for CTV and OARs by Oncologist B

Regarding CTV contours, oncologist A judged 99.4% of the AI-generated contours clinically acceptable, compared with 98.1% of the GT segmentations. For oncologist B, the figure was 99.4% for both methods. The average CTV scores for AI and GT were 2.89 vs 2.92 for oncologist A (P=0.612) and 2.75 vs 2.83 for oncologist B (P=0.213), with no statistically significant differences.

Wilcoxon matched-pairs test was performed for the evaluation of the two oncologists for AI and GT contours separately. The results indicated that the average…
