Growth monitoring of greenhouse lettuce

04-08-2020    14:50   |    Nature

Growth-related traits, such as aboveground biomass and leaf area, are critical indicators to characterize the growth of greenhouse lettuce. Currently, nondestructive methods for estimating growth-related traits are subject to limitations in that the methods are susceptible to noise and heavily rely on manually designed features. In this study, a method for monitoring the growth of greenhouse lettuce was proposed by using digital images and a convolutional neural network (CNN). Taking lettuce images as the input, a CNN model was trained to learn the relationship between images and the corresponding growth-related traits, i.e., leaf fresh weight (LFW), leaf dry weight (LDW), and leaf area (LA). To compare the results of the CNN model, widely adopted methods were also used. The results showed that the values estimated by CNN had good agreement with the actual measurements, with R2 values of 0.8938, 0.8910, and 0.9156 and normalized root mean square error (NRMSE) values of 26.00, 22.07, and 19.94%, outperforming the compared methods for all three growth-related traits. The obtained results showed that the CNN demonstrated superior estimation performance for the flat-type cultivars of Flandria and Tiberius compared with the curled-type cultivar of Locarno. Generalization tests were conducted by using images of Tiberius from another growing season. The results showed that the CNN was still capable of achieving accurate estimation of the growth-related traits, with R2 values of 0.9277, 0.9126, and 0.9251 and NRMSE values of 22.96, 37.29, and 27.60%. The results indicated that a CNN with digital images is a robust tool for the monitoring of the growth of greenhouse lettuce.


Growth monitoring is essential for optimizing management and maximizing the production of greenhouse lettuce. Leaf fresh weight (LFW), leaf dry weight (LDW), and leaf area (LA) are critical indicators for characterizing growth1,2. Monitoring the growth of greenhouse lettuce by accurately obtaining growth-related traits (LFW, LDW, and LA) is of great practical significance for improving the yield and quality of lettuce3. The traditional methods for measuring growth-related traits are relatively straightforward and can achieve relatively accurate results4. However, these methods require destructive sampling, which makes them time-consuming and laborious5,6,7.

In recent years, nondestructive monitoring approaches have become an active research topic. With the development of computer vision technology, image-based approaches have been widely applied to the nondestructive monitoring of crop growth6,8,9,10. Specifically, the image-based approaches extract low-level features from digital images and establish the relationship between the low-level features and manually measured growth-related traits, such as LA, LFW, and LDW. Based on this relationship, the image-derived features can estimate the growth-related traits, thus achieving nondestructive growth monitoring. For example, Chen et al.6 proposed a method for the estimation of barley biomass. The authors extracted structure properties, color-related features, near-infrared (NIR) signals, and fluorescence-based features from images. Based on the above features, they built multiple models, i.e., support vector regression (SVR), random forest (RF), multivariate linear regression (MLR), and multivariate adaptive regression splines, to estimate barley biomass. The results showed that the RF model was able to accurately estimate the biomass of barley and better quantify the relationship between image-based features and barley biomass than the other methods. Tackenberg et al.11 proposed a method for estimating the growth-related traits of grass based on digital image analysis. Image features, such as the projected area (PA) and proportion of greenish pixels, were extracted and then fitted to the actual measured values of the aboveground fresh biomass, oven-dried biomass, and dry matter content by linear regression (LR). The results showed that the coefficients of determination of all the constructed models were higher than 0.85, indicating that these features exhibited good linear relationships with growth-related traits.
Casadesús and Villegas5 used color-based image features to estimate the leaf area index (LAI), green area index (GAI), and crop dry weight biomass (CDW) of two genotypes of barley. The image features included the H component of the HSI color space, the a* component of the CIEL*a*b* color space, and the U components of the CIELUV color space. In addition, the green fraction and greener fraction were also extracted. The features were linearly fitted to the measured values of LAI, GAI, and CDW at different growth stages. The results showed that the image features based on color had strong correlations with growth-related traits. Fan et al.12 developed a simple visible and NIR (near-infrared) camera system to capture time-series images of Italian ryegrass. Based on the digital number values of the R, G, and NIR channels of the raw images, MLR models for LAI estimation were built. The results showed that the image features derived from segmented images yielded better accuracy than those from non-segmented images, with an R2 value of 0.79 for LAI estimation. Liu and Pattey13 extracted the vertical gap fraction from digital images captured from nadir to estimate the LAI of corn, soybean, and wheat. Prior to the extraction of the canopy vertical gap fraction, the authors adopted the histogram-based threshold method to segment the green vegetative pixels. The results showed that the LAI estimated by the digital images before canopy closure was correlated with the field measurements. Sakamoto et al.14 used vegetation indices derived from digital images, i.e., the visible atmospherically resistant index (VARI) and excess green (ExG), to estimate the biophysical characteristics of maize during the daytime. The results showed that the VARI could accurately estimate the green LAI, and the ExG was able to accurately estimate the total LAI.

Although computer vision-based methods for estimating growth-related traits have achieved promising results, they are subject to two issues. First, the methods are susceptible to noise. Since the images are captured under field conditions, noise caused by uneven illumination and cluttered backgrounds is inevitable, which will affect image segmentation and feature extraction, thus potentially reducing the accuracy15. Second, the methods greatly rely on manually designed image features, which have large computational complexity. Moreover, the generalization ability of the extracted low-level image features is poor16,17. Therefore, a more feasible and robust approach should be explored.

Convolutional neural networks (CNNs), which are a state-of-the-art deep learning approach, can directly take images as input to automatically learn complex feature representations18,19. With a sufficient amount of data, CNNs can achieve better precision than conventional methods20,21. Therefore, CNNs have been used in a wide range of agricultural applications, such as weed and crop recognition19,22,23, plant disease diagnosis24,25,26,27,28, and plant organ detection and counting21,29. However, despite their extensive use in classification tasks, CNNs have rarely been applied to regression applications, and there are few reports on how CNNs have been used for the estimation of growth-related traits of greenhouse lettuce. Ma et al.18 accurately estimated the aboveground biomass of winter wheat at early growth stages by using a deep CNN, i.e., a CNN with a deep network structure. Inspired by that work, this study adopted a CNN to construct an estimation model for growth monitoring of greenhouse lettuce based on digital images and compared the results with conventional methods that have been widely adopted to estimate growth-related traits.

The objective of this study is to achieve accurate estimations of growth-related traits for greenhouse lettuce. A CNN is used to model the relationship between an RGB image of greenhouse lettuce and the corresponding growth-related traits (LFW, LDW, and LA). By following the proposed framework, including lettuce image preprocessing, image augmentation, and CNN construction, this study will investigate the potential of using CNNs with digital images to estimate the growth-related traits of greenhouse lettuce throughout the entire growing season, thus exploring a feasible and robust approach for growth monitoring.

Material and methods

Greenhouse lettuce image collection and preprocessing

The experiment was conducted at the experimental greenhouse of the Institute of Environment and Sustainable Development in Agriculture, Chinese Academy of Agricultural Sciences, Beijing, China (N39°57′, E116°19′). Three cultivars of greenhouse lettuce, i.e., Flandria, Tiberius, and Locarno, were grown under controlled climate conditions with 29/24 °C day/night temperatures and an average relative humidity of 58%. During the experiment, natural light was used for illumination, and a nutrient solution was circulated twice a day. The experiment was performed from April 22, 2019, to June 1, 2019. Six shelves were adopted in the experiment. Each shelf had a size of 3.48 × 0.6 m, and each lettuce cultivar occupied two shelves.

Each lettuce cultivar was represented by 96 plants, which were sequentially labeled. Image collection was performed using a low-cost Kinect 2.0 depth sensor30. During the image collection, the sensor was mounted on a tripod 78 cm above the ground and was oriented vertically downwards over the lettuce canopy to capture digital images and depth images. The original pixel resolutions of the digital images and depth images were 1920 × 1080 and 512 × 424, respectively. The digital images were stored in JPG format, while the depth images were stored in PNG format. The image collection was performed seven times, beginning 1 week after transplanting, between 9:00 a.m. and 12:00 p.m. on each occasion. Finally, two image datasets were constructed, i.e., a digital image dataset containing 286 digital images and a depth image dataset containing 286 depth images. The numbers of digital images for Flandria, Tiberius, and Locarno were 96, 94 (two plants did not survive), and 96, respectively, and the numbers of depth images for the three cultivars were the same.

Since the original digital images of greenhouse lettuce contained an excess of background pixels, the images were manually cropped to eliminate the extra background pixels and then uniformly resized to a resolution of 900 × 900 pixels. Figure 1 shows examples of the cropped digital images for the three cultivars. Prior to the construction of the CNN model, the original digital image dataset was divided into a training dataset and a test dataset in a ratio of 8:2. Both datasets covered all three cultivars and all sampling intervals. The training dataset contained 229 images, of which 20% were randomly selected as the validation dataset. The test dataset contained 57 digital images. To enhance data diversity and prevent overfitting, a data augmentation method was used to enlarge the training dataset (Fig. 2). The augmentations were as follows. The images were rotated by 90°, 180°, and 270° and flipped horizontally and vertically. To adapt the CNN model to the changing illumination of the greenhouse, the images in the training dataset were also converted to the HSV color space, and their brightness was adjusted by changing the V channel31. The brightness was set to 0.8, 0.9, 1.1, and 1.2 times that of the original images to simulate the change in daylight. In total, the training dataset was enlarged 26-fold, resulting in 5954 digital images.
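The individual augmentation operations can be illustrated with a minimal sketch. It assumes an image is stored as a list of rows of (r, g, b) tuples with values in [0, 1]; the function names are hypothetical, and the way the geometric and brightness transforms were combined to reach the 26-fold enlargement is not spelled out in the text, so the sketch only shows the single transforms and checks the reported arithmetic.

```python
import colorsys

TRAIN_IMAGES = 229   # training images reported in the text
ENLARGEMENT = 26     # enlargement factor reported in the text


def rotate90(img):
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return img[::-1]


def geometric_variants(img):
    """Return the three rotations and the two flips of an image."""
    r90 = rotate90(img)
    r180 = rotate90(r90)
    r270 = rotate90(r180)
    return [r90, r180, r270, flip_horizontal(img), flip_vertical(img)]


def adjust_brightness(img, factor):
    """Scale the V channel in HSV space, clipping V at 1.0 (clipping is an assumption)."""
    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            new_row.append(colorsys.hsv_to_rgb(h, s, min(1.0, v * factor)))
        out.append(new_row)
    return out


# Size of the augmented training set implied by the reported figures.
augmented_size = TRAIN_IMAGES * ENLARGEMENT
```

Calling `adjust_brightness(img, k)` for k in 0.8, 0.9, 1.1, and 1.2 yields the four brightness levels named above, and `augmented_size` evaluates to the 5954 images reported.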

Fig. 1: Examples of the digital images for the three cultivars.

a, b, and c show the cultivars Flandria, Tiberius, and Locarno, respectively

Photo Courtesy of Nature

Fig. 2: Image augmentation scheme.


Measurement of greenhouse lettuce growth-related traits

Field measurements of LFW, LDW, and LA were performed simultaneously with image collection. These measurements were conducted at intervals of approximately seven days, specifically on April 29, May 6, May 13, May 20, May 27, May 31, and June 1 of 2019. For the first six measurements, ten plants of greenhouse lettuce were randomly sampled each time for each cultivar. The measurements were obtained using a destructive sampling method. After root removal, the sample was placed on a balance with a precision of 0.01 g, and the LFW was measured. The LA of the corresponding sample was obtained with an LA meter (LI-3100 AREA METER; LI-COR Inc. Lincoln, Nebraska, USA). Given the relatively large leaves of lettuce during the late growing season, the sample was sealed in an envelope and oven-dried at 80 °C for 72 h, after which the sample was weighed to obtain the LDW. For the last measurement, all the remaining lettuce plants were harvested, and the measurements were obtained by using the same method.

Construction of the CNN

The architecture of the CNN model is shown in Fig. 3. The CNN model consisted of five convolutional layers, four pooling layers, and one fully connected layer. The input to the CNN model was digital images of greenhouse lettuce with a size of 128 × 128 × 3 (width × height (H) × channel). The convolutional layers adopted kernels with a size of 5 × 5 to extract features. The numbers of kernels in the five convolutional layers were 32, 64, 128, 256, and 512. To keep the size of the feature maps as an integer, zero-padding was employed in the second and third convolutional layers. The kernels in the pooling layers had a size of 2 × 2 and a stride of 2, which halved the width and height of the feature maps. The average pooling function was adopted in the pooling layers instead of the max pooling function. The number of neurons in the fully connected layer was three, corresponding to the three outputs of the model, i.e., the LFW, LDW, and LA. Therefore, the CNN model could estimate the three growth-related traits simultaneously. Dropout was applied with a rate of 0.5. The CNN model used stochastic gradient descent to optimize the network weights. The initial learning rate was set to 0.001 and was multiplied by a drop factor of 0.1 every 20 epochs. The mini-batch size was set to 128, and the maximum number of epochs for training was set to 300.
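The role of zero-padding in keeping the feature-map sizes integral can be checked with a short calculation. The sketch below rests on several assumptions that the text does not state: the convolutions use a stride of 1, a pooling layer follows each of the first four convolutional layers, and the padding in the second and third convolutional layers is one pixel per side. The function names are hypothetical.

```python
def conv_output_size(size, kernel=5, padding=0, stride=1):
    """Spatial size after a square convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_feature_map_sizes(input_size=128, paddings=(0, 1, 1, 0, 0)):
    """Follow the spatial size through five conv layers and four 2 x 2 poolings.

    paddings holds the assumed zero-padding of each convolutional layer.
    Raises ValueError if a pooling layer would receive an odd size.
    """
    sizes = [input_size]
    size = input_size
    for index, padding in enumerate(paddings):
        size = conv_output_size(size, padding=padding)
        sizes.append(size)
        if index < 4:  # assumed: pooling after each of the first four conv layers
            if size % 2 != 0:
                raise ValueError(f"odd feature-map size {size} before pooling")
            size //= 2
            sizes.append(size)
    return sizes


sizes_with_padding = trace_feature_map_sizes()
```

Under these assumptions the spatial size shrinks from 128 to 1 at the last convolutional layer, whereas passing `paddings=(0, 0, 0, 0, 0)` raises an error at the third pooling layer because the feature map there has an odd size.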

Fig. 3: Architecture of the CNN model.


Performance evaluation

To evaluate the performance of the CNN model, comparative tests were performed with widely adopted estimation methods. Two shallow machine learning models, i.e., SVR32,33 and RF34, were adopted to estimate the growth-related traits of greenhouse lettuce since these two methods have been reported to achieve good performance in crop growth monitoring. According to “Greenhouse lettuce image collection and preprocessing,” the captured images of greenhouse lettuce contained a large number of background pixels. Therefore, image segmentation was necessary to extract the lettuce pixels, ensuring that the features extracted in the following step represented the lettuce plants. Since the color contrast between the lettuce plant and the background was pronounced in the digital images, segmentation was achieved by applying the adaptive threshold method to the color information. Some segmentation results are shown in Fig. 4.
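The text does not specify which adaptive threshold or which color quantity was used. As an illustration only, the sketch below swaps in a common combination: a normalized excess green index per pixel and Otsu's method to choose the threshold from the data. Both choices, and all function names, are assumptions rather than the authors' procedure. Images are lists of rows of (r, g, b) tuples.

```python
def excess_green(pixel):
    """Normalized excess green index, (2g - r - b) / (r + g + b)."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return 0.0
    return (2 * g - r - b) / total


def otsu_threshold(values):
    """Return the split point that maximizes the between-class variance."""
    vals = sorted(values)
    n = len(vals)
    total = sum(vals)
    best_threshold, best_variance = vals[0], -1.0
    running = 0.0
    for i in range(1, n):
        running += vals[i - 1]
        if vals[i] == vals[i - 1]:
            continue  # no valid split between equal values
        w0 = i / n
        w1 = 1.0 - w0
        mean0 = running / i
        mean1 = (total - running) / (n - i)
        variance = w0 * w1 * (mean0 - mean1) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = (vals[i - 1] + vals[i]) / 2
    return best_threshold


def segment_plant(img):
    """Return a boolean mask that is True where a pixel is classed as plant."""
    index = [[excess_green(p) for p in row] for row in img]
    threshold = otsu_threshold([v for row in index for v in row])
    return [[v > threshold for v in row] for row in index]
```

Because the threshold is derived from each image's own pixel distribution, it adapts to moderate changes in illumination, which is the property the comparison methods rely on.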

Fig. 4: Image segmentation results of the three cultivars of greenhouse lettuce.

a–c show the original images of greenhouse lettuce, and d–f show the corresponding segmentation results

To build the shallow machine learning models, feature extraction was performed on the segmented images of greenhouse lettuce. According to the characteristics of the three cultivars of greenhouse lettuce, low-level image features, including color, texture, and shape features, were extracted35. The color features included the average and standard deviation of 15 color components of five color spaces (RGB, HSV, CIEL*a*b*, YCbCr, and HSI)36. The gray level co-occurrence matrix37 was computed on the color components to extract the texture features, which included the contrast, correlation, energy, and homogeneity of the 15 color components. The shape features extracted in this study were the area and perimeter of the greenhouse lettuce. The area was the area enclosed by the leaf outline, and the perimeter was the total length of the leaf outline. After extracting the image features, the Pearson correlation coefficient was used to analyze the correlation between the extracted features and the actual values of the LFW, LDW, and LA of greenhouse lettuce. The features with relatively high correlation values were used to build the shallow machine learning models.
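The correlation-based feature selection can be sketched as follows. The cutoff of 0.8 and the use of the absolute value of the coefficient are assumptions made for illustration, since the text only states that features with relatively high correlation values were kept; the function and feature names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no linear information
    return covariance / (spread_x * spread_y)


def select_features(features, target, min_abs_r=0.8):
    """Keep the names of features whose |r| with the target reaches the cutoff.

    features maps a feature name to its values over all samples;
    min_abs_r is a hypothetical cutoff, not a value from the study.
    """
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= min_abs_r
    ]
```

For example, `select_features({"area": [1, 2, 3, 4], "noise": [1, -1, 1, -1]}, [2, 4, 6, 8.5])` keeps only `"area"`, because the second feature is only weakly correlated with the target.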

In addition to the above image features, structural features derived from the depth images, including plant height (H), PA, and digital volume (V), were also used to estimate the growth-related traits of the greenhouse lettuce8,38,39,40. Three LR models using H, PA, and V as the predictor variables (LR-H, LR-PA, and LR-V) were also used for comparison. Similar to the processing of digital images, image segmentation was also conducted on the depth images, which was achieved by the entropy rate superpixel segmentation method41. The lettuce plant was extracted by using the Euclidean distance to find the superpixel that was closest to the center of the image (Fig. 5). Once the lettuce plant was obtained, the structural features could be calculated (Fig. 6). The pixel value of the depth image was the actual distance from the sensor to the object and therefore reflected the depth information. PA was obtained by counting the number of pixels in the lettuce plant area. H was obtained by averaging the per-pixel heights in the lettuce plant area, each of which was computed as the height of the sensor minus the pixel value. V was obtained by multiplying PA by H.
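The calculation of the three structural features can be sketched as below. The sketch assumes the depth image and the superpixel label map are lists of rows, that depth values and the sensor height share one unit, and that "closest to the center" is measured from each superpixel's centroid; the function names are hypothetical.

```python
import math


def center_superpixel(labels):
    """Return the label whose centroid is nearest to the image center."""
    rows, cols = len(labels), len(labels[0])
    center_y, center_x = (rows - 1) / 2, (cols - 1) / 2
    sums = {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            sum_y, sum_x, count = sums.get(label, (0, 0, 0))
            sums[label] = (sum_y + y, sum_x + x, count + 1)

    def distance(label):
        sum_y, sum_x, count = sums[label]
        return math.hypot(sum_y / count - center_y, sum_x / count - center_x)

    return min(sums, key=distance)


def structural_features(depth, mask, sensor_height):
    """Compute PA (pixel count), H (mean height), and V (PA times H).

    depth holds sensor-to-object distances; mask marks the plant pixels.
    """
    heights = [
        sensor_height - d
        for depth_row, mask_row in zip(depth, mask)
        for d, is_plant in zip(depth_row, mask_row)
        if is_plant
    ]
    projected_area = len(heights)
    if projected_area == 0:
        return 0, 0.0, 0.0
    mean_height = sum(heights) / projected_area
    return projected_area, mean_height, projected_area * mean_height
```

With a sensor height of 78 (the mounting height in cm reported earlier), a depth image `[[78, 70], [68, 78]]`, and a mask selecting the two plant pixels, the function returns PA = 2, H = 9.0, and V = 18.0. In this form PA is a pixel count, so V is in pixel-height units rather than a physical volume.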

Fig. 5: Depth image segmentation.

a shows the randomly colored superpixels, and b shows the segmented lettuce plant

Fig. 6: The calculations for PA and H.

In this study, the coefficient of determination (R2) and the normalized root mean square error (NRMSE) were used as the criteria for evaluating the performances of all the estimation models.
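The two criteria can be computed as in the following sketch. The definitions are assumptions, since the formulas are not reproduced here: R2 is taken as one minus the ratio of the residual to the total sum of squares, and NRMSE as the root mean square error divided by the mean of the measured values, expressed as a percentage. The function names are hypothetical.

```python
import math


def r_squared(measured, estimated):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def nrmse_percent(measured, estimated):
    """RMSE normalized by the mean of the measured values, in percent."""
    n = len(measured)
    rmse = math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n)
    return 100.0 * rmse / (sum(measured) / n)
```

A model that always predicts the mean of the measured values scores R2 = 0 under this definition, and a perfect model scores R2 = 1 with an NRMSE of 0%.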


In this study, the construction of the estimation models and the image preprocessing were implemented using MATLAB 2018b (MathWorks Inc., USA). The software environment was Windows 10 Professional Edition. The hardware comprised an Intel i7 processor (3.20 GHz) with 8 GB of memory and an NVIDIA GeForce GTX1060 GPU.

