Machine learning for prediction of undrained shear strength from cone penetration test data
Citation:
Beiyang Yu, Divya Varkey, Bram van den Eijnden, Guillaume Rongier, Michael Hicks, Machine learning for prediction of undrained shear strength from cone penetration test data, 14th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP14), Dublin, Ireland, 2023.

Abstract:
The shear strength of soil is widely used in the design of large-scale infrastructure projects and is one of the most important parameters in geotechnical engineering. Many methods have been applied to estimate it, including laboratory tests, in-situ tests and analytical methods. The cone penetration test, an in-situ method, is a powerful and cost-effective tool for investigating subsoil conditions, and various empirical correlations are available for interpreting cone penetration test data. However, these correlations are not universally applicable to all soils and subsurface conditions, so cone penetration test data are usually complemented by laboratory test data to verify the applicability of the correlations. For large projects involving thousands of data points, however, laboratory-based studies of the subsoil become more complex, tedious and expensive than cone penetration tests. New approaches for estimating soil shear strength are therefore needed. Machine learning methods have demonstrated superior predictive ability for many material properties compared with traditional methods and have become increasingly popular. This research investigates the relative performance of a range of machine learning algorithms, namely the artificial neural network, support vector machine, Gaussian process regression, random forest and XGBoost, for predicting undrained shear strength from cone penetration test data, in order to assess how far machine learning could reduce the need for laboratory test data. The training dataset comprises 526 data points from 13 countries, and the testing dataset consists of 20 data points from a polder located close to Leiden in the Netherlands. In addition, k-fold and group k-fold cross-validation strategies are applied to validate the models.
K-fold cross-validation reproduces a scenario in which new samples are added to sites where data already exist, whereas group k-fold cross-validation reproduces a scenario in which a new site is added. The poor performance of the models during group k-fold cross-validation suggests that, while machine learning techniques can perform reasonably well when site-specific data are included during training, they struggle to generalize without such data. This highlights the difficulty of capturing soil heterogeneity and suggests either that machine learning methods should be trained on specific sites for which some data are already available, or that much larger training datasets are needed.
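The difference between the two validation strategies can be illustrated with a minimal sketch. This is not the authors' implementation: the site labels, sample counts and helper names below are hypothetical and chosen only for illustration. The code uses the Python standard library alone; scikit-learn offers equivalent splitters as `KFold` and `GroupKFold` in `sklearn.model_selection`.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists after shuffling samples into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def group_k_fold_splits(groups, k):
    """Yield (train, test) index lists where each group (site) sits in one fold only."""
    unique_groups = sorted(set(groups))
    group_folds = [unique_groups[i::k] for i in range(k)]
    for held_out in group_folds:
        held = set(held_out)
        test = [i for i, g in enumerate(groups) if g in held]
        train = [i for i, g in enumerate(groups) if g not in held]
        yield train, test


def shared_sites(groups, train, test):
    """Return the sites that contribute samples to both the training and test sets."""
    return {groups[i] for i in train} & {groups[i] for i in test}


# Hypothetical toy data: 12 samples from 4 sites, 3 samples per site.
sites = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3

kfold_overlaps = [
    shared_sites(sites, train, test)
    for train, test in k_fold_splits(len(sites), k=3)
]
group_overlaps = [
    shared_sites(sites, train, test)
    for train, test in group_k_fold_splits(sites, k=3)
]

# Plain k-fold: test sites also appear in training (new samples at known sites).
print("k-fold shared sites per fold:      ", kfold_overlaps)
# Group k-fold: test sites never appear in training (an entirely new site).
print("group k-fold shared sites per fold:", group_overlaps)
```

In this toy setup every plain k-fold split leaves some samples from the test sites in the training set, while group k-fold holds out whole sites, which is why it is the stricter test of generalization to an unseen site.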
Description:
PUBLISHED
Author: ICASP14
Other Titles:
14th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP14)
Type of material:
Conference Paper
Collections:
Series/Report no:
14th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP14)
Availability:
Full text available
Licences: