Abstract:
Artificial intelligence methods have a very wide range of applications. From speech recognition to self-driving cars, the development of modern deep-learning architectures is helping researchers to achieve new levels of accuracy in different fields. Although deep convolutional neural networks (CNNs) (a kind of deep-learning technique) have reached or surpassed human-level performance in image recognition tasks, little has been done to transfer this new image classification technology to geoscientific problems. We have developed what we believe to be the first use of CNNs to identify lithofacies in cores. We use highly accurate models (trained with millions of images) and transfer learning to classify images of cored carbonate rocks. We found that different modern CNN architectures can achieve high levels of lithologic image classification accuracy (approximately 90%) and can aid in the core description task. This core image classification technique has the potential to greatly standardize and accelerate the description process. We also provide the community with a new set of labeled data that can be used for further geologic/data science studies.
Introduction
Advances in deep learning and artificial intelligence promise not only to drive our cars but also to taste our beer (Gardner et al., 1994; Daily et al., 2017). Specifically, recent advances in the architecture of deep-learning convolutional neural networks (CNNs) have brought the field of image classification and computer vision to a new level. Very deep CNNs emerged in 2014 and have achieved new levels of accuracy in several artificial intelligence classification problems (Szegedy et al., 2014). The current benchmark in object category classification and detection, called ImageNet, consists of hundreds of mixed-object categories and millions of images (Deng et al., 2009; Russakovsky et al., 2015), and it is commonly used to train CNNs.
Current CNN models are able to differentiate the image of a leopard from that of a container ship; moreover, they can differentiate images of leopards from their biological cousins, cheetahs and snow leopards (Krizhevsky et al., 2012).
Although machine learning has been widely used in geoscience fields, the application of this technique in core-based lithofacies identification, a key component to better understanding oil and gas reservoirs, is still limited. Machine-learning techniques have been used intensively to aid seismic-facies classification (de Matos et al., 2007, 2011; Roy et al., 2014; Qi et al., 2016; Zhao et al., 2016, 2017; Qian et al., 2018), electrofacies classification (Allen and Pranter, 2016), and lithofacies classification from well logs (Baldwin et al., 1990; Zhang et al., 1999; Bestagini et al., 2017); to predict permeability in tight sands (Zhang et al., 2018); and even for seismicity studies (Kortström et al., 2016; Perol et al., 2018; Sinha et al., 2018; Wu et al., 2018).
Cored wells are important because they are the only data that provide the ground truth of subsurface reservoirs, including the lithofacies variations. The goals of core-based rock-type descriptions are to identify key lithofacies and facies associations; evaluate facies stacking and identify and interpret depositional environments; evaluate the relationships among porosity, permeability, and lithofacies; and help operators to identify optimal zones for designing completions. Traditional core-based lithofacies identification is challenging because it is costly, time consuming, and subjective (e.g., different geologists describing the same core might yield different results). To address some of the core-based lithofacies identification challenges, we evaluate whether a CNN can help a specialist with their image-recognition task. CNN goes hand in hand with the construction and archival of digital databases.
The University of Oklahoma, School of Geology and Geophysics, 100 East Boyd Street, RM 710, Norman, Oklahoma 73019, USA and The Geological Survey of Brazil-CPRM, 55 Rua Costa, São Paulo, Brazil. E-mail: rlima@ou.edu (corresponding author).
The University of Oklahoma, School of Geology and Geophysics, 100 East Boyd Street, RM 710, Norman, Oklahoma 73019, USA and Oklahoma Geological Survey, 100 East Boyd Street, Room N-131, Norman, Oklahoma 73019, USA. E-mail: huangcienming-1@ou.edu.
The University of Oklahoma, School of Geology and Geophysics, 100 East Boyd Street, RM 710, Norman, Oklahoma 73019, USA. E-mail: kmarfurt@ou.edu; matthew.pranter@ou.edu.
Manuscript received by the Editor 18 December 2018; published ahead of production 02 April 2019; published online 28 May 2019. This paper appears in Interpretation, Vol. 7, No. 3 (August 2019); p. SF27–SF40, 15 FIGS., 11 TABLES. http://dx.doi.org/10.1190/INT-2018-0245.1. © 2019 Society of Exploration Geophysicists and American Association of Petroleum Geologists. All rights reserved.
Special section: Insights into digital oil field data using artificial intelligence and big data analytics
Many museums are now busy digitizing and sharing their collections (Blagoderov et al., 2012; Ellwood et al., 2015). With the exception of core measured by deep-sea drilling projects and the like (e.g., NOAA, 2016), core images are not readily available. As an example, more than 100 mi of cores are stored in the Oklahoma Petroleum Information Center, managed by the Oklahoma Geological Survey. Other states and countries have similar repositories (USGS Core Research Center, 2018).
Further digitization of this valuable resource into core images will not only facilitate access to data for traditional analysis but will also provide the information needed to build and calibrate innovative machine-learning algorithms. The workflow we present here has the potential to organize many miles of slabbed cores into a reliable and coherent system easily accessible to a variety of users.
In this paper, we provide one of the first attempts to conduct automated core lithofacies classification using CNN. We begin with an overview of the methodology, which includes data preparation and transfer learning. The details of the CNN method are summarized in tutorial form in Appendix A. Then, we apply CNN to our core data set, and we analyze our results using confusion matrices, test and validation accuracies, and the precision, recall, and F1 score (Fawcett, 2006) computed with the final test set. We conclude with a summary of our findings and suggestions on how our workflow can be extended and improved.
Methodology
The deep-learning methodology and CNN techniques are now well disseminated in diverse fields. LeCun et al. (2015) present details on the construction and the value of deep learning. Dumoulin and Visin (2016) give details on convolutions and other arithmetic steps used in deep-learning algorithms. Although carefully constructed interactive papers have been published detailing CNN image transformations and image understanding (e.g., Olah et al., 2017, 2018), CNN may appear to be “magic” and therefore somewhat suspect to the practicing geoscientist. For this reason, Appendix A provides a tutorial that looks under the covers, providing a simple CNN application to classify images into three groups. The work for this paper was developed using open-source computational packages described by Hunter (2007), Chollet (2015), and Abadi et al.
(2016). When used for image recognition tasks, CNN models need examples (images) to understand the properties of each “class” that they try to discriminate. Some of the parameters learned for a primary task (such as the ImageNet classification) can be transferred to a secondary task (e.g., lithofacies classification) through the use of transfer learning (Pan and Yang, 2010; Oquab et al., 2014; Yosinski et al., 2014). Our work focuses on using transfer learning of complex CNN architectures to serve our specific image recognition task. The following subsections detail how we prepared our data sets and give a brief explanation of transfer learning.
Data preparation
We used cores whose traditional descriptions were published by Suriamin and Pranter (2018), and we captured images of them using modern photographic equipment to generate the set of labeled data to feed our CNN. The total section used for this project consists of approximately 700 ft from one core from the Mississippian limestone and chert reservoirs in the Anadarko Shelf, Grant County, Oklahoma. The set of core images shown in Table 1 includes 17 different lithofacies. Two pairs of lithofacies exhibit similar lithology and appearance; we grouped each pair into a single class for this project. We carefully cropped the images in a standardized fashion, providing consistent input to the CNN. We used a sliding window technique to extract consistent square cropped sections from the original core images (Figure 1), generating 180 × 180 pixel images representing
Table 1. Class number assigned to each lithofacies in the core used in this study.
Class | Lithofacies | Training set | Test set
01 | Chert breccia in greenish shale matrix | 218 | 3
02 | Chert breccia | 236 | 3
03 | Skeletal mudstone-wackestone | 258 | 4
04 | Skeletal grainstone | 160 | 3
05 | Splotchy packstone grainstone | 344 | 4
06 | Bedded skeletal peloidal packstone-grainstone | 416 | 4
07 | Nodular packstone-grainstone | 445 | 11
08 | Skeletal peloidal packstone-grainstone | Not used | Not used
09 | Bioturbated skeletal peloidal packstone-grainstone | 795 | 19
10 | Bioturbated mudstone-wackestone | 150 | 4
11 | Brecciated spiculitic mudstone | Not used | Not used
12 | Intraclast spiculitic mudstone | Not used | Not used
13 | Spiculitic mudstone-wackestone | 3077 | 79
14 | Argillaceous spiculitic mudstone-wackestone | (with class 13) | (with class 13)
15 | Glauconitic sandstone | Not used | Not used
16 | Shale | 789 | 17
17 | Shaly claystone | (with class 16) | (with class 16)
Total number of images in each set: 6888 (training set), 151 (test set)
Note: Classe