Introduction
Big data refers to amounts of complex and heterogeneous data that are too large to be handled by traditional data-processing applications. In recent years, data science has emerged as the field that handles big data with modern tools and techniques to find hidden patterns and derive meaningful information (Agarwal & Dhar, 2014; Dobre & Xhafa, 2014; Sivarajah et al., 2017). In the Earth sciences, big data are produced by newly developed instruments such as meteorological satellites, ocean observing systems, seismic instrumentation systems, and space telescopes (Gvishiani et al., 2022). Vast amounts of data, ranging from the core of the Earth to the universe, are analyzed using advanced computational methods. In atmospheric science, recent findings have come from new instruments and from numerical modeling on high-performance computing systems across broad spatiotemporal scales (Fathi et al., 2021; Zhou et al., 2022). Storing weather and climate datasets, in turn, requires file formats that can hold large, complex, multidimensional data. Network Common Data Form (NetCDF) is a file format for storing multidimensional scientific variables and is widely adopted in Earth science fields such as meteorology, climatology, geology, and astronomy. Traditional scientific languages such as C, C++, and Fortran can read and manipulate NetCDF files. However, NetCDF programming interfaces are more readily usable from modern languages such as Python, R, Interactive Data Language (IDL), MATrix LABoratory (MATLAB), and NCAR Command Language (NCL). Meteorological data are analyzed to assess weather and climate objectively, identify mechanisms, and determine the representative data for a region of interest on the globe.
With the availability of vast amounts of meteorological data, the demand for techniques and tools to analyze them has also increased in recent years (Fathi et al., 2021; Zhou et al., 2022). NCL, a product of the National Center for Atmospheric Research (NCAR), which is sponsored by the National Science Foundation in the United States of America, is a free interpreted language for data processing and visualization. NCL supports a variety of file formats, including NetCDF, for use in meteorological research. NCL is no longer updated, and its functionality is being transitioned to Python modules. Recent trends show that open-source languages are becoming highly popular because they support the major functionalities required for meteorological data analysis and parallel computing (Gharat et al., 2022). Meanwhile, East Asia is undergoing significant climate change involving a wide range of factors, including extreme temperature and precipitation, and these changes are likely to grow (Cho et al., 2021; Cho et al., 2022; Kim et al., 2016). In particular, wind speeds near the tropopause tend to weaken as the meridional air temperature gradient decreases because of intensified atmospheric warming in the higher latitudes of East Asia (Kim et al., 2016). Such wind speed changes in East Asia have altered the atmospheric environment in Korea (Cho et al., 2021; Cho et al., 2022). NCL can be used to analyze variations in the climate normal, the 30-year average of meteorological variables, in East Asia. Moreover, the seasonal climatological mean in East Asia can be described using massive meteorological data. This study focuses on interpreting meteorological big data and analyzing seasonal air temperature variations in East Asia during the three climatological periods of 1971-2000, 1981-2010, and 1991-2020.
Materials and Methods
Meteorological Reanalysis Data
A reanalysis is a systematic approach to producing meteorological data sets on multidimensional scales. Meteorological reanalysis data incorporate millions of observations into a stable data assimilation system, enabling an analysis of weather and climate processes (Kalnay et al., 1996). Reanalysis data sets are multivariate, spatially and temporally complete, and gridded. They are produced by combining incomplete and inaccurate observations with imperfect models.
The National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis dataset consists of atmospheric model outputs at three time scales (4-times daily, daily mean, and monthly mean) from 1948 to the present. The monthly air temperatures of the NCEP/NCAR reanalysis dataset (), which are available in the NetCDF file format, were analyzed with NCL libraries to examine atmospheric variability in East Asia. Fig. 1 shows the structure of the air temperature NetCDF file, detailing its four dimensions. The NCEP/NCAR air temperature dataset has four dimensions of time, pressure level, latitude, and longitude, with monthly intervals from 1948 to the present, pressure levels from 1,000 hPa to 10 hPa, and a horizontal resolution of 2.5° latitude × 2.5° longitude. In this study, the monthly air temperatures at 850 hPa over East Asia (25-55°N, 90-150°E) were used for 1971-2020.
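The selection of one pressure level and one latitude-longitude box from a four-dimensional array can be sketched as follows. This is a minimal illustration in Python with NumPy rather than the NCL script used in the study; the array is synthetic, and the coordinate layout, the list of pressure levels, and the function name `subset_region` are assumptions made for the example.

```python
import numpy as np

# Synthetic stand-in for the reanalysis grid (assumed layout, not read from a file):
# 2.5-degree global grid, 17 pressure levels, 12 monthly time steps.
lat = np.arange(90.0, -92.5, -2.5)   # 90N to 90S
lon = np.arange(0.0, 360.0, 2.5)     # 0E to 357.5E
level = np.array([1000, 925, 850, 700, 600, 500, 400, 300,
                  250, 200, 150, 100, 70, 50, 30, 20, 10])  # hPa (assumed list)
rng = np.random.default_rng(0)
air = rng.normal(size=(12, level.size, lat.size, lon.size))  # (time, level, lat, lon)


def subset_region(data, level, lat, lon, p_hpa, lat_range, lon_range):
    """Select one pressure level and a latitude/longitude box from a 4-D array."""
    k = int(np.flatnonzero(level == p_hpa)[0])
    lat_idx = np.flatnonzero((lat >= lat_range[0]) & (lat <= lat_range[1]))
    lon_idx = np.flatnonzero((lon >= lon_range[0]) & (lon <= lon_range[1]))
    # Index in separate steps so the two selections form a rectangular box.
    box = data[:, k][:, lat_idx][:, :, lon_idx]
    return box, lat[lat_idx], lon[lon_idx]


# East Asia at 850 hPa, using the bounds stated in the text.
east_asia, ea_lat, ea_lon = subset_region(
    air, level, lat, lon, 850, (25.0, 55.0), (90.0, 150.0)
)
print(east_asia.shape)
```

On a 2.5° grid, the box 25-55°N by 90-150°E contains 13 latitude points and 25 longitude points, so each monthly field in the subset has 325 grid points.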
Empirical Orthogonal Function Analysis
One of the most widely used statistical techniques in meteorology is principal component analysis (PCA). PCA gained ground in the study of meteorological data after Lorenz (1956), who called the technique empirical orthogonal function (EOF) analysis. EOF analysis identifies spatial and temporal patterns of variability and quantifies the importance of each pattern. Mathematically, it compresses N-dimensional data into fewer than N dimensions: the data are linearly transformed to extract the dominant modes of variability, which together explain most of the variance in the raw data. With the data arranged as a matrix A whose rows correspond to grid points and whose columns correspond to times, the singular value decomposition in Eq. 1 factors A into the matrices U, S, and $V^{T}$. The columns of U are the EOFs of A, and the diagonal elements of S are the singular values, whose squares are proportional to the eigenvalues of the covariance matrix of A. Each eigenvalue gives the variance of the raw data explained by the corresponding EOF, and the sum of the eigenvalues over all EOFs gives the total variance of the raw data. Each row of $V^{T}$ represents the temporal evolution of the corresponding EOF as a time series of coefficients.
$A = U S V^{T}$ (Eq. 1)
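The decomposition in Eq. 1 can be illustrated with a short NumPy sketch. It is not the NCL EOF routine used in the study: the input field is synthetic, the function name `eof_svd` is hypothetical, and no latitude weighting is applied.

```python
import numpy as np


def eof_svd(field):
    """EOF analysis of a (space, time) matrix via SVD, A = U S V^T."""
    # Remove the time mean at each grid point so the modes describe variability.
    anomalies = field - field.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    # Squared singular values are proportional to the variance of each mode.
    explained = s**2 / np.sum(s**2)
    return u, s, vt, explained


# Hypothetical field: one dominant spatial pattern with an oscillating amplitude,
# plus weak random noise.
rng = np.random.default_rng(1)
n_space, n_time = 40, 30
pattern = np.sin(np.linspace(0.0, np.pi, n_space))
signal = np.cos(np.linspace(0.0, 4.0 * np.pi, n_time))
field = 5.0 * np.outer(pattern, signal) + 0.1 * rng.normal(size=(n_space, n_time))

u, s, vt, explained = eof_svd(field)
eof1 = u[:, 0]          # leading spatial pattern (first column of U)
pc1 = s[0] * vt[0, :]   # its time series (first row of V^T, scaled by s[0])
print(f"mode 1 explains {100.0 * explained[0]:.1f}% of the variance")
```

Here `explained` plays the role of the percentages reported for each EOF mode, and multiplying the three factors back together recovers the anomaly matrix.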
In this study, an EOF analysis was employed to analyze the impacts of the East Asian seasonal air temperature variations on Korea. The area targeted for the EOF analysis was the East Asian subset of the NCEP/NCAR monthly air temperature dataset (90-150°E, 25-55°N), which includes eastern China and Korea. To compare the seasonal air temperature variations among the three climatological periods of 1971-2000, 1981-2010, and 1991-2020, the EOF analysis was applied to the seasonal averages of the monthly air temperatures at 850 hPa. The principal component (PC) time series were calculated for the first and second EOF modes. The seasonal PC time series were then compared with surface-based air temperature observations from the Korea Meteorological Administration (KMA) for the three climatological periods to describe the effects of the East Asian continent on Korea.
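Seasonal averaging of monthly values can be sketched as below. The season definitions (March-May, June-August, September-November, December-February), the equal weighting of months, and the function name `seasonal_means` are assumptions for illustration; the text does not specify how the seasons were defined.

```python
import numpy as np

# Assumed season definitions as calendar month numbers.
SEASONS = {
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "fall": (9, 10, 11),
    "winter": (12, 1, 2),
}


def seasonal_means(monthly, season):
    """Average a (years, 12) array of monthly values into one value per year.

    For winter, December of the previous year is combined with January and
    February of the current year, so the first year has no winter value.
    """
    if season == "winter":
        dec = monthly[:-1, 11]
        jan_feb = monthly[1:, 0:2]
        return (dec + jan_feb.sum(axis=1)) / 3.0
    cols = [m - 1 for m in SEASONS[season]]
    return monthly[:, cols].mean(axis=1)


# Hypothetical data: 3 years in which each monthly value equals its month number.
monthly = np.tile(np.arange(1.0, 13.0), (3, 1))
spring = seasonal_means(monthly, "spring")
winter = seasonal_means(monthly, "winter")
print(spring, winter)
```

Applied to each grid point, this yields one seasonal field per year, which is the kind of input the EOF analysis operates on.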
Results and Discussions
Setup of the NCL in the Linux-based System
NCL is a powerful language for reading, writing, manipulating, and visualizing meteorological NetCDF datasets. In this study, NCL version 6.6.2 was installed in a 64-bit Community Enterprise Operating System (CentOS) environment on hardware with a 64-core CPU (2.7 GHz) and 1 TB of memory. CentOS version 7.9 was already installed on the hardware, and NCL was built with the GNU compilers. NCL on the CentOS Linux-based system was accessed through SSH terminal software installed on a Windows-based computer (Windows system). The SSH connection from the Windows system made it possible to run NCL script files on the CentOS Linux-based system and to obtain the analysis results in PostScript file format. NCL contains libraries with a suite of functions to read NetCDF meteorological data files, perform EOF analysis, and plot the results.
Every 10 years, 30-year averages of air temperature, rainfall, and other meteorological variables, known as climate normals, are recalculated for the most recent three decades; the current normals span 1991-2020. Fig. 2 shows the NCL script that reads and plots the NCEP/NCAR monthly air temperatures to analyze the EOF spatial patterns during the three climatological periods of 1971-2000, 1981-2010, and 1991-2020. The script defines the parameters specifying the region and period and opens the data for the specified region and period. The standard EOF function in the NCL libraries, which implements Eq. 1, is called in the script to compute the spatial patterns and the annual series of seasonal PCs. The EOF analysis results are plotted to a workstation and saved in PostScript file format, which can then be viewed on the Windows system. These EOF functions could make the NCL libraries useful as educational material in Earth science coursework and help instructors give students a reason to learn scientific languages.
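The three overlapping 30-year windows can be made concrete with a short sketch. The annual series below is hypothetical (a steady warming of 0.02 per year chosen only for illustration), and `climate_normal` is a helper name introduced here, not an NCL function.

```python
import numpy as np


def climate_normal(years, values, start, end):
    """Mean of annual values over the inclusive period start..end."""
    mask = (years >= start) & (years <= end)
    return float(values[mask].mean())


years = np.arange(1971, 2021)
# Hypothetical annual series warming by 0.02 units per year from 10.0 in 1971.
values = 10.0 + 0.02 * (years - 1971)

periods = [(1971, 2000), (1981, 2010), (1991, 2020)]
normals = {f"{a}-{b}": climate_normal(years, values, a, b) for a, b in periods}
for name, normal in normals.items():
    print(name, round(normal, 2))
```

Because consecutive windows share 20 of their 30 years, differences between successive normals reflect only the decade that enters and the decade that leaves the window.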
Seasonal Air Temperature Variations in East Asia
The variations in the climate normals of seasonal air temperature in East Asia were analyzed using the EOF method. Fig. 3 shows the spatial EOF patterns of the seasonal air temperatures at 850 hPa in East Asia (90-150°E, 25-55°N) during each climatological period of 1971-2000, 1981-2010, and 1991-2020. The first EOF modes are dominated by a monopole covering most of East Asia. While the seasonal air temperatures varied slightly among the three climatological periods, the first EOF mode for the most recent climatological period of 1991-2020 accounted for 50.6% of the variance in spring, 50.3% in winter, 42.7% in fall, and 36.9% in summer, with the same sign across the East Asian region from northern Mongolia to Korea. The second EOF mode accounted for 25.7% in fall, 18.3% in summer, 17.5% in spring, and 16.2% in winter, with dipole structures of opposite sign between northeastern and southwestern Asia. Other studies on East Asia have shown that atmospheric circulation anomalies contribute substantially to the formation of seasonal anomalies in meteorological variables (Chen et al., 2016; Chen et al., 2019; Miyazaki & Yasunari, 2008).
During the recent climatological period of 1991-2020, the first two EOFs together account for 68.4% of the variance in fall, 68.1% in spring, 67.5% in winter, and 55.2% in summer. Furthermore, the seasonal EOF percentages showed concurrent air temperature variations across the East Asian region from Mongolia eastward through eastern China to Korea during the three climatological periods. The East Asian continent influences Korea through the prevailing westerlies. Kim et al. (2017) reported that, from 1997 to 2016, increases in annual air temperature were larger in the higher latitudes above 40°N, such as Mongolia, northeastern China, and the northwest Pacific Ocean, than in the mid-latitude region of eastern China and Korea.
Fig. 4 shows the time series of the EOF modes for the seasonal air temperatures at 850 hPa in East Asia (90-150°E, 25-55°N) during each climatological period of 1971-2000, 1981-2010, and 1991-2020. In winter, the influence of the East Asian continent on Korea has declined, with weaker atmospheric warming during the recent period of 1991-2020 than during the previous climatological periods. A few studies have suggested that the rate of air temperature increase during winter in East Asia has slowed (Min et al., 2015; You et al., 2022). In contrast, the rate of air temperature increase has gradually intensified during the warm seasons compared with winter. While summer weather in Korea has been dominated by the expanding Northwest Pacific air mass, the influence of the East Asian continent on Korea gradually intensified in spring and summer during the recent climatological period of 1991-2020.
Impact of Seasonal Variations in East Asia on Korea
The seasonal principal component (PC) time series of the first EOF modes in East Asia were compared with surface-based observations of air temperature in Korea to analyze how the impact of the East Asian continent on Korea varied across the climatological periods. The surface-based observations were the national average air temperatures from the Korea Meteorological Administration for 1971-2020. Because the first EOF mode explains far more variance than the second mode (more than twice as much in spring, summer, and winter of 1991-2020), the analysis focuses on the first mode. Fig. 5 shows the seasonal correlation coefficients between the PCs and the observed air temperatures in Korea for the climatological periods from 1971-2000 to 1991-2020. The seasonal PCs in East Asia were significantly correlated with air temperatures in Korea (p < 0.05) in spring, summer, and fall. The correlation coefficients gradually increased in spring and summer across the climatological periods from 1971-2000 to 1991-2020.
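A correlation and its significance test for a 30-year series can be sketched as follows. The two series are hypothetical, the function name `pearson_with_t` is introduced here for illustration, and the critical value 2.048 is the tabulated two-tailed 5% value of Student's t for 28 degrees of freedom; the test assumes independent years, which the text does not discuss.

```python
import numpy as np

# Tabulated two-tailed 5% critical value of Student's t for 28 degrees of freedom.
T_CRIT_DF28 = 2.048


def pearson_with_t(x, y):
    """Pearson correlation, its t statistic, and the degrees of freedom."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = float(np.corrcoef(x, y)[0, 1])
    df = x.size - 2
    t = r * np.sqrt(df / (1.0 - r**2))
    return r, float(t), df


# Hypothetical 30-year series: a PC time series and a related temperature series.
k = np.arange(30)
pc1 = np.sin(k)
korea_temp = pc1 + 0.5 * np.cos(3 * k)

r, t, df = pearson_with_t(pc1, korea_temp)
significant = abs(t) > T_CRIT_DF28
print(f"r = {r:.2f}, t = {t:.2f}, df = {df}, significant at 0.05: {significant}")
```

With 30 years per climatological period, the critical value corresponds to a correlation of about 0.36, so coefficients above that magnitude pass the 0.05 threshold under these assumptions.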
The highest correlation coefficient, exceeding 0.78 (p<0.05), was recorded in the spring of the recent climatological period of 1991-2020. The increasing trend in air temperature was found to be stronger in spring than in the other seasons in East Asia (Cho et al., 2022; Jung et al., 2002; Kim et al., 2017; Kug & Ahn, 2013). In particular, Cho et al. (2021) showed a recently intensified increasing trend in spring air temperature in East Asia through an analysis of air temperature anomalies for 1998-2019. Moreover, unusual spring warming occurred in East Asia in 2018, which was linked to an anomaly in the North Atlantic tripole SST mode (Deng et al., 2019). The correlation coefficients in summer indicated that the impact of the Northwest Pacific air mass on air temperature variations in Korea had intensified toward the recent climatological period. In addition, Lee & Lee (2016) suggested that the duration of summer in Korea had lengthened over the 42 years from 1973 to 2014. The anomalous high-pressure pattern was accompanied by large-scale subsidence over Korea, providing favorable conditions for sweltering and humid days.
Conclusions
This study presented an NCL software setup on a CentOS Linux-based hardware system. It analyzed the impacts of East Asian atmospheric variability on Korea, which is located downwind, during the climatological periods of 1971-2000, 1981-2010, and 1991-2020. In meteorological data analysis, NCL is widely adopted for analyzing big data stored in file formats such as NetCDF. Its open-source availability, advanced scientific programming concepts, and extensive community support provide additional advantages. NCL routines were successfully installed on the CentOS Linux-based hardware system and were used to compute the spatial patterns and PC time series with the EOF analysis method for the monthly air temperatures at 850 hPa in East Asia.
While the seasonal air temperatures varied among the climatological periods, the first EOF mode accounted for 50.6% of the variance in spring, 50.3% in winter, 42.7% in fall, and 36.9% in summer, with the same sign from northern Mongolia to Korea, during the recent climatological period of 1991-2020. The EOF analysis revealed a spatial structure in which the variations were consistent with those in Korea during all seasons. The East Asian continent influences Korean weather through the prevailing westerlies. In winter, the impact of the East Asian continent on Korea was weaker during the recent period of 1991-2020 than during the previous climatological periods. In contrast, the influence of the East Asian continent on air temperatures in Korea has gradually intensified during the warm seasons of spring and summer. The air mass over the Northwest Pacific has strengthened the variability of summer air temperatures in Korea for the recent climatological period of 1991-2020. The seasonal PC time series of the first EOF modes in East Asia were compared with the air temperatures observed in Korea for the climatological periods. The correlation coefficients between the PCs in the East Asian region and the surface-based air temperatures in Korea increased gradually in spring and summer toward the recent climatological period. In addition, the expansion of the Northwest Pacific air mass into the East Asian region in summer has strengthened its dominant effect on air temperatures in Korea.