Whereas the previous chapters mainly cover the content expected of a typical textbook on the principal ideas and approaches of statistical data analysis, as taught at the introductory level at most universities and in most subjects worldwide, the following chapters deviate slightly from the mainstream of advanced statistics textbooks:
In the first chapter, “Feature scales”, we try to get to the bottom of the fundamental assumptions underlying descriptive statistics, test statistics and modeling, such as unlimited algebraic validity, symmetry, normality and stochastic independence, when they are applied to real-world data. The challenges presented were motivated by the data analysis workshop “Earth Science Data Analyses - Getting it right by keeping it real”, organized and held by Gert-Jan Weltje (KU Leuven), Rik Tjallingii (GFZ Potsdam), Arne Ramisch (University Innsbruck) and Kai Hartmann (FU Berlin) at Freie Universitaet Berlin (February 2022) and GFZ Potsdam (February 2023). In this workshop we presented challenges and options for dealing with constrained data scales, such as positive real data, data with upper and lower limits, and compositional data, in order to obey algebraic rules and statistical assumptions. We are aware that these topics ought to be presented at the very beginning of any introduction to data analysis. However, for didactic reasons we decided to introduce them only after a basic understanding of statistical principles has been established, in the hope of gaining greater acceptance for the necessity of non-linear transformations before applying linear methods. We have searched for a comparable chapter in textbooks and online tutorials but have not found one so far. If you know of a comparable presentation (besides the substantial tutorial focusing on compositional data analysis), please let us know via soga[at]zedat.fu-berlin.de.
The second chapter, “Multivariate approaches”, covers common multilinear methods such as multilinear regression (MLR), principal component analysis (PCA) and factor analysis (FA). In the introductory and application parts we set aside the feature scale constraints discussed before, but at the end of each topic we provide examples with appropriate transformations in order to show their advantages and limitations. Furthermore, we present an end-member algorithm primarily designed for geological grain-size analysis, followed by an application far removed from that initial objective.
The third chapter, “Time Series Analyses”, provides a selection of common methods, challenges and approaches for analyzing and forecasting time-domain data, in preparation for machine learning algorithms. It covers typical tasks such as tests for stationarity, imputation of missing values, time series decomposition and statistical models, as well as a forecasting approach using ARMA and ARIMA models.
Statistical analysis in the spatial domain is covered in the last two chapters, “Spatial Point Pattern” and “Geostatistics”. The former begins by explaining the principles of random and partially random point pattern processes. Using OpenStreetMap (OSM) data of Berlin, we present methods for the description, analysis and estimation of typical spatial patterns. In a second part, we present approaches for estimating spatial interactions. The last chapter, “Geostatistics”, deals with common methods for estimating the spatial variation of continuously distributed variables in order to interpolate these variables from discrete sample locations. Starting with the very basic nearest neighbor interpolation, we focus on inverse distance weighting (IDW) and the variogram-based kriging method.
We refrained from presenting the expected third part of spatial statistics, “Discrete spatial variation”, because it is already partly covered by the e-learning project RESEDA (Remote SEnsing Data Analyses).
The e-learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by email at soga[at]zedat.fu-berlin.de.
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.
Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.