# Statistics and Geodata Analysis using Python (SOGA-Py)

Welcome to the E-Learning project **Statistics and Geodata Analysis using Python**. This project is all about processing and understanding data, with a special focus on Earth science data. In a more general sense, the project is all about **Data Science**. Data Science is an interdisciplinary field concerned with the processes and systems used to extract knowledge from data. It applies methods drawn from a broad range of scientific disciplines, such as mathematics, statistics and computer science, among others.

Source: Blog of Drew Conway (2010, http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram)

As shown in the Venn diagram by Drew Conway (2010), doing data science requires substantive expertise and domain knowledge, which in our case is the field of Earth Sciences, or Geosciences. In addition, we need to know about mathematics and statistics, the latter being the art of collecting, analysing, interpreting, presenting (visualizing), and organizing data. The main focus of the present E-Learning project is on statistics: **Yes, we will learn statistics!**

Last but not least, we must develop hacking skills. For our purpose this means that we need to learn how to programmatically load/save, manipulate, visualize, and analyze (geospatial) data. We will develop hacking skills by learning and applying the programming language **Python**.
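To give a first impression of what loading, manipulating, analyzing, and saving data programmatically can look like, here is a minimal sketch that uses only the Python standard library. The station names and values are made up for illustration, and the helper names (`load_records`, `summarize`, `save_records`) are hypothetical, not part of any SOGA-Py material. Visualization is left out here because it requires a third-party plotting library.

```python
import csv
import io
import statistics

# Hypothetical sample data: invented stations, not real measurements.
RAW_CSV = """station,elevation_m,mean_temp_c
A,34,9.8
B,520,7.1
C,1210,3.4
"""


def load_records(text):
    """Load: parse CSV text into a list of dicts with numeric fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "station": row["station"],
            "elevation_m": float(row["elevation_m"]),
            "mean_temp_c": float(row["mean_temp_c"]),
        }
        for row in reader
    ]


def summarize(records, key):
    """Analyze: basic descriptive statistics for one numeric column."""
    values = [record[key] for record in records]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def save_records(records):
    """Save: serialize the records back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = load_records(RAW_CSV)

# Manipulate: keep only stations above 500 m elevation.
highland = [r for r in records if r["elevation_m"] > 500]

summary = summarize(records, "mean_temp_c")
print(summary)
print(save_records(highland))
```

In practice, one would typically read from and write to files instead of in-memory strings; the in-memory version is used here only to keep the sketch self-contained.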

**We will cover a lot of ground and it will need stamina and dedication to work through all of that. However, it will pay off!**

Note that the beginning of the Digital Age may be dated to around the year 2002, when more than 50% of the data available worldwide was stored digitally rather than in analogue form (Hilbert and López 2011).

Since the 17th century, scientists have recognized experimental and theoretical science as the basic research paradigms for understanding nature. In recent decades, computer simulations have become an essential third paradigm. Nowadays, at the dawn of the Big Data Era, a fourth paradigm is emerging, consisting of the techniques and technologies needed to perform data-intensive science (Bell et al., 2009). Data-intensive science will be integral to many future scientific endeavors, but it demands specialized skills and analysis tools (e.g. NASA's Earth Observing System Data and Information System, EarthServer or Google Earth Engine). You should therefore not be surprised that an article published in the Harvard Business Review describes the job of a data scientist (or call it statistician) as the *sexiest job of the 21st century* (Davenport and Patil, 2010).

**Citation**

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by email at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: *Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin*