2. Introduction
2.1. What is data science?
One widely accepted concept is the three pillars of data science: mathematics/statistics, computer science, and domain knowledge.
In her 2014 Presidential Address, Prof. Bin Yu, then President of the Institute of Mathematical Statistics, gave an interesting definition:
$$
\text{Data Science} = \text{SDC}^3,
$$
where S is statistics, D is domain/science knowledge, and the three C's are computing, collaboration/teamwork, and communication to outsiders.
2.2. Set up the computing environment
All setups are operating system dependent.
As soon as possible, stay away from Windows. Otherwise, good luck (you need it).
2.2.1. Command line interface
2.2.2. Python
Install a Python package manager: conda (via Miniconda) or pip
Install Python
Install an IDE (Jupyter Notebook or VS Code)
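Once the steps above are done, a quick check from inside Python can confirm which interpreter is actually being used. The following is a minimal sketch that relies only on the standard library. The helper name `describe_python` is hypothetical, and the conda check assumes that conda-managed environments contain a `conda-meta` directory.

```python
import os
import platform
import sys


def describe_python():
    """Collect basic facts about the running interpreter (hypothetical helper)."""
    return {
        "version": platform.python_version(),
        "executable": sys.executable,
        # Assumption: conda-managed environments contain a conda-meta directory.
        "looks_like_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
        "os": platform.system(),
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(f"{key}: {value}")
```

Running this in a terminal and again inside the IDE is one way to see whether both point at the same installation.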
2.2.3. A book project with Jupyter Book
Markdown for text
Jupyter notebooks for code demos
Jupytext
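To see why a text-based pairing tool such as Jupytext is convenient, it may help to look at what a notebook file contains: an `.ipynb` file is a JSON document. The sketch below builds a minimal notebook by hand with the standard library only. The helper `make_notebook` is hypothetical, and the field layout follows the version 4 notebook format as an assumption about what a minimal file needs. It is an illustration, not a substitute for Jupyter's own tooling.

```python
import json


def make_notebook(cells):
    """Build a minimal nbformat-4 notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also carry an execution counter and their outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = make_notebook([
    ("markdown", "# A demo page"),
    ("code", 'print("Hello!")'),
])

# The same JSON text is what an .ipynb file stores on disk.
notebook_text = json.dumps(notebook, indent=1)
print(notebook_text)
```

Two lines of content become a few dozen lines of JSON, which is harder to read and to diff than plain Markdown or a Python script.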
2.2.4. MyST Markdown
Markedly Structured Text (MyST) examples:
Add my admonition: Adding my little admonition
Note: Initial
Warning: warning
Note: A note written in reStructuredText.
print("Hello!")
Hello!
a = 2
print('my 1st line')
print(f'my {a}nd line')
Here’s my title: Here’s my admonition content
The basic quadratic equation, (2.1), allows for the construction of all kinds of parabolas.
2.2.5. Git and GitHub
2.3. Topics
Setting up
Python Basics
Numerical operations (NumPy)
Data manipulation (Pandas)
Data visualization (Matplotlib)
Statistical modeling (statsmodels)
Machine learning (Scikit-learn)
Distributed computing (Dask)
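As a rough way to check whether the environment is ready for these topics, the sketch below looks up each library without importing it. The helper `check_packages` and the list `TOPIC_PACKAGES` are hypothetical names introduced for illustration. The import names are the usual ones for these libraries (Scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the libraries listed above; note that Scikit-learn
# is imported as "sklearn".
TOPIC_PACKAGES = ["numpy", "pandas", "matplotlib", "statsmodels", "sklearn", "dask"]


def check_packages(names):
    """Map each import name to True if it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(TOPIC_PACKAGES).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

Anything reported as missing can be installed with whichever package manager was chosen during setup.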