2. Introduction

2.1. What is data science?

One widely accepted concept is the three pillars of data science: mathematics/statistics, computer science, and domain knowledge.

In her 2014 Presidential Address, Prof. Bin Yu, then President of the Institute of Mathematical Statistics, gave an interesting definition:

\[\mbox{Data Science} = \mbox{S}\mbox{D}\mbox{C}^3,\]

where S is Statistics, D is domain/science knowledge, and the three C’s are computing, collaboration/teamwork, and communication to outsiders.

2.2. Setting up the computing environment

The setup steps depend on your operating system.

If you can, stay away from Windows. Otherwise, good luck (you will need it).

2.2.1. Command line interface

2.2.2. Python

  • Install a Python package manager (Miniconda or pip)

  • Install Python

  • Install an IDE (Jupyter Notebook or VS Code)
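Once these steps are done, a quick check from Python can confirm which interpreter is active and which tools are reachable. The following is a minimal sketch using only the standard library. The names in `TOOLS` are an assumed list chosen for illustration, not a required set, and a missing tool is reported rather than treated as an error.

```python
import shutil
import sys

# Assumed list of command-line tools to look for; adjust it to your own setup.
TOOLS = ["conda", "pip", "jupyter", "code", "git"]


def check_environment(tools=TOOLS):
    """Map each tool name to its full path, or None if it is not on the PATH."""
    return {tool: shutil.which(tool) for tool in tools}


# Report the running interpreter, then each tool and where it was found.
print(f"Python {sys.version.split()[0]} at {sys.executable}")
for tool, path in check_environment().items():
    print(f"{tool:8s} {path or 'not found on PATH'}")
```

If a tool is reported as not found, the usual cause is that its install directory is missing from the `PATH` of the current shell.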

2.2.3. A book project with Jupyter Book

  • Markdown for text

  • Jupyter notebooks for code demos

  • Jupytext
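To see why a text-based representation of notebooks is convenient, it helps to look at what a notebook file contains. A Jupyter notebook (`.ipynb`) is a JSON document, which is awkward to read and to diff under version control. The sketch below builds a minimal notebook by hand with the standard library only. The helper `make_notebook` is hypothetical and written for this illustration; it is not part of Jupyter or Jupytext, and real projects would normally let those tools write the file.

```python
import json


def make_notebook(markdown_text, code_text):
    """Build a minimal nbformat-4 notebook: one Markdown cell and one code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            # Text goes in a Markdown cell.
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            # Code goes in a code cell; outputs are empty until the cell is run.
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": code_text,
            },
        ],
    }


nb = make_notebook("# Demo\n\nSome text.", 'print("Hello!")')
notebook_json = json.dumps(nb, indent=1)
print(notebook_json)
```

Even this two-cell notebook expands into a couple of dozen lines of JSON, which is the motivation for keeping a paired Markdown or script version of the same content.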

2.2.4. MyST Markdown

Markedly Structured Text (MyST) examples:

Add my admonition

Adding my little admonition

Note

Initial

Warning

warning

Note

A note written in reStructuredText.

print("Hello!")
Hello!
Listing 2.1 This is my multi-line caption. It is pretty nifty.
a = 2
print('my 1st line')
print(f'my {a}nd line')

Here’s my title

Here’s my admonition content

\[ax^{2} + bx + c \tag{2.1}\]

The basic quadratic expression, (2.1), allows for the construction of all kinds of parabolas.
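As an illustration of how the coefficients in (2.1) shape a parabola, the sketch below evaluates the quadratic, locates its vertex at x = -b/(2a), and solves ax^2 + bx + c = 0 with the quadratic formula. The function names `quadratic`, `vertex`, and `roots` are chosen here for illustration and are not from the original text; `cmath` is used so that a negative discriminant yields complex roots instead of an error.

```python
import cmath


def quadratic(a, b, c, x):
    """Evaluate a*x^2 + b*x + c at x."""
    return a * x**2 + b * x + c


def vertex(a, b, c):
    """Return the (x, y) coordinates of the parabola's vertex."""
    if a == 0:
        raise ValueError("a must be nonzero for a parabola")
    x_v = -b / (2 * a)
    return x_v, quadratic(a, b, c, x_v)


def roots(a, b, c):
    """Return both solutions of a*x^2 + b*x + c = 0 as complex numbers."""
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic equation")
    sqrt_disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + sqrt_disc) / (2 * a), (-b - sqrt_disc) / (2 * a)


print(vertex(1, -2, 1))   # vertex of x^2 - 2x + 1
print(roots(1, -3, 2))    # solutions of x^2 - 3x + 2 = 0
```

The sign of a decides whether the parabola opens upward or downward, and the vertex moves along x = -b/(2a) as b changes.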

2.2.5. Git and GitHub

2.3. Topics

  1. Setting up

  2. Python Basics

  3. Numerical operations (NumPy)

  4. Data manipulation (Pandas)

  5. Data visualization (Matplotlib)

  6. Statistical modeling (statsmodels)

  7. Machine learning (Scikit-learn)

  8. Distributed computing (Dask)