Preface
Welcome to this book!
These are lecture notes for Data Science 122, Foundations of Data Science III, as taught at Boston University. The notes are written by Pawel Przytycki, Lisa Wobbes, and Mark Crovella.
Format
The notes are in the form of Jupyter notebooks. Demos and most figures are included as executable Python code.
Each chapter is based on a single notebook and forms the basis for roughly one lecture.
Sources
We have relied on many sources for these lecture notes, including many public domain images and other resources (thank you Wikimedia!). Some illustrations were generated using DALL-E and Stable Diffusion.
The principal sources from which we draw much text and many examples are:
Think Bayes, Second Edition, Allen Downey. This book in particular is the basis for the Bayesian portion of the course.
Data Science From Scratch, Joel Grus
Probabilistic Graphical Models, Koller and Friedman
Introduction to Probability, Dennis Sun (online notes)
A First Course in Probability, Sheldon Ross
Mathematical Statistics and Data Analysis, John A. Rice
Lecture notes, David Vogan
Understanding the New Statistics, Geoff Cumming
Statistics Done Wrong, Alex Reinhart
Deep Learning, Goodfellow, Bengio, and Courville
Applied Stochastic Analysis, Miranda Holmes-Cerfon (available online)
Numerical Algorithms, Justin Solomon (available online)
Code
Packages used in this book include: numpy, scipy, pandas, matplotlib, seaborn, and pymc.
Here are some quick instructions for building this book:
Clone the repository, then:
make requirements.txt
Create a Python environment with the necessary packages:
python -m venv bookenvironment
source bookenvironment/bin/activate
pip install -r requirements.txt
make book
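After the environment is activated, it can be handy to confirm that the packages listed above are visible to Python before running the build. The following is a minimal sketch, not part of the book's build system: the helper name `missing_packages` is hypothetical, and the check only looks up each package with the standard library's `importlib.util.find_spec` without importing it or verifying versions.

```python
import importlib.util

# Package names taken from the list in this preface.
PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "seaborn", "pymc"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment.

    Hypothetical helper for illustration; it does not import the packages,
    it only asks the import system whether each one can be found.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages can be found.")
```

If any names are reported as missing, rerunning the `pip install -r requirements.txt` step inside the activated environment is the likely fix.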