Data analysis today is not an unsophisticated activity carried out by hand; it is much more ambitious, and … an intellectual force to be reckoned with.

David Donoho, “High Dimensional Data Analysis: The Curses and Blessings of Dimensionality”

James later reduced his complaint to a sentence: fielding statistics made sense only as numbers, not as language. Language, not numbers, is what interested him. Words, and the meaning they were designed to convey. “When the numbers acquire the significance of language,” he later wrote, “they acquire the power to do all of the things which language can do: to become fiction and drama and poetry.”

Michael Lewis, Moneyball, writing about Bill James, inventor of sabermetrics

Preface

Welcome to this book!

These are lecture notes for Computer Science 506, Computational Tools for Data Science, as taught by me at Boston University.

The content of the course has major contributions from Evimaria Terzi, George Kollios, and Lance Galletti. Errors are mine. (Please alert me to errors or, better yet, submit a pull request!)

Format

The notes are in the form of Jupyter notebooks. Demos and most figures are included as executable Python code. All course materials are in the GitHub repository.

Each chapter is based on a single Jupyter notebook, and each notebook forms the basis for one lecture (more or less).

This book will evolve as the semester progresses, but once I have given a lecture, the contents of that chapter will stay fixed except for corrections.