Introduction to Data Science

DATA 211 - Introduction to Data Science (3-0-3)

An overview of the data-driven approach and the data analytics lifecycle. Basic statistics: variance, covariance, correlation, confidence intervals, and histograms. Data frames, series, slicing, and sorting. Relational databases with primary and foreign keys. SQL implementation in Python. Data acquisition, cleaning, scrubbing, and manipulation. Correlation analysis, PCA, linear regression, gradient descent, Bayesian classifiers, decision trees, K-means clustering, and hierarchical clustering. Big data and high-dimensional data. Overview of MapReduce and Hadoop.