# A Crash Course in Data Science Quiz


## What is data science?

1.
Question 1
Data science is

1 point

• Database management
• Applied machine learning
• Applied statistics
• Deep learning
• Answering specific questions with data

## What is statistics good for?

1.
Question 1
We covered four broad example areas of statistics. These were (check all that apply):

1 point

• Gut instincts
• Descriptive
• Prediction
• Inference
• Experimental design

2.
Question 2
Descriptive analysis includes which activities (check all that apply)?

1 point

• Sample size calculations
• Exploratory data analysis
• Basic summary tables

3.
Question 3
Statistical inference is defined as:

1 point

• The process of adding randomization to an experimental design.
• The process of performing unsupervised clustering.
• The process of evaluating predictions using cross validation.
• The process of drawing conclusions about populations from a sample.

4.
Question 4
Predictions are typically evaluated by:

1 point

• A measure of prediction performance.
• Whether randomization was included in the design.
• Model simplicity.

5.
Question 5
Randomization of a treatment in a design is used for:

1 point

• Balancing observed and unobserved covariates that may contaminate our results.
• Obtaining good predictions.
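
As an illustration of the randomization idea in Question 5, here is a minimal simulation sketch. Everything in it is made up for the example: the `ages` covariate, the sample size, and the self-selection rule are assumptions, not material from the course. It compares how far apart the two groups end up on a covariate the analyst never observes, first under coin-flip assignment and then when older subjects are more likely to choose treatment.

```python
import random

rng = random.Random(42)

# Hypothetical unobserved covariate: each subject's age, uniform on [20, 80].
ages = [rng.uniform(20, 80) for _ in range(10_000)]


def mean(values):
    return sum(values) / len(values)


def age_gap(assignments):
    """Mean age of the treated group minus mean age of the control group."""
    treated = [age for age, t in zip(ages, assignments) if t]
    control = [age for age, t in zip(ages, assignments) if not t]
    return mean(treated) - mean(control)


# Randomized design: a fair coin decides who is treated.
randomized = [rng.random() < 0.5 for _ in ages]

# Self-selected design (assumed rule): the chance of choosing treatment
# rises linearly with age, from 0 at age 20 to 1 at age 80.
self_selected = [rng.random() < (age - 20) / 60 for age in ages]

random_gap = age_gap(randomized)
selected_gap = age_gap(self_selected)

print(f"Age gap with randomization: {random_gap:.2f} years")
print(f"Age gap with self-selection: {selected_gap:.2f} years")
```

Under these assumptions the randomized groups should have nearly the same mean age, while the self-selected groups differ by many years, so any outcome related to age would be confounded with the treatment.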

## Machine learning

1.
Question 1
The lecture discussed two broad categories of machine learning (check all that apply):

1 point

Support vector machines

Unsupervised learning

Supervised learning

2.
Question 2
Supervised machine learning algorithms focus on:

1 point

clustering without an outcome.

prediction through prediction performance.

principal components.

3.
Question 3
One way to obtain generalizability of an ML algorithm is to:

1 point

use the same data for testing that was used to build the algorithm

test it on novel datasets
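
To make the two options in Question 3 concrete, the following sketch compares the error of a simple model on the data it was built from with its error on freshly drawn data. The data-generating process (`y = 2x` plus noise) and the one-nearest-neighbor predictor are assumptions chosen for illustration; they are not taken from the lecture.

```python
import random


def make_data(n, rng):
    """Draw n (x, y) pairs where y = 2x plus Gaussian noise (a made-up process)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = 2.0 * x + rng.gauss(0.0, 1.0)
        data.append((x, y))
    return data


def predict_1nn(train, x):
    """Predict y for x by copying the y of the training point with the closest x."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def mean_squared_error(train, data):
    """Average squared prediction error of the 1-NN model on `data`."""
    errors = [(predict_1nn(train, x) - y) ** 2 for x, y in data]
    return sum(errors) / len(errors)


rng = random.Random(0)
train = make_data(200, rng)   # data used to build the model
novel = make_data(200, rng)   # new data from the same process

train_mse = mean_squared_error(train, train)
novel_mse = mean_squared_error(train, novel)

print(f"MSE on the data used to build the model: {train_mse:.3f}")
print(f"MSE on novel data: {novel_mse:.3f}")
```

Because a one-nearest-neighbor model memorizes its training points, its error on the training data is zero, which says nothing about how it behaves elsewhere. Only the error on the novel data gives a usable estimate of how the model generalizes.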

4.
Question 4
Traditional statistical approaches often differ from ML approaches by (check all that apply):

1 point

Focusing on superpopulation models.

Often placing a higher priority on parameter interpretability and simplicity over prediction performance.

Focusing on deep learning.

## Software Engineering

1.
Question 1
What role does software engineering play in data science?

1 point

Software engineering is used to increase the speed of machine learning algorithms.

Software engineering is used to generalize data analyses into software so that they can be applied in different situations.

Software engineering’s role is to build computing infrastructure to support complex data analyses.

2.
Question 2
Which is a benefit of building software packages for data analysis?

1 point

Software packages are always smaller than regular code files.

Software provides a well-defined interface that can abstract low-level technical details of data analysis routines.

Software packages are generally faster than simple code.

3.
Question 3
When should you consider developing a software package? Select all that apply.

1 point

When an analysis or a part of an analysis must be done more than once or twice.

Any time you analyze data.

If members of another team/group wish to apply your same analysis to their own datasets.

## Structure of a Data Science Project

1.
Question 1
What are the two stages in which a data science project might start? Select all that apply.

1 point

Interpretation

Report writing

Defining/stating the question

Exploratory data analysis

2.
Question 2
Which of the following is NOT part of the data analysis process?

1 point

Exploratory data analysis

Decision-making

Formal modeling

Communication

3.
Question 3
What are the two goals of exploratory data analysis? Select all that apply.

1 point

Determine if the data are suitable for the question

Assess the totality of the evidence regarding your question.

Build presentations for communicating results to people outside your organization.

4.
Question 4
An analyst on your team engages in exploratory data analysis of a dataset. The EDA inspires him to ask a new question about the data so he begins the data analysis process on this same dataset and goes through the 5 phases.

What is wrong with this approach?

1 point

The development of the question and the development of the answer to the question were conducted with the same dataset.

Exploratory data analysis should never come before defining the question.

Exploratory data analysis was used to generate a new question.

## The outputs of a data science experiment

1.
Question 1
The outputs of a data science experiment often include (check all that apply):

1 point

Interactive web pages and apps

Presentations

Reports

2.
Question 2
Reproducibility tools for reports like knitr help with (check all that apply):

1 point

Reproducibility

Documenting the analysis

Getting the data scientist to think about the report during analysis

Version control

3.
Question 3
For maintainability of a data science app, the following are useful (check all that apply):

1 point

Version control

Good code documentation

4.
Question 4
Example tools for reproducible report writing are (check all that apply):

1 point

dplyr

knitr

IPython notebooks

5.
Question 5
A good report practice is:

1 point

Being clearly written with concise conclusions

Documenting every blind alley and bit of minutiae from the analysis

Cramming as much detail in as possible

## Defining Success in Data Science

1.
Question 1
Some ways we can declare success in data science include (check all that apply):

1 point

Decisions are made based on the data analysis.

The results of the analysis are uncertain and conclusions are not clear.

New knowledge about the phenomena under study is created.

2.
Question 2
Learning that the data in question can’t answer the question being posed is a useful result of a data science experiment.

1 point

Not true

True

3.
Question 3
Data products and apps are useful for creating impact from a data science experiment.

1 point

True

False

4.
Question 4
A negative outcome from a data science experiment would include

1 point

The data is ignored despite having clear evidence.

New knowledge is created.

A high impact app is made.

Policy is enacted based on new data.

## Data scientist toolbox

1.
Question 1
What are some examples of languages designed for data analysis?

1 point

Scalable computing infrastructure

The Python programming language

Literate programming tools

The Postgres programming language

The MongoDB programming language

2.
Question 2
Why are chat tools like Slack part of the data scientist’s toolbox?

1 point

Chat tools like Slack are good for communicating results to a broad audience.

Chat tools are good for downtime between long focused periods working on data science projects.

Data science tools are constantly updating, so keeping in touch with your data science colleagues is essential for success.

General purpose tools for chatting are not part of the data scientist’s toolbox.

Data scientists aren’t typically good at communication, so a chat tool lets introverts work with others.

3.
Question 3
Which of the following is not a tool in the data scientist’s toolbox?

1 point

Data programming languages like R

Chat tools like Slack

Databases like MongoDB

Data journalism websites like FiveThirtyEight

Help websites like Stack Overflow

4.
Question 4
A data scientist must know how to pull data from every database.

1 point

TRUE

FALSE

## Separating hype from value

1.
Question 1
Joe proposes a data science project applying neural networks to all the data stored in her company’s internet logs. Why might this project be hype?

1 point

The project isn’t designed to answer a concrete question.

There may be outliers, since some of the company’s users are power users.

Neural networks are known not to work on databases.

Internet log files are notoriously messy and hard to analyze.

2.
Question 2
This is an interesting article about the end of theory due to data collection:

http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory/

One quote from the story is:

“There is now a better way. Petabytes allow us to say: ‘Correlation is enough.’ We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.”

Which hype vs. reality question does this paragraph fail?

1 point

What is the question you are trying to answer with data?

Do you have the data to answer that question?

3.
Question 3
The Netflix prize offered participants \$1 million to improve Netflix’s algorithm by a specified amount. Several teams did. This is an interesting article on why the Netflix prize solution was never implemented.

http://www.wired.com/2012/04/netflix-prize-costs/

Which of the hype versus reality questions did this project fail?

1 point

Do you have the data to answer that question?

What is the question you are trying to answer with data?

4.
Question 4

Which of the hype versus reality questions did Google Flu trends fail?

1 point