Process Data from Dirty to Clean

Weekly challenge 1

1.

Question 1

Fill in the blank: In order to have a strong and thorough analysis, a data analyst must verify _____.

1 / 1 point

data replication

data manipulation

data integrity

data engineering

2.

Question 2

A financial analyst imports a dataset to their computer from a storage device. As it’s being imported, the connection is interrupted, which compromises the data. Which of the following processes caused the compromise?

1 / 1 point

Data transfer

Data manipulation

Data analysis

Data gathering

3.

Question 3

You are working for a global technology company. You have a dataset with the company’s total cell phone sales by country from 2015 to present. Based on the data you have, what questions are you able to answer?

1 / 1 point

What was the effect on sales when a new phone model was launched?

What was the effect on sales when new phone features were introduced? ?

What countries have the most cell phone sales in the past three years?

What are the mean cell phone sales for each country since 2010?

4.

Question 4

A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”

Link to template: June 2014 Invoices

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

June 2014 Invoices – Sheet1

The data analyst is asked to find the average estimate for Symteco over the past three years. What limitation of the data makes this impossible?

1 / 1 point

The data does not include Symteco.

The data uses the wrong currency.

The data does not include estimates.

The data is all from a single year.

5.

Question 5

A data analyst wants to predict the production output of a factory using a dataset that covers the years 2020 to 2021. In 2022, the factory implemented major labor and facility changes. What limitation of the data means that the analyst needs to get new data?

1 / 1 point

The data is outdated.

The data keeps updating.

The data is geographically limited.

The data is from only one source.

6.

Question 6

In the data analysis process, how does a sample relate to a population?

1 / 1 point

A sample is a duplicate selection of data that is taken from the population.

A sample is an average of all the data that represents the population.

A sample is an ideal example taken from a population.

A sample is a part of a population that is representative of the population.

7.

Question 7

A restaurant gathers data about a new dish by providing free samples to parties of six or more diners. What does this scenario describe?

1 / 1 point

Sampling bias

Unbiased sampling

Random sampling

Geographically limited sampling

8.

Question 8

Your manager asks you to analyze overall revenue so that your team can forecast next year’s sales. The dataset you have for analysis contains data from the past five years but only has six months of data per year. You find that the business objective does not align with the data. What should you convey to your manager? Select all that apply.?

1 / 1 point

There is sampling bias present.

There is insufficient data.

The data doesn’t demonstrate the whole picture.

The data contains duplicates.

Weekly challenge 2

1.

Question 1

Fill in the blank: In order to make your spreadsheet easier to analyze, you choose to alter the way cells appear if their values meet certain conditions. The spreadsheet tool that you use to do this is called _____.

1 / 1 point

cell querying

cell filtering

conditional formatting

conditional ranking

2.

Question 2

A delimiter is a character that indicates the beginning or end of a data item. The split text to columns tool uses a delimiter to accomplish what task?

1 / 1 point

To split duplicate substrings

To specify where to split a text string

To change the format of a column of text

To format a string to numeric

3.

Question 3

What describes syntax?

1 / 1 point

It is the function’s required information and its proper placement.

It is the function’s name and placement.

It is how the function can be used in a program.

It is the purpose of the function and its use.

4.

Question 4

You are working with the following selection of a spreadsheet:

N/AAB
1CustomerAddress
2Sally Stewart9912 School St. North Wales, PA 19454
3Lorenzo Price8621 Glendale Dr. Burlington, MA 01803
4Stella Moss372 W. Addison Street Brandon, FL 33510
5Paul Casey9069 E. Brickyard Road Chattanooga, TN 37421

In order to extract the five-digit postal code from Burlington, MA, what is the correct function?

1 / 1 point

=RIGHT(B3,5)

=RIGHT(5,B3)

=LEFT(B3,5)

=LEFT(5,B3)

Correct

5.

Question 5

A data analyst in a human resources department is working with the following selection of a spreadsheet:

N/AABCD
1Year HiredLast 4 of SS#DepartmentEmployee ID
220191192Marketing
320142683Operations
420201939Strategy
520093208Graphics

They want to create employee identification numbers (IDs) in column D. The IDs should include the year hired plus the last four digits of the employee’s Social Security Number (SS#). What function will create the ID 20093208 for the employee in row 5?

1 / 1 point

=CONCATENATE(A5*B5)

=CONCATENATE(A5+B5)

=CONCATENATE(A5!B5)

=CONCATENATE(A5,B5)

6.

Question 6

An analyst is cleaning a new dataset. They want to make sure the data contained from cell B2 through cell B100 does not contain a number smaller than 10. Which COUNTIF function syntax can be used to answer this question?

1 / 1 point

=COUNTIF(B2:B200, ”<=50”)

=COUNTIF(B2:B100,”<9″)

=COUNTIF(B2:B100,>50)

=COUNTIF(B2:B100,”>=10”)

7.

Question 7

A data analyst wants to search for a certain value in a column, then return a corresponding piece of information. Which function should they use?

1 / 1 point

VALUE

MATCH

FIND

VLOOKUP

8.

Question 8

Fill in the blank: Data mapping is the process of _____ fields from one data source to another.

1 / 1 point

matching

inserting

extracting

transforming

Weekly challenge 3

1.

Question 1

Fill in the blank: Data analysts usually use _____ to deal with very large datasets.

1 / 1 point

SQL

spreadsheets

word processors

CSV

2.

Question 2

What are some of the benefits of using SQL for analysis? Select all that apply.

0.5 / 1 point

SQL has better user management than spreadsheets.

SQL tracks changes across a team.

SQL interacts with database programs.

SQL can pull information from different database sources.

You didn’t select all the correct answers

3.

Question 3

A data analyst has added a massive table to their database on accident and needs to remove the table. What command can the analyst use to correct their mistake?

1 / 1 point

INSERT INTO

DROP TABLE IF EXISTS

DROP TABLE IF NOT EXISTS

REMOVE TABLE IF EXISTS

4.

Question 4

You are working with a database table that contains invoice data. The table includes a column for customer_id. You want to remove duplicate entries for customer_id and get a count of total customers in the database. 

You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column. 

NOTE: The three dots (…) indicate where to add the clause.

1

2

3

SELECT COUNT(DISTINCT customer_id )

FROM

invoice

What is the total number of customers in the database?

1 / 1 point

105

43

84

59

5.

Question 5

You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. The state names are abbreviated. You want to check for city names that are greater than 5 characters long.

You write the SQL query below. Add a LENGTH function that will return any city names that are greater than 5 characters long.

SELECT

*

FROM

customer

WHERE LENGTH(city) > 5

What is the the last name of the customer in row 1 of your query result?

NOTE: The query index starts at 1 not 0.

1 / 1 point

Gonçalves

Tremblay

Philips

Almeida

6.

Question 6

In SQL databases, what data type is the value 78.99 an example of?

1 / 1 point

Boolean

Integer

String

Float

7.

Question 7

In SQL databases, what function can be used to convert data from one datatype to another?

1 / 1 point

CAST

TRIM

LENGTH

SUBSTR

8.

Question 8

Fill in the blank: The _____ function can be used to join strings to create a new column.

1 / 1 point

CONCAT

COALESCE

CAST

TRIM

Correct

9.

Question 9

You are working with a database table that contains employee data. The table includes columns about employee location such as city, state, country, and postal_code. You use the SUBSTR function to retrieve the first 3 characters of each last_name, and use the AS command to store the result in a new column called new_last_name.

You write the SQL query below. Add a statement to your SQL query that will retrieve the first 3 characters of each last_name and store the result in a new column as new_last_name

NOTE: The three dots (…) indicate where to add the statement.

NOTE: SUBSTR takes in three arguments being column, starting_index, ending_index

SELECT

employee_id,

FROM

employee

ORDER BY

postal_code

What employee ID number is in row 8 of your query result?

NOTE: The query index starts at 1 not 0.

0 / 1 point

3

1

7

8

Weekly challenge 4

1.

Question 1

Fill in the blank: A data analyst finishes cleaning their data. The next step in the process is reporting and ____.

1 / 1 point

verification

manipulation

processing

replacing

2.

Question 2

As a data analyst, you will need to keep the big picture in mind throughout any project when verifying data cleaning. What must the analyst do to take a big picture view of the project? Select all that apply.

1 / 1 point

Consider the data

Consider the reporting

Consider the business problem

Consider the goal

3.

Question 3

During the verification process, you find that you missed a few leading spaces during data cleaning. What function can you use to eliminate these spaces?

1 / 1 point

TRIM

CUT

CROP

TIDY

4.

Question 4

Your manager points out an error in a product ID number in your dataset. The Product IDs can be numbers like 42 or text like “CAD-425”. Using a pivot table, what function can you use to find how many times this error occurs in the dataset?

0 / 1 point

COUNTA

CASE

CHECK

COUNT

Incorrect

Please review the video on the final step of data cleaning.

5.

Question 5

Which SQL tool considers one or more conditions, then returns a value as soon as a condition is met?

1 / 1 point

WHEN

ELSE

CASE

THEN

6.

Question 6

A data analyst uses a changelog to record how the data evolves while cleaning their data. What data cleaning best practice does this describe?

1 / 1 point

Illumination

Documentation

Examination

Disclosure

7.

Question 7

A data analyst is in the verification process and needs to verify the modifications that they have made to the data. What could the analyst reference to find the changes they made throughout data cleaning?

1 / 1 point

Spreadsheet

Notepad

Changelog

Metadata

8.

Question 8

You start a complex project that will take more than a year to complete. You need to document modifications made to your queries throughout the project. What is the correct way to store these modifications?

1 / 1 point

Creating a spreadsheet

Creating a changelog

Creating a notepad

Visualizing data

Course challenge

1.

Question 1

Scenario 1, questions 1-5

You are a data analyst at a small analytics company. Your company is hosting a project kick-off meeting with a new client, Meer-Kitty Interior Design. The agenda includes reviewing their goals for the year, answering any questions, and discussing their available data.

Before the meeting you review the About Us tab on their website and their business plan, linked below:

Meer-Kitty Interior Design About Us Page.pdf

PDF File

Meer-Kitty Interior Design Business Plan.pdf

PDF File

Meer-Kitty Interior Design has two goals. They want to expand their online audience, which means getting their company and brand known by as many people as possible. They also want to launch a line of high-quality indoor paint to be sold in-store and online. You decide to consider the data about indoor paint first.

To use the template for the survey feedback, click the link below and select “Use Template.” 

Link to template: Kitty Survey Feedback

OR

If you don’t have a Google account, download the file directly from the attachment below.

Kitty Survey Feedback – Meer-Kitty survey feedback

CSV File

When you refer to the Meer-Kitty survey feedback tab, you are pleased to find that the available data is aligned to the business objective. However, you do some research about confidence level for this type of survey and learn that you need at least 120 unique responses for the survey results to be useful. Therefore, the dataset has two limitations: First, there are only 40 responses; second, a Meer-Kitty superfan, User 588, completed the survey 11 times.

As the survey has too few responses and numerous duplicates that are skewing results, you decide to repeat the survey in order to create a new, improved dataset. What is your first step?

1 / 1 point

Write new, improved survey questions.

Find a survey tool that only allows someone to complete the survey once.

Delete all of the data from the current, skewed survey.

Talk with stakeholders, explain the new timeline, and ask for approval.

2.

Question 2

Scenario 1 continued

During the meeting, you also learn that Meer-Kitty videos are hosted on their website. For each product offered, there is an accompanying video for customers to learn more. So, more views for a video suggests greater consumer interest.

Your goal is to identify which videos are most popular, so Meer-Kitty knows what topics to explore in the future. Unfortunately, Meer-Kitty has just three months of data available because they only recently launched the videos on their site.

Without enough data to identify long-term trends about the video subjects that people prefer, what are your available options? Select all that apply.

0.75 / 1 point

Watch the videos and use your gut instinct to identify which are most successful.

Ask to wait for more data and provide Meer-Kitty with an updated timeline.

Talk with Meer-Kitty stakeholders and ask to adjust the objective.

Move ahead with the data you have to determine the top video subjects.

3.

Question 3

Scenario 1 continued

Now that you’ve identified some limitations with Meer-Kitty’s data, you want to communicate your concerns to stakeholders. In addition to insufficient video trend data, your main concern with the indoor paint survey is that the data isn’t representative of the population as a whole.

Clearly, one particular respondent, the superfan, is overrepresented. What does this situation describe?

1 / 1 point

Confidence level

Sampling bias

Statistical significance

Margin of error

This situation describes sampling bias. Sampling bias occurs when a sample isn’t representative of the population as a whole.

4.

Question 4

Scenario 1 continued

The stakeholders understand your concerns and agree to repeat the indoor paint survey. In a few weeks, you have a much better dataset with more than 150 responses and no duplicates.

To use the template for the survey feedback, click the link below and select “Use Template.” 

Link to template: Kitty Survey Feedback

OR

If you don’t have a Google account, download the file directly from the attachment below.

Kitty Survey Feedback – New Meer-Kitty survey feedback

CSV File

If you are using the template, please refer to the New Meer-Kitty survey feedback tab. You notice that questions 4 and 5 are dependent on the respondent’s answer to question 3. So, you need to determine how many people answered Yes to question 3, then compare that to responses to questions 4 and 5. That way, you will know if questions 4 and 5 have any nulls.

You decide to use a spreadsheet tool that changes how cells appear when they meet a certain value — in this case, the word Yes. You are using VLOOKUP.

1 / 1 point

True

False

5.

Question 5

Scenario 1, continued

You have finished cleaning the data to ensure it is complete, correct, and relevant to the problem you’re trying to solve. Then, you complete the verification and reporting processes to share the details of your data-cleaning effort with your team.

Your team notes one aspect of data cleaning that would help improve the dataset. They point out that the new survey also has a new question in Column G: “What are your favorite indoor paint colors?” This was a free-response question, so respondents typed in their answers. Some people included multiple different colors of paint. In order to determine which colors are most popular, it will be necessary to put each color in its own cell.

You use a spreadsheet function to divide the text strings in Column G around the commas and put each fragment into a new, separate cell. In this example, what are the commas called?

1 / 1 point

Delimiters

MIDs

Substrings

Partitions

6.

Question 6

Scenario 2, questions 6-10

You’ve completed this program and are interviewing for a junior data scientist position. The job is at B.Spoke Market Research, a company that analyzes market conditions using customer surveys and other research methods. The detailed job description can be found below:

C4 B.Spoke Market Research Job Description.pdf

PDF File

So far, you’ve had a phone interview with a recruiter and you’ve secured a second interview with the B.Spoke team. The recruiter’s email can be found below:

C4 S2 Email from Recruiter.pdf

PDF File

You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Jodie Choi, the data science lead. After welcoming you, the behavioral interview begins.

For your first question, your interviewer wants to learn about your experience with spreadsheets. She says: Sometimes the team needs data that is stored in different spreadsheets. So, we use a spreadsheet function to find the information we need.

There is a spreadsheet function that searches for a value in the first column of a given range and returns the value of a specified cell in the row in which it is found. It is called SEARCH.

1 / 1 point

True

False

The VLOOKUP function searches for a certain value in a column to return a corresponding piece of information.

7.

Question 7

Scenario 2, continued

Next, your interviewer wants to know more about your understanding of tools that work in both spreadsheets and SQL. She explains that the data her team receives from customer surveys sometimes has many duplicate entries.

She says: Spreadsheets have a great tool for that called remove duplicates. Does this mean the team has to remove the duplicate data in a spreadsheet before transferring data to our database?

1 / 1 point

Yes

No

8.

Question 8

Scenario 2, continued

Now, your interviewer explains that the data team usually works with very large amounts of customer survey data. After receiving the data, they import it into a SQL table. But sometimes, the new dataset imports incorrectly and they need to change the format.

She asks: Is there a SQL function that can convert data types such as currency, dates, and times in a SQL table?

1 / 1 point

Yes, data types including currency, dates, and times can be converted.

No, only currency can be converted.

9.

Question 9

Scenario 2, continued

Next, your interviewer explains that one of their clients is an online retailer that needs to create product numbers for a vast inventory. Her team does this by combining the text strings for product number, manufacturing date, and color.

She asks: If you encountered a situation where you wanted to add strings together to create new text strings, which SQL function would you use?

1 / 1 point

CONCAT

COMBINE

COALESCE

CREATE

10.

Question 10

Scenario 2, continued

For your final question, your interviewer explains that her team often comes across data with extra leading or trailing spaces.

She asks: Which function would enable you to eliminate those extra spaces? You respond: To eliminate extra spaces for consistency, use the TRIM function.

1 / 1 point

True

False

Trả lời

Email của bạn sẽ không được hiển thị công khai. Các trường bắt buộc được đánh dấu *