Data Analysis with R Programming

Weekly challenge 1

1.

Question 1

How do data analysts refer to the words and symbols they use to write instructions for computers?

1 / 1 point

Syntax languages

Programming languages

Variable languages

Code languages

2.

Question 2

Using a programming language can help you with which aspects of data analysis? Select all that apply.

0.75 / 1 point

Clean your data

Transform your data

Visualize your data

Ask the right questions about your data

3.

Question 3

Which of the following are benefits of open-source code? Select all that apply.

1 / 1 point

Anyone can use the code for free.

Anyone can pay a fee for access to the code.

Anyone can fix bugs in the code.

Anyone can create an add-on package for the code.

4.

Question 4

Which of the following statements about the R programming language are correct? Select all that apply.

1 / 1 point

It can create world-class visualizations

It can process large amounts of data

It makes analysts spend more time cleaning data and less time analyzing

It relies on spreadsheet interfaces to clean and manipulate data

5.

Question 5

What is a benefit of using the R programming language for data analysis? Select all that apply.

0.75 / 1 point

It is a general-purpose programming language.

It is the most popular machine-learning language.

It can create world-class visualizations.

It can work with large amounts of data

6.

Question 6

RStudio’s integrated development environment lets you perform which of the following actions? Select all that apply.

1 / 1 point

Import data from spreadsheets

Stream online videos

Install R packages

Create data visualizations

7.

Question 7

A data analyst wants to write R code where they can access it again after they close their current session in RStudio. Where should they write their code?

0 / 1 point

R console

Files tab

Source editor

History tab

8.

Question 8

In RStudio, where can you find the output of the visualizations produced by your analysis?

1 / 1 point

Plots tab

R console

Environment tab

Files tab

Weekly challenge 2

1.

Question 1

Which of the following are examples of variable names that can be used in R?

1 / 1 point

value_2

value(2)

value-2

value%2
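One way to check the candidate names above is sketched here in base R. It relies on make.names(), which rewrites a string into a syntactically valid name; the variable names candidates and is_valid are assumptions made for this example.

```r
# Candidate names from the question.
candidates <- c("value_2", "value(2)", "value-2", "value%2")

# make.names() rewrites a string into a syntactically valid R name.
# A string that comes back unchanged was already valid.
is_valid <- candidates == make.names(candidates)
names(is_valid) <- candidates

print(is_valid)
```

Only the name that uses an underscore passes the check; parentheses, hyphens, and percent signs are parsed as operators or function calls rather than as part of a name.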

2.

Question 2

You want to create a vector with the values 12, 23, 51, in that exact order. After specifying the variable, what R code chunk lets you create the vector?

1 / 1 point

v(12, 23, 51)

c(12, 23, 51)

v(51, 23, 12)

c(51, 23, 12)
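As a quick illustration of the vector question above, the sketch below builds the vector with c() in base R. The variable names my_vector and reversed_vector are assumptions chosen for the example.

```r
# Combine the three values, in order, into a numeric vector.
my_vector <- c(12, 23, 51)

# Element order follows argument order, so reversing the arguments
# produces a different vector.
reversed_vector <- c(51, 23, 12)

print(my_vector)
print(identical(my_vector, reversed_vector))
```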

3.

Question 3

A data analyst finds the code mdy(10211020) in an R script. What is the year of the date that is created?

1 / 1 point

2120

1102

1020

1021
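mdy() comes from the lubridate package and reads its input as month, day, then year. A rough base R stand-in is sketched below, assuming the digits are supplied as a string; as.Date() with the format "%m%d%Y" is a substitute used for illustration, not the function the question refers to.

```r
# Parse the eight digits as month (10), day (21), and four-digit year (1020).
parsed_date <- as.Date("10211020", format = "%m%d%Y")

# Extract each component as text to see how the digits were split.
date_parts <- c(
  month = format(parsed_date, "%m"),
  day   = format(parsed_date, "%d"),
  year  = format(parsed_date, "%Y")
)

print(date_parts)
```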

4.

Question 4

A data analyst wants to assign the value 50 to the variable daily_dosage. Which of the following types of operators will they need to use in the code?

1 / 1 point

Assignment

Arithmetic

Relational

Logical

5.

Question 5

Which of the following is a best practice when naming variables in R?

1 / 1 point

Use lowercase for variable names.

Variable names should be verbs.

Variable names should start with special characters.

Use a space character to separate words in variable names.

6.

Question 6

What type of packages are automatically installed and loaded to use in RStudio when you start your first programming session?

1 / 1 point

Recommended packages

CRAN packages

Base packages

Community packages

7.

Question 7

What is the relationship between RStudio and CRAN?

1 / 1 point

RStudio installs packages from CRAN that are not in Base R.

CRAN contains all of the data that RStudio users need for analysis.

CRAN creates visualizations based on an analyst’s programming in RStudio.

RStudio and CRAN are both environments where data analysts can program using R code.

8.

Question 8

A data analyst previously created a series of nested functions that carry out multiple operations on some data in R. The analyst wants to complete the same operations but make the code easier to understand for their stakeholders. Which of the following can the analyst use to accomplish this?

0 / 1 point

Pipe

Argument

Comment

Vector
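To see how a pipe can express the same steps as nested functions, here is a minimal sketch. It uses base R's native pipe |> (available in R 4.1 or later) rather than the %>% operator used elsewhere in these questions, and the values vector is made up for the example.

```r
# Hypothetical measurements used only for this illustration.
values <- c(4, 9, 16, 25)

# Nested form: read from the inside out (sqrt, then mean, then round).
nested_result <- round(mean(sqrt(values)), 1)

# Piped form: the same steps, read top to bottom.
# |> is base R's native pipe; %>% behaves similarly in this case.
piped_result <- values |>
  sqrt() |>
  mean() |>
  round(1)

print(identical(nested_result, piped_result))
```

Both forms compute the same value; the piped version lists the operations in the order they run.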

Weekly challenge 3

1.

Question 1

What scenarios would prevent you from being able to use a tibble?

0.75 / 1 point

You need to create column names

You need to create row names

You need to store numerical data

You need to change the data types of inputs

2.

Question 2

A data analyst is exploring their data to get more familiar with it. They want a preview of just the first six rows to get a better idea of how the data frame is laid out. What function should they use?

1 / 1 point

print()

colnames()

preview()

head()

3.

Question 3

You are working with the ToothGrowth dataset. You want to use the skim_without_charts() function to get a comprehensive view of the dataset. Write the code chunk that will give you this view. 

skim_without_charts(ToothGrowth)

summary(ToothGrowth$len)

mean(ToothGrowth$len)

What is the average value of the len column?

1 / 1 point

18.8

7.65

4.2

13.1

4.

Question 4

You have a data frame named employees with a column named last_name. What will the name of the employees column be in the results of the function rename_with(employees, toupper)?

1 / 1 point

last_name

LAST_NAME

Last_Name

Last_name

5.

Question 5

A data analyst is working with the penguins dataset. The variable island represents the island on which the sample was collected. The analyst wants to create a data frame that excludes records from the island named “Torgersen”. What code chunk will allow them to create this data frame?

1 / 1 point

penguins %>% filter(island == "Torgersen")

penguins %>% filter(island != "Torgersen")

penguins %>% filter(island = "Torgersen")

penguins %>% filter(island <> "Torgersen")

6.

Question 6

You are working with the penguins dataset. You want to use the summarize() and min() functions to find the minimum value for the variable flipper_length_mm. You write the following code chunk:

penguins %>%
  drop_na() %>%
  group_by(species, sex) %>%
  summarize(mean(body_mass_g))

penguins %>%
  drop_na() %>%
  group_by(species, sex) %>%
  summarize(min_flipper_length = min(flipper_length_mm)) %>%
  arrange(min_flipper_length)

What species and sex have the lowest minimum flipper length in mm?

0 / 1 point

Chinstrap males

Gentoo females

Adelie females

Gentoo males

7.

Question 7

A data analyst is working with a data frame called zoo_records. They want to create a new column named is_large_animal that signifies if an animal has a weight of more than 199 kilograms. What code chunk lets the analyst create the is_large_animal column?

1 / 1 point

zoo_records %>% mutate(weight > 199 = is_large_animal)

zoo_records %>% mutate(is_large_animal == weight > 199)

zoo_records %>% mutate(weight > 199 <- is_large_animal)

zoo_records %>% mutate(is_large_animal = weight > 199)

8.

Question 8

A data analyst is working with a data frame named stores. It has separate columns for city (city) and state (state). The analyst wants to combine the two columns into a single column named location, with the city and state separated by a comma. What code chunk lets the analyst create the location column?

1 / 1 point

unite(stores, "location", city, state, sep=",")

unite(stores, city, state, sep=",")

unite(stores, "location", city, sep=",")

unite(stores, "location", city, state)

9.

Question 9

In R, which statistical measure can help you understand the spread of values in a dataset and describe how far each value is from the mean?

1 / 1 point

Average

Standard deviation

Correlation

Maximum
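The standard deviation can be worked through by hand and compared with base R's sd(). The sketch below uses made-up sample values; the names x, manual_sd, and builtin_sd are assumptions for this example.

```r
# Hypothetical sample values for illustration.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Manual calculation: square root of the sum of squared deviations
# from the mean, divided by n - 1 (the sample standard deviation).
deviations <- x - mean(x)
manual_sd <- sqrt(sum(deviations^2) / (length(x) - 1))

# sd() in base R uses the same n - 1 denominator.
builtin_sd <- sd(x)

print(all.equal(manual_sd, builtin_sd))
```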

10.

Question 10

A data analyst creates two different predictive models for the same dataset. They use the bias() function on both models. The first model has a bias of -40. The second model has a bias of 1. Which model is less biased?

1 / 1 point

The first model

It can’t be determined from this information

The second model
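The comparison in this question comes down to which bias is closer to zero. The sketch below assumes bias means the average difference between predicted and actual values; mean_bias is a hypothetical helper written for this example, not the packaged bias() function, and all of the numbers are made up.

```r
# Hypothetical helper: average difference between predictions and actuals.
# This is an assumed definition for illustration, not the packaged function.
mean_bias <- function(predicted, actual) {
  mean(predicted - actual)
}

# Made-up actual values and two sets of made-up predictions.
actual  <- c(100, 110, 120, 130)
model_a <- actual - 40   # consistently predicts 40 too low
model_b <- actual + 1    # consistently predicts 1 too high

bias_a <- mean_bias(model_a, actual)
bias_b <- mean_bias(model_b, actual)

# The model whose bias is closer to zero is the less biased one.
less_biased <- if (abs(bias_a) < abs(bias_b)) "model_a" else "model_b"

print(c(bias_a = bias_a, bias_b = bias_b))
print(less_biased)
```

The sign only shows the direction of the error, so the comparison uses absolute values.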

Weekly challenge 4

1.

Question 1

Which of the following are operations you can perform in ggplot2? Select all that apply.

1 / 1 point

Add a title and subtitle to your plot

Automatically clean data before creating a plot

Change the colors and dimensions of your plot

Create scatterplots and bar charts

2.

Question 2

Which ggplot function is used to define the mappings of variables to visual representations of data?

1 / 1 point

annotate()

mapping()

ggplot()

aes()

3.

Question 3

A data analyst creates a plot using the following code chunk:

ggplot(data = penguins) +

geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

Which of the following represents a function in the code chunk? Select all that apply.

0.75 / 1 point

The data function

The geom_point function

The ggplot function

The aes function

4.

Question 4

Which code snippet will make all of the bars in the plot purple?

1 / 1 point

ggplot(data = buildings) +

geom_bar(mapping = aes(x = construction_year), color = "purple")

5.

Question 5

A data analyst is working with the following plot and gets an error caused by a bug. What is the cause of the bug?

ggplot(data = penguins)

+ geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

1 / 1 point

A function name needs to be capitalized.

A missing closing parenthesis needs to be added.

The plus should be at the end of the first line.

The code uses a plus sign instead of a pipe.

6.

Question 6

You are working with the penguins dataset. You create a scatterplot with the following code chunk: 

ggplot(data = penguins) +

geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

You want to highlight the different penguin species on your plot. Add a code chunk to the second line of code to map the aesthetic shape to the variable species.

NOTE: the three dots (…) indicate where to add the code chunk. You may need to scroll in order to find the dots.

geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, shape = species))

Which penguin species does your visualization display?

0 / 1 point

Adelie, Chinstrap, Emperor

Emperor, Chinstrap, Gentoo

Adelie, Chinstrap, Gentoo

Adelie, Gentoo, Macaroni

7.

Question 7

A data analyst creates a plot with the following code chunk:

ggplot(data = penguins) +

geom_jitter(mapping = aes(x = flipper_length_mm, y = body_mass_g))

What does the geom_jitter() function do to the points in the plot?

1 / 1 point

Decrease the size of each point in the plot

Adds a small amount of random noise to each point in the plot

Adds a small amount of random shapes at each point in the plot

Adds random colors to each point in the plot

8.

Question 8

You are working with the diamonds dataset. You create a bar chart with the following code:

ggplot(data = diamonds) +

geom_bar(mapping = aes(x = color, fill = cut)) +

You want to use the facet_wrap() function to display subsets of your data. Add the code chunk that lets you facet your plot based on the variable color.

facet_wrap(~color)

How many subplots does your visualization show?

1 / 1 point

8

9

6

7

9.

Question 9

Fill in the blank: You can use the _____ function to put a text label on your plot to call out specific data points.

1 / 1 point

facet_grid()

geom_smooth()

ggplot()

annotate()

10.

Question 10

In RStudio, what default options does the Export functionality of the Plots tab give for exporting plots?

1 / 1 point

Slideshow

HTML

Image

PDF

Weekly challenge 5

1.

Question 1

A data analyst wants to create documentation for their cleaning process so other analysts on their team can recreate this process. What tool can help them create this shareable report?

1 / 1 point

Code chunks

Dashboards

R Markdown

Inline code

2.

Question 2

A data analyst wants to export their R Markdown notebook as a text document. What are the text document formats they can use to share their R Markdown notebook? Select all that apply.

1 / 1 point

Word

HTML

PDF

Notepad

3.

Question 3

A data analyst wants to change their header to be one font size smaller. What should they add to their markdown syntax?

1 / 1 point

Exclamation mark

Backtick

Double space

Hashtag

4.

Question 4

A data analyst wants to include a line of code directly in their .rmd file in order to explain their process more clearly. What is this code called?

1 / 1 point

Documented

YAML

Markdown

Inline code

5.

Question 5

A data analyst wants to add a bulleted list to their R Markdown document. What symbol can they type to create this formatting?

1 / 1 point

Asterisks

Brackets

Delimiters

Hashtags

6.

Question 6

A data analyst works with an .rmd file in RStudio and wants the ability to quickly find a code chunk using the label “analysis”. Which code example would allow the analyst to quickly access the code chunk using this label?

1 / 1 point

“`{analysis r}

“`analysis{r}

“`{r} analysis

“`{r analysis}

7.

Question 7

What does the delimiter (three hyphens) indicate in an R Markdown notebook?

1 / 1 point

Italic text

YAML metadata

Code chunk

Bold text

8.

Question 8

What type of export document should you use while you are working and don’t need to worry about adding page breaks in the correct places?

1 / 1 point

Word

PDF

YAML

HTML

Course challenge

1.

Question 1

Scenario 1, questions 1-7

As part of the data science team at Gourmet Analytics, you use data analytics to advise companies in the food industry. You clean, organize, and visualize data to arrive at insights that will benefit your clients. As a member of a collaborative team, sharing your analysis with others is an important part of your job.

Your current client is Chocolate and Tea, an up-and-coming chain of cafes.

Image of a creatively designed sign titled chocolate and tea

The eatery combines an extensive menu of fine teas with chocolate bars from around the world. Their diverse selection includes everything from plantain milk chocolate, to tangerine white chocolate, to dark chocolate with pistachio and fig. The encyclopedic list of chocolate bars is the basis of Chocolate and Tea’s brand appeal. Chocolate bar sales are the main driver of revenue.

Chocolate and Tea aims to serve chocolate bars that are highly rated by professional critics. They also continually adjust the menu to make sure it reflects the global diversity of chocolate production. The management team regularly updates the chocolate bar list in order to align with the latest ratings and to ensure that the list contains bars from a variety of countries.

They’ve asked you to collect and analyze data on the latest chocolate ratings. In particular, they’d like to know which countries produce the highest-rated bars of super dark chocolate (a high percentage of cocoa). This data will help them create their next chocolate bar menu.

Your team has received a dataset that features the latest ratings for thousands of chocolates from around the world. Click here to access the dataset. Given the data and the nature of the work you will do for your client, your team agrees to use R for this project.

Your supervisor asks you to write a short summary of the benefits of using R for the project. Which of the following benefits would you include in your summary? Select all that apply.

1 / 1 point

Create high-quality data visualizations

Define a problem and ask the right questions

Easily reproduce and share the analysis

Quickly process lots of data

2.

Question 2

Scenario 1, continued

Before you begin working with your data, you need to import it and save it as a data frame. To get started, you open your RStudio workspace and load all the necessary libraries and packages. You upload a .csv file containing the data to RStudio and store it in a project folder named flavors_of_cacao.csv.

You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is flavors_df and the .csv file is in the working directory. What code chunk lets you create the data frame?

1 / 1 point

flavors_df + read_csv("flavors_of_cacao.csv")

flavors_df <- read_csv("flavors_of_cacao.csv")

read_csv(flavors_df <- "flavors_of_cacao.csv")

read_csv("flavors_of_cacao.csv") <- flavors_df

3.

Question 3

Scenario 1, continued

Now that you’ve created a data frame, you want to find out more about how the data is organized. The data frame has hundreds of rows and lots of columns.

Assume the name of your data frame is flavors_df. What code chunk lets you review the structure of the data frame? 

1 / 1 point

filter(flavors_df)

select(flavors_df)

str(flavors_df)

summarize(flavors_df)

4.

Question 4

Scenario 1, continued

Next, you begin to clean your data. When you check out the column headings in your data frame you notice that the first column is named Company...Maker.if.known. (Note: The period after known is part of the variable name.) For the sake of clarity and consistency, you decide to rename this column Maker (without a period at the end).

Assume the first part of your code chunk is:

flavors_df %>%

What code chunk do you add to change the column name?

1 / 1 point

rename(Maker %<% Company...Maker.if.known.)

rename(Company...Maker.if.known %<% Maker)

rename(Company...Maker.if.known. = Maker)

rename(Maker = Company...Maker.if.known.)

5.

Question 5

After previewing and cleaning your data, you determine what variables are most relevant to your analysis. Your main focus is on Rating, Cocoa.Percent, and Bean.Type. You decide to use the select() function to create a new data frame with only these three variables.

Assume the first part of your code is: 

trimmed_flavors_df <- flavors_df %>% 

Add the code chunk that lets you select the three variables.

select(Rating, Cocoa.Percent, Bean.Type)

What bean type appears in row 6 of your tibble?

1 / 1 point

Trinitario

Criollo

Forastero

Beniano

6.

Question 6

Next, you select the basic statistics that can help your team better understand the ratings system in your data. 

Assume the first part of your code is:

trimmed_flavors_df %>%

You want to use the summarize() and sd() functions to find the standard deviation of the rating for your data. Add the code chunk that lets you find the standard deviation for the variable Rating.

  summarize(sd_rating = sd(Rating))

What is the standard deviation of the rating?

1 / 1 point

0.2951794

0.4780624

0.3720475

0.4458434

Correct

You add the code chunk summarize(sd(Rating)) to find the standard deviation for the variable Rating. The correct code is trimmed_flavors_df %>% summarize(sd(Rating)). In this code chunk:

  • The summarize() function lets you display summary statistics. You can use the summarize() function in combination with other functions such as mean(), max(), and min() to calculate specific statistics. 
  • In this case, you use sd() to calculate the standard deviation statistic for the variable Rating.

The standard deviation of the rating is 0.4780624.

7.

Question 7

After completing your analysis of the rating system, you determine that any rating greater than or equal to 3.75 points can be considered a high rating. You also know that Chocolate and Tea considers a bar to be super dark chocolate if the bar’s cocoa percentage is greater than or equal to 80%. You decide to create a new data frame to find out which chocolate bars meet these two conditions. 

Assume the first part of your code is:

best_trimmed_flavors_df <- trimmed_flavors_df %>%

You want to apply the filter() function to the variables Cocoa.Percent and Rating. Add the code chunk that lets you filter the new data frame for chocolate bars that contain at least 80% cocoa and have a rating of at least 3.75 points.

filter(Cocoa.Percent >= 80, Rating >= 3.75)

How many rows does your tibble include?

1 / 1 point

8

12

20

22

Correct

The code chunk filter(Cocoa.Percent >= 80, Rating >= 3.75) lets you filter the data frame for chocolate bars that contain at least 80% cocoa and have a rating of at least 3.75 points. The correct code is best_trimmed_flavors_df <- trimmed_flavors_df %>% filter(Cocoa.Percent >= 80, Rating >= 3.75). In this code chunk: 

  • The filter() function lets you filter your data frame based on specific criteria. 
  • Cocoa.Percent and Rating refer to the variables you want to filter. 
  • The >= operator signifies “greater than or equal to.” 
  • The new data frame keeps only the rows where Cocoa.Percent is greater than or equal to 80 and Rating is greater than or equal to 3.75.

Your tibble includes 8 rows.
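The row-filtering logic described above can also be sketched in base R without dplyr. The data frame below contains made-up rows, not values from flavors_of_cacao.csv, and the names example_df and best_example_df are assumptions for this example.

```r
# Made-up rows for illustration; not taken from flavors_of_cacao.csv.
example_df <- data.frame(
  Cocoa.Percent = c(70, 80, 85, 90, 75),
  Rating        = c(4.00, 3.75, 3.50, 4.00, 3.75)
)

# Keep rows that satisfy both conditions, as the comma in filter() does.
best_example_df <- example_df[
  example_df$Cocoa.Percent >= 80 & example_df$Rating >= 3.75,
]

print(best_example_df)
print(nrow(best_example_df))
```

A row must meet both conditions to be kept, so a bar with 85% cocoa and a 3.50 rating is dropped.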

8.

Question 8

Now that you’ve cleaned and organized your data, you’re ready to create some useful data visualizations. Your team assigns you the task of creating a series of visualizations based on requests from the Chocolate and Tea management team. You decide to use ggplot2 to create your visuals. 

Assume your first line of code is:

ggplot(data = best_trimmed_flavors_df) +

You want to use the geom_bar() function to create a bar chart. Add the code chunk that lets you create a bar chart with the variable Company.Location on the x-axis.

  geom_bar(mapping = aes(x = Company.Location))

How many bars does your bar chart display?

1 / 1 point

6

5

4

3

You add the code chunk geom_bar(mapping = aes(x = Company.Location)) to create a bar chart with the variable Company.Location on the x-axis. The correct code is ggplot(data = best_trimmed_flavors_df) + geom_bar(mapping = aes(x = Company.Location)). In this code chunk:

  • geom_bar() is the geom function that uses bars to create a bar chart. 
  • Inside the parentheses of the aes() function, the code x = Company.Location maps the x aesthetic to the variable Company.Location. 
  • Company.Location will appear on the x-axis of the plot. 
  • By default, R will put a count of the variable Company.Location on the y-axis.

Your bar chart displays 5 bars.

9.

Question 9

Your bar chart reveals the locations that produce the highest rated chocolate bars. To get a better idea of the specific rating for each location, you’d like to highlight each bar.

Assume that you are working with the following code:

ggplot(data = best_trimmed_flavors_df) +

  geom_bar(mapping = aes(x = Company.Location))

Add a code chunk to the second line of code to map the aesthetic fill to the variable Rating.

NOTE: the three dots (…) indicate where to add the code chunk.

geom_bar(mapping = aes(x = Company.Location, fill = Rating))

According to your bar chart, which two company locations produce the highest rated chocolate bars?

1 / 1 point

Amsterdam and France

Canada and France

Scotland and U.S.A.

Scotland and Canada

You add the code chunk fill = Rating to the second line of code to map the aesthetic fill to the variable Rating. The correct code is ggplot(data = best_trimmed_flavors_df) + geom_bar(mapping = aes(x = Company.Location, fill = Rating)). In this code chunk: 

  • Inside the parentheses of the aes() function, after the comma that follows x = Company.Location, write the aesthetic (fill), then an equals sign, then the variable (Rating).
  • The specific rating of each location will appear as a specific color inside each bar of your bar chart.

On your visualization, the legend titled “Rating” shows the color coding for the variable Rating. Lighter blues correspond to higher ratings and darker blues correspond to lower ratings.

According to your bar chart, the two company locations that produce the highest rated chocolate bars are Canada and France.

10.

Question 10

Scenario 2, continued

A teammate creates a new plot based on the chocolate bar data. The teammate asks you to make some revisions to their code.

Assume your teammate shares the following code chunk:

ggplot(data = best_trimmed_flavors_df) +

geom_bar(mapping = aes(x = Rating)) +

What code chunk do you add to the third line to create wrap around facets of the variable Rating?

0 / 1 point

facet_wrap(Rating~)

facet_wrap(~Rating)

facet_wrap(Rating)

facet(~Rating)

11.

Question 11

Scenario 2, continued

Your team has created some basic visualizations to explore different aspects of the chocolate bar data. You’ve volunteered to add titles to the plots. You begin with a scatterplot.

Assume the first part of your code chunk is:

ggplot(data = trimmed_flavors_df) +

geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +

What code chunk do you add to the third line to add the title Best Chocolates to your plot?

1 / 1 point

labs("Best Chocolates" = title)

labs("Best Chocolates")

labs(title = "Best Chocolates")

labs(title <- "Best Chocolates")

You write the code chunk labs(title = "Best Chocolates"). In this code chunk:

  • labs() is the function that lets you add a title to your plot.
  • In the parentheses of the labs() function, write the word title, then an equals sign, then the specific text of the title in quotation marks ("Best Chocolates").

12.

Question 12

Scenario 2, continued

Next, you create a new scatterplot to explore the relationship between different variables. You want to save your plot so you can access it later on. You know that the ggsave() function defaults to saving the last plot that you displayed in RStudio, so you’re ready to write the code to save your scatterplot.

Assume your first two lines of code are:

ggplot(data = trimmed_flavors_df) +

geom_point(mapping = aes(x = Cocoa.Percent, y = Rating))

What code chunk do you add to the third line to save your plot as a png file with chocolate as the file name?

1 / 1 point

ggsave("png.chocolate")

ggsave("chocolate")

ggsave(chocolate.png)

ggsave("chocolate.png")

Correct

You write the code chunk ggsave("chocolate.png"). In this code chunk:

  • Inside the parentheses of the ggsave() function, type a quotation mark followed by the file name (chocolate), then a period, then the type of file format (png), then a closing quotation mark.

13.

Question 13

Scenario 2, continued

As a final step in the analysis process, you create a report to document and share your work. Before you share your work with the management team at Chocolate and Tea, you are going to meet with your team and get feedback. Your team wants the documentation to include all your code and display all your visualizations.

Fill in the blank: You want to record and share every step of your analysis, let teammates run your code, and display your visualizations. You decide to create _____ to document your work.

1 / 1 point

an R Markdown notebook

a data frame

a database

a spreadsheet
