Think Summer: Project 4 — 2023
Submission
Students need to submit the following file by 10:00PM EST through Gradescope inside Brightspace.
-
A Jupyter notebook (a
.ipynbfile).
We’ve provided you with a template notebook for you to use. Please carefully read this section to get started.
|
When you are finished with the project, please make sure to run every cell in the notebook prior to submitting. To do this click . Next, to export your notebook (your |
Questions
Question 1
For the show The Handmaid’s Tale (title_id is tt5834204), there are 4 seasons listed in the IMDB database. Find the average rating of each of the four seasons. Hint: Use AVG for find the average, and use GROUP BY the season_number.
-
SQL used to solve this problem. (2 pts)
-
Output from running SQL. (1 pt)
Question 2
Identify the six most popular episodes of the show Grey’s Anatomy (where "popular" denotes a high rating).
-
SQL used to solve this problem. (2 pts)
-
Output from running SQL. (1 pt)
Question 3
Make a dotchart in R showing the results of the previous question.
Hint: You can use your work from SQL, and export the results to a dataframe called myDF in R. Then you can use something like:
# use a dbGetQuery here, to import the SQL results to R, and then
myresults <- myDF$rating
names(myresults) <- myDF$primary_title
dotchart(myresults)
-
R used to solve this problem. (2 pts)
-
Output from running R. (1 pt)
Question 4
Make a dotchart showing the total amount of money donated in each of the top 10 states, during the 2000 federal election cycle.
-
R used to solve this problem. (2 pts)
-
Output from running R. (1 pt)
Question 5
Make a dotchart that shows how many movies premiered in each year. Now make another dotchart, which shows the same data (i.e., how many movies premiered each year) but only since the year 2000.
-
R used to solve this problem. (2 pts)
-
Output from running R. (1 pt)
Question 6
Among the three big New York City airports (JFK, LGA, EWR), which of these airports had the worst DepDelay (on average) in 2005? (Can you solve this with 1 line of R, using a tapply (rather than using 3 separate lines of R)? Hint: After you run the tapply, you can index your results using [c("JFK", "LGA", "EWR")] to lookup all 3 airports at once.)
-
R used to solve this problem. (2 pts)
-
Output from running R. (1 pt)
Question 7
LIKE is a very powerful tool. You can read about SQLite’s version of LIKE here. Use LIKE to analyze the primary_title of all IMDB titles: First determine how many titles have Batman anywhere in the title, and then determine how many titles have Superman anywhere in the title? Which one occurs more often?
-
SQL used to solve this problem. (2 pts)
-
Output from running SQL. (1 pt)
Question 8
How much money was donated during the 2000 federal election cycle by people who have PURDUE listed somewhere in their employer name? How much money was donated by people who have MICROSOFT listed somewhere in their employer name? Hint: You might use the grep or the grepl (which is a logical grep) to solve this one.
-
R used to solve this problem. (2 pts)
-
Output from running R. (1 pt)