DATA ENGINEERING PLATFORMS (ADSP 31012)
ASSIGNMENT 4
Data (Sakila dataset)
• Use the Sakila database schema, which can be found in the course material.
• Full documentation:
http://dev.mysql.com/doc/index-other.html
https://dev.mysql.com/doc/sakila/en/sakila-structure-tables.html
Submissions (Individual)
• For each question, provide the query you used along with any assumptions you made.
• Submit all queries as a .txt file (include the data import scripts).
• Execute the queries and include screenshots of the results in a Word or PDF document.
Part A (Sakila Relational DB): Data Extraction
1. Export data from the three tables in the Sakila database in two formats:
a. JSON files: customers.json, films.json, and rentals.json.
b. CSV files: customers.csv, films.csv, and rentals.csv.
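As a rough illustration of the export step, the sketch below writes the same set of rows to both a JSON file and a CSV file using only the Python standard library. It assumes the rows have already been fetched from the Sakila database as a list of dictionaries; the helper name `export_rows`, the sample rows, and the column names are hypothetical and stand in for whatever your own query returns.

```python
import csv
import json
from pathlib import Path

# Hypothetical sample rows standing in for the result of a query against
# the Sakila customer table; the column names here are assumptions.
SAMPLE_CUSTOMERS = [
    {"customer_id": 1, "first_name": "MARY", "last_name": "SMITH"},
    {"customer_id": 2, "first_name": "PATRICIA", "last_name": "JOHNSON"},
]


def export_rows(rows, stem, out_dir="."):
    """Write rows (a list of dicts) to <stem>.json and <stem>.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    json_path = out / f"{stem}.json"
    csv_path = out / f"{stem}.csv"

    # One JSON array per file; default=str turns dates and decimals into text.
    with json_path.open("w", encoding="utf-8") as fh:
        json.dump(rows, fh, indent=2, default=str)

    # The CSV header row is taken from the keys of the first record.
    fieldnames = list(rows[0].keys()) if rows else []
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    return json_path, csv_path
```

Calling `export_rows(SAMPLE_CUSTOMERS, "customers")` would produce customers.json and customers.csv in the current directory. Note that a file holding a single JSON array is typically loaded into MongoDB with the `--jsonArray` option of `mongoimport`.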
Part B (MongoDB): Manipulating, Sorting and Grouping & Summarizing data
Write MongoDB commands and queries for the following:
2. Create a Sakila document database in MongoDB.
3. Load data from files customers.json, films.json, rentals.json into MongoDB.
4. Provide the count of the total number of customers living in California.
5. List all movies that are rated NC-17.
6. Provide the count of movies by category.
7. Find the top 2 movies with a length greater than 25 minutes or that have commentaries as special features.
8. Identify the top 3 most profitable movies based on total rental revenue.
9. Provide a count of how many rentals each customer has made over the last year.
10. Identify the fields that would benefit from indexing for performance optimization in common queries (e.g., Rental Date, Film Title). Write the MongoDB commands to create these indexes.
11. Provide 2 additional MongoDB queries for this dataset and indicate the specific business use cases they address.
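Grouping questions such as the count of movies by category map onto a MongoDB aggregation pipeline with a `$group` stage followed by a `$sort` stage. The sketch below expresses one such pipeline as a Python list, alongside a pure-Python equivalent that can be used to sanity-check results on a small sample. It assumes each film document carries a `category` field; that field name and the helper `count_by_category` are hypothetical and depend on how the JSON export was shaped.

```python
from collections import Counter

# Assumed document shape: each film document has a "category" field.
# The field name is hypothetical; adjust it to match the exported JSON.
COUNT_BY_CATEGORY = [
    {"$group": {"_id": "$category", "count": {"$sum": 1}}},
    {"$sort": {"count": -1, "_id": 1}},
]


def count_by_category(films):
    """Pure-Python equivalent of the pipeline, for checking results locally."""
    counts = Counter(film["category"] for film in films)
    # Sort by count descending, then by category name ascending.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"_id": name, "count": n} for name, n in ordered]
```

With the pymongo driver, a pipeline like this would be passed to a collection's `aggregate` method; in the MongoDB shell the same stages go to `db.<collection>.aggregate([...])`.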
Part C (Neo4J): Linking, Manipulating & viewing relationships within data
Write Neo4J Cypher commands and queries for the following:
12. Create a Sakila graph database in Neo4J
Define the following nodes and properties:
a. Customer: customer_id, first_name, last_name, email, address, city
b. Film: film_id, title, description, release_year, rental_rate
c. Rental: rental_id, rental_date, return_date
Define the following relations:
RENTED: Connects Customer to Rental
CONTAINS: Connects Rental to Film
13. Load data from files customers.csv, films.csv, rentals.csv into Neo4J.
14. Identify a customer and list the films rented by that customer.
15. Return the title of each film along with the count of how many times it has been rented.
16. Identify all customers who rented the film titled "Inception".
17. For each rental, calculate the number of days between the rental date and return date.
18. Determine which film has the highest number of rentals.
19. For each rental, calculate the number of days between the rental date and return date.
20. Determine which film has the highest number of rentals.
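For the questions that ask for the number of days between the rental date and the return date, it can help to cross-check a few Cypher results outside the database. The sketch below is one way to do that in Python. It assumes the timestamps in rentals.csv use the MySQL DATETIME text layout (`YYYY-MM-DD HH:MM:SS`) and that a rental not yet returned has an empty return date; the helper name `rental_days` is hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout in rentals.csv (MySQL DATETIME text form).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def rental_days(rental_date, return_date):
    """Whole days between two timestamps, or None if not yet returned."""
    if not return_date:
        return None
    start = datetime.strptime(rental_date, TIMESTAMP_FORMAT)
    end = datetime.strptime(return_date, TIMESTAMP_FORMAT)
    # timedelta.days counts complete 24-hour periods only.
    return (end - start).days
```

For example, `rental_days("2005-05-24 22:53:30", "2005-05-26 22:04:30")` returns 1, because the elapsed time is just under two full days. State in your assumptions whether partial days are truncated or rounded, since the choice changes the result.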