DS220 HW9 SQLite and python
In HW9, we will use a Python standard library sqlite3 to manage a light weight SQL DBMS SQLite. We will use the Wine database to learn how to use the library.
Upon completing HW9, you should understand how to use a programming language such as Python to connect and manage data in a DBMS. You will first import the neccessary libraries, then create the connection and the cursor (like a pointer) to the database, and use the cursor to execute SQL statements.
In addition, we will use functions from the pandas package, which is a python library for data manipulation and data analysis.
Submission requirement
You should run all the codes and finish all the questions below and export your notebook as a pdf file (File - Print - save as pdf file). Please make sure you run all the cells before save. Questions with no outputs (when outputs is required) will not receive any points.
Remember that, anytime you would like to re-start the running of all code cells, go to the "Runtime" tab and click on "Restart Runtime". Then, you can reset the environment to re-run all code cells.
If you are working on Google Colab, you need to mount your google drive in order to access data files such as "wine.db" in the Colab Notebooks folder on your Google drive.
View Tables in the database
In SQLite, there is a table called sqlite_master, which stores the metadata information about all tables. We can run a query on the sqlite_master table to get information about all tables that are not the sqlite_master table. In HW10, since we have just loaded the wine database, we will query the tables in the wine database.
Use cursor to execute multiple statements
You can only execute one SQL statement at a time with cursor.execute(). If you want to modify the table or the metadata of the database, often you would need to execute multiple statements. To run multiple statements, use cursor.executescript() as shown below.
Note that we use three double quote mark (""" """) to create a string with multiple lines.
We can also delete or drop tables:
Pandas and Database
Pandas is a powerful library for data manipulation and data analysis. The design of Pandas is to use a dedicated data structure called DataFrame to manipulate and process data in tabular form (e.g. csv files, tables).
So far, we have seen that the query results returned from a cursor are nested tuples, which can be hard to manipulate.
Pandas provides us with a powerful utility to load the query result into a DataFrame for later processing. pandas.read_sql() is designed for this purpose.
Pandas also supports SQL join, using merge. The code segment below joins the two tables, product and po_line, by matching on the column PRODNR.
Pandas can do almost every operation in a SQL relational database, such as Select, Group By, etc. Pandas objects can be easily converted into numpy or native Python objects. For more details, please refer to Pandas online documentation: https://pandas.pydata.org/docs/.