DS220 HW9 SQLite and python

DS220 HW9 SQLite and python

In HW9, we will use a Python standard library sqlite3 to manage a light weight SQL DBMS SQLite. We will use the Wine database to learn how to use the library.

Upon completing HW9, you should understand how to use a programming language such as Python to connect and manage data in a DBMS. You will first import the neccessary libraries, then create the connection and the cursor (like a pointer) to the database, and use the cursor to execute SQL statements.

In addition, we will use functions from the pandas package, which is a python library for data manipulation and data analysis.

Submission requirement

You should run all the codes and finish all the questions below and export your notebook as a pdf file (File - Print - save as pdf file). Please make sure you run all the cells before save. Questions with no outputs (when outputs is required) will not receive any points.

Remember that, anytime you would like to re-start the running of all code cells, go to the "Runtime" tab and click on "Restart Runtime". Then, you can reset the environment to re-run all code cells.

Import Python libraries and create connection to the Wine database

First import the neccessary libraries, create the connection to the wine database, and use the cursor to execute SQL statements.

In [ ]:
importsqlite3importpandasaspdimportos

If you are working on Google Colab, you need to mount your google drive in order to access data files such as "wine.db" in the Colab Notebooks folder on your Google drive.

In [ ]:
# mount Google drivefromgoogle.colabimportdrive#if you have updated your Google drive (folder name, etc.) and need to remount, you need to uncomment the line below#drive.flush_and_unmount()drive.mount('/content/drive')
In [ ]:
os.chdir("drive/My Drive/Colab Notebooks")
In [ ]:
# create the connection and the cursor to the wine databaseconn=sqlite3.connect('wine.db')c=conn.cursor()

View Tables in the database

In SQLite, there is a table called sqlite_master, which stores the metadata information about all tables. We can run a query on the sqlite_master table to get information about all tables that are not the sqlite_master table. In HW10, since we have just loaded the wine database, we will query the tables in the wine database.

In [ ]:
# display all the tables in the wine databaseforrowinc.execute("SELECT * FROM sqlite_master WHERE type ='table' AND name NOT LIKE 'sqlite_%';"):print(row)

Use Cursor to execute SQL statements

Next, we will learn how to execute a SQL query using the cursor "c". After a successful query execution, we can iterate through the rows to view the records in the returned result.

In [ ]:
# select and display all records in the supplier tablequery="Select * from supplier;"forrowinc.execute(query):print(row)

Use cursor to execute multiple statements

You can only execute one SQL statement at a time with cursor.execute(). If you want to modify the table or the metadata of the database, often you would need to execute multiple statements. To run multiple statements, use cursor.executescript() as shown below.

In [ ]:
script="""create table if not exists foo(foo1,foo2);create table if not exists goo(goo1,goo2,goo3);"""c.executescript(script)print("Columns in foo: ")c.execute("Select * from foo")fordinc.description:print(" ",d[0])print("Columns in goo: ")c.execute("Select * from goo")fordinc.description:print(" ",d[0])

Note that we use three double quote mark (""" """) to create a string with multiple lines.

We can also delete or drop tables:

In [ ]:
script="""drop table if exists foo;drop table if exists goo;"""c.executescript(script)

Pandas and Database

Pandas is a powerful library for data manipulation and data analysis. The design of Pandas is to use a dedicated data structure called DataFrame to manipulate and process data in tabular form (e.g. csv files, tables).

So far, we have seen that the query results returned from a cursor are nested tuples, which can be hard to manipulate.

Pandas provides us with a powerful utility to load the query result into a DataFrame for later processing. pandas.read_sql() is designed for this purpose.

In [ ]:
query="Select * from supplier;"pd.read_sql(query,conn)

Pandas also supports SQL join, using merge. The code segment below joins the two tables, product and po_line, by matching on the column PRODNR.

In [ ]:
query="Select PONR, PRODNR from po_line;"po_line=pd.read_sql(query,conn)query="Select PRODNR, PRODNAME from product;"product=pd.read_sql(query,conn)po_line.merge(product,how='inner',on='PRODNR')[:5]# limit to top 5 records to save space

Pandas can do almost every operation in a SQL relational database, such as Select, Group By, etc. Pandas objects can be easily converted into numpy or native Python objects. For more details, please refer to Pandas online documentation: https://pandas.pydata.org/docs/.

发表评论

电子邮件地址不会被公开。 必填项已用*标注