Python - Using SQLAlchemy to load a CSV file into a database


I am trying to learn to program in Python. I would like to load CSV files into a database. Is it a good idea to use the SQLAlchemy framework to insert the data?

Each file is a database table. Some of these files have foreign keys to other CSV files / DB tables.

Thanks!

Because of the power of SQLAlchemy, I'm also using it on a project. Its power comes from the object-oriented way of "talking" to a database instead of hardcoding SQL statements that can be a pain to manage. Not to mention, it's also a lot faster.

To answer your question bluntly, yes! Storing data from a CSV into a database using SQLAlchemy is a piece of cake. Here's a full working example (I used SQLAlchemy 1.0.6 and Python 2.7.6):

    from numpy import genfromtxt
    from time import time
    from datetime import datetime
    from sqlalchemy import Column, Integer, Float, Date
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    def load_data(file_name):
        data = genfromtxt(file_name, delimiter=',', skip_header=1, converters={0: lambda s: str(s)})
        return data.tolist()

    Base = declarative_base()

    class price_history(Base):
        # Tell SQLAlchemy what the table name is and if there are any table-specific arguments it should know about
        __tablename__ = 'price_history'
        __table_args__ = {'sqlite_autoincrement': True}
        # Tell SQLAlchemy the name of each column and its attributes:
        id = Column(Integer, primary_key=True, nullable=False)
        date = Column(Date)
        opn = Column(Float)
        hi = Column(Float)
        lo = Column(Float)
        close = Column(Float)
        vol = Column(Float)

    if __name__ == "__main__":
        t = time()

        # Create the database
        engine = create_engine('sqlite:///csv_test.db')
        Base.metadata.create_all(engine)

        # Create the session
        session = sessionmaker()
        session.configure(bind=engine)
        s = session()

        try:
            file_name = "t.csv"  # sample csv file used:  http://www.google.com/finance/historical?q=nyse%3at&ei=w4ikvam8lywjmagjhohacw&output=csv
            data = load_data(file_name)

            for i in data:
                record = price_history(**{
                    'date': datetime.strptime(i[0], '%d-%b-%y').date(),
                    'opn': i[1],
                    'hi': i[2],
                    'lo': i[3],
                    'close': i[4],
                    'vol': i[5]
                })
                s.add(record)  # Add all the records

            s.commit()  # Attempt to commit all the records
        except:
            s.rollback()  # Roll back the changes on error
        finally:
            s.close()  # Close the connection
        print("Time elapsed: " + str(time() - t) + " s.")  # 0.091s

(Note: this is not necessarily the "best" way to do this, but I think this format is very readable for a beginner; it's also very fast: 0.091s for 251 records inserted!)

I think if you go through it line by line, you'll see what a breeze it is to use. Notice the lack of SQL statements. Hooray! I also took the liberty of using numpy to load the CSV contents in two lines, but it can be done without it if you like.
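For the "without numpy" route, here is a minimal sketch using only the standard library's csv module. It assumes the same column order as the example above (date, open, high, low, close, volume); the function name load_rows and the sample row are made up for illustration and are not taken from the real t.csv.

```python
import csv
import io
from datetime import datetime


def load_rows(csv_file):
    """Read price rows from an open CSV file, skipping the header line."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    rows = []
    for line in reader:
        rows.append({
            'date': datetime.strptime(line[0], '%d-%b-%y').date(),
            'opn': float(line[1]),
            'hi': float(line[2]),
            'lo': float(line[3]),
            'close': float(line[4]),
            'vol': float(line[5]),
        })
    return rows


# Made-up sample data laid out like the CSV assumed above
sample = io.StringIO(
    u"Date,Open,High,Low,Close,Volume\n"
    u"13-Jul-15,34.50,34.98,34.42,34.90,21000000\n"
)
rows = load_rows(sample)
```

Each dictionary can then be passed straight to the mapped class, e.g. price_history(**rows[0]), since the keys match the column attribute names.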

If you wanted to compare it against the traditional way of doing it, here's a full working example for reference:

    import sqlite3
    import time
    from numpy import genfromtxt

    def dict_factory(cursor, row):
        d = {}
        for idx, col in enumerate(cursor.description):
            d[col[0]] = row[idx]
        return d


    def create_db(db):
        # Create the DB and format it as needed
        with sqlite3.connect(db) as conn:
            conn.row_factory = dict_factory
            conn.text_factory = str

            cursor = conn.cursor()

            cursor.execute("CREATE TABLE [price_history] ([id] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL UNIQUE, [date] DATE, [opn] FLOAT, [hi] FLOAT, [lo] FLOAT, [close] FLOAT, [vol] INTEGER);")


    def add_record(db, data):
        # Insert a record into the table
        with sqlite3.connect(db) as conn:
            conn.row_factory = dict_factory
            conn.text_factory = str

            cursor = conn.cursor()

            # Column names go into the SQL string; values are bound with "?" placeholders
            columns = list(data.keys())
            cursor.execute(
                "INSERT INTO price_history ({cols}) VALUES ({vals});".format(
                    cols=', '.join(columns),
                    vals=', '.join('?' for _ in columns)),
                [data[c] for c in columns])


    def load_data(file_name):
        data = genfromtxt(file_name, delimiter=',', skip_header=1, converters={0: lambda s: str(s)})
        return data.tolist()


    if __name__ == "__main__":
        t = time.time()

        db = 'csv_test_sql.db'  # Database filename

        file_name = "t.csv"  # sample csv file used:  http://www.google.com/finance/historical?q=nyse%3at&ei=w4ikvam8lywjmagjhohacw&output=csv

        data = load_data(file_name)  # Get data from the CSV

        create_db(db)  # Create the DB

        # For every record, format it and insert it into the table
        for i in data:
            record = {
                'date': i[0],
                'opn': i[1],
                'hi': i[2],
                'lo': i[3],
                'close': i[4],
                'vol': i[5]
            }
            add_record(db, record)

        print("Time elapsed: " + str(time.time() - t) + " s.")  # 3.604s

(Note: even in the "old" way, this is by no means the best way to do this, but it's very readable and a "1-to-1" translation of the SQLAlchemy way vs. the "old" way.)
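As a rough illustration of why the sqlite3 version above is not the best way: it opens a new connection and commits once per record, which is probably where much of its running time goes. A minimal sketch of a quicker variant, assuming the same table layout, would reuse one connection and insert all rows in a single transaction with executemany. The function name insert_all and the sample rows are invented for this sketch.

```python
import sqlite3


def insert_all(conn, records):
    """Insert every record in a single transaction using ? placeholders."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS price_history ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "date DATE, opn FLOAT, hi FLOAT, lo FLOAT, close FLOAT, vol INTEGER)"
    )
    with conn:  # commits once on success, rolls back on error
        conn.executemany(
            "INSERT INTO price_history (date, opn, hi, lo, close, vol) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            records,
        )


conn = sqlite3.connect(':memory:')  # in-memory database, for illustration
records = [
    # made-up rows in the column order used above
    ('13-Jul-15', 34.50, 34.98, 34.42, 34.90, 21000000),
    ('14-Jul-15', 34.90, 35.20, 34.80, 35.10, 19000000),
]
insert_all(conn, records)
count = conn.execute("SELECT COUNT(*) FROM price_history").fetchone()[0]
```

This still needs hand-written SQL strings, so the maintenance point about SQL versus class attributes is unchanged.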

Notice the SQL statements: one to create the table, the other to insert records. Also, notice that it's a bit more cumbersome to maintain long SQL strings vs. a simple class attribute addition. Liking SQLAlchemy so far?

As for your foreign key inquiry, of course. SQLAlchemy has the power to do this too. Here's an example of how a class attribute would look with a foreign key assignment (assuming the ForeignKey class has also been imported from the sqlalchemy module):

    class asset_analysis(Base):
        # Tell SQLAlchemy what the table name is and if there are any table-specific arguments it should know about
        __tablename__ = 'asset_analysis'
        __table_args__ = {'sqlite_autoincrement': True}
        # Tell SQLAlchemy the name of each column and its attributes:
        id = Column(Integer, primary_key=True, nullable=False)
        fid = Column(Integer, ForeignKey('price_history.id'))

which sets up the "fid" column as a foreign key to price_history's id column.
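To tie this back to loading several related CSV files, here is a minimal sketch of one possible approach: insert the parent rows first, remember which database id each one received, then fill in "fid" on the child rows. The note column, the string keys 'a' and 'b', and the helper load_linked_rows are hypothetical and only stand in for whatever key your CSV files actually share.

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases such as 1.0.x
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class price_history(Base):
    __tablename__ = 'price_history'
    id = Column(Integer, primary_key=True)
    close = Column(Float)


class asset_analysis(Base):
    __tablename__ = 'asset_analysis'
    id = Column(Integer, primary_key=True)
    fid = Column(Integer, ForeignKey('price_history.id'))
    note = Column(String)  # hypothetical column, for illustration only


def load_linked_rows(session, parent_rows, child_rows):
    """Insert parents first, then children that refer to them by CSV key."""
    key_to_id = {}
    for csv_key, close in parent_rows:
        parent = price_history(close=close)
        session.add(parent)
        session.flush()  # assigns parent.id without committing
        key_to_id[csv_key] = parent.id
    for parent_key, note in child_rows:
        session.add(asset_analysis(fid=key_to_id[parent_key], note=note))
    session.commit()


engine = create_engine('sqlite://')  # in-memory database, for illustration
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Made-up rows: (csv key, close) for parents, (parent csv key, note) for children
load_linked_rows(session,
                 [('a', 34.5), ('b', 35.1)],
                 [('b', 'second day'), ('a', 'first day')])
```

Loading the parent table first matters here because the child rows need the parent ids to exist before "fid" can be filled in.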

Hope that helps!

