Over the past few months here at Six Feet Up, I have noticed an increasing need to demonstrate the usefulness and efficiency of Python command line tools. Command line tools can be very handy, and making your own with Python is easier than you might think! This article is part four of our series on building a command line app. It shows how to use the csv module to import data and how to handle time zones during the import. Did you miss part three on bootstrapping the Pyramid app? Read it here.
Up to now we have laid the groundwork for a command line utility for our Pyramid application. Now let's do some real work and leverage another great part of the standard library. Python comes with the "batteries included" for handling CSV files, so this part is easy. The PyMOTW page on the csv module is a great place to get familiar with it, but here we want to use it inside our Pyramid app.
Character encoding can be tricky for CSV files, which could contain almost any data. In our case, some of the speakers' names have accents, so we made sure the file was encoded as UTF-8 and told the CSV reader to use the excel dialect when reading it. In the sample code below, the UnicodeReader class is a wrapper around csv.reader that enforces this.
```python
# Python 2 code: in Python 2, csv.reader works on byte strings, not unicode.
import codecs
import csv


class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        # Decode the raw byte stream using the file's declared encoding.
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        # Hand csv.reader UTF-8 bytes, which is what it expects in Python 2.
        return self.reader.next().encode("utf-8")


class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        # Decode every cell back into a unicode object.
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self
```
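The UTF8Recoder/UnicodeReader pair above is Python 2 code, written for a csv module that only understands byte strings. If you are on Python 3, csv.reader consumes text (str) directly, so you can let open() do the decoding and drop the wrapper classes. The sketch below assumes a UTF-8 file in the excel dialect; read_rows and read_csv_file are hypothetical helper names, not part of the original app.

```python
import csv
import io


def read_rows(f):
    """Yield each row of an already-decoded text stream as a list of str."""
    # In Python 3, csv.reader works on str lines; excel is also the default dialect.
    reader = csv.reader(f, dialect="excel")
    for row in reader:
        yield row


def read_csv_file(path):
    """Open a UTF-8 CSV file and return all of its rows as lists of str."""
    # The csv docs recommend newline="" so the reader handles line endings itself.
    with open(path, encoding="utf-8", newline="") as f:
        return list(read_rows(f))


if __name__ == "__main__":
    # Accented speaker names survive because open()/StringIO already hold text.
    sample = io.StringIO(
        'Keynote,Hall A,2012-03-10 09:00,2012-03-10 10:00,"José, Zoë"\r\n'
    )
    for row in read_rows(sample):
        print(row)
```

If the file comes out of Excel with a byte order mark, opening it with encoding="utf-8-sig" strips the BOM from the first field.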
With this in place, we can process our CSV file and save the data into the app's database:
```python
from datetime import datetime

import transaction

# csv_uri, tz, DATETIME_FMT, sessions, make_unique_name, Session and closer
# are defined earlier in the script (not shown here).
try:
    with open(csv_uri, 'rb') as f:
        reader = UnicodeReader(f)
        for row in reader:
            title, room, start, end, speakers = row
            # Parse local times, attach the event's time zone, store naive UTC.
            startaware = tz.localize(datetime.strptime(start, DATETIME_FMT))
            start = universify_datetime(startaware)
            endaware = tz.localize(datetime.strptime(end, DATETIME_FMT))
            end = universify_datetime(endaware)
            # u"Ann, Bob" -> [u"Ann", u"Bob"]
            speakers = [speaker.strip() for speaker in speakers.split(',')]
            sessionid = make_unique_name(sessions, title.strip())
            session = Session(title.strip(), start, end, room.strip(),
                              speakers, sessionid)
            sessions[session.id] = session
            session.__parent__ = sessions
            print session.id, row
    transaction.commit()
finally:
    closer()
```
We are using a context manager to simplify the code a bit here. The with statement opens the file and closes it for us when the block ends, even if an exception is raised partway through.
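Roughly speaking, a with block around open() behaves like a try/finally that calls close(). The Python 3 sketch below shows both forms side by side and checks that the file really is closed after an exception; make_sample_file, count_lines_with, count_lines_try_finally and closed_after_error are hypothetical names used only for illustration.

```python
import os
import tempfile


def make_sample_file(text):
    """Write text to a temporary file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        out.write(text)
    return path


def count_lines_with(path):
    # The file object is a context manager; leaving the block calls close().
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def count_lines_try_finally(path):
    # Roughly the same guarantee, written out by hand.
    f = open(path, encoding="utf-8")
    try:
        return sum(1 for _ in f)
    finally:
        f.close()


def closed_after_error(path):
    """Return True if the with block closed the file despite an exception."""
    handle = None
    try:
        with open(path, encoding="utf-8") as f:
            handle = f
            raise ValueError("simulated failure while processing rows")
    except ValueError:
        pass
    return handle.closed


if __name__ == "__main__":
    path = make_sample_file("a,b\nc,d\n")
    print(count_lines_with(path), count_lines_try_finally(path),
          closed_after_error(path))
    os.remove(path)
```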
We loop over each line of the CSV file and pull out the title, room, start, end, and speakers. The next lines clean up the data and make sure times are stored consistently, using a function called universify_datetime() that I explain below. Then we use our reference to the sessions object in the database to store a new Session instance.
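The per-row cleanup in the loop above can be pulled out into a small function that is easy to test on its own. This Python 3 sketch makes several assumptions not in the original: the DATETIME_FMT value, a fixed UTC-5 offset (EVENT_TZ) standing in for the pytz zone tz, a plain set of used ids standing in for the sessions container, and a simple slug-based make_unique_name, since the real helper's behaviour isn't shown here. With a real pytz zone you should keep using tz.localize() as the article does, rather than replace(tzinfo=...).

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the CSV stores times like "2012-03-10 09:00".
DATETIME_FMT = "%Y-%m-%d %H:%M"
# Assumption: a fixed UTC-5 offset stands in for the pytz zone in the article.
EVENT_TZ = timezone(timedelta(hours=-5))


def to_utc_naive(text):
    """Parse a local timestamp, attach the event offset, return naive UTC."""
    local = datetime.strptime(text.strip(), DATETIME_FMT).replace(tzinfo=EVENT_TZ)
    return local.astimezone(timezone.utc).replace(tzinfo=None)


def make_unique_name(existing, title):
    """Hypothetical helper: slugify the title, add -2, -3, ... on collisions."""
    base = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "session"
    name, n = base, 2
    while name in existing:
        name = f"{base}-{n}"
        n += 1
    return name


def parse_row(row, existing_ids):
    """Turn one CSV row into a dict of cleaned-up session fields."""
    title, room, start, end, speakers = row
    return {
        "id": make_unique_name(existing_ids, title.strip()),
        "title": title.strip(),
        "room": room.strip(),
        "start": to_utc_naive(start),
        "end": to_utc_naive(end),
        # "Ann, Bob " -> ["Ann", "Bob"]; unlike the original, empty names are dropped.
        "speakers": [s.strip() for s in speakers.split(",") if s.strip()],
    }
```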
This post isn't really about time zones, but because they are so important, I want to note that we convert our data to UTC and store it with no time zone information attached. This is the only sane way to build an application that supports multiple time zones. See Armin's blog post on this subject. Even if you don't read it, you still need to follow this rule from it:
So here the rule of thumb which never shall be broken:
Always measure and store time in UTC. If you need to record where the time was taken, store that separately. Do not store the local time + timezone information!
Here's how to go about following this rule:
```python
import pytz


def universify_datetime(dt):
    """Convert an aware `datetime` to UTC and return it as a naive object."""
    utc = pytz.timezone('UTC')
    utc_dt = dt.astimezone(utc)
    # Drop tzinfo: by convention every stored datetime is in UTC.
    utc_dt = utc_dt.replace(tzinfo=None)
    return utc_dt
```
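If you'd rather not depend on pytz, Python 3's datetime module can do the same conversion with datetime.timezone.utc, and the reverse step (turning a stored naive UTC value back into local time for display) is just as short. A minimal sketch: from_universal_datetime is a hypothetical name, and the fixed UTC-5 offset in the demo is an assumption. The guard against naive input matters because in Python 3.6+ astimezone() silently treats a naive datetime as system local time.

```python
from datetime import datetime, timedelta, timezone


def universify_datetime(dt):
    """Convert an aware datetime to UTC and drop the tzinfo (naive UTC)."""
    if dt.tzinfo is None:
        # Refuse to guess: a naive value here has no known zone.
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


def from_universal_datetime(utc_naive, tz):
    """Hypothetical inverse: reattach UTC, then convert to tz for display."""
    return utc_naive.replace(tzinfo=timezone.utc).astimezone(tz)


if __name__ == "__main__":
    utc_minus_5 = timezone(timedelta(hours=-5))
    local = datetime(2012, 3, 10, 9, 0, tzinfo=utc_minus_5)
    stored = universify_datetime(local)
    print(stored)
    print(from_universal_datetime(stored, utc_minus_5))
```

Keeping the conversion in one place on the way in and one place on the way out means everything in between only ever sees naive UTC values.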
Was this series useful? Do you have a topic you'd like to see us write about? Let us know in the comments! Be sure to stay tuned for more Python posts and sign up for our Python How-To digests to receive more how-to guides as soon as they are published!