I’ve been working on a project that interacts with a database, and happened upon a some interesting problems.
The data I want to input into the database is initially stored in a dict variable.
- If you weren’t already aware, the order of a Python dict changes, even if you use a blueprint or template. This means that a pre-prepared database statement wouldn’t necessarily align with the values of the dict, deeming the data to be parsed into the wrong fields in a database.
The dict may contain some or all of the tables’ fields
- My program will acquire as many data values from input as possible to fully populate the dict, however I have allowed this to be dynamic in a sense that if the data is irretrievable, or doesn’t match a regex for the field, it will ignore said field and move on, deeming the dict value to default to “None”.
I want to make correct use of the API’s escape mechanism
- This is most important. Not only to make fully sure that I’m not inserting “unclean and potentially harmful” data, but also to follow the line of best practice. I’m really sure (like 99%) that the data will be clean. However, mitigating the risk further will bullet proof the commit.
I don’t want to bore you with many lines of code, so I will explain my methodology of my trials instead.
At first, I kind of penned down the rough idea of the function in long hand.
I started with a loop to store the key/value pairs in 2 separate lists. I had to include a couple of cases; to ignore “None” values and some keys which weren’t destined for the intended table.
For each list, I wrote a loop that builds a string that had to be properly formatted for the final statement.
INSERT INTO x (f1, f2) VALUES(v1, v2)
This worked well but the code really was too verbose. 30~50 lines in fact. I thought about it and lessened the load.
Less loops, messy code
Trying to lessen the loops, I went with the same loop to extract the dict key/values as above.
_ = [,]
ignore = ["ig1", "ig2"]
for key, value in dict:
if key in ignore:
I decided to use the string .join() method to create the raw statements from each list, which was concatenated to make the final statement.
It worked and lessened the loop burden. However I had forgotten an extremely important step, the SQLite3 escape mechanism.
Success in less words
It suddenly dawned on me that through these statement creating methods, I’m joining raw data to this string for input, without properly parsing the data. I had also tried:
This of course gave me the same result.
The fix was fairly simple, however.
The values in the statement needed to be changed from the “data values” to “?”.
The execute method also required the second argument as a tuple to allow for parsing and “?” substitution. I simply converted the list to a tuple.
The code is now down from around 50 to 15 lines which I’m happy about.
ignore = ["ig1", "ig2"] # list of dict keys to ignore.
col =  # list of col headings
val =  # list of col values
# loop dict, append key/value to lists respectively
for key, value in dict.items():
if key not in ignore:
val = tuple(val) # convert list to tuple
f = ", ".join(col) # make string of fields
v = ",".join(["?" for i in range(len(col))]) # make string of "?"'s
statement = "INSERT INTO x (" + f + ") VALUES(" + v + ")" # finalise statement
This code sits right with me. It has taken a few attempts, but the final result gets the job done correctly:
INSERT INTO x (f1, f2, f3) VALUES(?,?,?)
It goes to show that even something fairly trivial as chucking information into a database can require some careful planning, especially when taking limitations into account (in this case datatypes), and the nature of task and has taught me some important lessons.