In my current AJAX project I use mod-python on the server, and data is transferred using JSON encoding. There are several Python modules you can choose from for JSON serialization. Some of them are reviewed here. After some reading and testing, I picked python-cjson since I found it fast and reliable. I didn't have any problem with it until I started sending and receiving data that were not 7-bit ASCII, but belonged to the extended ASCII set. Strange characters started to appear in the database and in the browser interface.
I tracked the problem down to the fact that python-cjson expects its input to be either 7-bit ASCII or Python's internal Unicode representation.
This means that if you receive a UTF-8 encoded string from the net that contains characters outside the 7-bit ASCII range, you get either an error or a wrong character translation. The same thing happens if you read a string, say, from a database and try to encode it.
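To make the "wrong character translation" concrete, here is a small sketch that does not involve python-cjson at all. It only shows what happens when UTF-8 bytes are interpreted with the wrong codec. The choice of Latin-1 as the wrong codec is my assumption for illustration; the exact garbage you see depends on what each layer assumes.

```python
# -*- coding: utf-8 -*-
# Sketch: the same bytes read with the right and the wrong codec.
# Latin-1 stands in for "whatever the receiving side wrongly assumed".

text = u"caf\u00e9"                       # "café" as a Unicode string
utf8_bytes = text.encode("utf-8")         # what travels over the wire

right = utf8_bytes.decode("utf-8")        # correct: back to the original
wrong = utf8_bytes.decode("latin-1")      # incorrect: one char becomes two

print(repr(right))
print(repr(wrong))
```

The non-ASCII character takes two bytes in UTF-8, so the wrongly decoded string ends up one character longer than the original, which is the kind of stray character I was seeing.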
After some tests and some mail exchanges with python-cjson’s author, here is what I have learned:
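When you receive a UTF-8 encoded string from the net, convert it to a Python Unicode object before handing it to cjson:

    cjson.decode("your_data".decode('utf-8'))

When you read strings from the database with psycopg2, and assuming that your db is UTF-8 encoded, have it return Unicode objects:

    psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
    connection.set_client_encoding('UTF8')
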
This way cjson.encode() will be happy and serialize correctly.
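The whole round trip can be sketched as follows. This is not python-cjson: I use the standard library json module as a stand-in so the example runs anywhere, and the helper names decode_request and encode_response are made up for illustration. The point is only the order of operations: decode the bytes to Unicode first, then parse.

```python
# -*- coding: utf-8 -*-
# Sketch of "decode to Unicode first, then parse", using the stdlib
# json module as a stand-in for python-cjson. Helper names are hypothetical.
import json


def decode_request(raw_bytes):
    """Turn UTF-8 bytes from the network into Python objects."""
    return json.loads(raw_bytes.decode("utf-8"))


def encode_response(obj):
    """Serialize Python objects (holding Unicode strings) to JSON text."""
    # With the default ensure_ascii=True, non-ASCII characters are
    # written as \uXXXX escapes, so the output is plain 7-bit ASCII.
    return json.dumps(obj)


payload = u'{"name": "Jos\u00e9"}'.encode("utf-8")   # bytes as received
data = decode_request(payload)
out = encode_response(data)

print(repr(data["name"]))
print(out)
```

Because the serializer only ever sees Unicode strings, it never has to guess an encoding, which is the same reason the fixes above make cjson.encode() behave.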
