Archive for the ‘Programming’ Category

DOMPDF + justification + extended ASCII chars

Tuesday, May 19th, 2009

If your documents use so-called extended ASCII characters (those with a code >= 128), you render them to PDF with DOMPDF, and you want the text justified, you are in trouble. This combination currently seems to be broken, at least when you stick with the open source PDF rendering engine, R&OS CPDF.

The problem is that not only can you not get decent justification, but your text can easily run past the edge of the page. The same happens in table cells, where text can overflow into the next cell.

This seems to derive from an incorrect mapping between the extended characters and the numbering used by the AFM file. The problem is described in the R&OS CPDF FAQ along with a possible workaround.
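To see why a wrong code-to-glyph mapping would upset justification, here is a minimal sketch in Python (not DOMPDF or CPDF code). It assumes the engine measures a line by summing per-glyph widths from the AFM data and that a character it cannot map contributes no width. Both are assumptions for illustration; the names AFM_WIDTHS, GOOD_MAP and text_width and the width values are hypothetical.

```python
# Illustrative glyph widths in 1/1000 of the font size (assumed values,
# not read from a real AFM file).
AFM_WIDTHS = {"a": 556, "e": 556, "t": 278, " ": 278, "eacute": 556}

# A correct code -> glyph name mapping for the one extended character used here.
GOOD_MAP = {233: "eacute"}


def text_width(text, code_to_name, font_size=12.0):
    """Sum glyph widths for text encoded as Windows-1252 bytes."""
    total = 0
    for byte in text.encode("cp1252"):
        if byte < 128:
            name = chr(byte)                 # plain ASCII: the key is the character itself
        else:
            name = code_to_name.get(byte)    # extended char: needs the mapping
        total += AFM_WIDTHS.get(name, 0)     # assumption: unmapped glyphs count as zero
    return total * font_size / 1000.0


print(text_width("été", GOOD_MAP))  # width with the mapping in place
print(text_width("été", {}))        # width when the extended chars are not mapped
```

If the measured width comes out too small, the layout code believes more text fits on the line than really does, which would match the overflow described above.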

Following the workaround proposed in the FAQ, I have tried to make it work under DOMPDF. The workaround says to add a second argument to the selectFont() method that specifies the correct mapping. A grep shows that there are 4 occurrences of this call, in the following files: cpdf_adapter.cls.php and page_cache.cls.php. I therefore proceeded to make the following change, from

$this->_pdf->selectFont($font);

to

$this->_pdf->selectFont($font,
array('encoding'=>'WinAnsiEncoding',
'differences'=>self::$diff));

Once I had written down the mapping, it worked well in the tests I ran. So what is the mapping? Here it is:

static $diff = array (
130 => 'quotesinglbase',
131 => 'florin',
132 => 'quotedblbase',
133 => 'ellipsis',
134 => 'dagger',
135 => 'daggerdbl',
136 => 'circumflex',
137 => 'perthousand',
// 138 => '{Underscore}',
139 => 'guilsinglleft',
140 => 'OE',
145 => 'quoteleft',
146 => 'quoteright',
147 => 'quotedblleft',
148 => 'quotedblright',
149 => 'bullet',
150 => 'endash',
151 => 'emdash',
152 => 'tilde',
153 => 'trademark',
// 154 => '{Underscore}',
155 => 'guilsinglright',
156 => 'oe',
159 => 'Ydieresis',
// 160 => '{Nonbreaking space}',
161 => 'exclamdown',
162 => 'cent',
163 => 'sterling',
164 => 'currency',
165 => 'yen',
166 => 'brokenbar',
167 => 'section',
168 => 'dieresis',
169 => 'copyright',
170 => 'ordfeminine',
171 => 'guillemotleft',
172 => 'logicalnot',
// 173 => '{Soft hyphen}',
174 => 'registered',
175 => 'macron',
176 => 'degree',
177 => 'plusminus',
178 => 'twosuperior',
179 => 'threesuperior',
180 => 'acute',
181 => 'mu',
182 => 'paragraph',
183 => 'periodcentered',
184 => 'cedilla',
185 => 'onesuperior',
186 => 'ordmasculine',
187 => 'guillemotright',
188 => 'onequarter',
189 => 'onehalf',
190 => 'threequarters',
191 => 'questiondown',
192 => 'Agrave',
193 => 'Aacute',
194 => 'Acircumflex',
195 => 'Atilde',
196 => 'Adieresis',
197 => 'Aring',
198 => 'AE',
199 => 'Ccedilla',
200 => 'Egrave',
201 => 'Eacute',
202 => 'Ecircumflex',
203 => 'Edieresis',
204 => 'Igrave',
205 => 'Iacute',
206 => 'Icircumflex',
207 => 'Idieresis',
208 => 'Eth',
209 => 'Ntilde',
210 => 'Ograve',
211 => 'Oacute',
212 => 'Ocircumflex',
213 => 'Otilde',
214 => 'Odieresis',
215 => 'multiply',
216 => 'Oslash',
217 => 'Ugrave',
218 => 'Uacute',
219 => 'Ucircumflex',
220 => 'Udieresis',
221 => 'Yacute',
222 => 'Thorn',
223 => 'germandbls',
224 => 'agrave',
225 => 'aacute',
226 => 'acircumflex',
227 => 'atilde',
228 => 'adieresis',
229 => 'aring',
230 => 'ae',
231 => 'ccedilla',
232 => 'egrave',
233 => 'eacute',
234 => 'ecircumflex',
235 => 'edieresis',
236 => 'igrave',
237 => 'iacute',
238 => 'icircumflex',
239 => 'idieresis',
240 => 'eth',
241 => 'ntilde',
242 => 'ograve',
243 => 'oacute',
244 => 'ocircumflex',
245 => 'otilde',
246 => 'odieresis',
247 => 'divide',
248 => 'oslash',
249 => 'ugrave',
250 => 'uacute',
251 => 'ucircumflex',
252 => 'udieresis',
253 => 'yacute',
254 => 'thorn',
255 => 'ydieresis'
);
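A table like this can be cross-checked against a known code page. The sketch below is Python rather than PHP and assumes the codes follow Windows-1252, which is what WinAnsiEncoding is based on. It prints the Unicode character name for a few codes; these are Unicode names, not the Adobe glyph names used in the array, so they have to be compared by eye. The helper name cp1252_table is hypothetical.

```python
import unicodedata


def cp1252_table(start=128, end=255):
    """Map each code in [start, end] to (character, Unicode name)."""
    table = {}
    for code in range(start, end + 1):
        try:
            ch = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # code is unassigned in Python's cp1252 codec
        table[code] = (ch, unicodedata.name(ch, "<unnamed>"))
    return table


table = cp1252_table()
for code in (130, 132, 147, 148, 233):
    ch, name = table[code]
    print(code, ch, name)
```

Codes missing from the resulting dictionary are unassigned in the code page and have no glyph to map.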

Configuring Ubuntu for Ajax development

Sunday, May 17th, 2009

I wrote this howto some time ago to remind myself of the steps to configure my Ubuntu PC. Since I often move from one machine to another, I needed to write down a few basic instructions to make the installation easier. I thought that it might also be of some interest to others, so here it is. I keep the original as a Google doc, and what follows is the published version (inside an <iframe>). You can see the printer friendly version (and reader friendly too) here.

python-cjson package and non-ASCII data

Wednesday, November 14th, 2007

In my current AJAX project I use mod-python on the server, and data is transferred using JSON encoding. There are several Python modules you can choose from for JSON serialization. Some of them are reviewed here. After some reading and testing, I picked python-cjson since I found it fast and reliable. I didn’t have any problems with it until I started sending and receiving data that was not 7-bit ASCII but belonged to the extended ASCII set. Strange characters started to appear in the database and in the browser interface.
I tracked the problem down to the fact that python-cjson expects its input to be either 7-bit ASCII or Python’s internal Unicode representation.
This means that if you receive from the network a UTF-8 encoded string containing characters outside the 7-bit ASCII range, you get either an error or a wrong character translation. The same happens if you read a string from, say, a database and try to encode it.
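Both failure modes are easy to reproduce without python-cjson. The following sketch uses Python 3 byte strings and only the standard library; the sample word is an arbitrary choice. It shows UTF-8 bytes turning into strange characters when read with the wrong single-byte encoding, and failing outright when treated as 7-bit ASCII.

```python
text = "caffè"
utf8_bytes = text.encode("utf-8")      # 'è' becomes the two bytes C3 A8

# Wrong translation: each UTF-8 byte is read as one Latin-1 character.
wrong = utf8_bytes.decode("latin-1")

# Error: the bytes are outside the 7-bit ASCII range.
try:
    utf8_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Correct: decode with the encoding the data was actually written in.
right = utf8_bytes.decode("utf-8")

print(repr(wrong), ascii_failed, repr(right))
```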

After some tests and some mail exchanges with python-cjson’s author, here is what I have learned:

  • decoding: before calling cjson.decode(), you must convert your data to Python’s internal Unicode representation. For instance, if you expect your data to be UTF-8, this is what you should do:
    cjson.decode("your_data".decode('utf-8'))
  • encoding: the approach is similar, but the problem is that you generally don’t have a simple string to encode but a complex structure. Converting every string in the structure before feeding it to python-cjson is clearly not a good solution. A better one is to convert your data to Python unicode as you read them into your program, for instance from a database or a file. In my case, where data are read from a PostgreSQL database using psycopg2, it was not difficult at all. Psycopg has an option to convert character representations when reading from or writing to a database. It consists of 2 lines of code, as I found out here:
    psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
    connection.set_client_encoding('UTF8')

    assuming that your db is UTF-8 encoded.

    This way cjson.encode() will be happy and serialize correctly.
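The rule behind both points is to convert bytes to Unicode at the edges of the program and keep Unicode everywhere inside. Here is a minimal sketch of that rule in Python 3, using the standard library json module as a stand-in for python-cjson (an assumption, since python-cjson targets Python 2). The function names decode_request and encode_response are hypothetical.

```python
import json


def decode_request(raw: bytes):
    """Decode UTF-8 bytes from the network first, then parse the JSON text."""
    return json.loads(raw.decode("utf-8"))


def encode_response(obj) -> bytes:
    """Serialize Unicode data, then encode to UTF-8 only when sending it out."""
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")


raw = '{"city": "Zürich"}'.encode("utf-8")   # bytes as they would arrive
data = decode_request(raw)                    # Unicode strings inside the program
out = encode_response(data)                   # bytes again at the boundary
print(data["city"], out)
```

With the conversion done once at each boundary, nested structures need no per-string handling before serialization.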

    JS Builder for Linux

    Saturday, July 28th, 2007

    I have compiled JS Builder v.1.1.2 under Linux (Ubuntu 7.04) with the latest mono packages.
    I haven’t really tested it, but someone might be interested in trying it anyway. You can download the binaries and the sources from here.
    This version of JS Builder does not compile right away with mono. There are a few issues to take care of.

    1. The XmlSerializer crashes the application. I have replaced it with the SoapFormatter.
    2. JS Builder writes to the registry to register a file extension. I have excluded this operation under Linux.
    3. A few methods fail on mono, so I have fixed these as well.
    4. I have also replaced hard coded backslashes in paths with System.IO.Path.DirectorySeparatorChar.
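The idea behind item 4 is not specific to C#. As a rough illustration in Python (not the actual JS Builder change), os.sep plays the role of System.IO.Path.DirectorySeparatorChar. The helper names to_native_path and join_parts are hypothetical.

```python
import os


def to_native_path(windows_style: str) -> str:
    """Replace hard-coded backslashes with the platform's separator."""
    return windows_style.replace("\\", os.sep)


def join_parts(*parts: str) -> str:
    """Build a path from components without writing any separator by hand."""
    return os.path.join(*parts)


print(to_native_path("build\\output\\app.js"))
print(join_parts("build", "output", "app.js"))
```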

    A word of advice: when installing mono through apt-get, many assemblies come in their own packages. To make JS Builder work, ensure that you have installed libmono-system-runtime2.0-cil and libmono-winforms2.0-cil.

    The result is not graphically appealing, but at least it does not crash, and it even seems to do some work.


    JS Builder 1.1.2 on mono

    Dojo demos

    Sunday, July 22nd, 2007

    After I wrote my last blog post about Dojo v0.9, there was an interesting discussion in the Dojo forum here, and also here, where some nice Dojo developers helped me better understand the scope and rationale behind Dojo. From that discussion I gathered that the Dojo beta is not feature complete and that we must wait for the release of v1.0 to see what it will look like once finished. This is especially true for the documentation, which is currently the major weak point and, I must say, a source of much frustration in using Dojo.

    In the meantime, to track the development of the Dojo toolkit and test some of its functionality, I have created a page where I intend to publish some of the tests that I am doing. To begin with, I have ported the Mail demo v0.4 to the current Dojo version (v0.9).