Archive for the ‘Programming’ Category

python-cjson package and non-ASCII data

Wednesday, November 14th, 2007

In my current AJAX project I use mod_python on the server, and data is transferred using JSON encoding. There are several Python modules you can choose from for JSON serialization; some of them are reviewed here. After some reading and testing, I picked python-cjson, since I found it fast and reliable. I didn’t have any problem with it until I started sending and receiving data that were not 7-bit ASCII but belonged to the extended ASCII set. Strange characters started to appear in the database and in the browser interface.
I tracked the problem down to the fact that python-cjson expects its input to be either 7-bit ASCII or Python’s internal Unicode representation.
This means that if you receive a UTF-8 encoded string from the network that contains characters outside the 7-bit ASCII range, you get either an error or a wrong character translation. The same happens if you read a string from, say, a database and try to encode it.

After some tests and some mail exchanges with python-cjson’s author, here is what I have learned:

  • decoding: before calling cjson.decode(), you must convert your data to Python’s internal Unicode representation. For instance, if you expect your data to be UTF-8, this is what you should do:
    cjson.decode("your_data".decode('utf-8'))
  • encoding: the approach is similar, but the problem is that you generally don’t have a simple string to encode; you have a complex structure. Converting every string in the structure before feeding it to python-cjson is clearly not a good solution. A better one is to convert your data to Python Unicode as you read it into your program, for instance from a database or a file. In my case, where data is read from a PostgreSQL database using psycopg2, it was not difficult at all. Psycopg has an option to convert character representations when reading from and writing to a database. It takes two lines of code, as I found out here:
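    import psycopg2.extensions
    # return text columns as Python unicode objects
    psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
    connection.set_client_encoding('UTF8')   # connection: your psycopg2 connection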

    assuming that your db is UTF-8 encoded.

    This way cjson.encode() will be happy and serialize correctly.
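Both rules come down to the same discipline: decode bytes to Unicode as soon as they enter the program, and encode only when they leave it. Below is a minimal sketch of that boundary in Python 3, where str is the Unicode type. It uses the standard-library json module as a stand-in for python-cjson (a Python 2 extension), and the helper names decode_request() and encode_response() are hypothetical.

```python
import json


def decode_request(raw: bytes, encoding: str = "utf-8"):
    """Turn incoming bytes into Unicode text first, then parse the JSON."""
    text = raw.decode(encoding)  # bytes -> str at the program boundary
    return json.loads(text)


def encode_response(obj, encoding: str = "utf-8") -> bytes:
    """Serialize to JSON text, then encode the text to bytes for the wire."""
    text = json.dumps(obj, ensure_ascii=False)  # keep non-ASCII characters unescaped
    return text.encode(encoding)


if __name__ == "__main__":
    raw = '{"city": "Zürich"}'.encode("utf-8")
    print(decode_request(raw)["city"])                # Zürich
    # Decoding with the wrong codec gives a "wrong character translation":
    print(json.loads(raw.decode("latin-1"))["city"])  # ZÃ¼rich
```

Passing ensure_ascii=False keeps characters such as ü readable in the output; with the default (True), json.dumps() escapes them as \u00fc, which is also valid JSON.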

    JS Builder for Linux

    Saturday, July 28th, 2007

    I have compiled JS Builder v.1.1.2 under Linux (Ubuntu 7.04) with the latest Mono packages.
    I haven’t really tested it, but someone may be interested in trying it anyway. You can download the binaries and the sources from here.
    This version of JS Builder does not compile with Mono out of the box. There are a few issues to take care of.

    1. The XmlSerializer crashes the application. I have replaced it with the SoapFormatter.
    2. JS Builder writes to the registry to register an extension. I have disabled this operation under Linux.
    3. A few methods fail on Mono, so I have corrected these as well.
    4. I have also replaced hard-coded backslashes in paths with System.IO.Path.DirectorySeparatorChar.
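Point 4 is the usual portability fix: build paths with the platform's separator instead of hard-coding backslashes. JS Builder is written in C#; the same idea is sketched here in Python, where os.path.join() plays the role of System.IO.Path.DirectorySeparatorChar. The helper names and the file names in the usage lines are hypothetical.

```python
import os


def portable_path(*parts: str) -> str:
    # os.path.join puts os.sep between the parts: "/" on Linux, "\\" on Windows
    return os.path.join(*parts)


def from_windows_literal(relative_path: str) -> str:
    # Split a hard-coded, backslash-separated relative path and re-join it portably
    return portable_path(*relative_path.split("\\"))


if __name__ == "__main__":
    print(portable_path("templates", "default.js"))       # templates/default.js on Linux
    print(from_windows_literal("templates\\default.js"))  # templates/default.js on Linux
```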

    A word of advice: when installing Mono through apt-get, note that many assemblies have their own packages. To make JS Builder work, make sure you have installed libmono-system-runtime2.0-cil and libmono-winforms2.0-cil.

    The result is not graphically appealing, but at least it does not crash, and it even seems to do some work.


    [Screenshot: JS Builder 1.1.2 on Mono]

    Dojo demos

    Sunday, July 22nd, 2007

    After I wrote my last post about Dojo v0.9, there was an interesting discussion in the Dojo forum (here, and also here), where some nice Dojo developers helped me better understand the scope and rationale behind Dojo. From that discussion I gathered that the Dojo beta is not feature-complete and that we must wait for the release of v1.0 to see what it will look like once finished. This is especially true for the documentation, which is currently the major weak point and, I must say, a source of much frustration in using Dojo.

    In the meantime, to track the development of the Dojo toolkit and test some of its functionality, I have created a page where I intend to publish some of the tests I am doing. To begin with, I have ported the Mail demo from v0.4 to the current Dojo version (v0.9).

    Dojo 0.9 not ready yet

    Wednesday, July 18th, 2007

    Today I gave Dojo 0.9beta another try. I wasted a few hours and then went back to v0.4.

    Dojo is a somewhat unsettling project. On one hand, it looks like a great JavaScript toolkit: it gets many good reviews, and it has nice widgets and a sound programming design. On the other hand, its developers seem to do their best to discourage people from using it.

    Their site is slow and badly designed, the software that runs their forum is crude, there are practically no demos, the examples that come with the sources are too basic, and there are no documents that give you the big picture, so too often you must resort to reading the source code to understand how to use it. But what is worse, the transition from v0.4 to v0.9 is a real pain in the ass.

    First of all, your code based on v0.4 doesn’t work any more, and you have to do a real port to use v0.9. They wrote a porting guide to help with it, but after reading it I had no clue what to do. In other words, the porting guide does not guide you much. They decided that the 0.4 widgets did too much, so the 0.9 widgets are a stripped-down version of what they had before. What is the poor guy who invested in v0.4 supposed to do now that the bloated features on which he based his design have been removed?

    It is true that they say they plan to add enhanced widgets that bring back the removed features, but I would have preferred that they stated somewhere that this beta is feature-incomplete and advised people not to attempt a port right now.

    I hope that they address these problems in the future since I really like this toolkit.

    Printing with DOMPDF

    Sunday, July 15th, 2007

    My last programming problem was printing a report from a web application. In the past I solved it using Internet Explorer’s print templates, but that solution had two drawbacks: first, it required IE 5.5 or later; second, page customizations (margins, orientation, headers and footers) required downloading an ActiveX control. This was an acceptable requirement at the time, because the application was used on an intranet.
    Now that I have moved to Linux and Open Source software, that is no longer feasible, so I had to find another solution.
    The IT market is full of reporting tools for the web, but most of them are expensive and require some kind of plug-in, since the browser alone cannot render the printed page precisely. Yes, CSS has something to say about print media, but these properties are mostly ignored by the current crop of browsers.
    And then there is my main requirement: all the software used must be Open Source.
    After some thought I decided to produce a PDF version of my reports. A quick Google search gave me a number of products to try. I could divide them into three categories:

    1. raw PDF libraries
    2. XSL-FO
    3. conversion tools from other formats (e.g. HTML)

    I didn’t even consider the first option. XSL-FO looked appealing at the beginning, but when I looked more deeply into what it was about, I gave up. BTW, just reading the word XML was enough to make me shiver.
    The third approach seemed much simpler: write the report in HTML, give it to the converter program and send the result to the user. Too good to be true: I didn’t have to learn another language and could even reuse the old reports written for IE print templates. The best known of these tools is htmldoc, but unfortunately it is a very crude product. It interprets only HTML 3.2 and no CSS. Not very useful. Other products that I downloaded did not look better than htmldoc. I was somewhat amazed that such a simple and useful concept, an HTML to PDF converter, had not found supporters in the Open Source world able to produce a decent utility.
    Then I discovered dompdf. It is an impressive library. It supports most CSS directives (the notable exceptions being absolute positioning and floating). The first report that I tried worked like a charm: I took my HTML, passed it through this utility, and my PDF was there. OK, I am being too enthusiastic. I found a couple of problems when I tried to bend it to some special needs that I had, but nothing that some searching in the documentation and in the forum couldn’t solve.
    So, to help others who might encounter the same problems, I thought I would add a few points to the dompdf FAQ here:

    1. how can I make it faster?
    Up to version 0.5.1, the library calls the PHP uniqid() function on one line. Since this call is slow and is made once for each DOM node in the document, it slows down the conversion noticeably. The workaround consists of changing line 171 in the file frame.cls.php:

    $this->set_id(uniqid(rand()));
    with:
    global $dompdf_unique_id;
    if (isset($dompdf_unique_id)){
        $dompdf_unique_id++;
    }else{
        $dompdf_unique_id = 1;
    }
    $this->set_id( $dompdf_unique_id );
    

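    Alternatively, change the same line to the following (with the more_entropy flag set, uniqid() does not need to pause to guarantee a unique value):

    $this->set_id(uniqid('', true));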
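The point of the patch is that frame IDs only need to be unique within one rendering, so an expensive unique-ID generator can be replaced by a plain counter. A Python sketch of the same idea, with a hypothetical Frame class standing in for dompdf's frames (it is not dompdf code):

```python
import itertools


class Frame:
    """Hypothetical stand-in for a layout frame that needs a unique ID."""

    # One shared counter: each new frame takes the next integer.
    # No clock reads or random numbers, so creating an ID is just an increment.
    _next_id = itertools.count(1)

    def __init__(self) -> None:
        self.id = next(Frame._next_id)


if __name__ == "__main__":
    frames = [Frame() for _ in range(3)]
    print([f.id for f in frames])  # [1, 2, 3]
```

Like the PHP global, the counter is shared by every frame created in the process, so IDs never repeat even if several documents are rendered in a row.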

    2. how can I put some data coming from the DB into the header/footer?
    See Q.4 in the official FAQ. In the script, declare the variable you want to use as global, like this:

    global $my_var;

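    3. how can I put a background on every page?
    Draw the background into a PDF object and add that object to all pages. For instance, to print "DRAFT" across every page, put this in the inline <script type="text/php"> block:

    if ($draft)         // $draft: your own flag; if TRUE print "DRAFT" across every page
    {
        $h = $pdf->get_height();             // page height in points, used to place the text
        $obj_draft = $pdf->open_object();
        $pdf->text(200, $h - 300, "DRAFT",
                   Font_Metrics::get_font("verdana", "bold"),
                   110, array(0.8, 0.8, 0.8), 0, -52, "Darken", 1);
        $pdf->close_object();
        $pdf->add_object ($obj_draft, "all");
    }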
    

    4. I followed the instructions in Q.4 but it doesn’t work?
    Check that the tag

    <script type="text/php">

    is INSIDE the body tag.

    5. how do I print a page generated from PHP?
    All the examples in the official FAQ use the

    $dompdf->load_html($html);

    function, where $html is a string containing the document to print. But what if the document is the output of the current PHP page? The trick is to buffer the PHP output and then call load_html() on the contents of the buffer, like this:

    
    <?php
    require_once("dompdf_config.inc.php");   // dompdf setup file; adjust the path to your installation
    ob_start();    // start buffering
    ...
    // whatever
    ...
    $dompdf = new DOMPDF();
    $out = ob_get_clean();   // get the content of the buffer and clear it
    $dompdf->load_html($out);
    $dompdf->render();
    $dompdf->stream ("Filename.pdf");
    ?>
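The same capture-then-convert pattern, sketched in Python: contextlib.redirect_stdout() collects everything a function prints into a buffer, much like ob_start() and ob_get_clean(). render_page() is a hypothetical page generator, and the PDF conversion step itself is left out.

```python
import io
from contextlib import redirect_stdout


def render_page() -> None:
    # Hypothetical page generator that writes HTML to standard output,
    # the way a PHP page echoes its markup
    print("<html><body>")
    print("<h1>Monthly report</h1>")
    print("</body></html>")


def capture_output(func) -> str:
    """Run func with stdout redirected into a buffer and return what it printed."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):  # like ob_start()
        func()
    return buffer.getvalue()       # like ob_get_clean(): take the buffered text


if __name__ == "__main__":
    html = capture_output(render_page)
    # html would now be handed to an HTML-to-PDF converter instead of being printed
    print(len(html.splitlines()))  # 3
```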
    

    6. ERROR: Unable to stream pdf: headers already sent
    I came across this error a few times. The cause was always three invisible characters at the beginning of the file, the result of using an editor set to a different character encoding (e.g. UTF-8, Latin 1, …).
    Try running the command

    hexdump -C filename | less
    

    and see what the initial characters are.
    Beware that few editors let you remove them; Emacs is one that does.
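Three invisible bytes at the very start of a file are typically the UTF-8 byte order mark (EF BB BF), which some editors add when saving as UTF-8. Anything before <?php is sent to the browser as output, which forces PHP to send the HTTP headers before dompdf can stream the PDF. A small Python sketch that checks for the mark and strips it; the script name in the usage comment is hypothetical, and this is a standalone helper, not part of dompdf.

```python
import codecs
import sys

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if it has one."""
    return data[len(BOM):] if data.startswith(BOM) else data


def fix_file(path: str) -> bool:
    """Rewrite the file at path without its BOM; return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_bom(data)
    if cleaned == data:
        return False
    with open(path, "wb") as f:
        f.write(cleaned)
    return True


if __name__ == "__main__":
    # Usage (hypothetical script name): python strip_bom.py page1.php page2.php
    for name in sys.argv[1:]:
        if fix_file(name):
            print(f"removed BOM from {name}")
```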

    7. ERROR: Nesting level too deep - recursive dependency? in (…)/dompdf/include/table_frame_decorator.cls.php on line 143
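    This error often occurs when a table spans two or more pages. DOMPDF tries to split it and repeat the THEAD on the following page. In doing so, it calls the PHP in_array() function, which gives this error. Without a third argument, in_array() compares with ==, which for objects compares their properties recursively; since dompdf frames reference one another, that recursion has no end. Passing true makes it use ===, which only checks whether two objects are the same instance. There are two workarounds: one is deleting the TBODY tag, the other is patching the split() function in table_frame_decorator.cls.php by adding a third argument (true) to the three in_array() calls found there, more or less like the following code: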

        // If $child is a header or if it is the first non-header row, do
        // not duplicate headers, simply move the table to the next page.
        if ( count($this->_headers) && !in_array($child, $this->_headers, true) &&
             !in_array($child->get_prev_sibling(), $this->_headers, true) ) {
    
        ...
    
        } else if ( in_array($child->get_style()->display, self::$ROW_GROUPS, true) ) {
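Python has the same trap: `x in some_list` falls back to ==, and if == compares objects field by field while the objects point back at each other, the comparison recurses until the interpreter gives up. A sketch with a hypothetical Frame dataclass (not dompdf's class); contains_same_object() performs an identity check, which is what in_array() does for objects when strict is true.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(repr=False)  # eq=True by default: == compares all fields, like PHP's loose ==
class Frame:
    """Hypothetical layout node; each child keeps a reference back to its parent."""

    name: str
    parent: Optional["Frame"] = None
    children: List["Frame"] = field(default_factory=list)


def make_table(name: str) -> Frame:
    table = Frame(name)
    table.children.append(Frame("row", parent=table))  # creates a reference cycle
    return table


def contains_same_object(items, obj) -> bool:
    # Identity test, the counterpart of in_array($obj, $items, true) for objects
    return any(item is obj for item in items)


if __name__ == "__main__":
    headers = [make_table("table")]
    other = make_table("table")  # equal-looking structure, but a different object
    print(contains_same_object(headers, other))       # False
    print(contains_same_object(headers, headers[0]))  # True
    try:
        _ = other in headers  # uses ==, which keeps following parent/children links
    except RecursionError:
        print("loose membership test hit the recursion limit")
```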
    

    8. ERROR: Frame not found in cellmap
    I encountered it when using the CSS property ‘page-break-inside: avoid’. A possible workaround is reported in the DOMPDF forum and consists of replacing all occurrences of

    throw new DOMPDF_Internal_Exception("Frame not found in cellmap");

    with

    return false;

    I don’t know of any side effects, but it has worked for me so far.
    Another fix that worked for me on one occasion was changing the element that carried ‘page-break-inside: avoid’, in that case from DIV to P.