
Friday, December 28, 2012

SELECTing the comedy horror genre

Whenever we get a new intern, I like to poke and prod him*, to see what he's made of (bones and gooey bits, usually). To test SQL proficiency, I use the following problem. I like it because it's something that comes up occasionally in projects, and is surprisingly complex, though it seems like it shouldn't be:

Suppose you've got a movie database, with a table of movies, T_MOVIES:

CREATE TABLE [dbo].[T_MOVIES](
 [movie_key] [int] NOT NULL,
 [movie_name] [nvarchar](500) NOT NULL,
 CONSTRAINT [PK_T_MOVIES] PRIMARY KEY CLUSTERED 
(
 [movie_key] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

and there's also a table for movie genres T_GENRES:

CREATE TABLE [dbo].[T_GENRES](
 [genre_key] [int] NOT NULL,
 [genre_name] [nvarchar](500) NOT NULL,
 CONSTRAINT [PK_T_GENRES] PRIMARY KEY CLUSTERED 
(
 [genre_key] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

To link both tables there is a third table T_MOVIE_GENRES:

CREATE TABLE [dbo].[T_MOVIE_GENRES](
 [movie_key] [int] NOT NULL,
 [genre_key] [int] NOT NULL,
 CONSTRAINT [PK_T_MOVIE_GENRES] PRIMARY KEY CLUSTERED 
(
 [movie_key] ASC,
 [genre_key] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

Assume the proper foreign key constraints have been applied, and you have a pretty sensible layout. To select all movies in the horror genre, you only have to join T_MOVIES to T_MOVIE_GENRES and filter on the horror genre's key in the WHERE clause. Suppose that key is 5:

SELECT
 T_MOVIES.movie_key,
 T_MOVIES.movie_name
FROM
 T_MOVIES INNER JOIN
 T_MOVIE_GENRES ON
 T_MOVIES.movie_key = T_MOVIE_GENRES.movie_key
WHERE
 T_MOVIE_GENRES.genre_key = 5

The question now is: What if I want to filter by two genres? How do I find movies that are not only horror, but also comedy? What query will yield "Shaun of the Dead"?
More generally, how do we find those records in a table that have more than one matching record in another table, where those matching records have a field set to a set of specific values?

Obviously, with comedy as genre 1, WHERE T_MOVIE_GENRES.genre_key IN (1,5) is going to yield movies that belong to either genre, not necessarily both.
At this point usually the intern scratches his chin, gives it a bit of thought, and comes up with this:

SELECT
 T_MOVIES.movie_key,
 T_MOVIES.movie_name
FROM
 T_MOVIES INNER JOIN
 (
 SELECT  T_MOVIE_GENRES.movie_key
 FROM T_MOVIE_GENRES
 WHERE T_MOVIE_GENRES.genre_key = 5
 ) AS Q_HORROR
 ON T_MOVIES.movie_key = Q_HORROR.movie_key
  INNER JOIN
 (
 SELECT  T_MOVIE_GENRES.movie_key
 FROM T_MOVIE_GENRES
 WHERE T_MOVIE_GENRES.genre_key = 1
 ) AS Q_COMEDY
 ON T_MOVIES.movie_key = Q_COMEDY.movie_key

(Or some other solution involving a subquery.)
Okay, fine, you found "Shaun of the Dead". But what if I'm actually in the mood for Zombieland, a comedy / horror / road movie? In fact, I want a stored procedure that can take an arbitrary number of genres and filter by them. You could go the dynamic SQL route and build a string with an arbitrary number of subqueries. It'd be ugly and hard to maintain, but it would work.
The following shows what I think is the best solution though:

SELECT
 T_MOVIES.movie_key,
 T_MOVIES.movie_name
FROM
 T_MOVIES INNER JOIN
 T_MOVIE_GENRES
 ON T_MOVIES.movie_key = T_MOVIE_GENRES.movie_key
WHERE
 T_MOVIE_GENRES.genre_key in (1,5)
GROUP BY
 T_MOVIES.movie_key,
 T_MOVIES.movie_name
HAVING
 COUNT(*) = 2
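To see why GROUP BY with HAVING COUNT(*) = 2 keeps only the movies that are in both genres, here is the same logic applied to in-memory rows. It's a sketch in JavaScript, assuming rows shaped like T_MOVIE_GENRES ({ movie_key, genre_key }), comedy = 1 and horror = 5 as in the queries above; the road movie genre 7, the movie keys and the function name moviesInAllGenres are made up for illustration.

```javascript
// Rows mirror T_MOVIE_GENRES: one row per (movie_key, genre_key) pair.
// Like the table's primary key, we assume there are no duplicate pairs.
function moviesInAllGenres(movieGenres, genreKeys) {
  const wanted = new Set(genreKeys); // duplicates collapse, like SELECT DISTINCT
  const counts = new Map();          // movie_key -> number of matching rows

  // WHERE genre_key IN (...) followed by GROUP BY movie_key
  for (const row of movieGenres) {
    if (wanted.has(row.genre_key)) {
      counts.set(row.movie_key, (counts.get(row.movie_key) || 0) + 1);
    }
  }

  // HAVING COUNT(*) = number of distinct requested genres
  const result = [];
  for (const [movieKey, count] of counts) {
    if (count === wanted.size) result.push(movieKey);
  }
  return result;
}

// Hypothetical data: 10 = comedy only, 20 = comedy + horror,
// 30 = comedy + horror + road movie (7), 40 = horror only
const rows = [
  { movie_key: 10, genre_key: 1 },
  { movie_key: 20, genre_key: 1 },
  { movie_key: 20, genre_key: 5 },
  { movie_key: 30, genre_key: 1 },
  { movie_key: 30, genre_key: 5 },
  { movie_key: 30, genre_key: 7 },
  { movie_key: 40, genre_key: 5 },
];

console.log(moviesInAllGenres(rows, [1, 5]));    // [ 20, 30 ]
console.log(moviesInAllGenres(rows, [1, 5, 7])); // [ 30 ]
```

Comparing against wanted.size rather than genreKeys.length plays the same role as the SELECT DISTINCT in the stored procedure: a key that is passed twice doesn't inflate the required count.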

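Because the primary key on T_MOVIE_GENRES guarantees that each (movie, genre) pair appears at most once, a movie that is left with two rows after the IN filter must belong to both genres. To refactor this into the aforementioned stored procedure, you need to add a table variable and use a little bit of dynamic SQL, because SQL does not have anything like arrays. Something like this: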

CREATE PROCEDURE GetMoviesByGenres
 @genre_keys nvarchar(2000)
AS
BEGIN
 DECLARE @genre_keys_table table(genre_key int)
 DECLARE @genre_count int
 
 INSERT INTO @genre_keys_table
   EXEC(N'SELECT DISTINCT genre_key FROM T_GENRES WHERE genre_key IN (' + @genre_keys + N')')
 
 SELECT @genre_count = COUNT(*) FROM @genre_keys_table
 
 SELECT
  T_MOVIES.movie_key,
  T_MOVIES.movie_name
 FROM
  T_MOVIES INNER JOIN
  T_MOVIE_GENRES
  ON T_MOVIES.movie_key = T_MOVIE_GENRES.movie_key
 WHERE
  T_MOVIE_GENRES.genre_key IN (
   SELECT genre_key FROM @genre_keys_table
  )
 GROUP BY
  T_MOVIES.movie_key,
  T_MOVIES.movie_name
 HAVING
  COUNT(*) = @genre_count
END

If it wasn't clear from the code: @genre_keys takes a comma-delimited list of keys, such as N'1,5'. Because that string is concatenated straight into the EXEC, it leaves an obvious SQL injection vulnerability, so I would advise having your calling code take an array of ints, not a string.
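Here is roughly what that could look like on the calling side. It's a minimal sketch in JavaScript (the real caller could be written in anything); toGenreKeyList is a hypothetical helper, not part of the original. It refuses anything but an array of integers in SQL Server's int range, so by the time the string reaches the EXEC it can only contain digits, minus signs and commas.

```javascript
// Hypothetical helper: builds the comma-delimited string that
// GetMoviesByGenres expects from an array of integer genre keys.
// Every element must be an integer in the SQL int range, so the
// result can only contain digits, '-' and ','.
function toGenreKeyList(genreKeys) {
  // An empty list would produce "IN ()", which is a syntax error
  if (!Array.isArray(genreKeys) || genreKeys.length === 0) {
    throw new TypeError("genreKeys must be a non-empty array");
  }
  for (const key of genreKeys) {
    if (!Number.isInteger(key) || key < -2147483648 || key > 2147483647) {
      throw new TypeError("not a valid int genre key: " + String(key));
    }
  }
  return genreKeys.join(",");
}

console.log(toGenreKeyList([1, 5])); // 1,5

try {
  toGenreKeyList(["1); DROP TABLE T_MOVIES; --"]);
} catch (e) {
  console.log(e.message); // not a valid int genre key: 1); DROP TABLE T_MOVIES; --
}
```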
So that pretty much concludes that. If any of you ever end up as an intern at Tabeoka, you can now impress me with your mad SQL skillz.

Menno




* "him" because we've only ever gotten male interns. I'm pretty sure I could get in trouble for poking and prodding a girl intern anyway. Both with the law, and my wife.

Monday, December 17, 2012

Html and Svg: handling events to and fro

I've been working on a website where I embed a couple of SVGs in an HTML page. Actually, I embed the same SVG in three places on one page. The SVG in question has some script-driven animation, so I needed to use an object tag rather than a simple img.
The first challenge was this: I needed to trigger the animation in each of the SVGs, one after the other, which meant I had to call into each SVG's script:

        function bounceBall(ball) {
            var svgView = getSvgView(ball);
            
            if(svgView)
                svgView.startBounce();
        }

        function getSvgView(ball) {
            var svgDoc;
            try {
                if (ball.getSVGDocument)
                    svgDoc = ball.getSVGDocument();
                else if (ball.contentDocument)
                    svgDoc = ball.contentDocument;

                if (svgDoc) {
                    return svgDoc.defaultView;
                }
            } catch (e) { }
            return null;
        }

When I first call this function, the SVG DOM might not be loaded yet. That's why I include the try/catch and return null if an error occurs. Elsewhere in the code I use setTimeout to retry 200 milliseconds later; the onload event is not quite reliable enough.

Next up was the click handler. If you simply add an onclick to the object tag, nothing happens when you click the SVG. That makes perfect sense: the click is registered and handled within the SVG, and never makes it to the HTML DOM. I did need a click on the image to open a little div in the HTML, though. From the SVG I could call the HTML page's JavaScript functions using top.someFunctionName(), but I had the same SVG image three times, and it needed to do something different each time. Here's what I came up with:

Html:

<object type="image/svg+xml" data="/Content/Images/ball.svg" class="ball" style="left: 130px; top: 160px;" onclick="showPopup('homepopup2');"></object>

"But wait!" you say, "You just told me that doesn't work!" And it doesn't, but it would be pretty convenient if it did:

Html dom javascript:

        function setClickEvent(ball) {
            if (ball.onclick) {
                var svgwin = getSvgView(ball);
                if (svgwin) {
                    svgwin.eventHandler = ball.onclick;
                } else {
                    // if the svg view is not available,
                    // try again in 200 ms.
                    window.setTimeout(function() {
                        setClickEvent(ball);
                    }, 200);
                }
            }
        }

There you go. On load, I simply funnel the object's onclick handler into the SVG DOM. The SVG implementation is trivial:

    var eventHandler = null;

    function handleClick(){
      if(eventHandler)
        eventHandler();
    }


<circle cx="15" cy="15" r="5" id="ball" onclick="handleClick();" />

If you need this sort of thing more often, or for more events, you could work out a neat reusable wrapper with registerEventHandler(eventName, eventHandler) and triggerEvent(eventName) methods (and an HTML-side script that hooks it up automatically). I don't currently foresee a need for it myself, though.
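A wrapper along those lines might look like this. It's a sketch, not something I've used in production: only the method names registerEventHandler and triggerEvent come from the idea above, while createSvgEventBridge and the global eventBridge are hypothetical names. It's written in ES5 style to match the era of the rest of the code, and Object.create(null) keeps event names like "toString" from hitting inherited properties.

```javascript
// Hypothetical event bridge living inside the SVG document's script.
// The SVG calls triggerEvent from its own handlers (for example
// onclick="eventBridge.triggerEvent('click');"), and the HTML page
// registers handlers once getSvgView(ball) returns the SVG's window.
function createSvgEventBridge() {
  var handlers = Object.create(null); // event name -> array of handlers

  return {
    registerEventHandler: function (eventName, eventHandler) {
      if (!handlers[eventName]) handlers[eventName] = [];
      handlers[eventName].push(eventHandler);
    },

    // Calls every handler registered for eventName, passing along any
    // extra arguments; returns the number of handlers that ran.
    triggerEvent: function (eventName) {
      var list = handlers[eventName] || [];
      var args = Array.prototype.slice.call(arguments, 1);
      for (var i = 0; i < list.length; i++) list[i].apply(null, args);
      return list.length;
    }
  };
}

// Simulated usage outside a browser:
var eventBridge = createSvgEventBridge();
eventBridge.registerEventHandler("click", function () {
  console.log("showPopup('homepopup2')");
});
eventBridge.triggerEvent("click");     // logs: showPopup('homepopup2')
eventBridge.triggerEvent("mouseover"); // no handlers registered; returns 0
```

On the HTML side, setClickEvent would then call svgwin.eventBridge.registerEventHandler("click", ball.onclick) instead of assigning svgwin.eventHandler.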

Menno

Tuesday, December 4, 2012

Anything but restful

Last week I was asked to integrate some "webservices" into a .NET application. I say "webservices", but that word can mean anything from SOAP to custom format XML to some icky CSV that uses the asterisk for a delimiter.
I was lucky, though: the URL made it obvious that this webservice was WCF-based, with some methods returning JSON and equivalent methods returning XML. All request parameters are to be sent as JSON through the querystring.
I'm an optimistic fool, so I simply use "Add Service Reference" to add a link to the service, and all seems well. Until I try to actually use the service, that is: my application throws up the following:

Could not find default endpoint element that references contract 'IServiceInterface' in the ServiceModel client configuration section. This might be because no configuration file was found for your application, or because no endpoint element matching this contract could be found in the client element. 

Sure enough, when I check in my configuration file, the following has been helpfully added:

<configuration></configuration>

Well thank you very much, Visual Studio. You're a great help.
Google turns out to be more of a help, and points me to an article telling me: what you're trying to do does not work and will not work. Because they're so RESTful. Instead I should be using the WebChannelFactory class to generate a channel, and use that to call the webservice.
No problem.
I re-use the proxy classes generated by my non-working service reference (but delete the wrapper classes), and point the WebChannelFactory at the generated service interface. Does that work now?
No, of course it doesn't work: "405 http method POST is not supported by this url".
Sure enough, Fiddler shows that the service is being called using a POST of the request parameter serialized as XML. "Stupid boy," Google tells me again, "you should be adding the WebGet attribute to your method, and if you want JSON, you need to add a behavior to your endpoint that selects the JsonQueryStringConverter." "And while you're at it," Google continues, "don't forget to define a UriTemplate for your WebGet attribute."
Done, done and done.
Now does it work? No, of course it bloody well doesn't work. No more errors, but no deserialized data either. The JSON-serialized data appears to be wrapped in a single key, 'd', for all requests. The class structure doesn't match the JSON hierarchy, so nothing gets deserialized. So I wrap all my return types in a small generic class:

  // Mirrors the {"d": ...} envelope the service puts around every response
  public class JsonWrapper<T>
  {
    private T _d;
    public T d {
      get { return _d; }
      set { _d = value; }
    }
  }
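On the wire, the wrapper just peels off one level of nesting. Here is a minimal sketch of the same unwrapping in JavaScript, assuming response bodies shaped like the ones described above (the payload under a single d key); unwrapD and the sample movie fields are hypothetical.

```javascript
// Unwraps a JSON response whose payload is nested under a single "d" key,
// e.g. {"d": {...}}, which is the shape JsonWrapper<T> mirrors in C#.
// Responses without a "d" key are returned as parsed.
function unwrapD(responseText) {
  const parsed = JSON.parse(responseText);
  if (parsed !== null && typeof parsed === "object" &&
      Object.prototype.hasOwnProperty.call(parsed, "d")) {
    return parsed.d;
  }
  return parsed;
}

// Hypothetical response body
const body = '{"d":{"movie_key":42,"movie_name":"Zombieland"}}';
console.log(unwrapD(body).movie_name); // Zombieland
```

Falling back to the parsed value when there is no d key is a choice of this sketch; for a service that wraps every response, returning parsed.d directly would do.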

Finally I start seeing data. WCF on the one side, WCF on the other side, and nothing works by itself. Some of the articles I was reading had the gall to tell me this was because WCF has great extensibility. Is "extensible" newspeak for "does nothing useful out of the box"?
Sure, it is very extensible, but in the time it took me to figure out how to get it to call a simple JSON webservice, I could just as easily have used the WebClient class to call the URL and run the returned data through Newtonsoft's Json.NET. In fact, I could have done it three times.
If the service reference is able to properly generate proxy classes, why can't the service reference also send the necessary metadata to configure all that stuff I had to do manually? Where was the added value in all that WCF stuff? Is this a useful abstraction, or needless obfuscation?
Now that I've got a bit more of a handle on how it works, I'll give it another chance. But I'm not sold.

Menno

Monday, November 26, 2012

SVG is such a tease

A couple of days ago I was asked to look into animations that would work on iPad. You could use CSS3 transformations, but I was excited to see what I could do in SVG instead. Why? Because why not!
Off I went, and not much later I had this: http://www.tabeoka.be/downloads/svg/blokskes.svg . Cute. Works smoothly on all browsers that matter. It's a little jittery on Opera.
Turns out SVG is a lot of things I've wanted HTML to be in the past. Have you ever created a DIV just to make a rectangle? SVG is for you. It's HTML for making pretty pictures. It supports css and most of the javascript DOM methods you're used to. Pick it up and go.
Back to my image though: I could slap an image on there, and pretend the website is being swallowed by a maelstrom. Right?
Well, here's a unicorn with an unfortunate case of being torn into rectangles: http://www.tabeoka.be/downloads/svg/blokskes_fill.svg . The most unfortunate part, though, is that the unicorn is being torn up very slowly. Unfortunate for the unicorn, certainly, but more unfortunate for my ambition to animate stuff using SVG. What's up with that?

I must be doing something wrong. I can't believe my computer can render this, but not chop up a unicorn (where are its priorities?). Internet Explorer 9 is supposed to have hardware accelerated svg rendering. If this is the result, it must have needed that acceleration real bad. And I can believe Microsoft messed up SVG performance in IE, but every damn browser is slow!
In the past I did some animation where I would suspend rendering while updating the object hierarchy, and then resume rendering. But an article I found suggests browsers are not that stupid; that's too bad. I think I have to conclude that fully SVG-driven animations are not quite ready for prime time. But when the performance catches up, the object model will be ready. I like SVG, but it's such a tease.

Menno

Monday, November 19, 2012

Drupal: open-source CMS

Drupal has been pretty popular with the various Belgian governments. Any new websites created for a government agency are supposed to use an open-source CMS. Drupal's popularity (and its Belgian roots?) have made it the perfect candidate. Heck, even the king uses it. A new initiative is Drupal-as-a-service: pick your features and click "Create website". Off you go!
With the government's zeal to convert everything to open source, the buzz has been increasing. Universities request websites built on Drupal; marketing people want their new website in "Druple".

So what's up with this love for open source? I will now wield my razor-sharp intelligence to shamelessly make up what I think is most likely:

  • Avoiding vendor lock-in
    Every few years, the government is required to issue a call for tenders, get price quotes from a number of parties, and then pick the best price quote.
    If a project has been written using a proprietary framework/platform/language/etc., the best candidate will likely be whoever initially set up the project. Any other candidate would need to either swallow the costs of a steep learning curve, or be more expensive.
    Not so for open source. Anyone who knows Drupal/Joomla/WordPress/But mostly Drupal can pick up a website and go! Money savings galore!
  • Open data
    Whether you have access to the source of your CMS or not, the data on your site is yours. If you switch to a different system, you want to keep your data. If a proprietary system perishes, your data could perish with it. In an open source system, you have the ultimate data spec: the source code.
  • Security updates
    Whenever a security leak is found, noble developers will flock to it, and release a security patch in no time at all. Government sites need to be secure, right?
  • Oh! The modules!
    If you have a Drupal-based website and you want to add an online shop, you only need to install Drupal Commerce. Works instantly! Hardly any configuration required! Money savings galore!
    More to the government's point: integrate with the Belgian electronic ID card (which currently has some security issues).
  • Open source means no license means cheap! Right?
    No it doesn't. And I doubt the government is stupid enough to believe this.
Some of those are pretty solid reasons. I want the government to spend my tax money wisely (frugally even), and store my personal data on secure servers! Who wouldn't?

But here's the thing: when we make a Drupal website, the customer comes back and tells us

"We don't like the way this editing form works. Can you make it do x and y instead?"

And we can. The customer's request makes sense: core Drupal is pretty spartan, and a lot of modules are a tad confusing. We just code up a nice custom module that does exactly what the customer wants. 
But here's the catch:
Those custom modules are neither open source nor necessarily secure (nor, for that matter, very well written).
That includes some modules that I've written. It's vendor lock-in all over again. 
We inherited a Drupal project from a competitor, and for the last few weeks I've been poring over custom modules that implement some form of URL rewriting using taxonomy terms. It's a very neatly commented and indented mess, and has a fair number of bugs (I'm not sure I could do better). I'm familiar enough with Drupal, but my employer has had to swallow these costs anyway. You can't just pick up a major website and run with it. Open source doesn't matter that much here.

As far as the open data is concerned, I think that's a valid point. I have written a proprietary CMS in .NET, and I encrypt all website data using AES-256 before storing it in the database using an 'optimized' version of Base64. I double-dare customers to switch to a competitor (I kid, of course). But I could do such a thing, call it 'Enterprise-level security', and sell it to the banking industry.
Using open source, getting my data is not necessarily easier, but I know it can be done.

Lastly, about the security updates. The sysadmin is usually the weak link here. Nobody is likely to ever find a security hole in my proprietary CMS, because nobody cares. There will, with 100% certainty, be security updates for Drupal. For all eternity: either you patch or you're vulnerable. My CMS may actually be safer than Drupal. Not by virtue of its technical qualities, but just because it's a tiny fish in a pretty small pond.
After all: wasn't Apple repeatedly the first to be hacked at Pwn2Own?

Menno

Monday, October 22, 2012

SSMS: OutOfMemoryException executing a large SQL batch

So you're executing some beast of an SQL batch using Management Studio, for instance the result of "Generate Scripts" on another database. Then all of a sudden you get an OutOfMemoryException. Just keeping the entire script in memory, poor SSMS is already hanging on by its fingernails. How could you expect it to also execute that file and give you the results?

Thankfully, Microsoft provides a command-line SQL Server client together with SQL Server. If you're not able to import a large SQL file using SSMS, open a Command Prompt, navigate to the Binn folder of your SQL Server installation (C:\Program Files\Microsoft SQL Server\100\Tools\Binn here), and type the following:

osql -S databaseserver -U username -P password -d DemoDatabase -i c:\demo.sql

That executes the file C:\demo.sql on databaseserver in the context of DemoDatabase using the login data provided. This will scroll a bunch of query results in your command prompt window. If you'd rather examine these results in detail later, the -o parameter writes this info to an output file:

osql -S databaseserver -U username -P password -d DemoDatabase -i c:\demo.sql -o c:\output.txt

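That should work, even when SSMS chokes on the sheer size of your query file. (osql is deprecated; sqlcmd, which ships alongside it, accepts the same -S, -U, -P, -d, -i and -o options.)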

Monday, October 15, 2012

text-transform: uppercase subtleties

So last week I received an interesting question from a customer. He complained that in Chrome on Mac, the category 'Soßen und Dips' ("Sauces and dips") on the (obviously) German version of the site was rendered as 'SOSSEN UND DIPS' (wrong), rather than 'SOßEN UND DIPS' (right). Notice that ß and SS are semantically identical: a latin1-collated database will consider those strings to be identical.
On Internet Explorer, the ß was displayed fine.
It didn't take long to figure out that the menu item had text-transform: uppercase applied, and that this caused the transformation of ß to SS. What's more, Safari and Firefox also displayed SS instead of ß. My Google search led me to https://bugzilla.mozilla.org/show_bug.cgi?id=354451 , which indicated that transforming ß to SS was deliberate, and not a bug at all. Der Spiegel suggests that ß should, in capitals, always become SS.
So now I only need to convince the customer that their browser can spell their language better than they can.
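For what it's worth, JavaScript's toUpperCase applies the same Unicode default case mapping, so you can reproduce the browsers' behavior in any console or in Node.js. This is just a quick check of the mapping, not a claim about how any browser implements text-transform internally:

```javascript
// Unicode's default full case mapping turns ß (U+00DF) into "SS",
// so uppercasing changes the string's length.
const category = "Soßen und Dips";
const upper = category.toUpperCase();

console.log(upper);                        // SOSSEN UND DIPS
console.log(category.length, upper.length); // 14 15

// There is a capital sharp s, ẞ (U+1E9E), but the default uppercase
// mapping never produces it; it only maps back down to ß.
console.log("\u1E9E".toLowerCase() === "\u00DF"); // true
```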

Menno




As an interesting aside: a basic test case to check browser behavior shows Internet Explorer 9 also rendering SS, not ß. Changing the document mode down to IE7 shows ß again. The actual online page is in IE9 mode, and does not have an X-UA-Compatible meta tag. The question then becomes: why does it show ß? No idea yet.

Thursday, October 4, 2012

In which I learn about multithreading performance

... or an exercise in optimizing IndexOf on List<T> for multicore.

So I had a great plan.
.NET 4 introduced AsParallel() into LINQ (well, 'PLINQ', but try using that word in conversation without giggling uncontrollably). So how about re-implementing some of these methods in a multithreaded way for .NET 3.5?
To get my feet wet, I decided to start off simple: re-implementing IndexOf on List<T>. Should be easy:

  • Perfect for splitting up, each thread just takes a range of the list
  • No need for critical sections
So I wrote an extension method IndexOfParallel<T>:
  • Creates as many threads as there are logical cores,
  • passes them a list range, and a status object
  • each thread checks its range of the list, sets the status object to found and returns
  • main thread calls .Join() on each created thread
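Splitting the list into one contiguous range per thread is where off-by-one bugs like to hide. The final code at the end of this post computes range i as [Count * i / threadCount, Count * (i + 1) / threadCount) with integer division. Here is the same arithmetic as a short JavaScript sketch (splitRanges is a hypothetical name), showing that the ranges cover the list exactly once even when it doesn't divide evenly:

```javascript
// Splits [0, count) into `parts` contiguous half-open ranges using the same
// integer arithmetic as the C# code: from = floor(count * i / parts),
// to = floor(count * (i + 1) / parts). Range sizes differ by at most one.
function splitRanges(count, parts) {
  const ranges = [];
  for (let i = 0; i < parts; i++) {
    const from = Math.floor((count * i) / parts);
    const to = Math.floor((count * (i + 1)) / parts);
    ranges.push([from, to]);
  }
  return ranges;
}

console.log(splitRanges(10, 4)); // [ [ 0, 2 ], [ 2, 5 ], [ 5, 7 ], [ 7, 10 ] ]
```

One caveat for the C# version: source.Count * i is evaluated in int, so with several hundred million items and eight threads it could overflow; casting to long before multiplying would avoid that.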
Done! Let's check how much better it performs compared to the regular IndexOf!

Looking for a thousand random ints in a list of a million ints:
  • Regular version: 8565064 ticks
  • Multithreaded version: 542255931 ticks
Whoops! Only sixty times slower!

Perhaps a million ints is too small to have the proper effect. Let's try ten million:
  • Regular version: 9131999 ticks
  • Multithreaded version: 149267210 ticks
Well, that's only sixteen times slower now. Progress!
I check the source for IndexOf on List<T>, which uses Array.IndexOf on its internal array. Turns out it uses a native method for basic types. Clearly I can't improve on that. Perhaps I should compare against a List<string> (100,000 items):
  • Regular version: 7044903 ticks
  • Multithreaded version: 93284309 ticks
Thirteen times slower. This is starting to piss me off. Why is it slow? What is slow?
Perhaps I shouldn't be trying to reinvent the wheel. What happens if I just use the overload of IndexOf on List<T> that takes a range instead of writing my own loop? Clearly I'll be losing my early exit, but with enough cores maybe it evens out:
  • Regular version: 7563221 ticks
  • Multithreaded version: 140339244 ticks
Nope! Twenty times slower! When I fired this version off against a List<int> of a million, I gave up waiting for it to finish. Extremely slow. I suspect the native method locks the List's internal array in memory, and that may be marked as a critical section.
Whatever the reason, let's scrap this, and go back to my own loop. What if we unroll the loop, say, four times? If that speeds it up significantly, we can deduce the loop implementation is slowing everything down:
  • Regular version: 7122743 ticks
  • Multithreaded version: 89569552 ticks
A small gain: twelve times slower. Clearly the loop is fine, so thread creation is probably to blame. Let's stop creating our own threads and use the ThreadPool instead:
  • Regular version: 7034966 ticks
  • Multithreaded version: 5879994 ticks
Whaaa! Success! It's not much, but I finally beat the built-in version. Let's see if we can improve it a bit more. How about turning the status class into a struct, and avoiding the property getters by turning them into public fields? Obviously we then need to pass it to the method by ref:
  • Regular version: 6979197 ticks
  • Multithreaded version: 4006500 ticks
Awesome! If we increase the number of strings in the list to a million, the effect increases too:
  • Regular version: 76802790 ticks
  • Multithreaded version: 36655462 ticks
Twice as fast! I'm sure this can still be improved significantly. I still need to figure out at what list size it makes sense to switch to multithreaded. If you're working on a List<T> where T's implementation of Equals is slow, it should do better. There are a number of lessons learned already though:
  • This method's only worth it in a few situations. Mostly just not.
  • Only create new Threads if you will hold onto them for a long time. Creating new threads is expensive.
  • Prefer using the ThreadPool.
  • Measure Measure Measure!
It'll be interesting to see how we manage reimplementing .Where(). The predicate delegate could be pretty expensive. IEnumerable<T> is forward only. Will exporting to List<T> and splitting up be faster? Excitement!

So there we are. We beat the built-in IndexOf.

Menno


Here's, for now, the final result:

using System;
using System.Collections.Generic;
using System.Threading;

namespace Tabeoka.Extensions
{
    public static class ExtensionMethods
    {
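        // Note: if item occurs more than once, this returns whichever match
        // a thread reports, not necessarily the lowest index that
        // List<T>.IndexOf would return.
        public static int IndexOfParallel<T>(this List<T> source, T item)
        {
            int threadCount = GetOptimalThreadCount(source.Count);

            // threadCount is 0 for an empty list; without this check the loop
            // below queues no work items and WaitOne() blocks forever
            if (threadCount <= 1)
                return source.IndexOf(item);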

            SearchStatus status = new SearchStatus()
            {
                Found = false,
                FoundIndex = -1
            };

            // Looks like the ThreadPool always hangs onto at least 
            // # of cores threads, if left unset otherwise
            using (ManualResetEvent resetEvent = new ManualResetEvent(false))
            {
                int threadsFinished = 0;
                for (int i = 0; i < threadCount; i++)
                {
                    int fromIndex = (source.Count * i) / threadCount;
                    int toIndex = (source.Count * (i + 1)) / threadCount;

                    ThreadPool.QueueUserWorkItem(new WaitCallback(delegate(object t)
                    {
                        SearchListRange(source, item, fromIndex, toIndex, ref status);
                        if (Interlocked.Increment(ref threadsFinished) == threadCount)
                            resetEvent.Set();

                    }));
                }

                resetEvent.WaitOne();
            }

            return status.FoundIndex;
        }

        private static int GetOptimalThreadCount(int listCount)
        {
            // needs more sophisticated logic
            return Math.Min(listCount, Environment.ProcessorCount);
        }

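        private static void SearchListRange<T>(List<T> source, T item, int fromIndex, int toIndex, ref SearchStatus status)
        {
            // Same comparer List<T>.IndexOf uses; unlike source[i].Equals(item),
            // it doesn't throw on null elements
            EqualityComparer<T> comparer = EqualityComparer<T>.Default;
            int i = fromIndex;

            // Unrolled loop: only take a block of four while all four indexes
            // are inside [fromIndex, toIndex), so we never read past the range
            // (or past the end of the list, which would throw)
            for (; i + 3 < toIndex; i += 4)
            {
                if (comparer.Equals(source[i], item))
                {
                    status.FoundIndex = i;
                    status.Found = true;
                    return;
                }

                if (comparer.Equals(source[i + 1], item))
                {
                    status.FoundIndex = i + 1;
                    status.Found = true;
                    return;
                }

                if (comparer.Equals(source[i + 2], item))
                {
                    status.FoundIndex = i + 2;
                    status.Found = true;
                    return;
                }

                if (comparer.Equals(source[i + 3], item))
                {
                    status.FoundIndex = i + 3;
                    status.Found = true;
                    return;
                }

                // if some other thread found it; quit searching!
                if (status.Found)
                    return;
            }

            // finishing up: the 0 to 3 elements left at the end of the range
            for (; i < toIndex; i++)
            {
                if (comparer.Equals(source[i], item))
                {
                    status.FoundIndex = i;
                    status.Found = true;
                    return;
                }
            }
        }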
    }

    internal struct SearchStatus
    {
        public bool Found;
        public int FoundIndex;
    }
}
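Two details of SearchListRange are easy to get wrong: an unrolled loop may only take a block of four while all four indexes still fall inside the range, and a parallel IndexOf has to return the lowest matching index to behave like List<T>.IndexOf. Below is a sequential JavaScript sketch of both ideas (the function names are hypothetical, and === stands in for Equals). It doesn't use threads; it only models the per-range search and how the per-range results would need to be combined.

```javascript
// Searches arr[from..to) for item, four elements per iteration, and
// returns the index of the first match in that range, or -1.
function searchRangeUnrolled(arr, item, from, to) {
  let i = from;
  // Unrolled part: only while a full block of four fits inside the range
  for (; i + 3 < to; i += 4) {
    if (arr[i] === item) return i;
    if (arr[i + 1] === item) return i + 1;
    if (arr[i + 2] === item) return i + 2;
    if (arr[i + 3] === item) return i + 3;
  }
  // Remainder: the 0 to 3 elements left at the end of the range
  for (; i < to; i++) {
    if (arr[i] === item) return i;
  }
  return -1;
}

// Models a parallel IndexOf: each range reports its own first match
// (in the C# version, each would run on its own thread) and the caller
// keeps the lowest index any range reported.
function indexOfByRanges(arr, item, parts) {
  let best = -1;
  for (let p = 0; p < parts; p++) {
    const from = Math.floor((arr.length * p) / parts);
    const to = Math.floor((arr.length * (p + 1)) / parts);
    const found = searchRangeUnrolled(arr, item, from, to);
    if (found !== -1 && (best === -1 || found < best)) best = found;
  }
  return best;
}

const data = [3, 7, 1, 7, 9, 2, 7, 5, 8, 6, 4];
console.log(indexOfByRanges(data, 7, 3)); // 1
console.log(indexOfByRanges(data, 4, 3)); // 10
console.log(indexOfByRanges(data, 0, 3)); // -1
```

In a threaded version, a thread could still stop early, but only once a range before its own has reported a match. Stopping as soon as any thread finds something, as the C# code does, is what can make it return a later occurrence.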

Thursday, September 27, 2012

Complaining about Drupal

Kindly allow me to bitch some more about Drupal. I'm a bit frustrated.
  • Array parameters: just about every function in Drupal takes an associative array:
    array(
          'field_name' => 'publication_datum',
          'cardinality' => 1,
          'type'        => 'datetime',
          'settings'    => array (
            'granularity' => array (
              'month' => 'month',
              'day' => 'day',
              'year' => 'year',
              'hour' => 0,
              'second' => 0,
              'minute' => 0,
            ),
          )
        )
    

    This effectively defeats any auto-complete you might have had in your php editor. Of course I see why they did it; it makes everything 'neat', and it certainly is flexible. Except now you're not only unclear about what you should pass into a parameter, you don't even know the parameters. Add-on modules could look for any key, and there's no way to find out, except by proper documentation (which is rare) or poring over the source code.
    Distinct advantage for the lazy coder: if you need to send an extra variable, you can just hitchhike along with any array that's headed in the right direction if you make sure the key isn't taken.
  • Template naming: do you want to override the rendering of a particular element? You could hook into a theme process function and mess around with the render array or whatever, but the easier solution is actually to give a template file a cryptic name with lots of dashes. This file name functions somewhat like a CSS selector: the more specific ones override the more general ones. Thankfully there's a module to help you pick a template name: Theme developer offers a plethora of possible template names for each part of your page. That's pretty handy, but also indicates I'm not the only one who has trouble keeping track.
  • The hooks! Oh, the hooks! There are hundreds of them, and you can go ahead and create your own if the fancy strikes you. Hooks take different numbers of parameters, of different types (and with or without &), and there is no way to find out without, once again, checking the documentation or the source code.
    I know, PHP does not have strong typing, but the small amount of meta data that would otherwise be available is eradicated by kinda sorta duck typing that's going on in Drupal. Again: I can see why they did it, and it is even clever. But it's still frustrating.
The basic problem it comes down to is this: a lack of discoverability of features. When I'm developing a Drupal module I will have dozens of tabs open in my browser, looking for a clue how to use a particular module. Open source is good; it allows me to figure out what's wrong, even if I did not write the code. But just providing the source code is no replacement for proper documentation and sample code. It's just lazy.

Menno

Wednesday, September 5, 2012

Drupal 7 and Asp.NET webforms

I've been manhandled into writing some modules for Drupal lately. Weeping and gnashing of teeth abounds.
PHP is not exactly my favorite language to begin with, and Drupal is extensive and complex and wholly alien to me.
One thing has struck me though, working through the Drupal hooks madness: just how much some of it resembles Asp.NET webforms. A recurrent criticism of Asp.NET has been its confusing event pipeline, and how WebControl abstractions give less control over the generated html.
But lo and behold: Drupal 7's "render array":

// 'taal' is Dutch for 'language'
$form['taal'] = array(
  '#type' => 'radios',
  '#options' => array(
    'nl' => 'Nederlands (NL)',
    'en' => 'English (EN)',
  ),
  '#required' => FALSE,
);

This generates a list of <input type="radio"> elements. Change '#type' to 'select', and it generates a <select>. It's also possible to pass a simple HTML string into a render array, much like a LiteralControl. The point is to let other modules change properties of the generated content without having to do lots of string parsing.
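To see why structured data beats string parsing, here is a toy renderer in JavaScript. It only mimics the idea: `renderElement` and its property names are made up for illustration and are not Drupal's API. Because the element stays plain data until the last moment, another "module" can flip its type before rendering instead of rewriting generated HTML.

```javascript
// Toy model of a render array: the element is plain data until rendered.
// renderElement and its property names are hypothetical, not Drupal's API.
// (No HTML escaping, for brevity.)
function renderElement(name, el) {
  const entries = Object.entries(el.options);
  if (el.type === 'radios') {
    return entries
      .map(([value, label]) =>
        `<label><input type="radio" name="${name}" value="${value}">${label}</label>`)
      .join('');
  }
  if (el.type === 'select') {
    const opts = entries
      .map(([value, label]) => `<option value="${value}">${label}</option>`)
      .join('');
    return `<select name="${name}">${opts}</select>`;
  }
  throw new Error('Unknown element type: ' + el.type);
}

const taal = {
  type: 'radios',
  options: { nl: 'Nederlands (NL)', en: 'English (EN)' },
  required: false,
};

// Another "module" alters the data, not the generated HTML string:
taal.type = 'select';
console.log(renderElement('taal', taal));
// <select name="taal"><option value="nl">Nederlands (NL)</option><option value="en">English (EN)</option></select>
```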
And then I read in my "Building Drupal Modules" book about how you can hook into the rendering process to change things. Here's the list of available functions:

  • template_preprocess()
  • template_preprocess_[NAME OF HOOK]()
  • [NAME OF MODULE]_preprocess()
  • [NAME OF MODULE]_preprocess_[NAME OF HOOK]()
  • [NAME OF THEME]_preprocess()
  • [NAME OF THEME]_preprocess_[NAME OF HOOK]()
  • template_process()
  • template_process_[NAME OF HOOK]()
  • ... you know, forget it
Twelve in total, for one possible hook, and there will be lots of hooks: add six for each. This is also an event pipeline of sorts. It's just that event handlers can only be wired up by naming convention, as with AutoEventWireup, and that goes for everything, not just page events.
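The name-based dispatch behind the list above can be sketched in a few lines. This is a simplified, hypothetical model, not Drupal's implementation: it uses only the preprocess/process names listed, a single module and theme for brevity, and a plain object standing in for PHP's global function table.

```javascript
// Simplified model of convention-based hook dispatch (not Drupal's code).
// A plain object stands in for PHP's global function namespace.
function runPreprocess(functions, hook, moduleName, themeName, variables) {
  const candidates = [
    'template_preprocess',
    `template_preprocess_${hook}`,
    `${moduleName}_preprocess`,
    `${moduleName}_preprocess_${hook}`,
    `${themeName}_preprocess`,
    `${themeName}_preprocess_${hook}`,
    'template_process',
    `template_process_${hook}`,
  ];
  for (const name of candidates) {
    // Only functions that happen to exist under the right name get called,
    // much like AutoEventWireup binding Page_Load by name.
    if (typeof functions[name] === 'function') {
      functions[name](variables); // mutated in place, like &$variables
    }
  }
  return variables;
}

const functions = {
  template_preprocess_node: (v) => { v.classes.push('node'); },
  mymodule_preprocess_node: (v) => { v.classes.push('from-module'); },
  mytheme_preprocess_node: (v) => { v.classes.push('from-theme'); },
};

const vars = runPreprocess(functions, 'node', 'mymodule', 'mytheme', { classes: [] });
console.log(vars.classes); // [ 'node', 'from-module', 'from-theme' ]
```

Nothing in the signature tells you which of these names you are allowed to define, which is where the pain comes from.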

Mind you, I'm not criticizing Drupal. Whenever you aim to provide a flexible web platform, you will come up with solutions that have to be either complex, or not enough.

Still, pain.

Menno

Tuesday, September 4, 2012

About those animated ajax page loads in MVC ...

A little while ago I showed a way to chain animation functions with callbacks in JavaScript. That was part of an MVC website whose requirements were as follows:

  • It needs to use awesometastic animation prettiful swooping panel dynamified load superlicious "Html5" etc.
  • It needs to do well in search engines, and
  • work reasonably well in IE7
The animations were part of the ajax loading of the page content, but, of course, people needed to be able to link to any page directly as well.

So here's what I did:

  • I created a ViewConfig class. This class holds the current configuration of the browser screen: which menus are shown, which background(s) are showing, etc.:
        public class ViewConfig
        {
            [JsonConverter(typeof(StringEnumConverter))]
            public MenuDisplay Menu
            {
                get { return _showMenu; }
                set { _showMenu = value; }
            }
    
            [JsonConverter(typeof(StringEnumConverter))]
            public BackgroundsDisplay Backgrounds
            {
                get { return _backgrounds; }
                set { _backgrounds = value; }
            }
    
            public string Root
            {
                get { return _root; }
                set { _root = value; }
            }
    ...
    
  • I created a BaseModel class that includes a ViewConfig property (and my menu data and other data shared among all models):
        public abstract class BaseModel
        {
            public List<Business.MenuItem> MainMenuItems
            {
                get { return _mainMenuItems; }
                set { _mainMenuItems = value; }
            }
    
            public ViewConfig ViewConfig
            {
                get { return _viewConfig; }
                set { _viewConfig = value; }
            }
    
            [JsonIgnore]
            public string ViewConfigJson
            {
                get { return  _viewConfigJson; }
                set { _viewConfigJson = value; }
            }
    ...
    
  • Each controller action takes a json parameter (the string "true" or not) that determines whether the Model is passed to the View, or simply returned as JSON through a JsonNetResult:
        public class InhoudController : BaseController
        {
            public ActionResult Index(string taal, string inhoudId, string json)
            {
                bool returnJson = "true".Equals(json);
                Models.InhoudModel model = new Tabeoka.Epsilon.Web.Models.InhoudModel();
    
    ...
    
                if (returnJson)
                {
                    var jsonResult = new JsonNetResult();
    
                    jsonResult.SerializerSettings.ReferenceLoopHandling = Newtonsoft.Json.ReferenceLoopHandling.Ignore;
                    jsonResult.SerializerSettings.MaxDepth = 1;
                    jsonResult.Data = model;
    
                    Response.Expires = 0;
                    Response.CacheControl = "no-cache";
    
                    return jsonResult;
                }
                else
                {
                    model.ViewConfigJson = model.ViewConfig.ToJson();
                    return View(model);
                }
    
  • In the pages I then simply send an ajax request, get my model as json, and move, step by step, from my current ViewConfig to the new ViewConfig, using supertastic fantalicious animations.
There are just two obvious drawbacks to this approach:

  1. As you can see, I need to send my ViewConfig twice: once as a string holding the initial ViewConfig when a page is loaded through a View, and a second time when the entire model is serialized to JSON. It's ugly. I could serialize in my View to fix this.
  2. Worse: I have a bunch of html rendering code in javascript. Ideally I would be able to use the same template in javascript and .Net. I'm not sure how and if that could work though.
To search engines and JavaScript-less (or JavaScript-poor) clients, I can just present a static version of the site. In other browsers, the links on the page are automatically 'ajaxified'.

Conceivably in the future I can take this approach, and improve it by POSTing my current ViewConfig in my ajax request, build the exact delta between that and the requested page, and only fill up my Model with data to the extent my ajax code needs it.
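A minimal sketch of that delta step on the client, assuming the ViewConfig arrives as plain JSON with the Menu, Backgrounds and Root properties shown above (the enums serialized as strings by the StringEnumConverter). `viewConfigDelta` is a hypothetical helper, not part of the site's code, and the property values in the example are made up: it lists only the parts of the screen that change, so only the matching animations need to run.

```javascript
// Hypothetical helper: which parts of the screen must change between two
// ViewConfigs? Properties that stay the same need no animation.
function viewConfigDelta(current, target) {
  const keys = ['Menu', 'Backgrounds', 'Root'];
  return keys.filter((key) => current[key] !== target[key]);
}

// Illustrative values only
const current = { Menu: 'LanguageRoot', Backgrounds: 'Home', Root: 'nl' };
const target = { Menu: 'LanguageRoot', Backgrounds: 'Content', Root: 'nl' };

console.log(viewConfigDelta(current, target)); // [ 'Backgrounds' ]
```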

Menno

Thursday, August 16, 2012

And let me draw your attention to the left ...

I would like to take this opportunity to point everybody to this wonderful article: A truly lazy OrderBy in LINQ.
It always irked me that LINQ's default OrderBy is only partly deferred: nothing runs until you enumerate, but the first MoveNext() sorts the entire sequence, even if you only Take(10). This version is much better.
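The idea behind a lazy OrderBy can be sketched without LINQ. This is not the linked article's code: it is a JavaScript generator that heapifies the input once (O(n)) and then extracts one element per iteration (O(log n) each), so taking the first k sorted items costs O(n + k log n) instead of a full sort. Unlike LINQ's OrderBy, this heap-based version is not a stable sort.

```javascript
// Lazily yields the items of `source` in ascending order of `key`.
// Builds a binary min-heap once, then extracts one item per iteration,
// so stopping early skips the remaining sort work. Not stable.
function* lazyOrderBy(source, key) {
  const heap = source.slice(); // copy: don't reorder the caller's array
  const less = (i, j) => key(heap[i]) < key(heap[j]);
  const swap = (i, j) => { [heap[i], heap[j]] = [heap[j], heap[i]]; };

  function siftDown(i, size) {
    for (;;) {
      const l = 2 * i + 1;
      const r = l + 1;
      let smallest = i;
      if (l < size && less(l, smallest)) smallest = l;
      if (r < size && less(r, smallest)) smallest = r;
      if (smallest === i) return;
      swap(i, smallest);
      i = smallest;
    }
  }

  // Heapify in O(n)
  for (let i = Math.floor(heap.length / 2) - 1; i >= 0; i--) {
    siftDown(i, heap.length);
  }

  // Move the current minimum out of the shrinking heap, one per step
  for (let size = heap.length; size > 0; size--) {
    swap(0, size - 1);
    yield heap[size - 1];
    siftDown(0, size - 1);
  }
}

// Takes the first n items from any iterable, then stops pulling
function take(iterable, n) {
  const result = [];
  if (n <= 0) return result;
  for (const item of iterable) {
    result.push(item);
    if (result.length === n) break;
  }
  return result;
}

console.log(take(lazyOrderBy([5, 3, 9, 1, 7, 2], (x) => x), 3)); // [ 1, 2, 3 ]
```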

Menno

Monday, August 13, 2012

Sorting a GridView bound to an ObjectDataSource - Speed

Just a short bit of code that dates back to ancient times. I use a lot of GridViews bound to ObjectDataSources for database back-end updating modules. To enable sorting those, you need to define the SortParameterName attribute on the ObjectDataSource, and create an overload of your SelectMethod that takes a string parameter with the name you set. No problem.
The issue is that you now need to implement the sorting yourself.
To sort these there are lots of solutions around: a switch statement over the various sort options, shipping the sort off to your database, or a generic approach using Reflection.

The generic approach is clearly the most flexible, but there is a drawback: the performance of reflection is pretty terrible. As soon as you have longer lists, it matters. Nobody wants to wait 10 seconds just to sort a list.
So I kept the generic approach, but implemented it using a DynamicMethod instead of plain reflection. I'm not posting the code because it's about 300 lines (including comments), but there's a download link at the end.
Using it is easy. Either import the namespace and use the extension method:


public static List<Ticket> GetAll()
{
    SqlConnection cnData = new SqlConnection(Data.ConnectionString);
    List<Ticket> dbItem = new List<Ticket>();
    SqlCommand cmdData = new SqlCommand("GetAllTickets", cnData);

    /* Yadda yadda, whatever */

    return dbItem;
}

public static List<Ticket> GetAll(string sortExpression)
{
    var entities = GetAll();

    // calls the extension method List<T>.Sort(string sortExpression)
    entities.Sort(sortExpression);

    return entities;
}


Or just use the IComparer directly:


public static List<Ticket> GetAll(string sortExpression)
{
    var entities = GetAll();

    entities.Sort(new Tabeoka.PropertyComparer<Ticket>(sortExpression));
    return entities;
}


Here are the performance numbers (Core 2 Quad, 8 GB RAM, 100,000 items in the list):

Sort by Int property:
  • Native method: 00.0592331 s
  • DynamicMethod: 00.2716980 s
  • Reflection: 03.6649998 s
Sort by String property:
  • Native method: 00.3787244 s
  • DynamicMethod: 00.4680498 s
  • Reflection: 03.8817182 s
It's interesting how much faster the native method is than the DynamicMethod for the Int property. Clearly we're paying the overhead of boxing, since the value is cast to object and compared through IComparable. If you sort by Int properties a lot, you could conceivably write a separate code path for them. Either way, the speed increase over Reflection is dramatic.
Just two more remarks:
  • You can sort by multiple properties. 
  • I included an extension method: List<T>.Sort(string sortExpression)
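The PropertyComparer itself is C# and compiles a DynamicMethod, so it can't be reproduced here. The part of the idea that carries over is doing the expensive work once: parse the sort expression up front, then reuse the result for every comparison. Here is a JavaScript sketch of that; `makeComparer` is hypothetical, and it assumes the GridView-style format "Property [DESC]", with commas separating multiple properties.

```javascript
// Hypothetical sketch: build a comparer from a sort expression once,
// e.g. "Priority DESC, Title". The expression is parsed a single time;
// the returned function only does property lookups and comparisons.
function makeComparer(sortExpression) {
  const keys = sortExpression
    .split(',')
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const [property, direction = 'ASC'] = part.split(/\s+/);
      return { property, sign: direction.toUpperCase() === 'DESC' ? -1 : 1 };
    });

  return (a, b) => {
    for (const { property, sign } of keys) {
      const x = a[property];
      const y = b[property];
      if (x < y) return -sign;
      if (x > y) return sign;
      // equal on this property: fall through to the next one
    }
    return 0;
  };
}

const tickets = [
  { Id: 1, Priority: 2, Title: 'b' },
  { Id: 2, Priority: 3, Title: 'a' },
  { Id: 3, Priority: 2, Title: 'a' },
];
tickets.sort(makeComparer('Priority DESC, Title'));
console.log(tickets.map((t) => t.Id)); // [ 2, 3, 1 ]
```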
Download here. Have fun. Some more MVC next time.

Menno

Thursday, August 9, 2012

Chaining javascript function array with callbacks

I'm still working on the MVC website I mentioned earlier. It's one of those sites that, rather than just opening a new page, swoops in the new content through neat jQuery animation and ajax loading of data. Magic!
One of the challenges was the chaining of the necessary animations to render the loaded content.
If we're simply opening another content page in the same language (EN, FR, NL, DE), there are just the following three animations:
  1. Slide away the current content pane,
  2. Slide in the new content, and
  3. Set the selected node of the menu to the new page
The menu and the set of backgrounds stay the same. If we're switching to the home page of another language though, we're looking at:
  1. Slide away the current content pane
  2. Slide away the current menu
  3. Slide away the current background
  4. Load the backgrounds for the new language, slide in the first
  5. Slide in the new menu
  6. Slide in the content of the home page
  7. Finally set the selected language and page
So depending on the page we're switching from and the page we're switching to, we need to run through anywhere from 3 to 10 animations, chained through jQuery and various other JavaScript callbacks.
I wrote functions that reduce the two pages to their common root, and return an array of animation functions that all take a callback parameter (continuation-passing style). Then I use the following functions to run through them sequentially:

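// Runs an array of CPS animation functions one after another, then calls
// the optional finalFunction.
function startAnimation(animationFunctions, finalFunction) {
    var status = new animationStatus(animationFunctions, finalFunction);

    // Nothing to animate: go straight to the final function
    if (status.Delegates.length === 0) {
        if (status.FinalFunction)
            status.FinalFunction();
        return;
    }

    // Start the first animation; its callback moves the chain along
    status.Delegates[0](function() {
        continueAnimation(status);
    });
}

// Advances to the next animation, or calls finalFunction at the end
function continueAnimation(status) {
    status.CurrentIndex++;
    if (status.CurrentIndex >= status.Delegates.length) {
        if (status.FinalFunction)
            status.FinalFunction();
    } else {
        // The callback closes over 'status', so the chain keeps its position
        status.Delegates[status.CurrentIndex](function() {
            continueAnimation(status);
        });
    }
}

// Holds the animation array, the current position and the final callback
function animationStatus(delegates, finalFunction) {
    this.Delegates = delegates;
    this.CurrentIndex = 0;
    this.FinalFunction = finalFunction;
}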

I use the 'continueAnimation' function to chain the functions in the array. The callback closes over the 'status' variable. There's an optional 'finalFunction' parameter to be called after the end of the array has been reached. Obviously you need to make sure the callback always gets called in your animation function. Here is, as an example, the function to show the menu:

function(animationDone) {
    if (targetConfig.Menu == 'LanguageRoot') {
        $("#languagemenu").hide();
        $("#menu").append(renderMenu(data.MainMenuItems));
        $("#rootmenu").show();
    } else {
        $("#languagemenu").show();
        $("#rootmenu").hide();
    }
    $("#menu").animate({
        left: '0px'
    }, 400, animationDone);
}

Obviously functions with callbacks are nothing new. Loading them all into an array makes it easier to dynamically compose an animation though. It also eliminates some mental overhead: just set your animations in the right order, and fire away. You can think of your entire array as just one function, with the finalFunction callback as the only callback.
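That "one function" view can be made literal by folding the array into a single function that takes the final callback. A minimal sketch using reduceRight; `composeCps` is a hypothetical name, not part of the code above, and like startAnimation it assumes every function calls its callback exactly once.

```javascript
// Folds an array of CPS functions (each taking a `done` callback) into one
// function. Working from the right, each step wraps "run me, then the rest".
function composeCps(fns) {
  return function (finalCallback) {
    const run = fns.reduceRight(
      (next, fn) => () => fn(next),
      () => { if (finalCallback) finalCallback(); }
    );
    run();
  };
}

// Usage with synchronous stand-ins for the jQuery animations:
const log = [];
const step = (name) => (done) => { log.push(name); done(); };

const switchLanguage = composeCps([
  step('hide content'),
  step('hide menu'),
  step('show menu'),
]);
switchLanguage(() => log.push('finished'));
console.log(log); // [ 'hide content', 'hide menu', 'show menu', 'finished' ]
```

With real animations, `done` would simply be passed as the completion callback of `$(...).animate(...)`, as in the menu example.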

Menno

Monday, August 6, 2012

European cookie law and classic asp

The EU cookie law is being rolled out as we speak. The Netherlands specifically has already passed the necessary national laws, and Dutch websites are already being checked for compliance (even though, for now, violations only yield a warning, not a fine).
Those of us who have classic asp legacy sites may have noticed something disconcerting in dealing with this: there is no way to disable session cookies in asp on a per-request basis. You can disable asp sessions altogether for your entire application, and there is an option to disable the session for a single page (but session cookies will still be sent). Your configuration may be different, but for what it's worth, here's what has worked for us:
You'll need:
  • ASP.NET on your server,
  • IIS 7 or higher, and
  • your application pool in integrated mode
The more perceptive of you will have already figured out where this is going. I am simply going to have ASP.NET remove the cookies the asp handler generates. I wrote a class CookieMonsterModule (because it eats cookies, see?) that checks for a custom header, and removes the cookies if it can't find that header:


using System;
using System.Web;

namespace Tabeoka.CookieMonster
{
    public class CookieMonsterModule : IHttpModule
    {
        public void Dispose() { }

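        public void Init(HttpApplication context)
        {
            // Runs after the request handler (for .asp pages: the classic
            // asp handler) has executed. Response.Headers is only available
            // in the integrated pipeline.
            context.PostRequestHandlerExecute += new EventHandler(CheckEnableCookies);
        }

        public static void CheckEnableCookies(object sender, EventArgs e)
        {
            var response = HttpContext.Current.Response;

            // Pages opt in to cookies by setting the custom 'enablecookies' header
            if (!"true".Equals(response.Headers["enablecookies"]))
            {
                // Yum Yum: strip the cookies the handler tried to set
                response.Headers.Remove("SET-COOKIE");
            }
        }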
    }
}

As you can see, all you need to do to let cookies through is set the 'enablecookies' header to "true":

<%@Language=Javascript %>

<%
Response.AddHeader("enablecookies", "true");
%>

I have made a neat little package at http://www.tabeoka.be/downloads/ClassicAspCookieMonster.zip . It includes a minimal web.config that loads the HTTP module, and the DLL itself. You can just copy it into the root of your classic asp app and start using it (provided your configuration meets the requirements). Feel free to take the idea and run with it.
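The module's decision boils down to a small rule, sketched here in JavaScript over a plain header object. It only illustrates the logic and is no replacement for the IHttpModule: `filterCookies` is hypothetical, and header names are lower-cased first because HTTP header names are case-insensitive.

```javascript
// Illustration of the CookieMonsterModule rule: drop Set-Cookie unless the
// page opted in with an "enablecookies: true" header.
function filterCookies(headers) {
  // HTTP header names are case-insensitive, so normalise them first
  const normalised = {};
  for (const [name, value] of Object.entries(headers)) {
    normalised[name.toLowerCase()] = value;
  }
  if (normalised['enablecookies'] !== 'true') {
    delete normalised['set-cookie']; // Yum Yum
  }
  return normalised;
}

const optedOut = filterCookies({ 'Set-Cookie': 'ASPSESSIONIDxyz=abc; path=/' });
const optedIn = filterCookies({
  'Set-Cookie': 'ASPSESSIONIDxyz=abc; path=/',
  EnableCookies: 'true',
});
console.log('set-cookie' in optedOut); // false
console.log('set-cookie' in optedIn); // true
```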

Menno

Friday, August 3, 2012

Binding a TreeView to a custom data type

I'm currently working on an MVC-based website. For managing the database I've elected to go with a classic GridView/DetailsView WebForms solution though, mostly because I have the webcontrols and the old code to make that the quicker solution.
One of the things that needs editing is the menu: a simple Key | ParentKey | MenuKey | Name | Url table (MenuKey because I have more than one menu). I bound a TreeView control to this menu using a set of classes I wrote a while ago to bind a TreeView to a folder structure. I decided to share my code, because it's considerably easier to use than anything else I've been able to find online.
I've based my code on the following article: http://www.codeproject.com/Articles/19639/Implementing-IHierarchy-Support-Into-Your-Custom-C . I was able to get my menu items bound to my TreeView using this method. It does not seem reasonable, though, to have to write the same type of code every time I want to bind to a custom data type. What's more, I'd be adding a collection class just to implement the IHierarchicalEnumerable interface, with methods that simply patch through to existing methods and properties. I'm a programmer, not a plumber.
So here is a generic implementation of IHierarchyData and IHierarchicalEnumerable (download link at the end):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web.UI;

namespace Tabeoka
{
    /// <summary>
    /// Utility class for creating a hierarchical data source
    /// </summary>
    public static class HierarchyData
    {
        /// <summary>
        /// Returns a hierarchical data list containing only the root item
        /// </summary>
        /// <typeparam name="T">The data type used for building up the hierarchical data source</typeparam>
        /// <param name="root">The root data item</param>
        /// <param name="childSelector">A delegate that returns the child items of the item passed as an argument</param>
        /// <param name="parentSelector">A delegate that returns the parent item of the item passed as an argument, or null if root</param>
        /// <param name="pathSelector">A delegate that is called for the path of the current item as a string</param>
        /// <returns>A HierarchyDataList&lt;T&gt; containing only the root item</returns>
        public static HierarchyDataList<T> GetData<T>(T root, Func<T, IEnumerable<T>> childSelector, Func<T, T> parentSelector, Func<T, string> pathSelector) where T : class
        {
            if (root == null)
                throw new ArgumentNullException("root");
            
            if (childSelector == null)
                throw new ArgumentNullException("childSelector");
            
            if (parentSelector == null)
                throw new ArgumentNullException("parentSelector");
            
            if (pathSelector == null)
                throw new ArgumentNullException("pathSelector");

            return new HierarchyDataList<T>(
                new List<T> { root },
                childSelector,
                parentSelector,
                pathSelector
                );
        }

        /// <summary>
        /// Wrapper class for the data type that implements IHierarchyData 
        /// </summary>
        /// <typeparam name="T">The underlying data type</typeparam>
        public class HierarchyDataItem<T> : IHierarchyData where T : class
        {
            internal T DataItem { get; set; }
            internal Func<T, IEnumerable<T>> ChildSelector { get; set; }
            internal Func<T, T> ParentSelector { get; set; }
            internal Func<T, string> PathSelector { get; set; }

            internal HierarchyDataItem(
                T dataItem,
                Func<T, IEnumerable<T>> childSelector,
                Func<T, T> parentSelector,
                Func<T, string> pathSelector
                )
            {
                DataItem = dataItem;

                this.ChildSelector = childSelector;
                this.ParentSelector = parentSelector;
                this.PathSelector = pathSelector;
            }

            #region IHierarchyData Members

            /// <summary>
            /// Gets the child items wrapped in an IHierarchicalEnumerable
            /// </summary>
            /// <returns>A HierarchyDataList&lt;T&gt;</returns>
            public IHierarchicalEnumerable GetChildren()
            {
                return new HierarchyDataList<T>(
                        ChildSelector(DataItem),
                        this.ChildSelector,
                        this.ParentSelector,
                        this.PathSelector
                    );
            }

            /// <summary>
            /// Gets the parent item, and wraps it in a HierarchyDataItem
            /// </summary>
            /// <returns>A HierarchyDataItem&lt;T&gt;, or null if not found</returns>
            public IHierarchyData GetParent()
            {
                var parent = this.ParentSelector(DataItem);
                if (parent != null)
                    return new HierarchyDataItem<T>(
                        parent,
                        this.ChildSelector,
                        this.ParentSelector,
                        this.PathSelector
                        );

                return null;
            }

            /// <summary>
            /// Checks if there are any child nodes, and returns true if there are
            /// </summary>
            public bool HasChildren
            {
                get
                {
                    // Any() stops at the first child instead of counting them all
                    return this.ChildSelector(DataItem).Any();
                }
            }

            /// <summary>
            /// The underlying data object
            /// </summary>
            public object Item
            {
                get { return DataItem; }
            }

            /// <summary>
            /// Is supposed to return the logical path according to the underlying data,
            /// just calls the pathSelector.
            /// </summary>
            public string Path
            {
                get { return PathSelector(DataItem); }
            }

            /// <summary>
            /// The name of the underlying type: typeof(T).ToString()
            /// </summary>
            public string Type
            {
                get { return typeof(T).ToString(); }
            }

            #endregion
        }

        /// <summary>
        /// A list of T that implements IHierarchicalEnumerable
        /// </summary>
        /// <typeparam name="T">The underlying data type</typeparam>
        public class HierarchyDataList<T> : List<T>, IHierarchicalEnumerable where T : class
        {
            internal Func<T, IEnumerable<T>> ChildSelector { get; set; }
            internal Func<T, T> ParentSelector { get; set; }
            internal Func<T, string> PathSelector { get; set; }

            internal HierarchyDataList(
                IEnumerable<T> items,
                Func<T, IEnumerable<T>> childSelector,
                Func<T, T> parentSelector,
                Func<T, string> pathSelector
                )
                : base(items)
            {

                this.ChildSelector = childSelector;
                this.ParentSelector = parentSelector;
                this.PathSelector = pathSelector;
            }

            #region IHierarchicalEnumerable Members

            /// <summary>
            /// Wraps the enumeratedItem object in a HierarchyDataItem
            /// </summary>
            /// <param name="enumeratedItem">The data item</param>
            /// <returns>an instance of HierarchyDataItem&lt;T&gt;</returns>
            public IHierarchyData GetHierarchyData(object enumeratedItem)
            {
                return new HierarchyDataItem<T>(
                    enumeratedItem as T,
                    this.ChildSelector,
                    this.ParentSelector,
                    this.PathSelector);
            }

            #endregion
        }
    }
}

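I'm wrapping all of it in a static class: C# can't infer generic type arguments for a constructor, but it can for a generic method, so the public static GetData method saves me from spelling out the type parameter.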

Here's how you use it:

// The root node for my folder structure
var root = Folders.GetRoot();

// Setting the delegates for getting
// children, parent and path value
var folderData = HierarchyData.GetData(
    root, 
    d => d.SubFolders, 
    d => d.ParentFolder, 
    d => d.Name);

tree = new TreeView();

this.Controls.Add(tree);

if (!this.Page.IsPostBack)
{
    // now just assign datasource and bindings
    tree.DataSource = folderData;
    tree.DataBindings.Add(new TreeNodeBinding()
    {
        TextField = "Name",
        ValueField = "Key"
    });
    tree.DataBind();
}

And here's the download link: http://www.tabeoka.be/downloads/HierarchyData.zip .
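The same selector-based design works outside WebForms too. Here is a JavaScript sketch that derives the three selectors from a flat menu table like the Key | ParentKey | MenuKey | Name | Url one described earlier. `hierarchyFromRows` and the row values are hypothetical, and it assumes root items have a ParentKey of null and that rows were already filtered to a single MenuKey.

```javascript
// Hypothetical sketch: derive childSelector, parentSelector and pathSelector
// from flat rows, mirroring the delegates passed to HierarchyData.GetData.
// Assumes rows are already filtered to one MenuKey.
function hierarchyFromRows(rows) {
  const byKey = new Map(rows.map((r) => [r.Key, r]));
  const childrenOf = new Map();
  for (const row of rows) {
    if (!childrenOf.has(row.ParentKey)) childrenOf.set(row.ParentKey, []);
    childrenOf.get(row.ParentKey).push(row);
  }

  const childSelector = (item) => childrenOf.get(item.Key) || [];
  const parentSelector = (item) => byKey.get(item.ParentKey) || null;
  const pathSelector = (item) => {
    const parent = parentSelector(item);
    return parent ? pathSelector(parent) + '/' + item.Name : item.Name;
  };
  const roots = childrenOf.get(null) || [];
  return { roots, childSelector, parentSelector, pathSelector };
}

const rows = [
  { Key: 1, ParentKey: null, MenuKey: 1, Name: 'Home', Url: '/' },
  { Key: 2, ParentKey: 1, MenuKey: 1, Name: 'About', Url: '/about' },
  { Key: 3, ParentKey: 2, MenuKey: 1, Name: 'Team', Url: '/about/team' },
];
const menu = hierarchyFromRows(rows);
console.log(menu.pathSelector(rows[2])); // Home/About/Team
console.log(menu.childSelector(menu.roots[0]).map((r) => r.Name)); // [ 'About' ]
```

Building the lookup maps once keeps each selector call cheap, which matters because a tree control may call them for every node.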

I hope this helps someone, somewhere. I welcome comments and criticism.

Menno