I managed to shave some more time off the rendering today.
It took a full rewrite of the display code, but it paid off with a 13% increase in speed.
Now I'm down to almost exactly 7 ms per frame, which gives us roughly 1 ms per color and around 4 microseconds per pixel.
For comparison: just toggling a pin with the chipKIT core library function digitalWrite() takes almost 16 microseconds!
If I could live with a less vivid display, I could shave off another 3 ms per frame, but I need that time to let the pixels fully illuminate (an exponential delay ranging from 0 to 42 microseconds between rows). So I think I've come as far as I can without sacrificing quality!
Unless... interrupts to the rescue?
I guess now's a good time to mention that the rendering also looks better than ever, with nice big contrast between the different shades? I even managed to skip the recommended "1 microsecond delay" between the two row clock updates without introducing any ghosting. Sweet!