December 8th, 2005



Here's my latest contribution to the CPAN: Unicode::CheckUTF8.

It started out as Inline code that haunted us for years (Inline deployment sucks), so Artur helped me convert it to XS. It's pretty pathetic that I even had to do this, but nothing better has come out in the meantime.

All the other options are either incorrect or incorrect and segfault.

So Unicode::CheckUTF8 is:

-- correct (see test suite!)
-- fast (written in C)
-- regexp-free (doesn't use the regexp engine: see "fast", but also so it doesn't segfault)

Please correct me if I'm wrong and something else works, but run your answer through my test suite first. Things known to misbehave, with the reasons, if I'm matching them up properly:

-- W3C's recommended regexp segfaults Perl with ease
-- Encode, Unicode::String -- don't reject low ASCII bytes that expat/Mozilla reject
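
For anyone wondering what a regexp-free check looks like, here is a minimal sketch in C. It is not the code from Unicode::CheckUTF8; the function name check_utf8_strict is made up for illustration. It walks the buffer byte by byte, rejects malformed sequences (stray continuation bytes, truncation, overlong forms, surrogates, anything above U+10FFFF), and also rejects low ASCII control bytes. Exactly which control bytes expat and Mozilla reject is an assumption here: the sketch allows only tab, LF and CR below 0x20, borrowing the XML 1.0 character rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the module's real API.
 * Returns 1 if buf[0..len) is well-formed UTF-8 containing no low ASCII
 * control bytes other than tab, LF and CR (assumed policy); otherwise 0. */
int check_utf8_strict(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point allowed for this length */
        size_t need;        /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80) {
            /* Plain ASCII: refuse control bytes except \t, \n, \r. */
            if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D)
                return 0;
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= need)
            return 0;       /* sequence runs off the end of the buffer */

        for (k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;       /* overlong encoding */
        if (cp > 0x10FFFF)
            return 0;       /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;       /* UTF-16 surrogate, not valid in UTF-8 */

        i += need + 1;
    }
    return 1;
}
```

Calling check_utf8_strict((const unsigned char *)str, strlen(str)) on a NUL-terminated string is the obvious way to use it. A single linear pass with no backtracking and no recursion is the whole point: there is nothing for a long input to blow up.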

*yawn* Well, I guess I can be excited about getting rid of Inline from production.

Scuba Diving

Hey David/Whitaker, wanna go scuba diving in Asia?

So, it turns out that two of the recent LJ hires also scuba dive, so there are 5 of us. Craziness.

Whitaker arrives sometime today and stays for 2 weeks. Hopefully we get some good hacking done.