[Dec. 8th, 2005|12:40 am]
Here's my latest contribution to the CPAN: Unicode::CheckUTF8.
It was originally Inline.pm code that haunted us for years (Inline deployment sucks), so Artur helped me convert it to XS. It's pretty pathetic that I even had to do this, but nothing better's come out in the meantime.
All the other options are either incorrect or incorrect and segfault.
So Unicode::CheckUTF8 is:
-- correct (see test suite!)
-- fast (written in C)
-- free of the regexp engine (see "fast", but also so it doesn't segfault)
Please correct me if I'm wrong and something else works, but run your answer through my test suite first. Things known to misbehave, with the reasons, if I'm matching them up properly:
-- W3C's recommended regexp segfaults perl with ease
-- Encode, Unicode::String -- don't reject low ASCII bytes that expat/Mozilla reject
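
For a rough idea of what a regexp-free check looks like in C, here is a minimal sketch. It is not the module's actual source: the function name looks_like_clean_utf8 is made up for illustration, and the exact set of rejected low ASCII bytes (everything below 0x20 except tab, LF and CR, loosely following XML 1.0's character range) is an assumption about what "bytes that expat/Mozilla reject" means.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the Unicode::CheckUTF8 implementation.
 * Returns 1 if buf[0..len) is well-formed UTF-8 and contains no control
 * bytes below 0x20 other than tab, LF and CR (assumed rule); else 0. */
static int looks_like_clean_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0, k;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;       /* continuation bytes expected after c */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80) {
            /* Plain ASCII: refuse the low control bytes. */
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r')
                return 0;
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte, or 0xF8..0xFF */
        }

        if (len - i <= extra)
            return 0;       /* sequence runs off the end of the buffer */

        for (k = 1; k <= extra; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a 10xxxxxx continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;       /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;       /* UTF-16 surrogate, never valid in UTF-8 */
        if (cp > 0x10FFFF)
            return 0;       /* beyond the Unicode range */

        i += extra + 1;
    }
    return 1;
}
```

It is a single pass over the bytes with no backtracking and no recursion, which is the whole point of staying out of the regexp engine: long input can't blow the stack.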
*yawn* Well, I guess I can be excited about getting rid of Inline from production.