Brad Fitzpatrick (brad) wrote,

Unicode::CheckUTF8

Here's my latest contribution to the CPAN: Unicode::CheckUTF8.

It was originally Inline.pm code that haunted us for years (Inline deployment sucks), so Artur helped me convert it to XS. It's pretty pathetic that I even had to do this, but nothing better has come out in the meantime.

All the other options are either incorrect or incorrect and segfault.

So Unicode::CheckUTF8 is:

-- correct (see test suite!)
-- fast (written in C)
-- free of the regexp engine (see "fast", but also so it doesn't segfault)

Please correct me if I'm wrong and something else works, but run your answer through my test suite first. Things known to misbehave, and the reasons, if I'm matching them up properly:

-- W3C's recommended regexp segfaults Perl with ease
-- Encode, Unicode::String -- don't reject low ASCII bytes that expat/Mozilla reject
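
For a rough idea of what a regexp-free check looks like, here is a minimal C sketch. It is not the module's actual source: the function name `check_utf8_strict` is made up for illustration, and the control-byte rule (reject everything below 0x20 except tab, LF, and CR) is an assumption borrowed from the XML 1.0 character range, which is roughly what an XML parser enforces. It also does not try to reject noncharacters such as U+FFFE.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if b is a UTF-8 continuation byte (10xxxxxx). */
static bool is_cont(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical sketch, not the real Unicode::CheckUTF8 code.
 * Returns true if the len bytes at str are well-formed UTF-8 and contain
 * no ASCII control bytes other than tab, LF, and CR (assumed rule).
 */
bool check_utf8_strict(const char *str, size_t len)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t i = 0;

    while (i < len) {
        unsigned char b0 = s[i];

        if (b0 < 0x80) {
            /* ASCII: refuse low control bytes except \t, \n, \r. */
            if (b0 < 0x20 && b0 != 0x09 && b0 != 0x0A && b0 != 0x0D)
                return false;
            i += 1;
        } else if (b0 >= 0xC2 && b0 <= 0xDF) {
            /* Two-byte sequence; 0xC0 and 0xC1 would be overlong. */
            if (len - i < 2 || !is_cont(s[i + 1]))
                return false;
            i += 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            /* Three-byte sequence. */
            if (len - i < 3 || !is_cont(s[i + 1]) || !is_cont(s[i + 2]))
                return false;
            if (b0 == 0xE0 && s[i + 1] < 0xA0)
                return false;   /* overlong encoding */
            if (b0 == 0xED && s[i + 1] > 0x9F)
                return false;   /* UTF-16 surrogate range */
            i += 3;
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            /* Four-byte sequence. */
            if (len - i < 4 || !is_cont(s[i + 1]) ||
                !is_cont(s[i + 2]) || !is_cont(s[i + 3]))
                return false;
            if (b0 == 0xF0 && s[i + 1] < 0x90)
                return false;   /* overlong encoding */
            if (b0 == 0xF4 && s[i + 1] > 0x8F)
                return false;   /* beyond U+10FFFF */
            i += 4;
        } else {
            /* Stray continuation byte, 0xC0, 0xC1, or 0xF5..0xFF. */
            return false;
        }
    }
    return true;
}
```

It's a single linear pass over the buffer with no recursion and no backtracking, so the cost and stack use don't depend on what the input looks like.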

*yawn* Well, I guess I can be excited about getting rid of Inline from production.
Tags: perl, tech