Brad Fitzpatrick (brad) wrote,

C#: Strings without encodings? Working with buffers.

So far I'm really liking most of C#. Here's something I can't figure out, though:

Question for HTTP wizards
Writing an HTTP header parser in C# while staying 8-bit clean looks to be a bitch. Or are headers always ASCII? RFC 2616's grammar says OCTET all over, which I can only assume means all 8 bits, given the "oct". Or is that just the ideal, and in the real world so many servers suck that clients only send 7-bit headers? Or is there an explicit encoding in HTTP? UTF-8?

The Mono XSP webserver isn't 8-bit clean. It assumes everything is ASCII.

Question for C# wizards
Is there any way to conveniently operate on a buffer of bytes, perhaps as a String, without knowing its encoding? If I have a buffer with bytes over 127, Encoding.ASCII converts them to question marks (which I verified by going from byte buffer to String and back again). This is what Mono's XSP does.
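Here is a minimal sketch of that round trip, for anyone who wants to reproduce it. The class and method names are made up for illustration. It also tries ISO-8859-1 (Latin-1), which maps every byte value 0 to 255 to the character with the same code point, so it should survive the trip unchanged. That is a workaround that merely pretends the data is Latin-1, not a real answer to "what encoding is this?".

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Decodes the bytes to a String with the given encoding, then encodes
    // that String back to bytes with the same encoding.
    public static byte[] RoundTrip(byte[] input, Encoding encoding)
    {
        string text = encoding.GetString(input);
        return encoding.GetBytes(text);
    }

    public static void Main()
    {
        // "Hi" followed by two bytes above 127.
        byte[] raw = { 0x48, 0x69, 0xE9, 0xFF };

        byte[] viaAscii = RoundTrip(raw, Encoding.ASCII);
        byte[] viaLatin1 = RoundTrip(raw, Encoding.GetEncoding("iso-8859-1"));

        // ASCII replaces the high bytes with '?' (0x3F): prints 48-69-3F-3F
        Console.WriteLine(BitConverter.ToString(viaAscii));
        // Latin-1 keeps every byte: prints 48-69-E9-FF
        Console.WriteLine(BitConverter.ToString(viaLatin1));
    }
}
```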

And you can only do regular expressions on a string, not a buffer?
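One possible workaround, sketched below under the assumption that treating the buffer as ISO-8859-1 is acceptable: since Latin-1 maps each byte to exactly one char, a match index in the decoded String is also a byte offset in the buffer. ByteRegexDemo and FirstMatchOffset are hypothetical names, not part of any library. Note that case-insensitive matching and classes like \w would apply Unicode rules to the high bytes, which may not be what you want for anonymous data.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class ByteRegexDemo
{
    // Latin-1 gives a one-byte-to-one-char view of arbitrary 8-bit data.
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Returns the byte offset of the first regex match, or -1 if none.
    public static int FirstMatchOffset(byte[] buffer, string pattern)
    {
        string view = Latin1.GetString(buffer);
        Match m = Regex.Match(view, pattern);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        // A header line containing byte 0xE9, then the blank line, then a body.
        byte[] buffer = Latin1.GetBytes("X-Name: caf\u00E9\r\n\r\nbody");

        // End of headers: prints 12
        Console.WriteLine(FirstMatchOffset(buffer, @"\r\n\r\n"));
        // First byte above 127: prints 11
        Console.WriteLine(FirstMatchOffset(buffer, @"[\x80-\xFF]"));
    }
}
```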

If I have anonymous 8-bit data, I should still be able to split on it, search for substrings (read: other byte arrays), and run regexps on it.
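Searching and splitting, at least, can be hand-rolled directly on byte arrays without ever going through a String. The following is a rough sketch with a naive scan. ByteSearch, IndexOf and Split are hypothetical helpers written for this post, not framework methods.

```csharp
using System;
using System.Collections.Generic;

public static class ByteSearch
{
    // Naive search: returns the index of the first occurrence of needle in
    // haystack at or after start, or -1 if it does not occur.
    public static int IndexOf(byte[] haystack, byte[] needle, int start = 0)
    {
        if (needle.Length == 0) return start <= haystack.Length ? start : -1;
        for (int i = start; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }

    // Splits data on every occurrence of separator; empty pieces are kept.
    public static List<byte[]> Split(byte[] data, byte[] separator)
    {
        if (separator.Length == 0)
            throw new ArgumentException("separator must not be empty");

        var parts = new List<byte[]>();
        int pos = 0;
        while (true)
        {
            int hit = IndexOf(data, separator, pos);
            int end = hit < 0 ? data.Length : hit;

            byte[] part = new byte[end - pos];
            Array.Copy(data, pos, part, 0, part.Length);
            parts.Add(part);

            if (hit < 0) break;
            pos = hit + separator.Length;
        }
        return parts;
    }
}
```

For example, splitting a header block on the two bytes 0x0D 0x0A yields one byte array per line, with any bytes over 127 left untouched.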

Now, I think it's nice that Strings have a known encoding, but I think C# should have a more powerful System.Buffer class that allows for more than just getting and setting bytes. I should be able to search it for another byte array. And the regexp library should allow matching on byte buffers.