Question for HTTP wizards
Writing an HTTP header parser in C# that stays 8-bit clean looks to be a bitch. Or are headers always ASCII? RFC 2616's grammar says OCTET all over, which I can only assume means all 8 bits, as in "oct". Or is that just the ideal, and in the real world so many servers suck that clients only send 7-bit headers? Or is there an explicit encoding in HTTP? UTF-8?
The Mono XSP webserver isn't 8-bit clean. It assumes everything is ASCII.
Question for C# wizards
Is there any way to conveniently operate on a buffer of bytes, perhaps as a String, without knowing its encoding? If I have a buffer with bytes over 127, Encoding.ASCII converts them to question marks (which I verified by going from byte buffer to String and back again). This is what Mono's XSP does.
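One workaround that looks promising, as a rough sketch: ISO-8859-1 (Latin-1) maps each byte value 0 to 255 onto the Unicode code point with the same number, so a round trip through it should not lose anything. The class and method names below (RoundTripDemo, SurvivesRoundTrip) are made up for illustration; only Encoding.ASCII and Encoding.GetEncoding("iso-8859-1") come from the framework.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Decode the bytes with the given encoding, encode the resulting string
    // again, and report whether we got the original bytes back.
    public static bool SurvivesRoundTrip(byte[] data, Encoding enc)
    {
        string s = enc.GetString(data);
        byte[] back = enc.GetBytes(s);
        if (back.Length != data.Length) return false;
        for (int i = 0; i < data.Length; i++)
        {
            if (back[i] != data[i]) return false;
        }
        return true;
    }

    public static void Main()
    {
        // "Hi" followed by two bytes over 127.
        byte[] data = { 0x48, 0x69, 0xE9, 0xFF };

        // ASCII replaces the high bytes, so the round trip is lossy.
        Console.WriteLine(SurvivesRoundTrip(data, Encoding.ASCII));

        // Latin-1 is one byte per char, so every byte value should survive.
        Console.WriteLine(SurvivesRoundTrip(data, Encoding.GetEncoding("iso-8859-1")));
    }
}
```

The string you get this way is not meaningful text if the bytes were really UTF-8 or something else. It is just a byte buffer wearing a String costume, which is all I want here.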
And you can only do regular expressions on a string, not a buffer?
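As far as I can tell, Regex only takes strings, but a possible hack is to decode the buffer as ISO-8859-1 first. Since that is one char per byte, match offsets in the string line up with offsets in the buffer, and a \xNN escape in the pattern matches the byte with that value. This is only a sketch: ByteRegexDemo and FirstMatchOffset are names I made up, and classes like \w or case-insensitive matching would still apply Unicode rules to the Latin-1 characters, so I would stick to explicit byte escapes.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class ByteRegexDemo
{
    // One char per byte, code point equal to the byte value.
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Returns the byte offset of the first match of pattern in data, or -1.
    public static int FirstMatchOffset(byte[] data, string pattern)
    {
        string s = Latin1.GetString(data);
        Match m = Regex.Match(s, pattern);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        // "X-" followed by two 0xE9 bytes, then ": v".
        byte[] data = { 0x58, 0x2D, 0xE9, 0xE9, 0x3A, 0x20, 0x76 };

        // Look for one or more 0xE9 bytes followed by a colon.
        Console.WriteLine(FirstMatchOffset(data, @"\xE9+:"));
    }
}
```

The cost is a full copy of the buffer into a string at twice the size, which is part of why I would rather have the regex engine work on bytes directly.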
If I have anonymous 8-bit data, I should still be able to split on it, search for substrings (read: other byte arrays), and run regexps on it.
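For the searching and splitting part, here is roughly what I would expect such a class to offer, hand-rolled as a minimal sketch. ByteSearch, IndexOf and Split are my own hypothetical helpers, not anything in the framework, and the search is the naive quadratic one rather than anything clever.

```csharp
using System;
using System.Collections.Generic;

public static class ByteSearch
{
    // Naive search: index of the first occurrence of needle in haystack
    // at or after start, or -1 if there is none.
    public static int IndexOf(byte[] haystack, byte[] needle, int start)
    {
        for (int i = start; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }

    // Split data on every occurrence of separator. Empty pieces are kept,
    // so data ending in the separator yields a trailing empty array.
    public static List<byte[]> Split(byte[] data, byte[] separator)
    {
        if (separator.Length == 0)
            throw new ArgumentException("separator must not be empty");

        var parts = new List<byte[]>();
        int pos = 0;
        while (true)
        {
            int hit = IndexOf(data, separator, pos);
            int end = hit < 0 ? data.Length : hit;

            byte[] part = new byte[end - pos];
            Array.Copy(data, pos, part, 0, part.Length);
            parts.Add(part);

            if (hit < 0) break;
            pos = hit + separator.Length;
        }
        return parts;
    }

    public static void Main()
    {
        // "a", CRLF, 0xFF, "b", CRLF: two lines, one with a high byte.
        byte[] data = { 0x61, 0x0D, 0x0A, 0xFF, 0x62, 0x0D, 0x0A };
        byte[] crlf = { 0x0D, 0x0A };

        foreach (byte[] part in Split(data, crlf))
            Console.WriteLine(BitConverter.ToString(part));
    }
}
```

Nothing here ever looks at an encoding, so bytes over 127 pass through untouched.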
Now, I think it's nice that Strings have a known encoding, but I think C# should have a more powerful System.Buffer class that allows for more than just getting and setting bytes. I should be able to search a buffer for another byte array. And the regex library should allow matching on byte buffers.