CSS Cleaner, automated browser testing - brad's life
Brad Fitzpatrick


CSS Cleaner, automated browser testing [Feb. 3rd, 2006|03:01 pm]

So we need a CSS cleaner on LiveJournal. I don't want to talk about that part.

I'm going to talk about state of CSS parsers in Perl:

-- they don't work. CSS::Tiny, CSS.pm (both parsers), CSS::SAC.... no go.

Also looked at packaging csstidy, but it's not quite what we want either, it would be a lot of work, and I imagine it wasn't written with security in mind, etc, etc.

So let's assume I have to write my own CSS parser, or at least fix up CSS::SAC (which seems to be the most promising). How to test?

The CSS cleaner works like:

CSS -> parse -> parse tree -> clean -> serialize -> clean CSS
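That round trip can be sketched in a few lines (JavaScript here for illustration; the regex "parser" and the tiny property whitelist are hypothetical stand-ins, nothing like the real cleaner):

```javascript
// Minimal sketch of the parse -> clean -> serialize round trip.
// The regex-based "parser" and the whitelist are illustrative only.
const ALLOWED = new Set(['color', 'background-color', 'font-size']);

function parse(css) {
  // Very naive: one rule per "selector { declarations }" block.
  const rules = [];
  const re = /([^{}]+)\{([^}]*)\}/g;
  let m;
  while ((m = re.exec(css)) !== null) {
    const decls = m[2].split(';')
      .map(d => d.trim()).filter(Boolean)
      .map(d => {
        const i = d.indexOf(':');
        return { prop: d.slice(0, i).trim().toLowerCase(),
                 value: d.slice(i + 1).trim() };
      });
    rules.push({ selector: m[1].trim(), decls });
  }
  return rules;
}

function clean(rules) {
  // Drop any declaration whose property isn't whitelisted.
  return rules.map(r => ({
    selector: r.selector,
    decls: r.decls.filter(d => ALLOWED.has(d.prop)),
  }));
}

function serialize(rules) {
  return rules
    .map(r => `${r.selector} { ${r.decls.map(d => `${d.prop}: ${d.value}`).join('; ')} }`)
    .join('\n');
}

console.log(serialize(clean(parse('p { color: red; behavior: url(evil.htc) }'))));
// -> p { color: red }
```

A real cleaner needs a proper tokenizer; a naive regex like this falls apart on comments, strings, and at-rules, which is exactly where the security problems live.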

Goal: in the not-fucking-with-us case (no JavaScript/etc.), the cleaned CSS is semantically equivalent to the original.

So how do I do a semantic diff of two CSS files, when I don't trust the parser?

Mozilla (Firefox) has a great CSS parser. Let's use that to test!

So the Perl test suite I wrote for CSS::Cleaner starts up and says, "Yo, connect to http://.....:9124/ with your browser." It then steps the browser through some 2,000 CSS files, giving it both the original and the cleaned version of each. In JavaScript I walk the browser's parsed CSS tree and build a canonical serialization of both versions. If they match, the browser advances to the next test; if not, it sends both serializations back to the server for a diff, and the test suite fails in the browser with a unified diff of Mozilla's view of the CSS differences, plus links to the original and cleaned CSS for human inspection.
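The canonicalization is the interesting bit. In the browser it would walk the parsed stylesheet via document.styleSheets[i].cssRules; the sketch below applies the same idea to plain rule objects so it can run anywhere. Sorting declarations is a simplification (ordering matters when a property repeats), and the rule shapes here are made up for illustration:

```javascript
// Sketch of the canonical-serialization idea: two stylesheets count as
// "semantically equal" if their canonical forms are string-identical.
function canonical(rules) {
  return rules
    .map(r => {
      const decls = r.decls
        .map(d => `${d.prop.toLowerCase()}:${d.value.trim()}`)
        .sort()   // simplification: ignores cascade order within a rule
        .join(';');
      return `${r.selector.replace(/\s+/g, ' ').trim()}{${decls}}`;
    })
    .join('\n');
}

// Same rule, but with different whitespace, casing, and declaration order.
const original = [
  { selector: 'p ,  a', decls: [{ prop: 'Color', value: ' red ' },
                                { prop: 'margin', value: '0' }] },
];
const cleaned = [
  { selector: 'p , a', decls: [{ prop: 'margin', value: '0' },
                               { prop: 'color', value: 'red' }] },
];
console.log(canonical(original) === canonical(cleaned)); // true
```

On a mismatch, the two canonical strings are what you'd ship back to the server to turn into a unified diff.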


Except now I have to fix CSS::SAC or something. Bleh.

From: jamesd
2006-02-04 01:30 am (UTC)
Life would be _so_ much easier if the browser makers did this right in the first place. But it'll never happen.
From: plix
2006-02-04 02:13 am (UTC)

What about hacks?

There are a few CSS hacks dependent upon parsing peculiarities in the various browsers (such as the underscore hack for IE, the comment-structuring hack, the voice-family hack, etc.) which really make non-obtrusive cleaning a real bitch. The primary question, I suppose, is whether you're going to sanitize defensively or aggressively.
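For reference, the underscore hack works precisely because IE's parser accepts property names that a conforming parser rejects. A strict cleaner can drop them by validating property names; the regex below is a rough approximation of the CSS identifier grammar, not the full spec:

```javascript
// A strict cleaner rejects property names that aren't valid CSS
// identifiers, which kills the underscore hack as a side effect.
// Simplified identifier grammar: optional leading hyphen, then a
// letter, then letters/digits/hyphens.
const IDENT = /^-?[a-z][a-z0-9-]*$/i;

function validProperty(prop) {
  return IDENT.test(prop);
}

console.log(validProperty('width'));        // true
console.log(validProperty('_width'));       // false (the underscore hack)
console.log(validProperty('voice-family')); // true (that hack abuses the value, not the name)
console.log(validProperty('-moz-opacity')); // true (vendor prefix, if you choose to allow it)
```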
From: brad
2006-02-04 07:02 pm (UTC)

Re: What about hacks?

We're not allowing hacks. None of them work reliably anyway, especially now with IE7. The proper way to do conditional CSS is by giving different clients different CSS.
From: plix
2006-02-04 07:48 pm (UTC)

Re: What about hacks?

IE7 only aggravates/changes the situation to an extent; IE is not the only browser for which these hacks are used. What I was getting at is: are you only going to allow properties within a predefined lexicon that your parser knows to be "safe"? If so, do you intend to support non-standard properties such as the -moz- family or Microsoft's filter:?

The other problem is serving different clients different CSS: how do you plan to allow such conditionals (or how do you already, as I'm not very familiar with S2)? Conditional comments only work for IE, detection based on the user-agent string is notoriously spotty at best, and detecting with JS (object detection, etc.) isn't reasonable for obvious reasons.
From: mart
2006-02-04 08:06 pm (UTC)

Re: What about hacks?

Probably best to stick to the hacks that don't rely on bad parsing, such as the * html one for IE. The cleaner has to be picky, because otherwise browser parsing bugs can be used to sneak in extra rules, much like people used to do by making entries with malformed HTML that the HTML cleaner would pass through but IE would interpret. Now the HTML cleaner bitches about malformed HTML and escapes the entire entry, as you've probably seen.

From: (Anonymous)
2006-02-04 08:54 am (UTC)

Have you read this blogger's post about talking to the LJ hackers?


"Livejournal assumed the majority of our javascript injection attacks involved malicious code implanted in style sheets or user posts, and they have heavily audited this area for bugs. The changes they made were for a Firefox-specific bug-- they assumed it was the key to the XSS attacks that we were doing. Ours affect all browsers though, and we were not using this Firefox-specific vulnerability."
From: brad
2006-02-04 07:15 pm (UTC)

Re: Have you read this blogger's post about talking to the LJ hackers?

Yeah, one group was going after XSS vulns in the /shop/ area. I'm less concerned about those, which were/are easy to fix, than about the Mozilla thing, which is much more involved.
From: perlmonger
2006-02-04 05:04 pm (UTC)
Bear in mind that (theoretical) semantic equivalence doesn't necessarily mean equivalent behaviour across browsers (read: in IE[56]).

For example, I have:
a.lj:link, a.lj:visited,
.ljuser a:link, .ljuser a:visited,
.entrytail a:link, .entrytail a:visited {
  color: #f99;
}

a[href^="http://www.picpix.com/"]:visited {
  color: #f99;
}
split into two blocks because IE/Win < 7 ignores the whole thing if any alternate contains an attribute selector. Not that I personally care much - unlike for our commercial sites, IE/Win compatibility in my LJ isn't high on my priority list - but other users may be more concerned.
From: robflynn
2006-02-06 08:03 pm (UTC)
All I know is HTML/CSS/JS is about to make me punch my computer. Hard. ;)