A Better Way to Serve XHTML Pages to IE

A while back, I thought I had come up with a clever method for serving this XHTML 1.1 site with the HTML MIME type just to broken browsers, and with the XHTML MIME type for all of the good browsers, without having to duplicate files or rewrite HTTP headers on-the-fly.

To accomplish this, I gave every page on this site an extension of .xhtml, set the quality factor of HTML and XHTML to be identical, and then created a symbolic link for each XHTML page I wanted to have visible in Explorer. Since the symlink had the extension .HTML and a file size of just 17 bytes, this would ensure that any browser dumb enough to specify support for both XHTML and HTML (without indicating a preference for one) would be served up the XHTML file with the HTML MIME type.

This worked great, since I could select which pages I wanted to be unavailable to Internet Explorer users. But then a few days ago I went back to the old problem of making content negotiation work on mobile phone browsers that were also dumb enough to indicate support for HTML and WAP without specifying a preference. In fiddling around with Opera’s Accept header to make it behave like a mobile phone, I discovered the flaw in the above scheme. Since that symlink was always smaller than any WAP page, it was always being served to the modified Opera, just as it would be to any mobile phone browser that sent the correct header.
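The flaw is easier to see in a small simulation. This is a minimal sketch, not Apache's actual negotiation algorithm (which has more steps): it assumes the server compares only the client's quality values and then breaks ties by picking the smallest file, as described above. The names Variant, parse_accept, and choose, the file names, and every size other than the 17-byte symlink are made up for illustration, and wildcard types such as */* are not handled.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    mime_type: str
    size: int  # bytes


def parse_accept(header):
    """Map each media type in an Accept header to its q-value (default 1.0)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs[parts[0]] = q
    return prefs


def choose(variants, accept_header):
    """Pick the acceptable variant with the highest q; ties go to the smallest file."""
    prefs = parse_accept(accept_header)
    acceptable = [v for v in variants if prefs.get(v.mime_type, 0.0) > 0.0]
    if not acceptable:
        return None
    return min(acceptable, key=lambda v: (-prefs[v.mime_type], v.size))


# Hypothetical variants of one page; only the 17-byte symlink size comes from the post.
VARIANTS = [
    Variant("page.xhtml", "application/xhtml+xml", 5000),
    Variant("page.html", "text/html", 17),          # the symlink
    Variant("page.wml", "text/vnd.wap.wml", 900),
]

# A browser that lists XHTML and HTML with no preference gets the symlink (intended).
print(choose(VARIANTS, "application/xhtml+xml, text/html").name)
# A phone that lists HTML and WAP with no preference also gets the symlink (the flaw).
print(choose(VARIANTS, "text/html, text/vnd.wap.wml").name)
```

Under these assumptions, the 17-byte symlink wins every tie it takes part in, which is why a WAP page could never be selected over it.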

So I spent about two hours googling around, looking for a better way to change the MIME type sent to IE, and although I didn’t find anything that was exactly right, I did find something that was brilliant enough to use as a starting point.

Though he has since described the technique as flawed, Dan “MinutiaeMan” Carlson basically got it right the first time when he showed how to use Apache’s URL rewriting feature to target IE for special treatment. He has a point that Google also fails to support the XHTML MIME type, but there are better ways of handling this than going back to the URL rewriting scheme that he now uses (the same method that I initially used).

First, here is Dan’s original URL rewriting code:

AddType application/xhtml+xml .xhtml
RewriteEngine on
RewriteCond %{REQUEST_URI} \.xhtml$
RewriteCond %{THE_REQUEST} HTTP/1\.1
RewriteCond %{HTTP_USER_AGENT} MSIE [NC]
RewriteRule .* - [T=text/html]

This is exactly the right concept for a purist interested in getting browser and search-engine developers to fix their software without keeping the 93% of surfers using Internet Explorer from seeing the site at all. In short, it is designed to act only on URLs with the .xhtml extension, only on requests sent via HTTP 1.1, and only on clients with MSIE (in any letter case) in their browser string, and then change the MIME type to text/html.

But there are two minor problems. Neither is obvious, but both are important. First, filtering on the REQUEST_URI variable only catches requests whose URL ends in .xhtml. This will not catch URLs that do not name the .xhtml file explicitly, yet point to one through content negotiation nonetheless. Instead, we need to filter on REQUEST_FILENAME, which holds the full filesystem path of the file that is actually being served to the client, even if that full name was not used to request it.
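The difference can be checked with the same regular expression applied to both values. This is only a sketch in Python, not Apache itself, and the URL and filesystem path below are hypothetical examples of a negotiated request.

```python
import re

# The extension test used in the RewriteCond lines.
XHTML_EXT = re.compile(r"\.xhtml$")

# Hypothetical request: the URL carries no extension, but negotiation
# resolves it to an .xhtml file on disk.
requested_uri = "/articles/ie-xhtml"
resolved_file = "/var/www/articles/ie-xhtml.xhtml"


def matches_extension(value):
    """Return True when the value ends in .xhtml."""
    return XHTML_EXT.search(value) is not None


print(matches_extension(requested_uri))   # the REQUEST_URI-style test misses it
print(matches_extension(resolved_file))   # the REQUEST_FILENAME-style test catches it
```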

Second, applying this rewrite rule to all browsers with MSIE in their User Agent header will snag all clients with that word anywhere in their identification string. This might actually be desired, but it does include Opera in its default configuration, as well as the Avant browser, Amiga Aweb, iCab, and MS WebTV. And it will almost certainly affect future versions of IE, even if Microsoft modifies it to accept pages served with the XHTML MIME type.

To single out just Internet Explorer for receiving Web pages as text/html, we need to filter on something more specific. All known versions of IE produce a string very similar to Mozilla/4.0 (compatible; MSIE 6.0; Update a; AOL 6.0; Windows 98), and all of them are identical from the beginning all the way up to the IE version number. So instead, we need to filter on a handful of strings, one for each version of IE that cannot handle XHTML.

With these changes, my improved URL rewriting code looks like this:

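RewriteBase /
# Apply only when the file actually being served ends in .xhtml
RewriteCond %{REQUEST_FILENAME} \.xhtml$
RewriteCond %{THE_REQUEST} HTTP/1\.1
# Match only the IE versions that cannot handle XHTML (4, 5, and 6.0);
# the patterns are anchored at the start and the dots are escaped
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4\.0 \(compatible; MSIE 4" [OR]
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4\.0 \(compatible; MSIE 5" [OR]
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4\.0 \(compatible; MSIE 6\.0"
# Leave the URL untouched and change only the MIME type
RewriteRule .* - [T=text/html;charset=UTF-8]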
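To compare the broad MSIE test with the narrower prefix test, here is a rough Python sketch of the two user-agent patterns. It assumes a tightened form of the prefix (anchored at the start, with dots escaped) folded into one expression; the IE 6 string is the one quoted above, while the "future IE" and Gecko strings are hypothetical examples chosen only to illustrate the behavior.

```python
import re

# Broad test: MSIE anywhere in the string, in any letter case.
BROAD = re.compile(r"MSIE", re.IGNORECASE)

# Narrow test: the fixed IE prefix followed by version 4, 5, or 6.0.
NARROW = re.compile(r"^Mozilla/4\.0 \(compatible; MSIE (4|5|6\.0)")

IE6 = "Mozilla/4.0 (compatible; MSIE 6.0; Update a; AOL 6.0; Windows 98)"
FUTURE_IE = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1)"          # hypothetical
GECKO = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040616"  # illustrative


def served_as_html(user_agent, pattern):
    """Return True when the pattern would send this client text/html."""
    return pattern.search(user_agent) is not None


for label, ua in [("IE 6", IE6), ("future IE", FUTURE_IE), ("Gecko", GECKO)]:
    print(label, served_as_html(ua, BROAD), served_as_html(ua, NARROW))
```

In this sketch the narrow pattern leaves the hypothetical later IE version alone while the broad one still catches it. Note that any client whose identification string begins with exactly the same prefix would still match the narrow pattern, since the test looks only at the start of the string.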

If it turns out, after configuring PetesGuide with this technique, that Google still can’t index XHTML, then I can simply add another [OR] user-agent test to the above code or, better yet, convince Google to fix the problem.

Posted in Standards, Web.

