Ticket #1010 (closed defect: wontfix)

Opened 2 years ago

Last modified 18 months ago

i18n: non-ASCII encoding broken on Konqueror, Safari

Reported by: peller Owned by: peller
Priority: normal Milestone:
Component: General Version: 0.3
Severity: normal Keywords:
Cc:

Description

Konqueror apparently does not honor UTF-8 by default in XHR

Change History

Changed 2 years ago by peller

  • summary changed from i18n: non-ASCII encoding broken on Konqueror to i18n: non-ASCII encoding broken on Konqueror, Safari

Changed 2 years ago by peller

http://twistedmatrix.com/pipermail/twisted-web/2005-February/001165.html

It is unclear exactly when this was fixed, and I'm not sure it covers the case where no encoding is specified. Either way, we may be stuck doing the encoding ourselves for old browsers.

Changed 2 years ago by peller

Changed 2 years ago by mumme

I can confirm that this appears in the latest kde3.5 branch, so it isn't resolved yet, at least not for Konqueror.

This is cache related: it seems the Content-Type header isn't stored in the cache.

I debugged this a bit with KDevelop while looking into a related issue (I was trying to find out why a cached XHR got a 200 status on an async request but a 304 on a sync one). It seems the decoder looks for a UTF-8 BOM in the first 3 bytes; if that isn't found, it looks for tags with charset info (<?xml or <meta).

It seems like you would be able to get away with:

/* <?xml version="1.0" encoding="UTF-8" ?> */

in the top of your translation files

or use a cache buster

or save as UTF-16, iso-10646-ucs2

If you still would like to do the workaround, that is.

/ Fredrik
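A minimal sketch of the first two suggestions above, assuming the loader evaluates the file text as a parenthesized expression. The sample file contents, the variable names, the `bustCache` helper and its `nocache` parameter are all hypothetical and only illustrate the idea:

```javascript
// Hypothetical contents of a translation file, as an XHR would return it.
// The leading comment carries the XML declaration for charset sniffing.
var fileText =
  '/* <?xml version="1.0" encoding="UTF-8" ?> */\n' +
  '{ greeting: "Gr\u00fc\u00dfe" }';

// Assumption: the loader wraps the text in parentheses and evals it.
// The comment is skipped by the JavaScript parser, so the file still
// evaluates to the object literal that follows it.
var bundle = eval("(" + fileText + ")");

// Cache-buster alternative: append a unique query parameter so the
// browser cannot answer the request from its cache.
function bustCache(url) {
  var sep = url.indexOf("?") === -1 ? "?" : "&";
  return url + sep + "nocache=" + new Date().getTime();
}
```

The cache buster avoids the stale-header problem at the cost of refetching the file on every request, so the comment header is probably the cheaper option when the files can be edited.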

Changed 2 years ago by peller

Interesting. I'm not sure this is a content-type header bug, as I was relying on the default encoding to be UTF-8 without any content-type header... but I was hoping that content-type could be used as a server-based workaround. Guess that's out.

Unfortunately, all of these workarounds would break the current code, which assumes the contents of the file can be eval'd as a JS expression. I suppose we could introduce code to optionally eat an XML declaration, but I'd hate to do this... other browsers assume UTF-8, so encoding in UTF-16 or iso would break them, I think.

For now, I think the only workaround is to encode with JavaScript \uxxxx escapes or the single-byte equivalent for high ASCII. A build script could do this.
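As a rough sketch of what such a build step might do, the function below rewrites every non-ASCII character in a source string as a \uXXXX escape. The name `escapeNonAscii` is hypothetical, and reading and writing the actual files is left out:

```javascript
// Replace every UTF-16 code unit outside 7-bit ASCII with a \uXXXX escape.
// Characters outside the Basic Multilingual Plane are stored as two code
// units, so they come out as two escapes (a surrogate pair).
function escapeNonAscii(source) {
  return source.replace(/[^\x00-\x7f]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16);
    // Left-pad to four hex digits.
    return "\\u" + ("0000" + hex).slice(-4);
  });
}

// Example: the escaped text is pure ASCII but evaluates to the same value.
var escaped = escapeNonAscii('{ greeting: "Gr\u00fc\u00dfe" }');
var roundTrip = eval("(" + escaped + ")");
```

The output is plain ASCII, so it no longer matters which charset the browser guesses. The escapes are valid inside string literals and harmless inside comments.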

Changed 2 years ago by mumme

Well, I looked some more into the khtml code and there are a number of issues, not just the content-type cache. I wrote a patch that seems to be working. I'm going to test it a bit more before I send it to Kfm devel.

However, the reason I put the ugly <?xml ... encoding="UTF-8" inside a JavaScript comment was that it keeps the file eval'able. The auto-detection decoder used in khtml XHR is the same one used for any other HTML/XML page, and it doesn't care about a JavaScript comment.

I know it's ugly, but it must be cleaner than doing a build script replacement.

/ Fredrik
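To illustrate why the comment does not get in the way, here is a simplified sketch of the detection order described in this ticket (BOM first, then a charset declaration). It is not khtml's actual code: the function names, the 512-byte scan window and the regular expressions are all assumptions made for the example.

```javascript
// Simplified illustration of byte-level charset sniffing.
function sniffCharset(bytes) {
  // Step 1: UTF-8 byte order mark in the first three bytes.
  if (bytes.length >= 3 &&
      bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  // Step 2: read the head of the data as ASCII (assumed 512-byte window);
  // bytes above 0x7f are replaced so they cannot form a false match.
  var head = "";
  for (var i = 0; i < bytes.length && i < 512; i++) {
    head += bytes[i] < 0x80 ? String.fromCharCode(bytes[i]) : "?";
  }
  var xml = /<\?xml[^>]*encoding\s*=\s*["']([^"']+)["']/i.exec(head);
  if (xml) return xml[1];
  var meta = /<meta[^>]*charset\s*=\s*["']?([\w-]+)/i.exec(head);
  if (meta) return meta[1];
  return null; // nothing declared: the caller falls back to its default
}

// Helper for the example: turn ASCII text into an array of byte values.
function toBytes(text) {
  var out = [];
  for (var i = 0; i < text.length; i++) out.push(text.charCodeAt(i));
  return out;
}

var withHeader = sniffCharset(
  toBytes('/* <?xml version="1.0" encoding="UTF-8" ?> */\n{ a: 1 }'));
var withoutHeader = sniffCharset(toBytes("{ a: 1 }"));
```

Because the scan works on raw text and knows nothing about JavaScript syntax, the declaration is found even though it sits inside a comment.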

Changed 2 years ago by peller

ah... I missed the /* comments */ around the xml declaration. Clever workaround, even if it's a real kludge.

That's wonderful if you can help get a patch into khtml!

Changed 2 years ago by dylan

  • milestone set to 0.4

Changed 2 years ago by peller

  • status changed from new to closed
  • resolution set to wontfix

Ok, so we have a workaround (thanks, Fredrik) and a bug filed against KDE. Not sure there's much more we can do. I checked an example of the workaround into the tests and will add it to the docs when we have more detailed how-to documentation.

Changed 18 months ago by anonymous

  • milestone deleted

Milestone 0.4 deleted
