How can I configure character encoding auto-detect (for the "file:" URL scheme)?
With the "file:" URL scheme, character information cannot be provided, so character encoding auto-detection must be used. The problem is that some UTF-8 files are incorrectly recognized as TIS-620, which I never use. Possible workarounds for me could be:
1. Remove TIS-620 from the charset list used for auto-detect.
2. Set the auto-detect list to UTF-8, ISO-8859-1.
But how can one do this?
All Replies (7)
vinc17 wrote:
With the "file:" URL scheme, character information cannot be provided
For HTML files, specify the character encoding with the meta element's charset attribute.
<!DOCTYPE html>
<html lang="en-us">
<head>
<meta charset="utf-8">
...
For XHTML files, you can specify the character encoding via XML declaration.
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ...
For stylesheets, specify the character encoding with the @charset rule.
I forgot to say: this is for text/plain files.
Did you try Auto-Detect > Universal?
Auto-Detect → Universal is buggy: https://bugzilla.mozilla.org/show_bug.cgi?id=760050 and I'm looking for a workaround (by choosing something that wouldn't include TIS-620...).
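To illustrate why a detector can confuse the two encodings, here is a minimal Python sketch (not Firefox's detection code; the helper name decodes_as and the sample string are made up for illustration). It shows that the bytes of a short UTF-8 text can also be a valid TIS-620 byte sequence, so a detector has to guess from statistics rather than from validity alone.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)  # strict decoding: raises on invalid bytes
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: "café" encoded as UTF-8 is b'caf\xc3\xa9'.
sample = "café".encode("utf-8")

# The same bytes are valid in both encodings, which makes them ambiguous.
print(decodes_as(sample, "utf-8"))
print(decodes_as(sample, "tis_620"))

# Decoded as TIS-620, the two bytes of "é" turn into two Thai letters.
print(sample.decode("tis_620"))

# The reverse is not true here: a lone TIS-620 Thai byte is not valid UTF-8.
print(decodes_as("ก".encode("tis_620"), "utf-8"))
```

A strict UTF-8 check like this could serve as an offline sanity test for a file, but it does not change what the browser's auto-detection does.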
Your text file is missing the byte order mark (BOM). All browsers I've tried open it as Western (Windows-1252). Notepad++ identifies it as "ANSI as UTF8".
"Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in." — http://en.wikipedia.org/wiki/Byte_order_mark
Saving it with BOM makes all browsers treat it as UTF-8.
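For anyone who wants to try the BOM approach on an existing file, here is a small Python sketch, assuming the file is already valid UTF-8. The function names has_utf8_bom and add_utf8_bom are made up for this example; codecs.BOM_UTF8 is the standard library constant for the three bytes EF BB BF.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM unless it is already present."""
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data


# Example with in-memory bytes; for a real file, read it in "rb" mode,
# pass the contents through add_utf8_bom, and write the result in "wb" mode.
original = "café\n".encode("utf-8")
with_bom = add_utf8_bom(original)
print(has_utf8_bom(original), has_utf8_bom(with_bom))
```

Calling add_utf8_bom twice leaves the data unchanged the second time, so it is safe to run over files that already have a BOM.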
Using the BOM on UTF-8 files is not satisfactory as I've just explained on the bug report.
The View → Character Encoding → Auto-Detect setting to use depends on the languages used in the document. In the case of multiple languages, use either the setting for the first language that appears in the document, or Universal.
Your sample file is rendered properly when using any of the following Auto-Detect settings:
- Chinese
- East Asian
- Japanese
- Korean
- Simplified Chinese
- Traditional Chinese
Your sample file is rendered incorrectly when using any of the following Auto-Detect settings:
- (Off)
- Russian
- Ukrainian
- Universal