How can I configure character encoding auto-detect (for the "file:" URL scheme)?

With the "file:" URL scheme, no character encoding information can be provided, so character encoding auto-detection must be used. The problem is that some UTF-8 files are incorrectly recognized as TIS-620, which I never use. Possible workarounds for me could be:

1. Remove TIS-620 from the charset list used for auto-detect.
2. Set the auto-detect list to UTF-8, ISO-8859-1.

But how can one do this?
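
To make the second workaround concrete, here is a minimal Python sketch of what an auto-detect list of UTF-8, ISO-8859-1 would amount to outside the browser. It is not how Firefox's detector works; the function name `guess_encoding` and the try-in-order strategy are assumptions made purely for illustration.

```python
def guess_encoding(data: bytes, candidates=("utf-8", "iso-8859-1")) -> str:
    """Return the first candidate encoding that decodes `data` without error.

    Hypothetical helper: candidates are tried strictly in order, so UTF-8
    wins whenever the bytes are valid UTF-8.
    """
    for name in candidates:
        try:
            data.decode(name)  # strict decoding: raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError("none of the candidate encodings fits the data")


if __name__ == "__main__":
    print(guess_encoding("café".encode("utf-8")))  # valid UTF-8
    print(guess_encoding(b"caf\xe9"))              # invalid as UTF-8, falls through
```

Because ISO-8859-1 assigns a character to every byte value, it acts as a catch-all at the end of the list, and TIS-620 is never considered.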

All Replies (7)

vinc17 wrote:

With the "file:" URL scheme, character information cannot be provided

For HTML files, specify the character encoding with the meta element's charset attribute.

<!DOCTYPE html>
<html lang="en-us">
<head>
<meta charset="utf-8">
...

For XHTML files, you can specify the character encoding via the XML declaration.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ...

For stylesheets, specify the character encoding with the @charset rule.

I forgot to say: this is for text/plain files.

Did you try Auto-Detect > Universal?

Auto-Detect → Universal is buggy (see https://bugzilla.mozilla.org/show_bug.cgi?id=760050), and I'm looking for a workaround, for example by choosing a setting that wouldn't include TIS-620.

Your text file is missing the byte order mark (BOM). All browsers I've tried open it as Western (Windows-1252). Notepad++ identifies it as "ANSI as UTF8".

"Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in." — http://en.wikipedia.org/wiki/Byte_order_mark

Saving it with a BOM makes all browsers treat it as UTF-8.
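
If you want to check for the BOM or add one without a text editor, something like the following Python sketch should work. The helper names `has_utf8_bom` and `add_utf8_bom` are made up for this example; `codecs.BOM_UTF8` is the standard library constant for the three bytes EF BB BF.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data: bytes) -> bytes:
    """Return the data with a UTF-8 BOM prepended, unless one is already there.

    Raises UnicodeDecodeError if the data is not valid UTF-8, so a file in
    another encoding is not mislabeled by accident.
    """
    data.decode("utf-8")  # validation only; the decoded text is discarded
    if has_utf8_bom(data):
        return data
    return codecs.BOM_UTF8 + data


if __name__ == "__main__":
    sample = "é".encode("utf-8")
    print(has_utf8_bom(sample))   # the raw bytes carry no BOM yet
    print(add_utf8_bom(sample))   # same bytes with EF BB BF in front
```

To apply it to a file, read the file in binary mode, pass the bytes through `add_utf8_bom`, and write the result back in binary mode.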

Using the BOM on UTF-8 files is not satisfactory, as I've just explained in the bug report.

The View → Character Encoding → Auto-Detect setting to use depends on the languages used in the document. In the case of multiple languages, use either the setting for the first language that appears in the document, or Universal.

Your sample file is rendered properly when using any of the following Auto-Detect settings:

  • Chinese
  • East Asian
  • Japanese
  • Korean
  • Simplified Chinese
  • Traditional Chinese

Your sample file is rendered incorrectly when using any of the following Auto-Detect settings:

  • (Off)
  • Russian
  • Ukrainian
  • Universal