Question

How can I configure character encoding auto-detect (for the "file:" URL scheme)?

7 replies
4 have this problem
13 views
Last reply by Gingerbread Man

10 years ago

6/12/13, 9:31 PM

With the "file:" URL scheme, character information cannot be provided, so character encoding auto-detection must be used. The problem is that some UTF-8 files are incorrectly recognized as TIS-620, which I never use. Possible workarounds for me could be:
1. Remove TIS-620 from the charset list used for auto-detect.
2. Set the auto-detect list to UTF-8, ISO-8859-1.
But how can one do this?
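
To make workaround 2 concrete: a detector limited to UTF-8 and ISO-8859-1 can be very simple, because strict UTF-8 decoding rejects most byte sequences that are not UTF-8, while ISO-8859-1 accepts any byte. The following is only a minimal Python sketch of that idea. It is not a Firefox setting and not Firefox's actual detector, and the function name `guess_encoding` is made up for illustration.

```python
def guess_encoding(data: bytes) -> str:
    """Guess between UTF-8 and ISO-8859-1 only (hypothetical helper)."""
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so it never fails.
        return "iso-8859-1"
```

With a list this short, a valid UTF-8 file can never be mistaken for TIS-620, which is the behavior asked for above. Pure ASCII input is reported as UTF-8, which is harmless since ASCII decodes identically in both encodings.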

Answer 1 · 2013-06-12 21:31:00

Gingerbread Man

6/13/13, 3:10 AM

vinc17 wrote:

With the "file:" URL scheme, character information cannot be provided

For HTML files, specify the character encoding with the meta element's charset attribute.

<!DOCTYPE html>
<html lang="en-us">
<head>
<meta charset="utf-8">
...

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta#attr-charset

For XHTML files, you can specify the character encoding via the XML declaration.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ...

http://en.wikipedia.org/wiki/Character_encodings_in_HTML

For stylesheets, specify the character encoding with the @charset rule.

https://developer.mozilla.org/en-US/docs/Web/CSS/@charset
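
As a rough illustration of the three declarations above, here is a Python sketch that looks for each of them at the start of a file. It assumes simplified regular expressions and is not the sniffing algorithm from the HTML or CSS specifications, nor what Firefox does; `declared_encoding` is a hypothetical helper name.

```python
import re
from typing import Optional

# Simplified patterns for the three declarations mentioned above.
_PATTERNS = [
    # HTML: <meta charset="utf-8">
    re.compile(rb'<meta\s+charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE),
    # XHTML: <?xml version="1.0" encoding="UTF-8" ?>
    re.compile(rb'<\?xml[^>]*\bencoding=["\']([A-Za-z0-9_.:-]+)["\']', re.IGNORECASE),
    # CSS: @charset "UTF-8";
    re.compile(rb'@charset\s+"([A-Za-z0-9_.:-]+)"\s*;'),
]


def declared_encoding(head: bytes) -> Optional[str]:
    """Return the encoding label declared in the first bytes of a file, if any."""
    for pattern in _PATTERNS:
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return None
```

A plain text file has no such declaration, so the function returns `None` for it and some other mechanism has to decide.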

Answer 2 · 2013-06-12 21:31:00

vinc17 Question owner

6/13/13, 10:14 AM

I forgot to say: this is for text/plain files.

Answer 3 · 2013-06-12 21:31:00

cor-el

Moderator
Top 10 Contributor

6/13/13, 10:24 AM

Did you try Auto-Detect > Universal?

Answer 4 · 2013-06-12 21:31:00

vinc17 Question owner

6/13/13, 10:36 AM

Auto-Detect → Universal is buggy (https://bugzilla.mozilla.org/show_bug.cgi?id=760050), and I'm looking for a workaround, such as choosing a setting that wouldn't include TIS-620.

Answer 5 · 2013-06-12 21:31:00

Gingerbread Man

6/13/13, 11:48 PM

Your text file is missing the byte order mark (BOM). All browsers I've tried open it as Western (Windows-1252). Notepad++ identifies it as "ANSI as UTF8".

"Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in." — http://en.wikipedia.org/wiki/Byte_order_mark

Saving it with BOM makes all browsers treat it as UTF-8.
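
For anyone who wants to script this rather than use an editor, here is a minimal Python sketch that checks for the UTF-8 BOM and prepends it when missing. The helper names are made up for illustration; `codecs.BOM_UTF8` is the standard library constant for the three bytes EF BB BF.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM unless it is already there."""
    if has_utf8_bom(data):
        return data
    return codecs.BOM_UTF8 + data
```

To apply it to a file, read the file in binary mode, pass the bytes through `add_utf8_bom`, and write the result back in binary mode. Python's "utf-8-sig" codec strips the BOM again when decoding.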

Answer 6 · 2013-06-12 21:31:00

vinc17 Question owner

6/14/13, 12:51 AM

Using the BOM on UTF-8 files is not satisfactory, as I've just explained in the bug report.

Answer 7 · 2013-06-12 21:31:00

Gingerbread Man

6/14/13, 1:30 AM

The View → Character Encoding → Auto-Detect setting to use depends on the languages used in the document. In the case of multiple languages, use either the setting for the first language that appears in the document, or Universal.

Your sample file is rendered properly when using any of the following Auto-Detect settings:

Chinese
East Asian
Japanese
Korean
Simplified Chinese
Traditional Chinese

Your sample file is rendered incorrectly when using any of the following Auto-Detect settings:

(Off)
Russian
Ukrainian
Universal
