How can I configure character encoding auto-detect (for the "file:" URL scheme)?

With the "file:" URL scheme, character information cannot be provided, so character encoding auto-detection must be used. The problem is that some UTF-8 files are incorrectly recognized as TIS-620, which I never use. Possible workarounds for me would be: 1. Remove TIS-620 from the list of charsets used for auto-detection. 2. Set the auto-detect list to UTF-8, ISO-8859-1. But how can one do this?
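One plausible reason a detector can make this kind of mistake: the bytes of a UTF-8 multi-byte sequence often fall in the range that TIS-620 assigns to Thai letters, so the same bytes decode without error under both encodings. The sketch below illustrates this with Python's standard `utf-8` and `tis_620` codecs. `decodes_as` is a hypothetical helper name, and nothing here is meant to describe how Firefox's detector is actually implemented.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes without error in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# "é" is the two bytes C3 A9 in UTF-8.
sample = "\u00e9".encode("utf-8")

print(decodes_as(sample, "utf-8"))     # the bytes are valid UTF-8
print(decodes_as(sample, "tis_620"))   # ...and also valid TIS-620
# Under TIS-620 the same two bytes become two Thai letters.
# ascii() keeps the output printable on any terminal.
print(ascii(sample.decode("tis_620")))
```

Because both decodings succeed, a detector has to fall back on statistics to choose between them, and a short or unusual file may not give it enough to go on.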

All Replies (7)

vinc17 wrote:

With the "file:" URL scheme, character information cannot be provided

For HTML files, specify the character encoding with the meta element's charset attribute.

<!DOCTYPE html>
<html lang="en-us">
<head>
<meta charset="utf-8">
...

For XHTML files, you can specify the character encoding via the XML declaration.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ...

For stylesheets, specify the character encoding with the @charset rule, e.g. @charset "utf-8"; at the very start of the file.

I forgot to say: this is for text/plain files.

Did you try Auto-Detect > Universal?

Auto-Detect → Universal is buggy (see https://bugzilla.mozilla.org/show_bug.cgi?id=760050), and I'm looking for a workaround, such as choosing a setting that wouldn't include TIS-620.

Your text file is missing the byte order mark (BOM). All browsers I've tried open it as Western (Windows-1252). Notepad++ identifies it as "ANSI as UTF8".

"Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in." — http://en.wikipedia.org/wiki/Byte_order_mark

Saving it with BOM makes all browsers treat it as UTF-8.
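For readers who do want to try the BOM route, here is a minimal Python sketch that prepends a UTF-8 BOM to an existing file. The function name `add_utf8_bom` is hypothetical, and the sketch assumes the file is small enough to read into memory and is already valid UTF-8 (it raises `UnicodeDecodeError` otherwise, rather than writing anything).

```python
import codecs
from pathlib import Path


def add_utf8_bom(path) -> bool:
    """Prepend a UTF-8 BOM to the file at `path`.

    Returns True if the BOM was added, False if one was already present.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False
    # Refuse to touch files that are not valid UTF-8.
    data.decode("utf-8")
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

A call such as `add_utf8_bom("notes.txt")` (file name chosen only for illustration) rewrites the file in place, so it is worth keeping a backup.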

Using the BOM on UTF-8 files is not satisfactory, as I've just explained in the bug report.

The View → Character Encoding → Auto-Detect setting to use depends on the languages used in the document. In the case of multiple languages, use either the setting for the first language that appears in the document, or Universal.

Your sample file is rendered properly when using any of the following Auto-Detect settings:

  • Chinese
  • East Asian
  • Japanese
  • Korean
  • Simplified Chinese
  • Traditional Chinese

Your sample file is rendered incorrectly when using any of the following Auto-Detect settings:

  • (Off)
  • Russian
  • Ukrainian
  • Universal