Question

How can I configure character encoding auto-detect (for the "file:" URL scheme)?

12.6.13, 21:31

With the "file:" URL scheme, character information cannot be provided, so character encoding auto-detection must be used. The problem is that some UTF-8 files are incorrectly recognized as TIS-620, which I never use. Possible workarounds for me could be:

1. Remove TIS-620 from the charset list used for auto-detect.
2. Set the auto-detect list to UTF-8, ISO-8859-1.

But how can one do this?
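To make the second workaround concrete, here is a minimal Python sketch of an ordered "UTF-8 first, ISO-8859-1 as fallback" decode. It is only an illustration of the idea, not Firefox's detector, and the thread does not show a browser setting that does this. The function name `decode_with_fallback` and the sample bytes are made up for the example.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "iso-8859-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for name in encodings:
        try:
            # Strict decoding: invalid byte sequences raise UnicodeDecodeError.
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")


# Hypothetical sample: "café" encoded as UTF-8 (the é becomes bytes C3 A9).
sample = "caf\u00e9".encode("utf-8")

utf8_result = decode_with_fallback(sample)        # decoded as UTF-8
latin1_result = decode_with_fallback(b"caf\xe9")  # not valid UTF-8, falls back

# The same UTF-8 bytes are also valid TIS-620, where C3 A9 are two Thai
# letters, so a detector that considers TIS-620 can pick the wrong answer.
thai_misreading = sample.decode("tis_620")
```

Because ISO-8859-1 assigns a character to every byte value, the fallback never fails, which is why it only makes sense as the last entry in the list.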

Answer 1 · 2013-06-12 21:31:00

Gingerbread Man

13.6.13, 03:10

vinc17 wrote:

With the "file:" URL scheme, character information cannot be provided

For HTML files, specify the character encoding with the meta element's charset attribute.

<!DOCTYPE html>
<html lang="en-us">
<head>
<meta charset="utf-8">
...

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta#attr-charset

For XHTML files, you can specify the character encoding via XML declaration.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ...

http://en.wikipedia.org/wiki/Character_encodings_in_HTML

For stylesheets, specify the character encoding with the @charset rule.

https://developer.mozilla.org/en-US/docs/Web/CSS/@charset
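As a rough way to check whether a local file carries one of the three in-file declarations mentioned above, a short Python sketch follows. The function name `declared_encoding`, the 1024-byte window, and the regular expressions are assumptions made for illustration; they are much simpler than what a browser actually does.

```python
import re

# Simplified patterns for the three in-file declarations (illustrative only).
XML_RE = re.compile(rb'^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._:-]+)["\']')
CSS_RE = re.compile(rb'^@charset "([A-Za-z0-9._:-]+)";')
META_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def declared_encoding(head: bytes):
    """Return the lowercase encoding label declared near the start, or None."""
    window = head[:1024]  # only look at the beginning of the file
    for pattern in (XML_RE, CSS_RE, META_RE):
        match = pattern.search(window)
        if match:
            return match.group(1).decode("ascii").lower()
    return None


html_label = declared_encoding(b'<!DOCTYPE html>\n<html lang="en-us">\n<head>\n<meta charset="utf-8">')
xhtml_label = declared_encoding(b'<?xml version="1.0" encoding="UTF-8" ?>')
css_label = declared_encoding(b'@charset "utf-8";\nbody { margin: 0 }')
```

A file with no such declaration returns `None`, in which case the encoding has to come from somewhere else, such as a BOM or auto-detection.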

Answer 2 · 2013-06-12 21:31:00

vinc17 Question owner

13.6.13, 10:14

I forgot to say: this is for text/plain files.

Answer 3 · 2013-06-12 21:31:00

cor-el

Moderator

13.6.13, 10:24

Did you try Auto-Detect > Universal?

Answer 4 · 2013-06-12 21:31:00

vinc17 Question owner

13.6.13, 10:36

Auto-Detect → Universal is buggy (https://bugzilla.mozilla.org/show_bug.cgi?id=760050), and I'm looking for a workaround, for example by choosing a setting that wouldn't include TIS-620.

Answer 5 · 2013-06-12 21:31:00

Gingerbread Man

13.6.13, 23:48

Your text file is missing the byte order mark (BOM). All browsers I've tried open it as Western (Windows-1252). Notepad++ identifies it as "ANSI as UTF8".

"Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in." — http://en.wikipedia.org/wiki/Byte_order_mark

Saving it with BOM makes all browsers treat it as UTF-8.
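For anyone who wants to try this, here is a minimal Python sketch that checks for a UTF-8 BOM and prepends one if it is missing. The helper names `has_utf8_bom` and `add_utf8_bom` are made up for this example; `codecs.BOM_UTF8` is the standard library constant for the bytes EF BB BF.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data: bytes) -> bytes:
    """Return the data with a UTF-8 BOM prepended, unless it already has one."""
    data.decode("utf-8")  # raises UnicodeDecodeError if the data is not UTF-8
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data


# Hypothetical file content; in practice, read it with open(path, "rb").
original = "caf\u00e9\n".encode("utf-8")
with_bom = add_utf8_bom(original)

# The "utf-8-sig" codec strips the BOM again when reading.
round_trip = with_bom.decode("utf-8-sig")
```

Keep in mind that some tools that read the file do not expect a BOM and will treat it as part of the content.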

Answer 6 · 2013-06-12 21:31:00

vinc17 Question owner

14.6.13, 00:51

Using the BOM on UTF-8 files is not satisfactory, as I've just explained in the bug report.

Answer 7 · 2013-06-12 21:31:00

Gingerbread Man

14.6.13, 01:30

The View → Character Encoding → Auto-Detect setting to use depends on the languages used in the document. In the case of multiple languages, use either the setting for the first language that appears in the document, or Universal.

Your sample file is rendered properly when using any of the following Auto-Detect settings:

Chinese
East Asian
Japanese
Korean
Simplified Chinese
Traditional Chinese

Your sample file is rendered incorrectly when using any of the following Auto-Detect settings:

(Off)
Russian
Ukrainian
Universal
