text coding
I run a simple website, gcessa.org. Until recently I was able to convert a .doc or .docx document to a text document, import it into the site, and have it load as one would expect. Lately the characters ', ", and - have been rendering as a �. There might be others I'm not aware of. That suggests the characters are not found in the character set being used. What do I need to do to resolve this issue? I believe it started after one of the last two updates. I don't see this happening in Edge or Chrome.
Thanx
All Replies (17)
The server sends the main HTML file(s) as Unicode (UTF-8). If I check the page source then I see content="text/html; charset=iso-8859-1". What encoding is used for those DOC(X) files?
A common cause is the presence of bytes in the 0x80 - 0x9F range, which a Windows encoding (Windows-1252) uses for curly quotes and dashes. ISO-8859-1 has no printable characters in this range, and in UTF-8 these bytes are not valid on their own.
Modified by cor-el
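To illustrate the point about the 0x80 - 0x9F range, here is a minimal Python sketch. The sample bytes are an assumption about what a Windows-1252 text export of curly quotes and an en dash might look like; they are not taken from the site itself.

```python
# Hypothetical sample: the bytes Windows-1252 uses for a curly apostrophe (0x92),
# curly double quotes (0x93, 0x94) and an en dash (0x96).
raw = b"It\x92s \x93fine\x94 \x96 ok"

# Decoded with the encoding the text was actually saved in: the marks come out right.
as_cp1252 = raw.decode("cp1252")

# Decoded as UTF-8: a lone 0x80-0x9F byte is not valid, so each one is
# replaced by U+FFFD, which displays as the diamond with a question mark.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoded as strict ISO-8859-1: no error, but the bytes become invisible
# C1 control characters instead of quotes.
as_latin1 = raw.decode("iso-8859-1")

print(as_cp1252)
print(as_utf8)
print([hex(ord(ch)) for ch in as_latin1 if ord(ch) >= 0x80])
```

Note that Python's iso-8859-1 codec is strict, whereas browsers treat the iso-8859-1 label as windows-1252, which is why pages declared as iso-8859-1 can still show curly quotes correctly.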
Any particular page? When posting a link to a non-Mozilla site, you can prevent delays by breaking the URL before the top-level domain:
https://www.example .org/path/page
(Live links go into a spam link moderation queue for review)
I’ve been hand coding this website since 2000. Started with Hotdog. Currently using Macromedia Dreamweaver as my editor, for at least 10 years. Since 2013 I’ve been using MS Word 2013 to edit .doc text files, then converting them to .txt and inserting them into various pages. The default encoding has been “Windows default” or Western European. This has worked well for years until recently. Something changed with one of the last two Firefox updates. Now I get a lot of these diamond shapes with a question mark inside. I understand why. I don’t understand what Firefox did to the coding. Chrome and Edge work fine.
No particular page. Any text with ', ", or - shows the diamond with a question mark.
Also, I recently (within the last two years) started inserting content="text/html; charset=iso-8859-1" on new or modified pages. I removed it and it made no difference.
petecovert said
No particular page. Any text with ', ", or - shows the diamond with a question mark.
These all look fine to me: http://gcessa.org/minutes2023.htm
I see just "straight" keyboard-character marks that should work with any font, and not the curved marks you might get in a word processor (such as Word with "AutoFormat As You Type" set to replace the straight ones with curved ones).
Do you see them as curved marks in other browsers?
I did a replace all (', ", -) on pages http://gcessa.org/minutes2023.htm and http://gcessa.org/minutes2022.htm. So those pages resolve correctly. I checked Chrome again and there are no diamonds with a ? on any pages I checked. I checked some of the raw text files and they have the curved version going all the way back to 2001. This diamond with a ? mark is a new phenomenon with Firefox in the last couple of months.
You did that before I looked at them? This is why I asked where to see the problem. What pages have the problem?
"The Club"/"Meeting Minutes"/2001-2021. "The Club"/"Brags"/2000-2021.
I fixed the home page this morning.
Web Console:
The character encoding of a framed document was not declared. The document may appear different if viewed without the document framing it. brags2009.htm
The byte stream was erroneous according to the character encoding that was inherited from the parent document. The character encoding needs to be declared in the Content-Type HTTP header, using a meta tag, or using a byte order mark. brags2009.htm
Opening the frame in a new tab shows the quotes correctly as Unicode quotes.
The first screenshot shows the framed page with UTF-8 encoding that shows diamonds. The second screenshot shows the page opened in a new tab with Windows-1252 encoding and correct quotes.
I see what you are saying. I do this one website as a "hobby". It's the only website I do. I'm using Dreamweaver 6 as my webpage editor and MS Word 2013 as my document editor. I save the edited document as a text file and copy/paste the contents into the webpage.
The observation: Chrome and Edge work fine. Until maybe a couple of months ago Firefox worked fine (and has for a lot of years). Something changed with Firefox. Is there an easy way to fix what Firefox broke or do I have to go page by page and do a find/replace?
Did you edit some of the files because something went wrong and some JavaScript ended up as text in the body? You forgot the closing > on the meta http-equiv="Content-Type" tag, and that makes the script end up as text in the body.
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" <script="" type="text/javascript"> <!----> </head><body class="gcessa" vlink="#800080" link="#008000" background="paws02.jpg" alink="#ff0000">var image1=new Image() image1.src="photos/dogpic (1).jpg" var image2=new Image()
I think you should serve your site as UTF-8 by default, and only add the windows-1252 charset to the framed pages that do not work with that setting. Otherwise, you may have other issues.
There are definitely ANSI quotes in the 0x80 to 0x9f range in those files if I check the raw code and thus won't work with UTF-8.
I assume that in the past the server didn't send a charset and at some time started sending the Content-Type: text/html; charset=UTF-8 HTTP response header. In Firefox the content-type sent by the server always prevails.
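The precedence described above can be sketched roughly as follows. This is a simplified illustration, not Firefox's actual implementation: the function name and parameters are hypothetical, and details such as byte order marks and user overrides are left out.

```python
from typing import Optional


def effective_encoding(http_charset: Optional[str],
                       meta_charset: Optional[str],
                       parent_charset: Optional[str],
                       fallback: str = "windows-1252") -> str:
    """Pick the encoding a browser would likely use (simplified assumption).

    Order assumed here: HTTP Content-Type header, then a meta tag in the
    page, then the encoding inherited from the framing (parent) document,
    then a locale fallback.
    """
    for candidate in (http_charset, meta_charset, parent_charset):
        if candidate:
            return candidate.lower()
    return fallback


# A framed page with no declaration inherits UTF-8 from its parent,
# which matches the Web Console message quoted earlier in the thread.
print(effective_encoding(None, None, "UTF-8"))
# The same framed page with a meta tag uses the declared encoding instead.
print(effective_encoding(None, "windows-1252", "UTF-8"))
```

Under these assumptions, a meta tag only helps when the server does not already send a charset for that page.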
If you are familiar with Word Macros, you can use this one to straighten out the quotation marks and apostrophes before uploading or pasting the document text:
Sub CurlyQuoteStraightener()
    ' Turn off Smart Quotes temporarily
    Options.AutoFormatAsYouTypeReplaceQuotes = False
    ' Find/Replace Curly quotes to straight
    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        ' Left curly quotes
        Selection.HomeKey wdStory
        .Text = ChrW(8220)
        .Replacement.Text = """"
        .Execute Replace:=wdReplaceAll
        ' Right curly quotes
        Selection.HomeKey wdStory
        .Text = ChrW(8221)
        .Replacement.Text = """"
        .Execute Replace:=wdReplaceAll
        ' Curly apostrophe
        Selection.HomeKey wdStory
        .Text = ChrW(8217)
        .Replacement.Text = "'"
        .Execute Replace:=wdReplaceAll
    End With
    ' Turn Smart Quotes back on
    Options.AutoFormatAsYouTypeReplaceQuotes = True
End Sub
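For text that has already been exported from Word, the same straightening idea can be sketched outside of Word. The following is only an illustration in Python; the list of characters is an assumption and would likely need extending, and the function name is made up for this example.

```python
# Map of common "smart" punctuation to plain keyboard characters.
# The selection below is illustrative, not a complete list.
ASCII_EQUIVALENTS = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis
})


def straighten(text: str) -> str:
    """Return text with the mapped curly marks replaced by plain ones."""
    return text.translate(ASCII_EQUIVALENTS)


print(straighten("\u201cIt\u2019s\u201d \u2013 ok\u2026"))
```

Keeping the replacements in one table means a newly discovered character needs only one extra line rather than another find/replace step.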
Thank you for the macro. I've used macros in Excel a few times. I found a number of other characters that needed to be corrected. However, I've been finding other errors that a macro won't fix. It's going to be very tedious and time consuming to go page by page and fix them.
It's too bad there can't be something to put in the beginning of the website or at least each page to overcome this problem.
But thanx for your help.
Does it work if you add a meta tag that sets the charset?
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
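If many pages need this tag, adding it could be scripted instead of done by hand. Below is a minimal sketch in Python, assuming the pages are .htm files in one local folder and each has an opening head tag; the function names and the folder argument are hypothetical. It works on raw bytes so the Windows-1252 content is never re-encoded, and it does not repair malformed tags.

```python
import re
from pathlib import Path

META = b'<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">'


def add_charset_meta(html: bytes) -> bytes:
    """Insert META right after the opening <head> tag.

    Pages that already declare a charset in a meta tag are returned unchanged,
    as are pages with no <head> tag at all.
    """
    if re.search(rb"<meta[^>]*charset\s*=", html, re.IGNORECASE):
        return html
    return re.sub(rb"<head(\s[^>]*)?>",
                  lambda m: m.group(0) + b"\n" + META,
                  html, count=1, flags=re.IGNORECASE)


def patch_folder(folder: str) -> None:
    """Apply add_charset_meta to every .htm file in folder (hypothetical layout)."""
    for path in Path(folder).glob("*.htm"):
        original = path.read_bytes()
        patched = add_charset_meta(original)
        if patched != original:
            path.write_bytes(patched)
```

Running it on a copy of the site first would be prudent, since it overwrites files in place.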
Chosen Solution
Your meta tag works great. I inserted that meta tag earlier but I didn't code it correctly. I still have to go through page by page, but it's just a matter of inserting that meta tag instead of the multiple steps to run the macro, which kept growing to accommodate more and more characters.
Thanx for your help.