Why are round brackets escaped when copy/pasting URLs from Firefox, in contravention of RFC3987?
Go to any site, and add a 'hash fragment' to the end of the URL that contains round brackets. Copy that URL out of Firefox and paste into a text editor, like so:
Note that the round brackets have been escaped into %28 and %29 characters.
RFC3987 specifies that "(" and ")" are reserved delimiters ('sub-delims') and are _not_ equivalent to their escape codes. Replacing these characters with their escape codes is as incorrect (according to the spec) as replacing the "$" or "!" characters (which are not escaped in FF or other browsers) or the "'" (apostrophe) character, which universally is. No, I don't understand it either.
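To make the symptom concrete, here is a minimal Python sketch of the transformation described above. The URL is a made-up example, and the code only imitates the observed result (brackets turned into %28 and %29); it is not Firefox's actual implementation.

```python
from urllib.parse import quote, unquote

# Hypothetical URL, used only for illustration.
typed = "http://example.com/page#view(1,2)"

# Percent-encode only the round brackets, imitating what ends up on the clipboard.
copied = typed.replace("(", quote("(", safe="")).replace(")", quote(")", safe=""))

print(copied)           # the brackets now appear as %28 and %29
print(unquote(copied))  # decoding everything recovers the typed text
```

The strings differ character for character, which is the whole problem for any application that treats the literal brackets as syntax.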
Worse, this behavior does not match what the other major browsers do, so Firefox has essentially broken the promise that URLs are universal. If there are any reasons for doing this, they have not been made clear or documented, that I can find.
The hash fragment is increasingly being used to store 'client side' information, and so messing with these characters is breaking applications. I'm trying to understand whether there's a good reason for this or whether it's just poor programming, in which case I'll log a bug report.
Additional System Details
- User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11
You can always prevent such an escape by placing a character (a space will do) at the start of the location bar and copy the text to the right to the clipboard (Home; Space; Shift+End; Ctrl+C).
Alas, I can't require that every user of my web app do this if they're specifically using Firefox and want to copy the link in order to send it to someone who's using a different browser. That's silly.
The issue is not _how_ to fix it, because it's unfixable without altering the FF codebase. The question is why it's doing it in the first place.
Web browsers do not use RFC3987 at all. See section 1.2.a about Applicability. Maybe you should have a look at RFC3986 instead. What is the delimiting role for the round brackets that would allow their unescaped use?
Interesting... that is a good explanation, if a little unexpected. However RFC3987 also quotes the same set of gen and sub-delims (including round brackets) as '86. From section 2.2:
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI. URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent.
Wait.. scratch that... looking at the wrong RFC. Being hasty.
In answer to your question about the delimiting role: I am personally trying to use the round brackets to delimit data in the hash fragment, because those delimiters are unused in HTTP/S and therefore are unambiguous for my purpose. The point is, the delimiters are supposed to be unreplaceable in ALL schemes (as far as I understand the spec) because some other/future scheme may use them, and URI/URL/IRI processors are not required to know scheme-specific details. That's why the delimiter set exists.
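As a rough illustration of that kind of use, here is a Python sketch of a fragment parser where literal brackets delimit and %28/%29 are plain data. The `name(value)` layout and the function name `parse_fragment` are invented for this example; they are not the format my app actually uses.

```python
import re
from urllib.parse import unquote

def parse_fragment(fragment):
    """Split 'name(value)name(value)' into pairs.

    Literal brackets act as delimiters; %28 and %29 are decoded as data.
    """
    pairs = re.findall(r"([^()]+)\(([^()]*)\)", fragment)
    return [(unquote(name), unquote(value)) for name, value in pairs]

# Fragment as typed: the brackets delimit, the escaped ones are data.
intact = parse_fragment("zoom(3)label(a%28b%29)")

# Same fragment after the brackets were escaped on copy: no delimiters left.
mangled = parse_fragment("zoom%283%29")

print(intact)
print(mangled)
```

Once the delimiters are escaped, the parser has no way to tell structure from data, so the copied link no longer carries the same state.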
OK, so yes, the first part I quoted was the relevant part of RFC3986, which makes the round brackets a reserved character. This is from RFC3987, which you say is what browsers currently implement:
2.1. Summary of IRI Syntax
IRIs are defined similarly to URIs in [RFC3986], but the class of unreserved characters is extended by adding the characters of the UCS (Universal Character Set, [ISO10646]) beyond U+007F, subject to the limitations given in the syntax rules below and in section 6.1.
Otherwise, the syntax and use of components and reserved characters is the same as that in [RFC3986].
Which indicates the round brackets have the same delimiting status as the apostrophe, exclamation, and dollar sign. Some of which are escaped by FF, and some of which aren't.
'87 states in its section 2.2: (identical to '86)
ipchar = iunreserved / pct-encoded / sub-delims / ":" / "@"
iquery = *( ipchar / iprivate / "/" / "?" )
ifragment = *( ipchar / "/" / "?" )
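For anyone who wants to check a fragment against that grammar, here is a small Python sketch. It covers only the ASCII subset (the RFC3986 `fragment` rule, without the extra non-ASCII characters that `iunreserved` adds), and the names `FRAGMENT` and `is_valid_fragment` are my own.

```python
import re

# fragment = *( pchar / "/" / "?" )
# pchar    = unreserved / pct-encoded / sub-delims / ":" / "@"
# ASCII-only approximation; the IRI rule additionally allows characters beyond U+007F.
FRAGMENT = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")

def is_valid_fragment(fragment):
    """Return True if the whole string matches the ASCII fragment grammar."""
    return FRAGMENT.fullmatch(fragment) is not None

print(is_valid_fragment("view(1,2)"))    # literal brackets are sub-delims
print(is_valid_fragment("view%281%29"))  # percent-encoded form is also allowed
print(is_valid_fragment("a b"))          # a raw space is not
```

Both spellings are syntactically valid fragments; the grammar allows each, it does not say they are interchangeable.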
To anticipate your point that because the round brackets do not _currently_ have a defined use as a delimiter in HTTP, therefore browsers (which should be fully aware of their own scheme, after all) are allowed to interchange them for the escaped version: they can't. Not allowed.
Historical equivalent: For a long time, the square brackets [ ] had no known specific use, but were one of the reserved delimiters. Later on, they were 'called up for use' to enclose IPv6 addresses, and a LOT of software promptly broke from randomly escaping them. There's still fallout.
The spec you quoted says, as far as I can tell, the ( ) should never be mangled. Sure, we don't generally see them used in URLs/IRIs, but perhaps this is why. Is there a FF document that gets into the details?
Sorry to be verbose, but just to be clear: none of this is happening in the FF URL bar. You can type in round brackets and it works just fine. You can bookmark pages and drag links to other browsers and it's all good, the brackets are preserved.
It's only when you text copy the link _from_ the location box and paste it into an editor using normal copy-and-paste that the ( ) get mangled (the same way the http:// magically gets added). So FF is inconsistent with itself?
I think I've talked myself into this being a flat-out bug, and will log it as such. Wonder how long it's been there.
Hello JeremyLee, Thank you for your answer. I agree with you on that last part where you say that Firefox is inconsistent with itself. I have noticed myself that Firefox is not always working as it should according to RFC3986. What I wanted to say about RFC3986 is that it seems to say (at least to me) that when a character is percent encoded by mistake, then it should be treated like the original character. So I think that when there is no delimiting role for the character, the URL processing application should treat both equally. From that point of view Firefox may be acting stupid, but it is technically not wrong. On the other hand, I am not an English speaker, so your knowledge of English is probably better than mine. Personally I find section 2.2 confusing, and a contradiction with the idea of reserved characters itself. I would prefer the situation where all delimiters are escaped to avoid any problem.
That is exactly true for the "non-delimiting" (unreserved) characters [a-zA-Z0-9~_.-] which can be swapped at any time, but the purpose of the delimiters is precisely that they are NOT equivalent to their escaped codes. Ever. Otherwise they would be useless.
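Here is a minimal Python sketch of that distinction, assuming the normalization rule from RFC3986 section 6.2.2.2: escapes of unreserved characters may be decoded, all other escapes are left alone. The function name `normalize_unreserved` and the URL are just for illustration.

```python
import re

UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def normalize_unreserved(url):
    """Decode %XX only when it stands for an unreserved character."""
    def decode(match):
        char = chr(int(match.group(1), 16))
        # Unreserved: safe to swap. Anything else: keep the escape (hex uppercased).
        return char if char in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", decode, url)

result = normalize_unreserved("http://example.com/%7Euser/a%28b%29")
print(result)  # the tilde is decoded, the escaped brackets stay escaped
```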
For example, the "?", "#" and "/" characters are also delimiters, and it is well understood that changing them to their escapes prevents URLs from working as intended, because the escaped version _must_ be treated as literal data, (eg, part of a filename) and not part of the URL syntax.
An example would be a query value that contains a question mark: http://server.org/page?query=%3F
It should be clear that you can't change either the "?" or the "%3F" into the other, or it changes the meaning of the URL. That's what makes "?" a delimiter character. The same logic is supposed to apply to the rest of the reserved delimiters, including the round brackets.
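A quick way to see this is to run both spellings through a URL parser. This Python sketch uses the standard library's `urlsplit`; the second URL is the first one with its "?" delimiter swapped for "%3F".

```python
from urllib.parse import urlsplit

a = urlsplit("http://server.org/page?query=%3F")
b = urlsplit("http://server.org/page%3Fquery=%3F")

# In a, the "?" delimits the path from the query; the %3F stays as query data.
print(a.path, a.query)

# In b, there is no delimiter any more: everything is path and the query is empty.
print(b.path, b.query)
```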
And you are not supposed to know or care that the "http" scheme prefix does not currently use all the delimiters. (scheme-specific knowledge) It might one day, like the square brackets. For all you know, they're used in the "wais" or "ftp" schemes.
At least, this is _my_ current understanding after reading docs for a week. I keep hoping there's another document out there which explains the Firefox point of view. There just doesn't seem to be any logic to it, with inconsistencies everywhere.
Actually, a better example would have been:
http://server.org/page%3F.php?query=something
which is legal and would work fine. (Once the first ? is hit, everything up to the "#" is data.)
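The same parser check works for this one. In this Python sketch I have added a second "?" and a "#frag" to the example (my additions, not part of the URL above) to show where the query stops.

```python
from urllib.parse import urlsplit, unquote

parts = urlsplit("http://server.org/page%3F.php?query=some?thing#frag")

print(parts.path)           # the escaped ? is part of the file name
print(unquote(parts.path))  # decoded, the file name contains a literal ?
print(parts.query)          # the second ? is just data inside the query
print(parts.fragment)
```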
I have this problem, too, and I think it is a bug, for the following simple argument:
Copying a URL from the location bar and immediately pasting it back into the browser's location bar should result in precisely the same HTTP request, since these are inverse operations:
request = paste( copy( request ) )
In Firefox, this is not the case, because signs like '(' are encoded when copying. If the paste(copy(request)) is sent to the server, the percent signs of encoded chars are encoded again. For the server, this results in a different URL (and will usually lead to a "Not Found" reply).
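A toy Python model of that round trip, under the assumptions stated above: copy escapes the round brackets, and paste escapes literal percent signs. Both functions (`model_copy`, `model_paste`) and the URL are hypothetical stand-ins for the behaviour I observed, not Firefox code.

```python
def model_copy(url):
    """Assumed copy behaviour: round brackets become %28 and %29."""
    return url.replace("(", "%28").replace(")", "%29")

def model_paste(url):
    """Assumed paste behaviour: literal percent signs are encoded again."""
    return url.replace("%", "%25")

request = "http://example.com/page(1)"
round_trip = model_paste(model_copy(request))

print(request)
print(round_trip)             # brackets are now double-encoded
print(round_trip == request)  # the identity request = paste(copy(request)) fails
```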