677 - URI Encoding
Jul. 10th, 2025 01:02 pm
With the book club this past semester I read Daphne du Maurier's Rebecca, Gerald Durrell's My Family and Other Animals, and most of Tony Mendez's Argo. I also read Cormac McCarthy's All the Pretty Horses and Oliver Burkeman's 4000 Weeks on my own.
Percent-Encoding Guide
According to STD 66 RFC 3986, the characters of the string !#$&'()*+,/:;=?@[] could have a special meaning in a URI and are reserved. The ASCII alphanumeric characters and those contained in -._~ are unreserved. Any character outside of these two sets must be percent-encoded before inclusion in a URI. This is what the JavaScript encodeURI() function does, in addition to encoding the square brackets [] which were not yet included in the set of URI characters when the superseded RFC 2396 was written.
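As a quick check of the sets described above, the sketch below runs encodeURI() over the reserved string, the unreserved punctuation, and a few characters outside both sets. The variable names are only for illustration.

```javascript
// Reserved and unreserved punctuation as listed in RFC 3986.
const reserved = "!#$&'()*+,/:;=?@[]";
const unreservedMarks = "-._~";

// encodeURI() leaves every reserved character alone except the square brackets.
const encodedReserved = encodeURI(reserved);
console.log(encodedReserved); // !#$&'()*+,/:;=?@%5B%5D

// Unreserved characters pass through unchanged.
const encodedUnreserved = encodeURI(unreservedMarks + "Az09");
console.log(encodedUnreserved); // -._~Az09

// Characters outside both sets are percent-encoded as UTF-8 bytes.
const encodedOther = encodeURI('<a b="é">');
console.log(encodedOther); // %3Ca%20b=%22%C3%A9%22%3E
```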
The unreserved characters can always be left unencoded, so we just need to encode some subset of the reserved characters. This subset depends on the URI scheme being used and where in the URI the characters are. The encodeURIComponent() function encodes all of the reserved characters except for !'()* which probably don't need to be encoded as they weren't yet reserved in RFC 2396. We can encode a still smaller subset of the reserved characters in the following cases.
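For comparison, here is a sketch of what encodeURIComponent() does to the same reserved string. The helper encodeAllReserved() is a made-up name for this example, not a built-in; it shows one way to escape the remaining five characters if a strict consumer requires it.

```javascript
const reserved = "!#$&'()*+,/:;=?@[]";

// encodeURIComponent() escapes every reserved character except !'()*
const component = encodeURIComponent(reserved);
console.log(component);
// !%23%24%26'()*%2B%2C%2F%3A%3B%3D%3F%40%5B%5D

// Hypothetical helper: also escape the five characters left over.
function encodeAllReserved(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

const strict = encodeAllReserved(reserved);
console.log(strict);
// %21%23%24%26%27%28%29%2A%2B%2C%2F%3A%3B%3D%3F%40%5B%5D
```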
Data URIs
RFC 2397 states that the main content section of a data URI will consist of some number of 'uric' characters, and that these characters are defined in RFC 2396. It turns out that any percent-encoded, reserved, or unreserved character is allowed, except for #[] as these three are not 'uric' characters. The code below shows how an SVG data URI might be constructed.
const rectSVG = String.raw`<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100"><rect fill="#69E" x="20" y="8" width="15.2" height="87"/></svg>`;
const rectDataURI = "data:image/svg+xml;charset=UTF-8," + encodeURI(rectSVG).replaceAll("#", "%23");
The resulting string is:
data:image/svg+xml;charset=UTF-8,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20100%20100%22%3E%3Crect%20fill=%22%2369E%22%20x=%2220%22%20y=%228%22%20width=%2215.2%22%20height=%2287%22/%3E%3C/svg%3E
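The same construction can be sketched as a reusable function with a round-trip check. The names toDataURI and payload are made up for this example, and the sample SVG is a shortened stand-in.

```javascript
// Hypothetical helper: build a data URI from a media type and text data.
// encodeURI() already escapes [ and ], so only # needs extra handling.
function toDataURI(mediaType, data) {
  return "data:" + mediaType + "," + encodeURI(data).replaceAll("#", "%23");
}

const svg = '<svg xmlns="http://www.w3.org/2000/svg"><rect fill="#69E" width="9" height="9"/></svg>';
const uri = toDataURI("image/svg+xml;charset=UTF-8", svg);

// None of the three non-'uric' characters survive in the payload.
const payload = uri.slice(uri.indexOf(",") + 1);
console.log(/[#\[\]]/.test(payload)); // false

// Decoding the payload gives back the original markup.
console.log(decodeURIComponent(payload) === svg); // true
```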
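Note that String.raw() is helpful if the data contains backslashes, but if it contains backticks or the substring ${ then you can no longer simply paste the data into a template literal. A backslash escape keeps the template literal valid but the backslash stays in the raw string, whereas interpolating the character leaves no trace, as the following comparison (which evaluates to true) shows.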
String.raw`_\`_\${_${"`"}_` === "_\\`_\\${_`_"
Since the square brackets were more recently reserved, I thought they might be allowed in data URIs, but as of now a link like <a href="data:,%23[]">#[]</a> is flagged for an illegal character error by the W3C markup validator. I saw a few GitHub issues like this one in support of unescaped square brackets, so they may be allowed in the future.
Query Strings
The query part of a URI begins after a question mark. It is composed of the 'query' characters defined in RFC 3986, and these are exactly the same as the 'uric' characters. However, the characters &+= have special purposes. The query string is a set of 'key=value' pairs, separated by ampersands, and in which plus signs represent spaces. The equals sign needs to be encoded in the 'key' portion, but not in the 'value' portion as only the first equals sign separates the two parts. Encode the data as you would for a data URI, then handle these three special characters, and as a final step we can change the encoding of spaces from %20 to +. This data URI contains two links which compare the encoding of the reserved characters and the space character in a data URI and in a query string.
const inlineStyle = `background-image:url("${rectDataURI}");color-scheme:light dark`;
const vertices = "[[1,0],[0.58,0.58],[0,1],[-0.58,0.58],[-1,0],[-0.58,-0.58],[0,-1],[0.58,-0.58]]";
// Encode as for a data URI, then escape # & ' + and turn %20 into +.
function encodeQueryValue(val) {
  return encodeURI(val).replace(/[#&'+]|%20/g, function (char) {
    return { "#": "%23", "&": "%26", "'": "%27", "+": "%2B", "%20": "+" }[char];
  });
}
const mirrorPolygonURL = `https://home.6t.lt/66c/mirror_polygon.svg?h=6&v=${encodeQueryValue(vertices)}&i=${encodeQueryValue(inlineStyle)}`;
The above code also encodes the single quote character, as this GitHub issue suggests doing so in some cases. The code generates the URL:
https://home.6t.lt/66c/mirror_polygon.svg?h=6&v=%5B%5B1,0%5D,%5B0.58,0.58%5D,%5B0,1%5D,%5B-0.58,0.58%5D,%5B-1,0%5D,%5B-0.58,-0.58%5D,%5B0,-1%5D,%5B0.58,-0.58%5D%5D&i=background-image:url(%22data:image/svg%2Bxml;charset=UTF-8,%253Csvg%2520xmlns=%2522http://www.w3.org/2000/svg%2522%2520viewBox=%25220%25200%2520100%2520100%2522%253E%253Crect%2520fill=%2522%252369E%2522%2520x=%252220%2522%2520y=%25228%2522%2520width=%252215.2%2522%2520height=%252287%2522/%253E%253C/svg%253E%22);color-scheme:light+dark
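The rule about equals signs in the 'key' portion is not covered by encodeQueryValue() above, so here is a sketch of a matching key encoder together with a round-trip check using the standard URLSearchParams class. The name encodeQueryKey and the sample key and value are assumptions for this example.

```javascript
const replacements = { "#": "%23", "&": "%26", "'": "%27", "+": "%2B", "%20": "+" };

// Value encoder, following the same rules as the code above.
function encodeQueryValue(val) {
  return encodeURI(val).replace(/[#&'+]|%20/g, (m) => replacements[m]);
}

// Hypothetical key encoder: same rules, plus the equals sign.
function encodeQueryKey(key) {
  return encodeQueryValue(key).replaceAll("=", "%3D");
}

const query = encodeQueryKey("a=b") + "=" + encodeQueryValue("x = y & z+1");
console.log(query); // a%3Db=x+=+y+%26+z%2B1

// URLSearchParams splits each pair on the first "=" and reads "+" as a space.
const params = new URLSearchParams(query);
console.log(params.get("a=b")); // x = y & z+1
```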
If you want to encode the whole query string at once, then the reserved characters to be encoded are #'+[], and any ampersands or equals signs could be manually encoded if necessary. Just remember that all extra encoding has to be done after using encodeURI() to avoid double encoding.
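A minimal sketch of the whole-query approach follows, assuming the ampersands and equals signs in the input are all meant as delimiters. The function name encodeWholeQuery and the sample input are made up for this example. The square brackets need no entry in the replacement table because encodeURI() has already escaped them.

```javascript
// Hypothetical helper: encode a full query string in one pass.
function encodeWholeQuery(query) {
  const extra = { "#": "%23", "'": "%27", "+": "%2B", "%20": "+" };
  return encodeURI(query).replace(/[#'+]|%20/g, (m) => extra[m]);
}

const raw = "name=O'Neil&tags=[c++]&note=50% off #1";
const encoded = encodeWholeQuery(raw);
console.log(encoded);
// name=O%27Neil&tags=%5Bc%2B%2B%5D&note=50%25+off+%231

// The standard parser recovers the original values.
const parsed = new URLSearchParams(encoded);
console.log(parsed.get("tags")); // [c++]
console.log(parsed.get("note")); // 50% off #1

// Doing the extra step first double encodes: the % of %23 becomes %25.
const wrongOrder = encodeURI("#1".replace("#", "%23"));
console.log(wrongOrder); // %25231
```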
End of 2024 Changes
- Made a basic HTML Viewer webpage, as an alternative to an HTML data URI. I've set it up at the domain '6t.lt': link.
- Started posting bike ride plans for around Amherst on my cycling repository GitHub page.
- Uploaded an SVG version of the UMass Recreational Math Club logo.