Url decode php
PHP URL Encoding Decoding
PHP supports URL encoding and decoding through its core functions. The built-in urlencode() and urldecode() functions perform URL encoding and decoding respectively. In this tutorial, we are going to see the types of URL encoding and the PHP functions that encode and decode according to each standard, with examples.
Encoding is needed in many scenarios. URL encoding converts a URL by replacing certain characters with standard escape sequences. For example, if a URL contains non-alphanumeric characters, encoding replaces them with a percent sign followed by their hexadecimal code. A few special characters are exceptions and are not replaced. This helps to prevent the data from being truncated or jumbled when it is read from the URL.
Encoding must be performed before sending data in a URL query string or passing it to a function that works on the URL data dynamically. The encoded data is then decoded into its original form after it is received in the target or landing PHP code block. Decoding is the process of reversing the encoded data back to its original form.
URL Encoding Decoding Types in PHP
In PHP, URL encoding and decoding can be done in two different ways, listed below. Choose the encoding method based on the type of content to be encoded.
- application/x-www-form-urlencoded type
- RFC 3986 standard type
In the following sections, we will see how to apply PHP encoding and decoding based on each of these standards.
application/x-www-form-urlencoded type
For this type of encoding or decoding, the PHP built-in functions urlencode() and urldecode() are used. This type is preferred when we need to send data submitted from a form in the URL query string.
The urlencode() function replaces every non-alphanumeric character in the given string, except (_), (-) and (.), with %[hex code]; a space is replaced with the '+' character. The urldecode() function reverses this.
If we send a URL without encoding, some special characters may truncate the actual data passed through the page URL. This will then cause errors or unexpected results.
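The truncation problem can be reproduced with a short sketch. The parameter name dept and the value R&D are made up for illustration, and parse_str() stands in for the parsing that happens when a query string is received.

```php
<?php
// Hypothetical form value that contains a reserved character (&).
$department = 'R&D';

// Without encoding, "&" is read as a separator between parameters,
// so the value is cut off after "R".
parse_str('dept=' . $department, $unencoded);

// With urlencode(), "&" travels as %26 and survives the round trip.
parse_str('dept=' . urlencode($department), $encoded);

var_dump($unencoded['dept']); // string(1) "R"
var_dump($encoded['dept']);   // string(3) "R&D"
```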
PHP Encoding Decoding Example with urlencode() urldecode()
The following example code shows how to apply encoding and decoding to a URL by using the PHP urlencode() and urldecode() functions. In this script, the URL is assigned to a PHP variable. This variable is then passed to the urlencode() function, which returns the encoded URL. To revert the encoded URL back to its original form, the urldecode() function is used.
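A minimal sketch of such a script is given here. The address and query values are placeholders, not taken from a real site.

```php
<?php
// Placeholder URL assigned to a PHP variable.
$url = 'https://example.com/search.php?q=php url encoding&lang=en';

// Encode: non-alphanumeric characters except - _ . become %XX, spaces become +.
$encodedUrl = urlencode($url);
echo $encodedUrl, PHP_EOL;
// https%3A%2F%2Fexample.com%2Fsearch.php%3Fq%3Dphp+url+encoding%26lang%3Den

// Decode: revert the encoded URL back to its original form.
$decodedUrl = urldecode($encodedUrl);
echo $decodedUrl, PHP_EOL;
// https://example.com/search.php?q=php url encoding&lang=en
```

Encoding the whole URL like this is appropriate when the URL itself is passed as a single query string value. When building a URL, it is more common to encode each value separately so that the separators keep their meaning.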
RFC 3986 standard type
The PHP functions rawurlencode() and rawurldecode() are used to encode and decode the URL with this type. With this method, a space in the URL is replaced with %20 instead of the plus (+) symbol. This type of encoding is preferable when we need to create a URL dynamically.
As of PHP 5.3.0, rawurlencode() follows the RFC 3986 standard. Prior to this version, it followed the RFC 1738 standard, which meant that the tilde (~) was also encoded.
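The difference between the two encoding types can be seen by passing the same string to both functions. This is a small sketch that assumes PHP 5.3.0 or later; the sample string is arbitrary.

```php
<?php
$text = 'a b~c';

// application/x-www-form-urlencoded: space becomes +, tilde is encoded.
$formEncoded = urlencode($text);
echo $formEncoded, PHP_EOL; // a+b%7Ec

// RFC 3986: space becomes %20, tilde is left as it is.
$rawEncoded = rawurlencode($text);
echo $rawEncoded, PHP_EOL;  // a%20b~c
```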
PHP Example with rawurlencode() and rawurldecode()
This code shows how to apply the rawurlencode() and rawurldecode() functions. They are used in the same way as the urlencode() and urldecode() functions. The expected output is stated in PHP comments. Run this program in your PHP environment and check whether you get the same output as in the code below.
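A minimal sketch of such a program follows. The file name and the example.com address are placeholders chosen for illustration.

```php
<?php
// Placeholder file name containing a space and a plus sign.
$fileName = 'annual report 2023+draft.pdf';

// Encode only the dynamic part and append it to the fixed part of the URL.
$encodedName = rawurlencode($fileName);
$url = 'https://example.com/files/' . $encodedName;
echo $url, PHP_EOL;
// https://example.com/files/annual%20report%202023%2Bdraft.pdf

// Decode the segment back to its original form.
$decodedName = rawurldecode($encodedName);
echo $decodedName, PHP_EOL;
// annual report 2023+draft.pdf
```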
Meet URL Decode and Encode, a simple online tool that does exactly what it says: it decodes from URL encoding and encodes into it quickly and easily. URL encode your data in a hassle-free way, or decode it into human-readable format.
URL encoding, also known as percent-encoding, is a mechanism for encoding information in a Uniform Resource Identifier (URI) under certain circumstances. Although it is known as URL encoding it is, in fact, used more generally within the main Uniform Resource Identifier (URI) set, which includes both Uniform Resource Locator (URL) and Uniform Resource Name (URN). As such it is also used in the preparation of data of the «application/x-www-form-urlencoded» media type, as is often used in the submission of HTML form data in HTTP requests.
Advanced options
- Character set: For textual data, the encoding scheme does not carry the character set, so you have to specify which one was used during the encoding process. It is usually UTF-8, but can be any other; if you are not sure, then try the available options, including the auto-detect one. This information is used to convert the decoded data into our website’s character set, so all letters and symbols can be displayed properly. Note that this is irrelevant for files, since no web-safe conversions have to be applied to them.
- Decode each line separately: The encoded data usually consists of continuous text, and even newlines are converted into their percent-encoded forms. Prior to decoding, all non-encoded whitespace is stripped from the input to safeguard its integrity. This option is useful if you intend to decode multiple independent data entries separated by line breaks.
- Live mode: When you turn on this option the entered data is decoded immediately with your browser’s built-in JavaScript functions — without sending any information to our servers. Currently this mode supports only the UTF-8 character set.
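The "decode each line separately" behaviour could be approximated in PHP roughly as follows. This is only a sketch: the helper name decodeLines() is hypothetical, and the use of rawurldecode() is an assumption, since the tool's actual implementation is not described.

```php
<?php
// Hypothetical helper: treat every line of the input as an independent entry.
function decodeLines(string $input): array
{
    $decoded = [];
    foreach (preg_split('/\R/', $input) as $line) {
        // Strip non-encoded whitespace inside the entry before decoding.
        $line = preg_replace('/\s+/', '', $line);
        if ($line === '') {
            continue; // ignore blank lines
        }
        $decoded[] = rawurldecode($line);
    }
    return $decoded;
}

$entries = decodeLines("Hello%20World\nfoo%3Dbar%26baz\n");
print_r($entries); // entries: "Hello World" and "foo=bar&baz"
```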
Safe and secure
All communications with our servers are made through secure SSL encrypted connections (https). Uploaded files are deleted from our servers immediately after being processed, and the resulting downloadable file is deleted right after the first download attempt, or 15 minutes of inactivity. We do not keep or inspect the contents of the entered data or uploaded files in any way. Read our privacy policy below for more details.
Our tool is free to use. From now on, you don’t have to download any software for such tasks.
Details of the URL encoding
Types of URI characters
The characters allowed in a URI are either reserved or unreserved (or a percent character as part of a percent-encoding). Reserved characters are those characters that sometimes have special meaning. For example, forward slash characters are used to separate different parts of a URL (or more generally, a URI). Unreserved characters have no such meanings. Using percent-encoding, reserved characters are represented using special character sequences. The sets of reserved and unreserved characters and the circumstances under which certain reserved characters have special meaning have changed slightly with each revision of specifications that govern URIs and URI schemes.
RFC 3986 section 2.2 Reserved Characters (January 2005) | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
! | * | ' | ( | ) | ; | : | @ | & | = | + | $ | , | / | ? | # | [ | ] |
RFC 3986 section 2.3 Unreserved Characters (January 2005) | |||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | - | _ | . | ~ |
Other characters in a URI must be percent encoded.
Percent-encoding reserved characters
When a character from the reserved set (a «reserved character») has special meaning (a «reserved purpose») in a certain context, and a URI scheme says that it is necessary to use that character for some other purpose, then the character must be percent-encoded. Percent-encoding a reserved character involves converting the character to its corresponding byte value in ASCII and then representing that value as a pair of hexadecimal digits. The digits, preceded by a percent sign («%»), are then used in the URI in place of the reserved character. (For a non-ASCII character, it is typically converted to its byte sequence in UTF-8, and then each byte value is represented as above.)
The reserved character «/», for example, if used in the «path» component of a URI, has the special meaning of being a delimiter between path segments. If, according to a given URI scheme, «/» needs to be in a path segment, then the three characters «%2F» or «%2f» must be used in the segment instead of a raw «/».
Reserved characters after percent-encoding | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
! | # | $ | & | ' | ( | ) | * | + | , | / | : | ; | = | ? | @ | [ | ] |
%21 | %23 | %24 | %26 | %27 | %28 | %29 | %2A | %2B | %2C | %2F | %3A | %3B | %3D | %3F | %40 | %5B | %5D |
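The conversion described above (character to byte value, byte value to two hexadecimal digits after a percent sign) can be sketched by hand in PHP. The helper percentEncodeBytes() is a hypothetical name used only for illustration; in real code, rawurlencode() does this work and leaves unreserved characters untouched.

```php
<?php
// Hypothetical helper: percent-encode every byte of the given string.
function percentEncodeBytes(string $data): string
{
    $out = '';
    for ($i = 0, $n = strlen($data); $i < $n; $i++) {
        // ord() gives the byte value; %02X prints it as two uppercase hex digits.
        $out .= sprintf('%%%02X', ord($data[$i]));
    }
    return $out;
}

echo percentEncodeBytes('/'), PHP_EOL;        // %2F
echo percentEncodeBytes('!#$&'), PHP_EOL;     // %21%23%24%26

// A non-ASCII character: "é" written as its two UTF-8 bytes.
echo percentEncodeBytes("\xC3\xA9"), PHP_EOL; // %C3%A9

// The built-in function agrees for reserved characters.
echo rawurlencode('/'), PHP_EOL;              // %2F
```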
Reserved characters that have no reserved purpose in a particular context may also be percent-encoded but are not semantically different from those that are not.
In the «query» component of a URI (the part after a ? character), for example, «/» is still considered a reserved character but it normally has no reserved purpose, unless a particular URI scheme says otherwise. The character does not need to be percent-encoded when it has no reserved purpose.
URIs that differ only by whether a reserved character is percent-encoded or appears literally are normally considered not equivalent (denoting the same resource) unless it can be determined that the reserved characters in question have no reserved purpose. This determination is dependent upon the rules established for reserved characters by individual URI schemes.
Percent-encoding unreserved characters
Characters from the unreserved set never need to be percent-encoded.
URIs that differ only by whether an unreserved character is percent-encoded or appears literally are equivalent by definition, but URI processors, in practice, may not always recognize this equivalence. For example, URI consumers shouldn’t treat «%41» differently from «A» or «%7E» differently from «~», but some do. For maximum interoperability, URI producers are discouraged from percent-encoding unreserved characters.
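One way to make such URIs compare equal is to decode only the percent-encoded unreserved characters and leave every other escape in place. The sketch below assumes that approach; normalizeUnreserved() is a hypothetical helper name, not a built-in PHP function.

```php
<?php
// Hypothetical helper: decode %XX only when it stands for an unreserved
// character (A-Z a-z 0-9 - . _ ~); other escapes are kept, in uppercase hex.
function normalizeUnreserved(string $uri): string
{
    return preg_replace_callback(
        '/%([0-9A-Fa-f]{2})/',
        function (array $m): string {
            $char = chr(hexdec($m[1]));
            return preg_match('/^[A-Za-z0-9\-._~]$/', $char) === 1
                ? $char
                : '%' . strtoupper($m[1]);
        },
        $uri
    );
}

$normalized = normalizeUnreserved('/%41bc/%7Euser/a%2fb');
echo $normalized, PHP_EOL; // /Abc/~user/a%2Fb
```

The escaped slash stays encoded because «/» is a reserved character, so decoding it could change the meaning of the path.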
Percent-encoding the percent character
Because the percent («%») character serves as the indicator for percent-encoded octets, it must be percent-encoded as «%25» for that octet to be used as data within a URI.
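A short PHP sketch shows the rule, and also why decoding the same data twice is risky. The sample strings are arbitrary.

```php
<?php
// A literal percent sign in the data is encoded as %25.
$encoded = rawurlencode('100% free');
echo $encoded, PHP_EOL; // 100%25%20free

// Decoding once restores the data exactly.
$decodedOnce = rawurldecode('%2541');
echo $decodedOnce, PHP_EOL; // %41

// Decoding a second time changes the data: the literal text "%41" becomes "A".
$decodedTwice = rawurldecode($decodedOnce);
echo $decodedTwice, PHP_EOL; // A
```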
Percent-encoding arbitrary data
Most URI schemes involve the representation of arbitrary data, such as an IP address or file system path, as components of a URI. URI scheme specifications should, but often don’t, provide an explicit mapping between URI characters and all possible data values being represented by those characters.
Since the publication of RFC 1738 in 1994 it has been specified[1] that schemes that provide for the representation of binary data in a URI must divide the data into 8-bit bytes and percent-encode each byte in the same manner as above. Byte value 0F (hexadecimal), for example, should be represented by «%0F», but byte value 41 (hexadecimal) can be represented by «A», or «%41». The use of unencoded characters for alphanumeric and other unreserved characters is typically preferred as it results in shorter URLs.
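The two byte values mentioned above can be checked with PHP's rawurlencode(), which operates on bytes. This is an illustrative sketch only; the two-byte string is made up.

```php
<?php
// Two bytes of binary data: 0x0F (a control byte) and 0x41 (the letter A).
$binary = "\x0F\x41";

// The control byte must be escaped; the unreserved byte is left as "A".
$encoded = rawurlencode($binary);
echo $encoded, PHP_EOL; // %0FA

// Both spellings decode to the same two bytes.
var_dump(rawurldecode('%0FA') === $binary);   // bool(true)
var_dump(rawurldecode('%0F%41') === $binary); // bool(true)
```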
The procedure for percent-encoding binary data has often been extrapolated, sometimes inappropriately or without being fully specified, to apply to character-based data. In the World Wide Web’s formative years, when dealing with data characters in the ASCII repertoire and using their corresponding bytes in ASCII as the basis for determining percent-encoded sequences, this practice was relatively harmless; it was just assumed that characters and bytes mapped one-to-one and were interchangeable. The need to represent characters outside the ASCII range, however, grew quickly and URI schemes and protocols often failed to provide standard rules for preparing character data for inclusion in a URI. Web applications consequently began using different multi-byte, stateful, and other non-ASCII-compatible encodings as the basis for percent-encoding, leading to ambiguities and difficulty interpreting URIs reliably.
For example, many URI schemes and protocols based on RFCs 1738 and 2396 presume that the data characters will be converted to bytes according to some unspecified character encoding before being represented in a URI by unreserved characters or percent-encoded bytes. If the scheme does not allow the URI to provide a hint as to what encoding was used, or if the encoding conflicts with the use of ASCII to percent-encode reserved and unreserved characters, then the URI cannot be reliably interpreted. Some schemes fail to account for encoding at all, and instead just suggest that data characters map directly to URI characters, which leaves it up to implementations to decide whether and how to percent-encode data characters that are in neither the reserved nor unreserved sets.
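The ambiguity is easy to reproduce in PHP, whose encoding functions work on bytes and know nothing about the character set. In this sketch the character «é» is written as byte escapes in two encodings, so the result does not depend on how the script file is saved; the UTF-8 validity check with preg_match() is one possible approach, not the only one.

```php
<?php
// The same character in two different character encodings.
$utf8   = "\xC3\xA9"; // UTF-8: two bytes
$latin1 = "\xE9";     // ISO-8859-1: one byte

$encodedUtf8   = rawurlencode($utf8);
$encodedLatin1 = rawurlencode($latin1);
echo $encodedUtf8, PHP_EOL;   // %C3%A9
echo $encodedLatin1, PHP_EOL; // %E9

// Decoding only restores bytes; it cannot tell which character set was meant.
$decoded = rawurldecode('%E9');

// preg_match() with the /u modifier fails on byte strings that are not valid UTF-8.
$isValidUtf8 = preg_match('//u', $decoded) === 1;
var_dump($isValidUtf8); // bool(false)
```

A receiver that assumes UTF-8 would therefore misread or reject the second form unless the character set is agreed on separately.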