Introduction to Base64 Algorithm
Welcome to our series on Base64 encoding! In this post, we'll explore the basics of Base64 - a method used to encode binary data into a text format. Let's get started.
Prerequisites
Before we dive into Base64 encoding, it's helpful to have a basic understanding of the following concepts.
The sections below give a quick refresher on each one.
Binary data
- Binary is a base-2 number system using only 0s and 1s
- Each digit is called a bit (binary digit)
- Used to represent all data in computers
- Example: The letter 'A' in binary is 01000001
Bits and bytes
- A bit is the smallest unit of data (0 or 1)
- A byte is a group of 8 bits
- 1 byte can represent 256 different values (2^8)
- Bytes are commonly used to represent characters in text encoding
ASCII
- ASCII stands for American Standard Code for Information Interchange
- It's a character encoding standard for electronic communication
- Represents text characters as numbers from 0-127
- Example: 'A' is represented by the decimal number 65
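As a quick check of the examples above, the following sketch uses Python's built-in `ord`, `chr`, and `format` to move between a character, its decimal code, and its 8-bit pattern. It is only an illustration of the idea, not part of Base64 itself.

```python
# Look up the ASCII code of a character and show it as 8 binary digits.
code = ord("A")               # character -> decimal code
bits = format(code, "08b")    # decimal -> zero-padded 8-bit binary string
print(code, bits, chr(code))  # 65 01000001 A
```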
Text encoding
- Text encoding is the process of converting human-readable text to machine-readable format
- Common encodings include ASCII, UTF-8, and UTF-16 (the latter two are encodings of the Unicode character set)
- Encoding ensures consistent representation and storage of text across different systems
- Different encodings support various character sets and languages
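To make this concrete, here is a small sketch showing that the same character can become different bytes under different encodings, while plain ASCII text produces identical bytes in ASCII and UTF-8. The sample character is an arbitrary choice for illustration.

```python
# The same text can become different bytes under different encodings.
text = "\u00e9"                        # the character 'e' with an acute accent
utf8_bytes = text.encode("utf-8")      # two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # one byte in Latin-1
print(list(utf8_bytes), list(latin1_bytes))  # [195, 169] [233]

# Plain ASCII text yields the same bytes in ASCII and UTF-8.
same = "Ratan".encode("ascii") == "Ratan".encode("utf-8")
print(same)  # True
```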
Understanding Base64 Encoding
Base64 is an encoding algorithm that converts binary data into a text format. It's designed to allow binary data to be safely transmitted over channels that only support text, such as email systems, certain network protocols, and legacy data storage systems. These text-only channels typically handle ASCII characters well but may have issues with binary data. Base64 encoding ensures that binary information can be reliably transmitted through these text-based systems. Let's dive into how it works.
The Base64 Character Set
Base64 uses a set of 64 characters to represent binary data in a text format:
- A-Z (26 characters)
- a-z (26 characters)
- 0-9 (10 characters)
- '+' and '/' (2 characters)
Additionally, '=' is used for padding.
Here is the complete Base64 index table:
Index | Character | Index | Character | Index | Character | Index | Character |
---|---|---|---|---|---|---|---|
0 | A | 16 | Q | 32 | g | 48 | w |
1 | B | 17 | R | 33 | h | 49 | x |
2 | C | 18 | S | 34 | i | 50 | y |
3 | D | 19 | T | 35 | j | 51 | z |
4 | E | 20 | U | 36 | k | 52 | 0 |
5 | F | 21 | V | 37 | l | 53 | 1 |
6 | G | 22 | W | 38 | m | 54 | 2 |
7 | H | 23 | X | 39 | n | 55 | 3 |
8 | I | 24 | Y | 40 | o | 56 | 4 |
9 | J | 25 | Z | 41 | p | 57 | 5 |
10 | K | 26 | a | 42 | q | 58 | 6 |
11 | L | 27 | b | 43 | r | 59 | 7 |
12 | M | 28 | c | 44 | s | 60 | 8 |
13 | N | 29 | d | 45 | t | 61 | 9 |
14 | O | 30 | e | 46 | u | 62 | + |
15 | P | 31 | f | 47 | v | 63 | / |
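The table can also be built in code. This is a minimal sketch using Python's `string` module; the name `BASE64_ALPHABET` is just an illustrative choice.

```python
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, then '+' and '/'.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

print(len(BASE64_ALPHABET))        # 64
print(BASE64_ALPHABET[20])         # U
print(BASE64_ALPHABET.index("g"))  # 32
```

A character's position in the string is its index in the table, so lookups work in both directions.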
How Base64 Encoding Works
Base64 encoding works by converting binary data into a series of 6-bit numbers, which are then represented using the 64-character set. Here's a step-by-step breakdown of the process:
- Group binary data: The input binary data is divided into groups of 24 bits (3 bytes).
- Split into 6-bit chunks: Each 24-bit group is then split into four 6-bit chunks.
- Convert to decimal: Each 6-bit chunk is converted to its decimal equivalent (0-63).
- Map to Base64 characters: The decimal values are used as indices to select characters from the Base64 character set.
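The four steps above can be sketched for a single complete 3-byte group as follows. The function name `encode_group` and the constant `ALPHABET` are hypothetical names for this illustration, and the sketch deliberately ignores padding.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes into 4 Base64 characters (no padding handled)."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Step 1: pack the 3 bytes into one 24-bit integer.
    group = int.from_bytes(three_bytes, "big")
    # Steps 2-3: take four 6-bit slices, most significant first (values 0-63).
    indices = [(group >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # Step 4: use each value as an index into the alphabet.
    return "".join(ALPHABET[i] for i in indices)

print(encode_group(b"Rat"))  # UmF0
```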
Padding
If the input data's length is not a multiple of 3 bytes, padding is added:
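- If 1 byte is left over, its 8 bits are filled out with zero bits to 12 bits, encoded as two characters, and followed by two '=' characters.
- If 2 bytes are left over, their 16 bits are filled out with zero bits to 18 bits, encoded as three characters, and followed by one '=' character.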
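Here is a minimal sketch of a full encoder that handles these padding cases, checked against Python's standard `base64` module. The name `b64_encode` is hypothetical, and the code favors readability over speed.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_encode(data: bytes) -> str:
    """Encode bytes as standard Base64 with '=' padding."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full group
        # Fill a short group with zero bytes so it is 24 bits long.
        group = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(group >> s) & 0x3F] for s in (18, 12, 6, 0)]
        # Characters built only from filler bits are replaced by '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

for sample in (b"Rat", b"Ra", b"R"):
    # Compare the sketch with the standard library on each sample.
    assert b64_encode(sample) == base64.b64encode(sample).decode("ascii")
    print(sample, b64_encode(sample))
```

For these samples, "Rat" needs no padding (UmF0), "Ra" gets one '=' (UmE=), and "R" gets two (Ug==).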
Example of Base64 Encoding
Let's consider the name "Ratan Tata" as an example to demonstrate Base64 encoding:
In Base64 encoding, a space is significant and is encoded like any other byte. The space character has an ASCII value of 32 (decimal), or 00100000 in binary, and goes through the same process as the letters.
- Convert the text "Ratan Tata" to binary (ASCII):
How-To: ASCII to Binary Conversion
Each ASCII character is represented by a unique decimal number, which is then converted to an 8-bit binary number.
Example:
'R' (ASCII) = 82 (decimal) = 01010010 (binary)
To convert 82 to binary:
82 ÷ 2 = 41 remainder 0, 41 ÷ 2 = 20 remainder 1, 20 ÷ 2 = 10 remainder 0, 10 ÷ 2 = 5 remainder 0, 5 ÷ 2 = 2 remainder 1, 2 ÷ 2 = 1 remainder 0, 1 ÷ 2 = 0 remainder 1
Reading the remainders from last to first gives 1010010; adding a leading zero to fill 8 bits gives 01010010
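The repeated-division method can be sketched in a few lines of Python. The helper name `to_binary` is hypothetical; in practice the built-in `format(n, "08b")` gives the same result.

```python
def to_binary(n: int, width: int = 8) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)  # quotient carries on, remainder is the next bit
        remainders.append(str(r))
    # The last remainder is the most significant bit, so reverse the list,
    # then left-pad with zeros to the requested width.
    return "".join(reversed(remainders)).rjust(width, "0")

print(to_binary(82))  # 01010010
```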
R (82): 01010010
a (97): 01100001
t (116): 01110100
a (97): 01100001
n (110): 01101110
' ' (32): 00100000 // Space between "Ratan" and "Tata"
T (84): 01010100
a (97): 01100001
t (116): 01110100
a (97): 01100001
Each character is converted to its ASCII value, then to binary:
- 'R' has ASCII value 82, which is 01010010 in binary
- 'a' has ASCII value 97, which is 01100001 in binary
- 't' has ASCII value 116, which is 01110100 in binary
- 'n' has ASCII value 110, which is 01101110 in binary
- ' ' has ASCII value 32, which is 00100000 in binary
- and so on for the remaining characters...
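The same conversion can be reproduced with a short loop. This is only a sketch to check the listing above; it assumes the text is pure ASCII.

```python
text = "Ratan Tata"

for ch in text:
    # ord() gives the character code; format(..., "08b") gives 8 binary digits.
    print(repr(ch), ord(ch), format(ord(ch), "08b"))

# Concatenate all bytes into one long bit string.
bit_string = "".join(format(b, "08b") for b in text.encode("ascii"))
print(len(bit_string))  # 80
```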
Group into 24 bits:
010100100110000101110100 011000010110111000100000 010101000110000101110100 011000010000000000000000
We concatenate all the binary values and group them into sets of 24 bits (3 bytes):
- First group: 010100100110000101110100 (complete 24 bits)
- Second group: 011000010110111000100000 (complete 24 bits)
- Third group: 010101000110000101110100 (complete 24 bits)
- Fourth group: 011000010000000000000000 (8 data bits followed by 16 zero bits of padding)
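The grouping step might be sketched like this. Zero-filling the tail to a full 24 bits mirrors the fourth group above; the variable names are illustrative.

```python
data = b"Ratan Tata"
bits = "".join(format(byte, "08b") for byte in data)

# Round the length up to the next multiple of 24 and zero-fill the tail.
target = -(-len(bits) // 24) * 24
padded = bits.ljust(target, "0")

groups = [padded[i:i + 24] for i in range(0, len(padded), 24)]
for g in groups:
    print(g)
```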
Split into 6-bit chunks:
010100 100110 000101 110100 011000 010110 111000 100000 010101 000110 000101 110100 011000 010000 000000 000000
We divide each 24-bit group into four 6-bit chunks:
- From first group: 010100, 100110, 000101, 110100
- From second group: 011000, 010110, 111000, 100000
- From third group: 010101, 000110, 000101, 110100
- From fourth group: 011000, 010000, 000000, 000000
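Splitting a 24-bit group into 6-bit chunks is a simple slicing operation, sketched here for the first group of the example.

```python
group = "010100100110000101110100"  # first 24-bit group from the example

# Take consecutive slices of 6 characters each.
chunks = [group[i:i + 6] for i in range(0, len(group), 6)]
print(chunks)  # ['010100', '100110', '000101', '110100']
```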
Convert to decimal:
How-To: Convert Binary to Decimal
To convert binary to decimal:
- Identify the position of each bit, starting from right to left (0, 1, 2, ...)
- For each '1' bit, calculate 2 raised to its position
- Sum up all the calculated values
Example:
010100 (binary) = (0×2^5) + (1×2^4) + (0×2^3) + (1×2^2) + (0×2^1) + (0×2^0) = 16 + 4 = 20 (decimal)
20 38 5 52 24 22 56 32 21 6 5 52 24 16 0 0
Each 6-bit chunk is converted to its decimal equivalent following the same process:
- 011000 (binary) = 24 (decimal)
- 010110 (binary) = 22 (decimal)
- 111000 (binary) = 56 (decimal)
- 100000 (binary) = 32 (decimal)
- and so on for the remaining binary chunks...
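The positional method can be written out directly. The helper `bits_to_int` is a hypothetical name for this sketch; Python's built-in `int(chunk, 2)` does the same job.

```python
def bits_to_int(bits: str) -> int:
    """Sum 2**position for every '1' bit, counting positions from the right."""
    return sum(2 ** pos for pos, bit in enumerate(reversed(bits)) if bit == "1")

chunks = ["010100", "100110", "000101", "110100"]
print([bits_to_int(c) for c in chunks])  # [20, 38, 5, 52]
print([int(c, 2) for c in chunks])       # [20, 38, 5, 52]
```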
Map to Base64 characters:
U m F 0 Y W 4 g V G F 0 Y Q = =
Using the Base64 index table, we map each decimal value to its corresponding Base64 character:
- 20 maps to 'U' (Upper case 'U')
- 38 maps to 'm' (Lower case 'm')
- 5 maps to 'F' (Upper case 'F')
- 52 maps to '0' (Zero)
And so on for the remaining mappings: Y, W, 4, g, V, G, F, 0, Y, Q. The last two chunks contain only filler bits, so they are written as '=' rather than 'A'.
Add padding (if necessary): In this case we need padding, because the input is 10 bytes long: three complete 3-byte groups plus 1 leftover byte. The leftover byte's 8 bits are filled out with 4 zero bits to form two 6-bit chunks (Y and Q), and the two remaining positions in the final group of four are filled with '=' characters.
Padding with '=' characters is necessary when the number of bytes in the input is not divisible by 3. This ensures that the final Base64 string length is always a multiple of 4. The number of padding characters added (0, 1, or 2) depends on how many bytes are left over after dividing the input length by 3.
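The padding rule reduces to a little arithmetic on the input length. The two helper names below are hypothetical and exist only for this sketch.

```python
def padding_count(n_bytes: int) -> int:
    """Number of '=' characters needed for an input of n_bytes bytes."""
    return (3 - n_bytes % 3) % 3

def encoded_length(n_bytes: int) -> int:
    """Length of the padded Base64 string: 4 characters per 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)

print(padding_count(10), encoded_length(10))  # 2 16
```

For the 10-byte input "Ratan Tata" this gives two padding characters and a 16-character result.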
Therefore, the final Base64 encoding of "Ratan Tata" (including the space) is "UmF0YW4gVGF0YQ==".
This process ensures that any binary data can be represented using only the 64 characters in the Base64 character set, making it safe for transmission through text-based systems that might not handle binary data well.
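To double-check the manual result, it can be compared with Python's standard `base64` module. This assumes the text is encoded as ASCII bytes first, as in the walkthrough.

```python
import base64

# b64encode works on bytes, so encode the text first.
encoded = base64.b64encode("Ratan Tata".encode("ascii")).decode("ascii")
print(encoded)  # UmF0YW4gVGF0YQ==
```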
Test Your Understanding
Question: What is the Base64 encoding of the word "Test"?
Solution
The Base64 encoding of "Test" is "VGVzdA==".
Explanation
Let's break down the process:
Convert to ASCII: T = 84, e = 101, s = 115, t = 116
Convert to binary: T = 01010100, e = 01100101, s = 01110011, t = 01110100
Group into 24 bits: 010101000110010101110011 01110100
Split into 6-bit chunks: 010101 000110 010101 110011 011101 00
Convert to decimal: 21 6 21 51 29 0
Map to Base64 characters: V G V z d A
Add padding: The final group holds only 1 byte, so its last 2 bits are filled out with four zero bits to form the chunk 000000 ('A'), and two '=' characters complete the group of four.
Therefore, the final encoding is "VGVzdA=="
Base64 Decoding
While we've focused on encoding so far, let's now briefly explore the reverse operation: converting our Base64-encoded text back into its original binary form (and from there to our initial text).
Decoding Process
Remove Padding: First, any '=' characters at the end of the encoded string are removed.
Reverse Character Mapping: Each Base64 character is mapped back to its 6-bit value.
Combine Bits: The 6-bit values are combined into a continuous stream of bits.
Group into Bytes: The bit stream is grouped into 8-bit chunks (bytes).
Convert to Original Data: These bytes are then converted back to their original form (ASCII characters, binary data, etc.).
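These steps can be sketched as a small decoder. The name `b64_decode` is hypothetical, and the sketch assumes well-formed standard Base64 input; it does no validation beyond raising an error on characters outside the alphabet.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_decode(text: str) -> bytes:
    """Decode standard, padded Base64 text back into bytes."""
    # Step 1: remove the padding.
    stripped = text.rstrip("=")
    # Steps 2-3: map each character to 6 bits and join into one bit stream.
    bits = "".join(format(ALPHABET.index(ch), "06b") for ch in stripped)
    # Step 4: keep whole bytes only; leftover filler bits are dropped.
    usable = len(bits) - len(bits) % 8
    # Step 5: convert each 8-bit chunk back to a byte.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))

print(b64_decode("VGVzdA=="))  # b'Test'
```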
Let's decode our Base64 string "UmF0YW4gVGF0YQ==":
Decoding process:
Remove padding: "UmF0YW4gVGF0YQ==" becomes "UmF0YW4gVGF0YQ"
Convert to binary (6 bits per character): U = 010100, m = 100110, F = 000101, 0 = 110100, Y = 011000, W = 010110, 4 = 111000, g = 100000, V = 010101, G = 000110, F = 000101, 0 = 110100, Y = 011000, Q = 010000
Combine bits:
010100100110000101110100011000010110111000100000010101000110000101110100011000010000
Group into bytes (8 bits), discarding the 4 leftover zero bits at the end:
01010010 01100001 01110100 01100001 01101110 00100000 01010100 01100001 01110100 01100001
Convert to ASCII (decimal values):
82 97 116 97 110 32 84 97 116 97
Convert ASCII to characters:
R a t a n (space) T a t a
Therefore, the decoded result is "Ratan Tata".
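The manual decoding can be confirmed with Python's standard `base64` module, as a quick sanity check.

```python
import base64

# b64decode returns bytes; decode them as ASCII to get the text back.
decoded = base64.b64decode("UmF0YW4gVGF0YQ==").decode("ascii")
print(decoded)  # Ratan Tata
```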
We've now explored both the encoding and decoding processes of Base64, understanding how binary data can be converted to text and back again. This bidirectional conversion is crucial for many applications in data transmission and storage. Let's now take a closer look at some of these practical applications.
Applications
- Email Attachments: Converts binary files (images, documents, and other attachments) to text for email transmission. Email was originally designed to carry 7-bit ASCII text, so the MIME standard uses Base64 to move attachments safely across mail servers and clients.
- Web Images: Embeds small images directly in HTML as text strings.
- API Responses: Sends binary data (e.g., images) as text in API responses.
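- URL Encoding: Includes binary data in URLs as text. Because '+' and '/' have special meanings in URLs, a URL-safe variant that uses '-' and '_' instead is typically used there.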
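Two of these uses can be sketched with Python's `base64` module: the URL-safe variant, which swaps '+' and '/' for '-' and '_', and a data URI of the kind used to embed content in HTML. The 3-byte payload is a made-up value chosen so that the difference between the two alphabets is visible.

```python
import base64

# Hypothetical payload; in practice this might be the bytes of a small image.
payload = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(payload).decode("ascii")
url_safe = base64.urlsafe_b64encode(payload).decode("ascii")
print(standard)  # +//+
print(url_safe)  # -__-

# A data URI carries the media type followed by the Base64 text.
data_uri = "data:application/octet-stream;base64," + standard
print(data_uri)
```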
Considerations
- File Size: Base64 encoding increases file size by about 33%.
- Caching: Resources embedded in a page as Base64 can't be cached separately by browsers; they are downloaded again with every page that contains them, potentially affecting load times.
- SEO Impact: Search engines may not index Base64-encoded images, affecting image search visibility.
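The size overhead follows from the 3-bytes-in, 4-characters-out ratio and can be measured directly. The 3000-byte buffer of zeros below is an arbitrary stand-in for real file contents.

```python
import base64

raw = bytes(3000)  # 3000 zero bytes as a stand-in for a binary file
encoded = base64.b64encode(raw)

overhead = (len(encoded) - len(raw)) / len(raw)
print(len(raw), len(encoded))  # 3000 4000
print(f"{overhead:.0%}")       # 33%
```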
Conclusion
In this post, we've explored the fundamentals of Base64 encoding, including its process, applications, and considerations. We've learned how this technique transforms binary data into a text format, making it crucial for various data transmission and storage scenarios in the digital world.
In our next post, we'll explore how Base64 encoding is applied in creating GLS shipment labels, providing a practical example of this encoding technique in action. We'll be referring to the GLS ShipIT API documentation [1] for this demonstration.
Footnotes
[1] GLS ShipIT API Documentation. https://shipit.gls-group.eu/webservices/3_2_9/doxygen/WS-REST-API/index.html