Introduction to Base64 Algorithm

Series: Base64 Encoding: Fundamentals and Application in GLS ShipIT API
Episodes: (1/2)

Welcome to our series on Base64 encoding! In this post, we'll explore the basics of Base64 - a method used to encode binary data into a text format. Let's get started.

Prerequisites

Before we dive into Base64 encoding, it's helpful to have a basic understanding of the following concepts.


Binary data
  • Binary is a base-2 number system using only 0s and 1s
  • Each digit is called a bit (binary digit)
  • Used to represent all data in computers
  • Example: The letter 'A' in binary is 01000001
Bits and bytes
  • A bit is the smallest unit of data (0 or 1)
  • A byte is a group of 8 bits
  • 1 byte can represent 256 different values (2^8)
  • Bytes are commonly used to represent characters in text encoding
ASCII
  • ASCII stands for American Standard Code for Information Interchange
  • It's a character encoding standard for electronic communication
  • Represents text characters as numbers from 0-127
  • Example: 'A' is represented by the decimal number 65
Text encoding
  • Text encoding is the process of converting human-readable text to machine-readable format
  • Common encodings include ASCII, UTF-8, and UTF-16 (the latter two are encodings of the Unicode character set)
  • Encoding ensures consistent representation and storage of text across different systems
  • Different encodings support various character sets and languages
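
As a quick illustration of the concepts above, here is a minimal Python sketch that looks up a character's ASCII code and writes it as 8 bits. The helper name char_to_bits is made up for this example.

```python
def char_to_bits(ch: str) -> str:
    """Return the 8-bit binary string for a single ASCII character."""
    code = ord(ch)              # ASCII code, e.g. 65 for 'A'
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "08b")  # zero-padded to 8 binary digits


print(ord("A"), char_to_bits("A"))  # 65 01000001
```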

Understanding Base64 Encoding

Base64 is an encoding algorithm that converts binary data into a text format. It's designed to allow binary data to be safely transmitted over channels that only support text, such as email systems, certain network protocols, and legacy data storage systems. These text-only channels typically handle ASCII characters well but may have issues with binary data. Base64 encoding ensures that binary information can be reliably transmitted through these text-based systems. Let's dive into how it works.

The Base64 Character Set

Base64 uses a set of 64 characters to represent binary data in a text format:

  • A-Z (26 characters)
  • a-z (26 characters)
  • 0-9 (10 characters)
  • '+' and '/' (2 characters)

Additionally, '=' is used for padding.

Here is the complete Base64 index table:

Index  Character    Index  Character    Index  Character    Index  Character
0      A            16     Q            32     g            48     w
1      B            17     R            33     h            49     x
2      C            18     S            34     i            50     y
3      D            19     T            35     j            51     z
4      E            20     U            36     k            52     0
5      F            21     V            37     l            53     1
6      G            22     W            38     m            54     2
7      H            23     X            39     n            55     3
8      I            24     Y            40     o            56     4
9      J            25     Z            41     p            57     5
10     K            26     a            42     q            58     6
11     L            27     b            43     r            59     7
12     M            28     c            44     s            60     8
13     N            29     d            45     t            61     9
14     O            30     e            46     u            62     +
15     P            31     f            47     v            63     /
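
The index table can also be built in code. The following Python sketch assembles the 64 characters in index order from the standard string module; the constant name BASE64_ALPHABET is an assumption made for this example.

```python
import string

# Index order: A-Z (0-25), a-z (26-51), 0-9 (52-61), '+' (62), '/' (63)
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

print(len(BASE64_ALPHABET))   # 64
print(BASE64_ALPHABET[20])    # U
print(BASE64_ALPHABET[52])    # 0
```

Looking up a character by index is then a plain string subscript, and the reverse lookup (character to index) can use BASE64_ALPHABET.index(ch).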

How Base64 Encoding Works

Base64 encoding works by converting binary data into a series of 6-bit numbers, which are then represented using the 64-character set. Here's a step-by-step breakdown of the process:

  1. Group binary data: The input binary data is divided into groups of 24 bits (3 bytes).
  2. Split into 6-bit chunks: Each 24-bit group is then split into four 6-bit chunks.
  3. Convert to decimal: Each 6-bit chunk is converted to its decimal equivalent (0-63).
  4. Map to Base64 characters: The decimal values are used as indices to select characters from the Base64 character set.
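
The four steps above can be sketched in Python for a single, complete 3-byte group. This is an illustrative sketch rather than a production encoder; the names ALPHABET and encode_group are assumptions made for this example, and padding is deliberately left out.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(group: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(group) != 3:
        raise ValueError("this sketch only handles complete 3-byte groups")
    value = int.from_bytes(group, "big")  # step 1: one 24-bit number
    # steps 2 and 3: shift and mask to get four 6-bit values (0-63)
    indices = [(value >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # step 4: use each value as an index into the character set
    return "".join(ALPHABET[i] for i in indices)


print(encode_group(b"Rat"))  # UmF0
```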

Padding

If the input data's length is not a multiple of 3 bytes, padding is added:

  • If there's 1 byte left, it's padded with two '=' characters.
  • If there are 2 bytes left, it's padded with one '=' character.
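
To show how padding fits into the overall process, here is a hedged sketch of a complete encoder in Python. The name b64encode_sketch is hypothetical; in real code, Python's built-in base64 module is the better choice.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64encode_sketch(data: bytes) -> str:
    """Encode bytes as Base64 text, adding '=' padding where needed."""
    out = []
    for start in range(0, len(data), 3):
        group = data[start:start + 3]
        missing = 3 - len(group)  # 0, 1 or 2 bytes short of a full group
        # Fill the short group with zero bytes so it is always 24 bits
        value = int.from_bytes(group + b"\x00" * missing, "big")
        chars = [ALPHABET[(value >> shift) & 0b111111] for shift in (18, 12, 6, 0)]
        if missing:
            # Positions that carry no input data are replaced by '='
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


print(b64encode_sketch(b"Test"))  # VGVzdA==
```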

Example of Base64 Encoding

Let's consider the name "Ratan Tata" as an example to demonstrate Base64 encoding:

  1. Convert the text "Ratan Tata" to binary (ASCII):
R (82):  01010010
a (97):  01100001
t (116): 01110100
a (97):  01100001
n (110): 01101110
  (32):  00100000  // Space between "Ratan" and "Tata"
T (84):  01010100
a (97):  01100001
t (116): 01110100
a (97):  01100001

Each character is converted to its ASCII value, then to binary:

  • 'R' has ASCII value 82, which is 01010010 in binary
  • 'a' has ASCII value 97, which is 01100001 in binary
  • 't' has ASCII value 116, which is 01110100 in binary
  • 'n' has ASCII value 110, which is 01101110 in binary
  • ' ' has ASCII value 32, which is 00100000 in binary
  • and so on for the remaining characters...
  2. Group into 24 bits:

    010100100110000101110100 011000010110111000100000 010101000110000101110100 011000010000000000000000
    

    We concatenate all the binary values and group them into sets of 24 bits (3 bytes):

    • First group: 010100100110000101110100 (complete 24 bits)
    • Second group: 011000010110111000100000 (complete 24 bits)
    • Third group: 010101000110000101110100 (complete 24 bits)
    • Fourth group: 011000010000000000000000 (only 8 bits of data, filled out to 24 bits with 16 zero bits)
  3. Split into 6-bit chunks:

    010100 100110 000101 110100 011000 010110 111000 100000 010101 000110 000101 110100 011000 010000 000000 000000
    

    We divide each 24-bit group into four 6-bit chunks:

    • From first group: 010100, 100110, 000101, 110100
    • From second group: 011000, 010110, 111000, 100000
    • From third group: 010101, 000110, 000101, 110100
    • From fourth group: 011000, 010000, 000000, 000000 (the last two chunks hold no input data and will become '=' padding)
  4. Convert to decimal:

20 38 5 52 24 22 56 32 21 6 5 52 24 16 0 0

Each 6-bit chunk is converted to its decimal equivalent. For example, for the second group:

  • 011000 (binary) = 24 (decimal)
  • 010110 (binary) = 22 (decimal)
  • 111000 (binary) = 56 (decimal)
  • 100000 (binary) = 32 (decimal)
  • and so on for the remaining binary chunks...
  5. Map to Base64 characters:

    U m F 0 Y W 4 g V G F 0 Y Q = =
    

    Using the Base64 index table, we map each decimal value to its corresponding Base64 character:

    • 20 maps to 'U' (Upper case 'U')
    • 38 maps to 'm' (Lower case 'm')
    • 5 maps to 'F' (Upper case 'F')
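    • 52 maps to '0' (Zero)
    • And so on for the remaining data chunks: Y, W, 4, g, V, G, F, 0, Y, Q. The last two chunks contain only filler bits, so they are not mapped to 'A'; they become padding in the next step.
  6. Add padding (if necessary): The input is 10 bytes (80 bits), which is three complete 3-byte groups plus 1 leftover byte. That leftover byte (8 bits) is extended with 4 zero bits to fill two 6-bit chunks ('Y' and 'Q'), and the two remaining positions in the final 4-character block are filled with two '=' characters.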

Therefore, the final Base64 encoding of "Ratan Tata" (including the space) is "UmF0YW4gVGF0YQ==".
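
The worked example can be cross-checked with a short Python sketch. It recomputes the 6-bit values by hand and compares the final string against Python's built-in base64 module. Note that this sketch only fills the bit string up to the next multiple of 6, so it produces the 14 data chunks and leaves the two padding positions to the library.

```python
import base64

data = "Ratan Tata".encode("ascii")                   # 10 bytes
bits = "".join(format(byte, "08b") for byte in data)  # 80 bits
bits += "0" * (-len(bits) % 6)                        # fill with zeros to 84 bits
chunks = [bits[i:i + 6] for i in range(0, len(bits), 6)]
indices = [int(chunk, 2) for chunk in chunks]         # decimal value of each chunk
encoded = base64.b64encode(data).decode("ascii")

print(indices)  # [20, 38, 5, 52, 24, 22, 56, 32, 21, 6, 5, 52, 24, 16]
print(encoded)  # UmF0YW4gVGF0YQ==
```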

This process ensures that any binary data can be represented using only the 64 characters in the Base64 character set, making it safe for transmission through text-based systems that might not handle binary data well.

Test Your Understanding
Question: What is the Base64 encoding of the word "Test"?
Solution

The Base64 encoding of "Test" is "VGVzdA=="

Explanation

Let's break down the process:

  1. Convert to ASCII: T = 84, e = 101, s = 115, t = 116

  2. Convert to binary: T = 01010100, e = 01100101, s = 01110011, t = 01110100

  3. Group into 24 bits: 010101000110010101110011 01110100

  4. Split into 6-bit chunks: 010101 000110 010101 110011 011101 00 (the final 2 bits are filled out with four zero bits to form 000000)

  5. Convert to decimal: 21 6 21 51 29 0

  6. Map to Base64 characters: V G V z d A

  7. Add padding: The last group holds only 1 byte instead of 3, so it produces 2 characters instead of 4. We add two '=' to complete the 4-character block.

Therefore, the final encoding is "VGVzdA=="
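
The number of '=' characters depends only on the input length. The following Python sketch expresses that rule and checks the quiz answer with the built-in base64 module; the helper name padding_count is an assumption made for this example.

```python
import base64


def padding_count(num_bytes: int) -> int:
    """Return how many '=' characters the encoding of num_bytes bytes ends with."""
    return (3 - num_bytes % 3) % 3  # 0 for full groups, 2 for 1 leftover byte, 1 for 2


print(padding_count(len(b"Test")))                # 2
print(base64.b64encode(b"Test").decode("ascii"))  # VGVzdA==
```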

Base64 Decoding

While we've focused on encoding so far, let's now briefly explore the reverse operation: converting our Base64-encoded text back into its original binary form (and then to our initial text).

Decoding Process

  1. Remove Padding: First, any '=' characters at the end of the encoded string are removed.

  2. Reverse Character Mapping: Each Base64 character is mapped back to its 6-bit value.

  3. Combine Bits: The 6-bit values are combined into a continuous stream of bits.

  4. Group into Bytes: The bit stream is grouped into 8-bit chunks (bytes).

  5. Convert to Original Data: These bytes are then converted back to their original form (ASCII characters, binary data, etc.).
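
The five decoding steps can be sketched in Python as follows. This is a simplified illustration: the name b64decode_sketch is hypothetical, and the sketch does not validate padding or ignore whitespace the way a full decoder would.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64decode_sketch(encoded: str) -> bytes:
    """Decode Base64 text back into the original bytes."""
    stripped = encoded.rstrip("=")                    # step 1: remove padding
    # steps 2 and 3: map each character to its 6-bit value and join the bits
    bits = "".join(format(ALPHABET.index(ch), "06b") for ch in stripped)
    usable = len(bits) - len(bits) % 8                # drop leftover filler bits
    # steps 4 and 5: regroup into 8-bit chunks and turn them into bytes
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


print(b64decode_sketch("VGVzdA==").decode("ascii"))  # Test
```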

Let's decode our Base64 string "UmF0YW4gVGF0YQ==".

Decoding process:

  1. Remove padding: "UmF0YW4gVGF0YQ==" becomes "UmF0YW4gVGF0YQ"

  2. Convert to binary (6 bits per character): U = 010100, m = 100110, F = 000101, 0 = 110100, Y = 011000, W = 010110, 4 = 111000, g = 100000, V = 010101, G = 000110, F = 000101, 0 = 110100, Y = 011000, Q = 010000

  3. Combine bits:

    010100100110000101110100011000010110111000100000010101000110000101110100011000010000
    
  4. Group into bytes (8 bits): 01010010 01100001 01110100 01100001 01101110 00100000 01010100 01100001 01110100 01100001 (the 4 leftover bits, 0000, are filler from encoding and are discarded)

  5. Convert to ASCII: 82 97 116 97 110 32 84 97 116 97

  6. Convert ASCII to characters: R a t a n (space) T a t a

Therefore, the decoded result is "Ratan Tata"
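
To double-check the 6-bit value of each character in this walkthrough, the following Python sketch prints a small trace and then decodes the string with the built-in base64 module. The names ALPHABET and trace are assumptions made for this example.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

encoded = "UmF0YW4gVGF0YQ=="
# One (character, index, 6-bit string) entry per non-padding character
trace = [
    (ch, ALPHABET.index(ch), format(ALPHABET.index(ch), "06b"))
    for ch in encoded.rstrip("=")
]
for ch, index, bits in trace:
    print(ch, index, bits)  # first line: U 20 010100

decoded = base64.b64decode(encoded).decode("ascii")
print(decoded)  # Ratan Tata
```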

We've now explored both the encoding and decoding processes of Base64, understanding how binary data can be converted to text and back again. This bidirectional conversion is crucial for many applications in data transmission and storage. Let's now take a closer look at some of these practical applications.

Applications

  • Email Attachments: Converts binary files (including images, documents, and other attachments) to text for email transmission. This is typically done when the email system doesn't support direct binary attachments or to ensure compatibility across different email clients.
  • Web Images: Embeds small images directly in HTML as text strings.
  • API Responses: Sends binary data (e.g., images) as text in API responses.
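  • URL Encoding: Safely includes complex data in URLs by converting it to text. Because '+' and '/' have special meanings in URLs, a URL-safe variant of Base64 that uses '-' and '_' instead is typically used here.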
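
For the URL case, Python's built-in base64 module offers a URL-safe variant that swaps '+' and '/' for '-' and '_'. The sketch below compares both variants; the two input bytes are arbitrary values chosen only because they produce indices 62 and 63.

```python
import base64

raw = b"\xfb\xff"  # arbitrary bytes that map to indices 62 and 63

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=
```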

Considerations

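  • File Size: Base64 encoding increases file size by about 33%, because every 3 bytes of input become 4 characters of output.
  • Caching: Resources embedded inline as Base64 (for example, images in HTML) can't be cached separately by browsers, potentially affecting load times.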
  • SEO Impact: Search engines may not index Base64-encoded images, affecting image search visibility.
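
The size overhead can be computed directly from the 3-bytes-to-4-characters rule. Here is a small Python sketch; the helper name encoded_length is made up for this example, and it assumes padded output with no line breaks.

```python
import base64


def encoded_length(num_bytes: int) -> int:
    """Length of the padded Base64 text for num_bytes bytes of input."""
    return 4 * ((num_bytes + 2) // 3)  # 4 characters per started 3-byte group


size = 3000
print(encoded_length(size))                # 4000
print(encoded_length(size) / size - 1)     # about 0.33, i.e. roughly 33% larger
print(len(base64.b64encode(bytes(size))))  # 4000, matches the formula
```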

Conclusion

In this post, we've explored the fundamentals of Base64 encoding, including its process, applications, and considerations. We've learned how this technique transforms binary data into a text format, making it crucial for various data transmission and storage scenarios in the digital world.

In our next post, we'll explore how Base64 encoding is applied in creating GLS shipment labels, providing a practical example of this encoding technique in action. We'll be referring to the GLS ShipIT API documentation [1] for this demonstration.

Footnotes

  1. GLS ShipIT API Documentation. https://shipit.gls-group.eu/webservices/3_2_9/doxygen/WS-REST-API/index.html