A simple guide: Hashing, Encoding & Encryption

Aung Baw
5 min read · Mar 26, 2023

partly written by ChatGPT

Photo by FLY:D on Unsplash

Hashing

Hashing is a technique used in computer science and cryptography to transform data of arbitrary size into a fixed-size output called a hash value, hash code, or digest. The hash value acts as a compact fingerprint of the original data. Because the output has a fixed size, different inputs can in principle share the same hash (a collision), but a good cryptographic hash function makes finding such collisions computationally infeasible. Hashes are used for data integrity checks, password storage, digital signatures, and more.

To hash data, you pass it through a hash function: a deterministic algorithm that always produces the same output for the same input. The output length is fixed by the function (128 bits for MD5, 256 bits for SHA-256) regardless of the size of the input, and even a one-character change in the input produces a completely different hash.
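
To see the fixed-length output in action, and how a tiny change in the input changes the whole digest, here is a minimal POSIX shell sketch. It uses SHA-256 rather than MD5 and assumes either `sha256sum` (GNU coreutils on Linux) or `shasum` (macOS) is installed; the `sha256` helper is a hypothetical name for illustration.

```shell
# Hypothetical helper: print only the hex SHA-256 digest of stdin.
# GNU coreutils ships sha256sum, macOS ships shasum; use whichever exists.
sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

short=$(printf 'hello world\n' | sha256)
changed=$(printf 'hello world!\n' | sha256)   # one extra character
big=$(head -c 1048576 /dev/zero | sha256)      # 1 MiB of zero bytes

printf '%s\n' "$short" "$changed" "$big"

# Every digest is 64 hex characters (256 bits), however large the input.
printf '%s %s %s\n' "${#short}" "${#changed}" "${#big}"   # prints: 64 64 64
```

The first two digests share no visible pattern even though the inputs differ by a single character.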

$ echo "hello world" > hw.txt
$ md5 hw.txt          # macOS; on Linux use: md5sum hw.txt
MD5 (hw.txt) = 6f5902ac237024bdd0c176cb93063dc4

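Takeaway: a one-way process that transforms data into a fixed-length string (usually written in hex), used to verify data integrity. MD5 is fine for spotting accidental corruption but is broken against deliberate tampering, so prefer SHA-256 or newer for security purposes.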
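
A minimal sketch of an integrity check: record a digest when the file is sent, recompute it when it arrives, and compare. The file names, the simulated corruption, and the `sha256`/`verify` helpers are hypothetical; the helper assumes `sha256sum` (Linux) or `shasum` (macOS) is available.

```shell
# Hypothetical helper: hex SHA-256 digest of stdin (sha256sum on Linux, shasum on macOS).
sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

dir=$(mktemp -d)
printf 'hello world\n' > "$dir/hw.txt"

# The sender publishes this digest alongside the file.
expected=$(sha256 < "$dir/hw.txt")

# Simulate two deliveries: one intact copy, one with a single changed character.
cp "$dir/hw.txt" "$dir/intact.txt"
printf 'hello w0rld\n' > "$dir/corrupted.txt"

# Recompute the digest of a received file and compare it with the published one.
verify() {
  if [ "$(sha256 < "$1")" = "$expected" ]; then
    echo "$1: OK"
  else
    echo "$1: MISMATCH"
  fi
}

verify "$dir/intact.txt"
verify "$dir/corrupted.txt"
```

In practice, `sha256sum -c` (or `shasum -a 256 -c`) automates the same comparison from a checksum file.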

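Common use cases: Storing passwords: generate a random string (a salt), append it to the password, hash the result, and store the salt together with the hash; in practice, deliberately slow password hashes such as bcrypt, scrypt or Argon2 are preferred over a single fast hash. Checking data integrity: detecting whether a file or message was corrupted during transport or modified in flight. Fast lookups: hash tables give average O(1) lookups, which databases and cache systems rely on.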
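
The salted-password idea can be sketched in a few lines of shell: generate a random salt, store the salt next to hash(salt + password), and at login recompute with the stored salt. This is only an illustration assuming `od`, `/dev/urandom`, and `sha256sum` or `shasum` are available; a single fast SHA-256 is too quick to resist brute force, so real systems should use a dedicated password hash. The password values and the `check_password` name are hypothetical.

```shell
# Hypothetical helper: hex SHA-256 digest of stdin (sha256sum on Linux, shasum on macOS).
sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

password='correct horse'   # hypothetical user password

# 16 random bytes rendered as 32 hex characters.
salt=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')

# What gets stored: the salt in clear text, plus hash(salt + password).
stored_hash=$(printf '%s%s' "$salt" "$password" | sha256)
printf 'stored record: %s:%s\n' "$salt" "$stored_hash"

# Login check: recompute with the stored salt and compare.
check_password() {
  [ "$(printf '%s%s' "$salt" "$1" | sha256)" = "$stored_hash" ]
}

check_password 'correct horse' && echo "correct password accepted"
check_password 'wrong guess' || echo "wrong password rejected"

# The same password with a different salt gives a different stored hash,
# so two users with identical passwords do not share a hash.
other_salt=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
printf '%s%s' "$other_salt" "$password" | sha256
```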

Encoding

Encoding is the process of converting data from one format to another using a publicly known scheme, with no secret key involved. In computer science, encoding is used to represent data in a format that can be stored or transmitted reliably, or to provide a standardized format for exchanging data between different systems.

$ base64 -i hw.txt > encode.txt
$ cat encode.txt
aGVsbG8gd29ybGQK
$ base64 -d -i encode.txt
hello world

Takeaway: changing data into a new format (reversibly) using a public scheme, to keep data usable across systems. Since no key is involved, anyone can decode it, so encoding provides no secrecy.
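
Because encoding uses a public scheme and no key, anyone can reverse it. This sketch round-trips the same "hello world" text through Base64 with the standard `base64` tool (both the GNU and macOS versions accept `-d` for decoding) and shows that the encoded form is larger, not smaller: Base64 turns every 3 input bytes into 4 output characters.

```shell
# "hello world" plus a trailing newline is 12 bytes.
original='hello world'
encoded=$(printf '%s\n' "$original" | base64)
decoded=$(printf '%s\n' "$encoded" | base64 -d)

echo "$encoded"   # aGVsbG8gd29ybGQK
echo "$decoded"   # hello world

# 12 input bytes -> 16 Base64 characters (4 per 3 bytes).
printf '%s' "$encoded" | wc -c
```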

Common use cases: transforming data before it is transmitted over a network, e.g. Base64 to carry binary data over text-only channels such as email. In storage, text is saved using a character encoding such as UTF-8, while images, audio and video use their own binary formats. In web development, special characters are represented with percent-encoding in URLs and with entities (such as &amp;) in HTML.
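
Percent-encoding in URLs is a good example of the web use case: every byte outside a small "unreserved" set (letters, digits, `-`, `.`, `_`, `~`, per RFC 3986) becomes `%` followed by two hex digits. The `urlencode` function below is a hypothetical, ASCII-only POSIX shell sketch for illustration; real code should use a library URL encoder that also handles multi-byte UTF-8 characters.

```shell
# Hypothetical ASCII-only percent-encoder: RFC 3986 unreserved characters are kept as-is.
urlencode() {
  rest=$1
  out=''
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}   # first character of the remaining input
    rest=${rest#?}          # drop that character
    case $c in
      [A-Za-z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;   # e.g. space -> %20
    esac
  done
  printf '%s\n' "$out"
}

urlencode 'hello world & more?'   # hello%20world%20%26%20more%3F
```

Decoding simply reverses the mapping, which again needs no key.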

Encryption

Encryption is the process of converting plain text or data into ciphertext so that only authorized parties holding the right key can read it. Its main goal is confidentiality; authenticated encryption modes (such as AES-GCM) additionally protect integrity by detecting tampering.

Encryption algorithms require a key (or a password from which a key is derived) to transform the plain text into ciphertext, and the matching key is required to decrypt the ciphertext and retrieve the original data. There are two types of encryption algorithms: symmetric key encryption and asymmetric key encryption.

Symmetric key encryption: Symmetric key encryption uses the same key for both encryption and decryption.

$ gpg --batch --output hw.txt.gpg --passphrase mysecurepassword --symmetric hw.txt
$ gpg --batch --output hw_new.txt --passphrase mysecurepassword --decrypt hw.txt.gpg
gpg: AES256.CFB encrypted data
gpg: encrypted with 1 passphrase

The first command encrypts hw.txt with a key derived from the passphrase and writes the result to hw.txt.gpg. As the decryption output shows, the cipher used was AES256; a different one can be chosen with the --cipher-algo flag. The second command needs the same passphrase (mysecurepassword) to decrypt, and the recovered content ends up in hw_new.txt.
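
The same symmetric idea can be shown with OpenSSL instead of gpg. This is a sketch assuming OpenSSL 1.1.1 or newer (for `-pbkdf2`): one passphrase encrypts, the same passphrase decrypts, and a different passphrase does not give back the plaintext. File names and passphrases are placeholders; as with the gpg commands above, passing a passphrase on the command line is only acceptable for demos, since it can show up in shell history and process listings.

```shell
dir=$(mktemp -d)
printf 'hello world\n' > "$dir/hw.txt"

# Encrypt with AES-256-CBC; the key is derived from the passphrase via PBKDF2 and a random salt.
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in "$dir/hw.txt" -out "$dir/hw.txt.enc" -pass pass:mysecurepassword

# Decrypt with the same cipher, key derivation and passphrase.
openssl enc -d -aes-256-cbc -pbkdf2 \
  -in "$dir/hw.txt.enc" -out "$dir/hw_new.txt" -pass pass:mysecurepassword
cmp -s "$dir/hw.txt" "$dir/hw_new.txt" && echo "same passphrase: original recovered"

# A wrong passphrase derives a different key: decryption errors out or yields garbage.
openssl enc -d -aes-256-cbc -pbkdf2 \
  -in "$dir/hw.txt.enc" -out "$dir/wrong.txt" -pass pass:wrongpassword 2>/dev/null
if cmp -s "$dir/hw.txt" "$dir/wrong.txt"; then
  echo "unexpected: wrong passphrase worked"
else
  echo "wrong passphrase: original not recovered"
fi
```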

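Asymmetric key encryption: Asymmetric key encryption, also known as public key encryption, uses two different keys, a public key and a private key. Data encrypted with the public key, which can be shared freely, can only be decrypted with the matching private key, which its owner keeps secret.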
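
Before the gpg walkthrough, here is the core public/private key idea in a few OpenSSL commands, as a sketch assuming OpenSSL 1.1.1 or newer: anyone with the public key can encrypt, but only the private key decrypts. Plain RSA like this can only encrypt a message shorter than the key (a couple of hundred bytes), which is why gpg actually encrypts a random symmetric session key with RSA and the data itself with a symmetric cipher. The 2048-bit key size (smaller than the 3072 bits used below, just to keep it fast) and the file names are illustrative.

```shell
dir=$(mktemp -d)
printf 'hello world\n' > "$dir/hw.txt"

# Receiver: generate a private key, then derive the public key to hand out.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out "$dir/private.pem" 2>/dev/null
openssl pkey -in "$dir/private.pem" -pubout -out "$dir/public.pem"

# Sender: encrypt with the receiver's PUBLIC key (OAEP padding).
openssl pkeyutl -encrypt -pubin -inkey "$dir/public.pem" \
  -pkeyopt rsa_padding_mode:oaep -in "$dir/hw.txt" -out "$dir/hw.txt.enc"

# Receiver: decrypt with the PRIVATE key.
openssl pkeyutl -decrypt -inkey "$dir/private.pem" \
  -pkeyopt rsa_padding_mode:oaep -in "$dir/hw.txt.enc" -out "$dir/hw_new.txt"
cmp -s "$dir/hw.txt" "$dir/hw_new.txt" && echo "decrypted with the private key"

# Holding only the public key is not enough to decrypt.
if openssl pkeyutl -decrypt -pubin -inkey "$dir/public.pem" -in "$dir/hw.txt.enc" >/dev/null 2>&1; then
  echo "unexpected: public key decrypted the data"
else
  echo "public key alone cannot decrypt"
fi
```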

$ mkdir sender receiver && cd receiver

$ gpg --batch --generate-key <<EOF
Key-Type: RSA
Key-Length: 3072
Subkey-Type: RSA
Subkey-Length: 3072
Name-Real: Receiver Foo
Name-Email: foo@server.com
Passphrase: foosecurepassword
Expire-Date: 30
%pubring receiver.kbx
%commit
EOF

$ gpg --keyring ./receiver.kbx --no-default-keyring --list-keys
./receiver.kbx
--------------
pub rsa3072 2023-03-26 [SCEA] [expires: 2023-04-25]
2F88A59BCFB07568B1966F652B8A89B6C3C2F11B
uid [ unknown] Receiver Foo <foo@server.com>
sub rsa3072 2023-03-26 [SEA] [expires: 2023-04-25]

$ gpg --keyring ./receiver.kbx --no-default-keyring --list-secret-keys
./receiver.kbx
--------------
sec rsa3072 2023-03-26 [SCEA] [expires: 2023-04-25]
2F88A59BCFB07568B1966F652B8A89B6C3C2F11B
uid [ unknown] Receiver Foo <foo@server.com>
ssb rsa3072 2023-03-26 [SEA] [expires: 2023-04-25]

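In the receiver folder we generate an RSA key pair for the receiver, using receiver.kbx as the keyring file instead of the default one. The last two commands list the public keys (--list-keys) and the private keys (--list-secret-keys). With GnuPG 2.1 and later, the private key material itself is stored by gpg-agent under ~/.gnupg/private-keys-v1.d rather than in the .kbx file.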

$ gpg --keyring ./receiver.kbx --no-default-keyring --armor --output receiverpublickey.gpg --export
-----BEGIN PGP PUBLIC KEY BLOCK-----
....
-----END PGP PUBLIC KEY BLOCK-----

$ cp receiverpublickey.gpg ../sender && cd ../sender

Now we can export the receiver's public key and share it with the sender (via email, chat, SMS, etc.). In practice, with TLS/SSL the client and server exchange certificates containing public keys during the handshake; in this dummy folder example, we just copy the key to the sender folder.

$ gpg --keyring ./sender.kbx --no-default-keyring --import receiverpublickey.gpg
gpg: keybox './sender.kbx' created
gpg: key 1D349FCF0393F73F: public key "Receiver Foo <foo@server.com>" imported
gpg: Total number processed: 1
gpg: imported: 1

$ gpg --keyring ./sender.kbx --no-default-keyring --list-keys
./sender.kbx
------------
pub rsa3072 2023-03-26 [SCEA] [expires: 2023-04-25]
48A035D8DEC9487E9A7B0A701D349FCF0393F73F
uid [ unknown] Receiver Foo <foo@server.com>
sub rsa3072 2023-03-26 [SEA] [expires: 2023-04-25]

$ gpg --keyring ./sender.kbx --no-default-keyring --edit-key "foo@server.com" trust

Now we have imported the receiver's public key into the sender's keyring file. "unknown" means gpg has no information (yet) about how far this key can be trusted, so we mark it as trusted: the trust command presents a menu of trust levels, where 5 means "I trust ultimately"; confirm with y and leave with quit. Running the --list-keys command above again will then show uid [ultimate] Receiver Foo <foo@server.com>.

$ cp ../hw.txt .
$ gpg --keyring ./sender.kbx --no-default-keyring --encrypt --recipient "foo@server.com" hw.txt

$ cp hw.txt.gpg ../receiver

Let's encrypt hw.txt with the receiver's public key; gpg writes the result to hw.txt.gpg. The encrypted file can be sent over any channel; in this case we copy it to the receiver folder and can start the decryption process.

$ cd ../receiver
$ gpg --keyring ./receiver.kbx --no-default-keyring --pinentry-mode=loopback --passphrase "foosecurepassword" --output hw.txt --decrypt hw.txt.gpg
gpg: encrypted with 3072-bit RSA key, ID 8273FAC75696D83E, created 2023-04-25
"Receiver Foo <foo@server.com>"

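This communication is one-way: only the receiver holds the private key, so only the receiver can decrypt. For the receiver to reply, the sender would need a key pair of their own and share its public key in the same way.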

Take away: transform data to keep it secret from others, to maintain data confidentiality.

Common use cases: secure communication over the public internet; protecting stored data on personal computers, in the cloud, or on hard drives; and authentication. The famous HTTPS (TLS/SSL) uses public-key cryptography (certificates and signatures) to establish the identity of one or both parties, then a public-key key exchange to agree on a shared key for a fast symmetric cipher that protects the rest of the connection. Regulatory requirements and compliance standards for data protection (e.g. in finance, cloud services, and mobile devices) commonly mandate encryption, which in practice involves both types.
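
The idea of using public-key cryptography to protect a key for a fast symmetric cipher (hybrid encryption) can be sketched with OpenSSL, assuming version 1.1.1 or newer: a random session key encrypts the data with AES, and RSA only protects that small key. This mirrors how gpg works and how RSA key exchange worked in older TLS versions; TLS 1.3 instead agrees on the key with ephemeral Diffie-Hellman and uses certificates and signatures for authentication, so treat this as an illustration of the idea rather than of TLS itself. File names are hypothetical.

```shell
dir=$(mktemp -d)
printf 'hello world\n' > "$dir/hw.txt"

# Receiver's long-term key pair.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out "$dir/receiver.pem" 2>/dev/null
openssl pkey -in "$dir/receiver.pem" -pubout -out "$dir/receiver.pub.pem"

# Sender: a fresh random session key (64 hex characters) for this message only.
openssl rand -hex 32 > "$dir/session.key"

# Sender: bulk data with the fast symmetric cipher (session key used as the passphrase)...
openssl enc -aes-256-cbc -pbkdf2 -in "$dir/hw.txt" -out "$dir/hw.txt.enc" \
  -pass "file:$dir/session.key"
# ...and only the small session key with the receiver's public key.
openssl pkeyutl -encrypt -pubin -inkey "$dir/receiver.pub.pem" -pkeyopt rsa_padding_mode:oaep \
  -in "$dir/session.key" -out "$dir/session.key.enc"

# Receiver: recover the session key with the private key, then decrypt the data.
openssl pkeyutl -decrypt -inkey "$dir/receiver.pem" -pkeyopt rsa_padding_mode:oaep \
  -in "$dir/session.key.enc" -out "$dir/session.key.dec"
openssl enc -d -aes-256-cbc -pbkdf2 -in "$dir/hw.txt.enc" -out "$dir/hw_new.txt" \
  -pass "file:$dir/session.key.dec"

cmp -s "$dir/hw.txt" "$dir/hw_new.txt" && echo "hybrid round trip ok"
```

Only the session key goes through the slow RSA operation, so the same pattern scales to large files.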

It's just a dummy article for newbies and non-CS folks. If you're using a Unix-like system (Linux or macOS), you only need standard tools such as md5 (md5sum on Linux) and base64, plus `gpg`, to follow the code examples. Hope it's useful and have a good one.

Aung Baw

Focusing on security, cloud, and DevOps, I am a lifelong learner and lazy 徒弟.