Public-key cryptography and digital signature using OpenSSL
The purpose of this post is to explain how to communicate privately over the Internet using public-key cryptography and how to digitally sign a document.
Introduction
Being able to communicate privately is a civil right and often a business need. As we can not allow anyone to eavesdrop our communications, we have also the right to avoid surveillance by companies or governments. There are many tools and protocols, many being open source and free, that can be used to enhance the security of our communications over the Internet. The aim of this post is to provide a very high level description of the ideas behind these tools and protocols and practical guidance on how to use one of them, OpenSSL, which is open source, free and used to secure most of the communications over the Internet. In particular in this post we will show
- How to avoid being eavesdropped while sending files to our friends or collaborators over the internet
- How to digitally sign a document
It is supposed that you are using a Linux distribution or a Mac with OpenSSL version 1.0.2 installed. In case you use Windows you might want to install Cygwin with openssl. It is assumed that you know how to use the command line.
Alice and Bob
We will set up a context for the secure communication problem using two characters, Alice and Bob. We will simulate the transmission of encrypted messages between Alice and Bob by copying files from Alice’s folder to Bob’s and vice-versa on our local file system. This simulation is meant for you to easily check what happens on both sides when they send or receive messages using OpenSSL, but it must be kept in mind that it bypasses the core business of encryption that is about sending messages over an insecure channel such as the Internet where other parties could eavesdrop or interfere with Alice’s and Bob’s communication. With this warning in mind, let’s start our simulation by creating a folder for Alice’s messages and one for Bob’s
Let’s imagine that Bob can’t remember his bank account details and asks Alice to send them to him by email. Alice is aware that sending the data as plain text over the Internet is risky so she wonders how to send the data to Bob in such a way that nobody else but he can read and use the data. After some investigation, Alice decides that the solution to their problem is public-key cryptography and the OpenSSL tools.
Public-key cryptography
Public-key cryptography consists of creating a key pair, namely a private key and a public key, to encrypt and decrypt messages. The private key is kept secret and is never shared with anyone. Alice uses Bob’s public key to encrypt the messages being sent to him. Bob uses his private key to decrypt the messages encrypted with his public key. The public key can even be published on the Internet for that matter. Only the owner of the private key can decrypt a message encrypted with his or her public key. There are different ways of creating a key pair but all are based on defining mathematical problems that are very difficult to solve in a short time scale, such as factorizing a number that is the product of two big prime numbers. This class of problems is used in the Rivest-Shamir-Adleman (RSA) cryptosystem. The idea is to find two prime numbers big enough, e.g. with more than 150 digits, so that it would be very difficult even for a cluster of computers to find them out in decades while it is very easy to compute their product. In RSA, the public key is the product of two prime numbers and the private key is the set of the two prime numbers themselves. An eavesdropper who wants to decrypt a message would need to extract the private key, i.e. the two prime numbers, from their product. In other words, the eavesdropper must be able to factorize a number that is the product of two big prime numbers, which in itself is an hard enough problem. Using RSA we can be confident that nobody will be able to decrypt our messages. The algorithm used for the encryption is well known and publicly available. The only thing that is not public, and known only to the owner of the key pair, is the private key. Let’s see what Alice and Bob have to do to keep their communication private:
- Alice and Bob create their own private and public keys.
- Bob sends Alice his public key.
- Alice encrypts her message using Bob’s public key and sends it to Bob.
- Bob decrypts Alice’s message using his private key.
So, first of all, both Alice and Bob need a key pair.
1. Alice and Bob create their own private and public keys.
Alice doesn’t yet have a key pair, so she needs to create it. As an example she may use the RSA cryptosystem. Her private key will be stored in a file, e.g. alice_rsa. The size of the private key will be 2048 bit. Let’s move into Alice’s folder and execute the command
The private key in alice_rsa is saved in the Privacy-Enhanced Mail (PEM) format and looks like the following
The public key can be created from the private one, and saved in e.g. alice_rsa.pub, with the command
Alice’s public key will look like
Now we have Alice’s key pair in her folder. Let’s do the same for Bob. We move into Bob’s folder and create his key pair, stored in e.g. bob_rsa and bob_rsa.pub, as we did for Alice. After Alice and Bob have their key pair we are done with the 1st step of the procedure.
2. Bob sends Alice his public key.
Let’s move to the 2nd step: Bob must send his public key to Alice so she will be able to send him her message encrypted. We simulate this by copying Bob’s public key file, bob_rsa.pub, in Alice’s folder. From Bob’s folder
As soon as a copy of Bob’s public key is in Alice’s folder, the 2nd step of the procedure is complete and we can move to the 3rd: Alice will encrypt her message using Bob’s public key and will send it to Bob.
3. Alice encrypts her message using Bob’s public key and sends it to Bob.
Bob’s public key can now be used by Alice with OpenSSL to encrypt her message stored in a file, e.g. data.txt, containing sensitive information
In our example the size of the file is only 65 bytes. Alice encrypts the file using OpenSSL and Bob’s public key that she has received from him, e.g. by email, which we have simulated by simply copying the file from Bob’s folder to Alice’s. From Alice’s folder
Now Alice can send her encrypted message, data.txt.enc. The encrypted message is a binary file whose content doesn’t make any sense and can be decrypted only by Bob using his private key. The RSA encryption algorithm is randomized, and executing again the same command will result in a different ciphertext but when they are decrypted the output will be exactly the same message. If Alice were a real person she would be able to send it to Bob by email. We will once again simulate the sending of the encrypted message by copying it in Bob’s folder. From Alice’s folder
As soon as the encrypted message has been received by Bob, in our simulation when it has been copied in Bob’s folder, the 3rd step is complete. We can move to the 4th and last step.
4. Bob decrypts Alice’s message using his private key.
From Bob’s folder
Bob can open the file data.txt containing the original message in plain text that Alice wanted to send to him. We can easily verify that Bob’s decrypted message and Alice’s original message are exactly the same. From the root folder
The procedure that Alice chose to send her message to Bob, without risking anyone else reading it, is complete. In this example Alice did not use her private or public key. In case Bob wanted to send her feedback, he could use Alice’s public key to encrypt his message, so that only she would be able to decrypt it, using her private key. Both Alice and Bob must keep their private keys in a very safe place. The private key we have just created for them can be used by anyone who has access to it. One way to protect the private key is to encrypt it using an algorithm, e.g. AES-256, with a password so that only the person who knows the password can decrypt the private key and use it. For example, Alice could have made her private key safer by creating it with the following command
where wT16pB9y would be Alice’s password. Currently OpenSSL supports only alphanumeric characters for passwords.
Hybrid cryptosystem
Alice has successfully solved Bob’s problem. She has been able to send him his bank account details in a secure way. Now she wants to send Bob a file, e.g. a jpeg picture that she doesn’t want anyone else to see, and whose size is some KB. Let’s try to encrypt the image on behalf of Alice
This time OpenSSL will raise an error
The problem is that the RSA algorithm can be used only to encrypt messages whose size is smaller than the size of the private key that corresponds to the public key used for the encryption. Since Bob’s private key is 2048 bit long, or 256 bytes, his public key cannot be used to encrypt messages that are bigger than 256 bytes. The best option to solve this issue is to use a symmetric algorithm. A symmetric algorithm can use only one key, called a symmetric key, for encryption and decryption. Once a message has been encrypted with the symmetric key, it can be sent, with the symmetric key encrypted using the public key of the recipient, so he or she will be able to decrypt the message. One more reason to use a symmetric algorithm to encrypt a message is that they are three orders of magnitude faster than asymmetric ones. The algorithms used in the symmetric key encryption are different from those used in public-key encryption. The symmetric key algorithms use a key that is based on a pseudo-random value taken from a huge range of possible values. The key is shared only by the two communicating parties. The strength of the algorithm rests in the difficulty of finding the key within a huge key space. The way in which the symmetric key must be created depends on the cryptographic algorithm, also called cipher. One of the most robust ciphers is AES-256, that we have already used to encrypt Alice’s private key. OpenSSL creates the symmetric key, to be used with the AES-256 cipher, from a secret string, in short secret, that can be created and stored in a file. Alice defines a new protocol in which she will create the secret that she will use to encrypt her picture and that she will share with Bob. The system that she is going to use is called a hybrid cryptosystem because it uses public-key and symmetric cryptography together.
- Alice creates the secret.
- Alice encrypts the data using the AES-256 cipher and the secret.
- Alice encrypts the secret using Bob’s public key.
- Alice sends the encrypted data and the encrypted secret to Bob.
- Bob decrypts the secret using his private key.
- Bob decrypts the data using the AES-256 cipher and the secret.
Let’s implement these steps on behalf of Alice and Bob using OpenSSL.
1. Alice creates the secret.
First, Alice creates a secret, e.g. a sequence of 32 random bytes, using a pseudo-random bytes generator provided by OpenSSL. The longer is the sequence of random bytes the more difficult is for an eavesdropper to figure it out. It should never be so short that it could be found simply by brute force. For example a sequence of only 2 bytes (16 bits) can be found in just 2^16 = 65536 attempts. From Alice’s folder
2. Alice encrypts the data using the AES-256 cipher and the secret.
The command will encrypt the image
and will print the key created by OpenSSL from the secret
The AES-256 cipher is a block cipher that encrypts a fixed block of 128 bits of the message at a time with a 256 bits long key. The mode of operation used in the example is Cipher Block Chaining (CBC). The CBC operation mode is a scheme that allows the use of a block cipher to encode strings longer than the block size (16 bytes). The key, created by OpenSSL from the secret, is shown as a result of the encryption with other random parameters, salt and iv. The salt parameter is created by OpenSSL to be added as a suffix to the secret to mitigate directory attacks, when the secret has not been chosen wisely and it could easily be found simply by brute force attack. The iv parameter is the initialization vector used as the content of the first block. It ensures that no information can be extracted by an attacker from messages that may start with some common header. All the parameters, key, salt and iv, are recreated newly every time the command is executed, even if the file and the secret are the same.
3. Alice encrypts the secret using Bob’s public key.
4. Alice sends the encrypted data and the encrypted secret to Bob.
We can simulate the sending of the encrypted data and secret by copying them from Alice’s folder to Bob’s.
5. Bob decrypts the secret using his private key.
From Bob’s folder
6. Bob decrypts the data using the AES-256 cipher and the secret.
You can verify that the image in Bob’s folder is exactly the same as the image in Alice’s folder by looking at them or by using the following command from the root folder
It can also be verified that the key, created by OpenSSL from the secret for the decryption, is the same as the key created for the encryption. In case a non valid secret is used, the decryption will fail.
This 2nd protocol enables Alice and Bob to send each other files of any size allowed by the channel, encrypted. Unfortunately it is subject to the man-in-the-middle attack. This is because a message sent over the Internet goes through different routers where a 3rd party, called Mallory in cryptography, can impersonate both Alice and Bob by sending them his public key instead of Bob’s and Alice’s respectively. Alice and Bob can solve this issue by publishing their public keys on a trusted website or by using certificates where their public keys are signed by a trusted 3rd party. The creation of certificates, even if possible with OpenSSL, requires the definition of a certificate authority and is beyond the scope of this post.
Message integrity
Encrypting a message is almost never enough. What we usually need is also to be sure that nobody can tamper our communications by intercepting our messages, dropping some of them or modifying them even when they are encrypted, without being detected. We need message integrity. The way in which the integrity of a message can be provided is by computing a value using a hash function that takes a message as input and outputs a short string, the digest or fingerprint. If the input message is modified, a hash function will output a complete different value in an unpredictable way. Even if the message space can be much larger than the digest space, the chances of a collision, in which a hash function outputs the same digest from two different messages, are practically negligible. Hash functions are used to support integrity in protocols such as TLS, SSH, IPsec. OpenSSL provides many hash functions such as SHA256, a standard function that hashes long messages into 256-bit digests. In some common use cases, encryption is not needed at all while integrity can be a strong security requirement. As an example, the integrity of a software package downloaded from the Internet can be checked comparing its fingerprint, provided on the website, with the fingerprint computed locally after the package has been downloaded and before it is executed. If the two fingerprints are the same we are assured that the software package has not been modified by an attacker during the transfer. Hash functions are also used to compute the fingerprint of public keys. For example, Bob can provide his public key’s fingerprint to Alice so that it will be easier for her to verify whether her copy of Bob’s public key is the right one. From Bob’s folder
The fingerprint can be verified more easily than the full public key
In the following section we will address another important use case in which a hash function is used to digitally sign a file.
Digital signature
Alice is a journalist and wants to send Bob an article, e.g. a pdf file, being sure than no one else can claim to be the author. Once again she comes up with a protocol that can solve her problem.
- Alice creates a one-way hash of a document, Alice’s digest.
- Alice encrypts the digest with her private key, thereby signing the document.
- Alice sends the document, her public key and the signed digest to Bob.
- Bob decrypts Alice’s digest with her public key.
- Bob creates a one-way hash of the document that Alice has sent, Bob’s digest.
- Bob compares his digest with Alice’s to find out if they match
If the signed hash matches the hash he generated, the signature is valid. Let’s say Alice wants to send a file, e.g. article.pdf, with her digital signature to Bob.
1. Alice creates a one-way hash of a document, Alice’s digest.
Alice can sign the message choosing one hash function, e.g. SHA-256 . She can create the one-way hash of the message, also known as the digest, with
The content of the digest will be similar to
2. Alice encrypts the digest with her private key, thereby signing the document.
The next step is to encrypt the digest of the hash function, data.dgst, with her private key
3. Alice sends the document and the signed digest to Bob.
Alice sends the document, article.pdf, with her signature, alice.sign and her public key, to Bob. Bob can verify Alice’s signature of the document using her public key. Again we will simulate the sending of the files by copying them from Alice’s folder to Bob’s.
4. Bob decrypts Alice’s digest with her public key.
From Bob’s folder
The output, alice.dgst, is Alice’s digest of the document, extracted from her signature of the document.
5. Bob creates a one-way hash of the document that Alice has sent, Bob’s digest.
Bob can again compute the hash of the document data.txt using the same hash function SHA-256 that has been used by Alice
6. Bob compares his digest with Alice’s to find out if they match
The result of the comparison is
proving that Alice has signed the document. The signature can not be repudiated and the document can not be changed without compromising the validity of the signature.
Conclusion
We have seen how to use OpenSSL to add some level of security to our communications with the public-key cryptography and the symmetric encryption. As previously cautioned, the protocols we have shown are not completely secure, but they will certainly limit the number of eavesdroppers capable of figuring out the content of your digital assets sent over the Internet. You can get more information on cryptography, algorithms and how protocols can be improved to enhance the security of the communications, by consulting the resources in the references.
Acknowledgments
Thanks to Eurydice Prentoulis for proof-reading the text.
References
- Bruce Schneier - Applied Cryptography, 2nd Edition
- William Stein - Elementary Number Theory: Primes, Congruences, and Secrets
- Dan Boneh, Victor Shoup - A Graduate Course in Applied Cryptography
- Alfred J. Menezes, Paul C. van Oorschot, Scott A. Vanstone - Handbook of Applied Cryptography
- Dan Boneh - Cryptography I (Coursera)
- Christoph Paar, Jan Pelzl - Understanding Cryptography