Abstract: The amount of data available is continuing to explode and so is the amount of confidential data while most data remain unused. Although data can be encrypted to provide privacy, modern encryption schemes require that encrypted data be decrypted in order to work with it. Therefore, data becomes vulnerable during the computation process. Companies either have to trust 3rd party cloud services or keep the data locked in local data storage rooms without any use. Fully Homomorphic Encryption offers a trustless solution to securely outsource data.
Problems are inevitable and it’s impossible to predict the problems of the future. Nobody could have imagined 100 years ago that climate change and data privacy would be some of the pressing challenges that humanity would face. But on the bright side, problems are soluble through technological solutions and philosophical discussions.
Historical trade via ports and modern trade via digital ports:
Trade has played a very important role in shaping human civilization. Humans traded goods and services since prehistoric times. Specialization and division of labour combined with trade brought prosperity and economic progress. Most goods from across the oceans entered through ports but the biggest threats occurred in the seas. Goods and lives were lost due to shipwreck or at the hands of the pirates.
Today, a vast majority of information and goods are traded digitally (via digital ports). These include confidential business secrets, healthcare, financial and even personal data. Despite companies and organizations spending a tonne of money on keeping the data secure, data breaches have been on the rise. IBM X-Force Threat Intelligence Index found that in 2019, some 8.5 billion records were breached, giving attackers access to stolen credentials.
What’s worse? While today’s encryption systems offer data security at rest and during transit, data security during compute is a huge challenge. Advancements in machine learning algorithms offer an opportunity for widespread adoption across industries. Nevertheless, areas dealing with sensitive and private data, like healthcare or finance, have lagged behind due to regulatory constraints and a lack of strong security solutions to protect users’ data.
Data computation requires heavy compute infrastructure that is usually outsourced to third-party cloud providers. Since data security during compute is a bottleneck, most data remain unused and piled in data server rooms of companies. This hampers the prospects of generating valuable information that could be used in drug discovery or financial security. Fully Homomorphic Encryption, claimed as the holy grail of cryptography is designed to close this gap. It allows data to remain encrypted even during computations, so that they are performed on encrypted data, without the service behind it needing to ‘see’ that data.
History of Encryption:
The earliest known history of encryption was around 600 BC in Egypt where they used a device called Scytale. Made of leather strap wrapped around a rod, the letters on the strap could be deciphered only when the correct size of the rod was used. Else, the content was meaningless. The Greeks introduced their own methods that encoded letters in the form of symbols The Romans then widely used cyclic displacement of letters in words to convert plaintexts to cypher texts during the era of Julius Cesar. The Arabs during the middle ages elucidated cryptanalysis which came into wider practice under European Rule.
Cryptanalysis played a pivotal role in World War I. The British intelligence intercepted and decoded a telegram sent by Germany which tried to broker an alliance with Mexico lest the United States enter the war. This provoked the US into the war which significantly influenced the outcome of the war.
By World War II, mechanical/electromechanical cypher devices were prevalent. The Enigma Machine was extensively deployed by Germany during the war. The machine's electromechanical rotor would scramble the letters of the alphabet converting the plaintext to ciphertext and vice versa.
Enigma Machine used in WWII
The Enigma transforms worked on the basis of permutations which depended on machine settings that were frequently changed. The number of possible settings went from ten quadrillions at one point to 159 quintillions. That’s 159 followed by eighteen zeroes. The cooperation of Allied Forces(Polish, British and French) cracked the Enigma key which substantially altered the outcome of the war.
In 1945, Claude Shanon published 'A mathematical theory of cryptography' which began the era of modern cryptography. Cryptography which was only used by governments for security and espionage now became available for commercial applications. encryption schemes for most parts of history followed ad-hoc designs i.e security was based on intuition and heuristics. But this started to change with broader applications of mathematics and computation via modern devices. In the 1970s, IBM's crypto group designed a block cypher to protect its customers' data. Advances in cryptographic research ever since paved the way for new algorithms and today communications, the internet and blockchain are all powered by the underlying crypto algorithms.
But what is encryption? It is the process of encoding information that converts the original representation of information(plaintext) into an alternative form(ciphertext). And decrypting is the conversion from ciphertext to plaintext. This process involves ‘keys’. Ideally, only authorized parties i.e with access to keys, can decipher a ciphertext back to plaintext and access the original information.
There are two kinds of keys:
Symmetric Key Cryptography
- Symmetric Key schemes where the encryption and decryption are performed using the same keys. Eg: Enigma used one such scheme.
- Asymmetric Key schemes or public-key schemes that involve two different keys; one for encryption which is published for anyone to encrypt messages and one for decryption which is only accessible by the receiving party that enables messages to be read. Eg: RSA, Diffie-Hellman
Asymmetric Key Cryptography
Modern Encryption relies on extremely hard to solve mathematical problems that have remained resilient to solutions for decades. Encryption schemes are built in a way that cracking the security relies on solving one of these hard problems. Some of the most popular encryption systems in use today in high-level security applications such as SSH, TLS, HTTPS, VPNs, bitcoin, etc. are:
- RSA(Rivest–Shamir–Adleman) is the most widely used for data transmission and one of the oldest cryptographic algorithms. The security relies on a hard mathematical problem: integer factorization of the product of two large prime numbers. Since it is relatively slow, it is used to transmit shared keys for symmetric key cryptography instead of directly encrypting user data.
El Gamal is an asymmetric key encryption scheme based on Diffie-Hellman key exchange, the earliest publicly known scheme that proposed the idea of a private key and a corresponding public key. The security of the algorithm relies on another hard to solve mathematical problem: discrete logarithm where
log(base b)a is a number x such that b^x = a
- Elliptic Curve Cryptography, based on mathematical elliptic curves, creates smaller, faster, and more efficient cryptographic keys. Bitcoin uses ECC. An elliptic curve is the set of points that satisfy a specific mathematical equation such as y^2 = x^3 + ax + b. The security relies on the hardness of solving the above equation.
Unfortunately, public-key algorithms such as RSA, Diffie-Hellman are considered to be quantum breakable i.e quantum computing can break the hard mathematical problems that the security of these algorithms rely on. Although current quantum algorithms lack the processing power to break some of the modern encryption, given the advances in quantum computing it seems imperative to switch to quantum secure solutions at the earliest. Eg: Quantum Computing is known to be capable of solving factoring problems using a technique called Shor’s Algorithm.
Privacy Enhancing Technologies:
Fully Homomorphic Encryption(FHE) is claimed to offer quantum-secure algorithms. But before diving into why FHE is considered quantum secure, let’s have a brief dive into the competing technologies. FHE is part of a broader consortium of technologies called Privacy Enhancing Technologies that include Differential Privacy, Trusted Execution Environment and Secret Computing. Secret Computing is an umbrella term for two of the most promising technologies - Secure Multi-Party Computation and Fully homomorphic Encryption.
Privacy Enhancing Technologies. Source: Inpher
Differential Privacy allows the collection and sharing of aggregate information of users while maintaining the privacy of individual users. It allows access to a large number of sensitive data without privacy breach. The algorithms could in principle either:
- essentially make the same inference about any individual user's information, irrespective of whether an individual’s information is included in the input to the analysis.
- add some random noise that is tolerable to existing data resulting in an output that is very close to the actual output without noise.
While DP can offer privacy, it fails in precision.
Trusted Execution Environment(TEE):
A Trusted Execution Environment (TEE) is a secure area/hardware fence that guarantees privacy inside the main processor. Eg: Intel's SGX, ARM's TrustZone. The enclave technology allows programs to be executed in isolation from the rest of the programs. In such a enclave, data incoming and outgoing are encrypted, while the computation in decrypted form only happens within the enclave. However, TEE is prone to side-channel attacks.
Multi-Party Computation is a cryptographic protocol that distributes computation across multiple parties preventing individual parties to see the other parties’ data. Secure MPC is a great solution to privacy challenges but only when
- majority of parties have no intention to corrupt the data
- all parties have sufficient compute resources
Additionally, the biggest challenge with MPC is the communication overhead due to the constant communication between different parties to compute on encrypted data.
The security of the protocol relies on random number generation to distribute encrypted data for computation and the random numbers can’t be cracked through brute force.
Fully Homomorphic Encryption(FHE):
Fully Homomorphic Encryption. Source: Openmined
FHE encryption schemes by definition have an amazing property, where the encryption of values, say of number 1 and number 2, can be simply added together as ciphertexts to obtain the encrypted ciphertext for 1+2=3, without ever revealing the original plaintexts. In other words, FHE schemes allow analytical functions to be executed directly on encrypted data while yielding the same encrypted results as if the functions were executed on original data. This property is called homomorphism.
The protocol has been known since the 1970s but thanks to recent breakthroughs and demand for better security systems, it is starting to enter the mainstream. Since plaintext is converted to ciphertext for HE computation which is much larger in size, the computation costs are extremely high making the process expensive and time-consuming. Operations on homomorphic encryptions are largely built on additive and multiplicative functions. There are 3 different categories of homomorphic encryption:
- Partial Homomorphic Encryption(PHE): The scheme is either additively homomorphic or multiplicatively homomorphic.
- Somewhat Homomorphic Encryption(SWHE): The HE scheme is capable of doing both addition and multiplication but its capability or number of operations is heavily limited.
- Fully Homomorphic Encryption(FHE): The scheme allows any arbitrary function to be performed on encrypted data any number of times.
While HE is really promising, it is highly compute-intensive, not real-time and has power performance owing to inherent noise in the encryption scheme. Schemes for both PHE and SWHE have been around but a big breakthrough in FHE came only a little over a decade ago. The big breakthrough in FHE came in 2009 -bootstrapping - an ingenious method that reduces the noise in the scheme. To understand the method, we need to dive a bit into the underlying security of the FHE scheme and where the noise comes from.
Among the candidates for quantum-secure algorithms, lattice-based cryptography(LBC) is widely considered to be the best bet. LBC is the foundation for FHE. LBC is the generic term for constructions of cryptographic primitives that involve lattices, either in the construction itself or in the security proof.
What are lattices? A lattice can basically be thought of as any regularly spaced grid of points stretching out to infinity i.e infinite set(discrete) of points in n-dimensional Euclidean space with a periodic structure. Although lattices, in theory, can tend towards infinity, computers are limited with finite memory. To represent different points on a lattice, a basis is used. A basis is a set of vectors that can be used to reproduce any point in the grid that forms the lattice. There are decade-old problems based on LBC that are extremely difficult to solve.
A couple of popular hard problems based on lattice-based cryptography are as follows:
- The shortest vector problem (SVP): given a basis B, find the shortest non-zero vector in the lattice generated by B.
- The closest vector problem (CVP): given a basis B and a target vector t, find a lattice vector generated by B that is closest to t.
These may sound very simple to solve in 2-D. But what if the lattice were 3D, 100D or 50,000D? In other words, a lattice with 50,000 coordinates, not 2. Finding the shortest or closest vector across 50,000 generated coordinates, turns out, is extremely hard that no known quantum algorithm on a quantum computer can solve it, let alone a conventional one. It is conjectured that there is no probabilistic polynomial-time algorithm that can solve the above lattice-based problems, even approximately, by breaking it down into polynomial factors.
Learning with Error:
One of the most widely used lattice-based algorithms in cryptography is learning with errors(LWE). The security of LWE relies upon (provably reducible) solving the Shortest-Vector-Problem(SVP) in a lattice. Introduced by Oded Regev in 2005, LWE involves the difficulty of finding the values which solve B = A*s +e where A and B are known, and s is the secret message. Introducing a random error e (called noise) makes the equation hard to solve.
Noise in FHE:
In FHE schemes, the above-mentioned noise is introduced into the ciphertext while encrypting. The noise is usually a small term (integer or polynomial) added into the ciphertext while encrypting. While the noise value is generally random, the decryption of a function will not work if the noise is greater than a threshold value. However, noise increases with the number of homomorphic operations. There are two broad approaches of FHE to work with noise:
- Levelled approach: When a homomorphic circuit(set of operations) is small and known, the noise is tolerable which enables one to perform homomorphic operations as expected.
- Bootstrapped approach: When a homomorphic circuit is large and unknown, an additional operation is performed whenever the accumulated noise reaches the threshold value to reset the noise.
The bootstrapping approach (remember the big breakthrough?) allows practical realisation of fully homomorphic encryption. It is a mechanism where the ciphertext is decrypted into plaintext inside the cover of a separate encryption system and then re-encrypted back into ciphertext, resetting the noise to a small value.
Since the bootstrapping requires the decryption key to be made available, this is where Gentry devised an innovative idea:
The public key is appended with public-key encryption of the private(secret) key. Using homomorphic encryption, the encrypted private key is used to perform the decryption so at no point is the private key exposed.
However, this increases the order of computation complexity due to the requirement that the noise is reduced at frequent interventions. To tackle this and improve efficiency, there need to be improvements on both the software and hardware ends of FHE.
Inpher is a US startup with 14M $ in funding that is developing both MPC and FHE solutions. They offer MPC solution through the XOR engine that they have developed and contribute to the open source TFHE library.
Duality is an Isreali-USA startup that offers secure information sharing solution. With large amounts of funding secured and a 14.5M $ grant from DARPA, the company seems to be leading the FHE space. The CTO of Duality is also the founder of the Palisade Library while their chief cryptographer is the developer of a leading FHE scheme called BGV.
Zama is a French company developing an open source framework (the library Concrete on top of TFHE) using homomorphic encryption to enable secure AI applications in the cloud. They have developed a programmable bootstrapping approach to accelerate FHE operations. Their vision is to build the httpz: the protocol for secure computing like https is the protocol for secure data transmission.
Cosmian is another French company with 1.4M $ in funding that aims to enable Confidential Data Processing to a rapidly-rising data economy without compromising on privacy. They offer different privacy enhancing technology solutions depending on the use-cases.
Apheris AI is a German company with 2.5M $ in funding that offers secret computing and differential privacy solutions for privacy preserving applications.
While software can help improve the performance of FHE, the real gamechanger will lie in FHE hardware accelerators that can offer 1000X improvements or real-time capabilities. This is why Darpa has launched the DPRIVE program to develop an FHE accelerator in partnership with Intel and Duality.
Cornami is the first FHE hardware startup that is introducing a new paradigm in computer architecture design. With 33M $ in funding, Cornami seems to be the only company venturing into the hardware landscape. They are developing a reconfigurable architecture built on modular cores.
Although no other startups are in the hardware space, there is more research being done such as processing-in-memory architecture that may offer new hardware designs to accelerate lattice-based applications such as Fully Homomorphic Encryption and Hyperdimensional Computing.
There is still a long way to go before Fully Homomorphic encryption underpins the security layer but recent developments in this space are surely exciting.
3Blue1Brown is one of the best mathematics teachers out there. Using amazing visualizations, he boils down concepts that can be easily understood. His work on Linear algebra is unparalleled.
Openmined, an open-source community whose goal is to make the world more privacy-preserving by lowering the barrier-to-entry to private AI technologies. The workshops and the community they have built are top-botch resources to learn about privacy-enhancing technologies.
Follow on twitter for more updates.