The weak security of CipherCloud explained by experts

CipherCloud claims to secure data on cloud services such as Salesforce and Gmail. They also claim they can help companies fulfill information protection policies.

But encryption is hard, and any minor mistake may open your data up to severe information leaks. The amazing experts at the Stack Exchange Cryptography forum took a serious look at the details, which you will see below. The post was removed following a copyright complaint from CipherCloud, who appear to be afraid of having their poor encryption practices exposed.

In summary: CipherCloud's encryption is not very secure. It is probably easy to stage a frequency-analysis attack, and some fields are not encrypted at all!

The content below is the text of the post, with pictures removed and headers written by me. We can publish it here thanks to the free copyright license the forum uses for all material. This means you are also free to use it in any way you wish (see license at the bottom).

This page last updated April 20, 2013


Question: How is CipherCloud doing homomorphic encryption?

Much of the literature and the latest papers suggest that homomorphic encryption is not yet practical.

How is CipherCloud able to achieve this? Does anyone have an idea? Their website does not provide much information about how their system works.

Asked Aug 25 2012 by sashank (edited by D.W.)

They Don't: Accepted Answer with 6 Upvotes

My impression from their whitepaper is that they don't use homomorphic encryption. They seem to encrypt fields individually with symmetric encryption (the whitepaper mentions AES). The scheme is deterministic and format-preserving.

When a field consists of several parts (say, first name and last name), they split it into those parts and encrypt them individually.

Some important fields, such as "annual revenue", aren't encrypted at all, probably because they need to use them in calculations.

An inherent weakness in such a scheme is that it doesn't offer semantic security. If the same string gets encrypted in different places, an attacker can see that the same string was used in both places. If he can figure out one of them, he automatically knows the other too. He could also employ some kind of frequency analysis on tokens. For example, the most common first name might be "John", allowing him to find all "John"s.
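A minimal sketch of this weakness, under stated assumptions: the `det_token` function below is a hypothetical stand-in for any deterministic scheme (it uses HMAC-SHA256 from the Python standard library, not CipherCloud's actual algorithm, which is not documented), and the key and names are made up for illustration.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"demo-key"  # hypothetical key, for illustration only


def det_token(value: str) -> str:
    """Stand-in for deterministic encryption: same input, same output."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


first_names = ["John", "Mary", "John", "Alice", "John", "Mary"]
tokens = [det_token(name) for name in first_names]

# The attacker sees only tokens, yet repeated values are plainly visible.
most_common_token, count = Counter(tokens).most_common(1)[0]

# Guess: the most frequent token is the most common first name.
guessed = {most_common_token: "John"}
recovered = [guessed.get(t, "?") for t in tokens]
print(recovered)  # ['John', '?', 'John', '?', 'John', '?']
```

The attacker never touches the key. Equality of tokens alone is enough to locate every "John" once a single one has been guessed.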

An interesting question is how they achieved their format-preservation. There are relatively secure methods, but it's easy to get wrong. I couldn't find any specification of what they're doing, so I wouldn't trust them on getting it right.

Answered Aug 25 2012 by CodesInChaos

Bad Substring Encryption: Answer with 8 Upvotes

I don't think they have implemented homomorphic encryption at all. They have just implemented regular AES encryption (they have a FIPS 197 certificate for their AES), but in what appears to be a very insecure way. Why would they choose to do that? Because they had no choice. Here's what I mean:

But SaaS encryption implementation has a much larger problem. Searching for exact matches is an easy problem to solve - just send the encrypted value for the match to the SaaS application and you're all set. But that is not the only kind of search you need to support. What happens if a user does a search for all names that begin with John? (e.g. "John*") There are (at least) two options: The first is to store a mapping to every instance of every string that begins with John* in the encryption appliance, then send all the instances of the encrypted text for every mapped string that matches John* to the SaaS application so it can perform the search. That becomes problematic if there are a lot of strings that begin with "John*" - you have to send all those matching strings to the SaaS application in order to make the search work. But imagine a search for John* + Jon* + Jame* + Smith*. You could run out of query parameters pretty easily. It's even worse when running reports.
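The first option can be sketched as follows. This is an illustration under assumptions, not a description of any real product: `TokenIndex`, `store`, and `expand_prefix` are hypothetical names, and HMAC-SHA256 again stands in for whatever deterministic encryption the appliance would use.

```python
import hashlib
import hmac
from typing import Dict, List

KEY = b"demo-key"  # hypothetical key, for illustration only


def det_token(value: str) -> str:
    """Stand-in for the appliance's deterministic encryption."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


class TokenIndex:
    """Hypothetical appliance-side mapping from plaintext to stored token."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}

    def store(self, value: str) -> str:
        token = det_token(value)
        self._tokens[value] = token
        return token

    def expand_prefix(self, prefix: str) -> List[str]:
        """Return the token of every stored value that starts with prefix."""
        return sorted(t for v, t in self._tokens.items() if v.startswith(prefix))


index = TokenIndex()
for name in ["John", "Johnathan", "Johnson", "Jones", "Smith"]:
    index.store(name)

# A search for "John*" becomes one exact-match term per matching value.
query_terms = index.expand_prefix("John")
print(len(query_terms))  # 3
```

With five stored names the expansion is harmless, but the number of query terms grows with the number of distinct matching values, which is the scaling problem described above.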

You also need a mapping infrastructure on the encryption appliance to make this work (a database would be the enterprise-grade way to do this). Judging from their publicly available documents, CipherCloud does not appear to require a database, which makes this approach unlikely in their case.

But they may have implemented a worse one, because the second way to address searches for "John*" is what they appear to have actually done. This method preserves string order within the ciphertext, such that John becomes XXyzzz, Johnathan is XXyzzzAAddBBaaBB, and Johnson is XXyzzzDdffsss (this is not their algorithm, just a representation of the net effect). That way, a search for John* means I only have to send "XXyzzz*" to the SaaS application in order to properly fulfill the search. But this approach greatly weakens the security of the data. This is because once I deduce that John is XXyzzz, anytime I see a string beginning with those characters I know it is some form of the name "John*" and I can really start attacking the data. CipherCloud claims to use AES, which should not have this problem, so how can they preserve this string order using AES? Well, the first thing to do is not use padding, or to use the same padding everywhere. Yikes! The second thing is to use the same IV (initialization vector, aka nonce) for all strings. Yikes again! Without padding and IV diversity, AES becomes a glorified version of XOR. Who would bet the security of their data on that? (This probably explains why they do not have, and are not even in process to obtain, FIPS 140-2 validation, which pertains to the proper implementation of an approved algorithm.)

More recent demonstrations of the CipherCloud solution appear to use multi-byte characters in the ciphertext, which makes the patterns harder to see with the naked eye (ok, to eyes used to parsing western character sets) but certainly no harder for a computer to crack.
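To make the "glorified XOR" point concrete, here is a toy sketch. It is an assumption-laden illustration, not CipherCloud's algorithm: the `keystream` function derives a fixed keystream from SHA-256 as a stand-in for a stream-like cipher mode run with the same key and IV for every string, and all names in it are hypothetical.

```python
import hashlib

KEY = b"demo-key"  # hypothetical key, for illustration only


def keystream(length: int) -> bytes:
    """Fixed keystream: stand-in for a stream-like mode with a reused IV."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(KEY + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(plaintext: str) -> bytes:
    """XOR the plaintext with the same keystream every time, no padding."""
    data = plaintext.encode("utf-8")
    return bytes(p ^ k for p, k in zip(data, keystream(len(data))))


names = ["John", "Johnathan", "Johnson", "Mary"]
ciphertexts = {name: encrypt(name) for name in names}

# Prefix search works directly on ciphertext ...
john_ct = ciphertexts["John"]
matches = [n for n, ct in ciphertexts.items() if ct.startswith(john_ct)]
print(matches)  # ['John', 'Johnathan', 'Johnson']

# ... and XORing two ciphertexts cancels the keystream entirely,
# leaving the XOR of the two plaintexts (zero bytes where they agree).
leak = bytes(a ^ b for a, b in zip(ciphertexts["Johnson"], ciphertexts["Johnathan"]))
print(leak[:4])  # b'\x00\x00\x00\x00'
```

The same property that makes `John*` searchable is what hands the attacker every string sharing that prefix.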

I'm not sure if they are still there, but there used to be some good videos of their solutions on Vimeo and Youtube, so you can look at those and see for yourself what I'm talking about. I'm sure you can also download whitepapers from their site. I'll leave it to someone else to really dig into the available data and figure out exactly how they are doing what, but it's worth mentioning to any would-be investigators that CipherCloud also appears to be preserving certain punctuation in clear text. (I saw an instance of " I'm " encrypted in a way that preserved the apostrophe!)

As always, but doubly so when it comes to security products, Caveat Emptor! If you are looking at CipherCloud, or any SaaS encryption solution, you'd do well to ask a lot of specific questions and make sure the answers are clear and unambiguous.

Answered Aug 26 2012 by AdrenaLion

It Gets Worse: Answer with 5 Upvotes

I haven't posted in a while, so long in fact that the email tied to my Stack Exchange account is no more, I forgot my StackEx password, and I had to create a new account. (I'll leave it to the reader to decide if this is the real me.)

But I did want to follow up here, because there were some unanswered questions from my last post and the follow-up posts from others. Since I wrote the above post, I had been wondering myself how this searchable encryption could actually work without being incredibly weak from a security standpoint. As it happened, I was at the RSA Security 2013 conference this week where Ciphercloud was exhibiting. In between sessions I had time to visit their booth to learn more.

They do claim to do "military grade encryption", and it does appear that they can use third-party FIPS 140-2 encryption modules. However, in the demonstration I was given, where they were encrypting data in a Salesforce setup, the encryption was definitely NOT using FIPS 140-2 or anything close. In fact, I could see on their large demo screen the exact issues I had expected to see with their encryption algorithm, plus some things that just made me shake my head.

For example, it turns out that they are indeed preserving clear-text patterns in their ciphertext. Searching for "John" is easy if it is encrypted the same way (e.g. "XXyyZ123") everywhere. But they also appear to individually encrypt each word within a string, such as you would see in an Account Name field. I know this because they showed their demo of a side-by-side comparison of clear text and encrypted Accounts. There were two Accounts with "United Oil & Gas" in the name. Both the encrypted names were the same. That means they are using the same key, nonce (IV), and padding for the Account Names. Since the whole point of encryption is to promote randomness in the ciphertext, this is a pretty weak, non-random implementation. Would you entrust your data to what amounts to XOR? I sure wouldn't.

But here is the part that had me shaking my head, mainly because it takes almost zero crypto cracking skills to determine the true value of the data: They appear to have issues, for reasons I cannot completely fathom, encrypting the punctuation characters in the string. In the example they showed me, "United Oil & Gas" was encrypted as something like "fgt^e3s3 SD72d & 3edf" (Note: they also have prefixes and suffixes that wrap their encrypted strings, but I have not included them because they appeared to be pretty consistent and may be there to identify the strings as encrypted, but would do nothing to protect the data.)

So, if you are looking for a customer named "United Oil & Gas", you have a pretty simple way to narrow down which records that could be - just search for the "&" character, and narrow those results to the ones where it appears between the second and third words. Then, in that list, look at the word lengths in the name, and the strings with the short second and third strings are your best bet. This is in part because in "United Casualty & Life", the word "Casualty" would have longer ciphertext than the word "Oil". (Remember they are using the same padding to make this all searchable.) The bottom line is that encryption hasn't really protected the data here. Cost with no benefit.
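The narrowing-down procedure above can be sketched in a few lines. Everything here is a toy under assumptions: `toy_encrypt_word` is a hypothetical length-preserving stand-in (the real ciphertext lengths in the demo were not the same as the plaintext lengths, only correlated with them), and the account names other than the two quoted above are made up.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical key, for illustration only


def toy_encrypt_word(word: str) -> str:
    """Deterministic stand-in token with the same length as the word.

    Good enough for words of up to 64 characters.
    """
    digest = hmac.new(KEY, word.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[: len(word)]


def toy_encrypt_name(name: str) -> str:
    """Encrypt each word separately; leave punctuation-only parts in the clear."""
    return " ".join(
        part if not part.isalnum() else toy_encrypt_word(part)
        for part in name.split(" ")
    )


def shape(ciphertext: str) -> tuple:
    """What any observer sees: word lengths plus clear-text punctuation."""
    return tuple(p if not p.isalnum() else len(p) for p in ciphertext.split(" "))


accounts = ["United Oil & Gas", "United Casualty & Life", "Acme Corp", "Global Media"]
stored = [toy_encrypt_name(a) for a in accounts]

# The attacker knows only the plaintext they are hunting for.
target_shape = (6, 3, "&", 3)  # shape of "United Oil & Gas"
candidates = [c for c in stored if shape(c) == target_shape]
print(len(candidates))  # 1
```

No key material is involved in the search: the position of the "&" and the relative word lengths are enough to single out the record.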

But it gets worse: Once you knew you had the ciphertext for each of the words “United”, “Oil”, and “Gas”, you could just search for matches for those ciphertext patterns, and you would know all the Account Names (and perhaps all the other fields as well) that had those words in them, as well as the placement of the words in the multi-word strings stored in those fields.

But then, it may be even worse: You may even be able to derive new words based on the words you have already derived. This is because for those three clear text words, you now know the ciphertext patterns for any words that begin with those strings. (Full disclosure: Here is where I am speculating a bit because the guy showing me the demo couldn't tell me the AES modes that they use. I am assuming they use something like CBC, and that they still process in 16-bit blocks - two characters - at a time.) With CBC, the same key, nonce, and padding will preserve the patterns of the strings at the beginning of words. So "United" and "Un" would share two common character patterns to start their ciphertext, and "United" and "Unit" would share the first four. So if you derive "United" you could find any word that also began with "Unit", "Un" etc.

Using the example I saw at Ciphercloud’s booth at RSA, you could use that pattern preservation to find out any other account with the word "United", or "Oil", or "Gas", as well as any words that began with the same character strings as "United", "Oil", or "Gas."

Now, I know this was a demo, and the guy showing it was probably a marketing guy with no concept of security. But this was the 2013 RSA Security show. You are going to be viewed by people like me who know a thing or two about encryption, and poke holes in the shoddy stuff. I will also say, in their defense, these shows are coordinated by their marketing department and may not have the most up to date demonstration materials. So, perhaps they could have shown a better (newer?) implementation of their product that would have satisfied me or any security professional.

But the fact remains that they did not. And at one of the largest, most influential security shows in the US, if not the world, you shouldn't put up for demonstration something so easily defeated.

Caveat Emptor!

Answered Mar 1 2013 by adrenalion

No Exotic Encryption; See Video: Answer with 3 Upvotes

They are not using any exotic encryption. In fact I don't even think they are doing any encryption, just 1:1 mapping (tokenization) after lowering the case on plain text data. For details, I did some basic cryptanalysis on a still from their publicly visible demo video.

Basically they end up with a 1:1 mapping of lower case words. No matter if they circle the galaxy, suck all the energy of a star or perform AES256. At the end of the day it's just 1:1, at lower case word level! So you can run the entire "encrypted" conversation through a statistical analyzer and, based on the frequency of regular English words, uncover that 1:1 mapping. If you add the logic that word-level patterns exist ("The the" is extraordinarily rare vs "extra extra"), i.e. Markov chain modelling, then you need even fewer copies of "encrypted data" to peel off the security.
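A toy version of that statistical attack, under assumptions: `word_tokens` is a hypothetical stand-in for the lower-case 1:1 word mapping (built on HMAC-SHA256, not the vendor's real mapping), the sample sentence is made up, and the attacker's knowledge is reduced to a two-entry ranking of common English words.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"demo-key"  # hypothetical key, for illustration only


def word_tokens(text: str) -> list:
    """Stand-in for a 1:1 mapping applied to lower-cased words."""
    return [
        hmac.new(KEY, w.encode("utf-8"), hashlib.sha256).hexdigest()[:8]
        for w in text.lower().split()
    ]


observed = word_tokens("The end of the day and the start of the night")

# Attacker's prior knowledge: the most frequent English words, in rank order.
english_by_rank = ["the", "of"]

# Match the most frequent tokens to the most frequent words, rank by rank.
ranked_tokens = [tok for tok, _ in Counter(observed).most_common(len(english_by_rank))]
mapping = dict(zip(ranked_tokens, english_by_rank))

partial = [mapping.get(tok, "?") for tok in observed]
print(" ".join(partial))  # the ? of the ? ? the ? of the ?
```

With a realistic amount of text and a longer ranking (plus the word-pair statistics mentioned above), the remaining gaps fill in the same way.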

There is NO way I would trust my Amazon S3 or Azure Blob storage to be encrypted by these guys.

Overall I would say this is borderline commercial snake oil because

Answered Mar 7 2013 by Sid

Slightly Related Science; How they Ought to do it: Answer with 3 Upvotes

I don't know how CipherCloud works. However, a related question is: How could you encrypt data in a database, in a way that allows you to achieve these goals? What are the best cryptographic techniques currently known, for that goal?

As it happens, that question has a good answer. Take a look at CryptDB, a system built by MIT researchers to encrypt all the data in your database while still allowing your application to manipulate the data. In their system, the application can execute SQL queries on the encrypted data (even though it is encrypted!) and do some limited computation on the data.

CryptDB uses a combination of techniques that have been developed by cryptographers over the past decade or two, to achieve these goals. They show that the result is practical, with good performance and ability to use it with existing systems (like phpBB). It's a brilliant system, and a significant advance for the field. Read their research paper for more on how they do it:

In summary, the techniques in CryptDB are what Ciphercloud ought to be doing. I have no clue whether Ciphercloud is actually doing that (you'd have to ask Ciphercloud that), but CryptDB represents about the state of the art in this area right now.

Answered Aug 25 2012 by D.W.

Commentary on the Video: Answer with 2 Upvotes

I also watched the video (thanks Sid, for the link) and after looking at it, it reveals some of the other methods that Ciphercloud appears to be using to preserve search. Nothing appears to be an implementation of any sort of homomorphic encryption.

I snapped a copy of one screen after the response from John is entered and encrypted, and have attached an image below (Editor's note: Image removed) (apologies for the crude highlighting). Look at the word "meet" in John's post and then "meet-up" in the first post from Sophie. The pattern of ciphertext for the string "meet" is the same in both, which would be required if you were to perform searches by encrypting the user input of "meet" and sending the ciphertext to the cloud to actually perform the search.

I have not had time to fully explore this, but note that in "meet-up" the hyphen is preserved in the clear within the ciphertext. I suspect that this is because there is a requirement to enable search for the word "up", which basically requires setting the IV back to one of its static values like the one used when "meet" was encrypted or perhaps the one (assuming that there are any other IVs) used to encrypt other instances of "up". This is the only way to guarantee that the suffixed "up" will match the singular instance of "up".

I didn't highlight it, but you can also see that terminating punctuation such as question marks is preserved in the clear. Again, if you want to perform an exact match search for "meet", you need to strip the extra character because the ciphertext for "meet?" would be different from that for "meet", so the search would not return results that a human would expect.
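The behaviour described in the last three paragraphs is what you would get from a scheme along the following lines. This is a guess at the net effect, not the vendor's implementation: `word_token` and `protect` are hypothetical names, HMAC-SHA256 stands in for the per-word encryption with a static IV, and the two messages are invented to resemble the ones in the video.

```python
import hashlib
import hmac
import re

KEY = b"demo-key"  # hypothetical key, for illustration only


def word_token(word: str) -> str:
    """Stand-in for per-word encryption with a fixed key and IV."""
    return hmac.new(KEY, word.encode("utf-8"), hashlib.sha256).hexdigest()[:8]


def protect(text: str) -> str:
    """Replace each run of letters or digits with its token; keep the rest."""
    return re.sub(r"[A-Za-z0-9]+", lambda m: word_token(m.group(0)), text)


john = protect("Can we meet?")
sophie = protect("Any meet-up this week")

# The search term is encrypted the same way and matched as a substring.
needle = word_token("meet")
print(needle in john, needle in sophie)  # True True

# The price of searchability: punctuation stays readable in the "ciphertext".
print(john.endswith("?"), "-" in sophie)  # True True
```

Restarting the encryption at every word boundary is what lets "meet" match inside "meet-up" and before "?", and it is also why the hyphen and question mark have to stay in the clear.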

But, the implication here from a security perspective is that if I am able to plainly see punctuation such as hyphens, and preservation of patterns in the ciphertext is so critical that I have to strip (and then reveal!) trailing punctuation, then an attacker is provided a head start in breaking down the encryption. If you are not promoting randomness in your ciphertext you are not encrypting. What Ciphercloud appears to be doing is not random, therefore it is not truly encryption, and certainly not homomorphic encryption.

So, the answer to the original question is that Ciphercloud is NOT doing homomorphic encryption.

As always, Caveat Emptor! adrenalion

Answered Apr 20 2013 by adrenalion

License

This page is released under the Creative Commons Attribution-ShareAlike 3.0 Unported license. It is free to use as long as you attribute me and the original authors and share it under the same license.

/Emil Vikström