Pop quiz, hotshot: you've got a huge file you want to send to a friend -- should you zip it? Rar it? Tar.gz it? Many people are familiar with several different types of compression -- but do you know how they perform against each other? Which method compresses the best? Which is the fastest?
As you read this article, you'll find that the .zip format, arguably the most popular format, is one of the least efficient. What format should you be using? Read on and find out.
The performance of a compression algorithm is based on several factors, including:
In order to recognize all of the above factors, I took the following steps to give all the tested algorithms equal footing:
If you would like to further examine my original data, or even analyze it yourself, feel free to download the CSV file of the two SQL tables I used.
You'll notice that the compression ratios for the jpg and avi file types are quite poor -- two of the jpg entries even have a compressed size larger than the original size. The reason is that both of these file types are already compressed, and compressing an already-compressed file is rarely successful. How well a jpg compresses does, however, depend on its optimization -- some jpg files (which haven't been optimized yet) can compress quite well.
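The effect is easy to reproduce on a small scale. The following is a minimal sketch, not part of my original test setup: it uses Python's standard `zlib` module (a Deflate implementation) and a made-up sample text built by a hypothetical `make_sample_text` helper. It compresses the text once, then compresses the result a second time.

```python
import random
import zlib


def make_sample_text(word_count=20000, seed=0):
    """Build pseudo-text from a small vocabulary (illustrative data only)."""
    vocabulary = ["compress", "archive", "file", "ratio", "format", "speed",
                  "memory", "zip", "rar", "tar", "benchmark", "data"]
    rng = random.Random(seed)
    words = (rng.choice(vocabulary) for _ in range(word_count))
    return " ".join(words).encode("ascii")


def ratio(original, compressed):
    """Compressed size divided by original size (lower is better)."""
    return len(compressed) / len(original)


text = make_sample_text()
once = zlib.compress(text, 9)    # first pass: plain text compresses well
twice = zlib.compress(once, 9)   # second pass: input is already compressed

first_pass_ratio = ratio(text, once)
second_pass_ratio = ratio(once, twice)

print(f"first pass ratio:  {first_pass_ratio:.3f}")
print(f"second pass ratio: {second_pass_ratio:.3f}")
```

The first pass should shrink the text considerably, while the second pass has almost no redundancy left to remove and can even come out slightly larger because of container overhead, which is roughly what happens with the jpg and avi files.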
To make things simple, I wrote small PHP scripts to take care of the recording for me. I used the 7z command-line tool to measure all the data, except for the rar format, for which I used WinRAR's command-line tool.
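If you want to run a similar measurement yourself, the idea can be sketched in a few lines. This is not the PHP code I used, and it does not call the 7z or WinRAR command-line tools: as an assumption for illustration, it uses Python's standard-library `zlib`, `bz2`, and `lzma` modules as stand-ins for the Deflate, BZip2, and LZMA methods, and the `benchmark` function and sample data are hypothetical.

```python
import bz2
import lzma
import time
import zlib

# Standard-library stand-ins for the tested methods (an assumption for
# illustration; these are not the 7z or WinRAR command-line tools).
COMPRESSORS = {
    "Deflate (zlib)": lambda data: zlib.compress(data, 6),
    "BZip2 (bz2)": lambda data: bz2.compress(data, 9),
    "LZMA (lzma)": lambda data: lzma.compress(data),
}


def benchmark(data, compressors=COMPRESSORS):
    """Record compression ratio and wall-clock time for each method."""
    results = []
    for name, compress in compressors.items():
        start = time.perf_counter()
        compressed = compress(data)
        elapsed = time.perf_counter() - start
        results.append({
            "method": name,
            "ratio": len(compressed) / len(data),  # lower is better
            "seconds": elapsed,
        })
    return results


# Made-up sample input; a real test would read the files being compared.
sample = b"An example line of plain text to compress.\n" * 5000
results = benchmark(sample)
for row in results:
    print(f"{row['method']:<16} ratio={row['ratio']:.4f} "
          f"time={row['seconds']:.4f}s")
```

Timings from a single run are noisy, so a fairer comparison would repeat each measurement several times and run every method on the same machine with the same input files.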
And the winner is -- the 7z formats. 7z (PPMd/BCJ2) was best on already-compressed file types (jpg, avi), and 7z (LZMA/BCJ2) was best on non-compressed files (txt, pdf).
Unfortunately, the loser is without a doubt the .zip format. Zip (Deflate) had the worst compression ratio, and Zip (BZip2 Ultra) was the slowest. I say this is unfortunate because the .zip format is one of the most popular formats online. It seems to be popular because it is so well supported -- not because it is the best method.
Nearly every algorithm offers a 'normal' and an 'ultra' setting -- the ultra setting is designed to use more memory and time, but should yield superior compression ratios. I found that the ultra settings did not improve the compression ratio enough to justify the longer run time.
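You can get a feel for this trade-off with a small experiment. The sketch below is an illustration under stated assumptions, not a reproduction of my tests: it uses `zlib` compression levels 1, 6, and 9 as rough analogues of fast, normal, and ultra settings, and the `compare_levels` function and the generated sample text are hypothetical.

```python
import random
import time
import zlib


def compare_levels(data, levels=(1, 6, 9)):
    """Return {level: (compressed_size, seconds)} for each zlib level."""
    rows = {}
    for level in levels:
        start = time.perf_counter()
        size = len(zlib.compress(data, level))
        rows[level] = (size, time.perf_counter() - start)
    return rows


# Made-up sample text: random words drawn from a small vocabulary.
rng = random.Random(1)
vocabulary = ["normal", "ultra", "memory", "time", "ratio", "archive",
              "format", "compress", "size", "speed"]
data = " ".join(rng.choice(vocabulary) for _ in range(50000)).encode("ascii")

rows = compare_levels(data)
for level, (size, seconds) in rows.items():
    print(f"level {level}: {size} bytes "
          f"({size / len(data):.3f} of original) in {seconds:.4f}s")
```

Comparing the size saved between the middle and the highest level against the extra time spent gives a quick sense of whether the stronger setting is worth it for your own files.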
If you're really, really interested in compression benchmarking, check out Maximum Compression. Maximum Compression provides a wealth of information regarding compression, especially little-known algorithms that perform even better than mainstream methods.
I believe that the only thing holding back the 7z format is mainstream adoption. People use .zip because they're comfortable with it, and they already have programs to unzip it. I would strongly encourage you to check out 7-Zip's homepage and consider using the 7z format. The 7-Zip client is able to decompress all the popular formats, along with the 7z format.