
Thursday, August 22, 2013

Data Compression Techniques - 1


Hi everyone! Today's post is about data compression techniques. I decided to write about data compression because it is widely used all around us, in many of the applications we use every day. It's a CS topic I really like, and I'm always trying to learn more about it.
Today I’ll be talking about:
  1. Types/Techniques of Data Compression
  2. Lossless Data Compression
    1. Huffman Coding
    2. Lempel-Ziv (LZ) Coding
  3. Lossy Data Compression
    1. DCT Coding
Let's start now.
First, simply put, there are two main techniques in data compression:
  1. Lossless Data Compression
  2. Lossy Data Compression
1. Lossless Data Compression: No information is lost –> the compressed data can be decoded back into exactly the original data.
2. Lossy Data Compression: Used when we need to save space and time and can accept a reconstruction that is not 100% identical to the original. This technique is used for analog-range data such as video, sound, and JPEG image encoding.
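To see the lossless guarantee in action, here is a minimal Python sketch using the standard library's zlib module (a Lempel-Ziv-based compressor); the sample data is made up for illustration:

    import zlib

    data = b"abracadabra" * 100                # highly repetitive sample data
    packed = zlib.compress(data)               # lossless compression
    assert zlib.decompress(packed) == data     # decodes to exactly the original
    print(len(data), "->", len(packed))        # 1100 bytes shrink to a few dozen

The assert is the whole point of lossless coding: the round trip is exact, bit for bit.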
Now let's look at the best-known fast lossless data compression algorithms in use today: Huffman coding and Lempel-Ziv coding.

Examples of Lossless Data Compression Algorithms

1-Huffman Algorithm:
A Huffman code is designed by merging together the two least probable characters, and repeating this process until there is only one character remaining. A code tree is thus generated and the Huffman code is obtained from the labeling of the code tree. An example of how this is done is shown below.
[Figure: animation of the Huffman code tree construction]
The final static code tree is given below:
[Figure: the final static code tree]

  • It does not matter how the characters are arranged. The tree above has been arranged so that the final code tree looks nice and neat.
  • It does not matter how the final code tree is labeled (with 0s and 1s). Here, the upper branches are labeled with 0s and the lower branches with 1s.
  • There may be cases where there is a tie for the two least probable characters. In such cases, any tie-breaking procedure is acceptable.
  • Huffman codes are not unique.
  • Huffman codes are optimal in the sense that no other lossless fixed-to-variable-length code has a lower average rate.
  • The rate of the above code is 2.94 bits/character. [1]
  • The entropy lower bound is 2.88 bits/character. [2]
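To make the merging procedure concrete, here is a minimal Python sketch; the four symbols and their probabilities are made up for illustration and are not the alphabet from the figures above:

    import heapq

    def huffman_code(probs):
        # Each heap entry is (probability, tie-breaker, {symbol: partial code}).
        heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            # Merge the two least probable subtrees, labeling one branch 0
            # and the other 1 (the choice of labels is arbitrary).
            p1, _, codes1 = heapq.heappop(heap)
            p2, _, codes2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in codes1.items()}
            merged.update({s: "1" + c for s, c in codes2.items()})
            heapq.heappush(heap, (p1 + p2, count, merged))
            count += 1
        return heap[0][2]

    probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
    codes = huffman_code(probs)                              # e.g. {'a': '1', ...}
    rate = sum(p * len(codes[s]) for s, p in probs.items())  # average bits/character

Note how the tie-breaker field keeps the heap comparisons well-defined when two subtrees have equal probability; as noted above, any tie-breaking rule gives a valid (if different) code.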


2-Lempel-Ziv Coding (LZ Coding):
The basic idea is to parse the input sequence into non-overlapping blocks of different lengths while constructing a dictionary of the blocks seen so far.

Encoding Algorithm:
1. Initialize the dictionary to contain all blocks of length one (D = {a, b}).
2. Search for the longest block W that has appeared in the dictionary.
3. Encode W by its index in the dictionary.
4. Add W followed by the first symbol of the next block to the dictionary.
5. Go to step 2.
A Python sketch of this loop is given after the figure below.



The final code results are given below:
[Figure: Lempel-Ziv encoding example]
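Here is a minimal Python sketch of that encoding loop, assuming the two-symbol alphabet D = {a, b} from step 1; the input string is made up and is not necessarily the sequence from the figure:

    def lz_encode(seq, alphabet=("a", "b")):
        # Step 1: the dictionary starts with all blocks of length one.
        dictionary = {sym: i for i, sym in enumerate(alphabet)}
        output = []
        i = 0
        while i < len(seq):
            # Step 2: find the longest block W starting at i that is in the dictionary.
            w = seq[i]
            while i + len(w) < len(seq) and w + seq[i + len(w)] in dictionary:
                w += seq[i + len(w)]
            # Step 3: encode W by its index in the dictionary.
            output.append(dictionary[w])
            # Step 4: add W plus the first symbol of the next block to the dictionary.
            if i + len(w) < len(seq):
                dictionary[w + seq[i + len(w)]] = len(dictionary)
            i += len(w)   # Step 5: continue from the next block
        return output

    print(lz_encode("abbaabbaababbaaaab"))   # a list of dictionary indices

The decoder can rebuild the same dictionary as it reads the indices, so the dictionary itself never needs to be transmitted.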
After LZ coding, let's now look at one of the best-known lossy data compression algorithms: the Discrete Cosine Transform (DCT), used in JPEG and MP3 compression.

Example of Lossy Data Compression Algorithms

1-Discrete Cosine Transform (DCT) Algorithm:
A DCT expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. DCTs are important to numerous applications in science and engineering, from lossy compression of audio (e.g., MP3) to lossy compression of images (e.g., JPEG), where small high-frequency components can be discarded. (Wikipedia)
You can take a look at this demo Java applet that applies the DCT algorithm: http://www.comp.nus.edu.sg/~cs5248/0910S2/l01/DCTdemo.html
DCT simulation tool:
http://pi4.informatik.uni-mannheim.de/pi4.data/content/animations/dct_2d/index.html
e.g.
Consider this 8x8 grayscale image of the capital letter A.
[Figure: the original image, scaled 10x (nearest neighbor), and scaled 10x (bilinear)]
The DCT of the image is:
    
\begin{bmatrix}
6.1917 & -0.3411 & 1.2418  &  0.1492  &  0.1583  &  0.2742 &  -0.0724  &  0.0561 \\
0.2205 & 0.0214 & 0.4503  &  0.3947  & -0.7846 &  -0.4391  &  0.1001  & -0.2554 \\
1.0423 & 0.2214 & -1.0017 &  -0.2720  &  0.0789 &  -0.1952  &  0.2801  &  0.4713 \\
-0.2340 & -0.0392 & -0.2617 &  -0.2866 &   0.6351 &   0.3501 &  -0.1433  &  0.3550 \\
0.2750 & 0.0226 & 0.1229  &  0.2183  & -0.2583  & -0.0742  & -0.2042  & -0.5906 \\
0.0653 & 0.0428 & -0.4721 &  -0.2905  &  0.4745  &  0.2875  & -0.0284  & -0.1311 \\
0.3169 & 0.0541 & -0.1033 &  -0.0225  & -0.0056  &  0.1017  & -0.1650 &  -0.1500 \\
-0.2970 & -0.0627 & 0.1960 &   0.0644  & -0.1136 &  -0.1031 &   0.1887  &  0.1444 \\
\end{bmatrix}

[Figure: basis functions of the discrete cosine transform with the corresponding coefficients for our image]
Each basis function is multiplied by its coefficient, and the product is added to the final image.
[Animation: on the left, the final image; in the middle, the weighted function (multiplied by its coefficient) being added to the final image; on the right, the current basis function and its coefficient. Images are scaled 10x using bilinear interpolation.]
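For the curious, here is a minimal numpy sketch of the 2-D DCT-II and its inverse; this is an illustration under my own scaling convention (an orthonormal basis), not the exact code behind the matrix above, and a random 8x8 block stands in for the letter-A image:

    import numpy as np

    def dct_matrix(n):
        # Orthonormal DCT-II basis: row u is alpha(u) * cos((2x + 1) * u * pi / (2n)).
        k = np.arange(n)
        m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
        m[0] *= np.sqrt(1 / 2)
        return m * np.sqrt(2 / n)

    def dct2(block):
        # The 2-D DCT is separable: transform the rows, then the columns.
        c = dct_matrix(block.shape[0])
        return c @ block @ c.T

    def idct2(coeffs):
        # The basis is orthogonal, so the inverse transform is the transpose.
        c = dct_matrix(coeffs.shape[0])
        return c.T @ coeffs @ c

    img = np.random.rand(8, 8)           # stand-in for the 8x8 letter-A image
    coeffs = dct2(img)
    coeffs[np.abs(coeffs) < 0.05] = 0    # lossy step: discard small coefficients
    approx = idct2(coeffs)               # close to img, but not exact: info was lost

Zeroing the small (mostly high-frequency) coefficients is exactly where JPEG-style compression loses information.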
You can learn more about the DCT here: https://www.tach.ula.ve/vermig/DCT_TR802.pdf





[1] Compression Ratio: the ratio between the number of bits before compression and the number of bits after compression (Original Bits : Compressed Bits).
e.g.
Suppose a file with an original size of 65,535 bytes becomes 16,384 bytes after applying some data compression algorithm. The compression ratio is 65535:16384, which is approximately 4:1; equivalently, the file is 75% compressed (the compressed size is about 25% of the original).
Since there are 8 bits in each byte, this means each byte is now represented by just 2 bits on average (25%), i.e., 2 bits per byte or 2 bits/character.
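The arithmetic of this example, as a few lines of Python:

    original_bytes, compressed_bytes = 65535, 16384
    ratio = original_bytes / compressed_bytes                           # ~4.0, i.e. about 4:1
    percent_compressed = 100 * (1 - compressed_bytes / original_bytes)  # ~75%
    bits_per_byte = 8 * compressed_bytes / original_bytes               # ~2 bits/byte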

* The difference between the original data and the reconstructed data is called the “distortion”; the related terms “fidelity” and “quality” describe how closely the reconstruction matches the original.


[2] The entropy rate of a source is a number that depends only on the statistical nature of the source. If the source has a simple model, this number can be calculated easily. Here we consider an arbitrary source, paying special attention to the case where the source is English text.
Zero-Order Model: The characters are statistically independent of each other, and every letter of the alphabet is equally likely to occur. Let m be the size of the alphabet. In this case, the entropy rate is given by H = log2(m) bits/character.

For English text, the alphabet size is m = 27 (26 letters plus the space). Thus, if this had been an accurate model for English text, the entropy rate would have been H = log2(27) ≈ 4.75 bits/character.
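In Python, that calculation (plus the general zero-order entropy formula it is a special case of) looks like this:

    import math

    m = 27                          # 26 letters plus the space character
    H_uniform = math.log2(m)        # 4.7548... bits/character

    # General zero-order entropy, H = -sum(p * log2(p)); with equally
    # likely symbols this reduces to log2(m), as used above.
    probs = [1 / m] * m
    H = -sum(p * math.log2(p) for p in probs)

Shannon's experiments (reference 7 below) show that real English text has a much lower entropy rate, because its letters are far from independent and equally likely.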



References:
1- Khalid Sayood, Introduction to Data Compression, 2nd Edition, 2006.
2- A. Gersho and R. M. Gray, Vector Quantization and Signal Compression.
3- D. A. Huffman, “A Method for the Construction of Minimum Redundancy Codes,” Proceedings of the IRE, Vol. 40, pp. 1098-1101, 1952.
4- J. Ziv and A. Lempel, “A Universal Algorithm for Sequential Data Compression,” IEEE Transactions on Information Theory, Vol. 23, pp. 337-342, 1977.
5- J. Ziv and A. Lempel, “Compression of Individual Sequences via Variable-Rate Coding,” IEEE Transactions on Information Theory, Vol. 24, pp. 530-536, 1978.
6- T. A. Welch, “A Technique for High-Performance Data Compression,” Computer, pp. 8-18, 1984.
7- C. E. Shannon, “Prediction and Entropy of Printed English,” available in Shannon: Collected Papers.

Hope you enjoyed this data compression introduction, and thanks for reading!
For more info, please visit Data Compression - Debra A. Lelewer and Daniel S. Hirschberg.

Sunday, September 20, 2009

Imagine Cup, Get connected


What is The Imagine Cup?
It is an international competition organized by Microsoft that lets everyone - software programmers, hardware developers, artists, and dreamers - imagine a better world, then make it happen. The Imagine Cup brings together more than 200,000 students from over 100 countries around the world, where they compete to help find the answers. And no matter who comes up with the best solutions - everybody wins!

Each year the Finals are held in a different city of the world. Past Worldwide Finals include Cairo, Egypt (2009); Paris, France (2008); Seoul, South Korea (2007); Delhi, India (2006); Yokohama, Japan (2005); Sao Paulo, Brazil (2004); and Barcelona, Spain (2003).
The next Imagine Cup, in 2010, will be held in Poland.
The world can't wait! Register now!

Competition Categories for the next Imagine Cup Poland 2010:

Software Design
This is the only Imagine Cup competition that is run locally each year in countries/regions all around the globe. The mission is simple: create real-world software and service applications that use Microsoft tools and technology. Think big! The competition requires students to use their creativity and drive to make it to the Worldwide Finals stage. This is where legends are born and lives are changed - where an application starts as an idea and ends up creating change all over the world.
Visit the Software Design competition page to learn more!

Game Design
Build a full game experience from scratch! Create a game that uses Microsoft's XNA Game Studio 3.0, Visual Studio, and/or Silverlight. Calling all students who've always enjoyed playing games... this is your chance to create your very own game and, at the same time, change the global community. The Game Design competition is a great opportunity for learning and an important step in a budding career, either as a game developer or as an entrepreneur in the game business. The world cannot wait to play YOUR game!
Visit the Game Design competition page to learn more!

Digital Media
The popularity of homemade videos - created by editing pictures, text, music, voice, and video footage - has exploded with the easy-to-use tools now available for posting them on websites around the world. The Digital Media competition challenges students around the world to use their creativity to create web videos, combining user-generated content to address the Imagine Cup theme. Submissions will use the internet to communicate, explain, and touch on issues in our global society. Your team will need to convey its message in a brief period of time while also being visually and audibly stimulating.
Visit the Digital Media competition page to learn more!


Coming Soon - Award Opportunities!
Including Embedded, IT Challenge and more...


No matter which team comes up with the best solutions - the world wins.
