⚙️ Pictures Database Manager

Project completed in a team of two as part of the EPFL course « System programming project »

C

This project is a command-line utility for managing images stored in a dedicated database format. It is a simplified version inspired by the Haystack system used by Facebook.

Social networks have to manage hundreds of millions of images. Conventional file systems (such as the one used on your hard disk) become inefficient with that many files. Moreover, they are not designed for the case where each image must be available in several resolutions, for example very small (icon), medium (for a quick "preview") and normal size (original resolution).

In the “Haystack” approach, many images are stored in a single file, and the different resolutions of each image are stored alongside it automatically. This single file contains both data (the images) and metadata (information about each image). The key idea is that the image server keeps a copy of this metadata in memory, which allows very fast access to a specific photo in the requested resolution.
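To make the in-memory metadata idea concrete, here is a minimal sketch in C of what such an index could look like. It is not the project's actual layout: the names `image_meta`, `image_db`, `locate_image`, the `MAX_ID_LEN` limit and the convention that an offset of 0 means "not stored yet" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ID_LEN 127 /* hypothetical limit on image identifiers */
#define NB_RES 3       /* icon, preview, original */

enum resolution { RES_ICON = 0, RES_PREVIEW = 1, RES_ORIGINAL = 2 };

/* Hypothetical metadata record for one image. */
struct image_meta {
    char id[MAX_ID_LEN + 1]; /* name given by the user */
    uint8_t sha[32];         /* SHA-256 digest of the original content */
    uint64_t offset[NB_RES]; /* position of each resolution in the file;
                                0 is assumed to mean "not stored yet" */
    uint32_t size[NB_RES];   /* size in bytes of each resolution */
    int is_valid;            /* non-zero when the slot is in use */
};

/* Hypothetical in-memory copy of the metadata section of the file. */
struct image_db {
    struct image_meta *entries;
    size_t count;
};

/* Looks up an image by id and resolution using only the in-memory metadata.
 * Returns 0 and fills *offset and *size when found, -1 otherwise. */
int locate_image(const struct image_db *db, const char *id,
                 enum resolution res, uint64_t *offset, uint32_t *size)
{
    assert(db != NULL && id != NULL && offset != NULL && size != NULL);
    for (size_t i = 0; i < db->count; ++i) {
        const struct image_meta *m = &db->entries[i];
        if (m->is_valid && strcmp(m->id, id) == 0) {
            if (m->offset[res] == 0) {
                return -1; /* this resolution has not been stored */
            }
            *offset = m->offset[res];
            *size = m->size[res];
            return 0;
        }
    }
    return -1;
}
```

With such an index, serving an image takes a lookup in memory followed by a single seek and read in the database file, instead of one file-system lookup per image and per resolution.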

This approach has a number of advantages. First, it reduces the number of files managed by the operating system. Second, it makes it possible to implement elegantly two important aspects of managing an image database:

  1. automatic management of different image resolutions, in our case the three supported resolutions;
  2. the possibility of not duplicating identical images submitted under different names (e.g. by different users at Facebook), which is an extremely useful optimization in any social network.

This “deduplication” relies on a “hash function”, which condenses binary content (in our case an image) into a much shorter signature. We use the “SHA-256” function, which maps any binary content to a 256-bit signature and has the useful cryptographic property of being collision-resistant: it is practically impossible to find two different images with the same signature, and in particular to craft another image that matches the signature of a given one.
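The deduplication step can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: the SHA-256 digest of each image is assumed to have been computed already (for instance with a cryptographic library), and the names `stored_image`, `find_duplicate` and `place_image` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA256_LEN 32 /* 256 bits */

/* Hypothetical record: digest plus location of the content in the file. */
struct stored_image {
    uint8_t sha[SHA256_LEN];
    uint64_t offset;
    uint32_t size;
};

/* Returns the index of an entry with the same digest, or -1 if none. */
long find_duplicate(const struct stored_image *images, size_t count,
                    const uint8_t sha[SHA256_LEN])
{
    assert(images != NULL || count == 0);
    assert(sha != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (memcmp(images[i].sha, sha, SHA256_LEN) == 0) {
            return (long)i;
        }
    }
    return -1;
}

/* Decides where the content of `incoming` lives.
 * If an identical image is already stored, reuse its offset and size and
 * return 1 (nothing needs to be written). Otherwise point the entry at
 * `end_of_file`, where the caller is expected to append the content,
 * and return 0. */
int place_image(const struct stored_image *images, size_t count,
                struct stored_image *incoming, uint64_t end_of_file)
{
    assert(incoming != NULL);
    long dup = find_duplicate(images, count, incoming->sha);
    if (dup >= 0) {
        incoming->offset = images[dup].offset;
        incoming->size = images[dup].size;
        return 1;
    }
    incoming->offset = end_of_file;
    return 0;
}
```

Comparing 32-byte digests instead of full image contents keeps the check cheap, and collision resistance is what makes it safe to treat equal digests as equal images.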

Professor: Jean-Cédric Chappelier

Discover the application

HTTP web server using the image database management utility