Use the MongoDB MD5 field in GridFS

If you were like me you started throwing all kinds of files into MongoDB with GridFS. When you took a look at the db.fs.files collection you saw something like this for a document:

{
	"_id" : ObjectId("4c40affcce64e5e275c60100"),
	"filename" : "My First File.jpg",
	"uploadDate" : "Fri Jul 16 2010 15:16:12 GMT-0400 (EDT)",
	"length" : 55162,
	"chunkSize" : 262144,
	"md5" : "46aa378be7f6f1f3660efd7de5c1cbb6"
}

Did you see the MD5 hash? It’s there for a reason you know.

Since my PHP/MongoDB application has an administrative backend multiple people are loading up files. There is always a possibility that they will upload the same file. Of course this would be a very inefficient use of storage especially when the file is a video or picture.  That’s where the MD5 field in fs.files comes in handy.

In PHP you can use the md5_file() method to get the MD5 hash before you save the file to MongoDB. Running a findOne query using the md5 of your tmp file will let you know if  a document for that file already exists. If it does exist, then you’ll get back the fs.files document of the preloaded file. Then you can use the _id as a reference and don’t bother saving the file. Can you imagine all the money you save in storage fees on Amazon S3?

This is a very common and reliable way of doing things since byte for byte you know the files are the same. The sample script below is a snapshot of code in a Lithium application (Lithium is a new PHP 5.3+ framework). I’m basically running a findOne({“md5″ : “$md5″}) query:

protected function write() {
		$success = false;
		$grid = File::getGridFS();
		$this->fileName = $this->request->data['Filedata']['name'];
		$md5 = md5_file($this->request->data['Filedata']['tmp_name']);
		$file = File::first(array('conditions' => array('md5' => $md5)));
		if ($file) {
			$success = true;
			$this->id = (string) $file->_id;
		} else {
			$this->id = (string) $grid->storeUpload('Filedata', $this->fileName);
			if ($this->id) {
				$success = true;
			}
		}
		return $success;
}

Tags: , ,

Leave a Reply