If you were like me you started throwing all kinds of files into MongoDB with GridFS. When you took a look at the db.fs.files collection you saw something like this for a document:

	
{
	"_id" : ObjectId("4c40affcce64e5e275c60100"),
	"filename" : "My First File.jpg",
	"uploadDate" : "Fri Jul 16 2010 15:16:12 GMT-0400 (EDT)",
	"length" : 55162,
	"chunkSize" : 262144,
	"md5" : "46aa378be7f6f1f3660efd7de5c1cbb6"
}

Did you see the MD5 hash? It’s there for a reason you know.

Since my PHP/MongoDB application has an administrative backend multiple people are loading up files. There is always a possibility that they will upload the same file. Of course this would be a very inefficient use of storage especially when the file is a video or picture.  That’s where the MD5 field in fs.files comes in handy.

In PHP you can use the md5_file() method to get the MD5 hash before you save the file to MongoDB. Running a findOne query using the md5 of your tmp file will let you know if  a document for that file already exists. If it does exist, then you’ll get back the fs.files document of the preloaded file. Then you can use the _id as a reference and don’t bother saving the file. Can you imagine all the money you save in storage fees on Amazon S3?

This is a very common and reliable way of doing things since byte for byte you know the files are the same. The sample script below is a snapshot of code in a Lithium application (Lithium is a new PHP 5.3+ framework). I’m basically running a findOne({“md5″ : “$md5″}) query:

	
protected function write() {
		$success = false;
		$grid = File::getGridFS();
		$this->fileName = $this->request->data['Filedata']['name'];
		$md5 = md5_file($this->request->data['Filedata']['tmp_name']);
		$file = File::first(array('conditions' => array('md5' => $md5)));
		if ($file) {
			$success = true;
			$this->id = (string) $file->_id;
		} else {
			$this->id = (string) $grid->storeUpload('Filedata', $this->fileName);
			if ($this->id) {
				$success = true;
			}
		}
		return $success;
}

It was great attending and presenting at the MongoSF conference held on April 30th in San Francisco – Thanks 10Gen.

Speaking with a few attendees I got the feeling that many are moving forward with Mongo production deployments. I should be a bit more specific with my relative use of the word “many”:

“The Many” – Yes, I’m slightly jealous that the Ruby community was there in full force and vocal.

“The Other Many” – There were also folks who were jumping right in with Java and Python.

“The Not So Many” – I’m not getting the same feeling of enthusiasm from the PHP community.

The PHP community doesn’t seem as audible with their success/concerns/thoughts of using Mongo. There are developers using it and liking it. But it doesn’t “feel” as accepted yet. Maybe this feeling has some substance to it. According to a survey taken back in February 2010 the Ruby community dominated by representing 40% of the MongoDB users. We’re even outnumbered by Python users. Of course, one can question how far reaching this survey was. However, considering the fact that several PHP frameworks have made a concerted effort to work with MongoDB I’m wondering why I’m not hearing more about it. I’m probably not listening in the right circles.

With all that said and MongoSF in the past it’s time to prepare for MongoNY. I’m going to ask those aforementioned questions to the PHPers who show up so be ready! During the talk I’ll also discuss each framework and what it’s doing with MongoDB. Come to MongoNY and lets have a lively PHP/MongoDB discussion. You can use the discount code “lightcube” at checkout for a 25% discount.

In the meantime, here are my slides from the MongoSF presentation:

(Yes, my JSON business card on slide #2 validates!)

You’ve quickly put together the code to add files to MongoDB. But as any good DBA and Developer knows not everything that goes into your database should stay there.

Yes, you already figured out it’s as simple as $grid->remove(). Here is something to keep in mind. MongoGridFS::remove won’t present you with a message if it failed. To ensure that a file was successfully removed use MongoDB::lastError(). You’ll get an array in return that will show something along the lines of:

Array
(
[err] =>
[n] => 0
[ok] => 1
)

The field to watch is “ok”. The value of “1″ will return if the remove went according to plan. Otherwise you’ll get “0″ and it’s time to check the value for “err”. If you’re wondering about the “n” its the number of files that were removed.

In practice your code could look something like this:

/*
* function remove
* Removes files from Mongo and captures error if failed
* @return mixed
*/
public function remove($criteria)
{
    //Get the GridFS Object
    $grid = $db->getGridFS();

    //Setup some criteria to search for file
    $id = new MongoID($criteria['_id']);

    //Remove file
    $grid->remove(array('_id'=>$id), true);

    //Get lastError array
    $errorArray = $db->lastError();
    if ($errorArray['ok'] == 1 ) {
        $retval = true;
    }else{
        //Send back the error message
        $retval = $errorArray['err'];
    }

    return $retval;
}

With a few short lines you’re able to capture the error message and do whatever you like with it. Hopefully this helps keep your fs.files and fs.chunks collections clean.

As a quick followup to the last blog, I was asked to demonstrate how we could include some metadata with a file during or after upload.

This is a great feature since we can retrieve all the information attached with a document during a query. Including metadata (i.e. comments, dates, votes or WHATEVER ELSE YOU LIKE) is just as easy.

We are going to pick up the ball at the point where you’ve got all your MongoDB objects ready:

<?php
$date = new MongoDate();
//Use $set to add metadata to a file
$metaData = array('$set' => array(“comment”=>”MongoDB is awesome”, “date”=>$date));

//Just setting up search criteria
$criteria = array('_id' => $id);

//Update the document with the new info
$db->grid->update($criteria, $metaData);

?>

Did you break a sweat? Probably not.

BTW, I threw in a little extra tidbit there with MongoDate(). It’s a better way of inputting a date into Mongo instead of using PHP date with a bunch of formatting to get it just right.

I’ve been tweeting quite a bit about MongoDB over the past few weeks and its time to blog.

About 2 months ago we decided to install and mess around with MongoDB. We went from messing around to serious adoption about 2 weeks ago when we realized the power of working with it and PHP. It was Mitch Pirtle of www.spacemonkeylabs.com that first pointed us in this direction. Mitch was very enthusiastic about Mongo as he explained the great potential. Yet, it was only until I added an array of data from PHP into Mongo that my eyes started to open up.

Go ahead, open another tab in your browser (KEEPING THIS ONE OPEN) and Google MongoDB if you haven’t already (I made it easy providing the link). You’ll find sufficient documentation to give you the warm and fuzzy that 10gen isn’t messing around. They are in the database game for the long haul.

During our play with MongoDB we realized how true the term NoSQL is. For all you SQL grease monkeys out there it’s liberating. No Joins, No Select, just plain NoSQL.

As an example, let’s start off with putting uploaded files “directly” into MongoDB via GridFS (the storage specification for handling large files). Here is how simple it is:

First, we are going to assume you already have the code in place to handle the upload itself. We first tested this using SWFUpload which provides quite a bit of flexibility if you like to control how your page looks during upload. From a PHP perspective, the uploaded file will be accessible via the predefined variable $_FILES.

Here is what you have to do to get the upload into MongoDB:

<?php

$m = new Mongo(); //connect
$db = $m->selectDB("example"); //select Mongo Database

$grid = $db->getGridFS(); //use GridFS class for handling files

$name = $_FILES['Filedata']['name']; //Optional - capture the name of the uploaded file
$id = $grid->storeUpload('Filedata',$name) //load file into MongoDB

?>

Yep, that was it. One thing that took me a “second” to realize is that you actually pass the literal name of the file_post_name (in the example ‘Filedata’) to MongoDB. It does all the heavy lifting of getting the data from the system and storing it. Also, take note that you get an $id back which is the MongoID that acts as the “primary key”. That makes it easy for you to reference the file right away if you need to.

So that’s a rather quick look and tip for MongoDB. Stay tuned  as we continue to put out our MongoDB tricks and tips for PHP. We’re in it for the long haul too.