Windows developer article online

19 09 2012

As I mentioned in an earlier post, the article about serialization in .NET with MongoDB was published in windows.developer. Today I realised that the article is also available online.

I would be really happy to receive some feedback or to discuss the concept with you.

Cheers,
Daniel





First article in a German .NET magazine

28 06 2012

A colleague from work and I have written an article about serialization in .NET with MongoDB. If you are interested, have a look at the current issue of windows.developer.

http://it-republik.de/dotnet/dotnet-magazin/Datenbanken-0508.html

The article is in German, but I hope many of you will read it. I would be very happy to get some feedback or to discuss the article with you.

Cheers,
Daniel





HACK: Creating triggers for MongoDB

14 05 2012

I visited the MongoDB conference in Berlin. In one talk about tips, tricks and hacks for MongoDB, the speaker mentioned a little hack you can use to create a trigger for MongoDB. I wanted to try this out because he only briefly described the theory behind it.

When you have configured MongoDB to work as a replica set, you may have noticed that a new collection called “oplog.rs” is created on the local database. Inside this collection MongoDB stores all insert, update and delete operations executed against the replica set (it’s comparable to the transaction log on a SQL Server). The oplog collection is used to distribute all operations from the primary node to all secondaries. With the help of this collection and a little JavaScript file we are able to create something which behaves like a trigger.

Let’s start with the oplog collection. If you look at an entry from this collection, you will see something similar to the following extract.

{
    "ts" : {
        "$timestamp" : NumberLong("5724119038133534721")
    },
    "h" : NumberLong("-7041921609633449468"),
    "op" : "i",
    "ns" : "TestApplication.BlogPost",
    "o" : {
        "_id" : ObjectId("4f7027f0df6e252390d2332a"),
        "Author" : "Test Author",
        "CreationDate" : new Date("Mon, 12 Mar 2012 00:00:00 GMT +01:00"),
        "Comment" : "My Comment"
    }
}

ts: the timestamp of the operation. We need it to make sure that no entry is processed twice.

op: the operation. The interesting operations are “i” for insert, “u” for update and “d” for delete.

ns: the namespace (database and collection) where the operation was executed.

o: the object which was created or updated.

If you need more information about the oplog, have a look at the following page on the MongoDB website:

http://www.mongodb.org/display/DOCS/Replica+Sets+-+Oplog
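If you want to inspect the oplog yourself, you can print the most recent entry from the mongo shell (you have to be connected to a member of the replica set):

use local
// the newest entry comes first when sorting by natural order descending
printjson(db.oplog.rs.find().sort({ '$natural' : -1 })[0])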

Now we create the JavaScript file. The script contains an endless while loop: we want to watch the oplog for changes and react to every new entry. As long as the script is running, we get behavior similar to a trigger.

Two features of MongoDB are used to allow the execution of this script: tailable cursors and the await-data option (the two cursor flags set in the script below).

Now have a look at the script; feel free to modify and reuse it.

var coll = db.oplog.rs;
var lastTimeStamp = coll.find().sort({ '$natural' : -1 })[0].ts;

while(1){
    var cursor = coll.find({ ts: { $gt: lastTimeStamp } });
    // tailable - the cursor stays open after the last result was returned
    cursor.addOption( 2 );
    // await data - wait for new data instead of polling constantly
    cursor.addOption( 32 );

    while( cursor.hasNext() ){
        var doc = cursor.next();
        lastTimeStamp = doc.ts;
        printjson( doc );
    }
}

What the script does is check for new operations inside the oplog and print each oplog entry. Just replace the printjson call with whatever operation you want to perform as the trigger action. On the line where the cursor is initialized you can refine the query, for example if you only want to react to update operations.
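For example, to only react to update operations on a single collection (the namespace below is the one from the extract above), the cursor initialization could be changed like this:

var cursor = coll.find({
    ts : { $gt : lastTimeStamp },
    op : 'u',                          // only update operations
    ns : 'TestApplication.BlogPost'    // only this database.collection
});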

I have been developing on a project with MongoDB for nearly 1.5 years now and never came across a problem where I really needed a trigger. But I have seen a couple of people asking for triggers on different pages and hope I can help some of them with this little hack. Note that this is not tested in high-traffic environments.





Persisting and fetching DateTime values with MongoDB

28 03 2012

The C# driver for MongoDB serializes C# objects by default to a BSON representation which is stored inside MongoDB. The DateTime type is a bit special, which I want to demonstrate with the following test.


public abstract class RepositorySubject<T> where T : Repository
{
    public static T Subject { get; set; }

    public RepositorySubject()
    {
        var mongoDb = new MongoDB();
        Subject = (T)Activator.CreateInstance(typeof(T), mongoDb);
    }
}


public static class BlogRepositorySpecs
{
    [Subject(typeof(BlogRepository))]
    public class When_refetching_persisted_data : RepositorySubject<BlogRepository>
    {
        private static BlogPost blogEntry;

        private static BlogPost result;

        Establish context = () =>
        {
            blogEntry = new BlogPost()
            {
                Author = "Test Author",
                Comment = "My Comment",
                CreationDate = new DateTime(2012, 4, 12),
                Id = ObjectId.GenerateNewId()
            };
        };

        Because of = () =>
        {
            Subject.Save(blogEntry);
            result = Subject.FindById(blogEntry.Id.ToString());
        };

        Cleanup after = () => Subject.Drop();

        It should_have_the_correct_creationdate = () => {
           result.CreationDate.ShouldEqual(blogEntry.CreationDate);
        };

        It should_have_the_correct_author = () => {
            result.Author.ShouldEqual(blogEntry.Author);
        };

        It should_have_the_correct_comment = () => {
            result.Comment.ShouldEqual(blogEntry.Comment);
        };
    }
}

This simple test is written with mspec. What the test does is create a BlogPost object with a DateTime value and two string values. This object is persisted in MongoDB; afterwards the object is retrieved and the expected data is compared to the retrieved data.
The abstract RepositorySubject is a helper which can be reused by different mspec tests covering repository functionality.
The Repository class itself is an abstract class holding the most important CRUD operations for the usage with MongoDB. BlogRepository inherits from Repository; additional implementations can be added there. Inside the MongoDB class the connection to the MongoDB database is established and the commands are executed.

Running the Test

When we run the test on a machine whose time zone is set to something other than UTC, the test will fail. Why? DateTime values are stored as UTC inside the database. When we retrieve the data from the database, the DateTime value is fetched as a UTC DateTime value. Therefore we compare a local DateTime with a UTC DateTime value.
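You can see this directly in the mongo shell. Assuming the test machine runs on German local time (UTC+2 in April) and the repository writes to a collection called BlogPost (both assumptions, just for illustration), the persisted document looks roughly like this:

> db.BlogPost.findOne()
{
    "_id" : ObjectId("4f7027f0df6e252390d2332a"),
    "Author" : "Test Author",
    "CreationDate" : ISODate("2012-04-11T22:00:00Z"),
    "Comment" : "My Comment"
}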

Set Serialization Options for DateTimes

What can we do to fix this problem? We can register a serialization option for the DateTime value. With this option we specify that the value is converted to local time when we retrieve the object from the database. The following code snippet shows how to map the CreationDate from UTC to the local DateTime value.


public class RegisterSerializer
{
    public void Setup()
    {
        BsonClassMap.RegisterClassMap<BlogPost>(cm => {
            cm.AutoMap();
            cm.GetMemberMap(c => c.CreationDate).SetSerializationOptions(
                    new DateTimeSerializationOptions(DateTimeKind.Local));
        });
    }
}

After adding this class we only need to call the Setup method, which can be done inside the RepositorySubject by adding the following two lines of code.


var registerSerializer = new RegisterSerializer();
registerSerializer.Setup();

Now we can run the tests again. The result should be as expected – green.





Thoughts about Replica Set configuration with MongoDB

2 03 2012

In my last post I provided a setup script which can be used to get a simple replica set configuration up and running. Now I want to talk about some details of the script and why I created it the way it is.

Why should I use the option notablescan?

In my opinion this is a flag which should be set on every development environment. When you write new functionality in your data access layer, it can happen that you forget to update the indexes on the database. This results in bad query performance; especially in applications with a lot of traffic, it can cause serious performance issues.

To avoid this problem, enable the notablescan option on your development environment. Every time a query has to scan the complete collection to fetch data (because of a missing index), you will receive an exception similar to the following:

[Image: error message returned by MongoDB when a query requires a table scan]
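For illustration, with notablescan enabled a query on a field without an index fails in the mongo shell roughly like this (collection name and exact message wording are examples and vary by version):

> db.BlogPost.find({ Author : "Test Author" })
error: { "$err" : "table scans not allowed:TestApplication.BlogPost" }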

When you set this option on your development environment, the risk of deploying code without a correct index to your live systems is reduced to a minimum.

Priority for Replica Set nodes

While setting up a replica set, you can give every node a priority through the configuration. The priority is used to rate a node as primary: a higher priority results in a higher chance to be elected primary, and a priority of 0 excludes a node from becoming primary at all. This is useful to prevent nodes with bad performance from being elected. With version 2.0.2 of MongoDB (Windows) you can specify a priority from 0.0 to 100.0. Inside the script I want the node with the smallest port number to become the primary, so the configuration for the first node starts with a priority of 100 and every following node receives a priority decreased by 1. This makes sure that the node with the smallest port number has the highest priority and will be elected primary. Should the first node fail to start, the node with the second highest priority takes over.
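Translated into a replica set configuration, the idea looks like this in the mongo shell (set name, hosts and ports are just examples based on the script defaults):

var cfg = {
    _id : "MongoDBReplicaSet",
    members : [
        // the node with the smallest port number gets the highest priority
        { _id : 0, host : "localhost:30000", priority : 100 },
        { _id : 1, host : "localhost:30001", priority : 99 },
        // the arbiter holds no data and never becomes primary
        { _id : 2, host : "localhost:30002", arbiterOnly : true }
    ]
};
rs.initiate(cfg);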

Reinstallation of Replica Sets: but what happens to my data?

As I mentioned in my last post about the setup script, we remove the old service (if one exists with the same name) and install everything anew. If we want to keep the data inside the database, we need to do a few things to achieve this.

If we used the script to install the replica set and created a database called “MyTestDb”, we should have a folder structure on the file system that looks like the following picture.

[Image: folder structure on the file system with the data folders for “replSet1”, “replSet2” and the “arbiter”]

Now we want to reinstall the instance with another configuration and keep all the existing data. On every node except the node with the smallest port number, we make sure that none of the data folders contain any data or subfolders. In this case the “replSet2” and the “arbiter” folders are emptied completely. We need to do this because, when installing a replica set, no node is allowed to hold data except the node where you initiate the configuration.

Inside the “replSet1” folder we only need to delete the content of the local folder; the content of the “MyTestDb” folder isn’t touched. Why do we need to do this? The configuration of the replica set is stored inside the local database. If we don’t delete the content of the local folder, we can’t run the initiate method; we could only use the reconfigure option. To avoid having to differentiate between a new installation and a reinstall, I decided to implement the script as a removal followed by a completely new initiation.
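You can have a look at the stored configuration yourself in the mongo shell:

// the replica set configuration lives in the local database
use local
db.system.replset.findOne()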

After the new installation the replica set should come up with the new configuration. On startup the replica set will begin to sync all data from the “replSet1” folder to all other nodes; all data from the “MyTestDb” folder is synced. When the replication process has finished, you have a freshly configured replica set with the complete content from the old configuration.

I hope this information helps some of you. If you have any questions about this or the setup script, just let me know.

Cheers,

Daniel





Setup MongoDB as a service with PowerShell

27 02 2012

I have written a small PowerShell script to set up a MongoDB instance as a single node or in a replica set configuration. I want to share the script here and hope it is useful for some of you (feedback is welcome). To run the script you need admin rights; otherwise the service can’t be created. The main purpose of the script is to get MongoDB up and running on a Windows PC for local development.

In this post I want to talk about how the script works. In a further post I will provide some information regarding the setup and configuration.

I have separated the script into 3 files: MongoDbSetup.ps1, WebClient.ps1 and Zip.ps1. The MongoDbSetup file is the main script and is responsible for the installation. WebClient and Zip are only small helpers: WebClient.ps1 is used to download a file and display a progress bar while downloading, and Zip.ps1 is used to unzip a zip file to a specified destination folder (used to unpack MongoDB after the download).

What the MongoDB setup script does

The following picture shows a simplified view of what the script does.

[Image: simplified flow of the setup script]

I want to provide you with a bit more detail about the script execution. The first thing we do is set up the folder structure the script expects. We have a download folder where the downloaded MongoDB binaries are stored (zip files). Every installed instance has its own folder, in this case “MongoDB ReplicaSet”. Inside the “MongoDB ReplicaSet” directory we have 3 folders: “bin” for the unzipped MongoDB binaries, “data” for the database files and “log” for all log messages.

[Image: folder structure of an installed instance with the bin, data and log folders]

The location and name of the folders can be defined as parameters when calling the script. Before we start to download the zip file holding the MongoDB binaries, we want to make sure there is no service running with the same name as the service we want to create. The script therefore shuts down existing services so that they can be replaced; this is handy when you want to update an existing instance (for example to a new MongoDB version). Then the download of the zip file with the binaries starts. The download fetches the Windows 64-bit version of the binaries for the specified MongoDB version (tested with versions 2.0.2 and 1.8.5). If the format of the filename on the MongoDB server changes, you will need to update the script. If you install a second node, the download won’t fetch the file from the server as long as the zip file is already inside the download folder (which is created by the installation process).

The next step is to unzip and copy the executables to the bin folder of the new instance. Having a bin folder for every instance makes it possible to run different MongoDB versions on different instances.

Now all preparation is done and we can start with the installation of the single node or the replica set. I will describe the replica set installation because it’s much more interesting; the single node installation is a sub-part of the replica set configuration.

Because we want to be able to change an existing replica set with the script, we first remove the existing instances with the same name. Afterwards we can set up the nodes; the number of nodes is provided through a parameter when calling the script. After the setup of the nodes, another node is installed as arbiter (unless you changed the default configuration). Now all nodes are installed and we need to start the services.

The last step is the configuration of the replica set. We create a file which holds the configuration for all nodes. After creating the file we run mongo.exe with the file as parameter to initiate the replica set configuration. The replica set needs a bit of time until it is up and running. Connect to your newly created instance and check the replica set status by calling rs.status(). Then you are done.
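Once the members have finished their initial sync, a healthy set reports its states roughly like this (host names and ports are the script defaults):

> rs.status().members.forEach(function(m) { print(m.name + ' : ' + m.stateStr); })
localhost:30000 : PRIMARY
localhost:30001 : SECONDARY
localhost:30002 : ARBITER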

As I mentioned above, there are a couple of parameters you can set when calling the script to override the default values. The following table lists these parameters.

Parameter        Default              Usage
version          2.0.2                Version of the MongoDB binaries to use for the new instance.
Mode             ReplicaSet           Options are ReplicaSet and SingleMode, depending on the instance you want to install.
portNumber       30000                Start port number. For replica set nodes the port is increased for every node.
numberNodes      2                    Number of nodes (without arbiter).
useArbiter       True                 Create and use an arbiter.
destinationPath  c:\mongodb\          Path where the installation stores its data.
serviceName      MongoDB ReplicaSet   Name of the service to create. For a replica set a number is appended to the name.

I have uploaded the scripts to GitHub; use them at your own risk 🙂
The repository is located at: https://github.com/danielweberonline/MongoDB-Setup

Cheers,

Daniel





CSV export from MongoDB using PowerShell

7 02 2012

The tools to import and export data on a MongoDB instance are very powerful, and I really like them because they are easy to use. Some additional features would be nice to have, but you can achieve a lot with the current set of tools and options. Detailed information about the import and export tools can be found at the following address: http://www.mongodb.org/display/DOCS/Import+Export+Tools

Mongoexport offers the ability to export data to CSV, which you can easily read with Excel. This allows “normal” users to display, sort and filter data in their familiar environment. Especially for flat documents the CSV export is a great option.

To export data from a collection you can use a command which is similar to this one:

mongoexport -d <databaseName> -c <collectionName> -f "<field1,field2>" --csv -o <outputFile>

But wait, there is one thing I don’t like about this: we must define the fields we want to export. When you use the csv option for mongoexport, the fields option becomes required. What can we do to avoid a hard-coded list of fields? Especially in an environment where many changes happen, you need a solution that works without a manually edited field list.

What we can do is run a map/reduce to get all field names from every document inside a collection. With this result we are able to call mongoexport with a field list generated on the fly. Details about map/reduce for MongoDB can be found at the following address: http://www.mongodb.org/display/DOCS/MapReduce

The map/reduce can look like the following example:

function GetAllFields() {
    // emit every top-level field name of every document under one single key
    map = function() {
        for (var key in this) { emit(1, { "keys" : [ key ] }); }
    };

    // merge all emitted field names into one duplicate-free list
    reduce = function(key, values) {
        var removeDuplicates = function (elements) {
            var result = [],
                listOfElements = {};
            for (var i = 0, elemCount = elements.length; i < elemCount; i++) {
                for (var j = 0, keyCount = elements[i].keys.length; j < keyCount; j++) {
                    listOfElements[elements[i].keys[j]] = true;
                }
            }
            for (var element in listOfElements) {
                result.push(element);
            }
            return result;
        };

        return { "keys" : removeDuplicates(values) };
    };

    var retVal = db.<collectionName>.mapReduce(map, reduce, { out : { inline : 1 } });
    print(retVal.results[0].value.keys);
}
GetAllFields();

This function can be stored inside a js file. Now we need to execute it, which can be done for example with PowerShell and the help of mongo.exe. The result of the script execution is a comma-separated list of all first-level field names of one collection; this is exactly the format we need for the CSV export using mongoexport. Therefore we are ready to go and can call the export to a CSV file.
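Using the BlogPost documents from the trigger hack post above as an example, the generated field list would look similar to this (the order of the fields may differ):

_id,Author,CreationDate,Comment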

The following code shows a PowerShell script which retrieves the field list and runs the export afterwards.

$fieldNames = (mongo.exe <server>/<database> <scriptFile> --quiet)
(mongoexport.exe -d <databaseName> -c <collectionName> -f $fieldNames --csv -o <outputFile>)

Hope this will be useful for someone!