Using Python+MongoDB for rapid and scalable app development

What is MongoDB?

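MongoDB is a document-oriented database management system (DBMS) with flexible schemas whose source code is publicly available. Releases up to October 2018 were published under the GNU AGPL version 3.0, later releases use MongoDB's Server Side Public License (SSPL), and commercial license terms are also offered.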

A document is a set of key/value pairs with a dynamic schema. A dynamic schema means that documents in the same collection do not need to share the same set of fields or the same structure, and that a field common to several documents can hold a different type of data in each of them.

Like other document-oriented DBMSs, MongoDB is not a relational DBMS. Atomicity is guaranteed at the level of a single document, so a partial update of a document cannot happen. Historically there was no concept of a multi-document “transaction”; since version 4.0 MongoDB supports multi-document transactions on replica sets, but the single-document model remains the usual way to work. Isolation is also weaker than in a typical relational DBMS: outside of a transaction, data read by one client can be changed by another client at the same time.

If you are new to MongoDB, you can find a detailed comparison between Firebase, AWS and MongoDB in “Three Modern Technologies Software Stacks: Firebase vs. AWS vs. MongoDB”.
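Document-level atomicity is easiest to see in an update that changes several fields of one document at once. The sketch below is only an illustration under stated assumptions: the function name record_view and the fields views and last_viewed are hypothetical, and posts is assumed to be a PyMongo collection. update_one() with the $inc and $set operators is applied to the matching document as a single step, so no reader sees one field changed without the other.

```python
import datetime


def record_view(posts, post_id):
    """Increment a view counter and stamp the time in one atomic update.

    `posts` is assumed to be a PyMongo collection; the `views` and
    `last_viewed` fields are hypothetical names chosen for this sketch.
    """
    now = datetime.datetime.now(tz=datetime.timezone.utc)
    # Both operators target the same document in a single operation,
    # so the change is all-or-nothing for that document.
    return posts.update_one(
        {"_id": post_id},
        {"$inc": {"views": 1}, "$set": {"last_viewed": now}},
    )
```

The function returns whatever update_one() returns, so the caller can inspect the result (for example its matched_count) to see whether the document existed.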

What is PyMongo?

PyMongo is a Python distribution containing tools for working with MongoDB, and it is the officially recommended way to work with MongoDB from Python.

MongoDB inside PyMongo

In PyMongo we use dictionaries to represent documents. As an example, the following dictionary can be used to represent a blog post:

import datetime

post = {"author": "Mike",
        "text": "My first blog post!",
        "tags": ["mongodb", "python", "pymongo"],
        "date": datetime.datetime.now(tz=datetime.timezone.utc)}

Documents can contain native Python types (for example, datetime.datetime). These native types are automatically converted to and from the appropriate BSON types.
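One detail of that conversion is worth knowing: BSON stores dates as whole milliseconds since the Unix epoch, so the sub-millisecond part of a Python datetime does not survive a round trip. The helper below is a sketch (the name to_bson_precision is made up for this article, and it assumes the extra digits are simply truncated) that reproduces the loss of precision locally, which can be handy when comparing a value you inserted with the value you read back.

```python
import datetime


def to_bson_precision(dt):
    """Return `dt` with microseconds truncated to whole milliseconds."""
    return dt.replace(microsecond=(dt.microsecond // 1000) * 1000)


original = datetime.datetime(2009, 11, 12, 11, 14, 30, 123456)
stored = to_bson_precision(original)
print(stored.microsecond)  # 123000
```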

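To add a document to a collection, you can use the insert_one() method (db here is a database handle obtained from a client connection, as described in the connection section further down):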

posts = db.posts

post_id = posts.insert_one(post).inserted_id

One of the most basic types of query is find_one(). This method returns a single document matching a query, or None if there is no match. Here we use find_one() to get the first document from the posts collection:

>>> posts.find_one()

The result is the document that we inserted previously. find_one() also supports querying on specific elements that the resulting document must match. To get more than one document, we use the find() method. find() returns a Cursor instance, which allows us to iterate over all matching documents. Just as with find_one(), we can pass a query document to find() to limit the returned results. Here we get only the documents whose author is “Mike”:

for post in posts.find({"author": "Mike"}):
    print(post)
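A cursor can also be sorted and limited before it is iterated. The following is a minimal sketch rather than a recommended API: the helper name recent_posts_by is hypothetical, posts is assumed to be a PyMongo collection, and the documents are assumed to carry a date field like the blog post above.

```python
def recent_posts_by(posts, author, limit=5):
    """Return up to `limit` posts by `author`, newest first."""
    cursor = (
        posts.find({"author": author})
        .sort("date", -1)  # -1 means descending (pymongo.DESCENDING)
        .limit(limit)
    )
    # Materialize the cursor so the caller gets a plain list.
    return list(cursor)
```

Calling recent_posts_by(posts, "Mike", limit=2) would return at most two of Mike's posts, the most recent one first.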

At the core of PyMongo is the MongoClient object, which is used to make connections and queries to a MongoDB database cluster. It can connect to a standalone MongoDB instance, a replica set, or one or more mongos instances.

MongoDB with MongoEngine

Python and PyMongo permit direct coding against MongoDB from Python. This is best compared to programming at the level of raw SQL for an RDBMS. That level is a necessary building block, but for most applications it is more appropriate to work at a higher level and build upon custom classes. One of the most popular Object-Document Mappers (ODMs) for Python and MongoDB is MongoEngine.

Entity Design

Entity design in MongoDB and other document databases is very different from third-normal-form design for SQL tables. To be effective with MongoDB, you need to master this skill as a developer: getting your entity design right is key to high-performance, flexible applications.
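To make the contrast concrete, here is a small sketch using plain Python dictionaries. All field names and sample comments are hypothetical and chosen only for illustration. In the relational style the comments live in their own table and point back to the post, while in the document style they are embedded in the post itself.

```python
# Relational style: comments are separate rows that reference the post
# through a foreign key (post_id).
post_row = {"id": 1, "author": "Mike", "text": "My first blog post!"}
comment_rows = [
    {"id": 10, "post_id": 1, "author": "Eliot", "text": "Nice post"},
    {"id": 11, "post_id": 1, "author": "Mike", "text": "Thanks"},
]

# Document style: the comments are embedded in the post document, so a
# single read returns the post together with its comments.
post_document = {
    "author": "Mike",
    "text": "My first blog post!",
    "comments": [
        {"author": "Eliot", "text": "Nice post"},
        {"author": "Mike", "text": "Thanks"},
    ],
}

commenters = [c["author"] for c in post_document["comments"]]
print(commenters)  # ['Eliot', 'Mike']
```

Embedding is not always the right answer; whether data should be embedded or referenced depends on how it is read and updated, which is why entity design deserves care.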

Connecting and inserting into a database

Probably the best thing about a PyMongo connection is that it is automatically pooled. PyMongo maintains a pool of connections to the MongoDB server and reuses them over the lifetime of your application, so it does not pay the overhead of establishing a connection for every operation. With the current MongoClient this happens automatically: a connection is checked out of the pool for each operation and returned when the operation finishes. (Older PyMongo releases required you to notify the driver manually that you were “done” with a connection so that it could be reused.) A simple way to connect to a MongoDB database from Python is shown below:

In: import pymongo
In: conn = pymongo.MongoClient()

Inserting documents starts with selecting a database. To create a database, you do… well, nothing, actually. MongoDB creates a database for you automatically the first time you store data in it. Once you have your database, you need to decide which “collection” to store your documents in. To create a collection, you do… right, nothing.

In: db = conn.tutorial
In: db.test
Out: Collection(Database(MongoClient(...), 'tutorial'), 'test')
In: db.test.insert_one({'name': 'My Document', 'ids': [1,2,3], 'subdocument': {'a':2}}).inserted_id
Out: ObjectId('4f25bcffeb033049af000000')

Here insert_one() returned a result object whose inserted_id is an ObjectId value. This is the value that PyMongo generated for the _id property, the “primary key” of a MongoDB document. We can also specify the _id manually if we want, and we don't have to use ObjectIds:

In: db.test.insert_one({'_id': 42, 'name': 'My Document', 'ids': [1,2,3], 'subdocument': {'a':2}}).inserted_id
Out: 42
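An ObjectId is not an opaque value: its first four bytes hold the creation time as seconds since the Unix epoch. The sketch below decodes that timestamp from the hex string of the ObjectId shown above using only the standard library; the function name objectid_created_at is made up for this article. When the bson package is available, the generation_time attribute of an ObjectId gives the same information.

```python
import datetime


def objectid_created_at(hex_id):
    """Decode the creation time stored in the first 4 bytes of an ObjectId.

    `hex_id` is the 24-character hexadecimal string form of the ObjectId.
    """
    # The first 8 hex digits are a big-endian count of seconds since the epoch.
    seconds = int(hex_id[:8], 16)
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


created = objectid_created_at("4f25bcffeb033049af000000")
print(created.isoformat())
```

This is one reason ObjectIds are a convenient default: sorting by _id roughly sorts documents by creation time.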

Counting

If we just want to know how many documents match a query we can perform a count_documents() operation instead of a full query. We can get a count of all of the documents in a collection:

>>> posts.count_documents({})

3

or just of those documents that match a specific query:

>>> posts.count_documents({"author": "Mike"})

2

Bulk Inserts

To make querying a little more interesting, let's insert a few more documents. In addition to inserting a single document, we can also perform bulk insert operations by passing a list as the first argument to insert_many(). This will insert each document in the list, sending only a single command to the server:

>>> new_posts = [{"author": "Mike",
...               "text": "Another post!",
...               "tags": ["bulk", "insert"],
...               "date": datetime.datetime(2009, 11, 12, 11, 14)},
...              {"author": "Eliot",
...               "title": "MongoDB is fun",
...               "text": "and pretty easy too!",
...               "date": datetime.datetime(2009, 11, 10, 10, 45)}]
>>> result = posts.insert_many(new_posts)
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...')]

There are two interesting things to note about this example:

  • The result from insert_many() now returns two ObjectId instances, one for each inserted document.
  • new_posts[1] has a different “shape” than the other posts – there is no “tags” field and we’ve added a new field, “title”. This is what we mean when we say that MongoDB is schema-free.
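Because documents in one collection can have different shapes, code that reads them should not assume that every field is present. A minimal sketch of one way to handle this with dict.get() follows; the helper name describe and the fallback strings are hypothetical, and the sample documents mirror the two posts above without their date fields.

```python
def describe(post):
    """Build a one-line summary that tolerates missing optional fields."""
    title = post.get("title", "(untitled)")
    tags = ", ".join(post.get("tags", [])) or "no tags"
    return f"{post['author']}: {title} [{tags}]"


new_posts = [
    {"author": "Mike", "text": "Another post!", "tags": ["bulk", "insert"]},
    {"author": "Eliot", "title": "MongoDB is fun", "text": "and pretty easy too!"},
]
for post in new_posts:
    print(describe(post))
# Mike: (untitled) [bulk, insert]
# Eliot: MongoDB is fun [no tags]
```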

Indexing and profiling

Indexes are the single biggest contributor to high performance in MongoDB deployments and applications, so make sure that your applications use them to full advantage. Finding the queries that need to be optimized can be tricky, especially when there is a translation layer in the middle, such as an ODM like MongoEngine.

MongoDB has an extremely fast kind of query, known as a covered query, which it can use when it does not have to scan any documents, only the index entries. This happens when the only data you filter on and return from a query is part of the index:

In: db.test.find({'a':2}, {'a':1, '_id':0}).explain()
Out:
u'indexBounds': {u'a': [[2, 2]]},
u'indexOnly': True,
u'isMultiKey': False,

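Here the indexOnly field is True, which means that MongoDB only had to inspect the index (and not the actual collection data) to satisfy the query. The indexOnly field belongs to the explain format of MongoDB versions before 3.0; newer versions signal a covered query through totalDocsExamined: 0 in the executionStats section of the explain output.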
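Checking an explain() result by eye gets tedious when many queries are being profiled. The helper below is a heuristic sketch, not an official API: the name looks_covered is made up, and it assumes the plan is the dictionary returned by explain(). It accepts both the legacy indexOnly flag and the newer executionStats counters, where a covered query examines index keys but no documents.

```python
def looks_covered(plan):
    """Heuristically decide whether an explain() result describes a covered query."""
    # Legacy (pre-3.0) explain output carries an explicit flag.
    if "indexOnly" in plan:
        return bool(plan["indexOnly"])
    # Newer output: index keys were examined but no documents were fetched.
    stats = plan.get("executionStats", {})
    return (
        stats.get("totalKeysExamined", 0) > 0
        and stats.get("totalDocsExamined", 0) == 0
    )
```

For the query above to be covered, an index on a must exist, for example one created with db.test.create_index([('a', 1)]), and the projection must exclude _id as shown.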

GridFS

MongoDB has a facility called GridFS for storing, classifying, and querying files of virtually unlimited size, whether they hold binary data, text data, or anything else. From Python you can upload, download, and list files in GridFS. You can also store custom data alongside your GridFS files, which can then be used for rich reporting and querying that does not exist in standard file systems.

Creating a GridFS instance to use:

>>> from pymongo import MongoClient

>>> import gridfs

>>>

>>> db = MongoClient().gridfs_example

>>> fs = gridfs.GridFS(db)
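With a GridFS instance in hand, uploading and downloading come down to put() and get(). The following sketch assumes fs is a gridfs.GridFS object like the one created above; the helper name store_and_read_back is hypothetical.

```python
def store_and_read_back(fs, data, filename):
    """Store `data` (bytes) in GridFS under `filename`, then read it back."""
    # put() writes the bytes and returns the _id of the new file document.
    file_id = fs.put(data, filename=filename)
    # get() returns a file-like object for the stored file.
    stored = fs.get(file_id)
    return file_id, stored.read()
```

For example, store_and_read_back(fs, b"hello world", "hello.txt") would return the new file's _id together with the bytes b"hello world", and fs.list() would afterwards include "hello.txt" among the stored filenames.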


Replication

Replication is key to MongoDB's fault tolerance. It is also used for data locality across data centers, scaled-out reads, offsite backups, reporting without performance degradation, and more. PyMongo makes working with replica sets easy: it can be used both to initialize a new replica set and to connect to it in normal operation.
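For normal connections, a replica set is usually addressed through a connection URI that lists several members and names the set with the replicaSet option. The helper below is a sketch that only builds such a URI with the standard library; the function name replica_set_uri, the host names, and the set name rs0 are all hypothetical.

```python
from urllib.parse import quote_plus


def replica_set_uri(hosts, replica_set, username=None, password=None):
    """Build a mongodb:// URI listing several replica set members."""
    credentials = ""
    if username is not None and password is not None:
        # Percent-escape reserved characters in the credentials.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    return f"mongodb://{credentials}{','.join(hosts)}/?replicaSet={replica_set}"


uri = replica_set_uri(["db1.example.com:27017", "db2.example.com:27017"], "rs0")
print(uri)
# mongodb://db1.example.com:27017,db2.example.com:27017/?replicaSet=rs0
```

The resulting string can be passed to MongoClient(uri); the client then discovers the members of the set and follows the primary if it changes.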

Are you looking to develop software using Python and MongoDB for your business? Solace is the right place to start. Our team of experts believes in the effectiveness of PyMongo in development and will help you build the best software for your needs. Contact us for any software development using PyMongo.