February 22nd, 2016

MongoDB Aggregation Query: Getting All Records having Max Value

In this post, we will write a query using MongoDB aggregate method to get all records having Max (Highest) value for specific field. If you are new to aggregation framework, I would recommend to read following post first:
MongoDB Query: Select and Filter Child Array
Consider a collection school with the following documents:

{
    "_id" : 1,
    "name" : "Class 1",
    "students" : [  
        { "rollNo" : 10001, "name" : "Ram", "score" : 65 },
        { "rollNo" : 10002, "name" : "Shyam", "score" : 90 },
        { "rollNo" : 10003, "name" : "Mohan", "score" : 75 }       
        ]
},
{
    "_id" : 2,
    "name" : "Class 2",
    "students" : [  
        { "rollNo" : 20001, "name" : "Krishna", "score" : 88 },
        { "rollNo" : 20002, "name" : "Sohan", "score" : 91 },
        { "rollNo" : 20003, "name" : "Radhika", "score" : 82 },
        { "rollNo" : 20004, "name" : "Komal", "score" : 55 },
		{ "rollNo" : 20005, "name" : "Sonam", "score" : 91 }            
        ]
},
{
    "_id" : 3,
    "name" : "Class 3",
    "students" : [  
        { "rollNo" : 30001, "name" : "Monika", "score" : 77 },      
        { "rollNo" : 30002, "name" : "Rahul", "score" : 81 }		   		
        ]
}

Note: MongoDB 3.2 is used for this post.

Here our objective is to get list of topper students (Highest score in the class).

MongoDB aggregation offers the $max operator, but in our case we want the “whole” student record as it is. So the appropriate thing to do here is $sort and then use the $first operator within a $group

db.school.aggregate([
	{ $unwind :'$students'},
    { "$sort": { "name": 1, "students.score": -1 } },
    { "$group": {
        "_id": "$name",
        "name": { "$first": "$students.name" },
        "rollNo": { "$first": "$students.rollNo" },
        "score": { "$first": "$students.score" }
    }}
])

Output:

{ "_id" : "Class 1", "name" : "Shyam", "rollNo" : 10002, "score" : 90 }
{ "_id" : "Class 2", "name" : "Sohan", "rollNo" : 20002, "score" : 91 }
{ "_id" : "Class 3", "name" : "Rahul", "rollNo" : 30002, "score" : 81 }

But the above approach is not useful if multiple students have same highest score.

In another approach, first Max score is calculated for each class and then student record is filtered with max score. Let’s see following query based on it.

db.school.aggregate([{
	$unwind: '$students'
}, {
	"$group": {
		"_id": "$name",
		"maxScore": {
			"$max": "$students.score"
		},
		"studentgrp": {
			"$push": {
				"class": "$name",
				"name": "$students.name",
				"rollNo": "$students.rollNo",
				"score": "$students.score"
			}
		}
	}
}, {
	"$project": {
		_id: 0,
		toppers: {
			"$setDifference": [{
					"$map": {
						"input": "$studentgrp",
						"as": "student",
						"in": {
							"$cond": [{
									"$eq": ["$maxScore", "$$student.score"]
								},
								"$$student",
								false
							]
						}
					}
				},
				[false]
			]
		}

	}
}, {
	$unwind: '$toppers'
}, {
	$project: {
		class: "$toppers.class",
		name: "$toppers.name",
		rollNo: "$toppers.rollNo",
		score: "$toppers.score"
	}
}])

Here are the steps done in above query:
1. unwind students array
2. group by class name, get max score and push items in studentgrp.
3. In project, $map is used to generate array which has either top student record (comparing score with maxscore) or false. false items are filtered out with $setDifference.
4. unwind the generated $toppers
5. select the student information

It is okay for small data samples. But for large data, the above query should be divided into multiple parts.

Output:

{ "class" : "Class 1", "name" : "Shyam", "rollNo" : 10002, "score" : 90 }
{ "class" : "Class 2", "name" : "Sohan", "rollNo" : 20002, "score" : 91 }
{ "class" : "Class 2", "name" : "Sonam", "rollNo" : 20005, "score" : 91 }
{ "class" : "Class 3", "name" : "Rahul", "rollNo" : 30002, "score" : 81 }

You can see there are TWO records with same score for Class 2.

In this post, We have seen two approaches to get records having Max (Highest) value for specific field. One is to sort in descending order, select first record and another is calculate max value, select records based on it. How are you doing? Feel free to share your opinion, suggestion or query in the comment box.