We often required to perform calculations on multiple rows to get the data for reporting, statistical and analytical purpose. In such cases, the aggregation functions available in MySQL are handy and we must be aware of using these functions to fetch the desired data. These functions include SUM, AVG, MAX, MIN, COUNT, and DISTINCT.
The rest of the section of this tutorial explains the usage of these aggregation functions.
Test Data
You can use the below-mentioned queries to prepare the test data to follow the subsequent sections. Create the school database and tables (user and score) to understand the aggregation functions.
# Create the database CREATE SCHEMA `school` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
# Create the user table CREATE TABLE `school`.`user` ( `user_id` BIGINT NOT NULL AUTO_INCREMENT, `first_name` VARCHAR(45) NOT NULL, `last_name` VARCHAR(45) NULL, `section` SMALLINT(6) NULL, `active` TINYINT(1) NOT NULL DEFAULT 0, PRIMARY KEY (`user_id`));
# Create the score table CREATE TABLE `school`.`score` ( `score_id` BIGINT NOT NULL AUTO_INCREMENT, `user_id` BIGINT NOT NULL, `section` SMALLINT(6) NOT NULL DEFAULT 1, `subject` VARCHAR(45) NOT NULL, `score` SMALLINT(6) NOT NULL DEFAULT 0, PRIMARY KEY (`score_id`));
Now add the test data to the user table as shown below.
# User data
INSERT INTO `user` ( `user_id`, `first_name`, `last_name`, `section`, `active` ) VALUES ( 1, 'John', 'Smith', 1, 1 ), ( 2, 'Rick', 'Jones', 1, 1 ), ( 3, 'Catherine', 'Ponting', 2, 1 ), ( 4, 'Harsh', 'Upadhyay', 2, 1 ), ( 5, 'Tajwinder', 'Singh', 2, 0 );
The above query inserts 5 users for sections 1 and 2. Also, insert the score data using the query as shown below.
# Score data INSERT INTO `school`.`score` (`score_id`, `user_id`, `section`, `subject`, `score`) VALUES ('1', '1', '1', 'Maths', '86'), ('2', '1', '1', 'Physics', '75'), ('3', '1', '1', 'Chemistry', '90'), ('4', '2', '1', 'Maths', '64'), ('5', '2', '1', 'Physics', '98'), ('6', '2', '1', 'Chemistry', '56'), ('7', '3', '1', 'Maths', '48'), ('8', '3', '1', 'Physics', '69'), ('9', '3', '1', 'Chemistry', '54'), ('10', '3', '2', 'Maths', '82'), ('11', '3', '2', 'Physics', '85'), ('12', '3', '2', 'Chemistry', '91'), ('13', '4', '2', 'Maths', '65'), ('14', '4', '2', 'Physics', '74'), ('15', '4', '2', 'Chemistry', '84');
We have inserted the score of the active students. Also, note that the score for the student with id 1 and 2 is added for section 1 for all the 3 subjects. The above query also inserts the score data of student with id 3 for sections 1 and 2. The student having id 4 got the score data for section 2. Now we have good test data to start learning the aggregate functions.
SUM
This section explains the usage of the aggregate function sum to get the section-wise score of the users to get the total score of all the subjects for all the sections as shown below.
# SUM - Total Score of the Users for each section SELECT `user`.`first_name`, `user`.`last_name`, `score`.`section`, SUM(`score`) AS total_score FROM `user`, `score` WHERE `user`.`user_id` = `score`.`user_id` GROUP BY `score`.`user_id`, `score`.`section`;
# Result
first_name last_name section total_score ---------------------------------------------------- John Smith 1 251 Rick Jones 1 218 Catherine Ponting 1 171 Catherine Ponting 2 258 Harsh Upadhyay 2 223
The above query results can be used to present the report showing the total score of all the users for each section they have studied in the school.
AVG
The aggregate function AVG can be used to get the average value of the columns qualified for the aggregation based on the WHERE conditions and the grouping applied by us.
We can obtain the section-wise average marks in each subject as shown below.
# AVG - Section wise average score in each subject SELECT `score`.`section`, `score`.`subject`, AVG(`score`) AS avg_score FROM `score` GROUP BY `score`.`section`, `score`.`subject`;
# Result
section subject avg_score -------------------------------------------- 1 Maths 66.0000 1 Physics 80.6667 1 Chemistry 66.6667 2 Maths 73.5000 2 Physics 79.5000 2 Chemistry 87.5000
Now we can use the above data to generate the report showing the average score obtained by the users for each section.
MAX
The aggregate function MAX can be used to find out the maximum value of the columns qualified for the aggregation based on the WHERE conditions and the grouping applied by us.
We can obtain the section-wise maximum marks in each subject as shown below.
# MAX - Section wise maximum score in each subject SELECT `score`.`section`, `score`.`subject`, MAX(`score`) AS max_score FROM `score` GROUP BY `score`.`section`, `score`.`subject`;
# Result section subject max_score -------------------------------------------- 1 Maths 86 1 Physics 98 1 Chemistry 90 2 Maths 82 2 Physics 85 2 Chemistry 91
MIN
The aggregate function MIN can be used to find out the minimum value of the columns qualified for the aggregation based on the WHERE conditions and the grouping applied by us.
We can obtain the section-wise minimum marks in each subject as shown below.
# MIN - Section wise minimum score in each subject SELECT `score`.`section`, `score`.`subject`, MIN(`score`) AS min_score FROM `score` GROUP BY `score`.`section`, `score`.`subject`;
# Result
section subject max_score -------------------------------------------- 1 Maths 48 1 Physics 69 1 Chemistry 54 2 Maths 65 2 Physics 74 2 Chemistry 84
COUNT
The aggregate function COUNT can be used to get the total number of values in the specified columns.
Using the test data, we can get the section-wise total active users as shown below.
# COUNT - Total active users in each section SELECT `user`.`section`, COUNT(`user_id`) AS active_users FROM `user` WHERE `user`.`active` = 1 GROUP BY `user`.`section`;
# Result section active_user -------------------- 1 2 2 2
DISTINCT
We can use the keyword DISTINCT to omit duplicate records. In our scenario, we can get the user data who obtained
# DISTINCT - Get list of users who got score SELECT DISTINCT `user`.`user_id`, `user`.`first_name`, `user`.`last_name` FROM `user`, `score` WHERE `user`.`user_id` = `score`.`user_id`;
# Result user_id first_name last_name -------------------------------------------- 1 John Smith 2 Rick Jones 3 Catherine Ponting 4 Harsh Upadhyay
We can see that the inactive user who did not score even once is omitted from the query results.
This is how we can use the aggregate functions to obtain the data for reporting and analysis purpose. These functions are also important for managerial activities to take organization decisions using the data accumulated over time.