Add partitions periodically in AWS RDS Aurora Cluster (MySQL compatible) with the SQL Procedures

Add partitions periodically in AWS RDS Aurora Cluster (MySQL compatible) with the SQL Procedures
Page content

Introduction

RDS Aurora is a managed service provided by AWS. Aurora is a OSS compatible relational database built on cloud.

In this article, I explain table partitioning with date in RDS Aurora MySQL compatible.

Goals

  1. Create table partitioned by date on RDS Aurora (MySQL compatible)
  2. Add a partition for a day periodically (once a day)
  3. Drop records after a certain period of time
  4. Handle errors in these operation running

As an example, I handle the following hoge table.

columntypesnote
idvarchar(255)User ID
infovarchar(255)Some information about the user for id
create_attimestampDate and time when the record was created

Add a partition once a day

Create hoge table by the following steps.

  1. Create a certain number of partitions at first
  2. Regist a SQL procedure to add a partition
  3. Run the procedure once a day

Create partition table

First, execute the following query and create hoge table.

 1-- create hoge table
 2DROP TABLE IF EXISTS hoge;
 3CREATE TABLE hoge (
 4  id varchar(255) NOT NULL,
 5  info varchar(255) NOT NULL,
 6  create_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 7  PRIMARY KEY (id, create_at),
 8  INDEX index_id (id)
 9) ENGINE=InnoDB DEFAULT CHARSET=utf8;
10
11ALTER TABLE hoge PARTITION BY RANGE (UNIX_TIMESTAMP(create_at)) (
12  PARTITION p20180219 VALUES LESS THAN (UNIX_TIMESTAMP('2018-02-19 00:00:00'))
13);

In this partitioning scheme, naming rules of partition is p + yyyyMMdd. create_at column is a partition key. id column and create_at column are Composite Primary Keys. The partition key must be included in the Primary keys.

The partition p20180219 is created by PARTITION BY RANGE clause, but p20180219 can store only the data for the create_at column before 2018-02-19 00:00:00.

Regist a SQL procedure to add a partition

Second, Create a SQL procedure to add a partition. One partition can store in range of 24 hours of data. To make the code easier to see, validation is omitted.

 1--
 2-- A procedure to add partition to hoge table.
 3-- This procedure creates partitions in the range of days 'from_date' to 'to_date - 1'.
 4-- 
 5-- Arguments 
 6--       from_date: The start of date time to create partitions
 7--       to_date: The end of date time to create partitions
 8-- Execution sample
 9--       CALL add_hoge_partition(str_to_date('2018-01-01', '%Y-%m-%d'), str_to_date('2019-01-01', '%Y-%m-%d'));
10
11DROP PROCEDURE IF EXISTS add_hoge_partition;
12DELIMITER $$
13CREATE PROCEDURE add_hoge_partition(IN from_date DATE, IN to_date DATE)
14  proc_label:BEGIN
15    DECLARE target_date DATE;
16    DECLARE partition_range DATE;
17    DECLARE p_count INT;
18
19    SET target_date = from_date;
20    WHILE DATEDIFF(to_date, target_date) > 0 DO
21
22      SET partition_range = DATE_ADD(target_date, INTERVAL 1 DAY);
23      SELECT CONCAT(
24        'ALTER TABLE hoge ADD PARTITION ( PARTITION ',
25        DATE_FORMAT(target_date, 'p%Y%m%d'),
26        ' VALUES LESS THAN (UNIX_TIMESTAMP(', QUOTE(DATE_FORMAT(partition_range, '%Y-%m-%d 00:00:00')), ')))'
27      ) INTO @ddl;
28
29      PREPARE ddl_stmt FROM @ddl;
30      EXECUTE ddl_stmt;
31      DEALLOCATE PREPARE ddl_stmt;
32
33      SET target_date = DATE_ADD(target_date, INTERVAL 1 DAY);
34    END WHILE;
35
36  END$$
37DELIMITER ;

Now add initial partitions using add_hoge_partition. Add the partition for 365 days with the current date as the base date.

1CALL add_hoge_partition(CURDATE(), DATE_ADD(CURDATE(), INTERVAL 365 DAY));

Create Event task for procedure call

Use MySQL Events that are tasks that run according to a schedule. To use that in RDS, need to set event_scheduler to ON in parameter group of RDS. A reboot of RDS instances is needless.

Call MySQL Events from SQL with the following query.

 1CREATE EVENT add_hoge_partition
 2ON SCHEDULE EVERY 1 DAY STARTS '2018-02-19 00:00:00'
 3DO CALL
 4    add_hoge_partition(
 5        (
 6          select 
 7            from_unixtime(max(PARTITION_DESCRIPTION)) 
 8          from 
 9            INFORMATION_SCHEMA.PARTITIONS 
10          where 
11            TABLE_NAME = 'hoge', 
12          DATE_ADD(
13            (
14              select 
15                from_unixtime(max(PARTITION_DESCRIPTION)) 
16              from 
17                INFORMATION_SCHEMA.PARTITIONS 
18              where 
19                TABLE_NAME = 'hoge'
20            ),
21            INTERVAL 1 DAY
22          )
23        );

This query get the latest partition information from INFORMATION_SCHEMA, and calculate target partition date that should be created.

Now the add_hoge_partition procedure is called every day at 0 o'clock.

On RDS Aurora Cluster, only writer instance executes MySQL Events

Make sure that MySQL Events runs only on Writer instance.

  • Execute the below query on Writer instance
1select * from INFORMATION_SCHEMA.PROCESSLIST where USER = 'event_scheduler' limit 10;
2> 1	event_scheduler	localhost		Daemon	40803	Waiting for next activation
  • Execute the below query on Reader instance
1select * from INFORMATION_SCHEMA.PROCESSLIST where USER = 'event_scheduler' limit 10;
2> Empty set (0.01 sec)

If the role of database change with failover, only new switched writer instance executes the events.

Error handling on Aurora

If there is no way to detect an error, nobody notices database problems. This sentence describes error handling of Aurora instance in procedure call.

Notification with AWS Lambda

RDS Aurora(MySQL Compatible) has mysql.lambda_async procedure as a default. The mysql.lambda_async procedure can execute AWS Lambda function directly with messages. Developers can be notified some database troubles using mysql.lambda_async .

The steps will be explained below.

Add IAM Role to Aurora cluster

When RDS Aurora executes AWS Lambda function, it should have a execution permission. Now, create rdsToLambdaRole to do that. The IAM role has a access policy as below

1"Action": [
2  "lambda:InvokeFunction"
3]

Set Amazon Resource Name(ARN) of rdsToLambdaRole to parameter named aws_default_lambda_role in RDS cluster parameter group. Since this parameter is dynamic, rebooting of RDS cluster is needless.

rds_to_lambda_role

Set network routing for subnets

RDS instances need to be permitted outbound communication. Generally, database instances are placed in private subnets, so developers should allow the subnets to communicate to outbound.

Add error handling to procedure

For error handling, add HANDLER declaration that executes AWS Lambda function to procedure. Call Lambda function named rds_monitor using mysql.lambda_async procedure. The mysql.lambda_async procedure requires ARN of AWS Lambda as argument.

Code snippet is as below.

 1    -- Notify when errors occured
 2    DECLARE EXIT HANDLER FOR SQLEXCEPTION
 3      BEGIN
 4        GET DIAGNOSTICS CONDITION 1 @p1 = RETURNED_SQLSTATE, @p2 = MESSAGE_TEXT;
 5        CALL mysql.lambda_async(
 6          'arn:aws:lambda:${your_region}:${your_account_id}:function:rds_monitor',
 7          CONCAT('{"message":"', @p2, '",',
 8                  '"state":"', @p1, '"}')
 9        );
10      END;

Add the above snippet to the previous procedure code.

 1DROP PROCEDURE IF EXISTS add_hoge_partition;
 2DELIMITER $$
 3CREATE PROCEDURE add_hoge_partition(IN from_date DATE, IN to_date DATE)
 4  proc_label:BEGIN
 5    DECLARE target_date DATE;
 6    DECLARE partition_range DATE;
 7    DECLARE p_count INT;
 8    DECLARE EXIT HANDLER FOR SQLEXCEPTION
 9      BEGIN
10        GET DIAGNOSTICS CONDITION 1 @p1 = RETURNED_SQLSTATE, @p2 = MESSAGE_TEXT;
11        CALL mysql.lambda_async(
12          'arn:aws:lambda:${your_region}:${your_account_id}:function:rds_monitor',
13          CONCAT('{"message":"', @p2, '",',
14                  '"state":"', @p1, '"}')
15        );
16      END;
17
18    SET target_date = from_date;
19    WHILE DATEDIFF(to_date, target_date) > 0 DO
20
21      SET partition_range = DATE_ADD(target_date, INTERVAL 1 DAY);
22      SELECT CONCAT(
23        'ALTER TABLE hoge ADD PARTITION ( PARTITION ',
24        DATE_FORMAT(target_date, 'p%Y%m%d'),
25        ' VALUES LESS THAN (UNIX_TIMESTAMP(', QUOTE(DATE_FORMAT(partition_range, '%Y-%m-%d 00:00:00')), ')))'
26      ) INTO @ddl;
27
28      PREPARE ddl_stmt FROM @ddl;
29      EXECUTE ddl_stmt;
30      DEALLOCATE PREPARE ddl_stmt;
31
32      SET target_date = DATE_ADD(target_date, INTERVAL 1 DAY);
33    END WHILE;
34
35  END$$
36DELIMITER ;

Implement AWS Lambda function

Implement AWS Lambda function to handle the message from Aurora. In case of me, I implement to notify errors to datadog that is a monitoring service.

[Addition] Error log on RDS instances

To check error log for RDS instances, I opened error/mysql-error-running.log on AWS Management Console. However, despite the fact that log file size is 38.2 kB, the log file shows only ‘END OF LOG’ as below.

rds error log is empty

Actually, this is the specification of RDS. The file includes the log used by AWS, and the log is not displayed to AWS users. For this reason, the file size is not 0.

Conclusion

It is available to

  • Call procedure and add partition table periodically.
  • Handle errors when error occured in procedure call.