MySQL Stored Procedures: How and why with examples
MySQL allows you to define stored procedures on your databases that can execute arbitrary SQL statements whenever you invoke them, including updating data in one or more tables. This tutorial will teach you how to use stored procedures and give you some ideas about what they're good for.
What's a stored procedure?
A stored procedure is a set of SQL statements that you invoke with the CALL
keyword. They can accept parameters that change how the procedure operates. Here's a simple classic example using MySQL stored procedure syntax: defining a stored procedure that inserts values into multiple tables.
mysql> CREATE TABLE employees (
id bigint primary key auto_increment,
first_name varchar(100),
last_name varchar(100));
mysql> CREATE TABLE birthdays (
emp_id bigint,
birthday date,
constraint foreign key (emp_id) references employees(id)
);
mysql> delimiter \\
mysql> CREATE procedure new_employee(
first char(100),
last char(100),
birthday date)
BEGIN
INSERT INTO employees (first_name, last_name) VALUES (first, last);
SET @id = (SELECT last_insert_id());
INSERT INTO birthdays (emp_id, birthday) VALUES (@id, birthday);
END;
//
mysql> delimiter ;
mysql> call new_employee("tim", "sehn", "1980-02-03");
Query OK, 1 row affected (0.02 sec)
mysql> mysql> SELECT * FROM birthdays;
+--------+------------+
| emp_id | birthday |
+--------+------------+
| 1 | 1980-02-03 |
+--------+------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM employees;
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 1 | tim | sehn |
+----+------------+-----------+
1 row in set (0.00 sec)
This procedure inserts a row into both the employees
and birthdays
tables, using the generated employee ID from the first INSERT
to do the second.
Let's break down the definition of this stored procedure and see what each part does.
Defining a stored procedure
The procedure we defined above looks like this. I'm commenting every element in the definition to make it clear what it does.
delimiter // -- since our stored procedure contains multiple statement separated with ";",
-- we need to tell the MySQL shell not to try to execute the statement when
-- it encounters a ";" like it normally would. Instead, it should wait for "//"
CREATE procedure new_employee( -- the name of our procedure is new_employee
first char(100), -- the first param is called "first" and is a character string
last char(100), -- the second param is called "last" and is a character string
birthday date) -- the third param is called "birthday" and is a date
BEGIN -- since our procedure body has multiple statements, we wrap them in a BEGIN .. END block
INSERT INTO employees (first_name, last_name) VALUES (first, last); -- insert into employees table
SET @id = (SELECT last_insert_id()); -- assign the auto-generated ID from the INSERT to a variable
INSERT INTO birthdays (emp_id, birthday) VALUES (@id, birthday); -- INSERT into the second table
END; -- end of the BEGIN .. END block
//
delimiter ; -- now that we're done defining our procedure, change the delimiter back to ";"
So a procedure has a number of parts that must be declared in the order above. They are:
- The name of the procedure. Procedure names must be unique in a schema.
- A list of parameters (could be none). Each parameter must have:
IN
orOUT
. Out-params are how MySQL returns values from stored procedures (see next section). Params areIN
by default.- A name, which must be unique in the parameter list
- A type, which MySQL will convert automatically if possible (like how the string "1980-02-03" got converted to a date)
BEGIN
starts the body of the procedure, like a curly brace in most programming languages. This is optional if your procedure has only a single statement- One or more statements for the body of the procedure, each ending in
;
END
ends the body of the procedure
Also note the use of DELIMITER
. This isn't a SQL statement, it's a command to the MySQL shell. These are only necessary if you're using the MySQL shell to define your procedure. If you're using another tool, like a GUI workbench, it will already correctly interpret the ;
characters in the body of the procedure.
OUT
parameters
MySQL stored procedures don't use the return
keyword to return a value to the caller like other programming languages. Instead, you declare parameters as OUT
rather than IN
, and then they're set in the procedure body. Here's a simple example.
delimiter //
CREATE PROCEDURE birthday_count(
IN bday date,
OUT count int)
BEGIN
SET count = (SELECT count(*) FROM birthdays WHERE birthday = bday);
END
//
delimiter ;
-- couple other birthdays
CALL new_employee('aaron', 'son', '1985-01-10');
CALL new_employee('brian', 'hendriks', '1985-01-10');
SET @count = 0;
call birthday_count('1985-01-10', @count);
Query OK, 0 rows affected (0.00 sec)
SELECT @count;
+--------+
| @count |
+--------+
| 2 |
+--------+
1 row in set (0.00 sec)
Note that count
is used two different ways:
- Inside the stored procedure, it's used as a parameter that gets a value assigned to it with a
SET
statement.. You could also use theSELECT .. INTO
syntax for this purpose. - Outside the stored procedure, the variable
@count
is initialized before the call, passed into the procedure, and then selected to get the result.
Generally speaking, if your procedure needs to return a value, it probably makes more sense to write it as a function instead, but there are situations where OUT
parameters are important.
Variables
Stored procedures can set session variables (@var
), or global variables (@@var
), or local variables. The latter are defined only in the scope of an execution of a call of the procedure. Which one you use depends on what you're trying to accomplish. It's considered best practice to use the DECLARE
syntax to declare local variables for the procedure, and then to return any values necessary back to the calling scope with OUT
params.
Declaring and using variables looks like this:
CREATE PROCEDURE combined_birthday_count(
IN bday1 date,
IN bday2 date,
OUT count int)
BEGIN
DECLARE count1, count2 int;
SET count1 = (SELECT count(*) FROM birthdays WHERE birthday = bday1);
SET count2 = (SELECT count(*) FROM birthdays WHERE birthday = bday2);
SET count = count1 + count2;
END
mysql> call combined_birthday_count('1980-02-03', '1985-01-10', @count);
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT @count;
+--------+
| @count |
+--------+
| 3 |
+--------+
Here we are declaring two local variables count1
and count2
, both of the int
type. The syntax is very similar to parameters, with the added wrinkle that you can define more than one variable with the same type in the same statement.
Note that DECLARE
statements must occur at the beginning of the procedure, before other kinds of statements, similar to older versions of C that require all variables to be declared before other statements in a function.
Control flow statements
Just like any other programming language, MySQL stored procedures support conditional logic by means of a set of control flow statements like IF
, LOOP
, etc. Using these statements makes it possible to implement any logic you can imagine in your stored procedures.
IF
IF
statements execute one of N statements depending on a condition. Multiple ELSEIF
cases can follow the IF
, and an optional ELSE
can end the block. The THEN
keyword that follows IF
and ELSEIF
begins a block of statements similar to BEGIN
. Finally, the entire IF
statement must be terminated by END IF
. Let's see an example.
CREATE PROCEDURE birthday_message(
bday date,
OUT message varchar(100))
BEGIN
DECLARE count int;
DECLARE name varchar(100);
SET count = (SELECT count(*) FROM birthdays WHERE birthday = bday);
IF count = 0 THEN
SET message = "Nobody has this birthday";
ELSEIF count = 1 THEN
SET name = (SELECT concat(first_name, " ", last_name)
FROM employees join birthdays
on emp_id = id
WHERE birthday = bday);
SET message = (SELECT concat("It's ", name, "'s birthday"));
ELSE
SET message = "More than one employee has this birthday";
END IF;
END
call birthday_message (now(), @message);
Query OK, 0 rows affected, 1 warning (0.00 sec)
SELECT @message;
+--------------------------+
| @message |
+--------------------------+
| Nobody has this birthday |
+--------------------------+
1 row in set (0.00 sec)
call birthday_message ('1980-02-03', @message);
Query OK, 0 rows affected (0.01 sec)
SELECT @message;
+--------------------------+
| @message |
+--------------------------+
| It's tim sehn's birthday |
+--------------------------+
1 row in set (0.00 sec)
call birthday_message ('1985-01-10', @message);
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT @message;
+------------------------------------------+
| @message |
+------------------------------------------+
| More than one employee has this birthday |
+------------------------------------------+
1 row in set (0.00 sec)
CASE
CASE
statements are another way of expressing conditional logic, when the same expression is evaluated for every logical branch, similar to the switch statement found in many programming languages. We can implement the same procedure above using case statements instead. Note that you end a CASE
block with a END CASE
statement.
CREATE PROCEDURE birthday_message(
bday date,
OUT message varchar(100))
BEGIN
DECLARE count int;
DECLARE name varchar(100);
SET count = (SELECT count(*) FROM birthdays WHERE birthday = bday);
CASE count
WHEN 0 THEN
SET message = "Nobody has this birthday";
WHEN 1 THEN
SET name = (SELECT concat(first_name, " ", last_name)
FROM employees join birthdays
on emp_id = id
WHERE birthday = bday);
SET message = (SELECT concat("It's ", name, "'s birthday"));
ELSE
SET message = "More than one employee has this birthday";
END CASE;
END
Compared to using IF
and ELSEIF
, the CASE
version makes it clearer that the choice of execution path depends on the value of the count
variable.
Loop constructs
To repeat the same set of statements more than once, use a loop construct. MySQL provides three different loop constructs to choose from: WHILE
, REPEAT
, and LOOP
. Which one you use is mostly a matter of personal preference and how easy you find them to read for a particular situation.
For this example, we'll write a procedure that computes the Nth fibonnaci number and assigns it to an out parameter.
WHILE
First let's look at the WHILE
keyword:
CREATE PROCEDURE fib(n int, out answer int)
BEGIN
DECLARE i int default 2;
DECLARE p, q int default 1;
SET answer = 1;
WHILE i < n DO
SET answer = p + q;
SET p = q;
SET q = answer;
SET i = i + 1;
END WHILE;
END;
(Let's make sure it works first.)
SET @answer = 1; call fib(6, @answer); SELECT @answer;
Query OK, 0 rows affected (0.00 sec)
+---------+
| @answer |
+---------+
| 8 |
+---------+
SET @answer = 1; call fib(7, @answer); SELECT @answer;
Query OK, 0 rows affected (0.00 sec)
+---------+
| @answer |
+---------+
| 13 |
+---------+
Note the use of the DEFAULT
keyword on the DECLARE
statements, which we hadn't used before. This assigns an initial value to the variable. Unlike other languages, MySQL integer variables do not default to 0
or any other value, but instead are initialized to NULL
by default (which you don't want for calculation).
REPEAT
Now that our procedure works, let's write a version using REPEAT
:
CREATE PROCEDURE fib(n int, out answer int)
BEGIN
DECLARE i int default 1;
DECLARE p int default 0;
DECLARE q int default 1;
SET answer = 1;
REPEAT
SET answer = p + q;
SET p = q;
SET q = answer;
SET i = i + 1;
UNTIL i >= n END REPEAT;
END;
Unlike WHILE
, REPEAT
loops check the loop condition at the end of the loop, not the beginning. So they always execute the body of the loop at least once. Because of this, we needed to adjust our initial variable values for the n=1 and n=2 cases.
LOOP
, ITERATE
and LEAVE
Finally let's look at LOOP
. Unlike REPEAT
and WHILE
, LOOP
has no built-in exit condition, making it very easy to write an infinite loop. You have to use a label and code an explicit LEAVE
statement to exit the loop. Here's the same procedure again, with the loop1
label applied:
CREATE PROCEDURE fib(n int, out answer int)
BEGIN
DECLARE i int default 2;
DECLARE p, q int default 1;
SET answer = 1;
loop1: LOOP
IF i >= n THEN
LEAVE loop1;
END IF;
SET answer = p + q;
SET p = q;
SET q = answer;
SET i = i + 1;
END LOOP loop1;
END;
Note that the loop1
label occurs both before the LOOP
as well as at the end of it. The LEAVE
statement terminates the loop.
In addition to the LEAVE
keyword, it can sometimes be useful or clearer to use the ITERATE
keyword. Here's a version of the loop that uses ITERATE
:
CREATE PROCEDURE fib(n int, out answer int)
BEGIN
DECLARE i int default 2;
DECLARE p, q int default 1;
SET answer = 1;
loop1: LOOP
IF i < n THEN
SET answer = p + q;
SET p = q;
SET q = answer;
SET i = i + 1;
ITERATE loop1;
END IF;
LEAVE loop1;
END LOOP loop1;
END;
Which way you choose to write a LOOP
depends on what you're trying to do and the cleanest way to express it. You can also use ITERATE
to conditionally skip parts of a loop under certain circumstances.
Exception handling
MySQL stored procedures support terminating execution with an error with a special SIGNAL
keyword. You can also define your own named error conditions to use in stored procedures you write for readability. This can be used to halt the operation of a transaction and cause it to be rolled back, just like any native MySQL error. This can be useful when implementing complex data integrity checks.
The syntax to signal an error is unfamiliar but straightforward. This example comes straight out of the MySQL docs:
CREATE PROCEDURE p (pval INT)
BEGIN
DECLARE specialty CONDITION FOR SQLSTATE '45000';
IF pval = 0 THEN
SIGNAL SQLSTATE '01000';
ELSEIF pval = 1 THEN
SIGNAL SQLSTATE '45000'
SET MESSAGE_TEXT = 'An error occurred';
ELSEIF pval = 2 THEN
SIGNAL specialty
SET MESSAGE_TEXT = 'An error occurred';
ELSE
SIGNAL SQLSTATE '01000'
SET MESSAGE_TEXT = 'A warning occurred', MYSQL_ERRNO = 1000;
SIGNAL SQLSTATE '45000'
SET MESSAGE_TEXT = 'An error occurred', MYSQL_ERRNO = 1001;
END IF;
END;
What are stored procedures used for?
Now that you understand how to define a stored procedures in MySQL, you might be asking yourself why you would want to. What are they good for? What do people use these things to do? Stored procedures are a general tool you can use to do all sorts of things, but here are some of the more common ones in our experience.
System maintenance
Sometimes it's convenient or desirable to encode business logic directly in the database, in the same place where the data is defined and stored. This approach means that the logic to maintain key database properties isn't spread out somewhere else in a distributed system (like a batch job), but directly in the database itself.
For example, consider an online storefront, where you store an order history for every customer. It might make sense to archive details about these orders, like tracking information and return requests, once they get old enough that it's unlikely they'll be useful any longer. Archiving old data means a smaller database, which means lower storage bills for your backups and possibly faster execution time on queries.
Here's a procedure that deletes order information older than a cutoff provided:
CREATE PROCEDURE archive_order_history(cutoff date)
BEGIN
-- sanity check: don't proceed if we are given a date in the last year
IF datediff(now(), date) < 365 THEN
SIGNAL SQLSTATE '45000'
SET MESSAGE_TEXT = 'attempted to archive recent order history';
END IF;
-- clear out any orders older than the cutoff, lots of tables to delete from here
DELETE FROM order_updates WHERE update_date < cutoff;
DELETE FROM order_history WHERE order_date < cutoff;
DELETE FROM order_feedback WHERE feedack_date < cutoff;
DELETE FROM support_requests WHERE order_date < cutoff;
END
Note that this procedure encapsulates the logic of deleting old data from many tables into a single procedure. As the database schema continues to evolve, new logic can be added here as necessary.
Procedures like this one tend to be run on a periodic basis, and you can schedule MySQL to run them for you automatically:
CREATE EVENT archive_order_history_event
ON SCHEDULE
EVERY 1 DAY
STARTS TIMESTAMP(CURDATE(), '02:00:00')
DO
BEGIN
CALL archive_order_history(DATE_SUB(CURDATE(), INTERVAL 1 YEAR));
END;
You can also alter this procedure to perform work in small chunks, if the data is truly massive and there's any concern about interfering with other write throughput. Just run the DELETE
with a LIMIT
clause in a loop:
CREATE PROCEDURE archive_order_history(cutoff date)
BEGIN
DECLARE count int default 1;
WHILE COUNT > 0 DO
DELETE FROM order_history WHERE order_date < cutoff LIMIT 100;
-- See how many are left to delete
-- Make sure to stop our table scan once we find a matching row
SET count = (SELECT count(*) FROM (
SELECT order_id FROM order_history WHERE order_date < cutoff LIMIT 1));
COMMIT; -- commit our batch of 100 deletes
END WHILE;
END
Time-based compliance
Some regulations, such as GDPR, require online businesses to remove customer data upon request or face heavy fines. Usually there is a grace period of some number of days for the customer to change their mind, during which time you don't want to delete their data if possible. Here again it may make sense to implement a stored procedure to encapsulate the logic of what a deletion request does, then run it on a timer.
First, create a table that logs when a customer requests data removal:
CREATE TABLE removal_requests(
customer_id bigint primary key,
request_date date,
removal_date date
);
Now define a procedure to delete all personally identifiable data associated with a customer.
CREATE PROCEDURE delete_customer(id_to_remove int)
BEGIN
-- clear out any customer information
UPDATE cust_info SET first_name = "REDACTED", last_name = "REDACTED"
WHERE cust_id = id_to_remove;
DELETE FROM customer_addresses WHERE cust_id = id_to_remove;
DELETE FROM reviews WHERE cust_id = id_to_remove;
UPDATE order_history SET delivery_instructions = "REDACTED"
WHERE cust_id = id_to_remove;
END
Notice that this procedure completely encapsulates the logic required to comply with the regulation. We can't just delete the customer
record because that would cause referential constraint violations, or cascading deletion, in other tables storing information we're required to keep for financial reasons. Instead, we perform a mix of deletions where possible and updates to redact personally identifiable information where it's not.
Finally we schedule an event to delete customers whose time is up:
CREATE EVENT customer_removal_event
ON SCHEDULE
EVERY 1 DAY
STARTS TIMESTAMP(CURDATE(), '03:00:00')
DO
BEGIN
DECLARE cust_id int;
cust_loop: REPEAT
SET cust_id = (SELECT customer_id FROM removal_requests
WHERE removal_date >= now() limit 1);
IF cust_id IS NULL
LEAVE cust_loop;
CALL delete_customer(cust_id);
UNTIL cust_id IS NULL END REPEAT cust_loop; -- redundant end condition on this loop
END;
Data integrity checks and complex updates
For some complex table relationships, it might be desirable to define the logic of inserting or updating the data in the tables in a procedure. This has the advantage of allowing you to execute arbitrarily complex logic for validation, things that can't be expressed with simple FOREIGN KEY
or CHECK
constraints.
Consider a library system with tables for books, holds, and checkouts.
CREATE TABLE books(
isbn varchar(100) primary key,
title varchar(255),
...
);
CREATE TABLE holds(
isbn varchar(100),
patron_id bigint,
unique key (isbn, patron_id)
);
CREATE TABLE checkouts(
isbn varchar(100),
patron_id bigint
date date,
KEY (isbn, patron_id)
);
We can define a procedure to handle all the business logic we want to consider when checking out a book:
CREATE PROCEDURE checkout(
checkout_isbn varchar(100),
borrower_patron_id bigint)
BEGIN
DECLARE current_checkouts int;
SET current_checkouts = (SELECT COUNT(*) FROM checkouts WHERE patron_id = borrower_patron_id);
IF current_checkouts > 20 THEN
SIGNAL SQLSTATE '45000'
SET MESSAGE_TEXT = 'Too many books checked out';
END;
DELETE FROM holds WHERE isbn = checkout_isbn and patron_id = borrower_patron_id;
INSERT INTO checkouts VALUES (checkout_isbn, borrower_patron_id, now());
END;
It's also possible (and more common) to define this sort of logic in application code. But the advantage of defining logic at the database level is that multiple applications can use this stored procedure without needing to each understand the business logic around checking out a book. The logic can be updated in one place when policy like the max number of books per patron changes.
In modern architectures, this role is more typically filled by a web service than a database, but there are still a surprising number of older systems out there where this isn't an option, where multiple legacy applications still connect directly to and update a shared database. For these systems, it's useful to have this kind of trick up our sleeves.
Advanced topic: cursors
The examples above all limit results from SQL queries in procedures to a single row, and usually a single value. There's a lot you can do with this pattern, but for some situations you really need to examine multiple rows in a loop to answer some question. For those situations, MySQL provides cursors. For obvious reasons, it's a lot more difficult to write correct stored procedures that require cursors than ones that don't.
This simple example is straight out of the MySQL docs.
CREATE PROCEDURE curdemo()
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE a CHAR(16);
DECLARE b, c INT;
DECLARE cur1 CURSOR FOR SELECT id,data FROM test.t1;
DECLARE cur2 CURSOR FOR SELECT i FROM test.t2;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN cur1;
OPEN cur2;
read_loop: LOOP
FETCH cur1 INTO a, b;
FETCH cur2 INTO c;
IF done THEN
LEAVE read_loop;
END IF;
IF b < c THEN
INSERT INTO test.t3 VALUES (a,b);
ELSE
INSERT INTO test.t3 VALUES (a,c);
END IF;
END LOOP;
CLOSE cur1;
CLOSE cur2;
END;
https://www.dolthub.com/blog/2024-01-17-writing-mysql-procedures/
No hay comentarios:
Publicar un comentario