High Performance MySQL

Table of Contents

1 Special SQL

1.1 AUTOCOMMIT

SHOW VARIABLES LIKE 'AUTOCOMMIT';
SET AUTOCOMMIT = 1;

1.2 Set isolation level

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

2 Storage Engine

  • InnoDB
  • MyISAM
  • Archive
  • OLTP engines similar to InnoDB: XtraDB, TokuDB
  • Infobright: Column-oriented
  • Aria: successor to MyISAM, crash-safe
  • MySQL Cluster

2.1 Change engine

2.1.1 ALTER TABLE

ALTER TABLE mytable ENGINE = InnoDB;
  • performs a row-by-row copy into a new table
  • simple but slow; writes to the table are blocked during the conversion

2.1.2 Dump and import

Use mysqldump

  1. dump to text
  2. edit the ENGINE clause in the dumped CREATE TABLE statement
  3. import
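
A minimal sketch of these steps, assuming a MyISAM table mydb.mytable (names are illustrative); the dump's DROP TABLE/CREATE TABLE statements recreate the table on import:

mysqldump mydb mytable > mytable.sql
sed -i 's/ENGINE=MyISAM/ENGINE=InnoDB/' mytable.sql
mysql mydb < mytable.sql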

2.1.3 CREATE and SELECT

CREATE TABLE innodb_table LIKE myisam_table;
ALTER TABLE innodb_table ENGINE=InnoDB;
INSERT INTO innodb_table SELECT * FROM myisam_table;

For large volumes of data, copy in batches by primary key range:

START TRANSACTION;
INSERT INTO innodb_table SELECT * FROM myisam_table WHERE id BETWEEN x AND y;
COMMIT;

2.1.4 Online schema change

pt-online-schema-change (based on Facebook’s online schema change technique)
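
A minimal usage sketch, assuming the table mydb.mytable (names are illustrative); --alter takes the ALTER fragment and --execute actually performs the change:

pt-online-schema-change --alter "ENGINE=InnoDB" D=mydb,t=mytable --execute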

3 Benchmark

3.1 What to Measure

3.1.1 Throughput

Throughput is defined as the number of transactions per unit of time. The usual unit of measurement is transactions per second (TPS), although it is sometimes transactions per minute (TPM).

3.1.2 Response time or latency

This measures the total time a task requires.

3.1.3 Concurrency

Concurrency is an important but frequently misused and misunderstood metric. A concurrency benchmark should measure the number of threads or connections doing work simultaneously.

3.1.4 Scalability

3.2 Gathering Data on MySQL

  • store login credentials so the mysql client can run without prompting: mysql_config_editor set --user=username --host=127.0.0.1 --port=3306 --password
  • a sample shell script that you can use to gather data on MySQL during benchmarks:
#!/bin/sh
INTERVAL=5
PREFIX=$INTERVAL-sec-status
RUNFILE=/home/benchmark/running
mysql -e 'SHOW GLOBAL VARIABLES' >> mysql-variables
while test -e $RUNFILE; do
    file=$(date +%F_%I)
    sleep=$(date +%s.%N | awk "{print $INTERVAL - (\$1 % $INTERVAL)}")
    sleep $sleep
    ts="$(date +"TS %s.%N %F %T")"
    loadavg="$(uptime)"
    echo "$ts $loadavg" >> $PREFIX-${file}-status
    mysql -e 'SHOW GLOBAL STATUS' >> $PREFIX-${file}-status &
    echo "$ts $loadavg" >> $PREFIX-${file}-innodbstatus
    mysql -e 'SHOW ENGINE INNODB STATUS\G' >> $PREFIX-${file}-innodbstatus &
    echo "$ts $loadavg" >> $PREFIX-${file}-processlist
    mysql -e 'SHOW FULL PROCESSLIST\G' >> $PREFIX-${file}-processlist &
    echo $ts
done
echo Exiting because $RUNFILE does not exist.

3.3 Analyzing Results

#!/bin/sh
awk '
BEGIN {
  printf "#ts date time load QPS";
  fmt = " %.2f";
}
/^TS/ { # The timestamp lines begin with TS.
  ts = substr($2, 1, index($2, ".") - 1);
  load = NF - 2;
  diff = ts - prev_ts;
  prev_ts = ts;
  printf "\n%s %s %s %s", ts, $3, $4, substr($load, 1, length($load)-1);
}
/Queries/ {
  printf fmt, ($2-Queries)/diff;
  Queries=$2
}
' "$@"

3.3.1 plot

gnuplot> plot "filename" using 5 w lines title "QPS"

3.4 Tools

3.4.1 http_load

3.4.2 MySQL Benchmark Suite

sql-bench

3.4.3 sysbench

  • cpu
    sysbench cpu --cpu-max-prime=20000 --threads=4 run
    
  • fileio
    IO mode  Description
    seqwr    Sequential write
    seqrewr  Sequential rewrite
    seqrd    Sequential read
    rndrd    Random read
    rndwr    Random write
    rndrw    Combined random read/write
    sysbench fileio --file-total-size=21G prepare
    sysbench fileio --file-total-size=21G --file-test-mode=rndrw \
    	 --time=300 --max-requests=0 --threads=4 run
    
  • oltp
    sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=test1234 \
    	 --mysql-db=test --table-size=100000 /usr/local/share/sysbench/oltp_common.lua prepare
    sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=test1234 \
    	 --mysql-db=test --table-size=100000 /usr/local/share/sysbench/oltp_read_write.lua \
    	 --threads=4 run
    

3.4.4 TPC-C

TPC-C is a specification published by the TPC organization that emulates a complex online transaction-processing load.

  • dbt2
  • tpcc-mysql

4 Profiling

  • execution-time profiling
  • wait analysis

4.1 slow query log

related variables: slow_query_log, long_query_time

  • low-overhead on I/O-bound workloads
  • high-fidelity
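
A minimal sketch of enabling the slow query log at runtime (the threshold value is illustrative; long_query_time accepts fractional seconds):

SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 0.5;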

4.1.1 pt-query-digest

Analyzes MySQL queries from logs, the processlist, and tcpdump.

pt-query-digest [log_file]

4.1.2 tcpdump

Use tcpdump to capture the MySQL TCP traffic and inspect it, then

  • use pt-query-digest --type=tcpdump to decode the MySQL client/server protocol
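
A sketch of a typical capture-and-digest pipeline (interface, packet count, and file name are illustrative):

tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 > mysql.tcp.txt
pt-query-digest --type tcpdump mysql.tcp.txt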

4.2 Single Query Profiling

4.2.1 performance_schema

UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME LIKE '%statement/%';
UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME LIKE '%stage/%';
UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%events_statements_%';
UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%events_stages_%';

SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT
FROM performance_schema.events_statements_history_long
WHERE ... /*same as SHOW PROFILES*/

SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration
FROM performance_schema.events_stages_history_long
WHERE NESTING_EVENT_ID=... /*same as SHOW PROFILE FOR QUERY*/

Statement-level results are in performance_schema.events_statements_history_long; stage-level results are in events_stages_history_long.

4.2.2 SHOW PROFILE [deprecated]

SET profiling=1;

SHOW PROFILES;
SHOW PROFILE FOR QUERY [Query_ID];

4.2.3 SHOW STATUS

FLUSH STATUS;
SHOW STATUS WHERE Variable_name LIKE 'Handler%' OR Variable_name LIKE 'Created%';

4.2.4 EXPLAIN

EXPLAIN [SQL_QUERY]

4.3 Server-Wide Problem Detection

4.3.1 sysstat

  • iostat, vmstat

4.3.2 SHOW GLOBAL STATUS

Run SHOW GLOBAL STATUS periodically, gather Threads_running, Threads_connected, Questions, and Queries.

mysqladmin ext -i1 -p | awk '
    /Queries/{q=$4-qp;qp=$4}
    /Threads_connected/{tc=$4}
    /Threads_running/{printf "%5d %5d %5d\n", q, tc, $4}'

4.3.3 SHOW PROCESSLIST

mysql -e 'SHOW PROCESSLIST\G' -p | grep State: | sort | uniq -c | sort -rn
  • other approaches
    • PROCESSLIST table
    SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST;
    
    • innotop command

4.3.4 SHOW ENGINE INNODB STATUS

4.3.5 analyze slow log

awk '/^# Time:/{print $3, $4, c;c=0}/^# User/{c++}' slow-query.log

4.3.6 oprofile

A system-wide profiling tool for Linux.

4.3.7 pt-stalk

Collect forensic data about MySQL when problems occur.

  • pt-sift: Browses files created by pt-stalk

4.3.8 pt-mysql-summary & pt-summary

4.3.9 pt-pmp

Aggregate GDB stack traces for a selected program

4.4 Tools

4.4.1 USER_STATISTICS

In MariaDB: SHOW TABLES FROM INFORMATION_SCHEMA LIKE '%_STATISTICS';

4.4.2 strace

Use strace to intercept system calls.

strace -cfp [PID]

5 Schema

5.1 Data Type

5.1.1 Principles

  • Smaller is usually better
  • Simple is good
  • Avoid NULL if possible

5.1.2 SET

Stores a set of flags packed into bits; useful for ACLs (access control lists).

CREATE TABLE acl (
       perms SET('CAN_READ', 'CAN_WRITE', 'CAN_DELETE') NOT NULL
);
INSERT INTO acl (perms) VALUES ('CAN_READ,CAN_WRITE');
SELECT perms FROM acl WHERE FIND_IN_SET('CAN_READ', perms);
  • MySQL stores SET values internally as integers but converts them to strings when doing comparisons.

5.1.3 UUID

  • recommended: store UUIDs in a BINARY(16) column (strip the dashes) rather than as a 36-character string
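
A minimal sketch of storing UUIDs as BINARY(16) (table and column names are illustrative):

CREATE TABLE uuid_demo (
       id BINARY(16) NOT NULL PRIMARY KEY
);
-- strip the dashes and unhex the 32 hex digits down to 16 bytes
INSERT INTO uuid_demo (id) VALUES (UNHEX(REPLACE(UUID(), '-', '')));
-- convert back to a readable form when selecting
SELECT HEX(id) FROM uuid_demo;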

5.2 Normalization and Denormalization

5.2.1 Normalization

Benefits:

  • Normalized updates are usually faster than denormalized updates
  • Normalized tables are usually smaller, so they fit better in memory and perform better
  • The lack of redundant data means there's less need for DISTINCT or GROUP BY queries when retrieving lists of values

Drawbacks:

  • requires more joins

5.2.2 Denormalization

Benefits:

  • avoids joins
  • allow more efficient indexing strategies

5.3 Summary Table & Cache Table

  • summary tables: hold aggregated data from GROUP BY queries
  • cache tables: contain data that can be easily retrieved from the schema (i.e., data that is logically redundant)
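
A minimal sketch of a summary table that pre-aggregates a GROUP BY (table and column names are illustrative):

CREATE TABLE msg_per_day (
       day DATE NOT NULL PRIMARY KEY,
       cnt INT UNSIGNED NOT NULL
);
INSERT INTO msg_per_day (day, cnt)
SELECT DATE(posted_at), COUNT(*)
FROM message
GROUP BY DATE(posted_at);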

5.3.1 Flexviews

  • offer materialized views

5.4 Speeding Up ALTER TABLE

5.4.1 Default

MySQL performs most alterations by making an empty table with the desired new structure, inserting all the data from the old table into the new one, and deleting the old table.

5.4.2 Alternatives

  • Performing the ALTER on servers that are not in production service
  • Shadow copy: build a new table with the desired structure beside the existing one, then perform a rename and drop to swap the two (see the sketch after this list)

  • Shadow copy tool
    • "online schema change" from facebook
    • openark kit
    • Percona Toolkit
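
A sketch of the shadow-copy swap, assuming the rebuilt copy is named mytable_new (names are illustrative); RENAME TABLE swaps both names atomically:

RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;
DROP TABLE mytable_old;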

5.5 Indexing Methods

5.5.1 B-tree

  • column order in an index is extremely important
  • B+Tree

    data is stored in the leaf nodes.

  • Limitations
    • not useful if the lookup does not start from the leftmost prefix of the indexed columns
    • you can't skip columns in the index.

    e.g. with index(col1, col2, col3), a condition on col3 can use the index only if col2 is also constrained

    • The storage engine can’t optimize accesses with any columns to the right of the first range condition.

    e.g. col2 LIKE 'abc%' is a range condition, so a condition such as col3 = 'somevalue' to its right won't be optimized.

5.5.2 Hash

  • only the Memory storage engine supports explicit hash indexes
  • InnoDB

    adaptive hash indexes: When InnoDB notices that some index values are being accessed very frequently, it builds a hash index for them in memory on top of B-Tree indexes.

  • Building hash indexes
    1. add a hash column
    2. create triggers on insert and update to maintain it (see the sketch after this list)
    3. select by hash value
    SELECT id FROM url WHERE url_crc=CRC32("http://www.mysql.com")
           AND url="http://www.mysql.com"; -- compare the full value to handle collisions
    
    • hash functions: CRC32, FNV64, SHA1, MD5
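
A sketch of steps 1 and 2 (table, column, and trigger names are illustrative):

CREATE TABLE pseudohash (
       id INT UNSIGNED NOT NULL AUTO_INCREMENT,
       url VARCHAR(255) NOT NULL,
       url_crc INT UNSIGNED NOT NULL DEFAULT 0,
       PRIMARY KEY(id),
       KEY(url_crc)
);

DELIMITER //
CREATE TRIGGER pseudohash_crc_ins BEFORE INSERT ON pseudohash FOR EACH ROW
BEGIN
    SET NEW.url_crc = CRC32(NEW.url);
END;
//
CREATE TRIGGER pseudohash_crc_upd BEFORE UPDATE ON pseudohash FOR EACH ROW
BEGIN
    SET NEW.url_crc = CRC32(NEW.url);
END;
//
DELIMITER ;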

5.5.3 R-Tree

  • index GEOMETRY information
  • MyISAM supports R-Tree

5.5.4 Full-text

Full-text is a special type of index that finds keywords in the text instead of comparing values directly to the values in the index.

5.6 Indexing Strategies

5.6.1 Long string index

  • one approach: simulate a hash index
  • indexing the first few characters

    Index selectivity is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table. Choose a prefix that’s long enough to give good selectivity, but short enough to save space

    • one way: compare the frequency of the most common full values with the most common prefix values
    SELECT COUNT(*) AS cnt, city FROM sakila.city_demo GROUP BY city ORDER BY cnt DESC LIMIT 10;
    SELECT COUNT(*) AS cnt, LEFT(city, 3) AS pref FROM sakila.city_demo GROUP BY pref ORDER BY cnt DESC LIMIT 10;
    
    • another way: compute the full column's selectivity and try to make the prefix's selectivity close to that value.
    SELECT COUNT(DISTINCT city)/COUNT(*) FROM sakila.city_demo;
    SELECT COUNT(DISTINCT LEFT(city, 3))/COUNT(*) AS sel3,
           COUNT(DISTINCT LEFT(city, 4))/COUNT(*) AS sel4
    FROM sakila.city_demo;
    
    • downside: MySQL cannot use prefix indexes for ORDER BY or GROUP BY queries, nor can it use them as covering indexes.
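
Once a prefix length is chosen, the prefix index itself is created by indexing only the first characters of the column; a sketch assuming a 7-character prefix:

ALTER TABLE sakila.city_demo ADD KEY (city(7));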

5.6.2 Multicolumn Indexes

  • Individual indexes on lots of columns won’t help MySQL improve performance for most queries

    SELECT col1, col2 FROM tbl WHERE index1=1 OR index2=1;
    -- will translate to
    SELECT col1, col2 FROM tbl WHERE index1=1
    UNION ALL
    SELECT col1, col2 FROM tbl WHERE index2=1 AND index1 <> 1;
    

    Use EXPLAIN to check

5.6.3 Choosing a Good Column Order

Choose the index order such that rows are sorted and grouped in a way that will benefit the query. The order of columns in a multicolumn B-Tree index means that the index is sorted first by the leftmost column, then by the next column, and so on.
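
Absent sorting or grouping requirements, a common rule of thumb is to put the most selective column first. A sketch of checking selectivity, assuming sakila's payment table (customer_id turns out to be more selective, so it leads the index):

SELECT COUNT(DISTINCT staff_id)/COUNT(*) AS staff_id_selectivity,
       COUNT(DISTINCT customer_id)/COUNT(*) AS customer_id_selectivity,
       COUNT(*)
FROM sakila.payment;

ALTER TABLE sakila.payment ADD KEY (customer_id, staff_id);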

5.7 Clustered Indexes

Clustered index is an approach to data storage.

  • InnoDB's clustered indexes actually store a B-Tree index and the rows together in the same structure.

5.8 Covering Indexes

An index that contains (or “covers”) all the data needed to satisfy a query is called a covering index.

  • Covering indexes are especially helpful for InnoDB tables, because of InnoDB’s clustered indexes
  • When you issue a query that is an index-covered query, you’ll see “Using index” in the Extra column in EXPLAIN
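
A sketch using sakila's inventory table, which has a composite index on (store_id, film_id), so the query below is satisfied from the index alone:

EXPLAIN SELECT store_id, film_id FROM sakila.inventory\G
-- the Extra column shows: Using index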

5.8.1 Benefits

  • reduce random I/O

5.9 Using Index for Sorts

Ordering the results by the index works only when

  • the index’s order is exactly the same as the ORDER BY clause
  • and all columns are sorted in the same direction
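
A minimal sketch, assuming a table t with an ascending index on (a, b) (names are illustrative):

SELECT a, b FROM t ORDER BY a, b;           -- same order and direction: the index can produce the sort
SELECT a, b FROM t ORDER BY a ASC, b DESC;  -- mixed directions: a filesort is needed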

6 Query Optimization

6.1 Isolating the Column

MySQL can generally use an index only when the indexed column appears alone on one side of the comparison operator, not buried inside an expression or a function. Wrong examples: WHERE id + 2 = 5, WHERE ABS(id) = 10.
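
A sketch of isolating the column so the index on id can be used (table name t is illustrative):

SELECT * FROM t WHERE id + 2 = 5;  -- bad: the column is inside an expression
SELECT * FROM t WHERE id = 3;      -- good: the arithmetic is moved to the other side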