High Performance MySQL

Table of Contents

1 Special SQL

1.1 AUTOCOMMIT

SHOW VARIABLES LIKE 'AUTOCOMMIT';
SET AUTOCOMMIT = 1;

1.2 Set isolation level

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

2 Storage Engine

  • InnoDB
  • MyISAM
  • Archive
  • OLTP engines similar to InnoDB: XtraDB, TokuDB
  • Infobright: Column-oriented
  • Aria: successor to MyISAM, crash-safe
  • MySQL Cluster

2.1 Change engine

2.1.1 ALTER TABLE

ALTER TABLE mytable ENGINE = InnoDB;
  • performs a row-by-row copy into a new table
  • simple but slow; writes to the table are blocked during the conversion

2.1.2 Dump and import

Use mysqldump

  1. dump to text
  2. edit the ENGINE clause in the dumped CREATE TABLE statement
  3. import
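
A minimal sketch of these steps, assuming a MyISAM table mydb.mytable (names are illustrative); the dump's DROP TABLE/CREATE TABLE statements recreate the table on import:

mysqldump mydb mytable > mytable.sql
sed -i 's/ENGINE=MyISAM/ENGINE=InnoDB/' mytable.sql
mysql mydb < mytable.sql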

2.1.3 CREATE and SELECT

CREATE TABLE innodb_table LIKE myisam_table;
ALTER TABLE innodb_table ENGINE=InnoDB;
INSERT INTO innodb_table SELECT * FROM myisam_table;

For large volumes of data, copy in batches by primary key range:

START TRANSACTION;
INSERT INTO innodb_table SELECT * FROM myisam_table WHERE id BETWEEN x AND y;
COMMIT;

2.1.4 Online schema change

pt-online-schema-change (based on Facebook’s online schema change technique)
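
A minimal usage sketch, assuming the table mydb.mytable (names are illustrative); --alter takes the ALTER fragment and --execute actually performs the change:

pt-online-schema-change --alter "ENGINE=InnoDB" D=mydb,t=mytable --execute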

3 Benchmark

3.1 What to Measure

3.1.1 Throughput

Throughput is defined as the number of transactions per unit of time. The usual unit of measurement is transactions per second (TPS), although it is sometimes transactions per minute (TPM).

3.1.2 Response time or latency

This measures the total time a task requires.

3.1.3 Concurrency

Concurrency is an important but frequently misused and misunderstood metric. A concurrency benchmark should measure the number of threads or connections doing work simultaneously.

3.1.4 Scalability

3.2 Gathering Data on MySQL

  • store login credentials so the mysql client can run without prompting: mysql_config_editor set --user=username --host=127.0.0.1 --port=3306 --password
  • a sample shell script that you can use to gather data on MySQL during benchmarks:
#!/bin/sh
INTERVAL=5
PREFIX=$INTERVAL-sec-status
RUNFILE=/home/benchmark/running
mysql -e 'SHOW GLOBAL VARIABLES' >> mysql-variables
while test -e $RUNFILE; do
    file=$(date +%F_%I)
    sleep=$(date +%s.%N | awk "{print $INTERVAL - (\$1 % $INTERVAL)}")
    sleep $sleep
    ts="$(date +"TS %s.%N %F %T")"
    loadavg="$(uptime)"
    echo "$ts $loadavg" >> $PREFIX-${file}-status
    mysql -e 'SHOW GLOBAL STATUS' >> $PREFIX-${file}-status &
    echo "$ts $loadavg" >> $PREFIX-${file}-innodbstatus
    mysql -e 'SHOW ENGINE INNODB STATUS\G' >> $PREFIX-${file}-innodbstatus &
    echo "$ts $loadavg" >> $PREFIX-${file}-processlist
    mysql -e 'SHOW FULL PROCESSLIST\G' >> $PREFIX-${file}-processlist &
    echo $ts
done
echo Exiting because $RUNFILE does not exist.

3.3 Analyzing Results

#!/bin/sh
awk '
BEGIN {
  printf "#ts date time load QPS";
  fmt = " %.2f";
}
/^TS/ { # The timestamp lines begin with TS.
  ts = substr($2, 1, index($2, ".") - 1);
  load = NF - 2;
  diff = ts - prev_ts;
  prev_ts = ts;
  printf "\n%s %s %s %s", ts, $3, $4, substr($load, 1, length($load)-1);
}
/Queries/ {
  printf fmt, ($2-Queries)/diff;
  Queries=$2
}
' "$@"

3.3.1 plot

gnuplot> plot "filename" using 5 w lines title "QPS"

3.4 Tools

3.4.1 http_load

3.4.2 MySQL Benchmark Suite

sql-bench

3.4.3 sysbench

  • cpu
    sysbench cpu --cpu-max-prime=20000 --threads=4 run
    
  • fileio
    IO mode  Description
    seqwr    Sequential write
    seqrewr  Sequential rewrite
    seqrd    Sequential read
    rndrd    Random read
    rndwr    Random write
    rndrw    Combined random read/write
    sysbench fileio --file-total-size=21G prepare
    sysbench fileio --file-total-size=21G --file-test-mode=rndrw \
    	 --time=300 --max-requests=0 --threads=4 run
    
  • oltp
    sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=test1234 \
    	 --mysql-db=test --table-size=100000 /usr/local/share/sysbench/oltp_common.lua prepare
    sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=test1234 \
    	 --mysql-db=test --table-size=100000 /usr/local/share/sysbench/oltp_read_write.lua \
    	 --threads=4 run
    

3.4.4 TPC-C

TPC-C is a specification published by the TPC organization that emulates a complex online transaction-processing load.

  • dbt2
  • tpcc-mysql

4 Profiling

  • execution-time profiling
  • wait analysis

4.1 slow query log

related variables: slow_query_log, long_query_time

  • low-overhead on I/O-bound workloads
  • high-fidelity
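
A minimal sketch of enabling the slow query log at runtime (the threshold value is illustrative; long_query_time accepts fractional seconds):

SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 0.5;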

4.1.1 pt-query-digest

Analyzes MySQL queries from logs, the processlist, and tcpdump.

pt-query-digest [log_file]

4.1.2 tcpdump

Use tcpdump to capture the MySQL TCP traffic and inspect it, then

  • use pt-query-digest --type=tcpdump to decode the MySQL client/server protocol
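
A sketch of a typical capture-and-digest pipeline (interface, packet count, and file name are illustrative):

tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 > mysql.tcp.txt
pt-query-digest --type tcpdump mysql.tcp.txt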

4.2 Single Query Profiling

4.2.1 performance_schema

UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME LIKE '%statement/%';
UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME LIKE '%stage/%';
UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%events_statements_%';
UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%events_stages_%';

SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT
FROM performance_schema.events_statements_history_long
WHERE ... /*same as SHOW PROFILES*/

SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration
FROM performance_schema.events_stages_history_long
WHERE NESTING_EVENT_ID=... /*same as SHOW PROFILE FOR QUERY*/

Statement-level results are in performance_schema.events_statements_history_long; stage-level results are in events_stages_history_long.

4.2.2 SHOW PROFILE [deprecated]

SET profiling=1;

SHOW PROFILES;
SHOW PROFILE FOR QUERY [Query_ID];

4.2.3 SHOW STATUS

FLUSH STATUS;
SHOW STATUS WHERE Variable_name LIKE 'Handler%' OR Variable_name LIKE 'Created%';

4.2.4 EXPLAIN

EXPLAIN [SQL_QUERY]

4.3 Server-Wide Problem Detection

4.3.1 sysstat

  • iostat, vmstat

4.3.2 SHOW GLOBAL STATUS

Run SHOW GLOBAL STATUS periodically, gather Threads_running, Threads_connected, Questions, and Queries.

mysqladmin ext -i1 -p | awk '
    /Queries/{q=$4-qp;qp=$4}
    /Threads_connected/{tc=$4}
    /Threads_running/{printf "%5d %5d %5d\n", q, tc, $4}'

4.3.3 SHOW PROCESSLIST

mysql -e 'SHOW PROCESSLIST\G' -p | grep State: | sort | uniq -c | sort -rn
  • other approaches
    • PROCESSLIST table
    SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST;
    
    • innotop command

4.3.4 SHOW ENGINE INNODB STATUS

4.3.5 analyze slow log

awk '/^# Time:/{print $3, $4, c;c=0}/^# User/{c++}' slow-query.log

4.3.6 oprofile

A system-wide profiling tool for Linux.

4.3.7 pt-stalk

Collect forensic data about MySQL when problems occur.

  • pt-sift: Browses files created by pt-stalk

4.3.8 pt-mysql-summary & pt-summary

4.3.9 pt-pmp

Aggregate GDB stack traces for a selected program

4.4 Tools

4.4.1 USER_STATISTICS

In MariaDB: SHOW TABLES FROM INFORMATION_SCHEMA LIKE '%_STATISTICS';

4.4.2 strace

Use strace to intercept system calls.

strace -cfp [PID]

5 Schema

5.1 Data Type

5.1.1 Principles

  • Smaller is usually better
  • Simple is good
  • Avoid NULL if possible

5.1.2 SET

Stores a set of flags packed into bits; useful for ACLs (access control lists).

CREATE TABLE acl (
       perms SET('CAN_READ', 'CAN_WRITE', 'CAN_DELETE') NOT NULL
);
INSERT INTO acl (perms) VALUES ('CAN_READ,CAN_WRITE');
SELECT perms FROM acl WHERE FIND_IN_SET('CAN_READ', perms);
  • MySQL stores SET values internally as integers but converts them to strings when doing comparisons.

5.1.3 UUID

  • recommended: store UUIDs in a BINARY(16) column (strip the dashes) rather than as a 36-character string
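
A minimal sketch of storing UUIDs as BINARY(16) (table and column names are illustrative):

CREATE TABLE uuid_demo (
       id BINARY(16) NOT NULL PRIMARY KEY
);
-- strip the dashes and unhex the 32 hex digits down to 16 bytes
INSERT INTO uuid_demo (id) VALUES (UNHEX(REPLACE(UUID(), '-', '')));
-- convert back to a readable form when selecting
SELECT HEX(id) FROM uuid_demo;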

5.2 Normalization and Denormalization

5.2.1 Normalization

Benefits:

  • Normalized updates are usually faster than denormalized updates
  • Normalized tables are usually smaller, so they fit better in memory and perform better
  • The lack of redundant data means there's less need for DISTINCT or GROUP BY queries when retrieving lists of values

Drawbacks:

  • requires more joins

5.2.2 Denormalization

Benefits:

  • avoids joins
  • allow more efficient indexing strategies

5.3 Summary Table & Cache Table

  • summary tables: hold aggregated data from GROUP BY queries
  • cache tables: contain data that can be easily retrieved from the schema (i.e., data that is logically redundant)
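
A minimal sketch of a summary table that pre-aggregates a GROUP BY (table and column names are illustrative):

CREATE TABLE msg_per_day (
       day DATE NOT NULL PRIMARY KEY,
       cnt INT UNSIGNED NOT NULL
);
INSERT INTO msg_per_day (day, cnt)
SELECT DATE(posted_at), COUNT(*)
FROM message
GROUP BY DATE(posted_at);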

5.3.1 Flexviews

  • offer materialized views

5.4 Speeding Up ALTER TABLE

5.4.1 Default

MySQL performs most alterations by making an empty table with the desired new structure, inserting all the data from the old table into the new one, and deleting the old table.

5.4.2 Alternatives

  • Performing the ALTER on servers that are not in production service
  • Shadow copy: build a new table with the desired structure beside the existing one, then perform a rename and drop to swap the two (see the sketch after this list)

  • Shadow copy tool
    • "online schema change" from facebook
    • openark kit
    • Percona Toolkit
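
A sketch of the shadow-copy swap, assuming the rebuilt copy is named mytable_new (names are illustrative); RENAME TABLE swaps both names atomically:

RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;
DROP TABLE mytable_old;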

5.5 Indexing Methods

5.5.1 B-tree

  • column order in an index is extremely important
  • B+Tree

    data is stored in the leaf nodes.

  • Limitations
    • not useful if the lookup does not start from the leftmost prefix of the indexed columns
    • you can't skip columns in the index.

    e.g. with index(col1, col2, col3), a condition on col3 can use the index only if col2 is also constrained

    • The storage engine can’t optimize accesses with any columns to the right of the first range condition.

    e.g. col2 LIKE 'abc%' is a range condition, so a condition such as col3 = 'somevalue' to its right won't be optimized.

5.5.2 Hash

  • only the Memory storage engine supports explicit hash indexes
  • InnoDB

    adaptive hash indexes: When InnoDB notices that some index values are being accessed very frequently, it builds a hash index for them in memory on top of B-Tree indexes.

  • Building hash indexes
    1. add a hash column
    2. create triggers on insert and update to maintain it (see the sketch after this list)
    3. select by hash value
    SELECT id FROM url WHERE url_crc=CRC32("http://www.mysql.com")
           AND url="http://www.mysql.com"; -- compare the full value to handle collisions
    
    • hash functions: CRC32, FNV64, SHA1, MD5
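
A sketch of steps 1 and 2 (table, column, and trigger names are illustrative):

CREATE TABLE pseudohash (
       id INT UNSIGNED NOT NULL AUTO_INCREMENT,
       url VARCHAR(255) NOT NULL,
       url_crc INT UNSIGNED NOT NULL DEFAULT 0,
       PRIMARY KEY(id),
       KEY(url_crc)
);

DELIMITER //
CREATE TRIGGER pseudohash_crc_ins BEFORE INSERT ON pseudohash FOR EACH ROW
BEGIN
    SET NEW.url_crc = CRC32(NEW.url);
END;
//
CREATE TRIGGER pseudohash_crc_upd BEFORE UPDATE ON pseudohash FOR EACH ROW
BEGIN
    SET NEW.url_crc = CRC32(NEW.url);
END;
//
DELIMITER ;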

5.5.3 R-Tree

  • index GEOMETRY information
  • MyISAM supports R-Tree

5.5.4 Full-text

Full-text is a special type of index that finds keywords in the text instead of comparing values directly to the values in the index.

5.6 Indexing Strategies

5.6.1 Long string index

  • one approach: simulate a hash index
  • indexing the first few characters

    Index selectivity is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table. Choose a prefix that’s long enough to give good selectivity, but short enough to save space

    • one way: compare the frequency of the most common full values with the most common prefix values
    SELECT COUNT(*) AS cnt, city FROM sakila.city_demo GROUP BY city ORDER BY cnt DESC LIMIT 10;
    SELECT COUNT(*) AS cnt, LEFT(city, 3) AS pref FROM sakila.city_demo GROUP BY pref ORDER BY cnt DESC LIMIT 10;
    
    • another way: compute the full column's selectivity and try to make the prefix's selectivity close to that value.
    SELECT COUNT(DISTINCT city)/COUNT(*) FROM sakila.city_demo;
    SELECT COUNT(DISTINCT LEFT(city, 3))/COUNT(*) AS sel3,
           COUNT(DISTINCT LEFT(city, 4))/COUNT(*) AS sel4
    FROM sakila.city_demo;
    
    • downside: MySQL cannot use prefix indexes for ORDER BY or GROUP BY queries, nor can it use them as covering indexes.
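
Once a prefix length is chosen, the prefix index itself is created by indexing only the first characters of the column; a sketch assuming a 7-character prefix:

ALTER TABLE sakila.city_demo ADD KEY (city(7));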

5.6.2 Multicolumn Indexes

  • Individual indexes on lots of columns won’t help MySQL improve performance for most queries

    SELECT col1, col2 FROM tbl WHERE index1=1 OR index2=1;
    -- will translate to
    SELECT col1, col2 FROM tbl WHERE index1=1
    UNION ALL
    SELECT col1, col2 FROM tbl WHERE index2=1 AND index1 <> 1;
    

    Use EXPLAIN to check

5.6.3 Choosing a Good Column Order

Choose the index order such that rows are sorted and grouped in a way that will benefit the query. The order of columns in a multicolumn B-Tree index means that the index is sorted first by the leftmost column, then by the next column, and so on.
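
Absent sorting or grouping requirements, a common rule of thumb is to put the most selective column first. A sketch of checking selectivity, assuming sakila's payment table (customer_id turns out to be more selective, so it leads the index):

SELECT COUNT(DISTINCT staff_id)/COUNT(*) AS staff_id_selectivity,
       COUNT(DISTINCT customer_id)/COUNT(*) AS customer_id_selectivity,
       COUNT(*)
FROM sakila.payment;

ALTER TABLE sakila.payment ADD KEY (customer_id, staff_id);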

5.7 Clustered Indexes

Clustered index is an approach to data storage.

  • InnoDB's clustered indexes actually store a B-Tree index and the rows together in the same structure.

5.8 Covering Indexes

An index that contains (or “covers”) all the data needed to satisfy a query is called a covering index.

  • Covering indexes are especially helpful for InnoDB tables, because of InnoDB’s clustered indexes
  • When you issue a query that is an index-covered query, you’ll see “Using index” in the Extra column in EXPLAIN
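
A sketch using sakila's inventory table, which has a composite index on (store_id, film_id), so the query below is satisfied from the index alone:

EXPLAIN SELECT store_id, film_id FROM sakila.inventory\G
-- the Extra column shows: Using index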

5.8.1 Benefits

  • reduce random I/O

5.9 Using Index for Sorts

Ordering the results by the index works only when

  • the index’s order is exactly the same as the ORDER BY clause
  • and all columns are sorted in the same direction
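
A minimal sketch, assuming a table t with an ascending index on (a, b) (names are illustrative):

SELECT a, b FROM t ORDER BY a, b;           -- same order and direction: the index can produce the sort
SELECT a, b FROM t ORDER BY a ASC, b DESC;  -- mixed directions: a filesort is needed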

6 Query Optimization

6.1 Isolating the Column

MySQL can generally use an index only when the indexed column appears alone on one side of the comparison operator, not buried inside an expression or a function. Wrong examples: WHERE id + 2 = 5, WHERE ABS(id) = 10.
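
A sketch of isolating the column so the index on id can be used (table name t is illustrative):

SELECT * FROM t WHERE id + 2 = 5;  -- bad: the column is inside an expression
SELECT * FROM t WHERE id = 3;      -- good: the arithmetic is moved to the other side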