High Performance MySQL
Table of Contents
1 Special SQL
1.1 AUTOCOMMIT
SHOW VARIABLES LIKE 'AUTOCOMMIT'; SET AUTOCOMMIT = 1;
1.2 Set isolation level
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
2 Storage Engine
- InnoDB
- MyISAM
- Archive
- OLTP: XtraDB, TokuDB, like InnoDB
- Infobright: Column-oriented
- Aria: successor to MyISAM, crash-safe
- MySQL Cluster
2.1 Change engine
2.1.1 ALTER TABLE
ALTER TABLE mytable ENGINE = InnoDB;
- row-by-row copy
- inplace and slow
2.1.2 Dump and import
Use mysqldump
- dump to text
- change create sql
- import
2.1.3 CREATE and SELECT
CREATE TABLE innodb_table LIKE myisam_table; ALTER TABLE innodb_table ENGINE=InnoDB; INSERT INTO innodb_table SELECT * FROM myisam_table;
For large volume data:
START TRANSACTION; INSERT INTO innodb_table SELECT * FROM myisam_table WHERE id BETWEEN x AND y; COMMIT;
2.1.4 Online schema change
pt-online-schema-change (based on Facebook’s online schema change technique)
3 Benchmark
3.1 What to Measure
3.1.1 Throughput
Throughput is defined as the number of transactions per unit of time. The usual unit of measurement is transactions per second(TPS), although it is sometimes transactions per minute(TPM).
3.1.2 Response time or latency
This measures the total time a task requires.
3.1.3 Concurrency
Concurrency is an important but frequently misused and misunderstood metric. Concurrency benchmark shoud be the number of threads or connections doing work simultaneously.
3.1.4 Scalability
3.2 Gathering Data on MySQL
- specify user to execute mysql command
mysql_config_editor set --user=username --host=127.0.0.1 --port=3306 --password - a sample shell script that you can use to gather data on MySQL during benchmarks
#!/bin/sh INTERVAL=5 PREFIX=$INTERVAL-sec-status RUNFILE=/home/benchmark/running mysql -e 'SHOW GLOBAL VARIABLES' >> mysql-variables while test -e $RUNFILE; do file=$(date +%F_%I) sleep=$(date +%s.%N | awk "{print $INTERVAL - (\$1 % $INTERVAL)}") sleep $sleep ts="$(date +"TS %s.%N %F %T")" loadavg="$(uptime)" echo "$ts $loadavg" >> $PREFIX-${file}-status mysql -e 'SHOW GLOBAL STATUS' >> $PREFIX-${file}-status & echo "$ts $loadavg" >> $PREFIX-${file}-innodbstatus mysql -e 'SHOW ENGINE INNODB STATUS\G' >> $PREFIX-${file}-innodbstatus & echo "$ts $loadavg" >> $PREFIX-${file}-processlist mysql -e 'SHOW FULL PROCESSLIST\G' >> $PREFIX-${file}-processlist & echo $ts done echo Exiting because $RUNFILE does not exist.
3.3 Analyzing Results
#!/bin/sh awk ' BEGIN { printf "#ts date time load QPS"; fmt = " %.2f"; } /^TS/ { # The timestamp lines begin with TS. ts = substr($2, 1, index($2, ".") - 1); load = NF - 2; diff = ts - prev_ts; prev_ts = ts; printf "\n%s %s %s %s", ts, $3, $4, substr($load, 1, length($load)-1); } /Queries/ { printf fmt, ($2-Queries)/diff; Queries=$2 } ' "$@"
3.3.1 plot
gnuplot> plot "filename" using 5 w lines title "QPS"
3.4 Tools
3.4.1 http_load
3.4.2 MySQL Benchmark Suite
mysqltest
3.4.3 sysbench
- cpu
sysbench cpu --cpu-max-prime=20000 --threads=4 run
- fileio
IO mode description seqwr Sequential write seqrewr Sequential rewrite seqrd Sequential read rndrd Random read rndwr Random write rndrw Combined random read/write sysbench fileio --file-total-size=21G prepare sysbench fileio --file-total-size=21G --file-test-mode=rndrw \ --time=300 --max-requests=0 --threads=4 run - oltp
sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=test1234 \ --mysql-db=test --table-size=100000 /usr/local/share/sysbench/oltp_common.lua prepare sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=test1234 \ --mysql-db=test --table-size=100000 /usr/local/share/sysbench/oltp_read_write.lua \ --threads=4 run
4 Profiling
- execution-time profiling
- wait analysis
4.1 slow query log
related variables: slow_query_log, long_query_time
- low-overhead on I/O-bound workloads
- high-fidelity
4.1.1 pt-query-digest
Analyze MySQL queries from logs, processlist, and tcpdump.
pt-query-digest [log_file]
4.1.2 tcpdump
Use tcpdump to capture TCP network traffic and inspecting it, then
- use
pt-query-digest --type=tpcdumpto decode the MySQL client/server protocol
4.2 Single Query Profiling
4.2.1 performance_schema
UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME LIKE '%statement/%'; UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME LIKE '%stage/%'; UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%events_statements_%'; UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%events_stages_%'; SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT FROM performance_schema.events_statements_history_long WHERE ... /*same as SHOW PROFILES*/ SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=... /*same as SHOW PROFILE FOR QUERY*/
results in performance_schema.events_statements_history_long
4.2.2 SHOW PROFILE[deprecated]
SET profiling=1; SHOW PROFILES; SHOW PROFILE FOR QUERY [Query_ID];
4.2.3 SHOW STATUS
FLUSH STATUS; SHOW STATUS WHERE Variable_name LIKE 'Handler%' OR Variable_name LIKE 'Created%';
4.2.4 EXPLAIN
EXPLAIN [SQL_QUERY]
4.3 Server-Wide Problem Detection
4.3.1 sysstat
- iostat, vmstat
4.3.2 SHOW GLOBAL STATUS
Run SHOW GLOBAL STATUS periodically, gather Threads_running, Threads_connected, Questions, and Queries.
mysqladmin ext -i1 -p | awk ' /Queries/{q=$4-qp;qp=$4} /Threads_connected/{tc=$4} /Threads_running/{printf "%5d %5d %5d\n", q, tc, $4}'
4.3.3 SHOW PROCESSLIST
mysql -e 'SHOW PROCESSLIST\G' -p | grep State: | sort | uniq -c | sort -rn
4.3.4 SHOW INNODB STATUS
4.3.5 analyze slow log
awk '/^# Time:/{print $3, $4, c;c=0}/^# User/{c++}' slow-query.log
4.3.6 oprofile
primary Linux system profiling tool
4.3.7 pt-stalk
Collect forensic data about MySQL when problems occur.
pt-sift: Browses files created bypt-stalk
4.3.8 pt-mysql-summary & pt-summary
4.3.9 pt-pmp
Aggregate GDB stack traces for a selected program
4.4 Tools
4.4.1 USER_STATISTICS
SHOW TABLES FROM INFORMATION_SCHEMA LIKE '%_STATISTICS'; in MariaDB
4.4.2 strace
Use strace to intercepts system calls
strace -cfp [PID]
5 Schema
5.1 Data Type
5.1.1 Priciples
- Smaller is usually better
- Simple is good
- Avoid NULL if possible
5.1.2 SET
stores bits. useful for ACL(Access Control List)
CREATE TABLE acl ( perms SET('CAN_READ', 'CAN_WRITE', 'CAN_DELETE') NOT NULL ); INSERT INTO acl (perms) VALUES ('CAN_READ,CAN_WRITE'); SELECT perms FROM acl WHERE FIND_IN_SET('CAN_READ', perms);
- stores SET types internally as integers but converts them to strings when doing comparisons.
5.1.3 UUID
- recommend using BINARY(16) type
5.2 Normalization and Denormalization
5.2.1 Normalization
Benefits:
- Normalized updates are usually faster than denormalized updates
- Normalized tables are usually smaller, so they fit better in memory and perform better
- The lack of redundant data means there's less need for DISTINCT or GROUP BY queries when retrieving lists of values
Drawbacks:
- requires more joins
5.2.2 Denormalization
Benefits:
- avoids joins
- allow more efficient indexing strategies
5.3 Summary Table & Cache Table
- summary tables: hold aggregated data from GROUP BY queries
- cache tables: contain data that can be easily retrieved from the schema(i.e., data that is logically redundant).
5.3.1 Flexviews
- offer materialized views
5.4 Speeding Up ALTER TABLE
5.4.1 Default
MySQL performs most alterations by making an empty table with the desired new structure, inserting all the data from the old table into the new one, and deleting the old table.
5.5 Indexing Methods
5.5.1 B-tree
- column order in a index is extremely important
- B+Tree
data is on the leaf node.
- Limitation
- lookup does not start from the leftmost side of the indexed columns
- you can't skip columns in the index.
e.g. index(col1, col2, col3) need to specify a value to col2 if value of col3 is given
- The storage engine can’t optimize accesses with any columns to the right of the first range condition.
e.g. col2 LIKE 'abc%' <- range condition, then col3=="somevalue" won't be optimized.
5.5.2 Hash
- only the Memory storage engine supports explicit hash indexes
- InnoDB
adaptive hash indexes: When InnoDB notices that some index values are being accessed very frequently, it builds a hash index for them in memory on top of B-Tree indexes.
- Building hash indexes
- add hash column
- create trigger on
insertandupdate selectby hash value
SELECT id FROM url WHERE url_crc=CRC32("http://www.mysql.com") AND url="http://www.mysql.com"; -- Search in collision
- hash functions: CRC32, FNV64, SHA1, MD5
5.5.3 R-Tree
- index GEOMETRY information
- MyISAM supports R-Tree
5.5.4 Full-text
Full-text is a special type of index that finds keywords in the text instead of comparing values directly to the values in the index.
5.6 Indexing Strategies
5.6.1 Long string index
- one approach: simulate a hash index
- indexing the first few characters
Index selectivity is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table. Choose a prefix that’s long enough to give good selectivity, but short enough to save space
- one way: try to compare occurrences of each distinct indexed values for each prefix
SELECT COUNT(*) AS cnt, city FROM sakila.city_demo GROUP BY city ORDER BY cnt DESC LIMIT 10; SELECT COUNT(*) AS cnt, LEFT(city, 3) AS pref FROM sakila.city_demo GROUP BY pref ORDER BY cnt DESC LIMIT 10;
- another way: computing the full column's selectivity and trying to make the prefix's selectivity close to that value.
SELECT COUNT(DISTINCT city)/COUNT(*) FROM sakila.city_demo; SELECT COUNT(DISTINCT LEFT(city, 3))/COUNT(*) AS sel3, COUNT(DISTINCT LEFT(city, 4))/COUNT(*) AS sel4, FROM sakila.city_demo;
- downside: MySQL cannot use prefix indexes for ORDER BY or GROUP BY queries, nor can it use them as covering indexes.
5.6.2 Multicolumn Indexes
Individual indexes on lots of columns won’t help MySQL improve performance for most queries
SELECT col1, col2 FROM tbl WHERE index1=1 or index2=1; -- will translate to SELECT col1, col2 FROM tbl WHERE index1=1 UNION ALL SELECT col1, col2 FROM tbl WHERE index2=1 AND index1 <> 1;
Use
EXPLAINto check
5.6.3 Choosing a Good Column Order
Choose the index order such that rows are sorted and grouped in a way that will benefit the query The order of columns in a multicolumn B-Tree index means that the index is sorted first by the leftmost column, then by the next column, and so on.
5.7 Clustered Indexes
Clustered index is an approach to data storage.
- InnoDB's clustered indexes actually store a B-Tree index and the rows together in the same structure.
5.8 Covering Indexes
An index that contains (or “covers”) all the data needed to satisfy a query is called a covering index.
- Covering indexes are especially helpful for InnoDB tables, because of InnoDB’s clustered indexes
- When you issue a query that is an index-covered query, you’ll see “Using index” in the Extra column in EXPLAIN
5.8.1 Benefits
- reduce random I/O
5.9 Using Index for Sorts
Ordering the results by the index works only when
- the index’s order is exactly the same as the ORDER BY clause
- and all columns are sorted in the same direction
6 Query Optimization
6.1 Isolating the Column
Indexed column is alone on one side of the comparison operator.
Wrong Examples: WHERE id+2=5, WHERE SUM(id)=10