Character SetsΒΆ
The latest Unicode 9.0 is supported in MySQL 8.0 under the name of utf8mb4
. utf8mb4
is variable length with individual characters consuming between one and four bytes. There are many naunces between bytes and characters and which is variable length:
- When creating a column as
VARCHAR(n)
, n represents the character length. The byte storage requirement could be up to 4 times more (but is frequently less). - Internally the InnoDB storage engine always [1] stores
utf8mb4
as variable length (in indexes and table rows) forVARCHAR
,CHAR
andTEXT
data types. - In memory temporary tables that are used as part of materialization are fixed length. This may result in cases where temporary tables are either larger, or spill over to disk earlier when using the
utf8mb4
character set. - The buffers used to sort data are variable length (since MySQL 5.7).
EXPLAIN
will always show the maximum length of a variable length index (byte length). Frequently the storage requirements are much lower.
Example 37 shows EXPLAIN
reporting the use of an index on a column of type CHAR(52)
in latin1 character set. After converting the table to utf8mb4
the storage requirements of the table have not increased, but EXPLAIN
is now showing an increase in key_length
.
Example 37: EXPLAIN showing the maximum key length of an index (latin1 character set)
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE name='Canada';
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "1.20"
},
"table": {
"table_name": "Country",
"access_type": "ref",
"possible_keys": [
"Name"
],
"key": "Name",
"used_key_parts": [
"Name"
],
"key_length": "52", # CHAR(52)
"ref": [
"const"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 1,
"filtered": "100.00",
"cost_info": {
"read_cost": "1.00",
"eval_cost": "0.20",
"prefix_cost": "1.20",
"data_read_per_join": "264"
},
"used_columns": [
"Code",
"Name",
"Continent",
"Region",
"SurfaceArea",
"IndepYear",
"Population",
"LifeExpectancy",
"GNP",
"GNPOld",
"LocalName",
"GovernmentForm",
"HeadOfState",
"Capital",
"Code2"
]
}
}
}
Example 38: EXPLAIN showing the maximum key length of an index (utf8mb4 character set)
ALTER TABLE Country CONVERT TO CHARACTER SET utf8mb4;
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE name='Canada';
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "1.20"
},
"table": {
"table_name": "Country",
"access_type": "ref",
"possible_keys": [
"Name"
],
"key": "Name",
"used_key_parts": [
"Name"
],
"key_length": "208", # CHAR(52) * 4 = 208
"ref": [
"const"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 1,
"filtered": "100.00",
"cost_info": {
"read_cost": "1.00",
"eval_cost": "0.20",
"prefix_cost": "1.20",
"data_read_per_join": "968"
},
"used_columns": [
"Code",
"Name",
"Continent",
"Region",
"SurfaceArea",
"IndepYear",
"Population",
"LifeExpectancy",
"GNP",
"GNPOld",
"LocalName",
"GovernmentForm",
"HeadOfState",
"Capital",
"Code2"
]
}
}
}
[1] | Always when using the DYNAMIC , COMPACT and COMPRESSED row formats. There is typically no practical use to use the earlier REDUNDANT row format. |