Joins¶
MySQL executes joins using a nested-loop join algorithm. It does not support hash or sort merge joins as may be available in competing database, which makes it less suitable for some analytic/data warehouse style queries. The MySQL optimizer does, however, have buffering strategies available to it which lessen the worst case of a nested-loop join.
The nested loop join¶
Example 23 shows a 3-table join between tables Country
, City
and CountryLanguage
. The complete steps to execute this query are described as follows:
- The Optimizer must first decide on a driving table (
Country
), as well as the indexes to be used to join to the remaining tables (City
,CountryLanguage
). - Execution starts by stepping through the first table (
Country
) table one row at a time. For each row that matches the filtering conditions (Country.Continent='Asia'
), a lookup will be performed into the next table (City
). - For each row that matches in the City table, a lookup will be performed in the final table (
CountryLanguage
). TheCountryLanguage
table will apply an additional filter: checking thatIsOfficial = 1
.
The nested loop algorithm works best when work can be eliminated before the joining begins. This means that the “best case” is where the driving table has a number of highly selective predicates.
The worst case can be when there are predicates scattered across many of the tables, and indexes cannot eliminate enough work before all of the tables need to be joined. In situations like this, it is common to denormalize a schema.
Denormalization helps because by redundantly maintaining a set of columns on the driving table, composite indexes can be added, and filtering can be applied before having to access rows or join to other tables.
Example 23: A nested loop join on three tables
EXPLAIN FORMAT=JSON
SELECT
Country.Name as Country, City.Name as Capital, Language
FROM
City
INNER JOIN Country ON Country.Capital=City.id
INNER JOIN CountryLanguage ON CountryLanguage.CountryCode=Country.code
WHERE
Country.Continent='Asia' and CountryLanguage.IsOfficial='T';
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "3.42"
},
"nested_loop": [
{
"table": {
"table_name": "Country",
"access_type": "const",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"Code"
],
"key_length": "3",
"ref": [
"const"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 1,
"filtered": "100.00",
"cost_info": {
"read_cost": "0.00",
"eval_cost": "0.20",
"prefix_cost": "0.00",
"data_read_per_join": "264"
},
"used_columns": [
"Code",
"Name",
"Capital"
]
}
},
{
"table": {
"table_name": "City",
"access_type": "const",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"ID"
],
"key_length": "4",
"ref": [
"const"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 1,
"filtered": "100.00",
"cost_info": {
"read_cost": "0.00",
"eval_cost": "0.20",
"prefix_cost": "0.00",
"data_read_per_join": "72"
},
"used_columns": [
"ID",
"Name"
]
}
},
{
"table": {
"table_name": "CountryLanguage",
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"CountryCode"
],
"key": "PRIMARY",
"used_key_parts": [
"CountryCode"
],
"key_length": "3",
"ref": [
"const"
],
"rows_examined_per_scan": 12,
"rows_produced_per_join": 6,
"filtered": "50.00",
"cost_info": {
"read_cost": "1.02",
"eval_cost": "1.20",
"prefix_cost": "3.42",
"data_read_per_join": "240"
},
"used_columns": [
"CountryCode",
"Language",
"IsOfficial"
],
"attached_condition": "(`world`.`CountryLanguage`.`IsOfficial` = 'T')"
}
}
]
}
}
INNER JOIN¶
An INNER JOIN
semantically requires that a row exist on both the left and the right side of the join. With this semantic in mind, it means that MySQL is able to join between the two tables in either order. It is the responsibility of the optimizer to pick the order with the lowest cost.
LEFT and RIGHT JOIN¶
A LEFT JOIN
semantically means that a row on the right side of the join is optional (and therefore a RIGHT JOIN
also means the left side is optional). Because one side of the join is optional, execution is forced to start access with the side that exists first. Because the order is enforced, the optimizer is not able to consider all of the potential join orders (execution plans) that would apply to an INNER JOIN
. Consequently, this leads to the simplification that LEFT JOIN
is slower.
Condition Fanout Filter¶
Starting with MySQL 5.7, the optimizer now considers the filtering effect of conditions other than those on the driving table. This condition_filtering
is used to refine the join order of subsequent tables.
Condition filtering is highly suited for histograms as, without it, it uses a simpler heuristic that can be inaccurate with skewed or lopsided data. It will also make use of statistics available from indexes (which have a higher cost).
Tip
In MySQL 5.7 a number of query regressions were due to the condition fanout filter and skewed data sets. In MySQL 8.0 consider adding histograms to columns on tables used in join conditions.