Prerequisites for the appearance of the JOIN operator

As the amount of data in tables grows, duplicate values ​​often appear in them. For example, the product table has brand names. which are sometimes repeated. It would be useful to have a separate brand table for them, which would contain, for example, such data:

  • id - brand id;
  • name – brand name;
  • company_owner - the name of the company owner of the brand;
  • company_contacts - contacts of the brand owner's company.

Then we would like to create a table of addresses into which we could put all the addresses:

  • id – id-addresses;
  • country;
  • region;
  • city;
  • street;
  • house;
  • zip.

Moreover, in this table it would be possible to store the addresses of not only brand-owning companies, but also the addresses of customers and employees. And this approach is simply a consequence of the growth in the amount of data (the number of rows in the tables). This makes it easier to manipulate data in tables and maintain their integrity.

When you have 5 employees, you can simply enter their occupation in the occupation column . If you have 5 thousand employees in your company, then you need a table with a list of x professions and their responsibilities.

In the Java language, by the way, there is something similar. If you have a lot of code in a method, then there is a desire to break it into several methods. If there are many methods in a class, then I want to break it into several classes.

That is why large databases have thousands of tables. And almost all queries are executed on several tables at once. And the Cartesian product of the rows of three tables in each of which has a thousand records is already a billion rows.

But just filtering a billion rows using WHERE can take hours. Therefore, the creators of the SQL language offered their solution - the JOIN operator.

Introduction to the JOIN operator

The JOIN operator is used to explicitly tell the SQL server that we do not need a Cartesian product of all rows of tables for all, but a smart gluing of rows of different tables that refer to each other using an ID (or in another way). The server has a separate algorithm for processing table joins using the JOIN operator, which allows such operations to be performed much faster. The general form of the JOIN operator is as follows:

table 1 JOIN table 2 ON condition

It says here that you need to combine the tables table1 and table2 into one table, and use the condition as the join criterion .

Let's take our old example:

SELECT * FROM employee, task WHERE emploee.id = task.emploee_id

And rewrite it using the JOIN operator:

SELECT * FROM employee JOIN task ON emploee.id = task.emploee_id

Both queries will give the same result in our case, but this is only because the queries are very simple. In the future, WHERE allows you to build up exactly the row filter, and the one specified after ON allows you to write complex scripts for linking tables. You can also use aliases (table aliases) when using the JOIN statement. Example:

SELECT * FROM employee e JOIN task t ON e.id = t.emploee_id