
Multiple SELECT COUNTs in one MySQL query. Optimizing MySQL queries. Combining several MySQL queries into one

I have already written about various SQL queries, but it's time to talk about more complex things, for example, an SQL query that selects records from multiple tables.

When we selected from a single table, everything was very simple:

SELECT required_field_names FROM table_name WHERE select_condition

That is simple and straightforward, but selecting from several tables at once is noticeably more complicated. One difficulty is that field names can coincide: for example, every table may have an id field.

Let's consider a query like this:

SELECT * FROM table_1, table_2 WHERE table_1.id > table_2.user_id

To many who have not dealt with such queries, it will seem very simple: the table names have merely been added in front of the field names. This prefix does indeed prevent conflicts between identically named fields. However, the real difficulty lies not here but in how such an SQL query is executed.

The algorithm is as follows. The first record is taken from table_1, and its id is read. Then the whole of table_2 is scanned, and every record whose user_id is less than that id is added to the result. So after the first iteration the result can contain anywhere from zero records to as many records as table_2 has. At the next iteration, the next record of table_1 is taken, the entire table_2 is scanned again, and the condition table_1.id > table_2.user_id is checked again; all records that satisfy it are added to the result. The output can therefore contain far more records than both tables combined: up to the number of rows in table_1 multiplied by the number of rows in table_2.
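To make the iteration concrete, here is a minimal sketch of that nested-loop evaluation in Python, checked against SQLite (used here as a stand-in for MySQL). The sample rows are hypothetical, and a real database engine may pick a different join strategy internally; the sketch only shows which row pairs end up in the result.

```python
import sqlite3

# Hypothetical sample data: table_1 has an id column, table_2 a user_id column.
table_1 = [(1,), (2,), (3,)]
table_2 = [(1,), (2,)]

def nested_loop_join(outer, inner, condition):
    """Naive nested-loop join: for every outer row, scan the whole inner table."""
    result = []
    for left in outer:           # one iteration per row of table_1
        for right in inner:      # full scan of table_2 on every iteration
            if condition(left, right):
                result.append(left + right)
    return result

# WHERE table_1.id > table_2.user_id
manual = nested_loop_join(table_1, table_2, lambda l, r: l[0] > r[0])

# The same query executed by SQLite for comparison.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (id INTEGER)")
conn.execute("CREATE TABLE table_2 (user_id INTEGER)")
conn.executemany("INSERT INTO table_1 VALUES (?)", table_1)
conn.executemany("INSERT INTO table_2 VALUES (?)", table_2)
from_db = conn.execute(
    "SELECT * FROM table_1, table_2 WHERE table_1.id > table_2.user_id"
).fetchall()

print(sorted(manual))                     # [(2, 1), (3, 1), (3, 2)]
print(sorted(from_db) == sorted(manual))  # True
```

With 3 rows in table_1 and 2 in table_2 the result has 3 rows; if every id were larger than every user_id, it would contain all 3 × 2 = 6 combinations.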

If you understood how this works on the first reading, great; if not, reread it until it is completely clear. Once you understand it, the rest will be easier.

The previous SQL query is rarely used as is; it was given only to explain the algorithm for selecting from several tables. Now let's look at a more practical SQL query. Suppose we have two tables: one with products (it has an owner_id field holding the id of the product's owner) and one with users (it has an id field). We want a single SQL query to return records in which each record contains information about a user and one of his products. The next record contains the same user and his next product, and when that user runs out of products, the records move on to the next user. In other words, we have to join the two tables so that each resulting record describes a user and one of his products.

Such a query replaces two SQL queries: one selecting from the products table and one selecting from the users table. In addition, it immediately matches each user with his products.

The query itself is very simple (if you understood the previous one):

SELECT * FROM users, products WHERE users.id = products.owner_id

The algorithm here is simpler: the first record is taken from the users table, its id is read, and all records of the products table are examined, adding to the result those whose owner_id equals that id. Thus, the first iteration collects all products of the first user, the second iteration collects all products of the second user, and so on.
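A small runnable sketch of the users/products query, assuming SQLite as a stand-in for MySQL and a hypothetical schema: only users.id and products.owner_id come from the text, while the name and title columns and all rows are invented for illustration. It also shows the equivalent explicit INNER JOIN ... ON form, which MySQL supports as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; only users.id and products.owner_id come from the text.
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, owner_id INTEGER, title TEXT);
    INSERT INTO users    VALUES (1, 'Anna'), (2, 'Boris'), (3, 'Vera');
    INSERT INTO products VALUES (10, 1, 'bicycle'), (11, 1, 'helmet'), (12, 2, 'tent');
""")

# The query from the text: the join condition lives in the WHERE clause.
implicit = conn.execute("""
    SELECT users.name, products.title
    FROM users, products
    WHERE users.id = products.owner_id
    ORDER BY users.id, products.id
""").fetchall()

# The same join written with explicit INNER JOIN ... ON syntax.
explicit = conn.execute("""
    SELECT users.name, products.title
    FROM users INNER JOIN products ON users.id = products.owner_id
    ORDER BY users.id, products.id
""").fetchall()

for name, title in implicit:
    print(name, title)
# Anna bicycle
# Anna helmet
# Boris tent
```

The ORDER BY is there because SQL does not guarantee the "user by user" order unless it is requested. Note also that a user without products (Vera in this sample) does not appear in the result at all.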

As you can see, SQL queries that select from multiple tables are not the simplest, but they can be enormously useful, so it is well worth knowing how to write and use them.

October 9, 2008 at 11:37 PM

Optimizing MySQL queries


In everyday work, I keep running into the same kinds of mistakes in queries.

In this article I would like to give examples of how NOT to write queries.

  • Fetching all fields
    SELECT * FROM table

    When writing queries, do not select all fields with "*". List only the fields you really need; this reduces the amount of data fetched and sent. Also, don't forget about covering indexes. Even if you really need all the fields in the table, it's better to list them. First, it improves the readability of the code: with an asterisk, you cannot tell which fields the table has without looking at its definition. Second, the set of columns in your table may change over time: if today there are five INT columns, in a month TEXT and BLOB fields may be added, which will slow down the query.

  • Looped queries.
    You need to understand that SQL is a set-based language. Programmers who are used to thinking in procedural languages sometimes find it hard to switch to thinking in sets. This can be done quite simply by adopting one rule: "never execute queries in a loop." Examples of how to do this:

    1. Selects
    $news_ids = get_list("SELECT news_id FROM today_news");
    while ($news_id = get_next($news_ids))
        $news = get_row("SELECT title, body FROM news WHERE news_id = " . $news_id);

    The rule is very simple: the fewer queries, the better (although, as with any rule, there are exceptions). Don't forget about the IN() construct. The code above can be written as a single query:
    SELECT title, body FROM today_news INNER JOIN news USING (news_id)

    2. Inserts
    $log = parse_log();
    foreach ($log as $record)
        query("INSERT INTO logs SET value = " . $record["value"]);

    It is much more efficient to build and execute a single query:
    INSERT INTO logs (value) VALUES (...), (...)

    3. Updates
    Sometimes you need to update several rows in the same table. If the new value is the same for all of them, it's simple:
    UPDATE news SET title = "test" WHERE id IN (1, 2, 3)

    If the new value differs from row to row, you can use the following query:
    UPDATE news SET
    title = CASE
    WHEN news_id = 1 THEN "aa"
    WHEN news_id = 2 THEN "bb" END
    WHERE news_id IN (1, 2)

    Our tests show that such a query is 2-3 times faster than several separate queries.

  • Performing operations on indexed fields
    SELECT user_id FROM users WHERE blogs_count * 2 = $value

    This query will not use an index even if the blogs_count column is indexed. For the index to be used, no transformations may be applied to the indexed column in the query. In such queries, move the transformation to the other side of the comparison:
    SELECT user_id FROM users WHERE blogs_count = $value / 2;

    A similar example:
    SELECT user_id FROM users WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(registered) <= 10;

    will not use an index on the registered field, whereas
    SELECT user_id FROM users WHERE registered >= DATE_SUB(CURRENT_DATE, INTERVAL 10 DAY);
    will.

  • Fetching rows just to count their number
    $result = mysql_query("SELECT * FROM table", $link);
    $num_rows = mysql_num_rows($result);
    If you need the number of rows that satisfy a certain condition, use a SELECT COUNT(*) FROM table query rather than selecting all the rows just to count them.
  • Fetching extra rows
    $result = mysql_query("SELECT * FROM table1", $link);
    $i = 0;
    while (($row = mysql_fetch_assoc($result)) && $i < 20) {
        // process $row
        $i++;
    }
    If you only need n rows, use LIMIT instead of discarding the extra rows in your application.
  • Using ORDER BY RAND ()
    SELECT * FROM table ORDER BY RAND() LIMIT 1;

    If the table has more than 4-5 thousand rows, ORDER BY RAND() will be very slow. It is much more efficient to run two queries.

    If the table has an auto_increment primary key and there are no gaps:
    $rnd = rand(1, query("SELECT MAX(id) FROM table"));
    $row = query("SELECT * FROM table WHERE id = " . $rnd);

    Or:
    $cnt = query("SELECT COUNT(*) FROM table");
    $row = query("SELECT * FROM table LIMIT " . rand(0, $cnt - 1) . ", 1");
    which, however, can also be slow when the table has a very large number of rows.

  • Using a large number of JOINs
    SELECT
    v.video_id,
    a.name,
    g.genre
    FROM
    videos AS v
    LEFT JOIN
    link_actors_videos AS la ON la.video_id = v.video_id
    LEFT JOIN
    actors AS a ON a.actor_id = la.actor_id
    LEFT JOIN
    link_genre_video AS lg ON lg.video_id = v.video_id
    LEFT JOIN
    genres AS g ON g.genre_id = lg.genre_id

    Keep in mind that when joining tables with one-to-many relationships, the number of rows in the result grows with each successive JOIN. In such cases it is often faster to split the query into several simple ones.

  • Using LIMIT
    SELECT… FROM table LIMIT $start, $per_page

    Many people think that such a query returns $per_page records (usually 10-20) and will therefore be fast. It is fast for the first few pages. But if there are many records and you need to run a SELECT… FROM table LIMIT 1000000, 20 query, MySQL will first fetch 1000020 records, discard the first million, and return 20. That may not be fast at all. There are no trivial solutions to this problem. Many people simply limit the number of available pages to a reasonable number. You can also speed up such queries with covering indexes or third-party solutions (such as Sphinx).

  • Not using ON DUPLICATE KEY UPDATE
    $row = query("SELECT * FROM table WHERE id = 1");

    if ($row)
        query("UPDATE table SET column = column + 1 WHERE id = 1");
    else
        query("INSERT INTO table SET column = 1, id = 1");

    This construction can be replaced with a single query, provided that the id field has a primary or unique key:
    INSERT INTO table SET column = 1, id = 1 ON DUPLICATE KEY UPDATE column = column + 1
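Several of the patterns above can be tried out without a MySQL server. The following is a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL. All table and column names are hypothetical; EXPLAIN QUERY PLAN is SQLite's counterpart of MySQL's EXPLAIN and prints different text; and INSERT ... ON CONFLICT ... DO UPDATE (SQLite 3.24 or newer) plays the role of MySQL's ON DUPLICATE KEY UPDATE.

```python
import sqlite3

# SQLite stands in for MySQL; every table and column name here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logs (value TEXT);
    CREATE TABLE news (news_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, blogs_count INTEGER);
    CREATE INDEX idx_blogs_count ON users (blogs_count);
    CREATE TABLE counters (id INTEGER PRIMARY KEY, hits INTEGER);
    CREATE TABLE videos (video_id INTEGER PRIMARY KEY);
    CREATE TABLE link_actors_videos (video_id INTEGER, actor_id INTEGER);
    CREATE TABLE link_genre_video (video_id INTEGER, genre_id INTEGER);
    INSERT INTO videos VALUES (1);
    INSERT INTO link_actors_videos VALUES (1, 10), (1, 11);
    INSERT INTO link_genre_video VALUES (1, 20), (1, 21), (1, 22);
""")

# 1. One multi-row INSERT instead of one INSERT per record.
values = ["a", "b", "c"]
placeholders = ", ".join(["(?)"] * len(values))  # "(?), (?), (?)"
conn.execute(f"INSERT INTO logs (value) VALUES {placeholders}", values)

# 2. One UPDATE with CASE instead of one UPDATE per row. The WHERE clause
#    matters: without it, rows matched by no WHEN would be set to NULL.
conn.executemany("INSERT INTO news (news_id, title) VALUES (?, ?)",
                 [(1, "old"), (2, "old"), (3, "old")])
conn.execute("""
    UPDATE news SET title = CASE
        WHEN news_id = 1 THEN 'aa'
        WHEN news_id = 2 THEN 'bb' END
    WHERE news_id IN (1, 2)
""")

# 3. A transformed indexed column vs. the rewritten comparison.
def plan_detail(sql, params=()):
    # The last column of an EXPLAIN QUERY PLAN row is a readable description.
    return conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()[0][3]

transformed = plan_detail("SELECT user_id FROM users WHERE blogs_count * 2 = ?", (10,))
rewritten = plan_detail("SELECT user_id FROM users WHERE blogs_count = ?", (10 / 2,))
print(transformed)  # begins with SCAN: every row (or index entry) is examined
print(rewritten)    # begins with SEARCH: idx_blogs_count is used for the lookup

# 4. Let the database count instead of fetching every row.
n_logs = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

# 5. Insert-or-increment in one statement.
for _ in range(3):
    conn.execute("""
        INSERT INTO counters (id, hits) VALUES (1, 1)
        ON CONFLICT(id) DO UPDATE SET hits = hits + 1
    """)
hits = conn.execute("SELECT hits FROM counters WHERE id = 1").fetchone()[0]

# 6. Two one-to-many JOINs multiply rows: 2 actors x 3 genres = 6 rows for one video.
joined = conn.execute("""
    SELECT v.video_id, la.actor_id, lg.genre_id
    FROM videos AS v
    LEFT JOIN link_actors_videos AS la ON la.video_id = v.video_id
    LEFT JOIN link_genre_video AS lg ON lg.video_id = v.video_id
""").fetchall()
print(len(joined))  # 6
```

The exact plan strings vary between SQLite versions, which is why only their first word is meaningful here.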


This short article focuses on databases, in particular MySQL: selecting and counting. When working with databases, you often need to count rows with COUNT(), with or without a condition. This is extremely simple to do with the following query:

SELECT COUNT(*) FROM table

The query returns the number of rows in the table.

Conditional counting

SELECT COUNT(*) FROM table WHERE var = 1

The query returns the number of rows in the table that satisfy the condition var = 1.

To get several row counts for different conditions, you can run several queries one after another, one per condition.

In some cases, however, this approach is inconvenient and not optimal. It then makes sense to write one query containing several subqueries, so that all the results come back from a single query.

Thus, by executing just one query against the database, we get a single result row containing the row count for each condition, for example:

c1 | c2 | c3
---+----+---
 1 |  5 |  8
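A minimal sketch of the separate-queries and subquery approaches, run through Python's sqlite3 module as a stand-in for MySQL. The table name items, the column var, and the three conditions var = 1, 2, 3 are assumptions, with sample data chosen so that the counts reproduce the 1 | 5 | 8 result.

```python
import sqlite3

# Hypothetical table "items" with a column "var"; conditions var = 1, 2, 3 are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (var INTEGER)")
conn.executemany("INSERT INTO items (var) VALUES (?)",
                 [(1,)] * 1 + [(2,)] * 5 + [(3,)] * 8)

# Variant 1: one query per condition (three round trips to the database).
separate = tuple(
    conn.execute("SELECT COUNT(*) FROM items WHERE var = ?", (v,)).fetchone()[0]
    for v in (1, 2, 3)
)

# Variant 2: one query whose columns are scalar subqueries (one round trip).
combined = conn.execute("""
    SELECT
        (SELECT COUNT(*) FROM items WHERE var = 1) AS c1,
        (SELECT COUNT(*) FROM items WHERE var = 2) AS c2,
        (SELECT COUNT(*) FROM items WHERE var = 3) AS c3
""").fetchone()

print(separate)  # (1, 5, 8)
print(combined)  # (1, 5, 8)
```

The outer SELECT has no FROM clause; both MySQL and SQLite allow that, and each column is filled by its own subquery.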

Compared with several separate queries, the drawback of subqueries is a somewhat lower execution speed and a higher load on the database.

The next way to get several COUNTs in one MySQL query is built a little differently: it uses IF(condition, value1, value2) together with SUM(). Written as IF(condition, 1, 0), the IF yields 1 for each row that meets the condition and 0 otherwise, and SUM adds these values up, so a single query returns several counts at once.

The query is quite compact, but its execution speed is disappointing. Its result looks like this:

total | c1 | c2 | c3
------+----+----+---
   14 |  1 |  5 |  8
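A minimal sketch of the SUM-based variant, assuming a hypothetical table items with a column var, the conditions var = 1, 2, 3, and SQLite standing in for MySQL. MySQL's IF(condition, 1, 0) is written here as the standard CASE WHEN condition THEN 1 ELSE 0 END, which MySQL also accepts.

```python
import sqlite3

# Hypothetical table and conditions, with data chosen to give counts 1, 5, 8.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (var INTEGER)")
conn.executemany("INSERT INTO items (var) VALUES (?)",
                 [(1,)] * 1 + [(2,)] * 5 + [(3,)] * 8)

# One pass over the table: each CASE yields 1 or 0 per row, and SUM adds them up.
row = conn.execute("""
    SELECT
        COUNT(*)                                 AS total,
        SUM(CASE WHEN var = 1 THEN 1 ELSE 0 END) AS c1,
        SUM(CASE WHEN var = 2 THEN 1 ELSE 0 END) AS c2,
        SUM(CASE WHEN var = 3 THEN 1 ELSE 0 END) AS c3
    FROM items
""").fetchone()

print(row)  # (14, 1, 5, 8)
```

One plausible reason this form can be slower: with no WHERE clause, every row is read and every CASE expression is evaluated for it, while each subquery in the alternative filters on its own condition (and could use an index on var if one existed). This is a hypothesis, not a measured explanation.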

Below is a speed comparison of the three query variants for fetching several COUNT() values. To measure it, each type of query was executed 1000 times against a table with more than three thousand records, and each query included SQL_NO_CACHE so that the database would not cache the results.

Execution time (1000 runs of each variant):
Three separate queries: 0.9 s
One query with subqueries: 0.95 s
One query with IF and SUM: 1.5 s

Conclusion. There are several ways to build a MySQL query that returns multiple COUNT() values. The first option, separate queries, is not very convenient, but it is the fastest. The second option, with subqueries, is somewhat more convenient and only slightly slower. The third, compact version with IF and SUM looks the most convenient but is the slowest: roughly 1.6-1.7 times slower than the first two. Therefore, when optimizing database work, I recommend the second version, with COUNT() subqueries: its speed is close to the fastest result, and keeping everything in one query is convenient.
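For readers who want to repeat the comparison, here is a rough timing harness. It is a sketch under stated assumptions: SQLite instead of MySQL (so SQL_NO_CACHE does not apply), a hypothetical table of 3000 rows split evenly across var = 1, 2, 3, and Python's time.perf_counter for timing. The absolute numbers, and possibly even the ranking, will differ from the MySQL figures reported above.

```python
import sqlite3
import time

# Hypothetical test table: 3000 rows, 1000 each with var = 1, 2, 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (var INTEGER)")
conn.executemany("INSERT INTO items (var) VALUES (?)",
                 [(i % 3 + 1,) for i in range(3000)])

def three_queries():
    return tuple(
        conn.execute("SELECT COUNT(*) FROM items WHERE var = ?", (v,)).fetchone()[0]
        for v in (1, 2, 3)
    )

def subqueries():
    return conn.execute("""SELECT
        (SELECT COUNT(*) FROM items WHERE var = 1),
        (SELECT COUNT(*) FROM items WHERE var = 2),
        (SELECT COUNT(*) FROM items WHERE var = 3)""").fetchone()

def sum_case():
    return conn.execute("""SELECT
        SUM(CASE WHEN var = 1 THEN 1 ELSE 0 END),
        SUM(CASE WHEN var = 2 THEN 1 ELSE 0 END),
        SUM(CASE WHEN var = 3 THEN 1 ELSE 0 END) FROM items""").fetchone()

def measure(fn, runs=1000):
    # Total wall-clock time for `runs` executions of one variant.
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return time.perf_counter() - start

timings = {fn.__name__: measure(fn) for fn in (three_queries, subqueries, sum_case)}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```

Checking that all three variants return the same counts before timing them is a cheap guard against comparing queries that do different work.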

In the last lesson we ran into an inconvenience. We wanted to find out who created the "bicycles" topic and ran this query:

Instead of the author's name, we got his identifier. This is understandable: we queried a single table, Topics, while the names of topic authors are stored in another table, Users. So, having found the author's identifier, we need one more query, to the Users table, to find out his name:

SQL lets you combine such queries into one by turning one of them into a subquery. So, to find out who created the "bicycles" topic, we'll run the following query:

That is, we write another query inside the condition after the WHERE keyword. MySQL processes the subquery first, gets id_author = 2, and passes this value to the WHERE clause of the outer query.
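A minimal sketch of both approaches, assuming SQLite as a stand-in for MySQL and a hypothetical schema: the tables users and topics and the columns topic_name and id_author come from the lesson, while id_user, id_topic, name, and all sample rows (including the name of user 2) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data for the forum example.
conn.executescript("""
    CREATE TABLE users  (id_user INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE topics (id_topic INTEGER PRIMARY KEY, topic_name TEXT, id_author INTEGER);
    INSERT INTO users  VALUES (1, 'Sergey'), (2, 'Ivan'), (4, 'Sveta');
    INSERT INTO topics VALUES (1, 'about fishing', 1), (2, 'bicycles', 2);
""")

# Two round trips: first the author's id, then the author's name.
author_id = conn.execute(
    "SELECT id_author FROM topics WHERE topic_name = 'bicycles'").fetchone()[0]
name_two_steps = conn.execute(
    "SELECT name FROM users WHERE id_user = ?", (author_id,)).fetchone()[0]

# One round trip: the subquery supplies the value for the outer WHERE.
name_subquery = conn.execute("""
    SELECT name FROM users
    WHERE id_user = (SELECT id_author FROM topics WHERE topic_name = 'bicycles')
""").fetchone()[0]

print(author_id, name_two_steps, name_subquery)  # 2 Ivan Ivan
```

One difference worth knowing: if a subquery compared with = returns more than one row, MySQL raises an error ("Subquery returns more than 1 row"), whereas SQLite silently uses the first row.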

A query can contain several subqueries. The syntax for such a query is as follows:

Note that a subquery used this way may select only one column, whose values it returns to the outer query. Trying to select several columns will cause an error.

To consolidate this, let's write another query and find out which messages on the forum were left by the author of the "bicycles" topic:

Now let's make the task harder and find out in which topics the author of the "bicycles" topic left messages:

Let's see how it works.

  • MySQL executes the innermost query first:

  • Its result (id_author = 2) is passed to the enclosing query, which takes the form:

  • That query's result (id_topic: 4, 1) is passed to the outer query, which takes the form:

  • And the outer query gives the final result (topic_name: about fishing, about fishing). That is, the author of the "bicycles" topic posted messages in the topic "About fishing" created by Sergey (id = 1) and in the topic "About fishing" created by Sveta (id = 4).
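The three steps above can be reproduced with a minimal sketch, assuming SQLite as a stand-in for MySQL and hypothetical data: the columns topic_name, id_author, and id_topic come from the lesson, while the messages table, its other columns, and all rows are assumptions chosen to mirror the described result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical forum data mirroring the example.
conn.executescript("""
    CREATE TABLE topics   (id_topic INTEGER PRIMARY KEY, topic_name TEXT, id_author INTEGER);
    CREATE TABLE messages (id_message INTEGER PRIMARY KEY, id_author INTEGER,
                           id_topic INTEGER, body TEXT);
    INSERT INTO topics VALUES
        (1, 'about fishing', 1), (2, 'bicycles', 2), (4, 'about fishing', 4);
    INSERT INTO messages VALUES
        (1, 2, 4, 'first message'), (2, 2, 1, 'second message'),
        (3, 1, 2, 'a message by someone else');
""")

# Step 1: the innermost query.
author = conn.execute(
    "SELECT id_author FROM topics WHERE topic_name = 'bicycles'").fetchone()[0]

# Step 2: the middle query with the inner result substituted.
topic_ids = [r[0] for r in conn.execute(
    "SELECT id_topic FROM messages WHERE id_author = ? ORDER BY id_message", (author,))]

# All three levels in one statement; IN accepts the multi-row middle result.
names = [r[0] for r in conn.execute("""
    SELECT topic_name FROM topics
    WHERE id_topic IN (
        SELECT id_topic FROM messages
        WHERE id_author = (SELECT id_author FROM topics WHERE topic_name = 'bicycles'))
    ORDER BY id_topic
""")]

print(author, topic_ids, names)  # 2 [4, 1] ['about fishing', 'about fishing']
```

Because the middle query can return several rows, it is compared with IN rather than =; a topic would still appear only once even if the author had posted in it several times.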
That's all I wanted to say about nested queries, although two points deserve attention:
  • It is not recommended to nest queries more than three levels deep: this increases execution time and makes the code harder to understand.
  • The nested-query syntax shown here is the most common one, but not the only one. For example, instead of the query

    you can write

    That is, we can use any of the operators that work with the WHERE keyword (we studied them in the last lesson).