vendredi 17 juin 2016

MySQL insert with sub select slow

We are using Python and LOAD DATA INFILE to load data from CSVs into our staging database. From staging we have sql scripts moving the data into our actual production database.

LOAD DATA INFILE is lightning fast compared to selecting rows from staging and inserting them into production.

We are on 5.7, using InnoDB and we have applied the following configuration to optimise our inserts:

  • Set innodb_autoinc_lock_mode to 2
  • Set innodb buffer pool size to half of the memory (16GB)
  • Set log buffer size to 4GB
  • We are using TRANSACTIONS
  • Use SET autocommit=0;

Still the insert from one table to another is significantly slower compared to LOAD DATA INFILE.

When I look at IO writes, with load data infile it goes up to 30 MB/s, while with normal inserts, it's max 500KB/sec.

Is there any way we could improve this performance or do we need to completely rethink our approach. I can think of using OUTFILE for sub queries and load that back in with INFILE but it doesn't sound like the right approach.

And the statement:

INSERT INTO documentkey (documentClassCode,dId,fileTypeCode,internet,pathId,creationTime,signature,CSVimportId) 
SELECT case when csv.`Document Class` is null
                then (select classCode from mydb.class where classDesc = 'Empty'
                    And LookupId = (select LookupId from mydb.Lookup where LookupGroupCode = 'C' and EntityLookedup = 'documentkey')
                    )
                else (select classCode from mydb.class where    classDesc = csv.`Document Class`
                    And LookupId = (select LookupId from mydb.Lookup where LookupGroupCode = 'C' and EntityLookedup = 'documentkey')
                    )
        end,
        csv.`dId`,
        (select typeCode from mydb.type
                Where typeDesc = csv.`File Type`
                And LookupId = (select LookupId from mydb.Lookup where LookupGroupCode = 'T' and EntityLookedup = 'documentkey')
        ),
        case    when csv.`message ID` is null
                then (select messageIncrId from message where internetdesc = 'Empty')
                else case   when    exists (select internetMessageIncrId from internetMessage where internetdesc = csv.`Internet Message ID`)
                            then    (select internetMessageIncrId from internetMessage where internetdesc = csv.`Internet Message ID`)
                            else    0
                    end
        end,
        case    when exists (select pathId from Path where pathDesc = csv.`path`)
                then    (select pathId from Path where pathDesc = csv.`path`)
                else 0
        end,
        case when csv.`Creation Time` <> '' then STR_TO_DATE(csv.`Creation Time`, '%d/%m/%Y  %H:%i:%s') else '2016-06-16 10:00:00' end,
        #STR_TO_DATE(csv.`Creation Time`, '%Y-%m-%d %H:%i:%s'),
        csv.`Signature Hash`,
        1
        #csv.`CSV import id`
FROM `mydb_stage`.`csvDocumentKey` csv
where csv.`dId` is not null and csv.threadId = @thread;

Select part of the query only takes a fraction of a second.

Aucun commentaire:

Enregistrer un commentaire