From: Feng Tang <feng.tang@intel.com>
To: Emmet Caulfield <emmet.caulfield@stanford.edu>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] perf script: turn AUTOCOMMIT off for bulk SQL inserts in event_analyzing_sample.py
Date: Mon, 10 Jun 2013 22:30:16 +0800
Message-ID: <20130610143016.GA2124@feng-snb>
In-Reply-To: <CA+nN=O9dnp-j48fb1MAKLh=gdtMqY8o0kkL5jM3eSFbp04whaA@mail.gmail.com>

On Fri, Jun 07, 2013 at 11:58:53AM -0700, Emmet Caulfield wrote:
> The example script tools/perf/scripts/python/event_analyzing_sample.py
> contains a minor error. This script takes a perf.data file and
> populates a SQLite database with it.
> 
> There's a long comment on lines 29-34 to the effect that it takes a
> long time to populate the database if the .db file is on disk, so it's
> done in the "ramdisk" (/dev/shm/perf.db), but the problem here is
> actually line 36:
> 
>     con.isolation_level=None
> 
> This line turns on AUTOCOMMIT, making every INSERT statement into its
> own transaction, and greatly slowing down a bulk insert (25 minutes
> vs. a few seconds to insert 15,000 records). This is best solved by
> merely omitting this line or changing it to:
> 
>     con.isolation_level='DEFERRED'
> 
> After making this change, if the database is in memory, it takes
> roughly 0.5 seconds to insert 15,000 records and 0.8 seconds if the
> database file is on disk, effectively solving the problem.
> 
> Given that the whole purpose of having AUTOCOMMIT turned on is to
> ensure that individual insert/update/delete operations are committed
> to persistent storage, moving the .db file to a ramdisk defeats the
> purpose of turning this option on in the first place. Thus
> leaving/turning it *off* with the file on disk is no worse. It is
> pretty much standard practice to defer transactions and index updates
> for bulk inserts like this anyway.
> 
> The following patch deletes the offending line and updates the
> associated comment.
> 
> Emmet.
> 
> 
> --- tools/perf/scripts/python/event_analyzing_sample.py~ 2013-06-03 15:38:41.762331865 -0700
> +++ tools/perf/scripts/python/event_analyzing_sample.py 2013-06-03 15:43:48.978344602 -0700
> @@ -26,14 +26,9 @@
>  from perf_trace_context import *
>  from EventClass import *
> 
> -#
> -# If the perf.data has a big number of samples, then the insert operation
> -# will be very time consuming (about 10+ minutes for 10000 samples) if the
> -# .db database is on disk. Move the .db file to RAM based FS to speedup
> -# the handling, which will cut the time down to several seconds.
> -#
> +# Create/connect to a SQLite3 database:
>  con = sqlite3.connect("/dev/shm/perf.db")
> -con.isolation_level = None
> +
> 
>  def trace_begin():
>         print "In trace_begin:\n"

Thanks for root-causing the slowness of the SQLite3 operations.

Acked-by: Feng Tang <feng.tang@intel.com>
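
For anyone who wants to reproduce the effect described in the report outside of perf, a minimal sketch along the following lines should do. It is not the code from the patch: it is written for Python 3 (the sample script uses Python 2 print statements), and the table name `samples`, the helpers `timed_bulk_insert` and `compare`, and the row count are assumptions made only for illustration. It inserts the same rows into two on-disk SQLite files, once with `isolation_level = None` (autocommit) and once with `isolation_level = 'DEFERRED'`.

```python
import os
import sqlite3
import tempfile
import time


def timed_bulk_insert(db_path, isolation_level, n_rows):
    """Insert n_rows rows and return (elapsed_seconds, row_count)."""
    con = sqlite3.connect(db_path)
    # None selects autocommit: every INSERT becomes its own transaction.
    # "DEFERRED" lets the module open one transaction that commit() ends.
    con.isolation_level = isolation_level
    con.execute("CREATE TABLE samples (id INTEGER, comm TEXT)")

    start = time.perf_counter()
    for i in range(n_rows):
        con.execute("INSERT INTO samples VALUES (?, ?)", (i, "comm%d" % i))
    con.commit()  # nothing left to commit in autocommit mode
    elapsed = time.perf_counter() - start

    count = con.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
    con.close()
    return elapsed, count


def compare(n_rows=200):
    """Run the same bulk insert in both modes on disk-backed files."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, level in (("autocommit", None), ("deferred", "DEFERRED")):
            path = os.path.join(tmp, name + ".db")
            results[name] = timed_bulk_insert(path, level, n_rows)
    return results


if __name__ == "__main__":
    for name, (elapsed, count) in compare().items():
        print("%-10s %6d rows in %.3f s" % (name, count, elapsed))
```

Timings depend heavily on the storage device and filesystem, so no numbers are quoted here. Going by the report above, the autocommit run should be the slower one when the temporary directory is on a real disk, and the gap should shrink on a RAM-backed filesystem.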

