linux-ext4.vger.kernel.org archive mirror
From: Ric Wheeler <rwheeler@redhat.com>
To: Andreas Dilger <adilger@sun.com>
Cc: Josef Bacik <jbacik@redhat.com>, linux-ext4@vger.kernel.org
Subject: Re: transaction batching performance & multi-threaded synchronous writers
Date: Tue, 15 Jul 2008 07:29:21 -0400
Message-ID: <487C8A11.3050801@redhat.com>
In-Reply-To: <20080715075832.GD6239@webber.adilger.int>

Andreas Dilger wrote:
> On Jul 14, 2008  12:58 -0400, Josef Bacik wrote:
>   
>> Perhaps we track the average time a commit takes to occur, and then if
>> the current transaction start time is < than the avg commit time we sleep
>> and wait for more things to join the transaction, and then we commit.
>> How does that idea sound?  Thanks,
>>     
>
> The drawback of this approach is that if the thread waits an extra "average
> transaction time" for the transaction to commit then this will increase the
> average transaction time each time, and it still won't tell you if there
> needs to be a wait at all.
>
> What might be more interesting is tracking how many processes had sync
> handles on the previous transaction(s), and once that number of processes
> have done that work, or the timeout reached, the transaction is committed.
>
> While this might seem like a hack for the particular benchmark, this
> will also optimize real-world workloads like mailserver, NFS/fileserver,
> http where the number of threads running at one time is generally fixed.
>
> The best way to do that would be to keep a field in the task struct to
> track whether a given thread has participated in transaction "T" when
> it starts a new handle, and if not then increment the "number of sync
> threads on this transaction" counter.
>
> In journal_stop() if t_num_sync_thr >= prev num_sync_thr then
> the transaction can be committed earlier, and if not then it does a
> wait_event_interruptible_timeout(cur_num_sync_thr >= prev_num_sync_thr, 1).
>
> While the number of sync threads is growing or constant the commits will 
> be rapid, and any "slow" threads will block on the next transaction and
> increment its num_sync_thr until the thread count stabilizes (i.e. a small
> number of transactions at startup).  After that the wait will be exactly
> as long as needed for each thread to participate.  If some threads are
> too slow, or stop processing then there will be a single sleep and the
> next transaction will wait for fewer threads the next time.
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>   
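
A minimal userspace sketch of the counting scheme quoted above, to make
the bookkeeping concrete. The struct and function names (txn, task,
sync_handle_start, can_commit_early) are hypothetical and are not taken
from the jbd code; the sketch assumes transaction ids start at 1 and
leaves out locking and the actual timed wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the running transaction. */
struct txn {
    unsigned long tid;      /* transaction id, starts at 1 in this sketch */
    int num_sync_thr;       /* distinct sync threads counted on this txn */
};

/* Hypothetical stand-in for the per-thread field in the task struct. */
struct task {
    unsigned long last_sync_tid;  /* last txn this thread was counted on, 0 = none */
};

/* Called when a thread starts a sync handle: count each thread at most
 * once per transaction. */
static void sync_handle_start(struct txn *t, struct task *tsk)
{
    if (tsk->last_sync_tid != t->tid) {
        tsk->last_sync_tid = t->tid;
        t->num_sync_thr++;
    }
}

/* Called when a handle is stopped: true means the transaction can be
 * committed now, false means one bounded wait for the remaining
 * threads comes first. */
static bool can_commit_early(const struct txn *cur, int prev_num_sync_thr)
{
    assert(prev_num_sync_thr >= 0);
    return cur->num_sync_thr >= prev_num_sync_thr;
}
```

In the kernel the false branch would be the
wait_event_interruptible_timeout() call mentioned in the quote, with
the same comparison as its wake-up condition.
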
This really sounds like one of those math problems (queuing theory?)
that I never was able to completely wrap my head around back at
university, but the basic things that we have are:

    (1) the average time it takes to complete an independent
transaction. This will be different for each target device and will
possibly change over time (a specific odd case is a shared disk, like an
array).
    (2) the average cost of adding "one more" thread to a
transaction. I think that the assumption is that this cost is close to zero.
    (3) the rate of arrival of threads trying to join a transaction.
    (4) some knowledge of the history of which threads did the past
transactions. It is quite reasonable to never wait if a single thread is
the author of the last (or most of the last?) sequence of transactions,
which is the good thing the code does now.
    (5) the minimum time we can effectively wait with a given mechanism
(4ms or 1ms for example, depending on the HZ in the code today)
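
As a rough illustration, the five inputs above could be gathered into
one structure, and item (5) follows directly from HZ. The struct, its
field names, and tick_usecs() are hypothetical names for this sketch
and do not exist in the kernel.

```c
#include <assert.h>

/* Hypothetical container for the five inputs listed above. */
struct batch_inputs {
    long avg_commit_us;       /* (1) average time to complete a transaction */
    long join_cost_us;        /* (2) cost of adding one more thread, assumed ~0 */
    double arrivals_per_sec;  /* (3) rate at which threads try to join */
    int recent_writers;       /* (4) distinct threads behind recent transactions */
    long min_wait_us;         /* (5) shortest wait the mechanism can deliver */
};

/* Length of one timer tick in microseconds for a given HZ value. A
 * jiffies-based sleep cannot be requested in units finer than this. */
static long tick_usecs(int hz)
{
    assert(hz > 0);
    return 1000000L / hz;
}
```

With HZ=250 this gives 4000 us (4ms) and with HZ=1000 it gives 1000 us
(1ms), matching the two examples in (5).
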

I think the trick here is to try and get a heuristic that works without 
going nuts in complexity.

The obvious thing we need to keep is the heuristic to not wait if we 
detect a single thread workload.

It would seem reasonable not to wait if the latency of the device (1
above) is lower than the minimum time the chosen mechanism can wait (5
above). For example, if transactions complete in microseconds, as on a
ramdisk, just blast away ;-)

What would be left would be the need to figure out whether the arrival
rate (3) predicts that a new thread will come along before we would be
able to finish the current transaction without waiting.
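
Putting the three checks together, a minimal sketch of the decision
might look like the function below. The function name, its parameters,
and the "at least one expected arrival per commit time" threshold are
all assumptions made for illustration, not something taken from the
existing code.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether to delay a commit so that other threads can join.
 * Every input here is a hypothetical measurement for this sketch. */
static bool should_wait_for_joiners(int recent_writers,
                                    long avg_commit_us,
                                    long min_wait_us,
                                    double arrivals_per_sec)
{
    assert(avg_commit_us >= 0 && min_wait_us > 0 && arrivals_per_sec >= 0.0);

    /* Single-threaded workload: nobody else will join, so never wait. */
    if (recent_writers <= 1)
        return false;

    /* The device commits faster than we are able to sleep (ramdisk). */
    if (avg_commit_us < min_wait_us)
        return false;

    /* Expected number of new arrivals during one average commit time;
     * wait only if at least one more thread is likely to show up. */
    double expected = arrivals_per_sec * (double)avg_commit_us / 1e6;
    return expected >= 1.0;
}
```

For example, with 8 recent writers, a 10ms average commit, and a 4ms
minimum wait, 500 arrivals per second gives 5 expected joiners and the
function says wait, while 50 arrivals per second gives 0.5 and it says
commit immediately.
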

Does this make any sense? This sounds close to the idea that Josef
proposed above; we would just tweak his proposal to avoid sleeping in
the single-threaded case.
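
A minimal sketch of that tweak, assuming a running average of commit
times is kept somewhere. The names commit_stats, record_commit, and
should_sleep are hypothetical, and the 1/4 weight given to each new
sample is an arbitrary choice. The sketch also assumes the recorded
commit time excludes any time spent sleeping, so that the waits do not
inflate the average, which is the drawback Andreas pointed out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-journal statistics. */
struct commit_stats {
    long avg_commit_us;   /* running average commit time, 0 = no samples yet */
};

/* Fold one measured commit duration into the running average. */
static void record_commit(struct commit_stats *s, long commit_us)
{
    assert(commit_us >= 0);
    if (s->avg_commit_us == 0)
        s->avg_commit_us = commit_us;
    else
        s->avg_commit_us += (commit_us - s->avg_commit_us) / 4;
}

/* Sleep only if the transaction is younger than an average commit and
 * the previous handle came from a different thread. */
static bool should_sleep(const struct commit_stats *s, long txn_age_us,
                         bool same_thread_as_last)
{
    if (same_thread_as_last)
        return false;   /* single-threaded case: never sleep */
    return txn_age_us < s->avg_commit_us;
}
```
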

Ric




