Re: [PATCH 1/2] blkdev: add flush generation counter

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jan Kara <jack@suse.cz>
To: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Jan Kara <jack@suse.cz>,
	linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
	axboe@kernel.dk
Subject: Re: [PATCH 1/2] blkdev: add flush generation counter
Date: Thu, 18 Apr 2013 16:34:59 +0200	[thread overview]
Message-ID: <20130418143459.GA10243@quack.suse.cz> (raw)
In-Reply-To: <87sj2psfws.fsf@openvz.org>

On Wed 17-04-13 12:46:43, Dmitry Monakhov wrote:
> On Mon, 15 Apr 2013 16:14:39 +0200, Jan Kara <jack@suse.cz> wrote:
> > On Sun 14-04-13 23:34:27, Dmitry Monakhov wrote:
> > > Callers may use this counter to optimize flushes
> > > 
> > > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> > > ---
> > >  block/blk-core.c       |    1 +
> > >  block/blk-flush.c      |    3 ++-
> > >  include/linux/blkdev.h |    1 +
> > >  3 files changed, 4 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 074b758..afb5a4b 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -537,6 +537,7 @@ void blk_cleanup_queue(struct request_queue *q)
> > >  	spin_unlock_irq(lock);
> > >  	mutex_unlock(&q->sysfs_lock);
> > >  
> > > +	atomic_set(&q->flush_tag, 0);
> > >  	/*
> > >  	 * Drain all requests queued before DYING marking. Set DEAD flag to
> > >  	 * prevent that q->request_fn() gets invoked after draining finished.
> > > diff --git a/block/blk-flush.c b/block/blk-flush.c
> > > index cc2b827..b1adc75 100644
> > > --- a/block/blk-flush.c
> > > +++ b/block/blk-flush.c
> > > @@ -203,7 +203,7 @@ static void flush_end_io(struct request *flush_rq, int error)
> > >  	/* account completion of the flush request */
> > >  	q->flush_running_idx ^= 1;
> > >  	elv_completed_request(q, flush_rq);
> > > -
> > > +	atomic_inc(&q->flush_tag);
> > >  	/* and push the waiting requests to the next stage */
> > >  	list_for_each_entry_safe(rq, n, running, flush.list) {
> > >  		unsigned int seq = blk_flush_cur_seq(rq);
> > > @@ -268,6 +268,7 @@ static bool blk_kick_flush(struct request_queue *q)
> > >  	q->flush_rq.end_io = flush_end_io;
> > >  
> > >  	q->flush_pending_idx ^= 1;
> > > +	atomic_inc(&q->flush_tag);
> > >  	list_add_tail(&q->flush_rq.queuelist, &q->queue_head);
> > >  	return true;
> > >  }
> >   But this doesn't quite work, does it? When fs reads flush_tag counter,
> > CACHE_FLUSH command can be already issued so you are not sure how the IO
> > you are issuing relates to the cache flush in flight. If you want to make
> > this work, you really need two counters - one for flush submissions and one
> > for flush completions. Fs would need to read the submission counter before
> > issuing the IO and the completion counter when it wants to make sure date is
> > on the stable storage. But it all gets somewhat complex and I'm not sure
> > it's worth it given the block layer tries to merge flush requests by
> > itself. But you can try for correct implementation and measure the
> > performance impact of course...
> 
> Probably I've missed good comments, but I've tried to make things correct
> Here are an flush implementation assumptions from block/blk-flush.c:
> 
> Currently, the following conditions are used to determine when to e flush.
> 
> C1. At any given time, only one flush shall be in progress.  This is
>     double buffering sufficient.
> C2. Flush is deferred if any request is executing DATA of its ence.
>     This avoids issuing separate POSTFLUSHes for requests which ed
>     PREFLUSH.
> C3. The second condition is ignored if there is a request which has
>     waited longer than FLUSH_PENDING_TIMEOUT.  This is to avoid
>     starvation in the unlikely case where there are continuous am of
>     FUA (without FLUSH) requests.
> 
> So if we will increment flush_tag counter in two places:
> blk_kick_flush: the place where flush request is issued
> flush_end_io : the place where flush is completed
> And by using rule (C1) we can guarantee that
> 
> if (flush_tag & 0x1 == 1) then flush_tag  is in progress
> if (flush_tag & 0x1 == 0) then (flush_tag & ~(0x1)) completed
> In other words we can define it as:
> 
> FLUSH_TAG_IDX    = (flush_tag +1) & ~0x1
> FLUSH_TAG_STATE  = flush_tag & 0x1 ? IN_PROGRESS : COMPLETED
> 
> After that we can define rules for flush optimization:
> We can be sure that our data was flushed only if:
> 1) data's bio was completed before flush request was QUEUED
>    and COMPLETED
> So in terms of formulas we can write it like follows:
> is_data_flushed = (blkdev->flush_tag & ~(0x1)) >
>                   ((data->flush_tag + 0x1) & (~0x1))
  Ah, clever! But then please
a) create a function with the condition so that it doesn't have to be
   opencoded in every place using the counter
b) add a detailed comment before that function explaining why it works.
   Something like what you wrote in this email...

  Thanks!

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

     prev parent reply	other threads:[~2013-04-18 14:35 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-14 19:31 [PATCH 0/2] [RFC] blkdev: flush generation optimization Dmitry Monakhov
2013-04-14 19:34 ` [PATCH 1/2] blkdev: add flush generation counter Dmitry Monakhov
2013-04-14 19:34   ` [PATCH 2/2] ext4: Add fdatasync scalability optimization Dmitry Monakhov
2013-04-15 14:14   ` [PATCH 1/2] blkdev: add flush generation counter Jan Kara
2013-04-17  8:46     ` Dmitry Monakhov
2013-04-18 14:34       ` Jan Kara [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130418143459.GA10243@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=axboe@kernel.dk \
    --cc=dmonakhov@openvz.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).