linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Jeff Moyer <jmoyer@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	xfs <linux-xfs@vger.kernel.org>
Subject: Re: Broken dio refcounting leads to livelock?
Date: Wed, 9 Jan 2019 10:12:55 +1100	[thread overview]
Message-ID: <20190108231255.GY4205@dastard> (raw)
In-Reply-To: <20190108210716.GV12689@magnolia>

On Tue, Jan 08, 2019 at 01:07:16PM -0800, Darrick J. Wong wrote:
> On Tue, Jan 08, 2019 at 03:54:23PM -0500, Jeff Moyer wrote:
> > "Darrick J. Wong" <darrick.wong@oracle.com> writes:
> > 
> > > It looks like we're issuing two bios to satisfy a particular dio.
> > > However, the first bio completes before the submitter calls finish_plug??
> > >
> > > I guess this means that blk_start_plug no longer plugs up io requests,
> > > which means that the end_io function tries to wake up the submitter
> > > before it's even had a chance to issue the second bio.
> > 
> > blk_start_plug was always a hint.  If you exceed a certain number of
> > requests (BLK_MAX_REQUEST_COUNT, which is 16) or a certain size of
> > request (BLK_PLUG_FLUSH_SIZE, which is 128k), the block layer will flush
> > the plug list.
> > 
> > > This is surprising to me, because I was under the impression that
> > > blk_{start,finish}_plug held all the bios so there was no chance that
> > > any of them would issue (let alone call end_io) before finish_plug.
> > 
> > Definitely not the case.  The plug list is just a performance
> > optimization.
> 
> Ahh, fun!  I had not realized that it was merely a suggestion.
> 
> $ git grep blk_start_plug Documentation/
> $ echo $?
> 1
> 
> In that case we definitely need something to prevent the endio from
> trying to wake up a submitter that's still submitting.  Below is the
> amended patch I'm using to run generic/013 and generic/323 in a loop.
> No crashes so far.

Bloody ugly having to take /two/ submission context reference
counts, but....

/me sees this in the block device dio path:

              if (!dio->multi_bio) {
                        /*
                         * AIO needs an extra reference to ensure the dio
                         * structure which is embedded into the first bio
                         * stays around.
                         */
                        if (!is_sync)
                                bio_get(bio);
                        dio->multi_bio = true;
                        atomic_set(&dio->ref, 2);
               } else {
                       atomic_inc(&dio->ref);
               }


This code was derived from the iomap direct IO code, but unlike the
iomap code it takes two references to the first bio in the
submission set and sets a flag to indicate there are multiple
bios in the set. because the extra bio reference is not dropped
until all the IO is completed, the struct dio is valid until
everything is done and it is no longer being referenced.

The other thing to note here is that "dio->ref" is actually an "io
remaining" counter, and it is /pre-incremented/. That is, when we
submit the first bio, the ref count is set to 2 so that if the IO
completes before the second bio is submitted the count will not go
to zero and the IO won't be completed.

So, this is /quite different/ to the iomap code it was derived from.
i.e. dio->ref is an IO counter, not a structure lifetime controller,
and the struct dio lifetime is handled by a bio reference counter
rather than an internal counter. That makes a bunch of stuff much
simpler.

OK, I'll rework the patch with this in mind....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

      parent reply	other threads:[~2019-01-08 23:13 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-08  0:26 Broken dio refcounting leads to livelock? Darrick J. Wong
2019-01-08  1:16 ` Darrick J. Wong
2019-01-08  3:03   ` Darrick J. Wong
2019-01-08  7:46 ` Dave Chinner
2019-01-08 20:08   ` Darrick J. Wong
2019-01-08 20:54     ` Jeff Moyer
2019-01-08 21:07       ` Darrick J. Wong
2019-01-08 21:30         ` Jeff Moyer
2019-01-08 23:12         ` Dave Chinner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190108231255.GY4205@dastard \
    --to=david@fromorbit.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@infradead.org \
    --cc=jmoyer@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).