All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org, "Klaus Jensen" <its@irrelevant.dk>,
	kwolf@redhat.com, "Cleber Rosa" <crosa@redhat.com>,
	"John Snow" <jsnow@redhat.com>, "Hanna Reitz" <hreitz@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Keith Busch" <kbusch@kernel.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Peter Xu" <peterx@redhat.com>,
	qemu-block@nongnu.org,
	"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
	"David Hildenbrand" <david@redhat.com>,
	"Fam Zheng" <fam@euphon.net>
Subject: Re: [PATCH v3 2/5] test-bdrv-drain: avoid race with BH in IOThread drain test
Date: Thu, 14 Sep 2023 08:01:41 -0400	[thread overview]
Message-ID: <20230914120141.GA1047741@fedora> (raw)
In-Reply-To: <wxfg62pyjq2dhh7jfhpng7tgnkaz3o5zkdie426m7tbgopjkhm@gijotjxjzjhf>

[-- Attachment #1: Type: text/plain, Size: 4342 bytes --]

On Wed, Sep 13, 2023 at 11:08:54AM -0500, Eric Blake wrote:
> On Tue, Sep 12, 2023 at 07:10:34PM -0400, Stefan Hajnoczi wrote:
> > This patch fixes a race condition in test-bdrv-drain that is difficult
> > to reproduce. test-bdrv-drain sometimes fails without an error message
> > on the block pull request sent by Kevin Wolf on Sep 4, 2023. I was able
> > to reproduce it locally and found that "block-backend: process I/O in
> > the current AioContext" (in this patch series) is the first commit where
> > it reproduces.
> > 
> > I do not know why "block-backend: process I/O in the current AioContext"
> > exposes this bug. It might be related to the fact that the test's preadv
> > request runs in the main thread instead of IOThread a after my commit.
> 
> In reading the commit message before the impacted code, my first
> thought was that you had a typo of an extra word (that is, something
> to fix by s/a //), but reading further, a better fix would be calling
> attention to the fact that you are referencing a specific named
> thread, as in s/IOThread a/IOThread A/...
> 
> > That might simply change the timing of the test.
> > 
> > Now on to the race condition in test-bdrv-drain. The main thread
> > schedules a BH in IOThread a and then drains the BDS:
> 
> ...and another spot with the same parse issue...
> 
> > 
> >   aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);
> > 
> >   /* The request is running on the IOThread a. Draining its block device
> 
> ...but here you were quoting from the existing code base, which is
> where I finally realized it was more than just your commit message.
> 
> >    * will make sure that it has completed as far as the BDS is concerned,
> >    * but the drain in this thread can continue immediately after
> >    * bdrv_dec_in_flight() and aio_ret might be assigned only slightly
> >    * later. */
> >   do_drain_begin(drain_type, bs);
> > 
> > If the BH completes before do_drain_begin() then there is nothing to
> > worry about.
> > 
> > If the BH invokes bdrv_flush() before do_drain_begin(), then
> > do_drain_begin() waits for it to complete.
> > 
> > The problematic case is when do_drain_begin() runs before the BH enters
> > bdrv_flush(). Then do_drain_begin() misses the BH and the drain
> > mechanism has failed in quiescing I/O.
> > 
> > Fix this by incrementing the in_flight counter so that do_drain_begin()
> > waits for test_iothread_main_thread_bh().
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  tests/unit/test-bdrv-drain.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
> > index ccc453c29e..67a79aa3f0 100644
> > --- a/tests/unit/test-bdrv-drain.c
> > +++ b/tests/unit/test-bdrv-drain.c
> > @@ -512,6 +512,7 @@ static void test_iothread_main_thread_bh(void *opaque)
> >       * executed during drain, otherwise this would deadlock. */
> >      aio_context_acquire(bdrv_get_aio_context(data->bs));
> >      bdrv_flush(data->bs);
> > +    bdrv_dec_in_flight(data->bs); /* incremented by test_iothread_common() */
> >      aio_context_release(bdrv_get_aio_context(data->bs));
> >  }
> >  
> > @@ -583,6 +584,13 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
> >              aio_context_acquire(ctx_a);
> >          }
> >  
> > +        /*
> > +         * Increment in_flight so that do_drain_begin() waits for
> > +         * test_iothread_main_thread_bh(). This prevents the race between
> > +         * test_iothread_main_thread_bh() in IOThread a and do_drain_begin() in
> > +         * this thread. test_iothread_main_thread_bh() decrements in_flight.
> > +         */
> > +        bdrv_inc_in_flight(bs);
> >          aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);
> >  
> >          /* The request is running on the IOThread a. Draining its block device
> 
> and indeed, your commit message is consistent with the current code's
> naming convention.  If you have reason to respin, a pre-req patch to
> change the case before adding more references might be nice, but I
> won't insist.
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>

Sorry about that. It is confusing.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2023-09-14 12:03 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-12 23:10 [PATCH v3 0/5] block-backend: process I/O in the current AioContext Stefan Hajnoczi
2023-09-12 23:10 ` [PATCH v3 1/5] block: remove AIOCBInfo->get_aio_context() Stefan Hajnoczi
2023-09-13 15:59   ` Eric Blake
2023-09-14  6:44   ` Klaus Jensen
2023-09-12 23:10 ` [PATCH v3 2/5] test-bdrv-drain: avoid race with BH in IOThread drain test Stefan Hajnoczi
2023-09-13 16:08   ` Eric Blake
2023-09-14 12:01     ` Stefan Hajnoczi [this message]
2023-09-12 23:10 ` [PATCH v3 3/5] block-backend: process I/O in the current AioContext Stefan Hajnoczi
2023-09-13 16:20   ` Eric Blake
2023-09-12 23:10 ` [PATCH v3 4/5] block-backend: process zoned requests " Stefan Hajnoczi
2023-09-13 16:22   ` Eric Blake
2023-09-12 23:10 ` [PATCH v3 5/5] block-coroutine-wrapper: use qemu_get_current_aio_context() Stefan Hajnoczi
2023-09-13 16:23   ` Eric Blake
2023-09-15 14:39 ` [PATCH v3 0/5] block-backend: process I/O in the current AioContext Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230914120141.GA1047741@fedora \
    --to=stefanha@redhat.com \
    --cc=crosa@redhat.com \
    --cc=david@redhat.com \
    --cc=eblake@redhat.com \
    --cc=fam@euphon.net \
    --cc=hreitz@redhat.com \
    --cc=its@irrelevant.dk \
    --cc=jsnow@redhat.com \
    --cc=kbusch@kernel.org \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@yandex-team.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.