Re: [Qemu-devel] [Qemu-block] [PATCH] [RFC v2] aio: properly bubble up errors from initialization

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Nishanth Aravamudan <naravamudan@digitalocean.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Eric Blake <eblake@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH] [RFC v2] aio: properly bubble up errors from initialization
Date: Wed, 20 Jun 2018 20:26:14 -0700	[thread overview]
Message-ID: <20180621032614.GA8787@breakout> (raw)
In-Reply-To: <20180620193452.GA25190@breakout>

On 20.06.2018 [12:34:52 -0700], Nishanth Aravamudan wrote:
> On 20.06.2018 [11:57:42 +0200], Kevin Wolf wrote:
> > Am 20.06.2018 um 00:54 hat Nishanth Aravamudan geschrieben:
> > > On 19.06.2018 [15:35:57 -0700], Nishanth Aravamudan wrote:
> > > > On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote:
> > > > > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote:
> > > > > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote:
> > > > 
> > > > <snip>
> > > > 
> > > > > > >           } else if (s->use_linux_aio) {
> > > > > > > +            int rc;
> > > > > > > +            rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
> > > > > > > +            if (rc != 0) {
> > > > > > > +                error_report("Unable to use native AIO, falling back to "
> > > > > > > +                             "thread pool.");
> > > > > > 
> > > > > > In general, error_report() should not output a trailing '.'.
> > > > > 
> > > > > Will fix.
> > > > > 
> > > > > > > +                s->use_linux_aio = 0;
> > > > > > > +                return rc;
> > > > > > 
> > > > > > Wait - the message claims we are falling back, but the non-zero return code
> > > > > > sounds like we are returning an error instead of falling back.  (My
> > > > > > preference - if the user requested something and we can't do it, it's better
> > > > > > to error than to fall back to something that does not match the user's
> > > > > > request).
> > > > > 
> > > > > I think that makes sense, I hadn't tested this specific case (in my
> > > > > reading of the code, it wasn't clear to me if raw_co_prw() could be
> > > > > called before raw_aio_plug() had been called, but I think returning the
> > > > > error code up should be handled correctly. What about the cases where
> > > > > there is no error handling (the other two changes in the patch)?
> > > > 
> > > > While looking at doing these changes, I realized that I'm not quite sure
> > > > what the right approach is here. My original rationale for returning
> > > > non-zero was that AIO was requested but could not be completed. I
> > > > haven't fully tracked back the calling paths, but I assumed it would get
> > > > retried at the top level, and since we indicated to not use AIO on
> > > > subsequent calls, it will succeed and use threads then (note, that I do
> > > > now realize this means a mismatch between the qemu command-line and the
> > > > in-use AIO model).
> > > > 
> > > > In practice, with my v2 patch, where I do return a non-zero error-code
> > > > from this function, qemu does not exit (nor is any logging other than
> > > > that I added emitted on the monitor). If I do not fallback, I imagine we
> > > > would just continuously see this error message and IO might not actually
> > > > every occur? Reworking all of the callpath to fail on non-zero returns
> > > > from raw_co_prw() seems like a fair bit of work, but if that is what is
> > > > being requested, I can try that (it will just take a while).
> > > > Alternatively, I can produce a v3 quickly that does not bubble the
> > > > actual errno all the way up (since it does seem like it is ignored
> > > > anyways?).
> > > 
> > > Sorry for the noise, but I had one more thought. Would it be appropriate
> > > to push the _setup() call up to when we parse the arguments about
> > > aio=native? E.g., we already check there if cache=directsync is
> > > specified and error out if not.
> > 
> > We already do this:
> 
> Right, I stated above it already is done, I simply meant adding a second
> check here that we can obtain and setup the AIO context successfully.
>  
> >      /* Currently Linux does AIO only for files opened with O_DIRECT */
> >     if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) {
> >         error_setg(errp, "aio=native was specified, but it requires "
> >                          "cache.direct=on, which was not specified.");
> >         ret = -EINVAL;
> >         goto fail;
> >     }
> > 
> > laio_init() is about other types of errors. But anyway, yes, calling
> > laio_init() already in .bdrv_open() is possible. Returning errors from
> > .bdrv_open() is nice and easy and we should do it.
> 
> Ack.
> 
> > However, we may also need to call laio_init() again when switching to a
> > different I/O thread after the image is already opened. This is what I
> > meant when I commented on v1 that you should do this in the
> > .bdrv_attach_aio_context callback. The problem here is that we can't
> > return an error there and the guest is already using the image. In this
> > case, logging an error and falling back to the thread pool seems to be
> > the best option we have.
> 
> Is this is a request for new functionality? Just trying to understand,
> because aiui, block/file-posix.c does not implement the
> bdrv_attach_aio_context callback currently. Instead, aio_get_linux_aio()
> is called from three places, raw_co_prw, raw_aio_plug and
> raw_aio_unplug, which calls into laio_init() and
> laio_attach_aio_context(). I can add the callback you suggest with
> appropriate error handling (I suppose it would point to
> laio_attach_aio_context, possibly with some modifications) and remove
> the call from aio_get_linux_aio()? Just trying to understand the request
> a bit better, as I don't see where exactly iothreads get switched and
> how that is implemented currently (and thus where laio_init() would get
> called again in the current code).

While I waited for a reply to this, I started coding on what I think was
being asked for and have come to the conclusion that there are actually
three bugs here :)

Test cases (with one disk attached to the VM):

1) Set /proc/sys/fs/max-aio-nr to 0. Specify aio=native and qemu dies
with a SIGSEGV.
    - This case is understood and pushing the laio_init()-return code
      check to the bdrv_open() path fixes this (and allows for the
      failure to be communicated to the user).

2) Set /proc/sys/fs/max-aio-nr to 128. Specify aio=native and some
number of IOThreads. Over qmp issue a x-blockdev-set-iothread command to
move the block device node to one of the IOThreads. qemu eventually dies
with a SIGSEGV.
    - I am fairly sure this is the case you described above, and is
      fixed by re-implementing the bdrv_{attach,detach}_aio_context
      callbacks. I have a patch that does this and successfully tested
      the SEGV is avoided.

3) Set /proc/sys/fs/max-aio-nr to 512 (I think 256 would be sufficient,
though). Specify aio=native and some number of IOThreads. Over qmp issue
a x-blockdev-set-iothread command to move the block device node to one
of the IOThreads. Shutdown the guest normally. qemu dies with a SIGABRT.
    - This appears to be because there is a mismatch in
      aio_context_{acquire,release} calls (this is my hypothesis right
      now). The abort comes from bdrv_flush -> aio_context_release and
      an EPERM from qemu_mutex_unlock_impl() which I believe is just
      reflecting an EPERM from pthread_mutex_unlock? My theory is that
      the main qemu thread acquired the aio mutex but then the IOThread
      released it? I will try and trace the mutexes tomorrow, but I
      still don't have a fix for this case.

Thanks,
Nish

next prev parent reply	other threads:[~2018-06-21  3:26 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-14 23:21 [Qemu-devel] [PATCH] [RFC] aio: properly bubble up errors from initialization Nishanth Aravamudan
2018-06-15  0:20 ` no-reply
2018-06-15  8:41 ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2018-06-15 16:51   ` Nishanth Aravamudan
2018-06-15 17:47 ` [Qemu-devel] [PATCH] [RFC v2] " Nishanth Aravamudan
2018-06-19 19:35   ` Eric Blake
2018-06-19 20:14     ` Nishanth Aravamudan
2018-06-19 22:35       ` Nishanth Aravamudan
2018-06-19 22:54         ` Nishanth Aravamudan
2018-06-20  9:57           ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2018-06-20 19:34             ` Nishanth Aravamudan
2018-06-21  3:26               ` Nishanth Aravamudan [this message]
2018-06-21 13:51                 ` Kevin Wolf
2018-06-21 16:30                   ` Nishanth Aravamudan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180621032614.GA8787@breakout \
    --to=naravamudan@digitalocean.com \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.