qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] AIO error case
@ 2018-05-22 22:01 Nishanth Aravamudan
  2018-05-23 17:53 ` John Snow
  0 siblings, 1 reply; 4+ messages in thread
From: Nishanth Aravamudan @ 2018-05-22 22:01 UTC (permalink / raw)
  To: qemu-devel

Hi!

I'm tracking an error case in the native AIO path, and was wondering if
there was a latent (albeit possibly hard to hit) bug. Specifically
util/async.c::aio_get_linux_aio:

#ifdef CONFIG_LINUX_AIO
LinuxAioState *aio_get_linux_aio(AioContext *ctx)
{
    if (!ctx->linux_aio) {
        ctx->linux_aio = laio_init();
        laio_attach_aio_context(ctx->linux_aio, ctx);
    }
    return ctx->linux_aio;
}
#endif

laio_init() can in certain conditions return NULL, but that's not checked
here and then the NULL result is passed directly into
laio_attach_aio_context, which dereferences it without checking that the
pointer is valid.

I'm not sure what is appropriate if laio_init() returns NULL, returning
NULL back to the caller of aio_get_linux_aio() has its own issues, because
those callers don't seem to check its return value either.

Thanks in advance!
-Nish

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] AIO error case
  2018-05-22 22:01 [Qemu-devel] AIO error case Nishanth Aravamudan
@ 2018-05-23 17:53 ` John Snow
  2018-05-23 18:25   ` Nishanth Aravamudan
  0 siblings, 1 reply; 4+ messages in thread
From: John Snow @ 2018-05-23 17:53 UTC (permalink / raw)
  To: Nishanth Aravamudan, qemu-devel, Qemu-block



On 05/22/2018 06:01 PM, Nishanth Aravamudan via Qemu-devel wrote:
> Hi!
> 

Hi! CCing qemu-block@nongnu.org;

> I'm tracking an error case in the native AIO path, and was wondering if
> there was a latent (albeit possibly hard to hit) bug. Specifically
> util/async.c::aio_get_linux_aio:
> 
> #ifdef CONFIG_LINUX_AIO
> LinuxAioState *aio_get_linux_aio(AioContext *ctx)
> {
>     if (!ctx->linux_aio) {
>         ctx->linux_aio = laio_init();
>         laio_attach_aio_context(ctx->linux_aio, ctx);
>     }
>     return ctx->linux_aio;
> }
> #endif
> 
> laio_init() can in certain conditions return NULL, but that's not checked
> here and then the NULL result is passed directly into
> laio_attach_aio_context, which dereferences it without checking that the
> pointer is valid.
> 

Looks like a good old-fashioned bug to me:

``
    if (event_notifier_init(&s->e, false) < 0) {
        goto out_free_state;
    }

    if (io_setup(MAX_EVENTS, &s->ctx) != 0) {
        goto out_close_efd;
    }

...

out_close_efd:
    event_notifier_cleanup(&s->e);
out_free_state:
    g_free(s);
    return NULL;
``

event_notifier_init looks like it is... usually(?) going to return
-errno on error, and the man page for io_setup suggests that the libaio
wrapper for io_setup is supposed to return -errno if it fails; we could
probably hold onto that error code.

We probably ought to forward those error codes up to the caller; maybe:

int laio_init(LinuxAioState **linux_aio);

> I'm not sure what is appropriate if laio_init() returns NULL, returning
> NULL back to the caller of aio_get_linux_aio() has its own issues, because
> those callers don't seem to check its return value either.
> 

If laio_init can return NULL,  aio_get_linux_aio needs to check for that.

Then callers of that ought to be amended to return some kind of error:

raw_co_prw can return whatever laio_init returned as an error code.
raw_aio_plug doesn't have a way to communicate errors, yet...
raw_aio_unplug doesn't have an error reporting mechanism either.

Those last two are defined here:

    .bdrv_io_plug = raw_aio_plug,
    .bdrv_io_unplug = raw_aio_unplug,

but it looks like bdrv_io_un|plug doesn't return errors either.

bdrv_io_plug is called by both blk_io_plug and itself, but has no other
callers.

blk_io_plug is called in two places:

virtio_scsi_handle_cmd_req_prepare, which can report -errno error codes
(so we can amend blk_io_plug and bdrv_io_plug to just bubble up errors), and

virtio_blk_handle_vq -- which doesn't appear to be capable of handling
an error, just indicating "progress".

unplug has three usages, one in virtio-blk and two in virtio-scsi, all
of which only appear to check for "progress" and don't seem to have
explicit error reporting.

Probably unplug can't actually fail like this though -- I'm assuming by
the name that it must occur after a call to plug and that will have
memoized the call to aio_get_linux_aio by then.

So... probably what we need to do here is take a look at the virtio-blk
usage of plug and do error-plumbing as necessary.

Wanna send a patch?

--js

> Thanks in advance!
> -Nish
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] AIO error case
  2018-05-23 17:53 ` John Snow
@ 2018-05-23 18:25   ` Nishanth Aravamudan
  2018-05-23 18:27     ` John Snow
  0 siblings, 1 reply; 4+ messages in thread
From: Nishanth Aravamudan @ 2018-05-23 18:25 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-devel, Qemu-block

On Wed, May 23, 2018 at 10:53 AM, John Snow <jsnow@redhat.com> wrote:
>
>
>
> On 05/22/2018 06:01 PM, Nishanth Aravamudan via Qemu-devel wrote:
> > Hi!
> >
>
> Hi! CCing qemu-block@nongnu.org;
>
> > I'm tracking an error case in the native AIO path, and was wondering if
> > there was a latent (albeit possibly hard to hit) bug. Specifically
> > util/async.c::aio_get_linux_aio:
> >
> > #ifdef CONFIG_LINUX_AIO
> > LinuxAioState *aio_get_linux_aio(AioContext *ctx)
> > {
> >     if (!ctx->linux_aio) {
> >         ctx->linux_aio = laio_init();
> >         laio_attach_aio_context(ctx->linux_aio, ctx);
> >     }
> >     return ctx->linux_aio;
> > }
> > #endif
> >
> > laio_init() can in certain conditions return NULL, but that's not
checked
> > here and then the NULL result is passed directly into
> > laio_attach_aio_context, which dereferences it without checking that the
> > pointer is valid.
> >
>
> Looks like a good old-fashioned bug to me:


Agreed!

<snip>

> Wanna send a patch?

Yep I'll work on this over the next few days. Thanks for reply!

-Nish

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] AIO error case
  2018-05-23 18:25   ` Nishanth Aravamudan
@ 2018-05-23 18:27     ` John Snow
  0 siblings, 0 replies; 4+ messages in thread
From: John Snow @ 2018-05-23 18:27 UTC (permalink / raw)
  To: Nishanth Aravamudan; +Cc: qemu-devel, Qemu-block



On 05/23/2018 02:25 PM, Nishanth Aravamudan wrote:
> On Wed, May 23, 2018 at 10:53 AM, John Snow <jsnow@redhat.com
> <mailto:jsnow@redhat.com>> wrote:
>>
>>
>>
>> On 05/22/2018 06:01 PM, Nishanth Aravamudan via Qemu-devel wrote:
>> > Hi!
>> >
>>
>> Hi! CCing qemu-block@nongnu.org <mailto:qemu-block@nongnu.org>;
>>
>> > I'm tracking an error case in the native AIO path, and was wondering if
>> > there was a latent (albeit possibly hard to hit) bug. Specifically
>> > util/async.c::aio_get_linux_aio:
>> >
>> > #ifdef CONFIG_LINUX_AIO
>> > LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>> > {
>> >     if (!ctx->linux_aio) {
>> >         ctx->linux_aio = laio_init();
>> >         laio_attach_aio_context(ctx->linux_aio, ctx);
>> >     }
>> >     return ctx->linux_aio;
>> > }
>> > #endif
>> >
>> > laio_init() can in certain conditions return NULL, but that's not
> checked
>> > here and then the NULL result is passed directly into
>> > laio_attach_aio_context, which dereferences it without checking that the
>> > pointer is valid.
>> >
>>
>> Looks like a good old-fashioned bug to me:
> 
> 
> Agreed!
>  
> <snip>
> 
>> Wanna send a patch?
> 
> Yep I'll work on this over the next few days. Thanks for reply!
> 
> -Nish

I looked at plug and unplug and it really looks like -- apart from the
memoization of aio_get_linux_aio that might fail -- there's nothing in
those calls that is expected to actually break.

Might be saner to try to force the memoization earlier for virtio-blk
and virtio-scsi and test the return at that time; then just assert that
aio_get_linux_aio actually returns non-null in calls like un/plug that
cannot fail.

Should save you a lot of rewiring work.

--js

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-05-23 18:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-05-22 22:01 [Qemu-devel] AIO error case Nishanth Aravamudan
2018-05-23 17:53 ` John Snow
2018-05-23 18:25   ` Nishanth Aravamudan
2018-05-23 18:27     ` John Snow

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).