Re: 9pfs: Twalk crash - Christian Schoenebeck

From: Christian Schoenebeck <qemu_oss@crudebyte.com>
To: Greg Kurz <groug@kaod.org>
Cc: qemu-devel@nongnu.org
Subject: Re: 9pfs: Twalk crash
Date: Wed, 01 Sep 2021 18:07:39 +0200
Message-ID: <3500709.Usqnbg2EYA@silver>
In-Reply-To: <20210901174102.715b3169@bahia.lan>

On Wednesday, 1 September 2021 17:41:02 CEST Greg Kurz wrote:
> On Wed, 01 Sep 2021 16:21:06 +0200
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > On Wednesday, 1 September 2021 14:49:37 CEST Christian Schoenebeck wrote:
> > > > > And it triggered, however I am not sure if some of those functions I
> > > > > asserted above are indeed allowed to be executed on a different thread
> > > > > than the main thread:
> > > > > 
> > > > > Program terminated with signal SIGABRT, Aborted.
> > > > > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > > > > 50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> > > > > [Current thread is 1 (Thread 0x7fd0bcef1700 (LWP 6470))]
> > > > 
> > > > Based on the thread number, it seems that the signal was raised by
> > > > the main event thread...
> > > 
> > > No, it was not the main thread actually, gdb's "current thread is 1" output
> > > is misleading.
> > > 
> > > Following the thread id trace, I extended the thread assertion checks over
> > > to v9fs_walk() as well, like this:
> > > 
> > > static void coroutine_fn v9fs_walk(void *opaque)
> > > {
> > >     ...
> > >     assert_thread();
> > >     v9fs_co_run_in_worker({
> > >         ...
> > >     });
> > >     assert_thread();
> > >     ...
> > > }
> > > 
> > > and made sure the reference thread id to be compared is really the main
> > > thread.
> > > 
> > > And what happens here is that before v9fs_co_run_in_worker() is entered,
> > > v9fs_walk() runs on the main thread, but after returning from
> > > v9fs_co_run_in_worker() it runs on a different thread for some reason,
> > > not on the main thread as would be expected at that point.
> > 
> > Ok, I think I found the root cause: the block is break;-ing out too far.
> > The
> That could explain the breakage indeed since the block you've added
> to v9fs_walk() embeds a bunch of break statements. AFAICT this block
> breaks on errors... do you know which one ?

Yes, I've verified that. In my case an interrupted Twalk request triggered this
bug, so it was exactly this path:

    v9fs_co_run_in_worker({
        if (v9fs_request_cancelled(pdu)) {
            ...
            break;
        }
        ...
    });

So it really was this break;-ing out too far that caused the crash.
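
To illustrate the effect in isolation, here is a minimal standalone sketch.
It is not the real v9fs_co_run_in_worker() macro: the enum, both macro names
and the two walk_*() helpers are made up for this example, and the "thread
switch" is only modelled by assigning to a variable. It shows how a break
inside code_block leaves the macro's outer do { } while (0) and skips the
step that would bring execution back, and how an inner do { } while (0)
around code_block contains the break.

```c
#include <assert.h>   /* only needed for the checks mentioned below */
#include <stdbool.h>

/* Hypothetical stand-in for "which thread are we on"; not QEMU code. */
enum where { ON_MAIN, ON_WORKER };

/*
 * Unpatched shape: code_block sits directly inside the outer loop, so a
 * break in code_block jumps past the final "back to main" step.
 */
#define RUN_IN_WORKER_UNPATCHED(w, code_block)                          \
    do {                                                                \
        (w) = ON_WORKER;                                                \
        code_block;                                                     \
        (w) = ON_MAIN;                                                  \
    } while (0)

/*
 * Patched shape: code_block gets its own loop, so a break only leaves
 * that inner loop and the "back to main" step always runs.
 */
#define RUN_IN_WORKER_PATCHED(w, code_block)                            \
    do {                                                                \
        (w) = ON_WORKER;                                                \
        do {                                                            \
            code_block;                                                 \
        } while (0);                                                    \
        (w) = ON_MAIN;                                                  \
    } while (0)

/* Returns where execution ends up after the unpatched macro. */
static enum where walk_unpatched(bool cancelled)
{
    enum where w = ON_MAIN;

    RUN_IN_WORKER_UNPATCHED(w, {
        if (cancelled) {
            break; /* leaves the macro's outer loop */
        }
    });
    return w;
}

/* Returns where execution ends up after the patched macro. */
static enum where walk_patched(bool cancelled)
{
    enum where w = ON_MAIN;

    RUN_IN_WORKER_PATCHED(w, {
        if (cancelled) {
            break; /* leaves only the inner loop */
        }
    });
    return w;
}
```

Called from a small main(), walk_unpatched(true) returns ON_WORKER, while
walk_patched(true) returns ON_MAIN. Without cancellation both variants
return ON_MAIN.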

> > following patch should fix it:
> > 
> > diff --git a/hw/9pfs/coth.h b/hw/9pfs/coth.h
> > index c51289903d..f83c7dda7b 100644
> > --- a/hw/9pfs/coth.h
> > +++ b/hw/9pfs/coth.h
> > @@ -51,7 +51,9 @@
> >           */                                                             \
> >          qemu_coroutine_yield();                                         \
> >          qemu_bh_delete(co_bh);                                          \
> > -        code_block;                                                     \
> > +        do {                                                            \
> > +            code_block;                                                 \
> > +        } while (0);                                                    \
> 
> Good.
> 
> >          /* re-enter back to qemu thread */                              \
> >          qemu_coroutine_yield();                                         \
> >      } while (0)
> > 
> > I haven't triggered a crash with that patch, but due to the occasional
> > nature of this issue I'll give it some more spins before officially
> > proclaiming it my bug. :)
> 
> Well, this is a pre-existing limitation with v9fs_co_run_in_worker().
> This wasn't documented as such and not really obvious to detect when
> you optimized TWALK. We've never hit it before because the other
> v9fs_co_run_in_worker() users don't have break statements.

Yes, I know, this was my bad.

> But, indeed, this caused a regression in 6.1 so this will need a Fixes:
> tag and Cc: qemu-stable.

Yep, I'm preparing a patch now.

Best regards,
Christian Schoenebeck

Thread overview: 9+ messages
2021-08-30 15:55 9pfs: Twalk crash Christian Schoenebeck
2021-08-31 10:57 ` Greg Kurz
2021-08-31 15:00   ` Christian Schoenebeck
2021-08-31 17:04     ` Greg Kurz
2021-09-01 12:49       ` Christian Schoenebeck
2021-09-01 14:21         ` Christian Schoenebeck
2021-09-01 15:41           ` Greg Kurz
2021-09-01 16:07             ` Christian Schoenebeck [this message]
2021-09-01 16:31               ` Greg Kurz