netdev.vger.kernel.org archive mirror
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: Leon Romanovsky <leon@kernel.org>,
	syzbot <syzbot+9627a92b1f9262d5d30c@syzkaller.appspotmail.com>,
	RDMA mailing list <linux-rdma@vger.kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>,
	Rafael Wysocki <rafael@kernel.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>
Subject: Re: WARNING in ib_umad_kill_port
Date: Tue, 7 Apr 2020 11:35:28 -0300
Message-ID: <20200407143528.GV20941@ziepe.ca>
In-Reply-To: <CACT4Y+Zy0LwpHkTMTtb08ojOxuEUFo1Z7wkMCYSVCvsVDcxayw@mail.gmail.com>

On Tue, Apr 07, 2020 at 02:39:42PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 7, 2020 at 1:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > > > I'm not sure what could be done wrong here to elicit this:
> > > >
> > > >  sysfs group 'power' not found for kobject 'umad1'
> > > >
> > > > ??
> > > >
> > > > I've seen another similar sysfs related trigger that we couldn't
> > > > figure out.
> > > >
> > > > Hard to investigate without a reproducer.
> > >
> > > Based on all of the sysfs-related bugs I've seen, my bet would be on
> > > some races. E.g. one thread registers devices, while another
> > > unregisters these.
> >
> > I did check that the naming is ordered right, at least we won't be
> > concurrently creating and destroying umadX sysfs of the same names.
> >
> > I'm also fairly sure we can't be destroying the parent at the same
> > time as this child.
> >
> > Do you see the above commonly? Could it be some driver core thing? Or
> > is it more likely something wrong in umad?
> 
> Mmmm... I can't say, I am looking at some bugs very briefly. I've
> noticed that sysfs comes up periodically (or was it some other similar
> fs?). 

Hmm..

Looking at the git history, I see several past cases of ordering
problems. I wonder if the rdma parent device is being destroyed
before the rdma devices finish their own destruction?

I see syzkaller is creating a bunch of virtual net devices, and I
assume it has created a software rdma device on one of them.

So I'm guessing it is also destroying a parent? But I can't guess
which one. Some simple tests with veth suggest that case is OK because
the parent is virtual. But maybe bond or bridge or something?

The issue in rdma is that unregistering a netdev triggers an async
destruction of the RDMA devices. This has to be async because the
netdev notification is delivered with RTNL held, and an rdma device
cannot be destroyed while holding RTNL.
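A minimal userspace sketch of that deferral pattern, using pthreads as stand-ins and not kernel code: the hypothetical `rtnl_model` mutex plays RTNL, and the thread started by the hypothetical `netdev_unregister_notifier()` plays the async rdma teardown (ib_unregister_work() in the real code). The mail does not say why the teardown cannot run under RTNL; the sketch simply assumes the teardown needs to take that lock itself, so calling it inline from the notifier would self-deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace model, not kernel code:
 *   rtnl_model -> stands in for RTNL
 *   worker     -> stands in for the deferred ib_unregister_work() */
static pthread_mutex_t rtnl_model = PTHREAD_MUTEX_INITIALIZER;
static bool rdma_dev_alive = true;
static pthread_t worker;

/* Assumption for the sketch: teardown takes rtnl_model itself, so
 * calling it inline from the notifier would self-deadlock. */
static void *rdma_teardown_work(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&rtnl_model); /* blocks until the notifier path drops it */
    assert(rdma_dev_alive);          /* teardown should run only once */
    rdma_dev_alive = false;
    pthread_mutex_unlock(&rtnl_model);
    return NULL;
}

/* Runs with rtnl_model held, like a netdev notifier under RTNL:
 * it only queues the teardown and returns. */
static int netdev_unregister_notifier(void)
{
    return pthread_create(&worker, NULL, rdma_teardown_work, NULL);
}

/* Netdev unregister path: lock, notify, unlock. The rdma device may
 * still be alive when this returns. */
static int unregister_netdev_model(void)
{
    pthread_mutex_lock(&rtnl_model);
    int err = netdev_unregister_notifier();
    pthread_mutex_unlock(&rtnl_model);
    return err;
}
```

Nothing waits for the worker before unregister_netdev_model() returns; the caller only knows the rdma device is gone after pthread_join(&worker, NULL). That gap between the two completions is the window the next paragraph is about.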

So I suppose there is a race where the netdev can finish its
destruction while the rdma teardown is still running. If someone
deletes the sysfs directory holding the netdev before rdma completes,
my guess is that we hit this warning?
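To make the suspected ordering concrete, here is a toy, single-threaded model that is not the kernel sysfs API: the hypothetical `struct toy_dir` is a parent directory on the netdev side holding one child entry on the rdma/umad side. Removing the child first and the parent second is the expected order; removing the parent first is the order the theory above suspects, and in the model it is the order that produces a "not found" warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model, not the kernel sysfs API: a parent directory (netdev
 * side) that holds a single child entry (rdma/umad side). */
struct toy_dir {
    bool present;       /* parent directory still exists */
    bool child_present; /* child entry still exists */
};

static int warnings; /* number of "not found" warnings emitted */

/* Netdev side: removing the parent drops everything beneath it. */
static void toy_remove_parent(struct toy_dir *d)
{
    assert(d->present);
    d->present = false;
    d->child_present = false;
}

/* RDMA side: removing the child expects it to still exist. */
static void toy_remove_child(struct toy_dir *d, const char *name)
{
    if (!d->child_present) {
        warnings++;
        fprintf(stderr, "WARNING: entry not found for '%s'\n", name);
        return;
    }
    d->child_present = false;
}
```

The printed text only mimics the shape of the reported "sysfs group 'power' not found for kobject 'umad1'" message; it is not the kernel's actual code path.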

Could it be? I would love to know which netdev the rdma device was
created on, but it doesn't seem to show in the trace :\

This race could be made easier to hit by adding a sleep to
ib_unregister_work() to widen the window. Is there some way to get
syzkaller to search for a reproducer with that patch applied?

Jason

Thread overview: 9+ messages
2020-04-06  6:37 WARNING in ib_umad_kill_port syzbot
2020-04-06 17:21 ` Leon Romanovsky
2020-04-06 17:44   ` Jason Gunthorpe
2020-04-07  9:56     ` Dmitry Vyukov
2020-04-07 11:55       ` Jason Gunthorpe
2020-04-07 12:39         ` Dmitry Vyukov
2020-04-07 14:33           ` Greg Kroah-Hartman
2020-04-07 14:35           ` Jason Gunthorpe [this message]
2020-04-09 13:35             ` Dmitry Vyukov
