From: Leon Romanovsky <leon@kernel.org>
To: Jason Wang <jasowang@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>,
John Ogness <john.ogness@linutronix.de>,
"Michael S. Tsirkin" <mst@redhat.com>,
netdev <netdev@vger.kernel.org>, Amit Shah <amit@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
virtualization@lists.linux-foundation.org,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Ran Rozenstein <ranro@nvidia.com>,
Itay Aveksis <itayav@nvidia.com>,
Jakub Kicinski <kuba@kernel.org>
Subject: Re: netconsole deadlock with virtnet
Date: Tue, 24 Nov 2020 11:26:21 +0200 [thread overview]
Message-ID: <20201124092621.GH3159@unreal> (raw)
In-Reply-To: <6f046c51-cdcc-77f9-4859-2508d08126f8@redhat.com>
On Tue, Nov 24, 2020 at 04:57:23PM +0800, Jason Wang wrote:
>
> On 2020/11/24 下午4:01, Leon Romanovsky wrote:
> > On Tue, Nov 24, 2020 at 11:22:03AM +0800, Jason Wang wrote:
> > > On 2020/11/24 上午3:21, Jakub Kicinski wrote:
> > > > On Mon, 23 Nov 2020 14:09:34 -0500 Steven Rostedt wrote:
> > > > > On Mon, 23 Nov 2020 10:52:52 -0800
> > > > > Jakub Kicinski <kuba@kernel.org> wrote:
> > > > >
> > > > > > On Mon, 23 Nov 2020 09:31:28 -0500 Steven Rostedt wrote:
> > > > > > > On Mon, 23 Nov 2020 13:08:55 +0200
> > > > > > > Leon Romanovsky <leon@kernel.org> wrote:
> > > > > > >
> > > > > > > > [ 10.028024] Chain exists of:
> > > > > > > > [ 10.028025] console_owner --> target_list_lock --> _xmit_ETHER#2
> > > > > > > Note, the problem is that we have a location that grabs the xmit_lock while
> > > > > > > holding target_list_lock (and possibly console_owner).
> > > > > > Well, it try_locks the xmit_lock. Does lockdep understand try-locks?
> > > > > >
> > > > > > (not that I condone the shenanigans that are going on here)
> > > > > Does it?
> > > > >
> > > > > virtnet_poll_tx() {
> > > > > __netif_tx_lock() {
> > > > > spin_lock(&txq->_xmit_lock);
> > > > Umpf. Right. I was looking at virtnet_poll_cleantx()
> > > >
> > > > > That looks like we can have:
> > > > >
> > > > >
> > > > > CPU0 CPU1
> > > > > ---- ----
> > > > > lock(xmit_lock)
> > > > >
> > > > > lock(console)
> > > > > lock(target_list_lock)
> > > > > __netif_tx_lock()
> > > > > lock(xmit_lock);
> > > > >
> > > > > [BLOCKED]
> > > > >
> > > > > <interrupt>
> > > > > lock(console)
> > > > >
> > > > > [BLOCKED]
> > > > >
> > > > >
> > > > >
> > > > > DEADLOCK.
> > > > >
> > > > >
> > > > > So where is the trylock here?
> > > > >
> > > > > Perhaps you need the trylock in virtnet_poll_tx()?
> > > > That could work. Best if we used normal lock if !!budget, and trylock
> > > > when budget is 0. But maybe that's too hairy.
> > >
> > > If we use trylock, we probably lose(or delay) tx notification that may have
> > > side effects to the stack.
> > >
> > >
> > > > I'm assuming all this trickiness comes from virtqueue_get_buf() needing
> > > > locking vs the TX path? It's pretty unusual for the completion path to
> > > > need locking vs xmit path.
> > >
> > > Two reasons for doing this:
> > >
> > > 1) For some historical reason, we try to free transmitted tx packets in xmit
> > > (see free_old_xmit_skbs() in start_xmit()), we can probably remove this if
> > > we remove the non tx interrupt mode.
> > > 2) virtio core requires virtqueue_get_buf() to be synchronized with
> > > virtqueue_add(), we probably can solve this but it requires some non trivial
> > > refactoring in the virtio core
> > So how will we solve our lockdep issues?
> >
> > Thanks
>
>
> It's not clear to me that whether it's a virtio-net specific issue. E.g the
> above deadlock looks like a generic issue so workaround it via virtio-net
> may not help for other drivers.
It is hard to say, no one else complained except me who is using virtio :).
Thanks
>
> Thanks
>
>
> >
> > > Btw, have a quick search, there are several other drivers that uses tx lock
> > > in the tx NAPI.
> > >
> > > Thanks
> > >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
WARNING: multiple messages have this Message-ID (diff)
From: Leon Romanovsky <leon@kernel.org>
To: Jason Wang <jasowang@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Petr Mladek <pmladek@suse.com>,
John Ogness <john.ogness@linutronix.de>,
virtualization@lists.linux-foundation.org,
Amit Shah <amit@kernel.org>, Itay Aveksis <itayav@nvidia.com>,
Ran Rozenstein <ranro@nvidia.com>,
netdev <netdev@vger.kernel.org>
Subject: Re: netconsole deadlock with virtnet
Date: Tue, 24 Nov 2020 11:26:21 +0200 [thread overview]
Message-ID: <20201124092621.GH3159@unreal> (raw)
In-Reply-To: <6f046c51-cdcc-77f9-4859-2508d08126f8@redhat.com>
On Tue, Nov 24, 2020 at 04:57:23PM +0800, Jason Wang wrote:
>
> On 2020/11/24 下午4:01, Leon Romanovsky wrote:
> > On Tue, Nov 24, 2020 at 11:22:03AM +0800, Jason Wang wrote:
> > > On 2020/11/24 上午3:21, Jakub Kicinski wrote:
> > > > On Mon, 23 Nov 2020 14:09:34 -0500 Steven Rostedt wrote:
> > > > > On Mon, 23 Nov 2020 10:52:52 -0800
> > > > > Jakub Kicinski <kuba@kernel.org> wrote:
> > > > >
> > > > > > On Mon, 23 Nov 2020 09:31:28 -0500 Steven Rostedt wrote:
> > > > > > > On Mon, 23 Nov 2020 13:08:55 +0200
> > > > > > > Leon Romanovsky <leon@kernel.org> wrote:
> > > > > > >
> > > > > > > > [ 10.028024] Chain exists of:
> > > > > > > > [ 10.028025] console_owner --> target_list_lock --> _xmit_ETHER#2
> > > > > > > Note, the problem is that we have a location that grabs the xmit_lock while
> > > > > > > holding target_list_lock (and possibly console_owner).
> > > > > > Well, it try_locks the xmit_lock. Does lockdep understand try-locks?
> > > > > >
> > > > > > (not that I condone the shenanigans that are going on here)
> > > > > Does it?
> > > > >
> > > > > virtnet_poll_tx() {
> > > > > __netif_tx_lock() {
> > > > > spin_lock(&txq->_xmit_lock);
> > > > Umpf. Right. I was looking at virtnet_poll_cleantx()
> > > >
> > > > > That looks like we can have:
> > > > >
> > > > >
> > > > > CPU0 CPU1
> > > > > ---- ----
> > > > > lock(xmit_lock)
> > > > >
> > > > > lock(console)
> > > > > lock(target_list_lock)
> > > > > __netif_tx_lock()
> > > > > lock(xmit_lock);
> > > > >
> > > > > [BLOCKED]
> > > > >
> > > > > <interrupt>
> > > > > lock(console)
> > > > >
> > > > > [BLOCKED]
> > > > >
> > > > >
> > > > >
> > > > > DEADLOCK.
> > > > >
> > > > >
> > > > > So where is the trylock here?
> > > > >
> > > > > Perhaps you need the trylock in virtnet_poll_tx()?
> > > > That could work. Best if we used normal lock if !!budget, and trylock
> > > > when budget is 0. But maybe that's too hairy.
> > >
> > > If we use trylock, we probably lose(or delay) tx notification that may have
> > > side effects to the stack.
> > >
> > >
> > > > I'm assuming all this trickiness comes from virtqueue_get_buf() needing
> > > > locking vs the TX path? It's pretty unusual for the completion path to
> > > > need locking vs xmit path.
> > >
> > > Two reasons for doing this:
> > >
> > > 1) For some historical reason, we try to free transmitted tx packets in xmit
> > > (see free_old_xmit_skbs() in start_xmit()), we can probably remove this if
> > > we remove the non tx interrupt mode.
> > > 2) virtio core requires virtqueue_get_buf() to be synchronized with
> > > virtqueue_add(), we probably can solve this but it requires some non trivial
> > > refactoring in the virtio core
> > So how will we solve our lockdep issues?
> >
> > Thanks
>
>
> It's not clear to me that whether it's a virtio-net specific issue. E.g the
> above deadlock looks like a generic issue so workaround it via virtio-net
> may not help for other drivers.
It is hard to say, no one else complained except me who is using virtio :).
Thanks
>
> Thanks
>
>
> >
> > > Btw, have a quick search, there are several other drivers that uses tx lock
> > > in the tx NAPI.
> > >
> > > Thanks
> > >
>
next prev parent reply other threads:[~2020-11-24 9:26 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-17 10:23 netconsole deadlock with virtnet Leon Romanovsky
2020-11-17 14:33 ` Steven Rostedt
2020-11-17 18:12 ` Leon Romanovsky
2020-11-18 2:46 ` Sergey Senozhatsky
2020-11-18 3:15 ` Sergey Senozhatsky
2020-11-18 4:09 ` Jason Wang
2020-11-18 14:12 ` Steven Rostedt
2020-11-18 14:12 ` Steven Rostedt
2020-11-23 11:08 ` Leon Romanovsky
2020-11-23 11:08 ` Leon Romanovsky
2020-11-23 14:31 ` Steven Rostedt
2020-11-23 14:31 ` Steven Rostedt
2020-11-23 18:52 ` Jakub Kicinski
2020-11-23 19:09 ` Steven Rostedt
2020-11-23 19:09 ` Steven Rostedt
2020-11-23 19:21 ` Jakub Kicinski
2020-11-24 3:22 ` Jason Wang
2020-11-24 3:22 ` Jason Wang
2020-11-24 8:01 ` Leon Romanovsky
2020-11-24 8:01 ` Leon Romanovsky
2020-11-24 8:57 ` Jason Wang
2020-11-24 8:57 ` Jason Wang
2020-11-24 9:26 ` Leon Romanovsky [this message]
2020-11-24 9:26 ` Leon Romanovsky
2020-11-24 14:31 ` Steven Rostedt
2020-11-24 14:31 ` Steven Rostedt
2020-11-25 6:20 ` Jason Wang
2020-11-25 6:20 ` Jason Wang
2020-11-24 16:20 ` Jakub Kicinski
2020-11-25 6:21 ` Jason Wang
2020-11-25 6:21 ` Jason Wang
2020-11-19 12:55 ` Petr Mladek via Virtualization
2020-11-22 8:41 ` Leon Romanovsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201124092621.GH3159@unreal \
--to=leon@kernel.org \
--cc=amit@kernel.org \
--cc=itayav@nvidia.com \
--cc=jasowang@redhat.com \
--cc=john.ogness@linutronix.de \
--cc=kuba@kernel.org \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=pmladek@suse.com \
--cc=ranro@nvidia.com \
--cc=rostedt@goodmis.org \
--cc=sergey.senozhatsky@gmail.com \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.