From: Salil Mehta <salil.mehta@huawei.com>
To: Marc Zyngier <maz@kernel.org>
Cc: "edumazet@google.com" <edumazet@google.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"lipeng (Y)" <lipeng321@huawei.com>,
"Zhuangyuzeng (Yisen)" <yisen.zhuang@huawei.com>,
"David S . Miller" <davem@davemloft.net>,
Linuxarm <linuxarm@huawei.com>
Subject: RE: [PATCH] net: hns: Ensure that interface teardown cannot race with TX interrupt
Date: Wed, 6 Nov 2019 19:05:33 +0000 [thread overview]
Message-ID: <b60a6cd0c3934d52aec14b47b2218edf@huawei.com> (raw)
In-Reply-To: <21493d3d08936d7ed67f7153cdaa418e@www.loen.fr>
> From: Marc Zyngier [mailto:maz@kernel.org]
> Sent: Wednesday, November 6, 2019 12:17 PM
> To: Salil Mehta <salil.mehta@huawei.com>
>
> On 2019-11-06 12:28, Salil Mehta wrote:
> > Hi Marc,
> >
> >> On Tue, 5 Nov 2019 18:41:11 +0000
> >> Salil Mehta <salil.mehta@huawei.com> wrote:
> >>
> >> Hi Salil,
> >>
> >> > Hi Marc,
> >> > I tested with the patch on D05 with the lockdep enabled kernel
> >> with below options
> >> > and I could not reproduce the deadlock. I do not argue the issue
> >> being mentioned
> >> > as this looks to be a clear bug which should hit while TX
> >> data-path is running
> >> > and we try to disable the interface.
> >>
> >> Lockdep screaming at you doesn't mean the deadly scenario happens in
> >> practice, and I've never seen the machine hanging in these
> >> conditions.
> >> But I've also never tried to trigger it in anger.
> >>
> >> > Could you please help me know the exact set of steps you used to
> >> get into this
> >> > problem. Also, are you able to re-create it easily/frequently?
> >>
> >> I just need to issue "reboot" (which calls "ip link ... down") for
> >> this
> >> to trigger. Here's a full splat[1], as well as my full config[2]. It
> >> is
> >> 100% repeatable.
> >>
> >> > # Kernel Config options:
> >> > CONFIG_LOCKDEP_SUPPORT=y
> >> > CONFIG_LOCKDEP=y
> >>
> >> You'll need at least
> >>
> >> CONFIG_PROVE_LOCKING=y
> >> CONFIG_NET_POLL_CONTROLLER=y
> >
> >
> > Few points:
> > 1. To me netpoll causing spinlock deadlock with IRQ leg of TX and ip
> > util is
> > highly unlikely since netpoll runs with both RX/TX interrupts
> > disabled.
> > It runs in polling mode to facilitate parallel path to features
> > like
> > Netconsole, netdump etc. hence, deadlock because of the netpoll
> > should
> > be highly unlikely. Therefore, smells of some other problem
> > here...
> > 2. Also, I remember patch[s1][s2] from Eric Dumazet to disable
> > netpoll on many
> > NICs way back in 4.19 kernel on the basis of Song Liu's findings.
> >
> > Problem:
> > Aah, I see the problem now, it is because of the stray code related
> > to the
> > NET_POLL_CONTROLLER in hns driver which actually should have got
> > remove within
> > the patch[s1], and that also explains why it does not get hit while
> > NET POLL
> > is disabled.
> >
> >
> > /* netif_tx_lock will turn down the performance, set only when
> > necessary */
> > #ifdef CONFIG_NET_POLL_CONTROLLER
> > #define NETIF_TX_LOCK(ring) spin_lock(&(ring)->lock)
> > #define NETIF_TX_UNLOCK(ring) spin_unlock(&(ring)->lock)
> > #else
> > #define NETIF_TX_LOCK(ring)
> > #define NETIF_TX_UNLOCK(ring)
> > #endif
> >
> >
> > Once you define CONFIG_NET_POLL_CONTROLLER in the latest code these
> > macros
> > Kick-in even for the normal NAPI path. Which can cause deadlock and
> > that perhaps
> > is what you are seeing?
>
> Yes, that's the problem.
Sure, thanks.
> > Now, the question is do we require these locks in normal NAPI poll? I
> > do not
> > see that we need them anymore as Tasklets are serialized to
> > themselves and
> > configuration path like "ip <intf> down" cannot conflict with NAPI
> > poll path
> > as the later is always disabled prior performing interface down
> > operation.
> > Hence, no conflict there.
>
> My preference would indeed be to drop these per-queue locks if they
> aren't
> required. I couldn't figure out from a cursory look at the code whether
> two CPUs could serve the same TX queue. If that cannot happen by
> construction,
> then these locks are perfectly useless and should be removed.
Sure.
> > As a side analysis, I could figure out some contentions in the
> > configuration
> > path not related to this though. :)
> >
> >
> > Suggested Solution:
> > Since we do not have support of NET_POLL_CONTROLLER macros
> > NETIF_TX_[UN]LOCK
> > We should remove these NET_POLL_CONTROLLER macros altogether for now.
> >
> > Though, I still have not looked comprehensively how other are able to
> > use
> > Debugging utils like netconsole etc without having
> > NET_POLL_CONTROLLER.
> > Maybe @Eric Dumazet might give us some insight on this?
> >
> >
> > If you agree with this then I can send a patch to remove these from
> > hns
> > driver. This should solve your problem as well?
>
> Sure, as long as you can guarantee that these locks are never used for
> anything
> useful.
Hi Marc,
I have floated the patch. Could you please confirm if this solves your issue
and if possible provide your Tested-by? :)
[PATCH net] net: hns: Fix the stray netpoll locks causing deadlock in NAPI path
Thanks
Salil
prev parent reply other threads:[~2019-11-06 19:06 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-04 19:56 [PATCH] net: hns: Ensure that interface teardown cannot race with TX interrupt Marc Zyngier
2019-11-05 9:16 ` Salil Mehta
2019-11-05 18:41 ` Salil Mehta
2019-11-06 8:17 ` Marc Zyngier
2019-11-06 11:19 ` Salil Mehta
2019-11-06 12:16 ` Marc Zyngier
2019-11-06 19:05 ` Salil Mehta [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b60a6cd0c3934d52aec14b47b2218edf@huawei.com \
--to=salil.mehta@huawei.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=linuxarm@huawei.com \
--cc=lipeng321@huawei.com \
--cc=maz@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=yisen.zhuang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).