netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Salil Mehta <salil.mehta@huawei.com>
To: Marc Zyngier <maz@kernel.org>
Cc: "edumazet@google.com" <edumazet@google.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"lipeng (Y)" <lipeng321@huawei.com>,
	"Zhuangyuzeng (Yisen)" <yisen.zhuang@huawei.com>,
	"David S . Miller" <davem@davemloft.net>,
	Linuxarm <linuxarm@huawei.com>
Subject: RE: [PATCH] net: hns: Ensure that interface teardown cannot race with TX interrupt
Date: Wed, 6 Nov 2019 19:05:33 +0000	[thread overview]
Message-ID: <b60a6cd0c3934d52aec14b47b2218edf@huawei.com> (raw)
In-Reply-To: <21493d3d08936d7ed67f7153cdaa418e@www.loen.fr>

> From: Marc Zyngier [mailto:maz@kernel.org]
> Sent: Wednesday, November 6, 2019 12:17 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> 
> On 2019-11-06 12:28, Salil Mehta wrote:
> > Hi Marc,
> >
> >> On Tue, 5 Nov 2019 18:41:11 +0000
> >> Salil Mehta <salil.mehta@huawei.com> wrote:
> >>
> >> Hi Salil,
> >>
> >> > Hi Marc,
> >> > I tested with the patch on D05 with the lockdep enabled kernel
> >> with below options
> >> > and I could not reproduce the deadlock. I do not argue the issue
> >> being mentioned
> >> > as this looks to be a clear bug which should hit while TX
> >> data-path is running
> >> > and we try to disable the interface.
> >>
> >> Lockdep screaming at you doesn't mean the deadly scenario happens in
> >> practice, and I've never seen the machine hanging in these
> >> conditions.
> >> But I've also never tried to trigger it in anger.
> >>
> >> > Could you please help me know the exact set of steps you used to
> >> get into this
> >> > problem. Also, are you able to re-create it easily/frequently?
> >>
> >> I just need to issue "reboot" (which calls "ip link ... down") for
> >> this
> >> to trigger. Here's a full splat[1], as well as my full config[2]. It
> >> is
> >> 100% repeatable.
> >>
> >> > # Kernel Config options:
> >> > CONFIG_LOCKDEP_SUPPORT=y
> >> > CONFIG_LOCKDEP=y
> >>
> >> You'll need at least
> >>
> >> CONFIG_PROVE_LOCKING=y
> >> CONFIG_NET_POLL_CONTROLLER=y
> >
> >
> > Few points:
> > 1. To me netpoll causing spinlock deadlock with IRQ leg of TX and ip
> > util is
> >     highly unlikely since netpoll runs with both RX/TX interrupts
> > disabled.
> >     It runs in polling mode to facilitate parallel path to features
> > like
> >     Netconsole, netdump etc. hence, deadlock because of the netpoll
> > should
> >     be highly unlikely. Therefore, smells of some other problem
> > here...
> > 2. Also, I remember patch[s1][s2] from Eric Dumazet to disable
> > netpoll on many
> >     NICs way back in 4.19 kernel on the basis of Song Liu's findings.
> >
> > Problem:
> > Aah, I see the problem now, it is because of the stray code related
> > to the
> > NET_POLL_CONTROLLER in hns driver which actually should have got
> > remove within
> > the patch[s1], and that also explains why it does not get hit while
> > NET POLL
> > is disabled.
> >
> >
> > /* netif_tx_lock will turn down the performance, set only when
> > necessary */
> > #ifdef CONFIG_NET_POLL_CONTROLLER
> > #define NETIF_TX_LOCK(ring) spin_lock(&(ring)->lock)
> > #define NETIF_TX_UNLOCK(ring) spin_unlock(&(ring)->lock)
> > #else
> > #define NETIF_TX_LOCK(ring)
> > #define NETIF_TX_UNLOCK(ring)
> > #endif
> >
> >
> > Once you define CONFIG_NET_POLL_CONTROLLER in the latest code these
> > macros
> > Kick-in even for the normal NAPI path. Which can cause deadlock and
> > that perhaps
> > is what you are seeing?
> 
> Yes, that's the problem.


Sure, thanks.


> > Now, the question is do we require these locks in normal NAPI poll? I
> > do not
> > see that we need them anymore as Tasklets are serialized to
> > themselves and
> > configuration path like "ip <intf> down" cannot conflict with NAPI
> > poll path
> > as the later is always disabled prior performing interface down
> > operation.
> > Hence, no conflict there.
> 
> My preference would indeed be to drop these per-queue locks if they
> aren't
> required. I couldn't figure out from a cursory look at the code whether
> two CPUs could serve the same TX queue. If that cannot happen by
> construction,
> then these locks are perfectly useless and should be removed.


Sure.


> > As  a side analysis, I could figure out some contentions in the
> > configuration
> > path not related to this though. :)
> >
> >
> > Suggested Solution:
> > Since we do not have support of NET_POLL_CONTROLLER macros
> > NETIF_TX_[UN]LOCK
> > We should remove these NET_POLL_CONTROLLER macros altogether for now.
> >
> > Though, I still have not looked comprehensively how other are able to
> > use
> > Debugging utils like netconsole etc without having
> > NET_POLL_CONTROLLER.
> > Maybe @Eric Dumazet might give us some insight on this?
> >
> >
> > If you agree with this then I can send a patch to remove these from
> > hns
> > driver. This should solve your problem as well?
> 
> Sure, as long as you can guarantee that these locks are never used for
> anything
> useful.


Hi Marc,
I have floated the patch. Could you please confirm if this solves your issue
and if possible provide your Tested-by? :)

[PATCH net] net: hns: Fix the stray netpoll locks causing deadlock in NAPI path

Thanks
Salil

      reply	other threads:[~2019-11-06 19:06 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-04 19:56 [PATCH] net: hns: Ensure that interface teardown cannot race with TX interrupt Marc Zyngier
2019-11-05  9:16 ` Salil Mehta
2019-11-05 18:41 ` Salil Mehta
2019-11-06  8:17   ` Marc Zyngier
2019-11-06 11:19     ` Salil Mehta
2019-11-06 12:16       ` Marc Zyngier
2019-11-06 19:05         ` Salil Mehta [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b60a6cd0c3934d52aec14b47b2218edf@huawei.com \
    --to=salil.mehta@huawei.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=linuxarm@huawei.com \
    --cc=lipeng321@huawei.com \
    --cc=maz@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=yisen.zhuang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).