public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Oltean <vladimir.oltean@nxp.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: "David S . Miller" <davem@davemloft.net>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"UNGLinuxDriver@microchip.com" <UNGLinuxDriver@microchip.com>,
	Alexandre Belloni <alexandre.belloni@bootlin.com>,
	Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Horatiu Vultur <horatiu.vultur@microchip.com>,
	"Allan W . Nielsen" <allan.nielsen@microchip.com>,
	Claudiu Manoil <claudiu.manoil@nxp.com>,
	Steen Hegelund <steen.hegelund@microchip.com>
Subject: Re: [PATCH v2 net] net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process context
Date: Fri, 4 Dec 2020 19:23:11 +0000	[thread overview]
Message-ID: <20201204192310.al6fhkbarhbjng3a@skbuf> (raw)
In-Reply-To: <20201204105516.1237a690@kicinski-fedora-pc1c0hjn.DHCP.thefacebook.com>

On Fri, Dec 04, 2020 at 10:55:16AM -0800, Jakub Kicinski wrote:
> On Fri, 4 Dec 2020 18:12:51 +0000 Vladimir Oltean wrote:
> > On Fri, Dec 04, 2020 at 10:00:21AM -0800, Jakub Kicinski wrote:
> > > On Fri,  4 Dec 2020 19:51:25 +0200 Vladimir Oltean wrote:
> > > > Currently ocelot_set_rx_mode calls ocelot_mact_learn directly, which has
> > > > a very nice ocelot_mact_wait_for_completion at the end. Introduced in
> > > > commit 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be
> > > > wall time not attempts"), this function uses readx_poll_timeout which
> > > > triggers a lot of lockdep warnings and is also dangerous to use from
> > > > atomic context, leading to lockups and panics.
> > > >
> > > > Steen Hegelund added a poll timeout of 100 ms for checking the MAC
> > > > table, a duration which is clearly absurd to poll in atomic context.
> > > > So we need to defer the MAC table access to process context, which we do
> > > > via a dynamically allocated workqueue which contains all there is to
> > > > know about the MAC table operation it has to do.
> > > >
> > > > Fixes: 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be wall time not attempts")
> > > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
> > > > ---
> > > > Changes in v2:
> > > > - Added Fixes tag (it won't backport that far, but anyway)
> > > > - Using get_device and put_device to avoid racing with unbind
> > >
> > > Does get_device really protect you from unbind? I thought it only
> > > protects you from .release being called, IOW freeing struct device
> > > memory..
> >
> > Possibly.
> > I ran a bind && unbind loop for a while, and I couldn't trigger any
> > concurrency.
>
> You'd need to switch to a delayed work or add some other sleep for
> testing, maybe?

Ok, I'll test with a sleep in the worker task.

> > > More usual way of handling this would be allocating your own workqueue
> > > and flushing that wq at the right point.
> >
> > Yeah, well I more or less deliberately lose track of the workqueue as
> > soon as ocelot_enqueue_mact_action is over, and that is by design. There
> > is potentially more than one address to offload to the hardware in progress
> > at the same time, and any sort of serialization in .ndo_set_rx_mode (so
> > I could add the workqueue to a list of items to cancel on unbind)
> > would mean
> > (a) more complicated code
> > (b) more busy waiting
>
> Are you sure you're not confusing workqueue with a work entry?
>
> You can still put multiple work entries on the queue.

I am confused indeed. I will create an ordered_workqueue in ocelot and I
will flush it after unregistering the network interfaces and before
unbinding the device, when accesses to registers are still valid but
there is no further NDO that gets called.

> > > >  drivers/net/ethernet/mscc/ocelot_net.c | 83 +++++++++++++++++++++++++-
> > > >  1 file changed, 80 insertions(+), 3 deletions(-)
> > >
> > > This is a little large for a rc7 fix :S
> >
> > Fine, net-next it is then.
>
> If this really is the fix we want, it's the fix we want, and it should
> go into net. We'll just need to test it very well is all.

Well, as I said, I don't care at all how far this patch will be backported.
I am not using the ocelot switchdev driver anyway (since I don't have
hardware that uses it), I just have a test vehicle that I use from time
to time to check that I don't introduce regressions in the various code
paths. And seeing lockdep give warnings is annoying. I am perfectly fine
with targeting v3 for net-next. I don't think even the AUTOSEL crew will
pick it up, since it's going to conflict with some refactoring.

> > > What's the expected poll time? maybe it's not that bad to busy wait?
> > > Clearly nobody noticed the issue in 2 years (you mention lockdep so
> > > not a "real" deadlock) which makes me think the wait can't be that long?
> >
> > Not too much, but the sleep is there.
> > Also, all 3 of ocelot/felix/seville are memory-mapped devices. But I
> > heard from Alex a while ago that he intends to add support for a switch
> > managed over a slow bus like SPI, and to use the same regmap infrastructure.
> > That would mean that this problem would need to be resolved anyway.
>
> So it's MMIO but the other end is some firmware running on the device?

No. It's always register-based access, just that in some cases the
registers are directly memory-mapped for Linux, while in other cases
they are beyond an SPI bus. But there isn't any firmware in any case.

  reply	other threads:[~2020-12-04 19:24 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-04 17:51 [PATCH v2 net] net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process context Vladimir Oltean
2020-12-04 18:00 ` Jakub Kicinski
2020-12-04 18:12   ` Vladimir Oltean
2020-12-04 18:15     ` Alexandre Belloni
2020-12-04 18:55     ` Jakub Kicinski
2020-12-04 19:23       ` Vladimir Oltean [this message]
2020-12-05  0:41         ` Jakub Kicinski
2020-12-04 19:24       ` Jakub Kicinski
2020-12-04 19:26       ` Alexandre Belloni
2020-12-04 18:25   ` Vladimir Oltean

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201204192310.al6fhkbarhbjng3a@skbuf \
    --to=vladimir.oltean@nxp.com \
    --cc=UNGLinuxDriver@microchip.com \
    --cc=alexandre.belloni@bootlin.com \
    --cc=allan.nielsen@microchip.com \
    --cc=andrew@lunn.ch \
    --cc=claudiu.manoil@nxp.com \
    --cc=davem@davemloft.net \
    --cc=f.fainelli@gmail.com \
    --cc=horatiu.vultur@microchip.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=steen.hegelund@microchip.com \
    --cc=vivien.didelot@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox