* Re: [RFC PATCH] net: stmmac: Fix the problem about interrupt storm
[not found] <CAKKbWA7e0TmU4z4O8tHfwE=dvqPFaZbSPjxR-==fQSsNq6ELCQ@mail.gmail.com>
@ 2024-11-03 19:00 ` Avi Fishman
2024-11-12 15:12 ` Romain Gantois
0 siblings, 1 reply; 2+ messages in thread
From: Avi Fishman @ 2024-11-03 19:00 UTC (permalink / raw)
To: cathycai0714, Alexandre Torgue, Jose Abreu, Giuseppe Cavallaro
Cc: cathy.cai, cixi.geng1, David Miller, edumazet, kuba, Linux ARM,
Linux Kernel Mailing List, linux-stm32, mcoquelin.stm32,
Network Development, pabeni, romain.gantois, wade.shu,
xuewen.yan94, zhiguo.niu, Alexandre Torgue, Murali, Tomer Maimon,
Silva, L Antonio, Arias Pablo, Somarouthu Murali, uri.trichter
Hi all,
We recently encountered the same interrupt storm and the root cause
was the same as here.
The suggested patch solved 99% of the issues, but indeed as written
below on rare cases the issue happens between the dev_open() and
clear_bit(STMMAC_DOWN) calls.
I also agree that stmmac_interrupt() unconditionally ignores
interrupts when the driver is in STMMAC_DOWN state is dangerous.
The issue happened for us in linux 5.10 but I see that this behaviour
wasn't changed also in newer versions.
maybe we should disable the device interrupts before dev_close(), and
enable it after dev_open().
>
> Hi Romain,
>
> On Sun, Mar 31, 2024 at 4:35 PM Romain Gantois
> <romain.gantois@bootlin.com> wrote:
> >
> > Hello Cathy,
> >
> > On Wed, 27 Mar 2024, Cathy Cai wrote:
> >
> > > Tx queue time out then reset adapter. When reset the adapter, stmmac driver
> > > sets the state to STMMAC_DOWN and calls dev_close() function. If an interrupt
> > > is triggered at this instant after setting state to STMMAC_DOWN, before the
> > > dev_close() call.
> > >
> > ...
> > > - set_bit(STMMAC_DOWN, &priv->state);
> > > dev_close(priv->dev);
> > > + set_bit(STMMAC_DOWN, &priv->state);
> > > dev_open(priv->dev, NULL);
> > > clear_bit(STMMAC_DOWN, &priv->state);
> > > clear_bit(STMMAC_RESETING, &priv->state);
> >
> > If this IRQ issue can happen whenever STMMAC_DOWN is set while the net device is
> > open, then it could also happen between the dev_open() and
> > clear_bit(STMMAC_DOWN) calls right? So you'd have to clear STMMAC_DOWN before
> > calling dev_open() but then I don't see the usefulness of setting STMMAC_DOWN
> > and clearing it immediately. Maybe closing and opening the net device should be
> > enough?
Indeed we encounter an issue between the dev_open() and clear_bit(STMMAC_DOWN)..
> >
> Yes. It could also happen between the dev_open() and
> clear_bit(STMMAC_DOWN) calls.
> Although we did not reproduce this scenario, it should have happened
> if we had increased
> the number of test samples. In addition, I found that other people had
> similar problems before.
> The link is:
> https://lore.kernel.org/all/20210208140820.10410-11-Sergey.Semin@baikalelectronics.ru/
>
> >
> > Moreover, it seems strange to me that stmmac_interrupt() unconditionnally
> > ignores interrupts when the driver is in STMMAC_DOWN state. This seems like
> > dangerous behaviour, since it could cause IRQ storm issues whenever something
> > in the driver sets this state. I'm not too familiar with the interrupt handling
> > in this driver, but maybe stmmac_interrupt() could clear interrupts
> > unconditionnally in the STMMAC_DOWN state?
> >
> Clear interrupts unconditionally in the STMMAC_DOWN state directly
> certainly won't cause this problem.
> This may be too rough, maybe this design has other considerations.
>
But then after the dev_open() you might miss interrupt, no?
> >
> > Best Regards,
> >
> > --
> > Romain Gantois, Bootlin
> > Embedded Linux and Kernel engineering
> > https://bootlin.com
>
> Best Regards,
> Cathy
--
Regards,
Avi
^ permalink raw reply [flat|nested] 2+ messages in thread