* Re: [RFC PATCH] net: stmmac: Fix the problem about interrupt storm
[not found] <CAKKbWA7e0TmU4z4O8tHfwE=dvqPFaZbSPjxR-==fQSsNq6ELCQ@mail.gmail.com>
@ 2024-11-03 19:00 ` Avi Fishman
2024-11-12 15:12 ` Romain Gantois
0 siblings, 1 reply; 2+ messages in thread
From: Avi Fishman @ 2024-11-03 19:00 UTC (permalink / raw)
To: cathycai0714, Alexandre Torgue, Jose Abreu, Giuseppe Cavallaro
Cc: cathy.cai, cixi.geng1, David Miller, edumazet, kuba, Linux ARM,
Linux Kernel Mailing List, linux-stm32, mcoquelin.stm32,
Network Development, pabeni, romain.gantois, wade.shu,
xuewen.yan94, zhiguo.niu, Alexandre Torgue, Murali, Tomer Maimon,
Silva, L Antonio, Arias Pablo, Somarouthu Murali, uri.trichter
Hi all,
We recently encountered the same interrupt storm and the root cause
was the same as here.
The suggested patch solved 99% of the issues, but indeed as written
below on rare cases the issue happens between the dev_open() and
clear_bit(STMMAC_DOWN) calls.
I also agree that stmmac_interrupt() unconditionally ignores
interrupts when the driver is in STMMAC_DOWN state is dangerous.
The issue happened for us in linux 5.10 but I see that this behaviour
wasn't changed also in newer versions.
maybe we should disable the device interrupts before dev_close(), and
enable it after dev_open().
>
> Hi Romain,
>
> On Sun, Mar 31, 2024 at 4:35 PM Romain Gantois
> <romain.gantois@bootlin.com> wrote:
> >
> > Hello Cathy,
> >
> > On Wed, 27 Mar 2024, Cathy Cai wrote:
> >
> > > Tx queue time out then reset adapter. When reset the adapter, stmmac driver
> > > sets the state to STMMAC_DOWN and calls dev_close() function. If an interrupt
> > > is triggered at this instant after setting state to STMMAC_DOWN, before the
> > > dev_close() call.
> > >
> > ...
> > > - set_bit(STMMAC_DOWN, &priv->state);
> > > dev_close(priv->dev);
> > > + set_bit(STMMAC_DOWN, &priv->state);
> > > dev_open(priv->dev, NULL);
> > > clear_bit(STMMAC_DOWN, &priv->state);
> > > clear_bit(STMMAC_RESETING, &priv->state);
> >
> > If this IRQ issue can happen whenever STMMAC_DOWN is set while the net device is
> > open, then it could also happen between the dev_open() and
> > clear_bit(STMMAC_DOWN) calls right? So you'd have to clear STMMAC_DOWN before
> > calling dev_open() but then I don't see the usefulness of setting STMMAC_DOWN
> > and clearing it immediately. Maybe closing and opening the net device should be
> > enough?
Indeed we encounter an issue between the dev_open() and clear_bit(STMMAC_DOWN)..
> >
> Yes. It could also happen between the dev_open() and
> clear_bit(STMMAC_DOWN) calls.
> Although we did not reproduce this scenario, it should have happened
> if we had increased
> the number of test samples. In addition, I found that other people had
> similar problems before.
> The link is:
> https://lore.kernel.org/all/20210208140820.10410-11-Sergey.Semin@baikalelectronics.ru/
>
> >
> > Moreover, it seems strange to me that stmmac_interrupt() unconditionnally
> > ignores interrupts when the driver is in STMMAC_DOWN state. This seems like
> > dangerous behaviour, since it could cause IRQ storm issues whenever something
> > in the driver sets this state. I'm not too familiar with the interrupt handling
> > in this driver, but maybe stmmac_interrupt() could clear interrupts
> > unconditionnally in the STMMAC_DOWN state?
> >
> Clear interrupts unconditionally in the STMMAC_DOWN state directly
> certainly won't cause this problem.
> This may be too rough, maybe this design has other considerations.
>
But then after the dev_open() you might miss interrupt, no?
> >
> > Best Regards,
> >
> > --
> > Romain Gantois, Bootlin
> > Embedded Linux and Kernel engineering
> > https://bootlin.com
>
> Best Regards,
> Cathy
--
Regards,
Avi
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [RFC PATCH] net: stmmac: Fix the problem about interrupt storm
2024-11-03 19:00 ` [RFC PATCH] net: stmmac: Fix the problem about interrupt storm Avi Fishman
@ 2024-11-12 15:12 ` Romain Gantois
0 siblings, 0 replies; 2+ messages in thread
From: Romain Gantois @ 2024-11-12 15:12 UTC (permalink / raw)
To: cathycai0714, Alexandre Torgue, Jose Abreu, Giuseppe Cavallaro,
Avi Fishman
Cc: cathy.cai, cixi.geng1, David Miller, edumazet, kuba, Linux ARM,
Linux Kernel Mailing List, linux-stm32, mcoquelin.stm32,
Network Development, pabeni, wade.shu, xuewen.yan94, zhiguo.niu,
Alexandre Torgue, Murali, Tomer Maimon, Silva, L Antonio,
Arias Pablo, Somarouthu Murali, uri.trichter
Hello,
On dimanche 3 novembre 2024 20:00:54 heure normale d’Europe centrale Avi
Fishman wrote:
> Hi all,
>
...
> > Yes. It could also happen between the dev_open() and
> >
> > clear_bit(STMMAC_DOWN) calls.
> > Although we did not reproduce this scenario, it should have happened
> > if we had increased
> > the number of test samples. In addition, I found that other people had
> > similar problems before.
> > The link is:
> > https://lore.kernel.org/all/20210208140820.10410-11-Sergey.Semin@baikalele
> > ctronics.ru/>
> > > Moreover, it seems strange to me that stmmac_interrupt()
> > > unconditionnally
> > > ignores interrupts when the driver is in STMMAC_DOWN state. This seems
> > > like
> > > dangerous behaviour, since it could cause IRQ storm issues whenever
> > > something in the driver sets this state. I'm not too familiar with the
> > > interrupt handling in this driver, but maybe stmmac_interrupt() could
> > > clear interrupts unconditionnally in the STMMAC_DOWN state?
> >
> > Clear interrupts unconditionally in the STMMAC_DOWN state directly
> > certainly won't cause this problem.
> > This may be too rough, maybe this design has other considerations.
>
> But then after the dev_open() you might miss interrupt, no?
Indeed, but in any case, unconditionally returning from an IRQ handler without
clearing any interrupt flags seems like very strange behavior to me.
Disabling and reenabling interrupts as you suggested does seem like a
good solution for this particular scenario, but it doesn't solve the more
general issue of the dangerous way stmmac_interrupt handles this.
Maybe the setting and clearing of this STMMAC_DOWN bit should
be wrapped in some kind of handler which also disables all interrupts?
Best Regards
--
Romain Gantois, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2024-11-12 15:16 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CAKKbWA7e0TmU4z4O8tHfwE=dvqPFaZbSPjxR-==fQSsNq6ELCQ@mail.gmail.com>
2024-11-03 19:00 ` [RFC PATCH] net: stmmac: Fix the problem about interrupt storm Avi Fishman
2024-11-12 15:12 ` Romain Gantois
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox