* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
2021-05-12 21:43 ` [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT Sebastian Andrzej Siewior
@ 2021-05-12 22:28 ` Thomas Gleixner
2021-05-13 0:50 ` Steven Rostedt
2021-05-14 18:56 ` Jakub Kicinski
2021-05-13 5:12 ` Juri Lelli
2021-05-13 20:20 ` patchwork-bot+netdevbpf
2 siblings, 2 replies; 12+ messages in thread
From: Thomas Gleixner @ 2021-05-12 22:28 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, netdev
Cc: Juri Lelli, linux-rt-users, Steven Rostedt, LKML, sassmann,
David S. Miller, Jakub Kicinski, stable-rt
On Wed, May 12 2021 at 23:43, Sebastian Andrzej Siewior wrote:
> __napi_schedule_irqoff() is an optimized version of __napi_schedule()
> which can be used where it is known that interrupts are disabled,
> e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
> callbacks.
>
> On PREEMPT_RT enabled kernels this assumptions is not true. Force-
> threaded interrupt handlers and spinlocks are not disabling interrupts
> and the NAPI hrtimer callback is forced into softirq context which runs
> with interrupts enabled as well.
>
> Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole
> game so make __napi_schedule_irqoff() invoke __napi_schedule() for
> PREEMPT_RT kernels.
>
> The callers of ____napi_schedule() in the networking core have been
> audited and are correct on PREEMPT_RT kernels as well.
>
> Reported-by: Juri Lelli <juri.lelli@redhat.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and
> an inline provided which invokes __napi_schedule().
>
> This was not chosen as it creates #ifdeffery all over the place and with
> the proposed solution the code reflects the documentation consistently
> and in one obvious place.
Blame me for that decision.
No matter which variant we end up with, this needs to go into all stable
RT kernels ASAP.
Thanks,
tglx
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
2021-05-12 22:28 ` Thomas Gleixner
@ 2021-05-13 0:50 ` Steven Rostedt
2021-05-13 16:43 ` Steven Rostedt
2021-05-14 18:56 ` Jakub Kicinski
1 sibling, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2021-05-13 0:50 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Sebastian Andrzej Siewior, netdev, Juri Lelli, linux-rt-users,
LKML, sassmann, David S. Miller, Jakub Kicinski, stable-rt
On Thu, 13 May 2021 00:28:02 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:
> No matter which variant we end up with, this needs to go into all stable
> RT kernels ASAP.
Is this in rt-devel already?
I'll start pulling in whatever is in there.
-- Steve
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
2021-05-13 0:50 ` Steven Rostedt
@ 2021-05-13 16:43 ` Steven Rostedt
2021-05-14 12:11 ` Thomas Gleixner
0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2021-05-13 16:43 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Sebastian Andrzej Siewior, netdev, Juri Lelli, linux-rt-users,
LKML, sassmann, David S. Miller, Jakub Kicinski, stable-rt
On Wed, 12 May 2021 20:50:46 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 13 May 2021 00:28:02 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > No matter which variant we end up with, this needs to go into all stable
> > RT kernels ASAP.
>
> Is this in rt-devel already?
>
> I'll start pulling in whatever is in there.
I don't see this in the rt-devel tree. The stable-rt releases always pull
from there (following the stable vs mainline relationship).
Is there going to be a new rt-devel release?
-- Steve
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
2021-05-13 16:43 ` Steven Rostedt
@ 2021-05-14 12:11 ` Thomas Gleixner
0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2021-05-14 12:11 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sebastian Andrzej Siewior, netdev, Juri Lelli, linux-rt-users,
LKML, sassmann, David S. Miller, Jakub Kicinski, stable-rt
On Thu, May 13 2021 at 12:43, Steven Rostedt wrote:
> On Wed, 12 May 2021 20:50:46 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> On Thu, 13 May 2021 00:28:02 +0200
>> Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> > No matter which variant we end up with, this needs to go into all stable
>> > RT kernels ASAP.
>>
>> Is this in rt-devel already?
>>
>> I'll start pulling in whatever is in there.
>
> I don't see this in the rt-devel tree. The stable-rt releases always pull
> from there (following the stable vs mainline relationship).
>
> Is there going to be a new rt-devel release?
Once we have time to work on it.
The patch got applied to net-next, so please pick it up from there.
Thanks,
tglx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
2021-05-12 22:28 ` Thomas Gleixner
2021-05-13 0:50 ` Steven Rostedt
@ 2021-05-14 18:56 ` Jakub Kicinski
2021-05-14 19:44 ` Alison Chaiken
2021-05-14 20:16 ` Thomas Gleixner
1 sibling, 2 replies; 12+ messages in thread
From: Jakub Kicinski @ 2021-05-14 18:56 UTC (permalink / raw)
To: Thomas Gleixner, Sebastian Andrzej Siewior
Cc: netdev, Juri Lelli, linux-rt-users, Steven Rostedt, LKML,
sassmann, David S. Miller, stable-rt
On Thu, 13 May 2021 00:28:02 +0200 Thomas Gleixner wrote:
> On Wed, May 12 2021 at 23:43, Sebastian Andrzej Siewior wrote:
> > __napi_schedule_irqoff() is an optimized version of __napi_schedule()
> > which can be used where it is known that interrupts are disabled,
> > e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
> > callbacks.
> >
> > On PREEMPT_RT enabled kernels this assumptions is not true. Force-
> > threaded interrupt handlers and spinlocks are not disabling interrupts
> > and the NAPI hrtimer callback is forced into softirq context which runs
> > with interrupts enabled as well.
> >
> > Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole
> > game so make __napi_schedule_irqoff() invoke __napi_schedule() for
> > PREEMPT_RT kernels.
> >
> > The callers of ____napi_schedule() in the networking core have been
> > audited and are correct on PREEMPT_RT kernels as well.
> >
> > Reported-by: Juri Lelli <juri.lelli@redhat.com>
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
>
> > ---
> > Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and
> > an inline provided which invokes __napi_schedule().
> >
> > This was not chosen as it creates #ifdeffery all over the place and with
> > the proposed solution the code reflects the documentation consistently
> > and in one obvious place.
>
> Blame me for that decision.
>
> No matter which variant we end up with, this needs to go into all stable
> RT kernels ASAP.
Mumble mumble. I thought we concluded that drivers used on RT can be
fixed, we've already done it for a couple drivers (by which I mean two).
If all the IRQ handler is doing is scheduling NAPI (which it is for
modern NICs) - IRQF_NO_THREAD seems like the right option.
Is there any driver you care about that we can convert to using
IRQF_NO_THREAD so we can have new drivers to "do the right thing"
while the old ones depend on this workaround for now?
Another thing while I have your attention - ____napi_schedule() does
__raise_softirq_irqoff() which AFAIU does not wake the ksoftirq thread.
On non-RT we get occasional NOHZ warnings when drivers schedule napi
from process context, but on RT this is even more of a problem, right?
ksoftirqd won't run until something else actually wakes it up?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
2021-05-14 18:56 ` Jakub Kicinski
@ 2021-05-14 19:44 ` Alison Chaiken
2021-05-14 21:53 ` Thomas Gleixner
2021-05-14 20:16 ` Thomas Gleixner
1 sibling, 1 reply; 12+ messages in thread
From: Alison Chaiken @ 2021-05-14 19:44 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Thomas Gleixner, Sebastian Andrzej Siewior, netdev, Juri Lelli,
linux-rt-users, Steven Rostedt, LKML, sassmann, David S. Miller,
stable-rt
On Fri, May 14, 2021 at 11:56 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 13 May 2021 00:28:02 +0200 Thomas Gleixner wrote:
> > On Wed, May 12 2021 at 23:43, Sebastian Andrzej Siewior wrote:
> > > __napi_schedule_irqoff() is an optimized version of __napi_schedule()
> > > which can be used where it is known that interrupts are disabled,
> > > e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
> > > callbacks.
> > >
> > > On PREEMPT_RT enabled kernels this assumptions is not true. Force-
> > > threaded interrupt handlers and spinlocks are not disabling interrupts
> > > and the NAPI hrtimer callback is forced into softirq context which runs
> > > with interrupts enabled as well.
> > >
> > > Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole
> > > game so make __napi_schedule_irqoff() invoke __napi_schedule() for
> > > PREEMPT_RT kernels.
> > >
> > > The callers of ____napi_schedule() in the networking core have been
> > > audited and are correct on PREEMPT_RT kernels as well.
> > >
> > > Reported-by: Juri Lelli <juri.lelli@redhat.com>
> > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> >
> > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> >
> > > ---
> > > Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and
> > > an inline provided which invokes __napi_schedule().
> > >
> > > This was not chosen as it creates #ifdeffery all over the place and with
> > > the proposed solution the code reflects the documentation consistently
> > > and in one obvious place.
> >
> > Blame me for that decision.
> >
> > No matter which variant we end up with, this needs to go into all stable
> > RT kernels ASAP.
>
> Mumble mumble. I thought we concluded that drivers used on RT can be
> fixed, we've already done it for a couple drivers (by which I mean two).
> If all the IRQ handler is doing is scheduling NAPI (which it is for
> modern NICs) - IRQF_NO_THREAD seems like the right option.
>
> Is there any driver you care about that we can convert to using
> IRQF_NO_THREAD so we can have new drivers to "do the right thing"
> while the old ones depend on this workaround for now?
>
>
> Another thing while I have your attention - ____napi_schedule() does
> __raise_softirq_irqoff() which AFAIU does not wake the ksoftirq thread.
> On non-RT we get occasional NOHZ warnings when drivers schedule napi
> from process context, but on RT this is even more of a problem, right?
> ksoftirqd won't run until something else actually wakes it up?
By "NOHZ warnings," do you mean "NOHZ: local_softirq_pending"? We see
that message about once a week with 4.19. Presumably any failure of
____napi_schedule() to wake ksoftirqd could only cause problems for the
NET_RX softirq, so if the pending softirq is different, the cause lies
elsewhere.
-- Alison Chaiken
Aurora Innovation
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
2021-05-14 19:44 ` Alison Chaiken
@ 2021-05-14 21:53 ` Thomas Gleixner
0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2021-05-14 21:53 UTC (permalink / raw)
To: Alison Chaiken, Jakub Kicinski
Cc: Sebastian Andrzej Siewior, netdev, Juri Lelli, linux-rt-users,
Steven Rostedt, LKML, sassmann, David S. Miller, stable-rt
On Fri, May 14 2021 at 12:44, Alison Chaiken wrote:
> On Fri, May 14, 2021 at 11:56 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> Another thing while I have your attention - ____napi_schedule() does
>> __raise_softirq_irqoff() which AFAIU does not wake the ksoftirq thread.
>> On non-RT we get occasional NOHZ warnings when drivers schedule napi
>> from process context, but on RT this is even more of a problem, right?
>> ksoftirqd won't run until something else actually wakes it up?
>
> By "NOHZ warnings," do you mean "NOHZ: local_softirq_pending"? We see
> that message about once a week with 4.19. Presumably any failure of
> ____napi_schedule() to wake ksoftirqd could only cause problems for the
> NET_RX softirq, so if the pending softirq is different, the cause lies
> elsewhere.
If you read the above carefully you might notice that this _IS_ about
____napi_schedule() being invoked from task context which raises NET_RX
and then results in pending 08! which is NET_RX.
Thanks,
tglx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
2021-05-14 18:56 ` Jakub Kicinski
2021-05-14 19:44 ` Alison Chaiken
@ 2021-05-14 20:16 ` Thomas Gleixner
2021-05-14 20:38 ` Jakub Kicinski
1 sibling, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2021-05-14 20:16 UTC (permalink / raw)
To: Jakub Kicinski, Sebastian Andrzej Siewior
Cc: netdev, Juri Lelli, linux-rt-users, Steven Rostedt, LKML,
sassmann, David S. Miller, stable-rt
On Fri, May 14 2021 at 11:56, Jakub Kicinski wrote:
> On Thu, 13 May 2021 00:28:02 +0200 Thomas Gleixner wrote:
>> > ---
>> > Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and
>> > an inline provided which invokes __napi_schedule().
>> >
>> > This was not chosen as it creates #ifdeffery all over the place and with
>> > the proposed solution the code reflects the documentation consistently
>> > and in one obvious place.
>>
>> Blame me for that decision.
>>
>> No matter which variant we end up with, this needs to go into all stable
>> RT kernels ASAP.
>
> Mumble mumble. I thought we concluded that drivers used on RT can be
> fixed, we've already done it for a couple drivers (by which I mean two).
> If all the IRQ handler is doing is scheduling NAPI (which it is for
> modern NICs) - IRQF_NO_THREAD seems like the right option.
Yes. That works, but there are a bunch which do more than that IIRC.
> Is there any driver you care about that we can convert to using
> IRQF_NO_THREAD so we can have new drivers to "do the right thing"
> while the old ones depend on this workaround for now?
The start of this thread was about i40e_msix_clean_rings() which
probably falls under the IRQF_NO_THREAD category, but I'm sure that
there are others. So I chose the safe way for RT for now.
> Another thing while I have your attention - ____napi_schedule() does
> __raise_softirq_irqoff() which AFAIU does not wake the ksoftirq thread.
> On non-RT we get occasional NOHZ warnings when drivers schedule napi
> from process context, but on RT this is even more of a problem, right?
> ksoftirqd won't run until something else actually wakes it up?
Correct. I sent a patch for the r8152 usb network driver today which
suffers from that problem. :)
As I said there, we want a (debug/lockdep) check in __napi_schedule()
whether soft interrupts are disabled, but let me have a look whether
that check might make more sense directly in __raise_softirq_irqoff().
Thanks,
tglx
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
2021-05-14 20:16 ` Thomas Gleixner
@ 2021-05-14 20:38 ` Jakub Kicinski
0 siblings, 0 replies; 12+ messages in thread
From: Jakub Kicinski @ 2021-05-14 20:38 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Sebastian Andrzej Siewior, netdev, Juri Lelli, linux-rt-users,
Steven Rostedt, LKML, sassmann, David S. Miller, stable-rt
On Fri, 14 May 2021 22:16:10 +0200 Thomas Gleixner wrote:
> On Fri, May 14 2021 at 11:56, Jakub Kicinski wrote:
> > On Thu, 13 May 2021 00:28:02 +0200 Thomas Gleixner wrote:
> >> Blame me for that decision.
> >>
> >> No matter which variant we end up with, this needs to go into all stable
> >> RT kernels ASAP.
> >
> > Mumble mumble. I thought we concluded that drivers used on RT can be
> > fixed, we've already done it for a couple drivers (by which I mean two).
> > If all the IRQ handler is doing is scheduling NAPI (which it is for
> > modern NICs) - IRQF_NO_THREAD seems like the right option.
>
> Yes. That works, but there are a bunch which do more than that IIRC.
>
> > Is there any driver you care about that we can convert to using
> > IRQF_NO_THREAD so we can have new drivers to "do the right thing"
> > while the old ones depend on this workaround for now?
>
> The start of this thread was about i40e_msix_clean_rings() which
> probably falls under the IRQF_NO_THREAD category, but I'm sure that
> there are others. So I chose the safe way for RT for now.
Sounds reasonable. I'll send a patch with a new helper and convert
an example driver I'm sure falls into the "napi_schedule(); return;"
category. I just want to make sure "the right thing to do" is
accessible for people writing new drivers.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
2021-05-12 21:43 ` [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT Sebastian Andrzej Siewior
2021-05-12 22:28 ` Thomas Gleixner
@ 2021-05-13 5:12 ` Juri Lelli
2021-05-13 20:20 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 12+ messages in thread
From: Juri Lelli @ 2021-05-13 5:12 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: netdev, Thomas Gleixner, linux-rt-users, Steven Rostedt, LKML,
sassmann, David S. Miller, Jakub Kicinski
Hi,
On 12/05/21 23:43, Sebastian Andrzej Siewior wrote:
> __napi_schedule_irqoff() is an optimized version of __napi_schedule()
> which can be used where it is known that interrupts are disabled,
> e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
> callbacks.
>
> On PREEMPT_RT enabled kernels this assumptions is not true. Force-
> threaded interrupt handlers and spinlocks are not disabling interrupts
> and the NAPI hrtimer callback is forced into softirq context which runs
> with interrupts enabled as well.
>
> Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole
> game so make __napi_schedule_irqoff() invoke __napi_schedule() for
> PREEMPT_RT kernels.
>
> The callers of ____napi_schedule() in the networking core have been
> audited and are correct on PREEMPT_RT kernels as well.
>
> Reported-by: Juri Lelli <juri.lelli@redhat.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and
> an inline provided which invokes __napi_schedule().
>
> This was not chosen as it creates #ifdeffery all over the place and with
> the proposed solution the code reflects the documentation consistently
> and in one obvious place.
>
> net/core/dev.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 222b1d322c969..febb23708184e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6501,11 +6501,18 @@ EXPORT_SYMBOL(napi_schedule_prep);
> * __napi_schedule_irqoff - schedule for receive
> * @n: entry to schedule
> *
> - * Variant of __napi_schedule() assuming hard irqs are masked
> + * Variant of __napi_schedule() assuming hard irqs are masked.
> + *
> + * On PREEMPT_RT enabled kernels this maps to __napi_schedule()
> + * because the interrupt disabled assumption might not be true
> + * due to force-threaded interrupts and spinlock substitution.
> */
> void __napi_schedule_irqoff(struct napi_struct *n)
> {
> - ____napi_schedule(this_cpu_ptr(&softnet_data), n);
> + if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> + ____napi_schedule(this_cpu_ptr(&softnet_data), n);
> + else
> + __napi_schedule(n);
> }
> EXPORT_SYMBOL(__napi_schedule_irqoff);
Thanks for the patch!
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Best,
Juri
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
2021-05-12 21:43 ` [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT Sebastian Andrzej Siewior
2021-05-12 22:28 ` Thomas Gleixner
2021-05-13 5:12 ` Juri Lelli
@ 2021-05-13 20:20 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-05-13 20:20 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: netdev, juri.lelli, tglx, linux-rt-users, rostedt, linux-kernel,
sassmann, davem, kuba
Hello:
This patch was applied to netdev/net-next.git (refs/heads/master):
On Wed, 12 May 2021 23:43:24 +0200 you wrote:
> __napi_schedule_irqoff() is an optimized version of __napi_schedule()
> which can be used where it is known that interrupts are disabled,
> e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
> callbacks.
>
> On PREEMPT_RT enabled kernels this assumptions is not true. Force-
> threaded interrupt handlers and spinlocks are not disabling interrupts
> and the NAPI hrtimer callback is forced into softirq context which runs
> with interrupts enabled as well.
>
> [...]
Here is the summary with links:
- [net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
https://git.kernel.org/netdev/net-next/c/8380c81d5c4f
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 12+ messages in thread