netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/2] r8169: Fix GRO-related issue with not disabled device interrupts
@ 2024-05-14  6:49 Heiner Kallweit
  2024-05-14  6:50 ` [PATCH net 1/2] net: add napi_schedule_prep variant with more granular return value Heiner Kallweit
  2024-05-14  6:52 ` [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI Heiner Kallweit
  0 siblings, 2 replies; 19+ messages in thread
From: Heiner Kallweit @ 2024-05-14  6:49 UTC (permalink / raw)
  To: Paolo Abeni, Jakub Kicinski, Eric Dumazet, David Miller,
	Realtek linux nic maintainers
  Cc: netdev@vger.kernel.org, Ken Milmore

Ken reported that RTL8125b can lock up if gro_flush_timeout has the
default value of 20000 and napi_defer_hard_irqs is set to 0.
In this scenario device interrupts aren't disabled, what seems to
trigger some silicon bug under heavy load. I was able to reproduce this
behavior on RTL8168h.
Disabling device interrupts if NAPI is scheduled from a place other than
the driver's interrupt handler is a necessity in r8169, for other
drivers it may still be a performance optimization.
Therefore add a variant of napi_schedule_prep() with a more granular
return value.

Patch was verified to fix the issue for RTL8168h.

Heiner Kallweit (2):
  net: add napi_schedule_prep variant with more granular return value
  r8169: disable interrupts also for GRO-scheduled NAPI

 drivers/net/ethernet/realtek/r8169_main.c |  6 ++++--
 include/linux/netdevice.h                 |  7 ++++++-
 net/core/dev.c                            | 12 ++++++------
 3 files changed, 16 insertions(+), 9 deletions(-)

-- 
2.45.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net 1/2] net: add napi_schedule_prep variant with more granular return value
  2024-05-14  6:49 [PATCH net 0/2] r8169: Fix GRO-related issue with not disabled device interrupts Heiner Kallweit
@ 2024-05-14  6:50 ` Heiner Kallweit
  2024-05-14 11:07   ` Eric Dumazet
  2024-05-14  6:52 ` [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI Heiner Kallweit
  1 sibling, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2024-05-14  6:50 UTC (permalink / raw)
  To: Paolo Abeni, Jakub Kicinski, Eric Dumazet, David Miller,
	Realtek linux nic maintainers
  Cc: netdev@vger.kernel.org, Ken Milmore

For deciding whether to disable device interrupts, drivers may need the
information whether NAPIF_STATE_DISABLE or NAPIF_STATE_SCHED was set.
Therefore add a __napi_schedule_prep() which returns -1 in case
NAPIF_STATE_DISABLE was set.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 include/linux/netdevice.h |  7 ++++++-
 net/core/dev.c            | 12 ++++++------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cf261fb89..d1515cf8c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -498,7 +498,12 @@ static inline bool napi_is_scheduled(struct napi_struct *n)
 	return test_bit(NAPI_STATE_SCHED, &n->state);
 }
 
-bool napi_schedule_prep(struct napi_struct *n);
+int __napi_schedule_prep(struct napi_struct *n);
+
+static inline bool napi_schedule_prep(struct napi_struct *n)
+{
+	return __napi_schedule_prep(n) > 0;
+}
 
 /**
  *	napi_schedule - schedule NAPI poll
diff --git a/net/core/dev.c b/net/core/dev.c
index d2ce91a33..66f55fc50 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6116,21 +6116,21 @@ void __napi_schedule(struct napi_struct *n)
 EXPORT_SYMBOL(__napi_schedule);
 
 /**
- *	napi_schedule_prep - check if napi can be scheduled
+ *	__napi_schedule_prep - check if napi can be scheduled
  *	@n: napi context
  *
  * Test if NAPI routine is already running, and if not mark
  * it as running.  This is used as a condition variable to
- * insure only one NAPI poll instance runs.  We also make
- * sure there is no pending NAPI disable.
+ * insure only one NAPI poll instance runs. Return -1 if
+ * there is a pending NAPI disable.
  */
-bool napi_schedule_prep(struct napi_struct *n)
+int __napi_schedule_prep(struct napi_struct *n)
 {
 	unsigned long new, val = READ_ONCE(n->state);
 
 	do {
 		if (unlikely(val & NAPIF_STATE_DISABLE))
-			return false;
+			return -1;
 		new = val | NAPIF_STATE_SCHED;
 
 		/* Sets STATE_MISSED bit if STATE_SCHED was already set
@@ -6145,7 +6145,7 @@ bool napi_schedule_prep(struct napi_struct *n)
 
 	return !(val & NAPIF_STATE_SCHED);
 }
-EXPORT_SYMBOL(napi_schedule_prep);
+EXPORT_SYMBOL(__napi_schedule_prep);
 
 /**
  * __napi_schedule_irqoff - schedule for receive
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14  6:49 [PATCH net 0/2] r8169: Fix GRO-related issue with not disabled device interrupts Heiner Kallweit
  2024-05-14  6:50 ` [PATCH net 1/2] net: add napi_schedule_prep variant with more granular return value Heiner Kallweit
@ 2024-05-14  6:52 ` Heiner Kallweit
  2024-05-14  9:45   ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2024-05-14  6:52 UTC (permalink / raw)
  To: Paolo Abeni, Jakub Kicinski, Eric Dumazet, David Miller,
	Realtek linux nic maintainers
  Cc: netdev@vger.kernel.org, Ken Milmore

Ken reported that RTL8125b can lock up if gro_flush_timeout has the
default value of 20000 and napi_defer_hard_irqs is set to 0.
In this scenario device interrupts aren't disabled, what seems to
trigger some silicon bug under heavy load. I was able to reproduce this
behavior on RTL8168h.
Disabling device interrupts if NAPI is scheduled from a place other than
the driver's interrupt handler is a necessity in r8169, for other
drivers it may still be a performance optimization.

Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
Reported-by: Ken Milmore <ken.milmore@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index e5ea827a2..01f0ca53d 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4639,6 +4639,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 {
 	struct rtl8169_private *tp = dev_instance;
 	u32 status = rtl_get_events(tp);
+	int ret;
 
 	if ((status & 0xffff) == 0xffff || !(status & tp->irq_mask))
 		return IRQ_NONE;
@@ -4657,10 +4658,11 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
 	}
 
-	if (napi_schedule_prep(&tp->napi)) {
+	ret = __napi_schedule_prep(&tp->napi);
+	if (ret >= 0)
 		rtl_irq_disable(tp);
+	if (ret > 0)
 		__napi_schedule(&tp->napi);
-	}
 out:
 	rtl_ack_events(tp, status);
 
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14  6:52 ` [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI Heiner Kallweit
@ 2024-05-14  9:45   ` Eric Dumazet
  2024-05-14 10:52     ` Alexander Lobakin
  2024-05-15  5:53     ` Heiner Kallweit
  0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2024-05-14  9:45 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Paolo Abeni, Jakub Kicinski, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

On Tue, May 14, 2024 at 8:52 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
> default value of 20000 and napi_defer_hard_irqs is set to 0.
> In this scenario device interrupts aren't disabled, what seems to
> trigger some silicon bug under heavy load. I was able to reproduce this
> behavior on RTL8168h.
> Disabling device interrupts if NAPI is scheduled from a place other than
> the driver's interrupt handler is a necessity in r8169, for other
> drivers it may still be a performance optimization.
>
> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
> Reported-by: Ken Milmore <ken.milmore@gmail.com>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
>  drivers/net/ethernet/realtek/r8169_main.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index e5ea827a2..01f0ca53d 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -4639,6 +4639,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  {
>         struct rtl8169_private *tp = dev_instance;
>         u32 status = rtl_get_events(tp);
> +       int ret;
>
>         if ((status & 0xffff) == 0xffff || !(status & tp->irq_mask))
>                 return IRQ_NONE;
> @@ -4657,10 +4658,11 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>                 rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
>         }
>
> -       if (napi_schedule_prep(&tp->napi)) {
> +       ret = __napi_schedule_prep(&tp->napi);
> +       if (ret >= 0)
>                 rtl_irq_disable(tp);
> +       if (ret > 0)
>                 __napi_schedule(&tp->napi);
> -       }
>  out:
>         rtl_ack_events(tp, status);
>

I do not understand this patch.

__napi_schedule_prep() would only return -1 if NAPIF_STATE_DISABLE was set,
but this should not happen under normal operations ?

A simple revert would avoid adding yet another NAPI helper.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14  9:45   ` Eric Dumazet
@ 2024-05-14 10:52     ` Alexander Lobakin
  2024-05-14 11:05       ` Eric Dumazet
  2024-05-15  5:53     ` Heiner Kallweit
  1 sibling, 1 reply; 19+ messages in thread
From: Alexander Lobakin @ 2024-05-14 10:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Heiner Kallweit, Paolo Abeni, Jakub Kicinski, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 14 May 2024 11:45:05 +0200

> On Tue, May 14, 2024 at 8:52 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
>> default value of 20000 and napi_defer_hard_irqs is set to 0.
>> In this scenario device interrupts aren't disabled, what seems to
>> trigger some silicon bug under heavy load. I was able to reproduce this
>> behavior on RTL8168h.
>> Disabling device interrupts if NAPI is scheduled from a place other than
>> the driver's interrupt handler is a necessity in r8169, for other
>> drivers it may still be a performance optimization.
>>
>> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
>> Reported-by: Ken Milmore <ken.milmore@gmail.com>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/net/ethernet/realtek/r8169_main.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>> index e5ea827a2..01f0ca53d 100644
>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>> @@ -4639,6 +4639,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>>  {
>>         struct rtl8169_private *tp = dev_instance;
>>         u32 status = rtl_get_events(tp);
>> +       int ret;
>>
>>         if ((status & 0xffff) == 0xffff || !(status & tp->irq_mask))
>>                 return IRQ_NONE;
>> @@ -4657,10 +4658,11 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>>                 rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
>>         }
>>
>> -       if (napi_schedule_prep(&tp->napi)) {
>> +       ret = __napi_schedule_prep(&tp->napi);
>> +       if (ret >= 0)
>>                 rtl_irq_disable(tp);
>> +       if (ret > 0)
>>                 __napi_schedule(&tp->napi);
>> -       }
>>  out:
>>         rtl_ack_events(tp, status);
>>
> 
> I do not understand this patch.
> 
> __napi_schedule_prep() would only return -1 if NAPIF_STATE_DISABLE was set,
> but this should not happen under normal operations ?

Without this patch, napi_schedule_prep() returns false if it's either
scheduled already OR it's disabled. Drivers disable interrupts only if
it returns true, which means they don't do that if it's already scheduled.
With this patch, __napi_schedule_prep() returns -1 if it's disabled and
0 if it was already scheduled. Which means we can disable interrupts
when the result is >= 0, i.e. regardless if it was scheduled before the
call or within the call.

IIUC, this addresses such situations:

napi_schedule()		// we disabled interrupts
napi_poll()		// we polled < budget frames
napi_complete_done()	// reenable the interrupts, no repoll
  hrtimer_start()	// GRO flush is queued
    napi_schedule()
      napi_poll()	// GRO flush, BUT interrupts are enabled

On r8169, this seems to cause issues. On other drivers, it seems to be
okay, but with this new helper, you can save some cycles.

Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>

> 
> A simple revert would avoid adding yet another NAPI helper.

Thanks,
Olek

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14 10:52     ` Alexander Lobakin
@ 2024-05-14 11:05       ` Eric Dumazet
  2024-05-14 11:17         ` Alexander Lobakin
                           ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Eric Dumazet @ 2024-05-14 11:05 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Heiner Kallweit, Paolo Abeni, Jakub Kicinski, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

On Tue, May 14, 2024 at 12:53 PM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
> Date: Tue, 14 May 2024 11:45:05 +0200
>
> > On Tue, May 14, 2024 at 8:52 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>
> >> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
> >> default value of 20000 and napi_defer_hard_irqs is set to 0.
> >> In this scenario device interrupts aren't disabled, what seems to
> >> trigger some silicon bug under heavy load. I was able to reproduce this
> >> behavior on RTL8168h.
> >> Disabling device interrupts if NAPI is scheduled from a place other than
> >> the driver's interrupt handler is a necessity in r8169, for other
> >> drivers it may still be a performance optimization.
> >>
> >> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
> >> Reported-by: Ken Milmore <ken.milmore@gmail.com>
> >> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> >> ---
> >>  drivers/net/ethernet/realtek/r8169_main.c | 6 ++++--
> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> >> index e5ea827a2..01f0ca53d 100644
> >> --- a/drivers/net/ethernet/realtek/r8169_main.c
> >> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> >> @@ -4639,6 +4639,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
> >>  {
> >>         struct rtl8169_private *tp = dev_instance;
> >>         u32 status = rtl_get_events(tp);
> >> +       int ret;
> >>
> >>         if ((status & 0xffff) == 0xffff || !(status & tp->irq_mask))
> >>                 return IRQ_NONE;
> >> @@ -4657,10 +4658,11 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
> >>                 rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
> >>         }
> >>
> >> -       if (napi_schedule_prep(&tp->napi)) {
> >> +       ret = __napi_schedule_prep(&tp->napi);
> >> +       if (ret >= 0)
> >>                 rtl_irq_disable(tp);
> >> +       if (ret > 0)
> >>                 __napi_schedule(&tp->napi);
> >> -       }
> >>  out:
> >>         rtl_ack_events(tp, status);
> >>
> >
> > I do not understand this patch.
> >
> > __napi_schedule_prep() would only return -1 if NAPIF_STATE_DISABLE was set,
> > but this should not happen under normal operations ?
>
> Without this patch, napi_schedule_prep() returns false if it's either
> scheduled already OR it's disabled. Drivers disable interrupts only if
> it returns true, which means they don't do that if it's already scheduled.
> With this patch, __napi_schedule_prep() returns -1 if it's disabled and
> 0 if it was already scheduled. Which means we can disable interrupts
> when the result is >= 0, i.e. regardless if it was scheduled before the
> call or within the call.
>
> IIUC, this addresses such situations:
>
> napi_schedule()         // we disabled interrupts
> napi_poll()             // we polled < budget frames
> napi_complete_done()    // reenable the interrupts, no repoll
>   hrtimer_start()       // GRO flush is queued
>     napi_schedule()
>       napi_poll()       // GRO flush, BUT interrupts are enabled
>
> On r8169, this seems to cause issues. On other drivers, it seems to be
> okay, but with this new helper, you can save some cycles.
>
> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>

Rephrasing the changelog is not really helping.

Consider myself as a network maintainer, not as a casual patch reviewer.

"This seems to cause issues" is rather weak.

I would simply revert the faulty commit, because the interrupts are
going to be disabled no matter what.

Old logic was very simple and rock solid. A revert is a clear stable candidate.

rtl_irq_disable(tp);
napi_schedule(&tp->napi);

If this is still broken, we might have similar issues in old/legacy drivers.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 1/2] net: add napi_schedule_prep variant with more granular return value
  2024-05-14  6:50 ` [PATCH net 1/2] net: add napi_schedule_prep variant with more granular return value Heiner Kallweit
@ 2024-05-14 11:07   ` Eric Dumazet
  2024-05-14 11:40     ` Heiner Kallweit
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2024-05-14 11:07 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Paolo Abeni, Jakub Kicinski, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

On Tue, May 14, 2024 at 8:50 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> For deciding whether to disable device interrupts, drivers may need the
> information whether NAPIF_STATE_DISABLE or NAPIF_STATE_SCHED was set.
> Therefore add a __napi_schedule_prep() which returns -1 in case
> NAPIF_STATE_DISABLE was set.
>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---

net-next is closed.

See my feedback on the other patch.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14 11:05       ` Eric Dumazet
@ 2024-05-14 11:17         ` Alexander Lobakin
  2024-05-14 14:27           ` Eric Dumazet
  2024-05-14 11:29         ` Heiner Kallweit
  2024-05-14 14:11         ` Jakub Kicinski
  2 siblings, 1 reply; 19+ messages in thread
From: Alexander Lobakin @ 2024-05-14 11:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Heiner Kallweit, Paolo Abeni, Jakub Kicinski, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 14 May 2024 13:05:55 +0200

> On Tue, May 14, 2024 at 12:53 PM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
>>
>> From: Eric Dumazet <edumazet@google.com>
>> Date: Tue, 14 May 2024 11:45:05 +0200
>>
>>> On Tue, May 14, 2024 at 8:52 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
>>>> default value of 20000 and napi_defer_hard_irqs is set to 0.
>>>> In this scenario device interrupts aren't disabled, what seems to
>>>> trigger some silicon bug under heavy load. I was able to reproduce this
>>>> behavior on RTL8168h.
>>>> Disabling device interrupts if NAPI is scheduled from a place other than
>>>> the driver's interrupt handler is a necessity in r8169, for other
>>>> drivers it may still be a performance optimization.
>>>>
>>>> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
>>>> Reported-by: Ken Milmore <ken.milmore@gmail.com>
>>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>>> ---
>>>>  drivers/net/ethernet/realtek/r8169_main.c | 6 ++++--
>>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>>>> index e5ea827a2..01f0ca53d 100644
>>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>>> @@ -4639,6 +4639,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>>>>  {
>>>>         struct rtl8169_private *tp = dev_instance;
>>>>         u32 status = rtl_get_events(tp);
>>>> +       int ret;
>>>>
>>>>         if ((status & 0xffff) == 0xffff || !(status & tp->irq_mask))
>>>>                 return IRQ_NONE;
>>>> @@ -4657,10 +4658,11 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>>>>                 rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
>>>>         }
>>>>
>>>> -       if (napi_schedule_prep(&tp->napi)) {
>>>> +       ret = __napi_schedule_prep(&tp->napi);
>>>> +       if (ret >= 0)
>>>>                 rtl_irq_disable(tp);
>>>> +       if (ret > 0)
>>>>                 __napi_schedule(&tp->napi);
>>>> -       }
>>>>  out:
>>>>         rtl_ack_events(tp, status);
>>>>
>>>
>>> I do not understand this patch.
>>>
>>> __napi_schedule_prep() would only return -1 if NAPIF_STATE_DISABLE was set,
>>> but this should not happen under normal operations ?
>>
>> Without this patch, napi_schedule_prep() returns false if it's either
>> scheduled already OR it's disabled. Drivers disable interrupts only if
>> it returns true, which means they don't do that if it's already scheduled.
>> With this patch, __napi_schedule_prep() returns -1 if it's disabled and
>> 0 if it was already scheduled. Which means we can disable interrupts
>> when the result is >= 0, i.e. regardless if it was scheduled before the
>> call or within the call.
>>
>> IIUC, this addresses such situations:
>>
>> napi_schedule()         // we disabled interrupts
>> napi_poll()             // we polled < budget frames
>> napi_complete_done()    // reenable the interrupts, no repoll
>>   hrtimer_start()       // GRO flush is queued
>>     napi_schedule()
>>       napi_poll()       // GRO flush, BUT interrupts are enabled
>>
>> On r8169, this seems to cause issues. On other drivers, it seems to be
>> okay, but with this new helper, you can save some cycles.
>>
>> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> 
> Rephrasing the changelog is not really helping.
> 
> Consider myself as a network maintainer, not as a casual patch reviewer.

And?

> 
> "This seems to cause issues" is rather weak.

It has "Reported-by", so it really causes issues.

> 
> I would simply revert the faulty commit, because the interrupts are
> going to be disabled no matter what.
> 
> Old logic was very simple and rock solid. A revert is a clear stable candidate.
> 
> rtl_irq_disable(tp);
> napi_schedule(&tp->napi);
> 
> If this is still broken, we might have similar issues in old/legacy drivers.

I might agree that we could just revert the mentioned commit for stable,
but for the next net-next, avoid unnecessary
scheduling/enabling/disabling interrupts makes sense, not only for
"old/legacy" drivers.
"Very simple and rock solid" is not an argument for avoiding improvements.

Thanks,
Olek

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14 11:05       ` Eric Dumazet
  2024-05-14 11:17         ` Alexander Lobakin
@ 2024-05-14 11:29         ` Heiner Kallweit
  2024-05-14 14:11         ` Jakub Kicinski
  2 siblings, 0 replies; 19+ messages in thread
From: Heiner Kallweit @ 2024-05-14 11:29 UTC (permalink / raw)
  To: Eric Dumazet, Alexander Lobakin
  Cc: Paolo Abeni, Jakub Kicinski, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

On 14.05.2024 13:05, Eric Dumazet wrote:
> On Tue, May 14, 2024 at 12:53 PM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
>>
>> From: Eric Dumazet <edumazet@google.com>
>> Date: Tue, 14 May 2024 11:45:05 +0200
>>
>>> On Tue, May 14, 2024 at 8:52 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
>>>> default value of 20000 and napi_defer_hard_irqs is set to 0.
>>>> In this scenario device interrupts aren't disabled, what seems to
>>>> trigger some silicon bug under heavy load. I was able to reproduce this
>>>> behavior on RTL8168h.
>>>> Disabling device interrupts if NAPI is scheduled from a place other than
>>>> the driver's interrupt handler is a necessity in r8169, for other
>>>> drivers it may still be a performance optimization.
>>>>
>>>> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
>>>> Reported-by: Ken Milmore <ken.milmore@gmail.com>
>>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>>> ---
>>>>  drivers/net/ethernet/realtek/r8169_main.c | 6 ++++--
>>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>>>> index e5ea827a2..01f0ca53d 100644
>>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>>> @@ -4639,6 +4639,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>>>>  {
>>>>         struct rtl8169_private *tp = dev_instance;
>>>>         u32 status = rtl_get_events(tp);
>>>> +       int ret;
>>>>
>>>>         if ((status & 0xffff) == 0xffff || !(status & tp->irq_mask))
>>>>                 return IRQ_NONE;
>>>> @@ -4657,10 +4658,11 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>>>>                 rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
>>>>         }
>>>>
>>>> -       if (napi_schedule_prep(&tp->napi)) {
>>>> +       ret = __napi_schedule_prep(&tp->napi);
>>>> +       if (ret >= 0)
>>>>                 rtl_irq_disable(tp);
>>>> +       if (ret > 0)
>>>>                 __napi_schedule(&tp->napi);
>>>> -       }
>>>>  out:
>>>>         rtl_ack_events(tp, status);
>>>>
>>>
>>> I do not understand this patch.
>>>
>>> __napi_schedule_prep() would only return -1 if NAPIF_STATE_DISABLE was set,
>>> but this should not happen under normal operations ?
>>
>> Without this patch, napi_schedule_prep() returns false if it's either
>> scheduled already OR it's disabled. Drivers disable interrupts only if
>> it returns true, which means they don't do that if it's already scheduled.
>> With this patch, __napi_schedule_prep() returns -1 if it's disabled and
>> 0 if it was already scheduled. Which means we can disable interrupts
>> when the result is >= 0, i.e. regardless if it was scheduled before the
>> call or within the call.
>>
>> IIUC, this addresses such situations:
>>
>> napi_schedule()         // we disabled interrupts
>> napi_poll()             // we polled < budget frames
>> napi_complete_done()    // reenable the interrupts, no repoll
>>   hrtimer_start()       // GRO flush is queued
>>     napi_schedule()
>>       napi_poll()       // GRO flush, BUT interrupts are enabled
>>
>> On r8169, this seems to cause issues. On other drivers, it seems to be
>> okay, but with this new helper, you can save some cycles.
>>
>> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> 
> Rephrasing the changelog is not really helping.
> 
> Consider myself as a network maintainer, not as a casual patch reviewer.
> 
> "This seems to cause issues" is rather weak.
> 
> I would simply revert the faulty commit, because the interrupts are
> going to be disabled no matter what.
> 
> Old logic was very simple and rock solid. A revert is a clear stable candidate.
> 
> rtl_irq_disable(tp);
> napi_schedule(&tp->napi);
> 
Proposed new solution differs from the revert wrt how NAPIF_STATE_DISABLE
is handled. I'm not sure the old code would handle this corner case correctly.
It would disable device interrupts, and following napi_schedule() is a no-op.
Therefore device interrupts would remain disabled, and I think that's not
what we want.
When trying to solve this by adding an extra call to e.g. napi_disable_pending(),
then I have concerns this might be racy.

> If this is still broken, we might have similar issues in old/legacy drivers.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 1/2] net: add napi_schedule_prep variant with more granular return value
  2024-05-14 11:07   ` Eric Dumazet
@ 2024-05-14 11:40     ` Heiner Kallweit
  0 siblings, 0 replies; 19+ messages in thread
From: Heiner Kallweit @ 2024-05-14 11:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paolo Abeni, Jakub Kicinski, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

On 14.05.2024 13:07, Eric Dumazet wrote:
> On Tue, May 14, 2024 at 8:50 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> For deciding whether to disable device interrupts, drivers may need the
>> information whether NAPIF_STATE_DISABLE or NAPIF_STATE_SCHED was set.
>> Therefore add a __napi_schedule_prep() which returns -1 in case
>> NAPIF_STATE_DISABLE was set.
>>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
> 
> net-next is closed.
> 
This patch is a prerequisite for the other one. Therefore I don't see it
as net-next material. If it would be a standalone patch, I'd agree.

> See my feedback on the other patch.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14 11:05       ` Eric Dumazet
  2024-05-14 11:17         ` Alexander Lobakin
  2024-05-14 11:29         ` Heiner Kallweit
@ 2024-05-14 14:11         ` Jakub Kicinski
  2024-05-14 16:35           ` Heiner Kallweit
  2 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2024-05-14 14:11 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Eric Dumazet, Heiner Kallweit, Paolo Abeni, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

On Tue, 14 May 2024 13:05:55 +0200 Eric Dumazet wrote:
> > napi_schedule()         // we disabled interrupts
> > napi_poll()             // we polled < budget frames
> > napi_complete_done()    // reenable the interrupts, no repoll
> >   hrtimer_start()       // GRO flush is queued
> >     napi_schedule()
> >       napi_poll()       // GRO flush, BUT interrupts are enabled

I thought the bug is because of a race with disable.
But there's already a synchronize_net() after disable, so NAPI poll
must fully exit before we mask in rtl8169_cleanup().

If the bug is double-enable you describe the fix is just making 
the race window smaller. But I don't think that's the bug.

BTW why are events only acked in rtl8169_interrupt() and not
rtl8169_poll()? 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14 11:17         ` Alexander Lobakin
@ 2024-05-14 14:27           ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2024-05-14 14:27 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Heiner Kallweit, Paolo Abeni, Jakub Kicinski, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

On Tue, May 14, 2024 at 1:18 PM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
> Date: Tue, 14 May 2024 13:05:55 +0200
>
> > On Tue, May 14, 2024 at 12:53 PM Alexander Lobakin
> > <aleksander.lobakin@intel.com> wrote:
> >>
> >> From: Eric Dumazet <edumazet@google.com>
> >> Date: Tue, 14 May 2024 11:45:05 +0200
> >>
> >>> On Tue, May 14, 2024 at 8:52 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>
> >>>> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
> >>>> default value of 20000 and napi_defer_hard_irqs is set to 0.
> >>>> In this scenario device interrupts aren't disabled, what seems to
> >>>> trigger some silicon bug under heavy load. I was able to reproduce this
> >>>> behavior on RTL8168h.
> >>>> Disabling device interrupts if NAPI is scheduled from a place other than
> >>>> the driver's interrupt handler is a necessity in r8169, for other
> >>>> drivers it may still be a performance optimization.
> >>>>
> >>>> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
> >>>> Reported-by: Ken Milmore <ken.milmore@gmail.com>
> >>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> >>>> ---
> >>>>  drivers/net/ethernet/realtek/r8169_main.c | 6 ++++--
> >>>>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> >>>> index e5ea827a2..01f0ca53d 100644
> >>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
> >>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> >>>> @@ -4639,6 +4639,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
> >>>>  {
> >>>>         struct rtl8169_private *tp = dev_instance;
> >>>>         u32 status = rtl_get_events(tp);
> >>>> +       int ret;
> >>>>
> >>>>         if ((status & 0xffff) == 0xffff || !(status & tp->irq_mask))
> >>>>                 return IRQ_NONE;
> >>>> @@ -4657,10 +4658,11 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
> >>>>                 rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
> >>>>         }
> >>>>
> >>>> -       if (napi_schedule_prep(&tp->napi)) {
> >>>> +       ret = __napi_schedule_prep(&tp->napi);
> >>>> +       if (ret >= 0)
> >>>>                 rtl_irq_disable(tp);
> >>>> +       if (ret > 0)
> >>>>                 __napi_schedule(&tp->napi);
> >>>> -       }
> >>>>  out:
> >>>>         rtl_ack_events(tp, status);
> >>>>
> >>>
> >>> I do not understand this patch.
> >>>
> >>> __napi_schedule_prep() would only return -1 if NAPIF_STATE_DISABLE was set,
> >>> but this should not happen under normal operations ?
> >>
> >> Without this patch, napi_schedule_prep() returns false if it's either
> >> scheduled already OR it's disabled. Drivers disable interrupts only if
> >> it returns true, which means they don't do that if it's already scheduled.
> >> With this patch, __napi_schedule_prep() returns -1 if it's disabled and
> >> 0 if it was already scheduled. Which means we can disable interrupts
> >> when the result is >= 0, i.e. regardless if it was scheduled before the
> >> call or within the call.
> >>
> >> IIUC, this addresses such situations:
> >>
> >> napi_schedule()         // we disabled interrupts
> >> napi_poll()             // we polled < budget frames
> >> napi_complete_done()    // reenable the interrupts, no repoll
> >>   hrtimer_start()       // GRO flush is queued
> >>     napi_schedule()
> >>       napi_poll()       // GRO flush, BUT interrupts are enabled
> >>
> >> On r8169, this seems to cause issues. On other drivers, it seems to be
> >> okay, but with this new helper, you can save some cycles.
> >>
> >> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> >
> > Rephrasing the changelog is not really helping.
> >
> > Consider myself as a network maintainer, not as a casual patch reviewer.
>
> And?
>
> >
> > "This seems to cause issues" is rather weak.
>
> It has "Reported-by", so it really causes issues.

And ?

Revert ?

>
> >
> > I would simply revert the faulty commit, because the interrupts are
> > going to be disabled no matter what.
> >
> > Old logic was very simple and rock solid. A revert is a clear stable candidate.
> >
> > rtl_irq_disable(tp);
> > napi_schedule(&tp->napi);
> >
> > If this is still broken, we might have similar issues in old/legacy drivers.
>
> I might agree that we could just revert the mentioned commit for stable,
> but for the next net-next, avoid unnecessary
> scheduling/enabling/disabling interrupts makes sense, not only for
> "old/legacy" drivers.
> "Very simple and rock solid" is not an argument for avoiding improvements.

I explained that I failed to see the 'so called' improvement there.

You explained nothing really, just that you like some approach that I
think is for net-next.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14 14:11         ` Jakub Kicinski
@ 2024-05-14 16:35           ` Heiner Kallweit
  2024-05-14 16:49             ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2024-05-14 16:35 UTC (permalink / raw)
  To: Jakub Kicinski, Alexander Lobakin
  Cc: Eric Dumazet, Paolo Abeni, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

On 14.05.2024 16:11, Jakub Kicinski wrote:
> On Tue, 14 May 2024 13:05:55 +0200 Eric Dumazet wrote:
>>> napi_schedule()         // we disabled interrupts
>>> napi_poll()             // we polled < budget frames
>>> napi_complete_done()    // reenable the interrupts, no repoll
>>>   hrtimer_start()       // GRO flush is queued
>>>     napi_schedule()
>>>       napi_poll()       // GRO flush, BUT interrupts are enabled
> 
> I thought the bug is because of a race with disable.

No, the second napi_poll() in this scenario is executed with device
interrupts enabled, what triggers a (supposedly) hw bug under heavy
load. So the fix is to disable device interrupts also in the case
that NAPI is already scheduled when entering the interrupt handler.

> But there's already a synchronize_net() after disable, so NAPI poll
> must fully exit before we mask in rtl8169_cleanup().
> 
> If the bug is double-enable you describe the fix is just making 
> the race window smaller. But I don't think that's the bug.
> 
> BTW why are events only acked in rtl8169_interrupt() and not
> rtl8169_poll()? 

You mean clearing the rx/tx-related interrupt status bits only
after napi_complete_done(), as an alternative to disabling
device interrupts?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14 16:35           ` Heiner Kallweit
@ 2024-05-14 16:49             ` Jakub Kicinski
  2024-05-14 17:09               ` Heiner Kallweit
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2024-05-14 16:49 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Alexander Lobakin, Eric Dumazet, Paolo Abeni, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

On Tue, 14 May 2024 18:35:46 +0200 Heiner Kallweit wrote:
> > I thought the bug is because of a race with disable.  
> 
> No, the second napi_poll() in this scenario is executed with device
> interrupts enabled, what triggers a (supposedly) hw bug under heavy
> load. So the fix is to disable device interrupts also in the case
> that NAPI is already scheduled when entering the interrupt handler.
> 
> > But there's already a synchronize_net() after disable, so NAPI poll
> > must fully exit before we mask in rtl8169_cleanup().
> > 
> > If the bug is double-enable you describe the fix is just making 
> > the race window smaller. But I don't think that's the bug.
> > 
> > BTW why are events only acked in rtl8169_interrupt() and not
> > rtl8169_poll()?   
> 
> You mean clearing the rx/tx-related interrupt status bits only
> after napi_complete_done(), as an alternative to disabling
> device interrupts?

Before, basically ack them at the start of a poll function.
If gro_timeout / IRQ suppression is not enabled it won't make 
much of a difference. Probably also won't make much difference
with iperf.

But normally traffic is bursty so with gro_timeout we can see 
something like:

    packets: x x  x  x x   <  no more packets  >
IRQ pending: xxx  xxxxxxxxxxxxxxxxxxxxxx
        ISR:    []                      []
    IRQ ack:    x                       x
       NAPI:     [=====] < timeout > [=] [=] < timeout > [=]

Acking at the beginning of NAPI poll can't make us miss events 
but we'd clear the pending IRQ on the "deferred" NAPI run, avoiding 
an extra HW IRQ and 2 NAPI calls:

    packets: x x  x  x x   <  no more packets  >
IRQ pending: xxxx xxxxxxxxxxxxxxxxxxx
        ISR:    []                   
    IRQ ack:     x                   x
       NAPI:     [=====] < timeout > [=]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14 16:49             ` Jakub Kicinski
@ 2024-05-14 17:09               ` Heiner Kallweit
  2024-05-14 17:47                 ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Heiner Kallweit @ 2024-05-14 17:09 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexander Lobakin, Eric Dumazet, Paolo Abeni, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

On 14.05.2024 18:49, Jakub Kicinski wrote:
> On Tue, 14 May 2024 18:35:46 +0200 Heiner Kallweit wrote:
>>> I thought the bug is because of a race with disable.  
>>
>> No, the second napi_poll() in this scenario is executed with device
>> interrupts enabled, what triggers a (supposedly) hw bug under heavy
>> load. So the fix is to disable device interrupts also in the case
>> that NAPI is already scheduled when entering the interrupt handler.
>>
>>> But there's already a synchronize_net() after disable, so NAPI poll
>>> must fully exit before we mask in rtl8169_cleanup().
>>>
>>> If the bug is double-enable you describe the fix is just making 
>>> the race window smaller. But I don't think that's the bug.
>>>
>>> BTW why are events only acked in rtl8169_interrupt() and not
>>> rtl8169_poll()?   
>>
>> You mean clearing the rx/tx-related interrupt status bits only
>> after napi_complete_done(), as an alternative to disabling
>> device interrupts?
> 
> Before, basically ack them at the start of a poll function.
> If gro_timeout / IRQ suppression is not enabled it won't make 
> much of a difference. Probably also won't make much difference
> with iperf.
> 
> But normally traffic is bursty so with gro_timeout we can see 
> something like:
> 
>     packets: x x  x  x x   <  no more packets  >
> IRQ pending: xxx  xxxxxxxxxxxxxxxxxxxxxx
>         ISR:    []                      []
>     IRQ ack:    x                       x
>        NAPI:     [=====] < timeout > [=] [=] < timeout > [=]
> 
> Acking at the beginning of NAPI poll can't make us miss events 
> but we'd clear the pending IRQ on the "deferred" NAPI run, avoiding 
> an extra HW IRQ and 2 NAPI calls:
> 
>     packets: x x  x  x x   <  no more packets  >
> IRQ pending: xxxx xxxxxxxxxxxxxxxxxxx
>         ISR:    []                   
>     IRQ ack:     x                   x
>        NAPI:     [=====] < timeout > [=]

Thanks for the explanation. What is the benefit of acking interrupts
at the beginning of NAPI poll, compared to acking them after
napi_complete_done()?
If budget is exceeded and we know we're polled again, why ack
the interrupts in between?
I just tested with the defaults of gro_flush_timeout=20000 and
napi_defer_hardirqs=1, and iperf3 --bidir.
The difference is massive. When acking after napi_complete_done()
I see only a few hundred interrupts. Acking at the beginning of
NAPI poll it's few hundred thousand interrupts.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14 17:09               ` Heiner Kallweit
@ 2024-05-14 17:47                 ` Jakub Kicinski
  2024-05-14 17:49                   ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2024-05-14 17:47 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Alexander Lobakin, Eric Dumazet, Paolo Abeni, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

On Tue, 14 May 2024 19:09:21 +0200 Heiner Kallweit wrote:
> Thanks for the explanation. What is the benefit of acking interrupts
> at the beginning of NAPI poll, compared to acking them after
> napi_complete_done()?
> If budget is exceeded and we know we're polled again, why ack
> the interrupts in between?

That's a fair point, the main concern for acking after processing
is that we will miss an event. If we ack before processing we can
occasionally take an unnecessary IRQ, but we'll never let a packet
rot on the ring because it arrived between processing packets and
acking the IRQ.
But you know the driver better, maybe there's a clean way of avoiding
the missed IRQs (not sure it would be worth the complexity, tho TBH).

> I just tested with the defaults of gro_flush_timeout=20000 and
> napi_defer_hardirqs=1, and iperf3 --bidir.
> The difference is massive. When acking after napi_complete_done()
> I see only a few hundred interrupts. Acking at the beginning of
> NAPI poll it's few hundred thousand interrupts.

That's quite odd. Maybe because rtl_tx() doesn't contribute to work
done? Maybe it'd be better to set work done to min(budget, !!tx, rx) ?

Or maybe the disabling is not working somehow?

napi_defer_hardirqs=1 should make us reschedule NAPI if there was _any_
work done. Meaning we'd enable NAPI only after a completely empty NAPI
run. On an empty NAPI run it should not matter whether we acked before
or after checking for packets, or so I'd naively think.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14 17:47                 ` Jakub Kicinski
@ 2024-05-14 17:49                   ` Jakub Kicinski
  2024-05-14 20:47                     ` Ken Milmore
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2024-05-14 17:49 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Alexander Lobakin, Eric Dumazet, Paolo Abeni, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

On Tue, 14 May 2024 10:47:39 -0700 Jakub Kicinski wrote:
> enable NAPI 

enable IRQ

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14 17:49                   ` Jakub Kicinski
@ 2024-05-14 20:47                     ` Ken Milmore
  0 siblings, 0 replies; 19+ messages in thread
From: Ken Milmore @ 2024-05-14 20:47 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: Heiner Kallweit, Alexander Lobakin, Eric Dumazet, Paolo Abeni,
	David Miller, Realtek linux nic maintainers, Jakub Kicinski

It seems to me that these chips are known for being badly-documented and quirky,
so maybe an empirical approach is called for.

I have briefly surveyed various driver sources available from the vendor and
from the BSDs, and AFAICT they all follow the pattern of unconditionally
masking interrupts, then clearing the status bits, then processing the rings
and then re-enabling interrupts in that order. In this respect, the Linux
driver may have become an outlier in that it doesn't *always* mask interrupts
before acking them, and that it may process the rings while IRQs are unmasked.
I'm not saying that these are necessarily problems, but...

There are some differences in how the drivers work: the FreeBSD one masks
interrupts straight away but defers writing the status register to the bottom
half, the OpenBSD driver seems to do everything in the IRQ handler. These
drivers also tend to flip between using "hard IRQs" and the built-in timer,
which complicates things. But in terms of the "mask first" approach, I think
they all look equivalent.

https://sources.debian.org/src/r8168/8.053.00-1/src/r8168_n.c/
https://sources.debian.org/src/r8125/9.011.00-4/src/r8125_n.c/
https://github.com/openbsd/src/blob/master/sys/dev/ic/re.c
https://github.com/openbsd/src/blob/master/sys/dev/pci/if_rge.c
https://cgit.freebsd.org/src/tree/sys/dev/re/if_re.c


I also found this ancient netdev thread which looks startlingly familiar to the
behaviour in the present issue. It seems that people have been here before...

https://lore.kernel.org/netdev/1242328457.32579.12.camel@lap75545.ornl.gov/
"I added some code to print the irq status when it hangs, and it shows
0x0085, which is RxOK | TxOK | TxDescUnavail, which makes me think we've
lost an MSI-edge interrupt somehow."

https://lore.kernel.org/netdev/1243042174.3580.23.camel@obelisk.thedillows.org/
"The 8169 chip only generates MSI interrupts when all enabled event
sources are quiescent and one or more sources transition to active. If
not all of the active events are acknowledged, or a new event becomes
active while the existing ones are cleared in the handler, we will not
see a new interrupt."

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI
  2024-05-14  9:45   ` Eric Dumazet
  2024-05-14 10:52     ` Alexander Lobakin
@ 2024-05-15  5:53     ` Heiner Kallweit
  1 sibling, 0 replies; 19+ messages in thread
From: Heiner Kallweit @ 2024-05-15  5:53 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paolo Abeni, Jakub Kicinski, David Miller,
	Realtek linux nic maintainers, netdev@vger.kernel.org,
	Ken Milmore

On 14.05.2024 11:45, Eric Dumazet wrote:
> On Tue, May 14, 2024 at 8:52 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
>> default value of 20000 and napi_defer_hard_irqs is set to 0.
>> In this scenario device interrupts aren't disabled, what seems to
>> trigger some silicon bug under heavy load. I was able to reproduce this
>> behavior on RTL8168h.
>> Disabling device interrupts if NAPI is scheduled from a place other than
>> the driver's interrupt handler is a necessity in r8169, for other
>> drivers it may still be a performance optimization.
>>
>> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
>> Reported-by: Ken Milmore <ken.milmore@gmail.com>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/net/ethernet/realtek/r8169_main.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>> index e5ea827a2..01f0ca53d 100644
>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>> @@ -4639,6 +4639,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>>  {
>>         struct rtl8169_private *tp = dev_instance;
>>         u32 status = rtl_get_events(tp);
>> +       int ret;
>>
>>         if ((status & 0xffff) == 0xffff || !(status & tp->irq_mask))
>>                 return IRQ_NONE;
>> @@ -4657,10 +4658,11 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>>                 rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
>>         }
>>
>> -       if (napi_schedule_prep(&tp->napi)) {
>> +       ret = __napi_schedule_prep(&tp->napi);
>> +       if (ret >= 0)
>>                 rtl_irq_disable(tp);
>> +       if (ret > 0)
>>                 __napi_schedule(&tp->napi);
>> -       }
>>  out:
>>         rtl_ack_events(tp, status);
>>
> 
> I do not understand this patch.
> 
> __napi_schedule_prep() would only return -1 if NAPIF_STATE_DISABLE was set,
> but this should not happen under normal operations ?
> 
> A simple revert would avoid adding yet another NAPI helper.

Fine with me, I'll go with the revert for now.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-05-15  5:53 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-14  6:49 [PATCH net 0/2] r8169: Fix GRO-related issue with not disabled device interrupts Heiner Kallweit
2024-05-14  6:50 ` [PATCH net 1/2] net: add napi_schedule_prep variant with more granular return value Heiner Kallweit
2024-05-14 11:07   ` Eric Dumazet
2024-05-14 11:40     ` Heiner Kallweit
2024-05-14  6:52 ` [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI Heiner Kallweit
2024-05-14  9:45   ` Eric Dumazet
2024-05-14 10:52     ` Alexander Lobakin
2024-05-14 11:05       ` Eric Dumazet
2024-05-14 11:17         ` Alexander Lobakin
2024-05-14 14:27           ` Eric Dumazet
2024-05-14 11:29         ` Heiner Kallweit
2024-05-14 14:11         ` Jakub Kicinski
2024-05-14 16:35           ` Heiner Kallweit
2024-05-14 16:49             ` Jakub Kicinski
2024-05-14 17:09               ` Heiner Kallweit
2024-05-14 17:47                 ` Jakub Kicinski
2024-05-14 17:49                   ` Jakub Kicinski
2024-05-14 20:47                     ` Ken Milmore
2024-05-15  5:53     ` Heiner Kallweit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).