* pegged softirq and NAPI race (?)
From: Song Liu @ 2018-09-18 8:41 UTC (permalink / raw)
To: Networking
Cc: edumazet@google.com, jeffrey.t.kirsher@intel.com,
alexander.h.duyck@intel.com, Michael Chan, Kernel Team
We are debugging an issue where netconsole messages trigger a pegged
softirq (ksoftirqd taking 100% of a CPU for many seconds). We found this
issue in production with both bnxt and ixgbe, on a 4.11-based kernel. It
is easily reproducible with ixgbe on 4.11 and on the latest net/net-next
(see [1] for more detail).
After debugging for some time, we found that this issue is likely related
to 39e6c8208d7b ("net: solve a NAPI race"). After reverting this commit,
the steps described in [1] no longer reproduce the issue on ixgbe.
Reverting it also reduces how often we hit the issue with bnxt (it still
happens, at a lower rate).
I tried to fix this issue with a relaxed variant (or older version) of
napi_schedule_prep() in netpoll, just like the one in napi_watchdog().
However, my tests do not always go as expected.

Please share your comments/suggestions on which direction we should try
in order to fix this.
Thanks in advance!
Song
[1] https://www.spinics.net/lists/netdev/msg522328.html

* Re: pegged softirq and NAPI race (?)
From: Eric Dumazet @ 2018-09-18 13:45 UTC (permalink / raw)
  To: songliubraving
  Cc: netdev, Jeff Kirsher, Alexander Duyck, michael.chan, kernel-team

On Tue, Sep 18, 2018 at 1:41 AM Song Liu <songliubraving@fb.com> wrote:
>
> We are debugging an issue where netconsole messages trigger a pegged
> softirq (ksoftirqd taking 100% of a CPU for many seconds). We found
> this issue in production with both bnxt and ixgbe, on a 4.11-based
> kernel. It is easily reproducible with ixgbe on 4.11 and on the latest
> net/net-next (see [1] for more detail).
> [...]
> [1] https://www.spinics.net/lists/netdev/msg522328.html

You have not traced ixgbe to understand why the driver hits
"clean_complete=false" all the time?

* Re: pegged softirq and NAPI race (?)
From: Song Liu @ 2018-09-18 16:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Jeff Kirsher, Alexander Duyck, michael.chan@broadcom.com,
	Kernel Team

> On Sep 18, 2018, at 6:45 AM, Eric Dumazet <edumazet@google.com> wrote:
>
> You have not traced ixgbe to understand why the driver hits
> "clean_complete=false" all the time?

The trace showed that we got "clean_complete=false" because
ixgbe_clean_rx_irq() used the entire budget (64). It feels like the
driver is tricked into processing old data on the rx_ring one more time.

Have you seen a similar issue?

Thanks,
Song
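
For context, the contract between net_rx_action() and a driver's poll
function is what produces this symptom: a poll that consumes its entire
weight stays scheduled and is polled again, so a driver that always
reports 64/64 keeps the softirq pegged. A condensed paraphrase of that
loop (not the verbatim kernel source, which also handles time limits,
GRO flushing and repoll bookkeeping):

static void net_rx_action_sketch(struct softnet_data *sd)
{
	int budget = 300;	/* netdev_budget: packets per softirq run */

	while (!list_empty(&sd->poll_list)) {
		struct napi_struct *n;
		int work, weight;

		n = list_first_entry(&sd->poll_list,
				     struct napi_struct, poll_list);
		list_del_init(&n->poll_list);

		weight = n->weight;	/* per-call cap, typically 64 */
		work = n->poll(n, weight);

		/* A poll that used its whole weight is assumed to have
		 * more work: it stays scheduled and goes back on the
		 * list.  Only a poll that returns less than its weight
		 * calls napi_complete_done() and re-enables the IRQ.
		 */
		if (work == weight)
			list_add_tail(&n->poll_list, &sd->poll_list);

		budget -= work;
		if (budget <= 0) {
			/* defer the rest to ksoftirqd */
			__raise_softirq_irqoff(NET_RX_SOFTIRQ);
			break;
		}
	}
}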

* Re: pegged softirq and NAPI race (?)
From: Rik van Riel @ 2018-09-18 16:31 UTC (permalink / raw)
  To: Song Liu, Eric Dumazet
  Cc: netdev, Jeff Kirsher, Alexander Duyck, michael.chan@broadcom.com,
	Kernel Team

On Tue, 2018-09-18 at 12:19 -0400, Song Liu wrote:
> The trace showed that we got "clean_complete=false" because
> ixgbe_clean_rx_irq() used the entire budget (64). It feels like the
> driver is tricked into processing old data on the rx_ring one more
> time.

A quick reading of the code suggests that means polling cannot keep up
with the rate of incoming packets. That should not be a surprise, given
that polling appears to happen on just one CPU, while interrupt-driven
packet delivery was fanned out across a larger number of CPUs.

Does the NAPI code have any way to periodically force a return to IRQ
mode, since multiple CPUs in IRQ mode can keep up with packets better
than a single CPU in polling mode?

Alternatively, is NAPI with multi-queue network adapters supposed to be
polling on multiple CPUs, but simply failing to do so in this case?

* Re: pegged softirq and NAPI race (?)
From: Eric Dumazet @ 2018-09-18 16:33 UTC (permalink / raw)
  To: songliubraving
  Cc: netdev, Jeff Kirsher, Alexander Duyck, michael.chan, kernel-team

On Tue, Sep 18, 2018 at 9:19 AM Song Liu <songliubraving@fb.com> wrote:
>
> The trace showed that we got "clean_complete=false" because
> ixgbe_clean_rx_irq() used the entire budget (64). It feels like the
> driver is tricked into processing old data on the rx_ring one more
> time.

Process old data ???? That would be quite a horrible bug!

Probably ASAN would help here, detecting use-after-free or things like
that.

* Re: pegged softirq and NAPI race (?)
From: Song Liu @ 2018-09-18 16:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Jeff Kirsher, Alexander Duyck, michael.chan@broadcom.com,
	Kernel Team

> On Sep 18, 2018, at 9:33 AM, Eric Dumazet <edumazet@google.com> wrote:
>
> Probably ASAN would help here, detecting use-after-free or things like
> that.

I have tried KASAN; unfortunately, it doesn't yield any useful data.

I think it is not a use-after-free of some skb. It is more like a bug in
the handling of the rx/tx ring.

Thanks,
Song

* Re: pegged softirq and NAPI race (?)
From: Alexei Starovoitov @ 2018-09-18 17:51 UTC (permalink / raw)
  To: Eric Dumazet, Song Liu
  Cc: netdev, Jeff Kirsher, Alexander Duyck, michael.chan@broadcom.com,
	Kernel Team

On 9/18/18 6:45 AM, Eric Dumazet wrote:
> You have not traced ixgbe to understand why the driver hits
> "clean_complete=false" all the time?

Eric,

I'm looking at commit 39e6c8208d7b and wondering: it does
clear_bit(NAPI_STATE_MISSED, ...) for busy_poll_stop(), but not for
netpoll. Can that be an issue?

If so, something like the patch below would be needed:

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 57557a6a950c..a848be6b503c 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -172,6 +172,7 @@ static void poll_one_napi(struct napi_struct *napi)
 	trace_napi_poll(napi, work, 0);
 
 	clear_bit(NAPI_STATE_NPSVC, &napi->state);
+	clear_bit(NAPI_STATE_MISSED, &napi->state);
 }
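
For reference, poll_one_napi() in net/core/netpoll.c looked roughly like
this at the time (condensed, with paraphrased comments — not the exact
source), which shows where the proposed clear_bit() would sit:

static void poll_one_napi(struct napi_struct *napi)
{
	int work;

	/* Leave instances that are not currently scheduled to the
	 * regular softirq path.
	 */
	if (!test_bit(NAPI_STATE_SCHED, &napi->state))
		return;

	/* If NPSVC was already set, the napi is being disabled: abort. */
	if (test_and_set_bit(NAPI_STATE_NPSVC, &napi->state))
		return;

	/* A budget of 0 means netpoll only wants the Tx path cleaned. */
	work = napi->poll(napi, 0);
	WARN_ONCE(work, "%pF exceeded budget in poll\n", napi->poll);
	trace_napi_poll(napi, work, 0);

	clear_bit(NAPI_STATE_NPSVC, &napi->state);
}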

* Re: pegged softirq and NAPI race (?)
From: Eric Dumazet @ 2018-09-18 18:17 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: songliubraving, netdev, Jeff Kirsher, Alexander Duyck,
	michael.chan, kernel-team

On Tue, Sep 18, 2018 at 10:51 AM Alexei Starovoitov <ast@fb.com> wrote:
>
> I'm looking at commit 39e6c8208d7b and wondering: it does
> clear_bit(NAPI_STATE_MISSED, ...) for busy_poll_stop(), but not for
> netpoll. Can that be an issue?

NAPI_STATE_MISSED should only be cleared under strict circumstances.

The clear in busy_poll_stop() is really an optimization (as explained in
the comment).

The bit is cleared when napi_complete_done() is eventually called, but
if ixgbe always handles 64 RX frames in its poll function,
napi_complete_done() will not be called. The bug is in ixgbe, pretending
its poll function should be called forever.
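
The state machine being discussed lives in napi_schedule_prep() and
napi_complete_done(). A simplified paraphrase of the logic introduced by
39e6c8208d7b (the flag names are real; the bodies are condensed, not the
exact kernel source):

bool napi_schedule_prep(struct napi_struct *n)
{
	unsigned long val, new;

	do {
		val = READ_ONCE(n->state);
		if (unlikely(val & NAPIF_STATE_DISABLE))
			return false;
		new = val | NAPIF_STATE_SCHED;
		/* Already scheduled: remember that someone tried to
		 * schedule us, so the current poll gets re-run.
		 */
		if (val & NAPIF_STATE_SCHED)
			new |= NAPIF_STATE_MISSED;
	} while (cmpxchg(&n->state, val, new) != val);

	return !(val & NAPIF_STATE_SCHED);
}

bool napi_complete_done(struct napi_struct *n, int work_done)
{
	unsigned long val, new;

	/* netpoll or busy polling currently owns this instance */
	if (unlikely(n->state & (NAPIF_STATE_NPSVC |
				 NAPIF_STATE_IN_BUSY_POLL)))
		return false;

	do {
		val = READ_ONCE(n->state);
		new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED);
		if (unlikely(val & NAPIF_STATE_MISSED))
			new |= NAPIF_STATE_SCHED;
	} while (cmpxchg(&n->state, val, new) != val);

	/* A schedule attempt raced with our poll: run again. */
	if (unlikely(val & NAPIF_STATE_MISSED)) {
		__napi_schedule(n);
		return false;
	}
	return true;
}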

* Re: pegged softirq and NAPI race (?)
From: Song Liu @ 2018-09-18 20:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, netdev, Jeff Kirsher, Alexander Duyck,
	michael.chan@broadcom.com, Kernel Team

> On Sep 18, 2018, at 11:17 AM, Eric Dumazet <edumazet@google.com> wrote:
>
> The bit is cleared when napi_complete_done() is eventually called, but
> if ixgbe always handles 64 RX frames in its poll function,
> napi_complete_done() will not be called. The bug is in ixgbe,
> pretending its poll function should be called forever.

Looks like a patch like the following fixes the issue for ixgbe. But I
cannot explain it yet.

Does this ring a bell?

Thanks,
Song

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 787c84fb20dd..51611f799dae 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3059,11 +3059,14 @@ static irqreturn_t ixgbe_msix_other(int irq, void *data)
 static irqreturn_t ixgbe_msix_clean_rings(int irq, void *data)
 {
 	struct ixgbe_q_vector *q_vector = data;
+	struct napi_struct *napi = &q_vector->napi;
 
 	/* EIAM disabled interrupts (on this vector) for us */
-	if (q_vector->rx.ring || q_vector->tx.ring)
-		napi_schedule_irqoff(&q_vector->napi);
+	if ((q_vector->rx.ring || q_vector->tx.ring) &&
+	    !napi_disable_pending(napi) &&
+	    !test_and_set_bit(NAPI_STATE_SCHED, &napi->state))
+		__napi_schedule_irqoff(napi);
 
 	return IRQ_HANDLED;
 }

* Re: pegged softirq and NAPI race (?)
From: Eric Dumazet @ 2018-09-18 21:13 UTC (permalink / raw)
  To: songliubraving
  Cc: Alexei Starovoitov, netdev, Jeff Kirsher, Alexander Duyck,
	michael.chan, kernel-team

On Tue, Sep 18, 2018 at 1:37 PM Song Liu <songliubraving@fb.com> wrote:
>
> Looks like a patch like the following fixes the issue for ixgbe. But I
> cannot explain it yet.
>
> Does this ring a bell?

I dunno; it looks like the NIC is generating an interrupt while it
should not, and constantly sets NAPI_STATE_MISSED.

Or maybe we need to properly check the napi_complete_done() return
value.

diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_main.c b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
index d3e72d0f66ef428b08e4bd88508e05b734bc43a4..c4c565c982a98a5891603cedcdcb72dc1c401813 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_main.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
@@ -1773,8 +1773,8 @@ ixgb_clean(struct napi_struct *napi, int budget)
 	ixgb_clean_rx_irq(adapter, &work_done, budget);
 
 	/* If budget not fully consumed, exit the polling mode */
-	if (work_done < budget) {
-		napi_complete_done(napi, work_done);
+	if (work_done < budget &&
+	    napi_complete_done(napi, work_done)) {
 		if (!test_bit(__IXGB_DOWN, &adapter->flags))
 			ixgb_irq_enable(adapter);
 	}

* Re: pegged softirq and NAPI race (?)
From: Eric Dumazet @ 2018-09-18 21:21 UTC (permalink / raw)
  To: Eric Dumazet, songliubraving
  Cc: Alexei Starovoitov, netdev, Jeff Kirsher, Alexander Duyck,
	michael.chan, kernel-team

On 09/18/2018 02:13 PM, Eric Dumazet wrote:
> Or maybe we need to properly check the napi_complete_done() return
> value.

The ixgbe patch would be:

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 604282f03d236e4358fc91e64d8ba00a9b36cb8c..80d00aecb6e3e3e950ce6309bfe3639953dd73d9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3196,12 +3196,12 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 		return budget;
 
 	/* all work done, exit the polling mode */
-	napi_complete_done(napi, work_done);
-	if (adapter->rx_itr_setting & 1)
-		ixgbe_set_itr(q_vector);
-	if (!test_bit(__IXGBE_DOWN, &adapter->state))
-		ixgbe_irq_enable_queues(adapter, BIT_ULL(q_vector->v_idx));
-
+	if (napi_complete_done(napi, work_done)) {
+		if (adapter->rx_itr_setting & 1)
+			ixgbe_set_itr(q_vector);
+		if (!test_bit(__IXGBE_DOWN, &adapter->state))
+			ixgbe_irq_enable_queues(adapter, BIT_ULL(q_vector->v_idx));
+	}
 	return min(work_done, budget - 1);
 }

* Re: pegged softirq and NAPI race (?)
From: Jeff Kirsher @ 2018-09-18 21:36 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet, songliubraving
  Cc: Alexei Starovoitov, netdev, Alexander Duyck, michael.chan,
	kernel-team

On Tue, 2018-09-18 at 14:21 -0700, Eric Dumazet wrote:
>
> The ixgbe patch would be:
> [...]

Eric, after Song does some testing on these changes, will you be
submitting a formal patch? If so, make sure to send it to the
intel-wired-lan@lists.osuosl.org mailing list so I can pick up the fix.

By the way, thanks!

* Re: pegged softirq and NAPI race (?)
From: Song Liu @ 2018-09-18 21:40 UTC (permalink / raw)
  To: jeffrey.t.kirsher@intel.com
  Cc: Eric Dumazet, Eric Dumazet, Alexei Starovoitov, netdev,
	Alexander Duyck, michael.chan@broadcom.com, Kernel Team

> On Sep 18, 2018, at 2:36 PM, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
>
> On Tue, 2018-09-18 at 14:21 -0700, Eric Dumazet wrote:
>> [...]
>> +	if (napi_complete_done(napi, work_done)) {

Shall we add likely() here?

>> +		if (adapter->rx_itr_setting & 1)
>> +			ixgbe_set_itr(q_vector);
>> +		if (!test_bit(__IXGBE_DOWN, &adapter->state))
>> +			ixgbe_irq_enable_queues(adapter, BIT_ULL(q_vector->v_idx));
>> +	}
>
> Eric, after Song does some testing on these changes, will you be
> submitting a formal patch? If so, make sure to send it to the
> intel-wired-lan@lists.osuosl.org mailing list so I can pick up the fix.
>
> By the way, thanks!

I would submit the patch if Eric prefers not to. :)

Thanks,
Song

* Re: pegged softirq and NAPI race (?)
From: Eric Dumazet @ 2018-09-18 21:46 UTC (permalink / raw)
  To: songliubraving
  Cc: Jeff Kirsher, Eric Dumazet, Alexei Starovoitov, netdev,
	Alexander Duyck, michael.chan, kernel-team

On Tue, Sep 18, 2018 at 2:41 PM Song Liu <songliubraving@fb.com> wrote:
>
> I would submit the patch if Eric prefers not to. :)

Hmmm... maybe the bug is really in ixgbe_netpoll().

This whole ndo_poll_controller() stuff is crazy.

All sane implementations should only call napi_schedule().
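
For illustration, the "sane" shape of an ndo_poll_controller
implementation alluded to here would be roughly the sketch below
(my_adapter and the q_vector layout are placeholder names modeled on
ixgbe-style drivers, not a specific in-tree implementation):

static void my_netpoll(struct net_device *dev)
{
	struct my_adapter *adapter = netdev_priv(dev);
	int i;

	/* Just kick every NAPI instance and let the regular poll path
	 * drain the rings with proper synchronization, instead of
	 * netpoll invoking the poll functions directly.
	 */
	for (i = 0; i < adapter->num_q_vectors; i++)
		napi_schedule(&adapter->q_vector[i]->napi);
}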

* Re: pegged softirq and NAPI race (?)
From: Song Liu @ 2018-09-18 21:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jeff Kirsher, Eric Dumazet, Alexei Starovoitov, netdev,
	Alexander Duyck, michael.chan@broadcom.com, Kernel Team

> On Sep 18, 2018, at 2:46 PM, Eric Dumazet <edumazet@google.com> wrote:
>
> All sane implementations should only call napi_schedule().

The current implementation is about identical to napi_schedule_irqoff().
Do we really need napi_schedule() instead?

On the other hand, I think we should check napi_complete_done() in
ixgbe_poll() anyway.

Song

* Re: pegged softirq and NAPI race (?)
From: Eric Dumazet @ 2018-09-18 22:04 UTC (permalink / raw)
  To: songliubraving
  Cc: Jeff Kirsher, Eric Dumazet, Alexei Starovoitov, netdev,
	Alexander Duyck, michael.chan, kernel-team

On Tue, Sep 18, 2018 at 2:56 PM Song Liu <songliubraving@fb.com> wrote:
>
> On the other hand, I think we should check napi_complete_done() in
> ixgbe_poll() anyway.

It seems the netpoll code is racy, since another cpu might be calling
ixgbe's poll() and return early from napi_complete_done():

	if (unlikely(n->state & (NAPIF_STATE_NPSVC |
				 NAPIF_STATE_IN_BUSY_POLL)))
		return false;

This is why netpoll-enabled drivers _must_ check the
napi_complete[_done]() return value; otherwise they might re-enable IRQs
when they should not.
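
In driver terms, the safe pattern looks roughly like the sketch below (a
generic illustration; my_clean_rx_irq() and my_irq_enable() are
placeholder names, not real kernel APIs):

static int my_poll(struct napi_struct *napi, int budget)
{
	struct my_queue *q = container_of(napi, struct my_queue, napi);
	int work_done = my_clean_rx_irq(q, budget);

	if (work_done == budget)
		return budget;	/* more work: stay in polling mode */

	/* Racy (pre-fix) pattern: unconditionally re-enabling the IRQ
	 * here is wrong, because napi_complete_done() can return false
	 * while netpoll (NAPIF_STATE_NPSVC) or busy polling owns the
	 * instance:
	 *
	 *	napi_complete_done(napi, work_done);
	 *	my_irq_enable(q);
	 *
	 * Safe pattern: only re-arm the interrupt if we really left
	 * polling mode.
	 */
	if (napi_complete_done(napi, work_done))
		my_irq_enable(q);

	return work_done;
}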

* Re: pegged softirq and NAPI race (?)
From: Song Liu @ 2018-09-18 21:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, netdev, Jeff Kirsher, Alexander Duyck,
	michael.chan@broadcom.com, Kernel Team

> On Sep 18, 2018, at 2:13 PM, Eric Dumazet <edumazet@google.com> wrote:
>
> Or maybe we need to properly check the napi_complete_done() return
> value.
> [...]

Thanks Eric!

I was looking at exactly this part. :) And it seems to be working!

I will run a bigger test and update shortly.

Best,
Song

* Re: pegged softirq and NAPI race (?)
From: Eric Dumazet @ 2018-09-18 21:25 UTC (permalink / raw)
  To: songliubraving
  Cc: Alexei Starovoitov, netdev, Jeff Kirsher, Alexander Duyck,
	michael.chan, kernel-team

On Tue, Sep 18, 2018 at 2:22 PM Song Liu <songliubraving@fb.com> wrote:
>
> Thanks Eric!
>
> I was looking at exactly this part. :) And it seems to be working!

And this should also make busy polling a bit faster, avoiding
enabling/receiving interrupts in the busy loop.

> I will run a bigger test and update shortly.

* Re: pegged softirq and NAPI race (?)
From: Florian Fainelli @ 2018-09-18 21:25 UTC (permalink / raw)
  To: Eric Dumazet, songliubraving
  Cc: Alexei Starovoitov, netdev, Jeff Kirsher, Alexander Duyck,
	michael.chan, kernel-team

On 09/18/2018 02:13 PM, Eric Dumazet wrote:
>
> Or maybe we need to properly check the napi_complete_done() return
> value.

This would not be the only driver doing this, unfortunately... should we
add a __must_check annotation to help catch those (mis)uses? Though that
could cause false positives for drivers using NAPI to clean their TX
ring.

--
Florian
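
Concretely, the suggestion amounts to annotating the prototype in
include/linux/netdevice.h along these lines (illustrative only, not a
merged change):

-bool napi_complete_done(struct napi_struct *n, int work_done);
+bool __must_check napi_complete_done(struct napi_struct *n, int work_done);

With that annotation, a caller that ignores the return value would get a
-Wunused-result warning at build time.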

* Re: pegged softirq and NAPI race (?)
From: Eric Dumazet @ 2018-09-18 21:28 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: songliubraving, Alexei Starovoitov, netdev, Jeff Kirsher,
	Alexander Duyck, michael.chan, kernel-team

On Tue, Sep 18, 2018 at 2:25 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> This would not be the only driver doing this, unfortunately... should
> we add a __must_check annotation to help catch those (mis)uses?

Well, before adding __must_check we would need to cook a (big) patch
series to change all occurrences.

Not clear why netpoll is the trigger?

* Re: pegged softirq and NAPI race (?)
From: Florian Fainelli @ 2018-09-18 21:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: songliubraving, Alexei Starovoitov, netdev, Jeff Kirsher,
	Alexander Duyck, michael.chan, kernel-team

On 09/18/2018 02:28 PM, Eric Dumazet wrote:
> Well, before adding __must_check we would need to cook a (big) patch
> series to change all occurrences.

Sounds good; I don't mind submitting something unless you beat me to it.

> Not clear why netpoll is the trigger?

Me either. This should be observable during normal operation as well,
though most likely you just get a spurious interrupt and nothing to
clean in the RX ring, so it goes unnoticed?

--
Florian

* Re: pegged softirq and NAPI race (?)
From: Song Liu @ 2018-09-18 21:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Florian Fainelli, Alexei Starovoitov, netdev, Jeff Kirsher,
	Alexander Duyck, michael.chan@broadcom.com, Kernel Team

> On Sep 18, 2018, at 2:28 PM, Eric Dumazet <edumazet@google.com> wrote:
>
> Not clear why netpoll is the trigger?

From my observation, the trigger path is:

    netpoll_poll_dev() => ndo_poll_controller() => napi_schedule_prep()

I guess it is a race between IRQ/netpoll => napi_schedule_prep() and an
unchecked napi_complete_done()? netpoll just makes it trigger more
often?

Unfortunately for me, bnxt already checks napi_complete_done(), so there
should be another bug with bnxt (at least with our 4.11-based kernel).

Song

end of thread, other threads: [~2018-09-19  3:39 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
2018-09-18  8:41 pegged softirq and NAPI race (?) Song Liu
2018-09-18 13:45 ` Eric Dumazet
2018-09-18 16:19   ` Song Liu
2018-09-18 16:31     ` Rik van Riel
2018-09-18 16:33     ` Eric Dumazet
2018-09-18 16:49       ` Song Liu
2018-09-18 17:51   ` Alexei Starovoitov
2018-09-18 18:17     ` Eric Dumazet
2018-09-18 20:37       ` Song Liu
2018-09-18 21:13         ` Eric Dumazet
2018-09-18 21:21           ` Eric Dumazet
2018-09-18 21:36             ` Jeff Kirsher
2018-09-18 21:40               ` Song Liu
2018-09-18 21:46                 ` Eric Dumazet
2018-09-18 21:55                   ` Song Liu
2018-09-18 22:04                     ` Eric Dumazet
2018-09-18 21:21           ` Song Liu
2018-09-18 21:25             ` Eric Dumazet
2018-09-18 21:25           ` Florian Fainelli
2018-09-18 21:28             ` Eric Dumazet
2018-09-18 21:35               ` Florian Fainelli
2018-09-18 21:36               ` Song Liu