* Fw: [Bug 208275] New: kernel hang occasionally while running the sample of xdpsock
@ 2020-06-22 16:08 Stephen Hemminger
  2020-06-22 17:46 ` Björn Töpel
  2020-06-23  6:27 ` Yahui Chen
  0 siblings, 2 replies; 8+ messages in thread
From: Stephen Hemminger @ 2020-06-22 16:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh
  Cc: bpf



Begin forwarded message:

Date: Mon, 22 Jun 2020 10:13:52 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 208275] New: kernel hang occasionally while running the sample of xdpsock


https://bugzilla.kernel.org/show_bug.cgi?id=208275

            Bug ID: 208275
           Summary: kernel hang occasionally while running the sample of
                    xdpsock
           Product: Networking
           Version: 2.5
    Kernel Version: 5.7.0
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Other
          Assignee: stephen@networkplumber.org
          Reporter: goodluckwillcomesoon@gmail.com
        Regression: No

Distribution:
5.7.0-1.el7.centos.x86_64

Hardware Environment:
Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.7 06/16/2016

Software Environment:


Problem Description:
The kernel occasionally hangs while running the xdpsock sample.

Steps to reproduce:

I want to test the RX performance of AF_XDP. I set the NIC to 4 queues with
`ethtool -L p6p1 combined 4`, then create one socket per queue.

for ((i=0; i<4; i++));
do 
./xdpsock -r -z -i p6p1 -m -q $i &
done

I run xdpsock from samples/bpf using the shell loop above.
Occasionally the kernel hangs, and I have to power-cycle the machine.
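
The reproduction steps above can be previewed as a dry run before anything is started. This is a minimal sketch, not part of the original report: `xdpsock_cmds` is a hypothetical helper name, while the interface `p6p1`, the queue count 4, and the xdpsock flags are copied from the report. It only prints the command lines and does not launch xdpsock.

```shell
# xdpsock_cmds IFACE QUEUES: print one xdpsock command line per RX queue.
# The body runs in a subshell "( ... )" so the loop variable does not leak.
# Hypothetical helper; flags (-r -z -m) are taken verbatim from the report.
xdpsock_cmds() (
  q=0
  while [ "$q" -lt "$2" ]; do
    printf './xdpsock -r -z -i %s -m -q %s &\n' "$1" "$q"
    q=$((q + 1))
  done
)

# Dry run: the commands to start after `ethtool -L p6p1 combined 4`.
xdpsock_cmds p6p1 4
```

Printing rather than executing keeps the sketch safe to run anywhere; since the report says a power cycle was needed after the hang, the real commands are best run only on a machine that can be reset.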

Additional information:

-- 
You are receiving this mail because:
You are the assignee for the bug.


* Re: Fw: [Bug 208275] New: kernel hang occasionally while running the sample of xdpsock
  2020-06-22 16:08 Fw: [Bug 208275] New: kernel hang occasionally while running the sample of xdpsock Stephen Hemminger
@ 2020-06-22 17:46 ` Björn Töpel
  2020-06-22 17:49     ` [Intel-wired-lan] " Björn Töpel
  2020-06-23  6:27 ` Yahui Chen
  1 sibling, 1 reply; 8+ messages in thread
From: Björn Töpel @ 2020-06-22 17:46 UTC (permalink / raw)
  To: Stephen Hemminger, Karlsson, Magnus, Björn Töpel
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh, bpf

On Mon, 22 Jun 2020 at 18:08, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
>
>
> Begin forwarded message:
>
> Date: Mon, 22 Jun 2020 10:13:52 +0000
> From: bugzilla-daemon@bugzilla.kernel.org
> To: stephen@networkplumber.org
> Subject: [Bug 208275] New: kernel hang occasionally while running the sample of xdpsock
>

Thanks for forwarding, Stephen.

I'll have a look!


Björn



* Re: Fw: [Bug 208275] New: kernel hang occasionally while running the sample of xdpsock
  2020-06-22 17:46 ` Björn Töpel
@ 2020-06-22 17:49     ` Björn Töpel
  0 siblings, 0 replies; 8+ messages in thread
From: Björn Töpel @ 2020-06-22 17:49 UTC (permalink / raw)
  To: Stephen Hemminger, Karlsson, Magnus, Björn Töpel,
	intel-wired-lan
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh, bpf

On Mon, 22 Jun 2020 at 19:46, Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> On Mon, 22 Jun 2020 at 18:08, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> >
> >
> > Begin forwarded message:
> >
> > Date: Mon, 22 Jun 2020 10:13:52 +0000
> > From: bugzilla-daemon@bugzilla.kernel.org
> > To: stephen@networkplumber.org
> > Subject: [Bug 208275] New: kernel hang occasionally while running the sample of xdpsock
> >
>
> Thanks for forwarding, Stephen.
>
> I'll have a look!
>

Intel ixgbe splat. Adding intel-wired-lan to To:.


* [Intel-wired-lan] Fw: [Bug 208275] New: kernel hang occasionally while running the sample of xdpsock
  2020-06-22 17:49     ` [Intel-wired-lan] " Björn Töpel
  (?)
@ 2020-06-23  3:44     ` Yahui Chen
  2020-06-23  9:18         ` [Intel-wired-lan] " Björn Töpel
  -1 siblings, 1 reply; 8+ messages in thread
From: Yahui Chen @ 2020-06-23  3:44 UTC (permalink / raw)
  To: intel-wired-lan

Hi Björn, thank you for your response.
Could you describe it in more detail? I don't quite follow.
Thanks.

On Tue, 23 Jun 2020 at 01:50, Björn Töpel <bjorn.topel@gmail.com> wrote:

> On Mon, 22 Jun 2020 at 19:46, Björn Töpel <bjorn.topel@gmail.com> wrote:
> >
> > On Mon, 22 Jun 2020 at 18:08, Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > >
> > >
> > >
> > > Begin forwarded message:
> > >
> > > Date: Mon, 22 Jun 2020 10:13:52 +0000
> > > > From: bugzilla-daemon@bugzilla.kernel.org
> > > > To: stephen@networkplumber.org
> > > > Subject: [Bug 208275] New: kernel hang occasionally while running the sample of xdpsock
> > >
> >
> > Thanks for forwarding, Stephen.
> >
> > I'll have a look!
> >
>
> Intel ixgbe splat. Adding intel-wired-lan to To:.
>

* Re: Fw: [Bug 208275] New: kernel hang occasionally while running the sample of xdpsock
  2020-06-22 16:08 Fw: [Bug 208275] New: kernel hang occasionally while running the sample of xdpsock Stephen Hemminger
  2020-06-22 17:46 ` Björn Töpel
@ 2020-06-23  6:27 ` Yahui Chen
  1 sibling, 0 replies; 8+ messages in thread
From: Yahui Chen @ 2020-06-23  6:27 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	bpf@vger.kernel.org

Hi Björn, thanks for your response.
I don't quite follow. Could you describe the details more clearly?
Thanks.



* Re: Fw: [Bug 208275] New: kernel hang occasionally while running the sample of xdpsock
  2020-06-23  3:44     ` Yahui Chen
@ 2020-06-23  9:18         ` Björn Töpel
  0 siblings, 0 replies; 8+ messages in thread
From: Björn Töpel @ 2020-06-23  9:18 UTC (permalink / raw)
  To: Yahui Chen
  Cc: Stephen Hemminger, Karlsson, Magnus, Björn Töpel,
	intel-wired-lan, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, bpf

On Tue, 23 Jun 2020 at 05:45, Yahui Chen <goodluckwillcomesoon@gmail.com> wrote:
>
> Hi Björn, thank you for your response.
> Could you describe it in more detail? I don't quite follow.
> Thanks.
>

When XDP is enabled, the ixgbe NIC goes through a (somewhat heavy)
reconfiguration. During that reconfiguration, for a reason I have not
yet identified, rx_buffer->page is NULL in the following call chain:
  ixgbe_down()->ixgbe_clean_all_rx_rings()->ixgbe_clean_rx_ring()->__page_frag_cache_drain()

As a result, when __page_frag_cache_drain() tries to update the page's
reference counter, it dereferences a NULL pointer.

[277994.329145] BUG: kernel NULL pointer dereference, address: 0000000000000034
...
[277994.329428] RIP: 0010:__page_frag_cache_drain+0x5/0x40
[277994.329463] Code: d2 ff ff 31 f6 84 c0 74 04 0f b6 73 51 48 89 df e8 70 ff ff ff eb dc 48 83 eb 01 eb d0 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <f0> 29 77 34 74 01 c3 48 8b 07 55 48 89 e5 a9 00 00 01 00 74 0f 0f

2a:*    f0 29 77 34              lock sub %esi,0x34(%rdi)        <-- trapping instruction
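
The faulting address in the oops can be cross-checked against the trapping instruction with a little arithmetic. A minimal sketch, assuming %rdi held the NULL rx_buffer->page pointer when the instruction ran; `fault_addr` is a hypothetical helper name, not a kernel tool:

```shell
# fault_addr BASE DISP: effective address of a DISP(BASE) memory operand,
# printed as 16 hex digits, the same width the oops message uses.
# Hypothetical helper for illustration only.
fault_addr() {
  printf '%016x\n' "$(( $1 + $2 ))"
}

# lock sub %esi,0x34(%rdi), assuming %rdi = 0 (NULL page pointer)
fault_addr 0x0 0x34
```

This prints 0000000000000034, the address reported in the "BUG: kernel NULL pointer dereference" line, which is consistent with the page pointer being NULL rather than pointing at stale or freed memory.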

I tried to reproduce the issue, but without success so far. I'll keep
looking for the bug. Hopefully someone from Intel with better insight
into ixgbe can help!


Björn

