From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: John Hickey <jjh@daedalian.us>
Cc: <anthony.l.nguyen@intel.com>,
Jesse Brandeburg <jesse.brandeburg@intel.com>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>,
Shujin Li <lishujin@kuaishou.com>,
Jason Xing <xingwanli@kuaishou.com>,
<intel-wired-lan@lists.osuosl.org>, <netdev@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <bpf@vger.kernel.org>
Subject: Re: [PATCH net v3] ixgbe: Panic during XDP_TX with > 64 CPUs
Date: Wed, 5 Apr 2023 16:40:33 +0200 [thread overview]
Message-ID: <ZC2IYWgTUFCnlKc9@boxer> (raw)
In-Reply-To: <20230308220756.587317-1-jjh@daedalian.us>
On Wed, Mar 08, 2023 at 02:07:57PM -0800, John Hickey wrote:
> In commit 'ixgbe: let the xdpdrv work with more than 64 cpus'
> (4fe815850bdc), support was added to allow XDP programs to run on systems
> with more than 64 CPUs by locking the XDP TX rings and indexing them
> using cpu % 64 (IXGBE_MAX_XDP_QS).
>
> Upon trying this out patch via the Intel 5.18.6 out of tree driver
> on a system with more than 64 cores, the kernel paniced with an
> array-index-out-of-bounds at the return in ixgbe_determine_xdp_ring in
> ixgbe.h, which means ixgbe_determine_xdp_q_idx was just returning the
> cpu instead of cpu % IXGBE_MAX_XDP_QS. An example splat:
i am assuming that in-tree driver is also affected so the info about OOT
driver is irrelevant here :)
>
> ==========================================================================
> UBSAN: array-index-out-of-bounds in
> /var/lib/dkms/ixgbe/5.18.6+focal-1/build/src/ixgbe.h:1147:26
> index 65 is out of range for type 'ixgbe_ring *[64]'
> ==========================================================================
> BUG: kernel NULL pointer dereference, address: 0000000000000058
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP NOPTI
> CPU: 65 PID: 408 Comm: ksoftirqd/65
> Tainted: G IOE 5.15.0-48-generic #54~20.04.1-Ubuntu
> Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.5.4 01/13/2020
> RIP: 0010:ixgbe_xmit_xdp_ring+0x1b/0x1c0 [ixgbe]
> Code: 3b 52 d4 cf e9 42 f2 ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 55 b9
> 00 00 00 00 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 <44> 0f b7
> 47 58 0f b7 47 5a 0f b7 57 54 44 0f b7 76 08 66 41 39 c0
> RSP: 0018:ffffbc3fcd88fcb0 EFLAGS: 00010282
> RAX: ffff92a253260980 RBX: ffffbc3fe68b00a0 RCX: 0000000000000000
> RDX: ffff928b5f659000 RSI: ffff928b5f659000 RDI: 0000000000000000
> RBP: ffffbc3fcd88fce0 R08: ffff92b9dfc20580 R09: 0000000000000001
> R10: 3d3d3d3d3d3d3d3d R11: 3d3d3d3d3d3d3d3d R12: 0000000000000000
> R13: ffff928b2f0fa8c0 R14: ffff928b9be20050 R15: 000000000000003c
> FS: 0000000000000000(0000) GS:ffff92b9dfc00000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000058 CR3: 000000011dd6a002 CR4: 00000000007706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> <TASK>
> ixgbe_poll+0x103e/0x1280 [ixgbe]
> ? sched_clock_cpu+0x12/0xe0
> __napi_poll+0x30/0x160
> net_rx_action+0x11c/0x270
> __do_softirq+0xda/0x2ee
> run_ksoftirqd+0x2f/0x50
> smpboot_thread_fn+0xb7/0x150
> ? sort_range+0x30/0x30
> kthread+0x127/0x150
> ? set_kthread_struct+0x50/0x50
> ret_from_fork+0x1f/0x30
> </TASK>
>
> I think this is how it happens:
>
> Upon loading the first XDP program on a system with more than 64 CPUs,
> ixgbe_xdp_locking_key is incremented in ixgbe_xdp_setup. However,
> immediately after this, the rings are reconfigured by ixgbe_setup_tc.
> ixgbe_setup_tc calls ixgbe_clear_interrupt_scheme which calls
> ixgbe_free_q_vectors which calls ixgbe_free_q_vector in a loop.
> ixgbe_free_q_vector decrements ixgbe_xdp_locking_key once per call if
> it is non-zero. Commenting out the decrement in ixgbe_free_q_vector
> stopped my system from panicing.
>
> I suspect to make the original patch work, I would need to load an XDP
> program and then replace it in order to get ixgbe_xdp_locking_key back
> above 0 since ixgbe_setup_tc is only called when transitioning between
> XDP and non-XDP ring configurations, while ixgbe_xdp_locking_key is
> incremented every time ixgbe_xdp_setup is called.
>
> Also, ixgbe_setup_tc can be called via ethtool --set-channels, so this
> becomes another path to decrement ixgbe_xdp_locking_key to 0 on systems
> with greater than 64 CPUs.
>
> For this patch, I have changed static_branch_inc to static_branch_enable
> in ixgbe_setup_xdp. We weren't counting references. The
> ixgbe_xdp_locking_key only protects code in the XDP_TX path, which is
> not run when an XDP program is loaded. The other condition for setting
> it on is the number of CPUs, which I assume is static.
From technical POV i think the contents here are fine. Maybe we could even
move this check to probe path because as you said this is static. I saw
that others had some wording/style comments, so if you decide to fix it
and push a v4, then you can add:
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
>
> Fixes: 4fe815850bdc ("ixgbe: let the xdpdrv work with more than 64 cpus")
> Signed-off-by: John Hickey <jjh@daedalian.us>
> ---
> v1 -> v2:
> Added Fixes and net tag. No code changes.
> v2 -> v3:
> Added splat. Slight clarification as to why ixgbe_xdp_locking_key
> is not turned off. Based on feedback from Maciej Fijalkowski.
> ---
> drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c | 3 ---
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
> 2 files changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> index f8156fe4b1dc..0ee943db3dc9 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> @@ -1035,9 +1035,6 @@ static void ixgbe_free_q_vector(struct ixgbe_adapter *adapter, int v_idx)
> adapter->q_vector[v_idx] = NULL;
> __netif_napi_del(&q_vector->napi);
>
> - if (static_key_enabled(&ixgbe_xdp_locking_key))
> - static_branch_dec(&ixgbe_xdp_locking_key);
> -
> /*
> * after a call to __netif_napi_del() napi may still be used and
> * ixgbe_get_stats64() might access the rings on this vector,
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index ab8370c413f3..cd2fb72c67be 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -10283,7 +10283,7 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
> if (nr_cpu_ids > IXGBE_MAX_XDP_QS * 2)
> return -ENOMEM;
> else if (nr_cpu_ids > IXGBE_MAX_XDP_QS)
> - static_branch_inc(&ixgbe_xdp_locking_key);
> + static_branch_enable(&ixgbe_xdp_locking_key);
>
> old_prog = xchg(&adapter->xdp_prog, prog);
> need_reset = (!!prog != !!old_prog);
> --
> 2.37.2
>
WARNING: multiple messages have this Message-ID (diff)
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: John Hickey <jjh@daedalian.us>
Cc: Shujin Li <lishujin@kuaishou.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
intel-wired-lan@lists.osuosl.org,
John Fastabend <john.fastabend@gmail.com>,
Jesse Brandeburg <jesse.brandeburg@intel.com>,
Alexei Starovoitov <ast@kernel.org>,
Eric Dumazet <edumazet@google.com>,
anthony.l.nguyen@intel.com, netdev@vger.kernel.org,
Jakub Kicinski <kuba@kernel.org>,
bpf@vger.kernel.org, Paolo Abeni <pabeni@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
linux-kernel@vger.kernel.org, Jason Xing <xingwanli@kuaishou.com>
Subject: Re: [Intel-wired-lan] [PATCH net v3] ixgbe: Panic during XDP_TX with > 64 CPUs
Date: Wed, 5 Apr 2023 16:40:33 +0200 [thread overview]
Message-ID: <ZC2IYWgTUFCnlKc9@boxer> (raw)
In-Reply-To: <20230308220756.587317-1-jjh@daedalian.us>
On Wed, Mar 08, 2023 at 02:07:57PM -0800, John Hickey wrote:
> In commit 'ixgbe: let the xdpdrv work with more than 64 cpus'
> (4fe815850bdc), support was added to allow XDP programs to run on systems
> with more than 64 CPUs by locking the XDP TX rings and indexing them
> using cpu % 64 (IXGBE_MAX_XDP_QS).
>
> Upon trying this out patch via the Intel 5.18.6 out of tree driver
> on a system with more than 64 cores, the kernel paniced with an
> array-index-out-of-bounds at the return in ixgbe_determine_xdp_ring in
> ixgbe.h, which means ixgbe_determine_xdp_q_idx was just returning the
> cpu instead of cpu % IXGBE_MAX_XDP_QS. An example splat:
i am assuming that in-tree driver is also affected so the info about OOT
driver is irrelevant here :)
>
> ==========================================================================
> UBSAN: array-index-out-of-bounds in
> /var/lib/dkms/ixgbe/5.18.6+focal-1/build/src/ixgbe.h:1147:26
> index 65 is out of range for type 'ixgbe_ring *[64]'
> ==========================================================================
> BUG: kernel NULL pointer dereference, address: 0000000000000058
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP NOPTI
> CPU: 65 PID: 408 Comm: ksoftirqd/65
> Tainted: G IOE 5.15.0-48-generic #54~20.04.1-Ubuntu
> Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.5.4 01/13/2020
> RIP: 0010:ixgbe_xmit_xdp_ring+0x1b/0x1c0 [ixgbe]
> Code: 3b 52 d4 cf e9 42 f2 ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 55 b9
> 00 00 00 00 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 <44> 0f b7
> 47 58 0f b7 47 5a 0f b7 57 54 44 0f b7 76 08 66 41 39 c0
> RSP: 0018:ffffbc3fcd88fcb0 EFLAGS: 00010282
> RAX: ffff92a253260980 RBX: ffffbc3fe68b00a0 RCX: 0000000000000000
> RDX: ffff928b5f659000 RSI: ffff928b5f659000 RDI: 0000000000000000
> RBP: ffffbc3fcd88fce0 R08: ffff92b9dfc20580 R09: 0000000000000001
> R10: 3d3d3d3d3d3d3d3d R11: 3d3d3d3d3d3d3d3d R12: 0000000000000000
> R13: ffff928b2f0fa8c0 R14: ffff928b9be20050 R15: 000000000000003c
> FS: 0000000000000000(0000) GS:ffff92b9dfc00000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000058 CR3: 000000011dd6a002 CR4: 00000000007706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> <TASK>
> ixgbe_poll+0x103e/0x1280 [ixgbe]
> ? sched_clock_cpu+0x12/0xe0
> __napi_poll+0x30/0x160
> net_rx_action+0x11c/0x270
> __do_softirq+0xda/0x2ee
> run_ksoftirqd+0x2f/0x50
> smpboot_thread_fn+0xb7/0x150
> ? sort_range+0x30/0x30
> kthread+0x127/0x150
> ? set_kthread_struct+0x50/0x50
> ret_from_fork+0x1f/0x30
> </TASK>
>
> I think this is how it happens:
>
> Upon loading the first XDP program on a system with more than 64 CPUs,
> ixgbe_xdp_locking_key is incremented in ixgbe_xdp_setup. However,
> immediately after this, the rings are reconfigured by ixgbe_setup_tc.
> ixgbe_setup_tc calls ixgbe_clear_interrupt_scheme which calls
> ixgbe_free_q_vectors which calls ixgbe_free_q_vector in a loop.
> ixgbe_free_q_vector decrements ixgbe_xdp_locking_key once per call if
> it is non-zero. Commenting out the decrement in ixgbe_free_q_vector
> stopped my system from panicing.
>
> I suspect to make the original patch work, I would need to load an XDP
> program and then replace it in order to get ixgbe_xdp_locking_key back
> above 0 since ixgbe_setup_tc is only called when transitioning between
> XDP and non-XDP ring configurations, while ixgbe_xdp_locking_key is
> incremented every time ixgbe_xdp_setup is called.
>
> Also, ixgbe_setup_tc can be called via ethtool --set-channels, so this
> becomes another path to decrement ixgbe_xdp_locking_key to 0 on systems
> with greater than 64 CPUs.
>
> For this patch, I have changed static_branch_inc to static_branch_enable
> in ixgbe_setup_xdp. We weren't counting references. The
> ixgbe_xdp_locking_key only protects code in the XDP_TX path, which is
> not run when an XDP program is loaded. The other condition for setting
> it on is the number of CPUs, which I assume is static.
From technical POV i think the contents here are fine. Maybe we could even
move this check to probe path because as you said this is static. I saw
that others had some wording/style comments, so if you decide to fix it
and push a v4, then you can add:
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
>
> Fixes: 4fe815850bdc ("ixgbe: let the xdpdrv work with more than 64 cpus")
> Signed-off-by: John Hickey <jjh@daedalian.us>
> ---
> v1 -> v2:
> Added Fixes and net tag. No code changes.
> v2 -> v3:
> Added splat. Slight clarification as to why ixgbe_xdp_locking_key
> is not turned off. Based on feedback from Maciej Fijalkowski.
> ---
> drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c | 3 ---
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
> 2 files changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> index f8156fe4b1dc..0ee943db3dc9 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> @@ -1035,9 +1035,6 @@ static void ixgbe_free_q_vector(struct ixgbe_adapter *adapter, int v_idx)
> adapter->q_vector[v_idx] = NULL;
> __netif_napi_del(&q_vector->napi);
>
> - if (static_key_enabled(&ixgbe_xdp_locking_key))
> - static_branch_dec(&ixgbe_xdp_locking_key);
> -
> /*
> * after a call to __netif_napi_del() napi may still be used and
> * ixgbe_get_stats64() might access the rings on this vector,
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index ab8370c413f3..cd2fb72c67be 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -10283,7 +10283,7 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
> if (nr_cpu_ids > IXGBE_MAX_XDP_QS * 2)
> return -ENOMEM;
> else if (nr_cpu_ids > IXGBE_MAX_XDP_QS)
> - static_branch_inc(&ixgbe_xdp_locking_key);
> + static_branch_enable(&ixgbe_xdp_locking_key);
>
> old_prog = xchg(&adapter->xdp_prog, prog);
> need_reset = (!!prog != !!old_prog);
> --
> 2.37.2
>
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
next prev parent reply other threads:[~2023-04-05 14:41 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-08 22:07 [PATCH net v3] ixgbe: Panic during XDP_TX with > 64 CPUs John Hickey
2023-03-08 22:07 ` [Intel-wired-lan] " John Hickey
2023-03-25 18:29 ` Rout, ChandanX
2023-03-25 18:29 ` [Intel-wired-lan] " Rout, ChandanX
2023-03-26 15:03 ` Paul Menzel
2023-03-26 15:03 ` Paul Menzel
2023-03-26 15:10 ` Paul Menzel
2023-03-26 15:10 ` Paul Menzel
2023-03-26 15:13 ` Paul Menzel
2023-03-26 15:13 ` Paul Menzel
2023-03-27 2:45 ` Jason Xing
2023-03-27 2:45 ` Jason Xing
2023-03-27 3:00 ` Jason Xing
2023-03-27 3:00 ` [Intel-wired-lan] " Jason Xing
2023-04-05 14:40 ` Maciej Fijalkowski [this message]
2023-04-05 14:40 ` Maciej Fijalkowski
2023-04-13 23:03 ` [PATCH net v4] ixgbe: Fix panic " John Hickey
2023-04-13 23:03 ` [Intel-wired-lan] " John Hickey
2023-04-24 4:08 ` Rout, ChandanX
2023-04-24 4:08 ` Rout, ChandanX
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZC2IYWgTUFCnlKc9@boxer \
--to=maciej.fijalkowski@intel.com \
--cc=anthony.l.nguyen@intel.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=intel-wired-lan@lists.osuosl.org \
--cc=jesse.brandeburg@intel.com \
--cc=jjh@daedalian.us \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lishujin@kuaishou.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=xingwanli@kuaishou.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.