From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: "Mekala, SunithaX D" <sunithax.d.mekala@intel.com>
Cc: "intel-wired-lan@lists.osuosl.org"
<intel-wired-lan@lists.osuosl.org>,
Leon Romanovsky <leon@kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [Intel-wired-lan] [PATCH net v2] ice: clear number of qs when rings are free
Date: Fri, 7 Apr 2023 18:58:24 +0200
Message-ID: <ZDBLsBhGN74K6Nns@boxer>
In-Reply-To: <CO1PR11MB50282B593D48A928AE1B11B0A0969@CO1PR11MB5028.namprd11.prod.outlook.com>
On Fri, Apr 07, 2023 at 04:12:49PM +0000, Mekala, SunithaX D wrote:
> Still observing system hung
> Test 1: Upon PF reset
> Applied reproducer.patch in kernel, followed by below commands
> echo 1 > /sys/module/ice/parameters/ice_reproduce_panic
What is that?
> echo 1 > /sys/class/net/<ice_pf>/device/reset
> System did not hang but the PF interface went down with dmesg to reload driver
> On unloading driver, system hangs with no response.
> 2. On changing queues
> Applied reproducer.patch in kernel, followed by below commands
> echo 1 > /sys/module/ice/parameters/ice_reproduce_panic
> ethtool -L $pf rx 1 tx 1
> System stops responding
This is not enough info for us. You should be able to catch the splat
(like the one below, which was included in the commit message).
I might be missing something, but to me zeroing num_rxq is not enough. In
the rebuild path ice_vsi_set_num_qs() will re-init it *before*
ice_vsi_alloc_arrays() is called, so if the workqueue is still running
there is a small time frame where the driver has a non-zero num_rxq
without the rx ring array being allocated. Only the reset path cancels
ptp->work.
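To illustrate the window described above, here is a minimal userspace
sketch in C. It is not driver code: struct model_vsi and every model_*()
function are hypothetical stand-ins that only mimic the ordering discussed
in this thread (queue count set first, ring array allocated afterwards).
The NULL check in the walker only shows what the periodic work would run
into in a single-threaded model; it is not a claim about how the race
should be fixed in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model; these are NOT the ice driver's structures. */
struct model_ring {
	uint64_t cached_phctime;
};

struct model_vsi {
	uint16_t num_rxq;             /* models the rx queue count */
	struct model_ring **rx_rings; /* models the rx ring array */
};

/* Models the step that re-inits the queue count during rebuild. */
static void model_set_num_qs(struct model_vsi *vsi, uint16_t n)
{
	vsi->num_rxq = n;
}

/* Models the later array allocation; returns 0 on success, -1 on failure. */
static int model_alloc_arrays(struct model_vsi *vsi)
{
	vsi->rx_rings = calloc(vsi->num_rxq, sizeof(*vsi->rx_rings));
	return vsi->rx_rings ? 0 : -1;
}

/* Models freeing the array and clearing the count together. */
static void model_free_arrays(struct model_vsi *vsi)
{
	free(vsi->rx_rings);
	vsi->rx_rings = NULL;
	vsi->num_rxq = 0;
}

/* True while a walker that trusts only num_rxq would index a NULL array. */
static int model_in_unsafe_window(const struct model_vsi *vsi)
{
	return vsi->num_rxq != 0 && vsi->rx_rings == NULL;
}

/*
 * Models the periodic work walking the rx rings.
 * Returns the number of rings updated, or -1 if the array is missing.
 */
static int model_update_cached_phctime(struct model_vsi *vsi, uint64_t t)
{
	int updated = 0;

	if (!vsi->rx_rings)
		return -1;

	for (uint16_t i = 0; i < vsi->num_rxq; i++) {
		if (!vsi->rx_rings[i])
			continue;
		vsi->rx_rings[i]->cached_phctime = t;
		updated++;
	}
	return updated;
}
```

Calling model_set_num_qs() and then checking model_in_unsafe_window()
before model_alloc_arrays() runs shows the state in question: a non-zero
count with no array behind it. In the model, clearing the count in
model_free_arrays() does not help once the count has been set again.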
> >
> > -----Original Message-----
> > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Michal Swiatkowski
> > Sent: Monday, March 20, 2023 7:59 AM
> > To: Leon Romanovsky <leon@kernel.org>
> > Cc: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org
> > Subject: Re: [Intel-wired-lan] [PATCH net v2] ice: clear number of qs when rings are free
> >
> > On Mon, Mar 20, 2023 at 01:51:17PM +0200, Leon Romanovsky wrote:
> > > On Mon, Mar 20, 2023 at 12:23:47PM +0100, Michal Swiatkowski wrote:
> > > > In case rebuild fails not clearing this field can lead to call trace.
> > > >
> > > > [ +0.009792] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > > > [ +0.000009] #PF: supervisor read access in kernel mode
> > > > [ +0.000006] #PF: error_code(0x0000) - not-present page
> > > > [ +0.000005] PGD 0 P4D 0
> > > > [ +0.000009] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > [ +0.000009] CPU: 45 PID: 77867 Comm: ice-ptp-0000:60 Kdump: loaded Tainted: G S OE 6.2.0-rc6+ #110
> > > > [ +0.000010] Hardware name: Dell Inc. PowerEdge R740/0JMK61, BIOS 2.11.2 004/21/2021
> > > > [ +0.000005] RIP: 0010:ice_ptp_update_cached_phctime+0xb0/0x130 [ice]
> > > > [ +0.000145] Code: fa 7e 55 48 8b 93 48 01 00 00 48 8b 0c fa 48 85 c9 74 e1 8b 51 68 85 d2 75 da 66 83 b9 86 04 00 00 00 74 d0 31 d2 48 8b 71 20 <48> 8b 34 d6 48 85 f6 74 07 48 89 86 d8 00 00 00 0f b7 b1 86 04 00
> > > > [ +0.000008] RSP: 0018:ffffa036cf7c7ea8 EFLAGS: 00010246
> > > > [ +0.000008] RAX: 174ab1a8ab400f43 RBX: ffff937cda2c01a0 RCX: ffff937cdca9b028
> > > > [ +0.000005] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > > [ +0.000005] RBP: ffffa036cf7c7eb8 R08: 0000000000000000 R09: 0000000000000000
> > > > [ +0.000005] R10: 0000000000000080 R11: 0000000000000001 R12: ffff937cdc971f40
> > > > [ +0.000006] R13: ffff937cdc971f44 R14: 0000000000000001 R15: ffffffffc13f3210
> > > > [ +0.000005] FS: 0000000000000000(0000) GS:ffff93826f980000(0000) knlGS:0000000000000000
> > > > [ +0.000006] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ +0.000006] CR2: 0000000000000000 CR3: 00000004b7310002 CR4: 00000000007726e0
> > > > [ +0.000006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [ +0.000004] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > [ +0.000005] PKRU: 55555554
> > > > [ +0.000004] Call Trace:
> > > > [ +0.000004] <TASK>
> > > > [ +0.000007] ice_ptp_periodic_work+0x2a/0x60 [ice]
> > > > [ +0.000126] kthread_worker_fn+0xa6/0x250
> > > > [ +0.000014] ? __pfx_kthread_worker_fn+0x10/0x10
> > > > [ +0.000010] kthread+0xfc/0x130
> > > > [ +0.000009] ? __pfx_kthread+0x10/0x10
> > > > [ +0.000010] ret_from_fork+0x29/0x50
> > > >
> > > > ice_ptp_update_cached_phctime() is calling ice_for_each_rxq macro,
> > > > in case of rebuild fail the rx_ring is NULL and there is NULL
> > > > pointer dereference.
> > > >
> > > > Also for future safety it is better to clear the size values for tx
> > > > and rx ring when they are cleared.
> > > >
> > > > Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
> > > > Reported-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
> > > > Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> > > > ---
> > > > v1 --> v2:
> > > > * change subject to net and add fixes tag
> > > > ---
> > > > drivers/net/ethernet/intel/ice/ice_lib.c | 2 ++
> > > > 1 file changed, 2 insertions(+)
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
Thread overview: 10+ messages
2023-03-20 11:23 [Intel-wired-lan] [PATCH net v2] ice: clear number of qs when rings are free Michal Swiatkowski
2023-03-20 11:23 ` Michal Swiatkowski
2023-03-20 11:51 ` [Intel-wired-lan] " Leon Romanovsky
2023-03-20 11:51 ` Leon Romanovsky
2023-03-20 14:59 ` [Intel-wired-lan] " Michal Swiatkowski
2023-03-20 14:59 ` Michal Swiatkowski
2023-04-07 16:12 ` [Intel-wired-lan] " Mekala, SunithaX D
2023-04-07 16:12 ` Mekala, SunithaX D
2023-04-07 16:58 ` Maciej Fijalkowski [this message]
2023-04-07 16:58 ` Maciej Fijalkowski