* Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure
2022-11-08 10:25 [PATCH net] iavf: Do not restart Tx queues after reset task failure Ivan Vecera
@ 2022-11-08 16:40 ` Jacob Keller
2022-11-09 18:20 ` Leon Romanovsky
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Jacob Keller @ 2022-11-08 16:40 UTC (permalink / raw)
To: Ivan Vecera, netdev
Cc: sassmann, Patryk Piotrowski, SlawomirX Laba, Jesse Brandeburg,
Tony Nguyen, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, moderated list:INTEL ETHERNET DRIVERS, open list
On 11/8/2022 2:25 AM, Ivan Vecera wrote:
> After commit aa626da947e9 ("iavf: Detach device during reset task")
> the device is detached during reset task and re-attached at its end.
> The problem occurs when reset task fails because Tx queues are
> restarted during device re-attach and this leads later to a crash.
>
> To resolve this issue properly close the net device in cause of
> failure in reset task to avoid restarting of tx queues at the end.
> Also replace the hacky manipulation with IFF_UP flag by device close
> that clears properly both IFF_UP and __LINK_STATE_START flags.
> In these case iavf_close() does not do anything because the adapter
> state is already __IAVF_DOWN.
Thanks for fixing this. I thought we'd removed the last such use of
incorrect IFF_UP munging in 7d59706dbef8 ("Revert "iavf: Fix deadlock
occurrence during resetting VF interface""), but apparently we had
missed one...
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure
2022-11-08 10:25 [PATCH net] iavf: Do not restart Tx queues after reset task failure Ivan Vecera
2022-11-08 16:40 ` Jacob Keller
@ 2022-11-09 18:20 ` Leon Romanovsky
2022-11-09 20:11 ` Keller, Jacob E
2022-11-18 14:30 ` [Intel-wired-lan] " Jankowski, Konrad0
2022-11-18 14:31 ` Jankowski, Konrad0
3 siblings, 1 reply; 11+ messages in thread
From: Leon Romanovsky @ 2022-11-09 18:20 UTC (permalink / raw)
To: Ivan Vecera
Cc: netdev, sassmann, Jacob Keller, Patryk Piotrowski, SlawomirX Laba,
Jesse Brandeburg, Tony Nguyen, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni,
moderated list:INTEL ETHERNET DRIVERS, open list
On Tue, Nov 08, 2022 at 11:25:02AM +0100, Ivan Vecera wrote:
> After commit aa626da947e9 ("iavf: Detach device during reset task")
> the device is detached during reset task and re-attached at its end.
> The problem occurs when reset task fails because Tx queues are
> restarted during device re-attach and this leads later to a crash.
<...>
> + if (netif_running(netdev)) {
> + /* Close device to ensure that Tx queues will not be started
> + * during netif_device_attach() at the end of the reset task.
> + */
> + rtnl_lock();
> + dev_close(netdev);
> + rtnl_unlock();
> + }
Sorry for my naive question, I see this pattern a lot (including RDMA),
so curious. Everyone checks netif_running() outside of rtnl_lock, while
dev_close() changes state bit __LINK_STATE_START. Shouldn't rtnl_lock()
placed before netif_running()?
Thanks
> +
> dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
> reset_finish:
> rtnl_lock();
> --
> 2.37.4
>
^ permalink raw reply [flat|nested] 11+ messages in thread* RE: [PATCH net] iavf: Do not restart Tx queues after reset task failure
2022-11-09 18:20 ` Leon Romanovsky
@ 2022-11-09 20:11 ` Keller, Jacob E
2022-11-10 9:17 ` Leon Romanovsky
2022-11-10 14:51 ` Ivan Vecera
0 siblings, 2 replies; 11+ messages in thread
From: Keller, Jacob E @ 2022-11-09 20:11 UTC (permalink / raw)
To: Leon Romanovsky, ivecera
Cc: netdev@vger.kernel.org, sassmann@redhat.com, Piotrowski, Patryk,
SlawomirX Laba, Brandeburg, Jesse, Nguyen, Anthony L,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
moderated list:INTEL ETHERNET DRIVERS, open list
> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Wednesday, November 9, 2022 10:21 AM
> To: ivecera <ivecera@redhat.com>
> Cc: netdev@vger.kernel.org; sassmann@redhat.com; Keller, Jacob E
> <jacob.e.keller@intel.com>; Piotrowski, Patryk <patryk.piotrowski@intel.com>;
> SlawomirX Laba <slawomirx.laba@intel.com>; Brandeburg, Jesse
> <jesse.brandeburg@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; David S. Miller <davem@davemloft.net>; Eric
> Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> Abeni <pabeni@redhat.com>; moderated list:INTEL ETHERNET DRIVERS <intel-
> wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure
>
> On Tue, Nov 08, 2022 at 11:25:02AM +0100, Ivan Vecera wrote:
> > After commit aa626da947e9 ("iavf: Detach device during reset task")
> > the device is detached during reset task and re-attached at its end.
> > The problem occurs when reset task fails because Tx queues are
> > restarted during device re-attach and this leads later to a crash.
>
> <...>
>
> > + if (netif_running(netdev)) {
> > + /* Close device to ensure that Tx queues will not be started
> > + * during netif_device_attach() at the end of the reset task.
> > + */
> > + rtnl_lock();
> > + dev_close(netdev);
> > + rtnl_unlock();
> > + }
>
> Sorry for my naive question, I see this pattern a lot (including RDMA),
> so curious. Everyone checks netif_running() outside of rtnl_lock, while
> dev_close() changes state bit __LINK_STATE_START. Shouldn't rtnl_lock()
> placed before netif_running()?
Yes I think you're right. A ton of people check it without the lock but I think thats not strictly safe. Is dev_close safe to call when netif_running is false? Why not just remove the check and always call dev_close then.
Thanks,
Jake
>
> Thanks
>
> > +
> > dev_err(&adapter->pdev->dev, "failed to allocate resources during
> reinit\n");
> > reset_finish:
> > rtnl_lock();
> > --
> > 2.37.4
> >
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure
2022-11-09 20:11 ` Keller, Jacob E
@ 2022-11-10 9:17 ` Leon Romanovsky
2022-11-10 14:51 ` Ivan Vecera
1 sibling, 0 replies; 11+ messages in thread
From: Leon Romanovsky @ 2022-11-10 9:17 UTC (permalink / raw)
To: Keller, Jacob E
Cc: ivecera, netdev@vger.kernel.org, sassmann@redhat.com,
Piotrowski, Patryk, SlawomirX Laba, Brandeburg, Jesse,
Nguyen, Anthony L, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, moderated list:INTEL ETHERNET DRIVERS, open list
On Wed, Nov 09, 2022 at 08:11:55PM +0000, Keller, Jacob E wrote:
>
>
> > -----Original Message-----
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Wednesday, November 9, 2022 10:21 AM
> > To: ivecera <ivecera@redhat.com>
> > Cc: netdev@vger.kernel.org; sassmann@redhat.com; Keller, Jacob E
> > <jacob.e.keller@intel.com>; Piotrowski, Patryk <patryk.piotrowski@intel.com>;
> > SlawomirX Laba <slawomirx.laba@intel.com>; Brandeburg, Jesse
> > <jesse.brandeburg@intel.com>; Nguyen, Anthony L
> > <anthony.l.nguyen@intel.com>; David S. Miller <davem@davemloft.net>; Eric
> > Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> > Abeni <pabeni@redhat.com>; moderated list:INTEL ETHERNET DRIVERS <intel-
> > wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org>
> > Subject: Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure
> >
> > On Tue, Nov 08, 2022 at 11:25:02AM +0100, Ivan Vecera wrote:
> > > After commit aa626da947e9 ("iavf: Detach device during reset task")
> > > the device is detached during reset task and re-attached at its end.
> > > The problem occurs when reset task fails because Tx queues are
> > > restarted during device re-attach and this leads later to a crash.
> >
> > <...>
> >
> > > + if (netif_running(netdev)) {
> > > + /* Close device to ensure that Tx queues will not be started
> > > + * during netif_device_attach() at the end of the reset task.
> > > + */
> > > + rtnl_lock();
> > > + dev_close(netdev);
> > > + rtnl_unlock();
> > > + }
> >
> > Sorry for my naive question, I see this pattern a lot (including RDMA),
> > so curious. Everyone checks netif_running() outside of rtnl_lock, while
> > dev_close() changes state bit __LINK_STATE_START. Shouldn't rtnl_lock()
> > placed before netif_running()?
>
> Yes I think you're right. A ton of people check it without the lock but I think thats not strictly safe. Is dev_close safe to call when netif_running is false? Why not just remove the check and always call dev_close then.
I honestly don't know.
To remove any doubts, this patch is LGTM.
Thanks,
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure
2022-11-09 20:11 ` Keller, Jacob E
2022-11-10 9:17 ` Leon Romanovsky
@ 2022-11-10 14:51 ` Ivan Vecera
2022-11-10 17:07 ` Leon Romanovsky
1 sibling, 1 reply; 11+ messages in thread
From: Ivan Vecera @ 2022-11-10 14:51 UTC (permalink / raw)
To: Keller, Jacob E
Cc: Leon Romanovsky, netdev@vger.kernel.org, sassmann@redhat.com,
Piotrowski, Patryk, SlawomirX Laba, Brandeburg, Jesse,
Nguyen, Anthony L, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, moderated list:INTEL ETHERNET DRIVERS, open list
On Wed, 9 Nov 2022 20:11:55 +0000
"Keller, Jacob E" <jacob.e.keller@intel.com> wrote:
> > Sorry for my naive question, I see this pattern a lot (including RDMA),
> > so curious. Everyone checks netif_running() outside of rtnl_lock, while
> > dev_close() changes state bit __LINK_STATE_START. Shouldn't rtnl_lock()
> > placed before netif_running()?
>
> Yes I think you're right. A ton of people check it without the lock but I think thats not strictly safe. Is dev_close safe to call when netif_running is false? Why not just remove the check and always call dev_close then.
>
> Thanks,
> Jake
Check for a bit value (like netif_runnning()) is much cheaper than unconditionally
taking global lock like RTNL.
Ivan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure
2022-11-10 14:51 ` Ivan Vecera
@ 2022-11-10 17:07 ` Leon Romanovsky
[not found] ` <20221110122418.32414666@kernel.org>
0 siblings, 1 reply; 11+ messages in thread
From: Leon Romanovsky @ 2022-11-10 17:07 UTC (permalink / raw)
To: Ivan Vecera
Cc: Keller, Jacob E, netdev@vger.kernel.org, sassmann@redhat.com,
Piotrowski, Patryk, SlawomirX Laba, Brandeburg, Jesse,
Nguyen, Anthony L, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, moderated list:INTEL ETHERNET DRIVERS, open list
On Thu, Nov 10, 2022 at 03:51:47PM +0100, Ivan Vecera wrote:
> On Wed, 9 Nov 2022 20:11:55 +0000
> "Keller, Jacob E" <jacob.e.keller@intel.com> wrote:
>
> > > Sorry for my naive question, I see this pattern a lot (including RDMA),
> > > so curious. Everyone checks netif_running() outside of rtnl_lock, while
> > > dev_close() changes state bit __LINK_STATE_START. Shouldn't rtnl_lock()
> > > placed before netif_running()?
> >
> > Yes I think you're right. A ton of people check it without the lock but I think thats not strictly safe. Is dev_close safe to call when netif_running is false? Why not just remove the check and always call dev_close then.
> >
> > Thanks,
> > Jake
>
> Check for a bit value (like netif_runnning()) is much cheaper than unconditionally
> taking global lock like RTNL.
This cheap operation is racy and performed in non-performance critical path.
Thanks
>
> Ivan
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: [Intel-wired-lan] [PATCH net] iavf: Do not restart Tx queues after reset task failure
2022-11-08 10:25 [PATCH net] iavf: Do not restart Tx queues after reset task failure Ivan Vecera
2022-11-08 16:40 ` Jacob Keller
2022-11-09 18:20 ` Leon Romanovsky
@ 2022-11-18 14:30 ` Jankowski, Konrad0
2022-11-18 14:31 ` Jankowski, Konrad0
3 siblings, 0 replies; 11+ messages in thread
From: Jankowski, Konrad0 @ 2022-11-18 14:30 UTC (permalink / raw)
To: ivecera, netdev@vger.kernel.org
Cc: SlawomirX Laba, Eric Dumazet,
moderated list:INTEL ETHERNET DRIVERS, open list,
Piotrowski, Patryk, Jakub Kicinski, Paolo Abeni, David S. Miller,
sassmann@redhat.com
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Ivan
> Vecera
> Sent: Tuesday, November 8, 2022 11:25 AM
> To: netdev@vger.kernel.org
> Cc: SlawomirX Laba <slawomirx.laba@intel.com>; Eric Dumazet
> <edumazet@google.com>; moderated list:INTEL ETHERNET DRIVERS <intel-
> wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org>;
> Piotrowski, Patryk <patryk.piotrowski@intel.com>; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David S. Miller
> <davem@davemloft.net>; sassmann@redhat.com
> Subject: [Intel-wired-lan] [PATCH net] iavf: Do not restart Tx queues after reset
> task failure
>
> After commit aa626da947e9 ("iavf: Detach device during reset task") the device
> is detached during reset task and re-attached at its end.
> The problem occurs when reset task fails because Tx queues are restarted during
> device re-attach and this leads later to a crash.
>
> To resolve this issue properly close the net device in cause of failure in reset task
> to avoid restarting of tx queues at the end.
> Also replace the hacky manipulation with IFF_UP flag by device close that clears
> properly both IFF_UP and __LINK_STATE_START flags.
> In these case iavf_close() does not do anything because the adapter state is
> already __IAVF_DOWN.
>
> Reproducer:
> 1) Run some Tx traffic (e.g. iperf3) over iavf interface
> 2) Set VF trusted / untrusted in loop
>
> [root@host ~]# cat repro.sh
> #!/bin/sh
>
> PF=enp65s0f0
> IF=${PF}v0
>
> ip link set up $IF
> ip addr add 192.168.0.2/24 dev $IF
> sleep 1
>
> iperf3 -c 192.168.0.1 -t 600 --logfile /dev/null & sleep 2
>
> while :; do
> ip link set $PF vf 0 trust on
> ip link set $PF vf 0 trust off
> done
> [root@host ~]# ./repro.sh
>
> Result:
> [ 2006.650969] iavf 0000:41:01.0: Failed to init adminq: -53 [ 2006.675662] ice
> 0000:41:00.0: VF 0 is now trusted [ 2006.689997] iavf 0000:41:01.0: Reset task
> did not complete, VF disabled [ 2006.696611] iavf 0000:41:01.0: failed to allocate
> resources during reinit [ 2006.703209] ice 0000:41:00.0: VF 0 is now untrusted [
> 2006.737011] ice 0000:41:00.0: VF 0 is now trusted [ 2006.764536] ice
> 0000:41:00.0: VF 0 is now untrusted [ 2006.768919] BUG: kernel NULL pointer
> dereference, address: 0000000000000b4a [ 2006.776358] #PF: supervisor read
> access in kernel mode [ 2006.781488] #PF: error_code(0x0000) - not-present
> page [ 2006.786620] PGD 0 P4D 0 [ 2006.789152] Oops: 0000 [#1] PREEMPT SMP
> NOPTI [ 2006.792903] ice 0000:41:00.0: VF 0 is now trusted [ 2006.793501] CPU:
> 4 PID: 0 Comm: swapper/4 Kdump: loaded Not tainted 6.1.0-rc3+ #2 [
> 2006.805668] Hardware name: Abacus electric, s.r.o. - servis@abacus.cz Super
> Server/H12SSW-iN, BIOS 2.4 04/13/2022 [ 2006.815915] RIP:
> 0010:iavf_xmit_frame_ring+0x96/0xf70 [iavf] [ 2006.821028] ice 0000:41:00.0:
> VF 0 is now untrusted [ 2006.821572] Code: 48 83 c1 04 48 c1 e1 04 48 01 f9 48
> 83 c0 10 6b 50 f8 55 c1 ea 14 45 8d 64 14 01 48 39 c8 75 eb 41 83 fc 07 0f 8f e9 08
> 00 00 <0f> b7 45 4a 0f b7 55 48 41 8d 74 24 05 31 c9 66 39 d0 0f 86 da 00 [
> 2006.845181] RSP: 0018:ffffb253004bc9e8 EFLAGS: 00010293 [ 2006.850397]
> RAX: ffff9d154de45b00 RBX: ffff9d15497d52e8 RCX: ffff9d154de45b00 [
> 2006.856327] ice 0000:41:00.0: VF 0 is now trusted [ 2006.857523] RDX:
> 0000000000000000 RSI: 00000000000005a8 RDI: ffff9d154de45ac0 [
> 2006.857525] RBP: 0000000000000b00 R08: ffff9d159cb010ac R09:
> 0000000000000001 [ 2006.857526] R10: ffff9d154de45940 R11:
> 0000000000000000 R12: 0000000000000002 [ 2006.883600] R13:
> ffff9d1770838dc0 R14: 0000000000000000 R15: ffffffffc07b8380 [ 2006.885840]
> ice 0000:41:00.0: VF 0 is now untrusted [ 2006.890725] FS:
> 0000000000000000(0000) GS:ffff9d248e900000(0000) knlGS:0000000000000000
> [ 2006.890727] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [
> 2006.909419] CR2: 0000000000000b4a CR3: 0000000c39c10002 CR4:
> 0000000000770ee0 [ 2006.916543] PKRU: 55555554 [ 2006.918254] ice
> 0000:41:00.0: VF 0 is now trusted [ 2006.919248] Call Trace:
> [ 2006.919250] <IRQ>
> [ 2006.919252] dev_hard_start_xmit+0x9e/0x1f0 [ 2006.932587]
> sch_direct_xmit+0xa0/0x370 [ 2006.936424] __dev_queue_xmit+0x7af/0xd00 [
> 2006.940429] ip_finish_output2+0x26c/0x540 [ 2006.944519]
> ip_output+0x71/0x110 [ 2006.947831] ? __ip_finish_output+0x2b0/0x2b0 [
> 2006.952180] __ip_queue_xmit+0x16d/0x400 [ 2006.952721] ice 0000:41:00.0:
> VF 0 is now untrusted [ 2006.956098] __tcp_transmit_skb+0xa96/0xbf0 [
> 2006.965148] __tcp_retransmit_skb+0x174/0x860 [ 2006.969499] ?
> cubictcp_cwnd_event+0x40/0x40 [ 2006.973769]
> tcp_retransmit_skb+0x14/0xb0 ...
>
> Fixes: aa626da947e9 ("iavf: Detach device during reset task")
> Cc: Jacob Keller <jacob.e.keller@intel.com>
> Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
> Cc: SlawomirX Laba <slawomirx.laba@intel.com>
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>
> ---
> drivers/net/ethernet/intel/iavf/iavf_main.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c
> b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 5abcd66e7c7a..b66f8fa1d83b 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
^ permalink raw reply [flat|nested] 11+ messages in thread* RE: [Intel-wired-lan] [PATCH net] iavf: Do not restart Tx queues after reset task failure
2022-11-08 10:25 [PATCH net] iavf: Do not restart Tx queues after reset task failure Ivan Vecera
` (2 preceding siblings ...)
2022-11-18 14:30 ` [Intel-wired-lan] " Jankowski, Konrad0
@ 2022-11-18 14:31 ` Jankowski, Konrad0
3 siblings, 0 replies; 11+ messages in thread
From: Jankowski, Konrad0 @ 2022-11-18 14:31 UTC (permalink / raw)
To: ivecera, netdev@vger.kernel.org
Cc: SlawomirX Laba, Eric Dumazet,
moderated list:INTEL ETHERNET DRIVERS, open list,
Piotrowski, Patryk, Jakub Kicinski, Paolo Abeni, David S. Miller,
sassmann@redhat.com
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Ivan
> Vecera
> Sent: Tuesday, November 8, 2022 11:25 AM
> To: netdev@vger.kernel.org
> Cc: SlawomirX Laba <slawomirx.laba@intel.com>; Eric Dumazet
> <edumazet@google.com>; moderated list:INTEL ETHERNET DRIVERS <intel-
> wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org>;
> Piotrowski, Patryk <patryk.piotrowski@intel.com>; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David S. Miller
> <davem@davemloft.net>; sassmann@redhat.com
> Subject: [Intel-wired-lan] [PATCH net] iavf: Do not restart Tx queues after reset
> task failure
>
> After commit aa626da947e9 ("iavf: Detach device during reset task") the device
> is detached during reset task and re-attached at its end.
> The problem occurs when reset task fails because Tx queues are restarted during
> device re-attach and this leads later to a crash.
>
> To resolve this issue properly close the net device in cause of failure in reset task
> to avoid restarting of tx queues at the end.
> Also replace the hacky manipulation with IFF_UP flag by device close that clears
> properly both IFF_UP and __LINK_STATE_START flags.
> In these case iavf_close() does not do anything because the adapter state is
> already __IAVF_DOWN.
>
> Reproducer:
> 1) Run some Tx traffic (e.g. iperf3) over iavf interface
> 2) Set VF trusted / untrusted in loop
>
> [root@host ~]# cat repro.sh
> #!/bin/sh
>
> PF=enp65s0f0
> IF=${PF}v0
>
> ip link set up $IF
> ip addr add 192.168.0.2/24 dev $IF
> sleep 1
>
> iperf3 -c 192.168.0.1 -t 600 --logfile /dev/null & sleep 2
>
> while :; do
> ip link set $PF vf 0 trust on
> ip link set $PF vf 0 trust off
> done
> [root@host ~]# ./repro.sh
>
> Result:
> [ 2006.650969] iavf 0000:41:01.0: Failed to init adminq: -53 [ 2006.675662] ice
> 0000:41:00.0: VF 0 is now trusted [ 2006.689997] iavf 0000:41:01.0: Reset task
> did not complete, VF disabled [ 2006.696611] iavf 0000:41:01.0: failed to allocate
> resources during reinit [ 2006.703209] ice 0000:41:00.0: VF 0 is now untrusted [
> 2006.737011] ice 0000:41:00.0: VF 0 is now trusted [ 2006.764536] ice
> 0000:41:00.0: VF 0 is now untrusted [ 2006.768919] BUG: kernel NULL pointer
> dereference, address: 0000000000000b4a [ 2006.776358] #PF: supervisor read
> access in kernel mode [ 2006.781488] #PF: error_code(0x0000) - not-present
> page [ 2006.786620] PGD 0 P4D 0 [ 2006.789152] Oops: 0000 [#1] PREEMPT SMP
> NOPTI [ 2006.792903] ice 0000:41:00.0: VF 0 is now trusted [ 2006.793501] CPU:
> 4 PID: 0 Comm: swapper/4 Kdump: loaded Not tainted 6.1.0-rc3+ #2 [
> 2006.805668] Hardware name: Abacus electric, s.r.o. - servis@abacus.cz Super
> Server/H12SSW-iN, BIOS 2.4 04/13/2022 [ 2006.815915] RIP:
> 0010:iavf_xmit_frame_ring+0x96/0xf70 [iavf] [ 2006.821028] ice 0000:41:00.0:
> VF 0 is now untrusted [ 2006.821572] Code: 48 83 c1 04 48 c1 e1 04 48 01 f9 48
> 83 c0 10 6b 50 f8 55 c1 ea 14 45 8d 64 14 01 48 39 c8 75 eb 41 83 fc 07 0f 8f e9 08
> 00 00 <0f> b7 45 4a 0f b7 55 48 41 8d 74 24 05 31 c9 66 39 d0 0f 86 da 00 [
> 2006.845181] RSP: 0018:ffffb253004bc9e8 EFLAGS: 00010293 [ 2006.850397]
> RAX: ffff9d154de45b00 RBX: ffff9d15497d52e8 RCX: ffff9d154de45b00 [
> 2006.856327] ice 0000:41:00.0: VF 0 is now trusted [ 2006.857523] RDX:
> 0000000000000000 RSI: 00000000000005a8 RDI: ffff9d154de45ac0 [
> 2006.857525] RBP: 0000000000000b00 R08: ffff9d159cb010ac R09:
> 0000000000000001 [ 2006.857526] R10: ffff9d154de45940 R11:
> 0000000000000000 R12: 0000000000000002 [ 2006.883600] R13:
> ffff9d1770838dc0 R14: 0000000000000000 R15: ffffffffc07b8380 [ 2006.885840]
> ice 0000:41:00.0: VF 0 is now untrusted [ 2006.890725] FS:
> 0000000000000000(0000) GS:ffff9d248e900000(0000) knlGS:0000000000000000
> [ 2006.890727] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [
> 2006.909419] CR2: 0000000000000b4a CR3: 0000000c39c10002 CR4:
> 0000000000770ee0 [ 2006.916543] PKRU: 55555554 [ 2006.918254] ice
> 0000:41:00.0: VF 0 is now trusted [ 2006.919248] Call Trace:
> [ 2006.919250] <IRQ>
> [ 2006.919252] dev_hard_start_xmit+0x9e/0x1f0 [ 2006.932587]
> sch_direct_xmit+0xa0/0x370 [ 2006.936424] __dev_queue_xmit+0x7af/0xd00 [
> 2006.940429] ip_finish_output2+0x26c/0x540 [ 2006.944519]
> ip_output+0x71/0x110 [ 2006.947831] ? __ip_finish_output+0x2b0/0x2b0 [
> 2006.952180] __ip_queue_xmit+0x16d/0x400 [ 2006.952721] ice 0000:41:00.0:
> VF 0 is now untrusted [ 2006.956098] __tcp_transmit_skb+0xa96/0xbf0 [
> 2006.965148] __tcp_retransmit_skb+0x174/0x860 [ 2006.969499] ?
> cubictcp_cwnd_event+0x40/0x40 [ 2006.973769]
> tcp_retransmit_skb+0x14/0xb0 ...
>
> Fixes: aa626da947e9 ("iavf: Detach device during reset task")
> Cc: Jacob Keller <jacob.e.keller@intel.com>
> Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
> Cc: SlawomirX Laba <slawomirx.laba@intel.com>
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>
> ---
> drivers/net/ethernet/intel/iavf/iavf_main.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c
> b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 5abcd66e7c7a..b66f8fa1d83b 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -2921,7 +2921,6 @@ static void iavf_disable_vf(struct iavf_adapter
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
^ permalink raw reply [flat|nested] 11+ messages in thread