From: Tony Nguyen <anthony.l.nguyen@intel.com>
To: Stefan Assmann <sassmann@redhat.com>, Ivan Vecera <ivecera@redhat.com>
Cc: SlawomirX Laba <slawomirx.laba@intel.com>,
Eric Dumazet <edumazet@google.com>,
netdev@vger.kernel.org, open list <linux-kernel@vger.kernel.org>,
Patryk Piotrowski <patryk.piotrowski@intel.com>,
"moderated list:INTEL ETHERNET DRIVERS"
<intel-wired-lan@lists.osuosl.org>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
"David S. Miller" <davem@davemloft.net>
Subject: Re: [Intel-wired-lan] [PATCH net] iavf: Fix a crash during reset task
Date: Mon, 14 Nov 2022 16:49:15 -0800
Message-ID: <2a630a76-bcfb-d08d-619d-eafa6a7b1025@intel.com>
In-Reply-To: <20221108105343.vjczwdxcsxhfghk7@p1>

On 11/8/2022 2:53 AM, Stefan Assmann wrote:
> On 2022-11-08 10:35, Ivan Vecera wrote:
>> Recent commit aa626da947e9 ("iavf: Detach device during reset task")
>> removed netif_tx_stop_all_queues() with an assumption that Tx queues
>> are already stopped by netif_device_detach() in the beginning of
>> reset task. This assumption is incorrect because during reset
>> task a potential link event can start Tx queues again.
>> Revert this change to fix this issue.
>>
>> Reproducer:
>> 1. Run some Tx traffic (e.g. iperf3) over iavf interface
>> 2. Switch MTU of this interface in a loop
>>
>> [root@host ~]# cat repro.sh
>> #!/bin/sh
>>
>> IF=enp2s0f0v0
>>
>> iperf3 -c 192.168.0.1 -t 600 --logfile /dev/null &
>> sleep 2
>>
>> while :; do
>>     for i in 1280 1500 2000 900 ; do
>>         ip link set $IF mtu $i
>>         sleep 2
>>     done
>> done
>
> With this patch applied iavf doesn't crash anymore but after a few
> cycles with the reproducer tx timeouts are observed.
>
> [ 47.551151] iavf 0000:00:09.0 eth0: NIC Link is Up Speed is 10 Gbps Full Duplex
> [ 54.035902] ------------[ cut here ]------------
> [ 54.037397] NETDEV WATCHDOG: eth0 (iavf): transmit queue 3 timed out
> [ 54.039264] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:526 dev_watchdog+0x20f/0x250
> [ 54.041524] Modules linked in: 8021q intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass rapl pcspkr drm ramoops reed_solomon crct10dif_pclmul crc32_pclmul crc32c_intel ata_generic pata_acpi ghash_clmulni_intel ata_piix aesni_intel crypto_simd iavf libata be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
> [ 54.049723] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.1.0-rc2+ #90
> [ 54.051049] Hardware name: Red Hat KVM, BIOS 1.15.0-2.module+el8.6.0+14757+c25ee005 04/01/2014
> [ 54.052898] RIP: 0010:dev_watchdog+0x20f/0x250
> [ 54.053907] Code: 00 e9 4d ff ff ff 48 89 df c6 05 92 24 96 01 01 e8 c6 f2 f8 ff 44 89 e9 48 89 de 48 c7 c7 28 7f f6 a0 48 89 c2 e8 6e 65 23 00 <0f> 0b e9 2f ff ff ff e8 25 06 2a 00 85 c0 74 b5 80 3d 74 1b 96 01
> [ 54.057282] RSP: 0018:ffffaf56c00e0e80 EFLAGS: 00010282
> [ 54.058164] RAX: 0000000000000000 RBX: ffff993ed95b8000 RCX: 0000000000000103
> [ 54.059345] RDX: 0000000000000103 RSI: 00000000000000f6 RDI: 00000000ffffffff
> [ 54.060473] RBP: ffff993ed95b8508 R08: 0000000000000000 R09: c0000000fff7ffff
> [ 54.061558] R10: 0000000000000001 R11: ffffaf56c00e0d18 R12: ffff993ed95b8420
> [ 54.062640] R13: 0000000000000003 R14: ffff993ed95b8508 R15: ffff993ef74a06c0
> [ 54.063681] FS: 0000000000000000(0000) GS:ffff993ef7480000(0000) knlGS:0000000000000000
> [ 54.064867] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 54.065654] CR2: 00007f42309e1280 CR3: 0000000107f6a003 CR4: 0000000000170ee0
> [ 54.066612] Call Trace:
> [ 54.066985] <IRQ>
> [ 54.067265] ? mq_change_real_num_tx+0xd0/0xd0
> [ 54.067844] call_timer_fn+0xa1/0x2c0
> [ 54.068330] ? mq_change_real_num_tx+0xd0/0xd0
> [ 54.068916] run_timer_softirq+0x527/0x550
> [ 54.069447] ? lock_is_held_type+0xd8/0x130
> [ 54.069998] __do_softirq+0xc3/0x481
> [ 54.070469] irq_exit_rcu+0xe4/0x120
> [ 54.070963] sysvec_apic_timer_interrupt+0x9e/0xc0
> [ 54.071604] </IRQ>
> [ 54.071909] <TASK>
> [ 54.072223] asm_sysvec_apic_timer_interrupt+0x16/0x20
> [ 54.072942] RIP: 0010:default_idle+0x10/0x20
> [ 54.073533] Code: 89 df 31 f6 5b 5d e9 ff 1c a5 ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 eb 07 0f 00 2d f2 2a 42 00 fb f4 <c3> 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 65
>
> This only occurs when the device is detached and reattached during reset.

Hi Ivan,

Was there going to be an update to the patch to resolve this? If not,
I'll take what there is now.

Thanks,
Tony
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
Thread overview:
2022-11-08 9:35 [Intel-wired-lan] [PATCH net] iavf: Fix a crash during reset task Ivan Vecera
2022-11-08 10:53 ` Stefan Assmann
2022-11-15 0:49 ` Tony Nguyen [this message]
2022-11-18 14:29 ` Jankowski, Konrad0
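
When rerunning the reproducer quoted in this thread, the Tx timeouts Stefan reported may be easier to spot with a small log filter. The following is only a sketch: the helper name tx_timeout_queues is hypothetical, and it assumes the watchdog message has the same format as the "NETDEV WATCHDOG" line in the trace quoted in this thread (other kernel versions may word it differently).

```shell
#!/bin/sh

# Hypothetical helper (not part of the patch or the driver):
# read kernel log text on stdin and print the Tx queue number of every
# watchdog timeout reported for the given interface, one per line.
# Assumes the message format seen in the quoted trace:
#   NETDEV WATCHDOG: <ifname> (<driver>): transmit queue <N> timed out
tx_timeout_queues() {
    ifname=$1
    sed -n "s/.*NETDEV WATCHDOG: ${ifname} ([^)]*): transmit queue \([0-9][0-9]*\) timed out.*/\1/p"
}

# Example usage against the live kernel log (needs permission to read it):
#   dmesg | tx_timeout_queues eth0
```

Run against the log excerpt quoted in this thread, the filter would report queue 3 for eth0; an empty result after several MTU cycles suggests no watchdog timeout was logged in that format.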