* [PATCH] ibmveth: Kernel crash LSO offload flag toggle
@ 2017-11-13 23:01 Bryant G. Ly
  2017-11-14  1:07 ` Daniel Axtens
  0 siblings, 1 reply; 8+ messages in thread
From: Bryant G. Ly @ 2017-11-13 23:01 UTC
  To: benh, paulus, mpe, tlfalcon; +Cc: linuxppc-dev, netdev, Bryant G. Ly, stable

The following script, when run (along with some iperf traffic), recreates the
crash within 5-10 minutes or so:

while true
do
	ethtool -k ibmveth0 | grep tcp-segmentation-offload
	ethtool -K ibmveth0 tso off
	ethtool -k ibmveth0 | grep tcp-segmentation-offload
	ethtool -K ibmveth0 tso on
done

Note: This issue also happens the very first time large send offload is
turned off (but the above script recreates the issue every time).

Stack trace output:
 [76563.914380] NIP [c000000000063940] memcpy_power7+0x40/0x800
[76563.914387] LR [d000000000d31788] ibmveth_start_xmit+0x1c8/0x8d0 [ibmveth]
[76563.914392] Call Trace:
[76563.914396] [c0000000feab3270] [c0000000feab32d0] 0xc0000000feab32d0 (unreliable)
[76563.914407] [c0000000feab3360] [c0000000009816f4] dev_hard_start_xmit+0x304/0x530
[76563.914415] [c0000000feab3440] [c0000000009b6564] sch_direct_xmit+0x124/0x330
[76563.914423] [c0000000feab34e0] [c000000000981ddc] __dev_queue_xmit+0x26c/0x770
[76563.914431] [c0000000feab3580] [c000000000a1efc0] arp_xmit+0x30/0xd0
[76563.914438] [c0000000feab35f0] [c000000000a1f0f4] arp_send_dst.part.0+0x94/0xb0
[76563.914445] [c0000000feab3660] [c000000000a1fcb4] arp_solicit+0x114/0x2b0
[76563.914452] [c0000000feab3730] [c00000000098d8f4] neigh_probe+0x84/0xd0
[76563.914460] [c0000000feab3760] [c0000000009937cc] neigh_timer_handler+0xbc/0x320
[76563.914468] [c0000000feab37a0] [c00000000014a3fc] call_timer_fn+0x5c/0x1c0
[76563.914474] [c0000000feab3830] [c00000000014a8bc] run_timer_softirq+0x31c/0x3f0
[76563.914483] [c0000000feab3900] [c0000000000bec58] __do_softirq+0x188/0x3e0
[76563.914490] [c0000000feab39f0] [c0000000000bf128] irq_exit+0xc8/0x100
[76563.914498] [c0000000feab3a10] [c00000000001f974] timer_interrupt+0xa4/0xe0
[76563.914505] [c0000000feab3a40] [c000000000002714] decrementer_common+0x114/0x180

Oops output:
 [76563.914173] Unable to handle kernel paging request for data at address 0x00000000
[76563.914197] Faulting instruction address: 0xc000000000063940
[76563.914205] Oops: Kernel access of bad area, sig: 11 [#1]
[76563.914210] SMP NR_CPUS=2048 NUMA pSeries
[76563.914217] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 isofs binfmt_misc pseries_rng rtc_generic autofs4 ibmvfc scsi_transport_fc ibmvscsi ibmveth
[76563.914251] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-34-generic #53-Ubuntu
[76563.914258] task: c0000000fe9efcc0 ti: c0000000feab0000 task.ti: c0000000feab0000
[76563.914265] NIP: c000000000063940 LR: d000000000d31788 CTR: c000000000064100
[76563.914271] REGS: c0000000feab2ff0 TRAP: 0300   Not tainted  (4.4.0-34-generic)
[76563.914277] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 4800284e  XER: 0000001a
[76563.914294] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: 000000000000f240 c0000000feab3270 c0000000015b5d00 0000000000000000
GPR04: c00000000d9b0004 000000000000002a 0000000000000006 c0000000efc0ccac
GPR08: d000000000d3dd28 0000000000000080 0000000000000000 d000000000d34758
GPR12: c000000000064100 c00000000e7f1c80 c0000000ffdca938 0000000000000100
GPR16: c0000000ffdca738 c0000000ffdca538 c0000000feab34a0 c0000000015f4d00
GPR20: 0000000000000000 c0000000015f4cf0 c0000000f5945900 0000000000000000
GPR24: 0000000000000000 0000000080000000 c0000000feab32d0 c0000000efc0ccac
GPR28: c0000000f23ccb00 c0000000f5945000 c0000000f23ccb00 c0000000efc0cc00
[76563.914380] NIP [c000000000063940] memcpy_power7+0x40/0x800
[76563.914387] LR [d000000000d31788] ibmveth_start_xmit+0x1c8/0x8d0 [ibmveth]
[76563.914392] Call Trace:
[76563.914396] [c0000000feab3270] [c0000000feab32d0] 0xc0000000feab32d0 (unreliable)
[76563.914407] [c0000000feab3360] [c0000000009816f4] dev_hard_start_xmit+0x304/0x530
[76563.914415] [c0000000feab3440] [c0000000009b6564] sch_direct_xmit+0x124/0x330
[76563.914423] [c0000000feab34e0] [c000000000981ddc] __dev_queue_xmit+0x26c/0x770
[76563.914431] [c0000000feab3580] [c000000000a1efc0] arp_xmit+0x30/0xd0
[76563.914438] [c0000000feab35f0] [c000000000a1f0f4] arp_send_dst.part.0+0x94/0xb0
[76563.914445] [c0000000feab3660] [c000000000a1fcb4] arp_solicit+0x114/0x2b0
[76563.914452] [c0000000feab3730] [c00000000098d8f4] neigh_probe+0x84/0xd0
[76563.914460] [c0000000feab3760] [c0000000009937cc] neigh_timer_handler+0xbc/0x320
[76563.914468] [c0000000feab37a0] [c00000000014a3fc] call_timer_fn+0x5c/0x1c0
[76563.914474] [c0000000feab3830] [c00000000014a8bc] run_timer_softirq+0x31c/0x3f0
[76563.914483] [c0000000feab3900] [c0000000000bec58] __do_softirq+0x188/0x3e0
[76563.914490] [c0000000feab39f0] [c0000000000bf128] irq_exit+0xc8/0x100
[76563.914498] [c0000000feab3a10] [c00000000001f974] timer_interrupt+0xa4/0xe0
[76563.914505] [c0000000feab3a40] [c000000000002714] decrementer_common+0x114/0x180
[76563.914515] --- interrupt: 901 at plpar_hcall_norets+0x1c/0x28
[76563.914515]     LR = check_and_cede_processor+0x34/0x50
[76563.914525] [c0000000feab3d30] [c000000000916bf0] check_and_cede_processor+0x20/0x50 (unreliable)
[76563.914534] [c0000000feab3d90] [c000000000916e18] shared_cede_loop+0x68/0x170
[76563.914541] [c0000000feab3dd0] [c000000000913e20] cpuidle_enter_state+0x160/0x410
[76563.914549] [c0000000feab3e30] [c000000000119d48] call_cpuidle+0x78/0xd0
[76563.914556] [c0000000feab3e70] [c00000000011a11c] cpu_startup_entry+0x37c/0x480
[76563.914564] [c0000000feab3f30] [c00000000004563c] start_secondary+0x33c/0x360
[76563.914572] [c0000000feab3f90] [c000000000008b6c] start_secondary_prolog+0x10/0x14
[76563.914579] Instruction dump:
[76563.914584] 4185024c 7cc400d0 7cd01120 78c60760 409f0014 88040000 38840001 98030000
[76563.914596] 38630001 409e0014 a0040000 38840002 <b0030000> 38630002 409d0014 80040000
[76563.914613] ---[ end trace 5382b3d78671418e ]---
[76563.916817]
[76565.916870] Kernel panic - not syncing: Fatal exception in interrupt
[76565.919468] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # 4.14+
---
 drivers/net/ethernet/ibm/ibmveth.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index f210398200ece..971ed54332d42 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1092,8 +1092,14 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	 */
 	if (force_bounce || (!skb_is_nonlinear(skb) &&
 				(skb->len < tx_copybreak))) {
-		skb_copy_from_linear_data(skb, adapter->bounce_buffer,
-					  skb->len);
+		if (adapter->bounce_buffer) {
+			skb_copy_from_linear_data(skb, adapter->bounce_buffer,
+						  skb->len);
+		} else {
+			adapter->tx_send_failed++;
+			netdev->stats.tx_dropped++;
+			goto out;
+		}
 
 		descs[0].fields.flags_len = desc_flags | skb->len;
 		descs[0].fields.address = adapter->bounce_buffer_dma;
@@ -1691,9 +1697,6 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 		netdev->features |= NETIF_F_FRAGLIST;
 	}
 
-	netdev->min_mtu = IBMVETH_MIN_MTU;
-	netdev->max_mtu = ETH_MAX_MTU;
-
 	memcpy(netdev->dev_addr, mac_addr_p, ETH_ALEN);
 
 	if (firmware_has_feature(FW_FEATURE_CMO))
-- 
2.13.6 (Apple Git-96)


* Re: [PATCH] ibmveth: Kernel crash LSO offload flag toggle
  2017-11-13 23:01 Bryant G. Ly
@ 2017-11-14  1:07 ` Daniel Axtens
  2017-11-14 15:24   ` Bryant G. Ly
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Axtens @ 2017-11-14  1:07 UTC
  To: Bryant G. Ly, benh, paulus, mpe, tlfalcon
  Cc: netdev, linuxppc-dev, stable, Bryant G. Ly

Hi Bryant,

A few things:

1) The commit message could probably be trimmed, especially the stack
traces.

2) What exactly are you changing and why does it fix the issue? I
couldn't figure that out from the commit message.

3) Does this need a Fixes: tag?


>  	}
>  
> -	netdev->min_mtu = IBMVETH_MIN_MTU;
> -	netdev->max_mtu = ETH_MAX_MTU;
> -

4) What does the above hunk do? It seems unrelated...


Regards,
Daniel


* Re: [PATCH] ibmveth: Kernel crash LSO offload flag toggle
  2017-11-14  1:07 ` Daniel Axtens
@ 2017-11-14 15:24   ` Bryant G. Ly
  0 siblings, 0 replies; 8+ messages in thread
From: Bryant G. Ly @ 2017-11-14 15:24 UTC
  To: Daniel Axtens, benh, paulus, mpe, tlfalcon; +Cc: netdev, linuxppc-dev, stable

On 11/13/17 7:07 PM, Daniel Axtens wrote:

> Hi Bryant,
>
> A few things:
>
> 1) The commit message could probably be trimmed, especially the stack
> traces.

Yes, I can trim it. 

> 2) What exactly are you changing and why does it fix the issue? I
> couldn't figure that out from the commit message.

The bounce buffer can be null when skb_copy_from_linear_data tries to use
it, and that causes the kernel crash. The patch therefore checks whether
the bounce buffer is null before using it.

>
> 3) Does this need a Fixes: tag?
>
Honestly, this bug has existed for a very long time. It only became more evident
after commit 66aa0678efc29abd2ab02a09b23f9a8bc9f12a6c was merged. It is rare for a
customer to hit this issue, since a customer typically toggles LSO once and leaves
it; to recreate this bug one would need to write a script like the one in the
commit message to toggle it on and off in a loop. It's ultimately a legacy bug
that was uncovered more recently.
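
For reference, a Fixes tag pointing at that commit would look like:

Fixes: 66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")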

>>  	}
>>  
>> -	netdev->min_mtu = IBMVETH_MIN_MTU;
>> -	netdev->max_mtu = ETH_MAX_MTU;
>> -
> 4) What does the above hunk do? It seems unrelated...
>
You are right, the above is just cleanup; I should have separated it into its own patch.

> Regards,
> Daniel
>


* [PATCH] ibmveth: Kernel crash LSO offload flag toggle
@ 2017-11-14 15:34 Bryant G. Ly
  2017-11-15  2:47 ` Daniel Axtens
  0 siblings, 1 reply; 8+ messages in thread
From: Bryant G. Ly @ 2017-11-14 15:34 UTC
  To: benh, paulus, mpe, tlfalcon
  Cc: linuxppc-dev, netdev, Bryant G. Ly, Sivakumar Krishnasamy, stable

The following patch ensures that the bounce_buffer is not null
prior to using it within skb_copy_from_linear_data.

The problem can be recreated by toggling large send offload on and off.

The following script, when run (along with some iperf traffic), recreates the
crash within 5-10 minutes or so:

while true
do
	ethtool -k ibmveth0 | grep tcp-segmentation-offload
	ethtool -K ibmveth0 tso off
	ethtool -k ibmveth0 | grep tcp-segmentation-offload
	ethtool -K ibmveth0 tso on
done

Note: This issue also happens the very first time large send offload is
turned off (but the above script recreates the issue every time).

[76563.914173] Unable to handle kernel paging request for data at address 0x00000000
[76563.914197] Faulting instruction address: 0xc000000000063940
[76563.914205] Oops: Kernel access of bad area, sig: 11 [#1]
[76563.914210] SMP NR_CPUS=2048 NUMA pSeries
[76563.914217] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 isofs binfmt_misc pseries_rng rtc_generic autofs4 ibmvfc scsi_transport_fc ibmvscsi ibmveth
[76563.914251] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-34-generic #53-Ubuntu
[76563.914258] task: c0000000fe9efcc0 ti: c0000000feab0000 task.ti: c0000000feab0000
[76563.914265] NIP: c000000000063940 LR: d000000000d31788 CTR: c000000000064100
[76563.914271] REGS: c0000000feab2ff0 TRAP: 0300   Not tainted  (4.4.0-34-generic)
[76563.914277] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 4800284e  XER: 0000001a
[76563.914294] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: 000000000000f240 c0000000feab3270 c0000000015b5d00 0000000000000000
GPR04: c00000000d9b0004 000000000000002a 0000000000000006 c0000000efc0ccac
GPR08: d000000000d3dd28 0000000000000080 0000000000000000 d000000000d34758
GPR12: c000000000064100 c00000000e7f1c80 c0000000ffdca938 0000000000000100
GPR16: c0000000ffdca738 c0000000ffdca538 c0000000feab34a0 c0000000015f4d00
GPR20: 0000000000000000 c0000000015f4cf0 c0000000f5945900 0000000000000000
GPR24: 0000000000000000 0000000080000000 c0000000feab32d0 c0000000efc0ccac
GPR28: c0000000f23ccb00 c0000000f5945000 c0000000f23ccb00 c0000000efc0cc00
[76563.914380] NIP [c000000000063940] memcpy_power7+0x40/0x800
[76563.914387] LR [d000000000d31788] ibmveth_start_xmit+0x1c8/0x8d0 [ibmveth]
[76563.914392] Call Trace:
[76563.914396] [c0000000feab3270] [c0000000feab32d0] 0xc0000000feab32d0 (unreliable)
[76563.914407] [c0000000feab3360] [c0000000009816f4] dev_hard_start_xmit+0x304/0x530
[76563.914415] [c0000000feab3440] [c0000000009b6564] sch_direct_xmit+0x124/0x330
[76563.914423] [c0000000feab34e0] [c000000000981ddc] __dev_queue_xmit+0x26c/0x770
[76563.914431] [c0000000feab3580] [c000000000a1efc0] arp_xmit+0x30/0xd0
[76563.914438] [c0000000feab35f0] [c000000000a1f0f4] arp_send_dst.part.0+0x94/0xb0
[76563.914445] [c0000000feab3660] [c000000000a1fcb4] arp_solicit+0x114/0x2b0
[76563.914452] [c0000000feab3730] [c00000000098d8f4] neigh_probe+0x84/0xd0
[76563.914460] [c0000000feab3760] [c0000000009937cc] neigh_timer_handler+0xbc/0x320
[76563.914468] [c0000000feab37a0] [c00000000014a3fc] call_timer_fn+0x5c/0x1c0
[76563.914474] [c0000000feab3830] [c00000000014a8bc] run_timer_softirq+0x31c/0x3f0
[76563.914483] [c0000000feab3900] [c0000000000bec58] __do_softirq+0x188/0x3e0
[76563.914490] [c0000000feab39f0] [c0000000000bf128] irq_exit+0xc8/0x100
[76563.914498] [c0000000feab3a10] [c00000000001f974] timer_interrupt+0xa4/0xe0
[76563.914505] [c0000000feab3a40] [c000000000002714] decrementer_common+0x114/0x180
[76563.914515] --- interrupt: 901 at plpar_hcall_norets+0x1c/0x28
[76563.914515]     LR = check_and_cede_processor+0x34/0x50
[76563.914525] [c0000000feab3d30] [c000000000916bf0] check_and_cede_processor+0x20/0x50 (unreliable)
[76563.914534] [c0000000feab3d90] [c000000000916e18] shared_cede_loop+0x68/0x170
[76563.914541] [c0000000feab3dd0] [c000000000913e20] cpuidle_enter_state+0x160/0x410
[76563.914549] [c0000000feab3e30] [c000000000119d48] call_cpuidle+0x78/0xd0
[76563.914556] [c0000000feab3e70] [c00000000011a11c] cpu_startup_entry+0x37c/0x480
[76563.914564] [c0000000feab3f30] [c00000000004563c] start_secondary+0x33c/0x360
[76563.914572] [c0000000feab3f90] [c000000000008b6c] start_secondary_prolog+0x10/0x14

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Sivakumar Krishnasamy <ksiva@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # 4.4+
---
 drivers/net/ethernet/ibm/ibmveth.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index f210398200ece..1d29b1649118d 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1092,8 +1092,14 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	 */
 	if (force_bounce || (!skb_is_nonlinear(skb) &&
 				(skb->len < tx_copybreak))) {
-		skb_copy_from_linear_data(skb, adapter->bounce_buffer,
-					  skb->len);
+		if (adapter->bounce_buffer) {
+			skb_copy_from_linear_data(skb, adapter->bounce_buffer,
+						  skb->len);
+		} else {
+			adapter->tx_send_failed++;
+			netdev->stats.tx_dropped++;
+			goto out;
+		}
 
 		descs[0].fields.flags_len = desc_flags | skb->len;
 		descs[0].fields.address = adapter->bounce_buffer_dma;
-- 
2.13.6 (Apple Git-96)


* Re: [PATCH] ibmveth: Kernel crash LSO offload flag toggle
  2017-11-14 15:34 [PATCH] ibmveth: Kernel crash LSO offload flag toggle Bryant G. Ly
@ 2017-11-15  2:47 ` Daniel Axtens
  2017-11-15  3:14   ` Benjamin Herrenschmidt
  2017-11-15 16:45   ` Bryant G. Ly
  0 siblings, 2 replies; 8+ messages in thread
From: Daniel Axtens @ 2017-11-15  2:47 UTC
  To: Bryant G. Ly, benh, paulus, mpe, tlfalcon
  Cc: linuxppc-dev, netdev, Bryant G. Ly, Sivakumar Krishnasamy, stable

Hi Bryant,

This looks a bit better, but...

> The following patch ensures that the bounce_buffer is not null
> prior to using it within skb_copy_from_linear_data.

How would this occur?

Looking at ibmveth.c, I see bounce_buffer being freed in ibmveth_close()
and allocated in ibmveth_open() only. If allocation fails, the whole
opening of the device fails with -ENOMEM.

It seems your test case - changing TSO - causes ibmveth_set_tso() to
trigger an adapter restart - an ibmveth_close(dev) and then an
ibmveth_open(dev). Is this happening in parallel with an out of memory
condition - is the memory allocation failing?

Alternatively, could it be the case that you're closing the device while
packets are in flight, and then trying to use a bounce_buffer that's
been freed elsewhere? Do you need to decouple memory freeing from
ibmveth_close?

> The problem can be recreated by toggling large send offload on and off.
>
> The following script, when run (along with some iperf traffic), recreates the
> crash within 5-10 minutes or so:
>
> while true
> do
> 	ethtool -k ibmveth0 | grep tcp-segmentation-offload
> 	ethtool -K ibmveth0 tso off
> 	ethtool -k ibmveth0 | grep tcp-segmentation-offload
> 	ethtool -K ibmveth0 tso on
> done
>
> Note: This issue also happens the very first time large send offload is
> turned off (but the above script recreates the issue every time).
>
> [76563.914173] Unable to handle kernel paging request for data at address 0x00000000
> [76563.914197] Faulting instruction address: 0xc000000000063940
> [76563.914205] Oops: Kernel access of bad area, sig: 11 [#1]
> [76563.914210] SMP NR_CPUS=2048 NUMA pSeries
> [76563.914217] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 isofs binfmt_misc pseries_rng rtc_generic autofs4 ibmvfc scsi_transport_fc ibmvscsi ibmveth
> [76563.914251] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-34-generic #53-Ubuntu
                                                            ^--- yikes!

There are relevant changes to this area since 4.4:
2c42bf4b4317 ("ibmveth: check return of skb_linearize in ibmveth_start_xmit")
66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")

Does this crash occur on a more recent kernel?

> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> index f210398200ece..1d29b1649118d 100644
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> @@ -1092,8 +1092,14 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
>  	 */
>  	if (force_bounce || (!skb_is_nonlinear(skb) &&
>  				(skb->len < tx_copybreak))) {
> -		skb_copy_from_linear_data(skb, adapter->bounce_buffer,
> -					  skb->len);
> +		if (adapter->bounce_buffer) {
> +			skb_copy_from_linear_data(skb, adapter->bounce_buffer,
> +						  skb->len);
> +		} else {
> +			adapter->tx_send_failed++;
> +			netdev->stats.tx_dropped++;
> +			goto out;
Finally, as I alluded to at the top of this message, isn't the
disappearance of the bounce-buffer a pretty serious issue? As I
understand it, it means your data structures are now inconsistent. Do
you need to - at the least - be more chatty here?
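
For example, even a one-line warning in the new else branch would help
(illustrative only):

		netdev_err(netdev, "tx: bounce buffer missing, dropping packet\n");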

Regards,
Daniel
> +		}
>  
>  		descs[0].fields.flags_len = desc_flags | skb->len;
>  		descs[0].fields.address = adapter->bounce_buffer_dma;
> -- 
> 2.13.6 (Apple Git-96)


* Re: [PATCH] ibmveth: Kernel crash LSO offload flag toggle
  2017-11-15  2:47 ` Daniel Axtens
@ 2017-11-15  3:14   ` Benjamin Herrenschmidt
  2017-11-15 16:45   ` Bryant G. Ly
  1 sibling, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2017-11-15  3:14 UTC
  To: Daniel Axtens, Bryant G. Ly, paulus, mpe, tlfalcon
  Cc: linuxppc-dev, netdev, Sivakumar Krishnasamy, stable

On Wed, 2017-11-15 at 13:47 +1100, Daniel Axtens wrote:
> Hi Bryant,
> 
> This looks a bit better, but...
> 
> > The following patch ensures that the bounce_buffer is not null
> > prior to using it within skb_copy_from_linear_data.
> 
> How would this occur?
> 
> Looking at ibmveth.c, I see bounce_buffer being freed in ibmveth_close()
> and allocated in ibmveth_open() only. If allocation fails, the whole
> opening of the device fails with -ENOMEM.
> 
> It seems your test case - changing TSO - causes ibmveth_set_tso() to
> trigger an adapter restart - an ibmveth_close(dev) and then an
> ibmveth_open(dev). Is this happening in parallel with an out of memory
> condition - is the memory allocation failing?
> 
> Alternatively, could it be the case that you're closing the device while
> packets are in flight, and then trying to use a bounce_buffer that's
> been freed elsewhere? Do you need to decouple memory freeing from
> ibmveth_close?

Hrm, you should at least stop the tx queue and NAPI (and synchronize)
while doing a reset. A lot of drivers, rather than doing close/open
(which does subtly different things), tend to instead fire a work queue
(often called reset_task) which does the job (and uses the same lower-
level helpers as open/close to free/allocate the rings etc...).
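
To illustrate, a minimal sketch of that pattern (it assumes a
work_struct reset_task member in ibmveth_adapter; the ring alloc/free
helpers are placeholders, not ibmveth's actual functions):

static void ibmveth_reset_task(struct work_struct *work)
{
	struct ibmveth_adapter *adapter =
		container_of(work, struct ibmveth_adapter, reset_task);
	struct net_device *netdev = adapter->netdev;

	rtnl_lock();
	netif_tx_disable(netdev);	/* stop new xmits; waits on the tx
					 * locks so in-flight xmits finish */
	napi_disable(&adapter->napi);	/* synchronize with the poll loop */

	/* placeholders: the same lower-level helpers that open/close use */
	ibmveth_free_rings(adapter);
	ibmveth_alloc_rings(adapter);

	napi_enable(&adapter->napi);
	netif_wake_queue(netdev);
	rtnl_unlock();
}

The ndo_set_features callbacks would then just record the requested
settings and schedule_work(&adapter->reset_task) instead of calling
close/open inline.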


> > The problem can be recreated by toggling large send offload on and off.
> > 
> > The following script, when run (along with some iperf traffic), recreates the
> > crash within 5-10 minutes or so:
> > 
> > while true
> > do
> > 	ethtool -k ibmveth0 | grep tcp-segmentation-offload
> > 	ethtool -K ibmveth0 tso off
> > 	ethtool -k ibmveth0 | grep tcp-segmentation-offload
> > 	ethtool -K ibmveth0 tso on
> > done
> > 
> > Note: This issue also happens the very first time large send offload is
> > turned off (but the above script recreates the issue every time).
> > 
> > [76563.914173] Unable to handle kernel paging request for data at address 0x00000000
> > [76563.914197] Faulting instruction address: 0xc000000000063940
> > [76563.914205] Oops: Kernel access of bad area, sig: 11 [#1]
> > [76563.914210] SMP NR_CPUS=2048 NUMA pSeries
> > [76563.914217] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 isofs binfmt_misc pseries_rng rtc_generic autofs4 ibmvfc scsi_transport_fc ibmvscsi ibmveth
> > [76563.914251] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-34-generic #53-Ubuntu
> 
>                                                             ^--- yikes!
> 
> There are relevant changes to this area since 4.4:
> 2c42bf4b4317 ("ibmveth: check return of skb_linearize in ibmveth_start_xmit")
> 66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")
> 
> Does this crash occur on a more recent kernel?
> 
> > diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> > index f210398200ece..1d29b1649118d 100644
> > --- a/drivers/net/ethernet/ibm/ibmveth.c
> > +++ b/drivers/net/ethernet/ibm/ibmveth.c
> > @@ -1092,8 +1092,14 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
> >  	 */
> >  	if (force_bounce || (!skb_is_nonlinear(skb) &&
> >  				(skb->len < tx_copybreak))) {
> > -		skb_copy_from_linear_data(skb, adapter->bounce_buffer,
> > -					  skb->len);
> > +		if (adapter->bounce_buffer) {
> > +			skb_copy_from_linear_data(skb, adapter->bounce_buffer,
> > +						  skb->len);
> > +		} else {
> > +			adapter->tx_send_failed++;
> > +			netdev->stats.tx_dropped++;
> > +			goto out;
> 
> Finally, as I alluded to at the top of this message, isn't the
> disappearance of the bounce-buffer a pretty serious issue? As I
> understand it, it means your data structures are now inconsistent. Do
> you need to - at the least - be more chatty here?
> 
> Regards,
> Daniel
> > +		}
> >  
> >  		descs[0].fields.flags_len = desc_flags | skb->len;
> >  		descs[0].fields.address = adapter->bounce_buffer_dma;
> > -- 
> > 2.13.6 (Apple Git-96)


* Re: [PATCH] ibmveth: Kernel crash LSO offload flag toggle
  2017-11-15  2:47 ` Daniel Axtens
  2017-11-15  3:14   ` Benjamin Herrenschmidt
@ 2017-11-15 16:45   ` Bryant G. Ly
  2017-11-15 21:57     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 8+ messages in thread
From: Bryant G. Ly @ 2017-11-15 16:45 UTC
  To: Daniel Axtens, benh, paulus, mpe, tlfalcon
  Cc: linuxppc-dev, netdev, Sivakumar Krishnasamy, stable

On 11/14/17 8:47 PM, Daniel Axtens wrote:

> Hi Bryant,
>
> This looks a bit better, but...
>
>> The following patch ensures that the bounce_buffer is not null
>> prior to using it within skb_copy_from_linear_data.
> How would this occur?
>
> Looking at ibmveth.c, I see bounce_buffer being freed in ibmveth_close()
> and allocated in ibmveth_open() only. If allocation fails, the whole
> opening of the device fails with -ENOMEM.
>
> It seems your test case - changing TSO - causes ibmveth_set_tso() to
> trigger an adapter restart - an ibmveth_close(dev) and then an
> ibmveth_open(dev). Is this happening in parallel with an out of memory
> condition - is the memory allocation failing?
>
> Alternatively, could it be the case that you're closing the device while
> packets are in flight, and then trying to use a bounce_buffer that's
> been freed elsewhere? Do you need to decouple memory freeing from
> ibmveth_close?

When the checksum or largesend attribute for a VEA is enabled, it triggers an
ibmveth_set_features->ibmveth_set_csum_offload or ibmveth_set_tso call.

Those two routines will do an ibmveth_close, which as you see unmaps and frees
the bounce buffer.

Like you said, if there is data in transmit, the system crashes because the
bounce_buffer is used after it has already been freed elsewhere.

This patch just narrows the window; bad things can still happen. I wanted to leave
it up to the people who actively develop ibmveth to close it fully, since
introducing a lock can be expensive in tx.

>> The problem can be recreated by toggling large send offload on and off.
>>
>> The following script, when run (along with some iperf traffic), recreates the
>> crash within 5-10 minutes or so:
>>
>> while true
>> do
>> 	ethtool -k ibmveth0 | grep tcp-segmentation-offload
>> 	ethtool -K ibmveth0 tso off
>> 	ethtool -k ibmveth0 | grep tcp-segmentation-offload
>> 	ethtool -K ibmveth0 tso on
>> done
>>
>> Note: This issue also happens the very first time large send offload is
>> turned off (but the above script recreates the issue every time).
>>
>> [76563.914173] Unable to handle kernel paging request for data at address 0x00000000
>> [76563.914197] Faulting instruction address: 0xc000000000063940
>> [76563.914205] Oops: Kernel access of bad area, sig: 11 [#1]
>> [76563.914210] SMP NR_CPUS=2048 NUMA pSeries
>> [76563.914217] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 isofs binfmt_misc pseries_rng rtc_generic autofs4 ibmvfc scsi_transport_fc ibmvscsi ibmveth
>> [76563.914251] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-34-generic #53-Ubuntu
>                                                             ^--- yikes!
>
> There are relevant changes to this area since 4.4:
> 2c42bf4b4317 ("ibmveth: check return of skb_linearize in ibmveth_start_xmit")
> 66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")
>
> Does this crash occur on a more recent kernel?

We DKMS ibmveth, so we have the most recent stuff. I also have a test 4.14 kernel
that still has this problem.

>
>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>> index f210398200ece..1d29b1649118d 100644
>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> @@ -1092,8 +1092,14 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
>>  	 */
>>  	if (force_bounce || (!skb_is_nonlinear(skb) &&
>>  				(skb->len < tx_copybreak))) {
>> -		skb_copy_from_linear_data(skb, adapter->bounce_buffer,
>> -					  skb->len);
>> +		if (adapter->bounce_buffer) {
>> +			skb_copy_from_linear_data(skb, adapter->bounce_buffer,
>> +						  skb->len);
>> +		} else {
>> +			adapter->tx_send_failed++;
>> +			netdev->stats.tx_dropped++;
>> +			goto out;
> Finally, as I alluded to at the top of this message, isn't the
> disappearance of the bounce-buffer a pretty serious issue? As I
> understand it, it means your data structures are now inconsistent. Do
> you need to - at the least - be more chatty here?
>
> Regards,
> Daniel

Yes, we can be more chatty, but like I said previously, the real solution is to
make sure any in-flight transmit data is drained first, before napi_disable runs
in ibmveth_close.
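
Something like this ordering, for example (a rough sketch only; the unmap
size and direction are assumed to mirror the mapping done in ibmveth_open,
and the trailing steps are elided):

static int ibmveth_close(struct net_device *netdev)
{
	struct ibmveth_adapter *adapter = netdev_priv(netdev);

	netif_tx_disable(netdev);	/* stop new xmits; takes the per-queue
					 * tx locks, so any in-flight
					 * ibmveth_start_xmit has finished */
	napi_disable(&adapter->napi);	/* synchronize with the poll loop */

	/* only now is it safe to tear down buffers that xmit may touch */
	dma_unmap_single(&adapter->vdev->dev, adapter->bounce_buffer_dma,
			 netdev->mtu + IBMVETH_BUFF_OH, /* assumed alloc size */
			 DMA_BIDIRECTIONAL);
	kfree(adapter->bounce_buffer);
	adapter->bounce_buffer = NULL;

	/* ... rest of close: free rings, deregister from the hypervisor ... */
	return 0;
}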

-Bryant


* Re: [PATCH] ibmveth: Kernel crash LSO offload flag toggle
  2017-11-15 16:45   ` Bryant G. Ly
@ 2017-11-15 21:57     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2017-11-15 21:57 UTC
  To: Bryant G. Ly, Daniel Axtens, paulus, mpe, tlfalcon
  Cc: linuxppc-dev, netdev, Sivakumar Krishnasamy, stable

On Wed, 2017-11-15 at 10:45 -0600, Bryant G. Ly wrote:
> This patch just narrows the window; bad things can still happen. I wanted to leave
> it up to the people who actively develop ibmveth to close it fully, since
> introducing a lock can be expensive in tx.

You don't need to introduce a lock. The network stack already has a
per-queue lock; you just use the existing one.

Look at what I did in sungem or ftgmac100 with the reset task; those
are fairly simple drivers and should illustrate the technique.
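
For instance, replacing the bounce buffer safely against a concurrent
ibmveth_start_xmit needs only the queue's existing lock (a sketch,
assuming a single tx queue and that new_buf/new_dma were prepared
beforehand):

	struct netdev_queue *txq = netdev_get_tx_queue(netdev, 0);

	__netif_tx_lock_bh(txq);	/* same lock xmit already runs under */
	adapter->bounce_buffer = new_buf;
	adapter->bounce_buffer_dma = new_dma;
	__netif_tx_unlock_bh(txq);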

Cheers,
Ben.

