Re: [PATCH] ibmveth: Kernel crash LSO offload flag toggle


From: Daniel Axtens <dja@axtens.net>
To: "Bryant G. Ly" <bryantly@linux.vnet.ibm.com>,
	benh@kernel.crashing.org, paulus@samba.org, mpe@ellerman.id.au,
	tlfalcon@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org, netdev@vger.kernel.org,
	"Bryant G. Ly" <bryantly@linux.vnet.ibm.com>,
	Sivakumar Krishnasamy <ksiva@linux.vnet.ibm.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH] ibmveth: Kernel crash LSO offload flag toggle
Date: Wed, 15 Nov 2017 13:47:08 +1100
Message-ID: <87inec2r83.fsf@linkitivity.dja.id.au>
In-Reply-To: <20171114153420.3911-1-bryantly@linux.vnet.ibm.com>

Hi Bryant,

This looks a bit better, but...

> The following patch ensures that the bounce_buffer is not null
> prior to using it within skb_copy_from_linear_data.

How would this occur?

Looking at ibmveth.c, I see bounce_buffer being freed in ibmveth_close()
and allocated in ibmveth_open() only. If allocation fails, the whole
opening of the device fails with -ENOMEM.

It seems your test case (changing TSO) makes ibmveth_set_tso() trigger
an adaptor restart: an ibmveth_close(dev) followed by an
ibmveth_open(dev). Is this happening in parallel with an out-of-memory
condition, i.e. is the memory allocation failing?
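To make the lifecycle described above concrete, here is a userspace model, not driver code. `struct model_adapter`, `model_open()`, `model_close()` and `model_restart()` are hypothetical stand-ins for ibmveth_open(), ibmveth_close() and the close/open restart that ibmveth_set_tso() triggers. The model assumes only what is stated above: the buffer is allocated in open, freed in close, and a failed allocation fails the whole open with -ENOMEM. The buffer size is an arbitrary assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the adapter; only the fields needed to follow
 * the open/close lifecycle are present. */
struct model_adapter {
	void *bounce_buffer;
	bool is_open;
	bool fail_next_alloc;	/* test hook: simulate the allocation failing */
};

#define MODEL_BUFFER_SIZE 65536	/* assumed size, for illustration only */

/* As described for ibmveth_open(): no buffer means the open fails. */
static int model_open(struct model_adapter *a)
{
	if (a->fail_next_alloc) {
		a->fail_next_alloc = false;
		return -ENOMEM;
	}
	a->bounce_buffer = calloc(1, MODEL_BUFFER_SIZE);
	if (!a->bounce_buffer)
		return -ENOMEM;
	a->is_open = true;
	return 0;
}

/* As described for ibmveth_close(): the buffer is freed here. */
static void model_close(struct model_adapter *a)
{
	a->is_open = false;
	free(a->bounce_buffer);
	a->bounce_buffer = NULL;
}

/* The restart triggered by toggling TSO: close, then open again. */
static int model_restart(struct model_adapter *a)
{
	int ret;

	if (a->is_open)
		model_close(a);
	ret = model_open(a);
	/* Invariant: the buffer exists exactly when the open succeeded. */
	assert((ret == 0) == (a->bounce_buffer != NULL));
	return ret;
}
```

Under these assumptions a failed restart leaves the device closed with a NULL buffer, in which case the transmit path should not be running at all. A NULL bounce_buffer seen inside ibmveth_start_xmit() would therefore seem to point more towards a race with close than towards the allocation itself.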

Alternatively, could it be the case that you're closing the device while
packets are in flight, and then trying to use a bounce_buffer that's
been freed elsewhere? Do you need to decouple memory freeing from
ibmveth_close?
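One way to read "decouple memory freeing from ibmveth_close" is: guarantee that no transmit can still be touching the buffer when it is freed. The sketch below models that in userspace, assuming a mutex stands in for whatever quiescing mechanism the driver would really use (in the kernel that would more likely be stopping the TX queues, e.g. with netif_tx_disable(), rather than a lock taken on every packet). `struct model_tx`, `model_xmit()`, `model_open()` and `model_close()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: the mutex serialises transmit against close. */
struct model_tx {
	pthread_mutex_t lock;
	bool running;
	unsigned char *bounce_buffer;
	size_t buffer_len;
	unsigned long tx_dropped;
};

/* Copy into the bounce buffer only while holding the lock and only
 * while the device is running. Returns 0 on success, -1 on drop. */
static int model_xmit(struct model_tx *t, const void *data, size_t len)
{
	int ret = 0;

	pthread_mutex_lock(&t->lock);
	if (!t->running || len > t->buffer_len) {
		t->tx_dropped++;
		ret = -1;
	} else {
		/* Invariant: a running device always has a buffer. */
		assert(t->bounce_buffer != NULL);
		memcpy(t->bounce_buffer, data, len);
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}

static int model_open(struct model_tx *t, size_t len)
{
	unsigned char *buf = malloc(len);

	if (!buf)
		return -1;
	pthread_mutex_lock(&t->lock);
	t->bounce_buffer = buf;
	t->buffer_len = len;
	t->running = true;
	pthread_mutex_unlock(&t->lock);
	return 0;
}

/* Stop new transmits and wait for any in flight (by taking the lock),
 * detach the buffer, and only free it once nobody can be using it. */
static void model_close(struct model_tx *t)
{
	unsigned char *buf;

	pthread_mutex_lock(&t->lock);
	t->running = false;
	buf = t->bounce_buffer;
	t->bounce_buffer = NULL;
	t->buffer_len = 0;
	pthread_mutex_unlock(&t->lock);
	free(buf);
}
```

The point of the ordering is that "stop the transmitters" happens before "free the memory", so a NULL check on the hot path is never what keeps the driver safe.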

> The problem can be recreated toggling on/off Large send offload.
>
> The following script, when run along with some iperf traffic, recreates
> the crash within 5-10 mins or so.
>
> while true
> do
> 	ethtool -k ibmveth0 | grep tcp-segmentation-offload
> 	ethtool -K ibmveth0 tso off
> 	ethtool -k ibmveth0 | grep tcp-segmentation-offload
> 	ethtool -K ibmveth0 tso on
> done
>
> Note: This issue happens the very first time large send offload is
> turned off too (but the above script recreates the issue every time).
>
> [76563.914173] Unable to handle kernel paging request for data at address 0x00000000
> [76563.914197] Faulting instruction address: 0xc000000000063940
> [76563.914205] Oops: Kernel access of bad area, sig: 11 [#1]
> [76563.914210] SMP NR_CPUS=2048 NUMA pSeries
> [76563.914217] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 isofs binfmt_misc pseries_rng rtc_generic autofs4 ibmvfc scsi_transport_fc ibmvscsi ibmveth
> [76563.914251] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-34-generic #53-Ubuntu
                                                            ^--- yikes!

There are relevant changes to this area since 4.4:
2c42bf4b4317 ("ibmveth: check return of skb_linearize in ibmveth_start_xmit")
66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")

Does this crash occur on a more recent kernel?

> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> index f210398200ece..1d29b1649118d 100644
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> @@ -1092,8 +1092,14 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
>  	 */
>  	if (force_bounce || (!skb_is_nonlinear(skb) &&
>  				(skb->len < tx_copybreak))) {
> -		skb_copy_from_linear_data(skb, adapter->bounce_buffer,
> -					  skb->len);
> +		if (adapter->bounce_buffer) {
> +			skb_copy_from_linear_data(skb, adapter->bounce_buffer,
> +						  skb->len);
> +		} else {
> +			adapter->tx_send_failed++;
> +			netdev->stats.tx_dropped++;
> +			goto out;

Finally, as I alluded to at the top of this message, isn't the
disappearance of the bounce buffer a pretty serious issue? As I
understand it, it means your data structures are now inconsistent. At
the very least, shouldn't you be more chatty here than silently
counting a dropped packet?
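A sketch of what "more chatty" could look like, as a userspace model of the quoted drop path. The counters are named after the patch's tx_send_failed and tx_dropped, but `struct model_stats` and `model_copy_to_bounce()` are hypothetical, and the length precondition is an assumption of the model. In the driver itself, something like WARN_ON_ONCE() or a rate-limited netdev_err() would be the natural tool; that is a suggestion, not what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the adapter/netdev counters. */
struct model_stats {
	unsigned long tx_send_failed;
	unsigned long tx_dropped;
	bool warned;	/* report the inconsistency only once */
};

/* Returns true if the frame was copied, false if it was dropped. */
static bool model_copy_to_bounce(struct model_stats *s, void *bounce_buffer,
				 size_t buffer_len, const void *data,
				 size_t len, FILE *out)
{
	/* Precondition of this model: only frames that fit are bounced. */
	assert(len <= buffer_len);

	if (bounce_buffer) {
		memcpy(bounce_buffer, data, len);
		return true;
	}

	s->tx_send_failed++;
	s->tx_dropped++;
	/* A missing buffer means driver state is inconsistent: say so
	 * loudly, but only once, so the TX path cannot flood the log. */
	if (!s->warned) {
		fprintf(out, "bounce buffer missing on transmit: "
			"adapter state is inconsistent\n");
		s->warned = true;
	}
	return false;
}
```

Reporting once keeps the evidence without turning a hot transmit path into a log flood; the counters still record every drop.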

Regards,
Daniel
> +		}
>  
>  		descs[0].fields.flags_len = desc_flags | skb->len;
>  		descs[0].fields.address = adapter->bounce_buffer_dma;
> -- 
> 2.13.6 (Apple Git-96)

Thread overview: 8+ messages
2017-11-14 15:34 [PATCH] ibmveth: Kernel crash LSO offload flag toggle Bryant G. Ly
2017-11-15  2:47 ` Daniel Axtens [this message]
2017-11-15  3:14   ` Benjamin Herrenschmidt
2017-11-15 16:45   ` Bryant G. Ly
2017-11-15 21:57     ` Benjamin Herrenschmidt
  -- strict thread matches above, loose matches on Subject: below --
2017-11-13 23:01 Bryant G. Ly
2017-11-14  1:07 ` Daniel Axtens
2017-11-14 15:24   ` Bryant G. Ly
