linuxppc-dev.lists.ozlabs.org archive mirror
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Daniel Axtens <dja@axtens.net>,
	"Bryant G. Ly" <bryantly@linux.vnet.ibm.com>,
	paulus@samba.org, mpe@ellerman.id.au,
	tlfalcon@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org, netdev@vger.kernel.org,
	Sivakumar Krishnasamy <ksiva@linux.vnet.ibm.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH] ibmveth: Kernel crash LSO offload flag toggle
Date: Wed, 15 Nov 2017 14:14:52 +1100	[thread overview]
Message-ID: <1510715692.2397.41.camel@kernel.crashing.org> (raw)
In-Reply-To: <87inec2r83.fsf@linkitivity.dja.id.au>

On Wed, 2017-11-15 at 13:47 +1100, Daniel Axtens wrote:
> Hi Bryant,
> 
> This looks a bit better, but...
> 
> > The following patch ensures that the bounce_buffer is not null
> > prior to using it within skb_copy_from_linear_data.
> 
> How would this occur?
> 
> Looking at ibmveth.c, I see bounce_buffer being freed in ibmveth_close()
> and allocated in ibmveth_open() only. If allocation fails, the whole
> opening of the device fails with -ENOMEM.
> 
> It seems your test case - changing TSO - causes ibmveth_set_tso() to
> cause an adaptor restart - an ibmveth_close(dev) and then an
> ibmveth_open(dev). Is this happening in parallel with an out of memory
> condition - is the memory allocation failing?
> 
> Alternatively, could it be the case that you're closing the device while
> packets are in flight, and then trying to use a bounce_buffer that's
> been freed elsewhere? Do you need to decouple memory freeing from
> ibmveth_close?

Hrm, you should at least stop the tx queue and NAPI (and synchronize)
while doing a reset. A lot of drivers, rather than doing close/open
(which does subtly different things) tend to instead fire a work queue
(often called reset_task) which does the job (and uses the same lower
level helpers as open/close to free/allocate the rings etc...).


> > The problem can be recreated toggling on/off Large send offload.
> > 
> > The following script, when run along with some iperf traffic, recreates
> > the crash within 5-10 minutes or so.
> > 
> > while true
> > do
> > 	ethtool -k ibmveth0 | grep tcp-segmentation-offload
> > 	ethtool -K ibmveth0 tso off
> > 	ethtool -k ibmveth0 | grep tcp-segmentation-offload
> > 	ethtool -K ibmveth0 tso on
> > done
> > 
> > Note: this issue also happens the very first time large send offload is
> > turned off (but the above script recreates the issue every time)
> > 
> > [76563.914173] Unable to handle kernel paging request for data at address 0x00000000
> > [76563.914197] Faulting instruction address: 0xc000000000063940
> > [76563.914205] Oops: Kernel access of bad area, sig: 11 [#1]
> > [76563.914210] SMP NR_CPUS=2048 NUMA pSeries
> > [76563.914217] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 isofs binfmt_misc pseries_rng rtc_generic autofs4 ibmvfc scsi_transport_fc ibmvscsi ibmveth
> > [76563.914251] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-34-generic #53-Ubuntu
> 
>                                                             ^--- yikes!
> 
> There are relevant changes to this area since 4.4:
> 2c42bf4b4317 ("ibmveth: check return of skb_linearize in ibmveth_start_xmit")
> 66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")
> 
> Does this crash occur on a more recent kernel?
> 
> > diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> > index f210398200ece..1d29b1649118d 100644
> > --- a/drivers/net/ethernet/ibm/ibmveth.c
> > +++ b/drivers/net/ethernet/ibm/ibmveth.c
> > @@ -1092,8 +1092,14 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
> >  	 */
> >  	if (force_bounce || (!skb_is_nonlinear(skb) &&
> >  				(skb->len < tx_copybreak))) {
> > -		skb_copy_from_linear_data(skb, adapter->bounce_buffer,
> > -					  skb->len);
> > +		if (adapter->bounce_buffer) {
> > +			skb_copy_from_linear_data(skb, adapter->bounce_buffer,
> > +						  skb->len);
> > +		} else {
> > +			adapter->tx_send_failed++;
> > +			netdev->stats.tx_dropped++;
> > +			goto out;
> 
> Finally, as I alluded to at the top of this message, isn't the
> disappearance of the bounce-buffer a pretty serious issue? As I
> understand it, it means your data structures are now inconsistent. Do
> you need to - at the least - be more chatty here?
> 
> Regards,
> Daniel
> > +		}
> >  
> >  		descs[0].fields.flags_len = desc_flags | skb->len;
> >  		descs[0].fields.address = adapter->bounce_buffer_dma;
> > -- 
> > 2.13.6 (Apple Git-96)


Thread overview: 8+ messages
2017-11-14 15:34 [PATCH] ibmveth: Kernel crash LSO offload flag toggle Bryant G. Ly
2017-11-15  2:47 ` Daniel Axtens
2017-11-15  3:14   ` Benjamin Herrenschmidt [this message]
2017-11-15 16:45   ` Bryant G. Ly
2017-11-15 21:57     ` Benjamin Herrenschmidt
  -- strict thread matches above, loose matches on Subject: below --
2017-11-13 23:01 Bryant G. Ly
2017-11-14  1:07 ` Daniel Axtens
2017-11-14 15:24   ` Bryant G. Ly
