usbnet OOM handling - possible CPU starvation?

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* usbnet OOM handling - possible CPU starvation?
@ 2014-03-18  9:59 Bjørn Mork
  2014-03-18 10:29 ` David Laight
  0 siblings, 1 reply; 3+ messages in thread
From: Bjørn Mork @ 2014-03-18  9:59 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: netdev, linux-usb

Hello Oliver & other clueful gurus!

I started looking at how to improve the OOM handling in usbnet, but have
to admit that this is way over my head. I don't think I dare touching
it...

I've seen a few reports clearly showing that we do have a problem. One
example is here: http://forums.whirlpool.net.au/forum-replies.cfm?t=2207038&p=57#r1122

Although all the reports I've seen are from users of cdc_ncm, I don't
think this problem is specific to any particualar minidriver. But it is
of course easier to trigger it using a minidriver with (too?)  large
buffers, like the cdc_ncm minidriver family.

To recall: The NCM and MBIM protocols aggregate packets before
transission over the USB link, requiring USB buffers with multiple
ethernet/IP packets.  Common code used by both cdc_mbim and cdc_ncm
currently cap the buffers at 32768 bytes (hard coded limit), which is
still too high for some embedded hosts.  This results in failures to
allocate buffers in rx_submit (flags == GFP_ATOMIC when it is called
from the URB callback):

	skb = __netdev_alloc_skb_ip_align(dev->net, size, flags);
	if (!skb) {
		netif_dbg(dev, rx_err, dev->net, "no rx skb\n");
		usbnet_defer_kevent (dev, EVENT_RX_MEMORY);
		usb_free_urb (urb);
		return -ENOMEM;
	}

The deferred event ends up retrying the same allocation with
flags == GFP_KERNEL :

	/* tasklet could resubmit itself forever if memory is tight */
	if (test_bit (EVENT_RX_MEMORY, &dev->flags)) {
		struct urb	*urb = NULL;
		int resched = 1;

		if (netif_running (dev->net))
			urb = usb_alloc_urb (0, GFP_KERNEL);
		else
			clear_bit (EVENT_RX_MEMORY, &dev->flags);
		if (urb != NULL) {
			clear_bit (EVENT_RX_MEMORY, &dev->flags);
			status = usb_autopm_get_interface(dev->intf);
			if (status < 0) {
				usb_free_urb(urb);
				goto fail_lowmem;
			}
			if (rx_submit (dev, urb, GFP_KERNEL) == -ENOLINK)
				resched = 0;
			usb_autopm_put_interface(dev->intf);
fail_lowmem:
			if (resched)
				tasklet_schedule (&dev->bh);
		}
	}

But what if this still fails with -ENOMEM? The scheduled tasklet then
runs this:

	// or are we maybe short a few urbs?
	} else if (netif_running (dev->net) &&
		   netif_device_present (dev->net) &&
		   netif_carrier_ok(dev->net) &&
		   !timer_pending (&dev->delay) &&
		   !test_bit (EVENT_RX_HALT, &dev->flags)) {
		int	temp = dev->rxq.qlen;

		if (temp < RX_QLEN(dev)) {
			if (rx_alloc_submit(dev, GFP_ATOMIC) == -ENOLINK)
				return;
			if (temp != dev->rxq.qlen)
				netif_dbg(dev, link, dev->net,
					  "rxqlen %d --> %d\n",
					  temp, dev->rxq.qlen);
			if (dev->rxq.qlen < RX_QLEN(dev))
				tasklet_schedule (&dev->bh);
		}
		if (dev->txq.qlen < TX_QLEN (dev))
			netif_wake_queue (dev->net);
	}

which is going to reschedule itself infinitely, without any delays at
all, as long as dev->rxq.qlen < RX_QLEN(dev).  So this depends on
rx_alloc_submit being able to allocate RX_QLEN(dev) buffers, or it is
going to lock up one CPU forever in a tight loop.

But rx_alloc_submit is just going to repeat the rx_submit() call over
and over again, with flags == GFP_ATOMIC:

static int rx_alloc_submit(struct usbnet *dev, gfp_t flags)
{
	struct urb	*urb;
	int		i;
	int		ret = 0;

	/* don't refill the queue all at once */
	for (i = 0; i < 10 && dev->rxq.qlen < RX_QLEN(dev); i++) {
		urb = usb_alloc_urb(0, flags);
		if (urb != NULL) {
			ret = rx_submit(dev, urb, flags);
			if (ret)
				goto err;
		} else {
			ret = -ENOMEM;
			goto err;
		}
	}
err:
	return ret;
}

And quite frankly: That is pretty much guaranteed to fail if we ended up
going through that route in the first place, isn't it?  So rx_submit
tries to defer the allocation again, and we're back where we started.
Except that we are now already rescheduling the tasklet infinitely.

So we are going to spin here forever, unless something magically frees
up some memory for us.  Which of course is much less likely to happen
when we spin like this.  There are users with severely limited OpenWRT
based embedded systems hitting this, with a single CPU core and not much
memory at all...

One visibly symptom of this state, in addition to the CPU spinning, is
that the log will be full of "kevent 2 may have been dropped" messages.
rx_submit tries to defer the allocation every time it fails, and with a
high enough failure rate then schedule_work is called much faster than
the work queue is processed.

But how should this be fixed?  Running a delay timer for each time the
tasklet reschedules might help giving up the CPU.  Maybe we should also
break out at some point, either failing the device or trying to reduce
RX_QLEN(dev)?  If the deferred GFP_KERNEL allocation ever fails, then I
think it's time for the driver to stop and reconsider its strategies.

Another idea I've played is that we could try to dynamically reduce the
rx buffer size.  This might be possible with NCM/MBIM.  The current hard
coded ceiling is arbitrary and could be set at runtime instead.  We
could add a usbnet callback requesting the minidriver to reduce its rx
buffer size.

Help and/or ideas are appreciated.

I will try to add code to cdc_ncm allowing userspace to change the
buffer ceilings at runtime, which should help with the immediate
problem. But it will require user interaction, which of course isn't as
good as an automatic solution.

[1] 

Bjørn

^ permalink raw reply	[flat|nested] 3+ messages in thread

* RE: usbnet OOM handling - possible CPU starvation?
  2014-03-18  9:59 usbnet OOM handling - possible CPU starvation? Bjørn Mork
@ 2014-03-18 10:29 ` David Laight
  2014-03-18 11:01   ` Oliver Neukum
  0 siblings, 1 reply; 3+ messages in thread
From: David Laight @ 2014-03-18 10:29 UTC (permalink / raw)
  To: 'Bjørn Mork', Oliver Neukum
  Cc: netdev@vger.kernel.org, linux-usb@vger.kernel.org

From: Bjørn Mork
> Hello Oliver & other clueful gurus!
> 
> I started looking at how to improve the OOM handling in usbnet, but have
> to admit that this is way over my head. I don't think I dare touching
> it...
> 
> I've seen a few reports clearly showing that we do have a problem. One
> example is here: http://forums.whirlpool.net.au/forum-replies.cfm?t=2207038&p=57#r1122
> 
> Although all the reports I've seen are from users of cdc_ncm, I don't
> think this problem is specific to any particualar minidriver. But it is
> of course easier to trigger it using a minidriver with (too?)  large
> buffers, like the cdc_ncm minidriver family.
> 
> To recall: The NCM and MBIM protocols aggregate packets before
> transission over the USB link, requiring USB buffers with multiple
> ethernet/IP packets.  Common code used by both cdc_mbim and cdc_ncm
> currently cap the buffers at 32768 bytes (hard coded limit), which is
> still too high for some embedded hosts.  This results in failures to
> allocate buffers in rx_submit (flags == GFP_ATOMIC when it is called
> from the URB callback):

I think that some of these drivers need to use a slightly different
buffer scheme to the one that usbnet currently supports.

In many cases the usb hardware will aggregate many receive frames into
a single USB bulk packet, and also supports the driver aggregating multiple
transmit frames.
Not all the stub drivers aggregate transmit packets.

It really doesn't make much sense (to me) to transmit and receive
aggregated packets from dynamically allocated skb.
I'd have thought it much better to copy the ethernet frames to/from
pre-allocated buffers.
Yes, you take the cost of a data copy, but for USB2 the data rate
isn't that high, and USB3 ought to be able to do scatter-gather
transmit (when xhci is fixed).

The transmit side could have 2 buffers, and at most one active urb.
So merges until the previous tx finishes (or 3 buffers and two
active tx).

The rx side needs a few buffers that are multiples of the usb
transfer size - so that long usb bulk data packets can end up
in multiple urb, and then get merged into a single skb correctly
sized for the ethernet frame.
For USB2 maybe you could use 1536 byte skb - but I expect that
you still want to copy short frames.

At the moment some driver allocate very large rx skb - and then
lie about their truesize.
If any devices actually do receive segment offload (the ax88179
does TSO, don't know about the rx side), then using large skb
for the rx side just in case they are very long packets when
they can be passed directly up the stack.

	David

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: usbnet OOM handling - possible CPU starvation?
  2014-03-18 10:29 ` David Laight
@ 2014-03-18 11:01   ` Oliver Neukum
  0 siblings, 0 replies; 3+ messages in thread
From: Oliver Neukum @ 2014-03-18 11:01 UTC (permalink / raw)
  To: David Laight
  Cc: 'Bjørn Mork', netdev@vger.kernel.org,
	linux-usb@vger.kernel.org

On Tue, 2014-03-18 at 10:29 +0000, David Laight wrote:
> From: Bjørn Mork

Hi,

> > To recall: The NCM and MBIM protocols aggregate packets before
> > transission over the USB link, requiring USB buffers with multiple
> > ethernet/IP packets.  Common code used by both cdc_mbim and cdc_ncm
> > currently cap the buffers at 32768 bytes (hard coded limit), which is
> > still too high for some embedded hosts.  This results in failures to
> > allocate buffers in rx_submit (flags == GFP_ATOMIC when it is called
> > from the URB callback):

True. We cannot rely on this working from a callback. So the correct
OOM response would happen from a delayed work queue.

> It really doesn't make much sense (to me) to transmit and receive
> aggregated packets from dynamically allocated skb.
> I'd have thought it much better to copy the ethernet frames to/from
> pre-allocated buffers.

That makes sense.

> The transmit side could have 2 buffers, and at most one active urb.
> So merges until the previous tx finishes (or 3 buffers and two
> active tx).

We had bad experiences with that scheme. It turns out that you
need to have two URBs in flight, because the time you lose waiting
for the next frame to begin is too high.
So you need at least three buffers. 

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2014-03-18 11:01 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-18  9:59 usbnet OOM handling - possible CPU starvation? Bjørn Mork
2014-03-18 10:29 ` David Laight
2014-03-18 11:01   ` Oliver Neukum

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).