* [PATCH] atm: br2684: Fix excessive queue bloat
@ 2012-11-24 0:01 David Woodhouse
From: David Woodhouse @ 2012-11-24 0:01 UTC (permalink / raw)
To: netdev
Cc: John Crispin, Dave Täht, Krzysztof Mazur,
Chas Williams (CONTRACTOR)
There's really no excuse for an additional wmem_default of buffering
between the netdev queue and the ATM device. Two packets (one in-flight,
and one ready to send) ought to be fine. It's not as if it should take
long to get another from the netdev queue when we need it.
If necessary we can make the queue space configurable later, but I don't
think it's likely to be necessary.
cf. commit 9d02daf754238adac48fa075ee79e7edd3d79ed3 (pppoatm: Fix
excessive queue bloat) which did something very similar for PPPoATM.
Note that there is a tremendously unlikely race condition which may
result in qspace temporarily going negative. If a CPU running the
br2684_pop() function goes off into the weeds for a long period of time
after incrementing qspace to 1, but before calling netdev_wake_queue()...
and another CPU ends up calling br2684_start_xmit() and *stopping* the
queue again before the first CPU comes back, the netdev queue could
end up being woken when qspace has already reached zero.
An alternative approach to coping with this race would be to check in
br2684_start_xmit() for qspace==0 and return NETDEV_TX_BUSY, but just
using '> 0' and '< 1' for comparison instead of '== 0' and '!= 0' is
simpler. It just warranted a mention of *why* we do it that way...
Move the call to atmvcc->send() to happen *after* the accounting and
potentially stopping the netdev queue, in br2684_xmit_vcc(). This matters
if the ->send() call suffers an immediate failure, because it'll call
br2684_pop() with the offending skb before returning. We want that to
happen *after* we've done the initial accounting for the packet in
question. Also make it return an appropriate success/failure indication
while we're at it.
Tested by running 'ping -l 1000 bottomless.aaisp.net.uk' from within my
network, with only a single PPPoE-over-BR2684 link running. And after
setting txqueuelen on the nas0 interface to something low (5, in fact).
Before the patch, we'd see about 15 packets being queued and a resulting
latency of ~56ms being reached. After the patch, we see only about 8,
which is fairly much what we expect. And a max latency of ~36ms. On this
OpenWRT box, wmem_default is 163840.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
net/atm/br2684.c | 34 ++++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/net/atm/br2684.c b/net/atm/br2684.c
index 4819d315..3440a79 100644
--- a/net/atm/br2684.c
+++ b/net/atm/br2684.c
@@ -74,6 +74,7 @@ struct br2684_vcc {
struct br2684_filter filter;
#endif /* CONFIG_ATM_BR2684_IPFILTER */
unsigned int copies_needed, copies_failed;
+ atomic_t qspace;
};
struct br2684_dev {
@@ -181,18 +182,15 @@ static struct notifier_block atm_dev_notifier = {
static void br2684_pop(struct atm_vcc *vcc, struct sk_buff *skb)
{
struct br2684_vcc *brvcc = BR2684_VCC(vcc);
- struct net_device *net_dev = skb->dev;
- pr_debug("(vcc %p ; net_dev %p )\n", vcc, net_dev);
+ pr_debug("(vcc %p ; net_dev %p )\n", vcc, brvcc->device);
brvcc->old_pop(vcc, skb);
- if (!net_dev)
- return;
-
- if (atm_may_send(vcc, 0))
- netif_wake_queue(net_dev);
-
+ /* If the queue space just went up from zero, wake */
+ if (atomic_inc_return(&brvcc->qspace) == 1)
+ netif_wake_queue(brvcc->device);
}
+
/*
* Send a packet out a particular vcc. Not to useful right now, but paves
* the way for multiple vcc's per itf. Returns true if we can send,
@@ -256,16 +254,19 @@ static int br2684_xmit_vcc(struct sk_buff *skb, struct net_device *dev,
ATM_SKB(skb)->atm_options = atmvcc->atm_options;
dev->stats.tx_packets++;
dev->stats.tx_bytes += skb->len;
- atmvcc->send(atmvcc, skb);
- if (!atm_may_send(atmvcc, 0)) {
+ if (atomic_dec_return(&brvcc->qspace) < 1) {
+ /* No more please! */
netif_stop_queue(brvcc->device);
- /*check for race with br2684_pop*/
- if (atm_may_send(atmvcc, 0))
- netif_start_queue(brvcc->device);
+ /* We might have raced with br2684_pop() */
+ if (unlikely(atomic_read(&brvcc->qspace) > 0))
+ netif_wake_queue(brvcc->device);
}
- return 1;
+ /* If this fails immediately, the skb will be freed and br2684_pop()
+ will wake the queue if appropriate. Just return an error so that
+ the stats are updated correctly */
+ return !atmvcc->send(atmvcc, skb);
}
static inline struct br2684_vcc *pick_outgoing_vcc(const struct sk_buff *skb,
@@ -504,6 +505,11 @@ static int br2684_regvcc(struct atm_vcc *atmvcc, void __user * arg)
brvcc = kzalloc(sizeof(struct br2684_vcc), GFP_KERNEL);
if (!brvcc)
return -ENOMEM;
+ /* Allow two packets in the ATM queue. One actually being sent, and one
+ for the ATM 'TX done' handler to send. It shouldn't take long to get
+ the next one from the netdev queue, when we need it. More than that
+ would be bufferbloat. */
+ atomic_set(&brvcc->qspace, 2);
write_lock_irq(&devs_lock);
net_dev = br2684_find_dev(&be.ifspec);
if (net_dev == NULL) {
--
1.8.0
--
dwmw2
* Re: [PATCH] atm: br2684: Fix excessive queue bloat
From: David Miller @ 2012-11-25 21:09 UTC (permalink / raw)
To: dwmw2; +Cc: netdev, blogic, dave.taht, krzysiek, chas
From: David Woodhouse <dwmw2@infradead.org>
Date: Sat, 24 Nov 2012 00:01:32 +0000
> + /* Allow two packets in the ATM queue. One actually being sent, and one
> + for the ATM 'TX done' handler to send. It shouldn't take long to get
> + the next one from the netdev queue, when we need it. More than that
> + would be bufferbloat. */
Please format this:
/* Like
* this.
*/
and I'll apply this, thanks.
* Re: [PATCH] atm: br2684: Fix excessive queue bloat
From: Krzysztof Mazur @ 2012-11-25 21:43 UTC (permalink / raw)
To: David Woodhouse
Cc: netdev, John Crispin, Dave Täht, Chas Williams (CONTRACTOR)
On Sat, Nov 24, 2012 at 12:01:32AM +0000, David Woodhouse wrote:
> There's really no excuse for an additional wmem_default of buffering
> between the netdev queue and the ATM device. Two packets (one in-flight,
> and one ready to send) ought to be fine. It's not as if it should take
> long to get another from the netdev queue when we need it.
>
> If necessary we can make the queue space configurable later, but I don't
> think it's likely to be necessary.
Maybe some high-speed devices will require a larger queue, especially for
smaller packets, but a 2-packet queue should be sufficient in almost all cases.
> static inline struct br2684_vcc *pick_outgoing_vcc(const struct sk_buff *skb,
> @@ -504,6 +505,11 @@ static int br2684_regvcc(struct atm_vcc *atmvcc, void __user * arg)
> brvcc = kzalloc(sizeof(struct br2684_vcc), GFP_KERNEL);
> if (!brvcc)
> return -ENOMEM;
> + /* Allow two packets in the ATM queue. One actually being sent, and one
> + for the ATM 'TX done' handler to send. It shouldn't take long to get
> + the next one from the netdev queue, when we need it. More than that
> + would be bufferbloat. */
> + atomic_set(&brvcc->qspace, 2);
Maybe this magic "2" and the comment should be moved to some #define.
> write_lock_irq(&devs_lock);
> net_dev = br2684_find_dev(&be.ifspec);
> if (net_dev == NULL) {
Looks good,
Reviewed-by: Krzysztof Mazur <krzysiek@podlesie.net>
Krzysiek
* Re: [PATCH] atm: br2684: Fix excessive queue bloat
From: David Woodhouse @ 2012-11-25 22:01 UTC (permalink / raw)
To: Krzysztof Mazur
Cc: netdev, John Crispin, Dave Täht, Chas Williams (CONTRACTOR)
On Sun, 2012-11-25 at 22:43 +0100, Krzysztof Mazur wrote:
> On Sat, Nov 24, 2012 at 12:01:32AM +0000, David Woodhouse wrote:
> > There's really no excuse for an additional wmem_default of buffering
> > between the netdev queue and the ATM device. Two packets (one in-flight,
> > and one ready to send) ought to be fine. It's not as if it should take
> > long to get another from the netdev queue when we need it.
> >
> > If necessary we can make the queue space configurable later, but I don't
> > think it's likely to be necessary.
>
> Maybe some high-speed devices will require larger queue, especially for
> smaller packets, but 2 packet queue should be sufficient in almost all cases.
Yeah. This is fairly much the same conversation I ended up having when I
did the same for PPPoATM.
Some day *perhaps* we might look at doing something adaptive, so it'll
detect a TX underrun and increase the amount of buffering. But this
seems perfectly good for now.
> Maybe this magic "2" and the comment should be moved to some #define.
Doesn't make it any less magic. I'm happier with it as it is, with a
clear comment describing why it's done that way.
--
dwmw2
* Re: [PATCH] atm: br2684: Fix excessive queue bloat
From: Krzysztof Mazur @ 2012-11-25 23:52 UTC (permalink / raw)
To: David Woodhouse
Cc: netdev, John Crispin, Dave Täht, Chas Williams (CONTRACTOR)
On Sun, Nov 25, 2012 at 10:01:32PM +0000, David Woodhouse wrote:
>
> Yeah. This is fairly much the same conversation I ended up having when I
> did the same for PPPoATM.
>
> Some day *perhaps* we might look at doing something adaptive, so it'll
> detect a TX underrun and increase the amount of buffering. But this
> seems perfectly good for now.
The biggest problem is the additional, mostly unnecessary, huge software
queue that sits between us and the hardware on "ATM sockets". Most ATM
drivers have some hardware queue, but they accept and queue additional
frames in software as well, which just adds unnecessary delay - it's much
better to queue those frames in the netdev queue. With this patch we limit
the number of frames in the software and hardware queues combined to 2,
but the hardware may require a larger *hardware* queue to saturate the
link. If we had only the hardware queue, it would probably be sized
adequately in the ATM driver, or could be tuned by the user.
I think the 2-frame queue is much better than the huge queue
we have now.
>
> > Maybe this magic "2" and the comment should be moved to some #define.
>
> Doesn't make it any less magic. I'm happier with it as it is, with a
> clear comment describing why it's done that way.
>
In PPPoATM we have NONE_INFLIGHT.
Krzysiek