[DEBUG]: sk_forward_alloc assertion failures

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [DEBUG]: sk_forward_alloc assertion failures
@ 2005-01-14  1:12 David S. Miller
  2005-01-14  1:25 ` Anton Blanchard
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: David S. Miller @ 2005-01-14  1:12 UTC (permalink / raw)
  To: anton; +Cc: netdev, herbert

Anton, can you and the other IBM folks who can reproduce
the problem give this test patch a whirl?

Herbert and I aren't exactly sure what the exact sequence
of events is that causes the problem, but we do know that
this code is highly suspect.

[ BTW, Herbert, it just occurred to me that these adjustments
  are incorrect for the skb->data pulling case since we really
  aren't liberating the data. ]

Thanks.

===== net/ipv4/tcp_output.c 1.75 vs edited =====
--- 1.75/net/ipv4/tcp_output.c	2004-12-28 10:22:40 -08:00
+++ edited/net/ipv4/tcp_output.c	2005-01-13 16:45:26 -08:00
@@ -580,12 +580,12 @@

 	TCP_SKB_CB(skb)->seq += len;
 	skb->ip_summed = CHECKSUM_HW;
-
+#if 0
 	skb->truesize	     -= len;
 	sk->sk_queue_shrunk   = 1;
 	sk->sk_wmem_queued   -= len;
 	sk->sk_forward_alloc += len;
-
+#endif
 	/* Any change of skb->len requires recalculation of tso
 	 * factor and mss.
 	 */

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14  1:12 [DEBUG]: sk_forward_alloc assertion failures David S. Miller
@ 2005-01-14  1:25 ` Anton Blanchard
  2005-01-14  4:19   ` David S. Miller
  2005-01-14  1:37 ` Herbert Xu
  2005-01-14  1:50 ` Sridhar Samudrala
  2 siblings, 1 reply; 24+ messages in thread
From: Anton Blanchard @ 2005-01-14  1:25 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, herbert


Hi Dave,
 
> Anton, can you and the other IBM folks who can reproduce
> the problem give this test patch a whirl?
> 
> Herbert and I aren't exactly sure what the exact sequence
> of events is that causes the problem, but we do know that
> this code is highly suspect.
> 
> [ BTW, Herbert, it just occurred to me that these adjustments
>   are incorrect for the skb->data pulling case since we really
>   aren't liberating the data. ]

Excellent! Ive forwarded it on, lets see if it reproduces.

Anton

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14  1:25 ` Anton Blanchard
@ 2005-01-14  4:19   ` David S. Miller
  2005-01-14 11:16     ` Herbert Xu
  0 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2005-01-14  4:19 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: netdev, herbert

On Fri, 14 Jan 2005 12:25:04 +1100
Anton Blanchard <anton@samba.org> wrote:

> Excellent! Ive forwarded it on, lets see if it reproduces.

I think Herbert and I may have zeroed in on the true cause
of this bug.  It's a bug older than TSO support, but it
never showed up because nothing really depended upon
skb->truesize being incredibly accurate.

do_tcp_sendpages(), when it allocates new SKBs, optimistically
sets skb->truesize to the struct sk_buff et al. overhead
plus tp->mss_cache.  This, in and of itself, leaves us generally
with a valid skb->truesize value.  At worst, it will be an
overestimate if we don't fully fill the SKB, but that's fine.
It's underestimates that can lead to the BUG case.

do_tcp_sendpages() aparently does this so it doesn't have to
keep adjusting skb->truesize, sk_wmem_queued, and sk_forward_alloc
as new page pieces are tacked onto the SKB.

tcp_sendmsg() works the other way, it actually does adjust all
the accounting knobs as each piece of user data is copied into
the packet.

So if tcp_sendmsg() creates the SKB, and then do_tcp_sendpages()
tacks pages onto the tail of that SKB, skb->truesize can be way
too small.

Later on if this were a TSO frame, and it gets deeply partially
acked, tcp_trim_skb() will underflow skb->truesize and this is
where all the trouble starts.

The easiest way to fix this is to simply make do_tcp_sendpages()
account just like tcp_sendmsg() does.  This is implemented below
and should be the real fix for the sk_forward_alloc assertion
failures.

===== net/ipv4/tcp.c 1.88 vs edited =====
--- 1.88/net/ipv4/tcp.c	2005-01-06 15:13:39 -08:00
+++ edited/net/ipv4/tcp.c	2005-01-13 19:44:36 -08:00
@@ -664,7 +664,7 @@
 			if (!sk_stream_memory_free(sk))
 				goto wait_for_sndbuf;

-			skb = sk_stream_alloc_pskb(sk, 0, tp->mss_cache,
+			skb = sk_stream_alloc_pskb(sk, 0, 0,
 						   sk->sk_allocation);
 			if (!skb)
 				goto wait_for_memory;
@@ -689,6 +689,9 @@

 		skb->len += copy;
 		skb->data_len += copy;
+		skb->truesize += copy;
+		sk->sk_wmem_queued += copy;
+		sk->sk_forward_alloc -= copy;
 		skb->ip_summed = CHECKSUM_HW;
 		tp->write_seq += copy;
 		TCP_SKB_CB(skb)->end_seq += copy;

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14  4:19   ` David S. Miller
@ 2005-01-14 11:16     ` Herbert Xu
  2005-01-14 12:03       ` Herbert Xu
  0 siblings, 1 reply; 24+ messages in thread
From: Herbert Xu @ 2005-01-14 11:16 UTC (permalink / raw)
  To: David S. Miller; +Cc: Anton Blanchard, netdev

[-- Attachment #1: Type: text/plain, Size: 788 bytes --]

On Thu, Jan 13, 2005 at 08:19:14PM -0800, David S. Miller wrote:
> 
> The easiest way to fix this is to simply make do_tcp_sendpages()
> account just like tcp_sendmsg() does.  This is implemented below
> and should be the real fix for the sk_forward_alloc assertion
> failures.

Sorry, but this may bypass the memory checks if sk_forward_alloc
is less than mss_now.

Since the packet is bounded by mss_now, and most of the time it
will be exactly mss_now bytes long, how about if we simply
replace tp->mss_cache by mss_now?

Could you guys please give this patch a go?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 402 bytes --]

===== net/ipv4/tcp.c 1.88 vs edited =====
--- 1.88/net/ipv4/tcp.c	2005-01-07 10:13:39 +11:00
+++ edited/net/ipv4/tcp.c	2005-01-14 22:14:41 +11:00
@@ -664,7 +664,7 @@
 			if (!sk_stream_memory_free(sk))
 				goto wait_for_sndbuf;
 
-			skb = sk_stream_alloc_pskb(sk, 0, tp->mss_cache,
+			skb = sk_stream_alloc_pskb(sk, 0, mss_now,
 						   sk->sk_allocation);
 			if (!skb)
 				goto wait_for_memory;

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14 11:16     ` Herbert Xu
@ 2005-01-14 12:03       ` Herbert Xu
  2005-01-14 19:03         ` David S. Miller
  2005-01-14 21:34         ` Herbert Xu
  0 siblings, 2 replies; 24+ messages in thread
From: Herbert Xu @ 2005-01-14 12:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: Anton Blanchard, netdev

[-- Attachment #1: Type: text/plain, Size: 808 bytes --]

On Fri, Jan 14, 2005 at 10:16:48PM +1100, herbert wrote:
> 
> Since the packet is bounded by mss_now, and most of the time it
> will be exactly mss_now bytes long, how about if we simply
> replace tp->mss_cache by mss_now?

Doh! My patch is the same as the current code :)

What we need to do is to remember mss_now across sendpage calls.  
Luckily the value of truesize lets us deduce the value of mss_now.
To complicate the picture the mss might have been reduced between
sendpages calls.  So we take the minimum of the remembered mss
and the current mss.

Any brave souls out there to try this?
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 1917 bytes --]

===== net/ipv4/tcp.c 1.88 vs edited =====
--- 1.88/net/ipv4/tcp.c	2005-01-07 10:13:39 +11:00
+++ edited/net/ipv4/tcp.c	2005-01-14 22:57:44 +11:00
@@ -654,23 +654,31 @@
 
 	while (psize > 0) {
 		struct sk_buff *skb = sk->sk_write_queue.prev;
+		int skb_mss;
 		struct page *page = pages[poffset / PAGE_SIZE];
 		int copy, i;
 		int offset = poffset % PAGE_SIZE;
 		int size = min_t(size_t, psize, PAGE_SIZE - offset);
 
-		if (!sk->sk_send_head || (copy = mss_now - skb->len) <= 0) {
+		if (!sk->sk_send_head)
+			goto new_segment;
+
+		skb_mss = skb_netsize(skb) - skb_headroom(skb);
+		if (mss_now < skb_mss)
+			skb_mss = mss_now;
+		if ((copy = skb_mss - skb->len) <= 0) {
 new_segment:
 			if (!sk_stream_memory_free(sk))
 				goto wait_for_sndbuf;
 
-			skb = sk_stream_alloc_pskb(sk, 0, tp->mss_cache,
+			skb = sk_stream_alloc_pskb(sk, 0, mss_now,
 						   sk->sk_allocation);
 			if (!skb)
 				goto wait_for_memory;
 
 			skb_entail(sk, tp, skb);
 			copy = mss_now;
+			skb_mss = mss_now;
 		}
 
 		if (copy > size)
@@ -702,14 +710,14 @@
 		if (!(psize -= copy))
 			goto out;
 
-		if (skb->len != mss_now || (flags & MSG_OOB))
+		if (skb->len != skb_mss || (flags & MSG_OOB))
 			continue;
 
 		if (forced_push(tp)) {
 			tcp_mark_push(tp, skb);
-			__tcp_push_pending_frames(sk, tp, mss_now, TCP_NAGLE_PUSH);
+			__tcp_push_pending_frames(sk, tp, skb_mss, TCP_NAGLE_PUSH);
 		} else if (skb == sk->sk_send_head)
-			tcp_push_one(sk, mss_now);
+			tcp_push_one(sk, skb_mss);
 		continue;
 
 wait_for_sndbuf:
===== include/linux/skbuff.h 1.58 vs edited =====
--- 1.58/include/linux/skbuff.h	2004-12-28 16:24:42 +11:00
+++ edited/include/linux/skbuff.h	2005-01-14 22:30:45 +11:00
@@ -1124,6 +1124,11 @@
 	return buffer;
 }
 
+static inline int skb_netsize(const struct sk_buff *skb)
+{
+	return skb->truesize - sizeof(struct sk_buff);
+}
+
 extern void skb_init(void);
 extern void skb_add_mtu(int mtu);
 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14 12:03       ` Herbert Xu
@ 2005-01-14 19:03         ` David S. Miller
  2005-01-14 20:34           ` Herbert Xu
  2005-01-14 21:34         ` Herbert Xu
  1 sibling, 1 reply; 24+ messages in thread
From: David S. Miller @ 2005-01-14 19:03 UTC (permalink / raw)
  To: Herbert Xu; +Cc: anton, netdev

On Fri, 14 Jan 2005 23:03:22 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> What we need to do is to remember mss_now across sendpage calls.  
> Luckily the value of truesize lets us deduce the value of mss_now.
> To complicate the picture the mss might have been reduced between
> sendpages calls.  So we take the minimum of the remembered mss
> and the current mss.
> 
> Any brave souls out there to try this?

Herbert, this patch doesn't fix the bug.  We still have
to do the adjustments gradually just like sendmsg().
That's the whole problem.

If sendmsg() creates the SKB, then sendpages() adds paged
data onto the end, we don't make skb->truesize big enough.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14 19:03         ` David S. Miller
@ 2005-01-14 20:34           ` Herbert Xu
  2005-01-14 21:27             ` David S. Miller
  0 siblings, 1 reply; 24+ messages in thread
From: Herbert Xu @ 2005-01-14 20:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: anton, netdev

On Fri, Jan 14, 2005 at 11:03:52AM -0800, David S. Miller wrote:
> 
> Herbert, this patch doesn't fix the bug.  We still have
> to do the adjustments gradually just like sendmsg().
> That's the whole problem.
> 
> If sendmsg() creates the SKB, then sendpages() adds paged
> data onto the end, we don't make skb->truesize big enough.

My reasoning is as follows:

The size of the skb is bounded by the MSS.  Therefore it can
never grow beyond that.  So if we allocated that much memory
then we can tack bits on as long as we don't exceed the original
MSS that was used.

The problem before was that the MSS could've changed between
sendpages calls.  If it increased then we may exceed the
amount of memory allocated originally.

My idea is to remember the original MSS so that we never exceed
it.  Did I missing something?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14 20:34           ` Herbert Xu
@ 2005-01-14 21:27             ` David S. Miller
  2005-01-14 21:38               ` Herbert Xu
  0 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2005-01-14 21:27 UTC (permalink / raw)
  To: Herbert Xu; +Cc: anton, netdev

On Sat, 15 Jan 2005 07:34:52 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> My reasoning is as follows:
> 
> The size of the skb is bounded by the MSS.  Therefore it can
> never grow beyond that.  So if we allocated that much memory
> then we can tack bits on as long as we don't exceed the original
> MSS that was used.
> 
> The problem before was that the MSS could've changed between
> sendpages calls.  If it increased then we may exceed the
> amount of memory allocated originally.
> 
> My idea is to remember the original MSS so that we never exceed
> it.  Did I missing something?

You missed the whole problem.

It has nothing to do with the MSS changing.

The bug case is when _SEND_MSG_ creates the SKB, and a subsequent
_SEND_PAGE_ call adds pages onto that SKB in the queue.

Unlike _SEND_PAGE_, _SEND_MSG_ does not do the "allocate tp->mss_cache"
thingy.  Instead, it adjusts the queueing allocation values as it copies
the data into the SKB.

Therefore my change aims to make _SEND_PAGE_ use _SEND_MSG_'s accounting
scheme (adjust as we actually add the data) so that it all works out.

That fix is in Linus's tree at this point.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14 21:27             ` David S. Miller
@ 2005-01-14 21:38               ` Herbert Xu
  2005-01-14 21:36                 ` David S. Miller
  0 siblings, 1 reply; 24+ messages in thread
From: Herbert Xu @ 2005-01-14 21:38 UTC (permalink / raw)
  To: David S. Miller; +Cc: anton, netdev

On Fri, Jan 14, 2005 at 01:27:57PM -0800, David S. Miller wrote:
> 
> You missed the whole problem.
> 
> It has nothing to do with the MSS changing.

However if the MSS does change then it can cause the same symptoms.
In fact the MSS can easily increase when TSO is enabled since it
increases as the RCV window is opened up.

> The bug case is when _SEND_MSG_ creates the SKB, and a subsequent
> _SEND_PAGE_ call adds pages onto that SKB in the queue.
> 
> Unlike _SEND_PAGE_, _SEND_MSG_ does not do the "allocate tp->mss_cache"
> thingy.  Instead, it adjusts the queueing allocation values as it copies
> the data into the SKB.
> 
> Therefore my change aims to make _SEND_PAGE_ use _SEND_MSG_'s accounting
> scheme (adjust as we actually add the data) so that it all works out.

I understand your point about sendmsg followed by sendpages.  My patch
addresses that as well by making sure that we don't tack on more bits
than what's allowed by skb->truesize.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14 21:38               ` Herbert Xu
@ 2005-01-14 21:36                 ` David S. Miller
  2005-01-14 21:55                   ` Herbert Xu
  0 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2005-01-14 21:36 UTC (permalink / raw)
  To: Herbert Xu; +Cc: anton, netdev

On Sat, 15 Jan 2005 08:38:29 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Fri, Jan 14, 2005 at 01:27:57PM -0800, David S. Miller wrote:
> > 
> > You missed the whole problem.
> > 
> > It has nothing to do with the MSS changing.
> 
> However if the MSS does change then it can cause the same symptoms.
> In fact the MSS can easily increase when TSO is enabled since it
> increases as the RCV window is opened up.

MSS changes are handled _transparently_ if we do the adjustments
as we actaully add data to the packet.  We always account what is
actually put into the packet in this way.

That's why I made the fix the way that I did.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14 21:36                 ` David S. Miller
@ 2005-01-14 21:55                   ` Herbert Xu
  2005-01-14 22:04                     ` David S. Miller
  0 siblings, 1 reply; 24+ messages in thread
From: Herbert Xu @ 2005-01-14 21:55 UTC (permalink / raw)
  To: David S. Miller; +Cc: anton, netdev

On Fri, Jan 14, 2005 at 01:36:11PM -0800, David S. Miller wrote:
> 
> MSS changes are handled _transparently_ if we do the adjustments
> as we actaully add data to the packet.  We always account what is
> actually put into the packet in this way.
> 
> That's why I made the fix the way that I did.

You're right.  However, doing so requires us to check sk_forward_alloc
every time around the loop.  That is, you'll need to do the same check
as is done in sk_stream_alloc_pskb and sk_stream_alloc_page.  If we
simply adjust sk_forward_alloc as you do now then it can easily go
negative if we didn't have enough memory to start with.

My point is that most of the time sendpages will produce a packet that
is MSS-sized.  Therefore we can avoid the duplicate work (if the
connection is at full throttle then MSS ~~ 64K which is much bigger
than the page size of 4K) by allocating MSS bytes up front in
sk_forward_alloc.

As long as we're careful about what we've allocated in the skb then
everything will be alright.  We're in trouble right now with mixed
sendmsg/sendpages calls and changing MSSs only because we're not
careful about keeping track of how much memory we've allocated
at the start.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14 21:55                   ` Herbert Xu
@ 2005-01-14 22:04                     ` David S. Miller
  2005-01-14 22:47                       ` Herbert Xu
  0 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2005-01-14 22:04 UTC (permalink / raw)
  To: Herbert Xu; +Cc: anton, netdev

On Sat, 15 Jan 2005 08:55:04 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> As long as we're careful about what we've allocated in the skb then
> everything will be alright.  We're in trouble right now with mixed
> sendmsg/sendpages calls and changing MSSs only because we're not
> careful about keeping track of how much memory we've allocated
> at the start.

I see, but we don't want sendmsg() to charge a full MSS if the
user is only writing 10 bytes of data onto the wire.  That seems
to be what happens if we implement things the way you are suggesting.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14 22:04                     ` David S. Miller
@ 2005-01-14 22:47                       ` Herbert Xu
  2005-01-15  4:55                         ` David S. Miller
  0 siblings, 1 reply; 24+ messages in thread
From: Herbert Xu @ 2005-01-14 22:47 UTC (permalink / raw)
  To: David S. Miller; +Cc: anton, netdev

[-- Attachment #1: Type: text/plain, Size: 1048 bytes --]

On Fri, Jan 14, 2005 at 02:04:26PM -0800, David S. Miller wrote:
> 
> I see, but we don't want sendmsg() to charge a full MSS if the
> user is only writing 10 bytes of data onto the wire.  That seems
> to be what happens if we implement things the way you are suggesting.

Yes you're right.  Although I belive it only happens in the case of
sendpages().  It will allocate more memory than what's required in
that case.  I'd say that it should be rare to have a 10 byte packet with
sendfile() though.

However, it's not all that difficult to fix up either.  We can adjust
truesize to a more reasonable value in tcp_write_xmit().  Something
like this will do.

Actually it happens in the case of sendmsg() too.  Unfortunately in
that case we can't do a thing about it since the memory is allocated
between skb->head and skb->tail.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 745 bytes --]

===== net/ipv4/tcp_output.c 1.75 vs edited =====
--- 1.75/net/ipv4/tcp_output.c	2004-12-29 05:22:40 +11:00
+++ edited/net/ipv4/tcp_output.c	2005-01-15 09:43:48 +11:00
@@ -750,9 +750,21 @@
 		       tcp_snd_test(tp, skb, mss_now,
 			       	    tcp_skb_is_last(sk, skb) ? nonagle :
 				    			       TCP_NAGLE_PUSH)) {
+			unsigned int len, used;
+
 			if (skb->len > mss_now) {
 				if (tcp_fragment(sk, skb, mss_now))
 					break;
+			}
+
+			len = skb_netsize(skb);
+			used = skb->data_len + (skb->end - skb->head);
+			if (len > used) {
+				len -= used;
+				skb->truesize	     -= len;
+				sk->sk_queue_shrunk   = 1;
+				sk->sk_wmem_queued   -= len;
+				sk->sk_forward_alloc += len;
 			}
 
 			TCP_SKB_CB(skb)->when = tcp_time_stamp;

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14 22:47                       ` Herbert Xu
@ 2005-01-15  4:55                         ` David S. Miller
  2005-01-15  5:45                           ` Herbert Xu
  0 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2005-01-15  4:55 UTC (permalink / raw)
  To: Herbert Xu; +Cc: anton, netdev

On Sat, 15 Jan 2005 09:47:45 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> However, it's not all that difficult to fix up either.  We can adjust
> truesize to a more reasonable value in tcp_write_xmit().  Something
> like this will do.
> 
> Actually it happens in the case of sendmsg() too.  Unfortunately in
> that case we can't do a thing about it since the memory is allocated
> between skb->head and skb->tail.

Doing the adjustments at tcp_write_xmit() time also runs into the
problem you mentioned where we're just blindly subtracting from
sk_forward_alloc.

I really still feel that the best way is to "adjust as we add data".
sendmsg() does that already, I tried to add it simply to sendpage()
but it just needs some checks and wait_for_memory logic.

Here's a patch against current BK that tries to do that.  See any
holes? :-)

===== net/ipv4/tcp.c 1.89 vs edited =====
--- 1.89/net/ipv4/tcp.c	2005-01-13 19:57:57 -08:00
+++ edited/net/ipv4/tcp.c	2005-01-14 20:22:27 -08:00
@@ -655,7 +655,7 @@
 	while (psize > 0) {
 		struct sk_buff *skb = sk->sk_write_queue.prev;
 		struct page *page = pages[poffset / PAGE_SIZE];
-		int copy, i;
+		int copy, i, can_coalesce;
 		int offset = poffset % PAGE_SIZE;
 		int size = min_t(size_t, psize, PAGE_SIZE - offset);
 
@@ -677,14 +677,20 @@
 			copy = size;
 
 		i = skb_shinfo(skb)->nr_frags;
-		if (skb_can_coalesce(skb, i, page, offset)) {
+		can_coalesce = skb_can_coalesce(skb, i, page, offset);
+		if (!can_coalesce && i >= MAX_SKB_FRAGS) {
+			tcp_mark_push(tp, skb);
+			goto new_segment;
+		}
+		if (sk->sk_forward_alloc < copy &&
+		    !sk_stream_mem_schedule(sk, copy, 0))
+			goto wait_for_memory;
+		
+		if (can_coalesce) {
 			skb_shinfo(skb)->frags[i - 1].size += copy;
-		} else if (i < MAX_SKB_FRAGS) {
+		} else {
 			get_page(page);
 			skb_fill_page_desc(skb, i, page, offset, copy);
-		} else {
-			tcp_mark_push(tp, skb);
-			goto new_segment;
 		}
 
 		skb->len += copy;

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-15  4:55                         ` David S. Miller
@ 2005-01-15  5:45                           ` Herbert Xu
  2005-01-17 20:37                             ` David S. Miller
  0 siblings, 1 reply; 24+ messages in thread
From: Herbert Xu @ 2005-01-15  5:45 UTC (permalink / raw)
  To: David S. Miller; +Cc: anton, netdev

On Fri, Jan 14, 2005 at 08:55:43PM -0800, David S. Miller wrote:
> 
> Doing the adjustments at tcp_write_xmit() time also runs into the
> problem you mentioned where we're just blindly subtracting from
> sk_forward_alloc.

Not really because what we're doing there is increasing sk_forward_alloc,
just as we do in tcp_trim_head.

Anyway, I'm not terribly attached to that idea since we're doing
a page's worth of data each time around the loop anyway so it's
not too bad.
 
> Here's a patch against current BK that tries to do that.  See any
> holes? :-)

Just one little problem :)

> +		if (sk->sk_forward_alloc < copy &&
> +		    !sk_stream_mem_schedule(sk, copy, 0))

You want to have sk->sk_allocation here instead of 0.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-15  5:45                           ` Herbert Xu
@ 2005-01-17 20:37                             ` David S. Miller
  2005-01-17 20:40                               ` Herbert Xu
  0 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2005-01-17 20:37 UTC (permalink / raw)
  To: Herbert Xu; +Cc: anton, netdev

On Sat, 15 Jan 2005 16:45:47 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> > +		if (sk->sk_forward_alloc < copy &&
> > +		    !sk_stream_mem_schedule(sk, copy, 0))
> 
> You want to have sk->sk_allocation here instead of 0.

The third argument to sk_stream_mem_schedule() is "int kind"
which is to be set to 1 for receive packet allocations and 0
for send packet allocations.

Why would I want to use a sk_allocation gfp mask here? :-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-17 20:37                             ` David S. Miller
@ 2005-01-17 20:40                               ` Herbert Xu
  0 siblings, 0 replies; 24+ messages in thread
From: Herbert Xu @ 2005-01-17 20:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: anton, netdev

On Mon, Jan 17, 2005 at 12:37:29PM -0800, David S. Miller wrote:
> On Sat, 15 Jan 2005 16:45:47 +1100
> Herbert Xu <herbert@gondor.apana.org.au> wrote:
> 
> > > +		if (sk->sk_forward_alloc < copy &&
> > > +		    !sk_stream_mem_schedule(sk, copy, 0))
> > 
> > You want to have sk->sk_allocation here instead of 0.
> 
> The third argument to sk_stream_mem_schedule() is "int kind"
> which is to be set to 1 for receive packet allocations and 0
> for send packet allocations.
> 
> Why would I want to use a sk_allocation gfp mask here? :-)

You're right.  I got carried away there :)
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14 12:03       ` Herbert Xu
  2005-01-14 19:03         ` David S. Miller
@ 2005-01-14 21:34         ` Herbert Xu
  1 sibling, 0 replies; 24+ messages in thread
From: Herbert Xu @ 2005-01-14 21:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: Anton Blanchard, netdev

[-- Attachment #1: Type: text/plain, Size: 564 bytes --]

On Fri, Jan 14, 2005 at 11:03:22PM +1100, herbert wrote:
>
> To complicate the picture the mss might have been reduced between
> sendpages calls.  So we take the minimum of the remembered mss
> and the current mss.

With that the packet size might actually exceed the current mss.
So we need to change the skb->len != skb_mss test to
skb->len < skb_mss.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 1916 bytes --]

===== net/ipv4/tcp.c 1.88 vs edited =====
--- 1.88/net/ipv4/tcp.c	2005-01-07 10:13:39 +11:00
+++ edited/net/ipv4/tcp.c	2005-01-15 08:31:52 +11:00
@@ -654,23 +654,31 @@
 
 	while (psize > 0) {
 		struct sk_buff *skb = sk->sk_write_queue.prev;
+		int skb_mss;
 		struct page *page = pages[poffset / PAGE_SIZE];
 		int copy, i;
 		int offset = poffset % PAGE_SIZE;
 		int size = min_t(size_t, psize, PAGE_SIZE - offset);
 
-		if (!sk->sk_send_head || (copy = mss_now - skb->len) <= 0) {
+		if (!sk->sk_send_head)
+			goto new_segment;
+
+		skb_mss = skb_netsize(skb) - skb_headroom(skb);
+		if (mss_now < skb_mss)
+			skb_mss = mss_now;
+		if ((copy = skb_mss - skb->len) <= 0) {
 new_segment:
 			if (!sk_stream_memory_free(sk))
 				goto wait_for_sndbuf;
 
-			skb = sk_stream_alloc_pskb(sk, 0, tp->mss_cache,
+			skb = sk_stream_alloc_pskb(sk, 0, mss_now,
 						   sk->sk_allocation);
 			if (!skb)
 				goto wait_for_memory;
 
 			skb_entail(sk, tp, skb);
 			copy = mss_now;
+			skb_mss = mss_now;
 		}
 
 		if (copy > size)
@@ -702,14 +710,14 @@
 		if (!(psize -= copy))
 			goto out;
 
-		if (skb->len != mss_now || (flags & MSG_OOB))
+		if (skb->len < skb_mss || (flags & MSG_OOB))
 			continue;
 
 		if (forced_push(tp)) {
 			tcp_mark_push(tp, skb);
-			__tcp_push_pending_frames(sk, tp, mss_now, TCP_NAGLE_PUSH);
+			__tcp_push_pending_frames(sk, tp, skb_mss, TCP_NAGLE_PUSH);
 		} else if (skb == sk->sk_send_head)
-			tcp_push_one(sk, mss_now);
+			tcp_push_one(sk, skb_mss);
 		continue;
 
 wait_for_sndbuf:
===== include/linux/skbuff.h 1.58 vs edited =====
--- 1.58/include/linux/skbuff.h	2004-12-28 16:24:42 +11:00
+++ edited/include/linux/skbuff.h	2005-01-14 22:30:45 +11:00
@@ -1124,6 +1124,11 @@
 	return buffer;
 }
 
+static inline int skb_netsize(const struct sk_buff *skb)
+{
+	return skb->truesize - sizeof(struct sk_buff);
+}
+
 extern void skb_init(void);
 extern void skb_add_mtu(int mtu);
 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14  1:12 [DEBUG]: sk_forward_alloc assertion failures David S. Miller
  2005-01-14  1:25 ` Anton Blanchard
@ 2005-01-14  1:37 ` Herbert Xu
  2005-01-14  4:09   ` David S. Miller
  2005-01-14  1:50 ` Sridhar Samudrala
  2 siblings, 1 reply; 24+ messages in thread
From: Herbert Xu @ 2005-01-14  1:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: anton, netdev

On Thu, Jan 13, 2005 at 05:12:34PM -0800, David S. Miller wrote:
> 
> [ BTW, Herbert, it just occurred to me that these adjustments
>   are incorrect for the skb->data pulling case since we really
>   aren't liberating the data. ]

Yes but I thought the idea was to fake it so the application
can transmit more packets? Without TSO, the data would've been
freed at this point.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14  1:37 ` Herbert Xu
@ 2005-01-14  4:09   ` David S. Miller
  0 siblings, 0 replies; 24+ messages in thread
From: David S. Miller @ 2005-01-14  4:09 UTC (permalink / raw)
  To: Herbert Xu; +Cc: anton, netdev

On Fri, 14 Jan 2005 12:37:37 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Thu, Jan 13, 2005 at 05:12:34PM -0800, David S. Miller wrote:
> > 
> > [ BTW, Herbert, it just occurred to me that these adjustments
> >   are incorrect for the skb->data pulling case since we really
> >   aren't liberating the data. ]
> 
> Yes but I thought the idea was to fake it so the application
> can transmit more packets? Without TSO, the data would've been
> freed at this point.

You're right.  At most we'll pseudo-free no more than 1 full
packet, so it's safe.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14  1:12 [DEBUG]: sk_forward_alloc assertion failures David S. Miller
  2005-01-14  1:25 ` Anton Blanchard
  2005-01-14  1:37 ` Herbert Xu
@ 2005-01-14  1:50 ` Sridhar Samudrala
  2005-01-14  2:10   ` Herbert Xu
  2 siblings, 1 reply; 24+ messages in thread
From: Sridhar Samudrala @ 2005-01-14  1:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: anton, netdev, herbert

Dave,

I think with this patch we will start seeing
sk_wmem_queued assertions.

Currently sk_wmem_queued seems to be accounted
correctly, but this patch changes this value too.

-Sridhar

On Thu, 13 Jan 2005, David S. Miller wrote:

>
> Anton, can you and the other IBM folks who can reproduce
> the problem give this test patch a whirl?
>
> Herbert and I aren't exactly sure what the exact sequence
> of events is that causes the problem, but we do know that
> this code is highly suspect.
>
> [ BTW, Herbert, it just occurred to me that these adjustments
>   are incorrect for the skb->data pulling case since we really
>   aren't liberating the data. ]
>
> Thanks.
>
> ===== net/ipv4/tcp_output.c 1.75 vs edited =====
> --- 1.75/net/ipv4/tcp_output.c	2004-12-28 10:22:40 -08:00
> +++ edited/net/ipv4/tcp_output.c	2005-01-13 16:45:26 -08:00
> @@ -580,12 +580,12 @@
>
>  	TCP_SKB_CB(skb)->seq += len;
>  	skb->ip_summed = CHECKSUM_HW;
> -
> +#if 0
>  	skb->truesize	     -= len;
>  	sk->sk_queue_shrunk   = 1;
>  	sk->sk_wmem_queued   -= len;
>  	sk->sk_forward_alloc += len;
> -
> +#endif
>  	/* Any change of skb->len requires recalculation of tso
>  	 * factor and mss.
>  	 */
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14  1:50 ` Sridhar Samudrala
@ 2005-01-14  2:10   ` Herbert Xu
  2005-01-14  3:44     ` David S. Miller
  0 siblings, 1 reply; 24+ messages in thread
From: Herbert Xu @ 2005-01-14  2:10 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: David S. Miller, anton, netdev

On Thu, Jan 13, 2005 at 05:50:17PM -0800, Sridhar Samudrala wrote:
> 
> I think with this patch we will start seeing
> sk_wmem_queued assertions.

Not necessarily.  What's probably happening currently is that
sk_forward_alloc is temporarily raised above the value it should
have.  It then gets cut down by the reclaim function and
subsequently goes negative.

Since there is no reclaim for sk_wmem_queued it doesn't happen
there.

So taking out the adjustments should not cause any problems of
sk_wmem_queued, theoretically :)
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
  2005-01-14  2:10   ` Herbert Xu
@ 2005-01-14  3:44     ` David S. Miller
  0 siblings, 0 replies; 24+ messages in thread
From: David S. Miller @ 2005-01-14  3:44 UTC (permalink / raw)
  To: Herbert Xu; +Cc: sri, anton, netdev

On Fri, 14 Jan 2005 13:10:05 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Since there is no reclaim for sk_wmem_queued it doesn't happen
> there.
> 
> So taking out the adjustments should not cause any problems of
> sk_wmem_queued, theoretically :)

If we take out the skb->truesize adjustment, as the test patch does,
then we must take out the sk_wmem_queued and sk_forward_alloc
adjustments, as the test patch also does. :-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [DEBUG]: sk_forward_alloc assertion failures
@ 2005-01-16 23:03 Nancy Milliner
  0 siblings, 0 replies; 24+ messages in thread
From: Nancy Milliner @ 2005-01-16 23:03 UTC (permalink / raw)
  To: netdev

Dave,

I've been running SPECweb99 with this patch all weekend and have not 
been able to produce any kernel assertions at all.  I'm currently 
running RH kernel 2.6.9-5.EL.root and have been able to reproduce at 
will when setting /proc/net/ipv4/tcp_tso_win_divisor to 1, 2 or 4.

Let me know when you have something else ready to test.

===== net/ipv4/tcp.c 1.88 vs edited =====
--- 1.88/net/ipv4/tcp.c	2005-01-06 15:13:39 -08:00
+++ edited/net/ipv4/tcp.c	2005-01-13 19:44:36 -08:00
@@ -664,7 +664,7 @@
  			if (!sk_stream_memory_free(sk))
  				goto wait_for_sndbuf;

-			skb = sk_stream_alloc_pskb(sk, 0, tp->mss_cache,
+			skb = sk_stream_alloc_pskb(sk, 0, 0,
  						   sk->sk_allocation);
  			if (!skb)
  				goto wait_for_memory;
@@ -689,6 +689,9 @@

  		skb->len += copy;
  		skb->data_len += copy;
+		skb->truesize += copy;
+		sk->sk_wmem_queued += copy;
+		sk->sk_forward_alloc -= copy;
  		skb->ip_summed = CHECKSUM_HW;
  		tp->write_seq += copy;
  		TCP_SKB_CB(skb)->end_seq += copy;

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2005-01-17 20:40 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-01-14  1:12 [DEBUG]: sk_forward_alloc assertion failures David S. Miller
2005-01-14  1:25 ` Anton Blanchard
2005-01-14  4:19   ` David S. Miller
2005-01-14 11:16     ` Herbert Xu
2005-01-14 12:03       ` Herbert Xu
2005-01-14 19:03         ` David S. Miller
2005-01-14 20:34           ` Herbert Xu
2005-01-14 21:27             ` David S. Miller
2005-01-14 21:38               ` Herbert Xu
2005-01-14 21:36                 ` David S. Miller
2005-01-14 21:55                   ` Herbert Xu
2005-01-14 22:04                     ` David S. Miller
2005-01-14 22:47                       ` Herbert Xu
2005-01-15  4:55                         ` David S. Miller
2005-01-15  5:45                           ` Herbert Xu
2005-01-17 20:37                             ` David S. Miller
2005-01-17 20:40                               ` Herbert Xu
2005-01-14 21:34         ` Herbert Xu
2005-01-14  1:37 ` Herbert Xu
2005-01-14  4:09   ` David S. Miller
2005-01-14  1:50 ` Sridhar Samudrala
2005-01-14  2:10   ` Herbert Xu
2005-01-14  3:44     ` David S. Miller
  -- strict thread matches above, loose matches on Subject: below --
2005-01-16 23:03 Nancy Milliner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).