From: Patrick McHardy <kaber@trash.net>
To: ben@bigfootnetworks.com
Cc: netdev@vger.kernel.org
Subject: Re: Bridge + Conntrack + SKB Recycle: Fragment Reassembly Errors
Date: Tue, 10 Nov 2009 17:50:38 +0100
Message-ID: <4AF999DE.9060206@trash.net>
In-Reply-To: <767BAF49E93AFB4B815B11325788A8ED45F0BA@L01SLCXDB03.calltower.com>

ben@bigfootnetworks.com wrote:
> We have observed significant reassembly errors when combining
> routing/bridging with conntrack + nf_defrag_ipv4 loaded, and
> skb_recycle_check-enabled interfaces.  For our test, we had a single
> Linux device with two interfaces (gianfars in this case) with SKB
> recycling enabled.  We sent large, continuous pings across the bridge,
> like this:
> ping -s 64000 -A <dest IP>
> 
> Then, we ran netstat -s --raw, and noticed that IPSTATS_MIB_REASMFAILS
> were happening for about 40% of the received datagrams.  Tracing the
> code in ip_fragment.c, we instrumented each of the
> IPSTATS_MIB_REASMFAILS locations, and found the culprit to be
> ip_evictor.  Nothing looked unusual here, so we placed tracing in
> ip_frag_queue, directly above:
> 	atomic_add(skb->truesize, &qp->q.net->mem);
> 
> We noticed that quite a few of the skb->truesize numbers were in the 67K
> range, which quickly overwhelms the default 192K-ish ipfrag_low_thresh.
> This means that the next time inet_frag_evictor is run, the value
> computed by:
>  work = atomic_read(&nf->mem) - nf->low_thresh;
> 
> will surely be positive, and it is likely that our huge-frag-containing
> queue will be one of those evicted.
> 
> Looking at the source of these huge skbs, it seems that during
> re-fragmentation in br_nf_dev_queue_xmit (which calls ip_fragment with
> CONFIG_NF_CONNTRACK_IPV4 enabled), the huge datagram that was allocated
> to hold a successfully-reassembled skb may be getting reused?  In any
> case, when skb_recycle_check(skb, min_rx_size) is called, the huge
> (skb->truesize huge, not data huge) skb is recycled for use on RX, and
> it eventually gets enqueued for reassembly, causing the
> inet_frag_evictor to have a positive work value.
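
To put rough numbers on the accounting described above, here is a small
user-space model. It is only a sketch: frag_mem_model, frag_mem_charge,
evictor_work and frags_until_eviction are made-up names, not kernel
functions, and the sizes are the approximate ones from the report (a
192K low threshold and truesize values around 67K).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-netns fragment memory counters. */
struct frag_mem_model {
	long mem;        /* bytes currently charged to fragment queues */
	long low_thresh; /* eviction target, in the role of ipfrag_low_thresh */
};

/* Models atomic_add(skb->truesize, &qp->q.net->mem) for one fragment. */
static void frag_mem_charge(struct frag_mem_model *nf, long truesize)
{
	assert(truesize >= 0);
	nf->mem += truesize;
}

/*
 * Models work = atomic_read(&nf->mem) - nf->low_thresh.
 * A positive result means the evictor has memory to reclaim.
 */
static long evictor_work(const struct frag_mem_model *nf)
{
	return nf->mem - nf->low_thresh;
}

/* How many fragments of this truesize fit before work goes positive. */
static long frags_until_eviction(long low_thresh, long truesize)
{
	assert(truesize > 0);
	return low_thresh / truesize;
}
```

With low_thresh = 192 * 1024 and a truesize of 67 * 1024, only two
fragments fit under the threshold; charging a third leaves
evictor_work() positive.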

Interesting problem. I wonder what the linear size of the skb was
and whether we're just not properly adjusting truesize of the head
during refragmentation.

This code in ip_fragment() looks suspicious:

	if (skb_has_frags(skb)) {
	...
		skb_walk_frags(skb, frag) {
			...
			if (skb->sk) {
				frag->sk = skb->sk;
				frag->destructor = sock_wfree;
				truesizes += frag->truesize;
			}

truesizes is later used to adjust truesize of the head skb.
For some reason this is only done when it originated from a
local socket.
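
A simplified user-space sketch of what that condition means for the
head's truesize follows. It assumes the later adjustment is a plain
subtraction of the accumulated value from the head; skb_model and
head_truesize_after_split are hypothetical names for illustration, not
kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct skb_model {
	unsigned int truesize;  /* bytes of memory accounted to this buffer */
	int has_sk;             /* non-zero if owned by a local socket */
	struct skb_model *next; /* next entry on the frag list */
};

/*
 * Walk the frag list, accumulate the fragments' truesize, and return
 * what the head's truesize would be after subtracting that sum.
 * only_if_sk = 1 models the condition in the snippet above;
 * only_if_sk = 0 models an unconditional adjustment.
 */
static unsigned int head_truesize_after_split(const struct skb_model *head,
					      const struct skb_model *frag_list,
					      int only_if_sk)
{
	unsigned int truesizes = 0;
	const struct skb_model *frag;

	for (frag = frag_list; frag != NULL; frag = frag->next) {
		if (!only_if_sk || head->has_sk)
			truesizes += frag->truesize;
	}
	assert(truesizes <= head->truesize);
	return head->truesize - truesizes;
}
```

In this model a forwarded skb (has_sk == 0) keeps its full truesize
when only_if_sk is set, even though its fragments have been split off.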

> Our solution was to add an upper-bounds check to skb_recycle_check,
> which prevents the large-ish SKBs from being used to create future
> frags, and overwhelming ipfrag_low_thresh.  This seems quite clunky,
> although I would be happy to submit this as a patch...

This seems reasonable to me; there might be large skbs for
different reasons.
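
Roughly, such a check could look like the sketch below. This is a
user-space model, not a patch: rx_buf_model, recycle_check_bounded and
the max_truesize parameter are hypothetical, and the other conditions
the real skb_recycle_check() tests are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the sizes a recycle check would look at. */
struct rx_buf_model {
	size_t linear_size; /* usable bytes in the linear data area */
	size_t truesize;    /* total bytes accounted to the buffer */
};

/*
 * Accept a buffer for RX reuse only if it is large enough for the
 * driver's receive size and its truesize does not exceed max_truesize.
 * The second test is the proposed upper bound; its limit is an assumed
 * parameter chosen by the caller.
 */
static bool recycle_check_bounded(const struct rx_buf_model *buf,
				  size_t min_rx_size, size_t max_truesize)
{
	assert(min_rx_size <= max_truesize);
	if (buf->linear_size < min_rx_size)
		return false; /* too small to receive into */
	if (buf->truesize > max_truesize)
		return false; /* oversized: would inflate frag accounting */
	return true;
}
```

A buffer with a 67K truesize would be rejected here regardless of how
it came to be that large, which matches the point that big skbs can
show up for more than one reason.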
