From: Patrick McHardy <kaber@trash.net>
To: ben@bigfootnetworks.com
Cc: netdev@vger.kernel.org
Subject: Re: Bridge + Conntrack + SKB Recycle: Fragment Reassembly Errors
Date: Tue, 10 Nov 2009 17:50:38 +0100
Message-ID: <4AF999DE.9060206@trash.net>
In-Reply-To: <767BAF49E93AFB4B815B11325788A8ED45F0BA@L01SLCXDB03.calltower.com>

ben@bigfootnetworks.com wrote:
> We have observed significant reassembly errors when combining
> routing/bridging with conntrack + nf_defrag_ipv4 loaded, and
> skb_recycle_check-enabled interfaces.  For our test, we had a single
> Linux device with two interfaces (gianfars in this case) with SKB
> recycling enabled.  We sent large, continuous pings across the bridge,
> like this:
> ping -s 64000 -A <dest IP>
> 
> Then, we ran netstat -s --raw, and noticed that IPSTATS_MIB_REASMFAILS
> was being incremented for about 40% of the received datagrams.  Tracing the
> code in ip_fragment.c, we instrumented each of the
> IPSTATS_MIB_REASMFAILS locations, and found the culprit to be
> ip_evictor.  Nothing looked unusual here, so we placed tracing in
> ip_frag_queue, directly above:
> 	atomic_add(skb->truesize, &qp->q.net->mem);
> 
> We noticed that quite a few of the skb->truesize numbers were in the 67K
> range, which quickly overwhelms the default 192K-ish ipfrag_low_thresh.
> This means that the next time inet_frag_evictor is run, the value of
>  work = atomic_read(&nf->mem) - nf->low_thresh;
> 
> will surely be positive, and it is likely that our huge-frag-containing
> queue will be one of those evicted.
> 
> Looking at the source of these huge skbs, it seems that during
> re-fragmentation in br_nf_dev_queue_xmit (which calls ip_fragment with
> CONFIG_NF_CONNTRACK_IPV4 enabled), the huge datagram that was allocated
> to hold a successfully-reassembled skb may be getting reused?  In any
> case, when skb_recycle_check(skb, min_rx_size) is called, the huge
> (skb->truesize huge, not data huge) skb is recycled for use on RX, and
> it eventually gets enqueued for reassembly, causing the
> inet_frag_evictor to have a positive work value.
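
The accounting described in the quoted text can be checked with a small
userspace model. This is only a sketch, not kernel code: struct frag_acct
and both helpers are hypothetical names, the 192 KiB threshold stands in
for the "192K-ish" default, and 67000 bytes stands in for the observed
truesize values.

```c
/* Userspace model of the fragment memory accounting; not kernel code. */
struct frag_acct {
	int mem;        /* running sum of skb->truesize for queued fragments */
	int low_thresh; /* assumed 192 KiB, standing in for ipfrag_low_thresh */
};

/* Models: atomic_add(skb->truesize, &qp->q.net->mem); */
static void frag_acct_add(struct frag_acct *nf, int truesize)
{
	nf->mem += truesize;
}

/* Models: work = atomic_read(&nf->mem) - nf->low_thresh; */
static int evictor_work(const struct frag_acct *nf)
{
	return nf->mem - nf->low_thresh;
}
```

Under these assumptions, queueing three fragments of 67000 bytes truesize
is already enough to push the work value above zero, whatever the amount
of payload they actually carry.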

Interesting problem. I wonder what the linear size of the skb was
and whether we're just not properly adjusting truesize of the head
during refragmentation.

This code in ip_fragment() looks suspicious:

	if (skb_has_frags(skb)) {
	...
		skb_walk_frags(skb, frag) {
			...
			if (skb->sk) {
				frag->sk = skb->sk;
				frag->destructor = sock_wfree;
				truesizes += frag->truesize;
			}

truesizes is later used to adjust truesize of the head skb.
For some reason this is only done when the skb originated from
a local socket.
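
A toy userspace model of the suspicion above, to make the effect concrete.
Everything here is hypothetical (struct model_skb, detach_frags, the
only_if_sk switch). It assumes that reassembly left the head's truesize
equal to its own buffer plus all fragments, and that the head's truesize
is reduced by the accumulated sum once the frag list is detached. Whether
summing unconditionally is the right fix is not settled in this message.

```c
#include <stddef.h>

/* Toy stand-in for struct sk_buff; only the fields needed here. */
struct model_skb {
	int truesize;
	int has_sk;                  /* stands in for skb->sk != NULL */
	struct model_skb *next;      /* next skb on the frag list */
	struct model_skb *frag_list; /* only meaningful on the head */
};

/*
 * Detach the frag list and adjust the head's truesize.
 * only_if_sk != 0 mimics the excerpt: the sum is accumulated only when
 * the head has a socket. only_if_sk == 0 sums unconditionally.
 * Returns the head's truesize after the adjustment.
 */
static int detach_frags(struct model_skb *head, int only_if_sk)
{
	struct model_skb *frag;
	int truesizes = 0;

	for (frag = head->frag_list; frag; frag = frag->next) {
		if (!only_if_sk || head->has_sk)
			truesizes += frag->truesize;
	}

	head->frag_list = NULL;
	head->truesize -= truesizes;
	return head->truesize;
}
```

In this model a bridged (socket-less) head keeps the full reassembled
truesize when only_if_sk is set, which would match the "truesize huge,
not data huge" skbs reported above.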

> Our solution was to add an upper-bound check to skb_recycle_check,
> which prevents the large-ish SKBs from being used to create future
> frags and overwhelming ipfrag_low_thresh.  This seems quite clunky,
> although I would be happy to submit this as a patch...

This seems reasonable to me; there might be large skbs for
different reasons.
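
The shape of such an upper-bound check, as a userspace sketch only. The
proposed patch is not shown in this thread, so struct model_buf,
recycle_check_bounded and the max_truesize parameter are all assumptions
made for illustration; the real skb_recycle_check() performs further
checks that are omitted here.

```c
/* Toy stand-in for the parts of an skb that matter to this check. */
struct model_buf {
	unsigned int buf_size; /* usable linear buffer space */
	unsigned int truesize; /* memory accounted to this buffer */
};

/*
 * Returns 1 if the buffer may be recycled for RX, 0 otherwise.
 * The lower bound models the existing "large enough for RX" test;
 * the upper bound is the hypothetical addition discussed above.
 */
static int recycle_check_bounded(const struct model_buf *skb,
				 unsigned int min_rx_size,
				 unsigned int max_truesize)
{
	if (skb->buf_size < min_rx_size)
		return 0; /* too small to receive into */
	if (skb->truesize > max_truesize)
		return 0; /* oversized accounting, do not recycle */
	return 1;
}
```

With a bound like this, a buffer whose truesize is in the 67K range would
be freed instead of recycled, so it never reaches the reassembly queue.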
