From: Florian Westphal <fw@strlen.de>
To: "liujian (CE)" <liujian56@huawei.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"kuznet@ms2.inr.ac.ru" <kuznet@ms2.inr.ac.ru>,
	"yoshfuji@linux-ipv6.org" <yoshfuji@linux-ipv6.org>,
	"elena.reshetova@intel.com" <elena.reshetova@intel.com>,
	"edumazet@google.com" <edumazet@google.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"Wangkefeng (Kevin)" <wangkefeng.wang@huawei.com>,
	"weiyongjun (A)" <weiyongjun1@huawei.com>
Subject: Re: Question about ip_defrag
Date: Mon, 28 Aug 2017 16:00:32 +0200
Message-ID: <20170828140032.GB12926@breakpoint.cc>
In-Reply-To: <4F88C5DDA1E80143B232E89585ACE27D018F3157@DGGEMA502-MBX.china.huawei.com>

liujian (CE) <liujian56@huawei.com> wrote:
> Hi
> 
> I checked our 3.10 kernel; we had backported all the percpu_counter bug fixes in lib/percpu_counter.c and include/linux/percpu_counter.h.
> I also checked 4.13-rc6; it has the same issue if the number of NIC RX CPUs is large enough.
> 
> > > > > the issue:
> > > > > ip_defrag fails because frag_mem_limit reached 4M (frags.high_thresh).
> > > > > At this moment, sum_frag_mem_limit is about 10K.
> 
> So should we change the ipfrag high/low thresh to more reasonable values?
> And if so, is there a standard for choosing them?

Each CPU can hold up to frag_percpu_counter_batch bytes in its local
counter that the global estimate (frag_mem_limit()) doesn't know about,
so with 64 CPUs that is ~8 Mbyte.

possible solutions:
1. reduce frag_percpu_counter_batch to 16k or so
2. make both low and high thresh depend on NR_CPUS

liujian, does this change help in any way?

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -123,6 +123,17 @@ static bool inet_fragq_should_evict(const struct inet_frag_queue *q)
 	       frag_mem_limit(q->net) >= q->net->low_thresh;
 }
 
+/* ->mem batch size is huge, this can cause severe discrepancies
+ * between actual value (sum of pcpu values) and the global estimate.
+ *
+ * Use a smaller batch to give an opportunity for the global estimate
+ * to more accurately reflect current state.
+ */
+static void update_frag_mem_limit(struct netns_frags *nf, unsigned int batch)
+{
+	percpu_counter_add_batch(&nf->mem, 0, batch);
+}
+
 static unsigned int
 inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
 {
@@ -146,8 +157,12 @@ inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
 
 	spin_unlock(&hb->chain_lock);
 
-	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
+	hlist_for_each_entry_safe(fq, n, &expired, list_evictor) {
+		struct netns_frags *nf = fq->net;
+
 		f->frag_expire((unsigned long) fq);
+		update_frag_mem_limit(nf, 1);
+	}
 
 	return evicted;
 }
@@ -396,8 +411,10 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 	struct inet_frag_queue *q;
 	int depth = 0;
 
-	if (frag_mem_limit(nf) > nf->low_thresh)
+	if (frag_mem_limit(nf) > nf->low_thresh) {
 		inet_frag_schedule_worker(f);
+		update_frag_mem_limit(nf, SKB_TRUESIZE(1500) * 16);
+	}
 
 	hash &= (INETFRAGS_HASHSZ - 1);
 	hb = &f->hash[hash];
@@ -416,6 +433,8 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 	if (depth <= INETFRAGS_MAXDEPTH)
 		return inet_frag_create(nf, f, key);
 
+	update_frag_mem_limit(nf, 1);
+
 	if (inet_frag_may_rebuild(f)) {
 		if (!f->rebuild)
 			f->rebuild = true;
