netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <hawk@comx.dk>
To: Arnaldo Carvalho de Melo <acme@infradead.org>,
	Steven Rostedt <srostedt@redhat.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>,
	netfilter-devel <netfilter-devel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>,
	Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Subject: Re: Possible regression: Packet drops during iptables calls
Date: Thu, 16 Dec 2010 15:04:26 +0100	[thread overview]
Message-ID: <1292508266.31289.12.camel@firesoul.comx.local> (raw)
In-Reply-To: <1292343855.5934.27.camel@edumazet-laptop>


On Tue, 2010-12-14 at 17:24 +0100, Eric Dumazet wrote:
> Le mardi 14 décembre 2010 à 17:09 +0100, Jesper Dangaard Brouer a écrit :
> > On Tue, 2010-12-14 at 16:31 +0100, Eric Dumazet wrote:
> > > Le mardi 14 décembre 2010 à 15:46 +0100, Jesper Dangaard Brouer a
> > > écrit :
> > > > I'm experiencing RX packet drops during call to iptables, on my
> > > > production servers.
> > > > 
> > > > Further investigations showed, that its only the CPU executing the
> > > > iptables command that experience packet drops!?  Thus, a quick fix was
> > > > to force the iptables command to run on one of the idle CPUs (This can
> > > > be achieved with the "taskset" command).
> > > > 
> > > > I have a 2x Xeon 5550 CPU system, thus 16 CPUs (with HT enabled).  We
> > > > only use 8 CPUs due to a multiqueue limitation of 8 queues in the
> > > > 1Gbit/s NICs (82576 chips).  CPUs 0 to 7 is assigned for packet
> > > > processing via smp_affinity.
> > > > 
> > > > Can someone explain why the packet drops only occur on the CPU
> > > > executing the iptables command?
> > > > 
> > > 
> > > It blocks BH
> > > 
> > > take a look at commits :
> > > 
> > > 24b36f0193467fa727b85b4c004016a8dae999b9
> > > netfilter: {ip,ip6,arp}_tables: dont block bottom half more than
> > > necessary 
> > > 
> > > 001389b9581c13fe5fc357a0f89234f85af4215d
> > > netfilter: {ip,ip6,arp}_tables: avoid lockdep false positiv
<... cut ...>
> > 
> > Looking closer at the two combined code change, I see that the code path
> > has been improved (a bit), as the local BH is only disabled inside the
> > for_each_possible_cpu(cpu).  Before local_bh was disabled for the hole
> > function.  Guess I need to reproduce this in my testlab.


To do some further investigation into the unfortunate behavior of the
iptables get_counters() function I started to use "ftrace".  This is a
really useful tool (thanks Steven Rostedt).

 # Select the tracer "function_graph"
 echo function_graph > /sys/kernel/debug/tracing/current_tracer

 # Limit the number of function we look at:
 echo local_bh_\*  >   /sys/kernel/debug/tracing/set_ftrace_filter
 echo get_counters >>  /sys/kernel/debug/tracing/set_ftrace_filter

 # Enable tracing while calling iptables
 cd /sys/kernel/debug/tracing
 echo 0 > trace
 echo 1 > tracing_enabled;
   taskset 1 iptables -vnL > /dev/null ;
 echo 0 > tracing_enabled
 cat trace | less


The reduced output:

# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
  2)   2.772 us    |  local_bh_disable();
....
  0)   0.228 us    |  local_bh_enable();
  0)               |  get_counters() {
  0)   0.232 us    |    local_bh_disable();
  0)   7.919 us    |    local_bh_enable();
  0) ! 109467.1 us |  }
  0)   2.344 us    |  local_bh_disable();


The output show that we spend no less that 100 ms with local BH
disabled.  So, no wonder that this causes packet drops in the NIC
(attached to this CPU).

My iptables rule set in question is also very large, it contains:
 Chains: 20929
 Rules: 81239

The vmalloc size is approx 19 MB (19.820.544 bytes) (see
/proc/vmallocinfo).  Looking through vmallocinfo I realized that
even-though I only have 16 CPUs, there is 32 allocated rulesets
"xt_alloc_table_info" (for the filter table). Thus, I have approx
634MB iptables filter rules in the kernel, half of which is totally
unused.

Guess this is because we use: "for_each_possible_cpu" instead of
"for_each_online_cpu". (Feel free to fix this, or point me to some
documentation of this CPU hotplug stuff... I see we are missing
get_cpu() and put_cpu() a lot of places).


The GOOD NEWS, is that moving the local BH disable section into the
"for_each_possible_cpu" fixed the problem with packet drops during
iptables calls.

I wanted to profile with ftrace on the new code, but I cannot get the
measurement I want. Perhaps Steven or Acme can help?

Now I want to measure the time used between the local_bh_disable() and
local_bh_enable, within the loop.  I cannot figure out howto do that?
The new trace looks almost the same as before, just a lot of
local_bh_* inside the get_counters() function call.

 Guess is that the time spend is: 100 ms / 32 = 3.125 ms.

Now I just need to calculate, how large a NIC buffer I need to buffer
3.125 ms at 1Gbit/s.

 3.125 ms *  1Gbit/s = 390625 bytes

Can this be correct?

How much buffer does each queue have in the 82576 NIC?
(Hope Alexander Duyck can answer this one?)

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network Kernel Developer
  Cand. Scient Datalog / MSc.CS
  Author of http://adsl-optimizer.dk
  LinkedIn: http://www.linkedin.com/in/brouer


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2010-12-16 14:04 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-14 14:46 Possible regression: Packet drops during iptables calls Jesper Dangaard Brouer
2010-12-14 15:31 ` Eric Dumazet
2010-12-14 16:09   ` Jesper Dangaard Brouer
2010-12-14 16:24     ` Eric Dumazet
2010-12-16 14:04       ` Jesper Dangaard Brouer [this message]
2010-12-16 14:12         ` Eric Dumazet
2010-12-16 14:24           ` Jesper Dangaard Brouer
2010-12-16 14:29             ` Eric Dumazet
2010-12-16 15:02               ` Eric Dumazet
2010-12-16 16:07                 ` [PATCH net-next-2.6] netfilter: ip_tables: dont block BH while reading counters Eric Dumazet
2010-12-16 16:53                   ` [PATCH v2 " Eric Dumazet
2010-12-16 17:31                     ` Stephen Hemminger
2010-12-16 17:53                       ` [PATCH v3 net-next-2.6] netfilter: x_tables: " Eric Dumazet
2010-12-16 17:57                         ` Stephen Hemminger
2010-12-16 19:58                           ` Eric Dumazet
2010-12-16 20:12                             ` Stephen Hemminger
2010-12-16 20:40                               ` Eric Dumazet
2010-12-16 17:57                         ` Stephen Hemminger
2010-12-18  4:29                         ` [PATCH v4 " Eric Dumazet
2010-12-20 13:42                           ` Jesper Dangaard Brouer
2010-12-20 14:45                             ` Eric Dumazet
2010-12-21 16:48                               ` Jesper Dangaard Brouer
2011-01-08 16:45                           ` Eric Dumazet
2011-01-09 21:31                             ` Pablo Neira Ayuso
2010-12-16 14:13         ` Possible regression: Packet drops during iptables calls Eric Dumazet
2010-12-16 14:20         ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1292508266.31289.12.camel@firesoul.comx.local \
    --to=hawk@comx.dk \
    --cc=acme@infradead.org \
    --cc=alexander.h.duyck@intel.com \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=peter.p.waskiewicz.jr@intel.com \
    --cc=shemminger@vyatta.com \
    --cc=srostedt@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).