netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stephen Hemminger <shemminger@vyatta.com>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: paulmck@linux.vnet.ibm.com, David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org
Subject: Re: [PATCH 3/3] iptables: lock free counters (alternate version)
Date: Tue, 3 Feb 2009 12:44:42 -0800	[thread overview]
Message-ID: <20090203124442.44598d63@extreme> (raw)
In-Reply-To: <4988A6F4.6000902@cosmosbay.com>


> >>> +	e = t->entries[curcpu];
> >>>  	i = 0;
> >>> -	IPT_ENTRY_ITERATE(t->entries[curcpu],
> >>> +	IPT_ENTRY_ITERATE(e,
> >>>  			  t->size,
> >>>  			  set_entry_to_counter,
> >> Hum, current cpu might be interrupted by NIC, since you only disabled preemption.
> >> set_entry_to_counter() might get garbage.
> >> I suppose I already mentioned it :)


Need to change preempt_disable to local_bh_disable.

> >>>  			  counters,
> >>>  			  &i);
> >>>  
> >>>  	for_each_possible_cpu(cpu) {
> >>> +		void *p;
> >>> +
> >>>  		if (cpu == curcpu)
> >>>  			continue;
> >>> +
> >>> +		/* Swizzle the values and wait */
> >>> +		e->counters = ((struct xt_counters) { 0, 0 });
> >> I dont see what you want to do here...
> >>
> >> e->counters is the counter associated with rule #0
> >>
> >>> +		p = t->entries[cpu];
> >>> +		rcu_assign_pointer(t->entries[cpu], e);
> >>> +		synchronize_net();
> >>
> >> Oh well, not this synchronize_net() :)
> >>
> >> This wont provide atomic sampling of counters for whole CPUS, and introduce large delays
> >> on big machines (NR_CPUS >= 64)

For large number of CPU's it  would be possible to allocate
a full set of ncpu-1 new counters, then swap them in and then do one synchronize net.
Or instead of synchronize_net, maybe a "wait for CPU #n" to go through
grace period.

> > Why would this not provide the moral equivalent of atomic sampling?
> > The code above switches to another counter set, and waits for a grace
> > period.  Shouldn't this mean that all CPUs that were incrementing the
> > old set of counters have finished doing so, so that the aggregate count
> > covers all CPUs that started their increments before the pointer switch?
> > Same as acquiring a write lock, which would wait for all CPUs that
> > started their increments before starting the write-lock acquisition.
> > CPUs that started their increments after starting the write acquisition
> > would not be accounted for in the total, same as the RCU approach.
> > 
> > Steve's approach does delay reading out the counters, but it avoids
> > delaying any CPU trying to increment the counters.
> 
> I see your point, but this is not what Stephen implemented.
> 
> So.. CPU will increments which counters, if not delayed ?

The concept is that only the sum of all the counters matters, not
which CPU has the count.

> How counters will be synced again after our 'iptables -L' finished ?
> 
> "iptable -L" is not supposed to miss some counters updates (only some packets
>  might be droped at NIC level because we spend time in the collection).
> If packets matches some rules, we really want up2date counters.
> 
> Maybe we need for this collection an extra "cpu", to collect 
> all increments that were done when CPUs where directed to a 
> "secondary table/counters"
> 
> > 
> > So what am I missing here?
> > 
> 
> Well, I saw one synchronize_net() inside the for_each_possible_cpu(cpu) loop.
> Say we have NR_CPUS=4096, how long will it takes to perform "iptables -L" ?
> 
> 
> 
> General/intuitive idea would be :
> 
> switch pointers to a newly allocated table (and zeroed counters)
> wait one RCU grace period
> collect/sum all counters of "old" table + (all cpus) into user provided table
> restore previous table
> wait one RCU grace period
> disable_bh()
> collect/sum all counters of "new and temporary" table (all cpus) and
> reinject them into local cpu table (this cpu should not be interrupted)
> enable_bh()
> 
> This way, "iptables -L" is not too expensive and doesnt block packet processing at all.

Pretty much what I said.

  reply	other threads:[~2009-02-03 20:44 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-30 21:57 [PATCH 0/6] iptables: eliminate read/write lock (v0.4) Stephen Hemminger
2009-01-30 21:57 ` [PATCH 1/6] netfilter: change elements in x_tables Stephen Hemminger
2009-01-30 21:57 ` [PATCH 2/6] netfilter: remove unneeded initializations Stephen Hemminger
2009-01-30 21:57 ` [PATCH 3/6] ebtables: " Stephen Hemminger
2009-01-30 21:57 ` [PATCH 4/6] netfilter: abstract xt_counters Stephen Hemminger
2009-02-01 12:25   ` Eric Dumazet
2009-02-02 23:33     ` [PATCH 3/3] iptables: lock free counters (alternate version) Stephen Hemminger
2009-02-03 19:00       ` Eric Dumazet
2009-02-03 19:19         ` Eric Dumazet
2009-02-03 19:32         ` Paul E. McKenney
2009-02-03 20:20           ` Eric Dumazet
2009-02-03 20:44             ` Stephen Hemminger [this message]
2009-02-03 21:05               ` Eric Dumazet
2009-02-03 21:10             ` Paul E. McKenney
2009-02-03 21:22               ` Stephen Hemminger
2009-02-03 21:27                 ` Rick Jones
2009-02-03 23:11                 ` Paul E. McKenney
2009-02-03 23:18                   ` Stephen Hemminger
2009-01-30 21:57 ` [PATCH 5/6] netfilter: use sequence number synchronization for counters Stephen Hemminger
2009-01-30 21:57 ` [PATCH 6/6] netfilter: convert x_tables to use RCU Stephen Hemminger
2009-01-31 17:27   ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090203124442.44598d63@extreme \
    --to=shemminger@vyatta.com \
    --cc=dada1@cosmosbay.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).