All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <shemminger@vyatta.com>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: paulmck@linux.vnet.ibm.com, David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org
Subject: Re: [PATCH 3/3] iptables: lock free counters (alternate version)
Date: Tue, 3 Feb 2009 12:44:42 -0800	[thread overview]
Message-ID: <20090203124442.44598d63@extreme> (raw)
In-Reply-To: <4988A6F4.6000902@cosmosbay.com>


> >>> +	e = t->entries[curcpu];
> >>>  	i = 0;
> >>> -	IPT_ENTRY_ITERATE(t->entries[curcpu],
> >>> +	IPT_ENTRY_ITERATE(e,
> >>>  			  t->size,
> >>>  			  set_entry_to_counter,
> >> Hum, current cpu might be interrupted by NIC, since you only disabled preemption.
> >> set_entry_to_counter() might get garbage.
> >> I suppose I already mentioned it :)


Need to change preempt_disable to local_bh_disable.

> >>>  			  counters,
> >>>  			  &i);
> >>>  
> >>>  	for_each_possible_cpu(cpu) {
> >>> +		void *p;
> >>> +
> >>>  		if (cpu == curcpu)
> >>>  			continue;
> >>> +
> >>> +		/* Swizzle the values and wait */
> >>> +		e->counters = ((struct xt_counters) { 0, 0 });
> >> I dont see what you want to do here...
> >>
> >> e->counters is the counter associated with rule #0
> >>
> >>> +		p = t->entries[cpu];
> >>> +		rcu_assign_pointer(t->entries[cpu], e);
> >>> +		synchronize_net();
> >>
> >> Oh well, not this synchronize_net() :)
> >>
> >> This wont provide atomic sampling of counters for whole CPUS, and introduce large delays
> >> on big machines (NR_CPUS >= 64)

For large number of CPU's it  would be possible to allocate
a full set of ncpu-1 new counters, then swap them in and then do one synchronize net.
Or instead of synchronize_net, maybe a "wait for CPU #n" to go through
grace period.

> > Why would this not provide the moral equivalent of atomic sampling?
> > The code above switches to another counter set, and waits for a grace
> > period.  Shouldn't this mean that all CPUs that were incrementing the
> > old set of counters have finished doing so, so that the aggregate count
> > covers all CPUs that started their increments before the pointer switch?
> > Same as acquiring a write lock, which would wait for all CPUs that
> > started their increments before starting the write-lock acquisition.
> > CPUs that started their increments after starting the write acquisition
> > would not be accounted for in the total, same as the RCU approach.
> > 
> > Steve's approach does delay reading out the counters, but it avoids
> > delaying any CPU trying to increment the counters.
> 
> I see your point, but this is not what Stephen implemented.
> 
> So.. CPU will increments which counters, if not delayed ?

The concept is that only the sum of all the counters matters, not
which CPU has the count.

> How counters will be synced again after our 'iptables -L' finished ?
> 
> "iptable -L" is not supposed to miss some counters updates (only some packets
>  might be droped at NIC level because we spend time in the collection).
> If packets matches some rules, we really want up2date counters.
> 
> Maybe we need for this collection an extra "cpu", to collect 
> all increments that were done when CPUs where directed to a 
> "secondary table/counters"
> 
> > 
> > So what am I missing here?
> > 
> 
> Well, I saw one synchronize_net() inside the for_each_possible_cpu(cpu) loop.
> Say we have NR_CPUS=4096, how long will it takes to perform "iptables -L" ?
> 
> 
> 
> General/intuitive idea would be :
> 
> switch pointers to a newly allocated table (and zeroed counters)
> wait one RCU grace period
> collect/sum all counters of "old" table + (all cpus) into user provided table
> restore previous table
> wait one RCU grace period
> disable_bh()
> collect/sum all counters of "new and temporary" table (all cpus) and
> reinject them into local cpu table (this cpu should not be interrupted)
> enable_bh()
> 
> This way, "iptables -L" is not too expensive and doesnt block packet processing at all.

Pretty much what I said.

  reply	other threads:[~2009-02-03 20:44 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-30 21:57 [PATCH 0/6] iptables: eliminate read/write lock (v0.4) Stephen Hemminger
2009-01-30 21:57 ` [PATCH 1/6] netfilter: change elements in x_tables Stephen Hemminger
2009-01-30 21:57 ` [PATCH 2/6] netfilter: remove unneeded initializations Stephen Hemminger
2009-01-30 21:57 ` [PATCH 3/6] ebtables: " Stephen Hemminger
2009-01-30 21:57 ` [PATCH 4/6] netfilter: abstract xt_counters Stephen Hemminger
2009-02-01 12:25   ` Eric Dumazet
2009-02-02 23:33     ` [PATCH 3/3] iptables: lock free counters (alternate version) Stephen Hemminger
2009-02-03 19:00       ` Eric Dumazet
2009-02-03 19:19         ` Eric Dumazet
2009-02-03 19:32         ` Paul E. McKenney
2009-02-03 20:20           ` Eric Dumazet
2009-02-03 20:44             ` Stephen Hemminger [this message]
2009-02-03 21:05               ` Eric Dumazet
2009-02-03 21:10             ` Paul E. McKenney
2009-02-03 21:22               ` Stephen Hemminger
2009-02-03 21:27                 ` Rick Jones
2009-02-03 23:11                 ` Paul E. McKenney
2009-02-03 23:18                   ` Stephen Hemminger
2009-01-30 21:57 ` [PATCH 5/6] netfilter: use sequence number synchronization for counters Stephen Hemminger
2009-01-30 21:57 ` [PATCH 6/6] netfilter: convert x_tables to use RCU Stephen Hemminger
2009-01-31 17:27   ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090203124442.44598d63@extreme \
    --to=shemminger@vyatta.com \
    --cc=dada1@cosmosbay.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.