From: Andrew Morton <akpm@linux-foundation.org>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	Stephen Hemminger <shemminger@linux-foundation.org>,
	netdev@vger.kernel.org, bhutchings@solarflare.com,
	Nick Piggin <npiggin@suse.de>
Subject: Re: [PATCH net-next-2.6] bridge: 64bit rx/tx counters
Date: Thu, 12 Aug 2010 08:07:31 -0700
Message-ID: <20100812080731.c9456ef9.akpm@linux-foundation.org>
In-Reply-To: <1281615375.2494.20.camel@edumazet-laptop>

On Thu, 12 Aug 2010 14:16:15 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > And all this open-coded per-cpu counter stuff added all over the place.
> > Were percpu_counters tested or reviewed and found inadequate and unfixable?
> > If so, please do tell.
> > 
> 
> percpu_counters tries hard to maintain a view of the current value of
> the (global) counter. This adds a cost because of a shared cache line
> and locking. (__percpu_counter_sum() is not very scalable on big hosts,
> it locks the percpu_counter lock for a possibly long iteration)

Could be.  Is percpu_counter_read_positive() unsuitable?

> 
> For network stats we don't want to maintain this central value, we do the
> folding only when necessary.

hm.  Well, why?  That big walk across all possible CPUs could be really
expensive for some applications.  Especially if num_possible_cpus is
much larger than num_online_cpus, which iirc can happen in
virtualisation setups; probably it can happen in non-virtualised
machines too.
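
To make the cost of "folding only when necessary" concrete, here is a minimal userspace sketch in C, not kernel code. The struct and function names (model_cpu_stats, model_fold) are hypothetical, and a plain array stands in for the per-cpu area. The point it illustrates is that the read side is linear in the number of possible CPU slots, regardless of how many are online.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU stats slot; one exists for every possible CPU. */
struct model_cpu_stats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

/*
 * Fold on read: writers only touch their own slot, and the reader pays
 * for a walk over all nr_possible slots each time a total is wanted.
 */
static struct model_cpu_stats model_fold(const struct model_cpu_stats *percpu,
					 size_t nr_possible)
{
	struct model_cpu_stats total = { 0, 0 };

	assert(percpu != NULL || nr_possible == 0);
	for (size_t cpu = 0; cpu < nr_possible; cpu++) {
		total.rx_packets += percpu[cpu].rx_packets;
		total.rx_bytes += percpu[cpu].rx_bytes;
	}
	return total;
}
```

Slots for CPUs that are possible but never came online still get visited, which is the case the paragraph above is concerned about.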

> And this folding has zero effect on
> concurrent writers (counter updates)

The fastpath looks a little expensive in the code you've added.  The
write_seqlock() does an rmw and a wmb() and the stats inc is a 64-bit
rmw whereas percpu_counters do a simple 32-bit add.  So I'd expect that
at some suitable batch value, percpu-counters are faster on 32-bit. 

They'll usually be slower on 64-bit, until that num_possible_cpus walk
bites you.
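
For comparison, a simplified userspace model in C of the batching idea behind percpu_counters follows. It is a sketch under stated assumptions, not the kernel implementation: the names, MODEL_NR_CPUS and MODEL_BATCH are all hypothetical, it is single-threaded, and the locking the kernel does around the shared count is only noted in a comment.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4		/* hypothetical CPU count for the model */
#define MODEL_BATCH   32	/* hypothetical batch threshold */

struct model_percpu_counter {
	int64_t count;			/* shared, approximate global value */
	int32_t local[MODEL_NR_CPUS];	/* per-CPU deltas not yet folded in */
};

/* Fast path is a 32-bit add; the shared count is touched once per batch. */
static void model_counter_add(struct model_percpu_counter *c, int cpu,
			      int32_t amount)
{
	int32_t v;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	v = c->local[cpu] + amount;
	if (v >= MODEL_BATCH || v <= -MODEL_BATCH) {
		/* A real implementation would take a lock around this. */
		c->count += v;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
}

/* Cheap read: no per-CPU walk, but it can lag behind the true total. */
static int64_t model_counter_read_positive(const struct model_percpu_counter *c)
{
	return c->count > 0 ? c->count : 0;
}

/* Exact read: has to visit every per-CPU slot. */
static int64_t model_counter_sum(const struct model_percpu_counter *c)
{
	int64_t sum = c->count;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum;
}
```

In this model the cheap read can be off by up to MODEL_BATCH per CPU. An add whose amount is at or above the batch value folds into the shared count every time, so the fast path only stays cheap when the batch is large relative to the typical increment.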

percpu_counters might need some work to make them irq-friendly.  That
bare spin_lock().

btw, I worry a bit about seqlocks in the presence of interrupts:

static inline void write_seqcount_begin(seqcount_t *s)
{
	s->sequence++;
	smp_wmb();
}

are we assuming that the ++ there is atomic wrt interrupts?  I think
so.  Is that always true for all architectures, compiler versions, etc?
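
The concern can be illustrated with a small userspace model in C. It assumes, purely for illustration, that the ++ is compiled to a separate load and store, and that an interrupt handler which also writes the same seqcount runs in between. The function names are hypothetical and nothing here says any real architecture or compiler behaves this way.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated interrupt handler: a nested write begin + end, i.e. +2. */
static void model_irq_writer(unsigned *sequence)
{
	*sequence += 2;
}

/*
 * sequence++ spelled out as load, add, store, with an optional simulated
 * interrupt between the load and the store.
 */
static unsigned model_inc_with_interrupt(unsigned *sequence,
					 void (*irq)(unsigned *))
{
	unsigned tmp;

	assert(sequence != NULL);
	tmp = *sequence;	/* load */
	if (irq)
		irq(sequence);	/* interrupt lands here */
	*sequence = tmp + 1;	/* store overwrites what the handler wrote */
	return *sequence;
}
```

Starting from 0, the model ends at 1 rather than 3 when the handler runs in the window, so the handler's two increments are lost. If the increment is a single instruction that an interrupt cannot split, the window does not exist.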

> For network stack, we also need to update two values, a packet counter
> and a bytes counter. percpu_counter is not very good for the 'bytes
> counter', since we would have to use an arbitrary big bias value.

OK, that's a nasty problem for percpu-counters.

> Using several percpu_counter would also probably use more cache lines.
> 
> Also please note this stuff is only needed for 32bit arches. 
> 
> Using percpu_counter would slow down network stack on modern arches.

Was this ever quantified?

> 
> I am very well aware of the percpu_counter stuff, I believe I tried to
> optimize it a bit in the past.
