From: Andrew Morton <akpm@linux-foundation.org>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>,
Stephen Hemminger <shemminger@linux-foundation.org>,
netdev@vger.kernel.org, bhutchings@solarflare.com,
Nick Piggin <npiggin@suse.de>
Subject: Re: [PATCH net-next-2.6] bridge: 64bit rx/tx counters
Date: Thu, 12 Aug 2010 15:11:45 -0700
Message-ID: <20100812151145.f5fa259b.akpm@linux-foundation.org>
In-Reply-To: <1281649657.2305.38.camel@edumazet-laptop>
On Thu, 12 Aug 2010 23:47:37 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday 12 August 2010 at 08:07 -0700, Andrew Morton wrote:
> > On Thu, 12 Aug 2010 14:16:15 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > > > And all this open-coded per-cpu counter stuff added all over the place.
> > > > Were percpu_counters tested or reviewed and found inadequate and unfixable?
> > > > If so, please do tell.
> > > >
> > >
> > > percpu_counters tries hard to maintain a view of the current value of
> > > the (global) counter. This adds a cost because of a shared cache line
> > > and locking. (__percpu_counter_sum() is not very scalable on big hosts,
> > > it locks the percpu_counter lock for a possibly long iteration)
> >
> > Could be. Is percpu_counter_read_positive() unsuitable?
> >
>
> I bet most people want precise counters when doing 'ifconfig lo'
>
> SNMP applications would be very surprised to get non increasing values
> between two samples, or inexact values.

percpu_counter_read_positive() should be returning monotonically
increasing numbers - if it ever went backward that would be bad. But
yes, the value will increase in a lumpy fashion. Probably one would
need to make informed choices between percpu_counter_read_positive()
and percpu_counter_sum(), depending on the type of stat.

But that's all a bit academic.

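For illustration, here is a minimal userspace sketch of the batching idea
behind that trade-off. It is a single-threaded model, not the code in
lib/percpu_counter.c: the names pcpu_counter_model, model_add(),
model_read_positive(), model_sum(), NR_CPUS and BATCH are all made up for
this example, and the spinlock the real implementation takes when folding
a local delta into the shared count is only mentioned in a comment.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4   /* hypothetical CPU count for this model */
#define BATCH   32  /* hypothetical batch size */

struct pcpu_counter_model {
	int64_t count;           /* shared, approximate global value */
	int32_t local[NR_CPUS];  /* per-CPU deltas not yet folded in */
};

/* Fast path: touch only the local slot until the delta reaches BATCH. */
static void model_add(struct pcpu_counter_model *c, int cpu, int32_t amount)
{
	int32_t v;

	assert(cpu >= 0 && cpu < NR_CPUS);
	v = c->local[cpu] + amount;
	if (v >= BATCH || v <= -BATCH) {
		c->count += v;      /* the real code holds a spinlock here */
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
}

/* Cheap read: the shared value only, clamped at zero. */
static int64_t model_read_positive(const struct pcpu_counter_model *c)
{
	return c->count > 0 ? c->count : 0;
}

/* Exact read: walk every per-CPU slot and add the pending deltas. */
static int64_t model_sum(const struct pcpu_counter_model *c)
{
	int64_t sum = c->count;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum;
}
```

With increment-only updates the cheap read in this model never goes
backward, but it moves in steps of at least BATCH, while the exact read
has to visit every slot.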
>
> > > And this folding has zero effect on
> > > concurrent writers (counter updates)
> >
> > The fastpath looks a little expensive in the code you've added. The
> > write_seqlock() does an rmw and a wmb() and the stats inc is a 64-bit
> > rmw whereas percpu_counters do a simple 32-bit add. So I'd expect that
> > at some suitable batch value, percpu-counters are faster on 32-bit.
> >
>
> Hmm... 6 instructions (16 bytes of text) are a "little expensive" versus
> 120 instructions if we use percpu_counter ?
>
> Following code from drivers/net/loopback.c
>
> u64_stats_update_begin(&lb_stats->syncp);
> lb_stats->bytes += len;
> lb_stats->packets++;
> u64_stats_update_end(&lb_stats->syncp);
>
> maps on i386 to :
>
> ff 46 10 incl 0x10(%esi) // u64_stats_update_begin(&lb_stats->syncp);
> 89 f8 mov %edi,%eax
> 99 cltd
> 01 7e 08 add %edi,0x8(%esi)
> 11 56 0c adc %edx,0xc(%esi)
> 83 06 01 addl $0x1,(%esi)
> 83 56 04 00 adcl $0x0,0x4(%esi)
> ff 46 10 incl 0x10(%esi) // u64_stats_update_end(&lb_stats->syncp);
>
>
> Exactly 6 added instructions compared to previous kernel (32bit
> counters), only on 32bit hosts. These instructions are not expensive (no
> conditional branches, no extra register pressure) and access private cpu
> data.
>
> While two calls to __percpu_counter_add() add about 120 instructions,
> even on 64bit hosts, wasting precious cpu cycles.

Oy. You omitted the per_cpu_ptr() evaluation and, I bet, included all
the executed-1/batch-times instructions.

>
> > They'll usually be slower on 64-bit, until that num_possible_cpus walk
> > bites you.
> >
>
> But are you aware we already fold SNMP values using for_each_possible()
> macros, before adding 64bit counters ? Not related to 64bit stuff
> really...
> > percpu_counters might need some work to make them irq-friendly. That
> > bare spin_lock().
> >
> > btw, I worry a bit about seqlocks in the presence of interrupts:
> >
>
> Please note that nothing is assumed about interrupts and seqcounts
>
> Both readers and writers must mask them if necessary.
>
> In most situations, masking softirq is enough for networking cases
> (updates are performed from softirq handler, reads from process context)

Yup, write_seqcount_begin/end() are pretty dangerous-looking. The
caller needs to protect the lock against other CPUs, against interrupts
and even against preemption.
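
To show why that protection matters, here is a small single-threaded
userspace model of a sequence-counted stats block. It is only a sketch:
stats_model, model_update_begin(), model_update_end(), model_account()
and model_try_read() are hypothetical names, there are no memory
barriers, and the reader makes a single attempt where a real seqcount
reader would loop until it succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct stats_model {
	uint64_t bytes;
	uint64_t packets;
	unsigned int seq;  /* odd while an update is in progress */
};

/* Writer entry: a nested writer on the same counter is a bug. */
static void model_update_begin(struct stats_model *s)
{
	assert((s->seq & 1) == 0);
	s->seq++;
}

/* Writer exit: must pair with model_update_begin(). */
static void model_update_end(struct stats_model *s)
{
	assert(s->seq & 1);
	s->seq++;
}

/* The update pattern quoted above, applied to the model. */
static void model_account(struct stats_model *s, uint32_t len)
{
	model_update_begin(s);
	s->bytes += len;
	s->packets++;
	model_update_end(s);
}

/*
 * One reader attempt. Returns false if a write was in progress or the
 * sequence changed while the values were being copied.
 */
static bool model_try_read(const struct stats_model *s,
			   uint64_t *bytes, uint64_t *packets)
{
	unsigned int start = s->seq;

	if (start & 1)
		return false;
	*bytes = s->bytes;
	*packets = s->packets;
	return s->seq == start;
}
```

In this model a read attempted between model_update_begin() and
model_update_end() always fails. A looping reader that interrupted or
preempted the writer on the same CPU at that point would therefore spin
until the writer ran again, and a second writer entering there would
trip the assertion.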