From: Andrew Morton <akpm@linux-foundation.org>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>,
Stephen Hemminger <shemminger@linux-foundation.org>,
netdev@vger.kernel.org, bhutchings@solarflare.com,
Nick Piggin <npiggin@suse.de>
Subject: Re: [PATCH net-next-2.6] bridge: 64bit rx/tx counters
Date: Thu, 12 Aug 2010 15:11:45 -0700
Message-ID: <20100812151145.f5fa259b.akpm@linux-foundation.org>
In-Reply-To: <1281649657.2305.38.camel@edumazet-laptop>

On Thu, 12 Aug 2010 23:47:37 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday, 12 August 2010 at 08:07 -0700, Andrew Morton wrote:
> > On Thu, 12 Aug 2010 14:16:15 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > > > And all this open-coded per-cpu counter stuff added all over the place.
> > > > Were percpu_counters tested or reviewed and found inadequate and unfixable?
> > > > If so, please do tell.
> > > >
> > >
> > > percpu_counters tries hard to maintain a view of the current value of
> > > the (global) counter. This adds a cost because of a shared cache line
> > > and locking. (__percpu_counter_sum() is not very scalable on big hosts,
> > > it locks the percpu_counter lock for a possibly long iteration)
> >
> > Could be. Is percpu_counter_read_positive() unsuitable?
> >
>
> I bet most people want precise counters when doing 'ifconfig lo'
>
> SNMP applications would be very surprised to get non increasing values
> between two samples, or inexact values.

percpu_counter_read_positive() should be returning monotonically
increasing numbers - if it ever went backward that would be bad. But
yes, the value will increase in a lumpy fashion. Probably one would
need to make informed choices between percpu_counter_read_positive()
and percpu_counter_sum(), depending on the type of stat.

But that's all a bit academic.
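
To make the "lumpy" behaviour concrete, here is a minimal single-threaded user-space model of a batched per-CPU counter. It is only a sketch: `model_counter`, `model_add()`, `model_read_positive()`, `model_sum()`, `NR_CPUS` and `BATCH` are hypothetical names and values chosen for illustration, not the kernel's implementation, and the locking that the real fold and sum paths need is left out.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4   /* hypothetical CPU count for this model */
#define BATCH   32  /* hypothetical batch threshold */

struct model_counter {
    int64_t count;           /* shared, approximate global value */
    int32_t local[NR_CPUS];  /* per-CPU deltas not yet folded in */
};

/* Fast path: touch only the local slot, and fold it into the shared
 * count once the local delta reaches BATCH in either direction. */
static void model_add(struct model_counter *c, int cpu, int32_t amount)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    int32_t v = c->local[cpu] + amount;
    if (v >= BATCH || v <= -BATCH) {
        c->count += v;       /* a real implementation would lock here */
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = v;
    }
}

/* Cheap read: shared value only, clamped at zero. It lags the true
 * total by whatever is still sitting in the per-CPU slots. */
static int64_t model_read_positive(const struct model_counter *c)
{
    return c->count > 0 ? c->count : 0;
}

/* Exact read: shared value plus every per-CPU delta. This is the walk
 * over all CPUs whose cost grows with the number of CPUs. */
static int64_t model_sum(const struct model_counter *c)
{
    int64_t sum = c->count;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += c->local[cpu];
    return sum;
}
```

In this model, 31 single increments on one CPU leave the cheap read at 0 while the exact sum is 31; the 32nd increment makes the cheap read jump straight to 32. For a counter that is only ever incremented, the cheap read never goes backward, it just advances in steps of at least `BATCH`.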
>
> > > And this folding has zero effect on
> > > concurrent writers (counter updates)
> >
> > The fastpath looks a little expensive in the code you've added. The
> > write_seqlock() does an rmw and a wmb() and the stats inc is a 64-bit
> > rmw whereas percpu_counters do a simple 32-bit add. So I'd expect that
> > at some suitable batch value, percpu-counters are faster on 32-bit.
> >
>
> Hmm... 6 instructions (16 bytes of text) are a "little expensive" versus
> 120 instructions if we use percpu_counter ?
>
> Following code from drivers/net/loopback.c
>
> 	u64_stats_update_begin(&lb_stats->syncp);
> 	lb_stats->bytes += len;
> 	lb_stats->packets++;
> 	u64_stats_update_end(&lb_stats->syncp);
>
> maps on i386 to :
>
>   ff 46 10       incl   0x10(%esi)      // u64_stats_update_begin(&lb_stats->syncp);
>   89 f8          mov    %edi,%eax
>   99             cltd
>   01 7e 08       add    %edi,0x8(%esi)
>   11 56 0c       adc    %edx,0xc(%esi)
>   83 06 01       addl   $0x1,(%esi)
>   83 56 04 00    adcl   $0x0,0x4(%esi)
>   ff 46 10       incl   0x10(%esi)      // u64_stats_update_end(&lb_stats->syncp);
>
>
> Exactly 6 added instructions compared to previous kernel (32bit
> counters), only on 32bit hosts. These instructions are not expensive (no
> conditional branches, no extra register pressure) and access private cpu
> data.
>
> While two calls to __percpu_counter_add() add about 120 instructions,
> even on 64bit hosts, wasting precious cpu cycles.

Oy. You omitted the per_cpu_ptr() evaluation and, I bet, included all
the executed-1/batch-times instructions.
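
For readers following the disassembly quoted above: each 64-bit update on a 32-bit host is an add/adc pair, i.e. two separate stores. The sketch below models that split in portable C to show why a reader that runs between the two stores sees a torn value, which is what the surrounding sequence-counter increments guard against. `split_u64` and its helpers are hypothetical names for illustration, and the model only propagates a carry (it ignores the sign-extended high word that the real `adc %edx` also adds).

```c
#include <stdint.h>

/* A 64-bit counter as a 32-bit host stores it: two separate words. */
struct split_u64 {
    uint32_t lo;
    uint32_t hi;
};

/* First half of the update (the `add`): add to the low word and
 * return 1 if it wrapped, 0 otherwise. */
static uint32_t split_add_lo(struct split_u64 *v, uint32_t amount)
{
    uint32_t old = v->lo;

    v->lo = old + amount;
    return v->lo < old;
}

/* Second half of the update (the `adc`): propagate the carry. */
static void split_add_hi(struct split_u64 *v, uint32_t carry)
{
    v->hi += carry;
}

/* What a reader sees if it simply loads both words. */
static uint64_t split_read(const struct split_u64 *v)
{
    return ((uint64_t)v->hi << 32) | v->lo;
}
```

Starting from lo = 0xffffffff, hi = 0 and adding 1, a read taken between the two halves returns 0 rather than 0xffffffff or 0x100000000. After the carry is propagated the read returns 0x100000000.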
>
> > They'll usually be slower on 64-bit, until that num_possible_cpus walk
> > bites you.
> >
>
> But are you aware we already fold SNMP values using for_each_possible_cpu()
> macros, before adding 64bit counters? Not related to 64bit stuff
> really...
>
> > percpu_counters might need some work to make them irq-friendly. That
> > bare spin_lock().
> >
> > btw, I worry a bit about seqlocks in the presence of interrupts:
> >
>
> Please note that nothing is assumed about interrupts and seqcounts
>
> Both readers and writers must mask them if necessary.
>
> In most situations, masking softirq is enough for networking cases
> (updates are performed from softirq handler, reads from process context)

Yup, write_seqcount_begin/end() are pretty dangerous-looking. The
caller needs to protect the lock against other CPUs, against interrupts
and even against preemption.
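
The sequence-counter protocol under discussion can be sketched as below. This is a deliberately single-threaded model that only shows the even/odd bookkeeping and the reader's retry test; it has no memory barriers and no exclusion between writers, so it is not safe for real concurrent use. `model_stats` and the `model_*` helpers are hypothetical names; they are only loosely analogous to the u64_stats_update_begin()/u64_stats_update_end() calls quoted earlier and to the reader-side u64_stats_fetch_begin()/u64_stats_fetch_retry() helpers.

```c
#include <stdbool.h>
#include <stdint.h>

struct model_stats {
    unsigned int seq;   /* even: stable, odd: update in progress */
    uint64_t bytes;
    uint64_t packets;
};

/* Writer side. The caller is assumed to have already excluded other
 * writers, interrupts and preemption; nothing here does that. */
static void model_update_begin(struct model_stats *s)
{
    s->seq++;           /* sequence becomes odd */
}

static void model_update_end(struct model_stats *s)
{
    s->seq++;           /* sequence becomes even again */
}

/* Reader side: sample the sequence before copying the fields. */
static unsigned int model_fetch_begin(const struct model_stats *s)
{
    return s->seq;
}

/* Reader side: after copying, the snapshot must be discarded if an
 * update was in progress at the start or the sequence has moved. */
static bool model_fetch_retry(const struct model_stats *s, unsigned int start)
{
    return (start & 1) || s->seq != start;
}
```

A reader would loop: call `model_fetch_begin()`, copy `bytes` and `packets`, and repeat while `model_fetch_retry()` returns true. The danger noted above shows up directly in this model: if a second writer or an interrupt handler calls `model_update_begin()` while an update is already in flight, the sequence becomes even in the middle of the update and a reader would accept a torn snapshot.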