All of lore.kernel.org
 help / color / mirror / Atom feed
From: Borislav Petkov <borislav.petkov@amd.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	greg@kroah.com, mingo@elte.hu, norsk5@yahoo.com,
	tglx@linutronix.de, mchehab@redhat.com, aris@redhat.com,
	edt@aei.ca, linux-kernel@vger.kernel.org,
	randy.dunlap@oracle.com, Sam Ravnborg <sam@ravnborg.org>
Subject: Re: [PATCH 0/4] amd64_edac: misc fixes
Date: Wed, 3 Jun 2009 20:20:23 +0200	[thread overview]
Message-ID: <20090603182023.GA28083@aftab> (raw)
In-Reply-To: <4A24248E.3060005@zytor.com>

On Mon, Jun 01, 2009 at 11:57:18AM -0700, H. Peter Anvin wrote:
> Borislav Petkov wrote:
> > Actually, popcnt got added to gas in July 2006 so checking the gas
> > version should suffice, IMHO.
> 
> gas is part of binutils.
> 
> > Anyway, I proposed something similar before but Andrew suggested that we
> > should simply slap in the opcode so we don't need the Kbuild changes.
> > The advantage of the approach is that it works unconditionally on all
> > toolchains and introduces less code changes. Hmm...
> 
> That really sucks, though, in the long run.  I personally prefer to have
> the "right thing" -- which in this case is probably gcc intrinsics --
> and then a fallback that will gradually fall out of use.

Ok, here's a simple performance data measurement exercise:

I went and rerouted all the cpumask_weight calls in sched.c through a
noinline local definition:

static noinline unsigned int my_weight(const struct cpumask *mask)
{
       return cpumask_weight(mask);
}

so that I could be able to dynamically ftrace the invocations. Compiling
a kernel (make -j8) on a quad core Fam10h gave the following trace
(excerpt):

          <idle>-0     [000]   313.120141: my_weight <-scheduler_tick
          <idle>-0     [000]   313.120145: my_weight <-select_nohz_load_balancer
          <idle>-0     [000]   313.124133: my_weight <-scheduler_tick
          <idle>-0     [000]   313.124138: my_weight <-select_nohz_load_balancer
          <idle>-0     [000]   313.128124: my_weight <-scheduler_tick
          <idle>-0     [000]   313.128127: my_weight <-select_nohz_load_balancer
          <idle>-0     [000]   313.132116: my_weight <-scheduler_tick
          <idle>-0     [000]   313.132120: my_weight <-select_nohz_load_balancer
          <idle>-0     [000]   313.136109: my_weight <-scheduler_tick
          <idle>-0     [000]   313.136114: my_weight <-select_nohz_load_balancer
           <...>-3986  [002]   313.138868: my_weight <-sched_balance_self
           <...>-3986  [002]   313.138870: my_weight <-sched_balance_self
           <...>-4064  [003]   313.138942: my_weight <-sched_balance_self
           <...>-4064  [003]   313.138945: my_weight <-sched_balance_self
           <...>-4064  [000]   313.142034: my_weight <-sched_balance_self
           <...>-4064  [000]   313.142037: my_weight <-sched_balance_self
           <...>-4065  [001]   313.143509: my_weight <-sched_balance_self
           <...>-4065  [001]   313.143511: my_weight <-sched_balance_self
            make-3777  [000]   313.146553: my_weight <-sched_balance_self
            make-3777  [000]   313.146554: my_weight <-sched_balance_self
           <...>-4066  [001]   313.146614: my_weight <-sched_balance_self
           <...>-4066  [001]   313.146614: my_weight <-sched_balance_self
           <...>-4066  [003]   313.149516: my_weight <-sched_balance_self


and the following stats:

compile time: ~309.373623 secs
my_weight calls on _all_ cores: 54005
	(cpu0: 14262, cpu1: 14417, cpu2: 11654, cpu3: 13672)

leading to approx. 174.56 calls per second on _ALL_ cores combined. If,
hypothetically speaking, this is a representative workload and we forget
the ftrace overhead, it looks like there's no need to switch to the
hardware version of hweight since this'll bring a bunch of code changes
which simply wouldn't justify themselves wrt to performance improvement.
It is just not worth the effort.

Of course, I'm open for suggestions wrt to a better workload but from
looking at the code, the most frequent hweight call site seems to be
scheduler_tick which happens with HZ frequency and even this is by
several magnitudes not enough for a measurable performance improvement.

Hmm..?

-- 
Regards/Gruss,
Boris.

Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632


      reply	other threads:[~2009-06-03 18:20 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-20 18:43 [PATCH 0/4] amd64_edac: misc fixes Borislav Petkov
2009-05-20 18:43 ` [PATCH 1/4] x86: msr.h: fix build error Borislav Petkov
2009-05-20 18:43 ` [PATCH 2/4] amd64_edac: do not enable module by default Borislav Petkov
2009-05-20 18:43 ` [PATCH 3/4] EDAC: do not enable modules " Borislav Petkov
2009-05-20 18:43 ` [PATCH 4/4] amd64_edac: add MAINTAINERS entry Borislav Petkov
2009-05-20 21:41 ` [PATCH 0/4] amd64_edac: misc fixes Randy Dunlap
2009-05-28 23:47 ` Andrew Morton
2009-05-29 10:33   ` Borislav Petkov
2009-05-29 20:01     ` Andrew Morton
2009-05-30  8:19       ` Borislav Petkov
2009-05-30  8:40         ` Andrew Morton
2009-05-30 10:31           ` Borislav Petkov
2009-05-30 19:22           ` H. Peter Anvin
2009-06-01 14:53             ` Borislav Petkov
2009-06-01 16:54               ` H. Peter Anvin
2009-06-01 17:02                 ` H. Peter Anvin
2009-06-01 17:31                   ` H.J. Lu
2009-06-01 18:12                 ` Borislav Petkov
2009-06-01 18:57                   ` H. Peter Anvin
2009-06-03 18:20                     ` Borislav Petkov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090603182023.GA28083@aftab \
    --to=borislav.petkov@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=aris@redhat.com \
    --cc=edt@aei.ca \
    --cc=greg@kroah.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab@redhat.com \
    --cc=mingo@elte.hu \
    --cc=norsk5@yahoo.com \
    --cc=randy.dunlap@oracle.com \
    --cc=sam@ravnborg.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.