* [mm] f7b5d647946: -3.0% dbench.throughput-MB/sec
From: Fengguang Wu @ 2014-08-19 4:41 UTC (permalink / raw)
To: Mel Gorman; +Cc: LKML, lkp
Hi Mel,
We noticed a minor dbench throughput regression on commit
f7b5d647946aae1647bf5cd26c16b3a793c1ac49 ("mm: page_alloc: abort fair
zone allocation policy when remotes nodes are encountered").
testcase: ivb44/dbench/100%
bb0b6dffa2ccfbd f7b5d647946aae1647bf5cd26
--------------- -------------------------
25692 ± 0% -3.0% 24913 ± 0% dbench.throughput-MB/sec
6974259 ± 6% -12.1% 6127616 ± 0% meminfo.DirectMap2M
18.43 ± 0% -4.6% 17.59 ± 0% turbostat.RAM_W
9302 ± 0% -3.6% 8965 ± 1% time.user_time
1425791 ± 1% -2.0% 1396598 ± 0% time.involuntary_context_switches
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Fengguang
[-- Attachment #2: reproduce --]
[-- Type: text/plain, Size: 3584 bytes --]
# Set the performance governor on all CPUs (cpu0..cpu47 on this 48-CPU machine)
for governor in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor
do
	echo performance > $governor
done
dbench 48 -c /usr/share/dbench/client.txt
* Re: [mm] f7b5d647946: -3.0% dbench.throughput-MB/sec
From: Mel Gorman @ 2014-08-19 14:34 UTC (permalink / raw)
To: Fengguang Wu; +Cc: LKML, lkp
On Tue, Aug 19, 2014 at 12:41:34PM +0800, Fengguang Wu wrote:
> Hi Mel,
>
> We noticed a minor dbench throughput regression on commit
> f7b5d647946aae1647bf5cd26c16b3a793c1ac49 ("mm: page_alloc: abort fair
> zone allocation policy when remotes nodes are encountered").
>
> testcase: ivb44/dbench/100%
>
> bb0b6dffa2ccfbd f7b5d647946aae1647bf5cd26
> --------------- -------------------------
> 25692 ± 0% -3.0% 24913 ± 0% dbench.throughput-MB/sec
> 6974259 ± 6% -12.1% 6127616 ± 0% meminfo.DirectMap2M
> 18.43 ± 0% -4.6% 17.59 ± 0% turbostat.RAM_W
> 9302 ± 0% -3.6% 8965 ± 1% time.user_time
> 1425791 ± 1% -2.0% 1396598 ± 0% time.involuntary_context_switches
>
> [...]
>
DirectMap2M changing is a major surprise and doesn't make sense for this
machine. Did the amount of memory in the machine change between the two
tests?
--
Mel Gorman
SUSE Labs
* Re: [mm] f7b5d647946: -3.0% dbench.throughput-MB/sec
From: Fengguang Wu @ 2014-08-19 15:43 UTC (permalink / raw)
To: Mel Gorman; +Cc: LKML, lkp
On Tue, Aug 19, 2014 at 03:34:28PM +0100, Mel Gorman wrote:
> On Tue, Aug 19, 2014 at 12:41:34PM +0800, Fengguang Wu wrote:
> > Hi Mel,
> >
> > We noticed a minor dbench throughput regression on commit
> > f7b5d647946aae1647bf5cd26c16b3a793c1ac49 ("mm: page_alloc: abort fair
> > zone allocation policy when remotes nodes are encountered").
> >
> > testcase: ivb44/dbench/100%
> >
> > bb0b6dffa2ccfbd f7b5d647946aae1647bf5cd26
> > --------------- -------------------------
> > 25692 ± 0% -3.0% 24913 ± 0% dbench.throughput-MB/sec
> > 6974259 ± 6% -12.1% 6127616 ± 0% meminfo.DirectMap2M
> > 18.43 ± 0% -4.6% 17.59 ± 0% turbostat.RAM_W
> > 9302 ± 0% -3.6% 8965 ± 1% time.user_time
> > 1425791 ± 1% -2.0% 1396598 ± 0% time.involuntary_context_switches
> >
> > [...]
> >
>
> DirectMap2M changing is a major surprise and doesn't make sense for this
> machine.
The ivb44's hardware configuration is
model: Ivytown Ivy Bridge-EP
nr_cpu: 48
memory: 64G
And note that this is an in-memory dbench run, which is why
dbench.throughput-MB/sec is so high.
> Did the amount of memory in the machine change between the two tests?
Nope. They are back-to-back test runs, so the environment pretty much
remains the same.
Thanks,
Fengguang
* Re: [mm] f7b5d647946: -3.0% dbench.throughput-MB/sec
From: Mel Gorman @ 2014-08-19 16:12 UTC (permalink / raw)
To: Fengguang Wu; +Cc: LKML, lkp
On Tue, Aug 19, 2014 at 11:43:51PM +0800, Fengguang Wu wrote:
> On Tue, Aug 19, 2014 at 03:34:28PM +0100, Mel Gorman wrote:
> > On Tue, Aug 19, 2014 at 12:41:34PM +0800, Fengguang Wu wrote:
> > > Hi Mel,
> > >
> > > We noticed a minor dbench throughput regression on commit
> > > f7b5d647946aae1647bf5cd26c16b3a793c1ac49 ("mm: page_alloc: abort fair
> > > zone allocation policy when remotes nodes are encountered").
> > >
> > > testcase: ivb44/dbench/100%
> > >
> > > bb0b6dffa2ccfbd f7b5d647946aae1647bf5cd26
> > > --------------- -------------------------
> > > 25692 ± 0% -3.0% 24913 ± 0% dbench.throughput-MB/sec
> > > 6974259 ± 6% -12.1% 6127616 ± 0% meminfo.DirectMap2M
> > > 18.43 ± 0% -4.6% 17.59 ± 0% turbostat.RAM_W
> > > 9302 ± 0% -3.6% 8965 ± 1% time.user_time
> > > 1425791 ± 1% -2.0% 1396598 ± 0% time.involuntary_context_switches
> > >
> > > [...]
> > >
> >
> > DirectMap2M changing is a major surprise and doesn't make sense for this
> > machine.
>
> The ivb44's hardware configuration is
>
> model: Ivytown Ivy Bridge-EP
> nr_cpu: 48
> memory: 64G
>
> And note that this is an in-memory dbench run, which is why
> dbench.throughput-MB/sec is so high.
>
> > Did the amount of memory in the machine change between the two tests?
>
> Nope. They are back-to-back test runs, so the environment pretty much
> remains the same.
Then how did DirectMap2M change? The sum of the direct maps should
correspond to the amount of physical memory, and this patch has nothing
to do with any memory-initialisation paths that might affect it.
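(That consistency check can be sketched with awk; the meminfo figures in
the here-document below are illustrative samples, not values taken from
this machine. On a live system, point the same script at /proc/meminfo.)

```shell
# Sum the DirectMap* entries and compare against MemTotal (all in kB).
# The direct map covers all physical memory, so its sum is normally a
# little larger than MemTotal, which excludes kernel-reserved memory.
awk '/^DirectMap/ { dm += $2 } /^MemTotal/ { mt = $2 }
     END { printf "DirectMap sum: %d kB, MemTotal: %d kB\n", dm, mt }' <<'EOF'
MemTotal:       65869268 kB
DirectMap4k:      116038 kB
DirectMap2M:     6131712 kB
DirectMap1G:    60817408 kB
EOF
```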
--
Mel Gorman
SUSE Labs
* Re: [mm] f7b5d647946: -3.0% dbench.throughput-MB/sec
From: Fengguang Wu @ 2014-08-19 17:36 UTC (permalink / raw)
To: Mel Gorman; +Cc: LKML, lkp
On Tue, Aug 19, 2014 at 05:12:58PM +0100, Mel Gorman wrote:
> On Tue, Aug 19, 2014 at 11:43:51PM +0800, Fengguang Wu wrote:
> > On Tue, Aug 19, 2014 at 03:34:28PM +0100, Mel Gorman wrote:
> > > On Tue, Aug 19, 2014 at 12:41:34PM +0800, Fengguang Wu wrote:
> > > > [...]
> > >
> > > DirectMap2M changing is a major surprise and doesn't make sense for this
> > > machine.
> >
> > The ivb44's hardware configuration is
> >
> > model: Ivytown Ivy Bridge-EP
> > nr_cpu: 48
> > memory: 64G
> >
> > And note that this is an in-memory dbench run, which is why
> > dbench.throughput-MB/sec is so high.
> >
> > > Did the amount of memory in the machine change between the two tests?
> >
> > Nope. They are back-to-back test runs, so the environment pretty much
> > remains the same.
>
> Then how did directmap2m change? The sum of the direct maps should
> correspond to the amount of physical memory and this patch has nothing
> to do with any memory initialisation paths that might affect this.
Good question. I'm not sure for the moment, but it looks like even
multiple boots of the same kernel bb0b6dffa2ccfbd have different
DirectMap2M values. And it only happens for kernel bb0b6dffa2ccfbd;
f7b5d64794 remains stable across all boots.
bb0b6dffa2ccfbd f7b5d647946aae1647bf5cd26
--------------- -------------------------
%stddev %change %stddev
\ | /
6974259 ± 6% -12.1% 6127616 ± 0% meminfo.DirectMap2M
Looking at the concrete numbers for each boot, the DirectMap2M value
changes each time:
"meminfo.DirectMap2M": [
7182336,
7190528,
7178240,
6131712,
7188480
],
The MemTotal does remain stable for this kernel:
"meminfo.MemTotal": [
65869268,
65869268,
65869268,
65869268,
65869268
],
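(As a cross-check, recomputing the mean and relative stddev of those five
bb0b6dffa2ccfbd samples reproduces the 6974259 ± 6% line from the
comparison table; the awk sketch below is only illustrative and not part
of the original report.)

```shell
# Mean and relative (population) stddev of the five DirectMap2M samples (kB)
echo 7182336 7190528 7178240 6131712 7188480 |
awk '{ for (i = 1; i <= NF; i++) { sum += $i; sq += $i * $i; n++ }
       mean = sum / n
       sd   = sqrt(sq / n - mean * mean)
       printf "mean %.0f stddev %.0f%%\n", mean, 100 * sd / mean }'
```

This prints `mean 6974259 stddev 6%`, i.e. the single low outlier
(6131712) accounts for essentially all of the ±6% spread.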
Attached are the full stats for the 2 kernels.
Thanks,
Fengguang
[-- Attachment #2: f7b5d6-matrix.json --]
[-- Type: application/json, Size: 240807 bytes --]
[-- Attachment #3: bb0b6d-matrix.json --]
[-- Type: application/json, Size: 243268 bytes --]
* Re: [mm] f7b5d647946: -3.0% dbench.throughput-MB/sec
From: Mel Gorman @ 2014-08-19 19:11 UTC (permalink / raw)
To: Fengguang Wu; +Cc: LKML, lkp
On Tue, Aug 19, 2014 at 11:43:51PM +0800, Fengguang Wu wrote:
> On Tue, Aug 19, 2014 at 03:34:28PM +0100, Mel Gorman wrote:
> > On Tue, Aug 19, 2014 at 12:41:34PM +0800, Fengguang Wu wrote:
> > > [...]
> >
> > DirectMap2M changing is a major surprise and doesn't make sense for this
> > machine.
>
> The ivb44's hardware configuration is
>
> model: Ivytown Ivy Bridge-EP
> nr_cpu: 48
> memory: 64G
>
> And note that this is an in-memory dbench run, which is why
> dbench.throughput-MB/sec is so high.
>
Ok, it's a NUMA machine. I expect that in this case, prior to the patch,
more local memory would have been used on node 0 due to the fair zone
allocation policy skipping remote nodes. The patch corrects the behaviour
of the zonelist, but the downside is more remote accesses for processes
running on node 0. The behaviour is correct, although not necessarily
desirable from a performance point of view. Users should boot with
numa_zonelist_order=node if it's a problem.
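(For reference, a sketch of applying that workaround, assuming root and a
kernel of this era that still exposes the sysctl; later kernels removed
zone-based zonelist ordering entirely, so treat this as a config fragment
rather than something to run on a modern system.)

```shell
# Inspect the current zonelist ordering (default/node/zone on 2014-era kernels)
cat /proc/sys/vm/numa_zonelist_order
# Switch to node-local ordering at runtime...
echo node > /proc/sys/vm/numa_zonelist_order
# ...or persistently, via the kernel command line: numa_zonelist_order=node
```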
--
Mel Gorman
SUSE Labs