Re: [PATCH 5/6] numa,sched: normalize faults_from stats and weigh by CPU use

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Rik van Riel <riel@redhat.com>
To: Mel Gorman <mgorman@suse.de>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	peterz@infradead.org, mingo@redhat.com, chegu_vinod@hp.com
Subject: Re: [PATCH 5/6] numa,sched: normalize faults_from stats and weigh by CPU use
Date: Tue, 21 Jan 2014 16:05:16 -0500	[thread overview]
Message-ID: <52DEE10C.6030907@redhat.com> (raw)
In-Reply-To: <20140121155652.GL4963@suse.de>

On 01/21/2014 10:56 AM, Mel Gorman wrote:
> On Mon, Jan 20, 2014 at 02:21:06PM -0500, riel@redhat.com wrote:

>> @@ -1434,6 +1436,11 @@ static void task_numa_placement(struct task_struct *p)
>>  	p->numa_scan_seq = seq;
>>  	p->numa_scan_period_max = task_scan_max(p);
>>  
>> +	total_faults = p->numa_faults_locality[0] +
>> +		       p->numa_faults_locality[1] + 1;
> 
> Depending on how you reacted to the review of other patches this may or
> may not have a helper now.

This is a faults "buffer", zeroed quickly after we take these
faults, so we should probably not tempt others by having a helper
function to get these numbers...

>> +	runtime = p->se.avg.runnable_avg_sum;
>> +	period = p->se.avg.runnable_avg_period;
>> +
> 
> Ok, IIRC these stats are based a decaying average based on recent
> history so heavy activity followed by long periods of idle will not skew
> the stats.

Turns out that using a longer time statistic results in a 1% performance
gain, so expect this code to change again in the next version :)

>> @@ -1458,8 +1465,18 @@ static void task_numa_placement(struct task_struct *p)
>>  			fault_types[priv] += p->numa_faults_buffer[i];
>>  			p->numa_faults_buffer[i] = 0;
>>  
>> +			/*
>> +			 * Normalize the faults_from, so all tasks in a group
>> +			 * count according to CPU use, instead of by the raw
>> +			 * number of faults. Tasks with little runtime have
>> +			 * little over-all impact on throughput, and thus their
>> +			 * faults are less important.
>> +			 */
>> +			f_weight = (16384 * runtime *
>> +				   p->numa_faults_from_buffer[i]) /
>> +				   (total_faults * period + 1);
> 
> Why 16384? It looks like a scaling factor to deal with integer approximations
> but I'm not 100% sure and I do not see how you arrived at that value.

Indeed, it is simply a fixed point math scaling factor.

I used 1024 before, but that is kind of a small number when we could
be dealing with a node that has 20% of the accesses, and a task that
used 10% CPU time.

Having the numbers a little larger could help, and certainly should
not hurt, as long as we keep the number small enough to avoid overflows.

>>  			p->numa_faults_from[i] >>= 1;
>> -			p->numa_faults_from[i] += p->numa_faults_from_buffer[i];
>> +			p->numa_faults_from[i] += f_weight;
>>  			p->numa_faults_from_buffer[i] = 0;
>>  
> 
> numa_faults_from needs a big comment that it's no longer about the
> number of faults in it. It's the sum of faults measured by the group
> weighted by the CPU

Agreed.

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Rik van Riel <riel@redhat.com>
To: Mel Gorman <mgorman@suse.de>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	peterz@infradead.org, mingo@redhat.com, chegu_vinod@hp.com
Subject: Re: [PATCH 5/6] numa,sched: normalize faults_from stats and weigh by CPU use
Date: Tue, 21 Jan 2014 16:05:16 -0500	[thread overview]
Message-ID: <52DEE10C.6030907@redhat.com> (raw)
In-Reply-To: <20140121155652.GL4963@suse.de>

On 01/21/2014 10:56 AM, Mel Gorman wrote:
> On Mon, Jan 20, 2014 at 02:21:06PM -0500, riel@redhat.com wrote:

>> @@ -1434,6 +1436,11 @@ static void task_numa_placement(struct task_struct *p)
>>  	p->numa_scan_seq = seq;
>>  	p->numa_scan_period_max = task_scan_max(p);
>>  
>> +	total_faults = p->numa_faults_locality[0] +
>> +		       p->numa_faults_locality[1] + 1;
> 
> Depending on how you reacted to the review of other patches this may or
> may not have a helper now.

This is a faults "buffer", zeroed quickly after we take these
faults, so we should probably not tempt others by having a helper
function to get these numbers...

>> +	runtime = p->se.avg.runnable_avg_sum;
>> +	period = p->se.avg.runnable_avg_period;
>> +
> 
> Ok, IIRC these stats are based a decaying average based on recent
> history so heavy activity followed by long periods of idle will not skew
> the stats.

Turns out that using a longer time statistic results in a 1% performance
gain, so expect this code to change again in the next version :)

>> @@ -1458,8 +1465,18 @@ static void task_numa_placement(struct task_struct *p)
>>  			fault_types[priv] += p->numa_faults_buffer[i];
>>  			p->numa_faults_buffer[i] = 0;
>>  
>> +			/*
>> +			 * Normalize the faults_from, so all tasks in a group
>> +			 * count according to CPU use, instead of by the raw
>> +			 * number of faults. Tasks with little runtime have
>> +			 * little over-all impact on throughput, and thus their
>> +			 * faults are less important.
>> +			 */
>> +			f_weight = (16384 * runtime *
>> +				   p->numa_faults_from_buffer[i]) /
>> +				   (total_faults * period + 1);
> 
> Why 16384? It looks like a scaling factor to deal with integer approximations
> but I'm not 100% sure and I do not see how you arrived at that value.

Indeed, it is simply a fixed point math scaling factor.

I used 1024 before, but that is kind of a small number when we could
be dealing with a node that has 20% of the accesses, and a task that
used 10% CPU time.

Having the numbers a little larger could help, and certainly should
not hurt, as long as we keep the number small enough to avoid overflows.

>>  			p->numa_faults_from[i] >>= 1;
>> -			p->numa_faults_from[i] += p->numa_faults_from_buffer[i];
>> +			p->numa_faults_from[i] += f_weight;
>>  			p->numa_faults_from_buffer[i] = 0;
>>  
> 
> numa_faults_from needs a big comment that it's no longer about the
> number of faults in it. It's the sum of faults measured by the group
> weighted by the CPU

Agreed.

-- 
All rights reversed

next prev parent reply	other threads:[~2014-01-21 21:05 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-20 19:21 [PATCH v3 0/6] pseudo-interleaving for automatic NUMA balancing riel
2014-01-20 19:21 ` riel
2014-01-20 19:21 ` [PATCH 1/6] numa,sched,mm: remove p->numa_migrate_deferred riel
2014-01-20 19:21   ` riel
2014-01-21 11:52   ` Mel Gorman
2014-01-21 11:52     ` Mel Gorman
2014-01-20 19:21 ` [PATCH 2/6] numa,sched: track from which nodes NUMA faults are triggered riel
2014-01-20 19:21   ` riel
2014-01-21 12:21   ` Mel Gorman
2014-01-21 12:21     ` Mel Gorman
2014-01-21 22:26     ` Rik van Riel
2014-01-21 22:26       ` Rik van Riel
2014-01-24 14:14       ` Mel Gorman
2014-01-24 14:14         ` Mel Gorman
2014-01-20 19:21 ` [PATCH 3/6] numa,sched: build per numa_group active node mask from faults_from statistics riel
2014-01-20 19:21   ` riel
2014-01-21 14:19   ` Mel Gorman
2014-01-21 14:19     ` Mel Gorman
2014-01-21 15:09     ` Rik van Riel
2014-01-21 15:09       ` Rik van Riel
2014-01-21 15:41       ` Mel Gorman
2014-01-21 15:41         ` Mel Gorman
2014-01-20 19:21 ` [PATCH 4/6] numa,sched,mm: use active_nodes nodemask to limit numa migrations riel
2014-01-20 19:21   ` riel
2014-01-21 15:08   ` Mel Gorman
2014-01-21 15:08     ` Mel Gorman
2014-01-20 19:21 ` [PATCH 5/6] numa,sched: normalize faults_from stats and weigh by CPU use riel
2014-01-20 19:21   ` riel
2014-01-21 15:56   ` Mel Gorman
2014-01-21 15:56     ` Mel Gorman
2014-01-21 21:05     ` Rik van Riel [this message]
2014-01-21 21:05       ` Rik van Riel
2014-01-20 19:21 ` [PATCH 6/6] numa,sched: do statistics calculation using local variables only riel
2014-01-20 19:21   ` riel
2014-01-21 16:15   ` Mel Gorman
2014-01-21 16:15     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52DEE10C.6030907@redhat.com \
    --to=riel@redhat.com \
    --cc=chegu_vinod@hp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.