public inbox for linux-kernel@vger.kernel.org
From: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
To: Chris Hyser <chris.hyser@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	"longman@redhat.com" <longman@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Subject: Re: [PATCH 1/2] sched/numa: Add ability to override task's numa_preferred_nid.
Date: Thu, 17 Apr 2025 09:09:59 +0530
Message-ID: <46291879-83d4-4e03-9c3a-74872e44b0d6@linux.ibm.com>
In-Reply-To: <SA2PR10MB47145047CBF0AE1B6E099E299BBD2@SA2PR10MB4714.namprd10.prod.outlook.com>

On 17/04/25 02:43, Chris Hyser wrote:
>> From: Madadi Vineeth Reddy
>> Sent: Wednesday, April 16, 2025 3:00 AM
>> To: Chris Hyser
>> Cc: Peter Zijlstra; Mel Gorman; longman@redhat.com; linux-kernel@vger.kernel.org; Madadi Vineeth Reddy
>> Subject: Re: [PATCH 1/2] sched/numa: Add ability to override task's numa_preferred_nid.
>>
>>
>> Hi Chris,
>>
>> On 15/04/25 07:05, Chris Hyser wrote:
>>> From: chris hyser <chris.hyser@oracle.com>
>>>
>>
>> [..snip..]
>>
>>> The following results were from TPCC runs on an Oracle Database. The system
>>> was a 2-node Intel machine with a database running on each node with local
>>> memory allocations. No tasks or memory were pinned.
>>>
>>> There are four scenarios of interest:
>>>
>>> - Auto NUMA Balancing OFF.
>>>      base value
>>>
>>> - Auto NUMA Balancing ON.
>>>      1.2% - ANB ON better than ANB OFF.
>>>
>>> - Use the prctl(), ANB ON, parameters set to prevent faulting.
>>>      2.4% - prctl() better than ANB OFF.
>>>      1.2% - prctl() better than ANB ON.
>>>
>>> - Use the prctl(), ANB parameters normal.
>>>      3.1% - prctl() and ANB ON better than ANB OFF.
>>>      1.9% - prctl() and ANB ON better than just ANB ON.
>>>      0.7% - prctl() and ANB ON better than prctl() and ANB ON/faulting off
>>>
>>
>> Are you using prctl() to set the preferred node id for all the tasks in your run?
>> If yes, then how does the `prctl() and ANB ON better than prctl() and ANB ON/faulting off`
>> case happen?
> 
> Not every task in the system (including some DB tasks) has a prctl()-set preferred node, as the expected preference is not always known. That is part of it. However, the bigger influence, even with a prctl()-set preferred node, is that faulting drives physical page migration. You only want to migrate pages that the task is accessing. The fault tells you that the page was accessed and which node it currently resides on, allowing a migration decision to be made.
> 

Yes, understood.
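
As a quick sanity check on the figures quoted above, here is a minimal Python sketch. It assumes the percentages are throughput gains relative to the ANB OFF base and that the pairwise comparisons are ratios of those gains. The scenario labels and the helper name are illustrative only, not taken from the patch.

```python
# Gains over the ANB OFF base, in percent, as quoted in the results above.
# The dictionary keys are illustrative labels, not names from the patch.
GAIN_OVER_BASE = {
    "anb_on": 1.2,
    "prctl_no_faulting": 2.4,
    "prctl_anb_on": 3.1,
}


def relative_gain(better, worse):
    """Percent by which `better` beats `worse`, both given relative to base."""
    b = 1 + GAIN_OVER_BASE[better] / 100
    w = 1 + GAIN_OVER_BASE[worse] / 100
    return round((b / w - 1) * 100, 1)
```

Under that assumption, the three pairwise figures (1.2%, 1.9% and 0.7%) follow from the three base-relative ones.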

>> IIUC, when the preferred node is set via numa_preferred_nid_force, the original
>> numa_preferred_nid, which is derived from page faults, becomes a no-op, so the
>> faulting should be pure overhead.
> 
> As mentioned above, faulting drives physical page migration, with the usual trade-off between faulting overhead and the benefits of consolidating pages on the same node.
> 
> One issue I've seen repeatedly is that if you monitor a task (NUMA fields in /proc/<pid>/sched), some tasks keep changing their preferred node. This makes sense since spatial access locality can change over time, but you also see the migrated page count going up independent of which node is currently preferred. So on a two-node system, there are pages being migrated back and forth (not necessarily the same ones). One possible effect of forcing the preferred node is that it no longer changes, so migrated pages should all be going the same way.
> 
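
For anyone wanting to reproduce that kind of monitoring, a minimal Python sketch follows. It assumes /proc/<pid>/sched prints the NUMA counters as "name : value" lines and that the field names are numa_preferred_nid, numa_pages_migrated and total_numa_faults. The exact set of fields depends on the kernel version and config, and the helper names here are hypothetical.

```python
import re
from pathlib import Path

# Field names assumed to appear in /proc/<pid>/sched on NUMA-balancing kernels.
NUMA_FIELDS = ("numa_preferred_nid", "numa_pages_migrated", "total_numa_faults")

_LINE_RE = re.compile(r"^\s*(\S+)\s*:\s*(-?\d+)\s*$")


def parse_numa_fields(text):
    """Extract the 'name : value' NUMA counters from /proc/<pid>/sched text."""
    fields = {}
    for line in text.splitlines():
        match = _LINE_RE.match(line)
        if match and match.group(1) in NUMA_FIELDS:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_numa_fields(pid):
    """Read and parse the NUMA counters of one task."""
    return parse_numa_fields(Path(f"/proc/{pid}/sched").read_text())


def count_preference_changes(samples):
    """Count how often numa_preferred_nid differs between consecutive samples."""
    nids = [s["numa_preferred_nid"] for s in samples if "numa_preferred_nid" in s]
    return sum(1 for prev, cur in zip(nids, nids[1:]) if prev != cur)
```

Calling read_numa_fields() periodically and feeding the results to count_preference_changes() would show whether the preferred node keeps flipping while numa_pages_migrated continues to grow.
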
>> Let me know if my understanding is correct. Also, can you tell how to set the
>> parameters of ANB to prevent faulting.
> 
> Basically, I set the sampling periods to a large number of seconds. The sampling frequency is then 1/large, which is ~0. Monitoring the task again should show no NUMA faults and no pages migrated.
> 
> kernel.numa_balancing : 1
> scan_period_max_ms: 4294967295
> scan_period_min_ms: 4294967295
> scan_delay_ms: 4294967295
>

Got it. Thanks for the explanation.
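
To put the "1/large is ~0" point in numbers, here is a small Python sketch of the arithmetic. It assumes the knobs are plain millisecond values, as their names suggest. The constant and function names are mine.

```python
# Value used for scan_period_min_ms, scan_period_max_ms and scan_delay_ms above.
SCAN_PERIOD_MS = 4294967295  # equals 2**32 - 1

MS_PER_DAY = 24 * 60 * 60 * 1000
MS_PER_HOUR = 60 * 60 * 1000


def scan_period_days(period_ms):
    """Length of one scan period expressed in days."""
    return period_ms / MS_PER_DAY


def scans_per_hour(period_ms):
    """Upper bound on scans per hour if one scan happens per period."""
    return MS_PER_HOUR / period_ms
```

With these settings one scan period is roughly 49.7 days, so for any realistic benchmark run the scanner effectively never fires, even though kernel.numa_balancing stays at 1.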

Thanks,
Madadi Vineeth Reddy
 
> -chrish

