From: Miao Xie <miaox@cn.fujitsu.com>
To: David Rientjes <rientjes@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>,
Nick Piggin <npiggin@suse.de>, Paul Menage <menage@google.com>,
Linux-Kernel <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily
Date: Tue, 09 Mar 2010 15:25:54 +0800 [thread overview]
Message-ID: <4B95F802.9020308@cn.fujitsu.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1003081330370.18502@chino.kir.corp.google.com>
on 2010-3-9 5:46, David Rientjes wrote:
[snip]
>> Considering the change of task->mems_allowed is not frequent, so in this patch,
>> I use two variables as a tag to indicate whether task->mems_allowed need be
>> update or not. And before setting the tag, cpuset caches the new mask of every
>> task at its task_struct.
>>
>
> So what exactly is the benefit of 58568d2 from last June that caused this
> issue to begin with? It seems like this entire patchset is a revert of
> that commit. So why shouldn't we just revert that one commit and then add
> the locking and updating necessary for configs where
> MAX_NUMNODES > BITS_PER_LONG on top?
I worried about the consistency of task->mempolicy with task->mems_allowed for
configs where MAX_NUMNODES <= BITS_PER_LONG.
The problem that I worried is fowllowing:
When the kernel allocator allocates pages for tasks, it will access task->mempolicy
first and get the allowed node, then check whether that node is allowed by
task->mems_allowed.
But, Without this patch, ->mempolicy and ->mems_allowed is not updated at the same
time. the kernel allocator may access the inconsistent information of ->mempolicy
and ->mems_allowed, sush as the allocator gets the allowed node from old mempolicy,
but checks whether that node is allowed by new mems_allowed which does't intersect
old mempolicy.
So I made this patchset.
>> +/**
>> + * cpuset_update_task_mems_allowed - update task memory placement
>> + *
>> + * If the current task's mems_allowed_for_update and mempolicy_for_update are
>> + * changed by cpuset behind our backs, update current->mems_allowed,
>> + * mems_generation and task NUMA mempolicy to the new value.
>> + *
>> + * Call WITHOUT mems_lock held.
>> + *
>> + * This routine is needed to update the pre-task mems_allowed and mempolicy
>> + * within the tasks context, when it is trying to allocate memory.
>> + */
>> +static __always_inline void cpuset_update_task_mems_allowed(void)
>> +{
>> + struct task_struct *tsk = current;
>> + unsigned long flags;
>> +
>> + if (unlikely(tsk->mems_generation != tsk->mems_generation_for_update)) {
>> + task_mems_lock_irqsave(tsk, flags);
>> + tsk->mems_allowed = tsk->mems_allowed_for_update;
>> + tsk->mems_generation = tsk->mems_generation_for_update;
>> + task_mems_unlock_irqrestore(tsk, flags);
>
> By this synchronization, you're guaranteeing that no other kernel code
> ever reads tsk->mems_allowed when tsk != current? Otherwise, you're
> simply protecting the store to tsk->mems_allowed here and not serializing
> on the loads that can return empty nodemasks.
I guarantee that no other kernel code changes tsk->mems_allowed when tsk != current.
so every task can be safe to read tsk->mems_allowed without lock.
I will use mems_lock to protect it when other task reads.
>> + /* Protection of ->mems_allowed_for_update */
>> + spinlock_t mems_lock;
>> + /*
>> + * This variable(mems_allowed_for_update) are just used for caching
>> + * memory placement information.
>> + *
>> + * ->mems_allowed are used by the kernel allocator.
>> + */
>> + nodemask_t mems_allowed_for_update; /* Protected by mems_lock */
>
> Another nodemask_t in struct task_struct for this? And for all configs,
> including those that can do atomic updates to mems_allowed?
Yes, for all configs.
>
>> +
>> + /*
>> + * Increment this integer everytime ->mems_allowed_for_update is
>> + * changed by cpuset. Task can compare this number with mems_generation,
>> + * and if they are not the same, mems_allowed_for_update is changed and
>> + * ->mems_allowed must be updated. In this way, tasks can avoid having
>> + * to lock and reload mems_allowed_for_update unless it is changed.
>> + */
>> + int mems_generation_for_update;
>> + /*
>> + * After updating mems_allowed, set mems_generation to
>> + * mems_generation_for_update.
>> + */
>> + int mems_generation;
>
> I don't see why you need two mems_generation numbers, one should belong in
> the task's cpuset. Then you can compare tsk->mems_generation to
> task_cs(tsk)->mems_generation at cpuset_update_task_memory_state() if you
> set tsk->mems_generation = task_cs(tsk)->mems_generation on
> cpuset_attach() or update_nodemask().
In this way, we must use rcu_read_lock() to protect task's cs, and the performance
will slowdown though rcu read lock's spending is very small.
Thanks!
Miao
WARNING: multiple messages have this Message-ID (diff)
From: Miao Xie <miaox@cn.fujitsu.com>
To: David Rientjes <rientjes@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>,
Nick Piggin <npiggin@suse.de>, Paul Menage <menage@google.com>,
Linux-Kernel <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily
Date: Tue, 09 Mar 2010 15:25:54 +0800 [thread overview]
Message-ID: <4B95F802.9020308@cn.fujitsu.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1003081330370.18502@chino.kir.corp.google.com>
on 2010-3-9 5:46, David Rientjes wrote:
[snip]
>> Considering the change of task->mems_allowed is not frequent, so in this patch,
>> I use two variables as a tag to indicate whether task->mems_allowed need be
>> update or not. And before setting the tag, cpuset caches the new mask of every
>> task at its task_struct.
>>
>
> So what exactly is the benefit of 58568d2 from last June that caused this
> issue to begin with? It seems like this entire patchset is a revert of
> that commit. So why shouldn't we just revert that one commit and then add
> the locking and updating necessary for configs where
> MAX_NUMNODES > BITS_PER_LONG on top?
I worried about the consistency of task->mempolicy with task->mems_allowed for
configs where MAX_NUMNODES <= BITS_PER_LONG.
The problem that I worried is fowllowing:
When the kernel allocator allocates pages for tasks, it will access task->mempolicy
first and get the allowed node, then check whether that node is allowed by
task->mems_allowed.
But, Without this patch, ->mempolicy and ->mems_allowed is not updated at the same
time. the kernel allocator may access the inconsistent information of ->mempolicy
and ->mems_allowed, sush as the allocator gets the allowed node from old mempolicy,
but checks whether that node is allowed by new mems_allowed which does't intersect
old mempolicy.
So I made this patchset.
>> +/**
>> + * cpuset_update_task_mems_allowed - update task memory placement
>> + *
>> + * If the current task's mems_allowed_for_update and mempolicy_for_update are
>> + * changed by cpuset behind our backs, update current->mems_allowed,
>> + * mems_generation and task NUMA mempolicy to the new value.
>> + *
>> + * Call WITHOUT mems_lock held.
>> + *
>> + * This routine is needed to update the pre-task mems_allowed and mempolicy
>> + * within the tasks context, when it is trying to allocate memory.
>> + */
>> +static __always_inline void cpuset_update_task_mems_allowed(void)
>> +{
>> + struct task_struct *tsk = current;
>> + unsigned long flags;
>> +
>> + if (unlikely(tsk->mems_generation != tsk->mems_generation_for_update)) {
>> + task_mems_lock_irqsave(tsk, flags);
>> + tsk->mems_allowed = tsk->mems_allowed_for_update;
>> + tsk->mems_generation = tsk->mems_generation_for_update;
>> + task_mems_unlock_irqrestore(tsk, flags);
>
> By this synchronization, you're guaranteeing that no other kernel code
> ever reads tsk->mems_allowed when tsk != current? Otherwise, you're
> simply protecting the store to tsk->mems_allowed here and not serializing
> on the loads that can return empty nodemasks.
I guarantee that no other kernel code changes tsk->mems_allowed when tsk != current.
so every task can be safe to read tsk->mems_allowed without lock.
I will use mems_lock to protect it when other task reads.
>> + /* Protection of ->mems_allowed_for_update */
>> + spinlock_t mems_lock;
>> + /*
>> + * This variable(mems_allowed_for_update) are just used for caching
>> + * memory placement information.
>> + *
>> + * ->mems_allowed are used by the kernel allocator.
>> + */
>> + nodemask_t mems_allowed_for_update; /* Protected by mems_lock */
>
> Another nodemask_t in struct task_struct for this? And for all configs,
> including those that can do atomic updates to mems_allowed?
Yes, for all configs.
>
>> +
>> + /*
>> + * Increment this integer everytime ->mems_allowed_for_update is
>> + * changed by cpuset. Task can compare this number with mems_generation,
>> + * and if they are not the same, mems_allowed_for_update is changed and
>> + * ->mems_allowed must be updated. In this way, tasks can avoid having
>> + * to lock and reload mems_allowed_for_update unless it is changed.
>> + */
>> + int mems_generation_for_update;
>> + /*
>> + * After updating mems_allowed, set mems_generation to
>> + * mems_generation_for_update.
>> + */
>> + int mems_generation;
>
> I don't see why you need two mems_generation numbers, one should belong in
> the task's cpuset. Then you can compare tsk->mems_generation to
> task_cs(tsk)->mems_generation at cpuset_update_task_memory_state() if you
> set tsk->mems_generation = task_cs(tsk)->mems_generation on
> cpuset_attach() or update_nodemask().
In this way, we must use rcu_read_lock() to protect task's cs, and the performance
will slowdown though rcu read lock's spending is very small.
Thanks!
Miao
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-03-09 7:26 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-08 10:10 [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily Miao Xie
2010-03-08 10:10 ` Miao Xie
2010-03-08 21:46 ` David Rientjes
2010-03-08 21:46 ` David Rientjes
2010-03-09 7:25 ` Miao Xie [this message]
2010-03-09 7:25 ` Miao Xie
2010-03-11 8:15 ` Nick Piggin
2010-03-11 8:15 ` Nick Piggin
2010-03-11 10:33 ` Miao Xie
2010-03-11 10:33 ` Miao Xie
2010-03-11 11:03 ` Nick Piggin
2010-03-11 11:03 ` Nick Piggin
2010-03-25 10:23 ` Miao Xie
2010-03-25 10:23 ` Miao Xie
2010-03-25 12:56 ` Miao Xie
2010-03-25 12:56 ` Miao Xie
2010-03-25 13:33 ` [PATCH] [PATCH -mmotm] cpuset,mm: use seqlock to protect task->mempolicy and mems_allowed (v2) (was: Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily) Miao Xie
2010-03-25 13:33 ` Miao Xie
2010-03-28 5:30 ` Bob Liu
2010-03-28 5:30 ` Bob Liu
2010-03-31 19:42 ` Andrew Morton
2010-03-31 19:42 ` Andrew Morton
2010-03-31 9:54 ` [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily Miao Xie
2010-03-31 9:54 ` Miao Xie
2010-03-31 10:34 ` David Rientjes
2010-03-31 10:34 ` David Rientjes
2010-04-01 2:16 ` Miao Xie
2010-04-01 2:16 ` Miao Xie
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B95F802.9020308@cn.fujitsu.com \
--to=miaox@cn.fujitsu.com \
--cc=lee.schermerhorn@hp.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=menage@google.com \
--cc=npiggin@suse.de \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.