From: Kirill Korotaev <dev@sw.ru>
To: Paul Jackson <pj@sgi.com>
Cc: akpm@osdl.org, kurosawa@valinux.co.jp, simon.derr@bull.net,
linux-kernel@vger.kernel.org, st@sw.ru, den@sw.ru,
djwong@us.ibm.com
Subject: Re: [PATCH] cpuset oom lock fix
Date: Thu, 12 Jan 2006 13:06:49 +0300 [thread overview]
Message-ID: <43C62A39.50606@sw.ru> (raw)
In-Reply-To: <20060112091627.18409.49780.sendpatchset@jackhammer.engr.sgi.com>
To tell the truth I don't like such approaches, when more and more hooks
are create and scattered all aroung. maybe it is possible to split
semaphore to lock and sem and use lock where possible?
Kirill
> The problem, reported in:
> http://bugzilla.kernel.org/show_bug.cgi?id=5859
> and by various other email messages and lkml posts
> is that the cpuset hook in the oom (out of memory)
> code can try to take a cpuset semaphore while holding
> the tasklist_lock (a spinlock).
>
> One must not sleep while holding a spinlock.
>
> The fix seems easy enough - move the cpuset semaphore
> region outside the tasklist_lock region.
>
> This required a few lines of mechanism to implement.
> The oom code where the locking needs to be changed
> does not have access to the cpuset locks, which are
> internal to kernel/cpuset.c only. So I provided a
> couple more cpuset interface routines, available to
> the rest of the kernel, which simple take and drop
> the lock needed here (cpusets callback_sem).
>
> Signed-off-by: Paul Jackson
>
> ---
>
> Andrew - this should be ready for *-mm now.
> It's the same change I sent yesterday labeled [RFC].
> It passed my testing fine.
>
> This is a bug fix that I would recommend for 2.6.16
> in a few days.
>
> include/linux/cpuset.h | 6 ++++++
> kernel/cpuset.c | 33 ++++++++++++++++++++++++++++-----
> mm/oom_kill.c | 3 +++
> 3 files changed, 37 insertions(+), 5 deletions(-)
>
> --- 2.6.15-mm2.orig/include/linux/cpuset.h 2006-01-10 16:00:21.190309415 -0800
> +++ 2.6.15-mm2/include/linux/cpuset.h 2006-01-10 23:07:59.648091868 -0800
> @@ -48,6 +48,9 @@ extern void __cpuset_memory_pressure_bum
> extern struct file_operations proc_cpuset_operations;
> extern char *cpuset_task_status_allowed(struct task_struct *task, char *buffer);
>
> +extern void cpuset_lock(void);
> +extern void cpuset_unlock(void);
> +
> #else /* !CONFIG_CPUSETS */
>
> static inline int cpuset_init_early(void) { return 0; }
> @@ -93,6 +96,9 @@ static inline char *cpuset_task_status_a
> return buffer;
> }
>
> +static inline void cpuset_lock(void) {}
> +static inline void cpuset_unlock(void) {}
> +
> #endif /* !CONFIG_CPUSETS */
>
> #endif /* _LINUX_CPUSET_H */
> --- 2.6.15-mm2.orig/kernel/cpuset.c 2006-01-10 18:26:40.408120365 -0800
> +++ 2.6.15-mm2/kernel/cpuset.c 2006-01-10 23:14:28.355492590 -0800
> @@ -2150,6 +2150,33 @@ int __cpuset_zone_allowed(struct zone *z
> }
>
> /**
> + * cpuset_lock - lock out any changes to cpuset structures
> + *
> + * The out of memory (oom) code needs to lock down cpusets
> + * from being changed while it scans the tasklist looking for a
> + * task in an overlapping cpuset. Expose callback_sem via this
> + * cpuset_lock() routine, so the oom code can lock it, before
> + * locking the task list. The tasklist_lock is a spinlock, so
> + * must be taken inside callback_sem.
> + */
> +
> +void cpuset_lock(void)
> +{
> + down(&callback_sem);
> +}
> +
> +/**
> + * cpuset_unlock - release lock on cpuset changes
> + *
> + * Undo the lock taken in a previous cpuset_lock() call.
> + */
> +
> +void cpuset_unlock(void)
> +{
> + up(&callback_sem);
> +}
> +
> +/**
> * cpuset_excl_nodes_overlap - Do we overlap @p's mem_exclusive ancestors?
> * @p: pointer to task_struct of some other task.
> *
> @@ -2158,7 +2185,7 @@ int __cpuset_zone_allowed(struct zone *z
> * determine if task @p's memory usage might impact the memory
> * available to the current task.
> *
> - * Acquires callback_sem - not suitable for calling from a fast path.
> + * Call while holding callback_sem.
> **/
>
> int cpuset_excl_nodes_overlap(const struct task_struct *p)
> @@ -2166,8 +2193,6 @@ int cpuset_excl_nodes_overlap(const stru
> const struct cpuset *cs1, *cs2; /* my and p's cpuset ancestors */
> int overlap = 0; /* do cpusets overlap? */
>
> - down(&callback_sem);
> -
> task_lock(current);
> if (current->flags & PF_EXITING) {
> task_unlock(current);
> @@ -2186,8 +2211,6 @@ int cpuset_excl_nodes_overlap(const stru
>
> overlap = nodes_intersects(cs1->mems_allowed, cs2->mems_allowed);
> done:
> - up(&callback_sem);
> -
> return overlap;
> }
>
> --- 2.6.15-mm2.orig/mm/oom_kill.c 2006-01-10 19:18:25.588685008 -0800
> +++ 2.6.15-mm2/mm/oom_kill.c 2006-01-10 23:16:05.936643833 -0800
> @@ -274,6 +274,7 @@ void out_of_memory(gfp_t gfp_mask, int o
> show_mem();
> }
>
> + cpuset_lock();
> read_lock(&tasklist_lock);
> retry:
> p = select_bad_process();
> @@ -284,6 +285,7 @@ retry:
> /* Found nothing?!?! Either we hang forever, or we panic. */
> if (!p) {
> read_unlock(&tasklist_lock);
> + cpuset_unlock();
> panic("Out of memory and no killable processes...\n");
> }
>
> @@ -293,6 +295,7 @@ retry:
>
> out:
> read_unlock(&tasklist_lock);
> + cpuset_unlock();
> if (mm)
> mmput(mm);
>
>
next prev parent reply other threads:[~2006-01-12 10:07 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-01-12 9:16 [PATCH] cpuset oom lock fix Paul Jackson
2006-01-12 10:06 ` Kirill Korotaev [this message]
2006-01-12 10:51 ` Paul Jackson
2006-01-12 17:31 ` Darrick J. Wong
2006-01-12 17:35 ` Paul Jackson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=43C62A39.50606@sw.ru \
--to=dev@sw.ru \
--cc=akpm@osdl.org \
--cc=den@sw.ru \
--cc=djwong@us.ibm.com \
--cc=kurosawa@valinux.co.jp \
--cc=linux-kernel@vger.kernel.org \
--cc=pj@sgi.com \
--cc=simon.derr@bull.net \
--cc=st@sw.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox