* [PATCH][V3]Make get_user_pages interruptible @ 2008-11-22 1:31 Ying Han 2008-11-24 20:02 ` Paul Menage 0 siblings, 1 reply; 7+ messages in thread From: Ying Han @ 2008-11-22 1:31 UTC (permalink / raw) To: linux-mm, linux-kernel, akpm, David Rientjes, Paul Menage, Rohit Seth From: Paul Menage <menage@google.com> make get_user_pages interruptible The initial implementation of checking TIF_MEMDIE covers the cases of OOM killing. If the process has been OOM killed, the TIF_MEMDIE is set and it return immediately. This patch includes: 1. add the case that the SIGKILL is sent by user processes. The process can try to get_user_pages() unlimited memory even if a user process has sent a SIGKILL to it(maybe a monitor find the process exceed its memory limit and try to kill it). In the old implementation, the SIGKILL won't be handled until the get_user_pages() returns. 2. change the return value to be ERESTARTSYS. It makes no sense to return ENOMEM if the get_user_pages returned by getting a SIGKILL signal. Considering the general convention for a system call interrupted by a signal is ERESTARTNOSYS, so the current return value is consistant to that. Signed-off-by: Paul Menage <menage@google.com> Singed-off-by: Ying Han <yinghan@google.com> include/linux/sched.h | 1 + kernel/signal.c | 2 +- mm/memory.c | 9 +- diff --git a/include/linux/sched.h b/include/linux/sched.h index b483f39..f9c6a8a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1790,6 +1790,7 @@ extern void sched_dead(struct task_struct *p); extern int in_group_p(gid_t); extern int in_egroup_p(gid_t); +extern int sigkill_pending(struct task_struct *tsk); extern void proc_caches_init(void); extern void flush_signals(struct task_struct *); extern void ignore_signals(struct task_struct *); diff --git a/kernel/signal.c b/kernel/signal.c index 105217d..f3f154e 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1497,7 +1497,7 @@ static inline int may_ptrace_stop(void) * Return nonzero if there is a SIGKILL that should be waking us up. * Called with the siglock held. */ -static int sigkill_pending(struct task_struct *tsk) +int sigkill_pending(struct task_struct *tsk) { return sigismember(&tsk->pending.signal, SIGKILL) || sigismember(&tsk->signal->shared_pending.signal, SIGKILL); diff --git a/mm/memory.c b/mm/memory.c index 164951c..ae24300 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1218,12 +1218,11 @@ int __get_user_pages(struct task_struct *tsk, struct m struct page *page; /* - * If tsk is ooming, cut off its access to large memory - * allocations. It has a pending SIGKILL, but it can't - * be processed until returning to user space. + * If we have a pending SIGKILL, don't keep + * allocating memory. */ - if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE))) - return i ? i : -ENOMEM; + if (unlikely(sigkill_pending(tsk))) + return i ? i : -ERESTARTSYS; if (write) foll_flags |= FOLL_WRITE; ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH][V3]Make get_user_pages interruptible 2008-11-22 1:31 [PATCH][V3]Make get_user_pages interruptible Ying Han @ 2008-11-24 20:02 ` Paul Menage 2008-11-24 20:55 ` Pekka Enberg 2008-11-24 21:02 ` Ying Han 0 siblings, 2 replies; 7+ messages in thread From: Paul Menage @ 2008-11-24 20:02 UTC (permalink / raw) To: Ying Han; +Cc: linux-mm, linux-kernel, akpm, David Rientjes, Rohit Seth On Fri, Nov 21, 2008 at 5:31 PM, Ying Han <yinghan@google.com> wrote: > From: Paul Menage <menage@google.com> This patch is getting further and further from my original internal changes, so I'm not sure that a From: line from me is appropriate. > */ > - if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE))) > - return i ? i : -ENOMEM; > + if (unlikely(sigkill_pending(tsk))) > + return i ? i : -ERESTARTSYS; You've changed the check from sigkill_pending(current) to sigkill_pending(tsk). I originally made that sigkill_pending(current) since we want to avoid tasks entering an unkillable state just because they're doing get_user_pages() on a system that's short of memory. Admittedly for the main case that we care about, mlock() (or an mmap() with MCL_FUTURE set) then tsk==current, but philosophically it seems to me to be more correct to do the check against current than tsk, since current is the thing that's actually allocating the memory. But maybe it would be better to check both? Paul ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][V3]Make get_user_pages interruptible 2008-11-24 20:02 ` Paul Menage @ 2008-11-24 20:55 ` Pekka Enberg 2008-11-24 21:02 ` Ying Han 1 sibling, 0 replies; 7+ messages in thread From: Pekka Enberg @ 2008-11-24 20:55 UTC (permalink / raw) To: Paul Menage Cc: Ying Han, linux-mm, linux-kernel, akpm, David Rientjes, Rohit Seth Hi Paul, On Fri, Nov 21, 2008 at 5:31 PM, Ying Han <yinghan@google.com> wrote: >> */ >> - if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE))) >> - return i ? i : -ENOMEM; >> + if (unlikely(sigkill_pending(tsk))) >> + return i ? i : -ERESTARTSYS; On Mon, Nov 24, 2008 at 10:02 PM, Paul Menage <menage@google.com> wrote: > You've changed the check from sigkill_pending(current) to sigkill_pending(tsk). > > I originally made that sigkill_pending(current) since we want to avoid > tasks entering an unkillable state just because they're doing > get_user_pages() on a system that's short of memory. Admittedly for > the main case that we care about, mlock() (or an mmap() with > MCL_FUTURE set) then tsk==current, but philosophically it seems to me > to be more correct to do the check against current than tsk, since > current is the thing that's actually allocating the memory. But maybe > it would be better to check both? Well, most callers seem to pass 'current' to get_user_pages() but for the out-of-tree revoke patches, for example, you certainly want to check sigkill_pending(current) as well; otherwise the revoke operation is unkillable while in get_user_pages(). Not that revoke() is going to hit mainline any time soon but it does serve as an argument for checking both. Pekka ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][V3]Make get_user_pages interruptible 2008-11-24 20:02 ` Paul Menage 2008-11-24 20:55 ` Pekka Enberg @ 2008-11-24 21:02 ` Ying Han 2008-11-24 21:13 ` Pekka Enberg 1 sibling, 1 reply; 7+ messages in thread From: Ying Han @ 2008-11-24 21:02 UTC (permalink / raw) To: Paul Menage; +Cc: linux-mm, linux-kernel, akpm, David Rientjes, Rohit Seth On Mon, Nov 24, 2008 at 12:02 PM, Paul Menage <menage@google.com> wrote: > On Fri, Nov 21, 2008 at 5:31 PM, Ying Han <yinghan@google.com> wrote: >> From: Paul Menage <menage@google.com> > > This patch is getting further and further from my original internal > changes, so I'm not sure that a From: line from me is appropriate. > >> */ >> - if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE))) >> - return i ? i : -ENOMEM; >> + if (unlikely(sigkill_pending(tsk))) >> + return i ? i : -ERESTARTSYS; > > You've changed the check from sigkill_pending(current) to sigkill_pending(tsk). > > I originally made that sigkill_pending(current) since we want to avoid > tasks entering an unkillable state just because they're doing > get_user_pages() on a system that's short of memory. Admittedly for > the main case that we care about, mlock() (or an mmap() with > MCL_FUTURE set) then tsk==current, but philosophically it seems to me > to be more correct to do the check against current than tsk, since > current is the thing that's actually allocating the memory. But maybe > it would be better to check both? In most of cases, tsk==current in get_user_pages(), that is why i change current to tsk since tsk is a superset of current, no? If that is right, why we need to check both? > > Paul > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][V3]Make get_user_pages interruptible 2008-11-24 21:02 ` Ying Han @ 2008-11-24 21:13 ` Pekka Enberg 2008-11-24 21:50 ` Ying Han 0 siblings, 1 reply; 7+ messages in thread From: Pekka Enberg @ 2008-11-24 21:13 UTC (permalink / raw) To: Ying Han Cc: Paul Menage, linux-mm, linux-kernel, akpm, David Rientjes, Rohit Seth On Fri, Nov 21, 2008 at 5:31 PM, Ying Han <yinghan@google.com> wrote: >>> */ >>> - if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE))) >>> - return i ? i : -ENOMEM; >>> + if (unlikely(sigkill_pending(tsk))) >>> + return i ? i : -ERESTARTSYS; On Mon, Nov 24, 2008 at 12:02 PM, Paul Menage <menage@google.com> wrote: >> You've changed the check from sigkill_pending(current) to sigkill_pending(tsk). >> >> I originally made that sigkill_pending(current) since we want to avoid >> tasks entering an unkillable state just because they're doing >> get_user_pages() on a system that's short of memory. Admittedly for >> the main case that we care about, mlock() (or an mmap() with >> MCL_FUTURE set) then tsk==current, but philosophically it seems to me >> to be more correct to do the check against current than tsk, since >> current is the thing that's actually allocating the memory. But maybe >> it would be better to check both? On Mon, Nov 24, 2008 at 11:02 PM, Ying Han <yinghan@google.com> wrote: > In most of cases, tsk==current in get_user_pages(), that is why i > change current to tsk since > tsk is a superset of current, no? If that is right, why we need to check both? I'm not sure if it's strictly necessary but as I pointed out in the other mail, there can be callers that are doing get_user_pages() on behalf of other tasks and you probably want to be able to kill the task that's actually _calling_ get_user_pages() as well. Pekka ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][V3]Make get_user_pages interruptible 2008-11-24 21:13 ` Pekka Enberg @ 2008-11-24 21:50 ` Ying Han 2008-11-24 22:45 ` Pekka Enberg 0 siblings, 1 reply; 7+ messages in thread From: Ying Han @ 2008-11-24 21:50 UTC (permalink / raw) To: Pekka Enberg Cc: Paul Menage, linux-mm, linux-kernel, akpm, David Rientjes, Rohit Seth thanks Pekka and i think one example of the case you mentioned is in access_process_vm() which is calling get_user_pages(tsk, mm, addr, 1, write, 1, &pages, &vma). However, it is allocating only one page here which much less likely to be stuck under memory pressure. Like you said, in order to make it more flexible for future changes, i might make the change like: >>>> */ >>>> - if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE))) >>>> - return i ? i : -ENOMEM; >>>> + if (unlikely(sigkill_pending(current) | | sigkill_pending(tsk))) >>>> + return i ? i : -ERESTARTSYS; is this something acceptable? On Mon, Nov 24, 2008 at 1:13 PM, Pekka Enberg <penberg@cs.helsinki.fi> wrote: > On Fri, Nov 21, 2008 at 5:31 PM, Ying Han <yinghan@google.com> wrote: >>>> */ >>>> - if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE))) >>>> - return i ? i : -ENOMEM; >>>> + if (unlikely(sigkill_pending(tsk))) >>>> + return i ? i : -ERESTARTSYS; > > On Mon, Nov 24, 2008 at 12:02 PM, Paul Menage <menage@google.com> wrote: >>> You've changed the check from sigkill_pending(current) to sigkill_pending(tsk). >>> >>> I originally made that sigkill_pending(current) since we want to avoid >>> tasks entering an unkillable state just because they're doing >>> get_user_pages() on a system that's short of memory. Admittedly for >>> the main case that we care about, mlock() (or an mmap() with >>> MCL_FUTURE set) then tsk==current, but philosophically it seems to me >>> to be more correct to do the check against current than tsk, since >>> current is the thing that's actually allocating the memory. But maybe >>> it would be better to check both? > > On Mon, Nov 24, 2008 at 11:02 PM, Ying Han <yinghan@google.com> wrote: >> In most of cases, tsk==current in get_user_pages(), that is why i >> change current to tsk since >> tsk is a superset of current, no? If that is right, why we need to check both? > > I'm not sure if it's strictly necessary but as I pointed out in the > other mail, there can be callers that are doing get_user_pages() on > behalf of other tasks and you probably want to be able to kill the > task that's actually _calling_ get_user_pages() as well. > > Pekka > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][V3]Make get_user_pages interruptible 2008-11-24 21:50 ` Ying Han @ 2008-11-24 22:45 ` Pekka Enberg 0 siblings, 0 replies; 7+ messages in thread From: Pekka Enberg @ 2008-11-24 22:45 UTC (permalink / raw) To: Ying Han Cc: Paul Menage, linux-mm, linux-kernel, akpm, David Rientjes, Rohit Seth Ying Han wrote: > thanks Pekka and i think one example of the case you mentioned is in > access_process_vm() which is calling > get_user_pages(tsk, mm, addr, 1, write, 1, &pages, &vma). However, it > is allocating only one page here which > much less likely to be stuck under memory pressure. Like you said, in > order to make it more flexible for future > changes, i might make the change like: >>>>> */ >>>>> - if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE))) >>>>> - return i ? i : -ENOMEM; >>>>> + if (unlikely(sigkill_pending(current) | | sigkill_pending(tsk))) >>>>> + return i ? i : -ERESTARTSYS; > > is this something acceptable? The formatting is bit wacky but I'm certainly OK with the change. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-11-24 22:49 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-11-22 1:31 [PATCH][V3]Make get_user_pages interruptible Ying Han 2008-11-24 20:02 ` Paul Menage 2008-11-24 20:55 ` Pekka Enberg 2008-11-24 21:02 ` Ying Han 2008-11-24 21:13 ` Pekka Enberg 2008-11-24 21:50 ` Ying Han 2008-11-24 22:45 ` Pekka Enberg
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox