* [PATCH RFC] memcg: close the race window between OOM detection and killing
@ 2015-06-03 3:15 Tejun Heo
2015-06-03 14:44 ` Michal Hocko
0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2015-06-03 3:15 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
Hello,
This patch closes the race window by introducing an OOM victim
generation number to detect whether any victim exited between OOM
detection and killing; however, this isn't the prettiest thing in the
world and is nasty in that the memcg OOM mechanism deviates from the
system-wide OOM killer.
The only reason the memcg OOM killer behaves asynchronously (unwinding
the stack and then handling) is memcg userland OOM handling, which may
end up blocking for userland actions while still holding whatever locks
it was holding when it invoked try_charge(), leading to a deadlock.
However, given that userland OOMs are retriable, this doesn't have to
be this complicated.  Waiting synchronously with a timeout in
try_charge() should be enough - in the unlikely cases where forward
progress can't be made, the OOM killing can simply abort waiting and
continue on.  If it is an OOM deadlock which requires the death of more
victims, the OOM condition will trigger again and kill more.

IOW, it'd be cleaner to do everything synchronously while holding
oom_lock with a timeout to get out of rare deadlocks.
What do you think?
Thanks.
----- 8< -----
Memcg OOM killings are done at the end of page fault apart from OOM
detection. This allows the following race condition.
    Task A                  Task B

    OOM detection
                            OOM detection
    OOM kill
    victim exits
                            OOM kill
Task B has no way of knowing that another task has already killed an
OOM victim which proceeded to exit and release memory and will
unnecessarily pick another victim. In highly contended cases, this
can lead to multiple unnecessary chained killings.
This patch closes this race window by adding a per-memcg OOM victim exit
generation number.  Each task snapshots it when trying to charge.  If
an OOM condition is triggered, the kill path compares the remembered
generation against the current value.  If they differ, it indicates
that some victims have exited between the charge attempt and the OOM
kill path, and the task shouldn't pick another victim.
The condition can be reliably triggered with multiple allocating
processes by modifying mem_cgroup_oom_trylock() to retry several times
with a short delay. With the patch applied, memcg OOM correctly
detects the race condition and skips OOM killing to retry the
allocation.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
include/linux/memcontrol.h | 9 ++++++-
include/linux/sched.h | 3 +-
mm/memcontrol.c | 52 +++++++++++++++++++++++++++++++++++++++++++++
mm/oom_kill.c | 5 ++++
4 files changed, 66 insertions(+), 3 deletions(-)
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -126,8 +126,8 @@ bool mem_cgroup_lruvec_online(struct lru
 int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
 unsigned long mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list);
 void mem_cgroup_update_lru_size(struct lruvec *, enum lru_list, int);
-extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
-				      struct task_struct *p);
+void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p);
+void mem_cgroup_exit_oom_victim(void);
 
 static inline void mem_cgroup_oom_enable(void)
 {
@@ -321,6 +321,11 @@ mem_cgroup_print_oom_info(struct mem_cgr
 {
 }
 
+static inline void
+mem_cgroup_exit_oom_victim(void)
+{
+}
+
 static inline struct mem_cgroup *mem_cgroup_begin_page_stat(struct page *page)
 {
 	return NULL;
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1770,7 +1770,8 @@ struct task_struct {
 		struct mem_cgroup *memcg;
 		gfp_t gfp_mask;
 		int order;
-		unsigned int may_oom:1;
+		u16 oom_exit_gen;
+		u16 may_oom:1;
 	} memcg_oom;
 #endif
 #ifdef CONFIG_UPROBES
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -285,6 +285,21 @@ struct mem_cgroup {
 	 */
 	bool use_hierarchy;
 
+	/*
+	 * Because memcg OOM detection and killing aren't in the same
+	 * critical section, OOM victims might exit between the detection
+	 * and killing steps of another OOMing task, leading to unnecessary
+	 * consecutive OOM killings.  The following counter, bumped
+	 * whenever an OOM victim exits, is used to detect such race
+	 * conditions by testing whether it has changed between detection
+	 * and killing.
+	 *
+	 * Use u16 to avoid bloating task->memcg_oom_info.  While u16 can
+	 * wrap in between, it's highly unlikely and we can afford rare
+	 * inaccuracies.  Protected by oom_lock.
+	 */
+	u16 oom_exit_gen;
+
 	/* protected by memcg_oom_lock */
 	bool oom_lock;
 	int under_oom;
@@ -1490,6 +1505,28 @@ void mem_cgroup_print_oom_info(struct me
 	mutex_unlock(&oom_info_lock);
 }
 
+/**
+ * mem_cgroup_exit_oom_victim - note the exit of an OOM victim
+ *
+ * Called from exit_oom_victim() with oom_lock held.  This is used to bump
+ * memcg->oom_exit_gen which is used to avoid unnecessary chained OOM
+ * killings.
+ */
+void mem_cgroup_exit_oom_victim(void)
+{
+	struct mem_cgroup *memcg;
+
+	lockdep_assert_held(&oom_lock);
+
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+
+	/* paired with load_acquire in try_charge() */
+	smp_store_release(&memcg->oom_exit_gen, memcg->oom_exit_gen + 1);
+
+	rcu_read_unlock();
+}
+
 /*
  * This function returns the number of memcg under hierarchy tree. Returns
  * 1(self count) if no children.
@@ -1533,6 +1570,13 @@ static void mem_cgroup_out_of_memory(str
 	mutex_lock(&oom_lock);
 
 	/*
+	 * If we raced with OOM victim exits between our charge attempt and
+	 * here, there's no reason to kill more.  Bail out and retry.
+	 */
+	if (current->memcg_oom.oom_exit_gen != memcg->oom_exit_gen)
+		goto unlock;
+
+	/*
 	 * If current has a pending SIGKILL or is exiting, then automatically
 	 * select it.  The goal is to allow it to allocate so that it may
 	 * quickly exit and free its memory.
@@ -2245,6 +2289,14 @@ static int try_charge(struct mem_cgroup
 	if (mem_cgroup_is_root(memcg))
 		goto done;
 retry:
+	/*
+	 * Snapshot the current OOM exit generation number.  The generation
+	 * number has to be updated after memory is released and read
+	 * before charging is attempted.  Use load_acquire paired with
+	 * store_release in mem_cgroup_exit_oom_victim() for ordering.
+	 */
+	current->memcg_oom.oom_exit_gen = smp_load_acquire(&memcg->oom_exit_gen);
+
 	if (consume_stock(memcg, nr_pages))
 		goto done;
 
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -435,10 +435,15 @@ void mark_oom_victim(struct task_struct
  */
 void exit_oom_victim(void)
 {
+	mutex_lock(&oom_lock);
+
+	mem_cgroup_exit_oom_victim();
 	clear_thread_flag(TIF_MEMDIE);
 
 	if (!atomic_dec_return(&oom_victims))
 		wake_up_all(&oom_victims_wait);
+
+	mutex_unlock(&oom_lock);
 }
 
 /**
^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] memcg: close the race window between OOM detection and killing
  2015-06-03  3:15 [PATCH RFC] memcg: close the race window between OOM detection and killing Tejun Heo
@ 2015-06-03 14:44 ` Michal Hocko
  [not found]   ` <20150603144414.GG16201-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2015-06-03 14:44 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Johannes Weiner, cgroups, linux-mm

On Wed 03-06-15 12:15:44, Tejun Heo wrote:
> Hello,
>
> This patch closes the race window by introducing an OOM victim
> generation number to detect whether any victim exited between OOM
> detection and killing; however, this isn't the prettiest thing in the
> world and is nasty in that the memcg OOM mechanism deviates from the
> system-wide OOM killer.

The race does exist in the global case as well AFAICS.

__alloc_pages_may_oom
  mutex_trylock
  get_page_from_freelist	# fails
  <preempted>			exit_mm	# releases some memory
  out_of_memory
				exit_oom_victim
  # No TIF_MEMDIE task so kill a new one

The race window might be smaller but it is possible in principle.
Why does it make sense to treat those two in a different way?

> The only reason the memcg OOM killer behaves asynchronously (unwinding
> the stack and then handling) is memcg userland OOM handling, which may
> end up blocking for userland actions while still holding whatever locks
> it was holding when it invoked try_charge(), leading to a deadlock.

This is not the only reason. In-kernel memcg OOM handling needs it
as well. See 3812c8c8f395 ("mm: memcg: do not trap chargers with
full callstack on OOM"). In fact it was the in-kernel case which
triggered this change. We simply cannot wait for OOM with the stack and
all the state the charge is called from.

> However, given that userland OOMs are retriable, this doesn't have to
> be this complicated.
> Waiting synchronously with a timeout in
> try_charge() should be enough - in the unlikely cases where forward
> progress can't be made, the OOM killing can simply abort waiting and
> continue on. If it is an OOM deadlock which requires the death of more
> victims, the OOM condition will trigger again and kill more.
>
> IOW, it'd be cleaner to do everything synchronously while holding
> oom_lock with a timeout to get out of rare deadlocks.

Deadlocks are quite real and we really have to unwind and handle with a
clean stack.

> What do you think?
>
> Thanks.
>
> ----- 8< -----
> Memcg OOM killings are done at the end of page fault, apart from OOM
> detection. This allows the following race condition.
>
>     Task A                  Task B
>
>     OOM detection
>                             OOM detection
>     OOM kill
>     victim exits
>                             OOM kill
>
> Task B has no way of knowing that another task has already killed an
> OOM victim which proceeded to exit and release memory, and will
> unnecessarily pick another victim. In highly contended cases, this
> can lead to multiple unnecessary chained killings.

Yes, I can see this might happen. I haven't seen it in real life,
but I guess such a load can be constructed. The question is whether this
is serious enough to make the code more complicated.

> This patch closes this race window by adding a per-memcg OOM victim exit
> generation number. Each task snapshots it when trying to charge. If
> an OOM condition is triggered, the kill path compares the remembered
> generation against the current value. If they differ, it indicates
> that some victims have exited between the charge attempt and the OOM
> kill path, and the task shouldn't pick another victim.

The idea is good. See comments to the implementation below.

> The condition can be reliably triggered with multiple allocating
> processes by modifying mem_cgroup_oom_trylock() to retry several times
> with a short delay. With the patch applied, memcg OOM correctly
> detects the race condition and skips OOM killing to retry the
> allocation.
Were you able to trigger this even without adding delays?

> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  include/linux/memcontrol.h |    9 ++++++-
>  include/linux/sched.h      |    3 +-
>  mm/memcontrol.c            |   52 +++++++++++++++++++++++++++++++++++++++++++++
>  mm/oom_kill.c              |    5 ++++
>  4 files changed, 66 insertions(+), 3 deletions(-)
>
[...]
> +/**
> + * mem_cgroup_exit_oom_victim - note the exit of an OOM victim
> + *
> + * Called from exit_oom_victim() with oom_lock held.  This is used to bump
> + * memcg->oom_exit_gen which is used to avoid unnecessary chained OOM
> + * killings.
> + */
> +void mem_cgroup_exit_oom_victim(void)
> +{
> +	struct mem_cgroup *memcg;
> +
> +	lockdep_assert_held(&oom_lock);
> +
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(current);

The OOM might have happened in a parent memcg and the OOM victim might
be a sibling or wherever in the hierarchy under the OOM memcg.
So you have to use the OOM memcg to track the counter, otherwise the
tasks from other memcgs in the hierarchy racing with the OOM victim
would miss it anyway. You can store the target memcg into the victim
when killing it.

[...]
> @@ -2245,6 +2289,14 @@ static int try_charge(struct mem_cgroup
> 	if (mem_cgroup_is_root(memcg))
> 		goto done;
>  retry:
> +	/*
> +	 * Snapshot the current OOM exit generation number.  The generation
> +	 * number has to be updated after memory is released and read
> +	 * before charging is attempted.  Use load_acquire paired with
> +	 * store_release in mem_cgroup_exit_oom_victim() for ordering.
> +	 */
> +	current->memcg_oom.oom_exit_gen = smp_load_acquire(&memcg->oom_exit_gen);

Same here. You should store the OOM memcg gen count. Ideally hook it
into mem_cgroup_oom.

> +
> 	if (consume_stock(memcg, nr_pages))
> 		goto done;
>
[...]

Thanks!
--
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 11+ messages in thread
[parent not found: <20150603144414.GG16201-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH RFC] memcg: close the race window between OOM detection and killing
  [not found]   ` <20150603144414.GG16201-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2015-06-03 19:36     ` Tejun Heo
  2015-06-04  9:30       ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2015-06-03 19:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg

Hey, Michal.

On Wed, Jun 03, 2015 at 04:44:14PM +0200, Michal Hocko wrote:
> The race does exist in the global case as well AFAICS.
>
> __alloc_pages_may_oom
>   mutex_trylock
>   get_page_from_freelist	# fails
>   <preempted>			exit_mm	# releases some memory
>   out_of_memory
>				exit_oom_victim
>   # No TIF_MEMDIE task so kill a new one

Hmmm? In -mm, if __alloc_pages_may_oom() fails trylock, it never calls
out_of_memory().

> The race window might be smaller but it is possible in principle.
> Why does it make sense to treat those two in a different way?

The main difference here is that the alloc path does the whole thing
synchronously and thus the OOM detection and killing can be put in the
same critical section, which isn't the case for the memcg OOM handling.

> > The only reason the memcg OOM killer behaves asynchronously (unwinding
> > the stack and then handling) is memcg userland OOM handling, which may
> > end up blocking for userland actions while still holding whatever locks
> > it was holding when it invoked try_charge(), leading to a deadlock.
>
> This is not the only reason. In-kernel memcg OOM handling needs it
> as well. See 3812c8c8f395 ("mm: memcg: do not trap chargers with
> full callstack on OOM"). In fact it was the in-kernel case which
> triggered this change. We simply cannot wait for OOM with the stack and
> all the state the charge is called from.

Why should this be any different from OOM handling from the page
allocator tho?
That can end up in the exact same situation and currently it
won't be able to get out of such a situation - the OOM victim would be
stuck with OOM_SCAN_ABORT and all subsequent OOM invocations wouldn't
do anything due to OOM_SCAN_ABORT.

The solution here seems to be timing out on those waits, ie. pick an
OOM victim, kill it, wait for it to exit for enough seconds. If the
victim doesn't exit, ignore it and repeat the process, which is
guaranteed to make progress no matter what and applicable for both
allocator and memcg OOM handling.

> > IOW, it'd be cleaner to do everything synchronously while holding
> > oom_lock with a timeout to get out of rare deadlocks.
>
> Deadlocks are quite real and we really have to unwind and handle with a
> clean stack.

Yeah, they're real but we can deal with them in a more consistent way
using timeouts.

> > Memcg OOM killings are done at the end of page fault, apart from OOM
> > detection. This allows the following race condition.
> >
> >     Task A                  Task B
> >
> >     OOM detection
> >                             OOM detection
> >     OOM kill
> >     victim exits
> >                             OOM kill
> >
> > Task B has no way of knowing that another task has already killed an
> > OOM victim which proceeded to exit and release memory, and will
> > unnecessarily pick another victim. In highly contended cases, this
> > can lead to multiple unnecessary chained killings.
>
> Yes, I can see this might happen. I haven't seen it in real life,
> but I guess such a load can be constructed. The question is whether this
> is serious enough to make the code more complicated.

Yeah, I do have a bug report and dmesg where multiple processes are
being killed (each one pretty large) when one should have been enough.

...

> > +void mem_cgroup_exit_oom_victim(void)
> > +{
> > +	struct mem_cgroup *memcg;
> > +
> > +	lockdep_assert_held(&oom_lock);
> > +
> > +	rcu_read_lock();
> > +	memcg = mem_cgroup_from_task(current);
>
> The OOM might have happened in a parent memcg and the OOM victim might
> be a sibling or wherever in the hierarchy under the OOM memcg.
> So you have to use the OOM memcg to track the counter, otherwise the
> tasks from other memcgs in the hierarchy racing with the OOM victim
> would miss it anyway. You can store the target memcg into the victim
> when killing it.

In those cases there actually are more than one domain OOMing, but
yeah, the count prolly should propagate all the way to the root.

Geez... I dislike this approach even more. Grabbing the oom lock and
doing everything synchronously with a timeout will be far simpler and
easier to follow.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH RFC] memcg: close the race window between OOM detection and killing
  2015-06-03 19:36     ` Tejun Heo
@ 2015-06-04  9:30       ` Michal Hocko
  [not found]         ` <20150604093031.GB4806-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2015-06-04 9:30 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Johannes Weiner, cgroups, linux-mm

On Thu 04-06-15 04:36:39, Tejun Heo wrote:
> Hey, Michal.
>
> On Wed, Jun 03, 2015 at 04:44:14PM +0200, Michal Hocko wrote:
> > The race does exist in the global case as well AFAICS.
> >
> > __alloc_pages_may_oom
> >   mutex_trylock
> >   get_page_from_freelist	# fails
> >   <preempted>			exit_mm	# releases some memory
> >   out_of_memory
> >				exit_oom_victim
> >   # No TIF_MEMDIE task so kill a new one
>
> Hmmm? In -mm, if __alloc_pages_may_oom() fails trylock, it never calls
> out_of_memory().

Sure, but the oom_lock might be free already. out_of_memory doesn't wait
for the victim to finish. It just does schedule_timeout_killable.

> > The race window might be smaller but it is possible in principle.
> > Why does it make sense to treat those two in a different way?
>
> The main difference here is that the alloc path does the whole thing
> synchronously and thus the OOM detection and killing can be put in the
> same critical section, which isn't the case for the memcg OOM handling.

This is true, but there is still a time window between the last
allocation attempt and out_of_memory when the OOM victim might have
exited and another task would be selected.

> > > The only reason the memcg OOM killer behaves asynchronously (unwinding
> > > the stack and then handling) is memcg userland OOM handling, which may
> > > end up blocking for userland actions while still holding whatever locks
> > > it was holding when it invoked try_charge(), leading to a deadlock.
> >
> > This is not the only reason. In-kernel memcg OOM handling needs it
> > as well. See 3812c8c8f395 ("mm: memcg: do not trap chargers with
> > full callstack on OOM"). In fact it was the in-kernel case which
> > triggered this change. We simply cannot wait for OOM with the stack and
> > all the state the charge is called from.
>
> Why should this be any different from OOM handling from the page
> allocator tho?

Yes, the global OOM is prone to deadlock. This has been discussed a lot
and we still do not have a good answer for that. The primary problem
is that small allocations do not fail and retry indefinitely, so an OOM
victim might be blocked on a lock held by a task which is the allocator.
This is less likely and harder to trigger with standard loads than in a
memcg environment though.

> That can end up in the exact same situation and currently it
> won't be able to get out of such a situation - the OOM victim would be
> stuck with OOM_SCAN_ABORT and all subsequent OOM invocations wouldn't
> do anything due to OOM_SCAN_ABORT.
>
> The solution here seems to be timing out on those waits, ie. pick an
> OOM victim, kill it, wait for it to exit for enough seconds. If the
> victim doesn't exit, ignore it and repeat the process, which is
> guaranteed to make progress no matter what and applicable for both
> allocator and memcg OOM handling.

There have been suggestions to add an OOM timeout and ignore the
previous OOM victim after the timeout expires and select a new
victim. This sounds attractive but this approach has its own problems
(http://marc.info/?l=linux-mm&m=141686814824684&w=2).

I am convinced that a more appropriate solution for this is to not
pretend that small allocations never fail and to start failing them
after the OOM killer is not able to make any progress (GFP_NOFS
allocations would be the first candidate and the easiest one to trigger
deadlocks via i_mutex). Johannes was also suggesting an OOM memory
reserve which would be used for OOM contexts.

Also the OOM killer can be improved to shrink some of the victim's
memory before killing it (e.g. drop private clean pages and their page
tables).

> > > IOW, it'd be cleaner to do everything synchronously while holding
> > > oom_lock with a timeout to get out of rare deadlocks.
> >
> > Deadlocks are quite real and we really have to unwind and handle with a
> > clean stack.
>
> Yeah, they're real but we can deal with them in a more consistent way
> using timeouts.
>
> > > Memcg OOM killings are done at the end of page fault, apart from OOM
> > > detection. This allows the following race condition.
> > >
> > >     Task A                  Task B
> > >
> > >     OOM detection
> > >                             OOM detection
> > >     OOM kill
> > >     victim exits
> > >                             OOM kill
> > >
> > > Task B has no way of knowing that another task has already killed an
> > > OOM victim which proceeded to exit and release memory, and will
> > > unnecessarily pick another victim. In highly contended cases, this
> > > can lead to multiple unnecessary chained killings.
> >
> > Yes, I can see this might happen. I haven't seen it in real life,
> > but I guess such a load can be constructed. The question is whether this
> > is serious enough to make the code more complicated.
>
> Yeah, I do have a bug report and dmesg where multiple processes are
> being killed (each one pretty large) when one should have been enough.

Could you share the oom reports?

> ...
> > > +void mem_cgroup_exit_oom_victim(void)
> > > +{
> > > +	struct mem_cgroup *memcg;
> > > +
> > > +	lockdep_assert_held(&oom_lock);
> > > +
> > > +	rcu_read_lock();
> > > +	memcg = mem_cgroup_from_task(current);
> >
> > The OOM might have happened in a parent memcg and the OOM victim might
> > be a sibling or wherever in the hierarchy under the OOM memcg.
> > So you have to use the OOM memcg to track the counter, otherwise the
> > tasks from other memcgs in the hierarchy racing with the OOM victim
> > would miss it anyway. You can store the target memcg into the victim
> > when killing it.
>
> In those cases there actually are more than one domain OOMing, but
> yeah, the count prolly should propagate all the way to the root.

Not sure I understand what you mean by more OOM domains. There is only
one in a hierarchy which has reached its limit and is not able to
reclaim any charges.

> Geez... I dislike this approach even more. Grabbing the oom lock and
> doing everything synchronously with a timeout will be far simpler and
> easier to follow.

It might sound easier, but it has its own problems...

--
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread
[parent not found: <20150604093031.GB4806-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH RFC] memcg: close the race window between OOM detection and killing
  [not found]         ` <20150604093031.GB4806-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2015-06-04 19:06           ` Johannes Weiner
  2015-06-05 14:29             ` Michal Hocko
  2015-06-04 19:29           ` Tejun Heo
  1 sibling, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2015-06-04 19:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tejun Heo, David Rientjes, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Thu, Jun 04, 2015 at 11:30:31AM +0200, Michal Hocko wrote:
> There have been suggestions to add an OOM timeout and ignore the
> previous OOM victim after the timeout expires and select a new
> victim. This sounds attractive but this approach has its own problems
> (http://marc.info/?l=linux-mm&m=141686814824684&w=2).

Since this list of concerns has been brought up but never really
addressed, let me give it a shot. From David's email:

: The oom killer timeout is always an attractive remedy to this situation
: and gets proposed quite often. Several problems: (1) you can needlessly
: panic the machine because no other processes are eligible for oom kill
: after declaring that the first oom kill victim cannot make progress,

If we run out of OOM victims that can successfully exit, then we are
genuinely deadlocked. What else is there to do? A panic() is better
than locking up quietly in that case.

: (2) it can lead to unnecessary oom killing if the oom kill victim
: can exit but hasn't been scheduled or is in the process of exiting,

We can set the timeout sufficiently high that this should be a fair
trade-off. Let's say 10 seconds. If your only remaining means of
reclaiming memory, which is waiting for that one task to exit, takes
longer than 10 seconds, aren't you in big enough trouble already? It
seems reasonable to assume that there won't be any more progress. But
even if there were, the machine is in a state bad enough that a second
OOM kill should not be the end of the world.

: (3) you can easily turn the oom killer into a serial oom killer
: since there's no guarantee the next process that is chosen won't be
: affected by the same problem,

Again, this would still be better than deadlocking.

: and (4) this doesn't fix the problem if an oom disabled process is
: wedged trying to allocate memory while holding a mutex that others
: are waiting on.

I don't follow. If another OOM victim is chosen and can exit, the task
that is trying to allocate with the lock held will finish the
allocation and release the lock.

> I am convinced that a more appropriate solution for this is to not
> pretend that small allocations never fail and to start failing them
> after the OOM killer is not able to make any progress (GFP_NOFS
> allocations would be the first candidate and the easiest one to trigger
> deadlocks via i_mutex). Johannes was also suggesting an OOM memory
> reserve which would be used for OOM contexts.

I am no longer convinced we can ever go back to failing smaller
allocations and NOFS allocations. The filesystem people involved in
that discussion have proven completely uncooperative on that subject.

So I think we should make the OOM killer as robust as possible. It's
just unnecessary to deadlock on a single process when there are more
candidates out there that we could try instead. We are already in a
worst-case state; killing more tasks is not going to make it worse.

> Also the OOM killer can be improved to shrink some of the victim's
> memory before killing it (e.g. drop private clean pages and their page
> tables).

That might work too. It's just a bit more complex and I don't really
see the downside of moving on to other victims after a timeout.

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH RFC] memcg: close the race window between OOM detection and killing
  2015-06-04 19:06           ` Johannes Weiner
@ 2015-06-05 14:29             ` Michal Hocko
  0 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2015-06-05 14:29 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Tejun Heo, David Rientjes, cgroups, linux-mm

On Thu 04-06-15 15:06:49, Johannes Weiner wrote:
> On Thu, Jun 04, 2015 at 11:30:31AM +0200, Michal Hocko wrote:
> > There have been suggestions to add an OOM timeout and ignore the
> > previous OOM victim after the timeout expires and select a new
> > victim. This sounds attractive but this approach has its own problems
> > (http://marc.info/?l=linux-mm&m=141686814824684&w=2).
>
> Since this list of concerns has been brought up but never really
> addressed, let me give it a shot. From David's email:
[...]
> : (3) you can easily turn the oom killer into a serial oom killer
> : since there's no guarantee the next process that is chosen won't be
> : affected by the same problem,

This is the primary argument I had in mind when nacking the previous
attempt. If we go after another task after the timeout, and that happens
to be just another blocked task, we will not move any further, yet we
have killed another task pointlessly. Tetsuo had a load which generates
hundreds of such tasks all blocked on the same i_mutex. So you would end
up with a basically time-unbounded unresponsive system. If you put the
timeout too low you risk pointless killing, and a large timeout will not
help in the pathological case it is proposed for either.

I can imagine we can enhance the panic_on_oom policy to consider also a
timeout (panic_on_oom_timeout). This would be a much better solution IMO
and a much cleaner policy, because it would be bounded in time. So the
administrator knows what the timeout actually means. It also sounds like
a more appropriate very last resort, because the current panic_on_oom is
just too restricted.

[...]
> > I am convinced that a more appropriate solution for this is to not
> > pretend that small allocations never fail and to start failing them
> > after the OOM killer is not able to make any progress (GFP_NOFS
> > allocations would be the first candidate and the easiest one to trigger
> > deadlocks via i_mutex). Johannes was also suggesting an OOM memory
> > reserve which would be used for OOM contexts.
>
> I am no longer convinced we can ever go back to failing smaller
> allocations and NOFS allocations. The filesystem people involved in
> that discussion have proven completely uncooperative on that subject.

Yeah, this is sad, but I fail to see why this should be a reason to stop
trying to make NOFS behavior more sensible. Those allocations are
clearly restricted and retrying endlessly is simply wrong conceptually.
Maybe this alone will turn out to be sufficient to prevent the deadlock
issues in 99% of cases. panic_on_oom_timeout might be the last resort
for those who want to be really sure that the system will not be
unresponsive for an unbounded amount of time.

> So I think we should make the OOM killer as robust as possible.

I do agree. I just think there are other steps we can take which would
be less disruptive, like giving OOM contexts access to memory reserves.
Page table reclaim would be more complicated, but maybe we do not even
need that.

> It's just unnecessary to deadlock on a single process when there are
> more candidates out there that we could try instead. We are already
> in a worst-case state; killing more tasks is not going to make it
> worse.
>
> > Also the OOM killer can be improved to shrink some of the victim's
> > memory before killing it (e.g. drop private clean pages and their
> > page tables).
>
> That might work too. It's just a bit more complex and I don't really
> see the downside of moving on to other victims after a timeout.

--
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH RFC] memcg: close the race window between OOM detection and killing
  [not found]         ` <20150604093031.GB4806-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  2015-06-04 19:06           ` Johannes Weiner
@ 2015-06-04 19:29           ` Tejun Heo
  [not found]             ` <20150604192936.GR20091-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2015-06-04 19:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg

Hello, Michal.

On Thu, Jun 04, 2015 at 11:30:31AM +0200, Michal Hocko wrote:
> > Hmmm? In -mm, if __alloc_pages_may_oom() fails trylock, it never calls
> > out_of_memory().
>
> Sure, but the oom_lock might be free already. out_of_memory doesn't wait
> for the victim to finish. It just does schedule_timeout_killable.

That doesn't matter because the detection and TIF_MEMDIE assertion are
atomic w.r.t. oom_lock, and TIF_MEMDIE essentially extends the locking
by preventing further OOM kills. Am I missing something?

> > The main difference here is that the alloc path does the whole thing
> > synchronously and thus the OOM detection and killing can be put in the
> > same critical section, which isn't the case for the memcg OOM handling.
>
> This is true, but there is still a time window between the last
> allocation attempt and out_of_memory when the OOM victim might have
> exited and another task would be selected.

Please see above.

> > > This is not the only reason. In-kernel memcg OOM handling needs it
> > > as well. See 3812c8c8f395 ("mm: memcg: do not trap chargers with
> > > full callstack on OOM"). In fact it was the in-kernel case which
> > > triggered this change. We simply cannot wait for OOM with the stack
> > > and all the state the charge is called from.
> >
> > Why should this be any different from OOM handling from the page
> > allocator tho?
>
> Yes, the global OOM is prone to deadlock. This has been discussed a lot
> and we still do not have a good answer for that. The primary problem
> is that small allocations do not fail and retry indefinitely, so an OOM
> victim might be blocked on a lock held by a task which is the allocator.
> This is less likely and harder to trigger with standard loads than in a
> memcg environment though.

Deadlocks from infallible allocations getting interlocked are
different. The OOM killer can't really get around that by itself, but
I'm not talking about those deadlocks, and at the same time they're a
lot less likely. It's about an OOM victim trapped in a deadlock failing
to release memory because someone else is waiting for that memory to be
released while blocking the victim. Sure, the two issues are related,
but once you solve things getting blocked on a single OOM victim, it
becomes a lot less of an issue.

> There have been suggestions to add an OOM timeout and ignore the
> previous OOM victim after the timeout expires and select a new
> victim. This sounds attractive but this approach has its own problems
> (http://marc.info/?l=linux-mm&m=141686814824684&w=2).

Here are the issues the message lists.

(1) you can needlessly panic the machine because no other processes
are eligible for oom kill after declaring that the first oom kill
victim cannot make progress,

This is extremely unlikely unless most processes in the system are
involved in the same deadlock. All processes have SIGKILL pending but
nobody can exit? In such cases, panic prolly isn't such a bad idea. I
mean, where would you go from there?

(2) it can lead to unnecessary oom killing if the oom kill victim can
exit but hasn't been scheduled or is in the process of exiting,

It's a matter of having a reasonable timeout. OOM killing isn't an
exact operation to begin with, and if an OOM victim fails to release
memory in, say, 10s or whatever, finding another target is the right
thing to do.

(3) you can easily turn the oom killer into a serial oom killer since
there's no guarantee the next process that is chosen won't be affected
by the same problem, and

And how is that worse than deadlocking? The OOM killer is a mechanism
to prevent the system from complete lockup at the cost of essentially
randomly butchering its workload. The nasty userland memcg OOM hack
aside, by the time OOM killing has engaged, the system is already at
the end of its rope.

(4) this doesn't fix the problem if an oom disabled process is wedged
trying to allocate memory while holding a mutex that others are waiting

*All* others in the system are waiting on this particular OOM disabled
process and nobody can release any memory? Yeah, panic then.

The arguments in that message aren't really against adding timeouts but
a lot more for wholesale removal of OOM killing. That's an awesome goal
but is way far-fetched at the moment.

> I am convinced that a more appropriate solution for this is to not
> pretend that small allocations never fail and to start failing them
> after the OOM killer is not able to make any progress (GFP_NOFS
> allocations would be the first candidate and the easiest one to trigger
> deadlocks via i_mutex). Johannes was also suggesting an OOM memory
> reserve which would be used for OOM contexts.

I don't follow why you reached such a conclusion. The arguments don't
really make sense to me. Once you accept that the OOM killer is a
sledgehammer rather than a surgical blade, the direction to take seems
pretty obvious to me and it *can't* be a precision mechanism - no
matter what, it's killing a random process with SIGKILL.

> Also the OOM killer can be improved to shrink some of the victim's
> memory before killing it (e.g. drop private clean pages and their
> page tables).

And why would we go to that level of sophistication? Just wait a while
and kill more until it gets unwedged.
That will achieve most effects of being a lot more sophiscated with a lot less complexity and again those minute differences don't matter here. > > Gees... I dislike this approach even more. Grabbng the oom lock and > > doing everything synchronously with timeout will be far simpler and > > easier to follow. > > It might sound easier but it has its own problems... I'm still failing to see what the problems are. Thanks. -- tejun ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RFC] memcg: close the race window between OOM detection and killing
  [not found] ` <20150604192936.GR20091-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-06-05 14:35 ` Michal Hocko
  [not found] ` <20150605143534.GD26113-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2015-06-05 14:35 UTC (permalink / raw)
To: Tejun Heo
Cc: Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA,
    linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Fri 05-06-15 04:29:36, Tejun Heo wrote:
> Hello, Michal.
>
> On Thu, Jun 04, 2015 at 11:30:31AM +0200, Michal Hocko wrote:
> > > Hmmm? In -mm, if __alloc_page_may_oom() fails trylock, it never calls
> > > out_of_memory().
> >
> > Sure but the oom_lock might be free already. out_of_memory doesn't wait
> > for the victim to finish. It just does schedule_timeout_killable.
>
> That doesn't matter because the detection and TIF_MEMDIE assertion are
> atomic w.r.t. oom_lock and TIF_MEMDIE essentially extends the locking
> by preventing further OOM kills. Am I missing something?

This is true but TIF_MEMDIE releasing is not atomic wrt. the allocation
path. So the oom victim could have released memory and dropped
TIF_MEMDIE but the allocation path hasn't noticed that because it's passed

	/*
	 * Go through the zonelist yet one more time, keep very high watermark
	 * here, this is only to catch a parallel oom killing, we must fail if
	 * we're still under heavy pressure.
	 */
	page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order,
					ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);

and goes on to kill another task because there is no TIF_MEMDIE
anymore.

> > > The main difference here is that the alloc path does the whole thing
> > > synchronously and thus the OOM detection and killing can be put in the
> > > same critical section, which isn't the case for the memcg OOM handling.
> >
> > This is true but there is still a time window between the last
> > allocation attempt and out_of_memory when the OOM victim might have
> > exited and another task would be selected.
>
> Please see above.
>
> > > > This is not the only reason. In-kernel memcg oom handling needs it
> > > > as well. See 3812c8c8f395 ("mm: memcg: do not trap chargers with
> > > > full callstack on OOM"). In fact it was the in-kernel case which
> > > > triggered this change. We simply cannot wait for oom with the stack
> > > > and all the state the charge is called from.
> > >
> > > Why should this be any different from OOM handling from the page
> > > allocator though?
> >
> > Yes, the global OOM is prone to deadlock. This has been discussed a lot
> > and we still do not have a good answer for that. The primary problem
> > is that small allocations do not fail and retry indefinitely, so an OOM
> > victim might be blocked on a lock held by a task which is the allocator.
> > This is less likely and harder to trigger with standard loads than in
> > a memcg environment, though.
>
> Deadlocks from infallible allocations getting interlocked are
> different. The OOM killer can't really get around those by itself, but
> I'm not talking about those deadlocks, and at the same time they're a
> lot less likely. It's about an OOM victim trapped in a deadlock failing
> to release memory because someone else is waiting for that memory to be
> released while blocking the victim.

I thought those would be in the allocator context - which was the
example I've provided. What kind of context do you have in mind?

> Sure, the two issues are related, but once you solve things getting
> blocked on a single OOM victim, it becomes a lot less of an issue.
>
> > There have been suggestions to add an OOM timeout and ignore the
> > previous OOM victim after the timeout expires and select a new
> > victim. This sounds attractive but this approach has its own problems
> > (http://marc.info/?l=linux-mm&m=141686814824684&w=2).
>
> Here are the issues the message lists

Let's focus on discussing those points in reply to Johannes' email.
AFAIU your notes are very much in line with his.

--
Michal Hocko
SUSE Labs

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RFC] memcg: close the race window between OOM detection and killing
  [not found] ` <20150605143534.GD26113-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2015-06-05 14:57 ` Tejun Heo
  [not found] ` <20150605145759.GA5946-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2015-06-05 14:57 UTC (permalink / raw)
To: Michal Hocko
Cc: Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA,
    linux-mm-Bw31MaZKKs3YtjvyW6yDsg

Hello, Michal.

On Fri, Jun 05, 2015 at 04:35:34PM +0200, Michal Hocko wrote:
> > That doesn't matter because the detection and TIF_MEMDIE assertion are
> > atomic w.r.t. oom_lock and TIF_MEMDIE essentially extends the locking
> > by preventing further OOM kills. Am I missing something?
>
> This is true but TIF_MEMDIE releasing is not atomic wrt. the allocation
> path. So the oom victim could have released memory and dropped

This is splitting hairs. In the vast majority of problem cases, if
anything is gonna be locked up, it's gonna be locked up before releasing
the memory it's holding. Yet again, this is a blunt instrument to
unwedge the system. It's difficult to see the point of aiming for that
level of granularity.

> TIF_MEMDIE but the allocation path hasn't noticed that because it's passed
>
>	/*
>	 * Go through the zonelist yet one more time, keep very high watermark
>	 * here, this is only to catch a parallel oom killing, we must fail if
>	 * we're still under heavy pressure.
>	 */
>	page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order,
>					ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
>
> and goes on to kill another task because there is no TIF_MEMDIE
> anymore.

Why would this be an issue if we disallow parallel killing?

> > Deadlocks from infallible allocations getting interlocked are
> > different. The OOM killer can't really get around those by itself, but
> > I'm not talking about those deadlocks, and at the same time they're a
> > lot less likely. It's about an OOM victim trapped in a deadlock failing
> > to release memory because someone else is waiting for that memory to be
> > released while blocking the victim.
>
> I thought those would be in the allocator context - which was the
> example I've provided. What kind of context do you have in mind?

Yeah, sure, they'd be in the allocator context holding other resources
which are being waited upon. The first case was deadlock based purely
on memory starvation where NOFAIL allocations interlock with each other
w/o involving other resources.

Thanks.

--
tejun

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RFC] memcg: close the race window between OOM detection and killing
  [not found] ` <20150605145759.GA5946-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-06-05 15:21 ` Michal Hocko
  2015-06-06  0:56 ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2015-06-05 15:21 UTC (permalink / raw)
To: Tejun Heo
Cc: Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA,
    linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Fri 05-06-15 23:57:59, Tejun Heo wrote:
> Hello, Michal.
>
> On Fri, Jun 05, 2015 at 04:35:34PM +0200, Michal Hocko wrote:
> > > That doesn't matter because the detection and TIF_MEMDIE assertion are
> > > atomic w.r.t. oom_lock and TIF_MEMDIE essentially extends the locking
> > > by preventing further OOM kills. Am I missing something?
> >
> > This is true but TIF_MEMDIE releasing is not atomic wrt. the allocation
> > path. So the oom victim could have released memory and dropped
>
> This is splitting hairs. In the vast majority of problem cases, if
> anything is gonna be locked up, it's gonna be locked up before releasing
> the memory it's holding. Yet again, this is a blunt instrument to
> unwedge the system. It's difficult to see the point of aiming for that
> level of granularity.

I was just pointing out that the OOM killer is inherently racy even for
the global case. Not sure we are talking about the same thing here.

> > TIF_MEMDIE but the allocation path hasn't noticed that because it's passed
> >
> >	/*
> >	 * Go through the zonelist yet one more time, keep very high watermark
> >	 * here, this is only to catch a parallel oom killing, we must fail if
> >	 * we're still under heavy pressure.
> >	 */
> >	page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order,
> >					ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
> >
> > and goes on to kill another task because there is no TIF_MEMDIE
> > anymore.
>
> Why would this be an issue if we disallow parallel killing?

I am confused. The whole thread started with fixing a race in memcg and
I was asking about the global case, which is racy currently as well.

> > > Deadlocks from infallible allocations getting interlocked are
> > > different. The OOM killer can't really get around those by itself, but
> > > I'm not talking about those deadlocks, and at the same time they're a
> > > lot less likely. It's about an OOM victim trapped in a deadlock failing
> > > to release memory because someone else is waiting for that memory to be
> > > released while blocking the victim.
> >
> > I thought those would be in the allocator context - which was the
> > example I've provided. What kind of context do you have in mind?
>
> Yeah, sure, they'd be in the allocator context holding other resources
> which are being waited upon. The first case was deadlock based purely
> on memory starvation where NOFAIL allocations interlock with each other
> w/o involving other resources.

OK, I guess we were just talking past each other.

--
Michal Hocko
SUSE Labs

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH RFC] memcg: close the race window between OOM detection and killing
  2015-06-05 15:21 ` Michal Hocko
@ 2015-06-06  0:56 ` Tejun Heo
  0 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2015-06-06 0:56 UTC (permalink / raw)
To: Michal Hocko; +Cc: Johannes Weiner, cgroups, linux-mm

Hello, Michal.

On Fri, Jun 05, 2015 at 05:21:35PM +0200, Michal Hocko wrote:
...
> > > TIF_MEMDIE but the allocation path hasn't noticed that because it's passed
> > >
> > >	/*
> > >	 * Go through the zonelist yet one more time, keep very high watermark
> > >	 * here, this is only to catch a parallel oom killing, we must fail if
> > >	 * we're still under heavy pressure.
> > >	 */
> > >	page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order,
> > >					ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
> > >
> > > and goes on to kill another task because there is no TIF_MEMDIE
> > > anymore.
> >
> > Why would this be an issue if we disallow parallel killing?
>
> I am confused. The whole thread started with fixing a race in memcg and
> I was asking about the global case, which is racy currently as well.

Ah, okay, I thought we were still talking about issues with making
things synchronous. But anyway, the above isn't a synchronization race
per se, which is what the original patch was trying to address for the
memcg OOM path, right?

Thanks.

--
tejun

^ permalink raw reply [flat|nested] 11+ messages in thread
Thread overview: 11+ messages
2015-06-03 3:15 [PATCH RFC] memcg: close the race window between OOM detection and killing Tejun Heo
2015-06-03 14:44 ` Michal Hocko
[not found] ` <20150603144414.GG16201-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2015-06-03 19:36 ` Tejun Heo
2015-06-04 9:30 ` Michal Hocko
[not found] ` <20150604093031.GB4806-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2015-06-04 19:06 ` Johannes Weiner
2015-06-05 14:29 ` Michal Hocko
2015-06-04 19:29 ` Tejun Heo
[not found] ` <20150604192936.GR20091-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-06-05 14:35 ` Michal Hocko
[not found] ` <20150605143534.GD26113-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2015-06-05 14:57 ` Tejun Heo
[not found] ` <20150605145759.GA5946-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-06-05 15:21 ` Michal Hocko
2015-06-06 0:56 ` Tejun Heo