From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Glauber Costa <glommer@parallels.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Li Zefan <lizefan@huawei.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Balbir Singh <bsingharora@gmail.com>
Subject: Re: [PATCH v3 3/6] memcg: Simplify mem_cgroup_force_empty_list error handling
Date: Tue, 13 Nov 2012 16:10:41 -0500
Message-ID: <20121113211041.GB1543@cmpxchg.org>
In-Reply-To: <20121030103559.GA7394@dhcp22.suse.cz>

On Tue, Oct 30, 2012 at 11:35:59AM +0100, Michal Hocko wrote:
> On Mon 29-10-12 15:00:22, Andrew Morton wrote:
> > On Mon, 29 Oct 2012 17:58:45 +0400
> > Glauber Costa <glommer@parallels.com> wrote:
> > 
> > > > + * move charges to its parent or the root cgroup if the group has no
> > > > + * parent (aka use_hierarchy==0).
> > > > + * Although this might fail (get_page_unless_zero, isolate_lru_page or
> > > > + * mem_cgroup_move_account fails) the failure is always temporary and
> > > > + * it signals a race with a page removal/uncharge or migration. In the
> > > > + * first case the page is on the way out and it will vanish from the LRU
> > > > + * on the next attempt and the call should be retried later.
> > > > + * Isolation from the LRU fails only if page has been isolated from
> > > > + * the LRU since we looked at it and that usually means either global
> > > > + * reclaim or migration going on. The page will either get back to the
> > > > + * LRU or vanish.
> > > 
> > > I just wonder for how long can it go in the worst case?
> > 
> > If the kernel is uniprocessor and the caller is SCHED_FIFO: ad infinitum!
> 
> You are right, if the rmdir (resp. echo > force_empty) at SCHED_FIFO
> races with put_page (on a shared page) which gets preempted after
> put_page_testzero and before __page_cache_release then we are screwed:
> 
> 						put_page(page)
> 						  put_page_testzero
> 						  <preempted and page still on LRU>
> mem_cgroup_force_empty_list
>   page = list_entry(list->prev, struct page, lru);
>   mem_cgroup_move_parent(page)
>     get_page_unless_zero <fails>
>   cond_resched() <scheduled again>
> 
> The race window is really small but it is definitely possible. I am not
> happy about this state and it should probably be mentioned in the
> patch description, but I do not see any way around it (except for hacks
> like sched_setscheduler for current, which is, ehm...) while still
> keeping the do_not_fail contract here.
> 
> Can we consider this a corner case (it is much easier to kill a
> machine with SCHED_FIFO than this anyway), or is the concern really
> strong enough that we should come up with a solution before this can
> get merged?

Wouldn't the much bigger race window be reclaim having the page
isolated and SCHED_FIFO preventing it from putback?

I also don't think this is a new class of problem, though.

Would it make sense to stick a wait_on_page_locked() in there just so
that we don't busy spin on a page under migration/reclaim?
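
To make the suggestion above concrete, here is a minimal userspace
sketch in C. It is not kernel code and not the patch under review. It
only models the difference between retrying in a loop and sleeping
until the page lock is dropped. Every model_* name is hypothetical and
made up for illustration; a pthread mutex and condition variable stand
in for the page lock and wait_on_page_locked(), and the sketch assumes
that whoever has the page isolated also holds the page lock.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the page state that matters here. */
struct model_page {
	pthread_mutex_t mu;
	pthread_cond_t changed;	/* signalled on every state change */
	bool locked;		/* models the page lock held by migration */
	bool on_lru;		/* false while the page is isolated */
	bool waiter_asleep;	/* set once the emptying side is about to sleep */
};

/* Models migration/reclaim: owns the isolated, locked page, then puts it back. */
static void *model_migration(void *arg)
{
	struct model_page *p = arg;

	pthread_mutex_lock(&p->mu);
	/* Hold the page until the other side has failed once and gone to sleep. */
	while (!p->waiter_asleep)
		pthread_cond_wait(&p->changed, &p->mu);
	p->on_lru = true;	/* putback */
	p->locked = false;	/* unlock_page() */
	pthread_cond_broadcast(&p->changed);
	pthread_mutex_unlock(&p->mu);
	return NULL;
}

/* Models isolate_lru_page(): fails if somebody else has the page isolated. */
static bool model_isolate(struct model_page *p)
{
	if (!p->on_lru)
		return false;
	p->on_lru = false;
	return true;
}

/* Retry loop that sleeps on the page lock instead of busy spinning. */
static int model_force_empty(struct model_page *p)
{
	int failed = 0;

	pthread_mutex_lock(&p->mu);
	while (!model_isolate(p)) {
		failed++;
		if (!p->locked) {
			/* Nothing to sleep on: fall back to yielding the CPU. */
			pthread_mutex_unlock(&p->mu);
			sched_yield();
			pthread_mutex_lock(&p->mu);
			continue;
		}
		p->waiter_asleep = true;
		pthread_cond_broadcast(&p->changed);
		/* Stand-in for wait_on_page_locked(page). */
		while (p->locked)
			pthread_cond_wait(&p->changed, &p->mu);
	}
	pthread_mutex_unlock(&p->mu);
	return failed;
}

/* Returns the number of failed isolation attempts, or -1 on setup error. */
int model_run(void)
{
	struct model_page p = { .locked = true, .on_lru = false };
	pthread_t t;
	int failed;

	if (pthread_mutex_init(&p.mu, NULL) != 0)
		return -1;
	if (pthread_cond_init(&p.changed, NULL) != 0)
		return -1;
	if (pthread_create(&t, NULL, model_migration, &p) != 0)
		return -1;

	failed = model_force_empty(&p);
	pthread_join(t, NULL);

	/* We isolated the page ourselves, so it is unlocked and off the LRU. */
	assert(!p.locked && !p.on_lru);

	pthread_cond_destroy(&p.changed);
	pthread_mutex_destroy(&p.mu);
	return failed;
}
```

In this model, model_run() records exactly one failed isolation attempt
and then sleeps until the other side is done, rather than going around
the loop again and again. The !locked branch is the case the sleep
cannot help with: if the page is isolated but nobody holds its lock,
the loop is back to yielding and retrying.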
