From: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
To: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org
Subject: Re: [PATCH 1/3] memcg: fix subtle memory barrier bug in mem_cgroup_iter()
Date: Tue, 4 Jun 2013 09:58:40 -0400 [thread overview]
Message-ID: <20130604135840.GN15576@cmpxchg.org> (raw)
In-Reply-To: <20130604130336.GE31242-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
On Tue, Jun 04, 2013 at 03:03:36PM +0200, Michal Hocko wrote:
> On Mon 03-06-13 17:44:37, Tejun Heo wrote:
> [...]
> > @@ -1218,9 +1218,18 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> > * is alive.
> > */
> > dead_count = atomic_read(&root->dead_count);
> > - smp_rmb();
> > +
> > last_visited = iter->last_visited;
> > if (last_visited) {
> > + /*
> > + * Paired with smp_wmb() below in this
> > + * function. The pair guarantee that
> > + * last_visited is more current than
> > + * last_dead_count, which may lead to
> > + * spurious iteration resets but guarantees
> > + * reliable detection of dead condition.
> > + */
> > + smp_rmb();
> > if ((dead_count != iter->last_dead_count) ||
> > !css_tryget(&last_visited->css)) {
> > last_visited = NULL;
>
> I originally had the barrier this way but Johannes pointed out it is not
> correct https://lkml.org/lkml/2013/2/11/411
> "
> !> + /*
> !> + * last_visited might be invalid if some of the group
> !> + * downwards was removed. As we do not know which one
> !> + * disappeared we have to start all over again from the
> !> + * root.
> !> + * css ref count then makes sure that css won't
> !> + * disappear while we iterate to the next memcg
> !> + */
> !> + last_visited = iter->last_visited;
> !> + dead_count = atomic_read(&root->dead_count);
> !> + smp_rmb();
> !
> !Confused about this barrier, see below.
> !
> !As per above, if you remove the iter lock, those lines are mixed up.
> !You need to read the dead count first because the writer updates the
> !dead count after it sets the new position. That way, if the dead
> !count gives the go-ahead, you KNOW that the position cache is valid,
> !because it has been updated first. If either the two reads or the two
> !writes get reordered, you risk seeing a matching dead count while the
> !position cache is stale.
> "
The original prototype code I sent looked like this:
mem_cgroup_iter:
rcu_read_lock()
if atomic_read(&root->dead_count) == iter->dead_count:
smp_rmb()
if tryget(iter->position):
position = iter->position
memcg = find_next(postion)
css_put(position)
iter->position = memcg
smp_wmb() /* Write position cache BEFORE marking it uptodate */
iter->dead_count = atomic_read(&root->dead_count)
rcu_read_unlock()
iter->last_position is written, THEN iter->last_dead_count is written.
So, yes, you "need to read the dead count" first to be sure
iter->last_position is uptodate. But iter->last_dead_count, not
root->dead_count. I should have caught this in the final submission
of your patch :(
Tejun's patch is not correct, either. Something like this?
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 010d6c1..92830fa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1199,7 +1199,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
mz = mem_cgroup_zoneinfo(root, nid, zid);
iter = &mz->reclaim_iter[reclaim->priority];
- last_visited = iter->last_visited;
if (prev && reclaim->generation != iter->generation) {
iter->last_visited = NULL;
goto out_unlock;
@@ -1217,14 +1216,20 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
* css_tryget() verifies the cgroup pointed to
* is alive.
*/
+ last_visited = NULL;
dead_count = atomic_read(&root->dead_count);
- smp_rmb();
- last_visited = iter->last_visited;
- if (last_visited) {
- if ((dead_count != iter->last_dead_count) ||
- !css_tryget(&last_visited->css)) {
+ if (dead_count == iter->last_dead_count) {
+ /*
+ * The writer below sets the position
+ * pointer, then the dead count.
+ * Ensure we read the updated position
+ * when the dead count matches.
+ */
+ smp_rmb();
+ last_visited = iter->last_visited;
+ if (last_visited &&
+ !css_tryget(&last_visited->css))
last_visited = NULL;
- }
}
}
WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>,
bsingharora@gmail.com, cgroups@vger.kernel.org,
linux-mm@kvack.org, lizefan@huawei.com
Subject: Re: [PATCH 1/3] memcg: fix subtle memory barrier bug in mem_cgroup_iter()
Date: Tue, 4 Jun 2013 09:58:40 -0400 [thread overview]
Message-ID: <20130604135840.GN15576@cmpxchg.org> (raw)
In-Reply-To: <20130604130336.GE31242@dhcp22.suse.cz>
On Tue, Jun 04, 2013 at 03:03:36PM +0200, Michal Hocko wrote:
> On Mon 03-06-13 17:44:37, Tejun Heo wrote:
> [...]
> > @@ -1218,9 +1218,18 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> > * is alive.
> > */
> > dead_count = atomic_read(&root->dead_count);
> > - smp_rmb();
> > +
> > last_visited = iter->last_visited;
> > if (last_visited) {
> > + /*
> > + * Paired with smp_wmb() below in this
> > + * function. The pair guarantee that
> > + * last_visited is more current than
> > + * last_dead_count, which may lead to
> > + * spurious iteration resets but guarantees
> > + * reliable detection of dead condition.
> > + */
> > + smp_rmb();
> > if ((dead_count != iter->last_dead_count) ||
> > !css_tryget(&last_visited->css)) {
> > last_visited = NULL;
>
> I originally had the barrier this way but Johannes pointed out it is not
> correct https://lkml.org/lkml/2013/2/11/411
> "
> !> + /*
> !> + * last_visited might be invalid if some of the group
> !> + * downwards was removed. As we do not know which one
> !> + * disappeared we have to start all over again from the
> !> + * root.
> !> + * css ref count then makes sure that css won't
> !> + * disappear while we iterate to the next memcg
> !> + */
> !> + last_visited = iter->last_visited;
> !> + dead_count = atomic_read(&root->dead_count);
> !> + smp_rmb();
> !
> !Confused about this barrier, see below.
> !
> !As per above, if you remove the iter lock, those lines are mixed up.
> !You need to read the dead count first because the writer updates the
> !dead count after it sets the new position. That way, if the dead
> !count gives the go-ahead, you KNOW that the position cache is valid,
> !because it has been updated first. If either the two reads or the two
> !writes get reordered, you risk seeing a matching dead count while the
> !position cache is stale.
> "
The original prototype code I sent looked like this:
mem_cgroup_iter:
rcu_read_lock()
if atomic_read(&root->dead_count) == iter->dead_count:
smp_rmb()
if tryget(iter->position):
position = iter->position
memcg = find_next(postion)
css_put(position)
iter->position = memcg
smp_wmb() /* Write position cache BEFORE marking it uptodate */
iter->dead_count = atomic_read(&root->dead_count)
rcu_read_unlock()
iter->last_position is written, THEN iter->last_dead_count is written.
So, yes, you "need to read the dead count" first to be sure
iter->last_position is uptodate. But iter->last_dead_count, not
root->dead_count. I should have caught this in the final submission
of your patch :(
Tejun's patch is not correct, either. Something like this?
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 010d6c1..92830fa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1199,7 +1199,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
mz = mem_cgroup_zoneinfo(root, nid, zid);
iter = &mz->reclaim_iter[reclaim->priority];
- last_visited = iter->last_visited;
if (prev && reclaim->generation != iter->generation) {
iter->last_visited = NULL;
goto out_unlock;
@@ -1217,14 +1216,20 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
* css_tryget() verifies the cgroup pointed to
* is alive.
*/
+ last_visited = NULL;
dead_count = atomic_read(&root->dead_count);
- smp_rmb();
- last_visited = iter->last_visited;
- if (last_visited) {
- if ((dead_count != iter->last_dead_count) ||
- !css_tryget(&last_visited->css)) {
+ if (dead_count == iter->last_dead_count) {
+ /*
+ * The writer below sets the position
+ * pointer, then the dead count.
+ * Ensure we read the updated position
+ * when the dead count matches.
+ */
+ smp_rmb();
+ last_visited = iter->last_visited;
+ if (last_visited &&
+ !css_tryget(&last_visited->css))
last_visited = NULL;
- }
}
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-06-04 13:58 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-04 0:44 [PATCHSET] memcg: fix and reimplement iterator Tejun Heo
2013-06-04 0:44 ` Tejun Heo
[not found] ` <1370306679-13129-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2013-06-04 0:44 ` [PATCH 1/3] memcg: fix subtle memory barrier bug in mem_cgroup_iter() Tejun Heo
2013-06-04 0:44 ` Tejun Heo
2013-06-04 13:03 ` Michal Hocko
[not found] ` <20130604130336.GE31242-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-06-04 13:58 ` Johannes Weiner [this message]
2013-06-04 13:58 ` Johannes Weiner
2013-06-04 15:29 ` Michal Hocko
2013-06-04 0:44 ` [PATCH 2/3] memcg: restructure mem_cgroup_iter() Tejun Heo
2013-06-04 0:44 ` Tejun Heo
[not found] ` <1370306679-13129-3-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2013-06-04 13:21 ` Michal Hocko
2013-06-04 13:21 ` Michal Hocko
2013-06-04 20:51 ` Tejun Heo
2013-06-04 0:44 ` [PATCH 3/3] memcg: simplify mem_cgroup_reclaim_iter Tejun Heo
2013-06-04 0:44 ` Tejun Heo
[not found] ` <1370306679-13129-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2013-06-04 13:18 ` Michal Hocko
2013-06-04 13:18 ` Michal Hocko
[not found] ` <20130604131843.GF31242-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-06-04 20:50 ` Tejun Heo
2013-06-04 20:50 ` Tejun Heo
2013-06-04 21:28 ` Michal Hocko
2013-06-04 21:55 ` Tejun Heo
2013-06-05 7:30 ` Michal Hocko
2013-06-05 8:20 ` Tejun Heo
2013-06-05 8:36 ` Michal Hocko
2013-06-05 8:44 ` Tejun Heo
2013-06-05 8:55 ` Michal Hocko
2013-06-05 9:03 ` Tejun Heo
[not found] ` <20130605082023.GG7303-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-05 14:39 ` Johannes Weiner
2013-06-05 14:39 ` Johannes Weiner
[not found] ` <20130605143949.GQ15576-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-06-05 14:50 ` Johannes Weiner
2013-06-05 14:50 ` Johannes Weiner
2013-06-05 17:22 ` Tejun Heo
2013-06-05 17:22 ` Tejun Heo
[not found] ` <20130605172212.GA10693-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-05 19:45 ` Johannes Weiner
2013-06-05 19:45 ` Johannes Weiner
[not found] ` <20130605194552.GI15721-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-06-05 20:06 ` Tejun Heo
2013-06-05 20:06 ` Tejun Heo
2013-06-05 21:17 ` Johannes Weiner
2013-06-05 22:20 ` Tejun Heo
[not found] ` <20130605222021.GL10693-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-05 22:27 ` Tejun Heo
2013-06-05 22:27 ` Tejun Heo
2013-06-06 11:50 ` Michal Hocko
2013-06-07 0:52 ` Tejun Heo
[not found] ` <20130607005242.GB16160-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-06-07 7:37 ` Michal Hocko
2013-06-07 7:37 ` Michal Hocko
2013-06-07 23:25 ` Tejun Heo
[not found] ` <20130607232557.GL14781-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-10 8:02 ` Michal Hocko
2013-06-10 8:02 ` Michal Hocko
2013-06-10 19:54 ` Tejun Heo
2013-06-10 20:48 ` Michal Hocko
[not found] ` <20130610204801.GA21003-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-06-10 23:13 ` Tejun Heo
2013-06-10 23:13 ` Tejun Heo
2013-06-11 7:27 ` Michal Hocko
2013-06-11 7:44 ` Tejun Heo
[not found] ` <20130611074404.GE22530-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-11 7:55 ` Michal Hocko
2013-06-11 7:55 ` Michal Hocko
2013-06-11 8:00 ` Tejun Heo
2013-06-05 14:56 ` Michal Hocko
2013-06-04 21:40 ` Johannes Weiner
[not found] ` <20130604214050.GP15576-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-06-04 21:49 ` Tejun Heo
2013-06-04 21:49 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130604135840.GN15576@cmpxchg.org \
--to=hannes-druugvl0lcnafugrpc6u6w@public.gmane.org \
--cc=bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
--cc=lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org \
--cc=mhocko-AlSwsSmVLrQ@public.gmane.org \
--cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.