All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
To: Joonsoo Kim <js1304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Andrey Ryabinin
	<aryabinin-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>,
	Suren Baghdasaryan
	<surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Rik van Riel <riel-ebMLmSuQjDVBDgjK7y7TUQ@public.gmane.org>,
	Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kernel-team-b10kYP2dOMg@public.gmane.org
Subject: Re: [PATCH 2/3] mm: vmscan: detect file thrashing at the reclaim root
Date: Wed, 12 Feb 2020 13:18:34 -0500	[thread overview]
Message-ID: <20200212181834.GD180867@cmpxchg.org> (raw)
In-Reply-To: <20200212102817.GA18107@js1304-desktop>

On Wed, Feb 12, 2020 at 07:28:19PM +0900, Joonsoo Kim wrote:
> Hello, Johannes.
> 
> When I tested my patchset on v5.5, I found that my patchset doesn't
> work as intended. I tracked down the issue and this patch would be the
> reason of unintended work. I don't fully understand the patchset so I
> could be wrong. Please let me ask some questions.
> 
> On Thu, Nov 07, 2019 at 12:53:33PM -0800, Johannes Weiner wrote:
> ...snip...
> > -static void snapshot_refaults(struct mem_cgroup *root_memcg, pg_data_t *pgdat)
> > +static void snapshot_refaults(struct mem_cgroup *target_memcg, pg_data_t *pgdat)
> >  {
> > -	struct mem_cgroup *memcg;
> > -
> > -	memcg = mem_cgroup_iter(root_memcg, NULL, NULL);
> > -	do {
> > -		unsigned long refaults;
> > -		struct lruvec *lruvec;
> > +	struct lruvec *target_lruvec;
> > +	unsigned long refaults;
> >  
> > -		lruvec = mem_cgroup_lruvec(memcg, pgdat);
> > -		refaults = lruvec_page_state_local(lruvec, WORKINGSET_ACTIVATE);
> > -		lruvec->refaults = refaults;
> > -	} while ((memcg = mem_cgroup_iter(root_memcg, memcg, NULL)));
> > +	target_lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
> > +	refaults = lruvec_page_state(target_lruvec, WORKINGSET_ACTIVATE);
> > +	target_lruvec->refaults = refaults;
> 
> Is it correct to just snapshot the refault for the target memcg? I
> think that we need to snapshot the refault for all the child memcgs
> since we have traversed all the child memcgs with the refault count
> that is aggregration of all the child memcgs. If next reclaim happens
> from the child memcg, workingset transition that is already considered
> could be considered again.

Good catch, you're right! We have to update all cgroups in the tree,
like we used to. However, we need to use lruvec_page_state() instead
of _local, because we do recursive comparisons in shrink_node()! So
it's not a clean revert of that hunk.

Does this patch here fix the problem you are seeing?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c82e9831003f..e7431518db13 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2993,12 +2993,17 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 
 static void snapshot_refaults(struct mem_cgroup *target_memcg, pg_data_t *pgdat)
 {
-	struct lruvec *target_lruvec;
-	unsigned long refaults;
+	struct mem_cgroup *memcg;
 
-	target_lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
-	refaults = lruvec_page_state(target_lruvec, WORKINGSET_ACTIVATE);
-	target_lruvec->refaults = refaults;
+	memcg = mem_cgroup_iter(target_memcg, NULL, NULL);
+	do {
+		unsigned long refaults;
+		struct lruvec *lruvec;
+
+		lruvec = mem_cgroup_lruvec(memcg, pgdat);
+		refaults = lruvec_page_state(lruvec, WORKINGSET_ACTIVATE);
+		lruvec->refaults = refaults;
+	} while ((memcg = mem_cgroup_iter(target_memcg, memcg, NULL)));
 }
 
 /*

> > @@ -277,12 +305,12 @@ void workingset_refault(struct page *page, void *shadow)
> >  	 * would be better if the root_mem_cgroup existed in all
> >  	 * configurations instead.
> >  	 */
> > -	memcg = mem_cgroup_from_id(memcgid);
> > -	if (!mem_cgroup_disabled() && !memcg)
> > +	eviction_memcg = mem_cgroup_from_id(memcgid);
> > +	if (!mem_cgroup_disabled() && !eviction_memcg)
> >  		goto out;
> > -	lruvec = mem_cgroup_lruvec(memcg, pgdat);
> > -	refault = atomic_long_read(&lruvec->inactive_age);
> > -	active_file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES);
> > +	eviction_lruvec = mem_cgroup_lruvec(eviction_memcg, pgdat);
> > +	refault = atomic_long_read(&eviction_lruvec->inactive_age);
> > +	active_file = lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE);
> 
> Do we need to use the aggregation LRU count of all the child memcgs?
> AFAIU, refault here is the aggregation counter of all the related
> memcgs. Without using the aggregation count for LRU, active_file could
> be so small than the refault distance and refault cannot happen
> correctly.

lruvec_page_state() *is* aggregated for all child memcgs (as opposed
to lruvec_page_state_local()), so that comparison looks correct to me.

WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Joonsoo Kim <js1304@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	Rik van Riel <riel@surriel.com>, Michal Hocko <mhocko@suse.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 2/3] mm: vmscan: detect file thrashing at the reclaim root
Date: Wed, 12 Feb 2020 13:18:34 -0500	[thread overview]
Message-ID: <20200212181834.GD180867@cmpxchg.org> (raw)
In-Reply-To: <20200212102817.GA18107@js1304-desktop>

On Wed, Feb 12, 2020 at 07:28:19PM +0900, Joonsoo Kim wrote:
> Hello, Johannes.
> 
> When I tested my patchset on v5.5, I found that my patchset doesn't
> work as intended. I tracked down the issue and this patch would be the
> reason of unintended work. I don't fully understand the patchset so I
> could be wrong. Please let me ask some questions.
> 
> On Thu, Nov 07, 2019 at 12:53:33PM -0800, Johannes Weiner wrote:
> ...snip...
> > -static void snapshot_refaults(struct mem_cgroup *root_memcg, pg_data_t *pgdat)
> > +static void snapshot_refaults(struct mem_cgroup *target_memcg, pg_data_t *pgdat)
> >  {
> > -	struct mem_cgroup *memcg;
> > -
> > -	memcg = mem_cgroup_iter(root_memcg, NULL, NULL);
> > -	do {
> > -		unsigned long refaults;
> > -		struct lruvec *lruvec;
> > +	struct lruvec *target_lruvec;
> > +	unsigned long refaults;
> >  
> > -		lruvec = mem_cgroup_lruvec(memcg, pgdat);
> > -		refaults = lruvec_page_state_local(lruvec, WORKINGSET_ACTIVATE);
> > -		lruvec->refaults = refaults;
> > -	} while ((memcg = mem_cgroup_iter(root_memcg, memcg, NULL)));
> > +	target_lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
> > +	refaults = lruvec_page_state(target_lruvec, WORKINGSET_ACTIVATE);
> > +	target_lruvec->refaults = refaults;
> 
> Is it correct to just snapshot the refault for the target memcg? I
> think that we need to snapshot the refault for all the child memcgs
> since we have traversed all the child memcgs with the refault count
> that is aggregration of all the child memcgs. If next reclaim happens
> from the child memcg, workingset transition that is already considered
> could be considered again.

Good catch, you're right! We have to update all cgroups in the tree,
like we used to. However, we need to use lruvec_page_state() instead
of _local, because we do recursive comparisons in shrink_node()! So
it's not a clean revert of that hunk.

Does this patch here fix the problem you are seeing?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c82e9831003f..e7431518db13 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2993,12 +2993,17 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 
 static void snapshot_refaults(struct mem_cgroup *target_memcg, pg_data_t *pgdat)
 {
-	struct lruvec *target_lruvec;
-	unsigned long refaults;
+	struct mem_cgroup *memcg;
 
-	target_lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
-	refaults = lruvec_page_state(target_lruvec, WORKINGSET_ACTIVATE);
-	target_lruvec->refaults = refaults;
+	memcg = mem_cgroup_iter(target_memcg, NULL, NULL);
+	do {
+		unsigned long refaults;
+		struct lruvec *lruvec;
+
+		lruvec = mem_cgroup_lruvec(memcg, pgdat);
+		refaults = lruvec_page_state(lruvec, WORKINGSET_ACTIVATE);
+		lruvec->refaults = refaults;
+	} while ((memcg = mem_cgroup_iter(target_memcg, memcg, NULL)));
 }
 
 /*

> > @@ -277,12 +305,12 @@ void workingset_refault(struct page *page, void *shadow)
> >  	 * would be better if the root_mem_cgroup existed in all
> >  	 * configurations instead.
> >  	 */
> > -	memcg = mem_cgroup_from_id(memcgid);
> > -	if (!mem_cgroup_disabled() && !memcg)
> > +	eviction_memcg = mem_cgroup_from_id(memcgid);
> > +	if (!mem_cgroup_disabled() && !eviction_memcg)
> >  		goto out;
> > -	lruvec = mem_cgroup_lruvec(memcg, pgdat);
> > -	refault = atomic_long_read(&lruvec->inactive_age);
> > -	active_file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES);
> > +	eviction_lruvec = mem_cgroup_lruvec(eviction_memcg, pgdat);
> > +	refault = atomic_long_read(&eviction_lruvec->inactive_age);
> > +	active_file = lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE);
> 
> Do we need to use the aggregation LRU count of all the child memcgs?
> AFAIU, refault here is the aggregation counter of all the related
> memcgs. Without using the aggregation count for LRU, active_file could
> be so small than the refault distance and refault cannot happen
> correctly.

lruvec_page_state() *is* aggregated for all child memcgs (as opposed
to lruvec_page_state_local()), so that comparison looks correct to me.


  reply	other threads:[~2020-02-12 18:18 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-07 20:53 [PATCH 0/3] mm: fix page aging across multiple cgroups Johannes Weiner
2019-11-07 20:53 ` [PATCH 1/3] mm: vmscan: move file exhaustion detection to the node level Johannes Weiner
2019-11-10 22:02   ` Suren Baghdasaryan
2019-11-10 22:09   ` Khadarnimcaan Khadarnimcaan
2019-11-07 20:53 ` [PATCH 2/3] mm: vmscan: detect file thrashing at the reclaim root Johannes Weiner
2019-11-11  2:01   ` Suren Baghdasaryan
2019-11-12 17:45     ` Johannes Weiner
2019-11-12 18:45       ` Suren Baghdasaryan
2019-11-12 18:59         ` Johannes Weiner
2019-11-12 20:35           ` Suren Baghdasaryan
2019-11-14 23:47   ` Shakeel Butt
2019-11-15 16:07     ` Johannes Weiner
2019-11-15 16:52       ` Shakeel Butt
     [not found]   ` <20191107205334.158354-3-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2020-02-12 10:28     ` Joonsoo Kim
2020-02-12 10:28       ` Joonsoo Kim
2020-02-12 18:18       ` Johannes Weiner [this message]
2020-02-12 18:18         ` Johannes Weiner
     [not found]         ` <20200212181834.GD180867-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2020-02-14  1:17           ` Joonsoo Kim
2020-02-14  1:17             ` Joonsoo Kim
2019-11-07 20:53 ` [PATCH 3/3] mm: vmscan: enforce inactive:active ratio " Johannes Weiner
2019-11-11  2:15   ` Suren Baghdasaryan
2019-11-12 18:00     ` Johannes Weiner
2019-11-12 19:13       ` Suren Baghdasaryan
2019-11-12 20:34         ` Suren Baghdasaryan
2019-11-15  0:29   ` Shakeel Butt
2019-11-27 22:16     ` Shakeel Butt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200212181834.GD180867@cmpxchg.org \
    --to=hannes-druugvl0lcnafugrpc6u6w@public.gmane.org \
    --cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
    --cc=aryabinin-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org \
    --cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=js1304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=kernel-team-b10kYP2dOMg@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
    --cc=mhocko-IBi9RG/b67k@public.gmane.org \
    --cc=riel-ebMLmSuQjDVBDgjK7y7TUQ@public.gmane.org \
    --cc=shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.