From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Balbir Singh <bsingharora@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vlastimil Babka <vbabka@suse.cz>, linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Boris Zhmurov <bb@kernelpanic.ru>,
	"Christopher S. Aker" <caker@theshore.net>,
	Donald Buczek <buczek@molgen.mpg.de>,
	Paul Menzel <pmenzel@molgen.mpg.de>
Subject: Re: [PATCH] mm, vmscan: add cond_resched into shrink_node_memcg
Date: Mon, 5 Dec 2016 08:16:29 -0800
Message-ID: <20161205161629.GD3924@linux.vnet.ibm.com>
In-Reply-To: <20161205124955.GG30758@dhcp22.suse.cz>

On Mon, Dec 05, 2016 at 01:49:55PM +0100, Michal Hocko wrote:
> [CC Paul - sorry I've tried to save you from more emails...]
> 
> On Mon 05-12-16 23:44:27, Balbir Singh wrote:
> > >
> > > Hi,
> > > there were multiple reports of similar RCU stalls. Only Boris has
> > > confirmed that this patch helps with his workload. Others might see a
> > > slightly different issue, and that should be investigated if it is the
> > > case. As pointed out by Paul [1], cond_resched might not be sufficient
> > > to silence RCU stalls because that would require real scheduling.
> > > This is a separate problem, though, and Paul is working with Peter [2]
> > > to resolve it.
> > >
> > > Anyway, I believe that this patch should be a good start because it
> > > really seems that nr_taken=0 during the LRU isolation can be triggered
> > > in real life. All reporters agree that they started seeing this issue
> > > when moving to the 4.8 kernel, which might be just a coincidence or a
> > > different behavior of some subsystem. Well, MM has moved from zone to
> > > node reclaim, but I couldn't find any direct relation to that
> > > change.
> > >
> > > [1] http://lkml.kernel.org/r/20161130142955.GS3924@linux.vnet.ibm.com
> > > [2] http://lkml.kernel.org/r/20161201124024.GB3924@linux.vnet.ibm.com
> > >
> > >  mm/vmscan.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index c05f00042430..c4abf08861d2 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -2362,6 +2362,8 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
> > >                         }
> > >                 }
> > >
> > > +               cond_resched();
> > > +
> > 
> > I see a cond_resched_rcu_qs() as a part of linux next inside the while
> > (nr[..]) loop.
> 
> This is a leftover from Paul's initial attempt to fix this issue. I
> expect him to drop his patch from his tree. He considered it
> experimental anyway.

To prevent further confusion, I am dropping these patches from my tree:

80c099e11c19 ("mm: Prevent shrink_node() RCU CPU stall warnings")
34c53f5cd399 ("mm: Prevent shrink_node_memcg() RCU CPU stall warnings")

If you need them, please feel free to pull them in.

Given that I don't have those, I am dropping this one as well:

f2a471ffc8a8 ("rcu: Allow boot-time use of cond_resched_rcu_qs()")

If you need it, please let me know.

> > Do we need this as well?
> 
> Paul is working with Peter to make cond_resched general and cover RCU
> stalls even when cond_resched doesn't schedule because there is no
> runnable task.

And 0day just told me that my current attempt gets a 227% increase in
context switches on the unlink tests in LTP, so back to the drawing
board...

						Thanx, Paul
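
The distinction discussed in this message (cond_resched() only helps RCU
when it actually switches tasks, whereas cond_resched_rcu_qs() reports a
quiescent state even when nothing else is runnable) can be illustrated
with a small user-space toy model. This is a sketch under stated
assumptions, not kernel code: struct toy_cpu and every toy_* function are
hypothetical names invented for illustration, and the model ignores
preemption, interrupts, and everything else the real scheduler and RCU do.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: every name here is hypothetical, not a kernel API. */
struct toy_cpu {
	bool other_task_runnable;   /* is another task waiting for this CPU? */
	unsigned long ctx_switches; /* context switches performed */
	unsigned long quiescent;    /* quiescent states seen by "RCU" */
};

/*
 * Models cond_resched(): switch only if another task is runnable.
 * The quiescent state is a side effect of the context switch, so an
 * otherwise idle CPU never reports one. Returns true if it switched.
 */
static bool toy_cond_resched(struct toy_cpu *cpu)
{
	if (!cpu->other_task_runnable)
		return false;
	cpu->ctx_switches++;
	cpu->quiescent++;
	return true;
}

/*
 * Models cond_resched_rcu_qs(): if no switch happened, report a
 * quiescent state explicitly so the grace period can still advance.
 */
static void toy_cond_resched_rcu_qs(struct toy_cpu *cpu)
{
	if (!toy_cond_resched(cpu))
		cpu->quiescent++;
}

/*
 * Models a reclaim loop that keeps iterating without making progress
 * (the nr_taken == 0 case), calling one of the two helpers each pass.
 */
static void toy_reclaim_loop(struct toy_cpu *cpu, unsigned long iterations,
			     bool use_rcu_qs)
{
	for (unsigned long i = 0; i < iterations; i++) {
		/* Nothing was isolated on this pass, so no work is done. */
		if (use_rcu_qs)
			toy_cond_resched_rcu_qs(cpu);
		else
			toy_cond_resched(cpu);
	}
}

/* Prints the counters of one toy CPU for inspection. */
static void toy_report(const char *label, const struct toy_cpu *cpu)
{
	printf("%s: ctx_switches=%lu quiescent=%lu\n",
	       label, cpu->ctx_switches, cpu->quiescent);
}
```

In this model, running toy_reclaim_loop() on a CPU with no other runnable
task leaves the quiescent counter at zero when the plain helper is used,
which corresponds to the stall scenario described above; the _rcu_qs
variant advances the counter on every pass without any context switch.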

