Re: rcu_preempt detected stalls.

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Re: rcu_preempt detected stalls.
       [not found]               ` <20141027211329.GJ5718@linux.vnet.ibm.com>
@ 2014-10-27 23:44                 ` Paul E. McKenney
  2014-11-13 23:07                   ` Paul E. McKenney
  0 siblings, 1 reply; 3+ messages in thread
From: Paul E. McKenney @ 2014-10-27 23:44 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Dave Jones, Linux Kernel, htejun, linux-mm

On Mon, Oct 27, 2014 at 02:13:29PM -0700, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 12:39:15PM -0400, Sasha Levin wrote:
> > On 10/24/2014 12:13 PM, Paul E. McKenney wrote:
> > > On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
> > >> > On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
> > >>> > > On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
> > >>>>> > >> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
> > >>>>>>> > >>> > > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
> > >>>>>>>>> > >>>> > >> On 10/13/2014 01:35 PM, Dave Jones wrote:
> > >>>>>>>>>>> > >>>>> > >>> oday in "rcu stall while fuzzing" news:
> > >>>>>>>>>>> > >>>>> > >>>
> > >>>>>>>>>>> > >>>>> > >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
> > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > >>>>>>>>>>> > >>>>> > >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> > >>>>>>>>> > >>>> > >>
> > >>>>>>>>> > >>>> > >> I've complained about RCU stalls couple days ago (in a different context)
> > >>>>>>>>> > >>>> > >> on -next. I guess whatever causing them made it into Linus's tree?
> > >>>>>>>>> > >>>> > >>
> > >>>>>>>>> > >>>> > >> https://lkml.org/lkml/2014/10/11/64
> > >>>>>>> > >>> > > 
> > >>>>>>> > >>> > > And on that one, I must confess that I don't see where the RCU read-side
> > >>>>>>> > >>> > > critical section might be.
> > >>>>>>> > >>> > > 
> > >>>>>>> > >>> > > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
> > >>>>>>> > >>> > > Can you reproduce this with CONFIG_PROVE_RCU=y?
> > >>>>> > >> > 
> > >>>>> > >> > Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
> > >>>>> > >> > set and nothing else is showing up before/after that.
> > >>> > > Indeed it was directed to you.  ;-)
> > >>> > > 
> > >>> > > Does the following crude diagnostic patch turn up anything?
> > >> > 
> > >> > Nope, seeing stalls but not seeing that pr_err() you added.
> > > OK, color me confused.  Could you please send me the full dmesg or a
> > > pointer to it?
> > 
> > Attached.
> 
> Thank you!  I would complain about the FAULT_INJECTION messages, but
> they don't appear to be happening all that frequently.
> 
> The stack dumps do look different here.  I suspect that this is a real
> issue in the VM code.

And to that end...  The filemap_map_pages() function does have loop over
a list of pages.  I wonder if the rcu_read_lock() should be moved into
the radix_tree_for_each_slot() loop.  CCing linux-mm for their thoughts,
though it looks to me like the current radix_tree_for_each_slot() wants
to be under RCU protection.  But I am not seeing anything that requires
all iterations of the loop to be under the same RCU read-side critical
section.  Maybe something like the following patch?

							Thanx, Paul

------------------------------------------------------------------------

mm: Attempted fix for RCU CPU stall warning

It appears that filemap_map_pages() can stay in a single RCU read-side
critical section for a very long time if given a large area to map.
This could result in RCU CPU stall warnings.  This commit therefore breaks
the read-side critical section into per-iteration critical sections, taking
care to make sure that the radix_tree_for_each_slot() call itself remains
in an RCU read-side critical section, as required.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/mm/filemap.c b/mm/filemap.c
index 14b4642279f1..f78f144fb41f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2055,6 +2055,8 @@ skip:
 next:
 		if (iter.index == vmf->max_pgoff)
 			break;
+		rcu_read_unlock();
+		rcu_read_lock();
 	}
 	rcu_read_unlock();
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-27 23:44                 ` rcu_preempt detected stalls Paul E. McKenney
@ 2014-11-13 23:07                   ` Paul E. McKenney
  2014-11-13 23:10                     ` Sasha Levin
  0 siblings, 1 reply; 3+ messages in thread
From: Paul E. McKenney @ 2014-11-13 23:07 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Dave Jones, Linux Kernel, htejun, linux-mm

On Mon, Oct 27, 2014 at 04:44:25PM -0700, Paul E. McKenney wrote:
> On Mon, Oct 27, 2014 at 02:13:29PM -0700, Paul E. McKenney wrote:
> > On Fri, Oct 24, 2014 at 12:39:15PM -0400, Sasha Levin wrote:
> > > On 10/24/2014 12:13 PM, Paul E. McKenney wrote:
> > > > On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
> > > >> > On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
> > > >>> > > On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
> > > >>>>> > >> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
> > > >>>>>>> > >>> > > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
> > > >>>>>>>>> > >>>> > >> On 10/13/2014 01:35 PM, Dave Jones wrote:
> > > >>>>>>>>>>> > >>>>> > >>> oday in "rcu stall while fuzzing" news:
> > > >>>>>>>>>>> > >>>>> > >>>
> > > >>>>>>>>>>> > >>>>> > >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > > >>>>>>>>>>> > >>>>> > >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> > > >>>>>>>>> > >>>> > >>
> > > >>>>>>>>> > >>>> > >> I've complained about RCU stalls couple days ago (in a different context)
> > > >>>>>>>>> > >>>> > >> on -next. I guess whatever causing them made it into Linus's tree?
> > > >>>>>>>>> > >>>> > >>
> > > >>>>>>>>> > >>>> > >> https://lkml.org/lkml/2014/10/11/64
> > > >>>>>>> > >>> > > 
> > > >>>>>>> > >>> > > And on that one, I must confess that I don't see where the RCU read-side
> > > >>>>>>> > >>> > > critical section might be.
> > > >>>>>>> > >>> > > 
> > > >>>>>>> > >>> > > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
> > > >>>>>>> > >>> > > Can you reproduce this with CONFIG_PROVE_RCU=y?
> > > >>>>> > >> > 
> > > >>>>> > >> > Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
> > > >>>>> > >> > set and nothing else is showing up before/after that.
> > > >>> > > Indeed it was directed to you.  ;-)
> > > >>> > > 
> > > >>> > > Does the following crude diagnostic patch turn up anything?
> > > >> > 
> > > >> > Nope, seeing stalls but not seeing that pr_err() you added.
> > > > OK, color me confused.  Could you please send me the full dmesg or a
> > > > pointer to it?
> > > 
> > > Attached.
> > 
> > Thank you!  I would complain about the FAULT_INJECTION messages, but
> > they don't appear to be happening all that frequently.
> > 
> > The stack dumps do look different here.  I suspect that this is a real
> > issue in the VM code.
> 
> And to that end...  The filemap_map_pages() function does have loop over
> a list of pages.  I wonder if the rcu_read_lock() should be moved into
> the radix_tree_for_each_slot() loop.  CCing linux-mm for their thoughts,
> though it looks to me like the current radix_tree_for_each_slot() wants
> to be under RCU protection.  But I am not seeing anything that requires
> all iterations of the loop to be under the same RCU read-side critical
> section.  Maybe something like the following patch?

Just following up, did the patch below help?

							Thanx, Paul

> ------------------------------------------------------------------------
> 
> mm: Attempted fix for RCU CPU stall warning
> 
> It appears that filemap_map_pages() can stay in a single RCU read-side
> critical section for a very long time if given a large area to map.
> This could result in RCU CPU stall warnings.  This commit therefore breaks
> the read-side critical section into per-iteration critical sections, taking
> care to make sure that the radix_tree_for_each_slot() call itself remains
> in an RCU read-side critical section, as required.
> 
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 14b4642279f1..f78f144fb41f 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2055,6 +2055,8 @@ skip:
>  next:
>  		if (iter.index == vmf->max_pgoff)
>  			break;
> +		rcu_read_unlock();
> +		rcu_read_lock();
>  	}
>  	rcu_read_unlock();
>  }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-11-13 23:07                   ` Paul E. McKenney
@ 2014-11-13 23:10                     ` Sasha Levin
  0 siblings, 0 replies; 3+ messages in thread
From: Sasha Levin @ 2014-11-13 23:10 UTC (permalink / raw)
  To: paulmck; +Cc: Dave Jones, Linux Kernel, htejun, linux-mm

On 11/13/2014 06:07 PM, Paul E. McKenney wrote:
> On Mon, Oct 27, 2014 at 04:44:25PM -0700, Paul E. McKenney wrote:
>> > On Mon, Oct 27, 2014 at 02:13:29PM -0700, Paul E. McKenney wrote:
>>> > > On Fri, Oct 24, 2014 at 12:39:15PM -0400, Sasha Levin wrote:
>>>> > > > On 10/24/2014 12:13 PM, Paul E. McKenney wrote:
>>>>> > > > > On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
>>>>>>> > > > >> > On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
>>>>>>>>> > > > >>> > > On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
>>>>>>>>>>>>> > > > >>>>> > >> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> On 10/13/2014 01:35 PM, Dave Jones wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> oday in "rcu stall while fuzzing" news:
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>>
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >>
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> I've complained about RCU stalls couple days ago (in a different context)
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> on -next. I guess whatever causing them made it into Linus's tree?
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >>
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> https://lkml.org/lkml/2014/10/11/64
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > 
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > And on that one, I must confess that I don't see where the RCU read-side
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > critical section might be.
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > 
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > Can you reproduce this with CONFIG_PROVE_RCU=y?
>>>>>>>>>>>>> > > > >>>>> > >> > 
>>>>>>>>>>>>> > > > >>>>> > >> > Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
>>>>>>>>>>>>> > > > >>>>> > >> > set and nothing else is showing up before/after that.
>>>>>>>>> > > > >>> > > Indeed it was directed to you.  ;-)
>>>>>>>>> > > > >>> > > 
>>>>>>>>> > > > >>> > > Does the following crude diagnostic patch turn up anything?
>>>>>>> > > > >> > 
>>>>>>> > > > >> > Nope, seeing stalls but not seeing that pr_err() you added.
>>>>> > > > > OK, color me confused.  Could you please send me the full dmesg or a
>>>>> > > > > pointer to it?
>>>> > > > 
>>>> > > > Attached.
>>> > > 
>>> > > Thank you!  I would complain about the FAULT_INJECTION messages, but
>>> > > they don't appear to be happening all that frequently.
>>> > > 
>>> > > The stack dumps do look different here.  I suspect that this is a real
>>> > > issue in the VM code.
>> > 
>> > And to that end...  The filemap_map_pages() function does have loop over
>> > a list of pages.  I wonder if the rcu_read_lock() should be moved into
>> > the radix_tree_for_each_slot() loop.  CCing linux-mm for their thoughts,
>> > though it looks to me like the current radix_tree_for_each_slot() wants
>> > to be under RCU protection.  But I am not seeing anything that requires
>> > all iterations of the loop to be under the same RCU read-side critical
>> > section.  Maybe something like the following patch?
> Just following up, did the patch below help?

I'm not seeing any more stalls with filemap in them, but I don see different
traces.


Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2014-11-13 23:10 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20141013173504.GA27955@redhat.com>
     [not found] ` <543DDD5E.9080602@oracle.com>
     [not found]   ` <20141023183917.GX4977@linux.vnet.ibm.com>
     [not found]     ` <54494F2F.6020005@oracle.com>
     [not found]       ` <20141023195808.GB4977@linux.vnet.ibm.com>
     [not found]         ` <544A45F8.2030207@oracle.com>
     [not found]           ` <20141024161337.GQ4977@linux.vnet.ibm.com>
     [not found]             ` <544A80B3.9070800@oracle.com>
     [not found]               ` <20141027211329.GJ5718@linux.vnet.ibm.com>
2014-10-27 23:44                 ` rcu_preempt detected stalls Paul E. McKenney
2014-11-13 23:07                   ` Paul E. McKenney
2014-11-13 23:10                     ` Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).