From: Sasha Levin <sasha.levin@oracle.com>
To: paulmck@linux.vnet.ibm.com
Cc: Dave Jones <davej@redhat.com>,
Linux Kernel <linux-kernel@vger.kernel.org>,
htejun@gmail.com, linux-mm@kvack.org
Subject: Re: rcu_preempt detected stalls.
Date: Thu, 13 Nov 2014 18:10:52 -0500
Message-ID: <54653A7C.80803@oracle.com>
In-Reply-To: <20141113230751.GB26051@linux.vnet.ibm.com>
On 11/13/2014 06:07 PM, Paul E. McKenney wrote:
> On Mon, Oct 27, 2014 at 04:44:25PM -0700, Paul E. McKenney wrote:
>> On Mon, Oct 27, 2014 at 02:13:29PM -0700, Paul E. McKenney wrote:
>>> On Fri, Oct 24, 2014 at 12:39:15PM -0400, Sasha Levin wrote:
>>>> On 10/24/2014 12:13 PM, Paul E. McKenney wrote:
>>>>> On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
>>>>>> On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
>>>>>>> On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
>>>>>>>> On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
>>>>>>>>> On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
>>>>>>>>>> On 10/13/2014 01:35 PM, Dave Jones wrote:
>>>>>>>>>>> Today in "rcu stall while fuzzing" news:
>>>>>>>>>>>
>>>>>>>>>>> INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>>>>>>>>> Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>>>>>>>>>> Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>>>>>>>>>> (detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
>>>>>>>>>>
>>>>>>>>>> I complained about RCU stalls a couple of days ago (in a different context)
>>>>>>>>>> on -next. I guess whatever is causing them made it into Linus's tree?
>>>>>>>>>>
>>>>>>>>>> https://lkml.org/lkml/2014/10/11/64
>>>>>>>>>
>>>>>>>>> And on that one, I must confess that I don't see where the RCU read-side
>>>>>>>>> critical section might be.
>>>>>>>>>
>>>>>>>>> Hmmm... Maybe someone forgot to put an rcu_read_unlock() somewhere.
>>>>>>>>> Can you reproduce this with CONFIG_PROVE_RCU=y?
>>>>>>>>
>>>>>>>> Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
>>>>>>>> set and nothing else is showing up before/after that.
>>>>>>> Indeed it was directed to you. ;-)
>>>>>>>
>>>>>>> Does the following crude diagnostic patch turn up anything?
>>>>>>
>>>>>> Nope, seeing stalls but not seeing that pr_err() you added.
>>>>> OK, color me confused. Could you please send me the full dmesg or a
>>>>> pointer to it?
>>>>
>>>> Attached.
>>>
>>> Thank you! I would complain about the FAULT_INJECTION messages, but
>>> they don't appear to be happening all that frequently.
>>>
>>> The stack dumps do look different here. I suspect that this is a real
>>> issue in the VM code.
>>
>> And to that end... The filemap_map_pages() function does have a loop over
>> a list of pages. I wonder if the rcu_read_lock() should be moved into
>> the radix_tree_for_each_slot() loop. CCing linux-mm for their thoughts,
>> though it looks to me like the current radix_tree_for_each_slot() wants
>> to be under RCU protection. But I am not seeing anything that requires
>> all iterations of the loop to be under the same RCU read-side critical
>> section. Maybe something like the following patch?
> Just following up, did the patch below help?
I'm not seeing any more stalls with filemap in them, but I do see different
traces.
Thanks,
Sasha