* [BUG] TREE04 hang on 6.5.y stable: Writer stall state RTWS_COND_SYNC_FULL
@ 2023-09-07 13:17 Joel Fernandes
  2023-09-07 14:34 ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Joel Fernandes @ 2023-09-07 13:17 UTC
  To: rcu; +Cc: Paul E. McKenney

Hi,
Just started seeing this on 6.5 stable. It is new and first occurrence:

TREE04 no success message, 234 successful version messages
WARNING: TREE04 GP HANG at 14 torture stat 2
[   38.371120] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g1253 f0x0 ->state 0x2 cpu 6
[   38.388342] Call Trace:
[   53.741039] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g3637 f0x2 ->state 0x2 cpu 6
[   69.093462] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g5501 f0x0 ->state 0x2 cpu 6
[   84.450028] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g10505 f0x0 ->state 0x2 cpu 6
[   99.815871] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g13781 f0x0 ->state 0x2 cpu 6
[  115.166476] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g16544 f0x0 ->state 0x2 cpu 6
[  130.550116] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g18941 f0x0 ->state 0x2 cpu 6
[..]
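
For anyone skimming a console log for the same pattern, here is a rough
sketch that pulls out the number after "g" on each "Writer stall state"
line. I am assuming that this field is the grace-period sequence value
(please check against the rcutorture source); the function name and the
sample text are purely illustrative.

```bash
#!/usr/bin/env bash
# Sketch only: print the g<N> field of each "Writer stall state" line on stdin.
# Works whether or not the mail client wrapped the line after the g<N> field.
extract_gp_seq() {
    sed -nE 's/.*Writer stall state [A-Z_]+\([0-9]+\) g([0-9]+).*/\1/p'
}

# Two entries copied from the report above, including the mail-wrapped tails.
sample='[   38.371120] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g1253
f0x0 ->state 0x2 cpu 6
[   53.741039] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g3637
f0x2 ->state 0x2 cpu 6'

gp_values="$(printf '%s\n' "$sample" | extract_gp_seq)"
printf '%s\n' "$gp_values"
```

If the assumption about the field holds, rising values would suggest that
grace periods keep advancing while the writer remains in the same state.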

All logs:
http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.5.y/17/artifact/tools/testing/selftests/rcutorture/res/2023.09.07-04.10.25/TREE04/

thanks,

 - Joel

* Re: [BUG] TREE04 hang on 6.5.y stable: Writer stall state RTWS_COND_SYNC_FULL
  2023-09-07 13:17 [BUG] TREE04 hang on 6.5.y stable: Writer stall state RTWS_COND_SYNC_FULL Joel Fernandes
@ 2023-09-07 14:34 ` Paul E. McKenney
  2023-09-07 20:03   ` Joel Fernandes
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Paul E. McKenney @ 2023-09-07 14:34 UTC
  To: Joel Fernandes; +Cc: rcu

On Thu, Sep 07, 2023 at 09:17:15AM -0400, Joel Fernandes wrote:
> Hi,
> Just started seeing this on 6.5 stable. It is new and first occurrence:
> 
> TREE04 no success message, 234 successful version messages
> WARNING: TREE04 GP HANG at 14 torture stat 2
> [   38.371120] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g1253
> f0x0 ->state 0x2 cpu 6
> [   38.388342] Call Trace:
> [   53.741039] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g3637
> f0x2 ->state 0x2 cpu 6
> [   69.093462] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g5501
> f0x0 ->state 0x2 cpu 6
> [   84.450028] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g10505
> f0x0 ->state 0x2 cpu 6
> [   99.815871] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g13781
> f0x0 ->state 0x2 cpu 6
> [  115.166476] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g16544
> f0x0 ->state 0x2 cpu 6
> [  130.550116] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g18941
> f0x0 ->state 0x2 cpu 6
> [..]
> 
> All logs:
> http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.5.y/17/artifact/tools/testing/selftests/rcutorture/res/2023.09.07-04.10.25/TREE04/

Huh.  Does this happen for you in v6.5 mainline?

Both the code under test (full-state polled grace periods) and the
rcutorture test code are fairly new, so there is some reason for general
suspicion.  ;-)

							Thanx, Paul

* Re: [BUG] TREE04 hang on 6.5.y stable: Writer stall state RTWS_COND_SYNC_FULL
  2023-09-07 14:34 ` Paul E. McKenney
@ 2023-09-07 20:03   ` Joel Fernandes
  2023-09-08  0:51     ` Joel Fernandes
  2023-09-08 10:28   ` Zhouyi Zhou
  2023-09-16  1:09   ` Joel Fernandes
  2 siblings, 1 reply; 11+ messages in thread
From: Joel Fernandes @ 2023-09-07 20:03 UTC
  To: paulmck; +Cc: Joel Fernandes, rcu



> On Sep 7, 2023, at 12:23 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Thu, Sep 07, 2023 at 09:17:15AM -0400, Joel Fernandes wrote:
>> Hi,
>> Just started seeing this on 6.5 stable. It is new and first occurrence:
>> 
>> TREE04 no success message, 234 successful version messages
>> WARNING: TREE04 GP HANG at 14 torture stat 2
>> [   38.371120] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g1253
>> f0x0 ->state 0x2 cpu 6
>> [   38.388342] Call Trace:
>> [   53.741039] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g3637
>> f0x2 ->state 0x2 cpu 6
>> [   69.093462] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g5501
>> f0x0 ->state 0x2 cpu 6
>> [   84.450028] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g10505
>> f0x0 ->state 0x2 cpu 6
>> [   99.815871] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g13781
>> f0x0 ->state 0x2 cpu 6
>> [  115.166476] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g16544
>> f0x0 ->state 0x2 cpu 6
>> [  130.550116] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g18941
>> f0x0 ->state 0x2 cpu 6
>> [..]
>> 
>> All logs:
>> http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.5.y/17/artifact/tools/testing/selftests/rcutorture/res/2023.09.07-04.10.25/TREE04/
> 
> Huh.  Does this happen for you in v6.5 mainline?
> 
> Both the code under test (full-state polled grace periods) and the
> rcutorture test code are fairly new, so there is some reason for general
> suspicion.  ;-)

Ah. I never saw it on either 6.5 mainline or stable till today. Even on stable
I only ever saw it this once. On mainline I have not seen it yet but I do test
stable much more since I have been on stable maintenance duty ;-).

I will keep an eye on it. This also happens quite early, per that timestamp. Thanks,

- Joel 




> 
>                            Thanx, Paul

* Re: [BUG] TREE04 hang on 6.5.y stable: Writer stall state RTWS_COND_SYNC_FULL
  2023-09-07 20:03   ` Joel Fernandes
@ 2023-09-08  0:51     ` Joel Fernandes
  2023-09-08  8:27       ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Joel Fernandes @ 2023-09-08  0:51 UTC
  To: paulmck; +Cc: Joel Fernandes, rcu

On Thu, Sep 7, 2023 at 4:03 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
>
>
> > On Sep 7, 2023, at 12:23 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Thu, Sep 07, 2023 at 09:17:15AM -0400, Joel Fernandes wrote:
> >> Hi,
> >> Just started seeing this on 6.5 stable. It is new and first occurrence:
> >>
> >> TREE04 no success message, 234 successful version messages
> > >> WARNING: TREE04 GP HANG at 14 torture stat 2
> >> [   38.371120] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g1253
> >> f0x0 ->state 0x2 cpu 6
> >> [   38.388342] Call Trace:
> >> [   53.741039] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g3637
> >> f0x2 ->state 0x2 cpu 6
> >> [   69.093462] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g5501
> >> f0x0 ->state 0x2 cpu 6
> >> [   84.450028] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g10505
> >> f0x0 ->state 0x2 cpu 6
> >> [   99.815871] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g13781
> >> f0x0 ->state 0x2 cpu 6
> >> [  115.166476] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g16544
> >> f0x0 ->state 0x2 cpu 6
> >> [  130.550116] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g18941
> >> f0x0 ->state 0x2 cpu 6
> >> [..]
> >>
> >> All logs:
> >> http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.5.y/17/artifact/tools/testing/selftests/rcutorture/res/2023.09.07-04.10.25/TREE04/
> >
> > Huh.  Does this happen for you in v6.5 mainline?
> >
> > Both the code under test (full-state polled grace periods) and the
> > rcutorture test code are fairly new, so there is some reason for general
> > suspicion.  ;-)
>
> Ah. I never saw it on either 6.5 mainline or stable till today. Even on stable
> I only ever saw it this once. On mainline I have not seen it yet but I do test
> stable much more since I have been on stable maintenance duty ;-).

I did a couple of long runs and I am not able to reproduce it anymore. :-/

thanks,

 - Joel

* Re: [BUG] TREE04 hang on 6.5.y stable: Writer stall state RTWS_COND_SYNC_FULL
  2023-09-08  0:51     ` Joel Fernandes
@ 2023-09-08  8:27       ` Paul E. McKenney
  2023-09-08 11:41         ` Frederic Weisbecker
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2023-09-08  8:27 UTC
  To: Joel Fernandes; +Cc: Joel Fernandes, rcu

On Thu, Sep 07, 2023 at 08:51:43PM -0400, Joel Fernandes wrote:
> On Thu, Sep 7, 2023 at 4:03 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> >
> >
> > > On Sep 7, 2023, at 12:23 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Thu, Sep 07, 2023 at 09:17:15AM -0400, Joel Fernandes wrote:
> > >> Hi,
> > >> Just started seeing this on 6.5 stable. It is new and first occurrence:
> > >>
> > >> TREE04 no success message, 234 successful version messages
> > > >> WARNING: TREE04 GP HANG at 14 torture stat 2
> > >> [   38.371120] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g1253
> > >> f0x0 ->state 0x2 cpu 6
> > >> [   38.388342] Call Trace:
> > >> [   53.741039] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g3637
> > >> f0x2 ->state 0x2 cpu 6
> > >> [   69.093462] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g5501
> > >> f0x0 ->state 0x2 cpu 6
> > >> [   84.450028] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g10505
> > >> f0x0 ->state 0x2 cpu 6
> > >> [   99.815871] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g13781
> > >> f0x0 ->state 0x2 cpu 6
> > >> [  115.166476] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g16544
> > >> f0x0 ->state 0x2 cpu 6
> > >> [  130.550116] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g18941
> > >> f0x0 ->state 0x2 cpu 6
> > >> [..]
> > >>
> > >> All logs:
> > >> http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.5.y/17/artifact/tools/testing/selftests/rcutorture/res/2023.09.07-04.10.25/TREE04/
> > >
> > > Huh.  Does this happen for you in v6.5 mainline?
> > >
> > > Both the code under test (full-state polled grace periods) and the
> > > rcutorture test code are fairly new, so there is some reason for general
> > > suspicion.  ;-)
> >
> > Ah. I never saw it on either 6.5 mainline or stable till today. Even on stable
> > I only ever saw it this once. On mainline I have not seen it yet but I do test
> > stable much more since I have been on stable maintenance duty ;-).
> 
> I did a couple of long runs and I am not able to reproduce it anymore. :-/

I know that feeling!

							Thanx, Paul

* Re: [BUG] TREE04 hang on 6.5.y stable: Writer stall state RTWS_COND_SYNC_FULL
  2023-09-07 14:34 ` Paul E. McKenney
  2023-09-07 20:03   ` Joel Fernandes
@ 2023-09-08 10:28   ` Zhouyi Zhou
  2023-09-08 23:33     ` Zhouyi Zhou
  2023-09-16  1:09   ` Joel Fernandes
  2 siblings, 1 reply; 11+ messages in thread
From: Zhouyi Zhou @ 2023-09-08 10:28 UTC
  To: paulmck; +Cc: Joel Fernandes, rcu

On Fri, Sep 8, 2023 at 1:59 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Thu, Sep 07, 2023 at 09:17:15AM -0400, Joel Fernandes wrote:
> > Hi,
> > Just started seeing this on 6.5 stable. It is new and first occurrence:
> >
> > TREE04 no success message, 234 successful version messages
> > > WARNING: TREE04 GP HANG at 14 torture stat 2
> > [   38.371120] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g1253
> > f0x0 ->state 0x2 cpu 6
> > [   38.388342] Call Trace:
> > [   53.741039] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g3637
> > f0x2 ->state 0x2 cpu 6
> > [   69.093462] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g5501
> > f0x0 ->state 0x2 cpu 6
> > [   84.450028] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g10505
> > f0x0 ->state 0x2 cpu 6
> > [   99.815871] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g13781
> > f0x0 ->state 0x2 cpu 6
> > [  115.166476] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g16544
> > f0x0 ->state 0x2 cpu 6
> > [  130.550116] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g18941
> > f0x0 ->state 0x2 cpu 6
> > [..]
> >
> > All logs:
> > http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.5.y/17/artifact/tools/testing/selftests/rcutorture/res/2023.09.07-04.10.25/TREE04/
>
> Huh.  Does this happen for you in v6.5 mainline?
Hi, I have started torture.sh in a KVM environment (with nested KVM
enabled) on my Intel i7-1165G7 laptop. The results can be examined while
the run is in progress:
http://154.220.3.120:8080/test/linux-stable/tools/testing/selftests/rcutorture/res/2023.09.08-10.23.47-torture/

I hope this is of some help.
Thanks
Zhouyi
>
> Both the code under test (full-state polled grace periods) and the
> rcutorture test code are fairly new, so there is some reason for general
> suspicion.  ;-)
>
>                                                         Thanx, Paul

* Re: [BUG] TREE04 hang on 6.5.y stable: Writer stall state RTWS_COND_SYNC_FULL
  2023-09-08  8:27       ` Paul E. McKenney
@ 2023-09-08 11:41         ` Frederic Weisbecker
  2023-09-08 13:32           ` Joel Fernandes
  0 siblings, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2023-09-08 11:41 UTC
  To: Paul E. McKenney; +Cc: Joel Fernandes, Joel Fernandes, rcu

On Fri, Sep 08, 2023 at 01:27:06AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 07, 2023 at 08:51:43PM -0400, Joel Fernandes wrote:
> > On Thu, Sep 7, 2023 at 4:03 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > >
> > >
> > > > On Sep 7, 2023, at 12:23 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Thu, Sep 07, 2023 at 09:17:15AM -0400, Joel Fernandes wrote:
> > > >> Hi,
> > > >> Just started seeing this on 6.5 stable. It is new and first occurrence:
> > > >>
> > > >> TREE04 no success message, 234 successful version messages
> > > > >> WARNING: TREE04 GP HANG at 14 torture stat 2
> > > >> [   38.371120] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g1253
> > > >> f0x0 ->state 0x2 cpu 6
> > > >> [   38.388342] Call Trace:
> > > >> [   53.741039] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g3637
> > > >> f0x2 ->state 0x2 cpu 6
> > > >> [   69.093462] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g5501
> > > >> f0x0 ->state 0x2 cpu 6
> > > >> [   84.450028] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g10505
> > > >> f0x0 ->state 0x2 cpu 6
> > > >> [   99.815871] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g13781
> > > >> f0x0 ->state 0x2 cpu 6
> > > >> [  115.166476] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g16544
> > > >> f0x0 ->state 0x2 cpu 6
> > > >> [  130.550116] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g18941
> > > >> f0x0 ->state 0x2 cpu 6
> > > >> [..]
> > > >>
> > > >> All logs:
> > > >> http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.5.y/17/artifact/tools/testing/selftests/rcutorture/res/2023.09.07-04.10.25/TREE04/
> > > >
> > > > Huh.  Does this happen for you in v6.5 mainline?
> > > >
> > > > Both the code under test (full-state polled grace periods) and the
> > > > rcutorture test code are fairly new, so there is some reason for general
> > > > suspicion.  ;-)
> > >
> > > Ah. I never saw it on either 6.5 mainline or stable till today. Even on stable
> > > I only ever saw it this once. On mainline I have not seen it yet but I do test
> > > stable much more since I have been on stable maintenance duty ;-).
> > 
> > I did a couple of long runs and I am not able to reproduce it anymore. :-/
> 
> I know that feeling!

Same here, this is after all the reason why we keep the tick dependency within
the hotplug process without really knowing why :o)

* Re: [BUG] TREE04 hang on 6.5.y stable: Writer stall state RTWS_COND_SYNC_FULL
  2023-09-08 11:41         ` Frederic Weisbecker
@ 2023-09-08 13:32           ` Joel Fernandes
  0 siblings, 0 replies; 11+ messages in thread
From: Joel Fernandes @ 2023-09-08 13:32 UTC
  To: Frederic Weisbecker; +Cc: Paul E. McKenney, Joel Fernandes, rcu

On Fri, Sep 8, 2023 at 7:41 AM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Fri, Sep 08, 2023 at 01:27:06AM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 07, 2023 at 08:51:43PM -0400, Joel Fernandes wrote:
> > > On Thu, Sep 7, 2023 at 4:03 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > >
> > > >
> > > >
> > > > > On Sep 7, 2023, at 12:23 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > > On Thu, Sep 07, 2023 at 09:17:15AM -0400, Joel Fernandes wrote:
> > > > >> Hi,
> > > > >> Just started seeing this on 6.5 stable. It is new and first occurrence:
> > > > >>
> > > > >> TREE04 no success message, 234 successful version messages
> > > > > >> WARNING: TREE04 GP HANG at 14 torture stat 2
> > > > >> [   38.371120] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g1253
> > > > >> f0x0 ->state 0x2 cpu 6
> > > > >> [   38.388342] Call Trace:
> > > > >> [   53.741039] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g3637
> > > > >> f0x2 ->state 0x2 cpu 6
> > > > >> [   69.093462] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g5501
> > > > >> f0x0 ->state 0x2 cpu 6
> > > > >> [   84.450028] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g10505
> > > > >> f0x0 ->state 0x2 cpu 6
> > > > >> [   99.815871] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g13781
> > > > >> f0x0 ->state 0x2 cpu 6
> > > > >> [  115.166476] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g16544
> > > > >> f0x0 ->state 0x2 cpu 6
> > > > >> [  130.550116] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g18941
> > > > >> f0x0 ->state 0x2 cpu 6
> > > > >> [..]
> > > > >>
> > > > >> All logs:
> > > > >> http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.5.y/17/artifact/tools/testing/selftests/rcutorture/res/2023.09.07-04.10.25/TREE04/
> > > > >
> > > > > Huh.  Does this happen for you in v6.5 mainline?
> > > > >
> > > > > Both the code under test (full-state polled grace periods) and the
> > > > > rcutorture test code are fairly new, so there is some reason for general
> > > > > suspicion.  ;-)
> > > >
> > > > Ah. I never saw it on either 6.5 mainline or stable till today. Even on stable
> > > > I only ever saw it this once. On mainline I have not seen it yet but I do test
> > > > stable much more since I have been on stable maintenance duty ;-).
> > >
> > > I did a couple of long runs and I am not able to reproduce it anymore. :-/
> >
> > I know that feeling!
>
> Same here, this is after all the reason why we keep the tick dependency within
> the hotplug process without really knowing why :o)

Heh. I have also been running into another intermittent one, the boost
failure, which happens once in 10-15 runs or so.

I was thinking of running the following configuration on a regular,
automated basis, so that the lucky run that catches an issue at least
provides a better clue. The catch is that it could change timing enough
to hide bugs. I could also make it submit logs to the list automatically
on such occurrences, but one step at a time and all that. I do need to
add (hopefully less noisy) tick/timer-related trace events.

# Define the bootargs array
bootargs=(
    "ftrace_dump_on_oops"
    "panic_on_warn=1"
    "sysctl.kernel.panic_on_rcu_stall=1"
    "sysctl.kernel.max_rcu_stall_to_panic=1"
    "trace_buf_size=10K"
    "traceoff_on_warning=1"
    "panic_print=0x1f"      # To dump held locks, mem and other info.
)
# Define the trace events array passed to bootargs.
trace_events=(
    "sched:sched_switch"
    "sched:sched_waking"
    "rcu:rcu_callback"
    "rcu:rcu_fqs"
    "rcu:rcu_quiescent_state_report"
    "rcu:rcu_grace_period"
)
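
As a rough sketch of how the two arrays might be combined, the trace
events can be joined with commas into a single trace_event= boot parameter
and the whole list flattened into one string. The helper name
join_by_comma and the variable bootargs_str are illustrative only, and the
kvm.sh line at the end is just one assumed way of passing the result.

```bash
#!/usr/bin/env bash
# Sketch only: build one kernel command-line string from the arrays above.
bootargs=(
    "ftrace_dump_on_oops"
    "panic_on_warn=1"
    "sysctl.kernel.panic_on_rcu_stall=1"
    "sysctl.kernel.max_rcu_stall_to_panic=1"
    "trace_buf_size=10K"
    "traceoff_on_warning=1"
    "panic_print=0x1f"      # To dump held locks, mem and other info.
)
trace_events=(
    "sched:sched_switch"
    "sched:sched_waking"
    "rcu:rcu_callback"
    "rcu:rcu_fqs"
    "rcu:rcu_quiescent_state_report"
    "rcu:rcu_grace_period"
)

# Hypothetical helper: join all arguments with commas.
join_by_comma() {
    local IFS=,
    printf '%s' "$*"
}

# trace_event= takes a comma-separated list of events to enable at boot.
bootargs+=("trace_event=$(join_by_comma "${trace_events[@]}")")

# Flatten to a single space-separated string (default IFS).
bootargs_str="${bootargs[*]}"
printf '%s\n' "$bootargs_str"

# Assumed usage, not run here:
# tools/testing/selftests/rcutorture/bin/kvm.sh --configs TREE04 --bootargs "$bootargs_str"
```

Keeping the events in their own array makes it easy to add or drop
tick/timer events later without touching the other arguments.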

Thanks.

* Re: [BUG] TREE04 hang on 6.5.y stable: Writer stall state RTWS_COND_SYNC_FULL
  2023-09-08 10:28   ` Zhouyi Zhou
@ 2023-09-08 23:33     ` Zhouyi Zhou
  2023-09-09  0:10       ` Zhouyi Zhou
  0 siblings, 1 reply; 11+ messages in thread
From: Zhouyi Zhou @ 2023-09-08 23:33 UTC
  To: paulmck; +Cc: Joel Fernandes, rcu

On Fri, Sep 8, 2023 at 6:28 PM Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
>
> On Fri, Sep 8, 2023 at 1:59 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Thu, Sep 07, 2023 at 09:17:15AM -0400, Joel Fernandes wrote:
> > > Hi,
> > > Just started seeing this on 6.5 stable. It is new and first occurrence:
> > >
> > > TREE04 no success message, 234 successful version messages
> > > > WARNING: TREE04 GP HANG at 14 torture stat 2
> > > [   38.371120] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g1253
> > > f0x0 ->state 0x2 cpu 6
> > > [   38.388342] Call Trace:
> > > [   53.741039] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g3637
> > > f0x2 ->state 0x2 cpu 6
> > > [   69.093462] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g5501
> > > f0x0 ->state 0x2 cpu 6
> > > [   84.450028] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g10505
> > > f0x0 ->state 0x2 cpu 6
> > > [   99.815871] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g13781
> > > f0x0 ->state 0x2 cpu 6
> > > [  115.166476] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g16544
> > > f0x0 ->state 0x2 cpu 6
> > > [  130.550116] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g18941
> > > f0x0 ->state 0x2 cpu 6
> > > [..]
> > >
> > > All logs:
> > > http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.5.y/17/artifact/tools/testing/selftests/rcutorture/res/2023.09.07-04.10.25/TREE04/
> >
> > Huh.  Does this happen for you in v6.5 mainline?
> Hi, I am started torture.sh in a kvm environment (with nested kvm
> enable) in my Intel  i7-1165G7 laptop, which can be examined at
> runtime:
> http://154.220.3.120:8080/test/linux-stable/tools/testing/selftests/rcutorture/res/2023.09.08-10.23.47-torture/
Again, I can't reproduce the bug in my environment; I will try a few more times.
The git head is 3766ec12cf89.

System stability is a deep subject, and there is a lot for me to
learn from the community.
Thanks
Zhouyi
>
> Hope I can be of some beneficial
> Thanks
> Zhouyi
> >
> > Both the code under test (full-state polled grace periods) and the
> > rcutorture test code are fairly new, so there is some reason for general
> > suspicion.  ;-)
> >
> >                                                         Thanx, Paul

* Re: [BUG] TREE04 hang on 6.5.y stable: Writer stall state RTWS_COND_SYNC_FULL
  2023-09-08 23:33     ` Zhouyi Zhou
@ 2023-09-09  0:10       ` Zhouyi Zhou
  0 siblings, 0 replies; 11+ messages in thread
From: Zhouyi Zhou @ 2023-09-09  0:10 UTC
  To: paulmck; +Cc: Joel Fernandes, rcu

On Sat, Sep 9, 2023 at 7:33 AM Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
>
> On Fri, Sep 8, 2023 at 6:28 PM Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> >
> > On Fri, Sep 8, 2023 at 1:59 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Thu, Sep 07, 2023 at 09:17:15AM -0400, Joel Fernandes wrote:
> > > > Hi,
> > > > Just started seeing this on 6.5 stable. It is new and first occurrence:
> > > >
> > > > TREE04 no success message, 234 successful version messages
> > > > > WARNING: TREE04 GP HANG at 14 torture stat 2
> > > > [   38.371120] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g1253
> > > > f0x0 ->state 0x2 cpu 6
> > > > [   38.388342] Call Trace:
> > > > [   53.741039] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g3637
> > > > f0x2 ->state 0x2 cpu 6
> > > > [   69.093462] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g5501
> > > > f0x0 ->state 0x2 cpu 6
> > > > [   84.450028] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g10505
> > > > f0x0 ->state 0x2 cpu 6
> > > > [   99.815871] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g13781
> > > > f0x0 ->state 0x2 cpu 6
> > > > [  115.166476] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g16544
> > > > f0x0 ->state 0x2 cpu 6
> > > > [  130.550116] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g18941
> > > > f0x0 ->state 0x2 cpu 6
> > > > [..]
> > > >
> > > > All logs:
> > > > http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.5.y/17/artifact/tools/testing/selftests/rcutorture/res/2023.09.07-04.10.25/TREE04/
> > >
> > > Huh.  Does this happen for you in v6.5 mainline?
> > Hi, I am started torture.sh in a kvm environment (with nested kvm
> > enable) in my Intel  i7-1165G7 laptop, which can be examined at
> > runtime:
> > http://154.220.3.120:8080/test/linux-stable/tools/testing/selftests/rcutorture/res/2023.09.08-10.23.47-torture/
> again I can't reproduce the bug in my environment, I will try it more times.
Besides my laptop, I also started the test on a PPC VM at the Open Source
Lab of Oregon State University:
http://140.211.169.189/stable/linux/tools/testing/selftests/rcutorture/res/2023.09.09-00.07.55-torture/
> the git head is  3766ec12cf89
>
> System stability is a profound knowledge, there is too much for me to
> learn from the community.
> Thanks
> Zhouyi
> >
> > Hope I can be of some beneficial
> > Thanks
> > Zhouyi
> > >
> > > Both the code under test (full-state polled grace periods) and the
> > > rcutorture test code are fairly new, so there is some reason for general
> > > suspicion.  ;-)
> > >
> > >                                                         Thanx, Paul

* Re: [BUG] TREE04 hang on 6.5.y stable: Writer stall state RTWS_COND_SYNC_FULL
  2023-09-07 14:34 ` Paul E. McKenney
  2023-09-07 20:03   ` Joel Fernandes
  2023-09-08 10:28   ` Zhouyi Zhou
@ 2023-09-16  1:09   ` Joel Fernandes
  2 siblings, 0 replies; 11+ messages in thread
From: Joel Fernandes @ 2023-09-16  1:09 UTC
  To: Paul E. McKenney; +Cc: rcu

On Thu, Sep 07, 2023 at 07:34:44AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 07, 2023 at 09:17:15AM -0400, Joel Fernandes wrote:
> > Hi,
> > Just started seeing this on 6.5 stable. It is new and first occurrence:
> > 
> > TREE04 no success message, 234 successful version messages
> > > WARNING: TREE04 GP HANG at 14 torture stat 2
> > [   38.371120] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g1253
> > f0x0 ->state 0x2 cpu 6
> > [   38.388342] Call Trace:
> > [   53.741039] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g3637
> > f0x2 ->state 0x2 cpu 6
> > [   69.093462] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g5501
> > f0x0 ->state 0x2 cpu 6
> > [   84.450028] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g10505
> > f0x0 ->state 0x2 cpu 6
> > [   99.815871] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g13781
> > f0x0 ->state 0x2 cpu 6
> > [  115.166476] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g16544
> > f0x0 ->state 0x2 cpu 6
> > [  130.550116] ??? Writer stall state RTWS_COND_SYNC_FULL(10) g18941
> > f0x0 ->state 0x2 cpu 6
> > [..]
> > 
> > All logs:
> > http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.5.y/17/artifact/tools/testing/selftests/rcutorture/res/2023.09.07-04.10.25/TREE04/
> 
> Huh.  Does this happen for you in v6.5 mainline?
> 
> Both the code under test (full-state polled grace periods) and the
> rcutorture test code are fairly new, so there is some reason for general
> suspicion.  ;-)

I happened to hit this again but this time on 6.1 stable and TREE05:

Here are some logs:
http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.1.y/139/artifact/tools/testing/selftests/rcutorture/res/2023.09.15-04.02.48/TREE05/

I am planning to take a closer look soon. Thanks,

 - Joel


Thread overview: 11+ messages
2023-09-07 13:17 [BUG] TREE04 hang on 6.5.y stable: Writer stall state RTWS_COND_SYNC_FULL Joel Fernandes
2023-09-07 14:34 ` Paul E. McKenney
2023-09-07 20:03   ` Joel Fernandes
2023-09-08  0:51     ` Joel Fernandes
2023-09-08  8:27       ` Paul E. McKenney
2023-09-08 11:41         ` Frederic Weisbecker
2023-09-08 13:32           ` Joel Fernandes
2023-09-08 10:28   ` Zhouyi Zhou
2023-09-08 23:33     ` Zhouyi Zhou
2023-09-09  0:10       ` Zhouyi Zhou
2023-09-16  1:09   ` Joel Fernandes