public inbox for linux-kernel@vger.kernel.org
From: Guillaume Morin <guillaume@morinfr.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: guillaume@morinfr.org, linux-kernel@vger.kernel.org
Subject: Re: call_rcu data race patch
Date: Sat, 18 Sep 2021 02:39:35 +0200
Message-ID: <20210918003933.GA25868@bender.morinfr.org>
In-Reply-To: <20210917220700.GV4156@paulmck-ThinkPad-P17-Gen-1>

On 17 Sep 15:07, Paul E. McKenney wrote:
> > I have a few kdumps from 5.4 and 5.10 kernels (that's how I was able to
> > observe that the gp thread was sleeping for a long time) and that
> > rcu_state.gp_flags & 1 == 1.
> > 
> > But this warning has happened a couple of dozen times on multiple
> > machines in the __fput path (different kind of HW as well). Removing
> > nohz_full from the command line makes the problem disappear.
> > 
> > Most machines have had fairly long uptime (30+ days) before showing the
> > warning, though it has happened on a couple occasions only after a few
> > hours.
> > 
> > That's pretty much all I have been able to gather so far, unfortunately.
> 
> What are these systems doing?  Running mostly in nohz_full usermode?
> Mostly idle?  Something else?

Running mostly in nohz_full usermode (non-preempt), mostly busy, but
it varies. I don't think I've seen this warning on an idle machine,
though.
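
For reference, the "rcu_state.gp_flags & 1" observation quoted above can
be decoded with a small helper. This is only a sketch: it assumes the
flag values I read in kernel/rcu/tree.h (RCU_GP_FLAG_INIT = 0x1,
RCU_GP_FLAG_FQS = 0x2), which should be checked against the exact
kernel version in the dump. The function name decode_gp_flags is made
up for illustration.

```python
# Assumed bit values, taken from kernel/rcu/tree.h; verify them against
# the kernel that produced the kdump before relying on the output.
RCU_GP_FLAGS = {
    0x1: "RCU_GP_FLAG_INIT",  # a grace-period initialization was requested
    0x2: "RCU_GP_FLAG_FQS",   # quiescent-state forcing was requested
}


def decode_gp_flags(value):
    """Return the names of the known bits set in value.

    Any bits not listed in RCU_GP_FLAGS are reported together as one
    hexadecimal string so that nothing is silently dropped.
    """
    names = [name for bit, name in RCU_GP_FLAGS.items() if value & bit]
    known_mask = sum(RCU_GP_FLAGS)  # OR of all known single-bit keys
    unknown = value & ~known_mask
    if unknown:
        names.append(hex(unknown))
    return names


if __name__ == "__main__":
    # Value observed in the dumps: bit 0 set.
    print(decode_gp_flags(0x1))
```

With bit 0 set, the helper reports RCU_GP_FLAG_INIT, i.e. a grace-period
start was requested even though the gp kthread was asleep.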

> If it happens again, could you please also capture the state of the
> various rcuo kthreads?  Of these, the rcuog kthreads start grace
> periods and the rcuoc kthreads invoke callbacks.

You mean the task state? Or something else I can dig up from a kdump?

This one was taken about 32 minutes 24 seconds after the warning happened.
  
crash> ps -m | grep rcu
[0 00:00:26.697] [IN]  PID: 89     TASK: ffff93c940b60000  CPU: 0   COMMAND: "rcuog/12"
[0 00:00:30.443] [IN]  PID: 114    TASK: ffff93c940c623c0  CPU: 0   COMMAND: "rcuog/16"
[0 00:00:30.483] [IN]  PID: 20     TASK: ffff93c940920000  CPU: 0   COMMAND: "rcuog/1"
[0 00:00:30.490] [IN]  PID: 64     TASK: ffff93c940a9c780  CPU: 0   COMMAND: "rcuog/8"
[0 00:00:31.373] [IN]  PID: 39     TASK: ffff93c9409aa3c0  CPU: 0   COMMAND: "rcuog/4"
[0 00:32:24.007] [IN]  PID: 58     TASK: ffff93c940a6c780  CPU: 0   COMMAND: "rcuos/7"
[0 00:32:24.007] [ID]  PID: 12     TASK: ffff93c940854780  CPU: 0   COMMAND: "rcu_sched"
[0 00:32:24.080] [IN]  PID: 27     TASK: ffff93c94094a3c0  CPU: 0   COMMAND: "rcuos/2"
[0 00:32:24.090] [IN]  PID: 83     TASK: ffff93c940b38000  CPU: 0   COMMAND: "rcuos/11"
[0 00:32:24.200] [IN]  PID: 115    TASK: ffff93c940c64780  CPU: 0   COMMAND: "rcuos/16"
[0 00:32:24.250] [IN]  PID: 40     TASK: ffff93c9409ac780  CPU: 0   COMMAND: "rcuos/4"
[0 00:32:24.973] [IN]  PID: 65     TASK: ffff93c940ab0000  CPU: 0   COMMAND: "rcuos/8"
[0 00:32:24.973] [IN]  PID: 46     TASK: ffff93c9409d4780  CPU: 0   COMMAND: "rcuos/5"
[0 00:32:28.197] [IN]  PID: 77     TASK: ffff93c940b08000  CPU: 0   COMMAND: "rcuos/10"
[0 00:39:04.800] [IN]  PID: 52     TASK: ffff93c940a44780  CPU: 0   COMMAND: "rcuos/6"
[0 00:39:04.850] [IN]  PID: 33     TASK: ffff93c94097a3c0  CPU: 0   COMMAND: "rcuos/3"
[0 02:36:51.923] [IN]  PID: 102    TASK: ffff93c940bfa3c0  CPU: 0   COMMAND: "rcuos/14"
[0 04:21:46.806] [IN]  PID: 121    TASK: ffff93c940c8c780  CPU: 0   COMMAND: "rcuos/17"
[0 04:21:46.806] [IN]  PID: 108    TASK: ffff93c940c323c0  CPU: 0   COMMAND: "rcuos/15"
[0 04:25:49.033] [IN]  PID: 21     TASK: ffff93c9409223c0  CPU: 0   COMMAND: "rcuos/1"
[0 04:25:49.033] [IN]  PID: 96     TASK: ffff93c940bd23c0  CPU: 0   COMMAND: "rcuos/13"
[0 05:12:14.289] [IN]  PID: 71     TASK: ffff93c940ad8000  CPU: 0   COMMAND: "rcuos/9"
[0 05:12:17.849] [IN]  PID: 90     TASK: ffff93c940b623c0  CPU: 0   COMMAND: "rcuos/12"
[0 05:18:39.813] [IN]  PID: 10     TASK: ffff93c940850000  CPU: 0   COMMAND: "rcu_tasks_trace"
[0 05:18:39.813] [IN]  PID: 9      TASK: ffff93c940844780  CPU: 0   COMMAND: "rcu_tasks_rude_"
[0 05:18:39.813] [ID]  PID: 4      TASK: ffff93c940828000  CPU: 0   COMMAND: "rcu_par_gp"
[0 05:18:39.813] [ID]  PID: 3      TASK: ffff93c940804780  CPU: 0   COMMAND: "rcu_gp"
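
To make the listing above easier to scan, here is a rough sketch of a
script that parses "ps -m" lines and reports which RCU kthreads have not
run for longer than a given threshold. It assumes the bracketed
timestamp is "days hh:mm:ss.msec" since the task last ran, as in the
output above. The names parse_ps_m and not_run_since are hypothetical,
and the sample is just three lines copied from the listing.

```python
import re
from datetime import timedelta

# Matches one "ps -m" line as shown in the listing above.
PS_LINE = re.compile(
    r'\[\s*(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\]\s+'   # [days hh:mm:ss.msec]
    r'\[(\w+)\]\s+'                                  # task state, e.g. IN or ID
    r'PID:\s*(\d+)\s+TASK:\s*([0-9a-f]+)\s+'
    r'CPU:\s*(\d+)\s+COMMAND:\s*"([^"]*)"'
)


def parse_ps_m(text):
    """Parse "ps -m" output into a list of dicts, skipping other lines."""
    rows = []
    for line in text.splitlines():
        m = PS_LINE.search(line)
        if not m:
            continue
        days, hh, mm, ss, ms = (int(m.group(i)) for i in range(1, 6))
        rows.append({
            "since_last_run": timedelta(days=days, hours=hh, minutes=mm,
                                        seconds=ss, milliseconds=ms),
            "state": m.group(6),
            "pid": int(m.group(7)),
            "comm": m.group(10),
        })
    return rows


def not_run_since(rows, threshold):
    """Return the sorted names of tasks that last ran at least threshold ago."""
    return sorted(r["comm"] for r in rows if r["since_last_run"] >= threshold)


SAMPLE = '''
[0 00:00:26.697] [IN]  PID: 89     TASK: ffff93c940b60000  CPU: 0   COMMAND: "rcuog/12"
[0 00:32:24.007] [ID]  PID: 12     TASK: ffff93c940854780  CPU: 0   COMMAND: "rcu_sched"
[0 05:12:14.289] [IN]  PID: 71     TASK: ffff93c940ad8000  CPU: 0   COMMAND: "rcuos/9"
'''

if __name__ == "__main__":
    for name in not_run_since(parse_ps_m(SAMPLE), timedelta(minutes=30)):
        print(name)
```

Feeding it the full listing with a 30-minute threshold should separate
the rcuog kthreads, which ran within the last half minute or so, from
rcu_sched and the rcuos kthreads, which had not run since around the
time of the warning or earlier.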

> OK, please see below.  This is a complete shot in the dark, but could
> potentially prevent the problem.  Or make it worse, which would at the
> very least speed up debugging.  It might need a bit of adjustment to
> apply to the -stable kernels, but at first glance should apply cleanly.

I can adjust it, that's not a problem. But to be clear: would you rather
I apply this instead of the other patch I mentioned
(https://www.spinics.net/lists/rcu/msg05731.html), or are you okay with
me trying with both applied?
 
> Oh, and FYI I am having to manually paste your email address into the To:
> line in order to get this to go back to you.  Please check your email
> configuration.

Hmm, I've adjusted the Reply-To. Let me know if it's better.

Guillaume.

-- 
Guillaume Morin <guillaume@morinfr.org>
