Live Patching
From: Petr Mladek <pmladek@suse.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: jpoimboe@kernel.org, jikos@kernel.org, mbenes@suse.cz,
	joe.lawrence@redhat.com, live-patching@vger.kernel.org
Subject: Find root of the stall: was: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
Date: Thu, 13 Feb 2025 12:19:46 +0100
Message-ID: <Z63VUsiaPsEjS9SR@pathway.suse.cz>
In-Reply-To: <20250211062437.46811-3-laoar.shao@gmail.com>

On Tue 2025-02-11 14:24:36, Yafang Shao wrote:
> I encountered a hard lockup when attempting to reproduce the panic issue
> occurred on our production servers [0]. The hard lockup is as follows,
> 
> [15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit
> [15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit
> [15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit
> [15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit
> [15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit
> [15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit
> [15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit
> [15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit
> [15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit
> [15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit
> [15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
> [15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
> [15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
> [15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit

This message does not exist in the vanilla kernel. It looks like an extra
debug message. And many extra messages might create stalls on their own.

AFAIK, you reproduced the problem even without these extra messages.
At least, I see the following at
https://lore.kernel.org/r/CALOAHbB8j6RrpJAyRkzPx2U6YhjWEipRspoQQ_7cvQ+M0zgdXg@mail.gmail.com

<paste>
[20329703.332453] livepatch: enabling patch 'livepatch_61_release6'
[20329703.340417] livepatch: 'livepatch_61_release6': starting patching transition
[20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is 10166 jiffies old.
[20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is 10219 jiffies old.
[20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is 10199 jiffies old.
[20329754.848036] livepatch: 'livepatch_61_release6': patching complete
</paste>

Could you please confirm that this is the original _non-filtered_ log?
I mean, were the debug messages really _not_ printed, rather than
printed and filtered out later?

I would like to know more about the system where RCU reported the
stall. How many processes are running there on average?
A rough number is enough. I wonder if it is about 1000, 10000, or
50000?
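
A rough count can be taken from /proc. The following is only a sketch:
the function name and the proc_root parameter are made up for
illustration. It reports threads as well as processes on the assumption
that the transition code visits every thread, not only thread group
leaders, so the thread count is probably the more relevant number.

```python
import os


def count_tasks(proc_root="/proc"):
    """Return (processes, threads) found under proc_root.

    Hypothetical helper: every numeric directory is treated as a process
    and every numeric entry in its task/ subdirectory as a thread.
    """
    processes = 0
    threads = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        task_dir = os.path.join(proc_root, entry, "task")
        try:
            n = sum(1 for tid in os.listdir(task_dir) if tid.isdigit())
        except OSError:
            # The process exited or is not readable; skip it.
            continue
        processes += 1
        threads += n
    return processes, threads


if __name__ == "__main__" and os.path.isdir("/proc"):
    procs, thrs = count_tasks()
    print(f"processes: {procs}, threads: {thrs}")
```

The numbers are a snapshot. Sampling a few times while the workload is
running gives a better idea of the average.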

Also what is the CONFIG_HZ value, please?
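
The reason CONFIG_HZ matters is that the grace period ages in the log
are reported in jiffies, and one jiffy is 1/HZ seconds. A minimal
sketch of the conversion for the three ages quoted above follows. The
HZ values tried are the usual Kconfig choices, not values confirmed for
this system, and the helper name is hypothetical.

```python
def jiffies_to_seconds(jiffies, hz):
    """Convert a jiffies count to seconds for a given CONFIG_HZ."""
    if hz <= 0:
        raise ValueError("hz must be positive")
    return jiffies / hz


# Grace period ages copied from the pasted log.
REPORTED_AGES = (10166, 10219, 10199)

# Candidate CONFIG_HZ values (assumption: one of the common choices).
CANDIDATE_HZ = (100, 250, 300, 1000)

if __name__ == "__main__":
    for hz in CANDIDATE_HZ:
        ages = ", ".join(
            f"{jiffies_to_seconds(j, hz):.1f}s" for j in REPORTED_AGES
        )
        print(f"HZ={hz}: {ages}")
```

With HZ=1000 the reported ages are slightly above 10 seconds. With
HZ=250 they would be about 40 seconds.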

Also we should get some statistics about how long klp_try_switch_task()
takes on average. I have never done it but I guess that
it should be rather easy with perf. Or maybe just by looking
at a function_graph trace.
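
For the function_graph variant, the durations could be pulled out of
the trace text with a few lines of Python. This is an untested sketch
with several assumptions: the function is not inlined and is visible to
ftrace, the trace uses the default function_graph layout (duration in
microseconds followed by "|" and the call), and the function shows up
either as a leaf call or as a closing brace annotated with its name.
The helper names are hypothetical.

```python
import re
import statistics

# A line carrying a duration: "<number> us | <call or closing brace>".
_DURATION_LINE = re.compile(r"(?P<us>\d+(?:\.\d+)?)\s+us\s+\|\s+(?P<body>.*)$")


def durations_us(trace_text, func="klp_try_switch_task"):
    """Collect durations (in microseconds) reported for func."""
    leaf = func + "();"
    tail = "} /* " + func + " */"
    found = []
    for line in trace_text.splitlines():
        m = _DURATION_LINE.search(line)
        if not m:
            # Entry lines ("func() {") and headers carry no duration.
            continue
        body = m.group("body").strip()
        if body == leaf or body == tail:
            found.append(float(m.group("us")))
    return found


def summarize(values):
    """Return count, mean and maximum, or None for an empty list."""
    if not values:
        return None
    return {
        "count": len(values),
        "mean_us": statistics.mean(values),
        "max_us": max(values),
    }
```

The mean is what goes into the estimate, but the maximum is worth a
look too, because a few slow calls can dominate the total.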

I would like to be more sure that klp_try_complete_transition() really
could block RCU for that long. I would like to confirm that
the following is the reality:

  num_processes * average_klp_try_switch_task > 10 seconds

If it is true then we really need to break the cycle after some
timeout. And rcu_read_lock() is _not_ a solution because it would
block RCU the same way.
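
To get a feeling for the numbers, the inequality can be turned around
to ask how slow one klp_try_switch_task() call would have to be, on
average, for the whole walk to reach 10 seconds. This is a
back-of-the-envelope sketch only. The task counts are the rough guesses
from above, not measurements, and the function names are hypothetical.

```python
def estimated_walk_seconds(num_tasks, avg_switch_us):
    """Estimated time for one pass over all tasks, in seconds."""
    return num_tasks * avg_switch_us / 1e6


def per_task_budget_us(num_tasks, threshold_s=10.0):
    """Average per-task time (us) at which one pass takes threshold_s."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return threshold_s * 1e6 / num_tasks


if __name__ == "__main__":
    for n in (1000, 10000, 50000):
        print(f"{n} tasks: {per_task_budget_us(n):.0f} us per task")
```

Even with 50000 tasks, each call would need to take about 200 us on
average for a single pass to last 10 seconds. The measured average
tells whether that is realistic.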

Does it make sense, please?

Best Regards,
Petr
