From: Petr Mladek <pmladek@suse.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>,
jikos@kernel.org, mbenes@suse.cz, joe.lawrence@redhat.com,
live-patching@vger.kernel.org
Subject: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
Date: Wed, 12 Feb 2025 16:42:39 +0100 [thread overview]
Message-ID: <Z6zBb9GRkFC-R0RE@pathway.suse.cz> (raw)
In-Reply-To: <CALOAHbDEBqZyDvSSv+KTFVR3owkjfawCQ-fT9pC1fMHNGPnG+g@mail.gmail.com>
On Wed 2025-02-12 19:54:21, Yafang Shao wrote:
> On Wed, Feb 12, 2025 at 10:34 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Wed, Feb 12, 2025 at 8:40 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > >
> > > On Tue, Feb 11, 2025 at 02:24:36PM +0800, Yafang Shao wrote:
> > > > void klp_try_complete_transition(void)
> > > > {
> > > > + unsigned long timeout, proceed_pending_processes;
> > > > unsigned int cpu;
> > > > struct task_struct *g, *task;
> > > > struct klp_patch *patch;
> > > > @@ -467,9 +468,30 @@ void klp_try_complete_transition(void)
> > > > * unless the patch includes changes to a very common function.
> > > > */
> > > > read_lock(&tasklist_lock);
> > > > - for_each_process_thread(g, task)
> > > > + timeout = jiffies + HZ;
> > > > + proceed_pending_processes = 0;
> > > > + for_each_process_thread(g, task) {
> > > > + /* check if this task has already switched over */
> > > > + if (task->patch_state == klp_target_state)
> > > > + continue;
> > > > +
> > > > + proceed_pending_processes++;
> > > > +
> > > > if (!klp_try_switch_task(task))
> > > > complete = false;
> > > > +
> > > > + /*
> > > > + * Prevent hardlockup by not blocking tasklist_lock for too long.
> > > > + * But guarantee the forward progress by making sure at least
> > > > + * some pending processes were checked.
> > > > + */
> > > > + if (rwlock_is_contended(&tasklist_lock) &&
> > > > + time_after(jiffies, timeout) &&
> > > > + proceed_pending_processes > 100) {
> > > > + complete = false;
> > > > + break;
> > > > + }
> > > > + }
> > > > read_unlock(&tasklist_lock);
> > >
> > > Instead of all this can we not just use rcu_read_lock() instead of
> > > tasklist_lock?
> >
> > I’m concerned that there’s a potential race condition in fork() if we
> > use RCU, as illustrated below:
> >
> >     CPU0                                    CPU1
> >
> >     write_lock_irq(&tasklist_lock);
> >     klp_copy_process(p);
> >
> >                                             parent->patch_state = klp_target_state
> >
> >     list_add_tail_rcu(&p->tasks, &init_task.tasks);
> >     write_unlock_irq(&tasklist_lock);
> >
> > In this scenario, after the parent executes klp_copy_process(p) to
> > copy its patch_state to the child, but before adding the child to the
> > task list, the parent’s patch_state might be updated by the KLP
> > transition. This could result in the child being left with an outdated
> > state.
A similar race actually existed even before.
> > We need to ensure that klp_copy_process() and list_add_tail_rcu() are
> > treated as a single atomic operation.
>
> Before the newly forked task is added to the task list, it doesn’t
> execute any code and can always be considered safe during the KLP
> transition. Therefore, we could replace klp_copy_process() with
> klp_init_process(), where we simply set patch_state to
> KLP_TRANSITION_IDLE, as shown below:
This might work when a new process is created, for example, using
execve(). But it would fail when the process is just forked (man 2 fork)
and both parent and child continue running the same code.
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2544,7 +2544,9 @@ __latent_entropy struct task_struct *copy_process(
> p->exit_signal = args->exit_signal;
> }
>
> - klp_copy_process(p);
> + // klp_init_process(p);
> + clear_tsk_thread_flag(p, TIF_PATCH_PENDING);
> + p->patch_state = KLP_TRANSITION_IDLE;
This is exactly the location where we depend on the synchronization
with the tasklist_lock. It allows us to synchronize both the
TIF_PATCH_PENDING flag and p->patch_state between the parent
and child, see the comment in klp_copy_process():
/* Called from copy_process() during fork */
void klp_copy_process(struct task_struct *child)
{
/*
* The parent process may have gone through a KLP transition since
* the thread flag was copied in setup_thread_stack earlier. Bring
* the task flag up to date with the parent here.
*
* The operation is serialized against all klp_*_transition()
* operations by the tasklist_lock. The only exceptions are
* klp_update_patch_state(current) and __klp_sched_try_switch(), but we
* cannot race with them because we are current.
*/
if (test_tsk_thread_flag(current, TIF_PATCH_PENDING))
set_tsk_thread_flag(child, TIF_PATCH_PENDING);
else
clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
child->patch_state = current->patch_state;
}
When using rcu_read_lock() in klp_try_complete_transition(),
child->patch_state might get outdated information. The following
race comes to my mind:
    CPU0                                    CPU1

                                            klp_try_complete_transition()

    taskA:
      fork()
        klp_copy_process()
          child->patch_state = KLP_PATCH_UNPATCHED

                                              klp_try_switch_task(taskA)
                                                // safe
                                                taskA->patch_state = KLP_PATCH_PATCHED

                                              all processes patched
                                              klp_finish_transition()

        list_add_tail_rcu(&p->thread_node,
                          &p->signal->thread_head);

BANG: The forked task has KLP_PATCH_UNPATCHED so that
klp_ftrace_handler() will redirect it to the old code.
But CPU1 thinks that all tasks are migrated and is going
to finish the transition.
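
The interleaving can also be replayed deterministically as a tiny userspace
model. The structs, constants, and the boolean "task list" below are
illustrative only, not kernel API; the point is that the child is invisible to
the scan until after its state was copied:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names; the real kernel constants differ slightly. */
enum { KLP_PATCH_UNPATCHED = 0, KLP_PATCH_PATCHED = 1 };

struct task { int patch_state; };

/*
 * Replay the diagram step by step.  CPU0 is inside fork(); CPU1 runs
 * klp_try_complete_transition() under rcu_read_lock(), so it can only
 * see tasks that are already on the task list.
 */
static bool replay_race(void)
{
	struct task parent = { .patch_state = KLP_PATCH_UNPATCHED };
	struct task child;
	bool child_on_list = false;
	bool all_patched = true;

	/* CPU0: klp_copy_process() copies the parent's (old) state. */
	child.patch_state = parent.patch_state;

	/* CPU1: switches every task it can see; the child is invisible. */
	parent.patch_state = KLP_PATCH_PATCHED;	/* klp_try_switch_task() */
	if (child_on_list && child.patch_state != KLP_PATCH_PATCHED)
		all_patched = false;
	/* CPU1 now believes the transition is complete ... */

	/* CPU0: ... and only then does the child become visible. */
	child_on_list = true;

	/* The "BANG": transition done, yet the child is unpatched. */
	return all_patched && child_on_list &&
	       child.patch_state == KLP_PATCH_UNPATCHED;
}
```

replay_race() returning true means the inconsistent end state was reached:
the scan declared completion while a live task still carried the old state.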
My opinion:

I am afraid that using rcu_read_lock() in klp_try_complete_transition()
might cause more harm than good. The code is already complicated
and this might make it even more tricky.
I would first like to understand how exactly the stall happens.
It is possible that even rcu_read_lock() won't help here!
If it takes too long to check the backtraces of all pending
processes, then even rcu_read_lock() might trigger the RCU stall
warning as well.
Best Regards,
Petr