From: Petr Mladek <pmladek@suse.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>,
jikos@kernel.org, mbenes@suse.cz, joe.lawrence@redhat.com,
live-patching@vger.kernel.org
Subject: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
Date: Wed, 12 Feb 2025 16:42:39 +0100 [thread overview]
Message-ID: <Z6zBb9GRkFC-R0RE@pathway.suse.cz> (raw)
In-Reply-To: <CALOAHbDEBqZyDvSSv+KTFVR3owkjfawCQ-fT9pC1fMHNGPnG+g@mail.gmail.com>
On Wed 2025-02-12 19:54:21, Yafang Shao wrote:
> On Wed, Feb 12, 2025 at 10:34 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Wed, Feb 12, 2025 at 8:40 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > >
> > > On Tue, Feb 11, 2025 at 02:24:36PM +0800, Yafang Shao wrote:
> > > > void klp_try_complete_transition(void)
> > > > {
> > > > + unsigned long timeout, proceed_pending_processes;
> > > > unsigned int cpu;
> > > > struct task_struct *g, *task;
> > > > struct klp_patch *patch;
> > > > @@ -467,9 +468,30 @@ void klp_try_complete_transition(void)
> > > > * unless the patch includes changes to a very common function.
> > > > */
> > > > read_lock(&tasklist_lock);
> > > > - for_each_process_thread(g, task)
> > > > + timeout = jiffies + HZ;
> > > > + proceed_pending_processes = 0;
> > > > + for_each_process_thread(g, task) {
> > > > + /* check if this task has already switched over */
> > > > + if (task->patch_state == klp_target_state)
> > > > + continue;
> > > > +
> > > > + proceed_pending_processes++;
> > > > +
> > > > if (!klp_try_switch_task(task))
> > > > complete = false;
> > > > +
> > > > + /*
> > > > + * Prevent hardlockup by not blocking tasklist_lock for too long.
> > > > + * But guarantee the forward progress by making sure at least
> > > > + * some pending processes were checked.
> > > > + */
> > > > + if (rwlock_is_contended(&tasklist_lock) &&
> > > > + time_after(jiffies, timeout) &&
> > > > + proceed_pending_processes > 100) {
> > > > + complete = false;
> > > > + break;
> > > > + }
> > > > + }
> > > > read_unlock(&tasklist_lock);
> > >
> > > Instead of all this can we not just use rcu_read_lock() instead of
> > > tasklist_lock?
> >
> > I’m concerned that there’s a potential race condition in fork() if we
> > use RCU, as illustrated below:
> >
> > CPU0 CPU1
> >
> > write_lock_irq(&tasklist_lock);
> > klp_copy_process(p);
> >
> > parent->patch_state=klp_target_state
> >
> > list_add_tail_rcu(&p->tasks, &init_task.tasks);
> > write_unlock_irq(&tasklist_lock);
> >
> > In this scenario, after the parent executes klp_copy_process(p) to
> > copy its patch_state to the child, but before adding the child to the
> > task list, the parent’s patch_state might be updated by the KLP
> > transition. This could result in the child being left with an outdated
> > state.
Similar race actually existed even before.
> > We need to ensure that klp_copy_process() and list_add_tail_rcu() are
> > treated as a single atomic operation.
>
> Before the newly forked task is added to the task list, it doesn’t
> execute any code and can always be considered safe during the KLP
> transition. Therefore, we could replace klp_copy_process() with
> klp_init_process(), where we simply set patch_state to
> KLP_TRANSITION_IDLE, as shown below:
This might work when a new process is created, for example, using
execve(). But it would fail when the process is just forked (man 2 fork)
and both parent and child continue running the same code.
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2544,7 +2544,9 @@ __latent_entropy struct task_struct *copy_process(
> p->exit_signal = args->exit_signal;
> }
>
> - klp_copy_process(p);
> + // klp_init_process(p);
> + clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
> + child->patch_state = KLP_TRANSITION_IDLE;
This is exactly the loction where we depend on the sychronization
with the tasklist_lock. It allows us to synchronize both
TIF_PATCH_PENDING flag and p->patch_state between the parent
and child, see the comment in klp_copy_process():
/* Called from copy_process() during fork */
void klp_copy_process(struct task_struct *child)
{
/*
* The parent process may have gone through a KLP transition since
* the thread flag was copied in setup_thread_stack earlier. Bring
* the task flag up to date with the parent here.
*
* The operation is serialized against all klp_*_transition()
* operations by the tasklist_lock. The only exceptions are
* klp_update_patch_state(current) and __klp_sched_try_switch(), but we
* cannot race with them because we are current.
*/
if (test_tsk_thread_flag(current, TIF_PATCH_PENDING))
set_tsk_thread_flag(child, TIF_PATCH_PENDING);
else
clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
child->patch_state = current->patch_state;
}
When using rcu_read_lock() in klp_try_complete_transition,
child->patch_state might get an outdated information. The following
race commes to my mind:
CPU1 CPU1
klp_try_complete_transition()
taskA:
+ fork()
+ klp_copy_process()
child->patch_state = KLP_PATCH_UNPATCHED
klp_try_switch_task(taskA)
// safe
child->patch_state = KLP_PATCH_PATCHED
all processes patched
klp_finish_transition()
list_add_tail_rcu(&p->thread_node,
&p->signal->thread_head);
BANG: The forked task has KLP_PATCH_UNPATCHED so that
klp_ftrace_handler() will redirect it to the old code.
But CPU1 thinks that all tasks are migrated and is going
to finish the transition
My opinion:
I am afraid that using rcu_read_lock() in klp_try_complete_transition()
might cause more harm than good. The code already is complicated
and this might make it even more tricky.
I would first like to understand how exactly the stall happens.
It is possible that even rcu_read_lock() won't help here!
If the it takes too long time to check backtraces of all pending
processes then even rcu_read_lock() might trigger the RCU stall
warning as well.
Best Regards,
Petr
next prev parent reply other threads:[~2025-02-12 15:42 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-11 6:24 [PATCH 0/3] livepatch: Some improvements Yafang Shao
2025-02-11 6:24 ` [PATCH 1/3] livepatch: Add comment to clarify klp_add_nops() Yafang Shao
2025-02-12 12:51 ` Petr Mladek
2025-02-13 5:49 ` Yafang Shao
2025-02-11 6:24 ` [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long Yafang Shao
2025-02-12 0:40 ` Josh Poimboeuf
2025-02-12 2:34 ` Yafang Shao
2025-02-12 11:54 ` Yafang Shao
2025-02-12 15:42 ` Petr Mladek [this message]
2025-02-13 1:36 ` Josh Poimboeuf
2025-02-13 5:53 ` Yafang Shao
2025-02-13 9:48 ` Petr Mladek
2025-02-13 17:32 ` Josh Poimboeuf
2025-02-14 14:44 ` Petr Mladek
2025-02-14 18:12 ` Josh Poimboeuf
2025-02-18 2:37 ` Yafang Shao
2025-02-13 2:47 ` Josh Poimboeuf
2025-02-13 11:19 ` Find root of the stall: was: " Petr Mladek
2025-02-13 12:32 ` Yafang Shao
2025-02-13 12:39 ` Yafang Shao
2025-02-14 2:44 ` Yafang Shao
2025-02-14 8:36 ` Josh Poimboeuf
2025-02-14 11:37 ` Petr Mladek
2025-02-18 2:19 ` Yafang Shao
2025-02-14 9:46 ` Petr Mladek
2025-02-11 6:24 ` [PATCH 3/3] livepatch: Avoid potential RCU stalls in klp transition Yafang Shao
2025-02-12 0:52 ` Josh Poimboeuf
2025-02-12 2:42 ` Yafang Shao
2025-02-13 1:58 ` Josh Poimboeuf
2025-02-13 5:51 ` Yafang Shao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z6zBb9GRkFC-R0RE@pathway.suse.cz \
--to=pmladek@suse.com \
--cc=jikos@kernel.org \
--cc=joe.lawrence@redhat.com \
--cc=jpoimboe@kernel.org \
--cc=laoar.shao@gmail.com \
--cc=live-patching@vger.kernel.org \
--cc=mbenes@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.