Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Petr Mladek <pmladek@suse.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>,
	jikos@kernel.org, mbenes@suse.cz, joe.lawrence@redhat.com,
	live-patching@vger.kernel.org
Subject: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
Date: Wed, 12 Feb 2025 16:42:39 +0100	[thread overview]
Message-ID: <Z6zBb9GRkFC-R0RE@pathway.suse.cz> (raw)
In-Reply-To: <CALOAHbDEBqZyDvSSv+KTFVR3owkjfawCQ-fT9pC1fMHNGPnG+g@mail.gmail.com>

On Wed 2025-02-12 19:54:21, Yafang Shao wrote:
> On Wed, Feb 12, 2025 at 10:34 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Wed, Feb 12, 2025 at 8:40 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > >
> > > On Tue, Feb 11, 2025 at 02:24:36PM +0800, Yafang Shao wrote:
> > > >  void klp_try_complete_transition(void)
> > > >  {
> > > > +     unsigned long timeout, proceed_pending_processes;
> > > >       unsigned int cpu;
> > > >       struct task_struct *g, *task;
> > > >       struct klp_patch *patch;
> > > > @@ -467,9 +468,30 @@ void klp_try_complete_transition(void)
> > > >        * unless the patch includes changes to a very common function.
> > > >        */
> > > >       read_lock(&tasklist_lock);
> > > > -     for_each_process_thread(g, task)
> > > > +     timeout = jiffies + HZ;
> > > > +     proceed_pending_processes = 0;
> > > > +     for_each_process_thread(g, task) {
> > > > +             /* check if this task has already switched over */
> > > > +             if (task->patch_state == klp_target_state)
> > > > +                     continue;
> > > > +
> > > > +             proceed_pending_processes++;
> > > > +
> > > >               if (!klp_try_switch_task(task))
> > > >                       complete = false;
> > > > +
> > > > +             /*
> > > > +              * Prevent hardlockup by not blocking tasklist_lock for too long.
> > > > +              * But guarantee the forward progress by making sure at least
> > > > +              * some pending processes were checked.
> > > > +              */
> > > > +             if (rwlock_is_contended(&tasklist_lock) &&
> > > > +                 time_after(jiffies, timeout) &&
> > > > +                 proceed_pending_processes > 100) {
> > > > +                     complete = false;
> > > > +                     break;
> > > > +             }
> > > > +     }
> > > >       read_unlock(&tasklist_lock);
> > >
> > > Instead of all this can we not just use rcu_read_lock() instead of
> > > tasklist_lock?
> >
> > I’m concerned that there’s a potential race condition in fork() if we
> > use RCU, as illustrated below:
> >
> >   CPU0                                                     CPU1
> >
> > write_lock_irq(&tasklist_lock);
> > klp_copy_process(p);
> >
> >                                          parent->patch_state=klp_target_state
> >
> > list_add_tail_rcu(&p->tasks, &init_task.tasks);
> > write_unlock_irq(&tasklist_lock);
> >
> > In this scenario, after the parent executes klp_copy_process(p) to
> > copy its patch_state to the child, but before adding the child to the
> > task list, the parent’s patch_state might be updated by the KLP
> > transition. This could result in the child being left with an outdated
> > state.

Similar race actually existed even before. 

> > We need to ensure that klp_copy_process() and list_add_tail_rcu() are
> > treated as a single atomic operation.
> 
> Before the newly forked task is added to the task list, it doesn’t
> execute any code and can always be considered safe during the KLP
> transition. Therefore, we could replace klp_copy_process() with
> klp_init_process(), where we simply set patch_state to
> KLP_TRANSITION_IDLE, as shown below:

This might work when a new process is created, for example, using
execve(). But it would fail when the process is just forked (man 2 fork)
and both parent and child continue running the same code.


> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2544,7 +2544,9 @@ __latent_entropy struct task_struct *copy_process(
>                 p->exit_signal = args->exit_signal;
>         }
> 
> -       klp_copy_process(p);
> +       // klp_init_process(p);
> +       clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
> +       child->patch_state = KLP_TRANSITION_IDLE;

This is exactly the loction where we depend on the sychronization
with the tasklist_lock. It allows us to synchronize both
TIF_PATCH_PENDING flag and p->patch_state between the parent
and child, see the comment in klp_copy_process():

/* Called from copy_process() during fork */
void klp_copy_process(struct task_struct *child)
{
	/*
	 * The parent process may have gone through a KLP transition since
	 * the thread flag was copied in setup_thread_stack earlier. Bring
	 * the task flag up to date with the parent here.
	 *
	 * The operation is serialized against all klp_*_transition()
	 * operations by the tasklist_lock. The only exceptions are
	 * klp_update_patch_state(current) and __klp_sched_try_switch(), but we
	 * cannot race with them because we are current.
	 */
	if (test_tsk_thread_flag(current, TIF_PATCH_PENDING))
		set_tsk_thread_flag(child, TIF_PATCH_PENDING);
	else
		clear_tsk_thread_flag(child, TIF_PATCH_PENDING);

	child->patch_state = current->patch_state;
}

When using rcu_read_lock() in klp_try_complete_transition,
child->patch_state might get an outdated information. The following
race commes to my mind:

CPU1				CPU1

				klp_try_complete_transition()


taskA:	
 + fork()
   + klp_copy_process()
      child->patch_state = KLP_PATCH_UNPATCHED

				  klp_try_switch_task(taskA)
				    // safe

				child->patch_state = KLP_PATCH_PATCHED

				all processes patched

				klp_finish_transition()


	list_add_tail_rcu(&p->thread_node,
			  &p->signal->thread_head);


BANG: The forked task has KLP_PATCH_UNPATCHED so that
      klp_ftrace_handler() will redirect it to the old code.

      But CPU1 thinks that all tasks are migrated and is going
      to finish the transition



My opinion:

I am afraid that using rcu_read_lock() in klp_try_complete_transition()
might cause more harm than good. The code already is complicated
and this might make it even more tricky.

I would first like to understand how exactly the stall happens.
It is possible that even rcu_read_lock() won't help here!

If the it takes too long time to check backtraces of all pending
processes then even rcu_read_lock() might trigger the RCU stall
warning as well.

Best Regards,
Petr

next prev parent reply	other threads:[~2025-02-12 15:42 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-11  6:24 [PATCH 0/3] livepatch: Some improvements Yafang Shao
2025-02-11  6:24 ` [PATCH 1/3] livepatch: Add comment to clarify klp_add_nops() Yafang Shao
2025-02-12 12:51   ` Petr Mladek
2025-02-13  5:49     ` Yafang Shao
2025-02-11  6:24 ` [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long Yafang Shao
2025-02-12  0:40   ` Josh Poimboeuf
2025-02-12  2:34     ` Yafang Shao
2025-02-12 11:54       ` Yafang Shao
2025-02-12 15:42         ` Petr Mladek [this message]
2025-02-13  1:36           ` Josh Poimboeuf
2025-02-13  5:53             ` Yafang Shao
2025-02-13  9:48             ` Petr Mladek
2025-02-13 17:32               ` Josh Poimboeuf
2025-02-14 14:44                 ` Petr Mladek
2025-02-14 18:12                   ` Josh Poimboeuf
2025-02-18  2:37                     ` Yafang Shao
2025-02-13  2:47         ` Josh Poimboeuf
2025-02-13 11:19   ` Find root of the stall: was: " Petr Mladek
2025-02-13 12:32     ` Yafang Shao
2025-02-13 12:39       ` Yafang Shao
2025-02-14  2:44         ` Yafang Shao
2025-02-14  8:36           ` Josh Poimboeuf
2025-02-14 11:37             ` Petr Mladek
2025-02-18  2:19               ` Yafang Shao
2025-02-14  9:46       ` Petr Mladek
2025-02-11  6:24 ` [PATCH 3/3] livepatch: Avoid potential RCU stalls in klp transition Yafang Shao
2025-02-12  0:52   ` Josh Poimboeuf
2025-02-12  2:42     ` Yafang Shao
2025-02-13  1:58       ` Josh Poimboeuf
2025-02-13  5:51         ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6zBb9GRkFC-R0RE@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=jikos@kernel.org \
    --cc=joe.lawrence@redhat.com \
    --cc=jpoimboe@kernel.org \
    --cc=laoar.shao@gmail.com \
    --cc=live-patching@vger.kernel.org \
    --cc=mbenes@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.