All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Seohyeon Maeng <msh1307@theori.io>
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	jgross@suse.com, leitao@debian.org, linux-kernel@vger.kernel.org,
	bioloidgp@gmail.com
Subject: Re: [Report] Race Condition in text_poke_bp_batch/poke_int3_handler
Date: Tue, 3 Dec 2024 11:07:57 +0100	[thread overview]
Message-ID: <20241203100757.GG8562@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <674eba3e.620a0220.127f52.7e92@mx.google.com>

On Tue, Dec 03, 2024 at 04:58:50PM +0900, Seohyeon Maeng wrote:

> [   24.729808] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014

What QEMU version and setup are you using? 

There have been QEMU bugs around there. Can you reproduce on real
hardware? Because I can't seem to trigger this...


> A kernel panic occurs when the following code is executed during live
> patching. In this scenario, an int3 trap can be triggered.
> 
> static inline void perf_event_task_sched_out(struct task_struct *prev,
> 					     struct task_struct *next)
> {
> 	[...]
> 	if (static_branch_unlikely(&perf_sched_events))
> 		__perf_event_task_sched_out(prev, next);
> }
> 
> noinstr int poke_int3_handler(struct pt_regs *regs)
> {
> 	[...]
> 	desc = try_get_desc();    // [1]
> 	if (!desc)
> 		return 0;
> 	[...]
> 	if (unlikely(desc->nr_entries > 1)) {
> 		tp = __inline_bsearch(ip, desc->vec, desc->nr_entries,
> 				      sizeof(struct text_poke_loc),
> 				      patch_cmp);
> 		if (!tp)
> 			goto out_put;
> 	} else {
> 		tp = desc->vec;
> 		if (text_poke_addr(tp) != ip)
> 			goto out_put;
> 	}
> 	[...]
> out_put:
> 	put_desc();
> 	return ret;
> }
> 
> During the Interrupt Handler (poke_int3_handler) processing, the patch
> function may be entered, resulting in an improper reference count
> (refcount). This can cause the reference count to be incorrectly set,
> and the bp_desc.vec and bp_desc.nr_entries are reinitialized, leading
> to a loss of critical information and subsequent failures in handling
> the int3 trap.
> 
> static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
> {	
> 	[...]
> 	lockdep_assert_held(&text_mutex);
> 
> 	bp_desc.vec = tp;
> 	bp_desc.nr_entries = nr_entries;
> 
> 	/*
> 	 * Corresponds to the implicit memory barrier in try_get_desc() to
> 	 * ensure reading a non-zero refcount provides up to date bp_desc data.
> 	 */
> 	atomic_set_release(&bp_desc.refs, 1); // [2]
> 	[...]	
> 	/*
> 	 * Remove and wait for refs to be zero.
> 	 */
> 	if (!atomic_dec_and_test(&bp_desc.refs))   // [3]
> 		atomic_cond_read_acquire(&bp_desc.refs, !VAL);
> 	[...]
> }
> 
> As demonstrated above, bp_desc and its refcount can be modified while
> poke_int3_handler is executing, leading to unexpected behavior.
> 
> Consider a scenario where two CPUs concurrently execute the sequence
> [1] -> [2] -> [3] -> [1], with overlapping operations on the reference
> count.  When [3] is executed, the refcount may drop to zero. As a
> result, when [1] attempts to retrieve the descriptor, it fails,
> leading to a kernel panic.

I'm failing to see how this can happen. The text_poke_bp() caller should
hold text_mutex, there SHOULD be no concurrency on [2]/[3].

So there is a single CPU doing text_poke_bp():

  mutex_lock(&text_mutex);
  text_poke_bp_batch()
    lockdep_assert_held(&text_mutex);
    atomic_set_release(&bp_desc.refs, 1);	[2]
    smp_wmb();

    poke-first-byte-to-INT3			[A]

    text_poke_sync();

    poke-tail-bytes

    text_poke_sync();

    poke-first-byte

    text_poke_sync();				[B]
							
    if (!atomic_dec_and_test(&bp_desc.refs))	[3]
      atomic_cond_read_acquire(&bp_desc.refs, VAL);
  mutex_unlock(&text_mutex);


The only concurrency is multiple CPUs hitting the INT3, which exists
between [A] and [B], and notably, in that range the reference count
should be very much >= 1.

And [3] very specifically waits for all pre-existing interrupt handlers
to complete; at point [B] the INT3 is gone and no new handlers can
possibly start.

The INT3 handler (poke_int3_handler()) had the following cases:

 - the boring case, INT3 is observed right after A, it gets a ref, does
   the emulation and completes before 3.

 - the tail case, INT3 is observed somewhere before B, it gets a ref,
   does the emulation but complets after B, in which case 3 will wait
   for it.

Hmm, there *might* be an issue when:

 - INT3 triggers right before B, poke_int3_handler()'s try_get_desc() is
   delayed until after 3.

But that is not what you were describing, were you? I think that case is
made impossible by text_poke_sync() itself, that sends an IPI to all
CPUs, completion of that IPI would block on the completion of the INT3
which triggered right before B.

And after the sync-IPI that CPU must not observe INT3 anymore.


If you really think there is a problem here, please describe the code
flow in more detail. But given I can't trigger anything, nor actually
see a hole in the code, I'm going to assume you managed to tickle the
QEMU bug.



      reply	other threads:[~2024-12-03 10:08 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-03  7:58 [Report] Race Condition in text_poke_bp_batch/poke_int3_handler Seohyeon Maeng
2024-12-03 10:07 ` Peter Zijlstra [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241203100757.GG8562@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bioloidgp@gmail.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=leitao@debian.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=msh1307@theori.io \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.