From: Peter Zijlstra <peterz@infradead.org>
To: Seohyeon Maeng <msh1307@theori.io>
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
jgross@suse.com, leitao@debian.org, linux-kernel@vger.kernel.org,
bioloidgp@gmail.com
Subject: Re: [Report] Race Condition in text_poke_bp_batch/poke_int3_handler
Date: Tue, 3 Dec 2024 11:07:57 +0100 [thread overview]
Message-ID: <20241203100757.GG8562@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <674eba3e.620a0220.127f52.7e92@mx.google.com>
On Tue, Dec 03, 2024 at 04:58:50PM +0900, Seohyeon Maeng wrote:
> [ 24.729808] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
What QEMU version and setup are you using?
There have been QEMU bugs around there. Can you reproduce on real
hardware? Because I can't seem to trigger this...
> A kernel panic occurs when the following code is executed during live
> patching. In this scenario, an int3 trap can be triggered.
>
> static inline void perf_event_task_sched_out(struct task_struct *prev,
> struct task_struct *next)
> {
> [...]
> if (static_branch_unlikely(&perf_sched_events))
> __perf_event_task_sched_out(prev, next);
> }
>
> noinstr int poke_int3_handler(struct pt_regs *regs)
> {
> [...]
> desc = try_get_desc(); // [1]
> if (!desc)
> return 0;
> [...]
> if (unlikely(desc->nr_entries > 1)) {
> tp = __inline_bsearch(ip, desc->vec, desc->nr_entries,
> sizeof(struct text_poke_loc),
> patch_cmp);
> if (!tp)
> goto out_put;
> } else {
> tp = desc->vec;
> if (text_poke_addr(tp) != ip)
> goto out_put;
> }
> [...]
> out_put:
> put_desc();
> return ret;
> }
>
> During the Interrupt Handler (poke_int3_handler) processing, the patch
> function may be entered, resulting in an improper reference count
> (refcount). This can cause the reference count to be incorrectly set,
> and the bp_desc.vec and bp_desc.nr_entries are reinitialized, leading
> to a loss of critical information and subsequent failures in handling
> the int3 trap.
>
> static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
> {
> [...]
> lockdep_assert_held(&text_mutex);
>
> bp_desc.vec = tp;
> bp_desc.nr_entries = nr_entries;
>
> /*
> * Corresponds to the implicit memory barrier in try_get_desc() to
> * ensure reading a non-zero refcount provides up to date bp_desc data.
> */
> atomic_set_release(&bp_desc.refs, 1); // [2]
> [...]
> /*
> * Remove and wait for refs to be zero.
> */
> if (!atomic_dec_and_test(&bp_desc.refs)) // [3]
> atomic_cond_read_acquire(&bp_desc.refs, !VAL);
> [...]
> }
>
> As demonstrated above, bp_desc and its refcount can be modified while
> poke_int3_handler is executing, leading to unexpected behavior.
>
> Consider a scenario where two CPUs concurrently execute the sequence
> [1] -> [2] -> [3] -> [1], with overlapping operations on the reference
> count. When [3] is executed, the refcount may drop to zero. As a
> result, when [1] attempts to retrieve the descriptor, it fails,
> leading to a kernel panic.
I'm failing to see how this can happen. The text_poke_bp() caller should
hold text_mutex, there SHOULD be no concurrency on [2]/[3].
So there is a single CPU doing text_poke_bp():
mutex_lock(&text_mutex);
text_poke_bp_batch()
lockdep_assert_held(&text_mutex);
atomic_set_release(&bp_desc.refs, 1); [2]
smp_wmb();
poke-first-byte-to-INT3 [A]
text_poke_sync();
poke-tail-bytes
text_poke_sync();
poke-first-byte
text_poke_sync(); [B]
if (!atomic_dec_and_test(&bp_desc.refs)) [3]
atomic_cond_read_acquire(&bp_desc.refs, VAL);
mutex_unlock(&text_mutex);
The only concurrency is multiple CPUs hitting the INT3, which exists
between [A] and [B], and notably, in that range the reference count
should be very much >= 1.
And [3] very specifically waits for all pre-existing interrupt handlers
to complete; at point [B] the INT3 is gone and no new handlers can
possibly start.
The INT3 handler (poke_int3_handler()) had the following cases:
- the boring case, INT3 is observed right after A, it gets a ref, does
the emulation and completes before 3.
- the tail case, INT3 is observed somewhere before B, it gets a ref,
does the emulation but complets after B, in which case 3 will wait
for it.
Hmm, there *might* be an issue when:
- INT3 triggers right before B, poke_int3_handler()'s try_get_desc() is
delayed until after 3.
But that is not what you were describing, were you? I think that case is
made impossible by text_poke_sync() itself, that sends an IPI to all
CPUs, completion of that IPI would block on the completion of the INT3
which triggered right before B.
And after the sync-IPI that CPU must not observe INT3 anymore.
If you really think there is a problem here, please describe the code
flow in more detail. But given I can't trigger anything, nor actually
see a hole in the code, I'm going to assume you managed to tickle the
QEMU bug.
prev parent reply other threads:[~2024-12-03 10:08 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-03 7:58 [Report] Race Condition in text_poke_bp_batch/poke_int3_handler Seohyeon Maeng
2024-12-03 10:07 ` Peter Zijlstra [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241203100757.GG8562@noisy.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=bioloidgp@gmail.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=jgross@suse.com \
--cc=leitao@debian.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=msh1307@theori.io \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox