From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: Andrii Nakryiko <andrii@kernel.org>
Cc: linux-trace-kernel@vger.kernel.org, rostedt@goodmis.org,
	oleg@redhat.com, peterz@infradead.org, mingo@redhat.com,
	bpf@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org,
	clm@meta.com
Subject: Re: [PATCH 04/12] uprobes: revamp uprobe refcounting and lifetime management
Date: Thu, 27 Jun 2024 11:29:58 +0900	[thread overview]
Message-ID: <20240627112958.0e4aa22fe5a694a2feb11e06@kernel.org> (raw)
In-Reply-To: <20240625002144.3485799-5-andrii@kernel.org>

On Mon, 24 Jun 2024 17:21:36 -0700
Andrii Nakryiko <andrii@kernel.org> wrote:

> Anyways, under exclusive writer lock, we double-check that refcount
> didn't change and is still zero. If it is, we proceed with destruction,
> because at that point we have a guarantee that find_active_uprobe()
> can't successfully look up this uprobe instance, as it's going to be
> removed in destructor under writer lock. If, on the other hand,
> find_active_uprobe() managed to bump refcount from zero to one in
> between put_uprobe()'s atomic_dec_and_test(&uprobe->ref) and
> write_lock(&uprobes_treelock), we'll deterministically detect this with
> an extra atomic_read(&uprobe->ref) check, and if it doesn't hold, we
> pretend like atomic_dec_and_test() never returned true. There is no
> resource freeing or any other irreversible action taken up till this
> point, so we just exit early.
> 
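(A minimal sketch of the pattern the paragraph above describes; all names
below are illustrative and this is not the actual patch code:)

	/* kernel context: <linux/atomic.h>, <linux/rbtree.h>,
	 * <linux/spinlock.h>, <linux/slab.h>
	 */
	struct uprobe_sketch {
		atomic_t	ref;
		struct rb_node	rb_node;
	};

	static struct rb_root uprobes_tree_sketch = RB_ROOT;
	static DEFINE_RWLOCK(uprobes_treelock_sketch);

	static void put_uprobe_sketch(struct uprobe_sketch *uprobe)
	{
		/* fast path: somebody else still holds a reference */
		if (!atomic_dec_and_test(&uprobe->ref))
			return;

		write_lock(&uprobes_treelock_sketch);

		/*
		 * Double-check under the writer lock: if a racing lookup
		 * bumped the refcount 0 -> 1 in the meantime, pretend the
		 * dec-and-test never hit zero and back off.
		 */
		if (atomic_read(&uprobe->ref)) {
			write_unlock(&uprobes_treelock_sketch);
			return;
		}

		/* no lookup can succeed once the node leaves the tree */
		rb_erase(&uprobe->rb_node, &uprobes_tree_sketch);
		write_unlock(&uprobes_treelock_sketch);
		kfree(uprobe);
	}
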
> One tricky part in the above is actually two CPUs racing and dropping
> refcnt to zero, and then attempting to free resources. This can happen
> as follows:
>   - CPU #0 drops refcnt from 1 to 0, and proceeds to grab uprobes_treelock;
>   - before CPU #0 grabs a lock, CPU #1 updates refcnt as 0 -> 1 -> 0, at
>     which point it decides that it needs to free uprobe as well.
> 
> At this point both CPU #0 and CPU #1 will believe they need to destroy
> uprobe, which is obviously wrong. To prevent this situation, we augment
> refcount with epoch counter, which is always incremented by 1 on either
> get or put operation. This allows those two CPUs above to disambiguate
> who should actually free uprobe (it's the CPU #1, because it has
> up-to-date epoch). See comments in the code and note the specific values
> of UPROBE_REFCNT_GET and UPROBE_REFCNT_PUT constants. Keep in mind that
> a single atomic64_t is actually two sort-of-independent 32-bit counters
> that are incremented/decremented with a single atomic64_add_return()
> operation. Note also a small and extremely rare (and thus having no
> effect on performance) need to clear the highest bit every 2 billion
> get/put operations to prevent high 32-bit counter from "bleeding over"
> into lower 32-bit counter.
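
(For illustration, one way to pack two 32-bit counters into a single
atomic64_t so that one atomic64_add_return() updates both halves; the bit
layout and the SKETCH_* values are assumptions made for this sketch, the
real UPROBE_REFCNT_GET/UPROBE_REFCNT_PUT constants are defined in the
patch itself:)

	/* kernel context: <linux/atomic.h>, <linux/types.h> */

	/* assumed layout: low 32 bits = refcount, high 32 bits = epoch */
	#define SKETCH_REFCNT_GET	((1LL << 32) | 1LL)	/* epoch += 1, refcnt += 1 */
	#define SKETCH_REFCNT_PUT	((1LL << 32) - 1LL)	/* epoch += 1, refcnt -= 1 (via carry) */

	static inline u32 sketch_refcnt(u64 v) { return (u32)v; }
	static inline u32 sketch_epoch(u64 v)  { return (u32)(v >> 32); }

	/* a single atomic RMW bumps the epoch and adjusts the refcount */
	static inline u64 sketch_get(atomic64_t *ref)
	{
		return (u64)atomic64_add_return(SKETCH_REFCNT_GET, ref);
	}

	static inline u64 sketch_put(atomic64_t *ref)
	{
		return (u64)atomic64_add_return(SKETCH_REFCNT_PUT, ref);
	}

With such a layout, the two racing CPUs can presumably compare the epoch
they observed when their put dropped the refcount to zero against the
current epoch: only the CPU that performed the most recent 1 -> 0
transition sees an up-to-date epoch and proceeds to free the uprobe.
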

I have a question here.
Is there any chance for CPU#1 to put the uprobe before CPU#0 grabs
the uprobes_treelock, and to free the uprobe before CPU#0 validates
uprobe->ref again? e.g.

CPU#0							CPU#1

put_uprobe() {
	atomic64_add_return()
							__get_uprobe();
							put_uprobe() {
								kfree(uprobe)
							}
	write_lock(&uprobes_treelock);
	atomic64_read(&uprobe->ref);
}

I think this is a very rare case, but I could not find any code that
prevents this scenario.

Thank you,


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
