From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CDD4CD51E; Thu, 27 Jun 2024 02:30:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719455403; cv=none; b=o6SxNJ0JKvlHu6XOT/6owX6L/23l0aQuW/mNbD6BWOkq/ac0lEHvf6G7t8poWuO3e66rVZQgjpsBH229O3V6FTWN6nrGRXnMItLttaa2lxHAL8JU8GNAXeZ4QRT1L6ghRSikiYveGf/+vgH/KJ0yxtWKeXuWrVfkCqLHBNQaa2U= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719455403; c=relaxed/simple; bh=yLoEUkzIdR2ynQVS/sAJoHQIrQ/7oTOkXayKVHubxEw=; h=Date:From:To:Cc:Subject:Message-Id:In-Reply-To:References: Mime-Version:Content-Type; b=gDd1WM0cVv9ZBae1AVscN21dTW8MdkIQaXkaDZ2HS7y24RnUdUeoJIhCr/551dV6a+mTH+WxbOaIfbC3WeqFlMsjJYfFeZz8WmBGM7JHA17lJobiMZ65pOp+ndqTg1/5Ho1+Ub+xTyHJnteJxDHwQvdRGGNt4jkKnwj+o/SmlIY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=UDTAIHEi; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="UDTAIHEi" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E8BCAC116B1; Thu, 27 Jun 2024 02:30:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719455403; bh=yLoEUkzIdR2ynQVS/sAJoHQIrQ/7oTOkXayKVHubxEw=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=UDTAIHEitL1yCCx4EoOND2jkD2AWNFAwJqHtcjRlXwB6AflLUXQrb3e1aSFD06Uit X+BPLiW3Mw9VavuY9ZCn+jNt7cGBO4pg+T8EVs4XL4fDX5Doyk/NqOd1Zw7HwjbTvR pNMO5TKbObbG6pNl4fVwQ2BykNN8qOXTDu/T7yXeoveLpDd15XfCF0T1zreReYBXHQ B/uczRtHy6DbD7ilXJN+1WOFEtt9KbUTTHXGQwWe89TK2k+qYSCB0si2GMlMe9ZsG2 3fydiy5flX+rRVl1LoT8vbzppV5uVFhJ+MNtvi6bUd1z77OHleVdjH0JBiIy9kHCkN vtxoZrqsnfihQ== Date: Thu, 27 Jun 2024 11:29:58 +0900 From: Masami Hiramatsu (Google) To: Andrii Nakryiko Cc: linux-trace-kernel@vger.kernel.org, rostedt@goodmis.org, oleg@redhat.com, peterz@infradead.org, mingo@redhat.com, bpf@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, clm@meta.com Subject: Re: [PATCH 04/12] uprobes: revamp uprobe refcounting and lifetime management Message-Id: <20240627112958.0e4aa22fe5a694a2feb11e06@kernel.org> In-Reply-To: <20240625002144.3485799-5-andrii@kernel.org> References: <20240625002144.3485799-1-andrii@kernel.org> <20240625002144.3485799-5-andrii@kernel.org> X-Mailer: Sylpheed 3.7.0 (GTK+ 2.24.33; x86_64-pc-linux-gnu) Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit On Mon, 24 Jun 2024 17:21:36 -0700 Andrii Nakryiko wrote: > Anyways, under exclusive writer lock, we double-check that refcount > didn't change and is still zero. If it is, we proceed with destruction, > because at that point we have a guarantee that find_active_uprobe() > can't successfully look up this uprobe instance, as it's going to be > removed in destructor under writer lock. If, on the other hand, > find_active_uprobe() managed to bump refcount from zero to one in > between put_uprobe()'s atomic_dec_and_test(&uprobe->ref) and > write_lock(&uprobes_treelock), we'll deterministically detect this with > extra atomic_read(&uprobe->ref) check, and if it doesn't hold, we > pretend like atomic_dec_and_test() never returned true. There is no > resource freeing or any other irreversible action taken up till this > point, so we just exit early. > > One tricky part in the above is actually two CPUs racing and dropping > refcnt to zero, and then attempting to free resources. This can happen > as follows: > - CPU #0 drops refcnt from 1 to 0, and proceeds to grab uprobes_treelock; > - before CPU #0 grabs a lock, CPU #1 updates refcnt as 0 -> 1 -> 0, at > which point it decides that it needs to free uprobe as well. > > At this point both CPU #0 and CPU #1 will believe they need to destroy > uprobe, which is obviously wrong. To prevent this situations, we augment > refcount with epoch counter, which is always incremented by 1 on either > get or put operation. This allows those two CPUs above to disambiguate > who should actually free uprobe (it's the CPU #1, because it has > up-to-date epoch). See comments in the code and note the specific values > of UPROBE_REFCNT_GET and UPROBE_REFCNT_PUT constants. Keep in mind that > a single atomi64_t is actually a two sort-of-independent 32-bit counters > that are incremented/decremented with a single atomic_add_and_return() > operation. Note also a small and extremely rare (and thus having no > effect on performance) need to clear the highest bit every 2 billion > get/put operations to prevent high 32-bit counter from "bleeding over" > into lower 32-bit counter. I have a question here. Is there any chance to the CPU#1 to put the uprobe before CPU#0 gets the uprobes_treelock, and free uprobe before CPU#0 validate uprobe->ref again? e.g. CPU#0 CPU#1 put_uprobe() { atomic64_add_return() __get_uprobe(); put_uprobe() { kfree(uprobe) } write_lock(&uprobes_treelock); atomic64_read(&uprobe->ref); } I think it is very rare case, but I could not find any code to prevent this scenario. Thank you, -- Masami Hiramatsu (Google)