From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31FD9CFB42B for ; Sat, 5 Oct 2024 18:52:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 62C446B032D; Sat, 5 Oct 2024 14:52:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5DD2C6B032F; Sat, 5 Oct 2024 14:52:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 456226B0330; Sat, 5 Oct 2024 14:52:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 20D466B032D for ; Sat, 5 Oct 2024 14:52:22 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 8C937A03AF for ; Sat, 5 Oct 2024 18:52:21 +0000 (UTC) X-FDA: 82640444082.05.98015F0 Received: from smtpout.efficios.com (smtpout.efficios.com [167.114.26.122]) by imf11.hostedemail.com (Postfix) with ESMTP id 9BD1240003 for ; Sat, 5 Oct 2024 18:52:19 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=efficios.com header.s=smtpout1 header.b=vxTXlShV; dmarc=pass (policy=none) header.from=efficios.com; spf=pass (imf11.hostedemail.com: domain of mathieu.desnoyers@efficios.com designates 167.114.26.122 as permitted sender) smtp.mailfrom=mathieu.desnoyers@efficios.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1728154315; a=rsa-sha256; cv=none; b=VuZwBFpPMdeiq6Nqm1qhptp0D5UhlZ0Ag/0M5ujye5VfcMHsXFUfbHRtIzatz0U+A/DaIP lbxPvriKLc+1+/VlVb2ODgv9in/3k6gWM2yhWKTKYF+ZoxuZ1zoQAEWMRqzJ11mtfUdfX5 m79vawVuL+1o/hWMWvd/JWdLdQtJqIA= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=efficios.com header.s=smtpout1 header.b=vxTXlShV; dmarc=pass (policy=none) header.from=efficios.com; spf=pass (imf11.hostedemail.com: domain of mathieu.desnoyers@efficios.com designates 167.114.26.122 as permitted sender) smtp.mailfrom=mathieu.desnoyers@efficios.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1728154315; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=GSfRXdW0/xdwSyz6PBBC4W67Cy+ZxtEoZ9JKGqG3XMQ=; b=wHSQvoPjScxXgpDJGyc86rJYwQLZKo1fRbxPBVUAYQzCweXGdjdynj9YGZawA1SJmygPSb GWMwlyYM5R0lsqQ6m2aKCqviyhHoY6itw4hAoqitJ6aNshJTgB8L/JCww7rZsSe8wfPu/3 IGfIkDTIz8ReCPdjj1ldfxeEKZPXgbw= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1728154338; bh=JSI4GIafCFIDWNusASxURrvmoqUTCwAH/RTHAgePnVc=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=vxTXlShVlxBtfVRn9FZAIew6JLF1xV1uJViSIOkv1tsFkyEXnPZvLOTOPqNMXYRYJ Suei90Ej5J6nX12eotVsnRrVCPkb36HAY3bEFgULmSHyDbRnXt4VMJlg6rqP8YM7cw xbgMDtpdAI9AEq7Wc2t3mT6bLGznLBLJpLdujjmpUHjcROlKsKliaPw33sBQ/lVuel irK0eTFdPU3Mckx8mpDUYFZxYJKUAQZGweM8R0C3+bin5khSmV3jOWvwSJ7ku5BLGJ cCwCQspuaSQkYYFaU2ps1EAwdJGFxfYLayg3QZGPAoc2y4S4P+/iD1eoqQIvaxbSqr /qgPE4N4CD7TA== Received: from [IPV6:2606:6d00:100:4000:cacb:9855:de1f:ded2] (unknown [IPv6:2606:6d00:100:4000:cacb:9855:de1f:ded2]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4XLZKG0RJxzcCP; Sat, 5 Oct 2024 14:52:18 -0400 (EDT) Message-ID: Date: Sat, 5 Oct 2024 14:50:17 -0400 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH v2 3/4] hp: Implement Hazard Pointers To: Peter Zijlstra Cc: Boqun Feng , linux-kernel@vger.kernel.org, Linus Torvalds , Andrew Morton , Nicholas Piggin , Michael Ellerman , Greg Kroah-Hartman , Sebastian Andrzej Siewior , "Paul E. McKenney" , Will Deacon , Alan Stern , John Stultz , Neeraj Upadhyay , Frederic Weisbecker , Joel Fernandes , Josh Triplett , Uladzislau Rezki , Steven Rostedt , Lai Jiangshan , Zqiang , Ingo Molnar , Waiman Long , Mark Rutland , Thomas Gleixner , Vlastimil Babka , maged.michael@gmail.com, Mateusz Guzik , Jonas Oberhauser , rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@lists.linux.dev References: <20241004182734.1761555-1-mathieu.desnoyers@efficios.com> <20241004182734.1761555-4-mathieu.desnoyers@efficios.com> <20241005160444.GA18071@noisy.programming.kicks-ass.net> From: Mathieu Desnoyers Content-Language: en-US In-Reply-To: <20241005160444.GA18071@noisy.programming.kicks-ass.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspam-User: X-Stat-Signature: yb6ubsuhyzifa33w15adz1zxncde9pdh X-Rspamd-Queue-Id: 9BD1240003 X-Rspamd-Server: rspam02 X-HE-Tag: 1728154339-716022 X-HE-Meta: U2FsdGVkX1/TqJ7YhRn6hcjZZi8Y0w1u12GRgHMedYBEIYo4PFmoeyEJaYaTehvRdaFjcOjfFgDVh5tp5+PpPvYTMSoqccrPZ1OSNXnmFu9gmnZqsZxn3R1ZmLgmjd9973AoUqylE5LpottukXj59Ulk4lFe3bLw1+VxlK8wjRjIXUrguauMuWvlaIepvX5k7yeowz/hpt2k9I3n228uIev31ZCYUxBl3fK1TzqCE5u8vw6boGlrsU6zHFkfrFX5Or1sjY0J/16+nh7c8nakT9clwCHkTcMkt+4Hrbb8EEgXImd1rW7V3WnJcSHLBTfc1kCI3O0sih9as/32QvBs3IZ/wynnvxLtjVYZctPhXNVupInoFhASRAupmA/oLGZv1853TVLVNK+SVO1UJ1OIMeHhyiTCgaLHAxOE7/hk4r6X596BkNl5hk+h5cTc2N97CuPCRyoGUQWiEeSUiEH7E2uHoZgbX1WGdSwgUijh9E5GvwWIVFDKTsVbbxUylJjeCX1AoHMbgYQW7FxFGf4r6xyR2PBOXZrVtGJib9BoSEBUj+UEP+nfap73cYOAHJ2wxlDkRKdi4TRIIoPzLqk1Bfc1wAbxLXXVbCUFbFezW6IjLGdm6NyXoHAIqlJiN+9eUvzeMErZWol1U05k0e7ugddMwKgboqrlZLYhvF2KiAJGjP2cYFSkDTuqHIgEf7eWoJ8ZBzDzNn46NmfV5M8LZuk4xFzFSuLoXoCPv+tIV/2GBK5k9Wr0PM+J+wi503QMoTjSPU725WpAOKTxFXP1MhF7vZLz1Nw7EoQ9ZyGMFsCf74EfFUXa06bIUo3+KGzTOVYCgWdkTGmzDZwxDXSkeD5JIZ8RFdtMnAkjio5oq1iV+u++0Hv82tbLLtl4u1p+pQtsAZu/4jL3X/0XzXY5YoqwIX4wh/BDE+acyfQz4ktXKhJlrIcUBWQQWXE7A9ob82cAHsizIUELMgku/7R CINVIXKc YFmgSzDHA/h9JcyZ79nKkD2LYbVyVnDAVNitpaCJiUD148yNV3dSg5GgUleYb/T4OxO/p9ci8OrSRC+ObTD0qIRgMBpDEB2t5BUEByrlrbzeBwSbKetPN++NeNeTYgbxq4o+PzgoK9INl2Cciq8OERXRhxcT7RfQ+ylWlP8qo6XRS8WGzUErx/EApiIua5N6V3VpBIx3+NAM/1qKC2kwkpwJjERM1eUDT8U0l5zFE5yp9Db9BAslcUKCjU5T/Vp06rhfhve7Hapl70J0f7apqU0NVKu0+TOAcLn2MkNsWL9L9eanNNi2qBOxMzpr/+T86KHTegNPWiKm3GIwfuYqOCN7+wI7HKbFCg3VKzEki/j1efvQBOgdbKBaA2S6PEMCFj9TgTYJJVVeQM8MNt2aS02iuio5xlOZgdDhO6RTFQn6yOc+8IedZvEWLLV2mqVwIS+gDXX2klMprwVA= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2024-10-05 18:04, Peter Zijlstra wrote: > On Fri, Oct 04, 2024 at 02:27:33PM -0400, Mathieu Desnoyers wrote: >> include/linux/hp.h | 158 +++++++++++++++++++++++++++++++++++++++++++++ >> kernel/Makefile | 2 +- >> kernel/hp.c | 46 +++++++++++++ >> 3 files changed, 205 insertions(+), 1 deletion(-) >> create mode 100644 include/linux/hp.h >> create mode 100644 kernel/hp.c >> >> diff --git a/include/linux/hp.h b/include/linux/hp.h >> new file mode 100644 >> index 000000000000..e85fc4365ea2 >> --- /dev/null >> +++ b/include/linux/hp.h >> @@ -0,0 +1,158 @@ >> +// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers >> +// >> +// SPDX-License-Identifier: LGPL-2.1-or-later >> + >> +#ifndef _LINUX_HP_H >> +#define _LINUX_HP_H >> + >> +/* >> + * HP: Hazard Pointers >> + * >> + * This API provides existence guarantees of objects through hazard >> + * pointers. >> + * >> + * It uses a fixed number of hazard pointer slots (nr_cpus) across the >> + * entire system for each HP domain. >> + * >> + * Its main benefit over RCU is that it allows fast reclaim of >> + * HP-protected pointers without needing to wait for a grace period. >> + * >> + * It also allows the hazard pointer scan to call a user-defined callback >> + * to retire a hazard pointer slot immediately if needed. This callback >> + * may, for instance, issue an IPI to the relevant CPU. >> + * >> + * References: >> + * >> + * [1]: M. M. Michael, "Hazard pointers: safe memory reclamation for >> + * lock-free objects," in IEEE Transactions on Parallel and >> + * Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004 >> + */ >> + >> +#include >> + >> +/* >> + * Hazard pointer slot. >> + */ >> +struct hp_slot { >> + void *addr; >> +}; >> + >> +/* >> + * Hazard pointer context, returned by hp_use(). >> + */ >> +struct hp_ctx { >> + struct hp_slot *slot; >> + void *addr; >> +}; >> + >> +/* >> + * hp_scan: Scan hazard pointer domain for @addr. >> + * >> + * Scan hazard pointer domain for @addr. >> + * If @retire_cb is NULL, wait to observe that each slot contains a value >> + * that differs from @addr. >> + * If @retire_cb is non-NULL, invoke @callback for each slot containing >> + * @addr. >> + */ >> +void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr, >> + void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr)); > > struct hp_domain { > struct hp_slot __percpu *slots > }; > > might clarify things a wee little. Good point. This introduces: #define DECLARE_HP_DOMAIN(domain) \ extern struct hp_domain domain #define DEFINE_HP_DOMAIN(domain) \ static DEFINE_PER_CPU(struct hp_slot, __ ## domain ## _slots); \ struct hp_domain domain = { \ .percpu_slots = &__## domain ## _slots, \ } > >> + >> +/* Get the hazard pointer context address (may be NULL). */ >> +static inline >> +void *hp_ctx_addr(struct hp_ctx ctx) >> +{ >> + return ctx.addr; >> +} > > From where I'm sitting this seems like superfluous fluff, what's wrong > with ctx.addr ? I'm OK removing the accessor and just using ctx.addr. > >> +/* >> + * hp_allocate: Allocate a hazard pointer. >> + * >> + * Allocate a hazard pointer slot for @addr. The object existence should >> + * be guaranteed by the caller. Expects to be called from preempt >> + * disable context. >> + * >> + * Returns a hazard pointer context. > > So you made the WTF'o'meter crack, this here function does not allocate > nothing. Naming is bad. At best this is something like > try-set-hazard-pointer or somesuch. I went with the naming from the 2004 paper from Maged Michael, but I agree it could be clearer. I'm tempted to go for "hp_try_post()" and "hp_remove()", basically "posting" the intent to use a pointer (as in on a metaphorical billboard), and removing it when it's done. > >> + */ >> +static inline >> +struct hp_ctx hp_allocate(struct hp_slot __percpu *percpu_slots, void *addr) >> +{ >> + struct hp_slot *slot; >> + struct hp_ctx ctx; >> + >> + if (!addr) >> + goto fail; >> + slot = this_cpu_ptr(percpu_slots); >> + /* >> + * A single hazard pointer slot per CPU is available currently. >> + * Other hazard pointer domains can eventually have a different >> + * configuration. >> + */ >> + if (READ_ONCE(slot->addr)) >> + goto fail; >> + WRITE_ONCE(slot->addr, addr); /* Store B */ >> + ctx.slot = slot; >> + ctx.addr = addr; >> + return ctx; >> + >> +fail: >> + ctx.slot = NULL; >> + ctx.addr = NULL; >> + return ctx; >> +} >> + >> +/* >> + * hp_dereference_allocate: Dereference and allocate a hazard pointer. >> + * >> + * Returns a hazard pointer context. Expects to be called from preempt >> + * disable context. >> + */ > > More terrible naming. Same as above, but additionally, I would expect a > 'dereference' to actually dereference the pointer and have a return > value of the dereferenced type. hp_dereference_try_post() ? > > This function seems to double check and update the hp_ctx thing. I'm not > at all sure yet wtf this is doing -- and the total lack of comments > aren't helping. The hp_ctx contains the outputs. The function loads *addr_p to then try_post it into a HP slot. On success, it re-reads the *addr_p (with address dependency) and if it still matches, use that as output address pointer. I'm planning to remove hp_ctx, and just have: /* * hp_try_post: Try to post a hazard pointer. * * Post a hazard pointer slot for @addr. The object existence should * be guaranteed by the caller. Expects to be called from preempt * disable context. * * Returns true if post succeeds, false otherwise. */ static inline bool hp_try_post(struct hp_domain *hp_domain, void *addr, struct hp_slot **_slot) [...] /* * hp_dereference_try_post: Dereference and try to post a hazard pointer. * * Returns a hazard pointer context. Expects to be called from preempt * disable context. */ static inline void *__hp_dereference_try_post(struct hp_domain *hp_domain, void * const * addr_p, struct hp_slot **_slot) [...] #define hp_dereference_try_post(domain, p, slot_p) \ ((__typeof__(*(p))) __hp_dereference_try_post(domain, (void * const *) p, slot_p)) /* Clear the hazard pointer in @slot. */ static inline void hp_remove(struct hp_slot *slot) [...] > >> +static inline >> +struct hp_ctx hp_dereference_allocate(struct hp_slot __percpu *percpu_slots, void * const * addr_p) >> +{ >> + void *addr, *addr2; >> + struct hp_ctx ctx; >> + >> + addr = READ_ONCE(*addr_p); >> +retry: >> + ctx = hp_allocate(percpu_slots, addr); >> + if (!hp_ctx_addr(ctx)) >> + goto fail; >> + /* Memory ordering: Store B before Load A. */ >> + smp_mb(); >> + /* >> + * Use RCU dereference without lockdep checks, because >> + * lockdep is not aware of HP guarantees. >> + */ >> + addr2 = rcu_access_pointer(*addr_p); /* Load A */ >> + /* >> + * If @addr_p content has changed since the first load, >> + * clear the hazard pointer and try again. >> + */ >> + if (!ptr_eq(addr2, addr)) { >> + WRITE_ONCE(ctx.slot->addr, NULL); >> + if (!addr2) >> + goto fail; >> + addr = addr2; >> + goto retry; >> + } >> + /* >> + * Use addr2 loaded from rcu_access_pointer() to preserve >> + * address dependency ordering. >> + */ >> + ctx.addr = addr2; >> + return ctx; >> + >> +fail: >> + ctx.slot = NULL; >> + ctx.addr = NULL; >> + return ctx; >> +} >> + >> +/* Retire the hazard pointer in @ctx. */ >> +static inline >> +void hp_retire(const struct hp_ctx ctx) >> +{ >> + smp_store_release(&ctx.slot->addr, NULL); >> +} >> + >> +#endif /* _LINUX_HP_H */ >> diff --git a/kernel/Makefile b/kernel/Makefile >> index 3c13240dfc9f..ec16de96fa80 100644 >> --- a/kernel/Makefile >> +++ b/kernel/Makefile >> @@ -7,7 +7,7 @@ obj-y = fork.o exec_domain.o panic.o \ >> cpu.o exit.o softirq.o resource.o \ >> sysctl.o capability.o ptrace.o user.o \ >> signal.o sys.o umh.o workqueue.o pid.o task_work.o \ >> - extable.o params.o \ >> + extable.o params.o hp.o \ >> kthread.o sys_ni.o nsproxy.o \ >> notifier.o ksysfs.o cred.o reboot.o \ >> async.o range.o smpboot.o ucount.o regset.o ksyms_common.o >> diff --git a/kernel/hp.c b/kernel/hp.c >> new file mode 100644 >> index 000000000000..b2447bf15300 >> --- /dev/null >> +++ b/kernel/hp.c >> @@ -0,0 +1,46 @@ >> +// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers >> +// >> +// SPDX-License-Identifier: LGPL-2.1-or-later >> + >> +/* >> + * HP: Hazard Pointers >> + */ >> + >> +#include >> +#include >> + >> +/* >> + * hp_scan: Scan hazard pointer domain for @addr. >> + * >> + * Scan hazard pointer domain for @addr. >> + * If @retire_cb is non-NULL, invoke @callback for each slot containing >> + * @addr. >> + * Wait to observe that each slot contains a value that differs from >> + * @addr before returning. >> + */ >> +void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr, >> + void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr)) >> +{ >> + int cpu; >> + >> + /* >> + * Store A precedes hp_scan(): it unpublishes addr (sets it to >> + * NULL or to a different value), and thus hides it from hazard >> + * pointer readers. >> + */ >> + >> + if (!addr) >> + return; >> + /* Memory ordering: Store A before Load B. */ >> + smp_mb(); >> + /* Scan all CPUs slots. */ >> + for_each_possible_cpu(cpu) { >> + struct hp_slot *slot = per_cpu_ptr(percpu_slots, cpu); >> + >> + if (retire_cb && smp_load_acquire(&slot->addr) == addr) /* Load B */ >> + retire_cb(cpu, slot, addr); > > Is retirce_cb allowed to cmpxchg the thing? It could, but we'd need to make sure the slot is not re-used by another hp_try_post() before the current user removes its own post. It would need to synchronize with the current HP user (e.g. with IPI). I've actually renamed retire_cb to "on_match_cb". > >> + /* Busy-wait if node is found. */ >> + while ((smp_load_acquire(&slot->addr)) == addr) /* Load B */ >> + cpu_relax(); > > This really should be using smp_cond_load_acquire() Good point, Thanks, Mathieu > >> + } >> +} -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com