From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 1 Feb 2022 15:22:10 -0500 (EST)
From: Mathieu Desnoyers
To: Florian Weimer
Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng,
 "H. Peter Anvin", Paul Turner, linux-api, Christian Brauner,
 David Laight, carlos, Peter Oskolkov
Message-ID: <1075473571.25688.1643746930751.JavaMail.zimbra@efficios.com>
In-Reply-To: <87bkzqz75q.fsf@mid.deneb.enyo.de>
References: <20220201192540.10439-1-mathieu.desnoyers@efficios.com>
 <20220201192540.10439-2-mathieu.desnoyers@efficios.com>
 <87bkzqz75q.fsf@mid.deneb.enyo.de>
Subject: Re: [RFC PATCH 2/3] rseq: extend struct rseq with per thread group vcpu id
X-Mailing-List: linux-api@vger.kernel.org

----- On Feb 1, 2022, at 3:03 PM, Florian Weimer fw@deneb.enyo.de wrote:

> * Mathieu Desnoyers:
>
>> If a thread group has fewer threads than cores, or is limited to run on
>> few cores concurrently through sched affinity or cgroup cpusets, the
>> virtual cpu ids will be values close to 0, thus allowing efficient use
>> of user-space memory for per-cpu data structures.
>
> From a userspace programmer perspective, what's a good way to obtain a
> reasonable upper bound for the possible tg_vcpu_id values?

Some effective upper bounds:

- sysconf(3) _SC_NPROCESSORS_CONF,
- the number of threads which exist concurrently in the process,
- the number of cpus in the cpu affinity mask applied by sched_setaffinity,
  except in corner-case situations such as cpu hotplug removing all cpus
  from the affinity set,
- cgroup cpuset "partition" limits.

Note that AFAIR non-partition cgroup cpusets allow a cgroup to "borrow"
additional cores from the rest of the system if they are idle, therefore
allowing the number of concurrent threads to go beyond the specified limit.

> I believe not all users of cgroup cpusets change the affinity mask.

AFAIR the sched affinity mask is tweaked independently of the cgroup cpuset.
Those are two mechanisms which both affect scheduler task placement.

I would expect the user-space code to use some sensible upper bound as a
hint about how many per-vcpu data structure elements to expect (and how many
to pre-allocate), but have a "lazy initialization" fall-back in case the
vcpu id goes up to the number of configured processors - 1. And I suspect
that even the number of configured processors may change with CRIU. Roughly
along the lines of the sketch below.
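
Something like this, for instance (just a sketch: "struct per_vcpu_data" and
the per_vcpu_* helpers are made-up placeholders, and it only uses glibc's
sched_getaffinity()/CPU_COUNT() to size the initial hint):

#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct per_vcpu_data { long value; };	/* placeholder element type */

static struct per_vcpu_data *vcpu_data;	/* lazily grown array */
static size_t vcpu_data_len;		/* currently allocated elements */
static size_t vcpu_data_max;		/* hard bound: configured processors */

static int per_vcpu_init(void)
{
	cpu_set_t mask;
	long nconf = sysconf(_SC_NPROCESSORS_CONF);
	size_t hint;

	vcpu_data_max = nconf > 0 ? nconf : 1;
	/*
	 * Pre-allocate based on the affinity mask, which is usually a much
	 * smaller (and more realistic) bound than the configured processor
	 * count.
	 */
	hint = vcpu_data_max;
	if (!sched_getaffinity(0, sizeof(mask), &mask))
		hint = CPU_COUNT(&mask);
	vcpu_data = calloc(hint, sizeof(*vcpu_data));
	if (!vcpu_data)
		return -1;
	vcpu_data_len = hint;
	return 0;
}

/*
 * Lazy-initialization fall-back: if a tg_vcpu_id ever exceeds the initial
 * hint (affinity changes, cpusets borrowing idle cores, CRIU restore on a
 * larger machine, ...), grow the array instead of failing.
 * Locking of the grow path is omitted for brevity.
 */
static struct per_vcpu_data *per_vcpu_get(unsigned int vcpu_id)
{
	if (vcpu_id >= vcpu_data_len) {
		struct per_vcpu_data *tmp;
		size_t new_len = vcpu_data_max;

		if (vcpu_id >= new_len)
			new_len = vcpu_id + 1;	/* e.g. nr cpus changed after CRIU */
		tmp = realloc(vcpu_data, new_len * sizeof(*tmp));
		if (!tmp)
			return NULL;
		memset(tmp + vcpu_data_len, 0,
		       (new_len - vcpu_data_len) * sizeof(*tmp));
		vcpu_data = tmp;
		vcpu_data_len = new_len;
	}
	return &vcpu_data[vcpu_id];
}

The exact growth policy does not matter much; the hint only affects how much
gets pre-allocated, never whether a valid vcpu id can be handled.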
>
>> diff --git a/kernel/rseq.c b/kernel/rseq.c
>> index 13f6d0419f31..37b43735a400 100644
>> --- a/kernel/rseq.c
>> +++ b/kernel/rseq.c
>> @@ -86,10 +86,14 @@ static int rseq_update_cpu_node_id(struct task_struct *t)
>>  	struct rseq __user *rseq = t->rseq;
>>  	u32 cpu_id = raw_smp_processor_id();
>>  	u32 node_id = cpu_to_node(cpu_id);
>> +	u32 tg_vcpu_id = task_tg_vcpu_id(t);
>>
>>  	if (!user_write_access_begin(rseq, t->rseq_len))
>>  		goto efault;
>>  	switch (t->rseq_len) {
>> +	case offsetofend(struct rseq, tg_vcpu_id):
>> +		unsafe_put_user(tg_vcpu_id, &rseq->tg_vcpu_id, efault_end);
>> +		fallthrough;
>>  	case offsetofend(struct rseq, node_id):
>>  		unsafe_put_user(node_id, &rseq->node_id, efault_end);
>>  		fallthrough;
>
> Is the switch really useful?  I suspect it's faster to just write as
> much as possible all the time.  The switch should be well-predictable
> if running uniform userspace, but still …

The switch ensures the kernel doesn't try to write to a memory area beyond
the rseq size which has been registered by user-space. So it seems to be
useful to ensure we don't corrupt user-space memory.

Or am I missing your point?

FWIW, I appended below my signature a small userspace-side sketch showing
how the two pieces above would fit together.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
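
Appendix to the above (again only a sketch: it assumes a kernel and a
<linux/rseq.h> with this RFC applied, a registration length that covers
tg_vcpu_id, and reuses the hypothetical per_vcpu_get() helper from the
earlier sketch; "__rseq_area" is a stand-in for however the thread reaches
its registered rseq area):

#include <linux/rseq.h>		/* struct rseq with tg_vcpu_id (this RFC) */

/* Hypothetical handle to this thread's registered rseq area. */
extern __thread volatile struct rseq __rseq_area;

struct per_vcpu_data *this_vcpu_data(void)
{
	/*
	 * The id is only a placement hint: it can change at any preemption
	 * point, so protecting the chosen slot against concurrent users
	 * still relies on rseq critical sections or other synchronization.
	 */
	return per_vcpu_get(__rseq_area.tg_vcpu_id);
}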