Date: Wed, 15 Dec 2021 13:49:28 +0000
From: Matthew Wilcox
To: Peter Zijlstra
Cc: Peter Oskolkov, Ingo Molnar, Thomas Gleixner, juri.lelli@redhat.com,
    Vincent Guittot, dietmar.eggemann@arm.com, Steven Rostedt, Ben Segall,
    mgorman@suse.de, bristot@redhat.com, Linux Kernel Mailing List,
    Linux Memory Management List, linux-api@vger.kernel.org, x86@kernel.org,
    Paul Turner, Peter Oskolkov, Andrei Vagin, Jann Horn, Thierry Delisle
Subject: Re: [RFC][PATCH 0/3] sched: User Managed Concurrency Groups
References: <20211214204445.665580974@infradead.org>

On Wed, Dec 15, 2021 at 11:44:49AM +0100, Peter Zijlstra wrote:
> On Tue, Dec 14, 2021 at 07:46:25PM -0800, Peter Oskolkov wrote:
>
> > Anyway, I'll test your patchset over the next week or so and let you
> > know if anything really needed is missing (other than waking an idle
> > server if there is one on a worker wakeup; this piece is definitely
> > needed).
>
> Right, so the problem I'm having is that a single idle server ptr like
> before can trivially miss waking another idle server.
>
> Suppose:
>
> 	umcg::idle_server_tid_ptr
>
> Then the enqueue_and_wake() thing from the last patch would:
>
> 	idle_server_tid = xchg((pid_t __user *)self->idle_server_tid_ptr, 0);
>
> to consume the tid, and then use that to enqueue and wake. But what if a
> second wakeup happens right after that? There might be a second idle
> server, but we'll never find it, because userspace hasn't had time to
> update the field again.
>
> Alternatively, we do a linked list of servers, but then every such
> wakeup needs to iterate the whole list, looking for one that has
> UMCG_TF_IDLE set, or something like that, but that lookup is bad for
> performance.
>
> So I'm really not sure what way to go yet.

1. Linked lists are fugly and bad for the CPU.

2. I'm not sure how big the 'N' in 'M:N' is supposed to be.  Might be
one per hardware thread?  So it could be hundreds-to-thousands,
depending on the scale of the system.

3. The interface between user and kernel could be an array of idle tids,
maybe 16 entries long (16 * 4 = 64 bytes, just one cacheline).  As a
server finishes work, it looks for a 0 tid in the array and stores
its tid in that slot (cmpxchg, I guess, since the array will be shared
between processes).  If there are no free slots in the array, then we
definitely have 16 threads already waiting for work, so it can park itself
in whatever data structure userspace wants to use to manage idle servers.
It's up to userspace to decide when to repopulate the array of available
servers from its data structure of idle servers.
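
To make point 3 a bit more concrete, here is a rough sketch in userspace C11
atomics, not kernel code.  The names (struct idle_servers, idle_publish,
idle_take) are made up for illustration and nothing here comes from the
patchset.  It assumes a 4-byte pid_t and uses 0 to mean "empty slot".  The
waker side stands in for what the kernel would do on a worker wakeup, where
it would have to use its own user-memory accessors instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* pid_t */

#define IDLE_SLOTS 16	/* 16 * 4 = 64 bytes, assuming a 4-byte pid_t */

/* Hypothetical shared layout: one cacheline of idle server tids, 0 = empty. */
struct idle_servers {
	_Alignas(64) _Atomic pid_t tid[IDLE_SLOTS];
};

static inline void idle_servers_init(struct idle_servers *s)
{
	for (size_t i = 0; i < IDLE_SLOTS; i++)
		atomic_init(&s->tid[i], 0);
}

/*
 * Server side: advertise ourselves as idle by claiming the first empty
 * slot with a cmpxchg.  Returns false if all 16 slots are taken, in which
 * case the caller parks itself in whatever structure userspace prefers.
 */
static inline bool idle_publish(struct idle_servers *s, pid_t self)
{
	assert(self != 0);	/* 0 is reserved for "empty" */

	for (size_t i = 0; i < IDLE_SLOTS; i++) {
		pid_t expected = 0;

		if (atomic_compare_exchange_strong(&s->tid[i], &expected, self))
			return true;
	}
	return false;
}

/*
 * Waker side: consume one advertised tid with an xchg, or return 0 if no
 * server is currently advertised.  A second wakeup right after the first
 * still finds the next slot, unlike the single-pointer scheme.
 */
static inline pid_t idle_take(struct idle_servers *s)
{
	for (size_t i = 0; i < IDLE_SLOTS; i++) {
		pid_t tid = atomic_exchange(&s->tid[i], 0);

		if (tid)
			return tid;
	}
	return 0;
}
```

Each wakeup scans at most 16 slots within a single cacheline, so the cost is
bounded no matter how many servers exist.  When and how userspace refills the
array from its own pool of parked servers is left open here, as in the text
above.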