Date: Mon, 31 Jul 2023 17:54:14 +0100
From: Matthew Wilcox
To: David Hildenbrand
Cc: Rongwei Wang, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "xuyu@linux.alibaba.com"
Subject: Re: [PATCH RFC v2 0/4] Add support for sharing page tables across processes (Previously mshare)
References: <74fe50d9-9be9-cc97-e550-3ca30aebfd13@linux.alibaba.com> <9faea1cf-d3da-47ff-eb41-adc5bd73e5ca@linux.alibaba.com>

On Mon, Jul 31, 2023 at 06:48:47PM +0200, David Hildenbrand wrote:
> On 31.07.23 18:38, Matthew Wilcox wrote:
> > On Mon, Jul 31, 2023 at 06:30:22PM +0200, David Hildenbrand wrote:
> > > Assume we do do the page table sharing at mmap time, if the flags are right.
> > > Let's focus on the most common:
> > >
> > > mmap(memfd, PROT_READ | PROT_WRITE, MAP_SHARED)
> > >
> > > And doing the same in each and every process.
> >
> > That may be the most common in your usage, but for a database, you're
> > looking at two usage scenarios.  Postgres calls mmap() on the database
> > file itself so that all processes share the kernel page cache.
> > Some Commercial Databases call mmap() on a hugetlbfs file so that all
> > processes share the same userspace buffer cache.  Other Commercial
> > Databases call shmget() / shmat() with SHM_HUGETLB for the exact
> > same reason.
>
> I remember you said that postgres might be looking into using shmem as well,
> maybe I am wrong.

No, I said that postgres was also interested in sharing page tables.
I don't think they have any use for shmem.

> memfd/hugetlb/shmem could all be handled alike, just "arbitrary filesystems"
> would require more work.

But arbitrary filesystems was one of the original use cases; where the
database is stored on a persistent memory filesystem, and neither the
kernel nor userspace has a cache.  The Postgres & Commercial Database
use-cases collapse into the same case, and we want to mmap the files
directly and share the page tables.

> > This is why I proposed mshare().  Anyone can use it for anything.
> > We have such a diverse set of users who want to do stuff with shared
> > page tables that we should not be tying it to memfd or any other
> > filesystem.  Not to mention that it's more flexible; you can map
> > individual 4kB files into it and still get page table sharing.
>
> That's not what the current proposal does, or am I wrong?

I think you're wrong, but I haven't had time to read the latest patches.

> Also, I'm curious, is that a real requirement in the database world?

I don't know.  It's definitely an advantage that falls out of the design
of mshare.
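
For reference, the "mmap(memfd, PROT_READ | PROT_WRITE, MAP_SHARED) in each
and every process" pattern quoted above can be sketched roughly as below.
This is an illustration only and not code from the patch series: the
function names, the memfd name and the 2 MiB size are all made up for the
example.  It shows two processes mapping the same memfd and seeing each
other's data.  As far as I understand current mainline behaviour, each
process still populates its own page tables for such a mapping, which is
the duplication this thread is about; nothing here uses mshare().

```c
#define _GNU_SOURCE		/* for memfd_create() */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define REGION_SIZE (2UL << 20)	/* 2 MiB, arbitrary size for the demo */

/* Map the whole memfd shared and writable, as in the quoted mmap() call. */
static void *map_region(int memfd)
{
	return mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
		    MAP_SHARED, memfd, 0);
}

/*
 * Hypothetical demo helper: a child and its parent each mmap() the same
 * memfd.  Returns 0 if the parent sees the child's write, -1 otherwise.
 */
int memfd_shared_demo(void)
{
	static const char msg[] = "written by child";
	int memfd, status, ret = -1;
	char *p;
	pid_t pid;

	assert(sizeof(msg) <= REGION_SIZE);

	memfd = memfd_create("shared-demo", 0);
	if (memfd < 0)
		return -1;
	if (ftruncate(memfd, REGION_SIZE) < 0)
		goto out;

	pid = fork();
	if (pid < 0)
		goto out;
	if (pid == 0) {
		/* Child: does its own mmap() of the shared file. */
		p = map_region(memfd);
		if (p == MAP_FAILED)
			_exit(1);
		memcpy(p, msg, sizeof(msg));
		_exit(0);
	}

	/* Parent: wait for the child, then map the same file itself. */
	if (waitpid(pid, &status, 0) < 0 ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0)
		goto out;

	p = map_region(memfd);
	if (p == MAP_FAILED)
		goto out;
	if (memcmp(p, msg, sizeof(msg)) == 0)
		ret = 0;
	munmap(p, REGION_SIZE);
out:
	close(memfd);
	return ret;
}
```

The hugetlbfs and shmget() / shmat() with SHM_HUGETLB cases mentioned above
follow the same shape from userspace; only the way the backing object is
obtained differs.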