Date: Tue, 1 Jun 2021 15:01:00 +0100
From: Matthew Wilcox
To: Michel Lespinasse
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org
Subject: Re: [LSF/MM TOPIC] mmap locking topics
In-Reply-To: <20210601044845.GA12713@lespinasse.org>

On Mon, May 31, 2021 at 09:48:45PM -0700, Michel Lespinasse wrote:
> I - Speculative page faults
>
> The idea there is to avoid taking the mmap lock during page faults,
> at least for the easier cases.
> This requires the fault handler to be
> careful to avoid races with mmap writers (and most particularly
> munmap), and when the new page is ready to be inserted into the user
> process, to verify, at the last moment (after taking the page table
> lock), that there has been no race between the fault handler and any
> mmap writers. Such checks can be implemented locally, without hitting
> any global locks, which results in very nice scalability improvements
> when processing concurrent faults.
>
> I think the idea is ready for prime time, and a patchset has been proposed,
> but it is not getting much traction yet. I suspect we will need to discuss
> the idea in person to figure out the next steps.

There is a lot of interest in this. I disagree with Michel's approach
in that he wants to use seqlocks to detect whether any modification has
been made to the process's address space, whereas I want to use the VMA
tree to detect whether any modification has been made to this VMA.

> II - Fine grained MM locking
>
> A major limitation of the current mmap lock design is that it covers a
> process's entire address space. In threaded applications, it is common
> for threads to issue concurrent requests for non-overlapping parts of
> the process address space - for example, one thread might be mmaping
> new memory while another releases a different range, and a third might
> fault within his own address range too. The current mmap lock design
> does not take the non-overlapping ranges into consideration, and
> consequently serialises the 3 above requests rather than letting them
> proceed in parallel.
>
> There has been a lot of work spent mitigating the problem by reducing
> the mmap lock hold times (for example, dropping the mmap lock during
> page faults that hit disk, or lowering to a read lock during longer
> mmap/munmap/populate operations).
> But this approach is hitting its
> limits, and I think it would be better to fix the core of the problem
> by making the mmap lock capable of allowing concurrent non-overlapping
> operations.
>
> I would like to propose an approach that:
> - separates the mmap lock into two separate locks, one that is only
>   held for short periods of time to protect mm-wide data structures
>   (including the vma tree), and another that functions as a range lock
>   and can be held for longer periods of time;
> - allows for incremental conversion from the current code to being
>   aware of locking ranges;
>
> I have been maintaining a prototype for this, which has been shared
> with a small set of people. The main holdup is with page fault
> performance; in order to allow non-overlapping writers to proceed
> while some page faults are in progress, the prototype needs to
> maintain a shared structure holding addresses for each pending page
> fault. Updating this shared structure gets very expensive in high
> concurrency page fault benchmarks, though it seems quite unnoticeable
> in macro benchmarks I have looked at.

Here I have larger disagreements with Michel. I do not believe the range
lock is a useful tool for this problem.

The two topics above seem large enough, but there are other important
users of the mmap_sem that also hit contention. /proc/$pid/maps, smaps
and similar files hit priority inversion problems which have been reduced,
but not solved.

The lockless VMA tree walk patches have run into problems (eg that holding
the RCU lock is not enough to prevent page table freeing), and I don't
have time to work on them right now due to the folio work.
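
To make the speculative page fault disagreement above more concrete, here
is a rough single-threaded userspace sketch of the two validation
granularities: one counter for the whole address space versus one per VMA.
Everything in it is an assumption made for illustration. The toy_* names
are hypothetical and are not kernel APIs, the per-VMA counter is only a
stand-in for whatever change detection the VMA tree would actually provide,
and there are no atomics or memory barriers, so it models the logic of the
check rather than a real lockless implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct toy_vma {
    unsigned long start, end;
    unsigned long seq;            /* bumped whenever this VMA changes */
};

struct toy_mm {
    unsigned long seq;            /* bumped on any address-space change */
    struct toy_vma vmas[2];
};

/* Stand-in for an mmap writer (e.g. munmap) changing one VMA. */
static void toy_modify_vma(struct toy_mm *mm, int idx, unsigned long new_end)
{
    mm->vmas[idx].end = new_end;
    mm->vmas[idx].seq++;
    mm->seq++;
}

/* Address-space-wide check: any writer anywhere invalidates the fault. */
static bool toy_mm_unchanged(const struct toy_mm *mm, unsigned long snap)
{
    return mm->seq == snap;
}

/* Per-VMA check: only a writer touching this VMA invalidates the fault. */
static bool toy_vma_unchanged(const struct toy_vma *vma, unsigned long snap)
{
    return vma->seq == snap;
}

/*
 * Simulate a fault in vmas[0] overlapping with a writer that shrinks
 * vmas[1]. Sets *mm_ok and *vma_ok to what each scheme would conclude.
 */
static void toy_fault_vs_unrelated_writer(bool *mm_ok, bool *vma_ok)
{
    struct toy_mm mm = {
        .seq = 0,
        .vmas = {
            { .start = 0x1000, .end = 0x2000, .seq = 0 },
            { .start = 0x8000, .end = 0xa000, .seq = 0 },
        },
    };

    /* Fault begins: snapshot both counters before speculative work. */
    unsigned long mm_snap = mm.seq;
    unsigned long vma_snap = mm.vmas[0].seq;

    /* A writer modifies the *other* VMA while the fault is in flight. */
    toy_modify_vma(&mm, 1, 0x9000);

    /* Last-moment validation, as would happen under the page table lock. */
    *mm_ok = toy_mm_unchanged(&mm, mm_snap);
    *vma_ok = toy_vma_unchanged(&mm.vmas[0], vma_snap);
}
```

In this model the address-space-wide check reports a change and the fault
would have to be retried, while the per-VMA check sees that the faulting
VMA is untouched and lets the fault complete. A writer that modifies the
faulting VMA itself would fail both checks.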