Date: Tue, 1 Jun 2021 20:34:08 -0700
From: Michel Lespinasse
To: Matthew Wilcox
Cc: Michel Lespinasse, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org
Subject: Re: [LSF/MM TOPIC] mmap locking topics
Message-ID: <20210602033408.GA3229@lespinasse.org>
References: <20210601044845.GA12713@lespinasse.org>

On Tue, Jun 01, 2021 at 03:01:00PM +0100, Matthew Wilcox wrote:
> On Mon, May 31, 2021 at 09:48:45PM -0700, Michel Lespinasse wrote:
> > I - Speculative page faults
> >
> > The idea there is to avoid taking the mmap lock during page faults,
> > at least for the easier cases. This requires the fault handler to be
> > careful to avoid races with mmap writers (and most particularly
> > munmap), and when the new page is ready to be inserted into the user
> > process, to verify, at the last moment (after taking the page table
> > lock), that there has been no race between the fault handler and any
> > mmap writers. Such checks can be implemented locally, without hitting
> > any global locks, which results in very nice scalability improvements
> > when processing concurrent faults.
> >
> > I think the idea is ready for prime time, and a patchset has been proposed,
> > but it is not getting much traction yet. I suspect we will need to discuss
> > the idea in person to figure out the next steps.
>
> There is a lot of interest in this. I disagree with Michel's approach
> in that he wants to use seqlocks to detect whether any modification has
> been made to the process's address space, whereas I want to use the VMA
> tree to detect whether any modification has been made to this VMA.

I see the sequence count as being the easy & safe approach, but yes it
does have limitations that can lead to unnecessary fast path aborts.
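
To make the sequence count check concrete, here is a minimal userspace
sketch in Python. It is only an illustration and is not taken from the
proposed patchset: the names AddressSpace, write_begin, write_end and
speculative_fault are hypothetical, a plain threading.Lock stands in for
the page table lock, and memory ordering is ignored entirely.

```python
import threading


class AddressSpace:
    """Toy model of an address space guarded by a sequence count (hypothetical)."""

    def __init__(self):
        self.seq = 0                             # even: stable, odd: writer active
        self.page_table_lock = threading.Lock()  # stand-in for the page table lock
        self.pages = {}                          # address -> page contents

    def write_begin(self):
        # An mmap writer (for example munmap) makes the count odd.
        self.seq += 1

    def write_end(self):
        # Back to an even value, different from the one seen before the write.
        self.seq += 1


def speculative_fault(mm, addr, prepare_page):
    """Try to handle a fault without the mmap lock; return True on success."""
    start = mm.seq
    if start & 1:
        return False               # a writer is active: abort the fast path
    page = prepare_page(addr)      # the expensive part, done without any lock
    with mm.page_table_lock:
        if mm.seq != start:
            return False           # a writer ran in the meantime: abort
        mm.pages[addr] = page      # last-moment check passed: insert the page
        return True
```

Note that the check compares a single address-space-wide count, so a
writer that touched a completely unrelated range still forces the abort.
That is the limitation mentioned above.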
It would be nice to check the VMAs to avoid *some* of these aborts, but
I do not think that is always applicable either - I wrote about that in
https://lwn.net/ml/linux-kernel/20210430224649.GA29203@lespinasse.org/
("Thoughts about concurrency checks at the end of the page fault")

> > II - Fine grained MM locking
> >
> > A major limitation of the current mmap lock design is that it covers a
> > process's entire address space. In threaded applications, it is common
> > for threads to issue concurrent requests for non-overlapping parts of
> > the process address space - for example, one thread might be mmaping
> > new memory while another releases a different range, and a third might
> > fault within its own address range too. The current mmap lock design
> > does not take the non-overlapping ranges into consideration, and
> > consequently serialises the 3 above requests rather than letting them
> > proceed in parallel.
> >
> > There has been a lot of work spent mitigating the problem by reducing
> > the mmap lock hold times (for example, dropping the mmap lock during
> > page faults that hit disk, or lowering to a read lock during longer
> > mmap/munmap/populate operations). But this approach is hitting its
> > limits, and I think it would be better to fix the core of the problem
> > by making the mmap lock capable of allowing concurrent non-overlapping
> > operations.
> >
> > I would like to propose an approach that:
> > - separates the mmap lock into two separate locks, one that is only
> >   held for short periods of time to protect mm-wide data structures
> >   (including the vma tree), and another that functions as a range lock
> >   and can be held for longer periods of time;
> > - allows for incremental conversion from the current code to being
> >   aware of locking ranges;
> >
> > I have been maintaining a prototype for this, which has been shared
> > with a small set of people. The main holdup is with page fault
> > performance; in order to allow non-overlapping writers to proceed
> > while some page faults are in progress, the prototype needs to
> > maintain a shared structure holding addresses for each pending page
> > fault. Updating this shared structure gets very expensive in high
> > concurrency page fault benchmarks, though it seems quite unnoticeable
> > in macro benchmarks I have looked at.
>
> Here I have larger disagreements with Michel. I do not believe the
> range lock is a useful tool for this problem.

Regardless of any proposed solution, do you agree that most of the
cases where mmap lock blocking happens are between non-overlapping
memory operations that could conceivably be handled concurrently?

In other words - do you believe that range locks would be too slow to
be a useful solution, or is it that you do not think they would
actually solve the issue?

> The two topics above seem large enough, but there are other important
> users of the mmap_sem that also hit contention. /proc/$pid/maps, smaps
> and similar files hit priority inversion problems which have been reduced,
> but not solved.

Yes - I do think this would be worth discussing too. Not sure if that
is a separate topic, or if this should be brought under a larger theme.

Generally - I think there are many issues people have with mmap
locking, and it's been really hard to make progress addressing these -
even when prototype solutions exist - due to a lack of consensus (many
of the people involved have different ideas as to which of the issues
are important to them). But I think that's what makes this an
important topic to be discussed?

--
Michel "walken" Lespinasse