Date: Mon, 31 May 2021 21:48:45 -0700
From: Michel Lespinasse
To: lsf-pc@lists.linux-foundation.org
Cc: linux-mm@kvack.org
Subject: [LSF/MM TOPIC] mmap locking topics
Message-ID: <20210601044845.GA12713@lespinasse.org>

Hi,

I have two MM topics to propose for LSF/MM/BPF 2021, both in the area
of mmap lock performance:

I - Speculative page faults

The idea there is to avoid taking the mmap lock during page faults,
at least for the easier cases. This requires the fault handler to be
careful to avoid races with mmap writers (and most particularly
munmap), and, when the new page is ready to be inserted into the user
process, to verify at the last moment (after taking the page table
lock) that there has been no race between the fault handler and any
mmap writers. Such checks can be implemented locally, without hitting
any global locks, which results in very nice scalability improvements
when processing concurrent faults.

I think the idea is ready for prime time, and a patchset has been
proposed, but it is not getting much traction yet. I suspect we will
need to discuss the idea in person to figure out the next steps.

II - Fine grained MM locking

A major limitation of the current mmap lock design is that it covers
a process's entire address space. In threaded applications, it is
common for threads to issue concurrent requests for non-overlapping
parts of the process address space - for example, one thread might be
mapping new memory while another releases a different range, and a
third might fault within its own address range too. The current mmap
lock design does not take the non-overlapping ranges into
consideration, and consequently serialises these three requests
rather than letting them proceed in parallel.

There has been a lot of work spent mitigating the problem by reducing
the mmap lock hold times (for example, dropping the mmap lock during
page faults that hit disk, or downgrading to a read lock during
longer mmap/munmap/populate operations).
But this approach is hitting its limits, and I think it would be
better to fix the core of the problem by making the mmap lock capable
of allowing concurrent non-overlapping operations.

I would like to propose an approach that:

- separates the mmap lock into two separate locks, one that is only
  held for short periods of time to protect mm-wide data structures
  (including the vma tree), and another that functions as a range
  lock and can be held for longer periods of time;

- allows for incremental conversion from the current code to code
  that is aware of locking ranges.

I have been maintaining a prototype for this, which has been shared
with a small set of people. The main holdup is with page fault
performance: in order to allow non-overlapping writers to proceed
while some page faults are in progress, the prototype needs to
maintain a shared structure holding addresses for each pending page
fault. Updating this shared structure gets very expensive in high
concurrency page fault benchmarks, though it seems quite unnoticeable
in the macro benchmarks I have looked at.

Sorry for the lengthy proposal - I swear I've tried to keep it short :)

Thanks,

--
Michel "walken" Lespinasse