From: Jann Horn
Date: Tue, 30 Mar 2021 18:21:07 +0200
Subject: Re: [PATCH v1 2/5] mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault/prealloc memory
To: David Hildenbrand
Cc: kernel list, Linux-MM, Andrew Morton, Arnd Bergmann, Michal Hocko,
    Oscar Salvador, Matthew Wilcox, Andrea Arcangeli, Minchan Kim,
    Jason Gunthorpe, Dave Hansen, Hugh Dickins, Rik van Riel,
    "Michael S . Tsirkin", "Kirill A . Shutemov", Vlastimil Babka,
    Richard Henderson, Ivan Kokshaysky, Matt Turner, Thomas Bogendoerfer,
    "James E.J. Bottomley", Helge Deller, Chris Zankel, Max Filippov,
    Mike Kravetz, Peter Xu, Rolf Eike Beer, linux-alpha@vger.kernel.org,
    linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
    linux-xtensa@linux-xtensa.org, linux-arch, Linux API
References: <20210317110644.25343-1-david@redhat.com>
    <20210317110644.25343-3-david@redhat.com>
    <2bab28c7-08c0-7ff0-c70e-9bf94da05ce1@redhat.com>
In-Reply-To: <2bab28c7-08c0-7ff0-c70e-9bf94da05ce1@redhat.com>

On Tue, Mar 30, 2021 at 5:01 PM David Hildenbrand wrote:
> >> +long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
> >> +			    unsigned long end, bool write, int *locked)
> >> +{
> >> +	struct mm_struct *mm = vma->vm_mm;
> >> +	unsigned long nr_pages = (end - start) / PAGE_SIZE;
> >> +	int gup_flags;
> >> +
> >> +	VM_BUG_ON(!PAGE_ALIGNED(start));
> >> +	VM_BUG_ON(!PAGE_ALIGNED(end));
> >> +	VM_BUG_ON_VMA(start < vma->vm_start, vma);
> >> +	VM_BUG_ON_VMA(end > vma->vm_end, vma);
> >> +	mmap_assert_locked(mm);
> >> +
> >> +	/*
> >> +	 * FOLL_HWPOISON: Return -EHWPOISON instead of -EFAULT when we hit
> >> +	 *		  a poisoned page.
> >> +	 * FOLL_POPULATE: Always populate memory with VM_LOCKONFAULT.
> >> +	 * !FOLL_FORCE: Require proper access permissions.
> >> +	 */
> >> +	gup_flags = FOLL_TOUCH | FOLL_POPULATE | FOLL_MLOCK | FOLL_HWPOISON;
> >> +	if (write)
> >> +		gup_flags |= FOLL_WRITE;
> >> +
> >> +	/*
> >> +	 * See check_vma_flags(): Will return -EFAULT on incompatible mappings
> >> +	 * or with insufficient permissions.
> >> +	 */
> >> +	return __get_user_pages(mm, start, nr_pages, gup_flags,
> >> +				NULL, NULL, locked);
> >
> > You mentioned in the commit message that you don't want to actually
> > dirty all the file pages and force writeback; but doesn't
> > POPULATE_WRITE still do exactly that? In follow_page_pte(), if
> > FOLL_TOUCH and FOLL_WRITE are set, we mark the page as dirty:
>
> Well, I mention that POPULATE_READ explicitly doesn't do that. I
> primarily set it because populate_vma_page_range() also sets it.
>
> Is it safe to *not* set it? IOW, fault something writable into a page
> table (where the CPU could dirty it without additional page faults)
> without marking it accessed? For me, this made logically sense. Thus I
> also understood why populate_vma_page_range() set it.

FOLL_TOUCH doesn't have anything to do with installing the PTE - it
essentially means "the caller of get_user_pages wants to read/write
the contents of the returned page, so please do the same things you
would do if userspace was accessing the page". So in particular, if
you look up a page via get_user_pages() with FOLL_WRITE|FOLL_TOUCH,
that tells the MM subsystem "I will be writing into this page directly
from the kernel, bypassing the userspace page tables, so please mark
it as dirty now so that it will be properly written back later". Part
of that is that it marks the page as recently used, which has an
effect on LRU pageout behavior, I think - as far as I understand, that
is why populate_vma_page_range() uses FOLL_TOUCH.

If you look at __get_user_pages(), you can see that it is split up
into two major parts: faultin_page() for creating PTEs, and
follow_page_mask() for grabbing pages from PTEs.
faultin_page() ignores FOLL_TOUCH completely; only follow_page_mask()
uses it.

In a way I guess maybe you do want the "mark as recently accessed"
part that FOLL_TOUCH would give you without FOLL_WRITE? But I think
you very much don't want the dirtying that FOLL_TOUCH|FOLL_WRITE leads
to. Maybe the ideal approach would be to add a new FOLL flag to say "I
only want to mark as recently used, I don't want to dirty". Or maybe
it's enough to just leave out the FOLL_TOUCH entirely, I don't know.
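
To make the split described above concrete, here is a minimal userspace
sketch of which phase looks at which flag. It is a toy model under
stated assumptions, not kernel code: the MODEL_* names, their values,
struct model_result and the model_*() helpers are all made up for
illustration, and MODEL_FOLL_ACCESSED_ONLY stands in for the
hypothetical "mark as recently used, don't dirty" flag floated above.
It only tracks the side effects attributable to the FOLL_TOUCH handling
in the follow phase, and deliberately ignores any dirtying the fault
handler itself may do on a write fault.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only; not the kernel's. */
enum {
	MODEL_FOLL_WRITE         = 1 << 0,
	MODEL_FOLL_TOUCH         = 1 << 1,
	MODEL_FOLL_ACCESSED_ONLY = 1 << 2, /* does not exist in the kernel */
};

/* Side effects this model tracks for one page. */
struct model_result {
	bool mapped;          /* fault-in phase installed a mapping */
	bool touch_dirtied;   /* follow phase marked the page dirty */
	bool touch_accessed;  /* follow phase marked the page recently used */
};

/* Models the fault-in phase: creates the mapping, never checks TOUCH. */
static void model_faultin(struct model_result *r, unsigned int flags)
{
	(void)flags; /* TOUCH-style flags are ignored in this phase */
	r->mapped = true;
}

/* Models the TOUCH handling in the follow phase. */
static void model_follow(struct model_result *r, unsigned int flags)
{
	if (flags & MODEL_FOLL_TOUCH) {
		/* TOUCH plus WRITE means "the caller will write": dirty now. */
		if (flags & MODEL_FOLL_WRITE)
			r->touch_dirtied = true;
		r->touch_accessed = true;
	} else if (flags & MODEL_FOLL_ACCESSED_ONLY) {
		/* Hypothetical flag: recently used, but never dirtied here. */
		r->touch_accessed = true;
	}
}

/* Models one lookup of a page that is not mapped yet. */
static struct model_result model_gup(unsigned int flags)
{
	struct model_result r = { false, false, false };

	model_faultin(&r, flags);
	assert(r.mapped); /* the follow phase needs a mapping to look at */
	model_follow(&r, flags);
	return r;
}
```

In this model, model_gup(MODEL_FOLL_WRITE | MODEL_FOLL_TOUCH) reports
both touch_dirtied and touch_accessed, model_gup(MODEL_FOLL_WRITE) only
reports mapped, and model_gup(MODEL_FOLL_WRITE |
MODEL_FOLL_ACCESSED_ONLY) reports touch_accessed without touch_dirtied.
Those last two correspond to the two options mentioned above: leaving
out FOLL_TOUCH entirely, or adding a new flag.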