From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: peterx@redhat.com, Andrew Morton, Hugh Dickins, Andrea Arcangeli, Mike Rapoport
Subject: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads
Date: Thu, 26 Nov 2020 17:23:59 -0500
Message-Id: <20201126222359.8120-1-peterx@redhat.com>

Faulting around on reads is in most cases helpful for performance, since continuous memory accesses may then avoid another trip through the page fault path.  However, it may not always work as expected.

For example, userfaultfd-registered regions may not be good candidates for pre-faulting around reads.

For missing mode uffds, fault-around does not help: if the page cache existed, then the page should be there already; if the page cache is not there, there is nothing else we can do either.
If the fault-around code is destined to be helpless for userfault-missing vmas, then ideally we can skip it.

For wr-protected mode uffds, erroneously faulting in those pages could let threads access the pages without the uffd server's awareness.  For example, when punching holes on uffd-wp registered shmem regions, we first try to unmap all the pages before evicting the page cache, but without locking the page (please refer to shmem_fallocate(), where unmap_mapping_range() is called before shmem_truncate_range()).  When fault-around happens near a hole being punched, we might erroneously fault in the "holes" right before they are punched.  Then there is a small window after the pages become writable again and before the page cache is finally dropped (NOTE: the uffd-wp protect information is totally lost due to the pre-unmap in shmem_fallocate(), so the pages can be writable within that window).  That's severe data loss.

Let's grant the userspace full control of the uffd-registered ranges, rather than trying to do the tricks.

Cc: Hugh Dickins
Cc: Andrea Arcangeli
Cc: Andrew Morton
Cc: Mike Rapoport
Signed-off-by: Peter Xu <peterx@redhat.com>
---

Note that since no file-backed uffd-wp support is upstream yet, the uffd-wp check is not really functioning.  However, since we have all the necessary uffd-wp concepts upstream already, maybe it's better to do it once and for all.

This patch comes from debugging a data loss issue when working on the uffd-wp support on shmem/hugetlbfs.  I am posting it out for early review and comments, but also because it should already start to benefit missing mode userfaultfd by avoiding fault-around on reads.
---
 include/linux/userfaultfd_k.h |  5 +++++
 mm/memory.c                   | 17 +++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index a8e5f3ea9bb2..451d99bb3a1a 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -62,6 +62,11 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_UFFD_WP;
 }
 
+static inline bool vma_registered_userfaultfd(struct vm_area_struct *vma)
+{
+	return userfaultfd_missing(vma) || userfaultfd_wp(vma);
+}
+
 static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
 				      pte_t pte)
 {
diff --git a/mm/memory.c b/mm/memory.c
index eeae590e526a..ca58ada94c96 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3933,6 +3933,23 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 	int off;
 	vm_fault_t ret = 0;
 
+	/*
+	 * Be extremely careful with uffd-armed regions.
+	 *
+	 * For missing mode uffds, fault around does not help because if the
+	 * page cache existed, then the page should be there already.  If the
+	 * page cache is not there, nothing else we can do either.
+	 *
+	 * For wr-protected mode uffds, erroneously faulting in those pages
+	 * could lead to threads accessing the pages without the uffd
+	 * server's awareness, finally it could cause ghostly data
+	 * corruption.
+	 *
+	 * The idea is that every single page of uffd regions should be
+	 * governed by the userspace on which page to fault in.
+	 */
+	if (unlikely(vma_registered_userfaultfd(vmf->vma)))
+		return 0;
+
 	nr_pages = READ_ONCE(fault_around_bytes) >> PAGE_SHIFT;
 	mask = ~(nr_pages * PAGE_SIZE - 1) & PAGE_MASK;
 
-- 
2.26.2