From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 2B213ECAAD5
	for ; Mon, 29 Aug 2022 08:14:09 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S229817AbiH2IOH (ORCPT );
	Mon, 29 Aug 2022 04:14:07 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60994 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S229530AbiH2IOF (ORCPT );
	Mon, 29 Aug 2022 04:14:05 -0400
Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EC78253015;
	Mon, 29 Aug 2022 01:14:04 -0700 (PDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by ams.source.kernel.org (Postfix) with ESMTPS id A3716B80D64;
	Mon, 29 Aug 2022 08:14:03 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id C1128C433C1;
	Mon, 29 Aug 2022 08:14:01 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org;
	s=korg; t=1661760842;
	bh=YTySXLUW0r8T5ka2X3JesPLIYBGBWpbu34uaAUBqL3I=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
	b=gsaKAICYviLDNLjnlk6drKvJhFVdTkvpRNblGH+9qt+30dv+mVgEpN7sG/xDWPlav
	 6XjTsoasLcarlpik8rP04xa0Es7Wz7LAEaSiu8PTQ5Hsa4JC0FY+IKtc8L+pv1lTRO
	 fta4NRE/GB5KNRSp5ACFsfK/7/o3uzDWML4Y3xkY=
Date: Mon, 29 Aug 2022 10:13:59 +0200
From: Greg KH
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mike Kravetz, Peter Feiner, "Kirill A . Shutemov",
	Cyrill Gorcunov, Pavel Emelyanov, Jamie Liu, Hugh Dickins,
	Naoya Horiguchi, Bjorn Helgaas, Muchun Song, Peter Xu,
	stable@vger.kernel.org
Subject: Re: [PATCH 4.9-stable -- 5.19-stable] mm/hugetlb: fix hugetlb not
	supporting softdirty tracking
Message-ID:
References: <1661424546448@kroah.com>
	<20220825143258.36151-1-david@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20220825143258.36151-1-david@redhat.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Aug 25, 2022 at 04:32:58PM +0200, David Hildenbrand wrote:
> commit f96f7a40874d7c746680c0b9f57cef2262ae551f upstream.
>
> Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.
>
> I observed that hugetlb does not support/expect write-faults in shared
> mappings that would have to map the R/O-mapped page writable -- and I
> found two case where we could currently get such faults and would
> erroneously map an anon page into a shared mapping.
>
> Reproducers part of the patches.
>
> I propose to backport both fixes to stable trees. The first fix needs a
> small adjustment.
>
> This patch (of 2):
>
> Staring at hugetlb_wp(), one might wonder where all the logic for shared
> mappings is when stumbling over a write-protected page in a shared
> mapping. In fact, there is none, and so far we thought we could get away
> with that because e.g., mprotect() should always do the right thing and
> map all pages directly writable.
>
> Looks like we were wrong:
>
> --------------------------------------------------------------------------
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <errno.h>
> #include <sys/mman.h>
>
> #define HUGETLB_SIZE (2 * 1024 * 1024u)
>
> static void clear_softdirty(void)
> {
> 	int fd = open("/proc/self/clear_refs", O_WRONLY);
> 	const char *ctrl = "4";
> 	int ret;
>
> 	if (fd < 0) {
> 		fprintf(stderr, "open(clear_refs) failed\n");
> 		exit(1);
> 	}
> 	ret = write(fd, ctrl, strlen(ctrl));
> 	if (ret != strlen(ctrl)) {
> 		fprintf(stderr, "write(clear_refs) failed\n");
> 		exit(1);
> 	}
> 	close(fd);
> }
>
> int main(int argc, char **argv)
> {
> 	char *map;
> 	int fd;
>
> 	fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
> 	if (!fd) {
> 		fprintf(stderr, "open() failed\n");
> 		return -errno;
> 	}
> 	if (ftruncate(fd, HUGETLB_SIZE)) {
> 		fprintf(stderr, "ftruncate() failed\n");
> 		return -errno;
> 	}
>
> 	map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> 	if (map == MAP_FAILED) {
> 		fprintf(stderr, "mmap() failed\n");
> 		return -errno;
> 	}
>
> 	*map = 0;
>
> 	if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
> 		fprintf(stderr, "mmprotect() failed\n");
> 		return -errno;
> 	}
>
> 	clear_softdirty();
>
> 	if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
> 		fprintf(stderr, "mmprotect() failed\n");
> 		return -errno;
> 	}
>
> 	*map = 0;
>
> 	return 0;
> }
> --------------------------------------------------------------------------
>
> Above test fails with SIGBUS when there is only a single free hugetlb page.
> # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> # ./test
> Bus error (core dumped)
>
> And worse, with sufficient free hugetlb pages it will map an anonymous page
> into a shared mapping, for example, messing up accounting during unmap
> and breaking MAP_SHARED semantics:
> # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> # ./test
> # cat /proc/meminfo | grep HugePages_
> HugePages_Total: 2
> HugePages_Free: 1
> HugePages_Rsvd: 18446744073709551615
> HugePages_Surp: 0
>
> Reason in this particular case is that vma_wants_writenotify() will
> return "true", removing VM_SHARED in vma_set_page_prot() to map pages
> write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
> support softdirty tracking.
>
> Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com
> Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com
> Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
> Signed-off-by: David Hildenbrand
> Reviewed-by: Mike Kravetz
> Cc: Peter Feiner
> Cc: Kirill A. Shutemov
> Cc: Cyrill Gorcunov
> Cc: Pavel Emelyanov
> Cc: Jamie Liu
> Cc: Hugh Dickins
> Cc: Naoya Horiguchi
> Cc: Bjorn Helgaas
> Cc: Muchun Song
> Cc: Peter Xu
> Cc: [3.18+]
> Signed-off-by: Andrew Morton
> Signed-off-by: David Hildenbrand
> ---
> mm/mmap.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)

Now queued up, thanks.

greg k-h