From: Jann Horn
Date: Mon, 21 Aug 2023 23:59:46 +0200
Subject: Re: [PATCH mm-unstable] mm/khugepaged: fix collapse_pte_mapped_thp() versus uffd
To: Hugh Dickins
In-Reply-To: <4d31abf5-56c0-9f3d-d12f-c9317936691@google.com>
List-Id: Linux on PowerPC Developers Mail List
Cc: Miaohe Lin, David Hildenbrand, Peter Zijlstra, Yang Shi, Peter Xu,
 kernel list, Song Liu, sparclinux@vger.kernel.org, Alexander Gordeev,
 Claudio Imbrenda, Will Deacon, linux-s390, Yu Zhao, Ira Weiny,
 Alistair Popple, Russell King, Matthew Wilcox, Steven Price,
 Christoph Hellwig, Jason Gunthorpe, "Aneesh Kumar K.V", Zi Yan,
 Huang Ying, Axel Rasmussen, Gerald Schaefer, Christian Borntraeger,
 Thomas Hellstrom, Ralph Campbell, Pasha Tatashin, Vasily Gorbik,
 Anshuman Khandual, Heiko Carstens, Qi Zheng, Suren Baghdasaryan,
 Vlastimil Babka, Linux ARM, SeongJae Park, Lorenzo Stoakes, Linux-MM,
 linuxppc-dev, Naoya Horiguchi, Zack Rusin, Zach O'Keefe, Vishal Moola,
 Minchan Kim, "Kirill A. Shutemov", Andrew Morton, Mel Gorman,
 "David S. Miller", Mike Rapoport, Mike Kravetz

On Mon, Aug 21, 2023 at 9:51 PM Hugh Dickins wrote:
> Jann Horn demonstrated how userfaultfd ioctl UFFDIO_COPY into a private
> shmem mapping can add valid PTEs to page table collapse_pte_mapped_thp()
> thought it had emptied: page lock on the huge page is enough to protect
> against WP faults (which find the PTE has been cleared), but not enough
> to protect against userfaultfd. "BUG: Bad rss-counter state" followed.
>
> retract_page_tables() protects against this by checking !vma->anon_vma;
> but we know that MADV_COLLAPSE needs to be able to work on private shmem
> mappings, even those with an anon_vma prepared for another part of the
> mapping; and we know that MADV_COLLAPSE needs to work on shared shmem
> mappings which are userfaultfd_armed(). Whether it needs to work on
> private shmem mappings which are userfaultfd_armed(), I'm not so sure:
> but assume that it does.

I think we couldn't rely on anon_vma here anyway, since holding the
mmap_lock in read mode doesn't prevent concurrent creation of an
anon_vma?

> Just for this case, take the pmd_lock() two steps earlier: not because
> it gives any protection against this case itself, but because ptlock
> nests inside it, and it's the dropping of ptlock which let the bug in.
> In other cases, continue to minimize the pmd_lock() hold time.

Special-casing userfaultfd like this makes me a bit uncomfortable; but
I also can't find anything other than userfaultfd that would insert
pages into regions that are khugepaged-compatible, so I guess this
works?

I guess an alternative would be to use a spin_trylock() instead of the
current pmd_lock(), and if that fails, temporarily drop the page table
lock and then restart from step 2 with both locks held - and at that
point the page table scan should be fast since we expect it to usually
be empty.
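
To make the trylock idea concrete, here is a rough userspace sketch of
the pattern, not kernel code and not what the patch does. It assumes
pthread mutexes as stand-ins for the two spinlocks; struct table,
table_is_empty(), lock_outer_keeping_inner() and try_retract() are all
hypothetical names made up for illustration. "outer" plays the role of
pmd_lock(), "inner" the page table lock, and the rescan stands in for
restarting from step 2.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ENTRIES 8

/* Hypothetical stand-in for a page table guarded by two nested locks. */
struct table {
	pthread_mutex_t outer;		/* plays the role of pmd_lock() */
	pthread_mutex_t inner;		/* plays the role of the page table lock */
	int entries[NR_ENTRIES];	/* nonzero means "entry present" */
};

/* Scan the table; the caller must hold t->inner. */
static bool table_is_empty(const struct table *t)
{
	for (size_t i = 0; i < NR_ENTRIES; i++)
		if (t->entries[i])
			return false;
	return true;
}

/*
 * Called with t->inner held; returns with both locks held.
 * Returns true if inner had to be dropped, so the caller must rescan.
 */
static bool lock_outer_keeping_inner(struct table *t)
{
	int rc;

	/* Fast path: trylock never blocks, so it cannot deadlock. */
	if (pthread_mutex_trylock(&t->outer) == 0)
		return false;

	/* Slow path: back off, then take the locks in their normal order. */
	rc = pthread_mutex_unlock(&t->inner);
	assert(rc == 0);
	rc = pthread_mutex_lock(&t->outer);
	assert(rc == 0);
	rc = pthread_mutex_lock(&t->inner);
	assert(rc == 0);
	(void)rc;
	return true;
}

/* Returns true if the table was seen empty with both locks held. */
static bool try_retract(struct table *t, bool *restarted)
{
	bool empty;

	*restarted = false;
	pthread_mutex_lock(&t->inner);
	empty = table_is_empty(t);
	if (!empty) {
		pthread_mutex_unlock(&t->inner);
		return false;
	}

	*restarted = lock_outer_keeping_inner(t);
	if (*restarted)
		empty = table_is_empty(t);	/* inner was dropped: rescan */

	/* With both locks held and the table empty, it could be retracted here. */

	pthread_mutex_unlock(&t->inner);
	pthread_mutex_unlock(&t->outer);
	return empty;
}
```

In the uncontended case the trylock succeeds and the inner lock is never
dropped, so the first scan stays valid; only when the trylock fails does
the code pay for a second scan, which should be cheap if the table is
usually empty at that point.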