Date: Wed, 25 Jan 2023 11:26:10 -0800
From: Vishal Moola
To: James Houghton
Cc: Peter Xu, Mike Kravetz, linux-mm@kvack.org, Naoya Horiguchi,
 David Rientjes, Michal Hocko, Matthew Wilcox, David Hildenbrand,
 Muchun Song
Subject: Re: A mapcount riddle

On Wed, Jan 25, 2023 at 08:22:24AM -0800, James Houghton wrote:
> On Wed, Jan 25, 2023 at 7:54 AM Peter Xu wrote:
> >
> > On Wed, Jan 25, 2023 at 07:26:49AM -0800, James Houghton wrote:
> > > > At first thought this seems bad.
> > > > However, I believe this has been the
> > > > behavior since hugetlb PMD sharing was introduced in 2006 and I am
> > > > unaware of any reported issues. I did an audit of code looking at
> > > > mapcount. In addition to the above issue with smaps, there appears
> > > > to be an issue with 'migrate_pages' where shared pages could be migrated
> > > > without appropriate privilege.
> > > >
> > > >         /* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
> > > >         if (flags & (MPOL_MF_MOVE_ALL) ||
> > > >             (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
> > > >                 if (isolate_hugetlb(page, qp->pagelist) &&
> > > >                         (flags & MPOL_MF_STRICT))
> > > >                         /*
> > > >                          * Failed to isolate page but allow migrating pages
> > > >                          * which have been queued.
> > > >                          */
> > > >                         ret = 1;
> > > >         }
> > >
> > > This isn't the exact same problem you're fixing Mike, but I want to
> > > point out a related problem.
> > >
> > > This is the generic-mm-equivalent of the hugetlb code above:
> > >
> > > static int migrate_page_add(struct page *page, struct list_head
> > > *pagelist, unsigned long flags)
> > > {
> > >         struct page *head = compound_head(page);
> > >         /*
> > >          * Avoid migrating a page that is shared with others.
> > >          */
> > >         if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) {
> > >                 if (!isolate_lru_page(head)) {
> > >                         list_add_tail(&head->lru, pagelist);
> > >                         mod_node_page_state(page_pgdat(head),
> > >                                 NR_ISOLATED_ANON + page_is_file_lru(head),
> > >                                 thp_nr_pages(head));
> > > ...
> > > }
> > >

There are also 3 functions in migrate that appear to check for a similar
condition: add_page_for_migration(), numamigrate_isolate_page(), and
migrate_misplaced_page().

> > > If you have a partially PTE-mapped THP, page_mapcount(head) will not
> > > accurately determine if a page is mapped in multiple VMAs or not (it
> > > only tells you how many times the head page is mapped).
> > >
> > > For example...
> > > 1) You could have the THP PMD-mapped in one VMA, and then one tail
> > > page of the THP can be mapped in another. page_mapcount(head) will be
> > > 1.
> > > 2) You could have two VMAs map two separate tail pages of the THP, in
> > > which case page_mapcount(head) will be 0.
> > >
> > > I bring this up because we have the same problem with HugeTLB
> > > high-granularity mapping.
> >
> > Maybe a better match here is total_mapcount() rather than page_mapcount()
> > (despite the overheads on the sub-page loop)?
>
> This would kind of fix the problem, but it would be too conservative now. :)

I agree. Interestingly, numamigrate_isolate_page() does take the
total_mapcount() approach right now, so I'm not sure how much of a
difference being more conservative makes.

> In both example 1 and 2 above, total_mapcount(head) for both would be
> 2, so that's ok. But now consider: you have one VMA that is
> PTE-mapping two pieces of the same THP. total_mapcount(head) is still
> 2, even though only a single VMA is mapping the page.
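
To make the three cases above concrete, here is a toy userspace model of the
counting. It is only a sketch under assumptions: struct toy_thp and every
toy_* helper are made up for illustration and are not kernel APIs, the 512
subpage count assumes a 2 MiB THP with 4 KiB base pages, and the counting
rules are simply the ones described in this thread (the head count sees PMD
mappings plus PTE mappings of the head subpage; the total sums every
subpage). nr_vmas is ground truth recorded by hand, which the real counters
do not have.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_SUBPAGES 512 /* assumption: 2 MiB THP, 4 KiB base pages */

/* Toy model only; this is NOT the kernel's struct page/folio layout. */
struct toy_thp {
	int pmd_maps;                  /* PMD (whole-THP) mappings */
	int pte_maps[TOY_NR_SUBPAGES]; /* PTE mappings per subpage, [0] is the head */
	int nr_vmas;                   /* ground truth: VMAs mapping any part */
};

/* Models page_mapcount(head): PMD maps plus PTE maps of the head subpage only. */
int toy_page_mapcount_head(const struct toy_thp *t)
{
	return t->pmd_maps + t->pte_maps[0];
}

/* Models total_mapcount(head): PMD maps plus PTE maps of every subpage. */
int toy_total_mapcount(const struct toy_thp *t)
{
	int total = t->pmd_maps;

	for (size_t i = 0; i < TOY_NR_SUBPAGES; i++)
		total += t->pte_maps[i];
	return total;
}

/* Record one PTE mapping of subpage idx. */
void toy_map_pte(struct toy_thp *t, size_t idx)
{
	assert(idx < TOY_NR_SUBPAGES);
	t->pte_maps[idx]++;
}

/* Case 1: PMD-mapped in one VMA, one tail page PTE-mapped in another. */
struct toy_thp toy_case_pmd_plus_tail(void)
{
	struct toy_thp t = {0};

	t.pmd_maps = 1;
	toy_map_pte(&t, 1);
	t.nr_vmas = 2;
	return t;
}

/* Case 2: two VMAs each PTE-map a different tail page. */
struct toy_thp toy_case_two_tails_two_vmas(void)
{
	struct toy_thp t = {0};

	toy_map_pte(&t, 1);
	toy_map_pte(&t, 2);
	t.nr_vmas = 2;
	return t;
}

/* Case 3: a single VMA PTE-maps two pieces of the same THP. */
struct toy_thp toy_case_two_tails_one_vma(void)
{
	struct toy_thp t = {0};

	toy_map_pte(&t, 1);
	toy_map_pte(&t, 2);
	t.nr_vmas = 1;
	return t;
}
```

In this model the head count reads 1, 0 and 0 for the three cases, so an
"== 1" test treats case 1 as unshared even though two VMAs map the THP. The
total reads 2 in all three, which catches cases 1 and 2 but also flags case 3,
where only one VMA is involved. That is the "too conservative" part.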