Date: Wed, 25 Jan 2023 11:46:58 -0500
From: Peter Xu <peterx@redhat.com>
To: Mike Kravetz
Cc: linux-mm@kvack.org, Naoya Horiguchi, David Rientjes, Michal Hocko,
	Matthew Wilcox, David Hildenbrand, James Houghton, Muchun Song
Subject: Re: A mapcount riddle
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On Tue, Jan 24, 2023 at 03:35:38PM -0800, Mike Kravetz wrote:
> On 01/24/23 18:00, Peter Xu wrote:
> > On Tue, Jan 24, 2023 at 12:56:24PM -0800, Mike Kravetz wrote:
> > > At first thought this seems bad.  However, I believe this has been the
> > > behavior since hugetlb PMD sharing was introduced in 2006 and I am
> > > unaware of any reported issues.  I did an audit of code looking at
> > > mapcount.  In addition to the above issue with smaps, there appears
> > > to be an issue with 'migrate_pages' where shared pages could be
> > > migrated without appropriate privilege.
> > >
> > > 		/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
> > > 		if (flags & (MPOL_MF_MOVE_ALL) ||
> > > 		    (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
> > > 			if (isolate_hugetlb(page, qp->pagelist) &&
> > > 			    (flags & MPOL_MF_STRICT))
> > > 				/*
> > > 				 * Failed to isolate page but allow migrating
> > > 				 * pages which have been queued.
> > > 				 */
> > > 				ret = 1;
> > > 		}
> > >
> > > I will prepare fixes for both of these.  However, I wanted to ask if
> > > anyone has ideas about other potential issues with this?
> >
> > This reminded me whether things should be checked already before this
> > happens.  E.g. when trying to share a pmd, whether it makes sense to
> > check the vma mempolicy before doing so?
>
> Not sure I understand your question.  Are you questioning whether we
> should enter into pmd sharing if mempolicy allows movement to another
> node?
> Wouldn't this be the 'normal' case on a multi-node system?
>
> > Then the question is: if pmd sharing only happens between vmas that
> > share the same memory policy, whether the above mapcount==1 check would
> > be acceptable even if the page is shared by multiple processes.
>
> I am not a mempolicy expert, but that would still involve moving pages
> mapped by another process.  For that, CAP_SYS_NICE is required.  So my
> opinion would be that it is not allowed even if the mempolicy is the same.

Makes sense.

> > Besides, I'm also curious about the planned fix regarding the two
> > issues mentioned.
>
> My planned 'fix' is to simply check for a shared PMD
> (page_count(virt_to_page(pte))) to determine whether a page with
> mapcount == 1 is actually shared.

I think having only the current pte* won't easily work; we'll need to walk
all the page tables that map this page.

To be explicit, one page can be mapped at pgtable1, which is shared by
proc1 & proc2, and it can also be mapped at pgtable2, which is shared by
proc3 & proc4.  Then (assuming pte1* points into pgtable1):

  page_count(virt_to_page(pte1)) + page_mapcount(page)

won't be the right mapcount we're looking for.

But then, if we're going to rmap-walk all the mappings anyway, it seems
even easier to just count how many times the page is mapped while walking,
rather than relying on page_mapcount.

What David said about maintaining map/ref counts during sharing is another
approach; I'm wondering whether there's any better way to do this.

Thanks,

-- 
Peter Xu