From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from m15.mail.163.com (m15.mail.163.com [45.254.50.219]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 03FC2149DE1; Thu, 27 Jun 2024 12:47:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.254.50.219 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719492441; cv=none; b=QthhATtTnE+dIs4RI6x1Fqza7oXKdwJXrwm2Ffe4qkst42JHtC8C4y+063PqZsFqkQWDQK/BqDmhkgCNH+6rkvPldy6OAKeg6qPAmhpgu8CdFf2T3Fihq53mYWZ+H76N/32iAtdkSdvB0RY7EJ19RHhLgTjFi3swxpWKarGPch4= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719492441; c=relaxed/simple; bh=9O0e0/pVQKRPZRpTviOM9puWuTW1NUrnLncr2TIKr88=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=PhAfLpCILBVX7s8PZDvaL1HCVGOzidddAVSMSqyeuvMO8IW/Y0cQ9WAkCUoxL52LdgomgLvMmJR3DX9fkoYy44AiikMoIW9kIXJcNdk2Lo2ByAP+EfjlOGsl7iWmiZ6mOcnxPODJFkJcnxx73JWkdMjJLfc+XDyBuCj49nb9UWs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=163.com; spf=pass smtp.mailfrom=163.com; dkim=pass (1024-bit key) header.d=163.com header.i=@163.com header.b=S0y+nBbM; arc=none smtp.client-ip=45.254.50.219 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=163.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=163.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=163.com header.i=@163.com header.b="S0y+nBbM" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com; s=s110527; h=From:Subject:Date:Message-Id:MIME-Version; bh=pIFig ibXxiJl8Xxkvx7dP2YimD87atVk5KkBCJ4OxgE=; b=S0y+nBbMS3giy7EKQn5tD DJeTJcxZ/hSPb7pJFhAdVnA7BmDMNU0KCilVPycc62EdlxfOQ/kELnWUskfgIK/r GZ3lXS/wXYH2R4f34UvDdx7cRaEKAP9Vwb9jpLdvkmqu3cWqTHC++kZbL4BkPAAp asSxlvKD1TP/lgBcYYCBb4= Received: from localhost.localdomain (unknown [193.203.214.57]) by gzga-smtp-mta-g0-3 (Coremail) with SMTP id _____wDHr84VX31mAubSAg--.63733S4; Thu, 27 Jun 2024 20:46:15 +0800 (CST) From: ran xiaokai To: ryan.roberts@arm.com Cc: 21cnbao@gmail.com, akpm@linux-foundation.org, baolin.wang@linux.alibaba.com, david@redhat.com, ioworker0@gmail.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterx@redhat.com, ran.xiaokai@zte.com.cn, ranxiaokai627@163.com, svetly.todorov@memverge.com, vbabka@suse.cz, yang.yang29@zte.com.cn, si.hao@zte.com.cn, wangkefeng.wang@huawei.com, willy@infradead.org, ziy@nvidia.com Subject: Re: [PATCH 2/2] kpageflags: fix wrong KPF_THP on non-pmd-mappable compound pages Date: Thu, 27 Jun 2024 12:46:13 +0000 Message-Id: <20240627124613.23377-1-ranxiaokai627@163.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <4e1a1878-4133-4d78-90fa-1d5bc99d179c@arm.com> References: <4e1a1878-4133-4d78-90fa-1d5bc99d179c@arm.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-CM-TRANSID:_____wDHr84VX31mAubSAg--.63733S4 X-Coremail-Antispam: 1Uf129KBjvJXoWxJF48Wr4kXr15JF4kKrW7urg_yoWrKryUpF yrtFyDtF4ktr4Fyr17tw4UtFy8Kr13XFWrWr98Ary8Zwn0qrnrur17G3y09F9rZrn7Ar1j vF4jvF93ua4qvFJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x0JUrxhQUUUUU= X-CM-SenderInfo: xudq5x5drntxqwsxqiywtou0bp/xtbB0hkLTGWXyo-slAAAsD >On 27/06/2024 10:16, Barry Song wrote: >> On Thu, Jun 27, 2024 at 8:39?PM Ryan Roberts wrote: >>> >>> On 27/06/2024 05:10, Barry Song wrote: >>>> On Thu, Jun 27, 2024 at 2:40?AM Zi Yan wrote: >>>>> >>>>> On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote: >>>>>> On 26/06/2024 04:06, Zi Yan wrote: >>>>>>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: >>>>>>>> From: Ran Xiaokai >>>>>>>> >>>>>>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound >>>>>>>> pages, which means of any order, but KPF_THP should only be set >>>>>>>> when the folio is a 2M pmd mappable THP. >>>>>> >>>>>> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is >>>>>> currently configured? >>>>>> >>>>>> I would argue that mTHP is still THP so should still have the flag. And since >>>>>> these smaller mTHP sizes are disabled by default, only mTHP-aware user space >>>>>> will be enabling them, so I'll naively state that it should not cause compat >>>>>> issues as is. >>>>>> >>>>>> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP >>>>>> sizes to function correctly. So that would need to be reworked if making this >>>>>> change. >>>>> >>>>> + more folks working on mTHP >>>>> >>>>> I agree that mTHP is still THP, but we might want different >>>>> stats/counters for it, since people might want to keep the old THP counters >>>>> consistent. See recent commits on adding mTHP counters: >>>>> ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback >>>>> counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem") >>>>> >>>>> and changes to make THP counter to only count PMD THP: >>>>> 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for >>>>> THP split statistics") >>>>> >>>>> In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some >>>>> adjustment on tools/mm/thpmaps. >>>> >>>> It seems we have to do this though I think keeping KPF_THP and adding a >>>> separate bit like KPF_PMD_MAPPED makes more sense. but those tools >>>> relying on KPF_THP need to realize this and check the new bit , which is >>>> not done now. >>>> whether the mTHP's name is mTHP or THP will make no difference for >>>> this case:-) >>> >>> I don't quite follow your logic for that last part; If there are 2 separate >>> bits; KPF_THP and KPF_MTHP, and KPF_THP is only set for PMD-sized THP, that >>> would be a safe/compatible approach, right? Where as your suggestion requires >>> changes to existing tools to work. >> >> Right, my point is that mTHP and THP are both types of THP. The only difference >> is whether they are PMD-mapped or PTE-mapped. Adding a bit to describe how >> the page is mapped would more accurately reflect reality. However, this change >> would disrupt tools that assume KPF_THP always means PMD-mapped THP. >> Therefore, we would still need separate bits for THP and mTHP in this case. > >I think perhaps PTE- vs PMD-mapped is a separate issue. The issue at hand is >whether PKF_THP implies a fixed size (and alignment). If compat is an issue, >then PKF_THP must continue to imply PMD-size. If compat is not an issue, then >size can be determined by iterating over the entries. > >Having a mechanism to determine the level at which a block is mapped would >potentially be a useful feature, but seems orthogonal to me. > >> >> I saw Willy complain about mTHP being called "mTHP," but in this case, calling >> it "mTHP" or just "THP" doesn't change anything if old tools continue to assume >> that KPF_THP means PMD-mapped THP. > >I think Willy was just ribbing me because he preferred calling it "anonymous >large folios". That's how I took it anyway. > >> >>> >>> Thinking about this a bit more, I wonder if PKF_MTHP is the right name for a new >>> flag; We don't currently expose the term "mTHP" to user space. I can't think of >>> a better name though. >> >> Yes. If "compatibility" is a requirement, we cannot disregard it. >> >>> I'd still like to understand what is actually broken that this change is fixing. >>> Is the concern that a user could see KPF_THP and advance forward by >>> "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size / getpagesize()" entries? >>> >> >> Maybe we need an example which is thinking that KPF_THP is PMD-mapped. > >Yes, that would help. For now it is the testcase in tools/testing/selftests/mm/split_huge_page_test, if we try to split THP to other orders other than 0, the testcase will break. Maybe we can use KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL to figure out the compound page's start/end and the order. But these two flags are not for userspace memory only.