From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 12 Jul 2024 11:52:27 -0400
From: Peter Xu
To: Alistair Popple
Cc: dan.j.williams@intel.com, vishal.l.verma@intel.com, dave.jiang@intel.com,
	logang@deltatee.com, bhelgaas@google.com, jack@suse.cz, jgg@ziepe.ca,
	catalin.marinas@arm.com, will@kernel.org, mpe@ellerman.id.au,
	npiggin@gmail.com, dave.hansen@linux.intel.com, ira.weiny@intel.com,
	willy@infradead.org, djwong@kernel.org, tytso@mit.edu,
	linmiaohe@huawei.com, david@redhat.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-ext4@vger.kernel.org,
	linux-xfs@vger.kernel.org, jhubbard@nvidia.com, hch@lst.de,
	david@fromorbit.com, Alex Williamson
Subject: Re: [PATCH 11/13] huge_memory: Remove dead vmf_insert_pXd code
Message-ID: 
References: <400a4584f6f628998a7093aee49d9f86c592754b.1719386613.git-series.apopple@nvidia.com>
 <87zfqrw69i.fsf@nvdebian.thelocal>
 <87sewf48s6.fsf@nvdebian.thelocal>
In-Reply-To: <87sewf48s6.fsf@nvdebian.thelocal>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
On Fri, Jul 12, 2024 at 12:40:39PM +1000, Alistair Popple wrote:
> 
> Peter Xu writes:
> 
> > On Tue, Jul 09, 2024 at 02:07:31PM +1000, Alistair Popple wrote:
> >> 
> >> Peter Xu writes:
> >> 
> >> > Hi,
> >> > Alistair,
> >> >
> >> > On Thu, Jun 27, 2024 at 10:54:26AM +1000, Alistair Popple wrote:
> >> >> Now that DAX is managing page reference counts the same as normal
> >> >> pages there are no callers for vmf_insert_pXd functions so remove
> >> >> them.
> >> >>
> >> >> Signed-off-by: Alistair Popple
> >> >> ---
> >> >>  include/linux/huge_mm.h |   2 +-
> >> >>  mm/huge_memory.c        | 165 +-----------------------------------------
> >> >>  2 files changed, 167 deletions(-)
> >> >>
> >> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >> >> index 9207d8e..0fb6bff 100644
> >> >> --- a/include/linux/huge_mm.h
> >> >> +++ b/include/linux/huge_mm.h
> >> >> @@ -37,8 +37,6 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >> >>  		pmd_t *pmd, unsigned long addr, pgprot_t newprot,
> >> >>  		unsigned long cp_flags);
> >> >>
> >> >> -vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write);
> >> >> -vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write);
> >> >>  vm_fault_t dax_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write);
> >> >>  vm_fault_t dax_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write);
> >> >
> >> > There's a plan to support huge pfnmaps in VFIO, which may still make good
> >> > use of these functions.  I think it's fine to remove them but it may mean
> >> > we'll need to add them back when supporting pfnmaps with no memmap.
> >>
> >> I'm ok with that. If we need them back in future it shouldn't be too
> >> hard to add them back again. I just couldn't find any callers of them
> >> once DAX stopped using them and the usual policy is to remove unused
> >> functions.
> >
> > True.  Currently the pmd/pud helpers are only used in dax.
> >
> >>
> >> > Is it still possible to make the old API generic to both service the new
> >> > dax refcount plan, but at the meantime working for pfn injections when
> >> > there's no page struct?
> >>
> >> I don't think so - this new dax refcount plan relies on having a struct
> >> page to take references on so I don't think it makes much sense to
> >> combine it with something that doesn't have a struct page. It sounds
> >> like the situation is the analogue of vm_insert_page()
> >> vs. vmf_insert_pfn() - it's possible for both to exist but there's not
> >> really anything that can be shared between the two APIs as one has a
> >> page and the other is just a raw PFN.
> >
> > I still think most of the codes should be shared on e.g. most of sanity
> > checks, pgtable injections, pgtable deposits (for pmd) and so on.
>
> Yeah, it was mostly the BUG_ON's that weren't applicable once pXd_devmap
> went away.
>
> > To be explicit, I wonder whether something like below diff would be
> > applicable on top of the patch "huge_memory: Allow mappings of PMD sized
> > pages" in this series, which introduced dax_insert_pfn_pmd() for dax:
> >
> > $ diff origin new
> > 1c1
> > < vm_fault_t dax_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write)
> > ---
> > > vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write)
> > 55,58c55,60
> > < folio = page_folio(page);
> > < folio_get(folio);
> > < folio_add_file_rmap_pmd(folio, page, vma);
> > < add_mm_counter(mm, mm_counter_file(folio), HPAGE_PMD_NR);
> > ---
> > > if (page) {
> > > 	folio = page_folio(page);
> > > 	folio_get(folio);
> > > 	folio_add_file_rmap_pmd(folio, page, vma);
> > > 	add_mm_counter(mm, mm_counter_file(folio), HPAGE_PMD_NR);
> > > }
>
> We get the page from calling pfn_t_to_page(pfn). This is safe for the
> DAX case but is it safe to use a page returned by this more generally?

Good question.  I thought it should work when the caller doesn't set any
bit in PFN_FLAGS_MASK, but it turns out it's not the case?
As I just notice:

static inline bool pfn_t_has_page(pfn_t pfn)
{
	return (pfn.val & PFN_MAP) == PFN_MAP || (pfn.val & PFN_DEV) == 0;
}

So it looks like "no PFN_FLAGS" case should also fall into this category
of "(pfn.val & PFN_DEV) == 0"..  I'm not sure whether my understanding is
correct, though.  Maybe we'd want to double check with pfn_valid() when
it's a generic function.

>
> From an API perspective it would make more sense for the DAX code to
> pass the page rather than the pfn. I didn't do that because device DAX
> just had the PFN and this was DAX-specific code. But if we want to make
> it generic I'd rather have callers pass the page in.
>
> Of course that probably doesn't help you, because then the call would be
> vmf_insert_page_pmd() rather than a raw pfn, but as you point out there
> might be some common code we could share.

It'll be fine if it needs page*, then it'll be NULL for VFIO.  So far it
looks cleaner if it has the pgtable entry anyway to me, as that indeed
contains the pfn.  But I'd trust you more on what should it look like, as
I didn't read the whole series here.

>
> >
> > As most of the rest look very similar to what pfn injections would need..
> > and in the PoC of ours we're using vmf_insert_pfn_pmd/pud().
>
> Do you have the PoC posted anywhere so I can get an understanding of how
> this might be used?

https://github.com/xzpeter/linux/commits/vfio-pfnmap-all/

Specifically Alex's commit here:

https://github.com/xzpeter/linux/commit/afd05f1082bc78738e280f1fc1937da52b2572ed

Just a note that it's still work in progress.  Alex did run it through
(not this tree, but an older one) and it works pretty well so far.
I think it's because so far nothing involves the pfn flags, the only one
has it involved is (taking pmd as example):

insert_pfn_pmd():
	if (!pmd_none(*pmd)) {
		if (write) {
			if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) {
				WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
				goto out_unlock;
			}
			entry = pmd_mkyoung(*pmd);
			entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
			if (pmdp_set_access_flags(vma, addr, pmd, entry, 1))
				update_mmu_cache_pmd(vma, addr, pmd);
		}
		goto out_unlock;
	}

But for VFIO it'll definitely be pmd_none() here, so the whole path
ignores pfn flags so far here, I assume.

>
> > That also reminds me on whether it'll be easier to implement the new dax
> > support for page struct on top of vmf_insert_pfn_pmd/pud, rather than
> > removing the 1st then adding the new one.  Maybe it'll reduce code churns,
> > and would that also make reviews easier?
>
> Yeah, that's a good observation. I think it was just a quirk of how I
> was developing this and also not caring about the PFN case so I'll see
> what that looks like.

Great!  I hope it'll reduce the diff for this series too, so it could be
a win-win.

Thanks,

-- 
Peter Xu