From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <498e0731-81a4-4f75-95b4-a8ad0bcc7665@huawei.com>
Date: Mon, 19 Aug 2024 21:14:26 +0800
Subject: Re: [PATCH 00/19] mm: Support huge pfnmaps
To: Peter Xu
CC: Jason Gunthorpe, Sean Christopherson, Oscar Salvador, Axel Rasmussen,
 Will Deacon, Gavin Shan, Paolo Bonzini, Zi Yan, Andrew Morton,
 Catalin Marinas, Ingo Molnar, Alistair Popple, Borislav Petkov,
 David Hildenbrand, Thomas Gleixner, Dave Hansen, Alex Williamson, Yan Zhao
References: <20240809160909.1023470-1-peterx@redhat.com>
 <20240814123715.GB2032816@nvidia.com>
 <1147332f-790e-487f-8816-1860b8744ab2@huawei.com>
From: Kefeng Wang
X-BeenThere: linux-arm-kernel@lists.infradead.org

On 2024/8/16 22:33, Peter Xu wrote:
> On Fri, Aug 16, 2024 at 11:05:33AM +0800, Kefeng Wang wrote:
>>
>>
>> On 2024/8/16 3:20, Peter Xu wrote:
>>> On Wed, Aug 14, 2024 at 09:37:15AM -0300, Jason Gunthorpe wrote:
>>>>> Currently, only x86_64 (1G+2M) and arm64 (2M) are supported.
>>>>
>>>> There is definitely interest here in extending ARM to support the 1G
>>>> size too, what is missing?
>>>
>>> Currently PUD pfnmap relies on THP_PUD config option:
>>>
>>> config ARCH_SUPPORTS_PUD_PFNMAP
>>>         def_bool y
>>>         depends on ARCH_SUPPORTS_HUGE_PFNMAP && HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>
>>> Arm64 unfortunately doesn't yet support dax 1G, so not applicable yet.
>>>
>>> Ideally, pfnmap is too simple comparing to real THPs and it shouldn't
>>> require to depend on THP at all, but we'll need things like below to land
>>> first:
>>>
>>> https://lore.kernel.org/r/20240717220219.3743374-1-peterx@redhat.com
>>>
>>> I sent that first a while ago, but I didn't collect enough inputs, and I
>>> decided to unblock this series from that, so x86_64 shouldn't be affected,
>>> and arm64 will at least start to have 2M.
>>>
>>>>
>>>>> The other trick is how to allow gup-fast working for such huge mappings
>>>>> even if there's no direct sign of knowing whether it's a normal page or
>>>>> MMIO mapping. This series chose to keep the pte_special solution, so that
>>>>> it reuses similar idea on setting a special bit to pfnmap PMDs/PUDs so that
>>>>> gup-fast will be able to identify them and fail properly.
>>>>
>>>> Make sense
>>>>
>>>>> More architectures / More page sizes
>>>>> ------------------------------------
>>>>>
>>>>> Currently only x86_64 (2M+1G) and arm64 (2M) are supported.
>>>>>
>>>>> For example, if arm64 can start to support THP_PUD one day, the huge pfnmap
>>>>> on 1G will be automatically enabled.
>>
>> A draft patch to enable THP_PUD on arm64, only passed with DEBUG_VM_PGTABLE,
>> we may test pud pfnmaps on arm64.
>
> Thanks, Kefeng. It'll be great if this works already, as simple.
>
> Might be interesting to know whether it works already if you have some
> few-GBs GPU around on the systems.
>
> Logically as long as you have HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD selected
> below, 1g pfnmap will be automatically enabled when you rebuild the kernel.
> You can double check that by looking for this:
>
>   CONFIG_ARCH_SUPPORTS_PUD_PFNMAP=y
>
> And you can try to observe the mappings by enabling dynamic debug for
> vfio_pci_mmap_huge_fault(), then map the bar with vfio-pci and read
> something from it.

I don't have such a device, but we wrote a driver which uses
vmf_insert_pfn_pmd/pud in huge_fault,

static const struct vm_operations_struct test_vm_ops = {
	.huge_fault = test_huge_fault,
	...
};

and read/write it after mmap(,2M/1G,test_fd,...). It works as expected.
Since it could be used by dax, let's send it separately.
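
For anyone wanting to repeat the userspace half of that test, here is a minimal sketch of the "mmap 2M/1G, then read/write" step. It is an illustration, not the test program that was actually run: map_and_touch() and test_device() are hypothetical helper names, the device path passed to test_device() is a placeholder for whatever node the test driver exposes, and the 4K stride assumes a 4K base page size.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define SZ_2M ((size_t)2 << 20)	/* PMD-sized mapping with 4K pages */
#define SZ_1G ((size_t)1 << 30)	/* PUD-sized mapping with 4K pages */

/*
 * Map 'len' bytes of 'fd' shared, then write and read back one 32-bit
 * word at the start of every 'stride'-byte chunk.
 * Returns 0 on success, -1 on bad arguments or mmap() failure,
 * -2 if a value read back differs from the value written.
 */
static int map_and_touch(int fd, size_t len, size_t stride)
{
	int ret = 0;
	void *map;

	if (len == 0 || stride == 0)
		return -1;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return -1;

	for (size_t off = 0; off < len; off += stride) {
		volatile uint32_t *p = (volatile uint32_t *)((char *)map + off);
		uint32_t val = (uint32_t)off ^ 0xa5a5a5a5u;

		*p = val;		/* first touch triggers the fault */
		if (*p != val) {
			ret = -2;
			break;
		}
	}

	munmap(map, len);
	return ret;
}

/* Open a device node (the path is a placeholder) and run the check. */
static int test_device(const char *path, size_t len)
{
	int fd = open(path, O_RDWR);
	int ret;

	if (fd < 0) {
		perror("open");
		return -1;
	}
	ret = map_and_touch(fd, len, 4096);
	close(fd);
	return ret;
}
```

Calling test_device() once with SZ_2M and once with SZ_1G would cover both sizes. Note that this only checks that reads and writes through the mapping succeed. Whether a fault is really served by a PMD or PUD entry depends on the driver and on the alignment of the address mmap() returns, which the sketch does not control or verify.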