Date: Thu, 15 Aug 2024 15:20:35 -0400
From: Peter Xu
To: Jason Gunthorpe
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Sean Christopherson, Oscar Salvador, Axel Rasmussen,
	linux-arm-kernel@lists.infradead.org, x86@kernel.org,
	Will Deacon, Gavin Shan, Paolo Bonzini, Zi Yan, Andrew Morton,
	Catalin Marinas, Ingo Molnar, Alistair Popple, Borislav Petkov,
	David Hildenbrand, Thomas Gleixner, kvm@vger.kernel.org,
	Dave Hansen, Alex Williamson, Yan Zhao
Subject: Re: [PATCH 00/19] mm: Support huge pfnmaps
References: <20240809160909.1023470-1-peterx@redhat.com>
	<20240814123715.GB2032816@nvidia.com>
In-Reply-To: <20240814123715.GB2032816@nvidia.com>

On Wed, Aug 14, 2024 at 09:37:15AM -0300, Jason Gunthorpe wrote:
> > Currently, only x86_64 (1G+2M) and arm64 (2M) are supported.
>
> There is definitely interest here in extending ARM to support the 1G
> size too, what is missing?

Currently PUD pfnmap relies on the THP_PUD config option:

config ARCH_SUPPORTS_PUD_PFNMAP
	def_bool y
	depends on ARCH_SUPPORTS_HUGE_PFNMAP && HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD

Arm64 unfortunately doesn't yet support dax 1G, so it's not applicable yet.

Ideally, pfnmap is much simpler than real THPs and shouldn't need to depend
on THP at all, but we'll need things like below to land first:

https://lore.kernel.org/r/20240717220219.3743374-1-peterx@redhat.com

I sent that first a while ago, but I didn't collect enough input, so I
decided to unblock this series from it; x86_64 shouldn't be affected, and
arm64 will at least start to have 2M.

>
> > The other trick is how to allow gup-fast working for such huge mappings
> > even if there's no direct sign of knowing whether it's a normal page or
> > MMIO mapping. This series chose to keep the pte_special solution, so that
> > it reuses similar idea on setting a special bit to pfnmap PMDs/PUDs so that
> > gup-fast will be able to identify them and fail properly.
>
> Make sense
>
> > More architectures / More page sizes
> > ------------------------------------
> >
> > Currently only x86_64 (2M+1G) and arm64 (2M) are supported.
> >
> > For example, if arm64 can start to support THP_PUD one day, the huge pfnmap
> > on 1G will be automatically enabled.
>
> Oh that sounds like a bigger step..

Just to mention, no real THP 1G is needed here for pfnmaps. The real gap is
only the pud helpers, which so far exist only with CONFIG_THP_PUD in
huge_memory.c.

>
> > VFIO is so far the only consumer for the huge pfnmaps after this series
> > applied. Besides above remap_pfn_range() generic optimization, device
> > driver can also try to optimize its mmap() on a better VA alignment for
> > either PMD/PUD sizes. This may, iiuc, normally require userspace changes,
> > as the driver doesn't normally decide the VA to map a bar. But I don't
> > think I know all the drivers to know the full picture.
>
> How does alignment work?
> In most cases I'm aware of the userspace does
> not use MAP_FIXED so the expectation would be for the kernel to
> automatically select a high alignment. I suppose your cases are
> working because qemu uses MAP_FIXED and naturally aligns the BAR
> addresses?
>
> > - x86_64 + AMD GPU
> >   - Needs Alex's modified QEMU to guarantee proper VA alignment to make
> >     sure all pages to be mapped with PUDs
>
> Oh :(

So I suppose this answers the question above. :)  Yes, alignment is needed.

-- 
Peter Xu
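
For readers following the alignment point: one common userspace way to get a
PMD- or PUD-aligned VA is to over-reserve address space, round up, and trim.
The sketch below is only an illustration of that general technique, not what
QEMU or this series actually does. The helper name reserve_aligned() is
hypothetical, and `size` is assumed to be a multiple of the page size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: reserve `size` bytes of address space whose start
 * is aligned to `align` (a power of two, e.g. 2M for a PMD or 1G for a
 * PUD on x86_64).  `size` must be a multiple of the page size.
 * Returns MAP_FAILED on error.
 */
static void *reserve_aligned(size_t size, size_t align)
{
	/* The round-up below only works for power-of-two alignments. */
	assert(align != 0 && (align & (align - 1)) == 0);

	/* Over-reserve so an aligned start must exist inside the range. */
	size_t span = size + align;
	uint8_t *raw = mmap(NULL, span, PROT_NONE,
			    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (raw == MAP_FAILED)
		return MAP_FAILED;

	/* Round the start up to the requested alignment. */
	uintptr_t start = ((uintptr_t)raw + align - 1) & ~((uintptr_t)align - 1);
	uint8_t *aligned = (uint8_t *)start;

	/* Give back the unused head and tail of the reservation. */
	if (aligned > raw)
		munmap(raw, (size_t)(aligned - raw));
	if (raw + span > aligned + size)
		munmap(aligned + size, (size_t)((raw + span) - (aligned + size)));

	return aligned;
}
```

A caller would then map the device region over the reservation with
MAP_FIXED, for example mmap(addr, size, prot, MAP_SHARED | MAP_FIXED, fd,
offset), where fd and offset stand for whatever the driver exposes for the
BAR. Whether the kernel then installs a huge mapping still depends on the
driver and on the physical alignment of the region.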