From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 4 Apr 2024 17:48:03 -0400
From: Peter Xu
To: Jason Gunthorpe
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, Michael Ellerman, Christophe Leroy,
 Matthew Wilcox, Rik van Riel, Lorenzo Stoakes, Axel Rasmussen, Yang Shi,
 John Hubbard, linux-arm-kernel@lists.infradead.org,
 "Kirill A . Shutemov", Andrew Jones, Vlastimil Babka, Mike Rapoport,
 Andrew Morton, Muchun Song, Christoph Hellwig,
 linux-riscv@lists.infradead.org, James Houghton, David Hildenbrand,
 Andrea Arcangeli, "Aneesh Kumar K . V", Mike Kravetz
Subject: Re: [PATCH v3 00/12] mm/gup: Unify hugetlb, part 2
References: <20240321220802.679544-1-peterx@redhat.com>
 <20240322161000.GJ159172@nvidia.com>
 <20240326140252.GH6245@nvidia.com>
In-Reply-To: <20240326140252.GH6245@nvidia.com>

On Tue, Mar 26, 2024 at 11:02:52AM -0300, Jason Gunthorpe wrote:
> The more I look at this the more I think we need to get to Matthew's
> idea of having some kind of generic page table API that is not tightly
> tied to level. Replacing the hugetlb trick of 'everything is a PTE'
> with 5 special cases in every place seems just horrible.
>
>    struct mm_walk_ops {
>        int (*leaf_entry)(struct mm_walk_state *state, struct mm_walk *walk);
>    }
>
> And many cases really want something like:
>    struct mm_walk_state state;
>
>    if (!mm_walk_seek_leaf(state, mm, address))
>           goto no_present
>    if (mm_walk_is_write(state)) ..
>
> And detailed walking:
>    for_each_pt_leaf(state, mm, address) {
>        if (mm_walk_is_write(state)) ..
>    }
>
> Replacing it with a mm_walk_state that retains the level or otherwise
> to allow decoding any entry composes a lot better. Forced Loop
> unrolling can get back to the current code gen in alot of places.
>
> It also makes the power stuff a bit nicer as the mm_walk_state could
> automatically retain back pointers to the higher levels in the state
> struct too...
>
> The puzzle is how to do it and still get reasonable efficient codegen,
> many operations are going to end up switching on some state->level to
> know how to decode the entry.

These discussions are definitely constructive, thanks Jason.  Very helpful.

I thought about this last week but got interrupted.  It does make sense to
me; it looks pretty generic and flexible enough as a top-level design.  At
least that's what I thought.

However, now that I've rethought it and looked more into the code, it turns
out this would be a major rewrite of almost every walker.  That doesn't
mean it's a bad idea, but I'll need to compare it against the other
approach, because there can be a huge difference in when we can get the
code ready, I think. :)

Consider that what we (or.. I) want to teach the pXd layers right now is
two things: (1) hugetlb mappings and (2) MMIO (PFN) mappings.  Both mostly
share the same generic concept in the mm walkers no matter which way we go;
they just need different treatment for different types of memory.
(2) is new work on top of the current code, while (1) is a refactoring
whose goal is to drop the hugetlb_entry() hook point.

Taking the simplest mm walker (smaps) as an example, I think most of the
code is ready thanks to THP's existence, along with helpers like
vm_normal_page[_pmd]() which should already work even for pfnmaps; the pud
layer is missing but that should be trivial.  It means we may have a chance
to drop hugetlb_entry() without a huge overhaul.

Now the important question I'm asking myself is: do we really need huge p4d
or even bigger?  It's 512GB on x86, and we've said "No 512 GiB pages yet"
(commit fe1e8c3e963) since 2017 - that is 7 years without that changing.
Meanwhile, on non-x86, p4d_leaf() is never defined.  It's also interesting
to see how much code is "ready" to handle p4d entries (by looking at
p4d_leaf() calls; much easier to see with the removal of the rest of the
huge APIs..) even though none exist.

So, would we be over-engineering if we go the generic route now?  Consider
that we already have most of the pmd/pud entries around in the mm walker
ops.  So far it sounds better to leave it for later, until it's further
justified to be useful.  And this won't block it if it's ever justified to
be needed; I'd say it can also be seen as a step forward if I manage to
remove hugetlb_entry() first.

Comments welcome (before I start to work on anything..).
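
As an illustration of the trade-off above (not code from this series), here
is a minimal userspace C sketch of a level-tagged walk state, in the spirit
of the mm_walk_state idea quoted earlier.  Every name in it (walk_state,
pt_level, leaf_size(), walk_is_write(), walk_leaves(), account_leaf(), the
TOY_* bits) is hypothetical, the entry bit layout is made up, and it assumes
4 KiB base pages with 9 index bits per level as on x86-64.  It only shows
how one leaf callback can serve every level, and where the switch on the
level ends up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout: none of this is an existing kernel API. */
enum pt_level { PT_LEVEL_PTE, PT_LEVEL_PMD, PT_LEVEL_PUD, PT_LEVEL_P4D };

#define ASSUMED_PAGE_SHIFT 12u          /* assumption: 4 KiB base pages */
#define ASSUMED_LEVEL_BITS  9u          /* assumption: 512 entries per table */

/* Made-up bit layout, only so that decoding differs by level. */
#define TOY_PTE_WRITE  (1ULL << 1)
#define TOY_HUGE_WRITE (1ULL << 2)

/* The state remembers which level the leaf was found at. */
struct walk_state {
	enum pt_level level;
	uint64_t entry;
};

/* Size in bytes of the range mapped by a leaf at the state's level. */
static inline uint64_t leaf_size(const struct walk_state *st)
{
	unsigned int shift = ASSUMED_PAGE_SHIFT +
			     ASSUMED_LEVEL_BITS * (unsigned int)st->level;

	return 1ULL << shift;
}

/* Level-agnostic query: the per-level decoding hides behind one switch. */
static inline bool walk_is_write(const struct walk_state *st)
{
	switch (st->level) {
	case PT_LEVEL_PTE:
		return (st->entry & TOY_PTE_WRITE) != 0;
	case PT_LEVEL_PMD:
	case PT_LEVEL_PUD:
	case PT_LEVEL_P4D:
		return (st->entry & TOY_HUGE_WRITE) != 0;
	}
	return false;
}

/* A single callback type replaces the per-level and hugetlb hooks. */
typedef int (*leaf_entry_fn)(const struct walk_state *st, void *priv);

/* Stand-in for a real walk: visit pre-collected leaves, stop on error. */
static int walk_leaves(const struct walk_state *leaves, size_t n,
		       leaf_entry_fn fn, void *priv)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(&leaves[i], priv);

		if (ret)
			return ret;
	}
	return 0;
}

/* smaps-like accounting that never needs to know the level itself. */
struct toy_acct {
	uint64_t total;
	uint64_t writable;
};

static int account_leaf(const struct walk_state *st, void *priv)
{
	struct toy_acct *acct = priv;

	acct->total += leaf_size(st);
	if (walk_is_write(st))
		acct->writable += leaf_size(st);
	return 0;
}
```

A caller would fill an array of walk_state entries and run
walk_leaves(leaves, n, account_leaf, &acct).  Under the stated assumptions
a p4d leaf covers 1 << 39 bytes, i.e. the 512 GiB mentioned above.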
Thanks, -- Peter Xu _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 04812C67861 for ; Thu, 4 Apr 2024 21:49:00 +0000 (UTC) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=X7drrplw; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=X7drrplw; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4V9Zy34gxRz3vZR for ; Fri, 5 Apr 2024 08:48:59 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=X7drrplw; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=X7drrplw; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=redhat.com (client-ip=170.10.129.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=peterx@redhat.com; receiver=lists.ozlabs.org) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No 
client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4V9ZxF4Vrmz3bp7 for ; Fri, 5 Apr 2024 08:48:16 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1712267293; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=d56SKhTEmtwwDufddPphJGQfZPKBUJnUzF2wWqxF3/c=; b=X7drrplwCIou89vX3EWjevFB3k1McXSAdYxQHkKnVOQcso0AgIKdnIzEELCEKOAIoD1fwt Gbw+7fHRHMEo2PtwvXA0J/qD6rhU8NUbcRH8wJMDMt9A0eMQFEsyEADKYfapYvdHGOilc1 CDRTPJ4V+CnQ3ScdsiOgbH5l5Wi+/MI= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1712267293; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=d56SKhTEmtwwDufddPphJGQfZPKBUJnUzF2wWqxF3/c=; b=X7drrplwCIou89vX3EWjevFB3k1McXSAdYxQHkKnVOQcso0AgIKdnIzEELCEKOAIoD1fwt Gbw+7fHRHMEo2PtwvXA0J/qD6rhU8NUbcRH8wJMDMt9A0eMQFEsyEADKYfapYvdHGOilc1 CDRTPJ4V+CnQ3ScdsiOgbH5l5Wi+/MI= Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-439-mJVxAp-aMGiM_jqNuvmgnA-1; Thu, 04 Apr 2024 17:48:12 -0400 X-MC-Unique: mJVxAp-aMGiM_jqNuvmgnA-1 Received: by mail-qk1-f200.google.com with SMTP id af79cd13be357-78d41af5bebso33979685a.0 for ; Thu, 04 Apr 2024 14:48:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712267292; x=1712872092; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=d56SKhTEmtwwDufddPphJGQfZPKBUJnUzF2wWqxF3/c=; b=SrkDz11vhWENUhLyELB5AxkDGYFa324ObMHsiVrojloOs01awtABtN7PmQCWGRr137 
/vawii19/BGV64I6diuHeU701JXeGVHCYC1irZiILeDyx9Q7BnMEK0y0PoWVGcEuz3fn csjD6vmxR/ulYodt100JWgvR8HSb/oHAqNPVmLkBZlTir7BtUEGqdh0cjNUpNdTvv3cb iJge1Tt1yjEh5cAJ3aCjFVDJG3LTPI5ubIvqA8+LTxrRRRRmm1W2L2cm82MClSM1pYkb fSNaDwpawstklhGP8GyZ/mm8glf+NcFZF2SrkX9YmxVwriSbfYkpAcOaMMXanRsBypLO VPqw== X-Forwarded-Encrypted: i=1; AJvYcCVnyzFLbfjvgj1AEngYolgfC2glM+X1y+teynGIa6otHxfU+EhuH6eVl7Y8mKrHAKebzwvDvV6oj4GLAAxlJ5TF5GCgGMaYH93bbbiVbw== X-Gm-Message-State: AOJu0YwtW3vblxjgtGpDRt6jRvPEdmAgVU2+S1Fn/9v0J4pLIgei9AjD HBnCrwR85GrV0qT+rEwExeCG6gsGS4xdldnXRCA6qW+KgMJlgLciNeZ6CdlSUtV0fLG8Sd9mZNZ byAP4MIllrA/cpsFPXL883q/n5HfxPr/A7gtW8XnCfIy8JMlBloKC6VVHO79ArKI= X-Received: by 2002:a05:620a:4710:b0:78d:3b13:f5ab with SMTP id bs16-20020a05620a471000b0078d3b13f5abmr4681200qkb.0.1712267291537; Thu, 04 Apr 2024 14:48:11 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGchKyZTo6ytc9p0sjvCKGDIoG+ERHoc+VJC1tNlLSTRdcMUNyMf3HEESrAIicSQ8HrtIJ6pQ== X-Received: by 2002:a05:620a:4710:b0:78d:3b13:f5ab with SMTP id bs16-20020a05620a471000b0078d3b13f5abmr4680952qkb.0.1712267286207; Thu, 04 Apr 2024 14:48:06 -0700 (PDT) Received: from x1n ([99.254.121.117]) by smtp.gmail.com with ESMTPSA id wg6-20020a05620a568600b00789e49808ffsm105555qkn.105.2024.04.04.14.48.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 Apr 2024 14:48:05 -0700 (PDT) Date: Thu, 4 Apr 2024 17:48:03 -0400 From: Peter Xu To: Jason Gunthorpe Subject: Re: [PATCH v3 00/12] mm/gup: Unify hugetlb, part 2 Message-ID: References: <20240321220802.679544-1-peterx@redhat.com> <20240322161000.GJ159172@nvidia.com> <20240326140252.GH6245@nvidia.com> MIME-Version: 1.0 In-Reply-To: <20240326140252.GH6245@nvidia.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
Cc: James Houghton , David Hildenbrand , Yang Shi , Andrew Jones , linux-mm@kvack.org, linux-riscv@lists.infradead.org, Andrea Arcangeli , "Aneesh Kumar K . V" , Matthew Wilcox , Christoph Hellwig , linux-arm-kernel@lists.infradead.org, Axel Rasmussen , Rik van Riel , John Hubbard , "Kirill A . Shutemov" , Vlastimil Babka , Lorenzo Stoakes , Muchun Song , linux-kernel@vger.kernel.org, Andrew Morton , linuxppc-dev@lists.ozlabs.org, Mike Rapoport , Mike Kravetz Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On Tue, Mar 26, 2024 at 11:02:52AM -0300, Jason Gunthorpe wrote: > The more I look at this the more I think we need to get to Matthew's > idea of having some kind of generic page table API that is not tightly > tied to level. Replacing the hugetlb trick of 'everything is a PTE' > with 5 special cases in every place seems just horrible. > > struct mm_walk_ops { > int (*leaf_entry)(struct mm_walk_state *state, struct mm_walk *walk); > } > > And many cases really want something like: > struct mm_walk_state state; > > if (!mm_walk_seek_leaf(state, mm, address)) > goto no_present > if (mm_walk_is_write(state)) .. > > And detailed walking: > for_each_pt_leaf(state, mm, address) { > if (mm_walk_is_write(state)) .. > } > > Replacing it with a mm_walk_state that retains the level or otherwise > to allow decoding any entry composes a lot better. Forced Loop > unrolling can get back to the current code gen in alot of places. > > It also makes the power stuff a bit nicer as the mm_walk_state could > automatically retain back pointers to the higher levels in the state > struct too... > > The puzzle is how to do it and still get reasonable efficient codegen, > many operations are going to end up switching on some state->level to > know how to decode the entry. These discussions are definitely constructive, thanks Jason. Very helpful. I thought about this last week but got interrupted. 
It does make sense to me; it looks pretty generic and it is flexible enough as a top design. At least that's what I thought. However now when I rethink about it, and look more into the code when I got the chance, it turns out this will be a major rewrite of mostly every walkers.. it doesn't mean that this is a bad idea, but then I'll need to compare the other approach, because there can be a huge difference on when we can get that code ready, I think. :) Consider that what we (or.. I) want to teach the pXd layers are two things right now: (1) hugetlb mappings (2) MMIO (PFN) mappings. That mostly shares the generic concept when working on the mm walkers no matter which way to go, just different treatment on different type of mem. (2) is on top of current code and new stuff, while (1) is a refactoring to drop hugetlb_entry() hook point as the goal. Taking a simplest mm walker (smaps) as example, I think most codes are ready thanks to THP's existance, and also like vm_normal_page[_pmd]() which should even already work for pfnmaps; pud layer is missing but that should be trivial. It means we may have chance to drop hugetlb_entry() without an huge overhaul yet. Now the important question I'm asking myself is: do we really need huge p4d or even bigger? It's 512GB on x86, and we said "No 512 GiB pages yet" (commit fe1e8c3e963) since 2017 - that is 7 years without chaning this fact. While on non-x86 p4d_leaf() never defined. Then it's also interesting to see how many codes are "ready" to handle p4d entries (by looking at p4d_leaf() calls; much easier to see with the removal of the rest huge apis..) even if none existed. So, can we over-engineer too much if we go the generic route now? Considering that we already have most of pmd/pud entries around in the mm walker ops. So far it sounds better we leave it for later, until further justifed to be useful. 
And that won't block it if it ever justified to be needed, I'd say it can also be seen as a step forward if I can make it to remove hugetlb_entry() first. Comments welcomed (before I start to work on anything..). Thanks, -- Peter Xu From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2AF5ECD1292 for ; Thu, 4 Apr 2024 21:48:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=UEiC8rIBD/eo+ogYosQj3r2yO8fh/dznfkeGUyGljaE=; b=B3+q9ZOqu9ULA2 0UAcPL/z3Ec9jiFbDget6+fPP7ogq03MCmM3sf1KqvzXne2BWS0ZJVXxIPGWhMzBtIYwosflu3xiH lvGkBaI4I1+HEY4NmdiREdHPmj59Ex09gC0IhVgeb2VvpiVgs5fNZPh/rCRPaMA2tZzyG3N6weMQ0 0K20gsT78s2LpzBSmEqKHIVAt66SBiwMvzbPkblRfLlTURd/a2X3Jg4yl6GL77s4JjPHkJZ52/FgF 8oJSfCQzdqEzwgSRsG7aJEgDXShsfXGlPQP2+yMei9J1Z9/YegdqciUVKGr1I+sklB8p8jP24j2uI XwFx1A9JTpRGfqNzBt5g==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1rsUwR-00000004NNH-017S; Thu, 04 Apr 2024 21:48:15 +0000 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux)) id 1rsUwN-00000004NMS-28eU for linux-arm-kernel@lists.infradead.org; Thu, 04 Apr 2024 21:48:13 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=redhat.com; s=mimecast20190719; t=1712267290; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=d56SKhTEmtwwDufddPphJGQfZPKBUJnUzF2wWqxF3/c=; b=Dp1CCNAwaqkPHaB9pYzXuG9DmKHnIt5oc7KrG0nbpDXEoNAlO12IKM5hE8fOzN7qZHHuyE Z4cMd2M2kuF6UZH8utFBFhVH8kJsa3AYUE9SRUtxkkoHRQhzKeDaxE0EV032KrD/YIc/0m GSiIWIOp5Ovoqme+Z/k0KRapRL21+Vg= Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-27-0-GvITzgN-CmxavxETcRJQ-1; Thu, 04 Apr 2024 17:48:07 -0400 X-MC-Unique: 0-GvITzgN-CmxavxETcRJQ-1 Received: by mail-qk1-f199.google.com with SMTP id af79cd13be357-78d41af5bebso33977485a.0 for ; Thu, 04 Apr 2024 14:48:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712267287; x=1712872087; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=d56SKhTEmtwwDufddPphJGQfZPKBUJnUzF2wWqxF3/c=; b=PhZCHva1bbhfLS4C82n3n9TOckp1Xzo8h7oCP2AG8gmwks908NsQbuASVK+4rVPFUQ I7d/5TPKmPH8K5/sG0hf4/YqZVbS2WL1UqRQtA/Uml8q6f33FVyTf8zZ5m3KR5Afj9Vj 3tbIKSEJucoPQeQ3tO/lUOYk10LTYNWb0lgLCuyRN7AJqdMavmnSZ/WJ66eBckdXtlSY K9bghvnmEaLzvvXO2o6+2PVPhCWNBmHPmPJnExqfPkvLL1YizsGeyln4rvUGLDaN+6vJ d7Ci3/MkMbT4gUyBebCRqYDbceTy7DnLTSw45M1tgp1Km4jppitKjhj2VgR/qq2RhboL +xvw== X-Forwarded-Encrypted: i=1; AJvYcCVLtFyuZRqBSUgKMaJAvwDEFUyMGU+iHCEqlfrb/1c4k7/T+ICkR+DZr9cD+Lk15TotCeRLgmDohpADoRcRZR6v5fCJbo0llDC8BUAgsJWYsFWVNX0= X-Gm-Message-State: AOJu0YwgyAzkSA+kyFQRTAJxebKlXjLajfXIIgmtirvdR0cqADbbpP2Y oy1MFt9zx2o0KzGpUjYJY2BBCz0xan8TRsoxd8X83LiossXnzRKln902hww/qDqMqQbeGNDqh8j 14fr0DNDvvHnAlVFHbjV8Hv+K8XnCdWYXFliqatBUccZy8EuTB2jyA2q0YOoxxEoD4yjMayRJ X-Received: by 2002:a05:620a:4710:b0:78d:3b13:f5ab with SMTP id 
bs16-20020a05620a471000b0078d3b13f5abmr4680978qkb.0.1712267286792; Thu, 04 Apr 2024 14:48:06 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGchKyZTo6ytc9p0sjvCKGDIoG+ERHoc+VJC1tNlLSTRdcMUNyMf3HEESrAIicSQ8HrtIJ6pQ== X-Received: by 2002:a05:620a:4710:b0:78d:3b13:f5ab with SMTP id bs16-20020a05620a471000b0078d3b13f5abmr4680952qkb.0.1712267286207; Thu, 04 Apr 2024 14:48:06 -0700 (PDT) Received: from x1n ([99.254.121.117]) by smtp.gmail.com with ESMTPSA id wg6-20020a05620a568600b00789e49808ffsm105555qkn.105.2024.04.04.14.48.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 Apr 2024 14:48:05 -0700 (PDT) Date: Thu, 4 Apr 2024 17:48:03 -0400 From: Peter Xu To: Jason Gunthorpe Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Michael Ellerman , Christophe Leroy , Matthew Wilcox , Rik van Riel , Lorenzo Stoakes , Axel Rasmussen , Yang Shi , John Hubbard , linux-arm-kernel@lists.infradead.org, "Kirill A . Shutemov" , Andrew Jones , Vlastimil Babka , Mike Rapoport , Andrew Morton , Muchun Song , Christoph Hellwig , linux-riscv@lists.infradead.org, James Houghton , David Hildenbrand , Andrea Arcangeli , "Aneesh Kumar K . 
V" , Mike Kravetz Subject: Re: [PATCH v3 00/12] mm/gup: Unify hugetlb, part 2 Message-ID: References: <20240321220802.679544-1-peterx@redhat.com> <20240322161000.GJ159172@nvidia.com> <20240326140252.GH6245@nvidia.com> MIME-Version: 1.0 In-Reply-To: <20240326140252.GH6245@nvidia.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Disposition: inline X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240404_144811_910335_18C12ED8 X-CRM114-Status: GOOD ( 31.09 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Tue, Mar 26, 2024 at 11:02:52AM -0300, Jason Gunthorpe wrote: > The more I look at this the more I think we need to get to Matthew's > idea of having some kind of generic page table API that is not tightly > tied to level. Replacing the hugetlb trick of 'everything is a PTE' > with 5 special cases in every place seems just horrible. > > struct mm_walk_ops { > int (*leaf_entry)(struct mm_walk_state *state, struct mm_walk *walk); > } > > And many cases really want something like: > struct mm_walk_state state; > > if (!mm_walk_seek_leaf(state, mm, address)) > goto no_present > if (mm_walk_is_write(state)) .. > > And detailed walking: > for_each_pt_leaf(state, mm, address) { > if (mm_walk_is_write(state)) .. > } > > Replacing it with a mm_walk_state that retains the level or otherwise > to allow decoding any entry composes a lot better. Forced Loop > unrolling can get back to the current code gen in alot of places. > > It also makes the power stuff a bit nicer as the mm_walk_state could > automatically retain back pointers to the higher levels in the state > struct too... 
> > The puzzle is how to do it and still get reasonable efficient codegen, > many operations are going to end up switching on some state->level to > know how to decode the entry. These discussions are definitely constructive, thanks Jason. Very helpful. I thought about this last week but got interrupted. It does make sense to me; it looks pretty generic and it is flexible enough as a top design. At least that's what I thought. However now when I rethink about it, and look more into the code when I got the chance, it turns out this will be a major rewrite of mostly every walkers.. it doesn't mean that this is a bad idea, but then I'll need to compare the other approach, because there can be a huge difference on when we can get that code ready, I think. :) Consider that what we (or.. I) want to teach the pXd layers are two things right now: (1) hugetlb mappings (2) MMIO (PFN) mappings. That mostly shares the generic concept when working on the mm walkers no matter which way to go, just different treatment on different type of mem. (2) is on top of current code and new stuff, while (1) is a refactoring to drop hugetlb_entry() hook point as the goal. Taking a simplest mm walker (smaps) as example, I think most codes are ready thanks to THP's existance, and also like vm_normal_page[_pmd]() which should even already work for pfnmaps; pud layer is missing but that should be trivial. It means we may have chance to drop hugetlb_entry() without an huge overhaul yet. Now the important question I'm asking myself is: do we really need huge p4d or even bigger? It's 512GB on x86, and we said "No 512 GiB pages yet" (commit fe1e8c3e963) since 2017 - that is 7 years without chaning this fact. While on non-x86 p4d_leaf() never defined. Then it's also interesting to see how many codes are "ready" to handle p4d entries (by looking at p4d_leaf() calls; much easier to see with the removal of the rest huge apis..) even if none existed. 
So, would we be over-engineering if we go the generic route now, considering
that we already have most of the pmd/pud entries around in the mm walker ops?
So far it sounds better to leave it for later, until it is further justified
to be useful. And that won't block it if it is ever justified to be needed;
I'd say it can also be seen as a step forward if I manage to remove
hugetlb_entry() first.

Comments welcome (before I start to work on anything..).

Thanks,

-- 
Peter Xu

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 4 Apr 2024 17:48:03 -0400
From: Peter Xu
To: Jason Gunthorpe
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Michael Ellerman, Christophe Leroy,
	Matthew Wilcox, Rik van Riel, Lorenzo Stoakes, Axel Rasmussen,
	Yang Shi, John Hubbard, linux-arm-kernel@lists.infradead.org,
	"Kirill A . Shutemov", Andrew Jones, Vlastimil Babka, Mike Rapoport,
	Andrew Morton, Muchun Song, Christoph Hellwig,
	linux-riscv@lists.infradead.org, James Houghton, David Hildenbrand,
	Andrea Arcangeli, "Aneesh Kumar K . V", Mike Kravetz
Subject: Re: [PATCH v3 00/12] mm/gup: Unify hugetlb, part 2
References: <20240321220802.679544-1-peterx@redhat.com>
	<20240322161000.GJ159172@nvidia.com>
	<20240326140252.GH6245@nvidia.com>
In-Reply-To: <20240326140252.GH6245@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org

On Tue, Mar 26, 2024 at 11:02:52AM -0300, Jason Gunthorpe wrote:
> The more I look at this the more I think we need to get to Matthew's
> idea of having some kind of generic page table API that is not tightly
> tied to level. Replacing the hugetlb trick of 'everything is a PTE'
> with 5 special cases in every place seems just horrible.
>
>    struct mm_walk_ops {
>        int (*leaf_entry)(struct mm_walk_state *state, struct mm_walk *walk);
>    }
>
> And many cases really want something like:
>    struct mm_walk_state state;
>
>    if (!mm_walk_seek_leaf(state, mm, address))
>           goto no_present
>    if (mm_walk_is_write(state)) ..
>
> And detailed walking:
>    for_each_pt_leaf(state, mm, address) {
>        if (mm_walk_is_write(state)) ..
>    }
>
> Replacing it with a mm_walk_state that retains the level or otherwise
> to allow decoding any entry composes a lot better. Forced Loop
> unrolling can get back to the current code gen in alot of places.
>
> It also makes the power stuff a bit nicer as the mm_walk_state could
> automatically retain back pointers to the higher levels in the state
> struct too...
>
> The puzzle is how to do it and still get reasonable efficient codegen,
> many operations are going to end up switching on some state->level to
> know how to decode the entry.

These discussions are definitely constructive, thanks Jason.  Very helpful.

I thought about this last week but got interrupted.  It does make sense to
me; it looks pretty generic and is flexible enough as a top-level design.
At least that's what I thought.

However, now that I rethink it and have looked more into the code when I got
the chance, it turns out this would be a major rewrite of almost every
walker..  That doesn't mean it is a bad idea, but then I'll need to compare
it with the other approach, because there can be a huge difference in when
we can get that code ready, I think. :)

Consider that what we (or.. I) want to teach the pXd layers right now is two
things: (1) hugetlb mappings and (2) MMIO (PFN) mappings.  Both mostly share
the generic concept when working on the mm walkers, no matter which way we
go; they just need different treatment for different types of memory.  (2)
is new stuff on top of the current code, while (1) is a refactoring whose
goal is to drop the hugetlb_entry() hook point.
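
To make the shape of the quoted proposal easier to picture, here is a minimal
userspace sketch of a walk state that retains the level, with a single
leaf_entry hook instead of one hook per level.  It is only a toy model under
stated assumptions: every toy_* name, the two-bit entry encoding, and the
pre-decoded array of leaves are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model in userspace; none of these names are kernel APIs. */
enum toy_level { TOY_PTE, TOY_PMD, TOY_PUD, TOY_P4D };

/* Assumed toy entry encoding: bit 0 = present, bit 1 = writable. */
#define TOY_PRESENT (1ULL << 0)
#define TOY_WRITE   (1ULL << 1)

/* The state remembers at which level the leaf was found. */
struct toy_walk_state {
	enum toy_level level;
	uint64_t entry;
};

/* One hook for every leaf, whatever its level. */
struct toy_walk_ops {
	int (*leaf_entry)(const struct toy_walk_state *state, void *private);
};

static bool toy_walk_is_write(const struct toy_walk_state *state)
{
	return (state->entry & TOY_PRESENT) && (state->entry & TOY_WRITE);
}

/* Call ops->leaf_entry on each present leaf; stop on the first nonzero return. */
static int toy_walk_leaves(const struct toy_walk_state *leaves, size_t n,
			   const struct toy_walk_ops *ops, void *private)
{
	for (size_t i = 0; i < n; i++) {
		if (!(leaves[i].entry & TOY_PRESENT))
			continue;
		int ret = ops->leaf_entry(&leaves[i], private);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example hook: count writable leaves per level; private points to 4 counters. */
static int toy_count_writable(const struct toy_walk_state *state, void *private)
{
	unsigned int *per_level = private;

	assert(state->level <= TOY_P4D);
	if (toy_walk_is_write(state))
		per_level[state->level]++;
	return 0;
}
```

A caller would fill an array of toy_walk_state, point a toy_walk_ops at
toy_count_writable, and pass an array of four counters as the private
pointer.  The hook never needs to know which level it is looking at, which
is the composability argument; the cost is that any level-specific decoding
has to branch on state->level.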
Taking the simplest mm walker (smaps) as an example, I think most of the
code is ready thanks to THP's existence, and also things like
vm_normal_page[_pmd](), which should even already work for pfnmaps; the pud
layer is missing, but that should be trivial.  It means we may have a chance
to drop hugetlb_entry() without a huge overhaul yet.

Now the important question I'm asking myself is: do we really need huge p4d
or even bigger?  It's 512GB on x86, and we have said "No 512 GiB pages yet"
(commit fe1e8c3e963) since 2017 - that is 7 years without changing this
fact.  On non-x86, p4d_leaf() is never defined.  It's also interesting to
see how much code is "ready" to handle p4d entries (by looking at p4d_leaf()
calls; much easier to see with the removal of the rest of the huge APIs..)
even though none exist.

So, would we be over-engineering if we go the generic route now, considering
that we already have most of the pmd/pud entries around in the mm walker ops?
So far it sounds better to leave it for later, until it is further justified
to be useful.  And that won't block it if it is ever justified to be needed;
I'd say it can also be seen as a step forward if I manage to remove
hugetlb_entry() first.

Comments welcome (before I start to work on anything..).

Thanks,

-- 
Peter Xu
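
As a quick check of the 512GB figure mentioned in the message, the leaf size
per level can be worked out as below.  This is only a sketch that assumes
x86-64 with 4 KiB base pages and 9 index bits per level; the toy_* names and
the numeric level indices are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: x86-64, 4 KiB base pages, 9 index bits per page-table level. */
#define TOY_PAGE_SHIFT     12
#define TOY_BITS_PER_LEVEL 9

/* Bytes mapped by one leaf entry; level 0 = pte, 1 = pmd, 2 = pud, 3 = p4d. */
static uint64_t toy_leaf_bytes(unsigned int level)
{
	assert(level <= 3);
	return 1ULL << (TOY_PAGE_SHIFT + TOY_BITS_PER_LEVEL * level);
}
```

Under these assumptions a pmd leaf covers 2^21 bytes (2 MiB), a pud leaf
2^30 bytes (1 GiB), and a p4d leaf 2^39 bytes, i.e. 512 GiB.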