From: Dan Williams <dan.j.williams@intel.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Linux MM <linux-mm@kvack.org>, Ira Weiny <ira.weiny@intel.com>,
linux-nvdimm <linux-nvdimm@lists.01.org>,
Matthew Wilcox <willy@infradead.org>,
Jason Gunthorpe <jgg@ziepe.ca>, Jane Chu <jane.chu@oracle.com>,
Muchun Song <songmuchun@bytedance.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v1 04/11] mm/memremap: add ZONE_DEVICE support for compound pages
Date: Mon, 7 Jun 2021 13:17:33 -0700
Message-ID: <CAPcyv4j_PdzytEeabe95FrUiNVNobdJRvUE9M9j0krKQ1defBg@mail.gmail.com>
In-Reply-To: <8c922a58-c901-1ad9-5d19-1182bd6dea1e@oracle.com>
On Tue, May 18, 2021 at 10:28 AM Joao Martins <joao.m.martins@oracle.com> wrote:
>
> On 5/5/21 11:36 PM, Joao Martins wrote:
> > On 5/5/21 11:20 PM, Dan Williams wrote:
> >> On Wed, May 5, 2021 at 12:50 PM Joao Martins <joao.m.martins@oracle.com> wrote:
> >>> On 5/5/21 7:44 PM, Dan Williams wrote:
> >>>> On Thu, Mar 25, 2021 at 4:10 PM Joao Martins <joao.m.martins@oracle.com> wrote:
> >>>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> >>>>> index b46f63dcaed3..bb28d82dda5e 100644
> >>>>> --- a/include/linux/memremap.h
> >>>>> +++ b/include/linux/memremap.h
> >>>>> @@ -114,6 +114,7 @@ struct dev_pagemap {
> >>>>>  	struct completion done;
> >>>>>  	enum memory_type type;
> >>>>>  	unsigned int flags;
> >>>>> +	unsigned long align;
> >>>>
> >>>> I think this wants some kernel-doc above to indicate that non-zero
> >>>> means "use compound pages with tail-page dedup" and zero / PAGE_SIZE
> >>>> means "use non-compound base pages".
>
> [...]
>
> >>>> The non-zero value must be PAGE_SIZE, PMD_PAGE_SIZE or PUD_PAGE_SIZE.
> >>>> Hmm, maybe it should be an enum:
> >>>>
> >>>> enum devmap_geometry {
> >>>>         DEVMAP_PTE,
> >>>>         DEVMAP_PMD,
> >>>>         DEVMAP_PUD,
> >>>> };
> >>>>
> >>> I suppose a converter between devmap_geometry and page_size would be needed too? And maybe
> >>> the whole dax/nvdimm align values change meanwhile (as a followup improvement)?
> >>
> >> I think it is ok for dax/nvdimm to continue to maintain their align
> >> value because it should be ok to have 4MB align if the device really
> >> wanted. However, when it goes to map that alignment with
> >> memremap_pages() it can pick a mode. For example, it's already the
> >> case that dax->align == 1GB is mapped with DEVMAP_PTE today, so
> >> they're already separate concepts that can stay separate.
> >>
> > Gotcha.
>
> I am reconsidering part of the above. In general, yes, the devmap @align means something
> slightly different from the device @align, i.e. it describes how the metadata is laid out,
> **but** regardless of what kind of page table entries we use for the vmemmap.
>
> By using DEVMAP_PTE/PMD/PUD we might end up 1) duplicating what nvdimm/dax already
> validates in terms of allowed device @align values (i.e. PAGE_SIZE, PMD_SIZE and PUD_SIZE);
> 2) the geometry of metadata is very much tied to the value we pick for @align at namespace
> provisioning -- not the "align" we might use at mmap(); perhaps that's what you referred
> to above? -- and 3) the value of geometry actually derives from the dax device @align,
> because we will need to create compound pages representing a page size of @align value.
>
> Using your example above: you're saying that dax->align == 1G is mapped with DEVMAP_PTEs,
> but in reality the vmemmap is populated with PMD/PUD page tables (depending on what archs
> decide to do at vmemmap_populate()) and uses base pages as its metadata regardless of the
> device @align. What we really want to convey in @geometry is not page table sizes, but
> just the page size used for the vmemmap of the dax device.

Good point, the names "PTE, PMD, PUD" imply the hardware mapping size,
not the software compound page size.

> Additionally, limiting its
> value might not be desirable... if tomorrow Linux for some arch supports dax/nvdimm
> devices with 4M align or 64K align, the value of @geometry will have to reflect the 4M to
> create compound pages of order 10 for the said vmemmap.
>
> I am going to wait until you finish reviewing the remaining four patches of this series,
> but maybe this is a simple misnomer (s/align/geometry/) with a comment but without the
> DEVMAP_{PTE,PMD,PUD} enum part? Or perhaps its own struct with a value and a
> setter/getter to audit its value? Thoughts?

I do see what you mean about the confusion DEVMAP_{PTE,PMD,PUD}
introduces, but I still think the device-dax align and the
organization of the 'struct page' metadata are distinct concepts. So
I'm happy with any color of the bikeshed as long as the 2 concepts are
distinct. How about calling it "compound_page_order"? Open to other
ideas...
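
To make the "compound_page_order" idea concrete, here is a rough userspace sketch of how a page size could be turned into an order that is stored apart from the device align. It is illustration only, not code from the series: struct pgmap_sketch, size_to_compound_order() and compound_nr_pages() are hypothetical names, and 4K base pages (a page shift of 12) are assumed.

```c
#include <stddef.h>

/* Assumption: 4K base pages, as on x86-64. Not the kernel's PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for the relevant part of struct dev_pagemap. */
struct pgmap_sketch {
	unsigned long dev_align;          /* device-dax align, kept as a separate concept */
	unsigned int compound_page_order; /* 0 means non-compound base pages */
};

/*
 * Hypothetical helper: return the order such that
 * SKETCH_PAGE_SIZE << order == size, or -1 when size is smaller than a
 * base page or not a power of two.
 */
static int size_to_compound_order(unsigned long size)
{
	int order = 0;

	if (size < SKETCH_PAGE_SIZE || (size & (size - 1)))
		return -1;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;

	return order;
}

/* Number of base pages covered by one compound page of this pagemap. */
static unsigned long compound_nr_pages(const struct pgmap_sketch *pgmap)
{
	return 1UL << pgmap->compound_page_order;
}
```

Under the 4K assumption, a 4M size gives order 10 (the case Joao mentions), 2M gives order 9, and 1G gives order 18. A pagemap can also keep dev_align at 1G while compound_page_order stays 0, which is the "dax->align == 1GB mapped with base pages" situation, so the two values do not have to move together.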