From: Jason Gunthorpe <jgg@nvidia.com>
To: Alistair Popple <apopple@nvidia.com>
Cc: linux-mm@kvack.org, jhubbard@nvidia.com, rcampbell@nvidia.com,
	willy@infradead.org, dan.j.williams@intel.com,
	david@fromorbit.com, linux-fsdevel@vger.kernel.org, jack@suse.cz,
	djwong@kernel.org, hch@lst.de, david@redhat.com
Subject: Re: ZONE_DEVICE refcounting
Date: Fri, 8 Mar 2024 09:44:18 -0400
Message-ID: <20240308134418.GH9179@nvidia.com>
In-Reply-To: <87ttlhmj9p.fsf@nvdebian.thelocal>

On Fri, Mar 08, 2024 at 03:24:35PM +1100, Alistair Popple wrote:
> Hi,
> 
> I have been looking at fixing up ZONE_DEVICE refcounting again. Specifically I
> have been looking at fixing the 1-based refcounts that are currently used for
> FS DAX pages (and p2pdma pages, but that's trivial).
> 
> This started with the simple idea of "just subtract one from the
> refcounts everywhere and that will fix the off by one". Unfortunately
> it's not that simple. For starters doing a simple conversion like that
> requires allowing pages to be mapped with zero refcounts. That seems
> wrong. It also leads to problems detecting idle IO vs. page map pages.
> 
> So instead I'm thinking of doing something along the lines of the following:
> 
> 1. Refcount FS DAX pages normally. I.e. map them with vm_insert_page() and
>    increment the refcount in line with mapcount and decrement it when pages are
>    unmapped.

This is the right thing to do.
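
As a rough illustration of point 1, here is a userspace sketch of a
refcount that moves in lockstep with mapcount and reaches zero when the
last mapping goes away. The struct and function names (model_page,
model_map, model_unmap) are hypothetical stand-ins, not kernel
interfaces, and the sketch ignores atomics and locking entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel layout. */
struct model_page {
	int refcount;	/* 0-based: zero means nobody holds the page */
	int mapcount;	/* number of page-table mappings in this model */
};

/* Mapping the page takes a reference along with the mapping. */
static void model_map(struct model_page *p)
{
	p->refcount++;
	p->mapcount++;
}

/*
 * Unmapping drops the mapping and its reference.
 * Returns true when the refcount reached zero, i.e. the page is idle.
 */
static bool model_unmap(struct model_page *p)
{
	assert(p->mapcount > 0 && p->refcount > 0);
	p->mapcount--;
	return --p->refcount == 0;
}
```

With this model an unmapped, unreferenced page has refcount 0 rather
than 1, so "idle" is simply "refcount == 0".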

> 2. As per normal pages the pages are considered free when the refcount drops
>    to zero.
> 
> 3. Because these are treated as normal pages for refcounting we no longer map
>    them as pte_devmap() (possibly freeing up a PTE bit).

Yes, the pmd/pte_devmap() should ideally go away.

> 4. PMD sized FS DAX pages get treated the same as normal compound pages.
> 
> 5. This means we need to allow compound ZONE_DEVICE pages. Tail pages share
>    the page->pgmap field with page->compound_head, but this isn't a problem
>    because the LSB of page->pgmap is free and we can still get pgmap from
>    compound_head(page)->pgmap.

Right, this is the actual work - the mm is obviously already happy
with its part, fsdax just needs to create a properly sized folio and
map it properly.
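
To make the sharing described in point 5 concrete, here is a userspace
sketch of a word that holds either a pgmap pointer (head page, LSB
clear) or a tagged head pointer (tail page, LSB set). All names here
(toy_page, toy_pgmap, toy_set_tail, toy_compound_head, toy_page_pgmap)
are hypothetical and only model the idea. The sketch also assumes
pointers are at least 2-byte aligned and round-trip through uintptr_t,
which holds on common platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dev_pagemap. */
struct toy_pgmap {
	int id;
};

/* Hypothetical stand-in for struct page: one word, two meanings. */
struct toy_page {
	union {
		uintptr_t compound_head;	/* tail page: head pointer | 1 */
		struct toy_pgmap *pgmap;	/* head page: LSB is clear */
	};
};

/* Mark @tail as a tail page of @head by setting the low bit. */
static void toy_set_tail(struct toy_page *tail, struct toy_page *head)
{
	assert(((uintptr_t)head & 1) == 0);
	tail->compound_head = (uintptr_t)head | 1;
}

/* Return the head page; a page without the tag bit is its own head. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	uintptr_t v = page->compound_head;

	if (v & 1)
		return (struct toy_page *)(v - 1);
	return page;
}

/* The pgmap is always read through the head page. */
static struct toy_pgmap *toy_page_pgmap(struct toy_page *page)
{
	return toy_compound_head(page)->pgmap;
}
```

Because an aligned pgmap pointer never has its low bit set, a head page
is never mistaken for a tail, and any tail can still reach the pgmap
with one extra dereference.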

> 6. When FS DAX pages are freed they notify filesystem drivers. This can be done
>    from the pgmap->ops->page_free() callback.
> 
> 7. We could probably get rid of the pgmap refcounting because we can just scan
>    pages and look for any pages with non-zero references and wait for them to be
>    freed whilst ensuring no new mappings can be created (some drivers do a
>    similar thing for private pages today). This might be a follow-up
>    change.

Yeah, the pgmap refcounting needs some cleanup for sure.
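
For point 7, the scan could look roughly like the following
single-threaded userspace sketch: block new references first, then
count pages that are still referenced and wait until that count is
zero. The types and helpers (scan_page, scan_pgmap, scan_try_get,
scan_count_busy) and the "dying" flag are hypothetical; real code would
need atomics and an actual wait mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a ZONE_DEVICE page. */
struct scan_page {
	int refcount;
};

/* Hypothetical stand-in for a pgmap and the pages it covers. */
struct scan_pgmap {
	struct scan_page *pages;
	size_t nr_pages;
	bool dying;	/* once set, no new references may be taken */
};

/* Take a reference unless teardown has started. */
static bool scan_try_get(struct scan_pgmap *pgmap, size_t idx)
{
	assert(idx < pgmap->nr_pages);
	if (pgmap->dying)
		return false;
	pgmap->pages[idx].refcount++;
	return true;
}

/* Count pages that still have a non-zero refcount. */
static size_t scan_count_busy(const struct scan_pgmap *pgmap)
{
	size_t busy = 0;

	for (size_t i = 0; i < pgmap->nr_pages; i++)
		if (pgmap->pages[i].refcount != 0)
			busy++;
	return busy;
}
```

Teardown in this model would set dying, then repeat scan_count_busy()
until it returns zero.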

Jason
