All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alistair Popple <apopple@nvidia.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	linux-mm@kvack.org, jhubbard@nvidia.com, rcampbell@nvidia.com,
	willy@infradead.org, jgg@nvidia.com,
	linux-fsdevel@vger.kernel.org, jack@suse.cz, djwong@kernel.org,
	hch@lst.de, david@redhat.com, ruansy.fnst@fujitsu.com
Subject: Re: ZONE_DEVICE refcounting
Date: Fri, 22 Mar 2024 16:34:33 +1100	[thread overview]
Message-ID: <87v85e6cj2.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <Zfz4dT+YWpx5OYxM@dread.disaster.area>


Dave Chinner <david@fromorbit.com> writes:

> On Fri, Mar 22, 2024 at 11:01:25AM +1100, Alistair Popple wrote:
>> 
>> Dan Williams <dan.j.williams@intel.com> writes:
>> 
>> > Alistair Popple wrote:
>> >> 
>> >> Alistair Popple <apopple@nvidia.com> writes:
>> >> 
>> >> > Dan Williams <dan.j.williams@intel.com> writes:
>> >> >
>> >> >> Alistair Popple wrote:
>> >> >
>> >> > I also noticed folio_anon() is not safe to call on a FS DAX page due to
>> >> > sharing PAGE_MAPPING_DAX_SHARED.
>> >> 
>> >> Also it feels like I could be missing something here. AFAICT the
>> >> page->mapping and page->index fields can't actually be used outside of
>> >> fs/dax because they are overloaded for the shared case. Therefore
>> >> setting/clearing them could be skipped and the only reason for doing so
>> >> is so dax_associate_entry()/dax_disassociate_entry() can generate
>> >> warnings which should never occur anyway. So all that code is
>> >> functionally unnecessary.
>> >
>> > What do you mean outside of fs/dax, do you literally mean outside of
>> > fs/dax.c, or the devdax case (i.e. dax without fs-entanglements)?
>> 
>> Only the cases fs dax pages might need it. ie. Not devdax which I
>> haven't looked at closely yet.
>> 
>> > Memory
>> > failure needs ->mapping and ->index to rmap dax pages. See
>> > mm/memory-failure.c::__add_to_kill() and
>> > mm/memory-failure.c::__add_to_kill_fsdax() where that latter one is for
>> > cases where the fs needs has signed up to react to dax page failure.
>> 
>> How does that work for reflink/shared pages which overwrite
>> page->mapping and page->index?
>
> Via reverse mapping in the *filesystem*, not the mm rmap stuff.
>
> pmem_pagemap_memory_failure()
>   dax_holder_notify_failure()
>     .notify_failure()
>       xfs_dax_notify_failure()
>         xfs_dax_notify_ddev_failure()
> 	  xfs_rmap_query_range(xfs_dax_failure_fn)
> 	     xfs_dax_failure_fn(rmap record)
> 	       <grabs inode from cache>
> 	       <converts range to file offset>
> 	       mf_dax_kill_procs(inode->mapping, pgoff)
> 	         collect_procs_fsdax(mapping, page)
> 		   add_to_kill_fsdax(task)
> 		     __add_to_kill(task)
> 		 unmap_and_kill_tasks()
>
> Remember: in FSDAX, the pages are the storage media physically owned
> by the filesystem, not the mm subsystem. Hence answering questions
> like "who owns this page" can only be answered correctly by asking
> the filesystem.

Thanks Dave for writing that up, it really helped solidify my
understanding of how this is all supposed to work.

> We shortcut that for pages that only have one owner - we just store
> the owner information in the page as a {mapping, offset} tuple. But
> when we have multiple owners, the only way to find all the {mapping,
> offset} tuples is to ask the filesystem to find all the owners of
> that page.
>
> Hence the special case values for page->mapping/page->index for
> pages over shared filesystem extents. These shared extents are
> communicated to the fsdax layer via the IOMAP_F_SHARED flag
> in the iomaps returned by the filesystem. This flag is the trigger
> for the special mapping share count behaviour to be used. e.g. see
> dax_insert_entry(iomap_iter) -> dax_associate_entry(shared) ->
> dax_page_share_get()....
>
>> Eg. in __add_to_kill() if *p is a shared fs
>> dax page then p->mapping == PAGE_MAPPING_DAX_SHARED and
>> page_address_in_vma(vma, p) will probably crash.
>
> As per above, we don't get the mapping from the page in those cases.

Yep, that all makes sense and I see where I was getting confsued. It was
because in __add_to_kill() we do actually use page->mapping when
page_address_in_vma(vma, p) is called. And because
folio_test_anon(page_folio(p)) is also true for shared FS DAX pages
(p->mapping == PAGE_MAPPING_DAX_SHARED/PAGE_MAPPING_DAX_SHARED) I
thought things would go bad there.

However after re-reading the page_address_in_vma() implementation I
noticed that isn't the case, because folio_anon_vma(page_folio(p)) will
still return NULL which was the detail I had missed.

So to go back to my original point it seems page->mapping and
page->index is largely unused on fs dax pages, even for memory
failure. Because for memory failure the mapping and fsdax_pgoff comes
from the filesystem not the page.

> If you haven't got access to the page though a filesystem method and
> guaranteed that truncate() can't free it from under you, then you're
> probably doing the wrong thing with fsdax...
>
> -Dave.


  reply	other threads:[~2024-03-22  5:50 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-08  4:24 ZONE_DEVICE refcounting Alistair Popple
2024-03-08 13:44 ` Jason Gunthorpe
2024-03-13  6:32 ` Dan Williams
2024-03-20  5:20   ` Alistair Popple
2024-03-21  5:26     ` Alistair Popple
2024-03-21  6:03       ` Dan Williams
2024-03-22  0:01         ` Alistair Popple
2024-03-22  3:18           ` Dave Chinner
2024-03-22  5:34             ` Alistair Popple [this message]
2024-03-22  6:58               ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v85e6cj2.fsf@nvdebian.thelocal \
    --to=apopple@nvidia.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=david@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rcampbell@nvidia.com \
    --cc=ruansy.fnst@fujitsu.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.