From: Alistair Popple <apopple@nvidia.com>
To: Jan Kara <jack@suse.cz>
Cc: dan.j.williams@intel.com, vishal.l.verma@intel.com,
dave.jiang@intel.com, logang@deltatee.com, bhelgaas@google.com,
jgg@ziepe.ca, catalin.marinas@arm.com, will@kernel.org,
mpe@ellerman.id.au, npiggin@gmail.com,
dave.hansen@linux.intel.com, ira.weiny@intel.com,
willy@infradead.org, djwong@kernel.org, tytso@mit.edu,
linmiaohe@huawei.com, david@redhat.com, peterx@redhat.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linuxppc-dev@lists.ozlabs.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-ext4@vger.kernel.org,
linux-xfs@vger.kernel.org, jhubbard@nvidia.com, hch@lst.de,
david@fromorbit.com
Subject: Re: [PATCH 06/13] mm/memory: Add dax_insert_pfn
Date: Fri, 06 Sep 2024 16:21:53 +1000
Message-ID: <87seudb8nm.fsf@nvdebian.thelocal>
In-Reply-To: <20240627113328.ozqkzhloufrpsdcr@quack3>

Jan Kara <jack@suse.cz> writes:

> On Thu 27-06-24 10:54:21, Alistair Popple wrote:
>> Currently to map a DAX page the DAX driver calls vmf_insert_pfn. This
>> creates a special devmap PTE entry for the pfn but does not take a
>> reference on the underlying struct page for the mapping. This is
>> because DAX page refcounts are treated specially, as indicated by the
>> presence of a devmap entry.
>>
>> To allow DAX page refcounts to be managed the same as normal page
>> refcounts introduce dax_insert_pfn. This will take a reference on the
>> underlying page much the same as vmf_insert_page, except it also
>> permits upgrading an existing mapping to be writable if
>> requested/possible.
>>
>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>
> Overall this looks good to me. Some comments below.
>
>> ---
>>  include/linux/mm.h |  4 ++-
>>  mm/memory.c        | 79 ++++++++++++++++++++++++++++++++++++++++++-----
>>  2 files changed, 76 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 9a5652c..b84368b 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1080,6 +1080,8 @@ int vma_is_stack_for_current(struct vm_area_struct *vma);
>> struct mmu_gather;
>> struct inode;
>>
>> +extern void prep_compound_page(struct page *page, unsigned int order);
>> +
>
> You don't seem to use this function in this patch?

Thanks, this came from a bad rebase while splitting the series up. It
belongs later in the series.

>> diff --git a/mm/memory.c b/mm/memory.c
>> index ce48a05..4f26a1f 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1989,14 +1989,42 @@ static int validate_page_before_insert(struct page *page)
>>  }
>>
>>  static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
>> -			unsigned long addr, struct page *page, pgprot_t prot)
>> +			unsigned long addr, struct page *page, pgprot_t prot, bool mkwrite)
>>  {
>>  	struct folio *folio = page_folio(page);
>> +	pte_t entry = ptep_get(pte);
>>
>> -	if (!pte_none(ptep_get(pte)))
>> +	if (!pte_none(entry)) {
>> +		if (mkwrite) {
>> +			/*
>> +			 * For read faults on private mappings the PFN passed
>> +			 * in may not match the PFN we have mapped if the
>> +			 * mapped PFN is a writeable COW page. In the mkwrite
>> +			 * case we are creating a writable PTE for a shared
>> +			 * mapping and we expect the PFNs to match. If they
>> +			 * don't match, we are likely racing with block
>> +			 * allocation and mapping invalidation so just skip the
>> +			 * update.
>> +			 */
>> +			if (pte_pfn(entry) != page_to_pfn(page)) {
>> +				WARN_ON_ONCE(!is_zero_pfn(pte_pfn(entry)));
>> +				return -EFAULT;
>> +			}
>> +			entry = maybe_mkwrite(entry, vma);
>> +			entry = pte_mkyoung(entry);
>> +			if (ptep_set_access_flags(vma, addr, pte, entry, 1))
>> +				update_mmu_cache(vma, addr, pte);
>> +			return 0;
>> +		}
>>  		return -EBUSY;
>
> If you do this like:
>
> 	if (!mkwrite)
> 		return -EBUSY;
>
> You can reduce the indentation of the big block and also make the flow more
> obvious...

Good idea.

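To illustrate the flattened flow, here is a rough userspace sketch. It is
not kernel code: struct toy_pte and toy_handle_existing() are hypothetical
stand-ins, the WARN_ON_ONCE() is omitted, and the return value 1 for an
empty slot exists only in this sketch. It models just the "PTE already
populated" branch with the early return suggested above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a PTE (assumption: NOT the kernel's pte_t). */
struct toy_pte {
	unsigned long pfn;
	bool present;
	bool write;
	bool young;
};

/*
 * Hypothetical model of the "PTE already populated" branch.
 * Returns 1 if the slot is empty (the caller would go on to insert),
 * -EBUSY if an entry exists and no upgrade was requested,
 * -EFAULT if the mapped PFN does not match, and 0 after an upgrade.
 */
static int toy_handle_existing(struct toy_pte *pte, unsigned long pfn,
			       bool vma_writable, bool mkwrite)
{
	assert(pte != NULL);

	if (!pte->present)
		return 1;

	/* Early return keeps the upgrade path at one indentation level. */
	if (!mkwrite)
		return -EBUSY;

	if (pte->pfn != pfn)
		return -EFAULT;

	/* Analogue of maybe_mkwrite(): only writable if the VMA allows it. */
	if (vma_writable)
		pte->write = true;
	pte->young = true;
	return 0;
}
```

With the !mkwrite check hoisted, each remaining condition reads as a
straight-line sequence of bail-outs followed by the upgrade itself.
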
>> +	}
>> +
>>  	/* Ok, finally just insert the thing.. */
>>  	folio_get(folio);
>> +	if (mkwrite)
>> +		entry = maybe_mkwrite(mk_pte(page, prot), vma);
>> +	else
>> +		entry = mk_pte(page, prot);
>
> I'd prefer:
>
> 	entry = mk_pte(page, prot);
> 	if (mkwrite)
> 		entry = maybe_mkwrite(entry, vma);
>
> but I don't insist. Also insert_pfn() additionally has pte_mkyoung() and
> pte_mkdirty(). Why was it left out here?

An oversight by me, thanks for pointing it out!

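For illustration only, one possible shape for building the entry with the
young and dirty bits included is sketched below as a userspace toy. All of
the toy_* names are hypothetical stand-ins for mk_pte(), maybe_mkwrite(),
pte_mkyoung() and pte_mkdirty(); setting young and dirty only in the
mkwrite case is an assumption of this sketch, not something settled in
this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a PTE (assumption: NOT the kernel's pte_t). */
struct toy_pte {
	unsigned long pfn;
	bool write;
	bool young;
	bool dirty;
};

/* Hypothetical analogue of mk_pte(): a fresh read-only, old, clean entry. */
static struct toy_pte toy_mk_pte(unsigned long pfn)
{
	struct toy_pte entry = { .pfn = pfn };
	return entry;
}

/* Hypothetical analogue of maybe_mkwrite(): writable only if the VMA is. */
static struct toy_pte toy_maybe_mkwrite(struct toy_pte entry, bool vma_writable)
{
	if (vma_writable)
		entry.write = true;
	return entry;
}

/* Build the base entry first, then apply the mkwrite-only adjustments. */
static struct toy_pte toy_build_entry(unsigned long pfn, bool vma_writable,
				      bool mkwrite)
{
	struct toy_pte entry = toy_mk_pte(pfn);

	if (mkwrite) {
		entry.young = true;	/* pte_mkyoung() analogue */
		entry.dirty = true;	/* pte_mkdirty() analogue */
		entry = toy_maybe_mkwrite(entry, vma_writable);
	}

	/* Without mkwrite the entry must never come out writable. */
	assert(mkwrite || !entry.write);
	return entry;
}
```

Building the base entry once and then layering the mkwrite adjustments on
top follows the ordering preferred in the review comment above.
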
> Honza