From: Jason Gunthorpe <jgg@nvidia.com>
To: Felix Kuehling <felix.kuehling@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>,
	rcampbell@nvidia.com, willy@infradead.org,
	David Hildenbrand <david@redhat.com>,
	Alistair Popple <apopple@nvidia.com>,
	amd-gfx@lists.freedesktop.org, linux-xfs@vger.kernel.org,
	linux-mm@kvack.org, jglisse@redhat.com,
	dri-devel@lists.freedesktop.org, akpm@linux-foundation.org,
	linux-ext4@vger.kernel.org, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v6 01/10] mm: add zone device coherent type memory support
Date: Fri, 18 Feb 2022 15:26:40 -0400
Message-ID: <20220218192640.GV4160@nvidia.com>
In-Reply-To: <2f2569d5-df0d-7975-7f4a-2d85ceaf29ef@amd.com>

On Fri, Feb 18, 2022 at 02:20:45PM -0500, Felix Kuehling wrote:
> On 2022-02-17 at 19:19, Jason Gunthorpe wrote:
> > On Thu, Feb 17, 2022 at 04:12:20PM -0500, Felix Kuehling wrote:
> > 
> > > I'm thinking of a more theoretical approach: Instead of auditing all users,
> > > I'd ask, what are the invariants that a vm_normal_page should have. Then
> > > check, whether our DEVICE_COHERENT pages satisfy them. But maybe the concept
> > > of a vm_normal_page isn't defined clearly enough for that.
> > I would say the expectation is that only 'page cache and anon' pages
> > are returned - ie the first union in struct page
> > 
> > This is because the first file in your list I looked at:
> > 
> > static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> > 				unsigned long addr, unsigned long end,
> > 				struct mm_walk *walk)
> > 
> > {
> > 		page = vm_normal_page(vma, addr, ptent);
> > [..]
> > 		if (pageout) {
> > 			if (!isolate_lru_page(page)) {
> > 
> > Uses the LRU field, so this is incompatible with all the other page
> > types.
> > 
> > One mitigation of this might be to formally make vm_normal_page() ==
> > 'pte to page cache and anon page' and add a new function that is 'pte
> > to any struct page'
> > 
> > Then go through and sort callers as appropriate.
> > 
> > The 'pte to page cache and anon page' can detect ZONE_DEVICE by
> > calling is_zone_device_page() instead of pte_devmap() and then continue
> > to return NULL. This same trick will fix GUP_fast.
> 
> Sounds good to me. What about vm_normal_page_pmd? Should we remove the
> pmd_devmap check from that function as well? I'm not even sure what a huge
> zone_device page would look like, but maybe that's a worthwhile future
> optimization for our driver.

IIRC there are other problems here as PMDs are currently wired to THPs
and not general at all..

We have huge zone_device pages: one is just any compound page of
contiguous pfns. You should be aggregating any contiguous string of
logical PFNs together into a folio for performance. Whether the folio
is stuffed into a PMD or not is a different question..
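
To illustrate what "aggregating any contiguous string of PFNs" means, here
is a minimal userspace sketch, not kernel code. The helper name
contig_run_len() and the plain array of PFNs are assumptions made up for
this example. It only measures how long a run of consecutive PFNs is; a
real compound page additionally has a power-of-two size, which this
sketch does not enforce.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the number of consecutive PFNs in
 * pfns[0..n) starting at index 'start'. A run of length 1 means the
 * next PFN is not contiguous with this one.
 */
static size_t contig_run_len(const unsigned long *pfns, size_t n, size_t start)
{
	size_t len = 1;

	assert(pfns != NULL && start < n);

	/* Extend the run while each following PFN is exactly one higher. */
	while (start + len < n && pfns[start + len] == pfns[start] + len)
		len++;

	return len;
}
```

A caller would walk the array run by run and treat each run as a
candidate for one larger mapping unit instead of handling every PFN
individually.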

> I'd propose the function names vm_normal_page and vm_normal_or_device_page
> for the two functions you described. 

I wouldn't say device_page, it should be any type of page - though
device_page is the only other option ATM, AFAIK.

> current vm_normal_page with the pte_devmap check removed. vm_normal_page
> could be implemented as a wrapper around vm_normal_or_device_page, which
> just adds the !is_zone_device_page() check.

Yes
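
The split agreed on above can be sketched roughly as follows. This is a
toy userspace model, not a patch: struct page and struct pte here are
stand-ins invented for the example, the signatures are simplified (the
real vm_normal_page() takes a vma, an address and a pte), and the name
vm_any_page() is only a placeholder for the 'pte to any struct page'
function, whose final name is still open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; NOT the real definitions. */
struct page {
	bool zone_device;	/* what is_zone_device_page() would report */
};

struct pte {
	struct page *page;	/* NULL models a mapping with no struct page */
};

/* Toy version of the kernel predicate of the same name. */
static bool is_zone_device_page(const struct page *page)
{
	assert(page != NULL);
	return page->zone_device;
}

/* 'pte to any struct page' (placeholder name): no device filtering. */
static struct page *vm_any_page(const struct pte *pte)
{
	return pte->page;
}

/*
 * 'pte to page cache and anon page': a wrapper that keeps returning
 * NULL for ZONE_DEVICE pages, so LRU-based callers never see them.
 */
static struct page *vm_normal_page(const struct pte *pte)
{
	struct page *page = vm_any_page(pte);

	if (page && is_zone_device_page(page))
		return NULL;
	return page;
}
```

Callers that touch the LRU field, such as the madvise path quoted
earlier, would stay on vm_normal_page(); callers that can handle any
struct page would be moved to the unfiltered variant.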

Jason
