Re: [GIT PULL] VFIO updates for v6.17-rc1

From: Jason Gunthorpe <jgg@nvidia.com>
To: David Hildenbrand <david@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"lizhe.67@bytedance.com" <lizhe.67@bytedance.com>
Subject: Re: [GIT PULL] VFIO updates for v6.17-rc1
Date: Tue, 5 Aug 2025 10:55:05 -0300
Message-ID: <20250805135505.GL184255@nvidia.com>
In-Reply-To: <00999740-d762-488a-a946-0c10589df146@redhat.com>

On Tue, Aug 05, 2025 at 03:33:49PM +0200, David Hildenbrand wrote:

> > David, there is another alternative to prevent this, simple though a
> > bit wasteful, just allocate a bit bigger to ensure the allocation
> > doesn't end on an exact PAGE_SIZE boundary?
> 
> :/ in particular doing that through the memblock in sparse_init_nid(), I am
> not so sure that's a good idea.

It would probably be some work to make larger allocations to avoid
padding :\

> I prefer Linus' proposal and avoids the one nth_page(), unless any other
> approach can help us get rid of more nth_page() usage -- and I don't think
> your proposal could, right?

If the above were solved - so the struct page allocations could be
larger than a section, arguably just the entire range being plugged -
then I think you solve the nth_page() problem too.

Effectively the nth_page() problem is that we allocate the struct page
arrays on an arbitrary section-by-section basis, and then the arch sets
MAX_ORDER so that a folio can cross sections, effectively guaranteeing
to virtually fragment the page *'s inside folios.
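
As a rough illustration of that fragmentation, here is a userspace toy
model, not kernel code. Every name in it (toy_init, toy_pfn_to_page,
toy_nth_page, the tiny PAGES_PER_SECTION) is made up for the sketch. Each
section gets its own separately allocated struct page array, with one
padding entry so that stepping off the end of a section can never land on
the next section's first entry:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model of SPARSEMEM without VMEMMAP. Not kernel code. */
#define PAGES_PER_SECTION 8UL	/* assumed tiny section size */
#define NR_SECTIONS       4UL

struct page {
	unsigned long pfn;	/* the toy page just remembers its own pfn */
};

/* One independently allocated struct page array per section. */
static struct page *section_map[NR_SECTIONS];

static int toy_init(void)
{
	for (unsigned long s = 0; s < NR_SECTIONS; s++) {
		/* +1 padding entry: the array never ends exactly at its last page */
		section_map[s] = calloc(PAGES_PER_SECTION + 1, sizeof(struct page));
		if (!section_map[s])
			return -1;
		for (unsigned long i = 0; i < PAGES_PER_SECTION; i++)
			section_map[s][i].pfn = s * PAGES_PER_SECTION + i;
	}
	return 0;
}

static struct page *toy_pfn_to_page(unsigned long pfn)
{
	return &section_map[pfn / PAGES_PER_SECTION][pfn % PAGES_PER_SECTION];
}

/* nth_page()-style lookup: go via the pfn so section crossings work. */
static struct page *toy_nth_page(struct page *page, unsigned long n)
{
	return toy_pfn_to_page(page->pfn + n);
}
```

Inside one section, page + n and toy_nth_page(page, n) agree. Across a
section boundary only the pfn-based lookup finds the right entry, which is
the case a folio larger than a section runs into.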

Doing a giant vmalloc at the start so you could also cheaply add some
padding would effectively also prevent the nth_page problem as we can
reasonably say that no folio should extend past an entire memory
region.

Maybe there is some reason we can't do a giant vmalloc on these
systems that also can't do SPARSE_VMMEMAP :\ But perhaps we could get
up to MAX_ORDER at least? Or perhaps we could have those systems
reduce MAX_ORDER?

So, I think they are loosely linked problems.

I suppose this is also a limitation with Linus's suggestion. It
doesn't give the optimal answer for 1G pages on these older systems:

        for (size_t nr = 1; nr < nr_pages; nr++) {
                if (*pages++ != ++page)
                        break;
        }

Since that will exit at every section boundary.

At least for scatterlist like cases the point of this function is just
to speed things up. If it returns short the calling code should still
be directly checking phys_addr contiguity anyhow. Something for the
kdoc I suppose.
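
To make that concrete, a userspace sketch of the caller side, again a toy
model and not kernel code. All the names (toy_contig_by_ptr,
toy_contig_by_pfn, the small section size) are hypothetical, and the
pointer-compare helper only imitates the shape of the loop quoted above.
The caller treats a short return as a hint and checks pfn contiguity itself
before giving up on the run:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model. Not kernel code. */
#define PAGES_PER_SECTION 4UL	/* assumed tiny section size */
#define NR_SECTIONS       2UL

struct page {
	unsigned long pfn;
};

/* One separately allocated array per section, plus one padding entry. */
static struct page *section_map[NR_SECTIONS];

static int toy_init(void)
{
	for (unsigned long s = 0; s < NR_SECTIONS; s++) {
		section_map[s] = calloc(PAGES_PER_SECTION + 1, sizeof(struct page));
		if (!section_map[s])
			return -1;
		for (unsigned long i = 0; i < PAGES_PER_SECTION; i++)
			section_map[s][i].pfn = s * PAGES_PER_SECTION + i;
	}
	return 0;
}

static struct page *toy_pfn_to_page(unsigned long pfn)
{
	return &section_map[pfn / PAGES_PER_SECTION][pfn % PAGES_PER_SECTION];
}

/* Fast path: length of the run whose struct page pointers are consecutive. */
static size_t toy_contig_by_ptr(struct page **pages, size_t nr_pages)
{
	struct page *page;
	size_t nr;

	if (!nr_pages)
		return 0;
	page = *pages++;
	for (nr = 1; nr < nr_pages; nr++) {
		if (*pages++ != ++page)
			break;
	}
	return nr;
}

/* Caller: on a short return, check physical (pfn) contiguity directly. */
static size_t toy_contig_by_pfn(struct page **pages, size_t nr_pages)
{
	size_t done = 0;

	while (done < nr_pages) {
		done += toy_contig_by_ptr(pages + done, nr_pages - done);
		if (done < nr_pages &&
		    pages[done]->pfn != pages[done - 1]->pfn + 1)
			break;	/* a real physical gap, stop here */
	}
	return done;
}
```

With eight physically contiguous pages spanning both toy sections, the
pointer-only helper stops after the first section while the pfn-checking
caller still sees the whole range as one run.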

Jason