From: Jason Gunthorpe <jgg@nvidia.com>
To: Nicolin Chen <nicolinc@nvidia.com>
Cc: kevin.tian@intel.com, shuah@kernel.org,
joao.m.martins@oracle.com, steven.sistare@oracle.com,
iommu@lists.linux.dev, linux-kselftest@vger.kernel.org,
linux-kernel@vger.kernel.org, thomas.weissschuh@linutronix.de
Subject: Re: [PATCH rc 1/4] iommufd/selftest: Fix iommufd_dirty_tracking with large hugepage sizes
Date: Tue, 17 Jun 2025 08:59:48 -0300 [thread overview]
Message-ID: <20250617115948.GV1174925@nvidia.com> (raw)
In-Reply-To: <aFDMoMX8eL7azoUL@nvidia.com>
On Mon, Jun 16, 2025 at 07:02:08PM -0700, Nicolin Chen wrote:
> On Mon, Jun 16, 2025 at 01:25:01PM -0300, Jason Gunthorpe wrote:
> > On Sun, Jun 15, 2025 at 10:02:03PM -0700, Nicolin Chen wrote:
> > > FIXTURE_TEARDOWN(iommufd_dirty_tracking)
> > > {
> > > - munmap(self->buffer, variant->buffer_size);
> > > - munmap(self->bitmap, DIV_ROUND_UP(self->bitmap_size, BITS_PER_BYTE));
> > > + unsigned long size = variant->buffer_size;
> > > +
> > > + if (variant->hugepages)
> > > + size = __ALIGN_KERNEL(variant->buffer_size, HUGEPAGE_SIZE);
> > > + munmap(self->buffer, size);
> > > + free(self->buffer);
> > > + free(self->bitmap);
> > > teardown_iommufd(self->fd, _metadata);
> >
> > munmap followed by free isn't right..
>
> You are right. I re-checked with Copilot. It says the same thing.
> I think the whole posix_memalign() + mmap() confuses me..
>
> Yet, should the bitmap pair with free() since it's allocated by a
> posix_memalign() call?
>
> > This code is using the glibc allocator to get a bunch of pages mmap'd
> > to an aligned location then replacing the pages with MAP_SHARED and
> > maybe HAP_HUGETLB versions.
>
> And I studied some use cases from Copilot. It says that, to use
> the combination of posix_memalign+mmap, we should do:
> aligned_ptr = posix_memalign(pagesize, pagesize);
> unmap(aligned_ptr, pagesize);
> mapped = mmap(aligned_ptr, pagesize, PROT_READ | PROT_WRITE,
> MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
> munmap(mapped, pagesize);
> // No free() after munmap().
> ---breakdown---
> Before `posix_memalign()`:
> [ heap memory unused ]
>
> After `posix_memalign()`:
> [ posix_memalign() memory ] ← managed by malloc/free
> ↑ aligned_ptr
>
> After `munmap(aligned_ptr)`:
> [ unmapped memory ] ← allocator no longer owns it
Incorrect. The allocator has no idea about the munmap and munmap
doesn't disturb any of the allocator tracking structures.
> After `mmap(aligned_ptr, ..., MAP_FIXED)`:
> [ anonymous mmap region ] ← fully remapped, under your control
> ↑ mapped
> ---end---
No, this is wrong.
> It points out that the heap bookkeeping will be silently clobbered
> without the munmap() in-between (like we are doing):
Nope, doesn't work like that.
> ---breakdown---
> After `posix_memalign()`:
> [ posix_memalign() memory ] ← malloc thinks it owns this
>
> Then `mmap(aligned_ptr, ..., MAP_FIXED)`:
> [ anonymous mmap region ] ← malloc still thinks it owns this (!)
> ↑ mapped
> ---end---
Yes, this is correct and what we are doing here. The allocator always
owns it and we are just replacing the memory with a different mmap.
> It also gives a simpler solution for a memory that is not huge
> page backed but huge page aligned (our !variant->hugepage case):
> ---code---
> void *ptr;
> size_t alignment = 2 * 1024 * 1024; // or whatever HUGEPAGE_SIZE was
> size_t size = variant->buffer_size;
>
> // Step 1: Use posix_memalign to get an aligned pointer
> if (posix_memalign(&ptr, alignment, size) != 0) {
> perror("posix_memalign");
> return -1;
> }
Also no, the main point of this is to inject MAP_SHARED which
posix_memalign cannot not do.
> Also, for a huge page case, there is no need of posix_memalign():
> "Hugepages are not part of the standard heap, so allocator functions
> like posix_memalign() or malloc() don't help and can even get in the
> way."
> Instead, it suggests a cleaner version without posix_memalign():
> ---code---
> void *addr = mmap(NULL, variant->buffer_size, PROT_READ | PROT_WRITE,
> MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE,
> -1, 0);
> if (addr == MAP_FAILED) { perror("mmap"); return -1; }
> ---end---
Yes, we could do this only for MAP_HUGETLB, but it doesn't help the
normal case with MAP_SHARED.
So I would leave it alone, use the version I showed.
Jason
next prev parent reply other threads:[~2025-06-17 11:59 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-16 5:02 [PATCH rc 0/4] Fix iommufd selftest FAIL and warnings with v6.16 Nicolin Chen
2025-06-16 5:02 ` [PATCH rc 1/4] iommufd/selftest: Fix iommufd_dirty_tracking with large hugepage sizes Nicolin Chen
2025-06-16 16:25 ` Jason Gunthorpe
2025-06-17 2:02 ` Nicolin Chen
2025-06-17 11:59 ` Jason Gunthorpe [this message]
2025-06-17 21:23 ` Nicolin Chen
2025-06-17 23:01 ` Jason Gunthorpe
2025-06-17 23:46 ` Nicolin Chen
2025-06-18 11:38 ` Jason Gunthorpe
2025-06-16 5:02 ` [PATCH rc 2/4] iommufd/selftest: Add missing close(mfd) in memfd_mmap() Nicolin Chen
2025-06-16 16:25 ` Jason Gunthorpe
2025-06-16 5:02 ` [PATCH rc 3/4] iommufd/selftest: Add asserts testing global mfd Nicolin Chen
2025-06-16 16:26 ` Jason Gunthorpe
2025-06-16 5:02 ` [PATCH rc 4/4] iommufd/selftest: Fix build warnings due to uninitialized mfd Nicolin Chen
2025-06-16 16:27 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250617115948.GV1174925@nvidia.com \
--to=jgg@nvidia.com \
--cc=iommu@lists.linux.dev \
--cc=joao.m.martins@oracle.com \
--cc=kevin.tian@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=nicolinc@nvidia.com \
--cc=shuah@kernel.org \
--cc=steven.sistare@oracle.com \
--cc=thomas.weissschuh@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.