From: Boris Brezillon <boris.brezillon@collabora.com>
To: Will Deacon <will@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>,
iommu@lists.linux.dev, Robin Murphy <robin.murphy@arm.com>,
linux-arm-kernel@lists.infradead.org,
Rob Clark <robdclark@gmail.com>
Subject: Re: [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators
Date: Wed, 9 Aug 2023 17:10:17 +0200 [thread overview]
Message-ID: <20230809171017.64409d60@collabora.com> (raw)
In-Reply-To: <20230809144720.GA4472@willie-the-truck>
On Wed, 9 Aug 2023 15:47:21 +0100
Will Deacon <will@kernel.org> wrote:
> On Wed, Aug 09, 2023 at 02:17:44PM +0200, Boris Brezillon wrote:
> > We need that in order to implement the VM_BIND ioctl in the GPU driver
> > targeting new Mali GPUs.
> >
> > VM_BIND is about executing MMU map/unmap requests asynchronously,
> > possibly after waiting for external dependencies encoded as dma_fences.
> > We intend to use the drm_sched framework to automate the dependency
> > tracking and VM job dequeuing logic, but this comes with its own set
> > of constraints, one of them being the fact we are not allowed to
> > allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this
> > sort of deadlocks:
> >
> > - VM_BIND map job needs to allocate a page table to map some memory
> > to the VM. No memory available, so kswapd is kicked
> > - GPU driver shrinker backend ends up waiting on the fence attached to
> > the VM map job or any other job fence depending on this VM operation.
> >
> > With custom allocators, we will be able to pre-reserve enough pages to
> > guarantee the map/unmap operations we queued will take place without
> > going through the system allocator. But we can also optimize
> > allocation/reservation by not free-ing pages immediately, so any
> > upcoming page table allocation requests can be serviced by some free
> > page table pool kept at the driver level.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> > drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++-----------
> > drivers/iommu/io-pgtable.c | 12 ++++++++
> > 2 files changed, 46 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index 72dcdd468cf3..c5c04f0106f3 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -188,20 +188,28 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages)
> > }
> >
> > static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
> > - struct io_pgtable_cfg *cfg)
> > + struct io_pgtable_cfg *cfg,
> > + void *cookie)
> > {
> > struct device *dev = cfg->iommu_dev;
> > int order = get_order(size);
> > - struct page *p;
> > dma_addr_t dma;
> > void *pages;
> >
> > VM_BUG_ON((gfp & __GFP_HIGHMEM));
> > - p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
> > - if (!p)
> > +
> > + if (cfg->alloc) {
> > + pages = cfg->alloc(cookie, size, gfp | __GFP_ZERO);
> > + } else {
> > + struct page *p;
> > +
> > + p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
> > + pages = p ? page_address(p) : NULL;
>
> Hmm, so the reason we pass the order is because the pgd may have
> additional alignment requirements but that's not passed back to your new
> allocation hook. Does it just return naturally aligned allocations?
So, the assumption was that the custom allocator should be aware of
these alignment constraints. Like, if size > min_granule_sz, then order
equal get_order(size).
Right now, I reject any size that's not SZ_4K, because, given the
VA-space and the pgtable config, I know I'll always end up with 4
levels and the pgd will fit in a 4k page. But if we ever decide we want
to support 64k granule or a VA space that's smaller, we'll adjust the
custom allocator accordingly.
I'm not sure we discussed this specific aspect with Robin, but his
point was that the user passing a custom allocator should be aware of
the page table format and all the allocation constraints that come with
it.
>
> If you don't care about the pgd (since it's not something that's going
> to be freed and reallocated during the lifetime of the page-table), then
> perhaps it would be cleaner to pass in a 'struct kmem_cache' for the
> non-pgd pages when configuring the page-table, rather than override the
> allocation function wholesale?
I'd be fine with that, but I wasn't sure every one would be okay to use
a kmem_cache as the page caching mechanism. I know Rob was intending to
use a custom allocator with the MSM MMU to provide the same sort of
caching for the Adreno GPU driver, so it might be good to have his
opinion on that.
> I think that would also mean that all the
> lowmem assumptions in the code (e.g. use of virt_to_phys()) would be a
> little less fragile.
Well, that would only help if a custom kmem_cache is used, unless you
want to generalize the kmem_cache approach and force it to all
io-pgtable-arm users, in which case I probably don't need to pass a
custom kmem_cache in the first place (mine has nothing special).
WARNING: multiple messages have this Message-ID (diff)
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Will Deacon <will@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>,
iommu@lists.linux.dev, Robin Murphy <robin.murphy@arm.com>,
linux-arm-kernel@lists.infradead.org,
Rob Clark <robdclark@gmail.com>
Subject: Re: [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators
Date: Wed, 9 Aug 2023 17:10:17 +0200 [thread overview]
Message-ID: <20230809171017.64409d60@collabora.com> (raw)
In-Reply-To: <20230809144720.GA4472@willie-the-truck>
On Wed, 9 Aug 2023 15:47:21 +0100
Will Deacon <will@kernel.org> wrote:
> On Wed, Aug 09, 2023 at 02:17:44PM +0200, Boris Brezillon wrote:
> > We need that in order to implement the VM_BIND ioctl in the GPU driver
> > targeting new Mali GPUs.
> >
> > VM_BIND is about executing MMU map/unmap requests asynchronously,
> > possibly after waiting for external dependencies encoded as dma_fences.
> > We intend to use the drm_sched framework to automate the dependency
> > tracking and VM job dequeuing logic, but this comes with its own set
> > of constraints, one of them being the fact we are not allowed to
> > allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this
> > sort of deadlocks:
> >
> > - VM_BIND map job needs to allocate a page table to map some memory
> > to the VM. No memory available, so kswapd is kicked
> > - GPU driver shrinker backend ends up waiting on the fence attached to
> > the VM map job or any other job fence depending on this VM operation.
> >
> > With custom allocators, we will be able to pre-reserve enough pages to
> > guarantee the map/unmap operations we queued will take place without
> > going through the system allocator. But we can also optimize
> > allocation/reservation by not free-ing pages immediately, so any
> > upcoming page table allocation requests can be serviced by some free
> > page table pool kept at the driver level.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> > drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++-----------
> > drivers/iommu/io-pgtable.c | 12 ++++++++
> > 2 files changed, 46 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index 72dcdd468cf3..c5c04f0106f3 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -188,20 +188,28 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages)
> > }
> >
> > static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
> > - struct io_pgtable_cfg *cfg)
> > + struct io_pgtable_cfg *cfg,
> > + void *cookie)
> > {
> > struct device *dev = cfg->iommu_dev;
> > int order = get_order(size);
> > - struct page *p;
> > dma_addr_t dma;
> > void *pages;
> >
> > VM_BUG_ON((gfp & __GFP_HIGHMEM));
> > - p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
> > - if (!p)
> > +
> > + if (cfg->alloc) {
> > + pages = cfg->alloc(cookie, size, gfp | __GFP_ZERO);
> > + } else {
> > + struct page *p;
> > +
> > + p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
> > + pages = p ? page_address(p) : NULL;
>
> Hmm, so the reason we pass the order is because the pgd may have
> additional alignment requirements but that's not passed back to your new
> allocation hook. Does it just return naturally aligned allocations?
So, the assumption was that the custom allocator should be aware of
these alignment constraints. Like, if size > min_granule_sz, then order
equal get_order(size).
Right now, I reject any size that's not SZ_4K, because, given the
VA-space and the pgtable config, I know I'll always end up with 4
levels and the pgd will fit in a 4k page. But if we ever decide we want
to support 64k granule or a VA space that's smaller, we'll adjust the
custom allocator accordingly.
I'm not sure we discussed this specific aspect with Robin, but his
point was that the user passing a custom allocator should be aware of
the page table format and all the allocation constraints that come with
it.
>
> If you don't care about the pgd (since it's not something that's going
> to be freed and reallocated during the lifetime of the page-table), then
> perhaps it would be cleaner to pass in a 'struct kmem_cache' for the
> non-pgd pages when configuring the page-table, rather than override the
> allocation function wholesale?
I'd be fine with that, but I wasn't sure every one would be okay to use
a kmem_cache as the page caching mechanism. I know Rob was intending to
use a custom allocator with the MSM MMU to provide the same sort of
caching for the Adreno GPU driver, so it might be good to have his
opinion on that.
> I think that would also mean that all the
> lowmem assumptions in the code (e.g. use of virt_to_phys()) would be a
> little less fragile.
Well, that would only help if a custom kmem_cache is used, unless you
want to generalize the kmem_cache approach and force it to all
io-pgtable-arm users, in which case I probably don't need to pass a
custom kmem_cache in the first place (mine has nothing special).
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2023-08-09 15:10 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-09 12:17 [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers Boris Brezillon
2023-08-09 12:17 ` Boris Brezillon
2023-08-09 12:17 ` [PATCH 1/2] " Boris Brezillon
2023-08-09 12:17 ` Boris Brezillon
2023-08-09 12:17 ` [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators Boris Brezillon
2023-08-09 12:17 ` Boris Brezillon
2023-08-09 14:47 ` Will Deacon
2023-08-09 14:47 ` Will Deacon
2023-08-09 15:10 ` Boris Brezillon [this message]
2023-08-09 15:10 ` Boris Brezillon
2023-08-28 12:50 ` Boris Brezillon
2023-08-28 12:50 ` Boris Brezillon
2023-09-20 16:42 ` Robin Murphy
2023-09-20 16:42 ` Robin Murphy
2023-11-10 9:52 ` Boris Brezillon
2023-11-10 9:52 ` Boris Brezillon
2023-09-20 13:12 ` [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers Steven Price
2023-09-20 13:12 ` Steven Price
2023-10-23 21:02 ` Rob Clark
2023-10-23 21:02 ` Rob Clark
2023-11-07 11:52 ` Gaurav Kohli
2023-11-07 11:52 ` Gaurav Kohli
2023-11-07 12:01 ` Gaurav Kohli
2023-11-07 12:01 ` Gaurav Kohli
2023-11-10 9:47 ` Boris Brezillon
2023-11-10 9:47 ` Boris Brezillon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230809171017.64409d60@collabora.com \
--to=boris.brezillon@collabora.com \
--cc=iommu@lists.linux.dev \
--cc=joro@8bytes.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=robdclark@gmail.com \
--cc=robin.murphy@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.