linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Regression after commit 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly")
       [not found]       ` <20180211112808.GA4551@bombadil.infradead.org>
@ 2018-02-11 12:05         ` Matthew Wilcox
  2018-02-11 12:05           ` Matthew Wilcox
  2018-02-11 23:51           ` Matthew Wilcox
  0 siblings, 2 replies; 5+ messages in thread
From: Matthew Wilcox @ 2018-02-11 12:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kai Heng Feng, Laura Abbott, linux-mm, Linux Kernel Mailing List,
	linux-arch, James.Bottomley, davem

On Sun, Feb 11, 2018 at 03:28:08AM -0800, Matthew Wilcox wrote:
> Now, longer-term, perhaps we should do the following:
> 
> #ifdef CONFIG_ZONE_DMA32
> #define OPT_ZONE_DMA32	ZONE_DMA32
> #elif defined(CONFIG_64BIT)
> #define OPT_ZONE_DMA	OPT_ZONE_DMA
> #else
> #define OPT_ZONE_DMA32 ZONE_NORMAL
> #endif
> 
> Then we wouldn't need the ifdef here and could always use GFP_DMA32
> | GFP_KERNEL.  Would need to audit current users and make sure they
> wouldn't be broken by such a change.

Argh, I forgot to say the most important thing.  (For those newly invited
to the party, we're talking about drivers/media, in particular
drivers/media/common/saa7146/saa7146_core.c, functions
saa7146_vmalloc_build_pgtable and vmalloc_to_sg)

I think we're missing a function in our DMA API.  These drivers don't
actually need physical memory below the 4GB mark.  They need DMA addresses
which are below the 4GB mark.  For machines with IOMMUs, this can mean
no restrictions on physical memory.  If we don't have an IOMMU, then a
bounce buffer could be used (but would be slow) -- like the swiotlb.
So we should endeavour to allocate memory below the 4GB boundary on
systems with no IOMMU, but can allocate memory anywhere on systems with
an IOMMU.

For consistent / coherent memory, we have an allocation function.
But we don't have an allocation function for streaming memory, which is
what these drivers want.  They also flush the DMA memory and then access
the memory through a different virtual mapping, which I'm not sure is
going to work well on virtually-indexed caches like SPARC and PA-RISC
(maybe not MIPS either?)

I think we want something like

struct scatterlist *dma_alloc_sg(struct device *dev, int *nents);
void dma_free_sg(struct device *dev, struct scatterlist *sg, int nents);

That lets individual architectures decide where to allocate, and handle
the tradeoff between allocating below 4GB and using bounce buffers.

I don't have a good answer to synchronising between device-view of
memory and CPU-view-through-vmalloc though.  They're already calling
dma_sync_*_for_cpu(); do they need to also call a new vflush(void *p,
unsigned long len) function which can be a no-op on x86 and flushes the
range on SPARC/PA-RISC/... ?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Regression after commit 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly")
  2018-02-11 12:05         ` Regression after commit 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly") Matthew Wilcox
@ 2018-02-11 12:05           ` Matthew Wilcox
  2018-02-11 23:51           ` Matthew Wilcox
  1 sibling, 0 replies; 5+ messages in thread
From: Matthew Wilcox @ 2018-02-11 12:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kai Heng Feng, Laura Abbott, linux-mm, Linux Kernel Mailing List,
	linux-arch, James.Bottomley, davem

On Sun, Feb 11, 2018 at 03:28:08AM -0800, Matthew Wilcox wrote:
> Now, longer-term, perhaps we should do the following:
> 
> #ifdef CONFIG_ZONE_DMA32
> #define OPT_ZONE_DMA32	ZONE_DMA32
> #elif defined(CONFIG_64BIT)
> #define OPT_ZONE_DMA	OPT_ZONE_DMA
> #else
> #define OPT_ZONE_DMA32 ZONE_NORMAL
> #endif
> 
> Then we wouldn't need the ifdef here and could always use GFP_DMA32
> | GFP_KERNEL.  Would need to audit current users and make sure they
> wouldn't be broken by such a change.

Argh, I forgot to say the most important thing.  (For those newly invited
to the party, we're talking about drivers/media, in particular
drivers/media/common/saa7146/saa7146_core.c, functions
saa7146_vmalloc_build_pgtable and vmalloc_to_sg)

I think we're missing a function in our DMA API.  These drivers don't
actually need physical memory below the 4GB mark.  They need DMA addresses
which are below the 4GB mark.  For machines with IOMMUs, this can mean
no restrictions on physical memory.  If we don't have an IOMMU, then a
bounce buffer could be used (but would be slow) -- like the swiotlb.
So we should endeavour to allocate memory below the 4GB boundary on
systems with no IOMMU, but can allocate memory anywhere on systems with
an IOMMU.

For consistent / coherent memory, we have an allocation function.
But we don't have an allocation function for streaming memory, which is
what these drivers want.  They also flush the DMA memory and then access
the memory through a different virtual mapping, which I'm not sure is
going to work well on virtually-indexed caches like SPARC and PA-RISC
(maybe not MIPS either?)

I think we want something like

struct scatterlist *dma_alloc_sg(struct device *dev, int *nents);
void dma_free_sg(struct device *dev, struct scatterlist *sg, int nents);

That lets individual architectures decide where to allocate, and handle
the tradeoff between allocating below 4GB and using bounce buffers.

I don't have a good answer to synchronising between device-view of
memory and CPU-view-through-vmalloc though.  They're already calling
dma_sync_*_for_cpu(); do they need to also call a new vflush(void *p,
unsigned long len) function which can be a no-op on x86 and flushes the
range on SPARC/PA-RISC/... ?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Regression after commit 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly")
  2018-02-11 12:05         ` Regression after commit 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly") Matthew Wilcox
  2018-02-11 12:05           ` Matthew Wilcox
@ 2018-02-11 23:51           ` Matthew Wilcox
  2018-02-11 23:51             ` Matthew Wilcox
  2018-02-14 14:04             ` Michal Hocko
  1 sibling, 2 replies; 5+ messages in thread
From: Matthew Wilcox @ 2018-02-11 23:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kai Heng Feng, Laura Abbott, linux-mm, Linux Kernel Mailing List,
	linux-arch, James.Bottomley, davem

On Sun, Feb 11, 2018 at 04:05:15AM -0800, Matthew Wilcox wrote:
> On Sun, Feb 11, 2018 at 03:28:08AM -0800, Matthew Wilcox wrote:
> > Now, longer-term, perhaps we should do the following:
> > 
> > #ifdef CONFIG_ZONE_DMA32
> > #define OPT_ZONE_DMA32	ZONE_DMA32
> > #elif defined(CONFIG_64BIT)
> > #define OPT_ZONE_DMA	OPT_ZONE_DMA
> > #else
> > #define OPT_ZONE_DMA32 ZONE_NORMAL
> > #endif
> 
> For consistent / coherent memory, we have an allocation function.
> But we don't have an allocation function for streaming memory, which is
> what these drivers want.  They also flush the DMA memory and then access
> the memory through a different virtual mapping, which I'm not sure is
> going to work well on virtually-indexed caches like SPARC and PA-RISC
> (maybe not MIPS either?)

Perhaps I (and a number of other people ...) have misunderstood the
semantics of GFP_DMA32.  Perhaps GFP_DMA32 is not "allocate memory below
4GB", perhaps it's "allocate memory which can be mapped below 4GB".
Machines with an IOMMU can use ZONE_NORMAL.  Machines with no IOMMU can
choose to allocate memory with a physical address below 4GB.

After all, it has 'DMA' right there in the name.  If someone's relying
on it to allocate physical memory below 4GB, they're arguably misusing it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Regression after commit 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly")
  2018-02-11 23:51           ` Matthew Wilcox
@ 2018-02-11 23:51             ` Matthew Wilcox
  2018-02-14 14:04             ` Michal Hocko
  1 sibling, 0 replies; 5+ messages in thread
From: Matthew Wilcox @ 2018-02-11 23:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kai Heng Feng, Laura Abbott, linux-mm, Linux Kernel Mailing List,
	linux-arch, James.Bottomley, davem

On Sun, Feb 11, 2018 at 04:05:15AM -0800, Matthew Wilcox wrote:
> On Sun, Feb 11, 2018 at 03:28:08AM -0800, Matthew Wilcox wrote:
> > Now, longer-term, perhaps we should do the following:
> > 
> > #ifdef CONFIG_ZONE_DMA32
> > #define OPT_ZONE_DMA32	ZONE_DMA32
> > #elif defined(CONFIG_64BIT)
> > #define OPT_ZONE_DMA	OPT_ZONE_DMA
> > #else
> > #define OPT_ZONE_DMA32 ZONE_NORMAL
> > #endif
> 
> For consistent / coherent memory, we have an allocation function.
> But we don't have an allocation function for streaming memory, which is
> what these drivers want.  They also flush the DMA memory and then access
> the memory through a different virtual mapping, which I'm not sure is
> going to work well on virtually-indexed caches like SPARC and PA-RISC
> (maybe not MIPS either?)

Perhaps I (and a number of other people ...) have misunderstood the
semantics of GFP_DMA32.  Perhaps GFP_DMA32 is not "allocate memory below
4GB", perhaps it's "allocate memory which can be mapped below 4GB".
Machines with an IOMMU can use ZONE_NORMAL.  Machines with no IOMMU can
choose to allocate memory with a physical address below 4GB.

After all, it has 'DMA' right there in the name.  If someone's relying
on it to allocate physical memory below 4GB, they're arguably misusing it.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Regression after commit 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly")
  2018-02-11 23:51           ` Matthew Wilcox
  2018-02-11 23:51             ` Matthew Wilcox
@ 2018-02-14 14:04             ` Michal Hocko
  1 sibling, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2018-02-14 14:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kai Heng Feng, Laura Abbott, linux-mm, Linux Kernel Mailing List,
	linux-arch, James.Bottomley, davem

On Sun 11-02-18 15:51:07, Matthew Wilcox wrote:
> On Sun, Feb 11, 2018 at 04:05:15AM -0800, Matthew Wilcox wrote:
> > On Sun, Feb 11, 2018 at 03:28:08AM -0800, Matthew Wilcox wrote:
> > > Now, longer-term, perhaps we should do the following:
> > > 
> > > #ifdef CONFIG_ZONE_DMA32
> > > #define OPT_ZONE_DMA32	ZONE_DMA32
> > > #elif defined(CONFIG_64BIT)
> > > #define OPT_ZONE_DMA	OPT_ZONE_DMA
> > > #else
> > > #define OPT_ZONE_DMA32 ZONE_NORMAL
> > > #endif
> > 
> > For consistent / coherent memory, we have an allocation function.
> > But we don't have an allocation function for streaming memory, which is
> > what these drivers want.  They also flush the DMA memory and then access
> > the memory through a different virtual mapping, which I'm not sure is
> > going to work well on virtually-indexed caches like SPARC and PA-RISC
> > (maybe not MIPS either?)
> 
> Perhaps I (and a number of other people ...) have misunderstood the
> semantics of GFP_DMA32.  Perhaps GFP_DMA32 is not "allocate memory below
> 4GB", perhaps it's "allocate memory which can be mapped below 4GB".

Well, GFP_DMA32 is clearly under-documented. But I _believe_ the
intention was to really return a physical memory within 32b address
range.

> Machines with an IOMMU can use ZONE_NORMAL.  Machines with no IOMMU can
> choose to allocate memory with a physical address below 4GB.

This would be something for the higher level allocator I think. The page
allocator is largely unaware of IOMMU or any remapping and that is good
IMHO.

> After all, it has 'DMA' right there in the name.

The name is misnomer following GFP_DMA which is arguably a better fit.
GFP_MEM32 would be a better name.

Btw. I believe the GFP_VMALLOC32 shows that our GFP_DM32 needs some
love. The user shouldn't really care about lowmem zones layout.
GFP_DMA32 should simply use the appropriate zone regardless the arch
specific details.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-02-14 14:04 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <627DA40A-D0F6-41C1-BB5A-55830FBC9800@canonical.com>
     [not found] ` <20180208130649.GA15846@bombadil.infradead.org>
     [not found]   ` <20180208232004.GA21027@bombadil.infradead.org>
     [not found]     ` <20180211092652.GV21609@dhcp22.suse.cz>
     [not found]       ` <20180211112808.GA4551@bombadil.infradead.org>
2018-02-11 12:05         ` Regression after commit 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly") Matthew Wilcox
2018-02-11 12:05           ` Matthew Wilcox
2018-02-11 23:51           ` Matthew Wilcox
2018-02-11 23:51             ` Matthew Wilcox
2018-02-14 14:04             ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).