public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [patch] 32-bit dma memory zone
       [not found] <20010607153119.H1522@suse.de>
@ 2001-06-07 21:22 ` Linus Torvalds
  2001-06-07 21:59   ` Richard Henderson
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Linus Torvalds @ 2001-06-07 21:22 UTC (permalink / raw)
  To: Jens Axboe, Patrick Mochel
  Cc: Alan Cox, David S. Miller, MOLNAR Ingo, Richard Henderson,
	Kanoj Sarcar, Kernel Mailing List


On Thu, 7 Jun 2001, Jens Axboe wrote:
> 
> I'd like to push this patch from the block highmem patch set, to prune
> it down and make it easier to include it later on :-)
> 
> This patch implements a new memory zone, ZONE_DMA32. It holds highmem
> pages that are below 4GB, as we can do I/O on those directly. Also if we
> do need to bounce a > 4GB page, we can use pages from this zone and not
> always resort to < 960MB pages.

Patrick Mochel has another patch that adds another zone on x86: the "low
memory" zone for the 0-1MB area, which is special for some things, notably
real mode bootstrapping (ie the SMP stuff could use it instead of the
current special-case allocations, and Pat needs it for allocating low
memory pages for suspend/resume).

I'd like to see what these two look like together.

But even more I'd like to see a more dynamic zone setup: we already have
people talking about adding memory dynamically at run-time on some of the
server machines, which implies that we might want to add zones at a later
time, along with binding those zones to different zonelists.

This is also an issue for different architectures: some of these zones do
not make any _sense_ on other architectures. For example, what's the
difference between ZONE_HIGHMEM and ZONE_NORMAL on a sane 64-bit
architecture (right now I _think_ the 64-bit architectures actually make
ZONE_NORMAL be what we call ZONE_DMA32 on x86, because they already need
to be able to distinguish between memory that can be PCI-DMA'd to, and
memory that needs bounce-buffers. Or maybe it's ZONE_DMA that they use for
the DMA32 stuff?).

Anyway, what I'm saying is that "GFP_HIGHMEM" already traverses three
zones, and with ZONE_1M and ZONE_DMA32, you'd have a list of five of them.
Of which only _two_ would actually be meaningful on some architectures.

So should we not try to have some nicer interface like

	create_zone(&zone, offset, end);

	add_zone(&zone, zonelist);

and then we could on x86 have

	create_zone(zone+0, 0, 1M);
	create_zone(zone+1, 1M, 16M);
	create_zone(zone+2, 16M, 896M);
	create_zone(zone+3, 896M, 4G);
	create_zone(zone+4, 4G, 64G);

	.. populate the zones ..

	add_zone(zone+4, GFP_HIGHMEM);

	add_zone(zone+3, GFP_HIGHMEM);
	add_zone(zone+3, GFP_DMA32);

	add_zone(zone+2, GFP_HIGHMEM);
	add_zone(zone+2, GFP_DMA32);
	add_zone(zone+2, GFP_NORMAL);

	/* the 1M-16M zone is usable for just about everything */
	add_zone(zone+1, GFP_HIGHMEM);
	add_zone(zone+1, GFP_DMA32);
	add_zone(zone+1, GFP_NORMAL);
	add_zone(zone+1, GFP_DMA);

	/* The low 1M can be used for everything */
	add_zone(zone+0, GFP_HIGHMEM);
	add_zone(zone+0, GFP_DMA32);
	add_zone(zone+0, GFP_NORMAL);
	add_zone(zone+0, GFP_DMA);
	add_zone(zone+0, GFP_LOWMEM);
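
[A minimal user-space sketch of how this proposed interface might hang
together. The create_zone()/add_zone() names and the GFP_* classes are
taken from the proposal above; struct zone's fields, the NULL-terminated
zonelist table, and the first-fit fallback walk are my own assumptions,
not real kernel code:]

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ZONES_PER_LIST 8

struct zone {
	unsigned long long start, end;  /* physical range covered */
	unsigned long long free_pages;  /* toy accounting */
};

struct zonelist {
	/* NULL-terminated, most-preferred zone first */
	struct zone *zones[MAX_ZONES_PER_LIST + 1];
	int count;
};

/* One zonelist per allocation class, indexed by a GFP_* constant. */
enum { GFP_DMA, GFP_NORMAL, GFP_DMA32, GFP_HIGHMEM, GFP_LOWMEM, NR_LISTS };
static struct zonelist zonelists[NR_LISTS];

static void create_zone(struct zone *z, unsigned long long start,
			unsigned long long end)
{
	z->start = start;
	z->end = end;
	z->free_pages = (end - start) >> 12; /* pretend 4K pages, all free */
}

static void add_zone(struct zone *z, int gfp)
{
	struct zonelist *zl = &zonelists[gfp];
	zl->zones[zl->count++] = z;
	zl->zones[zl->count] = NULL;
}

/* Walk the list; take a page from the first zone that still has one. */
static struct zone *alloc_from(int gfp)
{
	for (struct zone **zp = zonelists[gfp].zones; *zp; zp++) {
		if ((*zp)->free_pages) {
			(*zp)->free_pages--;
			return *zp;
		}
	}
	return NULL;
}
```

[With the five x86 zones registered exactly as in the mail, a GFP_DMA
allocation would come from the 1M-16M zone first and fall back to the
low 1M only when that is exhausted; an architecture like sparc64 would
simply register fewer zones into the same lists.]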

and eventually, when we get hot-plug memory, the hotplug event would be
just something like

	zone = kmalloc(sizeof(struct zone), GFP_KERNEL);
	create_zone(zone, start, end);

	.. populate it with the newly added memory ..

	/*
	 * Add it to all the appropriate zones (I suspect hotplug will
	 * only occur in high memory, but who knows? 
	 */
	add_zone(zone, GFP_HIGHMEM);
	...

(Note how this might also be part of the equation of how you add nodes
dynamically in a NUMA environment).

And see how the above would mean that something like sparc64 wouldn't need
to see five zones when it really only needs two of them.

		Linus


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] 32-bit dma memory zone
  2001-06-07 21:22 ` [patch] 32-bit dma memory zone Linus Torvalds
@ 2001-06-07 21:59   ` Richard Henderson
  2001-06-08  1:30     ` David S. Miller
  2001-06-08 14:55     ` Eric W. Biederman
  2001-06-08  9:05   ` Russell King
  2001-06-08 11:19   ` Jens Axboe
  2 siblings, 2 replies; 8+ messages in thread
From: Richard Henderson @ 2001-06-07 21:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Patrick Mochel, Alan Cox, David S. Miller,
	MOLNAR Ingo, Kanoj Sarcar, Kernel Mailing List

On Thu, Jun 07, 2001 at 02:22:10PM -0700, Linus Torvalds wrote:
> For example, what's the difference between ZONE_HIGHMEM and ZONE_NORMAL
> on a sane 64-bit architecture (right now I _think_ the 64-bit architectures
> actually make ZONE_NORMAL be what we call ZONE_DMA32 on x86, because they
> already need to be able to distinguish between memory that can be PCI-DMA'd
> to, and memory that needs bounce-buffers. Or maybe it's ZONE_DMA that they
> use for the DMA32 stuff?).

On most alphas we use only one zone -- ZONE_DMA.  The iommu makes it
possible to do 32-bit pci to the entire memory space.

For those alphas without an iommu, we also set up ZONE_NORMAL.


r~

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] 32-bit dma memory zone
  2001-06-07 21:59   ` Richard Henderson
@ 2001-06-08  1:30     ` David S. Miller
  2001-06-08  8:58       ` Steffen Persvold
  2001-06-08 14:55     ` Eric W. Biederman
  1 sibling, 1 reply; 8+ messages in thread
From: David S. Miller @ 2001-06-08  1:30 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Linus Torvalds, Jens Axboe, Patrick Mochel, Alan Cox, MOLNAR Ingo,
	Kanoj Sarcar, Kernel Mailing List


Richard Henderson writes:
 > On most alphas we use only one zone -- ZONE_DMA.  The iommu makes it
 > possible to do 32-bit pci to the entire memory space.
 > 
 > For those alphas without an iommu, we also set up ZONE_NORMAL.

And on sparc64 since all machines have an iommu, we use just ZONE_DMA
for everything.

Later,
David S. Miller
davem@redhat.com


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] 32-bit dma memory zone
  2001-06-08  1:30     ` David S. Miller
@ 2001-06-08  8:58       ` Steffen Persvold
  0 siblings, 0 replies; 8+ messages in thread
From: Steffen Persvold @ 2001-06-08  8:58 UTC (permalink / raw)
  To: David S. Miller
  Cc: Richard Henderson, Linus Torvalds, Jens Axboe, Patrick Mochel,
	Alan Cox, MOLNAR Ingo, Kanoj Sarcar, Kernel Mailing List

"David S. Miller" wrote:
> 
> Richard Henderson writes:
>  > On most alphas we use only one zone -- ZONE_DMA.  The iommu makes it
>  > possible to do 32-bit pci to the entire memory space.
>  >
>  > For those alphas without an iommu, we also set up ZONE_NORMAL.
> 
> And on sparc64 since all machines have an iommu, we use just ZONE_DMA
> for everything.
> 

And on IA64 they use both ZONE_NORMAL and ZONE_DMA. ZONE_DMA is up to 4GB.

This setup actually makes a PCI device driver I'm writing kind of broken.
It allocates buffers (with get_free_page) for streaming DMA and passes
them on to pci_map_sg(). These buffers can be really large, because this
is a shared memory adapter where you basically make large portions
(>100MByte) of your memory available to other machines over the PCI bus.
Unfortunately this adapter is not able to do DAC (64bit addressing), so I
have to be sure that the physical memory is within a 32bit range. Bounce
buffers are really out of the question because they would kill my
performance.

On Alpha (at least for the Tsunami and Nautilus models I've looked at)
this is guaranteed by using either the direct mapped windows (which limit
you to 2GB of physical memory) or the IOMMU scatter-gather windows. On
i386 I can't use GFP_DMA because it will only give me memory below
16MByte, and that is not enough for these buffers; but just using
ZONE_NORMAL memory (no special GFP flag to get_free_page) is fine (BTW, I
have not yet understood how a 32bit machine can access more than 4GB of
physical memory..).

The problem child here is IA64. These machines may or may not have an
IOMMU. If the machine doesn't have an IOMMU (like the 460GX chipset) and
you have a lot of memory (like 2GB), you might get a physical address
above the 4GB boundary, which is no good for my 32bit device. The IA64
code fixes this by using something called the Software I/O TLB, which
copies your data to memory below the 4GB boundary when you do
pci_map_xxx (if direction is DMA_TO_DEVICE), and copies it back when you
do pci_unmap_xxx (if direction is DMA_FROM_DEVICE). I guess this is what
you call bounce buffers, but since this I/O TLB area is so small by
default (4MByte I think) it is no good either, because I will soon run
out of Software I/O TLB entries, resulting in a kernel panic. My solution
here was to use the GFP_DMA flag to get_free_page to ensure that the page
was below the 4GB boundary.

Now, because of this I need an #ifdef __ia64__ (or maybe I could use
#ifndef __i386__ ?) in my driver, but I would really rather not have that
there. My suggestion is therefore to have a ZONE_32BIT (and a
corresponding GFP_32BIT flag) as a common way of ensuring that the memory
you get is guaranteed to be below the 4GB boundary. Actually this is
already mentioned in the IA64 swiotlb_alloc_consistent() code:

if (!hwdev || hwdev->dma_mask <= 0xffffffff)
	gfp |= GFP_DMA; /* XXX fix me: should change this to GFP_32BIT or ZONE_32BIT */
ret = (void *)__get_free_pages(gfp, get_order(size));
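
[The flag selection being asked for here could be factored into one small
helper. A user-space sketch with toy constants follows; GFP_32BIT is the
hypothetical flag proposed in this mail and dma_alloc_flags() is a
made-up name, not an existing kernel interface:]

```c
#include <assert.h>
#include <stdint.h>

#define GFP_KERNEL 0x00
#define GFP_DMA    0x01  /* ISA-style 16MB zone on i386 */
#define GFP_32BIT  0x02  /* hypothetical: memory guaranteed below 4GB */

/*
 * Pick allocation flags for a streaming-DMA buffer, given the device's
 * DMA mask and whether the platform has an iommu that can remap any
 * physical page into 32-bit bus space.
 */
static unsigned int dma_alloc_flags(uint64_t dma_mask, int have_iommu)
{
	if (have_iommu)
		return GFP_KERNEL;   /* the iommu remaps; any page will do */
	if (dma_mask <= 0xffffffffULL)
		return GFP_32BIT;    /* must come from below 4GB */
	return GFP_KERNEL;           /* device can address all of memory */
}
```

[This is exactly the test the swiotlb snippet above performs on
hwdev->dma_mask; a common GFP_32BIT would let the same driver code run
unchanged on i386, Alpha and IA64.]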


Regards,
-- 
  Steffen Persvold               Systems Engineer
  Email : mailto:sp@scali.no     Scali AS (http://www.scali.com)
  Tlf   : (+47) 22 62 89 50      Olaf Helsets vei 6
  Fax   : (+47) 22 62 89 51      N-0621 Oslo, Norway

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] 32-bit dma memory zone
  2001-06-07 21:22 ` [patch] 32-bit dma memory zone Linus Torvalds
  2001-06-07 21:59   ` Richard Henderson
@ 2001-06-08  9:05   ` Russell King
  2001-06-08 11:19   ` Jens Axboe
  2 siblings, 0 replies; 8+ messages in thread
From: Russell King @ 2001-06-08  9:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Patrick Mochel, Alan Cox, David S. Miller,
	MOLNAR Ingo, Richard Henderson, Kanoj Sarcar, Kernel Mailing List

On Thu, Jun 07, 2001 at 02:22:10PM -0700, Linus Torvalds wrote:
> So should we not try to have some nicer interface like
> ...

This would certainly be very useful for ARM.  For several machines,
we don't want the dma region starting at ram offset 0, but at some
offset into the memory space.  Your suggested interface allows for
this nicely.

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] 32-bit dma memory zone
  2001-06-07 21:22 ` [patch] 32-bit dma memory zone Linus Torvalds
  2001-06-07 21:59   ` Richard Henderson
  2001-06-08  9:05   ` Russell King
@ 2001-06-08 11:19   ` Jens Axboe
  2 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2001-06-08 11:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Patrick Mochel, Alan Cox, David S. Miller, MOLNAR Ingo,
	Richard Henderson, Kanoj Sarcar, Kernel Mailing List

On Thu, Jun 07 2001, Linus Torvalds wrote:
> 
> On Thu, 7 Jun 2001, Jens Axboe wrote:
> > 
> > I'd like to push this patch from the block highmem patch set, to prune
> > it down and make it easier to include it later on :-)
> > 
> > This patch implements a new memory zone, ZONE_DMA32. It holds highmem
> > pages that are below 4GB, as we can do I/O on those directly. Also if we
> > do need to bounce a > 4GB page, we can use pages from this zone and not
> > always resort to < 960MB pages.
> 
> Patrick Mochel has another patch that adds another zone on x86: the "low
> memory" zone for the 0-1MB area, which is special for some things, notably
> real mode bootstrapping (ie the SMP stuff could use it instead of the
> current special-case allocations, and Pat needs it for allocating low
> memory pages for suspend/resume).
> 
> I'd like to see what these two look like together.

Not a problem, would be easy to add 'one more zone'.

> But even more I'd like to see a more dynamic zone setup: we already have

[snip]

Sure, this looks pretty sane. Is this really what you want for 2.4,
though? How about just adding the DMA32 and 1M zones right now, and
postponing the bigger zone changes to 2.5? To be honest, I already
started implementing your specified interface -- most of the changes
aren't too bad, but still...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] 32-bit dma memory zone
  2001-06-07 21:59   ` Richard Henderson
  2001-06-08  1:30     ` David S. Miller
@ 2001-06-08 14:55     ` Eric W. Biederman
  2001-06-08 21:00       ` H. Peter Anvin
  1 sibling, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2001-06-08 14:55 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Linus Torvalds, Jens Axboe, Patrick Mochel, Alan Cox,
	David S. Miller, MOLNAR Ingo, Kanoj Sarcar, Kernel Mailing List

Richard Henderson <rth@redhat.com> writes:

> On Thu, Jun 07, 2001 at 02:22:10PM -0700, Linus Torvalds wrote:
> > For example, what's the difference between ZONE_HIGHMEM and ZONE_NORMAL
> > on a sane 64-bit architecture (right now I _think_ the 64-bit architectures
> > actually make ZONE_NORMAL be what we call ZONE_DMA32 on x86, because they
> > already need to be able to distinguish between memory that can be PCI-DMA'd
> > to, and memory that needs bounce-buffers. Or maybe it's ZONE_DMA that they
> > use for the DMA32 stuff?).
> 
> On most alphas we use only one zone -- ZONE_DMA.  The iommu makes it
> possible to do 32-bit pci to the entire memory space.
> 
> For those alphas without an iommu, we also set up ZONE_NORMAL.

The AMD760, which looks like it might walk on both the Alpha and x86
sides of the fence, also has an iommu.  Mostly it's used for AGP, but
according to the docs it should be able to handle the other cases as
well.  The only downside is that it only supports 4GB of ram...

Anyway, we shouldn't assume iommus don't exist on x86.

Eric


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] 32-bit dma memory zone
  2001-06-08 14:55     ` Eric W. Biederman
@ 2001-06-08 21:00       ` H. Peter Anvin
  0 siblings, 0 replies; 8+ messages in thread
From: H. Peter Anvin @ 2001-06-08 21:00 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <m13d9b3ttj.fsf@frodo.biederman.org>
By author:    ebiederm@xmission.com (Eric W. Biederman)
In newsgroup: linux.dev.kernel
> 
> The AMD760 which looks like it might walk on both the alpha, an x86
> side of the fence also has an iommu.  Mostly it's used for AGP but
> according to the docs it should be able to handle the other cases as
> well.  The only downside is that it only supports 4GB of ram...
> 
> Anyway we shouldn't assume iommu's don't exist on x86.
> 

On most chips the AGP GART isn't just limited to AGP; it's a
full-fledged iommu.  The main problem with it is that the space it
provides is usually rather limited.

       -hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2001-06-08 21:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20010607153119.H1522@suse.de>
2001-06-07 21:22 ` [patch] 32-bit dma memory zone Linus Torvalds
2001-06-07 21:59   ` Richard Henderson
2001-06-08  1:30     ` David S. Miller
2001-06-08  8:58       ` Steffen Persvold
2001-06-08 14:55     ` Eric W. Biederman
2001-06-08 21:00       ` H. Peter Anvin
2001-06-08  9:05   ` Russell King
2001-06-08 11:19   ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox