* Re: [patch] 32-bit dma memory zone

From: Linus Torvalds @ 2001-06-07 21:22 UTC
To: Jens Axboe, Patrick Mochel
Cc: Alan Cox, David S. Miller, MOLNAR Ingo, Richard Henderson, Kanoj Sarcar, Kernel Mailing List
In-reply-to: [not found] <20010607153119.H1522@suse.de>

On Thu, 7 Jun 2001, Jens Axboe wrote:
>
> I'd like to push this patch from the block highmem patch set, to prune
> it down and make it easier to include it later on :-)
>
> This patch implements a new memory zone, ZONE_DMA32. It holds highmem
> pages that are below 4GB, as we can do I/O on those directly. Also if we
> do need to bounce a > 4GB page, we can use pages from this zone and not
> always resort to < 960MB pages.

Patrick Mochel has another patch that adds another zone on x86: the "low
memory" zone for the 0-1MB area, which is special for some things, notably
real mode bootstrapping (ie the SMP stuff could use it instead of the
current special-case allocations, and Pat needs it for allocating low
memory pages for suspend/resume).

I'd like to see what these two look like together.

But even more I'd like to see a more dynamic zone setup: we already have
people talking about adding memory dynamically at run-time on some of the
server machines, which implies that we might want to add zones at a later
time, along with binding those zones to different zonelists.

This is also an issue for different architectures: some of these zones do
not make any _sense_ on other architectures. For example, what's the
difference between ZONE_HIGHMEM and ZONE_NORMAL on a sane 64-bit
architecture? (Right now I _think_ the 64-bit architectures actually make
ZONE_NORMAL be what we call ZONE_DMA32 on x86, because they already need
to be able to distinguish between memory that can be PCI-DMA'd to, and
memory that needs bounce-buffers. Or maybe it's ZONE_DMA that they use
for the DMA32 stuff?)

Anyway, what I'm saying is that "GFP_HIGHMEM" already traverses three
zones, and with ZONE_1M and ZONE_DMA32, you'd have a list of five of
them. Of which only _two_ would actually be meaningful on some
architectures.

So should we not try to have some nicer interface like

	create_zone(&zone, offset, end);
	add_zone(&zone, zonelist);

and then we could on x86 have

	create_zone(zone+0, 0, 1M);
	create_zone(zone+1, 1M, 16M);
	create_zone(zone+2, 16M, 896M);
	create_zone(zone+3, 896M, 4G);
	create_zone(zone+4, 4G, 64G);

	.. populate the zones ..

	add_zone(zone+4, GFP_HIGHMEM);

	add_zone(zone+3, GFP_HIGHMEM);
	add_zone(zone+3, GFP_DMA32);

	add_zone(zone+2, GFP_HIGHMEM);
	add_zone(zone+2, GFP_DMA32);
	add_zone(zone+2, GFP_NORMAL);

	/* the 1M-16M zone is usable for just about everything */
	add_zone(zone+1, GFP_HIGHMEM);
	add_zone(zone+1, GFP_DMA32);
	add_zone(zone+1, GFP_NORMAL);
	add_zone(zone+1, GFP_DMA);

	/* The low 1M can be used for everything */
	add_zone(zone+0, GFP_HIGHMEM);
	add_zone(zone+0, GFP_DMA32);
	add_zone(zone+0, GFP_NORMAL);
	add_zone(zone+0, GFP_DMA);
	add_zone(zone+0, GFP_LOWMEM);

and eventually, when we get hot-plug memory, the hotplug event would be
just something like

	zone = kmalloc(sizeof(struct zone), GFP_KERNEL);
	create_zone(zone, start, end);
	.. populate it with the newly added memory ..

	/*
	 * Add it to all the appropriate zones (I suspect hotplug will
	 * only occur in high memory, but who knows?)
	 */
	add_zone(zone, GFP_HIGHMEM);
	...

(Note how this might also be part of the equation of how you add nodes
dynamically in a NUMA environment).

And see how the above would mean that something like sparc64 wouldn't
need to see five zones when it really only needs two of them.

		Linus
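As an editorial aside: the create_zone/add_zone names above come straight from the proposal, but a minimal userspace sketch can make the intended semantics concrete. Everything else here — the struct layout, the MAX_FALLBACK limit, and the first-fit fallback loop — is invented for illustration and is far simpler than what a real kernel zone allocator would need:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone descriptor: just a physical address range plus a
 * free-page counter.  A real kernel zone carries far more state. */
struct zone {
	unsigned long long start, end;	/* physical range [start, end) */
	unsigned long free_pages;
};

#define MAX_FALLBACK 8

/* A zonelist is an ordered fallback chain, searched front to back,
 * matching the add_zone(zone+N, GFP_xxx) calls in the proposal. */
struct zonelist {
	struct zone *zones[MAX_FALLBACK];
	int count;
};

void create_zone(struct zone *z, unsigned long long start,
		 unsigned long long end)
{
	z->start = start;
	z->end = end;
	z->free_pages = 0;
}

void add_zone(struct zone *z, struct zonelist *zl)
{
	if (zl->count < MAX_FALLBACK)
		zl->zones[zl->count++] = z;
}

/* Allocation walks the zonelist and takes a page from the first zone
 * that still has any; architectures that register only two zones
 * simply get a two-entry chain. */
struct zone *alloc_from(struct zonelist *zl)
{
	for (int i = 0; i < zl->count; i++) {
		if (zl->zones[i]->free_pages > 0) {
			zl->zones[i]->free_pages--;
			return zl->zones[i];
		}
	}
	return NULL;	/* all zones in this list are exhausted */
}
```

With this shape, a sparc64-style configuration would register one or two zones into every GFP list, while x86 could register all five — the per-architecture difference lives entirely in the setup calls, not in the allocator.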
* Re: [patch] 32-bit dma memory zone

From: Richard Henderson @ 2001-06-07 21:59 UTC
To: Linus Torvalds
Cc: Jens Axboe, Patrick Mochel, Alan Cox, David S. Miller, MOLNAR Ingo, Kanoj Sarcar, Kernel Mailing List

On Thu, Jun 07, 2001 at 02:22:10PM -0700, Linus Torvalds wrote:
> For example, what's the difference between ZONE_HIGHMEM and ZONE_NORMAL
> on a sane 64-bit architecture (right now I _think_ the 64-bit architectures
> actually make ZONE_NORMAL be what we call ZONE_DMA32 on x86, because they
> already need to be able to distinguish between memory that can be PCI-DMA'd
> to, and memory that needs bounce-buffers. Or maybe it's ZONE_DMA that they
> use for the DMA32 stuff?).

On most alphas we use only one zone -- ZONE_DMA. The iommu makes it
possible to do 32-bit pci to the entire memory space.

For those alphas without an iommu, we also set up ZONE_NORMAL.

r~
* Re: [patch] 32-bit dma memory zone

From: David S. Miller @ 2001-06-08 1:30 UTC
To: Richard Henderson
Cc: Linus Torvalds, Jens Axboe, Patrick Mochel, Alan Cox, MOLNAR Ingo, Kanoj Sarcar, Kernel Mailing List

Richard Henderson writes:
> On most alphas we use only one zone -- ZONE_DMA. The iommu makes it
> possible to do 32-bit pci to the entire memory space.
>
> For those alphas without an iommu, we also set up ZONE_NORMAL.

And on sparc64, since all machines have an iommu, we use just ZONE_DMA
for everything.

Later,
David S. Miller
davem@redhat.com
* Re: [patch] 32-bit dma memory zone

From: Steffen Persvold @ 2001-06-08 8:58 UTC
To: David S. Miller
Cc: Richard Henderson, Linus Torvalds, Jens Axboe, Patrick Mochel, Alan Cox, MOLNAR Ingo, Kanoj Sarcar, Kernel Mailing List

"David S. Miller" wrote:
>
> Richard Henderson writes:
> > On most alphas we use only one zone -- ZONE_DMA. The iommu makes it
> > possible to do 32-bit pci to the entire memory space.
> >
> > For those alphas without an iommu, we also set up ZONE_NORMAL.
>
> And on sparc64 since all machines have an iommu, we use just ZONE_DMA
> for everything.
>

And on IA64 they use both ZONE_NORMAL and ZONE_DMA. ZONE_DMA is up to
4GB.

This setup actually makes a PCI device driver I'm writing kind of
broken. It allocates buffers (with get_free_page) for streaming DMA and
passes them on to pci_map_sg(). These buffers can be really large,
because this is a shared memory adapter where you basically make large
portions (>100MByte) of your memory available to other machines over the
PCI bus. Unfortunately this adapter is not able to do DAC (64bit
addressing), so I have to be sure that the physical memory is within a
32bit range. Bounce buffers are really out of the question because they
will kill my performance.

On Alpha (at least for the Tsunami and Nautilus models I've looked at)
this is guaranteed by using either the direct mapped windows (which
limit you to 2GB of physical memory) or the IOMMU scatter-gather
windows.

On i386 I can't use GFP_DMA because this will only give me memory below
16MByte, and that is not enough for these buffers, but just using
ZONE_NORMAL memory (no special GFP flag to get_free_page) is fine (BTW,
I have not yet understood how a 32bit machine can access more than 4GB
of physical memory...).

The problem child here is IA64. These machines may or may not have an
IOMMU. If the machine doesn't have an IOMMU (like the 460GX chipset) and
you have a lot of memory (like 2GB), you might get a physical address
above the 4GB boundary, which is no good for my 32bit device. The IA64
code fixes this by using something called a Software I/O TLB, which
copies your data to memory below the 4GB boundary when you do
pci_map_xxx (if direction is DMA_TO_DEVICE), and copies it back when you
do pci_unmap_xxx (if direction is DMA_FROM_DEVICE). I guess this is what
you call bounce buffers, but since this I/O TLB area is so small by
default (4MByte I think) it is no good either, because I will soon run
out of Software IO TLB entries, resulting in a kernel panic.

My solution here was to use the GFP_DMA flag to get_free_page to ensure
that the page was below the 4GB boundary. Now, because of this I need an
#ifdef __ia64__ (or maybe I could use #ifndef __i386__ ?) in my driver,
but I would really rather not have that there.

My suggestion is therefore to have a ZONE_32BIT (and a corresponding
GFP_32BIT flag) to have a common way of ensuring that the memory you get
is guaranteed to be below the 4GB boundary. Actually this is already
mentioned in the IA64 swiotlb_alloc_consistent() code:

	if (!hwdev || hwdev->dma_mask <= 0xffffffff)
		gfp |= GFP_DMA;	/* XXX fix me: should change this to GFP_32BIT or ZONE_32BIT */
	ret = (void *)__get_free_pages(gfp, get_order(size));

Regards,
--
 Steffen Persvold               Systems Engineer
 Email : mailto:sp@scali.no     Scali AS (http://www.scali.com)
 Tlf   : (+47) 22 62 89 50      Olaf Helsets vei 6
 Fax   : (+47) 22 62 89 51      N-0621 Oslo, Norway
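Editorially, the dma_mask test quoted from swiotlb_alloc_consistent() above is the whole trick, and it can be isolated as a tiny helper. In the sketch below only the `mask <= 0xffffffff` comparison comes from the quoted kernel code; the flag values and the function name are made up for illustration, and the real GFP_DMA/GFP_KERNEL constants are different:

```c
#include <assert.h>

/* Illustrative flag bits only; real kernel GFP values differ. */
#define GFP_KERNEL 0x01u
#define GFP_DMA    0x02u

/*
 * Decide which allocation flags a DMA buffer for a given device needs:
 * if the device's dma_mask shows it cannot address memory above 4GB
 * (i.e. no DAC / 64-bit addressing), restrict the allocation to the
 * low zone -- the same decision the quoted swiotlb code makes, and
 * what a generic GFP_32BIT flag would express portably.
 */
unsigned int dma_alloc_flags(unsigned long long dma_mask, unsigned int gfp)
{
	if (dma_mask <= 0xffffffffULL)
		gfp |= GFP_DMA;
	return gfp;
}
```

A driver could then pass `dma_alloc_flags(dev_mask, GFP_KERNEL)` to its page allocator on every architecture, instead of hiding the decision behind an #ifdef __ia64__.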
* Re: [patch] 32-bit dma memory zone

From: Eric W. Biederman @ 2001-06-08 14:55 UTC
To: Richard Henderson
Cc: Linus Torvalds, Jens Axboe, Patrick Mochel, Alan Cox, David S. Miller, MOLNAR Ingo, Kanoj Sarcar, Kernel Mailing List

Richard Henderson <rth@redhat.com> writes:
> On Thu, Jun 07, 2001 at 02:22:10PM -0700, Linus Torvalds wrote:
> > For example, what's the difference between ZONE_HIGHMEM and ZONE_NORMAL
> > on a sane 64-bit architecture (right now I _think_ the 64-bit architectures
> > actually make ZONE_NORMAL be what we call ZONE_DMA32 on x86, because they
> > already need to be able to distinguish between memory that can be PCI-DMA'd
> > to, and memory that needs bounce-buffers. Or maybe it's ZONE_DMA that they
> > use for the DMA32 stuff?).
>
> On most alphas we use only one zone -- ZONE_DMA. The iommu makes it
> possible to do 32-bit pci to the entire memory space.
>
> For those alphas without an iommu, we also set up ZONE_NORMAL.

The AMD760, which looks like it might walk on both the alpha and x86
sides of the fence, also has an iommu. Mostly it's used for AGP, but
according to the docs it should be able to handle the other cases as
well. The only downside is that it only supports 4GB of ram...

Anyway, we shouldn't assume iommus don't exist on x86.

Eric
* Re: [patch] 32-bit dma memory zone

From: H. Peter Anvin @ 2001-06-08 21:00 UTC
To: linux-kernel

Followup to: <m13d9b3ttj.fsf@frodo.biederman.org>
By author: ebiederm@xmission.com (Eric W. Biederman)
In newsgroup: linux.dev.kernel
>
> The AMD760 which looks like it might walk on both the alpha, an x86
> side of the fence also has an iommu. Mostly it's used for AGP but
> according to the docs it should be able to handle the other cases as
> well. The only downside is that it only supports 4GB of ram...
>
> Anyway we shouldn't assume iommu's don't exist on x86.
>

On most chips the AGP GART isn't just limited to AGP; it's a
full-fledged iommu. The main problem with it is that the space it
provides is usually rather limited.

-hpa

--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
* Re: [patch] 32-bit dma memory zone

From: Russell King @ 2001-06-08 9:05 UTC
To: Linus Torvalds
Cc: Jens Axboe, Patrick Mochel, Alan Cox, David S. Miller, MOLNAR Ingo, Richard Henderson, Kanoj Sarcar, Kernel Mailing List

On Thu, Jun 07, 2001 at 02:22:10PM -0700, Linus Torvalds wrote:
> So should we not try to have some nicer interface like
> ...

This would certainly be very useful for ARM. For several machines, we
don't want the dma region starting at ram offset 0, but at some offset
into the memory space. Your suggested interface allows for this nicely.

--
Russell King (rmk@arm.linux.org.uk)
The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
* Re: [patch] 32-bit dma memory zone

From: Jens Axboe @ 2001-06-08 11:19 UTC
To: Linus Torvalds
Cc: Patrick Mochel, Alan Cox, David S. Miller, MOLNAR Ingo, Richard Henderson, Kanoj Sarcar, Kernel Mailing List

On Thu, Jun 07 2001, Linus Torvalds wrote:
>
> On Thu, 7 Jun 2001, Jens Axboe wrote:
> >
> > I'd like to push this patch from the block highmem patch set, to prune
> > it down and make it easier to include it later on :-)
> >
> > This patch implements a new memory zone, ZONE_DMA32. It holds highmem
> > pages that are below 4GB, as we can do I/O on those directly. Also if we
> > do need to bounce a > 4GB page, we can use pages from this zone and not
> > always resort to < 960MB pages.
>
> Patrick Mochel has another patch that adds another zone on x86: the "low
> memory" zone for the 0-1MB area, which is special for some things, notably
> real mode bootstrapping (ie the SMP stuff could use it instead of the
> current special-case allocations, and Pat needs it for allocating low
> memory pages for suspend/resume).
>
> I'd like to see what these two look like together.

Not a problem, it would be easy to add 'one more zone'.

> But even more I'd like to see a more dynamic zone setup: we already have

[snip]

Sure, this looks pretty sane. Is this really what you want for 2.4,
though? How about just adding the DMA32 and 1M zones right now, and
postponing the bigger zone changes to 2.5? To be honest, I already
started implementing your specified interface -- most of the changes
aren't too bad, but still...

--
Jens Axboe
Thread overview: 8+ messages
[not found] <20010607153119.H1522@suse.de>
2001-06-07 21:22 ` [patch] 32-bit dma memory zone Linus Torvalds
2001-06-07 21:59 ` Richard Henderson
2001-06-08 1:30 ` David S. Miller
2001-06-08 8:58 ` Steffen Persvold
2001-06-08 14:55 ` Eric W. Biederman
2001-06-08 21:00 ` H. Peter Anvin
2001-06-08 9:05 ` Russell King
2001-06-08 11:19 ` Jens Axboe