* 2.4.8aa1 @ 2001-08-11 12:58 Andrea Arcangeli 2001-08-11 13:32 ` 2.4.8aa1 Eyal Lebedinsky 0 siblings, 1 reply; 9+ messages in thread From: Andrea Arcangeli @ 2001-08-11 12:58 UTC (permalink / raw) To: linux-kernel Only in 2.4.8pre8aa1: 40_blkdev-pagecache-11 Only in 2.4.8aa1: 40_blkdev-pagecache-12 Fixed a missing drop_super and some other detail related to superblock handling (superblock cannot go away anymore under us after a get_super, so some piece of code is been cleaned up). Andrea ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.4.8aa1 2001-08-11 12:58 2.4.8aa1 Andrea Arcangeli @ 2001-08-11 13:32 ` Eyal Lebedinsky 2001-08-11 14:02 ` 2.4.8aa1 Andrea Arcangeli 0 siblings, 1 reply; 9+ messages in thread From: Eyal Lebedinsky @ 2001-08-11 13:32 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-kernel Andrea Arcangeli wrote: > > Only in 2.4.8pre8aa1: 40_blkdev-pagecache-11 > Only in 2.4.8aa1: 40_blkdev-pagecache-12 Everything possible selected as a module. The relevant DRM part of .config: CONFIG_DRM=y CONFIG_DRM_TDFX=m CONFIG_DRM_GAMMA=m CONFIG_DRM_R128=m CONFIG_DRM_RADEON=m CONFIG_DRM_I810=m CONFIG_DRM_MGA=m gcc -D__KERNEL__ -I/data2/usr/local/src/linux-2.4/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fomit-frame-pointer -fno-strict-aliasing -fno-common -pipe -mpreferred-stack-boundary=2 -march=i686 -malign-functions=4 -DMODULE -DMODVERSIONS -include /data2/usr/local/src/linux-2.4/include/linux/modversions.h -c -o r128_drv.o r128_drv.c In file included from r128_drv.c:36: ati_pcigart.h: In function `r128_ati_pcigart_init': ati_pcigart.h:114: structure has no member named `virtual' make[3]: *** [r128_drv.o] Error 1 make[3]: Leaving directory `/data2/usr/local/src/linux-2.4/drivers/char/drm' -- Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.anu.edu.au/eyal/> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.4.8aa1 2001-08-11 13:32 ` 2.4.8aa1 Eyal Lebedinsky @ 2001-08-11 14:02 ` Andrea Arcangeli 2001-08-11 18:20 ` 2.4.8aa1 Linus Torvalds 0 siblings, 1 reply; 9+ messages in thread From: Andrea Arcangeli @ 2001-08-11 14:02 UTC (permalink / raw) To: Eyal Lebedinsky; +Cc: linux-kernel, Linus Torvalds On Sat, Aug 11, 2001 at 11:32:46PM +1000, Eyal Lebedinsky wrote: > Andrea Arcangeli wrote: > > > > Only in 2.4.8pre8aa1: 40_blkdev-pagecache-11 > > Only in 2.4.8aa1: 40_blkdev-pagecache-12 > > Everything possible selected as a module. > > The relevant DRM part of .config: > > CONFIG_DRM=y > CONFIG_DRM_TDFX=m > CONFIG_DRM_GAMMA=m > CONFIG_DRM_R128=m > CONFIG_DRM_RADEON=m > CONFIG_DRM_I810=m > CONFIG_DRM_MGA=m > > gcc -D__KERNEL__ -I/data2/usr/local/src/linux-2.4/include -Wall > -Wstrict-prototypes -Wno-trigraphs -O2 -fomit-frame-pointer > -fno-strict-aliasing -fno-common -pipe -mpreferred-stack-boundary=2 > -march=i686 -malign-functions=4 -DMODULE -DMODVERSIONS -include > /data2/usr/local/src/linux-2.4/include/linux/modversions.h -c -o > r128_drv.o r128_drv.c > In file included from r128_drv.c:36: > ati_pcigart.h: In function `r128_ati_pcigart_init': > ati_pcigart.h:114: structure has no member named `virtual' > make[3]: *** [r128_drv.o] Error 1 > make[3]: Leaving directory > `/data2/usr/local/src/linux-2.4/drivers/char/drm' This is the same problem I mentioned yesterday to the list. Nobody should ever use page->virtual directly, it's not there in -aa when highmem is disabled to save memory and increase performance, if it would be in C or python it would be a private field of the class to make it explicit (nitpicking, in python __ just rename and it's techincally still visible from the outside of the class). page_address(page) must be used instead of page->virtual. Anyways here an incremental patch that will fix your compile problem (Linus please include): --- 2.4.8aa1/drivers/char/drm/ati_pcigart.h.~1~ Sat Aug 11 15:54:25 2001 +++ 2.4.8aa1/drivers/char/drm/ati_pcigart.h Sat Aug 11 15:54:55 2001 @@ -111,7 +111,7 @@ memset( pci_gart, 0, ATI_MAX_PCIGART_PAGES * sizeof(u32) ); for ( i = 0 ; i < pages ; i++ ) { - page_base = virt_to_bus( entry->pagelist[i]->virtual ); + page_base = virt_to_bus( page_address(entry->pagelist[i]) ); for (j = 0; j < (PAGE_SIZE / ATI_PCIGART_PAGE_SIZE); j++) { *pci_gart++ = cpu_to_le32( page_base ); page_base += ATI_PCIGART_PAGE_SIZE; --- 2.4.8aa1/drivers/char/drm/r128_cce.c.~1~ Sat Aug 11 08:04:05 2001 +++ 2.4.8aa1/drivers/char/drm/r128_cce.c Sat Aug 11 15:55:51 2001 @@ -351,10 +351,10 @@ page_ofs = tmp_ofs >> PAGE_SHIFT; R128_WRITE( R128_PM4_BUFFER_DL_RPTR_ADDR, - virt_to_bus(entry->pagelist[page_ofs]->virtual)); + virt_to_bus(page_address(entry->pagelist[page_ofs]))); DRM_DEBUG( "ring rptr: offset=0x%08lx handle=0x%08lx\n", - virt_to_bus(entry->pagelist[page_ofs]->virtual), + virt_to_bus(page_address(entry->pagelist[page_ofs])), entry->handle + tmp_ofs ); } --- 2.4.8aa1/drivers/char/drm/radeon_cp.c.~1~ Sat Aug 11 08:04:05 2001 +++ 2.4.8aa1/drivers/char/drm/radeon_cp.c Sat Aug 11 15:56:33 2001 @@ -624,10 +624,10 @@ page_ofs = tmp_ofs >> PAGE_SHIFT; RADEON_WRITE( RADEON_CP_RB_RPTR_ADDR, - virt_to_bus(entry->pagelist[page_ofs]->virtual)); + virt_to_bus(page_address(entry->pagelist[page_ofs]))); DRM_DEBUG( "ring rptr: offset=0x%08lx handle=0x%08lx\n", - virt_to_bus(entry->pagelist[page_ofs]->virtual), + virt_to_bus(page_address(entry->pagelist[page_ofs])), entry->handle + tmp_ofs ); } Also please include this below one too that I just addressed previously for my own compilations. (Eyal, you don't need to apply the below one of course) --- 2.4.8pre7aa1/drivers/char/drm/drm_vm.h.~1~ Thu Aug 9 01:37:46 2001 +++ 2.4.8pre7aa1/drivers/char/drm/drm_vm.h Thu Aug 9 01:42:22 2001 @@ -107,7 +107,7 @@ if( !pmd_present( *pmd ) ) return NOPAGE_OOM; pte = pte_offset( pmd, i ); if( !pte_present( *pte ) ) return NOPAGE_OOM; - physical = (unsigned long)pte_page( *pte )->virtual; + physical = (unsigned long)page_address(pte_page( *pte )); atomic_inc(&virt_to_page(physical)->count); /* Dec. by kernel */ DRM_DEBUG("0x%08lx => 0x%08lx\n", address, physical); Andrea ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.4.8aa1 2001-08-11 14:02 ` 2.4.8aa1 Andrea Arcangeli @ 2001-08-11 18:20 ` Linus Torvalds 2001-08-12 17:02 ` 2.4.8aa1 Andrea Arcangeli 0 siblings, 1 reply; 9+ messages in thread From: Linus Torvalds @ 2001-08-11 18:20 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Eyal Lebedinsky, linux-kernel Andrea, mind cleaning this up a bit and not just papering over the horridness? On Sat, 11 Aug 2001, Andrea Arcangeli wrote: > > This is the same problem I mentioned yesterday to the list. Nobody > should ever use page->virtual directly, it's not there in -aa when > highmem is disabled to save memory and increase performance, if it would > be in C or python it would be a private field of the class to make it > explicit (nitpicking, in python __ just rename and it's techincally > still visible from the outside of the class). > > page_address(page) must be used instead of page->virtual. It would be good to instead adding the functions unsigned long pte_to_pfn(pte_t pte) { .. architecture-specific in asm/pgtable.h .. } /* * struct page -> "page frame number", ie * physical page number. */ unsigned long page_to_pfn(struct page *page) { zone_t zone = page->zone; return (page - zone->zone_mem_map) + zone->zone_start_mapnr; } unsigned long long page_to_bus(struct page *page) { return (unsigned long long) phys_to_bus(page_to_pfn(page) << PAGE_SHIFT; } and using those? As it is right now, drivers that _could_ use up to 4GB of bus addresses simply cannot do it, because there is no good way to get the high physical addresses from a "struct page". (You can do the above by hand, of course, but device driver writers really shouldn't know about the internals of page zone handling). The things that you changed were all virt_to_bus( page_address (...) ) which really is rather distateful in that it artificially limits itself to only lowmem code, and gets randomly incorrect values for anything else. > @@ -107,7 +107,7 @@ > if( !pmd_present( *pmd ) ) return NOPAGE_OOM; > pte = pte_offset( pmd, i ); > if( !pte_present( *pte ) ) return NOPAGE_OOM; > - physical = (unsigned long)pte_page( *pte )->virtual; > + physical = (unsigned long)page_address(pte_page( *pte )); > atomic_inc(&virt_to_page(physical)->count); /* Dec. by kernel */ That is just too ugly for words. It's not a "physical" address at all, but a virtual one, and we're getting the virtual address from the struct page just to get back to the struct page. It should really do struct page *page = pte_page( *pte ); get_page(page); instead.. Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.4.8aa1 2001-08-11 18:20 ` 2.4.8aa1 Linus Torvalds @ 2001-08-12 17:02 ` Andrea Arcangeli 2001-08-12 17:21 ` 2.4.8aa1 Linus Torvalds 0 siblings, 1 reply; 9+ messages in thread From: Andrea Arcangeli @ 2001-08-12 17:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: Eyal Lebedinsky, linux-kernel On Sat, Aug 11, 2001 at 11:20:07AM -0700, Linus Torvalds wrote: > > Andrea, > mind cleaning this up a bit and not just papering over the horridness? I'm sorry but while fixing the compilation troubles I didn't spent one millisecond looking at what the code was trying to achieve despite it looked doing something silly, I was just doing a blind sed/page->virtual/page_address(page)/ and I cared to communicate the community to never use page->virtual but to always use page_address() instead. Infact you obviously didn't looked at such horrid code either when you included it between 2.4.8pre6 and 2.4.8pre7 (not that I pretend you to look at every driver update you include of course, this is not a remark, it's just a matter of fact): diff -urN 2.4.8pre6/drivers/char/drm/drm_vm.h 2.4.8pre7/drivers/char/drm/drm_vm.h --- 2.4.8pre6/drivers/char/drm/drm_vm.h Thu Jan 1 01:00:00 1970 +++ 2.4.8pre7/drivers/char/drm/drm_vm.h Wed Aug 8 22:06:34 2001 [..] + physical = (unsigned long)pte_page( *pte )->virtual; + atomic_inc(&virt_to_page(physical)->count); /* Dec. by kernel */ [..] BTW, I think it would be nice if we also avoid to waste ram and x86 ram bandwith with page->virtual in mainline where it's not needed, here the patch (possibly other archs could use it too by simply defining the CONFIG_NO_PAGE_VIRTUAL to Y when highmem is not defined, but I thought it was mostly important for x86 lowend): diff -urN 2.4.6pre5/arch/i386/config.in novirtual/arch/i386/config.in --- 2.4.6pre5/arch/i386/config.in Thu Jun 21 08:03:30 2001 +++ novirtual/arch/i386/config.in Thu Jun 21 16:02:11 2001 @@ -165,6 +165,9 @@ define_bool CONFIG_HIGHMEM y define_bool CONFIG_X86_PAE y fi +if [ "$CONFIG_NOHIGHMEM" = "y" ]; then + define_bool CONFIG_NO_PAGE_VIRTUAL y +fi bool 'Math emulation' CONFIG_MATH_EMULATION bool 'MTRR (Memory Type Range Register) support' CONFIG_MTRR diff -urN 2.4.6pre5/include/asm-i386/pgtable.h novirtual/include/asm-i386/pgtable.h --- 2.4.6pre5/include/asm-i386/pgtable.h Thu Jun 14 18:07:49 2001 +++ novirtual/include/asm-i386/pgtable.h Thu Jun 21 16:02:11 2001 @@ -255,7 +255,11 @@ * Permanent address of a page. Obviously must never be * called on a highmem page. */ +#ifdef CONFIG_NO_PAGE_VIRTUAL +#define page_address(page) __va((page - mem_map) << PAGE_SHIFT) +#else #define page_address(page) ((page)->virtual) +#endif #define pages_to_mb(x) ((x) >> (20-PAGE_SHIFT)) /* diff -urN 2.4.6pre5/include/linux/mm.h novirtual/include/linux/mm.h --- 2.4.6pre5/include/linux/mm.h Thu Jun 21 08:03:56 2001 +++ novirtual/include/linux/mm.h Thu Jun 21 16:02:33 2001 @@ -160,8 +160,10 @@ wait_queue_head_t wait; /* Page locked? Stand in line... */ struct page **pprev_hash; /* Complement to *next_hash. */ struct buffer_head * buffers; /* Buffer maps us to a disk block. */ +#ifndef CONFIG_NO_PAGE_VIRTUAL void *virtual; /* Kernel virtual address (NULL if not kmapped, ie. highmem) */ +#endif struct zone_struct *zone; /* Memory zone we are in. */ } mem_map_t; diff -urN 2.4.6pre5/mm/page_alloc.c novirtual/mm/page_alloc.c --- 2.4.6pre5/mm/page_alloc.c Thu Jun 21 08:03:57 2001 +++ novirtual/mm/page_alloc.c Thu Jun 21 16:02:11 2001 @@ -851,8 +851,10 @@ for (i = 0; i < size; i++) { struct page *page = mem_map + offset + i; page->zone = zone; +#ifndef CONFIG_NO_PAGE_VIRTUAL if (j != ZONE_HIGHMEM) page->virtual = __va(zone_start_paddr); +#endif zone_start_paddr += PAGE_SIZE; } > On Sat, 11 Aug 2001, Andrea Arcangeli wrote: > > > > This is the same problem I mentioned yesterday to the list. Nobody > > should ever use page->virtual directly, it's not there in -aa when > > highmem is disabled to save memory and increase performance, if it would > > be in C or python it would be a private field of the class to make it > > explicit (nitpicking, in python __ just rename and it's techincally > > still visible from the outside of the class). > > > > page_address(page) must be used instead of page->virtual. > > It would be good to instead adding the functions > > unsigned long pte_to_pfn(pte_t pte) > { > .. architecture-specific in asm/pgtable.h .. > } > > /* > * struct page -> "page frame number", ie > * physical page number. > */ > unsigned long page_to_pfn(struct page *page) > { > zone_t zone = page->zone; > return (page - zone->zone_mem_map) + zone->zone_start_mapnr; > } > > unsigned long long page_to_bus(struct page *page) > { > return (unsigned long long) phys_to_bus(page_to_pfn(page) << PAGE_SHIFT; > } > > and using those? As it is right now, drivers that _could_ use up to 4GB of > bus addresses simply cannot do it, because there is no good way to get the > high physical addresses from a "struct page". > > (You can do the above by hand, of course, but device driver writers really > shouldn't know about the internals of page zone handling). > > The things that you changed were all > > virt_to_bus( page_address (...) ) > > which really is rather distateful in that it artificially limits itself to > only lowmem code, and gets randomly incorrect values for anything else. virt_to_bus(page_address(pte_page(pte))) actually returns the right bus address on x86 in a range of 4G, so the first part of the highmem (99% of boxes where those drivers are needed I guess) gets covered too (of course also virt_to_bus is limited to 32bit in its reval in first place so anything physical above 4g breaks on the 32bit boxes). In 2.2 instead virt_to_bus malfunctions on _all_ the highmem "virtual addresses" (if we can call ""virtual addresses"" the wrap around). But of course calling page_address() on anything highmem is ugly like hell even if it ""incidentally"" works on 2.4 up to 4G (it temporarly wraps around in userspace). Furthmore the worst bug here is that calling virt_to_bus will break badly on alpha or any other 64bit arch. virt_to_bus should never used and it should been undefined by davem when he introduced in the iommu API so people would stop doing those mistakes even today in a 2.4.8 driver update. The pci api on x86 will work up to 4G (as well as the virt_to_bus(page_address(pte_page(pte))) and then it will break because there's not iommu on x86, while on the real 64bit platforms it works for all the physical ram, but nobody checks for the reval so the iommu capable platforms are prone to break too. Also the fact we pass virtual addresses to the pci_map API is quite an ugly API for the 32bit platforms, same ugliness of virt_to_bus(page_address(pte_page(pte))) on the memory between 1G and 4G, it was really only supposed to get the lowmem virtual addresses infact (as virt_to_bus after all). For your suggestion of the pte_to_pfn and the reverse I'm not sure why they're needed (of course for the arch is extremely simple to implement pte_to_pfn, on x86 is a shitright of 12, on alpha it's a shiftright of 32, much simpler than pte_page, pte_page has to deal with discontigmem on numa system so it's slower). But in general people shouldn't use pte_page in drivers, they shouldn't walk pagetables, the vm of linux walks pagetables instead. We need pte_page for things like ptrace, flushing of dirty pagetables etc... when we have virtual userspace addresses and we must find the pages we are working with. But a device driver can either get the page structure via the kiobuf if it's using the direct I/O "pinning userspace memory way" or allocate the memory itself inside the driver so it will have to deal again only with an arry of page structures in first place, never with pte_page. So any usage of pte_page in drivers looks a bit suspect on the design side. Now about your page_to_bus suggestion I think it's on the right track, virt_*anything* has to die, we cannot deal with kernel virtual addresses on pci data any longer, people should either use pfn or page structures (page structures is the most generic and preferred way I believe because drivers should only deal with page structures as said above). So I believe a kind of page_to_bus64 should be implemented, and it should possibly return dma64_addr_t typedeffed as 'unsigned long long'. But the real problem is how to have the driver understand that on some arch it should keep using the iommu 32bit api instead of DAC because those archs have crappy PCI64 DAC hardware implementation that inhibits pci prefetch cycles or stuff like that that leads to bad performance (DaveM can certainly provide more details). This is the hard part of the API I believe. So we either need the driver to have two explicit cases (then the plain page_to_bus64 would not need other changes), or to hide the iommu stuff inside the page_to_bus64, but in such case it means we also need to have a second function to release the pte ala pci_unmap_*. BTW, at the moment what the quadrics and myri folks are doing on alpha in 2.4 is just to use the monster window by hand and to use DAC where avaialble, we enabled it by default on the tsunami chipsets for that reason (for those drivers that maps huge amounts of ram it's manadatory not to generate pressure on the iommu pagetables or they can destabilize the system since all drivers are buggy and they don't check for iommu map functions faliures). The myri folks told me if the monster window is not available they now have an high bound of 32mbytes of data in flight again to avoid those stability troubles and that looks fine. And what the driver should ever do when its device is 32bit and it gets a bus address out of page_to_bus64 that overflows 32bit and no iommu is available? To handle that case it should either enforce that condition not to happen by design (by allocating its pages without __GFP_HIGHMEM if it's using the mmap callback, of course with map_user_kiobuf directIO way it's not possible to enforce it by design instead) or to preallocate some lowmem page to do the bounce buffers (note: it cannot allocate the lowmem page on the fly if it is in a path that could be used to release memory, like I/O or it could deadlock, the memory reservation should be provided by the VM as Stephen suggested, Ben has some patches for that. The advantage of the having the reservation provided by the VM rather than having each subsystem preallocating a minimal amount of ram that allows it to make progress regardless of GFP [like highmem.c currently does] is that the VM has the knowledge to still allow those "reserved" pages to be used as "clean unmapped cache", so they would not be wasted, but it must not be "clean cache" as it is now, it should really be "not locked" and "unmapped" atomically freeable cache, as soon as somebody wants to do I/O or map the page in userspace, the page should be refiled and replaced, and possibly the I/O should wait for some other memory to be freed before the lock/map on the page can succeed, so it's *way* more complicated than what I seen implemented so far by Ben if we want something reliable). > > @@ -107,7 +107,7 @@ > > if( !pmd_present( *pmd ) ) return NOPAGE_OOM; > > pte = pte_offset( pmd, i ); > > if( !pte_present( *pte ) ) return NOPAGE_OOM; > > - physical = (unsigned long)pte_page( *pte )->virtual; > > + physical = (unsigned long)page_address(pte_page( *pte )); > > atomic_inc(&virt_to_page(physical)->count); /* Dec. by kernel */ > > That is just too ugly for words. It's not a "physical" address at all, but > a virtual one, and we're getting the virtual address from the struct page > just to get back to the struct page. > > It should really do > > struct page *page = pte_page( *pte ); > get_page(page); > > instead.. Indeed ;) Andrea ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.4.8aa1 2001-08-12 17:02 ` 2.4.8aa1 Andrea Arcangeli @ 2001-08-12 17:21 ` Linus Torvalds 2001-08-12 18:49 ` 2.4.8aa1 Andrea Arcangeli 2001-08-13 3:29 ` 2.4.8aa1 David S. Miller 0 siblings, 2 replies; 9+ messages in thread From: Linus Torvalds @ 2001-08-12 17:21 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Eyal Lebedinsky, linux-kernel On Sun, 12 Aug 2001, Andrea Arcangeli wrote: > > virt_to_bus(page_address(pte_page(pte))) actually returns the right bus > address on x86 in a range of 4G, No it doesn't. page_address(page) is (page)->virtual, which is going to be zero for non-kmapped pages, and going to ve the virtual mapping of a kmapped page. NEITHER of which will translate correctly with "virt_to_bus()". In short, "virt_to_bus(page_address(pte_page(pte)))" only works for the first 1GB. Always have, always will. > But of course calling page_address() on anything highmem is ugly like > hell even if it ""incidentally"" works on 2.4 up to 4G (it temporarly > wraps around in userspace). It incidentally DOES NOT WORK. It happens to work for you in the non-highmem case when you disabled "page->virtual", but in that case HIGHMEM obviously doesn't work at all, so you cannot actually ever _use_ anything but the low 1GB anyway, so it ends up not working in practice even then. So stop making these things up. > For your suggestion of the pte_to_pfn and the reverse I'm not sure why > they're needed (of course for the arch is extremely simple to implement > pte_to_pfn, on x86 is a shitright of 12, on alpha it's a shiftright of > 32, much simpler than pte_page, pte_page has to deal with discontigmem > on numa system so it's slower). But in general people shouldn't use > pte_page in drivers, they shouldn't walk pagetables, the vm of linux > walks pagetables instead. I agree. However, if you looked at how I used it, it was just as a helper function for doing the "struct page -> physical" and "struct page -> bus" address calculations, which _are_ valid. Think of the "page_to_pfn()" as an internal helper routine that avoids the overflow thing. But also, there is actually at least one useful case in the VM layer that wouldn't mind having it - look at what the VALID_PAGE() users in mm/memory.c, and look at the code we generate now (first we do "pte_page()", then we take the page and turn it into a PFN, and then we compare that PFN against the highest PFN we support - and all without realizing that we already _had_ the PFN in the page table and could have avoided one conversion altogether). Also, look at remap_pte_range: it would actually be a _lot_ better off using PFN's instead of the current physical addresses. Right now it cannot handle 64-bit remapping on x86, and the reason is again that we don't have good conversion functions.. > So I believe a kind of page_to_bus64 should be implemented, and it should > possibly return dma64_addr_t typedeffed as 'unsigned long long'. Fair enough, that sounds like a good idea too. > But the real problem is how to have the driver understand that on some > arch it should keep using the iommu 32bit api instead of DAC because > those archs have crappy PCI64 DAC hardware implementation that inhibits > pci prefetch cycles or stuff like that that leads to bad performance > (DaveM can certainly provide more details). This is the hard part of the > API I believe. So we either need the driver to have two explicit cases > (then the plain page_to_bus64 would not need other changes), or to hide > the iommu stuff inside the page_to_bus64, but in such case it means we > also need to have a second function to release the pte ala pci_unmap_*. Yes. And we should call them something like "pci_kmap()", "pci_kunmap()", because that is exactly what they'd be. > And what the driver should ever do when its device is 32bit and it gets > a bus address out of page_to_bus64 that overflows 32bit and no iommu is > available? It shouldn't use page_to_bus64(), I believe. That function only makes sense for 64-bit aware drivers. 32-bit drivers could still use the regular PCI mapping functions, which will map it down to 32 bits as far as the driver is concerned, even if the "real" unmapped bus address might be >32 bits. Agreed? Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.4.8aa1 2001-08-12 17:21 ` 2.4.8aa1 Linus Torvalds @ 2001-08-12 18:49 ` Andrea Arcangeli 2001-08-13 3:29 ` 2.4.8aa1 David S. Miller 1 sibling, 0 replies; 9+ messages in thread From: Andrea Arcangeli @ 2001-08-12 18:49 UTC (permalink / raw) To: Linus Torvalds; +Cc: Eyal Lebedinsky, linux-kernel On Sun, Aug 12, 2001 at 10:21:58AM -0700, Linus Torvalds wrote: > > On Sun, 12 Aug 2001, Andrea Arcangeli wrote: > > > > virt_to_bus(page_address(pte_page(pte))) actually returns the right bus > > address on x86 in a range of 4G, > > No it doesn't. > > page_address(page) is (page)->virtual, which is going to be zero for > non-kmapped pages, and going to ve the virtual mapping of a kmapped page. > NEITHER of which will translate correctly with "virt_to_bus()". > > In short, "virt_to_bus(page_address(pte_page(pte)))" only works for the > first 1GB. Always have, always will. Woops sorry, I was thinking at the 2.2 page_address, the 2.2 virt_to_bus won't handle the warp around while the 2.4 does, in some bigmem patches we used the 2.4 virt_to_bus just to allow virt_to_bus(page_address(pte_page(pte))) to work up to 4G, this is why I got mistaken. Of course in mainline 2.4 page_address(pte_page(pte)) is just broken for any highmem page as you said. Andrea ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.4.8aa1 2001-08-12 17:21 ` 2.4.8aa1 Linus Torvalds 2001-08-12 18:49 ` 2.4.8aa1 Andrea Arcangeli @ 2001-08-13 3:29 ` David S. Miller 2001-08-13 5:44 ` 2.4.8aa1 Linus Torvalds 1 sibling, 1 reply; 9+ messages in thread From: David S. Miller @ 2001-08-13 3:29 UTC (permalink / raw) To: torvalds; +Cc: andrea, eyal, linux-kernel From: Linus Torvalds <torvalds@transmeta.com> Date: Sun, 12 Aug 2001 10:21:58 -0700 (PDT) On Sun, 12 Aug 2001, Andrea Arcangeli wrote: > So I believe a kind of page_to_bus64 should be implemented, and it should > possibly return dma64_addr_t typedeffed as 'unsigned long long'. Fair enough, that sounds like a good idea too. I agree with the dma64_addr_t type, but I don't know if I agree with the logistics of how drivers go about doing things and what the interfaces are that an architecture implements. First, maybe I'm confused about intent. Are we trying to just hack together something quick for 2.4.x that allows PCI drivers on highmem machines to get at the complete low 4GB of physical memory, even the highmem parts? If this is the case, then I have no strong opinions about what you do because I know I will be able to retrofit my ports into whatever interfaces you come up with for that (f.e. the pci_kmap()/pci_kunmap() stuff). But if this is more far reaching, ie. a 64-bit addressing API for drivers, my thinking is that this is a larger problem to solve. For example, if you're talking seriously about sending page/off/len triplets to the drivers (which is what we want in the end), then with that I can whip together something really nice and portable in about a weekend of thinking and hacking. The only reason I haven't done this work yet is strictly because the drivers aren't getting pages yet (well, the zerocopy aware networking drivers are). It was my understanding that this is the bit that Jens Axboe and Andrea are working on. Later, David S. Miller davem@redhat.com ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.4.8aa1 2001-08-13 3:29 ` 2.4.8aa1 David S. Miller @ 2001-08-13 5:44 ` Linus Torvalds 0 siblings, 0 replies; 9+ messages in thread From: Linus Torvalds @ 2001-08-13 5:44 UTC (permalink / raw) To: David S. Miller; +Cc: andrea, eyal, linux-kernel On Sun, 12 Aug 2001, David S. Miller wrote: > > First, maybe I'm confused about intent. Are we trying to just hack > together something quick for 2.4.x that allows PCI drivers on highmem > machines to get at the complete low 4GB of physical memory, even the > highmem parts? No, it's more about cleaning up stuff for 2.4.x - the drivers affected right now can't use more than 1GB of ram anyway, but as they were written they would not compile in Andrea's tree due to having a bit too intimate a knowledge of the VM system. For 2.5.x, we'll see where we end up going in the long run. Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2001-08-13 5:45 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2001-08-11 12:58 2.4.8aa1 Andrea Arcangeli 2001-08-11 13:32 ` 2.4.8aa1 Eyal Lebedinsky 2001-08-11 14:02 ` 2.4.8aa1 Andrea Arcangeli 2001-08-11 18:20 ` 2.4.8aa1 Linus Torvalds 2001-08-12 17:02 ` 2.4.8aa1 Andrea Arcangeli 2001-08-12 17:21 ` 2.4.8aa1 Linus Torvalds 2001-08-12 18:49 ` 2.4.8aa1 Andrea Arcangeli 2001-08-13 3:29 ` 2.4.8aa1 David S. Miller 2001-08-13 5:44 ` 2.4.8aa1 Linus Torvalds
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.