* rmap: parisc __flush_dcache_page @ 2004-04-08 13:41 Hugh Dickins 2004-04-08 13:52 ` [parisc-linux] " James Bottomley 0 siblings, 1 reply; 22+ messages in thread From: Hugh Dickins @ 2004-04-08 13:41 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: parisc-linux, linux-kernel Something to notice about that parisc __flush_dcache_page I sent you: there's no locking around searching the tree for vmas; there was never any locking around searching the list for vmas. arm is similar, but at least has no CONFIG_SMP, just a preemption issue. Any ideas? Thanks, Hugh ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 13:41 rmap: parisc __flush_dcache_page Hugh Dickins @ 2004-04-08 13:52 ` James Bottomley 2004-04-08 14:16 ` Hugh Dickins 0 siblings, 1 reply; 22+ messages in thread From: James Bottomley @ 2004-04-08 13:52 UTC (permalink / raw) To: Hugh Dickins; +Cc: Andrea Arcangeli, Linux Kernel, parisc-linux On Thu, 2004-04-08 at 08:41, Hugh Dickins wrote: > Something to notice about that parisc __flush_dcache_page I sent you: > there's no locking around searching the tree for vmas; there was never > any locking around searching the list for vmas. arm is similar, but > at least has no CONFIG_SMP, just a preemption issue. Any ideas? I don't think you sent it to the parisc list? I'm afraid we've just been pretty heavily updating flush_dcache_page recently to fill a number of holes in the implementation. As far as list traversal goes...we don't require the list to freeze: acidentally flushing dead vmas would be harmless and added ones wouldn't need flushing, so all we need would probably be a safe traversal and a reference to prevent the vma being deallocated. James ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 13:52 ` [parisc-linux] " James Bottomley @ 2004-04-08 14:16 ` Hugh Dickins 2004-04-08 14:40 ` James Bottomley 0 siblings, 1 reply; 22+ messages in thread From: Hugh Dickins @ 2004-04-08 14:16 UTC (permalink / raw) To: James Bottomley; +Cc: Andrea Arcangeli, Linux Kernel, parisc-linux On 8 Apr 2004, James Bottomley wrote: > On Thu, 2004-04-08 at 08:41, Hugh Dickins wrote: > > Something to notice about that parisc __flush_dcache_page I sent you: > > there's no locking around searching the tree for vmas; there was never > > any locking around searching the list for vmas. arm is similar, but > > at least has no CONFIG_SMP, just a preemption issue. Any ideas? > > I don't think you sent it to the parisc list? That's right, at present it's just something in Andrea's -aa tree and Martin's -mjb tree. Will try to remember to copy maintainers when sending to Andrew. But the problem was there before the patch. > I'm afraid we've just been pretty heavily updating flush_dcache_page > recently to fill a number of holes in the implementation. Don't be afraid, that's good! Is it still going vertically down i_mmap_shared and i_mmap? Whereas it's only interested in vmas of the one mm, so could go horizontally along it. Just an option to play with, but I don't believe it solves anything, just as unsafe when threaded. > As far as list traversal goes...we don't require the list to freeze: > acidentally flushing dead vmas would be harmless and added ones wouldn't > need flushing, Yes. > so all we need would probably be a safe traversal and a > reference to prevent the vma being deallocated. Which we're not giving you at all at present. I guess another layer of spinlocking/nopreemption, for parisc and arm, dissolving to nothing on other arches. Hugh ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 14:16 ` Hugh Dickins @ 2004-04-08 14:40 ` James Bottomley 2004-04-08 15:14 ` Andrea Arcangeli 0 siblings, 1 reply; 22+ messages in thread From: James Bottomley @ 2004-04-08 14:40 UTC (permalink / raw) To: Hugh Dickins; +Cc: Andrea Arcangeli, Linux Kernel, parisc-linux On Thu, 2004-04-08 at 09:16, Hugh Dickins wrote: > Don't be afraid, that's good! Is it still going vertically down > i_mmap_shared and i_mmap? Whereas it's only interested in vmas of > the one mm, so could go horizontally along it. Just an option to > play with, but I don't believe it solves anything, just as unsafe > when threaded. Yes, I've attached the current code. I'm afraid we have space separation in our caches and tlbs, so we're interested in all mm's not just the current one (that was the bug that just got fixed). > Which we're not giving you at all at present. I guess another layer > of spinlocking/nopreemption, for parisc and arm, dissolving to nothing > on other arches. Whatever seems most convenient that won't impact the flushing fast path, I suppose. It's one of the hottest paths in the system since all data transfers go through it for user visibility. James __flush_dcache_page(struct page *page) { struct mm_struct *mm = current->active_mm; struct list_head *l; struct vm_area_struct *anyvma = NULL; flush_kernel_dcache_page(page_address(page)); if (!page->mapping) return; /* We have ensured in arch_get_unmapped_area() that all shared * mappings are mapped at equivalent addresses, so we only need * to flush one for them all to become coherent */ list_for_each(l, &page->mapping->i_mmap_shared) { struct vm_area_struct *mpnt; unsigned long off, addr; mpnt = list_entry(l, struct vm_area_struct, shared); if (page->index < mpnt->vm_pgoff) continue; off = page->index - mpnt->vm_pgoff; if (off >= (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT) continue; addr = mpnt->vm_start + (off << PAGE_SHIFT); /* flush instructions produce non access tlb misses. * On PA, we nullify these instructions rather than * taking a page fault if the pte doesn't exist, so we * have to find a congruent address with an existing * translation */ if (!translation_exists(mpnt, addr)) continue; anyvma = mpnt; /* * We try first to find a page in our current user process */ if (mpnt->vm_mm != mm) continue; __flush_cache_page(mpnt, addr); /* All user shared mappings should be equivalently mapped, * so once we've flushed one we should be ok */ goto flush_unshared; } /* OK, shared page but not in our current process' address space */ if (anyvma) { unsigned long addr = anyvma->vm_start + ((page->index - anyvma->vm_pgoff) << PAGE_SHIFT); __flush_cache_page(anyvma, addr); } flush_unshared: /* Private mappings will not have congruent addresses, so we * have to flush each of them individually to make the change * in the kernel page visible */ list_for_each(l, &page->mapping->i_mmap) { struct vm_area_struct *mpnt; unsigned long off, addr; mpnt = list_entry(l, struct vm_area_struct, shared); if (page->index < mpnt->vm_pgoff) continue; off = page->index - mpnt->vm_pgoff; if (off >= (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT) continue; addr = mpnt->vm_start + (off << PAGE_SHIFT); /* This is just for speed. If the page translation isn't * there there's no point exciting the nadtlb handler into * a nullification frenzy */ if(!translation_exists(mpnt, addr)) continue; __flush_cache_page(mpnt, addr); } } ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 14:40 ` James Bottomley @ 2004-04-08 15:14 ` Andrea Arcangeli 2004-04-08 15:28 ` James Bottomley 2004-04-08 15:35 ` Hugh Dickins 0 siblings, 2 replies; 22+ messages in thread From: Andrea Arcangeli @ 2004-04-08 15:14 UTC (permalink / raw) To: James Bottomley; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, Apr 08, 2004 at 09:40:37AM -0500, James Bottomley wrote: > Whatever seems most convenient that won't impact the flushing fast path, > I suppose. It's one of the hottest paths in the system since all data > transfers go through it for user visibility. you'd need to take a semaphore there to be safe, so it's basically unfixable since you can't sleep or just trylock. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 15:14 ` Andrea Arcangeli @ 2004-04-08 15:28 ` James Bottomley 2004-04-08 15:34 ` Andrea Arcangeli 2004-04-08 15:35 ` Hugh Dickins 1 sibling, 1 reply; 22+ messages in thread From: James Bottomley @ 2004-04-08 15:28 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, 2004-04-08 at 10:14, Andrea Arcangeli wrote: > you'd need to take a semaphore there to be safe, so it's basically > unfixable since you can't sleep or just trylock. That's a bit of an unhelpful suggestion. flush_dcache_page() exists to support coherency problems with virtual aliasing and a feature of that is that you have to flush every inequivalent user address which might be cached, hence the need for list traversal. Exactly why wouldn't a simple spinlock to protect page->mapping work? I know we don't want to bloat struct page, but such a thing could go in struct address_space? James ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 15:28 ` James Bottomley @ 2004-04-08 15:34 ` Andrea Arcangeli 2004-04-08 15:47 ` James Bottomley 0 siblings, 1 reply; 22+ messages in thread From: Andrea Arcangeli @ 2004-04-08 15:34 UTC (permalink / raw) To: James Bottomley; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, Apr 08, 2004 at 10:28:44AM -0500, James Bottomley wrote: > Exactly why wouldn't a simple spinlock to protect page->mapping work? I > know we don't want to bloat struct page, but such a thing could go in > struct address_space? yes, the spinlock in struct address_space would be enough, and that's what 2.4 does too, Andrew changed it to a semaphore in 2.6 but it can be made a spinlock again. Then you can fix it (as far as you never call it from an irq and as far as you don't generate exceptions inside the critical section, but I'm sure you don't). ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 15:34 ` Andrea Arcangeli @ 2004-04-08 15:47 ` James Bottomley 2004-04-08 16:16 ` Andrea Arcangeli 0 siblings, 1 reply; 22+ messages in thread From: James Bottomley @ 2004-04-08 15:47 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, 2004-04-08 at 10:34, Andrea Arcangeli wrote: > yes, the spinlock in struct address_space would be enough, and that's > what 2.4 does too, Andrew changed it to a semaphore in 2.6 but it can be > made a spinlock again. Then you can fix it (as far as you never call it > from an irq and as far as you don't generate exceptions inside the > critical section, but I'm sure you don't). Well, yes, of course we do. We're a sofware tlb arch, so we generate exceptions on tlb misses, which can occur anywhere (even in critical sections). However, the exceptions are carefully crafted not to take spinlocks, so everything should be safe. I'm not sure about the no in irq assertion. The biggest use of flush_dcache_page is on the I/O return path ... that looks like a good candidate for being in interrupt (even though most drivers should be offloading to softirq/tasklets). James ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 15:47 ` James Bottomley @ 2004-04-08 16:16 ` Andrea Arcangeli 2004-04-08 16:29 ` James Bottomley 0 siblings, 1 reply; 22+ messages in thread From: Andrea Arcangeli @ 2004-04-08 16:16 UTC (permalink / raw) To: James Bottomley; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, Apr 08, 2004 at 10:47:23AM -0500, James Bottomley wrote: > candidate for being in interrupt (even though most drivers should be > offloading to softirq/tasklets). softirq tasklets would be unsafe too, oh well, if you take it really from irq context (irq/softirq/tasklet) then just a spinlock isn't enough, it'd need to be an irq safe lock or whatever similar plus you must be sure to never generate exceptions triggering the call inside the critical section. sounds like we need some per-arch abstraction to cover this, we for sure don't want an irq spinlock for this, then we can as well leave the semaphore for all archs but parisc. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 16:16 ` Andrea Arcangeli @ 2004-04-08 16:29 ` James Bottomley 2004-04-08 17:10 ` Andrea Arcangeli 0 siblings, 1 reply; 22+ messages in thread From: James Bottomley @ 2004-04-08 16:29 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, 2004-04-08 at 11:16, Andrea Arcangeli wrote: > softirq tasklets would be unsafe too, oh well, if you take it really > from irq context (irq/softirq/tasklet) then just a spinlock isn't > enough, it'd need to be an irq safe lock or whatever similar plus you > must be sure to never generate exceptions triggering the call inside the > critical section. sounds like we need some per-arch abstraction to cover > this, we for sure don't want an irq spinlock for this, then we can as > well leave the semaphore for all archs but parisc. Erm, well, I think this is a global problem. All VI archs have to use the flush_ APIs in cachetlb.txt to ensure coherence. It's just that sparc seems to have some nice cache manipulation instructions that relieve it of the necessity of traversing the mappings. Why don't we want an irq safe spinlock? As Hugh said, we'd abstract it so as to be a nop on PI archs. James ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 16:29 ` James Bottomley @ 2004-04-08 17:10 ` Andrea Arcangeli 2004-04-08 17:43 ` James Bottomley 0 siblings, 1 reply; 22+ messages in thread From: Andrea Arcangeli @ 2004-04-08 17:10 UTC (permalink / raw) To: James Bottomley; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, Apr 08, 2004 at 11:29:50AM -0500, James Bottomley wrote: > On Thu, 2004-04-08 at 11:16, Andrea Arcangeli wrote: > > softirq tasklets would be unsafe too, oh well, if you take it really > > from irq context (irq/softirq/tasklet) then just a spinlock isn't > > enough, it'd need to be an irq safe lock or whatever similar plus you > > must be sure to never generate exceptions triggering the call inside the > > critical section. sounds like we need some per-arch abstraction to cover ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > this, we for sure don't want an irq spinlock for this, then we can as > > well leave the semaphore for all archs but parisc. > > Erm, well, I think this is a global problem. All VI archs have to use > the flush_ APIs in cachetlb.txt to ensure coherence. It's just that > sparc seems to have some nice cache manipulation instructions that > relieve it of the necessity of traversing the mappings. > > Why don't we want an irq safe spinlock? As Hugh said, we'd abstract it > so as to be a nop on PI archs. I said above per-arch abstraction, a per-arch abstraction isn't an irq safe spinlock, we cannot add an irq safe spinlock there, it'd be too bad for all the common archs that don't need to walk those lists (actually trees in my -aa tree) from irq context. if you implement the locking abstraction with an irq safe spinlock it's your own business then. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 17:10 ` Andrea Arcangeli @ 2004-04-08 17:43 ` James Bottomley 2004-04-08 17:51 ` Andrea Arcangeli 0 siblings, 1 reply; 22+ messages in thread From: James Bottomley @ 2004-04-08 17:43 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, 2004-04-08 at 12:10, Andrea Arcangeli wrote: > I said above per-arch abstraction, a per-arch abstraction isn't an irq > safe spinlock, we cannot add an irq safe spinlock there, it'd be too bad > for all the common archs that don't need to walk those lists (actually > trees in my -aa tree) from irq context. I think we agree on the abstraction thing. I was more wondering what you thought was so costly about an irq safe spinlock as opposed to an ordinary one? Is there something adding to this cost I don't know about? i.e. should we be thinking about something like RCU or phased tree approach to walking the mapping lists? James ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 17:43 ` James Bottomley @ 2004-04-08 17:51 ` Andrea Arcangeli 2004-04-08 18:07 ` James Bottomley 0 siblings, 1 reply; 22+ messages in thread From: Andrea Arcangeli @ 2004-04-08 17:51 UTC (permalink / raw) To: James Bottomley; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, Apr 08, 2004 at 12:43:45PM -0500, James Bottomley wrote: > On Thu, 2004-04-08 at 12:10, Andrea Arcangeli wrote: > > I said above per-arch abstraction, a per-arch abstraction isn't an irq > > safe spinlock, we cannot add an irq safe spinlock there, it'd be too bad > > for all the common archs that don't need to walk those lists (actually > > trees in my -aa tree) from irq context. > > I think we agree on the abstraction thing. I was more wondering what > you thought was so costly about an irq safe spinlock as opposed to an > ordinary one? Is there something adding to this cost I don't know > about? i.e. should we be thinking about something like RCU or phased > tree approach to walking the mapping lists? that path can take as long as timeslice to run, not taking interrupts for a whole scheduler timeslice is pretty bad. Note that the data structure will become a tree soon, but a prio-tree, walking it with RCU lockless sounds very tricky to me, but it may be doable. For the short term I doubt you want the RCU prio-tree, I guess you want to stabilze the kernel first with the irq safe spinlock, then you can try to hack on the prio-tree to read it in a lockless fascion. If you can make the reading lockless we can giveup the abstraction too, since we can make all archs walk with lockless, but warning, freeing vmas in rcu callbacks means freeing mm in rcu callbacks, that then means freeing pgd in rcu callbacks, the whole mm layer will collapse on you as soon as you try to read that tree without any locking, only the inode will be still there as far as you've a reference on the page (and as far as you don't use nonlinear :-/ ). ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 17:51 ` Andrea Arcangeli @ 2004-04-08 18:07 ` James Bottomley 2004-04-08 18:18 ` Andrea Arcangeli 0 siblings, 1 reply; 22+ messages in thread From: James Bottomley @ 2004-04-08 18:07 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, 2004-04-08 at 12:51, Andrea Arcangeli wrote: > that path can take as long as timeslice to run, not taking interrupts > for a whole scheduler timeslice is pretty bad. OK, now I'm confused. Where's the whole timeslice bit coming from? the parisc flush_dcache_code better not take a timeslice to execute otherwise we're in very serious performance trouble. Does it take as long as a timeslice to do mmap[_shared] list insertion? James ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 18:07 ` James Bottomley @ 2004-04-08 18:18 ` Andrea Arcangeli 2004-04-08 18:28 ` James Bottomley 0 siblings, 1 reply; 22+ messages in thread From: Andrea Arcangeli @ 2004-04-08 18:18 UTC (permalink / raw) To: James Bottomley; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, Apr 08, 2004 at 01:07:31PM -0500, James Bottomley wrote: > On Thu, 2004-04-08 at 12:51, Andrea Arcangeli wrote: > > that path can take as long as timeslice to run, not taking interrupts > > for a whole scheduler timeslice is pretty bad. > > OK, now I'm confused. Where's the whole timeslice bit coming from? the > parisc flush_dcache_code better not take a timeslice to execute > otherwise we're in very serious performance trouble. > > Does it take as long as a timeslice to do mmap[_shared] list insertion? it enterely depends on the workload. On a desktop machine there may be only some hundred entries in those lists at maximum with glibc being the biggest offender: cat /proc/*/maps | grep libc.so.6 | wc -l with shared memory on some server there can be easily several thousand entries for some inode even on 64bit, but a timeslice was probably exaggerated (the timeslice was for the walking of the ptes in each mapping too, I don't think you need to look at every pte). ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 18:18 ` Andrea Arcangeli @ 2004-04-08 18:28 ` James Bottomley 2004-04-08 18:42 ` Andrea Arcangeli 0 siblings, 1 reply; 22+ messages in thread From: James Bottomley @ 2004-04-08 18:28 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, 2004-04-08 at 13:18, Andrea Arcangeli wrote: > it enterely depends on the workload. On a desktop machine there may be > only some hundred entries in those lists at maximum with glibc being the > biggest offender: > > cat /proc/*/maps | grep libc.so.6 | wc -l > > with shared memory on some server there can be easily several thousand > entries for some inode even on 64bit, but a timeslice was probably > exaggerated (the timeslice was for the walking of the ptes in each > mapping too, I don't think you need to look at every pte). So you're worried about our code? OK, if you look, you'll see we only have to flush one address in the mmap_shared list, (which is usually the long list). I'd constructed it on the predicate that flushing a non-current space is more expensive than finding a current one, but I can alter it to flush the first vma it comes to with a present translation. The mmap list is usually empty. We only excite that case for multiple private mappings of a file which for some reason gets updated. I'd be very surprised if flush_dcache_page executes more than a few hundred instructions all told...that's certainly nowhere close to a timeslice. James ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 18:28 ` James Bottomley @ 2004-04-08 18:42 ` Andrea Arcangeli 2004-04-08 18:49 ` James Bottomley 2004-04-10 1:21 ` Paul E. McKenney 0 siblings, 2 replies; 22+ messages in thread From: Andrea Arcangeli @ 2004-04-08 18:42 UTC (permalink / raw) To: James Bottomley; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, Apr 08, 2004 at 01:28:17PM -0500, James Bottomley wrote: > So you're worried about our code? OK, if you look, you'll see we only > have to flush one address in the mmap_shared list, (which is usually the > long list). if you need to flush just one address, the prio-tree may give you an huge boost, overall flush_dcache_page should become pretty quick in most situations. > I'd be very surprised if flush_dcache_page executes more than a few > hundred instructions all told...that's certainly nowhere close to a > timeslice. What you miss is that the problem is not in flush_dcache_page, the problem is that the _other_ users of the prio-tree may take as long as a timeslice. So it's the _other_ user that you've no control about (i.e. truncate) that may take timeslices with irq disabled. But I've an fairly optimal solution for you, you should make it a read_write spinlock, with the readers not disabling interrupts, and the writer disabling interrupts, the writer of the prio-tree will not take a timeslice, the readers instead will take a timeslice, but since they're readers and you've only to read in the flush_dcache_page irq context, you don't need to disable irqs for the readers. I don't have better solutions than this one at the moment (yeah there's the rcu reading of the prio-tree but I'd leave it for later...) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 18:42 ` Andrea Arcangeli @ 2004-04-08 18:49 ` James Bottomley 2004-04-08 19:02 ` Andrea Arcangeli 2004-04-10 1:21 ` Paul E. McKenney 1 sibling, 1 reply; 22+ messages in thread From: James Bottomley @ 2004-04-08 18:49 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, 2004-04-08 at 13:42, Andrea Arcangeli wrote: > What you miss is that the problem is not in flush_dcache_page, the > problem is that the _other_ users of the prio-tree may take as long as a > timeslice. So it's the _other_ user that you've no control about (i.e. > truncate) that may take timeslices with irq disabled. So the problem isn't in the parisc code, it's in the generic vm code, OK. > But I've an fairly optimal solution for you, you should make it a > read_write spinlock, with the readers not disabling interrupts, and the > writer disabling interrupts, the writer of the prio-tree will not take a > timeslice, the readers instead will take a timeslice, but since they're > readers and you've only to read in the flush_dcache_page irq context, > you don't need to disable irqs for the readers. I don't have better > solutions than this one at the moment (yeah there's the rcu reading of > the prio-tree but I'd leave it for later...) Yes, I'll go for that. The write need only be done on vma insert, which should be very fast. So do we agree this is a generic solution, or were you still thinking of trying to abstract it per-arch? James ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 18:49 ` James Bottomley @ 2004-04-08 19:02 ` Andrea Arcangeli 0 siblings, 0 replies; 22+ messages in thread From: Andrea Arcangeli @ 2004-04-08 19:02 UTC (permalink / raw) To: James Bottomley; +Cc: Hugh Dickins, Linux Kernel, parisc-linux On Thu, Apr 08, 2004 at 01:49:55PM -0500, James Bottomley wrote: > Yes, I'll go for that. The write need only be done on vma insert, which > should be very fast. So do we agree this is a generic solution, or were > you still thinking of trying to abstract it per-arch? I'm unsure, the semaphore simplifies a lot the need_resched() in the vmtruncate/zap_pte path, plus it avoids to waste time in the other cpus while vmtruncate it working on it (potentially for more than a timeslice). btw, I already considered making the semaphore a rw semaphore (note not a rwspinlock) to boost scalability of the paging too, but OTOH the paging has a so small critical section under the lock (objrmap) that I wasn't sure if it would payoff, the biggest cost will still be the bouncing of the cacheline, so I desisted from the idea of making it a rwsem and I thought the semaphore was ideal as Andrew told me a few days before I had the rwsem idea. Plus concurrent truncate aren't worth optimizing since they're serialized from the i_sem in the first place. Ideally it should be a semaphore for all archs but the ones who needs to walk it from irqs that wants a rw_spinlock. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 18:42 ` Andrea Arcangeli 2004-04-08 18:49 ` James Bottomley @ 2004-04-10 1:21 ` Paul E. McKenney 1 sibling, 0 replies; 22+ messages in thread From: Paul E. McKenney @ 2004-04-10 1:21 UTC (permalink / raw) To: Andrea Arcangeli Cc: James Bottomley, Hugh Dickins, Linux Kernel, parisc-linux On Thu, Apr 08, 2004 at 08:42:45PM +0200, Andrea Arcangeli wrote: > But I've an fairly optimal solution for you, you should make it a > read_write spinlock, with the readers not disabling interrupts, and the > writer disabling interrupts, the writer of the prio-tree will not take a > timeslice, the readers instead will take a timeslice, but since they're > readers and you've only to read in the flush_dcache_page irq context, > you don't need to disable irqs for the readers. I don't have better > solutions than this one at the moment (yeah there's the rcu reading of > the prio-tree but I'd leave it for later...) FWIW, agreed. Past attempts at RCU-based tree algorithms have been a bit on the complex side. While I believe that simpler versions are possible, RCU-based trees should be approached with caution and with long lead times. Thanx, Paul ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 15:14 ` Andrea Arcangeli 2004-04-08 15:28 ` James Bottomley @ 2004-04-08 15:35 ` Hugh Dickins 2004-04-08 16:13 ` Andrea Arcangeli 1 sibling, 1 reply; 22+ messages in thread From: Hugh Dickins @ 2004-04-08 15:35 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: James Bottomley, Linux Kernel, parisc-linux On Thu, 8 Apr 2004, Andrea Arcangeli wrote: > On Thu, Apr 08, 2004 at 09:40:37AM -0500, James Bottomley wrote: > > Whatever seems most convenient that won't impact the flushing fast path, > > I suppose. It's one of the hottest paths in the system since all data > > transfers go through it for user visibility. > > you'd need to take a semaphore there to be safe, so it's basically > unfixable since you can't sleep or just trylock. It's not fixable via the i_shared_sem, but we can add another layer of spin_lock around the i_mmap* list/tree manipulations, one that preprocesses away to nothing on all arches but parisc and arm, and is used in their __flush_dcache_page. *Not* the page_table_lock ;) Unappealing, but a lot better than leaving them unsafe. Hugh ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [parisc-linux] rmap: parisc __flush_dcache_page 2004-04-08 15:35 ` Hugh Dickins @ 2004-04-08 16:13 ` Andrea Arcangeli 0 siblings, 0 replies; 22+ messages in thread From: Andrea Arcangeli @ 2004-04-08 16:13 UTC (permalink / raw) To: Hugh Dickins; +Cc: James Bottomley, Linux Kernel, parisc-linux On Thu, Apr 08, 2004 at 04:35:12PM +0100, Hugh Dickins wrote: > It's not fixable via the i_shared_sem, but we can add another layer I meant it's unfixable unless we change the VM common code. > of spin_lock around the i_mmap* list/tree manipulations, one that > preprocesses away to nothing on all arches but parisc and arm, and > is used in their __flush_dcache_page. *Not* the page_table_lock ;) I'd prefer to use only a spinlock then to carry around two overlapping locks, the need_resched() check is needed anyways even with preempt in the real costly paths. ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2004-04-10 1:21 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-04-08 13:41 rmap: parisc __flush_dcache_page Hugh Dickins 2004-04-08 13:52 ` [parisc-linux] " James Bottomley 2004-04-08 14:16 ` Hugh Dickins 2004-04-08 14:40 ` James Bottomley 2004-04-08 15:14 ` Andrea Arcangeli 2004-04-08 15:28 ` James Bottomley 2004-04-08 15:34 ` Andrea Arcangeli 2004-04-08 15:47 ` James Bottomley 2004-04-08 16:16 ` Andrea Arcangeli 2004-04-08 16:29 ` James Bottomley 2004-04-08 17:10 ` Andrea Arcangeli 2004-04-08 17:43 ` James Bottomley 2004-04-08 17:51 ` Andrea Arcangeli 2004-04-08 18:07 ` James Bottomley 2004-04-08 18:18 ` Andrea Arcangeli 2004-04-08 18:28 ` James Bottomley 2004-04-08 18:42 ` Andrea Arcangeli 2004-04-08 18:49 ` James Bottomley 2004-04-08 19:02 ` Andrea Arcangeli 2004-04-10 1:21 ` Paul E. McKenney 2004-04-08 15:35 ` Hugh Dickins 2004-04-08 16:13 ` Andrea Arcangeli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox