* shrink_dcache_sb scalability problem.
From: David Chinner @ 2006-04-13  8:22 UTC
To: linux-kernel; +Cc: linux-fsdevel

Folks,

After recently upgrading a build machine to 2.6.16, we started seeing
10-50s pauses where the machine would appear to hang. Profiles showed
that we were spending a substantial amount of time in shrink_dcache_sb,
and several CPUs were spinning on the dcache_lock. This is happening
quite frequently - we recorded a 10 minute period in which there were 13
incidents where a touch/rm of a single file took longer than 10s. The
machine was close to unusable when this happened.

At the time of the problem the machine had several million unused cached
dentries in memory (often > 10 million), and the builds use chroot
environments with internally mounted filesystems like /proc and /sys.

The problem is that whenever we mount /proc, /sys, /dev/pts, etc., we
call shrink_dcache_sb(), which does multiple traversals across the
unused dentry list with the dcache_lock held. It is trivial to reduce
this to one traversal for the case of a new mount. However, that doesn't
solve the real issue: we are walking a linked list of many million
entries with a global lock held, holding out everyone else.

We're open to any suggestions on how to go about fixing this, as it's
not obvious what the correct approach is. Any advice, patches, etc. are
more than welcome.

Cheers,

Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group
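For context on the traversal cost: the shrink_dcache_sb() of that era makes two passes over the single global dentry_unused list under dcache_lock - one to segregate the target superblock's dentries, one to free them - so each pass visits every unused dentry in the system, not just the ones being freed. The pattern can be modelled with a toy userspace list (hypothetical `struct dentry` stand-in; none of the kernel's locking is reproduced):

```c
#include <stdlib.h>

/* Toy stand-in for a dentry on the global unused list: just the
 * superblock it belongs to and a forward link. */
struct dentry {
	int sb;
	struct dentry *next;
};

/* Push a newly unused dentry onto the global list. */
static struct dentry *dentry_push(struct dentry *head, int sb)
{
	struct dentry *d = malloc(sizeof(*d));
	d->sb = sb;
	d->next = head;
	return d;
}

/* Model of the two-pass shrink: returns how many list nodes were
 * visited in total, i.e. the work done while the global lock is held. */
static long shrink_sb(struct dentry **head, int sb)
{
	long visits = 0;

	/* Pass one: the kernel moves this superblock's dentries to one
	 * end of the list; either way every node on the list is visited. */
	for (struct dentry *d = *head; d != NULL; d = d->next)
		visits++;

	/* Pass two: walk the whole list again, unlinking and freeing
	 * only the dentries that belong to the superblock being shrunk. */
	for (struct dentry **pp = head; *pp != NULL; ) {
		visits++;
		if ((*pp)->sb == sb) {
			struct dentry *dead = *pp;
			*pp = dead->next;
			free(dead);
		} else {
			pp = &(*pp)->next;
		}
	}
	return visits;
}
```

With a million unused dentries belonging to a busy build filesystem, shrinking a freshly mounted /proc-style superblock that owns almost none of them still costs two million node visits - all while everyone else contends on the global lock.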
* Re: shrink_dcache_sb scalability problem.
From: Andrew Morton @ 2006-04-13  8:52 UTC
To: David Chinner; +Cc: linux-kernel, linux-fsdevel

David Chinner <dgc@sgi.com> wrote:
>
> After recently upgrading a build machine to 2.6.16, we started
> seeing 10-50s pauses where the machine would appear to hang.

This sounds like the recent thread "Avoid excessive time spend on
concurrent slab shrinking" over on linux-mm.  Have you read through that?

http://marc.theaimsgroup.com/?l=linux-mm&r=1&b=200603&w=2
http://marc.theaimsgroup.com/?l=linux-mm&r=3&b=200604&w=2

It ended up somewhat inconclusive, but it looks like we do have a bit of
a problem, and it got exacerbated by an XFS slowness.
* Re: shrink_dcache_sb scalability problem.
From: David Chinner @ 2006-04-14  3:43 UTC
To: Andrew Morton; +Cc: David Chinner, linux-kernel, linux-fsdevel

On Thu, Apr 13, 2006 at 01:52:57AM -0700, Andrew Morton wrote:
> David Chinner <dgc@sgi.com> wrote:
> >
> > After recently upgrading a build machine to 2.6.16, we started seeing
> > 10-50s pauses where the machine would appear to hang.
>
> This sounds like the recent thread "Avoid excessive time spend on
> concurrent slab shrinking" over on linux-mm.  Have you read through that?
>
> http://marc.theaimsgroup.com/?l=linux-mm&r=1&b=200603&w=2
> http://marc.theaimsgroup.com/?l=linux-mm&r=3&b=200604&w=2

Yes, I even made comments directly in that thread, and it really wasn't
a problem with the slab shrinking infrastructure. It was (obvious to us
XFS folk) just another XFS inode caching scalability problem that this
machine has uncovered over the past few years.

> It ended up somewhat inconclusive, but it looks like we do have a bit
> of a problem, and it got exacerbated by an XFS slowness.

I've already fixed that problem with:

http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1fc5d959d88a5f77aa7e4435f6c9d0e2d2236704

and the machine showing the shrink_dcache_sb() problems is already
running that fix. That problem masked the shrink_dcache_sb one - who
notices a 10s hang when the machine has been really, really slow for 20
minutes?

So this is not (directly) related to reclaim of inodes or dentries. It
can be seen during reclaim of dentries if someone is mounting or
unmounting a filesystem at the same time, but fundamentally it's a
result of a large number of cached dentries sitting on a single list
protected by a single lock, and having to walk that list atomically.

Given the complexity of the dcache, I really don't know enough about it,
or have the time available to invest in learning all I need to know
about it, to solve the problem. That's why I'm asking for help from the
experts....

Cheers,

Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group
* Re: shrink_dcache_sb scalability problem.
From: Andrew Morton @ 2006-04-14  5:23 UTC
To: David Chinner; +Cc: dgc, linux-kernel, linux-fsdevel, Dipankar Sarma

David Chinner <dgc@sgi.com> wrote:
>
> On Thu, Apr 13, 2006 at 01:52:57AM -0700, Andrew Morton wrote:
> > David Chinner <dgc@sgi.com> wrote:
> > >
> > > After recently upgrading a build machine to 2.6.16, we started seeing
> > > 10-50s pauses where the machine would appear to hang.
> >
> > This sounds like the recent thread "Avoid excessive time spend on
> > concurrent slab shrinking" over on linux-mm.  Have you read through that?
> >
> > http://marc.theaimsgroup.com/?l=linux-mm&r=1&b=200603&w=2
> > http://marc.theaimsgroup.com/?l=linux-mm&r=3&b=200604&w=2
>
> Yes, I even made comments directly in the thread and it really
> wasn't a problem with the slab shrinking infrastructure. It
> was (obvious to us XFS folk) just another XFS inode caching
> scalability problem that this machine has uncovered over
> the past few years.

OK.

> > It ended up somewhat inconclusive, but it looks like we do have a bit
> > of a problem, and it got exacerbated by an XFS slowness.
>
> I've already fixed that problem with:
>
> http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1fc5d959d88a5f77aa7e4435f6c9d0e2d2236704
>
> and the machine showing the shrink_dcache_sb() problems is
> already running that fix. That problem masked the shrink_dcache_sb
> one - who notices a 10s hang when the machine has been really,
> really slow for 20 minutes?

So the problem is shrink_dcache_sb() and not regular memory reclaim?

What is happening on that machine to be triggering shrink_dcache_sb()?
automounts?

We fixed a similar problem in the inode cache a year or so back by
creating per-superblock inode lists, so that
search-a-global-list-for-objects-belonging-to-this-superblock thing went
away.  Presumably we could fix this in the same manner.  But adding two
more pointers to struct dentry would hurt.

An alternative might be to remove the global LRU altogether and make it
per-superblock.  That would reduce the quality of the LRUing, but that
probably wouldn't hurt a lot.

Another idea would be to take shrinker_sem for writing when running
shrink_dcache_sb() - that would prevent tasks from coming in and getting
stuck on dcache_lock, but there are plenty of other places which want
dcache_lock.

I don't immediately see any simple tweaks which would allow us to avoid
that long lock hold time.  Perhaps the scanning in shrink_dcache_sb()
could use just rcu_read_lock()...

OT, I'm a bit curious about this:

		list_del_init(tmp);
		spin_lock(&dentry->d_lock);
		if (atomic_read(&dentry->d_count)) {
			spin_unlock(&dentry->d_lock);
			continue;
		}

So we rip the dentry off dcache_unused and just leave it floating about?
Dipankar, do you remember why that change was made, and why it's not a
bug?
* Re: shrink_dcache_sb scalability problem.
From: Bharata B Rao @ 2006-04-15  5:25 UTC
To: Andrew Morton; +Cc: David Chinner, linux-kernel, linux-fsdevel, Dipankar Sarma

On 4/14/06, Andrew Morton <akpm@osdl.org> wrote:
> David Chinner <dgc@sgi.com> wrote:
> >
> > On Thu, Apr 13, 2006 at 01:52:57AM -0700, Andrew Morton wrote:
> > > David Chinner <dgc@sgi.com> wrote:
> > > >
> > > > After recently upgrading a build machine to 2.6.16, we started seeing
> > > > 10-50s pauses where the machine would appear to hang.
> > >
> > > This sounds like the recent thread "Avoid excessive time spend on
> > > concurrent slab shrinking" over on linux-mm.  Have you read through that?
> > >
> > > http://marc.theaimsgroup.com/?l=linux-mm&r=1&b=200603&w=2
> > > http://marc.theaimsgroup.com/?l=linux-mm&r=3&b=200604&w=2
> >
> > Yes, I even made comments directly in the thread and it really
> > wasn't a problem with the slab shrinking infrastructure. It
> > was (obvious to us XFS folk) just another XFS inode caching
> > scalability problem that this machine has uncovered over
> > the past few years.
>
> OK.
>
> > > It ended up somewhat inconclusive, but it looks like we do have a bit
> > > of a problem, and it got exacerbated by an XFS slowness.
> >
> > I've already fixed that problem with:
> >
> > http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1fc5d959d88a5f77aa7e4435f6c9d0e2d2236704
> >
> > and the machine showing the shrink_dcache_sb() problems is
> > already running that fix. That problem masked the shrink_dcache_sb
> > one - who notices a 10s hang when the machine has been really,
> > really slow for 20 minutes?
>
> So the problem is shrink_dcache_sb() and not regular memory reclaim?
>
> What is happening on that machine to be triggering shrink_dcache_sb()?
> automounts?
>
> We fixed a similar problem in the inode cache a year or so back by
> creating per-superblock inode lists, so that
> search-a-global-list-for-objects-belonging-to-this-superblock thing went
> away.  Presumably we could fix this in the same manner.  But adding two
> more pointers to struct dentry would hurt.
>
> An alternative might be to remove the global LRU altogether and make it
> per-superblock.  That would reduce the quality of the LRUing, but that
> probably wouldn't hurt a lot.
>
> Another idea would be to take shrinker_sem for writing when running
> shrink_dcache_sb() - that would prevent tasks from coming in and getting
> stuck on dcache_lock, but there are plenty of other places which want
> dcache_lock.
>
> I don't immediately see any simple tweaks which would allow us to avoid
> that long lock hold time.  Perhaps the scanning in shrink_dcache_sb()
> could use just rcu_read_lock()...
>
> OT, I'm a bit curious about this:
>
> 		list_del_init(tmp);
> 		spin_lock(&dentry->d_lock);
> 		if (atomic_read(&dentry->d_count)) {
> 			spin_unlock(&dentry->d_lock);
> 			continue;
> 		}
>
> So we rip the dentry off dcache_unused and just leave it floating about?
> Dipankar, do you remember why that change was made, and why it's not a
> bug?

Due to lazy updating of the LRU list, there can be some dentries with
non-zero ref counts on the LRU list. This is one of the places where
such dentries are removed from the LRU list. (Basically, such dentries
will be on both the hash list and the LRU list, and here they get
removed from the LRU list.)

Regards,
Bharata.
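The point above - a nonzero d_count found on the LRU just means the lazy update hasn't caught up yet - is why the pruning code uses a check-then-recheck pattern: snoop the refcount cheaply, then confirm it under d_lock before actually freeing, since a concurrent lookup can take a reference between the two reads. A userspace sketch of that pattern (hypothetical miniature dentry; a C11 atomic_flag spinlock stands in for the kernel's d_lock):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A toy spinlock standing in for the kernel's per-dentry d_lock. */
typedef struct { atomic_flag held; } toy_spinlock_t;

static void toy_spin_lock(toy_spinlock_t *l)
{
	while (atomic_flag_test_and_set(&l->held))
		;	/* spin until the holder clears the flag */
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
	atomic_flag_clear(&l->held);
}

/* Miniature dentry: a refcount plus an "on the LRU" flag. */
struct mini_dentry {
	atomic_int d_count;
	bool on_lru;
	toy_spinlock_t d_lock;
};

/* Try to reclaim one dentry taken off the LRU.  Mirrors the two-step
 * check: a lockless snoop of d_count first, to skip obviously busy
 * dentries without taking any lock, then a definitive re-check under
 * d_lock, because a lookup may have raced in between the two reads. */
static bool try_prune(struct mini_dentry *d)
{
	if (atomic_load(&d->d_count) != 0)
		return false;		/* busy: leave it alone, lock-free */

	toy_spin_lock(&d->d_lock);
	if (atomic_load(&d->d_count) != 0) {
		toy_spin_unlock(&d->d_lock);
		return false;		/* raced with a lookup: skip it */
	}
	d->on_lru = false;		/* the kernel would free it here */
	toy_spin_unlock(&d->d_lock);
	return true;
}
```

The cheap first check is what keeps a busy dentry on the LRU from stalling the scan; only plausibly-free dentries pay the cost of lock acquisition.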
* Re: shrink_dcache_sb scalability problem.
From: Andrew Morton @ 2006-04-15  5:53 UTC
To: Bharata B Rao; +Cc: dgc, linux-kernel, linux-fsdevel, dipankar

"Bharata B Rao" <bharata.rao@gmail.com> wrote:
>
> > OT, I'm a bit curious about this:
> >
> > 		list_del_init(tmp);
> > 		spin_lock(&dentry->d_lock);
> > 		if (atomic_read(&dentry->d_count)) {
> > 			spin_unlock(&dentry->d_lock);
> > 			continue;
> > 		}
> >
> > So we rip the dentry off dcache_unused and just leave it floating about?
> > Dipankar, do you remember why that change was made, and why it's not a bug?
>
> Due to lazy updating of the LRU list, there can be some dentries with
> non-zero ref counts on the LRU list. This is one of the places where
> such dentries are removed from the LRU list. (Basically, such dentries
> will be on both the hash list and the LRU list, and here they get
> removed from the LRU list.)

OK.  But what guarantees that these live-but-detached dentries are
appropriately destroyed before the unmount completes?

Or... if these dentries will be freed by RCU callbacks potentially after
the unmount, are we sure that they will always be in a state which will
permit them to be freed?
* Re: shrink_dcache_sb scalability problem.
From: Bharata B Rao @ 2006-04-18  5:57 UTC
To: Andrew Morton; +Cc: dgc, linux-kernel, linux-fsdevel, dipankar

On 4/15/06, Andrew Morton <akpm@osdl.org> wrote:
> "Bharata B Rao" <bharata.rao@gmail.com> wrote:
> >
> > > OT, I'm a bit curious about this:
> > >
> > > 		list_del_init(tmp);
> > > 		spin_lock(&dentry->d_lock);
> > > 		if (atomic_read(&dentry->d_count)) {
> > > 			spin_unlock(&dentry->d_lock);
> > > 			continue;
> > > 		}
> > >
> > > So we rip the dentry off dcache_unused and just leave it floating about?
> > > Dipankar, do you remember why that change was made, and why it's not a bug?
> >
> > Due to lazy updating of the LRU list, there can be some dentries with
> > non-zero ref counts on the LRU list. This is one of the places where
> > such dentries are removed from the LRU list. (Basically, such dentries
> > will be on both the hash list and the LRU list, and here they get
> > removed from the LRU list.)
>
> OK.  But what guarantees that these live-but-detached dentries are
> appropriately destroyed before the unmount completes?

These are live dentries but not really detached; they are still attached
to the hash list.  And I don't see shrink_dcache_sb holding up the
umount because of these dentries; I assume dput of such dentries will
happen from different paths.  But I am not sure we could even end up in
this situation, where we have landed in shrink_dcache_sb from the
unmount path while there are still some in-use dentries present.  Need
some clarification here.

> Or... if these dentries will be freed by RCU callbacks potentially after
> the unmount, are we sure that they will always be in a state which will
> permit them to be freed?

When a dentry is queued for RCU freeing, there will be no references to
it from anywhere.  It wouldn't be on the hash list or on the LRU list.
So I would think only those dentries which are really freeable are
queued for RCU freeing.

I see that shrink_dcache_sb is being called from the remount path
(do_remount_sb).  I couldn't understand why we do this.  AFAICS we
anyway don't modify the mountpoint or the mount root during remount.
Wouldn't it be advantageous to leave those dentries on the LRU?

Regards,
Bharata.
* Re: shrink_dcache_sb scalability problem.
From: David Chinner @ 2006-04-18  0:29 UTC
To: Andrew Morton; +Cc: David Chinner, linux-kernel, linux-fsdevel, Dipankar Sarma

On Thu, Apr 13, 2006 at 10:23:25PM -0700, Andrew Morton wrote:
> David Chinner <dgc@sgi.com> wrote:
> > I've already fixed that problem with:
> >
> > http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1fc5d959d88a5f77aa7e4435f6c9d0e2d2236704
> >
> > and the machine showing the shrink_dcache_sb() problems is
> > already running that fix. That problem masked the shrink_dcache_sb
> > one - who notices a 10s hang when the machine has been really,
> > really slow for 20 minutes?
>
> So the problem is shrink_dcache_sb() and not regular memory reclaim?

*nod*

> What is happening on that machine to be triggering shrink_dcache_sb()?
> automounts?

That was what I first suspected, as the machine has lots of automounted
directories. However, breaking into kdb when the problem occurred showed
mounts of sysfs and /dev/pts.

From what I've been told, the chroot package build environment being
used (not something we control) mounts /proc, /sys, /dev/pts, etc. when
the chroot is entered, and unmounts them after the build is complete.
Hence for every package build an engineer kicks off, we get multiple
mounts and unmounts occurring.

> We fixed a similar problem in the inode cache a year or so back by
> creating per-superblock inode lists, so that
> search-a-global-list-for-objects-belonging-to-this-superblock thing
> went away.  Presumably we could fix this in the same manner.  But
> adding two more pointers to struct dentry would hurt.

*nod* I can't see any fields we'd be able to overload, either.

> An alternative might be to remove the global LRU altogether and make
> it per-superblock.  That would reduce the quality of the LRUing, but
> that probably wouldn't hurt a lot.

That makes reclaim an interesting problem. You'd have to walk the
superblock list to get to each LRU, and then how would you ensure that
you fairly reclaim inodes from each filesystem?

> Another idea would be to take shrinker_sem for writing when running
> shrink_dcache_sb() - that would prevent tasks from coming in and
> getting stuck on dcache_lock, but there are plenty of other places
> which want dcache_lock.

Yeah, the contention we are seeing is from all those other places, so I
can't see how taking the shrinker semaphore really helps us here.

> I don't immediately see any simple tweaks which would allow us to
> avoid that long lock hold time.  Perhaps the scanning in
> shrink_dcache_sb() could use just rcu_read_lock()...

We're modifying the list as we scan it, so I can't see how we can do
this without an exclusive lock.

The other thing that I thought of over the weekend is per-node LRU
lists and a lock per node. This will reduce the length of the lists,
allow some parallelism even while we scan and purge each list using the
existing algorithm, and not completely destroy the LRU-ness of the
dcache. It would also allow parallelising prune_dcache() and allow the
shrinker to prune the local node first (i.e. where we really need the
memory).

FWIW, this is a showstopper for us. The only thing that is allowing us
to keep running a recent kernel on this machine is the fact that someone
is running `echo 3 > /proc/sys/vm/drop_caches` as soon as the slowdown
manifests to blow away the dentry cache....

Cheers,

Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group
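The per-node idea is essentially lock and list sharding: each dentry already implies a NUMA node through the page it was allocated in (Dave's follow-up patch derives it with page_zone(virt_to_page(dentry))->zone_pgdat), so the one global list and lock can be split into one list and lock per node. A userspace analogue that shards by object address (hypothetical names; NR_SHARDS stands in for the node count, and the per-shard lock is elided since this sketch is single-threaded):

```c
#include <stddef.h>
#include <stdint.h>

#define NR_SHARDS 4	/* stands in for the number of NUMA nodes */

struct lru_node {
	struct lru_node *next;
};

/* One unused-list per shard; in the kernel each pg_data_t would also
 * carry its own spinlock so shards can be scanned in parallel. */
struct lru_shard {
	struct lru_node *head;
	long nr;
};

static struct lru_shard shards[NR_SHARDS];

/* Map an object to its shard by address, the way the patch maps a
 * dentry to the pgdat of the page it lives in. */
static struct lru_shard *shard_of(const void *obj)
{
	return &shards[((uintptr_t)obj / 64) % NR_SHARDS];
}

static void lru_add(struct lru_node *n)
{
	struct lru_shard *s = shard_of(n);	/* shard lock taken here */
	n->next = s->head;
	s->head = n;
	s->nr++;
}

/* Shrinking one shard touches only that shard's list (and lock); the
 * other shards stay available to concurrent dput()/lookup traffic. */
static long lru_shrink_shard(struct lru_shard *s)
{
	long freed = 0;
	while (s->head) {
		s->head = s->head->next;
		s->nr--;
		freed++;
	}
	return freed;
}
```

The trade-off is the one Dave notes: each list is shorter and independently lockable, at the cost of LRU ordering that is only approximate across shards.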
* [RFC][PATCH] Re: shrink_dcache_sb scalability problem.
From: David Chinner @ 2006-04-18 14:37 UTC
To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel, Dipankar Sarma

On Tue, Apr 18, 2006 at 10:29:28AM +1000, David Chinner wrote:
>
> The other thing that I thought of over the weekend is per-node LRU
> lists and a lock per node. This will reduce the length of the lists,
> allow some parallelism even while we scan and purge each list
> using the existing algorithm, and not completely destroy the LRU-ness
> of the dcache.

Here's a patch that does that. It's rough, but it boots and I've run
some basic tests against it. It doesn't survive dbench with 200
processes, though, as it crashes in prune_one_dentry() with a corrupted
d_child. The logic in prune_dcache() is pretty grotesque atm, and I
doubt it's correct. Comments on how to fix it are welcome ;)

Cheers,

Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

Make the dentry unused lists per-node.

Signed-off-by: Dave Chinner <dgc@sgi.com>

Index: 2.6.x-xfs-new/fs/dcache.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/dcache.c	2006-04-18 10:35:10.000000000 +1000
+++ 2.6.x-xfs-new/fs/dcache.c	2006-04-18 22:23:04.603552947 +1000
@@ -62,7 +62,6 @@
 static kmem_cache_t *dentry_cache;
 static unsigned int d_hash_mask;
 static unsigned int d_hash_shift;
 static struct hlist_head *dentry_hashtable;
-static LIST_HEAD(dentry_unused);
 
 /* Statistics gathering. */
 struct dentry_stat_t dentry_stat = {
@@ -114,6 +113,72 @@ static void dentry_iput(struct dentry *
 	}
 }
 
+static void
+dentry_unused_add(struct dentry *dentry)
+{
+	pg_data_t *pgdat = page_zone(virt_to_page(dentry))->zone_pgdat;
+
+	spin_lock(&pgdat->dentry_unused_lock);
+	list_add(&dentry->d_lru, &pgdat->dentry_unused);
+	spin_unlock(&pgdat->dentry_unused_lock);
+	dentry_stat.nr_unused++;
+}
+
+static void
+dentry_unused_add_tail(struct dentry *dentry)
+{
+	pg_data_t *pgdat = page_zone(virt_to_page(dentry))->zone_pgdat;
+
+	spin_lock(&pgdat->dentry_unused_lock);
+	list_add_tail(&dentry->d_lru, &pgdat->dentry_unused);
+	spin_unlock(&pgdat->dentry_unused_lock);
+	dentry_stat.nr_unused++;
+}
+
+/*
+ * Assumes external locks are already held
+ */
+static void
+dentry_unused_move(struct dentry *dentry, struct list_head *head)
+{
+	list_del(&dentry->d_lru);
+	list_add(&dentry->d_lru, head);
+}
+
+static void
+dentry_unused_del(struct dentry *dentry)
+{
+	if (!list_empty(&dentry->d_lru)) {
+		pg_data_t *pgdat = page_zone(virt_to_page(dentry))->zone_pgdat;
+
+		spin_lock(&pgdat->dentry_unused_lock);
+		list_del(&dentry->d_lru);
+		spin_unlock(&pgdat->dentry_unused_lock);
+		dentry_stat.nr_unused--;
+	}
+}
+
+static inline void
+__dentry_unused_del_init(struct dentry *dentry)
+{
+	if (likely(!list_empty(&dentry->d_lru)))
+		list_del_init(&dentry->d_lru);
+
+}
+
+static void
+dentry_unused_del_init(struct dentry *dentry)
+{
+	if (!list_empty(&dentry->d_lru)) {
+		pg_data_t *pgdat = page_zone(virt_to_page(dentry))->zone_pgdat;
+
+		spin_lock(&pgdat->dentry_unused_lock);
+		__dentry_unused_del_init(dentry);
+		spin_unlock(&pgdat->dentry_unused_lock);
+		dentry_stat.nr_unused--;
+	}
+}
+
 /*
  * This is dput
  *
@@ -173,8 +238,7 @@ repeat:
 		goto kill_it;
 	if (list_empty(&dentry->d_lru)) {
 		dentry->d_flags |= DCACHE_REFERENCED;
-		list_add(&dentry->d_lru, &dentry_unused);
-		dentry_stat.nr_unused++;
+		dentry_unused_add(dentry);
 	}
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
@@ -186,13 +250,8 @@ unhash_it:
 kill_it: {
 		struct dentry *parent;
 
-		/* If dentry was on d_lru list
-		 * delete it from there
-		 */
-		if (!list_empty(&dentry->d_lru)) {
-			list_del(&dentry->d_lru);
-			dentry_stat.nr_unused--;
-		}
+		/* If dentry was on d_lru list delete it from there */
+		dentry_unused_del(dentry);
 		list_del(&dentry->d_u.d_child);
 		dentry_stat.nr_dentry--;	/* For d_free, below */
 		/*drops the locks, at that point nobody can reach this dentry */
@@ -268,10 +327,7 @@ int d_invalidate(struct dentry * dentry)
 static inline struct dentry * __dget_locked(struct dentry *dentry)
 {
 	atomic_inc(&dentry->d_count);
-	if (!list_empty(&dentry->d_lru)) {
-		dentry_stat.nr_unused--;
-		list_del_init(&dentry->d_lru);
-	}
+	dentry_unused_del_init(dentry);
 	return dentry;
 }
 
@@ -392,42 +448,73 @@ static inline void prune_one_dentry(stru
 
 static void prune_dcache(int count)
 {
-	spin_lock(&dcache_lock);
-	for (; count ; count--) {
-		struct dentry *dentry;
-		struct list_head *tmp;
-
-		cond_resched_lock(&dcache_lock);
-
-		tmp = dentry_unused.prev;
-		if (tmp == &dentry_unused)
-			break;
-		list_del_init(tmp);
-		prefetch(dentry_unused.prev);
-		dentry_stat.nr_unused--;
-		dentry = list_entry(tmp, struct dentry, d_lru);
+	int node_id = numa_node_id();
+	int scan_low = 0;
+	int c = count;
+	pg_data_t *pgdat;
+
+rescan:
+	for_each_pgdat(pgdat) {
+		if (!scan_low) {
+			if (pgdat->node_id < node_id)
+				continue;
+		} else {
+			if (pgdat->node_id >= node_id)
+				break;
+		}
+		for (c = count; c ; c--) {
+			struct dentry *dentry;
+			struct list_head *tmp;
+			spin_lock(&pgdat->dentry_unused_lock);
+
+			tmp = pgdat->dentry_unused.prev;
+			if (tmp == &pgdat->dentry_unused) {
+				spin_unlock(&pgdat->dentry_unused_lock);
+				break;
+			}
+			dentry = list_entry(tmp, struct dentry, d_lru);
+			__dentry_unused_del_init(dentry);
+			prefetch(&pgdat->dentry_unused.prev);
+			spin_unlock(&pgdat->dentry_unused_lock);
 
-		spin_lock(&dentry->d_lock);
+			spin_lock(&dcache_lock);
+			spin_lock(&dentry->d_lock);
+			/*
+			 * We found an inuse dentry which was not removed from
+			 * dentry_unused because of laziness during lookup or
+			 * a dentry that has just been put back on the unused
+			 * list. Do not free it - just leave it where it is.
+			 */
+			if (atomic_read(&dentry->d_count) ||
+			    !list_empty(&dentry->d_lru)) {
+				spin_unlock(&dentry->d_lock);
+				spin_unlock(&dcache_lock);
+				continue;
+			}
+			/* If the dentry was recently referenced, don't free it. */
+			if (dentry->d_flags & DCACHE_REFERENCED) {
+				dentry->d_flags &= ~DCACHE_REFERENCED;
+				dentry_unused_add(dentry);
+				spin_unlock(&dentry->d_lock);
+				spin_unlock(&dcache_lock);
+				continue;
+			}
+			prune_one_dentry(dentry);
+			spin_unlock(&dcache_lock);
+		}
 		/*
-		 * We found an inuse dentry which was not removed from
-		 * dentry_unused because of laziness during lookup. Do not free
-		 * it - just keep it off the dentry_unused list.
+		 * shrink_parent needs to scan each list, and if it only
+		 * calls in with one count then we may never find it. So
+		 * if count == 1, scan each list once.
 		 */
-		if (atomic_read(&dentry->d_count)) {
-			spin_unlock(&dentry->d_lock);
-			continue;
-		}
-		/* If the dentry was recently referenced, don't free it. */
-		if (dentry->d_flags & DCACHE_REFERENCED) {
-			dentry->d_flags &= ~DCACHE_REFERENCED;
-			list_add(&dentry->d_lru, &dentry_unused);
-			dentry_stat.nr_unused++;
-			spin_unlock(&dentry->d_lock);
-			continue;
-		}
-		prune_one_dentry(dentry);
+		if (count == 1)
+			c = 1;
+		if (!c)
+			break;
 	}
-	spin_unlock(&dcache_lock);
+	if (c && scan_low++ == 0)
+		goto rescan;
+}
 
 /*
@@ -456,39 +543,66 @@ void shrink_dcache_sb(struct super_block
 {
 	struct list_head *tmp, *next;
 	struct dentry *dentry;
+	pg_data_t *pgdat;
+	int found;
 
-	/*
-	 * Pass one ... move the dentries for the specified
-	 * superblock to the most recent end of the unused list.
-	 */
-	spin_lock(&dcache_lock);
-	list_for_each_safe(tmp, next, &dentry_unused) {
-		dentry = list_entry(tmp, struct dentry, d_lru);
-		if (dentry->d_sb != sb)
-			continue;
-		list_del(tmp);
-		list_add(tmp, &dentry_unused);
-	}
+	for_each_pgdat(pgdat) {
+		found = 0;
+		spin_lock(&pgdat->dentry_unused_lock);
+		/*
+		 * Pass one ... move the dentries for the specified
+		 * superblock to the most recent end of the unused list.
+		 */
+		list_for_each_safe(tmp, next, &pgdat->dentry_unused) {
+			dentry = list_entry(tmp, struct dentry, d_lru);
+			if (dentry->d_sb != sb)
+				continue;
+			dentry_unused_move(dentry, &pgdat->dentry_unused);
+			found++;
+		}
+		spin_unlock(&pgdat->dentry_unused_lock);
 
-	/*
-	 * Pass two ... free the dentries for this superblock.
-	 */
-repeat:
-	list_for_each_safe(tmp, next, &dentry_unused) {
-		dentry = list_entry(tmp, struct dentry, d_lru);
-		if (dentry->d_sb != sb)
-			continue;
-		dentry_stat.nr_unused--;
-		list_del_init(tmp);
-		spin_lock(&dentry->d_lock);
-		if (atomic_read(&dentry->d_count)) {
-			spin_unlock(&dentry->d_lock);
+		/*
+		 * Pass two ... free the dentries for this superblock.
+		 * use the output of the first pass to determine if we need
+		 * to run this pass.
+		 */
+		if (!found)
 			continue;
+repeat:
+		spin_lock(&pgdat->dentry_unused_lock);
+		list_for_each_safe(tmp, next, &pgdat->dentry_unused) {
+			dentry = list_entry(tmp, struct dentry, d_lru);
+			if (dentry->d_sb != sb)
+				continue;
+			__dentry_unused_del_init(dentry);
+
+			/*
+			 * We snoop on the d_count here so we can skip
+			 * dentries we can obviously not free right now
+			 * without dropping the list lock. This prevents us
+			 * from getting stuck on an in-use dentry on the unused
+			 * list.
+			 */
+			if (atomic_read(&dentry->d_count))
+				continue;
+
+			spin_unlock(&pgdat->dentry_unused_lock);
+			spin_lock(&dcache_lock);
+			spin_lock(&dentry->d_lock);
+			if (atomic_read(&dentry->d_count) ||
+			    (dentry->d_sb != sb) ||
+			    !list_empty(&dentry->d_lru)) {
+				spin_unlock(&dentry->d_lock);
+				spin_unlock(&dcache_lock);
+				goto repeat;
+			}
+			prune_one_dentry(dentry);
+			spin_unlock(&dcache_lock);
+			goto repeat;
 		}
-		prune_one_dentry(dentry);
-		goto repeat;
+		spin_unlock(&pgdat->dentry_unused_lock);
 	}
-	spin_unlock(&dcache_lock);
 }
 
 /*
@@ -572,17 +686,13 @@ resume:
 		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
 		next = tmp->next;
-		if (!list_empty(&dentry->d_lru)) {
-			dentry_stat.nr_unused--;
-			list_del_init(&dentry->d_lru);
-		}
+		dentry_unused_del_init(dentry);
 		/*
 		 * move only zero ref count dentries to the end
 		 * of the unused list for prune_dcache
 		 */
 		if (!atomic_read(&dentry->d_count)) {
-			list_add(&dentry->d_lru, dentry_unused.prev);
-			dentry_stat.nr_unused++;
+			dentry_unused_add_tail(dentry);
 			found++;
 		}
 
@@ -657,18 +767,14 @@ void shrink_dcache_anon(struct hlist_hea
 		spin_lock(&dcache_lock);
 		hlist_for_each(lp, head) {
 			struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
-			if (!list_empty(&this->d_lru)) {
-				dentry_stat.nr_unused--;
-				list_del_init(&this->d_lru);
-			}
+			dentry_unused_del_init(this);
 			/*
 			 * move only zero ref count dentries to the end
 			 * of the unused list for prune_dcache
 			 */
 			if (!atomic_read(&this->d_count)) {
-				list_add_tail(&this->d_lru, &dentry_unused);
-				dentry_stat.nr_unused++;
+				dentry_unused_add_tail(this);
 				found++;
 			}
 		}
@@ -1673,6 +1779,12 @@ static void __init dcache_init_early(voi
 static void __init dcache_init(unsigned long mempages)
 {
 	int loop;
+	pg_data_t *pgdat;
+
+	for_each_pgdat(pgdat) {
+		spin_lock_init(&pgdat->dentry_unused_lock);
+		INIT_LIST_HEAD(&pgdat->dentry_unused);
+	}
 
 	/*
 	 * A constructor could be added for stable state like the lists,
Index: 2.6.x-xfs-new/include/linux/mmzone.h
===================================================================
--- 2.6.x-xfs-new.orig/include/linux/mmzone.h	2006-02-06 11:57:55.000000000 +1100
+++ 2.6.x-xfs-new/include/linux/mmzone.h	2006-04-18 18:04:22.378952121 +1000
@@ -311,6 +311,8 @@ typedef struct pglist_data {
 	wait_queue_head_t kswapd_wait;
 	struct task_struct *kswapd;
 	int kswapd_max_order;
+	spinlock_t dentry_unused_lock;
+	struct list_head dentry_unused;
 } pg_data_t;
 
 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)