* Important regression with XFS update for 2.6.24-rc6
From: Damien Wyart @ 2007-12-18 11:28 UTC
To: David Chinner, Christoph Hellwig, Lachlan McIlroy, Peter Leckie,
    Linus Torvalds
Cc: linux-xfs, LKML

Hello,

As a follow-up to <http://marc.info/?l=linux-kernel&m=119796120524618&w=2>
(LKML seems down right now so I am not linking to it), I have detected an
important problem with these two patches: after applying them by hand
(downloaded them raw from SGI's gitweb) on top of 2.6.24-rc5-git5 (they have
not yet been pulled into mainline by Linus as of this morning) for testing
purposes, I noticed upon reboot that "ls -l" on directories with many files
and subdirectories (around 5000 entries) takes several hundred MB of RAM
and then dies with a "memory exhausted" error.

I also noticed that ldconfig takes a long time to complete, and firefox
also seems to eat much more memory than usual. Reverting the two patches
(going back to vanilla rc5-git5) makes these problems go away. I am not
able to test right now whether only one of the patches is bogus or both of
them are involved.

As the symptoms are easy to reproduce, I guess this is some kind of brown
paper bag bug and will be easy for XFS experts to spot.

Best,
--
Damien Wyart
* Re: Important regression with XFS update for 2.6.24-rc6
From: David Chinner @ 2007-12-18 12:24 UTC
To: Damien Wyart
Cc: David Chinner, Christoph Hellwig, Lachlan McIlroy, Peter Leckie,
    Linus Torvalds, linux-xfs, LKML

On Tue, Dec 18, 2007 at 12:28:04PM +0100, Damien Wyart wrote:
> Hello,
>
> As a follow-up to <http://marc.info/?l=linux-kernel&m=119796120524618&w=2>
> (LKML seems down right now so I am not linking to it), I have detected an
> important problem with these two patches: after applying them by hand
> (downloaded them raw from SGI's gitweb) on top of 2.6.24-rc5-git5 (they have
> not yet been pulled into mainline by Linus as of this morning) for testing
> purposes, I noticed upon reboot that "ls -l" on directories with many files
> and subdirectories (around 5000 entries) takes several hundred MB of RAM
> and then dies with a "memory exhausted" error.

Ok. I haven't noticed anything wrong with directories up to about
250,000 files in the last few days. The ls -l I just did on
a directory with 15000 entries (btree format) used about 5MB of RAM.
Extent format directories appear to work fine as well (tested 500
entries).

Can you:

	a) isolate the problem to one patch or the other. My guess
	   would be the directory mod, but.....

	b) show your working ;)
		- what platform (i386, x86_64, etc)
		- what debug options
		- commands and output that show the problem
		- strace of ls -l going bad
		- xfs_info from filesystem in question

> I also noticed that ldconfig takes a long time to complete, and firefox
> also seems to eat much more memory than usual. Reverting the two patches
> (going back to vanilla rc5-git5) makes these problems go away. I am not
> able to test right now whether only one of the patches is bogus or both of
> them are involved.

Well, there goes a).....

> As the symptoms are easy to reproduce, I guess this is some kind of brown
> paper bag bug and will be easy for XFS experts to spot.

Well, not reproducible on my test boxes. It may well be a brown
paper bag job, but it's not obvious.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: Important regression with XFS update for 2.6.24-rc6
From: Damien Wyart @ 2007-12-18 14:30 UTC
To: David Chinner
Cc: Christoph Hellwig, Lachlan McIlroy, Peter Leckie, Linus Torvalds,
    linux-xfs, LKML

* David Chinner <dgc@sgi.com> [071218 13:24]:
> Ok. I haven't noticed anything wrong with directories up to about
> 250,000 files in the last few days. The ls -l I just did on
> a directory with 15000 entries (btree format) used about 5MB of RAM.
> Extent format directories appear to work fine as well (tested 500
> entries).

Ok, nice to know the problem is not so frequent.

> Can you:
> 	a) isolate the problem to one patch or the other. My guess
> 	   would be the directory mod, but.....

Yes, it is indeed the directory patch. I still sometimes get huge memory
usage with ls on the patched kernel, but this is quite rare; the problem
now is mainly that entries in the listing are repeated and that the ls
process takes longer than without the patch. This happens mainly after
booting. I guess the cache plays a role, and even after using drop_caches
I can't reproduce the problem. Only after a fresh reboot do I get the
repeated entries systematically, and the memory problem much less often.
As said earlier, after a fresh boot on rc5-git5 without the directory
patch, ls -l behaves normally (no repeated entries).

> 	b) show your working ;)

Sorry, I forgot this part in my initial report.

> 		- what platform (i386, x86_64, etc)

i386.

> 		- what debug options

Nothing special, the kernel has 4K stacks, and xfs partitions are
mounted with noatime,nodiratime.

> 		- commands and output that show the problem

It is mainly "ls -l" in a quite crowded directory.

> 		- strace of ls -l going bad
> 		- xfs_info from filesystem in question

I have put the files at http://damien.wyart.free.fr/xfs/

strace_xfs_problem.1.gz and strace_xfs_problem.2.gz have been created
with the problematic kernel, and are quite a bit bigger than
strace_xfs_problem.normal.gz, which has been created with the vanilla
rc5-git5. There is also xfs_info.

I can provide further details if needed (maybe kernel config, but
nothing special on the xfs side), but I confirm the behavior is
different with and without the directory patch
(041388b54ed95cd169546bd83bacd08ee32bd7ea on oss.sgi), and doesn't look
normal with the patch.

--
Damien Wyart
* Re: Important regression with XFS update for 2.6.24-rc6
From: David Chinner @ 2007-12-18 15:19 UTC
To: Damien Wyart
Cc: David Chinner, Christoph Hellwig, Lachlan McIlroy, Peter Leckie,
    Linus Torvalds, linux-xfs, LKML

On Tue, Dec 18, 2007 at 03:30:31PM +0100, Damien Wyart wrote:
> * David Chinner <dgc@sgi.com> [071218 13:24]:
> > Ok. I haven't noticed anything wrong with directories up to about
> > 250,000 files in the last few days. The ls -l I just did on
> > a directory with 15000 entries (btree format) used about 5MB of RAM.
> > Extent format directories appear to work fine as well (tested 500
> > entries).
>
> Ok, nice to know the problem is not so frequent.

.....

> I have put the files at http://damien.wyart.free.fr/xfs/
>
> strace_xfs_problem.1.gz and strace_xfs_problem.2.gz have been created
> with the problematic kernel, and are quite a bit bigger than
> strace_xfs_problem.normal.gz, which has been created with the vanilla
> rc5-git5. There is also xfs_info.

Looks like, several getdents() calls into the directory, getdents()
starts outputting the first files again. It gets to a certain
point and always goes back to the beginning. However, it appears to
get to the end eventually (without ever getting past the bad offset).

I'll look into this more in the morning as it's not obvious what is
wrong in my sleep-deprived state....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
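The symptom described above (a listing that starts returning the first
files again) can also be checked from user space without reading an
strace log. What follows is only a minimal sketch in C:
count_repeated_entries() is a hypothetical helper written for this
illustration, not part of any existing tool, and it assumes a POSIX
system with opendir()/readdir(). It collects every name the kernel hands
back for one directory, sorts them, and counts names that appear more
than once.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/*
 * Hypothetical helper (illustration only). Reads every entry of `path`
 * with readdir() and returns the number of extra occurrences of names
 * that were already seen, or -1 on error.
 */
long count_repeated_entries(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    char **names = NULL;
    size_t n = 0, cap = 0;
    long repeats = 0;
    struct dirent *de;

    while ((de = readdir(dir)) != NULL) {
        if (n == cap) {                     /* grow the name array */
            size_t new_cap = cap ? cap * 2 : 256;
            char **tmp = realloc(names, new_cap * sizeof(*names));
            if (tmp == NULL) {
                repeats = -1;
                break;
            }
            names = tmp;
            cap = new_cap;
        }
        size_t len = strlen(de->d_name) + 1;
        names[n] = malloc(len);             /* private copy of the name */
        if (names[n] == NULL) {
            repeats = -1;
            break;
        }
        memcpy(names[n], de->d_name, len);
        n++;
    }
    closedir(dir);

    if (repeats == 0 && n > 1) {
        /* After sorting, a repeated name sits next to its twin. */
        qsort(names, n, sizeof(*names), cmp_names);
        for (size_t i = 1; i < n; i++)
            if (strcmp(names[i - 1], names[i]) == 0)
                repeats++;
    }

    for (size_t i = 0; i < n; i++)
        free(names[i]);
    free(names);
    return repeats;
}
```

On a healthy kernel, calling count_repeated_entries() on any directory
should return 0; a positive result on an otherwise idle directory would
point at the kind of rewind seen in the strace output.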
* Re: Important regression with XFS update for 2.6.24-rc6
From: David Chinner @ 2007-12-19 10:45 UTC
To: David Chinner
Cc: Damien Wyart, Christoph Hellwig, Lachlan McIlroy, Peter Leckie,
    Linus Torvalds, linux-xfs, LKML

On Wed, Dec 19, 2007 at 02:19:47AM +1100, David Chinner wrote:
> On Tue, Dec 18, 2007 at 03:30:31PM +0100, Damien Wyart wrote:
> > * David Chinner <dgc@sgi.com> [071218 13:24]:
> > > Ok. I haven't noticed anything wrong with directories up to about
> > > 250,000 files in the last few days. The ls -l I just did on
> > > a directory with 15000 entries (btree format) used about 5MB of RAM.
> > > Extent format directories appear to work fine as well (tested 500
> > > entries).
> >
> > Ok, nice to know the problem is not so frequent.
>
> .....
>
> > I have put the files at http://damien.wyart.free.fr/xfs/
> >
> > strace_xfs_problem.1.gz and strace_xfs_problem.2.gz have been created
> > with the problematic kernel, and are quite a bit bigger than
> > strace_xfs_problem.normal.gz, which has been created with the vanilla
> > rc5-git5. There is also xfs_info.
>
> Looks like, several getdents() calls into the directory, getdents()
> starts outputting the first files again. It gets to a certain
> point and always goes back to the beginning. However, it appears to
> get to the end eventually (without ever getting past the bad offset).

UML and a bunch of printk's to the rescue. So we went back to
double buffering, which then screwed up the d_off of the dirents. I
changed the temporary dirents to point to the current offset so
that filldir got what it expected when filling the user buffer.

Except it appears that I didn't initialise the current offset for
the first dirent read from the temporary buffer, so filldir
occasionally got an uninitialised offset. Can someone pass me a
brown paper bag, please?

In my local testing, more often than not, that uninitialised offset
reads as zero, which is where the looping comes from. Sometimes it
points off into wacko-land, which is probably how we eventually get
the looping terminating before you run out of memory.

That also explains why we haven't seen it - it requires the user
buffer to fill on the first entry of a backing buffer and so it is
largely dependent on the pattern of name lengths, page size and
filesystem block size aligning just right to trigger the problem.

Can you test this patch, Damien?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
 fs/xfs/linux-2.6/xfs_file.c |    1 +
 1 file changed, 1 insertion(+)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_file.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_file.c	2007-12-19 00:26:40.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_file.c	2007-12-19 21:26:38.701143555 +1100
@@ -348,6 +348,7 @@ xfs_file_readdir(
 
 		size = buf.used;
 		de = (struct hack_dirent *)buf.dirent;
+		curr_offset = de->offset /* & 0x7fffffff */;
 		while (size > 0) {
 			if (filldir(dirent, de->name, de->namlen,
 					curr_offset & 0x7fffffff,
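The failure mode described in this message can be illustrated with a toy
model. The sketch below is not the XFS code: toy_getdents() and
toy_list() are hypothetical names, entries are counted rather than
measured in bytes, and the uninitialised offset is modelled as a reset
to zero at the start of each backing buffer (an assumption of this
sketch, since actually reading an uninitialised variable is undefined
behaviour in C). It only shows why the listing rewinds when the user
buffer fills exactly on the first entry of a backing buffer, and why a
one-line initialisation avoids it.

```c
#include <assert.h>

/*
 * Toy model of one getdents()-like call (illustration only).
 * Entries are numbered 0..n_entries-1 and an entry's offset is its index.
 * Entries are staged in backing buffers of `backing_cap` entries and then
 * copied into a user buffer holding `user_cap` entries. When the user
 * buffer is full, the resume position is taken from curr_offset.
 */
static int toy_getdents(int n_entries, int backing_cap, int user_cap,
                        int apply_fix, int *f_pos)
{
    assert(backing_cap > 0 && user_cap > 0);

    int pos = *f_pos;   /* where the next backing buffer starts */
    int copied = 0;     /* entries placed in the user buffer so far */

    while (pos < n_entries) {
        int first = pos;
        int count = n_entries - first;
        if (count > backing_cap)
            count = backing_cap;

        /*
         * With the fix, curr_offset starts at the offset of the first
         * staged entry. Without it, this model uses 0 to stand in for
         * the uninitialised value (assumption, see lead-in).
         */
        int curr_offset = apply_fix ? first : 0;

        for (int i = 0; i < count; i++) {
            if (copied == user_cap) {
                /* User buffer full: remember where to resume. */
                *f_pos = curr_offset;
                return copied;
            }
            copied++;
            curr_offset = first + i + 1;   /* next entry to hand out */
        }
        pos = first + count;
    }

    *f_pos = pos;
    return copied;
}

/* Lists the whole toy directory, giving up after max_calls calls. */
long toy_list(int n_entries, int backing_cap, int user_cap,
              int apply_fix, int max_calls)
{
    int f_pos = 0;
    long total = 0;

    for (int call = 0; call < max_calls; call++) {
        int got = toy_getdents(n_entries, backing_cap, user_cap,
                               apply_fix, &f_pos);
        if (got == 0)
            break;          /* end of directory */
        total += got;
    }
    return total;
}
```

In this model, a 20-entry directory with backing buffers of 4 entries and
a user buffer of 8 entries is listed exactly once with the fix, while the
unfixed variant keeps rewinding to offset 0 and returns the same entries
until the call limit is hit. With a user buffer of 6 entries the sizes no
longer line up, and the unfixed variant happens to work, which mirrors
the observation that the bug depends on buffer boundaries aligning just
right.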
* Re: Important regression with XFS update for 2.6.24-rc6
From: Damien Wyart @ 2007-12-19 11:17 UTC
To: David Chinner
Cc: Christoph Hellwig, Lachlan McIlroy, Peter Leckie, Linus Torvalds,
    linux-xfs, LKML

* David Chinner <dgc@sgi.com> [071219 11:45]:
> Can someone pass me a brown paper bag, please?

My first impression on this bug was not so wrong, after all ;-)

> That also explains why we haven't seen it - it requires the user
> buffer to fill on the first entry of a backing buffer and so it is
> largely dependent on the pattern of name lengths, page size and
> filesystem block size aligning just right to trigger the problem.

I guess I was lucky to trigger it quite easily...

> Can you test this patch, Damien?

Works fine, all the bad symptoms have disappeared and strace output is
normal.

So you can add:

Tested-by: Damien Wyart <damien.wyart@free.fr>

--
Damien
* Re: Important regression with XFS update for 2.6.24-rc6
From: David Chinner @ 2007-12-19 11:31 UTC
To: Damien Wyart
Cc: David Chinner, Christoph Hellwig, Lachlan McIlroy, Peter Leckie,
    Linus Torvalds, linux-xfs, LKML

On Wed, Dec 19, 2007 at 12:17:30PM +0100, Damien Wyart wrote:
> * David Chinner <dgc@sgi.com> [071219 11:45]:
> > Can someone pass me a brown paper bag, please?
>
> My first impression on this bug was not so wrong, after all ;-)
>
> > That also explains why we haven't seen it - it requires the user
> > buffer to fill on the first entry of a backing buffer and so it is
> > largely dependent on the pattern of name lengths, page size and
> > filesystem block size aligning just right to trigger the problem.
>
> I guess I was lucky to trigger it quite easily...
>
> > Can you test this patch, Damien?
>
> Works fine, all the bad symptoms have disappeared and strace output is
> normal.
>
> So you can add:
>
> Tested-by: Damien Wyart <damien.wyart@free.fr>

Thanks for reporting the bug and testing the fix so quickly, Damien.
I'll give it some more QA before I push it, though.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group