* [ANNOUNCE] Reiserfs/kill-bkl tree v2 @ 2009-07-31 17:46 Frederic Weisbecker 2009-08-01 8:11 ` Andi Kleen 0 siblings, 1 reply; 14+ messages in thread From: Frederic Weisbecker @ 2009-07-31 17:46 UTC (permalink / raw) To: LKML Cc: Jeff Mahoney, Chris Mason, Ingo Molnar, Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Andi Kleen, Marcel Hilzinger, Edward Shishkin Hi everyone, I'm pleased to announce the v2 of the reiserfs/kill-bkl tree. (Has there been a v1 actually...I'm not sure). This work was first borned and hosted in the tip:kill-the-bkl tree and has then been detached as a seperate branch. This patchset consists in dropping the bkl locking scheme from reiserfs 3 and replacing it with a per superblock mutex. I) Deal with the BKL scheme The first obstacle was to deal with the bkl based locking scheme in which the whole reiserfs code is based on: Bkl behaviour: - disables preemption - is relaxed while scheduling - can be acquired recursively by a task The resulting reiserfs code: - some callsites acquire the lock, sometimes recursively. In the latter case, it's often hard to fix - after every calls to functions that might sleep, reiserfs performs checks to ensure the tree hasn't changed and compute fixups in the latter case. These properties have resulted in the creation of an ad-hoc locking primitive based on a mutex but that can be acquired recursively. Also most might-sleep-callsites have been explicitly surrounded with a relax of the lock. II) Deal with performance regressions The bkl is based on a spinlock whereas the new lock is based on a mutex. We couldn't safely make it a spinlock because the code locked by the bkl can sleep, and such conversion would have needed a lot of rewrites. There are a lot of reasons that can make a spinlock more efficient than a mutex. But still we have now two nice properties: - the mutexes have the spin on owner features, making them closer to a spinlock behaviour. - the bkl is *forced* to be relaxed on schedule(). And sometimes this is a weakness. After a simple kmalloc, we have to check the filesystem hasn't changed behind us and to fixup in that case. That can be very costly. Sometimes this is something we want, sometimes not. At least with a mutex, we can choose. III) Benchmarks Comparisons have been made using dbench between vanilla throughput (bkl based) and the head of the reiserfs/kill-bkl tree (mutex based). Both kernel had the same config (CONFIG_PREEMPT=n, CONFIG_SMP=y) - Dbench with 1 thread during 600 seconds (better with the mutex): Lock Throughput in the end Bkl 232.54 MB/sec Mutex 237.71 MB/sec Complete trace: Bkl: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/bkl-600-1.log Mutex: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/mut-600-1.log Graphical comparison: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dbench-1.pdf - Dbench with 30 threads during 600 seconds (better with the bkl): Lock Throughput in the end Bkl 92.41 MB/sec Mutex 82.25 MB/sec Complete trace: Bkl: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/bkl-600-30.log Mutex: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/mut-600-30.log Graphical comparison: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dbench-30.pdf - Dbench with 100 threads during 600 seconds (better with the mutex): Lock Throughput in the end Bkl 37.89 MB/sec Mutex 40.58 MB/sec Complete trace: Bkl: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/bkl-600-100.log Mutex: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/mut-600-100.log Graphical comparison: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dbench-100.pdf - Dbench with two threads, writing on a seperate partition simultaneoulsy, during 600 seconds (better with the mutex): Lock Thread #1 Thread #2 Bkl 199.95 MB/sec 186.16 MB/sec Mutex 213.91 MB/sec 203.84 MB/sec Complete trace: Bkl, thread #1: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dual-bkl-600-1.log Bkl, thread #2: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dual2-bkl-600-1.log Mutex, thread #1: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dual-mut-600-1.log Mutex, thread #2: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dual2-mut-600-1.log Graphical comparison: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dbench-dual.pdf IV) Testing and review You can fetch the git tree, this one will keep being the most up to date: git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git reiserfs/kill-bkl Or you can either apply the raw diff: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/reis_full.diff Tests/reviews/any kind of contributions are very welcome! Thanks, Frederic. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2 2009-07-31 17:46 [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker @ 2009-08-01 8:11 ` Andi Kleen 2009-08-01 15:53 ` Frederic Weisbecker 0 siblings, 1 reply; 14+ messages in thread From: Andi Kleen @ 2009-08-01 8:11 UTC (permalink / raw) To: Frederic Weisbecker Cc: LKML, Jeff Mahoney, Chris Mason, Ingo Molnar, Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Andi Kleen, Marcel Hilzinger, Edward Shishkin On Fri, Jul 31, 2009 at 07:46:43PM +0200, Frederic Weisbecker wrote: > IV) Testing and review You seem to have done a lot of benchmarks, but did you also do stress testing? While a performance issue wouldn't be good, corrupting someone's file system would be really bad. e.g. running all the file system stress tests from LTP would be a good idea, ideally multiple at a time on a multi processor system on a ram disk or perhaps AIM9. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2 2009-08-01 8:11 ` Andi Kleen @ 2009-08-01 15:53 ` Frederic Weisbecker 2009-08-02 14:21 ` Ingo Molnar 0 siblings, 1 reply; 14+ messages in thread From: Frederic Weisbecker @ 2009-08-01 15:53 UTC (permalink / raw) To: Andi Kleen Cc: LKML, Jeff Mahoney, Chris Mason, Ingo Molnar, Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin On Sat, Aug 01, 2009 at 10:11:41AM +0200, Andi Kleen wrote: > On Fri, Jul 31, 2009 at 07:46:43PM +0200, Frederic Weisbecker wrote: > > IV) Testing and review > > You seem to have done a lot of benchmarks, but did you also do stress testing? > > While a performance issue wouldn't be good, corrupting someone's file system > would be really bad. Heh, your are fairly right. I've been focusing so much on performances regressions, that I forgot this step. Well, actually no I haven't forgot about it, but... The problem is that the only SMP box I have is my personal laptop. It has been sufficient to perform the dbench tests I've posted, but I could do more (and more finegrained) tests with a separate box. Then, indeed I could launch some stress tests in my laptop, but stress tests should be run in another way than throughput tests, ideally stress tests should run this tree for long hours (no, ideally: all the time), whereas throughput tests can make it with punctual tests (though still, it would be nicer to do more and more finegrained). I have UP test boxes however, I will do long hours stress tests on it next week but I fear that UP, even with CONFIG_PREEMPT can't open enough racy windows to unearth the real wicked bugs that are waiting. That said, and as you can guess while reading between the aboves lines, one of my secret hopes while posting this announce was to get people to test it, why not with stress tests, using better hardware ressources than I have :-) > e.g. running all the file system stress tests from LTP would be a good > idea, ideally multiple at a time on a multi processor system on a ram > disk or perhaps AIM9. Yeah good idea. But again, I fear my laptop hasn't enough memory to support big enough ramdisks mount points to host selftests. Anyway, I'll launch stress tests on one of my UP boxes next week, and also some in my laptop. Thanks! ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2 2009-08-01 15:53 ` Frederic Weisbecker @ 2009-08-02 14:21 ` Ingo Molnar 2009-08-02 14:32 ` Daniel Walker ` (3 more replies) 0 siblings, 4 replies; 14+ messages in thread From: Ingo Molnar @ 2009-08-02 14:21 UTC (permalink / raw) To: Frederic Weisbecker Cc: Andi Kleen, LKML, Jeff Mahoney, Chris Mason, Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin * Frederic Weisbecker <fweisbec@gmail.com> wrote: > > e.g. running all the file system stress tests from LTP would be > > a good idea, ideally multiple at a time on a multi processor > > system on a ram disk or perhaps AIM9. > > Yeah good idea. But again, I fear my laptop hasn't enough memory > to support big enough ramdisks mount points to host selftests. Well, dont waste too much time on it (beyond the due diligence level) - Andi forgot that the right way to stress-test patches is to get through the review process and then through the integration trees which have far more test exposure than any single contributor can test. Patch submitters cannot possibly test every crazy possibility that is out there - nor should they: it just doesnt scale. What we expect people to do is to write clean patches, to test the bits on their own boxes and submit them to lkml and address specific review feedback. Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2 2009-08-02 14:21 ` Ingo Molnar @ 2009-08-02 14:32 ` Daniel Walker 2009-08-02 23:41 ` Frank Ch. Eigler ` (2 subsequent siblings) 3 siblings, 0 replies; 14+ messages in thread From: Daniel Walker @ 2009-08-02 14:32 UTC (permalink / raw) To: Ingo Molnar Cc: Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Chris Mason, Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin On Sun, 2009-08-02 at 16:21 +0200, Ingo Molnar wrote: > * Frederic Weisbecker <fweisbec@gmail.com> wrote: > > > > e.g. running all the file system stress tests from LTP would be > > > a good idea, ideally multiple at a time on a multi processor > > > system on a ram disk or perhaps AIM9. > > > > Yeah good idea. But again, I fear my laptop hasn't enough memory > > to support big enough ramdisks mount points to host selftests. > > Well, dont waste too much time on it (beyond the due diligence > level) - Andi forgot that the right way to stress-test patches is to > get through the review process and then through the integration > trees which have far more test exposure than any single contributor > can test. > > Patch submitters cannot possibly test every crazy possibility that > is out there - nor should they: it just doesnt scale. What we expect > people to do is to write clean patches, to test the bits on their > own boxes and submit them to lkml and address specific review > feedback. Is this the case for patches against drivers which the submitter doesn't have or can't get a hold of? Daniel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2 2009-08-02 14:21 ` Ingo Molnar 2009-08-02 14:32 ` Daniel Walker @ 2009-08-02 23:41 ` Frank Ch. Eigler 2009-08-03 6:09 ` Ingo Molnar 2009-08-03 5:04 ` Roland Dreier 2009-08-05 21:59 ` [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker 3 siblings, 1 reply; 14+ messages in thread From: Frank Ch. Eigler @ 2009-08-02 23:41 UTC (permalink / raw) To: Ingo Molnar Cc: Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Chris Mason, Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin Ingo Molnar <mingo@elte.hu> writes: >> Yeah good idea. But again, I fear my laptop hasn't enough memory >> to support big enough ramdisks mount points to host selftests. > > Well, dont waste too much time on it (beyond the due diligence > level) - Andi forgot that the right way to stress-test patches is to > get through the review process and then through the integration > trees which have far more test exposure than any single contributor > can test. What guideline can you offer as to what is "due diligence" level of stress testing, as compared to delegating this task to eyeballed reviews + incidental use on the integration trees? - FChE ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2 2009-08-02 23:41 ` Frank Ch. Eigler @ 2009-08-03 6:09 ` Ingo Molnar 2009-08-05 22:10 ` Frederic Weisbecker 0 siblings, 1 reply; 14+ messages in thread From: Ingo Molnar @ 2009-08-03 6:09 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Chris Mason, Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin * Frank Ch. Eigler <fche@redhat.com> wrote: > Ingo Molnar <mingo@elte.hu> writes: > > >> Yeah good idea. But again, I fear my laptop hasn't enough > >> memory to support big enough ramdisks mount points to host > >> selftests. > > > > Well, dont waste too much time on it (beyond the due diligence > > level) - Andi forgot that the right way to stress-test patches > > is to get through the review process and then through the > > integration trees which have far more test exposure than any > > single contributor can test. > > What guideline can you offer as to what is "due diligence" level > of stress testing, as compared to delegating this task to > eyeballed reviews + incidental use on the integration trees? The kind of testing the VFS tree itself gets is a good starting point i suspect - and it is a far more critical tree as it can affects all filesystems. AFAICS the VFS tree relies on linux-next and -mm for testing mostly and that's a good model IMO. Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2 2009-08-03 6:09 ` Ingo Molnar @ 2009-08-05 22:10 ` Frederic Weisbecker 0 siblings, 0 replies; 14+ messages in thread From: Frederic Weisbecker @ 2009-08-05 22:10 UTC (permalink / raw) To: Ingo Molnar, Al Viro Cc: Frank Ch. Eigler, Andi Kleen, LKML, Jeff Mahoney, Chris Mason, Alexander Beregalov, Bron Gondwana, Reiserfs, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin On Mon, Aug 03, 2009 at 08:09:44AM +0200, Ingo Molnar wrote: > > * Frank Ch. Eigler <fche@redhat.com> wrote: > > > Ingo Molnar <mingo@elte.hu> writes: > > > > >> Yeah good idea. But again, I fear my laptop hasn't enough > > >> memory to support big enough ramdisks mount points to host > > >> selftests. > > > > > > Well, dont waste too much time on it (beyond the due diligence > > > level) - Andi forgot that the right way to stress-test patches > > > is to get through the review process and then through the > > > integration trees which have far more test exposure than any > > > single contributor can test. > > > > What guideline can you offer as to what is "due diligence" level > > of stress testing, as compared to delegating this task to > > eyeballed reviews + incidental use on the integration trees? > > The kind of testing the VFS tree itself gets is a good starting > point i suspect - and it is a far more critical tree as it can > affects all filesystems. AFAICS the VFS tree relies on linux-next > and -mm for testing mostly and that's a good model IMO. > I've asked Al Viro to integrate this branch in the VFS tree. He wasn't opposed to that. Then I sent him the individual patches for integration in VFS but hadn't any answer since then. I've kept him cc'ed for any further patches too. If Al still agrees about this integration, I would be pleased. Frederic. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2 2009-08-02 14:21 ` Ingo Molnar 2009-08-02 14:32 ` Daniel Walker 2009-08-02 23:41 ` Frank Ch. Eigler @ 2009-08-03 5:04 ` Roland Dreier 2009-08-03 13:26 ` Chris Mason 2009-08-05 21:59 ` [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker 3 siblings, 1 reply; 14+ messages in thread From: Roland Dreier @ 2009-08-03 5:04 UTC (permalink / raw) To: Ingo Molnar Cc: Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Chris Mason, Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin > Well, dont waste too much time on it (beyond the due diligence > level) - Andi forgot that the right way to stress-test patches is to > get through the review process and then through the integration > trees which have far more test exposure than any single contributor > can test. > > Patch submitters cannot possibly test every crazy possibility that > is out there - nor should they: it just doesnt scale. What we expect > people to do is to write clean patches, to test the bits on their > own boxes and submit them to lkml and address specific review > feedback. I respectfully disagree in this case. For patches that touch, say, something hardware dependent where the patch submitter doesn't have all the variations on the hardware, yes, I agree, scale the testing by running the code on many machines. But for the code in question, where some very fundamental and complex changes are being made to filesystem locking, I don't think that testing really scales -- after all, if there is some race then it's quite likely that testers will just see some rare filesystem corruption, which could easily waste weeks of debugging before the BKL/reiserfs patches were even implicated. - R. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2 2009-08-03 5:04 ` Roland Dreier @ 2009-08-03 13:26 ` Chris Mason 2009-08-03 13:56 ` Ingo Molnar ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Chris Mason @ 2009-08-03 13:26 UTC (permalink / raw) To: Roland Dreier Cc: Ingo Molnar, Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin On Sun, Aug 02, 2009 at 10:04:40PM -0700, Roland Dreier wrote: > > > Well, dont waste too much time on it (beyond the due diligence > > level) - Andi forgot that the right way to stress-test patches is to > > get through the review process and then through the integration > > trees which have far more test exposure than any single contributor > > can test. > > > > Patch submitters cannot possibly test every crazy possibility that > > is out there - nor should they: it just doesnt scale. What we expect > > people to do is to write clean patches, to test the bits on their > > own boxes and submit them to lkml and address specific review > > feedback. > > I respectfully disagree in this case. For patches that touch, say, > something hardware dependent where the patch submitter doesn't have all > the variations on the hardware, yes, I agree, scale the testing by > running the code on many machines. But for the code in question, where > some very fundamental and complex changes are being made to filesystem > locking, I don't think that testing really scales -- after all, if there > is some race then it's quite likely that testers will just see some rare > filesystem corruption, which could easily waste weeks of debugging > before the BKL/reiserfs patches were even implicated. Definitely, the cost of the rare bug is much higher. The good news is that reiserfs tends to pile its races into a few spots. Most of them can be found with a 12 hour run of the namesys stress.sh program and a lot of memory pressure. I'd compile with preemption on and you'll have a good test on any SMP machine. http://oss.oracle.com/~mason/stress.sh stress.sh just copies a source directory into the test filesystem, then reads it back and deletes it in a loop. I'd run with 50 procs and enough memory pressure for the box to lightly swap (booting w/mem= is a fine way to make memory pressure). This way you make sure to hammer on the metadata writeback paths, which is where all of the difficult races come in. Testing with an fsx-linux process running at the same time will make sure all of the mmap/truncate paths are working correctly as well. -chris ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2 2009-08-03 13:26 ` Chris Mason @ 2009-08-03 13:56 ` Ingo Molnar 2009-08-05 22:13 ` Frederic Weisbecker 2009-08-09 23:27 ` [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment Frederic Weisbecker 2 siblings, 0 replies; 14+ messages in thread From: Ingo Molnar @ 2009-08-03 13:56 UTC (permalink / raw) To: Chris Mason, Roland Dreier, Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin * Chris Mason <chris.mason@oracle.com> wrote: > Definitely, the cost of the rare bug is much higher. The good > news is that reiserfs tends to pile its races into a few spots. > Most of them can be found with a 12 hour run of the namesys > stress.sh program and a lot of memory pressure. I'd compile with > preemption on and you'll have a good test on any SMP machine. > > http://oss.oracle.com/~mason/stress.sh > > stress.sh just copies a source directory into the test filesystem, > then reads it back and deletes it in a loop. I'd run with 50 > procs and enough memory pressure for the box to lightly swap > (booting w/mem= is a fine way to make memory pressure). This way > you make sure to hammer on the metadata writeback paths, which is > where all of the difficult races come in. > > Testing with an fsx-linux process running at the same time will > make sure all of the mmap/truncate paths are working correctly as > well. Ok, that definitely looks like something feasible to run for Frederic too, thanks for the detailed suggestion! Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2 2009-08-03 13:26 ` Chris Mason 2009-08-03 13:56 ` Ingo Molnar @ 2009-08-05 22:13 ` Frederic Weisbecker 2009-08-09 23:27 ` [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment Frederic Weisbecker 2 siblings, 0 replies; 14+ messages in thread From: Frederic Weisbecker @ 2009-08-05 22:13 UTC (permalink / raw) To: Chris Mason, Roland Dreier, Ingo Molnar, Andi Kleen, LKML, Jeff Mahoney, Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin On Mon, Aug 03, 2009 at 09:26:59AM -0400, Chris Mason wrote: > On Sun, Aug 02, 2009 at 10:04:40PM -0700, Roland Dreier wrote: > > > > > Well, dont waste too much time on it (beyond the due diligence > > > level) - Andi forgot that the right way to stress-test patches is to > > > get through the review process and then through the integration > > > trees which have far more test exposure than any single contributor > > > can test. > > > > > > Patch submitters cannot possibly test every crazy possibility that > > > is out there - nor should they: it just doesnt scale. What we expect > > > people to do is to write clean patches, to test the bits on their > > > own boxes and submit them to lkml and address specific review > > > feedback. > > > > I respectfully disagree in this case. For patches that touch, say, > > something hardware dependent where the patch submitter doesn't have all > > the variations on the hardware, yes, I agree, scale the testing by > > running the code on many machines. But for the code in question, where > > some very fundamental and complex changes are being made to filesystem > > locking, I don't think that testing really scales -- after all, if there > > is some race then it's quite likely that testers will just see some rare > > filesystem corruption, which could easily waste weeks of debugging > > before the BKL/reiserfs patches were even implicated. > > Definitely, the cost of the rare bug is much higher. The good news is > that reiserfs tends to pile its races into a few spots. Most of them > can be found with a 12 hour run of the namesys stress.sh program and a > lot of memory pressure. I'd compile with preemption on and you'll have > a good test on any SMP machine. > > http://oss.oracle.com/~mason/stress.sh > > stress.sh just copies a source directory into the test filesystem, then > reads it back and deletes it in a loop. I'd run with 50 procs and > enough memory pressure for the box to lightly swap (booting w/mem= is a > fine way to make memory pressure). This way you make sure to hammer on > the metadata writeback paths, which is where all of the difficult races > come in. > > Testing with an fsx-linux process running at the same time will make > sure all of the mmap/truncate paths are working correctly as well. > > -chris Thanks a lot for this script Chris, I'm going to test with that. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment 2009-08-03 13:26 ` Chris Mason 2009-08-03 13:56 ` Ingo Molnar 2009-08-05 22:13 ` Frederic Weisbecker @ 2009-08-09 23:27 ` Frederic Weisbecker 2 siblings, 0 replies; 14+ messages in thread From: Frederic Weisbecker @ 2009-08-09 23:27 UTC (permalink / raw) To: Chris Mason, Roland Dreier, Ingo Molnar, Andi Kleen, LKML, Jeff Mahoney, Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin, Laurent Riffard On Mon, Aug 03, 2009 at 09:26:59AM -0400, Chris Mason wrote: > Definitely, the cost of the rare bug is much higher. The good news is > that reiserfs tends to pile its races into a few spots. Most of them > can be found with a 12 hour run of the namesys stress.sh program and a > lot of memory pressure. I'd compile with preemption on and you'll have > a good test on any SMP machine. > > http://oss.oracle.com/~mason/stress.sh > > stress.sh just copies a source directory into the test filesystem, then > reads it back and deletes it in a loop. I'd run with 50 procs and > enough memory pressure for the box to lightly swap (booting w/mem= is a > fine way to make memory pressure). This way you make sure to hammer on > the metadata writeback paths, which is where all of the difficult races > come in. > > Testing with an fsx-linux process running at the same time will make > sure all of the mmap/truncate paths are working correctly as well. > > -chris > Running this script has unearthed a bug introduced in my last commit. This is fixed in the patch below. Thanks for this script, I'm now running it very often, only on PREEMPT UP for now. --- >From a22c48509ca7b54206c0616141278e5561f119ef Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker <fweisbec@gmail.com> Date: Mon, 10 Aug 2009 00:53:45 +0200 Subject: [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment The previous commit: "kill-the-bkl/reiserfs: release the lock only for first entry in readdir" brought a bug which increments the readdir offset even if we failed to copy a directory entry through filldir. Then if we are in the end of the user buffer, there are chances that getdents() will be subsequently called with a new buffer to continue fetching the directory. At this time the directory entry offset will be wrong because it has omitted the previous entry that failed to copy. We need to increment the directory offset after fetching an entry, not before. This fixes weird bugs in which a directory seems not empty whereas it is. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> --- fs/reiserfs/dir.c | 21 +++++++++++---------- 1 files changed, 11 insertions(+), 10 deletions(-) diff --git a/fs/reiserfs/dir.c b/fs/reiserfs/dir.c index d6fb8d3..d4477eb 100644 --- a/fs/reiserfs/dir.c +++ b/fs/reiserfs/dir.c @@ -195,12 +195,6 @@ int reiserfs_readdir_dentry(struct dentry *dentry, void *dirent, *pos = d_off; d_ino = deh_objectid(deh); - /* - * next entry should be looked for with such - * offset - */ - next_pos = deh_offset(deh) + 1; - if (first_entry) { int fillret; @@ -221,11 +215,18 @@ int reiserfs_readdir_dentry(struct dentry *dentry, void *dirent, if (item_moved(&tmp_ih, &path_to_entry)) goto research; - continue; - } - if (filldir(dirent, d_name, d_reclen, d_off, - d_ino, DT_UNKNOWN) < 0) + } else { + if (filldir(dirent, d_name, d_reclen, + d_off, d_ino, DT_UNKNOWN) < 0) goto end; + } + + /* + * next entry should be looked for with such + * offset + */ + next_pos = deh_offset(deh) + 1; + } /* for */ } -- 1.6.2.3 You can find this patch and the other in this series in the following git tree: git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git reiserfs/kill-bkl Thanks. ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2 2009-08-02 14:21 ` Ingo Molnar ` (2 preceding siblings ...) 2009-08-03 5:04 ` Roland Dreier @ 2009-08-05 21:59 ` Frederic Weisbecker 3 siblings, 0 replies; 14+ messages in thread From: Frederic Weisbecker @ 2009-08-05 21:59 UTC (permalink / raw) To: Ingo Molnar Cc: Andi Kleen, LKML, Jeff Mahoney, Chris Mason, Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin On Sun, Aug 02, 2009 at 04:21:00PM +0200, Ingo Molnar wrote: > > * Frederic Weisbecker <fweisbec@gmail.com> wrote: > > > > e.g. running all the file system stress tests from LTP would be > > > a good idea, ideally multiple at a time on a multi processor > > > system on a ram disk or perhaps AIM9. > > > > Yeah good idea. But again, I fear my laptop hasn't enough memory > > to support big enough ramdisks mount points to host selftests. > > Well, dont waste too much time on it (beyond the due diligence > level) - Andi forgot that the right way to stress-test patches is to > get through the review process and then through the integration > trees which have far more test exposure than any single contributor > can test. > > Patch submitters cannot possibly test every crazy possibility that > is out there - nor should they: it just doesnt scale. What we expect > people to do is to write clean patches, to test the bits on their > own boxes and submit them to lkml and address specific review > feedback. > > Ingo Yeah, I know that doesn't scale. But I really would like to test it, at least with the minimum scope I have. Such fundamental change in a filesystem is scary :) ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2009-08-09 23:27 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-07-31 17:46 [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker 2009-08-01 8:11 ` Andi Kleen 2009-08-01 15:53 ` Frederic Weisbecker 2009-08-02 14:21 ` Ingo Molnar 2009-08-02 14:32 ` Daniel Walker 2009-08-02 23:41 ` Frank Ch. Eigler 2009-08-03 6:09 ` Ingo Molnar 2009-08-05 22:10 ` Frederic Weisbecker 2009-08-03 5:04 ` Roland Dreier 2009-08-03 13:26 ` Chris Mason 2009-08-03 13:56 ` Ingo Molnar 2009-08-05 22:13 ` Frederic Weisbecker 2009-08-09 23:27 ` [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment Frederic Weisbecker 2009-08-05 21:59 ` [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox