* [ANNOUNCE] Reiserfs/kill-bkl tree v2
@ 2009-07-31 17:46 Frederic Weisbecker
2009-08-01 8:11 ` Andi Kleen
0 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2009-07-31 17:46 UTC (permalink / raw)
To: LKML
Cc: Jeff Mahoney, Chris Mason, Ingo Molnar, Alexander Beregalov,
Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini,
Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Andi Kleen,
Marcel Hilzinger, Edward Shishkin
Hi everyone,
I'm pleased to announce the v2 of the reiserfs/kill-bkl tree.
(Has there been a v1 actually...I'm not sure).
This work was first borned and hosted in the tip:kill-the-bkl tree
and has then been detached as a seperate branch.
This patchset consists in dropping the bkl locking scheme from
reiserfs 3 and replacing it with a per superblock mutex.
I) Deal with the BKL scheme
The first obstacle was to deal with the bkl based locking scheme
in which the whole reiserfs code is based on:
Bkl behaviour:
- disables preemption
- is relaxed while scheduling
- can be acquired recursively by a task
The resulting reiserfs code:
- some callsites acquire the lock, sometimes recursively. In
the latter case, it's often hard to fix
- after every calls to functions that might sleep, reiserfs
performs checks to ensure the tree hasn't changed and compute
fixups in the latter case.
These properties have resulted in the creation of an ad-hoc locking
primitive based on a mutex but that can be acquired recursively.
Also most might-sleep-callsites have been explicitly surrounded
with a relax of the lock.
II) Deal with performance regressions
The bkl is based on a spinlock whereas the new lock is based
on a mutex. We couldn't safely make it a spinlock because the
code locked by the bkl can sleep, and such conversion would
have needed a lot of rewrites.
There are a lot of reasons that can make a spinlock more efficient
than a mutex.
But still we have now two nice properties:
- the mutexes have the spin on owner features, making them closer to a
spinlock behaviour.
- the bkl is *forced* to be relaxed on schedule(). And sometimes this
is a weakness. After a simple kmalloc, we have to check the filesystem
hasn't changed behind us and to fixup in that case. That can be very
costly. Sometimes this is something we want, sometimes not. At least
with a mutex, we can choose.
III) Benchmarks
Comparisons have been made using dbench between vanilla throughput
(bkl based) and the head of the reiserfs/kill-bkl tree (mutex based).
Both kernel had the same config (CONFIG_PREEMPT=n, CONFIG_SMP=y)
- Dbench with 1 thread during 600 seconds (better with the mutex):
Lock Throughput in the end
Bkl 232.54 MB/sec
Mutex 237.71 MB/sec
Complete trace:
Bkl: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/bkl-600-1.log
Mutex: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/mut-600-1.log
Graphical comparison:
http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dbench-1.pdf
- Dbench with 30 threads during 600 seconds (better with the bkl):
Lock Throughput in the end
Bkl 92.41 MB/sec
Mutex 82.25 MB/sec
Complete trace:
Bkl: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/bkl-600-30.log
Mutex: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/mut-600-30.log
Graphical comparison:
http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dbench-30.pdf
- Dbench with 100 threads during 600 seconds (better with the mutex):
Lock Throughput in the end
Bkl 37.89 MB/sec
Mutex 40.58 MB/sec
Complete trace:
Bkl: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/bkl-600-100.log
Mutex: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/mut-600-100.log
Graphical comparison:
http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dbench-100.pdf
- Dbench with two threads, writing on a seperate partition simultaneoulsy,
during 600 seconds (better with the mutex):
Lock Thread #1 Thread #2
Bkl 199.95 MB/sec 186.16 MB/sec
Mutex 213.91 MB/sec 203.84 MB/sec
Complete trace:
Bkl, thread #1: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dual-bkl-600-1.log
Bkl, thread #2: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dual2-bkl-600-1.log
Mutex, thread #1: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dual-mut-600-1.log
Mutex, thread #2: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dual2-mut-600-1.log
Graphical comparison:
http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dbench-dual.pdf
IV) Testing and review
You can fetch the git tree, this one will keep being the most up to date:
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git reiserfs/kill-bkl
Or you can either apply the raw diff:
http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/reis_full.diff
Tests/reviews/any kind of contributions are very welcome!
Thanks,
Frederic.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
2009-07-31 17:46 [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker
@ 2009-08-01 8:11 ` Andi Kleen
2009-08-01 15:53 ` Frederic Weisbecker
0 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2009-08-01 8:11 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Jeff Mahoney, Chris Mason, Ingo Molnar, Alexander Beregalov,
Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini,
Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Andi Kleen,
Marcel Hilzinger, Edward Shishkin
On Fri, Jul 31, 2009 at 07:46:43PM +0200, Frederic Weisbecker wrote:
> IV) Testing and review
You seem to have done a lot of benchmarks, but did you also do stress testing?
While a performance issue wouldn't be good, corrupting someone's file system
would be really bad.
e.g. running all the file system stress tests from LTP would be a good
idea, ideally multiple at a time on a multi processor system on a ram
disk or perhaps AIM9.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
2009-08-01 8:11 ` Andi Kleen
@ 2009-08-01 15:53 ` Frederic Weisbecker
2009-08-02 14:21 ` Ingo Molnar
0 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2009-08-01 15:53 UTC (permalink / raw)
To: Andi Kleen
Cc: LKML, Jeff Mahoney, Chris Mason, Ingo Molnar, Alexander Beregalov,
Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini,
Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani,
Marcel Hilzinger, Edward Shishkin
On Sat, Aug 01, 2009 at 10:11:41AM +0200, Andi Kleen wrote:
> On Fri, Jul 31, 2009 at 07:46:43PM +0200, Frederic Weisbecker wrote:
> > IV) Testing and review
>
> You seem to have done a lot of benchmarks, but did you also do stress testing?
>
> While a performance issue wouldn't be good, corrupting someone's file system
> would be really bad.
Heh, your are fairly right.
I've been focusing so much on performances regressions, that I forgot
this step.
Well, actually no I haven't forgot about it, but...
The problem is that the only SMP box I have is my personal laptop.
It has been sufficient to perform the dbench tests I've posted, but I could
do more (and more finegrained) tests with a separate box.
Then, indeed I could launch some stress tests in my laptop, but stress
tests should be run in another way than throughput tests, ideally
stress tests should run this tree for long hours (no, ideally: all the time),
whereas throughput tests can make it with punctual tests (though still,
it would be nicer to do more and more finegrained).
I have UP test boxes however, I will do long hours stress tests on it
next week but I fear that UP, even with CONFIG_PREEMPT can't open
enough racy windows to unearth the real wicked bugs that are waiting.
That said, and as you can guess while reading between the aboves lines,
one of my secret hopes while posting this announce was to get people
to test it, why not with stress tests, using better hardware
ressources than I have :-)
> e.g. running all the file system stress tests from LTP would be a good
> idea, ideally multiple at a time on a multi processor system on a ram
> disk or perhaps AIM9.
Yeah good idea. But again, I fear my laptop hasn't enough memory
to support big enough ramdisks mount points to host selftests.
Anyway, I'll launch stress tests on one of my UP boxes next week,
and also some in my laptop.
Thanks!
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
2009-08-01 15:53 ` Frederic Weisbecker
@ 2009-08-02 14:21 ` Ingo Molnar
2009-08-02 14:32 ` Daniel Walker
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Ingo Molnar @ 2009-08-02 14:21 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Andi Kleen, LKML, Jeff Mahoney, Chris Mason, Alexander Beregalov,
Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini,
Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani,
Marcel Hilzinger, Edward Shishkin
* Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > e.g. running all the file system stress tests from LTP would be
> > a good idea, ideally multiple at a time on a multi processor
> > system on a ram disk or perhaps AIM9.
>
> Yeah good idea. But again, I fear my laptop hasn't enough memory
> to support big enough ramdisks mount points to host selftests.
Well, dont waste too much time on it (beyond the due diligence
level) - Andi forgot that the right way to stress-test patches is to
get through the review process and then through the integration
trees which have far more test exposure than any single contributor
can test.
Patch submitters cannot possibly test every crazy possibility that
is out there - nor should they: it just doesnt scale. What we expect
people to do is to write clean patches, to test the bits on their
own boxes and submit them to lkml and address specific review
feedback.
Ingo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
2009-08-02 14:21 ` Ingo Molnar
@ 2009-08-02 14:32 ` Daniel Walker
2009-08-02 23:41 ` Frank Ch. Eigler
` (2 subsequent siblings)
3 siblings, 0 replies; 14+ messages in thread
From: Daniel Walker @ 2009-08-02 14:32 UTC (permalink / raw)
To: Ingo Molnar
Cc: Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Chris Mason,
Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro,
Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin
On Sun, 2009-08-02 at 16:21 +0200, Ingo Molnar wrote:
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> > > e.g. running all the file system stress tests from LTP would be
> > > a good idea, ideally multiple at a time on a multi processor
> > > system on a ram disk or perhaps AIM9.
> >
> > Yeah good idea. But again, I fear my laptop hasn't enough memory
> > to support big enough ramdisks mount points to host selftests.
>
> Well, dont waste too much time on it (beyond the due diligence
> level) - Andi forgot that the right way to stress-test patches is to
> get through the review process and then through the integration
> trees which have far more test exposure than any single contributor
> can test.
>
> Patch submitters cannot possibly test every crazy possibility that
> is out there - nor should they: it just doesnt scale. What we expect
> people to do is to write clean patches, to test the bits on their
> own boxes and submit them to lkml and address specific review
> feedback.
Is this the case for patches against drivers which the submitter doesn't
have or can't get a hold of?
Daniel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
2009-08-02 14:21 ` Ingo Molnar
2009-08-02 14:32 ` Daniel Walker
@ 2009-08-02 23:41 ` Frank Ch. Eigler
2009-08-03 6:09 ` Ingo Molnar
2009-08-03 5:04 ` Roland Dreier
2009-08-05 21:59 ` [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker
3 siblings, 1 reply; 14+ messages in thread
From: Frank Ch. Eigler @ 2009-08-02 23:41 UTC (permalink / raw)
To: Ingo Molnar
Cc: Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Chris Mason,
Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro,
Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin
Ingo Molnar <mingo@elte.hu> writes:
>> Yeah good idea. But again, I fear my laptop hasn't enough memory
>> to support big enough ramdisks mount points to host selftests.
>
> Well, dont waste too much time on it (beyond the due diligence
> level) - Andi forgot that the right way to stress-test patches is to
> get through the review process and then through the integration
> trees which have far more test exposure than any single contributor
> can test.
What guideline can you offer as to what is "due diligence" level of
stress testing, as compared to delegating this task to eyeballed
reviews + incidental use on the integration trees?
- FChE
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
2009-08-02 14:21 ` Ingo Molnar
2009-08-02 14:32 ` Daniel Walker
2009-08-02 23:41 ` Frank Ch. Eigler
@ 2009-08-03 5:04 ` Roland Dreier
2009-08-03 13:26 ` Chris Mason
2009-08-05 21:59 ` [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker
3 siblings, 1 reply; 14+ messages in thread
From: Roland Dreier @ 2009-08-03 5:04 UTC (permalink / raw)
To: Ingo Molnar
Cc: Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Chris Mason,
Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro,
Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin
> Well, dont waste too much time on it (beyond the due diligence
> level) - Andi forgot that the right way to stress-test patches is to
> get through the review process and then through the integration
> trees which have far more test exposure than any single contributor
> can test.
>
> Patch submitters cannot possibly test every crazy possibility that
> is out there - nor should they: it just doesnt scale. What we expect
> people to do is to write clean patches, to test the bits on their
> own boxes and submit them to lkml and address specific review
> feedback.
I respectfully disagree in this case. For patches that touch, say,
something hardware dependent where the patch submitter doesn't have all
the variations on the hardware, yes, I agree, scale the testing by
running the code on many machines. But for the code in question, where
some very fundamental and complex changes are being made to filesystem
locking, I don't think that testing really scales -- after all, if there
is some race then it's quite likely that testers will just see some rare
filesystem corruption, which could easily waste weeks of debugging
before the BKL/reiserfs patches were even implicated.
- R.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
2009-08-02 23:41 ` Frank Ch. Eigler
@ 2009-08-03 6:09 ` Ingo Molnar
2009-08-05 22:10 ` Frederic Weisbecker
0 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2009-08-03 6:09 UTC (permalink / raw)
To: Frank Ch. Eigler
Cc: Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Chris Mason,
Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro,
Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin
* Frank Ch. Eigler <fche@redhat.com> wrote:
> Ingo Molnar <mingo@elte.hu> writes:
>
> >> Yeah good idea. But again, I fear my laptop hasn't enough
> >> memory to support big enough ramdisks mount points to host
> >> selftests.
> >
> > Well, dont waste too much time on it (beyond the due diligence
> > level) - Andi forgot that the right way to stress-test patches
> > is to get through the review process and then through the
> > integration trees which have far more test exposure than any
> > single contributor can test.
>
> What guideline can you offer as to what is "due diligence" level
> of stress testing, as compared to delegating this task to
> eyeballed reviews + incidental use on the integration trees?
The kind of testing the VFS tree itself gets is a good starting
point i suspect - and it is a far more critical tree as it can
affects all filesystems. AFAICS the VFS tree relies on linux-next
and -mm for testing mostly and that's a good model IMO.
Ingo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
2009-08-03 5:04 ` Roland Dreier
@ 2009-08-03 13:26 ` Chris Mason
2009-08-03 13:56 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Chris Mason @ 2009-08-03 13:26 UTC (permalink / raw)
To: Roland Dreier
Cc: Ingo Molnar, Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney,
Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro,
Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin
On Sun, Aug 02, 2009 at 10:04:40PM -0700, Roland Dreier wrote:
>
> > Well, dont waste too much time on it (beyond the due diligence
> > level) - Andi forgot that the right way to stress-test patches is to
> > get through the review process and then through the integration
> > trees which have far more test exposure than any single contributor
> > can test.
> >
> > Patch submitters cannot possibly test every crazy possibility that
> > is out there - nor should they: it just doesnt scale. What we expect
> > people to do is to write clean patches, to test the bits on their
> > own boxes and submit them to lkml and address specific review
> > feedback.
>
> I respectfully disagree in this case. For patches that touch, say,
> something hardware dependent where the patch submitter doesn't have all
> the variations on the hardware, yes, I agree, scale the testing by
> running the code on many machines. But for the code in question, where
> some very fundamental and complex changes are being made to filesystem
> locking, I don't think that testing really scales -- after all, if there
> is some race then it's quite likely that testers will just see some rare
> filesystem corruption, which could easily waste weeks of debugging
> before the BKL/reiserfs patches were even implicated.
Definitely, the cost of the rare bug is much higher. The good news is
that reiserfs tends to pile its races into a few spots. Most of them
can be found with a 12 hour run of the namesys stress.sh program and a
lot of memory pressure. I'd compile with preemption on and you'll have
a good test on any SMP machine.
http://oss.oracle.com/~mason/stress.sh
stress.sh just copies a source directory into the test filesystem, then
reads it back and deletes it in a loop. I'd run with 50 procs and
enough memory pressure for the box to lightly swap (booting w/mem= is a
fine way to make memory pressure). This way you make sure to hammer on
the metadata writeback paths, which is where all of the difficult races
come in.
Testing with an fsx-linux process running at the same time will make
sure all of the mmap/truncate paths are working correctly as well.
-chris
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
2009-08-03 13:26 ` Chris Mason
@ 2009-08-03 13:56 ` Ingo Molnar
2009-08-05 22:13 ` Frederic Weisbecker
2009-08-09 23:27 ` [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment Frederic Weisbecker
2 siblings, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2009-08-03 13:56 UTC (permalink / raw)
To: Chris Mason, Roland Dreier, Frederic Weisbecker, Andi Kleen, LKML,
Jeff Mahoney, Alexander Beregalov, Bron Gondwana, Reiserfs,
Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin
* Chris Mason <chris.mason@oracle.com> wrote:
> Definitely, the cost of the rare bug is much higher. The good
> news is that reiserfs tends to pile its races into a few spots.
> Most of them can be found with a 12 hour run of the namesys
> stress.sh program and a lot of memory pressure. I'd compile with
> preemption on and you'll have a good test on any SMP machine.
>
> http://oss.oracle.com/~mason/stress.sh
>
> stress.sh just copies a source directory into the test filesystem,
> then reads it back and deletes it in a loop. I'd run with 50
> procs and enough memory pressure for the box to lightly swap
> (booting w/mem= is a fine way to make memory pressure). This way
> you make sure to hammer on the metadata writeback paths, which is
> where all of the difficult races come in.
>
> Testing with an fsx-linux process running at the same time will
> make sure all of the mmap/truncate paths are working correctly as
> well.
Ok, that definitely looks like something feasible to run for
Frederic too, thanks for the detailed suggestion!
Ingo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
2009-08-02 14:21 ` Ingo Molnar
` (2 preceding siblings ...)
2009-08-03 5:04 ` Roland Dreier
@ 2009-08-05 21:59 ` Frederic Weisbecker
3 siblings, 0 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2009-08-05 21:59 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andi Kleen, LKML, Jeff Mahoney, Chris Mason, Alexander Beregalov,
Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini,
Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani,
Marcel Hilzinger, Edward Shishkin
On Sun, Aug 02, 2009 at 04:21:00PM +0200, Ingo Molnar wrote:
>
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> > > e.g. running all the file system stress tests from LTP would be
> > > a good idea, ideally multiple at a time on a multi processor
> > > system on a ram disk or perhaps AIM9.
> >
> > Yeah good idea. But again, I fear my laptop hasn't enough memory
> > to support big enough ramdisks mount points to host selftests.
>
> Well, dont waste too much time on it (beyond the due diligence
> level) - Andi forgot that the right way to stress-test patches is to
> get through the review process and then through the integration
> trees which have far more test exposure than any single contributor
> can test.
>
> Patch submitters cannot possibly test every crazy possibility that
> is out there - nor should they: it just doesnt scale. What we expect
> people to do is to write clean patches, to test the bits on their
> own boxes and submit them to lkml and address specific review
> feedback.
>
> Ingo
Yeah, I know that doesn't scale. But I really would like to test it, at
least with the minimum scope I have.
Such fundamental change in a filesystem is scary :)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
2009-08-03 6:09 ` Ingo Molnar
@ 2009-08-05 22:10 ` Frederic Weisbecker
0 siblings, 0 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2009-08-05 22:10 UTC (permalink / raw)
To: Ingo Molnar, Al Viro
Cc: Frank Ch. Eigler, Andi Kleen, LKML, Jeff Mahoney, Chris Mason,
Alexander Beregalov, Bron Gondwana, Reiserfs, Andrea Gelmini,
Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani,
Marcel Hilzinger, Edward Shishkin
On Mon, Aug 03, 2009 at 08:09:44AM +0200, Ingo Molnar wrote:
>
> * Frank Ch. Eigler <fche@redhat.com> wrote:
>
> > Ingo Molnar <mingo@elte.hu> writes:
> >
> > >> Yeah good idea. But again, I fear my laptop hasn't enough
> > >> memory to support big enough ramdisks mount points to host
> > >> selftests.
> > >
> > > Well, dont waste too much time on it (beyond the due diligence
> > > level) - Andi forgot that the right way to stress-test patches
> > > is to get through the review process and then through the
> > > integration trees which have far more test exposure than any
> > > single contributor can test.
> >
> > What guideline can you offer as to what is "due diligence" level
> > of stress testing, as compared to delegating this task to
> > eyeballed reviews + incidental use on the integration trees?
>
> The kind of testing the VFS tree itself gets is a good starting
> point i suspect - and it is a far more critical tree as it can
> affects all filesystems. AFAICS the VFS tree relies on linux-next
> and -mm for testing mostly and that's a good model IMO.
>
I've asked Al Viro to integrate this branch in the VFS tree. He wasn't
opposed to that. Then I sent him the individual patches for integration
in VFS but hadn't any answer since then.
I've kept him cc'ed for any further patches too.
If Al still agrees about this integration, I would be pleased.
Frederic.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
2009-08-03 13:26 ` Chris Mason
2009-08-03 13:56 ` Ingo Molnar
@ 2009-08-05 22:13 ` Frederic Weisbecker
2009-08-09 23:27 ` [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment Frederic Weisbecker
2 siblings, 0 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2009-08-05 22:13 UTC (permalink / raw)
To: Chris Mason, Roland Dreier, Ingo Molnar, Andi Kleen, LKML,
Jeff Mahoney, Alexander Beregalov, Bron Gondwana, Reiserfs,
Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin
On Mon, Aug 03, 2009 at 09:26:59AM -0400, Chris Mason wrote:
> On Sun, Aug 02, 2009 at 10:04:40PM -0700, Roland Dreier wrote:
> >
> > > Well, dont waste too much time on it (beyond the due diligence
> > > level) - Andi forgot that the right way to stress-test patches is to
> > > get through the review process and then through the integration
> > > trees which have far more test exposure than any single contributor
> > > can test.
> > >
> > > Patch submitters cannot possibly test every crazy possibility that
> > > is out there - nor should they: it just doesnt scale. What we expect
> > > people to do is to write clean patches, to test the bits on their
> > > own boxes and submit them to lkml and address specific review
> > > feedback.
> >
> > I respectfully disagree in this case. For patches that touch, say,
> > something hardware dependent where the patch submitter doesn't have all
> > the variations on the hardware, yes, I agree, scale the testing by
> > running the code on many machines. But for the code in question, where
> > some very fundamental and complex changes are being made to filesystem
> > locking, I don't think that testing really scales -- after all, if there
> > is some race then it's quite likely that testers will just see some rare
> > filesystem corruption, which could easily waste weeks of debugging
> > before the BKL/reiserfs patches were even implicated.
>
> Definitely, the cost of the rare bug is much higher. The good news is
> that reiserfs tends to pile its races into a few spots. Most of them
> can be found with a 12 hour run of the namesys stress.sh program and a
> lot of memory pressure. I'd compile with preemption on and you'll have
> a good test on any SMP machine.
>
> http://oss.oracle.com/~mason/stress.sh
>
> stress.sh just copies a source directory into the test filesystem, then
> reads it back and deletes it in a loop. I'd run with 50 procs and
> enough memory pressure for the box to lightly swap (booting w/mem= is a
> fine way to make memory pressure). This way you make sure to hammer on
> the metadata writeback paths, which is where all of the difficult races
> come in.
>
> Testing with an fsx-linux process running at the same time will make
> sure all of the mmap/truncate paths are working correctly as well.
>
> -chris
Thanks a lot for this script Chris, I'm going to test with that.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment
2009-08-03 13:26 ` Chris Mason
2009-08-03 13:56 ` Ingo Molnar
2009-08-05 22:13 ` Frederic Weisbecker
@ 2009-08-09 23:27 ` Frederic Weisbecker
2 siblings, 0 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2009-08-09 23:27 UTC (permalink / raw)
To: Chris Mason, Roland Dreier, Ingo Molnar, Andi Kleen, LKML,
Jeff Mahoney, Alexander Beregalov, Bron Gondwana, Reiserfs,
Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin,
Laurent Riffard
On Mon, Aug 03, 2009 at 09:26:59AM -0400, Chris Mason wrote:
> Definitely, the cost of the rare bug is much higher. The good news is
> that reiserfs tends to pile its races into a few spots. Most of them
> can be found with a 12 hour run of the namesys stress.sh program and a
> lot of memory pressure. I'd compile with preemption on and you'll have
> a good test on any SMP machine.
>
> http://oss.oracle.com/~mason/stress.sh
>
> stress.sh just copies a source directory into the test filesystem, then
> reads it back and deletes it in a loop. I'd run with 50 procs and
> enough memory pressure for the box to lightly swap (booting w/mem= is a
> fine way to make memory pressure). This way you make sure to hammer on
> the metadata writeback paths, which is where all of the difficult races
> come in.
>
> Testing with an fsx-linux process running at the same time will make
> sure all of the mmap/truncate paths are working correctly as well.
>
> -chris
>
Running this script has unearthed a bug introduced in my last commit.
This is fixed in the patch below.
Thanks for this script, I'm now running it very often, only on PREEMPT UP
for now.
---
>From a22c48509ca7b54206c0616141278e5561f119ef Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <fweisbec@gmail.com>
Date: Mon, 10 Aug 2009 00:53:45 +0200
Subject: [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment
The previous commit:
"kill-the-bkl/reiserfs: release the lock only for first entry in readdir"
brought a bug which increments the readdir offset even if we failed to
copy a directory entry through filldir.
Then if we are in the end of the user buffer, there are chances that
getdents() will be subsequently called with a new buffer to continue
fetching the directory. At this time the directory entry offset will
be wrong because it has omitted the previous entry that failed to copy.
We need to increment the directory offset after fetching an entry, not
before.
This fixes weird bugs in which a directory seems not empty whereas
it is.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
---
fs/reiserfs/dir.c | 21 +++++++++++----------
1 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/fs/reiserfs/dir.c b/fs/reiserfs/dir.c
index d6fb8d3..d4477eb 100644
--- a/fs/reiserfs/dir.c
+++ b/fs/reiserfs/dir.c
@@ -195,12 +195,6 @@ int reiserfs_readdir_dentry(struct dentry *dentry, void *dirent,
*pos = d_off;
d_ino = deh_objectid(deh);
- /*
- * next entry should be looked for with such
- * offset
- */
- next_pos = deh_offset(deh) + 1;
-
if (first_entry) {
int fillret;
@@ -221,11 +215,18 @@ int reiserfs_readdir_dentry(struct dentry *dentry, void *dirent,
if (item_moved(&tmp_ih, &path_to_entry))
goto research;
- continue;
- }
- if (filldir(dirent, d_name, d_reclen, d_off,
- d_ino, DT_UNKNOWN) < 0)
+ } else {
+ if (filldir(dirent, d_name, d_reclen,
+ d_off, d_ino, DT_UNKNOWN) < 0)
goto end;
+ }
+
+ /*
+ * next entry should be looked for with such
+ * offset
+ */
+ next_pos = deh_offset(deh) + 1;
+
} /* for */
}
--
1.6.2.3
You can find this patch and the other in this series in the following git
tree:
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
reiserfs/kill-bkl
Thanks.
^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2009-08-09 23:27 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-31 17:46 [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker
2009-08-01 8:11 ` Andi Kleen
2009-08-01 15:53 ` Frederic Weisbecker
2009-08-02 14:21 ` Ingo Molnar
2009-08-02 14:32 ` Daniel Walker
2009-08-02 23:41 ` Frank Ch. Eigler
2009-08-03 6:09 ` Ingo Molnar
2009-08-05 22:10 ` Frederic Weisbecker
2009-08-03 5:04 ` Roland Dreier
2009-08-03 13:26 ` Chris Mason
2009-08-03 13:56 ` Ingo Molnar
2009-08-05 22:13 ` Frederic Weisbecker
2009-08-09 23:27 ` [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment Frederic Weisbecker
2009-08-05 21:59 ` [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).