[ANNOUNCE] Reiserfs/kill-bkl tree v2

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [ANNOUNCE] Reiserfs/kill-bkl tree v2
@ 2009-07-31 17:46 Frederic Weisbecker
  2009-08-01  8:11 ` Andi Kleen
  0 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2009-07-31 17:46 UTC (permalink / raw)
  To: LKML
  Cc: Jeff Mahoney, Chris Mason, Ingo Molnar, Alexander Beregalov,
	Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini,
	Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Andi Kleen,
	Marcel Hilzinger, Edward Shishkin

Hi everyone,

I'm pleased to announce the v2 of the reiserfs/kill-bkl tree.
(Has there been a v1 actually...I'm not sure).

This work was first borned and hosted in the tip:kill-the-bkl tree
and has then been detached as a seperate branch.

This patchset consists in dropping the bkl locking scheme from
reiserfs 3 and replacing it with a per superblock mutex.


I) Deal with the BKL scheme

The first obstacle was to deal with the bkl based locking scheme
in which the whole reiserfs code is based on:

Bkl behaviour:

- disables preemption
- is relaxed while scheduling
- can be acquired recursively by a task

The resulting reiserfs code:

- some callsites acquire the lock, sometimes recursively. In
  the latter case, it's often hard to fix
- after every calls to functions that might sleep, reiserfs
  performs checks to ensure the tree hasn't changed and compute
  fixups in the latter case.

These properties have resulted in the creation of an ad-hoc locking
primitive based on a mutex but that can be acquired recursively.
Also most might-sleep-callsites have been explicitly surrounded
with a relax of the lock.


II) Deal with performance regressions

The bkl is based on a spinlock whereas the new lock is based
on a mutex. We couldn't safely make it a spinlock because the
code locked by the bkl can sleep, and such conversion would
have needed a lot of rewrites.

There are a lot of reasons that can make a spinlock more efficient
than a mutex.
But still we have now two nice properties:

- the mutexes have the spin on owner features, making them closer to a
 spinlock behaviour.
- the bkl is *forced* to be relaxed on schedule(). And sometimes this
  is a weakness. After a simple kmalloc, we have to check the filesystem
  hasn't changed behind us and to fixup in that case. That can be very
  costly. Sometimes this is something we want, sometimes not. At least
  with a mutex, we can choose.


III) Benchmarks

Comparisons have been made using dbench between vanilla throughput
(bkl based) and the head of the reiserfs/kill-bkl tree (mutex based).
Both kernel had the same config (CONFIG_PREEMPT=n, CONFIG_SMP=y)

- Dbench with 1 thread during 600 seconds (better with the mutex):

  Lock                   Throughput in the end
  
  Bkl                    232.54 MB/sec
  Mutex                  237.71 MB/sec

Complete trace:

  Bkl: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/bkl-600-1.log
  Mutex: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/mut-600-1.log

Graphical comparison:

  http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dbench-1.pdf


- Dbench with 30 threads during 600 seconds (better with the bkl):

  Lock                   Throughput in the end

  Bkl			 92.41 MB/sec
  Mutex			 82.25 MB/sec

Complete trace:

  Bkl: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/bkl-600-30.log
  Mutex: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/mut-600-30.log

Graphical comparison:

  http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dbench-30.pdf


- Dbench with 100 threads during 600 seconds (better with the mutex):

  Lock                   Throughput in the end

  Bkl                    37.89 MB/sec
  Mutex                  40.58 MB/sec

Complete trace:

  Bkl: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/bkl-600-100.log
  Mutex: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/mut-600-100.log

Graphical comparison:

  http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dbench-100.pdf

- Dbench with two threads, writing on a seperate partition simultaneoulsy,
  during 600 seconds (better with the mutex):

  Lock                   Thread #1         Thread #2

  Bkl			 199.95 MB/sec     186.16 MB/sec
  Mutex                  213.91 MB/sec     203.84 MB/sec

Complete trace:

  Bkl, thread #1: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dual-bkl-600-1.log
  Bkl, thread #2: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dual2-bkl-600-1.log
  Mutex, thread #1: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dual-mut-600-1.log
  Mutex, thread #2: http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dual2-mut-600-1.log

Graphical comparison:

http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/dbench-dual.pdf


IV) Testing and review

You can fetch the git tree, this one will keep being the most up to date:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git reiserfs/kill-bkl

Or you can either apply the raw diff:
http://www.kernel.org/pub/linux/kernel/people/frederic/bench-3107/reis_full.diff

Tests/reviews/any kind of contributions are very welcome!

Thanks,
Frederic.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
  2009-07-31 17:46 [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker
@ 2009-08-01  8:11 ` Andi Kleen
  2009-08-01 15:53   ` Frederic Weisbecker
  0 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2009-08-01  8:11 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Jeff Mahoney, Chris Mason, Ingo Molnar, Alexander Beregalov,
	Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini,
	Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani, Andi Kleen,
	Marcel Hilzinger, Edward Shishkin

On Fri, Jul 31, 2009 at 07:46:43PM +0200, Frederic Weisbecker wrote:
> IV) Testing and review

You seem to have done a lot of benchmarks, but did you also do stress testing?

While a performance issue wouldn't be good, corrupting someone's file system
would be really bad.

e.g. running all the file system stress tests from LTP would be a good 
idea, ideally multiple at a time on a multi processor system on a ram
disk or perhaps AIM9.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
  2009-08-01  8:11 ` Andi Kleen
@ 2009-08-01 15:53   ` Frederic Weisbecker
  2009-08-02 14:21     ` Ingo Molnar
  0 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2009-08-01 15:53 UTC (permalink / raw)
  To: Andi Kleen
  Cc: LKML, Jeff Mahoney, Chris Mason, Ingo Molnar, Alexander Beregalov,
	Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini,
	Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani,
	Marcel Hilzinger, Edward Shishkin

On Sat, Aug 01, 2009 at 10:11:41AM +0200, Andi Kleen wrote:
> On Fri, Jul 31, 2009 at 07:46:43PM +0200, Frederic Weisbecker wrote:
> > IV) Testing and review
> 
> You seem to have done a lot of benchmarks, but did you also do stress testing?
> 
> While a performance issue wouldn't be good, corrupting someone's file system
> would be really bad.

Heh, your are fairly right.
I've been focusing so much on performances regressions, that I forgot
this step.

Well, actually no I haven't forgot about it, but...
The problem is that the only SMP box I have is my personal laptop.
It has been sufficient to perform the dbench tests I've posted, but I could
do more (and more finegrained) tests with a separate box.

Then, indeed I could launch some stress tests in my laptop, but stress
tests should be run in another way than throughput tests, ideally
stress tests should run this tree for long hours (no, ideally: all the time),
whereas throughput tests can make it with punctual tests (though still,
it would be nicer to do more and more finegrained).

I have UP test boxes however, I will do long hours stress tests on it
next week but I fear that UP, even with CONFIG_PREEMPT can't open
enough racy windows to unearth the real wicked bugs that are waiting.

That said, and as you can guess while reading between the aboves lines,
one of my secret hopes while posting this announce was to get people
to test it, why not with stress tests, using better hardware
ressources than I have :-)

> e.g. running all the file system stress tests from LTP would be a good 
> idea, ideally multiple at a time on a multi processor system on a ram
> disk or perhaps AIM9.

Yeah good idea. But again, I fear my laptop hasn't enough memory
to support big enough ramdisks mount points to host selftests.

Anyway, I'll launch stress tests on one of my UP boxes next week,
and also some in my laptop.

Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
  2009-08-01 15:53   ` Frederic Weisbecker
@ 2009-08-02 14:21     ` Ingo Molnar
  2009-08-02 14:32       ` Daniel Walker
                         ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Ingo Molnar @ 2009-08-02 14:21 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andi Kleen, LKML, Jeff Mahoney, Chris Mason, Alexander Beregalov,
	Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini,
	Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani,
	Marcel Hilzinger, Edward Shishkin

* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> > e.g. running all the file system stress tests from LTP would be 
> > a good idea, ideally multiple at a time on a multi processor 
> > system on a ram disk or perhaps AIM9.
> 
> Yeah good idea. But again, I fear my laptop hasn't enough memory 
> to support big enough ramdisks mount points to host selftests.

Well, dont waste too much time on it (beyond the due diligence 
level) - Andi forgot that the right way to stress-test patches is to 
get through the review process and then through the integration 
trees which have far more test exposure than any single contributor 
can test.

Patch submitters cannot possibly test every crazy possibility that 
is out there - nor should they: it just doesnt scale. What we expect 
people to do is to write clean patches, to test the bits on their 
own boxes and submit them to lkml and address specific review 
feedback.

	Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
  2009-08-02 14:21     ` Ingo Molnar
@ 2009-08-02 14:32       ` Daniel Walker
  2009-08-02 23:41       ` Frank Ch. Eigler
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Daniel Walker @ 2009-08-02 14:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Chris Mason,
	Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro,
	Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
	Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin

On Sun, 2009-08-02 at 16:21 +0200, Ingo Molnar wrote:
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > > e.g. running all the file system stress tests from LTP would be 
> > > a good idea, ideally multiple at a time on a multi processor 
> > > system on a ram disk or perhaps AIM9.
> > 
> > Yeah good idea. But again, I fear my laptop hasn't enough memory 
> > to support big enough ramdisks mount points to host selftests.
> 
> Well, dont waste too much time on it (beyond the due diligence 
> level) - Andi forgot that the right way to stress-test patches is to 
> get through the review process and then through the integration 
> trees which have far more test exposure than any single contributor 
> can test.
> 
> Patch submitters cannot possibly test every crazy possibility that 
> is out there - nor should they: it just doesnt scale. What we expect 
> people to do is to write clean patches, to test the bits on their 
> own boxes and submit them to lkml and address specific review 
> feedback.

Is this the case for patches against drivers which the submitter doesn't
have or can't get a hold of?

Daniel


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
  2009-08-02 14:21     ` Ingo Molnar
  2009-08-02 14:32       ` Daniel Walker
@ 2009-08-02 23:41       ` Frank Ch. Eigler
  2009-08-03  6:09         ` Ingo Molnar
  2009-08-03  5:04       ` Roland Dreier
  2009-08-05 21:59       ` [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker
  3 siblings, 1 reply; 14+ messages in thread
From: Frank Ch. Eigler @ 2009-08-02 23:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Chris Mason,
	Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro,
	Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
	Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin

Ingo Molnar <mingo@elte.hu> writes:

>> Yeah good idea. But again, I fear my laptop hasn't enough memory 
>> to support big enough ramdisks mount points to host selftests.
>
> Well, dont waste too much time on it (beyond the due diligence 
> level) - Andi forgot that the right way to stress-test patches is to 
> get through the review process and then through the integration 
> trees which have far more test exposure than any single contributor 
> can test.

What guideline can you offer as to what is "due diligence" level of
stress testing, as compared to delegating this task to eyeballed
reviews + incidental use on the integration trees?

- FChE

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
  2009-08-02 23:41       ` Frank Ch. Eigler
@ 2009-08-03  6:09         ` Ingo Molnar
  2009-08-05 22:10           ` Frederic Weisbecker
  0 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2009-08-03  6:09 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Chris Mason,
	Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro,
	Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
	Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin


* Frank Ch. Eigler <fche@redhat.com> wrote:

> Ingo Molnar <mingo@elte.hu> writes:
> 
> >> Yeah good idea. But again, I fear my laptop hasn't enough 
> >> memory to support big enough ramdisks mount points to host 
> >> selftests.
> >
> > Well, dont waste too much time on it (beyond the due diligence 
> > level) - Andi forgot that the right way to stress-test patches 
> > is to get through the review process and then through the 
> > integration trees which have far more test exposure than any 
> > single contributor can test.
> 
> What guideline can you offer as to what is "due diligence" level 
> of stress testing, as compared to delegating this task to 
> eyeballed reviews + incidental use on the integration trees?

The kind of testing the VFS tree itself gets is a good starting 
point i suspect - and it is a far more critical tree as it can 
affects all filesystems. AFAICS the VFS tree relies on linux-next 
and -mm for testing mostly and that's a good model IMO.

	Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
  2009-08-03  6:09         ` Ingo Molnar
@ 2009-08-05 22:10           ` Frederic Weisbecker
  0 siblings, 0 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2009-08-05 22:10 UTC (permalink / raw)
  To: Ingo Molnar, Al Viro
  Cc: Frank Ch. Eigler, Andi Kleen, LKML, Jeff Mahoney, Chris Mason,
	Alexander Beregalov, Bron Gondwana, Reiserfs, Andrea Gelmini,
	Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani,
	Marcel Hilzinger, Edward Shishkin

On Mon, Aug 03, 2009 at 08:09:44AM +0200, Ingo Molnar wrote:
> 
> * Frank Ch. Eigler <fche@redhat.com> wrote:
> 
> > Ingo Molnar <mingo@elte.hu> writes:
> > 
> > >> Yeah good idea. But again, I fear my laptop hasn't enough 
> > >> memory to support big enough ramdisks mount points to host 
> > >> selftests.
> > >
> > > Well, dont waste too much time on it (beyond the due diligence 
> > > level) - Andi forgot that the right way to stress-test patches 
> > > is to get through the review process and then through the 
> > > integration trees which have far more test exposure than any 
> > > single contributor can test.
> > 
> > What guideline can you offer as to what is "due diligence" level 
> > of stress testing, as compared to delegating this task to 
> > eyeballed reviews + incidental use on the integration trees?
> 
> The kind of testing the VFS tree itself gets is a good starting 
> point i suspect - and it is a far more critical tree as it can 
> affects all filesystems. AFAICS the VFS tree relies on linux-next 
> and -mm for testing mostly and that's a good model IMO.
> 


I've asked Al Viro to integrate this branch in the VFS tree. He wasn't
opposed to that. Then I sent him the individual patches for integration
in VFS but hadn't any answer since then.

I've kept him cc'ed for any further patches too.

If Al still agrees about this integration, I would be pleased.

Frederic.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
  2009-08-02 14:21     ` Ingo Molnar
  2009-08-02 14:32       ` Daniel Walker
  2009-08-02 23:41       ` Frank Ch. Eigler
@ 2009-08-03  5:04       ` Roland Dreier
  2009-08-03 13:26         ` Chris Mason
  2009-08-05 21:59       ` [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker
  3 siblings, 1 reply; 14+ messages in thread
From: Roland Dreier @ 2009-08-03  5:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney, Chris Mason,
	Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro,
	Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
	Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin

 > Well, dont waste too much time on it (beyond the due diligence 
 > level) - Andi forgot that the right way to stress-test patches is to 
 > get through the review process and then through the integration 
 > trees which have far more test exposure than any single contributor 
 > can test.
 > 
 > Patch submitters cannot possibly test every crazy possibility that 
 > is out there - nor should they: it just doesnt scale. What we expect 
 > people to do is to write clean patches, to test the bits on their 
 > own boxes and submit them to lkml and address specific review 
 > feedback.

I respectfully disagree in this case.  For patches that touch, say,
something hardware dependent where the patch submitter doesn't have all
the variations on the hardware, yes, I agree, scale the testing by
running the code on many machines.  But for the code in question, where
some very fundamental and complex changes are being made to filesystem
locking, I don't think that testing really scales -- after all, if there
is some race then it's quite likely that testers will just see some rare
filesystem corruption, which could easily waste weeks of debugging
before the BKL/reiserfs patches were even implicated.

 - R.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
  2009-08-03  5:04       ` Roland Dreier
@ 2009-08-03 13:26         ` Chris Mason
  2009-08-03 13:56           ` Ingo Molnar
                             ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Chris Mason @ 2009-08-03 13:26 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Ingo Molnar, Frederic Weisbecker, Andi Kleen, LKML, Jeff Mahoney,
	Alexander Beregalov, Bron Gondwana, Reiserfs, Al Viro,
	Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
	Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin

On Sun, Aug 02, 2009 at 10:04:40PM -0700, Roland Dreier wrote:
> 
>  > Well, dont waste too much time on it (beyond the due diligence 
>  > level) - Andi forgot that the right way to stress-test patches is to 
>  > get through the review process and then through the integration 
>  > trees which have far more test exposure than any single contributor 
>  > can test.
>  > 
>  > Patch submitters cannot possibly test every crazy possibility that 
>  > is out there - nor should they: it just doesnt scale. What we expect 
>  > people to do is to write clean patches, to test the bits on their 
>  > own boxes and submit them to lkml and address specific review 
>  > feedback.
> 
> I respectfully disagree in this case.  For patches that touch, say,
> something hardware dependent where the patch submitter doesn't have all
> the variations on the hardware, yes, I agree, scale the testing by
> running the code on many machines.  But for the code in question, where
> some very fundamental and complex changes are being made to filesystem
> locking, I don't think that testing really scales -- after all, if there
> is some race then it's quite likely that testers will just see some rare
> filesystem corruption, which could easily waste weeks of debugging
> before the BKL/reiserfs patches were even implicated.

Definitely, the cost of the rare bug is much higher.  The good news is
that reiserfs tends to pile its races into a few spots.  Most of them
can be found with a 12 hour run of the namesys stress.sh program and a
lot of memory pressure.  I'd compile with preemption on and you'll have
a good test on any SMP machine.

http://oss.oracle.com/~mason/stress.sh

stress.sh just copies a source directory into the test filesystem, then
reads it  back and deletes it in a loop.  I'd run with 50 procs and
enough memory  pressure for the box to lightly swap (booting w/mem= is a
fine way to make memory pressure).  This way you make sure to hammer on
the metadata writeback paths, which is where all of the difficult races
come in.

Testing with an fsx-linux process running at the same time will make
sure all of the mmap/truncate paths are working correctly as well.

-chris


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
  2009-08-03 13:26         ` Chris Mason
@ 2009-08-03 13:56           ` Ingo Molnar
  2009-08-05 22:13           ` Frederic Weisbecker
  2009-08-09 23:27           ` [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment Frederic Weisbecker
  2 siblings, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2009-08-03 13:56 UTC (permalink / raw)
  To: Chris Mason, Roland Dreier, Frederic Weisbecker, Andi Kleen, LKML,
	Jeff Mahoney, Alexander Beregalov, Bron Gondwana, Reiserfs,
	Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
	Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin


* Chris Mason <chris.mason@oracle.com> wrote:

> Definitely, the cost of the rare bug is much higher.  The good 
> news is that reiserfs tends to pile its races into a few spots.  
> Most of them can be found with a 12 hour run of the namesys 
> stress.sh program and a lot of memory pressure.  I'd compile with 
> preemption on and you'll have a good test on any SMP machine.
> 
> http://oss.oracle.com/~mason/stress.sh
> 
> stress.sh just copies a source directory into the test filesystem, 
> then reads it back and deletes it in a loop.  I'd run with 50 
> procs and enough memory pressure for the box to lightly swap 
> (booting w/mem= is a fine way to make memory pressure).  This way 
> you make sure to hammer on the metadata writeback paths, which is 
> where all of the difficult races come in.
> 
> Testing with an fsx-linux process running at the same time will 
> make sure all of the mmap/truncate paths are working correctly as 
> well.

Ok, that definitely looks like something feasible to run for 
Frederic too, thanks for the detailed suggestion!

	Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
  2009-08-03 13:26         ` Chris Mason
  2009-08-03 13:56           ` Ingo Molnar
@ 2009-08-05 22:13           ` Frederic Weisbecker
  2009-08-09 23:27           ` [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment Frederic Weisbecker
  2 siblings, 0 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2009-08-05 22:13 UTC (permalink / raw)
  To: Chris Mason, Roland Dreier, Ingo Molnar, Andi Kleen, LKML,
	Jeff Mahoney, Alexander Beregalov, Bron Gondwana, Reiserfs,
	Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
	Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin

On Mon, Aug 03, 2009 at 09:26:59AM -0400, Chris Mason wrote:
> On Sun, Aug 02, 2009 at 10:04:40PM -0700, Roland Dreier wrote:
> > 
> >  > Well, dont waste too much time on it (beyond the due diligence 
> >  > level) - Andi forgot that the right way to stress-test patches is to 
> >  > get through the review process and then through the integration 
> >  > trees which have far more test exposure than any single contributor 
> >  > can test.
> >  > 
> >  > Patch submitters cannot possibly test every crazy possibility that 
> >  > is out there - nor should they: it just doesnt scale. What we expect 
> >  > people to do is to write clean patches, to test the bits on their 
> >  > own boxes and submit them to lkml and address specific review 
> >  > feedback.
> > 
> > I respectfully disagree in this case.  For patches that touch, say,
> > something hardware dependent where the patch submitter doesn't have all
> > the variations on the hardware, yes, I agree, scale the testing by
> > running the code on many machines.  But for the code in question, where
> > some very fundamental and complex changes are being made to filesystem
> > locking, I don't think that testing really scales -- after all, if there
> > is some race then it's quite likely that testers will just see some rare
> > filesystem corruption, which could easily waste weeks of debugging
> > before the BKL/reiserfs patches were even implicated.
> 
> Definitely, the cost of the rare bug is much higher.  The good news is
> that reiserfs tends to pile its races into a few spots.  Most of them
> can be found with a 12 hour run of the namesys stress.sh program and a
> lot of memory pressure.  I'd compile with preemption on and you'll have
> a good test on any SMP machine.
> 
> http://oss.oracle.com/~mason/stress.sh
> 
> stress.sh just copies a source directory into the test filesystem, then
> reads it  back and deletes it in a loop.  I'd run with 50 procs and
> enough memory  pressure for the box to lightly swap (booting w/mem= is a
> fine way to make memory pressure).  This way you make sure to hammer on
> the metadata writeback paths, which is where all of the difficult races
> come in.
> 
> Testing with an fsx-linux process running at the same time will make
> sure all of the mmap/truncate paths are working correctly as well.
> 
> -chris

Thanks a lot for this script Chris, I'm going to test with that.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment
  2009-08-03 13:26         ` Chris Mason
  2009-08-03 13:56           ` Ingo Molnar
  2009-08-05 22:13           ` Frederic Weisbecker
@ 2009-08-09 23:27           ` Frederic Weisbecker
  2 siblings, 0 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2009-08-09 23:27 UTC (permalink / raw)
  To: Chris Mason, Roland Dreier, Ingo Molnar, Andi Kleen, LKML,
	Jeff Mahoney, Alexander Beregalov, Bron Gondwana, Reiserfs,
	Al Viro, Andrea Gelmini, Trenton D. Adams, Thomas Meyer,
	Alessio Igor Bogani, Marcel Hilzinger, Edward Shishkin,
	Laurent Riffard

On Mon, Aug 03, 2009 at 09:26:59AM -0400, Chris Mason wrote:
> Definitely, the cost of the rare bug is much higher.  The good news is
> that reiserfs tends to pile its races into a few spots.  Most of them
> can be found with a 12 hour run of the namesys stress.sh program and a
> lot of memory pressure.  I'd compile with preemption on and you'll have
> a good test on any SMP machine.
> 
> http://oss.oracle.com/~mason/stress.sh
> 
> stress.sh just copies a source directory into the test filesystem, then
> reads it  back and deletes it in a loop.  I'd run with 50 procs and
> enough memory  pressure for the box to lightly swap (booting w/mem= is a
> fine way to make memory pressure).  This way you make sure to hammer on
> the metadata writeback paths, which is where all of the difficult races
> come in.
> 
> Testing with an fsx-linux process running at the same time will make
> sure all of the mmap/truncate paths are working correctly as well.
> 
> -chris
> 


Running this script has unearthed a bug introduced in my last commit.
This is fixed in the patch below.
Thanks for this script, I'm now running it very often, only on PREEMPT UP
for now.

---
>From a22c48509ca7b54206c0616141278e5561f119ef Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <fweisbec@gmail.com>
Date: Mon, 10 Aug 2009 00:53:45 +0200
Subject: [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment

The previous commit:
"kill-the-bkl/reiserfs: release the lock only for first entry in readdir"
brought a bug which increments the readdir offset even if we failed to
copy a directory entry through filldir.

Then if we are in the end of the user buffer, there are chances that
getdents() will be subsequently called with a new buffer to continue
fetching the directory. At this time the directory entry offset will
be wrong because it has omitted the previous entry that failed to copy.

We need to increment the directory offset after fetching an entry, not
before.

This fixes weird bugs in which a directory seems not empty whereas
it is.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
---
 fs/reiserfs/dir.c |   21 +++++++++++----------
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/reiserfs/dir.c b/fs/reiserfs/dir.c
index d6fb8d3..d4477eb 100644
--- a/fs/reiserfs/dir.c
+++ b/fs/reiserfs/dir.c
@@ -195,12 +195,6 @@ int reiserfs_readdir_dentry(struct dentry *dentry, void *dirent,
 				*pos = d_off;
 				d_ino = deh_objectid(deh);
 
-				/*
-				 * next entry should be looked for with such
-				 * offset
-				 */
-				next_pos = deh_offset(deh) + 1;
-
 				if (first_entry) {
 					int fillret;
 
@@ -221,11 +215,18 @@ int reiserfs_readdir_dentry(struct dentry *dentry, void *dirent,
 
 					if (item_moved(&tmp_ih, &path_to_entry))
 						goto research;
-					continue;
-				}
-				if (filldir(dirent, d_name, d_reclen, d_off,
-					    d_ino, DT_UNKNOWN) < 0)
+				} else {
+					if (filldir(dirent, d_name, d_reclen,
+						  d_off, d_ino, DT_UNKNOWN) < 0)
 						goto end;
+				}
+
+				/*
+				 * next entry should be looked for with such
+				 * offset
+				 */
+				next_pos = deh_offset(deh) + 1;
+
 			}	/* for */
 		}
 
-- 
1.6.2.3


You can find this patch and the other in this series in the following git
tree:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	reiserfs/kill-bkl

Thanks.


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [ANNOUNCE] Reiserfs/kill-bkl tree v2
  2009-08-02 14:21     ` Ingo Molnar
                         ` (2 preceding siblings ...)
  2009-08-03  5:04       ` Roland Dreier
@ 2009-08-05 21:59       ` Frederic Weisbecker
  3 siblings, 0 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2009-08-05 21:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, LKML, Jeff Mahoney, Chris Mason, Alexander Beregalov,
	Bron Gondwana, Reiserfs, Al Viro, Andrea Gelmini,
	Trenton D. Adams, Thomas Meyer, Alessio Igor Bogani,
	Marcel Hilzinger, Edward Shishkin

On Sun, Aug 02, 2009 at 04:21:00PM +0200, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > > e.g. running all the file system stress tests from LTP would be 
> > > a good idea, ideally multiple at a time on a multi processor 
> > > system on a ram disk or perhaps AIM9.
> > 
> > Yeah good idea. But again, I fear my laptop hasn't enough memory 
> > to support big enough ramdisks mount points to host selftests.
> 
> Well, dont waste too much time on it (beyond the due diligence 
> level) - Andi forgot that the right way to stress-test patches is to 
> get through the review process and then through the integration 
> trees which have far more test exposure than any single contributor 
> can test.
> 
> Patch submitters cannot possibly test every crazy possibility that 
> is out there - nor should they: it just doesnt scale. What we expect 
> people to do is to write clean patches, to test the bits on their 
> own boxes and submit them to lkml and address specific review 
> feedback.
> 
> 	Ingo

Yeah, I know that doesn't scale. But I really would like to test it, at
least with the minimum scope I have.
Such fundamental change in a filesystem is scary :)


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-08-09 23:27 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-31 17:46 [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker
2009-08-01  8:11 ` Andi Kleen
2009-08-01 15:53   ` Frederic Weisbecker
2009-08-02 14:21     ` Ingo Molnar
2009-08-02 14:32       ` Daniel Walker
2009-08-02 23:41       ` Frank Ch. Eigler
2009-08-03  6:09         ` Ingo Molnar
2009-08-05 22:10           ` Frederic Weisbecker
2009-08-03  5:04       ` Roland Dreier
2009-08-03 13:26         ` Chris Mason
2009-08-03 13:56           ` Ingo Molnar
2009-08-05 22:13           ` Frederic Weisbecker
2009-08-09 23:27           ` [PATCH] kill-the-bkl/reiserfs: fix early readdir offset increment Frederic Weisbecker
2009-08-05 21:59       ` [ANNOUNCE] Reiserfs/kill-bkl tree v2 Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox