public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] reiserfs fixes
@ 2010-01-02  1:27 Frederic Weisbecker
  2010-01-02 13:41 ` Andi Kleen
  2010-01-02 19:19 ` Linus Torvalds
  0 siblings, 2 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2010-01-02  1:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Frederic Weisbecker, Christian Kujau, Alexander Beregalov,
	Chris Mason, Ingo Molnar

Linus,

Please pull the reiserfs/kill-bkl branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	reiserfs/kill-bkl

These changes fix a lot of lock inversions, some of them were
triggering soft lockups very easily in xattrs operations.

As the reiserfs lock is a giant lock (in reiserfs scope),
these dependency inversions couldn't get smart fixes without a deep
locking rewrite.

That's why you'll mostly find dependency inversion fixes based on
such pattern:

reiserfs_write_unlock()
mutex_lock(random_lock)
reiserfs_write_lock()

This is not beautiful but at least that's better than the bkl.

Oh and I expect other lock inversions will get reported in
the future due to rare and then yet untested paths.

Thanks,
	Frederic
---

Frederic Weisbecker (13):
      reiserfs: Fix possible recursive lock
      reiserfs: Fix reiserfs lock and journal lock inversion dependency
      reiserfs: Fix reiserfs lock <-> inode mutex dependency inversion
      reiserfs: Fix remaining in-reclaim-fs <-> reclaim-fs-on locking inversion
      reiserfs: Fix reiserfs lock <-> i_xattr_sem dependency inversion
      reiserfs: Warn on lock relax if taken recursively
      reiserfs: Fix reiserfs lock <-> i_mutex dependency inversion on xattr
      reiserfs: Relax reiserfs lock while freeing the journal
      reiserfs: Relax lock before open xattr dir in reiserfs_xattr_set_handle()
      reiserfs: Fix unwanted recursive reiserfs lock in reiserfs_unlink()
      reiserfs: Fix journal mutex <-> inode mutex lock inversion
      reiserfs: Safely acquire i_mutex from reiserfs_for_each_xattr
      reiserfs: Safely acquire i_mutex from xattr_rmdir


 fs/reiserfs/bitmap.c        |    3 +++
 fs/reiserfs/inode.c         |    5 +++--
 fs/reiserfs/journal.c       |   18 ++++++++++++++----
 fs/reiserfs/lock.c          |    9 +++++++++
 fs/reiserfs/namei.c         |    7 ++++---
 fs/reiserfs/xattr.c         |   26 ++++++++++++++++++++------
 include/linux/reiserfs_fs.h |   26 ++++++++++++++++++++++++++
 7 files changed, 79 insertions(+), 15 deletions(-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] reiserfs fixes
  2010-01-02  1:27 [GIT PULL] reiserfs fixes Frederic Weisbecker
@ 2010-01-02 13:41 ` Andi Kleen
  2010-01-02 16:36   ` Frederic Weisbecker
  2010-01-02 19:19 ` Linus Torvalds
  1 sibling, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2010-01-02 13:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Linus Torvalds, LKML, Christian Kujau, Alexander Beregalov,
	Chris Mason, Ingo Molnar

Frederic Weisbecker <fweisbec@gmail.com> writes:
>
> That's why you'll mostly find dependency inversion fixes based on
> such pattern:
>
> reiserfs_write_unlock()
> mutex_lock(random_lock)
> reiserfs_write_lock()

These `workarounds' look rather ugly and are likely much slower
than the BKL that was there before. Perhaps it's better to simply
go back to the BKL until this can be all fixed properly
(or a more faithful emulation for the BKL can be devised)?

>
> This is not beautiful but at least that's better than the bkl.
>
> Oh and I expect other lock inversions will get reported in
> the future due to rare and then yet untested paths.

... and given that was the conversion really a good idea?

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] reiserfs fixes
  2010-01-02 13:41 ` Andi Kleen
@ 2010-01-02 16:36   ` Frederic Weisbecker
  2010-01-02 17:43     ` reiserfs broken in 2.6.32 was " Andi Kleen
  0 siblings, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2010-01-02 16:36 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, LKML, Christian Kujau, Alexander Beregalov,
	Chris Mason, Ingo Molnar

On Sat, Jan 02, 2010 at 02:41:51PM +0100, Andi Kleen wrote:
> Frederic Weisbecker <fweisbec@gmail.com> writes:
> >
> > That's why you'll mostly find dependency inversion fixes based on
> > such pattern:
> >
> > reiserfs_write_unlock()
> > mutex_lock(random_lock)
> > reiserfs_write_lock()
> 
> These `workarounds' look rather ugly and are likely much slower
> than the BKL that was there before.


This is ugly, I can't argue against that. But while this is
apparently uglier than the bkl, it is actually better than it.

The bkl does its dirty workarounds internally. We don't see
it as it's done on schedule time and so. So indeed, placing
a simple lock_kernel() in each outer most caller in reiserfs
looks much proper and pretty than all these workarounds that
try to catch up with its locking-based scheme.

Now try to look at the new reiserfs lock not from a visual
point of view but from its impact. The bkl relax the lock, it is
acquired recursively etc... This usually implies a lot of hard
investigation. The new lock has identified all of
these dirty implicit places, knows when it is required to
relax, knows when it is acquired recursively, knows where
are the lock inversions once converted into a normal lock.

All of this hard work of investigation has been done already.
If someone wants to, one day, refine the locking to make
something smarter (which I doubt), starting from the current
codebase is _way_ much easier than starting from the 2.6.32
bkl based scheme. Actually someone who is going to try that
from the bkl point will undoubtly need an intermediate state
like the current one.

The ugliness is here. But finding it more ugly than the bkl
is a _pure_ illusion.

Concerning the slowness. The xattr operation that have been
patched here don't appear to me beeing in a fast path.
Moreover some lookup areas (apart from relax on random lock) have
been relaxed from the reiserfs lock in this new set.


> Perhaps it's better to simply
> go back to the BKL until this can be all fixed properly
> (or a more faithful emulation for the BKL can be devised)?


There are 99% of chances that nobody will ever fix it. Especially
if one needs to start from the bkl base.
Few people are familiar with the reiserfs code. Among these
people, I doubt someone is motivated to do it.

And for those who aren't familiar with it, reiserfs code is
so messy that I doubt many people will try something
for more than few minutes.

This is especially true now that it is an old and legacy
filesystem. Nobody cares anymore.

I only have reiserfs partitions in my laptop and my testbox,
nothing else. And that because I'm now maintaining it de facto.
Otherwise I wouldn't encumber with that and would immediately
set up btrfs everywhere.

Concerning a more faithful emulation of the bkl. That would
require to divide the bkl in several sub-bkl. This is
pointless and even worst than the bkl.

- that would require a notifier in schedule(), one notifier
  per sub-bkl. That's horrible for performances. And for
  the scheduler. I will be the first to NAK.

- that doesn't solve the problem as this sub-bkl won't ever
  be removed. We just isolate a giant lock in reiserfs. That
  doesn't change anything, nor make the things simpler
  especially since we already know that reiserfs use of the bkl
  is not related to other users of bkl.

- the reiserfs lock is per superblock. It scales better.


> >
> > This is not beautiful but at least that's better than the bkl.
> >
> > Oh and I expect other lock inversions will get reported in
> > the future due to rare and then yet untested paths.
> 
> ... and given that was the conversion really a good idea?


Despite the _apparent_ ugliness compared to the bkl, I'm
still sure this was, and is still, a good idea.

The fact we are going to experience other locking inversions
is a necessary pain. It's impossible to bypass these states,
you can't remove that easily such giant lock from a complex
code base.

Thanks.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 16:36   ` Frederic Weisbecker
@ 2010-01-02 17:43     ` Andi Kleen
  2010-01-02 19:02       ` Frederic Weisbecker
                         ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Andi Kleen @ 2010-01-02 17:43 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andi Kleen, Linus Torvalds, LKML, Christian Kujau,
	Alexander Beregalov, Chris Mason, Ingo Molnar

> I only have reiserfs partitions in my laptop and my testbox,
> nothing else. And that because I'm now maintaining it de facto.

AFAIK it's widely used in SUSE installations. It was the default
for a long time.

And right now as in 2.6.32 it's in a state of
"may randomly explode/deadlock". And no clear path out of it. Not good.

I am very concerned about destabilizing a widely used file system
like this. This has the potential to really hurt users.

> - that would require a notifier in schedule(), one notifier
>   per sub-bkl. That's horrible for performances. And for
>   the scheduler. I will be the first to NAK.

I thought the original idea was to find everything that 
can sleep in reiserfs and simply wrap it with lock dropping?

That should be roughly equivalent to the old BKL semantics.

Where did it go wrong?

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 17:43     ` reiserfs broken in 2.6.32 was " Andi Kleen
@ 2010-01-02 19:02       ` Frederic Weisbecker
  2010-01-02 19:23         ` Andi Kleen
  2010-01-02 20:11       ` Ingo Molnar
  2010-01-02 22:18       ` Ingo Molnar
  2 siblings, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2010-01-02 19:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, LKML, Christian Kujau, Alexander Beregalov,
	Chris Mason, Ingo Molnar

On Sat, Jan 02, 2010 at 06:43:12PM +0100, Andi Kleen wrote:
> > I only have reiserfs partitions in my laptop and my testbox,
> > nothing else. And that because I'm now maintaining it de facto.
> 
> AFAIK it's widely used in SUSE installations. It was the default
> for a long time.
> 
> And right now as in 2.6.32 it's in a state of
> "may randomly explode/deadlock". And no clear path out of it. Not good.
> 
> I am very concerned about destabilizing a widely used file system
> like this. This has the potential to really hurt users.


I understand your worries. And I've been very cautious with that,
waiting for three cycles before requesting an upstream merge. I did
it because the isolated tree model did not scale anymore.

Now that it's upstream, I get more testing and I expect that, in
the end of this cycle, I get most of these issues reported and
fixed.

Serious users who run serious datas won't ship 2.6.33, they will ship
a further stable version 2.6.33.x (if they haven't converted their
filesystems already).
And at this time, things should be 99% fixed.

 
> > - that would require a notifier in schedule(), one notifier
> >   per sub-bkl. That's horrible for performances. And for
> >   the scheduler. I will be the first to NAK.
> 
> I thought the original idea was to find everything that 
> can sleep in reiserfs and simply wrap it with lock dropping?
>
> That should be roughly equivalent to the old BKL semantics.
>
> Where did it go wrong?



That's the theory. Fitting into this strict scheme brings performance
regressions. The bkl is a spinlock, it disables preemption, it is
relaxed on sleep, and doesn't have locking dependencies. Moreover
it's not a lock but a simulation of a NO_PREEMPT UP flow, with all
the fixup guardians that come with (fixup if we schedule, as
scheduling brings races).

>From the conversion is borned a mutex. Even though we have
adaptive spinning, we don't catch up spinlock performances
as it's not a pure optimized looping fast path, and it may
actually just sleep.

The bkl is relaxed only when we sleep. Now simulating that with
a mutex that gets explicitly relaxed is not the same thing as
we need to relax the lock each time we _might_ sleep. It means
we relax more and that brings performance regressions.

That said it's sometimes a drawback for the bkl to be relaxed
every time we schedule, because we need to fixup after that,
sometimes we need to re-walk into the entire tree, etc...

So sometimes we can do better. There are some places where
we don't relax like did the bkl, so that we don't need to fixup,
and we get a win of performances.

You see? The bkl semantics must not be always strictly imitated on
such conversion. It depends on what does the code. In reiserfs,
sometimes it was desired that the bkl get relaxed, sometimes it
wasn't. And all the reiserfs code deals with that.

With a mutex we have the choice. So the conversion has been
a balance between performance regressions brought by the mutex
conversion, and the performance win because we have actually
more control with a traditional lock.

That said there are places where we really need to sleep, like
when we grab another lock, so that we don't create inverted
dependencies.


That said, if the general opinion is in favour of unmerging
the bkl removal changes in reiserfs. Then please do.

Just to express my point of view, as my primary goal is not
to fix reiserfs but the kernel: If you are afraid of such
changes, your kernel will just become mildewed by the time.
You need to drop such bad ill-legacies if you want it to
evolve. Until every users of the big kernel lock will remain
in the kernel, vanilla upstream will keep it as ball and chain,
won't ever be able to perform any serious real time service,
etc...

So yes this is risky. But I think this is necessary. And as I
explained above, things will be fine as serious datas are not
manipulated with a random -rc2 (except my own datas...).


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] reiserfs fixes
  2010-01-02  1:27 [GIT PULL] reiserfs fixes Frederic Weisbecker
  2010-01-02 13:41 ` Andi Kleen
@ 2010-01-02 19:19 ` Linus Torvalds
  2010-01-02 19:21   ` Linus Torvalds
  2010-01-02 19:22   ` Frederic Weisbecker
  1 sibling, 2 replies; 27+ messages in thread
From: Linus Torvalds @ 2010-01-02 19:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Christian Kujau, Alexander Beregalov, Chris Mason,
	Ingo Molnar, Greg KH



On Sat, 2 Jan 2010, Frederic Weisbecker wrote:
> 
> These changes fix a lot of lock inversions, some of them were
> triggering soft lockups very easily in xattrs operations.

Should 2.6.32-stable merge this branch too?

		Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] reiserfs fixes
  2010-01-02 19:19 ` Linus Torvalds
@ 2010-01-02 19:21   ` Linus Torvalds
  2010-01-02 19:24     ` Frederic Weisbecker
  2010-01-02 19:22   ` Frederic Weisbecker
  1 sibling, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2010-01-02 19:21 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Christian Kujau, Alexander Beregalov, Chris Mason,
	Ingo Molnar, Greg KH



On Sat, 2 Jan 2010, Linus Torvalds wrote:
> 
> Should 2.6.32-stable merge this branch too?

Never mind, none of the reiserfs bkl-removal went into 2.6.32. We had some 
other BKL work that got in there, but all the reiserfs stuff is 
post-2.6.32 only, I guess.

		Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] reiserfs fixes
  2010-01-02 19:19 ` Linus Torvalds
  2010-01-02 19:21   ` Linus Torvalds
@ 2010-01-02 19:22   ` Frederic Weisbecker
  1 sibling, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2010-01-02 19:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Christian Kujau, Alexander Beregalov, Chris Mason,
	Ingo Molnar, Greg KH

On Sat, Jan 02, 2010 at 11:19:20AM -0800, Linus Torvalds wrote:
> 
> 
> On Sat, 2 Jan 2010, Frederic Weisbecker wrote:
> > 
> > These changes fix a lot of lock inversions, some of them were
> > triggering soft lockups very easily in xattrs operations.
> 
> Should 2.6.32-stable merge this branch too?
> 
> 		Linus



No this only relies on the bkl removal patches merged in this cycle.

Thanks.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 19:02       ` Frederic Weisbecker
@ 2010-01-02 19:23         ` Andi Kleen
  2010-01-02 20:11           ` Frederic Weisbecker
  0 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2010-01-02 19:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andi Kleen, Linus Torvalds, LKML, Christian Kujau,
	Alexander Beregalov, Chris Mason, Ingo Molnar

On Sat, Jan 02, 2010 at 08:02:15PM +0100, Frederic Weisbecker wrote:
> On Sat, Jan 02, 2010 at 06:43:12PM +0100, Andi Kleen wrote:
> > > I only have reiserfs partitions in my laptop and my testbox,
> > > nothing else. And that because I'm now maintaining it de facto.
> > 
> > AFAIK it's widely used in SUSE installations. It was the default
> > for a long time.
> > 
> > And right now as in 2.6.32 it's in a state of
> > "may randomly explode/deadlock". And no clear path out of it. Not good.
> > 
> > I am very concerned about destabilizing a widely used file system
> > like this. This has the potential to really hurt users.
> 
> 
> I understand your worries. And I've been very cautious with that,
> waiting for three cycles before requesting an upstream merge. I did
> it because the isolated tree model did not scale anymore.
> 
> Now that it's upstream, I get more testing and I expect that, in
> the end of this cycle, I get most of these issues reported and
> fixed.

Will you? 

How many users systems could it break by then?

> 
> Serious users who run serious datas won't ship 2.6.33, they will ship
> a further stable version 2.6.33.x (if they haven't converted their
> filesystems already).
> And at this time, things should be 99% fixed.

That seems very risky.  For some rarely used obscure subsystems
that might work but a widely used file system that keeps people's $HOME? 
I don't think seriously destabilizing that for a potentially longer
time is a good idea. There's the potential to break
a lot of porcelain.

Probably you could do a ext3/ext4 like thing by starting
with a "reiserfs3.5" copy and do the work there and then
merge back once things work and have been reasonably verified
by code review.


> That's the theory. Fitting into this strict scheme brings performance
> regressions. The bkl is a spinlock, it disables preemption, it is
> relaxed on sleep, and doesn't have locking dependencies. Moreover
> it's not a lock but a simulation of a NO_PREEMPT UP flow, with all
> the fixup guardians that come with (fixup if we schedule, as
> scheduling brings races).
> 
> From the conversion is borned a mutex. Even though we have
> adaptive spinning, we don't catch up spinlock performances
> as it's not a pure optimized looping fast path, and it may
> actually just sleep.

Fix the adaptive spinlock then?

> 
> The bkl is relaxed only when we sleep. Now simulating that with
> a mutex that gets explicitly relaxed is not the same thing as
> we need to relax the lock each time we _might_ sleep. It means
> we relax more and that brings performance regressions.

At least in the cases where the decision is in reiserfs code
directly you could predict it by using need_resched(), couldn't you?

That might not be 100% accurate, but good enough.


> That said, if the general opinion is in favour of unmerging
> the bkl removal changes in reiserfs. Then please do.

For me it seems too aggressive at this point.

If it was just a case of fixing a few known bugs, but
if you're not even sure how many problems are left ...

Perhaps do the reiserfs35 variant?


> Just to express my point of view, as my primary goal is not
> to fix reiserfs but the kernel: If you are afraid of such
> changes, your kernel will just become mildewed by the time.

Better some mildew than a seriously-broken-for-enough people's 
release (although I have my doubts that's the right metapher
for the BKL anyways)

Having stable releases is an important part for
getting enough testers (we already have too little). And 
if we start breaking their $HOMEs they might become 
even less.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] reiserfs fixes
  2010-01-02 19:21   ` Linus Torvalds
@ 2010-01-02 19:24     ` Frederic Weisbecker
  0 siblings, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2010-01-02 19:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Christian Kujau, Alexander Beregalov, Chris Mason,
	Ingo Molnar, Greg KH

On Sat, Jan 02, 2010 at 11:21:51AM -0800, Linus Torvalds wrote:
> 
> 
> On Sat, 2 Jan 2010, Linus Torvalds wrote:
> > 
> > Should 2.6.32-stable merge this branch too?
> 
> Never mind, none of the reiserfs bkl-removal went into 2.6.32. We had some 
> other BKL work that got in there, but all the reiserfs stuff is 
> post-2.6.32 only, I guess.
> 
> 		Linus


Exactly.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 17:43     ` reiserfs broken in 2.6.32 was " Andi Kleen
  2010-01-02 19:02       ` Frederic Weisbecker
@ 2010-01-02 20:11       ` Ingo Molnar
  2010-01-02 22:18       ` Ingo Molnar
  2 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2010-01-02 20:11 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Frederic Weisbecker, Linus Torvalds, LKML, Christian Kujau,
	Alexander Beregalov, Chris Mason


* Andi Kleen <andi@firstfloor.org> wrote:

> > I only have reiserfs partitions in my laptop and my testbox, nothing else. 
> > And that because I'm now maintaining it de facto.
> 
> AFAIK it's widely used in SUSE installations. It was the default for a long 
> time.

[ Btw., if so then SuSE/Novell should sponsor Frederic's reiserfs maintanence 
  work. ]

> And right now as in 2.6.32 it's in a state of "may randomly 
> explode/deadlock". And no clear path out of it. Not good.

You are quite wrong about that accusation: Frederic's changes are not in 
v2.6.32 - they were merged by Linus in the v2.6.33 cycle so that code cannot 
physicaly have caused any problems in v2.6.32 ...

If there's stability problems with reiserfs in v2.6.32 then it's doubly good 
that Linus merged Frederic's tree in v2.6.33 [beyond the obvious advantage 
that it gets rid of the BKL, which was a serious and oft reported limit to 
reiserfs scalability]: finally there's again reiserfs development activity, 
which might lead to further fixes, and which might solve the v2.6.32 stability 
problems you mention.

In my view this work could have been merged sooner, in v2.6.32 already, maybe 
that way the v2.6.32 reiserfs stability problems you mentioned could have been 
avoided.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 19:23         ` Andi Kleen
@ 2010-01-02 20:11           ` Frederic Weisbecker
  2010-01-02 20:33             ` Andi Kleen
  2010-01-02 21:01             ` tytso
  0 siblings, 2 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2010-01-02 20:11 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, LKML, Christian Kujau, Alexander Beregalov,
	Chris Mason, Ingo Molnar

On Sat, Jan 02, 2010 at 08:23:37PM +0100, Andi Kleen wrote:
> On Sat, Jan 02, 2010 at 08:02:15PM +0100, Frederic Weisbecker wrote:
> > On Sat, Jan 02, 2010 at 06:43:12PM +0100, Andi Kleen wrote:
> > > > I only have reiserfs partitions in my laptop and my testbox,
> > > > nothing else. And that because I'm now maintaining it de facto.
> > > 
> > > AFAIK it's widely used in SUSE installations. It was the default
> > > for a long time.
> > > 
> > > And right now as in 2.6.32 it's in a state of
> > > "may randomly explode/deadlock". And no clear path out of it. Not good.
> > > 
> > > I am very concerned about destabilizing a widely used file system
> > > like this. This has the potential to really hurt users.
> > 
> > 
> > I understand your worries. And I've been very cautious with that,
> > waiting for three cycles before requesting an upstream merge. I did
> > it because the isolated tree model did not scale anymore.
> > 
> > Now that it's upstream, I get more testing and I expect that, in
> > the end of this cycle, I get most of these issues reported and
> > fixed.
> 
> Will you? 
> 
> How many users systems could it break by then?


I've never lost any datas since I began this work. And
I run it every day. If I had experienced lock inversions,
and sometimes soft lockups, I did not experienced serious
damages. It's a journalized filesystem that can fixup the things
pretty well.

Also we are talking about potential lock inversions, in potential
rare path, that could potentially raise soft lockups. That makes
a lot of potentials, for things that are going to be fixed and
for which I've never seen serious damages.

 
> > 
> > Serious users who run serious datas won't ship 2.6.33, they will ship
> > a further stable version 2.6.33.x (if they haven't converted their
> > filesystems already).
> > And at this time, things should be 99% fixed.
> 
> That seems very risky.  For some rarely used obscure subsystems
> that might work but a widely used file system that keeps people's $HOME? 
> I don't think seriously destabilizing that for a potentially longer
> time is a good idea. There's the potential to break
> a lot of porcelain.
> 
> Probably you could do a ext3/ext4 like thing by starting
> with a "reiserfs3.5" copy and do the work there and then
> merge back once things work and have been reasonably verified
> by code review.



I fear nobody else than me will review it that deeply, which
limits the scalability of this plan.

We could make a new reiserfs version by duplicating the code
base. But nobody will test it. That would require to patch
mkreiserfs, waiting for distros to ship it, waiting for
users to ship the distros. Assuming at this time there
will be remaining users to set up new reiserfs partitions.

We could also have a reiserfs-no-bkl config option that
would pick the duplicated code base. Again I fear few people
will test it.


> 
> > That's the theory. Fitting into this strict scheme brings performance
> > regressions. The bkl is a spinlock, it disables preemption, it is
> > relaxed on sleep, and doesn't have locking dependencies. Moreover
> > it's not a lock but a simulation of a NO_PREEMPT UP flow, with all
> > the fixup guardians that come with (fixup if we schedule, as
> > scheduling brings races).
> > 
> > From the conversion is borned a mutex. Even though we have
> > adaptive spinning, we don't catch up spinlock performances
> > as it's not a pure optimized looping fast path, and it may
> > actually just sleep.
> 
> Fix the adaptive spinlock then?



Believe me, I've reviewed the mutex code several dozens of time.
I just fail to find weaknesses inside, especially in the adaptive
spinning code.

We just can not make it as fast as a spinlock fast path, as it needs
to do regular checks to ensure it can continue to spin.


> > 
> > The bkl is relaxed only when we sleep. Now simulating that with
> > a mutex that gets explicitly relaxed is not the same thing as
> > we need to relax the lock each time we _might_ sleep. It means
> > we relax more and that brings performance regressions.
> 
> At least in the cases where the decision is in reiserfs code
> directly you could predict it by using need_resched(), couldn't you?
>
> 
> That might not be 100% accurate, but good enough.



Sometimes I do. Sometimes it's just wasteful. We don't want to relax
the lock just because of a kmalloc(__GFP_NOFS).

Sometimes relaxing the lock even when we are going to schedule is not
something we want for performances.

 
> 
> > That said, if the general opinion is in favour of unmerging
> > the bkl removal changes in reiserfs. Then please do.
> 
> For me it seems too aggressive at this point.
> 
> If it was just a case of fixing a few known bugs, but
> if you're not even sure how many problems are left ...
> 
> Perhaps do the reiserfs35 variant?


As explained above, I think this just reschedule the problem
for later. This model won't have any testers and won't evolve.



> 
> > Just to express my point of view, as my primary goal is not
> > to fix reiserfs but the kernel: If you are afraid of such
> > changes, your kernel will just become mildewed by the time.
> 
> Better some mildew than a seriously-broken-for-enough people's 
> release (although I have my doubts that's the right metapher
> for the BKL anyways)
>
> Having stable releases is an important part for
> getting enough testers (we already have too little). And 
> if we start breaking their $HOMEs they might become 
> even less.


This is very unlikely to break their $HOME.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 20:11           ` Frederic Weisbecker
@ 2010-01-02 20:33             ` Andi Kleen
  2010-01-02 20:54               ` Frederic Weisbecker
                                 ` (2 more replies)
  2010-01-02 21:01             ` tytso
  1 sibling, 3 replies; 27+ messages in thread
From: Andi Kleen @ 2010-01-02 20:33 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andi Kleen, Linus Torvalds, LKML, Christian Kujau,
	Alexander Beregalov, Chris Mason, Ingo Molnar

On Sat, Jan 02, 2010 at 09:11:39PM +0100, Frederic Weisbecker wrote:
> I've never lost any datas since I began this work. And
> I run it every day. If I had experienced lock inversions,
> and sometimes soft lockups, I did not experienced serious
> damages. It's a journalized filesystem that can fixup the things
> pretty well.

So are you confident that 2.6.33 will not have regular soft-lockups
for reiserfs users?

> 
> Also we are talking about potential lock inversions, in potential
> rare path, that could potentially raise soft lockups. That makes
> a lot of potentials, for things that are going to be fixed and
> for which I've never seen serious damages.

A soft lockup is a problem. Perhaps not totally serious,
but a user who experiences them regularly would rightly consider
such a release very broken.

> We could make a new reiserfs version by duplicating the code
> base. But nobody will test it. That would require to patch
> mkreiserfs, waiting for distros to ship it, waiting for
> users to ship the distros. Assuming at this time there
> will be remaining users to set up new reiserfs partitions.

I suspect your estimates on how widely reiserfs is used 
are quite off. However as usual a large part of the user
base simply only uses what their distribution ships.

> We could also have a reiserfs-no-bkl config option that
> would pick the duplicated code base. Again I fear few people
> will test it.

That sounds reasonable, at least have a workaround if there
are too many problems.

> Sometimes I do. Sometimes it's just wasteful. We don't want to relax
> the lock just because of a kmalloc(__GFP_NOFS).

If that's the problem you can always split the allocation:
first try it with __GFP_NOWAIT without lock dropped, then
if that fails do again with full __GFP_NOFS and lock drop.

However it's hard to believe that a few instructions
more or less would make much difference; I would normally
expect any larger changes coming from changes IO 
patterns or cache line bouncing.


> > Better some mildew than a seriously-broken-for-enough people's 
> > release (although I have my doubts that's the right metapher
> > for the BKL anyways)
> >
> > Having stable releases is an important part for
> > getting enough testers (we already have too little). And 
> > if we start breaking their $HOMEs they might become 
> > even less.
> 
> 
> This is very unlikely to break their $HOME.

Well break access to their $HOME

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 20:33             ` Andi Kleen
@ 2010-01-02 20:54               ` Frederic Weisbecker
  2010-01-02 21:10               ` Ingo Molnar
  2010-01-02 21:42               ` Ingo Molnar
  2 siblings, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2010-01-02 20:54 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, LKML, Christian Kujau, Alexander Beregalov,
	Chris Mason, Ingo Molnar

On Sat, Jan 02, 2010 at 09:33:05PM +0100, Andi Kleen wrote:
> On Sat, Jan 02, 2010 at 09:11:39PM +0100, Frederic Weisbecker wrote:
> > I've never lost any datas since I began this work. And
> > I run it every day. If I had experienced lock inversions,
> > and sometimes soft lockups, I did not experienced serious
> > damages. It's a journalized filesystem that can fixup the things
> > pretty well.
> 
> So are you confident that 2.6.33 will not have regular soft-lockups
> for reiserfs users?



Yep.


 
> > 
> > Also we are talking about potential lock inversions, in potential
> > rare path, that could potentially raise soft lockups. That makes
> > a lot of potentials, for things that are going to be fixed and
> > for which I've never seen serious damages.
> 
> A soft lockup is a problem. Perhaps not totally serious,
> but a user who experiences them regularly would rightly consider
> such a release very broken.


If there will be softlockups, these will be rare.

The same risk apply to every new features in Linux (which
includes improvements in existing features).



> > We could make a new reiserfs version by duplicating the code
> > base. But nobody will test it. That would require to patch
> > mkreiserfs, waiting for distros to ship it, waiting for
> > users to ship the distros. Assuming at this time there
> > will be remaining users to set up new reiserfs partitions.
> 
> I suspect your estimates on how widely reiserfs is used 
> are quite off. However as usual a large part of the user
> base simply only uses what their distribution ships.


Sure, but few (none?) distributions continue to propose
reiserfs as a default.

I'm not telling there are no users anymore, I'm just telling
there won't be new users anymore.


> > We could also have a reiserfs-no-bkl config option that
> > would pick the duplicated code base. Again I fear few people
> > will test it.
> 
> That sounds reasonable, at least have a workaround if there
> are too many problems.


No, I'll be the only guy that is going to test it. This will
basically paralyze the progresses and once we'll merge it back
as the reiserfs 3 mainline, we'll have to face the same issues.

The only thing that makes this work progressing is the report done
by testers. Reviews on such a huge and complicated code base is
useful but is also strong in proving its own limits.


> > Sometimes I do. Sometimes it's just wasteful. We don't want to relax
> > the lock just because of a kmalloc(__GFP_NOFS).
> 
> If that's the problem you can always split the allocation:
> first try it with __GFP_NOWAIT without lock dropped, then
> if that fails do again with full __GFP_NOFS and lock drop.
> 
> However it's hard to believe that a few instructions
> more or less would make much difference; I would normally
> expect any larger changes coming from changes IO 
> patterns or cache line bouncing.


The problem is not there. Each sites where we are going to
sleep have to be treated as an isolated case. Should or shouldn't
we relax here? Depending on the case, it's sometimes wasteful to
relax even if we are going to block, sometimes it's useful to relax
even if it doesn't appear to be needed.

Only reviews proven by benchmarks can help. Which is what I did.
But blindly apply the "sleep if we are going to block on io or
something" is not always right.


> > > Better some mildew than a seriously-broken-for-enough people's 
> > > release (although I have my doubts that's the right metapher
> > > for the BKL anyways)
> > >
> > > Having stable releases is an important part for
> > > getting enough testers (we already have too little). And 
> > > if we start breaking their $HOMEs they might become 
> > > even less.
> > 
> > 
> > This is very unlikely to break their $HOME.
> 
> Well break access to their $HOME


Never experienced that.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 20:11           ` Frederic Weisbecker
  2010-01-02 20:33             ` Andi Kleen
@ 2010-01-02 21:01             ` tytso
  2010-01-02 21:06               ` Frederic Weisbecker
  1 sibling, 1 reply; 27+ messages in thread
From: tytso @ 2010-01-02 21:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andi Kleen, Linus Torvalds, LKML, Christian Kujau,
	Alexander Beregalov, Chris Mason, Ingo Molnar

On Sat, Jan 02, 2010 at 09:11:39PM +0100, Frederic Weisbecker wrote:
> 
> I've never lost any datas since I began this work. And
> I run it every day. If I had experienced lock inversions,
> and sometimes soft lockups, I did not experienced serious
> damages. It's a journalized filesystem that can fixup the things
> pretty well.

Have you tried using the xfsqa regression test suite?  Despite the
name, it will work on non-xfs filesystems (although there are some
XFS-specific tests in the test suite.)  Both the btrfs and ext4
developers use it to debug their file systems, and it's a good way of
stressing the file system in all sorts of different ways that might
not be seen during normal desktop usage.  I suspect it would be a good
way of flushing out potential problems for reiserfs as well.

Regards,

						- Ted

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 21:01             ` tytso
@ 2010-01-02 21:06               ` Frederic Weisbecker
  2010-01-02 23:36                 ` tytso
  0 siblings, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2010-01-02 21:06 UTC (permalink / raw)
  To: tytso
  Cc: Andi Kleen, Linus Torvalds, LKML, Christian Kujau,
	Alexander Beregalov, Chris Mason, Ingo Molnar

On Sat, Jan 02, 2010 at 04:01:01PM -0500, tytso@mit.edu wrote:
> On Sat, Jan 02, 2010 at 09:11:39PM +0100, Frederic Weisbecker wrote:
> > 
> > I've never lost any datas since I began this work. And
> > I run it every day. If I had experienced lock inversions,
> > and sometimes soft lockups, I did not experienced serious
> > damages. It's a journalized filesystem that can fixup the things
> > pretty well.
> 
> Have you tried using the xfsqa regression test suite?  Despite the
> name, it will work on non-xfs filesystems (although there are some
> XFS-specific tests in the test suite.)  Both the btrfs and ext4
> developers use it to debug their file systems, and it's a good way of
> stressing the file system in all sorts of different ways that might
> not be seen during normal desktop usage.  I suspect it would be a good
> way of flushing out potential problems for reiserfs as well.
> 
> Regards,


Thanks! I'm going to test it now. I've been running a stress test
from Chris Mason which basically checks races on parallel writes/read.

If this testsuite includes more checks, like xattr and some other
things, then that's exactly what I was searching.

I guess this is the right place to get it?

git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git

Thanks.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 20:33             ` Andi Kleen
  2010-01-02 20:54               ` Frederic Weisbecker
@ 2010-01-02 21:10               ` Ingo Molnar
  2010-01-02 21:42               ` Ingo Molnar
  2 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2010-01-02 21:10 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Frederic Weisbecker, Linus Torvalds, LKML, Christian Kujau,
	Alexander Beregalov, Chris Mason


* Andi Kleen <andi@firstfloor.org> wrote:

> > We could also have a reiserfs-no-bkl config option that would pick the 
> > duplicated code base. Again I fear few people will test it.
> 
> That sounds reasonable, at least have a workaround if there are too many 
> problems.

Uhm, i think Frederic meant that a bit mockingly. Judging on past experience 
related to the BKL, introducing such a config option is one of the worst 
options possible. IMHO you are giving pretty bad unsolicited advice here.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 20:33             ` Andi Kleen
  2010-01-02 20:54               ` Frederic Weisbecker
  2010-01-02 21:10               ` Ingo Molnar
@ 2010-01-02 21:42               ` Ingo Molnar
  2 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2010-01-02 21:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Frederic Weisbecker, Linus Torvalds, LKML, Christian Kujau,
	Alexander Beregalov, Chris Mason


* Andi Kleen <andi@firstfloor.org> wrote:

> On Sat, Jan 02, 2010 at 09:11:39PM +0100, Frederic Weisbecker wrote:
> > I've never lost any datas since I began this work. And
> > I run it every day. If I had experienced lock inversions,
> > and sometimes soft lockups, I did not experienced serious
> > damages. It's a journalized filesystem that can fixup the things
> > pretty well.
> 
> So are you confident that 2.6.33 will not have regular soft-lockups for 
> reiserfs users?

Well, are you confident that your hardware-poison changes in v2.6.33 will not 
have any user triggerable bugs? It's a pretty silly question IMO because you 
can never be 100% confident.

What matters is for the changes to be sane and clean, for there to be no known 
regression and for any reported bugs to be fixed speedily. Which Frederic is 
doing in an utmost excellent way ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 17:43     ` reiserfs broken in 2.6.32 was " Andi Kleen
  2010-01-02 19:02       ` Frederic Weisbecker
  2010-01-02 20:11       ` Ingo Molnar
@ 2010-01-02 22:18       ` Ingo Molnar
  2 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2010-01-02 22:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Frederic Weisbecker, Linus Torvalds, LKML, Christian Kujau,
	Alexander Beregalov, Chris Mason


* Andi Kleen <andi@firstfloor.org> wrote:

> > I only have reiserfs partitions in my laptop and my testbox,
> > nothing else. And that because I'm now maintaining it de facto.
> 
> AFAIK it's widely used in SUSE installations. It was the default for a long 
> time.
> 
> And right now as in 2.6.32 it's in a state of "may randomly 
> explode/deadlock". And no clear path out of it. Not good.

Btw. that's quite weird as nothing of substance happened in fs/reiserfs in 
v2.6.32:

 $ git shortlog v2.6.31..v2.6.32 fs/reiserfs/
 Alexey Dobriyan (2):
       const: make struct super_block::dq_op const
       const: make struct super_block::s_qcop const

( There might be VFS core interactions perhaps - but nothing i've seen seems
  to correlate with your claims and that's quite weird. )

There's always reports about filesystem corruptions (file data cache is 
statistically the most likely thing to get corrupted by borderline hw failures 
such as RAM failures or DMA corruptions), but i've seen nothing out of the 
ordinary for v2.6.32 with reiserfs. (and the v2.6.33-rc1 bits are not yet 
merged in distro kernels)

So please substantiate your claims: URLs, distro bugzilla entries, etc.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 21:06               ` Frederic Weisbecker
@ 2010-01-02 23:36                 ` tytso
  2010-01-02 23:43                   ` Christian Kujau
  2010-01-03  1:52                   ` Frederic Weisbecker
  0 siblings, 2 replies; 27+ messages in thread
From: tytso @ 2010-01-02 23:36 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andi Kleen, Linus Torvalds, LKML, Christian Kujau,
	Alexander Beregalov, Chris Mason, Ingo Molnar

On Sat, Jan 02, 2010 at 10:06:55PM +0100, Frederic Weisbecker wrote:
> 
> Thanks! I'm going to test it now. I've been running a stress test
> from Chris Mason which basically checks races on parallel writes/read.

Hmm, which test is this?

> If this testsuite includes more checks, like xattr and some other
> things, then that's exactly what I was searching.

Yes, quite a bit more than that.  One such test (which is used by
xfsqa test) is the fsstress proram, which is quite flexible.  You can
program different combinations of fallocate, direct I/O read/writes,
setxattr, buffered read/writes, symlinks, truncates, renames, etc..
The xfsqa suite will run fsstress in a number of different modes, but
that's not the only test program that it uses.  It also uses the fsx
program which exercises concurrent read/write/mmap operations, as well
as other programs to test acl support, noatime support, etc.

I make a point of running the regression test suite before pushing a
patch series to Linus; it makes me far more comfortable than I haven't
accidentally introduced some problem.

> I guess this is the right place to get it?
> 
> git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git

Yep.  You'll need to install a number of packages to compile it,
including libaio-dev, libattr1-dev, libacl1-dev, xfsprogs,
xfslibs-dev, etc.

						- Ted

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 23:36                 ` tytso
@ 2010-01-02 23:43                   ` Christian Kujau
  2010-01-03  1:16                     ` Frederic Weisbecker
  2010-01-03  1:52                   ` Frederic Weisbecker
  1 sibling, 1 reply; 27+ messages in thread
From: Christian Kujau @ 2010-01-02 23:43 UTC (permalink / raw)
  To: tytso, Frederic Weisbecker, Andi Kleen, Linus Torvalds, LKML,
	Alexander Beregalov, Chris Mason, Ingo Molnar

tytso@mit.edu wrote on 2010-01-02 15:36 :
>> Thanks! I'm going to test it now. I've been running a stress test
>> from Chris Mason which basically checks races on parallel writes/read.
>
> Hmm, which test is this?

http://oss.oracle.com/~mason/stress.sh, IIRC.

Christian.
-- 
make bzImage, not war

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 23:43                   ` Christian Kujau
@ 2010-01-03  1:16                     ` Frederic Weisbecker
  0 siblings, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2010-01-03  1:16 UTC (permalink / raw)
  To: Christian Kujau
  Cc: tytso, Andi Kleen, Linus Torvalds, LKML, Alexander Beregalov,
	Chris Mason, Ingo Molnar

On Sat, Jan 02, 2010 at 03:43:52PM -0800, Christian Kujau wrote:
> tytso@mit.edu wrote on 2010-01-02 15:36 :
>>> Thanks! I'm going to test it now. I've been running a stress test
>>> from Chris Mason which basically checks races on parallel writes/read.
>>
>> Hmm, which test is this?
>
> http://oss.oracle.com/~mason/stress.sh, IIRC.
>
> Christian.


That's it! It's simple script that runs concurrent copies of
a directory and that checks the coherency of the result.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-02 23:36                 ` tytso
  2010-01-02 23:43                   ` Christian Kujau
@ 2010-01-03  1:52                   ` Frederic Weisbecker
  2010-01-03  2:05                     ` Christian Kujau
  1 sibling, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2010-01-03  1:52 UTC (permalink / raw)
  To: tytso, Andi Kleen, Linus Torvalds, LKML, Christian Kujau,
	Alexander Beregalov, Chris Mason, Ingo Molnar

On Sat, Jan 02, 2010 at 06:36:03PM -0500, tytso@mit.edu wrote:
> Yes, quite a bit more than that.  One such test (which is used by
> xfsqa test) is the fsstress proram, which is quite flexible.  You can
> program different combinations of fallocate, direct I/O read/writes,
> setxattr, buffered read/writes, symlinks, truncates, renames, etc..
>
> The xfsqa suite will run fsstress in a number of different modes, but
> that's not the only test program that it uses.  It also uses the fsx
> program which exercises concurrent read/write/mmap operations, as well
> as other programs to test acl support, noatime support, etc.
> 
> I make a point of running the regression test suite before pushing a
> patch series to Linus; it makes me far more comfortable than I haven't
> accidentally introduced some problem.



Hmm, I'm trying to play with it, and some things are not obvious to
me. I can launch fsstress directly, works well, but as I would
like to run all possible tests, I'm trying the check script, as
explained in the README file.

So I've set the partition and ran the script:

export TEST_DEV=/dev/sda3
./check -r

But:

[: 53: ==: unexpected operator
common.rc: Error: $TEST_DEV (/dev/sda3) is not a MOUNTED xfs filesystem
Filesystem    Type   1K-blocks      Used Available Use% Mounted on
/dev/sda3 reiserfs   134797288   2982296 131814992   3% /data


I'm not sure how I can run these tests on a non-xfs partitions.
I must be missing something.

 
> > I guess this is the right place to get it?
> > 
> > git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> 
> Yep.  You'll need to install a number of packages to compile it,
> including libaio-dev, libattr1-dev, libacl1-dev, xfsprogs,
> xfslibs-dev, etc.


Thanks.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-03  1:52                   ` Frederic Weisbecker
@ 2010-01-03  2:05                     ` Christian Kujau
  2010-01-03  3:27                       ` tytso
  0 siblings, 1 reply; 27+ messages in thread
From: Christian Kujau @ 2010-01-03  2:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: tytso, Andi Kleen, Linus Torvalds, LKML, Alexander Beregalov,
	Chris Mason, Ingo Molnar

On Sun, 3 Jan 2010 at 02:52, Frederic Weisbecker wrote:
> [: 53: ==: unexpected operator
> common.rc: Error: $TEST_DEV (/dev/sda3) is not a MOUNTED xfs filesystem

Yeah, I'm playing around with xfstests as well, but apparently they're 
assuming !/bin/sh will be run under /bin/bash, which is not always the 
case. A short fix is to link /bin/sh to /bin/bash, but perhaps some of the 
tests can be tweaked to run under /bin/sh as well.

> I'm not sure how I can run these tests on a non-xfs partitions.
> I must be missing something.

I haven't found an easy way to do this yet without rewriting a few 
routines (mkfs, mount, etc...). As Ted is already using xfstests for 
btrfs, ext4, maybe he wants to share his magic? :-)

Christian.
-- 
BOFH excuse #47:

Complete Transient Lockout

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-03  2:05                     ` Christian Kujau
@ 2010-01-03  3:27                       ` tytso
  2010-01-04 20:20                         ` Frederic Weisbecker
  0 siblings, 1 reply; 27+ messages in thread
From: tytso @ 2010-01-03  3:27 UTC (permalink / raw)
  To: Christian Kujau
  Cc: Frederic Weisbecker, Andi Kleen, Linus Torvalds, LKML,
	Alexander Beregalov, Chris Mason, Ingo Molnar

On Sat, Jan 02, 2010 at 06:05:13PM -0800, Christian Kujau wrote:
> On Sun, 3 Jan 2010 at 02:52, Frederic Weisbecker wrote:
> > [: 53: ==: unexpected operator
> > common.rc: Error: $TEST_DEV (/dev/sda3) is not a MOUNTED xfs filesystem
> 
> Yeah, I'm playing around with xfstests as well, but apparently they're 
> assuming !/bin/sh will be run under /bin/bash, which is not always the 
> case. A short fix is to link /bin/sh to /bin/bash, but perhaps some of the 
> tests can be tweaked to run under /bin/sh as well.
> 
> > I'm not sure how I can run these tests on a non-xfs partitions.
> > I must be missing something.
> 
> I haven't found an easy way to do this yet without rewriting a few 
> routines (mkfs, mount, etc...). As Ted is already using xfstests for 
> btrfs, ext4, maybe he wants to share his magic? :-)

I'm using it for ext4.  It looks like someone has already tried using
the xfstests with reiserfs; take a look at common.rc; you'll see case
statements for xfs, udf, nfs, ext2/3/4, reiserfs, and gfs2.  Someone
who wants to use xfstests for some other file system may need to
further edit common.rc.  I thought the btrfs folks were using it as
well, but at least the kernel.org git tree for xfstests doesn't seem
to show any btrfs references in common.rc, so perhaps I'm wrong about
btrfs developers using xfstests (or they haven't sent their patches
back upstream).  I'm not sure how well tested the reiserfs support is,
so you may need to edit common.rc as necessary.

In any case, the README file has pretty much what you need.  I
personally run my test kernels using KVM, and I have a run-test script
which invokes check as follows:

#!/bin/sh
export TEST_DEV=/dev/sdb
export TEST_DIR=/test
export SCRATCH_DEV=/dev/sdc1
export SCRATCH_MNT=/scratch
export EXT_MOUNT_OPTIONS="-o block_validity"
exec ./check -ext4 $*

/dev/sdb is an ext4-formated filesystem, which you're supposed to not
reformat from run to run, so that some testing can be done with an
"aged" file system.  The scratch filesystem will be reformatted for
various tests, so you shouldn't keep anything valueable on it.

I also have a 1k block file system on /dev/sdd, so I invoke check as
follows to check to make sure things work when blocksize != pagesize:

#!/bin/sh
export TEST_DEV=/dev/sdd
export TEST_DIR=/test-1k
export SCRATCH_DEV=/dev/sdc1
export SCRATCH_MNT=/scratch
export MKFS_OPTIONS="-b 1024"
exec ./check -ext4 $*

As far as the inconsistency between TEST_DIR versus SCRATCH_MNT ---
all I can say is, the XFS engineers who threw together the xfstests
scripts may have been very good file system engineers, but they
obviously weren't very well used as UI/User Experience designers.  :-)
(The documentation isn't all that great either, but patches sent to
xfs@oss.sgi.com do tend to be accepted and merged into the kernel.org
tree if they are clean.)

Hope this helps,

						- Ted

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reiserfs broken in 2.6.32 was Re: [GIT PULL] reiserfs fixes
  2010-01-03  3:27                       ` tytso
@ 2010-01-04 20:20                         ` Frederic Weisbecker
  0 siblings, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2010-01-04 20:20 UTC (permalink / raw)
  To: tytso
  Cc: Christian Kujau, Andi Kleen, Linus Torvalds, LKML,
	Alexander Beregalov, Chris Mason, Ingo Molnar

On Sat, Jan 02, 2010 at 10:27:58PM -0500, tytso@mit.edu wrote:
> On Sat, Jan 02, 2010 at 06:05:13PM -0800, Christian Kujau wrote:
> > On Sun, 3 Jan 2010 at 02:52, Frederic Weisbecker wrote:
> > > [: 53: ==: unexpected operator
> > > common.rc: Error: $TEST_DEV (/dev/sda3) is not a MOUNTED xfs filesystem
> > 
> > Yeah, I'm playing around with xfstests as well, but apparently they're 
> > assuming !/bin/sh will be run under /bin/bash, which is not always the 
> > case. A short fix is to link /bin/sh to /bin/bash, but perhaps some of the 
> > tests can be tweaked to run under /bin/sh as well.
> > 
> > > I'm not sure how I can run these tests on a non-xfs partitions.
> > > I must be missing something.
> > 
> > I haven't found an easy way to do this yet without rewriting a few 
> > routines (mkfs, mount, etc...). As Ted is already using xfstests for 
> > btrfs, ext4, maybe he wants to share his magic? :-)
> 
> I'm using it for ext4.  It looks like someone has already tried using
> the xfstests with reiserfs; take a look at common.rc; you'll see case
> statements for xfs, udf, nfs, ext2/3/4, reiserfs, and gfs2.  Someone
> who wants to use xfstests for some other file system may need to
> further edit common.rc.  I thought the btrfs folks were using it as
> well, but at least the kernel.org git tree for xfstests doesn't seem
> to show any btrfs references in common.rc, so perhaps I'm wrong about
> btrfs developers using xfstests (or they haven't sent their patches
> back upstream).  I'm not sure how well tested the reiserfs support is,
> so you may need to edit common.rc as necessary.
> 
> In any case, the README file has pretty much what you need.  I
> personally run my test kernels using KVM, and I have a run-test script
> which invokes check as follows:
> 
> #!/bin/sh
> export TEST_DEV=/dev/sdb
> export TEST_DIR=/test
> export SCRATCH_DEV=/dev/sdc1
> export SCRATCH_MNT=/scratch
> export EXT_MOUNT_OPTIONS="-o block_validity"
> exec ./check -ext4 $*
> 
> /dev/sdb is an ext4-formated filesystem, which you're supposed to not
> reformat from run to run, so that some testing can be done with an
> "aged" file system.  The scratch filesystem will be reformatted for
> various tests, so you shouldn't keep anything valueable on it.
> 
> I also have a 1k block file system on /dev/sdd, so I invoke check as
> follows to check to make sure things work when blocksize != pagesize:
> 
> #!/bin/sh
> export TEST_DEV=/dev/sdd
> export TEST_DIR=/test-1k
> export SCRATCH_DEV=/dev/sdc1
> export SCRATCH_MNT=/scratch
> export MKFS_OPTIONS="-b 1024"
> exec ./check -ext4 $*



Ok. Thanks a lot for all these details. It seems to be working
now that I have edited some shebang and replaced /bin/sh by
/bin/bash, like Christopher suggested.


 
> As far as the inconsistency between TEST_DIR versus SCRATCH_MNT ---
> all I can say is, the XFS engineers who threw together the xfstests
> scripts may have been very good file system engineers, but they
> obviously weren't very well used as UI/User Experience designers.  :-)



Agreed :)



> (The documentation isn't all that great either, but patches sent to
> xfs@oss.sgi.com do tend to be accepted and merged into the kernel.org
> tree if they are clean.)
> 
> Hope this helps,


Yeah, thanks!


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [GIT PULL] reiserfs fixes
@ 2010-02-14 18:14 Frederic Weisbecker
  0 siblings, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2010-02-14 18:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Frederic Weisbecker, Alexander Beregalov, Christian Kujau,
	Chris Mason

Linus,

Please pull the reiserfs/kill-bkl branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	reiserfs/kill-bkl

Thanks,
	Frederic
---

Frederic Weisbecker (1):
      reiserfs: Fix softlockup while waiting on an inode


 fs/reiserfs/inode.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

---
commit 175359f89df39f4faed663c8cfd6ee0222d2fa1e
Author: Frederic Weisbecker <fweisbec@gmail.com>
Date:   Thu Feb 11 13:13:10 2010 +0100

    reiserfs: Fix softlockup while waiting on an inode
    
    When we wait for an inode through reiserfs_iget(), we hold
    the reiserfs lock. And waiting for an inode may imply waiting
    for its writeback. But the inode writeback path may also require
    the reiserfs lock, which leads to a deadlock.
    
    We just need to release the reiserfs lock from reiserfs_iget()
    to fix this.
    
    Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
    Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    Tested-by: Christian Kujau <lists@nerdbynature.de>
    Cc: Chris Mason <chris.mason@oracle.com>

diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 9087b10..2df0f5c 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -1497,9 +1497,11 @@ struct inode *reiserfs_iget(struct super_block *s, const struct cpu_key *key)
 
 	args.objectid = key->on_disk_key.k_objectid;
 	args.dirid = key->on_disk_key.k_dir_id;
+	reiserfs_write_unlock(s);
 	inode = iget5_locked(s, key->on_disk_key.k_objectid,
 			     reiserfs_find_actor, reiserfs_init_locked_inode,
 			     (void *)(&args));
+	reiserfs_write_lock(s);
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2010-02-14 18:14 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-01-02  1:27 [GIT PULL] reiserfs fixes Frederic Weisbecker
2010-01-02 13:41 ` Andi Kleen
2010-01-02 16:36   ` Frederic Weisbecker
2010-01-02 17:43     ` reiserfs broken in 2.6.32 was " Andi Kleen
2010-01-02 19:02       ` Frederic Weisbecker
2010-01-02 19:23         ` Andi Kleen
2010-01-02 20:11           ` Frederic Weisbecker
2010-01-02 20:33             ` Andi Kleen
2010-01-02 20:54               ` Frederic Weisbecker
2010-01-02 21:10               ` Ingo Molnar
2010-01-02 21:42               ` Ingo Molnar
2010-01-02 21:01             ` tytso
2010-01-02 21:06               ` Frederic Weisbecker
2010-01-02 23:36                 ` tytso
2010-01-02 23:43                   ` Christian Kujau
2010-01-03  1:16                     ` Frederic Weisbecker
2010-01-03  1:52                   ` Frederic Weisbecker
2010-01-03  2:05                     ` Christian Kujau
2010-01-03  3:27                       ` tytso
2010-01-04 20:20                         ` Frederic Weisbecker
2010-01-02 20:11       ` Ingo Molnar
2010-01-02 22:18       ` Ingo Molnar
2010-01-02 19:19 ` Linus Torvalds
2010-01-02 19:21   ` Linus Torvalds
2010-01-02 19:24     ` Frederic Weisbecker
2010-01-02 19:22   ` Frederic Weisbecker
  -- strict thread matches above, loose matches on Subject: below --
2010-02-14 18:14 Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox