public inbox for linux-btrfs@vger.kernel.org
* btrfs failing fsx-linux
@ 2009-08-18 17:26 Nick Piggin
  2009-08-18 17:59 ` Josef Bacik
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Nick Piggin @ 2009-08-18 17:26 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

Hi Chris,

Don't know if this is a known issue, but I have btrfs (after my previous
inode_tree fixup patch) failing fsx-linux in Linus's current git tree.

This makes it a bit hard for me to test my btrfs truncate conversion patch
unfortunately, though it does seem pretty stable so I will probably just
send it out anyway.

Anyway, just a heads-up. Oh, the way I reproduce is to create btrfs
on 1GB rd, run 4 instances of fsx-linux on different files in the root
directory, and run `while true ; do sync ; echo 3 > /proc/sys/vm/drop_caches;
done` at the same time.

Thanks,
Nick
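
The reproduction described above could be scripted roughly as follows. This is only a sketch: the device path, mount point, and file names are assumptions (the report says only "1GB rd" and "different files in the root directory"), and fsx-linux is run with no options because none are given. It prints the commands by default and runs them only when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the reported reproduction. DEV, MNT and the file names are
# assumptions; the original report does not specify them.
DEV=${DEV:-/dev/ram0}        # assumed: a 1GB ramdisk already exists here
MNT=${MNT:-/mnt/btrfs-test}  # assumed mount point
NR_FSX=${NR_FSX:-4}          # four fsx-linux instances, as in the report
DRY_RUN=${DRY_RUN:-1}        # default: only print what would be run

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

setup_fs() {
    run mkfs.btrfs "$DEV"
    run mkdir -p "$MNT"
    run mount "$DEV" "$MNT"
}

start_fsx() {
    # Each instance works on its own file in the filesystem root.
    i=1
    while [ "$i" -le "$NR_FSX" ]; do
        echo "+ fsx-linux $MNT/fsx$i &"
        if [ "$DRY_RUN" = "0" ]; then fsx-linux "$MNT/fsx$i" & fi
        i=$((i + 1))
    done
}

sync_and_drop_caches() {
    # The loop from the report: force writeback, then drop the caches.
    echo "+ while true; do sync; echo 3 > /proc/sys/vm/drop_caches; done"
    if [ "$DRY_RUN" = "0" ]; then
        while true; do
            sync
            echo 3 > /proc/sys/vm/drop_caches
        done
    fi
}

setup_fs
start_fsx
sync_and_drop_caches
```

With DRY_RUN=0 the script needs root, and the final loop runs until interrupted, so the fsx-linux instances have to be watched (and killed) separately.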



* Re: btrfs failing fsx-linux
  2009-08-18 17:26 btrfs failing fsx-linux Nick Piggin
@ 2009-08-18 17:59 ` Josef Bacik
  2009-08-19  8:50   ` Nick Piggin
  2009-09-02 17:14   ` Nick Piggin
  2009-09-09 13:13 ` Chris Mason
  2009-09-11 19:36 ` Chris Mason
  2 siblings, 2 replies; 13+ messages in thread
From: Josef Bacik @ 2009-08-18 17:59 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Chris Mason, linux-btrfs

On Tue, Aug 18, 2009 at 07:26:41PM +0200, Nick Piggin wrote:
> Hi Chris,
> 
> Don't know if this is a known issue, but I have btrfs (after my previous
> inode_tree fixup patch) failing fsx-linux in Linus's current git tree.
> 
> This makes it a bit hard for me to test my btrfs truncate conversion patch
> unfortunately, though it does seem pretty stable so I will probably just
> send it out anyway.
> 
> Anyway, just a heads-up. Oh, the way I reproduce is to create btrfs
> on 1GB rd, run 4 instances of fsx-linux on different files in the root
> directory, and run `while true ; do sync ; echo 3 > /proc/sys/vm/drop_caches;
> done` at the same time.
> 

Nick,

I'll take a look at this later today and see what I can come up with.  Thanks
for reporting it.

Josef


* Re: btrfs failing fsx-linux
  2009-08-18 17:59 ` Josef Bacik
@ 2009-08-19  8:50   ` Nick Piggin
  2009-09-02 17:14   ` Nick Piggin
  1 sibling, 0 replies; 13+ messages in thread
From: Nick Piggin @ 2009-08-19  8:50 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Chris Mason, linux-btrfs

On Tue, Aug 18, 2009 at 01:59:50PM -0400, Josef Bacik wrote:
> On Tue, Aug 18, 2009 at 07:26:41PM +0200, Nick Piggin wrote:
> > Hi Chris,
> > 
> > Don't know if this is a known issue, but I have btrfs (after my previous
> > inode_tree fixup patch) failing fsx-linux in Linus's current git tree.
> > 
> > This makes it a bit hard for me to test my btrfs truncate conversion patch
> > unfortunately, though it does seem pretty stable so I will probably just
> > send it out anyway.
> > 
> > Anyway, just a heads-up. Oh, the way I reproduce is to create btrfs
> > on 1GB rd, run 4 instances of fsx-linux on different files in the root
> > directory, and run `while true ; do sync ; echo 3 > /proc/sys/vm/drop_caches;
> > done` at the same time.
> > 
> 
> Nick,
> 
> I'll take a look at this later today and see what I can come up with.  Thanks
> for reporting it.
> 
> Josef

Thanks. After spending a good chunk of my life tracking down fsx-linux
problems in fsblock and mm code, I can't say I envy you :) It's a great
test to run though (especially syncing and dropping caches because that
hits writeback and bypasses caches far far more than fsx ever does
on its own).

Let me know if you need me to test any patches.



* Re: btrfs failing fsx-linux
  2009-08-18 17:59 ` Josef Bacik
  2009-08-19  8:50   ` Nick Piggin
@ 2009-09-02 17:14   ` Nick Piggin
  2009-09-02 17:29     ` Josef Bacik
  2009-09-02 19:07     ` Chris Mason
  1 sibling, 2 replies; 13+ messages in thread
From: Nick Piggin @ 2009-09-02 17:14 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Chris Mason, linux-btrfs

On Tue, Aug 18, 2009 at 01:59:50PM -0400, Josef Bacik wrote:
> On Tue, Aug 18, 2009 at 07:26:41PM +0200, Nick Piggin wrote:
> > Hi Chris,
> > 
> > Don't know if this is a known issue, but I have btrfs (after my previous
> > inode_tree fixup patch) failing fsx-linux in Linus's current git tree.
> > 
> > This makes it a bit hard for me to test my btrfs truncate conversion patch
> > unfortunately, though it does seem pretty stable so I will probably just
> > send it out anyway.
> > 
> > Anyway, just a heads-up. Oh, the way I reproduce is to create btrfs
> > on 1GB rd, run 4 instances of fsx-linux on different files in the root
> > directory, and run `while true ; do sync ; echo 3 > /proc/sys/vm/drop_caches;
> > done` at the same time.
> > 
> 
> Nick,
> 
> I'll take a look at this later today and see what I can come up with.  Thanks
> for reporting it.

Any progress with this? Were you able to reproduce?



* Re: btrfs failing fsx-linux
  2009-09-02 17:14   ` Nick Piggin
@ 2009-09-02 17:29     ` Josef Bacik
  2009-09-02 19:07     ` Chris Mason
  1 sibling, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2009-09-02 17:29 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Josef Bacik, Chris Mason, linux-btrfs

On Wed, Sep 02, 2009 at 07:14:50PM +0200, Nick Piggin wrote:
> On Tue, Aug 18, 2009 at 01:59:50PM -0400, Josef Bacik wrote:
> > On Tue, Aug 18, 2009 at 07:26:41PM +0200, Nick Piggin wrote:
> > > Hi Chris,
> > > 
> > > Don't know if this is a known issue, but I have btrfs (after my previous
> > > inode_tree fixup patch) failing fsx-linux in Linus's current git tree.
> > > 
> > > This makes it a bit hard for me to test my btrfs truncate conversion patch
> > > unfortunately, though it does seem pretty stable so I will probably just
> > > send it out anyway.
> > > 
> > > Anyway, just a heads-up. Oh, the way I reproduce is to create btrfs
> > > on 1GB rd, run 4 instances of fsx-linux on different files in the root
> > > directory, and run `while true ; do sync ; echo 3 > /proc/sys/vm/drop_caches;
> > > done` at the same time.
> > > 
> > 
> > Nick,
> > 
> > I'll take a look at this later today and see what I can come up with.  Thanks
> > for reporting it.
> 
> Any progress with this? Were you able to reproduce?
> 

Err, sorry, no. I got distracted with my -ENOSPC stuff; it's taking longer than I
expected it to.  I hope to wrap it up today and take a look at this next.  I was
able to reproduce the problem on the experimental branch of Chris's tree, so
it's not something that has been fixed yet.  Thanks,

Josef


* Re: btrfs failing fsx-linux
  2009-09-02 17:14   ` Nick Piggin
  2009-09-02 17:29     ` Josef Bacik
@ 2009-09-02 19:07     ` Chris Mason
  2009-09-03  7:10       ` Nick Piggin
  1 sibling, 1 reply; 13+ messages in thread
From: Chris Mason @ 2009-09-02 19:07 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Josef Bacik, linux-btrfs

On Wed, Sep 02, 2009 at 07:14:50PM +0200, Nick Piggin wrote:
> On Tue, Aug 18, 2009 at 01:59:50PM -0400, Josef Bacik wrote:
> > On Tue, Aug 18, 2009 at 07:26:41PM +0200, Nick Piggin wrote:
> > > Hi Chris,
> > > 
> > > Don't know if this is a known issue, but I have btrfs (after my previous
> > > inode_tree fixup patch) failing fsx-linux in Linus's current git tree.
> > > 
> > > This makes it a bit hard for me to test my btrfs truncate conversion patch
> > > unfortunately, though it does seem pretty stable so I will probably just
> > > send it out anyway.
> > > 
> > > Anyway, just a heads-up. Oh, the way I reproduce is to create btrfs
> > > on 1GB rd, run 4 instances of fsx-linux on different files in the root
> > > directory, and run `while true ; do sync ; echo 3 > /proc/sys/vm/drop_caches;
> > > done` at the same time.
> > > 
> > 
> > Nick,
> > 
> > I'll take a look at this later today and see what I can come up with.  Thanks
> > for reporting it.
> 
> Any progress with this? Were you able to reproduce?
> 

I thought this was where the rb tree patch came from, sorry.  I'll try
to reproduce.

-chris



* Re: btrfs failing fsx-linux
  2009-09-02 19:07     ` Chris Mason
@ 2009-09-03  7:10       ` Nick Piggin
  0 siblings, 0 replies; 13+ messages in thread
From: Nick Piggin @ 2009-09-03  7:10 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, linux-btrfs

On Wed, Sep 02, 2009 at 03:07:21PM -0400, Chris Mason wrote:
> On Wed, Sep 02, 2009 at 07:14:50PM +0200, Nick Piggin wrote:
> > On Tue, Aug 18, 2009 at 01:59:50PM -0400, Josef Bacik wrote:
> > > I'll take a look at this later today and see what I can come up with.  Thanks
> > > for reporting it.
> > 
> > Any progress with this? Were you able to reproduce?
> > 
> 
> I thought this was where the rb tree patch came from, sorry.  I'll try
> to reproduce.

Oh, I must not have been clear: I first hit the rbtree bug, then when
that was fixed, the same workload was eventually showing up fsx-linux
failures (but no more kernel crashes).



* Re: btrfs failing fsx-linux
  2009-08-18 17:26 btrfs failing fsx-linux Nick Piggin
  2009-08-18 17:59 ` Josef Bacik
@ 2009-09-09 13:13 ` Chris Mason
  2009-09-09 13:34   ` Nick Piggin
  2009-09-11 19:36 ` Chris Mason
  2 siblings, 1 reply; 13+ messages in thread
From: Chris Mason @ 2009-09-09 13:13 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-btrfs

On Tue, Aug 18, 2009 at 07:26:41PM +0200, Nick Piggin wrote:
> Hi Chris,
> 
> Don't know if this is a known issue, but I have btrfs (after my previous
> inode_tree fixup patch) failing fsx-linux in Linus's current git tree.
> 
> This makes it a bit hard for me to test my btrfs truncate conversion patch
> unfortunately, though it does seem pretty stable so I will probably just
> send it out anyway.
> 
> Anyway, just a heads-up. Oh, the way I reproduce is to create btrfs
> on 1GB rd, run 4 instances of fsx-linux on different files in the root
> directory, and run `while true ; do sync ; echo 3 > /proc/sys/vm/drop_caches;
> done` at the same time.

Just an update, I think I've tracked this down.  Our fixup code for when
set_page_dirty was called without page_mkwrite wasn't catching every
case.  I'm hitting other problems because I'm testing on my new
performance code, but hopefully this will all be nailed down soon.

-chris



* Re: btrfs failing fsx-linux
  2009-09-09 13:13 ` Chris Mason
@ 2009-09-09 13:34   ` Nick Piggin
  2009-09-09 13:37     ` Chris Mason
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2009-09-09 13:34 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

On Wed, Sep 09, 2009 at 09:13:49AM -0400, Chris Mason wrote:
> On Tue, Aug 18, 2009 at 07:26:41PM +0200, Nick Piggin wrote:
> > Hi Chris,
> > 
> > Don't know if this is a known issue, but I have btrfs (after my previous
> > inode_tree fixup patch) failing fsx-linux in Linus's current git tree.
> > 
> > This makes it a bit hard for me to test my btrfs truncate conversion patch
> > unfortunately, though it does seem pretty stable so I will probably just
> > send it out anyway.
> > 
> > Anyway, just a heads-up. Oh, the way I reproduce is to create btrfs
> > on 1GB rd, run 4 instances of fsx-linux on different files in the root
> > directory, and run `while true ; do sync ; echo 3 > /proc/sys/vm/drop_caches;
> > done` at the same time.
> 
> Just an update, I think I've tracked this down.  Our fixup code for when
> set_page_dirty was called without page_mkwrite wasn't catching every
> case.  I'm hitting other problems because I'm testing on my new
> performance code, but hopefully this will all be nailed down soon.

In which cases does set_page_dirty happen without page_mkwrite? For fsx-linux
I hope there shouldn't be any... but one issue with btrfs's page_mkwrite
is that it still unlocks the page before page_mkwrite returns so if it
is anything like other filesystems, the page might get written out
between the page_mkwrite and the page fault's subsequent set_page_dirty.

So if that race is hitting, it might appear like set_page_dirty is
being called without a page_mkwrite.



* Re: btrfs failing fsx-linux
  2009-09-09 13:34   ` Nick Piggin
@ 2009-09-09 13:37     ` Chris Mason
  2009-09-09 13:43       ` Nick Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Mason @ 2009-09-09 13:37 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-btrfs

On Wed, Sep 09, 2009 at 03:34:35PM +0200, Nick Piggin wrote:
> On Wed, Sep 09, 2009 at 09:13:49AM -0400, Chris Mason wrote:
> > On Tue, Aug 18, 2009 at 07:26:41PM +0200, Nick Piggin wrote:
> > > Hi Chris,
> > > 
> > > Don't know if this is a known issue, but I have btrfs (after my previous
> > > inode_tree fixup patch) failing fsx-linux in Linus's current git tree.
> > > 
> > > This makes it a bit hard for me to test my btrfs truncate conversion patch
> > > unfortunately, though it does seem pretty stable so I will probably just
> > > send it out anyway.
> > > 
> > > Anyway, just a heads-up. Oh, the way I reproduce is to create btrfs
> > > on 1GB rd, run 4 instances of fsx-linux on different files in the root
> > > directory, and run `while true ; do sync ; echo 3 > /proc/sys/vm/drop_caches;
> > > done` at the same time.
> > 
> > Just an update, I think I've tracked this down.  Our fixup code for when
> > set_page_dirty was called without page_mkwrite wasn't catching every
> > case.  I'm hitting other problems because I'm testing on my new
> > performance code, but hopefully this will all be nailed down soon.
> 
> In which cases does set_page_dirty happen without page_mkwrite? For fsx-linux
> I hope there shouldn't be any...

I've been assuming it's the zap_pte_range part, but now I have enough
code around it to make stack traces.

> but one issue with btrfs's page_mkwrite
> is that it still unlocks the page before page_mkwrite returns so if it
> is anything like other filesystems, the page might get written out
> between the page_mkwrite and the page fault's subsequent set_page_dirty.
> 
> So if that race is hitting, it might appear like set_page_dirty is
> being called without a page_mkwrite.

Good point, I'll make sure.

-chris



* Re: btrfs failing fsx-linux
  2009-09-09 13:37     ` Chris Mason
@ 2009-09-09 13:43       ` Nick Piggin
  2009-09-09 13:47         ` Chris Mason
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2009-09-09 13:43 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

On Wed, Sep 09, 2009 at 09:37:29AM -0400, Chris Mason wrote:
> On Wed, Sep 09, 2009 at 03:34:35PM +0200, Nick Piggin wrote:
> > In which cases does set_page_dirty happen without page_mkwrite? For fsx-linux
> > I hope there shouldn't be any...
> 
> I've been assuming it's the zap_pte_range part, but now I have enough
> code around it to make stack traces.

OK, because AFAIK we're _supposed_ to be able to close all that
up since the last round of page_mkwrite fixes here, so pte state
will not be out of sync with page state.

Yes set_page_dirty still gets called there, for fses without
page_mkwrite or which don't try hard to keep in sync. But if you
return with the page locked from page_mkwrite, this set_page_dirty
call should always find the page dirty (I hope).


> > but one issue with btrfs's page_mkwrite
> > is that it still unlocks the page before page_mkwrite returns so if it
> > is anything like other filesystems, the page might get written out
> > between the page_mkwrite and the page fault's subsequent set_page_dirty.
> > 
> > So if that race is hitting, it might appear like set_page_dirty is
> > being called without a page_mkwrite.
> 
> Good point, I'll make sure.

Though if it has exposed a bug, that's a good thing too; you still
want to handle that case (for now) due to the get_user_pages problem.



* Re: btrfs failing fsx-linux
  2009-09-09 13:43       ` Nick Piggin
@ 2009-09-09 13:47         ` Chris Mason
  0 siblings, 0 replies; 13+ messages in thread
From: Chris Mason @ 2009-09-09 13:47 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-btrfs

On Wed, Sep 09, 2009 at 03:43:51PM +0200, Nick Piggin wrote:
> On Wed, Sep 09, 2009 at 09:37:29AM -0400, Chris Mason wrote:
> > On Wed, Sep 09, 2009 at 03:34:35PM +0200, Nick Piggin wrote:
> > > In which cases does set_page_dirty happen without page_mkwrite? For fsx-linux
> > > I hope there shouldn't be any...
> > 
> > I've been assuming it's the zap_pte_range part, but now I have enough
> > code around it to make stack traces.
> 
> OK, because AFAIK we're _supposed_ to be able to close all that
> up since the last round of page_mkwrite fixes here, so pte state
> will not be out of sync with page state.
> 
> Yes set_page_dirty still gets called there, for fses without
> page_mkwrite or which don't try hard to keep in sync. But if you
> return with the page locked from page_mkwrite, this set_page_dirty
> call should always find the page dirty (I hope).
> 
> 
> > > but one issue with btrfs's page_mkwrite
> > > is that it still unlocks the page before page_mkwrite returns so if it
> > > is anything like other filesystems, the page might get written out
> > > between the page_mkwrite and the page fault's subsequent set_page_dirty.
> > > 
> > > So if that race is hitting, it might appear like set_page_dirty is
> > > being called without a page_mkwrite.
> > 
> > Good point, I'll make sure.
> 
> Though if it has exposed a bug, that's a good thing too; you still
> want to handle that case (for now) due to the get_user_pages problem.

Yeah, I'll hammer on it until the handler works and then I'll fix up
page_mkwrite to leave things locked.

-chris



* Re: btrfs failing fsx-linux
  2009-08-18 17:26 btrfs failing fsx-linux Nick Piggin
  2009-08-18 17:59 ` Josef Bacik
  2009-09-09 13:13 ` Chris Mason
@ 2009-09-11 19:36 ` Chris Mason
  2 siblings, 0 replies; 13+ messages in thread
From: Chris Mason @ 2009-09-11 19:36 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-btrfs

On Tue, Aug 18, 2009 at 07:26:41PM +0200, Nick Piggin wrote:
> Hi Chris,
> 
> Don't know if this is a known issue, but I have btrfs (after my previous
> inode_tree fixup patch) failing fsx-linux in Linus's current git tree.

Thanks for your help on this, Nick. I've got my current fixes for fsx in
the master branch of the btrfs-unstable repo:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git

If you're curious, the 3 most relevant commits are here:

http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=93c82d575055f1bd0277acae6f966bebafd80dd5

http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=50a9b214bc6c052caa05a210ebfc1bdf0d7085b2

http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=a1ed835e1ab5795f91b198d08c43e2f56848dcf3

But these fixes are built on top of my performance work (which was sure
to break fsx on its own).

-chris

