linux-btrfs.vger.kernel.org archive mirror
* Problem with latest for-linus branch
@ 2011-05-28 17:05 Andrea Gelmini
  2011-05-28 22:14 ` Chris Mason
  2011-05-28 22:40 ` David Sterba
  0 siblings, 2 replies; 10+ messages in thread
From: Andrea Gelmini @ 2011-05-28 17:05 UTC
  To: linux-btrfs

Hi all,
   and thanks a lot for your work.
   My home partition is on BTRFS. It's an Ext4 filesystem converted to BTRFS
via btrfs-convert.
   Everything works fine with the stock Ubuntu 11.04 kernel (2.6.38),
vanilla 2.6.38 and vanilla 2.6.39.
   If I use Linus' git tree, BTRFS oopses at mount.
   So I bisected using kernel version 2.6.39 + latest for-linus branch.
   Bisect complains about this commit:
581bb050941b4f220f84d3e5ed6dace3d42dd382 is the first bad commit
commit 581bb050941b4f220f84d3e5ed6dace3d42dd382
Author: Li Zefan <lizf@cn.fujitsu.com>
Date:   Wed Apr 20 10:06:11 2011 +0800

    Btrfs: Cache free inode numbers in memory

   And bisect log is this:
git bisect start
# bad: [174ba50915b08dcfd07c8b5fb795b46a165fa09a] Btrfs: use the device_list_mutex during write_dev_supers
git bisect bad 174ba50915b08dcfd07c8b5fb795b46a165fa09a
# good: [61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf] Linux 2.6.39
git bisect good 61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf
# bad: [aa2dfb372a2a647beedac163ce6f8b0fcbefac29] Merge branch 'allocator' of git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into inode_numbers
git bisect bad aa2dfb372a2a647beedac163ce6f8b0fcbefac29
# good: [7a36ddec1003a4e84e79f28ee714a142ed6bc529] btrfs: use printk_ratelimited instead of printk_ratelimit
git bisect good 7a36ddec1003a4e84e79f28ee714a142ed6bc529
# bad: [0965537308ac3b267ea16e731bd73870a51c53b8] Merge branch 'ino-alloc' of git://repo.or.cz/linux-btrfs-devel into inode_numbers
git bisect bad 0965537308ac3b267ea16e731bd73870a51c53b8
# bad: [581bb050941b4f220f84d3e5ed6dace3d42dd382] Btrfs: Cache free inode numbers in memory
git bisect bad 581bb050941b4f220f84d3e5ed6dace3d42dd382
# good: [f38b6e754d8cc4605ac21d9c1094d569d88b163b] Btrfs: Use bitmap_set/clear()
git bisect good f38b6e754d8cc4605ac21d9c1094d569d88b163b
# good: [34d52cb6c50b5a43901709998f59fb1c5a43dc4a] Btrfs: Make free space cache code generic
git bisect good 34d52cb6c50b5a43901709998f59fb1c5a43dc4a

  I can see two kinds of problems, with different commits, of course.
  Sometimes the oops happens just as the kernel mounts the partition;
sometimes the mount succeeds, but the HD keeps reading for more than 30
seconds, and then it oopses.
  Meanwhile, you can read but you can't write.

My config is attached.

I have photos of the oops, but right now I can't get them off the phone...
But maybe you already know about the problem and have solved it.
Anyway, if you need more details, just tell me.

Thanks a lot for your time,
Andrea


* Re: Problem with latest for-linus branch
  2011-05-28 17:05 Problem with latest for-linus branch Andrea Gelmini
@ 2011-05-28 22:14 ` Chris Mason
  2011-05-30 10:13   ` Andrea Gelmini
  2011-05-30 13:02   ` Andrea Gelmini
  2011-05-28 22:40 ` David Sterba
  1 sibling, 2 replies; 10+ messages in thread
From: Chris Mason @ 2011-05-28 22:14 UTC
  To: Andrea Gelmini; +Cc: linux-btrfs

Excerpts from Andrea Gelmini's message of 2011-05-28 13:05:47 -0400:
> Hi all,
>    and thanks a lot for your work.
>    My home partition is on BTRFS. It's an Ext4 filesystem converted to BTRFS
> via btrfs-convert.
>    Everything works fine with the stock Ubuntu 11.04 kernel (2.6.38),
> vanilla 2.6.38 and vanilla 2.6.39.
>    If I use Linus' git tree, BTRFS oopses at mount.
>    So I bisected using kernel version 2.6.39 + latest for-linus branch.

Thanks, could you please send in the photos of the oops when you get
a chance.

-chris


* Re: Problem with latest for-linus branch
  2011-05-28 17:05 Problem with latest for-linus branch Andrea Gelmini
  2011-05-28 22:14 ` Chris Mason
@ 2011-05-28 22:40 ` David Sterba
  2011-05-30  2:49   ` Li Zefan
  1 sibling, 1 reply; 10+ messages in thread
From: David Sterba @ 2011-05-28 22:40 UTC
  To: Andrea Gelmini; +Cc: linux-btrfs, lizf

Hi,

On Sat, May 28, 2011 at 07:05:47PM +0200, Andrea Gelmini wrote:
>    Everything works fine with the stock Ubuntu 11.04 kernel (2.6.38),
> vanilla 2.6.38 and vanilla 2.6.39.
>    If I use Linus' git tree, BTRFS oopses at mount.

can you please attach the oops traces?

>    So I bisected using kernel version 2.6.39 + latest for-linus branch.
>    Bisect complains about this commit:
> 581bb050941b4f220f84d3e5ed6dace3d42dd382 is the first bad commit
> commit 581bb050941b4f220f84d3e5ed6dace3d42dd382
> Author: Li Zefan <lizf@cn.fujitsu.com>
> Date:   Wed Apr 20 10:06:11 2011 +0800
> 
>     Btrfs: Cache free inode numbers in memory

This patch was part of the new ino allocator and it may depend
on subsequent patches (e.g. 33345d015 "Btrfs: Always use
64bit inode number"). In this case it could be a 32/64-bit mismatch in
inode numbers, and the blame would point to an incomplete state wrt the
filesystem.

You created your FS from ext4; I think that the filesystem has
64-bit inode numbers allocated to files, and that this got broken during
the conversion. (Just a wild idea.)
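
To make the suspected 32/64-bit mismatch concrete, here is a minimal C sketch. It is not btrfs code: the helper names `truncate_ino` and `ino_fits_in_32bit` are made up for illustration, and it only shows how a 64-bit inode number would lose its upper bits if it ever passed through a 32-bit variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: squeeze a 64-bit inode number through a 32-bit type. */
static uint32_t truncate_ino(uint64_t ino)
{
	return (uint32_t)ino;	/* the upper 32 bits are silently dropped */
}

/* Hypothetical helper: true if the inode number survives the round trip. */
static bool ino_fits_in_32bit(uint64_t ino)
{
	return (uint64_t)truncate_ino(ino) == ino;
}
```

With these helpers, `truncate_ino(0x100000101ULL)` gives `0x101`, so two distinct 64-bit inode numbers could end up looking identical after truncation.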

>   I can see two kinds of problems, with different commits, of course.
>   Sometimes the oops happens just as the kernel mounts the partition;
> sometimes the mount succeeds, but the HD keeps reading for more than 30
> seconds, and then it oopses.

This would mean something's broken during transaction commit.

>   Meanwhile, you can read but you can't write.
> 
> My config is attached.

No attachment, but not needed IMHO.

> I have photos of the oops, but right now I can't get them off the phone...

Would really help if you can :)


david


* Re: Problem with latest for-linus branch
  2011-05-28 22:40 ` David Sterba
@ 2011-05-30  2:49   ` Li Zefan
  0 siblings, 0 replies; 10+ messages in thread
From: Li Zefan @ 2011-05-30  2:49 UTC
  To: dave; +Cc: andrea.gelmini, linux-btrfs@vger.kernel.org

David Sterba wrote:
> Hi,
> 
> On Sat, May 28, 2011 at 07:05:47PM +0200, Andrea Gelmini wrote:
>>    Everything works fine with the stock Ubuntu 11.04 kernel (2.6.38),
>> vanilla 2.6.38 and vanilla 2.6.39.
>>    If I use Linus' git tree, BTRFS oopses at mount.
> 
> can you please attach the oops traces?
> 
>>    So I bisected using kernel version 2.6.39 + latest for-linus branch.
>>    Bisect complains about this commit:
>> 581bb050941b4f220f84d3e5ed6dace3d42dd382 is the first bad commit
>> commit 581bb050941b4f220f84d3e5ed6dace3d42dd382
>> Author: Li Zefan <lizf@cn.fujitsu.com>
>> Date:   Wed Apr 20 10:06:11 2011 +0800
>>
>>     Btrfs: Cache free inode numbers in memory
> 
> This patch was part of the new ino allocator and it may depend
> on subsequent patches (e.g. 33345d015 "Btrfs: Always use
> 64bit inode number"). In this case it could be a 32/64-bit mismatch in
> inode numbers, and the blame would point to an incomplete state wrt the
> filesystem.
> 

The bug is probably not caused by this.

> You created your FS from ext4; I think that the filesystem has
> 64-bit inode numbers allocated to files, and that this got broken during
> the conversion. (Just a wild idea.)
> 
>>   I can see two kinds of problems, with different commits, of course.
>>   Sometimes the oops happens just as the kernel mounts the partition;

You just mount the partition, with no other fs operations afterwards? If so,
the patch you bisected down to won't actually take effect.

>> sometimes the mount succeeds, but the HD keeps reading for more than 30
>> seconds, and then it oopses.
> 
> This would mean something's broken during transaction commit.
> 
>>   Meanwhile, you can read but you can't write.
>>
>> My config is attached.
> 
> No attachment, but not needed IMHO.
> 
>> I have photos of the oops, but right now I can't get them off the phone...
> 
> Would really help if you can :)
> 

right.

and thanks for the bug report!

btw, I'll be off till 6.5, so I probably won't be able to take care of
this this week.


* Re: Problem with latest for-linus branch
  2011-05-28 22:14 ` Chris Mason
@ 2011-05-30 10:13   ` Andrea Gelmini
  2011-05-30 10:41     ` Chris Mason
  2011-05-30 13:02   ` Andrea Gelmini
  1 sibling, 1 reply; 10+ messages in thread
From: Andrea Gelmini @ 2011-05-30 10:13 UTC
  To: Chris Mason; +Cc: linux-btrfs

2011/5/29 Chris Mason <chris.mason@oracle.com>:
> Thanks, could you please send in the photos of the oops when you get
> a chance.

Well, I retested everything with frame pointers compiled in, so:
a) the partition is mounted with these flags:
   defaults,ssd,noacl,space_cache (at the beginning I also used
   compress);
b) vanilla kernels .38 and .39 work fine;
c) with the latest Linus tree (commit bd1bfe40ac6bdf9593da29b822bc301b77a97d6a,
   the one before 3.0-rc1, so in the photos it shows up as .39g+), the
   system comes up, but after a while of intense I/O from a kernel thread
   (a btrfs-specific one, I guess btrfs-ino-cache, but I could be wrong)
   the system freezes. If the I/O keeps going long enough, I can even
   touch and unlink files, read files already present, or run something
   like /usr/bin/find. These photos are here:
   http://ooops.lugbs.linux.it/linusgit
d) rebooting with .39 doesn't work. It crashes at mount time.
   The photos are here: http://ooops.lugbs.linux.it/2.6.39
e) booting with 2.6.38.7 solves the problem, giving this info:
[   20.273822] Btrfs loaded
[   20.387795] device label home devid 1 transid 4595 /dev/mapper/VG-home
[   20.388269] btrfs: use ssd allocation scheme
[   20.388277] btrfs: enabling disk space caching
[   25.025873] btrfs: unlinked 5 orphans
[   25.025876] btrfs: truncated 3 orphans
f) by the way, bisect.jpg is the photo I took when I sent the first email.

These photos are terrible, but I guess they're good enough to be readable.
Anyway, there are multiple shots of the same screen, of course.

Thanks a lot for your time,
Andrea


* Re: Problem with latest for-linus branch
  2011-05-30 10:13   ` Andrea Gelmini
@ 2011-05-30 10:41     ` Chris Mason
  2011-05-30 11:59       ` Andrea Gelmini
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2011-05-30 10:41 UTC
  To: Andrea Gelmini; +Cc: linux-btrfs

Excerpts from Andrea Gelmini's message of 2011-05-30 06:13:47 -0400:
> 2011/5/29 Chris Mason <chris.mason@oracle.com>:
> > Thanks, could you please send in the photos of the oops when you get
> > a chance.
> 
> Well, I retested everything with frame pointers compiled in, so:
> a) the partition is mounted with these flags:
>    defaults,ssd,noacl,space_cache (at the beginning I also used
>    compress);
> b) vanilla kernels .38 and .39 work fine;
> c) with the latest Linus tree (commit bd1bfe40ac6bdf9593da29b822bc301b77a97d6a,
>    the one before 3.0-rc1, so in the photos it shows up as .39g+), the
>    system comes up, but after a while of intense I/O from a kernel thread
>    (a btrfs-specific one, I guess btrfs-ino-cache, but I could be wrong)
>    the system freezes. If the I/O keeps going long enough, I can even
>    touch and unlink files, read files already present, or run something
>    like /usr/bin/find. These photos are here:
>    http://ooops.lugbs.linux.it/linusgit
> d) rebooting with .39 doesn't work. It crashes at mount time.
>    The photos are here: http://ooops.lugbs.linux.it/2.6.39
> e) booting with 2.6.38.7 solves the problem, giving this info:
> [   20.273822] Btrfs loaded
> [   20.387795] device label home devid 1 transid 4595 /dev/mapper/VG-home
> [   20.388269] btrfs: use ssd allocation scheme
> [   20.388277] btrfs: enabling disk space caching
> [   25.025873] btrfs: unlinked 5 orphans
> [   25.025876] btrfs: truncated 3 orphans
> f) by the way, bisect.jpg is the photo I took when I sent the first email.
> 
> These photos are terrible, but I guess they're good enough to be readable.
> Anyway, there are multiple shots of the same screen, of course.

These are perfect, thank you.  We're failing to write out the inode
cache.  Since you're on a 32 bit machine, I'm guessing that we failed to
kmap something properly.

Could you please do gdb fs/btrfs/btrfs.ko, and then at the gdb prompt:

gdb> list *__btrfs_write_out_cache+0x43a

And send the output here?  This corresponds to where you were crashing
in the kernel whose oops is in your linusgit directory.

If this doesn't work, you might need to recompile with
CONFIG_DEBUG_INFO=y.  You won't need to trigger the crash again,
just do the gdb command on the new .ko.

If you don't have btrfs compiled as a module, use gdb vmlinux instead of
gdb fs/btrfs/btrfs.ko

-chris


* Re: Problem with latest for-linus branch
  2011-05-30 10:41     ` Chris Mason
@ 2011-05-30 11:59       ` Andrea Gelmini
  2011-05-30 13:35         ` Chris Mason
  0 siblings, 1 reply; 10+ messages in thread
From: Andrea Gelmini @ 2011-05-30 11:59 UTC
  To: Chris Mason; +Cc: linux-btrfs

2011/5/30 Chris Mason <chris.mason@oracle.com>:
> These are perfect, thank you.  We're failing to write out the inode
> cache.  Since you're on a 32 bit machine, I'm guessing that we failed to
> kmap something properly.

Thanks a lot for the detailed info.
I recompiled, and got this:
gelma@dell:~$ gdb /lib/modules/3.0.0-rc1/kernel/fs/btrfs/*
GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /lib/modules/3.0.0-rc1/kernel/fs/btrfs/btrfs.ko...done.
(gdb) list *__btrfs_write_out_cache+0x43a
0x5fada is in __btrfs_write_out_cache (fs/btrfs/free-space-cache.c:676).
671				struct btrfs_free_space *e;
672
673				e = rb_entry(node, struct btrfs_free_space, offset_index);
674				entries++;
675
676				entry->offset = cpu_to_le64(e->offset);
677				entry->bytes = cpu_to_le64(e->bytes);
678				if (e->bitmap) {
679					entry->type = BTRFS_FREE_SPACE_BITMAP;
680					list_add_tail(&e->list, &bitmap_list);
(gdb)

Thanks a lot for your quick answer,
Andrea
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Problem with latest for-linus branch
  2011-05-28 22:14 ` Chris Mason
  2011-05-30 10:13   ` Andrea Gelmini
@ 2011-05-30 13:02   ` Andrea Gelmini
  1 sibling, 0 replies; 10+ messages in thread
From: Andrea Gelmini @ 2011-05-30 13:02 UTC
  To: Chris Mason; +Cc: linux-btrfs

2011/5/29 Chris Mason <chris.mason@oracle.com>:
> Thanks, could you please send in the photos of the oops when you get
> a chance.

By the way, switching from 2.6.38.7 to 2.6.39, I get a lot of these messages:
[  140.297248] block group 1107296256 has an wrong amount of free space
[  140.848435] block group 8623489024 has an wrong amount of free space
[  140.879178] block group 17213423616 has an wrong amount of free space
[  140.910181] block group 24729616384 has an wrong amount of free space
[  140.937690] block group 33319550976 has an wrong amount of free space
[  140.971150] block group 40835743744 has an wrong amount of free space
[  141.000816] block group 49425678336 has an wrong amount of free space
[  141.027175] block group 56941871104 has an wrong amount of free space
[  141.057614] block group 65531805696 has an wrong amount of free space
[  141.088269] block group 73047998464 has an wrong amount of free space
[  141.124767] block group 81637933056 has an wrong amount of free space
[  141.156891] block group 97744060416 has an wrong amount of free space
[  141.190143] block group 121366380544 has an wrong amount of free space
[  141.219235] block group 129956315136 has an wrong amount of free space

It also happens with 2.6.38.7, but a lot less.
Should I worry?

Thanks again,
Andrea


* Re: Problem with latest for-linus branch
  2011-05-30 11:59       ` Andrea Gelmini
@ 2011-05-30 13:35         ` Chris Mason
  2011-05-31 18:15           ` Andrea Gelmini
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2011-05-30 13:35 UTC
  To: Andrea Gelmini; +Cc: linux-btrfs, Josef Bacik

Excerpts from Andrea Gelmini's message of 2011-05-30 07:59:30 -0400:
> 2011/5/30 Chris Mason <chris.mason@oracle.com>:
> > These are perfect, thank you.  We're failing to write out the inode
> > cache.  Since you're on a 32 bit machine, I'm guessing that we failed to
> > kmap something properly.
> 
> Thanks a lot for the detailed info.
> I recompiled, and got this:
> gelma@dell:~$ gdb /lib/modules/3.0.0-rc1/kernel/fs/btrfs/*
> GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "i686-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /lib/modules/3.0.0-rc1/kernel/fs/btrfs/btrfs.ko...done.
> (gdb) list *__btrfs_write_out_cache+0x43a
> 0x5fada is in __btrfs_write_out_cache (fs/btrfs/free-space-cache.c:676).
> 671                struct btrfs_free_space *e;
> 672
> 673                e = rb_entry(node, struct btrfs_free_space, offset_index);
> 674                entries++;
> 675
> 676                entry->offset = cpu_to_le64(e->offset);
> 677                entry->bytes = cpu_to_le64(e->bytes);
> 678                if (e->bitmap) {
> 679                    entry->type = BTRFS_FREE_SPACE_BITMAP;
> 680                    list_add_tail(&e->list, &bitmap_list);
> (gdb)

Ok, so I think we're blowing past the end of the page we've kmap'd.  But
I don't think that can happen without something like the patch below
triggering:

Josef, what do you think?

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 70d4579..a95b72e 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -596,6 +596,11 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 	 */
 	first_page_offset = (sizeof(u32) * num_pages) + sizeof(u64);
 
+	if (first_page_offset + sizeof(struct btrfs_free_space_entry) >= PAGE_CACHE_SIZE) {
+		printk(KERN_CRIT "bad first page offset %lu\n", first_page_offset);
+		BUG();
+	}
+
 	/* Get the cluster for this block_group if it exists */
 	if (block_group && !list_empty(&block_group->cluster_list))
 		cluster = list_entry(block_group->cluster_list.next,
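
To get a feel for when the check in the patch above would fire, here is a small userspace C sketch of the same arithmetic. It rests on two assumptions that are not stated in the thread: 4 KiB pages (typical for i686) and a 17-byte on-disk entry (two 64-bit fields plus a one-byte type, packed). The `SKETCH_*` constants and the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions, not taken from the thread: 4 KiB pages and a 17-byte
 * packed entry (two 64-bit fields plus a one-byte type). */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_ENTRY_SIZE 17u

/* Mirrors the expression in the patch: one u32 per page plus one u64,
 * all stored at the start of the first page. */
static size_t first_page_offset(size_t num_pages)
{
	return sizeof(uint32_t) * num_pages + sizeof(uint64_t);
}

/* Same comparison as the condition guarding BUG() in the patch. */
static int header_overflows_page(size_t num_pages)
{
	return first_page_offset(num_pages) + SKETCH_ENTRY_SIZE >= SKETCH_PAGE_SIZE;
}

/* Smallest page count for which the check would fire. */
static size_t smallest_overflowing_num_pages(void)
{
	size_t n = 1;

	while (!header_overflows_page(n))
		n++;
	return n;
}
```

Under those assumptions the check first trips at 1018 pages, i.e. a cache file of just under 4 MiB; below that, the header still leaves room for at least one entry in the first page.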

* Re: Problem with latest for-linus branch
  2011-05-30 13:35         ` Chris Mason
@ 2011-05-31 18:15           ` Andrea Gelmini
  0 siblings, 0 replies; 10+ messages in thread
From: Andrea Gelmini @ 2011-05-31 18:15 UTC
  To: Chris Mason; +Cc: linux-btrfs, Josef Bacik

2011/5/30 Chris Mason <chris.mason@oracle.com>:
> Ok, so I think we're blowing past the end of the page we've kmap'd.  But
> I don't think that can happen without something like the patch below
> triggering:

Quick update: after rm of ~10 GB of data, I rebooted with Linus' latest
git tree, and it works (after some minutes of btrfs-ino-cache).

Ciao,
Andrea