linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* BUG in btrfs_set_item_key_safe
@ 2012-10-26  6:35 Jan Schmidt
  2012-10-26 12:19 ` Chris Mason
  2013-01-24 16:42 ` David Sterba
  0 siblings, 2 replies; 5+ messages in thread
From: Jan Schmidt @ 2012-10-26  6:35 UTC (permalink / raw)
  To: linux-btrfs

Hi,

while running extensive qgroup and tree mod log tests I got to the following
BUG, which is probably not related to tree mod log:

[71793.530286] ------------[ cut here ]------------
[71793.585407] kernel BUG at fs/btrfs/ctree.c:3166!
[71793.640487] invalid opcode: 0000 [#1] PREEMPT SMP
[71793.697953] Modules linked in: btrfs mpt2sas scsi_transport_sas raid_class [last unloaded: btrfs]
[71793.804482] CPU 3
[71793.826370] Pid: 31128, comm: fsstress Tainted: G        W    3.6.0+ #2 Supermicro X8SIL/X8SIL
[71793.931342] RIP: 0010:[<ffffffffa050a348>]  [<ffffffffa050a348>] btrfs_set_item_key_safe+0x1d8/0x250 [btrfs]
[71794.048765] RSP: 0018:ffff8801cfd0fa28  EFLAGS: 00010246
[71794.112142] RAX: ffff8801cfd0fb68 RBX: ffff8801cfd0fa38 RCX: 000000000000006c
[71794.197304] RDX: 0000000000000132 RSI: 00000000000f6000 RDI: 00000000000f6000
[71794.282467] RBP: ffff8801cfd0fa88 R08: 0000000000000000 R09: ffff8801cfd0fa40
[71794.367629] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880231425000
[71794.452790] R13: ffff88023125f000 R14: 0000000000000000 R15: ffff8801d25f9ce0
[71794.537951] FS:  00007f4d458fc700(0000) GS:ffff880236c00000(0000) knlGS:0000000000000000
[71794.634523] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[71794.703088] CR2: 00007f4d44d76cdc CR3: 00000001cfe57000 CR4: 00000000000007e0
[71794.788250] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[71794.873410] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[71794.958573] Process fsstress (pid: 31128, threadinfo ffff8801cfd0e000, task ffff88023273c500)
[71795.060330] Stack:
[71795.084294]  ffff8801cfd0fb68 ffff8802315fb000 0000000000000132 000000000f60006c
[71795.172981]  0000000000000000 ffff8801d0307000 ffff8801cfd0fa88 ffff880231425000
[71795.261668]  ffff8801d25f9ce0 0000000000105000 00000000000a8000 0000000000000002
[71795.350357] Call Trace:
[71795.379520]  [<ffffffffa05452ec>] __btrfs_drop_extents+0x5bc/0xba0 [btrfs]
[71795.461564]  [<ffffffffa0511c94>] ? btrfs_search_slot+0xb34/0xb40 [btrfs]
[71795.542581]  [<ffffffffa0569679>] btrfs_log_changed_extents+0x659/0x6f0 [btrfs]
[71795.629816]  [<ffffffffa056ca6c>] btrfs_log_inode+0x4dc/0x650 [btrfs]
[71795.706774]  [<ffffffff811b0b00>] ? d_hash_and_lookup+0x30/0x70
[71795.777735]  [<ffffffffa056cd70>] btrfs_log_inode_parent+0x190/0x4b0 [btrfs]
[71795.862367]  [<ffffffff811b0b57>] ? dget_parent+0x17/0xd0
[71795.927105]  [<ffffffffa056d13f>] btrfs_log_dentry_safe+0x3f/0x60 [btrfs]
[71796.008427]  [<ffffffffa0543611>] btrfs_sync_file+0x121/0x290 [btrfs]
[71796.085695]  [<ffffffff811c7f73>] vfs_fsync_range+0x23/0x30
[71796.152496]  [<ffffffff811c7f97>] vfs_fsync+0x17/0x20
[71796.213098]  [<ffffffff811c81c4>] do_fsync+0x34/0x60
[71796.272615]  [<ffffffff811c81fe>] sys_fdatasync+0xe/0x20
[71796.336305]  [<ffffffff8193f5a2>] system_call_fastpath+0x16/0x1b
[71796.408292] Code: 7d f8 c9 c3 0f 1f 80 00 00 00 00 72 1e 0f b6 48 08 40 38 ce 76 0d 48 8b 70 09 e9 5f ff ff ff 0f 1f 40 00 73 36 66 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 eb f8 66 0f 1f 44 00 00 48 3b 48 09 0f
[71796.642304] RIP  [<ffffffffa050a348>] btrfs_set_item_key_safe+0x1d8/0x250 [btrfs]
[71796.732134]  RSP <ffff8801cfd0fa28>
[71796.774296] ---[ end trace f2639a7e4750b6d5 ]---

Due to added printk statements, my line numbers are shifted. The corresponding
BUG is:

3150 void btrfs_set_item_key_safe(struct btrfs_trans_handle *trans,
3151                              struct btrfs_root *root, struct btrfs_path *path,
3152                              struct btrfs_key *new_key)
3153 {
3154         struct btrfs_disk_key disk_key;
3155         struct extent_buffer *eb;
3156         int slot;
3157
3158         eb = path->nodes[0];
3159         slot = path->slots[0];
3160         if (slot > 0) {
3161                 btrfs_item_key(eb, &disk_key, slot - 1);
3162                 BUG_ON(comp_keys(&disk_key, new_key) >= 0);
3163         }
3164         if (slot < btrfs_header_nritems(eb) - 1) {
3165                 btrfs_item_key(eb, &disk_key, slot + 1);
3166                 BUG_ON(comp_keys(&disk_key, new_key) <= 0);
3167         }

-Jan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: BUG in btrfs_set_item_key_safe
  2012-10-26  6:35 BUG in btrfs_set_item_key_safe Jan Schmidt
@ 2012-10-26 12:19 ` Chris Mason
  2012-10-26 12:30   ` Jan Schmidt
  2013-01-24 16:42 ` David Sterba
  1 sibling, 1 reply; 5+ messages in thread
From: Chris Mason @ 2012-10-26 12:19 UTC (permalink / raw)
  To: Jan Schmidt; +Cc: linux-btrfs

On Fri, Oct 26, 2012 at 12:35:43AM -0600, Jan Schmidt wrote:
> Hi,
> 
> while running extensive qgroup and tree mod log tests I got to the following
> BUG, which is probably not related to tree mod log:

So our keys are out of order...are you able to reproduce this?

-chris

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: BUG in btrfs_set_item_key_safe
  2012-10-26 12:19 ` Chris Mason
@ 2012-10-26 12:30   ` Jan Schmidt
  2012-10-26 12:37     ` Chris Mason
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Schmidt @ 2012-10-26 12:30 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

On Fri, October 26, 2012 at 14:19 (+0200), Chris Mason wrote:
> On Fri, Oct 26, 2012 at 12:35:43AM -0600, Jan Schmidt wrote:
>> Hi,
>>
>> while running extensive qgroup and tree mod log tests I got to the following
>> BUG, which is probably not related to tree mod log:
> 
> So our keys are out of order...are you able to reproduce this?

No reproducer yet. My tests ran fine for about 15 hours before that happened.
I'll have an extensive amount of (tree mod log related) debug information if I
should get there again.

My guess is that it's not related to tree mod log, though, as there are tree log
functions (fs/btrfs/tree-log.c) on the stack. I don't think rewinded buffers can
make it there.

I'll call back when I'll hit this again.

-Jan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: BUG in btrfs_set_item_key_safe
  2012-10-26 12:30   ` Jan Schmidt
@ 2012-10-26 12:37     ` Chris Mason
  0 siblings, 0 replies; 5+ messages in thread
From: Chris Mason @ 2012-10-26 12:37 UTC (permalink / raw)
  To: Jan Schmidt; +Cc: Chris Mason, linux-btrfs

On Fri, Oct 26, 2012 at 06:30:06AM -0600, Jan Schmidt wrote:
> On Fri, October 26, 2012 at 14:19 (+0200), Chris Mason wrote:
> > On Fri, Oct 26, 2012 at 12:35:43AM -0600, Jan Schmidt wrote:
> >> Hi,
> >>
> >> while running extensive qgroup and tree mod log tests I got to the following
> >> BUG, which is probably not related to tree mod log:
> > 
> > So our keys are out of order...are you able to reproduce this?
> 
> No reproducer yet. My tests ran fine for about 15 hours before that happened.
> I'll have an extensive amount of (tree mod log related) debug information if I
> should get there again.
> 
> My guess is that it's not related to tree mod log, though, as there are tree log
> functions (fs/btrfs/tree-log.c) on the stack. I don't think rewinded buffers can
> make it there.
> 
> I'll call back when I'll hit this again.

This is more likely to be memory corruption.  I'd suggest turning on
slab debug and CONFIG_DEBUG_PAGE_ALLOC...hopefully you'll get a faster
crash.

-chris


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: BUG in btrfs_set_item_key_safe
  2012-10-26  6:35 BUG in btrfs_set_item_key_safe Jan Schmidt
  2012-10-26 12:19 ` Chris Mason
@ 2013-01-24 16:42 ` David Sterba
  1 sibling, 0 replies; 5+ messages in thread
From: David Sterba @ 2013-01-24 16:42 UTC (permalink / raw)
  To: chris.mason; +Cc: linux-btrfs, Jan Schmidt

Hi,

On Fri, Oct 26, 2012 at 08:35:43AM +0200, Jan Schmidt wrote:
> while running extensive qgroup and tree mod log tests I got to the following
> BUG, which is probably not related to tree mod log:

I've left the fsx DIO tester overnight, after Josef identified the fix for the
csum mismatch bug, and found it crashed at the same point as Jan reported:

[80582.012720] ------------[ cut here ]------------
[80582.013045] kernel BUG at fs/btrfs/ctree.c:2945!
[80582.013045] invalid opcode: 0000 [#1] PREEMPT SMP
io_blk virtio_pci sr_mod 8139cp parport microcode virtio_ring floppy virtio sg pcspkr button i2c_piix4 cdrom processor thermal_sys scsi_dh_alua scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh ata_ge
neric ata_piix [last unloaded: btrfs]
[80582.013045] CPU 16
[80582.013045] Pid: 13045, comm: btrfs-endio-wri Tainted: G        W    3.8.0-rc4-desktop+ #54 Bochs Bochs
[80582.013045] RIP: 0010:[<ffffffffa0393a25>]  [<ffffffffa0393a25>] btrfs_set_item_key_safe+0x65/0x1e0 [btrfs]
[80582.013045] RSP: 0018:ffff880adb9d9ad8  EFLAGS: 00010206
[80582.013045] RAX: 000000000000006c RBX: ffff880adb9d9c18 RCX: 000000000000012e
[80582.013045] RDX: 0000000000016000 RSI: ffff880adeeee076 RDI: ffff880adb9d9af9
[80582.013045] RBP: ffff880adb9d9b38 R08: ffff880ae4cd9dc0 R09: 0000000000000000
[80582.013045] R10: ffff880bee7c0f40 R11: 0000000000000015 R12: 0000000000000001
[80582.013045] R13: ffff880add976c78 R14: ffff880adb9d9ae8 R15: ffff880be92301c0
[80582.013045] FS:  0000000000000000(0000) GS:ffff880c1f200000(0000) knlGS:0000000000000000
[80582.013045] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[80582.013045] CR2: ffffffffff600400 CR3: 0000000be6cb8000 CR4: 00000000000006e0
[80582.013045] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[80582.013045] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[80582.013045] Process btrfs-endio-wri (pid: 13045, threadinfo ffff880adb9d8000, task ffff880b301545c0)
[80582.013045] Stack:
[80582.013045]  ffff880bec37a000 ffff880b932a5170 000000000000012e 000000000160006c
[80582.013045]  ffff880adb9d9b00 ffffffffa03a1ad0 000000000000012e 0000000000010000
[80582.013045]  ffff880add976c78 ffff880be92301c0 0000000000016000 0000000000000f96
[80582.013045] Call Trace:
[80582.013045]  [<ffffffffa03a1ad0>] ? btrfs_inc_extent_ref+0x90/0xb0 [btrfs]
[80582.013045]  [<ffffffffa03cdc50>] __btrfs_drop_extents+0x530/0xb70 [btrfs]
[80582.013045]  [<ffffffffa03cecab>] btrfs_drop_extents+0x6b/0x90 [btrfs]
[80582.013045]  [<ffffffffa03bce1a>] insert_reserved_file_extent+0x8a/0x2a0 [btrfs]
[80582.013045]  [<ffffffffa03c0c6e>] btrfs_finish_ordered_io+0x39e/0x3e0 [btrfs]
[80582.013045]  [<ffffffffa03c0cc0>] finish_ordered_fn+0x10/0x20 [btrfs]
[80582.013045]  [<ffffffffa03e7205>] worker_loop+0x165/0x4c0 [btrfs]
[80582.013045]  [<ffffffffa03e70a0>] ? check_pending_worker_creates+0xe0/0xe0 [btrfs]
[80582.013045]  [<ffffffffa03e70a0>] ? check_pending_worker_creates+0xe0/0xe0 [btrfs]
[80582.013045]  [<ffffffff8106d026>] kthread+0xc6/0xd0
[80582.013045]  [<ffffffff8106cf60>] ? kthread_freezable_should_stop+0x70/0x70
[80582.013045]  [<ffffffff815f7a7c>] ret_from_fork+0x7c/0xb0
[80582.013045]  [<ffffffff8106cf60>] ? kthread_freezable_should_stop+0x70/0x70
[80582.013045] Code: 00 00 00 4c 89 ef 48 63 d2 4c 89 f6 48 8d 14 92 48 8d 54 92 65 e8 6c 0c 04 00 48 8b 0b 48 39 4d b0 48 8b 55 b9 0f b6 45 b8 76 0b <0f> 0b eb fe 0f 1f 80 00 00 00 00 72 1e 3a 43
08 77 ee 66 0f 1f
[80582.013045] RIP  [<ffffffffa0393a25>] btrfs_set_item_key_safe+0x65/0x1e0 [btrfs]
[80582.013045]  RSP <ffff880adb9d9ad8>
[80582.071249] ---[ end trace 5360a95c1506db85 ]---

The lines are shifted, but it's the same code:

2933 void btrfs_set_item_key_safe(struct btrfs_trans_handle *trans,
2934                              struct btrfs_root *root, struct btrfs_path *path,
2935                              struct btrfs_key *new_key)
2936 {
2937         struct btrfs_disk_key disk_key;
2938         struct extent_buffer *eb;
2939         int slot;
2940
2941         eb = path->nodes[0];
2942         slot = path->slots[0];
2943         if (slot > 0) {
2944                 btrfs_item_key(eb, &disk_key, slot - 1);
2945                 BUG_ON(comp_keys(&disk_key, new_key) >= 0);
^^^^
2946         }
2947         if (slot < btrfs_header_nritems(eb) - 1) {
2948                 btrfs_item_key(eb, &disk_key, slot + 1);
2949                 BUG_ON(comp_keys(&disk_key, new_key) <= 0);
2950         }
2951
2952         btrfs_cpu_key_to_disk(&disk_key, new_key);
2953         btrfs_set_item_key(eb, &disk_key, slot);
2954         btrfs_mark_buffer_dirty(eb);
2955         if (slot == 0)
2956                 fixup_low_keys(trans, root, path, &disk_key, 1);
2957 }

It was unexpected so I haven't put any debugging into config or code.
According to the timestamps, the test has been running for 16h 58m.

Test setup:
* kvm virtual machine with 48 cpus
* 48G memory (single bit ECC)
* 5x 136GB 15K SAS disks (virtio,cache=none)

filesystem was freshly created before the test as data/raid0 and
meta/radi0 and then converted via balance to data/single meta/single
(that was one of the reproducers of the csum problem).

balance log:

[19441.016374] btrfs: disk space caching is enabled
[19452.900668] btrfs: relocating block group 4332584960 flags 9
[19455.935676] btrfs: found 4 extents
[19456.247759] btrfs: found 4 extents
[19456.471688] btrfs: relocating block group 37683200 flags 12
[19459.345387] btrfs: found 16 extents
[19459.688217] btrfs: relocating block group 20971520 flags 10
[19460.194694] btrfs: found 1 extents
[19460.465852] btrfs: relocating block group 12582912 flags 1
[19463.829548] btrfs: found 1 extents
[19464.123115] btrfs: found 1 extents
[19464.512113] btrfs: relocating block group 4194304 flags 4
[19475.407355] btrfs: found 1 extents

# btrfs fi df /mnt/btrfs/
Data: total=3.00GiB, used=11.29MiB
System, RAID1: total=32.00MiB, used=4.00KiB
System: total=4.00MiB, used=0.00
Metadata, RAID1: total=2.00GiB, used=336.00KiB

# btrfs fi show
Label: none  uuid: a15e1f6c-0e19-420e-93e2-82b4aa072a47
        Total devices 5 FS bytes used 10.41MiB
        devid    1 size 136.70GiB used 1.04GiB path /dev/vda
        devid    2 size 136.70GiB used 2.00GiB path /dev/vdb
        devid    3 size 136.70GiB used 1.00GiB path /dev/vdc
        devid    4 size 136.70GiB used 1.03GiB path /dev/vdd
        devid    5 size 136.70GiB used 2.00GiB path /dev/vde

the sources used were cmason/for-linus (0d90060f330ef49a60f56df2d9439ba263c979d9)
plus https://patchwork.kernel.org/patch/2020661/ (Btrfs: put csums on the right ordered extent)
plus https://patchwork.kernel.org/patch/1987481/ (Btrfs: fix wrong max device number for single profile)

btrfs-debug-tree dumped the filesystem without problems, so it's an in-memory
issue.


david

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-01-24 16:42 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-10-26  6:35 BUG in btrfs_set_item_key_safe Jan Schmidt
2012-10-26 12:19 ` Chris Mason
2012-10-26 12:30   ` Jan Schmidt
2012-10-26 12:37     ` Chris Mason
2013-01-24 16:42 ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).