4.1-rc6 - kernel crash after doing chattr +C

* 4.1-rc6 - kernel crash after doing chattr +C
@ 2015-06-06  6:07 Tomasz Chmielewski
  2015-06-08 15:48 ` Chris Mason
  2015-07-03 20:25 ` Filipe David Manana
  0 siblings, 2 replies; 4+ messages in thread
From: Tomasz Chmielewski @ 2015-06-06  6:07 UTC (permalink / raw)
  To: linux-btrfs

4.1-rc6, busy filesystem.

I was running a mongo import, which generated quite a lot of IO.
During the import, I ran "chattr +C /var/lib/mongodb"; shortly afterwards
I saw this in dmesg and the server died:

[57860.149839] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000008
[57860.149877] IP: [<ffffffffc0158b8e>] 
btrfs_wait_pending_ordered+0x5e/0x110 [btrfs]
[57860.149923] PGD 5d1ac6067 PUD 5d40fc067 PMD 0
[57860.149943] Oops: 0002 [#1] SMP
[57860.149960] Modules linked in: xt_conntrack veth xt_CHECKSUM 
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc intel_rapl 
iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm 
crct10dif_pclmul eeepc_wmi asus_wmi crc32_pclmul ghash_clmulni_intel 
sparse_keymap aesni_intel aes_x86_64 ie31200_edac lpc_ich lrw gf128mul 
edac_core glue_helper ablk_helper shpchp cryptd serio_raw wmi video 
tpm_infineon 8250_fintek mac_hid btrfs lp parport raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
e1000e raid1 ahci raid0 ptp libahci pps_core multipath linear
[57860.150203] CPU: 4 PID: 14111 Comm: mongod Not tainted 
4.1.0-040100rc6-generic #201506010235
[57860.150237] Hardware name: System manufacturer System Product 
Name/P8B WS, BIOS 0904 10/24/2011
[57860.150271] task: ffff88007901bc60 ti: ffff8805d5c38000 task.ti: 
ffff8805d5c38000
[57860.150303] RIP: 0010:[<ffffffffc0158b8e>]  [<ffffffffc0158b8e>] 
btrfs_wait_pending_ordered+0x5e/0x110 [btrfs]
[57860.150346] RSP: 0018:ffff8805d5c3bd18  EFLAGS: 00010206
[57860.150364] RAX: 0000000000000000 RBX: ffff880103c9d950 RCX: 
0000000000003d44
[57860.150386] RDX: 0000000000000000 RSI: 0000000000003d44 RDI: 
ffff880806a74838
[57860.150407] RBP: ffff8805d5c3bd88 R08: 0000000000000000 R09: 
0000000000000000
[57860.150428] R10: 0000000000000001 R11: 0000000000000000 R12: 
ffff880806bcb800
[57860.150450] R13: ffff880806a74838 R14: ffff880103c9d8d8 R15: 
ffff88080a7e3518
[57860.150471] FS:  00007f5f4e6dc700(0000) GS:ffff88082fb00000(0000) 
knlGS:0000000000000000
[57860.150504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[57860.150523] CR2: 0000000000000008 CR3: 000000062a584000 CR4: 
00000000000407e0
[57860.150544] Stack:
[57860.150558]  ffff8805d5c3bd48 ffff88080a7e35c8 ffff880806bcb000 
ffff880806bcb800
[57860.150592]  ffff8800070da638 ffffffffd5c3bdb0 0000000000000287 
ffff88080a72a4d0
[57860.150626]  ffff880806bcb800 ffff88080a72a4d0 ffff880806bcb800 
0000000000000000
[57860.150659] Call Trace:
[57860.150682]  [<ffffffffc015addb>] 
btrfs_commit_transaction+0x40b/0xb60 [btrfs]
[57860.150717]  [<ffffffff810c0700>] ? prepare_to_wait_event+0x100/0x100
[57860.150745]  [<ffffffffc0171973>] btrfs_sync_file+0x313/0x380 [btrfs]
[57860.150768]  [<ffffffff81236bf6>] vfs_fsync_range+0x46/0xc0
[57860.150788]  [<ffffffff81236c8c>] vfs_fsync+0x1c/0x20
[57860.150806]  [<ffffffff81236cc8>] do_fsync+0x38/0x70
[57860.150825]  [<ffffffff812370c3>] SyS_fdatasync+0x13/0x20
[57860.150846]  [<ffffffff8180cb32>] system_call_fastpath+0x16/0x75
[57860.150866] Code: 45 98 48 39 d8 0f 84 ad 00 00 00 48 8d 45 a8 48 83 
c0 18 48 89 45 90 66 0f 1f 44 00 00 48 8b 13 48 8b 43 08 4c 89 ef 4c 8d 
73 88 <48> 89 42 08 48 89 10 48 89 1b 48 89 5b 08 e8 bf 3a 6b c1 e8 aa
[57860.150959] RIP  [<ffffffffc0158b8e>] 
btrfs_wait_pending_ordered+0x5e/0x110 [btrfs]
[57860.150998]  RSP <ffff8805d5c3bd18>
[57860.151014] CR2: 0000000000000008
[57860.151186] ---[ end trace f41cd52aa31494ac ]---


-- 
Tomasz Chmielewski
http://wpkg.org



* Re: 4.1-rc6 - kernel crash after doing chattr +C
  2015-06-06  6:07 4.1-rc6 - kernel crash after doing chattr +C Tomasz Chmielewski
@ 2015-06-08 15:48 ` Chris Mason
  2015-06-09 19:08   ` David Sterba
  2015-07-03 20:25 ` Filipe David Manana
  1 sibling, 1 reply; 4+ messages in thread
From: Chris Mason @ 2015-06-08 15:48 UTC (permalink / raw)
  To: Tomasz Chmielewski, linux-btrfs

On 06/06/2015 02:07 AM, Tomasz Chmielewski wrote:
> 4.1-rc6, busy filesystem.
> 
> I was running mongo import which made quite a lot of IO.
> During the import, I did "chattr +C /var/lib/mongodb" - shortly after I
> saw this in dmesg and server died:
> 
> [57860.149839] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000008
> [57860.149877] IP: [<ffffffffc0158b8e>]
> btrfs_wait_pending_ordered+0x5e/0x110 [btrfs]

Sorry, it's not obvious where the 0000000000000008 is coming from, can
you turn btrfs_wait_pending_ordered+0x5e/0x110 into a line number?

Use "list *btrfs_wait_pending_ordered+0x5e" at the gdb prompt, after you
run "gdb btrfs.ko".

-chris



* Re: 4.1-rc6 - kernel crash after doing chattr +C
  2015-06-08 15:48 ` Chris Mason
@ 2015-06-09 19:08   ` David Sterba
  0 siblings, 0 replies; 4+ messages in thread
From: David Sterba @ 2015-06-09 19:08 UTC (permalink / raw)
  To: Chris Mason; +Cc: Tomasz Chmielewski, linux-btrfs

On Mon, Jun 08, 2015 at 11:48:54AM -0400, Chris Mason wrote:
> On 06/06/2015 02:07 AM, Tomasz Chmielewski wrote:
> > 4.1-rc6, busy filesystem.
> > 
> > I was running mongo import which made quite a lot of IO.
> > During the import, I did "chattr +C /var/lib/mongodb" - shortly after I
> > saw this in dmesg and server died:
> > 
> > [57860.149839] BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000008
> > [57860.149877] IP: [<ffffffffc0158b8e>]
> > btrfs_wait_pending_ordered+0x5e/0x110 [btrfs]
> 
> Sorry, it's not obvious where the 0000000000000008 is coming from, can
> you turn btrfs_wait_pending_ordered+0x5e/0x110 into a line number?
> 
> Use list *btrfs_wait_pending_ordered+0x5e at the gdb prompt, after you
> gdb btrfs.ko

Guesswork, but doing that on my sources points to __list_del

(gdb) l *(btrfs_wait_pending_ordered+0x5e)
0x333fe is in btrfs_wait_pending_ordered (include/linux/list.h:89).
84       * This is only for internal list manipulation where we know
85       * the prev/next entries already!
86       */
87      static inline void __list_del(struct list_head * prev, struct list_head * next)
88      {
89              next->prev = prev;
90              prev->next = next;
91      }

that is called from btrfs_wait_pending_ordered. The offset 8 corresponds to
'prev' in struct list_head, so the 'next' pointer is NULL.
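
To make the offset argument concrete, here is a minimal user-space sketch,
assuming a 64-bit build with 8-byte pointers. The struct mirrors the layout of
the kernel's struct list_head; fault_address() is a hypothetical helper made up
for this illustration. It only does integer arithmetic, nothing is
dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same two-pointer layout as the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Hypothetical helper (not kernel code): the address that the store
 * "next->prev = prev" would touch for a given value of 'next'.
 */
static uint64_t fault_address(uint64_t next)
{
	return next + offsetof(struct list_head, prev);
}
```

With next == 0 the helper returns the size of one pointer, i.e. 8 on such a
build, which matches the CR2 value in the oops.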

If we start from list_del_init(&ordered->trans_list), we find that it ends up
being called as

__list_del(entry->prev, entry->next)

(i.e. entry == &ordered->trans_list).

        while (!list_empty(&cur_trans->pending_ordered)) {
                ordered = list_first_entry(&cur_trans->pending_ordered,
                                           struct btrfs_ordered_extent,
                                           trans_list);
                list_del_init(&ordered->trans_list);
                spin_unlock(&fs_info->trans_lock);

                wait_event(ordered->wait, test_bit(BTRFS_ORDERED_COMPLETE,
                                                   &ordered->flags));
                btrfs_put_ordered_extent(ordered);
                spin_lock(&fs_info->trans_lock);
        }
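
For reference, list_first_entry() is container_of() underneath: it subtracts
the offset of the member from the list_head address. The sketch below shows
that arithmetic with a hypothetical stand-in struct (demo_ordered is NOT the
real struct btrfs_ordered_extent). The 0x78 padding is an assumption: it is the
displacement that the bytes "4c 8d 73 88" in the Code: line appear to encode
(lea -0x78(%rbx),%r14). That decoding is a reading of the oops, not something
confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in; only the offset of trans_list matters here. */
struct demo_ordered {
	unsigned char pad[0x78];
	struct list_head trans_list;
};

/* container_of() reduced to integer arithmetic on the member address. */
static uint64_t demo_container_of(uint64_t member_addr)
{
	return member_addr - offsetof(struct demo_ordered, trans_list);
}
```

Feeding it RBX from the oops (0xffff880103c9d950) gives 0xffff880103c9d8d8,
which is the R14 value in the register dump. If the decoding is right, that
suggests RBX held &ordered->trans_list and R14 held ordered.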

So we probably got bogus data from cur_trans->pending_ordered. I don't know if
ordered is zeroed or if just the list_head got corrupted. Given the way the
list_head pointer magic works, it's possible to get there both ways (I think).
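
The "zeroed" case can be sketched in user space. A zero-filled list_head is
neither an empty list (an empty head points to itself) nor a properly linked
entry, so unlinking it stores through a NULL 'next'. The helpers below are
simplified re-implementations for illustration only, and
list_del_init_checked() is a hypothetical variant that refuses to unlink such
an entry instead of crashing. It is not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An initialised (empty) head points to itself in both directions. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Link 'item' just before 'head', i.e. at the tail of the list. */
static void list_add_tail(struct list_head *item, struct list_head *head)
{
	struct list_head *prev = head->prev;

	item->next = head;
	item->prev = prev;
	prev->next = item;
	head->prev = item;
}

/*
 * Hypothetical checked unlink: returns false for an entry whose links
 * are NULL, where the unchecked version would write to address 8.
 */
static bool list_del_init_checked(struct list_head *entry)
{
	if (!entry->next || !entry->prev)
		return false;
	entry->next->prev = entry->prev;	/* the store that faulted */
	entry->prev->next = entry->next;
	init_list_head(entry);
	return true;
}
```

Note that list_empty() returns false for a zero-filled list_head, so an
emptiness check on its own does not catch this state.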


* Re: 4.1-rc6 - kernel crash after doing chattr +C
  2015-06-06  6:07 4.1-rc6 - kernel crash after doing chattr +C Tomasz Chmielewski
  2015-06-08 15:48 ` Chris Mason
@ 2015-07-03 20:25 ` Filipe David Manana
  1 sibling, 0 replies; 4+ messages in thread
From: Filipe David Manana @ 2015-07-03 20:25 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs, Chris Mason

On Sat, Jun 6, 2015 at 7:07 AM, Tomasz Chmielewski <tch@virtall.com> wrote:
> 4.1-rc6, busy filesystem.
>
> I was running mongo import which made quite a lot of IO.
> During the import, I did "chattr +C /var/lib/mongodb" - shortly after I saw
> this in dmesg and server died:
>
> [57860.149839] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000008
> [57860.149877] IP: [<ffffffffc0158b8e>]
> btrfs_wait_pending_ordered+0x5e/0x110 [btrfs]
> [...]

Hi,

I managed to reproduce it; the following patch should fix the problem:

https://patchwork.kernel.org/patch/6716871/

-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
