* Hard crash on 4.9.5
@ 2017-01-23 20:03 Matt McKinnon
2017-01-23 20:27 ` Hans van Kranenburg
2017-01-25 21:06 ` Liu Bo
0 siblings, 2 replies; 8+ messages in thread
From: Matt McKinnon @ 2017-01-23 20:03 UTC (permalink / raw)
To: linux-btrfs
Wondering what to do about this error which says 'reboot needed'. Has
happened three times in the past week:
Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device
sda1): err add delayed dir index item(index: 23810) into the deletion
tree of the delayed node(root id: 257, inode id: 2661433, errno: -17)
Jan 23 14:16:17 my_machine kernel: [ 2568.611010] ------------[ cut here
]------------
Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at
fs/btrfs/delayed-inode.c:1557!
Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode: 0000
[#1] SMP
Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs
qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_reject_ipv4
xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl
nfs lockd grace sunrpc fscache intel_rapl sb_edac edac_core
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64
lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev mei_me
mei lpc_ich ioatdma wmi ipmi_si ipmi_msghandler btrfs shpchp mac_hid lp
parport ses enclosure scsi_transport_sas raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
libcrc32c igb hid_generic i2c_algo_bit raid1 dca usbhid ahci raid0 ptp
megaraid_sas multipath
Jan 23 14:16:17 my_machine kernel: [ 2568.697150] hid libahci pps_core
linear dm_mirror dm_region_hash dm_log
Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm:
nfsd Tainted: G W 4.9.5-custom #1
Jan 23 14:16:17 my_machine kernel: [ 2568.710166] Hardware name:
Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
Jan 23 14:16:17 my_machine kernel: [ 2568.719207] task: ffff95a42addab80
task.stack: ffffb9da85330000
Jan 23 14:16:17 my_machine kernel: [ 2568.725124] RIP:
0010:[<ffffffffc0567ee6>] [<ffffffffc0567ee6>]
btrfs_delete_delayed_dir_index+0x286/0x290 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.735604] RSP:
0018:ffffb9da85333be0 EFLAGS: 00010286
Jan 23 14:16:17 my_machine kernel: [ 2568.740917] RAX: 0000000000000000
RBX: ffff95a3b104b690 RCX: 0000000000000000
Jan 23 14:16:17 my_machine kernel: [ 2568.748048] RDX: 0000000000000001
RSI: ffff95a42fc0dcc8 RDI: ffff95a42fc0dcc8
Jan 23 14:16:17 my_machine kernel: [ 2568.755171] RBP: ffffb9da85333c48
R08: 0000000000000491 R09: 0000000000000000
Jan 23 14:16:17 my_machine kernel: [ 2568.762297] R10: 0000000000000005
R11: 0000000000000006 R12: ffff95a3b104b6d8
Jan 23 14:16:17 my_machine kernel: [ 2568.769429] R13: 0000000000005d02
R14: ffff95a82953d800 R15: 00000000ffffffef
Jan 23 14:16:17 my_machine kernel: [ 2568.776555] FS:
0000000000000000(0000) GS:ffff95a42fc00000(0000) knlGS:0000000000000000
Jan 23 14:16:17 my_machine kernel: [ 2568.784639] CS: 0010 DS: 0000 ES:
0000 CR0: 0000000080050033
Jan 23 14:16:17 my_machine kernel: [ 2568.790377] CR2: 00007f12ea376000
CR3: 00000003e1e07000 CR4: 00000000001406f0
Jan 23 14:16:17 my_machine kernel: [ 2568.797503] Stack:
Jan 23 14:16:17 my_machine kernel: [ 2568.799524] ffffffff9b7fe5f2
ffff95a3b104b560 0000000000040000 ffff95a3f96b3e80
Jan 23 14:16:17 my_machine kernel: [ 2568.806983] ffff95a3f96b3e80
39ff95a814eeeb68 600000000000289c 0000000000005d02
Jan 23 14:16:17 my_machine kernel: [ 2568.814436] ffff95a3f7457c40
ffff95a3bcb74138 ffff95a814eeeb68 0000000000289c39
Jan 23 14:16:17 my_machine kernel: [ 2568.821891] Call Trace:
Jan 23 14:16:17 my_machine kernel: [ 2568.824343] [<ffffffff9b7fe5f2>]
? mutex_lock+0x12/0x2f
Jan 23 14:16:17 my_machine kernel: [ 2568.829671] [<ffffffffc0513488>]
__btrfs_unlink_inode+0x198/0x4c0 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.836555] [<ffffffffc0516dec>]
btrfs_unlink_inode+0x1c/0x40 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.843086] [<ffffffffc0516e7b>]
btrfs_unlink+0x6b/0xb0 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.849091] [<ffffffff9b21ea9a>]
vfs_unlink+0xda/0x190
Jan 23 14:16:17 my_machine kernel: [ 2568.854315] [<ffffffff9b21ac83>]
? lookup_one_len+0xd3/0x130
Jan 23 14:16:17 my_machine kernel: [ 2568.860075] [<ffffffffc09160ae>]
nfsd_unlink+0x16e/0x210 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.866084] [<ffffffffc091d63c>]
nfsd3_proc_remove+0x7c/0x110 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.872529] [<ffffffffc09102a8>]
nfsd_dispatch+0xb8/0x1f0 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.878641] [<ffffffffc064e68f>]
svc_process_common+0x43f/0x700 [sunrpc]
Jan 23 14:16:17 my_machine kernel: [ 2568.885432] [<ffffffffc064f80c>]
svc_process+0xfc/0x1c0 [sunrpc]
Jan 23 14:16:17 my_machine kernel: [ 2568.891528] [<ffffffffc090fd00>]
nfsd+0xf0/0x160 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.896838] [<ffffffffc090fc10>]
? nfsd_destroy+0x60/0x60 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.902931] [<ffffffff9b09cd4a>]
kthread+0xca/0xe0
Jan 23 14:16:17 my_machine kernel: [ 2568.907807] [<ffffffff9b09cc80>]
? kthread_park+0x60/0x60
Jan 23 14:16:17 my_machine kernel: [ 2568.913296] [<ffffffff9b801075>]
ret_from_fork+0x25/0x30
Jan 23 14:16:17 my_machine kernel: [ 2568.918693] Code: ff ff 48 8b 43
10 49 8b be f0 01 00 00 45 89 f9 4c 8b 03 4c 89 ea 48 c7 c6 f0 8f 59 c0
48 8b 88 48 03 00 00 31 c0 e8 ba 36 f7 ff <0f> 0b 0f 1f 84 00 00 00 00
00 0f 1f 44 00 00 55 48 89 e5 53 48
Jan 23 14:16:17 my_machine kernel: [ 2568.938651] RIP
[<ffffffffc0567ee6>] btrfs_delete_delayed_dir_index+0x286/0x290 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.946773] RSP <ffffb9da85333be0>
Jan 23 14:16:17 my_machine kernel: [ 2568.996481] ---[ end trace
e8c95b69e4ef5f70 ]---
Jan 23 14:16:19 my_machine kernel: [ 2570.503671] BUG: unable to handle
kernel NULL pointer dereference at 0000000000000246
Jan 23 14:16:19 my_machine kernel: [ 2570.511551] IP:
[<ffffffff9b0c0ecb>] __wake_up_common+0x2b/0x90
Jan 23 14:16:19 my_machine kernel: [ 2570.517498] PGD 46a002067
Jan 23 14:16:19 my_machine kernel: [ 2570.520036] PUD 45af9c067
Jan 23 14:16:19 my_machine kernel: [ 2570.522748] PMD 0
Jan 23 14:16:19 my_machine kernel: [ 2570.523284]
Jan 23 14:23:50 riperton kernel: [ 3021.853513] [<ffffffff9b18407f>]
queued_spin_lock_slowpath+0xb/0xf
Jan 23 14:23:50 riperton kernel: [ 3021.859776] [<ffffffff9b800b80>]
_raw_spin_lock+0x20/0x30
Jan 23 14:23:50 riperton kernel: [ 3021.865261] [<ffffffff9b27c0bd>]
pid_revalidate+0x4d/0xf0
Jan 23 14:23:50 riperton kernel: [ 3021.870747] [<ffffffff9b21a74b>]
lookup_fast+0x29b/0x2c0
Jan 23 14:23:50 riperton kernel: [ 3021.876147] [<ffffffff9b21d7c2>]
path_openat+0x172/0x1370
Jan 23 14:16:19 my_machine kernel: [ 2570.524789] Oops: 0000 [#2] SMP
Jan 23 14:16:19 my_machine kernel: [ 2570.527932] Modules linked in: ufs
qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_reject_ipv4
xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl
nfs lockd grace sunrpc fscache intel_rapl sb_edac edac_core
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64
lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev mei_me
mei lpc_ich ioatdma wmi ipmi_si ipmi_msghandler btrfs shpchp mac_hid lp
parport ses enclosure scsi_transport_sas raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
libcrc32c igb hid_generic i2c_algo_bit raid1 dca usbhid ahci raid0 ptp
megaraid_sas multipath
Jan 23 14:16:19 my_machine kernel: [ 2570.600135] hid libahci pps_core
linear dm_mirror dm_region_hash dm_log
Jan 23 14:16:19 my_machine kernel: [ 2570.605651] CPU: 2 PID: 2440 Comm:
nfsd Tainted: G D W 4.9.5-custom #1
Jan 23 14:16:19 my_machine kernel: [ 2570.613128] Hardware name:
Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
Jan 23 14:16:19 my_machine kernel: [ 2570.622168] task: ffff95a42addab80
task.stack: ffffb9da85330000
Jan 23 14:16:19 my_machine kernel: [ 2570.628085] RIP:
0010:[<ffffffff9b0c0ecb>] [<ffffffff9b0c0ecb>] __wake_up_common+0x2b/0x90
Jan 23 14:16:19 my_machine kernel: [ 2570.636451] RSP:
0018:ffffb9da85333e58 EFLAGS: 00010082
Jan 23 14:16:19 my_machine kernel: [ 2570.641762] RAX: 0000000000000282
RBX: ffffb9da85333f18 RCX: 0000000000000000
Jan 23 14:16:19 my_machine kernel: [ 2570.648897] RDX: 0000000000000246
RSI: 0000000000000003 RDI: ffffb9da85333f18
Jan 23 14:16:19 my_machine kernel: [ 2570.656028] RBP: ffffb9da85333e90
R08: 0000000000000000 R09: ffff95a429c7ba00
Jan 23 14:16:19 my_machine kernel: [ 2570.663162] R10: 000002567df4f057
R11: 0000000000000001 R12: ffffb9da85333f20
Jan 23 14:16:19 my_machine kernel: [ 2570.670295] R13: 0000000000000282
R14: 0000000000000000 R15: 0000000000000003
Jan 23 14:16:19 my_machine kernel: [ 2570.677427] FS:
0000000000000000(0000) GS:ffff95a42fd00000(0000) knlGS:0000000000000000
Jan 23 14:16:19 my_machine kernel: [ 2570.685513] CS: 0010 DS: 0000 ES:
0000 CR0: 0000000080050033
Jan 23 14:16:19 my_machine kernel: [ 2570.691261] CR2: 0000000000000246
CR3: 000000045a400000 CR4: 00000000001406e0
Jan 23 14:16:19 my_machine kernel: [ 2570.698393] Stack:
Jan 23 14:16:19 my_machine kernel: [ 2570.700411] 0000000100000246
0000000000000000 ffffb9da85333f18 ffffb9da85333f10
Jan 23 14:16:19 my_machine kernel: [ 2570.707865] 0000000000000282
ffff95a42addab80 0000000000000000 ffffb9da85333ea0
Jan 23 14:16:19 my_machine kernel: [ 2570.715326] ffffffff9b0c0f43
ffffb9da85333ec8 ffffffff9b0c1967 ffff95a42addb2a8
Jan 23 14:16:19 my_machine kernel: [ 2570.722797] Call Trace:
Jan 23 14:16:19 my_machine kernel: [ 2570.725267] [<ffffffff9b0c0f43>]
__wake_up_locked+0x13/0x20
Jan 23 14:16:19 my_machine kernel: [ 2570.730923] [<ffffffff9b0c1967>]
complete+0x37/0x50
Jan 23 14:16:19 my_machine kernel: [ 2570.735892] [<ffffffff9b07a74f>]
mm_release+0xbf/0x140
Jan 23 14:16:19 my_machine kernel: [ 2570.741113] [<ffffffff9b08168a>]
do_exit+0x13a/0xad0
Jan 23 14:16:19 my_machine kernel: [ 2570.746169] [<ffffffff9b802627>]
rewind_stack_do_exit+0x17/0x20
Jan 23 14:16:19 my_machine kernel: [ 2570.752170] Code: 0f 1f 44 00 00
55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 4c 8d 67 08 53 48
83 ec 10 89 55 cc 48 8b 57 08 4c 89 45 d0 <48> 8b 0a 49 39 d4 48 8d 42
e8 4c 8d 69 e8 75 08 eb 38 4c 89 e8
Jan 23 14:16:19 my_machine kernel: [ 2570.772172] RIP
[<ffffffff9b0c0ecb>] __wake_up_common+0x2b/0x90
Jan 23 14:16:19 my_machine kernel: [ 2570.778196] RSP <ffffb9da85333e58>
Jan 23 14:16:19 my_machine kernel: [ 2570.781680] CR2: 0000000000000246
Jan 23 14:16:19 my_machine kernel: [ 2570.784993] ---[ end trace
e8c95b69e4ef5f71 ]---
Jan 23 14:16:19 my_machine kernel: [ 2570.794692] Fixing recursive fault
but reboot is needed!
* Re: Hard crash on 4.9.5
2017-01-23 20:03 Hard crash on 4.9.5 Matt McKinnon
@ 2017-01-23 20:27 ` Hans van Kranenburg
2017-01-23 20:33 ` Hans van Kranenburg
2017-01-25 16:20 ` Liu Bo
2017-01-25 21:06 ` Liu Bo
1 sibling, 2 replies; 8+ messages in thread
From: Hans van Kranenburg @ 2017-01-23 20:27 UTC (permalink / raw)
To: Matt McKinnon, linux-btrfs
On 01/23/2017 09:03 PM, Matt McKinnon wrote:
> Wondering what to do about this error which says 'reboot needed'. Has
> happened three times in the past week:
>
> Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device
> sda1): err add delayed dir index item(index: 23810) into the deletion
> tree of the delayed node(root id: 257, inode id: 2661433, errno: -17)
> Jan 23 14:16:17 my_machine kernel: [ 2568.611010] ------------[ cut here
> ]------------
> Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at
> fs/btrfs/delayed-inode.c:1557!
> Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode: 0000
> [#1] SMP
> [...]
The purpose of the code involved is that if you create a directory or
file and quickly remove it again, the filesystem doesn't need to do two
disk writes, it can just erase it again from its memory before writing
anything to disk.
---- 8< more ----
This is when the functionality was added:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=16cdcec736cd214350cdb591bf1091f8beedefa0
If you look for "err add delayed dir" in the source code of that commit,
you see where the error message is constructed with errno: -17, just
after the call to __btrfs_add_delayed_insertion_item.
__btrfs_add_delayed_insertion_item calls __btrfs_add_delayed_item, and
the only non-0 return in that function is: return -EEXIST, which is -17
I think this means you added a file or directory, and the kernel code
tried to add the file twice to the list of additions, which it
has no way to deal with except making the whole kernel crash.
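As a small aside, the errno value in the log can be decoded without
reading any kernel source. The following is a minimal Python sketch,
assuming a host whose userspace errno table uses the same number for
EEXIST as the kernel does (true on Linux). The helper names are
hypothetical. It also shows that the value 0xffffffef seen in R15 of the
first register dump is -17 when read as a signed 32-bit integer; that this
register holds the return value is only an inference from the dump.

```python
import errno
import os


def decode_kernel_errno(value):
    """Map a negative kernel-style return code to (symbolic name, text)."""
    code = abs(value)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


def as_signed32(raw):
    """Interpret a 32-bit register value as a signed integer."""
    return raw - (1 << 32) if raw & 0x80000000 else raw


name, text = decode_kernel_errno(-17)
print(name, "-", text)           # symbolic name and description for 17
print(as_signed32(0xFFFFFFEF))   # R15 from the first oops, as signed 32-bit
```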
---- >8 ----
A while ago someone reported this on IRC, running a 4.8.13 kernel.
(that's when I looked up the above info). I can also find it in Oct 2016
in my IRC logs, but without any info on kernel version.
Anyway, it seems to point to something that's going wrong with changes
that are *not* on disk *yet*, and the crash is preventing .
--
Hans van Kranenburg
* Re: Hard crash on 4.9.5
2017-01-23 20:27 ` Hans van Kranenburg
@ 2017-01-23 20:33 ` Hans van Kranenburg
2017-01-25 16:20 ` Liu Bo
1 sibling, 0 replies; 8+ messages in thread
From: Hans van Kranenburg @ 2017-01-23 20:33 UTC (permalink / raw)
To: Matt McKinnon, linux-btrfs
On 01/23/2017 09:27 PM, Hans van Kranenburg wrote:
> [... press send without rereading ...]
>
> Anyway, it seems to point to something that's going wrong with changes
> that are *not* on disk *yet*, and the crash is preventing ...
... whatever incorrect data this situation might result in from reaching
disk, at least.
--
Hans van Kranenburg
* Re: Hard crash on 4.9.5
2017-01-23 20:27 ` Hans van Kranenburg
2017-01-23 20:33 ` Hans van Kranenburg
@ 2017-01-25 16:20 ` Liu Bo
1 sibling, 0 replies; 8+ messages in thread
From: Liu Bo @ 2017-01-25 16:20 UTC (permalink / raw)
To: Hans van Kranenburg; +Cc: Matt McKinnon, linux-btrfs
On Mon, Jan 23, 2017 at 09:27:22PM +0100, Hans van Kranenburg wrote:
> On 01/23/2017 09:03 PM, Matt McKinnon wrote:
> > Wondering what to do about this error which says 'reboot needed'. Has
> > happened three times in the past week:
> >
> > Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device
> > sda1): err add delayed dir index item(index: 23810) into the deletion
> > tree of the delayed node(root id: 257, inode id: 2661433, errno: -17)
> > Jan 23 14:16:17 my_machine kernel: [ 2568.611010] ------------[ cut here
> > ]------------
> > Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at
> > fs/btrfs/delayed-inode.c:1557!
> > Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode: 0000
> > [#1] SMP
> > [...]
>
> The purpose of the code involved is that if you create a directory or
> file and quickly remove it again, the filesystem doesn't need to do two
> disk writes, it can just erase it again from its memory before writing
> anything to disk.
>
> ---- 8< more ----
>
> This is when the functionality was added:
>
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=16cdcec736cd214350cdb591bf1091f8beedefa0
>
> If you look for "err add delayed dir" in the source code of that commit
> message, you see where the error message is constructed
>
> errno: -17, just after it called __btrfs_add_delayed_insertion_item
>
> __btrfs_add_delayed_insertion_item calls __btrfs_add_delayed_item, and
> the only non-0 return in that function is: return -EEXIST, which is -17
>
> I think this means you added a file or directory, and the kernel code
> tried to add the file twice to the list of additions, which it
> has no way to deal with except making the whole kernel crash.
>
This was happening during an unlink, so I think it somehow ran into a
double deletion.
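To make the double-deletion case concrete, here is a toy Python model. It
is not the btrfs implementation: the class and method names are
hypothetical, and a plain set stands in for the per-directory deletion
tree. It only illustrates how queuing the same directory index twice
before the first request is flushed would yield -EEXIST, matching the
"into the deletion tree ... errno: -17" message in the log.

```python
import errno


class DelayedDeletionTree:
    """Toy stand-in for a per-directory set of pending index deletions."""

    def __init__(self):
        self._pending = set()

    def queue_delete(self, index):
        """Return 0 on success, -EEXIST if this index is already queued."""
        if index in self._pending:
            return -errno.EEXIST
        self._pending.add(index)
        return 0

    def flush(self):
        """Pretend the queued deletions were written out to disk."""
        self._pending.clear()


tree = DelayedDeletionTree()
first = tree.queue_delete(23810)
second = tree.queue_delete(23810)  # same index again, before any flush
print(first, second)               # prints "0 -17"
```

In this model the caller could handle the second return value. The log
above shows the kernel hitting a BUG at that point instead.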
Thanks,
-liubo
> ---- >8 ----
>
> A while ago someone reported this on IRC, running a 4.8.13 kernel.
> (that's when I looked up the above info). I can also find it in Oct 2016
> in my IRC logs, but without any info on kernel version.
>
> Anyway, it seems to point to something that's going wrong with changes
> that are *not* on disk *yet*, and the crash is preventing .
>
> --
> Hans van Kranenburg
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Hard crash on 4.9.5
2017-01-23 20:03 Hard crash on 4.9.5 Matt McKinnon
2017-01-23 20:27 ` Hans van Kranenburg
@ 2017-01-25 21:06 ` Liu Bo
2017-01-28 20:50 ` Matt McKinnon
1 sibling, 1 reply; 8+ messages in thread
From: Liu Bo @ 2017-01-25 21:06 UTC (permalink / raw)
To: Matt McKinnon; +Cc: linux-btrfs
On Mon, Jan 23, 2017 at 03:03:55PM -0500, Matt McKinnon wrote:
> Wondering what to do about this error which says 'reboot needed'. Has
> happened three times in the past week:
>
Well, I don't think btrfs's logic here is wrong. The following stack
shows that an NFS client has sent a second unlink against the same inode
while somehow the inode was not fully deleted by the first unlink.
So it would be good if you could add some debugging information to help
us get further.
Thanks,
-liubo
> Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device sda1):
> err add delayed dir index item(index: 23810) into the deletion tree of the
> delayed node(root id: 257, inode id: 2661433, errno: -17)
> Jan 23 14:16:17 my_machine kernel: [ 2568.611010] ------------[ cut here
> ]------------
> Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at
> fs/btrfs/delayed-inode.c:1557!
> Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode: 0000 [#1]
> SMP
> Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs
> qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_reject_ipv4
> xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
> iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl nfs
> lockd grace sunrpc fscache intel_rapl sb_edac edac_core x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul
> crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul
> glue_helper ablk_helper cryptd dm_multipath joydev mei_me mei lpc_ich
> ioatdma wmi ipmi_si ipmi_msghandler btrfs shpchp mac_hid lp parport ses
> enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy
> async_pq async_xor async_tx xor raid6_pq libcrc32c igb hid_generic
> i2c_algo_bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath
> Jan 23 14:16:17 my_machine kernel: [ 2568.697150] hid libahci pps_core
> linear dm_mirror dm_region_hash dm_log
> Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm:
> nfsd Tainted: G W 4.9.5-custom #1
> Jan 23 14:16:17 my_machine kernel: [ 2568.710166] Hardware name: Supermicro
> X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
> Jan 23 14:16:17 my_machine kernel: [ 2568.719207] task: ffff95a42addab80
> task.stack: ffffb9da85330000
> Jan 23 14:16:17 my_machine kernel: [ 2568.725124] RIP:
> 0010:[<ffffffffc0567ee6>] [<ffffffffc0567ee6>]
> btrfs_delete_delayed_dir_index+0x286/0x290 [btrfs]
> Jan 23 14:16:17 my_machine kernel: [ 2568.735604] RSP: 0018:ffffb9da85333be0
> EFLAGS: 00010286
> Jan 23 14:16:17 my_machine kernel: [ 2568.740917] RAX: 0000000000000000 RBX:
> ffff95a3b104b690 RCX: 0000000000000000
> Jan 23 14:16:17 my_machine kernel: [ 2568.748048] RDX: 0000000000000001 RSI:
> ffff95a42fc0dcc8 RDI: ffff95a42fc0dcc8
> Jan 23 14:16:17 my_machine kernel: [ 2568.755171] RBP: ffffb9da85333c48 R08:
> 0000000000000491 R09: 0000000000000000
> Jan 23 14:16:17 my_machine kernel: [ 2568.762297] R10: 0000000000000005 R11:
> 0000000000000006 R12: ffff95a3b104b6d8
> Jan 23 14:16:17 my_machine kernel: [ 2568.769429] R13: 0000000000005d02 R14:
> ffff95a82953d800 R15: 00000000ffffffef
> Jan 23 14:16:17 my_machine kernel: [ 2568.776555] FS: 0000000000000000(0000)
> GS:ffff95a42fc00000(0000) knlGS:0000000000000000
> Jan 23 14:16:17 my_machine kernel: [ 2568.784639] CS: 0010 DS: 0000 ES:
> 0000 CR0: 0000000080050033
> Jan 23 14:16:17 my_machine kernel: [ 2568.790377] CR2: 00007f12ea376000 CR3:
> 00000003e1e07000 CR4: 00000000001406f0
> Jan 23 14:16:17 my_machine kernel: [ 2568.797503] Stack:
> Jan 23 14:16:17 my_machine kernel: [ 2568.799524] ffffffff9b7fe5f2
> ffff95a3b104b560 0000000000040000 ffff95a3f96b3e80
> Jan 23 14:16:17 my_machine kernel: [ 2568.806983] ffff95a3f96b3e80
> 39ff95a814eeeb68 600000000000289c 0000000000005d02
> Jan 23 14:16:17 my_machine kernel: [ 2568.814436] ffff95a3f7457c40
> ffff95a3bcb74138 ffff95a814eeeb68 0000000000289c39
> Jan 23 14:16:17 my_machine kernel: [ 2568.821891] Call Trace:
> Jan 23 14:16:17 my_machine kernel: [ 2568.824343] [<ffffffff9b7fe5f2>] ?
> mutex_lock+0x12/0x2f
> Jan 23 14:16:17 my_machine kernel: [ 2568.829671] [<ffffffffc0513488>]
> __btrfs_unlink_inode+0x198/0x4c0 [btrfs]
> Jan 23 14:16:17 my_machine kernel: [ 2568.836555] [<ffffffffc0516dec>]
> btrfs_unlink_inode+0x1c/0x40 [btrfs]
> Jan 23 14:16:17 my_machine kernel: [ 2568.843086] [<ffffffffc0516e7b>]
> btrfs_unlink+0x6b/0xb0 [btrfs]
> Jan 23 14:16:17 my_machine kernel: [ 2568.849091] [<ffffffff9b21ea9a>]
> vfs_unlink+0xda/0x190
> Jan 23 14:16:17 my_machine kernel: [ 2568.854315] [<ffffffff9b21ac83>] ?
> lookup_one_len+0xd3/0x130
> Jan 23 14:16:17 my_machine kernel: [ 2568.860075] [<ffffffffc09160ae>]
> nfsd_unlink+0x16e/0x210 [nfsd]
> Jan 23 14:16:17 my_machine kernel: [ 2568.866084] [<ffffffffc091d63c>]
> nfsd3_proc_remove+0x7c/0x110 [nfsd]
> Jan 23 14:16:17 my_machine kernel: [ 2568.872529] [<ffffffffc09102a8>]
> nfsd_dispatch+0xb8/0x1f0 [nfsd]
> Jan 23 14:16:17 my_machine kernel: [ 2568.878641] [<ffffffffc064e68f>]
> svc_process_common+0x43f/0x700 [sunrpc]
> Jan 23 14:16:17 my_machine kernel: [ 2568.885432] [<ffffffffc064f80c>]
> svc_process+0xfc/0x1c0 [sunrpc]
> Jan 23 14:16:17 my_machine kernel: [ 2568.891528] [<ffffffffc090fd00>]
> nfsd+0xf0/0x160 [nfsd]
> Jan 23 14:16:17 my_machine kernel: [ 2568.896838] [<ffffffffc090fc10>] ?
> nfsd_destroy+0x60/0x60 [nfsd]
> Jan 23 14:16:17 my_machine kernel: [ 2568.902931] [<ffffffff9b09cd4a>]
> kthread+0xca/0xe0
> Jan 23 14:16:17 my_machine kernel: [ 2568.907807] [<ffffffff9b09cc80>] ?
> kthread_park+0x60/0x60
> Jan 23 14:16:17 my_machine kernel: [ 2568.913296] [<ffffffff9b801075>]
> ret_from_fork+0x25/0x30
> Jan 23 14:16:17 my_machine kernel: [ 2568.918693] Code: ff ff 48 8b 43 10 49
> 8b be f0 01 00 00 45 89 f9 4c 8b 03 4c 89 ea 48 c7 c6 f0 8f 59 c0 48 8b 88
> 48 03 00 00 31 c0 e8 ba 36 f7 ff <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44
> 00 00 55 48 89 e5 53 48
> Jan 23 14:16:17 my_machine kernel: [ 2568.938651] RIP [<ffffffffc0567ee6>]
> btrfs_delete_delayed_dir_index+0x286/0x290 [btrfs]
> Jan 23 14:16:17 my_machine kernel: [ 2568.946773] RSP <ffffb9da85333be0>
> Jan 23 14:16:17 my_machine kernel: [ 2568.996481] ---[ end trace
> e8c95b69e4ef5f70 ]---
> Jan 23 14:16:19 my_machine kernel: [ 2570.503671] BUG: unable to handle
> kernel NULL pointer dereference at 0000000000000246
> Jan 23 14:16:19 my_machine kernel: [ 2570.511551] IP: [<ffffffff9b0c0ecb>]
> __wake_up_common+0x2b/0x90
> Jan 23 14:16:19 my_machine kernel: [ 2570.517498] PGD 46a002067
> Jan 23 14:16:19 my_machine kernel: [ 2570.520036] PUD 45af9c067
> Jan 23 14:16:19 my_machine kernel: [ 2570.522748] PMD 0
> Jan 23 14:16:19 my_machine kernel: [ 2570.523284]
> Jan 23 14:23:50 riperton kernel: [ 3021.853513] [<ffffffff9b18407f>]
> queued_spin_lock_slowpath+0xb/0xf
> Jan 23 14:23:50 riperton kernel: [ 3021.859776] [<ffffffff9b800b80>]
> _raw_spin_lock+0x20/0x30
> Jan 23 14:23:50 riperton kernel: [ 3021.865261] [<ffffffff9b27c0bd>]
> pid_revalidate+0x4d/0xf0
> Jan 23 14:23:50 riperton kernel: [ 3021.870747] [<ffffffff9b21a74b>]
> lookup_fast+0x29b/0x2c0
> Jan 23 14:23:50 riperton kernel: [ 3021.876147] [<ffffffff9b21d7c2>]
> path_openat+0x172/0x1370
> Jan 23 14:16:19 my_machine kernel: [ 2570.524789] Oops: 0000 [#2] SMP
> Jan 23 14:16:19 my_machine kernel: [ 2570.527932] Modules linked in: ufs
> qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_reject_ipv4
> xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
> iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl nfs
> lockd grace sunrpc fscache intel_rapl sb_edac edac_core x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul
> crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul
> glue_helper ablk_helper cryptd dm_multipath joydev mei_me mei lpc_ich
> ioatdma wmi ipmi_si ipmi_msghandler btrfs shpchp mac_hid lp parport ses
> enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy
> async_pq async_xor async_tx xor raid6_pq libcrc32c igb hid_generic
> i2c_algo_bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath
> Jan 23 14:16:19 my_machine kernel: [ 2570.600135] hid libahci pps_core
> linear dm_mirror dm_region_hash dm_log
> Jan 23 14:16:19 my_machine kernel: [ 2570.605651] CPU: 2 PID: 2440 Comm:
> nfsd Tainted: G D W 4.9.5-custom #1
> Jan 23 14:16:19 my_machine kernel: [ 2570.613128] Hardware name: Supermicro
> X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
> Jan 23 14:16:19 my_machine kernel: [ 2570.622168] task: ffff95a42addab80
> task.stack: ffffb9da85330000
> Jan 23 14:16:19 my_machine kernel: [ 2570.628085] RIP:
> 0010:[<ffffffff9b0c0ecb>] [<ffffffff9b0c0ecb>] __wake_up_common+0x2b/0x90
> Jan 23 14:16:19 my_machine kernel: [ 2570.636451] RSP: 0018:ffffb9da85333e58
> EFLAGS: 00010082
> Jan 23 14:16:19 my_machine kernel: [ 2570.641762] RAX: 0000000000000282 RBX:
> ffffb9da85333f18 RCX: 0000000000000000
> Jan 23 14:16:19 my_machine kernel: [ 2570.648897] RDX: 0000000000000246 RSI:
> 0000000000000003 RDI: ffffb9da85333f18
> Jan 23 14:16:19 my_machine kernel: [ 2570.656028] RBP: ffffb9da85333e90 R08:
> 0000000000000000 R09: ffff95a429c7ba00
> Jan 23 14:16:19 my_machine kernel: [ 2570.663162] R10: 000002567df4f057 R11:
> 0000000000000001 R12: ffffb9da85333f20
> Jan 23 14:16:19 my_machine kernel: [ 2570.670295] R13: 0000000000000282 R14:
> 0000000000000000 R15: 0000000000000003
> Jan 23 14:16:19 my_machine kernel: [ 2570.677427] FS: 0000000000000000(0000)
> GS:ffff95a42fd00000(0000) knlGS:0000000000000000
> Jan 23 14:16:19 my_machine kernel: [ 2570.685513] CS: 0010 DS: 0000 ES:
> 0000 CR0: 0000000080050033
> Jan 23 14:16:19 my_machine kernel: [ 2570.691261] CR2: 0000000000000246 CR3:
> 000000045a400000 CR4: 00000000001406e0
> Jan 23 14:16:19 my_machine kernel: [ 2570.698393] Stack:
> Jan 23 14:16:19 my_machine kernel: [ 2570.700411] 0000000100000246
> 0000000000000000 ffffb9da85333f18 ffffb9da85333f10
> Jan 23 14:16:19 my_machine kernel: [ 2570.707865] 0000000000000282
> ffff95a42addab80 0000000000000000 ffffb9da85333ea0
> Jan 23 14:16:19 my_machine kernel: [ 2570.715326] ffffffff9b0c0f43
> ffffb9da85333ec8 ffffffff9b0c1967 ffff95a42addb2a8
> Jan 23 14:16:19 my_machine kernel: [ 2570.722797] Call Trace:
> Jan 23 14:16:19 my_machine kernel: [ 2570.725267] [<ffffffff9b0c0f43>]
> __wake_up_locked+0x13/0x20
> Jan 23 14:16:19 my_machine kernel: [ 2570.730923] [<ffffffff9b0c1967>]
> complete+0x37/0x50
> Jan 23 14:16:19 my_machine kernel: [ 2570.735892] [<ffffffff9b07a74f>]
> mm_release+0xbf/0x140
> Jan 23 14:16:19 my_machine kernel: [ 2570.741113] [<ffffffff9b08168a>]
> do_exit+0x13a/0xad0
> Jan 23 14:16:19 my_machine kernel: [ 2570.746169] [<ffffffff9b802627>]
> rewind_stack_do_exit+0x17/0x20
> Jan 23 14:16:19 my_machine kernel: [ 2570.752170] Code: 0f 1f 44 00 00 55 48
> 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 4c 8d 67 08 53 48 83 ec 10
> 89 55 cc 48 8b 57 08 4c 89 45 d0 <48> 8b 0a 49 39 d4 48 8d 42 e8 4c 8d 69 e8
> 75 08 eb 38 4c 89 e8
> Jan 23 14:16:19 my_machine kernel: [ 2570.772172] RIP [<ffffffff9b0c0ecb>]
> __wake_up_common+0x2b/0x90
> Jan 23 14:16:19 my_machine kernel: [ 2570.778196] RSP <ffffb9da85333e58>
> Jan 23 14:16:19 my_machine kernel: [ 2570.781680] CR2: 0000000000000246
> Jan 23 14:16:19 my_machine kernel: [ 2570.784993] ---[ end trace
> e8c95b69e4ef5f71 ]---
> Jan 23 14:16:19 my_machine kernel: [ 2570.794692] Fixing recursive fault but
> reboot is needed!
* Re: Hard crash on 4.9.5
2017-01-25 21:06 ` Liu Bo
@ 2017-01-28 20:50 ` Matt McKinnon
2017-03-13 21:58 ` Kai Krakow
0 siblings, 1 reply; 8+ messages in thread
From: Matt McKinnon @ 2017-01-28 20:50 UTC (permalink / raw)
To: bo.li.liu; +Cc: linux-btrfs
This same file system (which crashed again with the same errors) is also
giving this output during a metadata or data balance:
Jan 27 19:42:47 my_machine kernel: [ 335.018123] BTRFS info (device
sda1): no csum found for inode 28472371 start 2191360
Jan 27 19:42:47 my_machine kernel: [ 335.018128] BTRFS info (device
sda1): no csum found for inode 28472371 start 2195456
Jan 27 19:42:47 my_machine kernel: [ 335.018491] BTRFS info (device
sda1): no csum found for inode 28472371 start 4018176
Jan 27 19:42:47 my_machine kernel: [ 335.018496] BTRFS info (device
sda1): no csum found for inode 28472371 start 4022272
Jan 27 19:42:47 my_machine kernel: [ 335.018499] BTRFS info (device
sda1): no csum found for inode 28472371 start 4026368
Jan 27 19:42:47 my_machine kernel: [ 335.018502] BTRFS info (device
sda1): no csum found for inode 28472371 start 4030464
Jan 27 19:42:47 my_machine kernel: [ 335.019443] BTRFS info (device
sda1): no csum found for inode 28472371 start 6156288
Jan 27 19:42:47 my_machine kernel: [ 335.019688] BTRFS info (device
sda1): no csum found for inode 28472371 start 7933952
Jan 27 19:42:47 my_machine kernel: [ 335.019693] BTRFS info (device
sda1): no csum found for inode 28472371 start 7938048
Jan 27 19:42:47 my_machine kernel: [ 335.019754] BTRFS info (device
sda1): no csum found for inode 28472371 start 8077312
Jan 27 19:42:47 my_machine kernel: [ 335.025485] BTRFS warning (device
sda1): csum failed ino 28472371 off 2191360 csum 4031061501 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.025490] BTRFS warning (device
sda1): csum failed ino 28472371 off 2195456 csum 2371784003 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.025526] BTRFS warning (device
sda1): csum failed ino 28472371 off 4018176 csum 3812080098 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.025531] BTRFS warning (device
sda1): csum failed ino 28472371 off 4022272 csum 2776681411 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.025534] BTRFS warning (device
sda1): csum failed ino 28472371 off 4026368 csum 1179241675 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.025540] BTRFS warning (device
sda1): csum failed ino 28472371 off 4030464 csum 1256914217 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.026142] BTRFS warning (device
sda1): csum failed ino 28472371 off 7933952 csum 2695958066 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.026147] BTRFS warning (device
sda1): csum failed ino 28472371 off 7938048 csum 3260800596 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.026934] BTRFS warning (device
sda1): csum failed ino 28472371 off 6156288 csum 4293116449 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.033249] BTRFS warning (device
sda1): csum failed ino 28472371 off 8077312 csum 4031878292 expected csum 0
Can these be ignored?
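To judge how widespread these warnings are, it can help to group them by inode and offset first. The following is a minimal sketch, assuming the log format shown above (the regular expression and the function name `summarize_csum_failures` are illustrative, not part of any btrfs tool). It does not answer whether the warnings are safe to ignore; it only condenses them.

```python
import re
from collections import defaultdict

# Matches the "csum failed" warnings in the format printed in the log above.
CSUM_FAILED = re.compile(
    r"csum failed ino (\d+) off (\d+) csum (\d+) expected csum (\d+)"
)


def summarize_csum_failures(log_text):
    """Return {inode: sorted list of failing byte offsets}."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    failures = defaultdict(set)
    for ino, off, _found, _expected in CSUM_FAILED.findall(flat):
        failures[int(ino)].add(int(off))
    return {ino: sorted(offs) for ino, offs in failures.items()}


sample = """
Jan 27 19:42:47 my_machine kernel: [ 335.025485] BTRFS warning (device
sda1): csum failed ino 28472371 off 2191360 csum 4031061501 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.025490] BTRFS warning (device
sda1): csum failed ino 28472371 off 2195456 csum 2371784003 expected csum 0
"""

summary = summarize_csum_failures(sample)
for ino, offsets in summary.items():
    print(ino, offsets)
```

All warnings in this excerpt refer to the same inode (28472371), so a single file appears to be affected. The inode number can usually be mapped back to a path with `btrfs inspect-internal inode-resolve`.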
On 01/25/2017 04:06 PM, Liu Bo wrote:
> On Mon, Jan 23, 2017 at 03:03:55PM -0500, Matt McKinnon wrote:
>> Wondering what to do about this error which says 'reboot needed'. It has
>> happened three times in the past week:
>>
>
> Well, I don't think btrfs's logic here is wrong. The following stack
> shows that an NFS client has sent a second unlink against the same inode
> while somehow the inode was not fully deleted by the first unlink.
>
> So it would be good if you could add some debugging information to get us
> further.
>
> Thanks,
>
> -liubo
>
>> Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device sda1):
>> err add delayed dir index item(index: 23810) into the deletion tree of the
>> delayed node(root id: 257, inode id: 2661433, errno: -17)
>> Jan 23 14:16:17 my_machine kernel: [ 2568.611010] ------------[ cut here
>> ]------------
>> Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at
>> fs/btrfs/delayed-inode.c:1557!
>> Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode: 0000 [#1]
>> SMP
>> Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs
>> qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_rej
>> ect_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
>> nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd au
>> th_rpcgss nfs_acl nfs lockd grace sunrpc fscache intel_rapl sb_edac
>> edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_int
>> el kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
>> aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper crypt
>> d dm_multipath joydev mei_me mei lpc_ich ioatdma wmi ipmi_si ipmi_msghandler
>> btrfs shpchp mac_hid lp parport ses enclosure scsi_tran
>> sport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
>> async_tx xor raid6_pq libcrc32c igb hid_generic i2c_algo_
>> bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath
>> Jan 23 14:16:17 my_machine kernel: [ 2568.697150] hid libahci pps_core
>> linear dm_mirror dm_region_hash dm_log
>> Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm:
>> nfsd Tainted: G W 4.9.5-custom #1
>> Jan 23 14:16:17 my_machine kernel: [ 2568.710166] Hardware name: Supermicro
>> X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28
>> /2014
>> Jan 23 14:16:17 my_machine kernel: [ 2568.719207] task: ffff95a42addab80
>> task.stack: ffffb9da85330000
>> Jan 23 14:16:17 my_machine kernel: [ 2568.725124] RIP:
>> 0010:[<ffffffffc0567ee6>] [<ffffffffc0567ee6>]
>> btrfs_delete_delayed_dir_inde
>> x+0x286/0x290 [btrfs]
>> Jan 23 14:16:17 my_machine kernel: [ 2568.735604] RSP: 0018:ffffb9da85333be0
>> EFLAGS: 00010286
>> Jan 23 14:16:17 my_machine kernel: [ 2568.740917] RAX: 0000000000000000 RBX:
>> ffff95a3b104b690 RCX: 0000000000000000
>> Jan 23 14:16:17 my_machine kernel: [ 2568.748048] RDX: 0000000000000001 RSI:
>> ffff95a42fc0dcc8 RDI: ffff95a42fc0dcc8
>> Jan 23 14:16:17 my_machine kernel: [ 2568.755171] RBP: ffffb9da85333c48 R08:
>> 0000000000000491 R09: 0000000000000000
>> Jan 23 14:16:17 my_machine kernel: [ 2568.762297] R10: 0000000000000005 R11:
>> 0000000000000006 R12: ffff95a3b104b6d8
>> Jan 23 14:16:17 my_machine kernel: [ 2568.769429] R13: 0000000000005d02 R14:
>> ffff95a82953d800 R15: 00000000ffffffef
>> Jan 23 14:16:17 my_machine kernel: [ 2568.776555] FS: 0000000000000000(0000)
>> GS:ffff95a42fc00000(0000) knlGS:0000000000000000
>> Jan 23 14:16:17 my_machine kernel: [ 2568.784639] CS: 0010 DS: 0000 ES:
>> 0000 CR0: 0000000080050033
>> Jan 23 14:16:17 my_machine kernel: [ 2568.790377] CR2: 00007f12ea376000 CR3:
>> 00000003e1e07000 CR4: 00000000001406f0
>> Jan 23 14:16:17 my_machine kernel: [ 2568.797503] Stack:
>> Jan 23 14:16:17 my_machine kernel: [ 2568.799524] ffffffff9b7fe5f2
>> ffff95a3b104b560 0000000000040000 ffff95a3f96b3e80
>> Jan 23 14:16:17 my_machine kernel: [ 2568.806983] ffff95a3f96b3e80
>> 39ff95a814eeeb68 600000000000289c 0000000000005d02
>> Jan 23 14:16:17 my_machine kernel: [ 2568.814436] ffff95a3f7457c40
>> ffff95a3bcb74138 ffff95a814eeeb68 0000000000289c39
>> Jan 23 14:16:17 my_machine kernel: [ 2568.821891] Call Trace:
>> Jan 23 14:16:17 my_machine kernel: [ 2568.824343] [<ffffffff9b7fe5f2>] ?
>> mutex_lock+0x12/0x2f
>> Jan 23 14:16:17 my_machine kernel: [ 2568.829671] [<ffffffffc0513488>]
>> __btrfs_unlink_inode+0x198/0x4c0 [btrfs]
>> Jan 23 14:16:17 my_machine kernel: [ 2568.836555] [<ffffffffc0516dec>]
>> btrfs_unlink_inode+0x1c/0x40 [btrfs]
>> Jan 23 14:16:17 my_machine kernel: [ 2568.843086] [<ffffffffc0516e7b>]
>> btrfs_unlink+0x6b/0xb0 [btrfs]
>> Jan 23 14:16:17 my_machine kernel: [ 2568.849091] [<ffffffff9b21ea9a>]
>> vfs_unlink+0xda/0x190
>> Jan 23 14:16:17 my_machine kernel: [ 2568.854315] [<ffffffff9b21ac83>] ?
>> lookup_one_len+0xd3/0x130
>> Jan 23 14:16:17 my_machine kernel: [ 2568.860075] [<ffffffffc09160ae>]
>> nfsd_unlink+0x16e/0x210 [nfsd]
>> Jan 23 14:16:17 my_machine kernel: [ 2568.866084] [<ffffffffc091d63c>]
>> nfsd3_proc_remove+0x7c/0x110 [nfsd]
>> Jan 23 14:16:17 my_machine kernel: [ 2568.872529] [<ffffffffc09102a8>]
>> nfsd_dispatch+0xb8/0x1f0 [nfsd]
>> Jan 23 14:16:17 my_machine kernel: [ 2568.878641] [<ffffffffc064e68f>]
>> svc_process_common+0x43f/0x700 [sunrpc]
>> Jan 23 14:16:17 my_machine kernel: [ 2568.885432] [<ffffffffc064f80c>]
>> svc_process+0xfc/0x1c0 [sunrpc]
>> Jan 23 14:16:17 my_machine kernel: [ 2568.891528] [<ffffffffc090fd00>]
>> nfsd+0xf0/0x160 [nfsd]
>> Jan 23 14:16:17 my_machine kernel: [ 2568.896838] [<ffffffffc090fc10>] ?
>> nfsd_destroy+0x60/0x60 [nfsd]
>> Jan 23 14:16:17 my_machine kernel: [ 2568.902931] [<ffffffff9b09cd4a>]
>> kthread+0xca/0xe0
>> Jan 23 14:16:17 my_machine kernel: [ 2568.907807] [<ffffffff9b09cc80>] ?
>> kthread_park+0x60/0x60
>> Jan 23 14:16:17 my_machine kernel: [ 2568.913296] [<ffffffff9b801075>]
>> ret_from_fork+0x25/0x30
>> Jan 23 14:16:17 my_machine kernel: [ 2568.918693] Code: ff ff 48 8b 43 10 49
>> 8b be f0 01 00 00 45 89 f9 4c 8b 03 4c 89 ea 48 c7 c6 f
>> 0 8f 59 c0 48 8b 88 48 03 00 00 31 c0 e8 ba 36 f7 ff <0f> 0b 0f 1f 84 00 00
>> 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48
>> Jan 23 14:16:17 my_machine kernel: [ 2568.938651] RIP [<ffffffffc0567ee6>]
>> btrfs_delete_delayed_dir_index+0x286/0x290 [btrfs]
>> Jan 23 14:16:17 my_machine kernel: [ 2568.946773] RSP <ffffb9da85333be0>
>> Jan 23 14:16:17 my_machine kernel: [ 2568.996481] ---[ end trace
>> e8c95b69e4ef5f70 ]---
>> Jan 23 14:16:19 my_machine kernel: [ 2570.503671] BUG: unable to handle
>> kernel NULL pointer dereference at 0000000000000246
>> Jan 23 14:16:19 my_machine kernel: [ 2570.511551] IP: [<ffffffff9b0c0ecb>]
>> __wake_up_common+0x2b/0x90
>> Jan 23 14:16:19 my_machine kernel: [ 2570.517498] PGD 46a002067
>> Jan 23 14:16:19 my_machine kernel: [ 2570.520036] PUD 45af9c067
>> Jan 23 14:16:19 my_machine kernel: [ 2570.522748] PMD 0
>> Jan 23 14:16:19 my_machine kernel: [ 2570.523284]
>> Jan 23 14:23:50 riperton kernel: [ 3021.853513] [<ffffffff9b18407f>]
>> queued_spin_lock_slowpath+0xb/0xf
>> Jan 23 14:23:50 riperton kernel: [ 3021.859776] [<ffffffff9b800b80>]
>> _raw_spin_lock+0x20/0x30
>> Jan 23 14:23:50 riperton kernel: [ 3021.865261] [<ffffffff9b27c0bd>]
>> pid_revalidate+0x4d/0xf0
>> Jan 23 14:23:50 riperton kernel: [ 3021.870747] [<ffffffff9b21a74b>]
>> lookup_fast+0x29b/0x2c0
>> Jan 23 14:23:50 riperton kernel: [ 3021.876147] [<ffffffff9b21d7c2>]
>> path_openat+0x172/0x1370
>> Jan 23 14:16:19 my_machine kernel: [ 2570.524789] Oops: 0000 [#2] SMP
>> Jan 23 14:16:19 my_machine kernel: [ 2570.527932] Modules linked in: ufs
>> qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_reject_ipv4
>> xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
>> iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl nfs
>> lockd grace sunrpc fscache intel_rapl sb_edac edac_core x86_pkg_temp_thermal
>> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul
>> crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul
>> glue_helper ablk_helper cryptd dm_multipath joydev mei_me mei lpc_ich
>> ioatdma wmi ipmi_si ipmi_msghandler btrfs shpchp mac_hid lp parport ses
>> enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy
>> async_pq async_xor async_tx xor raid6_pq libcrc32c igb hid_generic
>> i2c_algo_bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath
>> Jan 23 14:16:19 my_machine kernel: [ 2570.600135] hid libahci pps_core
>> linear dm_mirror dm_region_hash dm_log
>> Jan 23 14:16:19 my_machine kernel: [ 2570.605651] CPU: 2 PID: 2440 Comm:
>> nfsd Tainted: G D W 4.9.5-custom #1
>> Jan 23 14:16:19 my_machine kernel: [ 2570.613128] Hardware name: Supermicro
>> X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
>> Jan 23 14:16:19 my_machine kernel: [ 2570.622168] task: ffff95a42addab80
>> task.stack: ffffb9da85330000
>> Jan 23 14:16:19 my_machine kernel: [ 2570.628085] RIP:
>> 0010:[<ffffffff9b0c0ecb>] [<ffffffff9b0c0ecb>] __wake_up_common+0x2b/0x90
>> Jan 23 14:16:19 my_machine kernel: [ 2570.636451] RSP: 0018:ffffb9da85333e58
>> EFLAGS: 00010082
>> Jan 23 14:16:19 my_machine kernel: [ 2570.641762] RAX: 0000000000000282 RBX:
>> ffffb9da85333f18 RCX: 0000000000000000
>> Jan 23 14:16:19 my_machine kernel: [ 2570.648897] RDX: 0000000000000246 RSI:
>> 0000000000000003 RDI: ffffb9da85333f18
>> Jan 23 14:16:19 my_machine kernel: [ 2570.656028] RBP: ffffb9da85333e90 R08:
>> 0000000000000000 R09: ffff95a429c7ba00
>> Jan 23 14:16:19 my_machine kernel: [ 2570.663162] R10: 000002567df4f057 R11:
>> 0000000000000001 R12: ffffb9da85333f20
>> Jan 23 14:16:19 my_machine kernel: [ 2570.670295] R13: 0000000000000282 R14:
>> 0000000000000000 R15: 0000000000000003
>> Jan 23 14:16:19 my_machine kernel: [ 2570.677427] FS: 0000000000000000(0000)
>> GS:ffff95a42fd00000(0000) knlGS:0000000000000000
>> Jan 23 14:16:19 my_machine kernel: [ 2570.685513] CS: 0010 DS: 0000 ES:
>> 0000 CR0: 0000000080050033
>> Jan 23 14:16:19 my_machine kernel: [ 2570.691261] CR2: 0000000000000246 CR3:
>> 000000045a400000 CR4: 00000000001406e0
>> Jan 23 14:16:19 my_machine kernel: [ 2570.698393] Stack:
>> Jan 23 14:16:19 my_machine kernel: [ 2570.700411] 0000000100000246
>> 0000000000000000 ffffb9da85333f18 ffffb9da85333f10
>> Jan 23 14:16:19 my_machine kernel: [ 2570.707865] 0000000000000282
>> ffff95a42addab80 0000000000000000 ffffb9da85333ea0
>> Jan 23 14:16:19 my_machine kernel: [ 2570.715326] ffffffff9b0c0f43
>> ffffb9da85333ec8 ffffffff9b0c1967 ffff95a42addb2a8
>> Jan 23 14:16:19 my_machine kernel: [ 2570.722797] Call Trace:
>> Jan 23 14:16:19 my_machine kernel: [ 2570.725267] [<ffffffff9b0c0f43>]
>> __wake_up_locked+0x13/0x20
>> Jan 23 14:16:19 my_machine kernel: [ 2570.730923] [<ffffffff9b0c1967>]
>> complete+0x37/0x50
>> Jan 23 14:16:19 my_machine kernel: [ 2570.735892] [<ffffffff9b07a74f>]
>> mm_release+0xbf/0x140
>> Jan 23 14:16:19 my_machine kernel: [ 2570.741113] [<ffffffff9b08168a>]
>> do_exit+0x13a/0xad0
>> Jan 23 14:16:19 my_machine kernel: [ 2570.746169] [<ffffffff9b802627>]
>> rewind_stack_do_exit+0x17/0x20
>> Jan 23 14:16:19 my_machine kernel: [ 2570.752170] Code: 0f 1f 44 00 00 55 48
>> 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 4c 8d 67 08 53 48 83 ec 10
>> 89 55 cc 48 8b 57 08 4c 89 45 d0 <48> 8b 0a 49 39 d4 48 8d 42 e8 4c 8d 69 e8
>> 75 08 eb 38 4c 89 e8
>> Jan 23 14:16:19 my_machine kernel: [ 2570.772172] RIP [<ffffffff9b0c0ecb>]
>> __wake_up_common+0x2b/0x90
>> Jan 23 14:16:19 my_machine kernel: [ 2570.778196] RSP <ffffb9da85333e58>
>> Jan 23 14:16:19 my_machine kernel: [ 2570.781680] CR2: 0000000000000246
>> Jan 23 14:16:19 my_machine kernel: [ 2570.784993] ---[ end trace
>> e8c95b69e4ef5f71 ]---
>> Jan 23 14:16:19 my_machine kernel: [ 2570.794692] Fixing recursive fault but
>> reboot is needed!
* Re: Hard crash on 4.9.5
2017-01-28 20:50 ` Matt McKinnon
@ 2017-03-13 21:58 ` Kai Krakow
2017-03-13 22:19 ` Omar Sandoval
0 siblings, 1 reply; 8+ messages in thread
From: Kai Krakow @ 2017-03-13 21:58 UTC (permalink / raw)
To: linux-btrfs
On Sat, 28 Jan 2017 15:50:38 -0500,
Matt McKinnon <matt@techsquare.com> wrote:
> This same file system (which crashed again with the same errors) is
> also giving this output during a metadata or data balance:
This looks similar to the err=-17 that I am experiencing when
using a VirtualBox image on btrfs in CoW mode (compress=lzo).
During IO-intensive workloads, it results in "object already exists,
err -17" (or similar; someone else also experienced it through another
workload). The resulting btrfs check shows the same errors, reporting
inodes without csum.
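For reference, the "-17" in these messages is a negated errno value. A small sketch, assuming a Linux host, that decodes it with the Python standard library; the helper names `decode_errno` and `as_signed32` are made up for this illustration. The second helper shows that the value 00000000ffffffef seen in R15 of the first oops is the same -17 when read as a signed 32-bit integer (that R15 holds the errno there is an inference from the dump, not something the thread confirms).

```python
import errno
import os


def decode_errno(value):
    """Return (symbolic name, message) for a kernel-style negative errno."""
    code = abs(value)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def as_signed32(raw):
    """Interpret a 32-bit register value as a signed integer."""
    return raw - (1 << 32) if raw & 0x80000000 else raw


name, message = decode_errno(-17)
print(name, "-", message)
print(as_signed32(0xFFFFFFEF))
```

On Linux, errno 17 is EEXIST, which matches the "already exists" wording of the btrfs error.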
Trying to continue using this file system in successive boots usually
results in boot freezes or a completely unmountable filesystem, broken
beyond repair.
My impression is that using the bfq elevator usually lets me trigger
this bug even without VirtualBox, i.e. during normal system
usage, and mostly during boot when IO load is very high. So I also
stopped using bfq, although it gave me much better
interactivity.
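To confirm which elevator a disk is actually using, the active scheduler can be read from sysfs, where it is shown in square brackets. A minimal sketch, assuming the usual `/sys/block/<device>/queue/scheduler` layout; the device name `sda` is only a placeholder and the function names are illustrative.

```python
from pathlib import Path


def active_scheduler(sysfs_text):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_active_scheduler(device="sda"):
    # 'sda' is a placeholder device name; adjust it for the system at hand.
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


print(active_scheduler("noop deadline [cfq]"))
```

Calling `read_active_scheduler()` on a real machine returns the bracketed entry for that disk, or None when the file lists no bracketed choice.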
Marking vbox images nocow and using standard elevators (cfq, deadline)
exposes no such problems so far - even during excessive IO loads.
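One common way to mark files nocow is the `C` attribute set through `chattr`. The sketch below simply wraps that command; the path is hypothetical and the helper names are made up for this example. Note the usual caveats: on btrfs the attribute generally only takes effect for new or empty files (so it is typically set on the directory that will hold the images), and nocow files are neither checksummed nor compressed.

```python
import subprocess


def nocow_command(path):
    """Build the chattr invocation that sets the No_COW attribute (+C)."""
    return ["chattr", "+C", path]


def mark_nocow(path):
    # Raises CalledProcessError if chattr fails (e.g. unsupported filesystem).
    subprocess.run(nocow_command(path), check=True)


# Hypothetical directory for VM images; nothing is executed here.
print(" ".join(nocow_command("/path/to/vm-images")))
```

Running `lsattr -d` on the directory afterwards should list the `C` flag if the attribute was applied.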
EOM
--
Regards,
Kai
Replies to list-only preferred.
* Re: Hard crash on 4.9.5
2017-03-13 21:58 ` Kai Krakow
@ 2017-03-13 22:19 ` Omar Sandoval
0 siblings, 0 replies; 8+ messages in thread
From: Omar Sandoval @ 2017-03-13 22:19 UTC (permalink / raw)
To: Kai Krakow; +Cc: linux-btrfs
On Mon, Mar 13, 2017 at 10:58:29PM +0100, Kai Krakow wrote:
> On Sat, 28 Jan 2017 15:50:38 -0500,
> Matt McKinnon <matt@techsquare.com> wrote:
>
> > This same file system (which crashed again with the same errors) is
> > also giving this output during a metadata or data balance:
>
> This looks similar to the err=-17 that I am experiencing when
> using a VirtualBox image on btrfs in CoW mode (compress=lzo).
>
> During IO-intensive workloads, it results in "object already exists,
> err -17" (or similar; someone else also experienced it through another
> workload). The resulting btrfs check shows the same errors, reporting
> inodes without csum.
>
> Trying to continue using this file system in successive boots usually
> results in boot freezes or a completely unmountable filesystem, broken
> beyond repair.
>
> My impression is that using the bfq elevator usually lets me trigger
> this bug even without VirtualBox, i.e. during normal system
> usage, and mostly during boot when IO load is very high. So I also
> stopped using bfq, although it gave me much better
> interactivity.
>
> Marking vbox images nocow and using standard elevators (cfq, deadline)
> exposes no such problems so far - even during excessive IO loads.
>
> EOM
This sounds similar to a bug I fixed here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e2bd3b7fac91b79a6115fd1511ca20b2a09696d
That change is in v4.10. If you're not already running a kernel version
with that fix, could you check if that solves it?
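A quick way to see whether a machine is already on v4.10 or later is to compare the release string against that version. This is only a rough sketch: `kernel_at_least` is an illustrative helper, it looks at the mainline major.minor numbers alone, and it cannot tell whether a distribution or stable kernel has backported the fix.

```python
import platform
import re


def kernel_at_least(release, minimum=(4, 10)):
    """True if a release string such as '4.9.5-custom' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def running_kernel_at_least(minimum=(4, 10)):
    # platform.release() returns the same string as `uname -r` on Linux.
    return kernel_at_least(platform.release(), minimum)


# The kernel from the original report predates v4.10.
print(kernel_at_least("4.9.5-custom"))
```

For the 4.9.5-custom kernel from the original report the check is false, so that system would not have the change unless it was applied separately.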
Thread overview: 8+ messages
2017-01-23 20:03 Hard crash on 4.9.5 Matt McKinnon
2017-01-23 20:27 ` Hans van Kranenburg
2017-01-23 20:33 ` Hans van Kranenburg
2017-01-25 16:20 ` Liu Bo
2017-01-25 21:06 ` Liu Bo
2017-01-28 20:50 ` Matt McKinnon
2017-03-13 21:58 ` Kai Krakow
2017-03-13 22:19 ` Omar Sandoval