linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Ext4 deadlock (+lockdep splat) during rsync
@ 2017-01-08 22:41 George Spelvin
  2017-01-10  3:40 ` Theodore Ts'o
  2017-01-19 17:37 ` Jan Kara
  0 siblings, 2 replies; 14+ messages in thread
From: George Spelvin @ 2017-01-08 22:41 UTC (permalink / raw)
  To: linux-ext4; +Cc: linux

After replacing a drive in a RAID array, I tried to bring some things
up to date with rsync and ran into an annoyingly repeatable deadlock.

So I found a chance to boot with a lockdep kernel and immediately turned up the following:

[  755.740865] =============================================
[  755.741072] [ INFO: possible recursive locking detected ]
[  755.741279] 4.9.1-00126-gfbb9fcc9-dirty #576 Not tainted
[  755.741489] ---------------------------------------------
[  755.741699] rsync/14818 is trying to acquire lock:
[  755.741907]  (&ei->xattr_sem){++++..}, at: [<ffffffff81234603>] ext4_expand_extra_isize_ea+0x63/0x850
[  755.742145] but task is already holding lock:
[  755.742742]  (&ei->xattr_sem){++++..}, at: [<ffffffff81236f95>] ext4_try_add_inline_entry+0x55/0x1a0
[  755.743102] other info that might help us debug this:
[  755.743802]  Possible unsafe locking scenario:
[  755.743802]        CPU0
[  755.743802]        ----
[  755.743802]   lock(&ei->xattr_sem);
[  755.743802]   lock(&ei->xattr_sem);
[  755.743802] *** DEADLOCK ***
[  755.743802]  May be due to missing lock nesting notation
[  755.743802] 4 locks held by rsync/14818:
[  755.743802]  #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811a0fef>] mnt_want_write+0x1f/0x50
[  755.743802]  #1:  (&type->i_mutex_dir_key){++++++}, at: [<ffffffff8118b058>] path_openat+0x2f8/0x9f0
[  755.743802]  #2:  (jbd2_handle){++++..}, at: [<ffffffff81239aa6>] start_this_handle+0x196/0x540
[  755.743802]  #3:  (&ei->xattr_sem){++++..}, at: [<ffffffff81236f95>] ext4_try_add_inline_entry+0x55/0x1a0
[  755.743802] stack backtrace:
[  755.743802] CPU: 0 PID: 14818 Comm: rsync Not tainted 4.9.1-00126-gfbb9fcc9-dirty #576
[  755.743802] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./X79-UP4, BIOS F7 03/19/2014
[  755.743802]  ffffc9000c273820 ffffffff812a6d05 ffffffff8253a080 ffffffff8253a080
[  755.743802]  ffffc9000c2738d8 ffffffff810c7eab ffffc9000c272000 ffffc90000000004
[  755.743802]  0000000000000000 ffffffff81e0b100 1a883a7e30ec461a ffffffff8253a080
[  755.743802] Call Trace:
[  755.743802]  [<ffffffff812a6d05>] dump_stack+0x68/0x93
[  755.743802]  [<ffffffff810c7eab>] __lock_acquire+0x7ab/0x1270
[  755.743802]  [<ffffffff810c8d10>] lock_acquire+0x60/0x80
[  755.743802]  [<ffffffff81234603>] ? ext4_expand_extra_isize_ea+0x63/0x850
[  755.743802]  [<ffffffff81793a44>] down_write+0x44/0x80
[  755.743802]  [<ffffffff81234603>] ? ext4_expand_extra_isize_ea+0x63/0x850
[  755.743802]  [<ffffffff81234603>] ext4_expand_extra_isize_ea+0x63/0x850
[  755.743802]  [<ffffffff81795ec2>] ? _raw_read_unlock+0x22/0x30
[  755.743802]  [<ffffffff8123a542>] ? jbd2_journal_extend+0x132/0x1b0
[  755.743802]  [<ffffffff812002c9>] ext4_mark_inode_dirty+0x129/0x180
[  755.743802]  [<ffffffff81235d64>] ext4_add_dirent_to_inline.isra.16+0xe4/0x100
[  755.743802]  [<ffffffff81236fd9>] ext4_try_add_inline_entry+0x99/0x1a0
[  755.743802]  [<ffffffff8120b102>] ext4_add_entry+0x1d2/0x370
[  755.743802]  [<ffffffff8120b2b9>] ext4_add_nondir+0x19/0x70
[  755.743802]  [<ffffffff8120b523>] ext4_create+0xc3/0x150
[  755.743802]  [<ffffffff8118aaf8>] lookup_open+0x3d8/0x640
[  755.743802]  [<ffffffff8118b072>] path_openat+0x312/0x9f0
[  755.743802]  [<ffffffff8118d309>] do_filp_open+0x79/0xd0
[  755.743802]  [<ffffffff81795bc2>] ? _raw_spin_unlock+0x22/0x30
[  755.743802]  [<ffffffff8119d9b3>] ? __alloc_fd+0xf3/0x200
[  755.743802]  [<ffffffff8117b4fe>] do_sys_open+0x11e/0x1f0
[  755.743802]  [<ffffffff811d2046>] compat_SyS_open+0x16/0x20
[  755.743802]  [<ffffffff810027e4>] do_fast_syscall_32+0x94/0x210
[  755.743802]  [<ffffffff81797da1>] entry_SYSENTER_compat+0x51/0x60

Followed by repetitions of 
[ 1105.549736] INFO: task rsync:14818 blocked for more than 120 seconds.
[ 1105.550089]       Not tainted 4.9.1-00126-gfbb9fcc9-dirty #576
[ 1105.550436] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1105.550801] rsync           D    0 14818      1 0x20020004
[ 1105.551173]  ffff88040f7f1900 ffff88042d660000 0000000000000000 ffff8804082a2b00
[ 1105.551552]  ffff88042f216298 ffffc9000c273820 ffffffff8178fc08 0000000000000046
[ 1105.551933]  ffff8804082a3060 ffff88042f216298 ffff88042f216280 ffff8804082a2b00
[ 1105.552311] Call Trace:
[ 1105.552678]  [<ffffffff8178fc08>] ? __schedule+0x2b8/0x840
[ 1105.553051]  [<ffffffff817901c8>] schedule+0x38/0x90
[ 1105.553438]  [<ffffffff817945b5>] ? rwsem_down_write_failed+0x2a5/0x450
[ 1105.553810]  [<ffffffff817945ba>] rwsem_down_write_failed+0x2aa/0x450
[ 1105.554186]  [<ffffffff81794360>] ? rwsem_down_write_failed+0x50/0x450
[ 1105.554571]  [<ffffffff812b41d7>] call_rwsem_down_write_failed+0x17/0x30
[ 1105.554953]  [<ffffffff812a6d21>] ? dump_stack+0x84/0x93
[ 1105.555335]  [<ffffffff81793a5f>] down_write+0x5f/0x80
[ 1105.555720]  [<ffffffff81234603>] ? ext4_expand_extra_isize_ea+0x63/0x850
[ 1105.556111]  [<ffffffff81234603>] ext4_expand_extra_isize_ea+0x63/0x850
[ 1105.556502]  [<ffffffff81795ec2>] ? _raw_read_unlock+0x22/0x30
[ 1105.556900]  [<ffffffff8123a542>] ? jbd2_journal_extend+0x132/0x1b0
[ 1105.557282]  [<ffffffff812002c9>] ext4_mark_inode_dirty+0x129/0x180
[ 1105.557675]  [<ffffffff81235d64>] ext4_add_dirent_to_inline.isra.16+0xe4/0x100
[ 1105.558065]  [<ffffffff81236fd9>] ext4_try_add_inline_entry+0x99/0x1a0
[ 1105.558448]  [<ffffffff8120b102>] ext4_add_entry+0x1d2/0x370
[ 1105.558832]  [<ffffffff8120b2b9>] ext4_add_nondir+0x19/0x70
[ 1105.559208]  [<ffffffff8120b523>] ext4_create+0xc3/0x150
[ 1105.559575]  [<ffffffff8118aaf8>] lookup_open+0x3d8/0x640
[ 1105.559943]  [<ffffffff8118b072>] path_openat+0x312/0x9f0
[ 1105.560287]  [<ffffffff8118d309>] do_filp_open+0x79/0xd0
[ 1105.560637]  [<ffffffff81795bc2>] ? _raw_spin_unlock+0x22/0x30
[ 1105.560978]  [<ffffffff8119d9b3>] ? __alloc_fd+0xf3/0x200
[ 1105.561316]  [<ffffffff8117b4fe>] do_sys_open+0x11e/0x1f0
[ 1105.561652]  [<ffffffff811d2046>] compat_SyS_open+0x16/0x20
[ 1105.561984]  [<ffffffff810027e4>] do_fast_syscall_32+0x94/0x210
[ 1105.562313]  [<ffffffff81797da1>] entry_SYSENTER_compat+0x51/0x60
[ 1105.562639] INFO: lockdep is turned off.

Problem first observed with a 4.8 kernel, but the lockdep one is 4.9.
I have metadata checksums turned on, 

Filesystem features:      has_journal resize_inode dir_index sparse_super2 filetype needs_recovery extent flex_bg inline_data sparse_super large_file extra_isize metadata_csum
Journal features:         journal_incompat_revoke journal_checksum_v3
Inode size:               256
Required extra isize:     28
Desired extra isize:      28

It's awakward to reboot in the middle of sending e-mail, but when I do
I can strace the rsync and find the path it's trying to create, if that
is helpful to debugging.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* kernel BUG at fs/ext4/inline.c:1943!
@ 2017-01-18  8:21 George Spelvin
  2017-01-19 17:50 ` ext4_iget:4740: inode #%ld: block 48: comm find: invalid block George Spelvin
  0 siblings, 1 reply; 14+ messages in thread
From: George Spelvin @ 2017-01-18  8:21 UTC (permalink / raw)
  To: linux, tytso; +Cc: linux-ext4

I was trying to rmdir an empty directory in lost+found that accumulated
during my recent problems.

EXT4-fs (md3): mounted filesystem with writeback data mode. Opts: data=writeback,delalloc

$ cd /mountpoint/lost+found
$ rmdir \#1625089/

------------[ cut here ]------------
kernel BUG at fs/ext4/inline.c:1943!
invalid opcode: 0000 [#1] SMP
Modules linked in: nfsd lockd grace sunrpc ablk_helper via_velocity x86_pkg_temp_thermal crc32_pclmul crc32c_intel [last unloaded: twofish_common]
CPU: 0 PID: 16879 Comm: rmdir Not tainted 4.10.0-rc3-00322-g1aa1df43-dirty #591
Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./X79-UP4, BIOS F7 03/19/2014
task: ffff88040e4cab00 task.stack: ffffc9000dd5c000
RIP: 0010:ext4_inline_data_truncate+0x3db/0x400
RSP: 0018:ffffc9000dd5dcd8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88042d287ea0 RCX: 0000021810000000
RDX: 00000000ffffffc3 RSI: ffffc9000dd5dcf8 RDI: ffff8801a41771f0
RBP: ffffc9000dd5dd80 R08: ffff88006123e9c0 R09: ffff8803fe3d60a0
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801a41771f0
R13: ffff8801a4177058 R14: ffff8801a41770f0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88042f200000(0063) knlGS:00000000f7787800
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000f765fbe0 CR3: 0000000179a5e000 CR4: 00000000001406f0
Call Trace:
 ext4_truncate+0x1ea/0x300
 ext4_evict_inode+0x2c7/0x3d0
 evict+0xc0/0x190
 iput+0x14c/0x1e0
 dentry_unlink_inode+0xbe/0x160
 d_delete+0x9c/0xb0
 vfs_rmdir+0x102/0x130
 do_rmdir+0x1a3/0x1e0
 SyS_rmdir+0x11/0x20
 do_fast_syscall_32+0x94/0x210
 entry_SYSENTER_compat+0x51/0x60
RIP: 0023:0xf778aaf9
RSP: 002b:00000000ffda9e0c EFLAGS: 00000292 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 00000000ffdab332 RCX: 0000000000000000
RDX: 00000000565f4000 RSI: 00000000565efb38 RDI: 00000000ffdab332
RBP: 00000000ffda9e68 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Code: ff ff 89 c1 48 89 d7 48 c7 44 0a f8 00 00 00 00 8d 48 ff 31 c0 c1 e9 03 f3 48 ab e9 31 fe ff ff 41 bf f4 ff ff ff e9 3c fe ff ff <0f> 0b 89 c0 c7 02 00 00 00 00 c7 44 02 fc 00 00 00 00 e9 0f fe 
RIP: ext4_inline_data_truncate+0x3db/0x400 RSP: ffffc9000dd5dcd8
---[ end trace caa17f858299cde5 ]---


That's the "BUG_ON(is.s.not_found);"

debugfs says the directory entry is gone, but I can:

debugfs:  stat <1625089>
Inode: 1625089   Type: directory    Mode:  0775   Flags: 0x10000000
Generation: 927350643    Version: 0x00000000:00000004
User:  1000   Group:   161   Project:     0   Size: 132
File ACL: 1664090185    Directory ACL: 0
Links: 0   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x587f2034:472c74ec -- Wed Jan 18 02:58:44 2017
 atime: 0x56b9e2f8:b68a7658 -- Tue Feb  9 08:00:40 2016
 mtime: 0x56c1bc4b:a7765de8 -- Mon Feb 15 06:53:47 2016
crtime: 0x56ba9eb4:a51d90ac -- Tue Feb  9 21:21:40 2016
Size of extra inode fields: 32
Extended attributes:
  system.data (72)
Inode checksum: 0xe2f12fc5
Size of inline data: 132
debugfs:  dump <1625089> /tmp/inode
$ xxd /tmp/inode
00000000: 0b00 0000 0000 0000 3800 0000 0000 0000  ........8.......
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 4800 3101 xxxx xxxx xxxx xxxx xxxx xxxx  H.1.xxxxxxxxxxxx
00000050: xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx  xxxxxxxxxxxxxxxx
00000060: xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx  xxxxxxxxxxxxxxxx
00000070: xx2e 7064 6600 0000 0000 0000 0000 0000  x.pdf...........
00000080: 0000 0000                                ....
debugfs:  ea_get -f /tmp/ea <1625089> system.data
$ xxd /tmp/ea
00000000: 0000 0000 4800 3101 xxxx xxxx xxxx xxxx  ....H.1.xxxxxxxx
00000010: xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx  xxxxxxxxxxxxxxxx
00000020: xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx  xxxxxxxxxxxxxxxx
00000030: xxxx xxxx xx2e 7064 6600 0000 0000 0000  xxxxx.pdf.......
00000040: 0000 0000 0000 0000                      ........

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-01-23  0:08 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-01-08 22:41 Ext4 deadlock (+lockdep splat) during rsync George Spelvin
2017-01-10  3:40 ` Theodore Ts'o
2017-01-10  7:25   ` George Spelvin
2017-01-19 17:37 ` Jan Kara
2017-01-19 17:45   ` Jan Kara
2017-01-19 17:59     ` George Spelvin
2017-01-19 23:19   ` Theodore Ts'o
2017-01-19 23:38     ` George Spelvin
2017-01-20 14:40       ` Theodore Ts'o
2017-01-20 17:57         ` George Spelvin
2017-01-21 13:30           ` Theodore Ts'o
2017-01-21 16:45             ` ext4_iget:4740: inode #%ld: block 48: comm find: invalid block George Spelvin
2017-01-23  0:08               ` Theodore Ts'o
  -- strict thread matches above, loose matches on Subject: below --
2017-01-18  8:21 kernel BUG at fs/ext4/inline.c:1943! George Spelvin
2017-01-19 17:50 ` ext4_iget:4740: inode #%ld: block 48: comm find: invalid block George Spelvin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).