* Re: [PATCH] Btrfs: fix missing delayed iputs on unmount
2018-10-31 0:14 [PATCH] Btrfs: fix missing delayed iputs on unmount Omar Sandoval
@ 2018-10-31 7:14 ` Lu Fengqi
2018-10-31 8:26 ` Nikolay Borisov
2018-10-31 7:41 ` Lu Fengqi
2018-10-31 8:43 ` Nikolay Borisov
2 siblings, 1 reply; 10+ messages in thread
From: Lu Fengqi @ 2018-10-31 7:14 UTC (permalink / raw)
To: Omar Sandoval; +Cc: linux-btrfs, kernel-team
[-- Attachment #1: Type: text/plain, Size: 4156 bytes --]
On Tue, Oct 30, 2018 at 05:14:42PM -0700, Omar Sandoval wrote:
>From: Omar Sandoval <osandov@fb.com>
>
>There's a race between close_ctree() and cleaner_kthread().
>close_ctree() sets btrfs_fs_closing(), and the cleaner stops when it
>sees it set, but this is racy; the cleaner might have already checked
>the bit and could be cleaning stuff. In particular, if it deletes unused
>block groups, it will create delayed iputs for the free space cache
>inodes. As of "btrfs: don't run delayed_iputs in commit", we're no
>longer running delayed iputs after a commit. Therefore, if the cleaner
>creates more delayed iputs after delayed iputs are run in
>btrfs_commit_super(), we will leak inodes on unmount and get a busy
>inode crash from the VFS.
>
>Fix it by parking the cleaner before we actually close anything. Then,
>any remaining delayed iputs will always be handled in
>btrfs_commit_super(). This also ensures that the commit in close_ctree()
>is really the last commit, so we can get rid of the commit in
>cleaner_kthread().
>
>Fixes: 30928e9baac2 ("btrfs: don't run delayed_iputs in commit")
>Signed-off-by: Omar Sandoval <osandov@fb.com>
>---
>We found this with a stress test that our containers team runs. I'm
>wondering if this same race could have caused any other issues other
>than this new iput thing, but I couldn't identify any.
I noticed an inode leak issue in generic/475, but whether dropping commit
30928e9baac2 ("btrfs: don't run delayed_iputs in commit") or applying
this patch, the issue still exists.
I have attached the dmesg.
>
> fs/btrfs/disk-io.c | 40 +++++++---------------------------------
> 1 file changed, 7 insertions(+), 33 deletions(-)
>
>diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>index b0ab41da91d1..7c17284ae3c2 100644
>--- a/fs/btrfs/disk-io.c
>+++ b/fs/btrfs/disk-io.c
>@@ -1664,9 +1664,8 @@ static int cleaner_kthread(void *arg)
> struct btrfs_root *root = arg;
> struct btrfs_fs_info *fs_info = root->fs_info;
> int again;
>- struct btrfs_trans_handle *trans;
>
>- do {
>+ while (1) {
> again = 0;
>
> /* Make the cleaner go to sleep early. */
>@@ -1715,42 +1714,16 @@ static int cleaner_kthread(void *arg)
> */
> btrfs_delete_unused_bgs(fs_info);
> sleep:
>+ if (kthread_should_park())
>+ kthread_parkme();
>+ if (kthread_should_stop())
>+ return 0;
> if (!again) {
> set_current_state(TASK_INTERRUPTIBLE);
>- if (!kthread_should_stop())
>- schedule();
>+ schedule();
> __set_current_state(TASK_RUNNING);
> }
>- } while (!kthread_should_stop());
>-
>- /*
>- * Transaction kthread is stopped before us and wakes us up.
>- * However we might have started a new transaction and COWed some
>- * tree blocks when deleting unused block groups for example. So
>- * make sure we commit the transaction we started to have a clean
>- * shutdown when evicting the btree inode - if it has dirty pages
>- * when we do the final iput() on it, eviction will trigger a
>- * writeback for it which will fail with null pointer dereferences
>- * since work queues and other resources were already released and
>- * destroyed by the time the iput/eviction/writeback is made.
>- */
>- trans = btrfs_attach_transaction(root);
>- if (IS_ERR(trans)) {
>- if (PTR_ERR(trans) != -ENOENT)
>- btrfs_err(fs_info,
>- "cleaner transaction attach returned %ld",
>- PTR_ERR(trans));
>- } else {
>- int ret;
>-
>- ret = btrfs_commit_transaction(trans);
>- if (ret)
>- btrfs_err(fs_info,
>- "cleaner open transaction commit returned %d",
>- ret);
> }
>-
>- return 0;
> }
>
> static int transaction_kthread(void *arg)
>@@ -3931,6 +3904,7 @@ void close_ctree(struct btrfs_fs_info *fs_info)
> int ret;
>
> set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags);
>+ kthread_park(fs_info->cleaner_kthread);
Since we are not going to call kthread_unpark, I am not sure why
kthread_park is used instead of kthread_stop here. It looks like there
is no significant difference between stopping instantly and delayed stop.
--
Thanks,
Lu
>
> /* wait for the qgroup rescan worker to stop */
> btrfs_qgroup_wait_for_completion(fs_info, false);
>--
>2.19.1
>
>
>
>
[-- Attachment #2: generic_475.dmesg --]
[-- Type: text/plain, Size: 14435 bytes --]
[ 366.955193] run fstests generic/475 at 2018-10-31 15:06:43
[ 367.495791] BTRFS: device fsid 812f883c-40b2-4456-9769-b94ddf1cb07e devid 1 transid 5 /dev/nvme0n1p2
[ 367.624469] BTRFS info (device dm-3): disk space caching is enabled
[ 367.627305] BTRFS info (device dm-3): has skinny extents
[ 367.628771] BTRFS info (device dm-3): flagging fs with big metadata feature
[ 367.637585] BTRFS info (device dm-3): enabling ssd optimizations
[ 367.640079] BTRFS info (device dm-3): checking UUID tree
[ 367.666290] btrfs_dev_stat_print_on_error: 86 callbacks suppressed
[ 367.666293] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[ 367.670665] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
[ 367.673332] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
[ 367.674024] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[ 367.676072] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
[ 367.677316] Buffer I/O error on dev dm-3, logical block 4718576, async page read
[ 367.682713] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
[ 367.685150] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
[ 367.690053] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
[ 367.690298] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
[ 367.694166] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
[ 367.700484] BTRFS: error (device dm-3) in btrfs_commit_transaction:2239: errno=-5 IO failure (Error while writing out transaction)
[ 367.707185] BTRFS info (device dm-3): forced readonly
[ 367.708627] BTRFS warning (device dm-3): Skipping commit of aborted transaction.
[ 367.710526] BTRFS: error (device dm-3) in cleanup_transaction:1849: errno=-5 IO failure
[ 367.712330] BTRFS info (device dm-3): delayed_refs has NO entry
[ 367.975623] BTRFS error (device dm-3): cleaner transaction attach returned -30
[ 367.977998] BTRFS info (device dm-3): at unmount delalloc count 8192
[ 367.980424] WARNING: CPU: 2 PID: 14918 at fs/btrfs/disk-io.c:3846 btrfs_free_fs_root+0x94/0xa0 [btrfs]
[ 367.983228] Modules linked in: btrfs xor zstd_decompress zstd_compress xxhash raid6_pq efivarfs xfs nvme nvme_core
[ 367.986171] CPU: 2 PID: 14918 Comm: umount Kdump: loaded Not tainted 4.19.0-rc8+ #18
[ 367.988154] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 367.990175] RIP: 0010:btrfs_free_fs_root+0x94/0xa0 [btrfs]
[ 367.997036] RSP: 0018:ffffc9000744bd60 EFLAGS: 00010282
[ 367.998595] RAX: ffff880354eb6270 RBX: ffff8803661c4548 RCX: 0000000000000000
[ 368.000665] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0000000000000000
[ 368.002527] RBP: ffff88033cd70000 R08: 0000000000000000 R09: 0000000000000001
[ 368.004426] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000744bd88
[ 368.006248] R13: ffff88033cd700c0 R14: ffffc9000744bd80 R15: 0000000000000000
[ 368.008082] FS: 00007f206293e780(0000) GS:ffff880373800000(0000) knlGS:0000000000000000
[ 368.010068] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 368.011654] CR2: 000055d225976b70 CR3: 000000033f992004 CR4: 00000000003606e0
[ 368.013440] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 368.015232] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 368.016979] Call Trace:
[ 368.017947] btrfs_free_fs_roots+0xcf/0x160 [btrfs]
[ 368.019317] ? wait_for_completion+0x126/0x150
[ 368.020617] close_ctree+0x149/0x330 [btrfs]
[ 368.021822] generic_shutdown_super+0x6c/0x100
[ 368.023043] kill_anon_super+0x14/0x30
[ 368.024203] btrfs_kill_super+0x13/0x100 [btrfs]
[ 368.025471] deactivate_locked_super+0x2f/0x70
[ 368.027647] cleanup_mnt+0x3b/0x70
[ 368.028712] task_work_run+0x9e/0xd0
[ 368.029875] do_syscall_64+0x470/0x600
[ 368.031009] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 368.032350] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 368.033620] RIP: 0033:0x7f2062ec1b5b
[ 368.039127] RSP: 002b:00007ffd75ebd748 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 368.040906] RAX: 0000000000000000 RBX: 000055b8b9bdb2b0 RCX: 00007f2062ec1b5b
[ 368.042564] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b8b9bdb490
[ 368.044250] RBP: 0000000000000000 R08: 000055b8b9bdb4b0 R09: 00007ffd75ebbfa0
[ 368.045862] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b8b9bdb490
[ 368.047503] R13: 00007f20631e11a4 R14: 0000000000000000 R15: 00007ffd75ebd9b8
[ 368.049089] irq event stamp: 0
[ 368.049979] hardirqs last enabled at (0): [<0000000000000000>] (null)
[ 368.051674] hardirqs last disabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.053587] softirqs last enabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.055550] softirqs last disabled at (0): [<0000000000000000>] (null)
[ 368.058098] ---[ end trace 08fc3163d80dad81 ]---
[ 368.061134] WARNING: CPU: 2 PID: 14918 at fs/btrfs/extent-tree.c:123 btrfs_put_block_group+0x51/0x60 [btrfs]
[ 368.063637] Modules linked in: btrfs xor zstd_decompress zstd_compress xxhash raid6_pq efivarfs xfs nvme nvme_core
[ 368.066230] CPU: 2 PID: 14918 Comm: umount Kdump: loaded Tainted: G W 4.19.0-rc8+ #18
[ 368.068615] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 368.070406] RIP: 0010:btrfs_put_block_group+0x51/0x60 [btrfs]
[ 368.076431] RSP: 0018:ffffc9000744bda0 EFLAGS: 00010206
[ 368.077765] RAX: 0000000000000001 RBX: ffff88033cd70000 RCX: ffff8803414f76c0
[ 368.079477] RDX: ffff8803414f76c0 RSI: 0000000000000001 RDI: ffff8803414f74e8
[ 368.081141] RBP: ffff8803414f74e8 R08: 0000000000000000 R09: 0000000000000000
[ 368.082830] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88033cd70118
[ 368.084565] R13: ffff88033cd70168 R14: ffff8803414f7680 R15: dead000000000100
[ 368.087151] FS: 00007f206293e780(0000) GS:ffff880373800000(0000) knlGS:0000000000000000
[ 368.089087] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 368.090616] CR2: 000055d225976b70 CR3: 000000033f992004 CR4: 00000000003606e0
[ 368.092390] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 368.094103] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 368.095812] Call Trace:
[ 368.096700] btrfs_free_block_groups+0x1da/0x3b0 [btrfs]
[ 368.098065] close_ctree+0x171/0x330 [btrfs]
[ 368.099254] generic_shutdown_super+0x6c/0x100
[ 368.100486] kill_anon_super+0x14/0x30
[ 368.101554] btrfs_kill_super+0x13/0x100 [btrfs]
[ 368.102758] deactivate_locked_super+0x2f/0x70
[ 368.103986] cleanup_mnt+0x3b/0x70
[ 368.105002] task_work_run+0x9e/0xd0
[ 368.106038] do_syscall_64+0x470/0x600
[ 368.107123] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 368.108397] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 368.109677] RIP: 0033:0x7f2062ec1b5b
[ 368.115270] RSP: 002b:00007ffd75ebd748 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 368.117866] RAX: 0000000000000000 RBX: 000055b8b9bdb2b0 RCX: 00007f2062ec1b5b
[ 368.119595] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b8b9bdb490
[ 368.121368] RBP: 0000000000000000 R08: 000055b8b9bdb4b0 R09: 00007ffd75ebbfa0
[ 368.122983] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b8b9bdb490
[ 368.124680] R13: 00007f20631e11a4 R14: 0000000000000000 R15: 00007ffd75ebd9b8
[ 368.126330] irq event stamp: 0
[ 368.127242] hardirqs last enabled at (0): [<0000000000000000>] (null)
[ 368.128874] hardirqs last disabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.130836] softirqs last enabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.132742] softirqs last disabled at (0): [<0000000000000000>] (null)
[ 368.134443] ---[ end trace 08fc3163d80dad82 ]---
[ 368.140638] WARNING: CPU: 2 PID: 14918 at fs/btrfs/extent-tree.c:9789 btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
[ 368.143175] Modules linked in: btrfs xor zstd_decompress zstd_compress xxhash raid6_pq efivarfs xfs nvme nvme_core
[ 368.145923] CPU: 2 PID: 14918 Comm: umount Kdump: loaded Tainted: G W 4.19.0-rc8+ #18
[ 368.148908] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 368.150766] RIP: 0010:btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
[ 368.156851] RSP: 0018:ffffc9000744bdb0 EFLAGS: 00010206
[ 368.158212] RAX: ffff8803414f4620 RBX: ffff88033cd70000 RCX: 0000000000000000
[ 368.159883] RDX: 0000000000000002 RSI: ffff88033cd71750 RDI: ffff8803414f7a38
[ 368.161614] RBP: ffff8803414f0558 R08: 0000000000000000 R09: ffff8803414f7b70
[ 368.163340] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88033cd70118
[ 368.165029] R13: ffff8803414f4620 R14: 0000000000000000 R15: dead000000000100
[ 368.166730] FS: 00007f206293e780(0000) GS:ffff880373800000(0000) knlGS:0000000000000000
[ 368.168613] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 368.170055] CR2: 000055d225976b70 CR3: 000000033f992004 CR4: 00000000003606e0
[ 368.171794] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 368.173501] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 368.175273] Call Trace:
[ 368.176969] close_ctree+0x171/0x330 [btrfs]
[ 368.178203] generic_shutdown_super+0x6c/0x100
[ 368.179443] kill_anon_super+0x14/0x30
[ 368.180560] btrfs_kill_super+0x13/0x100 [btrfs]
[ 368.181794] deactivate_locked_super+0x2f/0x70
[ 368.182975] cleanup_mnt+0x3b/0x70
[ 368.184024] task_work_run+0x9e/0xd0
[ 368.185051] do_syscall_64+0x470/0x600
[ 368.186098] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 368.187392] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 368.188675] RIP: 0033:0x7f2062ec1b5b
[ 368.194299] RSP: 002b:00007ffd75ebd748 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 368.195990] RAX: 0000000000000000 RBX: 000055b8b9bdb2b0 RCX: 00007f2062ec1b5b
[ 368.197690] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b8b9bdb490
[ 368.199348] RBP: 0000000000000000 R08: 000055b8b9bdb4b0 R09: 00007ffd75ebbfa0
[ 368.200998] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b8b9bdb490
[ 368.202651] R13: 00007f20631e11a4 R14: 0000000000000000 R15: 00007ffd75ebd9b8
[ 368.204354] irq event stamp: 0
[ 368.205236] hardirqs last enabled at (0): [<0000000000000000>] (null)
[ 368.207825] hardirqs last disabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.209804] softirqs last enabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.211807] softirqs last disabled at (0): [<0000000000000000>] (null)
[ 368.213501] ---[ end trace 08fc3163d80dad83 ]---
[ 368.214836] BTRFS info (device dm-3): space_info 1 has 7708672 free, is not full
[ 368.216595] BTRFS info (device dm-3): space_info total=8388608, used=585728, pinned=0, reserved=86016, may_use=8192, readonly=0
[ 368.219751] WARNING: CPU: 2 PID: 14918 at fs/btrfs/extent-tree.c:9789 btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
[ 368.222497] Modules linked in: btrfs xor zstd_decompress zstd_compress xxhash raid6_pq efivarfs xfs nvme nvme_core
[ 368.225120] CPU: 2 PID: 14918 Comm: umount Kdump: loaded Tainted: G W 4.19.0-rc8+ #18
[ 368.227581] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 368.229402] RIP: 0010:btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
[ 368.235564] RSP: 0018:ffffc9000744bdb0 EFLAGS: 00010206
[ 368.237770] RAX: ffff8803414f7b40 RBX: ffff88033cd70000 RCX: 000000000017000c
[ 368.239647] RDX: 0000000000000002 RSI: 000000000017000c RDI: ffffffff8125fb74
[ 368.241447] RBP: ffff8803414f47d0 R08: 0000000000000000 R09: 0000000000000000
[ 368.243206] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8803414f4848
[ 368.245071] R13: ffff8803414f7b40 R14: 0000000000000000 R15: dead000000000100
[ 368.246825] FS: 00007f206293e780(0000) GS:ffff880373800000(0000) knlGS:0000000000000000
[ 368.248758] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 368.250245] CR2: 000055d225976b70 CR3: 000000033f992004 CR4: 00000000003606e0
[ 368.251982] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 368.253714] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 368.255491] Call Trace:
[ 368.256375] close_ctree+0x171/0x330 [btrfs]
[ 368.257584] generic_shutdown_super+0x6c/0x100
[ 368.258786] kill_anon_super+0x14/0x30
[ 368.259900] btrfs_kill_super+0x13/0x100 [btrfs]
[ 368.261180] deactivate_locked_super+0x2f/0x70
[ 368.262433] cleanup_mnt+0x3b/0x70
[ 368.263452] task_work_run+0x9e/0xd0
[ 368.264531] do_syscall_64+0x470/0x600
[ 368.265621] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 368.267805] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 368.269288] RIP: 0033:0x7f2062ec1b5b
[ 368.275103] RSP: 002b:00007ffd75ebd748 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 368.276887] RAX: 0000000000000000 RBX: 000055b8b9bdb2b0 RCX: 00007f2062ec1b5b
[ 368.278598] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b8b9bdb490
[ 368.280268] RBP: 0000000000000000 R08: 000055b8b9bdb4b0 R09: 00007ffd75ebbfa0
[ 368.281917] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b8b9bdb490
[ 368.283580] R13: 00007f20631e11a4 R14: 0000000000000000 R15: 00007ffd75ebd9b8
[ 368.285271] irq event stamp: 0
[ 368.286151] hardirqs last enabled at (0): [<0000000000000000>] (null)
[ 368.287851] hardirqs last disabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.289798] softirqs last enabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
[ 368.291756] softirqs last disabled at (0): [<0000000000000000>] (null)
[ 368.293419] ---[ end trace 08fc3163d80dad84 ]---
[ 368.294684] BTRFS info (device dm-3): space_info 4 has 275857408 free, is not full
[ 368.297278] BTRFS info (device dm-3): space_info total=276824064, used=114688, pinned=0, reserved=0, may_use=786432, readonly=65536
[ 368.300677] VFS: Busy inodes after unmount of dm-3. Self-destruct in 5 seconds. Have a nice day...
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] Btrfs: fix missing delayed iputs on unmount
2018-10-31 0:14 [PATCH] Btrfs: fix missing delayed iputs on unmount Omar Sandoval
2018-10-31 7:14 ` Lu Fengqi
@ 2018-10-31 7:41 ` Lu Fengqi
2018-10-31 16:37 ` Omar Sandoval
2018-10-31 8:43 ` Nikolay Borisov
2 siblings, 1 reply; 10+ messages in thread
From: Lu Fengqi @ 2018-10-31 7:41 UTC (permalink / raw)
To: Omar Sandoval; +Cc: linux-btrfs, kernel-team
On Tue, Oct 30, 2018 at 05:14:42PM -0700, Omar Sandoval wrote:
>From: Omar Sandoval <osandov@fb.com>
>
>There's a race between close_ctree() and cleaner_kthread().
>close_ctree() sets btrfs_fs_closing(), and the cleaner stops when it
>sees it set, but this is racy; the cleaner might have already checked
>the bit and could be cleaning stuff. In particular, if it deletes unused
>block groups, it will create delayed iputs for the free space cache
>inodes. As of "btrfs: don't run delayed_iputs in commit", we're no
>longer running delayed iputs after a commit. Therefore, if the cleaner
>creates more delayed iputs after delayed iputs are run in
>btrfs_commit_super(), we will leak inodes on unmount and get a busy
Since the assert added via commit e187831e1875 ("btrfs: assert on non-empty
delayed iputs") wasn't triggered, it doesn't seem to be the cause of inode
leak.
--
Thanks,
Lu
>inode crash from the VFS.
>
>Fix it by parking the cleaner before we actually close anything. Then,
>any remaining delayed iputs will always be handled in
>btrfs_commit_super(). This also ensures that the commit in close_ctree()
>is really the last commit, so we can get rid of the commit in
>cleaner_kthread().
>
>Fixes: 30928e9baac2 ("btrfs: don't run delayed_iputs in commit")
>Signed-off-by: Omar Sandoval <osandov@fb.com>
>---
>We found this with a stress test that our containers team runs. I'm
>wondering if this same race could have caused any other issues other
>than this new iput thing, but I couldn't identify any.
>
> fs/btrfs/disk-io.c | 40 +++++++---------------------------------
> 1 file changed, 7 insertions(+), 33 deletions(-)
>
>diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>index b0ab41da91d1..7c17284ae3c2 100644
>--- a/fs/btrfs/disk-io.c
>+++ b/fs/btrfs/disk-io.c
>@@ -1664,9 +1664,8 @@ static int cleaner_kthread(void *arg)
> struct btrfs_root *root = arg;
> struct btrfs_fs_info *fs_info = root->fs_info;
> int again;
>- struct btrfs_trans_handle *trans;
>
>- do {
>+ while (1) {
> again = 0;
>
> /* Make the cleaner go to sleep early. */
>@@ -1715,42 +1714,16 @@ static int cleaner_kthread(void *arg)
> */
> btrfs_delete_unused_bgs(fs_info);
> sleep:
>+ if (kthread_should_park())
>+ kthread_parkme();
>+ if (kthread_should_stop())
>+ return 0;
> if (!again) {
> set_current_state(TASK_INTERRUPTIBLE);
>- if (!kthread_should_stop())
>- schedule();
>+ schedule();
> __set_current_state(TASK_RUNNING);
> }
>- } while (!kthread_should_stop());
>-
>- /*
>- * Transaction kthread is stopped before us and wakes us up.
>- * However we might have started a new transaction and COWed some
>- * tree blocks when deleting unused block groups for example. So
>- * make sure we commit the transaction we started to have a clean
>- * shutdown when evicting the btree inode - if it has dirty pages
>- * when we do the final iput() on it, eviction will trigger a
>- * writeback for it which will fail with null pointer dereferences
>- * since work queues and other resources were already released and
>- * destroyed by the time the iput/eviction/writeback is made.
>- */
>- trans = btrfs_attach_transaction(root);
>- if (IS_ERR(trans)) {
>- if (PTR_ERR(trans) != -ENOENT)
>- btrfs_err(fs_info,
>- "cleaner transaction attach returned %ld",
>- PTR_ERR(trans));
>- } else {
>- int ret;
>-
>- ret = btrfs_commit_transaction(trans);
>- if (ret)
>- btrfs_err(fs_info,
>- "cleaner open transaction commit returned %d",
>- ret);
> }
>-
>- return 0;
> }
>
> static int transaction_kthread(void *arg)
>@@ -3931,6 +3904,7 @@ void close_ctree(struct btrfs_fs_info *fs_info)
> int ret;
>
> set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags);
>+ kthread_park(fs_info->cleaner_kthread);
>
> /* wait for the qgroup rescan worker to stop */
> btrfs_qgroup_wait_for_completion(fs_info, false);
>--
>2.19.1
>
>
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] Btrfs: fix missing delayed iputs on unmount
2018-10-31 0:14 [PATCH] Btrfs: fix missing delayed iputs on unmount Omar Sandoval
2018-10-31 7:14 ` Lu Fengqi
2018-10-31 7:41 ` Lu Fengqi
@ 2018-10-31 8:43 ` Nikolay Borisov
2018-10-31 16:35 ` Omar Sandoval
2 siblings, 1 reply; 10+ messages in thread
From: Nikolay Borisov @ 2018-10-31 8:43 UTC (permalink / raw)
To: Omar Sandoval, linux-btrfs; +Cc: kernel-team
On 31.10.18 г. 2:14 ч., Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
>
> There's a race between close_ctree() and cleaner_kthread().
> close_ctree() sets btrfs_fs_closing(), and the cleaner stops when it
> sees it set, but this is racy; the cleaner might have already checked
> the bit and could be cleaning stuff. In particular, if it deletes unused
> block groups, it will create delayed iputs for the free space cache
> inodes. As of "btrfs: don't run delayed_iputs in commit", we're no
> longer running delayed iputs after a commit. Therefore, if the cleaner
> creates more delayed iputs after delayed iputs are run in
> btrfs_commit_super(), we will leak inodes on unmount and get a busy
> inode crash from the VFS.
>
> Fix it by parking the cleaner before we actually close anything. Then,
> any remaining delayed iputs will always be handled in
> btrfs_commit_super(). This also ensures that the commit in close_ctree()
> is really the last commit, so we can get rid of the commit in
> cleaner_kthread().
>
> Fixes: 30928e9baac2 ("btrfs: don't run delayed_iputs in commit")
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
> We found this with a stress test that our containers team runs. I'm
> wondering if this same race could have caused any other issues other
> than this new iput thing, but I couldn't identify any.
>
> fs/btrfs/disk-io.c | 40 +++++++---------------------------------
> 1 file changed, 7 insertions(+), 33 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b0ab41da91d1..7c17284ae3c2 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1664,9 +1664,8 @@ static int cleaner_kthread(void *arg)
> struct btrfs_root *root = arg;
> struct btrfs_fs_info *fs_info = root->fs_info;
> int again;
> - struct btrfs_trans_handle *trans;
>
> - do {
> + while (1) {
> again = 0;
>
> /* Make the cleaner go to sleep early. */
> @@ -1715,42 +1714,16 @@ static int cleaner_kthread(void *arg)
> */
> btrfs_delete_unused_bgs(fs_info);
> sleep:
> + if (kthread_should_park())
> + kthread_parkme();
> + if (kthread_should_stop())
> + return 0;
> if (!again) {
> set_current_state(TASK_INTERRUPTIBLE);
> - if (!kthread_should_stop())
> - schedule();
> + schedule();
> __set_current_state(TASK_RUNNING);
> }
> - } while (!kthread_should_stop());
> -
> - /*
> - * Transaction kthread is stopped before us and wakes us up.
> - * However we might have started a new transaction and COWed some
> - * tree blocks when deleting unused block groups for example. So
> - * make sure we commit the transaction we started to have a clean
> - * shutdown when evicting the btree inode - if it has dirty pages
> - * when we do the final iput() on it, eviction will trigger a
> - * writeback for it which will fail with null pointer dereferences
> - * since work queues and other resources were already released and
> - * destroyed by the time the iput/eviction/writeback is made.
> - */
> - trans = btrfs_attach_transaction(root);
> - if (IS_ERR(trans)) {
> - if (PTR_ERR(trans) != -ENOENT)
> - btrfs_err(fs_info,
> - "cleaner transaction attach returned %ld",
> - PTR_ERR(trans));
> - } else {
> - int ret;
> -
> - ret = btrfs_commit_transaction(trans);
> - if (ret)
> - btrfs_err(fs_info,
> - "cleaner open transaction commit returned %d",
> - ret);
> }
> -
> - return 0;
> }
>
> static int transaction_kthread(void *arg)
> @@ -3931,6 +3904,7 @@ void close_ctree(struct btrfs_fs_info *fs_info)
> int ret;
>
> set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags);
> + kthread_park(fs_info->cleaner_kthread);
Can't you directly call kthread_stop here? When you park the thread it
will sleep and then when you call kthread_stop that function will unpark
the thread and the cleaner kthread will see KTHREAD_SHOULD_STOP bit and
just return 0. So the from the moment the thread is parked until it's
stopped it doesn't have a chance to do useful work.
>
> /* wait for the qgroup rescan worker to stop */
> btrfs_qgroup_wait_for_completion(fs_info, false);
>
^ permalink raw reply [flat|nested] 10+ messages in thread