cgroups.vger.kernel.org archive mirror
* [BUG] cgroup writeback crash
@ 2016-02-12 19:32 Tahsin Erdogan
From: Tahsin Erdogan @ 2016-02-12 19:32 UTC
  To: Tejun Heo
  Cc: cgroups, Theodore Ts'o, Nauman Rafique

Hi,
cgroup-based writeback has a race condition bug that leads to a kernel crash.

When an inode's bdi_writeback is switched, an additional reference on the
inode is acquired in inode_switch_wbs() and the actual reassignment work is
scheduled to be executed later. If the file is deleted and the filesystem is
unmounted before the work finishes, the last reference drop by
inode_switch_wbs_work_fn() tries to evict the inode and so attempts to access
filesystem data that has already been released.
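
To make the ordering concrete, here is a toy model of the sequence in shell.
It is only a sketch, not kernel code: the function names borrow from the
kernel, but the bodies, the counters (sb_alive, i_count) and the simulate
helper are made up for illustration, and a real unmount does far more than
flip a flag.

```shell
#!/bin/bash
# Toy model of the ordering described above; NOT kernel code.
# sb_alive, i_count, events and simulate are hypothetical names.

iput() {
    # Drop one reference; the last drop "evicts" the inode.
    i_count=$((i_count - 1))
    if (( i_count == 0 )); then
        if (( sb_alive )); then
            events+=("evict: ok")
        else
            events+=("evict: touches freed fs data")
        fi
    fi
}

# Takes the extra reference and defers the real work.
inode_switch_wbs() { i_count=$((i_count + 1)); events+=("switch queued"); }
# Deleting the file drops the reference the file itself held.
unlink_file()      { iput; events+=("file deleted"); }
# Unmount is modelled as simply marking the filesystem gone.
umount_fs()        { sb_alive=0; events+=("fs unmounted"); }
# The deferred work finishes by dropping the extra reference.
run_switch_work()  { events+=("switch work runs"); iput; }

simulate() {
    # Reset state, run the given steps in order, print what happened.
    sb_alive=1
    i_count=1
    events=()
    local step
    for step in "$@"; do
        "$step"
    done
    printf '%s\n' "${events[@]}"
}

# Problematic interleaving: the work runs after unlink and umount.
simulate inode_switch_wbs unlink_file umount_fs run_switch_work
```

In this model the last line printed for that interleaving is the bad
eviction. Running the work before the unlink, as in
"simulate inode_switch_wbs run_switch_work unlink_file umount_fs", evicts
while the filesystem is still mounted.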

Here is the shell script that I am using to reproduce this (not a
reliable repro):

cat > repro.sh << "EOF"
#!/bin/bash
set -e

FILE_COUNT=${1:-18}
BLK_COUNT=${2:-2}


CGROUP_ROOT=/mnt-cgroup2

mkdir -p $CGROUP_ROOT

if ! mount | grep -qw cgroup2; then
  mount -t cgroup2 none $CGROUP_ROOT
fi

mkdir -p $CGROUP_ROOT/mem1
mkdir -p $CGROUP_ROOT/mem2

echo '+memory' > $CGROUP_ROOT/cgroup.subtree_control

mkdir -p /mnt/sdb

if mount | grep -qw /dev/sdb; then
  umount /dev/sdb &> /dev/null || true
fi

mount /dev/sdb /mnt/sdb

FILES=$(seq 1 $FILE_COUNT)

for f in $FILES; do
  rm -f /mnt/sdb/dd$f
done

# Move to mem1 cgroup
echo $$ > $CGROUP_ROOT/mem1/cgroup.procs

for i in {1..10}; do
  for f in $FILES; do
    dd if=/dev/urandom of=/mnt/sdb/dd$f conv=notrunc \
      bs=4k count=$BLK_COUNT seek=$(($BLK_COUNT*$i)) &> /dev/null
  done
  sync

  # After first iteration, switch to mem2 cgroup
  if [[ "$i" == "1" ]]; then
    echo $$ > $CGROUP_ROOT/mem2/cgroup.procs
  fi
done

for f in $FILES; do
  rm -f /mnt/sdb/dd$f
done

umount /mnt/sdb

EOF
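
Since the reproducer does not trigger every time, one option is to wrap it in
a retry loop. The sketch below assumes repro.sh sits in the current directory
and that a hit shows up as "kernel BUG at" in dmesg; run_until_hit and its
arguments are hypothetical names, not part of the original script.

```shell
#!/bin/bash
# run_until_hit MAX CHECK RUN_CMD [ARGS...]
# Runs RUN_CMD up to MAX times. After each run, CHECK (a shell snippet) is
# evaluated; the loop stops at the first run for which CHECK succeeds.
run_until_hit() {
    local max=$1 check=$2
    shift 2
    local n
    for ((n = 1; n <= max; n++)); do
        "$@" || true          # the reproducer itself may fail; keep going
        if eval "$check"; then
            echo "hit after $n run(s)"
            return 0
        fi
    done
    echo "no hit in $max run(s)"
    return 1
}

# Assumed invocation (needs root and a scratch /dev/sdb). Clearing the ring
# buffer first avoids matching an older BUG message:
#   dmesg -C
#   run_until_hit 50 'dmesg | grep -q "kernel BUG at"' ./repro.sh 18 2
```

If the crash leaves the machine unusable the loop never reports anything, so
this only helps when the system survives the BUG long enough to read dmesg.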

[  278.498009] ------------[ cut here ]------------
[  278.502764] kernel BUG at fs/jbd2/transaction.c:319!
[  278.507652] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[  278.507652] CPU: 1 PID: 29158 Comm: kworker/1:10 Not tainted 4.5.0-rc3 #51
[  278.507652] Hardware name: Google Google, BIOS Google 01/01/2011
[  278.507652] Workqueue: events inode_switch_wbs_work_fn
[  278.507652] task: ffff880213dbbd40 ti: ffff880209264000 task.ti: ffff880209264000
[  278.507652] RIP: 0010:[<ffffffff803e6922>]  [<ffffffff803e6922>] start_this_handle+0x382/0x3e0
[  278.507652] RSP: 0018:ffff880209267c30  EFLAGS: 00010202
[  278.507652] RAX: 0000000000000031 RBX: ffff880213fba000 RCX: 0000000000000000
[  278.507652] RDX: 0000000000000001 RSI: 00000000000001ff RDI: ffff880213fba028
[  278.507652] RBP: ffff880209267cb0 R08: 0000000000002000 R09: 00000000000000ef
[  278.507652] R10: ffff880216085750 R11: 0000000000000006 R12: ffff880213fba024
[  278.507652] R13: ffff880213fba070 R14: ffff880216085750 R15: 00000000000000ef
[  278.507652] FS:  0000000000000000(0000) GS:ffff88021ef00000(0000) knlGS:0000000000000000
[  278.507652] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  278.507652] CR2: 00007f310a2a9095 CR3: 0000000000e0a000 CR4: 00000000000406e0
[  278.507652] Stack:
[  278.507652]  000000000000000a ffff880213fba3e0 024000400000000c ffff880209267c78
[  278.507652]  0000000000000000 0000003000000000 ffff880216085750 ffff880209267cb0
[  278.507652]  ffffffff8032d78a ffffffff803e6b69 ffff880216085720 ffff880213fba000
[  278.507652] Call Trace:
[  278.507652]  [<ffffffff8032d78a>] ? kmem_cache_alloc+0x10a/0x150
[  278.507652]  [<ffffffff803e6b69>] ? jbd2__journal_start+0x79/0x190
[  278.507652]  [<ffffffff803e6be4>] jbd2__journal_start+0xf4/0x190
[  278.507652]  [<ffffffff803cfc7e>] __ext4_journal_start_sb+0x4e/0x70
[  278.507652]  [<ffffffff803b31ec>] ext4_evict_inode+0x12c/0x3d0
[  278.507652]  [<ffffffff8035338b>] evict+0xbb/0x190
[  278.507652]  [<ffffffff80354190>] iput+0x130/0x190
[  278.507652]  [<ffffffff80360223>] inode_switch_wbs_work_fn+0x343/0x4c0
[  278.507652]  [<ffffffff80279819>] process_one_work+0x129/0x300
[  278.507652]  [<ffffffff802ab1a3>] ? try_to_del_timer_sync+0x43/0x60
[  278.507652]  [<ffffffff80279b16>] worker_thread+0x126/0x480
[  278.507652]  [<ffffffff802799f0>] ? process_one_work+0x300/0x300
[  278.507652]  [<ffffffff8027ed14>] kthread+0xc4/0xe0
[  278.507652]  [<ffffffff8027ec50>] ? __kthread_parkme+0x70/0x70
[  278.507652]  [<ffffffff809771df>] ret_from_fork+0x3f/0x70
[  278.507652]  [<ffffffff8027ec50>] ? __kthread_parkme+0x70/0x70
[  278.507652] Code: 00 00 e8 82 6d f4 ff 48 85 c0 48 89 45 a0 0f 85 28 fd ff ff 41 bf f4 ff ff ff e9 bf fe ff ff c7 45 a8 00 00 00 00 e9 b8 fc ff ff <0f> 0b 41 bf e2 ff ff ff e9 a6 fe ff ff 0f 0b 8b 4d a8 8b 55 ac
[  278.507652] RIP  [<ffffffff803e6922>] start_this_handle+0x382/0x3e0
[  278.507652]  RSP <ffff880209267c30>
[  278.775069] ---[ end trace b85bc47b5909067f ]---
[  278.779848] BUG: unable to handle kernel paging request at ffffffffffffffd8
[  278.787306] IP: [<ffffffff8027eebb>] kthread_data+0xb/0x20
[  278.789819] PGD e0b067 PUD e0d067 PMD 0
[  278.789819] Oops: 0000 [#2] SMP DEBUG_PAGEALLOC
[  278.789819] CPU: 1 PID: 29158 Comm: kworker/1:10 Tainted: G      D        4.5.0-rc3 #51
[  278.789819] Hardware name: Google Google, BIOS Google 01/01/2011
[  278.789819] task: ffff880213dbbd40 ti: ffff880209264000 task.ti: ffff880209264000
[  278.789819] RIP: 0010:[<ffffffff8027eebb>]  [<ffffffff8027eebb>] kthread_data+0xb/0x20
[  278.789819] RSP: 0018:ffff880209267930  EFLAGS: 00010002
[  278.789819] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[  278.789819] RDX: ffffffff80fbf880 RSI: 0000000000000001 RDI: ffff880213dbbd40
[  278.789819] RBP: ffff880209267930 R08: 00000040e892772b R09: 0000000000000000
[  278.789819] R10: 0000000000000000 R11: ffffea000857da00 R12: ffff880213dbc1f0
[  278.789819] R13: 0000000000013cc0 R14: 0000000000000001 R15: ffff880213dbbd40
[  278.789819] FS:  0000000000000000(0000) GS:ffff88021ef00000(0000) knlGS:0000000000000000
[  278.789819] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  278.789819] CR2: 0000000000000028 CR3: 0000000000e0a000 CR4: 00000000000406e0
[  278.789819] Stack:
[  278.789819]  ffff880209267948 ffffffff802793fc ffff88021ef13cc0 ffff880209267998
[  278.789819]  ffffffff80973ad7 ffff880213dbbd40 ffffffff802a6d22 ffff8802092679b0
[  278.789819]  ffff880209268000 ffff880213dbc0e0 ffffffff80c36d32 ffff880213dbbd40
[  278.789819] Call Trace:
[  278.789819]  [<ffffffff802793fc>] wq_worker_sleeping+0xc/0x90
[  278.789819]  [<ffffffff80973ad7>] __schedule+0x347/0x7d6
[  278.789819]  [<ffffffff802a6d22>] ? call_rcu_sched+0x12/0x20
[  278.789819]  [<ffffffff80973fc0>] schedule+0x30/0x80
[  278.789819]  [<ffffffff8026972a>] do_exit+0x5fa/0xa50
[  278.789819]  [<ffffffff80206058>] oops_end+0x68/0x90
[  278.789819]  [<ffffffff802061b6>] die+0x46/0x60
[  278.789819]  [<ffffffff802038b3>] do_trap+0xa3/0x140
[  278.789819]  [<ffffffff802039c2>] do_error_trap+0x72/0xe0
[  278.789819]  [<ffffffff803e6922>] ? start_this_handle+0x382/0x3e0
[  278.789819]  [<ffffffff80203c6b>] do_invalid_op+0x1b/0x20
[  278.789819]  [<ffffffff80978318>] invalid_op+0x18/0x20
[  278.789819]  [<ffffffff803e6922>] ? start_this_handle+0x382/0x3e0
[  278.789819]  [<ffffffff8032d78a>] ? kmem_cache_alloc+0x10a/0x150
[  278.789819]  [<ffffffff803e6b69>] ? jbd2__journal_start+0x79/0x190
[  278.789819]  [<ffffffff803e6be4>] jbd2__journal_start+0xf4/0x190
[  278.789819]  [<ffffffff803cfc7e>] __ext4_journal_start_sb+0x4e/0x70
[  278.789819]  [<ffffffff803b31ec>] ext4_evict_inode+0x12c/0x3d0
[  278.789819]  [<ffffffff8035338b>] evict+0xbb/0x190
[  278.789819]  [<ffffffff80354190>] iput+0x130/0x190
[  278.789819]  [<ffffffff80360223>] inode_switch_wbs_work_fn+0x343/0x4c0
[  278.789819]  [<ffffffff80279819>] process_one_work+0x129/0x300
[  278.789819]  [<ffffffff802ab1a3>] ? try_to_del_timer_sync+0x43/0x60
[  278.789819]  [<ffffffff80279b16>] worker_thread+0x126/0x480
[  278.789819]  [<ffffffff802799f0>] ? process_one_work+0x300/0x300
[  278.789819]  [<ffffffff8027ed14>] kthread+0xc4/0xe0
[  278.789819]  [<ffffffff8027ec50>] ? __kthread_parkme+0x70/0x70
[  278.789819]  [<ffffffff809771df>] ret_from_fork+0x3f/0x70
[  278.789819]  [<ffffffff8027ec50>] ? __kthread_parkme+0x70/0x70
[  278.789819] Code: 25 80 ac 00 00 48 8b 80 50 04 00 00 5d 48 8b 40 c8 48 d1 e8 83 e0 01 c3 0f 1f 84 00 00 00 00 00 55 48 8b 87 50 04 00 00 48 89 e5 <48> 8b 40 d8 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
[  278.789819] RIP  [<ffffffff8027eebb>] kthread_data+0xb/0x20
[  278.789819]  RSP <ffff880209267930>
[  278.789819] CR2: ffffffffffffffd8
[  278.789819] ---[ end trace b85bc47b59090680 ]---

Thread overview: 33+ messages
2016-02-12 19:32 [BUG] cgroup writeback crash Tahsin Erdogan
     [not found] ` <CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-02-15 21:00   ` Tejun Heo
     [not found]     ` <20160215210047.GN3965-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2016-02-16  7:56       ` Tahsin Erdogan
     [not found]         ` <CAAeU0aNAd1Ra6LXmWwq8row4MD_BpVHiSXOwHx07m86UWREvHw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-02-16 18:24           ` [PATCH block/for-4.5-fixes] writeback: keep superblock pinned during cgroup writeback association switches Tejun Heo
     [not found]             ` <20160216182457.GO3741-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2016-02-16 18:34               ` Jens Axboe
2016-02-17 20:57               ` Jan Kara
     [not found]                 ` <20160217205721.GE14140-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2016-02-17 21:07                   ` Tejun Heo
2016-02-17 22:30                     ` Jan Kara
2016-02-17 22:41                       ` Tahsin Erdogan
     [not found]                         ` <CAAeU0aOvSwPbLPU0=20D1RExNj8VsbB38hUnyso2L8xNSQC0XA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-02-17 23:02                           ` Tejun Heo
2016-02-18  9:55                             ` Jan Kara
     [not found]                               ` <20160218095538.GA4338-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2016-02-18 13:00                                 ` Tejun Heo
2016-02-18 13:20                                   ` Jan Kara
2016-02-19 20:18                                   ` Al Viro
2016-02-19 20:51                                     ` Tejun Heo
     [not found]                                       ` <20160219205147.GN13177-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2016-02-19 21:58                                         ` Al Viro
     [not found]                                           ` <20160219215811.GA17997-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2016-02-19 22:15                                             ` Tejun Heo
2016-02-19 22:26                                               ` Al Viro
     [not found]                                                 ` <20160219222609.GC17997-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2016-02-28 21:53                                                   ` Tejun Heo
     [not found]                             ` <20160217230231.GC6479-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2016-02-29 20:47                               ` [PATCH block/for-linus] writeback: flush inode cgroup wb switches instead of pinning super_block Tejun Heo
     [not found]                                 ` <20160229204724.GV3965-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2016-02-29 20:54                                   ` Al Viro
     [not found]                                     ` <20160229205428.GB17997-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2016-02-29 20:58                                       ` Tejun Heo
     [not found]                                         ` <20160229205837.GX3965-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2016-02-29 21:06                                           ` Al Viro
     [not found]                                             ` <20160229210614.GC17997-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2016-02-29 21:08                                               ` Tejun Heo
     [not found]                                                 ` <20160229210800.GY3965-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2016-02-29 21:21                                                   ` Jan Kara
2016-02-29 23:28                                   ` [PATCH v2 " Tejun Heo
2016-03-01  9:20                                     ` Jan Kara
2016-03-01 17:46                                     ` Jens Axboe
     [not found]                                       ` <56D5D592.2020800-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
2016-03-01 17:50                                         ` Tejun Heo
     [not found]                                           ` <CAOS58YO5vTBnM561np7gpXKGQELrT169bYqmcfvAvsquBJK5yw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-02 10:29                                             ` Jan Kara
2016-03-01 13:39                                 ` [PATCH " Tahsin Erdogan
2016-02-18 10:12               ` [PATCH block/for-4.5-fixes] writeback: keep superblock pinned during cgroup writeback association switches Nikolay Borisov
2016-02-18 12:57                 ` Tejun Heo
