From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: wenli xie <wlxie7296@gmail.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [Bug report] overlayfs over xfs whiteout operation may cause deadlock
Date: Fri, 11 Dec 2020 15:42:33 -0800 [thread overview]
Message-ID: <20201211234233.GK106271@magnolia> (raw)
In-Reply-To: <CABRboy006NP8JrxuBgEJbfCcGGUY2Kucwfov+HJf2xW34D5Ocg@mail.gmail.com>
On Thu, Dec 10, 2020 at 03:14:53PM +0800, wenli xie wrote:
> We are using the linux ubuntu kernel for 5.4.0-34.generic.x86_64. In our
> host, there are many containers use the same base image. All of them have
> `mv /etc/security/access.conf /etc/security/access.conf.bak` operation
> when the container starts up. Sometimes we met that the `mv` process came
> to D state and couldn't recover.
> I did some research and find this issue can be reproduced by following
> steps.
>
> 1. mount a disk to /var/mnt/test with xfs fs.
> mount -t xfs /mnt/test
> 2. Create lower dir under /var/mnt/test
> /var/mnt/test/lowdir
> /var/mnt/test/lowdir1
> 3. Create `/etc/access.conf` in dir `/var/mnt/test/lowdir`
> 4.
> Run following script in different terminals
> 1)
> while true
> do
> umount /var/mnt/test/merged
> rm -rf /var/mnt/test/merged
> rm -rf /var/mnt/test/upperdir
> rm -rf /var/mnt/test/workdir
> mkdir /var/mnt/test/merged
> mkdir /var/mnt/test/workdir
> mkdir /var/mnt/test/upperdir
> mount -t overlay overlay -o
> lowerdir=/var/mnt/test/lowerdir:/var/mnt/test/lowerdir1,upperdir=/var/mnt/test/upperdir,workdir=/var/mnt/test/workdir
> /var/mnt/test/merged
> mv /var/mnt/test/merged/etc/access.conf
> /var/mnt/test/merged/etc/access.conf.bak
> touch /var/mnt/test/merged/etc/access.conf
> mv /var/mnt/test/merged/etc/access.conf
> /var/mnt/test/merged/etc/access.conf.bak
> touch /var/mnt/test/merged/etc/access.conf
> echo "---------------"
> done
>
> 2)
> while true
> do
> umount /var/mnt/test/merged1
> rm -rf /var/mnt/test/merged1
> rm -rf /var/mnt/test/upperdir1
> rm -rf /var/mnt/test/workdir1
> mkdir /var/mnt/test/merged1
> mkdir /var/mnt/test/workdir1
> mkdir /var/mnt/test/upperdir1
> mount -t overlay overlay -o
> lowerdir=/var/mnt/test/lowerdir:/var/mnt/test/lowerdir1,upperdir=/var/mnt/test/upperdir1,workdir=/var/mnt/test/workdir1
> /var/mnt/test/merged1
> mv /var/mnt/test/merged1/etc/access.conf
> /var/mnt/test/merged1/etc/access.conf.bak
> touch /var/mnt/test/merged1/etc/access.conf
> mv /var/mnt/test/merged1/etc/access.conf
> /var/mnt/test/merged1/etc/access.conf.bak
> touch /var/mnt/test/merged1/etc/access.conf
> echo "---------------"
> done
>
> 3)
> while true
> do
> umount /var/mnt/test/merged3
> rm -rf /var/mnt/test/merged3
> rm -rf /var/mnt/test/upperdir3
> rm -rf /var/mnt/test/workdir3
> mkdir /var/mnt/test/merged3
> mkdir /var/mnt/test/workdir3
> mkdir /var/mnt/test/upperdir3
> mount -t overlay overlay -o
> lowerdir=/var/mnt/test/lowerdir:/var/mnt/test/lowerdir1,upperdir=/var/mnt/test/upperdir3,workdir=/var/mnt/test/workdir3
> /var/mnt/test/merged3
> mv /var/mnt/test/merged3/etc/access.conf
> /var/mnt/test/merged3/etc/access.conf.bak
> touch /var/mnt/test/merged3/etc/access.conf
> mv /var/mnt/test/merged3/etc/access.conf
> /var/mnt/test/merged3/etc/access.conf.bak
> touch /var/mnt/test/merged3/etc/access.conf
> echo "---------------"
> done
>
> 4)
> while true
> do
> umount /var/mnt/test/merged4
> rm -rf /var/mnt/test/merged4
> rm -rf /var/mnt/test/upperdir4
> rm -rf /var/mnt/test/workdir4
> mkdir /var/mnt/test/merged4
> mkdir /var/mnt/test/workdir4
> mkdir /var/mnt/test/upperdir4
> mount -t overlay overlay -o
> lowerdir=/var/mnt/test/lowerdir:/var/mnt/test/lowerdir1,upperdir=/var/mnt/test/upperdir4,workdir=/var/mnt/test/workdir4
> /var/mnt/test/merged4
> mv /var/mnt/test/merged4/etc/access.conf
> /var/mnt/test/merged4/etc/access.conf.bak
> touch /var/mnt/test/merged4/etc/access.conf
> mv /var/mnt/test/merged4/etc/access.conf
> /var/mnt/test/merged4/etc/access.conf.bak
> touch /var/mnt/test/merged4/etc/access.conf
> echo "---------------"
> done
>
> 5)
> while true
> do
> umount /var/mnt/test/merged5
> rm -rf /var/mnt/test/merged5
> rm -rf /var/mnt/test/upperdir5
> rm -rf /var/mnt/test/workdir5
> mkdir /var/mnt/test/merged5
> mkdir /var/mnt/test/workdir5
> mkdir /var/mnt/test/upperdir5
> mount -t overlay overlay -o
> lowerdir=/var/mnt/test/lowerdir:/var/mnt/test/lowerdir1,upperdir=/var/mnt/test/upperdir5,workdir=/var/mnt/test/workdir5
> /var/mnt/test/merged5
> mv /var/mnt/test/merged5/etc/access.conf
> /var/mnt/test/merged5/etc/access.conf.bak
> touch /var/mnt/test/merged5/etc/access.conf
> mv /var/mnt/test/merged5/etc/access.conf
> /var/mnt/test/merged5/etc/access.conf.bak
> touch /var/mnt/test/merged5/etc/access.conf
> echo "---------------"
> done
>
> 6)
> while true
> do
> umount /var/mnt/test/merged6
> rm -rf /var/mnt/test/merged6
> rm -rf /var/mnt/test/upperdir6
> rm -rf /var/mnt/test/workdir6
> mkdir /var/mnt/test/merged6
> mkdir /var/mnt/test/workdir6
> mkdir /var/mnt/test/upperdir6
> mount -t overlay overlay -o
> lowerdir=/var/mnt/test/lowerdir:/var/mnt/test/lowerdir1,upperdir=/var/mnt/test/upperdir6,workdir=/var/mnt/test/workdir6
> /var/mnt/test/merged6
> mv /var/mnt/test/merged6/etc/access.conf
> /var/mnt/test/merged6/etc/access.conf.bak
> touch /var/mnt/test/merged6/etc/access.conf
> mv /var/mnt/test/merged6/etc/access.conf
> /var/mnt/test/merged6/etc/access.conf.bak
> touch /var/mnt/test/merged6/etc/access.conf
> echo "---------------"
> done
>
>
>
> After a while, you will found that there are `mv` process become to D state
>
> crash> foreach UN ps -m|sort|tail
> [0 00:00:23.271] [UN] PID: 38033 TASK: ffff908cd4c8ba80 CPU: 20
> COMMAND: "mv"
> [0 00:00:23.332] [UN] PID: 38029 TASK: ffff908cd6093a80 CPU: 1
> COMMAND: "mv"
> [0 00:00:23.332] [UN] PID: 38037 TASK: ffff908cd606ba80 CPU: 31
> COMMAND: "touch"
> [0 00:00:23.332] [UN] PID: 38038 TASK: ffff908cd4c50000 CPU: 24
> COMMAND: "mv"
> [0 00:00:23.333] [UN] PID: 38032 TASK: ffff908cd5a0ba80 CPU: 33
> COMMAND: "mv"
> [0 00:00:23.333] [UN] PID: 38035 TASK: ffff908cd5110000 CPU: 8
> COMMAND: "touch"
> [0 00:00:23.336] [UN] PID: 38040 TASK: ffff908cd4cbba80 CPU: 2
> COMMAND: "touch"
> [0 00:00:23.337] [UN] PID: 38039 TASK: ffff908cd62cba80 CPU: 35
> COMMAND: "touch"
> [0 00:00:23.338] [UN] PID: 38030 TASK: ffff908cdbfc0000 CPU: 15
> COMMAND: "mv"
> [0 00:00:23.339] [UN] PID: 38036 TASK: ffff908cd4ce8000 CPU: 22
> COMMAND: "mv"
>
>
> The `mv` process have the same stack like:
>
> crash> bt
> PID: 38029 TASK: ffff908cd6093a80 CPU: 1 COMMAND: "mv"
> #0 [ffffa848961b3558] __schedule at ffffffff8fa96c09
> #1 [ffffa848961b35e8] schedule at ffffffff8fa97235
> #2 [ffffa848961b3608] schedule_timeout at ffffffff8fa9ccc6
> #3 [ffffa848961b36c8] __down_common at ffffffff8fa9b2b3
> #4 [ffffa848961b3748] __down at ffffffff8fa9b311
> #5 [ffffa848961b3758] down at ffffffff8ed43be1
> #6 [ffffa848961b3778] xfs_buf_lock at ffffffffc06fab78 [xfs]
> #7 [ffffa848961b37a8] xfs_buf_find at ffffffffc06fb160 [xfs]
> #8 [ffffa848961b3850] xfs_buf_get_map at ffffffffc06fbf41 [xfs]
> #9 [ffffa848961b38a0] xfs_buf_read_map at ffffffffc06fcd67 [xfs]
> #10 [ffffa848961b38f0] xfs_trans_read_buf_map at ffffffffc0752496 [xfs]
> #11 [ffffa848961b3938] xfs_read_agi at ffffffffc06cdf02 [xfs]
> #12 [ffffa848961b39a0] xfs_iunlink at ffffffffc071c741 [xfs]
> #13 [ffffa848961b3a08] xfs_droplink at ffffffffc071c9e2 [xfs]
> #14 [ffffa848961b3a30] xfs_rename at ffffffffc07221b7 [xfs]
> #15 [ffffa848961b3b08] xfs_vn_rename at ffffffffc0717ec4 [xfs]
> #16 [ffffa848961b3b80] vfs_rename at ffffffff8eff9d65
> #17 [ffffa848961b3c40] ovl_do_rename at ffffffffc091d177 [overlay]
> #18 [ffffa848961b3c78] ovl_rename at ffffffffc091e885 [overlay]
> #19 [ffffa848961b3d10] vfs_rename at ffffffff8eff9d65
> #20 [ffffa848961b3dd8] do_renameat2 at ffffffff8effcc66
> #21 [ffffa848961b3ea8] __x64_sys_rename at ffffffff8effcdb9
> #22 [ffffa848961b3ec0] do_syscall_64 at ffffffff8ec04c54
> #23 [ffffa848961b3f50] entry_SYSCALL_64_after_hwframe at ffffffff8fc00091
> RIP: 00007f84ed829da7 RSP: 00007ffe6da90508 RFLAGS: 00000202
> RAX: ffffffffffffffda RBX: 00007ffe6da9093f RCX: 00007f84ed829da7
> RDX: 0000000000000000 RSI: 00007ffe6da92b94 RDI: 00007ffe6da92b7a
> RBP: 00007ffe6da908e0 R8: 0000000000000001 R9: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffe6da909c0
> R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffe6da92b7a
> ORIG_RAX: 0000000000000052 CS: 0033 SS: 002b
>
>
> I revert this PR
> https://github.com/torvalds/linux/commit/fa6c668d807b1e9ac041101dfcb59bd8e279cfe5
> , which can indicate which process is hold the lock. This can help for the
> debug.
>
> 1. Check for process 38036
>
> crash> set 38036
> PID: 38036
> COMMAND: "mv"
> TASK: ffff908cd4ce8000 [THREAD_INFO: ffff908cd4ce8000]
> CPU: 22
> STATE: TASK_UNINTERRUPTIBLE
>
>
> It is waiting for a lock, which is holding by 38029:
>
> crash> struct xfs_buf.b_last_holder ffff908f92a0a680
> b_last_holder = 38029
>
>
> 2. Check for process 38029
> crash> set 38029
> PID: 38029
> COMMAND: "mv"
> TASK: ffff908cd6093a80 [THREAD_INFO: ffff908cd6093a80]
> CPU: 1
> STATE: TASK_UNINTERRUPTIBLE
>
> crash> struct xfs_buf.b_last_holder ffff908f993cc780
> b_last_holder = 38030
>
> It is waiting for a lock, which is holding by 38030
>
> 3. Check for process 38030
> crash> set 38030
> PID: 38030
> COMMAND: "mv"
> TASK: ffff908cdbfc0000 [THREAD_INFO: ffff908cdbfc0000]
> CPU: 15
> STATE: TASK_UNINTERRUPTIBLE
>
> crash> struct xfs_buf.b_last_holder ffff908f92a0a680
> b_last_holder = 38029
>
>
>
> I did a bit more trace by ebpf, the lock ffff908f92a0a680 should be held by
> 38029 in xfs_iunlink_remove() but not released:
>
> b'xfs_buf_trylock+0x1'
> b'xfs_buf_get_map+0x51'
> b'xfs_buf_read_map+0x47'
> b'xfs_trans_read_buf_map+0xf6'
> b'xfs_read_agi+0xd2'
> b'xfs_iunlink_remove+0x9a'
> b'xfs_rename+0x618'
> b'xfs_vn_rename+0x104'
> b'vfs_rename+0x6e5'
> b'ovl_do_rename+0x47'
> b'ovl_rename+0x5d5'
> b'vfs_rename+0x6e5'
> b'do_renameat2+0x576'
> b'__x64_sys_rename+0x29'
> b'do_syscall_64+0x84'
> b'entry_SYSCALL_64_after_hwframe+0x49'
>
> The lock 0xffff908f993cc780 should also be held by 38030
> xfs_iunlink_remove() but not released:
>
> b'xfs_buf_trylock+0x1'
> b'xfs_buf_get_map+0x51'
> b'xfs_buf_read_map+0x47'
> b'xfs_trans_read_buf_map+0xf6'
> b'xfs_read_agi+0xd2'
> b'xfs_iunlink_remove+0x9a'
> b'xfs_rename+0x618'
> b'xfs_vn_rename+0x104'
> b'vfs_rename+0x6e5'
> b'ovl_do_rename+0x47'
> b'ovl_rename+0x5d5'
> b'vfs_rename+0x6e5'
> b'do_renameat2+0x576'
> b'__x64_sys_rename+0x29'
> b'do_syscall_64+0x84'
> b'entry_SYSCALL_64_after_hwframe+0x49'
>
>
> Looks like there are ABBA deadlock in this scenario.
>
> This issue is very easy to be reproduced by the scripts I provided.
Hm. Does this happen on upstream? There have been a couple of deadlock
fixes for the rename code in the past couple of years.
93597ae8dac0149b5c00b787cba6bf7ba213e666 (5.5)
bc56ad8c74b8588685c2875de0df8ab6974828ef (5.4)
I have no idea if Ubuntu has picked up the newer one, you'll have to
contact them.
(Did this ever show up on linux-xfs?)
--D
next parent reply other threads:[~2020-12-11 23:57 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CABRboy006NP8JrxuBgEJbfCcGGUY2Kucwfov+HJf2xW34D5Ocg@mail.gmail.com>
2020-12-11 23:42 ` Darrick J. Wong [this message]
[not found] ` <CABRboy35_tyxA3gHN7_=xp0_RVugQjvFOHCRsH4Y4rrivE7HmQ@mail.gmail.com>
2020-12-17 21:11 ` [Bug report] overlayfs over xfs whiteout operation may cause deadlock Darrick J. Wong
2020-12-22 1:29 ` wenli xie
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201211234233.GK106271@magnolia \
--to=darrick.wong@oracle.com \
--cc=linux-xfs@vger.kernel.org \
--cc=wlxie7296@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox