* Re: xfs/311 test pass but leave block device unusable
[not found] <CAG2S0o-wJc-2_wm=35mE5Lt0e4idXwb3g5ezc9=LdWrLHfRM_Q@mail.gmail.com>
@ 2022-01-20 18:02 ` Darrick J. Wong
2022-01-21 10:20 ` Andrea Tomassetti
0 siblings, 1 reply; 2+ messages in thread
From: Darrick J. Wong @ 2022-01-20 18:02 UTC (permalink / raw)
To: Andrea Tomassetti; +Cc: fstests, xfs
[cc xfs list]
On Thu, Jan 20, 2022 at 12:04:47PM +0100, Andrea Tomassetti wrote:
> Hi all,
> I was using the (x)fstest utility on the Kernel 5.11 to try to
> reproduce some xfs-related issues:
> INFO: task xfs-conv/dm-3:1360 blocked for more than 120 seconds.
> Workqueue: xfs-conv/dm-3 xfs_end_io [xfs]
> Call Trace:
> __schedule+0x44c/0x8a0
> schedule+0x4f/0xc0
> xlog_grant_head_wait+0xb5/0x1a0 [xfs]
> xlog_grant_head_check+0xe1/0x100 [xfs]
Threads are stuck waiting for log space; can you post the full dmesg?
And the xfs_info output of the test device?
> When I realized that xfs test n. 311 was passing correctly but every
> further attempt to use the block device (e.g. mount it) was failing.
> The issue is reproducible after reboot.
>
> Test passed:
> ./check xfs/311
> FSTYP -- xfs (non-debug)
> PLATFORM -- Linux/x86_64 test 5.11.0-1021-aws
> #22~20.04.2-Ubuntu SMP Wed Oct 27 21:27:13 UTC 2021
> MKFS_OPTIONS -- -f /dev/xvdz
> MOUNT_OPTIONS -- /dev/xvdz /home/test/z
>
> xfs/311 25s ... 25s
> Ran: xfs/311
> Passed all 1 tests
>
> Fail:
> # mount /dev/xvdz /home/test/z/
> mount: /home/test/z: /dev/xvdz already mounted or mount point busy.
> [ 2222.028417] /dev/xvdz: Can't open blockdev
>
> lsof does not show anything that is using either /dev/xvdz or /home/test/z
>
> Any idea why is this happening?
xfs-conv handles unwritten extent conversion after writeback, so I would
speculate (without dmesg data) that everyone got wedged trying to start
a transaction, and the log is blocked up for whatever reason.
> The `xlog_grant_head_wait` race issue has been resolved in a later
> Kernel version, am I right?
Beats me.
> Best regards,
> Andrea
>
> --
>
>
>
>
>
>
>
> The contents of this email are confidential. If the reader of this
Not anymore they aren't.
--D
> message is not the intended recipient, you are hereby notified that any
> dissemination, distribution or copying of this communication is strictly
> prohibited. If you have received this communication in error, please notify
> us immediately by replying to this message and deleting it from your
> computer. Thank you. Devo, Inc; arco@devo.com <mailto:arco@devo.com>;
> Calle Estébanez Calderón 3-5, 5th Floor. Madrid, Spain 28020
>
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: xfs/311 test pass but leave block device unusable
2022-01-20 18:02 ` xfs/311 test pass but leave block device unusable Darrick J. Wong
@ 2022-01-21 10:20 ` Andrea Tomassetti
0 siblings, 0 replies; 2+ messages in thread
From: Andrea Tomassetti @ 2022-01-21 10:20 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: fstests, xfs
On Thu, Jan 20, 2022 at 7:02 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> [cc xfs list]
>
> On Thu, Jan 20, 2022 at 12:04:47PM +0100, Andrea Tomassetti wrote:
> > Hi all,
> > I was using the (x)fstest utility on the Kernel 5.11 to try to
> > reproduce some xfs-related issues:
> > INFO: task xfs-conv/dm-3:1360 blocked for more than 120 seconds.
> > Workqueue: xfs-conv/dm-3 xfs_end_io [xfs]
> > Call Trace:
> > __schedule+0x44c/0x8a0
> > schedule+0x4f/0xc0
> > xlog_grant_head_wait+0xb5/0x1a0 [xfs]
> > xlog_grant_head_check+0xe1/0x100 [xfs]
>
> Threads are stuck waiting for log space; can you post the full dmesg?
> And the xfs_info output of the test device?
>
I think I maybe didn't expose correctly the problems. I mentioned two
separated issues:
1. The thread stuck problem, to which the dmesg refers to. It is
happening to us in production and we're trying to replicate it in a
test environment but without any success at the moment.
2. While I was using xfstests, in the hope it could replicate the
problem, I stepped into the buggy behaviour of the test n. 311. And I
wanted to report this to you (and maybe get some advice on the
production's problem that is affecting us)
I hope I made myself clear, sorry for the misunderstanding.
Here the full dmesg, about the production's issue:
[Thu Dec 23 17:13:56 2021] INFO: task xfs-conv/dm-3:1360 blocked for
more than 120 seconds.
[Thu Dec 23 17:13:56 2021] Not tainted 5.11.0-1020-aws
#21~20.04.2-Ubuntu
[Thu Dec 23 17:13:56 2021] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Dec 23 17:13:56 2021] task:xfs-conv/dm-3 state:D stack: 0
pid: 1360 ppid: 2 flags:0x00004000
[Thu Dec 23 17:13:56 2021] Workqueue: xfs-conv/dm-3 xfs_end_io [xfs]
[Thu Dec 23 17:13:56 2021] Call Trace:
[Thu Dec 23 17:13:56 2021] __schedule+0x44c/0x8a0
[Thu Dec 23 17:13:56 2021] schedule+0x4f/0xc0
[Thu Dec 23 17:13:56 2021] xlog_grant_head_wait+0xb5/0x1a0 [xfs]
[Thu Dec 23 17:13:56 2021] xlog_grant_head_check+0xe1/0x100 [xfs]
[Thu Dec 23 17:13:56 2021] xfs_log_reserve+0xc2/0x1c0 [xfs]
[Thu Dec 23 17:13:56 2021] xfs_trans_reserve+0x1ca/0x210 [xfs]
[Thu Dec 23 17:13:56 2021] xfs_trans_alloc+0xd7/0x190 [xfs]
[Thu Dec 23 17:13:56 2021] xfs_iomap_write_unwritten+0x125/0x2e0 [xfs]
[Thu Dec 23 17:13:56 2021] xfs_end_ioend+0xdb/0x120 [xfs]
[Thu Dec 23 17:13:56 2021] xfs_end_io+0xb7/0xe0 [xfs]
[Thu Dec 23 17:13:56 2021] process_one_work+0x220/0x3c0
[Thu Dec 23 17:13:56 2021] rescuer_thread+0x2ca/0x3b0
[Thu Dec 23 17:13:56 2021] ? worker_thread+0x3f0/0x3f0
[Thu Dec 23 17:13:56 2021] kthread+0x12b/0x150
[Thu Dec 23 17:13:56 2021] ? set_kthread_struct+0x40/0x40
[Thu Dec 23 17:13:56 2021] ret_from_fork+0x22/0x30
> > When I realized that xfs test n. 311 was passing correctly but every
> > further attempt to use the block device (e.g. mount it) was failing.
> > The issue is reproducible after reboot.
> >
> > Test passed:
> > ./check xfs/311
> > FSTYP -- xfs (non-debug)
> > PLATFORM -- Linux/x86_64 test 5.11.0-1021-aws
> > #22~20.04.2-Ubuntu SMP Wed Oct 27 21:27:13 UTC 2021
> > MKFS_OPTIONS -- -f /dev/xvdz
> > MOUNT_OPTIONS -- /dev/xvdz /home/test/z
> >
> > xfs/311 25s ... 25s
> > Ran: xfs/311
> > Passed all 1 tests
> >
> > Fail:
> > # mount /dev/xvdz /home/test/z/
> > mount: /home/test/z: /dev/xvdz already mounted or mount point busy.
> > [ 2222.028417] /dev/xvdz: Can't open blockdev
> >
> > lsof does not show anything that is using either /dev/xvdz or /home/test/z
> >
> > Any idea why is this happening?
>
> xfs-conv handles unwritten extent conversion after writeback, so I would
> speculate (without dmesg data) that everyone got wedged trying to start
> a transaction, and the log is blocked up for whatever reason.
>
I run the test xfs/311 with the Kernel v5.13 and the behaviour is the
same as with the 5.11: after running it the disk becomes unusable.
Trying to rerun the test, after the first successful run, leads to:
FSTYP -- xfs (non-debug)
PLATFORM -- Linux/x86_64 test 5.11.0-1021-aws
#22~20.04.2-Ubuntu SMP Wed Oct 27 21:27:13 UTC 2021
MKFS_OPTIONS -- -f /dev/xvdz
MOUNT_OPTIONS -- /dev/xvdz /home/test/z
our local _scratch_mkfs routine ...
mkfs.xfs: cannot open /dev/xvdz: Device or resource busy
check: failed to mkfs $SCRATCH_DEV using specified options
Interrupted!
Passed all 0 tests
with no errors in dmesg.
xfs_info /dev/xvdz
meta-data=/dev/xvdz isize=512 agcount=4, agsize=32768000 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=131072000, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=64000, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
> > The `xlog_grant_head_wait` race issue has been resolved in a later
> > Kernel version, am I right?
>
> Beats me.
>
> > Best regards,
> > Andrea
> >
> > --
> >
> >
> >
> >
> >
> >
> >
> > The contents of this email are confidential. If the reader of this
>
> Not anymore they aren't.
>
> --D
>
> > message is not the intended recipient, you are hereby notified that any
> > dissemination, distribution or copying of this communication is strictly
> > prohibited. If you have received this communication in error, please notify
> > us immediately by replying to this message and deleting it from your
> > computer. Thank you. Devo, Inc; arco@devo.com <mailto:arco@devo.com>;
> > Calle Estébanez Calderón 3-5, 5th Floor. Madrid, Spain 28020
> >
--
The contents of this email are confidential. If the reader of this
message is not the intended recipient, you are hereby notified that any
dissemination, distribution or copying of this communication is strictly
prohibited. If you have received this communication in error, please notify
us immediately by replying to this message and deleting it from your
computer. Thank you. Devo, Inc; arco@devo.com <mailto:arco@devo.com>;
Calle Estébanez Calderón 3-5, 5th Floor. Madrid, Spain 28020
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2022-01-21 10:20 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CAG2S0o-wJc-2_wm=35mE5Lt0e4idXwb3g5ezc9=LdWrLHfRM_Q@mail.gmail.com>
2022-01-20 18:02 ` xfs/311 test pass but leave block device unusable Darrick J. Wong
2022-01-21 10:20 ` Andrea Tomassetti
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox