From: bugzilla-daemon@kernel.org
To: linux-xfs@vger.kernel.org
Subject: [Bug 216343] XFS: no space left in xlog cause system hang
Date: Mon, 15 Aug 2022 16:12:20 +0000
Message-ID: <bug-216343-201763-NC5Ss92xjn@https.bugzilla.kernel.org/>
In-Reply-To: <bug-216343-201763@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=216343
--- Comment #2 from Amir Goldstein (amir73il@gmail.com) ---
On Mon, Aug 15, 2022 at 2:54 AM Dave Chinner <david@fromorbit.com> wrote:
>
> [cc Amir, the 5.10 stable XFS maintainer]
>
> On Tue, Aug 09, 2022 at 11:46:23AM +0000, bugzilla-daemon@kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=216343
> >
> > Bug ID: 216343
> > Summary: XFS: no space left in xlog cause system hang
> > Product: File System
> > Version: 2.5
> > Kernel Version: 5.10.38
> > Hardware: ARM
> > OS: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: XFS
> > Assignee: filesystem_xfs@kernel-bugs.kernel.org
> > Reporter: zhoukete@126.com
> > Regression: No
> >
> > Created attachment 301539
> > --> https://bugzilla.kernel.org/attachment.cgi?id=301539&action=edit
> > stack
> >
> > 1. Cannot log in with ssh; the system hung and nothing can be done.
> > 2. dmesg reports 'audit: audit_backlog=41349 > audit_backlog_limit=8192'
> > 3. I sent sysrq-crash and got a vmcore file. I don't know how to reproduce it.
> >
> > Following is my analysis from the vmcore:
> >
> > The reason the tty cannot log in is that pid 2021571 holds the acct_process
> > mutex, and 2021571 cannot release the mutex because it is waiting for the
> > xlog to release space. See the stack info in the attachment stack.txt.
> >
> > So I tried to figure out what happened to the xlog.
> >
> > crash> struct xfs_ail.ail_target_prev,ail_target,ail_head 0xffff00ff884f1000
> > ail_target_prev = 0xe9200058600
> > ail_target = 0xe9200058600
> > ail_head = {
> > next = 0xffff0340999a0a80,
> > prev = 0xffff020013c66b40
> > }
> >
> > There are 112 log items in the AIL list:
> > crash> list 0xffff0340999a0a80 | wc -l
> > 112
> >
> > 79 of them are xlog_inode_item
> > 30 of them are xlog_buf_item
> >
> > crash> xfs_log_item.li_flags,li_lsn 0xffff0340999a0a80 -x
> > li_flags = 0x1
> > li_lsn = 0xe910005cc00 ===> first item lsn
> >
> > crash> xfs_log_item.li_flags,li_lsn ffff020013c66b40 -x
> > li_flags = 0x1
> > li_lsn = 0xe9200058600 ===> last item lsn
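For reference, an XFS LSN packs the log cycle number into the upper 32 bits and the block offset within the log into the lower 32 bits, so the first and last li_lsn values above can be split apart. The following is a minimal Python sketch of that split; the helper name split_lsn is made up for illustration, and the two constants are copied from the crash output.

```python
def split_lsn(lsn):
    """Split a 64-bit XFS LSN into (cycle, block).

    Assumes the usual layout: cycle in the upper 32 bits,
    block offset in the lower 32 bits.
    """
    return lsn >> 32, lsn & 0xFFFFFFFF


# Values copied from the crash session quoted above.
first_item_lsn = 0xe910005cc00
last_item_lsn = 0xe9200058600

for label, lsn in (("first", first_item_lsn), ("last", last_item_lsn)):
    cycle, block = split_lsn(lsn)
    print(f"{label} item: cycle={cycle:#x} block={block:#x}")
```

Decoded this way, the oldest item sits in cycle 0xe91 and the newest in cycle 0xe92 at a lower block offset (0x58600 versus 0x5cc00), and ail_target equals the newest item's LSN.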
> >
> > crash> xfs_log_item.li_buf 0xffff0340999a0a80
> > li_buf = 0xffff0200125b7180
> >
> > crash> xfs_buf.b_flags 0xffff0200125b7180 -x
> > b_flags = 0x110032 (XBF_WRITE|XBF_ASYNC|XBF_DONE|_XBF_INODES|_XBF_PAGES)
> >
> > crash> xfs_buf.b_state 0xffff0200125b7180 -x
> > b_state = 0x2 (XFS_BSTATE_IN_FLIGHT)
> >
> > crash> xfs_buf.b_last_error,b_retries,b_first_retry_time 0xffff0200125b7180 -x
> > b_last_error = 0x0
> > b_retries = 0x0
> > b_first_retry_time = 0x0
> >
> > The buf flags show the I/O has completed (XBF_DONE is set).
> > When I reviewed xfs_buf_ioend: if XBF_DONE is set, xfs_buf_inode_iodone
> > will be called, and it will remove the log item from the AIL list and then
> > release the xlog space by moving the tail_lsn.
> >
> > But this item is still in the AIL list, b_last_error = 0, and XBF_WRITE
> > is set.
> >
> > The xfs buf log item is in the same state as the inode log item:
> >
> > crash> list -s xfs_log_item.li_buf 0xffff0340999a0a80
> > ffff033f8d7c9de8
> > li_buf = 0x0
> > crash> xfs_buf_log_item.bli_buf ffff033f8d7c9de8
> > bli_buf = 0xffff0200125b4a80
> > crash> xfs_buf.b_flags 0xffff0200125b4a80 -x
> > b_flags = 0x100032 (XBF_WRITE|XBF_ASYNC|XBF_DONE|_XBF_PAGES)
> >
> > I think it is impossible for XBF_DONE to be set with b_last_error = 0
> > while the item is still in the AIL.
> >
> > Is my analysis correct?
I don't think so.
I think this buffer write is in-flight.
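To make the flag reading concrete, here is a small Python sketch that decodes the two b_flags values from the crash session. The bit values in the table are inferred from the decoded names shown in the crash output, not copied from the 5.10 headers, so they should be checked against fs/xfs/xfs_buf.h; the function name decode_b_flags is hypothetical.

```python
# Bit values inferred from the crash output quoted above
# (0x110032 and 0x100032); verify against fs/xfs/xfs_buf.h.
XBF_BITS = {
    0x000002: "XBF_WRITE",
    0x000010: "XBF_ASYNC",
    0x000020: "XBF_DONE",
    0x010000: "_XBF_INODES",
    0x100000: "_XBF_PAGES",
}


def decode_b_flags(value):
    """Return the flag names set in value, lowest bit first.

    Any bits not present in XBF_BITS are appended as one hex literal.
    """
    names = []
    remaining = value
    for bit in sorted(XBF_BITS):
        if value & bit:
            names.append(XBF_BITS[bit])
            remaining &= ~bit
    if remaining:
        names.append(hex(remaining))
    return "|".join(names)


print(decode_b_flags(0x110032))  # inode cluster buffer
print(decode_b_flags(0x100032))  # buffer behind the buf log item
```

Both buffers still carry XBF_WRITE together with XBF_DONE, and the first one also reports b_state = XFS_BSTATE_IN_FLIGHT, which fits a write that has been submitted but not yet completed. XBF_DONE on its own most likely only says that the buffer contents are valid, not that the write I/O has finished, though that reading is worth confirming in the 5.10 source.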
> > Why can't the xlog release space?
I am not sure whether the space cannot be released or whether releasing it
just takes a long time.
There are several AIL/CIL improvements in the upstream kernel, and
none of them are going to land in 5.10.y.
The reported kernel version, 5.10.38, has almost no upstream fixes
at all, but I don't think that any of the fixes in 5.10.y are relevant to
this case anyway.
If this hang happens often with your workload, I suggest using
a newer kernel and/or formatting XFS with a larger log to meet
the demands of your workload.
Thanks,
Amir.