* [Bug 216343] XFS: no space left in xlog cause system hang
From: bugzilla-daemon @ 2022-08-11 7:04 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=216343
zhoukete@126.com changed:
What            |Removed    |Added
----------------------------------------------------------------------------
Severity        |normal     |high
* Re: [Bug 216343] New: XFS: no space left in xlog cause system hang
From: Dave Chinner @ 2022-08-14 23:54 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: linux-xfs, Amir Goldstein
[cc Amir, the 5.10 stable XFS maintainer]
On Tue, Aug 09, 2022 at 11:46:23AM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216343
>
> Bug ID: 216343
> Summary: XFS: no space left in xlog cause system hang
> Product: File System
> Version: 2.5
> Kernel Version: 5.10.38
> Hardware: ARM
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: XFS
> Assignee: filesystem_xfs@kernel-bugs.kernel.org
> Reporter: zhoukete@126.com
> Regression: No
>
> Created attachment 301539
> --> https://bugzilla.kernel.org/attachment.cgi?id=301539&action=edit
> stack
>
> 1. Cannot log in with ssh; the system hung and could not do anything.
> 2. dmesg reported 'audit: audit_backlog=41349 > audit_backlog_limit=8192'
> 3. I sent sysrq-crash and got a vmcore file; I don't know how to reproduce it (capture sketch below).
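[Aside: step 3 is the usual way such a vmcore is captured. A minimal sketch, assuming kdump/kexec is already configured and magic sysrq is permitted:]
echo 1 > /proc/sys/kernel/sysrq    # enable all sysrq functions
echo c > /proc/sysrq-trigger       # force a crash so kdump writes the vmcore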
>
> Following is my analysis of the vmcore:
>
> The reason the tty cannot log in is that pid 2021571 holds the acct_process mutex, and
> 2021571 cannot release the mutex because it is waiting for the xlog to release space. See the
> stack info in the attachment stack.txt.
>
> So I tried to figure out what happened to the xlog:
>
> crash> struct xfs_ail.ail_target_prev,ail_target,ail_head 0xffff00ff884f1000
> ail_target_prev = 0xe9200058600
> ail_target = 0xe9200058600
> ail_head = {
> next = 0xffff0340999a0a80,
> prev = 0xffff020013c66b40
> }
>
> There are 112 log items in the AIL list:
> crash> list 0xffff0340999a0a80 | wc -l
> 112
>
> 79 of the items are xlog_inode_item
> 30 of the items are xlog_buf_item
>
> crash> xfs_log_item.li_flags,li_lsn 0xffff0340999a0a80 -x
> li_flags = 0x1
> li_lsn = 0xe910005cc00 ===> first item lsn
>
> crash> xfs_log_item.li_flags,li_lsn ffff020013c66b40 -x
> li_flags = 0x1
> li_lsn = 0xe9200058600 ===> last item lsn
>
> crash> xfs_log_item.li_buf 0xffff0340999a0a80
> li_buf = 0xffff0200125b7180
>
> crash> xfs_buf.b_flags 0xffff0200125b7180 -x
> b_flags = 0x110032 (XBF_WRITE|XBF_ASYNC|XBF_DONE|_XBF_INODES|_XBF_PAGES)
>
> crash> xfs_buf.b_state 0xffff0200125b7180 -x
> b_state = 0x2 (XFS_BSTATE_IN_FLIGHT)
>
> crash> xfs_buf.b_last_error,b_retries,b_first_retry_time 0xffff0200125b7180 -x
> b_last_error = 0x0
> b_retries = 0x0
> b_first_retry_time = 0x0
>
> The buf flags show the I/O had been done (XBF_DONE is set).
> When I reviewed the code in xfs_buf_ioend: if XBF_DONE is set, xfs_buf_inode_iodone
> will be called, and it removes the log item from the AIL list, then releases the
> xlog space by moving the tail_lsn.
>
> But now this item is still in the AIL list, b_last_error = 0, and XBF_WRITE
> is set.
>
> The xfs buf log item shows the same thing as the inode log item:
>
> crash> list -s xfs_log_item.li_buf 0xffff0340999a0a80
> ffff033f8d7c9de8
> li_buf = 0x0
> crash> xfs_buf_log_item.bli_buf ffff033f8d7c9de8
> bli_buf = 0xffff0200125b4a80
> crash> xfs_buf.b_flags 0xffff0200125b4a80 -x
> b_flags = 0x100032 (XBF_WRITE|XBF_ASYNC|XBF_DONE|_XBF_PAGES)
>
> I think it is impossible for XBF_DONE to be set with b_last_error = 0 while the item
> is still in the AIL.
>
> Is my analysis correct?
> Why can't the xlog release space?
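[Aside: the "no space left in xlog" diagnosis can be cross-checked against the log's own head/tail bookkeeping in the same vmcore. A sketch, assuming the 5.10-era field names (xfs_mount.m_log, xlog.l_tail_lsn/l_last_sync_lsn/l_logsize); the <...> values are placeholders read from each previous command:]
crash> mount                                # note the SUPERBLK address of the hung xfs mount
crash> super_block.s_fs_info <superblock>   # s_fs_info points to the struct xfs_mount
crash> xfs_mount.m_log <xfs_mount>          # m_log points to the struct xlog
crash> xlog.l_tail_lsn,l_last_sync_lsn,l_logsize <xlog> -x
# l_tail_lsn should match the first AIL item's li_lsn above; if the distance from the tail
# to the head (l_last_sync_lsn) is close to l_logsize, the log really is out of space.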
--
Dave Chinner
david@fromorbit.com
* Re: [Bug 216343] New: XFS: no space left in xlog cause system hang
From: Amir Goldstein @ 2022-08-15 16:12 UTC (permalink / raw)
To: Dave Chinner; +Cc: bugzilla-daemon, linux-xfs
On Mon, Aug 15, 2022 at 2:54 AM Dave Chinner <david@fromorbit.com> wrote:
>
> [cc Amir, the 5.10 stable XFS maintainer]
>
> On Tue, Aug 09, 2022 at 11:46:23AM +0000, bugzilla-daemon@kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=216343
> >
> > Bug ID: 216343
> > Summary: XFS: no space left in xlog cause system hang
> > Product: File System
> > Version: 2.5
> > Kernel Version: 5.10.38
> > Hardware: ARM
> > OS: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: XFS
> > Assignee: filesystem_xfs@kernel-bugs.kernel.org
> > Reporter: zhoukete@126.com
> > Regression: No
> >
> > Created attachment 301539
> > --> https://bugzilla.kernel.org/attachment.cgi?id=301539&action=edit
> > stack
> >
> > 1. Cannot log in with ssh; the system hung and could not do anything.
> > 2. dmesg reported 'audit: audit_backlog=41349 > audit_backlog_limit=8192'
> > 3. I sent sysrq-crash and got a vmcore file; I don't know how to reproduce it.
> >
> > Following is my analysis of the vmcore:
> >
> > The reason the tty cannot log in is that pid 2021571 holds the acct_process mutex, and
> > 2021571 cannot release the mutex because it is waiting for the xlog to release space. See the
> > stack info in the attachment stack.txt.
> >
> > So I tried to figure out what happened to the xlog:
> >
> > crash> struct xfs_ail.ail_target_prev,ail_target,ail_head 0xffff00ff884f1000
> > ail_target_prev = 0xe9200058600
> > ail_target = 0xe9200058600
> > ail_head = {
> > next = 0xffff0340999a0a80,
> > prev = 0xffff020013c66b40
> > }
> >
> > There are 112 log items in the AIL list:
> > crash> list 0xffff0340999a0a80 | wc -l
> > 112
> >
> > 79 of the items are xlog_inode_item
> > 30 of the items are xlog_buf_item
> >
> > crash> xfs_log_item.li_flags,li_lsn 0xffff0340999a0a80 -x
> > li_flags = 0x1
> > li_lsn = 0xe910005cc00 ===> first item lsn
> >
> > crash> xfs_log_item.li_flags,li_lsn ffff020013c66b40 -x
> > li_flags = 0x1
> > li_lsn = 0xe9200058600 ===> last item lsn
> >
> > crash>xfs_log_item.li_buf 0xffff0340999a0a80
> > li_buf = 0xffff0200125b7180
> >
> > crash> xfs_buf.b_flags 0xffff0200125b7180 -x
> > b_flags = 0x110032 (XBF_WRITE|XBF_ASYNC|XBF_DONE|_XBF_INODES|_XBF_PAGES)
> >
> > crash> xfs_buf.b_state 0xffff0200125b7180 -x
> > b_state = 0x2 (XFS_BSTATE_IN_FLIGHT)
> >
> > crash> xfs_buf.b_last_error,b_retries,b_first_retry_time 0xffff0200125b7180 -x
> > b_last_error = 0x0
> > b_retries = 0x0
> > b_first_retry_time = 0x0
> >
> > The buf flags show the I/O had been done (XBF_DONE is set).
> > When I reviewed the code in xfs_buf_ioend: if XBF_DONE is set, xfs_buf_inode_iodone
> > will be called, and it removes the log item from the AIL list, then releases the
> > xlog space by moving the tail_lsn.
> >
> > But now this item is still in the AIL list, b_last_error = 0, and XBF_WRITE
> > is set.
> >
> > The xfs buf log item shows the same thing as the inode log item:
> >
> > crash> list -s xfs_log_item.li_buf 0xffff0340999a0a80
> > ffff033f8d7c9de8
> > li_buf = 0x0
> > crash> xfs_buf_log_item.bli_buf ffff033f8d7c9de8
> > bli_buf = 0xffff0200125b4a80
> > crash> xfs_buf.b_flags 0xffff0200125b4a80 -x
> > b_flags = 0x100032 (XBF_WRITE|XBF_ASYNC|XBF_DONE|_XBF_PAGES)
> >
> > I think it is impossible for XBF_DONE to be set with b_last_error = 0 while the item
> > is still in the AIL.
> >
> > Is my analysis correct?
I don't think so.
I think this buffer write is in-flight.
> > Why can't the xlog release space?
Not sure if space cannot be released or just takes a lot of time.
There are several AIL/CIL improvements in upstream kernel and
none of them are going to land in 5.10.y.
The reported kernel version 5.10.38 has almost no upstream fixes
at all, but I don't think that any of the fixes in 5.10.y are relevant for
this case anyway.
If this hang happens often with your workload, I suggest using
a newer kernel and/or formatting xfs with a larger log to meet
the demands of your workload.
Thanks,
Amir.
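[Aside: the XFS log size is fixed when the filesystem is made; xfs_growfs cannot grow it later. A hedged sketch with a hypothetical device and mount point (XFS caps an internal log at roughly 2 GiB):]
mkfs.xfs -f -l size=1g /dev/sdX    # hypothetical device; creates a 1 GiB internal log
xfs_info /mnt                      # the "log =internal log ... blocks=262144" line confirms it (4 KiB blocks)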
* [Bug 216343] XFS: no space left in xlog cause system hang
From: bugzilla-daemon @ 2022-08-16 6:56 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=216343
--- Comment #3 from zhoukete@126.com ---
(In reply to Amir Goldstein from comment #2)
>
> I don't think so.
> I think this buffer write is in-flight.
But how to explain that the XBF_DONE flag is set? Does the XBF_DONE flag mean the I/O had
been done?
>
> Not sure if space cannot be released or just takes a lot of time.
crash> ps -m 2006840
[0 14:35:38.418] [UN] PID: 2006840 TASK: ffff035f80de8f80 CPU: 51 COMMAND: "onestor-peon"
pid 2006840 has been hung for 14 hours. It is waiting for the xfs buf I/O to finish.
So I think it is 'cannot be released', not 'takes a lot of time'.
crash> bt 2006840
PID: 2006840 TASK: ffff035f80de8f80 CPU: 51 COMMAND: "onestor-peon"
#0 [ffff80002a873580] __switch_to at ffff800010016dec
#1 [ffff80002a8735b0] __schedule at ffff800010a99d60
#2 [ffff80002a873650] schedule at ffff800010a9a1cc
#3 [ffff80002a873670] schedule_timeout at ffff800010a9ee00
#4 [ffff80002a873710] __down at ffff800010a9d2ec
#5 [ffff80002a8737a0] down at ffff8000100ea30c
#6 [ffff80002a8737c0] xfs_buf_lock at ffff800008f5bd4c [xfs]
#7 [ffff80002a8737f0] xfs_buf_find at ffff800008f5c3f0 [xfs]
#8 [ffff80002a8738b0] xfs_buf_get_map at ffff800008f5c7d4 [xfs]
#9 [ffff80002a873930] xfs_buf_read_map at ffff800008f5cf10 [xfs]
#10 [ffff80002a8739a0] xfs_trans_read_buf_map at ffff800008f9c198 [xfs]
#11 [ffff80002a873a20] xfs_da_read_buf at ffff800008f2b51c [xfs]
#12 [ffff80002a873aa0] xfs_da3_node_read at ffff800008f2b5a4 [xfs]
#13 [ffff80002a873ac0] xfs_da3_node_lookup_int at ffff800008f2c778 [xfs]
#14 [ffff80002a873b60] xfs_dir2_node_removename at ffff800008f38964 [xfs]
#15 [ffff80002a873c30] xfs_dir_removename at ffff800008f2fc6c [xfs]
#16 [ffff80002a873c90] xfs_remove at ffff800008f77f70 [xfs]
#17 [ffff80002a873d00] xfs_vn_unlink at ffff800008f72744 [xfs]
#18 [ffff80002a873d50] vfs_unlink at ffff8000103264e8
#19 [ffff80002a873d90] do_unlinkat at ffff80001032ad44
#20 [ffff80002a873e20] __arm64_sys_unlinkat at ffff80001032ae30
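[Aside: a quick way to enumerate every task stuck like this in the same vmcore, using standard crash(8) commands:]
crash> ps -m | grep UN    # uninterruptible (D-state) tasks, with time since they last ran
crash> foreach UN bt      # stack traces of all uninterruptible tasks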
> There are several AIL/CIL improvements in upstream kernel and
> none of them are going to land in 5.10.y.
> The reported kernel version 5.10.38 has almost no upstream fixes
> at all, but I don't think that any of the fixes in 5.10.y are relevant for
> this case anyway.
>
> If this hang happens often with your workload, I suggest using
> a newer kernel and/or formatting xfs with a larger log to meet
> the demands of your workload.
>
It has happened only once.
* Re: [Bug 216343] XFS: no space left in xlog cause system hang
From: Amir Goldstein @ 2022-08-16 14:32 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: linux-xfs
On Tue, Aug 16, 2022 at 11:57 AM <bugzilla-daemon@kernel.org> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=216343
>
> --- Comment #3 from zhoukete@126.com ---
> (In reply to Amir Goldstein from comment #2)
> >
> > I don't think so.
> > I think this buffer write is in-flight.
>
> But how to explain that the XBF_DONE flag is set? Does the XBF_DONE flag mean the I/O had
> been done?
>
According to xfs_trans_dirty_buf() I think it could mean uptodate and
dirty buffer.
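[Aside: one way to check from the vmcore whether the write really is still in flight is the buffer's hold/pin/IO accounting. A sketch; field names as in the 5.10-era struct xfs_buf, buffer address taken from the report above:]
crash> xfs_buf.b_hold,b_pin_count,b_io_remaining 0xffff0200125b7180 -x
# b_pin_count > 0 would mean the buffer is still pinned by the log;
# b_io_remaining > 0 would mean a submitted bio has not completed yet.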
>
> >
> > Not sure if space cannot be released or just takes a lot of time.
>
> crash> ps -m 2006840
> [0 14:35:38.418] [UN] PID: 2006840 TASK: ffff035f80de8f80 CPU: 51 COMMAND: "onestor-peon"
>
> pid 2006840 has been hung for 14 hours. It is waiting for the xfs buf I/O to finish.
> So I think it is 'cannot be released', not 'takes a lot of time'.
Maybe the hardware never returned with a response?
Hard to say. Maybe someone else has ideas.
Thanks,
Amir.
* [Bug 216343] XFS: no space left in xlog cause system hang
From: bugzilla-daemon @ 2022-08-17 10:05 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=216343
--- Comment #5 from zhoukete@126.com ---
(In reply to Amir Goldstein from comment #4)
>
> According to xfs_trans_dirty_buf() I think it could mean uptodate and
> dirty buffer.
>
When I reviewed the xfs_trans_dirty_buf code, I found that the xfs inode item's buffer has a
null b_log_item:
crash> xfs_log_item.li_buf,li_ops 0xffff0340999a0a80 -x
li_buf = 0xffff0200125b7180
li_ops = 0xffff800008faec60 <xfs_inode_item_ops>
crash> xfs_buf.b_log_item 0xffff0200125b7180
b_log_item = 0x0
and only the xfs buf log item's buffer has a b_log_item value:
crash> xfs_log_item.li_buf,li_ops ffff033f8d7c9de8 -x
li_buf = 0x0
li_ops = 0xffff800008fae8d8 <xfs_buf_item_ops>
crash> xfs_buf_log_item.bli_buf ffff033f8d7c9de8
bli_buf = 0xffff0200125b4a80
crash> xfs_buf.b_log_item 0xffff0200125b4a80
b_log_item = 0xffff033f8d7c9de8
crash> xfs_buf_log_item.bli_flags 0xffff033f8d7c9de8
bli_flags = 2 (XFS_BLI_DIRTY)
crash> xfs_buf_log_item.bli_item.li_flags ffff033f8d7c9de8
bli_item.li_flags = 1, (XFS_LI_IN_AIL)
So the xfs buf log item's XBF_DONE was set by xfs_trans_dirty_buf(), but the xfs
inode log item's buffer never goes through xfs_trans_dirty_buf() because b_log_item == 0x0.
Do you know why XBF_DONE is set on the xfs inode log item's buffer?
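[Aside: if I read the 5.10-era code correctly, xfs_buf.b_log_item only ever points to the buffer's own xfs_buf_log_item, while inode log items attach to the buffer through its b_li_list, so a NULL b_log_item on an inode cluster buffer is not by itself surprising. A sketch of walking that list to see which items are attached (the crash list options may need adjusting):]
crash> xfs_buf.b_li_list 0xffff0200125b7180
crash> list -o xfs_log_item.li_bio_list -s xfs_log_item.li_ops -H <address of b_li_list>
# li_ops distinguishes inode log items from buf log items on this buffer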
>
> Maybe the hardware never returned with a response?
> Hard to say. Maybe someone else has ideas.
>
If we can prove that XBF_DONE doesn't stand for iodone, I think this issue may be
caused by a hardware error.
I found this error message in dmesg:
[ 9824.111366] mpt3sas_cm0: issue target reset: handle = (0x0034)
Maybe it tells us that mpt3sas lost the I/O requests earlier.
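[Aside: the vmcore's kernel log can be searched for more evidence of lost or aborted I/O around that reset, e.g.:]
crash> log | grep -E "mpt3sas|target reset|task abort|I/O error|blk_update_request"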
* Re: [Bug 216343] XFS: no space left in xlog cause system hang
From: Amir Goldstein @ 2022-08-17 13:15 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: linux-xfs
On Wed, Aug 17, 2022 at 1:19 PM <bugzilla-daemon@kernel.org> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=216343
>
> --- Comment #5 from zhoukete@126.com ---
> (In reply to Amir Goldstein from comment #4)
>
> >
> > According to xfs_trans_dirty_buf() I think it could mean uptodate and
> > dirty buffer.
> >
>
> When I reviewed the xfs_trans_dirty_buf code, I found that the xfs inode item's buffer has a
> null b_log_item:
>
> crash> xfs_log_item.li_buf,li_ops 0xffff0340999a0a80 -x
> li_buf = 0xffff0200125b7180
> li_ops = 0xffff800008faec60 <xfs_inode_item_ops>
> crash> xfs_buf.b_log_item 0xffff0200125b7180
> b_log_item = 0x0
>
> and only the xfs buf log item's buffer has a b_log_item value:
>
> crash> xfs_log_item.li_buf,li_ops ffff033f8d7c9de8 -x
> li_buf = 0x0
> li_ops = 0xffff800008fae8d8 <xfs_buf_item_ops>
> crash> xfs_buf_log_item.bli_buf ffff033f8d7c9de8
> bli_buf = 0xffff0200125b4a80
> crash> xfs_buf.b_log_item 0xffff0200125b4a80
> b_log_item = 0xffff033f8d7c9de8
> crash> xfs_buf_log_item.bli_flags 0xffff033f8d7c9de8
> bli_flags = 2 (XFS_BLI_DIRTY)
> crash> xfs_buf_log_item.bli_item.li_flags ffff033f8d7c9de8
> bli_item.li_flags = 1, (XFS_LI_IN_AIL)
>
> So the xfs buf log item's XBF_DONE was set by xfs_trans_dirty_buf(), but the xfs
> inode log item's buffer never goes through xfs_trans_dirty_buf() because b_log_item == 0x0.
>
> Do you know why XBF_DONE is set on the xfs inode log item's buffer?
>
#define XBF_DONE (1u << 5) /* all pages in the buffer uptodate */
Buffer uptodate does not mean that it is not dirty.
I am not sure about the rest of your analysis.
> >
> > Maybe the hardware never returned with a response?
> > Hard to say. Maybe someone else has ideas.
> >
>
> If we can prove that XBF_DONE doesn't stand for iodone, I think this issue may be
> caused by a hardware error.
>
> I found this error message in dmesg:
> [ 9824.111366] mpt3sas_cm0: issue target reset: handle = (0x0034)
>
> Maybe it tells us that mpt3sas lost the I/O requests earlier.
>
Yes, maybe it does.
Anyway, if your hardware had errors, could it be that your
filesystem shut down?
If it did, you may have been hit by the bug fixed by
84d8949e7707 ("xfs: hold buffer across unpin and potential shutdown processing"),
but I am not sure whether all the conditions in this bug match your case.
If you did get hit by this bug, you may consider upgrading to v5.10.135,
which has the bug fix.
Thanks,
Amir.
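[Aside: whether a given stable release carries that fix can be checked by commit subject in a linux-stable checkout — a sketch, subject taken from the commit cited above:]
git log --oneline v5.10.38..v5.10.135 --grep="hold buffer across unpin"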
* [Bug 216343] XFS: no space left in xlog cause system hang
From: bugzilla-daemon @ 2022-08-18 8:23 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=216343
--- Comment #7 from zhoukete@126.com ---
(In reply to Amir Goldstein from comment #6)
> On Wed, Aug 17, 2022 at 1:19 PM <bugzilla-daemon@kernel.org> wrote:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=216343
> >
> > --- Comment #5 from zhoukete@126.com ---
> > (In reply to Amir Goldstein from comment #4)
> >
> > >
> > > According to xfs_trans_dirty_buf() I think it could mean uptodate and
> > > dirty buffer.
> > >
> >
> > When I reviewed the xfs_trans_dirty_buf code, I found that the xfs inode item's buffer has a
> > null b_log_item:
> >
> > crash> xfs_log_item.li_buf,li_ops 0xffff0340999a0a80 -x
> > li_buf = 0xffff0200125b7180
> > li_ops = 0xffff800008faec60 <xfs_inode_item_ops>
> > crash> xfs_buf.b_log_item 0xffff0200125b7180
> > b_log_item = 0x0
> >
> > and only the xfs buf log item's buffer has a b_log_item value:
> >
> > crash> xfs_log_item.li_buf,li_ops ffff033f8d7c9de8 -x
> > li_buf = 0x0
> > li_ops = 0xffff800008fae8d8 <xfs_buf_item_ops>
> > crash> xfs_buf_log_item.bli_buf ffff033f8d7c9de8
> > bli_buf = 0xffff0200125b4a80
> > crash> xfs_buf.b_log_item 0xffff0200125b4a80
> > b_log_item = 0xffff033f8d7c9de8
> > crash> xfs_buf_log_item.bli_flags 0xffff033f8d7c9de8
> > bli_flags = 2 (XFS_BLI_DIRTY)
> > crash> xfs_buf_log_item.bli_item.li_flags ffff033f8d7c9de8
> > bli_item.li_flags = 1, (XFS_LI_IN_AIL)
> >
> > So the xfs buf log item's XBF_DONE was set by xfs_trans_dirty_buf(), but the xfs
> > inode log item's buffer never goes through xfs_trans_dirty_buf() because b_log_item == 0x0.
> >
> > Do you know why XBF_DONE is set on the xfs inode log item's buffer?
> >
>
> #define XBF_DONE (1u << 5) /* all pages in the buffer uptodate */
>
> Buffer uptodate does not mean that it is not dirty.
> I am not sure about the rest of your analysis.
>
> > >
> > > Maybe the hardware never returned with a response?
> > > Hard to say. Maybe someone else has ideas.
> > >
> >
> > If we can prove that XBF_DONE doesn't stand for iodone, I think this issue may be
> > caused by a hardware error.
> >
> > I found this error message in dmesg:
> > [ 9824.111366] mpt3sas_cm0: issue target reset: handle = (0x0034)
> >
> > Maybe it tells us that mpt3sas lost the I/O requests earlier.
> >
>
> Yes, maybe it does.
>
> Anyway, if your hardware had errors, could it be that your
> filesystem shut down?
>
> If it did, you may have been hit by the bug fixed by
> 84d8949e7707 ("xfs: hold buffer across unpin and potential shutdown processing"),
> but I am not sure whether all the conditions in this bug match your case.
>
> If you did get hit by this bug, you may consider upgrading to v5.10.135,
> which has the bug fix.
>
> Thanks,
> Amir.
I think xfs isn't in a shutdown state:
crash> xfs_mount.m_flags 0xffff00ff85169000 -x
m_flags = 0x100 (XFS_MOUNT_ATTR2)
As for 84d8949e7707, I will try to backport it to 5.10.38.
Thanks
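[Aside: a hedged sketch of such a backport, assuming a linux-stable checkout with mainline fetched so the cited commit is reachable; the branch name is hypothetical:]
git checkout -b xfs-5.10.38-backport v5.10.38
git cherry-pick -x 84d8949e7707    # resolve any context conflicts by hand and re-test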
* [Bug 216343] XFS: no space left in xlog cause system hang
From: bugzilla-daemon @ 2023-09-21 6:58 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=216343
--- Comment #8 from zhoukete@126.com ---
I have found the cause of the problem. Applying the patch
d43aaf1685aa471f0593685c9f54d53e3af3cf3f solves it.
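[Aside: the cited commit can be identified in a local kernel tree, assuming the hash is reachable there:]
git show -s --format='%h %s' d43aaf1685aa471f0593685c9f54d53e3af3cf3f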
* [Bug 216343] XFS: no space left in xlog cause system hang
From: bugzilla-daemon @ 2023-09-21 6:59 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=216343
zhoukete@126.com changed:
What            |Removed    |Added
----------------------------------------------------------------------------
Status          |NEW        |RESOLVED
Resolution      |---        |PATCH_ALREADY_AVAILABLE