* [Bug 202077] xfs transaction overruns on 4.14.67
2018-12-26 17:19 [Bug 202077] New: xfs transaction overruns on 4.14.67 bugzilla-daemon
@ 2018-12-26 17:19 ` bugzilla-daemon
2019-01-01 22:30 ` [Bug 202077] New: " Dave Chinner
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2018-12-26 17:19 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=202077
--- Comment #1 from Thomas Walker (thomas.walker@twosigma.com) ---
Created attachment 280151
--> https://bugzilla.kernel.org/attachment.cgi?id=280151&action=edit
xfs transaction overrun #2
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Bug 202077] New: xfs transaction overruns on 4.14.67
2018-12-26 17:19 [Bug 202077] New: xfs transaction overruns on 4.14.67 bugzilla-daemon
2018-12-26 17:19 ` [Bug 202077] " bugzilla-daemon
@ 2019-01-01 22:30 ` Dave Chinner
2019-01-01 22:30 ` [Bug 202077] xfs transaction log reservation " bugzilla-daemon
2019-01-02 16:54 ` bugzilla-daemon
3 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2019-01-01 22:30 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: linux-xfs
On Wed, Dec 26, 2018 at 05:19:12PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> We've encountered two recent examples of xfs transaction overruns on production
> systems running 4.14.67 kernels. Both systems in this case are running docker
> with dozens of overlay mounts, using this xfs fs as both upper and lower. In
> both cases the filesystem was able to successfully recover when the filesystem
> was unmounted and remounted again.
In both cases, it looks like there were two free space manipulations
in a single transaction, likely first modifying the free list
(pattern is EFD, XAGF, ABTB, ABTC, then AGFL) followed by freeing
the actual extent (more ABTB, ABTC buffers).
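As a rough illustration of that pattern only (the tag list below is made up to mirror the sequence described, not copied from the attached debug output, and the helper name is hypothetical), one way to spot a second free space manipulation is to count separate runs of ABTB/ABTC buffers in the order the items were logged:

```python
# Hypothetical sketch: tags are the log item types of one transaction,
# in the order they were logged. Nothing here parses real XFS output.
FREE_SPACE_BTREES = {"ABTB", "ABTC"}  # by-block and by-count free space btrees


def count_free_space_updates(tags):
    """Count separate runs of free space btree buffers in a transaction."""
    runs = 0
    in_run = False
    for tag in tags:
        if tag in FREE_SPACE_BTREES:
            if not in_run:
                # A new run starts after some other item type intervened.
                runs += 1
                in_run = True
        else:
            in_run = False
    return runs


# Illustrative sequence matching the pattern described above.
example = ["EFD", "XAGF", "ABTB", "ABTC", "AGFL", "ABTB", "ABTC"]
print(count_free_space_updates(example))  # 2
```

A count above one in a single transaction would be consistent with the free list fixup and the extent free sharing one log reservation; it is a heuristic for reading the dumps, not a diagnosis.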
> It looks like there has been a good bit of work in 4.16+
The first fixes went into 4.18 with the deferred AGFL free
operations. Those were the commits associated with the patchset
titled "[PATCH v2 0/6] xfs: defer agfl block frees".
There were more fixes in 4.19 to always defer the AGFL free for all
operations. This was a much larger and more significant change, and
can be found from the series titled "[PATCH 00/24] xfs: broad
enablement of deferred agfl frees".
> addressing similar issues but none of it has made it back into the
> 4.14 LTS. Any chance that any of the attached debug output points
> to anything specific that might be a candidate for backport?
Backporting the first series might be sufficient to avoid your
problem (both are from the inode inactivation path) but it is no
guarantee. I also have no idea what dependencies that patchset has
on the rest of the code (e.g. is there enough deferred op
infrastructure in place in 4.14?), and seeing as it touches core
allocation algorithms it would require a substantial amount of QA
before release....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug 202077] xfs transaction log reservation overruns on 4.14.67
2018-12-26 17:19 [Bug 202077] New: xfs transaction overruns on 4.14.67 bugzilla-daemon
2018-12-26 17:19 ` [Bug 202077] " bugzilla-daemon
2019-01-01 22:30 ` [Bug 202077] New: " Dave Chinner
@ 2019-01-01 22:30 ` bugzilla-daemon
2019-01-02 16:54 ` bugzilla-daemon
3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2019-01-01 22:30 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=202077
--- Comment #2 from Dave Chinner (david@fromorbit.com) ---
On Wed, Dec 26, 2018 at 05:19:12PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> We've encountered two recent examples of xfs transaction overruns on production
> systems running 4.14.67 kernels. Both systems in this case are running docker
> with dozens of overlay mounts, using this xfs fs as both upper and lower. In
> both cases the filesystem was able to successfully recover when the filesystem
> was unmounted and remounted again.
In both cases, it looks like there were two free space manipulations
in a single transaction, likely first modifying the free list
(pattern is EFD, XAGF, ABTB, ABTC, then AGFL) followed by freeing
the actual extent (more ABTB, ABTC buffers).
> It looks like there has been a good bit of work in 4.16+
The first fixes went into 4.18 with the deferred AGFL free
operations. Those were the commits associated with the patchset
titled "[PATCH v2 0/6] xfs: defer agfl block frees".
There were more fixes in 4.19 to always defer the AGFL free for all
operations. This was a much larger and more significant change, and
can be found from the series titled "[PATCH 00/24] xfs: broad
enablement of deferred agfl frees".
> addressing similar issues but none of it has made it back into the
> 4.14 LTS. Any chance that any of the attached debug output points
> to anything specific that might be a candidate for backport?
Backporting the first series might be sufficient to avoid your
problem (both are from the inode inactivation path) but it is no
guarantee. I also have no idea what dependencies that patchset has
on the rest of the code (e.g. is there enough deferred op
infrastructure in place in 4.14?), and seeing as it touches core
allocation algorithms it would require a substantial amount of QA
before release....
Cheers,
Dave.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug 202077] xfs transaction log reservation overruns on 4.14.67
2018-12-26 17:19 [Bug 202077] New: xfs transaction overruns on 4.14.67 bugzilla-daemon
` (2 preceding siblings ...)
2019-01-01 22:30 ` [Bug 202077] xfs transaction log reservation " bugzilla-daemon
@ 2019-01-02 16:54 ` bugzilla-daemon
3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2019-01-02 16:54 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=202077
--- Comment #3 from Thomas Walker (thomas.walker@twosigma.com) ---
Thanks for the response. That first patchset does appear to apply cleanly
(with a little fuzz) to 4.14 but, as you say, I don't know offhand how mature
the code it depends upon is in 4.14 and without a reliable reproducer it will
be hard to say whether it even addresses my issue. I'll keep seeing if I can
reproduce the problem more consistently and see...
While I'm running 4.19 on a few test systems, I've been taking a wait-and-see
approach towards broader usage given the number of regressions that have
cropped up (and been fixed) thus far. Good to know that this is likely
addressed there though.
Thanks,
Tom.
^ permalink raw reply [flat|nested] 5+ messages in thread