* XFS journal log resetting
@ 2021-12-10 18:27 Chanchal Mathew
  2021-12-10 21:36 ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread

From: Chanchal Mathew @ 2021-12-10 18:27 UTC (permalink / raw)
To: linux-xfs

Hi,

A question for the developers -

Is there a reason why a clean unmount of an XFS filesystem would not
reset the log numbers to 0? The xfs_logprint output shows the head and
tail log numbers with the same value > 0 and the state as CLEAN, but
the log number is not reset to 0 without running xfs_repair -L. Is
there a reason this is not done as part of the unmount?

The problem I'm noticing is that the higher the log number, the longer
the filesystem takes to mount. Most of the time seems to be spent in
the xlog_find_tail() call.

Thank you!
- Chanchal

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: XFS journal log resetting
  2021-12-10 18:27 XFS journal log resetting Chanchal Mathew
@ 2021-12-10 21:36 ` Dave Chinner
  2021-12-11  0:09   ` Chanchal Mathew
  [not found]       ` <F6A3B616-EF69-4AFF-BB12-17D6E53AC8D3@gmail.com>
  0 siblings, 2 replies; 6+ messages in thread

From: Dave Chinner @ 2021-12-10 21:36 UTC (permalink / raw)
To: Chanchal Mathew; +Cc: linux-xfs

On Fri, Dec 10, 2021 at 10:27:11AM -0800, Chanchal Mathew wrote:
> Hi
>
> A question for the developers -
>
> Is there a reason why a clean unmount on an XFS filesystem would
> not reset the log numbers to 0? The xfs_logprint output shows the
> head and tail log numbers to have the same number > 0 and as
> CLEAN. But the log number is not reset to 0 without running a
> xfs_repair -L. Is there a reason it's not done as part of the
> unmount?

Yes.

Log sequence numbers are a persistent record of when a given metadata
object was modified. They are stored in all metadata, not just the
log, and are used to determine whether a modification found in the
log at recovery time is already present in the on-disk metadata.

Resetting the number back to zero basically breaks this recovery
ordering verification, and takes us back to the bad old days of the
v4 filesystem format, where log recovery could overwrite newer
changes in metadata. That could cause data loss and corruption
problems when recovering newly allocated inodes and subsequent
changes....

> The problem I'm noticing is, the higher the log number, it takes
> longer for it to be mounted. Most time seems spent on the
> xlog_find_tail() call.

The log sequence number has nothing to do with how long
xlog_find_tail() takes to run. That is entirely dependent on the size
of the log and the time it takes to binary search the journal to find
the head. The head then points at the tail, which then gets verified.
Once the head and tail are found, the journal contents between them
are CRC checked, and the time this takes is entirely dependent on the
distance between the head and tail of the log (i.e. up to 2GB of
journal space might get CRC'd here).

But at no point do larger LSNs make this take any longer. The upper
32 bits of the LSN is just a cycle number, and it is always
interpreted as "past, current, next" by the xlog_find_tail() process
no matter what its magnitude is. i.e. log recovery only has 3 cycle
states it cares about, regardless of how many cycles the log has
actually run.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread
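[Editor's illustration, not part of the original mail: the cycle/block split of the LSN that Dave describes above can be sketched in a few lines. The helper names below are illustrative; they mirror the intent of the kernel's CYCLE_LSN()/BLOCK_LSN() macros but are not the kernel code itself.]

```python
# Sketch of the XFS log sequence number (LSN) layout: the upper 32
# bits are the cycle number (how many times the circular log has
# wrapped), the lower 32 bits are a block number within the log.
# Illustrative helpers, not the kernel's actual implementation.

def make_lsn(cycle: int, block: int) -> int:
    """Pack a cycle number and a block number into a 64-bit LSN."""
    return (cycle << 32) | (block & 0xFFFFFFFF)

def cycle_lsn(lsn: int) -> int:
    """Extract the cycle number (upper 32 bits)."""
    return (lsn >> 32) & 0xFFFFFFFF

def block_lsn(lsn: int) -> int:
    """Extract the block number (lower 32 bits)."""
    return lsn & 0xFFFFFFFF

# Recovery only compares LSNs for ordering; the magnitude of the
# cycle number does not change how much work recovery has to do.
old = make_lsn(451, 100)
new = make_lsn(17658, 100)
assert cycle_lsn(new) == 17658
assert block_lsn(new) == 100
assert new > old   # ordering is all that matters, not magnitude
```

This is why a larger cycle number costs nothing extra: comparisons on packed 64-bit integers take the same time whether the cycle is 451 or 17658.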
* Re: XFS journal log resetting
  2021-12-10 21:36 ` Dave Chinner
@ 2021-12-11  0:09   ` Chanchal Mathew
  2021-12-11  5:55     ` Eric Sandeen
  [not found]       ` <F6A3B616-EF69-4AFF-BB12-17D6E53AC8D3@gmail.com>
  1 sibling, 1 reply; 6+ messages in thread

From: Chanchal Mathew @ 2021-12-11 0:09 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs

Thank you for the explanation!

Wouldn't we expect zero pending modifications and no unwritten data
when a device is cleanly unmounted? Or do you mean that a successful
'umount' run on the device doesn't guarantee there are no pending
changes?

The devices I'm testing on are image files with the same amount of
data. The one with the lower log number is quicker to mount:

$ sudo xfs_logprint -t /dev/mapper/loop0p1
…
log tail: 451 head: 451 state: <CLEAN>

Whereas the one with the higher log number, such as the one below, is
about 4-5 times slower running xlog_find_tail():

$ sudo xfs_logprint -t /dev/mapper/loop0p1
…
log tail: 17658 head: 17658 state: <CLEAN>

The images are of the same size, and hold the same amount of data as
well (as verified by df and df -i once mounted).

The only way I can work around this delay for an instance started
from an image file with a higher log number is to reset it to 0 with
xfs_repair.

Chanchal

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: XFS journal log resetting
  2021-12-11  0:09 ` Chanchal Mathew
@ 2021-12-11  5:55   ` Eric Sandeen
  0 siblings, 0 replies; 6+ messages in thread

From: Eric Sandeen @ 2021-12-11 5:55 UTC (permalink / raw)
To: Chanchal Mathew, Dave Chinner; +Cc: linux-xfs

On 12/10/21 6:09 PM, Chanchal Mathew wrote:
> Thank you for the explanation!
>
> Wouldn't we expect zero pending modification or no unwritten data
> when a device is cleanly unmounted? Or do you mean, a successful
> 'umount' run on the device doesn't guarantee there are no pending
> changes?
>
> The devices I'm testing on are image files with same amount of
> data. One with lower log number is quicker to mount.
>
> $ sudo xfs_logprint -t /dev/mapper/loop0p1
> …
> log tail: 451 head: 451 state: <CLEAN>
>
> Whereas, the one with higher log number, such as the one below, is
> about 4-5 times slower running xlog_find_tail().
>
> $ sudo xfs_logprint -t /dev/mapper/loop0p1
> …
> log tail: 17658 head: 17658 state: <CLEAN>

What is the geometry of these filesystems (xfs_info) and how much
delay are we talking about here, in wall clock time?

-Eric

> The images are of same size, and have same amount of data as well
> (as verified by df and df -i once mounted)
>
> The only way I can work around this delay for an instance started
> from an image file with higher log number is, to reset it to 0
> with xfs_repair.
>
> Chanchal

^ permalink raw reply	[flat|nested] 6+ messages in thread
[parent not found: <F6A3B616-EF69-4AFF-BB12-17D6E53AC8D3@gmail.com>]
* Re: XFS journal log resetting
  [not found] ` <F6A3B616-EF69-4AFF-BB12-17D6E53AC8D3@gmail.com>
@ 2021-12-12 23:08   ` Dave Chinner
  [not found]       ` <5171B0E5-B782-4787-BE20-A432D090A9A4@gmail.com>
  0 siblings, 1 reply; 6+ messages in thread

From: Dave Chinner @ 2021-12-12 23:08 UTC (permalink / raw)
To: Chanchal Mathew; +Cc: linux-xfs

On Fri, Dec 10, 2021 at 04:04:59PM -0800, Chanchal Mathew wrote:
> Thank you for the explanation! My response in-line.
>
> >> Is there a reason why a clean unmount on an XFS filesystem
> >> would not reset the log numbers to 0? [...]
> >
> > Log sequence numbers are a persistent record of when a given
> > metadata object was modified. They are stored in all metadata,
> > not just the log, and are used to determine if the modification
> > in the log found at recovery time is already present in the
> > on-disk metadata or not.
> >
> > Resetting the number back to zero basically breaks this recovery
> > ordering verification, and takes us back to the bad old days of
> > the v4 filesystem format where log recovery could overwrite
> > newer changes in metadata. That could cause data loss and
> > corruption problems when recovering newly allocated inodes and
> > subsequent changes....
>
> Wouldn't we expect zero pending modification or no unwritten data
> when a device is cleanly unmounted?

Yes, but how do we determine that when mounting the filesystem?

> Or do you mean, a successful
> 'umount' run on the device doesn't guarantee there are no pending
> changes?

No, the log is left clean via an unmount record that is written to
the log during unmount.

IOWs, to determine there is zero pending modification at mount time,
the next mount *has to find that unmount record* to determine that
the log was cleanly unmounted and does not require recovery. That's
what xlog_find_tail() does.

[...]

> The devices I'm testing on are image files with same amount of
> data. One with lower log number is quicker to mount.
>
> $ sudo xfs_logprint -t /dev/mapper/loop0p1
> …
> log tail: 451 head: 451 state: <CLEAN>
>
> Whereas, the one with higher log number, such as the one below, is
> about 4-5 times slower running xlog_find_tail().

This log hasn't even been written once.

> $ sudo xfs_logprint -t /dev/mapper/loop0p1
> …
> log tail: 17658 head: 17658 state: <CLEAN>
>
> The images are of same size, and have same amount of data as well
> (as verified by df and df -i once mounted)

And this log hasn't even been written once here, either.

As the cycle number is zero in both cases, it is likely that the only
difference between the two cases is the number of IOs performed in
the binary search to find the unmount record. i.e. we have an IO
count and seek pattern difference.

Absent any timing information, I would only expect the difference
here to be a few tens of milliseconds between the two cases, as it
will only be a handful of disk seeks that are different. However, I
can only guess at this, because you're not including raw timing
information or any other information about the filesystem or storage.

Can you please attach the scripts and the timing information that you
are using to determine that xlog_find_tail() is slower? i.e. post the
commands *and the output* of the scripts you are using so that we can
run the tests ourselves to try to replicate the problem you are
seeing. We'll also need to know what storage this is running on, etc:

https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

> The only way I can work around this delay for an instance started
> from an image file with higher log number is, to reset it to 0
> with xfs_repair.

Which, since v4.3.0 was released in ~October 2015, xfs_repair does
not do on v5 filesystems because it can corrupt the filesystem (as I
previously described). Hence I have to guess that you are either
using the deprecated v4 filesystem format or your xfs_repair version
is older than v4.3.0. Either way, you've got a problem when you
upgrade these systems to newer OS versions...

Cheers,
Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread
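[Editor's illustration, not part of the original mail: the "binary search the journal to find the head" that Dave refers to can be sketched as follows. Each log block records the cycle number it was written in; blocks from the current pass carry the newer cycle, blocks not yet overwritten still carry the previous one, so the head is the boundary where the cycle number drops. The helper below is a simplified model of that idea, not the kernel's xlog_find_head() code.]

```python
# Simplified model of finding the log head: binary search for the
# first block whose recorded cycle number differs from the cycle of
# block 0. The search cost depends on the size of the log, not on
# how large the cycle numbers themselves are.

def find_head(cycles):
    """Return the index of the first block whose cycle differs from
    cycles[0] (the head), or len(cycles) if the whole log is in the
    same cycle."""
    first_cycle = cycles[0]
    lo, hi = 0, len(cycles)
    while lo < hi:
        mid = (lo + hi) // 2
        if cycles[mid] == first_cycle:
            lo = mid + 1   # still in the current cycle, head is later
        else:
            hi = mid       # older cycle, head is at or before mid
    return lo

# A 16-block "log": blocks 0-5 were rewritten in cycle 8, the rest
# still carry cycle 7 from the previous pass.
assert find_head([8] * 6 + [7] * 10) == 6
# Same search, same cost, with much larger cycle numbers.
assert find_head([17658] * 6 + [17657] * 10) == 6
```

This is the structural reason a high log number should not, by itself, slow the mount: only the number of probe IOs (log size) matters.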
[parent not found: <5171B0E5-B782-4787-BE20-A432D090A9A4@gmail.com>]
* Re: XFS journal log resetting
  [not found] ` <5171B0E5-B782-4787-BE20-A432D090A9A4@gmail.com>
@ 2021-12-20 20:54 ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread

From: Dave Chinner @ 2021-12-20 20:54 UTC (permalink / raw)
To: Chanchal Mathew; +Cc: linux-xfs

On Mon, Dec 20, 2021 at 10:48:31AM -0800, Chanchal Mathew wrote:
> I am starting an EC2 instance from an image file with an XFS
> filesystem. The filesystem is cleanly unmounted, and the log print
> output I shared before is run on it before creating a snapshot and
> starting an instance from it. The time is measured from dmesg &
> journalctl logs. Here is a sample output. Kernel version 5.10.x:
>
> [    1.984392] XFS (nvme0n1p1): Mounting V5 Filesystem
> [    3.538829] XFS (nvme0n1p1): Ending clean mount
>
> I then edited the XFS code to print more log messages for this code
> path, and found it to spend nearly 1s on xlog_find_tail(). Here is
> the debug dmesg output from another image with a high log number. If
> I reset the log to 0, the mount completes in 200-300ms on average.
> For a log tail number of 17983, for example, it takes approximately
> 1s.
>
> [    2.387962] XFS (nvme0n1p1): starting xlog_find_tail
> [    2.392108] XFS (nvme0n1p1): Find previous log record
> [    2.402898] XFS (nvme0n1p1): Assert head_blk < INT_MAX: 17983
> [    2.407355] XFS (nvme0n1p1): Allocate buffer
> [    2.411108] XFS (nvme0n1p1): Search backwards through the log
> [    2.416299] XFS (nvme0n1p1): tail_blk: 17979
> [    2.420045] XFS (nvme0n1p1): Set the log state based on the current head record
> [    2.426834] XFS (nvme0n1p1): Look for an unmount record at the head of the log
> [    2.433763] XFS (nvme0n1p1): Verify the log head of the log is not clean
> [    2.438599] XFS (nvme0n1p1): Note that the unmount was clean
> [    2.442950] XFS (nvme0n1p1): Make sure that there are no blocks in front of the head
> [    3.439000] XFS (nvme0n1p1): finished xlog_find_tail

So there's a delay while recovery writes up to 2MB of zeroed blocks
beyond the head so that future recoveries won't get confused by
stale, unrecovered partial log writes that this recovery ignored.

> Here is the output of xfs_info on the mounted image:
>
> $ xfs_info /dev/mapper/loop2p1
> meta-data=/dev/mapper/loop2p1  isize=512    agcount=4, agsize=524159 blks
>          =                     sectsz=512   attr=2, projid32bit=1
>          =                     crc=1        finobt=1 spinodes=0

V5 format - "zeroing" the log with xfs_repair is not actually zeroing
the log sequence numbers - it's re-initialising the log, not
"resetting it to zero".

> data     =                     bsize=4096   blocks=2096635, imaxpct=25
>          =                     sunit=0      swidth=0 blks
> naming   =version 2            bsize=4096   ascii-ci=0 ftype=1
> log      =internal             bsize=4096   blocks=2560, version=2

And so you have a 10MB log, which means the log block zeroing is
wrapping around the head and having to do two IOs instead of one to
clear the 2MB region.

That is, head_blk = 17983, log->l_logBBsize = 20480, and zeroing
needs to run to 17983 + 4096 = 22079, which means it wraps the log
and has to zero 17984 -> 20479, then 0 -> 1599. When the head_blk is
smaller, this is just a single 2MB-sized IO. When the head_blk is
within 2MB of the end of the log, it is two sub-2MB write IOs issued
sequentially.

If this is taking a long time, then I'd be looking at why the
different IO patterns in xlog_clear_stale_blocks() result in such a
long delay. Smells like a storage problem, not an XFS issue.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread
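[Editor's illustration, not part of the original mail: the wrap-around arithmetic Dave works through above can be checked with a short sketch. The numbers come straight from the thread (a 10MB log is 20480 512-byte basic blocks, head_blk = 17983, and 2MB = 4096 blocks to clear); the helper function itself is illustrative, not the kernel's xlog_clear_stale_blocks() code.]

```python
# Sketch of the stale-block zeroing ranges beyond the log head:
# zeroing starts just past the head and covers a fixed distance; if
# that runs off the end of the circular log, it splits into two
# ranges (and hence two write IOs instead of one).

def stale_zero_ranges(head_blk, log_size_blks, distance_blks):
    """Return a list of (start, end) inclusive block ranges to zero."""
    start = head_blk + 1
    end = head_blk + distance_blks       # may run past the log end
    if end < log_size_blks:
        return [(start, end)]            # single contiguous IO
    # Wraps: one range to the end of the log, one from block 0.
    return [(start, log_size_blks - 1), (0, end - log_size_blks)]

# Head near the end of the 10MB log: two sub-2MB writes, exactly the
# 17984 -> 20479 then 0 -> 1599 ranges from the mail.
assert stale_zero_ranges(17983, 20480, 4096) == [(17984, 20479), (0, 1599)]
# Head early in the log: a single 2MB-sized write.
assert stale_zero_ranges(1000, 20480, 4096) == [(1001, 5096)]
```

Both cases zero the same 4096 blocks in total; only the IO pattern differs, which is why Dave points at the storage rather than at XFS.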
end of thread, other threads:[~2021-12-20 20:55 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-10 18:27 XFS journal log resetting Chanchal Mathew
2021-12-10 21:36 ` Dave Chinner
2021-12-11 0:09 ` Chanchal Mathew
2021-12-11 5:55 ` Eric Sandeen
[not found] ` <F6A3B616-EF69-4AFF-BB12-17D6E53AC8D3@gmail.com>
2021-12-12 23:08 ` Dave Chinner
[not found] ` <5171B0E5-B782-4787-BE20-A432D090A9A4@gmail.com>
2021-12-20 20:54 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox