* XFS journal log resetting
@ 2021-12-10 18:27 Chanchal Mathew
  2021-12-10 21:36 ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread

From: Chanchal Mathew @ 2021-12-10 18:27 UTC (permalink / raw)
To: linux-xfs

Hi,

A question for the developers -

Is there a reason why a clean unmount of an XFS filesystem would not
reset the log numbers to 0? The xfs_logprint output shows the head and
tail log numbers with the same value > 0 and the state as CLEAN, but
the log number is not reset to 0 without running xfs_repair -L. Is
there a reason this is not done as part of the unmount?

The problem I'm noticing is that the higher the log number, the longer
the filesystem takes to mount. Most of the time seems to be spent in
the xlog_find_tail() call.

Thank you!
- Chanchal

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: XFS journal log resetting
  2021-12-10 18:27 XFS journal log resetting Chanchal Mathew
@ 2021-12-10 21:36 ` Dave Chinner
  2021-12-11  0:09   ` Chanchal Mathew
  [not found]       ` <F6A3B616-EF69-4AFF-BB12-17D6E53AC8D3@gmail.com>
  0 siblings, 2 replies; 6+ messages in thread

From: Dave Chinner @ 2021-12-10 21:36 UTC (permalink / raw)
To: Chanchal Mathew; +Cc: linux-xfs

On Fri, Dec 10, 2021 at 10:27:11AM -0800, Chanchal Mathew wrote:
> Hi
>
> A question for the developers -
>
> Is there a reason why a clean unmount on an XFS filesystem would
> not reset the log numbers to 0? The xfs_logprint output shows the
> head and tail log numbers to have the same number > 0 and as
> CLEAN. But the log number is not reset to 0 without running a
> xfs_repair -L. Is there a reason it's not done as part of the
> unmount?

Yes.

Log sequence numbers are a persistent record of when a given metadata
object was modified. They are stored in all metadata, not just the
log, and are used to determine whether a modification found in the
log at recovery time is already present in the on-disk metadata.

Resetting the number back to zero basically breaks this recovery
ordering verification, and takes us back to the bad old days of the
v4 filesystem format, where log recovery could overwrite newer
changes in metadata. That could cause data loss and corruption
problems when recovering newly allocated inodes and subsequent
changes....

> The problem I'm noticing is, the higher the log number, it takes
> longer for it to be mounted. Most time seems spent on the
> xlog_find_tail() call.

The log sequence number has nothing to do with how long
xlog_find_tail() takes to run. That is entirely dependent on the size
of the log and the time it takes to binary search the journal to find
the head. The head then points at the tail, which then gets verified.
Once the head and tail are found, the journal contents between them
are CRC checked, and the time this takes is entirely dependent on the
distance between the head and tail of the log (i.e. up to 2GB of
journal space might get CRC'd here).

But at no point do larger LSNs make this take any longer. The upper
32 bits of the LSN is just a cycle number, and it is always
interpreted as "past, current, next" by the xlog_find_tail() process
no matter what its magnitude is. i.e. log recovery only has 3 cycle
states it cares about, regardless of how many cycles the log has
actually run.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread
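[Editor's illustration, not part of the original mail: the cycle/block split of the LSN that Dave describes above can be sketched in a few lines. The helper names below are illustrative; they mirror the intent of the kernel's CYCLE_LSN()/BLOCK_LSN() macros but are not the kernel code itself.]

```python
# Sketch of the XFS log sequence number (LSN) layout: the upper 32
# bits are the cycle number (how many times the circular log has
# wrapped), the lower 32 bits are a block number within the log.
# Illustrative helpers, not the kernel's actual implementation.

def make_lsn(cycle: int, block: int) -> int:
    """Pack a cycle number and a block number into a 64-bit LSN."""
    return (cycle << 32) | (block & 0xFFFFFFFF)

def cycle_lsn(lsn: int) -> int:
    """Extract the cycle number (upper 32 bits)."""
    return (lsn >> 32) & 0xFFFFFFFF

def block_lsn(lsn: int) -> int:
    """Extract the block number (lower 32 bits)."""
    return lsn & 0xFFFFFFFF

# Recovery only compares LSNs for ordering; the magnitude of the
# cycle number does not change how much work recovery has to do.
old = make_lsn(451, 100)
new = make_lsn(17658, 100)
assert cycle_lsn(new) == 17658
assert block_lsn(new) == 100
assert new > old   # ordering is all that matters, not magnitude
```

This is why a larger cycle number costs nothing extra: comparisons on packed 64-bit integers take the same time whether the cycle is 451 or 17658.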
* Re: XFS journal log resetting
  2021-12-10 21:36 ` Dave Chinner
@ 2021-12-11  0:09   ` Chanchal Mathew
  2021-12-11  5:55     ` Eric Sandeen
  [not found]       ` <F6A3B616-EF69-4AFF-BB12-17D6E53AC8D3@gmail.com>
  1 sibling, 1 reply; 6+ messages in thread

From: Chanchal Mathew @ 2021-12-11 0:09 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs

Thank you for the explanation!

Wouldn't we expect zero pending modifications and no unwritten data
when a device is cleanly unmounted? Or do you mean that a successful
'umount' run on the device doesn't guarantee there are no pending
changes?

The devices I'm testing on are image files with the same amount of
data. The one with the lower log number is quicker to mount:

$ sudo xfs_logprint -t /dev/mapper/loop0p1
…
log tail: 451 head: 451 state: <CLEAN>

Whereas the one with the higher log number, such as the one below, is
about 4-5 times slower running xlog_find_tail():

$ sudo xfs_logprint -t /dev/mapper/loop0p1
…
log tail: 17658 head: 17658 state: <CLEAN>

The images are of the same size, and hold the same amount of data as
well (as verified by df and df -i once mounted).

The only way I can work around this delay for an instance started
from an image file with a higher log number is to reset it to 0 with
xfs_repair.

Chanchal

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: XFS journal log resetting
  2021-12-11  0:09 ` Chanchal Mathew
@ 2021-12-11  5:55   ` Eric Sandeen
  0 siblings, 0 replies; 6+ messages in thread

From: Eric Sandeen @ 2021-12-11 5:55 UTC (permalink / raw)
To: Chanchal Mathew, Dave Chinner; +Cc: linux-xfs

On 12/10/21 6:09 PM, Chanchal Mathew wrote:
> Thank you for the explanation!
>
> Wouldn't we expect zero pending modification or no unwritten data
> when a device is cleanly unmounted? Or do you mean, a successful
> 'umount' run on the device doesn't guarantee there are no pending
> changes?
>
> The devices I'm testing on are image files with same amount of
> data. One with lower log number is quicker to mount.
>
> $ sudo xfs_logprint -t /dev/mapper/loop0p1
> …
> log tail: 451 head: 451 state: <CLEAN>
>
> Whereas, the one with higher log number, such as the one below, is
> about 4-5 times slower running xlog_find_tail().
>
> $ sudo xfs_logprint -t /dev/mapper/loop0p1
> …
> log tail: 17658 head: 17658 state: <CLEAN>

What is the geometry of these filesystems (xfs_info) and how much
delay are we talking about here, in wall clock time?

-Eric

> The images are of same size, and have same amount of data as well
> (as verified by df and df -i once mounted)
>
> The only way I can work around this delay for an instance started
> from an image file with higher log number is, to reset it to 0
> with xfs_repair.
>
> Chanchal

^ permalink raw reply	[flat|nested] 6+ messages in thread
[parent not found: <F6A3B616-EF69-4AFF-BB12-17D6E53AC8D3@gmail.com>]
* Re: XFS journal log resetting
  [not found] ` <F6A3B616-EF69-4AFF-BB12-17D6E53AC8D3@gmail.com>
@ 2021-12-12 23:08   ` Dave Chinner
  [not found]       ` <5171B0E5-B782-4787-BE20-A432D090A9A4@gmail.com>
  0 siblings, 1 reply; 6+ messages in thread

From: Dave Chinner @ 2021-12-12 23:08 UTC (permalink / raw)
To: Chanchal Mathew; +Cc: linux-xfs

On Fri, Dec 10, 2021 at 04:04:59PM -0800, Chanchal Mathew wrote:
> Thank you for the explanation! My response in-line.
>
> >> Is there a reason why a clean unmount on an XFS filesystem
> >> would not reset the log numbers to 0? [...]
> >
> > Log sequence numbers are a persistent record of when a given
> > metadata object was modified. They are stored in all metadata,
> > not just the log, and are used to determine if the modification
> > in the log found at recovery time is already present in the
> > on-disk metadata or not.
> >
> > Resetting the number back to zero basically breaks this recovery
> > ordering verification, and takes us back to the bad old days of
> > the v4 filesystem format where log recovery could overwrite
> > newer changes in metadata. That could cause data loss and
> > corruption problems when recovering newly allocated inodes and
> > subsequent changes....
>
> Wouldn't we expect zero pending modification or no unwritten data
> when a device is cleanly unmounted?

Yes, but how do we determine that when mounting the filesystem?

> Or do you mean, a successful
> 'umount' run on the device doesn't guarantee there are no pending
> changes?

No, the log is left clean via an unmount record that is written to
the log during unmount.

IOWs, to determine there is zero pending modification at mount time,
the next mount *has to find that unmount record* to determine that
the log was cleanly unmounted and does not require recovery. That's
what xlog_find_tail() does.

[...]

> The devices I'm testing on are image files with same amount of
> data. One with lower log number is quicker to mount.
>
> $ sudo xfs_logprint -t /dev/mapper/loop0p1
> …
> log tail: 451 head: 451 state: <CLEAN>
>
> Whereas, the one with higher log number, such as the one below, is
> about 4-5 times slower running xlog_find_tail().

This log hasn't even been written once.

> $ sudo xfs_logprint -t /dev/mapper/loop0p1
> …
> log tail: 17658 head: 17658 state: <CLEAN>
>
> The images are of same size, and have same amount of data as well
> (as verified by df and df -i once mounted)

And this log hasn't even been written once here, either.

As the cycle number is zero in both cases, it is likely that the only
difference between the two cases is the number of IOs performed in
the binary search to find the unmount record. i.e. we have an IO
count and seek pattern difference.

Absent any timing information, I would only expect the difference
here to be a few tens of milliseconds between the two cases, as it
will only be a handful of disk seeks that are different. However, I
can only guess at this, because you're not including raw timing
information or any other information about the filesystem or storage.

Can you please attach the scripts and the timing information that you
are using to determine that xlog_find_tail() is slower? i.e. post the
commands *and the output* of the scripts you are using so that we can
run the tests ourselves to try to replicate the problem you are
seeing. We'll also need to know what storage this is running on, etc:

https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

> The only way I can work around this delay for an instance started
> from an image file with higher log number is, to reset it to 0
> with xfs_repair.

Which, since v4.3.0 was released in ~October 2015, xfs_repair does
not do on v5 filesystems because it can corrupt the filesystem (as I
previously described). Hence I have to guess that you are either
using the deprecated v4 filesystem format or your xfs_repair version
is older than v4.3.0. Either way, you've got a problem when you
upgrade these systems to newer OS versions...

Cheers,
Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread
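[Editor's illustration, not part of the original mail: the "binary search the journal to find the head" that Dave refers to can be sketched as follows. Each log block records the cycle number it was written in; blocks from the current pass carry the newer cycle, blocks not yet overwritten still carry the previous one, so the head is the boundary where the cycle number drops. The helper below is a simplified model of that idea, not the kernel's xlog_find_head() code.]

```python
# Simplified model of finding the log head: binary search for the
# first block whose recorded cycle number differs from the cycle of
# block 0. The search cost depends on the size of the log, not on
# how large the cycle numbers themselves are.

def find_head(cycles):
    """Return the index of the first block whose cycle differs from
    cycles[0] (the head), or len(cycles) if the whole log is in the
    same cycle."""
    first_cycle = cycles[0]
    lo, hi = 0, len(cycles)
    while lo < hi:
        mid = (lo + hi) // 2
        if cycles[mid] == first_cycle:
            lo = mid + 1   # still in the current cycle, head is later
        else:
            hi = mid       # older cycle, head is at or before mid
    return lo

# A 16-block "log": blocks 0-5 were rewritten in cycle 8, the rest
# still carry cycle 7 from the previous pass.
assert find_head([8] * 6 + [7] * 10) == 6
# Same search, same cost, with much larger cycle numbers.
assert find_head([17658] * 6 + [17657] * 10) == 6
```

This is the structural reason a high log number should not, by itself, slow the mount: only the number of probe IOs (log size) matters.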
[parent not found: <5171B0E5-B782-4787-BE20-A432D090A9A4@gmail.com>]
* Re: XFS journal log resetting
  [not found] ` <5171B0E5-B782-4787-BE20-A432D090A9A4@gmail.com>
@ 2021-12-20 20:54 ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread

From: Dave Chinner @ 2021-12-20 20:54 UTC (permalink / raw)
To: Chanchal Mathew; +Cc: linux-xfs

On Mon, Dec 20, 2021 at 10:48:31AM -0800, Chanchal Mathew wrote:
> I am starting an EC2 instance from an image file with an XFS
> filesystem. The filesystem is cleanly unmounted, and the log print
> output I shared before is run on it before creating a snapshot and
> starting an instance from it. The time is measured from dmesg &
> journalctl logs. Here is a sample output. Kernel version 5.10.x:
>
> [    1.984392] XFS (nvme0n1p1): Mounting V5 Filesystem
> [    3.538829] XFS (nvme0n1p1): Ending clean mount
>
> I then edited the XFS code to print more log messages for this code
> path, and found it to spend nearly 1s on xlog_find_tail(). Here is
> the debug dmesg output from another image with a high log number. If
> I reset the log to 0, the mount completes in 200-300ms on average.
> For a log tail number of 17983, for example, it takes approximately
> 1s.
>
> [    2.387962] XFS (nvme0n1p1): starting xlog_find_tail
> [    2.392108] XFS (nvme0n1p1): Find previous log record
> [    2.402898] XFS (nvme0n1p1): Assert head_blk < INT_MAX: 17983
> [    2.407355] XFS (nvme0n1p1): Allocate buffer
> [    2.411108] XFS (nvme0n1p1): Search backwards through the log
> [    2.416299] XFS (nvme0n1p1): tail_blk: 17979
> [    2.420045] XFS (nvme0n1p1): Set the log state based on the current head record
> [    2.426834] XFS (nvme0n1p1): Look for an unmount record at the head of the log
> [    2.433763] XFS (nvme0n1p1): Verify the log head of the log is not clean
> [    2.438599] XFS (nvme0n1p1): Note that the unmount was clean
> [    2.442950] XFS (nvme0n1p1): Make sure that there are no blocks in front of the head
> [    3.439000] XFS (nvme0n1p1): finished xlog_find_tail

So there's a delay while recovery writes up to 2MB of zeroed blocks
beyond the head so that future recoveries won't get confused by
stale, unrecovered partial log writes that this recovery ignored.

> Here is the output of xfs_info on the mounted image:
>
> $ xfs_info /dev/mapper/loop2p1
> meta-data=/dev/mapper/loop2p1  isize=512    agcount=4, agsize=524159 blks
>          =                     sectsz=512   attr=2, projid32bit=1
>          =                     crc=1        finobt=1 spinodes=0

V5 format - "zeroing" the log with xfs_repair is not actually zeroing
the log sequence numbers - it's re-initialising the log, not
"resetting it to zero".

> data     =                     bsize=4096   blocks=2096635, imaxpct=25
>          =                     sunit=0      swidth=0 blks
> naming   =version 2            bsize=4096   ascii-ci=0 ftype=1
> log      =internal             bsize=4096   blocks=2560, version=2

And so you have a 10MB log, which means the log block zeroing is
wrapping around the head and having to do two IOs instead of one to
clear the 2MB region.

That is, head_blk = 17983, log->l_logBBsize = 20480, and zeroing
needs to run to 17983 + 4096 = 22079, which means it wraps the log
and has to zero 17984 -> 20479, then 0 -> 1599. When the head_blk is
smaller, this is just a single 2MB-sized IO. When the head_blk is
within 2MB of the end of the log, it is two sub-2MB write IOs issued
sequentially.

If this is taking a long time, then I'd be looking at why the
different IO patterns in xlog_clear_stale_blocks() result in such a
long delay. Smells like a storage problem, not an XFS issue.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread
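[Editor's illustration, not part of the original mail: the wrap-around arithmetic Dave works through above can be checked with a short sketch. The numbers come straight from the thread (a 10MB log is 20480 512-byte basic blocks, head_blk = 17983, and 2MB = 4096 blocks to clear); the helper function itself is illustrative, not the kernel's xlog_clear_stale_blocks() code.]

```python
# Sketch of the stale-block zeroing ranges beyond the log head:
# zeroing starts just past the head and covers a fixed distance; if
# that runs off the end of the circular log, it splits into two
# ranges (and hence two write IOs instead of one).

def stale_zero_ranges(head_blk, log_size_blks, distance_blks):
    """Return a list of (start, end) inclusive block ranges to zero."""
    start = head_blk + 1
    end = head_blk + distance_blks       # may run past the log end
    if end < log_size_blks:
        return [(start, end)]            # single contiguous IO
    # Wraps: one range to the end of the log, one from block 0.
    return [(start, log_size_blks - 1), (0, end - log_size_blks)]

# Head near the end of the 10MB log: two sub-2MB writes, exactly the
# 17984 -> 20479 then 0 -> 1599 ranges from the mail.
assert stale_zero_ranges(17983, 20480, 4096) == [(17984, 20479), (0, 1599)]
# Head early in the log: a single 2MB-sized write.
assert stale_zero_ranges(1000, 20480, 4096) == [(1001, 5096)]
```

Both cases zero the same 4096 blocks in total; only the IO pattern differs, which is why Dave points at the storage rather than at XFS.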
end of thread, other threads:[~2021-12-20 20:55 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-10 18:27 XFS journal log resetting Chanchal Mathew
2021-12-10 21:36 ` Dave Chinner
2021-12-11 0:09 ` Chanchal Mathew
2021-12-11 5:55 ` Eric Sandeen
[not found] ` <F6A3B616-EF69-4AFF-BB12-17D6E53AC8D3@gmail.com>
2021-12-12 23:08 ` Dave Chinner
[not found] ` <5171B0E5-B782-4787-BE20-A432D090A9A4@gmail.com>
2021-12-20 20:54 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox