* Massive overhead even after deleting checkpoints
From: Felix E. Klee @ 2025-01-10 15:54 UTC
To: linux-nilfs
The disk is close to full:
$ df -h /bigstore/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/bigstore 3.5T 3.3T 65G 99% /bigstore
Yet, not that much is actually used by the files themselves:
$ du -sh /bigstore/
2.5T /bigstore/
Using `rmcp` I deleted all checkpoints, but that didn’t solve the issue.
Furthermore, there are no snapshots:
$ lscp
CNO DATE TIME MODE FLG BLKCNT ICNT
443574 2025-01-10 16:41:44 cp - 652100924 421961
443575 2025-01-10 16:41:44 cp - 652100923 421960
The cleaner daemon is running with default configuration (Arch):
$ ps ax | grep -i cleanerd
827 ? S 0:39 /sbin/nilfs_cleanerd
/dev/mapper/bigstore /bigstore
117067 pts/1 S+ 0:00 grep --color=auto -i cleanerd
I also rebooted the system, causing a remount of the partition. Yet,
still no improvement.
Is there a solution, or is the missing space simply used up by NILFS
data structures? (that would be rather a lot of overhead)
* Re: Massive overhead even after deleting checkpoints

From: Ryusuke Konishi @ 2025-01-10 17:36 UTC
To: Felix E. Klee; +Cc: linux-nilfs

On Sat, Jan 11, 2025 at 12:54 AM Felix E. Klee wrote:
>
> The disk is close to full:
>
> $ df -h /bigstore/
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/bigstore 3.5T 3.3T 65G 99% /bigstore
>
> Yet, not that much is actually used by the files themselves:
>
> $ du -sh /bigstore/
> 2.5T /bigstore/
>
> Using `rmcp` I deleted all checkpoints, but that didn’t solve the issue.
> Furthermore, there are no snapshots:
>
> $ lscp
> CNO DATE TIME MODE FLG BLKCNT ICNT
> 443574 2025-01-10 16:41:44 cp - 652100924 421961
> 443575 2025-01-10 16:41:44 cp - 652100923 421960
>
> The cleaner daemon is running with default configuration (Arch):
>
> $ ps ax | grep -i cleanerd
> 827 ? S 0:39 /sbin/nilfs_cleanerd
> /dev/mapper/bigstore /bigstore
> 117067 pts/1 S+ 0:00 grep --color=auto -i cleanerd
>
> I also rebooted the system, causing a remount of the partition. Yet,
> still no improvement.
>
> Is there a solution, or is the missing space simply used up by NILFS
> data structures? (that would be rather a lot of overhead)

rmcp only deletes checkpoints and does nothing itself to free up disk
space, so try the nilfs-clean command, which forces GC to run.

Example:

$ sudo nilfs-clean -S 20/0.1

Because your disk usage seems too high, you might want to check the
actual usage with the "lssu -l" command before running it. This is a
kind of debugging command, but it will show you the percentage of
blocks in use per segment (the GC unit).

$ sudo lssu -l

Regards,
Ryusuke Konishi
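Since `lssu -l` prints one line per segment, its output can be huge on
a multi-terabyte volume (as the next message shows). One way to get an
overview is to bucket segments by their live-block ratio. This is only
a sketch: it assumes the default column layout, with NBLOCKS in column
5 and NLIVEBLOCKS in column 6, as in the output quoted below.

$ sudo lssu -l -p 0 | awk '
      NR > 1 { n[int($6 * 10 / $5)]++ }
      END    { for (i = 0; i <= 10; i++) if (i in n) printf "%3d%%+ live: %d segments\n", i * 10, n[i] }'

Segments that show up in the low buckets are the ones GC can profitably
reclaim; a volume whose segments are almost all 90%+ live has little
reclaimable space regardless of how hard the cleaner runs.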
* Re: Massive overhead even after deleting checkpoints

From: Felix E. Klee @ 2025-01-10 18:25 UTC
To: Ryusuke Konishi; +Cc: linux-nilfs

On Fri, Jan 10, 2025 at 6:37 PM Ryusuke Konishi
<konishi.ryusuke@gmail.com> wrote:
> Example:
> $ sudo nilfs-clean -S 20/0.1

Thank you! That improved things. But there is still a lot of overhead.
It’s 3.0TB in total vs. 2.5TB actually used by files:

$ sudo nilfs-clean -S 20/0.1
$ df -h /bigstore/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/bigstore 3.5T 3.0T 338G 91% /bigstore
$ du -sh /bigstore/
2.5T /bigstore/

As mentioned in my original email, usage according to `df` was
initially 3.3TB. So only 0.3TB have been gained.

> $ sudo lssu -l

It generates 28 MB of data that starts off like this:

SEGNUM DATE TIME STAT NBLOCKS NLIVEBLOCKS
3 2025-01-10 12:19:48 -d-- 2048 2036 ( 99%)
4 2025-01-10 12:19:48 -d-- 2048 2040 ( 99%)
5 2025-01-10 12:19:48 -d-- 2048 2036 ( 99%)
6 2025-01-10 12:19:48 -d-- 2048 2040 ( 99%)
7 2025-01-10 12:19:48 -d-- 2048 2036 ( 99%)

I have no idea what to make of this.
* Re: Massive overhead even after deleting checkpoints

From: Ryusuke Konishi @ 2025-01-11 5:29 UTC
To: Felix E. Klee; +Cc: linux-nilfs

On Sat, Jan 11, 2025 at 3:25 AM Felix E. Klee wrote:
>
> On Fri, Jan 10, 2025 at 6:37 PM Ryusuke Konishi
> <konishi.ryusuke@gmail.com> wrote:
> > Example:
> > $ sudo nilfs-clean -S 20/0.1
>
> Thank you! That improved things. But there is still a lot of overhead.
> It’s 3.0TB in total vs. 2.5TB actually used by files:
>
> $ sudo nilfs-clean -S 20/0.1
> $ df -h /bigstore/
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/bigstore 3.5T 3.0T 338G 91% /bigstore
> $ du -sh /bigstore/
> 2.5T /bigstore/
>
> As mentioned in my original email, usage according to `df` was
> initially 3.3TB. So only 0.3TB have been gained.
>
> > $ sudo lssu -l
>
> It generates 28 MB of data that starts off like this:
>
> SEGNUM DATE TIME STAT NBLOCKS NLIVEBLOCKS
> 3 2025-01-10 12:19:48 -d-- 2048 2036 ( 99%)
> 4 2025-01-10 12:19:48 -d-- 2048 2040 ( 99%)
> 5 2025-01-10 12:19:48 -d-- 2048 2036 ( 99%)
> 6 2025-01-10 12:19:48 -d-- 2048 2040 ( 99%)
> 7 2025-01-10 12:19:48 -d-- 2048 2036 ( 99%)
>
> I have no idea what to make of this.

The output seems to be from after GC, but by default nilfs considers
blocks less than an hour old as live (in use), so if you run "lssu -l"
again, or add the "-p 0" option to set the protection period to 0
seconds, the results may be different.

$ sudo lssu -l -p 0

Note that the disk capacity reported by the df command includes the
reserved space of the file system. By default, NILFS reserves 5% of
the disk capacity for GC and normal file system operations (the same
ratio as ext4). Therefore, the effective capacity of a 3.5TiB disk is
about 3.3TiB.

In addition to that, NILFS has overhead due to various metadata, the
largest of which are the DAT for disk address management (1), segment
summaries for managing segments and logs (2), and B-tree blocks (3).

Of these, (3) should be included in the du output, so (1) and (2) are
likely to be the main causes. (1) is just over 32 bytes per 4KiB
block, which is about 0.78%, and (2) is at most 1.5% depending on
usage, so there is a total overhead of just over 2.3%. If the
effective capacity is 3.3TiB, the calculated overhead is 0.076TiB, so
the upper limit should be around 3.2TiB (theoretically).

Other factors may include the 3600-second protection period, and the
fact that the NILFS df output is roughly calculated from used segments
rather than actual used blocks, so this difference may be affecting it.

Regards,
Ryusuke Konishi
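The estimate above can be reproduced numerically. A minimal sketch
using the quoted figures — the 5% reserve, ~0.78% for the DAT, and the
1.5% worst case for segment summaries are taken from the message
above, not measured on this volume:

$ awk 'BEGIN {
      eff = 3.5 * 0.95              # raw 3.5 TiB minus the 5% GC reserve
      ovh = eff * (0.0078 + 0.015)  # DAT + segment-summary estimates
      printf "effective: %.3f TiB\n", eff
      printf "metadata overhead: %.3f TiB\n", ovh
      printf "usable upper bound: %.2f TiB\n", eff - ovh
  }'
effective: 3.325 TiB
metadata overhead: 0.076 TiB
usable upper bound: 3.25 TiB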
* Re: Massive overhead even after deleting checkpoints

From: Ryusuke Konishi @ 2025-01-11 6:21 UTC
To: Felix E. Klee; +Cc: linux-nilfs

On Sat, Jan 11, 2025 at 2:29 PM Ryusuke Konishi wrote:
>
> On Sat, Jan 11, 2025 at 3:25 AM Felix E. Klee wrote:
> >
> > On Fri, Jan 10, 2025 at 6:37 PM Ryusuke Konishi
> > <konishi.ryusuke@gmail.com> wrote:
> > > Example:
> > > $ sudo nilfs-clean -S 20/0.1
> >
> > Thank you! That improved things. But there is still a lot of overhead.
> > It’s 3.0TB in total vs. 2.5TB actually used by files:
> >
> > $ sudo nilfs-clean -S 20/0.1
> > $ df -h /bigstore/
> > Filesystem Size Used Avail Use% Mounted on
> > /dev/mapper/bigstore 3.5T 3.0T 338G 91% /bigstore
> > $ du -sh /bigstore/
> > 2.5T /bigstore/
> >
> > As mentioned in my original email, usage according to `df` was
> > initially 3.3TB. So only 0.3TB have been gained.
> >
> > > $ sudo lssu -l
> >
> > It generates 28 MB of data that starts off like this:
> >
> > SEGNUM DATE TIME STAT NBLOCKS NLIVEBLOCKS
> > 3 2025-01-10 12:19:48 -d-- 2048 2036 ( 99%)
> > 4 2025-01-10 12:19:48 -d-- 2048 2040 ( 99%)
> > 5 2025-01-10 12:19:48 -d-- 2048 2036 ( 99%)
> > 6 2025-01-10 12:19:48 -d-- 2048 2040 ( 99%)
> > 7 2025-01-10 12:19:48 -d-- 2048 2036 ( 99%)
> >
> > I have no idea what to make of this.
>
> The output seems to be from after GC, but by default nilfs considers
> blocks less than an hour old as live (in use), so if you run "lssu -l"
> again, or add the "-p 0" option to set the protection period to 0
> seconds, the results may be different.
>
> $ sudo lssu -l -p 0
>
> Note that the disk capacity reported by the df command includes the
> reserved space of the file system. By default, NILFS reserves 5% of
> the disk capacity for GC and normal file system operations (the same
> ratio as ext4). Therefore, the effective capacity of a 3.5TiB disk is
> about 3.3TiB.
>
> In addition to that, NILFS has overhead due to various metadata, the
> largest of which are the DAT for disk address management (1), segment
> summaries for managing segments and logs (2), and B-tree blocks (3).
>
> Of these, (3) should be included in the du output, so (1) and (2) are
> likely to be the main causes. (1) is just over 32 bytes per 4KiB
> block, which is about 0.78%, and (2) is at most 1.5% depending on
> usage, so there is a total overhead of just over 2.3%. If the
> effective capacity is 3.3TiB, the calculated overhead is 0.076TiB, so
> the upper limit should be around 3.2TiB (theoretically).
>
> Other factors may include the 3600-second protection period, and the
> fact that the NILFS df output is roughly calculated from used segments
> rather than actual used blocks, so this difference may be affecting it.

Incidentally, the reason why the df output (used capacity) of NILFS is
calculated from used segments rather than from the number of used
blocks is that the set of blocks in use on NILFS changes dynamically
depending on conditions, making it difficult to report immediately. If
the discrepancy is large, I think some kind of algorithm should be
introduced to improve this.

The actual blocks in use can be calculated as follows from the output
of "lssu -l" (when the block size is 4KiB). For your reference:

$ sudo lssu -l -p 0 | awk 'NR>1{sum+=$6}END{print sum*4096}' | numfmt --to=iec-i

Regards,
Ryusuke Konishi
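The 4096 factor in that pipeline assumes a 4 KiB block size. If in
doubt, the block size can be checked first; this is a sketch under
that assumption, and the exact field label printed by `nilfs-tune -l`
may differ between nilfs-utils versions:

$ sudo nilfs-tune -l /dev/mapper/bigstore | grep -i 'block size'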
* Re: Massive overhead even after deleting checkpoints

From: Felix E. Klee @ 2025-01-16 11:08 UTC
To: Ryusuke Konishi; +Cc: linux-nilfs

Thank you for the detailed explanation! The overhead for metadata is
what I expected. However, I wasn’t aware of the default protection
period of one hour, nor of the concept of segments.

Now, a few days later, I get:

$ df -h /bigstore/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/bigstore 3.5T 2.7T 699G 80% /bigstore
$ du -sh /bigstore/
2.5T /bigstore/

The used space reported by `df -h` is now 2.7T, vs 3.0T a few days
ago. Back then, I had apparently measured too soon after a major file
operation: I had geotagged tens of thousands of raw image files,
modifying them in place (Exif headers).

Should the following command have freed up disk space?

# nilfs-clean -S 20/0.1 --protection-period=0 /bigstore

I realize it doesn’t reduce the number of checkpoints.

I really am a n00b when it comes to log-structured file systems. I
just want to use NILFS2 for the ability to revert accidental file
changes.

One more question, as you wrote:

> Incidentally, the reason why the df output (used capacity) of NILFS is
> calculated from used segments rather than from the number of used
> blocks is that the set of blocks in use on NILFS changes dynamically
> depending on conditions, making it difficult to report immediately. If
> the discrepancy is large, I think some kind of algorithm should be
> introduced to improve this.
>
> The actual blocks in use can be calculated as follows from the output
> of "lssu -l" (when the block size is 4KiB). For your reference:
>
> $ sudo lssu -l -p 0 | awk 'NR>1{sum+=$6}END{print sum*4096}' | numfmt --to=iec-i

Certainly interesting! But, I assume, without garbage collection I
cannot use the space in sparse segments anyhow. So `df` should give me
the space that is currently available for actual use. Do I understand
that correctly?
* Re: Massive overhead even after deleting checkpoints

From: Ryusuke Konishi @ 2025-01-16 18:24 UTC
To: Felix E. Klee; +Cc: linux-nilfs

On Thu, Jan 16, 2025 at 8:09 PM Felix E. Klee wrote:
>
> Thank you for the detailed explanation! The overhead for metadata is
> what I expected. However, I wasn’t aware of the default protection
> period of one hour, nor of the concept of segments.
>
> Now, a few days later, I get:
>
> $ df -h /bigstore/
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/bigstore 3.5T 2.7T 699G 80% /bigstore
> $ du -sh /bigstore/
> 2.5T /bigstore/
>
> The used space reported by `df -h` is now 2.7T, vs 3.0T a few days
> ago. Back then, I had apparently measured too soon after a major file
> operation: I had geotagged tens of thousands of raw image files,
> modifying them in place (Exif headers).
>
> Should the following command have freed up disk space?
>
> # nilfs-clean -S 20/0.1 --protection-period=0 /bigstore
>
> I realize it doesn’t reduce the number of checkpoints.

The nilfs-clean command is a maintenance command, so use it when you
want to free up disk space quickly (especially if you don't mind
turning off the protection period).

GC runs automatically in the background based on the watermark
conditions set in /etc/nilfs_cleanerd.conf, even if you don't run the
nilfs-clean command. However, the GC speed is slow by default, so if
the capacity is large, you may want to adjust parameters such as
"nsegments_per_clean" (and "mc_nsegments_per_clean") and
"cleaning_interval" (and "mc_cleaning_interval") to increase the GC
speed.

The nilfs-clean command takes a device path as an argument, so the
command line is:

# nilfs-clean -S 20/0.1 --protection-period=0 /dev/mapper/bigstore

or, omitting the device:

# nilfs-clean -S 20/0.1 --protection-period=0

Also, I forgot to explain one thing. The current GC skips segments
whose free block ratio is equal to or less than the
"min_reclaimable_blocks" parameter in /etc/nilfs_cleanerd.conf (the
default is 10%, or the 1% of "mc_min_reclaimable_blocks" if the
remaining capacity is low). If you want to ignore this ratio and force
GC, use the "-m" option, like this:

# nilfs-clean -S 20/0.1 -p 0 -m 5

or specify a percentage:

# nilfs-clean -S 20/0.1 -p 0 -m 1%

When running GC manually, be aware that once GC of all segments has
finished, GC will be skipped until some change is made, to avoid
infinite GC execution. If you change the GC conditions and want to
rerun it, it may work if you make a change and then run the
nilfs-clean command, as in the example below:

# touch /bigstore
# mkcp
# nilfs-clean -S 20/0.1 -p 0 -m 1%

I've gone into a lot of detail here. Please understand that these are
tips to fall back on when you're in trouble.

> I really am a n00b when it comes to log-structured file systems. I
> just want to use NILFS2 for the ability to revert accidental file
> changes.

LFS is a legacy method and is not common, so the need for this kind of
interaction is a drawback in a sense, and it may be an area where the
tools need to be improved.

One thing to note is that while nilfs can save you from data loss due
to user misoperation, it is powerless against physical device failure,
so you still need to back up important data.

> One more question, as you wrote:
>
> > Incidentally, the reason why the df output (used capacity) of NILFS is
> > calculated from used segments rather than from the number of used
> > blocks is that the set of blocks in use on NILFS changes dynamically
> > depending on conditions, making it difficult to report immediately. If
> > the discrepancy is large, I think some kind of algorithm should be
> > introduced to improve this.
> >
> > The actual blocks in use can be calculated as follows from the output
> > of "lssu -l" (when the block size is 4KiB). For your reference:
> >
> > $ sudo lssu -l -p 0 | awk 'NR>1{sum+=$6}END{print sum*4096}' | numfmt --to=iec-i
>
> Certainly interesting! But, I assume, without garbage collection I
> cannot use the space in sparse segments anyhow. So `df` should give me
> the space that is currently available for actual use. Do I understand
> that correctly?

That understanding is correct. The above example is for checking
potential free space and isolating the problem.

On the other hand, the current capacity calculation is not without
issues. For example, I think it is counterintuitive that, until GC
runs, a segment is judged to be in use even if all the blocks inside
it are garbage (reclaimable). Still, rather than improving the
capacity calculation, it may be better to address this by improving
the GC.

Regards,
Ryusuke Konishi
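For reference, the knobs mentioned above live in
/etc/nilfs_cleanerd.conf. A sketch of what a more aggressive cleaner
setup might look like — the values below are illustrative, not
recommended defaults; consult nilfs_cleanerd.conf(5) before changing
anything:

# /etc/nilfs_cleanerd.conf (excerpt; illustrative values)
nsegments_per_clean       4    # segments reclaimed per GC pass
cleaning_interval         2    # seconds between GC passes
mc_nsegments_per_clean    8    # per-pass count when free space is low
mc_cleaning_interval      1    # pass interval when free space is low
min_reclaimable_blocks    10%  # skip segments with fewer reclaimable blocks
mc_min_reclaimable_blocks 1%   # same threshold when free space is low

After editing the file, the daemon has to pick up the change (e.g. by
reloading or restarting nilfs_cleanerd) before the new pace applies.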
* Re: Massive overhead even after deleting checkpoints

From: Felix E. Klee @ 2025-01-31 8:13 UTC
To: Ryusuke Konishi; +Cc: linux-nilfs

On Fri, Jan 17, 2025 at 2:25 AM Ryusuke Konishi
<konishi.ryusuke@gmail.com> wrote:
> GC runs automatically in the background based on the watermark
> conditions set in /etc/nilfs_cleanerd.conf, even if you don't run the
> nilfs-clean command.

When I run `nilfs-clean` with options such as `--protection-period=0`,
will that change the settings of `cleanerd` until the next reboot? Or
do the options only apply to a single GC run?

> If you want to ignore this ratio and force GC, use the "-m" option,
> like this:
>
> # nilfs-clean -S 20/0.1 -p 0 -m 5

Thanks!

Regarding `-S 20/0.1`, does that mean the cleaning happens 20 times
every 0.1 seconds? And that each time, `nsegments_per_clean` /
`mc_nsegments_per_clean` segments are cleaned?

> LFS is a legacy method and is not common

“not common” I understand, but why legacy? What supersedes it?
High-frequency snapshotting is something I miss in other file systems.
* Re: Massive overhead even after deleting checkpoints

From: Ryusuke Konishi @ 2025-02-06 7:07 UTC
To: Felix E. Klee; +Cc: linux-nilfs

Hi Felix,

On Fri, Jan 31, 2025 at 5:14 PM Felix E. Klee wrote:
>
> On Fri, Jan 17, 2025 at 2:25 AM Ryusuke Konishi
> <konishi.ryusuke@gmail.com> wrote:
> > GC runs automatically in the background based on the watermark
> > conditions set in /etc/nilfs_cleanerd.conf, even if you don't run the
> > nilfs-clean command.
>
> When I run `nilfs-clean` with options such as `--protection-period=0`,
> will that change the settings of `cleanerd` until the next reboot? Or
> do the options only apply to a single GC run?

It only affects a single (one-round) GC. Once that's done, it goes
back to normal.

> > If you want to ignore this ratio and force GC, use the "-m" option,
> > like this:
> >
> > # nilfs-clean -S 20/0.1 -p 0 -m 5
>
> Thanks!
>
> Regarding `-S 20/0.1`, does that mean the cleaning happens 20 times
> every 0.1 seconds? And that each time, `nsegments_per_clean` /
> `mc_nsegments_per_clean` segments are cleaned?

'-S 20/0.1' gives the GC pace, meaning that 20 segments are GC'd every
0.1 seconds. The numerator of the pace is the equivalent of the
"nsegments_per_clean" parameter, and it changes only during manual GC.

> > LFS is a legacy method and is not common
>
> “not common” I understand, but why legacy? What supersedes it?

Generally speaking, copy-on-write file systems such as ZFS and Btrfs
are newer. The concept of LFS (Log-structured File System) itself was
proposed in 1988 and implemented in UNIX in 1992. It is an old method,
and I believe there are few surviving implementations today. In that
sense, I used the word "legacy".

In a broad sense, LFS is also a copy-on-write file system, but the
difference is that it divides the storage medium into segments and
performs space management (GC) in those units.

> High-frequency snapshotting is something I miss in other file
> systems.

That may be true, but frequent snapshots are in principle possible in
copy-on-write file systems (apart from the actual support), while
retroactive snapshots (the ability to turn each checkpoint into a
mountable snapshot at a later time) are unique to NILFS.

Regards,
Ryusuke Konishi
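To put that pace into numbers: the lssu output earlier in the thread
shows 2048 blocks of 4 KiB per segment, i.e. 8 MiB. Under that
geometry, '-S 20/0.1' asks the cleaner to process segments at roughly
the following rate — a back-of-the-envelope sketch, not a measured
throughput:

$ awk 'BEGIN {
      seg = 2048 * 4096        # one segment: 2048 blocks of 4 KiB = 8 MiB
      rate = 20 / 0.1 * seg    # "-S 20/0.1": 20 segments every 0.1 s
      printf "~%.2f GiB of segments per second\n", rate / 2^30
  }'
~1.56 GiB of segments per second

The actual reclaim rate is lower, since mostly-live segments are
skipped or copied rather than freed outright.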
* Re: Massive overhead even after deleting checkpoints

From: Felix E. Klee @ 2025-02-07 4:00 UTC
To: Ryusuke Konishi; +Cc: linux-nilfs

Ryusuke Konishi, thank you again for the detailed explanations!

On Thu, Feb 6, 2025 at 3:08 PM Ryusuke Konishi
<konishi.ryusuke@gmail.com> wrote:
> Generally speaking, copy-on-write file systems such as ZFS and Btrfs
> are newer. The concept of LFS (Log-structured File System) itself was
> proposed in 1988 and implemented in UNIX in 1992. It is an old method,
> and I believe there are few surviving implementations today. In that
> sense, I used the word "legacy".

Haha, but lots of old methods are still in use today, such as UNIX and
UNIX-like systems.

> > High-frequency snapshotting is something I miss in other file
> > systems.
>
> That may be true, but frequent snapshots are in principle possible in
> copy-on-write file systems (apart from the actual support), while
> retroactive snapshots (the ability to turn each checkpoint into a
> mountable snapshot at a later time) are unique to NILFS.

I should’ve used the term “high-frequency checkpointing”. As an end
user, the only difference I see between checkpoints and snapshots is
that snapshots won’t be deleted by GC.
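For readers finding this thread later: turning a checkpoint into a
mountable snapshot and back uses the standard nilfs-utils commands. A
minimal sketch — the checkpoint number is taken from the lscp output
earlier in the thread, and the mount point and file path are
illustrative:

$ lscp                                  # pick a checkpoint number (CNO)
$ sudo chcp ss 443574                   # promote checkpoint 443574 to a snapshot
$ sudo mkdir -p /mnt/snap
$ sudo mount -t nilfs2 -r -o cp=443574 /dev/mapper/bigstore /mnt/snap
$ cp /mnt/snap/path/to/lost-file ~/     # recover whatever was changed
$ sudo umount /mnt/snap
$ sudo chcp cp 443574                   # demote it so GC can reclaim the space

The demotion step matters for the subject of this thread: as long as a
checkpoint remains a snapshot, the GC will never reclaim the segments
it pins.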