* xfsprogs: repair: Higher memory consumption when disable prefetch
@ 2023-11-08 15:56 Per Förlin
2023-11-08 22:05 ` Dave Chinner
0 siblings, 1 reply; 3+ messages in thread
From: Per Förlin @ 2023-11-08 15:56 UTC (permalink / raw)
To: linux-xfs@vger.kernel.org
Hi Linux XFS community,
Please bear with me, I'm new to XFS :)
I'm comparing how EXT4 and XFS behave on systems with a relatively
small RAM-to-storage ratio. The current focus is on FS repair memory consumption.
I have been running some tests using the max_mem_specified option.
The "-m" (max_mem_specified) parameter does not guarantee success, but it surely helps
to reduce the memory load; compared to EXT4 this is an improvement.
My question concerns the relation between "-P" (disable prefetch) and "-m" (max_mem_specified).
There is a difference in xfs_repair memory consumption between the following commands
1. xfs_repair -P -m 500
2. xfs_repair -m 500
1) Exceeds the max_mem_specified limit
2) Stays below the max_mem_specified limit
I expected disabling prefetch to reduce the memory load, but the result is the opposite.
The two commands 1) and 2) are executed on the same system.
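For reference, this is roughly how I run the comparison (the sampling
loop is the same one used for the memory logs at the bottom of this
email; the wrapper function and log file names are only for illustration):

    DEV=/dev/sda1

    sample_mem() {
        # Sample xfs_repair private dirty memory and free system memory.
        while :; do
            grep Private_D "/proc/$(pidof xfs_repair)/smaps_rollup"
            grep MemAvail /proc/meminfo
            sleep 10
        done
    }

    # 1) prefetch disabled - exceeded the 500 MB limit on my system
    sample_mem > mem_no_prefetch.log & MON=$!
    xfs_repair -vvv -P -n -m 500 "$DEV"
    kill "$MON"

    # 2) prefetch enabled - stayed below the limit
    sample_mem > mem_prefetch.log & MON=$!
    xfs_repair -vvv -n -m 500 "$DEV"
    kill "$MON"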
My speculation:
Does prefetch facilitate the memory-consumption calculation and make it
more accurate?
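(Side note on the numbers: the Phase 1 estimates in the verbose logs
below look like straight ratios of the superblock counters - this is
just my reading, I have not checked the exact formula in the xfs_repair
source:

    echo $(( 500 * 1024 ))         # max_mem = 512000  (-m is given in MB, printed in KB)
    echo $(( 6445568 / 256 ))      # imem    = 25178   (~ icount / 256)
    echo $(( 488378385 / 2048 ))   # dmem    = 238466  (~ dblock / 2048)

Both runs print the same "block cache size set to 12392 entries", so
the difference in actual memory use does not come from this sizing step.)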
Here follows output with -P and without -P from the same system.
I have extracted the part that actually differs.
The full logs are available at the bottom of this email.
# -P -m 500 #
Phase 3 - for each AG...
...
Active entries = 12336
Hash table size = 1549
Hits = 1
Misses = 224301
Hit ratio = 0.00
MRU 0 entries = 12335 ( 99%)
MRU 1 entries = 0 ( 0%)
MRU 2 entries = 0 ( 0%)
MRU 3 entries = 0 ( 0%)
MRU 4 entries = 0 ( 0%)
MRU 5 entries = 0 ( 0%)
MRU 6 entries = 0 ( 0%)
MRU 7 entries = 0 ( 0%)
# -m 500 #
Phase 3 - for each AG...
...
Active entries = 12388
Hash table size = 1549
Hits = 220459
Misses = 235388
Hit ratio = 48.36
MRU 0 entries = 2 ( 0%)
MRU 1 entries = 0 ( 0%)
MRU 2 entries = 1362 ( 10%)
MRU 3 entries = 68 ( 0%)
MRU 4 entries = 10 ( 0%)
MRU 5 entries = 6097 ( 49%)
MRU 6 entries = 4752 ( 38%)
MRU 7 entries = 96 ( 0%)
Tested on versions 6.1.1 and 6.5.0; same result.
BR
Per Forlin
-------------------------------------------------------------------------
Here follows full logs for both memory consumption and xfs_repair output.
## Full log of xfs_repair with "-P" that exceeds the max limit and crashes the system
# xfs_repair -vvv -P -n -m 500 /dev/sda1
Phase 1 - find and verify superblock...
bhash_option_used=0
max_mem_specified=500
verbose=3
[main:1166] perfn
- max_mem = 512000, icount = 6445568, imem = 25178, dblock = 488378385, dmem = 238466
- block cache size set to 12392 entries
Phase 2 - using internal log
- zero log...
zero_log: head block 1068680 tail block 1068680
- scan filesystem freespace and inode maps...
- found root inode chunk
libxfs_bcache: 0x5597cf9220
Max supported entries = 12392
Max utilized entries = 582
Active entries = 582
Hash table size = 1549
Hits = 1
Misses = 582
Hit ratio = 0.17
MRU 0 entries = 581 ( 99%)
MRU 1 entries = 0 ( 0%)
MRU 2 entries = 0 ( 0%)
MRU 3 entries = 0 ( 0%)
MRU 4 entries = 0 ( 0%)
MRU 5 entries = 0 ( 0%)
MRU 6 entries = 0 ( 0%)
MRU 7 entries = 0 ( 0%)
MRU 8 entries = 0 ( 0%)
MRU 9 entries = 0 ( 0%)
MRU 10 entries = 0 ( 0%)
MRU 11 entries = 0 ( 0%)
MRU 12 entries = 0 ( 0%)
MRU 13 entries = 0 ( 0%)
MRU 14 entries = 0 ( 0%)
MRU 15 entries = 0 ( 0%)
Dirty MRU 16 entries = 0 ( 0%)
Hash buckets with 0 entries 1078 ( 0%)
Hash buckets with 1 entries 375 ( 64%)
Hash buckets with 2 entries 84 ( 28%)
Hash buckets with 3 entries 9 ( 4%)
Hash buckets with 4 entries 3 ( 2%)
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
libxfs_bcache: 0x5597cf9220
Max supported entries = 12392
Max utilized entries = 12392
Active entries = 12336
Hash table size = 1549
Hits = 1
Misses = 224301
Hit ratio = 0.00
MRU 0 entries = 12335 ( 99%)
MRU 1 entries = 0 ( 0%)
MRU 2 entries = 0 ( 0%)
MRU 3 entries = 0 ( 0%)
MRU 4 entries = 0 ( 0%)
MRU 5 entries = 0 ( 0%)
MRU 6 entries = 0 ( 0%)
MRU 7 entries = 0 ( 0%)
MRU 8 entries = 0 ( 0%)
MRU 9 entries = 0 ( 0%)
MRU 10 entries = 0 ( 0%)
MRU 11 entries = 0 ( 0%)
MRU 12 entries = 0 ( 0%)
MRU 13 entries = 0 ( 0%)
MRU 14 entries = 0 ( 0%)
MRU 15 entries = 0 ( 0%)
Dirty MRU 16 entries = 0 ( 0%)
Hash buckets with 1 entries 2 ( 0%)
Hash buckets with 2 entries 17 ( 0%)
Hash buckets with 3 entries 42 ( 1%)
Hash buckets with 4 entries 87 ( 2%)
Hash buckets with 5 entries 153 ( 6%)
Hash buckets with 6 entries 198 ( 9%)
Hash buckets with 7 entries 225 ( 12%)
Hash buckets with 8 entries 208 ( 13%)
Hash buckets with 9 entries 184 ( 13%)
Hash buckets with 10 entries 162 ( 13%)
Hash buckets with 11 entries 99 ( 8%)
Hash buckets with 12 entries 79 ( 7%)
Hash buckets with 13 entries 38 ( 4%)
Hash buckets with 14 entries 25 ( 2%)
Hash buckets with 15 entries 15 ( 1%)
Hash buckets with 16 entries 9 ( 1%)
Hash buckets with 17 entries 2 ( 0%)
Hash buckets with 18 entries 2 ( 0%)
Hash buckets with 19 entries 2 ( 0%)
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
!! OOM system crash !!
# Memory log:
# while :; do grep Private_D /proc/$(pidof xfs_repair)/smaps_rollup ; grep MemAvail /proc/meminfo ; sleep 10; done
Private_Dirty: 11772 kB
MemAvailable: 625436 kB
Private_Dirty: 135020 kB
MemAvailable: 501736 kB
Private_Dirty: 239860 kB
MemAvailable: 396432 kB
Private_Dirty: 269312 kB
MemAvailable: 366948 kB
Private_Dirty: 290976 kB
MemAvailable: 344756 kB
Private_Dirty: 304520 kB
MemAvailable: 330392 kB
Private_Dirty: 331152 kB
MemAvailable: 304184 kB
Private_Dirty: 361924 kB
MemAvailable: 272400 kB
Private_Dirty: 382204 kB
MemAvailable: 252476 kB
Private_Dirty: 407184 kB
MemAvailable: 227008 kB
Private_Dirty: 422432 kB
MemAvailable: 211160 kB
Private_Dirty: 437428 kB
MemAvailable: 197144 kB
Private_Dirty: 460960 kB
MemAvailable: 175692 kB
Private_Dirty: 467128 kB
MemAvailable: 168156 kB
Private_Dirty: 483184 kB
MemAvailable: 153280 kB
Private_Dirty: 507128 kB
MemAvailable: 131140 kB
Private_Dirty: 540896 kB
MemAvailable: 97488 kB
Private_Dirty: 575480 kB
MemAvailable: 67268 kB
Private_Dirty: 604580 kB
MemAvailable: 36484 kB
Private_Dirty: 614316 kB
MemAvailable: 31668 kB
Private_Dirty: 645888 kB
MemAvailable: 24232 kB
Private_Dirty: 659140 kB
MemAvailable: 21444 kB
!! Runs out of memory at this point !!
## Full log of xfs_repair without "-P" that stays within the max limit and finishes successfully
root@ax-b8a44f27a3b4:~# xfs_repair -vvv -n -m 500 /dev/sda1
Phase 1 - find and verify superblock...
bhash_option_used=0
max_mem_specified=500
verbose=3
[main:1166] perfn
- max_mem = 512000, icount = 6445568, imem = 25178, dblock = 488378385, dmem = 238466
- block cache size set to 12392 entries
Phase 2 - using internal log
- zero log...
zero_log: head block 1068680 tail block 1068680
- scan filesystem freespace and inode maps...
- found root inode chunk
libxfs_bcache: 0x55aa821220
Max supported entries = 12392
Max utilized entries = 582
Active entries = 582
Hash table size = 1549
Hits = 1
Misses = 582
Hit ratio = 0.17
MRU 0 entries = 581 ( 99%)
MRU 1 entries = 0 ( 0%)
MRU 2 entries = 0 ( 0%)
MRU 3 entries = 0 ( 0%)
MRU 4 entries = 0 ( 0%)
MRU 5 entries = 0 ( 0%)
MRU 6 entries = 0 ( 0%)
MRU 7 entries = 0 ( 0%)
MRU 8 entries = 0 ( 0%)
MRU 9 entries = 0 ( 0%)
MRU 10 entries = 0 ( 0%)
MRU 11 entries = 0 ( 0%)
MRU 12 entries = 0 ( 0%)
MRU 13 entries = 0 ( 0%)
MRU 14 entries = 0 ( 0%)
MRU 15 entries = 0 ( 0%)
Dirty MRU 16 entries = 0 ( 0%)
Hash buckets with 0 entries 1078 ( 0%)
Hash buckets with 1 entries 375 ( 64%)
Hash buckets with 2 entries 84 ( 28%)
Hash buckets with 3 entries 9 ( 4%)
Hash buckets with 4 entries 3 ( 2%)
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
libxfs_bcache: 0x55aa821220
Max supported entries = 12392
Max utilized entries = 12392
Active entries = 12388
Hash table size = 1549
Hits = 220459
Misses = 235388
Hit ratio = 48.36
MRU 0 entries = 2 ( 0%)
MRU 1 entries = 0 ( 0%)
MRU 2 entries = 1362 ( 10%)
MRU 3 entries = 68 ( 0%)
MRU 4 entries = 10 ( 0%)
MRU 5 entries = 6097 ( 49%)
MRU 6 entries = 4752 ( 38%)
MRU 7 entries = 96 ( 0%)
MRU 8 entries = 0 ( 0%)
MRU 9 entries = 0 ( 0%)
MRU 10 entries = 0 ( 0%)
MRU 11 entries = 0 ( 0%)
MRU 12 entries = 0 ( 0%)
MRU 13 entries = 0 ( 0%)
MRU 14 entries = 0 ( 0%)
MRU 15 entries = 0 ( 0%)
Dirty MRU 16 entries = 0 ( 0%)
Hash buckets with 1 entries 2 ( 0%)
Hash buckets with 2 entries 8 ( 0%)
Hash buckets with 3 entries 35 ( 0%)
Hash buckets with 4 entries 88 ( 2%)
Hash buckets with 5 entries 123 ( 4%)
Hash buckets with 6 entries 180 ( 8%)
Hash buckets with 7 entries 243 ( 13%)
Hash buckets with 8 entries 249 ( 16%)
Hash buckets with 9 entries 224 ( 16%)
Hash buckets with 10 entries 151 ( 12%)
Hash buckets with 11 entries 109 ( 9%)
Hash buckets with 12 entries 50 ( 4%)
Hash buckets with 13 entries 51 ( 5%)
Hash buckets with 14 entries 17 ( 1%)
Hash buckets with 15 entries 8 ( 0%)
Hash buckets with 16 entries 9 ( 1%)
Hash buckets with 17 entries 1 ( 0%)
Hash buckets with 18 entries 1 ( 0%)
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
libxfs_bcache: 0x55aa821220
Max supported entries = 12392
Max utilized entries = 12392
Active entries = 12369
Hash table size = 1549
Hits = 445862
Misses = 484224
Hit ratio = 47.94
MRU 0 entries = 5 ( 0%)
MRU 1 entries = 0 ( 0%)
MRU 2 entries = 1498 ( 12%)
MRU 3 entries = 73 ( 0%)
MRU 4 entries = 17 ( 0%)
MRU 5 entries = 6401 ( 51%)
MRU 6 entries = 4374 ( 35%)
MRU 7 entries = 0 ( 0%)
MRU 8 entries = 0 ( 0%)
MRU 9 entries = 0 ( 0%)
MRU 10 entries = 0 ( 0%)
MRU 11 entries = 0 ( 0%)
MRU 12 entries = 0 ( 0%)
MRU 13 entries = 0 ( 0%)
MRU 14 entries = 0 ( 0%)
MRU 15 entries = 0 ( 0%)
Dirty MRU 16 entries = 0 ( 0%)
Hash buckets with 1 entries 1 ( 0%)
Hash buckets with 2 entries 13 ( 0%)
Hash buckets with 3 entries 35 ( 0%)
Hash buckets with 4 entries 93 ( 3%)
Hash buckets with 5 entries 126 ( 5%)
Hash buckets with 6 entries 184 ( 8%)
Hash buckets with 7 entries 235 ( 13%)
Hash buckets with 8 entries 239 ( 15%)
Hash buckets with 9 entries 214 ( 15%)
Hash buckets with 10 entries 155 ( 12%)
Hash buckets with 11 entries 115 ( 10%)
Hash buckets with 12 entries 63 ( 6%)
Hash buckets with 13 entries 30 ( 3%)
Hash buckets with 14 entries 22 ( 2%)
Hash buckets with 15 entries 12 ( 1%)
Hash buckets with 16 entries 7 ( 0%)
Hash buckets with 17 entries 3 ( 0%)
Hash buckets with 18 entries 2 ( 0%)
No modify flag set, skipping phase 5
libxfs_bcache: 0x55aa821220
Max supported entries = 12392
Max utilized entries = 12392
Active entries = 12369
Hash table size = 1549
Hits = 445862
Misses = 484224
Hit ratio = 47.94
MRU 0 entries = 5 ( 0%)
MRU 1 entries = 0 ( 0%)
MRU 2 entries = 1498 ( 12%)
MRU 3 entries = 73 ( 0%)
MRU 4 entries = 17 ( 0%)
MRU 5 entries = 6401 ( 51%)
MRU 6 entries = 4374 ( 35%)
MRU 7 entries = 0 ( 0%)
MRU 8 entries = 0 ( 0%)
MRU 9 entries = 0 ( 0%)
MRU 10 entries = 0 ( 0%)
MRU 11 entries = 0 ( 0%)
MRU 12 entries = 0 ( 0%)
MRU 13 entries = 0 ( 0%)
MRU 14 entries = 0 ( 0%)
MRU 15 entries = 0 ( 0%)
Dirty MRU 16 entries = 0 ( 0%)
Hash buckets with 1 entries 1 ( 0%)
Hash buckets with 2 entries 13 ( 0%)
Hash buckets with 3 entries 35 ( 0%)
Hash buckets with 4 entries 93 ( 3%)
Hash buckets with 5 entries 126 ( 5%)
Hash buckets with 6 entries 184 ( 8%)
Hash buckets with 7 entries 235 ( 13%)
Hash buckets with 8 entries 239 ( 15%)
Hash buckets with 9 entries 214 ( 15%)
Hash buckets with 10 entries 155 ( 12%)
Hash buckets with 11 entries 115 ( 10%)
Hash buckets with 12 entries 63 ( 6%)
Hash buckets with 13 entries 30 ( 3%)
Hash buckets with 14 entries 22 ( 2%)
Hash buckets with 15 entries 12 ( 1%)
Hash buckets with 16 entries 7 ( 0%)
Hash buckets with 17 entries 3 ( 0%)
Hash buckets with 18 entries 2 ( 0%)
Phase 6 - check inode connectivity...
- traversing filesystem ...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- traversal finished ...
- moving disconnected inodes to lost+found ...
libxfs_bcache: 0x55aa821220
Max supported entries = 12392
Max utilized entries = 12392
Active entries = 12357
Hash table size = 1549
Hits = 3043575
Misses = 717152
Hit ratio = 80.93
MRU 0 entries = 1505 ( 12%)
MRU 1 entries = 0 ( 0%)
MRU 2 entries = 3 ( 0%)
MRU 3 entries = 58 ( 0%)
MRU 4 entries = 9 ( 0%)
MRU 5 entries = 5981 ( 48%)
MRU 6 entries = 4696 ( 38%)
MRU 7 entries = 96 ( 0%)
MRU 8 entries = 0 ( 0%)
MRU 9 entries = 0 ( 0%)
MRU 10 entries = 8 ( 0%)
MRU 11 entries = 0 ( 0%)
MRU 12 entries = 0 ( 0%)
MRU 13 entries = 0 ( 0%)
MRU 14 entries = 0 ( 0%)
MRU 15 entries = 0 ( 0%)
Dirty MRU 16 entries = 0 ( 0%)
Hash buckets with 1 entries 3 ( 0%)
Hash buckets with 2 entries 10 ( 0%)
Hash buckets with 3 entries 39 ( 0%)
Hash buckets with 4 entries 79 ( 2%)
Hash buckets with 5 entries 131 ( 5%)
Hash buckets with 6 entries 182 ( 8%)
Hash buckets with 7 entries 240 ( 13%)
Hash buckets with 8 entries 256 ( 16%)
Hash buckets with 9 entries 209 ( 15%)
Hash buckets with 10 entries 151 ( 12%)
Hash buckets with 11 entries 115 ( 10%)
Hash buckets with 12 entries 54 ( 5%)
Hash buckets with 13 entries 38 ( 3%)
Hash buckets with 14 entries 21 ( 2%)
Hash buckets with 15 entries 8 ( 0%)
Hash buckets with 16 entries 7 ( 0%)
Hash buckets with 17 entries 6 ( 0%)
Phase 7 - verify link counts...
libxfs_bcache: 0x55aa821220
Max supported entries = 12392
Max utilized entries = 12392
Active entries = 12357
Hash table size = 1549
Hits = 3043575
Misses = 717152
Hit ratio = 80.93
MRU 0 entries = 1505 ( 12%)
MRU 1 entries = 0 ( 0%)
MRU 2 entries = 3 ( 0%)
MRU 3 entries = 58 ( 0%)
MRU 4 entries = 9 ( 0%)
MRU 5 entries = 5981 ( 48%)
MRU 6 entries = 4696 ( 38%)
MRU 7 entries = 96 ( 0%)
MRU 8 entries = 0 ( 0%)
MRU 9 entries = 0 ( 0%)
MRU 10 entries = 8 ( 0%)
MRU 11 entries = 0 ( 0%)
MRU 12 entries = 0 ( 0%)
MRU 13 entries = 0 ( 0%)
MRU 14 entries = 0 ( 0%)
MRU 15 entries = 0 ( 0%)
Dirty MRU 16 entries = 0 ( 0%)
Hash buckets with 1 entries 3 ( 0%)
Hash buckets with 2 entries 10 ( 0%)
Hash buckets with 3 entries 39 ( 0%)
Hash buckets with 4 entries 79 ( 2%)
Hash buckets with 5 entries 131 ( 5%)
Hash buckets with 6 entries 182 ( 8%)
Hash buckets with 7 entries 240 ( 13%)
Hash buckets with 8 entries 256 ( 16%)
Hash buckets with 9 entries 209 ( 15%)
Hash buckets with 10 entries 151 ( 12%)
Hash buckets with 11 entries 115 ( 10%)
Hash buckets with 12 entries 54 ( 5%)
Hash buckets with 13 entries 38 ( 3%)
Hash buckets with 14 entries 21 ( 2%)
Hash buckets with 15 entries 8 ( 0%)
Hash buckets with 16 entries 7 ( 0%)
Hash buckets with 17 entries 6 ( 0%)
No modify flag set, skipping filesystem flush and exiting.
XFS_REPAIR Summary Wed Nov 8 13:05:25 2023
Phase Start End Duration
Phase 1: 11/08 12:59:25 11/08 12:59:25
Phase 2: 11/08 12:59:25 11/08 12:59:33 8 seconds
Phase 3: 11/08 12:59:33 11/08 13:01:29 1 minute, 56 seconds
Phase 4: 11/08 13:01:29 11/08 13:03:22 1 minute, 53 seconds
Phase 5: Skipped
Phase 6: 11/08 13:03:22 11/08 13:05:25 2 minutes, 3 seconds
Phase 7: 11/08 13:05:25 11/08 13:05:25
Total run time: 6 minutes
# Memory log:
# while :; do grep Private_D /proc/$(pidof xfs_repair)/smaps_rollup ; grep MemAvail /proc/meminfo ; sleep 10; done
Private_Dirty: 26172 kB
MemAvailable: 613712 kB
Private_Dirty: 235704 kB
MemAvailable: 403512 kB
Private_Dirty: 247580 kB
MemAvailable: 393164 kB
Private_Dirty: 258268 kB
MemAvailable: 381100 kB
Private_Dirty: 265832 kB
MemAvailable: 374548 kB
Private_Dirty: 272652 kB
MemAvailable: 366484 kB
Private_Dirty: 282496 kB
MemAvailable: 356484 kB
Private_Dirty: 286664 kB
MemAvailable: 354624 kB
Private_Dirty: 291684 kB
MemAvailable: 349820 kB
Private_Dirty: 308204 kB
MemAvailable: 332716 kB
Private_Dirty: 310520 kB
MemAvailable: 330180 kB
Private_Dirty: 312348 kB
MemAvailable: 327424 kB
Private_Dirty: 315280 kB
MemAvailable: 324828 kB
Private_Dirty: 332864 kB
MemAvailable: 307064 kB
Private_Dirty: 348504 kB
MemAvailable: 292304 kB
Private_Dirty: 362752 kB
MemAvailable: 276240 kB
Private_Dirty: 380060 kB
MemAvailable: 260568 kB
Private_Dirty: 396068 kB
MemAvailable: 244392 kB
Private_Dirty: 406540 kB
MemAvailable: 232548 kB
Private_Dirty: 417648 kB
MemAvailable: 221476 kB
Private_Dirty: 434708 kB
MemAvailable: 205192 kB
Private_Dirty: 443988 kB
MemAvailable: 194844 kB
Private_Dirty: 452880 kB
MemAvailable: 185504 kB
Private_Dirty: 462060 kB
MemAvailable: 178244 kB
Private_Dirty: 414464 kB
MemAvailable: 228676 kB
Private_Dirty: 420344 kB
MemAvailable: 223304 kB
Private_Dirty: 422584 kB
MemAvailable: 220700 kB
Private_Dirty: 423904 kB
MemAvailable: 218104 kB
Private_Dirty: 424172 kB
MemAvailable: 218120 kB
Private_Dirty: 424548 kB
MemAvailable: 216860 kB
Private_Dirty: 425080 kB
MemAvailable: 217236 kB
Private_Dirty: 425184 kB
MemAvailable: 217032 kB
Private_Dirty: 425464 kB
MemAvailable: 216512 kB
Private_Dirty: 427392 kB
MemAvailable: 214480 kB
Private_Dirty: 427768 kB
MemAvailable: 214732 kB
Private_Dirty: 428592 kB
MemAvailable: 215032 kB
Private_Dirty: 428580 kB
MemAvailable: 214204 kB
Finished successfully!
grep: /proc//smaps_rollup: No such file or directory
MemAvailable: 643000 kB
* Re: xfsprogs: repair: Higher memory consumption when disable prefetch
  2023-11-08 15:56 xfsprogs: repair: Higher memory consumption when disable prefetch Per Förlin
@ 2023-11-08 22:05 ` Dave Chinner
  2023-11-08 22:54   ` Darrick J. Wong
  0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2023-11-08 22:05 UTC (permalink / raw)
To: Per Förlin; +Cc: linux-xfs@vger.kernel.org

On Wed, Nov 08, 2023 at 03:56:00PM +0000, Per Förlin wrote:
> Hi Linux XFS community,
>
> Please bare with me I'm new to XFS :)
>
> I'm comparing how EXT4 and XFS behaves on systems with a relative
> small RAM vs storage ratio. The current focus is on FS repair memory consumption.
>
> I have been running some tests using the max_mem_specified option.
> The "-m" (max_mem_specified) parameter does not guarantee success but it surely helps
> to reduce the memory load, in comparison to EXT4 this is an improvement.
>
> My question concerns the relation between "-P" (disable prefetch) and "-m" (max_mem_specified).
>
> There is a difference in xfs_repair memory consumption between the following commands
> 1. xfs_repair -P -m 500
> 2. xfs_repair -m 500
>
> 1) Exceeds the max_mem_specified limit
> 2) Stays below the max_mem_specified limit

Purely co-incidental, IMO.

As the man page says:

    -m maxmem
        Specifies the approximate maximum amount of memory, in
        megabytes, to use for xfs_repair. xfs_repair has its own
        internal block cache which will scale out up to the lesser of
        the process's virtual address limit or about 75% of the
        system's physical RAM. This option overrides these limits.

        NOTE: These memory limits are only approximate and may use
        more than the specified limit.

IOWs, behaviour is expected - the max_mem figure is just a starting
point guideline, and it only affects the size of the IO cache that
repair holds. We still need lots of memory to index free space, used
space, inodes, hold directory information, etc, so memory usage on any
filesystem with enough metadata in it to fill the internal buffer
cache will always go over this number....

> I expected disabled prefetch to reduce the memory load but instead the result is the opposite.
> The two commands 1) and 2) are being executed in the same system.
>
> My speculation:
> Does the prefetch facilitate and improve the calculation of the memory
> consumption and make it more accurate?

No, prefetching changes the way processing of the metadata occurs.
It also vastly changes the way IO is done and the buffer cache is
populated.

e.g. prefetching looks at metadata density and issues large IOs if
the density is high enough and then chops them up into individual
metadata buffers in memory at prefetch IO completion. This avoids
doing lots of small IOs, greatly improving IO throughput and keeping
the processing pipeline busy.

This comes at the cost of increased CPU overhead and non-buffer cache
memory footprint, but for slow IO devices this can improve IO
throughput (and hence repair times) by a factor of up to 100x. Have a
look at the difference in IO patterns when you enable/disable
prefetching...

When prefetching is turned off, the processing issues individual IO
itself and doesn't do density-based scan optimisation. In some cases
this is faster (e.g. high speed SSDs) because it is more CPU
efficient, but it results in different IO patterns and buffer access
patterns.

The end result is that buffers have a very different life time when
prefetching is turned on compared to when it is off, and so there's a
very different buffer cache memory footprint between the two options.

> Here follows output with -P and without -P from the same system.
> I have extracted the part the actually differs.
> The full logs are available the bottom of this email.
>
> # -P -m 500 #
> Phase 3 - for each AG...
> ...
> Active entries = 12336
> Hash table size = 1549
> Hits = 1
> Misses = 224301
> Hit ratio = 0.00
> MRU 0 entries = 12335 ( 99%)
> MRU 1 entries = 0 ( 0%)
> MRU 2 entries = 0 ( 0%)
> MRU 3 entries = 0 ( 0%)
> MRU 4 entries = 0 ( 0%)
> MRU 5 entries = 0 ( 0%)
> MRU 6 entries = 0 ( 0%)
> MRU 7 entries = 0 ( 0%)

Without prefetching, we have a single use for all buffers and the
metadata accessed is 20x larger than the size of the buffer cache
(220k vs 12k for the cache size). This is just showing how the
non-prefetch case is just streaming buffers through the cache in
processing access order.

i.e. The MRU list indicates that nothing is being kept for long
periods or being accessed out of order as all buffers are on list 0
(most recently used). i.e. nothing is aging out and which means
buffers are being used and reclaimed in the same order they are being
instantiated. If anything was being accessed out of order, we would
see buffers move down the aging lists....

> # -m 500 #
> Phase 3 - for each AG...
> ...
> Active entries = 12388
> Hash table size = 1549
> Hits = 220459
> Misses = 235388
> Hit ratio = 48.36

And there's the difference - two accesses per buffer for the prefetch
case. One for the IO dispatch to bring it into memory (the miss) and
one for processing (the hit).

> MRU 0 entries = 2 ( 0%)
> MRU 1 entries = 0 ( 0%)
> MRU 2 entries = 1362 ( 10%)
> MRU 3 entries = 68 ( 0%)
> MRU 4 entries = 10 ( 0%)
> MRU 5 entries = 6097 ( 49%)
> MRU 6 entries = 4752 ( 38%)
> MRU 7 entries = 96 ( 0%)

And the MRU list shows how the buffer access are not uniform - we are
seeing buffers of all different ages in the cache. This shows that
buffers are being aged 5-6 times before they are getting used, which
means the cache size is almost too small for prefetch to work
effectively....

Actually, the cache is too small - cache misses are significantly
larger than cache hits, meaning some buffers are being fetched from
disk twice because the prefetched buffers are aging out before the
processing thread gets to them. Give xfs_repair ~5GB of RAM, and it
should only need to do a single IO pass in phase 3 and then phase 4
and 6 will hit the buffers in the cache and hence not need to do any
IO at all...

So to me, this is prefetch working as it should - it's bringing
buffers into cache in the optimal IO pattern rather than the
application level access pattern. The difference in memory footprint
compared to no prefetching is largely co-incidental and really not
something we are concerned about in any way...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
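One way to look at the IO-pattern difference referred to above is to
watch the device while each run is in progress, for example with iostat
(from sysstat) or blktrace; the device name below is the one used in
the logs in this thread:

    # extended per-device stats, 1 second interval, while xfs_repair runs
    iostat -x sda 1

    # per-request trace; more detail, more overhead
    blktrace -d /dev/sda1 -o - | blkparse -i -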
* Re: xfsprogs: repair: Higher memory consumption when disable prefetch
  2023-11-08 22:05 ` Dave Chinner
@ 2023-11-08 22:54   ` Darrick J. Wong
  0 siblings, 0 replies; 3+ messages in thread
From: Darrick J. Wong @ 2023-11-08 22:54 UTC (permalink / raw)
To: Dave Chinner; +Cc: Per Förlin, linux-xfs@vger.kernel.org

On Thu, Nov 09, 2023 at 09:05:52AM +1100, Dave Chinner wrote:
[...]
> So to me, this is prefetch working as it should - it's bringing
> buffers into cache in the optimal IO pattern rather than the
> application level access pattern. The difference in memory footprint
> compared to no prefetching is largely co-incidental and really not
> something we are concerned about in any way...

/me notes that if you turn on the fancy new features (rmap, reflink,
or parent pointers) then repair will consume even more memory. None of
that can be precomputed before scanning the fs, so the -m "limits" are
even less precise.

(Also, large metadata-heavy filesystems aren't well supported on
systems with limited DRAM.)

--D

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com