* Question about logbsize default value
@ 2019-10-23 9:40 Gionatan Danti
2019-10-24 21:50 ` Dave Chinner
0 siblings, 1 reply; 7+ messages in thread
From: Gionatan Danti @ 2019-10-23 9:40 UTC (permalink / raw)
To: linux-xfs; +Cc: g.danti
Hi list,
on both the mount man page and the doc here [1] I read that when the
underlying RAID stripe unit is bigger than 256k, the log buffer size
(logbsize) will be set at 32k by default.
As in my tests (on top of software RAID 10 with 512k chunks) it seems
that using logbsize=256k helps with metadata-heavy workloads, I wonder
why the default is to set such a small log buffer size.
For example, given the following array:
md126 : active raid10 sda1[3] sdb1[1] sdc1[0] sdd1[2]
268439552 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 1/3 pages [4KB], 65536KB chunk
running "fs_mark -n 1000000 -k -S 0 -D 1000 -N 1000 -s 16384
-d /mnt/xfs/" shows the following results:
32k logbsize (default, due to 512k chunk size): 3027.4 files/sec
256k logbsize (manually specified during mount): 4768.4 files/sec
I would naively think that logbsize=256k would be a better default. Am I
missing something?
[1]
https://git.kernel.org/pub/scm/fs/xfs/xfs-documentation.git/tree/admin/XFS_Performance_Tuning/filesystem_tunables.asciidoc#n322
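For my own understanding, the selection rule I take away from the doc above can be sketched like this (my own pseudo-logic with made-up names, not mkfs.xfs source):

```python
# Sketch of the default rule as I read the docs: the log stripe unit
# follows the RAID stripe unit, but anything above the maximum log
# buffer size (256k) falls back to 32k.

MAX_LOGBSIZE = 256 * 1024   # largest supported log buffer size
FALLBACK_LSU = 32 * 1024    # default when the stripe unit is too big

def default_log_stripe_unit(raid_stripe_unit: int) -> int:
    if raid_stripe_unit > MAX_LOGBSIZE:
        return FALLBACK_LSU
    return raid_stripe_unit

print(default_log_stripe_unit(512 * 1024) // 1024)  # my 512k-chunk array -> 32
```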
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Question about logbsize default value
  2019-10-23  9:40 Question about logbsize default value Gionatan Danti
@ 2019-10-24 21:50 ` Dave Chinner
  2019-10-25  7:10   ` Gionatan Danti
  0 siblings, 1 reply; 7+ messages in thread

From: Dave Chinner @ 2019-10-24 21:50 UTC (permalink / raw)
To: Gionatan Danti; +Cc: linux-xfs

On Wed, Oct 23, 2019 at 11:40:33AM +0200, Gionatan Danti wrote:
> Hi list,
> on both the mount man page and the doc here [1] I read that when the
> underlying RAID stripe unit is bigger than 256k, the log buffer size
> (logbsize) will be set at 32k by default.
>
> As in my tests (on top of software RAID 10 with 512k chunks) it seems
> that using logbsize=256k helps with metadata-heavy workloads, I wonder
> why the default is to set such a small log buffer size.
>
> For example, given the following array:
>
> md126 : active raid10 sda1[3] sdb1[1] sdc1[0] sdd1[2]
>       268439552 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
>       bitmap: 1/3 pages [4KB], 65536KB chunk
>
> running "fs_mark -n 1000000 -k -S 0 -D 1000 -N 1000 -s 16384 -d
> /mnt/xfs/" shows the following results:
>
> 32k logbsize (default, due to 512k chunk size): 3027.4 files/sec
> 256k logbsize (manually specified during mount): 4768.4 files/sec
>
> I would naively think that logbsize=256k would be a better default. Am I
> missing something?

Defaults are for best compatibility and general behaviour, not best
performance. A log stripe unit of 32kB allows the user to configure a
logbsize appropriate for their workload, as it supports logbsize of
32kB, 64kB, 128kB and 256kB. If we chose 256kB as the default log
stripe unit, then you have no opportunity to set the logbsize
appropriately for your workload.

Remember, LSU determines how much padding is added to every non-full
log write - 32kB pads out to 32kB, 256kB pads out to 256kB. Hence if
you have a workload that frequently writes non-full iclogs (e.g.
regular fsyncs) then a small LSU results in much better performance,
as there is less padding that needs to be initialised and the IOs are
much smaller. Hence for the general case (i.e. what the defaults are
aimed at), a small LSU is a much better choice.

You can still use a large logbsize mount option and it will perform
identically to a large LSU filesystem on full-iclog workloads (like
the above fs_mark workload that doesn't use fsync). However, a small
LSU is likely to perform better over a wider range of workloads and
storage than a large LSU, and so a small LSU is a better choice for
the default....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
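The padding cost Dave describes can be made concrete with some back-of-the-envelope arithmetic (my own illustration, not XFS code): every log write is rounded up to the next LSU boundary, so fsync-heavy workloads that emit small, non-full iclogs pay the full rounding on every write.

```python
def padded_log_write(write_bytes: int, lsu_bytes: int) -> int:
    """Round a log write up to the next log-stripe-unit boundary."""
    units = -(-write_bytes // lsu_bytes)  # ceiling division
    return units * lsu_bytes

# A small fsync-driven iclog write of 4 KiB:
print(padded_log_write(4096, 32 * 1024))    # 32768  (28 KiB of padding)
print(padded_log_write(4096, 256 * 1024))   # 262144 (252 KiB of padding)
```

A full 256 KiB iclog pads to the same size under either LSU, which is why the large-logbsize fs_mark numbers above don't depend on a large LSU.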
* Re: Question about logbsize default value
  2019-10-24 21:50 ` Dave Chinner
@ 2019-10-25  7:10   ` Gionatan Danti
  2019-10-25 23:39     ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread

From: Gionatan Danti @ 2019-10-25 7:10 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, g.danti

On 24/10/19 23:50, Dave Chinner wrote:
> On Wed, Oct 23, 2019 at 11:40:33AM +0200, Gionatan Danti wrote:
> Defaults are for best compatibility and general behaviour, not best
> performance. A log stripe unit of 32kB allows the user to configure a
> logbsize appropriate for their workload, as it supports logbsize of
> 32kB, 64kB, 128kB and 256kB. If we chose 256kB as the default log
> stripe unit, then you have no opportunity to set the logbsize
> appropriately for your workload.
>
> Remember, LSU determines how much padding is added to every non-full
> log write - 32kB pads out to 32kB, 256kB pads out to 256kB. Hence if
> you have a workload that frequently writes non-full iclogs (e.g.
> regular fsyncs) then a small LSU results in much better performance,
> as there is less padding that needs to be initialised and the IOs
> are much smaller.
>
> Hence for the general case (i.e. what the defaults are aimed at), a
> small LSU is a much better choice. You can still use a large
> logbsize mount option and it will perform identically to a large LSU
> filesystem on full-iclog workloads (like the above fs_mark workload
> that doesn't use fsync). However, a small LSU is likely to perform
> better over a wider range of workloads and storage than a large LSU,
> and so a small LSU is a better choice for the default....

Hi Dave, thank you for your explanation. The observed behavior of a
large LSU surely matches what you described - less-than-optimal fsync
performance.

That said, I was wondering why *logbsize* (rather than LSU) has a low
default of 32k (or, better, why its default is to match the LSU size).
If I understand it correctly, a large logbsize (eg: 256k) on top of a
small LSU (32k) would give high performance on both full log writes
and partial log writes (eg: frequent fsyncs). Is my understanding
correct? If so, do you suggest always setting logbsize to the maximum
supported value?

Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: Question about logbsize default value
  2019-10-25  7:10 ` Gionatan Danti
@ 2019-10-25 23:39   ` Dave Chinner
  2019-10-26  9:54     ` Gionatan Danti
  0 siblings, 1 reply; 7+ messages in thread

From: Dave Chinner @ 2019-10-25 23:39 UTC (permalink / raw)
To: Gionatan Danti; +Cc: linux-xfs

On Fri, Oct 25, 2019 at 09:10:28AM +0200, Gionatan Danti wrote:
> On 24/10/19 23:50, Dave Chinner wrote:
> > On Wed, Oct 23, 2019 at 11:40:33AM +0200, Gionatan Danti wrote:
> > Defaults are for best compatibility and general behaviour, not
> > best performance. A log stripe unit of 32kB allows the user to
> > configure a logbsize appropriate for their workload, as it supports
> > logbsize of 32kB, 64kB, 128kB and 256kB. If we chose 256kB as the
> > default log stripe unit, then you have no opportunity to set the
> > logbsize appropriately for your workload.
> >
> > Remember, LSU determines how much padding is added to every non-full
> > log write - 32kB pads out to 32kB, 256kB pads out to 256kB. Hence if
> > you have a workload that frequently writes non-full iclogs (e.g.
> > regular fsyncs) then a small LSU results in much better performance,
> > as there is less padding that needs to be initialised and the IOs
> > are much smaller.
> >
> > Hence for the general case (i.e. what the defaults are aimed at), a
> > small LSU is a much better choice. You can still use a large
> > logbsize mount option and it will perform identically to a large LSU
> > filesystem on full-iclog workloads (like the above fs_mark workload
> > that doesn't use fsync). However, a small LSU is likely to perform
> > better over a wider range of workloads and storage than a large LSU,
> > and so a small LSU is a better choice for the default....
>
> Hi Dave, thank you for your explanation. The observed behavior of a
> large LSU surely matches what you described - less-than-optimal fsync
> performance.
>
> That said, I was wondering why *logbsize* (rather than LSU) has a low
> default of 32k (or, better, why its default is to match the LSU size).

The default is to match the LSU size; otherwise, if the LSU is < 32kB
(e.g. not set) it will use 32kB. If you try to set a logbsize smaller
than the LSU at mount time, it should throw an error.

> If I understand it correctly, a large logbsize (eg: 256k) on top of a
> small LSU (32k) would give high performance on both full log writes
> and partial log writes (eg: frequent fsyncs).

Again, it's a trade-off.

256kB iclogs mean that a crash can leave an unrecoverable 2MB hole in
the journal, while 32kB iclogs mean it's only 256kB.

256kB iclogs mean 2MB of memory usage per filesystem, 32kB is only
256kB. We have users with hundreds of individual XFS filesystems
mounted on single machines, and so 256kB iclogs is a lot of wasted
memory...

On small logs and filesystems, 256kB iclogs don't provide any real
benefit because throughput is limited by log tail pushing (metadata
writeback), not async transaction throughput.

It's not uncommon for modern disks to have best throughput and/or
lowest latency at IO sizes of 128kB or smaller.

If you have lots of NVRAM in front of your spinning disks, then log
IO sizes mostly don't matter - they end up bandwidth limited before
the iclog size is an issue.

Testing on a pristine filesystem doesn't show what happens as the
filesystem ages over years of constant use, and so what provides
"best performance on an empty filesystem" often doesn't provide the
best long-term production performance.

And so on.

Storage is complex, filesystems are complex, and no one setting is
right for everyone. The defaults are intended to be "good enough" in
the majority of typical user configs.

> Is my understanding correct?

For your specific storage setup, yes.

> If so, do you suggest always setting logbsize to the maximum
> supported value?

No. I recommend that people use the defaults, and only if there are
performance issues with their -actual production workload- should
they consider changing anything. Benchmarks rarely match the
behaviour of production workloads - tuning for benchmarks can
actively harm production performance, especially over the long
term...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Question about logbsize default value
  2019-10-25 23:39 ` Dave Chinner
@ 2019-10-26  9:54   ` Gionatan Danti
  2019-10-26 21:59     ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread

From: Gionatan Danti @ 2019-10-26 9:54 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, g.danti

Il 26-10-2019 01:39 Dave Chinner ha scritto:
> Again, it's a trade-off.
>
> 256kB iclogs mean that a crash can leave an unrecoverable 2MB hole
> in the journal, while 32kB iclogs mean it's only 256kB.

Sure, but a crash will always cause the loss of unsynced data,
especially when using deferred logging and/or deferred allocation,
right?

> 256kB iclogs mean 2MB of memory usage per filesystem, 32kB is only
> 256kB. We have users with hundreds of individual XFS filesystems
> mounted on single machines, and so 256kB iclogs is a lot of wasted
> memory...

Just wondering: 1000 filesystems with 256k logbsize would result in
2 GB of memory consumed by journal buffers. Is this considered too
much memory for a system managing 1000 filesystems? The pagecache
writeback memory consumption on these systems (probably equipped with
10s of GB of RAM) would dwarf any journal buffers, no?

> On small logs and filesystems, 256kB iclogs don't provide any real
> benefit because throughput is limited by log tail pushing (metadata
> writeback), not async transaction throughput.
>
> It's not uncommon for modern disks to have best throughput and/or
> lowest latency at IO sizes of 128kB or smaller.
>
> If you have lots of NVRAM in front of your spinning disks, then log
> IO sizes mostly don't matter - they end up bandwidth limited before
> the iclog size is an issue.

Yes, this matches my observation.

> Testing on a pristine filesystem doesn't show what happens as the
> filesystem ages over years of constant use, and so what provides
> "best performance on an empty filesystem" often doesn't provide the
> best long-term production performance.
>
> And so on.
>
> Storage is complex, filesystems are complex, and no one setting is
> right for everyone. The defaults are intended to be "good enough" in
> the majority of typical user configs.

Yep.

> For your specific storage setup, yes.
>
>> If so, do you suggest always setting logbsize to the maximum
>> supported value?
>
> No. I recommend that people use the defaults, and only if there are
> performance issues with their -actual production workload- should
> they consider changing anything.
>
> Benchmarks rarely match the behaviour of production workloads -
> tuning for benchmarks can actively harm production performance,
> especially over the long term...
>
> Cheers,
>
> Dave.

Ok, very clear. Thank you so much.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: Question about logbsize default value
  2019-10-26  9:54 ` Gionatan Danti
@ 2019-10-26 21:59   ` Dave Chinner
  2019-10-27 18:09     ` Gionatan Danti
  0 siblings, 1 reply; 7+ messages in thread

From: Dave Chinner @ 2019-10-26 21:59 UTC (permalink / raw)
To: Gionatan Danti; +Cc: linux-xfs

On Sat, Oct 26, 2019 at 11:54:02AM +0200, Gionatan Danti wrote:
> Il 26-10-2019 01:39 Dave Chinner ha scritto:
> > Again, it's a trade-off.
> >
> > 256kB iclogs mean that a crash can leave an unrecoverable 2MB hole
> > in the journal, while 32kB iclogs mean it's only 256kB.
>
> Sure, but a crash will always cause the loss of unsynced data,
> especially when using deferred logging and/or deferred allocation,
> right?

Yes, but there's a big difference between 2MB and 256KB, especially
if it's a small filesystem (very common) and the log is only ~10MB
in size.

> > 256kB iclogs mean 2MB of memory usage per filesystem, 32kB is only
> > 256kB. We have users with hundreds of individual XFS filesystems
> > mounted on single machines, and so 256kB iclogs is a lot of wasted
> > memory...
>
> Just wondering: 1000 filesystems with 256k logbsize would result in
> 2 GB of memory consumed by journal buffers. Is this considered too
> much memory for a system managing 1000 filesystems? The pagecache
> writeback memory consumption on these systems (probably equipped
> with 10s of GB of RAM) would dwarf any journal buffers, no?

Log buffers are a static memory footprint. Page cache memory is
dynamic and can be trimmed to nothing when there is memory pressure.
However, memory allocated to log buffers is pinned for the life of
the mount, whether that filesystem is busy or not - the memory is
not reclaimable.

The 8 log buffers of 32kB each are a good trade-off between
minimising memory footprint and maintaining performance over a wide
range of storage and use cases. If that's still too much memory per
filesystem, then the user can compromise on performance by reducing
the number of logbufs. If performance is too slow, then the user can
increase the memory footprint to improve performance.

The default values sit in the middle ground on both axes - enough
logbufs and a large enough iclog size for decent performance, but
with a small enough memory footprint that dense or resource
constrained installations can be deployed without any tweaking.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
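The two knobs Dave describes reduce to simple arithmetic (my own illustration; the kernel's actual accounting is more involved): the pinned footprint per mount is logbufs × logbsize, which is what makes large buffers expensive at high mount counts.

```python
def pinned_log_buffer_bytes(logbufs: int = 8,
                            logbsize: int = 32 * 1024,
                            mounts: int = 1) -> int:
    """Static, unreclaimable memory held by in-core log buffers."""
    return logbufs * logbsize * mounts

# Defaults: 8 x 32 KiB = 256 KiB pinned per mount.
print(pinned_log_buffer_bytes() // 1024)                    # 256

# logbsize=256k across 1000 mounts: 8 x 256 KiB x 1000.
print(pinned_log_buffer_bytes(logbsize=256 * 1024,
                              mounts=1000) // 2**20)        # 2000 (MiB)
```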
* Re: Question about logbsize default value
  2019-10-26 21:59 ` Dave Chinner
@ 2019-10-27 18:09   ` Gionatan Danti
  0 siblings, 0 replies; 7+ messages in thread

From: Gionatan Danti @ 2019-10-27 18:09 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, g.danti

Il 26-10-2019 23:59 Dave Chinner ha scritto:
> On Sat, Oct 26, 2019 at 11:54:02AM +0200, Gionatan Danti wrote:
>> Il 26-10-2019 01:39 Dave Chinner ha scritto:
>> > Again, it's a trade-off.
>> >
>> > 256kB iclogs mean that a crash can leave an unrecoverable 2MB hole
>> > in the journal, while 32kB iclogs mean it's only 256kB.
>>
>> Sure, but a crash will always cause the loss of unsynced data,
>> especially when using deferred logging and/or deferred allocation,
>> right?
>
> Yes, but there's a big difference between 2MB and 256KB, especially
> if it's a small filesystem (very common) and the log is only ~10MB
> in size.
>
>> > 256kB iclogs mean 2MB of memory usage per filesystem, 32kB is only
>> > 256kB. We have users with hundreds of individual XFS filesystems
>> > mounted on single machines, and so 256kB iclogs is a lot of wasted
>> > memory...
>>
>> Just wondering: 1000 filesystems with 256k logbsize would result in
>> 2 GB of memory consumed by journal buffers. Is this considered too
>> much memory for a system managing 1000 filesystems? The pagecache
>> writeback memory consumption on these systems (probably equipped
>> with 10s of GB of RAM) would dwarf any journal buffers, no?
>
> Log buffers are a static memory footprint. Page cache memory is
> dynamic and can be trimmed to nothing when there is memory pressure.
> However, memory allocated to log buffers is pinned for the life of
> the mount, whether that filesystem is busy or not - the memory is
> not reclaimable.
>
> The 8 log buffers of 32kB each are a good trade-off between
> minimising memory footprint and maintaining performance over a wide
> range of storage and use cases. If that's still too much memory per
> filesystem, then the user can compromise on performance by reducing
> the number of logbufs. If performance is too slow, then the user can
> increase the memory footprint to improve performance.
>
> The default values sit in the middle ground on both axes - enough
> logbufs and a large enough iclog size for decent performance, but
> with a small enough memory footprint that dense or resource
> constrained installations can be deployed without any tweaking.
>
> Cheers,
>
> Dave.

It surely is reasonable. Thank you for the clear explanation.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
end of thread, other threads:[~2019-10-27 18:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-23  9:40 Question about logbsize default value Gionatan Danti
2019-10-24 21:50 ` Dave Chinner
2019-10-25  7:10   ` Gionatan Danti
2019-10-25 23:39     ` Dave Chinner
2019-10-26  9:54       ` Gionatan Danti
2019-10-26 21:59         ` Dave Chinner
2019-10-27 18:09           ` Gionatan Danti