* Question about logbsize default value
@ 2019-10-23 9:40 Gionatan Danti
2019-10-24 21:50 ` Dave Chinner
0 siblings, 1 reply; 7+ messages in thread
From: Gionatan Danti @ 2019-10-23 9:40 UTC (permalink / raw)
To: linux-xfs; +Cc: g.danti
Hi list,
on both the mount man page and the doc here [1] I read that when the
underlying RAID stripe unit is bigger than 256k, the log buffer size
(logbsize) will be set at 32k by default.
As in my tests (on top of software RAID 10 with 512k chunks) it seems
that using logbsize=256k helps with metadata-heavy workloads, I wonder
why the default is to set such a small log buffer size.
For example, given the following array:
md126 : active raid10 sda1[3] sdb1[1] sdc1[0] sdd1[2]
268439552 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 1/3 pages [4KB], 65536KB chunk
running "fs_mark -n 1000000 -k -S 0 -D 1000 -N 1000 -s 16384
-d /mnt/xfs/" shows the following results:
32k logbsize (default, due to 512k chunk size): 3027.4 files/sec
256k logbsize (manually specified during mount): 4768.4 files/sec
I would naively think that logbsize=256k would be a better default. Am I
missing something?
[1]
https://git.kernel.org/pub/scm/fs/xfs/xfs-documentation.git/tree/admin/XFS_Performance_Tuning/filesystem_tunables.asciidoc#n322
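For my own understanding, the selection rule I take away from the doc above can be sketched like this (my own pseudo-logic with made-up names, not mkfs.xfs source):

```python
# Sketch of the default rule as I read the docs: the log stripe unit
# follows the RAID stripe unit, but anything above the maximum log
# buffer size (256k) falls back to 32k.

MAX_LOGBSIZE = 256 * 1024   # largest supported log buffer size
FALLBACK_LSU = 32 * 1024    # default when the stripe unit is too big

def default_log_stripe_unit(raid_stripe_unit: int) -> int:
    if raid_stripe_unit > MAX_LOGBSIZE:
        return FALLBACK_LSU
    return raid_stripe_unit

print(default_log_stripe_unit(512 * 1024) // 1024)  # my 512k-chunk array -> 32
```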
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Question about logbsize default value
  2019-10-23  9:40 Question about logbsize default value Gionatan Danti
@ 2019-10-24 21:50 ` Dave Chinner
  2019-10-25  7:10   ` Gionatan Danti
  0 siblings, 1 reply; 7+ messages in thread

From: Dave Chinner @ 2019-10-24 21:50 UTC (permalink / raw)
To: Gionatan Danti; +Cc: linux-xfs

On Wed, Oct 23, 2019 at 11:40:33AM +0200, Gionatan Danti wrote:
> Hi list,
> on both the mount man page and the doc here [1] I read that when the
> underlying RAID stripe unit is bigger than 256k, the log buffer size
> (logbsize) will be set at 32k by default.
>
> As in my tests (on top of software RAID 10 with 512k chunks) it seems
> that using logbsize=256k helps with metadata-heavy workloads, I wonder
> why the default is to set such a small log buffer size.
>
> For example, given the following array:
>
> md126 : active raid10 sda1[3] sdb1[1] sdc1[0] sdd1[2]
>       268439552 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
>       bitmap: 1/3 pages [4KB], 65536KB chunk
>
> running "fs_mark -n 1000000 -k -S 0 -D 1000 -N 1000 -s 16384 -d
> /mnt/xfs/" shows the following results:
>
> 32k logbsize (default, due to 512k chunk size): 3027.4 files/sec
> 256k logbsize (manually specified during mount): 4768.4 files/sec
>
> I would naively think that logbsize=256k would be a better default. Am I
> missing something?

Defaults are for best compatibility and general behaviour, not best
performance. A log stripe unit of 32kB allows the user to configure a
logbsize appropriate for their workload, as it supports logbsize of
32kB, 64kB, 128kB and 256kB. If we chose 256kB as the default log
stripe unit, then you have no opportunity to set the logbsize
appropriately for your workload.

Remember, LSU determines how much padding is added to every non-full
log write - 32kB pads out to 32kB, 256kB pads out to 256kB. Hence if
you have a workload that frequently writes non-full iclogs (e.g.
regular fsyncs) then a small LSU results in much better performance,
as there is less padding that needs to be initialised and the IOs are
much smaller. Hence for the general case (i.e. what the defaults are
aimed at), a small LSU is a much better choice.

You can still use a large logbsize mount option and it will perform
identically to a large LSU filesystem on full-iclog workloads (like
the above fs_mark workload that doesn't use fsync). However, a small
LSU is likely to perform better over a wider range of workloads and
storage than a large LSU, and so a small LSU is a better choice for
the default....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
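The padding cost Dave describes can be made concrete with some back-of-the-envelope arithmetic (my own illustration, not XFS code): every log write is rounded up to the next LSU boundary, so fsync-heavy workloads that emit small, non-full iclogs pay the full rounding on every write.

```python
def padded_log_write(write_bytes: int, lsu_bytes: int) -> int:
    """Round a log write up to the next log-stripe-unit boundary."""
    units = -(-write_bytes // lsu_bytes)  # ceiling division
    return units * lsu_bytes

# A small fsync-driven iclog write of 4 KiB:
print(padded_log_write(4096, 32 * 1024))    # 32768  (28 KiB of padding)
print(padded_log_write(4096, 256 * 1024))   # 262144 (252 KiB of padding)
```

A full 256 KiB iclog pads to the same size under either LSU, which is why the large-logbsize fs_mark numbers above don't depend on a large LSU.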
* Re: Question about logbsize default value
  2019-10-24 21:50 ` Dave Chinner
@ 2019-10-25  7:10   ` Gionatan Danti
  2019-10-25 23:39     ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread

From: Gionatan Danti @ 2019-10-25 7:10 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, g.danti

On 24/10/19 23:50, Dave Chinner wrote:
> On Wed, Oct 23, 2019 at 11:40:33AM +0200, Gionatan Danti wrote:
> Defaults are for best compatibility and general behaviour, not best
> performance. A log stripe unit of 32kB allows the user to configure a
> logbsize appropriate for their workload, as it supports logbsize of
> 32kB, 64kB, 128kB and 256kB. If we chose 256kB as the default log
> stripe unit, then you have no opportunity to set the logbsize
> appropriately for your workload.
>
> Remember, LSU determines how much padding is added to every non-full
> log write - 32kB pads out to 32kB, 256kB pads out to 256kB. Hence if
> you have a workload that frequently writes non-full iclogs (e.g.
> regular fsyncs) then a small LSU results in much better performance,
> as there is less padding that needs to be initialised and the IOs
> are much smaller.
>
> Hence for the general case (i.e. what the defaults are aimed at), a
> small LSU is a much better choice. You can still use a large
> logbsize mount option and it will perform identically to a large LSU
> filesystem on full-iclog workloads (like the above fs_mark workload
> that doesn't use fsync). However, a small LSU is likely to perform
> better over a wider range of workloads and storage than a large LSU,
> and so a small LSU is a better choice for the default....

Hi Dave, thank you for your explanation. The observed behavior of a
large LSU surely matches what you described - less-than-optimal fsync
performance.

That said, I was wondering why *logbsize* (rather than LSU) has a low
default of 32k (or, better, why its default is to match the LSU size).
If I understand it correctly, a large logbsize (eg: 256k) on top of a
small LSU (32k) would give high performance on both full log writes
and partial log writes (eg: frequent fsyncs). Is my understanding
correct? If so, do you suggest always setting logbsize to the maximum
supported value?

Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: Question about logbsize default value
  2019-10-25  7:10 ` Gionatan Danti
@ 2019-10-25 23:39   ` Dave Chinner
  2019-10-26  9:54     ` Gionatan Danti
  0 siblings, 1 reply; 7+ messages in thread

From: Dave Chinner @ 2019-10-25 23:39 UTC (permalink / raw)
To: Gionatan Danti; +Cc: linux-xfs

On Fri, Oct 25, 2019 at 09:10:28AM +0200, Gionatan Danti wrote:
> On 24/10/19 23:50, Dave Chinner wrote:
> > On Wed, Oct 23, 2019 at 11:40:33AM +0200, Gionatan Danti wrote:
> > Defaults are for best compatibility and general behaviour, not
> > best performance. A log stripe unit of 32kB allows the user to
> > configure a logbsize appropriate for their workload, as it supports
> > logbsize of 32kB, 64kB, 128kB and 256kB. If we chose 256kB as the
> > default log stripe unit, then you have no opportunity to set the
> > logbsize appropriately for your workload.
> >
> > Remember, LSU determines how much padding is added to every non-full
> > log write - 32kB pads out to 32kB, 256kB pads out to 256kB. Hence if
> > you have a workload that frequently writes non-full iclogs (e.g.
> > regular fsyncs) then a small LSU results in much better performance,
> > as there is less padding that needs to be initialised and the IOs
> > are much smaller.
> >
> > Hence for the general case (i.e. what the defaults are aimed at), a
> > small LSU is a much better choice. You can still use a large
> > logbsize mount option and it will perform identically to a large LSU
> > filesystem on full-iclog workloads (like the above fs_mark workload
> > that doesn't use fsync). However, a small LSU is likely to perform
> > better over a wider range of workloads and storage than a large LSU,
> > and so a small LSU is a better choice for the default....
>
> Hi Dave, thank you for your explanation. The observed behavior of a
> large LSU surely matches what you described - less-than-optimal fsync
> performance.
>
> That said, I was wondering why *logbsize* (rather than LSU) has a low
> default of 32k (or, better, why its default is to match the LSU size).

The default is to match the LSU size; otherwise, if the LSU is < 32kB
(e.g. not set) it will use 32kB. If you try to set a logbsize smaller
than the LSU at mount time, it should throw an error.

> If I understand it correctly, a large logbsize (eg: 256k) on top of a
> small LSU (32k) would give high performance on both full log writes
> and partial log writes (eg: frequent fsyncs).

Again, it's a trade-off.

256kB iclogs mean that a crash can leave an unrecoverable 2MB hole in
the journal, while 32kB iclogs mean it's only 256kB.

256kB iclogs mean 2MB of memory usage per filesystem, 32kB is only
256kB. We have users with hundreds of individual XFS filesystems
mounted on single machines, and so 256kB iclogs is a lot of wasted
memory...

On small logs and filesystems, 256kB iclogs don't provide any real
benefit because throughput is limited by log tail pushing (metadata
writeback), not async transaction throughput.

It's not uncommon for modern disks to have best throughput and/or
lowest latency at IO sizes of 128kB or smaller.

If you have lots of NVRAM in front of your spinning disks, then log
IO sizes mostly don't matter - they end up bandwidth limited before
the iclog size is an issue.

Testing on a pristine filesystem doesn't show what happens as the
filesystem ages over years of constant use, and so what provides
"best performance on an empty filesystem" often doesn't provide the
best long-term production performance.

And so on.

Storage is complex, filesystems are complex, and no one setting is
right for everyone. The defaults are intended to be "good enough" in
the majority of typical user configs.

> Is my understanding correct?

For your specific storage setup, yes.

> If so, do you suggest always setting logbsize to the maximum
> supported value?

No. I recommend that people use the defaults, and only if there are
performance issues with their -actual production workload- should
they consider changing anything. Benchmarks rarely match the
behaviour of production workloads - tuning for benchmarks can
actively harm production performance, especially over the long
term...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Question about logbsize default value
  2019-10-25 23:39 ` Dave Chinner
@ 2019-10-26  9:54   ` Gionatan Danti
  2019-10-26 21:59     ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread

From: Gionatan Danti @ 2019-10-26 9:54 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, g.danti

Il 26-10-2019 01:39 Dave Chinner ha scritto:
> Again, it's a trade-off.
>
> 256kB iclogs mean that a crash can leave an unrecoverable 2MB hole
> in the journal, while 32kB iclogs mean it's only 256kB.

Sure, but a crash will always cause the loss of unsynced data,
especially when using deferred logging and/or deferred allocation,
right?

> 256kB iclogs mean 2MB of memory usage per filesystem, 32kB is only
> 256kB. We have users with hundreds of individual XFS filesystems
> mounted on single machines, and so 256kB iclogs is a lot of wasted
> memory...

Just wondering: 1000 filesystems with 256k logbsize would result in
2 GB of memory consumed by journal buffers. Is this considered too
much memory for a system managing 1000 filesystems? The pagecache
writeback memory consumption on these systems (probably equipped with
10s of GB of RAM) would dwarf any journal buffers, no?

> On small logs and filesystems, 256kB iclogs don't provide any real
> benefit because throughput is limited by log tail pushing (metadata
> writeback), not async transaction throughput.
>
> It's not uncommon for modern disks to have best throughput and/or
> lowest latency at IO sizes of 128kB or smaller.
>
> If you have lots of NVRAM in front of your spinning disks, then log
> IO sizes mostly don't matter - they end up bandwidth limited before
> the iclog size is an issue.

Yes, this matches my observation.

> Testing on a pristine filesystem doesn't show what happens as the
> filesystem ages over years of constant use, and so what provides
> "best performance on an empty filesystem" often doesn't provide the
> best long-term production performance.
>
> And so on.
>
> Storage is complex, filesystems are complex, and no one setting is
> right for everyone. The defaults are intended to be "good enough" in
> the majority of typical user configs.

Yep.

> For your specific storage setup, yes.
>
>> If so, do you suggest always setting logbsize to the maximum
>> supported value?
>
> No. I recommend that people use the defaults, and only if there are
> performance issues with their -actual production workload- should
> they consider changing anything.
>
> Benchmarks rarely match the behaviour of production workloads -
> tuning for benchmarks can actively harm production performance,
> especially over the long term...
>
> Cheers,
>
> Dave.

Ok, very clear. Thank you so much.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: Question about logbsize default value
  2019-10-26  9:54 ` Gionatan Danti
@ 2019-10-26 21:59   ` Dave Chinner
  2019-10-27 18:09     ` Gionatan Danti
  0 siblings, 1 reply; 7+ messages in thread

From: Dave Chinner @ 2019-10-26 21:59 UTC (permalink / raw)
To: Gionatan Danti; +Cc: linux-xfs

On Sat, Oct 26, 2019 at 11:54:02AM +0200, Gionatan Danti wrote:
> Il 26-10-2019 01:39 Dave Chinner ha scritto:
> > Again, it's a trade-off.
> >
> > 256kB iclogs mean that a crash can leave an unrecoverable 2MB hole
> > in the journal, while 32kB iclogs mean it's only 256kB.
>
> Sure, but a crash will always cause the loss of unsynced data,
> especially when using deferred logging and/or deferred allocation,
> right?

Yes, but there's a big difference between 2MB and 256KB, especially
if it's a small filesystem (very common) and the log is only ~10MB
in size.

> > 256kB iclogs mean 2MB of memory usage per filesystem, 32kB is only
> > 256kB. We have users with hundreds of individual XFS filesystems
> > mounted on single machines, and so 256kB iclogs is a lot of wasted
> > memory...
>
> Just wondering: 1000 filesystems with 256k logbsize would result in
> 2 GB of memory consumed by journal buffers. Is this considered too
> much memory for a system managing 1000 filesystems? The pagecache
> writeback memory consumption on these systems (probably equipped
> with 10s of GB of RAM) would dwarf any journal buffers, no?

Log buffers are a static memory footprint. Page cache memory is
dynamic and can be trimmed to nothing when there is memory pressure.
However, memory allocated to log buffers is pinned for the life of
the mount, whether that filesystem is busy or not - the memory is
not reclaimable.

The 8 log buffers of 32kB each are a good trade-off between
minimising memory footprint and maintaining performance over a wide
range of storage and use cases. If that's still too much memory per
filesystem, then the user can compromise on performance by reducing
the number of logbufs. If performance is too slow, then the user can
increase the memory footprint to improve performance.

The default values sit in the middle ground on both axes - enough
logbufs and a large enough iclog size for decent performance, but
with a small enough memory footprint that dense or resource
constrained installations can be deployed without any tweaking.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
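The two knobs Dave describes reduce to simple arithmetic (my own illustration; the kernel's actual accounting is more involved): the pinned footprint per mount is logbufs × logbsize, which is what makes large buffers expensive at high mount counts.

```python
def pinned_log_buffer_bytes(logbufs: int = 8,
                            logbsize: int = 32 * 1024,
                            mounts: int = 1) -> int:
    """Static, unreclaimable memory held by in-core log buffers."""
    return logbufs * logbsize * mounts

# Defaults: 8 x 32 KiB = 256 KiB pinned per mount.
print(pinned_log_buffer_bytes() // 1024)                    # 256

# logbsize=256k across 1000 mounts: 8 x 256 KiB x 1000.
print(pinned_log_buffer_bytes(logbsize=256 * 1024,
                              mounts=1000) // 2**20)        # 2000 (MiB)
```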
* Re: Question about logbsize default value
  2019-10-26 21:59 ` Dave Chinner
@ 2019-10-27 18:09   ` Gionatan Danti
  0 siblings, 0 replies; 7+ messages in thread

From: Gionatan Danti @ 2019-10-27 18:09 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, g.danti

Il 26-10-2019 23:59 Dave Chinner ha scritto:
> On Sat, Oct 26, 2019 at 11:54:02AM +0200, Gionatan Danti wrote:
>> Il 26-10-2019 01:39 Dave Chinner ha scritto:
>> > Again, it's a trade-off.
>> >
>> > 256kB iclogs mean that a crash can leave an unrecoverable 2MB hole
>> > in the journal, while 32kB iclogs mean it's only 256kB.
>>
>> Sure, but a crash will always cause the loss of unsynced data,
>> especially when using deferred logging and/or deferred allocation,
>> right?
>
> Yes, but there's a big difference between 2MB and 256KB, especially
> if it's a small filesystem (very common) and the log is only ~10MB
> in size.
>
>> > 256kB iclogs mean 2MB of memory usage per filesystem, 32kB is only
>> > 256kB. We have users with hundreds of individual XFS filesystems
>> > mounted on single machines, and so 256kB iclogs is a lot of wasted
>> > memory...
>>
>> Just wondering: 1000 filesystems with 256k logbsize would result in
>> 2 GB of memory consumed by journal buffers. Is this considered too
>> much memory for a system managing 1000 filesystems? The pagecache
>> writeback memory consumption on these systems (probably equipped
>> with 10s of GB of RAM) would dwarf any journal buffers, no?
>
> Log buffers are a static memory footprint. Page cache memory is
> dynamic and can be trimmed to nothing when there is memory pressure.
> However, memory allocated to log buffers is pinned for the life of
> the mount, whether that filesystem is busy or not - the memory is
> not reclaimable.
>
> The 8 log buffers of 32kB each are a good trade-off between
> minimising memory footprint and maintaining performance over a wide
> range of storage and use cases. If that's still too much memory per
> filesystem, then the user can compromise on performance by reducing
> the number of logbufs. If performance is too slow, then the user can
> increase the memory footprint to improve performance.
>
> The default values sit in the middle ground on both axes - enough
> logbufs and a large enough iclog size for decent performance, but
> with a small enough memory footprint that dense or resource
> constrained installations can be deployed without any tweaking.
>
> Cheers,
>
> Dave.

It surely is reasonable. Thank you for the clear explanation.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
end of thread, other threads:[~2019-10-27 18:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-23  9:40 Question about logbsize default value Gionatan Danti
2019-10-24 21:50 ` Dave Chinner
2019-10-25  7:10   ` Gionatan Danti
2019-10-25 23:39     ` Dave Chinner
2019-10-26  9:54       ` Gionatan Danti
2019-10-26 21:59         ` Dave Chinner
2019-10-27 18:09           ` Gionatan Danti