* cap on writeback?
@ 2013-03-25 23:33 Raymond Jennings
2013-03-26 0:06 ` Valdis.Kletnieks at vt.edu
0 siblings, 1 reply; 7+ messages in thread
From: Raymond Jennings @ 2013-03-25 23:33 UTC (permalink / raw)
To: kernelnewbies
Just curious, is there a cap on how much data can be in "writeback" at
the same time?
I'm asking because I have over a gigabyte of data in "dirty", but
during flush, only about 60k or so is in writeback at any one time.
Is there a cap of sorts, and if so, how do I remove it?
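(For reference, the "dirty" and "writeback" figures here are the Dirty: and
Writeback: fields in /proc/meminfo. A minimal Python sketch for watching the
two counters on a Linux box - stop it with Ctrl-C:)

    import time

    def meminfo_kb(field):
        # Parse a single "Field:   <value> kB" line out of /proc/meminfo.
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith(field + ":"):
                    return int(line.split()[1])
        return None

    # Sample Dirty and Writeback once a second to see how deep writeback gets.
    while True:
        print("Dirty: %d kB  Writeback: %d kB"
              % (meminfo_kb("Dirty"), meminfo_kb("Writeback")))
        time.sleep(1)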
* cap on writeback?
2013-03-25 23:33 cap on writeback? Raymond Jennings
@ 2013-03-26 0:06 ` Valdis.Kletnieks at vt.edu
2013-03-26 0:23 ` Raymond Jennings
0 siblings, 1 reply; 7+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2013-03-26 0:06 UTC (permalink / raw)
To: kernelnewbies
On Mon, 25 Mar 2013 16:33:48 -0700, Raymond Jennings said:
> Just curious, is there a cap on how much data can be in "writeback" at
> the same time?
>
> I'm asking because I have over a gigabyte of data in "dirty", but
> during flush, only about 60k or so is in writeback at any one time.
Only a gigabyte? :) (I've got a box across the hall that has 2.6T of RAM,
and yes, it's pretty sad when it decides it's time for writeback across
an NFS or GPFS mount, even though it's a 10GE connection.)
For the record, writeback is one of those things that's really hard to
get right, because there are always corner cases. Probably why we seem to
end up screwing around with it every 2-3 releases. :)
* cap on writeback?
2013-03-26 0:06 ` Valdis.Kletnieks at vt.edu
@ 2013-03-26 0:23 ` Raymond Jennings
2013-03-26 3:17 ` Valdis.Kletnieks at vt.edu
0 siblings, 1 reply; 7+ messages in thread
From: Raymond Jennings @ 2013-03-26 0:23 UTC (permalink / raw)
To: kernelnewbies
On Mon, Mar 25, 2013 at 5:06 PM, <Valdis.Kletnieks@vt.edu> wrote:
> On Mon, 25 Mar 2013 16:33:48 -0700, Raymond Jennings said:
>> Just curious, is there a cap on how much data can be in "writeback" at
>> the same time?
>>
>> I'm asking because I have over a gigabyte of data in "dirty", but
>> during flush, only about 60k or so is in writeback at any one time.
>
> Only a gigabyte? :)
Well, yes - considering I'm on a piece of crap that only has 2G of total
RAM on a stick, part of which is reserved for video. It's a budget
computer I bought in a pinch, sue me :P
> (I've got a box across the hall that has 2.6T of RAM,
> and yes, it's pretty sad when it decides it's time for writeback across
> an NFS or GPFS mount, even though it's a 10GE connection.)
>
> For the record, writeback is one of those things that's really hard to
> get right, because there are always corner cases. Probably why we seem to
> end up screwing around with it every 2-3 releases. :)
Anyway, I appreciate (and was already aware of) the complications,
but this still doesn't answer my question.
Is there some sort of mechanism that throttles the size of the writeback pool?
I would like to remove that cap if it exists, or at least make it a
/proc/sys tunable of sorts.
It's somewhat related to my brainfuck queue, since I would like to
stress-test it by having it digest a huge pile of outbound data and
see whether it can make writeback less seeky.
* cap on writeback?
2013-03-26 0:23 ` Raymond Jennings
@ 2013-03-26 3:17 ` Valdis.Kletnieks at vt.edu
2013-03-26 9:01 ` Raymond Jennings
2013-03-26 16:46 ` SSDs vs elevators (was Re: cap on writeback?) Arlie Stephens
0 siblings, 2 replies; 7+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2013-03-26 3:17 UTC (permalink / raw)
To: kernelnewbies
On Mon, 25 Mar 2013 17:23:40 -0700, Raymond Jennings said:
> Is there some sort of mechanism that throttles the size of the writeback pool?
There's a lot of tunables in /proc/sys/vm - everything from drop_caches
to swappiness to vfs_cache_pressure. Note that they all interact in mystical
and hard-to-understand ways. ;)
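(A quick way to dump every knob in that directory along with its current
value - a minimal Python sketch, assuming a Linux /proc; a few entries may
be unreadable without root:)

    import os

    VM = "/proc/sys/vm"

    # Print each vm tunable and its current value.
    for name in sorted(os.listdir(VM)):
        path = os.path.join(VM, name)
        try:
            with open(path) as f:
                value = f.read().strip()
        except IOError:
            value = "<unreadable>"
        print("%-30s %s" % (name, value))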
> It's somewhat related to my brainfuck queue, since I would like to
> stress-test it by having it digest a huge pile of outbound data and
> see whether it can make writeback less seeky.
The biggest challenge here is that there's a bit of a layering violation
to be resolved - when the VM is choosing what pages get written out first,
it really has no clue where on disk the pages are going. Consider a 16M
file that's fragged into 16 1M extents - they'll almost certainly hit
the writeback queue in logical block order, not physical address order.
The only really good choices here are to either allow the writeback queue
to get deep enough that an elevator can do something useful (if you only
have 2-3 IOs queued, you can do less than if you have 20-30 of them you
can sort into some useful order), or use filesystems that are less prone
to fragmentation issues.
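(A toy illustration of that point - not kernel code, and the sector numbers
are made up: with only a couple of requests queued there is almost nothing
to reorder, but with a few dozen an elevator-style sort by sector cuts the
total head travel considerably compared to servicing them in arrival order.)

    import random

    def head_travel(sectors, start=0):
        # Total distance the head moves if requests are serviced in this order.
        pos, travel = start, 0
        for s in sectors:
            travel += abs(s - pos)
            pos = s
        return travel

    random.seed(0)
    pending = [random.randrange(1000000) for _ in range(30)]  # fake sector numbers

    # With 2-3 requests queued there is little to sort; with 30, sorting pays off.
    print("arrival order:", head_travel(pending))
    print("sorted:       ", head_travel(sorted(pending)))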
Just for the record, most of my high-performance stuff runs best with
the noop scheduler - when you're striping I/O across several hundred disks,
the last thing you want is some single-minded disk scheduler re-arranging
the I/Os and creating latency issues for your striping.
Might want to think about why there's lots of man-hours spent doing
new filesystems and stuff like zcache and kernel shared memory, but the
only IO schedulers in tree are noop, deadline, and cfq :)
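(For reference, the active elevator for a device is exposed in sysfs; a
minimal Python sketch for inspecting or switching it - assumes root for
writes, and "sda" is just an example device name:)

    import sys

    def scheduler_path(dev):
        return "/sys/block/%s/queue/scheduler" % dev

    def get_scheduler(dev):
        # The active scheduler is shown in [brackets], e.g. "noop deadline [cfq]".
        with open(scheduler_path(dev)) as f:
            return f.read().strip()

    def set_scheduler(dev, name):
        # Requires root; the kernel rejects names that aren't built in or loaded.
        with open(scheduler_path(dev), "w") as f:
            f.write(name)

    if __name__ == "__main__":
        dev = sys.argv[1] if len(sys.argv) > 1 else "sda"
        print(get_scheduler(dev))
        # set_scheduler(dev, "noop")   # uncomment to switch (needs root)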
* cap on writeback?
2013-03-26 3:17 ` Valdis.Kletnieks at vt.edu
@ 2013-03-26 9:01 ` Raymond Jennings
2013-03-26 16:46 ` SSDs vs elevators (was Re: cap on writeback?) Arlie Stephens
1 sibling, 0 replies; 7+ messages in thread
From: Raymond Jennings @ 2013-03-26 9:01 UTC (permalink / raw)
To: kernelnewbies
On Mon, Mar 25, 2013 at 8:17 PM, <Valdis.Kletnieks@vt.edu> wrote:
> On Mon, 25 Mar 2013 17:23:40 -0700, Raymond Jennings said:
>
>> Is there some sort of mechanism that throttles the size of the writeback pool?
>
> There's a lot of tunables in /proc/sys/vm - everything from drop_caches
> to swappiness to vfs_cache_pressure. Note that they all interact in mystical
> and hard-to-understand ways. ;)
I'm pretty familiar with this directory, but alas, I can find nothing
regarding writeback throttling that would limit the amount of data in
the "writeback" pool.
So again I ask, where is it? Unless you are hinting I should search
the source myself ^^.
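(The closest-looking knobs I can find are the vm.dirty_* entries, which seem
to govern when flushing starts rather than how much can be in flight at once,
plus the per-device queue depth in sysfs. A small Python sketch to dump them
while testing - whether any of these is the actual cap is exactly what I'm
asking:)

    import os

    # Knobs that look related: the vm.dirty_* family controls when dirty data
    # starts getting flushed; nr_requests bounds the per-device request queue.
    DIRTY_KNOBS = [
        "dirty_background_ratio", "dirty_ratio",
        "dirty_background_bytes", "dirty_bytes",
        "dirty_expire_centisecs", "dirty_writeback_centisecs",
    ]

    def read_file(path):
        try:
            with open(path) as f:
                return f.read().strip()
        except IOError:
            return "<missing>"

    for knob in DIRTY_KNOBS:
        print("vm.%-28s %s" % (knob, read_file("/proc/sys/vm/" + knob)))

    for dev in sorted(os.listdir("/sys/block")):
        path = "/sys/block/%s/queue/nr_requests" % dev
        if os.path.exists(path):
            print("%s nr_requests: %s" % (dev, read_file(path)))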
>> It's somewhat related to my brainfuck queue, since I would like to
>> stress-test it by having it digest a huge pile of outbound data and
>> see whether it can make writeback less seeky.
>
> The biggest challenge here is that there's a bit of a layering violation
> to be resolved - when the VM is choosing what pages get written out first,
> it really has no clue where on disk the pages are going.
Already realized this myself ^^
> Consider a 16M
> file that's fragged into 16 1M extents - they'll almost certainly hit
> the writeback queue in logical block order, not physical address order.
> The only really good choices here are to either allow the writeback queue
> to get deep enough that an elevator can do something useful (if you only
> have 2-3 IOs queued, you can do less than if you have 20-30 of them you
> can sort into some useful order), or use filesystems that are less prone
> to fragmentation issues.
Indeed, the filesystem really ought to be the one making decisions about
what to flush, and it should be taking hints from the block layer, given
a sector number.
> Just for the record, most of my high-performance stuff runs best with
> the noop scheduler - when you're striping I/O across several hundred disks,
> the last thing you want is some single-minded disk scheduler re-arranging
> the I/Os and creating latency issues for your striping.
> Might want to think about why there's lots of man-hours spent doing
> new filesystems and stuff like zcache and kernel shared memory, but the
> only IO schedulers in tree are noop, deadline, and cfq :)
Hey, gotta cut my teeth somewhere. :)
* SSDs vs elevators (was Re: cap on writeback?)
2013-03-26 3:17 ` Valdis.Kletnieks at vt.edu
2013-03-26 9:01 ` Raymond Jennings
@ 2013-03-26 16:46 ` Arlie Stephens
2013-03-27 10:43 ` Raymond Jennings
1 sibling, 1 reply; 7+ messages in thread
From: Arlie Stephens @ 2013-03-26 16:46 UTC (permalink / raw)
To: kernelnewbies
On Mar 25 2013, Valdis.Kletnieks at vt.edu wrote:
>
> Just for the record, most of my high-performance stuff runs best with
> the noop scheduler - when you're striping I/O across several hundred disks,
> the last thing you want is some single-minded disk scheduler re-arranging
> the I/Os and creating latency issues for your striping.
>
> Might want to think about why there's lots of man-hours spent doing
> new filesystems and stuff like zcache and kernel shared memory, but the
> only IO schedulers in tree are noop, deadline, and cfq :)
Interesting. I'd have expected, naively, that when dealing with SSDs -
and their delightful habits of rearranging data under the covers,
suddenly introducing delays, as well as their desire to have reads
grouped away from writes (RWRWR == slower than RRRWW, if I understand
correctly) - there'd be work to be done at the IO scheduler layer, not
just at the filesystem layer, to get the best performance out of SSDs.
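(A toy sketch of the sort of reordering I mean - purely illustrative, not a
real scheduler: take a mixed batch and dispatch the reads ahead of the
writes instead of interleaving them.)

    # Toy dispatch: turn a mixed batch like R W R W R into R R R W W,
    # keeping the relative order within each class (a stable partition).
    def group_reads_first(batch):
        reads  = [req for req in batch if req[0] == "R"]
        writes = [req for req in batch if req[0] == "W"]
        return reads + writes

    batch = [("R", 10), ("W", 11), ("R", 42), ("W", 7), ("R", 99)]
    print(group_reads_first(batch))
    # [('R', 10), ('R', 42), ('R', 99), ('W', 11), ('W', 7)]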
For the record, my own attempts to get performance out of SSDs have
been productive of nothing but pain. I was, however, trying to do this
_without_ kernel mods, and on FreeBSD rather than Linux.
--
Arlie
* SSDs vs elevators (was Re: cap on writeback?)
2013-03-26 16:46 ` SSDs vs elevators (was Re: cap on writeback?) Arlie Stephens
@ 2013-03-27 10:43 ` Raymond Jennings
0 siblings, 0 replies; 7+ messages in thread
From: Raymond Jennings @ 2013-03-27 10:43 UTC (permalink / raw)
To: kernelnewbies
On Tue, Mar 26, 2013 at 9:46 AM, Arlie Stephens <arlie@worldash.org> wrote:
> On Mar 25 2013, Valdis.Kletnieks at vt.edu wrote:
>>
>> Just for the record, most of my high-performance stuff runs best with
>> the noop scheduler - when you're striping I/O across several hundred disks,
>> the last thing you want is some single-minded disk scheduler re-arranging
>> the I/Os and creating latency issues for your striping.
>>
>> Might want to think about why there's lots of man-hours spent doing
>> new filesystems and stuff like zcache and kernel shared memory, but the
>> only IO schedulers in tree are noop, deadline, and cfq :)
>
> Interesting. I'd have expected, naively, that when dealing with SSDs -
> and their delightful habits of rearranging data under the covers,
> suddenly introducing delays, as well as their desire to have reads
> grouped away from writes (RWRWR == slower than RRRWW, if I understand
> correctly) - there'd be work to be done at the IO scheduler layer, not
> just at the filesystem layer, to get the best performance out of SSDs.
>
> For the record, my own attempts to get performance out of SSDs have
> been productive of nothing but pain. I was, however, trying to do this
> _without_ kernel mods, and on FreeBSD rather than linux.
>
> --
> Arlie
If I had to guess, firmware housekeeping is regarded the same way as
rotational media buffering, queueing, and caching behind the SATA bus -
namely, "the device has a mind of its own."
Often this is seen as an argument against allowing the media to do any
serious caching, and instead letting the kernel's page and buffer caches
handle it.
Oddly, it's been said that disabling write caching on the device can
improve performance by preventing the drive heads from sneaking away
from where the block driver parked them.
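(For what it's worth, on Linux the drive-level write cache can usually be
queried or toggled with hdparm's -W option; a small Python wrapper sketch,
assuming hdparm is installed, root privileges for changes, and /dev/sda
purely as an example device:)

    import subprocess

    def query_write_cache(dev):
        # "hdparm -W <dev>" reports whether the drive's write cache is enabled.
        return subprocess.check_output(["hdparm", "-W", dev]).decode()

    def set_write_cache(dev, enabled):
        # "hdparm -W 0|1 <dev>" toggles it; needs root and a drive that honours it.
        subprocess.check_call(["hdparm", "-W", "1" if enabled else "0", dev])

    if __name__ == "__main__":
        print(query_write_cache("/dev/sda"))     # example device name
        # set_write_cache("/dev/sda", False)     # uncomment to disable (root only)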
end of thread
Thread overview: 7+ messages
2013-03-25 23:33 cap on writeback? Raymond Jennings
2013-03-26 0:06 ` Valdis.Kletnieks at vt.edu
2013-03-26 0:23 ` Raymond Jennings
2013-03-26 3:17 ` Valdis.Kletnieks at vt.edu
2013-03-26 9:01 ` Raymond Jennings
2013-03-26 16:46 ` SSDs vs elevators (was Re: cap on writeback?) Arlie Stephens
2013-03-27 10:43 ` Raymond Jennings