linux-nilfs.vger.kernel.org archive mirror
* nilfs_cleanerd using a lot of disk-write bandwidth
@ 2011-08-09 10:18 Gordan Bobic
       [not found] ` <94b06fe504b540199f338f9bd4ed890f-tp2ajI7sM87MEvS+BUbURm2TqnkC6wfpXqFh9Ls21Oc@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Gordan Bobic @ 2011-08-09 10:18 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

 Hi,

 I'm seeing nilfs_cleanerd using a lot of disk write bandwidth according 
 to iotop. It seems to be performing approximately equal amounts of reads 
 and writes when it is running. Reads I can understand, but why is it 
 writing so much in order to garbage collect? Should it not be just 
 trying to mark blocks as free? The disk I/O r/w symmetry implies that it 
 is trying to do something like defragment the file system. Is there a 
 way to configure this behaviour in some way? The main use-case I have 
 for nilfs is cheap flash media that suffers from terrible random-write 
 performance, but on such media this many writes are going to cause media 
 failure very quickly. What can be done about this?

 Gordan
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: nilfs_cleanerd using a lot of disk-write bandwidth
       [not found] ` <94b06fe504b540199f338f9bd4ed890f-tp2ajI7sM87MEvS+BUbURm2TqnkC6wfpXqFh9Ls21Oc@public.gmane.org>
@ 2011-08-09 11:03   ` dexen deVries
       [not found]     ` <201108091303.54968.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: dexen deVries @ 2011-08-09 11:03 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Gordan,


On Tuesday 09 of August 2011 12:18:12 you wrote:
>  I'm seeing nilfs_cleanerd using a lot of disk write bandwidth according
>  to iotop. It seems to be performing approximately equal amounts of reads
>  and writes when it is running. Reads I can understand, but why is it
>  writing so much in order to garbage collect? Should it not be just
>  trying to mark blocks as free? The disk I/O r/w symmetry implies that it
>  is trying to do something like defragment the file system. Is there a
>  way to configure this behaviour in some way? The main use-case I have
>  for nilfs is cheap flash media that suffers from terrible random-write
>  performance, but on such media this many writes are going to cause media
>  failure very quickly. What can be done about this?


I'm not a NILFS2 developer, so don't rely too much on the following remarks!

NILFS2 considers the filesystem as a (wrapped-around) list of segments, by
default 8 MB each. Those segments contain both file data and metadata.

cleanerd operates on whole segments; normally either 2 or 4 in one pass
(depending on remaining free space). It seems to me a segment is reclaimed
when there is any amount of garbage in it, no matter how small. Thus you see,
in some cases, about as much read as write.

One way would be to make cleanerd configurable so it doesn't reclaim
segments that have only very little garbage in them. That would probably be a
trade-off between wasted disk space and reduced bandwidth use.
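
A minimal sketch of that configurable threshold (the function name, the block
counts, and the 25% default are all hypothetical, not an actual cleanerd knob):

```python
def should_reclaim(live_blocks, blocks_per_segment, min_garbage_ratio=0.25):
    """Reclaim a segment only if at least min_garbage_ratio of it is garbage.

    A nearly-clean segment costs almost a full segment of read+write to
    reclaim but frees almost nothing, so skipping it saves bandwidth at
    the cost of some wasted disk space.
    """
    garbage_ratio = 1.0 - live_blocks / blocks_per_segment
    return garbage_ratio >= min_garbage_ratio
```

With 2048 blocks per segment, a segment holding 2040 live blocks (~0.4%
garbage) would be skipped, while a half-dead segment would be reclaimed.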

As for wearing flash media down, I believe NILFS2 is still very good for it,
because it tends to write in large chunks -- much larger than the original
512B sector -- and not to overwrite already-written areas (until reclaimed by
cleanerd, often much, much later). Once the flash's large erase unit is
erased, NILFS2 append-writes to it but does not overwrite already written
data, which means the flash is erased almost as little as possible.


Regards,
-- 
dexen deVries

[[[↓][→]]]

For example, if the first thing in the file is:
   <?kzy irefvba="1.0" rapbqvat="ebg13"?>
an XML parser will recognize that the document is stored in the traditional 
ROT13 encoding.

(( Joe English, http://www.flightlab.com/~joe/sgml/faq-not.txt ))


* Re: nilfs_cleanerd using a lot of disk-write bandwidth
       [not found]     ` <201108091303.54968.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2011-08-09 12:25       ` Gordan Bobic
       [not found]         ` <4ef54cf8b3d0b2725aa1788d98ffbbe5-tp2ajI7sM87MEvS+BUbURm2TqnkC6wfpXqFh9Ls21Oc@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Gordan Bobic @ 2011-08-09 12:25 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

 On Tue, 9 Aug 2011 13:03:54 +0200, dexen deVries 
 <dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Gordan,
>
>
> On Tuesday 09 of August 2011 12:18:12 you wrote:
>>  I'm seeing nilfs_cleanerd using a lot of disk write bandwidth according
>>  to iotop. It seems to be performing approximately equal amounts of reads
>>  and writes when it is running. Reads I can understand, but why is it
>>  writing so much in order to garbage collect? Should it not be just
>>  trying to mark blocks as free? The disk I/O r/w symmetry implies that it
>>  is trying to do something like defragment the file system. Is there a
>>  way to configure this behaviour in some way? The main use-case I have
>>  for nilfs is cheap flash media that suffers from terrible random-write
>>  performance, but on such media this many writes are going to cause media
>>  failure very quickly. What can be done about this?
>
>
> I'm not a NILFS2 developer, so don't rely too much on the following
> remarks!
>
> NILFS2 considers the filesystem as a (wrapped-around) list of segments,
> by default 8 MB each. Those segments contain both file data and metadata.
>
> cleanerd operates on whole segments; normally either 2 or 4 in one pass
> (depending on remaining free space). It seems to me a segment is
> reclaimed when there is any amount of garbage in it, no matter how
> small. Thus you see, in some cases, about as much read as write.
>
> One way would be to make cleanerd configurable so it doesn't reclaim
> segments that have only very little garbage in them. That would probably
> be a trade-off between wasted disk space and reduced bandwidth use.
>
> As for wearing flash media down, I believe NILFS2 is still very good for
> it, because it tends to write in large chunks -- much larger than the
> original 512B sector -- and not to overwrite already-written areas
> (until reclaimed by cleanerd, often much, much later). Once the flash's
> large erase unit is erased, NILFS2 append-writes to it but does not
> overwrite already written data, which means the flash is erased almost
> as little as possible.

 Interesting. I still think something should be done to minimize the 
 amount of writes required. How about something like the following. 
 Divide situations into 3 classes (thresholds should be adjustable in 
 nilfs_cleanerd.conf):

 1) Free space good (e.g. space >= 25%)
 Don't do any garbage collection at all, unless an entire block contains 
 only garbage.

 2) Free space low (e.g. 10% < space < 25%)
 Run GC as now, with the nice/ionice applied. Only GC blocks where 
 $block_free_space_percent >= $disk_free_space_percent. So as the disk 
 space starts to decrease, the number of blocks that get considered for 
 GC increases, too.

 3) Free space critical (e.g. space < 10%)
 As 2), but start decreasing niceness/ioniceness (niceness by 3 for every 
 1% drop in free space), so for example:
 10% - 19
 ...
 7% - 10
 ...
 4% - 1
 3% - -2
 ...
 1% - -8

 This would give a very gradual increase in GC aggressiveness that would 
 both minimize unnecessary writes that shorten flash life and provide a 
 softer landing in terms of performance degradation as space starts to 
 run out.
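
The table above follows a simple linear rule; purely as an illustration of
the proposal (not an existing cleanerd option), it could be written as:

```python
def gc_niceness(free_pct):
    """Map free-space percentage to a niceness value for the cleaner.

    At or above the critical threshold (10% free), GC runs fully niced;
    below it, niceness drops by 3 for every 1% of free space lost,
    clamped to the valid nice range [-20, 19].
    """
    if free_pct >= 10:
        return 19
    return max(-20, 3 * free_pct - 11)
```

which reproduces the table: 10% -> 19, 7% -> 10, 4% -> 1, 3% -> -2, 1% -> -8.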

 The other idea that comes to mind on top of this is to GC blocks in 
 order of % of space in the block being reclaimable. That would allow for 
 the minimum number of blocks to always be GC-ed to get the free space 
 above the required threshold.
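
That ordering idea amounts to a greedy pick; roughly (a hypothetical sketch;
the real cleaner tracks segment usage in its own metadata, not a plain list):

```python
def pick_segments(segments, bytes_needed):
    """Pick the fewest segments to GC by taking the most reclaimable first.

    segments: iterable of (segment_id, reclaimable_bytes) pairs.
    Returns the ids to clean, most reclaimable first, stopping as soon
    as the requested amount of space is covered.
    """
    chosen, freed = [], 0
    for seg_id, reclaimable in sorted(segments, key=lambda s: s[1],
                                      reverse=True):
        if freed >= bytes_needed:
            break
        chosen.append(seg_id)
        freed += reclaimable
    return chosen
```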

 Thoughts?

 Gordan


* Re: nilfs_cleanerd using a lot of disk-write bandwidth
       [not found]         ` <4ef54cf8b3d0b2725aa1788d98ffbbe5-tp2ajI7sM87MEvS+BUbURm2TqnkC6wfpXqFh9Ls21Oc@public.gmane.org>
@ 2011-08-09 15:19           ` dexen deVries
       [not found]             ` <201108091719.01585.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: dexen deVries @ 2011-08-09 15:19 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Tuesday 09 of August 2011 14:25:07 you wrote:
>  Interesting. I still think something should be done to minimize the
>  amount of writes required. How about something like the following.
>  Divide situations into 3 classes (thresholds should be adjustable in
>  nilfs_cleanerd.conf):
> 
>  1) Free space good (e.g. space >= 25%)
>  Don't do any garbage collection at all, unless an entire block contains
>  only garbage.
> 
>  2) Free space low (e.g. 10% < space < 25%)
>  Run GC as now, with the nice/ionice applied. Only GC blocks where
>  $block_free_space_percent >= $disk_free_space_percent. So as the disk
>  space starts to decrease, the number of blocks that get considered for
>  GC increase, too.
> 
>  3) Free space critical (e.g. space < 10%)
>  As 2) but start decreasing niceness/ioniceness (niceness by 3 for every
>  1% drop in free space, so for example:
>  10% - 19
>  ...
>  7% - 10
>  ...
>  4% - 1
>  3% - -2
>  ...
>  1% - -8
> 
>  This would give a very gradual increase in GC aggressiveness that would
>  both minimize unnecessary writes that shorten flash life and provide a
>  softer landing in terms of performance degradation as space starts to
>  run out.
> 
>  The other idea that comes to mind on top of this is to GC blocks in
>  order of % of space in the block being reclaimable. That would allow for
>  the minimum number of blocks to always be GC-ed to get the free space
>  above the required threshold.
> 
>  Thoughts?


Could end up being too slow. A 2TB filesystem has about 260'000 segments (given 
the default size of 8MB). cleanerd already takes quite a bit of CPU power at 
times.

Also, cleanerd can do a lot of HDD seeks if some parts of metadata aren't in 
cache. Performing some 260'000 seeks on a harddrive would take anywhere from 
1000 to 3000 seconds; that's not very interactive. Actually, it gets 
dangerously close to an hour.
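
The arithmetic checks out, assuming binary units and a typical HDD seek time
of roughly 4-12 ms (my figures, not measured):

```python
fs_bytes = 2 * 1024**4        # 2 TiB filesystem
segment_bytes = 8 * 1024**2   # default 8 MB segment
segments = fs_bytes // segment_bytes   # 262144, i.e. "about 260'000"

seek_low, seek_high = 0.004, 0.012     # seconds per seek, assumed
worst_case_low = segments * seek_low   # ~1050 s
worst_case_high = segments * seek_high # ~3150 s, close to an hour
```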

However, if cleanerd did not have to follow this exact algorithm, but 
instead did something roughly similar (heuristics rather than an exact 
algorithm), it could be good enough.

Possibly related, I'd love it if cleanerd tended to do some mild 
de-fragmentation of files. Not necessarily full-blown, exact defragmentation, 
just placing related stuff close together.


-- 
dexen deVries

[[[↓][→]]]

For example, if the first thing in the file is:
   <?kzy irefvba="1.0" rapbqvat="ebg13"?>
an XML parser will recognize that the document is stored in the traditional 
ROT13 encoding.

(( Joe English, http://www.flightlab.com/~joe/sgml/faq-not.txt ))


* Re: nilfs_cleanerd using a lot of disk-write bandwidth
       [not found]             ` <201108091719.01585.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2011-08-09 15:45               ` Gordan Bobic
  0 siblings, 0 replies; 5+ messages in thread
From: Gordan Bobic @ 2011-08-09 15:45 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

 On Tue, 9 Aug 2011 17:19:01 +0200, dexen deVries 
 <dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tuesday 09 of August 2011 14:25:07 you wrote:
>>  Interesting. I still think something should be done to minimize the
>>  amount of writes required. How about something like the following.
>>  Divide situations into 3 classes (thresholds should be adjustable in
>>  nilfs_cleanerd.conf):
>>
>>  1) Free space good (e.g. space >= 25%)
>>  Don't do any garbage collection at all, unless an entire block contains
>>  only garbage.
>>
>>  2) Free space low (e.g. 10% < space < 25%)
>>  Run GC as now, with the nice/ionice applied. Only GC blocks where
>>  $block_free_space_percent >= $disk_free_space_percent. So as the disk
>>  space starts to decrease, the number of blocks that get considered for
>>  GC increases, too.
>>
>>  3) Free space critical (e.g. space < 10%)
>>  As 2), but start decreasing niceness/ioniceness (niceness by 3 for
>>  every 1% drop in free space), so for example:
>>  10% - 19
>>  ...
>>  7% - 10
>>  ...
>>  4% - 1
>>  3% - -2
>>  ...
>>  1% - -8
>>
>>  This would give a very gradual increase in GC aggressiveness that would
>>  both minimize unnecessary writes that shorten flash life and provide a
>>  softer landing in terms of performance degradation as space starts to
>>  run out.
>>
>>  The other idea that comes to mind on top of this is to GC blocks in
>>  order of % of space in the block being reclaimable. That would allow
>>  for the minimum number of blocks to always be GC-ed to get the free
>>  space above the required threshold.
>>
>>  Thoughts?
>
>
> Could end up being too slow. A 2TB filesystem has about 260'000 segments
> (given the default size of 8MB). cleanerd already takes quite a bit of
> CPU power at times.
>
> Also, cleanerd can do a lot of HDD seeks if some parts of metadata
> aren't in cache. Performing some 260'000 seeks on a harddrive would take
> anywhere from 1000 to 3000 seconds; that's not very interactive.
> Actually, it gets dangerously close to an hour.
>
> However, if cleanerd did not have to follow this exact algorithm, but
> instead did something roughly similar (heuristics rather than an exact
> algorithm), it could be good enough.

 Well, you could adjust all the numbers in the algorithm. :)

 As an aside, why would you use nilfs on a multi-TB FS? What's the 
 advantage? The way I see it, the killer application for nilfs is slow 
 flash media with (probably) poorly implemented wear leveling.

 The idea of the above is that you don't end up suffering poor disk 
 performance due to background clean-up until you actually have a 
 plausible chance of running out of space. What is the point of GC-ing if 
 there is already 80% of empty space ready for writing to? All you'll be 
 doing is making the fs slow for no obvious gain.

> Possibly related, I'd love it if cleanerd tended to do some mild
> de-fragmentation of files. Not necessarily full-blown, exact
> defragmentation, just placing related stuff close together.

 If its garbage collection involves reading a block and re-writing it 
 without the deleted data, then isn't that already effectively 
 defragmenting the fs?

 Gordan

