* Re: Does nilfs2 do any in-place writes?
@ 2014-01-17 19:19 Mark Trumpold
0 siblings, 0 replies; 24+ messages in thread
From: Mark Trumpold @ 2014-01-17 19:19 UTC (permalink / raw)
To: slava-yeENwD64cLxBDgjK7y7TUQ, Mark Trumpold
Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
>-----Original Message-----
>From: Vyacheslav Dubeyko [mailto:slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org]
>Sent: Thursday, January 16, 2014 10:31 PM
>To: 'Mark Trumpold'
>Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>Subject: Re: Does nilfs2 do any in-place writes?
>
>On Thu, 2014-01-16 at 17:48 +0000, Mark Trumpold wrote:
>> Hello All,
>>
>> I am wondering what impact in-place writes of the
>> superblock have on SSDs in terms of wear.
>>
>> I've been stress-testing our system, which uses Nilfs, and
>> recently had an SSD fail with the classic messages indicating
>> low-level media problems -- and also implicating Nilfs as trying
>> to locate a superblock (I think).
>>
>> Following is a partial dmesg list:
>>
>> [ 7.630382] Sense Key : Medium Error [current] [descriptor]
>> [ 7.630385] Descriptor sense data with sense descriptors (in hex):
>> [ 7.630386] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
>> [ 7.630394] 05 ff 0e 58
>> [ 7.630397] sd 0:0:0:0: [sda]
>> [ 7.630399] Add. Sense: Unrecovered read error - auto reallocate failed
>> [ 7.630401] sd 0:0:0:0: [sda] CDB:
>> [ 7.630402] Read(10): 28 00 05 ff 0e 54 00 00 08 00
>> [ 7.630409] end_request: I/O error, dev sda, sector 100601432
>> [ 7.635326] NILFS warning: I/O error on loading last segment
>> [ 7.635329] NILFS: error searching super root.
>>
>>
>
>I don't think this issue is related to the superblocks, because I can't
>see the NILFS2 magic signature in your output. For example, I have
>these first 16 bytes in a superblock:
>
>00000400 02 00 00 00 00 00 34 34 18 01 00 00 52 85 db 71 |......44....R..q|
>
>Of course, I don't know your partition table details, but I doubt that
>sector 100601432 is a superblock sector. Moreover, your error messages
>report trouble loading the last segment during the super root search.
>
>NILFS2 has only two blocks that live under an in-place update policy,
>and their update frequency is not very high. So I suppose any FTL can
>easily provide good wear leveling for the superblocks. But, of course,
>in-place updates are not a good policy for flash-based devices anyway.
>
>Maybe I misunderstand something in your output, but I suppose that
>during stress testing you can discover an I/O error in any part of the
>volume, because it is really hard to predict when you will exhaust the
>spare pool of erase blocks.
>
>With the best regards,
>Vyacheslav Dubeyko.
>
>
>
Hi Vyacheslav,
Thank you for taking a look at this.
Your assessment makes good sense, and I am relieved we have
a plausible explanation.
BTW: I upgraded to the 3.11.6 Linux kernel (per your and Ryusuke's
suggestions) to pick up the most recent Nilfs development code, and am
finding things to be very stable.
Best regards,
Mark T.
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 24+ messages in thread

* Re: Does nilfs2 do any in-place writes?
@ 2014-01-16 19:40 Mark Trumpold
0 siblings, 0 replies; 24+ messages in thread
From: Mark Trumpold @ 2014-01-16 19:40 UTC (permalink / raw)
To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
> -----Original Message-----
> From: linux-nilfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-nilfs-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Clemens Eisserer
> Sent: Thursday, January 16, 2014 10:42 AM
> To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: [PossibleSpam] Re: Does nilfs2 do any in-place writes?
>
> Hi Mark,
>
> > I am wondering what impact in-place writes of the
> > superblock have on SSDs in terms of wear.
>
> Typically SSDs have far more advanced static wear leveling algorithms,
> which keep an erase count for each erase block as well as a
> sophisticated mapping table. Otherwise, e.g., journaling file systems
> would probably kill them quickly.
>
> Regards, Clemens
> --
Hi Clemens,
Thank you for the info. That was my prior understanding; however,
I thought it curious that the SSD failure cited Nilfs trying to access
the superblock, which had failed at the media level.
It was a fairly high-end SSD with TRIM, etc. (Corsair Force 240GB).
Working with the vendor to analyze further.
Thanks again,
Mark T.
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 24+ messages in thread

* Re: Does nilfs2 do any in-place writes?
@ 2014-01-16 17:48 Mark Trumpold
  2014-01-16 18:41 ` Clemens Eisserer
  2014-01-17  6:31 ` Vyacheslav Dubeyko
  0 siblings, 2 replies; 24+ messages in thread
From: Mark Trumpold @ 2014-01-16 17:48 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hello All,

I am wondering what impact in-place writes of the
superblock have on SSDs in terms of wear.

I've been stress-testing our system, which uses Nilfs, and
recently had an SSD fail with the classic messages indicating
low-level media problems -- and also implicating Nilfs as trying
to locate a superblock (I think).

Following is a partial dmesg list:

[ 7.630382] Sense Key : Medium Error [current] [descriptor]
[ 7.630385] Descriptor sense data with sense descriptors (in hex):
[ 7.630386] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 7.630394] 05 ff 0e 58
[ 7.630397] sd 0:0:0:0: [sda]
[ 7.630399] Add. Sense: Unrecovered read error - auto reallocate failed
[ 7.630401] sd 0:0:0:0: [sda] CDB:
[ 7.630402] Read(10): 28 00 05 ff 0e 54 00 00 08 00
[ 7.630409] end_request: I/O error, dev sda, sector 100601432
[ 7.635326] NILFS warning: I/O error on loading last segment
[ 7.635329] NILFS: error searching super root.

Best regards,
Mark T.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
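A quick way to check where that failed read landed: in a SCSI Read(10)
CDB, bytes 2-5 hold the big-endian starting LBA and bytes 7-8 the
transfer length. A minimal standalone sketch of the decoding, with the
hex values copied from the dmesg output above (everything else is
illustrative):

/* Decode the Read(10) CDB quoted in the dmesg output above. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
        const uint8_t cdb[10] = { 0x28, 0x00, 0x05, 0xff, 0x0e, 0x54,
                                  0x00, 0x00, 0x08, 0x00 };
        uint32_t lba = (uint32_t)cdb[2] << 24 | (uint32_t)cdb[3] << 16 |
                       (uint32_t)cdb[4] << 8  | cdb[5];
        uint16_t len = (uint16_t)cdb[7] << 8 | cdb[8];

        /* Prints "8 sectors at LBA 100601428": the request spans
         * sectors 100601428..100601435, which contains the reported
         * bad sector 100601432 (0x05ff0e58 in the sense data). */
        printf("%u sectors at LBA %u\n", (unsigned)len, (unsigned)lba);
        return 0;
}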
* Re: Does nilfs2 do any in-place writes?
  2014-01-16 17:48 Mark Trumpold
@ 2014-01-16 18:41 ` Clemens Eisserer
  2014-01-17  6:31 ` Vyacheslav Dubeyko
  1 sibling, 0 replies; 24+ messages in thread
From: Clemens Eisserer @ 2014-01-16 18:41 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Mark,

> I am wondering what impact in-place writes of the
> superblock have on SSDs in terms of wear.

Typically SSDs have far more advanced static wear leveling algorithms,
which keep an erase count for each erase block as well as a
sophisticated mapping table. Otherwise, e.g., journaling file systems
would probably kill them quickly.

Regards, Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Does nilfs2 do any in-place writes?
  2014-01-16 17:48 Mark Trumpold
  2014-01-16 18:41 ` Clemens Eisserer
@ 2014-01-17  6:31 ` Vyacheslav Dubeyko
  2014-01-18  1:47 ` Ryusuke Konishi
  1 sibling, 1 reply; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-17 6:31 UTC (permalink / raw)
  To: Mark Trumpold; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2014-01-16 at 17:48 +0000, Mark Trumpold wrote:
> Hello All,
>
> I am wondering what impact in-place writes of the
> superblock have on SSDs in terms of wear.
>
> I've been stress-testing our system, which uses Nilfs, and
> recently had an SSD fail with the classic messages indicating
> low-level media problems -- and also implicating Nilfs as trying
> to locate a superblock (I think).
>
> Following is a partial dmesg list:
>
> [ 7.630382] Sense Key : Medium Error [current] [descriptor]
> [ 7.630385] Descriptor sense data with sense descriptors (in hex):
> [ 7.630386] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [ 7.630394] 05 ff 0e 58
> [ 7.630397] sd 0:0:0:0: [sda]
> [ 7.630399] Add. Sense: Unrecovered read error - auto reallocate failed
> [ 7.630401] sd 0:0:0:0: [sda] CDB:
> [ 7.630402] Read(10): 28 00 05 ff 0e 54 00 00 08 00
> [ 7.630409] end_request: I/O error, dev sda, sector 100601432
> [ 7.635326] NILFS warning: I/O error on loading last segment
> [ 7.635329] NILFS: error searching super root.
>
>

I don't think this issue is related to the superblocks, because I can't
see the NILFS2 magic signature in your output. For example, I have
these first 16 bytes in a superblock:

00000400 02 00 00 00 00 00 34 34 18 01 00 00 52 85 db 71 |......44....R..q|

Of course, I don't know your partition table details, but I doubt that
sector 100601432 is a superblock sector. Moreover, your error messages
report trouble loading the last segment during the super root search.

NILFS2 has only two blocks that live under an in-place update policy,
and their update frequency is not very high. So I suppose any FTL can
easily provide good wear leveling for the superblocks. But, of course,
in-place updates are not a good policy for flash-based devices anyway.

Maybe I misunderstand something in your output, but I suppose that
during stress testing you can discover an I/O error in any part of the
volume, because it is really hard to predict when you will exhaust the
spare pool of erase blocks.

With the best regards,
Vyacheslav Dubeyko.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
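Vyacheslav's doubt about the sector number is easy to double-check. The
offset macros below follow the definitions in include/linux/nilfs2_fs.h;
the 512-byte logical sector size, the 240 GB device size, and a
partition starting at LBA 0 are all assumptions (the partition table is
unknown here). A back-of-the-envelope sketch:

/* Where do the two NILFS2 superblocks sit, and how far away is the
 * failing sector?  Offsets per include/linux/nilfs2_fs.h; device
 * size, sector size and partition start are assumed. */
#include <stdio.h>
#include <stdint.h>

#define NILFS_SB_OFFSET_BYTES 1024ULL
#define NILFS_SB2_OFFSET_BYTES(devsize) \
        ((((devsize) >> 12) - 1) << 12)

int main(void)
{
        uint64_t devsize = 240000000000ULL;  /* assumed 240 GB drive */
        uint64_t bad = 100601432ULL * 512;   /* ~48 GiB into the disk */

        printf("sb1 at byte %llu, sb2 at byte %llu, bad sector at byte %llu\n",
               (unsigned long long)NILFS_SB_OFFSET_BYTES,
               (unsigned long long)NILFS_SB2_OFFSET_BYTES(devsize),
               (unsigned long long)bad);
        /* The failing sector lies tens of gigabytes from either
         * superblock, consistent with the assessment above. */
        return 0;
}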
* Re: Does nilfs2 do any in-place writes?
  2014-01-17  6:31 ` Vyacheslav Dubeyko
@ 2014-01-18  1:47 ` Ryusuke Konishi
       [not found]   ` <20140118.104703.356941870.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Ryusuke Konishi @ 2014-01-18 1:47 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: Mark Trumpold, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Fri, 17 Jan 2014 10:31:55 +0400, Vyacheslav Dubeyko wrote:
> On Thu, 2014-01-16 at 17:48 +0000, Mark Trumpold wrote:
>> Hello All,
>>
>> I am wondering what impact in-place writes of the
>> superblock have on SSDs in terms of wear.
>>
>> I've been stress-testing our system, which uses Nilfs, and
>> recently had an SSD fail with the classic messages indicating
>> low-level media problems -- and also implicating Nilfs as trying
>> to locate a superblock (I think).
>>
>> Following is a partial dmesg list:
>>
>> [ 7.630382] Sense Key : Medium Error [current] [descriptor]
>> [ 7.630385] Descriptor sense data with sense descriptors (in hex):
>> [ 7.630386] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
>> [ 7.630394] 05 ff 0e 58
>> [ 7.630397] sd 0:0:0:0: [sda]
>> [ 7.630399] Add. Sense: Unrecovered read error - auto reallocate failed
>> [ 7.630401] sd 0:0:0:0: [sda] CDB:
>> [ 7.630402] Read(10): 28 00 05 ff 0e 54 00 00 08 00
>> [ 7.630409] end_request: I/O error, dev sda, sector 100601432
>> [ 7.635326] NILFS warning: I/O error on loading last segment
>> [ 7.635329] NILFS: error searching super root.
>>
>>
>
> I don't think this issue is related to the superblocks, because I can't
> see the NILFS2 magic signature in your output. For example, I have
> these first 16 bytes in a superblock:
>
> 00000400 02 00 00 00 00 00 34 34 18 01 00 00 52 85 db 71 |......44....R..q|
>
> Of course, I don't know your partition table details, but I doubt that
> sector 100601432 is a superblock sector. Moreover, your error messages
> report trouble loading the last segment during the super root search.
>
> NILFS2 has only two blocks that live under an in-place update policy,
> and their update frequency is not very high. So I suppose any FTL can
> easily provide good wear leveling for the superblocks. But, of course,
> in-place updates are not a good policy for flash-based devices anyway.
>
> Maybe I misunderstand something in your output, but I suppose that
> during stress testing you can discover an I/O error in any part of the
> volume, because it is really hard to predict when you will exhaust the
> spare pool of erase blocks.

Rather, the issue on the flash devices may come from the current
immature garbage collection algorithm. The current cleanerd only
supports the timestamp-based GC policy, which always tries to move the
oldest segment first and even moves segments full of live blocks,
thereby shortening the lifetime of flash devices. :-(

Actually, this is a high-priority todo, and now I am inclined to
consider it together with the group concept of segments.

Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20140118.104703.356941870.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found] ` <20140118.104703.356941870.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-01-18  9:44 ` Clemens Eisserer
       [not found]   ` <CAFvQSYQZtf0fsfX_7zNHdw4hVo9VHggN9F0TYEi1Fwo2ZvS4Ng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2014-01-18 11:45 ` Andreas Rohner
  1 sibling, 1 reply; 24+ messages in thread
From: Clemens Eisserer @ 2014-01-18 9:44 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi again,

> Rather, the issue on the flash devices may come from the current
> immature garbage collection algorithm. The current cleanerd only
> supports the timestamp-based GC policy, which always tries to move the
> oldest segment first and even moves segments full of live blocks,
> thereby shortening the lifetime of flash devices. :-(

It depends - for SSDs the timestamp policy is not optimal, as it leads
to unnecessary writes.

On the other hand, most cards only implement dynamic wear leveling
(wear leveling only takes place for areas that are written to, which
leads to very uneven wear distribution when there is mostly static
data) and also don't have read-disturb handling.
So for cards it is actually helpful to have the writes spread out
evenly, and as a bonus there is no need to worry about read-disturb
effects =)

Regards, Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <CAFvQSYQZtf0fsfX_7zNHdw4hVo9VHggN9F0TYEi1Fwo2ZvS4Ng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]   ` <CAFvQSYQZtf0fsfX_7zNHdw4hVo9VHggN9F0TYEi1Fwo2ZvS4Ng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-18 16:25 ` Mark Trumpold
       [not found]   ` <CEFFE8EC.9A4A%markt-qk0wvQ0ghJwAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Trumpold @ 2014-01-18 16:25 UTC (permalink / raw)
  To: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 1/18/14 1:44 AM, "Clemens Eisserer" <linuxhippy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

>Hi again,
>
>> Rather, the issue on the flash devices may come from the current
>> immature garbage collection algorithm. The current cleanerd only
>> supports the timestamp-based GC policy, which always tries to move the
>> oldest segment first and even moves segments full of live blocks,
>> thereby shortening the lifetime of flash devices. :-(
>
>It depends - for SSDs the timestamp policy is not optimal, as it leads
>to unnecessary writes.
>
>On the other hand, most cards only implement dynamic wear leveling
>(wear leveling only takes place for areas that are written to, which
>leads to very uneven wear distribution when there is mostly static
>data) and also don't have read-disturb handling.
>So for cards it is actually helpful to have the writes spread out
>evenly, and as a bonus there is no need to worry about read-disturb
>effects =)
>
>Regards, Clemens
>--
>To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
>the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html

Hi Clemens and group,

Good information. So, is it true that the logging/COW nature of
Nilfs actually improves wear leveling by having 'writes spread out
evenly'?

Regards,
Mark T.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <CEFFE8EC.9A4A%markt-qk0wvQ0ghJwAvxtiuMwx3w@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]     ` <CEFFE8EC.9A4A%markt-qk0wvQ0ghJwAvxtiuMwx3w@public.gmane.org>
@ 2014-01-18 18:11 ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-18 18:11 UTC (permalink / raw)
  To: Mark Trumpold
  Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Jan 18, 2014, at 7:25 PM, Mark Trumpold wrote:

> Hi Clemens and group,
>
> Good information. So, is it true that the logging/COW nature of
> Nilfs actually improves wear leveling by having 'writes spread out
> evenly'?

Of course, a COW policy is much better for flash than in-place updates.
But even if NILFS2 (or any other file system) uses a COW approach, it
is still the FTL that does the wear leveling, and it is the FTL's
algorithms that define the wear-leveling efficiency. So I don't think
NILFS2 itself can improve wear leveling.

A COW policy can keep the FTL "cold" and give it the opportunity not to
use its sophisticated wear-leveling algorithms and its spare pool of
erase blocks. Moreover, it is very desirable to use the TRIM command
together with a COW policy to improve FTL efficiency and performance.

With the best regards,
Vyacheslav Dubeyko.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Does nilfs2 do any in-place writes?
       [not found] ` <20140118.104703.356941870.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  2014-01-18  9:44 ` Clemens Eisserer
@ 2014-01-18 11:45 ` Andreas Rohner
       [not found]   ` <52DA696D.6010206-hi6Y0CQ0nG0@public.gmane.org>
  1 sibling, 1 reply; 24+ messages in thread
From: Andreas Rohner @ 2014-01-18 11:45 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs

On 2014-01-18 02:47, Ryusuke Konishi wrote:
> On Fri, 17 Jan 2014 10:31:55 +0400, Vyacheslav Dubeyko wrote:
>> On Thu, 2014-01-16 at 17:48 +0000, Mark Trumpold wrote:
>>> Hello All,
>>>
>>> I am wondering what impact in-place writes of the
>>> superblock have on SSDs in terms of wear.
>>>
>>> I've been stress-testing our system, which uses Nilfs, and
>>> recently had an SSD fail with the classic messages indicating
>>> low-level media problems -- and also implicating Nilfs as trying
>>> to locate a superblock (I think).
>>>
>>> Following is a partial dmesg list:
>>>
>>> [ 7.630382] Sense Key : Medium Error [current] [descriptor]
>>> [ 7.630385] Descriptor sense data with sense descriptors (in hex):
>>> [ 7.630386] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
>>> [ 7.630394] 05 ff 0e 58
>>> [ 7.630397] sd 0:0:0:0: [sda]
>>> [ 7.630399] Add. Sense: Unrecovered read error - auto reallocate failed
>>> [ 7.630401] sd 0:0:0:0: [sda] CDB:
>>> [ 7.630402] Read(10): 28 00 05 ff 0e 54 00 00 08 00
>>> [ 7.630409] end_request: I/O error, dev sda, sector 100601432
>>> [ 7.635326] NILFS warning: I/O error on loading last segment
>>> [ 7.635329] NILFS: error searching super root.
>>>
>>>
>>
>> I don't think this issue is related to the superblocks, because I can't
>> see the NILFS2 magic signature in your output. For example, I have
>> these first 16 bytes in a superblock:
>>
>> 00000400 02 00 00 00 00 00 34 34 18 01 00 00 52 85 db 71 |......44....R..q|
>>
>> Of course, I don't know your partition table details, but I doubt that
>> sector 100601432 is a superblock sector. Moreover, your error messages
>> report trouble loading the last segment during the super root search.
>>
>> NILFS2 has only two blocks that live under an in-place update policy,
>> and their update frequency is not very high. So I suppose any FTL can
>> easily provide good wear leveling for the superblocks. But, of course,
>> in-place updates are not a good policy for flash-based devices anyway.
>>
>> Maybe I misunderstand something in your output, but I suppose that
>> during stress testing you can discover an I/O error in any part of the
>> volume, because it is really hard to predict when you will exhaust the
>> spare pool of erase blocks.
>
> Rather, the issue on the flash devices may come from the current
> immature garbage collection algorithm. The current cleanerd only
> supports the timestamp-based GC policy, which always tries to move the
> oldest segment first and even moves segments full of live blocks,
> thereby shortening the lifetime of flash devices. :-(
>
> Actually, this is a high-priority todo, and now I am inclined to
> consider it together with the group concept of segments.

Hi,

I am currently working on the garbage collector. I have implemented the
cost-benefit and greedy policies. It is quite a big change, and I was
reluctant to submit a patch until I had thoroughly tested it. I have
substantially redesigned it since the last time I wrote about it on the
mailing list. Now it seems to be very stable and the results are quite
promising.

The following results [1] are from my "ultimate" benchmark. It runs on
an AMD Phenom II X6 1090T processor with 8 GB RAM and a Samsung SSD 840
with a 100GB partition for NILFS2. I used the Lair62 NFS traces from
the IOTTA Repository [2] to get a realistic and reproducible benchmark.

This is what the benchmark does:

1.  Create a 20GB file of static data
2a. Start replaying the Lair62 NFS traces
2b. In parallel, turn random checkpoints into snapshots every 5
    minutes, keep a list of the snapshots, and turn them back into
    checkpoints after 15 minutes, so there are at most 3 snapshots
    present at the same time.

Timestamp is so slow because it needlessly copies the 20GB of static
data around over and over again, which can be seen in the periodic
drops in performance. The other policies ignore the static data and
never move it. This is also evident if you compare the amount of data
written to the device [3] (compare /proc/diskstats before and after the
benchmark).

If you are interested, I could clean up my code and submit a patch set
for review. I am sure there are lots of things that need to be changed,
but maybe it can give you some ideas...

It would also be possible to improve timestamp by allowing the cleaner
to abort if there is nothing to gain from cleaning a particular
segment. Instead it could just update su_lastmod in the SUFILE without
doing anything else. This would be a fairly simple change. I could
provide a patch for that too.

Regards,
Andreas Rohner

[1] https://www.dropbox.com/s/3ued8g5xaktnpbq/replay_parallel_ssd_line.pdf
[2] http://iotta.snia.org/historical_section?tracetype_id=2
[3] https://www.dropbox.com/s/nwfixlzzzvf93v2/replay_parallel_stats_write.pdf

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
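For readers unfamiliar with the two policies Andreas names: greedy
simply picks the segment with the fewest live blocks, while cost-benefit
weighs reclaimable space against segment age so that cold, mostly-live
segments are left alone. A minimal sketch of the classic cost-benefit
selection from the original LFS paper (Rosenblum and Ousterhout) -- an
illustration of the idea, not Andreas's actual implementation:

/* Classic LFS cost-benefit victim selection: maximize
 *   benefit/cost = (1 - u) * age / (1 + u)
 * where u is the live-block ratio of a segment. */
#include <stddef.h>

struct seg_usage {
        double u;    /* fraction of live blocks, 0.0 .. 1.0 */
        double age;  /* time since last modification, seconds */
};

static double cost_benefit(const struct seg_usage *s)
{
        return (1.0 - s->u) * s->age / (1.0 + s->u);
}

static size_t pick_victim(const struct seg_usage *segs, size_t n)
{
        size_t best = 0;

        for (size_t i = 1; i < n; i++)
                if (cost_benefit(&segs[i]) > cost_benefit(&segs[best]))
                        best = i;
        return best;
}

A segment that is 100% live (u = 1.0) scores zero no matter how old it
is, which is exactly why these policies leave the 20GB of static data
untouched where the timestamp policy keeps copying it.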
[parent not found: <52DA696D.6010206-hi6Y0CQ0nG0@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]   ` <52DA696D.6010206-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-01-18 23:08 ` Vyacheslav Dubeyko
       [not found]   ` <04877EE1-F5BF-41CE-AC92-CD9C3ED0B8A4-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-18 23:08 UTC (permalink / raw)
  To: Andreas Rohner; +Cc: Ryusuke Konishi, linux-nilfs

On Jan 18, 2014, at 2:45 PM, Andreas Rohner wrote:

> If you are interested, I could clean up my code and submit a patch set
> for review. I am sure there are lots of things that need to be changed,
> but maybe it can give you some ideas...
>
> It would also be possible to improve timestamp by allowing the cleaner
> to abort if there is nothing to gain from cleaning a particular
> segment. Instead it could just update su_lastmod in the SUFILE without
> doing anything else. This would be a fairly simple change. I could
> provide a patch for that too.
>

I think it is very desirable to share patches for review at an early
stage, because it is possible to achieve valuable results by means of
an open and continuous discussion. So, you are welcome to share your
vision and your patches.

As I remember, I made many remarks about your approach and your code
last time. So, I hope that you have reworked your approach
significantly.

> Regards,
> Andreas Rohner
>
> [1] https://www.dropbox.com/s/3ued8g5xaktnpbq/replay_parallel_ssd_line.pdf

To be honest, I completely fail to understand this diagram. It is hard
to understand without additional description, from my point of view.

> [2] http://iotta.snia.org/historical_section?tracetype_id=2
> [3] https://www.dropbox.com/s/nwfixlzzzvf93v2/replay_parallel_stats_write.pdf

Thanks,
Vyacheslav Dubeyko.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <04877EE1-F5BF-41CE-AC92-CD9C3ED0B8A4-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]     ` <04877EE1-F5BF-41CE-AC92-CD9C3ED0B8A4-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2014-01-18 23:08 ` Andreas Rohner
       [not found]   ` <52DB098A.4010300-hi6Y0CQ0nG0@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Andreas Rohner @ 2014-01-18 23:08 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs

On 2014-01-19 00:08, Vyacheslav Dubeyko wrote:
> I think it is very desirable to share patches for review at an early
> stage, because it is possible to achieve valuable results by means of
> an open and continuous discussion. So, you are welcome to share your
> vision and your patches.

Good, I will prepare my patches.

> As I remember, I made many remarks about your approach and your code
> last time. So, I hope that you have reworked your approach
> significantly.

Yes, it is basically a complete rewrite.

>> [1] https://www.dropbox.com/s/3ued8g5xaktnpbq/replay_parallel_ssd_line.pdf
>
> To be honest, I completely fail to understand this diagram. It is hard
> to understand without additional description, from my point of view.

Yes, you are probably right about that. It is generated from a GC log
file. For every GC operation I print out the number of live blocks. For
example, if the GC cleans a segment and 90% of the blocks in it are
live, then 90% of the blocks need to be moved to a new segment. Moving
blocks is undesirable, and therefore I call that "inefficient". In this
example the efficiency would be 10%. So 10% would then become one point
in the graph at time 0. The goal of every cleaning policy is to find
segments with as few live blocks as possible. So efficiency is
basically the percentage of dead blocks. Maybe I should label it
differently...

The vertical dashed lines mark the time when the benchmark finished.
The GC still runs on for some time after that, until it reaches the
max_clean_segments threshold.

The graph shows that the cost-benefit and greedy policies are better at
finding segments with a lot of dead blocks. So fewer blocks need to be
moved to new segments, and the benchmark finishes in less than half the
time.

Best regards,
Andreas Rohner

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
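In other words, the plotted "efficiency" for one cleaning pass is just
the dead-block fraction of the cleaned segment. A one-function sketch
of the metric as described (the names are illustrative, not from the
actual log-analysis script):

/* "Efficiency" of cleaning one segment, as plotted in [1]: the
 * percentage of blocks that did NOT have to be moved. */
static double gc_efficiency(unsigned long live_blocks,
                            unsigned long blocks_per_segment)
{
        return 100.0 * (double)(blocks_per_segment - live_blocks) /
               (double)blocks_per_segment;
}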
[parent not found: <52DB098A.4010300-hi6Y0CQ0nG0@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]   ` <52DB098A.4010300-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-01-19  5:43 ` Ryusuke Konishi
       [not found]   ` <20140119.144345.373615211.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Ryusuke Konishi @ 2014-01-19 5:43 UTC (permalink / raw)
  To: Andreas Rohner; +Cc: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Sun, 19 Jan 2014 00:08:58 +0100, Andreas Rohner wrote:
> On 2014-01-19 00:08, Vyacheslav Dubeyko wrote:
>> I think it is very desirable to share patches for review at an early
>> stage, because it is possible to achieve valuable results by means of
>> an open and continuous discussion. So, you are welcome to share your
>> vision and your patches.
>
> Good, I will prepare my patches.

Before submitting patches, please test them with scripts/checkpatch.pl
in the latest nilfs-utils.git. This would help to reduce coding style
issues, and we will be able to concentrate on the design and
implementation of the patches.

If your patchset affects the compatibility of existing code, it would
be helpful if you clarified the impact in terms of disk format
compatibility, ioctl interface compatibility, library compatibility,
and CUI compatibility.

I expect that your work makes progress on the GC issue, adding improved
algorithms (the cost-benefit algorithm, the greedy algorithm, and
others) to cleanerd.

Thanks,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20140119.144345.373615211.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]     ` <20140119.144345.373615211.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-01-19 14:11 ` Andreas Rohner
  0 siblings, 0 replies; 24+ messages in thread
From: Andreas Rohner @ 2014-01-19 14:11 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On 2014-01-19 06:43, Ryusuke Konishi wrote:
> On Sun, 19 Jan 2014 00:08:58 +0100, Andreas Rohner wrote:
>> On 2014-01-19 00:08, Vyacheslav Dubeyko wrote:
>>> I think it is very desirable to share patches for review at an early
>>> stage, because it is possible to achieve valuable results by means of
>>> an open and continuous discussion. So, you are welcome to share your
>>> vision and your patches.
>>
>> Good, I will prepare my patches.
>
> Before submitting patches, please test them with scripts/checkpatch.pl
> in the latest nilfs-utils.git. This would help to reduce coding style
> issues, and we will be able to concentrate on the design and
> implementation of the patches.

I used scripts/checkpatch.pl, and I hope everything is correct now.

> If your patchset affects the compatibility of existing code, it would
> be helpful if you clarified the impact in terms of disk format
> compatibility, ioctl interface compatibility, library compatibility,
> and CUI compatibility.

Yes, there are some compatibility issues with the implementation of a
counter for segment usage tracking.

> I expect that your work makes progress on the GC issue, adding improved
> algorithms (the cost-benefit algorithm, the greedy algorithm, and
> others) to cleanerd.

Yes, that's right, although the patch set I submitted today has nothing
to do with cost-benefit and greedy. I decided to first try and submit a
smaller contribution with no compatibility issues.

Best regards,
Andreas Rohner
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
* Does nilfs2 do any in-place writes?
@ 2014-01-15 10:44 Clemens Eisserer
[not found] ` <CAFvQSYSzpX_WpUi9KpGj0pZvzhw2mfzzOqcgdj9ripXAjipmtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 24+ messages in thread
From: Clemens Eisserer @ 2014-01-15 10:44 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi,
Recently my Raspberry Pi destroyed a 32GB SD card after only 4 days,
because that cheap SD card seemed to have issues with wear leveling.
The areas where the ext4 journal was stored were no longer readable or
writeable.
I wonder which write-access patterns nilfs2 exhibits.
Are there any frequent in-place updates to statically positioned data
structures (superblock, translation tables, ...) or is the data mostly
written sequentially?
Thank you in advance, Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 24+ messages in thread

[parent not found: <CAFvQSYSzpX_WpUi9KpGj0pZvzhw2mfzzOqcgdj9ripXAjipmtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found] ` <CAFvQSYSzpX_WpUi9KpGj0pZvzhw2mfzzOqcgdj9ripXAjipmtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-15 10:52 ` Vyacheslav Dubeyko
  2014-01-15 11:44 ` Clemens Eisserer
  0 siblings, 1 reply; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-15 10:52 UTC (permalink / raw)
  To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2014-01-15 at 11:44 +0100, Clemens Eisserer wrote:
> Hi,
>
> Recently my Raspberry Pi destroyed a 32GB SD card after only 4 days,
> because that cheap SD card seemed to have issues with wear leveling.
> The areas where the ext4 journal was stored were no longer readable or
> writeable.
>
> I wonder which write-access patterns nilfs2 exhibits.
> Are there any frequent in-place updates to statically positioned data
> structures (superblock, translation tables, ...) or is the data mostly
> written sequentially?
>

The main approach of NILFS2 is a COW (copy-on-write) policy. It means
that all data and metadata are written in a log manner. Only the
superblocks are placed in fixed positions and updated there. The first
superblock is located at the beginning of the volume, the second one at
the end.

With the best regards,
Vyacheslav Dubeyko.

> Thank you in advance, Clemens
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Does nilfs2 do any in-place writes?
  2014-01-15 10:52 ` Vyacheslav Dubeyko
@ 2014-01-15 11:44 ` Clemens Eisserer
       [not found]   ` <CAFvQSYTG6HBVc9iodYyvCejwf889jiwOPsVb1Hi8cDrR9pOGeg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Clemens Eisserer @ 2014-01-15 11:44 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

> The main approach of NILFS2 is a COW (copy-on-write) policy. It means
> that all data and metadata are written in a log manner. Only the
> superblocks are placed in fixed positions and updated there. The first
> superblock is located at the beginning of the volume, the second one at
> the end.

Can you give me an estimate of how often the superblock is updated /
written to?

Thanks a lot, Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <CAFvQSYTG6HBVc9iodYyvCejwf889jiwOPsVb1Hi8cDrR9pOGeg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]   ` <CAFvQSYTG6HBVc9iodYyvCejwf889jiwOPsVb1Hi8cDrR9pOGeg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-15 12:01 ` Vyacheslav Dubeyko
  2014-01-15 15:23 ` Ryusuke Konishi
  2014-01-16 10:03 ` Clemens Eisserer
  0 siblings, 2 replies; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-15 12:01 UTC (permalink / raw)
  To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Clemens,

On Wed, 2014-01-15 at 12:44 +0100, Clemens Eisserer wrote:
> Hi Vyacheslav,
>
> > The main approach of NILFS2 is a COW (copy-on-write) policy. It means
> > that all data and metadata are written in a log manner. Only the
> > superblocks are placed in fixed positions and updated there. The first
> > superblock is located at the beginning of the volume, the second one at
> > the end.
>
> Can you give me an estimate of how often the superblock is updated /
> written to?
>

NILFS2 has a special method, nilfs_sb_need_update() [1], and a special
constant, NILFS_SB_FREQ [2], that are used to define the frequency of
superblock updates. So, as far as I can judge, the default value of
that frequency under high I/O load is 10 seconds (the minimum interval
of periodical superblock updates, in seconds).

With the best regards,
Vyacheslav Dubeyko.

[1] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L254
[2] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L252

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
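The logic behind [1] boils down to a timestamp comparison: rewrite the
superblocks if the filesystem state is dirty and more than NILFS_SB_FREQ
seconds have passed since the last superblock write. A userspace
paraphrase of that check (the kernel version operates on struct
the_nilfs and its ns_sbwtime field; see [1] for the authoritative code):

/* Userspace paraphrase of the superblock-update rate limit. */
#include <time.h>

#define NILFS_SB_FREQ 10  /* seconds */

static int sb_need_update(time_t last_sb_write)
{
        time_t t = time(NULL);

        /* Also rewrite if the clock jumped backwards. */
        return t < last_sb_write || t > last_sb_write + NILFS_SB_FREQ;
}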
* Re: Does nilfs2 do any in-place writes?
  2014-01-15 12:01 ` Vyacheslav Dubeyko
@ 2014-01-15 15:23 ` Ryusuke Konishi
       [not found]   ` <20140116.002353.94325733.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  2014-01-16 10:03 ` Clemens Eisserer
  1 sibling, 1 reply; 24+ messages in thread
From: Ryusuke Konishi @ 2014-01-15 15:23 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

On Wed, 15 Jan 2014 16:01:44 +0400, Vyacheslav Dubeyko wrote:
> Hi Clemens,
>
> On Wed, 2014-01-15 at 12:44 +0100, Clemens Eisserer wrote:
>> Hi Vyacheslav,
>>
>> > The main approach of NILFS2 is a COW (copy-on-write) policy. It means
>> > that all data and metadata are written in a log manner. Only the
>> > superblocks are placed in fixed positions and updated there. The first
>> > superblock is located at the beginning of the volume, the second one at
>> > the end.
>>
>> Can you give me an estimate of how often the superblock is updated /
>> written to?
>>
>
> NILFS2 has a special method, nilfs_sb_need_update() [1], and a special
> constant, NILFS_SB_FREQ [2], that are used to define the frequency of
> superblock updates. So, as far as I can judge, the default value of
> that frequency under high I/O load is 10 seconds (the minimum interval
> of periodical superblock updates, in seconds).
>
> With the best regards,
> Vyacheslav Dubeyko.
>
> [1] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L254
> [2] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L252

By the way, do you have any nice idea to stop the periodical update of
superblocks?

Most information on the superblock is static (layout information or
so).

sbp->s_state has an ERROR state bit and a VALID state bit, but these
bits are mostly static.

sbp->s_free_blocks_count keeps the free block count at the time, but I
think this information is not important because it can be calculated
from the number of clean segments.

We need the periodical in-place superblock write only for updating a
pointer to the latest log. And this will be eliminable if we can invent
a fast way to determine the latest log.

Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20140116.002353.94325733.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]   ` <20140116.002353.94325733.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-01-16 10:08 ` Vyacheslav Dubeyko
  2014-01-17 22:55 ` Ryusuke Konishi
  2014-01-18  0:00 ` Ryusuke Konishi
  0 siblings, 2 replies; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-16 10:08 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Ryusuke,

On Thu, 2014-01-16 at 00:23 +0900, Ryusuke Konishi wrote:
>
> By the way, do you have any nice idea to stop the periodical update of
> superblocks?
>

Yes, I also think such a suggestion is valuable for NILFS2. But I
suppose the problem is more complex. I mean the situation with writable
snapshots. If we have writable snapshots, it means we need independent
versions of some superblock fields (s_last_cno, s_last_pseg,
s_last_seq, s_mtime, s_wtime, s_mnt_count, s_state, s_c_interval,
s_feature_compat_ro, s_feature_incompat). For example, a snapshot can
be made before xafile creation on a volume, and the writable snapshot
should continue to live without the possibility of xattr creation, and
so on.

> Most information on the superblock is static (layout information or
> so).
>
> sbp->s_state has an ERROR state bit and a VALID state bit, but these
> bits are mostly static.
>
> sbp->s_free_blocks_count keeps the free block count at the time, but I
> think this information is not important because it can be calculated
> from the number of clean segments.
>
> We need the periodical in-place superblock write only for updating a
> pointer to the latest log. And this will be eliminable if we can invent
> a fast way to determine the latest log.
>

As far as I can see, we have more changeable fields in the superblock.
But, of course, it is possible to leave only static information in the
superblock. I assume it makes sense to move the changeable superblock
fields into the super root metadata structure. In such a way we can
provide independent sets of the above-mentioned changeable superblock
fields for every snapshot/checkpoint.

I suppose another problem is having an unchangeable superblock (4 KB)
with a changeable segment area right after it. I think this can be an
FTL-unfriendly situation in the flash case. Maybe it makes sense to
have specially reserved areas (with a size equal to the segment size)
at the beginning and at the end of a NILFS2 volume. These areas can be
used for special metadata structures that are modified on the COW
principle inside the reserved areas. Anyway, we need some metadata
structure on the volume (a tree or something else) that can give
information about the latest log by snapshot number.

So, currently, I have no clear vision of a possible good approach for
an efficient search for the latest log. But it needs to take possible
GC policy changes into account, because a more efficient way of garbage
collection may leave "cold" segments untouched (such a full segment can
contain valid and actual data). As a result, the chain of linked
partial segments on the volume can become more sophisticated and not a
chain of sibling segments. Thereby, the search for the latest log may
not be such a simple and fast operation under the current search
algorithm, I think.

I think the [snapshot number | latest log] tree can be restricted to
one file system block (4 KB). So one possible way is to save changes to
such a tree in a journal-like circular log which keeps a chain of
modified blocks, for example. Maybe it makes sense to keep a pair of
values [current last log; upper bound of the last log]. The upper bound
can be a prediction of where the last log will be after some time.
Thereby, such a tree and the superblock(s) can live in the reserved
areas at the volume's beginning and end.

As you can see, I am not suggesting anything concrete yet. I need to
think it over more deeply.

With the best regards,
Vyacheslav Dubeyko.
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Does nilfs2 do any in-place writes?
  2014-01-16 10:08 ` Vyacheslav Dubeyko
@ 2014-01-17 22:55 ` Ryusuke Konishi
  0 siblings, 0 replies; 24+ messages in thread
From: Ryusuke Konishi @ 2014-01-17 22:55 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

On Thu, 16 Jan 2014 14:08:32 +0400, Vyacheslav Dubeyko wrote:
> Hi Ryusuke,
>
> On Thu, 2014-01-16 at 00:23 +0900, Ryusuke Konishi wrote:
>>
>> By the way, do you have any nice idea to stop the periodical update of
>> superblocks?
>>
>
> Yes, I also think such a suggestion is valuable for NILFS2. But I
> suppose the problem is more complex. I mean the situation with writable
> snapshots. If we have writable snapshots, it means we need independent
> versions of some superblock fields (s_last_cno, s_last_pseg,
> s_last_seq, s_mtime, s_wtime, s_mnt_count, s_state, s_c_interval,
> s_feature_compat_ro, s_feature_incompat). For example, a snapshot can
> be made before xafile creation on a volume, and the writable snapshot
> should continue to live without the possibility of xattr creation, and
> so on.

OK, please tell me what you suppose about writable snapshots. Do you
think we should keep multiple branches or concurrently mountable
namespaces on one device?

I prefer to maintain only one super root block per partition even if we
support writable snapshots. Otherwise, I think we should use multiple
partitions to simplify the design. I mean keeping multiple branches in
one super root block with one DAT file and one sufile in such a case.
Maintaining multiple DAT files and sufiles on one device seems too
complex to me.

Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Does nilfs2 do any in-place writes?
  2014-01-16 10:08 ` Vyacheslav Dubeyko
  2014-01-17 22:55 ` Ryusuke Konishi
@ 2014-01-18  0:00 ` Ryusuke Konishi
  1 sibling, 0 replies; 24+ messages in thread
From: Ryusuke Konishi @ 2014-01-18 0:00 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 16 Jan 2014 14:08:32 +0400, Vyacheslav Dubeyko wrote:
>> Most information on the superblock is static (layout information or
>> so).
>>
>> sbp->s_state has an ERROR state bit and a VALID state bit, but these
>> bits are mostly static.
>>
>> sbp->s_free_blocks_count keeps the free block count at the time, but I
>> think this information is not important because it can be calculated
>> from the number of clean segments.
>>
>> We need the periodical in-place superblock write only for updating a
>> pointer to the latest log. And this will be eliminable if we can invent
>> a fast way to determine the latest log.
>>
>
> As far as I can see, we have more changeable fields in the superblock.
> But, of course, it is possible to leave only static information in the
> superblock. I assume it makes sense to move the changeable superblock
> fields into the super root metadata structure. In such a way we can
> provide independent sets of the above-mentioned changeable superblock
> fields for every snapshot/checkpoint.

I think only fields that constantly change should be put on the super
root block. If we can find a way to look up the latest log, we don't
need to update s_last_cno, s_last_pseg, and s_last_seq so frequently
either. (The cno, pseg, and seq number are obtainable from the latest
log.)

Ideally, I think, we should update the super blocks only at
mount/umount time or when changing file system state bits or feature
bits, etc.

> I suppose another problem is having an unchangeable superblock (4 KB)
> with a changeable segment area right after it. I think this can be an
> FTL-unfriendly situation in the flash case. Maybe it makes sense to
> have specially reserved areas (with a size equal to the segment size)
> at the beginning and at the end of a NILFS2 volume. These areas can be
> used for special metadata structures that are modified on the COW
> principle inside the reserved areas. Anyway, we need some metadata
> structure on the volume (a tree or something else) that can give
> information about the latest log by snapshot number.

OK, I agree it is a reasonable approach to keep the pointer
information. We should avoid frequent erases (overwrites) there, too,
and that may become acceptable by applying a COW policy.

Or, we may be able to reduce the number of seeks/disk scans without
such additional information by combining a binary search on the device
with a restriction on the segment allocator. For instance, we may be
able to reduce the number of block scans for the latest log lookup by
introducing groups of segments in which we ensure that segments are
sequentially allocated in each group. Then we can divide the disk-scan
steps into two phases: one is searching for the latest segment group,
and the other is searching for the latest segment (log) in the group.
This idea of grouping may be nestable.

> So, currently, I have no clear vision of a possible good approach for
> an efficient search for the latest log. But it needs to take possible
> GC policy changes into account, because a more efficient way of garbage
> collection may leave "cold" segments untouched (such a full segment can
> contain valid and actual data). As a result, the chain of linked
> partial segments on the volume can become more sophisticated and not a
> chain of sibling segments. Thereby, the search for the latest log may
> not be such a simple and fast operation under the current search
> algorithm, I think.

Yes, the GC policy is affected. I think it is an acceptable change for
this purpose.

> I think the [snapshot number | latest log] tree can be restricted to
> one file system block (4 KB). So one possible way is to save changes to
> such a tree in a journal-like circular log which keeps a chain of
> modified blocks, for example. Maybe it makes sense to keep a pair of
> values [current last log; upper bound of the last log]. The upper bound
> can be a prediction of where the last log will be after some time.
> Thereby, such a tree and the superblock(s) can live in the reserved
> areas at the volume's beginning and end.

How about considering the latest log scan first, and then extending the
idea to writable snapshots/branches later? The two ideas should be
separable if we keep them with one super root block.

Regards,
Ryusuke Konishi

> As you can see, I am not suggesting anything concrete yet. I need to
> think it over more deeply.
>
> With the best regards,
> Vyacheslav Dubeyko.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
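A sketch of what the two-phase lookup could look like under the
restriction Ryusuke describes. Assume segments within a group are
allocated strictly in order, so their sequence numbers ascend from the
group's first segment up to the newest one and then drop to older (or
zero) values; the newest segment can then be found by binary search
instead of a linear scan. read_seg_seq() is a hypothetical helper
returning the sequence number from a segment's summary block:

/* Phase 2 of the lookup: binary-search the newest segment inside one
 * group of n sequentially allocated segments starting at segment
 * number 'base'.  Assumes sequence numbers ascend up to the newest
 * segment and are smaller (older passes, or zero for unused
 * segments) after it. */
#include <stdint.h>

extern uint64_t read_seg_seq(uint64_t segnum);  /* hypothetical */

static uint64_t find_latest_in_group(uint64_t base, uint64_t n)
{
        uint64_t lo = 0, hi = n - 1;

        while (lo < hi) {
                uint64_t mid = lo + (hi - lo + 1) / 2;

                if (read_seg_seq(base + mid) >= read_seg_seq(base))
                        lo = mid;       /* still in the ascending run */
                else
                        hi = mid - 1;   /* past the newest segment */
        }
        return base + lo;
}

Phase 1 would apply the same idea across groups by comparing the first
sequence number of each group, so the whole lookup would cost a
logarithmic number of reads instead of a full device scan.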
* Re: Does nilfs2 do any in-place writes?
  2014-01-15 12:01 ` Vyacheslav Dubeyko
  2014-01-15 15:23 ` Ryusuke Konishi
@ 2014-01-16 10:03 ` Clemens Eisserer
       [not found]   ` <CAFvQSYSC7+dd93pRH-uok9N+A_s=1VKrfGEppu3qRTg3q=CuXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 24+ messages in thread
From: Clemens Eisserer @ 2014-01-16 10:03 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

> NILFS2 has a special method, nilfs_sb_need_update() [1], and a special
> constant, NILFS_SB_FREQ [2], that are used to define the frequency of
> superblock updates. So, as far as I can judge, the default value of
> that frequency under high I/O load is 10 seconds (the minimum interval
> of periodical superblock updates, in seconds).
>
> [1] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L254
> [2] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L252

Thanks for the in-depth explanation.

> We need the periodical in-place superblock write only for updating a
> pointer to the latest log. And this will be eliminable if we can invent
> a fast way to determine the latest log.

Maybe it would be enough to detect whether the stored pointer to the
last log is recent and otherwise perform a slow scan?

Thanks, Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <CAFvQSYSC7+dd93pRH-uok9N+A_s=1VKrfGEppu3qRTg3q=CuXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]   ` <CAFvQSYSC7+dd93pRH-uok9N+A_s=1VKrfGEppu3qRTg3q=CuXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-16 10:10 ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-16 10:10 UTC (permalink / raw)
  To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2014-01-16 at 11:03 +0100, Clemens Eisserer wrote:
>
> Maybe it would be enough to detect whether the stored pointer to the
> last log is recent and otherwise perform a slow scan?
>

A slow scan means a slow mount. :) Are you ready to wait for a really
long mount to finish?

With the best regards,
Vyacheslav Dubeyko.
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads: [~2014-01-19 14:11 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-17 19:19 Does nilfs2 do any in-place writes? Mark Trumpold
-- strict thread matches above, loose matches on Subject: below --
2014-01-16 19:40 Mark Trumpold
2014-01-16 17:48 Mark Trumpold
2014-01-16 18:41 ` Clemens Eisserer
2014-01-17 6:31 ` Vyacheslav Dubeyko
2014-01-18 1:47 ` Ryusuke Konishi
[not found] ` <20140118.104703.356941870.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-01-18 9:44 ` Clemens Eisserer
[not found] ` <CAFvQSYQZtf0fsfX_7zNHdw4hVo9VHggN9F0TYEi1Fwo2ZvS4Ng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-18 16:25 ` Mark Trumpold
[not found] ` <CEFFE8EC.9A4A%markt-qk0wvQ0ghJwAvxtiuMwx3w@public.gmane.org>
2014-01-18 18:11 ` Vyacheslav Dubeyko
2014-01-18 11:45 ` Andreas Rohner
[not found] ` <52DA696D.6010206-hi6Y0CQ0nG0@public.gmane.org>
2014-01-18 23:08 ` Vyacheslav Dubeyko
[not found] ` <04877EE1-F5BF-41CE-AC92-CD9C3ED0B8A4-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-01-18 23:08 ` Andreas Rohner
[not found] ` <52DB098A.4010300-hi6Y0CQ0nG0@public.gmane.org>
2014-01-19 5:43 ` Ryusuke Konishi
[not found] ` <20140119.144345.373615211.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-01-19 14:11 ` Andreas Rohner
2014-01-15 10:44 Clemens Eisserer
[not found] ` <CAFvQSYSzpX_WpUi9KpGj0pZvzhw2mfzzOqcgdj9ripXAjipmtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-15 10:52 ` Vyacheslav Dubeyko
2014-01-15 11:44 ` Clemens Eisserer
[not found] ` <CAFvQSYTG6HBVc9iodYyvCejwf889jiwOPsVb1Hi8cDrR9pOGeg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-15 12:01 ` Vyacheslav Dubeyko
2014-01-15 15:23 ` Ryusuke Konishi
[not found] ` <20140116.002353.94325733.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-01-16 10:08 ` Vyacheslav Dubeyko
2014-01-17 22:55 ` Ryusuke Konishi
2014-01-18 0:00 ` Ryusuke Konishi
2014-01-16 10:03 ` Clemens Eisserer
[not found] ` <CAFvQSYSC7+dd93pRH-uok9N+A_s=1VKrfGEppu3qRTg3q=CuXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-16 10:10 ` Vyacheslav Dubeyko