* Re: Does nilfs2 do any in-place writes?
@ 2014-01-17 19:19 Mark Trumpold
0 siblings, 0 replies; 24+ messages in thread
From: Mark Trumpold @ 2014-01-17 19:19 UTC (permalink / raw)
To: slava-yeENwD64cLxBDgjK7y7TUQ, Mark Trumpold
Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
>-----Original Message-----
>From: Vyacheslav Dubeyko [mailto:slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org]
>Sent: Thursday, January 16, 2014 10:31 PM
>To: 'Mark Trumpold'
>Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>Subject: Re: Does nilfs2 do any in-place writes?
>
>On Thu, 2014-01-16 at 17:48 +0000, Mark Trumpold wrote:
>> Hello All,
>>
>> I am wondering what impact in-place writes of the
>> superblock have on SSDs in terms of wear.
>>
>> I've been stress-testing our system, which uses Nilfs, and
>> recently had an SSD fail with the classic messages indicating
>> low-level media problems -- and also implicating Nilfs as trying
>> to locate a superblock (I think).
>>
>> Following is a partial dmesg list:
>>
>> [ 7.630382] Sense Key : Medium Error [current] [descriptor]
>> [ 7.630385] Descriptor sense data with sense descriptors (in hex):
>> [ 7.630386] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
>> [ 7.630394] 05 ff 0e 58
>> [ 7.630397] sd 0:0:0:0: [sda]
>> [ 7.630399] Add. Sense: Unrecovered read error - auto reallocate failed
>> [ 7.630401] sd 0:0:0:0: [sda] CDB:
>> [ 7.630402] Read(10): 28 00 05 ff 0e 54 00 00 08 00
>> [ 7.630409] end_request: I/O error, dev sda, sector 100601432
>> [ 7.635326] NILFS warning: I/O error on loading last segment
>> [ 7.635329] NILFS: error searching super root.
>>
>>
>
>I don't think this issue is related to the superblocks, because I can't
>see the NILFS2 magic signature in your output. For example, I have
>these first 16 bytes in a superblock:
>
>00000400 02 00 00 00 00 00 34 34 18 01 00 00 52 85 db 71 |......44....R..q|
>
>Of course, I don't know your partition table details, but I doubt that
>sector 100601432 is a superblock sector. Moreover, your error messages
>report trouble loading the last segment during the super root search.
>
>NILFS2 has only two blocks that live under an in-place update policy,
>and their update frequency is not very high. So I suppose any FTL can
>easily provide good wear leveling for the superblocks. But, of course,
>in-place updates are not a good policy for flash-based devices anyway.
>
>Maybe I misunderstand something in your output, but I suppose that
>during stress testing you can discover an I/O error in any part of the
>volume, because it is really hard to predict when you will exhaust the
>spare pool of erase blocks.
>
>With the best regards,
>Vyacheslav Dubeyko.
>
>
>
Hi Vyacheslav,
Thank you for taking a look at this.
Your assessment makes good sense, and I am relieved we have
a plausible explanation.
BTW: I upgraded to the 3.11.6 Linux kernel (per your and Ryusuke's
suggestions) to pick up the most recent Nilfs development code, and am
finding things to be very stable.
Best regards,
Mark T.
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 24+ messages in thread

* Re: Does nilfs2 do any in-place writes?
@ 2014-01-16 19:40 Mark Trumpold
0 siblings, 0 replies; 24+ messages in thread
From: Mark Trumpold @ 2014-01-16 19:40 UTC (permalink / raw)
To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
> -----Original Message-----
> From: linux-nilfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-nilfs-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Clemens Eisserer
> Sent: Thursday, January 16, 2014 10:42 AM
> To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: [PossibleSpam] Re: Does nilfs2 do any in-place writes?
>
> Hi Mark,
>
> > I am wondering what impact in-place writes of the
> > superblock have on SSDs in terms of wear.
>
> Typically SSDs have far more advanced static wear leveling algorithms,
> which keep an erase count for each erase block as well as a
> sophisticated mapping table. Otherwise, e.g., journaling file systems
> would probably kill them quickly.
>
> Regards, Clemens
> --
Hi Clemens,
Thank you for the info. That was my prior understanding; however,
I thought it curious that the SSD failure cited Nilfs trying to access
the superblock, which had failed at the media level.
It was a fairly high-end SSD with TRIM, etc. (Corsair Force 240GB).
Working with the vendor to analyze further.
Thanks again,
Mark T.
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 24+ messages in thread

* Re: Does nilfs2 do any in-place writes?
@ 2014-01-16 17:48 Mark Trumpold
  2014-01-16 18:41 ` Clemens Eisserer
  2014-01-17  6:31 ` Vyacheslav Dubeyko
  0 siblings, 2 replies; 24+ messages in thread
From: Mark Trumpold @ 2014-01-16 17:48 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hello All,

I am wondering what impact in-place writes of the
superblock have on SSDs in terms of wear.

I've been stress-testing our system, which uses Nilfs, and
recently had an SSD fail with the classic messages indicating
low-level media problems -- and also implicating Nilfs as trying
to locate a superblock (I think).

Following is a partial dmesg list:

[ 7.630382] Sense Key : Medium Error [current] [descriptor]
[ 7.630385] Descriptor sense data with sense descriptors (in hex):
[ 7.630386] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 7.630394] 05 ff 0e 58
[ 7.630397] sd 0:0:0:0: [sda]
[ 7.630399] Add. Sense: Unrecovered read error - auto reallocate failed
[ 7.630401] sd 0:0:0:0: [sda] CDB:
[ 7.630402] Read(10): 28 00 05 ff 0e 54 00 00 08 00
[ 7.630409] end_request: I/O error, dev sda, sector 100601432
[ 7.635326] NILFS warning: I/O error on loading last segment
[ 7.635329] NILFS: error searching super root.

Best regards,
Mark T.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
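A quick way to check where that failed read landed: in a SCSI Read(10)
CDB, bytes 2-5 hold the big-endian starting LBA and bytes 7-8 the
transfer length. A minimal standalone sketch of the decoding, with the
hex values copied from the dmesg output above (everything else is
illustrative):

/* Decode the Read(10) CDB quoted in the dmesg output above. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
        const uint8_t cdb[10] = { 0x28, 0x00, 0x05, 0xff, 0x0e, 0x54,
                                  0x00, 0x00, 0x08, 0x00 };
        uint32_t lba = (uint32_t)cdb[2] << 24 | (uint32_t)cdb[3] << 16 |
                       (uint32_t)cdb[4] << 8  | cdb[5];
        uint16_t len = (uint16_t)cdb[7] << 8 | cdb[8];

        /* Prints "8 sectors at LBA 100601428": the request spans
         * sectors 100601428..100601435, which contains the reported
         * bad sector 100601432 (0x05ff0e58 in the sense data). */
        printf("%u sectors at LBA %u\n", (unsigned)len, (unsigned)lba);
        return 0;
}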
* Re: Does nilfs2 do any in-place writes?
  2014-01-16 17:48 Mark Trumpold
@ 2014-01-16 18:41 ` Clemens Eisserer
  2014-01-17  6:31 ` Vyacheslav Dubeyko
  1 sibling, 0 replies; 24+ messages in thread
From: Clemens Eisserer @ 2014-01-16 18:41 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Mark,

> I am wondering what impact in-place writes of the
> superblock have on SSDs in terms of wear.

Typically SSDs have far more advanced static wear leveling algorithms,
which keep an erase count for each erase block as well as a
sophisticated mapping table. Otherwise, e.g., journaling file systems
would probably kill them quickly.

Regards, Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Does nilfs2 do any in-place writes?
  2014-01-16 17:48 Mark Trumpold
  2014-01-16 18:41 ` Clemens Eisserer
@ 2014-01-17  6:31 ` Vyacheslav Dubeyko
  2014-01-18  1:47 ` Ryusuke Konishi
  1 sibling, 1 reply; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-17 6:31 UTC (permalink / raw)
  To: Mark Trumpold; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2014-01-16 at 17:48 +0000, Mark Trumpold wrote:
> Hello All,
>
> I am wondering what impact in-place writes of the
> superblock have on SSDs in terms of wear.
>
> I've been stress-testing our system, which uses Nilfs, and
> recently had an SSD fail with the classic messages indicating
> low-level media problems -- and also implicating Nilfs as trying
> to locate a superblock (I think).
>
> Following is a partial dmesg list:
>
> [ 7.630382] Sense Key : Medium Error [current] [descriptor]
> [ 7.630385] Descriptor sense data with sense descriptors (in hex):
> [ 7.630386] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [ 7.630394] 05 ff 0e 58
> [ 7.630397] sd 0:0:0:0: [sda]
> [ 7.630399] Add. Sense: Unrecovered read error - auto reallocate failed
> [ 7.630401] sd 0:0:0:0: [sda] CDB:
> [ 7.630402] Read(10): 28 00 05 ff 0e 54 00 00 08 00
> [ 7.630409] end_request: I/O error, dev sda, sector 100601432
> [ 7.635326] NILFS warning: I/O error on loading last segment
> [ 7.635329] NILFS: error searching super root.
>
>

I don't think this issue is related to the superblocks, because I can't
see the NILFS2 magic signature in your output. For example, I have
these first 16 bytes in a superblock:

00000400 02 00 00 00 00 00 34 34 18 01 00 00 52 85 db 71 |......44....R..q|

Of course, I don't know your partition table details, but I doubt that
sector 100601432 is a superblock sector. Moreover, your error messages
report trouble loading the last segment during the super root search.

NILFS2 has only two blocks that live under an in-place update policy,
and their update frequency is not very high. So I suppose any FTL can
easily provide good wear leveling for the superblocks. But, of course,
in-place updates are not a good policy for flash-based devices anyway.

Maybe I misunderstand something in your output, but I suppose that
during stress testing you can discover an I/O error in any part of the
volume, because it is really hard to predict when you will exhaust the
spare pool of erase blocks.

With the best regards,
Vyacheslav Dubeyko.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
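Vyacheslav's doubt about the sector number is easy to double-check. The
offset macros below follow the definitions in include/linux/nilfs2_fs.h;
the 512-byte logical sector size, the 240 GB device size, and a
partition starting at LBA 0 are all assumptions (the partition table is
unknown here). A back-of-the-envelope sketch:

/* Where do the two NILFS2 superblocks sit, and how far away is the
 * failing sector?  Offsets per include/linux/nilfs2_fs.h; device
 * size, sector size and partition start are assumed. */
#include <stdio.h>
#include <stdint.h>

#define NILFS_SB_OFFSET_BYTES 1024ULL
#define NILFS_SB2_OFFSET_BYTES(devsize) \
        ((((devsize) >> 12) - 1) << 12)

int main(void)
{
        uint64_t devsize = 240000000000ULL;  /* assumed 240 GB drive */
        uint64_t bad = 100601432ULL * 512;   /* ~48 GiB into the disk */

        printf("sb1 at byte %llu, sb2 at byte %llu, bad sector at byte %llu\n",
               (unsigned long long)NILFS_SB_OFFSET_BYTES,
               (unsigned long long)NILFS_SB2_OFFSET_BYTES(devsize),
               (unsigned long long)bad);
        /* The failing sector lies tens of gigabytes from either
         * superblock, consistent with the assessment above. */
        return 0;
}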
* Re: Does nilfs2 do any in-place writes?
  2014-01-17  6:31 ` Vyacheslav Dubeyko
@ 2014-01-18  1:47 ` Ryusuke Konishi
       [not found]   ` <20140118.104703.356941870.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Ryusuke Konishi @ 2014-01-18 1:47 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: Mark Trumpold, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Fri, 17 Jan 2014 10:31:55 +0400, Vyacheslav Dubeyko wrote:
> On Thu, 2014-01-16 at 17:48 +0000, Mark Trumpold wrote:
>> Hello All,
>>
>> I am wondering what impact in-place writes of the
>> superblock have on SSDs in terms of wear.
>>
>> I've been stress-testing our system, which uses Nilfs, and
>> recently had an SSD fail with the classic messages indicating
>> low-level media problems -- and also implicating Nilfs as trying
>> to locate a superblock (I think).
>>
>> Following is a partial dmesg list:
>>
>> [ 7.630382] Sense Key : Medium Error [current] [descriptor]
>> [ 7.630385] Descriptor sense data with sense descriptors (in hex):
>> [ 7.630386] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
>> [ 7.630394] 05 ff 0e 58
>> [ 7.630397] sd 0:0:0:0: [sda]
>> [ 7.630399] Add. Sense: Unrecovered read error - auto reallocate failed
>> [ 7.630401] sd 0:0:0:0: [sda] CDB:
>> [ 7.630402] Read(10): 28 00 05 ff 0e 54 00 00 08 00
>> [ 7.630409] end_request: I/O error, dev sda, sector 100601432
>> [ 7.635326] NILFS warning: I/O error on loading last segment
>> [ 7.635329] NILFS: error searching super root.
>>
>>
>
> I don't think this issue is related to the superblocks, because I can't
> see the NILFS2 magic signature in your output. For example, I have
> these first 16 bytes in a superblock:
>
> 00000400 02 00 00 00 00 00 34 34 18 01 00 00 52 85 db 71 |......44....R..q|
>
> Of course, I don't know your partition table details, but I doubt that
> sector 100601432 is a superblock sector. Moreover, your error messages
> report trouble loading the last segment during the super root search.
>
> NILFS2 has only two blocks that live under an in-place update policy,
> and their update frequency is not very high. So I suppose any FTL can
> easily provide good wear leveling for the superblocks. But, of course,
> in-place updates are not a good policy for flash-based devices anyway.
>
> Maybe I misunderstand something in your output, but I suppose that
> during stress testing you can discover an I/O error in any part of the
> volume, because it is really hard to predict when you will exhaust the
> spare pool of erase blocks.

Rather, the issue on the flash devices may come from the current
immature garbage collection algorithm. The current cleanerd only
supports the timestamp-based GC policy, which always tries to move the
oldest segment first and even moves segments full of live blocks,
thereby shortening the lifetime of flash devices. :-(

Actually, this is a high-priority todo, and now I am inclined to
consider it together with the group concept of segments.

Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20140118.104703.356941870.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found] ` <20140118.104703.356941870.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-01-18  9:44 ` Clemens Eisserer
       [not found]   ` <CAFvQSYQZtf0fsfX_7zNHdw4hVo9VHggN9F0TYEi1Fwo2ZvS4Ng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2014-01-18 11:45 ` Andreas Rohner
  1 sibling, 1 reply; 24+ messages in thread
From: Clemens Eisserer @ 2014-01-18 9:44 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi again,

> Rather, the issue on the flash devices may come from the current
> immature garbage collection algorithm. The current cleanerd only
> supports the timestamp-based GC policy, which always tries to move the
> oldest segment first and even moves segments full of live blocks,
> thereby shortening the lifetime of flash devices. :-(

It depends - for SSDs the timestamp policy is not optimal, as it leads
to unnecessary writes.

On the other hand, most cards only implement dynamic wear leveling
(wear leveling only takes place for areas that are written to, which
leads to very uneven wear distribution when there is mostly static
data) and also don't have read-disturb handling.
So for cards it is actually helpful to have the writes spread out
evenly, and as a bonus there is no need to worry about read-disturb
effects =)

Regards, Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <CAFvQSYQZtf0fsfX_7zNHdw4hVo9VHggN9F0TYEi1Fwo2ZvS4Ng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]   ` <CAFvQSYQZtf0fsfX_7zNHdw4hVo9VHggN9F0TYEi1Fwo2ZvS4Ng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-18 16:25 ` Mark Trumpold
       [not found]   ` <CEFFE8EC.9A4A%markt-qk0wvQ0ghJwAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Trumpold @ 2014-01-18 16:25 UTC (permalink / raw)
  To: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 1/18/14 1:44 AM, "Clemens Eisserer" <linuxhippy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

>Hi again,
>
>> Rather, the issue on the flash devices may come from the current
>> immature garbage collection algorithm. The current cleanerd only
>> supports the timestamp-based GC policy, which always tries to move the
>> oldest segment first and even moves segments full of live blocks,
>> thereby shortening the lifetime of flash devices. :-(
>
>It depends - for SSDs the timestamp policy is not optimal, as it leads
>to unnecessary writes.
>
>On the other hand, most cards only implement dynamic wear leveling
>(wear leveling only takes place for areas that are written to, which
>leads to very uneven wear distribution when there is mostly static
>data) and also don't have read-disturb handling.
>So for cards it is actually helpful to have the writes spread out
>evenly, and as a bonus there is no need to worry about read-disturb
>effects =)
>
>Regards, Clemens
>--
>To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
>the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html

Hi Clemens and group,

Good information. So, is it true that the logging/COW nature of
Nilfs actually improves wear leveling by having 'writes spread out
evenly'?

Regards,
Mark T.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <CEFFE8EC.9A4A%markt-qk0wvQ0ghJwAvxtiuMwx3w@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]     ` <CEFFE8EC.9A4A%markt-qk0wvQ0ghJwAvxtiuMwx3w@public.gmane.org>
@ 2014-01-18 18:11 ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-18 18:11 UTC (permalink / raw)
  To: Mark Trumpold
  Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Jan 18, 2014, at 7:25 PM, Mark Trumpold wrote:

> Hi Clemens and group,
>
> Good information. So, is it true that the logging/COW nature of
> Nilfs actually improves wear leveling by having 'writes spread out
> evenly'?

Of course, a COW policy is much better for flash than in-place updates.
But even if NILFS2 (or any other file system) uses a COW approach, it
is still the FTL that does the wear leveling, and it is the FTL's
algorithms that define the wear-leveling efficiency. So I don't think
NILFS2 itself can improve wear leveling.

A COW policy can keep the FTL "cold" and give it the opportunity not to
use its sophisticated wear-leveling algorithms and its spare pool of
erase blocks. Moreover, it is very desirable to use the TRIM command
together with a COW policy to improve FTL efficiency and performance.

With the best regards,
Vyacheslav Dubeyko.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Does nilfs2 do any in-place writes?
       [not found] ` <20140118.104703.356941870.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  2014-01-18  9:44 ` Clemens Eisserer
@ 2014-01-18 11:45 ` Andreas Rohner
       [not found]   ` <52DA696D.6010206-hi6Y0CQ0nG0@public.gmane.org>
  1 sibling, 1 reply; 24+ messages in thread
From: Andreas Rohner @ 2014-01-18 11:45 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs

On 2014-01-18 02:47, Ryusuke Konishi wrote:
> On Fri, 17 Jan 2014 10:31:55 +0400, Vyacheslav Dubeyko wrote:
>> On Thu, 2014-01-16 at 17:48 +0000, Mark Trumpold wrote:
>>> Hello All,
>>>
>>> I am wondering what impact in-place writes of the
>>> superblock have on SSDs in terms of wear.
>>>
>>> I've been stress-testing our system, which uses Nilfs, and
>>> recently had an SSD fail with the classic messages indicating
>>> low-level media problems -- and also implicating Nilfs as trying
>>> to locate a superblock (I think).
>>>
>>> Following is a partial dmesg list:
>>>
>>> [ 7.630382] Sense Key : Medium Error [current] [descriptor]
>>> [ 7.630385] Descriptor sense data with sense descriptors (in hex):
>>> [ 7.630386] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
>>> [ 7.630394] 05 ff 0e 58
>>> [ 7.630397] sd 0:0:0:0: [sda]
>>> [ 7.630399] Add. Sense: Unrecovered read error - auto reallocate failed
>>> [ 7.630401] sd 0:0:0:0: [sda] CDB:
>>> [ 7.630402] Read(10): 28 00 05 ff 0e 54 00 00 08 00
>>> [ 7.630409] end_request: I/O error, dev sda, sector 100601432
>>> [ 7.635326] NILFS warning: I/O error on loading last segment
>>> [ 7.635329] NILFS: error searching super root.
>>>
>>>
>>
>> I don't think this issue is related to the superblocks, because I can't
>> see the NILFS2 magic signature in your output. For example, I have
>> these first 16 bytes in a superblock:
>>
>> 00000400 02 00 00 00 00 00 34 34 18 01 00 00 52 85 db 71 |......44....R..q|
>>
>> Of course, I don't know your partition table details, but I doubt that
>> sector 100601432 is a superblock sector. Moreover, your error messages
>> report trouble loading the last segment during the super root search.
>>
>> NILFS2 has only two blocks that live under an in-place update policy,
>> and their update frequency is not very high. So I suppose any FTL can
>> easily provide good wear leveling for the superblocks. But, of course,
>> in-place updates are not a good policy for flash-based devices anyway.
>>
>> Maybe I misunderstand something in your output, but I suppose that
>> during stress testing you can discover an I/O error in any part of the
>> volume, because it is really hard to predict when you will exhaust the
>> spare pool of erase blocks.
>
> Rather, the issue on the flash devices may come from the current
> immature garbage collection algorithm. The current cleanerd only
> supports the timestamp-based GC policy, which always tries to move the
> oldest segment first and even moves segments full of live blocks,
> thereby shortening the lifetime of flash devices. :-(
>
> Actually, this is a high-priority todo, and now I am inclined to
> consider it together with the group concept of segments.

Hi,

I am currently working on the garbage collector. I have implemented the
cost-benefit and greedy policies. It is quite a big change, and I was
reluctant to submit a patch until I had thoroughly tested it. I have
substantially redesigned it since the last time I wrote about it on the
mailing list. Now it seems to be very stable and the results are quite
promising.

The following results [1] are from my "ultimate" benchmark. It runs on
an AMD Phenom II X6 1090T processor with 8 GB RAM and a Samsung SSD 840
with a 100GB partition for NILFS2. I used the Lair62 NFS traces from
the IOTTA Repository [2] to get a realistic and reproducible benchmark.

This is what the benchmark does:

1.  Create a 20GB file of static data
2a. Start replaying the Lair62 NFS traces
2b. In parallel, turn random checkpoints into snapshots every 5
    minutes, keep a list of the snapshots, and turn them back into
    checkpoints after 15 minutes, so there are at most 3 snapshots
    present at the same time.

Timestamp is so slow because it needlessly copies the 20GB of static
data around over and over again, which can be seen in the periodic
drops in performance. The other policies ignore the static data and
never move it. This is also evident if you compare the amount of data
written to the device [3] (compare /proc/diskstats before and after the
benchmark).

If you are interested, I could clean up my code and submit a patch set
for review. I am sure there are lots of things that need to be changed,
but maybe it can give you some ideas...

It would also be possible to improve timestamp by allowing the cleaner
to abort if there is nothing to gain from cleaning a particular
segment. Instead it could just update su_lastmod in the SUFILE without
doing anything else. This would be a fairly simple change. I could
provide a patch for that too.

Regards,
Andreas Rohner

[1] https://www.dropbox.com/s/3ued8g5xaktnpbq/replay_parallel_ssd_line.pdf
[2] http://iotta.snia.org/historical_section?tracetype_id=2
[3] https://www.dropbox.com/s/nwfixlzzzvf93v2/replay_parallel_stats_write.pdf

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
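For readers unfamiliar with the two policies Andreas names: greedy
simply picks the segment with the fewest live blocks, while cost-benefit
weighs reclaimable space against segment age so that cold, mostly-live
segments are left alone. A minimal sketch of the classic cost-benefit
selection from the original LFS paper (Rosenblum and Ousterhout) -- an
illustration of the idea, not Andreas's actual implementation:

/* Classic LFS cost-benefit victim selection: maximize
 *   benefit/cost = (1 - u) * age / (1 + u)
 * where u is the live-block ratio of a segment. */
#include <stddef.h>

struct seg_usage {
        double u;    /* fraction of live blocks, 0.0 .. 1.0 */
        double age;  /* time since last modification, seconds */
};

static double cost_benefit(const struct seg_usage *s)
{
        return (1.0 - s->u) * s->age / (1.0 + s->u);
}

static size_t pick_victim(const struct seg_usage *segs, size_t n)
{
        size_t best = 0;

        for (size_t i = 1; i < n; i++)
                if (cost_benefit(&segs[i]) > cost_benefit(&segs[best]))
                        best = i;
        return best;
}

A segment that is 100% live (u = 1.0) scores zero no matter how old it
is, which is exactly why these policies leave the 20GB of static data
untouched where the timestamp policy keeps copying it.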
[parent not found: <52DA696D.6010206-hi6Y0CQ0nG0@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]   ` <52DA696D.6010206-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-01-18 23:08 ` Vyacheslav Dubeyko
       [not found]   ` <04877EE1-F5BF-41CE-AC92-CD9C3ED0B8A4-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-18 23:08 UTC (permalink / raw)
  To: Andreas Rohner; +Cc: Ryusuke Konishi, linux-nilfs

On Jan 18, 2014, at 2:45 PM, Andreas Rohner wrote:

> If you are interested, I could clean up my code and submit a patch set
> for review. I am sure there are lots of things that need to be changed,
> but maybe it can give you some ideas...
>
> It would also be possible to improve timestamp by allowing the cleaner
> to abort if there is nothing to gain from cleaning a particular
> segment. Instead it could just update su_lastmod in the SUFILE without
> doing anything else. This would be a fairly simple change. I could
> provide a patch for that too.
>

I think it is very desirable to share patches for review at an early
stage, because it is possible to achieve valuable results by means of
an open and continuous discussion. So, you are welcome to share your
vision and your patches.

As I remember, I made many remarks about your approach and your code
last time. So, I hope that you have reworked your approach
significantly.

> Regards,
> Andreas Rohner
>
> [1] https://www.dropbox.com/s/3ued8g5xaktnpbq/replay_parallel_ssd_line.pdf

To be honest, I completely fail to understand this diagram. It is hard
to understand without additional description, from my point of view.

> [2] http://iotta.snia.org/historical_section?tracetype_id=2
> [3] https://www.dropbox.com/s/nwfixlzzzvf93v2/replay_parallel_stats_write.pdf

Thanks,
Vyacheslav Dubeyko.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <04877EE1-F5BF-41CE-AC92-CD9C3ED0B8A4-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]     ` <04877EE1-F5BF-41CE-AC92-CD9C3ED0B8A4-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2014-01-18 23:08 ` Andreas Rohner
       [not found]   ` <52DB098A.4010300-hi6Y0CQ0nG0@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Andreas Rohner @ 2014-01-18 23:08 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs

On 2014-01-19 00:08, Vyacheslav Dubeyko wrote:
> I think it is very desirable to share patches for review at an early
> stage, because it is possible to achieve valuable results by means of
> an open and continuous discussion. So, you are welcome to share your
> vision and your patches.

Good, I will prepare my patches.

> As I remember, I made many remarks about your approach and your code
> last time. So, I hope that you have reworked your approach
> significantly.

Yes, it is basically a complete rewrite.

>> [1] https://www.dropbox.com/s/3ued8g5xaktnpbq/replay_parallel_ssd_line.pdf
>
> To be honest, I completely fail to understand this diagram. It is hard
> to understand without additional description, from my point of view.

Yes, you are probably right about that. It is generated from a GC log
file. For every GC operation I print out the number of live blocks. For
example, if the GC cleans a segment and 90% of the blocks in it are
live, then 90% of the blocks need to be moved to a new segment. Moving
blocks is undesirable, and therefore I call that "inefficient". In this
example the efficiency would be 10%. So 10% would then become one point
in the graph at time 0. The goal of every cleaning policy is to find
segments with as few live blocks as possible. So efficiency is
basically the percentage of dead blocks. Maybe I should label it
differently...

The vertical dashed lines mark the time when the benchmark finished.
The GC still runs on for some time after that, until it reaches the
max_clean_segments threshold.

The graph shows that the cost-benefit and greedy policies are better at
finding segments with a lot of dead blocks. So fewer blocks need to be
moved to new segments, and the benchmark finishes in less than half the
time.

Best regards,
Andreas Rohner

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
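In other words, the plotted "efficiency" for one cleaning pass is just
the dead-block fraction of the cleaned segment. A one-function sketch
of the metric as described (the names are illustrative, not from the
actual log-analysis script):

/* "Efficiency" of cleaning one segment, as plotted in [1]: the
 * percentage of blocks that did NOT have to be moved. */
static double gc_efficiency(unsigned long live_blocks,
                            unsigned long blocks_per_segment)
{
        return 100.0 * (double)(blocks_per_segment - live_blocks) /
               (double)blocks_per_segment;
}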
[parent not found: <52DB098A.4010300-hi6Y0CQ0nG0@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]   ` <52DB098A.4010300-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-01-19  5:43 ` Ryusuke Konishi
       [not found]   ` <20140119.144345.373615211.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Ryusuke Konishi @ 2014-01-19 5:43 UTC (permalink / raw)
  To: Andreas Rohner; +Cc: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Sun, 19 Jan 2014 00:08:58 +0100, Andreas Rohner wrote:
> On 2014-01-19 00:08, Vyacheslav Dubeyko wrote:
>> I think it is very desirable to share patches for review at an early
>> stage, because it is possible to achieve valuable results by means of
>> an open and continuous discussion. So, you are welcome to share your
>> vision and your patches.
>
> Good, I will prepare my patches.

Before submitting patches, please test them with scripts/checkpatch.pl
in the latest nilfs-utils.git. This would help to reduce coding style
issues, and we will be able to concentrate on the design and
implementation of the patches.

If your patchset affects the compatibility of existing code, it would
be helpful if you clarified the impact in terms of disk format
compatibility, ioctl interface compatibility, library compatibility,
and CUI compatibility.

I expect that your work makes progress on the GC issue, adding improved
algorithms (the cost-benefit algorithm, the greedy algorithm, and
others) to cleanerd.

Thanks,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20140119.144345.373615211.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]     ` <20140119.144345.373615211.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-01-19 14:11 ` Andreas Rohner
  0 siblings, 0 replies; 24+ messages in thread
From: Andreas Rohner @ 2014-01-19 14:11 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On 2014-01-19 06:43, Ryusuke Konishi wrote:
> On Sun, 19 Jan 2014 00:08:58 +0100, Andreas Rohner wrote:
>> On 2014-01-19 00:08, Vyacheslav Dubeyko wrote:
>>> I think it is very desirable to share patches for review at an early
>>> stage, because it is possible to achieve valuable results by means of
>>> an open and continuous discussion. So, you are welcome to share your
>>> vision and your patches.
>>
>> Good, I will prepare my patches.
>
> Before submitting patches, please test them with scripts/checkpatch.pl
> in the latest nilfs-utils.git. This would help to reduce coding style
> issues, and we will be able to concentrate on the design and
> implementation of the patches.

I used scripts/checkpatch.pl, and I hope everything is correct now.

> If your patchset affects the compatibility of existing code, it would
> be helpful if you clarified the impact in terms of disk format
> compatibility, ioctl interface compatibility, library compatibility,
> and CUI compatibility.

Yes, there are some compatibility issues with the implementation of a
counter for segment usage tracking.

> I expect that your work makes progress on the GC issue, adding improved
> algorithms (the cost-benefit algorithm, the greedy algorithm, and
> others) to cleanerd.

Yes, that's right, although the patch set I submitted today has nothing
to do with cost-benefit and greedy. I decided to first try and submit a
smaller contribution with no compatibility issues.

Best regards,
Andreas Rohner
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
* Does nilfs2 do any in-place writes?
@ 2014-01-15 10:44 Clemens Eisserer
[not found] ` <CAFvQSYSzpX_WpUi9KpGj0pZvzhw2mfzzOqcgdj9ripXAjipmtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 24+ messages in thread
From: Clemens Eisserer @ 2014-01-15 10:44 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi,
Recently my Raspberry Pi destroyed a 32GB SD card after only 4 days,
because that cheap SD card seemed to have issues with wear leveling.
The areas where the ext4 journal was stored were no longer readable or
writeable.
I wonder which write-access patterns nilfs2 exhibits.
Are there any frequent in-place updates to statically positioned data
structures (superblock, translation tables, ...) or is the data mostly
written sequentially?
Thank you in advance, Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 24+ messages in thread

[parent not found: <CAFvQSYSzpX_WpUi9KpGj0pZvzhw2mfzzOqcgdj9ripXAjipmtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found] ` <CAFvQSYSzpX_WpUi9KpGj0pZvzhw2mfzzOqcgdj9ripXAjipmtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-15 10:52 ` Vyacheslav Dubeyko
  2014-01-15 11:44 ` Clemens Eisserer
  0 siblings, 1 reply; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-15 10:52 UTC (permalink / raw)
  To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2014-01-15 at 11:44 +0100, Clemens Eisserer wrote:
> Hi,
>
> Recently my Raspberry Pi destroyed a 32GB SD card after only 4 days,
> because that cheap SD card seemed to have issues with wear leveling.
> The areas where the ext4 journal was stored were no longer readable or
> writeable.
>
> I wonder which write-access patterns nilfs2 exhibits.
> Are there any frequent in-place updates to statically positioned data
> structures (superblock, translation tables, ...) or is the data mostly
> written sequentially?
>

The main approach of NILFS2 is a COW (copy-on-write) policy. It means
that all data and metadata are written in a log manner. Only the
superblocks are placed in fixed positions and updated there. The first
superblock is located at the beginning of the volume, the second one at
the end.

With the best regards,
Vyacheslav Dubeyko.

> Thank you in advance, Clemens
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Does nilfs2 do any in-place writes?
  2014-01-15 10:52 ` Vyacheslav Dubeyko
@ 2014-01-15 11:44 ` Clemens Eisserer
       [not found]   ` <CAFvQSYTG6HBVc9iodYyvCejwf889jiwOPsVb1Hi8cDrR9pOGeg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Clemens Eisserer @ 2014-01-15 11:44 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

> The main approach of NILFS2 is a COW (copy-on-write) policy. It means
> that all data and metadata are written in a log manner. Only the
> superblocks are placed in fixed positions and updated there. The first
> superblock is located at the beginning of the volume, the second one at
> the end.

Can you give me an estimate of how often the superblock is updated /
written to?

Thanks a lot, Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <CAFvQSYTG6HBVc9iodYyvCejwf889jiwOPsVb1Hi8cDrR9pOGeg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]   ` <CAFvQSYTG6HBVc9iodYyvCejwf889jiwOPsVb1Hi8cDrR9pOGeg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-15 12:01 ` Vyacheslav Dubeyko
  2014-01-15 15:23 ` Ryusuke Konishi
  2014-01-16 10:03 ` Clemens Eisserer
  0 siblings, 2 replies; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-15 12:01 UTC (permalink / raw)
  To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Clemens,

On Wed, 2014-01-15 at 12:44 +0100, Clemens Eisserer wrote:
> Hi Vyacheslav,
>
> > The main approach of NILFS2 is a COW (copy-on-write) policy. It means
> > that all data and metadata are written in a log manner. Only the
> > superblocks are placed in fixed positions and updated there. The first
> > superblock is located at the beginning of the volume, the second one at
> > the end.
>
> Can you give me an estimate of how often the superblock is updated /
> written to?
>

NILFS2 has a special method, nilfs_sb_need_update() [1], and a special
constant, NILFS_SB_FREQ [2], that are used to define the frequency of
superblock updates. So, as far as I can judge, the default value of
that frequency under high I/O load is 10 seconds (the minimum interval
of periodical superblock updates, in seconds).

With the best regards,
Vyacheslav Dubeyko.

[1] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L254
[2] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L252

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
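The logic behind [1] boils down to a timestamp comparison: rewrite the
superblocks if the filesystem state is dirty and more than NILFS_SB_FREQ
seconds have passed since the last superblock write. A userspace
paraphrase of that check (the kernel version operates on struct
the_nilfs and its ns_sbwtime field; see [1] for the authoritative code):

/* Userspace paraphrase of the superblock-update rate limit. */
#include <time.h>

#define NILFS_SB_FREQ 10  /* seconds */

static int sb_need_update(time_t last_sb_write)
{
        time_t t = time(NULL);

        /* Also rewrite if the clock jumped backwards. */
        return t < last_sb_write || t > last_sb_write + NILFS_SB_FREQ;
}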
* Re: Does nilfs2 do any in-place writes?
  2014-01-15 12:01 ` Vyacheslav Dubeyko
@ 2014-01-15 15:23 ` Ryusuke Konishi
       [not found]   ` <20140116.002353.94325733.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  2014-01-16 10:03 ` Clemens Eisserer
  1 sibling, 1 reply; 24+ messages in thread
From: Ryusuke Konishi @ 2014-01-15 15:23 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

On Wed, 15 Jan 2014 16:01:44 +0400, Vyacheslav Dubeyko wrote:
> Hi Clemens,
>
> On Wed, 2014-01-15 at 12:44 +0100, Clemens Eisserer wrote:
>> Hi Vyacheslav,
>>
>> > The main approach of NILFS2 is a COW (copy-on-write) policy. It means
>> > that all data and metadata are written in a log manner. Only the
>> > superblocks are placed in fixed positions and updated there. The first
>> > superblock is located at the beginning of the volume, the second one at
>> > the end.
>>
>> Can you give me an estimate of how often the superblock is updated /
>> written to?
>>
>
> NILFS2 has a special method, nilfs_sb_need_update() [1], and a special
> constant, NILFS_SB_FREQ [2], that are used to define the frequency of
> superblock updates. So, as far as I can judge, the default value of
> that frequency under high I/O load is 10 seconds (the minimum interval
> of periodical superblock updates, in seconds).
>
> With the best regards,
> Vyacheslav Dubeyko.
>
> [1] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L254
> [2] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L252

By the way, do you have any nice idea to stop the periodical update of
superblocks?

Most information on the superblock is static (layout information or
so).

sbp->s_state has an ERROR state bit and a VALID state bit, but these
bits are mostly static.

sbp->s_free_blocks_count keeps the free block count at the time, but I
think this information is not important because it can be calculated
from the number of clean segments.

We need the periodical in-place superblock write only for updating a
pointer to the latest log. And this will be eliminable if we can invent
a fast way to determine the latest log.

Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20140116.002353.94325733.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]   ` <20140116.002353.94325733.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-01-16 10:08 ` Vyacheslav Dubeyko
  2014-01-17 22:55 ` Ryusuke Konishi
  2014-01-18  0:00 ` Ryusuke Konishi
  0 siblings, 2 replies; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-16 10:08 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Ryusuke,

On Thu, 2014-01-16 at 00:23 +0900, Ryusuke Konishi wrote:
>
> By the way, do you have any nice idea to stop the periodical update of
> superblocks?
>

Yes, I also think such a suggestion is valuable for NILFS2. But I
suppose the problem is more complex. I mean the situation with writable
snapshots. If we have writable snapshots, it means we need independent
versions of some superblock fields (s_last_cno, s_last_pseg,
s_last_seq, s_mtime, s_wtime, s_mnt_count, s_state, s_c_interval,
s_feature_compat_ro, s_feature_incompat). For example, a snapshot can
be made before xafile creation on a volume, and the writable snapshot
should continue to live without the possibility of xattr creation, and
so on.

> Most information on the superblock is static (layout information or
> so).
>
> sbp->s_state has an ERROR state bit and a VALID state bit, but these
> bits are mostly static.
>
> sbp->s_free_blocks_count keeps the free block count at the time, but I
> think this information is not important because it can be calculated
> from the number of clean segments.
>
> We need the periodical in-place superblock write only for updating a
> pointer to the latest log. And this will be eliminable if we can invent
> a fast way to determine the latest log.
>

As far as I can see, we have more changeable fields in the superblock.
But, of course, it is possible to leave only static information in the
superblock. I assume it makes sense to move the changeable superblock
fields into the super root metadata structure. In such a way we can
provide independent sets of the above-mentioned changeable superblock
fields for every snapshot/checkpoint.

I suppose another problem is having an unchangeable superblock (4 KB)
with a changeable segment area right after it. I think this can be an
FTL-unfriendly situation in the flash case. Maybe it makes sense to
have specially reserved areas (with a size equal to the segment size)
at the beginning and at the end of a NILFS2 volume. These areas can be
used for special metadata structures that are modified on the COW
principle inside the reserved areas. Anyway, we need some metadata
structure on the volume (a tree or something else) that can give
information about the latest log by snapshot number.

So, currently, I have no clear vision of a possible good approach for
an efficient search for the latest log. But it needs to take possible
GC policy changes into account, because a more efficient way of garbage
collection may leave "cold" segments untouched (such a full segment can
contain valid and actual data). As a result, the chain of linked
partial segments on the volume can become more sophisticated and not a
chain of sibling segments. Thereby, the search for the latest log may
not be such a simple and fast operation under the current search
algorithm, I think.

I think the [snapshot number | latest log] tree can be restricted to
one file system block (4 KB). So one possible way is to save changes to
such a tree in a journal-like circular log which keeps a chain of
modified blocks, for example. Maybe it makes sense to keep a pair of
values [current last log; upper bound of the last log]. The upper bound
can be a prediction of where the last log will be after some time.
Thereby, such a tree and the superblock(s) can live in the reserved
areas at the volume's beginning and end.

As you can see, I am not suggesting anything concrete yet. I need to
think it over more deeply.

With the best regards,
Vyacheslav Dubeyko.
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Does nilfs2 do any in-place writes?
  2014-01-16 10:08 ` Vyacheslav Dubeyko
@ 2014-01-17 22:55 ` Ryusuke Konishi
  0 siblings, 0 replies; 24+ messages in thread
From: Ryusuke Konishi @ 2014-01-17 22:55 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

On Thu, 16 Jan 2014 14:08:32 +0400, Vyacheslav Dubeyko wrote:
> Hi Ryusuke,
>
> On Thu, 2014-01-16 at 00:23 +0900, Ryusuke Konishi wrote:
>>
>> By the way, do you have any nice idea to stop the periodical update of
>> superblocks?
>>
>
> Yes, I also think such a suggestion is valuable for NILFS2. But I
> suppose the problem is more complex. I mean the situation with writable
> snapshots. If we have writable snapshots, it means we need independent
> versions of some superblock fields (s_last_cno, s_last_pseg,
> s_last_seq, s_mtime, s_wtime, s_mnt_count, s_state, s_c_interval,
> s_feature_compat_ro, s_feature_incompat). For example, a snapshot can
> be made before xafile creation on a volume, and the writable snapshot
> should continue to live without the possibility of xattr creation, and
> so on.

OK, please tell me what you suppose about writable snapshots. Do you
think we should keep multiple branches or concurrently mountable
namespaces on one device?

I prefer to maintain only one super root block per partition even if we
support writable snapshots. Otherwise, I think we should use multiple
partitions to simplify the design. I mean keeping multiple branches in
one super root block with one DAT file and one sufile in such a case.
Maintaining multiple DAT files and sufiles on one device seems too
complex to me.

Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Does nilfs2 do any in-place writes?
  2014-01-16 10:08 ` Vyacheslav Dubeyko
  2014-01-17 22:55 ` Ryusuke Konishi
@ 2014-01-18  0:00 ` Ryusuke Konishi
  1 sibling, 0 replies; 24+ messages in thread
From: Ryusuke Konishi @ 2014-01-18 0:00 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 16 Jan 2014 14:08:32 +0400, Vyacheslav Dubeyko wrote:
>> Most information on the superblock is static (layout information or
>> so).
>>
>> sbp->s_state has an ERROR state bit and a VALID state bit, but these
>> bits are mostly static.
>>
>> sbp->s_free_blocks_count keeps the free block count at the time, but I
>> think this information is not important because it can be calculated
>> from the number of clean segments.
>>
>> We need the periodical in-place superblock write only for updating a
>> pointer to the latest log. And this will be eliminable if we can invent
>> a fast way to determine the latest log.
>>
>
> As far as I can see, we have more changeable fields in the superblock.
> But, of course, it is possible to leave only static information in the
> superblock. I assume it makes sense to move the changeable superblock
> fields into the super root metadata structure. In such a way we can
> provide independent sets of the above-mentioned changeable superblock
> fields for every snapshot/checkpoint.

I think only fields that constantly change should be put on the super
root block. If we can find a way to look up the latest log, we don't
need to update s_last_cno, s_last_pseg, and s_last_seq so frequently
either. (The cno, pseg, and seq number are obtainable from the latest
log.)

Ideally, I think, we should update the super blocks only at
mount/umount time or when changing file system state bits or feature
bits, etc.

> I suppose another problem is having an unchangeable superblock (4 KB)
> with a changeable segment area right after it. I think this can be an
> FTL-unfriendly situation in the flash case. Maybe it makes sense to
> have specially reserved areas (with a size equal to the segment size)
> at the beginning and at the end of a NILFS2 volume. These areas can be
> used for special metadata structures that are modified on the COW
> principle inside the reserved areas. Anyway, we need some metadata
> structure on the volume (a tree or something else) that can give
> information about the latest log by snapshot number.

OK, I agree it is a reasonable approach to keep the pointer
information. We should avoid frequent erases (overwrites) there, too,
and that may become acceptable by applying a COW policy.

Or, we may be able to reduce the number of seeks/disk scans without
such additional information by combining a binary search on the device
with a restriction on the segment allocator. For instance, we may be
able to reduce the number of block scans for the latest log lookup by
introducing groups of segments in which we ensure that segments are
sequentially allocated in each group. Then we can divide the disk-scan
steps into two phases: one is searching for the latest segment group,
and the other is searching for the latest segment (log) in the group.
This idea of grouping may be nestable.

> So, currently, I have no clear vision of a possible good approach for
> an efficient search for the latest log. But it needs to take possible
> GC policy changes into account, because a more efficient way of garbage
> collection may leave "cold" segments untouched (such a full segment can
> contain valid and actual data). As a result, the chain of linked
> partial segments on the volume can become more sophisticated and not a
> chain of sibling segments. Thereby, the search for the latest log may
> not be such a simple and fast operation under the current search
> algorithm, I think.

Yes, the GC policy is affected. I think it is an acceptable change for
this purpose.

> I think the [snapshot number | latest log] tree can be restricted to
> one file system block (4 KB). So one possible way is to save changes to
> such a tree in a journal-like circular log which keeps a chain of
> modified blocks, for example. Maybe it makes sense to keep a pair of
> values [current last log; upper bound of the last log]. The upper bound
> can be a prediction of where the last log will be after some time.
> Thereby, such a tree and the superblock(s) can live in the reserved
> areas at the volume's beginning and end.

How about considering the latest log scan first, and then extending the
idea to writable snapshots/branches later? The two ideas should be
separable if we keep them with one super root block.

Regards,
Ryusuke Konishi

> As you can see, I am not suggesting anything concrete yet. I need to
> think it over more deeply.
>
> With the best regards,
> Vyacheslav Dubeyko.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
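A sketch of what the two-phase lookup could look like under the
restriction Ryusuke describes. Assume segments within a group are
allocated strictly in order, so their sequence numbers ascend from the
group's first segment up to the newest one and then drop to older (or
zero) values; the newest segment can then be found by binary search
instead of a linear scan. read_seg_seq() is a hypothetical helper
returning the sequence number from a segment's summary block:

/* Phase 2 of the lookup: binary-search the newest segment inside one
 * group of n sequentially allocated segments starting at segment
 * number 'base'.  Assumes sequence numbers ascend up to the newest
 * segment and are smaller (older passes, or zero for unused
 * segments) after it. */
#include <stdint.h>

extern uint64_t read_seg_seq(uint64_t segnum);  /* hypothetical */

static uint64_t find_latest_in_group(uint64_t base, uint64_t n)
{
        uint64_t lo = 0, hi = n - 1;

        while (lo < hi) {
                uint64_t mid = lo + (hi - lo + 1) / 2;

                if (read_seg_seq(base + mid) >= read_seg_seq(base))
                        lo = mid;       /* still in the ascending run */
                else
                        hi = mid - 1;   /* past the newest segment */
        }
        return base + lo;
}

Phase 1 would apply the same idea across groups by comparing the first
sequence number of each group, so the whole lookup would cost a
logarithmic number of reads instead of a full device scan.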
* Re: Does nilfs2 do any in-place writes?
  2014-01-15 12:01 ` Vyacheslav Dubeyko
  2014-01-15 15:23 ` Ryusuke Konishi
@ 2014-01-16 10:03 ` Clemens Eisserer
       [not found]   ` <CAFvQSYSC7+dd93pRH-uok9N+A_s=1VKrfGEppu3qRTg3q=CuXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 24+ messages in thread
From: Clemens Eisserer @ 2014-01-16 10:03 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

> NILFS2 has a special method, nilfs_sb_need_update() [1], and a special
> constant, NILFS_SB_FREQ [2], that are used to define the frequency of
> superblock updates. So, as far as I can judge, the default value of
> that frequency under high I/O load is 10 seconds (the minimum interval
> of periodical superblock updates, in seconds).
>
> [1] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L254
> [2] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L252

Thanks for the in-depth explanation.

> We need the periodical in-place superblock write only for updating a
> pointer to the latest log. And this will be eliminable if we can invent
> a fast way to determine the latest log.

Maybe it would be enough to detect whether the stored pointer to the
last log is recent and otherwise perform a slow scan?

Thanks, Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <CAFvQSYSC7+dd93pRH-uok9N+A_s=1VKrfGEppu3qRTg3q=CuXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
       [not found]   ` <CAFvQSYSC7+dd93pRH-uok9N+A_s=1VKrfGEppu3qRTg3q=CuXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-16 10:10 ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 24+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-16 10:10 UTC (permalink / raw)
  To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2014-01-16 at 11:03 +0100, Clemens Eisserer wrote:
>
> Maybe it would be enough to detect whether the stored pointer to the
> last log is recent and otherwise perform a slow scan?
>

A slow scan means a slow mount. :) Are you ready to wait for a really
long mount to finish?

With the best regards,
Vyacheslav Dubeyko.
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads: [~2014-01-19 14:11 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-17 19:19 Does nilfs2 do any in-place writes? Mark Trumpold
-- strict thread matches above, loose matches on Subject: below --
2014-01-16 19:40 Mark Trumpold
2014-01-16 17:48 Mark Trumpold
2014-01-16 18:41 ` Clemens Eisserer
2014-01-17 6:31 ` Vyacheslav Dubeyko
2014-01-18 1:47 ` Ryusuke Konishi
[not found] ` <20140118.104703.356941870.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-01-18 9:44 ` Clemens Eisserer
[not found] ` <CAFvQSYQZtf0fsfX_7zNHdw4hVo9VHggN9F0TYEi1Fwo2ZvS4Ng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-18 16:25 ` Mark Trumpold
[not found] ` <CEFFE8EC.9A4A%markt-qk0wvQ0ghJwAvxtiuMwx3w@public.gmane.org>
2014-01-18 18:11 ` Vyacheslav Dubeyko
2014-01-18 11:45 ` Andreas Rohner
[not found] ` <52DA696D.6010206-hi6Y0CQ0nG0@public.gmane.org>
2014-01-18 23:08 ` Vyacheslav Dubeyko
[not found] ` <04877EE1-F5BF-41CE-AC92-CD9C3ED0B8A4-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-01-18 23:08 ` Andreas Rohner
[not found] ` <52DB098A.4010300-hi6Y0CQ0nG0@public.gmane.org>
2014-01-19 5:43 ` Ryusuke Konishi
[not found] ` <20140119.144345.373615211.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-01-19 14:11 ` Andreas Rohner
2014-01-15 10:44 Clemens Eisserer
[not found] ` <CAFvQSYSzpX_WpUi9KpGj0pZvzhw2mfzzOqcgdj9ripXAjipmtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-15 10:52 ` Vyacheslav Dubeyko
2014-01-15 11:44 ` Clemens Eisserer
[not found] ` <CAFvQSYTG6HBVc9iodYyvCejwf889jiwOPsVb1Hi8cDrR9pOGeg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-15 12:01 ` Vyacheslav Dubeyko
2014-01-15 15:23 ` Ryusuke Konishi
[not found] ` <20140116.002353.94325733.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-01-16 10:08 ` Vyacheslav Dubeyko
2014-01-17 22:55 ` Ryusuke Konishi
2014-01-18 0:00 ` Ryusuke Konishi
2014-01-16 10:03 ` Clemens Eisserer
[not found] ` <CAFvQSYSC7+dd93pRH-uok9N+A_s=1VKrfGEppu3qRTg3q=CuXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-16 10:10 ` Vyacheslav Dubeyko