* Does nilfs2 do any in-place writes?
@ 2014-01-15 10:44 Clemens Eisserer
From: Clemens Eisserer @ 2014-01-15 10:44 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi,
Recently my Raspberry Pi destroyed a 32GB SD card after only 4 days,
because that cheap SD card seemed to have issues with wear-leveling.
The areas where the ext4 journal was stored were no longer readable or writable.
I wonder what write-access patterns nilfs2 exhibits.
Are there any frequent in-place updates to statically positioned data
structures (superblock, translation tables, ...) or is the data mostly
written sequentially?
Thank you in advance, Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Does nilfs2 do any in-place writes?
@ 2014-01-15 10:52 Vyacheslav Dubeyko
From: Vyacheslav Dubeyko @ 2014-01-15 10:52 UTC (permalink / raw)
To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2014-01-15 at 11:44 +0100, Clemens Eisserer wrote:
> Hi,
>
> Recently my Raspberry Pi destroyed a 32GB SD card after only 4 days,
> because that cheap SD card seemed to have issues with wear-leveling.
> The areas where the ext4 journal was stored were no longer readable or writable.
>
> I wonder what write-access patterns nilfs2 exhibits.
> Are there any frequent in-place updates to statically positioned data
> structures (superblock, translation tables, ...) or is the data mostly
> written sequentially?

The main approach of NILFS2 is a COW (copy-on-write) policy: all data
and metadata are written in a log-structured manner. Only the
superblocks are placed at fixed positions and updated in place. The
first superblock is located at the beginning of the volume, the second
one at the end.

With the best regards,
Vyacheslav Dubeyko.
* Re: Does nilfs2 do any in-place writes?
@ 2014-01-15 11:44 Clemens Eisserer
From: Clemens Eisserer @ 2014-01-15 11:44 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

> The main approach of NILFS2 is a COW (copy-on-write) policy: all data
> and metadata are written in a log-structured manner. Only the
> superblocks are placed at fixed positions and updated in place. The
> first superblock is located at the beginning of the volume, the
> second one at the end.

Can you give me an estimate of how often the superblock is updated /
written to?

Thanks a lot,
Clemens
* Re: Does nilfs2 do any in-place writes?
@ 2014-01-15 12:01 Vyacheslav Dubeyko
From: Vyacheslav Dubeyko @ 2014-01-15 12:01 UTC (permalink / raw)
To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Clemens,

On Wed, 2014-01-15 at 12:44 +0100, Clemens Eisserer wrote:
> Can you give me an estimate of how often the superblock is updated /
> written to?

NILFS2 has a helper, nilfs_sb_need_update() [1], and a constant,
NILFS_SB_FREQ [2], which together control how frequently the
superblocks are updated. So, as far as I can judge, under sustained
I/O load the superblocks are rewritten about every 10 seconds by
default (NILFS_SB_FREQ is the minimum interval of the periodic
superblock update, in seconds).

With the best regards,
Vyacheslav Dubeyko.

[1] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L254
[2] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L252
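[Editor's note: for readers skimming the thread, the time-based check
described above can be sketched in plain C. This is a simplified
illustration of the mechanism, not the actual kernel code; the
function signature and field names only loosely paraphrase the ones in
fs/nilfs2/the_nilfs.h.]

```c
#include <stdbool.h>
#include <time.h>

#define NILFS_SB_FREQ 10  /* minimum interval of periodic superblock updates (seconds) */

/* Simplified sketch: the superblocks are rewritten in place only when
 * they are dirty and at least NILFS_SB_FREQ seconds have elapsed
 * since the last write (sbwtime). Under sustained load this yields
 * roughly one in-place superblock write every 10 seconds. */
static bool sb_need_update(bool sb_dirty, time_t sbwtime, time_t now)
{
    return sb_dirty && now >= sbwtime + NILFS_SB_FREQ;
}
```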
* Re: Does nilfs2 do any in-place writes?
@ 2014-01-15 15:23 Ryusuke Konishi
From: Ryusuke Konishi @ 2014-01-15 15:23 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

On Wed, 15 Jan 2014 16:01:44 +0400, Vyacheslav Dubeyko wrote:
> NILFS2 has a helper, nilfs_sb_need_update() [1], and a constant,
> NILFS_SB_FREQ [2], which together control how frequently the
> superblocks are updated. So, as far as I can judge, under sustained
> I/O load the superblocks are rewritten about every 10 seconds by
> default.

By the way, do you have any nice idea for stopping the periodic update
of the superblocks?

Most information in the superblock is static (layout information and
so on).

sbp->s_state has an ERROR state bit and a VALID state bit, but these
bits are mostly static.

sbp->s_free_blocks_count keeps the free block count at the time of the
update, but I think this information is not important because it can
be calculated from the number of clean segments.

We need the periodic in-place superblock write only to update the
pointer to the latest log. And this would become eliminable if we
could invent a fast way to determine the latest log.

Regards,
Ryusuke Konishi
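[Editor's note: to make the trade-off concrete — if each log's summary
carried a verifiable, monotonically increasing sequence number, mount
code could in principle recover the latest log by scanning instead of
reading a superblock pointer, which is exactly the cost the thread
wants to reduce. A hypothetical sketch; the struct and its fields are
illustrative, not the on-disk nilfs2 format:]

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-log summary: seq grows monotonically with every
 * written log; valid is set once magic and checksum verify. */
struct log_summary {
    uint64_t seq;
    int      valid;
};

/* Full scan: the latest log is the valid summary with the greatest
 * seq. Returns nlogs when no valid log exists. O(nlogs) reads is the
 * price of not keeping an up-to-date pointer in the superblock. */
static size_t find_latest_log(const struct log_summary *logs, size_t nlogs)
{
    size_t best = nlogs;
    for (size_t i = 0; i < nlogs; i++)
        if (logs[i].valid && (best == nlogs || logs[i].seq > logs[best].seq))
            best = i;
    return best;
}
```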
* Re: Does nilfs2 do any in-place writes?
@ 2014-01-16 10:08 Vyacheslav Dubeyko
From: Vyacheslav Dubeyko @ 2014-01-16 10:08 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Ryusuke,

On Thu, 2014-01-16 at 00:23 +0900, Ryusuke Konishi wrote:
>
> By the way, do you have any nice idea for stopping the periodic
> update of the superblocks?
>

Yes, I also think such a suggestion is valuable for NILFS2. But I
suppose the problem is more complex; I mean the situation with
writable snapshots. If we ever have writable snapshots, we will need
independent versions of some superblock fields (s_last_cno,
s_last_pseg, s_last_seq, s_mtime, s_wtime, s_mnt_count, s_state,
s_c_interval, s_feature_compat_ro, s_feature_incompat). For example, a
snapshot can be made before xafile creation on a volume, and a
writable snapshot of it should then continue to live without the
possibility of xattr creation, and so on.

> Most information in the superblock is static (layout information and
> so on).
>
> sbp->s_state has an ERROR state bit and a VALID state bit, but these
> bits are mostly static.
>
> sbp->s_free_blocks_count keeps the free block count at the time of
> the update, but I think this information is not important because it
> can be calculated from the number of clean segments.
>
> We need the periodic in-place superblock write only to update the
> pointer to the latest log. And this would become eliminable if we
> could invent a fast way to determine the latest log.
>

As far as I can see, we have more changeable fields in the superblock.
But, of course, it is possible to leave only static information in the
superblock. I assume it makes sense to move the changeable superblock
fields into the super root metadata structure. In such a way we can
provide independent sets of the above-mentioned changeable fields for
every snapshot/checkpoint.

I suppose another problem is having an unchangeable superblock (4 KB)
with a changeable segment area right after it. I think this can be an
FTL-unfriendly situation when flash is used. Maybe it makes sense to
have specially reserved areas (with a size equal to the segment size)
at the beginning and at the end of a NILFS2 volume. These areas could
be used for special metadata structures that are modified following
the COW principle inside the reserved areas. Anyway, we need some
metadata structure on the volume (a tree or something else) that can
return the latest log for a given snapshot number.

So, currently, I don't have a clear vision of a good approach for
efficient search of the latest log. But it needs to take into account
a possible GC policy change, because a more efficient way of garbage
collection can be to leave "cold" segments untouched (such a full
segment can contain valid and actual data). As a result, the chain of
linked partial segments on the volume can be more sophisticated than a
chain of sibling segments. Thereby, searching for the latest log may
not be such a simple and fast operation under the current search
algorithm, I think.

I think the [snapshot number | latest log] tree can be restricted to
one file system block (4 KB). So, one possible way is to save changes
to such a tree in a journal-like circular log which keeps the chain of
modified blocks, for example. Maybe it also makes sense to keep a pair
of values [current last log; upper bound of the last log], where the
upper bound is a prediction of where the last log will be after some
time. Thereby, such a tree and the superblock(s) can live in reserved
areas at the beginning and end of the volume.

As you can see, I am not suggesting anything concrete yet. I need to
think it over more deeply.

With the best regards,
Vyacheslav Dubeyko.
* Re: Does nilfs2 do any in-place writes?
@ 2014-01-17 22:55 Ryusuke Konishi
From: Ryusuke Konishi @ 2014-01-17 22:55 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

On Thu, 16 Jan 2014 14:08:32 +0400, Vyacheslav Dubeyko wrote:
> Yes, I also think such a suggestion is valuable for NILFS2. But I
> suppose the problem is more complex; I mean the situation with
> writable snapshots. If we ever have writable snapshots, we will need
> independent versions of some superblock fields (s_last_cno,
> s_last_pseg, s_last_seq, s_mtime, s_wtime, s_mnt_count, s_state,
> s_c_interval, s_feature_compat_ro, s_feature_incompat). For example,
> a snapshot can be made before xafile creation on a volume, and a
> writable snapshot of it should then continue to live without the
> possibility of xattr creation, and so on.

OK, please tell me what you suppose about the writable snapshot.

Do you think we should keep multiple branches or concurrently
mountable namespaces on one device?

I prefer to maintain only one super root block per partition even if
we support writable snapshots. Otherwise, I think we should use
multiple partitions to simplify the design.

I mean keeping multiple branches in one super root block with one DAT
file and one sufile in such a case. Maintaining multiple DAT files and
sufiles on one device seems too complex to me.

Regards,
Ryusuke Konishi
* Re: [writable snapshots discussion] Does nilfs2 do any in-place writes?
@ 2014-01-20 11:54 Vyacheslav Dubeyko
From: Vyacheslav Dubeyko @ 2014-01-20 11:54 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Ryusuke,

On Sat, 2014-01-18 at 07:55 +0900, Ryusuke Konishi wrote:
> OK, please tell me what you suppose about the writable snapshot.
>
> Do you think we should keep multiple branches or concurrently
> mountable namespaces on one device?
>

When I think about the notion of a snapshot, my understanding is this:
a read-only snapshot is a "frozen" file system state, while a writable
snapshot is an isolated file system state. So I started from this
understanding in my considerations. And I suppose that when we keep
info about several snapshots in one super root, we have the "multiple
branches" approach. But right now we use the "concurrently mountable
namespaces" approach for read-only snapshots, as far as I can see. So
I think users like and use "concurrently mountable namespaces", and we
should keep and evolve this approach.

I haven't thought deeply about writable snapshots yet, but maybe it
will be necessary to modify the VFS for multiple-writable-snapshot
support (I have no concrete vision of this yet, though). Thereby,
maybe the "multiple branches" and "concurrently mountable namespaces"
approaches are not contradictory but complementary.

> I prefer to maintain only one super root block per partition even if
> we support writable snapshots. Otherwise, I think we should use
> multiple partitions to simplify the design.
>
> I mean keeping multiple branches in one super root block with one
> DAT file and one sufile in such a case. Maintaining multiple DAT
> files and sufiles on one device seems too complex to me.
>

I suppose your vision is right. But, anyway, more concrete suggestions
need to be elaborated for discussion. I simply mentioned the topic
without suggesting anything concrete, and I am not ready to suggest
ideas right now. :) Firstly, I want to finish my consideration of
changing the superblocks' in-place update policy. Anyway, I am going
to return to the discussion about writable snapshots.

With the best regards,
Vyacheslav Dubeyko.
* Re: Does nilfs2 do any in-place writes?
@ 2014-01-18 0:00 Ryusuke Konishi
From: Ryusuke Konishi @ 2014-01-18 0:00 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 16 Jan 2014 14:08:32 +0400, Vyacheslav Dubeyko wrote:
> As far as I can see, we have more changeable fields in the
> superblock. But, of course, it is possible to leave only static
> information in the superblock. I assume it makes sense to move the
> changeable superblock fields into the super root metadata structure.
> In such a way we can provide independent sets of the above-mentioned
> changeable fields for every snapshot/checkpoint.

I think only fields that constantly change should be put in the super
root block.

If we can find a way to look up the latest log, we don't need to
update s_last_cno, s_last_pseg, and s_last_seq so frequently either
(the cno, pseg, and seq number can be obtained from the latest log).
Ideally, I think, we should update the super blocks only at
mount/umount time or when changing file system state bits, feature
bits, etc.

> I suppose another problem is having an unchangeable superblock
> (4 KB) with a changeable segment area right after it. I think this
> can be an FTL-unfriendly situation when flash is used. Maybe it
> makes sense to have specially reserved areas (with a size equal to
> the segment size) at the beginning and at the end of a NILFS2
> volume. These areas could be used for special metadata structures
> that are modified following the COW principle inside the reserved
> areas. Anyway, we need some metadata structure on the volume (a tree
> or something else) that can return the latest log for a given
> snapshot number.

Ok, I agree it is a reasonable approach to keep the pointer
information there. We should avoid frequent erases (overwrites) there,
too, and that may become acceptable by applying a COW policy.

Or, we may be able to reduce the number of seeks/disk scans without
such additional information by combining a binary search on the device
with a restriction on the segment allocator. For instance, we may be
able to reduce the number of block scans for the latest-log lookup by
introducing groups of segments in which we ensure that segments are
sequentially allocated within each group. Then we can divide the
disk-scan into two phases: one searches for the latest segment group,
and the other searches for the latest segment (log) within that group.
This idea of grouping could also be nested.

> So, currently, I don't have a clear vision of a good approach for
> efficient search of the latest log. But it needs to take into
> account a possible GC policy change, because a more efficient way of
> garbage collection can be to leave "cold" segments untouched (such a
> full segment can contain valid and actual data). As a result, the
> chain of linked partial segments on the volume can be more
> sophisticated than a chain of sibling segments. Thereby, searching
> for the latest log may not be such a simple and fast operation under
> the current search algorithm, I think.

Yes, the GC policy is affected. I think it is an acceptable change for
this purpose.

> I think the [snapshot number | latest log] tree can be restricted to
> one file system block (4 KB). So, one possible way is to save
> changes to such a tree in a journal-like circular log which keeps
> the chain of modified blocks, for example. Maybe it also makes sense
> to keep a pair of values [current last log; upper bound of the last
> log], where the upper bound is a prediction of where the last log
> will be after some time. Thereby, such a tree and the superblock(s)
> can live in reserved areas at the beginning and end of the volume.

How about considering the latest-log scan first, and then extending
the idea for writable snapshots/branches later? The two ideas should
be separable if we think of keeping them within one super root block.

Regards,
Ryusuke Konishi
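[Editor's note: the second phase of the proposed two-phase lookup can
be sketched as follows, under the allocator restriction suggested
above: inside one segment group, segments are allocated strictly
sequentially, so the written slots form a prefix of the group, and
finding the newest log in the group becomes a binary search for the
end of that prefix instead of a linear scan. The 0-means-unwritten
convention is an assumption of the sketch; the real validity test
would use magic and checksum.]

```c
#include <stddef.h>
#include <stdint.h>

/* Binary search for the last written slot of a group whose written
 * slots form a prefix. seq[i] == 0 stands for "not yet written".
 * Returns -1 when the group is still empty. O(log nslots) probes
 * instead of O(nslots). */
static long last_written_slot(const uint64_t *seq, size_t nslots)
{
    size_t lo = 0, hi = nslots;   /* invariant: all slots < lo are written */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (seq[mid] != 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (long)lo - 1;
}
```

Phase one (locating the latest group) could apply the same idea one
level up, which is what the nesting remark in the mail suggests.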
* Re: [static superblock discussion] Does nilfs2 do any in-place writes?
@ 2014-01-28 9:25 Vyacheslav Dubeyko
From: Vyacheslav Dubeyko @ 2014-01-28 9:25 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Ryusuke,

This is my improved vision of a possible approach to changing the
in-place update of the superblock to a COW policy. I suppose the
current description includes everything we discussed previously, and
we can continue to deepen this discussion.

The approach is based on the necessity of having two areas, at the
beginning and at the end of a NILFS2 volume. Each such area should
have a capacity equal to the segment size. The goal of these two areas
is to provide an FTL-friendly way of storing information about the
latest log and the modified superblock fields by means of a COW
(copy-on-write) policy.

At the beginning of a NILFS2 volume the primary superblock area is
located. The primary superblock area begins with the static superblock
that is created during NILFS2 volume creation by the mkfs tool. This
superblock (the primary superblock) is located 1024 bytes from the
beginning of the volume (as it is placed currently). The primary
superblock is left untouched while the primary superblock area is
being filled with modified information. The initial state of the
superblock can be rewritten at the moment the next iteration of
filling the primary superblock area begins (because this area lives
like a circular buffer).

------------------------------------------------------------
| Primary superblock |           Modifiable area            |
------------------------------------------------------------
|<------ 4 KB ----->|
|<-------------------- segment size ---------------------->|

At the opposite side of the volume (at the volume's end) the secondary
(or backup) superblock area is located.
This area begins with a modifiable area and ends with the secondary
superblock (as it is located currently). The modifiable area of the
secondary superblock area lives like the modifiable area of the
primary superblock area.

------------------------------------------------------------
|           Modifiable area           | Secondary superblock |
------------------------------------------------------------
                                      |<------ 4 KB ------->|
|<-------------------- segment size ---------------------->|

The primary and secondary superblock areas have the goal of keeping
copies of super roots. And, first of all, namely these areas are used
for searching for the latest log. These areas should keep both the
super root and the physical block of this super root's placement.
Moreover, the primary and secondary superblock areas have different
update frequencies. The secondary superblock area is updated on every
umount, or once every several hours (if system uptime is significant).
The primary superblock area is updated more frequently. The update
frequency of the primary superblock area can be based on a timeout or
on the count of constructed segments. But, anyway, it makes sense to
take into account only full segments instead of partial segments.
Maybe it makes sense to keep a more complex combination in the
modifiable area: super root + diff from the superblock state +
physical block of the super root's placement.

The modifiable area should have a special filling policy. This policy
doesn't contradict the COW policy, but it is not implemented in a
simple sequential manner. Namely, the modifiable area should be
divided into several groups (the count of groups can be a configurable
option). Moreover, the primary and secondary superblock areas may have
different group counts. Thereby, every group will contain some count
of blocks.
-------------------------------------------------------------
|   Group1   |   Group2   |   Group3   |   ****   |  GroupN  |
-------------------------------------------------------------
|<-------------------- Modifiable area -------------------->|

Saved blocks are distributed between the groups by the policy that, on
every iteration, each next block is saved into the next group. Once
all groups in the modifiable area hold an equal count of saved blocks,
the next iteration begins, starting again from the first group.

FIRST ITERATION [A phase]:

(1) first block
-------------------------------------------------------------
|A1|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
-------------------------------------------------------------
|<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->|

(2) second block
-------------------------------------------------------------
|A1|  |  |  |  |A2|  |  |  |  |  |  |  |  |  |  |  |  |  |  |
-------------------------------------------------------------
|<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->|

(N) Nth block
-------------------------------------------------------------
|A1|  |  |  |  |A2|  |  |  |  |A3|  |  |  |  |An|  |  |  |  |
-------------------------------------------------------------
|<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->|

SECOND ITERATION [B phase]:
-------------------------------------------------------------
|A1|B1|  |  |  |A2|B2|  |  |  |A3|B3|  |  |  |An|Bn|  |  |  |
-------------------------------------------------------------
|<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->|

Nth ITERATION [E phase]:
-------------------------------------------------------------
|A1|B1|C1|D1|E1|A2|B2|C2|D2|E2|A3|B3|C3|D3|E3|An|Bn|Cn|Dn|En|
-------------------------------------------------------------
|<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->|

Finally, when the modifiable area is completely filled, it is possible
to discard the area's content and begin the filling iterations again.
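[Editor's note: the rotation shown in the diagrams maps the k-th saved
block to a slot deterministically. A small sketch of that mapping;
the names are illustrative and nothing like this exists in nilfs2
today:]

```c
#include <stddef.h>

/* The k-th block (0-based) saved into the modifiable area lands in
 * group (k mod ngroups) at in-group offset (k / ngroups), i.e.
 * A1, A2, ..., An, then B1, B2, ... exactly as in the diagrams.
 * Returns the absolute block offset inside the modifiable area
 * (capacity ngroups * slots_per_group). */
static size_t fill_slot(size_t k, size_t ngroups, size_t slots_per_group)
{
    return (k % ngroups) * slots_per_group + k / ngroups;
}
```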
We will have two modifiable areas filled with different frequencies,
and some degree of replication of the information. Thereby, this
provides a basis for safe and independent discarding of the modifiable
areas.

The special filling policy has the goal of providing a basis for
efficient search. Namely, the first group contains blocks that differ
from each other by some period. During saving we have the sequence
[A1,A2,A3,..,An], [B1,B2,B3,..,Bn], ..., [E1,E2,E3,..,En], but the
first group will contain (A1,B1,C1,D1,E1). Thereby, passing
item-by-item through the first group means jumping with some period.
Moreover, in the case of some failure it is possible to start the
search from any group (with decreased search efficiency).

The magic signature, header checksum, and timestamps need to be taken
into account when comparing items in a group. This makes it possible
to distinguish valid blocks from empty and invalid ones, and older
blocks from the latest ones. Searching in a dedicated area gives the
opportunity to use a read-ahead technique. Moreover, if a group
contains many items, it is possible to increase the step between the
current and next items during the search; for example, the following
sequence of steps could be used: 0, 1, 3, 5, 7, and so on. If we have
found the latest item in the first group, for example, then it is
possible to find the latest item in the whole sequence by jumping by
the group period (the count of blocks in a group).

The two modifiable areas are filled with different frequencies, and
this gives the opportunity to use a special search algorithm. Such an
algorithm can use, for example, the secondary superblock area for a
rough, preliminary search (because this modifiable area changes
rarely). Then the algorithm can continue the search in the primary
superblock area (because this modifiable area changes more
frequently).

Moreover, the segctor thread has knowledge about all dirty files, and
it can predict, theoretically, how many segments will be constructed.
Thereby, it is possible to save such a prediction in the items of the
modifiable area's groups, in the form of a hint that can be used
during the search to improve the efficiency of the search algorithm.

With the best regards,
Vyacheslav Dubeyko.
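[Editor's note: the two-step search described in this mail — find the
newest iteration ("phase") by scanning only the first group, then jump
through the area with the group period — could look roughly as below.
The flat timestamp array and the 0-means-empty convention are
assumptions of this sketch; the real comparison would use magic,
checksum, and timestamp as the mail says.]

```c
#include <stddef.h>
#include <stdint.h>

/* ts[] is the flat modifiable area: ts[g * slots_per_group + p] is
 * group g, in-group slot p; 0 means empty or invalid. Step 1 scans
 * only the first group to find the newest phase p (the fill policy
 * guarantees group 0 holds one item per phase: A1, B1, C1, ...).
 * Step 2 jumps with a stride of slots_per_group, visiting only that
 * phase's slot in every group, and returns the index of the newest
 * item overall: slots_per_group + ngroups probes instead of a full
 * scan. */
static size_t find_latest_item(const uint64_t *ts, size_t ngroups,
                               size_t slots_per_group)
{
    size_t phase = 0;
    for (size_t p = 1; p < slots_per_group; p++)
        if (ts[p] > ts[phase])
            phase = p;

    size_t best = phase;
    for (size_t g = 1; g < ngroups; g++) {
        size_t i = g * slots_per_group + phase;
        if (ts[i] > ts[best])
            best = i;
    }
    return best;
}
```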
* Re: [static superblock discussion] Does nilfs2 do any in-place writes? [not found] ` <1390901114.2942.11.camel-dzAnj6fV1RxGeWtTaGDT1UEK6ufn8VP3@public.gmane.org> @ 2014-01-29 12:44 ` Andreas Rohner [not found] ` <52E8F7A7.8010505-hi6Y0CQ0nG0@public.gmane.org> 2014-01-29 18:18 ` Clemens Eisserer 1 sibling, 1 reply; 39+ messages in thread From: Andreas Rohner @ 2014-01-29 12:44 UTC (permalink / raw) To: Vyacheslav Dubeyko, Ryusuke Konishi Cc: Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA On 2014-01-28 10:25, Vyacheslav Dubeyko wrote: > Hi Ryusuke, > > This is my improved vision of possible approach to change in-place > update of superblock on COW policy. I suppose that current description > includes all that we discussed previously. And we can continue to deepen > this discussion. > > Approach is based on necessity to have two areas at the begin and > at the end of a NILFS2 volume. Every such area should have capacity > is equal to segment size. The goal of these two areas is to provide > a FTL-friendly way of storing information about latest log and > modified superblock's fields by means of COW (Copy-On-Write) policy. > > At the begin of a NILFS2 volume is located primary superblock area. > Primary superblock area begins from static superblock is created > during NILFS2 volume creation by means of mkfs tool. This superblock > (primary superblock) is located on 1024 bytes from the volume begin > (as it placed currently). The primary superblock leaves untouched > during filling primary superblock area by modified information. > Initial state of superblock can be rewritten at the moment of > beginning next iteration of filling of primary superblock area > (because this area lives likewise circular buffer). 
> 
> ------------------------------------------------------------
> | Primary superblock | Modifiable area |
> ------------------------------------------------------------
> |<---- 4 KB ------>|
> |<-------------------- segment size ---------------------->|
> 
> On the opposite side of the volume (at the volume's end) is located the
> secondary (or backup) superblock area. This area begins with the
> modifiable area and ends with the secondary superblock (as it is
> located currently). The modifiable area of the secondary superblock
> area behaves like the modifiable area of the first superblock area.
> 
> ------------------------------------------------------------
> | Modifiable area | Secondary superblock |
> ------------------------------------------------------------
> |<------ 4 KB ------>|
> |<-------------------- segment size ---------------------->|
> 
> The primary and secondary superblock areas have the goal of keeping
> copies of super roots. And, first of all, it is these areas that are
> used for finding the latest log. These areas should keep both the super
> root and the physical block number of the super root's placement.
> Moreover, the primary and secondary superblock areas have different
> update frequencies. The secondary superblock area is updated on every
> umount, or once every several hours (if we have significant system
> uptime). The primary superblock area is updated more frequently. The
> frequency of the primary superblock area's updates can be based on a
> timeout or on the count of constructed segments. But, anyway, it makes
> sense to take into account only full segments instead of partial
> segments. Maybe it makes sense to keep a more complex combination in
> the modifiable area: super root + diff of the superblock state +
> physical block number of the super root's placement.
> 
> The modifiable area should have a special filling policy. This policy
> doesn't contradict the COW policy, but it is not implemented in a
> sequential manner. Namely, the modifiable area should be divided into
> several groups (the count of groups can be a configurable option).
> Moreover, the primary and secondary superblock areas would have
> different group counts. Thereby, every group will contain some count
> of blocks.
> 
> -------------------------------------------------------------
> | Group1 | Group2 | Group3 | **** | GroupN |
> -------------------------------------------------------------
> |<-------------------- Modifiable area -------------------->|
> 
> Saved blocks are distributed between the groups by a policy whereby
> each next block is saved in the next group on every iteration. If all
> groups in the modifiable area have an equal count of saved blocks,
> then the next iteration begins, starting again from the first group.
> 
> FIRST ITERATION [A phase]:
> 
> (1) first block
> -------------------------------------------------------------
> |A1| | | | | | | | | | | | | | | | | | | |
> -------------------------------------------------------------
> |<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->|
> 
> (2) second block
> -------------------------------------------------------------
> |A1| | | | |A2| | | | | | | | | | | | | | |
> -------------------------------------------------------------
> |<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->|
> 
> (N) Nth block
> -------------------------------------------------------------
> |A1| | | | |A2| | | | |A3| | | | |An| | | | |
> -------------------------------------------------------------
> |<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->|
> 
> SECOND ITERATION [B phase]:
> 
> -------------------------------------------------------------
> |A1|B1| | | |A2|B2| | | |A3|B3| | | |An|Bn| | | |
> -------------------------------------------------------------
> |<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->|
> 
> Nth ITERATION [E phase]:
> 
> -------------------------------------------------------------
> |A1|B1|C1|D1|E1|A2|B2|C2|D2|E2|A3|B3|C3|D3|E3|An|Bn|Cn|Dn|En|
> -------------------------------------------------------------
> |<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->|
> 
> Finally, when the modifiable area is completely filled, it is possible
> to discard the area's content and to begin the filling iterations
> again. We will have two modifiable areas filled at different
> frequencies and some degree of replication of the information.
> Thereby, it provides a basis for safe and independent discarding of
> the modifiable areas.
> 
> The special filling policy has the goal of providing a basis for an
> efficient search. Namely, the first group contains blocks that differ
> from each other by some period. We have this sequence during saving:
> [A1,A2,A3,..,An], [B1,B2,B3,..,Bn], ..., [E1,E2,E3,..,En]. But the
> first group will contain (A1,B1,C1,D1,E1). Thereby, passing
> item-by-item through the first group means jumping with some period.
> Moreover, in the case of some failure, it is possible to start the
> search from any group (with decreased search efficiency).
> It is necessary to take into account the magic signature, header
> checksum and timestamps during the comparison of items in a group.
> This provides the opportunity to distinguish valid blocks from empty
> and invalid ones, and to distinguish older blocks from the latest
> ones.
> 
> Searching in a dedicated area gives the opportunity to use a
> read-ahead technique. Moreover, if a group contains many items, then
> it is possible to increase the step between the current and next items
> during the search. For example, it is possible to use such a sequence
> of steps during the search: 0, 1, 3, 5, 7, and so on. If we have found
> the latest item in the first group, for example, then it is possible
> to find the latest item in the whole sequence by jumping by the group
> period (the count of blocks in a group).
> 
> Two modifiable areas are filled at different frequencies, and this
> gives the opportunity to use a special searching algorithm. Such an
> algorithm can use, for example, the secondary superblock area for a
> rough, preliminary search (because this modifiable area is changed
> rarely).
> Then, further, the algorithm can continue the search in the first
> superblock area (because this modifiable area is changed more
> frequently). Moreover, the segctor thread has knowledge about all
> dirty files, and it can predict, theoretically, how many segments will
> be constructed. Thereby, it is possible to store such a prediction as
> a hint in the items of the modifiable area's groups, so that it can be
> used during search to improve the efficiency of the search algorithm.
> 
> With the best regards,
> Vyacheslav Dubeyko.

I hope I understand your approach correctly. Can it be summarized as
follows: Instead of overwriting the super block, you want to reserve the
first segment to write the super block in a round-robin way into groups,
thereby spreading the writes over a larger area. Then the groups should
probably have a typical erase block size like 512k.

If that is true, I don't think you need any special algorithm to search
for the latest super block. You just read in the whole segment at mount
time and select the one with the biggest s_last_cno.

What about Ryusuke's suggestion of never updating the super block and
instead using a clever segment allocation scheme that allows a binary
search for the latest segment?

br,
Andreas Rohner
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply	[flat|nested] 39+ messages in thread
[parent not found: <52E8F7A7.8010505-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [static superblock discussion] Does nilfs2 do any in-place writes?
  [not found]     ` <52E8F7A7.8010505-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-01-29 13:19       ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 39+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-29 13:19 UTC (permalink / raw)
  To: Andreas Rohner
  Cc: Ryusuke Konishi, Clemens Eisserer, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2014-01-29 at 13:44 +0100, Andreas Rohner wrote:
> 
> I hope I understand your approach correctly. Can it be summarized as
> follows: Instead of overwriting the super block you want to reserve the
> first segment to write the super block in a round-robin way into groups.

Not the superblock but the super root. They are different things.

> Thereby spreading the writes over a larger area. Then the groups should
> probably have a typical erase block size like 512k.

The group size doesn't have any relation to the erase block size. A group
is not an erase block. Moreover, different NAND chips can have different
erase block sizes; the erase block size can be 8 MB for modern chips. I
don't think that it needs to have any relation to the erase block size.
Groups can have different sizes, which need to be chosen from the point
of view of algorithm efficiency. This special segment will be filled by
the COW policy anyway.

> If that is true, I
> don't think you need any special algorithm to search the latest super
> block. You just read in the whole segment at mount time and select the
> one with the biggest s_last_cno.
> 

Even if you read the whole segment, you will still need to search. But
if you have a smart algorithm, then you don't need to read the whole
segment into memory. Reading it all can be an expensive operation that
makes mount slower.

> What about Ryusuke's suggestion of never updating the super block and
> instead using a clever segment allocation scheme that allows a binary
> search for the latest segment?
> 

I think that you need to read our discussion with more attention.
Of course, my suggestion can have disadvantages, and it needs to be
discussed more deeply. But right now I have the feeling that you still
misunderstand my suggestion.

Thanks,
Vyacheslav Dubeyko.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply	[flat|nested] 39+ messages in thread
* Re: [static superblock discussion] Does nilfs2 do any in-place writes?
  [not found] ` <1390901114.2942.11.camel-dzAnj6fV1RxGeWtTaGDT1UEK6ufn8VP3@public.gmane.org>
  2014-01-29 12:44   ` Andreas Rohner
@ 2014-01-29 18:18   ` Clemens Eisserer
  [not found]     ` <CAFvQSYSu5CGxs+K6bZUCtq17PrS_paX3bXBuLBRTba_XWYGgAg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 39+ messages in thread
From: Clemens Eisserer @ 2014-01-29 18:18 UTC (permalink / raw)
  To: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

Just to make sure I understand your approach: You reserve two areas (a
few MB in size) which are dedicated to holding only super-root data, and
this data is written at the same frequency as the current superblock?

If so, this most likely won't improve the situation for flash media with
a weak FTL, as those writes will still be concentrated around a few
erase blocks. For devices with a powerful FTL, such as modern SSDs,
updating the superblock periodically doesn't pose a real problem.

I am still somewhat in favour of an optional, simple linear scan. Would
it be possible to check at mount time whether the volume was cleanly
unmounted, and otherwise perform a linear scan? So when I mount the
volume with -o no_superblock_update, a flag is set in the superblock,
which is reset on a clean volume unmount after the most recent segment
has been stored in the superblock. When the flag is still set at mount
time, nilfs knows a scan is required, because the segment recorded in
the superblock is not up to date.

Thanks & regards, Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply	[flat|nested] 39+ messages in thread
[parent not found: <CAFvQSYSu5CGxs+K6bZUCtq17PrS_paX3bXBuLBRTba_XWYGgAg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* [PATCH 0/1] nilfs2: add mount option that reduces super block writes
  [not found]     ` <CAFvQSYSu5CGxs+K6bZUCtq17PrS_paX3bXBuLBRTba_XWYGgAg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-30  2:46       ` Andreas Rohner
  [not found]         ` <cover.1391048231.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
  2014-01-30  8:35       ` [static superblock discussion] Does nilfs2 do any in-place writes? Vyacheslav Dubeyko
  1 sibling, 1 reply; 39+ messages in thread
From: Andreas Rohner @ 2014-01-30 2:46 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner

Hi,

This is only a hacky proof-of-concept implementation and probably full
of nasty bugs. I also haven't really tested it. I was just interested in
how hard it would be to implement Clemens' suggestion of writing the
super block only at umount time and doing a linear scan of all the
segments in case of a file system failure.

The linear scan is only performed if the file system wasn't shut down
properly, so for normal operation there shouldn't be any slowdown.

I repurposed the ss_pad field of nilfs_segment_summary to contain the
crc seed, because I needed a way to distinguish left-over segments from
previous nilfs2 volumes from real segments that are part of the current
file system.

I don't really expect it to be merged or anything. Maybe it can spark a
discussion. Maybe somebody could try it out on an old SD card and time
the mount command or something. I tested it on a virtual machine. It
seemed to recover fine when I killed the VM and mounted it again.
Clearly more testing is necessary...
br,
Andreas Rohner

---

Andreas Rohner (1):
  nilfs2: add mount option that reduces super block writes

 fs/nilfs2/segbuf.c        |  3 ++-
 fs/nilfs2/segment.c       |  3 ++-
 fs/nilfs2/super.c         | 10 +++++++--
 fs/nilfs2/the_nilfs.c     | 54 ++++++++++++++++++++++++++++++++++++++++++++---
 include/linux/nilfs2_fs.h |  4 +++-
 5 files changed, 66 insertions(+), 8 deletions(-)

--
1.8.5.3
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply	[flat|nested] 39+ messages in thread
[parent not found: <cover.1391048231.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>]
* [PATCH 1/1] nilfs2: add mount option that reduces super block writes
  [not found]         ` <cover.1391048231.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-01-30  2:47           ` Andreas Rohner
  [not found]             ` <75ceb45c464097ab556baacf2d15d6ae4b792bb2.1391048231.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
  2014-01-30  3:27           ` [PATCH 0/1] " Andreas Rohner
  2014-01-30  5:29           ` Ryusuke Konishi
  2 siblings, 1 reply; 39+ messages in thread
From: Andreas Rohner @ 2014-01-30 2:47 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner

This patch introduces a mount option bad_ftl that disables the periodic
overwrites of the super block, to make the file system better suited
for flash memory with a bad FTL. The super block is only written at
umount time. So if there is a power outage, the file system needs to be
recovered by a linear scan of all segment summary blocks.

The linear scan is only necessary if the file system wasn't umounted
properly, so the normal mount time is not affected.
Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
 fs/nilfs2/segbuf.c        |  3 ++-
 fs/nilfs2/segment.c       |  3 ++-
 fs/nilfs2/super.c         | 10 +++++++--
 fs/nilfs2/the_nilfs.c     | 54 ++++++++++++++++++++++++++++++++++++++++++++---
 include/linux/nilfs2_fs.h |  4 +++-
 5 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index 2d8be51..4ea9dd6 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -158,6 +158,7 @@ void nilfs_segbuf_fill_in_segsum(struct nilfs_segment_buffer *segbuf)
 {
 	struct nilfs_segment_summary *raw_sum;
 	struct buffer_head *bh_sum;
+	struct the_nilfs *nilfs = segbuf->sb_super->s_fs_info;
 
 	bh_sum = list_entry(segbuf->sb_segsum_buffers.next,
 			    struct buffer_head, b_assoc_buffers);
@@ -172,7 +173,7 @@ void nilfs_segbuf_fill_in_segsum(struct nilfs_segment_buffer *segbuf)
 	raw_sum->ss_nblocks = cpu_to_le32(segbuf->sb_sum.nblocks);
 	raw_sum->ss_nfinfo = cpu_to_le32(segbuf->sb_sum.nfinfo);
 	raw_sum->ss_sumbytes = cpu_to_le32(segbuf->sb_sum.sumbytes);
-	raw_sum->ss_pad = 0;
+	raw_sum->ss_crc_seed = cpu_to_le32(nilfs->ns_crc_seed);
 	raw_sum->ss_cno = cpu_to_le64(segbuf->sb_sum.cno);
 }
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index a1a1916..e8e38a9 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2288,7 +2288,8 @@ static int nilfs_segctor_construct(struct nilfs_sc_info *sci, int mode)
 		if (mode != SC_FLUSH_DAT)
 			atomic_set(&nilfs->ns_ndirtyblks, 0);
 		if (test_bit(NILFS_SC_SUPER_ROOT, &sci->sc_flags) &&
-		    nilfs_discontinued(nilfs)) {
+		    nilfs_discontinued(nilfs) &&
+		    !nilfs_test_opt(nilfs, BAD_FTL)) {
 			down_write(&nilfs->ns_sem);
 			err = -EIO;
 			sbp = nilfs_prepare_super(sci->sc_super,
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 7ac2a12..c3374ed 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -505,7 +505,7 @@ static int nilfs_sync_fs(struct super_block *sb, int wait)
 		err = nilfs_construct_segment(sb);
 
 	down_write(&nilfs->ns_sem);
-	if (nilfs_sb_dirty(nilfs)) {
+	if (nilfs_sb_dirty(nilfs) && !nilfs_test_opt(nilfs, BAD_FTL)) {
 		sbp = nilfs_prepare_super(sb, nilfs_sb_will_flip(nilfs));
 		if (likely(sbp)) {
 			nilfs_set_log_cursor(sbp[0], nilfs);
@@ -691,6 +691,8 @@ static int nilfs_show_options(struct seq_file *seq, struct dentry *dentry)
 		seq_puts(seq, ",norecovery");
 	if (nilfs_test_opt(nilfs, DISCARD))
 		seq_puts(seq, ",discard");
+	if (nilfs_test_opt(nilfs, BAD_FTL))
+		seq_puts(seq, ",bad_ftl");
 
 	return 0;
 }
@@ -712,7 +714,7 @@ static const struct super_operations nilfs_sops = {
 enum {
 	Opt_err_cont, Opt_err_panic, Opt_err_ro, Opt_barrier,
 	Opt_nobarrier, Opt_snapshot, Opt_order, Opt_norecovery,
-	Opt_discard, Opt_nodiscard, Opt_err,
+	Opt_discard, Opt_nodiscard, Opt_err, Opt_bad_ftl,
 };
 
 static match_table_t tokens = {
@@ -726,6 +728,7 @@ static match_table_t tokens = {
 	{Opt_norecovery, "norecovery"},
 	{Opt_discard, "discard"},
 	{Opt_nodiscard, "nodiscard"},
+	{Opt_bad_ftl, "bad_ftl"},
 	{Opt_err, NULL}
 };
 
@@ -787,6 +790,9 @@ static int parse_options(char *options, struct super_block *sb, int is_remount)
 		case Opt_nodiscard:
 			nilfs_clear_opt(nilfs, DISCARD);
 			break;
+		case Opt_bad_ftl:
+			nilfs_set_opt(nilfs, BAD_FTL);
+			break;
 		default:
 			printk(KERN_ERR
 			       "NILFS: Unrecognized mount option \"%s\"\n", p);
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 94c451c..d29b2f0 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -168,6 +168,50 @@ static void nilfs_clear_recovery_info(struct nilfs_recovery_info *ri)
 	nilfs_dispose_segment_list(&ri->ri_used_segments);
 }
 
+static int nilfs_search_log_cursor(struct the_nilfs *nilfs)
+{
+	u64 segnum;
+	struct buffer_head *bh_sum = NULL;
+	struct nilfs_segment_summary *sum;
+	sector_t seg_start, seg_end; /* range of full segment (block number) */
+
+	for (segnum = 0; segnum < nilfs->ns_nsegments; ++segnum) {
+		brelse(bh_sum);
+
+		/* Calculate range of segment */
+		nilfs_get_segment_range(nilfs, segnum, &seg_start, &seg_end);
+
+		bh_sum = __bread(nilfs->ns_bdev, seg_start,
+				 nilfs->ns_blocksize);
+		if (!bh_sum) {
+			printk(KERN_ERR "NILFS error searching for cursor.\n");
+			return -EINVAL;
+		}
+
+		sum = (struct nilfs_segment_summary *)bh_sum->b_data;
+
+		/*
+		 * use ss_crc_seed to distinguish the segments from previous
+		 * nilfs2 file systems on the same volume
+		 */
+		if (le32_to_cpu(sum->ss_magic) != NILFS_SEGSUM_MAGIC
+		    || le32_to_cpu(sum->ss_nblocks) == 0
+		    || le32_to_cpu(sum->ss_nblocks) >
+			nilfs->ns_blocks_per_segment
+		    || le32_to_cpu(sum->ss_crc_seed) != nilfs->ns_crc_seed)
+			continue;
+
+		if (le64_to_cpu(sum->ss_seq) > nilfs->ns_last_seq) {
+			nilfs->ns_last_pseg = seg_start;
+			nilfs->ns_last_cno = le64_to_cpu(sum->ss_cno);
+			nilfs->ns_last_seq = le64_to_cpu(sum->ss_seq);
+		}
+	}
+	brelse(bh_sum);
+
+	return 0;
+}
+
 /**
  * nilfs_store_log_cursor - load log cursor from a super block
  * @nilfs: nilfs object
@@ -179,7 +223,7 @@ static void nilfs_clear_recovery_info(struct nilfs_recovery_info *ri)
  * scanning and recovery.
  */
 static int nilfs_store_log_cursor(struct the_nilfs *nilfs,
-				  struct nilfs_super_block *sbp)
+				  struct nilfs_super_block *sbp, int valid_fs)
 {
 	int ret = 0;
 
@@ -187,6 +231,9 @@ static int nilfs_store_log_cursor(struct the_nilfs *nilfs,
 	nilfs->ns_last_cno = le64_to_cpu(sbp->s_last_cno);
 	nilfs->ns_last_seq = le64_to_cpu(sbp->s_last_seq);
 
+	if (!valid_fs && nilfs_test_opt(nilfs, BAD_FTL))
+		ret = nilfs_search_log_cursor(nilfs);
+
 	nilfs->ns_prev_seq = nilfs->ns_last_seq;
 	nilfs->ns_seg_seq = nilfs->ns_last_seq;
 	nilfs->ns_segnum =
@@ -263,7 +310,7 @@ int load_nilfs(struct the_nilfs *nilfs, struct super_block *sb)
 		goto scan_error;
 	}
 
-	err = nilfs_store_log_cursor(nilfs, sbp[0]);
+	err = nilfs_store_log_cursor(nilfs, sbp[0], 1);
 	if (err)
 		goto scan_error;
 
@@ -626,7 +673,8 @@ int init_nilfs(struct the_nilfs *nilfs, struct super_block *sb, char *data)
 
 	nilfs->ns_mount_state = le16_to_cpu(sbp->s_state);
 
-	err = nilfs_store_log_cursor(nilfs, sbp);
+	err = nilfs_store_log_cursor(nilfs, sbp,
+				     nilfs->ns_mount_state & NILFS_VALID_FS);
 	if (err)
 		goto failed_sbh;
diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
index 4140f7f..6a8f5f8 100644
--- a/include/linux/nilfs2_fs.h
+++ b/include/linux/nilfs2_fs.h
@@ -135,6 +135,8 @@ struct nilfs_super_root {
 #define NILFS_MOUNT_NORECOVERY	0x4000  /* Disable write access during
 					   mount-time recovery */
 #define NILFS_MOUNT_DISCARD	0x8000  /* Issue DISCARD requests */
+#define NILFS_MOUNT_BAD_FTL	0x10000 /* Only write super block
+					   at umount time */
 
 /**
@@ -422,7 +424,7 @@ struct nilfs_segment_summary {
 	__le32 ss_nblocks;
 	__le32 ss_nfinfo;
 	__le32 ss_sumbytes;
-	__le32 ss_pad;
+	__le32 ss_crc_seed;
 	__le64 ss_cno;
 	/* array of finfo structures */
 };
-- 
1.8.5.3
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related	[flat|nested] 39+ messages in thread
[parent not found: <75ceb45c464097ab556baacf2d15d6ae4b792bb2.1391048231.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 1/1] nilfs2: add mount option that reduces super block writes [not found] ` <75ceb45c464097ab556baacf2d15d6ae4b792bb2.1391048231.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> @ 2014-01-30 6:36 ` Vyacheslav Dubeyko [not found] ` <127C78C3-9D47-439C-9639-263BC453D98D-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> 0 siblings, 1 reply; 39+ messages in thread From: Vyacheslav Dubeyko @ 2014-01-30 6:36 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Andreas, On Jan 30, 2014, at 5:47 AM, Andreas Rohner wrote: > This patch introduces a mount option bad_ftl that disables the > periodic overwrites of the super block to make the file system better > suitable for bad flash memory with a bad FTL. The super block is only > written at umount time. So if there is a power outage the file system > needs to be recovered by a linear scan of all segment summary blocks. > > The linear scan is only necessary if the file system wasn't umounted > properly. So the normal mount time is not affected. > > Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> > --- > fs/nilfs2/segbuf.c | 3 ++- > fs/nilfs2/segment.c | 3 ++- > fs/nilfs2/super.c | 10 +++++++-- > fs/nilfs2/the_nilfs.c | 54 ++++++++++++++++++++++++++++++++++++++++++++--- > include/linux/nilfs2_fs.h | 4 +++- > 5 files changed, 66 insertions(+), 8 deletions(-) > As far as I can judge, conceptually it is simply rollback of the fix [1]. Thanks, Vyacheslav Dubeyko. [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a9bae189542e71f91e61a4428adf6e5a7dfe8063 -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
[parent not found: <127C78C3-9D47-439C-9639-263BC453D98D-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>]
* Re: [PATCH 1/1] nilfs2: add mount option that reduces super block writes [not found] ` <127C78C3-9D47-439C-9639-263BC453D98D-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> @ 2014-01-30 6:02 ` Andreas Rohner [not found] ` <52E9EB06.1000504-hi6Y0CQ0nG0@public.gmane.org> 0 siblings, 1 reply; 39+ messages in thread From: Andreas Rohner @ 2014-01-30 6:02 UTC (permalink / raw) To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Vyacheslav, On 2014-01-30 07:36, Vyacheslav Dubeyko wrote: > Hi Andreas, > > On Jan 30, 2014, at 5:47 AM, Andreas Rohner wrote: > >> This patch introduces a mount option bad_ftl that disables the >> periodic overwrites of the super block to make the file system better >> suitable for bad flash memory with a bad FTL. The super block is only >> written at umount time. So if there is a power outage the file system >> needs to be recovered by a linear scan of all segment summary blocks. >> >> The linear scan is only necessary if the file system wasn't umounted >> properly. So the normal mount time is not affected. >> >> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> >> --- >> fs/nilfs2/segbuf.c | 3 ++- >> fs/nilfs2/segment.c | 3 ++- >> fs/nilfs2/super.c | 10 +++++++-- >> fs/nilfs2/the_nilfs.c | 54 ++++++++++++++++++++++++++++++++++++++++++++--- >> include/linux/nilfs2_fs.h | 4 +++- >> 5 files changed, 66 insertions(+), 8 deletions(-) >> > > As far as I can judge, conceptually it is simply rollback of the fix [1]. The normal recovery mode checks all partial segments and computes the checksum over all the data. That takes significantly longer than my approach of just checking one block per segment, namely the segment summary block. br, Andreas Rohner > Thanks, > Vyacheslav Dubeyko. 
> > [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a9bae189542e71f91e61a4428adf6e5a7dfe8063 > > -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
[parent not found: <52E9EB06.1000504-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 1/1] nilfs2: add mount option that reduces super block writes [not found] ` <52E9EB06.1000504-hi6Y0CQ0nG0@public.gmane.org> @ 2014-01-30 7:44 ` Vyacheslav Dubeyko [not found] ` <8DBE8E18-F678-44B0-A6A6-5AFEC227AA86-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> 0 siblings, 1 reply; 39+ messages in thread From: Vyacheslav Dubeyko @ 2014-01-30 7:44 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On Jan 30, 2014, at 9:02 AM, Andreas Rohner wrote: > Hi Vyacheslav, > > On 2014-01-30 07:36, Vyacheslav Dubeyko wrote: >> Hi Andreas, >> >> On Jan 30, 2014, at 5:47 AM, Andreas Rohner wrote: >> >>> This patch introduces a mount option bad_ftl that disables the >>> periodic overwrites of the super block to make the file system better >>> suitable for bad flash memory with a bad FTL. The super block is only >>> written at umount time. So if there is a power outage the file system >>> needs to be recovered by a linear scan of all segment summary blocks. >>> >>> The linear scan is only necessary if the file system wasn't umounted >>> properly. So the normal mount time is not affected. >>> >>> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> >>> --- >>> fs/nilfs2/segbuf.c | 3 ++- >>> fs/nilfs2/segment.c | 3 ++- >>> fs/nilfs2/super.c | 10 +++++++-- >>> fs/nilfs2/the_nilfs.c | 54 ++++++++++++++++++++++++++++++++++++++++++++--- >>> include/linux/nilfs2_fs.h | 4 +++- >>> 5 files changed, 66 insertions(+), 8 deletions(-) >>> >> >> As far as I can judge, conceptually it is simply rollback of the fix [1]. > > The normal recovery mode checks all partial segments and computes the > checksum over all the data. That takes significantly longer than my > approach of just checking one block per segment, namely the segment > summary block. > I don't think that your suggestion changes situation significantly. Because in the issue [1] you can go through the whole volume in the really bad environment. 
Maybe you will mount the volume in 30 minutes instead of 1 hour with the
technique that you suggest in the patch.

With the best regards,
Vyacheslav Dubeyko.

> br,
> Andreas Rohner
> 
>> Thanks,
>> Vyacheslav Dubeyko.
>> 
>> [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a9bae189542e71f91e61a4428adf6e5a7dfe8063
>> 
>> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply	[flat|nested] 39+ messages in thread
[parent not found: <8DBE8E18-F678-44B0-A6A6-5AFEC227AA86-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>]
* Re: [PATCH 1/1] nilfs2: add mount option that reduces super block writes [not found] ` <8DBE8E18-F678-44B0-A6A6-5AFEC227AA86-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> @ 2014-01-30 6:52 ` Andreas Rohner 2014-01-30 9:48 ` Andreas Rohner 1 sibling, 0 replies; 39+ messages in thread From: Andreas Rohner @ 2014-01-30 6:52 UTC (permalink / raw) To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On 2014-01-30 08:44, Vyacheslav Dubeyko wrote: > > On Jan 30, 2014, at 9:02 AM, Andreas Rohner wrote: > >> Hi Vyacheslav, >> >> On 2014-01-30 07:36, Vyacheslav Dubeyko wrote: >>> Hi Andreas, >>> >>> On Jan 30, 2014, at 5:47 AM, Andreas Rohner wrote: >>> >>>> This patch introduces a mount option bad_ftl that disables the >>>> periodic overwrites of the super block to make the file system better >>>> suitable for bad flash memory with a bad FTL. The super block is only >>>> written at umount time. So if there is a power outage the file system >>>> needs to be recovered by a linear scan of all segment summary blocks. >>>> >>>> The linear scan is only necessary if the file system wasn't umounted >>>> properly. So the normal mount time is not affected. >>>> >>>> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> >>>> --- >>>> fs/nilfs2/segbuf.c | 3 ++- >>>> fs/nilfs2/segment.c | 3 ++- >>>> fs/nilfs2/super.c | 10 +++++++-- >>>> fs/nilfs2/the_nilfs.c | 54 ++++++++++++++++++++++++++++++++++++++++++++--- >>>> include/linux/nilfs2_fs.h | 4 +++- >>>> 5 files changed, 66 insertions(+), 8 deletions(-) >>>> >>> >>> As far as I can judge, conceptually it is simply rollback of the fix [1]. >> >> The normal recovery mode checks all partial segments and computes the >> checksum over all the data. That takes significantly longer than my >> approach of just checking one block per segment, namely the segment >> summary block. >> > > I don't think that your suggestion changes situation significantly. 
> Because in the issue [1] you can go through the whole volume in the really > bad environment. Maybe you will mount volume 30 minutes instead of 1 hour > with technique that you suggest in the patch. Yes probably. It is still a simple scan. But you wouldn't use the mount option for 1 TB hard drives. I just thought it could be useful for small linux systems, like the raspberry pi that use a small SD card as root. But as I said it is just an experiment. Thanks for your comments. br, Andreas Rohner > With the best regards, > Vyacheslav Dubeyko. > >> br, >> Andreas Rohner >> >>> Thanks, >>> Vyacheslav Dubeyko. >>> >>> [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a9bae189542e71f91e61a4428adf6e5a7dfe8063 >>> >>> >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/1] nilfs2: add mount option that reduces super block writes [not found] ` <8DBE8E18-F678-44B0-A6A6-5AFEC227AA86-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> 2014-01-30 6:52 ` Andreas Rohner @ 2014-01-30 9:48 ` Andreas Rohner [not found] ` <52EA2002.1030809-hi6Y0CQ0nG0@public.gmane.org> 1 sibling, 1 reply; 39+ messages in thread From: Andreas Rohner @ 2014-01-30 9:48 UTC (permalink / raw) To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On 2014-01-30 08:44, Vyacheslav Dubeyko wrote: > > On Jan 30, 2014, at 9:02 AM, Andreas Rohner wrote: > >> Hi Vyacheslav, >> >> On 2014-01-30 07:36, Vyacheslav Dubeyko wrote: >>> Hi Andreas, >>> >>> On Jan 30, 2014, at 5:47 AM, Andreas Rohner wrote: >>> >>>> This patch introduces a mount option bad_ftl that disables the >>>> periodic overwrites of the super block to make the file system better >>>> suitable for bad flash memory with a bad FTL. The super block is only >>>> written at umount time. So if there is a power outage the file system >>>> needs to be recovered by a linear scan of all segment summary blocks. >>>> >>>> The linear scan is only necessary if the file system wasn't umounted >>>> properly. So the normal mount time is not affected. >>>> >>>> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> >>>> --- >>>> fs/nilfs2/segbuf.c | 3 ++- >>>> fs/nilfs2/segment.c | 3 ++- >>>> fs/nilfs2/super.c | 10 +++++++-- >>>> fs/nilfs2/the_nilfs.c | 54 ++++++++++++++++++++++++++++++++++++++++++++--- >>>> include/linux/nilfs2_fs.h | 4 +++- >>>> 5 files changed, 66 insertions(+), 8 deletions(-) >>>> >>> >>> As far as I can judge, conceptually it is simply rollback of the fix [1]. >> >> The normal recovery mode checks all partial segments and computes the >> checksum over all the data. That takes significantly longer than my >> approach of just checking one block per segment, namely the segment >> summary block. >> > > I don't think that your suggestion changes situation significantly. 
> Because in the issue [1] you can go through the whole volume in the really > bad environment. Maybe you will mount volume 30 minutes instead of 1 hour > with technique that you suggest in the patch. >>> [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a9bae189542e71f91e61a4428adf6e5a7dfe8063 I have just finished a test with my 100 GB HDD and SSD. I filled it with dd until it was 100% full. Then I cut power to the machine and timed the following mount operation: 100GB HDD: time sudo mount -o bad_ftl /dev/sda1 /mnt/ real 1m21.068s user 0m0.020s sys 0m0.770s 100GB SSD: time sudo mount -o bad_ftl /dev/sdc1 /mnt/ real 0m2.124s user 0m0.010s sys 0m0.243s So it looks quite bad for hard drives. To scan a 1 TB hard drive would take 13 minutes. But a 1 TB SSD would only take 20 seconds! I will test one of my SD cards next. Regards, Andreas Rohner -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
[parent not found: <52EA2002.1030809-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 1/1] nilfs2: add mount option that reduces super block writes [not found] ` <52EA2002.1030809-hi6Y0CQ0nG0@public.gmane.org> @ 2014-01-30 11:27 ` Vyacheslav Dubeyko [not found] ` <A6830DB2-DC73-4ACC-BE73-7A6EC1AC7C18-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> 0 siblings, 1 reply; 39+ messages in thread From: Vyacheslav Dubeyko @ 2014-01-30 11:27 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On Jan 30, 2014, at 12:48 PM, Andreas Rohner wrote: > > I have just finished a test with my 100 GB HDD and SSD. I filled it > with dd until it was 100% full. Then I cut power to the machine and > timed the following mount operation: > > 100GB HDD: > time sudo mount -o bad_ftl /dev/sda1 /mnt/ > > real 1m21.068s > user 0m0.020s > sys 0m0.770s > > 100GB SSD: > time sudo mount -o bad_ftl /dev/sdc1 /mnt/ > > real 0m2.124s > user 0m0.010s > sys 0m0.243s > > So it looks quite bad for hard drives. To scan a 1 TB hard drive would > take 13 minutes. > > But a 1 TB SSD would only take 20 seconds! > I think that it will be good to have comparable results for the same environment. I mean, for example, measurement in different situations for SSD (without your patch and this your patch). How much time do you need for scanning the whole SSD by your approach? I think that comparison of linear scanning results for the whole SSD drive and for sudden power-off situation can provide basis for consideration. But sudden power-off situation can be different, I suppose. Thanks, Vyacheslav Dubeyko. > I will test one of my SD cards next. > > Regards, > Andreas Rohner -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
[parent not found: <A6830DB2-DC73-4ACC-BE73-7A6EC1AC7C18-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>]
* Re: [PATCH 1/1] nilfs2: add mount option that reduces super block writes [not found] ` <A6830DB2-DC73-4ACC-BE73-7A6EC1AC7C18-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> @ 2014-01-30 11:33 ` Andreas Rohner [not found] ` <52EA38A3.8060107-hi6Y0CQ0nG0@public.gmane.org> 0 siblings, 1 reply; 39+ messages in thread From: Andreas Rohner @ 2014-01-30 11:33 UTC (permalink / raw) To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On 2014-01-30 12:27, Vyacheslav Dubeyko wrote: > > On Jan 30, 2014, at 12:48 PM, Andreas Rohner wrote: > >> >> I have just finished a test with my 100 GB HDD and SSD. I filled it >> with dd until it was 100% full. Then I cut power to the machine and >> timed the following mount operation: >> >> 100GB HDD: >> time sudo mount -o bad_ftl /dev/sda1 /mnt/ >> >> real 1m21.068s >> user 0m0.020s >> sys 0m0.770s >> >> 100GB SSD: >> time sudo mount -o bad_ftl /dev/sdc1 /mnt/ >> >> real 0m2.124s >> user 0m0.010s >> sys 0m0.243s >> >> So it looks quite bad for hard drives. To scan a 1 TB hard drive would >> take 13 minutes. >> >> But a 1 TB SSD would only take 20 seconds! >> > > I think that it will be good to have comparable results for the same environment. > I mean, for example, measurement in different situations for SSD (without your > patch and this your patch). How much time do you need for scanning the whole SSD > by your approach? I think that comparison of linear scanning results for the whole > SSD drive and for sudden power-off situation can provide basis for consideration. > But sudden power-off situation can be different, I suppose. I will test it more thoroughly in the evening or tomorrow morning. The whole SSD has 120 GB, but I only use a 100 GB partition to be able to better compare it with my hard drive. The scan always scans the whole partition, because you cannot predict where the latest segment will be. It is a pretty worn out Samsung 840 SSD. I use it for all my tests. br, Andreas Rohner > Thanks, > Vyacheslav Dubeyko. 
> >> I will test one of my SD cards next. >> >> Regards, >> Andreas Rohner > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
[parent not found: <52EA38A3.8060107-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 1/1] nilfs2: add mount option that reduces super block writes [not found] ` <52EA38A3.8060107-hi6Y0CQ0nG0@public.gmane.org> @ 2014-02-01 19:05 ` Clemens Eisserer 0 siblings, 0 replies; 39+ messages in thread From: Clemens Eisserer @ 2014-02-01 19:05 UTC (permalink / raw) To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Andreas, Because modifying NILFS_SB_FREQ requires a kernel recompile anyway, I gave your patch a try. So far everything seems fine and the filesystem has survived a series of power-cuts. Although this doesn't prove the code is safe under all circumstances of course, it seems stable enough for my use case on the Raspberry Pi. I also did some tests on the Raspberry Pi with unclean shutdowns, and recovery is barely noticeable on the 8GB SD card: unclean shutdown with recovery: > time mount -o bad_ftl /dev/mmcblk0p3 sd_8gb > real 0m3.552s > user 0m0.020s > sys 0m0.430s mount following a proper unmount: > time mount -o bad_ftl /dev/mmcblk0p3 sd_8gb > real 0m1.219s > user 0m0.020s > sys 0m0.280s The patch reminds me somewhat of running ext4 with journaling disabled - it reduces writes at the cost of longer recovery times after an unclean shutdown. The big difference with nilfs2 is that I don't have to worry that the FS won't survive it ;) Thanks, Clemens PS: The only thing I don't really like is the name of the mount option ;) -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/1] nilfs2: add mount option that reduces super block writes [not found] ` <cover.1391048231.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> 2014-01-30 2:47 ` [PATCH 1/1] " Andreas Rohner @ 2014-01-30 3:27 ` Andreas Rohner 2014-01-30 5:29 ` Ryusuke Konishi 2 siblings, 0 replies; 39+ messages in thread From: Andreas Rohner @ 2014-01-30 3:27 UTC (permalink / raw) To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On 2014-01-30 03:46, Andreas Rohner wrote: > I repurposed the ss_pad field of nilfs_segment_summary to contain the > crc seed, because I needed a way to distinguish left over segments > from previous nilfs2 volumes from real segments that are part of the > current file system. I just realized, that using the ss_pad field is completely unnecessary. I just have to check the checksums. The reason why there is a crc_seed in the first place is to distinguish segments from previous nilfs2 volumes. Sorry for that. br, Andreas Rohner -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/1] nilfs2: add mount option that reduces super block writes [not found] ` <cover.1391048231.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> 2014-01-30 2:47 ` [PATCH 1/1] " Andreas Rohner 2014-01-30 3:27 ` [PATCH 0/1] " Andreas Rohner @ 2014-01-30 5:29 ` Ryusuke Konishi [not found] ` <20140130.142941.55837481.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 2 siblings, 1 reply; 39+ messages in thread From: Ryusuke Konishi @ 2014-01-30 5:29 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Andreas, On Thu, 30 Jan 2014 03:46:59 +0100, Andreas Rohner wrote: > Hi, > > This is only a hacky proof of concept implementation and probably > full of nasty bugs. I also havn't really tested it. I was > just interested how hard it would be to implement Clemens' suggestion > of writing the super block only at umount time and do a linear scan > of all the segments in case of file system failure. > The linear scan is only performed if the file system wasn't shut down > properly. So for normal operation there shouldn't be any slowdown. This premise is not acceptable. We have to avoid long mount time even after unexpected power failures. I prefer some sort of way which combines binary search of segment summary blocks and partial linear scan of logs. I don't know the latency of recent SSDs, however we should estimate the latency of disk I/O about 5ms~20ms per a separate block (in the case of hard drives). So the maximum number of scans of segment summary blocks seems to be roughly 10~100 times. Regards, Ryusuke Konishi > > I repurposed the ss_pad field of nilfs_segment_summary to contain the > crc seed, because I needed a way to distinguish left over segments > from previous nilfs2 volumes from real segments that are part of the > current file system. > > I don't really expect it to be merged or anything. Maybe it can spark > a discussion. 
Maybe somebody could try it out on an old SD-Card and > time the mount command or something. > > I tested it on a virtual machine. It seemed to recover fine when I > killed the VM and mounted it again. Clearly more testing is > necessary... > > br, > Andreas Rohner > > --- > Andreas Rohner (1): > nilfs2: add mount option that reduces super block writes > > fs/nilfs2/segbuf.c | 3 ++- > fs/nilfs2/segment.c | 3 ++- > fs/nilfs2/super.c | 10 +++++++-- > fs/nilfs2/the_nilfs.c | 54 ++++++++++++++++++++++++++++++++++++++++++++--- > include/linux/nilfs2_fs.h | 4 +++- > 5 files changed, 66 insertions(+), 8 deletions(-) > > -- > 1.8.5.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
[parent not found: <20140130.142941.55837481.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: [PATCH 0/1] nilfs2: add mount option that reduces super block writes [not found] ` <20140130.142941.55837481.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> @ 2014-01-30 5:59 ` Andreas Rohner 2014-01-30 6:29 ` Andreas Rohner 1 sibling, 0 replies; 39+ messages in thread From: Andreas Rohner @ 2014-01-30 5:59 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Ryusuke, On 2014-01-30 06:29, Ryusuke Konishi wrote: > Hi Andreas, > On Thu, 30 Jan 2014 03:46:59 +0100, Andreas Rohner wrote: >> Hi, >> >> This is only a hacky proof of concept implementation and probably >> full of nasty bugs. I also havn't really tested it. I was >> just interested how hard it would be to implement Clemens' suggestion >> of writing the super block only at umount time and do a linear scan >> of all the segments in case of file system failure. >> The linear scan is only performed if the file system wasn't shut down >> properly. So for normal operation there shouldn't be any slowdown. > > This premise is not acceptable. > We have to avoid long mount time even after unexpected power failures. It is only activated by the mount option bad_ftl. So the user has to chose it explicitly. The user wouldn't activate it if he/she uses a 4 TB hard drive. The user also wouldn't activate it if he/she uses a modern SSD with a decent FTL. It would only make sense to activate it for crappy SD-Cards. > I prefer some sort of way which combines binary search of segment > summary blocks and partial linear scan of logs. For a binary search the segments have to be sorted at some granularity (groups). I think this would hinder more sophisticated GC policies. That seems to be a high price just so that the super block is not updated so often. > I don't know the latency of recent SSDs, however we should estimate > the latency of disk I/O about 5ms~20ms per a separate block (in the > case of hard drives). 
So the maximum number of scans of segment > summary blocks seems to be roughly 10~100 times. > > Regards, > Ryusuke Konishi > >> >> I repurposed the ss_pad field of nilfs_segment_summary to contain the >> crc seed, because I needed a way to distinguish left over segments >> from previous nilfs2 volumes from real segments that are part of the >> current file system. >> >> I don't really expect it to be merged or anything. Maybe it can spark >> a discussion. Maybe somebody could try it out on an old SD-Card and >> time the mount command or something. >> >> I tested it on a virtual machine. It seemed to recover fine when I >> killed the VM and mounted it again. Clearly more testing is >> necessary... >> >> br, >> Andreas Rohner >> >> --- >> Andreas Rohner (1): >> nilfs2: add mount option that reduces super block writes >> >> fs/nilfs2/segbuf.c | 3 ++- >> fs/nilfs2/segment.c | 3 ++- >> fs/nilfs2/super.c | 10 +++++++-- >> fs/nilfs2/the_nilfs.c | 54 ++++++++++++++++++++++++++++++++++++++++++++--- >> include/linux/nilfs2_fs.h | 4 +++- >> 5 files changed, 66 insertions(+), 8 deletions(-) >> >> -- >> 1.8.5.3 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in >> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/1] nilfs2: add mount option that reduces super block writes [not found] ` <20140130.142941.55837481.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 2014-01-30 5:59 ` Andreas Rohner @ 2014-01-30 6:29 ` Andreas Rohner [not found] ` <52E9F13A.5050805-hi6Y0CQ0nG0@public.gmane.org> 1 sibling, 1 reply; 39+ messages in thread From: Andreas Rohner @ 2014-01-30 6:29 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Ryusuke, On 2014-01-30 06:29, Ryusuke Konishi wrote: > Hi Andreas, > On Thu, 30 Jan 2014 03:46:59 +0100, Andreas Rohner wrote: >> Hi, >> >> This is only a hacky proof of concept implementation and probably >> full of nasty bugs. I also havn't really tested it. I was >> just interested how hard it would be to implement Clemens' suggestion >> of writing the super block only at umount time and do a linear scan >> of all the segments in case of file system failure. >> The linear scan is only performed if the file system wasn't shut down >> properly. So for normal operation there shouldn't be any slowdown. > > This premise is not acceptable. > We have to avoid long mount time even after unexpected power failures. > > I prefer some sort of way which combines binary search of segment > summary blocks and partial linear scan of logs. > > I don't know the latency of recent SSDs, however we should estimate > the latency of disk I/O about 5ms~20ms per a separate block (in the > case of hard drives). So the maximum number of scans of segment > summary blocks seems to be roughly 10~100 times. > > Regards, > Ryusuke Konishi Basically I agree with you. It was just a quick experiment. I just thought Clemens suggestion, to have a mount option to turn on the linear scan for users who want it, was worth a try. 
br, Andreas Rohner >> >> I repurposed the ss_pad field of nilfs_segment_summary to contain the >> crc seed, because I needed a way to distinguish left over segments >> from previous nilfs2 volumes from real segments that are part of the >> current file system. >> >> I don't really expect it to be merged or anything. Maybe it can spark >> a discussion. Maybe somebody could try it out on an old SD-Card and >> time the mount command or something. >> >> I tested it on a virtual machine. It seemed to recover fine when I >> killed the VM and mounted it again. Clearly more testing is >> necessary... >> >> br, >> Andreas Rohner >> >> --- >> Andreas Rohner (1): >> nilfs2: add mount option that reduces super block writes >> >> fs/nilfs2/segbuf.c | 3 ++- >> fs/nilfs2/segment.c | 3 ++- >> fs/nilfs2/super.c | 10 +++++++-- >> fs/nilfs2/the_nilfs.c | 54 ++++++++++++++++++++++++++++++++++++++++++++--- >> include/linux/nilfs2_fs.h | 4 +++- >> 5 files changed, 66 insertions(+), 8 deletions(-) >> >> -- >> 1.8.5.3 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in >> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
[parent not found: <52E9F13A.5050805-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 0/1] nilfs2: add mount option that reduces super block writes [not found] ` <52E9F13A.5050805-hi6Y0CQ0nG0@public.gmane.org> @ 2014-01-30 8:46 ` Ryusuke Konishi 0 siblings, 0 replies; 39+ messages in thread From: Ryusuke Konishi @ 2014-01-30 8:46 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On Thu, 30 Jan 2014 07:29:14 +0100, Andreas Rohner wrote: > Hi Ryusuke, > > On 2014-01-30 06:29, Ryusuke Konishi wrote: >> Hi Andreas, >> On Thu, 30 Jan 2014 03:46:59 +0100, Andreas Rohner wrote: >>> Hi, >>> >>> This is only a hacky proof of concept implementation and probably >>> full of nasty bugs. I also havn't really tested it. I was >>> just interested how hard it would be to implement Clemens' suggestion >>> of writing the super block only at umount time and do a linear scan >>> of all the segments in case of file system failure. >>> The linear scan is only performed if the file system wasn't shut down >>> properly. So for normal operation there shouldn't be any slowdown. >> >> This premise is not acceptable. >> We have to avoid long mount time even after unexpected power failures. >> >> I prefer some sort of way which combines binary search of segment >> summary blocks and partial linear scan of logs. >> >> I don't know the latency of recent SSDs, however we should estimate >> the latency of disk I/O about 5ms~20ms per a separate block (in the >> case of hard drives). So the maximum number of scans of segment >> summary blocks seems to be roughly 10~100 times. >> >> Regards, >> Ryusuke Konishi > > Basically I agree with you. It was just a quick experiment. I just > thought Clemens suggestion, to have a mount option to turn on the linear > scan for users who want it, was worth a try. I see. For further discussion on this approach, it looks like we need some measurement data of the situation that this patch makes a difference (for example, for an SD card or some device). 
Anyway, I agree that the patch has a value for experiment purpose. Thanks, Ryusuke Konishi > br, > Andreas Rohner > >>> >>> I repurposed the ss_pad field of nilfs_segment_summary to contain the >>> crc seed, because I needed a way to distinguish left over segments >>> from previous nilfs2 volumes from real segments that are part of the >>> current file system. >>> >>> I don't really expect it to be merged or anything. Maybe it can spark >>> a discussion. Maybe somebody could try it out on an old SD-Card and >>> time the mount command or something. >>> >>> I tested it on a virtual machine. It seemed to recover fine when I >>> killed the VM and mounted it again. Clearly more testing is >>> necessary... >>> >>> br, >>> Andreas Rohner >>> >>> --- >>> Andreas Rohner (1): >>> nilfs2: add mount option that reduces super block writes >>> >>> fs/nilfs2/segbuf.c | 3 ++- >>> fs/nilfs2/segment.c | 3 ++- >>> fs/nilfs2/super.c | 10 +++++++-- >>> fs/nilfs2/the_nilfs.c | 54 ++++++++++++++++++++++++++++++++++++++++++++--- >>> include/linux/nilfs2_fs.h | 4 +++- >>> 5 files changed, 66 insertions(+), 8 deletions(-) >>> >>> -- >>> 1.8.5.3 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in >>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in >> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More 
majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [static superblock discussion] Does nilfs2 do any in-place writes? [not found] ` <CAFvQSYSu5CGxs+K6bZUCtq17PrS_paX3bXBuLBRTba_XWYGgAg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2014-01-30 2:46 ` [PATCH 0/1] nilfs2: add mount option that reduces super block writes Andreas Rohner @ 2014-01-30 8:35 ` Vyacheslav Dubeyko [not found] ` <71B2806D-7CF2-4992-A588-EB73EADFFF9F-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> 1 sibling, 1 reply; 39+ messages in thread From: Vyacheslav Dubeyko @ 2014-01-30 8:35 UTC (permalink / raw) To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Clemens, On Jan 29, 2014, at 9:18 PM, Clemens Eisserer wrote: > Hi Vyacheslav, > > Just to make sure I understand your approach: You reserve two areas (a > few mb in size) which are dedicated to hold only super-root data and > this data is written at the same frequency as the current superblock? This statement is not fully correct. First of all, the primary and secondary reserved areas should have different update frequencies. I suggest adding information to the secondary reserved area (at the volume end) during volume umount, or after some large timeout in the case of significant system uptime. The primary reserved area should receive super roots more frequently than the secondary one, perhaps with the same frequency as the current superblock updates. But, in any case, it will not be an in-place update as it is now: the information will be appended by means of a COW policy. Moreover, replicating the super roots should also increase reliability, I suppose. > If so, this most likely won't improve the situation for flash media > with weak FTL - as those writes will still be concentrated around a > few erase blocks. For devices with powerful FTL such as modern SSDs, > updating the superblock periodically doesn't pose a real problem. > I suppose that my suggestion can improve the situation for any FTL. Why do I think so? First of all, we would replace the in-place update of the superblocks with a policy of filling the reserved areas in a COW manner. As a result, any FTL can use simpler algorithms for providing clean erase blocks from its pool. You are worried that writes will be concentrated around a few erase blocks, but keep in mind that these are logical blocks: every FTL implements an algorithm that maps logical blocks onto physical ones with the goal of wear leveling. In-place updates really do complicate the situation for any FTL, whereas a COW policy should always improve it, because it makes an efficient wear-leveling algorithm simpler to implement. So when you talk about logical block placement, that does not mean writes to a few physical erase blocks; only the FTL's mapping table knows which physical erase blocks really change. > I am still somewhat in favour of an optional simple linear scan. > Would it be possible to check at mount-time whether the volume was > cleanly unmounted and otherwise perform a linear scan? We know about a clean (or unclean) umount from the superblock, so you have to update the superblock to keep that knowledge; otherwise you will always need to perform a linear scan. > So when I mount the volume with o=no_superblock_update a flag is set > in the superblock, which is reset on a clean volume unmount after the > most recent segment has been stored in the superblock. > When at mount-time the flag is still set, nilfs knows a scan is > required as the segment in the superblock is not up-to-date. I don't quite follow your thought. If we do not update the superblock, then how can we save any changes in it? After any mount, s_last_cno can change very frequently because of GC activity. As a result, a linear scan would pass through hundreds or thousands of segments without a superblock update or some other technique for keeping knowledge of the latest segment. Thanks, Vyacheslav Dubeyko. -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
[parent not found: <71B2806D-7CF2-4992-A588-EB73EADFFF9F-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>]
* Re: [static superblock discussion] Does nilfs2 do any in-place writes? [not found] ` <71B2806D-7CF2-4992-A588-EB73EADFFF9F-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> @ 2014-01-30 10:09 ` Clemens Eisserer [not found] ` <CAFvQSYQ84_BsqVC_ZM77P92jkP+1dh7NexvZWg4mFE7B3wSK0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 39+ messages in thread From: Clemens Eisserer @ 2014-01-30 10:09 UTC (permalink / raw) To: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Vyacheslav, > So, as a result, when you are talking about logical block placement > then it doesn't mean that you are talking about writes in few physical > erase blocks. Because only mapping table of FTL knows what physical > erase blocks are changed really. For powerful FTLs/controllers it almost doesn't make a difference whether you write in-place or in the "almost random" style you propose. Those FTLs usually have a fully associative, fine-grained physical-to-logical mapping, so they can also distribute the wear evenly across the entire NAND. However, there are also FTLs around (SD cards, managed NAND, ...) which do mapping only at erase-block level, with limited associativity - often lacking a random write unit. So each logical block is mapped to one of n possible physical erase blocks (for SD cards 4MB erase blocks are quite common), and for each single write a full erase block has to be erased. And as there are only n physical blocks which can be mapped to this logical block, the card pretty soon starts to develop bad sectors (even worse, these days cheap TLC flash limited to < 500 erase cycles is often used). The linaro guys did a survey on this some time ago: https://wiki.linaro.org/WorkingGroups/KernelArchived/Projects/FlashCardSurvey Very interesting is the paragraph about "Access modes"; there are still a lot of controllers which don't feature a random write unit. > We know about clean (or unclean) umount from the superblock.
> So, you should update superblock for keeping such knowledge. > Otherwise, you will need to perform linear scan always. > I don't quite follow your thought. If we will not update superblock then > how we can save any changes in superblock? Only update the superblock at mount/unmount time, and do a linear scan in the case of an unclean shutdown. Thanks, Clemens -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 39+ messages in thread
[parent not found: <CAFvQSYQ84_BsqVC_ZM77P92jkP+1dh7NexvZWg4mFE7B3wSK0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [static superblock discussion] Does nilfs2 do any in-place writes?
[not found] ` <CAFvQSYQ84_BsqVC_ZM77P92jkP+1dh7NexvZWg4mFE7B3wSK0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-30 12:42 ` Vyacheslav Dubeyko
[not found] ` <AE0F313D-5934-452B-80AB-5D691AF8A4BE-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
0 siblings, 1 reply; 39+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-30 12:42 UTC (permalink / raw)
To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi Clemens,
On Jan 30, 2014, at 1:09 PM, Clemens Eisserer wrote:
> Hi Vyacheslav,
>> So, as a result, when you are talking about logical block placement
>> then it doesn't mean that you are talking about writes in few physical
>> erase blocks. Because only mapping table of FTL knows what physical
>> erase blocks are changed really.
> For powerful FTLs/controllers it makes almost no difference
> whether you write in-place or in the "almost random" style you
> propose. Those FTLs usually have a fully associative, fine-grained
> physical-to-logical mapping, so they can also distribute the wear
> evenly across the entire NAND.
I suppose that the current implementation is not bad. And it is
possible to achieve what you want simply by managing the superblock's
update timeout, because the superblock is already updated on
mount/umount, and its update frequency is defined by a timeout.
> However, there are also FTLs around (SD cards, managed NAND, ...)
> which only map at erase-block level, with limited associativity -
> often lacking a random write unit. Each logical block is mapped to
> one of n possible physical erase blocks (for SD cards, 4 MB erase
> blocks are quite common), and every single write requires a full
> erase block to be erased. Since there are only n physical blocks
> which can be mapped to a given logical block, the card soon starts
> to develop bad sectors (even worse, these days cheap TLC flash
> limited to < 500 erase cycles is often used).
Ok.
But I can't see anything bad about my approach, because the primary
reserved area will be 8 MB. So, if the super root (and all other info)
is 4 KB, for example, then we can do 2048 write operations without any
erase operations. It means that if you save the super root for every
full segment, you need to fill 16 GB of the volume with data. I think
that if the reserved area is fully cycled through only once per pass
over the whole volume, that will be a good policy for any FTL. As a
result, saving the super root after every 10 constructed segments
corresponds to a 160 GB volume, and after every 100 segments to a
roughly 1.6 TB volume.
Moreover, it is possible to distribute the load between the two areas
(primary and secondary) and to increase the size of the reservation.
That gives the opportunity to decrease the number of segments between
super root saves in the reserved areas.
> The Linaro guys did a survey on this some time ago:
> https://wiki.linaro.org/WorkingGroups/KernelArchived/Projects/FlashCardSurvey
> The paragraph about "Access modes" is very interesting; there are
> still a lot of controllers which don't feature a random write unit.
Thank you for the link.
Thanks, Vyacheslav Dubeyko.
>> We know about clean (or unclean) umount from the superblock.
>> So, you should update superblock for keeping such knowledge.
>> Otherwise, you will need to perform linear scan always.
>> I don't quite follow your thought. If we will not update superblock then
>> how we can save any changes in superblock?
> Only update the superblock at mount/unmount time, and do a linear scan
> in the case of an unclean shutdown.
> Thanks, Clemens
^ permalink raw reply [flat|nested] 39+ messages in thread
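The reserved-area figures above can be checked with simple arithmetic. The sketch assumes an 8 MB primary reserved area, a 4 KB super-root record, and NILFS2's default 8 MB segment size, as used in the mail (the 100-segment case works out closer to 1.6 TB than 1 TB):

```python
# Back-of-envelope check of the reserved-area rotation described above.
reserved_area = 8 * 2**20   # bytes in the primary reserved area
super_root = 4 * 2**10      # bytes written per super-root save
segment = 8 * 2**20         # bytes of data per full segment (NILFS2 default)

slots = reserved_area // super_root   # saves before the area wraps around
print(slots)                          # 2048

# Data written between two rewrites of the same reserved-area block,
# when the super root is saved every n constructed segments:
for n in (1, 10, 100):
    print(n, slots * n * segment // 2**30, "GiB")   # 16, 160, 1600 GiB
```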
[parent not found: <AE0F313D-5934-452B-80AB-5D691AF8A4BE-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>]
* Re: [static superblock discussion] Does nilfs2 do any in-place writes?
[not found] ` <AE0F313D-5934-452B-80AB-5D691AF8A4BE-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2014-01-30 13:09 ` Clemens Eisserer
[not found] ` <CAFvQSYQGDXmUit1zFZ9_LAjdLjxM-i_yR2L6pwFDX_BEdjdXxQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 39+ messages in thread
From: Clemens Eisserer @ 2014-01-30 13:09 UTC (permalink / raw)
To: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi Vyacheslav,
> I suppose that the current implementation is not bad. And it is
> possible to achieve what you want simply by managing the superblock's
> update timeout, because the superblock is already updated on
> mount/umount, and its update frequency is defined by a timeout.
What would happen in case of an unclean shutdown and a very large
superblock update interval (several hours)?
As far as I understood, this is where Andreas' patch would come into play?
> Ok. But I can't see anything bad about my approach, because the primary reserved
> area will be 8 MB. So, if the super root (and all other info) is 4 KB, for example, then
> we can do 2048 write operations without any erase operations.
The problem with this approach is that there is a minimal write unit
whose size depends on the FTL - explained in the Linaro wiki:
> The smallest write unit is significantly larger than a page.
> Reading or writing less than one of these units causes a full unit to be accessed.
> Trying to do streaming write in smaller units causes the medium to do multiple
> read-modify-write cycles on the same write unit, which in turn causes multiple
> garbage collection cycles for writing a single allocation group from start to end.
So updating 4 KB pages in a linear fashion would cause
read-modify-write cycles on most devices, with blocks as large as the
mapping unit (for SD cards this often means a full erase block of
several MBs).
The chapter "FAT optimization" lists several of those caveats; I found
it a very interesting and worthwhile read.
Regards, Clemens
^ permalink raw reply [flat|nested] 39+ messages in thread
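The read-modify-write effect quoted from the Linaro wiki translates into a simple write-amplification factor; the unit sizes below are illustrative examples, not properties of any specific card:

```python
# Minimal write amplification of a small in-place update when the FTL's
# smallest mapping/write unit is much larger than the update itself.
def write_amplification(update_size, write_unit):
    # The device must rewrite at least one full write unit per update.
    return write_unit // update_size

print(write_amplification(4 * 2**10, 4 * 2**20))   # 1024 for a 4 MB unit
print(write_amplification(4 * 2**10, 16 * 2**10))  # 4 for a 16 KB unit
```

The first case is why linearly updating 4 KB records inside a fixed reserved area can still cost a multi-megabyte rewrite per update on erase-block-mapped cards.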
[parent not found: <CAFvQSYQGDXmUit1zFZ9_LAjdLjxM-i_yR2L6pwFDX_BEdjdXxQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [static superblock discussion] Does nilfs2 do any in-place writes?
[not found] ` <CAFvQSYQGDXmUit1zFZ9_LAjdLjxM-i_yR2L6pwFDX_BEdjdXxQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-30 13:32 ` Vyacheslav Dubeyko
2014-01-30 14:03 ` Clemens Eisserer
0 siblings, 1 reply; 39+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-30 13:32 UTC (permalink / raw)
To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi Clemens,
On Thu, 2014-01-30 at 14:09 +0100, Clemens Eisserer wrote:
> Hi Vyacheslav,
>
> I suppose that the current implementation is not bad. And it is
> possible to achieve what you want simply by managing the superblock's
> update timeout, because the superblock is already updated on
> mount/umount, and its update frequency is defined by a timeout.
> What would happen in case of an unclean shutdown and a very large
> superblock update interval (several hours)?
> As far as I understood, this is where Andreas' patch would come into play?
The result of an unclean shutdown and a big update timeout will be a
long mount. Such an issue was reported earlier and it was fixed. I
don't think that Andreas's patch can fundamentally resolve the long
mount. Maybe this approach can slightly reduce the mount time in such a
situation.
> Ok. But I can't see anything bad about my approach, because the primary reserved
> area will be 8 MB. So, if the super root (and all other info) is 4 KB, for example, then
> we can do 2048 write operations without any erase operations.
> The problem with this approach is that there is a minimal write unit
> whose size depends on the FTL - explained in the Linaro wiki:
> > The smallest write unit is significantly larger than a page.
> > Reading or writing less than one of these units causes a full unit to be accessed.
> > Trying to do streaming write in smaller units causes the medium to do multiple
> > read-modify-write cycles on the same write unit, which in turn causes multiple
> > garbage collection cycles for writing a single allocation group from start to end.
> So updating 4 KB pages in a linear fashion would cause
> read-modify-write cycles on most devices, with blocks as large as the
> mapping unit (for SD cards this often means a full erase block of
> several MBs).
> The chapter "FAT optimization" lists several of those caveats; I found
> it a very interesting and worthwhile read.
In such a case NILFS2 as a whole is in trouble, because partial
segments can have different sizes, and these sizes don't correlate with
the sizes of physical erase blocks or physical write units. The whole
COW approach would then be useless. Maybe some NAND chips have write
units larger than the page size, but in that case it is up to the FTL
anyway; otherwise, one would need to operate on raw NAND for the best
efficiency. And that is out of NILFS2's scope.
With the best regards, Vyacheslav Dubeyko.
> Regards, Clemens
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [static superblock discussion] Does nilfs2 do any in-place writes?
2014-01-30 13:32 ` Vyacheslav Dubeyko
@ 2014-01-30 14:03 ` Clemens Eisserer
[not found] ` <CAFvQSYQ-qkXz677-obgHVN5fLQiF10-A=T2yNNAHKRcOGm_Pqw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 39+ messages in thread
From: Clemens Eisserer @ 2014-01-30 14:03 UTC (permalink / raw)
To: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi Vyacheslav,
> The result of unclean shutdown and big update timeout will be a long
> mount. And such issue was reported earlier and it was fixed. I don't
> think that Andreas's patch can resolve long mount principally. Maybe
> this approach can slightly reduce mount time in such situation.
So even without Andreas' patch there is no risk of data loss with a
very outdated superblock - but recovery would be slower?
> In such case NILFS2 at whole is in trouble. Because partial segments can
> have different size. And these sizes doesn't correlate with sizes of
> physical erase block or physical writing units. And the whole COW
> approach is useless.
Sure, NILFS won't cure the horrible write amplification of those
devices, but it will spread the wear evenly over the whole device
thanks to COW. So it won't wear out the media faster where its metadata
is stored (with the exception of the superblock), like ext4 does.
Btw., isn't NILFS's minimal write size 128 KB (I remember reading that
in a paper somewhere)?
Regards, Clemens
^ permalink raw reply [flat|nested] 39+ messages in thread
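The wear-spreading argument made above can be illustrated with a toy wear counter. The device size and write counts are made up, and both write patterns are idealized (a journal confined to a few fixed blocks vs. a log that wraps the whole device):

```python
from collections import Counter

# Toy wear comparison: journaling-style writes hammering 4 fixed erase
# blocks vs. log-structured (COW) writes appended around the device.
blocks = 64      # erase blocks on an imaginary small device
writes = 6400    # total block-sized writes issued

journal_wear = Counter(w % 4 for w in range(writes))    # 4 fixed blocks
cow_wear = Counter(w % blocks for w in range(writes))   # whole device

print(max(journal_wear.values()))  # 1600 erases on the hottest block
print(max(cow_wear.values()))      # 100 erases per block, evenly spread
```

With a 500-cycle endurance, the journal pattern kills its blocks after the first ~2000 writes, while the COW pattern is nowhere near wear-out - which matches the SD-card failures reported in this thread.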
[parent not found: <CAFvQSYQ-qkXz677-obgHVN5fLQiF10-A=T2yNNAHKRcOGm_Pqw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [static superblock discussion] Does nilfs2 do any in-place writes?
[not found] ` <CAFvQSYQ-qkXz677-obgHVN5fLQiF10-A=T2yNNAHKRcOGm_Pqw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-30 15:27 ` Vyacheslav Dubeyko
[not found] ` <720AFF13-6203-4A28-9850-3C2CAFF3B7BF-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
0 siblings, 1 reply; 39+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-30 15:27 UTC (permalink / raw)
To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi Clemens,
On Jan 30, 2014, at 5:03 PM, Clemens Eisserer wrote:
> Hi Vyacheslav,
>
>> The result of unclean shutdown and big update timeout will be a long
>> mount. And such issue was reported earlier and it was fixed. I don't
>> think that Andreas's patch can resolve long mount principally. Maybe
>> this approach can slightly reduce mount time in such situation.
>
> So even without Andreas' patch there is no risk of data loss with a
> very outdated superblock - but recovery would be slower?
Yes, recovery will really be slower. And the recovery mechanisms defend
against data loss in a reasonable way. I mean that some data can be
lost during unsuccessful flushes. But I am afraid that the current
state of Andreas's patch breaks some recovery mechanisms (though maybe
I am wrong).
>> In such case NILFS2 at whole is in trouble. Because partial segments can
>> have different size. And these sizes doesn't correlate with sizes of
>> physical erase block or physical writing units. And the whole COW
>> approach is useless.
>
> Sure, NILFS won't cure the horrible write amplification of those
> devices, but it will spread the wear evenly over the whole device
> thanks to COW.
> So it won't wear out the media faster where its metadata is stored
> (with the exception of the superblock), like ext4 does.
> Btw., isn't NILFS's minimal write size 128 KB (I remember reading
> that in a paper somewhere)?
But, anyway, how does 128 KB correlate with the sizes of physical erase
blocks or physical write units? :) Physical sizes can be much larger,
and 128 KB cannot defend against all possible situations. Sure, there
isn't a good approach for all cases in real life. Moreover, if you have
an FTL, then you expect that the block layer will operate with 4 KB
block sizes. Otherwise, it means that you sell bad storage devices.
Thanks, Vyacheslav Dubeyko.
> Regards, Clemens
^ permalink raw reply [flat|nested] 39+ messages in thread
[parent not found: <720AFF13-6203-4A28-9850-3C2CAFF3B7BF-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>]
* Re: [static superblock discussion] Does nilfs2 do any in-place writes?
[not found] ` <720AFF13-6203-4A28-9850-3C2CAFF3B7BF-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2014-02-05 20:47 ` Clemens Eisserer
[not found] ` <CAFvQSYStT4uwxqtxATLbPOvHYjww=sw=C=f3vBi_qdu6MXAn5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 39+ messages in thread
From: Clemens Eisserer @ 2014-02-05 20:47 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi Vyacheslav,
> But, anyway, how does 128 KB correlate with the sizes of physical erase
> blocks or physical write units? :) Physical sizes can be much larger,
> and 128 KB cannot defend against all possible situations. Sure, there
> isn't a good approach for all cases in real life.
I agree that modifying filesystem code to work around every single
firmware quirk isn't a good idea. My enthusiasm stems from the fact
that nilfs2 by design is as good as you can get on devices with weak
FTL implementations - except for the frequent superblock updates.
> Moreover, if you have an FTL, then you expect that the block layer
> will operate with 4 KB block sizes. Otherwise, it means that you sell
> bad storage devices.
Efficient block sizes as small as 4 KB are only doable with a (DRAM)
cache, which isn't a viable solution for small/cheap devices - even the
best SD cards can't handle that efficiently. So those "bad storage
devices" are unfortunately a common reality (SD/MMC cards, USB pen
drives, ...).
When looking through the Raspberry Pi forums, you'll find a lot of
reports about dead SD cards, worn out where the ext4 journal or
metadata was placed. I admit this is a very specific example, but quite
a lot of embedded Linux-powered solutions exist - and all of them I
know of boot from (micro-)SD cards. So an improvement in this situation
(such as the mount option proposed by Andreas) could lower the pain for
this quite widespread use case, without altering the experience for
present use cases where nilfs2 does fine.
Regards, Clemens
^ permalink raw reply [flat|nested] 39+ messages in thread
[parent not found: <CAFvQSYStT4uwxqtxATLbPOvHYjww=sw=C=f3vBi_qdu6MXAn5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [static superblock discussion] Does nilfs2 do any in-place writes?
[not found] ` <CAFvQSYStT4uwxqtxATLbPOvHYjww=sw=C=f3vBi_qdu6MXAn5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-02-07 6:43 ` Vyacheslav Dubeyko
0 siblings, 0 replies; 39+ messages in thread
From: Vyacheslav Dubeyko @ 2014-02-07 6:43 UTC (permalink / raw)
To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi Clemens,
On Wed, 2014-02-05 at 21:47 +0100, Clemens Eisserer wrote:
> > Moreover, if you have FTL then you
> > expect that block layer will operate with 4 KB block sizes. Otherwise,
> > it means that you sell bad storage devices.
>
> Efficient block sizes as small as 4K are only doable with a (DRAM)
> cache, which isn't a viable solution for small/cheap devices - even
> the best SD cards can't handle that efficiently. So those "bad storage
> devices" are unfortunately a common reality (sd/mmc cards, usb pen
> drives, ...).
>
> When looking through the raspberry pi forums, you'll find a lot
> of reports about dead SD cards, worn out where the ext4 journal or
> metadata was placed. I admit this is a very specific example, but
> quite a lot of embedded linux-powered solutions exist - and all of
> them I know boot from (micro-)SD cards. So an improvement in this
> situation (such as the mount option proposed by Andreas) could lower
> the pain for this quite wide-spread use-case, without altering the
> experience for present use-cases where nilfs2 does fine.
Ok. I think that one possible alternative approach is this:
[1] We can place the static superblock information (it is small) at the
beginning of every segment. That would be more FTL-friendly than a
static superblock at the beginning and end of the volume.
[2] We can use a delayed mount. I mean that it is possible to make a
preliminary mount immediately, without long searching or recovering,
and to start a recovery thread in the background to search for the
latest log.
Of course, in this way we don't have full knowledge about the whole
volume state immediately, and we need some time to find the last log in
the background. But, theoretically, we can access information on the
reliable portion of the volume in RO mode before the background
recovery process ends. I understand, though, that a delayed mount is a
really arguable suggestion. :)
[3] We can divide the NILFS2 volume into allocation groups, each
containing several segments. The search for the latest log can then be
done on the basis of such allocation groups first, followed by the
usual search for the latest log inside the latest allocation group.
[4] We can have a special area, likewise as I suggested previously,
that stores special structures for a more efficient search. This area
(or areas) would store information about the allocation groups. It
would be written only on umount; between mount and umount the
information would be modified only in memory. And this area can be
divided and saved in portions whose size matches the physical erase
block size. I think it is possible to have a configuration parameter
for such portion size that the user can tune in the mkfs or tunefs
phases.
Of course, these are really raw considerations.
Thanks, Vyacheslav Dubeyko.
^ permalink raw reply [flat|nested] 39+ messages in thread
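Proposal [3] above amounts to a two-level search for the latest log: compare one summary per allocation group, then scan segments only inside the winning group. A sketch, with hypothetical data structures - per-group summaries holding the highest sequence number seen are an assumption of this illustration, not an existing NILFS2 structure:

```python
# Two-level "latest log" search sketched from proposal [3] above.
# `summaries[i]` is assumed to hold the highest log sequence number in
# allocation group i; `read_group(i)` returns that group's per-segment
# sequence numbers. Both structures are hypothetical illustrations.
def find_latest(summaries, read_group):
    # Coarse pass: one comparison per allocation group.
    gi = max(range(len(summaries)), key=summaries.__getitem__)
    # Fine pass: only the winning group's segments are scanned in full.
    segs = read_group(gi)
    si = max(range(len(segs)), key=segs.__getitem__)
    return gi, si

groups = {0: [1, 2, 3], 1: [7, 9, 8], 2: [4, 5, 6]}
summaries = [max(s) for s in groups.values()]
print(find_latest(summaries, groups.__getitem__))   # (1, 1) -> sequence 9
```

The point of the scheme is the cost model: a full scan reads every segment, while this reads one summary per group plus one group's segments.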
* Re: Does nilfs2 do any in-place writes?
2014-01-15 12:01 ` Vyacheslav Dubeyko
2014-01-15 15:23 ` Ryusuke Konishi
@ 2014-01-16 10:03 ` Clemens Eisserer
[not found] ` <CAFvQSYSC7+dd93pRH-uok9N+A_s=1VKrfGEppu3qRTg3q=CuXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 1 reply; 39+ messages in thread
From: Clemens Eisserer @ 2014-01-16 10:03 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi Vyacheslav,
> NILFS2 has a special method nilfs_sb_need_update() [1] and a special
> constant NILFS_SB_FREQ [2] that is usually used to define the
> frequency of superblock updates. So, as far as I can judge, the
> default value of this frequency under high I/O load is 10 seconds (the
> minimum interval of periodical update of superblocks, in seconds).
>
> [1] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L254
> [2] http://lxr.free-electrons.com/source/fs/nilfs2/the_nilfs.h#L252
Thanks for the in-depth explanation.
> We need the periodical in-place superblock write only for updating a
> pointer to the most recent log. And this will be eliminable if we can
> invent a fast way to determine the latest log.
Maybe it would be enough to detect whether the stored pointer to the
last log is recent, and otherwise perform a slow scan?
Thanks, Clemens
^ permalink raw reply [flat|nested] 39+ messages in thread
[parent not found: <CAFvQSYSC7+dd93pRH-uok9N+A_s=1VKrfGEppu3qRTg3q=CuXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Does nilfs2 do any in-place writes?
[not found] ` <CAFvQSYSC7+dd93pRH-uok9N+A_s=1VKrfGEppu3qRTg3q=CuXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-16 10:10 ` Vyacheslav Dubeyko
0 siblings, 0 replies; 39+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-16 10:10 UTC (permalink / raw)
To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Thu, 2014-01-16 at 11:03 +0100, Clemens Eisserer wrote:
>
> Maybe it would be enough to detect whether the stored pointer to the
> last log is recent, and otherwise perform a slow scan?
>
A slow scan means a slow mount. :) Are you ready to wait for a really
long mount to finish?
With the best regards, Vyacheslav Dubeyko.
^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2014-02-07 6:43 UTC | newest]
Thread overview: 39+ messages
-- links below jump to the message on this page --
2014-01-15 10:44 Does nilfs2 do any in-place writes? Clemens Eisserer
[not found] ` <CAFvQSYSzpX_WpUi9KpGj0pZvzhw2mfzzOqcgdj9ripXAjipmtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-15 10:52 ` Vyacheslav Dubeyko
2014-01-15 11:44 ` Clemens Eisserer
[not found] ` <CAFvQSYTG6HBVc9iodYyvCejwf889jiwOPsVb1Hi8cDrR9pOGeg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-15 12:01 ` Vyacheslav Dubeyko
2014-01-15 15:23 ` Ryusuke Konishi
[not found] ` <20140116.002353.94325733.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-01-16 10:08 ` Vyacheslav Dubeyko
2014-01-17 22:55 ` Ryusuke Konishi
[not found] ` <20140118.075519.43661574.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-01-20 11:54 ` [writable snapshots discussion] " Vyacheslav Dubeyko
2014-01-18 0:00 ` Ryusuke Konishi
[not found] ` <20140118.090008.194171715.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-01-28 9:25 ` [static superblock discussion] " Vyacheslav Dubeyko
[not found] ` <1390901114.2942.11.camel-dzAnj6fV1RxGeWtTaGDT1UEK6ufn8VP3@public.gmane.org>
2014-01-29 12:44 ` Andreas Rohner
[not found] ` <52E8F7A7.8010505-hi6Y0CQ0nG0@public.gmane.org>
2014-01-29 13:19 ` Vyacheslav Dubeyko
2014-01-29 18:18 ` Clemens Eisserer
[not found] ` <CAFvQSYSu5CGxs+K6bZUCtq17PrS_paX3bXBuLBRTba_XWYGgAg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-30 2:46 ` [PATCH 0/1] nilfs2: add mount option that reduces super block writes Andreas Rohner
[not found] ` <cover.1391048231.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-01-30 2:47 ` [PATCH 1/1] " Andreas Rohner
[not found] ` <75ceb45c464097ab556baacf2d15d6ae4b792bb2.1391048231.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-01-30 6:36 ` Vyacheslav Dubeyko
[not found] ` <127C78C3-9D47-439C-9639-263BC453D98D-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-01-30 6:02 ` Andreas Rohner
[not found] ` <52E9EB06.1000504-hi6Y0CQ0nG0@public.gmane.org>
2014-01-30 7:44 ` Vyacheslav Dubeyko
[not found] ` <8DBE8E18-F678-44B0-A6A6-5AFEC227AA86-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-01-30 6:52 ` Andreas Rohner
2014-01-30 9:48 ` Andreas Rohner
[not found] ` <52EA2002.1030809-hi6Y0CQ0nG0@public.gmane.org>
2014-01-30 11:27 ` Vyacheslav Dubeyko
[not found] ` <A6830DB2-DC73-4ACC-BE73-7A6EC1AC7C18-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-01-30 11:33 ` Andreas Rohner
[not found] ` <52EA38A3.8060107-hi6Y0CQ0nG0@public.gmane.org>
2014-02-01 19:05 ` Clemens Eisserer
2014-01-30 3:27 ` [PATCH 0/1] " Andreas Rohner
2014-01-30 5:29 ` Ryusuke Konishi
[not found] ` <20140130.142941.55837481.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-01-30 5:59 ` Andreas Rohner
2014-01-30 6:29 ` Andreas Rohner
[not found] ` <52E9F13A.5050805-hi6Y0CQ0nG0@public.gmane.org>
2014-01-30 8:46 ` Ryusuke Konishi
2014-01-30 8:35 ` [static superblock discussion] Does nilfs2 do any in-place writes? Vyacheslav Dubeyko
[not found] ` <71B2806D-7CF2-4992-A588-EB73EADFFF9F-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-01-30 10:09 ` Clemens Eisserer
[not found] ` <CAFvQSYQ84_BsqVC_ZM77P92jkP+1dh7NexvZWg4mFE7B3wSK0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-30 12:42 ` Vyacheslav Dubeyko
[not found] ` <AE0F313D-5934-452B-80AB-5D691AF8A4BE-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-01-30 13:09 ` Clemens Eisserer
[not found] ` <CAFvQSYQGDXmUit1zFZ9_LAjdLjxM-i_yR2L6pwFDX_BEdjdXxQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-30 13:32 ` Vyacheslav Dubeyko
2014-01-30 14:03 ` Clemens Eisserer
[not found] ` <CAFvQSYQ-qkXz677-obgHVN5fLQiF10-A=T2yNNAHKRcOGm_Pqw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-30 15:27 ` Vyacheslav Dubeyko
[not found] ` <720AFF13-6203-4A28-9850-3C2CAFF3B7BF-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-02-05 20:47 ` Clemens Eisserer
[not found] ` <CAFvQSYStT4uwxqtxATLbPOvHYjww=sw=C=f3vBi_qdu6MXAn5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-02-07 6:43 ` Vyacheslav Dubeyko
2014-01-16 10:03 ` Clemens Eisserer
[not found] ` <CAFvQSYSC7+dd93pRH-uok9N+A_s=1VKrfGEppu3qRTg3q=CuXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-16 10:10 ` Vyacheslav Dubeyko