Linux NILFS development
From: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
To: Ryusuke Konishi
	<konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
Cc: linux-nilfs <linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: Does nilfs2 do any in-place writes?
Date: Sat, 18 Jan 2014 12:45:49 +0100	[thread overview]
Message-ID: <52DA696D.6010206@gmx.net> (raw)
In-Reply-To: <20140118.104703.356941870.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>

On 2014-01-18 02:47, Ryusuke Konishi wrote:
> On Fri, 17 Jan 2014 10:31:55 +0400, Vyacheslav Dubeyko wrote:
>> On Thu, 2014-01-16 at 17:48 +0000, Mark Trumpold wrote:
>>> Hello All,
>>>
>>> I am wondering what the impact of in-place writes of the
>>> superblock has on SSDs in terms of wear?
>>>
>>> I've been stress testing our system which uses Nilfs, and
>>> recently I had a SSD fail with the classic messages indicating
>>> low level media problems -- and also implicating Nilfs as trying
>>> to locate a superblock (I think).
>>>
>>> Following is a partial dmesg list: 
>>>
>>> [    7.630382] Sense Key : Medium Error [current] [descriptor]
>>> [    7.630385] Descriptor sense data with sense descriptors (in hex):
>>> [    7.630386]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
>>> [    7.630394]         05 ff 0e 58 
>>> [    7.630397] sd 0:0:0:0: [sda]  
>>> [    7.630399] Add. Sense: Unrecovered read error - auto reallocate failed
>>> [    7.630401] sd 0:0:0:0: [sda] CDB: 
>>> [    7.630402] Read(10): 28 00 05 ff 0e 54 00 00 08 00
>>> [    7.630409] end_request: I/O error, dev sda, sector 100601432
>>> [    7.635326] NILFS warning: I/O error on loading last segment
>>> [    7.635329] NILFS: error searching super root.
>>>
>>>
>>
>> I don't think this issue is related to the superblocks, because I
>> can't see the NILFS2 magic signature in your output. For example, the
>> first 16 bytes of my superblock look like this:
>>
>> 00000400  02 00 00 00 00 00 34 34  18 01 00 00 52 85 db 71  |......44....R..q|
>>
>> Of course, I don't know your partition table details, but I doubt that
>> sector 100601432 is a superblock sector. Moreover, your error messages
>> report trouble loading the last segment while searching for the super
>> root.
>>
>> NILFS2 has only two blocks that are updated in place, and their update
>> frequency is not very high. So I suppose that any FTL can easily
>> provide good wear leveling for the superblocks. But, of course,
>> in-place updates are still not a good policy for flash-based devices.
>>
>> Maybe I misunderstand something in your output, but I suppose that
>> during stress testing you can encounter an I/O error in any part of
>> the volume, because it is really hard to predict when you will exhaust
>> the spare pool of erase blocks.
> 
> Rather, the issue on flash devices may come from the current immature
> garbage collection algorithm.  The current cleanerd only supports the
> timestamp-based GC policy, which always tries to move the oldest
> segment first and even moves segments full of live blocks, thereby
> shortening the lifetime of flash devices. :-(
> 
> Actually, this is a high-priority todo, and now I am inclined to
> consider it with the group concept of segments.

Hi,

I am currently working on the garbage collector. I have implemented the
cost-benefit and greedy policies. It is quite a big change, and I was
reluctant to submit a patch until I had thoroughly tested it. I have
substantially redesigned it since I last wrote about it on the mailing
list. Now it seems to be very stable, and the results are quite
promising.
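For reference, the two policies select victim segments roughly like this
(a simplified Python sketch; the `Segment` fields are illustrative, and
the cost-benefit formula is the classic LFS one, not necessarily my
exact implementation):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    utilization: float  # fraction of live blocks, 0.0 .. 1.0
    age: float          # seconds since last modification (cf. su_lastmod)

def greedy_score(seg):
    # Greedy: prefer the segment with the least live data to copy.
    return 1.0 - seg.utilization

def cost_benefit_score(seg):
    # Cost-benefit: free space gained, weighted by age, divided by the
    # cost of reading the segment and writing its live data back.
    u = seg.utilization
    return (1.0 - u) * seg.age / (1.0 + u)

def pick_victim(segments, score):
    # Clean the segment with the highest score first.
    return max(segments, key=score)
```

Timestamp, by contrast, sorts only by age and ignores utilization, which
is exactly why it keeps recopying full segments of static data.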

The following results [1] are from my "ultimate" benchmark. It runs on
an AMD Phenom II X6 1090T processor with 8 GB of RAM and a Samsung SSD
840 with a 100 GB partition for NILFS2. I used the Lair62 NFS traces
from the IOTTA Repository [2] to get a realistic and reproducible
benchmark.

This is what the benchmark does:

1. Create a 20GB file of static data
2a. Start replaying the Lair62 NFS traces
2b. In parallel, turn random checkpoints into snapshots every 5 minutes,
keep a list of the snapshots, and turn them back into checkpoints after
15 minutes, so there are at most 3 snapshots present at the same time.
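Stripped of the actual chcp ss/chcp cp invocations, the bookkeeping
behind step 2b is a simple rotation loop. A sketch (function and
callback names are mine, purely for illustration):

```python
from collections import deque

PROMOTE_EVERY = 5 * 60   # turn a checkpoint into a snapshot every 5 min
DEMOTE_AFTER = 15 * 60   # turn it back into a checkpoint after 15 min

def rotate_snapshots(now, snapshots, pick_checkpoint, promote, demote):
    """One tick of the rotation loop, run every PROMOTE_EVERY seconds.

    snapshots: deque of (cno, promoted_at) pairs, oldest first.
    pick_checkpoint() returns a random checkpoint number; promote/demote
    stand in for 'chcp ss <cno>' and 'chcp cp <cno>'.
    """
    # Demote snapshots older than 15 minutes back to checkpoints.
    while snapshots and now - snapshots[0][1] >= DEMOTE_AFTER:
        cno, _ = snapshots.popleft()
        demote(cno)
    # Promote a fresh random checkpoint to a snapshot.
    cno = pick_checkpoint()
    promote(cno)
    snapshots.append((cno, now))
```

Since promotion happens every 5 minutes and demotion after 15, at most
three snapshots coexist, as described above.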

Timestamp is so slow because it needlessly copies the 20 GB of static
data around over and over again, which can be seen in the periodic drops
in performance. The other policies ignore the static data and never move
it. This is also evident if you compare the amount of data written to
the device [3] (compare /proc/diskstats before and after the benchmark).
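The diskstats comparison is just a diff of the "sectors written"
counter. A small sketch (the helper names are mine; the field positions
follow the kernel's iostats documentation):

```python
def sectors_written(diskstats_text, dev):
    """Extract the 'sectors written' counter for one device from the
    text of /proc/diskstats (7th statistic after the device name)."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # fields: major, minor, name, then the per-device statistics
        if len(fields) > 9 and fields[2] == dev:
            return int(fields[9])
    raise KeyError(dev)

def bytes_written_between(before, after, dev, sector_size=512):
    """Total bytes written to dev between two /proc/diskstats dumps."""
    return (sectors_written(after, dev)
            - sectors_written(before, dev)) * sector_size
```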

If you are interested, I could clean up my code and submit a patch set
for review. I am sure there are lots of things that need to be changed,
but maybe it can give you some ideas...

It would also be possible to improve timestamp by allowing the cleaner
to abort if there is nothing to gain from cleaning a particular segment.
Instead, it could just update the su_lastmod field in the SUFILE without
doing anything else. This would be a fairly simple change; I could
provide a patch for that too.
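The idea would look roughly like this (an illustrative Python sketch,
not the actual nilfs_cleanerd code; all names are mine):

```python
import time

def clean_next_segment(segments, live_blocks, update_lastmod, relocate):
    """Timestamp policy with an early abort: if the oldest segment is
    entirely live, don't copy it -- just refresh its su_lastmod so it
    moves to the back of the timestamp queue.

    live_blocks(seg) counts live blocks; update_lastmod and relocate
    stand in for the SUFILE update and the normal cleaning path.
    """
    seg = min(segments, key=lambda s: s.lastmod)  # oldest segment first
    if live_blocks(seg) == seg.nblocks:
        # Nothing to gain: every block is still live, so cleaning would
        # only rewrite the same data. Bump the timestamp and skip it.
        update_lastmod(seg, time.time())
        return False
    relocate(seg)  # normal cleaning path
    return True
```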

Regards,
Andreas Rohner

[1] https://www.dropbox.com/s/3ued8g5xaktnpbq/replay_parallel_ssd_line.pdf
[2] http://iotta.snia.org/historical_section?tracetype_id=2
[3]
https://www.dropbox.com/s/nwfixlzzzvf93v2/replay_parallel_stats_write.pdf
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thread overview: 24+ messages

2014-01-16 17:48 Does nilfs2 do any in-place writes? Mark Trumpold
2014-01-16 18:41 ` Clemens Eisserer
2014-01-17  6:31 ` Vyacheslav Dubeyko
2014-01-18  1:47   ` Ryusuke Konishi
2014-01-18  9:44       ` Clemens Eisserer
2014-01-18 16:25           ` Mark Trumpold
2014-01-18 18:11               ` Vyacheslav Dubeyko
2014-01-18 11:45       ` Andreas Rohner [this message]
2014-01-18 23:08           ` Vyacheslav Dubeyko
2014-01-18 23:08               ` Andreas Rohner
2014-01-19  5:43                   ` Ryusuke Konishi
2014-01-19 14:11                       ` Andreas Rohner
  -- strict thread matches above, loose matches on Subject: below --
2014-01-17 19:19 Mark Trumpold
2014-01-16 19:40 Mark Trumpold
2014-01-15 10:44 Clemens Eisserer
2014-01-15 10:52   ` Vyacheslav Dubeyko
2014-01-15 11:44     ` Clemens Eisserer
2014-01-15 12:01         ` Vyacheslav Dubeyko
2014-01-15 15:23           ` Ryusuke Konishi
2014-01-16 10:08               ` Vyacheslav Dubeyko
2014-01-17 22:55                 ` Ryusuke Konishi
2014-01-18  0:00                 ` Ryusuke Konishi
2014-01-16 10:03           ` Clemens Eisserer
2014-01-16 10:10               ` Vyacheslav Dubeyko
