Re: SSD failure modes - Kent Overstreet

public inbox for linux-bcache@vger.kernel.org

From: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
To: James Harper <james.harper-NMzNsA1hOHcW+bLBXbPJGg@public.gmane.org>
Cc: "linux-bcache-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-bcache-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: SSD failure modes
Date: Thu, 24 Jan 2013 14:52:05 -0800
Message-ID: <20130124225205.GL26407@google.com>
In-Reply-To: <6035A0D088A63A46850C3988ED045A4B35638C4A-mzsoxcrO4/2UD0RQwgcqbDSf8X3wrgjD@public.gmane.org>

On Wed, Jan 23, 2013 at 02:56:40AM +0000, James Harper wrote:
> What is the expected behaviour of bcache when an SSD wears out? Do SSDs internally do a verify after write to ensure that the data has made it to the 'media' correctly and report a failure if that's not the case? Does (or can) bcache do a verify itself?

I've never heard of SSDs doing that; AFAIK they rely more on strong ECC.
Bcache does not itself do that kind of verify, though I think it'd be
pretty easy to implement (and you'd only need it for dirty data and
metadata).
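
To make the idea of a verify-after-write concrete, here is a minimal
userspace sketch in Python. It is not bcache code and not how any SSD
firmware works; the helper name write_and_verify and the use of CRC32 are
assumptions chosen for illustration. It writes a buffer, flushes it, reads
it back and compares checksums. Without O_DIRECT the read-back may be
served from the page cache rather than the device, so this shows only the
control flow, not a real media check.

```python
import os
import zlib


def write_and_verify(fd, offset, data):
    """Write data at offset, flush, read it back, and compare CRC32s.

    Hypothetical helper for illustration only. Returns True when the
    read-back matches what was written, False otherwise.
    """
    expected = zlib.crc32(data)

    # A short write already counts as a failure.
    written = os.pwrite(fd, data, offset)
    if written != len(data):
        return False

    # Ask the kernel to push the data to the device before reading back.
    os.fsync(fd)

    readback = os.pread(fd, len(data), offset)
    return len(readback) == len(data) and zlib.crc32(readback) == expected
```

In a cache, a check like this would only be worth paying for on dirty
data and metadata, since clean data can always be re-read from the
backing device.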

> And what about an SSD that fails hard (e.g. Linux detects an unplug)? A system crash is acceptable in such a case if the cache was in write-back mode, but what are the chances of rebooting successfully with the outstanding cached writes now lost?

I just wrote some documentation about error handling - tell me if that
helps:
http://atlas.evilpiepirate.org/git/linux-bcache.git/tree/Documentation/bcache.txt?h=bcache-dev

Not quite sure I get the scenario you're describing though - you unplug
the SSD, then reboot?

The reboot itself is not a problem, nor is unplugging the SSD -
unplugging the SSD from bcache's perspective looks like a crash when
things come up again (at some point writes just stopped, but whatever's
on the SSD will still be consistent).

However, if you unplug the SSD in writeback mode, run for a bit, and
then reboot - after the SSD is unplugged all the writeback writes are
going to error. We could retry those writes as writes that bypass the
cache (in case it was just a random IO error), although we don't do that
yet - but if metadata writes fail in writeback mode we may want to just
panic the kernel.

Hrm.
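
The policy being mulled over above could be sketched roughly as follows.
This is a Python illustration of the decision logic only, not bcache's
implementation (the message itself says the retry is not done yet). The
names submit_write, CacheWriteError and FatalCacheError, and the two
callables passed in, are all assumptions made up for the example;
FatalCacheError merely stands in for a kernel panic.

```python
class CacheWriteError(Exception):
    """Raised by the (hypothetical) cache device when a write fails."""


class FatalCacheError(Exception):
    """Stands in for the kernel panic discussed above."""


def submit_write(cache_write, backing_write, data, is_metadata):
    """Try the cache first; on failure, bypass it for data, give up for metadata.

    cache_write and backing_write are caller-supplied callables (assumptions).
    Returns "cached" or "bypassed"; raises FatalCacheError for metadata.
    """
    try:
        cache_write(data)
        return "cached"
    except CacheWriteError:
        if is_metadata:
            # Cache metadata has no copy on the backing device, so this
            # sketch has nowhere to retry and treats the failure as fatal.
            raise FatalCacheError("metadata write to cache failed")
        # Data write: retry straight to the backing device, in case the
        # cache error was transient or the SSD is gone.
        backing_write(data)
        return "bypassed"
```

With a cache_write that always raises CacheWriteError (the unplugged-SSD
case), data writes come back as "bypassed" and the first metadata write
raises FatalCacheError.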
