* SDD Failure - What to do?
@ 2013-10-05  8:12 David Humphreys
       [not found] ` <524FC9DB.5080608-Ni3nFjXdSsm9FHfhHBbuYA@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: David Humphreys @ 2013-10-05  8:12 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA

The first bit of this is by way of scene setting...

I have been operating a machine with a bcache configuration for the root
filesystem.

The SSD part is a SanDisk Extreme 120 GB drive; the main drive is a 1 TB
Seagate; both are 2.5" SATA drives on 6Gb/s interfaces.

Last week the computer hung, and on attempted reboot, the SSD (which was
the boot drive) had disappeared.

Unplugging and replugging the drive brought it back, so I thought that
I'd suffered from a poorly mated cable.

All started to boot OK, but BTRFS would not mount, and I had to zero the
log.

Therefore, as might have been expected, the failure of the SSD had left
the filesystem somewhat corrupted, but recoverable.

Now...

The computer has just crashed again; this time the SSD has clearly
failed 'hard'. It has disappeared and cannot be made to return.

I have replaced the SSD with a new, identical device. This now appears
at boot time.

I can boot the machine to a sensible recovery state from a different
drive in the machine.

What is the best procedure to recover?

What I really want to do is to get the new SSD working as the cache for
the original main drive, then boot from the pair as normal.

I don't really want to experiment without taking advice, because this
seems to me like a good way to risk losing everything.


I then have a subsidiary question:

This total failure of the drive to even appear at boot time does not
seem to me to be a likely symptom of SSD failure through repeated erase
cycles. Agreed?

I am assuming that it is just one of those unfortunate early mortality
failures of the drive electronics.

This is an important point, because it would be a bit of a disaster if
it were a repeatable failure brought about by the usage pattern that
bcache imposes on the drive. I will return the SSD to SanDisk under
warranty and see what happens.

Regards,
David Humphreys


* Re: SDD Failure - What to do?
       [not found] ` <524FC9DB.5080608-Ni3nFjXdSsm9FHfhHBbuYA@public.gmane.org>
@ 2013-10-05 10:03   ` David Humphreys
  2013-10-05 12:23   ` David Humphreys
  2013-10-05 14:37   ` David Humphreys
  2 siblings, 0 replies; 4+ messages in thread
From: David Humphreys @ 2013-10-05 10:03 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA

I have just read my posting via the web listing.
Apologies, I have repeatedly typed 'SDD' when, of course, I meant 'SSD'.


* Re: SDD Failure - What to do?
       [not found] ` <524FC9DB.5080608-Ni3nFjXdSsm9FHfhHBbuYA@public.gmane.org>
  2013-10-05 10:03   ` David Humphreys
@ 2013-10-05 12:23   ` David Humphreys
  2013-10-05 14:37   ` David Humphreys
  2 siblings, 0 replies; 4+ messages in thread
From: David Humphreys @ 2013-10-05 12:23 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA

I proceeded to prepare for recovery by partitioning the new SSD and
setting it up again as the boot disk (installing grub2 and the necessary
kernel and initramfs files).

I rebooted into my initramfs recovery environment and found that
everything seems to work, even though I haven't done anything to bcache.

I have my normal BTRFS root filesystem up and running again.

Nothing is lost. Nothing done to reconfigure bcache.

/dev/bcache0 is there and mounted as my root filesystem.

Subsidiary questions:

1) Is this now running without using the SSD cache?

2) I have complicated the situation by adding additional partitions to
the SSD (to provide journals for 'ceph'), therefore the partition on the
SSD that is now intended for bcache is not the same as it was
originally. To be specific, the original bcache SSD partition was
/dev/sda4; now I have added 4 small partitions for ceph journals and my
intended bcache partition is /dev/sda8.

My assumption is that the most likely scenario is that bcache has not
found an initialised cache partition on the SSD and is simply giving
cache-free access to the main drive.

An alternative scenario is that the cache configuration information is
actually stored on the main drive and that the new /dev/sda4 is being
used as a cache.

What I'd like to be able to configure/establish is that the new
/dev/sda8 partition is an active cache for the main drive partition
(/dev/sdb4).
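
For anyone checking the same thing: whether a cache is currently attached
can be read from the backing device's sysfs state file. The following is
only a minimal sketch, assuming the device is bcache0. The function name
and the SYSFS_ROOT override are hypothetical helpers for illustration and
are not part of bcache. The kernel documentation gives the detached state
as 'no cache'; 'none' is also accepted here as an assumption.

```shell
#!/bin/sh
# Report whether a bcache backing device currently has a cache attached.
# SYSFS_ROOT is a hypothetical override (not a bcache feature) so that the
# function can be tried against a fake directory tree instead of /sys.
bcache_cache_status() {
    dev="${1:-bcache0}"
    state_file="${SYSFS_ROOT:-/sys}/block/${dev}/bcache/state"

    # No readable state file: not a bcache device, or sysfs not available.
    if [ ! -r "$state_file" ]; then
        echo "unknown"
        return 1
    fi

    state="$(cat "$state_file")"
    case "$state" in
        "no cache"|none) echo "detached" ;;          # passthrough to the backing device
        clean|dirty)     echo "attached ($state)" ;; # part of a cache set
        *)               echo "other ($state)" ;;    # e.g. inconsistent
    esac
}
```

Running 'bcache_cache_status bcache0' on the real machine would answer
question 1 directly. On question 2, as far as I understand it, bcache
matches devices by the UUIDs in their superblocks rather than by partition
name, so a new /dev/sda4 without a bcache superblock should not be picked
up as a cache.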

Regards,
David


* Re: SDD Failure - What to do?
       [not found] ` <524FC9DB.5080608-Ni3nFjXdSsm9FHfhHBbuYA@public.gmane.org>
  2013-10-05 10:03   ` David Humphreys
  2013-10-05 12:23   ` David Humphreys
@ 2013-10-05 14:37   ` David Humphreys
  2 siblings, 0 replies; 4+ messages in thread
From: David Humphreys @ 2013-10-05 14:37 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA

OK, I have done the experimentation and everything is now clear and fixed.

The /sys/block/bcache0/bcache/state was 'none'.

So I did 'make-bcache -C /dev/sda8'

I did 'echo <UID> >/sys/block/bcache0/bcache/attach'

And now /sys/block/bcache0/bcache/state is 'clean'.

So I'd say that this is what they would call 'cool'.

Complete SSD failure is easy to fix and did not cause any collateral damage.
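
The steps above can be sketched as a small script. This is a hedged
outline rather than a tested recovery tool: the function names and the
SYSFS_ROOT override are hypothetical, and the parser assumes that
'bcache-super-show' prints a line of the form 'cset.uuid <uuid>'. The
value written to the attach file is the cache set UUID of the newly
formatted cache device. The destructive commands are left as comments.

```shell
#!/bin/sh
# Sketch of attaching a freshly formatted cache to an existing backing device.
# SYSFS_ROOT is a hypothetical test override, not part of bcache.

# Extract the cache set UUID from `bcache-super-show` output on stdin.
# Assumes a line of the form "cset.uuid <whitespace> <uuid>".
parse_cset_uuid() {
    awk '$1 == "cset.uuid" { print $2; exit }'
}

# Write the cache set UUID to the backing device's attach file.
attach_cache() {
    dev="$1"
    uuid="$2"
    attach_file="${SYSFS_ROOT:-/sys}/block/${dev}/bcache/attach"

    # Refuse anything that does not look like a UUID.
    if ! printf '%s\n' "$uuid" |
        grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
        echo "not a UUID: $uuid" >&2
        return 1
    fi

    printf '%s\n' "$uuid" > "$attach_file"
}

# Intended sequence on the real machine (destructive, so commented out):
#   make-bcache -C /dev/sda8                      # format the new cache partition
#   echo /dev/sda8 > /sys/fs/bcache/register      # only if udev has not registered it
#   uuid="$(bcache-super-show /dev/sda8 | parse_cset_uuid)"
#   attach_cache bcache0 "$uuid"
#   cat /sys/block/bcache0/bcache/state           # 'clean' was reported after attaching
```

The UUID check is only there to stop a mistyped or empty value from being
written to the attach file.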
