Failed HD, automate procedure?

* Failed HD, automate procedure?
@ 2012-04-02 10:31 Marco Aroldi
  2012-04-02 15:07 ` Wido den Hollander
  2012-04-02 16:23 ` Tommi Virtanen
  0 siblings, 2 replies; 4+ messages in thread
From: Marco Aroldi @ 2012-04-02 10:31 UTC (permalink / raw)
  To: ceph-devel

Hi all,
I'm looking at the procedure to replace a failed HD
(http://ceph.newdream.net/wiki/Replacing_a_failed_disk/OSD)
and I was wondering if the procedure could be more automated like:

1- The operator replaces the failed hd, say osd.23
2- Give a new command like "ceph reborn osd.23"

Thanks, thanks to all the community for this big piece of software!
Marco Aroldi

* Re: Failed HD, automate procedure?
  2012-04-02 10:31 Failed HD, automate procedure? Marco Aroldi
@ 2012-04-02 15:07 ` Wido den Hollander
  2012-04-02 16:23 ` Tommi Virtanen
  1 sibling, 0 replies; 4+ messages in thread
From: Wido den Hollander @ 2012-04-02 15:07 UTC (permalink / raw)
  To: Marco Aroldi; +Cc: ceph-devel

Hi,

On 04/02/2012 12:31 PM, Marco Aroldi wrote:
> Hi all,
> I'm looking at the procedure to replace a failed HD
> (http://ceph.newdream.net/wiki/Replacing_a_failed_disk/OSD)

The wiki is a bit outdated. Most of the docs are moving to
http://ceph.newdream.net/docs/ but the wiki is still linked from the
front page.

> and I was wondering if the procedure could be more automated like:
>
> 1- The operator replaces the failed hd, say osd.23
> 2- Give a new command like "ceph reborn osd.23"

Currently that isn't available. But some discussion has been going on 
about this: http://marc.info/?l=ceph-devel&m=133106885906229&w=2

That might not seem related at first glance, but it is. Somehow that new 
disk has to be formatted and linked to the new OSD.

An OSD simply needs a data directory in which to store its data.

If you replace the disk, something/somebody has to format that disk and 
make sure it's mounted.

When that's done, the OSD can format the fresh data directory and 
connect to the cluster again.

The OSD also needs the current monitor map and its own key to
authenticate to the cluster.

That data has to come from somewhere, so some form of external
involvement is needed to get this done.
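
The prerequisites above can be sketched as an ordered checklist. This is
only an illustration of the dependencies, not an existing Ceph tool:
`OsdHost`, its fields, and `remaining_steps` are hypothetical names, the
position of the monitor map and key steps relative to initializing the data
directory is an assumption, and nothing here touches a real disk or cluster.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OsdHost:
    """Hypothetical snapshot of what is already in place for one OSD."""
    disk_formatted: bool = False
    disk_mounted: bool = False
    has_monmap: bool = False
    has_key: bool = False
    data_dir_initialized: bool = False


def remaining_steps(host: OsdHost) -> List[str]:
    """Return, in order, the steps still needed before the OSD can rejoin."""
    steps = []
    if not host.disk_formatted:
        steps.append("format the replacement disk")
    if not host.disk_mounted:
        steps.append("mount the disk at the OSD data directory")
    if not host.has_monmap:
        steps.append("fetch the current monitor map")
    if not host.has_key:
        steps.append("provide the OSD's authentication key")
    if not host.data_dir_initialized:
        steps.append("let the OSD initialize the fresh data directory")
    # The final step is always required, whatever was already done.
    steps.append("start the OSD so it reconnects to the cluster")
    return steps


if __name__ == "__main__":
    # A freshly swapped, blank disk: every step is still outstanding.
    for number, step in enumerate(remaining_steps(OsdHost()), start=1):
        print(f"{number}. {step}")
```

Every step except the last depends on something outside the OSD daemon
itself, which is why a single command cannot do the whole job today.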

To make a long story short: this is on the radar.

Wido

>
> Thanks, thanks to all the community for this big piece of software!
> Marco Aroldi

* Re: Failed HD, automate procedure?
  2012-04-02 10:31 Failed HD, automate procedure? Marco Aroldi
  2012-04-02 15:07 ` Wido den Hollander
@ 2012-04-02 16:23 ` Tommi Virtanen
  2012-04-02 20:33   ` Marco Aroldi
  1 sibling, 1 reply; 4+ messages in thread
From: Tommi Virtanen @ 2012-04-02 16:23 UTC (permalink / raw)
  To: Marco Aroldi; +Cc: ceph-devel

On Mon, Apr 2, 2012 at 03:31, Marco Aroldi <marco.aroldi@gmail.com> wrote:
> I'm looking at the procedure to replace a failed HD
> (http://ceph.newdream.net/wiki/Replacing_a_failed_disk/OSD)
> and I was wondering if the procedure could be more automated like:
>
> 1- The operator replaces the failed hd, say osd.23
> 2- Give a new command like "ceph reborn osd.23"

To echo what Wido said, this is definitely in the plan. The goal is
something like this:

- failed disk is unmounted and the "service light" is set (where available)
- a technician pulls out the failed disk and plugs in a prepared new one
- new disk is automatically taken into use, a new osd is started
- later, the failed disk can be examined and recovered, if desired; OR
  the osd that used that disk is deleted from the monitors and the disk
  discarded
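
The life of the failed disk in the list above can be sketched as a small
state machine. The state names and the `advance` helper are hypothetical and
chosen only for illustration. The sketch models the failed disk alone and
leaves out the replacement disk and the new osd that is started on it.

```python
from enum import Enum, auto


class DiskState(Enum):
    FAILED = auto()
    UNMOUNTED = auto()   # unmounted, service light set where available
    PULLED = auto()      # technician has removed the disk
    RECOVERED = auto()   # examined and recovered later
    DISCARDED = auto()   # osd deleted from the monitors, disk discarded


# Allowed transitions for the failed disk, following the list above.
TRANSITIONS = {
    DiskState.FAILED: {DiskState.UNMOUNTED},
    DiskState.UNMOUNTED: {DiskState.PULLED},
    DiskState.PULLED: {DiskState.RECOVERED, DiskState.DISCARDED},
    DiskState.RECOVERED: set(),
    DiskState.DISCARDED: set(),
}


def advance(current: DiskState, target: DiskState) -> DiskState:
    """Move to `target` if the workflow allows it, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target


if __name__ == "__main__":
    state = DiskState.FAILED
    for nxt in (DiskState.UNMOUNTED, DiskState.PULLED, DiskState.DISCARDED):
        state = advance(state, nxt)
        print(state.name)
```

The only branch is at the end: once the disk has been pulled, it is either
recovered or discarded, and both are terminal.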

* Re: Failed HD, automate procedure?
  2012-04-02 16:23 ` Tommi Virtanen
@ 2012-04-02 20:33   ` Marco Aroldi
  0 siblings, 0 replies; 4+ messages in thread
From: Marco Aroldi @ 2012-04-02 20:33 UTC (permalink / raw)
  To: ceph-devel

Wido, Tommi, thanks

It would be great!

> Somehow that new disk has to be formatted and linked to the new OSD.
Yes, of course.
But in theory, the cluster's OSD information could be stored on the
mon or the mds, so that they know the size, the filesystem, and the
partitions of the failed OSD and, once the HD has been replaced, can
recreate a new "twin" OSD.
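
A minimal sketch of that idea, assuming a per-OSD record is kept somewhere
central. `OsdLayout`, `registry`, `remember`, and `plan_twin` are all
hypothetical names, the plain dict only stands in for whatever the mon would
actually store, and the example values are made up.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class OsdLayout:
    """Hypothetical record of how an OSD's disk was laid out."""
    osd_id: int
    size_bytes: int
    filesystem: str
    partitions: Tuple[str, ...]


# Stand-in for state kept on the mon; a plain in-memory dict here.
registry: Dict[int, OsdLayout] = {}


def remember(layout: OsdLayout) -> None:
    """Record the layout of a healthy OSD so it can be recreated later."""
    registry[layout.osd_id] = layout


def plan_twin(osd_id: int) -> OsdLayout:
    """Return the stored layout to apply to the replacement disk."""
    if osd_id not in registry:
        raise LookupError(f"no layout recorded for osd.{osd_id}")
    return registry[osd_id]


if __name__ == "__main__":
    # Made-up example values for osd.23.
    remember(OsdLayout(23, 10**12, "btrfs", ("/dev/sdb1",)))
    print(plan_twin(23))
```

The record still has to be written while the OSD is healthy, and something
on the host has to apply it to the new disk, so this complements the
external involvement Wido describes without replacing it.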

On 2 April 2012 at 18:23, Tommi Virtanen <tommi.virtanen@dreamhost.com>
wrote:
> To echo what Wido said, this is definitely in the plan

Great news Tommi!

Marco
