* Re: Advice for implementation of LED behavior in Ceph ecosystem
2015-04-01 18:56 Advice for implementation of LED behavior in Ceph ecosystem Handzik, Joe
@ 2015-04-01 20:27 ` Mark Nelson
2015-04-01 20:29 ` Sage Weil
2015-04-01 21:17 ` John Spray
2 siblings, 0 replies; 13+ messages in thread
From: Mark Nelson @ 2015-04-01 20:27 UTC (permalink / raw)
To: Handzik, Joe, ceph-devel@vger.kernel.org
On 04/01/2015 01:56 PM, Handzik, Joe wrote:
>
> Hey all,
>
> Gregory Meno's pull request in Calamari (https://github.com/ceph/calamari/pull/267) is motivating some discussion about a feature (or set of features) that I'm about to start working on.
>
> My goal is to allow users to enable the identify and fault LEDs (fault is negotiable) via the Calamari GUI. I've had some discussion with Dan Mick and Gregory Meno about the concept, and they both see the value in it. The decision that needs to be made is...where should this functionality exist? There are a couple of obvious choices, after Gregory's SMART patch:
>
> 1. Stick everything in Calamari via Salt calls similar to what Gregory is showing. I have concerns about this, I think I'd still need extra information from the OSDs themselves. I might need to implement the first half of option #2 anyway.
> 2. Scatter it across the codebases (would probably require changes in Ceph, Calamari, and Calamari-clients). Expose the storage target data via the OSDs, and move that information upward via the RESTful API. Then, expose another RESTful API behavior that allows a user to change the LED state. Implementing as much as possible in the Ceph codebase itself has an added benefit (as far as I see it, at least) if someone ever decides that the fault LED should be toggled on based on the state of the OSD or backing storage device. It should be easier for Ceph to hook into that kind of functionality if Calamari doesn't need to be involved.
>
> Dan mentioned something I thought about too...not EVERY OSD's backing storage is going to be able to use this (Kinetic drives, NVDIMMs, M.2, etc etc), I'd need to implement some way to filter devices and communicate via the Calamari GUI that the device doesn't have an LED to toggle or doesn't understand SCSI Enclosure Services (I'm targeting industry standard HBAs first, and I'll deal with RAID controllers like Smart Array later).
>
> I'm trying to get this out there early so anyone with particularly strong implementation opinions can give feedback. Any advice would be appreciated! I'm still new to the Ceph source base, and probably understand Calamari and Calamari-clients better than Ceph proper at the moment.
I'd personally vastly prefer #2. Back when I worked for an HPC center,
we constantly were frustrated by being tied into GUIs by storage
vendors. It's not that they were necessarily bad in any way and we
often did use them. We just wanted the ability to watch for or script
these things ourselves independently of the GUI. Unless it's a huge
burden, having most of this exist in ceph itself (potentially via some
kind of flexible mechanism that can be used for other tools like megacli
as well) would be a huge win imho. Having a calamari interface that can
control that would of course be very nice.
Mark
>
> Thanks,
>
> Joe
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Advice for implementation of LED behavior in Ceph ecosystem
2015-04-01 18:56 Advice for implementation of LED behavior in Ceph ecosystem Handzik, Joe
2015-04-01 20:27 ` Mark Nelson
@ 2015-04-01 20:29 ` Sage Weil
2015-04-01 21:17 ` John Spray
2 siblings, 0 replies; 13+ messages in thread
From: Sage Weil @ 2015-04-01 20:29 UTC (permalink / raw)
To: Handzik, Joe; +Cc: ceph-devel@vger.kernel.org
Hi Joe,
First, hooray that you're working on this! :)
On Wed, 1 Apr 2015, Handzik, Joe wrote:
> Hey all,
>
> Gregory Meno's pull request in Calamari
> (https://github.com/ceph/calamari/pull/267) is motivating some
> discussion about a feature (or set of features) that I'm about to start
> working on.
>
> My goal is to allow users to enable the identify and fault LEDs (fault
> is negotiable) via the Calamari GUI. I've had some discussion with Dan
> Mick and Gregory Meno about the concept, and they both see the value in
> it. The decision that needs to be made is...where should this
> functionality exist? There are a couple of obvious choices, after
> Gregory's SMART patch:
>
> 1. Stick everything in Calamari via Salt calls similar to what Gregory
> is showing. I have concerns about this, I think I'd still need extra
> information from the OSDs themselves. I might need to implement the
> first half of option #2 anyway.
I'm skeptical that this can work without *some* Ceph changes..
> 2. Scatter it across the codebases (would probably require changes in
> Ceph, Calamari, and Calamari-clients). Expose the storage target data
> via the OSDs, and move that information upward via the RESTful API.
> Then, expose another RESTful API behavior that allows a user to change
> the LED state. Implementing as much as possible in the Ceph codebase
> itself has an added benefit (as far as I see it, at least) if someone
> ever decides that the fault LED should be toggled on based on the state
> of the OSD or backing storage device. It should be easier for Ceph to
> hook into that kind of functionality if Calamari doesn't need to be
> involved.
This sounds more practical.
> Dan mentioned something I thought about too...not EVERY OSD's backing
> storage is going to be able to use this (Kinetic drives, NVDIMMs, M.2,
> etc etc), I'd need to implement some way to filter devices and
> communicate via the Calamari GUI that the device doesn't have an LED to
> toggle or doesn't understand SCSI Enclosure Services (I'm targeting
> industry standard HBAs first, and I'll deal with RAID controllers like
> Smart Array later).
I suspect the first step is to make sure that the OSDs are surfacing the
metadata you need about the devices. I would start with what is currently
exposed via 'ceph osd metadata <id>'. See
https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L4508
and
https://github.com/ceph/ceph/blob/master/src/os/FileStore.cc#L642
..but I'm guessing this isn't quite enough information yet.
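As a rough illustration of that starting point, the sketch below reads 'ceph osd metadata <id>' as JSON and keeps only the fields that hint at where the OSD's storage lives. The key names in DEVICE_HINT_KEYS and the sample document are assumptions for illustration; check them against what your cluster actually reports.

```python
import json
import subprocess

# Keys assumed to appear in the metadata; treat them as illustrative.
DEVICE_HINT_KEYS = ("hostname", "osd_data", "osd_journal")


def fetch_osd_metadata(osd_id):
    """Run 'ceph osd metadata <id>' and return the parsed JSON object."""
    out = subprocess.check_output(
        ["ceph", "osd", "metadata", str(osd_id), "--format", "json"]
    )
    return json.loads(out)


def device_hints(metadata):
    """Pick out the fields that help locate the OSD's backing storage."""
    return {key: metadata[key] for key in DEVICE_HINT_KEYS if key in metadata}


# Hand-written sample standing in for real command output.
sample = json.loads(
    '{"hostname": "node1", "osd_data": "/var/lib/ceph/osd/ceph-3",'
    ' "osd_journal": "/var/lib/ceph/osd/ceph-3/journal", "arch": "x86_64"}'
)
hints = device_hints(sample)
```

On a live cluster, device_hints(fetch_osd_metadata(3)) would show how little device information is available today, which is the gap described above.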
> I'm trying to get this out there early so anyone with particularly
> strong implementation opinions can give feedback. Any advice would be
> appreciated! I'm still new to the Ceph source base, and probably
> understand Calamari and Calamari-clients better than Ceph proper at the
> moment.
Exposing everything in a generic way should also cover this case.
Backends that don't have meaningful enclosure metadata won't have it (or
will report something different, like which DIMM slot the NVDIMM is in, or
whatever.)
As for the process that actually flips the LED state, that probably makes
more sense to do via Calamari?
sage
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Advice for implementation of LED behavior in Ceph ecosystem
2015-04-01 18:56 Advice for implementation of LED behavior in Ceph ecosystem Handzik, Joe
2015-04-01 20:27 ` Mark Nelson
2015-04-01 20:29 ` Sage Weil
@ 2015-04-01 21:17 ` John Spray
2015-04-01 21:55 ` John Spray
2 siblings, 1 reply; 13+ messages in thread
From: John Spray @ 2015-04-01 21:17 UTC (permalink / raw)
To: Handzik, Joe, ceph-devel@vger.kernel.org
On 01/04/2015 19:56, Handzik, Joe wrote:
> 1. Stick everything in Calamari via Salt calls similar to what Gregory is showing. I have concerns about this, I think I'd still need extra information from the OSDs themselves. I might need to implement the first half of option #2 anyway.
> 2. Scatter it across the codebases (would probably require changes in Ceph, Calamari, and Calamari-clients). Expose the storage target data via the OSDs, and move that information upward via the RESTful API. Then, expose another RESTful API behavior that allows a user to change the LED state. Implementing as much as possible in the Ceph codebase itself has an added benefit (as far as I see it, at least) if someone ever decides that the fault LED should be toggled on based on the state of the OSD or backing storage device. It should be easier for Ceph to hook into that kind of functionality if Calamari doesn't need to be involved.
>
> Dan mentioned something I thought about too...not EVERY OSD's backing storage is going to be able to use this (Kinetic drives, NVDIMMs, M.2, etc etc), I'd need to implement some way to filter devices and communicate via the Calamari GUI that the device doesn't have an LED to toggle or doesn't understand SCSI Enclosure Services (I'm targeting industry standard HBAs first, and I'll deal with RAID controllers like Smart Array later).
>
> I'm trying to get this out there early so anyone with particularly strong implementation opinions can give feedback. Any advice would be appreciated! I'm still new to the Ceph source base, and probably understand Calamari and Calamari-clients better than Ceph proper at the moment.
Similar to Mark's comment, I would lean towards option 2 -- it would be
great to have a CLI-driven ability to flash the LEDs for an OSD, and
work on integrating that with a GUI afterwards.
Currently the OSD metadata on drives is pretty limited; it'll just tell
you the /var/lib/ceph/osd/ceph-X path for the data and journal -- the
task of resolving that to a physical device is left as an exercise to
the reader, so to speak.
I would suggest extending osd metadata to also report the block device,
but only for the simple case where an OSD is a GPT partition on a raw
/dev/sdX block device. Resolving block device to underlying disks in
configurations like LVM/MDRAID/multipath is complex in the general case
(I've done it, I don't recommend it), and most ceph clusters don't use
those layers. You could add a fallback ability for users to specify
their block device in ceph.conf, in case the simple GPT-assuming OSD
probing code can't find it from the mount point.
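A minimal sketch of that probing logic, assuming the OSD data directory is a mount point listed in /proc/mounts-style text and that only plain /dev/sdXN partitions are resolved. The function names and the 'configured' fallback argument (standing in for a ceph.conf setting) are hypothetical.

```python
import re


def device_for_mount(mount_point, mounts_text):
    """Return the source device of mount_point from /proc/mounts-style text."""
    device = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            device = fields[0]  # later entries shadow earlier ones
    return device


def whole_disk(partition):
    """Map /dev/sdX<N> to /dev/sdX; return None for anything else."""
    match = re.fullmatch(r"(/dev/sd[a-z]+)\d+", partition or "")
    return match.group(1) if match else None


def resolve_block_device(mount_point, mounts_text, configured=None):
    """Use the admin-supplied device if given, else probe the simple case."""
    if configured:
        return configured
    return whole_disk(device_for_mount(mount_point, mounts_text))


# Hand-written sample; a real caller would read /proc/mounts instead.
MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /var/lib/ceph/osd/ceph-3 xfs rw,noatime 0 0
/dev/mapper/vg0-osd4 /var/lib/ceph/osd/ceph-4 xfs rw,noatime 0 0
"""
```

Here ceph-3 resolves to /dev/sdb, while the LVM-backed ceph-4 deliberately resolves to nothing unless the admin supplies a device.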
Once you have found the block device and reported it in the OSD
metadata, you can use that information to go poke its LEDs using
enclosure services hooks as you suggest, and wrap that in an OSD 'tell'
command (OSD::do_command). In a similar vein to finding the block
device, it would be a good thing to have a config option here so that
admins can optionally specify a custom command for flashing a particular
OSD's LED. Admins might not bother setting that, but it would mean a
system integrator could optionally configure ceph to work with whatever
exotic custom stuff they have.
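One way the custom-command idea could look, as a sketch only: the command is built but not executed. The default assumes the ledctl tool from the ledmon package is installed; the template placeholders {device} and {state} and the vendor path in the usage note are made up for illustration and are not existing Ceph options.

```python
import shlex


def build_led_command(device, on, template=None):
    """Return the argv that would switch a drive's identify LED on or off.

    Without a template, fall back to ledctl (an assumption about the host).
    With a template, substitute {device} and {state} into each argument.
    """
    if template is None:
        pattern = "locate" if on else "locate_off"
        return ["ledctl", "{}={}".format(pattern, device)]
    state = "on" if on else "off"
    return [
        arg.format(device=device, state=state)
        for arg in shlex.split(template)
    ]


default_cmd = build_led_command("/dev/sdb", True)
custom_cmd = build_led_command(
    "/dev/sdb", False, "/opt/vendor/blink --dev {device} --state {state}"
)
```

A system integrator would only need to supply the template string; the resulting argv could then be handed to subprocess by whatever component ends up doing the flashing.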
Hopefully that's some help, it sounds like you've already thought it
through a fair bit anyway.
Cheers,
John
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Advice for implementation of LED behavior in Ceph ecosystem
2015-04-01 21:17 ` John Spray
@ 2015-04-01 21:55 ` John Spray
2015-04-01 21:57 ` Mark Nelson
2015-04-01 23:17 ` Sage Weil
0 siblings, 2 replies; 13+ messages in thread
From: John Spray @ 2015-04-01 21:55 UTC (permalink / raw)
To: Handzik, Joe, ceph-devel@vger.kernel.org
On 01/04/2015 22:17, John Spray wrote:
> Once you have found the block device and reported it in the OSD
> metadata, you can use that information to go poke its LEDs using
> enclosure services hooks as you suggest, and wrap that in an OSD
> 'tell' command (OSD::do_command). In a similar vein to finding the
> block device, it would be a good thing to have a config option here so
> that admins can optionally specify a custom command for flashing a
> particular OSD's LED. Admins might not bother setting that, but it
> would mean a system integrator could optionally configure ceph to work
> with whatever exotic custom stuff they have.
One more thought occurs to me -- one of the main cases where you'd want
to flash an LED would be to identify the drive of an OSD that is
down/out due to a dead drive. In that instance, the ceph-osd process
wouldn't actually be running, so you wouldn't be able to send it the
'tell' to flash the LED.
I guess in this interesting case you could either:
* Allow other OSDs on the same host to handle the 'tell blink' command
for the dead OSD's drive
* Leave this to calamari/whoever to read the dead OSD's block device
path from "ceph osd metadata", and go blink the LEDs themselves.
Cheers,
John
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Advice for implementation of LED behavior in Ceph ecosystem
2015-04-01 21:55 ` John Spray
@ 2015-04-01 21:57 ` Mark Nelson
2015-04-01 22:04 ` John Spray
2015-04-01 23:17 ` Sage Weil
1 sibling, 1 reply; 13+ messages in thread
From: Mark Nelson @ 2015-04-01 21:57 UTC (permalink / raw)
To: John Spray, Handzik, Joe, ceph-devel@vger.kernel.org
On 04/01/2015 04:55 PM, John Spray wrote:
> On 01/04/2015 22:17, John Spray wrote:
>> Once you have found the block device and reported it in the OSD
>> metadata, you can use that information to go poke its LEDs using
>> enclosure services hooks as you suggest, and wrap that in an OSD
>> 'tell' command (OSD::do_command). In a similar vein to finding the
>> block device, it would be a good thing to have a config option here so
>> that admins can optionally specify a custom command for flashing a
>> particular OSD's LED. Admins might not bother setting that, but it
>> would mean a system integrator could optionally configure ceph to work
>> with whatever exotic custom stuff they have.
> One more thought occurs to me -- one of the main cases where you'd want
> to flash an LED would be to identify the drive of an OSD that is
> down/out due to a dead drive. In that instance, the ceph-osd process
> wouldn't actually be running, so you wouldn't be able to send it the
> 'tell' to flash the LED.
>
> I guess in this interesting case you could either:
> * Allow other OSDs on the same host to handle the 'tell blink' command
> for the dead OSD's drive
> * Leave this to calamari/whoever to read the dead OSD's block device
> path from "ceph osd metadata", and go blink the LEDs themselves.
>
> Cheers,
> John
It seems to me that the OSD potentially would flash the LED on its way
down if it thinks its drive is dead/dying?
Mark
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Advice for implementation of LED behavior in Ceph ecosystem
2015-04-01 21:57 ` Mark Nelson
@ 2015-04-01 22:04 ` John Spray
2015-04-01 22:06 ` Mark Nelson
2015-04-01 22:07 ` John Spray
0 siblings, 2 replies; 13+ messages in thread
From: John Spray @ 2015-04-01 22:04 UTC (permalink / raw)
To: Mark Nelson, Handzik, Joe, ceph-devel@vger.kernel.org
On 01/04/2015 22:57, Mark Nelson wrote:
> It seems to me that the OSD potentially would flash the LED on its
> way down if it thinks its drive is dead/dying?
That's a good idea for the case where ceph-osd is proactively
identifying a failing drive. I'm also thinking about the case where we
come back from a reboot and a drive is sufficiently unreadable that
ceph-disk doesn't see the OSD partitions and ceph-osd never gets
started, or the OSD's local filesystem is unmountable. Because the
keyring lives on that local filesystem, OSDs couldn't phone home in that
case, even to report a failure.
John
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Advice for implementation of LED behavior in Ceph ecosystem
2015-04-01 22:04 ` John Spray
@ 2015-04-01 22:06 ` Mark Nelson
2015-04-01 22:07 ` John Spray
1 sibling, 0 replies; 13+ messages in thread
From: Mark Nelson @ 2015-04-01 22:06 UTC (permalink / raw)
To: John Spray, Handzik, Joe, ceph-devel@vger.kernel.org
On 04/01/2015 05:04 PM, John Spray wrote:
> On 01/04/2015 22:57, Mark Nelson wrote:
>> It seems to me that the OSD potentially would flash the LED on its
>> way down if it thinks its drive is dead/dying?
> That's a good idea for the case where ceph-osd is proactively
> identifying a failing drive. I'm also thinking about the case where we
> come back from a reboot and a drive is sufficiently unreadable that
> ceph-disk doesn't see the OSD partitions and ceph-osd never gets
> started, or the OSD's local filesystem is unmountable. Because the
> keyring lives on that local filesystem, OSDs couldn't phone home in that
> case, even to report a failure.
If things are that bad, I think it should get picked up lower in the
stack. I.e., there should be some kind of daemon on the system that knows
when there are SCSI errors or whatever and blinks drives that are that
far gone (in the case of RAID controllers, they may already do this anyway).
>
> John
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Advice for implementation of LED behavior in Ceph ecosystem
2015-04-01 22:04 ` John Spray
2015-04-01 22:06 ` Mark Nelson
@ 2015-04-01 22:07 ` John Spray
2015-04-01 22:10 ` Handzik, Joe
1 sibling, 1 reply; 13+ messages in thread
From: John Spray @ 2015-04-01 22:07 UTC (permalink / raw)
To: Mark Nelson, Handzik, Joe, ceph-devel@vger.kernel.org
On 01/04/2015 23:04, John Spray wrote:
> On 01/04/2015 22:57, Mark Nelson wrote:
>> It seems to me that the OSD potentially would flash the LED on its
>> way down if it thinks its drive is dead/dying?
> That's a good idea for the case where ceph-osd is proactively
> identifying a failing drive. I'm also thinking about the case where
> we come back from a reboot and a drive is sufficiently unreadable that
> ceph-disk doesn't see the OSD partitions and ceph-osd never gets
> started, or the OSD's local filesystem is unmountable. Because the
> keyring lives on that local filesystem, OSDs couldn't phone home in
> that case, even to report a failure.
Sorry, mental lapse: we're not talking about phoning home, we're talking
about flashing the LED. So perhaps ceph-disk itself could be modified
to flash an LED on a drive if it has a GPT partition ID for a ceph osd
but we can't mount it or start an OSD service.
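The decision described here could be sketched as a small predicate. The GUID below is, as far as I know, the partition type ceph-disk assigns to OSD data partitions; verify it against your ceph-disk version. The function name and its inputs are hypothetical, and how the caller learns the partition type or mount state is left out.

```python
# Assumed: GPT partition type GUID used by ceph-disk for OSD data partitions.
CEPH_OSD_PARTITION_TYPE = "4fbd7e29-9d25-41b8-afd0-062c0ceff05d"


def should_flash(partition_type, mounted, osd_running):
    """Flash only for Ceph OSD partitions that failed to mount or start."""
    is_ceph_osd = partition_type.lower() == CEPH_OSD_PARTITION_TYPE
    return is_ceph_osd and not (mounted and osd_running)
```

A healthy OSD partition (mounted, daemon running) and any non-Ceph partition both return False, so only tagged-but-unusable drives would be flagged.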
John
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Advice for implementation of LED behavior in Ceph ecosystem
2015-04-01 22:07 ` John Spray
@ 2015-04-01 22:10 ` Handzik, Joe
0 siblings, 0 replies; 13+ messages in thread
From: Handzik, Joe @ 2015-04-01 22:10 UTC (permalink / raw)
To: John Spray, Mark Nelson, ceph-devel@vger.kernel.org
Sounds like this LED flashing business is going to evolve. I'll let you guys know when I have a first pass implementation in play...you can tell me which future pieces I might've forgotten to take into account.
Really, really great info. I think the key takeaway for me is to see how much I can get done in Ceph by itself, and involve Calamari and Calamari-clients only when necessary. At least to start.
-----Original Message-----
From: John Spray [mailto:john.spray@redhat.com]
Sent: Wednesday, April 01, 2015 5:08 PM
To: Mark Nelson; Handzik, Joe; ceph-devel@vger.kernel.org
Subject: Re: Advice for implementation of LED behavior in Ceph ecosystem
On 01/04/2015 23:04, John Spray wrote:
> On 01/04/2015 22:57, Mark Nelson wrote:
>> It seems to me that the OSD potentially would flash the LED on its
>> way down if it thinks its drive is dead/dying?
> That's a good idea for the case where ceph-osd is proactively
> identifying a failing drive. I'm also thinking about the case where
> we come back from a reboot and a drive is sufficiently unreadable that
> ceph-disk doesn't see the OSD partitions and ceph-osd never gets
> started, or the OSD's local filesystem is unmountable. Because the
> keyring lives on that local filesystem, OSDs couldn't phone home in
> that case, even to report a failure.
Sorry, mental lapse: we're not talking about phoning home, we're talking
about flashing the LED. So perhaps ceph-disk itself could be modified
to flash an LED on a drive if it has a GPT partition ID for a ceph osd
but we can't mount it or start an OSD service.
John
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Advice for implementation of LED behavior in Ceph ecosystem
2015-04-01 21:55 ` John Spray
2015-04-01 21:57 ` Mark Nelson
@ 2015-04-01 23:17 ` Sage Weil
2015-04-01 23:48 ` Handzik, Joe
1 sibling, 1 reply; 13+ messages in thread
From: Sage Weil @ 2015-04-01 23:17 UTC (permalink / raw)
To: John Spray; +Cc: Handzik, Joe, ceph-devel@vger.kernel.org
On Wed, 1 Apr 2015, John Spray wrote:
> I guess in this interesting case you could either:
> * Allow other OSDs on the same host to handle the 'tell blink' command for
> the dead OSD's drive
> * Leave this to calamari/whoever to read the dead OSD's block device path
> from "ceph osd metadata", and go blink the LEDs themselves.
#2 really sounds safer to me. In particular, you need to be really
careful not to flash an LED until you're sure you don't need the data on
the disk (i.e., it's down+out and the cluster state is healthy--no heroic
measures needed). I think anything that triggers flashing that doesn't
have a holistic view of the cluster would be dangerous.
That, combined with the complications around ceph-osd possibly not
running, makes me think this would be the calamari agent that does the
flashing.
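A sketch of that safety gate, assuming the caller has already parsed 'ceph osd dump --format json' into a dict and obtained the overall health status string. The function names are hypothetical, and treating HEALTH_OK as "no heroic measures needed" is a simplification.

```python
def osd_is_down_and_out(osd_id, osd_dump):
    """True only if the OSD is listed and marked both down and out."""
    for osd in osd_dump.get("osds", []):
        if osd.get("osd") == osd_id:
            return osd.get("up") == 0 and osd.get("in") == 0
    return False  # unknown OSD: refuse rather than guess


def safe_to_flash(osd_id, osd_dump, health_status):
    """Require a healthy cluster and a down+out OSD before flashing."""
    return health_status == "HEALTH_OK" and osd_is_down_and_out(osd_id, osd_dump)


# Hand-written sample in the shape of the 'osds' section of an OSD dump.
dump = {"osds": [{"osd": 0, "up": 1, "in": 1}, {"osd": 3, "up": 0, "in": 0}]}
```

Because the check needs the whole-cluster view, it fits better in something like the calamari agent than in a single OSD process.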
It also may be necessary for the disk -> last known state mapping to go
somewhere other than in just osd metadata; if the osd is recreated or the
id gets reused that info goes away. (We could also be careful to avoid
deallocating the id until the disk is removed, I guess, but it's another
constraint to worry about.)
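To illustrate why keying on the disk rather than the OSD id helps, here is a small in-memory sketch. The class, its methods, and the WWN-style identifiers are all hypothetical; where such a mapping would really be persisted is an open question in this thread.

```python
import json


class DiskStateStore:
    """Last known state per physical disk, keyed by a stable disk identifier."""

    def __init__(self):
        self._records = {}

    def record(self, disk_id, osd_id, state):
        """Remember which OSD last used this disk and what state it was in."""
        self._records[disk_id] = {"osd_id": osd_id, "state": state}

    def last_known(self, disk_id):
        """Return the stored record, or None if the disk was never seen."""
        return self._records.get(disk_id)

    def dumps(self):
        """Serialize the mapping so it can be stored outside OSD metadata."""
        return json.dumps(self._records, sort_keys=True)


store = DiskStateStore()
store.record("wwn-0x5000c500a1b2c3d4", 3, "failed")
# OSD id 3 is later reused on a replacement disk; the old record survives.
store.record("wwn-0x5000c500deadbeef", 3, "ok")
```

The failed disk's record remains retrievable even though its OSD id now belongs to a different drive.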
sage
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Advice for implementation of LED behavior in Ceph ecosystem
2015-04-01 23:17 ` Sage Weil
@ 2015-04-01 23:48 ` Handzik, Joe
0 siblings, 0 replies; 13+ messages in thread
From: Handzik, Joe @ 2015-04-01 23:48 UTC (permalink / raw)
To: Sage Weil; +Cc: John Spray, ceph-devel@vger.kernel.org
Well, keep in mind that this isn't just for identification of failed disks. I can conceive of use cases where a user flips on one or more drive LEDs for identification or debugging purposes. That would be the distinction between the identify and fault LEDs. We can give the user the ability to distinguish between the two and figure out which they'd want to use at any given time (also, keep in mind that the fault LED is not customer controllable behind some storage controllers anyway...).
I was wondering if I'd need to carry along the last known disk state...guess I'll figure that nuance out as I go.
Joe
> On Apr 1, 2015, at 6:17 PM, Sage Weil <sage@newdream.net> wrote:
>
> #2 really sounds safer to me. In particular, you need to be really
> careful not to flash an LED until you're sure you don't need the data on
> the disk (i.e., it's down+out and the cluster state is healthy--no heroic
> measures needed). I think anything that triggers flashing that doesn't
> have a holistic view of the cluster would be dangerous.
>
> That, combined with the complications around ceph-osd possibly not
> running, makes me think this would be the calamari agent that does the
> flashing.
>
> It also may be necessary for the disk -> last known state mapping to go
> somewhere other than in just osd metadata; if the osd is recreated or the
> id gets reused that info goes away. (We could also be careful to avoid
> deallocating the id until the disk is removed, I guess, but it's another
> constraint to worry about.)
>
> sage
^ permalink raw reply [flat|nested] 13+ messages in thread