RE: Advice for implementation of LED behavior in Ceph ecosystem

* RE: Advice for implementation of LED behavior in Ceph ecosystem
From: Handzik, Joe @ 2015-04-01 18:56 UTC
  To: ceph-devel@vger.kernel.org

Hey all,

Gregory Meno's pull request in Calamari (https://github.com/ceph/calamari/pull/267) is motivating some discussion about a feature (or set of features) that I'm about to start working on.

My goal is to allow users to enable the identify and fault LEDs (fault is negotiable) via the Calamari GUI. I've had some discussion with Dan Mick and Gregory Meno about the concept, and they both see the value in it. The decision that needs to be made is...where should this functionality exist? There are a couple of obvious choices, after Gregory's SMART patch:

1. Stick everything in Calamari via Salt calls similar to what Gregory is showing. I have concerns about this; I think I'd still need extra information from the OSDs themselves. I might need to implement the first half of option #2 anyway.
2. Scatter it across the codebases (would probably require changes in Ceph, Calamari, and Calamari-clients). Expose the storage target data via the OSDs, and move that information upward via the RESTful API. Then, expose another RESTful API behavior that allows a user to change the LED state. Implementing as much as possible in the Ceph codebase itself has an added benefit (as far as I see it, at least) if someone ever decides that the fault LED should be toggled on based on the state of the OSD or backing storage device. It should be easier for Ceph to hook into that kind of functionality if Calamari doesn't need to be involved.

Dan mentioned something I thought about too...not EVERY OSD's backing storage is going to be able to use this (Kinetic drives, NVDIMMs, M.2, etc etc), I'd need to implement some way to filter devices and communicate via the Calamari GUI that the device doesn't have an LED to toggle or doesn't understand SCSI Enclosure Services (I'm targeting industry standard HBAs first, and I'll deal with RAID controllers like Smart Array later). 

I'm trying to get this out there early so anyone with particularly strong implementation opinions can give feedback. Any advice would be appreciated! I'm still new to the Ceph source base, and probably understand Calamari and Calamari-clients better than Ceph proper at the moment.

Thanks,

Joe

* Re: Advice for implementation of LED behavior in Ceph ecosystem
From: Mark Nelson @ 2015-04-01 20:27 UTC
  To: Handzik, Joe, ceph-devel@vger.kernel.org



On 04/01/2015 01:56 PM, Handzik, Joe wrote:
>
> Hey all,
>
> Gregory Meno's pull request in Calamari (https://github.com/ceph/calamari/pull/267) is motivating some discussion about a feature (or set of features) that I'm about to start working on.
>
> My goal is to allow users to enable the identify and fault LEDs (fault is negotiable) via the Calamari GUI. I've had some discussion with Dan Mick and Gregory Meno about the concept, and they both see the value in it. The decision that needs to be made is...where should this functionality exist? There are a couple of obvious choices, after Gregory's SMART patch:
>
> 1. Stick everything in Calamari via Salt calls similar to what Gregory is showing. I have concerns about this; I think I'd still need extra information from the OSDs themselves. I might need to implement the first half of option #2 anyway.
> 2. Scatter it across the codebases (would probably require changes in Ceph, Calamari, and Calamari-clients). Expose the storage target data via the OSDs, and move that information upward via the RESTful API. Then, expose another RESTful API behavior that allows a user to change the LED state. Implementing as much as possible in the Ceph codebase itself has an added benefit (as far as I see it, at least) if someone ever decides that the fault LED should be toggled on based on the state of the OSD or backing storage device. It should be easier for Ceph to hook into that kind of functionality if Calamari doesn't need to be involved.
>
> Dan mentioned something I thought about too...not EVERY OSD's backing storage is going to be able to use this (Kinetic drives, NVDIMMs, M.2, etc etc), I'd need to implement some way to filter devices and communicate via the Calamari GUI that the device doesn't have an LED to toggle or doesn't understand SCSI Enclosure Services (I'm targeting industry standard HBAs first, and I'll deal with RAID controllers like Smart Array later).
>
> I'm trying to get this out there early so anyone with particularly strong implementation opinions can give feedback. Any advice would be appreciated! I'm still new to the Ceph source base, and probably understand Calamari and Calamari-clients better than Ceph proper at the moment.

I'd personally vastly prefer #2.  Back when I worked for an HPC center, 
we constantly were frustrated by being tied into GUIs by storage 
vendors.  It's not that they were necessarily bad in any way and we 
often did use them.  We just wanted the ability to watch for or script 
these things ourselves independently of the GUI.  Unless it's a huge 
burden, having most of this exist in ceph itself (potentially via some 
kind of flexible mechanism that can be used for other tools like megacli 
as well) would be a huge win imho.  Having a calamari interface that can 
control that would of course be very nice.

Mark

>
> Thanks,
>
> Joe
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

* RE: Advice for implementation of LED behavior in Ceph ecosystem
From: Sage Weil @ 2015-04-01 20:29 UTC
  To: Handzik, Joe; +Cc: ceph-devel@vger.kernel.org

Hi Joe,

First, Hooray! that you're working on this!  :)

On Wed, 1 Apr 2015, Handzik, Joe wrote:
> Hey all,
> 
> Gregory Meno's pull request in Calamari 
> (https://github.com/ceph/calamari/pull/267) is motivating some 
> discussion about a feature (or set of features) that I'm about to start 
> working on.
> 
> My goal is to allow users to enable the identify and fault LEDs (fault 
> is negotiable) via the Calamari GUI. I've had some discussion with Dan 
> Mick and Gregory Meno about the concept, and they both see the value in 
> it. The decision that needs to be made is...where should this 
> functionality exist? There are a couple of obvious choices, after 
> Gregory's SMART patch:
> 
> 1. Stick everything in Calamari via Salt calls similar to what Gregory 
> is showing. I have concerns about this; I think I'd still need extra
> information from the OSDs themselves. I might need to implement the 
> first half of option #2 anyway.

I'm skeptical that this can work without *some* Ceph changes.

> 2. Scatter it across the codebases (would probably require changes in 
> Ceph, Calamari, and Calamari-clients). Expose the storage target data 
> via the OSDs, and move that information upward via the RESTful API. 
> Then, expose another RESTful API behavior that allows a user to change 
> the LED state. Implementing as much as possible in the Ceph codebase 
> itself has an added benefit (as far as I see it, at least) if someone 
> ever decides that the fault LED should be toggled on based on the state 
> of the OSD or backing storage device. It should be easier for Ceph to 
> hook into that kind of functionality if Calamari doesn't need to be 
> involved.

This sounds more practical.

> Dan mentioned something I thought about too...not EVERY OSD's backing 
> storage is going to be able to use this (Kinetic drives, NVDIMMs, M.2, 
> etc etc), I'd need to implement some way to filter devices and 
> communicate via the Calamari GUI that the device doesn't have an LED to 
> toggle or doesn't understand SCSI Enclosure Services (I'm targeting 
> industry standard HBAs first, and I'll deal with RAID controllers like 
> Smart Array later).

I suspect the first step is to make sure that the OSDs are surfacing the 
metadata you need about the devices.  I would start with what is currently 
exposed via 'ceph osd metadata <id>'.  See

	https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L4508
and
	https://github.com/ceph/ceph/blob/master/src/os/FileStore.cc#L642

...but I'm guessing this isn't quite enough information yet.

> I'm trying to get this out there early so anyone with particularly 
> strong implementation opinions can give feedback. Any advice would be 
> appreciated! I'm still new to the Ceph source base, and probably 
> understand Calamari and Calamari-clients better than Ceph proper at the 
> moment.

Exposing everything in a generic way should also cover this case.  
Backends that don't have meaningful enclosure metadata won't have it (or 
will report something different, like which DIMM slot the NVDIMM is in, or 
whatever.)
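
A minimal sketch of that idea, assuming each backend reports whatever location metadata it has as key/value pairs. The keys `enclosure_id` and `slot` are hypothetical, not fields that 'ceph osd metadata' reports today; a consumer such as Calamari would treat a missing location as "no LED to toggle".

```python
from typing import Optional, Tuple


def led_location(metadata: dict) -> Optional[Tuple[str, str]]:
    """Return (enclosure, slot) if the backend reported one, else None.

    NOTE: "enclosure_id" and "slot" are hypothetical metadata keys used
    only for illustration.
    """
    enclosure = metadata.get("enclosure_id")
    slot = metadata.get("slot")
    if enclosure is None or slot is None:
        # e.g. an NVDIMM backend might report a DIMM slot instead, so
        # there is no enclosure LED to toggle for this OSD.
        return None
    return (enclosure, slot)


def led_status_text(metadata: dict) -> str:
    """Text a GUI could show next to the OSD."""
    loc = led_location(metadata)
    if loc is None:
        return "no LED available"
    return "enclosure %s, slot %s" % loc
```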

As for the process that actually flips the LED state, that probably makes 
more sense to do via Calamari?

sage


* Re: Advice for implementation of LED behavior in Ceph ecosystem
From: John Spray @ 2015-04-01 21:17 UTC
  To: Handzik, Joe, ceph-devel@vger.kernel.org

On 01/04/2015 19:56, Handzik, Joe wrote:
> 1. Stick everything in Calamari via Salt calls similar to what Gregory is showing. I have concerns about this; I think I'd still need extra information from the OSDs themselves. I might need to implement the first half of option #2 anyway.
> 2. Scatter it across the codebases (would probably require changes in Ceph, Calamari, and Calamari-clients). Expose the storage target data via the OSDs, and move that information upward via the RESTful API. Then, expose another RESTful API behavior that allows a user to change the LED state. Implementing as much as possible in the Ceph codebase itself has an added benefit (as far as I see it, at least) if someone ever decides that the fault LED should be toggled on based on the state of the OSD or backing storage device. It should be easier for Ceph to hook into that kind of functionality if Calamari doesn't need to be involved.
>
> Dan mentioned something I thought about too...not EVERY OSD's backing storage is going to be able to use this (Kinetic drives, NVDIMMs, M.2, etc etc), I'd need to implement some way to filter devices and communicate via the Calamari GUI that the device doesn't have an LED to toggle or doesn't understand SCSI Enclosure Services (I'm targeting industry standard HBAs first, and I'll deal with RAID controllers like Smart Array later).
>
> I'm trying to get this out there early so anyone with particularly strong implementation opinions can give feedback. Any advice would be appreciated! I'm still new to the Ceph source base, and probably understand Calamari and Calamari-clients better than Ceph proper at the moment.

Similar to Mark's comment, I would lean towards option 2 -- it would be 
great to have a CLI-driven ability to flash the LEDs for an OSD, and 
work on integrating that with a GUI afterwards.

Currently the OSD metadata on drives is pretty limited: it'll just tell 
you the /var/lib/ceph/osd/ceph-X path for the data and journal -- the 
task of resolving that to a physical device is left as an exercise to 
the reader, so to speak.

I would suggest extending osd metadata to also report the block device, 
but only for the simple case where an OSD is a GPT partition on a raw 
/dev/sdX block device.  Resolving block device to underlying disks in 
configurations like LVM/MDRAID/multipath is complex in the general case 
(I've done it, I don't recommend it), and most ceph clusters don't use 
those layers.  You could add a fallback ability for users to specify 
their block device in ceph.conf, in case the simple GPT-assuming OSD 
probing code can't find it from the mount point.
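
A rough sketch of that simple case in Python (the language ceph-disk is written in), assuming the mount table has /proc/mounts format and the OSD data partition is /dev/sdXN. The `override` argument stands in for the suggested ceph.conf fallback; no such option exists yet, so treat it as hypothetical. Anything that is not a plain /dev/sdXN partition (LVM, MD RAID, multipath) deliberately resolves to None.

```python
import re
from typing import Optional


def device_for_mount(mounts_text: str, mount_point: str) -> Optional[str]:
    """Return the device mounted at mount_point, given /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return fields[0]
    return None


def parent_disk(partition: str) -> Optional[str]:
    """Map /dev/sdXN to /dev/sdX; return None for anything more complex."""
    m = re.fullmatch(r"(/dev/sd[a-z]+)\d+", partition)
    return m.group(1) if m else None


def osd_block_device(mounts_text: str, osd_data: str,
                     override: Optional[str] = None) -> Optional[str]:
    """Resolve an OSD data directory to its whole-disk block device."""
    if override:
        # Admin-supplied value (hypothetical ceph.conf fallback option).
        return override
    partition = device_for_mount(mounts_text, osd_data)
    return parent_disk(partition) if partition else None
```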

Once you have found the block device and reported it in the OSD 
metadata, you can use that information to go poke its LEDs using 
enclosure services hooks as you suggest, and wrap that in an OSD 'tell' 
command (OSD::do_command).  In a similar vein to finding the block 
device, it would be a good thing to have a config option here so that 
admins can optionally specify a custom command for flashing a particular 
OSD's LED.  Admins might not bother setting that, but it would mean a 
system integrator could optionally configure ceph to work with whatever 
exotic custom stuff they have.
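
One way the command behind such a 'tell' might be assembled, sketched under two assumptions: the default uses `ledctl` from the ledmon package with its `locate`/`locate_off` patterns (check the man page for your version), and the admin override is a hypothetical config string with `{device}` and `{state}` placeholders. The sketch only builds the argument list; running it (for example with subprocess.run) is left to the caller. Formatting each argument after splitting keeps a device path from ever being interpreted by a shell.

```python
import shlex
from typing import List, Optional


def blink_command(device: str, on: bool,
                  custom_template: Optional[str] = None) -> List[str]:
    """Build argv to turn a drive's identify LED on or off."""
    state = "on" if on else "off"
    if custom_template:
        # Hypothetical admin-configured command, e.g.
        # "vendor-led --dev {device} --{state}"
        return [arg.format(device=device, state=state)
                for arg in shlex.split(custom_template)]
    # Assumed default: ledctl (ledmon package) locate patterns.
    pattern = "locate" if on else "locate_off"
    return ["ledctl", "%s=%s" % (pattern, device)]
```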

Hopefully that's some help, it sounds like you've already thought it 
through a fair bit anyway.

Cheers,
John

* Re: Advice for implementation of LED behavior in Ceph ecosystem
From: John Spray @ 2015-04-01 21:55 UTC
  To: Handzik, Joe, ceph-devel@vger.kernel.org

On 01/04/2015 22:17, John Spray wrote:
> Once you have found the block device and reported it in the OSD 
> metadata, you can use that information to go poke its LEDs using 
> enclosure services hooks as you suggest, and wrap that in an OSD 
> 'tell' command (OSD::do_command).  In a similar vein to finding the 
> block device, it would be a good thing to have a config option here so 
> that admins can optionally specify a custom command for flashing a 
> particular OSD's LED.  Admins might not bother setting that, but it 
> would mean a system integrator could optionally configure ceph to work 
> with whatever exotic custom stuff they have.

One more thought occurs to me -- one of the main cases where you'd want 
to flash an LED would be to identify the drive of an OSD that is 
down/out due to a dead drive.  In that instance, the ceph-osd process 
wouldn't actually be running, so you wouldn't be able to send it the 
'tell' to flash the LED.

I guess in this interesting case you could either:
  * Allow other OSDs on the same host to handle the 'tell blink' command 
for the dead OSD's drive
  * Leave this to calamari/whoever to read the dead OSD's block device 
path from "ceph osd metadata", and go blink the LEDs themselves.
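
The first option could look roughly like this, assuming the OSD metadata map carries a `hostname` entry per OSD and that the caller knows which OSDs are up. If no other OSD on that host is running, the function returns None and the request would have to fall back to the second option.

```python
from typing import Dict, Optional, Set


def pick_blink_handler(dead_osd: int,
                       metadata: Dict[int, dict],
                       up_osds: Set[int]) -> Optional[int]:
    """Choose a running OSD on the same host to blink dead_osd's drive."""
    host = metadata[dead_osd].get("hostname")
    if host is None:
        return None  # cannot tell which host the dead OSD lived on
    candidates = sorted(
        osd for osd, md in metadata.items()
        if osd != dead_osd and osd in up_osds and md.get("hostname") == host
    )
    # Lowest-numbered live peer, for a deterministic choice.
    return candidates[0] if candidates else None
```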

Cheers,
John

* Re: Advice for implementation of LED behavior in Ceph ecosystem
From: Mark Nelson @ 2015-04-01 21:57 UTC
  To: John Spray, Handzik, Joe, ceph-devel@vger.kernel.org

On 04/01/2015 04:55 PM, John Spray wrote:
> On 01/04/2015 22:17, John Spray wrote:
>> Once you have found the block device and reported it in the OSD
>> metadata, you can use that information to go poke its LEDs using
>> enclosure services hooks as you suggest, and wrap that in an OSD
>> 'tell' command (OSD::do_command).  In a similar vein to finding the
>> block device, it would be a good thing to have a config option here so
>> that admins can optionally specify a custom command for flashing a
>> particular OSD's LED.  Admins might not bother setting that, but it
>> would mean a system integrator could optionally configure ceph to work
>> with whatever exotic custom stuff they have.
> One more thought occurs to me -- one of the main cases where you'd want
> to flash an LED would be to identify the drive of an OSD that is
> down/out due to a dead drive.  In that instance, the ceph-osd process
> wouldn't actually be running, so you wouldn't be able to send it the
> 'tell' to flash the LED.
>
> I guess in this interesting case you could either:
>   * Allow other OSDs on the same host to handle the 'tell blink' command
> for the dead OSD's drive
>   * Leave this to calamari/whoever to read the dead OSD's block device
> path from "ceph osd metadata", and go blink the LEDs themselves.
>
> Cheers,
> John

It seems to me that the OSD potentially would flash the LED on its way 
down if it thinks its drive is dead/dying?

Mark

* Re: Advice for implementation of LED behavior in Ceph ecosystem
From: John Spray @ 2015-04-01 22:04 UTC
  To: Mark Nelson, Handzik, Joe, ceph-devel@vger.kernel.org

On 01/04/2015 22:57, Mark Nelson wrote:
> It seems to me that the OSD potentially would flash the LED on its 
> way down if it thinks its drive is dead/dying?

That's a good idea for the case where ceph-osd is proactively 
identifying a failing drive.  I'm also thinking about the case where we 
come back from a reboot and a drive is sufficiently unreadable that 
ceph-disk doesn't see the OSD partitions and ceph-osd never gets 
started, or the OSD's local filesystem is unmountable.  Because the 
keyring lives on that local filesystem, OSDs couldn't phone home in that 
case, even to report a failure.

John

* Re: Advice for implementation of LED behavior in Ceph ecosystem
From: Mark Nelson @ 2015-04-01 22:06 UTC
  To: John Spray, Handzik, Joe, ceph-devel@vger.kernel.org



On 04/01/2015 05:04 PM, John Spray wrote:
> On 01/04/2015 22:57, Mark Nelson wrote:
>> It seems to me that the OSD potentially would flash the LED on its
>> way down if it thinks its drive is dead/dying?
> That's a good idea for the case where ceph-osd is proactively
> identifying a failing drive.  I'm also thinking about the case where we
> come back from a reboot and a drive is sufficiently unreadable that
> ceph-disk doesn't see the OSD partitions and ceph-osd never gets
> started, or the OSD's local filesystem is unmountable.  Because the
> keyring lives on that local filesystem, OSDs couldn't phone home in that
> case, even to report a failure.

If things are that bad, I think it should get picked up lower in the 
stack, i.e. there should be some kind of daemon on the system that knows 
when there are scsi errors or whatever and blinks drives that are that 
far gone (in the case of RAID controllers, they may already do this anyway).
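
A toy version of such a daemon's detection logic, assuming kernel log lines of the form "I/O error, dev sdX, sector N" (the exact wording varies between kernel versions, so the pattern is an assumption) and an arbitrary error threshold. It only reports candidate drives; actually blinking them would go through whatever LED tool the host uses.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Assumed log format; adjust for the running kernel.
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector \d+")


def drives_to_flag(log_lines: Iterable[str], threshold: int = 3) -> Set[str]:
    """Return devices that logged at least `threshold` I/O errors."""
    counts = Counter()
    for line in log_lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return {dev for dev, n in counts.items() if n >= threshold}
```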

>
> John

* Re: Advice for implementation of LED behavior in Ceph ecosystem
From: John Spray @ 2015-04-01 22:07 UTC
  To: Mark Nelson, Handzik, Joe, ceph-devel@vger.kernel.org

On 01/04/2015 23:04, John Spray wrote:
> On 01/04/2015 22:57, Mark Nelson wrote:
>> It seems to me that the OSD potentially would flash the LED on its 
>> way down if it thinks its drive is dead/dying?
> That's a good idea for the case where ceph-osd is proactively 
> identifying a failing drive.  I'm also thinking about the case where 
> we come back from a reboot and a drive is sufficiently unreadable that 
> ceph-disk doesn't see the OSD partitions and ceph-osd never gets 
> started, or the OSD's local filesystem is unmountable.  Because the 
> keyring lives on that local filesystem, OSDs couldn't phone home in 
> that case, even to report a failure.

Sorry, mental lapse: we're not talking about phoning home, we're talking 
about flashing the LED.  So perhaps ceph-disk itself could be modified 
to flash an LED on a drive if it has a GPT partition ID for a ceph osd 
but we can't mount it or start an OSD service.
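
A sketch of that check as a pure function, leaving out how ceph-disk actually reads partition types and mount state. The GUID below is believed to be the GPT partition type ceph-disk assigns to OSD data partitions; treat it as an assumption and confirm it against the ceph-disk source.

```python
# Assumed: GPT type GUID that ceph-disk uses for OSD data partitions.
CEPH_OSD_DATA_TYPE_GUID = "4fbd7e29-9d25-41b8-afd0-062c0ceff05d"


def should_flag_partition(part_type_guid: str, mounted: bool,
                          osd_started: bool) -> bool:
    """Flag a drive that looks like a Ceph OSD but could not be brought up."""
    is_ceph_osd = part_type_guid.lower() == CEPH_OSD_DATA_TYPE_GUID
    # Flag it if either the mount or the OSD service start failed.
    return is_ceph_osd and not (mounted and osd_started)
```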

John

* RE: Advice for implementation of LED behavior in Ceph ecosystem
From: Handzik, Joe @ 2015-04-01 22:10 UTC
  To: John Spray, Mark Nelson, ceph-devel@vger.kernel.org

Sounds like this LED flashing business is going to evolve. I'll let you guys know when I have a first pass implementation in play...you can tell me which future pieces I might've forgotten to take into account.

Really, really great info. I think the key takeaway for me is to see how much I can get done in Ceph by itself, and involve Calamari and Calamari-clients only when necessary. At least to start.

* Re: Advice for implementation of LED behavior in Ceph ecosystem
From: Sage Weil @ 2015-04-01 23:17 UTC
  To: John Spray; +Cc: Handzik, Joe, ceph-devel@vger.kernel.org

On Wed, 1 Apr 2015, John Spray wrote:
> I guess in this interesting case you could either:
>  * Allow other OSDs on the same host to handle the 'tell blink' command for
> the dead OSD's drive
>  * Leave this to calamari/whoever to read the dead OSD's block device path
> from "ceph osd metadata", and go blink the LEDs themselves.

#2 really sounds safer to me.  In particular, you need to be really 
careful not to flash an LED until you're sure you don't need the data on 
the disk (i.e., it's down+out and the cluster state is healthy--no heroic 
measures needed).  I think anything that triggers flashing that doesn't 
have a holistic view of the cluster would be dangerous.
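
A minimal sketch of that safety gate, assuming per-OSD records shaped like the entries in `ceph osd dump --format json` (an "osd" id plus "up" and "in" flags as 0/1) and the overall health string from `ceph health`. Unknown OSDs are treated as unsafe.

```python
from typing import List


def fault_led_allowed(osd_id: int, osds: List[dict], health_status: str) -> bool:
    """Allow the fault LED only for a down+out OSD in a healthy cluster."""
    record = next((o for o in osds if o.get("osd") == osd_id), None)
    if record is None:
        return False  # unknown OSD: err on the side of not flashing
    down_and_out = record.get("up") == 0 and record.get("in") == 0
    return down_and_out and health_status == "HEALTH_OK"
```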

That, combined with the complications around ceph-osd possibly not 
running, makes me think this would be the calamari agent that does the 
flashing.

It also may be necessary for the disk -> last known state mapping to go 
somewhere other than in just osd metadata; if the osd is recreated or the 
id gets reused, that info goes away.  (We could also be careful to avoid 
deallocating the id until the disk is removed, I guess, but it's another 
constraint to worry about.)
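
One way to sidestep the id-reuse problem, sketched here: key the last-known state by a stable drive identifier (a WWN or serial number is assumed) rather than by OSD id, so recreating an OSD with the same id does not overwrite what was known about the old disk. Where such a registry would live (Calamari, the monitors, or elsewhere) is left open.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DiskRecord:
    last_osd_id: int
    last_state: str


class DiskStateRegistry:
    """Last known state per physical disk, keyed by a stable identifier."""

    def __init__(self) -> None:
        self._by_disk: Dict[str, DiskRecord] = {}

    def record(self, disk_id: str, osd_id: int, state: str) -> None:
        self._by_disk[disk_id] = DiskRecord(osd_id, state)

    def lookup(self, disk_id: str) -> Optional[DiskRecord]:
        return self._by_disk.get(disk_id)
```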

sage

* Re: Advice for implementation of LED behavior in Ceph ecosystem
From: Handzik, Joe @ 2015-04-01 23:48 UTC
  To: Sage Weil; +Cc: John Spray, ceph-devel@vger.kernel.org

Well, keep in mind that this isn't just for identification of failed disks. I can conceive of use cases where a user flips on one or more drive LEDs for identification or debugging purposes. That is the distinction between the identify and fault LEDs. We can give the user the ability to distinguish between the two and figure out which they'd want to use at any given time (also, keep in mind that the fault LED is not customer-controllable behind some storage controllers anyway...).
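
A sketch of how a request could carry that distinction, assuming each device advertises the set of LEDs it lets software control (how that set is discovered per controller is not shown). A fault-LED request against a controller that keeps the fault LED to itself is rejected with a readable reason instead of failing silently.

```python
from enum import Enum
from typing import FrozenSet, Tuple


class Led(Enum):
    IDENTIFY = "identify"
    FAULT = "fault"


def check_led_request(kind: Led, controllable: FrozenSet[Led]) -> Tuple[bool, str]:
    """Return (allowed, message) for turning on the given LED."""
    if kind not in controllable:
        return (False, "%s LED is not controllable on this device" % kind.value)
    return (True, "%s LED request accepted" % kind.value)
```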

I was wondering if I'd need to carry along the last known disk state...guess I'll figure that nuance out as I go.

Joe

* Re: Advice for implementation of LED behavior in Ceph ecosystem
From: Gregory Meno @ 2015-04-21 14:46 UTC
  To: Handzik, Joe; +Cc: ceph-devel@vger.kernel.org, ceph-calamari


On Apr 21, 2015, at 9:58 AM, Gregory Meno <gmeno@redhat.com> wrote:

cross posting to ceph-calamari@lists.ceph.com in case we have interest there too.

Joe,

I’ve been thinking about how to expose this new hardware context in the calamari API.

I propose we add an endpoint like /api/v2/hardware, where each entry is a piece of hardware that relates to Ceph in the sense that we know which Ceph subsystem is affected when that entry is in a particular state (e.g. OK, WARN, FAIL). This state is the result of checks such as SMART.

This /hardware endpoint would be a collection of all hardware that we have checks for, or that an OSD advertises it relies on. It gives an overview to anyone who wants the current status of hardware independent of cluster context.

Naturally I expect this info to be present in context within /api/v2/cluster/FSID/osd/N and other endpoints describing Ceph primitives. That helps UIs have better context without having to do the correlation in the client.
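
To make the proposal concrete, a sketch of the data behind both views, assuming each hardware entry records the OSDs it backs. The field names and the OK/WARN/FAIL enum are illustrative, not an existing Calamari schema; the point is that the per-OSD view is a filter over the same collection, so clients don't have to do the correlation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class HwState(Enum):
    OK = "OK"
    WARN = "WARN"
    FAIL = "FAIL"


@dataclass
class HardwareEntry:
    hw_id: str
    state: HwState  # e.g. derived from SMART checks
    osd_ids: List[int] = field(default_factory=list)


def hardware_collection(entries: List[HardwareEntry]) -> List[Dict]:
    """Body for a /api/v2/hardware-style overview."""
    return [{"id": e.hw_id, "state": e.state.value, "osds": e.osd_ids}
            for e in entries]


def hardware_for_osd(entries: List[HardwareEntry], osd_id: int) -> List[Dict]:
    """Subset to embed under /api/v2/cluster/FSID/osd/N."""
    return [{"id": e.hw_id, "state": e.state.value}
            for e in entries if osd_id in e.osd_ids]
```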

It seems like it would be helpful to have the OSDs report their underlying data storage target.

How would you expose the backing storage for the OSD? Is it a command line query, configuration, part of the OSD map, or something else?

regards,
Gregory
