* adding "{mds,mon} metadata" asok command
@ 2015-03-20 5:39 kefu chai
2015-03-20 11:43 ` John Spray
2015-03-20 13:30 ` Sage Weil
0 siblings, 2 replies; 11+ messages in thread
From: kefu chai @ 2015-03-20 5:39 UTC (permalink / raw)
To: ceph-devel@vger.kernel.org
hi ceph,
to pave the road to http://tracker.ceph.com/issues/10904, where we
need to add a command to list the hostname of nodes in a ceph cluster,
i would like to add the "{mds,mon} metadata" commands to print the
system information including, but not limited to hostname,
mem_{total,swap}_kb, and distro info, of specified mds and mon.
the implementation follows the mechanism of "osd metadata":
on the mds side i would like to reuse the MDSMonitor service:
1. piggy back a map for the metadata in MMDSBeacon message,
2. put the metadata into the same DBStore transaction but with another
prefix when storing the pending inc into local storage.
3. and expose it using the "mds metadata" and later on the "service
ls" (not sure about the name ...)
@greg and @zyan, are you good with this? not sure whether this will
overburden the mds. i will use uname(2) and grep /proc/meminfo to get
the metadata, the same way the OSD does.
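A minimal sketch of the collection step described above, assuming a plain string-to-string map as the payload. The names `Metadata`, `parse_meminfo`, `collect_sys_info` and the `kernel_version`/`arch` keys are hypothetical; this is not the actual OSD code, only an illustration of reading uname(2) and /proc/meminfo:

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <sys/utsname.h>

// Hypothetical payload type: a flat key/value map.
using Metadata = std::map<std::string, std::string>;

// Parse "MemTotal:   16384 kB"-style lines from a /proc/meminfo-like stream.
void parse_meminfo(std::istream& in, Metadata& m) {
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream ls(line);
    std::string key, value;
    ls >> key >> value;
    if (key == "MemTotal:") {
      m["mem_total_kb"] = value;
    } else if (key == "SwapTotal:") {
      m["mem_swap_kb"] = value;
    }
  }
}

// Gather host information; the key names are illustrative only.
Metadata collect_sys_info() {
  Metadata m;
  struct utsname u;
  if (uname(&u) == 0) {
    m["hostname"] = u.nodename;
    m["kernel_version"] = u.release;
    m["arch"] = u.machine;
  }
  std::ifstream meminfo("/proc/meminfo");
  if (meminfo) {
    parse_meminfo(meminfo, m);  // silently skipped where /proc is unavailable
  }
  return m;
}
```

Taking a stream in `parse_meminfo` keeps the parsing testable without a real /proc filesystem.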
on the mon side, i will apply the same approach to MMonJoin and
MonmapMonitor, but with another prefix when putting the encoded
metadata into the transaction.
@joao, IIRC, you suggested that i add another paxos service for
this purpose, but is it feasible to put this data into MMonJoin and
persist it into the paxos service's storage?
any suggestion or comments would be appreciated. if no objections, i
will be on it.
thanks =)
--
Regards
Kefu Chai
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: adding "{mds,mon} metadata" asok command
2015-03-20 5:39 adding "{mds,mon} metadata" asok command kefu chai
@ 2015-03-20 11:43 ` John Spray
[not found] ` <CAJE9aOPK6Qge8BNvB0na2-Y_12THB2aSmk-wvpPytAwcjonMGQ@mail.gmail.com>
2015-03-20 13:30 ` Sage Weil
1 sibling, 1 reply; 11+ messages in thread
From: John Spray @ 2015-03-20 11:43 UTC (permalink / raw)
To: kefu chai, ceph-devel@vger.kernel.org
On 20/03/2015 05:39, kefu chai wrote:
> to pave the road to http://tracker.ceph.com/issues/10904, where we
> need to add a command to list the hostname of nodes in a ceph cluster,
> i would like to add the "{mds,mon} metadata" commands to print the
> system information including, but not limited to hostname,
> mem_{total,swap}_kb, and distro info, of specified mds and mon.
>
> the implementation follow the mechanism of "osd metadata":
>
> on the mds side i would like to reuse the MDSMonitor service:
> 1. piggy back a map for the metadata in MMDSBeacon message,
> 2. put the metadata into the same DBStore transaction but with another
> prefix when storing the pending inc into local storage.
> 3. and expose it using the "mds metadata" and later on the "service
> ls" (not sure about the name ...)
>
> @greg and @zyan, are you good with this? not sure this will overburden
> the mds or not. i will use uname(2) and grep /proc/meminfo to get the
> metadata in the same way of OSD.
It should be straightforward to include the metadata in MMDSBeacon only
once per daemon lifetime, by checking if state is CEPH_MDS_STATE_BOOT --
that way we don't have to worry about any ongoing costs. I expect that
change can live entirely in Beacon.cc without touching any other MDS code.
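The once-per-lifetime idea could look roughly like the sketch below. `MdsState`, `Beacon` and `make_beacon` are hypothetical stand-ins, not the real MMDSBeacon or Beacon.cc types; the only point is that the metadata map is attached when the state is the boot state and left empty otherwise:

```cpp
#include <cassert>
#include <map>
#include <string>

using Metadata = std::map<std::string, std::string>;

// Hypothetical stand-in for the MDS state (CEPH_MDS_STATE_BOOT etc.).
enum class MdsState { Boot, Standby, Active };

// Hypothetical stand-in for the beacon message.
struct Beacon {
  MdsState state;
  Metadata sys_info;  // empty on every beacon except the boot one
};

// Attach the collected metadata only to the boot beacon, so later
// beacons carry no extra payload.
Beacon make_beacon(MdsState state, const Metadata& collected) {
  Beacon b{state, {}};
  if (state == MdsState::Boot) {
    b.sys_info = collected;
  }
  return b;
}
```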
As for the means of getting the information, I expect the generic
kernel/mem/cpu/distro stuff from OSD::_collect_metadata can be moved up
into common/ somewhere and reused as-is from mon+mds.
John
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: adding "{mds,mon} metadata" asok command
2015-03-20 5:39 adding "{mds,mon} metadata" asok command kefu chai
2015-03-20 11:43 ` John Spray
@ 2015-03-20 13:30 ` Sage Weil
2015-03-20 16:04 ` kefu chai
2015-03-23 10:04 ` Joao Eduardo Luis
1 sibling, 2 replies; 11+ messages in thread
From: Sage Weil @ 2015-03-20 13:30 UTC (permalink / raw)
To: kefu chai; +Cc: ceph-devel@vger.kernel.org
On Fri, 20 Mar 2015, kefu chai wrote:
> on the mon side, i will apply the same approach to MMonJoin and
> MonmapMonitor, but with another prefix when putting encoding metadata
> into transaction.
The MMonJoin is only used when expanding the cluster, i.e. when first
adding the new monitor during cluster configuration time.
> @joao, IIRC, you were suggesting me to add another paxos service for
> this purpose but is it feasible to put this data into MMonJoin and
> persist it into the paxos service's storage?
>
> any suggestion or comments would be appreciated. if no objections, i
> will be on it.
Instead, I suggest having each mon generate the metadata during startup.
Then, somewhere during bootstrap(), compare the stored metadata (they all
have a copy of the mon data set) with the actual metadata, and if it
varies send a new message (MMonUpdateMetadata?) to the leader requesting a
change.
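As a rough illustration of that compare-then-request step, assuming the metadata is a flat key/value map: `UpdateRequest` and `build_update` are hypothetical names, and the message suggested above (MMonUpdateMetadata) does not exist yet, so nothing here reflects a real message layout.

```cpp
#include <cassert>
#include <map>
#include <string>

using Metadata = std::map<std::string, std::string>;

// Hypothetical payload of the proposed update message.
struct UpdateRequest {
  int rank = -1;
  Metadata metadata;
};

// Compare the stored copy with what the mon observes at startup.
// Returns true and fills *out only when an update should be sent.
bool build_update(int rank, const Metadata& stored, const Metadata& actual,
                  UpdateRequest* out) {
  if (stored == actual) {
    return false;  // nothing changed, no message needed
  }
  out->rank = rank;
  out->metadata = actual;
  return true;
}
```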
Joao, what do you think?
sage
^ permalink raw reply [flat|nested] 11+ messages in thread
* Fwd: adding "{mds,mon} metadata" asok command
[not found] ` <CAJE9aOPK6Qge8BNvB0na2-Y_12THB2aSmk-wvpPytAwcjonMGQ@mail.gmail.com>
@ 2015-03-20 15:58 ` kefu chai
0 siblings, 0 replies; 11+ messages in thread
From: kefu chai @ 2015-03-20 15:58 UTC (permalink / raw)
To: ceph-devel@vger.kernel.org
On Fri, Mar 20, 2015 at 7:43 PM, John Spray <john.spray@redhat.com> wrote:
> On 20/03/2015 05:39, kefu chai wrote:
>>
>> to pave the road to http://tracker.ceph.com/issues/10904, where we
>> need to add a command to list the hostname of nodes in a ceph cluster,
>> i would like to add the "{mds,mon} metadata" commands to print the
>> system information including, but not limited to hostname,
>> mem_{total,swap}_kb, and distro info, of specified mds and mon.
>>
>> the implementation follow the mechanism of "osd metadata":
>>
>> on the mds side i would like to reuse the MDSMonitor service:
>> 1. piggy back a map for the metadata in MMDSBeacon message,
>> 2. put the metadata into the same DBStore transaction but with another
>> prefix when storing the pending inc into local storage.
>> 3. and expose it using the "mds metadata" and later on the "service
>> ls" (not sure about the name ...)
>>
>> @greg and @zyan, are you good with this? not sure this will overburden
>> the mds or not. i will use uname(2) and grep /proc/meminfo to get the
>> metadata in the same way of OSD.
>
> It should be straightforward to include the metadata in MMDSBeacon only once
> per daemon lifetime, by checking if state is CEPH_MDS_STATE_BOOT -- that way
> we don't have to worry about any ongoing costs. I expect that change can
thanks john, it sounds great! and this is a big relief to me.
> live entirely in Beacon.cc without touching any other MDS code.
and MMDSBeacon.{h,cc} and MDSMonitor.{h,cc} for sure, i think. because
we need to put the payload into the message, and put/get the metadata
into the mon's DB storage.
>
> As for the means of getting the information, I expect the generic
> kernel/mem/cpu/distro stuff from OSD::_collect_metadata can be moved up into
> common/ somewhere and reused as-is from mon+mds.
sure, i will refactor the OSD::_collect_metadata() to avoid the copy & paste.
>
> John
--
Regards
Kefu Chai
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: adding "{mds,mon} metadata" asok command
2015-03-20 13:30 ` Sage Weil
@ 2015-03-20 16:04 ` kefu chai
2015-03-23 10:04 ` Joao Eduardo Luis
1 sibling, 0 replies; 11+ messages in thread
From: kefu chai @ 2015-03-20 16:04 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel@vger.kernel.org
On Fri, Mar 20, 2015 at 9:30 PM, Sage Weil <sage@newdream.net> wrote:
> On Fri, 20 Mar 2015, kefu chai wrote:
>> on the mon side, i will apply the same approach to MMonJoin and
>> MonmapMonitor, but with another prefix when putting encoding metadata
>> into transaction.
>
> The MMonJoin is only used when expanding the cluster, i.e. when first
> adding the new monitor during cluster configuration time.
>
>> @joao, IIRC, you were suggesting me to add another paxos service for
>> this purpose but is it feasible to put this data into MMonJoin and
>> persist it into the paxos service's storage?
>>
>> any suggestion or comments would be appreciated. if no objections, i
>> will be on it.
>
> Instead, I suggest having each mon generate the metadata during startup.
> Then, somewhere during bootstrap(), compare the stored metadata (they all
> have a copy of the mon data set) with the actual metadata, and if it
> varies send a new message (MMonUpdateMetadata?) to the leader requesting a
> change.
thanks sage =) will go this way once i have joao's blessing.
>
> Joao, what do you think?
>
> sage
>
--
Regards
Kefu Chai
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: adding "{mds,mon} metadata" asok command
2015-03-20 13:30 ` Sage Weil
2015-03-20 16:04 ` kefu chai
@ 2015-03-23 10:04 ` Joao Eduardo Luis
2015-03-23 10:35 ` John Spray
1 sibling, 1 reply; 11+ messages in thread
From: Joao Eduardo Luis @ 2015-03-23 10:04 UTC (permalink / raw)
To: Sage Weil, kefu chai; +Cc: ceph-devel@vger.kernel.org
Sorry for the latency.
On 03/20/2015 01:30 PM, Sage Weil wrote:
> On Fri, 20 Mar 2015, kefu chai wrote:
>> on the mon side, i will apply the same approach to MMonJoin and
>> MonmapMonitor, but with another prefix when putting encoding metadata
>> into transaction.
>
> The MMonJoin is only used when expanding the cluster, i.e. when first
> adding the new monitor during cluster configuration time.
>
>> @joao, IIRC, you were suggesting me to add another paxos service for
>> this purpose but is it feasible to put this data into MMonJoin and
>> persist it into the paxos service's storage?
>>
>> any suggestion or comments would be appreciated. if no objections, i
>> will be on it.
>
> Instead, I suggest having each mon generate the metadata during startup.
> Then, somewhere during bootstrap(), compare the stored metadata (they all
> have a copy of the mon data set) with the actual metadata, and if it
> varies send a new message (MMonUpdateMetadata?) to the leader requesting a
> change.
I agree. And I don't think we need a new service for this, and I also
don't think we need to write stuff to the store. We can generate this
information when the monitor hits 'bootstrap()' and share it with the
rest of the quorum once an election finishes, and always keep it in
memory (unless there's some information that needs to be persisted, but
I was under the impression that was not the case).
This, btw, is what mon/DataHealthService.cc does, but you should be able
to avoid stuffing this into a service or creating a new service. The
timecheck stuff in the monitor also works a bit like this -- take a look
at those if you need a guideline.
Cheers!
-Joao
>
> Joao, what do you think?
>
> sage
>
--
Joao Eduardo Luis | github.com/jecluis
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: adding "{mds,mon} metadata" asok command
2015-03-23 10:04 ` Joao Eduardo Luis
@ 2015-03-23 10:35 ` John Spray
2015-03-23 13:58 ` Sage Weil
0 siblings, 1 reply; 11+ messages in thread
From: John Spray @ 2015-03-23 10:35 UTC (permalink / raw)
To: Joao Eduardo Luis, Sage Weil, kefu chai; +Cc: ceph-devel@vger.kernel.org
On 23/03/2015 10:04, Joao Eduardo Luis wrote:
> I agree. And I don't think we need a new service for this, and I also
> don't think we need to write stuff to the store. We can generate this
> information when the monitor hits 'bootstrap()' and share it with the
> rest of the quorum once an election finishes, and always keep it in
> memory (unless there's some information that needs to be persisted,
> but I was under the impression that was not the case).
Just to clarify, you mean we don't need to write the mon metadata to the
store, but we'd still want to persist the MDS/OSD metadata - right?
John
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: adding "{mds,mon} metadata" asok command
2015-03-23 10:35 ` John Spray
@ 2015-03-23 13:58 ` Sage Weil
2015-03-24 10:11 ` Joao Eduardo Luis
2015-03-25 18:18 ` kefu chai
0 siblings, 2 replies; 11+ messages in thread
From: Sage Weil @ 2015-03-23 13:58 UTC (permalink / raw)
To: John Spray; +Cc: Joao Eduardo Luis, kefu chai, ceph-devel@vger.kernel.org
On Mon, 23 Mar 2015, John Spray wrote:
> On 23/03/2015 10:04, Joao Eduardo Luis wrote:
> > I agree. And I don't think we need a new service for this, and I also don't
> > think we need to write stuff to the store. We can generate this information
> > when the monitor hits 'bootstrap()' and share it with the rest of the quorum
> > once an election finishes, and always keep it in memory (unless there's some
> > information that needs to be persisted, but I was under the impression that
> > was not the case).
>
> Just to clarify, you mean we don't need to write the mon metadata to the
> store, but we'd still want to persist the MDS/OSD metadata - right?
I think we definitely still want to persist those (as we already do
persist the OSD metadata).
Since we're reporting on running daemons we could get that without
persisting the mon metadata. I think the question is whether we want to
report on the last known running instance. I forget whether 'osd
metadata' includes OSDs that are down... if so, we may as well do
the same for mons too and persist that.
sage
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: adding "{mds,mon} metadata" asok command
2015-03-23 13:58 ` Sage Weil
@ 2015-03-24 10:11 ` Joao Eduardo Luis
2015-03-24 10:20 ` John Spray
2015-03-25 18:18 ` kefu chai
1 sibling, 1 reply; 11+ messages in thread
From: Joao Eduardo Luis @ 2015-03-24 10:11 UTC (permalink / raw)
To: Sage Weil, John Spray; +Cc: kefu chai, ceph-devel@vger.kernel.org
On 03/23/2015 01:58 PM, Sage Weil wrote:
> On Mon, 23 Mar 2015, John Spray wrote:
>> On 23/03/2015 10:04, Joao Eduardo Luis wrote:
>>> I agree. And I don't think we need a new service for this, and I also don't
>>> think we need to write stuff to the store. We can generate this information
>>> when the monitor hits 'bootstrap()' and share it with the rest of the quorum
>>> once an election finishes, and always keep it in memory (unless there's some
>>> information that needs to be persisted, but I was under the impression that
>>> was not the case).
>>
>> Just to clarify, you mean we don't need to write the mon metadata to the
>> store, but we'd still want to persist the MDS/OSD metadata - right?
>
> I think we definitely still want to persist those (as we already do
> persist the OSD metadata).
>
> Since we're reporting on running daemons we could get that without
> persisting the mon metadata. I think the question is whether we want to
> report on the last known running instance. I forget whether 'osd
> metadata' includes OSDs that are down... if so, we may as well do
> the same for mons too and persist that.
>
Yeah, we may persist the info if we intend on reporting on down/out of
quorum monitors. In that case, I don't think we need anything
particularly fancy. Something like:
1. on finish_election() send metadata info to leader
2. leader coalesces all quorum participants info
3. leader gets last known metadata:
last_metadata = store->get(MONITOR_STORE_PREFIX, "last_metadata");
4. Fill whatever is missing in this quorum's metadata (down mons and
what not):
new_metadata.fill_gaps_and_stuff(last_metadata);
5. leader creates a store transaction:
MonitorDBStore::TransactionRef t = paxos->get_pending_transaction();
t->put(Monitor::MONITOR_STORE_PREFIX, "last_metadata", new_metadata);
paxos->trigger_propose();
This should do nicely, and it will only take adding a new message type
and a few functions to the Monitor class that will handle that message
type. I don't think we need to enforce the same restrictions that
PaxosService does, so that will simplify things a lot.
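The gap-filling in step 4 might be sketched as below, assuming the metadata is kept per mon rank. `MonMetadata` and `fill_gaps` are hypothetical names (the `fill_gaps_and_stuff` call above is only a placeholder), and the store and paxos calls are deliberately left out:

```cpp
#include <cassert>
#include <map>
#include <string>

using Metadata = std::map<std::string, std::string>;
using MonMetadata = std::map<int, Metadata>;  // keyed by mon rank (assumption)

// Start from what the current quorum reported, then carry over the last
// known entry for every mon that did not report (down or out of quorum).
MonMetadata fill_gaps(const MonMetadata& quorum_md, const MonMetadata& last_md) {
  MonMetadata merged = quorum_md;
  for (const auto& entry : last_md) {
    // insert() does not overwrite, so fresh quorum data always wins
    merged.insert(entry);
  }
  return merged;
}
```

The merged map is what the leader would then encode and put into the pending transaction.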
Also, just thought that a good way to version the metadata struct would
be to use the election epoch.
Anyway, on reporting it would be nice if we were to clearly state that
the hostname may not be current, if the mon is not in quorum. I don't
think people change hostnames for sport, but I can imagine an instance
in which someone changed a given hostname, kept the IP just the same,
and then got confused seeing the old hostname being reported by ceph mon
metadata (or whatever).
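One possible way to flag this in the report, sketched with hypothetical names (`format_mon_metadata` and the output format are made up for illustration and are not an existing ceph command output):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

using Metadata = std::map<std::string, std::string>;

// Print one line per mon and mark entries for mons outside the quorum,
// since their stored hostname may no longer be current.
std::string format_mon_metadata(const std::map<int, Metadata>& all,
                                const std::set<int>& quorum) {
  std::ostringstream out;
  for (const auto& entry : all) {
    const bool in_quorum = quorum.count(entry.first) > 0;
    const auto host = entry.second.find("hostname");
    out << "mon." << entry.first << " hostname="
        << (host == entry.second.end() ? std::string("unknown") : host->second);
    if (!in_quorum) {
      out << " (last known, mon not in quorum)";
    }
    out << "\n";
  }
  return out.str();
}
```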
-Joao
--
Joao Eduardo Luis | github.com/jecluis
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: adding "{mds,mon} metadata" asok command
2015-03-24 10:11 ` Joao Eduardo Luis
@ 2015-03-24 10:20 ` John Spray
0 siblings, 0 replies; 11+ messages in thread
From: John Spray @ 2015-03-24 10:20 UTC (permalink / raw)
To: Joao Eduardo Luis; +Cc: ceph-devel@vger.kernel.org
On 24/03/2015 10:11, Joao Eduardo Luis wrote:
> I don't think people change hostnames for sport
Sounds interesting, I might buy tickets to a game :-D
John
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: adding "{mds,mon} metadata" asok command
2015-03-23 13:58 ` Sage Weil
2015-03-24 10:11 ` Joao Eduardo Luis
@ 2015-03-25 18:18 ` kefu chai
1 sibling, 0 replies; 11+ messages in thread
From: kefu chai @ 2015-03-25 18:18 UTC (permalink / raw)
To: Sage Weil; +Cc: John Spray, Joao Eduardo Luis, ceph-devel@vger.kernel.org
On Mon, Mar 23, 2015 at 9:58 PM, Sage Weil <sage@newdream.net> wrote:
> On Mon, 23 Mar 2015, John Spray wrote:
>> On 23/03/2015 10:04, Joao Eduardo Luis wrote:
>> > I agree. And I don't think we need a new service for this, and I also don't
>> > think we need to write stuff to the store. We can generate this information
>> > when the monitor hits 'bootstrap()' and share it with the rest of the quorum
>> > once an election finishes, and always keep it in memory (unless there's some
>> > information that needs to be persisted, but I was under the impression that
>> > was not the case).
thanks João. yes, i will have another map in DataHealthService for the
metadata. and update it in the way we update the health data. but will
only send the MMonUpdateMetadata when the service bootstraps itself.
and i will see if we can/need to remove the mon in the metadata map
after it is removed from the quorum.
>>
>> Just to clarify, you mean we don't need to write the mon metadata to the
>> store, but we'd still want to persist the MDS/OSD metadata - right?
>
> I think we definitely still want to persist those (as we already do
> persist the OSD metadata).
agreed, that's also my impression.
>
> Since we're reporting on running daemons we could get that without
> persisting the mon metadata. I think the question is whether we want to
> report on the last known running instance. I forget whether 'osd
> metadata' includes OSDs that are down... if so, we may as well do
> the same for mons too and persist that.
yes, "osd metadata" also includes OSDs that are down, except for the
ones explicitly removed using "osd rm".
>
> sage
--
Regards
Kefu Chai
^ permalink raw reply [flat|nested] 11+ messages in thread