* Single MDS cephx key
From: Travis Nielsen @ 2017-09-26 23:09 UTC
To: Ceph Development

Is it possible to use the same cephx key for all instances of MDS or do
they each require their own? Mons require the same keyring so I tried
following the same pattern by creating a keyring with "mds.", but the MDS
is complaining about not being authorized when it tries to start. Am I
missing something or is this not possible for MDS keys? If I create a
unique key for each MDS instance it works fine, but it would simplify my
scenario if I could use the same key. I'm running on Luminous.

The key was generated with this:
ceph auth get-or-create-key mds. osd allow * mds allow mon allow profile mds

The keyring contents are:
[mds.]
    key = AQD62spZw3zRGhAAkHHVokP3BDf8PEy4+vXGMg==

I run the following with that keyring:
ceph-mds --foreground --name=mds.mymds -i mymds

And I see the error:
2017-09-26 22:55:55.973047 7fb004459200 -1 mds.mds81c2n ERROR: failed to authenticate: (22) Invalid argument

Thanks,
Travis

* Re: Single MDS cephx key
From: John Spray @ 2017-09-27 10:01 UTC
To: Travis Nielsen; +Cc: Ceph Development

On Wed, Sep 27, 2017 at 12:09 AM, Travis Nielsen
<Travis.Nielsen@quantum.com> wrote:
> Is it possible to use the same cephx key for all instances of MDS or do
> they each require their own? Mons require the same keyring so I tried
> following the same pattern by creating a keyring with "mds.", but the MDS
> is complaining about not being authorized when it tries to start. Am I
> missing something or is this not possible for MDS keys? If I create a
> unique key for each MDS instance it works fine, but it would simplify my
> scenario if I could use the same key. I'm running on Luminous.

I've never heard of anyone trying to do this.

It's probably not a great idea, because if all MDS daemons are using
the same key then you lose the ability to simply remove an MDS's key
to ensure that it can't talk to the system any more. This is useful
when tearing something down, because it means you're not taking it on
faith that the daemon is really physically stopped.

John

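A minimal sketch of the per-daemon key pattern discussed above, which Travis reports works and which keeps the teardown guarantee John describes. It only builds the argument lists for `ceph auth get-or-create` and `ceph auth del`; it does not run them. The helper names are hypothetical, and the capability strings are taken from Travis's command with the shell quoting restored, which is an assumption about what was originally typed.

```python
def create_key_argv(mds_id):
    """Build the command that creates (or fetches) a key for one MDS.

    Assumption: caps mirror the command quoted in the thread, with each
    capability passed as a single argument.
    """
    return [
        "ceph", "auth", "get-or-create", f"mds.{mds_id}",
        "mon", "allow profile mds",
        "osd", "allow *",
        "mds", "allow",
    ]


def remove_key_argv(mds_id):
    """Build the command that revokes one MDS's key at teardown."""
    return ["ceph", "auth", "del", f"mds.{mds_id}"]


if __name__ == "__main__":
    # "mymds" is the daemon id used in the original report.
    print(" ".join(create_key_argv("mymds")))
    print(" ".join(remove_key_argv("mymds")))
```

Because each daemon has its own entity name, removing one key cuts off only that daemon, so a torn-down MDS cannot rejoin even if its process is still running somewhere.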
* Re: Single MDS cephx key
From: Travis Nielsen @ 2017-09-27 14:49 UTC
To: John Spray; +Cc: Ceph Development

Ok that makes sense to maintain the control over the teardown, thanks for
the perspective.

* Re: Single MDS cephx key
From: Travis Nielsen @ 2017-09-27 16:36 UTC
To: John Spray; +Cc: Ceph Development

To expand on the scenario, I'm working in a Kubernetes environment where
the MDS instances are somewhat ephemeral. If an instance (pod) dies or the
machine is restarted, Kubernetes will start a new one in its place. To
handle the failed pod scenario, I'd appreciate if you could help me
understand MDS better.

1) MDS instances are stateless, correct? If so, I'm assuming when an MDS
instance dies, a new MDS instance (with a new ID) can be brought up and
assigned its rank without any side effects other than disruption during
the failover. Or is there a reason to treat them more like mons that need
to survive reboots and maintain state?

2) Will there be any side effects from MDS instances being somewhat
ephemeral? For example, if a new instance came up every hour or every day,
what challenges would I run into besides cleaning up the old cephx keys?

Thanks!
Travis

* Re: Single MDS cephx key
From: John Spray @ 2017-09-27 16:45 UTC
To: Travis Nielsen; +Cc: Ceph Development

On Wed, Sep 27, 2017 at 5:36 PM, Travis Nielsen
<Travis.Nielsen@quantum.com> wrote:
> To expand on the scenario, I'm working in a Kubernetes environment where
> the MDS instances are somewhat ephemeral. If an instance (pod) dies or the
> machine is restarted, Kubernetes will start a new one in its place. To
> handle the failed pod scenario, I'd appreciate if you could help me
> understand MDS better.
>
> 1) MDS instances are stateless, correct? If so, I'm assuming when an MDS
> instance dies, a new MDS instance (with a new ID) can be brought up and
> assigned its rank without any side effects other than disruption during
> the failover. Or is there a reason to treat them more like mons that need
> to survive reboots and maintain state?

Yep, completely stateless. Don't forget logs though -- for ephemeral
instances, it would be a good idea to have them sending their logs
somewhere central, so that we don't lose all the history whenever a
container restarts (you may very well have already covered this in
general in the context of Rook).

> 2) Will there be any side effects from MDS instances being somewhat
> ephemeral? For example, if a new instance came up every hour or every day,
> what challenges would I run into besides cleaning up the old cephx keys?

While switching daemons around is an online operation, it is not
without some impact to client IOs, and the freshly started MDS daemon
will generally have a less well populated cache than the one it is
replacing.

John

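The key cleanup that Travis mentions for ephemeral instances could be approached along these lines. This is only a sketch under stated assumptions: the function name is hypothetical, the caller is assumed to already have the entity names registered with the cluster and the ids of the MDS daemons that should currently exist, and nothing here talks to Ceph.

```python
def stale_mds_entities(auth_entities, live_mds_ids):
    """Return MDS auth entities whose daemon id is no longer expected.

    auth_entities: iterable of entity names such as "mds.a" or "osd.0"
    live_mds_ids:  iterable of MDS daemon ids that should keep their keys
    """
    live = set(live_mds_ids)
    prefix = "mds."
    return sorted(
        name
        for name in auth_entities
        # Only consider MDS entities; leave mon, osd, client keys alone.
        if name.startswith(prefix) and name[len(prefix):] not in live
    )


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    registered = ["mds.pod-1", "mds.pod-2", "mds.pod-3", "osd.0", "client.admin"]
    running = ["pod-3"]
    for entity in stale_mds_entities(registered, running):
        print("candidate for removal:", entity)
```

Each returned entity would then be a candidate for key removal, which also gives the guarantee John described earlier in the thread: a replaced pod that is somehow still alive can no longer authenticate.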
* Re: Single MDS cephx key
From: Travis Nielsen @ 2017-09-27 17:01 UTC
To: John Spray; +Cc: Ceph Development

Thanks for the clarification, and Rook does use Kubernetes facilities to
handle the log collection so it sounds like we're good to go.

* Re: Single MDS cephx key
From: Xiaoxi Chen @ 2017-10-02 16:14 UTC
To: Travis Nielsen; +Cc: John Spray, Ceph Development

FWIW, below is our MDS log from when an MDS goes from standby-replay to
active. It takes a really long time, especially in rejoin. The FS has
100+ clients and a few million files.

So it seems that an MDS restart/replacement is not as **lightweight** as we
are assuming.

2017-09-29 08:07:27.738118 7fe34d085700 1 mds.0.0 replay_done (as standby)
2017-09-29 08:07:28.835714 7fe34d085700 1 mds.0.0 replay_done (as standby)
2017-09-29 08:07:29.932846 7fe34d085700 1 mds.0.0 replay_done (as standby)
2017-09-29 08:07:31.034661 7fe353091700 1 mds.0.5419 handle_mds_map i am now mds.0.5419
2017-09-29 08:07:31.034663 7fe353091700 1 mds.0.5419 handle_mds_map state change up:standby-replay --> up:replay
2017-09-29 08:07:31.063181 7fe34d085700 1 mds.0.5419 replay_done (as standby)
2017-09-29 08:07:31.063201 7fe34d085700 1 mds.0.5419 standby_replay_restart (final takeover pass)
2017-09-29 08:07:31.168992 7fe34d085700 1 mds.0.5419 replay_done
2017-09-29 08:07:31.169005 7fe34d085700 1 mds.0.5419 making mds journal writeable
2017-09-29 08:07:32.046255 7fe353091700 1 mds.0.5419 handle_mds_map i am now mds.0.5419
2017-09-29 08:07:32.046257 7fe353091700 1 mds.0.5419 handle_mds_map state change up:replay --> up:resolve
2017-09-29 08:07:32.046265 7fe353091700 1 mds.0.5419 resolve_start
2017-09-29 08:07:32.046267 7fe353091700 1 mds.0.5419 reopen_log
2017-09-29 08:07:32.046274 7fe353091700 1 mds.0.5419 recovery set is 1
2017-09-29 08:08:14.856587 7fe353091700 1 mds.0.cache handle_mds_failure mds.1 : recovery peers are 1
2017-09-29 08:08:15.863834 7fe353091700 1 mds.0.5419 recovery set is 1
2017-09-29 08:08:15.868900 7fe356034700 0 -- 10.148.245.147:6800/3329170275 >> 10.148.245.145:6804/3466203682 conn(0x55b64a7c6000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING
2017-09-29 08:08:19.957946 7fe353091700 1 mds.0.5419 resolve_done
2017-09-29 08:08:24.950118 7fe353091700 1 mds.0.5419 handle_mds_map i am now mds.0.5419
2017-09-29 08:08:24.950171 7fe353091700 1 mds.0.5419 handle_mds_map state change up:resolve --> up:reconnect
2017-09-29 08:08:24.950194 7fe353091700 1 mds.0.5419 reconnect_start
2017-09-29 08:08:24.950634 7fe353091700 1 mds.0.server reconnect_clients -- 186 sessions
2017-09-29 08:08:24.950839 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34337 10.148.245.251:0/2769043642 after 0.000084
2017-09-29 08:08:24.960074 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34330 10.148.185.72:0/2433467420 after 0.009367
2017-09-29 08:08:24.964139 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34322 10.148.190.129:0/647912525 after 0.013405
2017-09-29 08:08:24.964331 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.104245 10.148.184.144:0/2533535048 after 0.013667
2017-09-29 08:08:24.964843 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34319 10.148.190.128:0/99750648 after 0.014185
2017-09-29 08:08:24.966976 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.104212 10.148.184.140:0/1486286861 after 0.016276
2017-09-29 08:08:24.967093 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.104225 10.148.184.131:0/3388986983 after 0.016435
2017-09-29 08:08:24.967263 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.104255 10.148.184.130:0/4145162106 after 0.016546
2017-09-29 08:08:24.967344 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34313 10.148.190.155:0/2880915725 after 0.016690
2017-09-29 08:08:42.996768 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34328 10.148.178.59:0/3803632372 after 18.046088
2017-09-29 08:08:43.886804 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34315 10.148.190.180:0/3130069617 after 18.936122
2017-09-29 08:08:44.696920 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34280 10.148.190.176:0/3500883319 after 19.746237
2017-09-29 08:08:45.531987 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34225 10.148.178.35:0/1295837455 after 20.581320
2017-09-29 08:08:46.524033 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34160 10.148.178.28:0/3696130157 after 21.573369
2017-09-29 08:08:47.285174 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34324 10.148.190.167:0/2637778599 after 22.334416
2017-09-29 08:08:48.173818 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34140 10.148.178.37:0/474083565 after 23.223125
2017-09-29 08:08:48.976772 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34128 10.148.178.40:0/3470525153 after 24.026070
2017-09-29 08:08:49.680177 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34283 10.148.190.172:0/561253376 after 24.729396
2017-09-29 08:08:50.457263 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34231 10.148.178.21:0/686475218 after 25.506604
2017-09-29 08:08:51.216427 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34204 10.148.178.64:0/2775534546 after 26.265759
2017-09-29 08:08:52.063327 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34207 10.148.178.66:0/2705035982 after 27.112667
2017-09-29 08:08:52.726760 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34178 10.148.178.26:0/168300491 after 27.775987
2017-09-29 08:08:53.346579 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34169 10.148.178.31:0/561092381 after 28.395917
2017-09-29 08:08:53.998588 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34237 10.148.178.33:0/718178560 after 29.047928
2017-09-29 08:08:54.810011 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34219 10.148.178.41:0/3334906872 after 29.859354
2017-09-29 08:08:55.692896 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34187 10.148.178.63:0/2527423535 after 30.742233
2017-09-29 08:08:56.663520 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34287 10.148.190.175:0/1683041168 after 31.712852
2017-09-29 08:08:57.501470 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34272 10.148.190.178:0/2415576439 after 32.550770
2017-09-29 08:08:58.324454 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34175 10.148.178.43:0/2792043288 after 33.373769
2017-09-29 08:08:59.197349 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34143 10.148.178.49:0/119222329 after 34.246687
2017-09-29 08:08:59.981611 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34149 10.148.178.20:0/2783244186 after 35.030950
2017-09-29 08:09:00.884353 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34216 10.148.178.32:0/2085010403 after 35.933695
2017-09-29 08:09:01.687472 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34303 10.148.190.183:0/2441938068 after 36.736810
2017-09-29 08:09:03.390089 7fe353091700 0 log_channel(cluster) log [DBG] : reconnect by client.34112 10.148.245.113:0/556076324 after 38.439312
2017-09-29 08:09:03.390209 7fe353091700 1 mds.0.5419 reconnect_done
2017-09-29 08:09:03.451448 7fe353091700 1 mds.0.5419 handle_mds_map i am now mds.0.5419
2017-09-29 08:09:03.451468 7fe353091700 1 mds.0.5419 handle_mds_map state change up:reconnect --> up:rejoin
2017-09-29 08:09:03.451487 7fe353091700 1 mds.0.5419 rejoin_start
2017-09-29 08:09:08.306597 7fe353091700 1 mds.0.5419 rejoin_joint_start
2017-09-29 08:12:46.172231 7fe353091700 1 mds.0.5419 rejoin_done
2017-09-29 08:12:49.354507 7fe353091700 1 mds.0.5419 handle_mds_map i am now mds.0.5419
2017-09-29 08:12:49.354518 7fe353091700 1 mds.0.5419 handle_mds_map state change up:rejoin --> up:clientreplay
2017-09-29 08:12:49.354529 7fe353091700 1 mds.0.5419 recovery_done -- successful recovery!
2017-09-29 08:12:49.354944 7fe353091700 1 mds.0.5419 clientreplay_start
2017-09-29 08:12:50.325607 7fe34e888700 1 mds.0.5419 clientreplay_done
2017-09-29 08:12:50.983391 7fe353091700 1 mds.0.5419 handle_mds_map i am now mds.0.5419
2017-09-29 08:12:50.983393 7fe353091700 1 mds.0.5419 handle_mds_map state change up:clientreplay --> up:active
2017-09-29 08:12:50.983410 7fe353091700 1 mds.0.5419 active_start

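To make the takeover cost in the log above easier to read, here is a small sketch that measures how long the MDS spent in each recovery state, using only the `handle_mds_map state change` lines from Xiaoxi's message. The function name and the regular expression are illustrative assumptions about this log format, not part of any Ceph tooling.

```python
import re
from datetime import datetime

# Matches lines like:
# "<timestamp> <thread> 1 mds.0.5419 handle_mds_map state change up:a --> up:b"
STATE_CHANGE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"handle_mds_map state change (up:[\w-]+) --> (up:[\w-]+)"
)


def state_durations(lines):
    """Return [(state, seconds)] for every state that was entered and left."""
    transitions = []
    for line in lines:
        match = STATE_CHANGE.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            transitions.append((ts, match.group(3)))  # (entered at, new state)
    return [
        (state, (left_at - entered_at).total_seconds())
        for (entered_at, state), (left_at, _) in zip(transitions, transitions[1:])
    ]


# The six state-change lines from the log in this message.
LOG = """\
2017-09-29 08:07:31.034663 7fe353091700 1 mds.0.5419 handle_mds_map state change up:standby-replay --> up:replay
2017-09-29 08:07:32.046257 7fe353091700 1 mds.0.5419 handle_mds_map state change up:replay --> up:resolve
2017-09-29 08:08:24.950171 7fe353091700 1 mds.0.5419 handle_mds_map state change up:resolve --> up:reconnect
2017-09-29 08:09:03.451468 7fe353091700 1 mds.0.5419 handle_mds_map state change up:reconnect --> up:rejoin
2017-09-29 08:12:49.354518 7fe353091700 1 mds.0.5419 handle_mds_map state change up:rejoin --> up:clientreplay
2017-09-29 08:12:50.983393 7fe353091700 1 mds.0.5419 handle_mds_map state change up:clientreplay --> up:active
""".splitlines()

if __name__ == "__main__":
    for state, seconds in state_durations(LOG):
        print(f"{state:16s} {seconds:8.1f}s")
```

On this log, rejoin accounts for about 226 of the roughly 320 seconds between entering up:replay and reaching up:active, with resolve (about 53 seconds) and reconnect (about 39 seconds) making up most of the rest.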