issues with adjusting the crushmap in 0.51


* issues with adjusting the crushmap in 0.51
@ 2012-09-06 17:58 Jimmy Tang
  2012-09-06 18:19 ` Gregory Farnum
  0 siblings, 1 reply; 6+ messages in thread
From: Jimmy Tang @ 2012-09-06 17:58 UTC
  To: ceph-devel

Hi All,

I've been playing around with Ceph 0.51 on two test machines at
work, experimenting with adjusting the crushmap to replicate across
hosts instead of across OSDs. When I change the rule for my data pool
from type osd to type host, compile the crushmap and then run
"ceph osd setcrushmap -i crush.new", it crashes my monitor if only one
is running; if I have two, one of them crashes, the process just hangs,
and my test filesystem is left in an unclean state.

I changed rule data {} to this:

rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type host
        step emit
}

Are there any constraints on changing the rules for where things get
replicated, e.g. going from osd to host to rack for the data and
metadata pools?

Here's my ceph.conf file and crushmap before the changes:

[global]
        #auth supported = cephx
        #keyring = /etc/ceph/ceph.keyring
        filestore xattr use omap = true
  
[osd]
        osd journal size = 1000
        filestore xattr use omap = true

[mon.a]
        host = 134.226.112.194
        mon addr = 134.226.112.194:6789
        mon data = /data/mon.$id
[mon.b]
        host = 134.226.112.138
        mon addr = 134.226.112.138:6789
        mon data = /home/mon.$id

[mds.a]
        host = 134.226.112.194
        mon data = /data/mds.$id

[mds.b]
        host = 134.226.112.138
        mon data = /home/mds.$id

[osd.0]
        host = 134.226.112.194
        osd data = /data/osd.$id
        osd journal = /data/osd.$id.journal

[osd.1]
        host = 134.226.112.194
        osd data = /data$id/osd.$id
        osd journal = /data$id/osd.$id.journal

[osd.2]
        host = 134.226.112.138
        osd data = /home/osd.$id
        osd journal = /home/osd.$id.journal
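The paths above rely on ceph.conf's "$id" metavariable, which expands to the daemon's id, so "/data$id/osd.$id" for osd.1 becomes "/data1/osd.1" and "/data/mon.$id" for mon.a becomes "/data/mon.a". A minimal C++ sketch of just that substitution follows; expand_id is a hypothetical helper, and Ceph's real config parser also handles other metavariables (such as "$name" and "$type") that this sketch deliberately ignores:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replace every "$id" in a ceph.conf value with the
// daemon's id. Other metavariables ($name, $type, ...) are left alone.
std::string expand_id(std::string value, const std::string& id) {
  const std::string var = "$id";
  std::string::size_type pos = value.find(var);
  while (pos != std::string::npos) {
    value.replace(pos, var.size(), id);
    pos = value.find(var, pos + id.size());  // continue after the inserted id
  }
  return value;
}
```

For example, expand_id("/home/osd.$id.journal", "2") gives "/home/osd.2.journal", matching the osd.2 section.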


My crushmap:

# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 pool

# buckets
host 134.226.112.194 {
        id -2           # do not change unnecessarily
        # weight 2.000
        alg straw
        hash 0  # rjenkins1
        item osd.1 weight 1.000
        item osd.0 weight 1.000
}
host 134.226.112.138 {
        id -4           # do not change unnecessarily
        # weight 1.000
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 1.000
}
rack rack-1 {
        id -3           # do not change unnecessarily
        # weight 3.000
        alg straw
        hash 0  # rjenkins1
        item 134.226.112.194 weight 2.000
        item 134.226.112.138 weight 1.000
}
pool default {
        id -1           # do not change unnecessarily
        # weight 2.000
        alg straw
        hash 0  # rjenkins1
        item rack-1 weight 2.000
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type osd
        step emit
}

rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type osd
        step emit
}
rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type osd
        step emit
}
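To see why the modified data rule misbehaves, the hierarchy in this map can be written out as a toy C++ model (not Ceph's own data structures; Node, example_map, collect_of_type and is_device are hypothetical names). As in the map above, devices carry ids >= 0 and buckets carry negative ids, and type numbers follow the "# types" section. Walking down from "default" to items of type host yields the two host buckets (-2 and -4) rather than any OSD, so a rule that chooses type host and then emits hands back buckets, not devices:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of the crushmap above. Devices have ids >= 0, buckets negative ids.
// Types follow "# types": 0 = osd, 1 = host, 2 = rack, 6 = pool.
struct Node {
  std::string name;
  int type;
  std::vector<int> items;  // child ids; empty for devices
};

std::map<int, Node> example_map() {
  return {
      {0, {"osd.0", 0, {}}},
      {1, {"osd.1", 0, {}}},
      {2, {"osd.2", 0, {}}},
      {-2, {"134.226.112.194", 1, {1, 0}}},
      {-4, {"134.226.112.138", 1, {2}}},
      {-3, {"rack-1", 2, {-2, -4}}},
      {-1, {"default", 6, {-3}}},
  };
}

// Depth-first walk from `root`, collecting every item whose type is `type`
// (roughly the set a "step take root" + "step choose ... type T" can reach).
void collect_of_type(const std::map<int, Node>& m, int root, int type,
                     std::vector<int>& out) {
  for (int child : m.at(root).items) {
    if (m.at(child).type == type)
      out.push_back(child);
    else
      collect_of_type(m, child, type, out);
  }
}

bool is_device(int id) { return id >= 0; }
```

Asking for type 0 (osd) instead returns osd.1, osd.0 and osd.2, in the order the buckets list them.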


Jimmy

-- 
Jimmy Tang
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/

* Re: issues with adjusting the crushmap in 0.51
  2012-09-06 17:58 issues with adjusting the crushmap in 0.51 Jimmy Tang
@ 2012-09-06 18:19 ` Gregory Farnum
  2012-09-06 18:51   ` Jimmy Tang
  0 siblings, 1 reply; 6+ messages in thread
From: Gregory Farnum @ 2012-09-06 18:19 UTC
  To: Jimmy Tang; +Cc: ceph-devel

On Thu, Sep 6, 2012 at 10:58 AM, Jimmy Tang <jtang@tchpc.tcd.ie> wrote:
> Hi All,
>
> I've been playing around with Ceph 0.51 on two test machines at
> work, experimenting with adjusting the crushmap to replicate across
> hosts instead of across OSDs. When I change the rule for my data pool
> from type osd to type host, compile the crushmap and then run
> "ceph osd setcrushmap -i crush.new", it crashes my monitor if only one
> is running; if I have two, one of them crashes, the process just hangs,
> and my test filesystem is left in an unclean state.
>
> I changed rule data {} to this:
>
> rule data {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step choose firstn 0 type host
>         step emit
> }
>
> Are there any constraints on changing the rules for where things get
> replicated, e.g. going from osd to host to rack for the data and
> metadata pools?

You always need to end up with "devices" (the OSDs, generally) and
then emit those from your CRUSH rule. You can do so hierarchically:
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type host
        step choose firstn 1 type osd
        step emit
}
In this case (with n being your replication count), this rule chooses
n hosts and then chooses 1 OSD from each chosen host.

You can also use "chooseleaf", which is a bit more robust in the
presence of failed OSDs:
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
This rule will choose n hosts and an OSD from each chosen host, and if
it fails on any host then it will restart with a different host (the
previous rule would stick with the chosen hosts, so it can't cope if,
e.g., all of a host's OSDs are down).
-Greg
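The difference between the two rules can be sketched with a toy placement model in C++. Everything here is an assumption for illustration: toy_hash and toy_pick stand in for CRUSH's rjenkins1 hash and weighted straw buckets, all items are treated as equal weight, and the hypothetical place() compresses each rule into a single loop. With retry set to false it behaves like the choose-then-choose rule (hosts are fixed before any OSD is looked at); with retry set to true it behaves like chooseleaf (a host with no up OSDs is rejected and another host is drawn):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Stand-in hash (assumption): real CRUSH uses rjenkins1 and weighted straw
// draws; this toy only needs something deterministic and well mixed.
std::uint32_t toy_hash(std::uint32_t x, std::uint32_t r, std::int32_t id) {
  std::uint32_t h = x * 2654435761u;
  h ^= (r + 0x9e3779b9u) * 2246822519u;
  h ^= static_cast<std::uint32_t>(id) * 3266489917u;
  h ^= h >> 15;
  h *= 2246822519u;
  h ^= h >> 13;
  return h;
}

// Equal-weight pick: the item with the highest draw for (x, r) wins.
int toy_pick(const std::vector<int>& items, std::uint32_t x, std::uint32_t r) {
  int best = items.at(0);
  for (int id : items)
    if (toy_hash(x, r, id) > toy_hash(x, r, best)) best = id;
  return best;
}

struct Host {
  int id;
  std::vector<int> osds;
};

std::vector<int> up_osds(const Host& h, const std::set<int>& down) {
  std::vector<int> up;
  for (int o : h.osds)
    if (down.count(o) == 0) up.push_back(o);
  return up;
}

// retry == false: "choose firstn n type host" then "choose firstn 1 type osd";
//   hosts are fixed first, so a host with no up OSDs yields no OSD.
// retry == true: "chooseleaf firstn n type host"; a host with no up OSDs is
//   rejected and a later attempt draws a different host.
std::vector<int> place(const std::vector<Host>& hosts, const std::set<int>& down,
                       std::uint32_t x, std::size_t n, bool retry) {
  std::vector<int> ids;
  for (const Host& h : hosts) ids.push_back(h.id);
  std::set<int> seen;          // hosts already drawn for this input
  std::size_t hosts_kept = 0;  // hosts counted toward the n replicas
  std::vector<int> out;        // chosen OSDs
  for (std::uint32_t r = 0; hosts_kept < n && r < 100; ++r) {
    int hid = toy_pick(ids, x, r);
    if (!seen.insert(hid).second) continue;  // duplicate host: next attempt
    const Host& h = *std::find_if(hosts.begin(), hosts.end(),
                                  [hid](const Host& c) { return c.id == hid; });
    std::vector<int> up = up_osds(h, down);
    if (up.empty() && retry) continue;  // chooseleaf: reject this host
    ++hosts_kept;                       // two-step: keep the host regardless
    if (!up.empty()) out.push_back(toy_pick(up, x, r));
  }
  return out;
}
```

With three hosts, one of which has its only OSD down, and two replicas requested, the chooseleaf variant always returns two up OSDs on different hosts, while the two-step variant returns just one OSD whenever the dead host is among the hosts it picked.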

* Re: issues with adjusting the crushmap in 0.51
  2012-09-06 18:19 ` Gregory Farnum
@ 2012-09-06 18:51   ` Jimmy Tang
  2012-09-06 20:31     ` Tommi Virtanen
  0 siblings, 1 reply; 6+ messages in thread
From: Jimmy Tang @ 2012-09-06 18:51 UTC
  To: Gregory Farnum; +Cc: ceph-devel

Hi Greg

On Thu, Sep 06, 2012 at 11:19:12AM -0700, Gregory Farnum wrote:
> You always need to end up with "devices" (the OSDs, generally) and
> then emit those from your CRUSH rule. You can do so hierarchically:
> rule data {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step choose firstn 0 type host
>         step choose firstn 1 type osd
>         step emit
> }
> In this case (with n being your replication count), this rule chooses
> n hosts and then chooses 1 OSD from each chosen host.
> 
> You can also use "chooseleaf", which is a bit more robust in the
> presence of failed OSDs:
> rule data {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
> This rule will choose n hosts and an OSD from each chosen host, and if
> it fails on any host then it will restart with a different host (the
> previous rule would stick with the chosen hosts, so it can't cope if,
> e.g., all of a host's OSDs are down).
> -Greg
> 

That explanation has certainly cleared things up for me. From the
documentation I could find on the website and the old Ceph wiki, I had
not realised that a rule needs to end with a "device".
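A rule like the modified one in the first message, whose last selection step is "step choose firstn 0 type host", could be flagged before running "ceph osd setcrushmap" with a simple text check on the decompiled rule. The C++ sketch below is a hypothetical helper (rule_ends_at_devices is not part of crushtool or Ceph), and it assumes "osd" is the device type name, as it is in this map:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical pre-flight check: every "step emit" must follow a selection
// step that yields devices, i.e. "step chooseleaf ..." or
// "step choose <firstn|indep> N type osd".
bool rule_ends_at_devices(const std::string& rule_text) {
  std::istringstream in(rule_text);
  std::string line;
  bool last_selects_devices = false;
  bool saw_emit = false;
  while (std::getline(in, line)) {
    std::istringstream words(line);
    std::vector<std::string> w;
    for (std::string t; words >> t;) w.push_back(t);
    if (w.size() < 2 || w[0] != "step") continue;  // skip non-step lines
    if (w[1] == "chooseleaf") {
      last_selects_devices = true;  // chooseleaf always descends to devices
    } else if (w[1] == "choose") {
      // "step choose firstn N type T": devices only when T is osd.
      last_selects_devices = w.size() >= 6 && w[4] == "type" && w[5] == "osd";
    } else if (w[1] == "take") {
      last_selects_devices = false;  // "take" selects a bucket
    } else if (w[1] == "emit") {
      if (!last_selects_devices) return false;
      saw_emit = true;
    }
  }
  return saw_emit;  // a rule with no emit step is also rejected
}
```

It accepts a rule whose last step before "emit" is "chooseleaf ... type host" or "choose ... type osd", and rejects one that stops at hosts.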

Also, the "ceph osd setcrushmap..." command doesn't show up when
"ceph --help" is run in the 0.51 release, although as far as I recall
it is documented on the wiki. It would be really nice if the applications
listed all the available commands; it would make experimenting easier
and more fun.

Thanks,
Jimmy

-- 
Jimmy Tang
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/

* Re: issues with adjusting the crushmap in 0.51
  2012-09-06 18:51   ` Jimmy Tang
@ 2012-09-06 20:31     ` Tommi Virtanen
  2012-09-08  8:08       ` Jimmy Tang
  2012-09-12 11:13       ` Jimmy Tang
  0 siblings, 2 replies; 6+ messages in thread
From: Tommi Virtanen @ 2012-09-06 20:31 UTC
  To: Jimmy Tang; +Cc: Gregory Farnum, ceph-devel

On Thu, Sep 6, 2012 at 11:51 AM, Jimmy Tang <jtang@tchpc.tcd.ie> wrote:
> Also, the "ceph osd setcrushmap..." command doesn't show up when
> "ceph --help" is run in the 0.51 release, although as far as I recall
> it is documented on the wiki. It would be really nice if the applications
> listed all the available commands; it would make experimenting easier
> and more fun.

"ceph" is just a client app that sends (most of) the commands to
ceph-mon for execution, so its --help output is hard to keep up to
date. We're trying to do better. Patches appreciated; see the ticket at
the end.

Background:
http://www.spinics.net/lists/ceph-devel/msg08471.html
http://www.spinics.net/lists/ceph-devel/msg08472.html
http://www.tracker.newdream.net/issues/2894
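One possible way to keep --help from drifting, sketched in C++ purely as an illustration, is to derive the usage text from a single table of command signatures, so that adding a command to the table also adds it to the help. CommandSig, command_table and render_usage are hypothetical names; the MDS heading and the command strings come from the patch later in this thread, while the "OSD COMMANDS" label is made up:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical single source of truth for command signatures.
struct CommandSig {
  std::string section;  // heading the command is listed under
  std::string usage;    // one usage line, e.g. "ceph osd setcrushmap -i <file>"
};

const std::vector<CommandSig>& command_table() {
  static const std::vector<CommandSig> table = {
      {"METADATA SERVER (MDS) COMMANDS", "ceph mds stat"},
      {"METADATA SERVER (MDS) COMMANDS", "ceph mds add_data_pool <pool>"},
      {"OSD COMMANDS", "ceph osd getcrushmap -o <file>"},
      {"OSD COMMANDS", "ceph osd setcrushmap -i <file>"},
  };
  return table;
}

// Render usage text grouped by section, in table order.
std::string render_usage(const std::vector<CommandSig>& table) {
  std::ostringstream out;
  std::string current;
  for (const CommandSig& c : table) {
    if (c.section != current) {
      if (!current.empty()) out << "\n";  // blank line between sections
      out << c.section << "\n";
      current = c.section;
    }
    out << "  " << c.usage << "\n";
  }
  return out.str();
}
```

Since the client forwards most commands to ceph-mon, the same idea could in principle let the monitor own the table; that is a design option, not something described in this thread.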

* Re: issues with adjusting the crushmap in 0.51
  2012-09-06 20:31     ` Tommi Virtanen
@ 2012-09-08  8:08       ` Jimmy Tang
  2012-09-12 11:13       ` Jimmy Tang
  1 sibling, 0 replies; 6+ messages in thread
From: Jimmy Tang @ 2012-09-08  8:08 UTC
  To: Tommi Virtanen; +Cc: Gregory Farnum, ceph-devel

On Thu, Sep 06, 2012 at 01:31:15PM -0700, Tommi Virtanen wrote:
> On Thu, Sep 6, 2012 at 11:51 AM, Jimmy Tang <jtang@tchpc.tcd.ie> wrote:
> > Also, the "ceph osd setcrushmap..." command doesn't show up when
> > "ceph --help" is run in the 0.51 release, although as far as I recall
> > it is documented on the wiki. It would be really nice if the applications
> > listed all the available commands; it would make experimenting easier
> > and more fun.
> 
> "ceph" is just a client app that sends (most of) the commands to
> ceph-mon for execution, so its --help output is hard to keep up to
> date. We're trying to do better. Patches appreciated; see the ticket at
> the end.
> 
> Background:
> http://www.spinics.net/lists/ceph-devel/msg08471.html
> http://www.spinics.net/lists/ceph-devel/msg08472.html
> http://www.tracker.newdream.net/issues/2894
> 

I was just looking at that; there are a few places where I have
noticed inaccurate "help" messages. I will go over my notes and see if
I can provide a few patches.

Jimmy

-- 
Jimmy Tang
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/

* Re: issues with adjusting the crushmap in 0.51
  2012-09-06 20:31     ` Tommi Virtanen
  2012-09-08  8:08       ` Jimmy Tang
@ 2012-09-12 11:13       ` Jimmy Tang
  1 sibling, 0 replies; 6+ messages in thread
From: Jimmy Tang @ 2012-09-12 11:13 UTC
  To: Tommi Virtanen; +Cc: Gregory Farnum, ceph-devel

Hi Tommi.


On 6 Sep 2012, at 21:31, Tommi Virtanen wrote:

> On Thu, Sep 6, 2012 at 11:51 AM, Jimmy Tang <jtang@tchpc.tcd.ie> wrote:
>> Also, the "ceph osd setcrushmap..." command doesn't show up when
>> "ceph --help" is run in the 0.51 release, although as far as I recall
>> it is documented on the wiki. It would be really nice if the applications
>> listed all the available commands; it would make experimenting easier
>> and more fun.
> 
> "ceph" is just a client app that sends (most of) the commands to
> ceph-mon for execution, so its --help output is hard to keep up to
> date. We're trying to do better. Patches appreciated; see the ticket at
> the end.


Just to scratch an itch, here are some documentation additions from testing locally:

[jtang@x00 ceph (master)]$ git diff
diff --git a/doc/cluster-ops/control.rst b/doc/cluster-ops/control.rst
index 9af4562..9946229 100644
--- a/doc/cluster-ops/control.rst
+++ b/doc/cluster-ops/control.rst
@@ -293,6 +293,10 @@ Enables debug messages. ::
 
 Displays the status of all metadata servers.
 
+Make a pool usable by the fs with::
+
+       ceph mds add_data_pool {pool-name}
+
 .. todo:: ``ceph mds`` subcommands missing docs: set_max_mds, dump, getmap, stop, setmap
 
 
diff --git a/src/tools/ceph.cc b/src/tools/ceph.cc
index 0435771..306d67a 100644
--- a/src/tools/ceph.cc
+++ b/src/tools/ceph.cc
@@ -62,6 +62,7 @@ static void usage()
   cout << "\n";
   cout << "METADATA SERVER (MDS) COMMANDS\n";
   cout << "  ceph mds stat\n";
+  cout << "  ceph mds add_data_pool <pool>\n";
   cout << "  ceph mds tell <mds-id or *> injectargs '--<switch> <value> [--<switch> <value>...]'\n";
   cout << "\n";
   cout << "MONITOR (MON) COMMANDS\n";
@@ -82,6 +83,7 @@ static void usage()
   cout << "  ceph osd unpause\n";
   cout << "  ceph osd tell <osd-id or *> injectargs '--<switch> <value> [--<switch> <value>...]'\n";
   cout << "  ceph osd getcrushmap -o <file>\n";
+  cout << "  ceph osd setcrushmap -i <file>\n";
   cout << "  ceph osd getmap -o <file>\n";
   cout << "  ceph osd crush set <osd-id> <weight> <loc1> [<loc2> ...]\n";
   cout << "  ceph osd crush move <bucketname> <loc1> [<loc2> ...]\n";



Regards,
Jimmy Tang

--
Senior Software Engineer, Digital Repository of Ireland (DRI)
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/

