cluster status information


* cluster status information
@ 2011-05-26 12:18 Fyodor Ustinov
  2011-05-26 12:43 ` Wilfrid Allembrand
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Fyodor Ustinov @ 2011-05-26 12:18 UTC (permalink / raw)
  To: ceph-devel

Hi!

How do I get information about the status of each server in the cluster?

#ceph osd stat
2011-05-26 15:07:05.103621 mon <- [osd,stat]
2011-05-26 15:07:05.104201 mon0 -> 'e413: 6 osds: 5 up, 5 in' (0)

I see that the cluster has 6 OSD servers, but only 5 are up now. How do I know which
server is down?
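The summary line only reports counts, so it tells you how many OSDs are down but not which ones. A minimal Python sketch that pulls the numbers out of that line, assuming the `eN: T osds: U up, I in` layout shown above stays the same across versions:

```python
import re

# Matches the summary returned by "ceph osd stat", e.g. "e413: 6 osds: 5 up, 5 in".
# Assumption: the epoch/total/up/in layout is stable across Ceph versions.
OSD_STAT_RE = re.compile(r"e(\d+): (\d+) osds: (\d+) up, (\d+) in")

def parse_osd_stat(line):
    """Return (epoch, total, up, in_count) parsed from an 'osd stat' line."""
    m = OSD_STAT_RE.search(line)
    if m is None:
        raise ValueError("unrecognised osd stat line: %r" % line)
    return tuple(int(g) for g in m.groups())

line = "2011-05-26 15:07:05.104201 mon0 -> 'e413: 6 osds: 5 up, 5 in' (0)"
epoch, total, up, in_count = parse_osd_stat(line)
print("epoch %d: %d of %d OSDs down" % (epoch, total - up, total))
# prints "epoch 413: 1 of 6 OSDs down"
```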

A more general question: how do I monitor the state of the servers in a cluster?

WBR,
     Fyodor.

P.S. JFYI: the "-s" option is not described in the ceph man page.


* Re: cluster status information
  2011-05-26 12:18 cluster status information Fyodor Ustinov
@ 2011-05-26 12:43 ` Wilfrid Allembrand
  2011-05-26 13:38 ` Dyweni - Ceph-Devel
  2011-05-26 14:54 ` Wido den Hollander
  2 siblings, 0 replies; 4+ messages in thread
From: Wilfrid Allembrand @ 2011-05-26 12:43 UTC (permalink / raw)
  To: Fyodor Ustinov; +Cc: ceph-devel

Hi all,

I'm new to ceph but I'll try it soon. It looks really excellent, keep
up the good work!

From my experience with a commercial scale-out NAS cluster (Isilon,
to name one), this is a really important feature, and attention should
be paid to it as well :)
What about an Isilon-like cluster status?
Here is the output for the whole cluster, plus the information for a
specific node (sorry for the badly formatted text).
Could something like this be implemented in Ceph? Simple & precise.

# isi status

Cluster Name:     my-cluster-1
Cluster Health:   [ OK ]
Available:         69T (11%)

                        Health    Throughput (bits/s)
 ID | IP Address      |D-A--S-R|   In     Out    Total |  Used  / Capacity
----+-----------------+--------+-------+-------+-------+-----------------------
  1 | XX.YY.Z.1       | [ OK ] |  374M |  258M |  631M |   19T  /   22T (89%)
  2 | XX.YY.Z.2       | [ OK ] |     0 |     0 |     0 |   19T  /   22T (88%)
  3 | XX.YY.Z.3       | [ OK ] |  1.7M |     0 |  1.7M |   19T  /   22T (89%)
  4 | XX.YY.Z.4       | [ OK ] |   16K |  177M |  177M |   19T  /   22T (88%)
  5 | XX.YY.Z.5       | [ OK ] |  581M |  147M |  729M |   19T  /   22T (88%)
  6 | XX.YY.Z.6       | [ OK ] |   12M |  151M |  163M |   19T  /   22T (89%)
  7 | XX.YY.Z.7       | [ OK ] |  1.1K |  107K |  108K |   19T  /   22T (89%)
  8 | XX.YY.Z.8       | [ OK ] |  9.0K |   89M |   89M |   19T  /   22T (88%)
  9 | XX.YY.Z.9       | [ OK ] |  7.5M |  201K |  7.7M |   19T  /   22T (88%)
 10 | XX.YY.Z.10      | [ OK ] |     0 |  933M |  933M |   19T  /   22T (88%)
 11 | XX.YY.Z.11      | [ OK ] |  1.9K |  170M |  170M |   19T  /   22T (88%)
 12 | XX.YY.Z.12      | [ OK ] |   992 |  948M |  948M |   19T  /   22T (89%)
 13 | XX.YY.Z.13      | [ OK ] |  6.2M |  161M |  167M |   19T  /   22T (89%)
 14 | XX.YY.Z.14      | [ OK ] |   80M |  228M |  308M |   19T  /   22T (88%)
 15 | XX.YY.Z.15      | [ OK ] |   762 |  101M |  101M |   19T  /   22T (88%)
 16 | XX.YY.Z.16      | [ OK ] |  1.6K |  178K |  180K |   19T  /   22T (89%)
 17 | XX.YY.Z.17      | [ OK ] |   22M |  441M |  463M |   19T  /   22T (88%)
 18 | XX.YY.Z.18      | [ OK ] |     0 |  303M |  303M |   19T  /   22T (88%)
 19 | XX.YY.Z.19      | [ OK ] |  1.0M |  334M |  335M |   19T  /   22T (88%)
 20 | XX.YY.Z.20      | [ OK ] |  3.1M |   17M |   20M |   19T  /   22T (88%)
 21 | XX.YY.Z.21      | [ OK ] |  127M |  6.6M |  133M |   19T  /   22T (88%)
 22 | XX.YY.Z.22      | [ OK ] |   29M |  126M |  155M |   19T  /   22T (89%)
 23 | XX.YY.Z.23      | [ OK ] |     0 |     0 |     0 |   19T  /   22T (88%)
 24 | XX.YY.Z.24      | [ OK ] |     0 |   74M |   74M |   19T  /   22T (88%)
 25 | XX.YY.Z.25      | [ OK ] |   765 |     0 |   765 |   19T  /   22T (88%)
 26 | XX.YY.Z.26      | [ OK ] |  380K |   99M |  100M |   19T  /   22T (88%)
 27 | XX.YY.Z.27      | [ OK ] |   12M |  136M |  148M |   19T  /   22T (88%)
 28 | XX.YY.Z.28      | [ OK ] |  1.1K |     0 |  1.1K |   19T  /   22T (88%)
 29 | XX.YY.Z.29      | [ OK ] |  5.4M |  1.1G |  1.1G |   19T  /   22T (88%)
-------------------------------+-------+-------+-------+-----------------------
 Cluster Totals:               |  1.3G |  6.0G |  7.3G |  558T  /  627T (88%)

     Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only

No Alerts.

Then, to get the status of a specific node (here, node 29 of the
cluster):

# isi status -n 29

Node LNN:            29
Node ID:             39
Node Name:           my-cluster-1-29
Node IP Address:     XX.YY.Z.29
Node Health:          [ OK ]
Node SN:             1234567890
Node Capacity:        22T
Available:           2.4T (11%)
Used:                 19T (88%)

Network Status:
        See 'isi networks list interfaces -v' for more detail or man(8) isi.
Internal:            2 IB network interfaces (2 up, 0 down)
External:            2 GbE network interfaces (2 up, 0 down)
                     1 Aggregated network interfaces (0 up, 1 down)

Disk Drive Status:
  Bay  1 <12>      Bay  2 <15>      Bay  3 <18>      Bay  4 <21>
    13Mb/s           12Mb/s          6.7Mb/s          6.0Mb/s
  [HEALTHY]        [HEALTHY]        [HEALTHY]        [HEALTHY]

  Bay  5 <13>      Bay  6 <16>      Bay  7 <19>      Bay  8 <22>
   4.5Mb/s           15Mb/s           16Mb/s          8.5Mb/s
  [HEALTHY]        [HEALTHY]        [HEALTHY]        [HEALTHY]

  Bay  9 <14>      Bay 10 <17>      Bay 11 <20>      Bay 12 <23>
    11Mb/s          8.3Mb/s          6.6Mb/s          5.0Mb/s
  [HEALTHY]        [HEALTHY]        [HEALTHY]        [HEALTHY]

  Bay 13 <3>       Bay 14 <6>       Bay 15 <9>       Bay 16 <0>
   7.2Mb/s          6.1Mb/s          7.3Mb/s          8.2Mb/s
  [HEALTHY]        [HEALTHY]        [HEALTHY]        [HEALTHY]

  Bay 17 <4>       Bay 18 <7>       Bay 19 <10>      Bay 20 <1>
   6.5Mb/s           12Mb/s          3.1Mb/s          3.0Mb/s
  [HEALTHY]        [HEALTHY]        [HEALTHY]        [HEALTHY]

  Bay 21 <5>       Bay 22 <8>       Bay 23 <11>      Bay 24 <2>
   8.2Mb/s          6.7Mb/s          6.3Mb/s          6.8Mb/s
  [HEALTHY]        [HEALTHY]        [HEALTHY]        [HEALTHY]




2011/5/26 Fyodor Ustinov <ufm@ufm.su>:
> Hi!
>
> How to get information about status of each server in cluster?
>
> #ceph osd stat
> 2011-05-26 15:07:05.103621 mon <- [osd,stat]
> 2011-05-26 15:07:05.104201 mon0 -> 'e413: 6 osds: 5 up, 5 in' (0)
>
> I see - in cluster 6 osd servers and now up only 5.  How do I know which
> server is down?
>
> More global question - how to monitor the state of servers in a cluster?
>
> WBR,
>    Fyodor.
>
> P.S. JFYI: key "-s" do not described in manual page about ceph command.
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


* Re: cluster status information
  2011-05-26 12:18 cluster status information Fyodor Ustinov
  2011-05-26 12:43 ` Wilfrid Allembrand
@ 2011-05-26 13:38 ` Dyweni - Ceph-Devel
  2011-05-26 14:54 ` Wido den Hollander
  2 siblings, 0 replies; 4+ messages in thread
From: Dyweni - Ceph-Devel @ 2011-05-26 13:38 UTC (permalink / raw)
  To: Fyodor Ustinov; +Cc: ceph-devel



> Hi!
>
> How to get information about status of each server in cluster?
>
> #ceph osd stat
> 2011-05-26 15:07:05.103621 mon <- [osd,stat]
> 2011-05-26 15:07:05.104201 mon0 -> 'e413: 6 osds: 5 up, 5 in' (0)
>
> I see - in cluster 6 osd servers and now up only 5. How do I know which
> server is down?

 # ceph osd dump -o -
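To see which OSD is down, the full map from `ceph osd dump -o -` is needed. A sketch that runs it and filters the per-OSD lines; the line format (`osdN up|down in|out ...`) is an assumption about the output, so check it against what your version actually prints:

```python
import re
import subprocess

# Assumption: each per-OSD line of the dump starts with the OSD name followed
# by its up/down and in/out state, e.g. "osd3 down out ...".
# Both the "osd3" and "osd.3" spellings are accepted.
OSD_LINE_RE = re.compile(r"^osd\.?(\d+)\s+(up|down)\s+(in|out)\b")

def down_osds(dump_text):
    """Return the ids of OSDs reported as down in 'ceph osd dump' output."""
    down = []
    for line in dump_text.splitlines():
        m = OSD_LINE_RE.match(line.strip())
        if m and m.group(2) == "down":
            down.append(int(m.group(1)))
    return down

def fetch_osd_dump():
    """Run 'ceph osd dump -o -' ("-o -" writes the map to stdout)."""
    result = subprocess.run(["ceph", "osd", "dump", "-o", "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On a node with the ceph tool configured, `print(down_osds(fetch_osd_dump()))` would list the ids of the down OSDs.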


>
> More global question - how to monitor the state of servers in a cluster?
>

 # ceph -w
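`ceph -w` keeps running and prints cluster events as they happen. A hedged sketch that follows it and reports only when the number of up OSDs changes; it assumes the watch output contains the same `N osds: U up, I in` summary that `ceph osd stat` returns:

```python
import re
import subprocess

# Assumption: "ceph -w" prints OSD map summaries in the same
# "N osds: U up, I in" form that "ceph osd stat" returns.
SUMMARY_RE = re.compile(r"(\d+) osds: (\d+) up, (\d+) in")

def up_changes(lines):
    """Yield (total, up) each time the number of up OSDs changes."""
    last_up = None
    for line in lines:
        m = SUMMARY_RE.search(line)
        if m is None:
            continue
        total, up = int(m.group(1)), int(m.group(2))
        if up != last_up:
            last_up = up
            yield total, up

def watch_cluster():
    """Follow 'ceph -w' and print a line whenever OSDs go up or down."""
    proc = subprocess.Popen(["ceph", "-w"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    try:
        for total, up in up_changes(proc.stdout):
            print("%d of %d OSDs up (%d down)" % (up, total, total - up))
    finally:
        proc.terminate()
```

Filtering on changes keeps the output quiet while the cluster is stable, which is usually what you want from a long-running watcher.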


> WBR,
> Fyodor.
>
> P.S. JFYI: key "-s" do not described in manual page about ceph command.


 Thanks,
 Dyweni



* Re: cluster status information
  2011-05-26 12:18 cluster status information Fyodor Ustinov
  2011-05-26 12:43 ` Wilfrid Allembrand
  2011-05-26 13:38 ` Dyweni - Ceph-Devel
@ 2011-05-26 14:54 ` Wido den Hollander
  2 siblings, 0 replies; 4+ messages in thread
From: Wido den Hollander @ 2011-05-26 14:54 UTC (permalink / raw)
  To: Fyodor Ustinov; +Cc: ceph-devel

Hi,

On Thu, 2011-05-26 at 15:18 +0300, Fyodor Ustinov wrote:
> Hi!
> 
> How to get information about status of each server in cluster?
> 
> #ceph osd stat
> 2011-05-26 15:07:05.103621 mon <- [osd,stat]
> 2011-05-26 15:07:05.104201 mon0 -> 'e413: 6 osds: 5 up, 5 in' (0)
> 
> I see - in cluster 6 osd servers and now up only 5.  How do I know which 
> server is down?

As mentioned, this can be done with: ceph osd dump -o -

Or you can write it to a file: ceph osd dump -o osdstatus.txt

> 
> More global question - how to monitor the state of servers in a cluster?

Currently 'ceph -s' or 'ceph health' will give you the best available
information, but this is an aspect that is being worked on.
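For a periodic check rather than a live watch, `ceph health` can be polled. A minimal sketch; it assumes a healthy cluster includes the string `HEALTH_OK` in its reply and treats anything else as worth an alert (the `alert` callback is a hypothetical hook, e.g. for mail or a monitoring system):

```python
import subprocess
import time

def is_healthy(health_output):
    """Assumption: a healthy cluster's reply contains the token HEALTH_OK."""
    return "HEALTH_OK" in health_output

def poll_health(interval_s=60, alert=print):
    """Run 'ceph health' every interval_s seconds and alert on anything else."""
    while True:
        out = subprocess.run(["ceph", "health"], stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT, text=True).stdout
        if not is_healthy(out):
            alert("cluster not healthy: " + out.strip())
        time.sleep(interval_s)
```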

Getting more and better monitoring is on the wishlist (so I've heard).

There is an issue about this in the tracker:
http://tracker.newdream.net/issues/685

Once this information is exposed via libceph, writing an application
which shows it in a nice format is trivial.
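As a rough illustration of that point, here is a sketch of an Isilon-style summary renderer. The `OsdStatus` record is hypothetical: it stands in for whatever per-OSD data the tool ends up exposing.

```python
from dataclasses import dataclass

@dataclass
class OsdStatus:
    # Hypothetical record; in practice these fields would be filled from
    # "ceph osd dump" output or a future monitoring API.
    osd_id: int
    up: bool
    in_cluster: bool

def format_status(osds):
    """Render a compact per-OSD table preceded by a cluster health line."""
    down = [o.osd_id for o in osds if not o.up]
    health = "OK" if not down else "ATTENTION"
    lines = ["Cluster Health:   [ %s ]" % health, "",
             " ID | State | Membership",
             "----+-------+-----------"]
    for o in sorted(osds, key=lambda o: o.osd_id):
        lines.append("%3d | %-5s | %s" % (o.osd_id,
                                           "up" if o.up else "DOWN",
                                           "in" if o.in_cluster else "out"))
    return "\n".join(lines)
```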

Wido

> 
> WBR,
>      Fyodor.
> 
> P.S. JFYI: key "-s" do not described in manual page about ceph command.



