* Tool for ceph performance analysis
From: Alyona Kiselyova @ 2015-02-24 8:40 UTC
To: ceph-devel@vger.kernel.org
Hi,
This topic has been raised several times, but we still have no such tool.
We are interested in a tool that allows us to collect all (or any)
counters from the whole cluster. Currently there is an internal way to
get counters via the admin socket, but we have to do it directly on each
node or use some external network commands.
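For reference, a minimal Python sketch of querying one daemon's admin socket directly, roughly what `ceph daemon <name> perf dump` does on the local node. It assumes the framing used by ceph's Python helper (a JSON command terminated by a NUL byte, answered by a 4-byte big-endian length followed by the JSON payload); `admin_socket_command` is a hypothetical name, and the framing should be verified against your Ceph version.

```python
import json
import socket
import struct


def _recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise RuntimeError("admin socket closed before the full reply arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def admin_socket_command(asok_path, prefix, timeout=5.0):
    """Send one command such as "perf dump" to a Ceph admin socket and
    return the decoded JSON reply.

    Assumed framing: the request is a JSON object terminated by a NUL byte;
    the reply is a 4-byte big-endian length followed by that many bytes.
    """
    request = json.dumps({"prefix": prefix}).encode("utf-8") + b"\0"
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(asok_path)
        sock.sendall(request)
        # First 4 bytes: payload length in network byte order
        (length,) = struct.unpack(">I", _recv_exact(sock, 4))
        payload = _recv_exact(sock, length)
    return json.loads(payload.decode("utf-8"))
```

On a node hosting osd.0 with the default socket path, `admin_socket_command("/var/run/ceph/ceph-osd.0.asok", "perf dump")` would return the counters as a dict. It still has to run on each node, which is the limitation described above.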
There was a similar message from Sage Weil on the ceph-devel mailing
list some weeks ago. It was about the perf-watch.py script, which is
available in the ceph repository, but it also works only per node (and
it targets a vstart cluster, so some changes are needed to use it on a
live cluster).
We are now working on a tool with similar capabilities, but it can
collect counters either from one node or from all ceph nodes. The tool
can also check system resource usage by ceph processes. Currently it
uses ssh, so it does not work well if you don't have password-less
access to all nodes.
The first version of this tool is available on github
(https://github.com/Ved-vampir/ceph-perf-tool). Maybe, after some
improvements, it will be useful to other people and could become part
of ceph in some form. It would be nice to have such a utility in ceph
"out of the box". Maybe we can merge it?
It would be great if there were a built-in way to collect info about
the whole cluster from one node. Maybe something like an extension to
the "tell" command, which could call any node directly and replace
external network connections. Or an improved version of the "ceph osd
perf" command that would return more info.
-------------------------------
Best regards,
Alyona Kiseleva
* Re: Tool for ceph performance analysis
From: John Spray @ 2015-02-24 11:57 UTC
To: Alyona Kiselyova, ceph-devel@vger.kernel.org; +Cc: ceph-calamari@lists.ceph.com
On 24/02/2015 08:40, Alyona Kiselyova wrote:
> There was a similar message from Sage Weil on the ceph-devel mailing
> list some weeks ago. It was about the perf-watch.py script, which is
> available in the ceph repository, but it also works only per node (and
> it targets a vstart cluster, so some changes are needed to use it on a
> live cluster).
There is now a modernized version of perf-watch in a PR:
https://github.com/ceph/ceph/pull/3615
I posted about it to the list a little while ago but there wasn't any
interest, so it's still hanging around in a PR (subject was "Performance
watching (dstat-like) CLI mode")
>
> We are now working on a tool with similar capabilities, but it can
> collect counters either from one node or from all ceph nodes. The tool
> can also check system resource usage by ceph processes. Currently it
> uses ssh, so it does not work well if you don't have password-less
> access to all nodes.
Cool! You may also be interested in the calamari branch of diamond:
https://github.com/ceph/Diamond/tree/calamari
This will grab all the perf counters and send them to a graphite
server, which you can then query however you wish.
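To make the diamond+graphite path concrete, a small Python sketch that turns one daemon's nested `perf dump` JSON into Graphite plaintext-protocol lines (`<metric.path> <value> <timestamp>`). This is not the diamond collector's code; the function name and the `ceph.osd.0` prefix scheme are assumptions for illustration.

```python
import time


def perf_dump_to_graphite(prefix, perf_dump, timestamp=None):
    """Flatten a nested "perf dump" dict into Graphite plaintext lines of the
    form "<metric.path> <value> <unix-timestamp>".

    Nested counters such as {"avgcount": N, "sum": S} become two metrics.
    Keys are assumed not to contain dots or spaces (no escaping is done).
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    lines = []

    def walk(path, node):
        if isinstance(node, dict):
            # Sort keys so the output order is deterministic
            for key in sorted(node):
                walk(path + [str(key)], node[key])
        elif isinstance(node, (int, float)) and not isinstance(node, bool):
            lines.append("%s %s %d" % (".".join(path), node, ts))
        # Anything else (strings, lists, None) is skipped.

    walk([prefix], perf_dump)
    return lines
```

Each returned line, followed by a newline, can be written to carbon's plaintext listener (TCP port 2003 by default).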
> The first version of this tool is available on github
> (https://github.com/Ved-vampir/ceph-perf-tool). Maybe, after some
> improvements, it will be useful to other people and could become part
> of ceph in some form. It would be nice to have such a utility in ceph
> "out of the box". Maybe we can merge it?
There has been discussion in the past about allowing users to run
arbitrary admin socket operations via the mon; that would at least
remove the need for a program like yours to do its own SSHing. However,
regularly polling the perf stats of 1000s of OSDs through this mechanism
could quickly have a measurable impact on the cluster.
The other thing that would be very nice to add to the main ceph Python
code is the general service discovery part, where we enumerate which
services are running on a node and get their admin socket paths:
currently this is done in both the diamond collector module and in the
calamari salt module.
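As an illustration of that discovery step (not the code from the diamond collector or the calamari salt module), a minimal Python sketch that lists daemon admin sockets on one node. It assumes the default socket path `/var/run/ceph/$cluster-$name.asok` and only recognises mon, osd and mds daemons; `discover_daemons` is a hypothetical name.

```python
import os
import re

# Assumed naming scheme: $cluster-$type.$id.asok, e.g. "ceph-osd.0.asok".
_ASOK_RE = re.compile(r"^(?P<cluster>[^-]+)-(?P<type>mon|osd|mds)\.(?P<id>.+)\.asok$")


def discover_daemons(run_dir="/var/run/ceph"):
    """Return a sorted list of (cluster, daemon_name, socket_path) tuples.

    Only file names are inspected; a fuller version might also check that
    each entry really is a socket (stat.S_ISSOCK).
    """
    found = []
    try:
        entries = sorted(os.listdir(run_dir))
    except FileNotFoundError:
        return found  # no ceph run directory on this node
    for entry in entries:
        match = _ASOK_RE.match(entry)
        if match:
            name = "%s.%s" % (match.group("type"), match.group("id"))
            found.append((match.group("cluster"), name, os.path.join(run_dir, entry)))
    return found
```

Each returned path could then be passed to whatever issues `perf dump`, whether the `ceph --admin-daemon <path> perf dump` CLI or a direct socket client.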
>
> It would be great if there were a built-in way to collect info about
> the whole cluster from one node. Maybe something like an extension to
> the "tell" command, which could call any node directly and replace
> external network connections. Or an improved version of the "ceph osd
> perf" command that would return more info.
>
This pretty much already exists if someone chooses to deploy
diamond+graphite. Perhaps we need to talk about what's wrong with that
solution as it stands? I'm guessing the main problem is that it's less
highly available than ceph mons, and comparatively heavyweight,
especially if one is only interested in the latest values.
Cheers,
John
* Re: Tool for ceph performance analysis
From: John Spray @ 2015-02-24 12:16 UTC
To: Alyona Kiselyova, ceph-devel@vger.kernel.org; +Cc: ceph-calamari@lists.ceph.com
On 24/02/2015 11:57, John Spray wrote:
>> It would be great if there were a built-in way to collect info about
>> the whole cluster from one node. Maybe something like an extension to
>> the "tell" command, which could call any node directly and replace
>> external network connections. Or an improved version of the "ceph osd
>> perf" command that would return more info.
>>
> This pretty much already exists if someone chooses to deploy
> diamond+graphite. Perhaps we need to talk about what's wrong with
> that solution as it stands? I'm guessing the main problem is that
> it's less highly available than ceph mons, and comparatively
> heavyweight, especially if one is only interested in the latest values.
Ah, I also forgot to mention: it is not very hard to make a cut-down
version of calamari that doesn't require lots of heavyweight
dependencies. I started building this a while back before switching
tasks, but there's an old branch here:
https://github.com/ceph/calamari/commits/wip-lite
The key things there are that it doesn't require a postgres database,
and the remote-execution is abstracted into a "Remote" interface so that
you can implement alternatives to salt (e.g. SSH, or run locally on
mon). It's all free software so borrow what you wish ;-) The point is
that it isn't necessary to start from scratch in order to get something
lightweight.
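A rough Python sketch of the kind of "Remote" abstraction described above, so a collector can run the same `ceph daemon <name> perf dump` command locally (e.g. on a mon) or over SSH. The class and method names are hypothetical and do not reflect the actual wip-lite code; `BatchMode=yes` is a standard OpenSSH option that makes ssh fail instead of prompting for a password, which matters when password-less access is missing.

```python
import shlex
import subprocess
from abc import ABC, abstractmethod


class Remote(ABC):
    """Hypothetical remote-execution interface: run argv on a host, return stdout."""

    @abstractmethod
    def run(self, host, argv):
        raise NotImplementedError


class LocalRemote(Remote):
    """Runs commands on this machine, e.g. when the collector lives on a mon."""

    def run(self, host, argv):
        # host is ignored: everything executes locally
        return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


class SshRemote(Remote):
    """Runs commands over ssh without ever prompting for a password."""

    def __init__(self, user=None):
        self.user = user

    def build_argv(self, host, argv):
        target = "%s@%s" % (self.user, host) if self.user else host
        # The remote shell re-parses the command string, so quote each argument.
        remote_cmd = " ".join(shlex.quote(arg) for arg in argv)
        return ["ssh", "-o", "BatchMode=yes", target, remote_cmd]

    def run(self, host, argv):
        return subprocess.run(self.build_argv(host, argv), check=True,
                              capture_output=True, text=True).stdout


def perf_dump(remote, host, daemon):
    """Fetch one daemon's counters through whichever Remote was supplied."""
    return remote.run(host, ["ceph", "daemon", daemon, "perf", "dump"])
```

Swapping `LocalRemote()` for `SshRemote(user="ceph")` changes where `perf_dump()` runs without touching the collection logic; a salt-backed implementation would slot in the same way.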
Cheers,
John
* Re: Tool for ceph performance analysis
From: Mark Nelson @ 2015-02-24 14:14 UTC
To: John Spray, Alyona Kiselyova, ceph-devel@vger.kernel.org
Cc: ceph-calamari@lists.ceph.com
On 02/24/2015 06:16 AM, John Spray wrote:
> Ah, I also forgot to mention: it is not very hard to make a cut-down
> version of calamari that doesn't require lots of heavyweight
> dependencies. I started building this a while back before switching
> tasks, but there's an old branch here:
> https://github.com/ceph/calamari/commits/wip-lite
>
> The key things there are that it doesn't require a postgres database,
> and the remote-execution is abstracted into a "Remote" interface so that
> you can implement alternatives to salt (e.g. SSH, or run locally on
> mon). It's all free software so borrow what you wish ;-) The point is
> that it isn't necessary to start from scratch in order to get something
> lightweight.
My personal vote is to try to get ourselves well integrated into a good
cross section of the existing tools that already do this kind of thing
(zabbix, collectd, collectl, etc). I'm slightly guilty of rolling my
own too since in cbt I gather up some of our daemon socket output from
all the hosts via ssh and just dump it in the output directory. There are
tons of other systems out there that do this kind of thing way better
though. I don't want to discourage anyone from making a new tool if
that's their preference, but I think a lot of folks would benefit if
they could just keep using their existing monitoring tools.
Perhaps part of this might be to just try to get a better idea of which
tools folks are using to do performance monitoring on their existing
clusters (ceph or otherwise). I've heard zabbix come up quite a bit
recently.
Mark
>
> Cheers,
> John
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Tool for ceph performance analysis
From: Alexandre DERUMIER @ 2015-02-24 15:51 UTC
To: Mark Nelson; +Cc: John Spray, Alyona Kiselyova, ceph-devel, ceph-calamari
>>Perhaps part of this might be to just try to get a better idea of which
>>tools folks are using to do performance monitoring on their existing
>>clusters (ceph or otherwise). I've heard zabbix come up quite a bit
>>recently.
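Hi, we are using graphite here, with collectd to retrieve host stats.
It's also easy to send custom stats to graphite: it's just a line of text written to a socket.
$conn = fsockopen("carbon.hostedgraphite.com", 2003);  // TCP by default; prefix the host with "udp://" for UDP
fwrite($conn, "YOUR-API-KEY.foo 1.2\n");  // "<key>.<metric> <value>\n"; a self-hosted carbon also expects a timestamp
fclose($conn);
And we use Grafana (http://grafana.org/) as the frontend for managing graphs.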
----- Original Message -----
From: "Mark Nelson" <mnelson@redhat.com>
To: "John Spray" <john.spray@redhat.com>, "Alyona Kiselyova" <akiselyova@mirantis.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Cc: ceph-calamari@lists.ceph.com
Sent: Tuesday, 24 February 2015 15:14:48
Subject: Re: Tool for ceph performance analysis
* Re: Tool for ceph performance analysis
From: Brad Hubbard @ 2015-02-25 1:45 UTC
To: Mark Nelson, John Spray, Alyona Kiselyova,
ceph-devel@vger.kernel.org
Cc: ceph-calamari@lists.ceph.com
On 02/25/2015 12:14 AM, Mark Nelson wrote:
> My personal vote is to try to get ourselves well integrated into a good cross section of the existing tools that already do this kind of thing (zabbix, collectd, collectl, etc)
...and PCP (Performance Co-Pilot) which I have begun work on.
--
Kindest Regards,
Brad Hubbard
Senior Software Maintenance Engineer
Red Hat Global Support Services
Asia Pacific Region
* Re: Tool for ceph performance analysis
From: Mark Nelson @ 2015-02-25 1:48 UTC
To: bhubbard, John Spray, Alyona Kiselyova,
ceph-devel@vger.kernel.org
Cc: ceph-calamari@lists.ceph.com
On 02/24/2015 07:45 PM, Brad Hubbard wrote:
> On 02/25/2015 12:14 AM, Mark Nelson wrote:
>> My personal vote is to try to get ourselves well integrated into a
>> good cross section of the existing tools that already do this kind of
>> thing (zabbix, collectd, collectl, etc)
>
> ...and PCP (Performance Co-Pilot) which I have begun work on.
Indeed! I think this just goes to show that there's not going to be one
set way that people do this. We need to appeal to a broad coalition of
folks.
* Re: Tool for ceph performance analysis
From: Brad Hubbard @ 2015-02-25 1:56 UTC
To: Mark Nelson, John Spray, Alyona Kiselyova,
ceph-devel@vger.kernel.org
Cc: ceph-calamari@lists.ceph.com
On 02/25/2015 11:48 AM, Mark Nelson wrote:
>
>
> On 02/24/2015 07:45 PM, Brad Hubbard wrote:
>> On 02/25/2015 12:14 AM, Mark Nelson wrote:
>>> My personal vote is to try to get ourselves well integrated into a
>>> good cross section of the existing tools that already do this kind of
>>> thing (zabbix, collectd, collectl, etc)
>>
>> ...and PCP (Performance Co-Pilot) which I have begun work on.
>
> Indeed! I think this just goes to show that there's not going to be one set way that people do this. We need to appeal to a broad coalition of folks.
Right, and let each solution stand on its merits.
--
Kindest Regards,
Brad Hubbard
Senior Software Maintenance Engineer
Red Hat Global Support Services
Asia Pacific Region
Thread overview: 9+ messages
2015-02-24 8:40 Tool for ceph performance analysis Alyona Kiselyova
2015-02-24 11:57 ` John Spray
2015-02-24 12:16 ` John Spray
2015-02-24 14:14 ` Mark Nelson
[not found] ` <1780156480.1852442.1424793103509.JavaMail.zimbra@oxygem.tv>
2015-02-24 15:51 ` Alexandre DERUMIER
2015-02-25 1:45 ` Brad Hubbard
2015-02-25 1:48 ` Mark Nelson
2015-02-25 1:56 ` Brad Hubbard
2015-02-25 2:31 ` Brad Hubbard