Possible memory leak in mon?

* Possible memory leak in mon?
@ 2012-05-02 22:28 Vladimir Bashkirtsev
  2012-05-03  0:22 ` Greg Farnum
  0 siblings, 1 reply; 13+ messages in thread
From: Vladimir Bashkirtsev @ 2012-05-02 22:28 UTC (permalink / raw)
  To: ceph-devel

Dear devs,

I have three mons, and two of them suddenly consumed around 4G of RAM
while the third one happily lived with 150M. This immediately prompts a
few questions:

1. What is the expected memory use of a mon? I believed that a mon merely
directs clients to the relevant OSDs and should not consume a lot of
resources - please correct me if I am wrong.
2. In both cases where a mon consumed a lot of memory, it was preceded by
a disk-full condition, and both machines where the incidents happened are
64 bit; the rest of the cluster is 32 bit. The mon fs and log files
happened to be on the same partition. The sequence was: ceph osd produced
a lot of messages and filled up the disk; the mon crashed (no core, as the
disk was full); I manually deleted the logs and restarted the mon without
any issue; some time later I found the mon using 4G of RAM. I am running
0.45. Should I deliberately recreate the conditions and crash the mon to
get more debug info (if you need it, of course, and if so, what)?
3. Does the figure of 4G per process come from 32 bit pointers in the mon?
Or can a mon potentially consume more than 4G?

Regards,
Vladimir

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Possible memory leak in mon?
@ 2012-05-02 22:49 Vladimir Bashkirtsev
  0 siblings, 0 replies; 13+ messages in thread
From: Vladimir Bashkirtsev @ 2012-05-02 22:49 UTC (permalink / raw)
  To: ceph-devel

Dear devs,

I have three mons, and two of them suddenly consumed around 4G of RAM
while the third one happily lived with 150M. This immediately prompts a
few questions:

1. What is the expected memory use of a mon? I believed that a mon merely
directs clients to the relevant OSDs and should not consume a lot of
resources - please correct me if I am wrong.
2. In both cases where a mon consumed a lot of memory, it was preceded by
a disk-full condition, and both machines where the incidents happened are
64 bit; the rest of the cluster is 32 bit. The mon fs and log files
happened to be on the same partition. The sequence was: ceph osd produced
a lot of messages and filled up the disk; the mon crashed (no core, as the
disk was full); I manually deleted the logs and restarted the mon without
any issue; some time later I found the mon using 4G of RAM. I am running
0.45. Should I deliberately recreate the conditions and crash the mon to
get more debug info (if you need it, of course, and if so, what)?
3. Does the figure of 4G per process come from 32 bit pointers in the mon?
Or can a mon potentially consume more than 4G?
4. I guess it is a good idea to keep the mon fs on a separate partition so
that it does not experience a disk-full state. Currently it is around
80 MB, while the whole ceph cluster is 42% full of 2100 GB with 6 OSDs and
600 pgs. Can you provide some idea of how to estimate the mon fs size?

Regards,
Vladimir

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible memory leak in mon?
  2012-05-02 22:28 Possible memory leak in mon? Vladimir Bashkirtsev
@ 2012-05-03  0:22 ` Greg Farnum
  2012-05-03  6:24   ` Vladimir Bashkirtsev
  0 siblings, 1 reply; 13+ messages in thread
From: Greg Farnum @ 2012-05-03  0:22 UTC (permalink / raw)
  To: Vladimir Bashkirtsev; +Cc: ceph-devel

On Wednesday, May 2, 2012 at 3:28 PM, Vladimir Bashkirtsev wrote:
> Dear devs,
> 
> I have three mons, and two of them suddenly consumed around 4G of RAM
> while the third one happily lived with 150M. This immediately prompts a
> few questions:
>
> 1. What is the expected memory use of a mon? I believed that a mon merely
> directs clients to the relevant OSDs and should not consume a lot of
> resources - please correct me if I am wrong.
> 2. In both cases where a mon consumed a lot of memory, it was preceded by
> a disk-full condition, and both machines where the incidents happened are
> 64 bit; the rest of the cluster is 32 bit. The mon fs and log files
> happened to be on the same partition. The sequence was: ceph osd produced
> a lot of messages and filled up the disk; the mon crashed (no core, as the
> disk was full); I manually deleted the logs and restarted the mon without
> any issue; some time later I found the mon using 4G of RAM. I am running
> 0.45. Should I deliberately recreate the conditions and crash the mon to
> get more debug info (if you need it, of course, and if so, what)?
> 3. Does the figure of 4G per process come from 32 bit pointers in the mon?
> Or can a mon potentially consume more than 4G?
> 
> Regards,
> Vladimir
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org (mailto:majordomo@vger.kernel.org)
> More majordomo info at http://vger.kernel.org/majordomo-info.html

First: one email is enough. 

Second: in normal use your monitors should not consume very much memory. It sounds like something's wrong. Can you please provide the output of "ceph -s"?
Also, do you have any monitor logging on? My best guess is that for some reason the monitors aren't all communicating with each other and so they are buffering messages.
-Greg


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible memory leak in mon?
  2012-05-03  0:22 ` Greg Farnum
@ 2012-05-03  6:24   ` Vladimir Bashkirtsev
  2012-05-03  6:53     ` Greg Farnum
  0 siblings, 1 reply; 13+ messages in thread
From: Vladimir Bashkirtsev @ 2012-05-03  6:24 UTC (permalink / raw)
  To: Greg Farnum; +Cc: ceph-devel

Greg,

Apologies for the multiple emails: my mail server is backed by ceph now,
and it struggled this morning (separate issue). My mail server reported
back to my mailer that sending the email had failed when obviously that
was not the case.

[root@gamma ~]# ceph -s
2012-05-03 15:46:55.640951   mds e2666: 1/1/1 up {0=1=up:active}, 1 up:standby
2012-05-03 15:46:55.647106   osd e10728: 6 osds: 6 up, 6 in
2012-05-03 15:46:55.654052   log 2012-05-03 15:46:26.557084 mon.2 172.16.64.202:6789/0 2878 : [INF] mon.2 calling new monitor election
2012-05-03 15:46:55.654425   mon e7: 3 mons at {0=172.16.64.200:6789/0,1=172.16.64.201:6789/0,2=172.16.64.202:6789/0}
2012-05-03 15:46:56.961624    pg v1251669: 600 pgs: 2 creating, 598 active+clean; 309 GB data, 963 GB used, 1098 GB / 2145 GB avail

Logging is on, but there is nothing obvious in there: the logs are quite
small. They contain a number of ceph health entries (ceph is monitored by
nagios, so this record appears every 5 minutes), and the monitors
periodically call for an election (at varying intervals of between 1 and
15 minutes, as it looks). That's it.

Regards,
Vladimir


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible memory leak in mon?
  2012-05-03  6:24   ` Vladimir Bashkirtsev
@ 2012-05-03  6:53     ` Greg Farnum
  2012-05-07  0:52       ` Vladimir Bashkirtsev
  2012-05-07  0:53       ` Vladimir Bashkirtsev
  0 siblings, 2 replies; 13+ messages in thread
From: Greg Farnum @ 2012-05-03  6:53 UTC (permalink / raw)
  To: Vladimir Bashkirtsev; +Cc: ceph-devel

On Wednesday, May 2, 2012 at 11:24 PM, Vladimir Bashkirtsev wrote:
> Greg,
>
> Apologies for the multiple emails: my mail server is backed by ceph now,
> and it struggled this morning (separate issue). My mail server reported
> back to my mailer that sending the email had failed when obviously that
> was not the case.

Interesting — I presume you're using the file system? That's not something we've heard of anybody doing with Ceph before. :)

>  
> [root@gamma ~]# ceph -s
> 2012-05-03 15:46:55.640951 mds e2666: 1/1/1 up {0=1=up:active}, 1 up:standby
> 2012-05-03 15:46:55.647106 osd e10728: 6 osds: 6 up, 6 in
> 2012-05-03 15:46:55.654052 log 2012-05-03 15:46:26.557084 mon.2 172.16.64.202:6789/0 2878 : [INF] mon.2 calling new monitor election
> 2012-05-03 15:46:55.654425 mon e7: 3 mons at {0=172.16.64.200:6789/0,1=172.16.64.201:6789/0,2=172.16.64.202:6789/0}
> 2012-05-03 15:46:56.961624 pg v1251669: 600 pgs: 2 creating, 598 active+clean; 309 GB data, 963 GB used, 1098 GB / 2145 GB avail
>
> Logging is on, but there is nothing obvious in there: the logs are quite
> small. They contain a number of ceph health entries (ceph is monitored by
> nagios, so this record appears every 5 minutes), and the monitors
> periodically call for an election (at varying intervals of between 1 and
> 15 minutes, as it looks). That's it.

Hrm. Generally speaking the monitors shouldn't call for elections unless something changes (one of them crashes) or the leader monitor is slowing down.
Can you increase the debug_mon to 20, the debug_ms to 1, and post one of the logs somewhere? The "Live Debugging" section of http://ceph.com/wiki/Debugging should give you what you need. :)
  
>  
> Regards,
> Vladimir



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible memory leak in mon?
  2012-05-03  6:53     ` Greg Farnum
  2012-05-07  0:52       ` Vladimir Bashkirtsev
@ 2012-05-07  0:53       ` Vladimir Bashkirtsev
  2012-05-14 21:23         ` Gregory Farnum
  2012-05-15 17:13         ` Gregory Farnum
  1 sibling, 2 replies; 13+ messages in thread
From: Vladimir Bashkirtsev @ 2012-05-07  0:53 UTC (permalink / raw)
  To: Greg Farnum; +Cc: ceph-devel

On 03/05/12 16:23, Greg Farnum wrote:
> Hrm. Generally speaking the monitors shouldn't call for elections unless something changes (one of them crashes) or the leader monitor is slowing down.
> Can you increase the debug_mon to 20, the debug_ms to 1, and post one of the logs somewhere? The "Live Debugging" section of http://ceph.com/wiki/Debugging should give you what you need. :)
Here are the logs and core dumps:
http://www.bashkirtsev.com/logs-2012-05-07.tar.bz2

The mons have grown to 1.2GB and 2GB of memory.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible memory leak in mon?
  2012-05-07  0:53       ` Vladimir Bashkirtsev
@ 2012-05-14 21:23         ` Gregory Farnum
  2012-05-15 17:13         ` Gregory Farnum
  1 sibling, 0 replies; 13+ messages in thread
From: Gregory Farnum @ 2012-05-14 21:23 UTC (permalink / raw)
  To: Vladimir Bashkirtsev; +Cc: ceph-devel

On Sun, May 6, 2012 at 5:53 PM, Vladimir Bashkirtsev
<vladimir@bashkirtsev.com> wrote:
> Here are the logs and core dumps:
> http://www.bashkirtsev.com/logs-2012-05-07.tar.bz2
>
> The mons have grown to 1.2GB and 2GB of memory.

Sorry for the delayed response; I was busy on vacation and launching
our new company. :)
Downloading the logs now and will probably look into them tomorrow.
-Greg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible memory leak in mon?
  2012-05-07  0:53       ` Vladimir Bashkirtsev
  2012-05-14 21:23         ` Gregory Farnum
@ 2012-05-15 17:13         ` Gregory Farnum
  2012-05-18 10:07           ` Vladimir Bashkirtsev
  1 sibling, 1 reply; 13+ messages in thread
From: Gregory Farnum @ 2012-05-15 17:13 UTC (permalink / raw)
  To: Vladimir Bashkirtsev; +Cc: ceph-devel

On Sun, May 6, 2012 at 5:53 PM, Vladimir Bashkirtsev
<vladimir@bashkirtsev.com> wrote:
> Here are the logs and core dumps:
> http://www.bashkirtsev.com/logs-2012-05-07.tar.bz2
>
> The mons have grown to 1.2GB and 2GB of memory.

When I look at the logs for mon.0, I see that there are a lot of
places where mon.0 takes tens of seconds to write something to disk.
If the disk is just about full, that might make sense (many
filesystems don't handle a nearly-full disk very well at all); and a
monitor getting stuck for that long could definitely explain why they
start using up so much memory (they're buffering messages). I suspect
that there's not anything particularly wrong here, unless I'm
misunderstanding the story you're telling me. :) Have you noticed this
problem when the monitor's disk partition isn't nearly full?
-Greg
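
A minimal sketch of how long pauses like these could be spotted in a log, assuming every log line starts with a timestamp in the same format as the "ceph -s" output quoted earlier in this thread. The function name find_stalls, the 10-second threshold, and the sample lines are illustrative assumptions, not part of Ceph. A gap between consecutive log lines only hints at a stall; it does not prove that a disk write was the cause.

```python
import re
from datetime import datetime, timedelta

# Assumption: each log line begins with "YYYY-MM-DD HH:MM:SS.ffffff",
# the timestamp format seen in the `ceph -s` output in this thread.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})")


def find_stalls(lines, threshold=timedelta(seconds=10)):
    """Return (previous, current, gap) for every gap of at least `threshold`."""
    stalls = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip continuation lines that carry no timestamp
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if previous is not None and current - previous >= threshold:
            stalls.append((previous, current, current - previous))
        previous = current
    return stalls


if __name__ == "__main__":
    # Hypothetical sample lines, built from timestamps quoted in this thread.
    sample = [
        "2012-05-03 15:46:26.557084 mon.2 calling new monitor election",
        "2012-05-03 15:46:55.654425 mon e7: 3 mons",
        "2012-05-03 15:46:56.961624 pg v1251669: 600 pgs",
    ]
    for before, after, gap in find_stalls(sample):
        print(f"{gap.total_seconds():.1f}s without log output after {before}")
```

Raising or lowering the threshold changes how sensitive the scan is; with debug_ms at 1 a healthy monitor would normally be expected to log far more often than every 10 seconds, but that is an assumption to check against your own logs.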

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible memory leak in mon?
  2012-05-15 17:13         ` Gregory Farnum
@ 2012-05-18 10:07           ` Vladimir Bashkirtsev
  2012-05-21 18:18             ` Gregory Farnum
  0 siblings, 1 reply; 13+ messages in thread
From: Vladimir Bashkirtsev @ 2012-05-18 10:07 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

On 16/05/12 02:43, Gregory Farnum wrote:
> When I look at the logs for mon.0, I see that there are a lot of
> places where mon.0 takes tens of seconds to write something to disk.
> If the disk is just about full, that might make sense (many
> filesystems don't handle a nearly-full disk very well at all); and a
> monitor getting stuck for that long could definitely explain why they
> start using up so much memory (they're buffering messages). I suspect
> that there's not anything particularly wrong here, unless I'm
> misunderstanding the story you're telling me. :) Have you noticed this
> problem when the monitor's disk partition isn't nearly full?
> -Greg
I have recreated the conditions under which the mon started to consume
more memory: everything appears to be in line with your suspicions. When
the disk gets almost full, the mon slows down and finally crashes so badly
that I cannot recover it. I am then forced to destroy the mon altogether
and create a new one instead.

Long story short: the docs/wiki should recommend NOT keeping the monfs on
the same partition as the ceph log (which can grow quickly), and
preferably keeping it on a separate partition altogether.

At the same time, it raises another question: what is the recommended
partition size for the monfs?
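
A rough sketch of how one might check whether the mon data directory and the log directory share a filesystem, and how much space is left on it. The two paths in the example are hypothetical placeholders; the real locations depend on your ceph.conf. Comparing device IDs is a heuristic: bind mounts or subvolumes may need a closer look.

```python
import os
import shutil


def same_filesystem(path_a, path_b):
    """True if both paths report the same device ID, i.e. the same mount."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def free_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


if __name__ == "__main__":
    # Hypothetical locations; substitute the paths from your own setup.
    mon_data = "/var/lib/ceph/mon"
    log_dir = "/var/log/ceph"
    if os.path.isdir(mon_data) and os.path.isdir(log_dir):
        if same_filesystem(mon_data, log_dir):
            print("warning: mon data and logs share a filesystem")
        print(f"mon data filesystem is {free_fraction(mon_data):.0%} free")
```

Run from a monitoring check (for example the nagios setup mentioned earlier), this would flag the shared-partition layout before the disk fills up.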

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Possible memory leak in mon?
  2012-05-18 10:07           ` Vladimir Bashkirtsev
@ 2012-05-21 18:18             ` Gregory Farnum
  0 siblings, 0 replies; 13+ messages in thread
From: Gregory Farnum @ 2012-05-21 18:18 UTC (permalink / raw)
  To: Vladimir Bashkirtsev; +Cc: ceph-devel

On Fri, May 18, 2012 at 3:07 AM, Vladimir Bashkirtsev
<vladimir@bashkirtsev.com> wrote:
> I have recreated the conditions under which the mon started to consume
> more memory: everything appears to be in line with your suspicions. When
> the disk gets almost full, the mon slows down and finally crashes so badly
> that I cannot recover it. I am then forced to destroy the mon altogether
> and create a new one instead.
>
> Long story short: the docs/wiki should recommend NOT keeping the monfs on
> the same partition as the ceph log (which can grow quickly), and
> preferably keeping it on a separate partition altogether.

Patches and edits welcome! :)

> At the same time, it raises another question: what is the recommended
> partition size for the monfs?
I'm looking at a cluster about a month old with a 765MB mon data
directory. Most of that (~500 MB) is in the log files, which can be
trimmed manually, and I believe that everything else taking up data
trims itself when stuff's working. So if you're willing to set up a
pseudo-log rotation (or do it yourself on a timer every month or so) a
couple GB should leave you plenty of breathing room.
-Greg
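
A minimal sketch of the "do it yourself on a timer" idea, assuming plain log files that are safe to delete once they are old enough. Which files in a mon data directory can actually be trimmed is not spelled out in this thread, so the directory, the ".log" suffix, and the 30-day age are hypothetical, and the function only lists candidates unless dry_run is turned off.

```python
import os
import time


def prune_old_logs(log_dir, max_age_days=30, suffix=".log", dry_run=True, now=None):
    """List files in log_dir older than max_age_days; delete them if not dry_run.

    Returns the sorted list of matching paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    matched = []
    with os.scandir(log_dir) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # ignore directories and symlinks
            if not entry.name.endswith(suffix):
                continue  # only touch files with the expected suffix
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                matched.append(entry.path)
    if not dry_run:
        for path in matched:
            os.remove(path)
    return sorted(matched)


if __name__ == "__main__":
    # Hypothetical path; point this at the directory you want to trim.
    log_dir = "/var/log/ceph"
    if os.path.isdir(log_dir):
        for path in prune_old_logs(log_dir):
            print("would remove", path)
```

Scheduled monthly from cron with dry_run=False, this would approximate the pseudo-log rotation described above; a dry run first is the safer way to see what it would touch.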

^ permalink raw reply	[flat|nested] 13+ messages in thread
