osd memory usage with lots of objects

* osd memory usage with lots of objects
@ 2011-01-04 21:58 John Leach
  2011-01-04 22:28 ` Gregory Farnum
  2011-01-05  0:20 ` Colin McCabe
  0 siblings, 2 replies; 4+ messages in thread
From: John Leach @ 2011-01-04 21:58 UTC
  To: ceph-devel

Hi,

I've got a 3-node test cluster (3 mons, 3 osds) with about 24,000,000
very small objects across 2400 pools (written directly with librados;
this isn't a ceph filesystem).

The cosd processes have steadily grown in memory usage until they
exhausted RAM, and they are now being killed by the OOM killer (the
nodes have 6 GB of RAM and no swap).

When I start them back up, they very quickly grow in memory again and
get killed.

Is this expected? Do the osds require a certain amount of resident
memory relative to the data size (or perhaps number of objects)?

Can you offer any guidance on planning for ram usage?

I'm running ceph 0.24 on 64-bit Ubuntu Lucid servers.  In case it's
useful: I've only written these objects serially, with no reads,
rewrites, updates or snapshots.

I've got some further questions/observations about disk usage with this
scenario but I'll start a separate thread about that.

Thanks,

John. 

* Re: osd memory usage with lots of objects
  2011-01-04 21:58 osd memory usage with lots of objects John Leach
@ 2011-01-04 22:28 ` Gregory Farnum
  2011-01-05  0:39   ` John Leach
  2011-01-05  0:20 ` Colin McCabe
  1 sibling, 1 reply; 4+ messages in thread
From: Gregory Farnum @ 2011-01-04 22:28 UTC
  To: John Leach; +Cc: ceph-devel

On Tue, Jan 4, 2011 at 1:58 PM, John Leach <john@brightbox.co.uk> wrote:
> Hi,
>
> I've got a 3-node test cluster (3 mons, 3 osds) with about 24,000,000
> very small objects across 2400 pools (written directly with librados;
> this isn't a ceph filesystem).
>
> The cosd processes have steadily grown in memory usage until they
> exhausted RAM, and they are now being killed by the OOM killer (the
> nodes have 6 GB of RAM and no swap).
>
> When I start them back up, they very quickly grow in memory again and
> get killed.
>
> Is this expected?
No, it's definitely not. :/

> Do the osds require a certain amount of resident
> memory relative to the data size (or perhaps number of objects)?
Well, there's a small amount of memory overhead per-PG and per-pool,
but the data size and number of objects shouldn't impact it. And I
presume you haven't been changing your pgnum as you go?

So, some questions:
1) How far through startup do your OSDs get before crashing? Does
peering complete (I'd expect no)? Can you show us the output of "ceph
-w" during your attempted startup?
2) Assuming you've built them with tcmalloc, can you enable memory
profiling before you try and start it up, and post the results
somewhere? (http://ceph.newdream.net/wiki/Memory_Profiling will get
you started)


> Can you offer any guidance on planning for ram usage?
Our target is under a few hundred megabytes. In the past whenever
we've seen usage higher than this during normal operation we've had
serious memory leaks. 6GB is way past what the memory requirements
should ever be, though of course the more RAM you have the more
file/object data can be cached in-memory which can provide some nice
boosts in read bandwidth.

That said, we haven't been very careful about memory usage in our
peering code and this may be the cause of your problems with starting
up again. But it wouldn't explain why they ran out of memory to begin
with.

> I've got some further questions/observations about disk usage with this
> scenario but I'll start a separate thread about that.
Please do! :)
-Greg

* Re: osd memory usage with lots of objects
  2011-01-04 21:58 osd memory usage with lots of objects John Leach
  2011-01-04 22:28 ` Gregory Farnum
@ 2011-01-05  0:20 ` Colin McCabe
  1 sibling, 0 replies; 4+ messages in thread
From: Colin McCabe @ 2011-01-05  0:20 UTC
  To: John Leach; +Cc: ceph-devel

A week or two back, I had a few cases where cosd was killed by the OOM
killer on my test box.

Someone else was hogging memory with other programs running on the
same computer, so I assumed that was the cause. It also didn't happen
again after the first two times, so I turned my attention to other
things.

Unfortunately SIGKILL, which the OOM killer sends, is impossible to
handle. However, it would be nice if we could dump out a memory usage
report when the usage rises above a certain (user-defined) point.
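
A minimal sketch of such a threshold check, assuming a Linux /proc
filesystem. This is not code from cosd: the helper names read_rss_kib
and over_threshold and the 2 GiB limit are made up for illustration.

```python
import os


def read_rss_kib(status_text):
    """Return the VmRSS value (in KiB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    123456 kB".
            return int(line.split()[1])
    return None


def over_threshold(status_text, limit_kib):
    """True if the resident set size exceeds the user-defined limit."""
    rss = read_rss_kib(status_text)
    return rss is not None and rss > limit_kib


if __name__ == "__main__":
    path = "/proc/self/status"  # assumption: Linux procfs is mounted
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
        limit_kib = 2 * 1024 * 1024  # hypothetical 2 GiB limit
        if over_threshold(text, limit_kib):
            print("RSS above limit: a memory usage report would be dumped here")
        else:
            print("RSS (KiB):", read_rss_kib(text))
```

A check like this would have to run periodically inside the process (or
in a watchdog beside it), since nothing can be done once SIGKILL arrives.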

Colin

* Re: osd memory usage with lots of objects
  2011-01-04 22:28 ` Gregory Farnum
@ 2011-01-05  0:39   ` John Leach
  0 siblings, 0 replies; 4+ messages in thread
From: John Leach @ 2011-01-05  0:39 UTC
  To: Gregory Farnum; +Cc: ceph-devel

On Tue, 2011-01-04 at 14:28 -0800, Gregory Farnum wrote:
> On Tue, Jan 4, 2011 at 1:58 PM, John Leach <john@brightbox.co.uk> wrote:
> > Hi,
> >
> > I've got a 3-node test cluster (3 mons, 3 osds) with about 24,000,000
> > very small objects across 2400 pools (written directly with librados;
> > this isn't a ceph filesystem).
> >
> > The cosd processes have steadily grown in memory usage until they
> > exhausted RAM, and they are now being killed by the OOM killer (the
> > nodes have 6 GB of RAM and no swap).
> >
> > When I start them back up, they very quickly grow in memory again and
> > get killed.
> >
> > Is this expected?
> No, it's definitely not. :/

Excellent news (much better than it being expected!)

> 
> > Do the osds require a certain amount of resident
> > memory relative to the data size (or perhaps number of objects)?
> Well, there's a small amount of memory overhead per-PG and per-pool,
> but the data size and number of objects shouldn't impact it. And I
> presume you haven't been changing your pgnum as you go?

I haven't touched the pg_nums on this cluster, as far as I recall (it's
been up a couple of weeks but has been used almost exclusively for
writing this test data).

> 
> So, some questions:
> 1) How far through startup do your OSDs get before crashing? Does
> peering complete (I'd expect no)? Can you show us the output of "ceph
> -w" during your attempted startup?

2011-01-05 00:17:58.532524   mon e1: 3 mons at {0=10.135.211.78:6789/0,1=10.61.136.222:6789/0,2=10.202.105.222:6789/0}
2011-01-05 00:22:53.325264   osd e10659: 3 osds: 3 up, 3 in
2011-01-05 00:22:53.383272    pg v151295: 20936 pgs: 1 creating, 2 peering, 10352 crashed+peering, 3052 active+clean+degraded, 7053 degraded+peering, 476 crashed+degraded+peering; 24130 MB data, 266 GB used, 332 GB / 630 GB avail; 12489924/49420044 degraded (25.273%)
2011-01-05 00:22:53.422433   log 2011-01-05 00:22:53.325027 mon0 10.135.211.78:6789/0 4 : [INF] osd0 10.135.211.78:6801/31836 boot
2011-01-05 00:24:47.301186    pg v151296: 20936 pgs: 1 creating, 2 peering, 10352 crashed+peering, 3052 active+clean+degraded, 7053 degraded+peering, 476 crashed+degraded+peering; 24130 MB data, 266 GB used, 332 GB / 630 GB avail; 12489924/49420044 degraded (25.273%)

<cosd crashes here>

2011-01-05 00:25:52.422340   log 2011-01-05 00:25:52.189259 mon0 10.135.211.78:6789/0 5 : [INF] osd0 10.135.211.78:6801/31836 failed (by osd2 10.61.136.222:6800/915)
2011-01-05 00:25:57.265635   log 2011-01-05 00:25:57.121870 mon0 10.135.211.78:6789/0 6 : [INF] osd0 10.135.211.78:6801/31836 failed (by osd2 10.61.136.222:6800/915)
2011-01-05 00:26:02.341805   osd e10660: 3 osds: 2 up, 3 in
2011-01-05 00:26:02.362526   log 2011-01-05 00:26:02.127627 mon0 10.135.211.78:6789/0 7 : [INF] osd0 10.135.211.78:6801/31836 failed (by osd2 10.61.136.222:6800/915)
2011-01-05 00:26:02.470942    pg v151297: 20936 pgs: 1 creating, 2 peering, 10352 crashed+peering, 3052 active+clean+degraded, 7053 degraded+peering, 476 crashed+degraded+peering; 24130 MB data, 266 GB used, 332 GB / 630 GB avail; 12489924/49420044 degraded (25.273%)
2011-01-05 00:26:12.578266    pg v151298: 20936 pgs: 1 creating, 2 peering, 3393 crashed+peering, 3052 active+clean+degraded, 7053 degraded+peering, 7435 crashed+degraded+peering; 24130 MB data, 266 GB used, 332 GB / 630 GB avail; 20728862/49420044 degraded (41.944%)
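
The pg summary lines above can be broken down mechanically. The sketch
below parses one such line with regular expressions. The function name
parse_pg_line is hypothetical, and the patterns assume the line format
exactly as pasted here rather than any documented output format.

```python
import re

# Last pg summary line from the "ceph -w" output above (timestamp dropped).
PG_LINE = (
    "pg v151298: 20936 pgs: 1 creating, 2 peering, 3393 crashed+peering, "
    "3052 active+clean+degraded, 7053 degraded+peering, "
    "7435 crashed+degraded+peering; 24130 MB data, 266 GB used, "
    "332 GB / 630 GB avail; 20728862/49420044 degraded (41.944%)"
)


def parse_pg_line(line):
    """Return (map version, total PGs, {state: count}, (degraded, total) or None)."""
    m = re.search(r"pg v(\d+): (\d+) pgs: ([^;]+);", line)
    if m is None:
        raise ValueError("not a pg summary line")
    version, total = int(m.group(1)), int(m.group(2))
    states = {}
    for item in m.group(3).split(","):
        count, name = item.split()  # e.g. "3393", "crashed+peering"
        states[name] = int(count)
    d = re.search(r"(\d+)/(\d+) degraded", line)
    degraded = (int(d.group(1)), int(d.group(2))) if d else None
    return version, total, states, degraded


if __name__ == "__main__":
    version, total, states, degraded = parse_pg_line(PG_LINE)
    print("pg map version:", version)
    print("states sum to total:", sum(states.values()) == total)
    print("degraded %:", round(100.0 * degraded[0] / degraded[1], 3))
```

On that line the per-state counts add up to the 20936 total, and
20728862/49420044 works out to the 41.944% shown.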


> 2) Assuming you've built them with tcmalloc, can you enable memory
> profiling before you try and start it up, and post the results
> somewhere? (http://ceph.newdream.net/wiki/Memory_Profiling will get
> you started)

Done. I've attached the pprof output from the last heap profile taken
before cosd was killed.

Watching the process carefully, I noticed that it doesn't grow
steadily.  It grows slowly to around 2 GB (over roughly 10 minutes),
then suddenly inflates to 5.5 GB in perhaps less than a minute.
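
The growth pattern described above (slow growth, then a sudden jump) can
be picked out of periodic RSS samples by comparing the rate between
consecutive samples. A small sketch follows. The function name
find_spikes, the sample values and the 1000 MiB/min cutoff are invented
for illustration and are not measurements from this cluster.

```python
def find_spikes(samples, max_mib_per_min):
    """Return (t0, t1, rate) for each interval growing faster than the cutoff.

    samples is a time-sorted list of (minutes, rss_mib) pairs.
    """
    spikes = []
    for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order timestamps
        rate = (r1 - r0) / (t1 - t0)
        if rate > max_mib_per_min:
            spikes.append((t0, t1, rate))
    return spikes


# Hypothetical samples shaped like the description: about 2 GiB after
# 10 minutes, then 5.5 GiB one minute later.
SAMPLES = [(0, 200), (5, 1100), (10, 2048), (11, 5632)]

if __name__ == "__main__":
    for t0, t1, rate in find_spikes(SAMPLES, 1000):
        print(f"minute {t0} -> {t1}: {rate:.0f} MiB/min")
```

Knowing when the jump starts makes it easier to choose which heap
profile to compare against the last one taken before the kill.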

> 
> 
> > Can you offer any guidance on planning for ram usage?
> Our target is under a few hundred megabytes. In the past whenever
> we've seen usage higher than this during normal operation we've had
> serious memory leaks. 6GB is way past what the memory requirements
> should ever be, though of course the more RAM you have the more
> file/object data can be cached in-memory which can provide some nice
> boosts in read bandwidth.
> 
> That said, we haven't been very careful about memory usage in our
> peering code and this may be the cause of your problems with starting
> up again. But it wouldn't explain why they ran out of memory to begin
> with.

I was investigating an OSD crash at the time (definitely not an OOM
kill; I got a core dump), so that probably started it all off.

John.

P.S. Thanks for the help with tcmalloc profiling on IRC :)


[-- Attachment #2: osd.0.0017.heap.pprof.txt.gz --]
[-- Type: application/x-gzip, Size: 6505 bytes --]
