* bluestore onode diet and encoding overhead
@ 2016-07-12  7:03 Mark Nelson
  2016-07-12  7:13 ` Somnath Roy
  2016-07-12 15:20 ` Allen Samuels
  0 siblings, 2 replies; 39+ messages in thread
From: Mark Nelson @ 2016-07-12  7:03 UTC
  To: ceph-devel

Hi All,

With Igor's patch last week I was able to get some bluestore performance
runs in without segfaulting and started looking into the results.
Somewhere along the line we really screwed up read performance, but 
that's another topic.  Right now I want to focus on random writes. 
Before we put the onode on a diet, we were seeing massive amounts of read
traffic in RocksDB during compaction, which caused write stalls during 4K
random writes.  Random write performance on fast hardware like NVMe
devices was often below filestore at anything other than very large IO 
sizes.  This was largely due to the size of the onode compounded with 
RocksDB's tendency toward read and write amplification.

The new test results look very promising.  We've dramatically improved
random write performance at most IO sizes, so that it is now typically
quite a bit higher than both filestore and the older bluestore code.
Unfortunately, performance at very small IO sizes hasn't improved much.
We are no longer seeing huge amounts of RocksDB read traffic, and there
are fewer write stalls.  However, we are seeing huge memory usage (~9GB
RSS per OSD) and very high CPU usage.  I think this confirms some of the
memory issues Somnath was continuing to see.  Based on how the OSDs were
behaving I don't think it's exactly a leak, but we still need to run
through massif to be sure.

I ended up spending some time tonight with perf and digging through the 
encode code.  I wrote up some notes with graphs and code snippets and 
decided to put them up on the web.  Basically, some of the encoding
changes we implemented last month to reduce the onode size also appear
to result in more buffer::list appends and their associated overhead.
I've been trying to think through ways to improve the situation and 
thought other people might have some ideas too.  Here's a link to the 
short writeup:

https://drive.google.com/file/d/0B2gTBZrkrnpZeC04eklmM2I4Wkk/view?usp=sharing

Thanks,
Mark



Thread overview: 39+ messages
2016-07-12  7:03 bluestore onode diet and encoding overhead Mark Nelson
2016-07-12  7:13 ` Somnath Roy
2016-07-12 12:34   ` Mark Nelson
2016-07-12 12:40     ` Igor Fedotov
2016-07-12 12:47       ` Varada Kari
2016-07-12 12:48       ` Mark Nelson
2016-07-12 12:57         ` Igor Fedotov
2016-07-12 13:02           ` Mark Nelson
2016-07-12 15:14             ` Somnath Roy
2016-07-12 15:31               ` Igor Fedotov
2016-07-12 15:36                 ` Somnath Roy
2016-07-12 15:46                   ` Mark Nelson
2016-07-12 20:48                     ` Mark Nelson
2016-07-12 15:37               ` Varada Kari
2016-07-12 16:56               ` Sage Weil
2016-07-12 16:57                 ` Sage Weil
2016-07-12 17:06                   ` Somnath Roy
2016-07-12 17:50                 ` Allen Samuels
2016-07-12 15:20 ` Allen Samuels
2016-07-12 15:37   ` Mark Nelson
2016-07-12 21:15     ` Allen Samuels
2016-07-12 22:04       ` Mark Nelson
2016-07-13  1:50   ` Sage Weil
2016-07-13  3:13     ` Mark Nelson
2016-07-13  6:33       ` Piotr Dałek
2016-07-13 16:05         ` Sage Weil
2016-07-13 21:29           ` Allen Samuels
2016-07-14  5:52       ` Allen Samuels
2016-07-14 11:15         ` Mark Nelson
2016-07-14 14:10           ` Allen Samuels
2016-08-12 16:18             ` Sage Weil
2016-08-12 22:25               ` Allen Samuels
2016-08-13 21:36                 ` Sage Weil
2016-08-14 20:37                   ` Allen Samuels
2016-07-14 14:14           ` Allen Samuels
2016-07-14 16:20           ` Allen Samuels
2016-07-14 16:31             ` Mark Nelson
2016-07-14 16:34               ` Allen Samuels
2016-07-13 14:47     ` Samuel Just
