* puzzled with the design pattern of ceph journal, really ruining performance
@ 2014-09-17  6:29 姚宁
From: 姚宁 @ 2014-09-17  6:29 UTC
  To: ceph-devel

Hi guys,

I have been analyzing the architecture of the Ceph source code.

I understand that, in order to keep the journal atomic and consistent,
the journal must either be opened with O_DSYNC or have fdatasync()
called after every write operation. However, this really kills
performance and leads to high commit latency, even if an SSD is used
as the journal disk. If the SSD has a capacitor to keep the data safe
when the system crashes, we can set the mount option nobarrier, or the
SSD itself will ignore the FLUSH request, so performance would be
better.
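
To make the cost concrete, here is a minimal sketch (Python on Linux
assumed; this is not Ceph's journal code) that appends fixed-size
records in three ways: buffered with no sync, fdatasync() after every
write, and a file opened with O_DSYNC. The function name
append_records, the mode strings, and the 4 KiB record size are all
hypothetical, chosen only for illustration.

```python
import os
import tempfile
import time

RECORD = b"x" * 4096  # hypothetical 4 KiB journal entry


def append_records(path, count, mode):
    """Append `count` records to `path` and return the elapsed seconds.

    mode is one of (names are illustrative, not from Ceph):
      "buffered"  - plain write(), no explicit sync
      "fdatasync" - fdatasync() after every write
      "o_dsync"   - file opened with O_DSYNC, so each write is synchronous
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if mode == "o_dsync":
        flags |= os.O_DSYNC
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, RECORD)
            if mode == "fdatasync":
                # Wait until the kernel reports the data as written out.
                os.fdatasync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for mode in ("buffered", "fdatasync", "o_dsync"):
            elapsed = append_records(os.path.join(tmp, mode), 256, mode)
            print(f"{mode:10s} {elapsed * 1000:8.2f} ms for 256 writes")
```

The numbers depend heavily on the device and filesystem, so the
script only prints what it measures. To see the per-write flush cost,
point it at a directory on the actual journal device rather than a
tmpfs.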

So can this be replaced by another strategy?
As far as I can tell, the most important parts are pg_log and pg_info.
They guide a crashed OSD in recovering its objects from its peers.
Therefore, if we can keep pg_log at a consistent point, we can recover
data without the journal. So can we just use an "undo" strategy on
pg_log and drop the Ceph journal? It would save a lot of bandwidth,
and based on the consistent pg_log epoch we can always recover data
from the peer OSDs, right? The cost is that more objects would have to
be recovered if the OSD crashes.
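
As a toy model of what I mean (Python; this is not how Ceph actually
represents pg_log, and ToyPGLog with all its fields and methods is
hypothetical): the log remembers the last version known to be
consistent, and after a crash it rolls back to that point and reports
which objects would have to be fetched again from the peers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPGLog:
    """Hypothetical stand-in for a PG log; not Ceph's real structure."""

    entries: list = field(default_factory=list)  # (version, object_name)
    consistent_version: int = 0  # last version known to be consistent

    def append(self, object_name):
        """Record a write to `object_name` and return its new version."""
        version = self.entries[-1][0] + 1 if self.entries else 1
        self.entries.append((version, object_name))
        return version

    def mark_consistent(self, version):
        """Declare everything up to `version` safely persisted."""
        self.consistent_version = version

    def undo_after_crash(self):
        """Roll back to the consistent point.

        Returns the sorted names of objects touched after that point;
        in this model they must be recovered from the peer OSDs.
        """
        cut = self.consistent_version
        suspect = sorted({name for v, name in self.entries if v > cut})
        self.entries = [(v, n) for v, n in self.entries if v <= cut]
        return suspect


if __name__ == "__main__":
    log = ToyPGLog()
    log.append("obj_a")
    log.mark_consistent(log.append("obj_b"))
    log.append("obj_c")
    log.append("obj_a")
    print(log.undo_after_crash())  # objects written after the consistent point
```

In this model, the longer the interval between consistent points, the
more objects end up in the list to recover, which is the trade-off
mentioned above.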

Nicheal


Thread overview: 8+ messages
2014-09-17  6:29 puzzled with the design pattern of ceph journal, really ruining performance 姚宁
2014-09-17  7:29 ` Somnath Roy
2014-09-17  7:59   ` Chen, Xiaoxi
2014-09-17 14:20     ` Alexandre DERUMIER
2014-09-17 15:01       ` Mark Nelson
2014-09-17 21:13         ` Alexandre DERUMIER
2014-09-18  1:05           ` Chen, Xiaoxi
2014-09-18  1:23             ` Mark Nelson
