All-Flash Ceph cluster and journal

* All-Flash Ceph cluster and journal
@ 2015-11-19 11:29 Mike Almateia
  2015-11-19 14:34 ` Mark Nelson
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Almateia @ 2015-11-19 11:29 UTC (permalink / raw)
  To: Ceph Development

Hello.

We now have SSDs with great performance under the O_DIRECT/O_SYNC
flags (and they are recommended for journals).

Why, in all-flash Ceph clusters, do we still write to a journal and then
put the data into the OSD?
Why can't we just write the data into the OSD with the O_DIRECT/O_SYNC
flags, without a journal?

Can we just switch off the journal functionality for all-flash Ceph clusters?
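
For readers unfamiliar with the flags in question, here is a minimal sketch of a single synchronous, cache-bypassing write. It is not Ceph code and says nothing about how the OSD actually performs I/O; the helper name `sync_write` and the 4096-byte block size are assumptions made for illustration. O_DIRECT requires aligned buffers, so the sketch uses an anonymous mmap, and it falls back to O_SYNC alone where the filesystem refuses O_DIRECT.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment unit; real devices may need a different size


def sync_write(path, payload):
    """Write one zero-padded block; return the number of bytes written.

    O_SYNC makes os.write() return only after the data has reached stable
    storage. O_DIRECT (where accepted) bypasses the page cache.
    """
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping gives page-aligned memory
    buf.write(payload.ljust(BLOCK, b"\0")[:BLOCK])
    base = os.O_WRONLY | os.O_CREAT | os.O_SYNC
    direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
    for flags in (base | direct, base):  # second pass: O_SYNC only
        try:
            fd = os.open(path, flags, 0o600)
        except OSError:
            continue  # e.g. filesystem rejects O_DIRECT at open time
        try:
            return os.write(fd, buf)
        except OSError:
            continue  # e.g. alignment rejected at write time
        finally:
            os.close(fd)
    raise OSError("could not write " + path)
```

A call such as `sync_write("/tmp/obj", b"hello")` writes one padded block. Note that a single durable write like this gives no atomicity across several writes, which is a separate concern from raw device speed.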

-- 
Mike, runs.

* Re: All-Flash Ceph cluster and journal
  2015-11-19 11:29 All-Flash Ceph cluster and journal Mike Almateia
@ 2015-11-19 14:34 ` Mark Nelson
  2015-11-20 10:42   ` Mike Almateia
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Nelson @ 2015-11-19 14:34 UTC (permalink / raw)
  To: Mike Almateia, Ceph Development

This is actually the direction newstore is heading with the 
newstore_min_alloc_size so that you can configure when to write into the 
rocksdb WAL.  By default it's set to 512k, but for SSDs we will almost 
certainly want to go smaller.
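
The threshold idea can be sketched roughly as follows. This is an illustration only, not newstore's code: the function name `choose_write_path` is hypothetical, and it assumes that writes below newstore_min_alloc_size are the ones routed through the rocksdb WAL while larger writes go straight to the data device. Only the 512k default comes from the message above.

```python
DEFAULT_MIN_ALLOC_SIZE = 512 * 1024  # the 512k default quoted above


def choose_write_path(write_len, min_alloc_size=DEFAULT_MIN_ALLOC_SIZE):
    """Return "wal" for writes below the threshold, "direct" otherwise.

    Assumption: small writes are the ones that go through the WAL.
    """
    if write_len <= 0:
        raise ValueError("write_len must be positive")
    return "wal" if write_len < min_alloc_size else "direct"
```

Under that assumption, lowering the threshold for SSDs (say to 16k, a hypothetical value) would send a 64k write directly to the device instead of through the WAL.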

On 11/19/2015 05:29 AM, Mike Almateia wrote:
> Hello.
>
> We now have SSDs with great performance under the O_DIRECT/O_SYNC
> flags (and they are recommended for journals).
>
> Why, in all-flash Ceph clusters, do we still write to a journal and then
> put the data into the OSD?
> Why can't we just write the data into the OSD with the O_DIRECT/O_SYNC
> flags, without a journal?
>
> Can we just switch off the journal functionality for all-flash Ceph clusters?
>

* Re: All-Flash Ceph cluster and journal
  2015-11-19 14:34 ` Mark Nelson
@ 2015-11-20 10:42   ` Mike Almateia
  2015-11-20 10:49     ` Piotr.Dalek
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Almateia @ 2015-11-20 10:42 UTC (permalink / raw)
  To: Ceph Development

19-Nov-15 17:34, Mark Nelson wrote:
> This is actually the direction newstore is heading with the
> newstore_min_alloc_size so that you can configure when to write into the
> rocksdb WAL.  By default it's set to 512k, but for SSDs we will almost
> certainly want to go smaller.
>
> On 11/19/2015 05:29 AM, Mike Almateia wrote:
>> Hello.
>>
>> We now have SSDs with great performance under the O_DIRECT/O_SYNC
>> flags (and they are recommended for journals).
>>
>> Why, in all-flash Ceph clusters, do we still write to a journal and then
>> put the data into the OSD?
>> Why can't we just write the data into the OSD with the O_DIRECT/O_SYNC
>> flags, without a journal?
>>
>> Can we just switch off the journal functionality for all-flash Ceph
>> clusters?
>>

Thanks for answer!

Is it reasonable at this point to use NVMe flash for OSDs in Ceph, or is
it overkill? Is it possible to achieve the full speed of an NVMe flash
drive under Ceph?
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* RE: All-Flash Ceph cluster and journal
  2015-11-20 10:42   ` Mike Almateia
@ 2015-11-20 10:49     ` Piotr.Dalek
  2015-11-23 17:37       ` Blinick, Stephen L
  0 siblings, 1 reply; 7+ messages in thread
From: Piotr.Dalek @ 2015-11-20 10:49 UTC (permalink / raw)
  To: Mike Almateia, Ceph Development

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mike Almateia
> Sent: Friday, November 20, 2015 11:43 AM

> Is it reasonable at this point to use NVMe flash for OSDs in Ceph, or is it
> overkill? Is it possible to achieve the full speed of an NVMe flash drive
> under Ceph?

Yes and no. Ceph on any flash drive will perform far better than on regular spinning disks, though it certainly will not use the drive's full potential. There is an ongoing effort by multiple developers from multiple companies to fix that, and things are getting better with each release.


With best regards / Pozdrawiam
Piotr Dałek


* RE: All-Flash Ceph cluster and journal
  2015-11-20 10:49     ` Piotr.Dalek
@ 2015-11-23 17:37       ` Blinick, Stephen L
  2015-11-23 18:39         ` Daniel Swarbrick
  0 siblings, 1 reply; 7+ messages in thread
From: Blinick, Stephen L @ 2015-11-23 17:37 UTC (permalink / raw)
  To: Piotr.Dalek@ts.fujitsu.com, Mike Almateia, Ceph Development

This link points to a presentation we gave a few weeks back in which we used NVMe devices for both the data and the journal.  We partitioned each device multiple times to co-locate multiple OSDs per device.  The configuration data on the cluster is in the backup.

http://www.slideshare.net/Inktank_Ceph/accelerating-cassandra-workloads-on-ceph-with-allflash-pcie-ssds

Thanks,

Stephen
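
The partitioning approach described above can be sketched as simple arithmetic. This is a hypothetical illustration, not the layout used in the presentation: the function name `plan_partitions`, the device size, the OSD count, and the journal size are all assumed values.

```python
def plan_partitions(device_bytes, osds_per_device, journal_bytes):
    """Split one device into equal (journal, data) partition pairs.

    Each entry holds (offset, length) tuples in bytes for one OSD.
    """
    per_osd = device_bytes // osds_per_device
    data_bytes = per_osd - journal_bytes
    if data_bytes <= 0:
        raise ValueError("journal does not fit in the per-OSD share")
    layout = []
    offset = 0
    for osd in range(osds_per_device):
        layout.append({
            "osd": osd,
            "journal": (offset, journal_bytes),
            "data": (offset + journal_bytes, data_bytes),
        })
        offset += per_osd
    return layout


# Hypothetical numbers: a 1.6 TB device, 4 OSDs, 10 GiB journal each.
layout = plan_partitions(1600 * 10**9, 4, 10 * 2**30)
```

Each OSD then issues I/O to its own partition pair, so one device sees requests from several OSD processes at once.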

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Piotr.Dalek@ts.fujitsu.com
Sent: Friday, November 20, 2015 3:50 AM
To: Mike Almateia; Ceph Development
Subject: RE: All-Flash Ceph cluster and journal

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mike Almateia
> Sent: Friday, November 20, 2015 11:43 AM

> Is it reasonable at this point to use NVMe flash for OSDs in Ceph, or is it
> overkill? Is it possible to achieve the full speed of an NVMe flash drive
> under Ceph?

Yes and no. Ceph on any flash drive will perform far better than on regular spinning disks, though it certainly will not use the drive's full potential. There is an ongoing effort by multiple developers from multiple companies to fix that, and things are getting better with each release.


With best regards / Pozdrawiam
Piotr Dałek

* Re: All-Flash Ceph cluster and journal
  2015-11-23 17:37       ` Blinick, Stephen L
@ 2015-11-23 18:39         ` Daniel Swarbrick
  2015-11-23 21:20           ` Blinick, Stephen L
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Swarbrick @ 2015-11-23 18:39 UTC (permalink / raw)
  To: ceph-devel

Simple but clever way of ensuring that NVMe's deep queues aren't starved
of work to do. But doesn't this suggest that Ceph needs optimizing or
tuning for NVMe? Could you not have tweaked OSD parameters to ensure
more threads / IO operations in parallel and have the same effect?

On 23/11/15 18:37, Blinick, Stephen L wrote:
> This link points to a presentation we gave a few weeks back in which we used NVMe devices for both the data and the journal.  We partitioned each device multiple times to co-locate multiple OSDs per device.  The configuration data on the cluster is in the backup.
> 
> http://www.slideshare.net/Inktank_Ceph/accelerating-cassandra-workloads-on-ceph-with-allflash-pcie-ssds
> 
> Thanks,
> 
> Stephen
> 



* RE: All-Flash Ceph cluster and journal
  2015-11-23 18:39         ` Daniel Swarbrick
@ 2015-11-23 21:20           ` Blinick, Stephen L
  0 siblings, 0 replies; 7+ messages in thread
From: Blinick, Stephen L @ 2015-11-23 21:20 UTC (permalink / raw)
  To: Daniel Swarbrick, ceph-devel@vger.kernel.org

It's an example of what Hammer can do, and we're seeing some improvements already with Infernalis.  I agree regarding the tuning and optimization, and a lot of work is currently underway towards that goal as Piotr pointed out.

For completeness, we did do a bit of OSD tweaking a week or so ago (results are on the mailing list):  http://www.spinics.net/lists/ceph-devel/msg27256.html

Thanks,

Stephen

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Daniel Swarbrick
Sent: Monday, November 23, 2015 11:39 AM
To: ceph-devel@vger.kernel.org
Subject: Re: All-Flash Ceph cluster and journal

Simple but clever way of ensuring that NVMe's deep queues aren't starved of work to do. But doesn't this suggest that Ceph needs optimizing or tuning for NVMe? Could you not have tweaked OSD parameters to ensure more threads / IO operations in parallel and have the same effect?

On 23/11/15 18:37, Blinick, Stephen L wrote:
> This link points to a presentation we gave a few weeks back in which we used NVMe devices for both the data and the journal.  We partitioned each device multiple times to co-locate multiple OSDs per device.  The configuration data on the cluster is in the backup.
> 
> http://www.slideshare.net/Inktank_Ceph/accelerating-cassandra-workloads-on-ceph-with-allflash-pcie-ssds
> 
> Thanks,
> 
> Stephen
> 

end of thread, other threads:[~2015-11-23 21:20 UTC | newest]

Thread overview: 7+ messages
2015-11-19 11:29 All-Flash Ceph cluster and journal Mike Almateia
2015-11-19 14:34 ` Mark Nelson
2015-11-20 10:42   ` Mike Almateia
2015-11-20 10:49     ` Piotr.Dalek
2015-11-23 17:37       ` Blinick, Stephen L
2015-11-23 18:39         ` Daniel Swarbrick
2015-11-23 21:20           ` Blinick, Stephen L