* Disabling journal
From: Stefan Priebe - Profihost AG @ 2012-11-11 14:29 UTC (permalink / raw)
  To: ceph-devel

Hi list,

Is there a way to disable the journal completely? For fast SSD storage it doesn't make sense, and I want to test how the speed changes.

Greets
Stefan


* Re: Disabling journal
From: Sage Weil @ 2012-11-11 15:24 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: ceph-devel

On Sun, 11 Nov 2012, Stefan Priebe - Profihost AG wrote:
> Hi list,
> 
> Is there a way to disable the journal completely? For fast SSD storage it 
> doesn't make sense, and I want to test how the speed changes.

With btrfs, yes, although this isn't something we have tested in a while.  
You'd need to play with 'filestore min sync interval' and 'filestore max 
sync interval' to basically pick your latency range.
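
A minimal sketch of how those two options might be written out, not a tested recommendation. The option names are the ones quoted above; placing them in an `[osd]` section, the helper name `osd_sync_fragment`, and both numeric values (in seconds) are assumptions made purely for illustration.

```python
import configparser
import io

# Assumed values in seconds; the thread gives no concrete numbers.
MIN_SYNC_INTERVAL = 0.01
MAX_SYNC_INTERVAL = 1.0


def osd_sync_fragment(min_s: float, max_s: float) -> str:
    """Render an [osd] section that sets the two sync-interval options."""
    if not 0 < min_s <= max_s:
        raise ValueError("need 0 < min <= max")
    cfg = configparser.ConfigParser()
    cfg["osd"] = {
        "filestore min sync interval": str(min_s),
        "filestore max sync interval": str(max_s),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


fragment = osd_sync_fragment(MIN_SYNC_INTERVAL, MAX_SYNC_INTERVAL)
print(fragment)
```

The gap between the two values is the latency range being picked: a smaller maximum bounds how long a write can wait for a sync, at the cost of syncing more often.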

sage


* Re: Disabling journal
From: Stefan Priebe @ 2012-11-11 21:33 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi Sage,

> With btrfs, yes, although this isn't something we have tested in a while.
I'm not using btrfs as long as the devs claim it is not ready for prod.

> You'd need to play with 'filestore min sync interval' and 'filestore max
> sync interval' to basically pick your latency range.
Any suggestions? Right now I have a 500 MB tmpfs journal per SSD.

Stefan


* Re: Disabling journal
From: Sage Weil @ 2012-11-12 14:42 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: ceph-devel

On Sun, 11 Nov 2012, Stefan Priebe wrote:
> Hi Sage,
> 
> > With btrfs, yes, although this isn't something we have tested in a while.
> I'm not using btrfs as long as the devs claim it is not ready for prod.

In that case, the journal is needed for consistency of the fs; we rely on 
writeahead journaling.  It can't be turned off.

Putting it on a ramdisk in this case is interesting for performance, but 
it means that a crash/reboot/powerloss event leaves the fs in an 
inconsistent and unusable state.

The only time tmpfs is potentially useful in production is when you're 
using btrfs *and* have independent backup power sources for replicas (and 
can thus avoid worrying about a site-wide power failure and loss of 
journal).  (Or have relaxed requirements for the durability of recent 
writes.)

sage



* Re: Disabling journal
From: Stefan Priebe - Profihost AG @ 2012-11-12 14:48 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 12.11.2012 15:42, Sage Weil wrote:
> On Sun, 11 Nov 2012, Stefan Priebe wrote:
>> Hi Sage,
>>
>>> With btrfs, yes, although this isn't something we have tested in a while.
>> I'm not using btrfs as long as the devs claim it is not ready for prod.
>
> In that case, the journal is needed for consistency of the fs; we rely on
> writeahead journaling.  It can't be turned off.
>
> Putting it on a ramdisk in this case is interesting for performance, but
> it means that a crash/reboot/powerloss event leaves the fs in an
> inconsistent and unusable state.

But only if, with 2 replicas, both nodes crash or lose power?

> The only time tmpfs is potentially useful in production is when you're
> using btrfs *and* have independent backup power sources for replicas (and
> can thus avoid worrying about a site-wide power failure and loss of
> journal).  (Or have relaxed requirements for the durability of recent
> writes.)
What happens with XFS and two replicas when ONE host has a power loss? The 
other replica / journal should still be there.

I have no idea where to put the journal.

I mean, I have 8 SSDs per host, one per OSD, each with a write speed of 
45,000 IOPS, for a total write speed of 360,000 IOPS per node.

Which journal device can handle this? And if I put the journal on the 
same disk as the OSD, it has to copy the data around.
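
The arithmetic behind the per-node figure, as a rough sketch. The SSD count and per-SSD write rate are the numbers given above; the function names are hypothetical, and the model assumes every write passes through the journal exactly once and that load spreads evenly across journal devices.

```python
SSDS_PER_HOST = 8            # one OSD per SSD, as stated above
WRITE_IOPS_PER_SSD = 45_000  # per-SSD write rate, as stated above


def aggregate_write_iops(ssds: int, iops_per_ssd: int) -> int:
    """Total write IOPS a node could generate across all its OSDs."""
    return ssds * iops_per_ssd


def iops_per_journal_device(total_iops: int, journal_devices: int) -> int:
    """IOPS each journal device must absorb (rounded up), assuming an even spread."""
    if journal_devices < 1:
        raise ValueError("need at least one journal device")
    return -(-total_iops // journal_devices)  # ceiling division


total = aggregate_write_iops(SSDS_PER_HOST, WRITE_IOPS_PER_SSD)
print("per node:", total)
for n in (1, 2, 8):
    print(n, "journal device(s):", iops_per_journal_device(total, n))
```

Under these assumptions a single shared journal device would have to keep up with the whole node, while one journal per OSD only has to match a single SSD.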

Greets,
Stefan


* Re: Disabling journal
From: Sage Weil @ 2012-11-12 14:59 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: ceph-devel

On Mon, 12 Nov 2012, Stefan Priebe - Profihost AG wrote:
> On 12.11.2012 15:42, Sage Weil wrote:
> > On Sun, 11 Nov 2012, Stefan Priebe wrote:
> > > Hi Sage,
> > > 
> > > > With btrfs, yes, although this isn't something we have tested in a
> > > > while.
> > > I'm not using btrfs as long as the devs claim it is not ready for prod.
> > 
> > In that case, the journal is needed for consistency of the fs; we rely on
> > writeahead journaling.  It can't be turned off.
> > 
> > Putting it on a ramdisk in this case is interesting for performance, but
> > it means that a crash/reboot/powerloss event leaves the fs in an
> > inconsistent and unusable state.
> 
> But only if, with 2 replicas, both nodes crash or lose power?

Then you're okay, but the one that lost the journal effectively also lost 
the contents of the SSD.  Also, manual intervention is currently needed to 
reinitialize the OSD (since this is not a normal failure mode).
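
As a rough model of these cases, assuming a non-btrfs filestore with the journal on tmpfs: an OSD that loses power loses its journal and therefore its contents, and the data survives only if at least one replica stayed up. The class and function names below are hypothetical and only illustrate the reasoning.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    lost_power: bool  # a power loss wipes the tmpfs journal


def osd_usable(replica: Replica) -> bool:
    """Without its journal the store is inconsistent, so the OSD is unusable."""
    return not replica.lost_power


def data_survives(replicas: list[Replica]) -> bool:
    """Data survives if at least one replica kept its journal."""
    return any(osd_usable(r) for r in replicas)


def needs_manual_reinit(replicas: list[Replica]) -> list[str]:
    """OSDs that would have to be reinitialized by hand."""
    return [r.name for r in replicas if r.lost_power]


one_down = [Replica("osd.0", True), Replica("osd.1", False)]
both_down = [Replica("osd.0", True), Replica("osd.1", True)]

print(data_survives(one_down), needs_manual_reinit(one_down))
print(data_survives(both_down), needs_manual_reinit(both_down))
```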

> > The only time tmpfs is potentially useful in production is when you're
> > using btrfs *and* have independent backup power sources for replicas (and
> > can thus avoid worrying about a site-wide power failure and loss of
> > journal).  (Or have relaxed requirements for the durability of recent
> > writes.)
> What happens with XFS and two replicas when ONE host has a power loss? The
> other replica / journal should still be there.
> 
> I have no idea where to put the journal.
> 
> I mean, I have 8 SSDs per host, one per OSD, each with a write speed of
> 45,000 IOPS, for a total write speed of 360,000 IOPS per node.
> 
> Which journal device can handle this? And if I put the journal on the same
> disk as the OSD, it has to copy the data around.

I think you have two choices.  Either put the journal on SSDs (perhaps a 
journal on an existing one), or use a higher-end NVRAM-based device.  
There are several of these out there, although I'm blanking on product 
names at the moment.  The best are probably the battery-backed DRAM ones 
with a bit of flash for when the battery gets low.  Lots of RAID 
controllers also have some onboard NVRAM that can often be finagled into 
being useful, at least with spinning disks; I'm not sure how they perform 
with SSDs.

sage

