* Intel 520/530 SSD for ceph
@ 2013-11-18 13:38 Stefan Priebe - Profihost AG
From: Stefan Priebe - Profihost AG @ 2013-11-18 13:38 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org, ceph-users@lists.ceph.com

Hi guys,

In the past we've used Intel 520 SSDs for the ceph journal. This worked
great and our experience was good.

Now Intel has started to replace the 520 series with the new 530.

When we switched, we were surprised by the poor performance, and I needed
a few days to reproduce it.

O_DIRECT works fine on both drives, and the Intel SSD 530 is even faster
than the 520.

With O_DSYNC it is a different story; see the results:

~# dd if=randfile.gz of=/dev/sda bs=350k count=10000 oflag=direct,dsync
3584000000 bytes (3,6 GB) copied, 22,287 s, 161 MB/s

~# dd if=randfile.gz of=/dev/sdb bs=350k count=10000 oflag=direct,dsync
3584000000 bytes (3,6 GB) copied, 136,505 s, 26,3 MB/s

I used a block size of 350k because my graphs show that this is the
average write size we see on the journal. I also tried fio and bigger
block sizes; the result stays the same.

Does anybody have an idea? Without dsync both devices perform about the
same, at around 260 MB/s.

Greets,
Stefan

* Re: Intel 520/530 SSD for ceph
@ 2013-11-18 22:51   ` mdw-Jp3n8lUXroRWk0Htik3J/w
From: mdw-Jp3n8lUXroRWk0Htik3J/w @ 2013-11-18 22:51 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org

On Mon, Nov 18, 2013 at 02:38:42PM +0100, Stefan Priebe - Profihost AG wrote:
> Hi guys,
> 
> in the past we've used intel 520 ssds for ceph journal - this worked
> great and our experience was good.
> 
> Now they started to replace the 520 series with their new 530.
> 
> When we did we were supriced by the ugly performance and i need some
> days to reproduce.
> 
> While O_DIRECT works fine for both and the intel ssd 530 is even faster
> than the 520.
> 
> O_DSYNC... see the results:
> 
> ~# dd if=randfile.gz of=/dev/sda bs=350k count=10000 oflag=direct,dsync
> 3584000000 bytes (3,6 GB) copied, 22,287 s, 161 MB/s
> 
> ~# dd if=randfile.gz of=/dev/sdb bs=350k count=10000 oflag=direct,dsync
> 3584000000 bytes (3,6 GB) copied, 136,505 s, 26,3 MB/s
> 
> I used a blocksize of 350k as my graphes shows me that this is the
> average workload we have on the journal. But i also tried using fio,
> bigger blocksize, ... it stays the same.
> 
> Does anybody have an idea? Without dsync both devices have around the
> same performance of 260MB/s.
> 
> Greets,
> Stefan

You may actually be doing O_SYNC - recent kernels implement O_DSYNC,
but glibc maps O_DSYNC into O_SYNC.  But since you're writing to the
block device this won't matter much.

I believe the effect of O_DIRECT by itself is just to bypass the buffer
cache, which is not going to make much difference for your dd case.
(It will mainly affect other applications that are also using the
buffer cache...)

O_SYNC should be causing the writes to block until a response
is received from the disk.  Without O_SYNC, the writes will
just queue operations and return - potentially very fast.
Your dd is probably writing enough data that there is some
throttling by the system as it runs out of disk buffers and
has to wait for some previous data to be written to the drive,
but the delay for any individual block is not likely to matter.
With O_SYNC, you are measuring the delay for each block directly,
and you have completely removed the ability for the disk to
perform any sort of parallelism.
	[It's also conceivable the kernel is sending some form of write
	barrier flag to the drive, which will slow it down further,
	but I can't find any kernel logic that does this at a quick glance.]
It sounds like the Intel 530 has a much larger block write latency,
but can make up for it by performing more overlapped operations.
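
As a rough check of that reading, the dd totals quoted above can be turned
into an average time per synchronous write. This is only a sketch: the
helper name per_write_ms is made up for illustration, and its inputs are
the elapsed seconds and block count from the two dd runs in the original
message (decimal commas rewritten as points).

```shell
# per_write_ms SECONDS COUNT
# Average time per write in milliseconds, given dd's elapsed time and
# the number of blocks written. Hypothetical helper, not part of any tool.
per_write_ms() {
    LC_ALL=C awk -v s="$1" -v n="$2" 'BEGIN { printf "%.2f\n", s * 1000 / n }'
}

per_write_ms 22.287 10000    # /dev/sda run -> 2.23
per_write_ms 136.505 10000   # /dev/sdb run -> 13.65
```

On those figures the /dev/sdb run spends roughly six times as long per 350k
write as the /dev/sda run, which fits a per-write latency difference rather
than a bandwidth limit.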

You might be able to vary this behavior by experimenting with sdparm,
smartctl or other tools, or possibly with different microcode in the drive.

				-Marcus Watts

* Re: Intel 520/530 SSD for ceph
@ 2013-11-19  8:02       ` Stefan Priebe
From: Stefan Priebe @ 2013-11-19  8:02 UTC (permalink / raw)
  To: mdw-Jp3n8lUXroRWk0Htik3J/w
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org

Hi Marcus,

On 18.11.2013 23:51, mdw@linuxbox.com wrote:
> On Mon, Nov 18, 2013 at 02:38:42PM +0100, Stefan Priebe - Profihost AG wrote:
> You may actually be doing O_SYNC - recent kernels implement O_DSYNC,
> but glibc maps O_DSYNC into O_SYNC.  But since you're writing to the
> block device this won't matter much.

There is no difference between O_DSYNC and O_SYNC; the values are the same.
I'm also using kernel 3.10.19, so it is recent enough.

> I believe the effect of O_DIRECT by itself is just to bypass the buffer
> cache, which is not going to make much difference for your dd case.
> (It will mainly affect other applications that are also using the
> buffer cache...)
>
> O_SYNC should be causing the writes to block until a response
> is received from the disk.  Without O_SYNC, the writes will
> just queue operations and return - potentially very fast.
> Your dd is probably writing enough data that there is some
> throttling by the system as it runs out of disk buffers and
> has to wait for some previous data to be written to the drive,
> but the delay for any individual block is not likely to matter.
> With O_SYNC, you are measuring the delay for each block directly,
> and you have absolutely removed the ability for the disk to
> perform any sort of parallelism.

That's correct, but ceph uses O_DSYNC for its journal and maybe for other
things, so it is important to have devices that perform well with O_DSYNC.

> Sounds like the intel 530 is has a much larger block write latency,
> but can make up for it by performing more overlapped operations.
>
> You might be able to vary this behavior by experimenting with sdparm,
> smartctl or other tools, or possibly with different microcode in the drive.
Which values or settings are you thinking of?

Greets
Stefan

* Re: Intel 520/530 SSD for ceph
@ 2013-11-21  0:29           ` mdw-Jp3n8lUXroRWk0Htik3J/w
From: mdw-Jp3n8lUXroRWk0Htik3J/w @ 2013-11-21  0:29 UTC (permalink / raw)
  To: Stefan Priebe
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org

On Tue, Nov 19, 2013 at 09:02:41AM +0100, Stefan Priebe wrote:
...
> >You might be able to vary this behavior by experimenting with sdparm,
> >smartctl or other tools, or possibly with different microcode in the drive.
> Which values or which settings do you think of?
...

Off-hand, I don't know.  Probably the first thing would be
to compare the configuration of your 520 & 530; anything that's
different is certainly worth investigating.

This should display all pages:
	sdparm --all --long /dev/sdX
The 520 appears to have only 3 pages, which can be fetched directly with
	sdparm --page=ca --long /dev/sdX
	sdparm --page=co --long /dev/sdX
	sdparm --page=rw --long /dev/sdX

The sample machine I'm looking at has an Intel 520, and on ours
most options show as 0 except for
  AWRE        1  [cha: n, def:  1]  Automatic write reallocation enabled
  WCE         1  [cha: y, def:  1]  Write cache enable
  DRA         1  [cha: n, def:  1]  Disable read ahead
  GLTSD       1  [cha: n, def:  1]  Global logging target save disable
  BTP        -1  [cha: n, def: -1]  Busy timeout period (100us)
  ESTCT      30  [cha: n, def: 30]  Extended self test completion time (sec)
Perhaps that's an interesting data point to compare with yours.
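
A minimal sketch of that comparison, assuming both drives sit in the same
machine: dump every page from each device with the sdparm invocation shown
above and diff the two dumps. The function name compare_sdparm and the
device paths in the usage line are placeholders.

```shell
# compare_sdparm DEV_A DEV_B
# Dump all mode pages of both devices with sdparm and print a unified diff.
# The body runs in a subshell so the temp files and trap stay local.
# Returns diff's exit status: 0 if the dumps match, 1 if they differ.
compare_sdparm() (
    a=$(mktemp) || exit 2
    b=$(mktemp) || exit 2
    trap 'rm -f "$a" "$b"' EXIT
    sdparm --all --long "$1" > "$a"
    sdparm --all --long "$2" > "$b"
    diff -u "$a" "$b"
)

# Usage (placeholder device names):
#   compare_sdparm /dev/sda /dev/sdb
```

Lines that identify the device itself are expected to differ; the ones worth
a closer look are fields such as WCE whose values are not the same on both.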

Figuring out whether you have up-to-date Intel firmware appears to require
burning and running an ISO image from
https://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=18455

The output of sdparm --page=<whatever> --long /dev/sdc
shows the Intel firmware, but this labels it better:
	smartctl -i /dev/sdc
Our 520 has firmware "400i" loaded.
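
To pull just the firmware string out of that output, something like the
following may do. It assumes smartctl prints a line starting with
"Firmware Version:", which is how it labels the field for ATA drives as far
as I know; the function name firmware_version is made up.

```shell
# firmware_version DEV
# Print only the firmware string from `smartctl -i DEV`.
# Assumes the info section contains a "Firmware Version:" line.
firmware_version() {
    smartctl -i "$1" | awk -F': *' '/^Firmware Version:/ { print $2 }'
}

# Usage (placeholder device names):
#   firmware_version /dev/sda
#   firmware_version /dev/sdb
```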

				-Marcus Watts

* Re: Intel 520/530 SSD for ceph
@ 2013-11-21  8:36             ` Stefan Priebe - Profihost AG
From: Stefan Priebe - Profihost AG @ 2013-11-21  8:36 UTC (permalink / raw)
  To: mdw; +Cc: ceph-devel@vger.kernel.org, ceph-users@lists.ceph.com

Hi,

On 21.11.2013 01:29, mdw@linuxbox.com wrote:
> On Tue, Nov 19, 2013 at 09:02:41AM +0100, Stefan Priebe wrote:
> ...
>>> You might be able to vary this behavior by experimenting with sdparm,
>>> smartctl or other tools, or possibly with different microcode in the drive.
>> Which values or which settings do you think of?
> ...
> 
> Off-hand, I don't know.  Probably the first thing would be
> to compare the configuration of your 520 & 530; anything that's
> different is certainly worth investigating.
> 
> This should display all pages,
> 	sdparm --all --long /dev/sdX
> the 520 only appears to have 3 pages, which can be fetched directly w/
> 	sdparm --page=ca --long /dev/sdX
> 	sdparm --page=co --long /dev/sdX
> 	sdparm --page=rw --long /dev/sdX
> 
> The sample machine I'm looking has an intel 520, and on ours,
> most options show as 0 except for
>   AWRE        1  [cha: n, def:  1]  Automatic write reallocation enabled
>   WCE         1  [cha: y, def:  1]  Write cache enable
>   DRA         1  [cha: n, def:  1]  Disable read ahead
>   GLTSD       1  [cha: n, def:  1]  Global logging target save disable
>   BTP        -1  [cha: n, def: -1]  Busy timeout period (100us)
>   ESTCT      30  [cha: n, def: 30]  Extended self test completion time (sec)
> Perhaps that's an interesting data point to compare with yours.
> 
> Figuring out if you have up-to-date intel firmware appears to require
> burning and running an iso image from
> https://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=18455
> 
> The results of sdparm --page=<whatever> --long /dev/sdc
> show the intel firmware, but this labels it better:
> smartctl -i /dev/sdc
> Our 520 has firmware "400i" loaded.

The firmware is up to date and all values are the same. I suspect that the
520 firmware simply ignores CMD_FLUSH commands and the 530 does not.
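
One way to probe that guess, sketched under assumptions: run the same dd
write twice, once with a single fdatasync() at the end (conv=fdatasync) and
once with O_DSYNC on every write (oflag=dsync). Assuming the kernel turns
each O_DSYNC write into a write plus a cache flush, a drive that really
honours flushes would be expected to show a large gap between the two runs,
and a drive that ignores them would not. The function name flush_cost and
its arguments are hypothetical. The runs above also passed oflag=direct,
which is left out here so the sketch also works on scratch files on
filesystems that reject O_DIRECT.

```shell
# flush_cost SRC TARGET [COUNT]
# Write COUNT blocks of 350k from SRC to TARGET twice and print dd's
# summary line for each run:
#   conv=fdatasync  one fdatasync() after the last block
#   oflag=dsync     O_DSYNC, every write waits for stable storage
# TARGET is overwritten; point it at a scratch file or an empty device.
flush_cost() (
    src=$1
    target=$2
    count=${3:-1000}
    for mode in conv=fdatasync oflag=dsync; do
        printf '%s: ' "$mode"
        dd if="$src" of="$target" bs=350k count="$count" "$mode" 2>&1 | tail -n 1
    done
)

# Usage (placeholder names; randfile.gz as in the original runs):
#   flush_cost randfile.gz /dev/sdX 10000
```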

Greets,
Stefan
