* Intel 520/530 SSD for ceph
@ 2013-11-18 13:38 Stefan Priebe - Profihost AG
[not found] ` <528A1862.7010601-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Stefan Priebe - Profihost AG @ 2013-11-18 13:38 UTC (permalink / raw)
To: ceph-devel@vger.kernel.org, ceph-users@lists.ceph.com
Hi guys,

in the past we've used Intel 520 SSDs for the ceph journal - this worked
great and our experience was good.

Now Intel has started to replace the 520 series with the new 530.

When we switched, we were surprised by the ugly performance, and I needed
some days to reproduce it.

O_DIRECT works fine for both, and the Intel SSD 530 is even faster
than the 520.

With O_DSYNC, however... see the results:

~# dd if=randfile.gz of=/dev/sda bs=350k count=10000 oflag=direct,dsync
3584000000 bytes (3,6 GB) copied, 22,287 s, 161 MB/s

~# dd if=randfile.gz of=/dev/sdb bs=350k count=10000 oflag=direct,dsync
3584000000 bytes (3,6 GB) copied, 136,505 s, 26,3 MB/s

I used a block size of 350k because my graphs show that this is the
average workload we have on the journal. I also tried fio and bigger
block sizes; the result stays the same.

Does anybody have an idea? Without dsync, both devices have about the
same performance of 260 MB/s.

Greets,
Stefan
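
The two dd runs above can also be read as a per-write latency, which is arguably the more telling number for a synchronous journal workload: each run issued 10000 writes of 350 KiB, and with oflag=dsync every write has to complete before the next one starts. A minimal sketch of that arithmetic, using only the figures from the dd output (the function name is made up for illustration):

```python
BLOCK_SIZE = 350 * 1024   # dd bs=350k means 350 KiB per write
COUNT = 10000             # dd count=10000


def per_write_stats(elapsed_s, block_size=BLOCK_SIZE, count=COUNT):
    """Return (throughput in MB/s, average latency per write in ms)."""
    total_bytes = block_size * count
    throughput_mb_s = total_bytes / elapsed_s / 1e6   # dd reports decimal MB
    latency_ms = elapsed_s / count * 1000.0
    return throughput_mb_s, latency_ms


# Elapsed times taken from the two dd runs quoted in the message.
for device, elapsed in (("/dev/sda", 22.287), ("/dev/sdb", 136.505)):
    mb_s, ms = per_write_stats(elapsed)
    print(f"{device}: {mb_s:.1f} MB/s, {ms:.2f} ms per 350k write")
```

Read this way, the gap is roughly 2.2 ms versus 13.7 ms per synchronous 350k write.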
* Re: Intel 520/530 SSD for ceph
[not found] ` <528A1862.7010601-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2013-11-18 22:51 ` mdw-Jp3n8lUXroRWk0Htik3J/w
[not found] ` <20131118225146.GA1043-Hsy7OnahZ0C224KT6AusD78MeWzc+u9DAL8bYrjMMd8@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: mdw-Jp3n8lUXroRWk0Htik3J/w @ 2013-11-18 22:51 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
You may actually be doing O_SYNC - recent kernels implement O_DSYNC,
but glibc maps O_DSYNC into O_SYNC. But since you're writing to the
block device this won't matter much.

I believe the effect of O_DIRECT by itself is just to bypass the buffer
cache, which is not going to make much difference for your dd case.
(It will mainly affect other applications that are also using the
buffer cache...)

O_SYNC should be causing the writes to block until a response
is received from the disk. Without O_SYNC, the writes will
just queue operations and return - potentially very fast.
Your dd is probably writing enough data that there is some
throttling by the system as it runs out of disk buffers and
has to wait for some previous data to be written to the drive,
but the delay for any individual block is not likely to matter.
With O_SYNC, you are measuring the delay for each block directly,
and you have absolutely removed the ability for the disk to
perform any sort of parallelism.

[It's also conceivable the kernel is sending some form of write
barrier flag to the drive, which will slow it down further,
but I can't find any kernel logic that does this at a quick glance.]

Sounds like the Intel 530 has a much larger block write latency,
but can make up for it by performing more overlapped operations.

You might be able to vary this behavior by experimenting with sdparm,
smartctl or other tools, or possibly with different microcode in the drive.

-Marcus Watts
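
The point about measuring the delay of each block directly can be illustrated with a small timing loop. This is only a sketch under stated assumptions: it writes to a scratch file rather than a block device, it leaves out O_DIRECT (which would need aligned buffers), and the function name and default sizes are chosen here for illustration. It relies on os.O_DSYNC being available, as it is on Linux.

```python
import os
import tempfile
import time


def time_writes(path, block_size=350 * 1024, count=20, dsync=False):
    """Write `count` blocks to `path`; return the per-write latencies in seconds."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if dsync:
        # With O_DSYNC each write() returns only once the data is reported stable.
        flags |= os.O_DSYNC
    block = os.urandom(block_size)
    latencies = []
    fd = os.open(path, flags, 0o600)
    try:
        for _ in range(count):
            start = time.perf_counter()
            written = 0
            while written < block_size:   # os.write may write fewer bytes than asked
                written += os.write(fd, block[written:])
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "scratch.bin")   # scratch file, not a block device
        for dsync in (False, True):
            lat = time_writes(target, dsync=dsync)
            avg_ms = sum(lat) / len(lat) * 1000.0
            print(f"dsync={dsync}: average {avg_ms:.3f} ms per write")
```

Without the flag, the writes mostly land in the page cache and return quickly; with it, each iteration includes the round trip to stable storage, so the average approximates the device's synchronous write latency.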
* Re: Intel 520/530 SSD for ceph
[not found] ` <20131118225146.GA1043-Hsy7OnahZ0C224KT6AusD78MeWzc+u9DAL8bYrjMMd8@public.gmane.org>
@ 2013-11-19 8:02 ` Stefan Priebe
[not found] ` <528B1B21.1060203-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Stefan Priebe @ 2013-11-19 8:02 UTC (permalink / raw)
To: mdw-Jp3n8lUXroRWk0Htik3J/w
Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
Hi Marcus,
On 18.11.2013 23:51, mdw-Jp3n8lUXroRWk0Htik3J/w@public.gmane.org wrote:
> On Mon, Nov 18, 2013 at 02:38:42PM +0100, Stefan Priebe - Profihost AG wrote:
> You may actually be doing O_SYNC - recent kernels implement O_DSYNC,
> but glibc maps O_DSYNC into O_SYNC. But since you're writing to the
> block device this won't matter much.
There is no difference between O_DSYNC and O_SYNC; the values are the same.
Also, I'm using kernel 3.10.19, so it is recent enough.
> I believe the effect of O_DIRECT by itself is just to bypass the buffer
> cache, which is not going to make much difference for your dd case.
> (It will mainly affect other applications that are also using the
> buffer cache...)
> O_SYNC should be causing the writes to block until a response
> is received from the disk. Without O_SYNC, the writes will
> just queue operations and return - potentially very fast.
> Your dd is probably writing enough data that there is some
> throttling by the system as it runs out of disk buffers and
> has to wait for some previous data to be written to the drive,
> but the delay for any individual block is not likely to matter.
> With O_SYNC, you are measuring the delay for each block directly,
> and you have absolutely removed the ability for the disk to
> perform any sort of parallelism.
That's correct, but ceph uses O_DSYNC for its journal and maybe for other
things, so it is important to have devices that perform well with O_DSYNC.
> Sounds like the Intel 530 has a much larger block write latency,
> but can make up for it by performing more overlapped operations.
>
> You might be able to vary this behavior by experimenting with sdparm,
> smartctl or other tools, or possibly with different microcode in the drive.
Which values or settings are you thinking of?
Greets
Stefan
* Re: Intel 520/530 SSD for ceph
[not found] ` <528B1B21.1060203-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2013-11-21 0:29 ` mdw-Jp3n8lUXroRWk0Htik3J/w
2013-11-21 8:36 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 5+ messages in thread
From: mdw-Jp3n8lUXroRWk0Htik3J/w @ 2013-11-21 0:29 UTC (permalink / raw)
To: Stefan Priebe
Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
On Tue, Nov 19, 2013 at 09:02:41AM +0100, Stefan Priebe wrote:
...
> >You might be able to vary this behavior by experimenting with sdparm,
> >smartctl or other tools, or possibly with different microcode in the drive.
> Which values or settings are you thinking of?
...
Off-hand, I don't know. Probably the first thing would be
to compare the configuration of your 520 & 530; anything that's
different is certainly worth investigating.

This should display all pages:
  sdparm --all --long /dev/sdX
The 520 only appears to have 3 pages, which can be fetched directly with
  sdparm --page=ca --long /dev/sdX
  sdparm --page=co --long /dev/sdX
  sdparm --page=rw --long /dev/sdX

The sample machine I'm looking at has an Intel 520, and on ours,
most options show as 0 except for
  AWRE    1  [cha: n, def:  1]  Automatic write reallocation enabled
  WCE     1  [cha: y, def:  1]  Write cache enable
  DRA     1  [cha: n, def:  1]  Disable read ahead
  GLTSD   1  [cha: n, def:  1]  Global logging target save disable
  BTP    -1  [cha: n, def: -1]  Busy timeout period (100us)
  ESTCT  30  [cha: n, def: 30]  Extended self test completion time (sec)
Perhaps that's an interesting data point to compare with yours.

Figuring out if you have up-to-date Intel firmware appears to require
burning and running an ISO image from
https://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=18455

The results of
  sdparm --page=<whatever> --long /dev/sdc
show the Intel firmware, but this labels it better:
  smartctl -i /dev/sdc
Our 520 has firmware "400i" loaded.
-Marcus Watts
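
Comparing the two drives' mode pages field by field is easier with a small script than by eye. The sketch below assumes the sdparm output of each drive has been saved as text and that the field lines look like the ones quoted in the message (acronym, value, then a bracket with `cha:` and `def:`); real output may be laid out differently. The function names and the second sample dump are hypothetical.

```python
import re

# Matches lines such as: "WCE 1 [cha: y, def: 1] Write cache enable"
FIELD_RE = re.compile(
    r"^\s*(?P<name>[A-Z][A-Z0-9_]*)\s+(?P<value>-?\d+)\s+"
    r"\[cha:\s*(?P<cha>[yn]),\s*def:\s*(?P<default>-?\d+)"
)


def parse_fields(text):
    """Map each field acronym to its current value; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group("name")] = int(match.group("value"))
    return fields


def diff_fields(a, b):
    """Return {name: (value_in_a, value_in_b)} for fields that differ or are missing."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    dump_520 = """
    AWRE 1 [cha: n, def: 1] Automatic write reallocation enabled
    WCE 1 [cha: y, def: 1] Write cache enable
    """
    # Hypothetical second dump, with WCE changed purely to show a difference.
    dump_other = """
    AWRE 1 [cha: n, def: 1] Automatic write reallocation enabled
    WCE 0 [cha: y, def: 1] Write cache enable
    """
    print(diff_fields(parse_fields(dump_520), parse_fields(dump_other)))
```

An empty result means the parsed fields are identical on both drives, in which case the difference has to lie somewhere the mode pages do not expose.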
* Re: Intel 520/530 SSD for ceph
2013-11-21 0:29 ` mdw-Jp3n8lUXroRWk0Htik3J/w
@ 2013-11-21 8:36 ` Stefan Priebe - Profihost AG
0 siblings, 0 replies; 5+ messages in thread
From: Stefan Priebe - Profihost AG @ 2013-11-21 8:36 UTC (permalink / raw)
To: mdw; +Cc: ceph-devel@vger.kernel.org, ceph-users@lists.ceph.com
Hi,
The firmware is up to date and all values are the same. I suspect that the
520 firmware simply ignores CMD_FLUSH commands and the 530 does not.
Greets,
Stefan