* poor OSD performance using kernel 3.4
@ 2012-05-24 14:10 Stefan Priebe - Profihost AG
2012-05-24 14:57 ` Mark Nelson
` (2 more replies)
0 siblings, 3 replies; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-24 14:10 UTC (permalink / raw)
To: ceph-devel@vger.kernel.org
Hi list,
today, while testing btrfs, I discovered very poor OSD performance using
kernel 3.4.
The underlying FS is XFS, but the result is the same with btrfs.
3.0.30:
~# rados -p data bench 10 write -t 16
Maintaining 16 concurrent writes of 4194304 bytes for at least 10 seconds.
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        41        25   99.9767       100  0.586984  0.447293
    2      16        71        55   109.979       120  0.934388  0.488375
    3      16        99        83   110.647       112   1.15982  0.503111
    4      16       130       114   113.981       124   1.05952  0.516925
    5      16       159       143   114.382       116  0.149313  0.510734
    6      16       188       172   114.649       116  0.287166   0.52203
    7      16       215       199   113.697       108  0.151784  0.531461
    8      16       242       226   112.984       108  0.623478  0.539896
    9      16       265       249   110.651        92   0.50354  0.538504
   10      16       296       280   111.984       124  0.155048  0.542846
Total time run: 10.776153
Total writes made: 297
Write size: 4194304
Bandwidth (MB/sec): 110.243
Average Latency: 0.577534
Max latency: 1.85499
Min latency: 0.091473
3.4:
~# rados -p data bench 10 write -t 16
Maintaining 16 concurrent writes of 4194304 bytes for at least 10 seconds.
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        40        24   95.9794        96  0.393196  0.455936
    2      16        68        52   103.983       112  0.835652  0.517297
    3      16        85        69   91.9849        68   1.00535  0.493058
    4      16        96        80   79.9869        44  0.096564  0.577948
    5      16       103        87   69.5879        28  0.092722  0.589147
    6      16       117       101   67.3216        56  0.222175  0.675334
    7      16       130       114   65.1321        52   0.15677  0.623806
    8      16       144       128   63.9896        56  0.089157   0.56746
    9      16       144       128   56.8794         0         -   0.56746
   10      16       144       128   51.1912         0         -   0.56746
   11      16       144       128   46.5373         0         -   0.56746
   12      16       144       128   42.6591         0         -   0.56746
   13      16       144       128   39.3776         0         -   0.56746
   14      16       144       128   36.5649         0         -   0.56746
   15      16       144       128   34.1272         0         -   0.56746
   16      16       145       129   32.2443       0.5   11.3422  0.650985
Total time run: 16.193871
Total writes made: 145
Write size: 4194304
Bandwidth (MB/sec): 35.816
Average Latency: 1.78467
Max latency: 14.4744
Min latency: 0.088753
Stefan
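The stall in the 3.4 run shows up in the cur MB/s column, which sits at 0 from second 9 through second 15. Below is a minimal Python sketch for pulling those seconds out of pasted rados bench output. The helper name stalled_seconds and the assumption that every per-second row has exactly eight whitespace-separated fields are illustrative only; they are not part of rados.

```python
def stalled_seconds(bench_output):
    """Return the 'sec' values whose 'cur MB/s' column is 0 (ignoring the initial row)."""
    stalled = []
    for line in bench_output.splitlines():
        fields = line.split()
        # Per-second rows carry 8 columns: sec, cur ops, started, finished,
        # avg MB/s, cur MB/s, last lat, avg lat. Header and summary lines do not.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        sec = int(fields[0])
        cur_mb_s = float(fields[5])
        if sec > 0 and cur_mb_s == 0:
            stalled.append(sec)
    return stalled


# Rows 8 to 16 of the kernel 3.4 run quoted above.
sample_3_4 = """\
8 16 144 128 63.9896 56 0.089157 0.56746
9 16 144 128 56.8794 0 - 0.56746
10 16 144 128 51.1912 0 - 0.56746
11 16 144 128 46.5373 0 - 0.56746
12 16 144 128 42.6591 0 - 0.56746
13 16 144 128 39.3776 0 - 0.56746
14 16 144 128 36.5649 0 - 0.56746
15 16 144 128 34.1272 0 - 0.56746
16 16 145 129 32.2443 0.5 11.3422 0.650985
"""

print(stalled_seconds(sample_3_4))  # [9, 10, 11, 12, 13, 14, 15]
```

Run against the 3.0.30 output, the same helper should return an empty list, since cur MB/s never reaches 0 there.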
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-24 14:10 poor OSD performance using kernel 3.4 Stefan Priebe - Profihost AG
@ 2012-05-24 14:57 ` Mark Nelson
[not found] ` <CAJCPpW+SKnnVUaDEAsCkKyZwMVrHCRJF2C8zqB4eORgwW5p=1Q@mail.gmail.com>
2012-05-29 22:25 ` poor OSD performance using kernel 3.4 Mark Nelson
2 siblings, 0 replies; 73+ messages in thread
From: Mark Nelson @ 2012-05-24 14:57 UTC (permalink / raw)
Cc: ceph-devel@vger.kernel.org
Hi Stefan,
Were these both tested on fresh filesystems? If you still have any
3.0.30 available, could you try a couple of longer running tests (say 5
minutes) and see how they compare?
Thanks,
Mark
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
[not found] ` <4FBE7ABC.5020502@profihost.ag>
@ 2012-05-24 18:53 ` Mark Nelson
2012-05-24 19:05 ` Stefan Priebe
0 siblings, 1 reply; 73+ messages in thread
From: Mark Nelson @ 2012-05-24 18:53 UTC (permalink / raw)
To: Stefan Priebe; +Cc: ceph-devel@vger.kernel.org
Hi Stefan,
Thanks for the info! I've been testing on 3.4 for the last couple of
days but haven't run into that problem here. It looks like your journal
has writes going to it quickly and then things stall as it tries to
write out to your data disk. I wonder if any of the data actually makes
it to the disk... Can you run iostat or collectl or something and see
what kind of write throughput you get to the OSD data disks?
Thanks,
Mark
On 05/24/2012 01:15 PM, Stefan Priebe wrote:
>
> On 24.05.2012 16:55, Mark Nelson wrote:
>> Hi Stefan,
>>
>> Were these both tested on fresh filesystems? If you still have any
>> 3.0.30 available, could you try a couple of longer running tests (say 5
>> minutes) and see how they compare?
>
> Yes, with 3.4 it totally stalls. Tested with XFS and btrfs. The client
> always had the same kernel, so I only changed the kernel on the OSD side.
>
> Kernel 3.4
> http://pastebin.com/raw.php?i=CApKbSNj
>
> Kernel 3.0.30
> http://pastebin.com/raw.php?i=kZ7rnwcM
>
> Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-24 18:53 ` Mark Nelson
@ 2012-05-24 19:05 ` Stefan Priebe
2012-05-25 1:53 ` Mark Nelson
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe @ 2012-05-24 19:05 UTC (permalink / raw)
To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org
On 24.05.2012 20:53, Mark Nelson wrote:
> Hi Stefan,
>
> Thanks for the info! I've been testing on 3.4 for the last couple of
> days but haven't run into that problem here. It looks like your journal
> has writes going to it quickly and then things stall as it tries to
> write out to your data disk.
That's a good point. Right now, while testing, I'm using a tmpfs ramdisk
for the journal and have set journal dio = false in ceph.conf. Might
this be the difference / problem?
3.2.18 works fine too.
> I wonder if any of the data actually makes
> it to the disk... Can you run iostat or collectl or something and see
> what kind of write throughput you get to the OSD data disks?
None... so it seems the data never gets transferred from the journal to disk.
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-24 19:05 ` Stefan Priebe
@ 2012-05-25 1:53 ` Mark Nelson
2012-05-25 8:19 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 73+ messages in thread
From: Mark Nelson @ 2012-05-25 1:53 UTC (permalink / raw)
To: Stefan Priebe; +Cc: ceph-devel@vger.kernel.org
On 05/24/2012 02:05 PM, Stefan Priebe wrote:
> On 24.05.2012 20:53, Mark Nelson wrote:
>> Hi Stefan,
>>
>> Thanks for the info! I've been testing on 3.4 for the last couple of
>> days but haven't run into that problem here. It looks like your journal
>> has writes going to it quickly and then things stall as it tries to
>> write out to your data disk.
> That's a good point. Right now, while testing, I'm using a tmpfs ramdisk
> for the journal and have set journal dio = false in ceph.conf. Might
> this be the difference / problem?
>
> 3.2.18 works fine too.
Honestly I don't know if tmpfs journal with dio = false would lead to
that kind of behavior. Anything interesting in the logs if you turn
debugging up?
>
>> I wonder if any of the data actually makes
>> it to the disk... Can you run iostat or collectl or something and see
>> what kind of write throughput you get to the OSD data disks?
> None... so it seems the data never gets transferred from the journal to disk.
This might be a stupid question, but writes to those partitions work
outside of Ceph with the new kernel, right?
>
> Stefan
Thanks,
Mark
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-25 1:53 ` Mark Nelson
@ 2012-05-25 8:19 ` Stefan Priebe - Profihost AG
2012-05-25 11:31 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-25 8:19 UTC (permalink / raw)
To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org
On 25.05.2012 03:53, Mark Nelson wrote:
> On 05/24/2012 02:05 PM, Stefan Priebe wrote:
>> 3.2.18 works fine too.
>
> Honestly I don't know if tmpfs journal with dio = false would lead to
> that kind of behavior. Anything interesting in the logs if you turn
> debugging up?
Just stuff like this. But writing to the OSD disk works - no idea why I
saw a rate of 0 yesterday.
[INF] 2.2a scrub ok
2012-05-25 10:01:00.825442 pg v165: 768 pgs: 768 active+clean; 592 MB data, 1181 MB used, 669 GB / 670 GB avail
2012-05-25 10:01:00.623252 osd.0 10.0.255.100:6800/7423 121 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.042783 secs
2012-05-25 10:01:00.623259 osd.0 10.0.255.100:6800/7423 122 : [WRN] slow request 30.042783 seconds old, received at 2012-05-25 10:00:30.580392: osd_op(client.4111.0:74 proxmox1_154826_object73 [write 0~4194304] 0.5343bcc6) v4 currently waiting for sub ops
>>> I wonder if any of the data actually makes
>>> it to the disk... Can you run iostat or collectl or something and see
>>> what kind of write throughput you get to the OSD data disks?
>> None... so it seems the data never gets transferred from the journal to disk.
>
> This might be a stupid question, but writes to those partitions work
> outside of Ceph with the new kernel right?
I just tested with dd:
dd if=/dev/zero of=/srv/test bs=1M count=10000 oflag=direct
This gave me a constant rate of 240MB/s on ALL OSDs.
Also, "ceph osd tell X bench" shows 260MB/s on all OSDs.
But when I run the rados bench, I see the same behaviour on XFS and btrfs:
the cur MB/s value swings heavily up and down for the duration of the
run.
See:
XFS:
http://pastebin.com/raw.php?i=8ahaePZw
btrfs:
http://pastebin.com/raw.php?i=BrwSC1yg
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-25 8:19 ` Stefan Priebe - Profihost AG
@ 2012-05-25 11:31 ` Stefan Priebe - Profihost AG
2012-05-25 12:10 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-25 11:31 UTC (permalink / raw)
To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org
Some speed tests with different kernel versions. The same applies to
other filesystems such as btrfs.
I used "rados -p data bench 100 write -t 16" for all tests, each on a freshly
created FS. Mount options were always: noatime,nodiratime,nobarrier.
3.0.30 with XFS
speed is always between 120 and 160MB/s
Total time run: 100.510061
Total writes made: 3605
Write size: 4194304
Bandwidth (MB/sec): 143.468
Average Latency: 0.445714
Max latency: 1.99929
Min latency: 0.084812
3.2.18 with XFS
speed is between 40 and 170MB/s
Total time run: 100.795653
Total writes made: 3384
Write size: 4194304
Bandwidth (MB/sec): 134.292
Average Latency: 0.476297
Max latency: 2.92075
Min latency: 0.084884
3.3.7 with XFS
!! speed heavily jumps between 0 and 170 MB/s !!
Total time run: 107.398166
Total writes made: 2455
Write size: 4194304
Bandwidth (MB/sec): 91.435
Average Latency: 0.699819
Max latency: 13.8117
Min latency: 0.084624
3.4 with XFS
!! speed heavily jumps between 0 and 130 MB/s - most of the time it's
near 0 !!
Total time run: 115.433531
Total writes made: 468
Write size: 4194304
Bandwidth (MB/sec): 16.217
Average Latency: 3.9452
Max latency: 53.4356
Min latency: 0.091276
Stefan
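The reported bandwidth figures can be cross-checked from the other two summary lines: total writes times write size, divided by total run time. A minimal Python sketch follows, assuming the "MB/sec" in the summary means MiB (2^20 bytes) per second, which is what the numbers above appear to bear out. The names runs and bandwidth_mib_s are illustrative.

```python
MIB = 1024 * 1024

# kernel: (total writes made, total time run in seconds, reported bandwidth)
runs = {
    "3.0.30": (3605, 100.510061, 143.468),
    "3.2.18": (3384, 100.795653, 134.292),
    "3.3.7": (2455, 107.398166, 91.435),
    "3.4": (468, 115.433531, 16.217),
}


def bandwidth_mib_s(writes, seconds, write_size=4194304):
    """Average bandwidth in MiB/s for `writes` objects of `write_size` bytes."""
    return writes * write_size / MIB / seconds


baseline = bandwidth_mib_s(*runs["3.0.30"][:2])
for kernel, (writes, seconds, reported) in runs.items():
    bw = bandwidth_mib_s(writes, seconds)
    print(f"{kernel:>7}: {bw:8.3f} MiB/s (reported {reported}), "
          f"{bw / baseline:.1%} of the 3.0.30 run")
```

By this calculation the 3.4 run delivers roughly 11% of the 3.0.30 throughput, while 3.2.18 is still at about 94%.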
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-25 11:31 ` Stefan Priebe - Profihost AG
@ 2012-05-25 12:10 ` Stefan Priebe - Profihost AG
2012-05-25 15:47 ` Alexandre DERUMIER
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-25 12:10 UTC (permalink / raw)
To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org
Even with v3.3-rc1 the speed is pretty often 0.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-25 12:10 ` Stefan Priebe - Profihost AG
@ 2012-05-25 15:47 ` Alexandre DERUMIER
2012-05-27 9:11 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 73+ messages in thread
From: Alexandre DERUMIER @ 2012-05-25 15:47 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel, Mark Nelson
Hi Stefan,
Do you see the same performance with reads?
Have you collected any iostat output?
How long does it take to flush from the journal to the disks?
--
Alexandre Derumier
Systems Engineer
Landline: 03 20 68 88 90
Fax: 03 20 68 90 81
45 Bvd du Général Leclerc 59100 Roubaix - France
12 rue Marivaux 75002 Paris - France
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-25 15:47 ` Alexandre DERUMIER
@ 2012-05-27 9:11 ` Stefan Priebe - Profihost AG
2012-05-27 11:33 ` Alexandre DERUMIER
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-27 9:11 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: ceph-devel, Mark Nelson
Can really nobody help?
On 25.05.2012 17:47, Alexandre DERUMIER wrote:
> Hi Stefan,
> Do you see the same performance with reads?
Reads are fine on both versions; see here:
3.0.30
Write:
Total time run: 30.872357
Total writes made: 1095
Write size: 4194304
Bandwidth (MB/sec): 141.874
Average Latency: 0.450187
Max latency: 2.00672
Min latency: 0.091783
Read:
Total time run: 22.907021
Total reads made: 1095
Read size: 4194304
Bandwidth (MB/sec): 191.208
Average Latency: 0.333954
Max latency: 1.71987
Min latency: 0.041373
3.4.0
Write:
Total time run: 124.573247
Total writes made: 647
Write size: 4194304
Bandwidth (MB/sec): 20.775
Average Latency: 3.08058
Max latency: 65.2522
Min latency: 0.089587
Read:
Total time run: 13.191562
Total reads made: 647
Read size: 4194304
Bandwidth (MB/sec): 196.186
Average Latency: 0.322895
Max latency: 1.22392
Min latency: 0.043784
> Have you collected any iostat output?
Yes - I/O jumps heavily between 0 and 60MB/s, but most of the time it's 0
or around 10MB/s.
> How long does it take to flush from the journal to the disks?
I don't know how to measure this, as Ceph starts to write to the journal
and the disk in parallel, and tmpfs isn't even shown in iostat.
Greetings,
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-27 9:11 ` Stefan Priebe - Profihost AG
@ 2012-05-27 11:33 ` Alexandre DERUMIER
2012-05-27 18:57 ` Stefan Priebe
0 siblings, 1 reply; 73+ messages in thread
From: Alexandre DERUMIER @ 2012-05-27 11:33 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel, Mark Nelson
> How long does it take to flush from the journal to the disks?
>> I don't know how to measure this.
Run iostat: you should see a period of write inactivity on the disk (data is being written to the journal), followed by a period of write activity on the disk (data is being flushed from the journal to disk).
>> As ceph starts to write to journal and
>> disk in parallel
This is strange. From the docs:
http://ceph.com/wiki/OSD_journal
the journal mode should be write-ahead with XFS.
So writes go to the journal first and are then flushed to disk every 30 seconds.
Maybe your tmpfs is too small, and a flush occurs when the journal is 50% full.
If, for example, a flush occurs every 1 or 2 seconds, this can cause very slow writes.
>> and tmpfs isn't even shown in iostat.
Indeed, iostat doesn't work with tmpfs...
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-27 11:33 ` Alexandre DERUMIER
@ 2012-05-27 18:57 ` Stefan Priebe
2012-05-28 5:37 ` Alexandre DERUMIER
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe @ 2012-05-27 18:57 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: ceph-devel, Mark Nelson
On 27.05.2012 13:33, Alexandre DERUMIER wrote:
>> How long does it take to flush from the journal to the disks?
>>> I don't know how to measure this.
> Run iostat: you should see a period of write inactivity on the disk (data is being written to the journal), followed by
> a period of write activity on the disk (data is being flushed from the journal to disk).
No, it always starts in parallel. The journal is set to 1GB. I've now moved
the journal to disk, so I can use iostat.
>>> As ceph starts to write to journal and
>>> disk in parallel
>
> This is strange. From the docs:
> http://ceph.com/wiki/OSD_journal
>
> the journal mode should be write-ahead with XFS.
> So writes go to the journal first and are then flushed to disk every 30 seconds.
I'm not quite sure as:
http://ceph.com/wiki/Ceph.conf#filestore_journal_writeahead
says there are two options:
filestore journal writeahead
and
filestore journal parallel
but even
filestore journal writeahead = 1
filestore journal parallel = 0
results in a parallel start.
> Maybe your tmpfs is too small, and a flush occurs when the journal is 50% full.
> If, for example, a flush occurs every 1 or 2 seconds, this can cause very slow writes.
1GB? My 1Gbit/s LAN test connection can't handle more than about
120MB/s. So there's room for at least 8-10s.
;-(
Stefan
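The 8-10s estimate is just journal size divided by ingest rate. A minimal Python sketch of that arithmetic, assuming the 1GB journal is 1024 MB and the client sustains the full 120MB/s; the function name seconds_to_fill is illustrative.

```python
def seconds_to_fill(journal_mb, ingest_mb_s, fraction=1.0):
    """Seconds until `fraction` of an empty journal is filled at a constant ingest rate."""
    return journal_mb * fraction / ingest_mb_s


full = seconds_to_fill(1024, 120)                # whole journal
half = seconds_to_fill(1024, 120, fraction=0.5)  # the 50% mark mentioned above

print(f"full journal: {full:.2f} s, half journal: {half:.2f} s")
```

This ignores any data drained to disk in the meantime, so it is a lower bound on the time before the journal could fill.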
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-27 18:57 ` Stefan Priebe
@ 2012-05-28 5:37 ` Alexandre DERUMIER
2012-05-28 6:25 ` Stefan Priebe
0 siblings, 1 reply; 73+ messages in thread
From: Alexandre DERUMIER @ 2012-05-28 5:37 UTC (permalink / raw)
To: Stefan Priebe; +Cc: ceph-devel, Mark Nelson
I think filestore journal parallel works only with btrfs.
Other filesystems use writeahead.
If you write at 120MB/s, your 1GB journal is 50% full in about 4 seconds.
So you get around 480MB every 4 seconds. Can your disks flush these 480MB sequentially in less than 4 seconds?
(Run a small benchmark of your disk on a local filesystem, without Ceph.)
If not, you can get spikes in your write stats if the journal fills up.
Simple schema if the disks are not fast enough:
0-4sec
------
random write (first wave, 480MB) ---> journal
4-8sec
------
random write (second wave) ---> journal ---> flush of first wave (480MB) ---> disks
8-12sec
-------
random write (third wave) blocked ---> journal ---> write of second wave blocked ---> flush of first wave (480MB) not yet finished ---> disks
Good schema
-----------
0-4sec
------
random write (first wave, 480MB) ---> journal
4-8sec
------
random write (second wave) ---> journal ---> flush of first wave (480MB) ---> disks
8-12sec
-------
random write (third wave) ---> journal ---> write of second wave (480MB) ---> disks
So, with a bigger journal, you have more data to write to the disks, and you can write more data sequentially in one flush.
4 seconds seems very low; you need 20-30 seconds between flushes.
How many disks (7.2K) do you have per OSD?
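The two schemas can be played through with a toy model: a fixed-size journal that clients fill at one rate and the data disk drains at another. This Python sketch is a deliberate simplification and does not claim to reflect Ceph's actual flush logic. The one-second steps, the constant rates and the function name simulate are all assumptions made for illustration.

```python
def simulate(journal_mb, ingest_mb_s, drain_mb_s, seconds):
    """Toy model: return the MB accepted from clients in each one-second step."""
    used = 0.0
    accepted = []
    for _ in range(seconds):
        # The disk first drains part of what the journal already holds.
        used -= min(used, drain_mb_s)
        # Clients can only write into whatever journal space is free.
        took = min(ingest_mb_s, journal_mb - used)
        used += took
        accepted.append(took)
    return accepted


fast_disk = simulate(1024, 120, 250, 30)  # disk outpaces the network
slow_disk = simulate(1024, 120, 60, 30)   # disk slower than the network

print("fast disk, last second:", fast_disk[-1], "MB")
print("slow disk, last second:", slow_disk[-1], "MB")
```

In this model a slow disk caps client throughput at the drain rate once the journal is full, but never pushes it to 0. Seeing 0 MB/s for seconds at a time therefore points to the drain itself stalling rather than to journal sizing alone.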
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-28 5:37 ` Alexandre DERUMIER
@ 2012-05-28 6:25 ` Stefan Priebe
2012-05-28 6:52 ` Alexandre DERUMIER
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe @ 2012-05-28 6:25 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: ceph-devel, Mark Nelson
On 28.05.2012 07:37, Alexandre DERUMIER wrote:
> I think filestore journal parallel works only with btrfs.
> Other filesystems use writeahead.
... you might be right, but I can't change Ceph's implementation.
> If you write at 120MB/s, your 1GB journal is 50% full in about 4 seconds.
>
> So you get around 480MB every 4 seconds. Can your disks flush these 480MB sequentially in less than 4 seconds?
> (Run a small benchmark of your disk on a local filesystem, without Ceph.)
>
> If not, you can get spikes in your write stats if the journal fills up.
>
> Simple schema if the disks are not fast enough:
I totally agree with you, but this is just a test setup AND if you have
a big log file to copy, let's say 100GB, your journal will never be big
enough, and the speed should still never drop to 0MB/s. Also, I see the correct
behaviour with 3.0.X, where the speed is capped at that of the underlying device.
So I still see no reason why with 3.4 the speed drops to 0MB/s and is
mostly 10-20MB/s instead of 130MB/s.
> How many disks (7.2K) do you have per OSD?
One Intel 520 SSD per OSD.
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-28 6:25 ` Stefan Priebe
@ 2012-05-28 6:52 ` Alexandre DERUMIER
2012-05-28 19:48 ` Stefan Priebe
0 siblings, 1 reply; 73+ messages in thread
From: Alexandre DERUMIER @ 2012-05-28 6:52 UTC (permalink / raw)
To: Stefan Priebe; +Cc: ceph-devel, Mark Nelson
> I think filestore journal parallel works only with btrfs.
> Other filesystems use writeahead.
>> ... you might be right, but I can't change Ceph's implementation.
See my schema.
I think you see parallel writes because the flush of the first wave to disk happens at the same time
as the second wave is written to the journal.
>> I totally agree with you, but this is just a test setup AND if you have
>> a big log file to copy, let's say 100GB, your journal will never be big
>> enough, and the speed should still never drop to 0MB/s. Also, I see the correct
>> behaviour with 3.0.X, where the speed is capped at that of the underlying device.
>> So I still see no reason why with 3.4 the speed drops to 0MB/s and is
>> mostly 10-20MB/s instead of 130MB/s.
Maybe something is wrong with 3.4 that makes your disk write more slowly (XFS bug, SATA controller driver bug, ...).
In my schema, that would be slow enough for the third wave to block on the journal (hence 0MB/s).
Maybe a local benchmark of your SSD under 3.4 can give some hints?
>> How many disks (7.2K) do you have per OSD?
>>> One Intel 520 SSD per OSD.
I have seen benchmarks on the internet showing about 150-300MB/s (depending on the block size).
Something must be wrong; I think running a local benchmark can really help.
You can use sysbench-tools:
https://github.com/tsuna/sysbench-tools
It compares benchmark runs and produces nice graphs.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-28 6:52 ` Alexandre DERUMIER
@ 2012-05-28 19:48 ` Stefan Priebe
2012-05-29 3:54 ` Alexandre DERUMIER
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe @ 2012-05-28 19:48 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: ceph-devel, Mark Nelson
On 28.05.2012 08:52, Alexandre DERUMIER wrote:
>> I think filestore journal parallel works only with btrfs.
>> Other filesystems use writeahead.
>>> ... you might be right, but I can't change Ceph's implementation.
>
> See my schema.
> I think you see parallel writes because the flush of the first wave to disk happens at the same time
> as the second wave is written to the journal.
Yes, I fully understand and agree, but this should still at least
result in a constant bandwidth near the maximum of the underlying disk.
>>> I totally agree with you, but this is just a test setup. And if you have
>>> a big log file to copy, let's say 100GB, your journal will never be big
>>> enough, and the speed should never drop to 0MB/s. Also, I see the correct
>>> behaviour with 3.0.X, where the speed is capped by the underlying device.
>>> So I still see no reason why with 3.4 the speed drops to 0MB/s and is
>>> mostly 10-20MB/s instead of 130MB/s.
>
> Maybe something is wrong with 3.4, so your disk writes more slowly (XFS bug, SATA driver/controller bug, ...).
This happens with ext4 and btrfs too.
Sequential write speed to the FS is practically the same under 3.0 and 3.4 using
oflag=direct.
3.4:
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 41,4899 s, 253 MB/s
3.0:
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 40,861 s, 257 MB/s
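As a quick cross-check of the two dd results, the sketch below recomputes the throughput from the byte count and elapsed time. It assumes dd reports decimal megabytes (10^6 bytes) and that the comma in the elapsed time is a locale decimal separator; the helper name is hypothetical.

```python
def dd_rate_mb_s(nbytes, seconds_text):
    """Return throughput in decimal MB/s, as dd reports it.

    seconds_text may use a decimal comma (e.g. "41,4899"), which is how
    dd printed the elapsed time in the output above.
    """
    seconds = float(seconds_text.replace(",", "."))
    return nbytes / seconds / 1_000_000


rate_34 = dd_rate_mb_s(10485760000, "41,4899")
rate_30 = dd_rate_mb_s(10485760000, "40,861")
diff_pct = (rate_30 - rate_34) / rate_30 * 100

print(f"3.4: {rate_34:.0f} MB/s, 3.0: {rate_30:.0f} MB/s, difference {diff_pct:.1f}%")
```

The difference between the two kernels in this direct-I/O test is under 2%, far too small to account for the drops seen in rados bench.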
> Maybe a local benchmark of your SSD with 3.4 would give some hints?
>>> How many disks (7.2K) do you have per OSD?
>>>> One Intel 520 SSD per OSD.
>
> I see benchmarks on the internet showing about 150-300MB/s (depending on the block size).
The OSD bench shows around 260MB/s.
"ceph osd tell X bench" shows a speed of 260MB/s under both kernels,
which corresponds to the dd results above.
> Something must be wrong. Running a local benchmark would really help, I think.
> You can use sysbench-tools:
> https://github.com/tsuna/sysbench-tools
> It compares benchmark runs and produces nice graphs.
Thanks, hopefully I'll find something.
Stefan
* Re: poor OSD performance using kernel 3.4
2012-05-28 19:48 ` Stefan Priebe
@ 2012-05-29 3:54 ` Alexandre DERUMIER
2012-05-29 8:22 ` Stefan Priebe - Profihost AG
2012-05-29 9:46 ` Stefan Priebe - Profihost AG
0 siblings, 2 replies; 73+ messages in thread
From: Alexandre DERUMIER @ 2012-05-29 3:54 UTC (permalink / raw)
To: Stefan Priebe; +Cc: ceph-devel, Mark Nelson
>> This happens with ext4 or btrfs too.
Maybe this is related to the I/O scheduler?
Have you compared the cfq, deadline, and noop schedulers?
noop should be fast with SSDs.
Also, what is your SAS/SATA controller?
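For reference, the active scheduler of a block device can be read from sysfs. The sketch below parses the bracketed entry in that file; the device name "sda" in the usage note is only an example, and both helper names are hypothetical.

```python
from pathlib import Path


def active_scheduler(sysfs_text):
    """Return the active scheduler from a sysfs 'queue/scheduler' string.

    The kernel lists all available schedulers and marks the active one
    with square brackets, e.g. "noop deadline [cfq]".
    """
    for name in sysfs_text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def read_scheduler(device):
    """Read the scheduler line for a block device such as "sda"."""
    return Path(f"/sys/block/{device}/queue/scheduler").read_text().strip()


print(active_scheduler("noop deadline [cfq]"))
```

On a live system, `active_scheduler(read_scheduler("sda"))` would report the current setting. Switching is done by writing a scheduler name to the same file as root, which makes it cheap to repeat the rados bench run under each scheduler.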
* Re: poor OSD performance using kernel 3.4
2012-05-29 3:54 ` Alexandre DERUMIER
@ 2012-05-29 8:22 ` Stefan Priebe - Profihost AG
2012-05-29 13:01 ` Alexandre DERUMIER
2012-05-29 9:46 ` Stefan Priebe - Profihost AG
1 sibling, 1 reply; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-29 8:22 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: ceph-devel, Mark Nelson
On 29.05.2012 05:54, Alexandre DERUMIER wrote:
>>> This happens with ext4 or btrfs too.
>
> Maybe this is related to the I/O scheduler?
> Have you compared the cfq, deadline, and noop schedulers?
This is something I will consider for performance tuning later on, once
everything is running smoothly. Right now I'm using CFQ with the tuned IBM
settings (which Proxmox uses too).
Here are some outputs of basic fio tests running on 3.4 and 3.0.
3.4: http://pastebin.com/raw.php?i=6GEKsCYH
3.0: http://pastebin.com/raw.php?i=FU4AtUck
Strangely, 3.4 is faster, but this matches the fact that normal
disk I/O works fine with 3.4. It's just Ceph that isn't working well.
> Also, what is your SAS/SATA controller?
Intel onboard SATA controller in this test setup.
Stefan
* Re: poor OSD performance using kernel 3.4
2012-05-29 3:54 ` Alexandre DERUMIER
2012-05-29 8:22 ` Stefan Priebe - Profihost AG
@ 2012-05-29 9:46 ` Stefan Priebe - Profihost AG
2012-05-29 13:39 ` Yann Dupont
1 sibling, 1 reply; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-29 9:46 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: ceph-devel, Mark Nelson
It would be really nice if somebody from Inktank could comment on this whole
situation.
Thanks!
Stefan
* Re: poor OSD performance using kernel 3.4
2012-05-29 8:22 ` Stefan Priebe - Profihost AG
@ 2012-05-29 13:01 ` Alexandre DERUMIER
2012-05-29 14:18 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 73+ messages in thread
From: Alexandre DERUMIER @ 2012-05-29 13:01 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel, Mark Nelson
The fio benchmark gives you raw device performance, bypassing the filesystem.
So maybe the problem is in XFS or the Linux VFS layer.
I think you need to benchmark the filesystem to compare performance.
* Re: poor OSD performance using kernel 3.4
2012-05-29 9:46 ` Stefan Priebe - Profihost AG
@ 2012-05-29 13:39 ` Yann Dupont
2012-05-29 14:43 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 73+ messages in thread
From: Yann Dupont @ 2012-05-29 13:39 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel
On 29/05/2012 11:46, Stefan Priebe - Profihost AG wrote:
> It would be really nice if somebody from Inktank could comment on this whole
> situation.
>
Hello.
I think I have the same bug:
My setup has 8 OSD nodes, 3 MDS (1 active) & 3 MON.
All my machines run Debian with a custom 3.4.0 kernel. Ceph is
0.47.2-1~bpo60+1 (Debian package).
root@label5:~# rados -p data bench 20 write -t 16
Maintaining 16 concurrent writes of 4194304 bytes for at least 20 seconds.
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 99 83 331.9 332 0.059756 0.0946512
2 16 141 125 249.946 168 0.049822 0.212338
3 16 166 150 199.963 100 0.057352 0.257179
4 16 227 211 210.965 244 0.043592 0.265005
5 16 257 241 192.767 120 0.040883 0.276718
6 16 260 244 162.641 12 1.59593 0.293439
7 16 319 303 173.118 236 0.056913 0.357856
8 16 348 332 165.976 116 0.052954 0.332424
9 16 348 332 147.535 0 - 0.332424
10 16 472 456 182.374 248 0.038543 0.343745
11 16 485 469 170.522 52 0.040475 0.347328
12 16 485 469 156.312 0 - 0.347328
13 16 517 501 154.133 64 0.047759 0.378595
14 16 562 546 155.98 180 0.042814 0.395036
15 16 563 547 145.847 4 0.045834 0.394398
16 16 563 547 136.732 0 - 0.394398
17 16 563 547 128.689 0 - 0.394398
18 16 667 651 144.648 138.667 0.06501 0.440847
19 16 703 687 144.613 144 0.040772 0.421935
min lat: 0.030505 max lat: 5.05834 avg lat: 0.421935
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 16 703 687 137.382 0 - 0.421935
21 16 704 688 131.031 2 2.65675 0.425184
22 14 704 690 125.439 8 3.26857 0.433417
Total time run: 22.042041
Total writes made: 704
Write size: 4194304
Bandwidth (MB/sec): 127.756
Average Latency: 0.498932
Max latency: 5.05834
Min latency: 0.030505
What puzzles me is what happens if I test with the rbd pool instead:
root@label5:~# rados -p rbd bench 20 write -t 16
Maintaining 16 concurrent writes of 4194304 bytes for at least 20 seconds.
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 191 175 699.782 700 0.236737 0.0841979
2 16 397 381 761.837 824 0.065643 0.0813094
3 16 602 586 781.193 820 0.07921 0.0808584
4 16 815 799 798.88 852 0.066597 0.0785906
5 16 1026 1010 807.885 844 0.10364 0.0785475
6 16 1249 1233 821.886 892 0.069324 0.0773951
7 16 1461 1445 825.608 848 0.053176 0.0770628
8 16 1680 1664 831.895 876 0.09612 0.0765263
9 16 1897 1881 835.891 868 0.100736 0.0761617
10 16 2105 2089 835.491 832 0.114913 0.0761897
11 16 2329 2313 840.983 896 0.042009 0.0758589
12 16 2553 2537 845.559 896 0.07017 0.0754364
13 16 2786 2770 852.203 932 0.066365 0.0749136
14 16 3009 2993 855.041 892 0.06491 0.0746046
15 16 3228 3212 856.431 876 0.05698 0.0745573
16 16 3437 3421 855.148 836 0.062162 0.0746339
17 16 3652 3636 855.428 860 0.140451 0.074534
18 16 3878 3862 858.121 904 0.081505 0.0743125
19 16 4106 4090 860.952 912 0.079922 0.0742146
min lat: 0.032342 max lat: 0.63151 avg lat: 0.0741575
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 16 4324 4308 861.495 872 0.06199 0.0741575
Total time run: 20.102264
Total writes made: 4325
Write size: 4194304
Bandwidth (MB/sec): 860.600
Average Latency: 0.0743131
Max latency: 0.63151
Min latency: 0.032342
As you can see, the bandwidth is much more stable with this pool.
I understand the data & rbd pools probably don't use the same internals, but
is this difference expected?
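To compare runs like the two above more objectively, the per-second rows can be scanned for seconds in which no write completed. This is a minimal sketch that assumes the eight-column layout shown in this thread; the function names are hypothetical and the sample rows are copied from the data pool run.

```python
def parse_bench_row(line):
    """Parse one per-second row of rados bench output, or return None.

    Expected columns: sec, cur ops, started, finished, avg MB/s, cur MB/s,
    last lat, avg lat. Header and summary lines do not match and are skipped.
    """
    fields = line.split()
    if len(fields) != 8:
        return None
    try:
        sec = int(fields[0])
        finished = int(fields[3])
        cur_mb_s = float(fields[5])
    except ValueError:
        return None
    return sec, finished, cur_mb_s


def stalled_seconds(bench_output):
    """Return the seconds (after second 0) in which no write completed."""
    stalled = []
    for line in bench_output.splitlines():
        row = parse_bench_row(line)
        if row is not None and row[0] > 0 and row[2] == 0.0:
            stalled.append(row[0])
    return stalled


SAMPLE = """\
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
8 16 348 332 165.976 116 0.052954 0.332424
9 16 348 332 147.535 0 - 0.332424
10 16 472 456 182.374 248 0.038543 0.343745
11 16 485 469 170.522 52 0.040475 0.347328
12 16 485 469 156.312 0 - 0.347328
"""
print(stalled_seconds(SAMPLE))
```

Applied to the full outputs, the data pool run contains several such stalled seconds while the rbd pool run contains none.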
Disclaimer: I'm by no means a Ceph expert. I'm just experimenting with
it, and I still don't understand all the internals.
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
* Re: poor OSD performance using kernel 3.4
2012-05-29 13:01 ` Alexandre DERUMIER
@ 2012-05-29 14:18 ` Stefan Priebe - Profihost AG
0 siblings, 0 replies; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-29 14:18 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: ceph-devel, Mark Nelson
On 29.05.2012 15:01, Alexandre DERUMIER wrote:
> The fio benchmark gives you raw device performance, bypassing the filesystem.
>
> So maybe the problem is in XFS or the Linux VFS layer.
>
> I think you need to benchmark the filesystem to compare performance.
Here is another test with bonnie, which shows the same:
http://pastebin.com/raw.php?i=fGTt4NLi
Stefan
* Re: poor OSD performance using kernel 3.4
2012-05-29 13:39 ` Yann Dupont
@ 2012-05-29 14:43 ` Stefan Priebe - Profihost AG
2012-05-29 17:50 ` Mark Nelson
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-29 14:43 UTC (permalink / raw)
To: Yann Dupont; +Cc: ceph-devel
On 29.05.2012 15:39, Yann Dupont wrote:
> On 29/05/2012 11:46, Stefan Priebe - Profihost AG wrote:
>> It would be really nice if somebody from Inktank could comment on this whole
>> situation.
>>
> Hello.
> I think I have the same bug:
>
> My setup has 8 OSD nodes, 3 MDS (1 active) & 3 MON.
> All my machines run Debian with a custom 3.4.0 kernel. Ceph is
> 0.47.2-1~bpo60+1 (Debian package).
That sounds exactly like the same issue. Sadly, nobody from Inktank
has replied to this problem over the last few days.
> As you can see, the bandwidth is much more stable with this pool.
That's pretty strange...
> I understand data & rbd pool probably don't use the same internals, but
> is this difference expected ?
There must be differences in pool handling.
Stefan
* Re: poor OSD performance using kernel 3.4
2012-05-29 14:43 ` Stefan Priebe - Profihost AG
@ 2012-05-29 17:50 ` Mark Nelson
2012-05-29 19:50 ` Yann Dupont
` (2 more replies)
0 siblings, 3 replies; 73+ messages in thread
From: Mark Nelson @ 2012-05-29 17:50 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: Yann Dupont, ceph-devel
On 05/29/2012 09:43 AM, Stefan Priebe - Profihost AG wrote:
> On 29.05.2012 15:39, Yann Dupont wrote:
>> On 29/05/2012 11:46, Stefan Priebe - Profihost AG wrote:
>>> It would be really nice if somebody from Inktank could comment on this whole
>>> situation.
>>>
>> Hello.
>> I think I have the same bug:
>>
>> My setup has 8 OSD nodes, 3 MDS (1 active) & 3 MON.
>> All my machines run Debian with a custom 3.4.0 kernel. Ceph is
>> 0.47.2-1~bpo60+1 (Debian package).
> That sounds exactly like the same issue. Sadly, nobody from Inktank
> has replied to this problem over the last few days.
Sorry about that, yesterday was a holiday in the US.
I did some quick tests on a couple of nodes I had lying around this
morning.
Distro: Oneiric (i.e. no syncfs in glibc)
Ceph: 0.46-65-gf6c5dff
1 1GbE Client node
3 1GbE Mon nodes
2 1GbE OSD nodes with 1 OSD on each mounted on a 7200rpm SAS drive.
btrfs with -l 64k -n64k, mounted using noatime. H700 Raid controller
with each drive in a 1 disk raid0. Journals are partitioned on a
separate drive.
/proc/version:
Linux version 3.4.0-ceph (autobuild-ceph@gitbuilder-kernel-amd64)
rados -p data bench 120 write:
Total time run: 120.601286
Total writes made: 2979
Write size: 4194304
Bandwidth (MB/sec): 98.805
Average Latency: 0.647507
Max latency: 1.39966
Min latency: 0.181663
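The summary bandwidth can be reproduced from the other two summary fields. The sketch below assumes that "MB" in the rados bench summary means 2^20 bytes, which is the reading that reproduces the reported figures; the function name is hypothetical, and the second and third entries are the totals from the 8-node runs posted earlier in the thread.

```python
def bench_bandwidth_mb_s(writes, write_size_bytes, total_seconds):
    """Bandwidth as rados bench summarises it: MB here means 2**20 bytes."""
    return writes * write_size_bytes / 2**20 / total_seconds


runs = {
    "3.4.0-ceph, data pool (120s)": (2979, 4194304, 120.601286),
    "8 OSD nodes, data pool": (704, 4194304, 22.042041),
    "8 OSD nodes, rbd pool": (4325, 4194304, 20.102264),
}
for label, (writes, size, seconds) in runs.items():
    print(f"{label}: {bench_bandwidth_mb_s(writes, size, seconds):.3f} MB/s")
```

Because the summary is an average over the whole run, a run with multi-second stalls can still report a respectable bandwidth; the per-second rows are what reveal the drops to 0MB/s.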
Once I get these nodes up to 0.47 and get them switched over to 10GbE
I'll redo the btrfs tests and try out xfs as well with longer running tests.
>> As you can see, much more stable bandwith with this pool.
> That's pretty strange...
Indeed, that is very strange! Can you check to see how many pgs are in
each? Any difference in replication level? You can check with:
ceph osd pool get <pool> size
ceph osd pool get <pool> pg_num
>> I understand data& rbd pool probably don't use the same internals, but
>> is this difference expected ?
> There must be differences in pool handling.
>
> Stefan
Thanks,
Mark
* Re: poor OSD performance using kernel 3.4
2012-05-29 17:50 ` Mark Nelson
@ 2012-05-29 19:50 ` Yann Dupont
2012-05-29 21:04 ` Stefan Priebe
2012-05-29 21:08 ` Stefan Priebe
2 siblings, 0 replies; 73+ messages in thread
From: Yann Dupont @ 2012-05-29 19:50 UTC (permalink / raw)
To: Mark Nelson; +Cc: Stefan Priebe - Profihost AG, ceph-devel
On 29/05/2012 19:50, Mark Nelson wrote:
>
> 1 1GbE Client node
> 3 1GbE Mon nodes
> 2 1GbE OSD nodes with 1 OSD on each mounted on a 7200rpm SAS drive.
> btrfs with -l 64k -n64k, mounted using noatime. H700 Raid controller
> with each drive in a 1 disk raid0. Journals are partitioned on a
> separate drive.
>
Hello,
I forgot to mention that I'm using 10 GbE, with btrfs filesystems created with -l 64k -n64k
and mounted with space_cache,compress=lzo,nobarrier,noatime.
The journal is on tmpfs:
osd journal = /dev/shm/journal
osd journal size = 6144
Remember, it's not a production system for the moment. I'm just trying to
evaluate the best performance I can get (and whether the system is
stable enough to start alpha/pre-production services). BTW, I noticed that
OSDs using XFS are much, much slower than OSDs with btrfs right now,
particularly in rbd tests. btrfs has some stability problems, even if
it seems better with newer kernels.
> /proc/version:
> Linux version 3.4.0-ceph (autobuild-ceph@gitbuilder-kernel-amd64)
>
> rados -p data bench 120 write:
>
> Total time run: 120.601286
> Total writes made: 2979
> Write size: 4194304
> Bandwidth (MB/sec): 98.805
>
> Average Latency: 0.647507
> Max latency: 1.39966
> Min latency: 0.181663
>
> Once I get these nodes up to 0.47 and get them switched over to 10GbE
> I'll redo the btrfs tests and try out xfs as well with longer running
> tests.
>
>>> As you can see, much more stable bandwith with this pool.
>> That's pretty strange...
>
> Indeed, that is very strange! Can you check to see how many pgs are
> in each? Any difference in replication level? You can check with:
>
> ceph osd pool get <pool> size
root@label5:~# ceph osd pool get data size
don't know how to get pool field size
root@label5:~# ceph osd pool get rbd size
don't know how to get pool field size
Is size the right name for the field? In the wiki, size isn't listed
as a valid field.
> ceph osd pool get <pool> pg_num
>
root@label5:~# ceph osd pool get rbd pg_num
PG_NUM: 576
root@label5:~# ceph osd pool get data pg_num
PG_NUM: 576
The PG count is quite low because I started with small OSDs (9 OSDs with 200G
each, internal disks) when I formatted. Now I have reduced to 8 OSDs (osd.4
is out), but with much larger (& faster) storage. 6 OSDs have 5T each; 2
still have 200G, but they are planned to migrate before the end of the week.
For the moment I try to keep the OSDs similar. Replication is set to 2.
No OSD is full; I don't have much data stored for the moment.
Concerning the crush map, I'm not using the default one:
The 8 nodes are in 3 different locations (some kilometers apart). 2 are
in one place, 2 in another, and the last 4 in the main location.
I try to group hosts together to avoid problems when I lose a location
(an electrical problem, for example). I'm not sure I really customized the
crush map as I should have.
Here is the map:
# begin crush map
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 device4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
# types
type 0 osd
type 1 host
type 2 rack
type 3 pool
# buckets
host karuizawa {
    id -5 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.2 weight 1.000
}
host hazelburn {
    id -6 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.3 weight 1.000
}
rack loire {
    id -3 # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0 # rjenkins1
    item karuizawa weight 1.000
    item hazelburn weight 1.000
}
host carsebridge {
    id -8 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.5 weight 1.000
}
host cameronbridge {
    id -9 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.6 weight 1.000
}
rack chantrerie {
    id -7 # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0 # rjenkins1
    item carsebridge weight 1.000
    item cameronbridge weight 1.000
}
host chichibu {
    id -2 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.0 weight 1.000
}
host glenesk {
    id -4 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.1 weight 1.000
}
host braeval {
    id -10 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.7 weight 1.000
}
host hanyu {
    id -11 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.8 weight 1.000
}
rack lombarderie {
    id -12 # do not change unnecessarily
    # weight 4.000
    alg straw
    hash 0 # rjenkins1
    item chichibu weight 1.000
    item glenesk weight 1.000
    item braeval weight 1.000
    item hanyu weight 1.000
}
pool default {
    id -1 # do not change unnecessarily
    # weight 8.000
    alg straw
    hash 0 # rjenkins1
    item loire weight 2.000
    item chantrerie weight 2.000
    item lombarderie weight 4.000
}
# rules
rule data {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule metadata {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule rbd {
    ruleset 2
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
# end crush map
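One consequence of the rules above: they separate replicas by host, not by rack, so with replication 2 both copies of an object can end up in the same location. The sketch below gives a rough estimate of how often that happens. It approximates CRUSH placement as drawing two different hosts uniformly at random, which is an assumption for illustration and not a simulation of the straw algorithm; the function name is hypothetical.

```python
from fractions import Fraction

# Hosts per rack, taken from the crush map above (every host has weight 1.000).
hosts_per_rack = {"loire": 2, "chantrerie": 2, "lombarderie": 4}


def same_rack_probability(hosts_per_rack):
    """Chance that two distinct, uniformly chosen hosts share a rack.

    This approximates "chooseleaf firstn 0 type host" with 2 replicas as
    drawing two different hosts at random, ignoring the rack level.
    """
    total = sum(hosts_per_rack.values())
    same = sum(n * (n - 1) for n in hosts_per_rack.values())
    return Fraction(same, total * (total - 1))


p = same_rack_probability(hosts_per_rack)
print(p, float(p))
```

Under that approximation, somewhat more than a quarter of the placement groups would keep both replicas in one location. If surviving the loss of a whole location is the goal, a chooseleaf step with "type rack" instead of "type host" would be the usual way to express it.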
Hope it helps,
cheers
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
* Re: poor OSD performance using kernel 3.4
2012-05-29 17:50 ` Mark Nelson
2012-05-29 19:50 ` Yann Dupont
@ 2012-05-29 21:04 ` Stefan Priebe
2012-05-29 21:08 ` Stefan Priebe
2 siblings, 0 replies; 73+ messages in thread
From: Stefan Priebe @ 2012-05-29 21:04 UTC (permalink / raw)
To: Mark Nelson; +Cc: Yann Dupont, ceph-devel
On 29.05.2012 19:50, Mark Nelson wrote:
> Once I get these nodes up to 0.47 and get them switched over to 10GbE
> I'll redo the btrfs tests and try out xfs as well with longer running
> tests.
I always test on 1GbE and see this problem no matter whether I use btrfs or XFS.
So I think this is just a waste of time.
At least my tests differ in that I see this problem on ALL pools.
Mark, should I try 0.46?
Thanks,
Stefan
* Re: poor OSD performance using kernel 3.4
2012-05-29 17:50 ` Mark Nelson
2012-05-29 19:50 ` Yann Dupont
2012-05-29 21:04 ` Stefan Priebe
@ 2012-05-29 21:08 ` Stefan Priebe
2012-05-29 21:31 ` Yann Dupont
2012-05-29 21:41 ` Mark Nelson
2 siblings, 2 replies; 73+ messages in thread
From: Stefan Priebe @ 2012-05-29 21:08 UTC (permalink / raw)
To: Mark Nelson; +Cc: Yann Dupont, ceph-devel
On 29.05.2012 19:50, Mark Nelson wrote:
> I did some quick tests on a couple of nodes I had lying around this
> morning.
I just noticed that I get a constant rate of 40MB/s when using 1
thread. When I use two threads or more, the rate drops to 0MB/s and the
values jump around wildly.
~# rados -p rbd bench 90 write -t 1
Maintaining 1 concurrent writes of 4194304 bytes for at least 90 seconds.
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 1 10 9 35.994 36 0.100147 0.101133
2 1 20 19 37.9931 40 0.096893 0.100719
3 1 31 30 39.9921 44 0.09784 0.0999607
4 1 41 40 39.9929 40 0.099156 0.0999003
5 1 51 50 39.9932 40 0.098239 0.0996518
6 1 61 60 39.9932 40 0.098682 0.0994851
7 1 71 70 39.9933 40 0.094397 0.099184
8 1 81 80 39.9931 40 0.099823 0.0993327
9 1 91 90 39.9931 40 0.101013 0.0992236
10 1 101 100 39.993 40 0.098277 0.099237
# rados -p rbd bench 90 write -t 2
Maintaining 2 concurrent writes of 4194304 bytes for at least 90 seconds.
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 2 15 13 51.9888 52 0.0956 0.115315
2 2 22 20 39.9928 28 0.120065 0.193125
3 2 41 39 51.9917 76 0.09557 0.15246
4 2 58 56 55.9912 68 0.09875 0.137688
5 2 67 65 51.992 36 0.111211 0.139465
6 2 85 83 55.3251 72 0.136967 0.143079
7 2 101 99 56.5625 64 0.098664 0.136263
8 2 101 99 49.4919 0 - 0.136263
9 2 112 110 48.8808 22 0.099479 0.160563
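The steady 40MB/s of the single-threaded run follows directly from the per-write latency: with N writes in flight, throughput is roughly N times the write size divided by the average latency. A minimal sketch of that relation, using the latencies reported above (the function name is hypothetical):

```python
WRITE_SIZE_MB = 4  # 4194304-byte objects, as in the rados bench runs above


def expected_mb_s(concurrent_writes, avg_latency_s, write_size_mb=WRITE_SIZE_MB):
    """Throughput implied by keeping N writes in flight at a given latency."""
    return concurrent_writes * write_size_mb / avg_latency_s


# -t 1 run: average latency of about 0.099s per write.
print(f"{expected_mb_s(1, 0.099237):.1f} MB/s")
# If latency stayed the same with -t 2, throughput would roughly double.
print(f"{expected_mb_s(2, 0.099237):.1f} MB/s")
# With the average latency actually measured in the -t 2 run (0.160563s):
print(f"{expected_mb_s(2, 0.160563):.1f} MB/s")
```

So the -t 1 result is bounded by latency, not by disk bandwidth, and the -t 2 run falls short of doubling because its average latency rises from about 0.10s to 0.16s, in line with the stalled second in the table.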
Stefan
* Re: poor OSD performance using kernel 3.4
2012-05-29 21:08 ` Stefan Priebe
@ 2012-05-29 21:31 ` Yann Dupont
2012-05-29 21:34 ` Stefan Priebe
2012-05-29 21:41 ` Mark Nelson
1 sibling, 1 reply; 73+ messages in thread
From: Yann Dupont @ 2012-05-29 21:31 UTC (permalink / raw)
To: Stefan Priebe; +Cc: Mark Nelson, ceph-devel
On 29/05/2012 23:08, Stefan Priebe wrote:
> On 29.05.2012 19:50, Mark Nelson wrote:
>> I did some quick tests on a couple of nodes I had laying around this
>> morning.
>
> I just noticed that i get a constant rate of 40MB/s while using 1
> thread. When i use two thread or more i get drop to 0MB/s and crazy
> jumping values.
>
> ~# rados -p rbd bench 90 write -t 1
> Maintaining 1 concurrent writes of 4194304 bytes for at least 90 seconds.
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 0 0 0 0 0 0 - 0
> 1 1 10 9 35.994 36 0.100147 0.101133
> 2 1 20 19 37.9931 40 0.096893 0.100719
> 3 1 31 30 39.9921 44 0.09784 0.0999607
> 4 1 41 40 39.9929 40 0.099156 0.0999003
> 5 1 51 50 39.9932 40 0.098239 0.0996518
> 6 1 61 60 39.9932 40 0.098682 0.0994851
> 7 1 71 70 39.9933 40 0.094397 0.099184
> 8 1 81 80 39.9931 40 0.099823 0.0993327
> 9 1 91 90 39.9931 40 0.101013 0.0992236
> 10 1 101 100 39.993 40 0.098277 0.099237
>
>
Not here.
On data:
root@label5:~# rados -p data bench 20 write -t 1
Maintaining 1 concurrent writes of 4194304 bytes for at least 20 seconds.
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 1 15 14 55.9837 56 0.096813 0.0677311
2 1 33 32 63.9852 72 0.088802 0.0612602
3 1 51 50 66.6529 72 0.056883 0.0594909
4 1 60 59 58.989 36 0.046377 0.0577145
5 1 60 59 47.1916 0 - 0.0577145
6 1 79 78 51.9911 38 0.041831 0.0768918
7 1 98 97 55.419 76 0.050436 0.0718439
8 1 101 100 49.9919 12 0.043673 0.0712079
9 1 101 100 44.4375 0 - 0.0712079
10 1 115 114 45.5929 28 0.043768 0.0876947
11 1 134 133 48.356 76 0.052382 0.0826428
12 1 154 153 50.9919 80 0.042077 0.0783619
13 1 175 174 53.5299 84 0.053474 0.0745956
14 1 194 193 55.1339 76 0.049631 0.0724711
15 1 211 210 55.991 68 0.052683 0.0712887
16 1 232 231 57.7407 84 0.044341 0.0692121
17 1 249 248 58.3436 68 0.053707 0.0684414
18 1 258 257 57.102 36 0.086088 0.0680656
19 1 267 266 55.9911 36 0.050902 0.0713341
min lat: 0.033395 max lat: 2.14757 avg lat: 0.0703545
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 1 285 284 56.7909 72 0.047755 0.0703545
Total time run: 20.066134
Total writes made: 286
Write size: 4194304
Bandwidth (MB/sec): 57.011
On rbd:
Maintaining 1 concurrent writes of 4194304 bytes for at least 20 seconds.
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 1 1 0 0 0 - 0
1 1 18 17 67.9801 68 0.065869 0.0587313
2 1 35 34 67.9842 68 0.056982 0.0580468
3 1 55 54 71.9848 80 0.050305 0.0554721
4 1 72 71 70.9858 68 0.039387 0.0561269
5 1 91 90 71.986 76 0.055236 0.0554057
6 1 109 108 71.9864 72 0.069547 0.0554112
7 1 126 125 71.4154 68 0.049234 0.0556564
8 1 146 145 72.4868 80 0.052302 0.0551064
9 1 165 164 72.8758 76 0.0533 0.0548858
10 1 184 183 73.187 76 0.041342 0.0543598
11 1 202 201 73.078 72 0.048963 0.0544978
12 1 218 217 72.3207 64 0.071926 0.0549402
13 1 236 235 72.2951 72 0.055804 0.0551936
14 1 254 253 72.2731 72 0.058315 0.0552612
15 1 272 271 72.2541 72 0.047687 0.0552036
16 1 290 289 72.2375 72 0.059162 0.055275
17 1 308 307 72.2229 72 0.051991 0.0553467
18 1 327 326 72.432 76 0.053271 0.0552114
19 1 346 345 72.6192 76 0.058125 0.0550658
min lat: 0.036202 max lat: 0.113077 avg lat: 0.0547502
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 1 366 365 72.9874 80 0.036246 0.0547502
Total time run: 20.086555
Total writes made: 367
Write size: 4194304
Bandwidth (MB/sec): 73.084
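One way to put a number on "jumping up & down" is to count the seconds in which nothing completed. The sketch below is only an illustration: the function name `stalled_seconds` is hypothetical, the column order is taken from the header line printed by rados bench above, and the sample rows are copied from the two runs above (seconds 3 to 9 of the data run, seconds 3 to 5 of the rbd run).

```python
# Rows copied from the "data" pool run above (seconds 3..9).
DATA_ROWS = """
3 1 51 50 66.6529 72 0.056883 0.0594909
4 1 60 59 58.989 36 0.046377 0.0577145
5 1 60 59 47.1916 0 - 0.0577145
6 1 79 78 51.9911 38 0.041831 0.0768918
7 1 98 97 55.419 76 0.050436 0.0718439
8 1 101 100 49.9919 12 0.043673 0.0712079
9 1 101 100 44.4375 0 - 0.0712079
"""

# Rows copied from the "rbd" pool run above (seconds 3..5).
RBD_ROWS = """
3 1 55 54 71.9848 80 0.050305 0.0554721
4 1 72 71 70.9858 68 0.039387 0.0561269
5 1 91 90 71.986 76 0.055236 0.0554057
"""


def stalled_seconds(rows):
    """Return the seconds (after second 0) whose 'cur MB/s' column is zero."""
    stalled = []
    for line in rows.strip().splitlines():
        fields = line.split()
        sec = int(fields[0])          # column 1: sec
        cur_mb_s = float(fields[5])   # column 6: cur MB/s
        if sec > 0 and cur_mb_s == 0:
            stalled.append(sec)
    return stalled


print("data:", stalled_seconds(DATA_ROWS))
print("rbd: ", stalled_seconds(RBD_ROWS))
```

Fed with the complete output of a run, the same function gives a quick stall count that can be compared between kernels or pools.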
>
> # rados -p rbd bench 90 write -t 2
> Maintaining 2 concurrent writes of 4194304 bytes for at least 90 seconds.
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 0 0 0 0 0 0 - 0
> 1 2 15 13 51.9888 52 0.0956 0.115315
> 2 2 22 20 39.9928 28 0.120065 0.193125
> 3 2 41 39 51.9917 76 0.09557 0.15246
> 4 2 58 56 55.9912 68 0.09875 0.137688
> 5 2 67 65 51.992 36 0.111211 0.139465
> 6 2 85 83 55.3251 72 0.136967 0.143079
> 7 2 101 99 56.5625 64 0.098664 0.136263
> 8 2 101 99 49.4919 0 - 0.136263
> 9 2 112 110 48.8808 22 0.099479 0.160563
>
> Stefan
Pool rbd stays consistent here, no matter how many threads are involved. My
setup reaches its maximum speed at around 16~24 threads, and it's quite effective.
Pool data, on the contrary, jumps up and down, no matter how many
threads are involved :)
Maybe this is because the journal is too small? Or because 2 of the 8 nodes
have slower disks?
I may be able to retest on Thursday; my last two OSDs should have faster and
larger disks.
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-29 21:31 ` Yann Dupont
@ 2012-05-29 21:34 ` Stefan Priebe
2012-05-29 21:45 ` Yann Dupont
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe @ 2012-05-29 21:34 UTC (permalink / raw)
To: Yann Dupont; +Cc: Mark Nelson, ceph-devel
On 29.05.2012 23:31, Yann Dupont wrote:
> on the contrary, pool data is jumping up & down, no matter how much
> thread involved :)
>
> Maybe this is because journal is too tight ? Or because 2 of the 8 nodes
> have slower disks ?
Can you try with 3.0.X? I would be really interested in what happens in
this case.
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-29 21:08 ` Stefan Priebe
2012-05-29 21:31 ` Yann Dupont
@ 2012-05-29 21:41 ` Mark Nelson
2012-05-30 6:22 ` Stefan Priebe - Profihost AG
1 sibling, 1 reply; 73+ messages in thread
From: Mark Nelson @ 2012-05-29 21:41 UTC (permalink / raw)
To: Stefan Priebe; +Cc: Yann Dupont, ceph-devel
On 05/29/2012 04:08 PM, Stefan Priebe wrote:
> On 29.05.2012 19:50, Mark Nelson wrote:
>> I did some quick tests on a couple of nodes I had laying around this
>> morning.
>
> I just noticed that i get a constant rate of 40MB/s while using 1
> thread. When i use two thread or more i get drop to 0MB/s and crazy
> jumping values.
>
> ~# rados -p rbd bench 90 write -t 1
> Maintaining 1 concurrent writes of 4194304 bytes for at least 90 seconds.
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 0 0 0 0 0 0 - 0
> 1 1 10 9 35.994 36 0.100147 0.101133
> 2 1 20 19 37.9931 40 0.096893 0.100719
> 3 1 31 30 39.9921 44 0.09784 0.0999607
> 4 1 41 40 39.9929 40 0.099156 0.0999003
> 5 1 51 50 39.9932 40 0.098239 0.0996518
> 6 1 61 60 39.9932 40 0.098682 0.0994851
> 7 1 71 70 39.9933 40 0.094397 0.099184
> 8 1 81 80 39.9931 40 0.099823 0.0993327
> 9 1 91 90 39.9931 40 0.101013 0.0992236
> 10 1 101 100 39.993 40 0.098277 0.099237
>
>
When you are using 1 thread, you are hitting a ~40MB/s limit (probably
networking related) before the data gets to the journal. Because (in
this case) the filestore data disk can handle that throughput,
everything looks nice and consistent.
>
> # rados -p rbd bench 90 write -t 2
> Maintaining 2 concurrent writes of 4194304 bytes for at least 90 seconds.
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 0 0 0 0 0 0 - 0
> 1 2 15 13 51.9888 52 0.0956 0.115315
> 2 2 22 20 39.9928 28 0.120065 0.193125
> 3 2 41 39 51.9917 76 0.09557 0.15246
> 4 2 58 56 55.9912 68 0.09875 0.137688
> 5 2 67 65 51.992 36 0.111211 0.139465
> 6 2 85 83 55.3251 72 0.136967 0.143079
> 7 2 101 99 56.5625 64 0.098664 0.136263
> 8 2 101 99 49.4919 0 - 0.136263
> 9 2 112 110 48.8808 22 0.099479 0.160563
>
In this case, that 40MB/s limit with 1 thread has increased. Now more
data is getting fed into the journal than the filestore can write out to
disk. Eventually writes stall while the data is being written out.
> Stefan
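The stall pattern described above can be reproduced with a toy model. This is only a sketch under stated assumptions: the function `simulate`, the rates and the journal size are all hypothetical, and the rule that a full journal blocks new writes until the backlog is completely flushed is a simplification for illustration, not a description of how the filestore actually behaves.

```python
def simulate(ingest_mb_s, drain_mb_s, journal_mb, seconds):
    """Return the MB accepted from clients in each second of a toy journal model."""
    backlog = 0.0      # journaled data not yet written to the data disk
    stalled = False    # assumption: a full journal blocks writes until drained
    accepted = []
    for _ in range(seconds):
        # The data disk writes out part of the backlog first.
        backlog = max(0.0, backlog - drain_mb_s)
        if stalled and backlog == 0.0:
            stalled = False
        # Clients can only write into free journal space, and not while stalled.
        took = 0.0 if stalled else min(ingest_mb_s, journal_mb - backlog)
        backlog += took
        if backlog >= journal_mb:
            stalled = True
        accepted.append(took)
    return accepted


# Ingest below the drain rate: steady throughput, as in the 1-thread run.
print(simulate(40, 50, 100, 6))
# Ingest above the drain rate: bursts followed by seconds with no progress.
print(simulate(70, 50, 100, 8))
```

In this model the steady case never builds a backlog, while the faster case periodically fills the journal and drops to zero, which is the same shape as the 2-thread output quoted above.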
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-29 21:34 ` Stefan Priebe
@ 2012-05-29 21:45 ` Yann Dupont
2012-05-30 6:29 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 73+ messages in thread
From: Yann Dupont @ 2012-05-29 21:45 UTC (permalink / raw)
To: Stefan Priebe; +Cc: Mark Nelson, ceph-devel
On 29/05/2012 23:34, Stefan Priebe wrote:
> On 29.05.2012 23:31, Yann Dupont wrote:
>> on the contrary, pool data is jumping up & down, no matter how much
>> thread involved :)
>>
>> Maybe this is because journal is too tight ? Or because 2 of the 8 nodes
>> have slower disks ?
> Can you try with 3.0.X? I would be really interested what happens in
> this case.
>
> Stefan
Hmm...
Probably not directly. Older btrfs won't like big metadata, I think.
This is quite a recent feature.
But as my Ceph cluster is not in production, I can stop it, use an older
kernel, format new volumes with btrfs or xfs, or whatever, and try.
It will be a totally fresh install then.
I can do that on Thursday.
Stefan, Mark,
I'll take the latest 3.0 kernel, or do you have a particular 3.0 kernel
version to test?
And do you want a particular xfs/btrfs format?
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-24 14:10 poor OSD performance using kernel 3.4 Stefan Priebe - Profihost AG
2012-05-24 14:57 ` Mark Nelson
[not found] ` <CAJCPpW+SKnnVUaDEAsCkKyZwMVrHCRJF2C8zqB4eORgwW5p=1Q@mail.gmail.com>
@ 2012-05-29 22:25 ` Mark Nelson
2012-05-30 6:33 ` Stefan Priebe - Profihost AG
2 siblings, 1 reply; 73+ messages in thread
From: Mark Nelson @ 2012-05-29 22:25 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org
On 05/24/2012 09:10 AM, Stefan Priebe - Profihost AG wrote:
> Hi list,
>
> today while testing btrfs i discovered a very poor osd performance using
> kernel 3.4.
>
> Underlying FS is XFS but it is the same with btrfs.
>
> 3.0.30:
> ~# rados -p data bench 10 write -t 16
> Maintaining 16 concurrent writes of 4194304 bytes for at least 10 seconds.
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 0 0 0 0 0 0 - 0
> 1 16 41 25 99.9767 100 0.586984 0.447293
> 2 16 71 55 109.979 120 0.934388 0.488375
> 3 16 99 83 110.647 112 1.15982 0.503111
> 4 16 130 114 113.981 124 1.05952 0.516925
> 5 16 159 143 114.382 116 0.149313 0.510734
> 6 16 188 172 114.649 116 0.287166 0.52203
> 7 16 215 199 113.697 108 0.151784 0.531461
> 8 16 242 226 112.984 108 0.623478 0.539896
> 9 16 265 249 110.651 92 0.50354 0.538504
> 10 16 296 280 111.984 124 0.155048 0.542846
> Total time run: 10.776153
> Total writes made: 297
> Write size: 4194304
> Bandwidth (MB/sec): 110.243
>
> Average Latency: 0.577534
> Max latency: 1.85499
> Min latency: 0.091473
>
>
> 3.4:
> ~# rados -p data bench 10 write -t 16
> Maintaining 16 concurrent writes of 4194304 bytes for at least 10 seconds.
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 0 0 0 0 0 0 - 0
> 1 16 40 24 95.9794 96 0.393196 0.455936
> 2 16 68 52 103.983 112 0.835652 0.517297
> 3 16 85 69 91.9849 68 1.00535 0.493058
> 4 16 96 80 79.9869 44 0.096564 0.577948
> 5 16 103 87 69.5879 28 0.092722 0.589147
> 6 16 117 101 67.3216 56 0.222175 0.675334
> 7 16 130 114 65.1321 52 0.15677 0.623806
> 8 16 144 128 63.9896 56 0.089157 0.56746
> 9 16 144 128 56.8794 0 - 0.56746
> 10 16 144 128 51.1912 0 - 0.56746
> 11 16 144 128 46.5373 0 - 0.56746
> 12 16 144 128 42.6591 0 - 0.56746
> 13 16 144 128 39.3776 0 - 0.56746
> 14 16 144 128 36.5649 0 - 0.56746
> 15 16 144 128 34.1272 0 - 0.56746
> 16 16 145 129 32.2443 0.5 11.3422 0.650985
> Total time run: 16.193871
> Total writes made: 145
> Write size: 4194304
> Bandwidth (MB/sec): 35.816
>
> Average Latency: 1.78467
> Max latency: 14.4744
> Min latency: 0.088753
>
> Stefan
I set up some tests today to try to replicate your findings (and also
check results against some previous ones I've done). I don't think I'm
seeing exactly the same results as you, but I definitely see xfs
performing worse in this specific test than btrfs. I've included the
results here.
Distro: Ubuntu Oneiric (i.e. no syncfs in glibc)
Ceph: 0.47.2
Kernel 3.4.0-ceph (autobuild-ceph@gitbuilder-kernel-amd64)
Network: 10GbE
1 Client node
3 Mon nodes
2 OSD nodes with 1 OSD each mounted on a 7200rpm SAS drive. H700 Raid
controller with each drive in a 1 disk raid0. Journals are partitioned
on a separate drive. OSD data disks are using WT cache while journals
are using WB.
btrfs created with -l 64k -n64k, mounted using noatime.
xfs created with -f -d su=64k,sw=1 -i size=2048, mounted using noatime.
rados bench invocation: rados -p data bench 300 write -t 16 -b 4194304
btrfs:
Total time run: 300.413696
Total writes made: 7582
Write size: 4194304
Bandwidth (MB/sec): 100.954
Average Latency: 0.633932
Max latency: 3.78661
Min latency: 0.065734
xfs:
Total time run: 304.435966
Total writes made: 5023
Write size: 4194304
Bandwidth (MB/sec): 65.997
Average Latency: 0.96965
Max latency: 36.4993
Min latency: 0.07516
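The bandwidth lines above follow directly from the other totals, which makes a quick sanity check possible. A minimal sketch, assuming that rados bench counts a megabyte as 2**20 bytes (inferred from the figures in this thread, not from the rados source); the helper name `bandwidth_mb_s` is hypothetical.

```python
MB = 1024 * 1024  # assumption: MB/s figures are in units of 2**20 bytes


def bandwidth_mb_s(writes, write_size_bytes, seconds):
    """Average write bandwidth from total writes, write size and run time."""
    return writes * write_size_bytes / MB / seconds


# Totals copied from the btrfs and xfs results above.
btrfs = bandwidth_mb_s(7582, 4194304, 300.413696)
xfs = bandwidth_mb_s(5023, 4194304, 304.435966)

print(f"btrfs: {btrfs:.3f} MB/s")
print(f"xfs:   {xfs:.3f} MB/s")
print(f"xfs reaches {xfs / btrfs:.0%} of the btrfs bandwidth")
```

Both values agree with the reported 100.954 and 65.997 MB/s, so the difference between the two filesystems comes from the number of completed writes, not from the length of the run.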
Full results are available here:
http://nhm.ceph.com/results/mailinglist-tests/
I created seekwatcher movies by running blktrace on the underlying OSD
data disks during the tests. These show throughput over time,
seeks/sec, and visual representation of where the disk is being written
to for each OSD. You can see them here:
http://nhm.ceph.com/movies/mailinglist-tests/
As you can see, at least for the quick tests I did this afternoon, the
performance of the underlying OSD disk is highly correlated with the
number of seeks being done. These results may improve with syncfs
support in Ubuntu 12.04. If you have your journals on the same disks as
the OSDs, that will cause even more seeks (in addition to the greater
throughput demands). These are things that we are
actively investigating and hopefully will be able to improve over the
coming months.
Thanks,
Mark
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-29 21:41 ` Mark Nelson
@ 2012-05-30 6:22 ` Stefan Priebe - Profihost AG
2012-05-30 7:20 ` building test cluster : missing /etc/ceph/client.admin.keyring, need help Alexandre DERUMIER
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-30 6:22 UTC (permalink / raw)
To: Mark Nelson; +Cc: Yann Dupont, ceph-devel
On 29.05.2012 23:41, Mark Nelson wrote:
> When you are using 1 thread, you are hitting a ~40MB/s limit (probably
> networking related) before the data gets to the journal.
1GB/s is capable of at least 130MB/s and I get 130MB/s with 3.0.30 using
16 threads. I don't get why I should hit a limit here.
> Because (in
> this case) the filestore data disk can handle that throughput,
> everything looks nice and consistent.
osd bench, fio and dd all tell me the underlying disks can handle
260MB/s (Intel SSD).
> In this case, that 40MB/s limit with 1 thread has increased. Now more
> data is getting fed into the journal than the filestore can write out to
> disk. Eventually writes stall while the data is being written out.
I don't want to argue, but why should this happen only with 3.4.0 and NOT
with 3.0.30? It also does not matter which underlying FS I use:
it is the same with XFS AND btrfs.
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-29 21:45 ` Yann Dupont
@ 2012-05-30 6:29 ` Stefan Priebe - Profihost AG
0 siblings, 0 replies; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-30 6:29 UTC (permalink / raw)
To: Yann Dupont; +Cc: Mark Nelson, ceph-devel
On 29.05.2012 23:45, Yann Dupont wrote:
> On 29/05/2012 23:34, Stefan Priebe wrote:
>> On 29.05.2012 23:31, Yann Dupont wrote:
>>> on the contrary, pool data is jumping up & down, no matter how much
>>> thread involved :)
>>>
>>> Maybe this is because journal is too tight ? Or because 2 of the 8 nodes
>>> have slower disks ?
>> Can you try with 3.0.X? I would be really interested what happens in
>> this case.
>>
>> Stefan
> hum...
> probably not directly. Older btrfs won't like big metadata, I think.
> This is quite a recent feature.
That's absolutely correct. If you test 3.0.X, I think it's better to use
XFS. I'm just interested in whether the problem we both see is gone for
you too with 3.0.X.
> I'll take the latest 3.0 kernel - or do you have a particular 3.0 kernel
> version to test ?
I've used the latest 3.0.X stable (.32 right now)
> And Do you want a particular xfs/btrfs format ?
mkfs.xfs is enough ;-)
Thanks!
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-29 22:25 ` poor OSD performance using kernel 3.4 Mark Nelson
@ 2012-05-30 6:33 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGs9dpSh9Oyu+5yDhyYU=Et_-zF5MuYybBuuAN5DgR433A@mail.gmail.com>
2012-05-30 11:51 ` poor OSD performance using kernel 3.4 Mark Nelson
0 siblings, 2 replies; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-30 6:33 UTC (permalink / raw)
To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org
>
> I setup some tests today to try to replicate your findings (and also
> check results against some previous ones I've done). I don't think I'm
> seeing exactly the same results as you, but I definitely see xfs
> performing worse in this specific test than btrfs. I've included the
> results here.
>
> Full results are available here:
> http://nhm.ceph.com/results/mailinglist-tests/
But these tests show exactly the same bad behaviour I'm seeing. Instead
of a constant sequential write rate you have heavily jumping
values. Are you able to test with XFS and 3.0.32? You'll then probably
see an absolutely constant write rate.
Greets,
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
[not found] ` <CADdPHGs9dpSh9Oyu+5yDhyYU=Et_-zF5MuYybBuuAN5DgR433A@mail.gmail.com>
@ 2012-05-30 7:16 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGuiJqZUCK-0qR_CrOo6GRhkjaCdkOhJ2boq3zD0_voTsA@mail.gmail.com>
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-30 7:16 UTC (permalink / raw)
To: Stefan Majer; +Cc: Mark Nelson, ceph-devel@vger.kernel.org
On 30.05.2012 09:01, Stefan Majer wrote:
> Hi Stefan,
>
> what is your replication factor ? If it set to 2 and your osds have a
> single 1GB/sec link you never will see more than 120MB/sec i suspect
> much less because every write have to go to the same wire twice from
> each osd.
Sure, but right now I see 10MB/s with kernel 3.4 and 170MB/s with
3.0.30 using bonded 2x 1Gbit/s links.
Stefan
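For orientation, the raw network ceilings behind this exchange can be worked out as follows. This is a back-of-the-envelope sketch only: `link_ceiling_mb_s` and its `wire_crossings` parameter are hypothetical names, protocol overhead is ignored, and dividing by the number of times each write crosses the same wire simply takes Stefan Majer's description at face value.

```python
def link_ceiling_mb_s(gbit_per_s, links=1, wire_crossings=1):
    """Raw payload ceiling in MB/s (10**6 bytes) for bonded links."""
    raw = gbit_per_s * 1000 / 8 * links   # Gbit/s -> MB/s, no protocol overhead
    return raw / wire_crossings           # assumption: each crossing costs a full share


print(link_ceiling_mb_s(1))                    # single 1 Gbit/s link
print(link_ceiling_mb_s(1, wire_crossings=2))  # same link, each write crossing twice
print(link_ceiling_mb_s(1, links=2))           # bonded 2x 1 Gbit/s
```

The 170MB/s seen on 3.0.30 sits below the raw ceiling of the bonded links, while 10MB/s on 3.4 is far below any of these bounds, so the network alone does not account for the gap between the two kernels.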
^ permalink raw reply [flat|nested] 73+ messages in thread
* building test cluster : missing /etc/ceph/client.admin.keyring, need help
2012-05-30 6:22 ` Stefan Priebe - Profihost AG
@ 2012-05-30 7:20 ` Alexandre DERUMIER
2012-05-30 7:25 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 73+ messages in thread
From: Alexandre DERUMIER @ 2012-05-30 7:20 UTC (permalink / raw)
To: ceph-devel
Hi,
I'm building my rados test cluster:
3 servers, each running 1 mon and 5 OSDs.
The mon and OSD daemons are started, but when I use the ceph command, client.admin.keyring is missing:
root@cephtest1:/etc/ceph# ceph -w
2012-05-30 09:05:35.255619 7fd1e9cfa760 -1 auth: failed to open keyring from /etc/ceph/client.admin.keyring
2012-05-30 09:05:35.255631 7fd1e9cfa760 -1 monclient(hunting): failed to open keyring: (2) No such file or directory
2012-05-30 09:05:35.255693 7fd1e9cfa760 -1 ceph_tool_common_init failed.
root@cephtest1:/etc/ceph# ls /etc/ceph/
ceph.conf osd.0.keyring osd.1.keyring osd.2.keyring osd.3.keyring osd.4.keyring
Do I need to generate a keyring? How can I do it?
/etc/ceph/ceph.conf:
[global]
; use cephx or none
auth supported = cephx
keyring = /etc/ceph/$name.keyring
[mon]
mon data = /srv/mon.$id
[mds]
[osd]
osd data = /srv/osd.$id
osd journal = /srv/osd.$id.journal
osd journal size = 1000
; uncomment the following line if you are mounting with ext4
; filestore xattr use omap = true
[mon.a]
host = cephtest1
mon addr = 10.3.94.27:6789
[mon.b]
host = cephtest2
mon addr = 10.3.94.28:6789
[mon.c]
host = cephtest3
mon addr = 10.3.94.29:6789
[osd.0]
host = cephtest1
addr = 10.3.94.27
[osd.1]
host = cephtest1
addr = 10.3.94.27
[osd.2]
host = cephtest1
addr = 10.3.94.27
[osd.3]
host = cephtest1
addr = 10.3.94.27
[osd.4]
host = cephtest1
addr = 10.3.94.27
[osd.5]
host = cephtest2
addr = 10.3.94.28
[osd.6]
host = cephtest2
addr = 10.3.94.28
[osd.7]
host = cephtest2
addr = 10.3.94.28
[osd.8]
host = cephtest2
addr = 10.3.94.28
[osd.9]
host = cephtest2
addr = 10.3.94.28
[osd.10]
host = cephtest3
addr = 10.3.94.29
[osd.11]
host = cephtest3
addr = 10.3.94.29
[osd.12]
host = cephtest3
addr = 10.3.94.29
[osd.13]
host = cephtest3
addr = 10.3.94.29
[osd.14]
host = cephtest3
addr = 10.3.94.29
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: building test cluster : missing /etc/ceph/client.admin.keyring, need help
2012-05-30 7:20 ` building test cluster : missing /etc/ceph/client.admin.keyring, need help Alexandre DERUMIER
@ 2012-05-30 7:25 ` Stefan Priebe - Profihost AG
2012-05-30 7:33 ` Alexandre DERUMIER
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-30 7:25 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: ceph-devel
On 30.05.2012 09:20, Alexandre DERUMIER wrote:
>
> Hi,
> I'm building my rados test cluster,
>
>
> 3 servers,with on each server : 1 mon - 5 osd
>
> mon daemon and osd are started, but when i use ceph command, it's missing client.admin.keyring
>
> root@cephtest1:/etc/ceph# ceph -w
> 2012-05-30 09:05:35.255619 7fd1e9cfa760 -1 auth: failed to open keyring from /etc/ceph/client.admin.keyring
> 2012-05-30 09:05:35.255631 7fd1e9cfa760 -1 monclient(hunting): failed to open keyring: (2) No such file or directory
> 2012-05-30 09:05:35.255693 7fd1e9cfa760 -1 ceph_tool_common_init failed.
Just run:
mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/client.admin.keyring
and it will create the admin key for you.
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: building test cluster : missing /etc/ceph/client.admin.keyring, need help
2012-05-30 7:25 ` Stefan Priebe - Profihost AG
@ 2012-05-30 7:33 ` Alexandre DERUMIER
2012-05-30 7:47 ` Alexandre DERUMIER
0 siblings, 1 reply; 73+ messages in thread
From: Alexandre DERUMIER @ 2012-05-30 7:33 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel
OK, thanks.
I had created the cluster following the official doc
http://ceph.com/docs/master/config-cluster/deploying-ceph-with-mkcephfs/
with
mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring
and the file was created in /srv:
# cat /srv/ceph.keyring
[client.admin]
key = AQCQwcVPGIAwHhAAuS5Veg7GoOyzh59zq2TKag==
Is it an error in the documentation?
--
Alexandre Derumier
Systems Engineer
Landline: 03 20 68 88 90
Fax: 03 20 68 90 81
45 Bvd du Général Leclerc 59100 Roubaix - France
12 rue Marivaux 75002 Paris - France
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: building test cluster : missing /etc/ceph/client.admin.keyring, need help
2012-05-30 7:33 ` Alexandre DERUMIER
@ 2012-05-30 7:47 ` Alexandre DERUMIER
0 siblings, 0 replies; 73+ messages in thread
From: Alexandre DERUMIER @ 2012-05-30 7:47 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel
root@cephtest1:/srv# cp /srv/ceph.keyring /etc/ceph/client.admin.keyring
root@cephtest1:/srv# ceph -w
2012-05-30 09:26:40.336175 pg v572: 2880 pgs: 2880 active+clean; 0 bytes data, 544 MB used, 2039 GB / 2039 GB avail
2012-05-30 09:26:40.342175 mds e1: 0/0/1 up
2012-05-30 09:26:40.342207 osd e17: 15 osds: 15 up, 15 in
2012-05-30 09:26:40.342331 log 2012-05-30 09:06:35.419340 osd.9 10.3.94.28:6812/13794 260 : [INF] 2.3bb scrub ok
2012-05-30 09:26:40.342424 mon e1: 3 mons at {a=10.3.94.27:6789/0,b=10.3.94.28:6789/0,c=10.3.94.29:6789/0}
Ok, the fun will begin now :)
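The reason the copy works is the `keyring = /etc/ceph/$name.keyring` line in the [global] section: `$name` expands to the daemon or client type and id joined by a dot. The sketch below only illustrates that substitution with Python's `string.Template`; the helper `keyring_path` is hypothetical and is not how ceph itself parses its configuration.

```python
from string import Template

# Value of the "keyring" option from the [global] section of the posted ceph.conf.
KEYRING_SETTING = "/etc/ceph/$name.keyring"


def keyring_path(entity_type, entity_id):
    """Expand $name as '<type>.<id>', e.g. client.admin or osd.0."""
    name = f"{entity_type}.{entity_id}"
    return Template(KEYRING_SETTING).substitute(name=name)


print(keyring_path("client", "admin"))  # path the ceph tool tried to open
print(keyring_path("osd", 0))           # matches the files listed in /etc/ceph
```

With this setting the admin keyring has to be at the expanded client.admin path, which is why a file written to /srv/ceph.keyring was not found until it was copied.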
--
Alexandre Derumier
Systems Engineer
Landline: 03 20 68 88 90
Fax: 03 20 68 90 81
45 Bvd du Général Leclerc 59100 Roubaix - France
12 rue Marivaux 75002 Paris - France
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
[not found] ` <CADdPHGuiJqZUCK-0qR_CrOo6GRhkjaCdkOhJ2boq3zD0_voTsA@mail.gmail.com>
@ 2012-05-30 11:04 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGuLAL5+hkzq0tigqu355DvPxkhE5sxBhOVZPj=EzDSVtA@mail.gmail.com>
2012-05-30 12:17 ` Mark Nelson
0 siblings, 2 replies; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-30 11:04 UTC (permalink / raw)
To: Stefan Majer; +Cc: Mark Nelson, ceph-devel@vger.kernel.org
On 30.05.2012 09:19, Stefan Majer wrote:
> Hi,
>
> ok, so your replication level is 2 and you have 2*1GB/sec right ?
Generally yes - but for this new test it was just 1*1GB/s (see below).
> do you have a iostat -x 3 output and or a dstat from all effected
> machines during your rados bench runs as well ?
As the output looks exactly the same on all OSDs, here it is from ONE OSD:
Kernel 3.4:
http://pastebin.com/raw.php?i=sV9vKsWy
Kernel 3.0:
http://pastebin.com/raw.php?i=eafjpPpK
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
[not found] ` <CADdPHGuLAL5+hkzq0tigqu355DvPxkhE5sxBhOVZPj=EzDSVtA@mail.gmail.com>
@ 2012-05-30 11:25 ` Stefan Priebe - Profihost AG
0 siblings, 0 replies; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-30 11:25 UTC (permalink / raw)
To: Stefan Majer; +Cc: Mark Nelson, ceph-devel@vger.kernel.org
On 30.05.2012 13:20, Stefan Majer wrote:
> Hi,
>
>
> On Wed, May 30, 2012 at 1:04 PM, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>
> On 30.05.2012 09:19, Stefan Majer wrote:
> > Hi,
> >
> > ok, so your replication level is 2 and you have 2*1GB/sec right ?
> Generally yes - but for this new test it was just 1*1GB/s (see below).
>
> > do you have a iostat -x 3 output and or a dstat from all effected
> > machines during your rados bench runs as well ?
>
> As the output looks exactly the same on all OSDs here is it from ONE
> osd:
>
> Kernel 3.4:
> http://pastebin.com/raw.php?i=sV9vKsWy
>
> This is strange, looks like a real regression in 3.4 ? but i guess it is
> only possible to track down this by doing
> git bisect on the kernel sources :-(
I also tried 3.3 and 3.2; it's the same (I haven't tested 3.1).
> Kernel 3.0:
> http://pastebin.com/raw.php?i=eafjpPpK
>
>
> Here you can see a constant rate to disk of ~ 50 - 70Mbyte/sec with
> about 10-15% utilization on them. So this test is not disk bound i
> guess your network is saturated. Can you run dstat during this test as
> well to see the network bandwith used as well.
Absolutely correct. I'm aware of this. I just want to get this result
with 3.4 so that I can use btrfs.
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-30 6:33 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGs9dpSh9Oyu+5yDhyYU=Et_-zF5MuYybBuuAN5DgR433A@mail.gmail.com>
@ 2012-05-30 11:51 ` Mark Nelson
1 sibling, 0 replies; 73+ messages in thread
From: Mark Nelson @ 2012-05-30 11:51 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org
On 05/30/2012 01:33 AM, Stefan Priebe - Profihost AG wrote:
>> I setup some tests today to try to replicate your findings (and also
>> check results against some previous ones I've done). I don't think I'm
>> seeing exactly the same results as you, but I definitely see xfs
>> performing worse in this specific test than btrfs. I've included the
>> results here.
>>
>> Full results are available here:
>> http://nhm.ceph.com/results/mailinglist-tests/
> But these tests shows exactly he same bad behaviour i'm seeing. Instead
> of having a constant sequential write ratio you've heavily jumping
> values. Are you able to test with XFS and 3.0.32? You'll then probably
> see an absolutely constant write ratio.
>
> Greets,
> Stefan
The jumping around is due to the writes to the underlying OSD disk not
being able to keep up with the journal. I think it's more a symptom of
the problem rather than the problem itself. Presumably the OSD data
disk is performing slowly because of the number of seeks that are
happening (In my tests almost always between 40-60 on XFS, and growing
over time on btrfs). It's entirely possible that something changed
going from 3.0 to 3.4 that is causing the seek behavior to be worse.
I'll try the test again on a 3.0 kernel and record seekwatcher results
to see if the write patterns look any different.
Btw, I apologize if you mentioned this already, but are you running MONs
on the OSD nodes? Also, what version of glibc do you have?
Thanks,
Mark
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-30 11:04 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGuLAL5+hkzq0tigqu355DvPxkhE5sxBhOVZPj=EzDSVtA@mail.gmail.com>
@ 2012-05-30 12:17 ` Mark Nelson
2012-05-30 12:41 ` Stefan Priebe - Profihost AG
1 sibling, 1 reply; 73+ messages in thread
From: Mark Nelson @ 2012-05-30 12:17 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: Stefan Majer, ceph-devel@vger.kernel.org
On 05/30/2012 06:04 AM, Stefan Priebe - Profihost AG wrote:
> On 30.05.2012 09:19, Stefan Majer wrote:
>> Hi,
>>
>> ok, so your replication level is 2 and you have 2*1GB/sec right ?
> Generally yes - but for this new test it was just 1*1GB/s (see below).
>
>> do you have a iostat -x 3 output and or a dstat from all effected
>> machines during your rados bench runs as well ?
> As the output looks exactly the same on all OSDs here is it from ONE osd:
>
> Kernel 3.4:
> http://pastebin.com/raw.php?i=sV9vKsWy
>
> Kernel 3.0:
> http://pastebin.com/raw.php?i=eafjpPpK
>
> Stefan
Would you mind installing blktrace and running "blktrace -o test-3.4 -d
/dev/sdb" on the OSD node during a short (say 60s) test on 3.4?
If you could archive/send me the results, that might help us get an idea
of what is actually getting sent out to the disk. Your data disk
throughput on 3.0 looks pretty close to what I normally get (including
on 3.4). I'm guessing the issue you are seeing on 3.4 is probably not
the seek problem I mentioned earlier (unless something is causing so
many seeks that it more or less paralyzes the disk).
Mark
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-30 12:17 ` Mark Nelson
@ 2012-05-30 12:41 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGsmr8Ht1pTWH1Oe8=NmAyM81SSdH+c_GV89D8ntfyUmgA@mail.gmail.com>
` (2 more replies)
0 siblings, 3 replies; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-30 12:41 UTC (permalink / raw)
To: Mark Nelson; +Cc: Stefan Majer, ceph-devel@vger.kernel.org
Hi Mark,
I didn't have the time to answer your mails, but I will get on this one first.
> Would you mind installing blktrace and running "blktrace -o test-3.4 -d
> /dev/sdb" on the OSD node during a short (say 60s) test on 3.4?
sure no problem.
here it is:
http://www.mediafire.com/?6cw87btn7mzco25
Output:
=== sdb ===
CPU 0: 18075 events, 848 KiB data
CPU 1: 10738 events, 504 KiB data
CPU 2: 8639 events, 405 KiB data
CPU 3: 8614 events, 404 KiB data
CPU 4: 0 events, 0 KiB data
CPU 5: 0 events, 0 KiB data
CPU 6: 143 events, 7 KiB data
CPU 7: 0 events, 0 KiB data
Total: 46209 events (dropped 0), 2167 KiB data
> If you could archive/send me the results, that might help us get an idea
> of what is actually getting sent out to the disk. Your data disk
> throughput on 3.0 looks pretty close to what I normally get (including
> on 3.4). I'm guessing the issue you are seeing on 3.4 is probably not
> the seek problem I mentioned earlier (unless something is causing so
> many seeks that it more or less paralyzes the disk).
As I have an SSD, I can't believe seeks can be a problem.
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
[not found] ` <CADdPHGsmr8Ht1pTWH1Oe8=NmAyM81SSdH+c_GV89D8ntfyUmgA@mail.gmail.com>
@ 2012-05-30 13:19 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGvxCmuViy+0==Vkdz_QjC1K+kD5kD1m7+0tYM2YDTtJbw@mail.gmail.com>
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-30 13:19 UTC (permalink / raw)
To: Stefan Majer; +Cc: Mark Nelson, ceph-devel@vger.kernel.org
Am 30.05.2012 15:11, schrieb Stefan Majer:
> Hi,
>
> I don't think seeks are a problem, because Stefan would see a huge disk
> utilization percentage in iostat, which is not the case.
> I guess the problem with 3.2 and greater is somewhere else, for example
> in a network card driver which changed dramatically or something like that.
I can't believe that the e1000 driver is now so buggy, especially since iperf
still shows me around 950MBit/s no matter whether I run 3.0, 3.2, ...
> Is it possible to do a git bisect on that machine and do some runs?
> Otherwise I see no way to identify this.
I'm not familiar with git bisect, so I can't answer this question.
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-30 12:41 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGsmr8Ht1pTWH1Oe8=NmAyM81SSdH+c_GV89D8ntfyUmgA@mail.gmail.com>
@ 2012-05-30 13:27 ` Mark Nelson
2012-05-30 13:51 ` Stefan Priebe - Profihost AG
2012-05-30 14:16 ` Mark Nelson
2 siblings, 1 reply; 73+ messages in thread
From: Mark Nelson @ 2012-05-30 13:27 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: Stefan Majer, ceph-devel@vger.kernel.org
On 5/30/12 7:41 AM, Stefan Priebe - Profihost AG wrote:
> Hi Mark,
>
> didn't had the time to answer your mails - but i will get on this one first.
>
>> Would you mind installing blktrace and running "blktrace -o test-3.4 -d
>> /dev/sdb" on the OSD node during a short (say 60s) test on 3.4?
> sure no problem.
>
> here it is:
> http://www.mediafire.com/?6cw87btn7mzco25
>
> Output:
> === sdb ===
> CPU 0: 18075 events, 848 KiB data
> CPU 1: 10738 events, 504 KiB data
> CPU 2: 8639 events, 405 KiB data
> CPU 3: 8614 events, 404 KiB data
> CPU 4: 0 events, 0 KiB data
> CPU 5: 0 events, 0 KiB data
> CPU 6: 143 events, 7 KiB data
> CPU 7: 0 events, 0 KiB data
> Total: 46209 events (dropped 0), 2167 KiB data
Great, thanks. I'll try to look at the results later this morning. If
you want to look at them yourself you can open them with the blkparse
program (and seekwatcher too, though there is a bug in the src you have
to fix to make it work right)
>> If you could archive/send me the results, that might help us get an idea
>> of what is actually getting sent out to the disk. Your data disk
>> throughput on 3.0 looks pretty close to what I normally get (including
>> on 3.4). I'm guessing the issue you are seeing on 3.4 is probably not
>> the seek problem I mentioned earlier (unless something is causing so
>> many seeks that it more or less paralyzes the disk).
> As i have a SSD i can't believe seeks can be a problem.
Ah, sorry. I forgot you were on SSD. Honestly I'm surprised that with
3.0 you weren't getting better performance. Something to look into once
we figure out why your 3.4 performance is so bad!
> Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-30 13:27 ` Mark Nelson
@ 2012-05-30 13:51 ` Stefan Priebe - Profihost AG
0 siblings, 0 replies; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-30 13:51 UTC (permalink / raw)
To: Mark Nelson; +Cc: Stefan Majer, ceph-devel@vger.kernel.org
Am 30.05.2012 15:27, schrieb Mark Nelson:
> Great, thanks. I'll try to look at the results later this morning. If
> you want to look at them yourself you can open them with the blkparse
> program (and seekwatcher too, though there is a bug in the src you have
> to fix to make it work right)
I'm not familiar with blkparse and seekwatcher, so I don't know what I
should do with the output...
>>> If you could archive/send me the results, that might help us get an idea
>>> of what is actually getting sent out to the disk. Your data disk
>>> throughput on 3.0 looks pretty close to what I normally get (including
>>> on 3.4). I'm guessing the issue you are seeing on 3.4 is probably not
>>> the seek problem I mentioned earlier (unless something is causing so
>>> many seeks that it more or less paralyzes the disk).
>> As i have a SSD i can't believe seeks can be a problem.
>
> Ah, sorry. I forgot you were on SSD. Honestly I'm surprised that with
> 3.0 you weren't getting better performance. Something to look into once
> we figure out why your 3.4 performance is so bad!
Yes, I think this is a separate problem.
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
[not found] ` <CADdPHGvxCmuViy+0==Vkdz_QjC1K+kD5kD1m7+0tYM2YDTtJbw@mail.gmail.com>
@ 2012-05-30 13:54 ` Stefan Priebe - Profihost AG
[not found] ` <4FC63381.6090300@inktank.com>
1 sibling, 0 replies; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-30 13:54 UTC (permalink / raw)
To: Stefan Majer; +Cc: Mark Nelson, ceph-devel@vger.kernel.org
Am 30.05.2012 15:38, schrieb Stefan Majer:
> There is a small howto from linus:
> http://kerneltrap.org/node/11753
>
> you basically need to be able to compile the kernel from source and
> start in the freshly checked out source
> git bisect good v3.0
> git bisect bad v3.2
>
> Then git will pick a version in between and you can compile this, deploy it
> to your machine and see whether it is good or bad.
> Then tell git if it was bad or good and git will again choose a version
> between both versions. So you will get a single commit or a handful of
> commits which are probably the cause of the problem.
Thanks, I will try that after Mark has looked into the blktrace ;-)
Thanks,
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-30 12:41 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGsmr8Ht1pTWH1Oe8=NmAyM81SSdH+c_GV89D8ntfyUmgA@mail.gmail.com>
2012-05-30 13:27 ` Mark Nelson
@ 2012-05-30 14:16 ` Mark Nelson
2012-05-30 18:42 ` Stefan Priebe
2 siblings, 1 reply; 73+ messages in thread
From: Mark Nelson @ 2012-05-30 14:16 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: Stefan Majer, ceph-devel@vger.kernel.org
On 5/30/12 7:41 AM, Stefan Priebe - Profihost AG wrote:
> Hi Mark,
>
> didn't had the time to answer your mails - but i will get on this one first.
>
>> Would you mind installing blktrace and running "blktrace -o test-3.4 -d
>> /dev/sdb" on the OSD node during a short (say 60s) test on 3.4?
> sure no problem.
>
> here it is:
> http://www.mediafire.com/?6cw87btn7mzco25
>
> Output:
> === sdb ===
> CPU 0: 18075 events, 848 KiB data
> CPU 1: 10738 events, 504 KiB data
> CPU 2: 8639 events, 405 KiB data
> CPU 3: 8614 events, 404 KiB data
> CPU 4: 0 events, 0 KiB data
> CPU 5: 0 events, 0 KiB data
> CPU 6: 143 events, 7 KiB data
> CPU 7: 0 events, 0 KiB data
> Total: 46209 events (dropped 0), 2167 KiB data
>
>> If you could archive/send me the results, that might help us get an idea
>> of what is actually getting sent out to the disk. Your data disk
>> throughput on 3.0 looks pretty close to what I normally get (including
>> on 3.4). I'm guessing the issue you are seeing on 3.4 is probably not
>> the seek problem I mentioned earlier (unless something is causing so
>> many seeks that it more or less paralyzes the disk).
> As i have a SSD i can't believe seeks can be a problem.
>
> Stefan
Ok, I put up a seekwatcher movie showing the writes going to your SSD:
http://nhm.ceph.com/movies/mailinglist-tests/stefan.mpg
Some quick observations:
In your blktrace results there are some really big gaps after cfq
schedule dispatch:
> 8,16 0 0 11.386025866 0 m N cfq schedule dispatch
> 8,16 2 975 12.393446988 3074 A WS 176147976 + 8 <-
> (8,17) 176145928
> 8,16 0 0 12.762164080 0 m N cfq schedule dispatch
> 8,16 0 2193 13.355165118 3312 A WSM 175875008 + 227 <-
> (8,17) 175872960
Specifically, the gap in the movie where there is no write activity
around second 30 correlates in the blktrace results with one of these
stalls:
> 8,16 0 0 29.548567957 0 m N cfq schedule dispatch
> 8,16 2 2185 34.548923918 2688 A W 2192 + 8 <- (8,17) 144
As to why this is happening, I don't know yet. I'll have more later.
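Stalls like the ones quoted above could also be located automatically. What follows is only a minimal Python sketch, assuming the default blkparse text layout seen in the excerpts (timestamp in seconds as the fourth whitespace-separated field) and with each event on a single line, i.e. without the mail line wrapping. The name `find_gaps` and the thresholds are illustrative choices, not part of blktrace.

```python
def find_gaps(lines, threshold=0.5):
    """Return (prev_ts, next_ts, gap) for consecutive events more than
    `threshold` seconds apart."""
    gaps = []
    prev = None
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            ts = float(fields[3])  # fourth field: timestamp in seconds
        except ValueError:
            continue  # not an event line (e.g. a summary line)
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps


# The two excerpts quoted in this mail, with wrapped lines rejoined.
excerpt_a = [
    "8,16 0 0 11.386025866 0 m N cfq schedule dispatch",
    "8,16 2 975 12.393446988 3074 A WS 176147976 + 8 <- (8,17) 176145928",
    "8,16 0 0 12.762164080 0 m N cfq schedule dispatch",
    "8,16 0 2193 13.355165118 3312 A WSM 175875008 + 227 <- (8,17) 175872960",
]
excerpt_b = [
    "8,16 0 0 29.548567957 0 m N cfq schedule dispatch",
    "8,16 2 2185 34.548923918 2688 A W 2192 + 8 <- (8,17) 144",
]

for name, excerpt in (("a", excerpt_a), ("b", excerpt_b)):
    for before, after, gap in find_gaps(excerpt):
        print(f"excerpt {name}: {gap:.3f}s idle between {before:.3f} and {after:.3f}")
```

On a full trace one would feed it the output of blkparse line by line instead of the hand-copied excerpts.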
Mark
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
[not found] ` <4FC63381.6090300@inktank.com>
@ 2012-05-30 14:53 ` Stefan Priebe
2012-05-30 14:56 ` Mark Nelson
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe @ 2012-05-30 14:53 UTC (permalink / raw)
To: Mark Nelson; +Cc: Stefan Majer, ceph-devel@vger.kernel.org
Am 30.05.2012 16:49, schrieb Mark Nelson:
> On 05/30/2012 08:38 AM, Stefan Majer wrote:
>> No, I don't think so either, this was just an example. Maybe it is totally
>> different.
>
> You could try setting up a pool with a replication level of 1 and see
> how that does. It will be faster in any event, but it would be
> interesting to see how much faster.
Is there an easier way than modifying the crush map?
PS: I also tested the noop scheduler - same result.
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-30 14:53 ` Stefan Priebe
@ 2012-05-30 14:56 ` Mark Nelson
2012-05-30 18:26 ` Stefan Priebe
0 siblings, 1 reply; 73+ messages in thread
From: Mark Nelson @ 2012-05-30 14:56 UTC (permalink / raw)
To: Stefan Priebe; +Cc: Stefan Majer, ceph-devel@vger.kernel.org
On 05/30/2012 09:53 AM, Stefan Priebe wrote:
> Am 30.05.2012 16:49, schrieb Mark Nelson:
>> On 05/30/2012 08:38 AM, Stefan Majer wrote:
>>> No, I don't think so either, this was just an example. Maybe it is totally
>>> different.
>>
>> You could try setting up a pool with a replication level of 1 and see
>> how that does. It will be faster in any event, but it would be
>> interesting to see how much faster.
> is there an easier way than modifying the crush map?
>
> PS: i also tested noop scheduler - same result.
>
> Stefan
something like:
ceph osd pool create POOL [pg_num [pgp_num]]
then:
ceph osd pool set POOL size VALUE
Mark
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-30 14:56 ` Mark Nelson
@ 2012-05-30 18:26 ` Stefan Priebe
2012-05-30 19:41 ` Mark Nelson
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe @ 2012-05-30 18:26 UTC (permalink / raw)
To: Mark Nelson; +Cc: Stefan Majer, ceph-devel@vger.kernel.org
Hi Mark,
Am 30.05.2012 16:56, schrieb Mark Nelson:
> On 05/30/2012 09:53 AM, Stefan Priebe wrote:
>> Am 30.05.2012 16:49, schrieb Mark Nelson:
>>> You could try setting up a pool with a replication level of 1 and see
>>> how that does. It will be faster in any event, but it would be
>>> interesting to see how much faster.
>> is there an easier way than modifying the crush map?
>
> something like:
> ceph osd pool create POOL [pg_num [pgp_num]]
> then:
> ceph osd pool set POOL size VALUE
With pool size 1 the writes are constant at around 112MB/s:
http://pastebin.com/raw.php?i=haDPNTfQ
So does it have something to do with the replication?
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-30 14:16 ` Mark Nelson
@ 2012-05-30 18:42 ` Stefan Priebe
[not found] ` <CADdPHGuxa7TAyqXcXehb9WgKgkHwkybYTrj2oue_PKsiF+oR3A@mail.gmail.com>
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe @ 2012-05-30 18:42 UTC (permalink / raw)
To: Mark Nelson; +Cc: Stefan Majer, ceph-devel@vger.kernel.org
Hi Mark,
> Specifically, the gap in the movie where there is no write activity
> around second 30 correlates in the blktrace results with one of these
> stalls:
>> 8,16 0 0 29.548567957 0 m N cfq schedule dispatch
>> 8,16 2 2185 34.548923918 2688 A W 2192 + 8 <- (8,17) 144
>
> As to why this is happening, I don't know yet. I'll have more later.
Should I try the bisect thing?
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
2012-05-30 18:26 ` Stefan Priebe
@ 2012-05-30 19:41 ` Mark Nelson
0 siblings, 0 replies; 73+ messages in thread
From: Mark Nelson @ 2012-05-30 19:41 UTC (permalink / raw)
To: Stefan Priebe; +Cc: Stefan Majer, ceph-devel@vger.kernel.org
On 05/30/2012 01:26 PM, Stefan Priebe wrote:
> Hi Mark,
>
> Am 30.05.2012 16:56, schrieb Mark Nelson:
>> On 05/30/2012 09:53 AM, Stefan Priebe wrote:
>>> Am 30.05.2012 16:49, schrieb Mark Nelson:
>>>> You could try setting up a pool with a replication level of 1 and see
>>>> how that does. It will be faster in any event, but it would be
>>>> interesting to see how much faster.
>>> is there an easier way than modifying the crush map?
>>
>> something like:
>> ceph osd pool create POOL [pg_num [pgp_num]]
>> then:
>> ceph osd pool set POOL size VALUE
>
> With pool size 1 the writes are constant around 112MB/s:
> http://pastebin.com/raw.php?i=haDPNTfQ
>
> So has it something todo with the replication?
>
> Stefan
Well now that is interesting. Replication is pretty network heavy. In
addition to the client transfers to the OSDs, you have each OSD node
sending and receiving data from each other. Based on these results it
looks like you may be stalling waiting for data to replicate so the
client stops sending new requests. If you set the osd, filestore, and
messenger debugging up to like 20 you'll get a ton of info that may
provide more clues.
Otherwise, a while ago I started making a list of performance related
settings and tests that we (Inktank) may want to check for customers.
Note that this is a work in progress and the values may not be exactly
right yet. You could check and see if any of the networking settings
have changed on your setup between 3.0 and 3.4:
http://ceph.com/wiki/Performance_analysis
Also there was a thread a while back where Jim Schutt saw problems that
looked like disk performance issues due to tcp autotuning policy:
http://www.spinics.net/lists/ceph-devel/msg05049.html
That seemed to be more an issue with lots of clients and OSDs per node,
but I thought I'd mention it since some of the effects are similar.
Mark
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4
[not found] ` <CADdPHGuxa7TAyqXcXehb9WgKgkHwkybYTrj2oue_PKsiF+oR3A@mail.gmail.com>
@ 2012-05-30 21:10 ` Stefan Priebe
[not found] ` <CADdPHGutEwoDc=Kcrqcx2ZMO=dqhuoT5iLoP-WxqD+e5ZUmBRA@mail.gmail.com>
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe @ 2012-05-30 21:10 UTC (permalink / raw)
To: Stefan Majer; +Cc: Mark Nelson, ceph-devel@vger.kernel.org
Am 30.05.2012 20:47, schrieb Stefan Majer:
> From my perspective Mark's hints regarding blktrace lead to the same
> conclusion as the iostat output.
> You see stalls that are not induced by the disk, and no other obvious hints
> where the lag might come from.
> So if you want to know why kernels > 3.2 are slow for your workload, I
> would drill down into this with git bisect.
OK, here are some tests regarding the kernel version, all made with XFS.
Starting with 3.2.0-rc1 throughput drops from 164MB/s (bonding) to 119MB/s, but
it never goes down to 0MB/s. 3.2.18 shows the same as 3.2-rc1.
Then with 3.3-rc1 I'm seeing even faster speed (178MB/s) than with 3.0.X,
so everything is fine again. So it seems 3.2.X had another bug which
reduced the speed and which was fixed in 3.3.
Beginning with 3.3-rc4 it gets bad, with drops to 0MB/s. So it should be
a commit between 3.3-rc3 and 3.3-rc4. Sadly these are 370 commits. No
idea where to start.
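For scale: bisection halves the candidate range with every test kernel, so 370 commits need about nine builds rather than 370. Below is a minimal Python sketch of that search, assuming a linear history and a hypothetical `is_bad(index)` callback that stands in for "build, boot, run rados bench". Real `git bisect` also has to deal with merges, which this does not model.

```python
def bisect_first_bad(n_commits, is_bad):
    """Find the index of the first bad commit among n_commits ordered commits.

    Assumes the last commit is known bad and every commit before the
    first bad one is good. Returns (first_bad_index, number_of_tests).
    """
    lo, hi = 0, n_commits - 1  # the first bad commit lies in [lo, hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(mid):
            hi = mid        # first bad commit is mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return lo, tests


# Hypothetical example: 370 commits, regression introduced at index 123.
first_bad, tests = bisect_first_bad(370, lambda i: i >= 123)
print(f"first bad commit index: {first_bad}, kernels tested: {tests}")
```

With git itself the equivalent starting point would be `git bisect start`, `git bisect bad v3.3-rc4` and `git bisect good v3.3-rc3`.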
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
[not found] ` <CADdPHGutEwoDc=Kcrqcx2ZMO=dqhuoT5iLoP-WxqD+e5ZUmBRA@mail.gmail.com>
@ 2012-05-31 7:10 ` Stefan Priebe - Profihost AG
2012-05-31 7:30 ` Yehuda Sadeh
[not found] ` <CADdPHGv0YjxDQFnZML-55jDj7XxHxaxUZ_FeQ=ReKK6Rs7NNhw@mail.gmail.com>
0 siblings, 2 replies; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-31 7:10 UTC (permalink / raw)
To: Stefan Majer; +Cc: Mark Nelson, ceph-devel
Hi Mark, Hi Stefan,
first, thanks for all your help and time.
I found the commit which results in this problem and it is TCP related,
but I'm still wondering whether the behaviour this commit produces is
intended.
The commit in question is:
git show c43b874d5d714f271b80d4c3f49e05d0cbf51ed2
commit c43b874d5d714f271b80d4c3f49e05d0cbf51ed2
Author: Jason Wang <jasowang@redhat.com>
Date: Thu Feb 2 00:07:00 2012 +0000
tcp: properly initialize tcp memory limits
Commit 4acb4190 tries to fix the using uninitialized value
introduced by commit 3dc43e3, but it would make the
per-socket memory limits too small.
This patch fixes this and also remove the redundant codes
introduced in 4acb4190.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 4cb9cd2..7a7724d 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -778,7 +778,6 @@ EXPORT_SYMBOL_GPL(net_ipv4_ctl_path);
static __net_init int ipv4_sysctl_init_net(struct net *net)
{
struct ctl_table *table;
- unsigned long limit;
table = ipv4_net_table;
if (!net_eq(net, &init_net)) {
@@ -815,11 +814,6 @@ static __net_init int ipv4_sysctl_init_net(struct net *net)
net->ipv4.sysctl_rt_cache_rebuild_count = 4;
tcp_init_mem(net);
- limit = nr_free_buffer_pages() / 8;
- limit = max(limit, 128UL);
- net->ipv4.sysctl_tcp_mem[0] = limit / 4 * 3;
- net->ipv4.sysctl_tcp_mem[1] = limit;
- net->ipv4.sysctl_tcp_mem[2] = net->ipv4.sysctl_tcp_mem[0] * 2;
net->ipv4.ipv4_hdr = register_net_sysctl_table(net,
net_ipv4_ctl_path, table);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a34f5cf..37755cc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3229,7 +3229,6 @@ __setup("thash_entries=", set_thash_entries);
void tcp_init_mem(struct net *net)
{
- /* Set per-socket limits to no more than 1/128 the pressure threshold */
unsigned long limit = nr_free_buffer_pages() / 8;
limit = max(limit, 128UL);
net->ipv4.sysctl_tcp_mem[0] = limit / 4 * 3;
@@ -3298,7 +3297,8 @@ void __init tcp_init(void)
sysctl_max_syn_backlog = max(128, cnt / 256);
tcp_init_mem(&init_net);
- limit = nr_free_buffer_pages() / 8;
+ /* Set per-socket limits to no more than 1/128 the pressure threshold */
+ limit = nr_free_buffer_pages() << (PAGE_SHIFT - 10);
limit = max(limit, 128UL);
max_share = min(4UL*1024*1024, limit);
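To get a feeling for the size of the change in the tcp_init() hunk, the removed and the added form of the limit computation can be compared numerically. This is only a sketch in Python under stated assumptions: PAGE_SHIFT is taken as 12 (4 KiB pages), the page count is a hypothetical value for a machine with roughly 16 GB of RAM, and only the lines visible in the diff are modelled. How max_share is used afterwards is not shown in the hunk and is not modelled here.

```python
PAGE_SHIFT = 12                   # assumption: 4 KiB pages
MAX_SHARE_CAP = 4 * 1024 * 1024   # the 4UL*1024*1024 constant in the hunk


def max_share_removed_line(free_buffer_pages):
    """max_share with the removed line: limit = nr_free_buffer_pages() / 8."""
    limit = max(free_buffer_pages // 8, 128)
    return min(MAX_SHARE_CAP, limit)


def max_share_added_line(free_buffer_pages):
    """max_share with the added line:
    limit = nr_free_buffer_pages() << (PAGE_SHIFT - 10)."""
    limit = max(free_buffer_pages << (PAGE_SHIFT - 10), 128)
    return min(MAX_SHARE_CAP, limit)


# Hypothetical page count: about 15.7 GiB worth of 4 KiB pages.
pages = 4118984
print("removed line:", max_share_removed_line(pages))
print("added line:  ", max_share_added_line(pages))
```

Under these assumptions the removed line yields a value of a few hundred KiB while the added line hits the 4 MiB cap. Values of both sizes appear in the tcp_wmem comparison posted later in this thread.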
Greets
Stefan
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 7:10 ` poor OSD performance using kernel 3.4 => problem found Stefan Priebe - Profihost AG
@ 2012-05-31 7:30 ` Yehuda Sadeh
[not found] ` <CADdPHGtz9Jq624DMO6Dve2AcJ9vrnFHbyqRa+qheA+0-y4k++g@mail.gmail.com>
2012-05-31 13:21 ` Yann Dupont
[not found] ` <CADdPHGv0YjxDQFnZML-55jDj7XxHxaxUZ_FeQ=ReKK6Rs7NNhw@mail.gmail.com>
1 sibling, 2 replies; 73+ messages in thread
From: Yehuda Sadeh @ 2012-05-31 7:30 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: Stefan Majer, Mark Nelson, ceph-devel
On Thu, May 31, 2012 at 12:10 AM, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
> Hi Marc, Hi Stefan,
>
> first thanks for all your help and time.
>
> I found the commit which results in this problem and it is TCP related
> but i'm still wondering if the expected behaviour of this commit is
> expected?
>
> The commit in question is:
> git show c43b874d5d714f271b80d4c3f49e05d0cbf51ed2
> commit c43b874d5d714f271b80d4c3f49e05d0cbf51ed2
> Author: Jason Wang <jasowang@redhat.com>
> Date: Thu Feb 2 00:07:00 2012 +0000
>
> tcp: properly initialize tcp memory limits
>
> Commit 4acb4190 tries to fix the using uninitialized value
> introduced by commit 3dc43e3, but it would make the
> per-socket memory limits too small.
>
> This patch fixes this and also remove the redundant codes
> introduced in 4acb4190.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Acked-by: Glauber Costa <glommer@parallels.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index 4cb9cd2..7a7724d 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -778,7 +778,6 @@ EXPORT_SYMBOL_GPL(net_ipv4_ctl_path);
> static __net_init int ipv4_sysctl_init_net(struct net *net)
> {
> struct ctl_table *table;
> - unsigned long limit;
>
> table = ipv4_net_table;
> if (!net_eq(net, &init_net)) {
> @@ -815,11 +814,6 @@ static __net_init int ipv4_sysctl_init_net(struct
> net *net)
> net->ipv4.sysctl_rt_cache_rebuild_count = 4;
>
> tcp_init_mem(net);
> - limit = nr_free_buffer_pages() / 8;
> - limit = max(limit, 128UL);
> - net->ipv4.sysctl_tcp_mem[0] = limit / 4 * 3;
> - net->ipv4.sysctl_tcp_mem[1] = limit;
> - net->ipv4.sysctl_tcp_mem[2] = net->ipv4.sysctl_tcp_mem[0] * 2;
>
> net->ipv4.ipv4_hdr = register_net_sysctl_table(net,
> net_ipv4_ctl_path, table);
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index a34f5cf..37755cc 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3229,7 +3229,6 @@ __setup("thash_entries=", set_thash_entries);
>
> void tcp_init_mem(struct net *net)
> {
> - /* Set per-socket limits to no more than 1/128 the pressure
> threshold */
> unsigned long limit = nr_free_buffer_pages() / 8;
> limit = max(limit, 128UL);
> net->ipv4.sysctl_tcp_mem[0] = limit / 4 * 3;
> @@ -3298,7 +3297,8 @@ void __init tcp_init(void)
> sysctl_max_syn_backlog = max(128, cnt / 256);
>
> tcp_init_mem(&init_net);
> - limit = nr_free_buffer_pages() / 8;
> + /* Set per-socket limits to no more than 1/128 the pressure
> threshold */
> + limit = nr_free_buffer_pages() << (PAGE_SHIFT - 10);
> limit = max(limit, 128UL);
> max_share = min(4UL*1024*1024, limit);
>
Yeah, this might have affected the tcp performance. Looking at the
current linus tree this function looks more like it looked beforehand,
so it was probably reverted one way or another.
Yehuda
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
[not found] ` <CADdPHGv0YjxDQFnZML-55jDj7XxHxaxUZ_FeQ=ReKK6Rs7NNhw@mail.gmail.com>
@ 2012-05-31 8:04 ` Stefan Priebe - Profihost AG
2012-05-31 8:09 ` Stefan Majer
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-31 8:04 UTC (permalink / raw)
To: Stefan Majer; +Cc: Mark Nelson, yehuda, ceph-devel@vger.kernel.org
Am 31.05.2012 09:27, schrieb Stefan Majer:
> we have set them in /etc/sysctl.conf to:
> net.ipv4.tcp_mem = 10000000 10000000 10000000
This does not help ;-(
> wow, this was fast !
> if i understand this commit correctly it simply skips an in-kernel
> configuration of network related sysctl parameters, especially
> net.ipv4.tcp_mem
I also tried this one:
net.ipv4.tcp_rmem = 4096 524287 16777216
net.ipv4.tcp_wmem = 4096 524287 16777216
# grabbed values from 3.0.X
net.ipv4.tcp_mem = 1162962 1550617 2325924
Still no help. But if I use 3.4 and revert the commit it works fine.
But I wasn't able to find which other parts are influenced by this limit
while browsing through the source.
I only found:
net.ipv4.tcp_mem
and
net.ipv4.tcp_rmem
and
net.ipv4.tcp_wmem
Greets
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 8:04 ` Stefan Priebe - Profihost AG
@ 2012-05-31 8:09 ` Stefan Majer
2012-05-31 11:34 ` Stefan Priebe - Profihost AG
2012-05-31 12:18 ` Stefan Priebe - Profihost AG
0 siblings, 2 replies; 73+ messages in thread
From: Stefan Majer @ 2012-05-31 8:09 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Mark Nelson, yehuda, ceph-devel@vger.kernel.org
Hi Stefan,
then you should probably describe this in a short mail to Jason Wang
and ask him how to circumvent this commit with sysctl settings.
I'm pretty sure my sysctl setting reverts the first part of the
commit. So probably the second part is the evil one?
Greetings
Stefan
On Thu, May 31, 2012 at 10:04 AM, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
>
> Am 31.05.2012 09:27, schrieb Stefan Majer:
> > we have set them in /etc/sysctl.conf to:
> > net.ipv4.tcp_mem = 10000000 10000000 10000000
>
> This does not help ;-(
>
> > wow, this was fast !
> > if i understand this commit correct it simply skips a in-kernel
> > configuration of network related sysctl parameters, especialy
> > net.ipv4.tcp_mem
>
> I also tied this one:
> net.ipv4.tcp_rmem = 4096 524287 16777216
> net.ipv4.tcp_wmem = 4096 524287 16777216
> # grabbed values from 3.0.X
> net.ipv4.tcp_mem = 1162962 1550617 2325924
>
> still - no help -. But if i use 3.4 and revert the commit it works fine.
> But i wasn't able to find which other parts are influenced by this limit
> while browsing through the source.
>
> I only found:
> net.ipv4.tcp_mem
> and
> net.ipv4.tcp_rmem
> and
> net.ipv4.tcp_wmem
>
> Greets
> Stefan
--
Stefan Majer
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 8:09 ` Stefan Majer
@ 2012-05-31 11:34 ` Stefan Priebe - Profihost AG
2012-05-31 12:18 ` Stefan Priebe - Profihost AG
1 sibling, 0 replies; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-31 11:34 UTC (permalink / raw)
To: Stefan Majer; +Cc: Mark Nelson, yehuda, ceph-devel@vger.kernel.org
Am 31.05.2012 10:09, schrieb Stefan Majer:
> Hi Stefan,
>
> then you should probably describe this in a short mail to Jason Wang
> and ask him how to circumvent this commit with sysctl settings.
Done, hopefully he can help.
> I'm pretty sure my sysctl setting reverts the first part of the
> commit. So probably the second part is the evil one ?
Yes, it seems so.
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 8:09 ` Stefan Majer
2012-05-31 11:34 ` Stefan Priebe - Profihost AG
@ 2012-05-31 12:18 ` Stefan Priebe - Profihost AG
1 sibling, 0 replies; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-31 12:18 UTC (permalink / raw)
To: Stefan Majer; +Cc: Mark Nelson, yehuda, ceph-devel@vger.kernel.org
Hi Mark, Hi Stefan,
I found a way to solve it by comparing /proc/sys/net between a patched and
an unpatched kernel.
Strangely, the problem occurs when the values are too big (in the new kernel).
With the smaller values everything works fine, even under 3.4. Any ideas
how that can be? I thought these values should be tuned to a maximum for
max performance.
- => old kernel
+ => new kernel
-/proc/sys/net/ipv4/tcp_rmem:4096 87380 6291456
+/proc/sys/net/ipv4/tcp_rmem:4096 87380 514873
-/proc/sys/net/ipv4/tcp_wmem:4096 16384 4194304
+/proc/sys/net/ipv4/tcp_wmem:4096 16384 514873
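A comparison like the one above can be scripted. The following is a minimal Python sketch, not the method actually used here: `snapshot` and `diff_snapshots` are hypothetical helper names, and the two sample dicts simply repeat the values from the listing. On a real system one would call `snapshot()` once per kernel and store the result (for example as JSON) across the reboot.

```python
import os


def snapshot(root="/proc/sys/net/ipv4"):
    """Read every readable file under root into a {path: contents} dict."""
    values = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path) as fh:
                    # Collapse tabs and newlines so values compare cleanly.
                    values[path] = " ".join(fh.read().split())
            except (OSError, UnicodeDecodeError):
                continue  # some entries are write-only or restricted
    return values


def diff_snapshots(first, second):
    """Return {path: (first_value, second_value)} for entries that differ."""
    return {
        path: (first.get(path), second.get(path))
        for path in sorted(set(first) | set(second))
        if first.get(path) != second.get(path)
    }


# Values copied from the listing above ("-" lines and "+" lines).
minus_kernel = {
    "/proc/sys/net/ipv4/tcp_rmem": "4096 87380 6291456",
    "/proc/sys/net/ipv4/tcp_wmem": "4096 16384 4194304",
}
plus_kernel = {
    "/proc/sys/net/ipv4/tcp_rmem": "4096 87380 514873",
    "/proc/sys/net/ipv4/tcp_wmem": "4096 16384 514873",
}

for path, (a, b) in diff_snapshots(minus_kernel, plus_kernel).items():
    print(f"-{path}:{a}")
    print(f"+{path}:{b}")
```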
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
[not found] ` <CADdPHGtz9Jq624DMO6Dve2AcJ9vrnFHbyqRa+qheA+0-y4k++g@mail.gmail.com>
@ 2012-05-31 12:31 ` Mark Nelson
2012-05-31 12:33 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 73+ messages in thread
From: Mark Nelson @ 2012-05-31 12:31 UTC (permalink / raw)
To: Stefan Majer; +Cc: Yehuda Sadeh, Stefan Priebe - Profihost AG, ceph-devel
Hi Stefan,
Please do share! I was planning on starting out on the wiki and
eventually getting these kinds of things into the master docs. If you
(and others) have already done testing it would be really interesting to
compare experiences. So far I've been just kind of throwing stuff into:
http://ceph.com/wiki/Performance_analysis
In its current form it's pretty inadequate, but I'm hoping to
eventually get back to it. A lot of the work I've been doing recently
is looking at underlying FS write behavior (specifically seeks) and if
we can get any reasonable improvement through mkfs and mount options.
Mark
On 5/31/12 2:34 AM, Stefan Majer wrote:
> Hi,
>
> if Stefan confirms this as a solution it might be a good idea to
> collect some performance optimization hints for OSDs at
> http://ceph.com/docs/master
> probably separated into:
>
> Gigabit Ethernet based deployments
> with Jumbo Frames
>
> without Jumbo Frames
> 10 Gigabit Ethernet based deployments
> with Jumbo Frames
>
> without Jumbo Frames
>
> I can share some of our configurations as well
>
> Greetings
> Stefan
>
> On Thu, May 31, 2012 at 9:30 AM, Yehuda Sadeh <yehuda@inktank.com> wrote:
>
> On Thu, May 31, 2012 at 12:10 AM, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
> > Hi Marc, Hi Stefan,
> >
> > first thanks for all your help and time.
> >
> > I found the commit which results in this problem and it is TCP
> related
> > but i'm still wondering if the expected behaviour of this commit is
> > expected?
> >
> > The commit in question is:
> > git show c43b874d5d714f271b80d4c3f49e05d0cbf51ed2
> > commit c43b874d5d714f271b80d4c3f49e05d0cbf51ed2
> > Author: Jason Wang <jasowang@redhat.com>
> > Date: Thu Feb 2 00:07:00 2012 +0000
> >
> > tcp: properly initialize tcp memory limits
> >
> > Commit 4acb4190 tries to fix the using uninitialized value
> > introduced by commit 3dc43e3, but it would make the
> > per-socket memory limits too small.
> >
> > This patch fixes this and also remove the redundant codes
> > introduced in 4acb4190.
> >
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > Acked-by: Glauber Costa <glommer@parallels.com>
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> > index 4cb9cd2..7a7724d 100644
> > --- a/net/ipv4/sysctl_net_ipv4.c
> > +++ b/net/ipv4/sysctl_net_ipv4.c
> > @@ -778,7 +778,6 @@ EXPORT_SYMBOL_GPL(net_ipv4_ctl_path);
> > static __net_init int ipv4_sysctl_init_net(struct net *net)
> > {
> > struct ctl_table *table;
> > - unsigned long limit;
> >
> > table = ipv4_net_table;
> > if (!net_eq(net, &init_net)) {
> > @@ -815,11 +814,6 @@ static __net_init int
> ipv4_sysctl_init_net(struct
> > net *net)
> > net->ipv4.sysctl_rt_cache_rebuild_count = 4;
> >
> > tcp_init_mem(net);
> > - limit = nr_free_buffer_pages() / 8;
> > - limit = max(limit, 128UL);
> > - net->ipv4.sysctl_tcp_mem[0] = limit / 4 * 3;
> > - net->ipv4.sysctl_tcp_mem[1] = limit;
> > - net->ipv4.sysctl_tcp_mem[2] =
> net->ipv4.sysctl_tcp_mem[0] * 2;
> >
> > net->ipv4.ipv4_hdr = register_net_sysctl_table(net,
> > net_ipv4_ctl_path, table);
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index a34f5cf..37755cc 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -3229 <tel:3229>,7 +3229,6 @@ __setup("thash_entries=",
> set_thash_entries);
> >
> > void tcp_init_mem(struct net *net)
> > {
> > - /* Set per-socket limits to no more than 1/128 the pressure
> > threshold */
> > unsigned long limit = nr_free_buffer_pages() / 8;
> > limit = max(limit, 128UL);
> > net->ipv4.sysctl_tcp_mem[0] = limit / 4 * 3;
> > @@ -3298 <tel:3298>,7 +3297,8 @@ void __init tcp_init(void)
> > sysctl_max_syn_backlog = max(128, cnt / 256);
> >
> > tcp_init_mem(&init_net);
> > - limit = nr_free_buffer_pages() / 8;
> > + /* Set per-socket limits to no more than 1/128 the pressure
> > threshold */
> > + limit = nr_free_buffer_pages() << (PAGE_SHIFT - 10);
> > limit = max(limit, 128UL);
> > max_share = min(4UL*1024*1024, limit);
> >
> Yeah, this might have affected the tcp performance. Looking at the
> current linus tree this function looks more like it looked beforehand,
> so it was probably reverted one way or another.
>
> Yehuda
>
>
>
>
> --
> Stefan Majer
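For readers who want to follow the numbers, the arithmetic visible in the quoted hunks can be sketched in Python. This is only an illustration under stated assumptions: PAGE_SHIFT = 12 (4 KiB pages) is assumed, the function names are made up here, and the tcp_mem[1] and tcp_mem[2] formulas are taken from the removed duplicate lines in sysctl_net_ipv4.c, which the commit message describes as redundant. What the kernel does with max_share afterwards is not shown in the quoted diff and is not modelled.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def tcp_mem_limits(free_buffer_pages):
    """Mirror the tcp_mem[] arithmetic from the quoted diff (values in pages)."""
    limit = max(free_buffer_pages // 8, 128)
    low = limit // 4 * 3  # integer division first, as in the C expression
    return (low, limit, low * 2)


def per_socket_max_share(free_buffer_pages):
    """Mirror the patched tcp_init() lines that compute max_share."""
    limit = max(free_buffer_pages << (PAGE_SHIFT - 10), 128)
    return min(4 * 1024 * 1024, limit)


if __name__ == "__main__":
    pages = 262144  # hypothetical machine: 1 GiB worth of free buffer pages
    print(tcp_mem_limits(pages))
    print(per_socket_max_share(pages))
```

With the patched shift, max_share only reaches the 4 MiB cap once nr_free_buffer_pages() is at least 1048576 in this model.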
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 12:31 ` Mark Nelson
@ 2012-05-31 12:33 ` Stefan Priebe - Profihost AG
0 siblings, 0 replies; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-31 12:33 UTC (permalink / raw)
To: Mark Nelson; +Cc: Stefan Majer, Yehuda Sadeh, ceph-devel
Am 31.05.2012 14:31, schrieb Mark Nelson:
> Hi Stefan,
>
> Please do share! I was planning on starting out on the wiki and
> eventually getting these kinds of things into the master docs. If you
> (and others) have already done testing it would be really interesting to
> compare experiences. So far I've been just kind of throwing stuff into:
>
> http://ceph.com/wiki/Performance_analysis
>
> In its current form it's pretty inadequate, but I'm hoping to
> eventually get back to it. A lot of the work I've been doing recently is
> looking at underlying FS write behavior (specifically seeks) and if we
> can get any reasonable improvement through mkfs and mount options.
At least I'll start sharing once I have a system that runs well ;-) I plan
to switch to 10GbE next week.
Stefan
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 7:30 ` Yehuda Sadeh
[not found] ` <CADdPHGtz9Jq624DMO6Dve2AcJ9vrnFHbyqRa+qheA+0-y4k++g@mail.gmail.com>
@ 2012-05-31 13:21 ` Yann Dupont
2012-05-31 13:37 ` Stefan Priebe - Profihost AG
1 sibling, 1 reply; 73+ messages in thread
From: Yann Dupont @ 2012-05-31 13:21 UTC (permalink / raw)
To: Yehuda Sadeh
Cc: Stefan Priebe - Profihost AG, Stefan Majer, Mark Nelson,
ceph-devel
On 31/05/2012 09:30, Yehuda Sadeh wrote:
> On Thu, May 31, 2012 at 12:10 AM, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>> Hi Mark, Hi Stefan,
>>
Hello, back today.
Today I upgraded my last 2 OSD nodes with big storage, so now all my
nodes are equivalent.
With the 3.4.0 kernel I still get good results with the rbd pool, but
jumping values with data.
>> first thanks for all your help and time.
>>
>> I found the commit which results in this problem and it is TCP related
>> but i'm still wondering if the behaviour of this commit is
>> expected?
>
....
>>
> Yeah, this might have affected the tcp performance. Looking at the
> current linus tree this function looks more like it looked beforehand,
> so it was probably reverted one way or another.
>
> Yehuda
Well, I saw you probably found the culprit,
so I tried the latest git kernel (as of this morning).
Now data gives good results:
root@label5:~# rados -p data bench 20 write -t 16
Maintaining 16 concurrent writes of 4194304 bytes for at least 20 seconds.
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 215 199 795.765 796 0.073769 0.0745517
2 16 430 414 827.833 860 0.060165 0.0753952
3 16 632 616 821.207 808 0.072241 0.0772463
4 16 838 822 821.883 824 0.129571 0.0768741
5 16 1039 1023 818.271 804 0.056867 0.077637
6 16 1254 1238 825.209 860 0.078801 0.0771122
7 16 1474 1458 833.023 880 0.062886 0.0764071
8 16 1669 1653 826.389 780 0.09632 0.0767323
9 16 1877 1861 827.003 832 0.083765 0.0770398
10 16 2087 2071 828.294 840 0.051437 0.076937
11 16 2309 2293 833.714 888 0.080584 0.0764829
12 16 2535 2519 839.563 904 0.078095 0.0759574
13 16 2762 2746 844.816 908 0.081323 0.0754571
14 16 2984 2968 847.889 888 0.076973 0.0752921
15 16 3203 3187 849.754 876 0.069877 0.0750613
16 16 3437 3421 855.138 936 0.046845 0.0746941
17 16 3655 3639 856.126 872 0.052258 0.0745157
18 16 3862 3846 854.559 828 0.061542 0.0746875
19 16 4085 4069 856.525 892 0.053889 0.0745582
min lat: 0.033007 max lat: 0.462951 avg lat: 0.0743988
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 15 4308 4293 858.492 896 0.054176 0.0743988
Total time run: 20.103415
Total writes made: 4309
Write size: 4194304
Bandwidth (MB/sec): 857.367
Average Latency: 0.0746302
Max latency: 0.462951
Min latency: 0.033007
But very strangely it's now rbd that isn't stable ?!
root@label5:~# rados -p rbd bench 20 write -t 16
Maintaining 16 concurrent writes of 4194304 bytes for at least 20 seconds.
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 155 139 555.87 556 0.046232 0.109021
2 16 250 234 467.923 380 0.046793 0.0985316
3 16 250 234 311.955 0 - 0.0985316
4 16 250 234 233.965 0 - 0.0985316
5 16 250 234 187.173 0 - 0.0985316
6 16 266 250 166.645 16 0.038083 0.175697
7 16 266 250 142.839 0 - 0.175697
8 16 441 425 212.475 350 0.05512 0.298391
9 16 476 460 204.422 140 0.04372 0.280483
10 16 531 515 205.976 220 0.125076 0.309449
11 16 734 718 261.06 812 0.127582 0.244134
12 16 795 779 259.637 244 0.065158 0.234156
13 16 818 802 246.742 92 0.054514 0.241704
14 16 830 814 232.546 48 0.044386 0.239006
15 16 837 821 218.909 28 3.41523 0.267521
16 16 1043 1027 256.721 824 0.04898 0.248212
17 16 1147 1131 266.088 416 0.048591 0.232725
18 16 1147 1131 251.305 0 - 0.232725
19 16 1202 1186 249.657 110 0.081777 0.25501
min lat: 0.033773 max lat: 5.92059 avg lat: 0.245711
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 16 1296 1280 255.97 376 0.053797 0.245711
21 9 1297 1288 245.305 32 0.708133 0.248248
22 9 1297 1288 234.155 0 - 0.248248
23 9 1297 1288 223.975 0 - 0.248248
24 9 1297 1288 214.643 0 - 0.248248
25 9 1297 1288 206.057 0 - 0.248248
26 9 1297 1288 198.131 0 - 0.248248
Total time run: 26.829870
Total writes made: 1297
Write size: 4194304
Bandwidth (MB/sec): 193.367
Average Latency: 0.295922
Max latency: 7.36701
Min latency: 0.033773
Strange. I'm wondering if this has something to do with cache (that is,
operation I could have done before on nodes, as all my nodes are just
freshly rebooted).
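To put a number on "jumping", one could count the seconds in which no write completed (cur MB/s of 0). A minimal Python sketch, assuming the eight-column per-second rows printed by rados bench above; the function names are made up for illustration and the sample is just the first rows of the rbd run:

```python
def parse_bench_rows(text):
    """Return (sec, cur_mb_s) pairs from per-second rados bench rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Per-second rows have 8 columns: sec, cur ops, started, finished,
        # avg MB/s, cur MB/s, last lat, avg lat. Headers and summary lines
        # are skipped because they do not start with a number.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        rows.append((int(fields[0]), float(fields[5])))
    return rows


def stalled_seconds(rows):
    """Seconds after sec 0 in which cur MB/s was zero."""
    return [sec for sec, cur in rows if sec > 0 and cur == 0.0]


SAMPLE = """\
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 155 139 555.87 556 0.046232 0.109021
2 16 250 234 467.923 380 0.046793 0.0985316
3 16 250 234 311.955 0 - 0.0985316
4 16 250 234 233.965 0 - 0.0985316
5 16 250 234 187.173 0 - 0.0985316
6 16 266 250 166.645 16 0.038083 0.175697
"""

if __name__ == "__main__":
    print(stalled_seconds(parse_bench_rows(SAMPLE)))
```

Running the same count over the full data and rbd outputs would give a simple stall figure per run to compare between kernels.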
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 13:21 ` Yann Dupont
@ 2012-05-31 13:37 ` Stefan Priebe - Profihost AG
2012-05-31 13:45 ` Yann Dupont
0 siblings, 1 reply; 73+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-05-31 13:37 UTC (permalink / raw)
To: Yann Dupont; +Cc: Yehuda Sadeh, Stefan Majer, Mark Nelson, ceph-devel
Am 31.05.2012 15:21, schrieb Yann Dupont:
> On 31/05/2012 09:30, Yehuda Sadeh wrote:
>> On Thu, May 31, 2012 at 12:10 AM, Stefan Priebe - Profihost AG
>> <s.priebe@profihost.ag> wrote:
> But very strangely it's now rbd that isn't stable ?!
>
> root@label5:~# rados -p rbd bench 20 write -t 16
> Maintaining 16 concurrent writes of 4194304 bytes for at least 20 seconds.
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 0 0 0 0 0 0 - 0
> 1 16 155 139 555.87 556 0.046232 0.109021
> 2 16 250 234 467.923 380 0.046793 0.0985316
> 3 16 250 234 311.955 0 - 0.0985316
> 4 16 250 234 233.965 0 - 0.0985316
> 5 16 250 234 187.173 0 - 0.0985316
> 6 16 266 250 166.645 16 0.038083 0.175697
> 7 16 266 250 142.839 0 - 0.175697
> 8 16 441 425 212.475 350 0.05512 0.298391
> 9 16 476 460 204.422 140 0.04372 0.280483
> 10 16 531 515 205.976 220 0.125076 0.309449
> 11 16 734 718 261.06 812 0.127582 0.244134
> 12 16 795 779 259.637 244 0.065158 0.234156
> 13 16 818 802 246.742 92 0.054514 0.241704
> 14 16 830 814 232.546 48 0.044386 0.239006
> 15 16 837 821 218.909 28 3.41523 0.267521
> 16 16 1043 1027 256.721 824 0.04898 0.248212
> 17 16 1147 1131 266.088 416 0.048591 0.232725
> 18 16 1147 1131 251.305 0 - 0.232725
> 19 16 1202 1186 249.657 110 0.081777 0.25501
> min lat: 0.033773 max lat: 5.92059 avg lat: 0.245711
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 20 16 1296 1280 255.97 376 0.053797 0.245711
> 21 9 1297 1288 245.305 32 0.708133 0.248248
> 22 9 1297 1288 234.155 0 - 0.248248
> 23 9 1297 1288 223.975 0 - 0.248248
> 24 9 1297 1288 214.643 0 - 0.248248
> 25 9 1297 1288 206.057 0 - 0.248248
> 26 9 1297 1288 198.131 0 - 0.248248
> Total time run: 26.829870
> Total writes made: 1297
> Write size: 4194304
> Bandwidth (MB/sec): 193.367
>
> Average Latency: 0.295922
> Max latency: 7.36701
> Min latency: 0.033773
>
>
> Strange. I'm wondering if this has something to do with cache (that is,
> operation I could have done before on nodes, as all my nodes are just
> freshly rebooted).
Please test setting these values on all OSDs and Clients:
sysctl -w net.ipv4.tcp_rmem="4096 87380 514873"
sysctl -w net.ipv4.tcp_wmem="4096 16384 514873"
Stefan
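Before overwriting anything it may be worth recording what each node currently has. A small Python sketch, assuming a Linux host where these sysctls are exposed under /proc/sys; the helper names and the SUGGESTED table are illustrative only, with the values copied from the two sysctl commands above:

```python
# Values copied from the sysctl commands above: "min default max" in bytes.
SUGGESTED = {
    "net.ipv4.tcp_rmem": "4096 87380 514873",
    "net.ipv4.tcp_wmem": "4096 16384 514873",
}


def parse_tcp_mem_triple(value):
    """Split a tcp_rmem/tcp_wmem string into (min, default, max)."""
    parts = [int(p) for p in value.split()]
    if len(parts) != 3:
        raise ValueError("expected three integers: min default max")
    return tuple(parts)


def read_sysctl(name):
    """Read a sysctl through /proc/sys (Linux only)."""
    path = "/proc/sys/" + name.replace(".", "/")
    with open(path) as f:
        return f.read().strip()


def differs_from_suggested(name, current):
    """True if the current value is not the suggested triple."""
    return parse_tcp_mem_triple(current) != parse_tcp_mem_triple(SUGGESTED[name])


# Usage on a node (not run here):
#   for name in SUGGESTED:
#       print(name, read_sysctl(name), differs_from_suggested(name, read_sysctl(name)))
```

Keeping the old values around makes it easy to put them back if the change does not help.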
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 13:37 ` Stefan Priebe - Profihost AG
@ 2012-05-31 13:45 ` Yann Dupont
2012-05-31 14:42 ` Yann Dupont
0 siblings, 1 reply; 73+ messages in thread
From: Yann Dupont @ 2012-05-31 13:45 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Yehuda Sadeh, Stefan Majer, Mark Nelson, ceph-devel
On 31/05/2012 15:37, Stefan Priebe - Profihost AG wrote:
> Am 31.05.2012 15:21, schrieb Yann Dupont:
>> On 31/05/2012 09:30, Yehuda Sadeh wrote:
>>> On Thu, May 31, 2012 at 12:10 AM, Stefan Priebe - Profihost AG
>>> <s.priebe@profihost.ag> wrote:
>> But very strangely it's now rbd that isn't stable ?!
>>
>> root@label5:~# rados -p rbd bench 20 write -t 16
>> Maintaining 16 concurrent writes of 4194304 bytes for at least 20
>> seconds.
>> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
>> 0 0 0 0 0 0 - 0
>> 1 16 155 139 555.87 556 0.046232 0.109021
>> 2 16 250 234 467.923 380 0.046793 0.0985316
>> 3 16 250 234 311.955 0 - 0.0985316
>> 4 16 250 234 233.965 0 - 0.0985316
>> 5 16 250 234 187.173 0 - 0.0985316
>> 6 16 266 250 166.645 16 0.038083 0.175697
>> 7 16 266 250 142.839 0 - 0.175697
>> 8 16 441 425 212.475 350 0.05512 0.298391
>> 9 16 476 460 204.422 140 0.04372 0.280483
>> 10 16 531 515 205.976 220 0.125076 0.309449
>> 11 16 734 718 261.06 812 0.127582 0.244134
>> 12 16 795 779 259.637 244 0.065158 0.234156
>> 13 16 818 802 246.742 92 0.054514 0.241704
>> 14 16 830 814 232.546 48 0.044386 0.239006
>> 15 16 837 821 218.909 28 3.41523 0.267521
>> 16 16 1043 1027 256.721 824 0.04898 0.248212
>> 17 16 1147 1131 266.088 416 0.048591 0.232725
>> 18 16 1147 1131 251.305 0 - 0.232725
>> 19 16 1202 1186 249.657 110 0.081777 0.25501
>> min lat: 0.033773 max lat: 5.92059 avg lat: 0.245711
>> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
>> 20 16 1296 1280 255.97 376 0.053797 0.245711
>> 21 9 1297 1288 245.305 32 0.708133 0.248248
>> 22 9 1297 1288 234.155 0 - 0.248248
>> 23 9 1297 1288 223.975 0 - 0.248248
>> 24 9 1297 1288 214.643 0 - 0.248248
>> 25 9 1297 1288 206.057 0 - 0.248248
>> 26 9 1297 1288 198.131 0 - 0.248248
>> Total time run: 26.829870
>> Total writes made: 1297
>> Write size: 4194304
>> Bandwidth (MB/sec): 193.367
>>
>> Average Latency: 0.295922
>> Max latency: 7.36701
>> Min latency: 0.033773
>>
>>
>> Strange. I'm wondering if this has something to do with cache (that is,
>> operation I could have done before on nodes, as all my nodes are just
>> freshly rebooted).
>
> Please test setting these values on all OSDs and Clients:
> sysctl -w net.ipv4.tcp_rmem="4096 87380 514873"
> sysctl -w net.ipv4.tcp_wmem="4096 16384 514873"
>
> Stefan
Same: stable for pool data (845 MB/s average), jumping with rbd (229 MB/s
average, with a max latency of 6 s).
I'm on the latest Linus git kernel
(commit af56e0aa35f3ae2a4c1a6d1000702df1dd78cb76), and I am assuming
that the patch was reverted in it.
I can try plain 3.4.0 with the 'culprit patch' manually reverted.
What puzzles me is that this morning, with 3.4.0, it was rbd that was
stable, and now I have the exact opposite.
I'll start by rebooting into the old 3.4.0 kernel to see if things are
reproducible.
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 13:45 ` Yann Dupont
@ 2012-05-31 14:42 ` Yann Dupont
2012-05-31 15:32 ` Mark Nelson
0 siblings, 1 reply; 73+ messages in thread
From: Yann Dupont @ 2012-05-31 14:42 UTC (permalink / raw)
To: Yann Dupont
Cc: Stefan Priebe - Profihost AG, Yehuda Sadeh, Stefan Majer,
Mark Nelson, ceph-devel
On 31/05/2012 15:45, Yann Dupont wrote:
> On 31/05/2012 15:37, Stefan Priebe - Profihost AG wrote:
> what puzzles me is that this morning, with 3.4.0 it was rbd that was
> stable, and now I have the exact contrary.
>
> I'll begin to reboot with old 3.4.0 kernel to see if things are
> reproductible.
>
> Cheers,
I'd say my problem is probably not related. Freshly rebooting all OSD
nodes with the 3.4.0 kernel (the same kernel I used this morning) now gives
pool data stable & rbd unstable. That is the same as with current git, and
the exact opposite of the results I had on Tuesday & this morning.
Go figure.
Could it have to do with previous usage of the OSDs? Or the active mds? Or mon?
As I already said, my OSDs are using btrfs with the big metadata feature,
so going back to a 3.0 kernel would first require a complete reformat of my OSDs.
But I will do it if you think it can shed some light on this case.
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 14:42 ` Yann Dupont
@ 2012-05-31 15:32 ` Mark Nelson
2012-05-31 15:43 ` Yann Dupont
0 siblings, 1 reply; 73+ messages in thread
From: Mark Nelson @ 2012-05-31 15:32 UTC (permalink / raw)
To: Yann Dupont
Cc: Stefan Priebe - Profihost AG, Yehuda Sadeh, Stefan Majer,
ceph-devel
On 05/31/2012 09:42 AM, Yann Dupont wrote:
> On 31/05/2012 15:45, Yann Dupont wrote:
>> On 31/05/2012 15:37, Stefan Priebe - Profihost AG wrote:
>
>> what puzzles me is that this morning, with 3.4.0 it was rbd that was
>> stable, and now I have the exact contrary.
>>
>> I'll begin to reboot with old 3.4.0 kernel to see if things are
>> reproductible.
>>
>> Cheers,
>
>
> I'd say my problem is probably not related. Freshly rebooting all osd
> nodes with 3.4.0 kernel (the same kernel I used this morning) now
> gives pool data stable & rbd unstable. As with current git, and the
> exact opposite of results I had tuesday & this morning.
>
> Go figure.
>
> Could it have to do with previous usage in OSD ? or active mds ? or mon ?
>
> As I already said, as my osd are using btrfs with big medata features,
> so going back in 3.0 kernel need a complete reformat of my OSD before.
>
> But I will do it if you judge it can bring some light on this case.
>
> Cheers,
Hi Yann,
Can you take a look at how many PGs are in each pool?
ceph osd pool get <pool> pg_num
Thanks,
Mark
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 15:32 ` Mark Nelson
@ 2012-05-31 15:43 ` Yann Dupont
2012-05-31 16:14 ` Mark Nelson
2012-05-31 16:29 ` Sage Weil
0 siblings, 2 replies; 73+ messages in thread
From: Yann Dupont @ 2012-05-31 15:43 UTC (permalink / raw)
To: Mark Nelson
Cc: Stefan Priebe - Profihost AG, Yehuda Sadeh, Stefan Majer,
ceph-devel
On 31/05/2012 17:32, Mark Nelson wrote:
> ceph osd pool get <pool> pg_num
My setup is detailed in a previous mail, but as I changed some
parameters this morning, here we go:
root@chichibu:~# ceph osd pool get data pg_num
PG_NUM: 576
root@chichibu:~# ceph osd pool get rbd pg_num
PG_NUM: 576
The pg num is quite low because I started with small OSDs (9 OSDs with
200G each - internal disks) when I formatted. Now I have reduced to 8 OSDs
(osd.4 is out), but with much larger (& faster) storage.
Now each of the 8 OSDs has 5T on it; I try, for the moment, to keep the
OSDs similar. Replication is set to 2.
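As a back-of-the-envelope check of the figures above (576 PGs per pool, replication 2, 8 OSDs), the average number of PG copies each OSD carries can be worked out. This is a rough Python sketch that assumes a perfectly even spread, which CRUSH only approximates; the function name is made up for illustration:

```python
def pg_copies_per_osd(pg_num, replicas, num_osds):
    """Average PG copies per OSD for one pool, assuming an even spread."""
    return pg_num * replicas / num_osds


if __name__ == "__main__":
    per_pool = pg_copies_per_osd(576, 2, 8)
    print(per_pool)      # one pool (data or rbd)
    print(2 * per_pool)  # data and rbd together
```

That gives 144 PG copies per OSD for each of the two pools under the even-spread assumption.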
The fs is btrfs formatted with big metadata (-l 64k -n64k), and mounted
via space_cache,compress=lzo,nobarrier,noatime.
journal is on tmpfs :
osd journal = /dev/shm/journal
osd journal size = 6144
I know this is dangerous; remember, it's NOT a production system for the
moment.
No OSD is full, I don't have much data stored for the moment.
Concerning crush map, I'm not using the default one :
The 8 nodes are in 3 different locations (some kilometers away). 2 are
in 1 place, 2 in another, and the 4 last in the principal place.
There is 10G between all the nodes and they are in the same VLAN, no
router involved (but there is (negligible ?) latency between nodes)
I try to group hosts together to avoid problems when I lose a location
(electrical problem, for example). Not sure I really customized the
crush map as I should have.
here is the map :
# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 device4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8

# types
type 0 osd
type 1 host
type 2 rack
type 3 pool

# buckets
host karuizawa {
    id -5 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.2 weight 1.000
}
host hazelburn {
    id -6 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.3 weight 1.000
}
rack loire {
    id -3 # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0 # rjenkins1
    item karuizawa weight 1.000
    item hazelburn weight 1.000
}
host carsebridge {
    id -8 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.5 weight 1.000
}
host cameronbridge {
    id -9 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.6 weight 1.000
}
rack chantrerie {
    id -7 # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0 # rjenkins1
    item carsebridge weight 1.000
    item cameronbridge weight 1.000
}
host chichibu {
    id -2 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.0 weight 1.000
}
host glenesk {
    id -4 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.1 weight 1.000
}
host braeval {
    id -10 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.7 weight 1.000
}
host hanyu {
    id -11 # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0 # rjenkins1
    item osd.8 weight 1.000
}
rack lombarderie {
    id -12 # do not change unnecessarily
    # weight 4.000
    alg straw
    hash 0 # rjenkins1
    item chichibu weight 1.000
    item glenesk weight 1.000
    item braeval weight 1.000
    item hanyu weight 1.000
}
pool default {
    id -1 # do not change unnecessarily
    # weight 8.000
    alg straw
    hash 0 # rjenkins1
    item loire weight 2.000
    item chantrerie weight 2.000
    item lombarderie weight 4.000
}

# rules
rule data {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule metadata {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule rbd {
    ruleset 2
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
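One thing that can be checked from the map alone: all three rules select leaves by host ("chooseleaf firstn 0 type host"), so, as far as CRUSH semantics go, two replicas land on different hosts but nothing in the rule keeps them in different racks. The Python sketch below only enumerates which host pairs share a rack in the hierarchy above. It does not run CRUSH and says nothing about how likely each pair is; the RACKS table is transcribed by hand from the buckets and the function name is made up for illustration.

```python
from itertools import combinations

# Rack -> hosts, transcribed from the bucket definitions above.
RACKS = {
    "loire": ["karuizawa", "hazelburn"],
    "chantrerie": ["carsebridge", "cameronbridge"],
    "lombarderie": ["chichibu", "glenesk", "braeval", "hanyu"],
}


def host_pairs_sharing_a_rack(racks):
    """List (rack, host_a, host_b) for every pair of hosts in the same rack."""
    pairs = []
    for rack, hosts in racks.items():
        pairs.extend((rack, a, b) for a, b in combinations(hosts, 2))
    return pairs


if __name__ == "__main__":
    same_rack = host_pairs_sharing_a_rack(RACKS)
    total_hosts = sum(len(hosts) for hosts in RACKS.values())
    all_pairs = total_hosts * (total_hosts - 1) // 2
    print(len(same_rack), "of", all_pairs, "host pairs sit in a single rack")
```

If the goal is to survive the loss of a whole location, choosing leaves by rack rather than by host ("step chooseleaf firstn 0 type rack") would be the variant to test; that is a suggestion to try, not something verified on this cluster.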
Hope it helps,
cheers
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 15:43 ` Yann Dupont
@ 2012-05-31 16:14 ` Mark Nelson
2012-05-31 16:29 ` Sage Weil
1 sibling, 0 replies; 73+ messages in thread
From: Mark Nelson @ 2012-05-31 16:14 UTC (permalink / raw)
To: Yann Dupont
Cc: Stefan Priebe - Profihost AG, Yehuda Sadeh, Stefan Majer,
ceph-devel
On 05/31/2012 10:43 AM, Yann Dupont wrote:
> On 31/05/2012 17:32, Mark Nelson wrote:
>> ceph osd pool get <pool> pg_num
>
> My setup is detailed in a previous mail , But as I changed some
> parameters this morning, here we go :
>
> root@chichibu:~# ceph osd pool get data pg_num
> PG_NUM: 576
> root@chichibu:~# ceph osd pool get rbd pg_num
> PG_NUM: 576
>
>
>
> The pg num is quite low because I started with small OSD (9 osd with
> 200G each - internal disks) when I formatted. Now, I reduced to 8 osd,
> (osd.4 is out) but with much larger (& faster) storage.
>
>
> Now, each of the 8 OSD have 5T on it, I try, for the moment, to keep the
> OSD similars. Replication is set to 2.
>
>
> The fs is btrfs formatted with big metadata (-l 64k -n64k), and mounted
> via space_cache,compress=lzo,nobarrier,noatime.
>
> journal is on tmpfs :
> osd journal = /dev/shm/journal
> osd journal size = 6144
>
> I know this is dangerous, remember It's NOT a production system for the
> moment.
>
> No OSD is full, I don't have much data stored for the moment.
>
> Concerning crush map, I'm not using the default one :
>
> The 8 nodes are in 3 different locations (some kilometers away). 2 are
> in 1 place, 2 in another, and the 4 last in the principal place.
>
> There is 10G between all the nodes and they are in the same VLAN, no
> router involved (but there is (negligible ?) latency between nodes)
>
> I try to group host together to avoid problem when I loose a location
> (electrical problem, for example). Not sure I really customized the
> crush map as I should have.
>
> here is the map :
> begin crush map
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 device4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
> device 8 osd.8
>
> # types
> type 0 osd
> type 1 host
> type 2 rack
> type 3 pool
>
> # buckets
> host karuizawa {
> id -5 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.2 weight 1.000
> }
> host hazelburn {
> id -6 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.3 weight 1.000
> }
> rack loire {
> id -3 # do not change unnecessarily
> # weight 2.000
> alg straw
> hash 0 # rjenkins1
> item karuizawa weight 1.000
> item hazelburn weight 1.000
> }
> host carsebridge {
> id -8 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.5 weight 1.000
> }
> host cameronbridge {
> id -9 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.6 weight 1.000
> }
> rack chantrerie {
> id -7 # do not change unnecessarily
> # weight 2.000
> alg straw
> hash 0 # rjenkins1
> item carsebridge weight 1.000
> item cameronbridge weight 1.000
> }
> host chichibu {
> id -2 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.0 weight 1.000
> }
> host glenesk {
> id -4 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.1 weight 1.000
> }
> host braeval {
> id -10 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.7 weight 1.000
> }
> host hanyu {
> id -11 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.8 weight 1.000
> }
> rack lombarderie {
> id -12 # do not change unnecessarily
> # weight 4.000
> alg straw
> hash 0 # rjenkins1
> item chichibu weight 1.000
> item glenesk weight 1.000
> item braeval weight 1.000
> item hanyu weight 1.000
> }
> pool default {
> id -1 # do not change unnecessarily
> # weight 8.000
> alg straw
> hash 0 # rjenkins1
> item loire weight 2.000
> item chantrerie weight 2.000
> item lombarderie weight 4.000
> }
>
> # rules
> rule data {
> ruleset 0
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
> rule metadata {
> ruleset 1
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
> rule rbd {
> ruleset 2
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
>
> # end crush map
>
> Hope it helps,
> cheers
>
>
Hi Yann,
You might want to start out by running sar/iostat/collectl on the OSD
nodes and seeing if anything looks funny during the slow test compared
to the fast one. If that doesn't reveal much, you could run blktrace on
one of the OSDs during the tests and see if the IO to the disk looks
different. I can help out if you want to send me your blktrace results.
Similarly you could watch the network streams for both tests and see
if anything looks different there.
Thanks!
Mark
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 15:43 ` Yann Dupont
2012-05-31 16:14 ` Mark Nelson
@ 2012-05-31 16:29 ` Sage Weil
2012-05-31 16:37 ` Yann Dupont
1 sibling, 1 reply; 73+ messages in thread
From: Sage Weil @ 2012-05-31 16:29 UTC (permalink / raw)
To: Yann Dupont
Cc: Mark Nelson, Stefan Priebe - Profihost AG, Yehuda Sadeh,
Stefan Majer, ceph-devel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 5385 bytes --]
On Thu, 31 May 2012, Yann Dupont wrote:
> On 31/05/2012 17:32, Mark Nelson wrote:
> > ceph osd pool get <pool> pg_num
>
> My setup is detailed in a previous mail , But as I changed some parameters
> this morning, here we go :
>
> root@chichibu:~# ceph osd pool get data pg_num
> PG_NUM: 576
> root@chichibu:~# ceph osd pool get rbd pg_num
> PG_NUM: 576
Can you post 'ceph osd dump | grep ^pool' so we can see which CRUSH rules
the pools are mapped to?
Thanks!
sage
>
>
>
> The pg num is quite low because I started with small OSD (9 osd with 200G each
> - internal disks) when I formatted. Now, I reduced to 8 osd, (osd.4 is out)
> but with much larger (& faster) storage.
>
>
> Now, each of the 8 OSD have 5T on it, I try, for the moment, to keep the OSD
> similars. Replication is set to 2.
>
>
> The fs is btrfs formatted with big metadata (-l 64k -n64k), and mounted via
> space_cache,compress=lzo,nobarrier,noatime.
>
> journal is on tmpfs :
> osd journal = /dev/shm/journal
> osd journal size = 6144
>
> I know this is dangerous, remember It's NOT a production system for the
> moment.
>
> No OSD is full, I don't have much data stored for the moment.
>
> Concerning crush map, I'm not using the default one :
>
> The 8 nodes are in 3 different locations (some kilometers away). 2 are in 1
> place, 2 in another, and the 4 last in the principal place.
>
> There is 10G between all the nodes and they are in the same VLAN, no router
> involved (but there is (negligible ?) latency between nodes)
>
> I try to group host together to avoid problem when I loose a location
> (electrical problem, for example). Not sure I really customized the crush map
> as I should have.
>
> here is the map :
> begin crush map
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 device4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
> device 8 osd.8
>
> # types
> type 0 osd
> type 1 host
> type 2 rack
> type 3 pool
>
> # buckets
> host karuizawa {
> id -5 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.2 weight 1.000
> }
> host hazelburn {
> id -6 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.3 weight 1.000
> }
> rack loire {
> id -3 # do not change unnecessarily
> # weight 2.000
> alg straw
> hash 0 # rjenkins1
> item karuizawa weight 1.000
> item hazelburn weight 1.000
> }
> host carsebridge {
> id -8 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.5 weight 1.000
> }
> host cameronbridge {
> id -9 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.6 weight 1.000
> }
> rack chantrerie {
> id -7 # do not change unnecessarily
> # weight 2.000
> alg straw
> hash 0 # rjenkins1
> item carsebridge weight 1.000
> item cameronbridge weight 1.000
> }
> host chichibu {
> id -2 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.0 weight 1.000
> }
> host glenesk {
> id -4 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.1 weight 1.000
> }
> host braeval {
> id -10 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.7 weight 1.000
> }
> host hanyu {
> id -11 # do not change unnecessarily
> # weight 1.000
> alg straw
> hash 0 # rjenkins1
> item osd.8 weight 1.000
> }
> rack lombarderie {
> id -12 # do not change unnecessarily
> # weight 4.000
> alg straw
> hash 0 # rjenkins1
> item chichibu weight 1.000
> item glenesk weight 1.000
> item braeval weight 1.000
> item hanyu weight 1.000
> }
> pool default {
> id -1 # do not change unnecessarily
> # weight 8.000
> alg straw
> hash 0 # rjenkins1
> item loire weight 2.000
> item chantrerie weight 2.000
> item lombarderie weight 4.000
> }
>
> # rules
> rule data {
> ruleset 0
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
> rule metadata {
> ruleset 1
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
> rule rbd {
> ruleset 2
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
>
> # end crush map
>
> Hope it helps,
> cheers
>
>
> --
> Yann Dupont - Service IRTS, DSI Université de Nantes
> Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
>
>
>
>
* Re: poor OSD performance using kernel 3.4 => problem found
2012-05-31 16:29 ` Sage Weil
@ 2012-05-31 16:37 ` Yann Dupont
0 siblings, 0 replies; 73+ messages in thread
From: Yann Dupont @ 2012-05-31 16:37 UTC (permalink / raw)
To: Sage Weil
Cc: Mark Nelson, Stefan Priebe - Profihost AG, Yehuda Sadeh,
Stefan Majer, ceph-devel
On 31/05/2012 18:29, Sage Weil wrote:
> Can you post 'ceph osd dump | grep ^pool' so we can see which CRUSH rules
> the pools are mapped to?
>
Yes:
root@label5:~# ceph osd dump | grep ^pool
pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 576 pgp_num 576 last_change 816 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 576 pgp_num 576 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 576 pgp_num 576 last_change 1 owner 0
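
For anyone who wants to check the pool-to-rule mapping mechanically, here is a minimal sketch in Python. It assumes the 2012-era 'ceph osd dump' line format shown above; newer releases print different fields. The parse_pools helper, the POOL_RE pattern and the RULES table are hypothetical names made up for this illustration, not part of any ceph tool. The ruleset numbers in RULES are copied by hand from the quoted CRUSH map.

```python
import re

# Sample copied from the dump above, with the mail-wrapped lines rejoined.
DUMP = """\
pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 576 pgp_num 576 last_change 816 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 576 pgp_num 576 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 576 pgp_num 576 last_change 1 owner 0
"""

# Assumed line layout: pool <id> '<name>' rep size <n> crush_ruleset <n> ...
POOL_RE = re.compile(r"^pool (\d+) '([^']+)' rep size (\d+) crush_ruleset (\d+)")

# Ruleset number -> rule name, transcribed by hand from the quoted CRUSH map.
RULES = {0: "data", 1: "metadata", 2: "rbd"}


def parse_pools(text):
    """Return {pool name: {"id", "size", "crush_ruleset"}} for each pool line."""
    pools = {}
    for line in text.splitlines():
        match = POOL_RE.match(line)
        if match is None:
            continue  # skip anything that is not a pool line
        pool_id, name, size, ruleset = match.groups()
        pools[name] = {
            "id": int(pool_id),
            "size": int(size),
            "crush_ruleset": int(ruleset),
        }
    return pools


if __name__ == "__main__":
    for name, info in parse_pools(DUMP).items():
        rule = RULES.get(info["crush_ruleset"], "<unknown rule>")
        print(f"pool {name!r}: {info['size']} replicas, "
              f"ruleset {info['crush_ruleset']} (rule {rule})")
```

With the dump above, each pool maps to the rule of the same name, and all three rules use 'step chooseleaf firstn 0 type host', so the two replicas of each object should land on different hosts.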
cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
Thread overview: 73+ messages
2012-05-24 14:10 poor OSD performance using kernel 3.4 Stefan Priebe - Profihost AG
2012-05-24 14:57 ` Mark Nelson
[not found] ` <CAJCPpW+SKnnVUaDEAsCkKyZwMVrHCRJF2C8zqB4eORgwW5p=1Q@mail.gmail.com>
[not found] ` <4FBE7ABC.5020502@profihost.ag>
2012-05-24 18:53 ` Mark Nelson
2012-05-24 19:05 ` Stefan Priebe
2012-05-25 1:53 ` Mark Nelson
2012-05-25 8:19 ` Stefan Priebe - Profihost AG
2012-05-25 11:31 ` Stefan Priebe - Profihost AG
2012-05-25 12:10 ` Stefan Priebe - Profihost AG
2012-05-25 15:47 ` Alexandre DERUMIER
2012-05-27 9:11 ` Stefan Priebe - Profihost AG
2012-05-27 11:33 ` Alexandre DERUMIER
2012-05-27 18:57 ` Stefan Priebe
2012-05-28 5:37 ` Alexandre DERUMIER
2012-05-28 6:25 ` Stefan Priebe
2012-05-28 6:52 ` Alexandre DERUMIER
2012-05-28 19:48 ` Stefan Priebe
2012-05-29 3:54 ` Alexandre DERUMIER
2012-05-29 8:22 ` Stefan Priebe - Profihost AG
2012-05-29 13:01 ` Alexandre DERUMIER
2012-05-29 14:18 ` Stefan Priebe - Profihost AG
2012-05-29 9:46 ` Stefan Priebe - Profihost AG
2012-05-29 13:39 ` Yann Dupont
2012-05-29 14:43 ` Stefan Priebe - Profihost AG
2012-05-29 17:50 ` Mark Nelson
2012-05-29 19:50 ` Yann Dupont
2012-05-29 21:04 ` Stefan Priebe
2012-05-29 21:08 ` Stefan Priebe
2012-05-29 21:31 ` Yann Dupont
2012-05-29 21:34 ` Stefan Priebe
2012-05-29 21:45 ` Yann Dupont
2012-05-30 6:29 ` Stefan Priebe - Profihost AG
2012-05-29 21:41 ` Mark Nelson
2012-05-30 6:22 ` Stefan Priebe - Profihost AG
2012-05-30 7:20 ` building test cluster : missing /etc/ceph/client.admin.keyring, need help Alexandre DERUMIER
2012-05-30 7:25 ` Stefan Priebe - Profihost AG
2012-05-30 7:33 ` Alexandre DERUMIER
2012-05-30 7:47 ` Alexandre DERUMIER
2012-05-29 22:25 ` poor OSD performance using kernel 3.4 Mark Nelson
2012-05-30 6:33 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGs9dpSh9Oyu+5yDhyYU=Et_-zF5MuYybBuuAN5DgR433A@mail.gmail.com>
2012-05-30 7:16 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGuiJqZUCK-0qR_CrOo6GRhkjaCdkOhJ2boq3zD0_voTsA@mail.gmail.com>
2012-05-30 11:04 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGuLAL5+hkzq0tigqu355DvPxkhE5sxBhOVZPj=EzDSVtA@mail.gmail.com>
2012-05-30 11:25 ` Stefan Priebe - Profihost AG
2012-05-30 12:17 ` Mark Nelson
2012-05-30 12:41 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGsmr8Ht1pTWH1Oe8=NmAyM81SSdH+c_GV89D8ntfyUmgA@mail.gmail.com>
2012-05-30 13:19 ` Stefan Priebe - Profihost AG
[not found] ` <CADdPHGvxCmuViy+0==Vkdz_QjC1K+kD5kD1m7+0tYM2YDTtJbw@mail.gmail.com>
2012-05-30 13:54 ` Stefan Priebe - Profihost AG
[not found] ` <4FC63381.6090300@inktank.com>
2012-05-30 14:53 ` Stefan Priebe
2012-05-30 14:56 ` Mark Nelson
2012-05-30 18:26 ` Stefan Priebe
2012-05-30 19:41 ` Mark Nelson
2012-05-30 13:27 ` Mark Nelson
2012-05-30 13:51 ` Stefan Priebe - Profihost AG
2012-05-30 14:16 ` Mark Nelson
2012-05-30 18:42 ` Stefan Priebe
[not found] ` <CADdPHGuxa7TAyqXcXehb9WgKgkHwkybYTrj2oue_PKsiF+oR3A@mail.gmail.com>
2012-05-30 21:10 ` Stefan Priebe
[not found] ` <CADdPHGutEwoDc=Kcrqcx2ZMO=dqhuoT5iLoP-WxqD+e5ZUmBRA@mail.gmail.com>
2012-05-31 7:10 ` poor OSD performance using kernel 3.4 => problem found Stefan Priebe - Profihost AG
2012-05-31 7:30 ` Yehuda Sadeh
[not found] ` <CADdPHGtz9Jq624DMO6Dve2AcJ9vrnFHbyqRa+qheA+0-y4k++g@mail.gmail.com>
2012-05-31 12:31 ` Mark Nelson
2012-05-31 12:33 ` Stefan Priebe - Profihost AG
2012-05-31 13:21 ` Yann Dupont
2012-05-31 13:37 ` Stefan Priebe - Profihost AG
2012-05-31 13:45 ` Yann Dupont
2012-05-31 14:42 ` Yann Dupont
2012-05-31 15:32 ` Mark Nelson
2012-05-31 15:43 ` Yann Dupont
2012-05-31 16:14 ` Mark Nelson
2012-05-31 16:29 ` Sage Weil
2012-05-31 16:37 ` Yann Dupont
[not found] ` <CADdPHGv0YjxDQFnZML-55jDj7XxHxaxUZ_FeQ=ReKK6Rs7NNhw@mail.gmail.com>
2012-05-31 8:04 ` Stefan Priebe - Profihost AG
2012-05-31 8:09 ` Stefan Majer
2012-05-31 11:34 ` Stefan Priebe - Profihost AG
2012-05-31 12:18 ` Stefan Priebe - Profihost AG
2012-05-30 11:51 ` poor OSD performance using kernel 3.4 Mark Nelson