From: Josh Durgin <josh.durgin@inktank.com>
To: Alexandre DERUMIER <aderumier@odiso.com>
Cc: Marcus Sorensen <shadowsor@gmail.com>,
	Sage Weil <sage@inktank.com>,
	ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: slow fio random read benchmark, need help
Date: Wed, 31 Oct 2012 13:22:22 -0700
Message-ID: <5091887E.3020605@inktank.com>
In-Reply-To: <8be796b6-f1c1-43ed-abe8-ec06b62c7e84@mailpro>

On 10/31/2012 11:56 AM, Alexandre DERUMIER wrote:
> Yes, I think you are right: the round trip to the mon must cut performance in half.

I just want to note that the monitors aren't in the data path.
The client knows how to reach the osds and which osds to talk to based
on the osdmap. This is updated asynchronously from the client's
perspective.
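Josh's point can be illustrated with a toy model: once the client holds a copy of the osdmap, finding the OSDs for an object is a purely local computation, so no per-IO monitor round trip is needed. This sketch is hypothetical and heavily simplified. Real Ceph uses CRUSH and its own hash function, not `zlib.crc32` and a round-robin pick, and the map fields, epoch, and object name below are invented for illustration.

```python
import zlib

# Hypothetical, simplified stand-in for the client's cached osdmap.
# NOT Ceph's data structure or API; it only shows that placement is computed
# locally from the map, without contacting a monitor for each IO.
OSDMAP = {
    "epoch": 42,              # illustrative; the real map is updated asynchronously
    "pg_num": 64,
    "osds": list(range(18)),  # e.g. 3 nodes x 6 OSDs, as in this thread
    "replicas": 2,
}


def object_to_pg(name, pg_num):
    """Map an object name to a placement group with a stable hash (toy hash)."""
    return zlib.crc32(name.encode()) % pg_num


def pg_to_osds(pg, osds, replicas):
    """Toy deterministic OSD choice for a PG (a stand-in for CRUSH)."""
    start = pg % len(osds)
    return [osds[(start + i) % len(osds)] for i in range(replicas)]


def locate(name, osdmap):
    """Return the OSDs to contact, first entry playing the primary's role."""
    pg = object_to_pg(name, osdmap["pg_num"])
    return pg_to_osds(pg, osdmap["osds"], osdmap["replicas"])


# Hypothetical object name; every client with the same map gets the same answer.
print(locate("example_image.000000000000", OSDMAP))
```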

> I have just run 2 parallel fio benchmarks from 2 different hosts,
> and I get 2 x 5000 iops,

It'd be interesting to try smaller rbd objects (rbd create --order 12
...) to rule out contention in the OSD for particular objects.
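For reference, the `--order` value is the log2 of the RBD object size, and the `rbd create` default is order 22 (4 MiB objects). The sketch below, assuming the 1000M test area from the fio job in this thread, shows how many objects that area is striped over at order 22 versus order 12. The helper names are illustrative, not part of any rbd API.

```python
IMAGE_BYTES = 1000 * 2**20  # the --size=1000M area used by the fio job


def object_size(order):
    """RBD object size in bytes for a given --order (2**order)."""
    return 1 << order


def object_count(image_bytes, order):
    """Number of objects needed to cover image_bytes (ceiling division)."""
    size = object_size(order)
    return (image_bytes + size - 1) // size


def object_index(offset, order):
    """Which object a byte offset falls into."""
    return offset >> order


for order in (22, 12):
    print(order, object_size(order), object_count(IMAGE_BYTES, order))
# 22 4194304 250
# 12 4096 256000
```

With order 12 each 4K random read maps to its own object, so concurrent reads are spread over far more objects than the 250 that cover the test area at the default order.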

Josh

> so it must be related to network latency.
>
> I have also done tests with --numjobs 1000; it doesn't help, same results.
>
>
> Do you have an idea how I can get more IOPS from one host?
> Using LACP with multiple links?
>
> I think 10-gigabit latency is almost the same, so I'm not sure it will improve IOPS much.
> Maybe InfiniBand can help?
>
> ----- Original Message -----
>
> From: "Marcus Sorensen" <shadowsor@gmail.com>
> To: "Alexandre DERUMIER" <aderumier@odiso.com>
> Cc: "Sage Weil" <sage@inktank.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Wednesday, October 31, 2012 18:38:46
> Subject: Re: slow fio random read benchmark, need help
>
> Yes, I was going to say that the most I've ever seen out of gigabit is
> about 15k iops, with parallel tests and NFS (or iSCSI). Multipathing
> may not really parallelize the io for you. It can send an io down one
> path, then move to the next path and send the next io without
> necessarily waiting for the previous one to respond, but it only
> shaves a slight amount from your latency under some scenarios as
> opposed to sending down all paths simultaneously. I have seen it help
> with high latency links.
>
> I don't remember the Ceph design that well, but with distributed
> storage systems you're going to pay a penalty. If you can do 10-15k
> with one TCP round trip, you'll get half that with the round trip to
> talk to the metadata server to find your blocks and then to fetch
> them. Like I said, that might not be exactly what Ceph does, but
> you're going to have more traffic than just a straight single attached
> NFS or iscsi server.
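The "half" figure follows from treating each IO as a chain of sequential round trips with one request in flight. A minimal sketch of that arithmetic, assuming an 80 us round trip (1 / 12,500 iops, the middle of the quoted 10-15k range); the function name is illustrative. Per Josh's reply at the top of this message, the monitors are not in the RBD data path, so the one-round-trip line is the one that applies to Ceph reads.

```python
US_PER_S = 1_000_000


def serial_iops(round_trips, rtt_us):
    """IOPS with one request in flight when each IO needs `round_trips` sequential RTTs."""
    return US_PER_S / (round_trips * rtt_us)


RTT_US = 80  # assumed: 1 / 12,500 iops

print(serial_iops(1, RTT_US))  # 12500.0
print(serial_iops(2, RTT_US))  # 6250.0
```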
>
> On Wed, Oct 31, 2012 at 11:27 AM, Alexandre DERUMIER
> <aderumier@odiso.com> wrote:
>> Thanks Marcus,
>>
>> Indeed, gigabit Ethernet.
>>
>> Note that my iSCSI result (40k) was with multipath, so multiple gigabit links.
>>
>> I have also done tests with a NetApp array over NFS on a single link; I get around 13,000 iops.
>>
>> I will do more tests with multiple VMs, from different hosts, and with --numjobs.
>>
>> I'll keep you posted,
>>
>> Thanks for help,
>>
>> Regards,
>>
>> Alexandre
>>
>>
>> ----- Original Message -----
>>
>> From: "Marcus Sorensen" <shadowsor@gmail.com>
>> To: "Alexandre DERUMIER" <aderumier@odiso.com>
>> Cc: "Sage Weil" <sage@inktank.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
>> Sent: Wednesday, October 31, 2012 18:08:11
>> Subject: Re: slow fio random read benchmark, need help
>>
>> 5000 is actually really good, if you ask me. Assuming everything is
>> connected via gigabit. If you get 40k iops locally, you add the
>> latency of tcp, as well as that of the ceph services and VM layer, and
>> that's what you get. On my network I get about a 0.1 ms round trip on
>> gigabit over the same switch, which by definition can only do 10,000
>> iops with one IO in flight at a time. Then if you have storage on the other
>> end capable of 40k iops, you add the latencies together (0.1 ms + 0.025 ms)
>> and you're at 8k iops.
>> Then add the small latency of the application servicing the io (NFS,
>> Ceph, etc), and the latency introduced by your VM layer, and 5k sounds
>> about right.
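The latency-budget arithmetic above can be written out as a small model. It assumes one IO in flight at a time (queue depth 1), which is the simplification this reasoning makes; the "overhead" value is only what the observed 5000 iops implies under that assumption, not a measurement of Ceph or the VM layer.

```python
US_PER_S = 1_000_000


def iops_at_qd1(total_latency_us):
    """IOPS when each IO must finish before the next starts."""
    return US_PER_S / total_latency_us


network_rtt_us = 100               # "about a 0.1 ms round trip"
storage_us = US_PER_S / 40_000     # storage capable of 40k iops -> 25 us per IO

print(iops_at_qd1(network_rtt_us))               # 10000.0
print(iops_at_qd1(network_rtt_us + storage_us))  # 8000.0

# Total latency implied by the observed 5000 iops, and what is left over
# for the Ceph service and VM layers under this model.
observed_total_us = US_PER_S / 5000                             # 200.0
overhead_us = observed_total_us - network_rtt_us - storage_us   # 75.0
print(overhead_us)
```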
>>
>> The good news is that you probably aren't taxing the storage; you can
>> likely run many simultaneous tests from several VMs and get the same
>> results.
>>
>> You can try adding --numjobs to your fio to parallelize the specific
>> test you're doing, or launching a second VM and doing the same test at
>> the same time. This would be a good indicator of whether it's latency.
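The reasoning behind this test is Little's law (throughput = requests in flight / latency), a standard queueing relation swapped in here for illustration. If per-IO latency is the only limit, independent streams add up; if the aggregate stays flat as streams are added, something other than latency is serializing the IO. A minimal sketch, assuming the 200 us per-IO latency implied by 5000 iops:

```python
US_PER_S = 1_000_000


def expected_iops(in_flight, latency_us):
    """Little's law: throughput = concurrency / latency."""
    return in_flight * US_PER_S / latency_us


LATENCY_US = 200  # assumed: implied by 5000 iops with one IO in flight

for streams in (1, 2, 4):
    print(streams, expected_iops(streams, LATENCY_US))
# 1 5000.0
# 2 10000.0
# 4 20000.0
```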
>>
>> On Wed, Oct 31, 2012 at 10:29 AM, Alexandre DERUMIER
>> <aderumier@odiso.com> wrote:
>>>>> Have you tried increasing the iodepth?
>>> Yes, I have tried 100 and 200, with the same results.
>>>
>>> I have also tried directly from the host, with /dev/rbd1, and I get the same result.
>>> I have also tried with 3 different hosts, with different CPU models.
>>>
>>> (note: I can reach around 40,000 iops with the same fio config on a ZFS iSCSI array)
>>>
>>> The CPUs in my test Ceph cluster nodes are old (Xeon E5420), but they are at around 10% usage, so I think that's OK.
>>>
>>>
>>> Do you have an idea of what I could trace?
>>>
>>> Thanks,
>>>
>>> Alexandre
>>>
>>> ----- Original Message -----
>>>
>>> From: "Sage Weil" <sage@inktank.com>
>>> To: "Alexandre DERUMIER" <aderumier@odiso.com>
>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
>>> Sent: Wednesday, October 31, 2012 16:57:05
>>> Subject: Re: slow fio random read benchmark, need help
>>>
>>> On Wed, 31 Oct 2012, Alexandre DERUMIER wrote:
>>>> Hello,
>>>>
>>>> I'm doing some tests with fio from a qemu 1.2 guest (virtio disk, cache=none): randread with a 4K block size over a small 1G area (so it can be held in the buffer cache on the Ceph cluster)
>>>>
>>>>
>>>> fio --filename=/dev/vdb --rw=randread --bs=4K --size=1000M --iodepth=40 --group_reporting --name=file1 --ioengine=libaio --direct=1
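To repeat this job across several iodepth or numjobs values (as tried later in the thread) and collect the numbers programmatically, a small wrapper can build the same command line. This is a sketch that assumes a fio build supporting `--output-format=json`, whose report lists jobs under `"jobs"` with read IOPS at `["read"]["iops"]`; the helper names are hypothetical.

```python
import json
import subprocess


def fio_randread_args(filename, iodepth=40, numjobs=1, size="1000M"):
    """Build the 4K random-read fio job from this thread as an argv list."""
    return [
        "fio",
        f"--filename={filename}",
        "--rw=randread",
        "--bs=4K",
        f"--size={size}",
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        "--group_reporting",
        "--name=file1",
        "--ioengine=libaio",
        "--direct=1",
        "--output-format=json",  # assumes a fio version with JSON output
    ]


def read_iops(fio_json):
    """Extract aggregated read IOPS from fio's JSON report (group_reporting on)."""
    report = json.loads(fio_json)
    return report["jobs"][0]["read"]["iops"]


def sweep(filename, depths=(1, 40, 100, 200)):
    """Run the job once per iodepth and return {iodepth: read_iops}."""
    results = {}
    for depth in depths:
        proc = subprocess.run(
            fio_randread_args(filename, iodepth=depth),
            capture_output=True, text=True, check=True,
        )
        results[depth] = read_iops(proc.stdout)
    return results
```

Usage would be something like `sweep("/dev/vdb")` inside the guest; the job only reads, but it still needs permission to open the device.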
>>>>
>>>>
>>>> I can't get more than 5000 iops.
>>>
>>> Have you tried increasing the iodepth?
>>>
>>> sage
>>>
>>>>
>>>>
>>>> RBD cluster:
>>>> ---------------
>>>> 3 nodes, each with:
>>>> - 6 x OSD on 15k drives (XFS), journal on tmpfs, 1 mon
>>>> - CPU: 2 x 4-core Intel Xeon E5420 @ 2.5 GHz
>>>> rbd 0.53
>>>>
>>>> ceph.conf
>>>>
>>>> journal dio = false
>>>> filestore fiemap = false
>>>> filestore flusher = false
>>>> osd op threads = 24
>>>> osd disk threads = 24
>>>> filestore op threads = 6
>>>>
>>>> KVM host: 4 x 12-core Opteron
>>>> ------------
>>>>
>>>>
>>>> During the benchmark:
>>>>
>>>> on Ceph nodes:
>>>> - CPU is around 10% used
>>>> - iostat shows no disk activity on the OSDs (so I think the 1G file is served from the Linux buffer cache)
>>>>
>>>>
>>>> on the KVM host:
>>>>
>>>> - CPU is around 20% used
>>>>
>>>>
>>>> I really don't see where the bottleneck is...
>>>>
>>>> Any ideas or hints?
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Alexandre



