From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Nelson
Subject: Re: Object Write Latency
Date: Fri, 20 Sep 2013 08:11:29 -0500
Message-ID: <523C4981.9080406@inktank.com>
References: <3472A07E6605974CBC9BC573F1BC02E4A52724A9@PLOXCHG03.cern.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail-vc0-f179.google.com ([209.85.220.179]:53594 "EHLO
    mail-vc0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1754590Ab3ITNLe (ORCPT ); Fri, 20 Sep 2013 09:11:34 -0400
Received: by mail-vc0-f179.google.com with SMTP id ht10so248023vcb.24
    for ; Fri, 20 Sep 2013 06:11:34 -0700 (PDT)
In-Reply-To: <3472A07E6605974CBC9BC573F1BC02E4A52724A9@PLOXCHG03.cern.ch>
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Andreas Joachim Peters
Cc: "ceph-devel@vger.kernel.org"

On 09/20/2013 07:27 AM, Andreas Joachim Peters wrote:
> Hi,

Hi Andreas!

>
> we made some benchmarks about object read/write latencies on the CERN ceph installation.
>
> The cluster has 44 nodes and ~1k disks, all on 10GE and the pool configuration has 3 copies.
> Client & Server is 0.67.
>
> The latencies we observe (using tiny objects ... 5 bytes) on the idle pool:
>
> write full object(sync) ~65-80ms
> append to object        ~60-75ms
> set xattr object        ~65-80ms
> lock object             ~65-80ms
> stat object             ~1ms
>
> We seem to saturate the pools writing ~ 20k objects/s (= internally 60k/s).

Out of curiosity, how much difference do you see with write latencies if
you do the same thing to a pool with 1 copy?

>
> Is there an easy explanation for 80 ms (quasi without payload) and a possible tuning to reduce that?
> I measured (append few bytes +fsync) on such a disk around 33ms which explains probably part of the latency.

I've been wanting to really dig into object write latency in RADOS but
just haven't had the time to devote to it yet.
I've been doing some simple rados bench tests to an 8-SSD test node and
am topping out at about 8-9K write IOPS and 26K read IOPS (no
replication), though with little tuning. I suspect there are many areas
in the code where we could improve things.

>
> Then I tried with the async API to see if there is a difference in the measurement between wait_for_complete or wait_for_safe ... shouldn't wait_for_complete be much shorter, but I get always comparable results ...

Hrm, I'm going to let Sage or someone else comment on this.

>
> Thanks, Andreas.
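
As a rough sanity check on the figures quoted in this thread, the arithmetic can be sketched in a few lines of Python. It uses only numbers stated above (20k objects/s, 3 copies, ~1k disks, 33 ms for append+fsync, 80 ms observed). The even spread of writes across disks and the function names are assumptions made for illustration; nothing here is measured or taken from Ceph itself.

```python
# Back-of-envelope arithmetic from the numbers quoted in the thread.
# Assumption (not stated in the thread): replica writes spread evenly
# across all disks. Function names are hypothetical helpers.

def per_disk_write_rate(client_objects_per_s, replicas, disks):
    """Replica writes per second landing on each disk, assuming even spread."""
    return client_objects_per_s * replicas / disks


def serial_sync_rate(sync_latency_s):
    """Synchronous writes per second one disk could do strictly back to back."""
    return 1.0 / sync_latency_s


# 20k client objects/s with 3 copies is the "internally 60k/s" figure.
internal_rate = 20_000 * 3

# Spread over ~1k disks.
per_disk = per_disk_write_rate(20_000, 3, 1_000)

# One append+fsync measured at ~33 ms on a single disk.
serial = serial_sync_rate(0.033)

# Observed ~80 ms write latency expressed in units of that fsync time.
ratio = 0.080 / 0.033

print(f"internal replica writes/s: {internal_rate}")
print(f"replica writes/s per disk: {per_disk:.1f}")
print(f"serial fsync ops/s per disk: {serial:.1f}")
print(f"observed latency / fsync latency: {ratio:.2f}")
```

Under the even-spread assumption, each disk sees about 60 replica writes per second at saturation, roughly twice the ~30 per second that strictly serial 33 ms fsyncs would allow, and the observed 80 ms is about 2.4 times the single-disk fsync figure. This is only a consistency check on the quoted numbers, not an explanation of where the time goes.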