From: Jeff Mitchell <jeffrey.mitchell@gmail.com>
To: James Harper <james.harper@bendigoit.com.au>
Cc: "Harald Rößler" <Harald.Roessler@btd.de>,
	"Wolfgang Hennerbichler"
	<wolfgang.hennerbichler@risc-software.at>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: poor write performance
Date: Sat, 20 Apr 2013 17:04:44 -0400
Message-ID: <517302EC.5020701@gmail.com>
In-Reply-To: <6035A0D088A63A46850C3988ED045A4B4F35E2CA@BITCOM1.int.sbss.com.au>

James Harper wrote:
>> Hi James,
>>
>> do you have VLAN interfaces configured on your bonding interfaces?
>> Because I saw a similar situation in my setup.
>>
>
> No VLANs on my bonding interface, although they're used extensively elsewhere.

What the OP described is *exactly* like a problem I've been struggling 
with. I had thought the blame lay elsewhere, but maybe not.

My setup:

4 Ceph nodes, with 6 OSDs each and dual (bonded) 10GbE, with VLANs, 
running Precise. OSDs are using XFS. Replica count of 3. Three of these 
nodes are mons.
4 compute nodes, with dual (bonded) 10GbE, with VLANs, running a base of 
Precise along with a 3.6.3 Ceph-provided kernel, running KVM-based VMs. 
Two of these are also mons. The VMs run Precise and access RBD through 
the kernel client.

(Eventually there will be 12 Ceph nodes. Five mons seemed an appropriate 
number: when I've run into issues in the past, I've actually had cases 
where more than three mons were knocked out, so five is a comfortable 
number unless it turns out to be problematic.)

In the VMs, I/O with ext4 is fine -- 10-15MB/s sustained. However, using 
ZFS (via ZFSonLinux, not FUSE), I see write speeds of about 150kb/sec, 
just like the OP.

I had figured that the problem lay with ZFS inside the VM (I've used 
ZFSonLinux on many bare-metal machines without a problem for a couple of 
years now). The VMs were using virtio, and I've since heard that pre-1.4 
Qemu versions could have some serious problems with virtio (which I 
didn't know at the time). I also know that the kernel client is not the 
preferred client, and the version I'm using is a rather old Ceph-provided 
build. As a result, my plan was to try the updated Qemu version along 
with Qemu's native librados-based RBD support once Raring was out. I 
figured the problem was either something in ZFSonLinux (though I reported 
the issue, and nobody had ever heard of such a problem or had any idea 
why it would happen) or something specific to ZFS running inside Qemu, 
since ext4 in the VMs is fine.

But this thread has made me wonder whether something else is actually 
going on: either something to do with using VLANs on the bonded 
interface, as someone else saw (although I don't see such a write 
problem with any other traffic going through these VLANs), or something 
about how ZFS inside the VM writes to the RBD disk that causes a huge 
slowdown in Ceph. The numbers the OP cited were exactly in line with 
what I was seeing.

I don't know offhand what block sizes the kernel client was using, or 
what block sizes the different filesystems inside the VMs might use when 
writing to their virtual disks (I'm guessing that with virtio, which I'm 
using, it could potentially be anything). But perhaps ZFS writes 
extremely small blocks and ext4 doesn't.
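One way to probe the small-block idea directly, once the testbed is 
reachable again, would be to time synchronous writes at several block 
sizes on the RBD-backed disk inside a VM and compare them with a local 
disk. The Python sketch below is a hypothetical helper (write_throughput 
is not part of Ceph, ZFS, or Qemu tooling). It calls fsync() after every 
block to approximate a stream of small synchronous writes; that is an 
assumption about what ZFS might be doing, not a reproduction of its 
actual I/O pattern.

```python
import os
import tempfile
import time


def write_throughput(path, block_size, total_bytes, sync_every_block=True):
    """Write total_bytes of zeros to path in block_size chunks.

    Returns the observed rate in bytes/second. With sync_every_block=True,
    every chunk is followed by os.fsync(), so the result is dominated by
    per-request latency when block_size is small (hypothetical test helper).
    """
    if block_size <= 0 or total_bytes <= 0:
        raise ValueError("block_size and total_bytes must be positive")
    block = b"\0" * block_size
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    try:
        while written < total_bytes:
            # The final chunk may be shorter than block_size.
            chunk = block[: min(block_size, total_bytes - written)]
            written += os.write(fd, chunk)
            if sync_every_block:
                os.fsync(fd)
        if not sync_every_block:
            # Flush once at the end so both modes reach stable storage.
            os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return written / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Hypothetical target; point it at the filesystem under test instead.
    target = os.path.join(tempfile.gettempdir(), "blocksize_test.bin")
    for bs in (4096, 65536, 1 << 20):
        rate = write_throughput(target, bs, 4 << 20)
        print(f"{bs:>8} B blocks: {rate / 1e6:.2f} MB/s")
    os.remove(target)
```

If throughput on the RBD disk collapses only at small block sizes while 
a local disk does not, that would point toward per-request latency rather 
than raw bandwidth; it is a hypothesis to check, not a conclusion from 
this thread.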

Unfortunately, I don't have access to this testbed for the next few 
weeks, so for the moment I can only recount my experience and can't 
actually test any suggestions (unless I can corral someone who has 
access to it to run tests).

Thanks,
Jeff
