linux-lvm.redhat.com archive mirror
From: Mike Snitzer <snitzer@redhat.com>
To: Sage Weil <sage@inktank.com>
Cc: elder@inktank.com, Christoph Hellwig <hch@infradead.org>,
	Ugis <ugis22@gmail.com>,
	linux-lvm@redhat.com,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
	"ceph-users@ceph.com" <ceph-users@ceph.com>
Subject: Re: [linux-lvm] poor read performance on rbd+LVM, LVM overload
Date: Mon, 21 Oct 2013 13:48:51 -0400
Message-ID: <20131021174850.GA29416@redhat.com>
In-Reply-To: <alpine.DEB.2.00.1310210853140.29488@cobra.newdream.net>

On Mon, Oct 21 2013 at 12:02pm -0400,
Sage Weil <sage@inktank.com> wrote:

> On Mon, 21 Oct 2013, Mike Snitzer wrote:
> > On Mon, Oct 21 2013 at 10:11am -0400,
> > Christoph Hellwig <hch@infradead.org> wrote:
> > 
> > > On Sun, Oct 20, 2013 at 08:58:58PM -0700, Sage Weil wrote:
> > > > It looks like without LVM we're getting 128KB requests (which IIRC is 
> > > > typical), but with LVM it's only 4KB.  Unfortunately my memory is a bit 
> > > > fuzzy here, but I seem to recall a property on the request_queue or device 
> > > > that affected this.  RBD is currently doing
> > > 
> > > Unfortunately most device mapper modules still split all I/O into 4k
> > > chunks before handling them.  They rely on the elevator to merge them
> > > back together down the line, which isn't overly efficient but should at
> > > least provide larger segments for the common cases.
> > 
> > It isn't DM that splits the IO into 4K chunks; it is the VM subsystem
> > no?  Unless care is taken to assemble larger bios (higher up the IO
> > stack, e.g. in XFS), all buffered IO will come to bio-based DM targets
> > in $PAGE_SIZE granularity.
> > 
> > I would expect direct IO to perform better here because it will make use
> > of bio_add_page to build up larger IOs.
> 
> I do know that we regularly see 128 KB requests when we put XFS (or 
> whatever else) directly on top of /dev/rbd*.
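
One easy way to double-check what request sizes each layer is actually
seeing (independent of the advertised limits) is iostat's extended
stats, assuming sysstat is installed:

# watch the rbd* and dm-* rows; avgrq-sz is the average request size
# in 512-byte sectors (256 == 128KB, 8 == 4KB)
iostat -x 1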

Should be pretty straightforward to identify any limits that are
different by walking /sys/block/<dev>/queue, e.g.:

grep -r . /sys/block/rbdXXX/queue
vs
grep -r . /sys/block/dm-X/queue

Could be there is an unexpected difference.  For instance, there was
this fix recently: http://patchwork.usersys.redhat.com/patch/69661/
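
If it helps, something like the following (rbd0/dm-0 are just
placeholder names) puts the limits that usually matter for merging
side by side:

# compare the queue limits most relevant to request sizing
# (substitute your actual rbd and dm device names)
for f in max_sectors_kb max_hw_sectors_kb max_segments \
         read_ahead_kb minimum_io_size optimal_io_size; do
    printf '%-18s %10s %10s\n' "$f" \
        "$(cat /sys/block/rbd0/queue/$f)" \
        "$(cat /sys/block/dm-0/queue/$f)"
done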

> > Taking a step back, the rbd driver is exposing both the minimum_io_size
> > and optimal_io_size as 4M.  This symmetry will cause XFS to _not_ detect
> > the exposed limits as striping.  Therefore, AFAIK, XFS won't take steps
> > to respect the limits when it assembles its bios (via bio_add_page).
> > 
> > Sage, any reason why you don't use traditional raid geometry based IO
> > limits? e.g.:
> > 
> > minimum_io_size = raid chunk size
> > optimal_io_size = raid chunk size * N stripes (aka full stripe)
> 
> We are... by default we stripe 4M chunks across 4M objects.  You're 
> suggesting it would actually help to advertise a smaller minimum_io_size 
> (say, 1MB)?  This could easily be made tunable.

You're striping 4MB chunks across 4 million stripes?

So the full stripe size in bytes is 17592186044416 (or 16TB)?  Yeah
cannot see how XFS could make use of that ;)
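
To spell that arithmetic out, and show what the raid-style geometry
above would give with the 1MB chunk you mention and a purely
hypothetical 4-wide stripe:

# "4M chunks across 4M objects", read literally: 4MiB * 4Mi stripes
echo $(( 4 * 1024 * 1024 * 4 * 1024 * 1024 ))   # 17592186044416, the 16TB above

# raid-style limits with a 1MiB chunk and a made-up stripe count of 4
chunk=$(( 1024 * 1024 ))
echo "minimum_io_size=$chunk optimal_io_size=$(( chunk * 4 ))"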


Thread overview: 18+ messages
2013-10-16 14:46 [linux-lvm] poor read performance on rbd+LVM, LVM overload Ugis
2013-10-16 16:16 ` Sage Weil
2013-10-17  9:06   ` David McBride
2013-10-17 15:18   ` Mike Snitzer
2013-10-18  7:56     ` Ugis
2013-10-19  0:01       ` Sage Weil
2013-10-20 15:18         ` Ugis
2013-10-20 18:21           ` [linux-lvm] [ceph-users] " Josh Durgin
2013-10-21  3:58           ` [linux-lvm] " Sage Weil
2013-10-21 14:11             ` Christoph Hellwig
2013-10-21 15:01               ` Mike Snitzer
2013-10-21 15:06                 ` Mike Snitzer
2013-10-21 16:02                 ` Sage Weil
2013-10-21 17:48                   ` Mike Snitzer [this message]
2013-10-21 18:05                     ` Sage Weil
2013-10-21 18:06                 ` Christoph Hellwig
2013-10-21 18:27                   ` Mike Snitzer
2013-10-30 14:53                     ` Ugis
