All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@redhat.com>
To: Ilya Dryomov <idryomov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>, "Yan, Zheng" <zyan@redhat.com>,
	Sage Weil <sage@redhat.com>,
	Ceph Development <ceph-devel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Zhu, Caifeng" <zhucaifeng@unissoft-nj.com>
Subject: Re: [PATCH v2] ceph/iov_iter: fix bad iov_iter handling in ceph splice codepaths
Date: Wed, 18 Jan 2017 07:14:48 -0500	[thread overview]
Message-ID: <1484741688.2669.3.camel@redhat.com> (raw)
In-Reply-To: <CAOi1vP8bqHAt_ZfTxTuqk7gxiia-Q=X+V4b409jCdW_y=kAJOg@mail.gmail.com>

On Thu, 2017-01-12 at 12:37 +0100, Ilya Dryomov wrote:
> On Thu, Jan 12, 2017 at 12:27 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > On Thu, 2017-01-12 at 07:59 +0000, Al Viro wrote:
> > > On Tue, Jan 10, 2017 at 07:57:31AM -0500, Jeff Layton wrote:
> > > > 
> > > > v2: fix bug in offset handling in iov_iter_pvec_size
> > > > 
> > > > xfstest generic/095 triggers soft lockups in kcephfs. Basically it uses
> > > > fio to drive some I/O via vmsplice ane splice. Ceph then ends up trying
> > > > to access an ITER_BVEC type iov_iter as a ITER_IOVEC one. That causes it
> > > > to pick up a wrong offset and get stuck in an infinite loop while trying
> > > > to populate the page array. dio_get_pagev_size has a similar problem.
> > > > 
> > > > To fix the first problem, add a new iov_iter helper to determine the
> > > > offset into the page for the current segment and have ceph call that.
> > > > I would just replace dio_get_pages_alloc with iov_iter_get_pages_alloc,
> > > > but that will only return a single page at a time for ITER_BVEC and
> > > > it's better to make larger requests when possible.
> > > > 
> > > > For the second problem, we simply replace it with a new helper that does
> > > > what it does, but properly for all iov_iter types.
> > > > 
> > > > Since we're moving that into generic code, we can also utilize the
> > > > iterate_all_kinds macro to simplify this. That means that we need to
> > > > rework the logic a bit since we can't advance to the next vector while
> > > > checking the current one.
> > > 
> > > Yecchhh...  That really looks like exposing way too low-level stuff instead
> > > of coming up with saner primitive ;-/
> > > 
> > 
> > Fair point. That said, I'm not terribly thrilled with how
> > iov_iter_get_pages* works right now.
> > 
> > Note that it only ever touches the first vector. Would it not be better
> > to keep getting page references if the bvec/iov elements are aligned
> > properly? It seems quite plausible that they often would be, and being
> > able to hand back a larger list of pages in most cases would be
> > advantageous.
> > 
> > IOW, should we have iov_iter_get_pages basically do what
> > dio_get_pages_alloc does -- try to build as long an array of pages as
> > possible before returning, provided that the alignment works out?
> > 
> > The NFS DIO code, for instance, could also benefit there. I know we've
> > had reports there in the past that sending down a bunch of small iovecs
> > causes a lot of small-sized requests on the wire.
> > 
> > > Is page vector + offset in the first page + number of bytes really what
> > > ceph wants?  Would e.g. an array of bio_vec be saner?  Because _that_
> > > would make a lot more natural iov_iter_get_pages_alloc() analogue...
> > > 
> > > And yes, I realize that you have ->pages wired into the struct ceph_osd_request;
> > > how painful would it be to have it switched to struct bio_vec array instead?
> > 
> > Actually...it looks like that might not be too hard. The low-level OSD
> > handling code can already handle bio_vec arrays in order to service RBD.
> > It looks like we could switch cephfs to use
> > osd_req_op_extent_osd_data_bio instead of
> > osd_req_op_extent_osd_data_pages. That would add a dependency in cephfs
> > on CONFIG_BLOCK, but I think we could probably live with that.
> 
> Ah, just that part might be easy enough ;)
> 
> 

Yeah, that part doesn't look too bad. Regardless though, I think we need
to get a fix in for this sooner rather than later as it's trivial to get
the kernel stuck in this loop today, by any user with write access to a
ceph mount.

Al, when you mentioned switching this over to a bio_vec based interface,
were you planning to roll up the iov_iter->bio_vec array helper for
this, or should I be looking into doing that?

Thanks,
-- 
Jeff Layton <jlayton@redhat.com>

  reply	other threads:[~2017-01-18 12:14 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-06 18:23 [PATCH] ceph/iov_iter: fix bad iov_iter handling in ceph splice codepaths Jeff Layton
2017-01-06 20:36 ` Jeff Layton
2017-01-09 23:11 ` Jeff Layton
2017-01-10 12:57 ` [PATCH v2] " Jeff Layton
2017-01-11  2:42   ` Yan, Zheng
2017-01-12  7:59   ` Al Viro
2017-01-12 11:13     ` Ilya Dryomov
2017-01-12 11:37       ` Al Viro
2017-01-12 11:46         ` Ilya Dryomov
2017-01-12 11:53           ` Al Viro
2017-01-12 12:17             ` Ilya Dryomov
2017-01-12 11:57           ` Jeff Layton
2017-01-12 11:27     ` Jeff Layton
2017-01-12 11:37       ` Ilya Dryomov
2017-01-18 12:14         ` Jeff Layton [this message]
2017-01-24 15:00           ` Jeff Layton
2017-01-24 20:51             ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1484741688.2669.3.camel@redhat.com \
    --to=jlayton@redhat.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=idryomov@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sage@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zhucaifeng@unissoft-nj.com \
    --cc=zyan@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.