linux-next.vger.kernel.org archive mirror
From: Kent Overstreet <kmo@daterainc.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Dave Kleikamp <dave.kleikamp@oracle.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Jens Axboe <axboe@kernel.dk>,
	linux-next@vger.kernel.org, linux-kernel@vger.kernel.org,
	Zach Brown <zab@zabbo.net>, Olof Johansson <olof@lixom.net>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: linux-next: manual merge of the block tree with the  tree
Date: Fri, 8 Nov 2013 01:21:06 -0800
Message-ID: <20131108092106.GA30271@kmo-pixel>
In-Reply-To: <20131108083251.GA20121@infradead.org>

On Fri, Nov 08, 2013 at 12:32:51AM -0800, Christoph Hellwig wrote:
> On Fri, Nov 08, 2013 at 12:17:37AM -0800, Kent Overstreet wrote:
> > The core issue isn't whether the IO is going to a block based filesystem
> > (but thanks for pointing out that that's not necessarily true!) but
> > whether we want to work with pinned pages or not. If pinned pages are ok
> > for everything, then bios as a common interface work - likely evolving
> > them a bit to be more general (it's just bi_bdev and bi_sector that's
> > actually block specific) - and IMO that would be far preferable to this
> > abstraction layer.
> > 
> > If OTOH we need a common interface that's also for places where we can't
> > afford the overhead of pinning user pages - that's a different story,
> > and maybe we do need all this infrastructure then. That's why I'm asking
> > about the stuff you mentioned, I'm honestly not sure.
> 
> For both of them we will deal with kernel-allocated pages that are never
> mapped to userspace.  This is likely to be true for all the consumers
> of in-kernel aio/dio as the existing interfaces handle user pages just
> fine.

Ok, that's good to know.

> > What I'm working towards, though, is a clean separation between buffered
> > and direct code paths, so that buffered IO can continue to work with iovecs
> > and for O_DIRECT the first thing you do is fill out a bio with pinned
> > pages and send it down to filesystem code or wherever it's going to go.
> 
> I don't think pushing bios above the fs interface is a good idea. Note
> that the iovecs come from userspace for the user pages cases, so there
> is little we can do about that, and non-bio based direct I/O
> implementations generally work directly at just that level and never
> even touch the direct-io.c code.

Bios can point to userspace pages just fine (they already do today for DIO
to block devices and block-based filesystems). Don't think of bios as
"block device IOs", just think of them as the equivalent of an iovec +
iov_iter except instead of (potentially userspace) pointers you have
page pointers. That's the core part of what they do (and even if we
don't standardize on bios for that we should standardize on _something_
for that functionality).
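To make the "iovec + iov_iter, but with page pointers" analogy concrete, here is a minimal userspace sketch, not kernel code. The types struct fake_page, struct page_seg and struct seg_iter and the helpers seg_iter_init() and seg_iter_copy_out() are all hypothetical stand-ins: page_seg is loosely modeled on the (page, offset, len) triple a bio segment carries, and seg_iter plays the role an iov_iter plays for an iovec array. The only point is that each segment names a page plus an offset and length instead of a raw (possibly userspace) pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_PAGE_SIZE 4096

/* Hypothetical stand-in for a page; the kernel's struct page is very different. */
struct fake_page {
	unsigned char data[FAKE_PAGE_SIZE];
};

/* Hypothetical segment: page pointer + offset + length, instead of a raw pointer. */
struct page_seg {
	struct fake_page *page;
	unsigned int offset;
	unsigned int len;
};

/* Hypothetical cursor over a segment array, doing the job iov_iter does for iovecs. */
struct seg_iter {
	const struct page_seg *segs;
	size_t nr_segs;
	size_t idx;   /* index of the current segment */
	size_t done;  /* bytes already consumed from the current segment */
};

static void seg_iter_init(struct seg_iter *it, const struct page_seg *segs,
			  size_t nr_segs)
{
	it->segs = segs;
	it->nr_segs = nr_segs;
	it->idx = 0;
	it->done = 0;
}

/*
 * Copy up to n bytes out of the segments into dst, advancing the iterator
 * across segment (and therefore page) boundaries. Returns bytes copied.
 */
static size_t seg_iter_copy_out(struct seg_iter *it, unsigned char *dst, size_t n)
{
	size_t copied = 0;

	while (copied < n && it->idx < it->nr_segs) {
		const struct page_seg *s = &it->segs[it->idx];
		size_t avail, chunk;

		/* A segment must not run past the end of its page. */
		assert(s->offset + s->len <= FAKE_PAGE_SIZE);
		avail = s->len - it->done;
		chunk = (n - copied < avail) ? n - copied : avail;
		memcpy(dst + copied, s->page->data + s->offset + it->done, chunk);
		copied += chunk;
		it->done += chunk;
		if (it->done == s->len) {
			/* Current segment exhausted: move on to the next one. */
			it->idx++;
			it->done = 0;
		}
	}
	return copied;
}
```

A caller that previously walked an iovec array with an iov_iter would instead walk a page_seg array with a seg_iter; a transfer that straddles two pages simply becomes two segments.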

Here's the helper function I wrote for my dio rewrite - it should really
take an iov_iter instead of uaddr and len, but user iovec -> bio is the
easy bit:

http://evilpiepirate.org/git/linux-bcache.git/commit/?h=block_stuff&id=4462c03167767c656986afaf981f891705fd5d3b
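The "easy bit" of turning a user (uaddr, len) range into a bio is mostly splitting it at page boundaries. The sketch below is not the code in the linked commit; it is a hypothetical userspace illustration of that one step, using an invented struct user_seg and split_user_range() helper. In the kernel, each resulting page would additionally have to be pinned (for example with get_user_pages()) before being added to the bio, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page piece of a user buffer. */
struct user_seg {
	uintptr_t page_addr;  /* page-aligned start address of the page */
	size_t offset;        /* where the data starts within that page */
	size_t len;           /* bytes of data within that page */
};

/*
 * Split [uaddr, uaddr + len) into per-page segments. page_size must be a
 * power of two. Returns the number of segments written (0 for an empty
 * range), or SIZE_MAX if out[] has fewer than the required max_segs entries.
 */
static size_t split_user_range(uintptr_t uaddr, size_t len, size_t page_size,
			       struct user_seg *out, size_t max_segs)
{
	size_t n = 0;

	assert(page_size > 0 && (page_size & (page_size - 1)) == 0);
	while (len > 0) {
		uintptr_t page = uaddr & ~(uintptr_t)(page_size - 1);
		size_t offset = (size_t)(uaddr - page);
		size_t chunk = page_size - offset;

		if (chunk > len)
			chunk = len;
		if (n == max_segs)
			return SIZE_MAX;  /* caller's array is too small */
		out[n].page_addr = page;
		out[n].offset = offset;
		out[n].len = chunk;
		n++;
		uaddr += chunk;
		len -= chunk;
	}
	return n;
}
```

For example, a 0x20-byte buffer at 0x1ff0 with 4 KiB pages yields two segments: 0x10 bytes at offset 0xff0 of page 0x1000, then 0x10 bytes at offset 0 of page 0x2000. Taking an iov_iter instead of (uaddr, len) would just mean running this loop once per iovec.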

> If you want to redo the ->direct_IO address_space operation and
> generic_file_direct_write and the direct I/O side of
> generic_file_aio_read (both of which aren't anywhere near as generic as
> the name claims) I'm all for it, but it really won't affect the consumer
> of the in-kernel aio/dio code.

I'm skeptical, but I'm way too tired to make good arguments and this
touches on too much code that I'm less familiar with.

Also, the flow of control in this code is such a goddamn clusterfuck that I
don't even know what to say.

I'll dig more into the ecryptfs and target aio stuff tomorrow though.

> > Does that make sense? I can show you more concretely what I'm working on if
> > you want. Or if I'm full of crap and this is useless for what you guys
> > want I'm sure you'll let me know :)
> 
> It sounds interesting, but also a little confusing at this point, at
> least from the non-block point of view.

Zach, do you want to chime in? He was involved in the discussion yesterday
and might be able to explain this stuff better than I can.


Thread overview: 45+ messages
2013-11-01  3:20 linux-next: manual merge of the block tree with the tree Stephen Rothwell
2013-11-01 15:10 ` Jens Axboe
2013-11-01 20:22   ` Stephen Rothwell
2013-11-01 20:27     ` Jens Axboe
2013-11-01 20:41       ` Dave Kleikamp
2013-11-01 20:53         ` Jens Axboe
2013-11-01 21:07           ` Dave Kleikamp
2013-11-02 20:50           ` Dave Kleikamp
2013-11-07 19:17             ` Olof Johansson
2013-11-07 19:20               ` Kent Overstreet
2013-11-07 19:20             ` Dave Kleikamp
2013-11-07 19:25               ` Kent Overstreet
2013-11-07 19:38                 ` Dave Kleikamp
2013-11-08  0:04                 ` Dave Kleikamp
2013-11-08  1:53                   ` Stephen Rothwell
2013-11-08  2:08                     ` Kent Overstreet
2013-11-08  2:32                       ` Dave Kleikamp
2013-11-08  7:33                         ` Christoph Hellwig
2013-11-08  7:39                           ` Kent Overstreet
2013-11-08  7:44                             ` Christoph Hellwig
2013-11-08  7:56                               ` Kent Overstreet
2013-11-08  8:02                                 ` Christoph Hellwig
2013-11-08  8:17                                   ` Kent Overstreet
2013-11-08  8:32                                     ` Christoph Hellwig
2013-11-08  9:21                                       ` Kent Overstreet [this message]
2013-11-08 17:56                                         ` Zach Brown
2013-11-08 15:10                           ` Dave Kleikamp
2013-11-08 15:29                           ` Jens Axboe
2013-11-08 16:15                             ` Jens Axboe
2013-11-10 21:32                               ` Stephen Rothwell
2013-11-08  2:39                     ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2010-12-17  1:28 Stephen Rothwell
2010-12-17 14:53 ` James Bottomley
2010-12-18  7:15   ` Tejun Heo
2009-09-10  4:48 Stephen Rothwell
2009-09-10  7:24 ` Jens Axboe
2009-09-10  7:40   ` Stephen Rothwell
2009-09-10  7:43     ` Jens Axboe
2009-07-01  5:37 Stephen Rothwell
2009-07-01  6:59 ` Jens Axboe
2009-05-18  4:53 Stephen Rothwell
2009-05-18  6:27 ` Jens Axboe
2009-05-18 12:34   ` Rusty Russell
2009-05-18 12:42     ` Jens Axboe
2009-05-19  0:11       ` Stephen Rothwell
