linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kent Overstreet <kmo@daterainc.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Kleikamp <dave.kleikamp@oracle.com>,
	Christoph Hellwig <hch@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"Maxim V. Patlasov" <mpatlasov@parallels.com>,
	linux-aio@kvack.org, Jens Axboe <axboe@kernel.dk>
Subject: Re: [GIT PULL] direct IO support for loop driver
Date: Wed, 20 Nov 2013 13:50:26 -0800	[thread overview]
Message-ID: <20131120215026.GA21515@kmo> (raw)
In-Reply-To: <CA+55aFwgGg2yzoUukVuAz2Feo4kU4=8CNBGY7phUHGSBDt8xwA@mail.gmail.com>

On Wed, Nov 20, 2013 at 01:38:19PM -0800, Linus Torvalds wrote:
> On Wed, Nov 20, 2013 at 1:19 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > At that point, I just couldn't take it any more.
> 
> Just to clarify, I think it might be fixable. But it does need fixing,
> because I really feel dirty from reading it. And I may not care all
> that deeply about what random drivers or low-level filesystems do, but
> I *do* care about generic code in mm/ and fs/, so making those iovec
> functions uglier makes me go all "Hulk angry! Hulk smash" on the code.
> 
> The whole "separate out checking from user copy" needs to go away.
> There's no excuse for it.
> 
> The whole "if (atomic) do_atomic_ops() else do_regular_ops()" crap
> needs to go away. You can do it either by just duplicating the
> function, or by having it use a indirect function for the copy (and
> that indirect function acts like copy_from/to_user() and checks the
> address range - and you can obviously then also have it be a "copy
> from kernel" thing too if you want/need it). And no, you don't then
> make it do *both* the conditional *and* the function pointer like you
> did in that discusting commit that mixes the two with the struct
> iov_iter_ops).
> 
> The "__" versions that don't check the user address range needs to die entirely.
> 
> The whole crazy "ii_iov_xyz" naming needs to go away. It doesn't even
> make sense (one of the "i"s is for "iov".
> 
> That "unsigned long data" that contains an iovec *? WTF? How did that
> ever start making sense?
> 
> IOW, there are many many details that just make me absolutely detest
> this series. Enough that there's no way in hell I feel comfortable
> pulling it. But they are likely fixable.

To be honest, I'm skeptical that this approach (adding methods and a bunch of
complexity to iov_iter for backing them with biovecs) is necessary for fixing
the loop driver. 

I've been working on an alternate approach that in the process cleans a bunch of
stuff up - the idea is basically, 1) refactor and massage a bunch of stuff in
the block layer so bios can be created and submitting without caring about the
constraints of the underlying device, and then 2) rewrite the dio code to start
out by pinning pages directly into bios, then query the filesystem to map those
bios wherever - splitting as needed.

What this gets us is a nice clean split where (with some more handwaving; Zach
had some ideas about how to handle some annoying details) we can just submit
bios to some sort of new DIO method - and then the loop driver could just use
that directly.

IMO this would be vastly cleaner; we'd have one data structure - the iov_iter -
for memory that isn't pinned, and another data structure - struct bio - for
iterating over pinned pages. Most of the aforementioned block layer massaging
I've been doing was creating a real iterator for bios (that doesn't modify the
biovecs).

I've also done the DIO rewrite - awhile ago - and it's definitely not ready to
go upstream (various prereqs still aren't in), but it at least shows that the
approach is viable and my rewrite cuts fs/direct-io.c in _half_ while
significantly improving performance:

http://evilpiepirate.org/git/linux-bcache.git/commit/?h=block_stuff&id=caf18e8ec531daea29f61c9aa486443026a343c7

  reply	other threads:[~2013-11-20 21:50 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-18 19:03 [GIT PULL] direct IO support for loop driver Dave Kleikamp
2013-11-18 19:07 ` Dave Kleikamp
2013-11-20 21:19 ` Linus Torvalds
2013-11-20 21:38   ` Linus Torvalds
2013-11-20 21:50     ` Kent Overstreet [this message]
2013-11-20 22:46     ` Dave Kleikamp
2013-11-21  4:24       ` Stephen Rothwell
2013-11-21  9:58   ` Christoph Hellwig
2013-11-21 10:06     ` Kent Overstreet
2013-11-21 10:11       ` Christoph Hellwig
2013-11-21 10:13         ` Kent Overstreet
2013-11-21 17:34       ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131120215026.GA21515@kmo \
    --to=kmo@daterainc.com \
    --cc=axboe@kernel.dk \
    --cc=dave.kleikamp@oracle.com \
    --cc=hch@infradead.org \
    --cc=linux-aio@kvack.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpatlasov@parallels.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).