linux-arch.vger.kernel.org archive mirror
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Dave Chinner <david@fromorbit.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
	jens.axboe@oracle.com, torvalds@linux-foundation.org,
	tytso@mit.edu, kyle@mcmartin.ca, linux-parisc@vger.kernel.org,
	linux-kernel@vger.kernel.org, hch@infradead.org,
	linux-arch@vger.kernel.org
Subject: Re: [git patches] xfs and block fixes for virtually indexed arches
Date: Fri, 18 Dec 2009 08:10:39 +0100
Message-ID: <1261120239.3013.10.camel@mulgrave.site>
In-Reply-To: <20091218024440.GG4850@discord.disaster>

On Fri, 2009-12-18 at 13:44 +1100, Dave Chinner wrote:
> On Fri, Dec 18, 2009 at 10:00:21AM +0900, FUJITA Tomonori wrote:
> > On Fri, 18 Dec 2009 00:57:00 +0100
> > James Bottomley <James.Bottomley@suse.de> wrote:
> > 
> > > On Thu, 2009-12-17 at 20:36 +0100, Jens Axboe wrote:
> > > > On Thu, Dec 17 2009, Linus Torvalds wrote:
> > > > > 
> > > > > 
> > > > > On Thu, 17 Dec 2009, tytso@mit.edu wrote:
> > > > > > 
> > > > > > Sure, but there's some rumors/oral traditions going around that some
> > > > > > block devices want bio address which are page aligned, because they
> > > > > > want to play some kind of refcounting game,
> > > > > 
> > > > > Yeah, you might be right at that.
> > > > > 
> > > > > > And it's Weird Shit(tm) (aka iSCSI, AoE) type drivers, that most of us 
> > > > > > don't have access to, so just because it works Just Fine on SATA doesn't 
> > > > > > mean anything.
> > > > > > 
> > > > > > And none of this is documented anywhere, which is frustrating as hell.
> > > > > > Just rumors that "if you do this, AoE/iSCSI will corrupt your file
> > > > > > systems".
> > > > > 
> > > > > ACK. Jens? 
> > > > 
> > > > I've heard those rumours too, and I don't even know if they are true.
> > > > Who has a pointer to such a bug report and/or issue? The block layer
> > > > itself doesn't have any such requirements, and the only places where
> > > > we play page games are for bios that were explicitly mapped with pages
> > > > by itself (like mapping user data).
> > > 
> > > OK, so what happened is that prior to the map single fix
> > > 
> > > commit df46b9a44ceb5af2ea2351ce8e28ae7bd840b00f
> > > Author: Mike Christie  <michaelc@cs.wisc.edu>
> > > Date:   Mon Jun 20 14:04:44 2005 +0200
> > > 
> > >     [PATCH] Add blk_rq_map_kern()
> > > 
> > > 
> > > bio could only accept user space buffers, so we had a special path for
> > > kernel allocated buffers.  That commit unified the path (with a separate
> > > block API) so we could now submit kmalloc'd buffers via block APIs.
> > > 
> > > So the rule now is we can accept any user mapped area via
> > > blk_rq_map_user and any kmalloc'd area via blk_rq_map_kern().  We might
> > > not be able to do a stack area (depending on how the arch maps the
> > > stack) and we definitely cannot do a vmalloc'd area.
> > > 
> > > So it sounds like we only need a blk_rq_map_vmalloc() using the same
> > > techniques as the patch set and we're good to go.
> > 
> > I'm not sure about it.
> > 
> > As I said before (when I was against this 'adding vmalloc support to
> > the block layer' stuff), are there potential users of this except for
> > > XFS? Is there anyone who does such a thing now?
> 
> As Christoph already mentioned, XFS is not passing the vmalloc'd
> range to the block layer - it passes the underlying pages to the
> block layer. Hence I'm not sure there actually is anyone who is
> passing vmalloc'd addresses to the block layer. Perhaps we should
> put a WARN_ON() in the block layer to catch anyone doing such a
> thing before considering supporting vmalloc'd addresses in the block
> layer?

In the statements above, "vmalloc" is just shorthand for vmap/vmalloc
(basically anything with an additional kernel virtual mapping, which
causes aliases).  If we support vmap, we naturally support vmalloc as
well.
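
As a rough illustration of the WARN_ON() idea quoted above, here is a
minimal userspace sketch of the kind of check a mapping path could
perform.  Everything named sketch_* is hypothetical, and the address
bounds are made-up stand-ins for the arch-specific VMALLOC_START and
VMALLOC_END; in the kernel the range test would be is_vmalloc_addr()
and the report would come from WARN_ON().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bounds for illustration only; real values are per-arch. */
#define SKETCH_VMALLOC_START 0xf0000000UL
#define SKETCH_VMALLOC_END   0xff000000UL

/* Range test modelled on what the kernel's is_vmalloc_addr() does. */
static bool sketch_is_vmalloc_addr(const void *p)
{
	uintptr_t addr = (uintptr_t)p;

	return addr >= SKETCH_VMALLOC_START && addr < SKETCH_VMALLOC_END;
}

/*
 * Hypothetical guard for a blk_rq_map_kern()-style entry point:
 * warn and refuse (-1) if the buffer lies in the vmalloc/vmap
 * range, otherwise accept it (0).
 */
static int sketch_map_kern_check(const void *kbuf)
{
	if (sketch_is_vmalloc_addr(kbuf)) {
		fprintf(stderr,
			"WARNING: vmalloc/vmap address %p passed to block layer\n",
			(void *)kbuf);
		return -1;
	}
	return 0;
}
```

A check like this only catches callers that hand over the aliased
virtual address itself; it says nothing about callers that pass the
underlying pages.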

> > This API might be useful only for journaling file systems using log
> > formats that need a large contiguous buffer. Sounds like only XFS?
> 
> FWIW, mapped buffers larger than PAGE_SIZE are used for more than just log
> recovery in XFS. e.g. filesystems with a directory block size larger
> than the page size use mapped buffers.

However, XFS is the only fs that actually uses kernel virtual mapping to
solve this problem.

James

