linux-arch.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: tytso@mit.edu
Cc: Kyle McMartin <kyle@mcmartin.ca>,
	linux-parisc@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	James.Bottomley@suse.de, hch@infradead.org,
	linux-arch@vger.kernel.org, Jens Axboe <jens.axboe@oracle.com>
Subject: Re: [git patches] xfs and block fixes for virtually indexed arches
Date: Thu, 17 Dec 2009 08:46:33 -0800 (PST)
Message-ID: <alpine.LFD.2.00.0912170839550.15740@localhost.localdomain>
In-Reply-To: <20091217163036.GE2123@thunk.org>



On Thu, 17 Dec 2009, tytso@mit.edu wrote:
> 
> That's because apparently the iSCSI and DMA blocks assume that they
> have Real Pages (tm) passed to block I/O requests, and apparently XFS
> ran into problems when sending vmalloc'ed pages.  I don't know if this
> is a problem if we pass the bio layer addresses coming from the SLAB
> allocator, but oral tradition seems to indicate this is problematic,
> although no one has given me the full chapter and verse explanation
> about why this is so.

kmalloc() memory should be ok. It's backed by "real pages". Doing the DMA 
translations for such pages is trivial and fundamental.

In contrast, vmalloc is pure and utter unadulterated CRAP. The pages 
may be contiguous virtually, but it makes no difference for the block 
layer, that has to be able to do IO by DMA anyway, so it has to look up 
the page translations in the page tables etc crazy sh*t.

So passing vmalloc'ed page addresses around to something that will 
eventually do a non-CPU-virtual thing on them is fundamentally insane. The 
vmalloc space is about CPU virtual addresses. Such concepts simply do not 
-exist- for some random block device.
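
As an illustration of the distinction drawn above, here is a minimal userspace toy model in C. It is not kernel code and not part of the original message. All names (`toy_linear_to_phys`, `struct toy_vm_area`, the offset constant, the 4096-byte page size) are hypothetical and chosen only for the sketch. It assumes a linearly mapped region whose physical address falls out of a subtraction, and a vmalloc-like region whose pages are backed by arbitrary frames that can only be found through a per-page table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE     4096u
/* Hypothetical offset of the linear ("real pages") mapping. */
#define TOY_LINEAR_OFFSET 0xC0000000u

/* Linear-mapped (kmalloc-like) address: translation is plain arithmetic. */
static uint32_t toy_linear_to_phys(uint32_t vaddr)
{
    assert(vaddr >= TOY_LINEAR_OFFSET);
    return vaddr - TOY_LINEAR_OFFSET;
}

/* vmalloc-like area: virtually contiguous, backed by arbitrary frames. */
struct toy_vm_area {
    uint32_t base;          /* first virtual address of the area */
    size_t nr_pages;        /* number of pages in the area */
    const uint32_t *frames; /* frames[i] = frame number backing page i */
};

/* Translation needs a table lookup for every page that is touched. */
static uint32_t toy_vm_to_phys(const struct toy_vm_area *area, uint32_t vaddr)
{
    assert(vaddr >= area->base);
    uint32_t delta = vaddr - area->base;
    size_t idx = delta / TOY_PAGE_SIZE;
    assert(idx < area->nr_pages);
    return area->frames[idx] * TOY_PAGE_SIZE + delta % TOY_PAGE_SIZE;
}

/* Returns 1 only if the backing frames happen to be consecutive. */
static int toy_vm_is_phys_contiguous(const struct toy_vm_area *area)
{
    for (size_t i = 1; i < area->nr_pages; i++)
        if (area->frames[i] != area->frames[i - 1] + 1)
            return 0;
    return 1;
}
```

In this model a device doing DMA only ever sees the value returned by one of the two translation functions. For the vmalloc-like area that value can jump at every page boundary, which is why virtual contiguity buys the device nothing.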

> Now that I see Linus's complaint, I'm wondering if the issue is really
> about kernel virtual addresses (i.e., coming from vmalloc), and not a
> requirement for Real Pages (i.e., coming from the SLAB allocator as
> opposed to get_free_page).  And can this be documented someplace?  I
> tried looking at the bio documentation, and couldn't find anything
> definitive on the subject.

The whole "vmalloc is special" has always been true. If you want to 
treat vmalloc as normal memory, you need to look up the pages yourself. We 
have helpers for that (including helpers that populate vmalloc space from 
a page array to begin with - so you can _start_ from some array of pages 
and then lay them out virtually if you want to have a convenient CPU 
access to the array).
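
The message does not name the helpers. In the kernel, `vmalloc_to_page()` (look up the page behind a vmalloc address) and `vmap()` (map an existing page array into vmalloc space) fill these roles. The following is a userspace sketch of what "look up the pages yourself" amounts to. It is not kernel code, and `struct sketch_segment`, `sketch_split_by_page` and the 4096-byte page size are hypothetical names and assumptions made for illustration. Given the array of frames behind a virtually contiguous buffer, it splits a byte range into per-page (frame, offset, length) pieces, which is the form an I/O layer can work with without walking page tables.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

/* One physically meaningful piece of a virtually contiguous range. */
struct sketch_segment {
    unsigned long frame; /* frame number backing this piece */
    size_t offset;       /* byte offset inside that frame */
    size_t len;          /* number of bytes in this piece */
};

/*
 * Split the byte range [start, start + len) of a buffer whose virtual
 * page i is backed by frames[i] into per-page segments.
 * Returns the number of segments written to out, or -1 if the range
 * lies outside the buffer or out has room for fewer than needed.
 */
static int sketch_split_by_page(const unsigned long *frames, size_t nr_pages,
                                size_t start, size_t len,
                                struct sketch_segment *out, size_t max_segs)
{
    size_t total = nr_pages * SKETCH_PAGE_SIZE;
    size_t n = 0;

    assert(frames != NULL && out != NULL);
    if (start > total || len > total - start)
        return -1;

    while (len > 0) {
        size_t idx = start / SKETCH_PAGE_SIZE;
        size_t off = start % SKETCH_PAGE_SIZE;
        size_t chunk = SKETCH_PAGE_SIZE - off; /* bytes left in this page */

        if (chunk > len)
            chunk = len;
        if (n == max_segs)
            return -1;

        out[n].frame = frames[idx];
        out[n].offset = off;
        out[n].len = chunk;
        n++;

        start += chunk;
        len -= chunk;
    }
    return (int)n;
}
```

The point of the design is that the caller, who owns the page array, does the per-page split once and hands over page-based segments. The layer below never needs to know that the buffer was virtually contiguous.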

And this whole "vmalloc is about CPU virtual addresses" is so obviously 
and fundamentally true that I don't understand how anybody can ever be 
confused about it. The "v" in vmalloc is for "virtual" as in virtual 
memory.

Think of it like virtual user addresses. Does anybody really expect to be 
able to pass a random user address to the BIO layer?

And if you do, I would suggest that you get out of kernel programming 
pronto. You're a danger to society, and have a lukewarm IQ. I don't want 
you touching kernel code.

And no, I do _not_ want the BIO layer having to walk page tables. Not for 
vmalloc space, not for user virtual addresses.

(And don't tell me it already does. Maybe somebody sneaked it in past me, 
without me ever noticing. That wouldn't be an excuse, that would be just 
sad. Jesus wept)

			Linus
