From: Chris Mason <chris.mason@fusionio.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>,
Jens Axboe <axboe@kernel.dk>,
Jeff Chua <jeff.chua.linux@gmail.com>,
Lai Jiangshan <laijs@cn.fujitsu.com>, Jan Kara <jack@suse.cz>,
lkml <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
Date: Thu, 29 Nov 2012 09:12:49 -0500 [thread overview]
Message-ID: <20121129141249.GB30766@shiny> (raw)
In-Reply-To: <CA+55aFxrkX_=4yNQ1TwjqU9=pkRY=ud+Kw3YnADJMJxA-ZqQUg@mail.gmail.com>
On Wed, Nov 28, 2012 at 11:16:21PM -0700, Linus Torvalds wrote:
> On Wed, Nov 28, 2012 at 6:58 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > But the fact that the code wants to do things like
> >
> > block = (sector_t)page->index << (PAGE_CACHE_SHIFT - bbits);
> >
> > seriously seems to be the main thing that keeps us using
> > 'inode->i_blkbits'. Calculating bbits from bh->b_size is just costly
> > enough to hurt (not everywhere, but on some machines).
> >
> > Very annoying.
>
> Hmm. Here's a patch that does that anyway. I'm not 100% happy with the
> whole ilog2 thing, but at the same time, in other cases it actually
> seems to improve code generation (ie gets rid of the whole unnecessary
> two dereferences through page->mapping->host just to get the block
> size, when we have it in the buffer-head that we have to touch
> *anyway*).
>
> Comments? Again, untested.
Jumping in based on Linus original patch, which is doing something like
this:
set_blocksize() {
block new calls to writepage, prepare/commit_write
set the block size
unblock
< --- can race in here and find bad buffers --->
sync_blockdev()
kill_bdev()
< --- now we're safe --- >
}
We could add a second semaphore and a page_mkwrite call:
set_blocksize() {
block new calls to prepare/commit_write and page_mkwrite(), but
leave writepage unblocked.
sync_blockev()
<--- now we're safe. There are no dirty pages and no ways to
make new ones --->
block new calls to readpage (writepage too for good luck?)
kill_bdev()
set the block size
unblock readpage/writepage
unblock prepare/commit_write and page_mkwrite
}
Another way to look at things:
As Linus said in a different email, we don't need to drop the pages, just
the buffers. Once we've blocked prepare/commit_write,
there is no way to make a partially up to date page with dirty data.
We may make fully uptodate dirty pages, but for those we can
just create dirty buffers for the whole page.
As long as we had prepare/commit write blocked while we ran
sync_blockdev, we can blindly detach any buffers that are the wrong size
and just make new ones.
This may or may not apply to loop.c, I'd have to read that more
carefully.
-chris
next prev parent reply other threads:[~2012-11-29 14:12 UTC|newest]
Thread overview: 81+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CAAJw_ZtbhE5Jtd4PsWx8a23QdFTW7aMrKBmRf-bo5Wrean9Xhg@mail.gmail.com>
2012-11-20 18:09 ` Recent kernel "mount" slow Jan Kara
2012-11-21 15:46 ` Jeff Chua
2012-11-22 14:30 ` Jeff Chua
2012-11-22 19:21 ` Linus Torvalds
2012-11-23 13:24 ` Jens Axboe
2012-11-23 22:21 ` Jeff Chua
2012-11-23 23:31 ` Jeff Chua
2012-11-23 23:48 ` Jeff Chua
2012-11-24 21:09 ` Mikulas Patocka
2012-11-24 23:23 ` Jeff Chua
2012-11-27 5:57 ` Jeff Chua
2012-11-27 7:38 ` Jens Axboe
2012-11-27 7:44 ` Jens Axboe
2012-11-27 8:45 ` Jeff Chua
2012-11-27 10:06 ` Jeff Chua
2012-11-27 12:33 ` Jens Axboe
2012-11-28 3:57 ` Mikulas Patocka
2012-11-28 8:33 ` Jens Axboe
2012-11-28 13:05 ` Jeff Chua
2012-11-28 17:25 ` [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow) Mikulas Patocka
2012-11-28 19:15 ` Linus Torvalds
2012-11-28 19:43 ` Al Viro
2012-11-28 19:53 ` Linus Torvalds
2012-11-28 22:01 ` [PATCH v2] Do a proper locking for mmap and block size change Mikulas Patocka
2012-11-29 17:19 ` Linus Torvalds
2012-11-29 18:23 ` Mikulas Patocka
2012-11-29 18:46 ` Linus Torvalds
2012-11-29 19:02 ` Linus Torvalds
2012-11-29 19:15 ` Chris Mason
2012-11-29 19:26 ` Linus Torvalds
2012-11-29 19:48 ` Chris Mason
2012-11-29 19:55 ` Linus Torvalds
2012-11-29 20:10 ` Linus Torvalds
2012-11-29 20:52 ` Linus Torvalds
2012-11-29 21:29 ` Chris Mason
2012-11-29 22:16 ` Linus Torvalds
2012-11-29 22:36 ` Linus Torvalds
2012-11-30 1:16 ` Chris Mason
2012-11-30 2:13 ` Linus Torvalds
2012-11-30 2:27 ` Chris Mason
2012-11-30 2:49 ` Dave Chinner
2012-11-30 14:31 ` Chris Mason
2012-11-30 16:42 ` Linus Torvalds
2012-11-30 16:36 ` Christoph Hellwig
2012-11-30 22:40 ` Dave Chinner
2012-11-30 23:09 ` Christoph Hellwig
2012-11-29 19:50 ` Linus Torvalds
2012-11-28 19:50 ` [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow) Mikulas Patocka
2012-11-28 20:03 ` Linus Torvalds
2012-11-28 20:13 ` Linus Torvalds
2012-11-28 20:32 ` Linus Torvalds
2012-11-28 20:47 ` Linus Torvalds
2012-11-28 22:10 ` Mikulas Patocka
2012-11-28 21:29 ` Mikulas Patocka
2012-11-28 22:52 ` Linus Torvalds
2012-11-28 23:13 ` Linus Torvalds
2012-11-29 1:20 ` Mikulas Patocka
2012-11-29 0:38 ` Mikulas Patocka
2012-11-29 2:04 ` Linus Torvalds
2012-11-29 2:58 ` Linus Torvalds
2012-11-29 6:16 ` Linus Torvalds
2012-11-29 6:25 ` Al Viro
2012-11-29 6:30 ` Al Viro
2012-11-29 6:37 ` Linus Torvalds
2012-11-29 6:45 ` Al Viro
2012-11-29 10:57 ` Jeff Chua
2012-11-29 6:33 ` Linus Torvalds
2012-11-29 14:12 ` Chris Mason [this message]
2012-11-29 17:26 ` Chris Mason
2012-11-29 17:26 ` Linus Torvalds
2012-11-29 17:51 ` Chris Mason
2012-11-29 18:12 ` Linus Torvalds
2012-11-28 3:59 ` [PATCH 1/2] percpu-rwsem: use synchronize_sched_expedited Mikulas Patocka
2012-11-28 4:01 ` [PATCH 2/2] block_dev: don't take the write lock if block size doesn't change Mikulas Patocka
2012-11-28 14:24 ` Jeff Chua
2012-11-28 22:03 ` Mikulas Patocka
2012-11-28 14:19 ` [PATCH 1/2] percpu-rwsem: use synchronize_sched_expedited Jeff Chua
2012-11-30 0:06 ` Andrew Morton
2012-11-30 3:00 ` Mikulas Patocka
2012-11-30 13:42 ` Paul E. McKenney
2012-11-30 18:57 ` Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121129141249.GB30766@shiny \
--to=chris.mason@fusionio.com \
--cc=axboe@kernel.dk \
--cc=jack@suse.cz \
--cc=jeff.chua.linux@gmail.com \
--cc=laijs@cn.fujitsu.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mpatocka@redhat.com \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).