linux-fsdevel.vger.kernel.org archive mirror
From: Chris Mason <chris.mason@fusionio.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris Mason <clmason@fusionio.com>,
	Mikulas Patocka <mpatocka@redhat.com>,
	Jens Axboe <axboe@kernel.dk>,
	Jeff Chua <jeff.chua.linux@gmail.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>, Jan Kara <jack@suse.cz>,
	lkml <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
Date: Thu, 29 Nov 2012 12:51:02 -0500
Message-ID: <20121129175102.GA3490@shiny>
In-Reply-To: <CA+55aFzYzLKGfb3vw7A4y1NU2XB4DFmVp4UnYqPJafufaNqhEg@mail.gmail.com>

On Thu, Nov 29, 2012 at 10:26:56AM -0700, Linus Torvalds wrote:
> On Thu, Nov 29, 2012 at 6:12 AM, Chris Mason <chris.mason@fusionio.com> wrote:
> >
> > Jumping in based on Linus's original patch, which is doing something like
> > this:
> >
> > set_blocksize() {
> >         block new calls to writepage, prepare/commit_write
> >         set the block size
> >         unblock
> >
> >         < --- can race in here and find bad buffers --->
> >
> >         sync_blockdev()
> >         kill_bdev()
> >
> >         < --- now we're safe --- >
> > }
> >
> > We could add a second semaphore and a page_mkwrite call:
> 
> Yeah, we could be fancy, but the more I think about it, the less I can
> say I care.
> 
> After all, the only things that do the whole set_blocksize() thing should be:
> 
>  - filesystems at mount-time
> 
>  - things like loop/md at block device init time.
> 
> and quite frankly, if there are any *concurrent* writes with either of
> the above, I really *really* don't think we should care. I mean,
> seriously.
> 
> So the _only_ real reason for the locking in the first place is to
> make sure of internal kernel consistency. We do not want to oops or
> corrupt memory if people do odd things. But we really *really* don't
> care if somebody writes to a partition at the same time as somebody
> else mounts it. Not enough to do extra work to please insane people.
> 
> It's also worth noting that NONE OF THIS HAS EVER WORKED IN THE PAST.
> The whole sequence always used to be unlocked. The locking is entirely
> new. There are certainly not any legacy users that can possibly rely on
> "I did writes at the same time as the mount with no serialization, and
> it worked". It never has worked.
> 
> So I think this is a case of "perfect is the enemy of good".
> Especially since I think that with the fs/buffer.c approach, we don't
> actually need any locking at all at higher levels.

The bigger question is whether we have users that expect to be able to set
the blocksize after mmapping the block device (no writes required).  I
actually feel a little bad for taking up internet bandwidth asking, but
it is a change in behaviour.
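
For concreteness, here is a minimal userspace sketch of that pattern. It
assumes the BLKBSZSET ioctl from <linux/fs.h> as the way a program asks the
kernel to run set_blocksize() on a device it has open; the helper name
mmap_then_set_blocksize() is hypothetical. It maps the first page read-only
and then requests a new block size while the mapping is still live.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>     /* BLKBSZSET */
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: mmap the start of 'path', then ask the kernel to
 * change the block size while the mapping exists (no writes are done).
 * Returns 0 on success or a negative errno value.
 */
static int mmap_then_set_blocksize(const char *path, int new_size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        int err = -errno;
        close(fd);
        return err;
    }

    /* The kernel reads the new size as an int from this pointer. */
    int ret = 0;
    if (ioctl(fd, BLKBSZSET, &new_size) < 0)
        ret = -errno;

    munmap(map, len);
    close(fd);
    return ret;
}
```

On a real block device the ioctl needs CAP_SYS_ADMIN; against a regular
file it simply fails, typically with ENOTTY, because the file is not a
block device.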

Regardless, changing mmap to work around a race in the page cache is just
backwards, and even with the current 3.7 code we can still trigger the race
via fadvise -> readpage in the middle of set_blocksize().
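
A minimal, single-threaded model of the window marked in the quoted
set_blocksize() sequence may make the race easier to see. Everything in it
is hypothetical (struct bdev_model, set_blocksize_model() and window_fn are
illustrative names, not kernel APIs): buf_size stands for the size of
buffers already attached to a cached page, and the callback stands for any
access that is not blocked, such as the fadvise -> readpage path, landing
between the size change and kill_bdev().

```c
#include <stddef.h>

/* Hypothetical model, not kernel code. */
struct bdev_model {
    int block_size;  /* current logical block size */
    int buf_size;    /* size of cached buffers, 0 if none are attached */
};

/* An access that runs in the race window; returns 1 if it saw sane buffers. */
typedef int (*window_fn)(const struct bdev_model *b);

/* Returns 1 if cached buffers agree with the current block size. */
static int buffers_consistent(const struct bdev_model *b)
{
    return b->buf_size == 0 || b->buf_size == b->block_size;
}

/*
 * Models: block writers; set size; unblock; <window>; sync; kill_bdev.
 * Returns 1 if the access made in the window saw mismatched buffers.
 */
static int set_blocksize_model(struct bdev_model *b, int size,
                               window_fn in_window)
{
    int saw_bad = 0;

    /* block new writepage / prepare_write / commit_write (not modelled) */
    b->block_size = size;
    /* unblock */

    /* The race: old-size buffers are still cached under the new size. */
    if (in_window != NULL && !in_window(b))
        saw_bad = 1;

    /* sync_blockdev() would write back dirty data here (not modelled) */
    b->buf_size = 0;  /* kill_bdev(): drop cached pages and their buffers */
    return saw_bad;
}
```

Starting from 512-byte buffers and switching to 4096, an access in the
window sees 512-byte buffers under a 4096-byte block size, the kind of
mismatch the fs/buffer.c WARN_ONs are there to report; once kill_bdev()
has dropped the cache, the state is consistent again.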

Obviously nobody does any of this; otherwise we'd have tons of reports
from those handy WARN_ONs in fs/buffer.c.  So it's definitely hard to be
worried one way or another.

-chris
