From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: cmm@us.ibm.com, jack@suse.cz, linux-ext4@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification.
Date: Thu, 12 Jun 2008 09:36:43 +0530	[thread overview]
Message-ID: <20080612040643.GA5518@skywalker> (raw)
In-Reply-To: <20080611120749.d0c5a7de.akpm@linux-foundation.org>

On Wed, Jun 11, 2008 at 12:07:49PM -0700, Andrew Morton wrote:
> On Wed, 11 Jun 2008 20:38:45 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > On Thu, Jun 05, 2008 at 12:30:45PM -0700, Andrew Morton wrote:
> > > On Thu,  5 Jun 2008 22:35:12 +0530
> > > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > > 
> > > > We would like to get notified when we are doing a write on mmap
> > > > section.  The changes are needed to handle ENOSPC when writing to an
> > > > mmap section of files with holes.
> > > > 
> > > 
> > > Whoa.  You didn't copy anything like enough mailing lists for a change
> > > of this magnitude.  I added some.
> > > 
> > > This is a large change in behaviour!
> > > 
> > > a) applications will now get a synchronous SIGBUS when modifying a
> > >    page over an ENOSPC filesystem.  Whereas previously they could have
> > >    proceeded to completion and then detected the error via an fsync().
> > 
> > Or not detect the error at all if we don't call fsync(), right? Isn't a
> > synchronous SIGBUS the right behaviour?
> >
> 
> Not according to POSIX.  Or at least posix-several-years-ago, when this
> last was discussed.  The spec doesn't have much useful to say about any
> of this.
> 
> It's a significant change in the userspace interface.
> 
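To make the userspace side of this concrete: with a synchronous SIGBUS on the store, an application that wants to survive ENOSPC has to catch the signal at the faulting instruction. The sketch below is an illustrative userspace example, not part of the patch. Because ENOSPC is hard to reproduce on demand, it triggers SIGBUS the standard POSIX way instead, by storing into a mapped page that lies wholly beyond end-of-file. The helper names (store_byte, sigbus_demo) and the /tmp path template are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* Jump back to the sigsetjmp() point instead of returning to the faulting store. */
static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(fault_env, 1);
}

/* Store one byte through a shared mapping.
 * Returns 0 on success, -1 if the store raised SIGBUS. */
static int store_byte(volatile char *map, size_t off, char val)
{
	struct sigaction sa, old;
	int ret = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, &old);

	if (sigsetjmp(fault_env, 1) == 0)
		map[off] = val;         /* may fault */
	else
		ret = -1;               /* arrived here via siglongjmp() */

	sigaction(SIGBUS, &old, NULL);
	return ret;
}

/* Map two pages of a one-page file and store into each.
 * Returns the number of stores that faulted, or -1 on setup error. */
static int sigbus_demo(void)
{
	long pg = sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/mkwrite-demoXXXXXX";
	int fd = mkstemp(path);
	int faults = 0;

	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, pg) != 0) {
		close(fd);
		return -1;
	}
	char *map = mmap(NULL, 2 * (size_t)pg, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}
	if (store_byte(map, 0, 'a') != 0)           /* inside the file */
		faults++;
	if (store_byte(map, (size_t)pg, 'b') != 0)  /* beyond EOF: SIGBUS */
		faults++;
	munmap(map, 2 * (size_t)pg);
	close(fd);
	return faults;
}
```

sigbus_demo() should report exactly one faulting store. Every mmap writer that wants to handle ENOSPC gracefully would need this kind of handler under the proposed behaviour, which is part of why it counts as a userspace interface change.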
> > 
> > > 
> > >    It's going to take more than one skimpy little paragraph to
> > >    justify this, and to demonstrate that it is preferable, and to
> > >    convince us that nothing will break from this user-visible behaviour
> > >    change.
> > > 
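For contrast, here is a sketch of the "previous" behaviour described in (a): the store into a hole succeeds silently, and any deferred allocation failure can only surface later, from msync(MS_SYNC) or fsync(). This is an illustrative userspace example; the function names and the /tmp path template are hypothetical, and st_blocks is only used to show that the file starts out as a hole.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Fill 'len' bytes of 'fd' through a shared mapping, then flush.
 * Without ->page_mkwrite, an allocation error such as ENOSPC can only
 * be reported by the flush calls, never by the stores themselves.
 * Returns 0 on success or the errno of the first failing call. */
static int fill_hole_and_flush(int fd, size_t len)
{
	char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	int err = 0;

	if (map == MAP_FAILED)
		return errno;
	memset(map, 'x', len);                  /* stores into the hole */
	if (msync(map, len, MS_SYNC) != 0)
		err = errno;
	munmap(map, len);
	if (err == 0 && fsync(fd) != 0)
		err = errno;
	return err;
}

/* Create a one-page sparse file, fill it via mmap, and report the
 * allocated block counts before and after.  Returns 0 or an errno. */
static int sparse_demo(long long *blocks_before, long long *blocks_after)
{
	long pg = sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/hole-demoXXXXXX";
	int fd = mkstemp(path);
	struct stat st;
	int err;

	if (fd < 0)
		return errno;
	unlink(path);
	if (ftruncate(fd, pg) != 0) {           /* extend with a hole */
		err = errno;
		close(fd);
		return err;
	}
	fstat(fd, &st);
	*blocks_before = (long long)st.st_blocks;
	err = fill_hole_and_flush(fd, (size_t)pg);
	fstat(fd, &st);
	*blocks_after = (long long)st.st_blocks;
	close(fd);
	return err;
}
```

On filesystems that support holes, blocks_before is usually 0. An application that never calls msync() or fsync() simply never learns about a failure, which is the case raised in the reply above.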
> > > b) we're now doing fs operations (and some I/O) in the pagefault
> > >    code.  This has several implications:
> > > 
> > >    - performance changes
> > > 
> > >    - potential for deadlocks when a process takes the fault from
> > >      within a copy_to_user() in, say, mm/filemap.c
> > > 
> > >    - performing additional memory allocations within that
> > >      copy_to_user().  Possibility that these will reenter the
> > >      filesystem.
> > > 
> > > And that's just ext2.
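The copy_to_user() case in (b) is easy to reach from userspace: read() into a buffer that is itself a fresh shared, writable mapping of a file. The kernel's copy to that buffer is the first touch of each destination page, so the write fault, and with the patch the filesystem's ->page_mkwrite, runs inside the read path in mm/filemap.c. The sketch below is a hypothetical illustration of that call pattern; it relies on Linux's unified page cache for the final pread() check.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* pread() from 'src_fd' straight into a shared mapping of 'dst_fd'.
 * The write faults on the destination pages are taken while the kernel
 * copies the read data out to "user" memory.  Returns bytes read or -1. */
static ssize_t read_into_mapping(int src_fd, int dst_fd, size_t len)
{
	char *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, dst_fd, 0);
	ssize_t n;

	if (dst == MAP_FAILED)
		return -1;
	n = pread(src_fd, dst, len, 0);
	munmap(dst, len);
	return n;
}

/* Returns 1 if the data landed in the destination file, 0 if it did
 * not, -1 on setup error. */
static int read_into_mapping_demo(void)
{
	long pg = sysconf(_SC_PAGESIZE);
	char src_path[] = "/tmp/src-demoXXXXXX";
	char dst_path[] = "/tmp/dst-demoXXXXXX";
	int src = mkstemp(src_path);
	int dst = mkstemp(dst_path);
	char *buf = malloc((size_t)pg);
	char *check = malloc((size_t)pg);
	int result = -1;

	if (src >= 0 && dst >= 0 && buf && check && ftruncate(dst, pg) == 0) {
		memset(buf, 'r', (size_t)pg);
		if (pwrite(src, buf, (size_t)pg, 0) == pg &&
		    read_into_mapping(src, dst, (size_t)pg) == pg &&
		    pread(dst, check, (size_t)pg, 0) == pg)
			result = memcmp(buf, check, (size_t)pg) == 0;
	}
	if (src >= 0) {
		close(src);
		unlink(src_path);
	}
	if (dst >= 0) {
		close(dst);
		unlink(dst_path);
	}
	free(buf);
	free(check);
	return result;
}
```

Here the destination file starts as a hole, so on ext2 with the patch each fault would enter the filesystem while the read path is in the middle of its copy.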
> > > 
> > > For ext3 things are even more complex, because we have the
> > > journal_start/journal_end pair which is effectively another "lock" for
> > > ranking/deadlock purposes.  And now we're taking i_alloc_sem and
> > > lock_page and we're doing ->writepage() and its potential
> > > journal_start(), all potentially within the context of a
> > > copy_to_user().
> > 
> > One of the reasons we need this in ext3/ext4 is that we cannot
> > do block allocation in writepage with the recent locking changes.
> 
> Perhaps those recent locking changes were wrong.
> 
> > The locking changes reverse the locking order of journal_start
> > and page_lock. writepage is already called with page_lock held, so
> > we can't start the new transaction needed for block allocation.
> 
> ext3_write_begin() has journal_start() nesting inside the lock_page().
> 

All of those are reordered by the lock inversion changes as well, so
journal_start() now comes before lock_page() there too.
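Treating journal_start()/journal_stop() as a lock for ranking purposes, as suggested above, the inversion boils down to a rank rule. The sketch below is a simplified, hypothetical lockdep-style checker (the kernel's real lockdep is far more thorough); the journal is modelled as an ordinary ranked lock purely for illustration, and all names are made up.

```c
/* Hypothetical lock ranks: a lock may only be taken while every held
 * lock has a strictly lower rank.  This encodes the post-inversion
 * order, where journal_start() comes before lock_page(). */
enum lock_rank { RANK_JOURNAL = 1, RANK_PAGE = 2 };

#define MAX_HELD 8

struct lock_tracker {
	enum lock_rank held[MAX_HELD];
	int depth;
	int violations;
};

/* Record acquiring a lock of rank 'r'.  Holding a lock of equal or
 * higher rank is a ranking violation: a potential ABBA deadlock
 * against any path that follows the correct order. */
static void take(struct lock_tracker *t, enum lock_rank r)
{
	for (int i = 0; i < t->depth; i++) {
		if (t->held[i] >= r) {
			t->violations++;
			break;
		}
	}
	if (t->depth < MAX_HELD)
		t->held[t->depth++] = r;
}

/* Take the locks in 'order' (outermost first), release them all, and
 * return the number of ranking violations seen. */
static int check_order(const enum lock_rank *order, int n)
{
	struct lock_tracker t = { .depth = 0, .violations = 0 };

	for (int i = 0; i < n; i++)
		take(&t, order[i]);
	t.depth = 0;    /* release everything */
	return t.violations;
}
```

The old ext3_write_begin() order {RANK_PAGE, RANK_JOURNAL} is flagged and the inverted order {RANK_JOURNAL, RANK_PAGE} is clean. A ->writepage() that is entered with the page locked and then calls journal_start() for block allocation is the flagged sequence again, which is why allocation has to move out of writepage.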

> > But if we agree that we should not do block allocation in page_mkwrite
> > we need to add writepages and allocate blocks in writepages.
> 
> I'm not sure what writepages has to do with pagefaults?
> 

The idea is to add ext3/ext4 writepages implementations. In writepages
we start a transaction, then iterate over the pages, taking the page
lock on each one and doing block allocation for its unmapped buffers.
With that change we should be able to avoid block allocation in the
page_mkwrite path, although we may still want to do block reservation
there.

Something like:

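ext4_writepages()
{
	journal_start();
	for_each_page() {
		lock_page();
		if (bh_unmapped(...))
			block_alloc();
		unlock_page();
	}
	journal_stop();
}

ext4_writepage()
{
	for_each_buffer_head() {
		if (bh_unmapped()) {
			/* no block yet: leave the page for writepages */
			redirty_page();
			unlock_page();
			return;
		}
	}
	/* all buffers mapped: write the page out as usual */
}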
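The division of labour in the pseudocode, together with the block reservation idea for page_mkwrite, can be exercised as a small userspace model. This is a hypothetical simulation, not ext3/ext4 code: the struct fields, BUFS_PER_PAGE, and the model_* names are all made up, the journal is reduced to a transaction counter, page locking appears only as comments, and the model assumes every dirty page was dirtied through model_page_mkwrite().

```c
#include <errno.h>
#include <stdbool.h>

#define BUFS_PER_PAGE 4   /* assumption: e.g. 1 KiB blocks in a 4 KiB page */

struct model_page {
	bool dirty;
	bool mapped[BUFS_PER_PAGE];  /* buffer already has an on-disk block */
	int reserved_blocks;         /* blocks reserved for this page at fault time */
};

struct model_fs {
	long free_blocks;   /* blocks not yet allocated */
	long reserved;      /* free blocks promised to faulted pages */
	int transactions;   /* journal_start()/journal_stop() pairs so far */
};

static int unmapped_count(const struct model_page *p)
{
	int n = 0;

	for (int b = 0; b < BUFS_PER_PAGE; b++)
		if (!p->mapped[b])
			n++;
	return n;
}

/* page_mkwrite: reserve, but do not allocate, blocks for the page's
 * holes.  -ENOSPC here is what the faulting task would see as SIGBUS. */
static int model_page_mkwrite(struct model_fs *fs, struct model_page *p)
{
	int need = unmapped_count(p) - p->reserved_blocks;

	if (need > 0) {
		if (fs->free_blocks - fs->reserved < need)
			return -ENOSPC;
		fs->reserved += need;
		p->reserved_blocks += need;
	}
	p->dirty = true;
	return 0;
}

/* writepage: never allocates.  A page with an unmapped buffer is
 * redirtied and left for writepages.  Returns true if written. */
static bool model_writepage(struct model_page *p)
{
	if (unmapped_count(p) > 0) {
		p->dirty = true;        /* redirty_page(); unlock_page() */
		return false;
	}
	p->dirty = false;           /* all buffers mapped: write it out */
	return true;
}

/* writepages: one transaction around the batch; each dirty page gets
 * its missing blocks, paid for by its earlier reservation. */
static void model_writepages(struct model_fs *fs, struct model_page *pages, int n)
{
	fs->transactions++;                     /* journal_start() */
	for (int i = 0; i < n; i++) {
		struct model_page *p = &pages[i];

		/* lock_page() */
		if (p->dirty) {
			for (int b = 0; b < BUFS_PER_PAGE; b++) {
				if (!p->mapped[b]) {
					p->mapped[b] = true;    /* block_alloc() */
					fs->free_blocks--;
				}
			}
			fs->reserved -= p->reserved_blocks;
			p->reserved_blocks = 0;
		}
		/* unlock_page() */
	}
	/* journal_stop() */
}
```

In a run with six free blocks and two pages of four holes each, the first fault reserves four blocks and the second fails with -ENOSPC at fault time. writepage declines the first page until writepages has allocated its blocks inside a single transaction, after which writepage succeeds.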

