Re: ext4_page_mkwrite and delalloc

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Mingming Cao <cmm@us.ibm.com>
Cc: Jan Kara <jack@suse.cz>, linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: ext4_page_mkwrite and delalloc
Date: Fri, 13 Jun 2008 08:50:06 +0530
Message-ID: <20080613032006.GC12892@skywalker>
In-Reply-To: <1213304446.3698.9.camel@localhost.localdomain>

On Thu, Jun 12, 2008 at 02:00:46PM -0700, Mingming Cao wrote:
> On Thu, 2008-06-12 at 23:44 +0530, Aneesh Kumar K.V wrote:
> > Hi,
> > 
> > With delalloc we should not do writepage in ext4_page_mkwrite. The idea
> > with delalloc is to delay the block allocation and make sure we allocate
> > chunks of blocks together at writepages. So I guess we should update
> > ext4_page_mkwrite to use write_begin and write_end instead of writepage.
> 
> I agree, with delayed allocation page_mkwrite is much simpler: just do
> block reservation to prevent ENOSPC.
> 
> > Taking i_alloc_sem should protect against parallel truncate and the page
> > lock should protect against parallel write_begin/write_end.
> > 
> > How about the patch below ?
> > 
> 
> Do we plan to support page_mkwrite for non-delalloc? The following patch
> seems to suggest that we only do page_mkwrite with delalloc.

Yes, it is needed for non-delalloc as well. The primary requirement comes
from the lock inversion patches: with those patches we don't do block
allocation in writepage.
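
To make the point concrete, here is a minimal userspace sketch, not kernel
code. All the toy_* names and the 4-buffers-per-page layout are made up
for illustration. It models a writepage that never allocates, so any hole
has to be filled earlier, at page_mkwrite time:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BUFFERS_PER_PAGE 4	/* hypothetical: 4 blocks per page */

/* Toy stand-in for a buffer_head: only tracks whether a block is mapped. */
struct toy_bh {
	bool mapped;
};

struct toy_page {
	struct toy_bh bh[TOY_BUFFERS_PER_PAGE];
};

/* Returns true if any buffer of the page is still unmapped (a hole). */
static bool toy_page_has_hole(const struct toy_page *page)
{
	for (size_t i = 0; i < TOY_BUFFERS_PER_PAGE; i++)
		if (!page->bh[i].mapped)
			return true;
	return false;
}

/* Toy page_mkwrite: map every hole; returns the number of blocks mapped. */
static int toy_page_mkwrite(struct toy_page *page)
{
	int allocated = 0;

	for (size_t i = 0; i < TOY_BUFFERS_PER_PAGE; i++) {
		if (!page->bh[i].mapped) {
			page->bh[i].mapped = true;
			allocated++;
		}
	}
	return allocated;
}

/* Toy writepage that never allocates: returns -1 if it meets a hole. */
static int toy_writepage(const struct toy_page *page)
{
	return toy_page_has_hole(page) ? -1 : 0;
}

/* Usage: an mmap write into a page with two holes. */
static void toy_mmap_write_demo(void)
{
	struct toy_page page = { .bh = { {true}, {false}, {true}, {false} } };

	assert(toy_writepage(&page) == -1);	/* hole, cannot write */
	assert(toy_page_mkwrite(&page) == 2);	/* fault path fills it */
	assert(toy_writepage(&page) == 0);	/* now fully mapped */
}
```

The real code walks the buffer heads with walk_page_buffers() and
ext4_bh_unmapped(); the loop above only mimics that check.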


> 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index cac132b..7f162cc 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -3543,18 +3543,6 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> >  	return err;
> >  }
> > 
> > -static int ext4_bh_prepare_fill(handle_t *handle, struct buffer_head *bh)
> > -{
> > -	if (!buffer_mapped(bh)) {
> > -		/*
> > -		 * Mark buffer as dirty so that
> > -		 * block_write_full_page() writes it
> > -		 */
> > -		set_buffer_dirty(bh);
> > -	}
> > -	return 0;
> > -}
> > -
> >  static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh)
> >  {
> >  	return !buffer_mapped(bh);
> > @@ -3596,24 +3584,22 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> >  		if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
> >  				       ext4_bh_unmapped))
> >  			goto out_unlock;
> > -		/*
> > -		 * Now mark all the  buffer head dirty so
> > -		 * that writepage can write it
> > -		 */
> > -		walk_page_buffers(NULL, page_buffers(page), 0, len,
> > -					NULL, ext4_bh_prepare_fill);
> >  	}
> >  	/*
> > -	 * OK, we need to fill the hole... Lock the page and do writepage.
> > -	 * We can't do write_begin and write_end here because we don't
> > -	 * have inode_mutex and that allow parallel write_begin, write_end call.
> > +	 * OK, we need to fill the hole... Lock the page and do write_begin
> > +	 * write_end. We are not holding inode.i__mutex here. That allow
> > +	 * parallel write_begin, write_end call.
> >  	 * (lock_page prevent this from happening on the same page though)
> >  	 */
> > -	lock_page(page);
> > -	wbc.range_start = page_offset(page);
> > -	wbc.range_end = page_offset(page) + len;
> > -	ret = mapping->a_ops->writepage(page, &wbc);
> > -	/* writepage unlocks the page */
> > +	ret = mapping->a_ops->write_begin(file, mapping, page_offset(page),
> > +			len, AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
> 
> What is this AOP_FLAG_UNINTERRUPTIBLE flag? Also, shouldn't we test
> whether delalloc is enabled?
> 

Since we are not doing any real copy here, I guess we can say that
we won't do a short write. That is what the flag means:

#define AOP_FLAG_UNINTERRUPTIBLE        0x0001 /* will not do a short write */
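
As a rough userspace illustration of that contract (toy_honors_flag is a
hypothetical helper, not a kernel function): a caller that passes the flag
promises that the number of bytes it reports as copied equals the length
it asked for, which is why the patch passes len for both arguments of
write_end.

```c
#include <assert.h>

#define AOP_FLAG_UNINTERRUPTIBLE 0x0001 /* will not do a short write */

/*
 * Hypothetical check: returns 1 if a write of `len` bytes that ended up
 * copying `copied` bytes is consistent with `flags`, 0 otherwise.
 */
static int toy_honors_flag(unsigned int flags, unsigned int len,
			   unsigned int copied)
{
	if (flags & AOP_FLAG_UNINTERRUPTIBLE)
		return copied == len;	/* short writes are not allowed */
	return copied <= len;		/* short writes are fine */
}

/* Usage: page_mkwrite copies nothing, so it reports copied == len. */
static void toy_flag_demo(void)
{
	unsigned int len = 4096;

	assert(toy_honors_flag(AOP_FLAG_UNINTERRUPTIBLE, len, len));
	assert(!toy_honors_flag(AOP_FLAG_UNINTERRUPTIBLE, len, 100));
}
```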

> > +	if (ret < 0)
> > +		goto out_unlock;
> > +	ret = mapping->a_ops->write_end(file, mapping, page_offset(page),
> > +			len, len, page, NULL);
> 
> I am still puzzled why we need to mark the page dirty in write_end here.
> I thought doing only block reservation in write_begin would be enough; we
> haven't written anything yet...


The reason is to get the ordered and journaled mode behavior correct.
We need to ensure that the metadata allocated in write_begin gets
committed in the right order. To do that, we need to add the buffer_heads
corresponding to the data (page) to the right list in the journal.
write_end mostly does that.
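
A toy userspace model of the ordering, assuming a much simplified journal
(the toy_* names, the fixed-size list, and the TOY_METADATA marker are all
invented for this sketch): the write_end step files the data buffer on the
transaction, and an ordered-mode commit flushes the filed data before the
metadata.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DATA 8
#define TOY_METADATA (-1)	/* marker for the metadata commit in the log */

struct toy_txn {
	int data[TOY_MAX_DATA];		/* data buffers filed on this txn */
	size_t ndata;
	int log[TOY_MAX_DATA + 1];	/* order in which things reach disk */
	size_t nlog;
};

/* Toy write_end step: put a data buffer on the transaction's list. */
static int toy_file_data(struct toy_txn *t, int buf_id)
{
	if (t->ndata == TOY_MAX_DATA)
		return -1;
	t->data[t->ndata++] = buf_id;
	return 0;
}

/* Toy ordered-mode commit: all filed data first, metadata last. */
static void toy_commit(struct toy_txn *t)
{
	t->nlog = 0;
	for (size_t i = 0; i < t->ndata; i++)
		t->log[t->nlog++] = t->data[i];
	t->log[t->nlog++] = TOY_METADATA;
}

/* Usage: two data buffers filed by write_end, then a commit. */
static void toy_ordered_demo(void)
{
	struct toy_txn t = { 0 };

	assert(toy_file_data(&t, 7) == 0);
	assert(toy_file_data(&t, 9) == 0);
	toy_commit(&t);
	assert(t.log[t.nlog - 1] == TOY_METADATA);	/* metadata is last */
}
```

If write_end were skipped, the data buffer would never be filed, and in
this model the metadata could be committed without the data it points to.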

-aneesh
