From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Mingming Cao <cmm@us.ibm.com>
Cc: Jan Kara <jack@suse.cz>, linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: ext4_page_mkwrite and delalloc
Date: Fri, 13 Jun 2008 08:50:06 +0530
Message-ID: <20080613032006.GC12892@skywalker>
In-Reply-To: <1213304446.3698.9.camel@localhost.localdomain>
On Thu, Jun 12, 2008 at 02:00:46PM -0700, Mingming Cao wrote:
> On Thu, 2008-06-12 at 23:44 +0530, Aneesh Kumar K.V wrote:
> > Hi,
> >
> > With delalloc we should not do writepage in ext4_page_mkwrite. The idea
> > with delalloc is to delay the block allocation and make sure we allocate
> > chunks of blocks together at writepages. So I guess we should update
> > ext4_page_mkwrite to use write_begin and write_end instead of writepage.
>
> I agree: with delayed allocation, page_mkwrite is much simpler. It just
> has to do block reservation to prevent ENOSPC.
>
> > Taking i_alloc_sem should protect against parallel truncate and the page
> > lock should protect against parallel write_begin/write_end.
> >
> > How about the patch below ?
> >
>
> Do we plan to support page_mkwrite for non-delalloc? The following patch
> seems to suggest that we only do page_mkwrite with delalloc.
Yes, it is needed for non-delalloc as well. The primary requirement comes
from the lock inversion patches: with those patches applied, we don't do
block allocation in writepage.
>
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index cac132b..7f162cc 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -3543,18 +3543,6 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> > return err;
> > }
> >
> > -static int ext4_bh_prepare_fill(handle_t *handle, struct buffer_head *bh)
> > -{
> > - if (!buffer_mapped(bh)) {
> > - /*
> > - * Mark buffer as dirty so that
> > - * block_write_full_page() writes it
> > - */
> > - set_buffer_dirty(bh);
> > - }
> > - return 0;
> > -}
> > -
> > static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh)
> > {
> > return !buffer_mapped(bh);
> > @@ -3596,24 +3584,22 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> > if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
> > ext4_bh_unmapped))
> > goto out_unlock;
> > - /*
> > - * Now mark all the buffer head dirty so
> > - * that writepage can write it
> > - */
> > - walk_page_buffers(NULL, page_buffers(page), 0, len,
> > - NULL, ext4_bh_prepare_fill);
> > }
> > /*
> > - * OK, we need to fill the hole... Lock the page and do writepage.
> > - * We can't do write_begin and write_end here because we don't
> > - * have inode_mutex and that allow parallel write_begin, write_end call.
> > + * OK, we need to fill the hole... Lock the page and do write_begin
> > + * write_end. We are not holding inode.i__mutex here. That allow
> > + * parallel write_begin, write_end call.
> > * (lock_page prevent this from happening on the same page though)
> > */
> > - lock_page(page);
> > - wbc.range_start = page_offset(page);
> > - wbc.range_end = page_offset(page) + len;
> > - ret = mapping->a_ops->writepage(page, &wbc);
> > - /* writepage unlocks the page */
> > + ret = mapping->a_ops->write_begin(file, mapping, page_offset(page),
> > + len, AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
>
> What is this AOP_FLAG_UNINTERRUPTIBLE flag? Also, shouldn't we test
> whether delalloc is enabled?
>
Since we are not doing any real copy here, I guess we can say that we
won't do a short write. That is what the flag means:
#define AOP_FLAG_UNINTERRUPTIBLE 0x0001 /* will not do a short write */
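To make the "no short write" promise concrete, here is a minimal
user-space sketch. It is not kernel code: sketch_write_end_copied() is a
hypothetical helper invented for illustration, and only the flag value is
taken from the define above. It models the contract only: a caller that
passes the flag must report copied == len, which is what the patch does
by passing len twice to write_end.

```c
#include <assert.h>
#include <stddef.h>

/* Flag value as quoted in this mail. */
#define AOP_FLAG_UNINTERRUPTIBLE 0x0001 /* will not do a short write */

/*
 * Hypothetical model of the caller-side contract. It returns the number
 * of bytes the caller reports as copied between write_begin and
 * write_end. With the flag set, anything short of len is a bug in the
 * caller, so the model asserts on it.
 */
static size_t sketch_write_end_copied(size_t len, size_t copied,
				      unsigned int flags)
{
	assert(copied <= len);
	if (flags & AOP_FLAG_UNINTERRUPTIBLE)
		assert(copied == len); /* caller promised a full write */
	return copied;
}
```

In the page_mkwrite case nothing is copied at all, so the promise holds
trivially.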
> > + if (ret < 0)
> > + goto out_unlock;
> > + ret = mapping->a_ops->write_end(file, mapping, page_offset(page),
> > + len, len, page, NULL);
>
> I am still puzzled about why we need to mark the page dirty in write_end
> here. I thought doing only the block reservation in write_begin would be
> enough; we haven't written anything yet...
The reason is to get the ordered and journaled mode behavior correct.
We need to ensure that the metadata allocated in write_begin gets
committed in the right order. We need to add the buffer_heads
corresponding to the data (page) to the right list in the journal.
write_end mostly does that.
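
The split of work described above can be sketched as a small user-space
model. This is only an illustration under stated assumptions, not the
ext4 implementation: every model_* name, the BUFS_PER_PAGE value, and the
free_blocks counter are hypothetical. The begin step maps the holes (and
is the only place that can fail with ENOSPC); the end step dirties the
buffers and files them on a journal list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define BUFS_PER_PAGE 4 /* assumption: 1 KiB blocks in a 4 KiB page */

/* Hypothetical stand-in for a buffer_head. */
struct model_bh {
	bool mapped;          /* backed by an allocated or reserved block */
	bool dirty;
	bool on_journal_list; /* filed with the running transaction */
};

struct model_page {
	struct model_bh bh[BUFS_PER_PAGE];
};

struct model_fs {
	int free_blocks;
};

/* Begin step: back every hole in the page, or fail without side effects. */
static int model_write_begin(struct model_fs *fs, struct model_page *pg)
{
	int needed = 0;
	size_t i;

	for (i = 0; i < BUFS_PER_PAGE; i++)
		if (!pg->bh[i].mapped)
			needed++;
	if (needed > fs->free_blocks)
		return -ENOSPC;
	fs->free_blocks -= needed;
	for (i = 0; i < BUFS_PER_PAGE; i++)
		pg->bh[i].mapped = true;
	return 0;
}

/* End step: dirty the buffers and attach them to the journal list. */
static void model_write_end(struct model_page *pg)
{
	size_t i;

	for (i = 0; i < BUFS_PER_PAGE; i++) {
		assert(pg->bh[i].mapped); /* the begin step must have run */
		pg->bh[i].dirty = true;
		pg->bh[i].on_journal_list = true;
	}
}

/* The two steps in the order the patch calls them. */
static int model_page_mkwrite(struct model_fs *fs, struct model_page *pg)
{
	int ret = model_write_begin(fs, pg);

	if (ret < 0)
		return ret;
	model_write_end(pg);
	return 0;
}
```

If the begin step fails, the page is left untouched, which matches the
"goto out_unlock" path in the patch. Skipping the end step would leave
the newly mapped buffers off the journal list, which is the ordering
problem mentioned above.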
-aneesh
Thread overview: 8+ messages
2008-06-12 18:14 ext4_page_mkwrite and delalloc Aneesh Kumar K.V
2008-06-12 21:00 ` Mingming Cao
2008-06-13 3:20 ` Aneesh Kumar K.V [this message]
2008-06-13 22:35 ` Mingming
2008-06-14 6:43 ` Aneesh Kumar K.V
2008-06-16 14:11 ` Jan Kara
2008-06-16 16:09 ` Aneesh Kumar K.V
2008-06-16 17:34 ` Jan Kara