From: Steven Whitehouse <swhiteho@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] GFS2: directly write blocks past i_size
Date: Thu, 17 Mar 2011 09:26:08 +0000
Message-ID: <1300353968.2596.5.camel@dolmen>
In-Reply-To: <20110317050812.GO23657@ether.msp.redhat.com>
Hi,
On Thu, 2011-03-17 at 00:08 -0500, Benjamin Marzinski wrote:
> GFS2 was relying on the writepage code to write out the zeroed data for
> fallocate. However, with FALLOC_FL_KEEP_SIZE set, this may be past i_size.
> If it is, it will be ignored. To work around this, gfs2 now calls
> write_dirty_buffer directly on the buffer_heads when FALLOC_FL_KEEP_SIZE
> is set, and it's writing past i_size.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> fs/gfs2/file.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 52 insertions(+), 10 deletions(-)
>
Generally looks good, but a few minor points....
> Index: gfs2-2.6-nmw/fs/gfs2/file.c
> ===================================================================
> --- gfs2-2.6-nmw.orig/fs/gfs2/file.c
> +++ gfs2-2.6-nmw/fs/gfs2/file.c
> @@ -617,18 +617,55 @@ static ssize_t gfs2_file_aio_write(struc
> return generic_file_aio_write(iocb, iov, nr_segs, pos);
> }
>
> -static void empty_write_end(struct page *page, unsigned from,
> - unsigned to)
> +static int empty_write_end(struct page *page, unsigned from,
> + unsigned to, int mode)
> {
> - struct gfs2_inode *ip = GFS2_I(page->mapping->host);
> + struct inode *inode = page->mapping->host;
> + struct gfs2_inode *ip = GFS2_I(inode);
> + struct buffer_head *bh;
> + int waiting = 0;
> + unsigned offset, blksize = 1 << inode->i_blkbits;
> + loff_t i_size = i_size_read(inode);
> + pgoff_t end_index = i_size >> PAGE_CACHE_SHIFT;
>
> zero_user(page, from, to-from);
> mark_page_accessed(page);
>
> - if (!gfs2_is_writeback(ip))
> - gfs2_page_add_databufs(ip, page, from, to);
> + if (page->index < end_index || !(mode & FALLOC_FL_KEEP_SIZE)) {
> + if (!gfs2_is_writeback(ip))
> + gfs2_page_add_databufs(ip, page, from, to);
>
> - block_commit_write(page, from, to);
> + block_commit_write(page, from, to);
> + return 0;
> + }
> +
> +second_pass:
> + offset = 0;
> + bh = page_buffers(page);
> + while (offset < from) {
> + offset += blksize;
> + bh = bh->b_this_page;
> + }
> + while (offset < to) {
> + if (waiting){
> + wait_on_buffer(bh);
> + if (!buffer_uptodate(bh))
> + return -EIO;
> + }
> + else {
The else should start on the same line as the closing brace above
("} else {") to make it more obvious.
> + set_buffer_uptodate(bh);
> + mark_buffer_dirty(bh);
> + clear_buffer_new(bh);
> + write_dirty_buffer(bh, WRITE);
Should this be WRITE_SYNC or WRITE_SYNC_PLUG, I wonder?
> + }
> + offset += blksize;
> + bh = bh->b_this_page;
> + }
> + if (!waiting) {
> + waiting = 1;
> + goto second_pass;
> + }
I think the code might be a bit cleaner if it were just written as two
loops, one after the other, since most of the loop content seems to
differ according to whether "waiting" is set or not.
Otherwise I think this is a good solution,
Steve.
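As a rough illustration of the two-loop shape suggested above, here is a
user-space sketch. It is not the GFS2 code: the cut-down struct buffer_head
and the helpers stub_submit_zeroed_buffer() and stub_wait_on_buffer() are
hypothetical stand-ins for the kernel's set_buffer_uptodate() /
mark_buffer_dirty() / clear_buffer_new() / write_dirty_buffer() sequence and
for wait_on_buffer(). Only the control flow (submit everything first, then
wait on everything) is meant to carry over.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Cut-down stand-in for the kernel's struct buffer_head (assumption:
 * only the fields this sketch needs, modelled as plain ints). */
struct buffer_head {
	struct buffer_head *b_this_page; /* circular list of the page's buffers */
	int uptodate;
	int dirty;
	int submitted;  /* set by the stub submit helper */
	int io_failed;  /* test hook: pretend the write failed */
};

/* Hypothetical stub for the "mark and submit" half of the original loop. */
static void stub_submit_zeroed_buffer(struct buffer_head *bh)
{
	bh->uptodate = 1;
	bh->dirty = 1;
	bh->submitted = 1;
}

/* Hypothetical stub for wait_on_buffer(): completion clears dirty,
 * and a simulated I/O error clears uptodate. */
static void stub_wait_on_buffer(struct buffer_head *bh)
{
	bh->dirty = 0;
	if (bh->io_failed)
		bh->uptodate = 0;
}

/* Submit every buffer covering [from, to), then wait for all of them.
 * Returns 0 on success or -EIO if any buffer is not uptodate afterwards. */
static int write_range_two_loops(struct buffer_head *head, unsigned from,
				 unsigned to, unsigned blksize)
{
	struct buffer_head *start = head, *bh;
	unsigned offset = 0, start_off;

	/* Skip the buffers that lie before 'from'. */
	while (offset < from) {
		offset += blksize;
		start = start->b_this_page;
	}
	start_off = offset;

	/* First loop: queue all the writes so they can proceed together. */
	for (bh = start, offset = start_off; offset < to;
	     offset += blksize, bh = bh->b_this_page)
		stub_submit_zeroed_buffer(bh);

	/* Second loop: wait for each write and check the result. */
	for (bh = start, offset = start_off; offset < to;
	     offset += blksize, bh = bh->b_this_page) {
		stub_wait_on_buffer(bh);
		if (!bh->uptodate)
			return -EIO;
	}
	return 0;
}
```

Splitting the passes this way removes the "waiting" flag and the goto, at
the cost of repeating the loop header once.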
> + return 0;
> }
>
> static int needs_empty_write(sector_t block, struct inode *inode)
> @@ -643,7 +680,8 @@ static int needs_empty_write(sector_t bl
> return !buffer_mapped(&bh_map);
> }
>
> -static int write_empty_blocks(struct page *page, unsigned from, unsigned to)
> +static int write_empty_blocks(struct page *page, unsigned from, unsigned to,
> + int mode)
> {
> struct inode *inode = page->mapping->host;
> unsigned start, end, next, blksize;
> @@ -668,7 +706,9 @@ static int write_empty_blocks(struct pag
> gfs2_block_map);
> if (unlikely(ret))
> return ret;
> - empty_write_end(page, start, end);
> + ret = empty_write_end(page, start, end, mode);
> + if (unlikely(ret))
> + return ret;
> end = 0;
> }
> start = next;
> @@ -682,7 +722,9 @@ static int write_empty_blocks(struct pag
> ret = __block_write_begin(page, start, end - start, gfs2_block_map);
> if (unlikely(ret))
> return ret;
> - empty_write_end(page, start, end);
> + ret = empty_write_end(page, start, end, mode);
> + if (unlikely(ret))
> + return ret;
> }
>
> return 0;
> @@ -731,7 +773,7 @@ static int fallocate_chunk(struct inode
>
> if (curr == end)
> to = end_offset;
> - error = write_empty_blocks(page, from, to);
> + error = write_empty_blocks(page, from, to, mode);
> if (!error && offset + to > inode->i_size &&
> !(mode & FALLOC_FL_KEEP_SIZE)) {
> i_size_write(inode, offset + to);
>