From: Nick Piggin <nickpiggin@yahoo.com.au>
To: David Chinner <dgc@sgi.com>
Cc: lkml <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 1 of 2] block_page_mkwrite() Implementation V2
Date: Mon, 19 Mar 2007 17:37:03 +1100
Message-ID: <45FE2F8F.6010603@yahoo.com.au>
In-Reply-To: <20070318233008.GA32597093@melbourne.sgi.com>

David Chinner wrote:
> Generic page_mkwrite functionality.
> 
> Filesystems that make use of the VM ->page_mkwrite() callout will generally use
> the same core code to implement it. There are several tricky truncate-related
> issues that we need to deal with here as we cannot take the i_mutex as we
> normally would for these paths.  These issues are not documented anywhere yet
> so block_page_mkwrite() seems like the best place to start.
> 
> Version 2:
> 
> - read inode size only once
> - more comments explaining implementation restrictions
> 
> Signed-Off-By: Dave Chinner <dgc@sgi.com>
> 
> ---
>  fs/buffer.c                 |   47 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/buffer_head.h |    2 +
>  2 files changed, 49 insertions(+)
> 
> Index: 2.6.x-xfs-new/fs/buffer.c
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/buffer.c	2007-03-17 10:55:32.291414968 +1100
> +++ 2.6.x-xfs-new/fs/buffer.c	2007-03-19 08:13:54.519909087 +1100
> @@ -2194,6 +2194,52 @@ int generic_commit_write(struct file *fi
>  	return 0;
>  }
>  
> +/*
> + * block_page_mkwrite() is not allowed to change the file size as it gets
> + * called from a page fault handler when a page is first dirtied. Hence we must
> + * be careful to check for EOF conditions here. We set the page up correctly
> + * for a written page which means we get ENOSPC checking when writing into
> + * holes and correct delalloc and unwritten extent mapping on filesystems that
> + * support these features.
> + *
> + * We are not allowed to take the i_mutex here so we have to play games to
> + * protect against truncate races as the page could now be beyond EOF.  Because
> + * vmtruncate() writes the inode size before removing pages, once we have the
> + * page lock we can determine safely if the page is beyond EOF. If it is not
> + * beyond EOF, then the page is guaranteed safe against truncation until we
> + * unlock the page.
> + */
> +int
> +block_page_mkwrite(struct vm_area_struct *vma, struct page *page,
> +		   get_block_t get_block)
> +{
> +	struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
> +	unsigned long end;
> +	loff_t size;
> +	int ret = -EINVAL;
> +
> +	lock_page(page);
> +	size = i_size_read(inode);
> +	if ((page->mapping != inode->i_mapping) ||
> +	    ((page->index << PAGE_CACHE_SHIFT) > size)) {
> +		/* page got truncated out from underneath us */
> +		goto out_unlock;
> +	}

I see your explanation above, but I still don't see why this can't
just use the conventional if (!page->mapping) check for truncation.
If the test happens to run after a concurrent truncate has decreased
i_size, the blocks will be truncated by that truncate afterwards
anyway.
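
The two checks disagree only in the window between truncate publishing the smaller i_size and removing the page from the mapping. A minimal userspace model of that window, assuming 4 KiB pages and the ordering described in the patch comment (size written first, page removed second). All names here (model_page, model_mapping, truncate_set_size() and so on) are hypothetical stand-ins, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12              /* assumed 4 KiB pages */

/* Hypothetical stand-ins for struct address_space and struct page. */
struct model_mapping { int unused; };

struct model_page {
	struct model_mapping *mapping;   /* NULL once truncate detaches the page */
	uint64_t index;                  /* page-cache index */
};

/* The patch's test: page detached from this inode, or starting past EOF. */
static bool patch_check_truncated(const struct model_page *p,
				  const struct model_mapping *m,
				  uint64_t i_size)
{
	return p->mapping != m || (p->index << MODEL_PAGE_SHIFT) > i_size;
}

/* The conventional test: only a detached page counts as truncated. */
static bool conventional_check_truncated(const struct model_page *p)
{
	return p->mapping == NULL;
}

/* Truncate step 1 (per the patch comment): publish the new size first. */
static void truncate_set_size(uint64_t *i_size, uint64_t new_size)
{
	assert(new_size <= *i_size);     /* this model only shrinks files */
	*i_size = new_size;
}

/* Truncate step 2: remove the page from the mapping. */
static void truncate_remove_page(struct model_page *p)
{
	p->mapping = NULL;
}
```

Stepping through it with a page at index 3 and i_size at 16384: both checks pass. After truncate_set_size(&i_size, 4096) only the patch's check fires. After truncate_remove_page() both fire. The argument above is that a fault landing in the middle state is harmless, because the same truncate goes on to remove the page and its blocks.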

> +
> +	/* page is wholly or partially inside EOF */
> +	if (((page->index + 1) << PAGE_CACHE_SHIFT) > size)
> +		end = size & ~PAGE_CACHE_MASK;
> +	else
> +		end = PAGE_CACHE_SIZE;
> +
> +	ret = block_prepare_write(page, 0, end, get_block);
> +	if (!ret)
> +		ret = block_commit_write(page, 0, end);
> +
> +out_unlock:
> +	unlock_page(page);
> +	return ret;
> +}
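
The end computation at the tail of the patch can be checked in isolation. A userspace sketch, assuming 4 KiB pages; SKETCH_PAGE_SHIFT, SKETCH_PAGE_SIZE, SKETCH_PAGE_MASK and mkwrite_end() are hypothetical names that mirror PAGE_CACHE_SHIFT, PAGE_CACHE_SIZE and PAGE_CACHE_MASK. The sketch widens the index to 64 bits before shifting, since on a 32-bit build an unsigned long index shifted left by 12 wraps for offsets of 4 GiB and above:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                         /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))  /* same shape as PAGE_CACHE_MASK */

/*
 * Number of bytes of page `index` that lie inside a file of `size` bytes,
 * mirroring the `end` computation in block_page_mkwrite().  Returns 0 when
 * the page starts past EOF, where the patch jumps to out_unlock instead.
 */
static unsigned long mkwrite_end(uint64_t index, uint64_t size)
{
	uint64_t start = index << SKETCH_PAGE_SHIFT;  /* byte offset of the page */
	unsigned long end;

	if (start > size)
		return 0;

	if (start + SKETCH_PAGE_SIZE > size)
		end = (unsigned long)(size & ~SKETCH_PAGE_MASK); /* EOF inside this page */
	else
		end = SKETCH_PAGE_SIZE;                       /* page wholly inside EOF */

	assert(end <= SKETCH_PAGE_SIZE);
	return end;
}
```

For a 10000-byte file, page 0 gets end == 4096, page 2 gets end == 1808 (10000 & 4095), and page 3 starts past EOF. A page that starts exactly at EOF (index 2 of an 8192-byte file) passes the > test and yields end == 0.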

-- 
SUSE Labs, Novell Inc.