From mboxrd@z Thu Jan  1 00:00:00 1970
From: Junxiao Bi <junxiao.bi@oracle.com>
Date: Wed, 28 Apr 2021 09:00:06 -0700
Subject: [Cluster-devel] [PATCH 2/3] ocfs2: allow writing back pages out
	of inode size
In-Reply-To: <20210426220552.45413-2-junxiao.bi@oracle.com>
References: <20210426220552.45413-1-junxiao.bi@oracle.com>
	<20210426220552.45413-2-junxiao.bi@oracle.com>
Message-ID: <9aeafe69-0bf2-f4cb-431a-1cb4329a178c@oracle.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi Joseph,

Can you help review the first two patches?

Thanks,

Junxiao.

On 4/26/21 3:05 PM, Junxiao Bi wrote:
> When fallocate/truncate extend inode size, if the original isize is in
> the middle of last cluster, then the part from isize to the end of the
> cluster needs to be zeroed with buffer write, at that time isize is not
> yet updated to match the new size, if writeback is kicked in, it will
> invoke ocfs2_writepage()->block_write_full_page() where the pages out
> of inode size will be dropped. That will cause file corruption.
>
> Running the following command with qemu-image 4.2.1 can get a corrupted
> coverted image file easily.
>
>      qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
>               -O qcow2 -o compat=1.1 $qcow_image.conv
>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> ---
>   fs/ocfs2/aops.c | 19 ++++++++++++++++++-
>   1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index ad20403b383f..7a3e3d59f6a9 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -402,11 +402,28 @@ static void ocfs2_readahead(struct readahead_control *rac)
>    */
>   static int ocfs2_writepage(struct page *page, struct writeback_control *wbc)
>   {
> +	struct inode * const inode = page->mapping->host;
> +	loff_t i_size = i_size_read(inode);
> +	const pgoff_t end_index = i_size >> PAGE_SHIFT;
> +	unsigned int offset;
> +
>   	trace_ocfs2_writepage(
>   		(unsigned long long)OCFS2_I(page->mapping->host)->ip_blkno,
>   		page->index);
>   
> -	return block_write_full_page(page, ocfs2_get_block, wbc);
> +	/*
> +	 * The page straddles i_size.  It must be zeroed out on each and every
> +	 * writepage invocation because it may be mmapped.  "A file is mapped
> +	 * in multiples of the page size.  For a file that is not a multiple of
> +	 * the  page size, the remaining memory is zeroed when mapped, and
> +	 * writes to that region are not written out to the file."
> +	 */
> +	offset = i_size & (PAGE_SIZE-1);
> +	if (page->index == end_index && offset)
> +		zero_user_segment(page, offset, PAGE_SIZE);
> +
> +	return __block_write_full_page_eof(inode, page, ocfs2_get_block, wbc,
> +			end_buffer_async_write, true);
>   }
>   
>   /* Taken from ext3. We don't necessarily need the full blown