From mboxrd@z Thu Jan 1 00:00:00 1970 From: Junxiao Bi Date: Wed, 28 Apr 2021 09:00:06 -0700 Subject: [Cluster-devel] [PATCH 2/3] ocfs2: allow writing back pages out of inode size In-Reply-To: <20210426220552.45413-2-junxiao.bi@oracle.com> References: <20210426220552.45413-1-junxiao.bi@oracle.com> <20210426220552.45413-2-junxiao.bi@oracle.com> Message-ID: <9aeafe69-0bf2-f4cb-431a-1cb4329a178c@oracle.com> List-Id: To: cluster-devel.redhat.com MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Hi Joseph, Can you help review the first two patches? Thanks, Junxiao. On 4/26/21 3:05 PM, Junxiao Bi wrote: > When fallocate/truncate extend inode size, if the original isize is in > the middle of last cluster, then the part from isize to the end of the > cluster needs to be zeroed with buffer write, at that time isize is not > yet updated to match the new size, if writeback is kicked in, it will > invoke ocfs2_writepage()->block_write_full_page() where the pages out > of inode size will be dropped. That will cause file corruption. > > Running the following command with qemu-image 4.2.1 can get a corrupted > coverted image file easily. > > qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ > -O qcow2 -o compat=1.1 $qcow_image.conv > > Cc: > Signed-off-by: Junxiao Bi > --- > fs/ocfs2/aops.c | 19 ++++++++++++++++++- > 1 file changed, 18 insertions(+), 1 deletion(-) > > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > index ad20403b383f..7a3e3d59f6a9 100644 > --- a/fs/ocfs2/aops.c > +++ b/fs/ocfs2/aops.c > @@ -402,11 +402,28 @@ static void ocfs2_readahead(struct readahead_control *rac) > */ > static int ocfs2_writepage(struct page *page, struct writeback_control *wbc) > { > + struct inode * const inode = page->mapping->host; > + loff_t i_size = i_size_read(inode); > + const pgoff_t end_index = i_size >> PAGE_SHIFT; > + unsigned int offset; > + > trace_ocfs2_writepage( > (unsigned long long)OCFS2_I(page->mapping->host)->ip_blkno, > page->index); > > - return block_write_full_page(page, ocfs2_get_block, wbc); > + /* > + * The page straddles i_size. It must be zeroed out on each and every > + * writepage invocation because it may be mmapped. "A file is mapped > + * in multiples of the page size. For a file that is not a multiple of > + * the page size, the remaining memory is zeroed when mapped, and > + * writes to that region are not written out to the file." > + */ > + offset = i_size & (PAGE_SIZE-1); > + if (page->index == end_index && offset) > + zero_user_segment(page, offset, PAGE_SIZE); > + > + return __block_write_full_page_eof(inode, page, ocfs2_get_block, wbc, > + end_buffer_async_write, true); > } > > /* Taken from ext3. We don't necessarily need the full blown