From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
To: Linus Torvalds <torvalds@osdl.org>, Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org
Subject: [RFC][PATCH] problem of cont_prepare_write()
Date: Mon, 22 Nov 2004 19:30:18 +0900 [thread overview]
Message-ID: <877joexjk5.fsf@devron.myhome.or.jp> (raw)
Hi,
If I do the following operation on fatfs, and my box under heavy load,
open("testfile", O_CREAT | O_TRUNC | O_RDWR, 0664);
lseek(fd, 500*1024*1024 - 1, SEEK_SET);
write(fd, "\0", 1);
In cont_prepare_write(), kernel fills the hole by zero cleared page.
fs/buffer.c:cont_prepare_write:2210,
while(page->index > (pgpos = *bytes>>PAGE_CACHE_SHIFT)) {
[...]
status = __block_prepare_write(inode, new_page, zerofrom,
PAGE_CACHE_SIZE, get_block);
if (status)
goto out_unmap;
kaddr = kmap_atomic(new_page, KM_USER0);
memset(kaddr+zerofrom, 0, PAGE_CACHE_SIZE-zerofrom);
flush_dcache_page(new_page);
kunmap_atomic(kaddr, KM_USER0);
__block_commit_write(inode, new_page,
zerofrom, PAGE_CACHE_SIZE);
unlock_page(new_page);
page_cache_release(new_page);
}
But until ->commit_write(), kernel doesn't update the ->i_size. Then,
if kernel writes out that hole page before updates of ->i_size, dirty
flag of buffer_head is cleared in __block_write_full_page(). So hole
page was not writed to disk.
fs/buffer.c:__block_write_full_page:1773,
if (block > last_block) {
/*
* mapped buffers outside i_size will occur, because
* this page can be outside i_size when there is a
* truncate in progress.
*/
/*
* The buffer was zeroed by block_write_full_page()
*/
clear_buffer_dirty(bh);
set_buffer_uptodate(bh);
This became cause of data corruption.
I thought that it is not good to update ->i_size before ->commit_write().
So, I wrote the following patch. Anyone has good idea for solving this
problem?
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
---
fs/buffer.c | 4 ++--
include/linux/fs.h | 4 ++++
mm/memory.c | 2 ++
3 files changed, 8 insertions(+), 2 deletions(-)
diff -puN mm/memory.c~cont_prepare_write-fix mm/memory.c
--- linux-2.6.10-rc2/mm/memory.c~cont_prepare_write-fix 2004-11-22 18:59:19.000000000 +0900
+++ linux-2.6.10-rc2-hirofumi/mm/memory.c 2004-11-22 18:59:19.000000000 +0900
@@ -1246,9 +1246,11 @@ int vmtruncate(struct inode * inode, lof
*/
if (IS_SWAPFILE(inode))
goto out_busy;
+ inode->i_flags |= S_TRUNCATING;
i_size_write(inode, offset);
unmap_mapping_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
truncate_inode_pages(mapping, offset);
+ inode->i_flags &= ~S_TRUNCATING;
goto out_truncate;
do_expand:
diff -puN fs/buffer.c~cont_prepare_write-fix fs/buffer.c
--- linux-2.6.10-rc2/fs/buffer.c~cont_prepare_write-fix 2004-11-22 18:59:19.000000000 +0900
+++ linux-2.6.10-rc2-hirofumi/fs/buffer.c 2004-11-22 18:59:19.000000000 +0900
@@ -1763,7 +1763,7 @@ static int __block_write_full_page(struc
* handle any aliases from the underlying blockdev's mapping.
*/
do {
- if (block > last_block) {
+ if (IS_TRUNCATING(inode) && block > last_block) {
/*
* mapped buffers outside i_size will occur, because
* this page can be outside i_size when there is a
@@ -2628,7 +2628,7 @@ int block_write_full_page(struct page *p
/* Is the page fully outside i_size? (truncate in progress) */
offset = i_size & (PAGE_CACHE_SIZE-1);
- if (page->index >= end_index+1 || !offset) {
+ if (IS_TRUNCATING(inode) && (page->index >= end_index+1 || !offset)) {
/*
* The page may have dirty, unmapped buffers. For example,
* they may have been added in ext3_writepage(). Make them
diff -puN include/linux/fs.h~cont_prepare_write-fix include/linux/fs.h
--- linux-2.6.10-rc2/include/linux/fs.h~cont_prepare_write-fix 2004-11-22 18:59:19.000000000 +0900
+++ linux-2.6.10-rc2-hirofumi/include/linux/fs.h 2004-11-22 18:59:19.000000000 +0900
@@ -151,6 +151,8 @@ extern int dir_notify_enable;
#define S_NOCMTIME 128 /* Do not update file c/mtime */
#define S_SWAPFILE 256 /* Do not truncate: swapon got its bmaps */
+#define S_TRUNCATING (1<<31) /* Hack: inode is truncating the data now */
+
/*
* Note that nosuid etc flags are inode-specific: setting some file-system
* flags just means all the inodes inherit those flags by default. It might be
@@ -185,6 +187,8 @@ extern int dir_notify_enable;
#define IS_NOCMTIME(inode) ((inode)->i_flags & S_NOCMTIME)
#define IS_SWAPFILE(inode) ((inode)->i_flags & S_SWAPFILE)
+#define IS_TRUNCATING(inode) ((inode)->i_flags & S_TRUNCATING)
+
/* the read-only stuff doesn't really belong here, but any other place is
probably as bad and I don't want to create yet another include file. */
_
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
next reply other threads:[~2004-11-22 10:33 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-11-22 10:30 OGAWA Hirofumi [this message]
2004-11-22 10:41 ` [RFC][PATCH] problem of cont_prepare_write() Andrew Morton
2004-11-22 10:46 ` Andrew Morton
2004-11-22 11:03 ` Anton Altaparmakov
2004-11-22 21:53 ` Andrew Morton
2004-11-22 23:30 ` Anton Altaparmakov
2004-11-22 23:43 ` Andrew Morton
2005-01-13 14:47 ` write barriers - Was: " Anton Altaparmakov
2005-01-13 15:56 ` Chris Friesen
2005-01-13 18:08 ` Florian Weimer
2004-11-22 11:38 ` OGAWA Hirofumi
2004-11-22 21:43 ` Andrew Morton
2004-11-23 2:22 ` OGAWA Hirofumi
2004-11-23 2:30 ` Andrew Morton
2004-11-23 2:55 ` OGAWA Hirofumi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=877joexjk5.fsf@devron.myhome.or.jp \
--to=hirofumi@mail.parknet.co.jp \
--cc=akpm@osdl.org \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox