Re: [PATCH] incorrect error handling inside generic_file_direct_write

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dmitriy Monakhov <dmonakhov@sw.ru>
To: Andrew Morton <akpm@osdl.org>
Cc: Dmitriy Monakhov <dmonakhov@openvz.org>,
	linux-kernel@vger.kernel.org,
	Linux Memory Management <linux-mm@kvack.org>,
	devel@openvz.org, xfs@oss.sgi.com
Subject: Re: [PATCH]  incorrect error handling inside generic_file_direct_write
Date: Tue, 12 Dec 2006 12:22:14 +0300	[thread overview]
Message-ID: <87lkld31vd.fsf@sw.ru> (raw)
In-Reply-To: <20061211124052.144e69a0.akpm@osdl.org> (Andrew Morton's message of "Mon, 11 Dec 2006 12:40:52 -0800")

Andrew Morton <akpm@osdl.org> writes:

> On Mon, 11 Dec 2006 16:34:27 +0300
> Dmitriy Monakhov <dmonakhov@openvz.org> wrote:
>
>> OpenVZ team has discovered error inside generic_file_direct_write()
>> If generic_file_direct_IO() has fail (ENOSPC condition) it may have instantiated
>> a few blocks outside i_size. And fsck will complain about wrong i_size
>> (ext2, ext3 and reiserfs interpret i_size and biggest block difference as error),
>> after fsck will fix error i_size will be increased to the biggest block,
>> but this blocks contain gurbage from previous write attempt, this is not 
>> information leak, but its silence file data corruption. 
>> We need truncate any block beyond i_size after write have failed , do in simular
>> generic_file_buffered_write() error path.
>> 
>> Exampe:
>> open("mnt2/FILE3", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3
>> write(3, "aaaaaa"..., 4096) = -1 ENOSPC (No space left on device)
>> 
>> stat mnt2/FILE3
>> File: `mnt2/FILE3'
>> Size: 0               Blocks: 4          IO Block: 4096   regular empty file
>> >>>>>>>>>>>>>>>>>>>>>>^^^^^^^^^^ file size is less than biggest block idx
>> Device: 700h/1792d      Inode: 14          Links: 1
>> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
>> 
>> fsck.ext2 -f -n  mnt1/fs_img
>> Pass 1: Checking inodes, blocks, and sizes
>> Inode 14, i_size is 0, should be 2048.  Fix? no
>> 
>> Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
>> ----------
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 7b84dc8..bf7cf6c 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2041,6 +2041,14 @@ generic_file_direct_write(struct kiocb *
>>  			mark_inode_dirty(inode);
>>  		}
>>  		*ppos = end;
>> +	} else if (written < 0) {
>> +		loff_t isize = i_size_read(inode);
>> +		/*
>> +		 * generic_file_direct_IO() may have instantiated a few blocks
>> +		 * outside i_size.  Trim these off again.
>> +		 */
>> +		if (pos + count > isize)
>> +			vmtruncate(inode, isize);
>>  	}
>>  
>
> XFS (at least) can call generic_file_direct_write() with i_mutex not held. 
How could it be ?

from mm/filemap.c:2046 generic_file_direct_write() comment right after 
place where i want to add vmtruncate()
/*
	 * Sync the fs metadata but not the minor inode changes and
	 * of course not the data as we did direct DMA for the IO.
	 * i_mutex is held, which protects generic_osync_inode() from
	 * livelocking.
	 */

> And vmtruncate() expects i_mutex to be held.
generic_file_direct_IO must called under i_mutex too
from mm/filemap.c:2388
  /*
   * Called under i_mutex for writes to S_ISREG files.   Returns -EIO if something
   * went wrong during pagecache shootdown.
   */
  static ssize_t
  generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,

This means XFS generic_file_direct_write() call generic_file_direct_IO() without
i_mutex held too?
>
> I guess a suitable solution would be to push this problem back up to the
> callers: let them decide whether to run vmtruncate() and if so, to ensure
> that i_mutex is held.
>
> The existence of generic_file_aio_write_nolock() makes that rather messy
> though.

WARNING: multiple messages have this Message-ID (diff)

From: Dmitriy Monakhov <dmonakhov@sw.ru>
To: Andrew Morton <akpm@osdl.org>
Cc: Dmitriy Monakhov <dmonakhov@openvz.org>,
	linux-kernel@vger.kernel.org,
	Linux Memory Management <linux-mm@kvack.org>, <devel@openvz.org>,
	xfs@oss.sgi.com
Subject: Re: [PATCH]  incorrect error handling inside generic_file_direct_write
Date: Tue, 12 Dec 2006 12:22:14 +0300	[thread overview]
Message-ID: <87lkld31vd.fsf@sw.ru> (raw)
In-Reply-To: <20061211124052.144e69a0.akpm@osdl.org> (Andrew Morton's message of "Mon, 11 Dec 2006 12:40:52 -0800")

Andrew Morton <akpm@osdl.org> writes:

> On Mon, 11 Dec 2006 16:34:27 +0300
> Dmitriy Monakhov <dmonakhov@openvz.org> wrote:
>
>> OpenVZ team has discovered error inside generic_file_direct_write()
>> If generic_file_direct_IO() has fail (ENOSPC condition) it may have instantiated
>> a few blocks outside i_size. And fsck will complain about wrong i_size
>> (ext2, ext3 and reiserfs interpret i_size and biggest block difference as error),
>> after fsck will fix error i_size will be increased to the biggest block,
>> but this blocks contain gurbage from previous write attempt, this is not 
>> information leak, but its silence file data corruption. 
>> We need truncate any block beyond i_size after write have failed , do in simular
>> generic_file_buffered_write() error path.
>> 
>> Exampe:
>> open("mnt2/FILE3", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3
>> write(3, "aaaaaa"..., 4096) = -1 ENOSPC (No space left on device)
>> 
>> stat mnt2/FILE3
>> File: `mnt2/FILE3'
>> Size: 0               Blocks: 4          IO Block: 4096   regular empty file
>> >>>>>>>>>>>>>>>>>>>>>>^^^^^^^^^^ file size is less than biggest block idx
>> Device: 700h/1792d      Inode: 14          Links: 1
>> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
>> 
>> fsck.ext2 -f -n  mnt1/fs_img
>> Pass 1: Checking inodes, blocks, and sizes
>> Inode 14, i_size is 0, should be 2048.  Fix? no
>> 
>> Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
>> ----------
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 7b84dc8..bf7cf6c 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2041,6 +2041,14 @@ generic_file_direct_write(struct kiocb *
>>  			mark_inode_dirty(inode);
>>  		}
>>  		*ppos = end;
>> +	} else if (written < 0) {
>> +		loff_t isize = i_size_read(inode);
>> +		/*
>> +		 * generic_file_direct_IO() may have instantiated a few blocks
>> +		 * outside i_size.  Trim these off again.
>> +		 */
>> +		if (pos + count > isize)
>> +			vmtruncate(inode, isize);
>>  	}
>>  
>
> XFS (at least) can call generic_file_direct_write() with i_mutex not held. 
How could it be ?

from mm/filemap.c:2046 generic_file_direct_write() comment right after 
place where i want to add vmtruncate()
/*
	 * Sync the fs metadata but not the minor inode changes and
	 * of course not the data as we did direct DMA for the IO.
	 * i_mutex is held, which protects generic_osync_inode() from
	 * livelocking.
	 */

> And vmtruncate() expects i_mutex to be held.
generic_file_direct_IO must called under i_mutex too
from mm/filemap.c:2388
  /*
   * Called under i_mutex for writes to S_ISREG files.   Returns -EIO if something
   * went wrong during pagecache shootdown.
   */
  static ssize_t
  generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,

This means XFS generic_file_direct_write() call generic_file_direct_IO() without
i_mutex held too?
>
> I guess a suitable solution would be to push this problem back up to the
> callers: let them decide whether to run vmtruncate() and if so, to ensure
> that i_mutex is held.
>
> The existence of generic_file_aio_write_nolock() makes that rather messy
> though.

WARNING: multiple messages have this Message-ID (diff)

From: Dmitriy Monakhov <dmonakhov@sw.ru>
To: Andrew Morton <akpm@osdl.org>
Cc: Dmitriy Monakhov <dmonakhov@openvz.org>,
	linux-kernel@vger.kernel.org,
	Linux Memory Management <linux-mm@kvack.org>,
	devel@openvz.org, xfs@oss.sgi.com
Subject: Re: [PATCH]  incorrect error handling inside generic_file_direct_write
Date: Tue, 12 Dec 2006 12:22:14 +0300	[thread overview]
Message-ID: <87lkld31vd.fsf@sw.ru> (raw)
In-Reply-To: <20061211124052.144e69a0.akpm@osdl.org> (Andrew Morton's message of "Mon, 11 Dec 2006 12:40:52 -0800")

Andrew Morton <akpm@osdl.org> writes:

> On Mon, 11 Dec 2006 16:34:27 +0300
> Dmitriy Monakhov <dmonakhov@openvz.org> wrote:
>
>> OpenVZ team has discovered error inside generic_file_direct_write()
>> If generic_file_direct_IO() has fail (ENOSPC condition) it may have instantiated
>> a few blocks outside i_size. And fsck will complain about wrong i_size
>> (ext2, ext3 and reiserfs interpret i_size and biggest block difference as error),
>> after fsck will fix error i_size will be increased to the biggest block,
>> but this blocks contain gurbage from previous write attempt, this is not 
>> information leak, but its silence file data corruption. 
>> We need truncate any block beyond i_size after write have failed , do in simular
>> generic_file_buffered_write() error path.
>> 
>> Exampe:
>> open("mnt2/FILE3", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3
>> write(3, "aaaaaa"..., 4096) = -1 ENOSPC (No space left on device)
>> 
>> stat mnt2/FILE3
>> File: `mnt2/FILE3'
>> Size: 0               Blocks: 4          IO Block: 4096   regular empty file
>> >>>>>>>>>>>>>>>>>>>>>>^^^^^^^^^^ file size is less than biggest block idx
>> Device: 700h/1792d      Inode: 14          Links: 1
>> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
>> 
>> fsck.ext2 -f -n  mnt1/fs_img
>> Pass 1: Checking inodes, blocks, and sizes
>> Inode 14, i_size is 0, should be 2048.  Fix? no
>> 
>> Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
>> ----------
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 7b84dc8..bf7cf6c 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2041,6 +2041,14 @@ generic_file_direct_write(struct kiocb *
>>  			mark_inode_dirty(inode);
>>  		}
>>  		*ppos = end;
>> +	} else if (written < 0) {
>> +		loff_t isize = i_size_read(inode);
>> +		/*
>> +		 * generic_file_direct_IO() may have instantiated a few blocks
>> +		 * outside i_size.  Trim these off again.
>> +		 */
>> +		if (pos + count > isize)
>> +			vmtruncate(inode, isize);
>>  	}
>>  
>
> XFS (at least) can call generic_file_direct_write() with i_mutex not held. 
How could it be ?

from mm/filemap.c:2046 generic_file_direct_write() comment right after 
place where i want to add vmtruncate()
/*
	 * Sync the fs metadata but not the minor inode changes and
	 * of course not the data as we did direct DMA for the IO.
	 * i_mutex is held, which protects generic_osync_inode() from
	 * livelocking.
	 */

> And vmtruncate() expects i_mutex to be held.
generic_file_direct_IO must called under i_mutex too
from mm/filemap.c:2388
  /*
   * Called under i_mutex for writes to S_ISREG files.   Returns -EIO if something
   * went wrong during pagecache shootdown.
   */
  static ssize_t
  generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,

This means XFS generic_file_direct_write() call generic_file_direct_IO() without
i_mutex held too?
>
> I guess a suitable solution would be to push this problem back up to the
> callers: let them decide whether to run vmtruncate() and if so, to ensure
> that i_mutex is held.
>
> The existence of generic_file_aio_write_nolock() makes that rather messy
> though.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2006-12-12  7:04 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-12-11 13:34 [PATCH] incorrect error handling inside generic_file_direct_write Dmitriy Monakhov
2006-12-11 13:34 ` Dmitriy Monakhov
2006-12-11 12:38 ` [Devel] " Kirill Korotaev
2006-12-11 12:38   ` Kirill Korotaev
2006-12-11 20:40 ` Andrew Morton
2006-12-11 20:40   ` Andrew Morton
2006-12-11 20:40   ` Andrew Morton
2006-12-12  9:22   ` Dmitriy Monakhov [this message]
2006-12-12  9:22     ` Dmitriy Monakhov
2006-12-12  9:22     ` Dmitriy Monakhov
2006-12-12  6:36     ` Andrew Morton
2006-12-12  6:36       ` Andrew Morton
2006-12-12  6:36       ` Andrew Morton
2006-12-12 12:20   ` Dmitriy Monakhov
2006-12-12 12:20     ` Dmitriy Monakhov
2006-12-12 12:20     ` Dmitriy Monakhov
2006-12-12  9:52     ` Andrew Morton
2006-12-12  9:52       ` Andrew Morton
2006-12-12  9:52       ` Andrew Morton
2006-12-12 13:18       ` Dmitriy Monakhov
2006-12-12 13:18         ` Dmitriy Monakhov
2006-12-12 10:40         ` Andrew Morton
2006-12-12 10:40           ` Andrew Morton
2006-12-12 10:40           ` Andrew Morton
2006-12-12 23:14           ` Dmitriy Monakhov
2006-12-12 23:14             ` Dmitriy Monakhov
2006-12-13  2:43           ` Chen, Kenneth W
2006-12-13  2:43             ` Chen, Kenneth W
2006-12-13  2:43             ` Chen, Kenneth W
2006-12-15 10:43             ` 'Christoph Hellwig'
2006-12-15 10:43               ` 'Christoph Hellwig'
2006-12-15 18:53               ` Chen, Kenneth W
2006-12-15 18:53                 ` Chen, Kenneth W
2006-12-15 18:53                 ` Chen, Kenneth W
2007-01-02 11:17                 ` 'Christoph Hellwig'
2007-01-02 11:17                   ` 'Christoph Hellwig'

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lkld31vd.fsf@sw.ru \
    --to=dmonakhov@sw.ru \
    --cc=akpm@osdl.org \
    --cc=devel@openvz.org \
    --cc=dmonakhov@openvz.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.