linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Fredrick <fjohnber@zoho.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org, Andreas Dilger <adilger@dilger.ca>,
	wenqing.lz@taobao.com
Subject: Re: ext4_fallocate
Date: Mon, 25 Jun 2012 18:23:29 -0700	[thread overview]
Message-ID: <4FE90F11.4040801@zoho.com> (raw)
In-Reply-To: <20120625191744.GB9688@thunk.org>

On 06/25/2012 12:17 PM, Theodore Ts'o wrote:
> On Mon, Jun 25, 2012 at 04:51:59PM +0800, Zheng Liu wrote:
>>
>> Actually I want to send a url for you from linux mailing list archive but
>> I cannot find it.  After applying this patch, you can call ioctl(2) to
>> enable expose_stale_data flag, and then when you call fallocate(2), ext4
>> create initialized extents for you.  This patch cannot be merged into
>> upstream kernel because it brings a huge security hole.
>
> This is what we're using internally inside Google.... this allows the
> security exposure to be restricted to those programs running with a
> specific group id (which is better than giving programs access to
> CAP_SYS_RAWIO).  We also require the use of a specific fallocate flag
> so that programs have to explicitly ask for this feature.
>
> Also note that I restrict the combination of NO_HIDE_STALE &&
> KEEP_SIZE since it causes e2fsck to complain --- and if you're trying
> to avoid fs metadata I/O, you want to avoid the extra i_size update
> anyway, so it's not worth trying to make this work w/o causing e2fsck
> complaints.
>
> This patch is versus the v3.3 kernel (as it happens, I was just in the
> middle of rebasing this patch from 2.6.34 :-)
>
> 					- Ted
>
> P.S.  It just occurred to me that there are some patches being
> discussed that assign new fallocate flags for volatile data handling.
> So it would probably be a good idea to move the fallocate flag
> codepoint assignment up out of the way to avoid future conflicts.
>
> commit 5f12f1bc2b0fb0866d52763a611b022780780f05
> Author: Theodore Ts'o <tytso@google.com>
> Date:   Fri Jun 22 17:19:53 2012 -0400
>
>      ext4: add an fallocate flag to mark newly allocated extents initialized
>
>      This commit adds a new flag to ext4's fallocate that allows new,
>      uninitialized extents to be marked as initialized. This flag,
>      FALLOC_FL_NO_HIDE_STALE requires that the nohide_stale_gid=<gid> mount
>      option be used when the file system is mounted, and that the user is
>      in the group <gid>.
>
>      The benefit is to a program fallocates a larger space, but then writes
>      to that space in small increments.  This option prevents ext4 from
>      having to split the unallocated extent and merge the newly initialized
>      extent with the extent to its left.  Even though this usually happens
>      in-memory, this option is useful for tight memory situations and for
>      ext4 on flash.  Note: This allows an application in ths hohide_stale
>      group to see stale data on the filesystem.
>
>      Tested: Updated xfstests g002 to test a case where
>        fallocate:no-hide-stale is not allowed.  The existing tests now pass
>        because I added a remount with a group that user root is in.
>      Rebase-Tested-v3.3: same
>
>      Effort: fs/nohide-stale
>      Origin-2.6.34-SHA1: c3099bf61be1baf94bc91c481995bb0d77f05786
>      Origin-2.6.34-SHA1: 004dd33b9ebc5d860781c3435526658cc8aa8ccb
>      Change-Id: I0d2a7f2a4cf34443269acbcedb7b7074e0055e69
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index aaaece6..ac7aa42 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1240,6 +1240,9 @@ struct ext4_sb_info {
>   	unsigned long s_mb_last_group;
>   	unsigned long s_mb_last_start;
>
> +	/* gid that's allowed to see stale data via falloc flag. */
> +	gid_t no_hide_stale_gid;
> +
>   	/* stats for buddy allocator */
>   	atomic_t s_bal_reqs;	/* number of reqs with len > 1 */
>   	atomic_t s_bal_success;	/* we found long enough chunks */
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index cb99346..cc57c85 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4375,6 +4375,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>   	int retries = 0;
>   	int flags;
>   	struct ext4_map_blocks map;
> +	struct ext4_sb_info *sbi;
>   	unsigned int credits, blkbits = inode->i_blkbits;
>
>   	/*
> @@ -4385,12 +4386,28 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>   		return -EOPNOTSUPP;
>
>   	/* Return error if mode is not supported */
> -	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> +		     FALLOC_FL_NO_HIDE_STALE))
> +		return -EOPNOTSUPP;
> +
> +	/* The combination of NO_HIDE_STALE and KEEP_SIZE is not supported */
> +	if ((mode & FALLOC_FL_NO_HIDE_STALE) &&
> +	    (mode & FALLOC_FL_KEEP_SIZE))
>   		return -EOPNOTSUPP;
>
>   	if (mode & FALLOC_FL_PUNCH_HOLE)
>   		return ext4_punch_hole(file, offset, len);
>
> +	sbi = EXT4_SB(inode->i_sb);
> +	/* Must have RAWIO to see stale data. */
> +	if ((mode & FALLOC_FL_NO_HIDE_STALE) &&
> +	    !in_egroup_p(sbi->no_hide_stale_gid))
> +		return -EACCES;
> +
> +	/* preallocation to directories is currently not supported */
> +	if (S_ISDIR(inode->i_mode))
> +		return -ENODEV;
> +
>   	trace_ext4_fallocate_enter(inode, offset, len, mode);
>   	map.m_lblk = offset >> blkbits;
>   	/*
> @@ -4429,6 +4446,8 @@ retry:
>   			ret = PTR_ERR(handle);
>   			break;
>   		}
> +		if (mode & FALLOC_FL_NO_HIDE_STALE)
> +			flags &= ~EXT4_GET_BLOCKS_UNINIT_EXT;
>   		ret = ext4_map_blocks(handle, inode, &map, flags);
>   		if (ret <= 0) {
>   #ifdef EXT4FS_DEBUG
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 5b443a8..d976ec1 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1175,6 +1175,8 @@ static int ext4_show_options(struct seq_file *seq, struct dentry *root)
>   	if (test_opt2(sb, BIG_EXT))
>   		seq_puts(seq, ",big_extent");
>   #endif
> +	if (sbi->no_hide_stale_gid != -1)
> +		seq_printf(seq, ",nohide_stale_gid=%u", sbi->no_hide_stale_gid);
>
>   	ext4_show_quota_options(seq, sb);
>
> @@ -1353,6 +1355,7 @@ enum {
>   #ifdef CONFIG_EXT4_BIG_EXTENT
>   	Opt_big_extent, Opt_nobig_extent,
>   #endif
> +	Opt_nohide_stale_gid,
>   };
>
>   static const match_table_t tokens = {
> @@ -1432,6 +1435,7 @@ static const match_table_t tokens = {
>   	{Opt_big_extent, "big_extent"},
>   	{Opt_nobig_extent, "nobig_extent"},
>   #endif
> +	{Opt_nohide_stale_gid, "nohide_stale_gid=%u"},
>   	{Opt_err, NULL},
>   };
>
> @@ -1931,6 +1935,12 @@ set_qf_format:
>   				return 0;
>   			sbi->s_li_wait_mult = option;
>   			break;
> +		case Opt_nohide_stale_gid:
> +			if (match_int(&args[0], &option))
> +				return 0;
> +			/* -1 for disabled, otherwise it's valid. */
> +			sbi->no_hide_stale_gid = option;
> +			break;
>   		case Opt_noinit_itable:
>   			clear_opt(sb, INIT_INODE_TABLE);
>   			break;
> @@ -3274,6 +3284,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>   #ifdef CONFIG_EXT4_BIG_EXTENT
>   	sbi->s_min_big_ext_size = EXT4_DEFAULT_MIN_BIG_EXT_SIZE;
>   #endif
> +	/* Default to having no-hide-stale disabled. */
> +	sbi->no_hide_stale_gid = -1;
>
>   	if ((def_mount_opts & EXT4_DEFM_NOBARRIER) == 0)
>   		set_opt(sb, BARRIER);
> diff --git a/fs/open.c b/fs/open.c
> index 201431a..4edc0cd 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -224,7 +224,9 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>   		return -EINVAL;
>
>   	/* Return error if mode is not supported */
> -	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> +	if (mode & ~(FALLOC_FL_KEEP_SIZE |
> +		     FALLOC_FL_PUNCH_HOLE |
> +		     FALLOC_FL_NO_HIDE_STALE))
>   		return -EOPNOTSUPP;
>
>   	/* Punch hole must have keep size set */
> diff --git a/include/linux/falloc.h b/include/linux/falloc.h
> index 73e0b62..a2489ac 100644
> --- a/include/linux/falloc.h
> +++ b/include/linux/falloc.h
> @@ -3,6 +3,7 @@
>
>   #define FALLOC_FL_KEEP_SIZE	0x01 /* default is extend size */
>   #define FALLOC_FL_PUNCH_HOLE	0x02 /* de-allocates range */
> +#define FALLOC_FL_NO_HIDE_STALE	0x04 /* default is hide stale data */
>
>   #ifdef __KERNEL__
>
>

Thanks Ted. This patch is very nice
and addresses the comments of Andreas
of using a mount option.

-Fredrick



  reply	other threads:[~2012-06-26  1:23 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-25  6:42 ext4_fallocate Fredrick
2012-06-25  7:33 ` ext4_fallocate Andreas Dilger
2012-06-28 15:12   ` ext4_fallocate Phillip Susi
2012-06-28 15:23     ` ext4_fallocate Eric Sandeen
2012-06-25  8:51 ` ext4_fallocate Zheng Liu
2012-06-25 19:04   ` ext4_fallocate Fredrick
2012-06-25 19:17   ` ext4_fallocate Theodore Ts'o
2012-06-26  1:23     ` Fredrick [this message]
2012-06-26 13:13     ` ext4_fallocate Ric Wheeler
2012-06-26 17:30       ` ext4_fallocate Theodore Ts'o
2012-06-26 18:06         ` ext4_fallocate Fredrick
2012-06-26 18:21         ` ext4_fallocate Ric Wheeler
2012-06-26 18:57           ` ext4_fallocate Ted Ts'o
2012-06-26 19:22             ` ext4_fallocate Ric Wheeler
2012-06-26 18:05       ` ext4_fallocate Fredrick
2012-06-26 18:59         ` ext4_fallocate Ted Ts'o
2012-06-26 19:30         ` ext4_fallocate Ric Wheeler
2012-06-26 19:57           ` ext4_fallocate Eric Sandeen
2012-06-26 20:44             ` ext4_fallocate Eric Sandeen
2012-06-27 15:14               ` ext4_fallocate Eric Sandeen
2012-06-27 19:30               ` ext4_fallocate Theodore Ts'o
2012-06-27 23:02                 ` ext4_fallocate Eric Sandeen
2012-06-28 11:27                   ` ext4_fallocate Ric Wheeler
2012-06-29 19:02                     ` ext4_fallocate Andreas Dilger
2012-07-02  3:03                       ` ext4_fallocate Zheng Liu
2012-06-28 12:48                   ` ext4_fallocate Theodore Ts'o
2012-07-02  3:16                   ` ext4_fallocate Zheng Liu
2012-07-02 16:33                     ` ext4_fallocate Eric Sandeen
2012-07-02 17:44                       ` ext4_fallocate Jan Kara
2012-07-02 17:48                         ` ext4_fallocate Ric Wheeler
2012-07-03 17:41                           ` ext4_fallocate Zheng Liu
2012-07-03 17:57                             ` ext4_fallocate Zach Brown
2012-07-04  2:23                               ` ext4_fallocate Zheng Liu
2012-07-02 18:01                         ` ext4_fallocate Theodore Ts'o
2012-07-03  9:30                           ` ext4_fallocate Jan Kara
2012-07-04  1:15                         ` ext4_fallocate Phillip Susi
2012-07-04  2:36                           ` ext4_fallocate Zheng Liu
2012-07-04  3:06                             ` ext4_fallocate Phillip Susi
2012-07-04  3:48                               ` ext4_fallocate Zheng Liu
2012-07-04 12:20                               ` ext4_fallocate Ric Wheeler
2012-07-04 13:25                                 ` ext4_fallocate Zheng Liu
2012-06-26 13:06 ` ext4_fallocate Eric Sandeen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FE90F11.4040801@zoho.com \
    --to=fjohnber@zoho.com \
    --cc=adilger@dilger.ca \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=wenqing.lz@taobao.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).