From: Dave Chinner <david@fromorbit.com>
To: Peter Vajgel <pv@fb.com>
Cc: Jef Fox <jef.fox@kinetx.com>, "xfs@oss.sgi.com" <xfs@oss.sgi.com>
Subject: Re: XFS Preallocation
Date: Tue, 1 Feb 2011 19:03:54 +1100
Message-ID: <20110201080354.GM11040@dastard>
In-Reply-To: <3F5ACD12257C714E9C0535D0A839171802A9B4@SC-MBX02-2.TheFacebook.com>

On Tue, Feb 01, 2011 at 04:45:09AM +0000, Peter Vajgel wrote:
> > Preallocation is the only option. Allowing preallocation without
> > marking extents as unwritten opens a massive security hole (i.e.
> > exposes stale data) so I say no to any request for addition of
> > such functionality (and have for years).
> 
> How about opening this option to at least root (root can already
> read the device anyway)?

# ls -l foo
-rw-r--r-- 1 dave dave        0 Aug 16 10:44 foo
#
# prealloc_without_unwritten 0 1048576 foo
# ls -l foo
-rw-r--r-- 1 dave dave  1048576 Aug 16 10:44 foo
#

Now user dave can read the stale data exposed by the root only
operation. Any combination of making the file available to a
non-root user after a preallocation-without-unwritten-extents
operation has this problem.  IOWs, just making such a syscall "root
only" doesn't solve the security problem.

To fix it, we would have to require that such inodes have 0600 perms,
are owned by root, and can never be chmod/chowned to anyone else. At
that point, we're requiring applications to run as root to use this
functionality. That is the same requirement as using fiemap and
reading from the block device, which you can do right now without any
kernel mods or filesystem hacks...
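
As a rough illustration of that existing route, here is a minimal
sketch that turns one extent range into a dd command line reading the
same blocks straight from the device. It only prints the command; it
does not run it. The helper name extent_to_dd and the device /dev/sdX
are hypothetical, and the sketch assumes the range is in the
START..END form, in 512-byte units relative to the data device, that
xfs_bmap prints.

```shell
# Sketch only: build (not run) a dd command for one physical extent.
# Assumptions: $2 is "START..END" in 512-byte blocks, as in xfs_bmap
# output, and the offsets are relative to the device given in $1.
extent_to_dd() {
    dev=$1
    start=${2%%..*}              # text before the ".."
    end=${2##*..}                # text after the ".."
    count=$((end - start + 1))   # the range is inclusive
    printf 'dd if=%s bs=512 skip=%s count=%s\n' "$dev" "$start" "$count"
}

# Example with a made-up device and extent:
extent_to_dd /dev/sdX 96..103
```

Running the printed command still requires read access to the block
device, i.e. root, which is the point being made above.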

> There are cases when creating large
> files without writing to them is important. A good example is
> testing xfs overhead when doing a specific workload (like random
> reads) to large files.

For testing it doesn't matter how long it takes you to write
the file in the first place.

> In this case we want to hit the disk on
> every request. Currently we have a workaround (below) but official
> support would be preferable.

Officially, we _removed_ the unwritten=0 option from mkfs because of
the security problems. Not to mention that it was never, ever
tested...

> 
> --pv
> 
> 
> # create_xfs_files
> 
> dev=$1
> mntpt=$2
> dircount=$3
> filecount=$4
> size=$5
> 
> # Umount.
> umount $2
> 
> # Create the fs.
> mkfs -t xfs -f -d unwritten=0,su=256k,sw=10 -l su=256k -L "/hay" $dev

Which fails due to:

unknown option -d unwritten=0
/* blocksize */         [-b log=n|size=num]
/* data subvol */       [-d agcount=n,agsize=n,file,name=xxx,size=num,
                            (sunit=value,swidth=value|su=num,sw=num),
                            sectlog=n|sectsize=num
.....

> # Clear unwritten flag - current xfs ignores this flag
> typeset -i agcount=$(xfs_db -c "sb" -c "print" $dev | grep agcount)
> typeset -i i=0
> while [[ $i != $agcount ]]
> do
>   xfs_db -x -c "sb $i" -c "write versionnum 0xa4a4" $dev
>   i=i+1
> done
> 
> # Mount the filesystem.
> mount -t xfs -o nobarrier,noatime,nodiratime,inode64,allocsize=1g $dev $mntpt
> 
> i=0
> while [[ $i != $dircount ]]
> do
>   mkdir $mntpt/dir$i
>   typeset -i j=0
>   while [[ $j != $filecount ]]
>   do
>     file=$mntpt/dir$i/file$j
>     xfs_io -f -c "resvsp 0 $size" $file
>     inum=$(ls -i $file | awk '{print $1}')
>     umount $mntpt
>     xfs_db -x -c "inode $inum" -c "write core.size $size" $dev
>     mount -t xfs -o nobarrier,noatime,nodiratime,inode64,allocsize=1g $dev $mntpt

That's quite a hack. It works around the EOF zeroing that extending
the file size after allocation would otherwise trigger, because the
preallocated extents beyond EOF are not marked unwritten. Perhaps
truncating the file first, then preallocating, is what you want:

	xfs_io -f -c "truncate $size" -c "resvsp 0 $size" $file

>     j=j+1
>   done
>   i=i+1
> done
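
For what it's worth, with the truncate-then-preallocate form the
umount/xfs_db/mount cycle drops out of the inner loop entirely. A
minimal dry-run sketch of the resulting loop follows. It prints the
commands rather than running them, so it can be checked without an
XFS filesystem. The function names prealloc_cmd and gen_prealloc_cmds
are made up for this illustration, as is the /mnt/hay mount point;
only the xfs_io invocation itself comes from the suggestion above.

```shell
# Sketch only: print the per-file command instead of executing it.
# $1 = size in bytes, $2 = file path
prealloc_cmd() {
    printf 'xfs_io -f -c "truncate %s" -c "resvsp 0 %s" %s\n' "$1" "$1" "$2"
}

# Print the commands for dircount directories of filecount files each.
# $1 = mount point, $2 = dircount, $3 = filecount, $4 = size in bytes
gen_prealloc_cmds() {
    i=0
    while [ "$i" -lt "$2" ]; do
        printf 'mkdir %s/dir%s\n' "$1" "$i"
        j=0
        while [ "$j" -lt "$3" ]; do
            prealloc_cmd "$4" "$1/dir$i/file$j"
            j=$((j + 1))
        done
        i=$((i + 1))
    done
}

# Example: 2 directories, 3 files each, 1 MiB per file.
gen_prealloc_cmds /mnt/hay 2 3 1048576
```

Piping the output to a shell would run it for real, on a filesystem
made with a supported mkfs and no xfs_db edits.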

Regardless of all this, perhaps the most important point is that your
proposed use of XFS is fundamentally unsupportable by the linux XFS
community: you've got proprietary software on some external hardware
writing to the disk without going through the linux XFS kernel code.
You're basically in the same boat as people running proprietary
kernel modules - unless you can prove the problem is not caused by
your hw/sw or manual filesystem modifications, then it's a waste of
our (limited) resources to even look at the problem.  That generally
comes down to being able to reproduce the problem on a vanilla kernel
on a filesystem created with a supported mkfs....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com