public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* XFS_IOC_RESVSP64 for swap files
@ 2007-06-17 10:08 Peter Cordes
  2007-06-19  4:33 ` David Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Cordes @ 2007-06-17 10:08 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: Type: text/plain, Size: 2852 bytes --]

 Hi XFS list.  I'm not subscribed, please CC me.

 Programs such as swapspace and swapd create new swap files when vmem runs
low.  They would benefit hugely from being able to create a swapfile without
any significant disk I/O.  (If a process grabs a lot of memory quickly, the
system will be swapping hard while swapspace(8) is writing a swapfile.)

unfortunately,
 touch foo
 xfs_io -c 'truncate 1000000000' -c "resvsp 0 1000000000" foo
 mkswap foo
 sudo swapon foo
doesn't work.  The kernel complains:
 swapon: swapfile has holes

 foo is a ~1GB file with disk space allocated for it, though.  But reading
it doesn't create any disk I/O and reads all zero, so it's treated like a
sparse file.  Is this because my filesystem flags unwritten extents?  And if
my FS was created with that option off, would RESVSP make the file contain
the previous contents of that disk space?  That would be an obvious security
hole, but it would still be useful for making swap files even if only root
could do it.

 So, any ideas on how to make swap files without writing the whole file?
(swapd and swapspace both avoid deleting swap files right away, IIRC, so
don't suggest workarounds unless you have something really clever...)

 Could swapon(2) in the kernel be made to work on XFS files with reserved
space?  i.e. call something that would give XFS a chance to mark all the
extents as written, even though they're not.  Memory content is at least
as sensitive as anything in the filesystem, and if this file is going to be
trusted with that, it hardly matters if it also has parts of deleted files.




 I'm on GNU/Linux: Ubuntu Feisty AMD64, Linux 2.6.20-16-generic.
xfs_io version 2.8.18

peter@tesla:/var/tmp/peter$ xfs_info /var/tmp
meta-data=/dev/evms/temp         isize=256    agcount=16, agsize=800767 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=12812272, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096  
log      =internal               bsize=4096   blocks=3328, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0


 BTW, I think xfs_allocsp has its args reversed, or something.
touch bar
xfs_io -c "allocsp 0 1000000" bar; ll -h
-rw-rw-r-- 1 peter peter    0 2007-06-17 06:45 bar
xfs_io -c "allocsp 1000000 0" bar; ll -h
-rw-rw-r-- 1 peter peter 977K 2007-06-17 06:45 bar

-- 
#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@cor , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 351 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: XFS_IOC_RESVSP64 for swap files
  2007-06-17 10:08 XFS_IOC_RESVSP64 for swap files Peter Cordes
@ 2007-06-19  4:33 ` David Chinner
  2007-06-21  6:14   ` Peter Cordes
  0 siblings, 1 reply; 5+ messages in thread
From: David Chinner @ 2007-06-19  4:33 UTC (permalink / raw)
  To: Peter Cordes; +Cc: xfs

On Sun, Jun 17, 2007 at 07:08:23AM -0300, Peter Cordes wrote:
>  Hi XFS list.  I'm not subscribed, please CC me.
> 
>  Programs such as swapspace and swapd create new swap files when vmem runs
> low.  They would benefit hugely from being able to create a swapfile without
> any significant disk I/O.  (If a process grabs a lot of memory quickly, the
> system will be swapping hard while swapspace(8) is writing a swapfile.)
> 
> unfortunately,
>  touch foo
>  xfs_io -c 'truncate 1000000000' -c "resvsp 0 1000000000" foo
>  mkswap foo
>  sudo swapon foo
> doesn't work.  The kernel complains:
>  swapon: swapfile has holes
> 
>  foo is a ~1GB file with disk space allocated for it, though.  But reading
> it doesn't create any disk I/O and reads all zero, so it's treated like a
> sparse file.  Is this because my filesystem flags unwritten extents?

Yes.

> And if
> my FS was created with that option off, would RESVSP make the file contain
> the previous contents of that disk space?

Yes.

> That would be an obvious security hole,

Yes. That's why unwritten extents were introduced 10 or so years ago.

> but it would still be useful for making swap files even if only root
> could do it.

Still a potential security hole.

>  So, any ideas on how to make swap files without writing the whole file?

You can't. You need to use allocsp to allocate zero'd space. i.e.

# xfs_io -f -c 'allocsp 1000000000 0' foo

>  Could swapon(2) in the kernel be made to work on XFS files with reserved
> space?

Basically, the swapon syscall calls bmap() for the block mapping of the
file and XFS returns "holes" for unwritten extents because this is the
interface needed for reads to zero fill the pages. Something would need
to be changed in XFS to make it return anything different, and that would
break other things. So I doubt anything will change here.

> i.e. call something that would give XFS a chance to mark all the
> extents as written, even though they're not.

You mean like XFS_IOC_EXPOSE_MY_STALE_DATA_TO_EVERYONE? ;)

That's not going to happen.

In fact, I plan to make unwritten extents non-optional soon (i.e. I've already
got preliminary patches to do this) so that filesystems that have it turned
off will get them turned on automatically.

The reasons?

	a) there is no good reason for unwritten=0 from a performance
	   perspective
	b) there is good reason for unwritten=1 from a security perspective
	c) we need to use unwritten extents in place of written extents
	   during delayed allocation to prevent stale data exposure on
	   crash and when using extent size hints.

So soon unwritten=0 is likely to go the way of the dodo.....

>  BTW, I think xfs_allocsp has its args reversed, or something.
> touch bar
> xfs_io -c "allocsp 0 1000000" bar; ll -h
> -rw-rw-r-- 1 peter peter    0 2007-06-17 06:45 bar
> xfs_io -c "allocsp 1000000 0" bar; ll -h
> -rw-rw-r-- 1 peter peter 977K 2007-06-17 06:45 bar

Nope, acting as [badly] documented. in the xfsctl man page:

	"If the section specified is beyond the current end of file,  the
	file  is  grown  and  filled  with zeroes. The l_len field is
	currently ignored, and should be set to zero."

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: XFS_IOC_RESVSP64 for swap files
  2007-06-19  4:33 ` David Chinner
@ 2007-06-21  6:14   ` Peter Cordes
  2007-06-21 23:52     ` David Chinner
  2007-06-25  0:07     ` Nathan Scott
  0 siblings, 2 replies; 5+ messages in thread
From: Peter Cordes @ 2007-06-21  6:14 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: Type: text/plain, Size: 3093 bytes --]

On Tue, Jun 19, 2007 at 02:33:33PM +1000, David Chinner wrote:
> On Sun, Jun 17, 2007 at 07:08:23AM -0300, Peter Cordes wrote:
> >  Hi XFS list.  I'm not subscribed, please CC me.
> > 
> >  Programs such as swapspace and swapd create new swap files when vmem runs
> > low.  They would benefit hugely from being able to create a swapfile without
> > any significant disk I/O.  (If a process grabs a lot of memory quickly, the
> > system will be swapping hard while swapspace(8) is writing a swapfile.)


> > but it [exposing stale data] would still be useful for making swap files
> > even if only root could do it.
> 
> Still a potential security hole.

 Root can read the device file, so how is letting root expose stale data any
worse?  If a program run by root makes a file with mode 0600, and then calls
XFS_IOC_EXPOSE_MY_STALE_DATA_TO_EVERYONE, where's the security problem?


> >  Could swapon(2) in the kernel be made to work on XFS files with reserved
> > space?
> 
> Basically, the swapon syscall calls bmap() for the block mapping of the
> file and XFS returns "holes" [...]

 Yeah, bad idea to put special case stuff in the kernel.

> > i.e. call something that would give XFS a chance to mark all the
> > extents as written, even though they're not.
> 
> You mean like XFS_IOC_EXPOSE_MY_STALE_DATA_TO_EVERYONE? ;)
> 
> That's not going to happen.
> 
> In fact, I plan to make unwritten extents non-optional soon (i.e. I've already
> got preliminary patches to do this) so that filesystems that have it turned
> off will get them turned on automatically.
> 
> The reasons?
> 
> 	a) there is no good reason for unwritten=0 from a performance
> 	   perspective
> 	b) there is good reason for unwritten=1 from a security perspective
> 	c) we need to use unwritten extents in place of written extents
> 	   during delayed allocation to prevent stale data exposure on
> 	   crash and when using extent size hints.
> 
> So soon unwritten=0 is likely to go the way of the dodo.....

 Ok.  I didn't really want to recreate my /var/tmp filesystem with
unwritten=0, but I really wish I had
XFS_IOC_EXPOSE_MY_STALE_DATA_TO_EVERYONE on my desktop machine.  I think
dynamic swap file creation is a cool idea, and that ioctl would make it work
perfectly.

 This ioctl is only useful for making swap files.  Nothing else cares if the
file has "holes" or not.  But for that one application, it's great.  There
are lots of ways root can shoot himself in the foot, and I don't think
adding one more is enough reason to not add an ioctl.

 Is it just that you don't want to take time to implement such a feature, or
would you reject a patch that added it?  (Not that I'm volunteering,
necessarily.)

 BTW, thanks for taking the time to respond.

-- 
#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@cor , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 351 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: XFS_IOC_RESVSP64 for swap files
  2007-06-21  6:14   ` Peter Cordes
@ 2007-06-21 23:52     ` David Chinner
  2007-06-25  0:07     ` Nathan Scott
  1 sibling, 0 replies; 5+ messages in thread
From: David Chinner @ 2007-06-21 23:52 UTC (permalink / raw)
  To: Peter Cordes; +Cc: xfs

On Thu, Jun 21, 2007 at 03:14:49AM -0300, Peter Cordes wrote:
> On Tue, Jun 19, 2007 at 02:33:33PM +1000, David Chinner wrote:
> > On Sun, Jun 17, 2007 at 07:08:23AM -0300, Peter Cordes wrote:
> > >  Hi XFS list.  I'm not subscribed, please CC me.
> > > 
> > >  Programs such as swapspace and swapd create new swap files when vmem runs
> > > low.  They would benefit hugely from being able to create a swapfile without
> > > any significant disk I/O.  (If a process grabs a lot of memory quickly, the
> > > system will be swapping hard while swapspace(8) is writing a swapfile.)
> 
> 
> > > but it [exposing stale data] would still be useful for making swap files
> > > even if only root could do it.
> > 
> > Still a potential security hole.
> 
>  Root can read the device file, so how is letting root expose stale data any
> worse?  If a program run by root makes a file with mode 0600, and then calls
> XFS_IOC_EXPOSE_MY_STALE_DATA_TO_EVERYONE, where's the security problem?

If a file is not 0600 or is not owned by root, then you've got a
problem.  Even if you only allow root to use the ioctl, there's
still plenty of ways that you can screw up and expose data to normal
users with something that causes persistent exposure.....

>  Ok.  I didn't really want to recreate my /var/tmp filesystem with
> unwritten=0, but I really wish I had
> XFS_IOC_EXPOSE_MY_STALE_DATA_TO_EVERYONE on my desktop machine.  I think
> dynamic swap file creation is a cool idea, and that ioctl would make it work
> perfectly.

I don't think XFS specific hacks are the way to acheive this.
Perhaps you want to look at ->fallocate and introduce a new mode
there for preallocating uninitialised swapfile extents.

>  This ioctl is only useful for making swap files.  Nothing else cares if the
> file has "holes" or not.  But for that one application, it's great.  There
> are lots of ways root can shoot himself in the foot, and I don't think
> adding one more is enough reason to not add an ioctl.
> 
>  Is it just that you don't want to take time to implement such a feature, or
> would you reject a patch that added it?  (Not that I'm volunteering,
> necessarily.)

I think XFS is the wrong place to do this.  If you want
pre-allocated swap files then a generic solution needs to be
implemented.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: XFS_IOC_RESVSP64 for swap files
  2007-06-21  6:14   ` Peter Cordes
  2007-06-21 23:52     ` David Chinner
@ 2007-06-25  0:07     ` Nathan Scott
  1 sibling, 0 replies; 5+ messages in thread
From: Nathan Scott @ 2007-06-25  0:07 UTC (permalink / raw)
  To: Peter Cordes; +Cc: xfs

On Thu, 2007-06-21 at 03:14 -0300, Peter Cordes wrote:
> 
> > >  Could swapon(2) in the kernel be made to work on XFS files with
> reserved
> > > space?
> > 
> > Basically, the swapon syscall calls bmap() for the block mapping of
> the
> > file and XFS returns "holes" [...]
> 
>  Yeah, bad idea to put special case stuff in the kernel.
> 
> > > i.e. call something that would give XFS a chance to mark all the
> > > extents as written, even though they're not.
> > 
> > You mean like XFS_IOC_EXPOSE_MY_STALE_DATA_TO_EVERYONE? ;)
> > 
> > That's not going to happen.

I wouldn't dismiss it out of hand.

Maybe if it was approached as follows:
- forget about "preallocation", as that's clouding the discussion
  with talk of unwritten extents, etc.  This is a real allocation,
  and filesize would need to reflect that (its XFS_IOC_ALLOCSP, just
  without the zeroing).
- instead, if you added a flag to the shiny new fallocate syscall,
  F_MKSWAP or some such, and then implemented it as an allocate
  with no zeroing.
- change mkswap to be able to optionally use this, and those dynamic
  methods for adding swap that you mentioned.
- produce objective data demonstrating the improvements this makes,
  as this will be needed when arguing the case for the flag in the
  kernel (the fallocate flag, I mean).

I can't see any reason this shouldn't be acceptable - there really
is no security issue here, provided the F_MKSWAP syscall has tight
restrictions (the arbitrary contents of memory that will get swapped
have potentially more sensitive info, like decrypted passwords, than
would usually be stored in a filesystem).  And there is very little
added complexity/code introduced, almost everything is already in
place, so if its demonstrably useful... *shrug*.

> > In fact, I plan to make unwritten extents non-optional soon (i.e.
> I've already

Unwritten extents aren't really relevent here - you would definately
not want to have these extents marked unwritten, as that would cause
additional transactions, memory allocations, writes, etc during swap.

>  Ok.  I didn't really want to recreate my /var/tmp filesystem with
> unwritten=0, but I really wish I had
> XFS_IOC_EXPOSE_MY_STALE_DATA_TO_EVERYONE on my desktop machine.  I
> think
> dynamic swap file creation is a cool idea, and that ioctl would make
> it work
> perfectly.

I think if the data was compelling enough, there's no obvious reason
this couldn't be merged, IMO (you may want to make an XFS ioctl also,
just to test it - use xfs_io on the frontend & write some xfstests -
and you'll need to tweak the call down into xfs_alloc_file_space - the
4th "alloc_type" argument will need to become a flags parameter instead
of effectively the boolean it is now, and theres a few code changes to
be done related to that).

cheers.

--
Nathan

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-06-25  6:02 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-17 10:08 XFS_IOC_RESVSP64 for swap files Peter Cordes
2007-06-19  4:33 ` David Chinner
2007-06-21  6:14   ` Peter Cordes
2007-06-21 23:52     ` David Chinner
2007-06-25  0:07     ` Nathan Scott

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox