Re: New reflink(2) syscall

linux-fsdevel.vger.kernel.org archive mirror

* Re: New reflink(2) syscall
       [not found] ` <1241443016.3023.51.camel@localhost.localdomain>
@ 2009-05-04 15:35   ` James Morris
  2009-05-04 16:59     ` Stephen Smalley
       [not found]   ` <20090504163514.GB31249@mail.oracle.com>
  1 sibling, 1 reply; 34+ messages in thread
From: James Morris @ 2009-05-04 15:35 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: Joel Becker, lsm, linux-fsdevel

On Mon, 4 May 2009, Stephen Smalley wrote:

[added fsdevel to this thread..]

> > 
> > http://marc.info/?l=linux-fsdevel&m=124133134306871&w=2
> > http://marc.info/?l=linux-fsdevel&m=124133137106901&w=2
> > http://marc.info/?l=linux-fsdevel&m=124133134406874&w=2
> > 
> > We need to determine if the security hooks included are appropriate, and 
> > provide feedback (I've asked the author to cc this list with future 
> > postings).
> > 
> > In summary, reflink(2) has an interface similar to link(2), but creates a 
> > new file with copy on write semantics.
> > 
> > The existing LSM hooks are security_path_mknod() and 
> > security_inode_create(), as well as security_inode_permission() via 
> > may_create().
> > 
> > For SELinux, at least, it seems we may need another check to control 
> > information flow from the source file to the new file, which may be 
> > instantiated with a different security context.
> 
> The reflink(2) documentation in patch 1/3 suggests that the security
> context of the new file would be initially identical to the original
> file ("All file attributes and extended attributes of the new file must
> be identical to the source file...").  

Ok (I missed the extended attribute mention initially).

> In that case, security_inode_create() is not the right hook to use as
> security_inode_create() presumes that the new file is labeled based on
> the creating process and the parent directory and that the filesystem
> will use the security attribute name:value pair returned by
> security_inode_init_security() as the initial attribute for the new
> file.
> 
> It sounds like reflink(2) is more akin to copying a file while
> preserving attributes (ala cp -a foo bar), only performing the actual
> copying lazily.  In the case of a normal file copy while preserving
> attributes, we would check that the process can open and read the
> original file, write to the target directory, create a file with the
> security context of the original file, and set the mode/owner/timestamps
> of the new file.  That sequence of checks however is based on the
> presumption that the data flows through the process, potentially being
> mutated by it, and that we don't directly see the inter-file
> relationship in the kernel.  With direct kernel support for file 
> copying, we may want a different set of checks. 

What's fundamentally different, though, that the process would only be 
able to then modify the data in a subsequent syscall?

> I think we likely need a new security hook.

Agreed, perhaps something like:

int security_inode_reflink(struct dentry *dentry, struct inode *dir);


> BTW, the DAC permission checking here also needs some thought, and 
> wouldn't be handled by the LSM hook.  Should reflink(2) require DAC read 
> permission to the file, like a file copy would?

Yes -- it seems all you need to be able to do at the moment is lookup the 
original file for the syscall to work from that end.

> And if the owner of the original differs from the fsuid of the current 
> process, should reflink(2) require CAP_CHOWN in order to set the 
> ownership of the copy to the original's owner?

Good question :-)

Also, do we ignore create_sid in SELinux for this?


- James
-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: New reflink(2) syscall
  2009-05-04 15:35   ` New reflink(2) syscall James Morris
@ 2009-05-04 16:59     ` Stephen Smalley
  2009-05-04 17:49       ` Joel Becker
  2009-05-05 18:00       ` Joel Becker
  0 siblings, 2 replies; 34+ messages in thread
From: Stephen Smalley @ 2009-05-04 16:59 UTC (permalink / raw)
  To: James Morris; +Cc: Joel Becker, lsm, linux-fsdevel

On Tue, 2009-05-05 at 01:35 +1000, James Morris wrote:
> On Mon, 4 May 2009, Stephen Smalley wrote:
> 
> [added fsdevel to this thread..]
> 
> > > 
> > > http://marc.info/?l=linux-fsdevel&m=124133134306871&w=2
> > > http://marc.info/?l=linux-fsdevel&m=124133137106901&w=2
> > > http://marc.info/?l=linux-fsdevel&m=124133134406874&w=2
> > > 
> > > We need to determine if the security hooks included are appropriate, and 
> > > provide feedback (I've asked the author to cc this list with future 
> > > postings).
> > > 
> > > In summary, reflink(2) has an interface similar to link(2), but creates a 
> > > new file with copy on write semantics.
> > > 
> > > The existing LSM hooks are security_path_mknod() and 
> > > security_inode_create(), as well as security_inode_permission() via 
> > > may_create().
> > > 
> > > For SELinux, at least, it seems we may need another check to control 
> > > information flow from the source file to the new file, which may be 
> > > instantiated with a different security context.
> > 
> > The reflink(2) documentation in patch 1/3 suggests that the security
> > context of the new file would be initially identical to the original
> > file ("All file attributes and extended attributes of the new file must
> > be identical to the source file...").  
> 
> Ok (I missed the extended attribute mention initially).
> 
> > In that case, security_inode_create() is not the right hook to use as
> > security_inode_create() presumes that the new file is labeled based on
> > the creating process and the parent directory and that the filesystem
> > will use the security attribute name:value pair returned by
> > security_inode_init_security() as the initial attribute for the new
> > file.
> > 
> > It sounds like reflink(2) is more akin to copying a file while
> > preserving attributes (ala cp -a foo bar), only performing the actual
> > copying lazily.  In the case of a normal file copy while preserving
> > attributes, we would check that the process can open and read the
> > original file, write to the target directory, create a file with the
> > security context of the original file, and set the mode/owner/timestamps
> > of the new file.  That sequence of checks however is based on the
> > presumption that the data flows through the process, potentially being
> > mutated by it, and that we don't directly see the inter-file
> > relationship in the kernel.  With direct kernel support for file 
> > copying, we may want a different set of checks. 
> 
> What's fundamentally different, though, that the process would only be 
> able to then modify the data in a subsequent syscall?

Since the data doesn't flow through the process at all, it can neither
be leaked nor modified by the process.  Whereas normally the data would
be copied into the memory of the process (and potentially leaked
elsewhere) and the process could write any arbitrary data it liked to
the new file.  As a result, one might be willing to allow reflink(2) in
situations where one would not be willing to allow a userspace file
copy.

> > I think we likely need a new security hook.
> 
> Agreed, perhaps something like:
> 
> int security_inode_reflink(struct dentry *dentry, struct inode *dir);

I'd pass the same arguments as vfs_reflink(), i.e. old_dentry, dir,
new_dentry.
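
For illustration only, a rough userspace sketch of a hook with that
argument list.  The struct definitions are minimal stand-ins for the
kernel types, and the checks in the body are placeholders made up for
this sketch; a real LSM would apply its own policy there.

```c
#include <errno.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types; only what the sketch needs. */
struct inode {
	unsigned long i_ino;
};

struct dentry {
	struct inode *d_inode;	/* NULL for a negative (not yet created) dentry */
	const char *d_name;
};

/*
 * Hypothetical hook taking the same arguments as vfs_reflink():
 * the source dentry, the target directory, and the dentry for the
 * new name.  Returns 0 to allow, a negative errno to deny.
 */
int security_inode_reflink(struct dentry *old_dentry, struct inode *dir,
			   struct dentry *new_dentry)
{
	if (!old_dentry || !old_dentry->d_inode || !dir || !new_dentry)
		return -EINVAL;
	if (new_dentry->d_inode)
		return -EEXIST;	/* placeholder: target name already in use */
	return 0;		/* a real LSM would consult its policy here */
}
```

Having both dentries available lets a module compare the source's label
with whatever the new file will carry.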

> > BTW, the DAC permission checking here also needs some thought, and 
> > wouldn't be handled by the LSM hook.  Should reflink(2) require DAC read 
> > permission to the file, like a file copy would?
> 
> Yes -- it seems all you need to be able to do at the moment is lookup the 
> original file for the syscall to work from that end.
> 
> > And if the owner of the original differs from the fsuid of the current 
> > process, should reflink(2) require CAP_CHOWN in order to set the 
> > ownership of the copy to the original's owner?
> 
> Good question :-)
> 
> Also, do we ignore create_sid in SELinux for this?

Yes (assuming attribute preservation by the filesystem).

-- 
Stephen Smalley
National Security Agency



* Re: New reflink(2) syscall
  2009-05-04 16:59     ` Stephen Smalley
@ 2009-05-04 17:49       ` Joel Becker
  2009-05-05 18:00       ` Joel Becker
  1 sibling, 0 replies; 34+ messages in thread
From: Joel Becker @ 2009-05-04 17:49 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: James Morris, lsm, linux-fsdevel

On Mon, May 04, 2009 at 12:59:39PM -0400, Stephen Smalley wrote:
> On Tue, 2009-05-05 at 01:35 +1000, James Morris wrote:
> > What's fundamentally different, though, that the process would only be 
> > able to then modify the data in a subsequent syscall?
> 
> Since the data doesn't flow through the process at all, it can neither
> be leaked nor modified by the process.  Whereas normally the data would
> be copied into the memory of the process (and potentially leaked
> elsewhere) and the process could write any arbitrary data it liked to
> the new file.  As a result, one might be willing to allow reflink(2) in
> situations where one would not be willing to allow a userspace file
> copy.

	Oh, that's a good point.  A process using reflink(2) to make a
snapshot can do the snap but not modify.  That's neat.

Joel

-- 

Life's Little Instruction Book #237

	"Seek out the good in people."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: New reflink(2) syscall
       [not found]     ` <1241458669.3023.203.camel@localhost.localdomain>
@ 2009-05-04 18:08       ` Joel Becker
  2009-05-04 19:30         ` Stephen Smalley
  0 siblings, 1 reply; 34+ messages in thread
From: Joel Becker @ 2009-05-04 18:08 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: James Morris, lsm, linux-fsdevel

[Re-adding linux-fsdevel]

On Mon, May 04, 2009 at 01:37:49PM -0400, Stephen Smalley wrote:
> On Mon, 2009-05-04 at 09:35 -0700, Joel Becker wrote:
> > On Mon, May 04, 2009 at 09:16:56AM -0400, Stephen Smalley wrote:
> > > BTW, the DAC permission checking here also needs some thought, and
> > > wouldn't be handled by the LSM hook.  Should reflink(2) require DAC read
> > > permission to the file, like a file copy would?  And if the owner of the
> > > original differs from the fsuid of the current process, should
> > > reflink(2) require CAP_CHOWN in order to set the ownership of the copy
> > > to the original's owner?
> > 
> > 	I'm thinking it should require read, yes.  That's part of what
> > I'm asking.  Regarding CAP_CHOWN - I don't want to limit the call to
> > root-only.  Are you saying something like "If you have CAP_CHOWN, you
> > can reflink() the sucker and keep the original ownership, otherwise
> > sorry, it's gotta be owned by the current process"?
> 
> Is reflink() supposed to be more like link(2) or more like an in-kernel
> optimized file copy?

	More like link(2).

> And what is the real usage scenario?  Are users likely to need/want to
> be able to reflink() to files that they do not own?  If so, will they be
> more likely to want to own the new copies or preserve the original
> ownership?

	The real usage scenarios are varied.  The idea came out of inode
snapshots.  And for that, you really want to preserve everything.  But
when we came up with the reflink(2) interface as a more generic way to
invoke it, we came up with a lot of fun uses.
	The other mail had a good point - if you can allow someone the
ability to reflink() a file but not read or write it, you might not even
need read permission.  I'm thinking of a snapshotter that can make
snapshots with only the permissions to create in the snapshot directory.
That's neat.

> If you want to support multiple attribute assignment behaviors (e.g.
> sometimes preservation, sometimes inherit from the process), then you
> should make that explicit in the interface, e.g. preservation flags for
> the different attributes, and fail the operation if unable to honor the
> request. 

	Yeah, I really don't want to create multiple behaviors.  I
wasn't proposing the "behaves differently on CAP_CHOWN," I was trying to
clarify what you were thinking.

Joel

-- 

"Sometimes I think the surest sign intelligent
 life exists elsewhere in the universe is that
 none of it has tried to contact us."
                                -Calvin & Hobbes

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: New reflink(2) syscall
  2009-05-04 18:08       ` Joel Becker
@ 2009-05-04 19:30         ` Stephen Smalley
  2009-05-04 21:03           ` Joel Becker
  0 siblings, 1 reply; 34+ messages in thread
From: Stephen Smalley @ 2009-05-04 19:30 UTC (permalink / raw)
  To: Joel Becker; +Cc: James Morris, lsm, linux-fsdevel

On Mon, 2009-05-04 at 11:08 -0700, Joel Becker wrote:
> [Re-adding linux-fsdevel]
> 
> On Mon, May 04, 2009 at 01:37:49PM -0400, Stephen Smalley wrote:
> > On Mon, 2009-05-04 at 09:35 -0700, Joel Becker wrote:
> > > On Mon, May 04, 2009 at 09:16:56AM -0400, Stephen Smalley wrote:
> > > > BTW, the DAC permission checking here also needs some thought, and
> > > > wouldn't be handled by the LSM hook.  Should reflink(2) require DAC read
> > > > permission to the file, like a file copy would?  And if the owner of the
> > > > original differs from the fsuid of the current process, should
> > > > reflink(2) require CAP_CHOWN in order to set the ownership of the copy
> > > > to the original's owner?
> > > 
> > > 	I'm thinking it should require read, yes.  That's part of what
> > > I'm asking.  Regarding CAP_CHOWN - I don't want to limit the call to
> > > root-only.  Are you saying something like "If you have CAP_CHOWN, you
> > > can reflink() the sucker and keep the original ownership, otherwise
> > > sorry, it's gotta be owned by the current process"?
> > 
> > Is reflink() supposed to be more like link(2) or more like an in-kernel
> > optimized file copy?
> 
> 	More like link(2).
> 
> > And what is the real usage scenario?  Are users likely to need/want to
> > be able to reflink() to files that they do not own?  If so, will they be
> > more likely to want to own the new copies or preserve the original
> > ownership?
> 
> 	The real usage scenarios are varied.  The idea came out of inode
> snapshots.  And for that, you really want to preserve everything.  But
> when we came up with the reflink(2) interface as a more generic way to
> invoke it, we came up with a lot of fun uses.
> 	The other mail had a good point - if you can allow someone the
> ability to reflink() a file but not read or write it, you might not even
> need read permission.  I'm thinking of a snapshotter that can make
> snapshots with only the permissions to create in the snapshot directory.
> That's neat.
> 
> > If you want to support multiple attribute assignment behaviors (e.g.
> > sometimes preservation, sometimes inherit from the process), then you
> > should make that explicit in the interface, e.g. preservation flags for
> > the different attributes, and fail the operation if unable to honor the
> > request. 
> 
> 	Yeah, I really don't want to create multiple behaviors.  I
> wasn't proposing the "behaves differently on CAP_CHOWN," I was trying to
> clarify what you were thinking.

Given that normally users can't create files with other ownerships, it
seemed that we might want to require CAP_CHOWN or some other capability
in order to reflink(2) a file that isn't owned by the fsuid of the
process.  Possibly is_owner_or_cap(), i.e. owner or CAP_FOWNER, would be
suitable.
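
A rough userspace model of that rule, for illustration only.  The
helper name and its parameters are made up for this sketch, and
cap_fowner stands in for a capable(CAP_FOWNER) test; this is not
kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Model of the "owner or CAP_FOWNER" rule (hypothetical helper).
 * Returns 0 if the caller may create a reflink that keeps file_uid
 * as its owner, -EPERM otherwise.
 */
int reflink_owner_check(unsigned int fsuid, bool cap_fowner,
			unsigned int file_uid)
{
	if (fsuid == file_uid)
		return 0;	/* caller already owns the source */
	if (cap_fowner)
		return 0;	/* privileged caller may preserve another owner */
	return -EPERM;
}
```

An unprivileged caller with fsuid 1000 would be refused on a file owned
by uid 0 but allowed on its own files.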

-- 
Stephen Smalley
National Security Agency



* Re: New reflink(2) syscall
  2009-05-04 19:30         ` Stephen Smalley
@ 2009-05-04 21:03           ` Joel Becker
  2009-05-04 21:30             ` Joel Becker
  2009-05-04 23:13             ` Theodore Tso
  0 siblings, 2 replies; 34+ messages in thread
From: Joel Becker @ 2009-05-04 21:03 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: James Morris, lsm, linux-fsdevel

On Mon, May 04, 2009 at 03:30:46PM -0400, Stephen Smalley wrote:
> > 	Yeah, I really don't want to create multiple behaviors.  I
> > wasn't proposing the "behaves differently on CAP_CHOWN," I was trying to
> > clarify what you were thinking.
> 
> Given that normally users can't create files with other ownerships, it
> seemed that we might want to require CAP_CHOWN or some other capability
> in order to reflink(2) a file that isn't owned by the fsuid of the
> process.  Possibly is_owner_or_cap(), i.e. owner or CAP_FOWNER, would be
> suitable.

	Yeah, the more I think about it the more I agree.  It's a simple
story - you're creating a file with ownership !you, you need
owner_or_cap.
	This also prevents a fun quota DoS.

Joel

-- 

"Well-timed silence hath more eloquence than speech."  
         - Martin Farquhar Tupper

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: New reflink(2) syscall
  2009-05-04 21:03           ` Joel Becker
@ 2009-05-04 21:30             ` Joel Becker
  2009-05-05 11:44               ` Stephen Smalley
  2009-05-04 23:13             ` Theodore Tso
  1 sibling, 1 reply; 34+ messages in thread
From: Joel Becker @ 2009-05-04 21:30 UTC (permalink / raw)
  To: Stephen Smalley, James Morris, lsm, linux-fsdevel

On Mon, May 04, 2009 at 02:03:56PM -0700, Joel Becker wrote:
> On Mon, May 04, 2009 at 03:30:46PM -0400, Stephen Smalley wrote:
> > > 	Yeah, I really don't want to create multiple behaviors.  I
> > > wasn't proposing the "behaves differently on CAP_CHOWN," I was trying to
> > > clarify what you were thinking.
> > 
> > Given that normally users can't create files with other ownerships, it
> > seemed that we might want to require CAP_CHOWN or some other capability
> > in order to reflink(2) a file that isn't owned by the fsuid of the
> > process.  Possibly is_owner_or_cap(), i.e. owner or CAP_FOWNER, would be
> > suitable.
> 
> 	Yeah, the more I think about it the more I agree.  It's a simple
> story - you're creating a file with ownership !you, you need
> owner_or_cap.

	Wouldn't testing inode_change_ok() be the right thing here?
Hits up uid, gid, perms, times.

Joel

-- 

"In the beginning, the universe was created. This has made a lot 
 of people very angry, and is generally considered to have been a 
 bad move."
        - Douglas Adams

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: New reflink(2) syscall
  2009-05-04 21:03           ` Joel Becker
  2009-05-04 21:30             ` Joel Becker
@ 2009-05-04 23:13             ` Theodore Tso
  2009-05-05 16:47               ` Joel Becker
  1 sibling, 1 reply; 34+ messages in thread
From: Theodore Tso @ 2009-05-04 23:13 UTC (permalink / raw)
  To: Stephen Smalley, James Morris, lsm, linux-fsdevel

On Mon, May 04, 2009 at 02:03:56PM -0700, Joel Becker wrote:
> On Mon, May 04, 2009 at 03:30:46PM -0400, Stephen Smalley wrote:
> > > 	Yeah, I really don't want to create multiple behaviors.  I
> > > wasn't proposing the "behaves differently on CAP_CHOWN," I was trying to
> > > clarify what you were thinking.
> > 
> > Given that normally users can't create files with other ownerships, it
> > seemed that we might want to require CAP_CHOWN or some other capability
> > in order to reflink(2) a file that isn't owned by the fsuid of the
> > process.  Possibly is_owner_or_cap(), i.e. owner or CAP_FOWNER, would be
> > suitable.
> 
> 	Yeah, the more I think about it the more I agree.  It's a simple
> story - you're creating a file with ownership !you, you need
> owner_or_cap.

Stupid question --- why not create the file with ownership == you?
It's a new inode, so this should be trivially easy to do, right?

						- Ted


* Re: New reflink(2) syscall
  2009-05-04 21:30             ` Joel Becker
@ 2009-05-05 11:44               ` Stephen Smalley
  2009-05-05 16:46                 ` Joel Becker
  0 siblings, 1 reply; 34+ messages in thread
From: Stephen Smalley @ 2009-05-05 11:44 UTC (permalink / raw)
  To: Joel Becker; +Cc: James Morris, lsm, linux-fsdevel

On Mon, 2009-05-04 at 14:30 -0700, Joel Becker wrote:
> On Mon, May 04, 2009 at 02:03:56PM -0700, Joel Becker wrote:
> > On Mon, May 04, 2009 at 03:30:46PM -0400, Stephen Smalley wrote:
> > > > 	Yeah, I really don't want to create multiple behaviors.  I
> > > > wasn't proposing the "behaves differently on CAP_CHOWN," I was trying to
> > > > clarify what you were thinking.
> > > 
> > > Given that normally users can't create files with other ownerships, it
> > > seemed that we might want to require CAP_CHOWN or some other capability
> > > in order to reflink(2) a file that isn't owned by the fsuid of the
> > > process.  Possibly is_owner_or_cap(), i.e. owner or CAP_FOWNER, would be
> > > suitable.
> > 
> > 	Yeah, the more I think about it the more I agree.  It's a simple
> > story - you're creating a file with ownership !you, you need
> > owner_or_cap.
> 
> 	Wouldn't testing inode_change_ok() be the right thing here?
> Hits up uid, gid, perms, times.

I don't think so, as you aren't actually changing the attributes of an
inode but rather are cloning the attributes from the original to the new
one.  And I doubt you want the same level of restrictiveness, since in
the reflink(2) case, the process is limited to only preserving the
original attributes (not setting arbitrary values) and only on the same
content/data (not on arbitrary content/data).

-- 
Stephen Smalley
National Security Agency



* Re: New reflink(2) syscall
  2009-05-05 11:44               ` Stephen Smalley
@ 2009-05-05 16:46                 ` Joel Becker
  0 siblings, 0 replies; 34+ messages in thread
From: Joel Becker @ 2009-05-05 16:46 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: James Morris, lsm, linux-fsdevel

On Tue, May 05, 2009 at 07:44:01AM -0400, Stephen Smalley wrote:
> On Mon, 2009-05-04 at 14:30 -0700, Joel Becker wrote:
> > 	Wouldn't testing inode_change_ok() be the right thing here?
> > Hits up uid, gid, perms, times.
> 
> I don't think so, as you aren't actually changing the attributes of an
> inode but rather are cloning the attributes from the original to the new
> one.  And I doubt you want the same level of restrictiveness, since in
> the reflink(2) case, the process is limited to only preserving the
> original attributes (not setting arbitrary values) and only on the same
> content/data (not on arbitrary content/data).

	Ok, I was looking at avoiding re-implementing the UID/GID
checks, but I suppose I'll just do them straight up in vfs_reflink().

Joel

-- 

Life's Little Instruction Book #407

	"Every once in a while, take the scenic route."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: New reflink(2) syscall
  2009-05-04 23:13             ` Theodore Tso
@ 2009-05-05 16:47               ` Joel Becker
  2009-05-05 16:56                 ` Chris Mason
  0 siblings, 1 reply; 34+ messages in thread
From: Joel Becker @ 2009-05-05 16:47 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Stephen Smalley, James Morris, lsm, linux-fsdevel

On Mon, May 04, 2009 at 07:13:34PM -0400, Theodore Tso wrote:
> On Mon, May 04, 2009 at 02:03:56PM -0700, Joel Becker wrote:
> > 	Yeah, the more I think about it the more I agree.  It's a simple
> > story - you're creating a file with ownership !you, you need
> > owner_or_cap.
> 
> Stupid question --- why not create the file with ownership == you?
> It's a new inode, so this should be trivially easy to do, right?

	Because then you have to change the entire security structure,
and you aren't a snapshot anymore.

Joel

-- 

Life's Little Instruction Book #451

	"Don't be afraid to say, 'I'm sorry.'"

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: New reflink(2) syscall
  2009-05-05 16:47               ` Joel Becker
@ 2009-05-05 16:56                 ` Chris Mason
  2009-05-05 17:13                   ` Joel Becker
  0 siblings, 1 reply; 34+ messages in thread
From: Chris Mason @ 2009-05-05 16:56 UTC (permalink / raw)
  To: Joel Becker
  Cc: Theodore Tso, Stephen Smalley, James Morris, lsm, linux-fsdevel

On Tue, 2009-05-05 at 09:47 -0700, Joel Becker wrote:
> On Mon, May 04, 2009 at 07:13:34PM -0400, Theodore Tso wrote:
> > On Mon, May 04, 2009 at 02:03:56PM -0700, Joel Becker wrote:
> > > 	Yeah, the more I think about it the more I agree.  It's a simple
> > > story - you're creating a file with ownership !you, you need
> > > owner_or_cap.
> > 
> > Stupid question --- why not create the file with ownership == you?
> > It's a new inode, so this should be trivially easy to do, right?
> 
> 	Because then you have to change the entire security structure,
> and you aren't a snapshot anymore.

I won't argue with the security part, but the snapshot part could just
as easily be defined by the data and not the inode.

-chris




* Re: New reflink(2) syscall
  2009-05-05 16:56                 ` Chris Mason
@ 2009-05-05 17:13                   ` Joel Becker
  2009-05-05 17:34                     ` Theodore Tso
  2009-05-05 17:36                     ` Chris Mason
  0 siblings, 2 replies; 34+ messages in thread
From: Joel Becker @ 2009-05-05 17:13 UTC (permalink / raw)
  To: Chris Mason
  Cc: Theodore Tso, Stephen Smalley, James Morris, lsm, linux-fsdevel

On Tue, May 05, 2009 at 12:56:58PM -0400, Chris Mason wrote:
> On Tue, 2009-05-05 at 09:47 -0700, Joel Becker wrote:
> > 	Because then you have to change the entire security structure,
> > and you aren't a snapshot anymore.
> 
> I won't argue with the security part, but the snapshot part could just
> as easily be defined by the data and not the inode.

	In ZFS/btrfs/WAFL/disk array snaps, if you go back to a snap
does the selinux context or acls or equivalent appear different?  I don't
think so, and I expect people would be really upset if they had to know
all the restorecon/acl-fu to get it right.

Joel

-- 

"Under capitalism, man exploits man.  Under Communism, it's just 
   the opposite."
				 - John Kenneth Galbraith

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: New reflink(2) syscall
  2009-05-05 17:13                   ` Joel Becker
@ 2009-05-05 17:34                     ` Theodore Tso
  2009-05-05 17:44                       ` Stephen Smalley
  2009-05-05 17:36                     ` Chris Mason
  1 sibling, 1 reply; 34+ messages in thread
From: Theodore Tso @ 2009-05-05 17:34 UTC (permalink / raw)
  To: Chris Mason, Stephen Smalley, James Morris, lsm, linux-fsdevel

On Tue, May 05, 2009 at 10:13:31AM -0700, Joel Becker wrote:
> On Tue, May 05, 2009 at 12:56:58PM -0400, Chris Mason wrote:
> > On Tue, 2009-05-05 at 09:47 -0700, Joel Becker wrote:
> > > 	Because then you have to change the entire security structure,
> > > and you aren't a snapshot anymore.
> > 
> > I won't argue with the security part, but the snapshot part could just
> > as easily be defined by the data and not the inode.
> 
> 	In ZFS/btrfs/WAFL/disk array snaps, if you go back to a snap
> does the selinux context or acls or equivalent appear different?  I don't
> think so, and I expect people would be really upset if they had to know
> all the restorecon/acl-fu to get it right.

OK, now I understand; sorry, I didn't realize that when you said
"snapshot", what you were really talking about was a way to implement
WAFL-style snapshots, where reflink was a low-level operation that
would be used to implement that particular use case.  Hmm, maybe the
answer is that we implement reflinkat(2) with flags that indicate
whether this is supposed to be more like a hard link (i.e., acl and
ownership should be preserved) or more like a copy (i.e., acl is
inherited from the new containing directory's directory creation acl,
uid/gid are set using the standard rules for creating new inodes).

Both use cases are equally valid, and I imagine there would be
interest in using reflinks both for snapshots and as a very
lightweight copy operation by commands like /bin/cp.
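
A rough sketch of how such a flag might select the owner of the new
inode, for illustration only.  The mode names, the struct, and the
helper are invented for this sketch (no flag values have been defined),
and the copy-like branch is simplified: it ignores setgid directories
and ACL inheritance.

```c
/* Hypothetical modes for a flag-taking reflinkat(2). */
enum reflink_mode {
	REFLINK_LINK_LIKE,	/* preserve ownership, as a snapshot would */
	REFLINK_COPY_LIKE	/* assign ownership as for a newly created file */
};

struct owner {
	unsigned int uid;
	unsigned int gid;
};

/*
 * Pick the owner of the new inode.  Link-like keeps the source's
 * uid/gid; copy-like uses the caller's fsuid/fsgid (simplified).
 */
struct owner reflink_new_owner(enum reflink_mode mode,
			       unsigned int src_uid, unsigned int src_gid,
			       unsigned int fsuid, unsigned int fsgid)
{
	struct owner o;

	if (mode == REFLINK_LINK_LIKE) {
		o.uid = src_uid;
		o.gid = src_gid;
	} else {
		o.uid = fsuid;
		o.gid = fsgid;
	}
	return o;
}
```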

						- Ted


* Re: New reflink(2) syscall
  2009-05-05 17:13                   ` Joel Becker
  2009-05-05 17:34                     ` Theodore Tso
@ 2009-05-05 17:36                     ` Chris Mason
  1 sibling, 0 replies; 34+ messages in thread
From: Chris Mason @ 2009-05-05 17:36 UTC (permalink / raw)
  To: Joel Becker
  Cc: Theodore Tso, Stephen Smalley, James Morris, lsm, linux-fsdevel

On Tue, 2009-05-05 at 10:13 -0700, Joel Becker wrote:
> On Tue, May 05, 2009 at 12:56:58PM -0400, Chris Mason wrote:
> > On Tue, 2009-05-05 at 09:47 -0700, Joel Becker wrote:
> > > 	Because then you have to change the entire security structure,
> > > and you aren't a snapshot anymore.
> > 
> > I won't argue with the security part, but the snapshot part could just
> > as easily be defined by the data and not the inode.
> 
> 	In ZFS/btrfs/WAFL/disk array snaps, if you go back to a snap
> does the selinux context or acls or equivalent appear different?  I don't
> think so, and I expect people would be really upset if they had to know
> all the restorecon/acl-fu to get it right.

So a btrfs snapshot is a whole subvolume (directory tree), and if you
haven't changed the snapshot it'll be the same.

For the btrfs clone ioctl, you're explicitly snapshotting only the data.
The inode permissions, acls etc are userland's problem.

Both ways have benefits, and you can get from the reflink one to any
other form pretty easily after the reflink call.

-chris






* Re: New reflink(2) syscall
  2009-05-05 17:34                     ` Theodore Tso
@ 2009-05-05 17:44                       ` Stephen Smalley
  2009-05-05 17:56                         ` Joel Becker
  2009-05-05 22:45                         ` Jamie Lokier
  0 siblings, 2 replies; 34+ messages in thread
From: Stephen Smalley @ 2009-05-05 17:44 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Chris Mason, James Morris, lsm, linux-fsdevel

On Tue, 2009-05-05 at 13:34 -0400, Theodore Tso wrote:
> On Tue, May 05, 2009 at 10:13:31AM -0700, Joel Becker wrote:
> > On Tue, May 05, 2009 at 12:56:58PM -0400, Chris Mason wrote:
> > > On Tue, 2009-05-05 at 09:47 -0700, Joel Becker wrote:
> > > > 	Because then you have to change the entire security structure,
> > > > and you aren't a snapshot anymore.
> > > 
> > > I won't argue with the security part, but the snapshot part could just
> > > as easily be defined by the data and not the inode.
> > 
> > 	In ZFS/btrfs/WAFL/disk array snaps, if you go back to a snap
> > does the selinux context or acls or equivalent appear different?  I don't
> > think so, and I expect people would be really upset if they had to know
> > all the restorecon/acl-fu to get it right.
> 
> OK, now I understand; sorry, I didn't realize that when you said
> "snapshot", what you were really talking about was a way to implement
> WAFL-style snapshots, where reflink was a low-level operation that
> would be used to implement that particular use case.  Hmm, maybe the
> answer is that we implement reflinkat(2) with flags that indicate
> whether this is supposed to be more like a hard link (i.e., acl and
> ownership should be preserved) or more like a copy (i.e., acl is
> inherited from the new containing directory's directory creation acl,
> uid/gid are set using the standard rules for creating new inodes).
> 
> Both use cases are equally valid, and I imagine there would be
> interest in using reflinks both for snapshots and as a very
> lightweight copy operation by commands like /bin/cp.

Not arguing against this, but just to note:  the security model will
differ depending on these flags, as the link-like case doesn't require
the caller to have read access to the file (the data is no more
accessible than it was before), whereas the copy-like case requires the
caller to have read access to the original file since the data "leaks"
into a container with potentially different access constraints.
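
A minimal userspace sketch of the distinction above, not kernel code. The
mode names, the struct, and the function are hypothetical; the thread only
proposes "flags" in general, and the directory-create check is an assumption
borrowed from how link(2) and file creation already behave. It models only
the read-access difference, not any ownership or capability test.

```c
#include <stdbool.h>

/* Hypothetical mode names; no such flags are defined in the thread. */
enum reflink_mode {
    REFLINK_LINK_LIKE,  /* attributes preserved, as with link(2) */
    REFLINK_COPY_LIKE   /* attributes assigned as for a newly created file */
};

/* What the caller is already allowed to do (assumed inputs). */
struct caller_access {
    bool can_read_source;    /* read permission on the source file */
    bool can_create_in_dir;  /* may create an entry in the target directory */
};

/*
 * Link-like: the data is no more accessible than before, so read access
 * to the source is not required.
 * Copy-like: the data lands in a container with possibly different access
 * constraints, so read access to the source is required.
 */
bool reflink_may_proceed(enum reflink_mode mode, const struct caller_access *a)
{
    if (!a->can_create_in_dir)
        return false;
    if (mode == REFLINK_COPY_LIKE)
        return a->can_read_source;
    return true;
}
```

With a caller that can create in the directory but cannot read the source,
the link-like mode is allowed and the copy-like mode is refused.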

-- 
Stephen Smalley
National Security Agency



* Re: New reflink(2) syscall
  2009-05-05 17:44                       ` Stephen Smalley
@ 2009-05-05 17:56                         ` Joel Becker
  2009-05-05 18:21                           ` Theodore Tso
  2009-05-05 22:45                         ` Jamie Lokier
  1 sibling, 1 reply; 34+ messages in thread
From: Joel Becker @ 2009-05-05 17:56 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Theodore Tso, Chris Mason, James Morris, lsm, linux-fsdevel

On Tue, May 05, 2009 at 01:44:11PM -0400, Stephen Smalley wrote:
> > Both use cases are equally valid, and I imagine there would be
> > interest in using reflinks both for snapshots and as a very
> > lightweight copy operation by commands like /bin/cp.

	Sure, but you can start with a reflink and then do what you want
to it.

> Not arguing against this, but just to note:  the security model will
> differ depending on these flags, as the link-like case doesn't require
> the caller to have read access to the file (the data is no more
> accessible than it was before), whereas the copy-like case requires the
> caller to have read access to the original file since the data "leaks"
> into a container with potentially different access constraints.

	Yeah, another reason why I don't want to complicate the
behavior.  I defined it as "the operation is like link(2)" for a reason
:-)

Joel

-- 

"I inject pure kryptonite into my brain.
 It improves my kung fu, and it eases the pain."


Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: New reflink(2) syscall
  2009-05-04 16:59     ` Stephen Smalley
  2009-05-04 17:49       ` Joel Becker
@ 2009-05-05 18:00       ` Joel Becker
  2009-05-05 18:41         ` Stephen Smalley
  2009-05-05 22:15         ` James Morris
  1 sibling, 2 replies; 34+ messages in thread
From: Joel Becker @ 2009-05-05 18:00 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: James Morris, lsm, linux-fsdevel

On Mon, May 04, 2009 at 12:59:39PM -0400, Stephen Smalley wrote:
> On Tue, 2009-05-05 at 01:35 +1000, James Morris wrote:
> > Agreed, perhaps something like:
> > 
> > int security_inode_reflink(struct dentry *dentry, struct inode *dir);
> 
> I'd pass the same arguments as vfs_reflink(), i.e. old_dentry, dir,
> new_dentry.

	I'm about to insert this bit.  I agree with
security_inode_reflink(old_dentry, dir, new_dentry), but I note that
security_path_reflink() was proposed in another email, and I'm guessing
I should add both?

Joel

-- 

Life's Little Instruction Book #207

	"Swing for the fence."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: New reflink(2) syscall
  2009-05-05 17:56                         ` Joel Becker
@ 2009-05-05 18:21                           ` Theodore Tso
  2009-05-06  4:27                             ` Casey Schaufler
  0 siblings, 1 reply; 34+ messages in thread
From: Theodore Tso @ 2009-05-05 18:21 UTC (permalink / raw)
  To: Stephen Smalley, Chris Mason, James Morris, lsm, linux-fsdevel

On Tue, May 05, 2009 at 10:56:03AM -0700, Joel Becker wrote:
> On Tue, May 05, 2009 at 01:44:11PM -0400, Stephen Smalley wrote:
> > > Both use cases are equally valid, and I imagine there would be
> > > interest in using reflinks both for snapshots and as a very
> > > lightweight copy operation by commands like /bin/cp.
> 
> 	Sure, but you can start with a reflink and then do what you want
> to it.
> 
> > Not arguing against this, but just to note:  the security model will
> > differ depending on these flags, as the link-like case doesn't require
> > the caller to have read access to the file (the data is no more
> > accessible than it was before), whereas the copy-like case requires the
> > caller to have read access to the original file since the data "leaks"
> > into a container with potentially different access constraints.
> 
> 	Yeah, another reason why I don't want to complicate the
> behavior.  I defined it as "the operation is like link(2)" for a reason
> :-)

The security model *is* the problem, however.  If we have a mode where
reflink acts like cp, then it doesn't require anything special in
terms of CAP_FOWNER.  It really is the same as a copy command.   

So sure, you could start with a reflink and then modify it, but if
you're an unprivileged user, you won't be able to create the reflink
in the first place.

					- Ted


* Re: New reflink(2) syscall
  2009-05-05 18:00       ` Joel Becker
@ 2009-05-05 18:41         ` Stephen Smalley
  2009-05-05 19:15           ` Joel Becker
  2009-05-05 22:15         ` James Morris
  1 sibling, 1 reply; 34+ messages in thread
From: Stephen Smalley @ 2009-05-05 18:41 UTC (permalink / raw)
  To: Joel Becker; +Cc: James Morris, lsm, linux-fsdevel

On Tue, 2009-05-05 at 11:00 -0700, Joel Becker wrote:
> On Mon, May 04, 2009 at 12:59:39PM -0400, Stephen Smalley wrote:
> > On Tue, 2009-05-05 at 01:35 +1000, James Morris wrote:
> > > Agreed, perhaps something like:
> > > 
> > > int security_inode_reflink(struct dentry *dentry, struct inode *dir);
> > 
> > I'd pass the same arguments as vfs_reflink(), i.e. old_dentry, dir,
> > new_dentry.
> 
> 	I'm about to insert this bit.  I agree with
> security_inode_reflink(old_dentry, dir, new_dentry), but I note that
> security_path_reflink() was proposed in another email, and I'm guessing
> I should add both?

The TOMOYO folks said that calling security_path_link() would suffice
for their purposes.  SELinux would want security_inode_reflink() from
vfs_reflink().

-- 
Stephen Smalley
National Security Agency



* Re: New reflink(2) syscall
  2009-05-05 19:15           ` Joel Becker
@ 2009-05-05 19:14             ` Stephen Smalley
  2009-05-05 19:33               ` Joel Becker
  0 siblings, 1 reply; 34+ messages in thread
From: Stephen Smalley @ 2009-05-05 19:14 UTC (permalink / raw)
  To: Joel Becker; +Cc: James Morris, lsm, linux-fsdevel

On Tue, 2009-05-05 at 12:15 -0700, Joel Becker wrote:
> On Tue, May 05, 2009 at 02:41:22PM -0400, Stephen Smalley wrote:
> > On Tue, 2009-05-05 at 11:00 -0700, Joel Becker wrote:
> > > On Mon, May 04, 2009 at 12:59:39PM -0400, Stephen Smalley wrote:
> > > > On Tue, 2009-05-05 at 01:35 +1000, James Morris wrote:
> > > > > Agreed, perhaps something like:
> > > > > 
> > > > > int security_inode_reflink(struct dentry *dentry, struct inode *dir);
> > > > 
> > > > I'd pass the same arguments as vfs_reflink(), i.e. old_dentry, dir,
> > > > new_dentry.
> > > 
> > > 	I'm about to insert this bit.  I agree with
> > > security_inode_reflink(old_dentry, dir, new_dentry), but I note that
> > > security_path_reflink() was proposed in another email, and I'm guessing
> > > I should add both?
> > 
> > The TOMOYO folks said that calling security_path_link() would suffice
> > for their purposes.  SELinux would want security_inode_reflink() from
> > vfs_reflink().
> 
> 	I've added both.  I have no idea how to add the actual
> SELinux/TOMOYO bits, so I've just added the operations hook :-)

That's fine - we can fill in the hook implementations for our respective
modules.  You do need to add a stub function to capability.c and add a
line to security_fixup_ops() so that the function pointer is initially
set though.
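
Roughly, the pattern looks like the userspace model below. This is only a
sketch: the struct is a one-member stand-in for the real operations table,
the names ending in _model are hypothetical, and the three-argument form is
the one under discussion at this point in the thread.

```c
#include <stddef.h>

/* Opaque stand-ins for the kernel types; only pointers are passed around. */
struct dentry;
struct inode;

/* Hypothetical one-hook stand-in for the security operations table. */
struct security_ops_model {
    int (*inode_reflink)(struct dentry *old_dentry, struct inode *dir,
                         struct dentry *new_dentry);
};

/* Default stub: imposes no additional restriction and returns 0. */
static int cap_inode_reflink(struct dentry *old_dentry, struct inode *dir,
                             struct dentry *new_dentry)
{
    (void)old_dentry;
    (void)dir;
    (void)new_dentry;
    return 0;
}

/*
 * Fixup step: any hook a module left unset is pointed at the default
 * stub, so callers never dereference a NULL function pointer.
 */
void fixup_ops_model(struct security_ops_model *ops)
{
    if (!ops->inode_reflink)
        ops->inode_reflink = cap_inode_reflink;
}
```

A module that registers without its own reflink hook then falls through to
the stub, which permits the operation.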

-- 
Stephen Smalley
National Security Agency



* Re: New reflink(2) syscall
  2009-05-05 18:41         ` Stephen Smalley
@ 2009-05-05 19:15           ` Joel Becker
  2009-05-05 19:14             ` Stephen Smalley
  0 siblings, 1 reply; 34+ messages in thread
From: Joel Becker @ 2009-05-05 19:15 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: James Morris, lsm, linux-fsdevel

On Tue, May 05, 2009 at 02:41:22PM -0400, Stephen Smalley wrote:
> On Tue, 2009-05-05 at 11:00 -0700, Joel Becker wrote:
> > On Mon, May 04, 2009 at 12:59:39PM -0400, Stephen Smalley wrote:
> > > On Tue, 2009-05-05 at 01:35 +1000, James Morris wrote:
> > > > Agreed, perhaps something like:
> > > > 
> > > > int security_inode_reflink(struct dentry *dentry, struct inode *dir);
> > > 
> > > I'd pass the same arguments as vfs_reflink(), i.e. old_dentry, dir,
> > > new_dentry.
> > 
> > 	I'm about to insert this bit.  I agree with
> > security_inode_reflink(old_dentry, dir, new_dentry), but I note that
> > security_path_reflink() was proposed in another email, and I'm guessing
> > I should add both?
> 
> The TOMOYO folks said that calling security_path_link() would suffice
> for their purposes.  SELinux would want security_inode_reflink() from
> vfs_reflink().

	I've added both.  I have no idea how to add the actual
SELinux/TOMOYO bits, so I've just added the operations hook :-)

Joel

-- 

To spot the expert, pick the one who predicts the job will take the
longest and cost the most.

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: New reflink(2) syscall
  2009-05-05 19:14             ` Stephen Smalley
@ 2009-05-05 19:33               ` Joel Becker
  0 siblings, 0 replies; 34+ messages in thread
From: Joel Becker @ 2009-05-05 19:33 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: James Morris, lsm, linux-fsdevel

On Tue, May 05, 2009 at 03:14:06PM -0400, Stephen Smalley wrote:
> On Tue, 2009-05-05 at 12:15 -0700, Joel Becker wrote:
> > 	I've added both.  I have no idea how to add the actual
> > SELinux/TOMOYO bits, so I've just added the operations hook :-)
> 
> That's fine - we can fill in the hook implementations for our respective
> modules.  You do need to add a stub function to capability.c and add a
> line to security_fixup_ops() so that the function pointer is initially
> set though.

	Thanks, I missed that.

Joel

-- 

Life's Little Instruction Book #173

	"Be kinder than necessary."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: New reflink(2) syscall
  2009-05-05 18:00       ` Joel Becker
  2009-05-05 18:41         ` Stephen Smalley
@ 2009-05-05 22:15         ` James Morris
  2009-05-05 22:31           ` Joel Becker
  2009-05-06 11:23           ` Stephen Smalley
  1 sibling, 2 replies; 34+ messages in thread
From: James Morris @ 2009-05-05 22:15 UTC (permalink / raw)
  To: Joel Becker; +Cc: Stephen Smalley, lsm, linux-fsdevel

On Tue, 5 May 2009, Joel Becker wrote:

> On Mon, May 04, 2009 at 12:59:39PM -0400, Stephen Smalley wrote:
> > On Tue, 2009-05-05 at 01:35 +1000, James Morris wrote:
> > > Agreed, perhaps something like:
> > > 
> > > int security_inode_reflink(struct dentry *dentry, struct inode *dir);
> > 
> > I'd pass the same arguments as vfs_reflink(), i.e. old_dentry, dir,
> > new_dentry.
> 
> 	I'm about to insert this bit.  I agree with
> security_inode_reflink(old_dentry, dir, new_dentry),

If the files and metadata are initially identical (except for inode #), 
why do we need to see both the old and new dentry?


- James
-- 
James Morris
<jmorris@namei.org>


* Re: New reflink(2) syscall
  2009-05-05 22:15         ` James Morris
@ 2009-05-05 22:31           ` Joel Becker
  2009-05-06 11:23           ` Stephen Smalley
  1 sibling, 0 replies; 34+ messages in thread
From: Joel Becker @ 2009-05-05 22:31 UTC (permalink / raw)
  To: James Morris; +Cc: Stephen Smalley, lsm, linux-fsdevel

On Wed, May 06, 2009 at 08:15:08AM +1000, James Morris wrote:
> On Tue, 5 May 2009, Joel Becker wrote:
> > On Mon, May 04, 2009 at 12:59:39PM -0400, Stephen Smalley wrote:
> > > On Tue, 2009-05-05 at 01:35 +1000, James Morris wrote:
> > > > Agreed, perhaps something like:
> > > > 
> > > > int security_inode_reflink(struct dentry *dentry, struct inode *dir);
> > > 
> > > I'd pass the same arguments as vfs_reflink(), i.e. old_dentry, dir,
> > > new_dentry.
> > 
> > 	I'm about to insert this bit.  I agree with
> > security_inode_reflink(old_dentry, dir, new_dentry),
> 
> If the files and metadata are initially identical (except for inode #), 
> why do we need to see both the old and new dentry?

	I'm learning more about the LSM hooks as we go here...
	Now, obviously path checkers want the old path and the new path,
but I think we satisfy that with security_path_reflink().
	I started by making security_inode_reflink() consistent with
security_inode_link().  There the actual source/dest is the same inode,
yet we have the same argument set.  So I have to think that any reason
that holds for security_inode_link() would hold for
security_inode_reflink().
	The new_dentry doesn't have an inode here yet, so I would think
you want to look up the security context of the source inode, which is
hanging off of old_dentry.  I can't see how you get to it otherwise.
	But this is just me speculating based on "reflink looks like
link."  If you know you do/don't need fields, I can easily change it.

Joel

-- 

"The nice thing about egotists is that they don't talk about other
 people."
         - Lucille S. Harper

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: New reflink(2) syscall
  2009-05-05 17:44                       ` Stephen Smalley
  2009-05-05 17:56                         ` Joel Becker
@ 2009-05-05 22:45                         ` Jamie Lokier
  2009-05-06  4:08                           ` Casey Schaufler
  2009-05-06 11:25                           ` Stephen Smalley
  1 sibling, 2 replies; 34+ messages in thread
From: Jamie Lokier @ 2009-05-05 22:45 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Theodore Tso, Chris Mason, James Morris, lsm, linux-fsdevel

Stephen Smalley wrote:
> Not arguing against this, but just to note:  the security model will
> differ depending on these flags, as the link-like case doesn't require
> the caller to have read access to the file (the data is no more
> accessible than it was before)

One security difference between reflink() and link() when linking to
_other_ user's files is they can tell if you suddenly got a link to
their file, from their i_nlink.  They can be suspicious and maybe
overwrite the file in place, truncate it or something, and look around
for the link you created in a secret place in your /home.

But they can't see if you got a reflink to their file.

Even though you can't read the file if you couldn't read it before,
you now have a link to it which might preserve data they don't want to
be preserved.

So reflink() should, perhaps, be more restricted than link().
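
The i_nlink point can be checked from userspace with plain POSIX calls. The
sketch below covers only the link() half: it creates a scratch file, adds one
hard link, and reports the link count the owner would see via stat(). The
function name and the scratch file names are made up for illustration, and
the directory passed in is assumed to be writable and on a filesystem that
supports hard links.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a scratch file in `dir`, add one hard link to it, and return the
 * resulting st_nlink of the original name. Returns -1 on any failure.
 * Both names are removed again before returning.
 */
long nlink_after_hard_link(const char *dir)
{
    char src[256];
    char dst[300];
    struct stat st;
    long result = -1;
    int n, fd;

    n = snprintf(src, sizeof(src), "%s/nlink-demo-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(src))
        return -1;

    fd = mkstemp(src);          /* fills in the XXXXXX suffix */
    if (fd < 0)
        return -1;
    close(fd);

    snprintf(dst, sizeof(dst), "%s.link", src);
    if (link(src, dst) == 0) {
        /* The extra name is visible to anyone who can stat the file. */
        if (stat(src, &st) == 0)
            result = (long)st.st_nlink;
        unlink(dst);
    }
    unlink(src);
    return result;
}
```

Calling it on a directory such as /tmp shows the count going from 1 to 2
after link(); a reflink, getting its own inode, would leave the original
file's count untouched.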

-- Jamie


* Re: New reflink(2) syscall
  2009-05-05 22:45                         ` Jamie Lokier
@ 2009-05-06  4:08                           ` Casey Schaufler
  2009-05-06  4:28                             ` Jamie Lokier
  2009-05-06 11:25                           ` Stephen Smalley
  1 sibling, 1 reply; 34+ messages in thread
From: Casey Schaufler @ 2009-05-06  4:08 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Stephen Smalley, Theodore Tso, Chris Mason, James Morris, lsm,
	linux-fsdevel

Jamie Lokier wrote:
> Stephen Smalley wrote:
>   
>> Not arguing against this, but just to note:  the security model will
>> differ depending on these flags, as the link-like case doesn't require
>> the caller to have read access to the file (the data is no more
>> accessible than it was before)
>>     
>
> One security difference between reflink() and link() when linking to
> _other_ user's files is they can tell if you suddenly got a link to
> their file, from their i_nlink.  They can be suspicious and maybe
> overwrite the file in place, truncate it or something, and look around
> for the link you created in a secret place in your /home.
>
> But they can't see if you got a reflink to their file.
>
> Even though you can't read the file if you couldn't read it before,
> you now have a link to it which might preserve data they don't want to
> be preserved.
>
> So reflink() should, perhaps, be more restricted than link().
>   
If I use reflink() I end up with two sets of initially identical
security credentials, which is the right thing, but now read access
(I'll skip write access for now) can be set differently on the two
inodes via chmod(), chgrp(), chown(), chacl(), and setxattr(). Or
have I missed something? Is this really your intent?




* Re: New reflink(2) syscall
  2009-05-05 18:21                           ` Theodore Tso
@ 2009-05-06  4:27                             ` Casey Schaufler
  2009-05-06  4:42                               ` Jamie Lokier
  0 siblings, 1 reply; 34+ messages in thread
From: Casey Schaufler @ 2009-05-06  4:27 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Stephen Smalley, Chris Mason, James Morris, lsm, linux-fsdevel

Theodore Tso wrote:
> On Tue, May 05, 2009 at 10:56:03AM -0700, Joel Becker wrote:
>   
>> On Tue, May 05, 2009 at 01:44:11PM -0400, Stephen Smalley wrote:
>>     
>>>> Both use cases are equally valid, and I imagine there would be
>>>> interest in using reflinks both for snapshots and as a very
>>>> lightweight copy operation by commands like /bin/cp.
>>>>         
>> 	Sure, but you can start with a reflink and then do what you want
>> to it.
>>
>>     
>>> Not arguing against this, but just to note:  the security model will
>>> differ depending on these flags, as the link-like case doesn't require
>>> the caller to have read access to the file (the data is no more
>>> accessible than it was before), whereas the copy-like case requires the
>>> caller to have read access to the original file since the data "leaks"
>>> into a container with potentially different access constraints.
>>>       
>> 	Yeah, another reason why I don't want to complicate the
>> behavior.  I defined it as "the operation is like link(2)" for a reason
>> :-)
>>     
>
> The security model *is* the problem, however.  If we have a mode where
> reflink acts like cp, then it doesn't require anything special in
> terms of CAP_FOWNER.  It really is the same as a copy command.   
>
> So sure, you could start with a reflink and then modify it, but if
> you're an unprivileged user, you won't be able to create the reflink
> in the first place.
>
>   

On the topic of security modeling, I'd like to point out that one of
the reasons that Linux has been such a hit with the security community
is that you can model the file system accesses easily because no
matter what you do you end up at a definitive access control point,
the inode. Now I have a file that can have a thousand inodes, each of
which might have a different set of access control characteristics.
All existing Linux security descriptions go straight out the window.
Once a chown() has occurred any chance of limiting the propagation
of access rights is lost. With a single inode there is a definitive
name for the file system object (device/inode) where with multiple
inodes there is not. I'm not ignoring the copy-on-write, for a file
that has not been changed since the reflink() call that doesn't matter.




* Re: New reflink(2) syscall
  2009-05-06  4:08                           ` Casey Schaufler
@ 2009-05-06  4:28                             ` Jamie Lokier
  0 siblings, 0 replies; 34+ messages in thread
From: Jamie Lokier @ 2009-05-06  4:28 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Stephen Smalley, Theodore Tso, Chris Mason, James Morris, lsm,
	linux-fsdevel

Casey Schaufler wrote:
> > Even though you can't read the file if you couldn't read it before,
> > you now have a link to it which might preserve data they don't want to
> > be preserved.
> >
> > So reflink() should, perhaps, be more restricted than link().
>
> If I use reflink() I end up with two sets of initially identical
> security credentials, which is the right thing, but now read access
> (I'll skip write access for now) can be set differently on the two
> inodes via chmod(), chgrp(), chown(), chacl(), and setxattr(). Or
> have I missed something? Is this really your intent?

I guess the idea is that if you can do
chmod/chgrp/chown/chacl/setxattr on the new inode, then you had
sufficient permission to do it on the old inode anyway, so you can
read the data either way.

My points are: (1) You can do it covertly with reflink() - the owner
doesn't know - whereas with link() or just accessing the file
directly, they will notice.  (2) You can grab a reflink now while you
don't have permission to read the file, just inadvertent access to
its directory entry, and perhaps some time in the future you will
have access to read the snapshot you have just grabbed.

(2) Cannot happen without reflink, because the source file owner may
know they have deleted or wiped the file before you are granted enough
permissions to be able to read it.  Heck, the owner might be the
system administrator, carefully scrubbing their penguin porn
collection just before they promote you to be another administrator.
reflink() lets you see what they had tantalisingly kept unreadable but
ls'able before - if you had the foresight to use it.

-- Jamie


* Re: New reflink(2) syscall
  2009-05-06  4:27                             ` Casey Schaufler
@ 2009-05-06  4:42                               ` Jamie Lokier
  2009-05-06  5:38                                 ` Casey Schaufler
  0 siblings, 1 reply; 34+ messages in thread
From: Jamie Lokier @ 2009-05-06  4:42 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Theodore Tso, Stephen Smalley, Chris Mason, James Morris, lsm,
	linux-fsdevel

Casey Schaufler wrote:
> Now I have a file that can have a thousand inodes, each of
> which might have a different set of access control characteristics.

From a security perspective, how is this different from a
thousand separate files?

The copy-on-write is just an optimisation, a filesystem implementation
detail, from a certain perspective.

> With a single inode there is a definitive
> name for the file system object (device/inode) where with multiple
> inodes there is not. 

That's because there isn't a single object to name.  Why do you want
to pretend they are the same object?

They are separate files which share some disk blocks to save space and
time that's all.  A low-level implementation detail.  Completely
separate files can share blocks like that too on some filesystems.

In what way do the shared data blocks between otherwise separate files
have any security implication?

(Ok, ok, timing, ENOSPC, covert communications, but independent files
can trigger such interactions too.)

There's the actual creation of reflinks being invisible to i_nlink
watchers yet not requiring read permission, which is new.  But that
has nothing to do with the shared data: it would have the same
security implication even if reflink was just an ordinary file copy
with its proposed permission check!  :-)

-- Jamie


* Re: New reflink(2) syscall
  2009-05-06  4:42                               ` Jamie Lokier
@ 2009-05-06  5:38                                 ` Casey Schaufler
  2009-05-06  7:12                                   ` Theodore Tso
  0 siblings, 1 reply; 34+ messages in thread
From: Casey Schaufler @ 2009-05-06  5:38 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Theodore Tso, Stephen Smalley, Chris Mason, James Morris, lsm,
	linux-fsdevel

Jamie Lokier wrote:
> Casey Schaufler wrote:
>   
>> Now I have a file that can have a thousand inodes, each of
>> which might have a different set of access control characteristics.
>>     
>
> From a security perspective, how is this different from a
> thousand separate files?
>
> The copy-on-write is just an optimisation, a filesystem implementation
> detail, from a certain perspective.
>   

Yes, I understand that. It's entirely possible that I don't
actually have a valid concern, but I'm having a little trouble
convincing myself that all the bases are covered.

It's different from 1000 separate files because I can now have
one set of data blocks with read access controlled by 1000 different
users.

    # chown user000 rfile000
    ...
    # chown user999 rfile999

now 1000 different users can grant access to those blocks,
so long as they don't change. Without reflink() I know that if
I own the file, it isn't open (fuser says so) and it is mode 700
that no one else can read it sans privilege. With reflink() not
only is this not true, but I can't find out who might be able to
read it. Changing the permissions, ACL, SELinux label, Smack label,
or TOMOYO policy won't help, because there may be another inode
out there somewhere that I can't even access that is granting the
rest of the world access.
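
To make the worry concrete, here is a toy model, not a description of any
real filesystem. It treats the unchanged data blocks as readable by a user if
any one of the inodes sharing them lets that user in. The struct, its fields,
and the simplification that an owner can always read their own inode are all
assumptions made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy inode: just an owner and a "readable by everyone" bit. */
struct inode_model {
    unsigned int owner_uid;
    bool world_readable;
};

/*
 * The inodes in `inodes` are assumed to share one unchanged set of data
 * blocks. The data is reachable by `uid` if at least one of those inodes
 * grants read access, whatever the other inodes say.
 */
bool uid_can_read_shared_data(const struct inode_model *inodes, size_t n,
                              unsigned int uid)
{
    for (size_t i = 0; i < n; i++) {
        if (inodes[i].owner_uid == uid || inodes[i].world_readable)
            return true;
    }
    return false;
}
```

Tightening the first inode changes nothing for a user who gets in through
the second one, which is the situation described above.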

>> With a single inode there is a definitive
>> name for the file system object (device/inode) where with multiple
>> inodes there is not. 
>>     
>
> That's because there isn't a single object to name.  Why do you want
> to pretend they are the same object?
>   

Until they actually diverge you can't say which object the data
blocks belong to. That means you can't say which set of access
control information protects the information, because someone
needs access through one or the other but in either case is
looking at the same data.

> They are separate files which share some disk blocks to save space and
> time that's all.  A low-level implementation detail.  Completely
> separate files can share blocks like that too on some filesystems.
>
> In what way do the shared data blocks between otherwise separate files
> have any security implication?
>
> (Ok, ok, timing, ENOSPC, covert communications, but independent files
> can trigger such interactions too.)
>
> There's the actual creation of reflinks being invisible to i_nlink
> watchers yet not requiring read permission, which is new.  But that
> has nothing to do with the shared data: it would have the same
> security implication even if reflink was just an ordinary file copy
> with its proposed permission check!  :-)
>   

Yeah, I can see the argument, I'm just not sure that I could
turn around and sell it to an eager-puppy security evaluator
fresh out of a PhD program at the U of Maryland.


* Re: New reflink(2) syscall
  2009-05-06  5:38                                 ` Casey Schaufler
@ 2009-05-06  7:12                                   ` Theodore Tso
  0 siblings, 0 replies; 34+ messages in thread
From: Theodore Tso @ 2009-05-06  7:12 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Jamie Lokier, Stephen Smalley, Chris Mason, James Morris, lsm,
	linux-fsdevel

On Tue, May 05, 2009 at 10:38:51PM -0700, Casey Schaufler wrote:
> It's different from 1000 separate files because I can now have
> one set of data blocks with read access controlled by 1000 different
> users.
> 
>     # chown user000 rfile000
>     ...
>     # chown user999 rfile999
> 
> now 1000 different users can grant access to those blocks,
> so long as they don't change. Without reflink() I know that if
> I own the file, it isn't open (fuser says so) and it is mode 700
> that no one else can read it sans privilege. With reflink() not
> only is this not true, but I can't find out who might be able to
> read it. Changing the permissions, ACL, SELinux label, Smack label,
> or TOMOYO policy won't help, because there may be another inode
> out there somewhere that I can't even access that is granting the
> rest of the world access.

Sure, but if the file is readable by 1000 different users, then they
could each make 1000 different copies of the file.  So the
"reflink-copy-optimization" variant (i.e., do a reflink where the
initial owner is the user doing the reflink, and where the initial ACL
is the destination directory's default creation ACL, and the initial
group ownership, etc. is exactly the same as if you had created a new
file in the destination directory).... then this acts *precisely* the
same as an optimized file copy.  So if you allow someone to do a
"reflink-copy" only when they would be allowed to read the file, it's
merely a low-level optimization.

In contrast, the "reflink-link" variant which OCFS2 has prototyped
acts more like a link --- except it gets a new inode number.  From a
security perspective, you treat this exactly as if it were a link.

In both cases, you treat the quota as if the new file was created,
since the original file could be removed at any time, or the COW link
could be snapped and the file really copied.

> Yeah, I can see the argument, I'm just not sure that I could
> turn around and sell it to an eager-puppy security evaluator
> fresh out of a PhD program at the U of Maryland.

That's going to be true of *any* new filesystem feature, wouldn't it?
I don't think that's a justifiable reason not to implement a new
feature.  In any case, if the security evaluators are that silly, you
can always simply remove the ability to use reflinks altogether.  That
might break some application programs, but if that breaks
some General or Admiral's pet project, I'm sure pressure can be
brought to bear on the security evaluator.  :-)

						- Ted


* Re: New reflink(2) syscall
  2009-05-05 22:15         ` James Morris
  2009-05-05 22:31           ` Joel Becker
@ 2009-05-06 11:23           ` Stephen Smalley
  1 sibling, 0 replies; 34+ messages in thread
From: Stephen Smalley @ 2009-05-06 11:23 UTC (permalink / raw)
  To: James Morris; +Cc: Joel Becker, lsm, linux-fsdevel

On Wed, 2009-05-06 at 08:15 +1000, James Morris wrote:
> On Tue, 5 May 2009, Joel Becker wrote:
> 
> > On Mon, May 04, 2009 at 12:59:39PM -0400, Stephen Smalley wrote:
> > > On Tue, 2009-05-05 at 01:35 +1000, James Morris wrote:
> > > > Agreed, perhaps something like:
> > > > 
> > > > int security_inode_reflink(struct dentry *dentry, struct inode *dir);
> > > 
> > > I'd pass the same arguments as vfs_reflink(), i.e. old_dentry, dir,
> > > new_dentry.
> > 
> > 	I'm about to insert this bit.  I agree with
> > security_inode_reflink(old_dentry, dir, new_dentry),
> 
> If the files and metadata are initially identical (except for inode #), 
> why do we need to see both the old and new dentry?

Fair enough - he can drop the new_dentry argument.  selinux_inode_link()
doesn't use the new_dentry argument to security_inode_link() either.

-- 
Stephen Smalley
National Security Agency



* Re: New reflink(2) syscall
  2009-05-05 22:45                         ` Jamie Lokier
  2009-05-06  4:08                           ` Casey Schaufler
@ 2009-05-06 11:25                           ` Stephen Smalley
  1 sibling, 0 replies; 34+ messages in thread
From: Stephen Smalley @ 2009-05-06 11:25 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Theodore Tso, Chris Mason, James Morris, lsm, linux-fsdevel

On Tue, 2009-05-05 at 23:45 +0100, Jamie Lokier wrote:
> Stephen Smalley wrote:
> > Not arguing against this, but just to note:  the security model will
> > differ depending on these flags, as the link-like case doesn't require
> > the caller to have read access to the file (the data is no more
> > accessible than it was before)
> 
> One security difference between reflink() and link() when linking to
> _other_ user's files is they can tell if you suddenly got a link to
> their file, from their i_nlink.  They can be suspicious and maybe
> overwrite the file in place, truncate it or something, and look around
> for the link you created in a secret place in your /home.
> 
> But they can't see if you got a reflink to their file.
> 
> Even though you can't read the file if you couldn't read it before,
> you now have a link to it which might preserve data they don't want to
> be preserved.
> 
> So reflink() should, perhaps, be more restricted than link().

That's why I suggested is_owner_or_cap() or a similar test in the case
where reflink(2) is applied to an inode owned by a user other than the
caller's fsuid.
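
As a rough userspace sketch of that test: in kernels of this era,
is_owner_or_cap(inode) compares current_fsuid() with inode->i_uid and
otherwise falls back to capable(CAP_FOWNER).  The model below mirrors
that logic with hypothetical types (cred_model, owned_inode_model) and
a hypothetical wrapper, may_reflink_model(); where such a check would
actually sit in vfs_reflink(), and which errno it would return, are
assumptions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the caller's credentials. */
struct cred_model {
	unsigned int fsuid;	/* filesystem uid of the caller */
	bool cap_fowner;	/* models capable(CAP_FOWNER) */
};

/* Hypothetical stand-in for the source inode. */
struct owned_inode_model {
	unsigned int i_uid;	/* owner of the file being reflinked */
};

/* Models is_owner_or_cap(): the caller owns the inode or has CAP_FOWNER. */
static bool is_owner_or_cap_model(const struct cred_model *cred,
				  const struct owned_inode_model *inode)
{
	return cred->fsuid == inode->i_uid || cred->cap_fowner;
}

/*
 * Assumed placement: refuse to reflink another user's file unless the
 * caller is privileged.  Returns 0 or a negative errno.
 */
static int may_reflink_model(const struct cred_model *cred,
			     const struct owned_inode_model *inode)
{
	return is_owner_or_cap_model(cred, inode) ? 0 : -EPERM;
}
```

With this in place, an unprivileged caller could not quietly preserve
another user's data, which addresses the i_nlink visibility concern
raised above.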

-- 
Stephen Smalley
National Security Agency



end of thread, other threads:[~2009-05-06 11:29 UTC | newest]

Thread overview: 34+ messages
     [not found] <alpine.LRH.2.00.0905041655220.21713@tundra.namei.org>
     [not found] ` <1241443016.3023.51.camel@localhost.localdomain>
2009-05-04 15:35   ` New reflink(2) syscall James Morris
2009-05-04 16:59     ` Stephen Smalley
2009-05-04 17:49       ` Joel Becker
2009-05-05 18:00       ` Joel Becker
2009-05-05 18:41         ` Stephen Smalley
2009-05-05 19:15           ` Joel Becker
2009-05-05 19:14             ` Stephen Smalley
2009-05-05 19:33               ` Joel Becker
2009-05-05 22:15         ` James Morris
2009-05-05 22:31           ` Joel Becker
2009-05-06 11:23           ` Stephen Smalley
     [not found]   ` <20090504163514.GB31249@mail.oracle.com>
     [not found]     ` <1241458669.3023.203.camel@localhost.localdomain>
2009-05-04 18:08       ` Joel Becker
2009-05-04 19:30         ` Stephen Smalley
2009-05-04 21:03           ` Joel Becker
2009-05-04 21:30             ` Joel Becker
2009-05-05 11:44               ` Stephen Smalley
2009-05-05 16:46                 ` Joel Becker
2009-05-04 23:13             ` Theodore Tso
2009-05-05 16:47               ` Joel Becker
2009-05-05 16:56                 ` Chris Mason
2009-05-05 17:13                   ` Joel Becker
2009-05-05 17:34                     ` Theodore Tso
2009-05-05 17:44                       ` Stephen Smalley
2009-05-05 17:56                         ` Joel Becker
2009-05-05 18:21                           ` Theodore Tso
2009-05-06  4:27                             ` Casey Schaufler
2009-05-06  4:42                               ` Jamie Lokier
2009-05-06  5:38                                 ` Casey Schaufler
2009-05-06  7:12                                   ` Theodore Tso
2009-05-05 22:45                         ` Jamie Lokier
2009-05-06  4:08                           ` Casey Schaufler
2009-05-06  4:28                             ` Jamie Lokier
2009-05-06 11:25                           ` Stephen Smalley
2009-05-05 17:36                     ` Chris Mason
