* Re: [PATCH] Initial support for struct vfs_cred [0/1] [not found] <1030820234.4408.119.camel@ldb> @ 2002-08-31 19:36 ` Linus Torvalds 2002-08-31 19:38 ` Luca Barbieri 0 siblings, 1 reply; 23+ messages in thread From: Linus Torvalds @ 2002-08-31 19:36 UTC (permalink / raw) To: Luca Barbieri; +Cc: trond.myklebust, Linux FSdevel, Linux Kernel On 31 Aug 2002, Luca Barbieri wrote: > > But aren't credential changes supposed to be infrequent? > > If so, isn't it faster to put the fields directly in task_struct, keep a > separate shared structure and copy them on kernel entry? But that makes us copy them every time, even though they practically never change.. Much better to only copy them in the extremely rare cases when they do change. Linus ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-08-31 19:36 ` [PATCH] Initial support for struct vfs_cred [0/1] Linus Torvalds @ 2002-08-31 19:38 ` Luca Barbieri 2002-08-31 22:30 ` Trond Myklebust 0 siblings, 1 reply; 23+ messages in thread From: Luca Barbieri @ 2002-08-31 19:38 UTC (permalink / raw) To: Linus Torvalds; +Cc: trond.myklebust, Linux FSdevel, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 1290 bytes --] On Sat, 2002-08-31 at 21:36, Linus Torvalds wrote: > > On 31 Aug 2002, Luca Barbieri wrote: > > > > But aren't credential changes supposed to be infrequent? > > > > If so, isn't it faster to put the fields directly in task_struct, keep a > > separate shared structure and copy them on kernel entry? > > But that makes us copy them every time, even though they practically never > change.. Much better to only copy them in the extremely rare cases when > they do change. Sorry, I have explained myself incorrectly. When credentials are changed, the changing task walks the list of tasks sharing credentials with him and sets the propagate_cred flag in their thread_info's. The assembly code at entry checks this. It's just two instructions, one memory read: cmpl $0, propagate_cred(%ebx) jnz do_propagate_cred We could use the flags field, but we need atomic and/or and we still don't have them for all architectures. We could even merge this with the syscall trace check (but that brings more complexity to avoid races). Then the rest of the code doesn't need to know at all that credentials are shared and is simpler and faster. We have however a larger penalty on credential change but, as you say, that's extremely rare (well, perhaps not necessarily extremely, but still rare). [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-08-31 19:38 ` Luca Barbieri @ 2002-08-31 22:30 ` Trond Myklebust 2002-08-31 23:13 ` Luca Barbieri 0 siblings, 1 reply; 23+ messages in thread From: Trond Myklebust @ 2002-08-31 22:30 UTC (permalink / raw) To: Luca Barbieri; +Cc: Linus Torvalds, Linux FSdevel, Linux Kernel >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > Then the rest of the code doesn't need to know at all that > credentials are shared and is simpler and faster. We have > however a larger penalty on credential change but, as you say, > that's extremely rare (well, perhaps not necessarily extremely, > but still rare). What if I, in a fit of madness/perversion, decide to use CLONE_CRED between 2 kernel threads (i.e. no 'kernel entry')? Leaving CLONE_CRED aside, please do not forget that most of the motivation for vfs_cred is the need to *cache* credentials. This is something which we already do today in several filesystems: Coda, Intermezzo, NFS, to name but the most obvious. The result of the lack of a VFS-sanctioned credential is that we have to use 'struct file' as a vehicle for passing credentials in, for instance, the address_space_operations, and that each filesystem ends up having to keep its own private copies of those credentials in file->private_data. Cheers, Trond ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-08-31 22:30 ` Trond Myklebust @ 2002-08-31 23:13 ` Luca Barbieri 2002-09-01 13:03 ` Trond Myklebust 2002-09-01 15:15 ` Daniel Phillips 0 siblings, 2 replies; 23+ messages in thread From: Luca Barbieri @ 2002-08-31 23:13 UTC (permalink / raw) To: trond.myklebust; +Cc: Linus Torvalds, Linux FSdevel, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 3512 bytes --] On Sun, 2002-09-01 at 00:30, Trond Myklebust wrote: > >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > > > Then the rest of the code doesn't need to know at all that > > credentials are shared and is simpler and faster. We have > > however a larger penalty on credential change but, as you say, > > that's extremely rare (well, perhaps not necessarily extremely, > > but still rare). > > What if I, in a fit of madness/perversion, decide to use CLONE_CRED > between 2 kernel threads (i.e. no 'kernel entry')? You don't or you manually patch the task_struct of the other threads. This isn't a serious concern. > Leaving CLONE_CRED aside, please do not forget that most of the > motivation for vfs_cred is the need to *cache* credentials. > This is something which we already do today in several filesystems: > Coda, Intermezzo, NFS, to name but the most obvious. > The result of the lack of a VFS-sanctioned credential is that we have > to use 'struct file' as a vehicle for passing credentials in, for > instance, the address_space_operations, and that each filesystem ends > up having to keep its own private copies of those credentials in > file->private_data. I'm not proposing to not use vfs_cred (the pseudocode I written includes it). I'm just suggesting a way to speed up access to credentials. Your code does get_current_vfscred()/put_vfscred() in normal paths and accesses credentials with an indirect pointer. Furthermore, setting the uid and gid are very expensive since they need to break copy-on-write. If we instead only allow changes to credentials on kernel mode entry, we don't have to worry about changing credentials and the code becomes simpler and faster. It also seems better to only copy-on-write groups and just copy uid and gid. For example, rather than this; - uid_t saved_fsuid = current->fsuid; + struct vfs_cred *saved_cred = get_current_vfscred(); kernel_cap_t saved_cap = current->cap_effective; /* Create RPC socket as root user so we get a priv port */ - current->fsuid = 0; + setfsuid(0); cap_raise (current->cap_effective, CAP_NET_BIND_SERVICE); xprt = xprt_create_proto(host->h_proto, &host->h_addr, NULL); - current->fsuid = saved_fsuid; + set_current_vfscred(saved_cred); + put_vfscred(saved_cred); you can just do this: - uid_t saved_fsuid = current->fsuid; + uid_t saved_fsuid = current->fscred.uid; kernel_cap_t saved_cap = current->cap_effective; /* Create RPC socket as root user so we get a priv port */ - current->fsuid = 0; + current->fscred.uid = 0; cap_raise (current->cap_effective, CAP_NET_BIND_SERVICE); xprt = xprt_create_proto(host->h_proto, &host->h_addr, NULL); - current->fsuid = saved_fsuid; + current->fscred.uid = saved_fsuid; (here the fscred is the one called cred in the pseudocode: fscred is a better name). This code uses vfs_cred but is as fast as the old one while the one in your patch is significantly slower (e.g. compare your setfsuid with current->fscred.uid = xxx). If you then need to keep the vfs_cred data for longer than a single syscall, just copy it like you would do in the propagation code. That means copying the structure and then incrementing the reference count on groups. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-08-31 23:13 ` Luca Barbieri @ 2002-09-01 13:03 ` Trond Myklebust 2002-09-01 14:10 ` Trond Myklebust 2002-09-01 14:33 ` Luca Barbieri 2002-09-01 15:15 ` Daniel Phillips 1 sibling, 2 replies; 23+ messages in thread From: Trond Myklebust @ 2002-09-01 13:03 UTC (permalink / raw) To: Luca Barbieri; +Cc: Linus Torvalds, Linux FSdevel, Linux Kernel >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > For example, rather than this; <snip> > you can just do this: > - uid_t saved_fsuid = current->fsuid; > + uid_t saved_fsuid = current->fscred.uid; > kernel_cap_t saved_cap = > current->cap_effective; But I don't want to have to do that at all. Why should I change the actual task's privileges in order to call down into a single VFS function? The point of VFS support for credentials is to eliminate these hacks, and cut down on all this gratuitous changing of privilege. That's what we want the API changes for. Who cares if changing fsuid/fsgid is more expensive? The only place we should actually be doing that is in sys_fsuid(), sys_fsgid(), and possibly daemonize(), where adequate security checks can be made. Cheers, Trond ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-09-01 13:03 ` Trond Myklebust @ 2002-09-01 14:10 ` Trond Myklebust 2002-09-01 14:20 ` Luca Barbieri [not found] ` <1030890022.2145.52.camel@ldb> 2002-09-01 14:33 ` Luca Barbieri 1 sibling, 2 replies; 23+ messages in thread From: Trond Myklebust @ 2002-09-01 14:10 UTC (permalink / raw) To: Luca Barbieri; +Cc: Linus Torvalds, Linux FSdevel, Linux Kernel >>>>> " " == Trond Myklebust <trond.myklebust@fys.uio.no> writes: >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: >> For example, rather than this; > <snip> >> you can just do this: >> - uid_t saved_fsuid = current->fsuid; >> + uid_t saved_fsuid = current->fscred.uid; >> kernel_cap_t saved_cap = current-> cap_effective; > But I don't want to have to do that at all. Why should I change Just to follow up on why the above 'optimization' is just plain wrong. You are forgetting that the fscred might simultaneously be referenced by an open struct file. Are you saying that this file should suddenly see its credential change? The alternative without copy on write is to make a full copy of the fscred every time we open a file or schedule some form of asynchronous I/O, and hence need to cache the current VFS credentials. The copy-on-write rule is there in order to *minimize* the need to copy the cred. It works because changing the cred's entries is supposed to be a *rare* occurrence, whereas taking references and reading are common. Cheers, Trond ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-09-01 14:10 ` Trond Myklebust @ 2002-09-01 14:20 ` Luca Barbieri [not found] ` <1030890022.2145.52.camel@ldb> 1 sibling, 0 replies; 23+ messages in thread From: Luca Barbieri @ 2002-09-01 14:20 UTC (permalink / raw) To: trond.myklebust; +Cc: Linus Torvalds, Linux FSdevel, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 1724 bytes --] On Sun, 2002-09-01 at 16:10, Trond Myklebust wrote: > >>>>> " " == Trond Myklebust <trond.myklebust@fys.uio.no> writes: > > >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > >> For example, rather than this; > > <snip> > > >> you can just do this: > >> - uid_t saved_fsuid = current->fsuid; > >> + uid_t saved_fsuid = current->fscred.uid; > >> kernel_cap_t saved_cap = > current-> cap_effective; > > > But I don't want to have to do that at all. Why should I change > > Just to follow up on why the above 'optimization' is just plain wrong. > > You are forgetting that the fscred might simultaneously be referenced > by an open struct file. Are you saying that this file should suddenly > see its credential change? No, it cannot be referenced by an open struct file because you copy the structure, not pointers to it. > The alternative without copy on write is to make a full copy of the > fscred every time we open a file or schedule some form of asynchronous > I/O, and hence need to cache the current VFS credentials. Yes. Note however that the structure is like this: struct vfs_cred { uid_t uid, gid; vfs_cred_groups* groups; } vfs_cred_copy(struct vfs_cred* dest, struct vfs_cred* src) { dest->uid = src->uid; dest->gid = src->gid; dest->groups = src->groups; atomic_inc(&src->groups->count); } Of course if you don't need groups you can avoid copying them. This is efficient if you usually either check the uid and gid or copy only them (omitting groups). If instead copying the whole structure is more frequent than the operations described above, then use reference counting and copy-on-write for the whole structure, but I don't think that this is the case. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <1030890022.2145.52.camel@ldb>]
* Re: [PATCH] Initial support for struct vfs_cred [0/1] [not found] ` <1030890022.2145.52.camel@ldb> @ 2002-09-01 16:40 ` Trond Myklebust [not found] ` <15730.17171.162970.367575@charged.uio.no> 1 sibling, 0 replies; 23+ messages in thread From: Trond Myklebust @ 2002-09-01 16:40 UTC (permalink / raw) To: Luca Barbieri; +Cc: Linus Torvalds, Linux FSdevel, Linux Kernel >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: >> You are forgetting that the fscred might simultaneously be >> referenced by an open struct file. Are you saying that this >> file should suddenly see its credential change? > No, it cannot be referenced by an open struct file because you > copy the structure, not pointers to it. So you are proposing to optimize for the rare case of setuid(), instead of the more common case of file open()? Cheers, Trond ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <15730.17171.162970.367575@charged.uio.no>]
* Re: [PATCH] Initial support for struct vfs_cred [0/1] [not found] ` <15730.17171.162970.367575@charged.uio.no> @ 2002-09-01 18:54 ` Luca Barbieri 2002-09-01 19:40 ` Trond Myklebust [not found] ` <15730.27952.29723.552617@charged.uio.no> 0 siblings, 2 replies; 23+ messages in thread From: Luca Barbieri @ 2002-09-01 18:54 UTC (permalink / raw) To: trond.myklebust; +Cc: Linus Torvalds, Linux FSdevel, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 1539 bytes --] On Sun, 2002-09-01 at 18:40, Trond Myklebust wrote: > >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > > >> You are forgetting that the fscred might simultaneously be > >> referenced by an open struct file. Are you saying that this > >> file should suddenly see its credential change? > > No, it cannot be referenced by an open struct file because you > > copy the structure, not pointers to it. > > So you are proposing to optimize for the rare case of setuid(), > instead of the more common case of file open()? No, this does not optimize for a setuid. It allows to easily make temporary modifications to the credentials but that's not the main focus. Permanent modifications, if credentials are shared, would need to walk the shared-cred tasklist and set the propagation flag on all tasks on it so I'm surely not proposing to optimize for them. And, in the common case of open, why do you need to copy the structure to check permissions? I think that open should just check the current values. open might want to copy credentials in case you want to do the inode lookup asynchronously but then it doesn't make sense to optimize for this since you already have the huge disk read penalty. BTW, the 2.5.32 open does the check in vfs_permission without copying anything. Anyway it's just a 3 long copy plus an atomic inc vs. 1 long copy and atomic inc. And if you don't need the groups array, it's just a 2 longs copy that on some architectures with very slow atomic operations (e.g. sparc) is much better. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-09-01 18:54 ` Luca Barbieri @ 2002-09-01 19:40 ` Trond Myklebust [not found] ` <15730.27952.29723.552617@charged.uio.no> 1 sibling, 0 replies; 23+ messages in thread From: Trond Myklebust @ 2002-09-01 19:40 UTC (permalink / raw) To: Luca Barbieri; +Cc: Linux FSdevel, Linux Kernel >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > And, in the common case of open, why do you need to copy the > structure to check permissions? I think that open should just > check the current values. open might want to copy credentials Because, as has been explained to you, we have things like Coda, Intermezzo, NFS, for which this is insufficient. > in case you want to do the inode lookup asynchronously but then > it doesn't make sense to optimize for this since you already > have the huge disk read penalty. BTW, the 2.5.32 open does the > check in vfs_permission without copying anything. Anyway it's > just a 3 long copy plus an atomic inc vs. 1 long copy and > atomic inc. And if you don't need the groups array, it's just > a 2 longs copy that on some architectures with very slow atomic > operations (e.g. sparc) is much better. But we we do need to check the groups array in the VFS. And as Linus pointed out, there is a good case for passing info from the user_struct too (crypto), etc... Cheers, Trond ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <15730.27952.29723.552617@charged.uio.no>]
* Re: [PATCH] Initial support for struct vfs_cred [0/1] [not found] ` <15730.27952.29723.552617@charged.uio.no> @ 2002-09-01 21:34 ` Luca Barbieri [not found] ` <1030916061.2145.344.camel@ldb> 1 sibling, 0 replies; 23+ messages in thread From: Luca Barbieri @ 2002-09-01 21:34 UTC (permalink / raw) To: trond.myklebust; +Cc: Linux FSdevel, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 1703 bytes --] On Sun, 2002-09-01 at 21:40, Trond Myklebust wrote: > >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > > > And, in the common case of open, why do you need to copy the > > structure to check permissions? I think that open should just > > check the current values. open might want to copy credentials > > Because, as has been explained to you, we have things like Coda, > Intermezzo, NFS, for which this is insufficient. If they only need them at open, and the open is synchronous, you don't need to copy them. Otherwise, if you need them in other syscall, or in another context, you probably need to copy them, that only costs an additional 2 long copies. > > in case you want to do the inode lookup asynchronously but then > > it doesn't make sense to optimize for this since you already > > have the huge disk read penalty. BTW, the 2.5.32 open does the > > check in vfs_permission without copying anything. Anyway it's > > just a 3 long copy plus an atomic inc vs. 1 long copy and > > atomic inc. And if you don't need the groups array, it's just > > a 2 longs copy that on some architectures with very slow atomic > > operations (e.g. sparc) is much better. > > But we we do need to check the groups array in the VFS. And as Linus > pointed out, there is a good case for passing info from the > user_struct too (crypto), etc... There is copy-on-write for the groups array. The idea is that anything that takes little time to use should be in the copy-always structure, while things that need a long time to access (e.g. groups, lsm data) and/or are big, stay in the copy-on-write structure referenced from the copy-always one. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <1030916061.2145.344.camel@ldb>]
* Re: [PATCH] Initial support for struct vfs_cred [0/1] [not found] ` <1030916061.2145.344.camel@ldb> @ 2002-09-01 21:56 ` Trond Myklebust [not found] ` <15730.36080.987645.452664@charged.uio.no> 1 sibling, 0 replies; 23+ messages in thread From: Trond Myklebust @ 2002-09-01 21:56 UTC (permalink / raw) To: Luca Barbieri; +Cc: Linux FSdevel, Linux Kernel >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: >> Because, as has been explained to you, we have things like >> Coda, Intermezzo, NFS, for which this is insufficient. > If they only need them at open, and the open is synchronous, > you don't need to copy them. Bullshit. You clearly haven't got a clue what you are talking about. For all these 3 systems credentials need to be cached from file open to file close. YES EVEN NOW, WITH NO CLONE_CRED AND WITH SYNCHRONOUS OPEN !!!! On something like NFS or Coda, the server needs to receive authentication information for each and every RPC call we send to it. There's no state. The server does not know that we have opened a file. Currently this is done by the NFS client hiding information in the file's private_data field. This means that other people that want to do write-through-caching etc are in trouble 'cos they have to cope with the fact that NFS has its private field, Coda has its private field,... And they are all doing the same thing in different ways. Now stop trolling Trond ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <15730.36080.987645.452664@charged.uio.no>]
* Re: [PATCH] Initial support for struct vfs_cred [0/1] [not found] ` <15730.36080.987645.452664@charged.uio.no> @ 2002-09-01 22:50 ` Luca Barbieri 0 siblings, 0 replies; 23+ messages in thread From: Luca Barbieri @ 2002-09-01 22:50 UTC (permalink / raw) To: trond.myklebust; +Cc: Linux FSdevel, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 2519 bytes --] On Sun, 2002-09-01 at 23:56, Trond Myklebust wrote: > >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > > >> Because, as has been explained to you, we have things like > >> Coda, Intermezzo, NFS, for which this is insufficient. > > If they only need them at open, and the open is synchronous, > > you don't need to copy them. > > Bullshit. You clearly haven't got a clue what you are talking about. > For all these 3 systems credentials need to be cached from file open > to file close. > > YES EVEN NOW, WITH NO CLONE_CRED AND WITH SYNCHRONOUS OPEN !!!! > > On something like NFS or Coda, the server needs to receive > authentication information for each and every RPC call we send to > it. There's no state. The server does not know that we have opened a > file. But then in the _open_ syscall you don't need to send them from a copy. However, since you need them in the later syscalls, you need to copy them to the file structure for the later syscalls in open, but you don't need to use copied credentials for the operation of opening a file (assuming it's done synchronously within sys_open). Anyway this is only relevant to decide whether to always copy uid and gid or to use copy-write on them and access them with an extra memory read (to read the pointer to the copy-on-write structure), which is not the main issue. BTW, imho a correctly designed network filesystem should have a single stateful encrypted connection (or a pool of equally authenticated ones) and credentials (i.e. passwords) should only be passed when the user makes the first filesystem access. After that the server should do authentication with the OR of all credentials received and the client kernel should further decide whether it can give access to a particular user. This is off-topic here, though. > Currently this is done by the NFS client hiding information in the > file's private_data field. This means that other people that want to > do write-through-caching etc are in trouble 'cos they have to cope > with the fact that NFS has its private field, Coda has its private > field,... And they are all doing the same thing in different ways. Yes, I agree with the need to provide copy-on-write. I just disagree with the idea that code should be written to handle changing credentials and that credentials should be passed as parameters and I suggest that always copying uid and gid might be better since accessing uid and gid is more frequent and happens in faster paths than copying credentials. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-09-01 13:03 ` Trond Myklebust 2002-09-01 14:10 ` Trond Myklebust @ 2002-09-01 14:33 ` Luca Barbieri 2002-09-01 16:38 ` Trond Myklebust 1 sibling, 1 reply; 23+ messages in thread From: Luca Barbieri @ 2002-09-01 14:33 UTC (permalink / raw) To: trond.myklebust; +Cc: Linus Torvalds, Linux FSdevel, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 1438 bytes --] On Sun, 2002-09-01 at 15:03, Trond Myklebust wrote: > >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > > > For example, rather than this; > <snip> > > > you can just do this: > > - uid_t saved_fsuid = current->fsuid; > > + uid_t saved_fsuid = current->fscred.uid; > > kernel_cap_t saved_cap = > > current->cap_effective; > > But I don't want to have to do that at all. Why should I change the > actual task's privileges in order to call down into a single VFS > function? > The point of VFS support for credentials is to eliminate these hacks, > and cut down on all this gratuitous changing of privilege. That's what > we want the API changes for. That's what the code did, unless I misunderstood it. Anyway if you want to give a different fsuid to a filesystem function, you either pass credentials as a parameter (that means that you change all the functions in the call chain to do that) or you modify a structure present or referenced from the task_struct (or you disable preemption and use a per-cpu variable, but this would be a very bad idea here). Furthermore if no one except the current task can access/modify the task credentials the "gratuitous changing of privileges" is only seen by the VFS function you call, so it is fine (it may not be conceptually elegant, but that should not be preferred over being fast and minimally intrusive). [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-09-01 14:33 ` Luca Barbieri @ 2002-09-01 16:38 ` Trond Myklebust 2002-09-01 18:42 ` Luca Barbieri 0 siblings, 1 reply; 23+ messages in thread From: Trond Myklebust @ 2002-09-01 16:38 UTC (permalink / raw) To: Luca Barbieri; +Cc: Linus Torvalds, Linux FSdevel, Linux Kernel >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > That's what the code did, unless I misunderstood it. Anyway if > you want to give a different fsuid to a filesystem function, > you either pass credentials as a parameter (that means that you > change all the functions in the call chain to do that) or you If you read through the beginning of the thread, you will see that this is *exactly* what I'm proposing to do. Just not in this one already very large patch. Cheers, Trond ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-09-01 16:38 ` Trond Myklebust @ 2002-09-01 18:42 ` Luca Barbieri 2002-09-01 19:25 ` Trond Myklebust 0 siblings, 1 reply; 23+ messages in thread From: Luca Barbieri @ 2002-09-01 18:42 UTC (permalink / raw) To: trond.myklebust; +Cc: Linus Torvalds, Linux FSdevel, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 1386 bytes --] On Sun, 2002-09-01 at 18:38, Trond Myklebust wrote: > >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > > > That's what the code did, unless I misunderstood it. Anyway if > > you want to give a different fsuid to a filesystem function, > > you either pass credentials as a parameter (that means that you > > change all the functions in the call chain to do that) or you > > If you read through the beginning of the thread, you will see that > this is *exactly* what I'm proposing to do. Just not in this one > already very large patch. But you'll need to modify the declaration of the various function pointers whose implementations might need credentials and modify all functions that call them and deal with permissions. Instead with my proposal the credentials are automatically immutable across the syscall without needing to worry at all about locks, counts and sharing. And remember that passing parameters has a cost since you need to push them and the stack will be larger, occupy more cachelines, and thus cause more cache misses. All this without any particular reason to do since normally the credentials are always the same. Of course, if you have reason to think that we'll need to call VFS functions with a lot of differential credential then you are right, but otherwise you would add just a lot of useless overhead and code modifications. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-09-01 18:42 ` Luca Barbieri @ 2002-09-01 19:25 ` Trond Myklebust 2002-09-01 21:36 ` Luca Barbieri 0 siblings, 1 reply; 23+ messages in thread From: Trond Myklebust @ 2002-09-01 19:25 UTC (permalink / raw) To: Luca Barbieri; +Cc: Linux FSdevel, Linux Kernel >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > But you'll need to modify the declaration of the various > function pointers whose implementations might need credentials > and modify all functions that call them and deal with > permissions. Instead with my proposal the credentials are Yes... And this is a useful activity in itself, as the existence of all these hacks that temporarily change uid/gid/whatever... show. > automatically immutable across the syscall without needing to > worry at all about locks, counts and sharing. I still have no opinion about your proposal for implementing CLONE_CRED. What I fail to see is why you appear to insist it would be incompatible with the idea of copy-on-write VFS credentials (which I explained are interesting for other purposes). I also fail to understand why you insist that we need to drop the idea of copy-on-write credentials in order to optimize for this fringe case in which somebody calls sys_access() or exec with euid != fsuid. Now repeat after me "changing fsuid/fsgid is *not* the common case that needs optimization." Trond ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-09-01 19:25 ` Trond Myklebust @ 2002-09-01 21:36 ` Luca Barbieri 0 siblings, 0 replies; 23+ messages in thread From: Luca Barbieri @ 2002-09-01 21:36 UTC (permalink / raw) To: trond.myklebust; +Cc: Linux FSdevel, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 3795 bytes --] On Sun, 2002-09-01 at 21:25, Trond Myklebust wrote: > >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > > > But you'll need to modify the declaration of the various > > function pointers whose implementations might need credentials > > and modify all functions that call them and deal with > > permissions. Instead with my proposal the credentials are > > Yes... And this is a useful activity in itself, as the existence of > all these hacks that temporarily change uid/gid/whatever... show. I disagree. It reduces performance, and results in huge patches. Furthermore you can't predict whether the target of a function pointer will need credentials or not. > > automatically immutable across the syscall without needing to > > worry at all about locks, counts and sharing. > > I still have no opinion about your proposal for implementing CLONE_CRED. > > What I fail to see is why you appear to insist it would be > incompatible with the idea of copy-on-write VFS credentials (which I > explained are interesting for other purposes). I don't insist on that (see below). > I also fail to understand why you insist that we need to drop the idea > of copy-on-write credentials in order to optimize for this fringe case > in which somebody calls sys_access() or exec with euid != fsuid. I do not propose to remove the idea of copy-on-write credentials, just to use copy-on-write only for groups and "copy-always" for uid and gid so that access to uid and gid doesn't need to go through the credentials pointer. You have code like this: + cred = get_current_vfscred(); fdata->fd_do_lml = 0; - fdata->fd_fsuid = current->fsuid; - fdata->fd_fsgid = current->fsgid; + fdata->fd_fsuid = cred->uid; + fdata->fd_fsgid = cred->gid; fdata->fd_mode = file->f_dentry->d_inode->i_mode; - fdata->fd_ngroups = current->ngroups; - for (i=0 ; i<current->ngroups ; i++) - fdata->fd_groups[i] = current->groups[i]; + fdata->fd_ngroups = cred->ngroups; + for (i=0 ; i<cred->ngroups ; i++) + fdata->fd_groups[i] = cred->groups[i]; fdata->fd_bytes_written = 0; /*when open,written data is zero*/ file->private_data = fdata; + put_vfscred(cred); The get/put here only make sense if something else can modify the credentials without it. Otherwise, you can skip them since the reference count will always be > 0 because nothing else can change the current->vfscred pointer, and that will contribute to the reference count. What I'm trying to do is to design credentials in such a way that, even with shared credentials, the get/put won't be necessary. I'm also trying to put such credentials directly in task_struct so that the code doesn't need to get a pointer from task_struct and then fetch the credentials from the structure it points to. The order of likelyhood of credential operations is assumed to be the following: 1. Use/copy/local temporary modification of uid and gid 2. Use/copy/local temporary modification of groups 3. Permanent modification of something * Local temporary modification means that the modification only has effect on the current task and is revert before the end of the syscall To optimize for this I propose to: 1. Put uid and gid directly in task_struct (optimize for 1) 2. Use copy-on-write on groups so that we avoid copying an big array (optimize for 2.copy) 3. Change credentials only when not performing a syscall (optimize for 1 and 2) > "changing fsuid/fsgid is *not* the common case that needs optimization." I completely agree with this, that's the whole point! [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-08-31 23:13 ` Luca Barbieri 2002-09-01 13:03 ` Trond Myklebust @ 2002-09-01 15:15 ` Daniel Phillips 2002-09-01 15:35 ` Luca Barbieri 1 sibling, 1 reply; 23+ messages in thread From: Daniel Phillips @ 2002-09-01 15:15 UTC (permalink / raw) To: Luca Barbieri, trond.myklebust Cc: Linus Torvalds, Linux FSdevel, Linux Kernel On Sunday 01 September 2002 01:13, Luca Barbieri wrote: > On Sun, 2002-09-01 at 00:30, Trond Myklebust wrote: > > >>>>> " " == Luca Barbieri <ldb@ldb.ods.org> writes: > > > > > Then the rest of the code doesn't need to know at all that > > > credentials are shared and is simpler and faster. We have > > > however a larger penalty on credential change but, as you say, > > > that's extremely rare (well, perhaps not necessarily extremely, > > > but still rare). > > > > What if I, in a fit of madness/perversion, decide to use CLONE_CRED > > between 2 kernel threads (i.e. no 'kernel entry')? > You don't or you manually patch the task_struct of the other threads. > This isn't a serious concern. It is a serious concern. Inventing new, subtle behavior differences between user and kernel threads is, in a word, gross. It's certain to bite people in the future. -- Daniel ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] 2002-09-01 15:15 ` Daniel Phillips @ 2002-09-01 15:35 ` Luca Barbieri 0 siblings, 0 replies; 23+ messages in thread From: Luca Barbieri @ 2002-09-01 15:35 UTC (permalink / raw) To: Daniel Phillips Cc: trond.myklebust, Linus Torvalds, Linux FSdevel, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 420 bytes --] > It is a serious concern. Inventing new, subtle behavior differences > between user and kernel threads is, in a word, gross. It's certain > to bite people in the future. So you are suggesting that it's better to slow down *all* threads so that it's possible to have kernel threads with automatically shared credentials? BTW, signals and rescheduling (unless PREEMPT=y) don't work automatically for the same reason. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <15728.61345.184030.293634@charged.uio.no>]
* Re: [PATCH] Initial support for struct vfs_cred [0/1] [not found] <15728.61345.184030.293634@charged.uio.no> @ 2002-08-31 18:57 ` Luca Barbieri 2002-08-31 19:51 ` Luca Barbieri 1 sibling, 0 replies; 23+ messages in thread From: Luca Barbieri @ 2002-08-31 18:57 UTC (permalink / raw) To: trond.myklebust; +Cc: Linus Torvalds, Linux FSdevel, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 2278 bytes --] But aren't credential changes supposed to be infrequent? If so, isn't it faster to put the fields directly in task_struct, keep a separate shared structure and copy them on kernel entry? Here is some pseudocode to better illustrate the idea. vfs_shared_cred::lock can be removed if it's not necessary to do atomic modifications of multiple cred fields (otherwise you could copy and then exchange a pointer). vfs_shared_cred::tasklist_lock can be replaced with RCU. struct vfs_cred_groups { int ngroups; gid_t groups[]; }; struct vfs_cred { uid_t uid, gid; struct vfs_cred_groups* groups; }; struct vfs_shared_cred { atomic_t count; spinlock_t lock; spinlock_t tasklist_lock; vfs_cred cred; }; struct task_struct { struct vfs_cred cred; struct vfs_cred* shared_cred; list_t shared_cred_list; }; struct thread_info { unsigned propagate_cred; }; propagate_cred() { current_thread_info()->propagate_cred = 0; spin_lock(¤t->shared_cred->lock); current->cred = current->shared_cred->cred; spin_unlock(¤t->shared_cred->lock); } kernel_entry() /* asm for every architecture :( */ { if(unlikely(current_thread_info()->propagate_cred)) propagate_cred(); } change_credentials() { list_t list; spin_lock(¤t->shared_cred->lock); frob(current->shared_cred); spin_unlock(¤t->shared_cred->lock); wmb(); spin_lock(¤t->shared_cred->tasklist_lock); list_for_each(list, ¤t->shared_cred_list) { struct task_struct* task = list_entry(list, struct task_struct, shared_cred_list); task->thread_info->propagate_cred = 1; } spin_unlock(¤t->shared_cred->tasklist_lock); } clone() { spin_lock(¤t->shared_cred->lock); new->cred = current->shared_cred->cred); /* not current->cred! */ spin_unlock(¤t->shared_cred->lock); if(flags & CLONE_CRED) { new->shared_cred = get_shared_cred(current->shared_cred); spin_lock(¤t->shared_cred->tasklist_lock); list_add(new, ¤t->shared_cred_list); spin_unlock(¤t->shared_cred->tasklist_lock); } } static void release_task(struct task_struct * p) { spin_lock(¤t->shared_cred->tasklist_lock); __list_del(current->shared_cred_list); spin_unlock(¤t->shared_cred->tasklist_lock); put_shared_cred(current->shared_cred); } [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Initial support for struct vfs_cred [0/1] [not found] <15728.61345.184030.293634@charged.uio.no> 2002-08-31 18:57 ` Luca Barbieri @ 2002-08-31 19:51 ` Luca Barbieri 1 sibling, 0 replies; 23+ messages in thread From: Luca Barbieri @ 2002-08-31 19:51 UTC (permalink / raw) To: trond.myklebust; +Cc: Linus Torvalds, Linux FSdevel, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 420 bytes --] Forgot to mention that, of course, the vfs_cred_groups struct needs to be recreated if modified and we must keep an atomic reference count in it. This allows to avoid having to copy the whole groups array for each task on modification. uid and gid instead are placed directly in task_struct since checking them is faster and thus needs more optimization. Same reasoning applies for eventual ACLs or similar structures. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] Initial support for struct vfs_cred [0/1]
@ 2002-08-31 16:32 Trond Myklebust
0 siblings, 0 replies; 23+ messages in thread
From: Trond Myklebust @ 2002-08-31 16:32 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Linux FSdevel, Linux Kernel
Hi Linus,
I hope this addresses all the issues that you raised (minus
user_struct, as I explained). Changes w.r.t. previous patches are as
follows:
1) Dropped current_fsuid(), current_fsgid(), current_ngroups().
=> had to merge patches [1/3] & [3/3]. Makes for a rather large
initial patch (see diffstat output below) 8-(
2) 'struct ucred' renamed to 'struct vfs_cred'. Renamed the
helper routines to reflect the new struct name.
3) Moved #ifdef __KERNEL__
4) Renamed the remaining current_* routines so as to drop the
'current_' prefix where appropriate (e.g. setfsuid() instead of
current_setfsuid(), ...).
Hopefully the naming scheme will be seen as an improvement.
5) Dropped patch [2/3]. There is no longer a namespace conflict
to justify renaming the networking code's 'struct ucred'.
Cheers,
Trond
arch/s390x/kernel/linux32.c | 9
arch/sparc64/kernel/sys_sparc32.c | 9
drivers/hotplug/pci_hotplug_core.c | 4
drivers/isdn/capi/capifs.c | 4
drivers/usb/core/inode.c | 4
fs/affs/inode.c | 4
fs/attr.c | 6
fs/bfs/dir.c | 4
fs/coda/coda_linux.c | 6
fs/devpts/inode.c | 4
fs/dquot.c | 2
fs/driverfs/inode.c | 4
fs/exec.c | 6
fs/ext2/balloc.c | 2
fs/ext2/ialloc.c | 4
fs/ext2/ioctl.c | 4
fs/ext3/balloc.c | 2
fs/ext3/ialloc.c | 4
fs/ext3/ioctl.c | 4
fs/file_table.c | 8
fs/hpfs/namei.c | 24 -
fs/intermezzo/file.c | 13 -
fs/intermezzo/journal.c | 54 ++--
fs/intermezzo/kml_reint.c | 12
fs/intermezzo/upcall.c | 2
fs/intermezzo/vfs.c | 4
fs/jffs/inode-v23.c | 28 +-
fs/jffs2/fs.c | 4
fs/jfs/jfs_inode.c | 4
fs/lockd/host.c | 7
fs/locks.c | 2
fs/minix/bitmap.c | 4
fs/namei.c | 8
fs/nfsd/auth.c | 11
fs/nfsd/vfs.c | 4
fs/open.c | 17 -
fs/pipe.c | 4
fs/proc/array.c | 11
fs/proc/base.c | 4
fs/ramfs/inode.c | 4
fs/reiserfs/namei.c | 4
fs/sysv/ialloc.c | 4
fs/udf/ialloc.c | 4
fs/udf/namei.c | 2
fs/ufs/ialloc.c | 4
fs/umsdos/namei.c | 8
include/linux/cred.h | 88 ++++++
include/linux/init_task.h | 1
include/linux/sched.h | 8
init/main.c | 1
kernel/Makefile | 5
kernel/cred.c | 468 +++++++++++++++++++++++++++++++++++++
kernel/fork.c | 17 +
kernel/kmod.c | 6
kernel/sys.c | 65 ++---
kernel/uid16.c | 9
mm/shmem.c | 8
net/socket.c | 4
net/sunrpc/auth_unix.c | 55 ++--
net/sunrpc/sched.c | 2
security/capability.c | 4
security/dummy.c | 5
62 files changed, 835 insertions(+), 252 deletions(-)
^ permalink raw reply [flat|nested] 23+ messages in threadend of thread, other threads:[~2002-09-01 22:50 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1030820234.4408.119.camel@ldb>
2002-08-31 19:36 ` [PATCH] Initial support for struct vfs_cred [0/1] Linus Torvalds
2002-08-31 19:38 ` Luca Barbieri
2002-08-31 22:30 ` Trond Myklebust
2002-08-31 23:13 ` Luca Barbieri
2002-09-01 13:03 ` Trond Myklebust
2002-09-01 14:10 ` Trond Myklebust
2002-09-01 14:20 ` Luca Barbieri
[not found] ` <1030890022.2145.52.camel@ldb>
2002-09-01 16:40 ` Trond Myklebust
[not found] ` <15730.17171.162970.367575@charged.uio.no>
2002-09-01 18:54 ` Luca Barbieri
2002-09-01 19:40 ` Trond Myklebust
[not found] ` <15730.27952.29723.552617@charged.uio.no>
2002-09-01 21:34 ` Luca Barbieri
[not found] ` <1030916061.2145.344.camel@ldb>
2002-09-01 21:56 ` Trond Myklebust
[not found] ` <15730.36080.987645.452664@charged.uio.no>
2002-09-01 22:50 ` Luca Barbieri
2002-09-01 14:33 ` Luca Barbieri
2002-09-01 16:38 ` Trond Myklebust
2002-09-01 18:42 ` Luca Barbieri
2002-09-01 19:25 ` Trond Myklebust
2002-09-01 21:36 ` Luca Barbieri
2002-09-01 15:15 ` Daniel Phillips
2002-09-01 15:35 ` Luca Barbieri
[not found] <15728.61345.184030.293634@charged.uio.no>
2002-08-31 18:57 ` Luca Barbieri
2002-08-31 19:51 ` Luca Barbieri
2002-08-31 16:32 Trond Myklebust
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).