* [kernel-hardening] Re: + ipc-introduce-shm_rmid_forced-sysctl.patch added to -mm tree
[not found] ` <20110630134855.GA6165@mail.hallyn.com>
@ 2011-06-30 13:57 ` Vasiliy Kulikov
2011-07-03 18:00 ` Vasiliy Kulikov
0 siblings, 1 reply; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-06-30 13:57 UTC (permalink / raw)
To: Serge Hallyn
Cc: akpm, mm-commits, daniel.lezcano, ebiederm, mingo, oleg, rdunlap,
tj, kernel-hardening
Hi Serge,
On Thu, Jun 30, 2011 at 08:48 -0500, Serge Hallyn wrote:
> Quoting akpm@linux-foundation.org (akpm@linux-foundation.org):
> > +static int shm_try_destroy_orphaned(int id, void *p, void *data)
> > +{
> > + struct ipc_namespace *ns = data;
> > + struct shmid_kernel *shp = shm_lock(ns, id);
> > + struct task_struct *task;
> > +
> > + if (IS_ERR(shp))
> > + return 0;
> > +
> > + /*
> > + * We want to destroy segments without users and with already
> > + * exit'ed originating process.
> > + *
> > + * XXX: the originating process may exist in another pid namespace.
> > + */
>
> (Sorry, I no longer have the original patch. I'd meant to take a closer
> look at the time...)
>
> So shp should store a reference to the struct pid, which you can check
> here? I think that'll do exactly what you need.
Documentation/namespaces/compatibility-list.txt says that IPC and PID
namespaces have not been fully separated yet. This is a core problem.
If I store a reference to pid namespace, it would solve only this little
problem, but the global problem of pid vs. ipc namespaces is left
unsolved. It would be much more consistent to fix the whole ipc code in
one step (if ever).
Thanks,
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 22+ messages in thread
* [kernel-hardening] Re: + ipc-introduce-shm_rmid_forced-sysctl.patch added to -mm tree
2011-06-30 13:57 ` [kernel-hardening] Re: + ipc-introduce-shm_rmid_forced-sysctl.patch added to -mm tree Vasiliy Kulikov
@ 2011-07-03 18:00 ` Vasiliy Kulikov
2011-07-04 11:55 ` [kernel-hardening] [PATCH] shm: handle separate PID namespaces case Vasiliy Kulikov
0 siblings, 1 reply; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-03 18:00 UTC (permalink / raw)
To: Serge Hallyn
Cc: akpm, mm-commits, daniel.lezcano, ebiederm, mingo, oleg, rdunlap,
tj, kernel-hardening
On Thu, Jun 30, 2011 at 17:57 +0400, Vasiliy Kulikov wrote:
> > So shp should store a reference to the struct pid, which you can check
> > here? I think that'll do exactly what you need.
>
> Documentation/namespaces/compatibility-list.txt says that IPC and PID
> namespaces have not been fully separated yet.
Looks like I've misunderstood the documentation. It says that
identifiers from the same ipc namespace shouldn't travel between
different pid namespaces, not about incomplete implementaiton. So yes,
storing pid or task will help. I'll send a patch after some testing.
Thanks,
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 22+ messages in thread
* [kernel-hardening] [PATCH] shm: handle separate PID namespaces case
2011-07-03 18:00 ` Vasiliy Kulikov
@ 2011-07-04 11:55 ` Vasiliy Kulikov
2011-07-04 15:05 ` [kernel-hardening] " Oleg Nesterov
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-04 11:55 UTC (permalink / raw)
To: akpm
Cc: Serge Hallyn, daniel.lezcano, ebiederm, mingo, oleg, rdunlap, tj,
kernel-hardening
shm_try_destroy_orphaned() and shm_try_destroy_current() didn't handle
the case of separate PID namespaces, but a single IPC namespace. If
there are tasks with the same PID values using the same shmem object,
the wrong destroy decision could be reached.
On shm segment creation store the pointer to the creator task in
shmid_kernel->shm_creator field and zero it on task exit. Then
use the ->shm_creator insread of ->shm_cprid in both functions.
As shmid_kernel object is already locked at this stage, no additional
locking is needed.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
---
include/linux/shm.h | 3 +++
ipc/shm.c | 27 ++++++++++++++++++++-------
2 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/include/linux/shm.h b/include/linux/shm.h
index b030a4e..12d2234 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -95,6 +95,9 @@ struct shmid_kernel /* private to the kernel */
pid_t shm_cprid;
pid_t shm_lprid;
struct user_struct *mlock_user;
+
+ /* The task created the shm object. NULL if the task is dead. */
+ struct task_struct *shm_creator;
};
/* shm_mode upper byte flags */
diff --git a/ipc/shm.c b/ipc/shm.c
index 22006f1..3baae98 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -239,7 +239,23 @@ static int shm_try_destroy_current(int id, void *p, void *data)
if (IS_ERR(shp))
return 0;
- if (shp->shm_cprid != task_tgid_vnr(current)) {
+ if (shp->shm_creator != current) {
+ shm_unlock(shp);
+ return 0;
+ }
+
+ /*
+ * Mark it as orphaned to destroy the segment when
+ * kernel.shm_forced_rmid is changed.
+ * It is noop if the following shm_may_destroy() returns true.
+ */
+ shp->shm_creator = NULL;
+
+ /*
+ * Don't even try to destroy it. If shm_forced_rmid=0 and IPC_RMID
+ * is not set, it shouldn't be deleted here.
+ */
+ if (!ns->shm_forced_rmid) {
shm_unlock(shp);
return 0;
}
@@ -255,7 +271,6 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
{
struct ipc_namespace *ns = data;
struct shmid_kernel *shp = shm_lock(ns, id);
- struct task_struct *task;
if (IS_ERR(shp))
return 0;
@@ -263,11 +278,8 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
/*
* We want to destroy segments without users and with already
* exit'ed originating process.
- *
- * XXX: the originating process may exist in another pid namespace.
*/
- task = find_task_by_vpid(shp->shm_cprid);
- if (task != NULL) {
+ if (shp->shm_creator != NULL) {
shm_unlock(shp);
return 0;
}
@@ -295,7 +307,7 @@ void exit_shm(struct task_struct *task)
if (!nsp)
return;
ns = nsp->ipc_ns;
- if (!ns || !ns->shm_forced_rmid)
+ if (!ns)
return;
/* Destroy all already created segments, but not mapped yet */
@@ -494,6 +506,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
shp->shm_segsz = size;
shp->shm_nattch = 0;
shp->shm_file = file;
+ shp->shm_creator = current;
/*
* shmid gets reported as "inode#" in /proc/pid/maps.
* proc-ps tools use this. Changing this will break them.
--
1.7.0.4
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [kernel-hardening] Re: [PATCH] shm: handle separate PID namespaces case
2011-07-04 11:55 ` [kernel-hardening] [PATCH] shm: handle separate PID namespaces case Vasiliy Kulikov
@ 2011-07-04 15:05 ` Oleg Nesterov
2011-07-04 15:26 ` Vasiliy Kulikov
2011-07-05 14:26 ` [kernel-hardening] Re: [PATCH] " Serge Hallyn
2011-07-05 17:29 ` Vasiliy Kulikov
2 siblings, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2011-07-04 15:05 UTC (permalink / raw)
To: Vasiliy Kulikov
Cc: akpm, Serge Hallyn, daniel.lezcano, ebiederm, mingo, rdunlap, tj,
kernel-hardening
On 07/04, Vasiliy Kulikov wrote:
>
> @@ -239,7 +239,23 @@ static int shm_try_destroy_current(int id, void *p, void *data)
> if (IS_ERR(shp))
> return 0;
>
> - if (shp->shm_cprid != task_tgid_vnr(current)) {
> + if (shp->shm_creator != current) {
> + shm_unlock(shp);
> + return 0;
I know absolutely nothing about ipc/, so probably I am wrong. But do
we really need shm_lock() (which also another idr_find) to check
->shm_creator ? This is calles by idr_for_each() and afaics "void *p"
should match, no?
IOW, can't shm_try_destroy_current() do something like
struct shmid_kernel *shp = container_of(p, shmid_kernel, shm_perm);
if (shp->shm_creator != current)
return;
shm_lock();
...
?
Just curious.
Oleg.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [kernel-hardening] Re: [PATCH] shm: handle separate PID namespaces case
2011-07-04 15:05 ` [kernel-hardening] " Oleg Nesterov
@ 2011-07-04 15:26 ` Vasiliy Kulikov
2011-07-04 15:37 ` Oleg Nesterov
2011-07-05 17:37 ` [kernel-hardening] [PATCH v2] shm: handle separate PID namespaces case Vasiliy Kulikov
0 siblings, 2 replies; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-04 15:26 UTC (permalink / raw)
To: Oleg Nesterov
Cc: akpm, Serge Hallyn, daniel.lezcano, ebiederm, mingo, rdunlap, tj,
kernel-hardening
On Mon, Jul 04, 2011 at 17:05 +0200, Oleg Nesterov wrote:
> On 07/04, Vasiliy Kulikov wrote:
> >
> > @@ -239,7 +239,23 @@ static int shm_try_destroy_current(int id, void *p, void *data)
> > if (IS_ERR(shp))
> > return 0;
> >
> > - if (shp->shm_cprid != task_tgid_vnr(current)) {
> > + if (shp->shm_creator != current) {
> > + shm_unlock(shp);
> > + return 0;
>
> I know absolutely nothing about ipc/, so probably I am wrong. But do
> we really need shm_lock()
It is needed to protect against parallel reads. To read one may just
hold shm_lock, but to write both shm_lock and rw_mutex are needed.
> (which also another idr_find)
Yes, this is a waste of time, actually. Directly locking
->shm_perm->lock is what is needed here and in shm_try_destroy_orphaned().
Thanks for looking into this!
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 22+ messages in thread
* [kernel-hardening] Re: [PATCH] shm: handle separate PID namespaces case
2011-07-04 15:26 ` Vasiliy Kulikov
@ 2011-07-04 15:37 ` Oleg Nesterov
2011-07-04 15:48 ` Vasiliy Kulikov
2011-07-04 17:01 ` [kernel-hardening] [PATCH] shm: optimize locking and ipc_namespace getting Vasiliy Kulikov
2011-07-05 17:37 ` [kernel-hardening] [PATCH v2] shm: handle separate PID namespaces case Vasiliy Kulikov
1 sibling, 2 replies; 22+ messages in thread
From: Oleg Nesterov @ 2011-07-04 15:37 UTC (permalink / raw)
To: Vasiliy Kulikov
Cc: akpm, Serge Hallyn, daniel.lezcano, ebiederm, mingo, rdunlap, tj,
kernel-hardening
On 07/04, Vasiliy Kulikov wrote:
>
> On Mon, Jul 04, 2011 at 17:05 +0200, Oleg Nesterov wrote:
> > On 07/04, Vasiliy Kulikov wrote:
> > >
> > > @@ -239,7 +239,23 @@ static int shm_try_destroy_current(int id, void *p, void *data)
> > > if (IS_ERR(shp))
> > > return 0;
> > >
> > > - if (shp->shm_cprid != task_tgid_vnr(current)) {
> > > + if (shp->shm_creator != current) {
> > > + shm_unlock(shp);
> > > + return 0;
> >
> > I know absolutely nothing about ipc/, so probably I am wrong. But do
> > we really need shm_lock()
>
> It is needed to protect against parallel reads. To read one may just
> hold shm_lock, but to write both shm_lock and rw_mutex are needed.
Hmm. Still can't understand... Once again, it seems to me we can
check shp->shm_creator != current and return lockless. Or do you
think without shm_lock() we can see the false positive?
If shp->shm_creator was current, it was set by use, we can't miss
this shp. Of course, if we are going to shm_destroy() then we need
shm_lock().
Thanks,
Oleg.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [kernel-hardening] Re: [PATCH] shm: handle separate PID namespaces case
2011-07-04 15:37 ` Oleg Nesterov
@ 2011-07-04 15:48 ` Vasiliy Kulikov
2011-07-04 17:01 ` [kernel-hardening] [PATCH] shm: optimize locking and ipc_namespace getting Vasiliy Kulikov
1 sibling, 0 replies; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-04 15:48 UTC (permalink / raw)
To: Oleg Nesterov
Cc: akpm, Serge Hallyn, daniel.lezcano, ebiederm, mingo, rdunlap, tj,
kernel-hardening
On Mon, Jul 04, 2011 at 17:37 +0200, Oleg Nesterov wrote:
> On 07/04, Vasiliy Kulikov wrote:
> >
> > On Mon, Jul 04, 2011 at 17:05 +0200, Oleg Nesterov wrote:
> > > On 07/04, Vasiliy Kulikov wrote:
> > > >
> > > > @@ -239,7 +239,23 @@ static int shm_try_destroy_current(int id, void *p, void *data)
> > > > if (IS_ERR(shp))
> > > > return 0;
> > > >
> > > > - if (shp->shm_cprid != task_tgid_vnr(current)) {
> > > > + if (shp->shm_creator != current) {
> > > > + shm_unlock(shp);
> > > > + return 0;
> > >
> > > I know absolutely nothing about ipc/, so probably I am wrong. But do
> > > we really need shm_lock()
> >
> > It is needed to protect against parallel reads. To read one may just
> > hold shm_lock, but to write both shm_lock and rw_mutex are needed.
>
> Hmm. Still can't understand... Once again, it seems to me we can
> check shp->shm_creator != current and return lockless.
Ah, I understood what you're saying. Yes, as before shm_destroy() we
change only ->shm_creator, which is checked only under rw_mutex, it is
safe to lock it just before shm_destroy(). So, ipc_lock_by_ptr()
instead of shm_lock() and only if we're going to destroy the segment.
Thanks again! :)
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 22+ messages in thread
* [kernel-hardening] [PATCH] shm: optimize locking and ipc_namespace getting
2011-07-04 15:37 ` Oleg Nesterov
2011-07-04 15:48 ` Vasiliy Kulikov
@ 2011-07-04 17:01 ` Vasiliy Kulikov
2011-07-04 17:29 ` [kernel-hardening] " Oleg Nesterov
2011-07-05 17:38 ` [kernel-hardening] [PATCH v2] " Vasiliy Kulikov
1 sibling, 2 replies; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-04 17:01 UTC (permalink / raw)
To: akpm
Cc: Oleg Nesterov, Serge Hallyn, daniel.lezcano, ebiederm, mingo,
rdunlap, tj, kernel-hardening
shm_lock() does a lookup of shm segment in shm_ids(ns).ipcs_idr, which
is redundant as we already know shmid_kernel address. An actual lock is
also not required for reads until we really want to destroy the segment.
exit_shm() and shm_destroy_orphaned() may avoid the loop by checking
whether there is at least one segment in current ipc_namespace.
The check of nsproxy and ipc_ns against NULL is redundant as exit_shm()
is called from do_exit() before the call to exit_notify(), so the
dereferencing current->nsproxy->ipc_ns is guaranteed to be safe.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
---
ipc/shm.c | 67 +++++++++++++++++++++++++++++++------------------------------
1 files changed, 34 insertions(+), 33 deletions(-)
---
diff --git a/ipc/shm.c b/ipc/shm.c
index 3baae98..aa91236 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -131,6 +131,12 @@ static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id)
return container_of(ipcp, struct shmid_kernel, shm_perm);
}
+static inline void shm_lock_by_ptr(struct shmid_kernel *ipcp)
+{
+ rcu_read_lock();
+ spin_lock(&ipcp->shm_perm.lock);
+}
+
static inline struct shmid_kernel *shm_lock_check(struct ipc_namespace *ns,
int id)
{
@@ -231,18 +237,16 @@ static void shm_close(struct vm_area_struct *vma)
up_write(&shm_ids(ns).rw_mutex);
}
+/* Called with ns->shm_ids(ns).rw_mutex locked */
static int shm_try_destroy_current(int id, void *p, void *data)
{
struct ipc_namespace *ns = data;
- struct shmid_kernel *shp = shm_lock(ns, id);
-
- if (IS_ERR(shp))
- return 0;
+ struct kern_ipc_perm *ipcp = p;
+ struct shmid_kernel *shp;
+ shp = container_of(ipcp, struct shmid_kernel, shm_perm);
- if (shp->shm_creator != current) {
- shm_unlock(shp);
+ if (shp->shm_creator != current)
return 0;
- }
/*
* Mark it as orphaned to destroy the segment when
@@ -255,64 +259,61 @@ static int shm_try_destroy_current(int id, void *p, void *data)
* Don't even try to destroy it. If shm_forced_rmid=0 and IPC_RMID
* is not set, it shouldn't be deleted here.
*/
- if (!ns->shm_forced_rmid) {
- shm_unlock(shp);
+ if (!ns->shm_forced_rmid)
return 0;
- }
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(ns, shp)) {
+ shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
- else
- shm_unlock(shp);
+ }
return 0;
}
+/* Called with ns->shm_ids(ns).rw_mutex locked */
static int shm_try_destroy_orphaned(int id, void *p, void *data)
{
struct ipc_namespace *ns = data;
- struct shmid_kernel *shp = shm_lock(ns, id);
-
- if (IS_ERR(shp))
- return 0;
+ struct kern_ipc_perm *ipcp = p;
+ struct shmid_kernel *shp;
+ shp = container_of(ipcp, struct shmid_kernel, shm_perm);
/*
* We want to destroy segments without users and with already
* exit'ed originating process.
+ *
+ * As shp->* are changed under rw_mutex, it's safe to skip shp locking.
*/
- if (shp->shm_creator != NULL) {
- shm_unlock(shp);
+ if (shp->shm_creator != NULL)
return 0;
- }
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(ns, shp)) {
+ shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
- else
- shm_unlock(shp);
+ }
return 0;
}
void shm_destroy_orphaned(struct ipc_namespace *ns)
{
down_write(&shm_ids(ns).rw_mutex);
- idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_orphaned, ns);
+ if (&shm_ids(ns).in_use)
+ idr_for_each(&shm_ids(ns).ipcs_idr,
+ &shm_try_destroy_orphaned,
+ ns);
up_write(&shm_ids(ns).rw_mutex);
}
void exit_shm(struct task_struct *task)
{
- struct nsproxy *nsp = task->nsproxy;
- struct ipc_namespace *ns;
-
- if (!nsp)
- return;
- ns = nsp->ipc_ns;
- if (!ns)
- return;
+ struct ipc_namespace *ns = task->nsproxy->ipc_ns;
/* Destroy all already created segments, but not mapped yet */
down_write(&shm_ids(ns).rw_mutex);
- idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_current, ns);
+ if (&shm_ids(ns).in_use)
+ idr_for_each(&shm_ids(ns).ipcs_idr,
+ &shm_try_destroy_current,
+ ns);
up_write(&shm_ids(ns).rw_mutex);
}
--
1.7.0.4
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [kernel-hardening] Re: [PATCH] shm: optimize locking and ipc_namespace getting
2011-07-04 17:01 ` [kernel-hardening] [PATCH] shm: optimize locking and ipc_namespace getting Vasiliy Kulikov
@ 2011-07-04 17:29 ` Oleg Nesterov
2011-07-04 17:51 ` Vasiliy Kulikov
2011-07-05 17:38 ` [kernel-hardening] [PATCH v2] " Vasiliy Kulikov
1 sibling, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2011-07-04 17:29 UTC (permalink / raw)
To: Vasiliy Kulikov
Cc: akpm, Serge Hallyn, daniel.lezcano, ebiederm, mingo, rdunlap, tj,
kernel-hardening
On 07/04, Vasiliy Kulikov wrote:
>
> exit_shm() and shm_destroy_orphaned() may avoid the loop by checking
> whether there is at least one segment in current ipc_namespace.
Obviously I can't ack because I do not really understand this code,
but looks good to me.
Minor nit,
> void exit_shm(struct task_struct *task)
> {
> - struct nsproxy *nsp = task->nsproxy;
> - struct ipc_namespace *ns;
> -
> - if (!nsp)
> - return;
> - ns = nsp->ipc_ns;
> - if (!ns)
> - return;
> + struct ipc_namespace *ns = task->nsproxy->ipc_ns;
>
> /* Destroy all already created segments, but not mapped yet */
> down_write(&shm_ids(ns).rw_mutex);
> - idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_current, ns);
> + if (&shm_ids(ns).in_use)
Afaics, unlike shm_destroy_orphaned(), exit_shm() can check .in_use
lockless and thus avoid down_write() in the fast path. Given that
this sem is "global", I think this makes sense.
exit_shm() only cares about shmid_kernel's which were created by
current, we can't miss .in_use++ in ipc_addid(), it was called by us.
and thus we can't miss in_use != 0 although it is not stable without
the lock.
Oleg.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kernel-hardening] Re: [PATCH] shm: optimize locking and ipc_namespace getting
2011-07-04 17:29 ` [kernel-hardening] " Oleg Nesterov
@ 2011-07-04 17:51 ` Vasiliy Kulikov
0 siblings, 0 replies; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-04 17:51 UTC (permalink / raw)
To: kernel-hardening
Cc: akpm, Serge Hallyn, daniel.lezcano, ebiederm, mingo, rdunlap, tj
On Mon, Jul 04, 2011 at 19:29 +0200, Oleg Nesterov wrote:
> On 07/04, Vasiliy Kulikov wrote:
> > - idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_current, ns);
> > + if (&shm_ids(ns).in_use)
>
> Afaics, unlike shm_destroy_orphaned(), exit_shm() can check .in_use
> lockless and thus avoid down_write() in the fast path. Given that
> this sem is "global", I think this makes sense.
>
> exit_shm() only cares about shmid_kernel's which were created by
> current, we can't miss .in_use++ in ipc_addid(), it was called by us.
> and thus we can't miss in_use != 0 although it is not stable without
> the lock.
I agree that if in some moment of shm_exit() .in_use is zero, we can
avoid the loop. But is it guaranteed .in_use is assigned a value in an
atomic way? I mean, e.g. if it was 0x00ff and becomes equal to 0x0100
is it possible .in_use's lower bytes are changed _before_ the high
bytes? Then it can be observed as zero, when it logically is not. Even
if it is guaranteed for some architectures, is it possible the implicit
atomicity is violated by some new architecture?
Thanks,
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 22+ messages in thread
* [kernel-hardening] Re: [PATCH] shm: handle separate PID namespaces case
2011-07-04 11:55 ` [kernel-hardening] [PATCH] shm: handle separate PID namespaces case Vasiliy Kulikov
2011-07-04 15:05 ` [kernel-hardening] " Oleg Nesterov
@ 2011-07-05 14:26 ` Serge Hallyn
2011-07-05 14:50 ` Vasiliy Kulikov
2011-07-05 17:29 ` Vasiliy Kulikov
2 siblings, 1 reply; 22+ messages in thread
From: Serge Hallyn @ 2011-07-05 14:26 UTC (permalink / raw)
To: Vasiliy Kulikov
Cc: akpm, daniel.lezcano, ebiederm, mingo, oleg, rdunlap, tj,
kernel-hardening
Quoting Vasiliy Kulikov (segoon@openwall.com):
> shm_try_destroy_orphaned() and shm_try_destroy_current() didn't handle
> the case of separate PID namespaces, but a single IPC namespace. If
> there are tasks with the same PID values using the same shmem object,
> the wrong destroy decision could be reached.
>
> On shm segment creation store the pointer to the creator task in
> shmid_kernel->shm_creator field and zero it on task exit. Then
> use the ->shm_creator insread of ->shm_cprid in both functions.
> As shmid_kernel object is already locked at this stage, no additional
> locking is needed.
Thanks, Vasiliy. Sounds like the Documentation/ file could stand some
clarification.
A concern below, though:
> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
> ---
> include/linux/shm.h | 3 +++
> ipc/shm.c | 27 ++++++++++++++++++++-------
> 2 files changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/shm.h b/include/linux/shm.h
> index b030a4e..12d2234 100644
> --- a/include/linux/shm.h
> +++ b/include/linux/shm.h
> @@ -95,6 +95,9 @@ struct shmid_kernel /* private to the kernel */
> pid_t shm_cprid;
> pid_t shm_lprid;
> struct user_struct *mlock_user;
> +
> + /* The task created the shm object. NULL if the task is dead. */
> + struct task_struct *shm_creator;
> };
>
> /* shm_mode upper byte flags */
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 22006f1..3baae98 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -239,7 +239,23 @@ static int shm_try_destroy_current(int id, void *p, void *data)
> if (IS_ERR(shp))
> return 0;
>
> - if (shp->shm_cprid != task_tgid_vnr(current)) {
> + if (shp->shm_creator != current) {
> + shm_unlock(shp);
> + return 0;
> + }
> +
> + /*
> + * Mark it as orphaned to destroy the segment when
> + * kernel.shm_forced_rmid is changed.
> + * It is noop if the following shm_may_destroy() returns true.
> + */
> + shp->shm_creator = NULL;
This function, shm_try_destroy_current(), only gets called by shm_exit()
if the shm_forced_rmid is set, right? So something funky can happen if
first shm_forced_rmid is 0 and some get created and the creating tasks
exits, then shm_forced_rmid gets set to one, and the task pointer gets
reused?
Using a struct pid may still be the best bet. It's much lighter-weight
than a task struct, so keeping a ref shouldn't much matter. It'll
avoid this wraparound issue (assuming I'm not imagining that issue).
Struct pid is namespace-safe, and you can still do your simple, quick
pointer comparison.
> + /*
> + * Don't even try to destroy it. If shm_forced_rmid=0 and IPC_RMID
> + * is not set, it shouldn't be deleted here.
> + */
> + if (!ns->shm_forced_rmid) {
> shm_unlock(shp);
> return 0;
> }
> @@ -255,7 +271,6 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
> {
> struct ipc_namespace *ns = data;
> struct shmid_kernel *shp = shm_lock(ns, id);
> - struct task_struct *task;
>
> if (IS_ERR(shp))
> return 0;
> @@ -263,11 +278,8 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
> /*
> * We want to destroy segments without users and with already
> * exit'ed originating process.
> - *
> - * XXX: the originating process may exist in another pid namespace.
> */
> - task = find_task_by_vpid(shp->shm_cprid);
> - if (task != NULL) {
> + if (shp->shm_creator != NULL) {
> shm_unlock(shp);
> return 0;
> }
> @@ -295,7 +307,7 @@ void exit_shm(struct task_struct *task)
> if (!nsp)
> return;
> ns = nsp->ipc_ns;
> - if (!ns || !ns->shm_forced_rmid)
> + if (!ns)
> return;
>
> /* Destroy all already created segments, but not mapped yet */
> @@ -494,6 +506,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> shp->shm_segsz = size;
> shp->shm_nattch = 0;
> shp->shm_file = file;
> + shp->shm_creator = current;
> /*
> * shmid gets reported as "inode#" in /proc/pid/maps.
> * proc-ps tools use this. Changing this will break them.
> --
> 1.7.0.4
^ permalink raw reply [flat|nested] 22+ messages in thread
* [kernel-hardening] Re: [PATCH] shm: handle separate PID namespaces case
2011-07-05 14:26 ` [kernel-hardening] Re: [PATCH] " Serge Hallyn
@ 2011-07-05 14:50 ` Vasiliy Kulikov
2011-07-05 15:57 ` Serge Hallyn
0 siblings, 1 reply; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-05 14:50 UTC (permalink / raw)
To: Serge Hallyn
Cc: akpm, daniel.lezcano, ebiederm, mingo, oleg, rdunlap, tj,
kernel-hardening
Hi Serge,
On Tue, Jul 05, 2011 at 09:26 -0500, Serge Hallyn wrote:
> > diff --git a/ipc/shm.c b/ipc/shm.c
> > index 22006f1..3baae98 100644
> > --- a/ipc/shm.c
> > +++ b/ipc/shm.c
> > @@ -239,7 +239,23 @@ static int shm_try_destroy_current(int id, void *p, void *data)
> > if (IS_ERR(shp))
> > return 0;
> >
> > - if (shp->shm_cprid != task_tgid_vnr(current)) {
> > + if (shp->shm_creator != current) {
> > + shm_unlock(shp);
> > + return 0;
> > + }
> > +
> > + /*
> > + * Mark it as orphaned to destroy the segment when
> > + * kernel.shm_forced_rmid is changed.
> > + * It is noop if the following shm_may_destroy() returns true.
> > + */
> > + shp->shm_creator = NULL;
>
> This function, shm_try_destroy_current(), only gets called by shm_exit()
> if the shm_forced_rmid is set, right? So something funky can happen if
> first shm_forced_rmid is 0 and some get created and the creating tasks
> exits, then shm_forced_rmid gets set to one, and the task pointer gets
> reused?
No, sinse this patch exit_shm() iterates all segments regardless of
shm_forced_rmid value (it is tried to _destroy_ the segment only if
shm_forced_rmid==1). The ->shm_creator is set when the segment is
created and explicitly NULL'ed when the task exits. As it has such
explicit rules, the ref counting is not needed at all.
Also ->shm_creator is not needed for anything, but for tracking whether
the creator has already exited, so keeping a reference neither to task
nor to pid is needed at all.
Thanks,
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 22+ messages in thread
* [kernel-hardening] Re: [PATCH] shm: handle separate PID namespaces case
2011-07-05 14:50 ` Vasiliy Kulikov
@ 2011-07-05 15:57 ` Serge Hallyn
2011-07-05 17:42 ` Vasiliy Kulikov
0 siblings, 1 reply; 22+ messages in thread
From: Serge Hallyn @ 2011-07-05 15:57 UTC (permalink / raw)
To: Vasiliy Kulikov
Cc: akpm, daniel.lezcano, ebiederm, mingo, oleg, rdunlap, tj,
kernel-hardening
Quoting Vasiliy Kulikov (segoon@openwall.com):
> Hi Serge,
>
> On Tue, Jul 05, 2011 at 09:26 -0500, Serge Hallyn wrote:
> > > diff --git a/ipc/shm.c b/ipc/shm.c
> > > index 22006f1..3baae98 100644
> > > --- a/ipc/shm.c
> > > +++ b/ipc/shm.c
> > > @@ -239,7 +239,23 @@ static int shm_try_destroy_current(int id, void *p, void *data)
> > > if (IS_ERR(shp))
> > > return 0;
> > >
> > > - if (shp->shm_cprid != task_tgid_vnr(current)) {
> > > + if (shp->shm_creator != current) {
> > > + shm_unlock(shp);
> > > + return 0;
> > > + }
> > > +
> > > + /*
> > > + * Mark it as orphaned to destroy the segment when
> > > + * kernel.shm_forced_rmid is changed.
> > > + * It is noop if the following shm_may_destroy() returns true.
> > > + */
> > > + shp->shm_creator = NULL;
> >
> > This function, shm_try_destroy_current(), only gets called by shm_exit()
> > if the shm_forced_rmid is set, right? So something funky can happen if
> > first shm_forced_rmid is 0 and some get created and the creating tasks
> > exits, then shm_forced_rmid gets set to one, and the task pointer gets
> > reused?
>
> No, sinse this patch exit_shm() iterates all segments regardless of
> shm_forced_rmid value (it is tried to _destroy_ the segment only if
> shm_forced_rmid==1). The ->shm_creator is set when the segment is
> created and explicitly NULL'ed when the task exits. As it has such
> explicit rules, the ref counting is not needed at all.
Ok, cool.
> Also ->shm_creator is not needed for anything, but for tracking whether
> the creator has already exited, so keeping a reference neither to task
> nor to pid is needed at all.
Do you have all of these patches applied in one git tree? Could
you do a quick 'git diff master..' (assuming master is Linus' tree)
and send the result (or send a git url), and I'll take a last closer
look at the patch tomorrow?
thanks,
-serge
^ permalink raw reply [flat|nested] 22+ messages in thread
* [kernel-hardening] Re: [PATCH] shm: handle separate PID namespaces case
2011-07-04 11:55 ` [kernel-hardening] [PATCH] shm: handle separate PID namespaces case Vasiliy Kulikov
2011-07-04 15:05 ` [kernel-hardening] " Oleg Nesterov
2011-07-05 14:26 ` [kernel-hardening] Re: [PATCH] " Serge Hallyn
@ 2011-07-05 17:29 ` Vasiliy Kulikov
2 siblings, 0 replies; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-05 17:29 UTC (permalink / raw)
To: akpm
Cc: Serge Hallyn, daniel.lezcano, ebiederm, mingo, oleg, rdunlap, tj,
kernel-hardening
On Mon, Jul 04, 2011 at 15:55 +0400, Vasiliy Kulikov wrote:
> shm_try_destroy_orphaned() and shm_try_destroy_current() didn't handle
> the case of separate PID namespaces, but a single IPC namespace. If
> there are tasks with the same PID values using the same shmem object,
> the wrong destroy decision could be reached.
>
> On shm segment creation store the pointer to the creator task in
> shmid_kernel->shm_creator field and zero it on task exit. Then
> use the ->shm_creator insread of ->shm_cprid in both functions.
> As shmid_kernel object is already locked at this stage, no additional
> locking is needed.
>
> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
> ---
[...]
> + if (!ns->shm_forced_rmid) {
Oops, this patch is based on my old tree where it was shm_forced_rmid
instead of shm_rmid_forced. I'll resend these 2 patches. Sorry for the
noise...
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 22+ messages in thread
* [kernel-hardening] [PATCH v2] shm: handle separate PID namespaces case
2011-07-04 15:26 ` Vasiliy Kulikov
2011-07-04 15:37 ` Oleg Nesterov
@ 2011-07-05 17:37 ` Vasiliy Kulikov
2011-07-15 6:45 ` [kernel-hardening] " Vasiliy Kulikov
1 sibling, 1 reply; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-05 17:37 UTC (permalink / raw)
To: akpm
Cc: Oleg Nesterov, Serge Hallyn, daniel.lezcano, ebiederm, mingo,
rdunlap, tj, kernel-hardening
shm_try_destroy_orphaned() and shm_try_destroy_current() didn't handle
the case of separate PID namespaces, but a single IPC namespace. If
there are tasks with the same PID values using the same shmem object,
the wrong destroy decision could be reached.
On shm segment creation store the pointer to the creator task in
shmid_kernel->shm_creator field and zero it on task exit. Then
use the ->shm_creator insread of shm_cprid in both functions. As
shmid_kernel object is already locked at this stage, no additional
locking is needed.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
---
include/linux/shm.h | 3 +++
ipc/shm.c | 27 ++++++++++++++++++++-------
2 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/include/linux/shm.h b/include/linux/shm.h
index b030a4e..12d2234 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -95,6 +95,9 @@ struct shmid_kernel /* private to the kernel */
pid_t shm_cprid;
pid_t shm_lprid;
struct user_struct *mlock_user;
+
+ /* The task created the shm object. NULL if the task is dead. */
+ struct task_struct *shm_creator;
};
/* shm_mode upper byte flags */
diff --git a/ipc/shm.c b/ipc/shm.c
index 22006f1..3baae98 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -239,7 +239,23 @@ static int shm_try_destroy_current(int id, void *p, void *data)
if (IS_ERR(shp))
return 0;
- if (shp->shm_cprid != task_tgid_vnr(current)) {
+ if (shp->shm_creator != current) {
+ shm_unlock(shp);
+ return 0;
+ }
+
+ /*
+ * Mark it as orphaned to destroy the segment when
+ * kernel.shm_rmid_forced is changed.
+ * It is noop if the following shm_may_destroy() returns true.
+ */
+ shp->shm_creator = NULL;
+
+ /*
+ * Don't even try to destroy it. If shm_rmid_forced=0 and IPC_RMID
+ * is not set, it shouldn't be deleted here.
+ */
+ if (!ns->shm_rmid_forced) {
shm_unlock(shp);
return 0;
}
@@ -255,7 +271,6 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
{
struct ipc_namespace *ns = data;
struct shmid_kernel *shp = shm_lock(ns, id);
- struct task_struct *task;
if (IS_ERR(shp))
return 0;
@@ -263,11 +278,8 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
/*
* We want to destroy segments without users and with already
* exit'ed originating process.
- *
- * XXX: the originating process may exist in another pid namespace.
*/
- task = find_task_by_vpid(shp->shm_cprid);
- if (task != NULL) {
+ if (shp->shm_creator != NULL) {
shm_unlock(shp);
return 0;
}
@@ -295,7 +307,7 @@ void exit_shm(struct task_struct *task)
if (!nsp)
return;
ns = nsp->ipc_ns;
- if (!ns || !ns->shm_rmid_forced)
+ if (!ns)
return;
/* Destroy all already created segments, but not mapped yet */
@@ -494,6 +506,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
shp->shm_segsz = size;
shp->shm_nattch = 0;
shp->shm_file = file;
+ shp->shm_creator = current;
/*
* shmid gets reported as "inode#" in /proc/pid/maps.
* proc-ps tools use this. Changing this will break them.
--
1.7.0.4
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [kernel-hardening] [PATCH v2] shm: optimize locking and ipc_namespace getting
2011-07-04 17:01 ` [kernel-hardening] [PATCH] shm: optimize locking and ipc_namespace getting Vasiliy Kulikov
2011-07-04 17:29 ` [kernel-hardening] " Oleg Nesterov
@ 2011-07-05 17:38 ` Vasiliy Kulikov
1 sibling, 0 replies; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-05 17:38 UTC (permalink / raw)
To: akpm
Cc: Oleg Nesterov, Serge Hallyn, daniel.lezcano, ebiederm, mingo,
rdunlap, tj, kernel-hardening
shm_lock() does a lookup of shm segment in shm_ids(ns).ipcs_idr, which
is redundant as we already know shmid_kernel address. An actual lock is
also not required for reads until we really want to destroy the segment.
exit_shm() and shm_destroy_orphaned() may avoid the loop by checking
whether there is at least one segment in current ipc_namespace.
The check of nsproxy and ipc_ns against NULL is redundant as exit_shm()
is called from do_exit() before the call to exit_notify(), so the
dereferencing current->nsproxy->ipc_ns is guaranteed to be safe.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
---
ipc/shm.c | 61 ++++++++++++++++++++++++++++---------------------------------
1 files changed, 28 insertions(+), 33 deletions(-)
diff --git a/ipc/shm.c b/ipc/shm.c
index 3baae98..85db561 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -131,6 +131,12 @@ static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id)
return container_of(ipcp, struct shmid_kernel, shm_perm);
}
+static inline void shm_lock_by_ptr(struct shmid_kernel *ipcp)
+{
+ rcu_read_lock();
+ spin_lock(&ipcp->shm_perm.lock);
+}
+
static inline struct shmid_kernel *shm_lock_check(struct ipc_namespace *ns,
int id)
{
@@ -231,18 +237,15 @@ static void shm_close(struct vm_area_struct *vma)
up_write(&shm_ids(ns).rw_mutex);
}
+/* Called with ns->shm_ids(ns).rw_mutex locked */
static int shm_try_destroy_current(int id, void *p, void *data)
{
struct ipc_namespace *ns = data;
- struct shmid_kernel *shp = shm_lock(ns, id);
+ struct kern_ipc_perm *ipcp = p;
+ struct shmid_kernel *shp = container_of(ipcp, struct shmid_kernel, shm_perm);
- if (IS_ERR(shp))
- return 0;
-
- if (shp->shm_creator != current) {
- shm_unlock(shp);
+ if (shp->shm_creator != current)
return 0;
- }
/*
* Mark it as orphaned to destroy the segment when
@@ -255,64 +258,56 @@ static int shm_try_destroy_current(int id, void *p, void *data)
* Don't even try to destroy it. If shm_rmid_forced=0 and IPC_RMID
* is not set, it shouldn't be deleted here.
*/
- if (!ns->shm_rmid_forced) {
- shm_unlock(shp);
+ if (!ns->shm_rmid_forced)
return 0;
- }
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(ns, shp)) {
+ shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
- else
- shm_unlock(shp);
+ }
return 0;
}
+/* Called with ns->shm_ids(ns).rw_mutex locked */
static int shm_try_destroy_orphaned(int id, void *p, void *data)
{
struct ipc_namespace *ns = data;
- struct shmid_kernel *shp = shm_lock(ns, id);
-
- if (IS_ERR(shp))
- return 0;
+ struct kern_ipc_perm *ipcp = p;
+ struct shmid_kernel *shp = container_of(ipcp, struct shmid_kernel, shm_perm);
/*
* We want to destroy segments without users and with already
* exit'ed originating process.
+ *
+ * As shp->* are changed under rw_mutex, it's safe to skip shp locking.
*/
- if (shp->shm_creator != NULL) {
- shm_unlock(shp);
+ if (shp->shm_creator != NULL)
return 0;
- }
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(ns, shp)) {
+ shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
- else
- shm_unlock(shp);
+ }
return 0;
}
void shm_destroy_orphaned(struct ipc_namespace *ns)
{
down_write(&shm_ids(ns).rw_mutex);
- idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_orphaned, ns);
+ if (&shm_ids(ns).in_use)
+ idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_orphaned, ns);
up_write(&shm_ids(ns).rw_mutex);
}
void exit_shm(struct task_struct *task)
{
- struct nsproxy *nsp = task->nsproxy;
- struct ipc_namespace *ns;
-
- if (!nsp)
- return;
- ns = nsp->ipc_ns;
- if (!ns)
- return;
+ struct ipc_namespace *ns = task->nsproxy->ipc_ns;
/* Destroy all already created segments, but not mapped yet */
down_write(&shm_ids(ns).rw_mutex);
- idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_current, ns);
+ if (&shm_ids(ns).in_use)
+ idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_current, ns);
up_write(&shm_ids(ns).rw_mutex);
}
--
1.7.0.4
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [kernel-hardening] Re: [PATCH] shm: handle separate PID namespaces case
2011-07-05 15:57 ` Serge Hallyn
@ 2011-07-05 17:42 ` Vasiliy Kulikov
2011-07-06 16:31 ` Serge Hallyn
0 siblings, 1 reply; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-05 17:42 UTC (permalink / raw)
To: Serge Hallyn
Cc: akpm, daniel.lezcano, ebiederm, mingo, oleg, rdunlap, tj,
kernel-hardening
On Tue, Jul 05, 2011 at 10:57 -0500, Serge Hallyn wrote:
> Do you have all of these patches applied in one git tree? Could
> you do a quick 'git diff master..' (assuming master is Linus' tree)
> and send the result (or send a git url), and I'll take a last closer
> look at the patch tomorrow?
These patches are against 3.0-rc2, it should be simple to rebase to -rc5
since ipc/ is not heavily changed nowadays. The patches are stored in
-mm tree. The blob against -rc2 is as follows:
---
Documentation/sysctl/kernel.txt | 22 ++++++++
include/linux/ipc_namespace.h | 7 +++
include/linux/shm.h | 7 +++
ipc/ipc_sysctl.c | 36 +++++++++++++
ipc/shm.c | 105 +++++++++++++++++++++++++++++++++++++--
kernel/exit.c | 1 +
6 files changed, 174 insertions(+), 4 deletions(-)
---
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 5e7cb39..4d7b866 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -62,6 +62,7 @@ show up in /proc/sys/kernel:
- shmall
- shmmax [ sysv ipc ]
- shmmni
+- shm_rmid_forced
- stop-a [ SPARC only ]
- sysrq ==> Documentation/sysrq.txt
- tainted
@@ -475,6 +476,27 @@ kernel. This value defaults to SHMMAX.
==============================================================
+shm_rmid_forced:
+
+Linux lets you set resource limits, including how much memory one
+process can consume, via setrlimit(2). Unfortunately, shared memory
+segments are allowed to exist without association with any process, and
+thus might not be counted against any resource limits. If enabled,
+shared memory segments are automatically destroyed when their attach
+count becomes zero after a detach or a process termination. It will
+also destroy segments that were created, but never attached to, on exit
+from the process. The only use left for IPC_RMID is to immediately
+destroy an unattached segment. Of course, this breaks the way things are
+defined, so some applications might stop working. Note that this
+feature will do you no good unless you also configure your resource
+limits (in particular, RLIMIT_AS and RLIMIT_NPROC). Most systems don't
+need this.
+
+Note that if you change this from 0 to 1, already created segments
+without users and with a dead originative process will be destroyed.
+
+==============================================================
+
softlockup_thresh:
This value can be used to lower the softlockup tolerance threshold. The
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index a6d1655..8a297a5 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -44,6 +44,11 @@ struct ipc_namespace {
size_t shm_ctlall;
int shm_ctlmni;
int shm_tot;
+ /*
+ * Defines whether IPC_RMID is forced for _all_ shm segments regardless
+ * of shmctl()
+ */
+ int shm_rmid_forced;
struct notifier_block ipcns_nb;
@@ -72,6 +77,7 @@ extern int register_ipcns_notifier(struct ipc_namespace *);
extern int cond_register_ipcns_notifier(struct ipc_namespace *);
extern void unregister_ipcns_notifier(struct ipc_namespace *);
extern int ipcns_notify(unsigned long);
+extern void shm_destroy_orphaned(struct ipc_namespace *ns);
#else /* CONFIG_SYSVIPC */
static inline int register_ipcns_notifier(struct ipc_namespace *ns)
{ return 0; }
@@ -79,6 +85,7 @@ static inline int cond_register_ipcns_notifier(struct ipc_namespace *ns)
{ return 0; }
static inline void unregister_ipcns_notifier(struct ipc_namespace *ns) { }
static inline int ipcns_notify(unsigned long l) { return 0; }
+static inline void shm_destroy_orphaned(struct ipc_namespace *ns) {}
#endif /* CONFIG_SYSVIPC */
#ifdef CONFIG_POSIX_MQUEUE
diff --git a/include/linux/shm.h b/include/linux/shm.h
index eca6235..92808b8 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -95,6 +95,9 @@ struct shmid_kernel /* private to the kernel */
pid_t shm_cprid;
pid_t shm_lprid;
struct user_struct *mlock_user;
+
+ /* The task created the shm object. NULL if the task is dead. */
+ struct task_struct *shm_creator;
};
/* shm_mode upper byte flags */
@@ -106,6 +109,7 @@ struct shmid_kernel /* private to the kernel */
#ifdef CONFIG_SYSVIPC
long do_shmat(int shmid, char __user *shmaddr, int shmflg, unsigned long *addr);
extern int is_file_shm_hugepages(struct file *file);
+extern void exit_shm(struct task_struct *task);
#else
static inline long do_shmat(int shmid, char __user *shmaddr,
int shmflg, unsigned long *addr)
@@ -116,6 +120,9 @@ static inline int is_file_shm_hugepages(struct file *file)
{
return 0;
}
+static inline void exit_shm(struct task_struct *task)
+{
+}
#endif
#endif /* __KERNEL__ */
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 56410fa..00fba2b 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -31,12 +31,37 @@ static int proc_ipc_dointvec(ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
struct ctl_table ipc_table;
+
memcpy(&ipc_table, table, sizeof(ipc_table));
ipc_table.data = get_ipc(table);
return proc_dointvec(&ipc_table, write, buffer, lenp, ppos);
}
+static int proc_ipc_dointvec_minmax(ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct ctl_table ipc_table;
+
+ memcpy(&ipc_table, table, sizeof(ipc_table));
+ ipc_table.data = get_ipc(table);
+
+ return proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos);
+}
+
+static int proc_ipc_dointvec_minmax_orphans(ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct ipc_namespace *ns = current->nsproxy->ipc_ns;
+ int err = proc_ipc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+ if (err < 0)
+ return err;
+ if (ns->shm_rmid_forced)
+ shm_destroy_orphaned(ns);
+ return err;
+}
+
static int proc_ipc_callback_dointvec(ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
@@ -125,6 +150,8 @@ static int proc_ipcauto_dointvec_minmax(ctl_table *table, int write,
#else
#define proc_ipc_doulongvec_minmax NULL
#define proc_ipc_dointvec NULL
+#define proc_ipc_dointvec_minmax NULL
+#define proc_ipc_dointvec_minmax_orphans NULL
#define proc_ipc_callback_dointvec NULL
#define proc_ipcauto_dointvec_minmax NULL
#endif
@@ -155,6 +182,15 @@ static struct ctl_table ipc_kern_table[] = {
.proc_handler = proc_ipc_dointvec,
},
{
+ .procname = "shm_rmid_forced",
+ .data = &init_ipc_ns.shm_rmid_forced,
+ .maxlen = sizeof(init_ipc_ns.shm_rmid_forced),
+ .mode = 0644,
+ .proc_handler = proc_ipc_dointvec_minmax_orphans,
+ .extra1 = &zero,
+ .extra2 = &one,
+ },
+ {
.procname = "msgmax",
.data = &init_ipc_ns.msg_ctlmax,
.maxlen = sizeof (init_ipc_ns.msg_ctlmax),
diff --git a/ipc/shm.c b/ipc/shm.c
index ab3385a..bf46636 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -74,6 +74,7 @@ void shm_init_ns(struct ipc_namespace *ns)
ns->shm_ctlmax = SHMMAX;
ns->shm_ctlall = SHMALL;
ns->shm_ctlmni = SHMMNI;
+ ns->shm_rmid_forced = 1;
ns->shm_tot = 0;
ipc_init_ids(&shm_ids(ns));
}
@@ -130,6 +131,12 @@ static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id)
return container_of(ipcp, struct shmid_kernel, shm_perm);
}
+static inline void shm_lock_by_ptr(struct shmid_kernel *ipcp)
+{
+ rcu_read_lock();
+ spin_lock(&ipcp->shm_perm.lock);
+}
+
static inline struct shmid_kernel *shm_lock_check(struct ipc_namespace *ns,
int id)
{
@@ -187,6 +194,23 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
}
/*
+ * shm_may_destroy - identifies whether shm segment should be destroyed now
+ *
+ * Returns true if and only if there are no active users of the segment and
+ * one of the following is true:
+ *
+ * 1) shmctl(id, IPC_RMID, NULL) was called for this shp
+ *
+ * 2) sysctl kernel.shm_rmid_forced is set to 1.
+ */
+static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
+{
+ return (shp->shm_nattch == 0) &&
+ (ns->shm_rmid_forced ||
+ (shp->shm_perm.mode & SHM_DEST));
+}
+
+/*
* remove the attach descriptor vma.
* free memory for segment if it is marked destroyed.
* The descriptor has already been removed from the current->mm->mmap list
@@ -206,14 +230,87 @@ static void shm_close(struct vm_area_struct *vma)
shp->shm_lprid = task_tgid_vnr(current);
shp->shm_dtim = get_seconds();
shp->shm_nattch--;
- if(shp->shm_nattch == 0 &&
- shp->shm_perm.mode & SHM_DEST)
+ if (shm_may_destroy(ns, shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
up_write(&shm_ids(ns).rw_mutex);
}
+/* Called with ns->shm_ids(ns).rw_mutex locked */
+static int shm_try_destroy_current(int id, void *p, void *data)
+{
+ struct ipc_namespace *ns = data;
+ struct kern_ipc_perm *ipcp = p;
+ struct shmid_kernel *shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+
+ if (shp->shm_creator != current)
+ return 0;
+
+ /*
+ * Mark it as orphaned to destroy the segment when
+ * kernel.shm_rmid_forced is changed.
+ * It is noop if the following shm_may_destroy() returns true.
+ */
+ shp->shm_creator = NULL;
+
+ /*
+ * Don't even try to destroy it. If shm_rmid_forced=0 and IPC_RMID
+ * is not set, it shouldn't be deleted here.
+ */
+ if (!ns->shm_rmid_forced)
+ return 0;
+
+ if (shm_may_destroy(ns, shp)) {
+ shm_lock_by_ptr(shp);
+ shm_destroy(ns, shp);
+ }
+ return 0;
+}
+
+/* Called with ns->shm_ids(ns).rw_mutex locked */
+static int shm_try_destroy_orphaned(int id, void *p, void *data)
+{
+ struct ipc_namespace *ns = data;
+ struct kern_ipc_perm *ipcp = p;
+ struct shmid_kernel *shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+
+ /*
+ * We want to destroy segments without users and with already
+ * exit'ed originating process.
+ *
+ * As shp->* are changed under rw_mutex, it's safe to skip shp locking.
+ */
+ if (shp->shm_creator != NULL)
+ return 0;
+
+ if (shm_may_destroy(ns, shp)) {
+ shm_lock_by_ptr(shp);
+ shm_destroy(ns, shp);
+ }
+ return 0;
+}
+
+void shm_destroy_orphaned(struct ipc_namespace *ns)
+{
+ down_write(&shm_ids(ns).rw_mutex);
+ if (&shm_ids(ns).in_use)
+ idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_orphaned, ns);
+ up_write(&shm_ids(ns).rw_mutex);
+}
+
+
+void exit_shm(struct task_struct *task)
+{
+ struct ipc_namespace *ns = task->nsproxy->ipc_ns;
+
+ /* Destroy all already created segments, but not mapped yet */
+ down_write(&shm_ids(ns).rw_mutex);
+ if (&shm_ids(ns).in_use)
+ idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_current, ns);
+ up_write(&shm_ids(ns).rw_mutex);
+}
+
static int shm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
struct file *file = vma->vm_file;
@@ -404,6 +501,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
shp->shm_segsz = size;
shp->shm_nattch = 0;
shp->shm_file = file;
+ shp->shm_creator = current;
/*
* shmid gets reported as "inode#" in /proc/pid/maps.
* proc-ps tools use this. Changing this will break them.
@@ -950,8 +1048,7 @@ out_nattch:
shp = shm_lock(ns, shmid);
BUG_ON(IS_ERR(shp));
shp->shm_nattch--;
- if(shp->shm_nattch == 0 &&
- shp->shm_perm.mode & SHM_DEST)
+ if (shm_may_destroy(ns, shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
diff --git a/kernel/exit.c b/kernel/exit.c
index 20a4064..816c610 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -991,6 +991,7 @@ NORET_TYPE void do_exit(long code)
trace_sched_process_exit(tsk);
exit_sem(tsk);
+ exit_shm(tsk);
exit_files(tsk);
exit_fs(tsk);
check_stack_usage();
--
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [kernel-hardening] Re: [PATCH] shm: handle separate PID namespaces case
2011-07-05 17:42 ` Vasiliy Kulikov
@ 2011-07-06 16:31 ` Serge Hallyn
2011-07-06 16:57 ` Vasiliy Kulikov
0 siblings, 1 reply; 22+ messages in thread
From: Serge Hallyn @ 2011-07-06 16:31 UTC (permalink / raw)
To: Vasiliy Kulikov
Cc: akpm, daniel.lezcano, ebiederm, mingo, oleg, rdunlap, tj,
kernel-hardening
Quoting Vasiliy Kulikov (segoon@openwall.com):
> On Tue, Jul 05, 2011 at 10:57 -0500, Serge Hallyn wrote:
> > Do you have all of these patches applied in one git tree? Could
> > you do a quick 'git diff master..' (assuming master is Linus' tree)
> > and send the result (or send a git url), and I'll take a last closer
> > look at the patch tomorrow?
>
> These patches are against 3.0-rc2, it should be simple to rebase to -rc5
> since ipc/ is not heavily changed nowadays. The patches are stored in
> -mm tree. The blob against -rc2 is as follows:
>
Thanks, Vasiliy. I don't see anything wrong with it, though I do
wory (see below) about contention on task exit? The rest of my
comments are just safe-to-ignore-of-you-like nits.
> ---
>
> Documentation/sysctl/kernel.txt | 22 ++++++++
> include/linux/ipc_namespace.h | 7 +++
> include/linux/shm.h | 7 +++
> ipc/ipc_sysctl.c | 36 +++++++++++++
> ipc/shm.c | 105 +++++++++++++++++++++++++++++++++++++--
> kernel/exit.c | 1 +
> 6 files changed, 174 insertions(+), 4 deletions(-)
> ---
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index 5e7cb39..4d7b866 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -62,6 +62,7 @@ show up in /proc/sys/kernel:
> - shmall
> - shmmax [ sysv ipc ]
> - shmmni
> +- shm_rmid_forced
> - stop-a [ SPARC only ]
> - sysrq ==> Documentation/sysrq.txt
> - tainted
> @@ -475,6 +476,27 @@ kernel. This value defaults to SHMMAX.
>
> ==============================================================
>
> +shm_rmid_forced:
> +
> +Linux lets you set resource limits, including how much memory one
> +process can consume, via setrlimit(2). Unfortunately, shared memory
> +segments are allowed to exist without association with any process, and
> +thus might not be counted against any resource limits. If enabled,
> +shared memory segments are automatically destroyed when their attach
> +count becomes zero after a detach or a process termination. It will
> +also destroy segments that were created, but never attached to, on exit
> +from the process. The only use left for IPC_RMID is to immediately
> +destroy an unattached segment. Of course, this breaks the way things are
> +defined, so some applications might stop working. Note that this
> +feature will do you no good unless you also configure your resource
> +limits (in particular, RLIMIT_AS and RLIMIT_NPROC). Most systems don't
> +need this.
> +
> +Note that if you change this from 0 to 1, already created segments
> +without users and with a dead originative process will be destroyed.
> +
> +==============================================================
> +
> softlockup_thresh:
>
> This value can be used to lower the softlockup tolerance threshold. The
> diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
> index a6d1655..8a297a5 100644
> --- a/include/linux/ipc_namespace.h
> +++ b/include/linux/ipc_namespace.h
> @@ -44,6 +44,11 @@ struct ipc_namespace {
> size_t shm_ctlall;
> int shm_ctlmni;
> int shm_tot;
> + /*
> + * Defines whether IPC_RMID is forced for _all_ shm segments regardless
> + * of shmctl()
> + */
> + int shm_rmid_forced;
>
> struct notifier_block ipcns_nb;
>
> @@ -72,6 +77,7 @@ extern int register_ipcns_notifier(struct ipc_namespace *);
> extern int cond_register_ipcns_notifier(struct ipc_namespace *);
> extern void unregister_ipcns_notifier(struct ipc_namespace *);
> extern int ipcns_notify(unsigned long);
> +extern void shm_destroy_orphaned(struct ipc_namespace *ns);
> #else /* CONFIG_SYSVIPC */
> static inline int register_ipcns_notifier(struct ipc_namespace *ns)
> { return 0; }
> @@ -79,6 +85,7 @@ static inline int cond_register_ipcns_notifier(struct ipc_namespace *ns)
> { return 0; }
> static inline void unregister_ipcns_notifier(struct ipc_namespace *ns) { }
> static inline int ipcns_notify(unsigned long l) { return 0; }
> +static inline void shm_destroy_orphaned(struct ipc_namespace *ns) {}
> #endif /* CONFIG_SYSVIPC */
>
> #ifdef CONFIG_POSIX_MQUEUE
> diff --git a/include/linux/shm.h b/include/linux/shm.h
> index eca6235..92808b8 100644
> --- a/include/linux/shm.h
> +++ b/include/linux/shm.h
> @@ -95,6 +95,9 @@ struct shmid_kernel /* private to the kernel */
> pid_t shm_cprid;
> pid_t shm_lprid;
> struct user_struct *mlock_user;
> +
> + /* The task created the shm object. NULL if the task is dead. */
> + struct task_struct *shm_creator;
> };
>
> /* shm_mode upper byte flags */
> @@ -106,6 +109,7 @@ struct shmid_kernel /* private to the kernel */
> #ifdef CONFIG_SYSVIPC
> long do_shmat(int shmid, char __user *shmaddr, int shmflg, unsigned long *addr);
> extern int is_file_shm_hugepages(struct file *file);
> +extern void exit_shm(struct task_struct *task);
> #else
> static inline long do_shmat(int shmid, char __user *shmaddr,
> int shmflg, unsigned long *addr)
> @@ -116,6 +120,9 @@ static inline int is_file_shm_hugepages(struct file *file)
> {
> return 0;
> }
> +static inline void exit_shm(struct task_struct *task)
> +{
> +}
> #endif
>
> #endif /* __KERNEL__ */
> diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
> index 56410fa..00fba2b 100644
> --- a/ipc/ipc_sysctl.c
> +++ b/ipc/ipc_sysctl.c
> @@ -31,12 +31,37 @@ static int proc_ipc_dointvec(ctl_table *table, int write,
> void __user *buffer, size_t *lenp, loff_t *ppos)
> {
> struct ctl_table ipc_table;
> +
> memcpy(&ipc_table, table, sizeof(ipc_table));
> ipc_table.data = get_ipc(table);
>
> return proc_dointvec(&ipc_table, write, buffer, lenp, ppos);
> }
>
> +static int proc_ipc_dointvec_minmax(ctl_table *table, int write,
> + void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> + struct ctl_table ipc_table;
> +
> + memcpy(&ipc_table, table, sizeof(ipc_table));
> + ipc_table.data = get_ipc(table);
> +
> + return proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos);
> +}
> +
> +static int proc_ipc_dointvec_minmax_orphans(ctl_table *table, int write,
> + void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> + struct ipc_namespace *ns = current->nsproxy->ipc_ns;
> + int err = proc_ipc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +
> + if (err < 0)
> + return err;
> + if (ns->shm_rmid_forced)
> + shm_destroy_orphaned(ns);
> + return err;
> +}
> +
> static int proc_ipc_callback_dointvec(ctl_table *table, int write,
> void __user *buffer, size_t *lenp, loff_t *ppos)
> {
> @@ -125,6 +150,8 @@ static int proc_ipcauto_dointvec_minmax(ctl_table *table, int write,
> #else
> #define proc_ipc_doulongvec_minmax NULL
> #define proc_ipc_dointvec NULL
> +#define proc_ipc_dointvec_minmax NULL
> +#define proc_ipc_dointvec_minmax_orphans NULL
> #define proc_ipc_callback_dointvec NULL
> #define proc_ipcauto_dointvec_minmax NULL
> #endif
> @@ -155,6 +182,15 @@ static struct ctl_table ipc_kern_table[] = {
> .proc_handler = proc_ipc_dointvec,
> },
> {
> + .procname = "shm_rmid_forced",
> + .data = &init_ipc_ns.shm_rmid_forced,
> + .maxlen = sizeof(init_ipc_ns.shm_rmid_forced),
> + .mode = 0644,
> + .proc_handler = proc_ipc_dointvec_minmax_orphans,
> + .extra1 = &zero,
> + .extra2 = &one,
> + },
> + {
> .procname = "msgmax",
> .data = &init_ipc_ns.msg_ctlmax,
> .maxlen = sizeof (init_ipc_ns.msg_ctlmax),
> diff --git a/ipc/shm.c b/ipc/shm.c
> index ab3385a..bf46636 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -74,6 +74,7 @@ void shm_init_ns(struct ipc_namespace *ns)
> ns->shm_ctlmax = SHMMAX;
> ns->shm_ctlall = SHMALL;
> ns->shm_ctlmni = SHMMNI;
> + ns->shm_rmid_forced = 1;
Given the description in Documentation/sysctl/kernel.txt, shouldn't
this default to 0?
> ns->shm_tot = 0;
> ipc_init_ids(&shm_ids(ns));
> }
> @@ -130,6 +131,12 @@ static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id)
> return container_of(ipcp, struct shmid_kernel, shm_perm);
> }
>
> +static inline void shm_lock_by_ptr(struct shmid_kernel *ipcp)
> +{
> + rcu_read_lock();
> + spin_lock(&ipcp->shm_perm.lock);
> +}
> +
> static inline struct shmid_kernel *shm_lock_check(struct ipc_namespace *ns,
> int id)
> {
> @@ -187,6 +194,23 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
> }
>
> /*
> + * shm_may_destroy - identifies whether shm segment should be destroyed now
> + *
> + * Returns true if and only if there are no active users of the segment and
> + * one of the following is true:
> + *
> + * 1) shmctl(id, IPC_RMID, NULL) was called for this shp
> + *
> + * 2) sysctl kernel.shm_rmid_forced is set to 1.
> + */
> +static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
'may' usually implies a permission check. Would this be better named
'shm_should_destroy()'?
> +{
> + return (shp->shm_nattch == 0) &&
> + (ns->shm_rmid_forced ||
> + (shp->shm_perm.mode & SHM_DEST));
> +}
> +
> +/*
> * remove the attach descriptor vma.
> * free memory for segment if it is marked destroyed.
> * The descriptor has already been removed from the current->mm->mmap list
> @@ -206,14 +230,87 @@ static void shm_close(struct vm_area_struct *vma)
> shp->shm_lprid = task_tgid_vnr(current);
> shp->shm_dtim = get_seconds();
> shp->shm_nattch--;
> - if(shp->shm_nattch == 0 &&
> - shp->shm_perm.mode & SHM_DEST)
> + if (shm_may_destroy(ns, shp))
> shm_destroy(ns, shp);
> else
> shm_unlock(shp);
> up_write(&shm_ids(ns).rw_mutex);
> }
>
> +/* Called with ns->shm_ids(ns).rw_mutex locked */
> +static int shm_try_destroy_current(int id, void *p, void *data)
> +{
> + struct ipc_namespace *ns = data;
> + struct kern_ipc_perm *ipcp = p;
> + struct shmid_kernel *shp = container_of(ipcp, struct shmid_kernel, shm_perm);
> +
> + if (shp->shm_creator != current)
> + return 0;
> +
> + /*
> + * Mark it as orphaned to destroy the segment when
> + * kernel.shm_rmid_forced is changed.
> + * It is noop if the following shm_may_destroy() returns true.
> + */
> + shp->shm_creator = NULL;
> +
> + /*
> + * Don't even try to destroy it. If shm_rmid_forced=0 and IPC_RMID
> + * is not set, it shouldn't be deleted here.
> + */
> + if (!ns->shm_rmid_forced)
> + return 0;
> +
> + if (shm_may_destroy(ns, shp)) {
This seems redundant. Would it be better to just make this
if (shp->shm_nattch == 0) {
here as we already know ns->shm_rmid_forced == 1?
> + shm_lock_by_ptr(shp);
> + shm_destroy(ns, shp);
Wish there were a clean way to document that the locks will be
released by shm_destroy().
> + }
> + return 0;
> +}
> +
> +/* Called with ns->shm_ids(ns).rw_mutex locked */
> +static int shm_try_destroy_orphaned(int id, void *p, void *data)
> +{
> + struct ipc_namespace *ns = data;
> + struct kern_ipc_perm *ipcp = p;
> + struct shmid_kernel *shp = container_of(ipcp, struct shmid_kernel, shm_perm);
> +
> + /*
> + * We want to destroy segments without users and with already
> + * exit'ed originating process.
> + *
> + * As shp->* are changed under rw_mutex, it's safe to skip shp locking.
> + */
> + if (shp->shm_creator != NULL)
> + return 0;
> +
> + if (shm_may_destroy(ns, shp)) {
> + shm_lock_by_ptr(shp);
> + shm_destroy(ns, shp);
> + }
> + return 0;
> +}
> +
> +void shm_destroy_orphaned(struct ipc_namespace *ns)
> +{
> + down_write(&shm_ids(ns).rw_mutex);
> + if (&shm_ids(ns).in_use)
> + idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_orphaned, ns);
> + up_write(&shm_ids(ns).rw_mutex);
Hm, is this going to cause contention when killing a lot of tasks?
> +}
> +
> +
> +void exit_shm(struct task_struct *task)
> +{
> + struct ipc_namespace *ns = task->nsproxy->ipc_ns;
> +
> + /* Destroy all already created segments, but not mapped yet */
> + down_write(&shm_ids(ns).rw_mutex);
> + if (&shm_ids(ns).in_use)
> + idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_current, ns);
> + up_write(&shm_ids(ns).rw_mutex);
Having exit_shm() call shm_destroy_orphaned(task->nsproxy->ipc_ns) seems
more future-proof?
> +}
> +
> static int shm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> {
> struct file *file = vma->vm_file;
> @@ -404,6 +501,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> shp->shm_segsz = size;
> shp->shm_nattch = 0;
> shp->shm_file = file;
> + shp->shm_creator = current;
> /*
> * shmid gets reported as "inode#" in /proc/pid/maps.
> * proc-ps tools use this. Changing this will break them.
> @@ -950,8 +1048,7 @@ out_nattch:
> shp = shm_lock(ns, shmid);
> BUG_ON(IS_ERR(shp));
> shp->shm_nattch--;
> - if(shp->shm_nattch == 0 &&
> - shp->shm_perm.mode & SHM_DEST)
> + if (shm_may_destroy(ns, shp))
> shm_destroy(ns, shp);
> else
> shm_unlock(shp);
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 20a4064..816c610 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -991,6 +991,7 @@ NORET_TYPE void do_exit(long code)
> trace_sched_process_exit(tsk);
>
> exit_sem(tsk);
> + exit_shm(tsk);
> exit_files(tsk);
> exit_fs(tsk);
> check_stack_usage();
> --
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kernel-hardening] Re: [PATCH] shm: handle separate PID namespaces case
2011-07-06 16:31 ` Serge Hallyn
@ 2011-07-06 16:57 ` Vasiliy Kulikov
2011-07-06 18:08 ` Oleg Nesterov
0 siblings, 1 reply; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-06 16:57 UTC (permalink / raw)
To: kernel-hardening; +Cc: akpm, daniel.lezcano, ebiederm, mingo, oleg, rdunlap, tj
Hi Serge,
On Wed, Jul 06, 2011 at 11:31 -0500, Serge Hallyn wrote:
> > diff --git a/ipc/shm.c b/ipc/shm.c
> > index ab3385a..bf46636 100644
> > --- a/ipc/shm.c
> > +++ b/ipc/shm.c
> > @@ -74,6 +74,7 @@ void shm_init_ns(struct ipc_namespace *ns)
> > ns->shm_ctlmax = SHMMAX;
> > ns->shm_ctlall = SHMALL;
> > ns->shm_ctlmni = SHMMNI;
> > + ns->shm_rmid_forced = 1;
>
> Given the description in Documentation/sysctl/kernel.txt, shouldn't
> this default to 0?
This is a change for testing purposes only, by Andrew:
http://www.openwall.com/lists/kernel-hardening/2011/06/29/7
> > /*
> > + * shm_may_destroy - identifies whether shm segment should be destroyed now
> > + *
> > + * Returns true if and only if there are no active users of the segment and
> > + * one of the following is true:
> > + *
> > + * 1) shmctl(id, IPC_RMID, NULL) was called for this shp
> > + *
> > + * 2) sysctl kernel.shm_rmid_forced is set to 1.
> > + */
> > +static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
>
> 'may' usually implies a permission check. Would this be better named
> 'shm_should_destroy()'?
Looks right.
> > +/* Called with ns->shm_ids(ns).rw_mutex locked */
> > +static int shm_try_destroy_current(int id, void *p, void *data)
> > +{
> > + struct ipc_namespace *ns = data;
> > + struct kern_ipc_perm *ipcp = p;
> > + struct shmid_kernel *shp = container_of(ipcp, struct shmid_kernel, shm_perm);
> > +
> > + if (shp->shm_creator != current)
> > + return 0;
> > +
> > + /*
> > + * Mark it as orphaned to destroy the segment when
> > + * kernel.shm_rmid_forced is changed.
> > + * It is noop if the following shm_may_destroy() returns true.
> > + */
> > + shp->shm_creator = NULL;
> > +
> > + /*
> > + * Don't even try to destroy it. If shm_rmid_forced=0 and IPC_RMID
> > + * is not set, it shouldn't be deleted here.
> > + */
> > + if (!ns->shm_rmid_forced)
> > + return 0;
> > +
> > + if (shm_may_destroy(ns, shp)) {
>
> This seems redundant. Would it be better to just make this
>
> if (shp->shm_nattch == 0) {
>
> here as we already know ns->shm_rmid_forced == 1?
As this check doesn't cost much (shm_may_destroy() even may be inlined),
I want to leave the code here more readable.
> > + shm_lock_by_ptr(shp);
> > + shm_destroy(ns, shp);
>
> Wish there were a clean way to document that the locks will be
> released by shm_destroy().
Isn't the current comment sufficient?
/*
* shm_destroy - free the struct shmid_kernel
*
* @ns: namespace
* @shp: struct to free
*
* It has to be called with shp and shm_ids.rw_mutex (writer) locked,
* but returns with shp unlocked and freed.
*/
> > +void shm_destroy_orphaned(struct ipc_namespace *ns)
> > +{
> > + down_write(&shm_ids(ns).rw_mutex);
> > + if (&shm_ids(ns).in_use)
> > + idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_orphaned, ns);
> > + up_write(&shm_ids(ns).rw_mutex);
>
> Hm, is this going to cause contention when killing a lot of tasks?
The default limit is 4096 segments, IMO too few to cause something
nasty.
> > +}
> > +
> > +
> > +void exit_shm(struct task_struct *task)
> > +{
> > + struct ipc_namespace *ns = task->nsproxy->ipc_ns;
> > +
> > + /* Destroy all already created segments, but not mapped yet */
> > + down_write(&shm_ids(ns).rw_mutex);
> > + if (&shm_ids(ns).in_use)
> > + idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_current, ns);
> > + up_write(&shm_ids(ns).rw_mutex);
>
> Having exit_shm() call shm_destroy_orphaned(task->nsproxy->ipc_ns) seems
> more future-proof?
shm_destroy_orphaned() doesn't clear ->shm_creator. Logically it sovles
another problem - it is used ONLY to be consistent while changing
kernel.shm_rmid_forced (having orphans with shm_rmid_forced=1 is not
honest).
Thanks,
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kernel-hardening] Re: [PATCH] shm: handle separate PID namespaces case
2011-07-06 16:57 ` Vasiliy Kulikov
@ 2011-07-06 18:08 ` Oleg Nesterov
2011-07-06 18:35 ` Vasiliy Kulikov
0 siblings, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2011-07-06 18:08 UTC (permalink / raw)
To: Vasiliy Kulikov
Cc: kernel-hardening, akpm, daniel.lezcano, ebiederm, mingo, rdunlap,
tj
On 07/06, Vasiliy Kulikov wrote:
>
> > > +void exit_shm(struct task_struct *task)
> > > +{
> > > + struct ipc_namespace *ns = task->nsproxy->ipc_ns;
> > > +
> > > + /* Destroy all already created segments, but not mapped yet */
> > > + down_write(&shm_ids(ns).rw_mutex);
> > > + if (&shm_ids(ns).in_use)
> > > + idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_current, ns);
> > > + up_write(&shm_ids(ns).rw_mutex);
> >
> > Having exit_shm() call shm_destroy_orphaned(task->nsproxy->ipc_ns) seems
> > more future-proof?
>
> shm_destroy_orphaned() doesn't clear ->shm_creator. Logically it sovles
> another problem - it is used ONLY to be consistent while changing
> kernel.shm_rmid_forced (having orphans with shm_rmid_forced=1 is not
> honest).
Yes, there are different things.
Cough. I stil think exit_shm() should check .in_use != 0 lockless.
Oleg.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kernel-hardening] Re: [PATCH] shm: handle separate PID namespaces case
2011-07-06 18:08 ` Oleg Nesterov
@ 2011-07-06 18:35 ` Vasiliy Kulikov
0 siblings, 0 replies; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-06 18:35 UTC (permalink / raw)
To: kernel-hardening; +Cc: akpm, daniel.lezcano, ebiederm, mingo, rdunlap, tj
On Wed, Jul 06, 2011 at 20:08 +0200, Oleg Nesterov wrote:
> Cough. I stil think exit_shm() should check .in_use != 0 lockless.
Give me a proof it is safe for all architectures :)
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 22+ messages in thread
* [kernel-hardening] Re: [PATCH v2] shm: handle separate PID namespaces case
2011-07-05 17:37 ` [kernel-hardening] [PATCH v2] shm: handle separate PID namespaces case Vasiliy Kulikov
@ 2011-07-15 6:45 ` Vasiliy Kulikov
0 siblings, 0 replies; 22+ messages in thread
From: Vasiliy Kulikov @ 2011-07-15 6:45 UTC (permalink / raw)
To: akpm
Cc: Oleg Nesterov, Serge Hallyn, daniel.lezcano, ebiederm, mingo,
rdunlap, tj, kernel-hardening
Hi Andrew,
On Tue, Jul 05, 2011 at 21:37 +0400, Vasiliy Kulikov wrote:
> shm_try_destroy_orphaned() and shm_try_destroy_current() didn't handle
> the case of separate PID namespaces, but a single IPC namespace. If
> there are tasks with the same PID values using the same shmem object,
> the wrong destroy decision could be reached.
>
> On shm segment creation store the pointer to the creator task in
> shmid_kernel->shm_creator field and zero it on task exit. Then
> use the ->shm_creator insread of shm_cprid in both functions. As
> shmid_kernel object is already locked at this stage, no additional
> locking is needed.
Can you please pick these 2 patches? This one is a bug fix and another
is a cleanup/speedup. Thanks!
> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
> ---
> include/linux/shm.h | 3 +++
> ipc/shm.c | 27 ++++++++++++++++++++-------
> 2 files changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/shm.h b/include/linux/shm.h
> index b030a4e..12d2234 100644
> --- a/include/linux/shm.h
> +++ b/include/linux/shm.h
> @@ -95,6 +95,9 @@ struct shmid_kernel /* private to the kernel */
> pid_t shm_cprid;
> pid_t shm_lprid;
> struct user_struct *mlock_user;
> +
> + /* The task created the shm object. NULL if the task is dead. */
> + struct task_struct *shm_creator;
> };
>
> /* shm_mode upper byte flags */
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 22006f1..3baae98 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -239,7 +239,23 @@ static int shm_try_destroy_current(int id, void *p, void *data)
> if (IS_ERR(shp))
> return 0;
>
> - if (shp->shm_cprid != task_tgid_vnr(current)) {
> + if (shp->shm_creator != current) {
> + shm_unlock(shp);
> + return 0;
> + }
> +
> + /*
> + * Mark it as orphaned to destroy the segment when
> + * kernel.shm_rmid_forced is changed.
> + * It is noop if the following shm_may_destroy() returns true.
> + */
> + shp->shm_creator = NULL;
> +
> + /*
> + * Don't even try to destroy it. If shm_rmid_forced=0 and IPC_RMID
> + * is not set, it shouldn't be deleted here.
> + */
> + if (!ns->shm_rmid_forced) {
> shm_unlock(shp);
> return 0;
> }
> @@ -255,7 +271,6 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
> {
> struct ipc_namespace *ns = data;
> struct shmid_kernel *shp = shm_lock(ns, id);
> - struct task_struct *task;
>
> if (IS_ERR(shp))
> return 0;
> @@ -263,11 +278,8 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
> /*
> * We want to destroy segments without users and with already
> * exit'ed originating process.
> - *
> - * XXX: the originating process may exist in another pid namespace.
> */
> - task = find_task_by_vpid(shp->shm_cprid);
> - if (task != NULL) {
> + if (shp->shm_creator != NULL) {
> shm_unlock(shp);
> return 0;
> }
> @@ -295,7 +307,7 @@ void exit_shm(struct task_struct *task)
> if (!nsp)
> return;
> ns = nsp->ipc_ns;
> - if (!ns || !ns->shm_rmid_forced)
> + if (!ns)
> return;
>
> /* Destroy all already created segments, but not mapped yet */
> @@ -494,6 +506,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> shp->shm_segsz = size;
> shp->shm_nattch = 0;
> shp->shm_file = file;
> + shp->shm_creator = current;
> /*
> * shmid gets reported as "inode#" in /proc/pid/maps.
> * proc-ps tools use this. Changing this will break them.
> --
> 1.7.0.4
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2011-07-15 6:45 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <201106292214.p5TMEtHg015372@imap1.linux-foundation.org>
[not found] ` <20110630134855.GA6165@mail.hallyn.com>
2011-06-30 13:57 ` [kernel-hardening] Re: + ipc-introduce-shm_rmid_forced-sysctl.patch added to -mm tree Vasiliy Kulikov
2011-07-03 18:00 ` Vasiliy Kulikov
2011-07-04 11:55 ` [kernel-hardening] [PATCH] shm: handle separate PID namespaces case Vasiliy Kulikov
2011-07-04 15:05 ` [kernel-hardening] " Oleg Nesterov
2011-07-04 15:26 ` Vasiliy Kulikov
2011-07-04 15:37 ` Oleg Nesterov
2011-07-04 15:48 ` Vasiliy Kulikov
2011-07-04 17:01 ` [kernel-hardening] [PATCH] shm: optimize locking and ipc_namespace getting Vasiliy Kulikov
2011-07-04 17:29 ` [kernel-hardening] " Oleg Nesterov
2011-07-04 17:51 ` Vasiliy Kulikov
2011-07-05 17:38 ` [kernel-hardening] [PATCH v2] " Vasiliy Kulikov
2011-07-05 17:37 ` [kernel-hardening] [PATCH v2] shm: handle separate PID namespaces case Vasiliy Kulikov
2011-07-15 6:45 ` [kernel-hardening] " Vasiliy Kulikov
2011-07-05 14:26 ` [kernel-hardening] Re: [PATCH] " Serge Hallyn
2011-07-05 14:50 ` Vasiliy Kulikov
2011-07-05 15:57 ` Serge Hallyn
2011-07-05 17:42 ` Vasiliy Kulikov
2011-07-06 16:31 ` Serge Hallyn
2011-07-06 16:57 ` Vasiliy Kulikov
2011-07-06 18:08 ` Oleg Nesterov
2011-07-06 18:35 ` Vasiliy Kulikov
2011-07-05 17:29 ` Vasiliy Kulikov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox