public inbox for kernel-hardening@lists.openwall.com
* [kernel-hardening] destroy unused shmem segments
@ 2011-06-13  6:42 Vasiliy Kulikov
  2011-06-13  7:07 ` Solar Designer
From: Vasiliy Kulikov @ 2011-06-13  6:42 UTC (permalink / raw)
  To: kernel-hardening

Solar,

I'm looking into the -ow patch and I see this code:

	#ifdef CONFIG_HARDEN_SHM
	void shm_exit (void)
	{
		int i;
		struct shmid_kernel *shp;

		for (i = 0; i <= shm_ids.max_id; i++) {
			shp = shm_get(i);
			if (!shp) continue;

			if (shp->shm_cprid != current->pid) continue;

			if (shp->shm_nattch <= 0) {
				shp->shm_flags |= SHM_DEST;
				shm_destroy (shp);
			}
		}
	}
	#endif


	NORET_TYPE void do_exit(long code)
	{
		...
	#ifdef CONFIG_HARDEN_SHM
		shm_exit();
	#endif
		...
	}

However, the shm segment should already have been freed by exit_mm() => vma->close():

	static struct vm_operations_struct shm_vm_ops = {
		open:	shm_open,	/* callback for a new vm-area open */
		close:	shm_close,	/* callback for when the vm-area is released */
		nopage:	shmem_nopage,
	};

	static void shm_close (struct vm_area_struct *shmd)
	{
		...
		shp->shm_nattch--;
	#ifdef CONFIG_HARDEN_SHM
		if(shp->shm_nattch == 0) {
			shp->shm_flags |= SHM_DEST;
			shm_destroy (shp);
		}
	#else
		if(shp->shm_nattch == 0 &&
		   shp->shm_flags & SHM_DEST)
			shm_destroy (shp);
	#endif
		...
	}

Is this an additional "safety" check, or a workaround for some dubious
race?  I see no explicit need for such a freeing cycle in do_exit().

Thanks,

-- 
Vasiliy


* Re: [kernel-hardening] destroy unused shmem segments
  2011-06-13  6:42 Vasiliy Kulikov
@ 2011-06-13  7:07 ` Solar Designer
  2011-06-14  8:33   ` Vasiliy Kulikov
From: Solar Designer @ 2011-06-13  7:07 UTC (permalink / raw)
  To: kernel-hardening

Vasiliy,

On Mon, Jun 13, 2011 at 10:42:52AM +0400, Vasiliy Kulikov wrote:
> Is this an additional "safety" check, or a workaround for some dubious
> race?

Neither.  IIRC, I thought that this was needed, and not as a workaround,
nor because of a race.  However, your analysis appears to be correct to
me.  This leaves us with the following three possibilities:

- The extra code was never needed.  This is unlikely because I was
adding those pieces of code based on my testing results.

- The extra code was needed for some older kernel version (maybe older
than 2.4), then forward-ported (not carefully enough to spot this).
(Most CONFIG_HARDEN_* features in 2.4.x-ow date back to my patches for 2.0.)

- We're missing something now.

I think that you don't need to figure out which it is.  Rather, you
need to implement the functionality for 3.0 and test it.  Then do it for
RHEL6/OpenVZ as well.

> I see no explicit need for such a freeing cycle in do_exit().

Yes, it appears so from your analysis.

Thanks,

Alexander


* Re: [kernel-hardening] destroy unused shmem segments
  2011-06-13  7:07 ` Solar Designer
@ 2011-06-14  8:33   ` Vasiliy Kulikov
  2011-06-15 14:42     ` Solar Designer
From: Vasiliy Kulikov @ 2011-06-14  8:33 UTC (permalink / raw)
  To: kernel-hardening

Solar,

On Mon, Jun 13, 2011 at 11:07 +0400, Solar Designer wrote:
> - We're missing something now.

It is needed in case a shmem segment was created but never attached.  In
that case the refcount is never incremented or decremented, and
shm_close() is never called.

If shmctl() sets IPC_RMID, the segment is deleted; but since you
implemented implicit semantics (as if IPC_RMID were always set), an
explicit check at shmget()+exit() time is needed.

Spotted by reading -ow README and testing :)


I think forcing IPC_RMID should be configurable via sysctl like other
ipc variables.  Something like /proc/sys/kernel/shm_orphans_denied.
And I think changing it from 0 to 1 should destroy already orphaned
segments without users.


A patch implementing the feature for 3.0-rc2 is as follows:

 Documentation/sysctl/kernel.txt |   22 ++++++++++
 include/linux/ipc_namespace.h   |    3 +
 include/linux/shm.h             |    5 ++
 ipc/ipc_sysctl.c                |   33 +++++++++++++++
 ipc/shm.c                       |   87 +++++++++++++++++++++++++++++++++++++--
 kernel/exit.c                   |    1 +
 6 files changed, 147 insertions(+), 4 deletions(-)

---
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 5e7cb39..7a0ecfd 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -62,6 +62,7 @@ show up in /proc/sys/kernel:
 - shmall
 - shmmax                      [ sysv ipc ]
 - shmmni
+- shm_orphan_denied
 - stop-a                      [ SPARC only ]
 - sysrq                       ==> Documentation/sysrq.txt
 - tainted
@@ -475,6 +476,27 @@ kernel.  This value defaults to SHMMAX.
 
 ==============================================================
 
+shm_orphan_denied:
+
+Linux lets you set resource limits, including on how much memory one
+process can consume, via setrlimit(2).  Unfortunately, shared memory
+segments are allowed to exist without association with any process, and
+thus might not be counted against any resource limits.  If enabled,
+shared memory segments are automatically destroyed when their attach
+count becomes zero after a detach or a process termination.  It will
+also destroy segments that were created, but never attached to, on exit
+from the process.  The only use left for IPC_RMID is to immediately
+destroy an unattached segment. Of course, this breaks the way things are
+defined, so some applications might stop working.  Note that this
+feature will do you no good unless you also configure your resource
+limits (in particular, RLIMIT_AS and RLIMIT_NPROC). Most systems don't
+need this.
+
+Note that if you change this from 0 to 1, existing segments that have no
+users and whose creator process has died will be destroyed.
+
+==============================================================
+
 softlockup_thresh:
 
 This value can be used to lower the softlockup tolerance threshold.  The
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index a6d1655..1647795 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -44,6 +44,7 @@ struct ipc_namespace {
 	size_t		shm_ctlall;
 	int		shm_ctlmni;
 	int		shm_tot;
+	int		shm_orphans_denied;
 
 	struct notifier_block ipcns_nb;
 
@@ -72,6 +73,7 @@ extern int register_ipcns_notifier(struct ipc_namespace *);
 extern int cond_register_ipcns_notifier(struct ipc_namespace *);
 extern void unregister_ipcns_notifier(struct ipc_namespace *);
 extern int ipcns_notify(unsigned long);
+extern void shm_destroy_orphaned(struct ipc_namespace *ns);
 #else /* CONFIG_SYSVIPC */
 static inline int register_ipcns_notifier(struct ipc_namespace *ns)
 { return 0; }
@@ -79,6 +81,7 @@ static inline int cond_register_ipcns_notifier(struct ipc_namespace *ns)
 { return 0; }
 static inline void unregister_ipcns_notifier(struct ipc_namespace *ns) { }
 static inline int ipcns_notify(unsigned long l) { return 0; }
+static inline void shm_destroy_orphaned(struct ipc_namespace *ns) {}
 #endif /* CONFIG_SYSVIPC */
 
 #ifdef CONFIG_POSIX_MQUEUE
diff --git a/include/linux/shm.h b/include/linux/shm.h
index eca6235..b030a4e 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -106,6 +106,7 @@ struct shmid_kernel /* private to the kernel */
 #ifdef CONFIG_SYSVIPC
 long do_shmat(int shmid, char __user *shmaddr, int shmflg, unsigned long *addr);
 extern int is_file_shm_hugepages(struct file *file);
+extern void exit_shm(struct task_struct *task);
 #else
 static inline long do_shmat(int shmid, char __user *shmaddr,
 				int shmflg, unsigned long *addr)
@@ -116,6 +117,10 @@ static inline int is_file_shm_hugepages(struct file *file)
 {
 	return 0;
 }
+static inline void exit_shm(struct task_struct *task)
+{
+	return;
+}
 #endif
 
 #endif /* __KERNEL__ */
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 56410fa..1d36879 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -37,6 +37,28 @@ static int proc_ipc_dointvec(ctl_table *table, int write,
 	return proc_dointvec(&ipc_table, write, buffer, lenp, ppos);
 }
 
+static int proc_ipc_dointvec_minmax(ctl_table *table, int write,
+	void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct ctl_table ipc_table;
+	memcpy(&ipc_table, table, sizeof(ipc_table));
+	ipc_table.data = get_ipc(table);
+
+	return proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos);
+}
+
+static int proc_ipc_dointvec_minmax_orphans(ctl_table *table, int write,
+	void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct ipc_namespace *ns = current->nsproxy->ipc_ns;
+	int err = proc_ipc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	if (err < 0)
+		return err;
+	if (ns->shm_orphans_denied)
+		shm_destroy_orphaned(ns);
+	return err;
+}
+
 static int proc_ipc_callback_dointvec(ctl_table *table, int write,
 	void __user *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -125,6 +147,8 @@ static int proc_ipcauto_dointvec_minmax(ctl_table *table, int write,
 #else
 #define proc_ipc_doulongvec_minmax NULL
 #define proc_ipc_dointvec	   NULL
+#define proc_ipc_dointvec_minmax   NULL
+#define proc_ipc_dointvec_minmax_orphans   NULL
 #define proc_ipc_callback_dointvec NULL
 #define proc_ipcauto_dointvec_minmax NULL
 #endif
@@ -155,6 +179,15 @@ static struct ctl_table ipc_kern_table[] = {
 		.proc_handler	= proc_ipc_dointvec,
 	},
 	{
+		.procname	= "shm_orphans_denied",
+		.data		= &init_ipc_ns.shm_orphans_denied,
+		.maxlen		= sizeof (init_ipc_ns.shm_orphans_denied),
+		.mode		= 0644,
+		.proc_handler	= proc_ipc_dointvec_minmax_orphans,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
 		.procname	= "msgmax",
 		.data		= &init_ipc_ns.msg_ctlmax,
 		.maxlen		= sizeof (init_ipc_ns.msg_ctlmax),
diff --git a/ipc/shm.c b/ipc/shm.c
index ab3385a..91b3026 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -74,6 +74,7 @@ void shm_init_ns(struct ipc_namespace *ns)
 	ns->shm_ctlmax = SHMMAX;
 	ns->shm_ctlall = SHMALL;
 	ns->shm_ctlmni = SHMMNI;
+	ns->shm_orphans_denied = 0;
 	ns->shm_tot = 0;
 	ipc_init_ids(&shm_ids(ns));
 }
@@ -186,6 +187,13 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
 	ipc_rcu_putref(shp);
 }
 
+static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
+{
+	return (shp->shm_nattch == 0) &&
+	       (ns->shm_orphans_denied ||
+		(shp->shm_perm.mode & SHM_DEST));
+}
+
 /*
  * remove the attach descriptor vma.
  * free memory for segment if it is marked destroyed.
@@ -206,11 +214,83 @@ static void shm_close(struct vm_area_struct *vma)
 	shp->shm_lprid = task_tgid_vnr(current);
 	shp->shm_dtim = get_seconds();
 	shp->shm_nattch--;
-	if(shp->shm_nattch == 0 &&
-	   shp->shm_perm.mode & SHM_DEST)
+	if (shm_may_destroy(ns, shp))
+		shm_destroy(ns, shp);
+	else
+		shm_unlock(shp);
+	up_write(&shm_ids(ns).rw_mutex);
+}
+
+static int shm_try_destroy_current(int id, void *p, void *data)
+{
+	struct ipc_namespace *ns = data;
+	struct shmid_kernel *shp = shm_lock(ns, id);
+
+	if (IS_ERR(shp))
+		return 0;
+
+	if (shp->shm_cprid != task_tgid_vnr(current)) {
+		shm_unlock(shp);
+		return 0;
+	}
+
+	if (shm_may_destroy(ns, shp))
 		shm_destroy(ns, shp);
 	else
 		shm_unlock(shp);
+	return 0;
+}
+
+static int shm_try_destroy_orphaned(int id, void *p, void *data)
+{
+	struct ipc_namespace *ns = data;
+	struct shmid_kernel *shp = shm_lock(ns, id);
+	struct task_struct *task;
+
+	if (IS_ERR(shp))
+		return 0;
+
+	/*
+	 * We want to destroy segments without users and with already
+	 * exit'ed originating process.
+	 *
+	 * XXX: the originating process may exist in another pid namespace.
+	 */
+	task = find_task_by_vpid(shp->shm_cprid);
+	if (task != NULL) {
+		shm_unlock(shp);
+		return 0;
+	}
+
+	if (shm_may_destroy(ns, shp))
+		shm_destroy(ns, shp);
+	else
+		shm_unlock(shp);
+	return 0;
+}
+
+void shm_destroy_orphaned(struct ipc_namespace *ns)
+{
+	down_write(&shm_ids(ns).rw_mutex);
+	idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_orphaned, ns);
+	up_write(&shm_ids(ns).rw_mutex);
+}
+
+
+void exit_shm(struct task_struct *task)
+{
+	struct nsproxy *nsp = task->nsproxy;
+	struct ipc_namespace *ns;
+
+	if (!nsp)
+		return;
+	ns = nsp->ipc_ns;
+	if (!ns || !ns->shm_orphans_denied)
+		return;
+
+	/* Destroy segments that were created but never mapped */
+	down_write(&shm_ids(ns).rw_mutex);
+	idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_current, ns);
 	up_write(&shm_ids(ns).rw_mutex);
 }
 
@@ -950,8 +1030,7 @@ out_nattch:
 	shp = shm_lock(ns, shmid);
 	BUG_ON(IS_ERR(shp));
 	shp->shm_nattch--;
-	if(shp->shm_nattch == 0 &&
-	   shp->shm_perm.mode & SHM_DEST)
+	if (shm_may_destroy(ns, shp))
 		shm_destroy(ns, shp);
 	else
 		shm_unlock(shp);
diff --git a/kernel/exit.c b/kernel/exit.c
index 20a4064..816c610 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -991,6 +991,7 @@ NORET_TYPE void do_exit(long code)
 	trace_sched_process_exit(tsk);
 
 	exit_sem(tsk);
+	exit_shm(tsk);
 	exit_files(tsk);
 	exit_fs(tsk);
 	check_stack_usage();
---


* Re: [kernel-hardening] destroy unused shmem segments
@ 2011-06-14 17:51 Vasiliy Kulikov
From: Vasiliy Kulikov @ 2011-06-14 17:51 UTC (permalink / raw)
  To: kernel-hardening

> +- shm_orphan_denied

> +shm_orphan_denied:

> +		.procname	= "shm_orphans_denied",

Oops, a typo.

Also, maybe it should be named "shm_forced_ipc_rmid" or just
"shm_forced_rmid"?  I dunno.

-- 
Vasiliy Kulikov


* Re: [kernel-hardening] destroy unused shmem segments
  2011-06-14  8:33   ` Vasiliy Kulikov
@ 2011-06-15 14:42     ` Solar Designer
From: Solar Designer @ 2011-06-15 14:42 UTC (permalink / raw)
  To: kernel-hardening

Vasiliy,

On Tue, Jun 14, 2011 at 12:33:45PM +0400, Vasiliy Kulikov wrote:
> It is needed in case a shmem segment was created but never attached.  In
> that case the refcount is never incremented or decremented, and
> shm_close() is never called.
> 
> If shmctl() sets IPC_RMID, the segment is deleted; but since you
> implemented implicit semantics (as if IPC_RMID were always set), an
> explicit check at shmget()+exit() time is needed.
> 
> Spotted by reading -ow README and testing :)

Oh, right.  Having documentation is good.

> I think forcing IPC_RMID should be configurable via sysctl like other
> ipc variables.  Something like /proc/sys/kernel/shm_orphans_denied.
> And I think changing it from 0 to 1 should destroy already orphaned
> segments without users.

OK.

Thanks,

Alexander

