public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 01/11] rlimits: security, add task_struct to setrlimit
@ 2010-05-10 18:00 Jiri Slaby
  2010-05-10 18:00 ` [PATCH v3 02/11] rlimits: add task_struct to update_rlimit_cpu Jiri Slaby
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Jiri Slaby @ 2010-05-10 18:00 UTC (permalink / raw)
  To: akpm; +Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby

From: Jiri Slaby <jirislaby@gmail.com>

Andrew, could you pick them up, please?

In this version I fixed up commit logs and added rationale in 10/11.

--

Add task_struct to task_setrlimit of security_operations to be able to set
rlimit of different task than current.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
---
 include/linux/security.h |    9 ++++++---
 kernel/sys.c             |    2 +-
 security/capability.c    |    3 ++-
 security/security.c      |    5 +++--
 security/selinux/hooks.c |    7 ++++---
 5 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 24fc295..2c6a756 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1502,7 +1502,8 @@ struct security_operations {
 	int (*task_setnice) (struct task_struct *p, int nice);
 	int (*task_setioprio) (struct task_struct *p, int ioprio);
 	int (*task_getioprio) (struct task_struct *p);
-	int (*task_setrlimit) (unsigned int resource, struct rlimit *new_rlim);
+	int (*task_setrlimit) (struct task_struct *p, unsigned int resource,
+			struct rlimit *new_rlim);
 	int (*task_setscheduler) (struct task_struct *p, int policy,
 				  struct sched_param *lp);
 	int (*task_getscheduler) (struct task_struct *p);
@@ -1752,7 +1753,8 @@ void security_task_getsecid(struct task_struct *p, u32 *secid);
 int security_task_setnice(struct task_struct *p, int nice);
 int security_task_setioprio(struct task_struct *p, int ioprio);
 int security_task_getioprio(struct task_struct *p);
-int security_task_setrlimit(unsigned int resource, struct rlimit *new_rlim);
+int security_task_setrlimit(struct task_struct *p, unsigned int resource,
+		struct rlimit *new_rlim);
 int security_task_setscheduler(struct task_struct *p,
 				int policy, struct sched_param *lp);
 int security_task_getscheduler(struct task_struct *p);
@@ -2314,7 +2316,8 @@ static inline int security_task_getioprio(struct task_struct *p)
 	return 0;
 }
 
-static inline int security_task_setrlimit(unsigned int resource,
+static inline int security_task_setrlimit(struct task_struct *p,
+					  unsigned int resource,
 					  struct rlimit *new_rlim)
 {
 	return 0;
diff --git a/kernel/sys.c b/kernel/sys.c
index e83ddbb..1ba4522 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1290,7 +1290,7 @@ SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 	if (resource == RLIMIT_NOFILE && new_rlim.rlim_max > sysctl_nr_open)
 		return -EPERM;
 
-	retval = security_task_setrlimit(resource, &new_rlim);
+	retval = security_task_setrlimit(current, resource, &new_rlim);
 	if (retval)
 		return retval;
 
diff --git a/security/capability.c b/security/capability.c
index 7f093d5..33a16d6 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -412,7 +412,8 @@ static int cap_task_getioprio(struct task_struct *p)
 	return 0;
 }
 
-static int cap_task_setrlimit(unsigned int resource, struct rlimit *new_rlim)
+static int cap_task_setrlimit(struct task_struct *p, unsigned int resource,
+		struct rlimit *new_rlim)
 {
 	return 0;
 }
diff --git a/security/security.c b/security/security.c
index a525e01..3ed1719 100644
--- a/security/security.c
+++ b/security/security.c
@@ -781,9 +781,10 @@ int security_task_getioprio(struct task_struct *p)
 	return security_ops->task_getioprio(p);
 }
 
-int security_task_setrlimit(unsigned int resource, struct rlimit *new_rlim)
+int security_task_setrlimit(struct task_struct *p, unsigned int resource,
+		struct rlimit *new_rlim)
 {
-	return security_ops->task_setrlimit(resource, new_rlim);
+	return security_ops->task_setrlimit(p, resource, new_rlim);
 }
 
 int security_task_setscheduler(struct task_struct *p,
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index ebee467..674992f 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3393,16 +3393,17 @@ static int selinux_task_getioprio(struct task_struct *p)
 	return current_has_perm(p, PROCESS__GETSCHED);
 }
 
-static int selinux_task_setrlimit(unsigned int resource, struct rlimit *new_rlim)
+static int selinux_task_setrlimit(struct task_struct *p, unsigned int resource,
+		struct rlimit *new_rlim)
 {
-	struct rlimit *old_rlim = current->signal->rlim + resource;
+	struct rlimit *old_rlim = p->signal->rlim + resource;
 
 	/* Control the ability to change the hard limit (whether
 	   lowering or raising it), so that the hard limit can
 	   later be used as a safe reset point for the soft limit
 	   upon context transitions.  See selinux_bprm_committing_creds. */
 	if (old_rlim->rlim_max != new_rlim->rlim_max)
-		return current_has_perm(current, PROCESS__SETRLIMIT);
+		return current_has_perm(p, PROCESS__SETRLIMIT);
 
 	return 0;
 }
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 02/11] rlimits: add task_struct to update_rlimit_cpu
  2010-05-10 18:00 [PATCH v3 01/11] rlimits: security, add task_struct to setrlimit Jiri Slaby
@ 2010-05-10 18:00 ` Jiri Slaby
  2010-05-10 18:00 ` [PATCH v3 03/11] rlimits: make sure ->rlim_max never grows in sys_setrlimit Jiri Slaby
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Jiri Slaby @ 2010-05-10 18:00 UTC (permalink / raw)
  To: akpm; +Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby

From: Jiri Slaby <jirislaby@gmail.com>

Add task_struct as a parameter to update_rlimit_cpu to be able to set
rlimit_cpu of different task than current.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
---
 include/linux/posix-timers.h |    2 +-
 kernel/posix-cpu-timers.c    |    8 ++++----
 kernel/sys.c                 |    2 +-
 security/selinux/hooks.c     |    3 ++-
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 4f71bf4..3e23844 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -117,6 +117,6 @@ void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
 
 long clock_nanosleep_restart(struct restart_block *restart_block);
 
-void update_rlimit_cpu(unsigned long rlim_new);
+void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new);
 
 #endif
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 9829646..0513900 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -16,13 +16,13 @@
  * siglock protection since other code may update expiration cache as
  * well.
  */
-void update_rlimit_cpu(unsigned long rlim_new)
+void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new)
 {
 	cputime_t cputime = secs_to_cputime(rlim_new);
 
-	spin_lock_irq(&current->sighand->siglock);
-	set_process_cpu_timer(current, CPUCLOCK_PROF, &cputime, NULL);
-	spin_unlock_irq(&current->sighand->siglock);
+	spin_lock_irq(&task->sighand->siglock);
+	set_process_cpu_timer(task, CPUCLOCK_PROF, &cputime, NULL);
+	spin_unlock_irq(&task->sighand->siglock);
 }
 
 static int check_clock(const clockid_t which_clock)
diff --git a/kernel/sys.c b/kernel/sys.c
index 1ba4522..f5183b0 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1320,7 +1320,7 @@ SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 	if (new_rlim.rlim_cur == RLIM_INFINITY)
 		goto out;
 
-	update_rlimit_cpu(new_rlim.rlim_cur);
+	update_rlimit_cpu(current, new_rlim.rlim_cur);
 out:
 	return 0;
 }
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 674992f..740a71f 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2360,7 +2360,8 @@ static void selinux_bprm_committing_creds(struct linux_binprm *bprm)
 			initrlim = init_task.signal->rlim + i;
 			rlim->rlim_cur = min(rlim->rlim_max, initrlim->rlim_cur);
 		}
-		update_rlimit_cpu(current->signal->rlim[RLIMIT_CPU].rlim_cur);
+		update_rlimit_cpu(current,
+				current->signal->rlim[RLIMIT_CPU].rlim_cur);
 	}
 }
 
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 03/11] rlimits: make sure ->rlim_max never grows in sys_setrlimit
  2010-05-10 18:00 [PATCH v3 01/11] rlimits: security, add task_struct to setrlimit Jiri Slaby
  2010-05-10 18:00 ` [PATCH v3 02/11] rlimits: add task_struct to update_rlimit_cpu Jiri Slaby
@ 2010-05-10 18:00 ` Jiri Slaby
  2010-05-10 18:00 ` [PATCH v3 04/11] rlimits: split sys_setrlimit Jiri Slaby
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Jiri Slaby @ 2010-05-10 18:00 UTC (permalink / raw)
  To: akpm; +Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby

From: Oleg Nesterov <oleg@redhat.com>

Mostly preparation for Jiri's changes, but probably makes sense anyway.

sys_setrlimit() checks new_rlim.rlim_max <= old_rlim->rlim_max, but when
it takes task_lock() old_rlim->rlim_max can be already lowered. Move this
check under task_lock().

Currently this is not important, we can only race with our sub-thread,
this means the application is stupid. But when we change the code to allow
the update of !current task's limits, it becomes important to make sure
->rlim_max can be lowered "reliably" even if we race with the application
doing sys_setrlimit().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
---
 kernel/sys.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index f5183b0..d527c46 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1283,10 +1283,6 @@ SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 		return -EFAULT;
 	if (new_rlim.rlim_cur > new_rlim.rlim_max)
 		return -EINVAL;
-	old_rlim = current->signal->rlim + resource;
-	if ((new_rlim.rlim_max > old_rlim->rlim_max) &&
-	    !capable(CAP_SYS_RESOURCE))
-		return -EPERM;
 	if (resource == RLIMIT_NOFILE && new_rlim.rlim_max > sysctl_nr_open)
 		return -EPERM;
 
@@ -1304,11 +1300,16 @@ SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 		new_rlim.rlim_cur = 1;
 	}
 
+	old_rlim = current->signal->rlim + resource;
 	task_lock(current->group_leader);
-	*old_rlim = new_rlim;
+	if ((new_rlim.rlim_max <= old_rlim->rlim_max) ||
+				capable(CAP_SYS_RESOURCE))
+		*old_rlim = new_rlim;
+	else
+		retval = -EPERM;
 	task_unlock(current->group_leader);
 
-	if (resource != RLIMIT_CPU)
+	if (retval || resource != RLIMIT_CPU)
 		goto out;
 
 	/*
@@ -1322,7 +1323,7 @@ SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 
 	update_rlimit_cpu(current, new_rlim.rlim_cur);
 out:
-	return 0;
+	return retval;
 }
 
 /*
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 04/11] rlimits: split sys_setrlimit
  2010-05-10 18:00 [PATCH v3 01/11] rlimits: security, add task_struct to setrlimit Jiri Slaby
  2010-05-10 18:00 ` [PATCH v3 02/11] rlimits: add task_struct to update_rlimit_cpu Jiri Slaby
  2010-05-10 18:00 ` [PATCH v3 03/11] rlimits: make sure ->rlim_max never grows in sys_setrlimit Jiri Slaby
@ 2010-05-10 18:00 ` Jiri Slaby
  2010-05-10 18:00 ` [PATCH v3 05/11] rlimits: allow setrlimit to non-current tasks Jiri Slaby
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Jiri Slaby @ 2010-05-10 18:00 UTC (permalink / raw)
  To: akpm; +Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby

From: Jiri Slaby <jirislaby@gmail.com>

Create do_setrlimit from sys_setrlimit and declare do_setrlimit
in the resource header. This is the firts phase to have generic
do_prlimit which allows to be called from read, write and compat
rlimits code.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 include/linux/resource.h |    2 ++
 kernel/sys.c             |   40 ++++++++++++++++++++++++----------------
 2 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/include/linux/resource.h b/include/linux/resource.h
index f1e914e..cf8dc96 100644
--- a/include/linux/resource.h
+++ b/include/linux/resource.h
@@ -73,6 +73,8 @@ struct rlimit {
 struct task_struct;
 
 int getrusage(struct task_struct *p, int who, struct rusage __user *ru);
+int do_setrlimit(struct task_struct *tsk, unsigned int resource,
+		struct rlimit *new_rlim);
 
 #endif /* __KERNEL__ */
 
diff --git a/kernel/sys.c b/kernel/sys.c
index d527c46..7c76f84 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1272,42 +1272,41 @@ SYSCALL_DEFINE2(old_getrlimit, unsigned int, resource,
 
 #endif
 
-SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
+int do_setrlimit(struct task_struct *tsk, unsigned int resource,
+		struct rlimit *new_rlim)
 {
-	struct rlimit new_rlim, *old_rlim;
+	struct rlimit *old_rlim;
 	int retval;
 
 	if (resource >= RLIM_NLIMITS)
 		return -EINVAL;
-	if (copy_from_user(&new_rlim, rlim, sizeof(*rlim)))
-		return -EFAULT;
-	if (new_rlim.rlim_cur > new_rlim.rlim_max)
+	if (new_rlim->rlim_cur > new_rlim->rlim_max)
 		return -EINVAL;
-	if (resource == RLIMIT_NOFILE && new_rlim.rlim_max > sysctl_nr_open)
+	if (resource == RLIMIT_NOFILE && new_rlim->rlim_max > sysctl_nr_open)
 		return -EPERM;
 
-	retval = security_task_setrlimit(current, resource, &new_rlim);
+	retval = security_task_setrlimit(tsk, resource, new_rlim);
 	if (retval)
 		return retval;
 
-	if (resource == RLIMIT_CPU && new_rlim.rlim_cur == 0) {
+	if (resource == RLIMIT_CPU && new_rlim->rlim_cur == 0) {
 		/*
 		 * The caller is asking for an immediate RLIMIT_CPU
 		 * expiry.  But we use the zero value to mean "it was
 		 * never set".  So let's cheat and make it one second
 		 * instead
 		 */
-		new_rlim.rlim_cur = 1;
+		new_rlim->rlim_cur = 1;
 	}
 
-	old_rlim = current->signal->rlim + resource;
-	task_lock(current->group_leader);
-	if ((new_rlim.rlim_max <= old_rlim->rlim_max) ||
+	old_rlim = tsk->signal->rlim + resource;
+	task_lock(tsk->group_leader);
+	if ((new_rlim->rlim_max <= old_rlim->rlim_max) ||
 				capable(CAP_SYS_RESOURCE))
-		*old_rlim = new_rlim;
+		*old_rlim = *new_rlim;
 	else
 		retval = -EPERM;
-	task_unlock(current->group_leader);
+	task_unlock(tsk->group_leader);
 
 	if (retval || resource != RLIMIT_CPU)
 		goto out;
@@ -1318,14 +1317,23 @@ SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 	 * very long-standing error, and fixing it now risks breakage of
 	 * applications, so we live with it
 	 */
-	if (new_rlim.rlim_cur == RLIM_INFINITY)
+	if (new_rlim->rlim_cur == RLIM_INFINITY)
 		goto out;
 
-	update_rlimit_cpu(current, new_rlim.rlim_cur);
+	update_rlimit_cpu(tsk, new_rlim->rlim_cur);
 out:
 	return retval;
 }
 
+SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
+{
+	struct rlimit new_rlim;
+
+	if (copy_from_user(&new_rlim, rlim, sizeof(*rlim)))
+		return -EFAULT;
+	return do_setrlimit(current, resource, &new_rlim);
+}
+
 /*
  * It would make sense to put struct rusage in the task_struct,
  * except that would make the task_struct be *really big*.  After
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 05/11] rlimits: allow setrlimit to non-current tasks
  2010-05-10 18:00 [PATCH v3 01/11] rlimits: security, add task_struct to setrlimit Jiri Slaby
                   ` (2 preceding siblings ...)
  2010-05-10 18:00 ` [PATCH v3 04/11] rlimits: split sys_setrlimit Jiri Slaby
@ 2010-05-10 18:00 ` Jiri Slaby
  2010-05-13 22:56   ` Andrew Morton
  2010-05-10 18:00 ` Jiri Slaby
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Jiri Slaby @ 2010-05-10 18:00 UTC (permalink / raw)
  To: akpm; +Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby

From: Jiri Slaby <jirislaby@gmail.com>

Add locking to allow setrlimit accept task parameter other than
current.

Namely, lock tasklist_lock for read and check whether the task
structure has sighand non-null. Do all the signal processing under
that lock still held.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 kernel/sys.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 7c76f84..eb21661 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1272,6 +1272,7 @@ SYSCALL_DEFINE2(old_getrlimit, unsigned int, resource,
 
 #endif
 
+/* make sure you are allowed to change @tsk limits before calling this */
 int do_setrlimit(struct task_struct *tsk, unsigned int resource,
 		struct rlimit *new_rlim)
 {
@@ -1285,9 +1286,16 @@ int do_setrlimit(struct task_struct *tsk, unsigned int resource,
 	if (resource == RLIMIT_NOFILE && new_rlim->rlim_max > sysctl_nr_open)
 		return -EPERM;
 
+	/* protect tsk->signal and tsk->sighand from disappearing */
+	read_lock(&tasklist_lock);
+	if (!tsk->sighand) {
+		retval = -ESRCH;
+		goto out;
+	}
+
 	retval = security_task_setrlimit(tsk, resource, new_rlim);
 	if (retval)
-		return retval;
+		goto out;
 
 	if (resource == RLIMIT_CPU && new_rlim->rlim_cur == 0) {
 		/*
@@ -1322,6 +1330,7 @@ int do_setrlimit(struct task_struct *tsk, unsigned int resource,
 
 	update_rlimit_cpu(tsk, new_rlim->rlim_cur);
 out:
+	read_unlock(&tasklist_lock);
 	return retval;
 }
 
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 06/11] rlimits: do security check under task_lock
  2010-05-10 18:00 [PATCH v3 01/11] rlimits: security, add task_struct to setrlimit Jiri Slaby
                   ` (3 preceding siblings ...)
  2010-05-10 18:00 ` [PATCH v3 05/11] rlimits: allow setrlimit to non-current tasks Jiri Slaby
@ 2010-05-10 18:00 ` Jiri Slaby
  2010-05-13 22:56   ` Andrew Morton
  2010-05-10 18:00 ` [PATCH v3 07/11] rlimits: add rlimit64 structure Jiri Slaby
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Jiri Slaby @ 2010-05-10 18:00 UTC (permalink / raw)
  To: akpm
  Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby, Heiko Carstens,
	Ingo Molnar

Do security_task_setrlimit under task_lock. Other tasks may
change limits under our hands while we are checking limits
inside the function. From now on, they can't.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: James Morris <jmorris@namei.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
---
 kernel/sys.c |   16 +++++++---------
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index eb21661..9e9a3a7 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1277,7 +1277,7 @@ int do_setrlimit(struct task_struct *tsk, unsigned int resource,
 		struct rlimit *new_rlim)
 {
 	struct rlimit *old_rlim;
-	int retval;
+	int retval = 0;
 
 	if (resource >= RLIM_NLIMITS)
 		return -EINVAL;
@@ -1293,10 +1293,6 @@ int do_setrlimit(struct task_struct *tsk, unsigned int resource,
 		goto out;
 	}
 
-	retval = security_task_setrlimit(tsk, resource, new_rlim);
-	if (retval)
-		goto out;
-
 	if (resource == RLIMIT_CPU && new_rlim->rlim_cur == 0) {
 		/*
 		 * The caller is asking for an immediate RLIMIT_CPU
@@ -1309,11 +1305,13 @@ int do_setrlimit(struct task_struct *tsk, unsigned int resource,
 
 	old_rlim = tsk->signal->rlim + resource;
 	task_lock(tsk->group_leader);
-	if ((new_rlim->rlim_max <= old_rlim->rlim_max) ||
-				capable(CAP_SYS_RESOURCE))
-		*old_rlim = *new_rlim;
-	else
+	if ((new_rlim->rlim_max > old_rlim->rlim_max) &&
+				!capable(CAP_SYS_RESOURCE))
 		retval = -EPERM;
+	if (!retval)
+		retval = security_task_setrlimit(tsk, resource, new_rlim);
+	if (!retval)
+		*old_rlim = *new_rlim;
 	task_unlock(tsk->group_leader);
 
 	if (retval || resource != RLIMIT_CPU)
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 07/11] rlimits: add rlimit64 structure
  2010-05-10 18:00 [PATCH v3 01/11] rlimits: security, add task_struct to setrlimit Jiri Slaby
                   ` (4 preceding siblings ...)
  2010-05-10 18:00 ` Jiri Slaby
@ 2010-05-10 18:00 ` Jiri Slaby
  2010-05-10 18:00 ` [PATCH v3 08/11] rlimits: redo do_setrlimit to more generic do_prlimit Jiri Slaby
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Jiri Slaby @ 2010-05-10 18:00 UTC (permalink / raw)
  To: akpm; +Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby

Add a platform independent structure for resource limits to use with
a new prlimit64 syscall. This structure is the same which uses glibc
for 64-bit limits.

Also add corresponding infinity which is a 64-bit full of ones.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 include/linux/resource.h |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/linux/resource.h b/include/linux/resource.h
index cf8dc96..037aa7e 100644
--- a/include/linux/resource.h
+++ b/include/linux/resource.h
@@ -43,6 +43,13 @@ struct rlimit {
 	unsigned long	rlim_max;
 };
 
+#define RLIM64_INFINITY		(~0ULL)
+
+struct rlimit64 {
+	__u64 rlim_cur;
+	__u64 rlim_max;
+};
+
 #define	PRIO_MIN	(-20)
 #define	PRIO_MAX	20
 
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 08/11] rlimits: redo do_setrlimit to more generic do_prlimit
  2010-05-10 18:00 [PATCH v3 01/11] rlimits: security, add task_struct to setrlimit Jiri Slaby
                   ` (5 preceding siblings ...)
  2010-05-10 18:00 ` [PATCH v3 07/11] rlimits: add rlimit64 structure Jiri Slaby
@ 2010-05-10 18:00 ` Jiri Slaby
  2010-05-10 18:00 ` [PATCH v3 09/11] rlimits: switch more rlimit syscalls to do_prlimit Jiri Slaby
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Jiri Slaby @ 2010-05-10 18:00 UTC (permalink / raw)
  To: akpm; +Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby

It now allows also reading of limits. I.e. all read and writes will
later use this function.

It takes two parameters, new and old limits which can be both NULL.
If new is non-NULL, the value in it is set to rlimits.
If old is non-NULL, current rlimits are stored there.
If both are non-NULL, old are stored prior to setting the new ones,
atomically.
(Similar to sigaction.)

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 include/linux/resource.h |    4 +-
 kernel/sys.c             |   63 ++++++++++++++++++++++++++-------------------
 2 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/include/linux/resource.h b/include/linux/resource.h
index 037aa7e..88d36f9 100644
--- a/include/linux/resource.h
+++ b/include/linux/resource.h
@@ -80,8 +80,8 @@ struct rlimit64 {
 struct task_struct;
 
 int getrusage(struct task_struct *p, int who, struct rusage __user *ru);
-int do_setrlimit(struct task_struct *tsk, unsigned int resource,
-		struct rlimit *new_rlim);
+int do_prlimit(struct task_struct *tsk, unsigned int resource,
+		struct rlimit *new_rlim, struct rlimit *old_rlim);
 
 #endif /* __KERNEL__ */
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 9e9a3a7..7a453d2 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1273,18 +1273,30 @@ SYSCALL_DEFINE2(old_getrlimit, unsigned int, resource,
 #endif
 
 /* make sure you are allowed to change @tsk limits before calling this */
-int do_setrlimit(struct task_struct *tsk, unsigned int resource,
-		struct rlimit *new_rlim)
+int do_prlimit(struct task_struct *tsk, unsigned int resource,
+		struct rlimit *new_rlim, struct rlimit *old_rlim)
 {
-	struct rlimit *old_rlim;
+	struct rlimit *rlim;
 	int retval = 0;
 
 	if (resource >= RLIM_NLIMITS)
 		return -EINVAL;
-	if (new_rlim->rlim_cur > new_rlim->rlim_max)
-		return -EINVAL;
-	if (resource == RLIMIT_NOFILE && new_rlim->rlim_max > sysctl_nr_open)
-		return -EPERM;
+	if (new_rlim) {
+		if (new_rlim->rlim_cur > new_rlim->rlim_max)
+			return -EINVAL;
+		if (resource == RLIMIT_NOFILE &&
+				new_rlim->rlim_max > sysctl_nr_open)
+			return -EPERM;
+		if (resource == RLIMIT_CPU && new_rlim->rlim_cur == 0) {
+			/*
+			 * The caller is asking for an immediate RLIMIT_CPU
+			 * expiry.  But we use the zero value to mean "it was
+			 * never set".  So let's cheat and make it one second
+			 * instead
+			 */
+			new_rlim->rlim_cur = 1;
+		}
+	}
 
 	/* protect tsk->signal and tsk->sighand from disappearing */
 	read_lock(&tasklist_lock);
@@ -1293,28 +1305,25 @@ int do_setrlimit(struct task_struct *tsk, unsigned int resource,
 		goto out;
 	}
 
-	if (resource == RLIMIT_CPU && new_rlim->rlim_cur == 0) {
-		/*
-		 * The caller is asking for an immediate RLIMIT_CPU
-		 * expiry.  But we use the zero value to mean "it was
-		 * never set".  So let's cheat and make it one second
-		 * instead
-		 */
-		new_rlim->rlim_cur = 1;
-	}
-
-	old_rlim = tsk->signal->rlim + resource;
+	rlim = tsk->signal->rlim + resource;
 	task_lock(tsk->group_leader);
-	if ((new_rlim->rlim_max > old_rlim->rlim_max) &&
-				!capable(CAP_SYS_RESOURCE))
-		retval = -EPERM;
-	if (!retval)
-		retval = security_task_setrlimit(tsk, resource, new_rlim);
-	if (!retval)
-		*old_rlim = *new_rlim;
+	if (new_rlim) {
+		if ((new_rlim->rlim_max > rlim->rlim_max) &&
+					!capable(CAP_SYS_RESOURCE))
+			retval = -EPERM;
+		if (!retval)
+			retval = security_task_setrlimit(tsk, resource,
+					new_rlim);
+	}
+	if (!retval) {
+		if (old_rlim)
+			*old_rlim = *rlim;
+		if (new_rlim)
+			*rlim = *new_rlim;
+	}
 	task_unlock(tsk->group_leader);
 
-	if (retval || resource != RLIMIT_CPU)
+	if (retval || !new_rlim || resource != RLIMIT_CPU)
 		goto out;
 
 	/*
@@ -1338,7 +1347,7 @@ SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 
 	if (copy_from_user(&new_rlim, rlim, sizeof(*rlim)))
 		return -EFAULT;
-	return do_setrlimit(current, resource, &new_rlim);
+	return do_prlimit(current, resource, &new_rlim, NULL);
 }
 
 /*
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 09/11] rlimits: switch more rlimit syscalls to do_prlimit
  2010-05-10 18:00 [PATCH v3 01/11] rlimits: security, add task_struct to setrlimit Jiri Slaby
                   ` (6 preceding siblings ...)
  2010-05-10 18:00 ` [PATCH v3 08/11] rlimits: redo do_setrlimit to more generic do_prlimit Jiri Slaby
@ 2010-05-10 18:00 ` Jiri Slaby
  2010-05-10 18:00 ` [PATCH v3 10/11] rlimits: implement prlimit64 syscall Jiri Slaby
  2010-05-10 18:00 ` [PATCH v3 11/11] unistd: add __NR_prlimit64 syscall numbers Jiri Slaby
  9 siblings, 0 replies; 25+ messages in thread
From: Jiri Slaby @ 2010-05-10 18:00 UTC (permalink / raw)
  To: akpm; +Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby

From: Jiri Slaby <jirislaby@gmail.com>

After we added more generic do_prlimit, switch sys_getrlimit to that.
Also switch compat handling, so we can get rid of ugly __user casts
and avoid setting process' address limit to kernel data and back.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 kernel/compat.c |   17 +++--------------
 kernel/sys.c    |   17 ++++++++---------
 2 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/kernel/compat.c b/kernel/compat.c
index 7f40e92..87d054f 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -279,11 +279,6 @@ asmlinkage long compat_sys_setrlimit(unsigned int resource,
 		struct compat_rlimit __user *rlim)
 {
 	struct rlimit r;
-	int ret;
-	mm_segment_t old_fs = get_fs ();
-
-	if (resource >= RLIM_NLIMITS)
-		return -EINVAL;
 
 	if (!access_ok(VERIFY_READ, rlim, sizeof(*rlim)) ||
 	    __get_user(r.rlim_cur, &rlim->rlim_cur) ||
@@ -294,10 +289,7 @@ asmlinkage long compat_sys_setrlimit(unsigned int resource,
 		r.rlim_cur = RLIM_INFINITY;
 	if (r.rlim_max == COMPAT_RLIM_INFINITY)
 		r.rlim_max = RLIM_INFINITY;
-	set_fs(KERNEL_DS);
-	ret = sys_setrlimit(resource, (struct rlimit __user *) &r);
-	set_fs(old_fs);
-	return ret;
+	return do_prlimit(current, resource, &r, NULL);
 }
 
 #ifdef COMPAT_RLIM_OLD_INFINITY
@@ -329,16 +321,13 @@ asmlinkage long compat_sys_old_getrlimit(unsigned int resource,
 
 #endif
 
-asmlinkage long compat_sys_getrlimit (unsigned int resource,
+asmlinkage long compat_sys_getrlimit(unsigned int resource,
 		struct compat_rlimit __user *rlim)
 {
 	struct rlimit r;
 	int ret;
-	mm_segment_t old_fs = get_fs();
 
-	set_fs(KERNEL_DS);
-	ret = sys_getrlimit(resource, (struct rlimit __user *) &r);
-	set_fs(old_fs);
+	ret = do_prlimit(current, resource, NULL, &r);
 	if (!ret) {
 		if (r.rlim_cur > COMPAT_RLIM_INFINITY)
 			r.rlim_cur = COMPAT_RLIM_INFINITY;
diff --git a/kernel/sys.c b/kernel/sys.c
index 7a453d2..0f8034f 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1236,15 +1236,14 @@ SYSCALL_DEFINE2(setdomainname, char __user *, name, int, len)
 
 SYSCALL_DEFINE2(getrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 {
-	if (resource >= RLIM_NLIMITS)
-		return -EINVAL;
-	else {
-		struct rlimit value;
-		task_lock(current->group_leader);
-		value = current->signal->rlim[resource];
-		task_unlock(current->group_leader);
-		return copy_to_user(rlim, &value, sizeof(*rlim)) ? -EFAULT : 0;
-	}
+	struct rlimit value;
+	int ret;
+
+	ret = do_prlimit(current, resource, NULL, &value);
+	if (!ret)
+		ret = copy_to_user(rlim, &value, sizeof(*rlim)) ? -EFAULT : 0;
+
+	return ret;
 }
 
 #ifdef __ARCH_WANT_SYS_OLD_GETRLIMIT
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 10/11] rlimits: implement prlimit64 syscall
  2010-05-10 18:00 [PATCH v3 01/11] rlimits: security, add task_struct to setrlimit Jiri Slaby
                   ` (7 preceding siblings ...)
  2010-05-10 18:00 ` [PATCH v3 09/11] rlimits: switch more rlimit syscalls to do_prlimit Jiri Slaby
@ 2010-05-10 18:00 ` Jiri Slaby
  2010-05-13 22:56   ` Andrew Morton
  2010-05-10 18:00 ` [PATCH v3 11/11] unistd: add __NR_prlimit64 syscall numbers Jiri Slaby
  9 siblings, 1 reply; 25+ messages in thread
From: Jiri Slaby @ 2010-05-10 18:00 UTC (permalink / raw)
  To: akpm; +Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby

This patch adds the code to support the sys_prlimit64 syscall which
modifies-and-returns the rlim values of a selected process
atomically. The first parameter, pid, being 0 means current process.

Unlike the current implementation, it is a generic interface,
architecture indepentent so that we needn't handle compat stuff
anymore. In the future, after glibc start to use this we can deprecate
sys_setrlimit and sys_getrlimit in favor to clean up the code finally.

It also adds a possibility of changing limits of other processes. We
check the user's permissions to do that and if it succeeds, the new
limits are propagated online. This is good for large scale
applications such as SAP or databases where administrators need to
change limits time by time (e.g. on crashes increase core size). And
it is unacceptable to restart the service.

For safety, all rlim users now either use accessors or doesn't need
them due to
- locking
- the fact a process was just forked and nobody else knows about it
  yet (and nobody can't thus read/write limits)
hence it is safe to modify limits now.

The limitation is that we currently stay at ulong internal
representation. So we use the rlim64_is_infinity check where we
compare to ULONG_MAX on 32-bit which is the maximum value there.

And since internally we hold limits in struct rlimit, we introduce
converters used before and after do_prlimit call in sys_prlimit64.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 include/linux/syscalls.h |    4 ++
 kernel/sys.c             |   99 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 103 insertions(+), 0 deletions(-)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index f93a5c4..3f0e3ab 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -35,6 +35,7 @@ struct oldold_utsname;
 struct old_utsname;
 struct pollfd;
 struct rlimit;
+struct rlimit64;
 struct rusage;
 struct sched_param;
 struct sel_arg_struct;
@@ -667,6 +668,9 @@ asmlinkage long sys_old_getrlimit(unsigned int resource, struct rlimit __user *r
 #endif
 asmlinkage long sys_setrlimit(unsigned int resource,
 				struct rlimit __user *rlim);
+asmlinkage long sys_prlimit64(pid_t pid, unsigned int resource,
+				const struct rlimit64 __user *new_rlim,
+				struct rlimit64 __user *old_rlim);
 asmlinkage long sys_getrusage(int who, struct rusage __user *ru);
 asmlinkage long sys_umask(int mask);
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 0f8034f..d7fcd4a 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1271,6 +1271,39 @@ SYSCALL_DEFINE2(old_getrlimit, unsigned int, resource,
 
 #endif
 
+static inline bool rlim64_is_infinity(__u64 rlim64)
+{
+#if BITS_PER_LONG < 64
+	return rlim64 >= ULONG_MAX;
+#else
+	return rlim64 == RLIM64_INFINITY;
+#endif
+}
+
+static void rlim_to_rlim64(const struct rlimit *rlim, struct rlimit64 *rlim64)
+{
+	if (rlim->rlim_cur == RLIM_INFINITY)
+		rlim64->rlim_cur = RLIM64_INFINITY;
+	else
+		rlim64->rlim_cur = rlim->rlim_cur;
+	if (rlim->rlim_max == RLIM_INFINITY)
+		rlim64->rlim_max = RLIM64_INFINITY;
+	else
+		rlim64->rlim_max = rlim->rlim_max;
+}
+
+static void rlim64_to_rlim(const struct rlimit64 *rlim64, struct rlimit *rlim)
+{
+	if (rlim64_is_infinity(rlim64->rlim_cur))
+		rlim->rlim_cur = RLIM_INFINITY;
+	else
+		rlim->rlim_cur = (unsigned long)rlim64->rlim_cur;
+	if (rlim64_is_infinity(rlim64->rlim_max))
+		rlim->rlim_max = RLIM_INFINITY;
+	else
+		rlim->rlim_max = (unsigned long)rlim64->rlim_max;
+}
+
 /* make sure you are allowed to change @tsk limits before calling this */
 int do_prlimit(struct task_struct *tsk, unsigned int resource,
 		struct rlimit *new_rlim, struct rlimit *old_rlim)
@@ -1340,6 +1373,72 @@ out:
 	return retval;
 }
 
+/* rcu lock must be held */
+static int check_prlimit_permission(struct task_struct *task)
+{
+	const struct cred *cred = current_cred(), *tcred;
+
+	tcred = __task_cred(task);
+	if ((cred->uid != tcred->euid ||
+	     cred->uid != tcred->suid ||
+	     cred->uid != tcred->uid  ||
+	     cred->gid != tcred->egid ||
+	     cred->gid != tcred->sgid ||
+	     cred->gid != tcred->gid) &&
+	     !capable(CAP_SYS_RESOURCE)) {
+		return -EPERM;
+	}
+
+	return 0;
+}
+
+SYSCALL_DEFINE4(prlimit64, pid_t, pid, unsigned int, resource,
+		const struct rlimit64 __user *, new_rlim,
+		struct rlimit64 __user *, old_rlim)
+{
+	struct rlimit64 old64, new64;
+	struct rlimit old, new;
+	struct task_struct *tsk;
+	int ret;
+
+	if (new_rlim) {
+		if (copy_from_user(&new64, new_rlim, sizeof(new64)))
+			return -EFAULT;
+		rlim64_to_rlim(&new64, &new);
+	}
+
+	/* we don't want to fail after do_rlimit */
+	if (old_rlim && !access_ok(VERIFY_WRITE, old_rlim, sizeof(old64)))
+		return -EFAULT;
+
+	rcu_read_lock();
+	tsk = pid ? find_task_by_vpid(pid) : current;
+	if (!tsk) {
+		rcu_read_unlock();
+		return -ESRCH;
+	}
+	ret = check_prlimit_permission(tsk);
+	if (ret) {
+		rcu_read_unlock();
+		return ret;
+	}
+	get_task_struct(tsk);
+	rcu_read_unlock();
+
+	ret = do_prlimit(tsk, resource, new_rlim ? &new : NULL,
+			old_rlim ? &old : NULL);
+
+	if (!ret && old_rlim) {
+		rlim_to_rlim64(&old, &old64);
+		if (WARN_ON_ONCE(__copy_to_user(old_rlim, &old64,
+						sizeof(old64))))
+			ret = -EFAULT;
+	}
+
+	put_task_struct(tsk);
+	return ret;
+}
+
 SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
 {
 	struct rlimit new_rlim;
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 11/11] unistd: add __NR_prlimit64 syscall numbers
  2010-05-10 18:00 [PATCH v3 01/11] rlimits: security, add task_struct to setrlimit Jiri Slaby
                   ` (8 preceding siblings ...)
  2010-05-10 18:00 ` [PATCH v3 10/11] rlimits: implement prlimit64 syscall Jiri Slaby
@ 2010-05-10 18:00 ` Jiri Slaby
  9 siblings, 0 replies; 25+ messages in thread
From: Jiri Slaby @ 2010-05-10 18:00 UTC (permalink / raw)
  To: akpm
  Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin

Add __NR_prlimit64 syscall numbers to asm-generic. Add them also to
asm-x86, both 32 and 64-bit.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/ia32/ia32entry.S          |    1 +
 arch/x86/include/asm/unistd_32.h   |    3 ++-
 arch/x86/include/asm/unistd_64.h   |    2 ++
 arch/x86/kernel/syscall_table_32.S |    1 +
 include/asm-generic/unistd.h       |    4 +++-
 5 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 17cf65c..b9472ec 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -844,4 +844,5 @@ ia32_sys_call_table:
 	.quad compat_sys_recvmmsg
 	.quad sys_fanotify_init
 	.quad sys32_fanotify_mark
+	.quad sys_prlimit64			/* 340 */
 ia32_syscall_end:
diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index 80b799c..b766a5e 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -345,10 +345,11 @@
 #define __NR_recvmmsg		337
 #define __NR_fanotify_init	338
 #define __NR_fanotify_mark	339
+#define __NR_prlimit64		340
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 340
+#define NR_syscalls 341
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index 5b7b1d5..363e9b8 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -667,6 +667,8 @@ __SYSCALL(__NR_recvmmsg, sys_recvmmsg)
 __SYSCALL(__NR_fanotify_init, sys_fanotify_init)
 #define __NR_fanotify_mark			301
 __SYSCALL(__NR_fanotify_mark, sys_fanotify_mark)
+#define __NR_prlimit64				302
+__SYSCALL(__NR_prlimit64, sys_prlimit64)
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index 07ad5eb..b35786d 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -339,3 +339,4 @@ ENTRY(sys_call_table)
 	.long sys_recvmmsg
 	.long sys_fanotify_init
 	.long sys_fanotify_mark
+	.long sys_prlimit64		/* 340 */
diff --git a/include/asm-generic/unistd.h b/include/asm-generic/unistd.h
index 6a0b30f..0dfd517 100644
--- a/include/asm-generic/unistd.h
+++ b/include/asm-generic/unistd.h
@@ -626,9 +626,11 @@ __SYSCALL(__NR_perf_event_open, sys_perf_event_open)
 __SYSCALL(__NR_accept4, sys_accept4)
 #define __NR_recvmmsg 243
 __SYSCALL(__NR_recvmmsg, sys_recvmmsg)
+#define __NR_prlimit64 244
+__SYSCALL(__NR_prlimit64, sys_prlimit64)
 
 #undef __NR_syscalls
-#define __NR_syscalls 244
+#define __NR_syscalls 245
 
 /*
  * All syscalls below here should go away really,
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 05/11] rlimits: allow setrlimit to non-current tasks
  2010-05-10 18:00 ` [PATCH v3 05/11] rlimits: allow setrlimit to non-current tasks Jiri Slaby
@ 2010-05-13 22:56   ` Andrew Morton
  2010-06-06 20:23     ` [PATCH v3 06/11] rlimits: do security check under task_lock Jiri Slaby
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2010-05-13 22:56 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby

On Mon, 10 May 2010 20:00:45 +0200
Jiri Slaby <jslaby@suse.cz> wrote:

> From: Jiri Slaby <jirislaby@gmail.com>
> 
> Add locking to allow setrlimit accept task parameter other than
> current.
> 
> Namely, lock tasklist_lock for read and check whether the task
> structure has sighand non-null. Do all the signal processing under
> that lock still held.
> 
> Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/sys.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 7c76f84..eb21661 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1272,6 +1272,7 @@ SYSCALL_DEFINE2(old_getrlimit, unsigned int, resource,
>  
>  #endif
>  
> +/* make sure you are allowed to change @tsk limits before calling this */
>  int do_setrlimit(struct task_struct *tsk, unsigned int resource,
>  		struct rlimit *new_rlim)
>  {
> @@ -1285,9 +1286,16 @@ int do_setrlimit(struct task_struct *tsk, unsigned int resource,
>  	if (resource == RLIMIT_NOFILE && new_rlim->rlim_max > sysctl_nr_open)
>  		return -EPERM;
>  
> +	/* protect tsk->signal and tsk->sighand from disappearing */
> +	read_lock(&tasklist_lock);
> +	if (!tsk->sighand) {
> +		retval = -ESRCH;
> +		goto out;
> +	}
> +
>  	retval = security_task_setrlimit(tsk, resource, new_rlim);

I looked at the amount of code which exists under
security_task_setrlimit() and nearly died.  Please convince me that it
is correct to do all that work under tasklist_lock, and that it is also
maintainable.

> 	update_rlimit_cpu(tsk, new_rlim->rlim_cur);
>  out:
> +	read_unlock(&tasklist_lock);

And this causes task-struct.sighand->siglock to be taken under
tasklist_lock.  Was that a pre-existing ranking, or is it a new one?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 06/11] rlimits: do security check under task_lock
  2010-05-10 18:00 ` Jiri Slaby
@ 2010-05-13 22:56   ` Andrew Morton
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2010-05-13 22:56 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby, Heiko Carstens,
	Ingo Molnar

On Mon, 10 May 2010 20:00:46 +0200
Jiri Slaby <jslaby@suse.cz> wrote:

> Do security_task_setrlimit under task_lock. Other tasks may
> change limits under our hands while we are checking limits
> inside the function. From now on, they can't.
>
> ...
>
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1277,7 +1277,7 @@ int do_setrlimit(struct task_struct *tsk, unsigned int resource,
>  		struct rlimit *new_rlim)
>  {
>  	struct rlimit *old_rlim;
> -	int retval;
> +	int retval = 0;
>  
>  	if (resource >= RLIM_NLIMITS)
>  		return -EINVAL;
> @@ -1293,10 +1293,6 @@ int do_setrlimit(struct task_struct *tsk, unsigned int resource,
>  		goto out;
>  	}
>  
> -	retval = security_task_setrlimit(tsk, resource, new_rlim);
> -	if (retval)
> -		goto out;
> -
>  	if (resource == RLIMIT_CPU && new_rlim->rlim_cur == 0) {
>  		/*
>  		 * The caller is asking for an immediate RLIMIT_CPU
> @@ -1309,11 +1305,13 @@ int do_setrlimit(struct task_struct *tsk, unsigned int resource,
>  
>  	old_rlim = tsk->signal->rlim + resource;
>  	task_lock(tsk->group_leader);
> -	if ((new_rlim->rlim_max <= old_rlim->rlim_max) ||
> -				capable(CAP_SYS_RESOURCE))
> -		*old_rlim = *new_rlim;
> -	else
> +	if ((new_rlim->rlim_max > old_rlim->rlim_max) &&
> +				!capable(CAP_SYS_RESOURCE))
>  		retval = -EPERM;
> +	if (!retval)
> +		retval = security_task_setrlimit(tsk, resource, new_rlim);
> +	if (!retval)
> +		*old_rlim = *new_rlim;
>  	task_unlock(tsk->group_leader);
>  
>  	if (retval || resource != RLIMIT_CPU)

Yikes, so the locking around all that selinux code becomes even more
brutal.  How much rope are you tying around the selinux developers'
hands here?



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 10/11] rlimits: implement prlimit64 syscall
  2010-05-10 18:00 ` [PATCH v3 10/11] rlimits: implement prlimit64 syscall Jiri Slaby
@ 2010-05-13 22:56   ` Andrew Morton
  2010-05-26 12:58     ` Jiri Slaby
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2010-05-13 22:56 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby

On Mon, 10 May 2010 20:00:50 +0200
Jiri Slaby <jslaby@suse.cz> wrote:

> This patch adds the code to support the sys_prlimit64 syscall which
> modifies-and-returns the rlim values of a selected process
> atomically. The first parameter, pid, being 0 means current process.
> 
> Unlike the current implementation, it is a generic interface,
> architecture indepentent so that we needn't handle compat stuff
> anymore. In the future, after glibc start to use this we can deprecate
> sys_setrlimit and sys_getrlimit in favor to clean up the code finally.
> 
> It also adds a possibility of changing limits of other processes. We
> check the user's permissions to do that and if it succeeds, the new
> limits are propagated online. This is good for large scale
> applications such as SAP or databases where administrators need to
> change limits time by time (e.g. on crashes increase core size). And
> it is unacceptable to restart the service.
> 
> For safety, all rlim users now either use accessors or doesn't need
> them due to
> - locking
> - the fact a process was just forked and nobody else knows about it
>   yet (and nobody can't thus read/write limits)
> hence it is safe to modify limits now.
> 
> The limitation is that we currently stay at ulong internal
> representation. So we use the rlim64_is_infinity check where we
> compare to ULONG_MAX on 32-bit which is the maximum value there.
> 
> And since internally we hold limits in struct rlimit, we introduce
> converters used before and after do_prlimit call in sys_prlimit64.
> 

Is this worth all the new code and the increase in locking dependencies
which I think is there?

This could all be done in userspace, couldn't it?  Write a little library
which clones a thread then waits for someone to send it a
change-your-rlimits message.  Write a little tool to send those
messages and voila.

>
> ...
>
> +SYSCALL_DEFINE4(prlimit64, pid_t, pid, unsigned int, resource,
> +		const struct rlimit64 __user *, new_rlim,
> +		struct rlimit64 __user *, old_rlim)
> +{
> +	struct rlimit64 old64, new64;
> +	struct rlimit old, new;
> +	struct task_struct *tsk;
> +	int ret;
> +
> +	if (new_rlim) {
> +		if (copy_from_user(&new64, new_rlim, sizeof(new64)))
> +			return -EFAULT;
> +		rlim64_to_rlim(&new64, &new);
> +	}
> +
> +	/* we don't want to fail after do_rlimit */

You meant "do_prlimit".

If doesn't explain _why_ we don't want to fail.  And it shold do so,
because the __copy_to_user() can still fail.


> +	if (old_rlim && !access_ok(VERIFY_WRITE, old_rlim, sizeof(old64)))
> +		return -EFAULT;
> +
> +	rcu_read_lock();
> +	tsk = pid ? find_task_by_vpid(pid) : current;
> +	if (!tsk) {
> +		rcu_read_unlock();
> +		return -ESRCH;
> +	}
> +	ret = check_prlimit_permission(tsk);
> +	if (ret) {
> +		rcu_read_unlock();
> +		return ret;
> +	}
> +	get_task_struct(tsk);
> +	rcu_read_unlock();
> +
> +	ret = do_prlimit(tsk, resource, new_rlim ? &new : NULL,
> +			old_rlim ? &old : NULL);
> +
> +	if (!ret && old_rlim) {
> +		rlim_to_rlim64(&old, &old64);
> +		if (WARN_ON_ONCE(__copy_to_user(old_rlim, &old64,
> +						sizeof(old64))))
> +			ret = -EFAULT;
> +	}
> +
> +	put_task_struct(tsk);
> +	return ret;
> +}


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 10/11] rlimits: implement prlimit64 syscall
  2010-05-13 22:56   ` Andrew Morton
@ 2010-05-26 12:58     ` Jiri Slaby
  2010-05-26 14:30       ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Jiri Slaby @ 2010-05-26 12:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: adobriyan, nhorman, oleg, linux-kernel

On 05/14/2010 12:56 AM, Andrew Morton wrote:
> On Mon, 10 May 2010 20:00:50 +0200
> Jiri Slaby <jslaby@suse.cz> wrote:
> 
>> This patch adds the code to support the sys_prlimit64 syscall which
>> modifies-and-returns the rlim values of a selected process
>> atomically. The first parameter, pid, being 0 means current process.
>>
>> Unlike the current implementation, it is a generic interface,
>> architecture indepentent so that we needn't handle compat stuff
>> anymore. In the future, after glibc start to use this we can deprecate
>> sys_setrlimit and sys_getrlimit in favor to clean up the code finally.
>>
>> It also adds a possibility of changing limits of other processes. We
>> check the user's permissions to do that and if it succeeds, the new
>> limits are propagated online. This is good for large scale
>> applications such as SAP or databases where administrators need to
>> change limits time by time (e.g. on crashes increase core size). And
>> it is unacceptable to restart the service.
>>
>> For safety, all rlim users now either use accessors or doesn't need
>> them due to
>> - locking
>> - the fact a process was just forked and nobody else knows about it
>>   yet (and nobody can't thus read/write limits)
>> hence it is safe to modify limits now.
>>
>> The limitation is that we currently stay at ulong internal
>> representation. So we use the rlim64_is_infinity check where we
>> compare to ULONG_MAX on 32-bit which is the maximum value there.
>>
>> And since internally we hold limits in struct rlimit, we introduce
>> converters used before and after do_prlimit call in sys_prlimit64.
>>
> 
> Is this worth all the new code and the increase in locking dependencies
> which I think is there?

Sorry, for the late reply, I was busy with other things.

Both
tasklist_lock -> (task_struct->sighand->siglock)
tasklist_lock -> (task_struct->alloc_lock)
are OK, since both dependencies already exist in the kernel.

This should have been in the changelogs, yes!

> This could all be done in userspace, couldn't it?  Write a little library
> which clones a thread then waits for someone to send it a
> change-your-rlimits message.  Write a little tool to send those
> messages and voila.

Sorry, I'm not sure I understand this. Could you shed some light on what
will run in the new thread?

A code such as:
main()
{
  if (!clone())
    exec("something");

  while (wait_for_message(&m)) {
     setrlimit(m);
     sleep();
  }
}
won't obviously work. Could you change it so it reflects your idea or
explain what I'm missing?

thanks,
-- 
js

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 10/11] rlimits: implement prlimit64 syscall
  2010-05-26 12:58     ` Jiri Slaby
@ 2010-05-26 14:30       ` Andrew Morton
  2010-05-26 15:13         ` Jiri Slaby
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2010-05-26 14:30 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: adobriyan, nhorman, oleg, linux-kernel

On Wed, 26 May 2010 14:58:05 +0200 Jiri Slaby <jirislaby@gmail.com> wrote:

> > This could all be done in userspace, couldn't it?  Write a little library
> > which clones a thread then waits for someone to send it a
> > change-your-rlimits message.  Write a little tool to send those
> > messages and voila.
> 
> Sorry, I'm not sure I understand this. Could you shed some light on what
> will run in the new thread?
> 
> A code such as:
> main()
> {
>   if (!clone())
>     exec("something");
> 
>   while (wait_for_message(&m)) {
>      setrlimit(m);
>      sleep();
>   }
> }
> won't obviously work. Could you change it so it reflects your idea or
> explain what I'm missing?

main()
{
	if (clone(...)) {
		while (wait_for_message(&m))
			setrlimit(m);
	}

	...
}
		
?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 10/11] rlimits: implement prlimit64 syscall
  2010-05-26 14:30       ` Andrew Morton
@ 2010-05-26 15:13         ` Jiri Slaby
  0 siblings, 0 replies; 25+ messages in thread
From: Jiri Slaby @ 2010-05-26 15:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: adobriyan, nhorman, oleg, linux-kernel

On 05/26/2010 04:30 PM, Andrew Morton wrote:
> main()
> {
> 	if (clone(...)) {
> 		while (wait_for_message(&m))
> 			setrlimit(m);
> 	}
> 
> 	...
> }
> 		
> ?

Hmm. Probably I'm still missing something obvious, but how does this
help for existing services like databases and/or closed-source products?

Or do you mean to provide this as a library
(s@main@__attribute__((constructor)) my_main@ then) and link (or even
preload) to the programs people want to change limits on-the-fly in?
(And clone with THREAD flag to share task_struct->signal.)

But this approach can only help for the parent of all the forked-later
processes. Especially if a process forks and exits (i.e. creates a
daemon), the child running in the background doesn't have the
wait_for_message thread. So if it forks again (e.g. to service a new
request) the limits cannot be changed in any of them.

This can be solved by adding such a cloned-thread loop into every forked
child, but I'm not sure this is something we want.

In addition, people don't know which process will need to change limits
in advance. Every single binary would have to contain such code.

-- 
js

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 06/11] rlimits: do security check under task_lock
  2010-05-13 22:56   ` Andrew Morton
@ 2010-06-06 20:23     ` Jiri Slaby
  2010-06-07 18:08       ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread
From: Jiri Slaby @ 2010-06-06 20:23 UTC (permalink / raw)
  To: akpm; +Cc: adobriyan, nhorman, oleg, linux-kernel, jirislaby

Andrew Morton wrote:
> On Mon, 10 May 2010 20:00:46 +0200
> Jiri Slaby <jslaby@suse.cz> wrote:
> > @@ -1309,11 +1305,13 @@ int do_setrlimit(struct task_struct *tsk, unsigned int resource,
> >  
> >  	old_rlim = tsk->signal->rlim + resource;
> >  	task_lock(tsk->group_leader);
> > -	if ((new_rlim->rlim_max <= old_rlim->rlim_max) ||
> > -				capable(CAP_SYS_RESOURCE))
> > -		*old_rlim = *new_rlim;
> > -	else
> > +	if ((new_rlim->rlim_max > old_rlim->rlim_max) &&
> > +				!capable(CAP_SYS_RESOURCE))
> >  		retval = -EPERM;
> > +	if (!retval)
> > +		retval = security_task_setrlimit(tsk, resource, new_rlim);
> 
> I looked at the amount of code which exists under
> security_task_setrlimit() and nearly died.  Please convince me that it
> is correct to do all that work under tasklist_lock, and that it is also
> maintainable.

When I wrote this code, I looked how it is done in the current code
and redid what others, including getioprio, setnice and task_kill
(this one even with task_lock write-locked) do. I need the lock
to protect task->signal (which is checked inside the security
function) from disappearing.

> > +	if (!retval)
> > +		*old_rlim = *new_rlim;
> >  	task_unlock(tsk->group_leader);
> >  
> >  	if (retval || resource != RLIMIT_CPU)
> 
> Yikes, so the locking around all that selinux code becomes even more
> brutal.  How much rope are you tying around the selinux developers'
> hands here?

I agree with that and didn't find a better solution than is attached
here.

--
The code performed in security_task_setrlimit is very long when
CONFIG_SECURITY is enabled. Don't execute the function under
alloc_lock, unlock it temporarily and check whether the limits changed
in the meantime while the lock was not held. If yes, redo do check.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 kernel/sys.c |   39 +++++++++++++++++++++++++++++++++++----
 1 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index d7fcd4a..48cb21d 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1304,12 +1304,37 @@ static void rlim64_to_rlim(const struct rlimit64 *rlim64, struct rlimit *rlim)
 		rlim->rlim_max = (unsigned long)rlim64->rlim_max;
 }
 
+/**
+ * check_security_task_setrlimit_unlocked - calls security_task_setrlimit
+ *
+ * But security_task_setrlimit is complex with much of code, so that we don't
+ * want to call it while holding the task_lock. We temporarily unlock the lock
+ * and after the check we test whether the limits changed in the meantime.
+ */
+static int check_security_task_setrlimit_unlocked(struct task_struct *tsk,
+		unsigned int resource, struct rlimit *new_rlim,
+		struct rlimit *old_rlim)
+{
+	struct rlimit rlim;
+	int ret;
+
+	memcpy(&rlim, old_rlim, sizeof(rlim));
+
+	task_unlock(tsk->group_leader);
+	ret = security_task_setrlimit(tsk, resource, new_rlim);
+	task_lock(tsk->group_leader);
+
+	if (!ret && memcmp(&rlim, old_rlim, sizeof(rlim)))
+		return -EAGAIN;
+	return ret;
+}
+
 /* make sure you are allowed to change @tsk limits before calling this */
 int do_prlimit(struct task_struct *tsk, unsigned int resource,
 		struct rlimit *new_rlim, struct rlimit *old_rlim)
 {
 	struct rlimit *rlim;
-	int retval = 0;
+	int retval;
 
 	if (resource >= RLIM_NLIMITS)
 		return -EINVAL;
@@ -1339,13 +1364,19 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
 
 	rlim = tsk->signal->rlim + resource;
 	task_lock(tsk->group_leader);
+again:
+	retval = 0;
 	if (new_rlim) {
 		if ((new_rlim->rlim_max > rlim->rlim_max) &&
 					!capable(CAP_SYS_RESOURCE))
 			retval = -EPERM;
-		if (!retval)
-			retval = security_task_setrlimit(tsk, resource,
-					new_rlim);
+		if (!retval) {
+			retval = check_security_task_setrlimit_unlocked(tsk,
+					resource, new_rlim, rlim);
+			if (retval == -EAGAIN) {
+				goto again;
+			}
+		}
 	}
 	if (!retval) {
 		if (old_rlim)
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 06/11] rlimits: do security check under task_lock
  2010-06-06 20:23     ` [PATCH v3 06/11] rlimits: do security check under task_lock Jiri Slaby
@ 2010-06-07 18:08       ` Oleg Nesterov
  2010-06-23 15:20         ` Jiri Slaby
  0 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2010-06-07 18:08 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: akpm, adobriyan, nhorman, linux-kernel, jirislaby,
	Stephen Smalley, James Morris, Eric Paris

(add selinux maintainers)

First of all, my apologies for the huge delay. And I still didn't
read the whole series, sorry.

On 06/06, Jiri Slaby wrote:
>
> +static int check_security_task_setrlimit_unlocked(struct task_struct *tsk,
> +		unsigned int resource, struct rlimit *new_rlim,
> +		struct rlimit *old_rlim)
> +{
> +	struct rlimit rlim;
> +	int ret;
> +
> +	memcpy(&rlim, old_rlim, sizeof(rlim));
> +
> +	task_unlock(tsk->group_leader);
> +	ret = security_task_setrlimit(tsk, resource, new_rlim);
> +	task_lock(tsk->group_leader);
> +
> +	if (!ret && memcmp(&rlim, old_rlim, sizeof(rlim)))
> +		return -EAGAIN;
> +	return ret;
> +}
> +
>  /* make sure you are allowed to change @tsk limits before calling this */
>  int do_prlimit(struct task_struct *tsk, unsigned int resource,
>  		struct rlimit *new_rlim, struct rlimit *old_rlim)
>  {
>  	struct rlimit *rlim;
> -	int retval = 0;
> +	int retval;
>  
>  	if (resource >= RLIM_NLIMITS)
>  		return -EINVAL;
> @@ -1339,13 +1364,19 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
>  
>  	rlim = tsk->signal->rlim + resource;
>  	task_lock(tsk->group_leader);
> +again:
> +	retval = 0;
>  	if (new_rlim) {
>  		if ((new_rlim->rlim_max > rlim->rlim_max) &&
>  					!capable(CAP_SYS_RESOURCE))
>  			retval = -EPERM;
> -		if (!retval)
> -			retval = security_task_setrlimit(tsk, resource,
> -					new_rlim);
> +		if (!retval) {
> +			retval = check_security_task_setrlimit_unlocked(tsk,
> +					resource, new_rlim, rlim);
> +			if (retval == -EAGAIN) {
> +				goto again;
> +			}
> +		}

Oh. Can't we just ignore this (imho minor) race ? Or just verify/document that
current_has_perm() can be called under task_lock. Actually, I do not think
we have a race, selinux_task_setrlimit() only checks that the caller has
rights to change the rlimits.

And. Given that avc_has_perm() can be called from irq context (say,
selinux_file_send_sigiotask or selinux_task_kill), we can assume it is safe
to call it under task_lock() which is not irq-safe.

But. OTOH, if we are really worried about security_ ops, then we have another
reason to call this hook under task_lock(), and we probably want to modify
selinux_bprm_committing_creds() to take this lock too:

	--- security/selinux/hooks.c
	+++ security/selinux/hooks.c
	@@ -2333,11 +2333,14 @@ static void selinux_bprm_committing_cred
		rc = avc_has_perm(new_tsec->osid, new_tsec->sid, SECCLASS_PROCESS,
				  PROCESS__RLIMITINH, NULL);
		if (rc) {
	+		/* protects against do_prlimit() */
	+		task_lock(current);
			for (i = 0; i < RLIM_NLIMITS; i++) {
				rlim = current->signal->rlim + i;
				initrlim = init_task.signal->rlim + i;
				rlim->rlim_cur = min(rlim->rlim_max, initrlim->rlim_cur);
			}
	+		task_unlock(current);
			update_rlimit_cpu(current->signal->rlim[RLIMIT_CPU].rlim_cur);
		}
	 }

Finally. selinux_task_setrlimit(p) uses __task_cred(p) for the check.
This looks a bit strange, different threads can have different creds
but obviously rlimits are per-process.

Perhaps it makes sense to do selinux_task_setrlimit(p->group_leader)?
At least in this case the result should be "consistent".

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 06/11] rlimits: do security check under task_lock
  2010-06-07 18:08       ` Oleg Nesterov
@ 2010-06-23 15:20         ` Jiri Slaby
  2010-06-23 16:12           ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread
From: Jiri Slaby @ 2010-06-23 15:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: akpm, adobriyan, nhorman, linux-kernel, Stephen Smalley,
	James Morris, Eric Paris

On 06/07/2010 08:08 PM, Oleg Nesterov wrote:
> First of all, my apologies for the huge delay. And I still didn't
> read the whole series, sorry.

Hi, never mind, my RTT of 2 weeks doesn't look like very short too :).

> On 06/06, Jiri Slaby wrote:
>> @@ -1339,13 +1364,19 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
>>  
>>  	rlim = tsk->signal->rlim + resource;
>>  	task_lock(tsk->group_leader);
>> +again:
>> +	retval = 0;
>>  	if (new_rlim) {
>>  		if ((new_rlim->rlim_max > rlim->rlim_max) &&
>>  					!capable(CAP_SYS_RESOURCE))

BTW this capable() has the exactly same problem with being called with
task_lock held. Is it OK to move it completely out of critical section?
I'm asking because it sets a current->flags SU bit used for accounting.
If I move it out of the section, it will set the bit always.

>>  			retval = -EPERM;
>> -		if (!retval)
>> -			retval = security_task_setrlimit(tsk, resource,
>> -					new_rlim);
>> +		if (!retval) {
>> +			retval = check_security_task_setrlimit_unlocked(tsk,
>> +					resource, new_rlim, rlim);
>> +			if (retval == -EAGAIN) {
>> +				goto again;
>> +			}
>> +		}
> 
> Oh. Can't we just ignore this (imho minor) race ? Or just verify/document that
> current_has_perm() can be called under task_lock. Actually, I do not think
> we have a race, selinux_task_setrlimit() only checks that the caller has
> rights to change the rlimits.

But does so only if current limits are different to the new ones. My
opinion is that we can ignore it anyway.

> And. Given that avc_has_perm() can be called from irq context (say,
> selinux_file_send_sigiotask or selinux_task_kill), we can assume it is safe
> to call it under task_lock() which is not irq-safe.
> 
> But. OTOH, if we are really worried about security_ ops, then we have another
> reason to call this hook under task_lock(), and we probably want to modify
> selinux_bprm_committing_creds() to take this lock too:
> 
> 	--- security/selinux/hooks.c
> 	+++ security/selinux/hooks.c
> 	@@ -2333,11 +2333,14 @@ static void selinux_bprm_committing_cred
> 		rc = avc_has_perm(new_tsec->osid, new_tsec->sid, SECCLASS_PROCESS,
> 				  PROCESS__RLIMITINH, NULL);
> 		if (rc) {
> 	+		/* protects against do_prlimit() */
> 	+		task_lock(current);
> 			for (i = 0; i < RLIM_NLIMITS; i++) {
> 				rlim = current->signal->rlim + i;
> 				initrlim = init_task.signal->rlim + i;
> 				rlim->rlim_cur = min(rlim->rlim_max, initrlim->rlim_cur);
> 			}
> 	+		task_unlock(current);
> 			update_rlimit_cpu(current->signal->rlim[RLIMIT_CPU].rlim_cur);
> 		}
> 	 }

Makes sense to me.

> Finally. selinux_task_setrlimit(p) uses __task_cred(p) for the check.
> This looks a bit strange, different threads can have different creds
> but obviously rlimits are per-process.

Sorry I can't see it. Could you point out in which function this is done?

thanks,
-- 
js

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 06/11] rlimits: do security check under task_lock
  2010-06-23 15:20         ` Jiri Slaby
@ 2010-06-23 16:12           ` Oleg Nesterov
  2010-06-23 17:44             ` Jiri Slaby
  0 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2010-06-23 16:12 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: akpm, adobriyan, nhorman, linux-kernel, Stephen Smalley,
	James Morris, Eric Paris

On 06/23, Jiri Slaby wrote:
>
> On 06/07/2010 08:08 PM, Oleg Nesterov wrote:
> > First of all, my apologies for the huge delay. And I still didn't
> > read the whole series, sorry.
>
> Hi, never mind, my RTT of 2 weeks doesn't look like very short too :).
>
> > On 06/06, Jiri Slaby wrote:
> >> @@ -1339,13 +1364,19 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
> >>
> >>  	rlim = tsk->signal->rlim + resource;
> >>  	task_lock(tsk->group_leader);
> >> +again:
> >> +	retval = 0;
> >>  	if (new_rlim) {
> >>  		if ((new_rlim->rlim_max > rlim->rlim_max) &&
> >>  					!capable(CAP_SYS_RESOURCE))
>
> BTW this capable() has the exactly same problem with being called with
> task_lock held. Is it OK to move it completely out of critical section?
> I'm asking because it sets a current->flags SU bit used for accounting.
> If I move it out of the section, it will set the bit always.

Well, with all these delays I do not know what "exactly same problem"
means ;) Please explain?

> >>  			retval = -EPERM;
> >> -		if (!retval)
> >> -			retval = security_task_setrlimit(tsk, resource,
> >> -					new_rlim);
> >> +		if (!retval) {
> >> +			retval = check_security_task_setrlimit_unlocked(tsk,
> >> +					resource, new_rlim, rlim);
> >> +			if (retval == -EAGAIN) {
> >> +				goto again;
> >> +			}
> >> +		}
> >
> > Oh. Can't we just ignore this (imho minor) race ? Or just verify/document that
> > current_has_perm() can be called under task_lock. Actually, I do not think
> > we have a race, selinux_task_setrlimit() only checks that the caller has
> > rights to change the rlimits.
>
> But does so only if current limits are different to the new ones. My
> opinion is that we can ignore it anyway.

Or call it under task_lock(), see below

> > And. Given that avc_has_perm() can be called from irq context (say,
> > selinux_file_send_sigiotask or selinux_task_kill), we can assume it is safe
> > to call it under task_lock() which is not irq-safe.
> >
> > But. OTOH, if we are really worried about security_ ops, then we have another
> > reason to call this hook under task_lock(), and we probably want to modify
> > selinux_bprm_committing_creds() to take this lock too:
> >
> > 	--- security/selinux/hooks.c
> > 	+++ security/selinux/hooks.c
> > 	@@ -2333,11 +2333,14 @@ static void selinux_bprm_committing_cred
> > 		rc = avc_has_perm(new_tsec->osid, new_tsec->sid, SECCLASS_PROCESS,
> > 				  PROCESS__RLIMITINH, NULL);
> > 		if (rc) {
> > 	+		/* protects against do_prlimit() */
> > 	+		task_lock(current);
> > 			for (i = 0; i < RLIM_NLIMITS; i++) {
> > 				rlim = current->signal->rlim + i;
> > 				initrlim = init_task.signal->rlim + i;
> > 				rlim->rlim_cur = min(rlim->rlim_max, initrlim->rlim_cur);
> > 			}
> > 	+		task_unlock(current);
> > 			update_rlimit_cpu(current->signal->rlim[RLIMIT_CPU].rlim_cur);
> > 		}
> > 	 }
>
> Makes sense to me.

see above ;)

> > Finally. selinux_task_setrlimit(p) uses __task_cred(p) for the check.
> > This looks a bit strange, different threads can have different creds
> > but obviously rlimits are per-process.
>
> Sorry I can't see it. Could you point out in which function this is done?

selinux_task_setrlimit()->current_has_perm()->current_sid()->current_cred()

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 06/11] rlimits: do security check under task_lock
  2010-06-23 16:12           ` Oleg Nesterov
@ 2010-06-23 17:44             ` Jiri Slaby
  2010-06-23 17:56               ` Oleg Nesterov
  2010-06-23 18:37               ` Stephen Smalley
  0 siblings, 2 replies; 25+ messages in thread
From: Jiri Slaby @ 2010-06-23 17:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: akpm, adobriyan, nhorman, linux-kernel, Stephen Smalley,
	James Morris, Eric Paris

On 06/23/2010 06:12 PM, Oleg Nesterov wrote:
> On 06/23, Jiri Slaby wrote:
>>
>> On 06/07/2010 08:08 PM, Oleg Nesterov wrote:
>>> On 06/06, Jiri Slaby wrote:
>>>> @@ -1339,13 +1364,19 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
>>>>
>>>>  	rlim = tsk->signal->rlim + resource;
>>>>  	task_lock(tsk->group_leader);
>>>> +again:
>>>> +	retval = 0;
>>>>  	if (new_rlim) {
>>>>  		if ((new_rlim->rlim_max > rlim->rlim_max) &&
>>>>  					!capable(CAP_SYS_RESOURCE))
>>
>> BTW this capable() has the exactly same problem with being called with
>> task_lock held. Is it OK to move it completely out of critical section?
>> I'm asking because it sets a current->flags SU bit used for accounting.
>> If I move it out of the section, it will set the bit always.
> 
> Well, with all these delays I do not know what "exactly same problem"
> means ;) Please explain?

As I wrote: that the capable() is called with task_lock held. With
security enabled, capable() goes through all the avc_has_perm_noaudit,
avc_audit and similar (in selinux), the same as security_task_setrlimit
which we were writing about -- Andrew complaining about doing very long
security checks while holding spinlocks.

I mean we should do either none of capable and selinux_task_setrlimit
under task_lock or both :).

>>> Finally. selinux_task_setrlimit(p) uses __task_cred(p) for the check.
>>> This looks a bit strange, different threads can have different creds
>>> but obviously rlimits are per-process.
>>
>> Sorry I can't see it. Could you point out in which function this is done?
> 
> selinux_task_setrlimit()->current_has_perm()->current_sid()->current_cred()

I still see no way how this is wrong. We want to check whether current
thread has capabilities to change (someone else's) rlimits. Maybe I'm
missing something?

thanks,
-- 
js

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 06/11] rlimits: do security check under task_lock
  2010-06-23 17:44             ` Jiri Slaby
@ 2010-06-23 17:56               ` Oleg Nesterov
  2010-06-23 21:35                 ` Jiri Slaby
  2010-06-23 18:37               ` Stephen Smalley
  1 sibling, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2010-06-23 17:56 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: akpm, adobriyan, nhorman, linux-kernel, Stephen Smalley,
	James Morris, Eric Paris

On 06/23, Jiri Slaby wrote:
>
> On 06/23/2010 06:12 PM, Oleg Nesterov wrote:
> > On 06/23, Jiri Slaby wrote:
> >>
> >> BTW this capable() has the exactly same problem with being called with
> >> task_lock held. Is it OK to move it completely out of critical section?
> >> I'm asking because it sets a current->flags SU bit used for accounting.
> >> If I move it out of the section, it will set the bit always.
> >
> > Well, with all these delays I do not know what "exactly same problem"
> > means ;) Please explain?
>
> As I wrote: that the capable() is called with task_lock held.

Ah, got it, yes.

> > selinux_task_setrlimit()->current_has_perm()->current_sid()->current_cred()

I meant
selinux_task_setrlimit(p)->current_has_perm(p)->task_sid(p)->__task_cred(p)

> I still see no way how this is wrong. We want to check whether current
> thread has capabilities to change (someone else's) rlimits.

Yes. but what is "someone else" ?

IIRC, one of your patches (correctly) changes security_task_setrlimit()
to have the new argument, p == "someone else", correct?

Now, the result of security check depends on __task_cred(p) above, and
thus depends on which thread we choose to change rlimits.

I think it makes more sense to always pass ->group_leader as an argument
to security_task_setrlimit(p). But probably I missed something, I do not
remember what exactly other patches do.

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 06/11] rlimits: do security check under task_lock
  2010-06-23 17:44             ` Jiri Slaby
  2010-06-23 17:56               ` Oleg Nesterov
@ 2010-06-23 18:37               ` Stephen Smalley
  1 sibling, 0 replies; 25+ messages in thread
From: Stephen Smalley @ 2010-06-23 18:37 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Oleg Nesterov, akpm, adobriyan, nhorman, linux-kernel,
	James Morris, Eric Paris

On Wed, 2010-06-23 at 19:44 +0200, Jiri Slaby wrote:
> On 06/23/2010 06:12 PM, Oleg Nesterov wrote:
> > On 06/23, Jiri Slaby wrote:
> >>
> >> On 06/07/2010 08:08 PM, Oleg Nesterov wrote:
> >>> On 06/06, Jiri Slaby wrote:
> >>>> @@ -1339,13 +1364,19 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
> >>>>
> >>>>  	rlim = tsk->signal->rlim + resource;
> >>>>  	task_lock(tsk->group_leader);
> >>>> +again:
> >>>> +	retval = 0;
> >>>>  	if (new_rlim) {
> >>>>  		if ((new_rlim->rlim_max > rlim->rlim_max) &&
> >>>>  					!capable(CAP_SYS_RESOURCE))
> >>
> >> BTW this capable() has the exactly same problem with being called with
> >> task_lock held. Is it OK to move it completely out of critical section?
> >> I'm asking because it sets a current->flags SU bit used for accounting.
> >> If I move it out of the section, it will set the bit always.
> > 
> > Well, with all these delays I do not know what "exactly same problem"
> > means ;) Please explain?
> 
> As I wrote: that the capable() is called with task_lock held. With
> security enabled, capable() goes through all the avc_has_perm_noaudit,
> avc_audit and similar (in selinux), the same as security_task_setrlimit
> which we were writing about -- Andrew complaining about doing very long
> security checks while holding spinlocks.
> 
> I mean we should do either none of capable and selinux_task_setrlimit
> under task_lock or both :).

IMO, just do them both under task_lock.  All of the core SELinux
permission checking code should be safe under task lock and the audit
path is the exceptional case (only upon denial).

> >>> Finally. selinux_task_setrlimit(p) uses __task_cred(p) for the check.
> >>> This looks a bit strange, different threads can have different creds
> >>> but obviously rlimits are per-process.
> >>
> >> Sorry I can't see it. Could you point out in which function this is done?
> > 
> > selinux_task_setrlimit()->current_has_perm()->current_sid()->current_cred()
> 
> I still see no way how this is wrong. We want to check whether current
> thread has capabilities to change (someone else's) rlimits. Maybe I'm
> missing something?
> 
> thanks,

-- 
Stephen Smalley
National Security Agency


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 06/11] rlimits: do security check under task_lock
  2010-06-23 17:56               ` Oleg Nesterov
@ 2010-06-23 21:35                 ` Jiri Slaby
  0 siblings, 0 replies; 25+ messages in thread
From: Jiri Slaby @ 2010-06-23 21:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: akpm, adobriyan, nhorman, linux-kernel, Stephen Smalley,
	James Morris, Eric Paris

On 06/23/2010 07:56 PM, Oleg Nesterov wrote:
> On 06/23, Jiri Slaby wrote:
>> I still see no way how this is wrong. We want to check whether current
>> thread has capabilities to change (someone else's) rlimits.

You know, this is one of those sentences where you wonder what kind of
idiot wrote that. And then you find out it was you being totally off base.

> Yes. but what is "someone else" ?

I don't know what was I thinking of. Indeed you are correct. When I was
writing that I was somehow under the impression (dunno why) that 'p' is
current process and the code checks whether 'p' can do the change. But
we are indeed checking whether 'current' may change 'p's limits -- or
better, as you wrote, not of the thread 'p' itself, but of its whole
process.

I'll resend a new version with you all in CCs to see what we can do with
the patches, if anything.

thanks,
-- 
js

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2010-06-23 21:35 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-10 18:00 [PATCH v3 01/11] rlimits: security, add task_struct to setrlimit Jiri Slaby
2010-05-10 18:00 ` [PATCH v3 02/11] rlimits: add task_struct to update_rlimit_cpu Jiri Slaby
2010-05-10 18:00 ` [PATCH v3 03/11] rlimits: make sure ->rlim_max never grows in sys_setrlimit Jiri Slaby
2010-05-10 18:00 ` [PATCH v3 04/11] rlimits: split sys_setrlimit Jiri Slaby
2010-05-10 18:00 ` [PATCH v3 05/11] rlimits: allow setrlimit to non-current tasks Jiri Slaby
2010-05-13 22:56   ` Andrew Morton
2010-06-06 20:23     ` [PATCH v3 06/11] rlimits: do security check under task_lock Jiri Slaby
2010-06-07 18:08       ` Oleg Nesterov
2010-06-23 15:20         ` Jiri Slaby
2010-06-23 16:12           ` Oleg Nesterov
2010-06-23 17:44             ` Jiri Slaby
2010-06-23 17:56               ` Oleg Nesterov
2010-06-23 21:35                 ` Jiri Slaby
2010-06-23 18:37               ` Stephen Smalley
2010-05-10 18:00 ` Jiri Slaby
2010-05-13 22:56   ` Andrew Morton
2010-05-10 18:00 ` [PATCH v3 07/11] rlimits: add rlimit64 structure Jiri Slaby
2010-05-10 18:00 ` [PATCH v3 08/11] rlimits: redo do_setrlimit to more generic do_prlimit Jiri Slaby
2010-05-10 18:00 ` [PATCH v3 09/11] rlimits: switch more rlimit syscalls to do_prlimit Jiri Slaby
2010-05-10 18:00 ` [PATCH v3 10/11] rlimits: implement prlimit64 syscall Jiri Slaby
2010-05-13 22:56   ` Andrew Morton
2010-05-26 12:58     ` Jiri Slaby
2010-05-26 14:30       ` Andrew Morton
2010-05-26 15:13         ` Jiri Slaby
2010-05-10 18:00 ` [PATCH v3 11/11] unistd: add __NR_prlimit64 syscall numbers Jiri Slaby

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox