linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Patrick Bellasi <patrick.bellasi@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Arnaldo Carvalho de Melo <acme@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jiri Olsa <jolsa@redhat.com>
Subject: [PATCH] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
Date: Wed, 4 Sep 2019 09:55:32 +0200	[thread overview]
Message-ID: <20190904075532.GA26751@gmail.com> (raw)
In-Reply-To: <20190903171645.28090-2-cascardo@canonical.com>


* Thadeu Lima de Souza Cascardo <cascardo@canonical.com> wrote:

> After commit a509a7cd7974 (sched/uclamp: Extend sched_setattr() to support
> utilization clamping), using sched_getattr with size 48 will return E2BIG.
> 
> This breaks, for example, chrt.
> $ chrt -p $$
> chrt: failed to get pid 26306's policy: Argument list too long
> $
> 
> With this fix, when using the old size, the utilization clamping values will be
> absent from sched_attr. When using the new size or some larger size, they will
> be present.
> 
> After the fix, chrt works again.
> $ chrt -p $$
> pid 4649's current scheduling policy: SCHED_OTHER
> pid 4649's current scheduling priority: 0
> $
> 
> The drawback with this solution is that userspace will ignore there are
> non-default utilization clamps, but it's arguable whether returning E2BIG in
> this case makes sense when that same userspace doesn't know about those values
> anyway.

a509a7cd7974 breaks the ABI when CONFIG_UCLAMP_TASK=y, so there's really 
no choice here but to fix it.

But this is not the right fix:

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5183,8 +5183,10 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
>  		attr.sched_nice = task_nice(p);
>  
>  #ifdef CONFIG_UCLAMP_TASK
> -	attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
> -	attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
> +	if (size >= SCHED_ATTR_SIZE_VER1) {
> +		attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
> +		attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
> +	}
>  #endif

The real bug is that the whole sched_read_attr() logic of checking 
non-zero bits in new ABI components is arguably broken, and pretty much 
any extension of the ABI will spuriously break the ABI as you just found 
out. That's way too fragile.

Instead how about something like the patch below? It cleans up the ABI 
logic:

 - if user-attributes have the same size as kernel attributes then the 
   logic is unchanged.

 - if user-attributes are larger than the kernel knows about then simply 
   skip the extra bits, but set attr->size to the (smaller) kernel size 
   so that tooling can (in principle) handle older kernel as well.

 - if user-attributes are smaller than the kernel knows about then just 
   copy whatever user-space can accept.

Also clean up the whole logic:

 - Simplify the code flow - there's no need for 'ret' for example.

 - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we 
   always know which side we are dealing with.

 - Why is it called 'read' when what it does is to copy to user? This 
   code is so far away from VFS read() semantics that the naming is 
   actively confusing. Name it sched_attr_copy_to_user() instead, which
   mirrors other copy_to_user() functionality.

 - Move the attr->size assignment from the head of sched_getattr() to the 
   sched_attr_copy_to_user() function. Nothing else within the kernel 
   should care about the size of the structure.

With this patch I believe the sched_getattr() syscall now nicely supports 
an extensible ABI in both a forward and backward compatible fashion, and 
will also fix the bug.

As an added bonus the bogus -EFBIG return is removed as well.

This needs to be fixed before v5.3 is released.

NOTE: one additional open question is whether to set attr->size to the 
      *larger* value if the kernel has a newer ABI. This would allow 
      user-space to detect a new ABI.

I've also Cc:-ed perf tooling ABI experts, which handles things in a 
similar fashion. (And if it doesn't or shouldn't then please chime in.)

Thanks,

	Ingo

------
Subject: sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
From: Ingo Molnar <mingo@kernel.org>
Date: Sat Aug 24 17:41:57 CEST 2019

Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c |   80 ++++++++++++++++++++++++++--------------------------
 1 file changed, 40 insertions(+), 40 deletions(-)

Index: linux/kernel/sched/core.c
===================================================================
--- linux.orig/kernel/sched/core.c
+++ linux/kernel/sched/core.c
@@ -5299,37 +5299,40 @@ out_unlock:
 	return retval;
 }
 
-static int sched_read_attr(struct sched_attr __user *uattr,
-			   struct sched_attr *attr,
-			   unsigned int usize)
+/*
+ * Copy the kernel size attribute structure (which might be larger
+ * than what user-space knows about) to user-space.
+ *
+ * Note that all cases are valid: user-space buffer can be larger or
+ * smaller than the kernel-space buffer. The usual case is that both
+ * have the same size.
+ */
+static int
+sched_attr_copy_to_user(struct sched_attr __user *uattr,
+			struct sched_attr *kattr,
+			unsigned int usize)
 {
-	int ret;
+	unsigned int ksize = sizeof(*kattr);
 
-	if (!access_ok(uattr, usize))
+	if (!access_ok(uattr, max(usize, ksize)))
 		return -EFAULT;
 
 	/*
-	 * If we're handed a smaller struct than we know of,
-	 * ensure all the unknown bits are 0 - i.e. old
-	 * user-space does not get uncomplete information.
+	 * sched_getattr() ABI forwards and backwards compatibility:
+	 *
+	 * If usize == ksize then we just copy everything to user-space and all is good.
+	 *
+	 * If usize < ksize then we only copy as much as user-space has space for,
+	 * this keeps ABI compatibility as well. We skip the rest.
+	 *
+	 * If usize > ksize then user-space is using a newer version of the ABI,
+	 * which part the kernel doesn't know about. Just ignore it - tooling can
+	 * detect the kernel's knowledge of attributes from the attr->size value
+	 * which is set to ksize in this case.
 	 */
-	if (usize < sizeof(*attr)) {
-		unsigned char *addr;
-		unsigned char *end;
-
-		addr = (void *)attr + usize;
-		end  = (void *)attr + sizeof(*attr);
-
-		for (; addr < end; addr++) {
-			if (*addr)
-				return -EFBIG;
-		}
-
-		attr->size = usize;
-	}
+	kattr->size = min(usize, ksize);
 
-	ret = copy_to_user(uattr, attr, attr->size);
-	if (ret)
+	if (copy_to_user(uattr, kattr, kattr->size))
 		return -EFAULT;
 
 	return 0;
@@ -5339,20 +5342,18 @@ static int sched_read_attr(struct sched_
  * sys_sched_getattr - similar to sched_getparam, but with sched_attr
  * @pid: the pid in question.
  * @uattr: structure containing the extended parameters.
- * @size: sizeof(attr) for fwd/bwd comp.
+ * @usize: sizeof(attr) that user-space knows about, for forwards and backwards compatibility.
  * @flags: for future extension.
  */
 SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
-		unsigned int, size, unsigned int, flags)
+		unsigned int, usize, unsigned int, flags)
 {
-	struct sched_attr attr = {
-		.size = sizeof(struct sched_attr),
-	};
+	struct sched_attr kattr;
 	struct task_struct *p;
 	int retval;
 
-	if (!uattr || pid < 0 || size > PAGE_SIZE ||
-	    size < SCHED_ATTR_SIZE_VER0 || flags)
+	if (!uattr || pid < 0 || usize > PAGE_SIZE ||
+	    usize < SCHED_ATTR_SIZE_VER0 || flags)
 		return -EINVAL;
 
 	rcu_read_lock();
@@ -5365,25 +5366,24 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pi
 	if (retval)
 		goto out_unlock;
 
-	attr.sched_policy = p->policy;
+	kattr.sched_policy = p->policy;
 	if (p->sched_reset_on_fork)
-		attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
+		kattr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
 	if (task_has_dl_policy(p))
-		__getparam_dl(p, &attr);
+		__getparam_dl(p, &kattr);
 	else if (task_has_rt_policy(p))
-		attr.sched_priority = p->rt_priority;
+		kattr.sched_priority = p->rt_priority;
 	else
-		attr.sched_nice = task_nice(p);
+		kattr.sched_nice = task_nice(p);
 
 #ifdef CONFIG_UCLAMP_TASK
-	attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
-	attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
+	kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
+	kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
 #endif
 
 	rcu_read_unlock();
 
-	retval = sched_read_attr(uattr, &attr, size);
-	return retval;
+	return sched_attr_copy_to_user(uattr, &kattr, usize);
 
 out_unlock:
 	rcu_read_unlock();

  reply	other threads:[~2019-09-04  7:55 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-03 17:16 [PATCH 1/2] sched: sched_getattr should return E2BIG, not EFBIG Thadeu Lima de Souza Cascardo
2019-09-03 17:16 ` [PATCH 2/2] sched: allow sched_getattr with old size Thadeu Lima de Souza Cascardo
2019-09-04  7:55   ` Ingo Molnar [this message]
2019-09-04  8:49     ` [PATCH v2] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code Ingo Molnar
2019-09-04  8:55       ` [PATCH v3] " Ingo Molnar
2019-09-04  9:31         ` Dietmar Eggemann
2019-09-04 10:39           ` Ingo Molnar
2019-09-04 10:55             ` Dietmar Eggemann
2019-09-04 13:19             ` Thadeu Lima de Souza Cascardo
2019-09-04 17:53               ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190904075532.GA26751@gmail.com \
    --to=mingo@kernel.org \
    --cc=acme@infradead.org \
    --cc=cascardo@canonical.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=patrick.bellasi@arm.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).