Re: I.1 - System calls - ioctl

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Christoph Hellwig <hch@infradead.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: eranian@gmail.com, LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Robert Richter <robert.richter@amd.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Paul Mackerras <paulus@samba.org>,
	Andi Kleen <andi@firstfloor.org>,
	Maynard Johnson <mpjohn@us.ibm.com>, Carl Love <cel@us.ibm.com>,
	Corey J Ashford <cjashfor@us.ibm.com>,
	Philip Mucci <mucci@eecs.utk.edu>,
	Dan Terpstra <terpstra@eecs.utk.edu>,
	perfmon2-devel <perfmon2-devel@lists.sourceforge.net>
Subject: Re: I.1 - System calls - ioctl
Date: Mon, 22 Jun 2009 08:58:37 -0400	[thread overview]
Message-ID: <20090622125837.GA9429@infradead.org> (raw)
In-Reply-To: <20090622114931.GB24366@elte.hu>

On Mon, Jun 22, 2009 at 01:49:31PM +0200, Ingo Molnar wrote:
> > How do you justify your usage of ioctl() in this context?
> 
> We can certainly do a separate sys_perf_counter_ctrl() syscall - and
> we will do that if people think the extra syscall slot is worth it
> in this case.
> 
> The (mild) counter-argument so far was that the current ioctls are
> very simple over "IO" attributes of counters:
> 
>  - enable
>  - disable
>  - reset
>  - refresh
>  - set-period
> 
> So they could be considered 'IO controls' in the classic sense and
> act as a (mild) exception to the 'dont use ioctls' rule.
> 
> They are not some weird tacked-on syscall functionality - they
> modify the IO properties of counters: on/off, value and rate. If
> they go beyond that we'll put it all into a separate syscall and
> deprecate the ioctl (which will have a relatively short half-time
> due to the tools being hosted in the kernel repo).
> 
> This could happen right now in fact, if people think it's worth it.

Yet another multiplexer doesn't buy as anything over ioctls unless it
adds more structure.  PERF_COUNTER_IOC_ENABLE/PERF_COUNTER_IOC_DISABLE/
PERF_COUNTER_IOC_RESET are calls without any argument, so it's kinda
impossible to add more structure.  perf_counter_refresh has an integer
argument, and perf_counter_period aswell (with a slightly more
complicated calling convention due to passing a pointer to the 64bit
integer).  I don't see how moving this to syscalls would improve
things.

But talking about syscalls the sys_perf_counter_open prototype is
really ugly - it uses either the pid or cpu argument which is a pretty
clear indicator it should actually be two sys calls.

Incomplete patch without touching the actuall wire-up below to
demonstrate it:


Index: linux-2.6/kernel/perf_counter.c
===================================================================
--- linux-2.6.orig/kernel/perf_counter.c	2009-06-22 14:43:35.323966162 +0200
+++ linux-2.6/kernel/perf_counter.c	2009-06-22 14:57:30.223807475 +0200
@@ -1396,41 +1396,14 @@ __perf_counter_init_context(struct perf_
 	ctx->task = task;
 }
 
-static struct perf_counter_context *find_get_context(pid_t pid, int cpu)
+static struct perf_counter_context *find_get_pid_context(pid_t pid)
 {
 	struct perf_counter_context *parent_ctx;
 	struct perf_counter_context *ctx;
-	struct perf_cpu_context *cpuctx;
 	struct task_struct *task;
 	unsigned long flags;
 	int err;
 
-	/*
-	 * If cpu is not a wildcard then this is a percpu counter:
-	 */
-	if (cpu != -1) {
-		/* Must be root to operate on a CPU counter: */
-		if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
-			return ERR_PTR(-EACCES);
-
-		if (cpu < 0 || cpu > num_possible_cpus())
-			return ERR_PTR(-EINVAL);
-
-		/*
-		 * We could be clever and allow to attach a counter to an
-		 * offline CPU and activate it when the CPU comes up, but
-		 * that's for later.
-		 */
-		if (!cpu_isset(cpu, cpu_online_map))
-			return ERR_PTR(-ENODEV);
-
-		cpuctx = &per_cpu(perf_cpu_context, cpu);
-		ctx = &cpuctx->ctx;
-		get_ctx(ctx);
-
-		return ctx;
-	}
-
 	rcu_read_lock();
 	if (!pid)
 		task = current;
@@ -3727,6 +3700,16 @@ static int perf_copy_attr(struct perf_co
 	if (attr->read_format & ~(PERF_FORMAT_MAX-1))
 		return -EINVAL;
 
+	if (!attr->exclude_kernel) {
+		if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
+			return -EACCES;
+	}
+
+	if (attr->freq) {
+		if (attr->sample_freq > sysctl_perf_counter_sample_rate)
+			return -EINVAL;
+	}
+
 out:
 	return ret;
 
@@ -3736,52 +3719,16 @@ err_size:
 	goto out;
 }
 
-/**
- * sys_perf_counter_open - open a performance counter, associate it to a task/cpu
- *
- * @attr_uptr:	event type attributes for monitoring/sampling
- * @pid:		target pid
- * @cpu:		target cpu
- * @group_fd:		group leader counter fd
- */
-SYSCALL_DEFINE5(perf_counter_open,
-		struct perf_counter_attr __user *, attr_uptr,
-		pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)
+static int do_perf_counter_open(struct perf_counter_attr *attr,
+		struct perf_counter_context *ctx, int cpu, int group_fd)
 {
 	struct perf_counter *counter, *group_leader;
-	struct perf_counter_attr attr;
-	struct perf_counter_context *ctx;
 	struct file *counter_file = NULL;
 	struct file *group_file = NULL;
 	int fput_needed = 0;
 	int fput_needed2 = 0;
 	int ret;
 
-	/* for future expandability... */
-	if (flags)
-		return -EINVAL;
-
-	ret = perf_copy_attr(attr_uptr, &attr);
-	if (ret)
-		return ret;
-
-	if (!attr.exclude_kernel) {
-		if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
-			return -EACCES;
-	}
-
-	if (attr.freq) {
-		if (attr.sample_freq > sysctl_perf_counter_sample_rate)
-			return -EINVAL;
-	}
-
-	/*
-	 * Get the target context (task or percpu):
-	 */
-	ctx = find_get_context(pid, cpu);
-	if (IS_ERR(ctx))
-		return PTR_ERR(ctx);
-
 	/*
 	 * Look up the group leader (we will attach this counter to it):
 	 */
@@ -3810,11 +3757,11 @@ SYSCALL_DEFINE5(perf_counter_open,
 		/*
 		 * Only a group leader can be exclusive or pinned
 		 */
-		if (attr.exclusive || attr.pinned)
+		if (attr->exclusive || attr->pinned)
 			goto err_put_context;
 	}
 
-	counter = perf_counter_alloc(&attr, cpu, ctx, group_leader,
+	counter = perf_counter_alloc(attr, cpu, ctx, group_leader,
 				     GFP_KERNEL);
 	ret = PTR_ERR(counter);
 	if (IS_ERR(counter))
@@ -3857,6 +3804,68 @@ err_put_context:
 	goto out_fput;
 }
 
+SYSCALL_DEFINE4(perf_counter_open_pid,
+		struct perf_counter_attr __user *, attr_uptr,
+		pid_t, pid, int, group_fd, unsigned long, flags)
+{
+	struct perf_counter_attr attr;
+	struct perf_counter_context *ctx;
+	int ret;
+
+	/* for future expandability... */
+	if (flags)
+		return -EINVAL;
+
+	ret = perf_copy_attr(attr_uptr, &attr);
+	if (ret)
+		return ret;
+
+	ctx = find_get_pid_context(pid);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	return do_perf_counter_open(&attr, ctx, -1, group_fd);
+}
+
+SYSCALL_DEFINE4(perf_counter_open_cpu,
+		struct perf_counter_attr __user *, attr_uptr,
+		int, cpu, int, group_fd, unsigned long, flags)
+{
+	struct perf_counter_attr attr;
+	struct perf_counter_context *ctx;
+	struct perf_cpu_context *cpuctx;
+	int ret;
+
+	/* for future expandability... */
+	if (flags)
+		return -EINVAL;
+
+	ret = perf_copy_attr(attr_uptr, &attr);
+	if (ret)
+		return ret;
+
+	/* Must be root to operate on a CPU counter: */
+	if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (cpu < 0 || cpu > num_possible_cpus())
+		return -EINVAL;
+
+	/*
+	 * We could be clever and allow to attach a counter to an
+	 * offline CPU and activate it when the CPU comes up, but
+	 * that's for later.
+	 */
+	if (!cpu_isset(cpu, cpu_online_map))
+		return -ENODEV;
+
+	cpuctx = &per_cpu(perf_cpu_context, cpu);
+	ctx = &cpuctx->ctx;
+	get_ctx(ctx);
+
+	return do_perf_counter_open(&attr, ctx, cpu, group_fd);
+}
+
 /*
  * inherit a counter from parent task to child task:
  */
@@ -4027,7 +4036,7 @@ void perf_counter_exit_task(struct task_
 	__perf_counter_task_sched_out(child_ctx);
 
 	/*
-	 * Take the context lock here so that if find_get_context is
+	 * Take the context lock here so that if find_get_pid_context is
 	 * reading child->perf_counter_ctxp, we wait until it has
 	 * incremented the context's refcount before we do put_ctx below.
 	 */

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
---end quoted text---

next prev parent reply	other threads:[~2009-06-22 12:58 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-16 17:42 v2 of comments on Performance Counters for Linux (PCL) stephane eranian
2009-06-22 11:48 ` Ingo Molnar
2009-06-22 11:49 ` I.1 - System calls - ioctl Ingo Molnar
2009-06-22 12:58   ` Christoph Hellwig [this message]
2009-06-22 13:56     ` Ingo Molnar
2009-06-22 17:41       ` Arnd Bergmann
2009-07-13 10:53     ` Peter Zijlstra
2009-07-13 17:30       ` [perfmon2] " Arnd Bergmann
2009-07-13 17:34         ` Peter Zijlstra
2009-07-13 17:53           ` Arnd Bergmann
2009-07-14 13:51       ` Christoph Hellwig
2009-07-30 13:58       ` stephane eranian
2009-07-30 14:13         ` Peter Zijlstra
2009-07-30 16:17           ` stephane eranian
2009-07-30 16:40             ` Arnd Bergmann
2009-07-30 16:53               ` stephane eranian
2009-07-30 17:20                 ` Arnd Bergmann
2009-08-03 14:22                   ` Peter Zijlstra
2009-06-22 11:50 ` I.2 - Grouping Ingo Molnar
2009-06-22 19:45   ` stephane eranian
2009-06-22 22:04     ` Corey Ashford
2009-06-23 17:51       ` stephane eranian
2009-06-22 21:38   ` Corey Ashford
2009-06-23  5:16   ` Paul Mackerras
2009-06-23  7:36     ` stephane eranian
2009-06-23  8:26       ` Paul Mackerras
2009-06-23  8:30         ` stephane eranian
2009-06-23 16:24           ` Corey Ashford
2009-06-22 11:51 ` I.3 - Multiplexing and system-wide Ingo Molnar
2009-06-22 11:51 ` I.4 - Controlling group multiplexing Ingo Molnar
2009-06-22 11:52 ` I.5 - Mmaped count Ingo Molnar
2009-06-22 12:25   ` stephane eranian
2009-06-22 12:35     ` Peter Zijlstra
2009-06-22 12:54       ` stephane eranian
2009-06-22 14:39         ` Peter Zijlstra
2009-06-23  0:41         ` Paul Mackerras
2009-06-23  0:39       ` Paul Mackerras
2009-06-23  6:13         ` Peter Zijlstra
2009-06-23  7:40         ` stephane eranian
2009-06-23  0:33     ` Paul Mackerras
2009-06-22 11:53 ` I.6 - Group scheduling Ingo Molnar
2009-06-22 11:54 ` I.7 - Group validity checking Ingo Molnar
2009-06-22 11:54 ` I.8 - Generalized cache events Ingo Molnar
2009-06-22 11:55 ` I.9 - Group reading Ingo Molnar
2009-06-22 11:55 ` I.10 - Event buffer minimal useful size Ingo Molnar
2009-06-22 11:56 ` I.11 - Missing definitions for generic events Ingo Molnar
2009-06-22 14:54   ` stephane eranian
2009-06-22 11:57 ` II.1 - Fixed counters on Intel Ingo Molnar
2009-06-22 14:27   ` stephane eranian
2009-06-22 11:57 ` II.2 - Event knowledge missing Ingo Molnar
2009-06-23 13:18   ` stephane eranian
2009-06-22 11:58 ` III.1 - Sampling period randomization Ingo Molnar
2009-06-22 11:58 ` IV.1 - Support for model-specific uncore PMU Ingo Molnar
2009-06-22 11:59 ` IV.2 - Features impacting all counters Ingo Molnar
2009-06-22 12:00 ` IV.3 - AMD IBS Ingo Molnar
2009-06-22 14:08   ` [perfmon2] " Rob Fowler
2009-06-22 17:58     ` Maynard Johnson
2009-06-23  6:19     ` Peter Zijlstra
2009-06-23  8:19       ` stephane eranian
2009-06-23 14:05         ` Ingo Molnar
2009-06-23 14:25           ` stephane eranian
2009-06-23 14:55             ` Ingo Molnar
2009-06-23 14:40       ` Rob Fowler
2009-06-22 19:17   ` stephane eranian
2009-06-22 12:00 ` IV.4 - Intel PEBS Ingo Molnar
2009-06-22 12:16   ` Andi Kleen
2009-06-22 12:01 ` IV.5 - Intel Last Branch Record (LBR) Ingo Molnar
2009-06-22 20:02   ` stephane eranian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090622125837.GA9429@infradead.org \
    --to=hch@infradead.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=cel@us.ibm.com \
    --cc=cjashfor@us.ibm.com \
    --cc=eranian@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mpjohn@us.ibm.com \
    --cc=mucci@eecs.utk.edu \
    --cc=paulus@samba.org \
    --cc=perfmon2-devel@lists.sourceforge.net \
    --cc=robert.richter@amd.com \
    --cc=terpstra@eecs.utk.edu \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox