Re: I.1 - System calls - ioctl


From: Christoph Hellwig <hch@infradead.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: eranian@gmail.com, LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Robert Richter <robert.richter@amd.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Paul Mackerras <paulus@samba.org>,
	Andi Kleen <andi@firstfloor.org>,
	Maynard Johnson <mpjohn@us.ibm.com>, Carl Love <cel@us.ibm.com>,
	Corey J Ashford <cjashfor@us.ibm.com>,
	Philip Mucci <mucci@eecs.utk.edu>,
	Dan Terpstra <terpstra@eecs.utk.edu>,
	perfmon2-devel <perfmon2-devel@lists.sourceforge.net>
Subject: Re: I.1 - System calls - ioctl
Date: Mon, 22 Jun 2009 08:58:37 -0400
Message-ID: <20090622125837.GA9429@infradead.org>
In-Reply-To: <20090622114931.GB24366@elte.hu>

On Mon, Jun 22, 2009 at 01:49:31PM +0200, Ingo Molnar wrote:
> > How do you justify your usage of ioctl() in this context?
> 
> We can certainly do a separate sys_perf_counter_ctrl() syscall - and
> we will do that if people think the extra syscall slot is worth it
> in this case.
> 
> The (mild) counter-argument so far was that the current ioctls are
> very simple over "IO" attributes of counters:
> 
>  - enable
>  - disable
>  - reset
>  - refresh
>  - set-period
> 
> So they could be considered 'IO controls' in the classic sense and
> act as a (mild) exception to the 'don't use ioctls' rule.
> 
> They are not some weird tacked-on syscall functionality - they
> modify the IO properties of counters: on/off, value and rate. If
> they go beyond that we'll put it all into a separate syscall and
> deprecate the ioctl (which will have a relatively short half-life
> due to the tools being hosted in the kernel repo).
> 
> This could happen right now in fact, if people think it's worth it.

Yet another multiplexer doesn't buy us anything over ioctls unless it
adds more structure.  PERF_COUNTER_IOC_ENABLE, PERF_COUNTER_IOC_DISABLE
and PERF_COUNTER_IOC_RESET take no argument, so there is no structure
to add.  perf_counter_refresh takes an integer argument, and
perf_counter_period does as well (with a slightly more complicated
calling convention, because it passes a pointer to the 64-bit
integer).  I don't see how moving this to syscalls would improve
things.
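
To make the three calling conventions above concrete, here is a small
userspace mock in C.  It is only a sketch: the DEMO_IOC_* names, struct
demo_counter and demo_ioctl() are hypothetical stand-ins, not the kernel
interface, and the refresh and period semantics are reduced to storing
the value.  The one detail taken from the discussion is that the period
travels by pointer, since an unsigned long ioctl argument cannot carry a
64-bit value on a 32-bit system.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command numbers, for illustration only. */
enum demo_cmd {
	DEMO_IOC_ENABLE,	/* no argument */
	DEMO_IOC_DISABLE,	/* no argument */
	DEMO_IOC_RESET,		/* no argument */
	DEMO_IOC_REFRESH,	/* integer passed by value */
	DEMO_IOC_PERIOD,	/* pointer to a 64-bit integer */
};

/* Hypothetical counter state touched by the commands. */
struct demo_counter {
	int enabled;
	uint64_t count;
	int refresh;
	uint64_t period;
};

/* Mock dispatcher: returns 0 on success or a negative errno value. */
int demo_ioctl(struct demo_counter *c, enum demo_cmd cmd, unsigned long arg)
{
	assert(c != NULL);

	switch (cmd) {
	case DEMO_IOC_ENABLE:
		c->enabled = 1;
		return 0;
	case DEMO_IOC_DISABLE:
		c->enabled = 0;
		return 0;
	case DEMO_IOC_RESET:
		c->count = 0;
		return 0;
	case DEMO_IOC_REFRESH:
		/* The integer fits in arg itself. */
		c->refresh = (int)arg;
		return 0;
	case DEMO_IOC_PERIOD: {
		/* arg is an address; the 64-bit value lives behind it. */
		const uint64_t *p = (const uint64_t *)(uintptr_t)arg;

		if (p == NULL || *p == 0)
			return -EINVAL;
		c->period = *p;
		return 0;
	}
	}
	return -ENOTTY;
}
```

A caller would pass 0 as the argument for the first three commands, a
plain integer for DEMO_IOC_REFRESH, and the address of a uint64_t cast
to unsigned long for DEMO_IOC_PERIOD.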

But talking about syscalls, the sys_perf_counter_open prototype is
really ugly: it uses either the pid or the cpu argument, which is a
pretty clear indicator that it should actually be two syscalls.
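
As a rough illustration of that point, the following C mock shows two
narrow entry points that each take only the argument they use.  All
names here (struct demo_target, demo_open_pid(), demo_open_cpu(), the
nr_cpus parameter) are hypothetical and only record what would be
handed to a shared open path; the -1 "no CPU" value and "pid 0 means
the current task" follow the conventions in the code under discussion,
while the range check is a simplified assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical record of the target a counter would be attached to. */
struct demo_target {
	int per_cpu;	/* 1: CPU-wide counter, 0: per-task counter */
	pid_t pid;	/* used only when per_cpu == 0; 0 means current task */
	int cpu;	/* used only when per_cpu == 1; -1 otherwise */
};

/* Per-task entry point: there is no cpu parameter to leave unused. */
int demo_open_pid(pid_t pid, struct demo_target *t)
{
	assert(t != NULL);

	t->per_cpu = 0;
	t->pid = pid;
	t->cpu = -1;
	return 0;
}

/*
 * Per-CPU entry point: there is no pid parameter to leave unused.
 * nr_cpus is an assumed upper bound supplied by the caller.
 * Returns 0 on success, -1 if cpu is out of range.
 */
int demo_open_cpu(int cpu, int nr_cpus, struct demo_target *t)
{
	assert(t != NULL);

	if (cpu < 0 || cpu >= nr_cpus)
		return -1;

	t->per_cpu = 1;
	t->pid = 0;
	t->cpu = cpu;
	return 0;
}
```

With the split, each prototype documents its own target, and checks that
apply to only one kind of counter sit in the entry point for that kind
instead of behind an "is cpu a wildcard" test.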

Incomplete patch, without the actual syscall wire-up, below to
demonstrate it:


Index: linux-2.6/kernel/perf_counter.c
===================================================================
--- linux-2.6.orig/kernel/perf_counter.c	2009-06-22 14:43:35.323966162 +0200
+++ linux-2.6/kernel/perf_counter.c	2009-06-22 14:57:30.223807475 +0200
@@ -1396,41 +1396,14 @@ __perf_counter_init_context(struct perf_
 	ctx->task = task;
 }
 
-static struct perf_counter_context *find_get_context(pid_t pid, int cpu)
+static struct perf_counter_context *find_get_pid_context(pid_t pid)
 {
 	struct perf_counter_context *parent_ctx;
 	struct perf_counter_context *ctx;
-	struct perf_cpu_context *cpuctx;
 	struct task_struct *task;
 	unsigned long flags;
 	int err;
 
-	/*
-	 * If cpu is not a wildcard then this is a percpu counter:
-	 */
-	if (cpu != -1) {
-		/* Must be root to operate on a CPU counter: */
-		if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
-			return ERR_PTR(-EACCES);
-
-		if (cpu < 0 || cpu > num_possible_cpus())
-			return ERR_PTR(-EINVAL);
-
-		/*
-		 * We could be clever and allow to attach a counter to an
-		 * offline CPU and activate it when the CPU comes up, but
-		 * that's for later.
-		 */
-		if (!cpu_isset(cpu, cpu_online_map))
-			return ERR_PTR(-ENODEV);
-
-		cpuctx = &per_cpu(perf_cpu_context, cpu);
-		ctx = &cpuctx->ctx;
-		get_ctx(ctx);
-
-		return ctx;
-	}
-
 	rcu_read_lock();
 	if (!pid)
 		task = current;
@@ -3727,6 +3700,16 @@ static int perf_copy_attr(struct perf_co
 	if (attr->read_format & ~(PERF_FORMAT_MAX-1))
 		return -EINVAL;
 
+	if (!attr->exclude_kernel) {
+		if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
+			return -EACCES;
+	}
+
+	if (attr->freq) {
+		if (attr->sample_freq > sysctl_perf_counter_sample_rate)
+			return -EINVAL;
+	}
+
 out:
 	return ret;
 
@@ -3736,52 +3719,16 @@ err_size:
 	goto out;
 }
 
-/**
- * sys_perf_counter_open - open a performance counter, associate it to a task/cpu
- *
- * @attr_uptr:	event type attributes for monitoring/sampling
- * @pid:		target pid
- * @cpu:		target cpu
- * @group_fd:		group leader counter fd
- */
-SYSCALL_DEFINE5(perf_counter_open,
-		struct perf_counter_attr __user *, attr_uptr,
-		pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)
+static int do_perf_counter_open(struct perf_counter_attr *attr,
+		struct perf_counter_context *ctx, int cpu, int group_fd)
 {
 	struct perf_counter *counter, *group_leader;
-	struct perf_counter_attr attr;
-	struct perf_counter_context *ctx;
 	struct file *counter_file = NULL;
 	struct file *group_file = NULL;
 	int fput_needed = 0;
 	int fput_needed2 = 0;
 	int ret;
 
-	/* for future expandability... */
-	if (flags)
-		return -EINVAL;
-
-	ret = perf_copy_attr(attr_uptr, &attr);
-	if (ret)
-		return ret;
-
-	if (!attr.exclude_kernel) {
-		if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
-			return -EACCES;
-	}
-
-	if (attr.freq) {
-		if (attr.sample_freq > sysctl_perf_counter_sample_rate)
-			return -EINVAL;
-	}
-
-	/*
-	 * Get the target context (task or percpu):
-	 */
-	ctx = find_get_context(pid, cpu);
-	if (IS_ERR(ctx))
-		return PTR_ERR(ctx);
-
 	/*
 	 * Look up the group leader (we will attach this counter to it):
 	 */
@@ -3810,11 +3757,11 @@ SYSCALL_DEFINE5(perf_counter_open,
 		/*
 		 * Only a group leader can be exclusive or pinned
 		 */
-		if (attr.exclusive || attr.pinned)
+		if (attr->exclusive || attr->pinned)
 			goto err_put_context;
 	}
 
-	counter = perf_counter_alloc(&attr, cpu, ctx, group_leader,
+	counter = perf_counter_alloc(attr, cpu, ctx, group_leader,
 				     GFP_KERNEL);
 	ret = PTR_ERR(counter);
 	if (IS_ERR(counter))
@@ -3857,6 +3804,68 @@ err_put_context:
 	goto out_fput;
 }
 
+SYSCALL_DEFINE4(perf_counter_open_pid,
+		struct perf_counter_attr __user *, attr_uptr,
+		pid_t, pid, int, group_fd, unsigned long, flags)
+{
+	struct perf_counter_attr attr;
+	struct perf_counter_context *ctx;
+	int ret;
+
+	/* for future expandability... */
+	if (flags)
+		return -EINVAL;
+
+	ret = perf_copy_attr(attr_uptr, &attr);
+	if (ret)
+		return ret;
+
+	ctx = find_get_pid_context(pid);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	return do_perf_counter_open(&attr, ctx, -1, group_fd);
+}
+
+SYSCALL_DEFINE4(perf_counter_open_cpu,
+		struct perf_counter_attr __user *, attr_uptr,
+		int, cpu, int, group_fd, unsigned long, flags)
+{
+	struct perf_counter_attr attr;
+	struct perf_counter_context *ctx;
+	struct perf_cpu_context *cpuctx;
+	int ret;
+
+	/* for future expandability... */
+	if (flags)
+		return -EINVAL;
+
+	ret = perf_copy_attr(attr_uptr, &attr);
+	if (ret)
+		return ret;
+
+	/* Must be root to operate on a CPU counter: */
+	if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (cpu < 0 || cpu > num_possible_cpus())
+		return -EINVAL;
+
+	/*
+	 * We could be clever and allow to attach a counter to an
+	 * offline CPU and activate it when the CPU comes up, but
+	 * that's for later.
+	 */
+	if (!cpu_isset(cpu, cpu_online_map))
+		return -ENODEV;
+
+	cpuctx = &per_cpu(perf_cpu_context, cpu);
+	ctx = &cpuctx->ctx;
+	get_ctx(ctx);
+
+	return do_perf_counter_open(&attr, ctx, cpu, group_fd);
+}
+
 /*
  * inherit a counter from parent task to child task:
  */
@@ -4027,7 +4036,7 @@ void perf_counter_exit_task(struct task_
 	__perf_counter_task_sched_out(child_ctx);
 
 	/*
-	 * Take the context lock here so that if find_get_context is
+	 * Take the context lock here so that if find_get_pid_context is
 	 * reading child->perf_counter_ctxp, we wait until it has
 	 * incremented the context's refcount before we do put_ctx below.
 	 */

