From: Arnd Bergmann <arnd@arndb.de>
To: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>,
linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
Andrew Morton <akpm@linux-foundation.org>,
Stephane Eranian <eranian@googlemail.com>,
Eric Dumazet <dada1@cosmosbay.com>,
Robert Richter <robert.richter@amd.com>,
Arjan van de Ven <arjan@infradead.org>,
Peter Anvin <hpa@zytor.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
"David S. Miller" <davem@davemloft.net>,
Mike Galbraith <efault@gmx.de>
Subject: Re: [announce] Performance Counters for Linux, v6
Date: Thu, 26 Feb 2009 14:37:40 +0100 [thread overview]
Message-ID: <200902261437.41833.arnd@arndb.de> (raw)
In-Reply-To: <18854.26007.890850.336512@cargo.ozlabs.ibm.com>
On Thursday 26 February 2009, Paul Mackerras wrote:
> Arnd Bergmann writes:
> No, it's just an oversight, plus the fact that this stuff doesn't
> support the hardware counters on Cell, which I've heard is rather
> unusual. Don't you have to hcalls to access it on machines with a
> hypervisor? Also, is there any facility for counting events on the
> SPUs?
Yes, there is, and it's suported in both oprofile and perfmon2.
It uses a set of hooks in the spu scheduler to know about what
code got loaded into it, and the platform interface is a combination
of mmio and rtas/hcall accesses.
> > > +static const struct file_operations perf_fops = {
> > > + .release = perf_release,
> > > + .read = perf_read,
> > > + .poll = perf_poll,
> > > + .unlocked_ioctl = perf_ioctl,
> > > + .compat_ioctl = perf_ioctl,
> > > +};
> >
> > It feels inconsistent to combine a syscall for getting the fd with ioctl.
>
> Why? Once you have an fd, what's inconsistent about doing an ioctl on
> it?
I don't think we have any interface doing that today outside of the
traditional chardev, blockdev, socket and regular file descriptors.
The first inconsistency would be in having another type of object
that you can do ioctl on.
Back in the original spufs interface design, I had similar plans
for adding a combination of syscall and ioctl interfaces. The point
that convinced me to go with just syscall and regular file operations
was that syscall and ioctl are basically two interchangeable ways of
doing the same thing (adding an arbitrary function call into the kernel).
When you look in /proc/misc, half the stuff in there only exists as
a cheap way to sneak a syscall into the kernel without the necessary
review.
The inconsistency in combining the two would be the same as having
sys_timerfd_settime an ioctl on the fd from sys_timerfd_create,
or getdents an ioctl on a directory fd.
The most significant impact of allowing ioctl on the fd is certainly
that it allows adding many more interfaces on top of the performance
counters as soon as the code is initially merged. Whether you count
that as a an advantage or a disadvantage depends on your point
of view ;-).
> > Assuming the interface is semantically right, I'd suggest using either
> > a chardev with more ioctls or more syscalls but not both:
> >
> > asmlinkage long sys_perfcounter_fd(
> > struct perf_counter_hw_event *hw_event_uptr __user,
> > pid_t pid,
> > int cpu,
> > int group_fd);
> > asmlinkage long sys_perfcounter_setflags(int fd, u32 flags);
> >
> > or
> >
> > struct perf_counter_setup {
> > struct perf_counter_hw_event event;
> > __u64 pid;
> > __u32 cpu;
> > __u32 group_fd;
> > };
> > #define PERF_COUNTER_IOC_SETUP _IOW('$', 1, struct perf_counter_setup)
> > #define PERF_COUNTER_IOC_SETFLAGS _IOW('$', 2, __u32)
>
> I don't see any compelling value in either of those suggestions,
> sorry. Possibly, if it turns out that read() or ioctl() are just too
> slow (and can't be improved), we might consider adding syscalls.
I don't think that performance is an issue here, read() is quite well
optimized and ioctl() would only be used in the slowpath.
My second example is a bit constructed, if you wanted to go with a
chardev, that would probably imply larger interface changes than that.
One way to improve the interface over PERF_COUNTER_IOC_ENABLE might
be to use write() for enable/disable, but if sys_perfcounter_fd (or
whatever we call it) stays, using other syscalls for operating on
it seems to be the most consistant way of doing it.
Another point that now occurred to me is that in sys_perfcounter_fd(),
you should probably put the group descriptor argument first, to be
more consistent with the openat() family of syscalls which all take
the relative position as their first argument.
Arnd <><
next prev parent reply other threads:[~2009-02-26 13:39 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-21 18:50 [announce] Performance Counters for Linux, v6 Ingo Molnar
2009-01-21 19:34 ` Randy Dunlap
2009-01-21 19:56 ` Ingo Molnar
2009-01-21 21:14 ` Randy Dunlap
2009-01-22 11:22 ` Karel Zak
2009-01-22 12:04 ` Karel Zak
2009-01-22 12:06 ` Ingo Molnar
2009-01-26 1:06 ` Corey Ashford
2009-01-26 9:13 ` stephane eranian
2009-01-26 15:17 ` Ingo Molnar
2009-01-26 16:55 ` stephane eranian
2009-01-26 19:13 ` Corey Ashford
2009-01-26 19:39 ` [perfmon2] " Luck, Tony
2009-01-26 22:10 ` Ingo Molnar
2009-01-26 22:15 ` Ingo Molnar
2009-01-26 23:41 ` Corey Ashford
2009-01-29 2:10 ` Corey Ashford
2009-01-29 12:32 ` stephane eranian
2009-01-29 20:01 ` Corey Ashford
2009-01-29 21:44 ` stephane eranian
2009-02-19 21:53 ` Corey Ashford
2009-02-20 8:10 ` Ingo Molnar
2009-02-20 22:38 ` Corey Ashford
2009-02-20 22:47 ` Peter Zijlstra
2009-02-20 23:04 ` Corey Ashford
2009-02-20 23:24 ` stephane eranian
2009-02-20 23:58 ` Corey Ashford
2009-02-21 0:47 ` Arnd Bergmann
2009-02-26 9:49 ` Paul Mackerras
2009-02-26 13:37 ` Arnd Bergmann [this message]
2009-03-09 1:39 ` Robert Richter
2009-03-09 23:01 ` Paul Mackerras
2009-03-10 9:44 ` Robert Richter
2009-03-10 10:29 ` Peter Zijlstra
2009-03-10 11:49 ` Paul Mackerras
2009-03-10 11:53 ` Ingo Molnar
2009-03-10 16:26 ` Robert Richter
2009-03-10 17:27 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200902261437.41833.arnd@arndb.de \
--to=arnd@arndb.de \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=arjan@infradead.org \
--cc=dada1@cosmosbay.com \
--cc=davem@davemloft.net \
--cc=efault@gmx.de \
--cc=eranian@googlemail.com \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=paulus@samba.org \
--cc=robert.richter@amd.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox