public inbox for linux-kernel@vger.kernel.org
From: Peter Zijlstra <peterz@infradead.org>
To: Roland Dreier <rdreier@cisco.com>
Cc: Ingo Molnar <mingo@elte.hu>, Pavel Machek <pavel@ucw.cz>,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	Paul Mackerras <paulus@samba.org>,
	Anton Blanchard <anton@samba.org>,
	general@lists.openfabrics.org, akpm@linux-foundation.org,
	torvalds@linux-foundation.org
Subject: Re: [ofa-general] Re: [GIT PULL] please pull ummunotify
Date: Mon, 12 Oct 2009 19:33:22 +0200
Message-ID: <1255368802.8392.26.camel@twins>
In-Reply-To: <ada3a5uq1dk.fsf@cisco.com>

On Wed, 2009-10-07 at 15:34 -0700, Roland Dreier wrote:
>  > So I looked a little deeper into this, and I don't think (even with the
>  > filtering extensions) that perf events are directly applicable to this
>  > problem.  The first issue is that, assuming I'm understanding the
>  > comment in perf_event.c:
>  > 
>  >         /*
>  >          * Raw tracepoint data is a severe data leak, only allow root to
>  >          * have these.
>  >          */
>  > 
>  > currently tracepoints can only be used by privileged processes.  A key
>  > feature of ummunotify is that ordinary unprivileged processes can use it.
>  > 
>  > So would it be acceptable to add something like PERF_TYPE_MMU_NOTIFIER
>  > as a way of letting unprivileged userspace get access to just MMU events
>  > for their own process?  Clearly this touches core infrastructure and is
>  > not as simple as just adding two tracepoints.
>  > 
>  > Then, assuming we have some way to create an "MMU notifier" perf event,
>  > we need a way for userspace to specify which address ranges it would
>  > like events for (I don't think the string filter expression used by
>  > existing trace filtering works, because if userspace is looking at a few
>  > hundred regions, then the size of the filtering expression explodes, and
>  > adding or removing a single range becomes a pain).  So I guess a new
>  > ioctl() to add/remove ranges for MMU_NOTIFIER perf events?
>  > 
>  > I think filtering is needed, because otherwise events for ranges that
>  > are not of interest are just a waste of resources to generate and
>  > process, and make losing good events because of overflow much more
>  > likely.
>  > 
>  > We still have the problem of lost events if the mmap buffer overflows,
>  > but userspace should be able to size the buffer so that such events are
>  > rare I guess.
>  > 
>  > In the end this seems to just take the ummunotify code I have, and make
>  > it be a new type of perf counter instead of a character special device.
>  > I'd actually be OK with that, since having an oddball new char dev
>  > interface is not particularly nice.  But on the other hand just
>  > multiplexing a new type of thing under perf events is not all that much
>  > better.  What do you think?
> 
> Ingo/Peter/<anyone suggesting perf events> -- can you comment on this
> plan of creating PERF_TYPE_MMU_NOTIFIER for perf events to implement
> ummunotify?  To me it looks like a wash -- the main difference is how
> userspace gets the magic ummunotify file descriptor, either by
> open("/dev/ummunotify") or by perf_event_open(...PERF_TYPE_MMU_NOTIFIER...),
> but pretty much everything else stays pretty much the same in terms of
> how much kernel code is involved.  We do reuse the perf events mmap
> buffer code but I think that ends up being more complicated than
> returning events via read().
> 
> Anyway, before I spend the time converting over to the new
> infrastructure and causing the MPI guys to churn their code, I'd like to
> make sure that this is what you guys have in mind.
> 
> (By the way, after thinking about this more, I really do think that
> filtering events by address range is a must-have -- with filtering,
> userspace can map sufficient buffer space to avoid losing events for a
> given number of regions; without filtering, events might get lost just
> because of invalidate events for ranges userspace didn't even care about)

I think either of the following would work:

PERF_TYPE_SOFTWARE, PERF_COUNT_SW_MUNMAP + $filter

or

PERF_TYPE_TRACEPOINT, //events/vm/munmap/id + $filter

As for the read/poll issue, I think we can do something like
PERF_FORMAT_BLOCK which would make read() block when ->count hasn't
changed, and make poll() work without requiring a mmap().

As to the filter, we can do one of two things: add a simple single-range
filter to perf_event_attr, which is something ia64 has hardware support
for IIRC, or possibly use this trace filter muck.

Would something like that be sufficient? With such events only
generating a wakeup (poll) when the unmap actually happens, you'd not
even need an mmap() buffer to keep up with that.
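
To make the single-range filter idea above concrete, here is a rough
sketch of the check such a filter would amount to. It is only an
illustration: struct range_filter and range_filter_match() are
hypothetical names, not existing perf_event_attr fields or kernel
functions, and the half-open [start, end) convention is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical single-range filter: one half-open address range
 * [start, end) that the event's owner cares about.
 */
struct range_filter {
	uint64_t start;	/* first address of interest (inclusive) */
	uint64_t end;	/* one past the last address of interest */
};

/*
 * Return true if the unmapped span [addr, addr + len) overlaps the
 * filter range, i.e. the event should be counted and a wakeup issued.
 */
static bool range_filter_match(const struct range_filter *f,
			       uint64_t addr, uint64_t len)
{
	uint64_t unmap_end = addr + len;

	/* Ignore empty spans and spans that wrap the address space. */
	if (len == 0 || unmap_end < addr)
		return false;

	return addr < f->end && f->start < unmap_end;
}
```

With a filter of [0x1000, 0x2000), an unmap of 0x100 bytes at 0x1800
matches, while an unmap of [0x2000, 0x3000) does not. Tracking a few
hundred regions, as Roland describes, would need more than one such
range per event, or one event per region.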




