From: Roland Dreier <rdreier@cisco.com>
To: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, jsquyres@cisco.com,
rostedt@goodmis.org
Subject: Re: [PATCH v3] ummunotify: Userspace support for MMU notifications
Date: Sun, 02 Aug 2009 21:55:49 -0700
Message-ID: <aday6q1o5re.fsf@cisco.com>
In-Reply-To: <4A75F00D.7010400@inria.fr> (Brice Goglin's message of "Sun, 02 Aug 2009 21:59:09 +0200")
> I like the interface but I have a couple questions:
Thanks.
> 1) Why does userspace have to register these address ranges? I would
> have just reported all invalidation events and let user-space check which
> ones are interesting. My feeling is that the number of invalidation
> events will usually be lower than the number of registered ranges, so
> you'll report more events through the file descriptor, but userspace
> will do a lot less ioctls.
A couple of reasons. First, MMU notifier events may be delivered (in
the kernel) in interrupt context so the amount of allocation we can do
in a notifier hook is limited (and any allocation will fail sometimes).
So if we just want to report all events to userspace then I don't see
any way around having to sometimes deliver an event like "uh, some
events got lost" and force userspace to flush everything.
I suspect that MPI workloads will hit the overflow case in practice,
since they probably want to run as close to out-of-memory as possible,
and the application may not enter the MPI library often enough to keep
the queue of ummunotify events short -- I can imagine some codes that do
a lot of memory management, enter MPI infrequently, and end up
overflowing the queue and flushing all registrations over and over.
Having userspace register ranges means I can preallocate a landing area
for each event and make the MMU notifier hook pretty simple.
Second, it turns out that having the filter does cut down quite a bit on
the events. From running some Open MPI tests that Jeff provided, I saw
that there were often several times as many MMU notifier events
delivered in the kernel as ended up being reported to userspace.
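
To illustrate the two points above, here is a minimal userspace model in C.
It is only a sketch, not the code from the patch: the names range_reg,
notifier_ctx, invalidate_range and pop_pending are hypothetical, and
coalescing repeated hits on one range into a single pending entry is an
assumption made for the example. Each registration carries its own
preallocated list link, so the hook never allocates and never has to
report "events got lost", and invalidations that touch no registered
range are dropped by the filter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one registered range; not the patch's structures. */
struct range_reg {
	unsigned long start, end;        /* registered range [start, end) */
	int queued;                      /* nonzero while on the pending list */
	struct range_reg *next_pending;  /* preallocated "landing area" link */
};

struct notifier_ctx {
	struct range_reg *regs;          /* all registrations */
	size_t nregs;
	struct range_reg *pending_head;  /* events waiting to be read */
	struct range_reg **pending_tail;
};

static void ctx_init(struct notifier_ctx *ctx, struct range_reg *regs,
		     size_t nregs)
{
	assert(regs != NULL || nregs == 0);
	ctx->regs = regs;
	ctx->nregs = nregs;
	ctx->pending_head = NULL;
	ctx->pending_tail = &ctx->pending_head;
}

/*
 * Model of the notifier hook: no allocation happens here, so it cannot
 * fail. Returns how many registrations were newly queued.
 */
static size_t invalidate_range(struct notifier_ctx *ctx,
			       unsigned long start, unsigned long end)
{
	size_t newly_queued = 0;

	for (size_t i = 0; i < ctx->nregs; i++) {
		struct range_reg *r = &ctx->regs[i];

		if (r->end <= start || r->start >= end)
			continue;        /* no overlap: filtered out */
		if (r->queued)
			continue;        /* already pending (assumed coalescing) */

		r->queued = 1;
		r->next_pending = NULL;
		*ctx->pending_tail = r;  /* link in the preallocated node */
		ctx->pending_tail = &r->next_pending;
		newly_queued++;
	}
	return newly_queued;
}

/* Reader side: take the oldest pending registration, or NULL if none. */
static struct range_reg *pop_pending(struct notifier_ctx *ctx)
{
	struct range_reg *r = ctx->pending_head;

	if (!r)
		return NULL;
	ctx->pending_head = r->next_pending;
	if (!ctx->pending_head)
		ctx->pending_tail = &ctx->pending_head;
	r->queued = 0;
	r->next_pending = NULL;
	return r;
}
```

In this model the pending list can never hold more entries than there
are registrations, which is why no overflow event is needed. The linear
scan is only for brevity; a real implementation would want a faster
lookup structure.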
> 2) What happens in case of fork? If father+child keep reading from the
> previously-open /dev/ummunotify, each event will be delivered only to
> the first reader, right? Fork is always a mess in HPC, but maybe there's
> something to do here.
It works just like any other file where fork results in two file
descriptors in two processes... as you point out the two processes can
step on each other. (And in the ummunotify case the file remains
associated with the original mm.) However, this is also the case for
simpler stuff like sockets etc., and I think uniformity of interface and
least surprise say that ummunotify should follow the same model.
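
The "two processes can step on each other" behavior can be seen with any
file descriptor shared across fork. The sketch below uses an ordinary
pipe as a stand-in, since it does not assume anything about the
ummunotify interface itself; the function name
shared_fd_event_consumed_once is made up for the example. One "event"
byte is queued, the child reads it, and the parent then finds nothing
left on the same descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the event read by the child is no longer visible to the
 * parent, 0 if the parent could still read it, and -1 on a setup error.
 */
static int shared_fd_event_consumed_once(void)
{
	int fds[2];
	char ev = 'E', buf;
	int status, flags, consumed;
	ssize_t n;
	pid_t pid;

	if (pipe(fds) < 0)
		return -1;
	if (write(fds[1], &ev, 1) != 1)   /* queue a single "event" */
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: the inherited descriptor shares the same queue. */
		_exit(read(fds[0], &buf, 1) == 1 ? 0 : 1);
	}

	/* Parent: wait until the child has consumed the event. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;

	/* Read without blocking; the write end is still open, so no EOF. */
	flags = fcntl(fds[0], F_GETFL);
	if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;
	n = read(fds[0], &buf, 1);
	consumed = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));

	close(fds[0]);
	close(fds[1]);
	return consumed;
}
```

Calling shared_fd_event_consumed_once() from a small main() should return
1 on a POSIX system, which matches the "delivered only to the first
reader" behavior described in the question.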
> 3) What's userspace supposed to do if 2 libraries need such events in
> the same process? Should each of them open /dev/ummunotify separately?
> Doesn't matter much for performance, just wondering.
I guess the libraries could work out some way to share things, but that
would require one library to pass events to the other or something like
that. It should work fine for 2 libraries to have independent
ummunotify files open though (I've not tested but "what could go wrong"?).
Thanks,
Roland