From: Andy Isaacson <adi@hexapodia.org>
To: David Addison <addy@quadrics.com>
Cc: Brice Goglin <Brice.Goglin@ens-lyon.org>,
	linux-kernel@vger.kernel.org, Andrew Morton <akpm@osdl.org>,
	Andrea Arcangeli <andrea@suse.de>,
	David Addison <david.addison@quadrics.com>
Subject: Re: [PATCH][RFC] Linux VM hooks for advanced RDMA NICs
Date: Thu, 28 Apr 2005 01:38:08 -0700
Message-ID: <20050428083808.GA23849@hexapodia.org>
In-Reply-To: <426F5E33.8000906@quadrics.com>

On Wed, Apr 27, 2005 at 10:41:07AM +0100, David Addison wrote:
> Brice Goglin wrote:
> >I worked on a similar patch to help updating a registration cache on
> >Myrinet. I came to the problem of deciding between registering ioproc
> >to the entire address space (1) or only to some VMA (2).
> >You're doing (1), I tried (2).
>
> We have always taken approach (1) as it seems to be the simplest
> method and offers the model where the whole user process space can be
> made available for RDMA operations.

I agree that this is a nice patch for exploring the design space (and
frankly, for maintaining outside the kernel tree).  I'd like to see
something like this merged.  As it stands, the patch is a decent
standalone implementation of (1).

I would personally strongly prefer that whatever is merged be low-impact
and so obviously good that it would not need to be a CONFIG_ option.
(Or rather, it should be a CONFIG_ option, but one which is forced to
yes if !CONFIG_EMBEDDED.)

And of course, it needs to be general-purpose enough to satisfy all the
significant constituencies:
1. Myrinet/Quadrics (proprietary interconnects for HPC/etc)
2. Infiniband (slightly more general-purpose interconnect standard for
               etc/HPC)
3. RDMA TCP
and I would add
4. people who want to add a commodity card to a general-purpose server
   and be able to take advantage of direct-to-userspace transfers
   without breaking the general-purposeness of their server.

I think that given a reliable framework for DMA-to-userspace, other
users will pop up.  OpenGL (DRI) is one obvious example; I think there
are others.

With those (fairly lofty) goals in mind, I think the verdict is not good
for ioproc-2.6.12-rc3.patch.

It's got some style-ish issues that would have to be worked out before
it could be merged.  (#ifdef in code, for one.)
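
For illustration, the usual way to keep #ifdef out of the call sites is to
confine it to a header that supplies either the real hook or an empty
inline stub.  A minimal sketch, assuming hypothetical names throughout
(CONFIG_IOPROC_SKETCH, mm_sketch, ioproc_sketch_invalidate and
unmap_range_sketch are made up here and are not taken from the patch):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option name; comment this out to build the stub variant. */
#define CONFIG_IOPROC_SKETCH 1

/* Stand-in for the per-process mm structure (hypothetical). */
struct mm_sketch {
	int invalidations;	/* counts how often the hook did real work */
};

#ifdef CONFIG_IOPROC_SKETCH
/* Feature enabled: the hook does real work. */
static inline void ioproc_sketch_invalidate(struct mm_sketch *mm)
{
	assert(mm != NULL);
	mm->invalidations++;
}
#else
/* Feature disabled: an empty inline stub that the compiler can drop. */
static inline void ioproc_sketch_invalidate(struct mm_sketch *mm)
{
	(void)mm;
}
#endif

/* The call site reads the same in both configurations: no #ifdef here. */
void unmap_range_sketch(struct mm_sketch *mm)
{
	ioproc_sketch_invalidate(mm);
}
```

With the option defined, each call to unmap_range_sketch() bumps the
counter; with it undefined, the same call site compiles to nothing.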

It's adding a linked-list walk to a bunch of places in mm/, which is (or
at least, seems to me) pretty unacceptable (even if it's just one
cacheline miss) in the fast paths.
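
To make the cost concrete, here is a rough sketch of what such a hook
looks like in general.  It is not the patch's code: ioproc_ops_sketch,
mm_walk_sketch, invalidate_range_sketch and count_range_sketch are
hypothetical names.  Even when nothing is registered, the hook still has
to load the list head, and that load is the potential cacheline miss.

```c
#include <assert.h>
#include <stddef.h>

/* One registered device callback (hypothetical layout). */
struct ioproc_ops_sketch {
	struct ioproc_ops_sketch *next;
	void (*invalidate)(void *arg, unsigned long start, unsigned long end);
	void *arg;
};

/* Stand-in for the per-process mm structure (hypothetical). */
struct mm_walk_sketch {
	struct ioproc_ops_sketch *ioproc_head;
};

/*
 * Hook called from an mm path.  Walks every registered callback and
 * returns how many were invoked.  With an empty list the loop body never
 * runs, but mm->ioproc_head is still read on every call.
 */
int invalidate_range_sketch(struct mm_walk_sketch *mm,
			    unsigned long start, unsigned long end)
{
	struct ioproc_ops_sketch *cp;
	int called = 0;

	assert(mm != NULL);
	for (cp = mm->ioproc_head; cp != NULL; cp = cp->next) {
		if (cp->invalidate != NULL) {
			cp->invalidate(cp->arg, start, end);
			called++;
		}
	}
	return called;
}

/* Example callback: adds the size of the range to a counter at arg. */
void count_range_sketch(void *arg, unsigned long start, unsigned long end)
{
	*(unsigned long *)arg += end - start;
}
```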

Did you understand Andi's suggestion about NUMA policies?  (I'm not
smart enough to follow it.)  Can we share code between this and the NUMA
stuff?

> static over the life of the job and hence most of the costs are taken
> as the pages are created during job startup and initialisation.

Yeah, I'm pretty skeptical about claims that "It's too much work to keep
track of all that" regarding per-proc versus per-vma, and also regarding
explicit-lock-from-commlib versus dynamic-pinning.  For the people who
care (HPC), pin/unpin events are very rare (zero during normal runtime),
so the overhead is unimportant.  It's more important to provide reliable
operation with minimal impact to standard mm semantics.

> However, I still prefer model (1) as it allows both implementations and
> appears to be much simpler in terms of the linux kernel changes required.

I agree that (1) looks easier to implement when you're doing it outside
the kernel (and tracking).  However, if you're aiming for integration
we should figure out what the right answer is.  It feels like that's
per-vma, but I freely admit I don't have any code to back that up.
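
Purely to illustrate the difference between the two models, and not as
code from the patch or from anyone in this thread, a toy sketch with
hypothetical names (vma_sketch, mm_reg_sketch and the two notify
functions are made up): under (1) every VMA overlapping the range
produces a callback once the process is registered, under (2) only the
VMAs that were registered individually do.

```c
#include <assert.h>
#include <stddef.h>

/* Toy VMA covering [start, end) (hypothetical layout). */
struct vma_sketch {
	unsigned long start, end;
	int registered;		/* model (2): set per VMA */
	struct vma_sketch *next;
};

/* Toy address space (hypothetical layout). */
struct mm_reg_sketch {
	struct vma_sketch *vmas;
	int registered;		/* model (1): one flag for the whole process */
};

static int overlaps_sketch(const struct vma_sketch *v,
			   unsigned long start, unsigned long end)
{
	return v->start < end && start < v->end;
}

/* Model (1): count callbacks when the whole address space is registered. */
int notify_per_mm_sketch(const struct mm_reg_sketch *mm,
			 unsigned long start, unsigned long end)
{
	const struct vma_sketch *v;
	int n = 0;

	assert(mm != NULL);
	if (!mm->registered)
		return 0;
	for (v = mm->vmas; v != NULL; v = v->next)
		if (overlaps_sketch(v, start, end))
			n++;
	return n;
}

/* Model (2): count callbacks only for individually registered VMAs. */
int notify_per_vma_sketch(const struct mm_reg_sketch *mm,
			  unsigned long start, unsigned long end)
{
	const struct vma_sketch *v;
	int n = 0;

	assert(mm != NULL);
	for (v = mm->vmas; v != NULL; v = v->next)
		if (v->registered && overlaps_sketch(v, start, end))
			n++;
	return n;
}
```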

> Thanks for your comments,

Thank you for stepping up to be our archery target. :)

> diff -ruN linux-2.6.12-rc3.orig/include/linux/ioproc.h linux-2.6.12-rc3.ioproc/include/linux/ioproc.h

Could you add -p to your diff invocation, please...

This patch is *exactly* what I'd want if I were looking for an obvious,
easy-to-maintain externally-maintained patch to add this capability.
(Would that I could say that for all the HPC kernel patches I've been
subjected to.)

But I think we can do better.

At least I would like to see Andi (or another NUMA mm god) and you (or
another RDMA expert) hash over the possibility of sharing code.

-andy
