From: Christopher Yeoh <cyeoh@au1.ibm.com>
To: Brice Goglin <Brice.Goglin@inria.fr>
Cc: linux-kernel@vger.kernel.org,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [RFC][PATCH] Cross Memory Attach
Date: Thu, 16 Sep 2010 23:30:45 +0930
Message-ID: <20100916233045.73aecc26@lilo>
In-Reply-To: <4C91E01E.4070209@inria.fr>

On Thu, 16 Sep 2010 11:15:10 +0200
Brice Goglin <Brice.Goglin@inria.fr> wrote:

> On 16/09/2010 08:32, Brice Goglin wrote:
> > I am the guy doing KNEM so I can comment on this. The I/OAT part of
> > KNEM was mostly a research topic, it's mostly useless on current
> > machines since the memcpy performance is much larger than I/OAT DMA
> > Engine. We also have an offload model with a kernel thread, but it
> > wasn't used a lot so far. These features can be ignored for the
> > current discussion.
> 
> I've just created a knem branch where I removed all the above, and
> some other stuff that are not necessary for normal users. So it just
> contains the region management code and two commands to copy between
> regions or between a region and some local iovecs.

When I did the original hpcc runs for CMA vs shared mem double copy I
also did some KNEM runs as a bit of a sanity check. The CMA OpenMPI
implementation actually uses the infrastructure KNEM put into the
OpenMPI shared mem btl - thanks for that, btw, it made things much
easier for me to test CMA.

Interestingly, although KNEM and CMA are fundamentally doing very
similar things, at least with hpcc I didn't see as much of a gain with
KNEM as with CMA:

MB/s
Naturally Ordered      4      8     16     32
Base                1235    935    622    419
CMA                 4741   3769   1977    703
KNEM                3362   3091   1857    681

MB/s
Randomly Ordered       4      8     16     32
Base                1227    947    638    412
CMA                 4666   3682   1978    710
KNEM                3348   3050   1883    684

MB/s
Max Ping Pong          4      8     16     32
Base                2028   1938   1928   1882
CMA                 7424   7510   7598   7708
KNEM                5661   5476   6050   6290

I don't know the reason behind the difference - whether it's something
peculiar to hpcc, whether there's extra overhead in the way that knem
does setup for copying, or whether knem wasn't configured optimally.
I haven't done any comparative IMB or NPB runs...
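
To make the gap easier to read, here is a small sketch that turns the
MB/s figures from the tables above into gains relative to Base. The
numbers are copied from the tables; the names RESULTS, COLUMNS and
speedups() are made up for this illustration and are not part of hpcc,
KNEM or CMA.

```python
# Figures copied from the hpcc tables above (MB/s). The column labels
# 4, 8, 16, 32 are kept exactly as they appear in those tables.
COLUMNS = (4, 8, 16, 32)

RESULTS = {
    "Naturally Ordered": {
        "Base": (1235, 935, 622, 419),
        "CMA": (4741, 3769, 1977, 703),
        "KNEM": (3362, 3091, 1857, 681),
    },
    "Randomly Ordered": {
        "Base": (1227, 947, 638, 412),
        "CMA": (4666, 3682, 1978, 710),
        "KNEM": (3348, 3050, 1883, 684),
    },
    "Max Ping Pong": {
        "Base": (2028, 1938, 1928, 1882),
        "CMA": (7424, 7510, 7598, 7708),
        "KNEM": (5661, 5476, 6050, 6290),
    },
}


def speedups(table):
    """Return each non-Base row divided column by column by the Base row."""
    base = table["Base"]
    return {
        name: tuple(value / b for value, b in zip(values, base))
        for name, values in table.items()
        if name != "Base"
    }


def main():
    for title, table in RESULTS.items():
        print(title)
        for name, ratios in speedups(table).items():
            cells = "  ".join(
                f"{col}: {ratio:.2f}x" for col, ratio in zip(COLUMNS, ratios)
            )
            print(f"  {name:<5} {cells}")


if __name__ == "__main__":
    main()
```

In the first Naturally Ordered column, for example, this works out to
roughly 3.84x for CMA and 2.72x for KNEM over Base.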

Syscall and setup overhead does have some measurable effect. Although I
don't have the numbers for it here, neither KNEM nor CMA does quite as
well with hpcc as a hacked version of hpcc where everything is declared
ahead of time as shared memory, so the receiver can just do a single
copy from userspace. I think that hacked version is representative of
the theoretical maximum gain from the single copy approach.

Chris
-- 
cyeoh@au.ibm.com
