Re: [RFC][PATCH] Cross Memory Attach v2 (resend)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Christopher Yeoh <cyeoh@au1.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm@kvack.org, Ingo Molnar <mingo@elte.hu>,
	Brice Goglin <Brice.Goglin@inria.fr>
Subject: Re: [RFC][PATCH] Cross Memory Attach v2 (resend)
Date: Tue, 23 Nov 2010 19:55:23 +1030	[thread overview]
Message-ID: <20101123195523.46e6addb@lilo> (raw)
In-Reply-To: <20101122130527.c13c99d3.akpm@linux-foundation.org>

On Mon, 22 Nov 2010 13:05:27 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:
> We have a bit of a track record of adding cool-looking syscalls and
> then regretting it a few years later.  Few people use them, and maybe
> they weren't so cool after all, and we have to maintain them for
> ever. Bugs (sometimes security-relevant ones) remain undiscovered for
> long periods because few people use (or care about) the code.
> 
> So I think the bar is a high one - higher than it used to be.
> Convince us that this feature is so important that it's worth all
> that overhead and risk?

Well there are the benchmark results to show that there is
real improvement for MPI implementations (well at least for those
benchmarks ;-) There's also been a few papers written on something
quite similar (KNEM) which goes into more detail on the potential gains.

http://runtime.bordeaux.inria.fr/knem/

I've also heard privately that something very similar has been used in
at least one device driver to support intranode operations for quite a
while, but maintaining this out of tree as the mm has changed has been
quite painful. 

And I can get it down to just one syscall by using the flags parameter
if that helps at all.

> > HPCC results:
> > =============
> > 
> > MB/s			Num Processes	
> > Naturally Ordered	4	8	16	32
> > Base			1235	935	622	419
> > CMA			4741	3769	1977	703
> > 
> > 			
> > MB/s			Num Processes	
> > Randomly Ordered	4	8	16	32
> > Base			1227	947	638	412
> > CMA			4666	3682	1978	710
> > 				
> > MB/s			Num Processes	
> > Max Ping Pong		4	8	16	32
> > Base			2028	1938	1928	1882
> > CMA			7424	7510	7598	7708
> 
> So with the "Naturally ordered" testcase, it got 4741/1235 times
> faster with four processes?

Yes, thats correct.

> > +asmlinkage long sys_process_vm_writev(pid_t pid,
> > +				      const struct iovec __user
> > *lvec,
> > +				      unsigned long liovcnt,
> > +				      const struct iovec __user
> > *rvec,
> > +				      unsigned long riovcnt,
> > +				      unsigned long flags);
> 
> I have a vague feeling that some architectures have issues with six or
> more syscall args.  Or maybe it was seven.

There seem to be quite a few syscalls around with 6 args and none with
7 so I suspect (or at least hope) its 7. 

> > +		bytes_to_copy = min(PAGE_SIZE - start_offset,
> > +				    len - bytes_copied);
> > +		bytes_to_copy = min((size_t)bytes_to_copy,
> > +				    lvec[*lvec_current].iov_len -
> > *lvec_offset);
> 
> Use of min_t() is conventional.

ok

> It might be a little more efficient to do
> 
> 
> 	if (vm_write) {
> 		for (j = 0; j < pages_pinned; j++) {
> 			if (j < i)
> 				set_page_dirty_lock(process_pages[j]);
> 			put_page(process_pages[j]);
> 	} else {
> 		for (j = 0; j < pages_pinned; j++)
> 			put_page(process_pages[j]);
> 	}
> 
> and it is hopefully more efficient still to use release_pages() for
> the second loop.
> 
> This code would have been clearer if a better identifier than `i' had
> been chosen.

ok.
 
> > +			 struct page **process_pages,
> > +			 struct mm_struct *mm,
> > +			 struct task_struct *task,
> > +			 unsigned long flags, int vm_write)
> > +{
> > +	unsigned long pa = addr & PAGE_MASK;
> > +	unsigned long start_offset = addr - pa;
> > +	int nr_pages;
> > +	unsigned long bytes_copied = 0;
> > +	int rc;
> > +	unsigned int nr_pages_copied = 0;
> > +	unsigned int nr_pages_to_copy;
> 
> What prevents me from copying more than 2^32 pages?

Yea it should support that... will fix.

> > +		if (rc == -EFAULT)
> 
> It would be more future-safe to use
> 
> 		if (rc < 0)
> 
> > +			goto free_mem;

ok.

> > +	int i;
> > +	int rc;
> > +	int bytes_copied;
> 
> This was unsigned long in process_vm_rw().  Please review all these
> types for appropriate size and signedness.
> 

ok, will do.

Thanks for looking over the patch!

Chris
-- 
cyeoh@au1.ibm.com

WARNING: multiple messages have this Message-ID (diff)

From: Christopher Yeoh <cyeoh@au1.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm@kvack.org, Ingo Molnar <mingo@elte.hu>,
	Brice Goglin <Brice.Goglin@inria.fr>
Subject: Re: [RFC][PATCH] Cross Memory Attach v2 (resend)
Date: Tue, 23 Nov 2010 19:55:23 +1030	[thread overview]
Message-ID: <20101123195523.46e6addb@lilo> (raw)
In-Reply-To: <20101122130527.c13c99d3.akpm@linux-foundation.org>

On Mon, 22 Nov 2010 13:05:27 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:
> We have a bit of a track record of adding cool-looking syscalls and
> then regretting it a few years later.  Few people use them, and maybe
> they weren't so cool after all, and we have to maintain them for
> ever. Bugs (sometimes security-relevant ones) remain undiscovered for
> long periods because few people use (or care about) the code.
> 
> So I think the bar is a high one - higher than it used to be.
> Convince us that this feature is so important that it's worth all
> that overhead and risk?

Well there are the benchmark results to show that there is
real improvement for MPI implementations (well at least for those
benchmarks ;-) There's also been a few papers written on something
quite similar (KNEM) which goes into more detail on the potential gains.

http://runtime.bordeaux.inria.fr/knem/

I've also heard privately that something very similar has been used in
at least one device driver to support intranode operations for quite a
while, but maintaining this out of tree as the mm has changed has been
quite painful. 

And I can get it down to just one syscall by using the flags parameter
if that helps at all.

> > HPCC results:
> > =============
> > 
> > MB/s			Num Processes	
> > Naturally Ordered	4	8	16	32
> > Base			1235	935	622	419
> > CMA			4741	3769	1977	703
> > 
> > 			
> > MB/s			Num Processes	
> > Randomly Ordered	4	8	16	32
> > Base			1227	947	638	412
> > CMA			4666	3682	1978	710
> > 				
> > MB/s			Num Processes	
> > Max Ping Pong		4	8	16	32
> > Base			2028	1938	1928	1882
> > CMA			7424	7510	7598	7708
> 
> So with the "Naturally ordered" testcase, it got 4741/1235 times
> faster with four processes?

Yes, thats correct.

> > +asmlinkage long sys_process_vm_writev(pid_t pid,
> > +				      const struct iovec __user
> > *lvec,
> > +				      unsigned long liovcnt,
> > +				      const struct iovec __user
> > *rvec,
> > +				      unsigned long riovcnt,
> > +				      unsigned long flags);
> 
> I have a vague feeling that some architectures have issues with six or
> more syscall args.  Or maybe it was seven.

There seem to be quite a few syscalls around with 6 args and none with
7 so I suspect (or at least hope) its 7. 

> > +		bytes_to_copy = min(PAGE_SIZE - start_offset,
> > +				    len - bytes_copied);
> > +		bytes_to_copy = min((size_t)bytes_to_copy,
> > +				    lvec[*lvec_current].iov_len -
> > *lvec_offset);
> 
> Use of min_t() is conventional.

ok

> It might be a little more efficient to do
> 
> 
> 	if (vm_write) {
> 		for (j = 0; j < pages_pinned; j++) {
> 			if (j < i)
> 				set_page_dirty_lock(process_pages[j]);
> 			put_page(process_pages[j]);
> 	} else {
> 		for (j = 0; j < pages_pinned; j++)
> 			put_page(process_pages[j]);
> 	}
> 
> and it is hopefully more efficient still to use release_pages() for
> the second loop.
> 
> This code would have been clearer if a better identifier than `i' had
> been chosen.

ok.
 
> > +			 struct page **process_pages,
> > +			 struct mm_struct *mm,
> > +			 struct task_struct *task,
> > +			 unsigned long flags, int vm_write)
> > +{
> > +	unsigned long pa = addr & PAGE_MASK;
> > +	unsigned long start_offset = addr - pa;
> > +	int nr_pages;
> > +	unsigned long bytes_copied = 0;
> > +	int rc;
> > +	unsigned int nr_pages_copied = 0;
> > +	unsigned int nr_pages_to_copy;
> 
> What prevents me from copying more than 2^32 pages?

Yea it should support that... will fix.

> > +		if (rc == -EFAULT)
> 
> It would be more future-safe to use
> 
> 		if (rc < 0)
> 
> > +			goto free_mem;

ok.

> > +	int i;
> > +	int rc;
> > +	int bytes_copied;
> 
> This was unsigned long in process_vm_rw().  Please review all these
> types for appropriate size and signedness.
> 

ok, will do.

Thanks for looking over the patch!

Chris
-- 
cyeoh@au1.ibm.com


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2010-11-23  9:25 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-22  1:58 [RFC][PATCH] Cross Memory Attach v2 (resend) Christopher Yeoh
2010-11-22  1:58 ` Christopher Yeoh
2010-11-22  8:45 ` Milton Miller
2010-11-23  9:26   ` Christopher Yeoh
2010-11-22 21:05 ` Andrew Morton
2010-11-22 21:05   ` Andrew Morton
2010-11-23  9:25   ` Christopher Yeoh [this message]
2010-11-23  9:25     ` Christopher Yeoh
2010-11-23 10:05     ` Brice Goglin
2010-11-23 10:05       ` Brice Goglin
2010-12-03  5:37       ` Robin Holt
2010-12-03  5:37         ` Robin Holt
2010-11-26  8:06   ` Ingo Molnar
2010-11-26  8:06     ` Ingo Molnar
2010-11-26  8:09     ` Andrew Morton
2010-11-26  8:09       ` Andrew Morton
2010-11-26  8:41       ` Ingo Molnar
2010-11-26  8:41         ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101123195523.46e6addb@lilo \
    --to=cyeoh@au1.ibm.com \
    --cc=Brice.Goglin@inria.fr \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.