From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail202.messagelabs.com (mail202.messagelabs.com [216.82.254.227]) by kanga.kvack.org (Postfix) with ESMTP id 8F5426B0087 for ; Tue, 23 Nov 2010 04:25:43 -0500 (EST) Received: from d23relay05.au.ibm.com (d23relay05.au.ibm.com [202.81.31.247]) by e23smtp03.au.ibm.com (8.14.4/8.13.1) with ESMTP id oAN9LNZX003733 for ; Tue, 23 Nov 2010 20:21:23 +1100 Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.235.139]) by d23relay05.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id oAN9PWhV1933358 for ; Tue, 23 Nov 2010 20:25:33 +1100 Received: from d23av04.au.ibm.com (loopback [127.0.0.1]) by d23av04.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id oAN9PW8W022100 for ; Tue, 23 Nov 2010 20:25:32 +1100 Date: Tue, 23 Nov 2010 19:55:23 +1030 From: Christopher Yeoh Subject: Re: [RFC][PATCH] Cross Memory Attach v2 (resend) Message-ID: <20101123195523.46e6addb@lilo> In-Reply-To: <20101122130527.c13c99d3.akpm@linux-foundation.org> References: <20101122122847.3585b447@lilo> <20101122130527.c13c99d3.akpm@linux-foundation.org> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Andrew Morton Cc: linux-kernel@vger.kernel.org, Linus Torvalds , linux-mm@kvack.org, Ingo Molnar , Brice Goglin List-ID: On Mon, 22 Nov 2010 13:05:27 -0800 Andrew Morton wrote: > We have a bit of a track record of adding cool-looking syscalls and > then regretting it a few years later. Few people use them, and maybe > they weren't so cool after all, and we have to maintain them for > ever. Bugs (sometimes security-relevant ones) remain undiscovered for > long periods because few people use (or care about) the code. > > So I think the bar is a high one - higher than it used to be. > Convince us that this feature is so important that it's worth all > that overhead and risk? Well there are the benchmark results to show that there is real improvement for MPI implementations (well at least for those benchmarks ;-) There's also been a few papers written on something quite similar (KNEM) which goes into more detail on the potential gains. http://runtime.bordeaux.inria.fr/knem/ I've also heard privately that something very similar has been used in at least one device driver to support intranode operations for quite a while, but maintaining this out of tree as the mm has changed has been quite painful. And I can get it down to just one syscall by using the flags parameter if that helps at all. > > HPCC results: > > ============= > > > > MB/s Num Processes > > Naturally Ordered 4 8 16 32 > > Base 1235 935 622 419 > > CMA 4741 3769 1977 703 > > > > > > MB/s Num Processes > > Randomly Ordered 4 8 16 32 > > Base 1227 947 638 412 > > CMA 4666 3682 1978 710 > > > > MB/s Num Processes > > Max Ping Pong 4 8 16 32 > > Base 2028 1938 1928 1882 > > CMA 7424 7510 7598 7708 > > So with the "Naturally ordered" testcase, it got 4741/1235 times > faster with four processes? Yes, thats correct. > > +asmlinkage long sys_process_vm_writev(pid_t pid, > > + const struct iovec __user > > *lvec, > > + unsigned long liovcnt, > > + const struct iovec __user > > *rvec, > > + unsigned long riovcnt, > > + unsigned long flags); > > I have a vague feeling that some architectures have issues with six or > more syscall args. Or maybe it was seven. There seem to be quite a few syscalls around with 6 args and none with 7 so I suspect (or at least hope) its 7. > > + bytes_to_copy = min(PAGE_SIZE - start_offset, > > + len - bytes_copied); > > + bytes_to_copy = min((size_t)bytes_to_copy, > > + lvec[*lvec_current].iov_len - > > *lvec_offset); > > Use of min_t() is conventional. ok > It might be a little more efficient to do > > > if (vm_write) { > for (j = 0; j < pages_pinned; j++) { > if (j < i) > set_page_dirty_lock(process_pages[j]); > put_page(process_pages[j]); > } else { > for (j = 0; j < pages_pinned; j++) > put_page(process_pages[j]); > } > > and it is hopefully more efficient still to use release_pages() for > the second loop. > > This code would have been clearer if a better identifier than `i' had > been chosen. ok. > > + struct page **process_pages, > > + struct mm_struct *mm, > > + struct task_struct *task, > > + unsigned long flags, int vm_write) > > +{ > > + unsigned long pa = addr & PAGE_MASK; > > + unsigned long start_offset = addr - pa; > > + int nr_pages; > > + unsigned long bytes_copied = 0; > > + int rc; > > + unsigned int nr_pages_copied = 0; > > + unsigned int nr_pages_to_copy; > > What prevents me from copying more than 2^32 pages? Yea it should support that... will fix. > > + if (rc == -EFAULT) > > It would be more future-safe to use > > if (rc < 0) > > > + goto free_mem; ok. > > + int i; > > + int rc; > > + int bytes_copied; > > This was unsigned long in process_vm_rw(). Please review all these > types for appropriate size and signedness. > ok, will do. Thanks for looking over the patch! Chris -- cyeoh@au1.ibm.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: email@kvack.org