* [RFC] [PATCH] Add memcpy32 function
@ 2005-12-23 1:35 Bryan O'Sullivan
2005-12-23 2:49 ` Dave Jones
2005-12-28 11:01 ` Andi Kleen
0 siblings, 2 replies; 11+ messages in thread
From: Bryan O'Sullivan @ 2005-12-23 1:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Matt Mackall, Andrew Morton
In response to the comments that followed Roland Dreier posting our
InfiniPath driver for review last week, we've been making some cleanups
to our driver code.
As our chip requires 32-bit accesses, we need a copy function that
guarantees operating in such terms. It was suggested that we make this
generic, with arch-specific optimised versions.
This patch introduces the generic copy routine, memcpy32. At Andrew's
suggestion, I've put it in a new header file, include/linux/io.h, which
I've styled after include/linux/string.h.
Linus made some comments this morning about the manner in which the
functions in string.h are provided, but I wanted to float this patch for
review in any case, if only to see whether the alternative really is
preferred.
If desired, I can introduce instead an
include/asm-generic/memcpy32-algo.h header which defines the generic
memcpy32 routine, and edit all 20-odd asm-*/io.h files to use it. That
seems wasteful, though.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
diff -r 0a6978881777 lib/Makefile
--- a/lib/Makefile Fri Dec 23 01:41:03 2005 +0800
+++ b/lib/Makefile Thu Dec 22 15:45:05 2005 -0800
@@ -5,7 +5,7 @@
lib-y := errno.o ctype.o string.o vsprintf.o cmdline.o \
bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o \
idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \
- sha1.o
+ sha1.o io.o
lib-y += kobject.o kref.o kobject_uevent.o klist.o
diff -r 0a6978881777 include/linux/io.h
--- /dev/null Thu Jan 1 00:00:00 1970 +0000
+++ b/include/linux/io.h Thu Dec 22 15:45:05 2005 -0800
@@ -0,0 +1,32 @@
+#ifndef _LINUX_IO_H_
+#define _LINUX_IO_H_
+
+#include <linux/types.h>
+
+#ifdef __KERNEL__
+
+/*
+ * Include machine specific inline routines.
+ */
+#include <asm/io.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifndef __HAVE_ARCH_MEMCPY32
+/*
+ * Copy routine that is guaranteed to operate in terms of 32-bit
+ * quantities. Source and dest must be 32-bit aligned. Count is
+ * number of 32-bit quantities (NOT bytes) to copy.
+ */
+void *memcpy32(void *, const void *, __kernel_size_t);
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_IO_H_ */
diff -r 0a6978881777 lib/io.c
--- /dev/null Thu Jan 1 00:00:00 1970 +0000
+++ b/lib/io.c Thu Dec 22 15:45:05 2005 -0800
@@ -0,0 +1,21 @@
+#include <linux/io.h>
+
+#ifndef __HAVE_ARCH_MEMCPY32
+/**
+ * memcpy32 - Copy one area of memory to another, as a series of
+ * aligned 32-bit values.
+ * @dest: Where to copy to (must be 32-bit aligned)
+ * @src: Where to copy from (must be 32-bit aligned)
+ * @count: The number of 32-bit values to copy
+ */
+void *memcpy32(void *dest, const void *src, size_t count)
+{
+ u32 *tmp = dest;
+ u32 *s = src;
+
+ while (count--)
+ *tmp++ = *s++;
+ return dest;
+}
+EXPORT_SYMBOL_GPL(memcpy32);
+#endif
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] [PATCH] Add memcpy32 function
2005-12-23 1:35 [RFC] [PATCH] Add memcpy32 function Bryan O'Sullivan
@ 2005-12-23 2:49 ` Dave Jones
2005-12-23 17:16 ` Matt Mackall
2005-12-28 11:01 ` Andi Kleen
1 sibling, 1 reply; 11+ messages in thread
From: Dave Jones @ 2005-12-23 2:49 UTC (permalink / raw)
To: Bryan O'Sullivan; +Cc: linux-kernel, Matt Mackall, Andrew Morton
On Thu, Dec 22, 2005 at 05:35:59PM -0800, Bryan O'Sullivan wrote:
> In response to the comments that followed Roland Dreier posting our
> InfiniPath driver for review last week, we've been making some cleanups
> to our driver code.
>
> As our chip requires 32-bit accesses, we need a copy function that
> guarantees operating in such terms. It was suggested that we make this
> generic, with arch-specific optimised versions.
>
> This patch introduces the generic copy routine, memcpy32. At Andrew's
> suggestion, I've put it in a new header file, include/linux/io.h, which
> I've styled after include/linux/string.h.
io.h is a very generic sounding name for something that just houses
a memcpy variant. What's wrong with calling a spade a spade,
and using memcpy32.h ?
Dave
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] [PATCH] Add memcpy32 function
2005-12-23 2:49 ` Dave Jones
@ 2005-12-23 17:16 ` Matt Mackall
2005-12-23 17:42 ` Christoph Hellwig
0 siblings, 1 reply; 11+ messages in thread
From: Matt Mackall @ 2005-12-23 17:16 UTC (permalink / raw)
To: Dave Jones, Bryan O'Sullivan, linux-kernel, Andrew Morton
On Thu, Dec 22, 2005 at 09:49:43PM -0500, Dave Jones wrote:
> On Thu, Dec 22, 2005 at 05:35:59PM -0800, Bryan O'Sullivan wrote:
> > In response to the comments that followed Roland Dreier posting our
> > InfiniPath driver for review last week, we've been making some cleanups
> > to our driver code.
> >
> > As our chip requires 32-bit accesses, we need a copy function that
> > guarantees operating in such terms. It was suggested that we make this
> > generic, with arch-specific optimised versions.
> >
> > This patch introduces the generic copy routine, memcpy32. At Andrew's
> > suggestion, I've put it in a new header file, include/linux/io.h, which
> > I've styled after include/linux/string.h.
>
> io.h is a very generic sounding name for something that just houses
> a memcpy variant. What's wrong with calling a spade a spade,
> and using memcpy32.h ?
I think it belongs in string.h alongside memcpy, just for tradition's
sake. I don't think it belongs in a file named io.h, as it probably
has uses beyond I/O.
--
Mathematics is the supreme nostalgia of our time.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] [PATCH] Add memcpy32 function
2005-12-23 17:16 ` Matt Mackall
@ 2005-12-23 17:42 ` Christoph Hellwig
2005-12-23 18:14 ` Matt Mackall
2005-12-23 23:50 ` Bryan O'Sullivan
0 siblings, 2 replies; 11+ messages in thread
From: Christoph Hellwig @ 2005-12-23 17:42 UTC (permalink / raw)
To: Matt Mackall
Cc: Dave Jones, Bryan O'Sullivan, linux-kernel, Andrew Morton
On Fri, Dec 23, 2005 at 11:16:28AM -0600, Matt Mackall wrote:
> > io.h is a very generic sounding name for something that just houses
> > a memcpy variant. What's wrong with calling a spade a spade,
> > and using memcpy32.h ?
>
> I think it belongs in string.h alongside memcpy, just for tradition's
> sake. I don't think it belongs in a file named io.h, as it probably
> has uses beyond I/O.
Actually I think memcpy32 is not the thing pathscale wants. They want
memcpy_{to,from}_io32, because memcpy32 wouldn't be allowed to operate
on I/O mapped memory. I'd say back to the drawingboard.
And to pathscale: please get your driver __iomem and endianess annotated
before sending out further core patches, I'm pretty sure getting those
things fixed will shed some light on the actual requirements.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] [PATCH] Add memcpy32 function
2005-12-23 17:42 ` Christoph Hellwig
@ 2005-12-23 18:14 ` Matt Mackall
2005-12-23 23:50 ` Bryan O'Sullivan
1 sibling, 0 replies; 11+ messages in thread
From: Matt Mackall @ 2005-12-23 18:14 UTC (permalink / raw)
To: Christoph Hellwig, Dave Jones, Bryan O'Sullivan, linux-kernel,
Andrew Morton
On Fri, Dec 23, 2005 at 05:42:28PM +0000, Christoph Hellwig wrote:
> On Fri, Dec 23, 2005 at 11:16:28AM -0600, Matt Mackall wrote:
> > > io.h is a very generic sounding name for something that just houses
> > > a memcpy variant. What's wrong with calling a spade a spade,
> > > and using memcpy32.h ?
> >
> > I think it belongs in string.h alongside memcpy, just for tradition's
> > sake. I don't think it belongs in a file named io.h, as it probably
> > has uses beyond I/O.
>
> Actually I think memcpy32 is not the thing pathscale wants. They want
> memcpy_{to,from}_io32, because memcpy32 wouldn't be allowed to operate
> on I/O mapped memory. I'd say back to the drawingboard.
Ahh, excellent point. We probably want something closer to
iowrite32_rep and have it live in iomap.h. Perhaps iowrite32_copy?
--
Mathematics is the supreme nostalgia of our time.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] [PATCH] Add memcpy32 function
2005-12-23 17:42 ` Christoph Hellwig
2005-12-23 18:14 ` Matt Mackall
@ 2005-12-23 23:50 ` Bryan O'Sullivan
1 sibling, 0 replies; 11+ messages in thread
From: Bryan O'Sullivan @ 2005-12-23 23:50 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Matt Mackall, Dave Jones, linux-kernel, Andrew Morton
On Fri, 2005-12-23 at 17:42 +0000, Christoph Hellwig wrote:
> Actually I think memcpy32 is not the thing pathscale wants. They want
> memcpy_{to,from}_io32, because memcpy32 wouldn't be allowed to operate
> on I/O mapped memory. I'd say back to the drawingboard.
Fair enough. I'll follow Matt's suggestion of iowrite32_copy and
ioread32_copy, in that case, and put them in asm-generic/iomap.h.
> And to pathscale: please get your driver __iomem and endianess annotated
> before sending out further core patches, I'm pretty sure getting those
> things fixed will shed some light on the actual requirements.
OK, will do.
<b
--
Bryan O'Sullivan <bos@pathscale.com>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] [PATCH] Add memcpy32 function
2005-12-23 1:35 [RFC] [PATCH] Add memcpy32 function Bryan O'Sullivan
2005-12-23 2:49 ` Dave Jones
@ 2005-12-28 11:01 ` Andi Kleen
2005-12-28 15:00 ` Bryan O'Sullivan
1 sibling, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2005-12-28 11:01 UTC (permalink / raw)
To: Bryan O'Sullivan; +Cc: Matt Mackall, Andrew Morton, linux-kernel
Bryan O'Sullivan <bos@pathscale.com> writes:
> In response to the comments that followed Roland Dreier posting our
> InfiniPath driver for review last week, we've been making some cleanups
> to our driver code.
What irritates me is that the original author said this copy
would happen from user space in ipath. In that case you would need
exception handling for all memory accesses to return EFAULT,
otherwise everybody can crash the kernel.
-Andi
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] [PATCH] Add memcpy32 function
2005-12-28 11:01 ` Andi Kleen
@ 2005-12-28 15:00 ` Bryan O'Sullivan
2005-12-28 17:50 ` Andreas Kleen
0 siblings, 1 reply; 11+ messages in thread
From: Bryan O'Sullivan @ 2005-12-28 15:00 UTC (permalink / raw)
To: Andi Kleen; +Cc: Matt Mackall, Andrew Morton, linux-kernel
On Wed, 2005-12-28 at 12:01 +0100, Andi Kleen wrote:
> What irritates me is that the original author said this copy
> would happen from user space in ipath.
I'm afraid there may have been some miscommunication there.
All of our uses of memcpy_toio32 (which uses memcpy32 on x86_64) copy
from kernel virtual addresses to MMIO space. There's no direct copying
from userspace to MMIO space through the driver.
However, we do let userspace code directly access portions of our chip.
That code uses a routine that is exactly the same as memcpy32 to perform
MMIO writes. That's where I think the confusion arose on the part of
whoever responded to you.
> In that case you would need
> exception handling for all memory accesses to return EFAULT,
> otherwise everybody can crash the kernel.
Just to be clear: we use copy_{to,from}_user for all copies between
userspace and driver, and we return -EFAULT on short copies, so we're
already doing the right thing in that sort of case.
Sorry for the confusion. I hope this clears that business up.
<b
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] [PATCH] Add memcpy32 function
2005-12-28 15:00 ` Bryan O'Sullivan
@ 2005-12-28 17:50 ` Andreas Kleen
2005-12-28 18:11 ` Bryan O'Sullivan
0 siblings, 1 reply; 11+ messages in thread
From: Andreas Kleen @ 2005-12-28 17:50 UTC (permalink / raw)
To: Bryan O'Sullivan; +Cc: Matt Mackall, Andrew Morton, linux-kernel
Am Mi 28.12.2005 16:00 schrieb Bryan O'Sullivan <bos@pathscale.com>:
> All of our uses of memcpy_toio32 (which uses memcpy32 on x86_64) copy
> from kernel virtual addresses to MMIO space. There's no direct copying
> from userspace to MMIO space through the driver.
>
> However, we do let userspace code directly access portions of our
> chip.
> That code uses a routine that is exactly the same as memcpy32 to
> perform
> MMIO writes. That's where I think the confusion arose on the part of
> whoever responded to you.
Ok thanks. And do you have numbers that show that the assembly
function with rep ; movsl actually improves performance over C? I
guess the MMIO
is uncached or at best write combined and in both cases it usually
doesn't
matter if the CPU burns a few more cycles generating the writes or not
because everything is bus bound and every cycle you save
is just lost again on the next synchronization point with the hardware.
If the assembly is not really faster I would recommend you just use a
writel()
loop in the driver instead of adding this very special purpose function
everywhere.
-Andi
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] [PATCH] Add memcpy32 function
2005-12-28 17:50 ` Andreas Kleen
@ 2005-12-28 18:11 ` Bryan O'Sullivan
2005-12-28 18:22 ` Andreas Kleen
0 siblings, 1 reply; 11+ messages in thread
From: Bryan O'Sullivan @ 2005-12-28 18:11 UTC (permalink / raw)
To: Andreas Kleen; +Cc: Matt Mackall, Andrew Morton, linux-kernel
On Wed, 2005-12-28 at 18:50 +0100, Andreas Kleen wrote:
> Ok thanks. And do you have numbers that show that the assembly
> function with rep ; movsl actually improves performance over C?
I'll see if I can ferret some numbers out. If not, I'll generate them,
but it will take me a day or so. I'm pretty sure it makes a difference
of tens to hundreds of nanoseconds, which in our case is very
significant (we measure some of our user-level performance in increments
of 10ns, very repeatably).
> If the assembly is not really faster I would recommend you just use a
> writel()
> loop in the driver instead of adding this very special purpose function
> everywhere.
Yeah, clearly if there's no difference, it's not worth the trouble.
<b
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] [PATCH] Add memcpy32 function
2005-12-28 18:11 ` Bryan O'Sullivan
@ 2005-12-28 18:22 ` Andreas Kleen
0 siblings, 0 replies; 11+ messages in thread
From: Andreas Kleen @ 2005-12-28 18:22 UTC (permalink / raw)
To: Bryan O'Sullivan; +Cc: Matt Mackall, Andrew Morton, linux-kernel
Am Mi 28.12.2005 19:11 schrieb Bryan O'Sullivan <bos@pathscale.com>:
> On Wed, 2005-12-28 at 18:50 +0100, Andreas Kleen wrote:
>
> > Ok thanks. And do you have numbers that show that the assembly
> > function with rep ; movsl actually improves performance over C?
>
> I'll see if I can ferret some numbers out. If not, I'll generate them,
> but it will take me a day or so. I'm pretty sure it makes a difference
> of tens to hundreds of nanoseconds, which in our case is very
> significant (we measure some of our user-level performance in
> increments
> of 10ns, very repeatably).
If you test the C version use
CFLAGS_memcpy32.o := -funroll-loops
BTW on x86-64 with CONFIG_UNORDERED_IO writel can actually expand to a
non temporal write which might break it.
-Andi
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2005-12-28 18:22 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-12-23 1:35 [RFC] [PATCH] Add memcpy32 function Bryan O'Sullivan
2005-12-23 2:49 ` Dave Jones
2005-12-23 17:16 ` Matt Mackall
2005-12-23 17:42 ` Christoph Hellwig
2005-12-23 18:14 ` Matt Mackall
2005-12-23 23:50 ` Bryan O'Sullivan
2005-12-28 11:01 ` Andi Kleen
2005-12-28 15:00 ` Bryan O'Sullivan
2005-12-28 17:50 ` Andreas Kleen
2005-12-28 18:11 ` Bryan O'Sullivan
2005-12-28 18:22 ` Andreas Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox