Question about memcpy

kernelnewbies.kernelnewbies.org archive mirror

* Question about memcpy
@ 2018-07-07 11:36 bing zhu
  2018-07-07 18:44 ` valdis.kletnieks at vt.edu
  0 siblings, 1 reply; 22+ messages in thread
From: bing zhu @ 2018-07-07 11:36 UTC (permalink / raw)
  To: kernelnewbies

Dear Sir/Ma'am,
Thank you for your time. I'm a student new to the Linux kernel.
I have a question about memcpy: I noticed that memcpy is faster in the kernel
than in user space.
For example:
In a helloworld module, I use memcpy to copy a 4096-byte page into a block of
memory about 10000 times, and in user space I do the same thing. I noticed
that the kernel is faster than user space.
Is it possible that the code cannot be scheduled out when I insmod the module
in the kernel, but can be in user space, and that is why the kernel is faster?
Is there a way for a user task to run a block of code uninterruptibly,
with no context switch and no scheduling?
Thank you!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-07 11:36 Question about memcpy bing zhu
@ 2018-07-07 18:44 ` valdis.kletnieks at vt.edu
  2018-07-08 14:03   ` bing zhu
  0 siblings, 1 reply; 22+ messages in thread
From: valdis.kletnieks at vt.edu @ 2018-07-07 18:44 UTC (permalink / raw)
  To: kernelnewbies

On Sat, 07 Jul 2018 19:36:47 +0800, bing zhu said:

> and in user space i do the same thing,I noticed that kernel is faster than
> user ,

How did you measure the times? Doing this right is actually harder than it looks...
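
As a rough illustration of why this is harder than it looks, here is a minimal userspace sketch, not a definitive benchmark. It assumes Linux with GCC or Clang; the helper names `now_ns` and `best_copy_ns` are made up for this example. It uses a monotonic clock, touches the destination buffer before timing so that first-use page faults are not counted, repeats the run and keeps the fastest result, and uses an empty asm statement as a compiler barrier so the copies are not optimized away.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Monotonic clock in nanoseconds; unlike gettimeofday() it is not
 * affected by wall-clock adjustments. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: copy `page` into `count` consecutive 4096-byte
 * slots, repeat that `runs` times, and return the fastest run in
 * nanoseconds. Returns 0 if the buffer cannot be allocated.
 */
static uint64_t best_copy_ns(const void *page, size_t count, int runs)
{
    assert(page != NULL && count > 0 && runs > 0);

    char *dst = malloc(count * 4096);
    if (!dst)
        return 0;

    /* Touch every destination page first so that first-use page
     * faults are not charged to the timed loop. */
    memset(dst, 0, count * 4096);

    uint64_t best = UINT64_MAX;
    for (int r = 0; r < runs; r++) {
        uint64_t t0 = now_ns();
        for (size_t i = 0; i < count; i++)
            memcpy(dst + i * 4096, page, 4096);
        /* Compiler barrier (GCC/Clang syntax): forces the stores above
         * to be treated as observable. */
        __asm__ volatile("" : : "r"(dst) : "memory");
        uint64_t dt = now_ns() - t0;
        if (dt < best)
            best = dt;
    }

    free(dst);
    return best;
}
```

Taking the minimum over several runs is only one possible choice; it filters out runs that happened to be disturbed, which is useful when the goal is to compare the copy itself rather than the surrounding system noise.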

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-07 18:44 ` valdis.kletnieks at vt.edu
@ 2018-07-08 14:03   ` bing zhu
  2018-07-09  7:54     ` 袁建鹏
  2018-07-09 14:04     ` Himanshu Jha
  0 siblings, 2 replies; 22+ messages in thread
From: bing zhu @ 2018-07-08 14:03 UTC (permalink / raw)
  To: kernelnewbies

void *p = malloc(4096 * max);
start = usec();
for (i = 0; i < max; i++) {
        /* cast to char *: arithmetic on void * is a GNU extension */
        memcpy((char *)p + i * 4096, page, 4096);
}
end = usec();
printf("%s : %d time use %lu us \n", __func__, max, end - start);

/* Current wall-clock time in microseconds. */
static unsigned long usec(void)
{
        struct timeval tv;
        gettimeofday(&tv, 0);
        return (unsigned long)tv.tv_sec * 1000000 + tv.tv_usec;
}


I don't think it's really precise, but I did notice a difference.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-08 14:03   ` bing zhu
@ 2018-07-09  7:54     ` 袁建鹏
  2018-07-09  8:14       ` bing zhu
  2018-07-09 14:04     ` Himanshu Jha
  1 sibling, 1 reply; 22+ messages in thread
From: 袁建鹏 @ 2018-07-09  7:54 UTC (permalink / raw)
  To: kernelnewbies

Can you show all of the code, both kernel and userspace?

The kernel is built with its own optimization options, which are very different from a typical userspace build.

You can link the same object file (memcpy.o) into both the userspace program and the kernel module.




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-09  7:54     ` 袁建鹏
@ 2018-07-09  8:14       ` bing zhu
  0 siblings, 0 replies; 22+ messages in thread
From: bing zhu @ 2018-07-09  8:14 UTC (permalink / raw)
  To: kernelnewbies

In the kernel you should use this function:
static unsigned long usec(void)
{
        struct timeval tv;
        do_gettimeofday(&tv);
        return (unsigned long)tv.tv_sec * 1000000 + tv.tv_usec;
}



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-08 14:03   ` bing zhu
  2018-07-09  7:54     ` 袁建鹏
@ 2018-07-09 14:04     ` Himanshu Jha
  2018-07-09 16:16       ` valdis.kletnieks at vt.edu
  2018-07-10  4:50       ` bing zhu
  1 sibling, 2 replies; 22+ messages in thread
From: Himanshu Jha @ 2018-07-09 14:04 UTC (permalink / raw)
  To: kernelnewbies

Hi Bing,

On Sun, Jul 08, 2018 at 10:03:48PM +0800, bing zhu wrote:
> void *p = malloc(4096 * max);
> start = usec();
> for (i = 0; i < max; i++) {
> memcpy(p + i * 4096, page, 4096);
> }
> end = usec();
> printf("%s : %d time use %lu us \n", __func__, max,end - start);
> 
> static unsigned long usec(void)
> {
>         struct timeval tv;
>         gettimeofday(&tv, 0);
>         return (unsigned long)tv.tv_sec * 1000000 + tv.tv_usec;
> }

For this kind of benchmarking, I think you should use __rdtscp to evaluate
the cycles and time correctly (more info in the "AMD64 Architecture
Programmer's Manual Volume 3: General-Purpose and System Instructions",
p. 401).

Userspace:
----------------------------------------------------------------------
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <x86intrin.h>

volatile unsigned sink;   /* volatile so the loop is not optimized away */
unsigned int junk;        /* receives the TSC_AUX value written by rdtscp */

int main(void)
{
	clock_t start = clock();
	uint64_t t = __rdtscp(&junk);

	for (size_t i = 0; i < 10000000; ++i)
		sink++;

	t = __rdtscp(&junk) - t;
	clock_t end = clock();
	double cpu_time_used = ((double)(end - start)) / CLOCKS_PER_SEC;

	/* PRIu64 is the matching format for uint64_t; %zu is for size_t */
	printf("for loop took %f seconds to execute %" PRIu64 " cycles\n",
	       cpu_time_used, t);
	return 0;
}
---------------------------------------------------------------------

Kernelspace:
If you want to dig more:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf


Thanks
-- 
Himanshu Jha
Undergraduate Student
Department of Electronics & Communication
Guru Tegh Bahadur Institute of Technology

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-09 14:04     ` Himanshu Jha
@ 2018-07-09 16:16       ` valdis.kletnieks at vt.edu
  2018-07-14 10:10         ` Himanshu Jha
  2018-07-10  4:50       ` bing zhu
  1 sibling, 1 reply; 22+ messages in thread
From: valdis.kletnieks at vt.edu @ 2018-07-09 16:16 UTC (permalink / raw)
  To: kernelnewbies

On Mon, 09 Jul 2018 19:34:44 +0530, Himanshu Jha said:

> I think for these benchmarking stuff, to evaluate the cycles and time
> correctly you should use the __rdtscp(more info at "AMD64 Architecture
> Programmer's Manual Volume 3: General-Purpose and System Instructions"
> Pg 401)

Just beware that many Intel (and maybe some AMD) chipsets have a non-constant
TSC frequency.  Check /proc/cpuinfo for 'constant_tsc' before relying on the value.
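
For anyone who wants to do that check from a program rather than by eye, a minimal sketch follows. It assumes the usual x86 Linux layout of /proc/cpuinfo, where each CPU has a line starting with "flags" followed by space-separated flag names; the helper names `has_flag` and `cpu_has_flag` are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `flag` appears as a whole space-separated word in
 * `line`, so that "tsc" does not match inside "constant_tsc". */
static bool has_flag(const char *line, const char *flag)
{
    assert(line != NULL && flag != NULL && flag[0] != '\0');

    size_t len = strlen(flag);
    for (const char *p = strstr(line, flag); p; p = strstr(p + 1, flag)) {
        bool start_ok = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        bool end_ok = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (start_ok && end_ok)
            return true;
    }
    return false;
}

/* Scan /proc/cpuinfo for a "flags" line containing `flag`.
 * Returns 1 if found, 0 if not, -1 if the file cannot be read.
 * Assumes a flags line fits in the buffer below. */
static int cpu_has_flag(const char *flag)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f)
        return -1;

    char line[8192];
    int found = 0;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) == 0 && has_flag(line, flag)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A benchmark could call `cpu_has_flag("constant_tsc")` at startup and refuse to report TSC-based numbers unless it returns 1.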


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-09 16:16       ` valdis.kletnieks at vt.edu
@ 2018-07-14 10:10         ` Himanshu Jha
  0 siblings, 0 replies; 22+ messages in thread
From: Himanshu Jha @ 2018-07-14 10:10 UTC (permalink / raw)
  To: kernelnewbies

On Mon, Jul 09, 2018 at 12:16:27PM -0400, valdis.kletnieks at vt.edu wrote:
> On Mon, 09 Jul 2018 19:34:44 +0530, Himanshu Jha said:
> 
> > I think for these benchmarking stuff, to evaluate the cycles and time
> > correctly you should use the __rdtscp(more info at "AMD64 Architecture
> > Programmer's Manual Volume 3: General-Purpose and System Instructions"
> > Pg 401)
> 
> Just beware that many Intel (and maybe some AMD) chipsets have a non-constant
> TSC frequency.  Check /proc/cpuinfo for 'constant_tsc' before relying on the value.

How about setting the "performance" governor[1] for all CPUs?
Would that work? I mean there would be no throttling down, but I'm not sure
whether that gives us a constant CPU frequency.

Something like the following script:

for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
	[ -f "$CPUFREQ" ] || continue
	echo -n performance > "$CPUFREQ"
done

[1] https://www.kernel.org/doc/html/v4.14/admin-guide/pm/cpufreq.html#performance

-- 
Himanshu Jha
Undergraduate Student
Department of Electronics & Communication
Guru Tegh Bahadur Institute of Technology

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-09 14:04     ` Himanshu Jha
  2018-07-09 16:16       ` valdis.kletnieks at vt.edu
@ 2018-07-10  4:50       ` bing zhu
  2018-07-10  6:22         ` Greg KH
  1 sibling, 1 reply; 22+ messages in thread
From: bing zhu @ 2018-07-10  4:50 UTC (permalink / raw)
  To: kernelnewbies

I agree! I just think the problem is still there: memcpy is indeed faster
in the kernel than in userspace. I've tried both ways.
Scheduling might be to blame.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-10  4:50       ` bing zhu
@ 2018-07-10  6:22         ` Greg KH
  2018-07-10 14:51           ` bing zhu
  0 siblings, 1 reply; 22+ messages in thread
From: Greg KH @ 2018-07-10  6:22 UTC (permalink / raw)
  To: kernelnewbies

On Tue, Jul 10, 2018 at 12:50:21PM +0800, bing zhu wrote:
> I agree !,just i think the problem is still there,memcpy is indeed faster in
> kernel than in user,i've tried both ways .

Make sure you are actually using the same code for memcpy in both
places.  Do not rely on your libc or the kernel library for such a
thing, otherwise you are not comparing the same code exactly.

> schedule might be to blame.

Lots of things "might be to blame", but first off, try to work out
exactly what you are trying to test, and why, and work on that.

good luck!

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-10  6:22         ` Greg KH
@ 2018-07-10 14:51           ` bing zhu
  2018-07-10 14:57             ` Greg KH
  2018-07-10 16:03             ` valdis.kletnieks at vt.edu
  0 siblings, 2 replies; 22+ messages in thread
From: bing zhu @ 2018-07-10 14:51 UTC (permalink / raw)
  To: kernelnewbies

Thank you. I used this function in both the kernel and userspace, and the
results are the same.
void *memcpy(void *dest, const void *src, size_t n)
{
	long d0, d1, d2;

	/* copy n / 8 quadwords, then the remaining n % 8 bytes */
	asm volatile(
		"rep ; movsq\n\t"
		"movq %4,%%rcx\n\t"
		"rep ; movsb\n\t"
		: "=&c" (d0), "=&D" (d1), "=&S" (d2)
		: "0" (n >> 3), "g" (n & 7), "1" (dest), "2" (src)
		: "memory");

	return dest;
}
The kernel is indeed faster than userspace.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-10 14:51           ` bing zhu
@ 2018-07-10 14:57             ` Greg KH
  2018-07-10 16:03             ` valdis.kletnieks at vt.edu
  1 sibling, 0 replies; 22+ messages in thread
From: Greg KH @ 2018-07-10 14:57 UTC (permalink / raw)
  To: kernelnewbies

On Tue, Jul 10, 2018 at 10:51:34PM +0800, bing zhu wrote:
> Thank you ,I use this func for both kernel and user ,result are same.
> void *memcpy(void *dest, const void *src, size_t n)
> {
> long d0, d1, d2;
> asm volatile(
> "rep ; movsq\n\t"
> "movq %4,%%rcx\n\t"
> "rep ; movsb\n\t"
> : "=&c" (d0), "=&D" (d1), "=&S" (d2)
> : "0" (n >> 3), "g" (n & 7), "1" (dest), "2" (src)
> : "memory");
> 
> return dest;
> }
> kernel is indeed faster than user.

Ok, and that is due to the fact that the kernel thread does not get
scheduled, unlike your userspace program.  So this means the kernel is
working as designed :)
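
One way to see how much scheduling actually touched a given userspace run is to read the process's context-switch counters before and after the copy loop. The following is a minimal sketch under that assumption, using getrusage(); the names `cs_counts`, `read_cs_counts` and `preemptions_during_copy` are made up for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/resource.h>

/* Context-switch counters for the calling process, as reported by
 * getrusage(). Both fields are -1 if the call fails. */
struct cs_counts {
    long voluntary;   /* task gave up the CPU itself, e.g. blocking I/O */
    long involuntary; /* task was preempted by the scheduler */
};

static struct cs_counts read_cs_counts(void)
{
    struct rusage ru;
    struct cs_counts c = { -1, -1 };

    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        c.voluntary = ru.ru_nvcsw;
        c.involuntary = ru.ru_nivcsw;
    }
    return c;
}

/* Hypothetical helper: run the copy loop and report how many times the
 * process was preempted while it ran. `dst` must hold at least
 * count * page_size bytes. */
static long preemptions_during_copy(void *dst, const void *src,
                                    size_t page_size, size_t count)
{
    assert(dst != NULL && src != NULL);

    struct cs_counts before = read_cs_counts();
    for (size_t i = 0; i < count; i++)
        memcpy((char *)dst + i * page_size, src, page_size);
    struct cs_counts after = read_cs_counts();

    return after.involuntary - before.involuntary;
}
```

If this reports zero preemptions for a run that is still slower than the in-kernel one, scheduling is probably not the whole story for that run.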

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-10 14:51           ` bing zhu
  2018-07-10 14:57             ` Greg KH
@ 2018-07-10 16:03             ` valdis.kletnieks at vt.edu
  2018-07-12  4:47               ` bing zhu
  1 sibling, 1 reply; 22+ messages in thread
From: valdis.kletnieks at vt.edu @ 2018-07-10 16:03 UTC (permalink / raw)
  To: kernelnewbies

On Tue, 10 Jul 2018 22:51:34 +0800, bing zhu said:

> Thank you ,I use this func for both kernel and user ,result are same.
> void *memcpy(void *dest, const void *src, size_t n)
> {

Might want to use 'void *my_memcpy(..)' instead, just in case the build
environment plays #define games with you and causes a different memcpy()
to get invoked instead.

[/usr/src/linux-next] egrep -r '#define\s*memcpy\(' include/ arch/*/include
arch/arm64/include/asm/string.h:#define memcpy(dst, src, len) __memcpy(dst, src, len)
arch/m68k/include/asm/string.h:#define memcpy(d, s, n) __builtin_memcpy(d, s, n)
arch/sparc/include/asm/string.h:#define memcpy(t, f, n) __builtin_memcpy(t, f, n)
arch/x86/include/asm/string_64.h:#define memcpy(dst, src, len)					\
arch/x86/include/asm/string_64.h:#define memcpy(dst, src, len) __memcpy(dst, src, len)
arch/x86/include/asm/string_32.h:#define memcpy(t, f, n)				\
arch/x86/include/asm/string_32.h:#define memcpy(t, f, n) __builtin_memcpy(t, f, n)
arch/x86/include/asm/string_32.h:#define memcpy(t, f, n)				\
arch/xtensa/include/asm/string.h:#define memcpy(dst, src, len) __memcpy(dst, src, len)
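
As an illustration of the rename, here is a minimal sketch of a copy routine under a name that none of the macros above can redirect. It is deliberately a plain byte loop rather than a fast implementation. The headers shown are the userspace ones; the assumption is that a module build would take `size_t` from linux/types.h and drop the assert.

```c
#include <assert.h>
#include <stddef.h>

/* Plain byte-by-byte copy under a private name. The point is that both
 * builds compile exactly this source, not that it is fast. */
static void *my_memcpy(void *dest, const void *src, size_t n)
{
    assert(n == 0 || (dest != NULL && src != NULL));

    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dest;
}
```

An optimizing compiler may recognise a loop like this and turn it back into a call to memcpy, so it is worth checking the generated code in both builds before trusting the comparison.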


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-10 16:03             ` valdis.kletnieks at vt.edu
@ 2018-07-12  4:47               ` bing zhu
  2018-07-12  5:34                 ` Greg KH
  0 siblings, 1 reply; 22+ messages in thread
From: bing zhu @ 2018-07-12  4:47 UTC (permalink / raw)
  To: kernelnewbies

Agreed! A simple rename would suffice. The results are the same: the kernel
is faster. Could anyone help fix this?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-12  4:47               ` bing zhu
@ 2018-07-12  5:34                 ` Greg KH
  2018-07-12 14:27                   ` bing zhu
  0 siblings, 1 reply; 22+ messages in thread
From: Greg KH @ 2018-07-12  5:34 UTC (permalink / raw)
  To: kernelnewbies

On Thu, Jul 12, 2018 at 12:47:12PM +0800, bing zhu wrote:
> agree! a simple rename would survice.results are the same .kernel is faster
> could anyone help fix this ?

Fix what exactly?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-12  5:34                 ` Greg KH
@ 2018-07-12 14:27                   ` bing zhu
  2018-07-12 14:53                     ` Greg KH
  2018-07-12 16:49                     ` valdis.kletnieks at vt.edu
  0 siblings, 2 replies; 22+ messages in thread
From: bing zhu @ 2018-07-12 14:27 UTC (permalink / raw)
  To: kernelnewbies

As for memcpy, the kernel is faster than userspace, possibly because of
scheduling. Can I try to make userspace as fast as the kernel?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-12 14:27                   ` bing zhu
@ 2018-07-12 14:53                     ` Greg KH
  2018-07-13  3:02                       ` bing zhu
  2018-07-12 16:49                     ` valdis.kletnieks at vt.edu
  1 sibling, 1 reply; 22+ messages in thread
From: Greg KH @ 2018-07-12 14:53 UTC (permalink / raw)
  To: kernelnewbies

A: http://en.wikipedia.org/wiki/Top_post
Q: Where do I find info about this thing called top-posting?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top

On Thu, Jul 12, 2018 at 10:27:37PM +0800, bing zhu wrote:
> as for memcpy ,kernel is faster than user ,might because schedule ,can i try to
> make user as fast as kernel ?

You can bind a specific CPU to your userspace task, and have it only run
that program and not get interrupted at all for anything.  That would
make it as fast as the kernel runs.  Lots of people do that in high
frequency trading as they don't want the CPU to get in the way of their
work or response times to the network.

But without doing fancy tricks like that, no.  Think about what an
operating system does.  Its job is to schedule things that need to be
done behind the back of your program.  Otherwise there's no need for it,
right?
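
The binding half of that trick can be sketched in a few lines. This is a minimal example assuming Linux and glibc, with `pin_to_cpu` as a made-up helper name; it uses sched_setaffinity() to restrict the calling thread to one CPU.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on failure with errno set. */
static int pin_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof set, &set);
}
```

Pinning only controls where this task runs. Keeping other tasks and interrupts off that CPU is separate system configuration that this sketch does not cover.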

good luck!

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-12 14:53                     ` Greg KH
@ 2018-07-13  3:02                       ` bing zhu
  2018-07-13  7:33                         ` valdis.kletnieks at vt.edu
  0 siblings, 1 reply; 22+ messages in thread
From: bing zhu @ 2018-07-13  3:02 UTC (permalink / raw)
  To: kernelnewbies

I'm trying to write a simple filesystem in user space. If memcpy is slower
there than in the kernel, I think that's unfair. As for dedicating a CPU to
my task, that seems a bit arbitrary. I just want my task not to be
interrupted during a specific period of time. Is that possible?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-13  3:02                       ` bing zhu
@ 2018-07-13  7:33                         ` valdis.kletnieks at vt.edu
  2018-07-17  2:44                           ` bing zhu
  0 siblings, 1 reply; 22+ messages in thread
From: valdis.kletnieks at vt.edu @ 2018-07-13  7:33 UTC (permalink / raw)
  To: kernelnewbies

On Fri, 13 Jul 2018 11:02:13 +0800, bing zhu said:

> I'm trying to write a simple fs in user space,if memcpy is slower than
> kernel , i think it's unfair,as for only cpu for my task,
> it's a bit of arbitrary, i just want my task not interrupted during a
> specific time is that possible ?

Not getting interrupted is an *entirely* different issue than making memcpy fast.

Note that in general, systems code should be able to deal with interruptions
during most parts of the code, with locking and disabled pre-emption reserved
for the sections that can't deal with being interrupted.  Remember that if your
filesystem code turns off interrupts for long enough, you can start losing
things like I/O completions.  Fortunately for those who write systems code,
the vast majority of interrupts are totally transparent to the vast majority
of the kernel code.

And if you're doing a file system in userspace, you're going to fail to notice
hundreds or even thousands of interrupts happening. If you don't believe me,
'cat /proc/interrupts', and realize that userspace didn't notice *any* of them
happening.
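
To put a number on that from inside a program, one rough approach is to total the counters in /proc/interrupts before and after a piece of work. The sketch below assumes the common line layout (a label, a colon, one count per CPU, then a free-form description); `sum_irq_line` and `total_interrupts` are made-up helper names, and very wide lines on machines with many CPUs may not fit the buffer used here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counters on one line of /proc/interrupts, e.g.
 * "  0:   12   34   IO-APIC   2-edge   timer" gives 46. */
static unsigned long long sum_irq_line(const char *line)
{
    assert(line != NULL);

    const char *p = strchr(line, ':');
    if (!p)
        return 0; /* header line such as "CPU0 CPU1" has no colon */
    p++;

    unsigned long long total = 0;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break; /* reached the description, or end of line */

        char *end;
        unsigned long long v = strtoull(p, &end, 10);
        if (*end != ' ' && *end != '\t' && *end != '\n' && *end != '\0')
            break; /* token like "2-edge" belongs to the description */
        total += v;
        p = end;
    }
    return total;
}

/* Total of all counters in /proc/interrupts, or 0 if it cannot be
 * opened. */
static unsigned long long total_interrupts(void)
{
    FILE *f = fopen("/proc/interrupts", "r");
    if (!f)
        return 0;

    char line[4096];
    unsigned long long total = 0;
    while (fgets(line, sizeof line, f))
        total += sum_irq_line(line);
    fclose(f);
    return total;
}
```

Calling `total_interrupts()` before and after the copy loop and subtracting gives a system-wide count of interrupts that arrived in between, none of which the loop itself had to handle.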


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-13  7:33                         ` valdis.kletnieks at vt.edu
@ 2018-07-17  2:44                           ` bing zhu
  0 siblings, 0 replies; 22+ messages in thread
From: bing zhu @ 2018-07-17  2:44 UTC (permalink / raw)
  To: kernelnewbies

Thanks for elaborating. I've learned that it's not worth it; I'm turning to
other approaches for performance. Thanks!


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
  2018-07-12 14:27                   ` bing zhu
  2018-07-12 14:53                     ` Greg KH
@ 2018-07-12 16:49                     ` valdis.kletnieks at vt.edu
  1 sibling, 0 replies; 22+ messages in thread
From: valdis.kletnieks at vt.edu @ 2018-07-12 16:49 UTC (permalink / raw)
  To: kernelnewbies

On Thu, 12 Jul 2018 22:27:37 +0800, bing zhu said:

> as for memcpy ,kernel is faster than user ,might because schedule ,can i
> try to make user as fast as kernel ?

Do you have an actual issue where the difference in speed of these two
things makes a difference?  Or is this primarily a mental curiosity thing?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question about memcpy
@ 2018-07-07 13:21 Alex Arvelaez
  0 siblings, 0 replies; 22+ messages in thread
From: Alex Arvelaez @ 2018-07-07 13:21 UTC (permalink / raw)
  To: kernelnewbies

On Jul 7, 2018 7:37 AM, bing zhu <zhubohong12@gmail.com> wrote:
>
> Dear Sir/Ma'am
> Thank you for your time ,i'm a student new to linux kernel.
> I have a question about memcpy,i noticed that memcpy is faster in kernel than in user space
> for example :
> in a module helloworld , i use memcpy to copy a 4096B to a block of memory for like 10000 times
> and in user space i do the same thing,I noticed that kernel is faster than user ,
> is it possible that in kernel when i insmod hello it can not be scheduled but in user space it will so kernel is faster?

This makes sense: fewer context switches.

> is there a possible way that a user task can run a block of code that uninterruptable? No switch ,no schedule ?

I don't think this is possible; Linux is a preemptive kernel.

> Thank you !

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-07-17  2:44 UTC | newest]
