* makedumpfile: optimize is_zero_page
From: Marc Milgram @ 2014-03-03 19:44 UTC
To: kexec; +Cc: Marc Milgram
There are local complaints that filtering out only zero pages is slow. I
found that is_zero_page was inefficient: it checks whether the page contains
any non-zero bytes one byte at a time.

Improve performance by checking for non-zero data 64 bits at a time. Also,
unroll the loop for additional performance.

I tested in x86_64 mode on an Intel Xeon X5560 system with 18GB RAM by
running:
    time makedumpfile -d 1 /proc/vmcore <destination>

The time spent in user space was reduced by 75%. The total time to dump
memory was reduced by 28%.

is_zero_page
Signed-off-by: Marc Milgram <mmilgram at redhat.com>
---
diff --git a/makedumpfile.h b/makedumpfile.h
index 3d270c6..0f211c4 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -1634,10 +1634,27 @@ static inline int
 is_zero_page(unsigned char *buf, long page_size)
 {
 	size_t i;
+	unsigned long long *vect = (unsigned long long *) buf;
+	long page_len = page_size / (sizeof(unsigned long long));
 
-	for (i = 0; i < page_size; i++)
-		if (buf[i])
+	for (i = 0; i < page_len; i+=8) {
+		if (vect[i])
 			return FALSE;
+		if (vect[i+1])
+			return FALSE;
+		if (vect[i+2])
+			return FALSE;
+		if (vect[i+3])
+			return FALSE;
+		if (vect[i+4])
+			return FALSE;
+		if (vect[i+5])
+			return FALSE;
+		if (vect[i+6])
+			return FALSE;
+		if (vect[i+7])
+			return FALSE;
+	}
 	return TRUE;
 }
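
The unrolled loop reads eight words per iteration, so it quietly relies on the page size being a multiple of 64 bytes and on buf being aligned for unsigned long long. Both hold for ordinary page buffers. The following is only a sketch of the same idea with those assumptions made explicit; the name is_zero_page_unrolled and the two tail loops are illustrative and are not part of the patch.

```c
#include <stddef.h>

#define TRUE  1
#define FALSE 0

/*
 * Hypothetical standalone variant of the unrolled check.
 * Assumption: buf is aligned for unsigned long long.
 */
static int
is_zero_page_unrolled(const unsigned char *buf, size_t page_size)
{
    const unsigned long long *vect = (const unsigned long long *)buf;
    size_t words = page_size / sizeof(unsigned long long);
    size_t i = 0;

    /* Eight words per iteration while a full group of eight remains. */
    for (; i + 8 <= words; i += 8) {
        if (vect[i] | vect[i + 1] | vect[i + 2] | vect[i + 3] |
            vect[i + 4] | vect[i + 5] | vect[i + 6] | vect[i + 7])
            return FALSE;
    }

    /* Leftover whole words when the word count is not a multiple of 8. */
    for (; i < words; i++)
        if (vect[i])
            return FALSE;

    /* Leftover bytes when page_size is not a multiple of the word size. */
    for (i = words * sizeof(unsigned long long); i < page_size; i++)
        if (buf[i])
            return FALSE;

    return TRUE;
}
```

For a 4096-byte page the two tail loops never run, so the behaviour matches the patch; they only matter for sizes that are not a multiple of 64.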
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
* RE: makedumpfile: optimize is_zero_page
From: Atsushi Kumagai @ 2014-03-05 4:55 UTC
To: mmilgram@redhat.com; +Cc: kexec@lists.infradead.org
Hello Marc,
>There are local complaints that filtering out only zero pages is slow. I
>found that is_zero_page was inefficient. It checks if the page contains any
>non-zero bytes - one byte at a time.
>
>Improve performance by checking for non-zero data 64 bits at a time. Also,
>unroll the loop for additional performance.
>
>Did testing in x86_64 mode on an Intel Xeon x5560 system with 18GB RAM.
>Executed:
> time makedumpfile -d 1 /proc/vmcore <destination>
>
>The amount of time taken in User space was reduced by 75%. The total time to
>dump memory was reduced by 28%.
Thanks for your good work, but...
>is_zero_page
>Signed-off-by: Marc Milgram <mmilgram at redhat.com>
>---
>diff --git a/makedumpfile.h b/makedumpfile.h
>index 3d270c6..0f211c4 100644
>--- a/makedumpfile.h
>+++ b/makedumpfile.h
>@@ -1634,10 +1634,27 @@ static inline int
> is_zero_page(unsigned char *buf, long page_size)
> {
> 	size_t i;
>+	unsigned long long *vect = (unsigned long long *) buf;
>+	long page_len = page_size / (sizeof(unsigned long long));
>
>-	for (i = 0; i < page_size; i++)
>-		if (buf[i])
>+	for (i = 0; i < page_len; i+=8) {
>+		if (vect[i])
> 			return FALSE;
>+		if (vect[i+1])
>+			return FALSE;
>+		if (vect[i+2])
>+			return FALSE;
>+		if (vect[i+3])
>+			return FALSE;
>+		if (vect[i+4])
>+			return FALSE;
>+		if (vect[i+5])
>+			return FALSE;
>+		if (vect[i+6])
>+			return FALSE;
>+		if (vect[i+7])
>+			return FALSE;
>+	}
> 	return TRUE;
> }

It looks messy. I don't like this kind of manual loop unrolling, and
according to my small test it affects performance only a little:

(test results for 4k page)
   8 bits x   1 line  x 4096 loops (current code): user 0m7.045s
  64 bits x   1 line  x  512 loops               : user 0m1.847s
  64 bits x   8 lines x   64 loops (your patch)  : user 0m1.546s
  64 bits x 512 lines x    1 loop                : user 0m1.642s

(dump information)
Original pages  : 0x000000000013073b
  Excluded pages   : 0x0000000000108b4e
    Pages filled with zero  : 0x0000000000108b4e
    Cache pages             : 0x0000000000000000
    Cache pages + private   : 0x0000000000000000
    User process data pages : 0x0000000000000000
    Free pages              : 0x0000000000000000
    Hwpoison pages          : 0x0000000000000000
  Remaining pages  : 0x0000000000027bed
  (The number of pages is reduced to 13%.)
Memory Hole     : 0x000000000003f8c5
--------------------------------------------------
Total pages     : 0x0000000000170000
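
The page counts in the dump information are internally consistent, which can be checked with a few lines of arithmetic: excluded plus remaining pages equals the original count, original plus memory hole equals the total, and the remaining pages are about 13% of the original. A small sketch of that check; the struct and function names are made up for illustration and do not come from makedumpfile.

```c
#include <stddef.h>

/* Hypothetical container for the figures in the report above. */
struct dump_stats {
    unsigned long long original;
    unsigned long long excluded;
    unsigned long long remaining;
    unsigned long long hole;
    unsigned long long total;
};

/* Values copied from the dump information in this message. */
static const struct dump_stats stats = {
    0x13073bULL,  /* Original pages  */
    0x108b4eULL,  /* Excluded pages  */
    0x27bedULL,   /* Remaining pages */
    0x3f8c5ULL,   /* Memory Hole     */
    0x170000ULL   /* Total pages     */
};

/* Remaining pages as a percentage of original pages, rounded to nearest. */
static unsigned long long
remaining_percent(const struct dump_stats *s)
{
    return (s->remaining * 100 + s->original / 2) / s->original;
}
```

Since every excluded page in this run is a zero-filled page, the test exercises is_zero_page on all 0x13073b original pages and finds 0x108b4e of them empty.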
So I think the loop unrolling is unnecessary.
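
A comparison like the one above could be reproduced outside makedumpfile with a small harness. The sketch below is not the test that produced the numbers in this message; the names zero_bytewise, zero_wordwise and time_check are hypothetical, and the word-wise variant assumes the buffer is aligned for unsigned long long and that the page size is a multiple of its size.

```c
#include <stddef.h>
#include <time.h>

typedef int (*zero_check_fn)(const unsigned char *buf, size_t size);

/* Byte-at-a-time check, like the code before the patch. */
static int
zero_bytewise(const unsigned char *buf, size_t size)
{
    size_t i;

    for (i = 0; i < size; i++)
        if (buf[i])
            return 0;
    return 1;
}

/* Word-at-a-time check without unrolling (alignment assumed, see above). */
static int
zero_wordwise(const unsigned char *buf, size_t size)
{
    const unsigned long long *vect = (const unsigned long long *)buf;
    size_t i, words = size / sizeof(unsigned long long);

    for (i = 0; i < words; i++)
        if (vect[i])
            return 0;
    return 1;
}

/*
 * Run fn over npages consecutive pages and return the CPU time in seconds.
 * The number of zero pages found is stored in *zero_pages so that the
 * calls have an observable result.
 */
static double
time_check(zero_check_fn fn, const unsigned char *pages, size_t page_size,
           size_t npages, size_t *zero_pages)
{
    clock_t start = clock();
    size_t p, n = 0;

    for (p = 0; p < npages; p++)
        n += (size_t)fn(pages + p * page_size, page_size);
    *zero_pages = n;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_check once per variant over the same large, mostly zero buffer gives user-time figures that can be compared in the same way as the table above. Absolute values depend on the CPU and compiler options.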
Thanks
Atsushi Kumagai
* makedumpfile: optimize is_zero_page
From: Marc Milgram @ 2014-03-05 14:49 UTC
To: kexec; +Cc: Marc Milgram
There are local complaints that filtering out only zero pages is slow. I
found that is_zero_page was inefficient: it checks whether the page contains
any non-zero bytes one byte at a time.

Improve performance by checking for non-zero data 64 bits at a time.

I tested in x86_64 mode on an Intel Xeon X5560 system with 18GB RAM by
running:
    time makedumpfile -d 1 /proc/vmcore <destination>

The time spent in user space was reduced by 64%. The total time to dump
memory was reduced by 27%.

Change Log:
v1 => v2)
  o Eliminate loop unrolling, as it is of minimal benefit, depending on
    the CPU.

is_zero_page
Signed-off-by: Marc Milgram <mmilgram at redhat.com>
---
diff --git a/makedumpfile.h b/makedumpfile.h
index 3d270c6..1751e3a 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -1634,9 +1634,11 @@ static inline int
 is_zero_page(unsigned char *buf, long page_size)
 {
 	size_t i;
+	unsigned long long *vect = (unsigned long long *) buf;
+	long page_len = page_size / sizeof(unsigned long long);
 
-	for (i = 0; i < page_size; i++)
-		if (buf[i])
+	for (i = 0; i < page_len; i++)
+		if (vect[i])
 			return FALSE;
 	return TRUE;
 }
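
The v2 loop casts buf to unsigned long long *, which is fine for page buffers that are suitably aligned and whose size is a multiple of eight bytes. If those guarantees were not available, one possible variant loads each word with memcpy and finishes with a byte loop. This is a sketch under that assumption only, not what the patch does; is_zero_buf is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical alignment-agnostic zero check.
 * Works for any buffer address and any length.
 */
static int
is_zero_buf(const unsigned char *buf, size_t len)
{
    unsigned long long word;
    size_t i = 0;

    /* Copy one word at a time into a properly aligned local variable. */
    for (; i + sizeof(word) <= len; i += sizeof(word)) {
        memcpy(&word, buf + i, sizeof(word));
        if (word)
            return 0;
    }

    /* Check any bytes left over after the last whole word. */
    for (; i < len; i++)
        if (buf[i])
            return 0;

    return 1;
}
```

Optimizing compilers commonly turn a fixed-size memcpy like this into a single load, so the cost relative to the cast is usually small, but that is worth measuring on the target before relying on it.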
* RE: makedumpfile: optimize is_zero_page
From: Atsushi Kumagai @ 2014-03-10 5:25 UTC
To: mmilgram@redhat.com; +Cc: kexec@lists.infradead.org
>There are local complaints that filtering out only zero pages is slow. I
>found that is_zero_page was inefficient. It checks if the page contains any
>non-zero bytes - one byte at a time.
>
>Improve performance by checking for non-zero data 64 bits at a time.
>
>Did testing in x86_64 mode on an Intel Xeon x5560 system with 18GB RAM.
>Executed:
> time makedumpfile -d 1 /proc/vmcore <destination>
>
>The amount of time taken in User space was reduced by 64%. The total time to
>dump memory was reduced by 27%.
>
>Change Log:
>
>v1 => v2)
>
>o Eliminate loop unrolling as it is of minimal benefit based on CPU.
Thank you for fixing this. I'll merge it into v1.5.6.
Atsushi Kumagai
>is_zero_page
>Signed-off-by: Marc Milgram <mmilgram at redhat.com>
>---
>diff --git a/makedumpfile.h b/makedumpfile.h
>index 3d270c6..1751e3a 100644
>--- a/makedumpfile.h
>+++ b/makedumpfile.h
>@@ -1634,9 +1634,11 @@ static inline int
> is_zero_page(unsigned char *buf, long page_size)
> {
> 	size_t i;
>+	unsigned long long *vect = (unsigned long long *) buf;
>+	long page_len = page_size / sizeof(unsigned long long);
>
>-	for (i = 0; i < page_size; i++)
>-		if (buf[i])
>+	for (i = 0; i < page_len; i++)
>+		if (vect[i])
> 			return FALSE;
> 	return TRUE;
> }