Kexec Archive on lore.kernel.org
* makedumpfile: optimize is_zero_page
@ 2014-03-03 19:44 Marc Milgram
  2014-03-05  4:55 ` Atsushi Kumagai
  0 siblings, 1 reply; 4+ messages in thread
From: Marc Milgram @ 2014-03-03 19:44 UTC (permalink / raw)
  To: kexec; +Cc: Marc Milgram

There are local complaints that filtering out only zero pages is slow.  I
found that is_zero_page was inefficient: it checks whether the page contains
any non-zero bytes one byte at a time.

Improve performance by checking for non-zero data 64 bits at a time.  Also,
unroll the loop for additional performance.

Did testing in x86_64 mode on an Intel Xeon x5560 system with 18GB RAM.
Executed:
  time makedumpfile -d 1 /proc/vmcore <destination>

The amount of time taken in User space was reduced by 75%.  The total time to
dump memory was reduced by 28%.
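
As a minimal standalone sketch of the change described above, the two
approaches can be compared side by side.  The names is_zero_page_bytes and
is_zero_page_words are hypothetical and chosen for illustration only.  The
sketch assumes that the buffer is suitably aligned for unsigned long long and
that page_size is a multiple of its size, which should hold for typical page
sizes but is not checked by the patch itself.

```c
#include <assert.h>
#include <stddef.h>

#define TRUE  1
#define FALSE 0

/* Baseline: one comparison per byte, as in the code being replaced. */
static int is_zero_page_bytes(const unsigned char *buf, size_t page_size)
{
	size_t i;

	for (i = 0; i < page_size; i++)
		if (buf[i])
			return FALSE;
	return TRUE;
}

/*
 * Word-wise: one comparison per unsigned long long (64 bits on x86_64).
 * Assumptions (not verified by the original patch): buf is aligned for
 * unsigned long long, and page_size is a multiple of its size.
 */
static int is_zero_page_words(const unsigned char *buf, size_t page_size)
{
	const unsigned long long *vect = (const unsigned long long *)buf;
	size_t page_len = page_size / sizeof(unsigned long long);
	size_t i;

	/* Make the size assumption explicit instead of skipping tail bytes. */
	assert(page_size % sizeof(unsigned long long) == 0);

	for (i = 0; i < page_len; i++)
		if (vect[i])
			return FALSE;
	return TRUE;
}
```

Both functions return the same answer for any buffer that meets the
assumptions; the word-wise version simply performs fewer loop iterations per
page.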

is_zero_page 
Signed-off-by: Marc Milgram <mmilgram at redhat.com>
---
diff --git a/makedumpfile.h b/makedumpfile.h
index 3d270c6..0f211c4 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -1634,10 +1634,27 @@ static inline int
 is_zero_page(unsigned char *buf, long page_size)
 {
 	size_t i;
+	unsigned long long *vect = (unsigned long long *) buf;
+	long page_len = page_size / (sizeof(unsigned long long));
 
-	for (i = 0; i < page_size; i++)
-		if (buf[i])
+	for (i = 0; i < page_len; i+=8) {
+		if (vect[i])
 			return FALSE;
+		if (vect[i+1])
+			return FALSE;
+		if (vect[i+2])
+			return FALSE;
+		if (vect[i+3])
+			return FALSE;
+		if (vect[i+4])
+			return FALSE;
+		if (vect[i+5])
+			return FALSE;
+		if (vect[i+6])
+			return FALSE;
+		if (vect[i+7])
+			return FALSE;
+	}
 	return TRUE;
 }
 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* RE: makedumpfile: optimize is_zero_page
  2014-03-03 19:44 Marc Milgram
@ 2014-03-05  4:55 ` Atsushi Kumagai
  0 siblings, 0 replies; 4+ messages in thread
From: Atsushi Kumagai @ 2014-03-05  4:55 UTC (permalink / raw)
  To: mmilgram@redhat.com; +Cc: kexec@lists.infradead.org

Hello Marc,

>There are local complaints that filtering out only zero pages is slow.  I
>found that is_zero_page was inefficient: it checks whether the page contains
>any non-zero bytes one byte at a time.
>
>Improve performance by checking for non-zero data 64 bits at a time.  Also,
>unroll the loop for additional performance.
>
>Did testing in x86_64 mode on an Intel Xeon x5560 system with 18GB RAM.
>Executed:
>  time makedumpfile -d 1 /proc/vmcore <destination>
>
>The amount of time taken in User space was reduced by 75%.  The total time to
>dump memory was reduced by 28%.

Thanks for your good work, but...

>is_zero_page
>Signed-off-by: Marc Milgram <mmilgram at redhat.com>
>---
>diff --git a/makedumpfile.h b/makedumpfile.h
>index 3d270c6..0f211c4 100644
>--- a/makedumpfile.h
>+++ b/makedumpfile.h
>@@ -1634,10 +1634,27 @@ static inline int
> is_zero_page(unsigned char *buf, long page_size)
> {
> 	size_t i;
>+	unsigned long long *vect = (unsigned long long *) buf;
>+	long page_len = page_size / (sizeof(unsigned long long));
>
>-	for (i = 0; i < page_size; i++)
>-		if (buf[i])
>+	for (i = 0; i < page_len; i+=8) {
>+		if (vect[i])
> 			return FALSE;
>+		if (vect[i+1])
>+			return FALSE;
>+		if (vect[i+2])
>+			return FALSE;
>+		if (vect[i+3])
>+			return FALSE;
>+		if (vect[i+4])
>+			return FALSE;
>+		if (vect[i+5])
>+			return FALSE;
>+		if (vect[i+6])
>+			return FALSE;
>+		if (vect[i+7])
>+			return FALSE;
>+	}
> 	return TRUE;
> }

It looks messy. I don't like this kind of manual loop unrolling, and
according to my small test it makes only a small difference to performance:

(test results for 4k page)
   8 bits x   1 line  x 4096 loops (current code):      user  0m7.045s
  64 bits x   1 line  x  512 loops               :      user  0m1.847s
  64 bits x   8 lines x   64 loops (your patch)  :      user  0m1.546s
  64 bits x 512 lines x    1 loop                :      user  0m1.642s

(dump information)
    Original pages  : 0x000000000013073b
      Excluded pages   : 0x0000000000108b4e
        Pages filled with zero  : 0x0000000000108b4e
        Cache pages             : 0x0000000000000000
        Cache pages + private   : 0x0000000000000000
        User process data pages : 0x0000000000000000
        Free pages              : 0x0000000000000000
        Hwpoison pages          : 0x0000000000000000
      Remaining pages  : 0x0000000000027bed
      (The number of pages is reduced to 13%.)
    Memory Hole     : 0x000000000003f8c5
    --------------------------------------------------
    Total pages     : 0x0000000000170000

So I think the loop unrolling is unnecessary.
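
A comparison of this kind could be set up roughly as in the sketch below.
This is not the harness used for the numbers above; check_loop,
check_unrolled8 and time_variant are hypothetical names, and the timing helper
relies only on the standard clock() function.  An optimizing compiler may
unroll or hoist the simple loop on its own, so any measured difference depends
on the compiler, its flags, and the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef int (*zero_check_fn)(const unsigned long long *vect, size_t nwords);

/* One 64-bit comparison per iteration. */
static int check_loop(const unsigned long long *vect, size_t nwords)
{
	size_t i;

	for (i = 0; i < nwords; i++)
		if (vect[i])
			return 0;
	return 1;
}

/* Eight 64-bit comparisons per iteration (manual unrolling). */
static int check_unrolled8(const unsigned long long *vect, size_t nwords)
{
	size_t i;

	for (i = 0; i + 8 <= nwords; i += 8) {
		if (vect[i] || vect[i + 1] || vect[i + 2] || vect[i + 3] ||
		    vect[i + 4] || vect[i + 5] || vect[i + 6] || vect[i + 7])
			return 0;
	}
	/* Handle any words left over when nwords is not a multiple of 8. */
	for (; i < nwords; i++)
		if (vect[i])
			return 0;
	return 1;
}

/*
 * Run fn over the same page `rounds` times and return the elapsed CPU
 * time in seconds.  The number of pages reported as zero is stored in
 * *zero_pages so that the calls have an observable result.
 */
static double time_variant(zero_check_fn fn, const unsigned long long *page,
			   size_t nwords, unsigned long rounds,
			   unsigned long *zero_pages)
{
	clock_t start;
	unsigned long r, hits = 0;

	assert(fn != NULL && page != NULL && zero_pages != NULL);

	start = clock();
	for (r = 0; r < rounds; r++)
		hits += (unsigned long)fn(page, nwords);
	*zero_pages = hits;
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_variant once per variant on an all-zero 4k page (512 words) gives
the worst case for each check, since no early return is taken.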


Thanks
Atsushi Kumagai


* makedumpfile: optimize is_zero_page
@ 2014-03-05 14:49 Marc Milgram
  2014-03-10  5:25 ` Atsushi Kumagai
  0 siblings, 1 reply; 4+ messages in thread
From: Marc Milgram @ 2014-03-05 14:49 UTC (permalink / raw)
  To: kexec; +Cc: Marc Milgram

There are local complaints that filtering out only zero pages is slow.  I
found that is_zero_page was inefficient: it checks whether the page contains
any non-zero bytes one byte at a time.

Improve performance by checking for non-zero data 64 bits at a time.

Did testing in x86_64 mode on an Intel Xeon x5560 system with 18GB RAM.
Executed:
  time makedumpfile -d 1 /proc/vmcore <destination>

The amount of time taken in User space was reduced by 64%.  The total time to
dump memory was reduced by 27%.

Change Log:

v1 => v2)

o Eliminate loop unrolling as it is of minimal benefit based on CPU.

is_zero_page 
Signed-off-by: Marc Milgram <mmilgram at redhat.com>
---
diff --git a/makedumpfile.h b/makedumpfile.h
index 3d270c6..1751e3a 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -1634,9 +1634,11 @@ static inline int
 is_zero_page(unsigned char *buf, long page_size)
 {
 	size_t i;
+	unsigned long long *vect = (unsigned long long *) buf;
+	long page_len = page_size / sizeof(unsigned long long);
 
-	for (i = 0; i < page_size; i++)
-		if (buf[i])
+	for (i = 0; i < page_len; i++)
+		if (vect[i])
 			return FALSE;
 	return TRUE;
 }
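
The patch above relies on the buffer being aligned for unsigned long long and
on page_size being a multiple of 8 bytes.  Where those assumptions cannot be
guaranteed, a more defensive variant might look like the sketch below.  The
name is_zero_buf is hypothetical and this is not part of the submitted patch;
it loads each word with memcpy, which avoids unaligned access and aliasing
concerns, and checks any trailing bytes individually.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if all len bytes at buf are zero, 0 otherwise.
 * Works for any alignment and any length, including lengths that are
 * not a multiple of sizeof(unsigned long long).
 */
static int is_zero_buf(const unsigned char *buf, size_t len)
{
	unsigned long long word;
	size_t i = 0;

	assert(buf != NULL || len == 0);

	/* Whole words: memcpy lets the compiler pick a safe load. */
	for (; i + sizeof(word) <= len; i += sizeof(word)) {
		memcpy(&word, buf + i, sizeof(word));
		if (word)
			return 0;
	}

	/* Remaining tail bytes, if any. */
	for (; i < len; i++)
		if (buf[i])
			return 0;
	return 1;
}
```

Compilers commonly turn a fixed-size memcpy like this into a single load, so
the defensive form need not be slower, although that would have to be
measured on the target system.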


* RE: makedumpfile: optimize is_zero_page
  2014-03-05 14:49 makedumpfile: optimize is_zero_page Marc Milgram
@ 2014-03-10  5:25 ` Atsushi Kumagai
  0 siblings, 0 replies; 4+ messages in thread
From: Atsushi Kumagai @ 2014-03-10  5:25 UTC (permalink / raw)
  To: mmilgram@redhat.com; +Cc: kexec@lists.infradead.org

>There are local complaints that filtering out only zero pages is slow.  I
>found that is_zero_page was inefficient: it checks whether the page contains
>any non-zero bytes one byte at a time.
>
>Improve performance by checking for non-zero data 64 bits at a time.
>
>Did testing in x86_64 mode on an Intel Xeon x5560 system with 18GB RAM.
>Executed:
>  time makedumpfile -d 1 /proc/vmcore <destination>
>
>The amount of time taken in User space was reduced by 64%.  The total time to
>dump memory was reduced by 27%.
>
>Change Log:
>
>v1 => v2)
>
>o Eliminate loop unrolling as it is of minimal benefit based on CPU.

Thank you for fixing this. I'll merge it into v1.5.6.

Atsushi Kumagai

>is_zero_page
>Signed-off-by: Marc Milgram <mmilgram at redhat.com>
>---
>diff --git a/makedumpfile.h b/makedumpfile.h
>index 3d270c6..1751e3a 100644
>--- a/makedumpfile.h
>+++ b/makedumpfile.h
>@@ -1634,9 +1634,11 @@ static inline int
> is_zero_page(unsigned char *buf, long page_size)
> {
> 	size_t i;
>+	unsigned long long *vect = (unsigned long long *) buf;
>+	long page_len = page_size / sizeof(unsigned long long);
>
>-	for (i = 0; i < page_size; i++)
>-		if (buf[i])
>+	for (i = 0; i < page_len; i++)
>+		if (vect[i])
> 			return FALSE;
> 	return TRUE;
> }
>
