* [PATCH V8 0/2] Currently used jhash is slow; replacing it lets us speed up KSM

From: Timofey Titovets @ 2018-09-13 21:41 UTC
To: linux-mm; +Cc: rppt, Timofey Titovets, Andrea Arcangeli, kvm, leesioh

Hash speed (in kernel):
ksm: crc32c hash() 12081 MB/s
ksm: xxh64  hash()  8770 MB/s
ksm: xxh32  hash()  4529 MB/s
ksm: jhash2 hash()  1569 MB/s

Tests by Sioh Lee (copied from another mail):

Test platform: OpenStack cloud platform (Newton release)
Experiment node: OpenStack-based cloud compute node
  (CPU: Xeon E5-2620 v3, memory: 64 GB)
VMs: (2 vCPU, 4 GB RAM, 20 GB disk) * 4
Linux kernel: 4.14 (latest version at the time)
KSM setup - sleep_millisecs: 200 ms, pages_to_scan: 200

Experiment process:
First, we turn off KSM and launch 4 VMs.
Then we turn on KSM and measure the checksum computation time until
full_scans becomes two.

Experimental results (each value is the average of the measured values):

crc32c_intel:                      1084.10 ns
crc32c (no hardware acceleration): 7012.51 ns
xxhash32:                          2227.75 ns
xxhash64:                          1413.16 ns
jhash2:                            5128.30 ns

In summary, crc32c_intel has an advantage over all of the hash functions
used in the experiment (computation time decreased by 84.54% compared to
plain crc32c, 78.86% compared to jhash2, 51.33% compared to xxhash32,
and 23.28% compared to xxhash64). The results are similar to Timofey's.

Still, use only xxhash for now, because using crc32c requires the crypto
API to be initialized first - and that needs a somewhat tricky solution
to work well in all situations.

So:
- The first patch implements compile-time selection of the fastest
  xxhash implementation for the target platform.
- The second patch replaces jhash2 with xxhash.

Thanks.
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
CC: leesioh <solee@os.korea.ac.kr>

Timofey Titovets (2):
  xxHash: create arch dependent 32/64-bit xxhash()
  ksm: replace jhash2 with xxhash

 include/linux/xxhash.h | 23 +++++++++++++++++++++++
 mm/Kconfig             |  1 +
 mm/ksm.c               |  4 ++--
 3 files changed, 26 insertions(+), 2 deletions(-)

--
2.19.0
* [PATCH V8 1/2] xxHash: create arch dependent 32/64-bit xxhash()

From: Timofey Titovets @ 2018-09-13 21:41 UTC
To: linux-mm; +Cc: rppt, Timofey Titovets, Andrea Arcangeli, kvm, leesioh

From: Timofey Titovets <nefelim4ag@gmail.com>

xxh32() - fast on both 32/64-bit platforms
xxh64() - fast only on 64-bit platforms

Create xxhash(), which picks the fastest version at compile time.

As the result depends on the CPU word size, the main purpose of this
function is in-memory hashing.

Changes:
v2:
  - Create this patch
v3 -> v8:
  - Nothing, whole patchset version bump

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
CC: leesioh <solee@os.korea.ac.kr>
---
 include/linux/xxhash.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/linux/xxhash.h b/include/linux/xxhash.h
index 9e1f42cb57e9..52b073fea17f 100644
--- a/include/linux/xxhash.h
+++ b/include/linux/xxhash.h
@@ -107,6 +107,29 @@ uint32_t xxh32(const void *input, size_t length, uint32_t seed);
  */
 uint64_t xxh64(const void *input, size_t length, uint64_t seed);
 
+/**
+ * xxhash() - calculate wordsize hash of the input with a given seed
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ * @seed:   The seed can be used to alter the result predictably.
+ *
+ * If the hash does not need to be comparable between machines with
+ * different word sizes, this function will call whichever of xxh32()
+ * or xxh64() is faster.
+ *
+ * Return: wordsize hash of the data.
+ */
+
+static inline unsigned long xxhash(const void *input, size_t length,
+				   uint64_t seed)
+{
+#if BITS_PER_LONG == 64
+	return xxh64(input, length, seed);
+#else
+	return xxh32(input, length, seed);
+#endif
+}
+
 /*-****************************
  * Streaming Hash Functions
  *****************************/
--
2.19.0
* Re: [PATCH V8 1/2] xxHash: create arch dependent 32/64-bit xxhash()

From: Mike Rapoport @ 2018-09-14 8:41 UTC
To: Timofey Titovets
Cc: linux-mm, Timofey Titovets, Andrea Arcangeli, kvm, leesioh

On Fri, Sep 14, 2018 at 12:41:01AM +0300, Timofey Titovets wrote:
> From: Timofey Titovets <nefelim4ag@gmail.com>
>
> xxh32() - fast on both 32/64-bit platforms
> xxh64() - fast only on 64-bit platforms
>
> Create xxhash(), which picks the fastest version at compile time.
>
> As the result depends on the CPU word size, the main purpose of this
> function is in-memory hashing.
>
> Changes:
> v2:
>   - Create this patch
> v3 -> v8:
>   - Nothing, whole patchset version bump
>
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>

Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>

--
Sincerely yours,
Mike.
* [PATCH V8 2/2] ksm: replace jhash2 with xxhash

From: Timofey Titovets @ 2018-09-13 21:41 UTC
To: linux-mm; +Cc: rppt, Timofey Titovets, leesioh, Andrea Arcangeli, kvm

From: Timofey Titovets <nefelim4ag@gmail.com>

Replace jhash2 with xxhash.

Perf numbers:
Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
ksm: crc32c hash() 12081 MB/s
ksm: xxh64  hash()  8770 MB/s
ksm: xxh32  hash()  4529 MB/s
ksm: jhash2 hash()  1569 MB/s

From Sioh Lee:
crc32c_intel:                      1084.10 ns
crc32c (no hardware acceleration): 7012.51 ns
xxhash32:                          2227.75 ns
xxhash64:                          1413.16 ns
jhash2:                            5128.30 ns

As jhash2 will always be slower (for data sizes like PAGE_SIZE),
don't use it in ksm at all.

Use only xxhash for now, because using crc32c requires the crypto API
to be initialized first - and that needs a somewhat tricky solution to
work well in all situations.

Thanks.
Changes:
v1 -> v2:
  - Move xxhash() to xxhash.h/c and separate patches
v2 -> v3:
  - Move xxhash() xxhash.c -> xxhash.h
  - Replace xxhash_t with 'unsigned long'
  - Update the kerneldoc above xxhash()
v3 -> v4:
  - Merge the xxhash/crc32 patches
  - Replace crc32 with crc32c (crc32 has the same speed as jhash2)
  - Add an automatic speed test and automatic choice of the fastest
    hash function
v4 -> v5:
  - Pick up the missed xxhash patch
  - Update the code with the compile-time chosen xxhash
  - Add more macros to make the code more readable
  - As it is now only possible to use xxhash or crc32c, skip the speed
    test on crc32c allocation error and fall back to xxhash
  - To work around the too-early-init problem (crc32c not available),
    move the zero_checksum init to the first call of fastcall()
  - Don't allocate a page for hash testing; use the arch zero pages
    for that
v5 -> v6:
  - Use libcrc32c instead of the CRYPTO API, mainly to simplify
    code/Kconfig dependencies
  - Add crc32c_available():
    libcrc32c will BUG_ON on crc32c problems, so test crc32c
    availability with crc32c_available()
  - Simplify choice_fastest_hash()
  - Simplify fasthash()
  - struct rmap_item and struct stable_node have sizeof == 64 on
    x86_64, which makes them cache friendly. As we don't suffer from
    hash collisions, change the hash type from unsigned long back
    to u32.
  - Fix a kbuild robot warning; make all local functions static
v6 -> v7:
  - Drop crc32c for now and use only xxhash in ksm.
v7 -> v8:
  - Remove empty-line changes

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: leesioh <solee@os.korea.ac.kr>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
---
 mm/Kconfig | 1 +
 mm/ksm.c   | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index a550635ea5c3..b5f923081bce 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -297,6 +297,7 @@ config MMU_NOTIFIER
 config KSM
 	bool "Enable KSM for page merging"
 	depends on MMU
+	select XXHASH
 	help
 	  Enable Kernel Samepage Merging: KSM periodically scans those areas
 	  of an application's address space that an app has advised may be
diff --git a/mm/ksm.c b/mm/ksm.c
index 5b0894b45ee5..1a088306ef81 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -25,7 +25,7 @@
 #include <linux/pagemap.h>
 #include <linux/rmap.h>
 #include <linux/spinlock.h>
-#include <linux/jhash.h>
+#include <linux/xxhash.h>
 #include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/wait.h>
@@ -1009,7 +1009,7 @@ static u32 calc_checksum(struct page *page)
 {
 	u32 checksum;
 	void *addr = kmap_atomic(page);
-	checksum = jhash2(addr, PAGE_SIZE / 4, 17);
+	checksum = xxhash(addr, PAGE_SIZE, 0);
 	kunmap_atomic(addr);
 	return checksum;
 }
--
2.19.0
* Re: [PATCH V8 2/2] ksm: replace jhash2 with xxhash

From: Mike Rapoport @ 2018-09-14 8:42 UTC
To: Timofey Titovets
Cc: linux-mm, Timofey Titovets, leesioh, Andrea Arcangeli, kvm

On Fri, Sep 14, 2018 at 12:41:02AM +0300, Timofey Titovets wrote:
> From: Timofey Titovets <nefelim4ag@gmail.com>
>
> Replace jhash2 with xxhash.
>
> Perf numbers:
> Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
> ksm: crc32c hash() 12081 MB/s
> ksm: xxh64  hash()  8770 MB/s
> ksm: xxh32  hash()  4529 MB/s
> ksm: jhash2 hash()  1569 MB/s
>
> From Sioh Lee:
> crc32c_intel:                      1084.10 ns
> crc32c (no hardware acceleration): 7012.51 ns
> xxhash32:                          2227.75 ns
> xxhash64:                          1413.16 ns
> jhash2:                            5128.30 ns
>
> As jhash2 will always be slower (for data sizes like PAGE_SIZE),
> don't use it in ksm at all.
>
> Use only xxhash for now, because using crc32c requires the crypto API
> to be initialized first - and that needs a somewhat tricky solution to
> work well in all situations.
>
> Thanks.
>
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> Signed-off-by: leesioh <solee@os.korea.ac.kr>
> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>

Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>

--
Sincerely yours,
Mike.