Flexible I/O Tester development
* [RFC PATCH 0/3] memtests for ioengines using mmap
@ 2018-01-18 23:53 Robert Elliott
  2018-01-18 23:53 ` [PATCH 1/3] memcpytest: Add more sizes Robert Elliott
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Robert Elliott @ 2018-01-18 23:53 UTC (permalink / raw)
  To: fio; +Cc: Robert Elliott

Add memtest workloads for ioengines using mmap, running entirely within
the memory-mapped region (not to/from a separate transfer buffer in
regular memory).  Useful for persistent memory testing.

Tests include:
  memcpy = copy with libc memcpy() (d = s)(one read, one write)
  memscan = read memory to registers (one read)
  memset = write memory from registers with libc memset() (one write)
  wmemset = write memory from registers with libc wmemset() (one write)
  streamcopy = STREAM copy (d = s)(one read, one write)
  streamadd = STREAM add (d = s1 + s2)(two reads, add, one write)
  streamscale = STREAM scale (d = 3 * s1)(one read, multiply, one write)
  streamtriad = STREAM triad (d = s1 + 3 * s2)(two reads, add and
                multiply, one write)

Open issues:
* make memscan architecture-independent (or make the test unavailable
  on non-x86 architectures). The initial generic memcsum attempt still
  results in the compiler generating memory writes.
* ensure fio is not allocating/filling unused xfer_buf
* ensure Read and Write statistics make sense for each memtest (e.g.
  streamadd should count 2x reads and 1x writes)
* make use_glibc_nt functional (needs to be earlier, may not even
  be possible)
* combine map_populate, glibc_nt, etc. to avoid creating too many
  top-level fio options
* add to dev-dax and libpmem ioengines

Robert Elliott (3):
  memcpytest: Add more sizes
  memcpytest: add more memcpy tests
  ioengines: add memtest workloads for ioengines using mmap

 HOWTO             |  37 +++++
 debug.h           |   1 +
 engines/dev-dax.c |  12 +-
 engines/libpmem.c |  18 +--
 engines/mmap.c    | 142 +++++++++++++++++--
 fio.1             |  37 +++++
 init.c            |   4 +
 io_ddir.h         |  27 +++-
 io_u.c            |   3 +-
 io_u.h            |   9 +-
 lib/memcpy.c      | 411 ++++++++++++++++++++++++++++++++++++++++++++++++------
 lib/memcpy.h      |   4 +
 options.c         |  91 ++++++++++++
 thread_options.h  |   7 +
 14 files changed, 733 insertions(+), 70 deletions(-)

-- 
2.14.3


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/3] memcpytest: Add more sizes
  2018-01-18 23:53 [RFC PATCH 0/3] memtests for ioengines using mmap Robert Elliott
@ 2018-01-18 23:53 ` Robert Elliott
  2018-01-18 23:53 ` [PATCH 2/3] memcpytest: add more memcpy tests Robert Elliott
  2018-01-18 23:53 ` [PATCH 3/3] ioengines: add memtest workloads for ioengines using mmap Robert Elliott
  2 siblings, 0 replies; 5+ messages in thread
From: Robert Elliott @ 2018-01-18 23:53 UTC (permalink / raw)
  To: fio; +Cc: Robert Elliott

From: Robert Elliott <elliott@hpe.com>

Run memcpy tests over much larger sizes (L3 cache size and larger),
and reduce the number of iterations.
---
 lib/memcpy.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 78 insertions(+), 10 deletions(-)

diff --git a/lib/memcpy.c b/lib/memcpy.c
index 00e65aa7..a79d7c50 100644
--- a/lib/memcpy.c
+++ b/lib/memcpy.c
@@ -8,9 +8,17 @@
 #include "../gettime.h"
 #include "../fio.h"
 
-#define BUF_SIZE	32 * 1024 * 1024ULL
+/* largest last-level CPU cache size of an x86 in 2018 in bytes */
+#define LLC_SIZE	(45 * 1024 * 1024ULL)
 
-#define NR_ITERS	64
+#define BUF_SIZE	(LLC_SIZE * 4ULL)
+
+/* alignment in bytes for the buffers.  Ensure that functions like
+ * libc memcpy can use their most optimized paths (512 B for x86_64 AVX2).
+ */
+#define BUF_ALIGN	512
+
+#define NR_ITERS	8
 
 struct memcpy_test {
 	const char *name;
@@ -21,15 +29,27 @@ struct memcpy_test {
 
 static struct memcpy_test tests[] = {
 	{
-		.name		= "8 bytes",
+		.name		= "  4 bytes",
+		.size		= 4,
+	},
+	{
+		.name		= "  8 bytes",
 		.size		= 8,
 	},
 	{
-		.name		= "16 bytes",
+		.name		= " 16 bytes",
 		.size		= 16,
 	},
 	{
-		.name		= "96 bytes",
+		.name		= " 32 bytes",
+		.size		= 32,
+	},
+	{
+		.name		= " 64 bytes",
+		.size		= 64,
+	},
+	{
+		.name		= " 96 bytes",
 		.size		= 96,
 	},
 	{
@@ -45,25 +65,73 @@ static struct memcpy_test tests[] = {
 		.size		= 512,
 	},
 	{
-		.name		= "2048 bytes",
+		.name		= "  2 KiB",
 		.size		= 2048,
 	},
 	{
-		.name		= "8192 bytes",
+		.name		= "  4 KiB",
+		.size		= 4096,
+	},
+	{
+		.name		= "  8 KiB",
 		.size		= 8192,
 	},
 	{
-		.name		= "131072 bytes",
+		.name		= "128 KiB",
 		.size		= 131072,
 	},
 	{
-		.name		= "262144 bytes",
+		.name		= "256 KiB",
 		.size		= 262144,
 	},
 	{
-		.name		= "524288 bytes",
+		.name		= "512 KiB",
 		.size		= 524288,
 	},
+	{
+		.name		= "  8 MiB",
+		.size		= 8 * 1024 * 1024,
+	},
+	{
+		.name		= "6x 1.375 MiB",
+		.size		= 8650752,
+	},
+	{
+		.name		= "  9 MiB",
+		.size		= 9 * 1024 * 1024,
+	},
+	{
+		.name		= " 16 MiB",
+		.size		= 16 * 1024 * 1024,	/* 3/4 L3 size is 16.5 */
+	},
+	{
+		.name		= " 17 MiB",
+		.size		= 17 * 1024 * 1024,	/* 3/4 L3 size is 16.5 */
+	},
+	{
+		.name		= " 22 MiB",
+		.size		= 22 * 1024 * 1024,	/* L3 size */
+	},
+	{
+		.name		= " 32 MiB",
+		.size		= 32 * 1024 * 1024,	/* >L3 size */
+	},
+	{
+		.name		= " 40 MiB",
+		.size		= 40 * 1024 * 1024,
+	},
+	{
+		.name		= " 48 MiB",
+		.size		= 48 * 1024 * 1024,	/* larger than most L3 */
+	},
+	{
+		.name		= "128 MiB",
+		.size		= 128 * 1024 * 1024,	/* much larger than L3 */
+	},
+	{
+		.name		= "full buffer",
+		.size		= BUF_SIZE,
+	},
 	{
 		.name		= NULL,
 	},
-- 
2.14.3



* [PATCH 2/3] memcpytest: add more memcpy tests
  2018-01-18 23:53 [RFC PATCH 0/3] memtests for ioengines using mmap Robert Elliott
  2018-01-18 23:53 ` [PATCH 1/3] memcpytest: Add more sizes Robert Elliott
@ 2018-01-18 23:53 ` Robert Elliott
  2018-01-25 21:22   ` Jens Axboe
  2018-01-18 23:53 ` [PATCH 3/3] ioengines: add memtest workloads for ioengines using mmap Robert Elliott
  2 siblings, 1 reply; 5+ messages in thread
From: Robert Elliott @ 2018-01-18 23:53 UTC (permalink / raw)
  To: fio; +Cc: Robert Elliott

From: Robert Elliott <elliott@hpe.com>

Add more memcpy tests:
    memcpy = copy with libc memcpy() (d = s)(one read, one write)
    memcsum = read memory to registers (one read)
    memset = write memory from registers with libc memset() (one write)
    wmemset = write memory from registers with libc wmemset() (one write)
    streamcopy = STREAM copy (d = s)(one read, one write)
    streamadd = STREAM add (d = s1 + s2)(two reads, add, one write)
    streamscale = STREAM scale (d = 3 * s1)(one read, multiply, one write)
    streamtriad = STREAM triad (d = s1 + 3 * s2)(two reads, add and multiply, one write)
---
 engines/dev-dax.c |  12 +-
 engines/libpmem.c |  18 +--
 engines/mmap.c    |  13 ++-
 lib/memcpy.c      | 323 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 lib/memcpy.h      |   4 +
 5 files changed, 320 insertions(+), 50 deletions(-)

diff --git a/engines/dev-dax.c b/engines/dev-dax.c
index caae1e09..fc169450 100644
--- a/engines/dev-dax.c
+++ b/engines/dev-dax.c
@@ -73,19 +73,19 @@ static int fio_devdax_file(struct thread_data *td, struct fio_file *f,
 			   size_t length, off_t off)
 {
 	struct fio_devdax_data *fdd = FILE_ENG_DATA(f);
-	int flags = 0;
+	int prot = 0;
 
 	if (td_rw(td))
-		flags = PROT_READ | PROT_WRITE;
+		prot = PROT_READ | PROT_WRITE;
 	else if (td_write(td)) {
-		flags = PROT_WRITE;
+		prot = PROT_WRITE;
 
 		if (td->o.verify != VERIFY_NONE)
-			flags |= PROT_READ;
+			prot |= PROT_READ;
 	} else
-		flags = PROT_READ;
+		prot = PROT_READ;
 
-	fdd->devdax_ptr = mmap(NULL, length, flags, MAP_SHARED, f->fd, off);
+	fdd->devdax_ptr = mmap(NULL, length, prot, MAP_SHARED, f->fd, off);
 	if (fdd->devdax_ptr == MAP_FAILED) {
 		fdd->devdax_ptr = NULL;
 		td_verror(td, errno, "mmap");
diff --git a/engines/libpmem.c b/engines/libpmem.c
index aa0a36f9..a6fdf964 100644
--- a/engines/libpmem.c
+++ b/engines/libpmem.c
@@ -318,31 +318,31 @@ static int fio_libpmem_file(struct thread_data *td, struct fio_file *f,
 			    size_t length, off_t off)
 {
 	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
-	int flags = 0;
+	int prot = 0;
 	void *addr = NULL;
 
 	dprint(FD_IO, "DEBUG fio_libpmem_file\n");
 
 	if (td_rw(td))
-		flags = PROT_READ | PROT_WRITE;
+		prot = PROT_READ | PROT_WRITE;
 	else if (td_write(td)) {
-		flags = PROT_WRITE;
+		prot = PROT_WRITE;
 
 		if (td->o.verify != VERIFY_NONE)
-			flags |= PROT_READ;
+			prot |= PROT_READ;
 	} else
-		flags = PROT_READ;
+		prot = PROT_READ;
 
 	dprint(FD_IO, "f->file_name = %s  td->o.verify = %d \n", f->file_name,
 			td->o.verify);
-	dprint(FD_IO, "length = %ld  flags = %d  f->fd = %d off = %ld \n",
-			length, flags, f->fd,off);
+	dprint(FD_IO, "length = %ld  prot = %d  f->fd = %d off = %ld \n",
+			length, prot, f->fd,off);
 
 	addr = util_map_hint(length, 0);
 
 	dprint(FD_IO, "DEBUG mmap addr=%p length=0x%lx prot=0x%x\n",
-	       addr, length, flags);
-	fdd->libpmem_ptr = mmap(addr, length, flags, MAP_SHARED, f->fd, off);
+	       addr, length, prot);
+	fdd->libpmem_ptr = mmap(addr, length, prot, MAP_SHARED, f->fd, off);
 	if (fdd->libpmem_ptr == MAP_FAILED) {
 		fdd->libpmem_ptr = NULL;
 		td_verror(td, errno, "mmap");
diff --git a/engines/mmap.c b/engines/mmap.c
index 77556588..54b5b11d 100644
--- a/engines/mmap.c
+++ b/engines/mmap.c
@@ -31,19 +31,20 @@ static int fio_mmap_file(struct thread_data *td, struct fio_file *f,
 			 size_t length, off_t off)
 {
 	struct fio_mmap_data *fmd = FILE_ENG_DATA(f);
-	int flags = 0;
+	int prot = 0;
+	int flags = MAP_SHARED;
 
 	if (td_rw(td) && !td->o.verify_only)
-		flags = PROT_READ | PROT_WRITE;
+		prot = PROT_READ | PROT_WRITE;
 	else if (td_write(td) && !td->o.verify_only) {
-		flags = PROT_WRITE;
+		prot = PROT_WRITE;
 
 		if (td->o.verify != VERIFY_NONE)
-			flags |= PROT_READ;
+			prot |= PROT_READ;
 	} else
-		flags = PROT_READ;
+		prot = PROT_READ;
 
-	fmd->mmap_ptr = mmap(NULL, length, flags, MAP_SHARED, f->fd, off);
+	fmd->mmap_ptr = mmap(NULL, length, prot, flags, f->fd, off);
 	if (fmd->mmap_ptr == MAP_FAILED) {
 		fmd->mmap_ptr = NULL;
 		td_verror(td, errno, "mmap");
diff --git a/lib/memcpy.c b/lib/memcpy.c
index a79d7c50..e52a08fd 100644
--- a/lib/memcpy.c
+++ b/lib/memcpy.c
@@ -1,7 +1,10 @@
+#include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
+#include <wchar.h>
 
+#include "memalign.h"
 #include "memcpy.h"
 #include "rand.h"
 #include "../fio_time.h"
@@ -23,6 +26,7 @@
 struct memcpy_test {
 	const char *name;
 	void *src;
+	void *src2;
 	void *dst;
 	size_t size;
 };
@@ -140,14 +144,22 @@ static struct memcpy_test tests[] = {
 struct memcpy_type {
 	const char *name;
 	unsigned int mask;
-	void (*fn)(struct memcpy_test *);
+	void (*fn)(struct memcpy_type *, struct memcpy_test *);
 };
 
 enum {
 	T_MEMCPY	= 1U << 0,
 	T_MEMMOVE	= 1U << 1,
-	T_SIMPLE	= 1U << 2,
+	T_SIMPLE_MEMCPY	= 1U << 2,
 	T_HYBRID	= 1U << 3,
+	T_MEMSET	= 1U << 4,
+	T_WMEMSET	= 1U << 5,
+	T_SIMPLE_MEMSET	= 1U << 6,
+	T_MEMCSUM	= 1U << 7,
+	T_STREAMCOPY	= 1U << 8,
+	T_STREAMSCALE	= 1U << 9,
+	T_STREAMADD	= 1U << 10,
+	T_STREAMTRIAD	= 1U << 11,
 };
 
 #define do_test(test, fn)	do {					\
@@ -171,31 +183,61 @@ enum {
 	}								\
 } while (0)
 
-static void t_memcpy(struct memcpy_test *test)
+#define do_test_twosources(t, test, fn)	do {				\
+	size_t left, this;						\
+	void *src, *src2, *dst;						\
+	int i;								\
+									\
+	for (i = 0; i < NR_ITERS; i++) {				\
+		left = BUF_SIZE;					\
+		src = test->src;					\
+		src2 = test->src2;					\
+		dst = test->dst;					\
+		while (left) {						\
+			this = test->size;				\
+			if (this > left)				\
+				this = left;				\
+			(fn)(dst, src, src2, this);			\
+			left -= this;					\
+			src += this;					\
+			src2 += this;					\
+			dst += this;					\
+		}							\
+	}								\
+} while (0)
+
+static void flush_caches(struct memcpy_type *t, struct memcpy_test *test)
+{
+	__builtin___clear_cache(test->src, test->src + BUF_SIZE);
+	__builtin___clear_cache(test->src2, test->src2 + BUF_SIZE);
+	__builtin___clear_cache(test->dst, test->dst + BUF_SIZE);
+}
+
+static void t_memcpy(struct memcpy_type *t, struct memcpy_test *test)
 {
 	do_test(test, memcpy);
 }
 
-static void t_memmove(struct memcpy_test *test)
+static void t_memmove(struct memcpy_type *t, struct memcpy_test *test)
 {
 	do_test(test, memmove);
 }
 
 static void simple_memcpy(void *dst, void const *src, size_t len)
 {
- 	char *d = dst;
+	char *d = dst;
 	const char *s = src;
 
 	while (len--)
 		*d++ = *s++;
 }
 
-static void t_simple(struct memcpy_test *test)
+static void t_simple_memcpy(struct memcpy_type *t, struct memcpy_test *test)
 {
 	do_test(test, simple_memcpy);
 }
 
-static void t_hybrid(struct memcpy_test *test)
+static void t_hybrid(struct memcpy_type *t, struct memcpy_test *test)
 {
 	if (test->size >= 64)
 		do_test(test, simple_memcpy);
@@ -203,6 +245,186 @@ static void t_hybrid(struct memcpy_test *test)
 		do_test(test, memcpy);
 }
 
+static void t_memset(struct memcpy_type *t, struct memcpy_test *test)
+{
+	size_t left, this;
+	void *dst;
+	int i;
+
+	for (i = 0; i < NR_ITERS; i++) {
+		left = BUF_SIZE;
+		dst = test->dst;
+		// the final chunk is clamped to the bytes remaining in the buffer
+		while (left) {
+			this = test->size;
+			if (this > left)
+				this = left;
+			memset(dst, 0x00, this);
+			left -= this;
+			dst += this;
+		}
+	}
+}
+
+static void t_wmemset(struct memcpy_type *t, struct memcpy_test *test)
+{
+	size_t left, this;
+	void *dst;
+	int i;
+
+	for (i = 0; i < NR_ITERS; i++) {
+		left = BUF_SIZE;
+		dst = test->dst;
+		// the final chunk is clamped to the bytes remaining in the buffer
+		while (left) {
+			this = test->size;
+			if (this > left)
+				this = left;
+			wmemset(dst, 0x0000, this / sizeof(wchar_t));
+			left -= this;
+			dst += this;
+		}
+	}
+}
+static void simple_memset(void *dst, uint8_t val, size_t len)
+{
+	uint8_t *d = dst;
+
+	// write a varying per-byte pattern so the compiler cannot substitute memset()
+	while (len) {
+		*d++ = val + len;
+		len -= sizeof(uint8_t);
+	}
+}
+
+static void t_simple_memset(struct memcpy_type *t, struct memcpy_test *test)
+{
+	size_t left, this;
+	uint8_t *dst;
+	int i;
+
+	for (i = 0; i < NR_ITERS; i++) {
+		left = BUF_SIZE;
+		dst = test->dst;
+		// the final chunk is clamped to the bytes remaining in the buffer
+		while (left) {
+			this = test->size;
+			if (this > left)
+				this = left;
+			simple_memset(dst, 0x00, this);
+			left -= this;
+			dst += this;
+		}
+	}
+}
+
+volatile uint64_t csum;
+static void simple_memcsum(void const *src, size_t len)
+{
+	const uint64_t *s = src;
+
+	// assert len is multiple of 8
+	while (len) {
+		csum += *s++;
+		len -= sizeof(uint64_t);
+	}
+}
+
+// read memory, but use all the results so it is not optimized away
+// to benchmark read performance
+static void t_memcsum(struct memcpy_type *t, struct memcpy_test *test)
+{
+	size_t left, this;
+	void *src;
+	int i;
+
+	if (test->size < sizeof csum)
+		return;
+	for (i = 0; i < NR_ITERS; i++) {
+		left = BUF_SIZE;
+		src = test->src;
+		while (left) {
+			this = test->size;
+			if (this > left)
+				this = left;
+			simple_memcsum(src, this);
+			left -= this;
+			src += this;
+		}
+	}
+}
+
+const double scalar = 3.0;
+void streamcopy(void *dst, void const *src, size_t len)
+{
+	double *d = dst;
+	const double *s = src;
+
+	for (; len >= sizeof(double); len -= sizeof(double))
+		*d++ = *s++;
+}
+
+static void t_streamcopy(struct memcpy_type *t, struct memcpy_test *test)
+{
+	if (test->size < sizeof scalar)
+		return;
+	do_test(test, streamcopy);
+}
+
+void streamscale(void *dst, void const *src, size_t len)
+{
+	double *d = dst;
+	const double *s = src;
+
+	for (; len >= sizeof(double); len -= sizeof(double))
+		*d++ = scalar * *s++;
+}
+
+static void t_streamscale(struct memcpy_type *t, struct memcpy_test *test)
+{
+	if (test->size < sizeof scalar)
+		return;
+	do_test(test, streamscale);
+}
+
+void streamadd(void *dst, void const *src, void const *src2, size_t len)
+{
+	double *d = dst;
+	const double *s = src;
+	const double *s2 = src2;
+
+	while (len) {
+		*d++ = *s++ + *s2++;
+		len -= sizeof(double);
+	}
+}
+
+static void t_streamadd(struct memcpy_type *t, struct memcpy_test *test)
+{
+	if (test->size < sizeof scalar)
+		return;
+	do_test_twosources(t, test, streamadd);
+}
+
+void streamtriad(void *dst, void const *src, void const *src2, size_t len)
+{
+	double *d = dst;
+	const double *s = src;
+	const double *s2 = src2;
+
+	while (len) {
+		*d++ = *s++ + scalar * *s2++;
+		len -= sizeof(double);
+	}
+}
+
+static void t_streamtriad(struct memcpy_type *t, struct memcpy_test *test)
+{
+	if (test->size < sizeof scalar)
+		return;
+	do_test_twosources(t, test, streamtriad);
+}
+
 static struct memcpy_type t[] = {
 	{
 		.name = "memcpy",
@@ -215,9 +437,49 @@ static struct memcpy_type t[] = {
 		.fn = t_memmove,
 	},
 	{
-		.name = "simple",
-		.mask = T_SIMPLE,
-		.fn = t_simple,
+		.name = "simple_memcpy",
+		.mask = T_SIMPLE_MEMCPY,
+		.fn = t_simple_memcpy,
+	},
+	{
+		.name = "memset",
+		.mask = T_MEMSET,
+		.fn = t_memset,
+	},
+	{
+		.name = "wmemset",
+		.mask = T_WMEMSET,
+		.fn = t_wmemset,
+	},
+	{
+		.name = "simple_memset",
+		.mask = T_SIMPLE_MEMSET,
+		.fn = t_simple_memset,
+	},
+	{
+		.name = "memcsum",
+		.mask = T_MEMCSUM,
+		.fn = t_memcsum,
+	},
+	{
+		.name = "streamcopy",
+		.mask = T_STREAMCOPY,
+		.fn = t_streamcopy,
+	},
+	{
+		.name = "streamscale",
+		.mask = T_STREAMSCALE,
+		.fn = t_streamscale,
+	},
+	{
+		.name = "streamadd",
+		.mask = T_STREAMADD,
+		.fn = t_streamadd,
+	},
+	{
+		.name = "streamtriad",
+		.mask = T_STREAMTRIAD,
+		.fn = t_streamtriad,
 	},
 	{
 		.name = "hybrid",
@@ -265,23 +527,27 @@ static int setup_tests(void)
 {
 	struct memcpy_test *test;
 	struct frand_state state;
-	void *src, *dst;
+	void *src, *src2, *dst;
 	int i;
 
-	src = malloc(BUF_SIZE);
-	dst = malloc(BUF_SIZE);
-	if (!src || !dst) {
-		free(src);
-		free(dst);
+	// align to a multiple of the cache line size so library functions
+	// take their optimized paths,
+	// e.g., __memmove_avx_erms rather than __memmove_avx_unaligned_erms
+	src = fio_memalign(BUF_ALIGN, BUF_SIZE);
+	src2 = fio_memalign(BUF_ALIGN, BUF_SIZE);
+	dst = fio_memalign(BUF_ALIGN, BUF_SIZE);
+	if (!src || !src2 || !dst)
+		// FIXFIX free too
 		return 1;
-	}
 
 	init_rand_seed(&state, 0x8989, 0);
 	fill_random_buf(&state, src, BUF_SIZE);
+	fill_random_buf(&state, src2, BUF_SIZE);
 
 	for (i = 0; tests[i].name; i++) {
 		test = &tests[i];
 		test->src = src;
+		test->src2 = src2;
 		test->dst = dst;
 	}
 
@@ -290,8 +556,9 @@ static int setup_tests(void)
 
 static void free_tests(void)
 {
-	free(tests[0].src);
-	free(tests[0].dst);
+	fio_memfree(tests[0].src, BUF_SIZE);
+	fio_memfree(tests[0].src2, BUF_SIZE);
+	fio_memfree(tests[0].dst, BUF_SIZE);
 }
 
 int fio_memcpy_test(const char *type)
@@ -316,6 +583,9 @@ int fio_memcpy_test(const char *type)
 		return 1;
 	}
 
+	printf("memcpytest compile-time options: BUF_SIZE=%llu MiB, NR_ITERS=%d\n",
+	       BUF_SIZE / 1024 / 1024, NR_ITERS);
+
 	for (i = 0; t[i].name; i++) {
 		struct timespec ts;
 		double mb_sec;
@@ -324,18 +594,13 @@ int fio_memcpy_test(const char *type)
 		if (!(t[i].mask & test_mask))
 			continue;
 
-		/*
-		 * For first run, make sure CPUs are spun up and that
-		 * we've touched the data.
-		 */
-		usec_spin(100000);
-		t[i].fn(&tests[0]);
-
 		printf("%s\n", t[i].name);
 
 		for (j = 0; tests[j].name; j++) {
+			flush_caches(&t[i], &tests[j]);
 			fio_gettime(&ts, NULL);
-			t[i].fn(&tests[j]);
+			t[i].fn(&t[i], &tests[j]);
+			flush_caches(&t[i], &tests[j]);
 			usec = utime_since_now(&ts);
 
 			if (usec) {
@@ -343,9 +608,9 @@ int fio_memcpy_test(const char *type)
 
 				mb_sec = (double) mb / (double) usec;
 				mb_sec /= (1.024 * 1.024);
-				printf("\t%s:\t%8.2f MiB/sec\n", tests[j].name, mb_sec);
+				printf("\t%s:\t%8.2f MiB/s\n", tests[j].name, mb_sec);
 			} else
-				printf("\t%s:inf MiB/sec\n", tests[j].name);
+				printf("\t%s:\tinf MiB/s\n", tests[j].name);
 		}
 	}
 
diff --git a/lib/memcpy.h b/lib/memcpy.h
index f61a4a09..86006e71 100644
--- a/lib/memcpy.h
+++ b/lib/memcpy.h
@@ -2,5 +2,9 @@
 #define FIO_MEMCPY_H
 
 int fio_memcpy_test(const char *type);
+void streamcopy(void *dst, void const *src, size_t len);
+void streamscale(void *dst, void const *src, size_t len);
+void streamadd(void *dst, void const *src, void const *src2, size_t len);
+void streamtriad(void *dst, void const *src, void const *src2, size_t len);
 
 #endif
-- 
2.14.3



* [PATCH 3/3] ioengines: add memtest workloads for ioengines using mmap
  2018-01-18 23:53 [RFC PATCH 0/3] memtests for ioengines using mmap Robert Elliott
  2018-01-18 23:53 ` [PATCH 1/3] memcpytest: Add more sizes Robert Elliott
  2018-01-18 23:53 ` [PATCH 2/3] memcpytest: add more memcpy tests Robert Elliott
@ 2018-01-18 23:53 ` Robert Elliott
  2 siblings, 0 replies; 5+ messages in thread
From: Robert Elliott @ 2018-01-18 23:53 UTC (permalink / raw)
  To: fio; +Cc: Robert Elliott

From: Robert Elliott <elliott@hpe.com>

Add memtest workloads for ioengines using mmap, running entirely within
the memory-mapped region (not to/from a separate transfer buffer in
regular memory).  Useful for persistent memory testing.

Tests include:
  memcpy = copy with libc memcpy() (d = s)(one read, one write)
  memscan = read memory to registers (one read)
  memset = write memory from registers with libc memset() (one write)
  wmemset = write memory from registers with libc wmemset() (one write)
  streamcopy = STREAM copy (d = s)(one read, one write)
  streamadd = STREAM add (d = s1 + s2)(two reads, add, one write)
  streamscale = STREAM scale (d = 3 * s1)(one read, multiply, one write)
  streamtriad = STREAM triad (d = s1 + 3 * s2)(two reads, add and multiply, one write)

NOTE: the memscan function is x86-specific and not ready for inclusion yet.
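As a usage sketch, a job file exercising these workloads might look
like the following.  The section name, filename, and sizes are
hypothetical; ioengine, rw, and memtest are the options added or
extended by this series:

```ini
; hypothetical fio job file for the options proposed in this series
[pmem-streamtriad]
ioengine=mmap
rw=memtest
memtest=streamtriad
filename=/mnt/pmem0/fio-test
size=1g
bs=2m
```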
---
 HOWTO            |  37 ++++++++++++++++
 debug.h          |   1 +
 engines/mmap.c   | 127 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 fio.1            |  37 ++++++++++++++++
 init.c           |   4 ++
 io_ddir.h        |  27 ++++++++++--
 io_u.c           |   3 +-
 io_u.h           |   9 +++-
 options.c        |  91 +++++++++++++++++++++++++++++++++++++++
 thread_options.h |   7 +++
 10 files changed, 334 insertions(+), 9 deletions(-)

diff --git a/HOWTO b/HOWTO
index 78fa6ccf..b2d0c69e 100644
--- a/HOWTO
+++ b/HOWTO
@@ -992,6 +992,9 @@ I/O type
 				Sequential writes.
 		**trim**
 				Sequential trims (Linux block devices only).
+		**memtest**
+				Memory test (ioengines using mmap only).
+				Specified with memtest=.
 		**randread**
 				Random reads.
 		**randwrite**
@@ -1019,6 +1022,40 @@ I/O type
 	For instance, using ``rw=write:4k`` will skip 4k for every write.  Also see
 	the :option:`rw_sequencer` option.
 
+.. option:: memtest=str
+
+	Type of memory test to perform if rw=memtest is specified.
+	For use with ioengines supporting mmap() - performs the tests within the
+	memory mapped region.  Useful for persistent memory testing.
+
+	Accepted values are:
+
+		**memcpy**
+				copy with libc memcpy() (d = s)(one read, one write)
+		**memscan** (default)
+				read memory to registers (one read)
+		**memset**
+				write memory from registers with libc memset() (one write)
+		**wmemset**
+				write memory from registers with libc wmemset() (one write)
+		**streamcopy**
+				STREAM copy (d = s)(one read, one write)
+		**streamadd**
+				STREAM add (d = s1 + s2)(two reads, add, one write)
+		**streamscale**
+				STREAM scale (d = 3 * s1)(one read, multiply, one write)
+		**streamtriad**
+				STREAM triad (d = s1 + 3 * s2)(two reads, add and multiply, one write)
+
+	If library functions are provided by glibc, memcpy() honors this
+	environment variable:
+		export GLIBC_TUNABLES=glibc.tune.x86_non_temporal_threshold=131072
+	to select the threshold for choosing non-temporal stores (e.g., vmovnt)
+	rather than normal stores (e.g., rep movsb).
+
+	Additional tunables might also be needed:
+		export GLIBC_TUNABLES=glibc.tune.x86_non_temporal_threshold=131072:glibc.tune.hwcaps=AVX2_Usable,ERMS,-Prefer_No_VZEROUPPER,AVX_Fast_Unaligned_Load
+
 .. option:: rw_sequencer=str
 
 	If an offset modifier is given by appending a number to the ``rw=<str>``
diff --git a/debug.h b/debug.h
index e3aa3f18..e7b176c6 100644
--- a/debug.h
+++ b/debug.h
@@ -23,6 +23,7 @@ enum {
 	FD_COMPRESS,
 	FD_STEADYSTATE,
 	FD_HELPERTHREAD,
+	FD_MEMTEST,
 	FD_DEBUG_MAX,
 };
 
diff --git a/engines/mmap.c b/engines/mmap.c
index 54b5b11d..edc59f50 100644
--- a/engines/mmap.c
+++ b/engines/mmap.c
@@ -10,7 +10,9 @@
 #include <unistd.h>
 #include <errno.h>
 #include <sys/mman.h>
+#include <wchar.h>
 
+#include "../lib/memcpy.h"
 #include "../fio.h"
 #include "../verify.h"
 
@@ -34,7 +36,9 @@ static int fio_mmap_file(struct thread_data *td, struct fio_file *f,
 	int prot = 0;
 	int flags = MAP_SHARED;
 
-	if (td_rw(td) && !td->o.verify_only)
+	if (td->o.td_memtest)
+		prot = PROT_READ | PROT_WRITE;
+	else if (td_rw(td) && !td->o.verify_only)
 		prot = PROT_READ | PROT_WRITE;
 	else if (td_write(td) && !td->o.verify_only) {
 		prot = PROT_WRITE;
@@ -44,7 +48,12 @@ static int fio_mmap_file(struct thread_data *td, struct fio_file *f,
 	} else
 		prot = PROT_READ;
 
+	if (td->o.use_map_populate)
+		flags |= MAP_POPULATE;
 	fmd->mmap_ptr = mmap(NULL, length, prot, flags, f->fd, off);
+	dprint(FD_MEMTEST,
+	       "mmap addr=%p len=0x%lx=%ld off=0x%lx=%ld prot=0x%x flags=0x%x\n",
+	       fmd->mmap_ptr, length, length, off, off, prot, flags);
 	if (fmd->mmap_ptr == MAP_FAILED) {
 		fmd->mmap_ptr = NULL;
 		td_verror(td, errno, "mmap");
@@ -163,6 +172,30 @@ done:
 	return 0;
 }
 
+/* read from memory to register (don't write to memory) */
+static void memtoreg(uint64_t const *p, size_t len)
+{
+	uint64_t localreg = 0;
+	uint64_t ptmp = (uint64_t)p;
+	uint64_t end = (uint64_t)p + len;
+
+	/* read 0x8 bytes per pass */
+	__asm__ __volatile__(
+		"1:\n\t"
+		"mov 0(%[ptmp]), %[localreg]\n\t"
+		"add $0x8, %[ptmp]\n\t"
+		"cmp %[ptmp], %[end]\n\t"
+		"jne 1b"
+		/* Output operands */
+		: [localreg] "=r" (localreg),
+		  [ptmp] "+r" (ptmp)
+		/* Input operands */
+		: [end] "r" (end)
+		/* Clobbered registers */
+		: "memory", "cc"
+	);
+}
+
 static int fio_mmapio_queue(struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
@@ -170,7 +203,95 @@ static int fio_mmapio_queue(struct thread_data *td, struct io_u *io_u)
 
 	fio_ro_check(td, io_u);
 
-	if (io_u->ddir == DDIR_READ)
+	if (io_u->memtest == TD_MEMTEST_MEMSCAN) {
+		/* presence of this keeps the compiler from optimizing away memtoreg() */
+		uint32_t volatile result = 0;
+
+		dprint(FD_MEMTEST, "memscan %p len=0x%lx\n",
+		       io_u->mmap_data, io_u->xfer_buflen);
+		memtoreg(io_u->mmap_data, io_u->xfer_buflen);
+	} else if (io_u->memtest == TD_MEMTEST_MEMSET) {
+		dprint(FD_MEMTEST, "memset %p len=0x%lx\n",
+		       io_u->mmap_data, io_u->xfer_buflen);
+		memset(io_u->mmap_data, 0x00, io_u->xfer_buflen);
+	} else if (io_u->memtest == TD_MEMTEST_WMEMSET) {
+		dprint(FD_MEMTEST, "wmemset %p len=0x%lx\n",
+		       io_u->mmap_data, io_u->xfer_buflen);
+		wmemset(io_u->mmap_data, 0x00, io_u->xfer_buflen / sizeof(wchar_t));
+
+// HACKHACK
+#define PAGE_SIZE 4096
+
+	} else if (io_u->memtest == TD_MEMTEST_MEMCPY) {
+		size_t len = io_u->xfer_buflen / 2;
+		void *dst = io_u->mmap_data;
+		void *src = io_u->mmap_data + len;
+
+		dprint(FD_MEMTEST, "memcpy dst=%p src=%p len=0x%lx\n", dst, src, len);
+
+		// FIXFIX this doesn't work here, must be done before the process makes
+		// any memcpy() calls (first call selects the function to use)
+		if (td->o.use_glibc_nt) {
+			char ntstr[96];
+			int err;
+
+			// 1 = off (huge threshold)
+			// 2 = on (low threshold)
+			snprintf(ntstr, sizeof ntstr,
+				 "GLIBC_TUNABLES=glibc.tune.x86_non_temporal_threshold=%lu",
+				 (td->o.use_glibc_nt == 1) ? len * 2 : 0);
+
+			err = putenv(ntstr);
+			if (err)
+				dprint(FD_MEMTEST, "error setting GLIBC_TUNABLES=%s\n", ntstr);
+			else
+				dprint(FD_MEMTEST, "setting GLIBC_TUNABLES=%s\n", ntstr);
+		}
+		memcpy(dst, src, io_u->xfer_buflen / 2);
+		unsetenv("GLIBC_TUNABLES");
+	} else if (io_u->memtest == TD_MEMTEST_STREAM_COPY) {
+		size_t len = io_u->xfer_buflen / 2;
+		void *dst = io_u->mmap_data;
+		void *src = io_u->mmap_data + len;
+
+		dprint(FD_MEMTEST, "streamcopy dst=%p src=%p len=0x%lx\n",
+		       dst, src, len);
+		streamcopy(dst, src, io_u->xfer_buflen / 2);
+	} else if (io_u->memtest == TD_MEMTEST_STREAM_SCALE) {
+		size_t len = io_u->xfer_buflen / 2;
+		void *dst = io_u->mmap_data;
+		void *src = io_u->mmap_data + len;
+
+		dprint(FD_MEMTEST, "streamscale dst=%p src=%p len=0x%lx\n",
+		       dst, src, len);
+		streamscale(dst, src, io_u->xfer_buflen / 2);
+	} else if (io_u->memtest == TD_MEMTEST_STREAM_ADD) {
+		size_t len = (io_u->xfer_buflen / 3) & ~(PAGE_SIZE - 1);
+		void *dst = io_u->mmap_data;
+		void *src1 = PTR_ALIGN(io_u->mmap_data + len, PAGE_SIZE);
+		void *src2 = PTR_ALIGN(io_u->mmap_data + 2 * len, PAGE_SIZE);
+
+		dprint(FD_MEMTEST,
+		       "streamadd     dst=%p src1=%p src2=%p len=0x%lx=%ld\n",
+		       dst, src1, src2, len, len);
+		dprint(FD_MEMTEST,
+		       "streamadd rel dst=0x%lx src1=0x%lx src2=0x%lx\n",
+		       dst - dst, src1 - dst, src2 - dst);
+		streamadd(dst, src1, src2, len);
+	} else if (io_u->memtest == TD_MEMTEST_STREAM_TRIAD) {
+		size_t len = (io_u->xfer_buflen / 3) & ~(PAGE_SIZE - 1);
+		void *dst = io_u->mmap_data;
+		void *src1 = PTR_ALIGN(io_u->mmap_data + len, PAGE_SIZE);
+		void *src2 = PTR_ALIGN(io_u->mmap_data + 2 * len, PAGE_SIZE);
+
+		dprint(FD_MEMTEST,
+		       "streamtriad     dst=%p src1=%p src2=%p len=0x%lx=%ld\n",
+		       dst, src1, src2, len, len);
+		dprint(FD_MEMTEST,
+		       "streamtriad rel dst=0x%lx src1=0x%lx src2=0x%lx\n",
+		       dst - dst, src1 - dst, src2 - dst);
+		streamtriad(dst, src1, src2, len);
+	} else if (io_u->ddir == DDIR_READ)
 		memcpy(io_u->xfer_buf, io_u->mmap_data, io_u->xfer_buflen);
 	else if (io_u->ddir == DDIR_WRITE)
 		memcpy(io_u->mmap_data, io_u->xfer_buf, io_u->xfer_buflen);
@@ -186,7 +307,6 @@ static int fio_mmapio_queue(struct thread_data *td, struct io_u *io_u)
 			td_verror(td, io_u->error, "trim");
 	}
 
-
 	/*
 	 * not really direct, but should drop the pages from the cache
 	 */
@@ -216,6 +336,7 @@ static int fio_mmapio_init(struct thread_data *td)
 	}
 
 	mmap_map_size = MMAP_TOTAL_SZ / o->nr_files;
+
 	return 0;
 }
 
diff --git a/fio.1 b/fio.1
index 70eeeb0f..7672e9e7 100644
--- a/fio.1
+++ b/fio.1
@@ -769,6 +769,9 @@ Random writes.
 .B randtrim
 Random trims (Linux block devices only).
 .TP
+.B memtest
+Memory test (for ioengines using mmap only).
+.TP
 .B rw,readwrite
 Sequential mixed reads and writes.
 .TP
@@ -818,6 +821,40 @@ behaves in a similar fashion, except it sends the same offset 8 number of
 times before generating a new offset.
 .RE
 .TP
+.BI memtest \fR=\fPstr "\fR
+Type of memory test to perform if rw=memtest is specified.
+For use with ioengines supporting mmap() - performs the tests within the
+mapped region.  Useful for persistent memory testing.
+Accepted values are:
+.RS
+.RS
+.TP
+.B memcpy
+copy with libc memcpy() (d = s)(one read, one write)
+.TP
+.B memscan (default)
+read memory to registers (one read)
+.TP
+.B memset
+write memory from registers with libc memset() (one write)
+.TP
+.B wmemset
+write memory from registers with libc wmemset() (one write)
+.TP
+.B streamcopy
+STREAM copy (d = s)(one read, one write)
+.TP
+.B streamadd
+STREAM add (d = s1 + s2)(two reads, add, one write)
+.TP
+.B streamscale
+STREAM scale (d = 3 * s1)(one read, multiply, one write)
+.TP
+.B streamtriad
+STREAM triad (d = s1 + 3 * s2)(two reads, add and multiply, one write)
+.RE
+.RE
+.TP
 .BI unified_rw_reporting \fR=\fPbool
 Fio normally reports statistics on a per data direction basis, meaning that
 reads, writes, and trims are accounted and reported separately. If this
diff --git a/init.c b/init.c
index 8a801383..78167a47 100644
--- a/init.c
+++ b/init.c
@@ -2251,6 +2251,10 @@ struct debug_level debug_levels[] = {
 	  .help = "Helper thread logging",
 	  .shift = FD_HELPERTHREAD,
 	},
+	{ .name = "mmap",
+	  .help = "mmap-based memory test logging",
+	  .shift = FD_MEMTEST,
+	},
 	{ .name = NULL, },
 };
 
diff --git a/io_ddir.h b/io_ddir.h
index 613d5fbc..0b0a0139 100644
--- a/io_ddir.h
+++ b/io_ddir.h
@@ -37,6 +37,7 @@ enum td_ddir {
 	TD_DDIR_RANDRW		= TD_DDIR_RW | TD_DDIR_RAND,
 	TD_DDIR_RANDTRIM	= TD_DDIR_TRIM | TD_DDIR_RAND,
 	TD_DDIR_TRIMWRITE	= TD_DDIR_TRIM | TD_DDIR_WRITE,
+	TD_DDIR_LAST		= TD_DDIR_TRIMWRITE + 1
 };
 
 #define td_read(td)		((td)->o.td_ddir & TD_DDIR_READ)
@@ -61,14 +62,32 @@ static inline int ddir_rw(enum fio_ddir ddir)
 
 static inline const char *ddir_str(enum td_ddir ddir)
 {
-	static const char *__str[] = { NULL, "read", "write", "rw", "rand",
-				"randread", "randwrite", "randrw",
-				"trim", NULL, "trimwrite", NULL, "randtrim" };
+	static const char *__str[] = {
+		NULL, "read", "write", "rw",			// 0x0 - 0x3
+		"rand", "randread", "randwrite", "randrw",	// 0x4 - 0x7 RAND
+		NULL, NULL, "trimwrite", NULL,			// 0x8 - 0xB TRIM
+		"randtrim", NULL, NULL, NULL,			// 0xC - 0xF RAND, TRIM
+	};
 
-	return __str[ddir];
+	if (ddir < TD_DDIR_LAST)
+		return __str[ddir];
+	else
+		return NULL;
 }
 
 #define ddir_rw_sum(arr)	\
 	((arr)[DDIR_READ] + (arr)[DDIR_WRITE] + (arr)[DDIR_TRIM])
 
+enum td_memtest {
+	TD_MEMTEST_MEMCPY,
+	TD_MEMTEST_MEMSCAN,
+	TD_MEMTEST_MEMSET,
+	TD_MEMTEST_WMEMSET,
+	TD_MEMTEST_STREAM_COPY,
+	TD_MEMTEST_STREAM_ADD,
+	TD_MEMTEST_STREAM_SCALE,
+	TD_MEMTEST_STREAM_TRIAD,
+};
+
 #endif
+
diff --git a/io_u.c b/io_u.c
index 1d6872ed..738801a1 100644
--- a/io_u.c
+++ b/io_u.c
@@ -968,6 +968,7 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 	if (td_ioengine_flagged(td, FIO_NOIO))
 		goto out;
 
+	io_u->memtest = td->o.td_memtest;
 	set_rw_ddir(td, io_u);
 
 	/*
@@ -1791,7 +1792,7 @@ struct io_u *get_io_u(struct thread_data *td)
 		f->last_start[io_u->ddir] = io_u->offset;
 		f->last_pos[io_u->ddir] = io_u->offset + io_u->buflen;
 
-		if (io_u->ddir == DDIR_WRITE) {
+		if (io_u->ddir == DDIR_WRITE && !io_u->memtest) {
 			if (td->flags & TD_F_REFILL_BUFFERS) {
 				io_u_fill_buffer(td, io_u,
 					td->o.min_bs[DDIR_WRITE],
diff --git a/io_u.h b/io_u.h
index da25efb9..4d39a10b 100644
--- a/io_u.h
+++ b/io_u.h
@@ -37,6 +37,7 @@ struct io_u {
 	struct fio_file *file;
 	unsigned int flags;
 	enum fio_ddir ddir;
+	unsigned int memtest;
 
 	/*
 	 * For replay workloads, we may want to account as a different
@@ -152,7 +153,13 @@ static inline void dprint_io_u(struct io_u *io_u, const char *p)
 {
 	struct fio_file *f = io_u->file;
 
-	if (f)
+	if (f && io_u->memtest)
+		dprint(FD_IO, "%s: io_u %p: off=0x%llx,len=0x%lx,ddir=%d,memtest=%d,file=%s\n",
+				p, io_u,
+				(unsigned long long) io_u->offset,
+				io_u->buflen, io_u->ddir, io_u->memtest,
+				f->file_name);
+	else if (f)
 		dprint(FD_IO, "%s: io_u %p: off=0x%llx,len=0x%lx,ddir=%d,file=%s\n",
 				p, io_u,
 				(unsigned long long) io_u->offset,
diff --git a/options.c b/options.c
index 9a3431d8..e6b214e1 100644
--- a/options.c
+++ b/options.c
@@ -409,6 +409,14 @@ static int str_rw_cb(void *data, const char *str)
 	return 0;
 }
 
+static int str_memtest_cb(void *data, const char *str)
+{
+	//struct thread_data *td = cb_data_to_td(data);
+	//struct thread_options *o = &td->o;
+
+	return 0;
+}
+
 static int str_mem_cb(void *data, const char *mem)
 {
 	struct thread_data *td = cb_data_to_td(data);
@@ -1534,6 +1542,19 @@ static int rw_verify(struct fio_option *o, void *data)
 	return 0;
 }
 
+// FIXFIX add more checks
+static int memtest_verify(struct fio_option *o, void *data)
+{
+	struct thread_data *td = cb_data_to_td(data);
+
+	if (read_only && td_write(td)) {
+		log_err("fio: job <%s> has write bit set, but fio is in read-only mode\n", td->o.name);
+		return 1;
+	}
+
+	return 0;
+}
+
 static int gtod_cpu_verify(struct fio_option *o, void *data)
 {
 #ifndef FIO_HAVE_CPU_AFFINITY
@@ -1685,6 +1706,10 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .oval = TD_DDIR_TRIM,
 			    .help = "Sequential trim",
 			  },
+			  { .ival = "memtest",
+			    .oval = TD_DDIR_WRITE,	// assume both directions for accounting
+			    .help = "Memory test for mmap engines (specify with memtest option)",
+			  },
 			  { .ival = "randread",
 			    .oval = TD_DDIR_RANDREAD,
 			    .help = "Random read",
@@ -1715,6 +1740,72 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			  },
 		},
 	},
+	{
+		.name	= "memtest",
+		.lname	= "memory test for ioengines using mmap()",
+		.type	= FIO_OPT_STR,
+		.cb	= str_memtest_cb,
+		.off1	= offsetof(struct thread_options, td_memtest),
+		.help	= "memory test within the mmap() region of the specified file or device",
+		.def	= "memscan",
+		.verify	= memtest_verify,
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_IO_BASIC,
+		.posval = {
+			  { .ival = "memcpy",
+			    .oval = TD_MEMTEST_MEMCPY,
+			    .help = "copy with libc memcpy() (d = s)(one read, one write)",
+			  },
+			  { .ival = "memscan",
+			    .oval = TD_MEMTEST_MEMSCAN,
+			    .help = "read memory to registers (one read)",
+			  },
+			  { .ival = "memset",
+			    .oval = TD_MEMTEST_MEMSET,
+			    .help = "write memory from registers with libc memset() (one write)",
+			  },
+			  { .ival = "wmemset",
+			    .oval = TD_MEMTEST_WMEMSET,
+			    .help = "write memory from registers with libc wmemset() (one write)",
+			  },
+			  { .ival = "streamcopy",
+			    .oval = TD_MEMTEST_STREAM_COPY,
+			    .help = "STREAM copy (d = s)(one read, one write)",
+			  },
+			  { .ival = "streamadd",
+			    .oval = TD_MEMTEST_STREAM_ADD,
+			    .help = "STREAM add (d = s1 + s2)(two reads, add, one write)",
+			  },
+			  { .ival = "streamscale",
+			    .oval = TD_MEMTEST_STREAM_SCALE,
+			    .help = "STREAM scale (d = 3 * s1)(one read, multiply, one write)",
+			  },
+			  { .ival = "streamtriad",
+			    .oval = TD_MEMTEST_STREAM_TRIAD,
+			    .help = "STREAM triad (d = s1 + 3 * s2)(two reads, add and multiply, one write)",
+			  },
+		},
+	},
+	{
+		.name	= "mmap_populate",
+		.lname	= "mmap MAP_POPULATE",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct thread_options, use_map_populate),
+		.help	= "Use MAP_POPULATE on mmap() calls",
+		.def	= 0,
+		.category = FIO_OPT_C_GENERAL,
+		.group	= FIO_OPT_G_IO_BASIC,
+	},
+	{
+		.name	= "memtest_nt",
+		.lname	= "memtest non-temporal GLIBC tunable",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct thread_options, use_glibc_nt),
+		.help	= "Set GLIBC_TUNABLES nontemporal threshold below the transfer size (0=natural, 1=force temporal, 2=force NT)",
+		.def	= 0,
+		.category = FIO_OPT_C_GENERAL,
+		.group	= FIO_OPT_G_IO_BASIC,
+	},
 	{
 		.name	= "rw_sequencer",
 		.lname	= "RW Sequencer",
diff --git a/thread_options.h b/thread_options.h
index dc290b0b..ee51e898 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -58,6 +58,7 @@ struct thread_options {
 	char *ioengine_so_path;
 	char *mmapfile;
 	enum td_ddir td_ddir;
+	enum td_memtest td_memtest;
 	unsigned int rw_seq;
 	unsigned int kb_base;
 	unsigned int unit_base;
@@ -191,6 +192,8 @@ struct thread_options {
 	unsigned long long lockmem;
 	enum fio_memtype mem_type;
 	unsigned int mem_align;
+	unsigned int use_map_populate;
+	unsigned int use_glibc_nt;
 
 	unsigned long long max_latency;
 
@@ -338,6 +341,8 @@ struct thread_options_pack {
 	uint8_t ioengine[FIO_TOP_STR_MAX];
 	uint8_t mmapfile[FIO_TOP_STR_MAX];
 	uint32_t td_ddir;
+	uint32_t td_memtest;
+	uint32_t reserved;
 	uint32_t rw_seq;
 	uint32_t kb_base;
 	uint32_t unit_base;
@@ -469,6 +474,8 @@ struct thread_options_pack {
 	uint64_t lockmem;
 	uint32_t mem_type;
 	uint32_t mem_align;
+	uint32_t use_map_populate;
+	uint32_t use_glibc_nt;
 
 	uint32_t stonewall;
 	uint32_t new_group;
-- 
2.14.3



* Re: [PATCH 2/3] memcpytest: add more memcpy tests
  2018-01-18 23:53 ` [PATCH 2/3] memcpytest: add more memcpy tests Robert Elliott
@ 2018-01-25 21:22   ` Jens Axboe
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2018-01-25 21:22 UTC (permalink / raw)
  To: Robert Elliott, fio

On 1/18/18 4:53 PM, Robert Elliott wrote:
> From: Robert Elliott <elliott@hpe.com>
> 
> Add more memcpy tests:
>     memcpy = copy with libc memcpy() (d = s)(one read, one write)
>     memcsum = read memory to registers (one read)
>     memset = write memory from registers with libc memset() (one write)
>     wmemset = write memory from registers with libc wmemset() (one write)
>     streamcopy = STREAM copy (d = s)(one read, one write)
>     streamadd = STREAM add (d = s1 + s2)(two reads, add, one write)
>     streamscale = STREAM scale (d = 3 * s1)(one read, multiply, one write)
>     streamtriad = STREAM triad (d = s1 + 3 * s2)(two reads, add and multiply, one write)

The engine changes in here don't seem related?

> +static void flush_caches(struct memcpy_type *t, struct memcpy_test *test)
> +{
> +	__builtin___clear_cache(test->src, test->src + BUF_SIZE);
> +	__builtin___clear_cache(test->src2, test->src2 + BUF_SIZE);
> +	__builtin___clear_cache(test->dst, test->dst + BUF_SIZE);
> +}

Is this going to work on all platforms? I'm fine with adding it, but
we'll probably need a configure test to ensure we don't break various
builds.

-- 
Jens Axboe




end of thread, other threads:[~2018-01-25 21:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-18 23:53 [RFC PATCH 0/3] memtests for ioengines using mmap Robert Elliott
2018-01-18 23:53 ` [PATCH 1/3] memcpytest: Add more sizes Robert Elliott
2018-01-18 23:53 ` [PATCH 2/3] memcpytest: add more memcpy tests Robert Elliott
2018-01-25 21:22   ` Jens Axboe
2018-01-18 23:53 ` [PATCH 3/3] ioengines: add memtest workloads for ioengines using mmap Robert Elliott
