All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matteo Croce <mcroce@linux.microsoft.com>
To: Nick Desaulniers <ndesaulniers@google.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Nick Kossifidis <mick@ics.forth.gr>, Guo Ren <guoren@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	David Laight <David.Laight@aculab.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Emil Renner Berthing <kernel@esmil.dk>,
	Drew Fustini <drew@beagleboard.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-riscv <linux-riscv@lists.infradead.org>
Subject: Re: [PATCH 0/3] lib/string: optimized mem* functions
Date: Thu, 1 Jul 2021 02:29:55 +0200	[thread overview]
Message-ID: <20210701022955.61ecb657@linux.microsoft.com> (raw)
In-Reply-To: <CAFnufp160uvpqNowZpDdOY4bSN1NJ5j81nYcX=9+rW9WZHfXcQ@mail.gmail.com>

On Sun, 27 Jun 2021 02:19:59 +0200
Matteo Croce <mcroce@linux.microsoft.com> wrote:

> On Fri, Jun 25, 2021 at 7:45 PM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > On Thu, Jun 24, 2021 at 6:02 PM Matteo Croce
> > <mcroce@linux.microsoft.com> wrote:
> > >
> > > From: Matteo Croce <mcroce@microsoft.com>
> > >
> > > Rewrite the generic mem{cpy,move,set} so that memory is accessed
> > > with the widest size possible, but without doing unaligned
> > > accesses.
> > >
> > > This was originally posted as C string functions for RISC-V[1],
> > > but as there was no specific RISC-V code, it was proposed for the
> > > generic lib/string.c implementation.
> > >
> > > Tested on RISC-V and on x86_64 by undefining
> > > __HAVE_ARCH_MEM{CPY,SET,MOVE} and HAVE_EFFICIENT_UNALIGNED_ACCESS.
> > >
> > > Further testing on big endian machines will be appreciated, as I
> > > don't have such hardware at the moment.
> >
> > Hi Matteo,
> > Neat patches.  Do you have you any benchmark data showing the
> > claimed improvements? Is it worthwhile to define these only when
> > CC_OPTIMIZE_FOR_PERFORMANCE/CC_OPTIMIZE_FOR_PERFORMANCE_O3 are
> > defined, not CC_OPTIMIZE_FOR_SIZE? I'd be curious to know the delta
> > in ST_SIZE of these functions otherwise.
> >
> 
> I compared the current versions with the new one with bloat-o-meter,
> the kernel grows by ~400 bytes on x86_64 and RISC-V
> 
> x86_64
> 
> $ scripts/bloat-o-meter vmlinux.orig vmlinux
> add/remove: 0/0 grow/shrink: 4/1 up/down: 427/-6 (421)
> Function                                     old     new   delta
> memcpy                                        29     351    +322
> memset                                        29     117     +88
> strlcat                                       68      78     +10
> strlcpy                                       50      57      +7
> memmove                                       56      50      -6
> Total: Before=8556964, After=8557385, chg +0.00%
> 
> RISC-V
> 
> $ scripts/bloat-o-meter vmlinux.orig vmlinux
> add/remove: 0/0 grow/shrink: 4/2 up/down: 432/-36 (396)
> Function                                     old     new   delta
> memcpy                                        36     324    +288
> memset                                        32     148    +116
> strlcpy                                      116     132     +16
> strscpy_pad                                   84      96     +12
> strlcat                                      176     164     -12
> memmove                                       76      52     -24
> Total: Before=1225371, After=1225767, chg +0.03%
> 
> I will post benchmarks made on a RISC-V machine which can't handle
> unaligned accesses, and it will be the first user of the new
> functions.
> 
> > For big endian, you ought to be able to boot test in QEMU.  I think
> > you'd find out pretty quickly if any of the above had issues.
> > (Enabling KASAN is probably also a good idea for a test, too). Check
> > out
> > https://github.com/ClangBuiltLinux/boot-utils
> > for ready made images and scripts for launching various
> > architectures and endiannesses.
> >
> 
> Will do!
> 

I finally made the benchmarks on RISC-V.

The current byte-at-time memcpy() always copy 74 Mbyte/sec, no matter
the alignment of the buffers, while my implementation do 114 Mb/s on
two aligned buffers and 107 Mb/s when need to shift and merge words.

For memset(), the current byte-at-time implementation always writes 140
Mb/s, my word-wise implementation always copies 241 Mb/s. Both
implementations have the same performance with aligned and unaligned
buffers.

The memcpy() test is a simple loop which calls memcpy with different
offsets on a 32MB buffer:

#define NUM_PAGES	(1 << (MAX_ORDER + 2))
#define PG_SIZE		(PAGE_SIZE * NUM_PAGES)


	page1 = alloc_contig_pages(NUM_PAGES, GFP_KERNEL, NUMA_NO_NODE, 0);
	page2 = alloc_contig_pages(NUM_PAGES, GFP_KERNEL, NUMA_NO_NODE, 0);

	src = page_to_virt(page1);
	dst = page_to_virt(page2);

	for (i = 0; i < sizeof(void*); i++) {
		for (j = 0; j < sizeof(void*); j++) {
			t0 = ktime_get();
			memcpy(dst + i, src + j, PG_SIZE - max(i, j));
			t1 = ktime_get();
			printk("Strings selftest: memcpy(dst+%d, src+%d), distance %lu: %llu Mb/s\n",
			       i, j, (j - i) % sizeof(long),
			       PG_SIZE * (1000000000l / 1048576l) / (t1-t0));
		}
		printk("\n");
	}

Similarly, the memset() one:

	page = alloc_contig_pages(NUM_PAGES, GFP_KERNEL, NUMA_NO_NODE, 0);
	dst = page_to_virt(page);

	for (i = 0; i < sizeof(void*); i++) {
		t0 = ktime_get();
		memset(dst + i, 0, PG_SIZE - i);
		t1 = ktime_get();
		printk("Strings selftest: memset(dst+%d): %llu Mb/s\n",
		       i, PG_SIZE * (1000000000l / 1048576l) / (t1-t0));
	}

The results for the current one are:

[   27.893931] Strings selftest: memcpy testing with size: 32 Mb
[   28.315272] Strings selftest: memcpy(dst+0, src+0), distance 0: 75 Mb/s
[   28.736485] Strings selftest: memcpy(dst+0, src+1), distance 1: 75 Mb/s
[   29.156826] Strings selftest: memcpy(dst+0, src+2), distance 2: 76 Mb/s
[   29.576199] Strings selftest: memcpy(dst+0, src+3), distance 3: 76 Mb/s
[   29.994360] Strings selftest: memcpy(dst+0, src+4), distance 4: 76 Mb/s
[   30.411766] Strings selftest: memcpy(dst+0, src+5), distance 5: 76 Mb/s
[   30.828124] Strings selftest: memcpy(dst+0, src+6), distance 6: 76 Mb/s
[   31.243514] Strings selftest: memcpy(dst+0, src+7), distance 7: 76 Mb/s

[...]

[   52.077251] Strings selftest: memcpy(dst+7, src+0), distance 1: 74 Mb/s
[   52.508115] Strings selftest: memcpy(dst+7, src+1), distance 2: 74 Mb/s
[   52.939309] Strings selftest: memcpy(dst+7, src+2), distance 3: 74 Mb/s
[   53.370493] Strings selftest: memcpy(dst+7, src+3), distance 4: 74 Mb/s
[   53.801865] Strings selftest: memcpy(dst+7, src+4), distance 5: 74 Mb/s
[   54.233287] Strings selftest: memcpy(dst+7, src+5), distance 6: 74 Mb/s
[   54.664990] Strings selftest: memcpy(dst+7, src+6), distance 7: 74 Mb/s
[   55.086996] Strings selftest: memcpy(dst+7, src+7), distance 0: 75 Mb/s

[   55.109680] Strings selftest: memset testing with size: 32 Mb
[   55.337873] Strings selftest: memset(dst+0): 140 Mb/s
[   55.565905] Strings selftest: memset(dst+1): 140 Mb/s
[   55.793987] Strings selftest: memset(dst+2): 140 Mb/s
[   56.022140] Strings selftest: memset(dst+3): 140 Mb/s
[   56.250259] Strings selftest: memset(dst+4): 140 Mb/s
[   56.478283] Strings selftest: memset(dst+5): 140 Mb/s
[   56.706296] Strings selftest: memset(dst+6): 140 Mb/s
[   56.934335] Strings selftest: memset(dst+7): 140 Mb/s

While for the proposed one:

[   38.843970] Strings selftest: memcpy testing with size: 32 Mb
[   39.124047] Strings selftest: memcpy(dst+0, src+0), distance 0: 114 Mb/s
[   39.421848] Strings selftest: memcpy(dst+0, src+1), distance 1: 107 Mb/s
[   39.719613] Strings selftest: memcpy(dst+0, src+2), distance 2: 107 Mb/s
[   40.017310] Strings selftest: memcpy(dst+0, src+3), distance 3: 107 Mb/s
[   40.314939] Strings selftest: memcpy(dst+0, src+4), distance 4: 107 Mb/s
[   40.612485] Strings selftest: memcpy(dst+0, src+5), distance 5: 107 Mb/s
[   40.910054] Strings selftest: memcpy(dst+0, src+6), distance 6: 107 Mb/s
[   41.207577] Strings selftest: memcpy(dst+0, src+7), distance 7: 107 Mb/s

[...]

[   55.682517] Strings selftest: memcpy(dst+7, src+0), distance 1: 107 Mb/s
[   55.980262] Strings selftest: memcpy(dst+7, src+1), distance 2: 107 Mb/s
[   56.277862] Strings selftest: memcpy(dst+7, src+2), distance 3: 107 Mb/s
[   56.575514] Strings selftest: memcpy(dst+7, src+3), distance 4: 107 Mb/s
[   56.873142] Strings selftest: memcpy(dst+7, src+4), distance 5: 107 Mb/s
[   57.170840] Strings selftest: memcpy(dst+7, src+5), distance 6: 107 Mb/s
[   57.468553] Strings selftest: memcpy(dst+7, src+6), distance 7: 107 Mb/s
[   57.748231] Strings selftest: memcpy(dst+7, src+7), distance 0: 114 Mb/s

[   57.772721] Strings selftest: memset testing with size: 32 Mb
[   57.905358] Strings selftest: memset(dst+0): 241 Mb/s
[   58.037974] Strings selftest: memset(dst+1): 241 Mb/s
[   58.170619] Strings selftest: memset(dst+2): 241 Mb/s
[   58.303228] Strings selftest: memset(dst+3): 241 Mb/s
[   58.435808] Strings selftest: memset(dst+4): 241 Mb/s
[   58.568373] Strings selftest: memset(dst+5): 241 Mb/s
[   58.700968] Strings selftest: memset(dst+6): 241 Mb/s
[   58.833499] Strings selftest: memset(dst+7): 241 Mb/s

Anyway, I have to submit a v2 because the current one fails to build on
compilers which can't understand that byte_long is constant:

lib/string.c:52:39: error: initializer element is not constant
 static const unsigned int word_mask = bytes_long - 1;
                                       ^~~~~~~~~~

Regards,
-- 
per aspera ad upstream

WARNING: multiple messages have this Message-ID (diff)
From: Matteo Croce <mcroce@linux.microsoft.com>
To: Nick Desaulniers <ndesaulniers@google.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Nick Kossifidis <mick@ics.forth.gr>, Guo Ren <guoren@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	David Laight <David.Laight@aculab.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Emil Renner Berthing <kernel@esmil.dk>,
	Drew Fustini <drew@beagleboard.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-riscv <linux-riscv@lists.infradead.org>
Subject: Re: [PATCH 0/3] lib/string: optimized mem* functions
Date: Thu, 1 Jul 2021 02:29:55 +0200	[thread overview]
Message-ID: <20210701022955.61ecb657@linux.microsoft.com> (raw)
In-Reply-To: <CAFnufp160uvpqNowZpDdOY4bSN1NJ5j81nYcX=9+rW9WZHfXcQ@mail.gmail.com>

On Sun, 27 Jun 2021 02:19:59 +0200
Matteo Croce <mcroce@linux.microsoft.com> wrote:

> On Fri, Jun 25, 2021 at 7:45 PM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > On Thu, Jun 24, 2021 at 6:02 PM Matteo Croce
> > <mcroce@linux.microsoft.com> wrote:
> > >
> > > From: Matteo Croce <mcroce@microsoft.com>
> > >
> > > Rewrite the generic mem{cpy,move,set} so that memory is accessed
> > > with the widest size possible, but without doing unaligned
> > > accesses.
> > >
> > > This was originally posted as C string functions for RISC-V[1],
> > > but as there was no specific RISC-V code, it was proposed for the
> > > generic lib/string.c implementation.
> > >
> > > Tested on RISC-V and on x86_64 by undefining
> > > __HAVE_ARCH_MEM{CPY,SET,MOVE} and HAVE_EFFICIENT_UNALIGNED_ACCESS.
> > >
> > > Further testing on big endian machines will be appreciated, as I
> > > don't have such hardware at the moment.
> >
> > Hi Matteo,
> > Neat patches.  Do you have you any benchmark data showing the
> > claimed improvements? Is it worthwhile to define these only when
> > CC_OPTIMIZE_FOR_PERFORMANCE/CC_OPTIMIZE_FOR_PERFORMANCE_O3 are
> > defined, not CC_OPTIMIZE_FOR_SIZE? I'd be curious to know the delta
> > in ST_SIZE of these functions otherwise.
> >
> 
> I compared the current versions with the new one with bloat-o-meter,
> the kernel grows by ~400 bytes on x86_64 and RISC-V
> 
> x86_64
> 
> $ scripts/bloat-o-meter vmlinux.orig vmlinux
> add/remove: 0/0 grow/shrink: 4/1 up/down: 427/-6 (421)
> Function                                     old     new   delta
> memcpy                                        29     351    +322
> memset                                        29     117     +88
> strlcat                                       68      78     +10
> strlcpy                                       50      57      +7
> memmove                                       56      50      -6
> Total: Before=8556964, After=8557385, chg +0.00%
> 
> RISC-V
> 
> $ scripts/bloat-o-meter vmlinux.orig vmlinux
> add/remove: 0/0 grow/shrink: 4/2 up/down: 432/-36 (396)
> Function                                     old     new   delta
> memcpy                                        36     324    +288
> memset                                        32     148    +116
> strlcpy                                      116     132     +16
> strscpy_pad                                   84      96     +12
> strlcat                                      176     164     -12
> memmove                                       76      52     -24
> Total: Before=1225371, After=1225767, chg +0.03%
> 
> I will post benchmarks made on a RISC-V machine which can't handle
> unaligned accesses, and it will be the first user of the new
> functions.
> 
> > For big endian, you ought to be able to boot test in QEMU.  I think
> > you'd find out pretty quickly if any of the above had issues.
> > (Enabling KASAN is probably also a good idea for a test, too). Check
> > out
> > https://github.com/ClangBuiltLinux/boot-utils
> > for ready made images and scripts for launching various
> > architectures and endiannesses.
> >
> 
> Will do!
> 

I finally made the benchmarks on RISC-V.

The current byte-at-time memcpy() always copy 74 Mbyte/sec, no matter
the alignment of the buffers, while my implementation do 114 Mb/s on
two aligned buffers and 107 Mb/s when need to shift and merge words.

For memset(), the current byte-at-time implementation always writes 140
Mb/s, my word-wise implementation always copies 241 Mb/s. Both
implementations have the same performance with aligned and unaligned
buffers.

The memcpy() test is a simple loop which calls memcpy with different
offsets on a 32MB buffer:

#define NUM_PAGES	(1 << (MAX_ORDER + 2))
#define PG_SIZE		(PAGE_SIZE * NUM_PAGES)


	page1 = alloc_contig_pages(NUM_PAGES, GFP_KERNEL, NUMA_NO_NODE, 0);
	page2 = alloc_contig_pages(NUM_PAGES, GFP_KERNEL, NUMA_NO_NODE, 0);

	src = page_to_virt(page1);
	dst = page_to_virt(page2);

	for (i = 0; i < sizeof(void*); i++) {
		for (j = 0; j < sizeof(void*); j++) {
			t0 = ktime_get();
			memcpy(dst + i, src + j, PG_SIZE - max(i, j));
			t1 = ktime_get();
			printk("Strings selftest: memcpy(dst+%d, src+%d), distance %lu: %llu Mb/s\n",
			       i, j, (j - i) % sizeof(long),
			       PG_SIZE * (1000000000l / 1048576l) / (t1-t0));
		}
		printk("\n");
	}

Similarly, the memset() one:

	page = alloc_contig_pages(NUM_PAGES, GFP_KERNEL, NUMA_NO_NODE, 0);
	dst = page_to_virt(page);

	for (i = 0; i < sizeof(void*); i++) {
		t0 = ktime_get();
		memset(dst + i, 0, PG_SIZE - i);
		t1 = ktime_get();
		printk("Strings selftest: memset(dst+%d): %llu Mb/s\n",
		       i, PG_SIZE * (1000000000l / 1048576l) / (t1-t0));
	}

The results for the current one are:

[   27.893931] Strings selftest: memcpy testing with size: 32 Mb
[   28.315272] Strings selftest: memcpy(dst+0, src+0), distance 0: 75 Mb/s
[   28.736485] Strings selftest: memcpy(dst+0, src+1), distance 1: 75 Mb/s
[   29.156826] Strings selftest: memcpy(dst+0, src+2), distance 2: 76 Mb/s
[   29.576199] Strings selftest: memcpy(dst+0, src+3), distance 3: 76 Mb/s
[   29.994360] Strings selftest: memcpy(dst+0, src+4), distance 4: 76 Mb/s
[   30.411766] Strings selftest: memcpy(dst+0, src+5), distance 5: 76 Mb/s
[   30.828124] Strings selftest: memcpy(dst+0, src+6), distance 6: 76 Mb/s
[   31.243514] Strings selftest: memcpy(dst+0, src+7), distance 7: 76 Mb/s

[...]

[   52.077251] Strings selftest: memcpy(dst+7, src+0), distance 1: 74 Mb/s
[   52.508115] Strings selftest: memcpy(dst+7, src+1), distance 2: 74 Mb/s
[   52.939309] Strings selftest: memcpy(dst+7, src+2), distance 3: 74 Mb/s
[   53.370493] Strings selftest: memcpy(dst+7, src+3), distance 4: 74 Mb/s
[   53.801865] Strings selftest: memcpy(dst+7, src+4), distance 5: 74 Mb/s
[   54.233287] Strings selftest: memcpy(dst+7, src+5), distance 6: 74 Mb/s
[   54.664990] Strings selftest: memcpy(dst+7, src+6), distance 7: 74 Mb/s
[   55.086996] Strings selftest: memcpy(dst+7, src+7), distance 0: 75 Mb/s

[   55.109680] Strings selftest: memset testing with size: 32 Mb
[   55.337873] Strings selftest: memset(dst+0): 140 Mb/s
[   55.565905] Strings selftest: memset(dst+1): 140 Mb/s
[   55.793987] Strings selftest: memset(dst+2): 140 Mb/s
[   56.022140] Strings selftest: memset(dst+3): 140 Mb/s
[   56.250259] Strings selftest: memset(dst+4): 140 Mb/s
[   56.478283] Strings selftest: memset(dst+5): 140 Mb/s
[   56.706296] Strings selftest: memset(dst+6): 140 Mb/s
[   56.934335] Strings selftest: memset(dst+7): 140 Mb/s

While for the proposed one:

[   38.843970] Strings selftest: memcpy testing with size: 32 Mb
[   39.124047] Strings selftest: memcpy(dst+0, src+0), distance 0: 114 Mb/s
[   39.421848] Strings selftest: memcpy(dst+0, src+1), distance 1: 107 Mb/s
[   39.719613] Strings selftest: memcpy(dst+0, src+2), distance 2: 107 Mb/s
[   40.017310] Strings selftest: memcpy(dst+0, src+3), distance 3: 107 Mb/s
[   40.314939] Strings selftest: memcpy(dst+0, src+4), distance 4: 107 Mb/s
[   40.612485] Strings selftest: memcpy(dst+0, src+5), distance 5: 107 Mb/s
[   40.910054] Strings selftest: memcpy(dst+0, src+6), distance 6: 107 Mb/s
[   41.207577] Strings selftest: memcpy(dst+0, src+7), distance 7: 107 Mb/s

[...]

[   55.682517] Strings selftest: memcpy(dst+7, src+0), distance 1: 107 Mb/s
[   55.980262] Strings selftest: memcpy(dst+7, src+1), distance 2: 107 Mb/s
[   56.277862] Strings selftest: memcpy(dst+7, src+2), distance 3: 107 Mb/s
[   56.575514] Strings selftest: memcpy(dst+7, src+3), distance 4: 107 Mb/s
[   56.873142] Strings selftest: memcpy(dst+7, src+4), distance 5: 107 Mb/s
[   57.170840] Strings selftest: memcpy(dst+7, src+5), distance 6: 107 Mb/s
[   57.468553] Strings selftest: memcpy(dst+7, src+6), distance 7: 107 Mb/s
[   57.748231] Strings selftest: memcpy(dst+7, src+7), distance 0: 114 Mb/s

[   57.772721] Strings selftest: memset testing with size: 32 Mb
[   57.905358] Strings selftest: memset(dst+0): 241 Mb/s
[   58.037974] Strings selftest: memset(dst+1): 241 Mb/s
[   58.170619] Strings selftest: memset(dst+2): 241 Mb/s
[   58.303228] Strings selftest: memset(dst+3): 241 Mb/s
[   58.435808] Strings selftest: memset(dst+4): 241 Mb/s
[   58.568373] Strings selftest: memset(dst+5): 241 Mb/s
[   58.700968] Strings selftest: memset(dst+6): 241 Mb/s
[   58.833499] Strings selftest: memset(dst+7): 241 Mb/s

Anyway, I have to submit a v2 because the current one fails to build on
compilers which can't understand that byte_long is constant:

lib/string.c:52:39: error: initializer element is not constant
 static const unsigned int word_mask = bytes_long - 1;
                                       ^~~~~~~~~~

Regards,
-- 
per aspera ad upstream

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

  reply	other threads:[~2021-07-01  0:30 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-25  1:01 [PATCH 0/3] lib/string: optimized mem* functions Matteo Croce
2021-06-25  1:01 ` Matteo Croce
2021-06-25  1:01 ` [PATCH 1/3] lib/string: optimized memcpy Matteo Croce
2021-06-25  1:01   ` Matteo Croce
2021-06-25  5:01   ` kernel test robot
2021-06-25  5:01     ` kernel test robot
2021-06-25 14:41   ` kernel test robot
2021-06-25 14:41     ` kernel test robot
2021-06-25  1:01 ` [PATCH 2/3] lib/string: optimized memmove Matteo Croce
2021-06-25  1:01   ` Matteo Croce
2021-06-25  1:02 ` [PATCH 3/3] lib/string: optimized memset Matteo Croce
2021-06-25  1:02   ` Matteo Croce
2021-06-25 17:45 ` [PATCH 0/3] lib/string: optimized mem* functions Nick Desaulniers
2021-06-25 17:45   ` Nick Desaulniers
2021-06-27  0:19   ` Matteo Croce
2021-06-27  0:19     ` Matteo Croce
2021-07-01  0:29     ` Matteo Croce [this message]
2021-07-01  0:29       ` Matteo Croce

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210701022955.61ecb657@linux.microsoft.com \
    --to=mcroce@linux.microsoft.com \
    --cc=David.Laight@aculab.com \
    --cc=akpm@linux-foundation.org \
    --cc=drew@beagleboard.org \
    --cc=guoren@kernel.org \
    --cc=hch@infradead.org \
    --cc=kernel@esmil.dk \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=mick@ics.forth.gr \
    --cc=ndesaulniers@google.com \
    --cc=palmer@dabbelt.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.