From: "Sébastien Dugué" <sebastien.dugue@bull.net>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Andi Kleen <andi@firstfloor.org>, <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
<x86@kernel.org>
Subject: Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
Date: Tue, 15 Oct 2013 09:12:51 +0200
Message-ID: <20131015091251.2345b918@b012350-ux>
In-Reply-To: <20131014202528.GG26880@hmsreliant.think-freely.org>
Hi Neil, Andi,
On Mon, 14 Oct 2013 16:25:28 -0400
Neil Horman <nhorman@tuxdriver.com> wrote:
> On Sun, Oct 13, 2013 at 09:38:33PM -0700, Andi Kleen wrote:
> > Neil Horman <nhorman@tuxdriver.com> writes:
> >
> > > Sébastien Dugué reported to me that devices implementing ipoib (which don't have
> > > checksum offload hardware were spending a significant amount of time computing
> >
> > Must be an odd workload, most TCP/UDP workloads do copy-checksum
> > anyways. I would rather investigate why that doesn't work.
> >
> FWIW, the reporter was reporting this using an IP over Infiniband network.
> Neil
Indeed, our typical workload is connected mode IPoIB on mlx4 QDR hardware
where one cannot benefit from hardware offloads.

For a bit of background on the issue:

It all started nearly 3 years ago when trying to understand why IPoIB BW was
so low in our setups and why ksoftirqd used 100% of one CPU. A kernel profile
trace showed that the CPU spent most of its time in checksum computation (from
the only old trace I managed to unearth):

  Function                        Hit    Time               Avg
  --------                        ---    ----               ---
  schedule                       1730    629976998 us       364148.5 us
  csum_partial               10813465     20944414 us            1.936 us
  mwait_idle_with_hints          1451      9858861 us         6794.529 us
  get_page_from_freelist     10110434      8120524 us            0.803 us
  alloc_pages_current        10093675      5180650 us            0.513 us
  __phys_addr                35554783      4471387 us            0.125 us
  zone_statistics            10110434      4360871 us            0.431 us
  ipoib_cm_alloc_rx_skb        673899      4343949 us            6.445 us

After recoding the checksum to use 2 ALUs, csum_partial() disappeared
from the tracer radar. IPoIB BW went from ~12 Gb/s to ~20 Gb/s and ksoftirqd load
dropped drastically. Sorry, I could not locate my old traces and
results; those seem to have been lost in the mists of time.

I did some micro-benchmarking (dirty hack code below) of different solutions.
It looks like processing 128-byte blocks in 4 chains gives the best performance,
but there are plenty of other possibilities.

FWIW, this code has been running as is at our customers' sites for 3 years now.
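
To make the idea of independent chains concrete without inline asm, here is a
minimal portable sketch (it is not the code from the benchmark below). It sums
32-bit words into 64-bit accumulators, so carries are kept in the upper bits
and folded at the end. csum_1chain() has a single dependency chain, while
csum_4chains() has four that a superscalar CPU may be able to execute in
parallel. The names fold64, load_tail, csum_1chain, csum_4chains and
check_chains_agree are made up for illustration, and the sketch assumes
buffers well under 2^32 words so the accumulators cannot overflow. It says
nothing about actual speed, which depends on the CPU and the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit accumulator to 16 bits using end-around carry. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffu) + (sum >> 16);
	sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* Load the 0..3 trailing bytes as one zero-padded 32-bit word. */
static uint32_t load_tail(const unsigned char *p, size_t n)
{
	uint32_t w = 0;

	memcpy(&w, p, n);
	return w;
}

/* One accumulator: every addition depends on the previous one. */
static uint16_t csum_1chain(const unsigned char *p, size_t len)
{
	uint64_t sum = 0;
	uint32_t w;
	size_t i;

	for (i = 0; i + 4 <= len; i += 4) {
		memcpy(&w, p + i, 4);
		sum += w;
	}
	return fold64(sum + load_tail(p + i, len - i));
}

/* Four accumulators: four independent chains, merged after the loop. */
static uint16_t csum_4chains(const unsigned char *p, size_t len)
{
	uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0, sum;
	uint32_t w[4];
	size_t i;

	for (i = 0; i + 16 <= len; i += 16) {
		memcpy(w, p + i, 16);
		s0 += w[0];
		s1 += w[1];
		s2 += w[2];
		s3 += w[3];
	}
	sum = s0 + s1 + s2 + s3;

	/* Remaining whole 32-bit words, then the last 0..3 bytes. */
	for (; i + 4 <= len; i += 4) {
		memcpy(&w[0], p + i, 4);
		sum += w[0];
	}
	return fold64(sum + load_tail(p + i, len - i));
}

/* Hypothetical self-check: both variants must agree for every length. */
static void check_chains_agree(void)
{
	unsigned char data[257];
	size_t i, len;

	for (i = 0; i < sizeof(data); i++)
		data[i] = (unsigned char)(i * 37 + 11);
	for (len = 0; len <= sizeof(data); len++)
		assert(csum_1chain(data, len) == csum_4chains(data, len));
}
```

Calling check_chains_agree() compares the two variants for every length from
0 to 257 bytes. Both compute the same integer sum in a different order, so
the folded results are expected to be identical.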
Sébastien.
>
> > That said the change looks reasonable, but may not fix the root cause.
> >
> > -Andi
> >
> > --
> > ak@linux.intel.com -- Speaking for myself only
> >
8<----------------------------------------------------------------------
/*
* gcc -Wall -O3 -o csum_test csum_test.c -lrt
*/
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <string.h>
#include <errno.h>
#define __force
#define unlikely(x) (x)
typedef uint32_t u32;
typedef uint16_t u16;
typedef u16 __sum16;
typedef u32 __wsum;
#define NUM_LOOPS 100000
#define BUF_LEN 65536
unsigned char buf[BUF_LEN];
/*
* csum_fold - Fold and invert a 32bit checksum.
* sum: 32bit unfolded sum
*
* Fold a 32bit running checksum to 16bit and invert it. This is usually
* the last step before putting a checksum into a packet.
* Make sure not to mix with 64bit checksums.
*/
static inline __sum16 csum_fold(__wsum sum)
{
asm(" addl %1,%0\n"
" adcl $0xffff,%0"
: "=r" (sum)
: "r" ((__force u32)sum << 16),
"0" ((__force u32)sum & 0xffff0000));
return (__force __sum16)(~(__force u32)sum >> 16);
}
static inline unsigned short from32to16(unsigned a)
{
unsigned short b = a >> 16;
asm("addw %w2,%w0\n\t"
"adcw $0,%w0\n"
: "=r" (b)
: "0" (b), "r" (a));
return b;
}
static inline unsigned add32_with_carry(unsigned a, unsigned b)
{
asm("addl %2,%0\n\t"
"adcl $0,%0"
: "=r" (a)
: "0" (a), "r" (b));
return a;
}
/*
* Do a 64-bit checksum on an arbitrary memory area.
* Returns a 32bit checksum.
*
* This isn't as time critical as it used to be because many NICs
* do hardware checksumming these days.
*
* Things tried and found to not make it faster:
* Manual Prefetching
* Unrolling to an 128 bytes inner loop.
* Using interleaving with more registers to break the carry chains.
*/
static unsigned do_csum(const unsigned char *buff, unsigned len)
{
unsigned odd, count;
unsigned long result = 0;
if (unlikely(len == 0))
return result;
odd = 1 & (unsigned long) buff;
if (unlikely(odd)) {
result = *buff << 8;
len--;
buff++;
}
count = len >> 1; /* nr of 16-bit words.. */
if (count) {
if (2 & (unsigned long) buff) {
result += *(unsigned short *)buff;
count--;
len -= 2;
buff += 2;
}
count >>= 1; /* nr of 32-bit words.. */
if (count) {
unsigned long zero;
unsigned count64;
if (4 & (unsigned long) buff) {
result += *(unsigned int *) buff;
count--;
len -= 4;
buff += 4;
}
count >>= 1; /* nr of 64-bit words.. */
/* main loop using 64byte blocks */
zero = 0;
count64 = count >> 3;
while (count64) {
asm("addq 0*8(%[src]),%[res]\n\t"
"adcq 1*8(%[src]),%[res]\n\t"
"adcq 2*8(%[src]),%[res]\n\t"
"adcq 3*8(%[src]),%[res]\n\t"
"adcq 4*8(%[src]),%[res]\n\t"
"adcq 5*8(%[src]),%[res]\n\t"
"adcq 6*8(%[src]),%[res]\n\t"
"adcq 7*8(%[src]),%[res]\n\t"
"adcq %[zero],%[res]"
: [res] "=r" (result)
: [src] "r" (buff), [zero] "r" (zero),
"[res]" (result));
buff += 64;
count64--;
}
/* printf("csum %lx\n", result); */
/* last up to 7 8-byte blocks */
count %= 8;
while (count) {
asm("addq %1,%0\n\t"
"adcq %2,%0\n"
: "=r" (result)
: "m" (*(unsigned long *)buff),
"r" (zero), "0" (result));
--count;
buff += 8;
}
result = add32_with_carry(result>>32,
result&0xffffffff);
if (len & 4) {
result += *(unsigned int *) buff;
buff += 4;
}
}
if (len & 2) {
result += *(unsigned short *) buff;
buff += 2;
}
}
if (len & 1)
result += *buff;
result = add32_with_carry(result>>32, result & 0xffffffff);
if (unlikely(odd)) {
result = from32to16(result);
result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
}
return result;
}
static unsigned do_csum1(const unsigned char *buff, unsigned len)
{
unsigned odd, count;
unsigned long result1 = 0;
unsigned long result2 = 0;
unsigned long result = 0;
if (unlikely(len == 0))
return result;
odd = 1 & (unsigned long) buff;
if (unlikely(odd)) {
result = *buff << 8;
len--;
buff++;
}
count = len >> 1; /* nr of 16-bit words.. */
if (count) {
if (2 & (unsigned long) buff) {
result += *(unsigned short *)buff;
count--;
len -= 2;
buff += 2;
}
count >>= 1; /* nr of 32-bit words.. */
if (count) {
unsigned long zero;
unsigned count64;
if (4 & (unsigned long) buff) {
result += *(unsigned int *) buff;
count--;
len -= 4;
buff += 4;
}
count >>= 1; /* nr of 64-bit words.. */
/* main loop using 64byte blocks */
zero = 0;
count64 = count >> 3;
while (count64) {
asm("addq 0*8(%[src]),%[res1]\n\t"
"adcq 2*8(%[src]),%[res1]\n\t"
"adcq 4*8(%[src]),%[res1]\n\t"
"adcq 6*8(%[src]),%[res1]\n\t"
"adcq %[zero],%[res1]\n\t"
"addq 1*8(%[src]),%[res2]\n\t"
"adcq 3*8(%[src]),%[res2]\n\t"
"adcq 5*8(%[src]),%[res2]\n\t"
"adcq 7*8(%[src]),%[res2]\n\t"
"adcq %[zero],%[res2]"
: [res1] "=r" (result1),
[res2] "=r" (result2)
: [src] "r" (buff), [zero] "r" (zero),
"[res1]" (result1), "[res2]" (result2));
buff += 64;
count64--;
}
asm("addq %[res1],%[res]\n\t"
"adcq %[res2],%[res]\n\t"
"adcq %[zero],%[res]"
: [res] "=r" (result)
: [res1] "r" (result1),
[res2] "r" (result2),
[zero] "r" (zero),
"0" (result));
/* last up to 7 8-byte blocks */
count %= 8;
while (count) {
asm("addq %1,%0\n\t"
"adcq %2,%0\n"
: "=r" (result)
: "m" (*(unsigned long *)buff),
"r" (zero), "0" (result));
--count;
buff += 8;
}
result = add32_with_carry(result>>32,
result&0xffffffff);
if (len & 4) {
result += *(unsigned int *) buff;
buff += 4;
}
}
if (len & 2) {
result += *(unsigned short *) buff;
buff += 2;
}
}
if (len & 1)
result += *buff;
result = add32_with_carry(result>>32, result & 0xffffffff);
if (unlikely(odd)) {
result = from32to16(result);
result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
}
return result;
}
static unsigned do_csum2(const unsigned char *buff, unsigned len)
{
unsigned odd, count;
unsigned long result1 = 0;
unsigned long result2 = 0;
unsigned long result3 = 0;
unsigned long result4 = 0;
unsigned long result = 0;
if (unlikely(len == 0))
return result;
odd = 1 & (unsigned long) buff;
if (unlikely(odd)) {
result = *buff << 8;
len--;
buff++;
}
count = len >> 1; /* nr of 16-bit words.. */
if (count) {
if (2 & (unsigned long) buff) {
result += *(unsigned short *)buff;
count--;
len -= 2;
buff += 2;
}
count >>= 1; /* nr of 32-bit words.. */
if (count) {
if (4 & (unsigned long) buff) {
result += *(unsigned int *) buff;
count--;
len -= 4;
buff += 4;
}
count >>= 1; /* nr of 64-bit words.. */
if (count) {
unsigned long zero = 0;
unsigned count128;
if (8 & (unsigned long) buff) {
asm("addq %1,%0\n\t"
"adcq %2,%0\n"
: "=r" (result)
: "m" (*(unsigned long *)buff),
"r" (zero), "0" (result));
count--;
buff += 8;
}
/* main loop using 128 byte blocks */
count128 = count >> 4;
while (count128) {
asm("addq 0*8(%[src]),%[res1]\n\t"
"adcq 4*8(%[src]),%[res1]\n\t"
"adcq 8*8(%[src]),%[res1]\n\t"
"adcq 12*8(%[src]),%[res1]\n\t"
"adcq %[zero],%[res1]\n\t"
"addq 1*8(%[src]),%[res2]\n\t"
"adcq 5*8(%[src]),%[res2]\n\t"
"adcq 9*8(%[src]),%[res2]\n\t"
"adcq 13*8(%[src]),%[res2]\n\t"
"adcq %[zero],%[res2]\n\t"
"addq 2*8(%[src]),%[res3]\n\t"
"adcq 6*8(%[src]),%[res3]\n\t"
"adcq 10*8(%[src]),%[res3]\n\t"
"adcq 14*8(%[src]),%[res3]\n\t"
"adcq %[zero],%[res3]\n\t"
"addq 3*8(%[src]),%[res4]\n\t"
"adcq 7*8(%[src]),%[res4]\n\t"
"adcq 11*8(%[src]),%[res4]\n\t"
"adcq 15*8(%[src]),%[res4]\n\t"
"adcq %[zero],%[res4]"
: [res1] "=r" (result1),
[res2] "=r" (result2),
[res3] "=r" (result3),
[res4] "=r" (result4)
: [src] "r" (buff),
[zero] "r" (zero),
"[res1]" (result1),
"[res2]" (result2),
"[res3]" (result3),
"[res4]" (result4));
buff += 128;
count128--;
}
asm("addq %[res1],%[res]\n\t"
"adcq %[res2],%[res]\n\t"
"adcq %[res3],%[res]\n\t"
"adcq %[res4],%[res]\n\t"
"adcq %[zero],%[res]"
: [res] "=r" (result)
: [res1] "r" (result1),
[res2] "r" (result2),
[res3] "r" (result3),
[res4] "r" (result4),
[zero] "r" (zero),
"0" (result));
/* last up to 15 8-byte blocks */
count %= 16;
while (count) {
asm("addq %1,%0\n\t"
"adcq %2,%0\n"
: "=r" (result)
: "m" (*(unsigned long *)buff),
"r" (zero), "0" (result));
--count;
buff += 8;
}
result = add32_with_carry(result>>32,
result&0xffffffff);
/*
 * No separate (len & 8) step here: the loop above already
 * consumed every remaining 8-byte word, so another 8-byte
 * read would run past the data.
 */
}
if (len & 4) {
result += *(unsigned int *) buff;
buff += 4;
}
}
if (len & 2) {
result += *(unsigned short *) buff;
buff += 2;
}
}
if (len & 1)
result += *buff;
result = add32_with_carry(result>>32, result & 0xffffffff);
if (unlikely(odd)) {
result = from32to16(result);
result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
}
return result;
}
static unsigned do_csum3(const unsigned char *buff, unsigned len)
{
unsigned odd, count;
unsigned long result1 = 0;
unsigned long result2 = 0;
unsigned long result3 = 0;
unsigned long result4 = 0;
unsigned long result = 0;
if (unlikely(len == 0))
return result;
odd = 1 & (unsigned long) buff;
if (unlikely(odd)) {
result = *buff << 8;
len--;
buff++;
}
count = len >> 1; /* nr of 16-bit words.. */
if (count) {
if (2 & (unsigned long) buff) {
result += *(unsigned short *)buff;
count--;
len -= 2;
buff += 2;
}
count >>= 1; /* nr of 32-bit words.. */
if (count) {
unsigned long zero;
unsigned count64;
if (4 & (unsigned long) buff) {
result += *(unsigned int *) buff;
count--;
len -= 4;
buff += 4;
}
count >>= 1; /* nr of 64-bit words.. */
/* main loop using 64byte blocks */
zero = 0;
count64 = count >> 3;
while (count64) {
asm("addq 0*8(%[src]),%[res1]\n\t"
"adcq 4*8(%[src]),%[res1]\n\t"
"adcq %[zero],%[res1]\n\t"
"addq 1*8(%[src]),%[res2]\n\t"
"adcq 5*8(%[src]),%[res2]\n\t"
"adcq %[zero],%[res2]\n\t"
"addq 2*8(%[src]),%[res3]\n\t"
"adcq 6*8(%[src]),%[res3]\n\t"
"adcq %[zero],%[res3]\n\t"
"addq 3*8(%[src]),%[res4]\n\t"
"adcq 7*8(%[src]),%[res4]\n\t"
"adcq %[zero],%[res4]\n\t"
: [res1] "=r" (result1),
[res2] "=r" (result2),
[res3] "=r" (result3),
[res4] "=r" (result4)
: [src] "r" (buff),
[zero] "r" (zero),
"[res1]" (result1),
"[res2]" (result2),
"[res3]" (result3),
"[res4]" (result4));
buff += 64;
count64--;
}
asm("addq %[res1],%[res]\n\t"
"adcq %[res2],%[res]\n\t"
"adcq %[res3],%[res]\n\t"
"adcq %[res4],%[res]\n\t"
"adcq %[zero],%[res]"
: [res] "=r" (result)
: [res1] "r" (result1),
[res2] "r" (result2),
[res3] "r" (result3),
[res4] "r" (result4),
[zero] "r" (zero),
"0" (result));
/* printf("csum1 %lx\n", result); */
/* last up to 7 8-byte blocks */
count %= 8;
while (count) {
asm("addq %1,%0\n\t"
"adcq %2,%0\n"
: "=r" (result)
: "m" (*(unsigned long *)buff),
"r" (zero), "0" (result));
--count;
buff += 8;
}
result = add32_with_carry(result>>32,
result&0xffffffff);
if (len & 4) {
result += *(unsigned int *) buff;
buff += 4;
}
}
if (len & 2) {
result += *(unsigned short *) buff;
buff += 2;
}
}
if (len & 1)
result += *buff;
result = add32_with_carry(result>>32, result & 0xffffffff);
if (unlikely(odd)) {
result = from32to16(result);
result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
}
return result;
}
static long long delta_ns(const struct timespec *t1, const struct timespec *t2)
{
/* Force 64-bit arithmetic so tv_sec * 1e9 cannot overflow a 32-bit long. */
long long tt1 = t1->tv_sec * 1000000000LL + t1->tv_nsec;
long long tt2 = t2->tv_sec * 1000000000LL + t2->tv_nsec;
return tt2 - tt1;
}
int main(int argc, char **argv)
{
FILE *f;
unsigned csum1, csum2, csum3, csum4;
struct timespec t1;
struct timespec t2;
double delta;
int i;
unsigned int offset = 0;
unsigned char *ptr;
unsigned int size;
if ((f = fopen("data.bin", "rb")) == NULL) {
printf("Failed to open input file data.bin: %s\n",
strerror(errno));
return -1;
}
if (fread(buf, 1, BUF_LEN, f) != BUF_LEN) {
printf("Failed to read data.bin: %s\n",
strerror(errno));
fclose(f);
return -1;
}
fclose(f);
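if (argc > 1)
offset = atoi(argv[1]);
/* Reject offsets outside the buffer; size would underflow otherwise. */
if (offset >= BUF_LEN) {
printf("Offset must be in [0, %d)\n", BUF_LEN);
return -1;
}
printf("Using offset=%u\n", offset);
ptr = &buf[offset];
size = BUF_LEN - offset;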
clock_gettime(CLOCK_MONOTONIC, &t1);
for (i = 0; i < NUM_LOOPS; i++)
csum1 = do_csum((const unsigned char *)ptr, size);
clock_gettime(CLOCK_MONOTONIC, &t2);
delta = (double)delta_ns(&t1, &t2)/1000.0;
printf("Original: %.8x %f us\n",
csum1, (double)delta/(double)NUM_LOOPS);
clock_gettime(CLOCK_MONOTONIC, &t1);
for (i = 0; i < NUM_LOOPS; i++)
csum2 = do_csum1((const unsigned char *)ptr, size);
clock_gettime(CLOCK_MONOTONIC, &t2);
delta = (double)delta_ns(&t1, &t2)/1000.0;
printf("64B Split2: %.8x %f us\n",
csum2, (double)delta/(double)NUM_LOOPS);
clock_gettime(CLOCK_MONOTONIC, &t1);
for (i = 0; i < NUM_LOOPS; i++)
csum3 = do_csum2((const unsigned char *)ptr, size);
clock_gettime(CLOCK_MONOTONIC, &t2);
delta = (double)delta_ns(&t1, &t2)/1000.0;
printf("128B Split4: %.8x %f us\n",
csum3, (double)delta/(double)NUM_LOOPS);
clock_gettime(CLOCK_MONOTONIC, &t1);
for (i = 0; i < NUM_LOOPS; i++)
csum4 = do_csum3((const unsigned char *)ptr, size);
clock_gettime(CLOCK_MONOTONIC, &t2);
delta = (double)delta_ns(&t1, &t2)/1000.0;
printf("64B Split4: %.8x %f us\n",
csum4, (double)delta/(double)NUM_LOOPS);
if ((csum1 != csum2) || (csum1 != csum3) || (csum1 != csum4))
printf("Wrong checksum\n");
return 0;
}
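
The final check in main() only compares the four variants with each other, so
a mistake common to all of them would not be noticed. A slow byte-at-a-time
reference can serve as an independent cross-check. The sketch below is an
assumption about how one might do that and is not part of the original test:
ref_csum16, fold32 and ref_selftest are hypothetical names, and it assumes a
little-endian host (as on x86), where fold32(do_csum(ptr, size)) would be
expected to equal ref_csum16(ptr, size).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit partial sum to 16 bits with end-around carry. */
static uint16_t fold32(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * One's-complement sum of the buffer taken as little-endian 16-bit
 * words, built one byte at a time so alignment never matters.
 * A trailing odd byte is treated as the low byte of a final word.
 */
static uint16_t ref_csum16(const unsigned char *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		sum += (uint32_t)p[i] | ((uint32_t)p[i + 1] << 8);
		/* Fold as we go so the 32-bit sum cannot overflow. */
		sum = (sum & 0xffff) + (sum >> 16);
	}
	if (i < len)
		sum += p[i];
	return fold32(sum);
}

/* Hypothetical self-test with hand-computed values. */
static void ref_selftest(void)
{
	static const unsigned char a[] = { 0x01, 0x02, 0x03, 0x04 };
	static const unsigned char b[] = { 0xff, 0xff, 0x01, 0x00 };

	assert(ref_csum16(a, 4) == 0x0604);	/* 0x0201 + 0x0403 */
	assert(ref_csum16(a, 3) == 0x0204);	/* 0x0201 + 0x03 */
	assert(ref_csum16(b, 4) == 0x0001);	/* 0xffff + 1 wraps to 1 */
}
```

Since this reference never dereferences anything wider than a byte, it gives
the same answer for any starting offset, which makes it convenient for
testing the odd-address paths of the optimized variants.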