netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net/core/flow.c: compare data with memcmp
@ 2006-12-31 16:37 Daniel Marjamäki
  2006-12-31 19:49 ` [KJ] " Nishanth Aravamudan
  2006-12-31 20:37 ` David Miller
  0 siblings, 2 replies; 7+ messages in thread
From: Daniel Marjamäki @ 2006-12-31 16:37 UTC (permalink / raw)
  To: netdev; +Cc: kernel-janitors, linux-kernel

From: Daniel Marjamäki
This has been tested by me.
Signed-off-by: Daniel Marjamäki <daniel.marjamaki@gmail.com>
--- linux-2.6.20-rc2/net/core/flow.c	2006-12-27 09:59:56.000000000 +0100
+++ linux/net/core/flow.c	2006-12-31 18:26:06.000000000 +0100
@@ -144,29 +144,16 @@ typedef u32 flow_compare_t;

 extern void flowi_is_missized(void);

-/* I hear what you're saying, use memcmp.  But memcmp cannot make
- * important assumptions that we can here, such as alignment and
- * constant size.
- */
 static int flow_key_compare(struct flowi *key1, struct flowi *key2)
 {
-	flow_compare_t *k1, *k1_lim, *k2;
-	const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);
-
 	if (sizeof(struct flowi) % sizeof(flow_compare_t))
 		flowi_is_missized();

-	k1 = (flow_compare_t *) key1;
-	k1_lim = k1 + n_elem;
-
-	k2 = (flow_compare_t *) key2;
-
-	do {
-		if (*k1++ != *k2++)
-			return 1;
-	} while (k1 < k1_lim);
+	/* Number of elements to compare */
+	const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);

-	return 0;
+	/* Compare all elements in key1 and key2. */
+	return memcmp(key1, key2, n_elem * sizeof(flow_compare_t));
 }

 void *flow_cache_lookup(struct flowi *key, u16 family, u8 dir,

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [KJ] [PATCH] net/core/flow.c: compare data with memcmp
  2006-12-31 16:37 [PATCH] net/core/flow.c: compare data with memcmp Daniel Marjamäki
@ 2006-12-31 19:49 ` Nishanth Aravamudan
  2006-12-31 20:37 ` David Miller
  1 sibling, 0 replies; 7+ messages in thread
From: Nishanth Aravamudan @ 2006-12-31 19:49 UTC (permalink / raw)
  To: Daniel Marjamäki; +Cc: netdev, kernel-janitors, linux-kernel

On 31.12.2006 [17:37:05 +0100], Daniel Marjam?ki wrote:
> From: Daniel Marjamäki
> This has been tested by me.
> Signed-off-by: Daniel Marjamäki <daniel.marjamaki@gmail.com>
> --- linux-2.6.20-rc2/net/core/flow.c	2006-12-27 09:59:56.000000000 +0100
> +++ linux/net/core/flow.c	2006-12-31 18:26:06.000000000 +0100
> @@ -144,29 +144,16 @@ typedef u32 flow_compare_t;
> 
>  extern void flowi_is_missized(void);
> 
> -/* I hear what you're saying, use memcmp.  But memcmp cannot make
> - * important assumptions that we can here, such as alignment and
> - * constant size.
> - */
>  static int flow_key_compare(struct flowi *key1, struct flowi *key2)
>  {
> -	flow_compare_t *k1, *k1_lim, *k2;
> -	const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);
> -
>  	if (sizeof(struct flowi) % sizeof(flow_compare_t))
>  		flowi_is_missized();
> 
> -	k1 = (flow_compare_t *) key1;
> -	k1_lim = k1 + n_elem;
> -
> -	k2 = (flow_compare_t *) key2;
> -
> -	do {
> -		if (*k1++ != *k2++)
> -			return 1;
> -	} while (k1 < k1_lim);
> +	/* Number of elements to compare */
> +	const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);
> 
> -	return 0;
> +	/* Compare all elements in key1 and key2. */
> +	return memcmp(key1, key2, n_elem * sizeof(flow_compare_t));
>  }

Small nit, I don't think either of your comments make the code any
clearer than the code on its own, so drop them both.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net/core/flow.c: compare data with memcmp
  2006-12-31 16:37 [PATCH] net/core/flow.c: compare data with memcmp Daniel Marjamäki
  2006-12-31 19:49 ` [KJ] " Nishanth Aravamudan
@ 2006-12-31 20:37 ` David Miller
  2007-01-01  7:47   ` Daniel Marjamäki
  1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2006-12-31 20:37 UTC (permalink / raw)
  To: daniel.marjamaki; +Cc: netdev, kernel-janitors, linux-kernel

From: "Daniel_Marjamäki" <daniel.marjamaki@gmail.com>
Date: Sun, 31 Dec 2006 17:37:05 +0100

> From: Daniel Marjamäki
> This has been tested by me.
> Signed-off-by: Daniel Marjamäki <daniel.marjamaki@gmail.com>

Please do not do this.

memcmp() cannot assume the alignment of the source and
destination buffers and thus will run more slowly than
that open-coded comparison.

That code was done like that on purpose because it is
one of the most critical paths in the networking flow
cache lookup which runs for every IPSEC packet going
throught the system.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net/core/flow.c: compare data with memcmp
  2006-12-31 20:37 ` David Miller
@ 2007-01-01  7:47   ` Daniel Marjamäki
  2007-01-01  8:49     ` David Miller
  2007-01-01  9:16     ` Daniel Marjamäki
  0 siblings, 2 replies; 7+ messages in thread
From: Daniel Marjamäki @ 2007-01-01  7:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kernel-janitors, linux-kernel

Hello!

So you mean that in this particular case it's faster with a handcoded
comparison than memcmp? Because both key1 and key2 are located at
word-aligned addresses?
That's fascinating.

Best regards,
Daniel

2006/12/31, David Miller <davem@davemloft.net>:
> From: "Daniel_Marjamäki" <daniel.marjamaki@gmail.com>
> Date: Sun, 31 Dec 2006 17:37:05 +0100
>
> > From: Daniel Marjamäki
> > This has been tested by me.
> > Signed-off-by: Daniel Marjamäki <daniel.marjamaki@gmail.com>
>
> Please do not do this.
>
> memcmp() cannot assume the alignment of the source and
> destination buffers and thus will run more slowly than
> that open-coded comparison.
>
> That code was done like that on purpose because it is
> one of the most critical paths in the networking flow
> cache lookup which runs for every IPSEC packet going
> throught the system.
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net/core/flow.c: compare data with memcmp
  2007-01-01  7:47   ` Daniel Marjamäki
@ 2007-01-01  8:49     ` David Miller
  2007-01-01  9:16     ` Daniel Marjamäki
  1 sibling, 0 replies; 7+ messages in thread
From: David Miller @ 2007-01-01  8:49 UTC (permalink / raw)
  To: daniel.marjamaki; +Cc: netdev, kernel-janitors, linux-kernel

From: "Daniel_Marjamäki" <daniel.marjamaki@gmail.com>
Date: Mon, 1 Jan 2007 08:47:48 +0100

> So you mean that in this particular case it's faster with a handcoded
> comparison than memcmp? Because both key1 and key2 are located at
> word-aligned addresses?
> That's fascinating.

Essentially, yes.

However, I wonder.  GCC should be able to see this also, and
if it expands the memset() inline the code emitted should be
very similar.

It is something to investigate on a few cpu types, for sure.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net/core/flow.c: compare data with memcmp
  2007-01-01  7:47   ` Daniel Marjamäki
  2007-01-01  8:49     ` David Miller
@ 2007-01-01  9:16     ` Daniel Marjamäki
  2007-01-02 23:36       ` David Miller
  1 sibling, 1 reply; 7+ messages in thread
From: Daniel Marjamäki @ 2007-01-01  9:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kernel-janitors, linux-kernel

Hello!

I have done a little testing on my own. My results is that memcpy is
many times faster even with aligned data.

I am testing in an ordinary console program. I am including the code below.
If I'm doing something wrong, please tell me so.

As you can see I am not using the same datadeclarations as the kernel
but I'm testing the algorithm here not the data. By testing various
data and types of data I can make sure the algorithm behaves correctly
in all situations.
The datamember 'd' in flowi is not part of the comparison, but by
changing it into an 'unsigned int' it becomes part of the comparison.



const int NUM_REP = 0x7FFFFFFF;

typedef unsigned int flow_compare_t;

struct flowi {
	unsigned int a,b,c;
	unsigned char d;
};


/* I hear what you're saying, use memcmp.  But memcmp cannot make
 * important assumptions that we can here, such as alignment and
 * constant size.
 */
static int flow_key_compare(struct flowi *key1, struct flowi *key2)
{
	flow_compare_t *k1, *k1_lim, *k2;
	const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);

	k1 = (flow_compare_t *) key1;
	k1_lim = k1 + n_elem;

	k2 = (flow_compare_t *) key2;

	do {
		if (*k1++ != *k2++)
			return 1;
	} while (k1 < k1_lim);

	return 0;
}



static int flow_key_compare2(struct flowi *key1, struct flowi *key2)
{
	return memcmp(key1, key2, (sizeof(struct flowi) /
sizeof(flow_compare_t)) * sizeof(flow_compare_t));
}






int main()
{

	struct flowi key1 = {0,1,2,3};
	struct flowi key2 = {0,1,2,0};
	char str[300];
	int i;

	/* put data in aligned addresses */
	struct flowi *k1 = (struct flowi *)((int)(&str[100]) & 0xFFFFFFF0);
	struct flowi *k2 = (struct flowi *)((int)(&str[200]) & 0xFFFFFFF0);
	memcpy(k1, &key1, sizeof(struct flowi));
	memcpy(k2, &key2, sizeof(struct flowi));

	/* Compare data */
	printf("compare1..\n");
	for (i = 0; i < NUM_REP; i++)
		flow_key_compare(k1, k2);

	printf("compare2..\n");
	for (i = 0; i < NUM_REP; i++)
		flow_key_compare2(k1, k2);

	printf((flow_key_compare(k1,k2)==(flow_key_compare2(k1,k2)?1:0))?"ok\n":"error\n");

	return 0;
}











2007/1/1, Daniel Marjamäki <daniel.marjamaki@gmail.com>:
> Hello!
>
> So you mean that in this particular case it's faster with a handcoded
> comparison than memcmp? Because both key1 and key2 are located at
> word-aligned addresses?
> That's fascinating.
>
> Best regards,
> Daniel
>
> 2006/12/31, David Miller <davem@davemloft.net>:
> > From: "Daniel_Marjamäki" <daniel.marjamaki@gmail.com>
> > Date: Sun, 31 Dec 2006 17:37:05 +0100
> >
> > > From: Daniel Marjamäki
> > > This has been tested by me.
> > > Signed-off-by: Daniel Marjamäki <daniel.marjamaki@gmail.com>
> >
> > Please do not do this.
> >
> > memcmp() cannot assume the alignment of the source and
> > destination buffers and thus will run more slowly than
> > that open-coded comparison.
> >
> > That code was done like that on purpose because it is
> > one of the most critical paths in the networking flow
> > cache lookup which runs for every IPSEC packet going
> > throught the system.
> >
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net/core/flow.c: compare data with memcmp
  2007-01-01  9:16     ` Daniel Marjamäki
@ 2007-01-02 23:36       ` David Miller
  0 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2007-01-02 23:36 UTC (permalink / raw)
  To: daniel.marjamaki; +Cc: netdev, kernel-janitors, linux-kernel

From: "Daniel_Marjamäki" <daniel.marjamaki@gmail.com>
Date: Mon, 1 Jan 2007 10:16:02 +0100

> I have done a little testing on my own. My results is that memcpy is
> many times faster even with aligned data.

Your test program doesn't make any measurements, from where did
you get these "results"?

Also, your test program is broken because in the memcmp() case GCC
totally optimizes away the call to memcmp() because it can see the
comparison data at compile time and therefore it computes the memcmp()
result at compile time.  There are no memcmp() calls made at all by
your program.

You should look at the assembler code emitted by a test program
that is measuring performance, in detail, to make sure the test
program really is doing what you think it is.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2007-01-02 23:36 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-31 16:37 [PATCH] net/core/flow.c: compare data with memcmp Daniel Marjamäki
2006-12-31 19:49 ` [KJ] " Nishanth Aravamudan
2006-12-31 20:37 ` David Miller
2007-01-01  7:47   ` Daniel Marjamäki
2007-01-01  8:49     ` David Miller
2007-01-01  9:16     ` Daniel Marjamäki
2007-01-02 23:36       ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).