[PATCH][XFRM] Optimize policy dumping

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH][XFRM] Optimize policy dumping
@ 2006-12-03 15:11 jamal
  2006-12-04 12:24 ` Patrick McHardy
  0 siblings, 1 reply; 18+ messages in thread
From: jamal @ 2006-12-03 15:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 288 bytes --]


This improves dumping performance of xfrm policies. I started getting
bothered when i noticed it took upto a minute dumping 60K policies from
the kernel.

I have another one i am testing that uses the same approach for SAs.
Will send in the next hour.
Against net-2.6.20.

cheers,
jamal

[-- Attachment #2: xfrm_pol_opt --]
[-- Type: text/plain, Size: 5090 bytes --]

[XFRM] Optimize policy dumping

This code is Fuugly. This patch doesnt make it fuglier.
I could have optimized more but PFKEY is a fugly protocol because
of a rather incomplete 2 phase commit semantics.
Therefore, it adds more overhead that we have to carry around.

>From RFC 2367:
"
3.1.10 SADB_DUMP
   The SADB_DUMP message causes the kernel to dump the operating
    system's entire Key Table to the requesting key socket.....
    Each Security Association is returned in its own SADB_DUMP message.
    A SADB_DUMP message with a sadb_seq field of zero indicates the end
    of the dump transaction. The dump message is used for debugging
    purposes only and is not intended for production use.

    Support for the dump message MAY be discontinued in future versions
    of PF_KEY.  Key management applications MUST NOT depend on this
    message for basic operation.
"

Note the funny comment above on the dump message being discontinued at
some point and is only for debugging ;->
The way to eventually fix this IMO and reach the goals stated by
Davem of making "pfkey more robust" is to add to pfkey a socket->cb
structure. For now i think this even improves the pfkey by reducing
the compute. The advantages are noticeable when you have a large number
of policies installed.
I have tested this with setkey (which uses the same API as racoon) and all
looks fine.
I was more interested in the performance of netlink side though; so
heres some numbers with 40K policies installed (i hacked ip xfrm so
it doesnt print the output):

---

1) original with sub-policies compiled in ..
speedopolis:~# time ./ip xf pol

real    0m22.274s
user    0m0.000s
sys     0m22.269s

2) Turn off sub-policies

speedopolis:~# ./ip xf pol

real    0m13.496s
user    0m0.000s
sys     0m13.493s

i suppose the above is to be expected

3) With this patch and no subpolicies

speedopolis:~# ./ip xf pol

real    0m7.751s
user    0m0.012s
sys     0m7.740s

speedopolis:~#

------------

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

---
commit 0355ced4a81a1af96b4531680e9c593d3967a5f1
tree dd60c3be71f4bd460d2e15e81689560099751daa
parent c3b92488bea3f11a6cc7c1c59101444c26ad12ce
author Jamal Hadi Salim <hadi@cyberus.ca> Sun, 03 Dec 2006 10:04:44 -0500
committer Jamal Hadi Salim <hadi@cyberus.ca> Sun, 03 Dec 2006 10:04:44 -0500

 net/xfrm/xfrm_policy.c |   73 ++++++++++++++++++++++++++++--------------------
 1 files changed, 43 insertions(+), 30 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 64d3938..c8a98ca 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -860,57 +860,70 @@ EXPORT_SYMBOL(xfrm_policy_flush);
 int xfrm_policy_walk(u8 type, int (*func)(struct xfrm_policy *, int, int, void*),
 		     void *data)
 {
-	struct xfrm_policy *pol;
 	struct hlist_node *entry;
-	int dir, count, error;
+	int dir = 0, last_dir = 0, count = 0, error = -ENOENT;
+	struct xfrm_policy *pol = NULL, *send_pol = NULL, *last_pol = NULL;
 
 	read_lock_bh(&xfrm_policy_lock);
-	count = 0;
+
 	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
 		struct hlist_head *table = xfrm_policy_bydst[dir].table;
 		int i;
 
 		hlist_for_each_entry(pol, entry,
 				     &xfrm_policy_inexact[dir], bydst) {
-			if (pol->type == type)
-				count++;
-		}
-		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
-			hlist_for_each_entry(pol, entry, table + i, bydst) {
-				if (pol->type == type)
-					count++;
+			if (count && send_pol && send_pol != last_pol) {
+				error = func(send_pol, dir % XFRM_POLICY_MAX, count, data);
+				if (error)
+					goto out;
+				send_pol = NULL;
 			}
-		}
-	}
-
-	if (count == 0) {
-		error = -ENOENT;
-		goto out;
-	}
-
-	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
-		struct hlist_head *table = xfrm_policy_bydst[dir].table;
-		int i;
 
-		hlist_for_each_entry(pol, entry,
-				     &xfrm_policy_inexact[dir], bydst) {
 			if (pol->type != type)
 				continue;
-			error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-			if (error)
-				goto out;
+
+			if (!count) {
+				last_pol = send_pol = pol;
+			} else {
+				send_pol = last_pol;
+				last_pol = pol;
+			}
+
+			last_dir = dir;
+			count++;
 		}
+
 		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
 			hlist_for_each_entry(pol, entry, table + i, bydst) {
+				if (count && send_pol && send_pol != last_pol) {
+					error = func(send_pol, dir % XFRM_POLICY_MAX, count, data);
+					send_pol = NULL;
+					if (error)
+						goto out;
+				}
 				if (pol->type != type)
 					continue;
-				error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-				if (error)
-					goto out;
+				if (!count) {
+					last_pol = send_pol = pol;
+				} else {
+					send_pol = last_pol;
+					last_pol = pol;
+				}
+				last_dir = dir;
+				count++;
 			}
 		}
 	}
-	error = 0;
+
+	if (send_pol && send_pol != last_pol) {
+		error = func(send_pol, last_dir % XFRM_POLICY_MAX, count, data);
+	}
+
+	if (count) {
+		BUG_TRAP(last_pol == NULL);
+		error = func(send_pol, last_dir % XFRM_POLICY_MAX, 0, data);
+	}
+
 out:
 	read_unlock_bh(&xfrm_policy_lock);
 	return error;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-03 15:11 [PATCH][XFRM] Optimize policy dumping jamal
@ 2006-12-04 12:24 ` Patrick McHardy
  2006-12-04 13:26   ` jamal
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2006-12-04 12:24 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, netdev

jamal wrote:
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 64d3938..c8a98ca 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -860,57 +860,70 @@ EXPORT_SYMBOL(xfrm_policy_flush);
>  int xfrm_policy_walk(u8 type, int (*func)(struct xfrm_policy *, int, int, void*),
>  		     void *data)
>  {
> -	struct xfrm_policy *pol;
>  	struct hlist_node *entry;
> -	int dir, count, error;
> +	int dir = 0, last_dir = 0, count = 0, error = -ENOENT;
> +	struct xfrm_policy *pol = NULL, *send_pol = NULL, *last_pol = NULL;
>  
>  	read_lock_bh(&xfrm_policy_lock);
> -	count = 0;
> +
>  	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
>  		struct hlist_head *table = xfrm_policy_bydst[dir].table;
>  		int i;
>  
>  		hlist_for_each_entry(pol, entry,
>  				     &xfrm_policy_inexact[dir], bydst) {
> -			if (pol->type == type)
> -				count++;
> -		}
> -		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
> -			hlist_for_each_entry(pol, entry, table + i, bydst) {
> -				if (pol->type == type)
> -					count++;
> +			if (count && send_pol && send_pol != last_pol) {
> +				error = func(send_pol, dir % XFRM_POLICY_MAX, count, data);
> +				if (error)
> +					goto out;
> +				send_pol = NULL;
>  			}
> -		}
> -	}
> -
> -	if (count == 0) {
> -		error = -ENOENT;
> -		goto out;
> -	}
> -
> -	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
> -		struct hlist_head *table = xfrm_policy_bydst[dir].table;
> -		int i;
>  
> -		hlist_for_each_entry(pol, entry,
> -				     &xfrm_policy_inexact[dir], bydst) {
>  			if (pol->type != type)
>  				continue;
> -			error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
> -			if (error)
> -				goto out;
> +
> +			if (!count) {
> +				last_pol = send_pol = pol;
> +			} else {
> +				send_pol = last_pol;
> +				last_pol = pol;
> +			}
> +
> +			last_dir = dir;
> +			count++;
>  		}
> +
>  		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
>  			hlist_for_each_entry(pol, entry, table + i, bydst) {
> +				if (count && send_pol && send_pol != last_pol) {
> +					error = func(send_pol, dir % XFRM_POLICY_MAX, count, data);
> +					send_pol = NULL;
> +					if (error)
> +						goto out;
> +				}
>  				if (pol->type != type)
>  					continue;
> -				error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
> -				if (error)
> -					goto out;
> +				if (!count) {
> +					last_pol = send_pol = pol;
> +				} else {
> +					send_pol = last_pol;
> +					last_pol = pol;
> +				}
> +				last_dir = dir;
> +				count++;
>  			}
>  		}
>  	}
> -	error = 0;
> +
> +	if (send_pol && send_pol != last_pol) {
> +		error = func(send_pol, last_dir % XFRM_POLICY_MAX, count, data);
> +	}
> +
> +	if (count) {
> +		BUG_TRAP(last_pol == NULL);
> +		error = func(send_pol, last_dir % XFRM_POLICY_MAX, 0, data);
> +	}
> +
>  out:
>  	read_unlock_bh(&xfrm_policy_lock);
>  	return error;


A few cases that will behave incorrectly:

- two policies in xfrm_policy_inexact with the same direction:
  after the first iteration we have last_pol = send_pol = first policy
  and no messages sent, after the second iteration we have
  send_pol = first policy, last_pol = second policy and still no
  messages sent. Since send_pol && send_pol != last_pol, the
  second to last block will send send_pol with last_dir, since
  count > 0 the last block will send send_pol again. So we get
  two times the first policy and zero times the second one.

- same case as above, but policies in opposite directions. The
  first policy will again be sent twice, but with last_dir, which
  is the direction of the second policy.

- three policies in xfrm_policy_inexact, two with similar direction,
  one with opposite direction. The first two iterations look similar
  and no policies are dumped, during the third iteration we have
  count && send_pol && send_pol != last_pol. So send_pol (the
  first policy) is sent, but with direction dir, which is at that
  time the opposite direction of the policy.


I guess its easy to construct more cases. In general I don't see
how remebering only the last direction can work since two policies
with potentially different directions are remembered. Within the
loop you always use dir, which also look wrong.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 12:24 ` Patrick McHardy
@ 2006-12-04 13:26   ` jamal
  2006-12-04 13:52     ` Patrick McHardy
  0 siblings, 1 reply; 18+ messages in thread
From: jamal @ 2006-12-04 13:26 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev

On Mon, 2006-04-12 at 13:24 +0100, Patrick McHardy wrote:

> A few cases that will behave incorrectly:
> 
> - two policies in xfrm_policy_inexact with the same direction:
>   after the first iteration we have last_pol = send_pol = first policy
>   and no messages sent, after the second iteration we have
>   send_pol = first policy, last_pol = second policy and still no
>   messages sent. Since send_pol && send_pol != last_pol, the
>   second to last block will send send_pol with last_dir, since
>   count > 0 the last block will send send_pol again. So we get
>   two times the first policy and zero times the second one.
> 
> - same case as above, but policies in opposite directions. The
>   first policy will again be sent twice, but with last_dir, which
>   is the direction of the second policy.
> 
> - three policies in xfrm_policy_inexact, two with similar direction,
>   one with opposite direction. The first two iterations look similar
>   and no policies are dumped, during the third iteration we have
>   count && send_pol && send_pol != last_pol. So send_pol (the
>   first policy) is sent, but with direction dir, which is at that
>   time the opposite direction of the policy.
> 
> 
> I guess its easy to construct more cases. In general I don't see
> how remebering only the last direction can work since two policies
> with potentially different directions are remembered. Within the
> loop you always use dir, which also look wrong.

All very valid points.
Yikes, the directionality is not something i thought clearly about or
tested well. I can fix this but this code will only get fuglier. How
about the following approach:

I add a new callback which is passed in the invocation to walk. 
This callback is invoked at the end to signal the end of the walk, sort
of what done() does in netlink.
netlink doesnt use this call but pfkey does. So the burden is then moved
to pfkey to keep track of the stoopid count.

Thoughts?

cheers,
jamal


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 13:26   ` jamal
@ 2006-12-04 13:52     ` Patrick McHardy
  2006-12-04 13:57       ` Patrick McHardy
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2006-12-04 13:52 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 784 bytes --]

jamal wrote:
> All very valid points.
> Yikes, the directionality is not something i thought clearly about or
> tested well. I can fix this but this code will only get fuglier. How
> about the following approach:
> 
> I add a new callback which is passed in the invocation to walk. 
> This callback is invoked at the end to signal the end of the walk, sort
> of what done() does in netlink.
> netlink doesnt use this call but pfkey does. So the burden is then moved
> to pfkey to keep track of the stoopid count.
> 
> Thoughts?


I think the complications come from the fact that you remeber two
policies, but only one seems necessary.  How about this (completely
untested) patch? It simply uses increasing sequence numbers for all
but the last entry and uses zero for the last one.


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 2233 bytes --]

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 64d3938..c790420 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -860,33 +860,12 @@ EXPORT_SYMBOL(xfrm_policy_flush);
 int xfrm_policy_walk(u8 type, int (*func)(struct xfrm_policy *, int, int, void*),
 		     void *data)
 {
-	struct xfrm_policy *pol;
+	struct xfrm_policy *pol, *last = NULL;
 	struct hlist_node *entry;
-	int dir, count, error;
+	int dir, last_dir, count, error;
 
 	read_lock_bh(&xfrm_policy_lock);
 	count = 0;
-	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
-		struct hlist_head *table = xfrm_policy_bydst[dir].table;
-		int i;
-
-		hlist_for_each_entry(pol, entry,
-				     &xfrm_policy_inexact[dir], bydst) {
-			if (pol->type == type)
-				count++;
-		}
-		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
-			hlist_for_each_entry(pol, entry, table + i, bydst) {
-				if (pol->type == type)
-					count++;
-			}
-		}
-	}
-
-	if (count == 0) {
-		error = -ENOENT;
-		goto out;
-	}
 
 	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
 		struct hlist_head *table = xfrm_policy_bydst[dir].table;
@@ -894,23 +873,39 @@ int xfrm_policy_walk(u8 type, int (*func
 
 		hlist_for_each_entry(pol, entry,
 				     &xfrm_policy_inexact[dir], bydst) {
+			if (last) {
+				error = func(last, last_dir % XFRM_POLICY_MAX,
+					     ++count, data);
+				if (error)
+					goto out;
+				last = NULL;
+			}
 			if (pol->type != type)
 				continue;
-			error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-			if (error)
-				goto out;
+			last = pol;
+			last_dir = dir;
 		}
 		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
 			hlist_for_each_entry(pol, entry, table + i, bydst) {
+				if (last) {
+					error = func(last, last_dir % XFRM_POLICY_MAX,
+						     ++count, data);
+					if (error)
+						goto out;
+					last = NULL;
+				}
 				if (pol->type != type)
 					continue;
-				error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-				if (error)
-					goto out;
+				last = pol;
+				last_dir = dir;
 			}
 		}
 	}
-	error = 0;
+	if (count == 0) {
+		error = -ENOENT;
+		goto out;
+	}
+	error = func(last, last_dir % XFRM_POLICY_MAX, 0, data);
 out:
 	read_unlock_bh(&xfrm_policy_lock);
 	return error;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 13:52     ` Patrick McHardy
@ 2006-12-04 13:57       ` Patrick McHardy
  2006-12-04 13:58         ` jamal
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2006-12-04 13:57 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]

Patrick McHardy wrote:
> jamal wrote:
> 
>>All very valid points.
>>Yikes, the directionality is not something i thought clearly about or
>>tested well. I can fix this but this code will only get fuglier. How
>>about the following approach:
>>
>>I add a new callback which is passed in the invocation to walk. 
>>This callback is invoked at the end to signal the end of the walk, sort
>>of what done() does in netlink.
>>netlink doesnt use this call but pfkey does. So the burden is then moved
>>to pfkey to keep track of the stoopid count.
>>
>>Thoughts?
> 
> I think the complications come from the fact that you remeber two
> policies, but only one seems necessary.  How about this (completely
> untested) patch? It simply uses increasing sequence numbers for all
> but the last entry and uses zero for the last one.

And the same for SAs.



[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1193 bytes --]

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 864962b..8e7c52d 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1099,7 +1099,7 @@ int xfrm_state_walk(u8 proto, int (*func
 		    void *data)
 {
 	int i;
-	struct xfrm_state *x;
+	struct xfrm_state *x, *last = NULL;
 	struct hlist_node *entry;
 	int count = 0;
 	int err = 0;
@@ -1107,24 +1107,21 @@ int xfrm_state_walk(u8 proto, int (*func
 	spin_lock_bh(&xfrm_state_lock);
 	for (i = 0; i <= xfrm_state_hmask; i++) {
 		hlist_for_each_entry(x, entry, xfrm_state_bydst+i, bydst) {
-			if (xfrm_id_proto_match(x->id.proto, proto))
-				count++;
+			if (last) {
+				err = func(last, ++count, data);
+				if (err)
+					goto out;
+			}
+			if (!xfrm_id_proto_match(x->id.proto, proto))
+				continue;
+			last = x;
 		}
 	}
 	if (count == 0) {
 		err = -ENOENT;
 		goto out;
 	}
-
-	for (i = 0; i <= xfrm_state_hmask; i++) {
-		hlist_for_each_entry(x, entry, xfrm_state_bydst+i, bydst) {
-			if (!xfrm_id_proto_match(x->id.proto, proto))
-				continue;
-			err = func(x, --count, data);
-			if (err)
-				goto out;
-		}
-	}
+	err = func(last, 0, data);
 out:
 	spin_unlock_bh(&xfrm_state_lock);
 	return err;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 13:57       ` Patrick McHardy
@ 2006-12-04 13:58         ` jamal
  2006-12-04 14:05           ` jamal
  2006-12-04 14:06           ` Patrick McHardy
  0 siblings, 2 replies; 18+ messages in thread
From: jamal @ 2006-12-04 13:58 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev

On Mon, 2006-04-12 at 14:57 +0100, Patrick McHardy wrote:

>  I think the complications come from the fact that you remeber two
>  policies, but only one seems necessary.  How about this (completely
>  untested) patch? It simply uses increasing sequence numbers for all
>  but the last entry and uses zero for the last one.
> 

I could give this a try in about 2 hours. But why dont you like the
callback approach? You have to admit, this is hairy code.

> And the same for SAs.
> 

The SA has less things to remember, so it is easier; but i will apply
this and test it and if it meets the requirements I will look into
converting the SA to the same scheme.

cheers,
jamal



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 13:58         ` jamal
@ 2006-12-04 14:05           ` jamal
  2006-12-04 15:37             ` jamal
  2006-12-04 14:06           ` Patrick McHardy
  1 sibling, 1 reply; 18+ messages in thread
From: jamal @ 2006-12-04 14:05 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev

Patrick,

Your approach is much cleaner. Let me give these a few tests then
I will repost later today; forget about the callback approach for now.

cheers,
jamal

On Mon, 2006-04-12 at 08:58 -0500, jamal wrote:
> On Mon, 2006-04-12 at 14:57 +0100, Patrick McHardy wrote:
> 
> >  I think the complications come from the fact that you remeber two
> >  policies, but only one seems necessary.  How about this (completely
> >  untested) patch? It simply uses increasing sequence numbers for all
> >  but the last entry and uses zero for the last one.
> > 
> 
> I could give this a try in about 2 hours. But why dont you like the
> callback approach? You have to admit, this is hairy code.
> 
> > And the same for SAs.
> > 
> 
> The SA has less things to remember, so it is easier; but i will apply
> this and test it and if it meets the requirements I will look into
> converting the SA to the same scheme.
> 
> cheers,
> jamal
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 13:58         ` jamal
  2006-12-04 14:05           ` jamal
@ 2006-12-04 14:06           ` Patrick McHardy
  2006-12-04 14:11             ` jamal
  1 sibling, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2006-12-04 14:06 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, netdev

jamal wrote:
> On Mon, 2006-04-12 at 14:57 +0100, Patrick McHardy wrote:
> 
> 
>> I think the complications come from the fact that you remeber two
>> policies, but only one seems necessary.  How about this (completely
>> untested) patch? It simply uses increasing sequence numbers for all
>> but the last entry and uses zero for the last one.
>>
> 
> 
> I could give this a try in about 2 hours. But why dont you like the
> callback approach? You have to admit, this is hairy code.

Both ways are fine I guess. But the counting has almost no
overhead with the patch I sent, so I'm not sure if its worth
adding a callback (which still needs to get the last policy/SA
as argument, so that part won't get any nicer).

BTW, I'm not sure whether there are further requirements than
those you quoted, but according to that text, using 1 for
all but the last message would be fine as well :)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 14:06           ` Patrick McHardy
@ 2006-12-04 14:11             ` jamal
  2006-12-04 14:26               ` Patrick McHardy
  0 siblings, 1 reply; 18+ messages in thread
From: jamal @ 2006-12-04 14:11 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev

On Mon, 2006-04-12 at 15:06 +0100, Patrick McHardy wrote:

> Both ways are fine I guess. But the counting has almost no
> overhead with the patch I sent, so I'm not sure if its worth
> adding a callback (which still needs to get the last policy/SA
> as argument, so that part won't get any nicer).
> 
> BTW, I'm not sure whether there are further requirements than
> those you quoted, but according to that text, using 1 for
> all but the last message would be fine as well :)
> 

The only arguement for the callback is it will lead to eventually
having some semi-reliable dump for pfkey. But i think that is a separate
issue to be tackled later.
I am actually scratching my head a little as to what happens when the
pfkey socket recv is full.

cheers,
jamal


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 14:11             ` jamal
@ 2006-12-04 14:26               ` Patrick McHardy
  0 siblings, 0 replies; 18+ messages in thread
From: Patrick McHardy @ 2006-12-04 14:26 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, netdev

jamal wrote:
> On Mon, 2006-04-12 at 15:06 +0100, Patrick McHardy wrote:
> 
> 
>>Both ways are fine I guess. But the counting has almost no
>>overhead with the patch I sent, so I'm not sure if its worth
>>adding a callback (which still needs to get the last policy/SA
>>as argument, so that part won't get any nicer).
>>
>
> The only arguement for the callback is it will lead to eventually
> having some semi-reliable dump for pfkey. But i think that is a separate
> issue to be tackled later.

Agreed, that also looks a bit tricker than the optimization.

> I am actually scratching my head a little as to what happens when the
> pfkey socket recv is full.

dump_sp() doesn't check the return value of pfkey_broadcast, so I
guess it will just try to stuff more and more data in the recv queue,
leading to either all messages after the last one fitting getting
dropped or random drops if dumping and reading happen in parallel.

setkey will loop forever if it doesn't receive the zero sequence
number.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 14:05           ` jamal
@ 2006-12-04 15:37             ` jamal
  2006-12-04 15:55               ` Patrick McHardy
  0 siblings, 1 reply; 18+ messages in thread
From: jamal @ 2006-12-04 15:37 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev

On Mon, 2006-04-12 at 09:05 -0500, jamal wrote:
> Patrick,
> 
> Your approach is much cleaner. Let me give these a few tests then
> I will repost later today; forget about the callback approach for now.
> 

I have just applied the policy patch; havent compiled or tested (the
setup takes me a while to put together). But by staring, I am seeing
that you will end up with the same thing of sending a NULL or the same
entry twice.

Consider a simple hypothetical test. You have one one entry in the 
xfrm_policy_inexact table that matches. It happens to be the fifth out
of 10 elements. You find it at the 5th iteration. At the sixth iteration
you send it and last becomes null. 

All the way down, you call func with a NULL entry. You could add a check
to make sure it only gets invoked when last is not null, but the result
is in such a case, you will never send a 0 count element. I am sure
there could be other tricky scenarios like this that could be
constructed.

Thoughts.

cheers,
jamal

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 15:37             ` jamal
@ 2006-12-04 15:55               ` Patrick McHardy
  2006-12-04 15:57                 ` Patrick McHardy
  2006-12-04 17:43                 ` jamal
  0 siblings, 2 replies; 18+ messages in thread
From: Patrick McHardy @ 2006-12-04 15:55 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 1464 bytes --]

jamal wrote:
> On Mon, 2006-04-12 at 09:05 -0500, jamal wrote:
> 
>>Patrick,
>>
>>Your approach is much cleaner. Let me give these a few tests then
>>I will repost later today; forget about the callback approach for now.
>>
> 
> 
> I have just applied the policy patch; havent compiled or tested (the
> setup takes me a while to put together). But by staring, I am seeing
> that you will end up with the same thing of sending a NULL or the same
> entry twice.
> 
> Consider a simple hypothetical test. You have one one entry in the 
> xfrm_policy_inexact table that matches. It happens to be the fifth out
> of 10 elements. You find it at the 5th iteration. At the sixth iteration
> you send it and last becomes null. 
> 
> All the way down, you call func with a NULL entry. You could add a check
> to make sure it only gets invoked when last is not null, but the result
> is in such a case, you will never send a 0 count element. I am sure
> there could be other tricky scenarios like this that could be
> constructed.
> 
> Thoughts.

Double sending can't happen, but you're right about potentially
sending a NULL ptr when after setting it to NULL we don't find
any other matching elements.

This patch should fix it (and is even simpler), by moving the
check for pol->type != type before sending, we make sure that
last always contains a valid element unless count == 0.

Also fixed an incorrect gcc warning about last_dir potentially
being used uninitialized.


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 2162 bytes --]

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 64d3938..e19ec1e 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -860,33 +860,12 @@ EXPORT_SYMBOL(xfrm_policy_flush);
 int xfrm_policy_walk(u8 type, int (*func)(struct xfrm_policy *, int, int, void*),
 		     void *data)
 {
-	struct xfrm_policy *pol;
+	struct xfrm_policy *pol, *last = NULL;
 	struct hlist_node *entry;
-	int dir, count, error;
+	int dir, last_dir = 0, count, error;
 
 	read_lock_bh(&xfrm_policy_lock);
 	count = 0;
-	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
-		struct hlist_head *table = xfrm_policy_bydst[dir].table;
-		int i;
-
-		hlist_for_each_entry(pol, entry,
-				     &xfrm_policy_inexact[dir], bydst) {
-			if (pol->type == type)
-				count++;
-		}
-		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
-			hlist_for_each_entry(pol, entry, table + i, bydst) {
-				if (pol->type == type)
-					count++;
-			}
-		}
-	}
-
-	if (count == 0) {
-		error = -ENOENT;
-		goto out;
-	}
 
 	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
 		struct hlist_head *table = xfrm_policy_bydst[dir].table;
@@ -896,21 +875,35 @@ int xfrm_policy_walk(u8 type, int (*func
 				     &xfrm_policy_inexact[dir], bydst) {
 			if (pol->type != type)
 				continue;
-			error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-			if (error)
-				goto out;
+			if (last) {
+				error = func(last, last_dir % XFRM_POLICY_MAX,
+					     ++count, data);
+				if (error)
+					goto out;
+			}
+			last = pol;
+			last_dir = dir;
 		}
 		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
 			hlist_for_each_entry(pol, entry, table + i, bydst) {
 				if (pol->type != type)
 					continue;
-				error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-				if (error)
-					goto out;
+				if (last) {
+					error = func(last, last_dir % XFRM_POLICY_MAX,
+						     ++count, data);
+					if (error)
+						goto out;
+				}
+				last = pol;
+				last_dir = dir;
 			}
 		}
 	}
-	error = 0;
+	if (count == 0) {
+		error = -ENOENT;
+		goto out;
+	}
+	error = func(last, last_dir % XFRM_POLICY_MAX, 0, data);
 out:
 	read_unlock_bh(&xfrm_policy_lock);
 	return error;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 15:55               ` Patrick McHardy
@ 2006-12-04 15:57                 ` Patrick McHardy
  2006-12-04 17:43                 ` jamal
  1 sibling, 0 replies; 18+ messages in thread
From: Patrick McHardy @ 2006-12-04 15:57 UTC (permalink / raw)
  Cc: hadi, David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 831 bytes --]

Patrick McHardy wrote:
> jamal wrote:
> 
>>All the way down, you call func with a NULL entry. You could add a check
>>to make sure it only gets invoked when last is not null, but the result
>>is in such a case, you will never send a 0 count element. I am sure
>>there could be other tricky scenarios like this that could be
>>constructed.
>>
>>Thoughts.
> 
> 
> Double sending can't happen, but you're right about potentially
> sending a NULL ptr when after setting it to NULL we don't find
> any other matching elements.
> 
> This patch should fix it (and is even simpler), by moving the
> check for pol->type != type before sending, we make sure that
> last always contains a valid element unless count == 0.
> 
> Also fixed an incorrect gcc warning about last_dir potentially
> being used uninitialized.


And again for SAs ..


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1193 bytes --]

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 864962b..a5877f8 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1099,7 +1099,7 @@ int xfrm_state_walk(u8 proto, int (*func
 		    void *data)
 {
 	int i;
-	struct xfrm_state *x;
+	struct xfrm_state *x, *last = NULL;
 	struct hlist_node *entry;
 	int count = 0;
 	int err = 0;
@@ -1107,24 +1107,21 @@ int xfrm_state_walk(u8 proto, int (*func
 	spin_lock_bh(&xfrm_state_lock);
 	for (i = 0; i <= xfrm_state_hmask; i++) {
 		hlist_for_each_entry(x, entry, xfrm_state_bydst+i, bydst) {
-			if (xfrm_id_proto_match(x->id.proto, proto))
-				count++;
+			if (!xfrm_id_proto_match(x->id.proto, proto))
+				continue;
+			if (last) {
+				err = func(last, ++count, data);
+				if (err)
+					goto out;
+			}
+			last = x;
 		}
 	}
 	if (count == 0) {
 		err = -ENOENT;
 		goto out;
 	}
-
-	for (i = 0; i <= xfrm_state_hmask; i++) {
-		hlist_for_each_entry(x, entry, xfrm_state_bydst+i, bydst) {
-			if (!xfrm_id_proto_match(x->id.proto, proto))
-				continue;
-			err = func(x, --count, data);
-			if (err)
-				goto out;
-		}
-	}
+	err = func(last, 0, data);
 out:
 	spin_unlock_bh(&xfrm_state_lock);
 	return err;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 15:55               ` Patrick McHardy
  2006-12-04 15:57                 ` Patrick McHardy
@ 2006-12-04 17:43                 ` jamal
  2006-12-04 17:59                   ` Patrick McHardy
  1 sibling, 1 reply; 18+ messages in thread
From: jamal @ 2006-12-04 17:43 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 1393 bytes --]

On Mon, 2006-04-12 at 16:55 +0100, Patrick McHardy wrote:


> This patch should fix it (and is even simpler), by moving the
> check for pol->type != type before sending, we make sure that
> last always contains a valid element unless count == 0.
> 
> Also fixed an incorrect gcc warning about last_dir potentially
> being used uninitialized.


Ok, both looked good except for the test of a single entry.
This was because you would break out of the loop with count of 0.
The patch against yours would look like something attached for the state
case. Dont forget there are two spots on the policy side of things;->

You can either submit both patches or i could later today. If you do,
please look at some of the comments i made in the first patch and
include them. 
Just from the outset the numbers improvement looked the same as the way
i had it.

With these changes, 40K SPDs and 20K SAs dumping (subpolicies compiled
out)

---
speedopolis:~# time ./ip x state

real    0m1.985s
user    0m0.000s
sys     0m1.984s


speedopolis:~# time ./ip x policy

real    0m7.901s
user    0m0.008s
sys     0m7.896s
---

As a reference point, the old numbers:

---
speedopolis:~# ./ip xf pol

real    0m13.496s
user    0m0.000s
sys     0m13.493s
speedopolis:~#
speedopolis:~# time ./ip xf sta

real    0m5.321s
user    0m0.004s
sys     0m5.316s

----


Thanks a lot for your efforts Patrick.

cheers,
jamal

[-- Attachment #2: state-patch --]
[-- Type: text/x-patch, Size: 399 bytes --]

--- a/net/xfrm/xfrm_state.c	2006-12-04 12:06:23.000000000 -0500
+++ b/net/xfrm/xfrm_state.c	2006-12-04 12:07:09.000000000 -0500
@@ -1159,11 +1159,12 @@
 			if (!xfrm_id_proto_match(x->id.proto, proto))
 				continue;
 			if (last) {
-				err = func(last, ++count, data);
+				err = func(last, count, data);
 				if (err)
 					goto out;
 			}
 			last = x;
+			count++;
 		}
 	}
 	if (count == 0) {

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 17:43                 ` jamal
@ 2006-12-04 17:59                   ` Patrick McHardy
  2006-12-04 20:46                     ` jamal
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2006-12-04 17:59 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, netdev

jamal wrote:
> Ok, both looked good except for the test of a single entry.
> This was because you would break out of the loop with count of 0.
> The patch against yours would look like something attached for the state
> case. Dont forget there are two spots on the policy side of things;->

You're right, that case was also broken. With your patch on top
it looks all right.

> You can either submit both patches or i could later today. If you do,
> please look at some of the comments i made in the first patch and
> include them. 

I'd prefer if you did it since you're already testing the thing :)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 17:59                   ` Patrick McHardy
@ 2006-12-04 20:46                     ` jamal
  0 siblings, 0 replies; 18+ messages in thread
From: jamal @ 2006-12-04 20:46 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev

On Mon, 2006-04-12 at 18:59 +0100, Patrick McHardy wrote:
> jamal wrote:

> I'd prefer if you did it since you're already testing the thing :)

Ok, will do shortly.

cheers,
jamal


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH][XFRM] Optimize policy dumping
@ 2006-12-04 20:58 jamal
  2006-12-05  4:03 ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: jamal @ 2006-12-04 20:58 UTC (permalink / raw)
  To: David Miller; +Cc: Patrick McHardy, netdev

[-- Attachment #1: Type: text/plain, Size: 91 bytes --]

Dave,

This one has undergone more scrutiny and testing. Against net-2.6.20

cheers,
jamal

[-- Attachment #2: patrick-spopt --]
[-- Type: text/plain, Size: 4328 bytes --]

[XFRM] Optimize policy dumping

This change optimizes the dumping of Security policies.

1) Before this change ..
speedopolis:~# time ./ip xf pol

real    0m22.274s
user    0m0.000s
sys     0m22.269s

2) Turn off sub-policies

speedopolis:~# ./ip xf pol

real    0m13.496s
user    0m0.000s
sys     0m13.493s

i suppose the above is to be expected

3) With this change ..
speedopolis:~# time ./ip x policy

real    0m7.901s
user    0m0.008s
sys     0m7.896s
---------

This is probably the best we can do for now.

The current code attempts to work well for PFKEY which
has a broken two phase semantic.

>From RFC 2367:
"
3.1.10 SADB_DUMP
   The SADB_DUMP message causes the kernel to dump the operating
   system's entire Key Table to the requesting key socket.....
   Each Security Association is returned in its own SADB_DUMP
   message.
   A SADB_DUMP message with a sadb_seq field of zero indicates
   the end of the dump transaction. The dump message is used for
   debugging purposes only and is not intended for production
   use.

   Support for the dump message MAY be discontinued
   in future versions of PF_KEY.  Key management applications MUST
   NOT depend on this message for basic operation.
 "

Note the funny comment above on the dump message being discontinued
at some point and is only for debugging ;->
The way to eventually fix this IMO and reach the goals stated by
Davem of making "pfkey more robust" is to add to pfkey a socket->cb
structure. For now i think this even improves the pfkey by reducing
the compute. The advantages are noticeable when you have a large number
of policies installed.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

---
commit 33b1e3fcdaee3252cce3c1cadf93a4d81f2200ff
tree 584411b6ad0ac830cc39dd184ccb32573739036d
parent 5465ae68b5ec11b2820db3f9b4c6fd94f113da44
author Patrick McHardy <kaber@trash.net> Mon, 04 Dec 2006 15:33:48 -0500
committer Jamal Hadi Salim <hadi@cyberus.ca> Mon, 04 Dec 2006 15:33:48 -0500

 net/xfrm/xfrm_policy.c |   55 ++++++++++++++++++++++--------------------------
 1 files changed, 25 insertions(+), 30 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 64d3938..c438035 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -860,33 +860,12 @@ EXPORT_SYMBOL(xfrm_policy_flush);
 int xfrm_policy_walk(u8 type, int (*func)(struct xfrm_policy *, int, int, void*),
 		     void *data)
 {
-	struct xfrm_policy *pol;
+	struct xfrm_policy *pol, *last = NULL;
 	struct hlist_node *entry;
-	int dir, count, error;
+	int dir, last_dir = 0, count, error;
 
 	read_lock_bh(&xfrm_policy_lock);
 	count = 0;
-	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
-		struct hlist_head *table = xfrm_policy_bydst[dir].table;
-		int i;
-
-		hlist_for_each_entry(pol, entry,
-				     &xfrm_policy_inexact[dir], bydst) {
-			if (pol->type == type)
-				count++;
-		}
-		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
-			hlist_for_each_entry(pol, entry, table + i, bydst) {
-				if (pol->type == type)
-					count++;
-			}
-		}
-	}
-
-	if (count == 0) {
-		error = -ENOENT;
-		goto out;
-	}
 
 	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
 		struct hlist_head *table = xfrm_policy_bydst[dir].table;
@@ -896,21 +875,37 @@ int xfrm_policy_walk(u8 type, int (*func)(struct xfrm_policy *, int, int, void*)
 				     &xfrm_policy_inexact[dir], bydst) {
 			if (pol->type != type)
 				continue;
-			error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-			if (error)
-				goto out;
+			if (last) {
+				error = func(last, last_dir % XFRM_POLICY_MAX,
+					     count, data);
+				if (error)
+					goto out;
+			}
+			last = pol;
+			last_dir = dir;
+			count++;
 		}
 		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
 			hlist_for_each_entry(pol, entry, table + i, bydst) {
 				if (pol->type != type)
 					continue;
-				error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-				if (error)
-					goto out;
+				if (last) {
+					error = func(last, last_dir % XFRM_POLICY_MAX,
+						     count, data);
+					if (error)
+						goto out;
+				}
+				last = pol;
+				last_dir = dir;
+				count++;
 			}
 		}
 	}
-	error = 0;
+	if (count == 0) {
+		error = -ENOENT;
+		goto out;
+	}
+	error = func(last, last_dir % XFRM_POLICY_MAX, 0, data);
 out:
 	read_unlock_bh(&xfrm_policy_lock);
 	return error;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH][XFRM] Optimize policy dumping
  2006-12-04 20:58 jamal
@ 2006-12-05  4:03 ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2006-12-05  4:03 UTC (permalink / raw)
  To: hadi; +Cc: kaber, netdev

From: jamal <hadi@cyberus.ca>
Date: Mon, 04 Dec 2006 15:58:03 -0500

> This one has undergone more scrutiny and testing. Against net-2.6.20

Applied, thanks Jamal.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2006-12-05  4:03 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-03 15:11 [PATCH][XFRM] Optimize policy dumping jamal
2006-12-04 12:24 ` Patrick McHardy
2006-12-04 13:26   ` jamal
2006-12-04 13:52     ` Patrick McHardy
2006-12-04 13:57       ` Patrick McHardy
2006-12-04 13:58         ` jamal
2006-12-04 14:05           ` jamal
2006-12-04 15:37             ` jamal
2006-12-04 15:55               ` Patrick McHardy
2006-12-04 15:57                 ` Patrick McHardy
2006-12-04 17:43                 ` jamal
2006-12-04 17:59                   ` Patrick McHardy
2006-12-04 20:46                     ` jamal
2006-12-04 14:06           ` Patrick McHardy
2006-12-04 14:11             ` jamal
2006-12-04 14:26               ` Patrick McHardy
  -- strict thread matches above, loose matches on Subject: below --
2006-12-04 20:58 jamal
2006-12-05  4:03 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).