* [net-next sample action optimization 0/3] @ 2017-03-11 0:51 Andy Zhou 2017-03-11 0:51 ` [net-next sample action optimization 1/3] openvswitch: Deferred fifo API change Andy Zhou ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Andy Zhou @ 2017-03-11 0:51 UTC (permalink / raw) To: netdev; +Cc: joe, pshelar, Andy Zhou The sample action can be used for translating Openflow 'clone' action. However its implementation has not been sufficiently optimized for this use case. This series attempts to close the gap. Patch 3 commit message has more details on the specific optimizations implemented. Andy Zhou (3): openvswitch: Deferred fifo API change. openvswitch: Refactor recirc key allocation. openvswitch: Optimize sample action for the clone use cases include/uapi/linux/openvswitch.h | 13 +++ net/openvswitch/actions.c | 194 ++++++++++++++++++++++----------------- net/openvswitch/datapath.h | 1 + net/openvswitch/flow_netlink.c | 126 +++++++++++++++++-------- 4 files changed, 213 insertions(+), 121 deletions(-) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [net-next sample action optimization 1/3] openvswitch: Deferred fifo API change. 2017-03-11 0:51 [net-next sample action optimization 0/3] Andy Zhou @ 2017-03-11 0:51 ` Andy Zhou 2017-03-11 0:51 ` [net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation Andy Zhou 2017-03-11 0:51 ` [net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases Andy Zhou 2 siblings, 0 replies; 11+ messages in thread From: Andy Zhou @ 2017-03-11 0:51 UTC (permalink / raw) To: netdev; +Cc: joe, pshelar, Andy Zhou add_deferred_actions() API currently requires actions to be passed in as a fully encoded netlink message. So far both 'sample' and 'recirc' actions happens to carry actions as fully encoded netlink messages. However, this requirement is more restrictive than necessary, future patch will need to pass in action lists that are not fully encoded by themselves. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> --- net/openvswitch/actions.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index c82301c..75182e9 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -51,6 +51,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, struct deferred_action { struct sk_buff *skb; const struct nlattr *actions; + int actions_len; /* Store pkt_key clone when creating deferred action. */ struct sw_flow_key pkt_key; @@ -119,8 +120,9 @@ static struct deferred_action *action_fifo_put(struct action_fifo *fifo) /* Return true if fifo is not full */ static struct deferred_action *add_deferred_actions(struct sk_buff *skb, - const struct sw_flow_key *key, - const struct nlattr *attr) + const struct sw_flow_key *key, + const struct nlattr *actions, + const int actions_len) { struct action_fifo *fifo; struct deferred_action *da; @@ -129,7 +131,8 @@ static struct deferred_action *add_deferred_actions(struct sk_buff *skb, da = action_fifo_put(fifo); if (da) { da->skb = skb; - da->actions = attr; + da->actions = actions; + da->actions_len = actions_len; da->pkt_key = *key; } @@ -966,7 +969,8 @@ static int sample(struct datapath *dp, struct sk_buff *skb, /* Skip the sample action when out of memory. */ return 0; - if (!add_deferred_actions(skb, key, a)) { + if (!add_deferred_actions(skb, key, nla_data(acts_list), + nla_len(acts_list))) { if (net_ratelimit()) pr_warn("%s: deferred actions limit reached, dropping sample action\n", ovs_dp_name(dp)); @@ -1123,7 +1127,7 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb, return 0; } - da = add_deferred_actions(skb, key, NULL); + da = add_deferred_actions(skb, key, NULL, 0); if (da) { da->pkt_key.recirc_id = nla_get_u32(a); } else { @@ -1278,10 +1282,10 @@ static void process_deferred_actions(struct datapath *dp) struct sk_buff *skb = da->skb; struct sw_flow_key *key = &da->pkt_key; const struct nlattr *actions = da->actions; + int actions_len = da->actions_len; if (actions) - do_execute_actions(dp, skb, key, actions, - nla_len(actions)); + do_execute_actions(dp, skb, key, actions, actions_len); else ovs_dp_process_packet(skb, key); } while (!action_fifo_is_empty(fifo)); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation. 2017-03-11 0:51 [net-next sample action optimization 0/3] Andy Zhou 2017-03-11 0:51 ` [net-next sample action optimization 1/3] openvswitch: Deferred fifo API change Andy Zhou @ 2017-03-11 0:51 ` Andy Zhou 2017-03-13 20:32 ` Joe Stringer 2017-03-11 0:51 ` [net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases Andy Zhou 2 siblings, 1 reply; 11+ messages in thread From: Andy Zhou @ 2017-03-11 0:51 UTC (permalink / raw) To: netdev; +Cc: joe, pshelar, Andy Zhou The logic of allocating and copy key for each 'exec_actions_level' was specific to execute_recirc(). However, future patches will reuse as well. Refactor the logic into its own function clone_key(). Signed-off-by: Andy Zhou <azhou@ovn.org> --- net/openvswitch/actions.c | 71 ++++++++++++++++++++++++++++------------------- 1 file changed, 43 insertions(+), 28 deletions(-) diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 75182e9..a622c19 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2007-2014 Nicira, Inc. + * Copyright (c) 2007-2017 Nicira, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of version 2 of the GNU General Public @@ -75,7 +75,7 @@ struct ovs_frag_data { #define DEFERRED_ACTION_FIFO_SIZE 10 #define OVS_RECURSION_LIMIT 5 -#define OVS_DEFERRED_ACTION_THRESHOLD (OVS_RECURSION_LIMIT - 2) +#define OVS_DEFERRED_ACITON_THRESHOLD (OVS_RECURSION_LIMIT - 2) struct action_fifo { int head; int tail; @@ -83,14 +83,31 @@ struct action_fifo { struct deferred_action fifo[DEFERRED_ACTION_FIFO_SIZE]; }; -struct recirc_keys { - struct sw_flow_key key[OVS_DEFERRED_ACTION_THRESHOLD]; +struct action_flow_keys { + struct sw_flow_key key[OVS_DEFERRED_ACITON_THRESHOLD]; }; static struct action_fifo __percpu *action_fifos; -static struct recirc_keys __percpu *recirc_keys; +static struct action_flow_keys __percpu *flow_keys; static DEFINE_PER_CPU(int, exec_actions_level); +/* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys' + * space. Return NULL if out of key spaces. + */ +static struct sw_flow_key *clone_key(const struct sw_flow_key *key_) +{ + struct action_flow_keys *keys = this_cpu_ptr(flow_keys); + int level = this_cpu_read(exec_actions_level); + struct sw_flow_key *key = NULL; + + if (level <= OVS_DEFERRED_ACITON_THRESHOLD) { + key = &keys->key[level - 1]; + *key = *key_; + } + + return key; +} + static void action_fifo_init(struct action_fifo *fifo) { fifo->head = 0; @@ -1090,8 +1107,8 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb, struct sw_flow_key *key, const struct nlattr *a, int rem) { + struct sw_flow_key *recirc_key; struct deferred_action *da; - int level; if (!is_flow_key_valid(key)) { int err; @@ -1115,29 +1132,27 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb, return 0; } - level = this_cpu_read(exec_actions_level); - if (level <= OVS_DEFERRED_ACTION_THRESHOLD) { - struct recirc_keys *rks = this_cpu_ptr(recirc_keys); - struct sw_flow_key *recirc_key = &rks->key[level - 1]; - - *recirc_key = *key; + /* If we are within the limit of 'OVS_DEFERRED_ACITON_THRESHOLD', + * recirc immediately, otherwise, defer it for later execution. + */ + recirc_key = clone_key(key); + if (recirc_key) { recirc_key->recirc_id = nla_get_u32(a); ovs_dp_process_packet(skb, recirc_key); - - return 0; - } - - da = add_deferred_actions(skb, key, NULL, 0); - if (da) { - da->pkt_key.recirc_id = nla_get_u32(a); } else { - kfree_skb(skb); - - if (net_ratelimit()) - pr_warn("%s: deferred action limit reached, drop recirc action\n", - ovs_dp_name(dp)); + da = add_deferred_actions(skb, key, NULL, 0); + if (da) { + recirc_key = &da->pkt_key; + recirc_key->recirc_id = nla_get_u32(a); + } else { + /* Log an error in case action fifo is full. + */ + kfree_skb(skb); + if (net_ratelimit()) + pr_warn("%s: deferred action limit reached, drop recirc action\n", + ovs_dp_name(dp)); + } } - return 0; } @@ -1327,8 +1342,8 @@ int action_fifos_init(void) if (!action_fifos) return -ENOMEM; - recirc_keys = alloc_percpu(struct recirc_keys); - if (!recirc_keys) { + flow_keys = alloc_percpu(struct action_flow_keys); + if (!flow_keys) { free_percpu(action_fifos); return -ENOMEM; } @@ -1339,5 +1354,5 @@ int action_fifos_init(void) void action_fifos_exit(void) { free_percpu(action_fifos); - free_percpu(recirc_keys); + free_percpu(flow_keys); } -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation. 2017-03-11 0:51 ` [net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation Andy Zhou @ 2017-03-13 20:32 ` Joe Stringer 0 siblings, 0 replies; 11+ messages in thread From: Joe Stringer @ 2017-03-13 20:32 UTC (permalink / raw) To: Andy Zhou; +Cc: netdev, pravin shelar On 10 March 2017 at 16:51, Andy Zhou <azhou@ovn.org> wrote: > The logic of allocating and copy key for each 'exec_actions_level' > was specific to execute_recirc(). However, future patches will reuse > as well. Refactor the logic into its own function clone_key(). > > Signed-off-by: Andy Zhou <azhou@ovn.org> > --- > net/openvswitch/actions.c | 71 ++++++++++++++++++++++++++++------------------- > 1 file changed, 43 insertions(+), 28 deletions(-) > > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c > index 75182e9..a622c19 100644 > --- a/net/openvswitch/actions.c > +++ b/net/openvswitch/actions.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 2007-2014 Nicira, Inc. > + * Copyright (c) 2007-2017 Nicira, Inc. > * > * This program is free software; you can redistribute it and/or > * modify it under the terms of version 2 of the GNU General Public > @@ -75,7 +75,7 @@ struct ovs_frag_data { > > #define DEFERRED_ACTION_FIFO_SIZE 10 > #define OVS_RECURSION_LIMIT 5 > -#define OVS_DEFERRED_ACTION_THRESHOLD (OVS_RECURSION_LIMIT - 2) > +#define OVS_DEFERRED_ACITON_THRESHOLD (OVS_RECURSION_LIMIT - 2) > struct action_fifo { > int head; > int tail; > @@ -83,14 +83,31 @@ struct action_fifo { > struct deferred_action fifo[DEFERRED_ACTION_FIFO_SIZE]; > }; > > -struct recirc_keys { > - struct sw_flow_key key[OVS_DEFERRED_ACTION_THRESHOLD]; > +struct action_flow_keys { > + struct sw_flow_key key[OVS_DEFERRED_ACITON_THRESHOLD]; ACITON -> ACTION > }; > > static struct action_fifo __percpu *action_fifos; > -static struct recirc_keys __percpu *recirc_keys; > +static struct action_flow_keys __percpu *flow_keys; > static DEFINE_PER_CPU(int, exec_actions_level); > > +/* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys' > + * space. Return NULL if out of key spaces. > + */ > +static struct sw_flow_key *clone_key(const struct sw_flow_key *key_) > +{ > + struct action_flow_keys *keys = this_cpu_ptr(flow_keys); > + int level = this_cpu_read(exec_actions_level); > + struct sw_flow_key *key = NULL; > + > + if (level <= OVS_DEFERRED_ACITON_THRESHOLD) { > + key = &keys->key[level - 1]; > + *key = *key_; > + } > + > + return key; > +} > + > static void action_fifo_init(struct action_fifo *fifo) > { > fifo->head = 0; > @@ -1090,8 +1107,8 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb, > struct sw_flow_key *key, > const struct nlattr *a, int rem) > { > + struct sw_flow_key *recirc_key; > struct deferred_action *da; > - int level; > > if (!is_flow_key_valid(key)) { > int err; > @@ -1115,29 +1132,27 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb, > return 0; > } > > - level = this_cpu_read(exec_actions_level); > - if (level <= OVS_DEFERRED_ACTION_THRESHOLD) { > - struct recirc_keys *rks = this_cpu_ptr(recirc_keys); > - struct sw_flow_key *recirc_key = &rks->key[level - 1]; > - > - *recirc_key = *key; > + /* If we are within the limit of 'OVS_DEFERRED_ACITON_THRESHOLD', > + * recirc immediately, otherwise, defer it for later execution. > + */ > + recirc_key = clone_key(key); > + if (recirc_key) { > recirc_key->recirc_id = nla_get_u32(a); > ovs_dp_process_packet(skb, recirc_key); > - > - return 0; > - } > - > - da = add_deferred_actions(skb, key, NULL, 0); > - if (da) { > - da->pkt_key.recirc_id = nla_get_u32(a); > } else { > - kfree_skb(skb); > - > - if (net_ratelimit()) > - pr_warn("%s: deferred action limit reached, drop recirc action\n", > - ovs_dp_name(dp)); > + da = add_deferred_actions(skb, key, NULL, 0); > + if (da) { > + recirc_key = &da->pkt_key; > + recirc_key->recirc_id = nla_get_u32(a); > + } else { > + /* Log an error in case action fifo is full. > + */ Comment end marker "*/" should be on same line for single-line comment. > + kfree_skb(skb); > + if (net_ratelimit()) > + pr_warn("%s: deferred action limit reached, drop recirc action\n", > + ovs_dp_name(dp)); > + } > } > - > return 0; > } > > @@ -1327,8 +1342,8 @@ int action_fifos_init(void) > if (!action_fifos) > return -ENOMEM; > > - recirc_keys = alloc_percpu(struct recirc_keys); > - if (!recirc_keys) { > + flow_keys = alloc_percpu(struct action_flow_keys); > + if (!flow_keys) { > free_percpu(action_fifos); > return -ENOMEM; > } > @@ -1339,5 +1354,5 @@ int action_fifos_init(void) > void action_fifos_exit(void) > { > free_percpu(action_fifos); > - free_percpu(recirc_keys); > + free_percpu(flow_keys); > } > -- > 1.8.3.1 > ^ permalink raw reply [flat|nested] 11+ messages in thread
* [net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases 2017-03-11 0:51 [net-next sample action optimization 0/3] Andy Zhou 2017-03-11 0:51 ` [net-next sample action optimization 1/3] openvswitch: Deferred fifo API change Andy Zhou 2017-03-11 0:51 ` [net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation Andy Zhou @ 2017-03-11 0:51 ` Andy Zhou 2017-03-13 7:08 ` Pravin Shelar 2 siblings, 1 reply; 11+ messages in thread From: Andy Zhou @ 2017-03-11 0:51 UTC (permalink / raw) To: netdev; +Cc: joe, pshelar, Andy Zhou With the introduction of open flow 'clone' action, the OVS user space can now translate the 'clone' action into kernel datapath 'sample' action, with 100% probability, to ensure that the clone semantics, which is that the packet seen by the clone action is the same as the packet seen by the action after clone, is faithfully carried out in the datapath. While the sample action in the datpath has the matching semantics, its implementation is only optimized for its original use. Specifically, there are two limitation: First, there is a 3 level of nesting restriction, enforced at the flow downloading time. This limit turns out to be too restrictive for the 'clone' use case. Second, the implementation avoid recursive call only if the sample action list has a single userspace action. The main optimization implemented in this series removes the static nesting limit check, instead, implement the run time recursion limit check, and recursion avoidance similar to that of the 'recirc' action. This optimization solve both #1 and #2 issues above. One related optimization attemps to avoid copying flow key as long as the actions enclosed does not change the flow key. The detection is performed only once at the flow downloading time. Another related optimization is to rewrite the action list at flow downloading time in order to save the fast path from parsing the sample action list in its original form repeatedly. Signed-off-by: Andy Zhou <azhou@ovn.org> --- include/uapi/linux/openvswitch.h | 13 ++++ net/openvswitch/actions.c | 111 ++++++++++++++++++---------------- net/openvswitch/datapath.h | 1 + net/openvswitch/flow_netlink.c | 126 ++++++++++++++++++++++++++++----------- 4 files changed, 162 insertions(+), 89 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 7f41f7d..0dfe69b 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -578,10 +578,23 @@ enum ovs_sample_attr { OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */ OVS_SAMPLE_ATTR_ACTIONS, /* Nested OVS_ACTION_ATTR_* attributes. */ __OVS_SAMPLE_ATTR_MAX, + +#ifdef __KERNEL__ + OVS_SAMPLE_ATTR_ARG /* struct sample_arg */ +#endif }; #define OVS_SAMPLE_ATTR_MAX (__OVS_SAMPLE_ATTR_MAX - 1) +#ifdef __KERNEL__ +struct sample_arg { + bool exec; /* When true, actions in sample will not + change flow keys. False otherwise. */ + u32 probability; /* Same value as + 'OVS_SAMPLE_ATTR_PROBABILITY'. */ +}; +#endif + /** * enum ovs_userspace_attr - Attributes for %OVS_ACTION_ATTR_USERSPACE action. * @OVS_USERSPACE_ATTR_PID: u32 Netlink PID to which the %OVS_PACKET_CMD_ACTION diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index a622c19..8428438 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -928,73 +928,72 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb, return ovs_dp_upcall(dp, skb, key, &upcall, cutlen); } + +/* When 'last' is true, sample() should always consume the 'skb'. + * Otherwise, sample() should keep 'skb' intact regardless what + * actions are executed within sample(). + */ static int sample(struct datapath *dp, struct sk_buff *skb, struct sw_flow_key *key, const struct nlattr *attr, - const struct nlattr *actions, int actions_len) + bool last) { - const struct nlattr *acts_list = NULL; - const struct nlattr *a; - int rem; - u32 cutlen = 0; - - for (a = nla_data(attr), rem = nla_len(attr); rem > 0; - a = nla_next(a, &rem)) { - u32 probability; + struct nlattr *actions; + struct nlattr *sample_arg; + struct sk_buff *clone_skb; + struct sw_flow_key *orig = key; + int rem = nla_len(attr); + int err = 0; + const struct sample_arg *arg; - switch (nla_type(a)) { - case OVS_SAMPLE_ATTR_PROBABILITY: - probability = nla_get_u32(a); - if (!probability || prandom_u32() > probability) - return 0; - break; + /* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */ + sample_arg = nla_data(attr); + arg = nla_data(sample_arg); + actions = nla_next(sample_arg, &rem); - case OVS_SAMPLE_ATTR_ACTIONS: - acts_list = a; - break; - } + if ((arg->probability != U32_MAX) && + (!arg->probability || prandom_u32() > arg->probability)) { + if (last) + consume_skb(skb); + return 0; } - rem = nla_len(acts_list); - a = nla_data(acts_list); - - /* Actions list is empty, do nothing */ - if (unlikely(!rem)) + /* Unless the last action, Sample actions work on the skb clone. */ + clone_skb = last ? skb : skb_clone(skb, GFP_ATOMIC); + if (!clone_skb) { + /* Out of memory, skip this sample action. + */ return 0; + } - /* The only known usage of sample action is having a single user-space - * action, or having a truncate action followed by a single user-space - * action. Treat this usage as a special case. - * The output_userspace() should clone the skb to be sent to the - * user space. This skb will be consumed by its caller. + /* In case the sample actions won't change 'key', + * we can use key for the clone execution. + * Otherwise, try to allocate a key from the + * next recursion level of 'flow_keys'. If + * successful, we can still execute the clone + * actions without deferring. + * + * Defer the sample action if the action recursion + * limit has been reached. */ - if (unlikely(nla_type(a) == OVS_ACTION_ATTR_TRUNC)) { - struct ovs_action_trunc *trunc = nla_data(a); - - if (skb->len > trunc->max_len) - cutlen = skb->len - trunc->max_len; - - a = nla_next(a, &rem); + if (!arg->exec) { + __this_cpu_inc(exec_actions_level); + key = clone_key(key); } - if (likely(nla_type(a) == OVS_ACTION_ATTR_USERSPACE && - nla_is_last(a, rem))) - return output_userspace(dp, skb, key, a, actions, - actions_len, cutlen); - - skb = skb_clone(skb, GFP_ATOMIC); - if (!skb) - /* Skip the sample action when out of memory. */ - return 0; + if (key) { + err = do_execute_actions(dp, skb, key, actions, rem); + } else if (!add_deferred_actions(skb, orig, actions, rem)) { - if (!add_deferred_actions(skb, key, nla_data(acts_list), - nla_len(acts_list))) { if (net_ratelimit()) - pr_warn("%s: deferred actions limit reached, dropping sample action\n", + pr_warn("%s: deferred action limit reached, drop sample action\n", ovs_dp_name(dp)); - - kfree_skb(skb); + kfree_skb(clone_skb); } - return 0; + + if (!arg->exec) + __this_cpu_dec(exec_actions_level); + + return err; } static void execute_hash(struct sk_buff *skb, struct sw_flow_key *key, @@ -1245,9 +1244,15 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, err = execute_masked_set_action(skb, key, nla_data(a)); break; - case OVS_ACTION_ATTR_SAMPLE: - err = sample(dp, skb, key, a, attr, len); + case OVS_ACTION_ATTR_SAMPLE: { + bool last = nla_is_last(a, rem); + + err = sample(dp, skb, key, a, last); + if (last) + return err; + break; + } case OVS_ACTION_ATTR_CT: if (!is_flow_key_valid(key)) { diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h index 1c6e937..d8d3953 100644 --- a/net/openvswitch/datapath.h +++ b/net/openvswitch/datapath.h @@ -220,4 +220,5 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb, if (logging_allowed && net_ratelimit()) \ pr_info("netlink: " fmt "\n", ##__VA_ARGS__); \ } while (0) + #endif /* datapath.h */ diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index 6f5fa50..c8ac538 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2007-2014 Nicira, Inc. + * Copyright (c) 2007-2017 Nicira, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of version 2 of the GNU General Public @@ -59,6 +59,39 @@ struct ovs_len_tbl { #define OVS_ATTR_NESTED -1 #define OVS_ATTR_VARIABLE -2 +static bool actions_may_change_flow(const struct nlattr *actions) +{ + struct nlattr *nla; + int rem; + + nla_for_each_nested(nla, actions, rem) { + u16 action = nla_type(nla); + + switch (action) { + case OVS_ACTION_ATTR_OUTPUT: + case OVS_ACTION_ATTR_RECIRC: + case OVS_ACTION_ATTR_USERSPACE: + case OVS_ACTION_ATTR_SAMPLE: + case OVS_ACTION_ATTR_TRUNC: + break; + + case OVS_ACTION_ATTR_PUSH_MPLS: + case OVS_ACTION_ATTR_POP_MPLS: + case OVS_ACTION_ATTR_PUSH_VLAN: + case OVS_ACTION_ATTR_POP_VLAN: + case OVS_ACTION_ATTR_SET: + case OVS_ACTION_ATTR_SET_MASKED: + case OVS_ACTION_ATTR_HASH: + case OVS_ACTION_ATTR_CT: + case OVS_ACTION_ATTR_PUSH_ETH: + case OVS_ACTION_ATTR_POP_ETH: + default: + return true; + } + } + return false; +} + static void update_range(struct sw_flow_match *match, size_t offset, size_t size, bool is_mask) { @@ -2027,12 +2060,14 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr, static int validate_and_copy_sample(struct net *net, const struct nlattr *attr, const struct sw_flow_key *key, int depth, struct sw_flow_actions **sfa, - __be16 eth_type, __be16 vlan_tci, bool log) + __be16 eth_type, __be16 vlan_tci, + bool log, bool last) { const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1]; const struct nlattr *probability, *actions; const struct nlattr *a; - int rem, start, err, st_acts; + int rem, start, err; + struct sample_arg arg; memset(attrs, 0, sizeof(attrs)); nla_for_each_nested(a, attr, rem) { @@ -2056,20 +2091,32 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr, start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SAMPLE, log); if (start < 0) return start; - err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_PROBABILITY, - nla_data(probability), sizeof(u32), log); + + /* When both skb and flow may be changed, put the sample + * into a deferred fifo. On the other hand, if only skb + * may be modified, the actions can be executed in place. + * + * Do this analysis at the flow installation time. + * Set 'clone_action->exec' to true if the actions can be + * executed without being deferred. + * + * If the sample is the last action, it can always be excuted + * rather than deferred. + */ + arg.exec = last || !actions_may_change_flow(actions); + arg.probability = nla_get_u32(probability); + + err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_ARG, &arg, sizeof(arg), + log); if (err) return err; - st_acts = add_nested_action_start(sfa, OVS_SAMPLE_ATTR_ACTIONS, log); - if (st_acts < 0) - return st_acts; - err = __ovs_nla_copy_actions(net, actions, key, depth + 1, sfa, + err = __ovs_nla_copy_actions(net, actions, key, depth, sfa, eth_type, vlan_tci, log); + if (err) return err; - add_nested_action_end(*sfa, st_acts); add_nested_action_end(*sfa, start); return 0; @@ -2554,8 +2601,10 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr, break; case OVS_ACTION_ATTR_SAMPLE: - err = validate_and_copy_sample(net, a, key, depth, sfa, - eth_type, vlan_tci, log); + err = validate_and_copy_sample(net, a, key, depth, + sfa, eth_type, + vlan_tci, log, + nla_is_last(a, rem)); if (err) return err; skip_copy = true; @@ -2621,39 +2670,44 @@ int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr, return err; } -static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb) +static int sample_action_to_attr(const struct nlattr *attr, + struct sk_buff *skb) { - const struct nlattr *a; - struct nlattr *start; - int err = 0, rem; + struct nlattr *start, *ac_start = NULL, *sample_arg; + int err = 0, rem = nla_len(attr); + const struct sample_arg *arg; + struct nlattr *actions; start = nla_nest_start(skb, OVS_ACTION_ATTR_SAMPLE); if (!start) return -EMSGSIZE; - nla_for_each_nested(a, attr, rem) { - int type = nla_type(a); - struct nlattr *st_sample; + sample_arg = nla_data(attr); + arg = nla_data(sample_arg); + actions = nla_next(sample_arg, &rem); - switch (type) { - case OVS_SAMPLE_ATTR_PROBABILITY: - if (nla_put(skb, OVS_SAMPLE_ATTR_PROBABILITY, - sizeof(u32), nla_data(a))) - return -EMSGSIZE; - break; - case OVS_SAMPLE_ATTR_ACTIONS: - st_sample = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS); - if (!st_sample) - return -EMSGSIZE; - err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb); - if (err) - return err; - nla_nest_end(skb, st_sample); - break; - } + if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PROBABILITY, arg->probability)) { + err = -EMSGSIZE; + goto out; + } + + ac_start = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS); + if (!ac_start) { + err = -EMSGSIZE; + goto out; + } + + err = ovs_nla_put_actions(actions, rem, skb); + +out: + if (err) { + nla_nest_cancel(skb, ac_start); + nla_nest_cancel(skb, start); + } else { + nla_nest_end(skb, ac_start); + nla_nest_end(skb, start); } - nla_nest_end(skb, start); return err; } -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases 2017-03-11 0:51 ` [net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases Andy Zhou @ 2017-03-13 7:08 ` Pravin Shelar 2017-03-13 20:14 ` Andy Zhou 0 siblings, 1 reply; 11+ messages in thread From: Pravin Shelar @ 2017-03-13 7:08 UTC (permalink / raw) To: Andy Zhou; +Cc: Linux Kernel Network Developers, Joe Stringer On Fri, Mar 10, 2017 at 4:51 PM, Andy Zhou <azhou@ovn.org> wrote: > With the introduction of open flow 'clone' action, the OVS user space > can now translate the 'clone' action into kernel datapath 'sample' > action, with 100% probability, to ensure that the clone semantics, > which is that the packet seen by the clone action is the same as the > packet seen by the action after clone, is faithfully carried out > in the datapath. > > While the sample action in the datpath has the matching semantics, > its implementation is only optimized for its original use. > Specifically, there are two limitation: First, there is a 3 level of > nesting restriction, enforced at the flow downloading time. This > limit turns out to be too restrictive for the 'clone' use case. > Second, the implementation avoid recursive call only if the sample > action list has a single userspace action. > > The main optimization implemented in this series removes the static > nesting limit check, instead, implement the run time recursion limit > check, and recursion avoidance similar to that of the 'recirc' action. > This optimization solve both #1 and #2 issues above. > > One related optimization attemps to avoid copying flow key as > long as the actions enclosed does not change the flow key. The > detection is performed only once at the flow downloading time. > > Another related optimization is to rewrite the action list > at flow downloading time in order to save the fast path from parsing > the sample action list in its original form repeatedly. > > Signed-off-by: Andy Zhou <azhou@ovn.org> > --- > include/uapi/linux/openvswitch.h | 13 ++++ > net/openvswitch/actions.c | 111 ++++++++++++++++++---------------- > net/openvswitch/datapath.h | 1 + > net/openvswitch/flow_netlink.c | 126 ++++++++++++++++++++++++++++----------- > 4 files changed, 162 insertions(+), 89 deletions(-) > .... ... > static int sample(struct datapath *dp, struct sk_buff *skb, > struct sw_flow_key *key, const struct nlattr *attr, > - const struct nlattr *actions, int actions_len) > + bool last) > { > - const struct nlattr *acts_list = NULL; > - const struct nlattr *a; > - int rem; > - u32 cutlen = 0; > - > - for (a = nla_data(attr), rem = nla_len(attr); rem > 0; > - a = nla_next(a, &rem)) { > - u32 probability; > + struct nlattr *actions; > + struct nlattr *sample_arg; > + struct sk_buff *clone_skb; > + struct sw_flow_key *orig = key; > + int rem = nla_len(attr); > + int err = 0; > + const struct sample_arg *arg; > > - switch (nla_type(a)) { > - case OVS_SAMPLE_ATTR_PROBABILITY: > - probability = nla_get_u32(a); > - if (!probability || prandom_u32() > probability) > - return 0; > - break; > + /* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */ > + sample_arg = nla_data(attr); > + arg = nla_data(sample_arg); > + actions = nla_next(sample_arg, &rem); > > - case OVS_SAMPLE_ATTR_ACTIONS: > - acts_list = a; > - break; > - } > + if ((arg->probability != U32_MAX) && > + (!arg->probability || prandom_u32() > arg->probability)) { > + if (last) > + consume_skb(skb); To simplify let the existing code in do_execute_action() handle freeing skb in this case. > + return 0; > } > > - rem = nla_len(acts_list); > - a = nla_data(acts_list); > - > - /* Actions list is empty, do nothing */ > - if (unlikely(!rem)) > + /* Unless the last action, Sample actions work on the skb clone. */ > + clone_skb = last ? skb : skb_clone(skb, GFP_ATOMIC); > + if (!clone_skb) { > + /* Out of memory, skip this sample action. > + */ > return 0; > + } > > - /* The only known usage of sample action is having a single user-space > - * action, or having a truncate action followed by a single user-space > - * action. Treat this usage as a special case. > - * The output_userspace() should clone the skb to be sent to the > - * user space. This skb will be consumed by its caller. > + /* In case the sample actions won't change 'key', > + * we can use key for the clone execution. > + * Otherwise, try to allocate a key from the > + * next recursion level of 'flow_keys'. If > + * successful, we can still execute the clone > + * actions without deferring. > + * > + * Defer the sample action if the action recursion > + * limit has been reached. > */ > - if (unlikely(nla_type(a) == OVS_ACTION_ATTR_TRUNC)) { > - struct ovs_action_trunc *trunc = nla_data(a); > - > - if (skb->len > trunc->max_len) > - cutlen = skb->len - trunc->max_len; > - > - a = nla_next(a, &rem); > + if (!arg->exec) { > + __this_cpu_inc(exec_actions_level); > + key = clone_key(key); > } > > - if (likely(nla_type(a) == OVS_ACTION_ATTR_USERSPACE && > - nla_is_last(a, rem))) > - return output_userspace(dp, skb, key, a, actions, > - actions_len, cutlen); > - > - skb = skb_clone(skb, GFP_ATOMIC); > - if (!skb) > - /* Skip the sample action when out of memory. */ > - return 0; > + if (key) { > + err = do_execute_actions(dp, skb, key, actions, rem); > + } else if (!add_deferred_actions(skb, orig, actions, rem)) { > We can refactor this code to avoid duplicate code all actions implementation. This way there could be single function dealing with both defered action and key fifo arrays. .... .... > diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h > index 1c6e937..d8d3953 100644 > --- a/net/openvswitch/datapath.h > +++ b/net/openvswitch/datapath.h > @@ -220,4 +220,5 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb, > if (logging_allowed && net_ratelimit()) \ > pr_info("netlink: " fmt "\n", ##__VA_ARGS__); \ > } while (0) > + > #endif /* datapath.h */ Extraneous whitespace change. > diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c > index 6f5fa50..c8ac538 100644 > --- a/net/openvswitch/flow_netlink.c > +++ b/net/openvswitch/flow_netlink.c ... > +static bool actions_may_change_flow(const struct nlattr *actions) > +{ > + struct nlattr *nla; > + int rem; > + > + nla_for_each_nested(nla, actions, rem) { > + u16 action = nla_type(nla); > + > + switch (action) { > + case OVS_ACTION_ATTR_OUTPUT: > + case OVS_ACTION_ATTR_RECIRC: > + case OVS_ACTION_ATTR_USERSPACE: > + case OVS_ACTION_ATTR_SAMPLE: > + case OVS_ACTION_ATTR_TRUNC: > + break; > + > + case OVS_ACTION_ATTR_PUSH_MPLS: > + case OVS_ACTION_ATTR_POP_MPLS: > + case OVS_ACTION_ATTR_PUSH_VLAN: > + case OVS_ACTION_ATTR_POP_VLAN: > + case OVS_ACTION_ATTR_SET: > + case OVS_ACTION_ATTR_SET_MASKED: > + case OVS_ACTION_ATTR_HASH: > + case OVS_ACTION_ATTR_CT: > + case OVS_ACTION_ATTR_PUSH_ETH: > + case OVS_ACTION_ATTR_POP_ETH: > + default: > + return true; > + } > + } > + return false; > +} > + I am not sure if we can put sample or recirc in "may not change flow" actions list. Consider following set of actions: sample(sample(set-IP)),userpsace(),... In this case the userspace action could recieve skb with inconsistent flow due to preceding of set action nested in the sample action. > static void update_range(struct sw_flow_match *match, > size_t offset, size_t size, bool is_mask) > { .... > @@ -2056,20 +2091,32 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr, > start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SAMPLE, log); > if (start < 0) > return start; > - err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_PROBABILITY, > - nla_data(probability), sizeof(u32), log); > + > + /* When both skb and flow may be changed, put the sample > + * into a deferred fifo. On the other hand, if only skb > + * may be modified, the actions can be executed in place. > + * > + * Do this analysis at the flow installation time. > + * Set 'clone_action->exec' to true if the actions can be > + * executed without being deferred. > + * > + * If the sample is the last action, it can always be excuted > + * rather than deferred. > + */ > + arg.exec = last || !actions_may_change_flow(actions); > + arg.probability = nla_get_u32(probability); > + > + err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_ARG, &arg, sizeof(arg), > + log); > if (err) > return err; > - st_acts = add_nested_action_start(sfa, OVS_SAMPLE_ATTR_ACTIONS, log); > - if (st_acts < 0) > - return st_acts; > > - err = __ovs_nla_copy_actions(net, actions, key, depth + 1, sfa, > + err = __ovs_nla_copy_actions(net, actions, key, depth, sfa, > eth_type, vlan_tci, log); Since sample action `depth` is not tracked anymore during flow install, we should remove the parameter and its limit from datapath.h. ... ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases 2017-03-13 7:08 ` Pravin Shelar @ 2017-03-13 20:14 ` Andy Zhou 2017-03-13 23:39 ` Andy Zhou 2017-03-14 5:57 ` Pravin Shelar 0 siblings, 2 replies; 11+ messages in thread From: Andy Zhou @ 2017-03-13 20:14 UTC (permalink / raw) To: Pravin Shelar; +Cc: Linux Kernel Network Developers, Joe Stringer Thanks for the review. Please see comments inline. On Mon, Mar 13, 2017 at 12:08 AM, Pravin Shelar <pshelar@ovn.org> wrote: > On Fri, Mar 10, 2017 at 4:51 PM, Andy Zhou <azhou@ovn.org> wrote: >> With the introduction of open flow 'clone' action, the OVS user space >> can now translate the 'clone' action into kernel datapath 'sample' >> action, with 100% probability, to ensure that the clone semantics, >> which is that the packet seen by the clone action is the same as the >> packet seen by the action after clone, is faithfully carried out >> in the datapath. >> >> While the sample action in the datpath has the matching semantics, >> its implementation is only optimized for its original use. >> Specifically, there are two limitation: First, there is a 3 level of >> nesting restriction, enforced at the flow downloading time. This >> limit turns out to be too restrictive for the 'clone' use case. >> Second, the implementation avoid recursive call only if the sample >> action list has a single userspace action. >> >> The main optimization implemented in this series removes the static >> nesting limit check, instead, implement the run time recursion limit >> check, and recursion avoidance similar to that of the 'recirc' action. >> This optimization solve both #1 and #2 issues above. >> >> One related optimization attemps to avoid copying flow key as >> long as the actions enclosed does not change the flow key. The >> detection is performed only once at the flow downloading time. >> >> Another related optimization is to rewrite the action list >> at flow downloading time in order to save the fast path from parsing >> the sample action list in its original form repeatedly. >> >> Signed-off-by: Andy Zhou <azhou@ovn.org> >> --- >> include/uapi/linux/openvswitch.h | 13 ++++ >> net/openvswitch/actions.c | 111 ++++++++++++++++++---------------- >> net/openvswitch/datapath.h | 1 + >> net/openvswitch/flow_netlink.c | 126 ++++++++++++++++++++++++++++----------- >> 4 files changed, 162 insertions(+), 89 deletions(-) >> > .... > ... >> static int sample(struct datapath *dp, struct sk_buff *skb, >> struct sw_flow_key *key, const struct nlattr *attr, >> - const struct nlattr *actions, int actions_len) >> + bool last) >> { >> - const struct nlattr *acts_list = NULL; >> - const struct nlattr *a; >> - int rem; >> - u32 cutlen = 0; >> - >> - for (a = nla_data(attr), rem = nla_len(attr); rem > 0; >> - a = nla_next(a, &rem)) { >> - u32 probability; >> + struct nlattr *actions; >> + struct nlattr *sample_arg; >> + struct sk_buff *clone_skb; >> + struct sw_flow_key *orig = key; >> + int rem = nla_len(attr); >> + int err = 0; >> + const struct sample_arg *arg; >> >> - switch (nla_type(a)) { >> - case OVS_SAMPLE_ATTR_PROBABILITY: >> - probability = nla_get_u32(a); >> - if (!probability || prandom_u32() > probability) >> - return 0; >> - break; >> + /* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */ >> + sample_arg = nla_data(attr); >> + arg = nla_data(sample_arg); >> + actions = nla_next(sample_arg, &rem); >> >> - case OVS_SAMPLE_ATTR_ACTIONS: >> - acts_list = a; >> - break; >> - } >> + if ((arg->probability != U32_MAX) && >> + (!arg->probability || prandom_u32() > arg->probability)) { >> + if (last) >> + consume_skb(skb); > To simplify let the existing code in do_execute_action() handle > freeing skb in this case. In the 'last' case, this function always consumes skb. In case the probability test passes, this is done by not cloning the skb before passing the skb to 'do_execute_actions'. In case the probability test does not pass, the skb has to be explicitly freed. Currently, the caller does not know which case it is. Are you suggesting that we change the function signature to pass this information back to the caller? Or something else? > >> + return 0; >> } >> >> - rem = nla_len(acts_list); >> - a = nla_data(acts_list); >> - >> - /* Actions list is empty, do nothing */ >> - if (unlikely(!rem)) >> + /* Unless the last action, Sample actions work on the skb clone. */ >> + clone_skb = last ? skb : skb_clone(skb, GFP_ATOMIC); >> + if (!clone_skb) { >> + /* Out of memory, skip this sample action. >> + */ >> return 0; >> + } >> >> - /* The only known usage of sample action is having a single user-space >> - * action, or having a truncate action followed by a single user-space >> - * action. Treat this usage as a special case. >> - * The output_userspace() should clone the skb to be sent to the >> - * user space. This skb will be consumed by its caller. >> + /* In case the sample actions won't change 'key', >> + * we can use key for the clone execution. >> + * Otherwise, try to allocate a key from the >> + * next recursion level of 'flow_keys'. If >> + * successful, we can still execute the clone >> + * actions without deferring. >> + * >> + * Defer the sample action if the action recursion >> + * limit has been reached. >> */ >> - if (unlikely(nla_type(a) == OVS_ACTION_ATTR_TRUNC)) { >> - struct ovs_action_trunc *trunc = nla_data(a); >> - >> - if (skb->len > trunc->max_len) >> - cutlen = skb->len - trunc->max_len; >> - >> - a = nla_next(a, &rem); >> + if (!arg->exec) { >> + __this_cpu_inc(exec_actions_level); >> + key = clone_key(key); >> } >> >> - if (likely(nla_type(a) == OVS_ACTION_ATTR_USERSPACE && >> - nla_is_last(a, rem))) >> - return output_userspace(dp, skb, key, a, actions, >> - actions_len, cutlen); >> - >> - skb = skb_clone(skb, GFP_ATOMIC); >> - if (!skb) >> - /* Skip the sample action when out of memory. */ >> - return 0; >> + if (key) { >> + err = do_execute_actions(dp, skb, key, actions, rem); >> + } else if (!add_deferred_actions(skb, orig, actions, rem)) { >> > We can refactor this code to avoid duplicate code all actions > implementation. This way there could be single function dealing with > both defered action and key fifo arrays. O.K. I will make the change in the next version. > > .... > .... >> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h >> index 1c6e937..d8d3953 100644 >> --- a/net/openvswitch/datapath.h >> +++ b/net/openvswitch/datapath.h >> @@ -220,4 +220,5 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb, >> if (logging_allowed && net_ratelimit()) \ >> pr_info("netlink: " fmt "\n", ##__VA_ARGS__); \ >> } while (0) >> + >> #endif /* datapath.h */ > > Extraneous whitespace change. Thanks, will fix. > >> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c >> index 6f5fa50..c8ac538 100644 >> --- a/net/openvswitch/flow_netlink.c >> +++ b/net/openvswitch/flow_netlink.c > ... >> +static bool actions_may_change_flow(const struct nlattr *actions) >> +{ >> + struct nlattr *nla; >> + int rem; >> + >> + nla_for_each_nested(nla, actions, rem) { >> + u16 action = nla_type(nla); >> + >> + switch (action) { >> + case OVS_ACTION_ATTR_OUTPUT: >> + case OVS_ACTION_ATTR_RECIRC: >> + case OVS_ACTION_ATTR_USERSPACE: >> + case OVS_ACTION_ATTR_SAMPLE: >> + case OVS_ACTION_ATTR_TRUNC: >> + break; >> + >> + case OVS_ACTION_ATTR_PUSH_MPLS: >> + case OVS_ACTION_ATTR_POP_MPLS: >> + case OVS_ACTION_ATTR_PUSH_VLAN: >> + case OVS_ACTION_ATTR_POP_VLAN: >> + case OVS_ACTION_ATTR_SET: >> + case OVS_ACTION_ATTR_SET_MASKED: >> + case OVS_ACTION_ATTR_HASH: >> + case OVS_ACTION_ATTR_CT: >> + case OVS_ACTION_ATTR_PUSH_ETH: >> + case OVS_ACTION_ATTR_POP_ETH: >> + default: >> + return true; >> + } >> + } >> + return false; >> +} >> + > I am not sure if we can put sample or recirc in "may not change flow" > actions list. Consider following set of actions: > sample(sample(set-IP)),userpsace(),... > In this case the userspace action could recieve skb with inconsistent > flow due to preceding of set action nested in the sample action. Good catch. Will fix. > >> static void update_range(struct sw_flow_match *match, >> size_t offset, size_t size, bool is_mask) >> { > .... >> @@ -2056,20 +2091,32 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr, >> start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SAMPLE, log); >> if (start < 0) >> return start; >> - err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_PROBABILITY, >> - nla_data(probability), sizeof(u32), log); >> + >> + /* When both skb and flow may be changed, put the sample >> + * into a deferred fifo. On the other hand, if only skb >> + * may be modified, the actions can be executed in place. >> + * >> + * Do this analysis at the flow installation time. >> + * Set 'clone_action->exec' to true if the actions can be >> + * executed without being deferred. >> + * >> + * If the sample is the last action, it can always be excuted >> + * rather than deferred. >> + */ >> + arg.exec = last || !actions_may_change_flow(actions); >> + arg.probability = nla_get_u32(probability); >> + >> + err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_ARG, &arg, sizeof(arg), >> + log); >> if (err) >> return err; >> - st_acts = add_nested_action_start(sfa, OVS_SAMPLE_ATTR_ACTIONS, log); >> - if (st_acts < 0) >> - return st_acts; >> >> - err = __ovs_nla_copy_actions(net, actions, key, depth + 1, sfa, >> + err = __ovs_nla_copy_actions(net, actions, key, depth, sfa, >> eth_type, vlan_tci, log); > > Since sample action `depth` is not tracked anymore during flow > install, we should remove the parameter and its limit from datapath.h. Sure, will fix. > ... ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases 2017-03-13 20:14 ` Andy Zhou @ 2017-03-13 23:39 ` Andy Zhou 2017-03-14 6:20 ` Pravin Shelar 2017-03-14 5:57 ` Pravin Shelar 1 sibling, 1 reply; 11+ messages in thread From: Andy Zhou @ 2017-03-13 23:39 UTC (permalink / raw) To: Pravin Shelar; +Cc: Linux Kernel Network Developers, Joe Stringer >>> - skb = skb_clone(skb, GFP_ATOMIC); >>> - if (!skb) >>> - /* Skip the sample action when out of memory. */ >>> - return 0; >>> + if (key) { >>> + err = do_execute_actions(dp, skb, key, actions, rem); >>> + } else if (!add_deferred_actions(skb, orig, actions, rem)) { >>> >> We can refactor this code to avoid duplicate code all actions >> implementation. This way there could be single function dealing with >> both defered action and key fifo arrays. > > O.K. I will make the change in the next version. After looking more at it, the sample action and recirc action are different enough that I don't see clean ways refactor them. For example, recirc needs to deal with setting recirc_id. recirc calls ovs_dp_process_packet, and sample calls do_execute_action. >> I am not sure if we can put sample or recirc in "may not change flow" >> actions list. Consider following set of actions: >> sample(sample(set-IP)),userpsace(),... >> In this case the userspace action could recieve skb with inconsistent >> flow due to preceding of set action nested in the sample action. > > Good catch. Will fix. What's the objection on recirc action? It always works on clone key. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases 2017-03-13 23:39 ` Andy Zhou @ 2017-03-14 6:20 ` Pravin Shelar 2017-03-14 23:11 ` Andy Zhou 0 siblings, 1 reply; 11+ messages in thread From: Pravin Shelar @ 2017-03-14 6:20 UTC (permalink / raw) To: Andy Zhou; +Cc: Linux Kernel Network Developers, Joe Stringer On Mon, Mar 13, 2017 at 4:39 PM, Andy Zhou <azhou@ovn.org> wrote: >>>> - skb = skb_clone(skb, GFP_ATOMIC); >>>> - if (!skb) >>>> - /* Skip the sample action when out of memory. */ >>>> - return 0; >>>> + if (key) { >>>> + err = do_execute_actions(dp, skb, key, actions, rem); >>>> + } else if (!add_deferred_actions(skb, orig, actions, rem)) { >>>> >>> We can refactor this code to avoid duplicate code all actions >>> implementation. This way there could be single function dealing with >>> both defered action and key fifo arrays. >> >> O.K. I will make the change in the next version. > > After looking more at it, the sample action and recirc action are different > enough that I don't see clean ways refactor them. For example, recirc > needs to deal with setting recirc_id. recirc calls > ovs_dp_process_packet, and sample calls do_execute_action. > Actions parameter which hints if it is recirc or sample. We can add recic-id param and set it if it is recic case. will this work? >>> I am not sure if we can put sample or recirc in "may not change flow" >>> actions list. Consider following set of actions: >>> sample(sample(set-IP)),userpsace(),... >>> In this case the userspace action could recieve skb with inconsistent >>> flow due to preceding of set action nested in the sample action. >> >> Good catch. Will fix. > > What's the objection on recirc action? It always works on clone key. ok. recic does not have the issue. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases 2017-03-14 6:20 ` Pravin Shelar @ 2017-03-14 23:11 ` Andy Zhou 0 siblings, 0 replies; 11+ messages in thread From: Andy Zhou @ 2017-03-14 23:11 UTC (permalink / raw) To: Pravin Shelar; +Cc: Linux Kernel Network Developers, Joe Stringer > Actions parameter which hints if it is recirc or sample. We can add > recic-id param and set it if it is recic case. > will this work? > Just posted v2. I added a patch in the end to implement this refactoring as suggested. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases 2017-03-13 20:14 ` Andy Zhou 2017-03-13 23:39 ` Andy Zhou @ 2017-03-14 5:57 ` Pravin Shelar 1 sibling, 0 replies; 11+ messages in thread From: Pravin Shelar @ 2017-03-14 5:57 UTC (permalink / raw) To: Andy Zhou; +Cc: Linux Kernel Network Developers, Joe Stringer On Mon, Mar 13, 2017 at 1:14 PM, Andy Zhou <azhou@ovn.org> wrote: > Thanks for the review. Please see comments inline. > > On Mon, Mar 13, 2017 at 12:08 AM, Pravin Shelar <pshelar@ovn.org> wrote: >> On Fri, Mar 10, 2017 at 4:51 PM, Andy Zhou <azhou@ovn.org> wrote: >>> With the introduction of open flow 'clone' action, the OVS user space >>> can now translate the 'clone' action into kernel datapath 'sample' >>> action, with 100% probability, to ensure that the clone semantics, >>> which is that the packet seen by the clone action is the same as the >>> packet seen by the action after clone, is faithfully carried out >>> in the datapath. >>> >>> While the sample action in the datpath has the matching semantics, >>> its implementation is only optimized for its original use. >>> Specifically, there are two limitation: First, there is a 3 level of >>> nesting restriction, enforced at the flow downloading time. This >>> limit turns out to be too restrictive for the 'clone' use case. >>> Second, the implementation avoid recursive call only if the sample >>> action list has a single userspace action. >>> >>> The main optimization implemented in this series removes the static >>> nesting limit check, instead, implement the run time recursion limit >>> check, and recursion avoidance similar to that of the 'recirc' action. >>> This optimization solve both #1 and #2 issues above. >>> >>> One related optimization attemps to avoid copying flow key as >>> long as the actions enclosed does not change the flow key. The >>> detection is performed only once at the flow downloading time. >>> >>> Another related optimization is to rewrite the action list >>> at flow downloading time in order to save the fast path from parsing >>> the sample action list in its original form repeatedly. >>> >>> Signed-off-by: Andy Zhou <azhou@ovn.org> >>> --- >>> include/uapi/linux/openvswitch.h | 13 ++++ >>> net/openvswitch/actions.c | 111 ++++++++++++++++++---------------- >>> net/openvswitch/datapath.h | 1 + >>> net/openvswitch/flow_netlink.c | 126 ++++++++++++++++++++++++++++----------- >>> 4 files changed, 162 insertions(+), 89 deletions(-) >>> >> .... >> ... >>> static int sample(struct datapath *dp, struct sk_buff *skb, >>> struct sw_flow_key *key, const struct nlattr *attr, >>> - const struct nlattr *actions, int actions_len) >>> + bool last) >>> { >>> - const struct nlattr *acts_list = NULL; >>> - const struct nlattr *a; >>> - int rem; >>> - u32 cutlen = 0; >>> - >>> - for (a = nla_data(attr), rem = nla_len(attr); rem > 0; >>> - a = nla_next(a, &rem)) { >>> - u32 probability; >>> + struct nlattr *actions; >>> + struct nlattr *sample_arg; >>> + struct sk_buff *clone_skb; >>> + struct sw_flow_key *orig = key; >>> + int rem = nla_len(attr); >>> + int err = 0; >>> + const struct sample_arg *arg; >>> >>> - switch (nla_type(a)) { >>> - case OVS_SAMPLE_ATTR_PROBABILITY: >>> - probability = nla_get_u32(a); >>> - if (!probability || prandom_u32() > probability) >>> - return 0; >>> - break; >>> + /* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */ >>> + sample_arg = nla_data(attr); >>> + arg = nla_data(sample_arg); >>> + actions = nla_next(sample_arg, &rem); >>> >>> - case OVS_SAMPLE_ATTR_ACTIONS: >>> - acts_list = a; >>> - break; >>> - } >>> + if ((arg->probability != U32_MAX) && >>> + (!arg->probability || prandom_u32() > arg->probability)) { >>> + if (last) >>> + consume_skb(skb); >> To simplify let the existing code in do_execute_action() handle >> freeing skb in this case. > > In the 'last' case, this function always consumes skb. In case the > probability test passes, this is done by not cloning the skb > before passing the skb to 'do_execute_actions'. In case the > probability test does not pass, the skb has to be explicitly freed. > Currently, the caller does not know which case it is. Are you > suggesting that we change the function signature to pass > this information back to the caller? Or something else? >> I see, lets keep this code then. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2017-03-14 23:12 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-03-11 0:51 [net-next sample action optimization 0/3] Andy Zhou 2017-03-11 0:51 ` [net-next sample action optimization 1/3] openvswitch: Deferred fifo API change Andy Zhou 2017-03-11 0:51 ` [net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation Andy Zhou 2017-03-13 20:32 ` Joe Stringer 2017-03-11 0:51 ` [net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases Andy Zhou 2017-03-13 7:08 ` Pravin Shelar 2017-03-13 20:14 ` Andy Zhou 2017-03-13 23:39 ` Andy Zhou 2017-03-14 6:20 ` Pravin Shelar 2017-03-14 23:11 ` Andy Zhou 2017-03-14 5:57 ` Pravin Shelar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).