* [PATCH net-next 1/3] packet: improve socket create/bind latency in some cases
From: Daniel Borkmann @ 2014-01-12 16:22 UTC
To: davem; +Cc: netdev
In-Reply-To: <1389543768-20234-1-git-send-email-dborkman@redhat.com>
Most people acquire PF_PACKET sockets with a protocol argument in
the socket call, e.g. libpcap does so with htons(ETH_P_ALL) for
all its sockets. Most likely, at some point in time a subsequent
bind() call will follow, e.g. in libpcap with ...
	memset(&sll, 0, sizeof(sll));
	sll.sll_family = AF_PACKET;
	sll.sll_ifindex = ifindex;
	sll.sll_protocol = htons(ETH_P_ALL);
... as arguments. What happens in the kernel is that already
in socket() syscall, we install a proto hook via register_prot_hook()
if our protocol argument is != 0. Yet, in bind() we're almost
doing the same work by doing an unregister_prot_hook() with an
expensive synchronize_net() call in case during socket() the proto
was != 0, plus follow-up register_prot_hook() with a bound device
to it this time, in order to limit traffic we get.
In the case when the protocol and user supplied device index (== 0)
do not change from socket() to bind(), we can spare ourselves doing
the same work twice. Similarly for re-binding to the same device
and protocol. For these scenarios, we can decrease create/bind
latency from ~7447us (sock-bind-2 case) to ~89us (sock-bind-1 case)
with this patch.
Alternatively, for the first case, if people care, they should
simply create their sockets with proto == 0 argument and define
the protocol during bind() as this saves a call to synchronize_net()
as well (sock-bind-3 case).
In all other cases, we're tied to user space behaviour we must not
change, also since a bind() is not strictly required. Thus, we need
the synchronize_net() to make sure no asynchronous packet processing
paths still refer to the previous elements of po->prot_hook.
In case of mmap()ed sockets, the workflow that includes bind() is
socket() -> setsockopt(<ring>) -> bind(). In that case, a pair of
{__unregister, register}_prot_hook is being called from setsockopt()
in order to install the new protocol receive handler. Thus, when
we call bind and can skip a re-hook, we have already previously
installed the new handler. For fanout, this is handled differently
entirely, so we should be good.
Timings on an i7-3520M machine:
* sock-bind-1: 89 us
* sock-bind-2: 7447 us
* sock-bind-3: 75 us
sock-bind-1:
socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP)) = 3
bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=all(0),
pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0
sock-bind-2:
socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP)) = 3
bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=lo(1),
pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0
sock-bind-3:
socket(PF_PACKET, SOCK_RAW, 0) = 3
bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=lo(1),
pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0
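For illustration, the sock-bind-3 call sequence could be written in user space roughly as follows. This is a minimal sketch, not code taken from libpcap or from the patch: the helper names `fill_sll()` and `open_packet_socket_deferred()` are made up for this example, and opening a PF_PACKET socket normally requires CAP_NET_RAW.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>        /* htons() */
#include <sys/socket.h>
#include <linux/if_ether.h>   /* ETH_P_IP */
#include <linux/if_packet.h>  /* struct sockaddr_ll */

/* Fill a sockaddr_ll the same way the bind() calls above do. */
void fill_sll(struct sockaddr_ll *sll, int ifindex, unsigned short proto)
{
	assert(sll != NULL);
	memset(sll, 0, sizeof(*sll));
	sll->sll_family   = AF_PACKET;
	sll->sll_ifindex  = ifindex;      /* 0 means "all devices" */
	sll->sll_protocol = htons(proto);
}

/*
 * sock-bind-3 pattern: create the socket with protocol 0 so that
 * socket() installs no hook, then pick device and protocol in bind().
 * Returns the socket fd, or -1 with errno set (for example EPERM
 * when the caller lacks CAP_NET_RAW).
 */
int open_packet_socket_deferred(int ifindex, unsigned short proto)
{
	struct sockaddr_ll sll;
	int fd = socket(PF_PACKET, SOCK_RAW, 0);

	if (fd < 0)
		return -1;

	fill_sll(&sll, ifindex, proto);
	if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
		int saved = errno;  /* close() may overwrite errno */

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

A caller would use something like `open_packet_socket_deferred(1, ETH_P_IP)` for the `lo` case shown above and check the return value against -1.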
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
v1->v2:
- applied Dave's feedback to move assignments under bind lock
- removed cleanup part
net/packet/af_packet.c | 33 ++++++++++++++++++++++-----------
1 file changed, 22 insertions(+), 11 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 279467b..85bb38c 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2567,9 +2567,12 @@ static int packet_release(struct socket *sock)
* Attach a packet hook.
*/
-static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 protocol)
+static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 proto)
{
struct packet_sock *po = pkt_sk(sk);
+ const struct net_device *dev_curr;
+ __be16 proto_curr;
+ bool need_rehook;
if (po->fanout) {
if (dev)
@@ -2579,21 +2582,29 @@ static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 protoc
}
lock_sock(sk);
-
spin_lock(&po->bind_lock);
- unregister_prot_hook(sk, true);
- po->num = protocol;
- po->prot_hook.type = protocol;
- if (po->prot_hook.dev)
- dev_put(po->prot_hook.dev);
+ proto_curr = po->prot_hook.type;
+ dev_curr = po->prot_hook.dev;
+
+ need_rehook = proto_curr != proto || dev_curr != dev;
+
+ if (need_rehook) {
+ unregister_prot_hook(sk, true);
- po->prot_hook.dev = dev;
- po->ifindex = dev ? dev->ifindex : 0;
+ po->num = proto;
+ po->prot_hook.type = proto;
+
+ if (po->prot_hook.dev)
+ dev_put(po->prot_hook.dev);
- packet_cached_dev_assign(po, dev);
+ po->prot_hook.dev = dev;
+
+ po->ifindex = dev ? dev->ifindex : 0;
+ packet_cached_dev_assign(po, dev);
+ }
- if (protocol == 0)
+ if (proto == 0 || !need_rehook)
goto out_unlock;
if (!dev || (dev->flags & IFF_UP)) {
--
1.7.11.7
* [PATCH net-next 0/3] pf_packet updates
From: Daniel Borkmann @ 2014-01-12 16:22 UTC
To: davem; +Cc: netdev
Daniel Borkmann (3):
packet: improve socket create/bind latency in some cases
packet: don't unconditionally schedule() in case of MSG_DONTWAIT
packet: use percpu mmap tx frame pending refcount
net/packet/af_packet.c | 105 +++++++++++++++++++++++++++++++++++++++----------
net/packet/diag.c | 1 +
net/packet/internal.h | 2 +-
3 files changed, 86 insertions(+), 22 deletions(-)
--
1.7.11.7
* Re: [PATCH net-next] IPv6: enable TCP to use an anycast address
From: François-Xavier Le Bail @ 2014-01-12 14:53 UTC
To: Alexey Kuznetsov, Hannes Frederic Sowa
Cc: netdev, David S. Miller, James Morris, Hideaki Yoshifuji,
Patrick McHardy
On Sat, 1/11/14, Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> On Sat, Jan 11, 2014 at 05:38:27PM +0400, Alexey Kuznetsov wrote:
> > On Sat, Jan 11, 2014 at 5:06 PM, François-Xavier Le Bail
> > <fx.lebail@yahoo.com> wrote:
> > > Many DNS root-servers use TCP with anycast (IPv4 and IPv6).
> >
> > Actually, I was alerted by reset processing in your patch, it cannot be right.
> >
> > Don't you think this should not be enabled for common use? At least
> > some separate sysctl disabled by default.
> The idea I had was that if a socket knowingly binds to an anycast
> address, it is allowed to do so and process queries on it with both TCP and
> UDP. I don't think we need a sysctl for that? Anycast addresses are either
> pre-defined (e.g. the subnet router anycast address) or specified by a flag
> when the administrator adds one. Currently one can only add anycast addresses
> either by enabling forwarding, which yields the per-subnet anycast address,
> or with a setsockopt IPV6_JOIN_ANYCAST.
> So the problem is what should be allowed when the socket listens on an any
> address? Maybe this should be protected by a sysctl?
Hi,
TCP case:
With my two patches (the one for bind and this one for tcp), when a
SOCK_STREAM socket listens on in6addr_any, the server is able to
send TCP replies with a unicast or anycast source address, according
to the destination address used by the client.
	dest request unicast => src reply unicast (current behavior)
	dest request anycast => src reply anycast (new)
So, I don't think there is a need for a sysctl.
UDP case:
By default (no socket option), the server program doesn't know the
destination address of the request. ipv6_dev_get_saddr() is
used to choose the unicast source address of the reply.
I am not sure a change is needed here.
When using IPV6_RECVPKTINFO, a server is able to know
the destination address of the request and can use it as source
address for the reply.
To enable anycast for this (i.e. to avoid getting EINVAL), a patch
is needed like the one I posted ("IPv6: add option to use anycast
addresses as source addresses for datagrams").
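The IPV6_RECVPKTINFO approach mentioned above might look roughly like this on the receive side. It is a sketch only: `enable_recv_pktinfo()` and `request_dst_addr()` are hypothetical helper names, and the code only shows how a server can learn the destination address of a request; whether an anycast address is then accepted as a source address depends on the kernel change under discussion.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Ask the kernel to attach IPV6_PKTINFO ancillary data to received datagrams. */
int enable_recv_pktinfo(int fd)
{
	int on = 1;

	return setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPKTINFO, &on, sizeof(on));
}

/*
 * Copy the destination address of a received datagram into *dst.
 * msg must have been filled in by recvmsg() with a control buffer.
 * The IPV6_PKTINFO payload is a struct in6_pktinfo, whose first member
 * is the destination address (ipi6_addr), so only that part is copied.
 * Returns 0 on success, -1 if no IPV6_PKTINFO item is present.
 */
int request_dst_addr(struct msghdr *msg, struct in6_addr *dst)
{
	struct cmsghdr *cmsg;

	assert(msg != NULL && dst != NULL);
	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == IPPROTO_IPV6 &&
		    cmsg->cmsg_type == IPV6_PKTINFO &&
		    cmsg->cmsg_len >= CMSG_LEN(sizeof(*dst))) {
			memcpy(dst, CMSG_DATA(cmsg), sizeof(*dst));
			return 0;
		}
	}
	return -1;
}
```

A server would then pass that address back as IPV6_PKTINFO ancillary data on sendmsg() to request it as the source address of the reply.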
I am working on a v2.
With the appropriate change:
dest request unicast => src reply unicast (current behavior)
	dest request anycast => src reply anycast (new)
I don't think there is a need for a sysctl there either.
What do you think about it?
BR,
Francois-Xavier
* Re: [RFC PATCH 00/12] RCU'ify the net:sched classifier chains
From: Jamal Hadi Salim @ 2014-01-12 14:18 UTC
To: John Fastabend, xiyou.wangcong, eric.dumazet; +Cc: netdev, davem
In-Reply-To: <52D29F4C.90808@mojatatu.com>
On 01/12/14 08:57, Jamal Hadi Salim wrote:
> I looked and here's a general question:
> Does even using RCU make any sense here? What we have
> is a lot of updates and very very little reads (reads essentially
> are done from the control side; the data path is all about updates).
>
> I am not sure if RCU is a win in such a case - it could make things
> worse. At least that used to be the Truth(tm) many moons back.
> Is that not the case anymore?
>
Never mind.
You are not trying to make stats rcu - rather the list
of filters and actions (which is read mostly from data path).
Looking at the u32 piece - i think this is in the right
direction. Good stuff John!
cheers,
jamal
* Re: [RFC PATCH 00/12] RCU'ify the net:sched classifier chains
From: Jamal Hadi Salim @ 2014-01-12 13:57 UTC
To: John Fastabend, xiyou.wangcong, eric.dumazet; +Cc: netdev, davem
In-Reply-To: <52D2987D.5060807@mojatatu.com>
On 01/12/14 08:28, Jamal Hadi Salim wrote:
> I will scan through the patches...
>
I looked and here's a general question:
Does even using RCU make any sense here? What we have
is a lot of updates and very very little reads (reads essentially
are done from the control side; the data path is all about updates).
I am not sure if RCU is a win in such a case - it could make things
worse. At least that used to be the Truth(tm) many moons back.
Is that not the case anymore?
cheers,
jamal
> This is fun stuff - I will try to participate whenever i can
> (unfortunately not much time at the moment).
>
* Re: [PATCH net-next 0/3] bonding: cleanup bond_3ad.c
From: Veaceslav Falico @ 2014-01-12 13:27 UTC
To: David Miller; +Cc: netdev, fubar, andy
In-Reply-To: <20140110.180614.405638448761352206.davem@davemloft.net>
On Fri, Jan 10, 2014 at 06:06:14PM -0500, David Miller wrote:
>From: Veaceslav Falico <vfalico@redhat.com>
>Date: Wed, 8 Jan 2014 16:46:45 +0100
>
>> It's a huge mess there currently - and, thus, really hard to read and
>> debug.
>>
>> This is the first series, and doesn't change the logic at all, only makes
>> it a bit more readable.
>>
>> CC: Jay Vosburgh <fubar@us.ibm.com>
>> CC: Andy Gospodarek <andy@greyhouse.net>
>> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>
>Series applied, thanks.
Hrm, still can't see them. Did I miss something?..
* Re: [RFC PATCH 00/12] RCU'ify the net:sched classifier chains
From: Jamal Hadi Salim @ 2014-01-12 13:28 UTC
To: John Fastabend, xiyou.wangcong, eric.dumazet; +Cc: netdev, davem
In-Reply-To: <20140110092041.7193.5952.stgit@nitbit.x32>
On 01/10/14 04:36, John Fastabend wrote:
>
> The ingress qdisc is a simple qdisc which doesn't maintain any
> actual list of skb's and is primarily a hook to attach filters.
> Further the only qdisc that can be attached to the ingress qdisc
> is sch_ingress. The qdisc lock is currently serializing two
> operations (1) tc_classify which is addressed here and (2)
> statistics accounting. The second point is not solved here but
> it could be a matter of making the bstats and qstats per cpu
> stats.
I think as you observed in your other email:
There is a little more than just the stats (on qdiscs as well);
internal to specific filters etc.
> This is an RFC for now and needs some more work. Some items
> I know about are (a) an audit of the ematch code paths, (b) resolving
> the checkpatch errors mostly due to moving code around that
> generates those errors, (c) run smatch, (d) audit u32 code
> for correctness,
Ok, my feel-good dial went up reading #d above;->
>(e) do a lot more testing; so far only very
> basic testing has been done. I tried to put some reasonable
> comments in the commit logs but yes they need more work.
>
Things like MSI based devices, massive single tun device
tests, ifb with redirect etc would help cleanse things up.
> Cong, if it's not too much to ask can we use this as a base
> set of patches for this work? I think it's reasonably close to
> correct as is.
>
I believe these are based on your original patches John, no?
So I would agree we use these as a base.
I will scan through the patches...
This is fun stuff - I will try to participate whenever i can
(unfortunately not much time at the moment).
cheers,
jamal
> Thanks! John.
>
> ---
>
> John Fastabend (12):
> net: qdisc: use rcu prefix and silence sparse warnings
> net: rcu-ify tcf_proto
> net: sched: cls_basic use RCU
> net: sched: cls_cgroup use RCU
> net: sched: cls_flow use RCU
> net: sched: fw use RCU
> net: sched: RCU cls_route
> net: sched: RCU cls_tcindex
> net: sched: make cls_u32 lockless
> net: sched: rcu'ify cls_rsvp
> net: make cls_bpf rcu safe
> net: sched: make tc_action safe to walk under RCU
>
>
> include/linux/netdevice.h | 41 +-----
> include/linux/rtnetlink.h | 10 +
> include/net/act_api.h | 1
> include/net/pkt_cls.h | 12 +-
> include/net/sch_generic.h | 34 ++++-
> net/core/dev.c | 54 +++++++
> net/sched/act_api.c | 18 +-
> net/sched/cls_api.c | 44 +++---
> net/sched/cls_basic.c | 82 ++++++-----
> net/sched/cls_bpf.c | 79 ++++++-----
> net/sched/cls_cgroup.c | 65 ++++++---
> net/sched/cls_flow.c | 145 ++++++++++++--------
> net/sched/cls_fw.c | 112 +++++++++++----
> net/sched/cls_route.c | 218 +++++++++++++++++-------------
> net/sched/cls_rsvp.h | 152 ++++++++++++---------
> net/sched/cls_tcindex.c | 327 ++++++++++++++++++++++++++-------------------
> net/sched/cls_u32.c | 258 +++++++++++++++++++++++-------------
> net/sched/sch_api.c | 6 -
> net/sched/sch_atm.c | 30 +++-
> net/sched/sch_cbq.c | 21 ++-
> net/sched/sch_choke.c | 18 ++
> net/sched/sch_drr.c | 10 +
> net/sched/sch_dsmark.c | 8 +
> net/sched/sch_fq_codel.c | 11 +-
> net/sched/sch_generic.c | 4 -
> net/sched/sch_hfsc.c | 17 ++
> net/sched/sch_htb.c | 23 ++-
> net/sched/sch_ingress.c | 8 +
> net/sched/sch_mqprio.c | 4 -
> net/sched/sch_multiq.c | 8 +
> net/sched/sch_prio.c | 11 +-
> net/sched/sch_qfq.c | 9 +
> net/sched/sch_sfb.c | 15 +-
> net/sched/sch_sfq.c | 11 +-
> net/sched/sch_teql.c | 9 +
> 35 files changed, 1139 insertions(+), 736 deletions(-)
>
* Re: [Patch net-next 7/7] net_sched: act: remove struct tcf_act_hdr
From: Jamal Hadi Salim @ 2014-01-12 13:13 UTC
To: Cong Wang, netdev; +Cc: David S. Miller
In-Reply-To: <1389312845-10304-8-git-send-email-xiyou.wangcong@gmail.com>
On 01/09/14 19:14, Cong Wang wrote:
> It is not necessary at all.
>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Interestingly enough this is how things were originally ;->
Thanks Cong for all the effort.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
* Re: [Patch net-next 6/7] net_sched: cls: move allocation in ->init to generic layer
From: Jamal Hadi Salim @ 2014-01-12 13:07 UTC
To: Cong Wang, netdev; +Cc: Thomas Graf, David S. Miller
In-Reply-To: <1389312845-10304-7-git-send-email-xiyou.wangcong@gmail.com>
On 01/09/14 19:14, Cong Wang wrote:
> Most of the filters allocate tp->root in ->init()
> and free it in ->destroy(); make this generic.
>
> Also we could reduce the use of tcf_tree_lock a bit.
>
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Hrm. This one worries me a little.
I don't see how just pre-allocating the private head of the classifier
magically allows you to get rid of locks. Have you tested against those
classifiers you changed?
If those locks are useless - then that is a separate patch to kill
them (sorry, don't have time to test myself right now).
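As a rough user-space model of the allocation change being discussed (and only of that; it says nothing about whether tcf_tree_lock is still needed), the generic layer allocating and freeing the private head could be sketched like this. All names here (`struct proto`, `struct proto_ops`, `proto_create()`, `proto_destroy()`, `basic_ops`) are hypothetical stand-ins, with `calloc()`/`free()` in place of `kzalloc()`/`kfree()`.

```c
#include <assert.h>
#include <stdlib.h>

struct proto;  /* hypothetical stand-in for struct tcf_proto */

struct proto_ops {
	size_t root_size;                  /* bytes of private head to allocate */
	int  (*init)(struct proto *tp);
	void (*destroy)(struct proto *tp);
};

struct proto {
	const struct proto_ops *ops;
	void *root;
};

/* Generic layer: allocate the zeroed private head, then call ->init(). */
struct proto *proto_create(const struct proto_ops *ops)
{
	struct proto *tp;

	assert(ops != NULL && ops->root_size > 0);
	tp = calloc(1, sizeof(*tp));
	if (!tp)
		return NULL;
	tp->ops = ops;
	tp->root = calloc(1, ops->root_size);
	if (!tp->root || ops->init(tp) != 0) {
		free(tp->root);
		free(tp);
		return NULL;
	}
	return tp;
}

/* Generic layer: ->destroy() releases the contents, the core frees root. */
void proto_destroy(struct proto *tp)
{
	assert(tp != NULL);
	tp->ops->destroy(tp);
	free(tp->root);
	free(tp);
}

/* Example classifier: ->init() only initialises the pre-allocated head. */
struct basic_head {
	int nfilters;
};

static int basic_init(struct proto *tp)
{
	struct basic_head *head = tp->root;

	head->nfilters = 0;
	return 0;
}

static void basic_destroy(struct proto *tp)
{
	(void)tp;  /* nothing beyond the head itself to release here */
}

const struct proto_ops basic_ops = {
	.root_size = sizeof(struct basic_head),
	.init      = basic_init,
	.destroy   = basic_destroy,
};
```

In this model tp->root is valid before ->init() runs and stays valid until the core frees it, which is the property the patch relies on; the locking question raised above is a separate matter.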
cheers,
jamal
> ---
> include/net/sch_generic.h | 1 +
> net/sched/cls_api.c | 7 +++++++
> net/sched/cls_basic.c | 8 ++------
> net/sched/cls_bpf.c | 11 ++---------
> net/sched/cls_cgroup.c | 21 +++++++--------------
> net/sched/cls_flow.c | 8 ++------
> net/sched/cls_fw.c | 14 ++++----------
> net/sched/cls_route.c | 15 ++-------------
> net/sched/cls_rsvp.h | 10 ++--------
> net/sched/cls_tcindex.c | 13 ++-----------
> net/sched/cls_u32.c | 9 ++-------
> net/sched/sch_api.c | 1 +
> 12 files changed, 34 insertions(+), 84 deletions(-)
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index d062f81..819dc1d 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -208,6 +208,7 @@ struct tcf_proto_ops {
> struct sk_buff *skb, struct tcmsg*);
>
> struct module *owner;
> + size_t root_size;
> };
>
> struct tcf_proto {
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 29a30a1..8460c75 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -262,6 +262,13 @@ replay:
> tp->q = q;
> tp->classify = tp_ops->classify;
> tp->classid = parent;
> + tp->root = kzalloc(tp_ops->root_size, GFP_KERNEL);
> + if (!tp->root) {
> + err = -ENOBUFS;
> + module_put(tp_ops->owner);
> + kfree(tp);
> + goto errout;
> + }
>
> err = tp_ops->init(tp);
> if (err != 0) {
> diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
> index e98ca99..318f672 100644
> --- a/net/sched/cls_basic.c
> +++ b/net/sched/cls_basic.c
> @@ -75,13 +75,9 @@ static void basic_put(struct tcf_proto *tp, unsigned long f)
>
> static int basic_init(struct tcf_proto *tp)
> {
> - struct basic_head *head;
> + struct basic_head *head = tp->root;
>
> - head = kzalloc(sizeof(*head), GFP_KERNEL);
> - if (head == NULL)
> - return -ENOBUFS;
> INIT_LIST_HEAD(&head->flist);
> - tp->root = head;
> return 0;
> }
>
> @@ -102,7 +98,6 @@ static void basic_destroy(struct tcf_proto *tp)
> list_del(&f->link);
> basic_delete_filter(tp, f);
> }
> - kfree(head);
> }
>
> static int basic_delete(struct tcf_proto *tp, unsigned long arg)
> @@ -288,6 +283,7 @@ static struct tcf_proto_ops cls_basic_ops __read_mostly = {
> .walk = basic_walk,
> .dump = basic_dump,
> .owner = THIS_MODULE,
> + .root_size = sizeof(struct basic_head),
> };
>
> static int __init init_basic(void)
> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> index 8e3cf49..eedd296 100644
> --- a/net/sched/cls_bpf.c
> +++ b/net/sched/cls_bpf.c
> @@ -75,15 +75,9 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>
> static int cls_bpf_init(struct tcf_proto *tp)
> {
> - struct cls_bpf_head *head;
> -
> - head = kzalloc(sizeof(*head), GFP_KERNEL);
> - if (head == NULL)
> - return -ENOBUFS;
> + struct cls_bpf_head *head = tp->root;
>
> INIT_LIST_HEAD(&head->plist);
> - tp->root = head;
> -
> return 0;
> }
>
> @@ -126,8 +120,6 @@ static void cls_bpf_destroy(struct tcf_proto *tp)
> list_del(&prog->link);
> cls_bpf_delete_prog(tp, prog);
> }
> -
> - kfree(head);
> }
>
> static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
> @@ -366,6 +358,7 @@ static struct tcf_proto_ops cls_bpf_ops __read_mostly = {
> .delete = cls_bpf_delete,
> .walk = cls_bpf_walk,
> .dump = cls_bpf_dump,
> + .root_size = sizeof(struct cls_bpf_head),
> };
>
> static int __init cls_bpf_init_mod(void)
> diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
> index 8e2158a..4b7e083 100644
> --- a/net/sched/cls_cgroup.c
> +++ b/net/sched/cls_cgroup.c
> @@ -22,6 +22,7 @@ struct cls_cgroup_head {
> u32 handle;
> struct tcf_exts exts;
> struct tcf_ematch_tree ematches;
> + bool init;
> };
>
> static int cls_cgroup_classify(struct sk_buff *skb, const struct tcf_proto *tp,
> @@ -73,6 +74,9 @@ static void cls_cgroup_put(struct tcf_proto *tp, unsigned long f)
>
> static int cls_cgroup_init(struct tcf_proto *tp)
> {
> + struct cls_cgroup_head *head = tp->root;
> +
> + tcf_exts_init(&head->exts, TCA_CGROUP_ACT, TCA_CGROUP_POLICE);
> return 0;
> }
>
> @@ -94,20 +98,9 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
> if (!tca[TCA_OPTIONS])
> return -EINVAL;
>
> - if (head == NULL) {
> - if (!handle)
> - return -EINVAL;
> -
> - head = kzalloc(sizeof(*head), GFP_KERNEL);
> - if (head == NULL)
> - return -ENOBUFS;
> -
> - tcf_exts_init(&head->exts, TCA_CGROUP_ACT, TCA_CGROUP_POLICE);
> + if (!head->init) {
> head->handle = handle;
> -
> - tcf_tree_lock(tp);
> - tp->root = head;
> - tcf_tree_unlock(tp);
> + head->init = true;
> }
>
> if (handle != head->handle)
> @@ -140,7 +133,6 @@ static void cls_cgroup_destroy(struct tcf_proto *tp)
> if (head) {
> tcf_exts_destroy(tp, &head->exts);
> tcf_em_tree_destroy(tp, &head->ematches);
> - kfree(head);
> }
> }
>
> @@ -205,6 +197,7 @@ static struct tcf_proto_ops cls_cgroup_ops __read_mostly = {
> .walk = cls_cgroup_walk,
> .dump = cls_cgroup_dump,
> .owner = THIS_MODULE,
> + .root_size = sizeof(struct cls_cgroup_head),
> };
>
> static int __init init_cgroup_cls(void)
> diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
> index 257029c..b39080a 100644
> --- a/net/sched/cls_flow.c
> +++ b/net/sched/cls_flow.c
> @@ -526,13 +526,9 @@ static int flow_delete(struct tcf_proto *tp, unsigned long arg)
>
> static int flow_init(struct tcf_proto *tp)
> {
> - struct flow_head *head;
> + struct flow_head *head = tp->root;
>
> - head = kzalloc(sizeof(*head), GFP_KERNEL);
> - if (head == NULL)
> - return -ENOBUFS;
> INIT_LIST_HEAD(&head->filters);
> - tp->root = head;
> return 0;
> }
>
> @@ -545,7 +541,6 @@ static void flow_destroy(struct tcf_proto *tp)
> list_del(&f->list);
> flow_destroy_filter(tp, f);
> }
> - kfree(head);
> }
>
> static unsigned long flow_get(struct tcf_proto *tp, u32 handle)
> @@ -653,6 +648,7 @@ static struct tcf_proto_ops cls_flow_ops __read_mostly = {
> .dump = flow_dump,
> .walk = flow_walk,
> .owner = THIS_MODULE,
> + .root_size = sizeof(struct flow_head),
> };
>
> static int __init cls_flow_init(void)
> diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
> index ed00e8c..73cd277 100644
> --- a/net/sched/cls_fw.c
> +++ b/net/sched/cls_fw.c
> @@ -34,6 +34,7 @@
> struct fw_head {
> struct fw_filter *ht[HTSIZE];
> u32 mask;
> + bool init;
> };
>
> struct fw_filter {
> @@ -155,7 +156,6 @@ static void fw_destroy(struct tcf_proto *tp)
> fw_delete_filter(tp, f);
> }
> }
> - kfree(head);
> }
>
> static int fw_delete(struct tcf_proto *tp, unsigned long arg)
> @@ -259,19 +259,12 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
> if (!handle)
> return -EINVAL;
>
> - if (head == NULL) {
> + if (!head->init) {
> u32 mask = 0xFFFFFFFF;
> if (tb[TCA_FW_MASK])
> mask = nla_get_u32(tb[TCA_FW_MASK]);
> -
> - head = kzalloc(sizeof(struct fw_head), GFP_KERNEL);
> - if (head == NULL)
> - return -ENOBUFS;
> head->mask = mask;
> -
> - tcf_tree_lock(tp);
> - tp->root = head;
> - tcf_tree_unlock(tp);
> + head->init = true;
> }
>
> f = kzalloc(sizeof(struct fw_filter), GFP_KERNEL);
> @@ -388,6 +381,7 @@ static struct tcf_proto_ops cls_fw_ops __read_mostly = {
> .walk = fw_walk,
> .dump = fw_dump,
> .owner = THIS_MODULE,
> + .root_size = sizeof(struct fw_head),
> };
>
> static int __init init_fw(void)
> diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
> index 1ad3068..038f35f 100644
> --- a/net/sched/cls_route.c
> +++ b/net/sched/cls_route.c
> @@ -279,7 +279,6 @@ static void route4_destroy(struct tcf_proto *tp)
> kfree(b);
> }
> }
> - kfree(head);
> }
>
> static int route4_delete(struct tcf_proto *tp, unsigned long arg)
> @@ -462,20 +461,9 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
> goto reinsert;
> }
>
> - err = -ENOBUFS;
> - if (head == NULL) {
> - head = kzalloc(sizeof(struct route4_head), GFP_KERNEL);
> - if (head == NULL)
> - goto errout;
> -
> - tcf_tree_lock(tp);
> - tp->root = head;
> - tcf_tree_unlock(tp);
> - }
> -
> f = kzalloc(sizeof(struct route4_filter), GFP_KERNEL);
> if (f == NULL)
> - goto errout;
> + return -ENOBUFS;
>
> tcf_exts_init(&f->exts, TCA_ROUTE4_ACT, TCA_ROUTE4_POLICE);
> err = route4_set_parms(net, tp, base, f, handle, head, tb,
> @@ -613,6 +601,7 @@ static struct tcf_proto_ops cls_route4_ops __read_mostly = {
> .walk = route4_walk,
> .dump = route4_dump,
> .owner = THIS_MODULE,
> + .root_size = sizeof(struct route4_head),
> };
>
> static int __init init_route4(void)
> diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
> index 19f8e5d..47930bc 100644
> --- a/net/sched/cls_rsvp.h
> +++ b/net/sched/cls_rsvp.h
> @@ -242,14 +242,7 @@ static void rsvp_put(struct tcf_proto *tp, unsigned long f)
>
> static int rsvp_init(struct tcf_proto *tp)
> {
> - struct rsvp_head *data;
> -
> - data = kzalloc(sizeof(struct rsvp_head), GFP_KERNEL);
> - if (data) {
> - tp->root = data;
> - return 0;
> - }
> - return -ENOBUFS;
> + return 0;
> }
>
> static void
> @@ -656,6 +649,7 @@ static struct tcf_proto_ops RSVP_OPS __read_mostly = {
> .walk = rsvp_walk,
> .dump = rsvp_dump,
> .owner = THIS_MODULE,
> + .root_size = sizeof(struct rsvp_head),
> };
>
> static int __init init_rsvp(void)
> diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> index eed8404..6454158 100644
> --- a/net/sched/cls_tcindex.c
> +++ b/net/sched/cls_tcindex.c
> @@ -118,18 +118,11 @@ static void tcindex_put(struct tcf_proto *tp, unsigned long f)
>
> static int tcindex_init(struct tcf_proto *tp)
> {
> - struct tcindex_data *p;
> -
> - pr_debug("tcindex_init(tp %p)\n", tp);
> - p = kzalloc(sizeof(struct tcindex_data), GFP_KERNEL);
> - if (!p)
> - return -ENOMEM;
> + struct tcindex_data *p = tp->root;
>
> p->mask = 0xffff;
> p->hash = DEFAULT_HASH_SIZE;
> p->fall_through = 1;
> -
> - tp->root = p;
> return 0;
> }
>
> @@ -407,15 +400,12 @@ static void tcindex_destroy(struct tcf_proto *tp)
> struct tcindex_data *p = tp->root;
> struct tcf_walker walker;
>
> - pr_debug("tcindex_destroy(tp %p),p %p\n", tp, p);
> walker.count = 0;
> walker.skip = 0;
> walker.fn = &tcindex_destroy_element;
> tcindex_walk(tp, &walker);
> kfree(p->perfect);
> kfree(p->h);
> - kfree(p);
> - tp->root = NULL;
> }
>
>
> @@ -491,6 +481,7 @@ static struct tcf_proto_ops cls_tcindex_ops __read_mostly = {
> .walk = tcindex_walk,
> .dump = tcindex_dump,
> .owner = THIS_MODULE,
> + .root_size = sizeof(struct tcindex_data),
> };
>
> static int __init init_tcindex(void)
> diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
> index 84c28da..678c2d72 100644
> --- a/net/sched/cls_u32.c
> +++ b/net/sched/cls_u32.c
> @@ -300,15 +300,11 @@ static u32 gen_new_htid(struct tc_u_common *tp_c)
>
> static int u32_init(struct tcf_proto *tp)
> {
> - struct tc_u_hnode *root_ht;
> + struct tc_u_hnode *root_ht = tp->root;
> struct tc_u_common *tp_c;
>
> tp_c = tp->q->u32_node;
>
> - root_ht = kzalloc(sizeof(*root_ht), GFP_KERNEL);
> - if (root_ht == NULL)
> - return -ENOBUFS;
> -
> root_ht->divisor = 0;
> root_ht->refcnt++;
> root_ht->handle = tp_c ? gen_new_htid(tp_c) : 0x80000000;
> @@ -329,7 +325,6 @@ static int u32_init(struct tcf_proto *tp)
> tp_c->hlist = root_ht;
> root_ht->tp_c = tp_c;
>
> - tp->root = root_ht;
> tp->data = tp_c;
> return 0;
> }
> @@ -394,7 +389,6 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
> for (hn = &tp_c->hlist; *hn; hn = &(*hn)->next) {
> if (*hn == ht) {
> *hn = ht->next;
> - kfree(ht);
> return 0;
> }
> }
> @@ -801,6 +795,7 @@ static struct tcf_proto_ops cls_u32_ops __read_mostly = {
> .walk = u32_walk,
> .dump = u32_dump,
> .owner = THIS_MODULE,
> + .root_size = sizeof(struct tc_u_hnode),
> };
>
> static int __init init_u32(void)
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index 1313145..5fef7f4 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -1829,6 +1829,7 @@ EXPORT_SYMBOL(tc_classify);
> void tcf_destroy(struct tcf_proto *tp)
> {
> tp->ops->destroy(tp);
> + kfree(tp->root);
> module_put(tp->ops->owner);
> kfree(tp);
> }
>
* [PATCH TRIVIAL] net: Spelling s/transmition/transmission/
From: Geert Uytterhoeven @ 2014-01-12 13:06 UTC
To: Andy Gospodarek, David S. Miller, Jiri Kosina
Cc: netdev, linux-wireless, Geert Uytterhoeven
From: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
---
drivers/net/ethernet/tehuti/tehuti.c | 2 +-
drivers/net/wan/hd64570.h | 4 ++--
drivers/net/wan/hd64572.h | 2 +-
net/nfc/hci/llc_shdlc.c | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/tehuti/tehuti.c b/drivers/net/ethernet/tehuti/tehuti.c
index 4f1d2549130e..2ead87759ab4 100644
--- a/drivers/net/ethernet/tehuti/tehuti.c
+++ b/drivers/net/ethernet/tehuti/tehuti.c
@@ -1764,7 +1764,7 @@ static void bdx_tx_cleanup(struct bdx_priv *priv)
WRITE_REG(priv, f->m.reg_RPTR, f->m.rptr & TXF_WPTR_WR_PTR);
/* We reclaimed resources, so in case the Q is stopped by xmit callback,
- * we resume the transmition and use tx_lock to synchronize with xmit.*/
+ * we resume the transmission and use tx_lock to synchronize with xmit.*/
spin_lock(&priv->tx_lock);
priv->tx_level += tx_level;
BDX_ASSERT(priv->tx_level <= 0 || priv->tx_level > BDX_MAX_TX_LEVEL);
diff --git a/drivers/net/wan/hd64570.h b/drivers/net/wan/hd64570.h
index e4f539ad071b..10963e8f4b39 100644
--- a/drivers/net/wan/hd64570.h
+++ b/drivers/net/wan/hd64570.h
@@ -159,7 +159,7 @@ typedef struct {
/* Packet Descriptor Status bits */
#define ST_TX_EOM 0x80 /* End of frame */
-#define ST_TX_EOT 0x01 /* End of transmition */
+#define ST_TX_EOT 0x01 /* End of transmission */
#define ST_RX_EOM 0x80 /* End of frame */
#define ST_RX_SHORT 0x40 /* Short frame */
@@ -211,7 +211,7 @@ typedef struct {
#define CTL_NORTS 0x01
#define CTL_IDLE 0x10 /* Transmit an idle pattern */
-#define CTL_UDRNC 0x20 /* Idle after CRC or FCS+flag transmition */
+#define CTL_UDRNC 0x20 /* Idle after CRC or FCS+flag transmission */
#define ST0_TXRDY 0x02 /* TX ready */
#define ST0_RXRDY 0x01 /* RX ready */
diff --git a/drivers/net/wan/hd64572.h b/drivers/net/wan/hd64572.h
index 96567c2dc4db..22137ee669cf 100644
--- a/drivers/net/wan/hd64572.h
+++ b/drivers/net/wan/hd64572.h
@@ -218,7 +218,7 @@ typedef struct {
#define ST_TX_EOM 0x80 /* End of frame */
#define ST_TX_UNDRRUN 0x08
#define ST_TX_OWNRSHP 0x02
-#define ST_TX_EOT 0x01 /* End of transmition */
+#define ST_TX_EOT 0x01 /* End of transmission */
#define ST_RX_EOM 0x80 /* End of frame */
#define ST_RX_SHORT 0x40 /* Short frame */
diff --git a/net/nfc/hci/llc_shdlc.c b/net/nfc/hci/llc_shdlc.c
index 27b313befc35..3e53c1e029dc 100644
--- a/net/nfc/hci/llc_shdlc.c
+++ b/net/nfc/hci/llc_shdlc.c
@@ -300,7 +300,7 @@ static void llc_shdlc_rcv_rej(struct llc_shdlc *shdlc, int y_nr)
{
struct sk_buff *skb;
- pr_debug("remote asks retransmition from frame %d\n", y_nr);
+ pr_debug("remote asks retransmission from frame %d\n", y_nr);
if (llc_shdlc_x_lteq_y_lt_z(shdlc->dnr, y_nr, shdlc->ns)) {
if (shdlc->t2_active) {
--
1.7.9.5
* [PATCH TRIVIAL] net: amd8111e: Spelling s/recive/receive/
From: Geert Uytterhoeven @ 2014-01-12 13:02 UTC
To: Jiri Kosina, netdev; +Cc: Geert Uytterhoeven
From: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
---
drivers/net/ethernet/amd/amd8111e.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/amd/amd8111e.h b/drivers/net/ethernet/amd/amd8111e.h
index 8baa3527ba74..be19c3058d69 100644
--- a/drivers/net/ethernet/amd/amd8111e.h
+++ b/drivers/net/ethernet/amd/amd8111e.h
@@ -753,7 +753,7 @@ struct amd8111e_priv{
const char *name;
struct pci_dev *pci_dev; /* Ptr to the associated pci_dev */
struct net_device* amd8111e_net_dev; /* ptr to associated net_device */
- /* Transmit and recive skbs */
+ /* Transmit and receive skbs */
struct sk_buff *tx_skbuff[NUM_TX_BUFFERS];
struct sk_buff *rx_skbuff[NUM_RX_BUFFERS];
/* Transmit and receive dma mapped addr */
--
1.7.9.5
* Re: [Patch net-next 5/7] net_sched: avoid casting void pointer
From: Jamal Hadi Salim @ 2014-01-12 12:55 UTC
To: Cong Wang, netdev; +Cc: David S. Miller
In-Reply-To: <1389312845-10304-6-git-send-email-xiyou.wangcong@gmail.com>
On 01/09/14 19:14, Cong Wang wrote:
> tp->root is a void* pointer, no need to cast it.
>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
^ permalink raw reply
* Re: [Patch net-next 4/7] net_sched: optimize tcf_match_indev()
From: Jamal Hadi Salim @ 2014-01-12 12:50 UTC (permalink / raw)
To: Cong Wang, netdev; +Cc: David S. Miller
In-Reply-To: <1389312845-10304-5-git-send-email-xiyou.wangcong@gmail.com>
On 01/09/14 19:14, Cong Wang wrote:
> tcf_match_indev() is called in fast path, it is not wise to
> search for a netdev by ifindex and then compare by its name,
> just compare the ifindex.
>
> Also, dev->name could be changed by user-space, therefore
> the match would always fail, but dev->ifindex could
> be consistent.
>
> BTW, this will also save some bytes from the core struct of u32.
>
excellent.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
^ permalink raw reply
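A minimal userspace sketch of the idea in the quoted commit message above: compare the stored interface index against the packet's ingress index instead of resolving a device and comparing names. The names `filter_sketch` and `match_indev_sketch` are hypothetical, and treating index 0 as "no restriction" is an assumption of this sketch rather than something stated in the thread.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a classifier that may be restricted to one
 * ingress device. ifindex == 0 means "match any device" (an assumption
 * made for this sketch). */
struct filter_sketch {
	int ifindex;
};

/* Return true when the packet's ingress device satisfies the filter.
 * One integer compare replaces a device lookup plus a string compare,
 * and the result does not change if the device is renamed. */
static bool match_indev_sketch(const struct filter_sketch *f, int skb_iif)
{
	if (f->ifindex == 0)
		return true;
	return f->ifindex == skb_iif;
}
```

Storing an int rather than a name buffer is also where the "save some bytes" remark in the quoted message would come from.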
* [PATCH net-next] bnx2x: Correct default Tx switching behaviour
From: Yuval Mintz @ 2014-01-12 12:37 UTC (permalink / raw)
To: davem, netdev; +Cc: Yuval Mintz, Ariel Elior
With this patch bnx2x will configure the PF to perform Tx switching on
out-going traffic as soon as SR-IOV is dynamically enabled and de-activate
it when it is disabled.
This will allow VFs to communicate with their parent PFs.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
---
Hi Dave,
Please consider applying this to `net-next'.
Thanks,
Yuval
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 1 +
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 3 ++
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 7 +++
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h | 4 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 61 +++++++++++++++++++++++
5 files changed, 75 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index eb105ab..e800b01 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1542,6 +1542,7 @@ struct bnx2x {
#define NO_ISCSI_FLAG (1 << 14)
#define NO_FCOE_FLAG (1 << 15)
#define BC_SUPPORTS_PFC_STATS (1 << 17)
+#define TX_SWITCHING (1 << 18)
#define BC_SUPPORTS_FCOE_FEATURES (1 << 19)
#define USING_SINGLE_MSIX_FLAG (1 << 20)
#define BC_SUPPORTS_DCBX_MSG_NON_PMF (1 << 21)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 18498fe..1fdc8a3 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -3001,6 +3001,9 @@ static unsigned long bnx2x_get_common_flags(struct bnx2x *bp,
if (zero_stats)
__set_bit(BNX2X_Q_FLG_ZERO_STATS, &flags);
+ if (bp->flags & TX_SWITCHING)
+ __set_bit(BNX2X_Q_FLG_TX_SWITCH, &flags);
+
__set_bit(BNX2X_Q_FLG_PCSUM_ON_PKT, &flags);
__set_bit(BNX2X_Q_FLG_TUN_INC_INNER_IP_ID, &flags);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
index 98cccd4..6036405 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
@@ -4988,6 +4988,13 @@ static void bnx2x_q_fill_update_data(struct bnx2x *bp,
test_bit(BNX2X_Q_UPDATE_SILENT_VLAN_REM, &params->update_flags);
data->silent_vlan_value = cpu_to_le16(params->silent_removal_value);
data->silent_vlan_mask = cpu_to_le16(params->silent_removal_mask);
+
+ /* tx switching */
+ data->tx_switching_flg =
+ test_bit(BNX2X_Q_UPDATE_TX_SWITCHING, &params->update_flags);
+ data->tx_switching_change_flg =
+ test_bit(BNX2X_Q_UPDATE_TX_SWITCHING_CHNG,
+ &params->update_flags);
}
static inline int bnx2x_q_send_update(struct bnx2x *bp,
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h
index 6a53c15..d34664f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h
@@ -770,7 +770,9 @@ enum {
BNX2X_Q_UPDATE_DEF_VLAN_EN,
BNX2X_Q_UPDATE_DEF_VLAN_EN_CHNG,
BNX2X_Q_UPDATE_SILENT_VLAN_REM_CHNG,
- BNX2X_Q_UPDATE_SILENT_VLAN_REM
+ BNX2X_Q_UPDATE_SILENT_VLAN_REM,
+ BNX2X_Q_UPDATE_TX_SWITCHING_CHNG,
+ BNX2X_Q_UPDATE_TX_SWITCHING
};
/* Allowed Queue states */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
index 31ab924..049edd1 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
@@ -3130,6 +3130,60 @@ void bnx2x_unlock_vf_pf_channel(struct bnx2x *bp, struct bnx2x_virtf *vf,
vf->abs_vfid, vf->op_current);
}
+static int bnx2x_set_pf_tx_switching(struct bnx2x *bp, bool enable)
+{
+ struct bnx2x_queue_state_params q_params;
+ u32 prev_flags;
+ int i, rc;
+
+ /* Verify changes are needed and record current Tx switching state */
+ prev_flags = bp->flags;
+ if (enable)
+ bp->flags |= TX_SWITCHING;
+ else
+ bp->flags &= ~TX_SWITCHING;
+ if (prev_flags == bp->flags)
+ return 0;
+
+ /* Verify state enables the sending of queue ramrods */
+ if ((bp->state != BNX2X_STATE_OPEN) ||
+ (bnx2x_get_q_logical_state(bp,
+ &bnx2x_sp_obj(bp, &bp->fp[0]).q_obj) !=
+ BNX2X_Q_LOGICAL_STATE_ACTIVE))
+ return 0;
+
+ /* send q. update ramrod to configure Tx switching */
+ memset(&q_params, 0, sizeof(q_params));
+ __set_bit(RAMROD_COMP_WAIT, &q_params.ramrod_flags);
+ q_params.cmd = BNX2X_Q_CMD_UPDATE;
+ __set_bit(BNX2X_Q_UPDATE_TX_SWITCHING_CHNG,
+ &q_params.params.update.update_flags);
+ if (enable)
+ __set_bit(BNX2X_Q_UPDATE_TX_SWITCHING,
+ &q_params.params.update.update_flags);
+ else
+ __clear_bit(BNX2X_Q_UPDATE_TX_SWITCHING,
+ &q_params.params.update.update_flags);
+
+ /* send the ramrod on all the queues of the PF */
+ for_each_eth_queue(bp, i) {
+ struct bnx2x_fastpath *fp = &bp->fp[i];
+
+ /* Set the appropriate Queue object */
+ q_params.q_obj = &bnx2x_sp_obj(bp, fp).q_obj;
+
+ /* Update the Queue state */
+ rc = bnx2x_queue_state_change(bp, &q_params);
+ if (rc) {
+ BNX2X_ERR("Failed to configure Tx switching\n");
+ return rc;
+ }
+ }
+
+ DP(BNX2X_MSG_IOV, "%s Tx Switching\n", enable ? "Enabled" : "Disabled");
+ return 0;
+}
+
int bnx2x_sriov_configure(struct pci_dev *dev, int num_vfs_param)
{
struct bnx2x *bp = netdev_priv(pci_get_drvdata(dev));
@@ -3157,12 +3211,14 @@ int bnx2x_sriov_configure(struct pci_dev *dev, int num_vfs_param)
bp->requested_nr_virtfn = num_vfs_param;
if (num_vfs_param == 0) {
+ bnx2x_set_pf_tx_switching(bp, false);
pci_disable_sriov(dev);
return 0;
} else {
return bnx2x_enable_sriov(bp);
}
}
+
#define IGU_ENTRY_SIZE 4
int bnx2x_enable_sriov(struct bnx2x *bp)
@@ -3240,6 +3296,11 @@ int bnx2x_enable_sriov(struct bnx2x *bp)
*/
DP(BNX2X_MSG_IOV, "about to call enable sriov\n");
bnx2x_disable_sriov(bp);
+
+ rc = bnx2x_set_pf_tx_switching(bp, true);
+ if (rc)
+ return rc;
+
rc = pci_enable_sriov(bp->pdev, req_vfs);
if (rc) {
BNX2X_ERR("pci_enable_sriov failed with %d\n", rc);
--
1.8.1.227.g44fe835
^ permalink raw reply related
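The "verify changes are needed" step at the top of bnx2x_set_pf_tx_switching() in the patch above can be sketched in isolation. This is a userspace illustration only: `TX_SWITCHING_SKETCH` and `set_tx_switching_flag_sketch` are hypothetical names, and the bit position simply mirrors the value the patch uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_SWITCHING_SKETCH (1u << 18)	/* mirrors the bit used in the patch */

/* Set or clear the flag and report whether the stored flags changed.
 * A caller can return early when nothing changed, so a repeated
 * enable or disable request does no further work. */
static bool set_tx_switching_flag_sketch(uint32_t *flags, bool enable)
{
	uint32_t prev = *flags;

	if (enable)
		*flags |= TX_SWITCHING_SKETCH;
	else
		*flags &= ~TX_SWITCHING_SKETCH;

	return prev != *flags;
}
```

In the patch the flag is updated before the device state is checked, so it is recorded even when no ramrod can be sent; bnx2x_get_common_flags() then reads it when building queue flags.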
* Re: [Patch net-next 3/7] net_sched: add struct net pointer to tcf_proto_ops->dump
From: Jamal Hadi Salim @ 2014-01-12 12:46 UTC (permalink / raw)
To: Cong Wang, netdev; +Cc: David S. Miller
In-Reply-To: <1389312845-10304-4-git-send-email-xiyou.wangcong@gmail.com>
On 01/09/14 19:14, Cong Wang wrote:
> It will be needed by the next patch.
>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
^ permalink raw reply
* Re: net/mlx4_en: call gro handler for encapsulated frames
From: Amir Vadai @ 2014-01-12 12:43 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Or Gerlitz, Jerry Chu
In-Reply-To: <1389292213.31367.49.camel@edumazet-glaptop2.roam.corp.google.com>
On 09/01/14 10:30 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> In order to use the native GRO handling of encapsulated protocols on
> mlx4, we need to call napi_gro_receive() instead of netif_receive_skb()
> unless busy polling is in action.
>
> While we are at it, rename mlx4_en_cq_ll_polling() to
> mlx4_en_cq_busy_polling()
>
> Tested with GRE tunnel : GRO aggregation is now performed on the
> ethernet device instead of being done later on gre device.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Amir Vadai <amirv@mellanox.com>
> Cc: Jerry Chu <hkchu@google.com>
> Cc: Or Gerlitz <ogerlitz@mellanox.com>
> ---
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 8 +++++---
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 ++--
> 2 files changed, 7 insertions(+), 5 deletions(-)
>
Acked-By: Amir Vadai <amirv@mellanox.com>
^ permalink raw reply
* Re: [Patch net-next 2/7] net_sched: act: clean up notification functions
From: Jamal Hadi Salim @ 2014-01-12 12:38 UTC (permalink / raw)
To: Cong Wang, netdev; +Cc: David S. Miller
In-Reply-To: <1389312845-10304-3-git-send-email-xiyou.wangcong@gmail.com>
On 01/09/14 19:14, Cong Wang wrote:
> Refactor tcf_add_notify() and factor out tcf_del_notify().
>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Thanks Cong!
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
^ permalink raw reply
* Re: [Patch net-next 1/7] net_sched: act: move idx_gen into struct tcf_hashinfo
From: Jamal Hadi Salim @ 2014-01-12 12:34 UTC (permalink / raw)
To: Cong Wang, netdev; +Cc: David S. Miller
In-Reply-To: <1389312845-10304-2-git-send-email-xiyou.wangcong@gmail.com>
On 01/09/14 19:13, Cong Wang wrote:
> There is no need to store the index separately
> since tcf_hashinfo is allocated statically too.
>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
An improvement for sure. Thanks Cong!
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
^ permalink raw reply
* Re: [Patch net-next 0/7] net_sched: some more cleanup and improvements
From: Jamal Hadi Salim @ 2014-01-12 12:32 UTC (permalink / raw)
To: Cong Wang, netdev; +Cc: David S. Miller
In-Reply-To: <1389312845-10304-1-git-send-email-xiyou.wangcong@gmail.com>
On 01/09/14 19:13, Cong Wang wrote:
> This patchset collects the previous patches I sent which Jamal doesn't object.
> They are still some cleanup and improvements for tc actions and filters.
>
>
Cong - I dont have time to test at the moment and like you said
they dont look controversial; i will review them, please try to test
them to the best you can.
cheers,
jamal
^ permalink raw reply
* Re: [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless
From: Jamal Hadi Salim @ 2014-01-12 12:30 UTC (permalink / raw)
To: John Fastabend, Cong Wang
Cc: Stephen Hemminger, Eric Dumazet, Linux Kernel Network Developers,
John Fastabend, David S. Miller
In-Reply-To: <52CF4795.106@intel.com>
On 01/09/14 20:06, John Fastabend wrote:
> Just to re-iterate you need to go through each and every qdisc,
> classifier, action and verify it is safe to run in parallel. Take
> a look at how the skb lists are managed in the qdiscs. If we want
> to do this we need to make these changes in some coherent way
> because it touches lots of pieces.
>
Indeed. Everything assumes the global qdisc lock is protecting them.
Actually actions are probably the best at the moment because
the lock is very fine grained to just the action instance
and it protects both control and data paths.
But filters have stuff littered everywhere. Egress qdiscs
as you mention have queues at multi levels etc.
> Also your stats are going to get hosed none of the bstats, qstats
> supports this.
Stats are probably the easiest to "fix".
Didnt Eric (or somebody else) fix netdev level stats to use seq counts?
Would that idea not be applicable here?
>I'll send out the classifier set later tonight
> if you want. I got stalled going through the actions.
>
The thing to note is:
actions can be shared across filters, netdevices and cpus.
By default they are not shared across filters and netdevices; that is
a config option. You still have to worry about sharing across cpus
which will happen because a flow can be shared across cpus.
You could probably get rid of the lock if you can show
that you can make data and control path mutually exclusive
(rtnl will protect control path).
> Finally any global state in those qdiscs is going to drive performance
> down so many of them would likely need to be redesigned.
>
I feel like per-cpu qdiscs is the best surgery at the moment.
cheers,
jamal
^ permalink raw reply
* Re: [RFC net-next 0/3] bonding: new option API
From: Nikolay Aleksandrov @ 2014-01-12 12:09 UTC (permalink / raw)
To: Scott Feldman
Cc: Netdev, Andy Gospodarek, Jay Vosburgh, Veaceslav Falico,
David S. Miller
In-Reply-To: <AAA51DDF-74AF-4A70-8072-62F2593F8619@cumulusnetworks.com>
On 01/10/2014 09:32 PM, Scott Feldman wrote:
>
> On Jan 10, 2014, at 5:11 AM, Nikolay Aleksandrov <nikolay@redhat.com> wrote:
>
>> Hi,
>> This patchset aims to introduce a new option API that can be easily
>> extended if necessary and which attempts to remove some common problems
>> and code. In the beginning there was support for inter-option dependencies,
>> but that turned out to be unnecessary as the only 2 options that _enforce_
>> another option to be set prior to setting are up/down delay and they can be
>> easily re-worked to not require miimon to be set, so we can spare ourselves
>> 100+ lines of checks, dealing with complex dependency errors and such.
>> In case this becomes necessary I've kept the old version of this patch-set
>> which has it, and can easily re-work it at any time.
>> There're still a lot of things to fix/clean but I've done some limited testing
>> with the options that are converted and it seems to work.
>> The main exported functions (as can be seen) are:
>> __bond_opt_set() - to be used when a string is passed which needs to be
>> converted in the case of BOND_OPTVAL_INTEGER. (sysfs)
>> __bond_opt_intset() - to be used when a value is passed to
>> BOND_OPTVAL_INTEGER (netlink), this function can't
>> be used for BOND_OPTVAL_STRING options
>> These two can be used from inside other options to stop them (e.g., arp_interval
>> stopping miimon and vice versa).
>> I've also added bond_opt_tryset_rtnl() mostly for sysfs use.
>> See the description of patch 01 and the comments inside for more information.
>>
>> Value tables of converted options are no longer exported, and can be accessed
>> through the API (bond_opt_get_val() & bond_opt_get_flags).
>> Another good side-effect is that the error codes are standard for all options
>> for the common errors at least.
>
> Nice!
>
>> When/if this patchset is posted for inclusion, I'll have all options converted.
>> I actually had them before but while on vacation during December a lot of code
>> went in changing the bonding options and have to re-work most of the patches.
>
> Oops, sorry about that ;)
>
>> Some of the future plans for this are:
>> Verbose outputting of dependencies (done, just have to polish it)
>> Automatic sysfs generation from the bond_opts[].
>
> I had a patch in my queue to do something similar, but yours is so much nicer.
>
> For sysfs nodes, there is a file permission. I wonder if bond_opts should have a sense of RO, RW, or WO? Then automatic sysfs generation is even easier. Hmmm, actually I think the answer is no. Nevermind.
>
>> Use of the API in bond_check_params() and thus cleaning it up
>> Structure for accessor fn parameter passing so we can implement get/set
>> in a more general manner
>>
>> Sending it with 2 options converted which illustrate the use of different
>> features of the API. I've tested them via sysfs.
>>
>> Any thoughts, comments and suggestions are very welcome.
>
> Nice job Nik, well done.
>
> -scott
>
Hi Scott,
Thank you for the review, I'll take care of the comments for the first version.
I saw your first comment about generalizing this to more than just the
bonding, I'll look into that because I know I'm reinventing the wheel here
as there are already network drivers which have similar APIs, but such a
project would have a much larger set of requirements (e.g. much better and
more descriptive inter-dependency checks, custom errors, custom
permissions, more opaque objects etc.). Maybe we can leave it as the next
step, I have to give it some more thought :-)
Cheers,
Nik
^ permalink raw reply
* Re: [PATCH 1/1] When timestamping is enabled, stmmac_tx_clean will call stmmac_get_tx_hwtstamp to get tx TS. It's possible that skb is NULL because there are other network frames that use several descriptors. So we must return immediately in stmmac_get_tx_hwtstamp if skb is NULL to avoid system crash.
From: Daniel Borkmann @ 2014-01-12 11:38 UTC (permalink / raw)
To: Bruce Liu; +Cc: peppe.cavallaro, netdev, linux-kernel
In-Reply-To: <20140112093951.GA3743@gmail.com>
On 01/12/2014 10:39 AM, Bruce Liu wrote:
> When timestamping is enabled, stmmac_tx_clean will call stmmac_get_tx_hwtstamp to get tx TS.
> It's possible that skb is NULL because there are other network frames that use several descriptors.
> So we must return immediately in stmmac_get_tx_hwtstamp if skb is NULL to avoid system crash.
>
>
> Signed-off-by: Bruce Liu <damuzi000@gmail.com>
Please see Documentation/SubmittingPatches +489
Your subject line is way too long and should just be something like:
[PATCH net-next] net: stmmac: fix NULL pointer dereference in stmmac_get_tx_hwtstamp
Don't indent your actual commit message with whitespaces as prefix,
and do a line break after around 70 chars.
Btw, I mentioned net-next in the subject since merge window will
open soon anyway.
> ---
> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 797b56a..47f2287 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -332,7 +332,7 @@ static void stmmac_get_tx_hwtstamp(struct stmmac_priv *priv,
> return;
>
> /* exit if skb doesn't support hw tstamp */
> - if (likely(!(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS)))
> + if (likely(!skb || !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS)))
> return;
>
> if (priv->adv_ts)
>
^ permalink raw reply
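The one-line fix quoted above relies on short-circuit evaluation: the flags are only read when the pointer is non-NULL. A minimal userspace sketch, where `pkt_sketch`, `TX_FLAG_IN_PROGRESS_SKETCH` and `skip_tx_tstamp_sketch` are hypothetical stand-ins and not the driver's types:

```c
#include <stdbool.h>
#include <stddef.h>

#define TX_FLAG_IN_PROGRESS_SKETCH 0x4u	/* hypothetical flag bit */

/* Hypothetical stand-in for the per-packet metadata the driver reads. */
struct pkt_sketch {
	unsigned int tx_flags;
};

/* Return true when timestamp retrieval should be skipped.
 * The || operator evaluates left to right and stops at the first true
 * operand, so pkt->tx_flags is never dereferenced for a NULL pkt. */
static bool skip_tx_tstamp_sketch(const struct pkt_sketch *pkt)
{
	return !pkt || !(pkt->tx_flags & TX_FLAG_IN_PROGRESS_SKETCH);
}
```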
* [PATCH] MAINTAINERS: add virtio-dev ML for virtio
From: Michael S. Tsirkin @ 2014-01-12 11:37 UTC (permalink / raw)
To: linux-kernel
Cc: Rusty Russell, Andrew Morton, Joe Perches, Greg Kroah-Hartman,
Mauro Carvalho Chehab, David S. Miller, amit.shah, Jason Wang,
Wanlong Gao, virtio-dev, netdev, virtualization
Since virtio is an OASIS standard draft now, virtio implementation
discussions are taking place on the virtio-dev OASIS mailing list.
Update MAINTAINERS.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
MAINTAINERS | 3 +
drivers/net/virtio_net.c | 355 +++++++++++++++++++++++++----------------------
2 files changed, 191 insertions(+), 167 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index ffcaf97..75202af 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9088,6 +9088,7 @@ F: include/media/videobuf2-*
VIRTIO CONSOLE DRIVER
M: Amit Shah <amit.shah@redhat.com>
+L: virtio-dev@lists.oasis-open.org
L: virtualization@lists.linux-foundation.org
S: Maintained
F: drivers/char/virtio_console.c
@@ -9097,6 +9098,7 @@ F: include/uapi/linux/virtio_console.h
VIRTIO CORE, NET AND BLOCK DRIVERS
M: Rusty Russell <rusty@rustcorp.com.au>
M: "Michael S. Tsirkin" <mst@redhat.com>
+L: virtio-dev@lists.oasis-open.org
L: virtualization@lists.linux-foundation.org
S: Maintained
F: drivers/virtio/
@@ -9109,6 +9111,7 @@ F: include/uapi/linux/virtio_*.h
VIRTIO HOST (VHOST)
M: "Michael S. Tsirkin" <mst@redhat.com>
L: kvm@vger.kernel.org
+L: virtio-dev@lists.oasis-open.org
L: virtualization@lists.linux-foundation.org
L: netdev@vger.kernel.org
S: Maintained
--
MST
^ permalink raw reply related
* [PATCH net-next 6/6] net: mvneta: implement rx_copybreak
From: Willy Tarreau @ 2014-01-12 11:24 UTC (permalink / raw)
To: davem; +Cc: netdev, Willy Tarreau, Thomas Petazzoni, Gregory CLEMENT
In-Reply-To: <1389525848-1814-1-git-send-email-w@1wt.eu>
Calling dma_map_single()/dma_unmap_single() is quite expensive compared
to copying a small packet. So let's copy short frames and keep the buffers
mapped. We set the limit to 256 bytes which seems to give good results both
on the XP-GP board and on the AX3/4.
The Rx small packet rate increased by 16.4% doing this, from 486kpps to
573kpps. It is worth noting that even the call to the function
dma_sync_single_range_for_cpu() is expensive (300 ns) although less
than dma_unmap_single(). Without it, the packet rate rises to 711kpps
(+24%). Thus on systems where coherency from device to CPU is
guaranteed by a snoop control unit, this patch should provide even more
gains, and probably rx_copybreak could be increased.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
drivers/net/ethernet/marvell/mvneta.c | 44 ++++++++++++++++++++++++++++++-----
1 file changed, 38 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 726a8d2..f5fc7a2 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -444,6 +444,8 @@ static int txq_number = 8;
static int rxq_def;
+static int rx_copybreak __read_mostly = 256;
+
#define MVNETA_DRIVER_NAME "mvneta"
#define MVNETA_DRIVER_VERSION "1.0"
@@ -1463,22 +1465,51 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
rx_done++;
rx_filled++;
rx_status = rx_desc->status;
+ rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
data = (unsigned char *)rx_desc->buf_cookie;
if (!mvneta_rxq_desc_is_first_last(rx_status) ||
- (rx_status & MVNETA_RXD_ERR_SUMMARY) ||
- !(skb = build_skb(data, pp->frag_size > PAGE_SIZE ? 0 : pp->frag_size))) {
+ (rx_status & MVNETA_RXD_ERR_SUMMARY)) {
+ err_drop_frame:
dev->stats.rx_errors++;
mvneta_rx_error(pp, rx_desc);
/* leave the descriptor untouched */
continue;
}
- dma_unmap_single(pp->dev->dev.parent, rx_desc->buf_phys_addr,
+ if (rx_bytes <= rx_copybreak) {
+ /* better copy a small frame and not unmap the DMA region */
+ skb = netdev_alloc_skb_ip_align(dev, rx_bytes);
+ if (unlikely(!skb))
+ goto err_drop_frame;
+
+ dma_sync_single_range_for_cpu(dev->dev.parent,
+ rx_desc->buf_phys_addr,
+ MVNETA_MH_SIZE + NET_SKB_PAD,
+ rx_bytes,
+ DMA_FROM_DEVICE);
+ memcpy(skb_put(skb, rx_bytes),
+ data + MVNETA_MH_SIZE + NET_SKB_PAD,
+ rx_bytes);
+
+ skb->protocol = eth_type_trans(skb, dev);
+ mvneta_rx_csum(pp, rx_status, skb);
+ napi_gro_receive(&pp->napi, skb);
+
+ rcvd_pkts++;
+ rcvd_bytes += rx_bytes;
+
+ /* leave the descriptor and buffer untouched */
+ continue;
+ }
+
+ skb = build_skb(data, pp->frag_size > PAGE_SIZE ? 0 : pp->frag_size);
+ if (!skb)
+ goto err_drop_frame;
+
+ dma_unmap_single(dev->dev.parent, rx_desc->buf_phys_addr,
MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
- rx_bytes = rx_desc->data_size -
- (ETH_FCS_LEN + MVNETA_MH_SIZE);
rcvd_pkts++;
rcvd_bytes += rx_bytes;
@@ -1495,7 +1526,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
/* Refill processing */
err = mvneta_rx_refill(pp, rx_desc);
if (err) {
- netdev_err(pp->dev, "Linux processing - Can't refill\n");
+ netdev_err(dev, "Linux processing - Can't refill\n");
rxq->missed++;
rx_filled--;
}
@@ -2945,3 +2976,4 @@ module_param(rxq_number, int, S_IRUGO);
module_param(txq_number, int, S_IRUGO);
module_param(rxq_def, int, S_IRUGO);
+module_param(rx_copybreak, int, S_IRUGO | S_IWUSR);
--
1.7.12.2.21.g234cd45.dirty
^ permalink raw reply related
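The copybreak decision in the mvneta patch above can be sketched outside the kernel. This is a rough illustration under stated assumptions: malloc() stands in for skb allocation, the DMA sync step is omitted entirely, and `rx_frame_sketch` and the `RX_*` values are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum rx_path_sketch { RX_COPIED, RX_HANDED_OFF, RX_DROPPED };

/* Frames up to copybreak bytes are copied into a fresh buffer, so the
 * ring buffer can stay mapped and be reused as-is. Longer frames are
 * handed off by pointer, which in the driver means unmapping the buffer
 * and refilling the descriptor. *out receives the buffer that would be
 * passed up the stack; the caller frees it in the RX_COPIED case. */
static enum rx_path_sketch rx_frame_sketch(unsigned char *ring_buf, size_t len,
					   size_t copybreak, unsigned char **out)
{
	if (len <= copybreak) {
		unsigned char *copy = malloc(len ? len : 1);

		if (!copy)
			return RX_DROPPED;
		memcpy(copy, ring_buf, len);
		*out = copy;
		return RX_COPIED;
	}

	*out = ring_buf;
	return RX_HANDED_OFF;
}
```

With the patch's default of 256, a 64-byte frame takes the copy path and a 300-byte frame takes the hand-off path.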
* [PATCH net-next 4/6] net: mvneta: prefetch next rx descriptor instead of current one
From: Willy Tarreau @ 2014-01-12 11:24 UTC (permalink / raw)
To: davem; +Cc: netdev, Willy Tarreau, Thomas Petazzoni, Gregory CLEMENT
In-Reply-To: <1389525848-1814-1-git-send-email-w@1wt.eu>
Currently, the mvneta driver tries to prefetch the current Rx
descriptor during read. Tests have shown that prefetching the
next one instead increases general performance by about 1% on
HTTP traffic.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
drivers/net/ethernet/marvell/mvneta.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index aa3a4f7..c7b37e0 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -611,6 +611,7 @@ mvneta_rxq_next_desc_get(struct mvneta_rx_queue *rxq)
int rx_desc = rxq->next_desc_to_proc;
rxq->next_desc_to_proc = MVNETA_QUEUE_NEXT_DESC(rxq, rx_desc);
+ prefetch(rxq->descs + rxq->next_desc_to_proc);
return rxq->descs + rx_desc;
}
@@ -1442,7 +1443,6 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
u32 rx_status;
int rx_bytes, err;
- prefetch(rx_desc);
rx_done++;
rx_filled++;
rx_status = rx_desc->status;
--
1.7.12.2.21.g234cd45.dirty
^ permalink raw reply related
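The change to mvneta_rxq_next_desc_get() above, returning the current descriptor while prefetching the one after it, can be sketched on a plain ring. This assumes a GCC or Clang toolchain for `__builtin_prefetch`; `ring_sketch`, `desc_sketch` and `ring_next_desc_sketch` are hypothetical names, and the ring size is arbitrary.

```c
#include <stddef.h>

#define RING_SIZE_SKETCH 8	/* arbitrary size for illustration */

struct desc_sketch {
	unsigned int status;
};

struct ring_sketch {
	struct desc_sketch descs[RING_SIZE_SKETCH];
	size_t next_to_proc;
};

/* Return the descriptor to process now and advance the ring index.
 * The prefetch hint targets the following descriptor, so its cache line
 * may already be loading while the caller works on the current one. */
static struct desc_sketch *ring_next_desc_sketch(struct ring_sketch *r)
{
	size_t cur = r->next_to_proc;

	r->next_to_proc = (cur + 1) % RING_SIZE_SKETCH;
	__builtin_prefetch(&r->descs[r->next_to_proc]);
	return &r->descs[cur];
}
```

The prefetch is only a hint and does not change which descriptor is returned, so the gain (about 1% on HTTP traffic in the patch description) depends on the hardware.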