Netdev List
* (unknown)
From: David Miller @ 2010-10-21  7:56 UTC
  To: ddutt; +Cc: netdev, rmody, huangj, amathur
In-Reply-To: <F363E7AC84E1B646A0358B281A46F4AEABA0FFCC62@HQ1-EXCH03.corp.brocade.com>


People are very unlikely to read your posting because you
did not provide a subject line.


* Re: Linux 2.6.36
From: Mihai Donțu @ 2010-10-21  7:52 UTC
  To: Linus Torvalds; +Cc: linux-kernel, Gary Zambrano, netdev
In-Reply-To: <AANLkTimdii2F3PsG4SxO5Zym7TB=MSGhtN+TpG=HmbcT@mail.gmail.com>

[-- Attachment #1: Type: Text/Plain, Size: 2050 bytes --]

On Thursday 21 October 2010 00:01:27 Linus Torvalds wrote:
> So it's a week later than I wanted (plus all the days that added up
> from me having a few 8-day weeks during this release window), but it's
> out there now.
> 
> The delay means that the merge window that opens now would cover the
> upcoming kernel summit. However, I really hope that everybody sends me
> their patches and pull requests _before_ KS even starts. And if you're
> affected by the kernel summit you probably won't have time during it
> to finalize anything that week anyway, especially for those staying
> for plumbers afterwards, and...
> 
> So I'm going to hope that we could perhaps even do the 2.6.37 -rc1
> release and close the merge window the Sunday before KS opens. Since
> 2.6.36 was longer than usual (at least it felt that way), I wouldn't
> mind having a 2.6.37 that is shorter than usual.
> 
> But holler if this really screws up any plans. Ten days instead of two
> weeks? Let's see if it's even reasonably realistic.
> 
> Anyway, I'm appending the shortlog since -rc8. At least it's
> noticeably shorter than the -rc7 and -rc8 logs were, and most of it
> really is pretty small.
> 
> For the bigger picture of changes since 2.6.35, see for example
> 
>    http://kernelnewbies.org/Linux_2_6_36
> 
> but it may be worth pointing out that we ended up disabling the new
> fanotify system calls because people were still unsure about the
> interfaces. Better let the interface discussion cook a bit longer than
> release with a bad interface that we need to redo.

I get a rather large number of 'b44 ssb1:0: eth0: powering down PHY' messages in
dmesg shortly after booting:

# grep -c 'b44 ssb1:0: eth0: powering down PHY' /var/log/messages
124566
# grep -c 'b44 ssb1:0: eth0: late interrupt' /var/log/messages
1141

The same thing happens when resuming from suspend to RAM. This is accompanied 
by kworker/0:3 (?) taking 100% CPU time for 1 min or so. I'm running 2.6.35 
now, so I might be wrong about the name of the kernel thread.

Thanks,

-- 
Mihai Donțu

[-- Attachment #2: syslog-messages.gz --]
[-- Type: application/x-gzip, Size: 299996 bytes --]

[-- Attachment #3: lspci.txt --]
[-- Type: text/plain, Size: 1758 bytes --]

00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
02:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)
02:01.0 CardBus bridge: O2 Micro, Inc. Cardbus bridge (rev 21)
02:01.4 FireWire (IEEE 1394): O2 Micro, Inc. Firewire (IEEE 1394) (rev 02)
0c:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 01)


* Re: [RFC PATCH 3/9] ipvs network name space aware
From: Hans Schillstrom @ 2010-10-21  7:51 UTC
  To: Simon Horman
  Cc: lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org, ja@ssi.bg, wensong@linux-vs.org,
	daniel.lezcano@free.fr
In-Reply-To: <20101020140318.GA17760@verge.net.au>

On Wednesday 20 October 2010 16:03:24 Simon Horman wrote:
> On Fri, Oct 08, 2010 at 01:16:57PM +0200, Hans Schillstrom wrote:
> >
> > This patch just contains ip_vs_conn.c
> > and does the normal
> >  - moving vars to struct ipvs
> >  - adding per netns init and exit
> >
> > proc_fs required some extra work with adding/changing private data to get the net ptr.
>
> I am currently working on rebasing this patch against the
> current nf-next-2.6 tree, which includes persistence engines,
> and I noticed a few things.
>
> > Signed-off-by:Hans Schillstrom <hans.schillstrom@ericsson.com>
> >
> > diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> > index b71c69a..c47828f 100644
> > --- a/net/netfilter/ipvs/ip_vs_conn.c
> > +++ b/net/netfilter/ipvs/ip_vs_conn.c
> > @@ -47,7 +47,7 @@
> >
> >  /*
> >   * Connection hash size. Default is what was selected at compile time.
> > -*/
> > + */
> >  int ip_vs_conn_tab_bits = CONFIG_IP_VS_TAB_BITS;
> >  module_param_named(conn_tab_bits, ip_vs_conn_tab_bits, int, 0444);
> >  MODULE_PARM_DESC(conn_tab_bits, "Set connections' hash size");
>
> This fragment is not needed.

OK

>
> > @@ -56,23 +56,12 @@ MODULE_PARM_DESC(conn_tab_bits, "Set connections' hash size");
> >  int ip_vs_conn_tab_size;
> >  int ip_vs_conn_tab_mask;
> >
> > -/*
> > - *  Connection hash table: for input and output packets lookups of IPVS
> > - */
> > -static struct list_head *ip_vs_conn_tab;
> > -
> > -/*  SLAB cache for IPVS connections */
> > -static struct kmem_cache *ip_vs_conn_cachep __read_mostly;
> > -
> > -/*  counter for current IPVS connections */
> > -static atomic_t ip_vs_conn_count = ATOMIC_INIT(0);
> > -
> > -/*  counter for no client port connections */
> > -static atomic_t ip_vs_conn_no_cport_cnt = ATOMIC_INIT(0);
> > -
> >  /* random value for IPVS connection hash */
> >  static unsigned int ip_vs_conn_rnd;
> >
> > +/* cache name cnt */
> > +static atomic_t conn_cache_nr = ATOMIC_INIT(0);
> > +
> >  /*
> >   *  Fine locking granularity for big connection hash table
> >   */
> > @@ -153,7 +142,7 @@ static unsigned int ip_vs_conn_hashkey(int af, unsigned proto,
> >   *   Hashes ip_vs_conn in ip_vs_conn_tab by proto,addr,port.
> >   *   returns bool success.
> >   */
> > -static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
> > +static inline int ip_vs_conn_hash(struct net *net, struct ip_vs_conn *cp)
> >  {
> >       unsigned hash;
> >       int ret;
> > @@ -168,7 +157,7 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
> >       spin_lock(&cp->lock);
> >
> >       if (!(cp->flags & IP_VS_CONN_F_HASHED)) {
> > -             list_add(&cp->c_list, &ip_vs_conn_tab[hash]);
> > +             list_add(&cp->c_list, &net->ipvs->conn_tab[hash]);
> >               cp->flags |= IP_VS_CONN_F_HASHED;
> >               atomic_inc(&cp->refcnt);
> >               ret = 1;
> > @@ -221,18 +210,20 @@ static inline int ip_vs_conn_unhash(struct ip_vs_conn *cp)
> >   *   s_addr, s_port: pkt source address (foreign host)
> >   *   d_addr, d_port: pkt dest address (load balancer)
> >   */
> > -static inline struct ip_vs_conn *__ip_vs_conn_in_get
> > -(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> > - const union nf_inet_addr *d_addr, __be16 d_port)
> > +static inline struct ip_vs_conn *
> > +__ip_vs_conn_in_get(struct net *net, int af, int protocol,
> > +                 const union nf_inet_addr *s_addr, __be16 s_port,
> > +                 const union nf_inet_addr *d_addr, __be16 d_port)
> >  {
> >       unsigned hash;
> >       struct ip_vs_conn *cp;
> > +     struct netns_ipvs *ipvs = net->ipvs;
> >
> >       hash = ip_vs_conn_hashkey(af, protocol, s_addr, s_port);
> >
> >       ct_read_lock(hash);
> >
> > -     list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
> > +     list_for_each_entry(cp, &ipvs->conn_tab[hash], c_list) {
> >               if (cp->af == af &&
> >                   ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
> >                   ip_vs_addr_equal(af, d_addr, &cp->vaddr) &&
> > @@ -251,16 +242,18 @@ static inline struct ip_vs_conn *__ip_vs_conn_in_get
> >       return NULL;
> >  }
> >
> > -struct ip_vs_conn *ip_vs_conn_in_get
> > -(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> > - const union nf_inet_addr *d_addr, __be16 d_port)
> > +struct ip_vs_conn *
> > +ip_vs_conn_in_get(struct net *net, int af, int protocol,
> > +               const union nf_inet_addr *s_addr, __be16 s_port,
> > +               const union nf_inet_addr *d_addr, __be16 d_port)
> >  {
> >       struct ip_vs_conn *cp;
> >
> > -     cp = __ip_vs_conn_in_get(af, protocol, s_addr, s_port, d_addr, d_port);
> > -     if (!cp && atomic_read(&ip_vs_conn_no_cport_cnt))
> > -             cp = __ip_vs_conn_in_get(af, protocol, s_addr, 0, d_addr,
> > -                                      d_port);
> > +     cp = __ip_vs_conn_in_get(net, af, protocol,
> > +                              s_addr, s_port, d_addr, d_port);
> > +     if (!cp && atomic_read(&net->ipvs->conn_no_cport_cnt))
> > +             cp = __ip_vs_conn_in_get(net, af, protocol,
> > +                                      s_addr, 0, d_addr, d_port);
> >
> >       IP_VS_DBG_BUF(9, "lookup/in %s %s:%d->%s:%d %s\n",
> >                     ip_vs_proto_name(protocol),
> > @@ -278,35 +271,41 @@ ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb,
> >                       unsigned int proto_off, int inverse)
> >  {
> >       __be16 _ports[2], *pptr;
> > +     struct net *net = dev_net(skb->dev);
> >
> >       pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
> >       if (pptr == NULL)
> >               return NULL;
> >
> > +     BUG_ON(!net);
>
> Can you explain why BUG_ON is here?

Yes, I forgot to remove it.
I had them everywhere to make sure that the net ptr was set
- don't call me paranoid ;-)
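
For anyone following the thread without the tree at hand, the conversion
under review (globals such as ip_vs_conn_tab and ip_vs_conn_count moving
into per-netns state reached through the net ptr) can be modelled roughly
in plain userspace C. This is only a sketch under stated assumptions: every
model_* name below is hypothetical and stands in for struct net /
struct netns_ipvs, there is no locking, and it is not the kernel code.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_TAB_BITS 4
#define MODEL_TAB_SIZE (1u << MODEL_TAB_BITS)
#define MODEL_TAB_MASK (MODEL_TAB_SIZE - 1)

/* One connection entry, chained per hash bucket (hypothetical model). */
struct model_conn {
	struct model_conn *next;
	unsigned key;
};

/* Hypothetical stand-in for struct netns_ipvs: formerly global state. */
struct model_ipvs {
	struct model_conn *conn_tab[MODEL_TAB_SIZE];
	int conn_count;
};

/* Hypothetical stand-in for struct net. */
struct model_net {
	struct model_ipvs *ipvs;
};

/* Per-namespace init: returns 0 on success, -1 if allocation fails. */
int model_conn_init(struct model_net *net)
{
	net->ipvs = calloc(1, sizeof(*net->ipvs));
	return net->ipvs ? 0 : -1;
}

/* Create an entry and hash it into this namespace's table only. */
struct model_conn *model_conn_new(struct model_net *net, unsigned key)
{
	struct model_ipvs *ipvs = net->ipvs;
	struct model_conn *cp = malloc(sizeof(*cp));

	if (!cp)
		return NULL;
	cp->key = key;
	cp->next = ipvs->conn_tab[key & MODEL_TAB_MASK];
	ipvs->conn_tab[key & MODEL_TAB_MASK] = cp;
	ipvs->conn_count++;
	return cp;
}

/* Look up an entry; only the caller's namespace is searched. */
struct model_conn *model_conn_get(struct model_net *net, unsigned key)
{
	struct model_conn *cp;

	for (cp = net->ipvs->conn_tab[key & MODEL_TAB_MASK]; cp; cp = cp->next)
		if (cp->key == key)
			return cp;
	return NULL;
}

/* Per-namespace exit: flush all entries, then release the state. */
void model_conn_cleanup(struct model_net *net)
{
	unsigned idx;

	for (idx = 0; idx < MODEL_TAB_SIZE; idx++) {
		struct model_conn *cp = net->ipvs->conn_tab[idx];

		while (cp) {
			struct model_conn *next = cp->next;

			free(cp);
			net->ipvs->conn_count--;
			cp = next;
		}
	}
	assert(net->ipvs->conn_count == 0);
	free(net->ipvs);
	net->ipvs = NULL;
}
```

With two model_net instances initialised, an entry created in one is not
found by model_conn_get() on the other, and each keeps its own conn_count.
That isolation is what passing the net ptr through the lookup and hash
functions is meant to give in the patch.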
>
> >       if (likely(!inverse))
> > -             return ip_vs_conn_in_get(af, iph->protocol,
> > +             return ip_vs_conn_in_get(net, af, iph->protocol,
> >                                        &iph->saddr, pptr[0],
> >                                        &iph->daddr, pptr[1]);
> >       else
> > -             return ip_vs_conn_in_get(af, iph->protocol,
> > +             return ip_vs_conn_in_get(net, af, iph->protocol,
> >                                        &iph->daddr, pptr[1],
> >                                        &iph->saddr, pptr[0]);
> >  }
> >  EXPORT_SYMBOL_GPL(ip_vs_conn_in_get_proto);
> >
> > -/* Get reference to connection template */
> > -struct ip_vs_conn *ip_vs_ct_in_get
> > -(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> > - const union nf_inet_addr *d_addr, __be16 d_port)
> > +/*
> > + *  Get reference to connection template
> > + */
> > +struct ip_vs_conn *
> > +ip_vs_ct_in_get(struct net *net, int af, int protocol,
> > +             const union nf_inet_addr *s_addr, __be16 s_port,
> > +             const union nf_inet_addr *d_addr, __be16 d_port)
> >  {
> >       unsigned hash;
> >       struct ip_vs_conn *cp;
> > +     struct netns_ipvs *ipvs = net->ipvs;
> >
> >       hash = ip_vs_conn_hashkey(af, protocol, s_addr, s_port);
> >
> >       ct_read_lock(hash);
> >
> > -     list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
> > +     list_for_each_entry(cp, &ipvs->conn_tab[hash], c_list) {
> >               if (cp->af == af &&
> >                   ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
> >                   /* protocol should only be IPPROTO_IP if
> > @@ -341,12 +340,14 @@ struct ip_vs_conn *ip_vs_ct_in_get
> >   *   s_addr, s_port: pkt source address (inside host)
> >   *   d_addr, d_port: pkt dest address (foreign host)
> >   */
> > -struct ip_vs_conn *ip_vs_conn_out_get
> > -(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> > - const union nf_inet_addr *d_addr, __be16 d_port)
> > +struct ip_vs_conn *
> > +ip_vs_conn_out_get(struct net *net, int af, int protocol,
> > +                const union nf_inet_addr *s_addr, __be16 s_port,
> > +                const union nf_inet_addr *d_addr, __be16 d_port)
> >  {
> >       unsigned hash;
> >       struct ip_vs_conn *cp, *ret=NULL;
> > +     struct netns_ipvs *ipvs = net->ipvs;
> >
> >       /*
> >        *      Check for "full" addressed entries
> > @@ -355,7 +356,7 @@ struct ip_vs_conn *ip_vs_conn_out_get
> >
> >       ct_read_lock(hash);
> >
> > -     list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
> > +     list_for_each_entry(cp, &ipvs->conn_tab[hash], c_list) {
> >               if (cp->af == af &&
> >                   ip_vs_addr_equal(af, d_addr, &cp->caddr) &&
> >                   ip_vs_addr_equal(af, s_addr, &cp->daddr) &&
> > @@ -386,17 +387,19 @@ ip_vs_conn_out_get_proto(int af, const struct sk_buff *skb,
> >                        unsigned int proto_off, int inverse)
> >  {
> >       __be16 _ports[2], *pptr;
> > +     struct net *net = dev_net(skb->dev);
> >
> >       pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
> >       if (pptr == NULL)
> >               return NULL;
> >
> > +     BUG_ON(!net);
> >       if (likely(!inverse))
> > -             return ip_vs_conn_out_get(af, iph->protocol,
> > +             return ip_vs_conn_out_get(net, af, iph->protocol,
> >                                         &iph->saddr, pptr[0],
> >                                         &iph->daddr, pptr[1]);
> >       else
> > -             return ip_vs_conn_out_get(af, iph->protocol,
> > +             return ip_vs_conn_out_get(net, af, iph->protocol,
> >                                         &iph->daddr, pptr[1],
> >                                         &iph->saddr, pptr[0]);
> >  }
> > @@ -408,7 +411,7 @@ EXPORT_SYMBOL_GPL(ip_vs_conn_out_get_proto);
> >  void ip_vs_conn_put(struct ip_vs_conn *cp)
> >  {
> >       unsigned long t = (cp->flags & IP_VS_CONN_F_ONE_PACKET) ?
> > -             0 : cp->timeout;
> > +                        0 : cp->timeout;
> >       mod_timer(&cp->timer, jiffies+t);
> >
> >       __ip_vs_conn_put(cp);
> > @@ -418,19 +421,19 @@ void ip_vs_conn_put(struct ip_vs_conn *cp)
> >  /*
> >   *   Fill a no_client_port connection with a client port number
> >   */
> > -void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __be16 cport)
> > +void ip_vs_conn_fill_cport(struct net *net, struct ip_vs_conn *cp, __be16 cport)
> >  {
> >       if (ip_vs_conn_unhash(cp)) {
> >               spin_lock(&cp->lock);
> >               if (cp->flags & IP_VS_CONN_F_NO_CPORT) {
> > -                     atomic_dec(&ip_vs_conn_no_cport_cnt);
> > +                     atomic_dec(&net->ipvs->conn_no_cport_cnt);
> >                       cp->flags &= ~IP_VS_CONN_F_NO_CPORT;
> >                       cp->cport = cport;
> >               }
> >               spin_unlock(&cp->lock);
> >
> >               /* hash on new dport */
> > -             ip_vs_conn_hash(cp);
> > +             ip_vs_conn_hash(net, cp);
> >       }
> >  }
> >
> > @@ -561,12 +564,12 @@ ip_vs_bind_dest(struct ip_vs_conn *cp, struct ip_vs_dest *dest)
> >   * Check if there is a destination for the connection, if so
> >   * bind the connection to the destination.
> >   */
> > -struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp)
> > +struct ip_vs_dest *ip_vs_try_bind_dest(struct net *net, struct ip_vs_conn *cp)
> >  {
> >       struct ip_vs_dest *dest;
> >
> >       if ((cp) && (!cp->dest)) {
> > -             dest = ip_vs_find_dest(cp->af, &cp->daddr, cp->dport,
> > +             dest = ip_vs_find_dest(net, cp->af, &cp->daddr, cp->dport,
> >                                      &cp->vaddr, cp->vport,
> >                                      cp->protocol);
> >               ip_vs_bind_dest(cp, dest);
> > @@ -638,7 +641,7 @@ static inline void ip_vs_unbind_dest(struct ip_vs_conn *cp)
> >   *   If available, return 1, otherwise invalidate this connection
> >   *   template and return 0.
> >   */
> > -int ip_vs_check_template(struct ip_vs_conn *ct)
> > +int ip_vs_check_template(struct net *net, struct ip_vs_conn *ct)
> >  {
> >       struct ip_vs_dest *dest = ct->dest;
> >
> > @@ -647,7 +650,7 @@ int ip_vs_check_template(struct ip_vs_conn *ct)
> >        */
> >       if ((dest == NULL) ||
> >           !(dest->flags & IP_VS_DEST_F_AVAILABLE) ||
> > -         (sysctl_ip_vs_expire_quiescent_template &&
> > +         (net->ipvs->sysctl_expire_quiescent_template &&
> >            (atomic_read(&dest->weight) == 0))) {
> >               IP_VS_DBG_BUF(9, "check_template: dest not available for "
> >                             "protocol %s s:%s:%d v:%s:%d "
> > @@ -668,7 +671,7 @@ int ip_vs_check_template(struct ip_vs_conn *ct)
> >                               ct->dport = htons(0xffff);
> >                               ct->vport = htons(0xffff);
> >                               ct->cport = 0;
> > -                             ip_vs_conn_hash(ct);
> > +                             ip_vs_conn_hash(net, ct);
> >                       }
> >               }
> >
> > @@ -720,16 +723,17 @@ static void ip_vs_conn_expire(unsigned long data)
> >               if (unlikely(cp->app != NULL))
> >                       ip_vs_unbind_app(cp);
> >               ip_vs_unbind_dest(cp);
> > +             BUG_ON(!cp->net);
> >               if (cp->flags & IP_VS_CONN_F_NO_CPORT)
> > -                     atomic_dec(&ip_vs_conn_no_cport_cnt);
> > -             atomic_dec(&ip_vs_conn_count);
> > +                     atomic_dec(&cp->net->ipvs->conn_no_cport_cnt);
> > +             atomic_dec(&cp->net->ipvs->conn_count);
> >
> > -             kmem_cache_free(ip_vs_conn_cachep, cp);
> > +             kmem_cache_free(cp->net->ipvs->conn_cachep, cp);
> >               return;
> >       }
> >
> >       /* hash it back to the table */
> > -     ip_vs_conn_hash(cp);
> > +     ip_vs_conn_hash(cp->net, cp);
> >
> >    expire_later:
> >       IP_VS_DBG(7, "delayed: conn->refcnt-1=%d conn->n_control=%d\n",
> > @@ -748,18 +752,22 @@ void ip_vs_conn_expire_now(struct ip_vs_conn *cp)
> >
> >
> >  /*
> > - *   Create a new connection entry and hash it into the ip_vs_conn_tab
> > + *   Create a new connection entry and hash it into the ip_vs_conn_tab,
> > + *   netns ptr will be stored in ip_vs_con here.
> >   */
> >  struct ip_vs_conn *
> > -ip_vs_conn_new(int af, int proto, const union nf_inet_addr *caddr, __be16 cport,
> > +ip_vs_conn_new(struct net *net, int af, int proto,
> > +            const union nf_inet_addr *caddr, __be16 cport,
> >              const union nf_inet_addr *vaddr, __be16 vport,
> > -            const union nf_inet_addr *daddr, __be16 dport, unsigned flags,
> > -            struct ip_vs_dest *dest)
> > +            const union nf_inet_addr *daddr, __be16 dport,
> > +            unsigned flags, struct ip_vs_dest *dest)
> >  {
> >       struct ip_vs_conn *cp;
> > -     struct ip_vs_protocol *pp = ip_vs_proto_get(proto);
> > +     struct ip_vs_proto_data *pd = ip_vs_proto_data_get(net, proto);
> > +     struct ip_vs_protocol *pp;
> > +     struct netns_ipvs *ipvs = net->ipvs;
> >
> > -     cp = kmem_cache_zalloc(ip_vs_conn_cachep, GFP_ATOMIC);
> > +     cp = kmem_cache_zalloc(ipvs->conn_cachep, GFP_ATOMIC);
> >       if (cp == NULL) {
> >               IP_VS_ERR_RL("%s(): no memory\n", __func__);
> >               return NULL;
> > @@ -790,9 +798,9 @@ ip_vs_conn_new(int af, int proto, const union nf_inet_addr *caddr, __be16 cport,
> >       atomic_set(&cp->n_control, 0);
> >       atomic_set(&cp->in_pkts, 0);
> >
> > -     atomic_inc(&ip_vs_conn_count);
> > +     atomic_inc(&ipvs->conn_count);
> >       if (flags & IP_VS_CONN_F_NO_CPORT)
> > -             atomic_inc(&ip_vs_conn_no_cport_cnt);
> > +             atomic_inc(&ipvs->conn_no_cport_cnt);
> >
> >       /* Bind the connection with a destination server */
> >       ip_vs_bind_dest(cp, dest);
> > @@ -808,12 +816,14 @@ ip_vs_conn_new(int af, int proto, const union nf_inet_addr *caddr, __be16 cport,
> >       else
> >  #endif
> >               ip_vs_bind_xmit(cp);
> > -
> > -     if (unlikely(pp && atomic_read(&pp->appcnt)))
> > -             ip_vs_bind_app(cp, pp);
> > -
> > +     cp->net = net;  /* netns ptr  needed in timer */
> > +     if( pd ) {
> > +             pp = pd->pp;
> > +             if (unlikely(pp && atomic_read(&pd->appcnt)))
> > +                     ip_vs_bind_app(net, cp, pp);
> > +     }
> >       /* Hash it in the ip_vs_conn_tab finally */
> > -     ip_vs_conn_hash(cp);
> > +     ip_vs_conn_hash(net, cp);
> >
> >       return cp;
> >  }
> > @@ -824,16 +834,33 @@ ip_vs_conn_new(int af, int proto, const union nf_inet_addr *caddr, __be16 cport,
> >   */
> >  #ifdef CONFIG_PROC_FS
> >
> > +struct ipvs_private {
> > +     struct seq_net_private p;
> > +     void *private;
> > +};
> > +
> > +static inline void ipvs_seq_priv_set(struct seq_file *seq, void *data)
> > +{
> > +     struct ipvs_private *ipriv=(struct ipvs_private *)seq->private;
> > +     ipriv->private = data;
> > +}
> > +static inline void *ipvs_seq_priv_get(struct seq_file *seq)
> > +{
> > +     return ((struct ipvs_private *)seq->private)->private;
> > +}
> > +
> >  static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
> >  {
> >       int idx;
> >       struct ip_vs_conn *cp;
> > +     struct net *net = seq_file_net(seq);
> > +     struct netns_ipvs *ipvs = net->ipvs;
> >
> >       for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
> >               ct_read_lock_bh(idx);
> > -             list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) {
> > +             list_for_each_entry(cp, &ipvs->conn_tab[idx], c_list) {
> >                       if (pos-- == 0) {
> > -                             seq->private = &ip_vs_conn_tab[idx];
> > +                             ipvs_seq_priv_set(seq, &ipvs->conn_tab[idx]);
> >                               return cp;
> >                       }
> >               }
> > @@ -845,15 +872,17 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
> >
> >  static void *ip_vs_conn_seq_start(struct seq_file *seq, loff_t *pos)
> >  {
> > -     seq->private = NULL;
> > +     ipvs_seq_priv_set(seq, NULL);
> >       return *pos ? ip_vs_conn_array(seq, *pos - 1) :SEQ_START_TOKEN;
> >  }
> > -
> > + /* netns: conn_tab OK */
> >  static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> >  {
> >       struct ip_vs_conn *cp = v;
> > -     struct list_head *e, *l = seq->private;
> > +     struct list_head *e, *l = ipvs_seq_priv_get(seq);
> >       int idx;
> > +     struct net *net = seq_file_net(seq);
> > +     struct netns_ipvs *ipvs = net->ipvs;
> >
> >       ++*pos;
> >       if (v == SEQ_START_TOKEN)
> > @@ -863,27 +892,28 @@ static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> >       if ((e = cp->c_list.next) != l)
> >               return list_entry(e, struct ip_vs_conn, c_list);
> >
> > -     idx = l - ip_vs_conn_tab;
> > +     idx = l - ipvs->conn_tab;
> >       ct_read_unlock_bh(idx);
> >
> >       while (++idx < ip_vs_conn_tab_size) {
> >               ct_read_lock_bh(idx);
> > -             list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) {
> > -                     seq->private = &ip_vs_conn_tab[idx];
> > +             list_for_each_entry(cp, &ipvs->conn_tab[idx], c_list) {
> > +                     ipvs_seq_priv_set(seq, &ipvs->conn_tab[idx]);
> >                       return cp;
> >               }
> >               ct_read_unlock_bh(idx);
> >       }
> > -     seq->private = NULL;
> > +     ipvs_seq_priv_set(seq, NULL);
> >       return NULL;
> >  }
> > -
> > +/* netns: conn_tab OK */
> >  static void ip_vs_conn_seq_stop(struct seq_file *seq, void *v)
> >  {
> > -     struct list_head *l = seq->private;
> > +     struct list_head *l = ipvs_seq_priv_get(seq);
> > +     struct net *net = seq_file_net(seq);
> >
> >       if (l)
> > -             ct_read_unlock_bh(l - ip_vs_conn_tab);
> > +             ct_read_unlock_bh(l - net->ipvs->conn_tab);
> >  }
> >
> >  static int ip_vs_conn_seq_show(struct seq_file *seq, void *v)
> > @@ -928,7 +958,16 @@ static const struct seq_operations ip_vs_conn_seq_ops = {
> >
> >  static int ip_vs_conn_open(struct inode *inode, struct file *file)
> >  {
> > -     return seq_open(file, &ip_vs_conn_seq_ops);
> > +     int ret;
> > +     struct ipvs_private *priv;
> > +
> > +     ret = seq_open_net(inode, file, &ip_vs_conn_seq_ops,
> > +                        sizeof(struct ipvs_private));
> > +     if (!ret) {
> > +             priv = ((struct seq_file *)file->private_data)->private;
> > +             priv->private = NULL;
> > +     }
> > +     return ret;
> >  }
> >
> >  static const struct file_operations ip_vs_conn_fops = {
> > @@ -936,7 +975,8 @@ static const struct file_operations ip_vs_conn_fops = {
> >       .open    = ip_vs_conn_open,
> >       .read    = seq_read,
> >       .llseek  = seq_lseek,
> > -     .release = seq_release,
> > +     .release = seq_release_private,
> > +
> >  };
> >
> >  static const char *ip_vs_origin_name(unsigned flags)
> > @@ -991,7 +1031,17 @@ static const struct seq_operations ip_vs_conn_sync_seq_ops = {
> >
> >  static int ip_vs_conn_sync_open(struct inode *inode, struct file *file)
> >  {
> > -     return seq_open(file, &ip_vs_conn_sync_seq_ops);
> > +     int ret;
> > +     struct ipvs_private *ipriv;
> > +
> > +     ret = seq_open_net(inode, file, &ip_vs_conn_sync_seq_ops,
> > +                        sizeof(struct ipvs_private));
> > +     if (!ret) {
> > +             ipriv = ((struct seq_file *)file->private_data)->private;
> > +             ipriv->private = NULL;
> > +     }
> > +     return ret;
> > +//   return seq_open(file, &ip_vs_conn_sync_seq_ops);
> >  }
> >
> >  static const struct file_operations ip_vs_conn_sync_fops = {
> > @@ -999,7 +1049,7 @@ static const struct file_operations ip_vs_conn_sync_fops = {
> >       .open    = ip_vs_conn_sync_open,
> >       .read    = seq_read,
> >       .llseek  = seq_lseek,
> > -     .release = seq_release,
> > +     .release = seq_release_private,
> >  };
> >
> >  #endif
> > @@ -1036,11 +1086,14 @@ static inline int todrop_entry(struct ip_vs_conn *cp)
> >       return 1;
> >  }
> >
> > -/* Called from keventd and must protect itself from softirqs */
> > -void ip_vs_random_dropentry(void)
> > +/* Called from keventd and must protect itself from softirqs
> > + * netns: conn_tab OK
> > + */
> > +void ip_vs_random_dropentry(struct net *net)
> >  {
> >       int idx;
> >       struct ip_vs_conn *cp;
> > +     struct netns_ipvs *ipvs = net->ipvs;
> >
> >       /*
> >        * Randomly scan 1/32 of the whole table every second
> > @@ -1053,7 +1106,7 @@ void ip_vs_random_dropentry(void)
> >                */
> >               ct_write_lock_bh(hash);
> >
> > -             list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
> > +             list_for_each_entry(cp, &ipvs->conn_tab[hash], c_list) {
> >                       if (cp->flags & IP_VS_CONN_F_TEMPLATE)
> >                               /* connection template */
> >                               continue;
> > @@ -1091,11 +1144,13 @@ void ip_vs_random_dropentry(void)
> >
> >  /*
> >   *      Flush all the connection entries in the ip_vs_conn_tab
> > + * netns: conn_tab OK
> >   */
> > -static void ip_vs_conn_flush(void)
> > +static void ip_vs_conn_flush(struct net *net)
> >  {
> >       int idx;
> >       struct ip_vs_conn *cp;
> > +     struct netns_ipvs *ipvs = net->ipvs;
> >
> >    flush_again:
> >       for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
> > @@ -1104,7 +1159,7 @@ static void ip_vs_conn_flush(void)
> >                */
> >               ct_write_lock_bh(idx);
> >
> > -             list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) {
> > +             list_for_each_entry(cp, &ipvs->conn_tab[idx], c_list) {
> >
> >                       IP_VS_DBG(4, "del connection\n");
> >                       ip_vs_conn_expire_now(cp);
> > @@ -1118,16 +1173,17 @@ static void ip_vs_conn_flush(void)
> >
> >       /* the counter may be not NULL, because maybe some conn entries
> >          are run by slow timer handler or unhashed but still referred */
> > -     if (atomic_read(&ip_vs_conn_count) != 0) {
> > +     if (atomic_read(&ipvs->conn_count) != 0) {
> >               schedule();
> >               goto flush_again;
> >       }
> >  }
> >
> >
> > -int __init ip_vs_conn_init(void)
> > +int __net_init __ip_vs_conn_init(struct net *net)
> >  {
> >       int idx;
> > +     struct netns_ipvs *ipvs = net->ipvs;
> >
> >       /* Compute size and mask */
> >       ip_vs_conn_tab_size = 1 << ip_vs_conn_tab_bits;
> > @@ -1136,19 +1192,26 @@ int __init ip_vs_conn_init(void)
> >       /*
> >        * Allocate the connection hash table and initialize its list heads
> >        */
> > -     ip_vs_conn_tab = vmalloc(ip_vs_conn_tab_size *
> > +     ipvs->conn_tab = vmalloc(ip_vs_conn_tab_size *
> >                                sizeof(struct list_head));
> > -     if (!ip_vs_conn_tab)
> > +     if (!ipvs->conn_tab)
> >               return -ENOMEM;
> >
> >       /* Allocate ip_vs_conn slab cache */
> > -     ip_vs_conn_cachep = kmem_cache_create("ip_vs_conn",
> > +     /* Todo: find a better way to name the cache */
> > +     snprintf(ipvs->conn_cname, sizeof(ipvs->conn_cname)-1,
> > +                     "ipvs_conn_%d", atomic_read(&conn_cache_nr) );
> > +     atomic_inc(&conn_cache_nr);
> > +
> > +     ipvs->conn_cachep = kmem_cache_create(ipvs->conn_cname,
> >                                             sizeof(struct ip_vs_conn), 0,
> >                                             SLAB_HWCACHE_ALIGN, NULL);
> > -     if (!ip_vs_conn_cachep) {
> > -             vfree(ip_vs_conn_tab);
> > +     if (!ipvs->conn_cachep) {
> > +             vfree(ipvs->conn_tab);
> >               return -ENOMEM;
> >       }
> > +     atomic_set(&ipvs->conn_count, 0);
> > +     atomic_set(&ipvs->conn_no_cport_cnt, 0);
> >
> >       pr_info("Connection hash table configured "
> >               "(size=%d, memory=%ldKbytes)\n",
> > @@ -1158,31 +1221,46 @@ int __init ip_vs_conn_init(void)
> >                 sizeof(struct ip_vs_conn));
> >
> >       for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
> > -             INIT_LIST_HEAD(&ip_vs_conn_tab[idx]);
> > +             INIT_LIST_HEAD(&ipvs->conn_tab[idx]);
> >       }
> >
> >       for (idx = 0; idx < CT_LOCKARRAY_SIZE; idx++)  {
> >               rwlock_init(&__ip_vs_conntbl_lock_array[idx].l);
> >       }
> >
> > -     proc_net_fops_create(&init_net, "ip_vs_conn", 0, &ip_vs_conn_fops);
> > -     proc_net_fops_create(&init_net, "ip_vs_conn_sync", 0, &ip_vs_conn_sync_fops);
> > -
> > -     /* calculate the random value for connection hash */
> > -     get_random_bytes(&ip_vs_conn_rnd, sizeof(ip_vs_conn_rnd));
> > +     proc_net_fops_create(net, "ip_vs_conn", 0, &ip_vs_conn_fops);
> > +     proc_net_fops_create(net, "ip_vs_conn_sync", 0, &ip_vs_conn_sync_fops);
> >
> >       return 0;
> >  }
> > +/* Cleanup and release all netns related ... */
> > +static void __net_exit __ip_vs_conn_cleanup(struct net *net) {
> >
> > +     /* flush all the connection entries first */
> > +     ip_vs_conn_flush(net);
> > +     /* Release the empty cache */
> > +     kmem_cache_destroy(net->ipvs->conn_cachep);
> > +     proc_net_remove(net, "ip_vs_conn");
> > +     proc_net_remove(net, "ip_vs_conn_sync");
> > +     vfree(net->ipvs->conn_tab);
> > +}
> > +static struct pernet_operations ipvs_conn_ops = {
> > +     .init = __ip_vs_conn_init,
> > +     .exit = __ip_vs_conn_cleanup,
> > +};
> >
> > -void ip_vs_conn_cleanup(void)
> > +int __init ip_vs_conn_init(void)
> >  {
> > -     /* flush all the connection entries first */
> > -     ip_vs_conn_flush();
> > +     int rv;
> >
> > -     /* Release the empty cache */
> > -     kmem_cache_destroy(ip_vs_conn_cachep);
> > -     proc_net_remove(&init_net, "ip_vs_conn");
> > -     proc_net_remove(&init_net, "ip_vs_conn_sync");
> > -     vfree(ip_vs_conn_tab);
> > +     rv = register_pernet_subsys(&ipvs_conn_ops);
> > +
> > +     /* calculate the random value for connection hash */
> > +     get_random_bytes(&ip_vs_conn_rnd, sizeof(ip_vs_conn_rnd));
> > +     return rv;
> > +}
> > +
> > +void ip_vs_conn_cleanup(void)
> > +{
> > +     unregister_pernet_subsys(&ipvs_conn_ops);
> >  }
> >
> > --
> > Regards
> > Hans Schillstrom <hans.schillstrom@ericsson.com>
> > --
> > To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
>

--
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply

* Re: [BUG] problems with "ip xfrm" on 32-bit userspace with 64-bit kernel
From: Florian Westphal @ 2010-10-21  7:50 UTC (permalink / raw)
  To: Chris Friesen; +Cc: netdev, Linux Kernel Mailing List
In-Reply-To: <4CBF78B6.90002@genband.com>

Chris Friesen <chris.friesen@genband.com> wrote:
> We've run into a 32/64 compatibility problem with iproute2.  The "ip
> xfrm monitor acquire" command doesn't work properly due to struct size
> mismatches between kernel and userspace.

Yes.  See archives for 'xfrm: add x86 CONFIG_COMPAT support'
(http://marc.info/?t=127050655600003&r=1&w=2)

for a discussion on why the patch set to fix this was rejected.

^ permalink raw reply

* Re: [RFC PATCH 1/9] ipvs network name space aware
From: Hans Schillstrom @ 2010-10-21  7:45 UTC (permalink / raw)
  To: paulmck@linux.vnet.ibm.com
  Cc: Daniel Lezcano, lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org, horms@verge.net.au, ja@ssi.bg,
	wensong@linux-vs.org
In-Reply-To: <20101020160205.GB2386@linux.vnet.ibm.com>

On Wednesday 20 October 2010 18:02:06 Paul E. McKenney wrote:
> On Wed, Oct 20, 2010 at 10:25:19AM +0200, Hans Schillstrom wrote:
> > On Tuesday 19 October 2010 20:44:36 Paul E. McKenney wrote:
> > > On Mon, Oct 18, 2010 at 03:23:48PM +0200, Hans Schillstrom wrote:
> > > > On Monday 18 October 2010 13:37:38 Daniel Lezcano wrote:
> > > > > On 10/18/2010 11:54 AM, Hans Schillstrom wrote:
> > > > > > On Monday 18 October 2010 10:59:25 Daniel Lezcano wrote:
> > > > > >
> > > > > >> On 10/08/2010 01:16 PM, Hans Schillstrom wrote:
> > > > > >>
> > > > > >>> This part contains the include files
> > > > > >>> where include/net/netns/ip_vs.h is new and contains all moved vars.
> > > > > >>>
> > > > > >>> SUMMARY
> > > > > >>>
> > > > > >>>    include/net/ip_vs.h                     |  136 ++++---
> > > > > >>>    include/net/net_namespace.h             |    2 +
> > > > > >>>    include/net/netns/ip_vs.h               |  112 +++++
> > > > > >>>
> > > > > >>> Signed-off-by:Hans Schillstrom<hans.schillstrom@ericsson.com>
> > > > > >>> ---
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >> [ ... ]
> > > > > >>
> > > > > >>
> > > > > >>>    #ifdef CONFIG_IP_VS_IPV6
> > > > > >>> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> > > > > >>> index bd10a79..b59cdc5 100644
> > > > > >>> --- a/include/net/net_namespace.h
> > > > > >>> +++ b/include/net/net_namespace.h
> > > > > >>> @@ -15,6 +15,7 @@
> > > > > >>>    #include<net/netns/ipv4.h>
> > > > > >>>    #include<net/netns/ipv6.h>
> > > > > >>>    #include<net/netns/dccp.h>
> > > > > >>> +#include<net/netns/ip_vs.h>
> > > > > >>>    #include<net/netns/x_tables.h>
> > > > > >>>    #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
> > > > > >>>    #include<net/netns/conntrack.h>
> > > > > >>> @@ -91,6 +92,7 @@ struct net {
> > > > > >>>    	struct sk_buff_head	wext_nlevents;
> > > > > >>>    #endif
> > > > > >>>    	struct net_generic	*gen;
> > > > > >>> +	struct netns_ipvs       *ipvs;
> > > > > >>>    };
> > > > > >>>
> > > > > >>>
> > > > > >> IMHO, it would be better to use the net_generic infra-structure instead
> > > > > >> of adding a new field in the netns structure.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > > I realized that to, but the performance penalty is quite high with net_generic :-(
> > > > > > But on the other hand if you are going to backport it, (without recompiling the kernel)
> > > > > > you gonna need it!
> > > > > >
> > > > >
> > > > > Hmm, yes. We don't want to have the init_net_ns performances to be impacted.
> > > > >
> > > > > You use here a pointer which will be dereferenced like the net_generic,
> > > > > I don't think there will be
> > > > > a big difference between using net_generic and using a pointer in the
> > > > > net namespace structure.
> > > > >
> > > > > The difference is the id usage, but this one is based on the idr which
> > > > > is quite fast.
> > > > >
> > > >
> > > > I'm not so sure about that, have a look at net_generic and rcu_read_lock
> > > > and compare
> > > >  ipvs = net->ipvs;
> > > > vs.
> > > >  ipvs = net_generic(net, id)
> > > >
> > > > static inline void *net_generic(struct net *net, int id)
> > > > {
> > > > 	struct net_generic *ng;
> > > > 	void *ptr;
> > > >
> > > > 	rcu_read_lock();
> > > > 	ng = rcu_dereference(net->gen);
> > > > 	BUG_ON(id == 0 || id > ng->len);
> > > > 	ptr = ng->ptr[id - 1];
> > > > 	rcu_read_unlock();
> > > >
> > > > 	return ptr;
> > > > }
> > > > ...
> > > > static inline void rcu_read_lock(void)
> > > > {
> > > >         __rcu_read_lock();
> > > >         __acquire(RCU);
> > > >         rcu_read_acquire();
> > > > }
> > > >
> > > > Another way of doing it is to pass the ipvs ptr instead of the net ptr,
> > > > and add *net to the ipvs struct.
> > > >
> > > > > We should experiment a bit here to compare both solutions.
> > > > Agre
> > > > >
> > > > I single stepped through the rcu_read_lock() on a x86_64
> > > > and it's quite many "stepi" that you need to enter :-(
> > >
> > > Was this by chance with lockdep enabled?  If not, could you please send
> > > your .config?
> > >
> > > 							Thanx, Paul
> >
> > No lockdep, but what I ment is that net_generic is not as fast as a plain ptr->xxx.
> > IPVS has hooks in the netfilter chain, and gets a huge amount of packets .
> >
> > I don't think IPVS is a candidate for net_generic, it should have its own part in "struct net"
> > That was my point.
> > ( No critic to locking or net_generic)
>
> You said that there were a lot of "stepi" commands to get through
> rcu_read_lock() on x86_64.  This is quite surprising, especially if you
> built with CONFIG_RCU_TREE.  Even if you built with CONFIG_PREEMPT_RCU_TREE,
> you should only see something like the following from rcu_read_lock():
>
> 000000b7 <__rcu_read_lock>:
>       b7:	55                   	push   %ebp
>       b8:	64 a1 00 00 00 00    	mov    %fs:0x0,%eax
>       be:	ff 80 80 01 00 00    	incl   0x180(%eax)
>       c4:	89 e5                	mov    %esp,%ebp
>       c6:	5d                   	pop    %ebp
>       c7:	c3                   	ret
>
> Unless you have some sort of debugging options turned on.  Or unless
> six instructions counts for "quite many" stepi commands.  ;-)
>
I do have this (with some debugging enabled):
__rcu_read_lock()
=> 0xffffffff8108bcf3 <+0>:	push   %rbp
   0xffffffff8108bcf4 <+1>:	mov    %rsp,%rbp
   0xffffffff8108bcf7 <+4>:	nopl   0x0(%rax,%rax,1)
   0xffffffff8108bcfc <+9>:	mov    %gs:0xb540,%rax
   0xffffffff8108bd05 <+18>:	mov    0x108(%rax),%edx
   0xffffffff8108bd0b <+24>:	inc    %edx
   0xffffffff8108bd0d <+26>:	mov    %edx,0x108(%rax)
   0xffffffff8108bd13 <+32>:	leaveq
   0xffffffff8108bd14 <+33>:	retq

which is not that many; actually impressively few instructions :-)
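
For comparison, here is a minimal user-space sketch of the two access patterns discussed in this thread. It is not kernel code: the struct layouts are cut down, the RCU read-side section is omitted, and the helper names ipvs_direct() and ipvs_generic() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures; hypothetical layouts. */
struct netns_ipvs {
	int conn_count;
};

struct net_generic {
	unsigned int len;
	void *ptr[4];
};

struct net {
	struct net_generic *gen;
	struct netns_ipvs *ipvs;
};

/* Direct field: a single load from struct net. */
static struct netns_ipvs *ipvs_direct(const struct net *net)
{
	return net->ipvs;
}

/*
 * Generic slot: load net->gen, bounds-check the id, then load the slot.
 * The real net_generic() additionally wraps this in
 * rcu_read_lock()/rcu_read_unlock(), which this model leaves out.
 */
static void *ipvs_generic(const struct net *net, unsigned int id)
{
	const struct net_generic *ng = net->gen;

	assert(id != 0 && id <= ng->len);
	return ng->ptr[id - 1];
}
```

Both helpers return the same pointer when the slot for the given id holds the ipvs data; the generic variant simply pays for one extra load and a range check on top of the RCU section.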

Thanks
	Hans

> So I am quite curious, independent of whether or not IPVS is a candidate
> for net_generic.  That choice for IPVS is not mine to make, and I will
> trust the relevant developers and maintainers to make the right choice,
> whether that be RCU or something else.  Even I do not claim that RCU
> is the right tool for all jobs!  ;-)
>
> 							Thanx, Paul
> --
> To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply

* Re: [RFC PATCH 5/9] ipvs network name space aware
From: Hans Schillstrom @ 2010-10-21  7:35 UTC (permalink / raw)
  To: Simon Horman
  Cc: lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org, ja@ssi.bg, wensong@linux-vs.org,
	daniel.lezcano@free.fr
In-Reply-To: <20101020152112.GA8502@verge.net.au>

On Wednesday 20 October 2010 17:21:45 Simon Horman wrote:
> On Fri, Oct 08, 2010 at 01:17:02PM +0200, Hans Schillstrom wrote:
> > This patch just contains ip_vs_ctl
> >
> > Signed-off-by:Hans Schillstrom <hans.schillstrom@ericsson.com>
> >
> > diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> > index ca8ec8c..7e99cbc 100644
> > --- a/net/netfilter/ipvs/ip_vs_ctl.c
> > +++ b/net/netfilter/ipvs/ip_vs_ctl.c
>
> [ snip ]
>
> > @@ -3377,62 +3383,131 @@ static void ip_vs_genl_unregister(void)
> >  }
> >
> >  /* End of Generic Netlink interface definitions */
> > +/*
> > + * per netns intit/exit func.
> > + */
> > +int /*__net_init*/ __ip_vs_control_init(struct net *net)
>
> Can you describe why __net_init is commented out?

The coloring in my editor got fucked up  :-)
I just forgot to remove the comment.

>
> [ snip ]
>
--
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply

* TCP always advertises zero window.
From: Li Yu @ 2010-10-21  7:17 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hi,

	We found this on RHEL 5.4 (kernel 2.6.18-164.11.1.el5), and we suspect that recent kernels have a similar problem.

	First, we turned off both the TCP window scaling option and the MTU probing feature. We then found that some servers always advertise a zero receive window to the other end. Below is some captured traffic (from tcpdump -S -nn -vv):

16:24:59.990545 IP (tos 0x0, ttl  64, id 37079, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.157.3.2904 > 10.1.157.4.2903: ., cksum 0x96df (correct), 3830348746:3830348746(0) ack 1951026211 win 65160 <nop,nop,timestamp 1040455485 1040632013>
16:25:00.054563 IP (tos 0x0, ttl  64, id 47424, offset 0, flags [DF], proto: TCP (6), length: 460) 10.1.157.4.2903 > 10.1.157.3.2904: P 1951026211:1951026619(408) ack 3830348746 win 0 <nop,nop,timestamp 1040632077 1040455485>
16:25:00.054579 IP (tos 0x0, ttl  64, id 37080, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.157.3.2904 > 10.1.157.4.2903: ., cksum 0x94c7 (correct), 3830348746:3830348746(0) ack 1951026619 win 65160 <nop,nop,timestamp 1040455549 1040632077>
16:25:01.451253 IP (tos 0x0, ttl  64, id 47425, offset 0, flags [DF], proto: TCP (6), length: 4148) 10.1.157.4.2903 > 10.1.157.3.2904: P 1951026619:1951030715(4096) ack 3830348746 win 0 <nop,nop,timestamp 1040633474 1040455549>

	As the example above shows, 10.1.157.4 advertises a zero window forever. I wrote a small toy kernel module to dump the internal TCP socket state, shown below:

tcp-snapshot:sock:
  sk->sk_family=2
  sk->sk_state=1
  sk->sk_reuse=1
  sk->sk_bound_dev_if=0
  atomic_read(&sk->sk_refcnt)=3
  sk->sk_hash=117920776
  sk->sk_shutdown=0
  sk->sk_no_check=0
  sk->sk_userlocks=7
  sk->sk_protocol=6
  sk->sk_type=1
  sk->sk_rcvbuf=131072
  list_empty(&sk->sk_sleep->task_list)=0
  atomic_read(&sk->sk_rmem_alloc)=0
  atomic_read(&sk->sk_wmem_alloc)=0
  atomic_read(&sk->sk_omem_alloc)=0
  sk->sk_receive_queue.qlen=0
  sk->sk_write_queue.qlen=0
  sk->sk_async_wait_queue.qlen=0
  sk->sk_error_queue.qlen=0
  sk->sk_wmem_queued=0
  sk->sk_forward_alloc=8192
  sk->sk_allocation=d0
  sk->sk_sndbuf=131072
  sk->sk_route_caps=1143a9
  sk->sk_gso_type=1
  sk->sk_rcvlowat=1
  sk->sk_flags=300
  sk->sk_lingertime=0
  sk->sk_err=0
  sk->sk_err_soft=0
  sk->sk_ack_backlog=0
  sk->sk_max_ack_backlog=128
  sk->sk_priority=0
  sk->sk_rcvtimeo=9223372036854775807
  sk->sk_sndtimeo=9223372036854775807
  sk->sk_protinfo=0000000000000000
  sk->sk_stamp.tv_sec=18446744073709551615
  sk->sk_stamp.tv_usec=18446744073709551615
  sk->sk_socket=ffff81053ee71080
  sk->sk_user_data=0000000000000000
  sk->sk_sndmsg_page=ffff8103761ab220
  sk->sk_sndmsg_off=475
  sk->sk_send_head=0000000000000000
  sk->sk_write_pending=0
tcp-snapshot:inet_sock:
  inetsk->daddr=39d010a
  inetsk->rcv_saddr=49d010a
  inetsk->dport=580b
  inetsk->num=b57
  inetsk->saddr=49d010a
  inetsk->uc_ttl=4294967295
  inetsk->cmsg_flags=0
  inetsk->opt=0000000000000000
  inetsk->sport=570b
  inetsk->id=5843
  inetsk->tos=0
  inetsk->mc_ttl=64
  inetsk->pmtudisc=1
  inetsk->recverr=0
  inetsk->is_icsk=1
  inetsk->freebind=0
  inetsk->hdrincl=0
  inetsk->mc_loop=1
  inetsk->mc_index=2
  inetsk->mc_addr=0
  inetsk->mc_list=0000000000000000
tcp-snapshot:inet_connection_sk
  icsk->icsk_accept_queue.rskq_defer_accept=0
  icsk->icsk_accept_queue.listen_opt=0000000000000000
  icsk->icsk_timeout=5336784156
  icsk->icsk_rto=218
  icsk->icsk_pmtu_cookie=1500
  icsk->icsk_ca_state=0
  icsk->icsk_retransmits=0
  icsk->icsk_pending=0
  icsk->icsk_backoff=0
  icsk->icsk_syn_retries=0
  icsk->icsk_probes_out=0
  icsk->icsk_ext_hdr_len=0
  icsk->icsk_ack.pending=0
  icsk->icsk_ack.quick=0
  icsk->icsk_ack.pingpong=1
  icsk->icsk_ack.blocked=0
  icsk->icsk_ack.ato=40
  icsk->icsk_ack.timeout=5303454287
  icsk->icsk_ack.lrcvtime=1008486952
  icsk->icsk_ack.last_seg_size=6814
  icsk->icsk_ack.rcv_mss=8688
  icsk->icsk_mtup.enabled=0
  icsk->icsk_mtup.search_high=1500
  icsk->icsk_mtup.search_low=564
  icsk->icsk_mtup.probe_size=0
tcp-snapshot:tcp_sock
  tcpsk->tcp_header_len=32
  tcpsk->pred_flags=0
  tcpsk->rcv_nxt=3830348746
  tcpsk->snd_nxt=1984376345
  tcpsk->snd_una=1984376345
  tcpsk->snd_sml=1984376345
  tcpsk->rcv_tstamp=1041816640
  tcpsk->lsndtime=1041816640
  tcpsk->ucopy.prequeue.qlen=0
  tcpsk->ucopy.task=0000000000000000
  tcpsk->ucopy.iov=0000000000000000
  tcpsk->ucopy.memory=0
  tcpsk->ucopy.len=0
  tcpsk->snd_wl1=3830348746
  tcpsk->snd_wnd=65160
  tcpsk->max_window=65524
  tcpsk->mss_cache=1448
  tcpsk->xmit_size_goal=31856
  tcpsk->window_clamp=65535
  tcpsk->rcv_ssthresh=5792
  tcpsk->frto_highmark=0
  tcpsk->reordering=3
  tcpsk->frto_counter=0
  tcpsk->nonagle=1
  tcpsk->keepalive_probes=0
  tcpsk->srtt=121
  tcpsk->mdev=76
  tcpsk->mdev_max=200
  tcpsk->rttvar=203
  tcpsk->rtt_seq=1984376345
  tcpsk->packets_out=0
  tcpsk->left_out=0
  tcpsk->retrans_out=0
  tcpsk->rx_opt.ts_recent_stamp=1287564284
  tcpsk->rx_opt.ts_recent=1041640111
  tcpsk->rx_opt.rcv_tsval=1041640111
  tcpsk->rx_opt.rcv_tsecr=1041816640
  tcpsk->rx_opt.saw_tstamp=1
  tcpsk->rx_opt.tstamp_ok=1
  tcpsk->rx_opt.dsack=0
  tcpsk->rx_opt.wscale_ok=0
  tcpsk->rx_opt.sack_ok=5
  tcpsk->rx_opt.snd_wscale=0
  tcpsk->rx_opt.rcv_wscale=0
  tcpsk->rx_opt.eff_sacks=0
  tcpsk->rx_opt.num_sacks=0
  tcpsk->rx_opt.user_mss=0
  tcpsk->rx_opt.mss_clamp=1460
  tcpsk->snd_ssthresh=4
  tcpsk->snd_cwnd=4
  tcpsk->snd_cwnd_cnt=4
  tcpsk->snd_cwnd_clamp=65535
  tcpsk->snd_cwnd_used=2
  tcpsk->snd_cwnd_stamp=1041816640
  tcpsk->out_of_order_queue.qlen=0
  tcpsk->rcv_wnd=0
  tcpsk->rcv_wup=3830348746
  tcpsk->write_seq=1984376345
  tcpsk->pushed_seq=1984376345
  tcpsk->copied_seq=3830348746
  tcpsk->duplicate_sack[0].start_seq=3613713418
  tcpsk->duplicate_sack[0].end_seq=3613714866
  tcpsk->selective_acks[i].start_seq=3648234364
  tcpsk->selective_acks[i].end_seq=3648247396
  tcpsk->selective_acks[i].start_seq=3647855528
  tcpsk->selective_acks[i].end_seq=3647856976
  tcpsk->selective_acks[i].start_seq=3640487648
  tcpsk->selective_acks[i].end_seq=3640496336
  tcpsk->selective_acks[i].start_seq=3498843984
  tcpsk->selective_acks[i].end_seq=3498845432
  tcpsk->recv_sack_cache[i].start_seq=1226527628
  tcpsk->recv_sack_cache[i].end_seq=1226549030
  tcpsk->recv_sack_cache[i].start_seq=179088461
  tcpsk->recv_sack_cache[i].end_seq=179091357
  tcpsk->recv_sack_cache[i].start_seq=4042009662
  tcpsk->recv_sack_cache[i].end_seq=4042011110
  tcpsk->recv_sack_cache[i].start_seq=0
  tcpsk->recv_sack_cache[i].end_seq=0
  tcpsk->lost_skb_hint=0000000000000000
  tcpsk->scoreboard_skb_hint=0000000000000000
  tcpsk->retransmit_skb_hint=0000000000000000
  tcpsk->forward_skb_hint=0000000000000000
  tcpsk->fastpath_skb_hint=0000000000000000
  tcpsk->fastpath_cnt_hint=15
  tcpsk->lost_cnt_hint=6
  tcpsk->retransmit_cnt_hint=0
  tcpsk->forward_cnt_hint=9
  tcpsk->advmss=1448
  tcpsk->prior_ssthresh=5
  tcpsk->lost_out=0
  tcpsk->sacked_out=0
  tcpsk->fackets_out=0
  tcpsk->high_seq=1226549030
  tcpsk->retrans_stamp=0
  tcpsk->undo_marker=0
  tcpsk->undo_retrans=1
  tcpsk->urg_seq=0
  tcpsk->urg_data=0
  tcpsk->urg_mode=0
  tcpsk->ecn_flags=0
  tcpsk->snd_up=0
  tcpsk->total_retrans=2110
  tcpsk->bytes_acked=0
  tcpsk->keepalive_time=0
  tcpsk->keepalive_intvl=0
  tcpsk->linger2=0
  tcpsk->last_synq_overflow=0
  tcpsk->rcv_rtt_est.rtt=15
  tcpsk->rcv_rtt_est.seq=3830352454
  tcpsk->rcv_rtt_est.time=1008486951
  tcpsk->rcvq_space.space=468244
  tcpsk->rcvq_space.seq=3830258350
  tcpsk->rcvq_space.time=1008486952
  tcpsk->mtu_probe.probe_seq_start=0
  tcpsk->mtu_probe.probe_seq_end=0
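
A note for reading the inet_sock fields above: assuming the module printed the raw network-byte-order values with %x on a little-endian x86_64 host (an assumption, since the module itself is not shown), the address and port fields appear byte-swapped. A small sketch of the decoding, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split a big-endian IPv4 address, as it looks when printed with %x
 * on a little-endian host, into dotted-quad octets.
 */
static void printed_be32_to_octets(uint32_t printed, unsigned char out[4])
{
	out[0] = printed & 0xff;
	out[1] = (printed >> 8) & 0xff;
	out[2] = (printed >> 16) & 0xff;
	out[3] = (printed >> 24) & 0xff;
}

/* Swap the two bytes of a port printed the same way. */
static unsigned int printed_be16_to_port(uint16_t printed)
{
	return ((printed & 0xffu) << 8) | (printed >> 8);
}
```

With these, daddr=39d010a decodes to 10.1.157.3, dport=580b to port 2904, and sport=570b to port 2903, which matches the addresses and ports in the tcpdump output.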

	We noticed that tcpsk->rcv_wnd is indeed 0, but tcpsk->copied_seq equals tcpsk->rcv_nxt and sk->sk_rmem_alloc is 0; the latter two mean that there is no pending data in the receive queue.

	After some digging through the source code, I found that __tcp_select_window() actually returns zero in this case. In my understanding, the function should restore the window to a non-zero value at this point (the receive queue is completely free). Is that right?

	In this case, I think tcpsk->rcv_ssthresh has an abnormal value: it is too small, which causes __tcp_select_window() to skip the rest of its processing and leaves the window at zero forever.
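
	A rough user-space model of that path, as I read the non-window-scaling branch of __tcp_select_window() in 2.6.18-era sources. This is a sketch under assumptions, not the kernel function: the memory-pressure adjustment and the rcv_wscale branch are left out, the struct and function names are made up, and free_space and full_space are passed in rather than derived from the socket.

```c
#include <assert.h>

/* Hypothetical container for the inputs the window calculation uses. */
struct win_model {
	int rcv_mss;       /* icsk->icsk_ack.rcv_mss */
	int free_space;    /* free receive buffer space */
	int full_space;    /* min(window_clamp, full buffer space) */
	int rcv_ssthresh;  /* tcpsk->rcv_ssthresh */
	int rcv_wnd;       /* currently offered window */
};

static int model_select_window(const struct win_model *m)
{
	int mss = m->rcv_mss;
	int free_space = m->free_space;
	int window;

	assert(mss > 0);
	if (mss > m->full_space)
		mss = m->full_space;

	/* Buffer more than half full and less than one MSS left. */
	if (free_space < m->full_space / 2 && free_space < mss)
		return 0;

	/* Never offer more than the receive slow-start threshold. */
	if (free_space > m->rcv_ssthresh)
		free_space = m->rcv_ssthresh;

	/* Keep the current offer unless it is far off; otherwise round
	 * down to a multiple of mss. */
	window = m->rcv_wnd;
	if (window <= free_space - mss || window > free_space)
		window = (free_space / mss) * mss;

	return window;
}
```

	Feeding in the snapshot values (rcv_mss=8688, rcv_ssthresh=5792, rcv_wnd=0, full_space=65535) with any free_space above rcv_ssthresh, for example 98304 for an empty 131072-byte buffer if tcp_adv_win_scale is 2, the model returns 0: the clamped free space (5792) is smaller than one rcv_mss, so the old window of 0 is kept, and rounding down would give 0 anyway.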

	According to the source code, only a successful MTU probe or the receipt of a non-zero-length L7 payload can grow tcp_sock->rcv_ssthresh. Because we turned off MTU probing and TCP only receives zero-window probes from the other end, it seems that we never get a chance to update tcp_sock->rcv_ssthresh, so we end up in a deadlock.

	It seems that some processing is missing in skb_data_queue() to keep tcpsk->rcv_ssthresh consistent with the free space of the receive queue. Is that right, or have I missed or misunderstood something?


	Thank you~

Yu





^ permalink raw reply

* [PATCHv2] vmxnet3: remove set_flag_le{16,64} helpers
From: Harvey Harrison @ 2010-10-21  6:32 UTC (permalink / raw)
  To: sbhatewara; +Cc: netdev, shemminger

It's easier to just annotate the constants as little endian types and set/clear
the flags directly.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
---
Sorry, missed a git add and left a line out of the previous patch.
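
For reviewers, a small user-space sketch of why dropping the helpers does not change the stored value: byte swapping commutes with bitwise OR, AND and NOT, so operating on an already-converted constant gives the same result as convert, modify, convert back. The names below are hypothetical, and swab64_model() stands in for what cpu_to_le64()/le64_to_cpu() do on a big-endian host (on little-endian hosts they are no-ops).

```c
#include <assert.h>
#include <stdint.h>

/* Unconditional 64-bit byte swap, modelling a big-endian host. */
static uint64_t swab64_model(uint64_t v)
{
	v = ((v & 0x00ff00ff00ff00ffULL) << 8) |
	    ((v >> 8) & 0x00ff00ff00ff00ffULL);
	v = ((v & 0x0000ffff0000ffffULL) << 16) |
	    ((v >> 16) & 0x0000ffff0000ffffULL);
	return (v << 32) | (v >> 32);
}

/* Old style: convert to CPU order, modify, convert back. */
static uint64_t set_flag_via_helper(uint64_t le_data, uint64_t cpu_flag)
{
	return swab64_model(swab64_model(le_data) | cpu_flag);
}

static uint64_t clear_flag_via_helper(uint64_t le_data, uint64_t cpu_flag)
{
	return swab64_model(swab64_model(le_data) & ~cpu_flag);
}

/* New style: the flag constant is already in little-endian order. */
static uint64_t set_flag_direct(uint64_t le_data, uint64_t le_flag)
{
	return le_data | le_flag;
}

static uint64_t clear_flag_direct(uint64_t le_data, uint64_t le_flag)
{
	return le_data & ~le_flag;
}
```

For any data word and flag, set_flag_via_helper(d, f) equals set_flag_direct(d, swab64_model(f)), and likewise for the clear variants.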

 drivers/net/vmxnet3/upt1_defs.h       |    8 +++---
 drivers/net/vmxnet3/vmxnet3_defs.h    |    6 ++--
 drivers/net/vmxnet3/vmxnet3_drv.c     |   37 ++++++++-------------------------
 drivers/net/vmxnet3/vmxnet3_ethtool.c |   14 +++++-------
 drivers/net/vmxnet3/vmxnet3_int.h     |    4 ---
 5 files changed, 22 insertions(+), 47 deletions(-)

diff --git a/drivers/net/vmxnet3/upt1_defs.h b/drivers/net/vmxnet3/upt1_defs.h
index 37108fb..969c751 100644
--- a/drivers/net/vmxnet3/upt1_defs.h
+++ b/drivers/net/vmxnet3/upt1_defs.h
@@ -88,9 +88,9 @@ struct UPT1_RSSConf {
 
 /* features */
 enum {
-	UPT1_F_RXCSUM		= 0x0001,   /* rx csum verification */
-	UPT1_F_RSS		= 0x0002,
-	UPT1_F_RXVLAN		= 0x0004,   /* VLAN tag stripping */
-	UPT1_F_LRO		= 0x0008,
+	UPT1_F_RXCSUM		= cpu_to_le64(0x0001),   /* rx csum verification */
+	UPT1_F_RSS		= cpu_to_le64(0x0002),
+	UPT1_F_RXVLAN		= cpu_to_le64(0x0004),   /* VLAN tag stripping */
+	UPT1_F_LRO		= cpu_to_le64(0x0008),
 };
 #endif
diff --git a/drivers/net/vmxnet3/vmxnet3_defs.h b/drivers/net/vmxnet3/vmxnet3_defs.h
index ca7727b..4d84912 100644
--- a/drivers/net/vmxnet3/vmxnet3_defs.h
+++ b/drivers/net/vmxnet3/vmxnet3_defs.h
@@ -523,9 +523,9 @@ struct Vmxnet3_RxFilterConf {
 #define VMXNET3_PM_MAX_PATTERN_SIZE   128
 #define VMXNET3_PM_MAX_MASK_SIZE      (VMXNET3_PM_MAX_PATTERN_SIZE / 8)
 
-#define VMXNET3_PM_WAKEUP_MAGIC       0x01  /* wake up on magic pkts */
-#define VMXNET3_PM_WAKEUP_FILTER      0x02  /* wake up on pkts matching
-					     * filters */
+#define VMXNET3_PM_WAKEUP_MAGIC       cpu_to_le16(0x01)  /* wake up on magic pkts */
+#define VMXNET3_PM_WAKEUP_FILTER      cpu_to_le16(0x02)  /* wake up on pkts matching
+							  * filters */
 
 
 struct Vmxnet3_PM_PktFilter {
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 198ce92..c8d1a14 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1548,23 +1548,6 @@ vmxnet3_free_irqs(struct vmxnet3_adapter *adapter)
 	}
 }
 
-
-inline void set_flag_le16(__le16 *data, u16 flag)
-{
-	*data = cpu_to_le16(le16_to_cpu(*data) | flag);
-}
-
-inline void set_flag_le64(__le64 *data, u64 flag)
-{
-	*data = cpu_to_le64(le64_to_cpu(*data) | flag);
-}
-
-inline void reset_flag_le64(__le64 *data, u64 flag)
-{
-	*data = cpu_to_le64(le64_to_cpu(*data) & ~flag);
-}
-
-
 static void
 vmxnet3_vlan_rx_register(struct net_device *netdev, struct vlan_group *grp)
 {
@@ -1580,8 +1563,7 @@ vmxnet3_vlan_rx_register(struct net_device *netdev, struct vlan_group *grp)
 			adapter->vlan_grp = grp;
 
 			/* update FEATURES to device */
-			set_flag_le64(&devRead->misc.uptFeatures,
-				      UPT1_F_RXVLAN);
+			devRead->misc.uptFeatures |= UPT1_F_RXVLAN;
 			VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
 					       VMXNET3_CMD_UPDATE_FEATURE);
 			/*
@@ -1604,7 +1586,7 @@ vmxnet3_vlan_rx_register(struct net_device *netdev, struct vlan_group *grp)
 		struct Vmxnet3_DSDevRead *devRead = &shared->devRead;
 		adapter->vlan_grp = NULL;
 
-		if (le64_to_cpu(devRead->misc.uptFeatures) & UPT1_F_RXVLAN) {
+		if (devRead->misc.uptFeatures & UPT1_F_RXVLAN) {
 			int i;
 
 			for (i = 0; i < VMXNET3_VFT_SIZE; i++) {
@@ -1617,8 +1599,7 @@ vmxnet3_vlan_rx_register(struct net_device *netdev, struct vlan_group *grp)
 					       VMXNET3_CMD_UPDATE_VLAN_FILTERS);
 
 			/* update FEATURES to device */
-			reset_flag_le64(&devRead->misc.uptFeatures,
-					UPT1_F_RXVLAN);
+			devRead->misc.uptFeatures &= ~UPT1_F_RXVLAN;
 			VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
 					       VMXNET3_CMD_UPDATE_FEATURE);
 		}
@@ -1779,15 +1760,15 @@ vmxnet3_setup_driver_shared(struct vmxnet3_adapter *adapter)
 
 	/* set up feature flags */
 	if (adapter->rxcsum)
-		set_flag_le64(&devRead->misc.uptFeatures, UPT1_F_RXCSUM);
+		devRead->misc.uptFeatures |= UPT1_F_RXCSUM;
 
 	if (adapter->lro) {
-		set_flag_le64(&devRead->misc.uptFeatures, UPT1_F_LRO);
+		devRead->misc.uptFeatures |= UPT1_F_LRO;
 		devRead->misc.maxNumRxSG = cpu_to_le16(1 + MAX_SKB_FRAGS);
 	}
 	if ((adapter->netdev->features & NETIF_F_HW_VLAN_RX) &&
 	    adapter->vlan_grp) {
-		set_flag_le64(&devRead->misc.uptFeatures, UPT1_F_RXVLAN);
+		devRead->misc.uptFeatures |= UPT1_F_RXVLAN;
 	}
 
 	devRead->misc.mtu = cpu_to_le32(adapter->netdev->mtu);
@@ -2594,7 +2575,7 @@ vmxnet3_suspend(struct device *device)
 		memcpy(pmConf->filters[i].pattern, netdev->dev_addr, ETH_ALEN);
 		pmConf->filters[i].mask[0] = 0x3F; /* LSB ETH_ALEN bits */
 
-		set_flag_le16(&pmConf->wakeUpEvents, VMXNET3_PM_WAKEUP_FILTER);
+		pmConf->wakeUpEvents |= VMXNET3_PM_WAKEUP_FILTER;
 		i++;
 	}
 
@@ -2636,13 +2617,13 @@ vmxnet3_suspend(struct device *device)
 		pmConf->filters[i].mask[5] = 0x03; /* IPv4 TIP */
 		in_dev_put(in_dev);
 
-		set_flag_le16(&pmConf->wakeUpEvents, VMXNET3_PM_WAKEUP_FILTER);
+		pmConf->wakeUpEvents |= VMXNET3_PM_WAKEUP_FILTER;
 		i++;
 	}
 
 skip_arp:
 	if (adapter->wol & WAKE_MAGIC)
-		set_flag_le16(&pmConf->wakeUpEvents, VMXNET3_PM_WAKEUP_MAGIC);
+		pmConf->wakeUpEvents |= VMXNET3_PM_WAKEUP_MAGIC;
 
 	pmConf->numFilters = i;
 
diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c b/drivers/net/vmxnet3/vmxnet3_ethtool.c
index 7e4b5a8..b79070b 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethtool.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
@@ -50,13 +50,11 @@ vmxnet3_set_rx_csum(struct net_device *netdev, u32 val)
 		adapter->rxcsum = val;
 		if (netif_running(netdev)) {
 			if (val)
-				set_flag_le64(
-				&adapter->shared->devRead.misc.uptFeatures,
-				UPT1_F_RXCSUM);
+				adapter->shared->devRead.misc.uptFeatures |=
+				UPT1_F_RXCSUM;
 			else
-				reset_flag_le64(
-				&adapter->shared->devRead.misc.uptFeatures,
-				UPT1_F_RXCSUM);
+				adapter->shared->devRead.misc.uptFeatures &=
+				~UPT1_F_RXCSUM;
 
 			VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
 					       VMXNET3_CMD_UPDATE_FEATURE);
@@ -292,10 +290,10 @@ vmxnet3_set_flags(struct net_device *netdev, u32 data)
 		/* update harware LRO capability accordingly */
 		if (lro_requested)
 			adapter->shared->devRead.misc.uptFeatures |=
-						cpu_to_le64(UPT1_F_LRO);
+							UPT1_F_LRO;
 		else
 			adapter->shared->devRead.misc.uptFeatures &=
-						cpu_to_le64(~UPT1_F_LRO);
+							~UPT1_F_LRO;
 		VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
 				       VMXNET3_CMD_UPDATE_FEATURE);
 	}
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 2121c73..46aee6d 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -353,10 +353,6 @@ struct vmxnet3_adapter {
 #define VMXNET3_MAX_ETH_HDR_SIZE    22
 #define VMXNET3_MAX_SKB_BUF_SIZE    (3*1024)
 
-void set_flag_le16(__le16 *data, u16 flag);
-void set_flag_le64(__le64 *data, u64 flag);
-void reset_flag_le64(__le64 *data, u64 flag);
-
 int
 vmxnet3_quiesce_dev(struct vmxnet3_adapter *adapter);
 
-- 
1.7.1


^ permalink raw reply related

* [PATCH] vmxnet3: remove set_flag_le{16,64} helpers
From: Harvey Harrison @ 2010-10-21  6:28 UTC (permalink / raw)
  To: sbhatewara; +Cc: netdev, shemminger

It's easier to just annotate the constants as little endian types and set/clear
the flags directly.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
---
 drivers/net/vmxnet3/upt1_defs.h       |    8 +++---
 drivers/net/vmxnet3/vmxnet3_defs.h    |    6 ++--
 drivers/net/vmxnet3/vmxnet3_drv.c     |   35 +++++++-------------------------
 drivers/net/vmxnet3/vmxnet3_ethtool.c |   14 +++++-------
 drivers/net/vmxnet3/vmxnet3_int.h     |    4 ---
 5 files changed, 21 insertions(+), 46 deletions(-)

diff --git a/drivers/net/vmxnet3/upt1_defs.h b/drivers/net/vmxnet3/upt1_defs.h
index 37108fb..969c751 100644
--- a/drivers/net/vmxnet3/upt1_defs.h
+++ b/drivers/net/vmxnet3/upt1_defs.h
@@ -88,9 +88,9 @@ struct UPT1_RSSConf {
 
 /* features */
 enum {
-	UPT1_F_RXCSUM		= 0x0001,   /* rx csum verification */
-	UPT1_F_RSS		= 0x0002,
-	UPT1_F_RXVLAN		= 0x0004,   /* VLAN tag stripping */
-	UPT1_F_LRO		= 0x0008,
+	UPT1_F_RXCSUM		= cpu_to_le64(0x0001),   /* rx csum verification */
+	UPT1_F_RSS		= cpu_to_le64(0x0002),
+	UPT1_F_RXVLAN		= cpu_to_le64(0x0004),   /* VLAN tag stripping */
+	UPT1_F_LRO		= cpu_to_le64(0x0008),
 };
 #endif
diff --git a/drivers/net/vmxnet3/vmxnet3_defs.h b/drivers/net/vmxnet3/vmxnet3_defs.h
index ca7727b..4d84912 100644
--- a/drivers/net/vmxnet3/vmxnet3_defs.h
+++ b/drivers/net/vmxnet3/vmxnet3_defs.h
@@ -523,9 +523,9 @@ struct Vmxnet3_RxFilterConf {
 #define VMXNET3_PM_MAX_PATTERN_SIZE   128
 #define VMXNET3_PM_MAX_MASK_SIZE      (VMXNET3_PM_MAX_PATTERN_SIZE / 8)
 
-#define VMXNET3_PM_WAKEUP_MAGIC       0x01  /* wake up on magic pkts */
-#define VMXNET3_PM_WAKEUP_FILTER      0x02  /* wake up on pkts matching
-					     * filters */
+#define VMXNET3_PM_WAKEUP_MAGIC       cpu_to_le16(0x01)  /* wake up on magic pkts */
+#define VMXNET3_PM_WAKEUP_FILTER      cpu_to_le16(0x02)  /* wake up on pkts matching
+							  * filters */
 
 
 struct Vmxnet3_PM_PktFilter {
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 198ce92..ce292d4 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1548,23 +1548,6 @@ vmxnet3_free_irqs(struct vmxnet3_adapter *adapter)
 	}
 }
 
-
-inline void set_flag_le16(__le16 *data, u16 flag)
-{
-	*data = cpu_to_le16(le16_to_cpu(*data) | flag);
-}
-
-inline void set_flag_le64(__le64 *data, u64 flag)
-{
-	*data = cpu_to_le64(le64_to_cpu(*data) | flag);
-}
-
-inline void reset_flag_le64(__le64 *data, u64 flag)
-{
-	*data = cpu_to_le64(le64_to_cpu(*data) & ~flag);
-}
-
-
 static void
 vmxnet3_vlan_rx_register(struct net_device *netdev, struct vlan_group *grp)
 {
@@ -1580,8 +1563,7 @@ vmxnet3_vlan_rx_register(struct net_device *netdev, struct vlan_group *grp)
 			adapter->vlan_grp = grp;
 
 			/* update FEATURES to device */
-			set_flag_le64(&devRead->misc.uptFeatures,
-				      UPT1_F_RXVLAN);
+			devRead->misc.uptFeatures |= UPT1_F_RXVLAN;
 			VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
 					       VMXNET3_CMD_UPDATE_FEATURE);
 			/*
@@ -1604,7 +1586,7 @@ vmxnet3_vlan_rx_register(struct net_device *netdev, struct vlan_group *grp)
 		struct Vmxnet3_DSDevRead *devRead = &shared->devRead;
 		adapter->vlan_grp = NULL;
 
-		if (le64_to_cpu(devRead->misc.uptFeatures) & UPT1_F_RXVLAN) {
+		if (devRead->misc.uptFeatures & UPT1_F_RXVLAN) {
 			int i;
 
 			for (i = 0; i < VMXNET3_VFT_SIZE; i++) {
@@ -1617,8 +1599,7 @@ vmxnet3_vlan_rx_register(struct net_device *netdev, struct vlan_group *grp)
 					       VMXNET3_CMD_UPDATE_VLAN_FILTERS);
 
 			/* update FEATURES to device */
-			reset_flag_le64(&devRead->misc.uptFeatures,
-					UPT1_F_RXVLAN);
+			devRead->misc.uptFeatures &= ~UPT1_F_RXVLAN;
 			VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
 					       VMXNET3_CMD_UPDATE_FEATURE);
 		}
@@ -1779,15 +1760,15 @@ vmxnet3_setup_driver_shared(struct vmxnet3_adapter *adapter)
 
 	/* set up feature flags */
 	if (adapter->rxcsum)
-		set_flag_le64(&devRead->misc.uptFeatures, UPT1_F_RXCSUM);
+		devRead->misc.uptFeatures |= UPT1_F_RXCSUM;
 
 	if (adapter->lro) {
-		set_flag_le64(&devRead->misc.uptFeatures, UPT1_F_LRO);
+		devRead->misc.uptFeatures |= UPT1_F_LRO;
 		devRead->misc.maxNumRxSG = cpu_to_le16(1 + MAX_SKB_FRAGS);
 	}
 	if ((adapter->netdev->features & NETIF_F_HW_VLAN_RX) &&
 	    adapter->vlan_grp) {
-		set_flag_le64(&devRead->misc.uptFeatures, UPT1_F_RXVLAN);
+		devRead->misc.uptFeatures |= UPT1_F_RXVLAN;
 	}
 
 	devRead->misc.mtu = cpu_to_le32(adapter->netdev->mtu);
@@ -2594,7 +2575,7 @@ vmxnet3_suspend(struct device *device)
 		memcpy(pmConf->filters[i].pattern, netdev->dev_addr, ETH_ALEN);
 		pmConf->filters[i].mask[0] = 0x3F; /* LSB ETH_ALEN bits */
 
-		set_flag_le16(&pmConf->wakeUpEvents, VMXNET3_PM_WAKEUP_FILTER);
+		pmConf->wakeUpEvents |= VMXNET3_PM_WAKEUP_FILTER;
 		i++;
 	}
 
@@ -2642,7 +2623,7 @@ vmxnet3_suspend(struct device *device)
 
 skip_arp:
 	if (adapter->wol & WAKE_MAGIC)
-		set_flag_le16(&pmConf->wakeUpEvents, VMXNET3_PM_WAKEUP_MAGIC);
+		pmConf->wakeUpEvents |= VMXNET3_PM_WAKEUP_MAGIC;
 
 	pmConf->numFilters = i;
 
diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c b/drivers/net/vmxnet3/vmxnet3_ethtool.c
index 7e4b5a8..b79070b 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethtool.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
@@ -50,13 +50,11 @@ vmxnet3_set_rx_csum(struct net_device *netdev, u32 val)
 		adapter->rxcsum = val;
 		if (netif_running(netdev)) {
 			if (val)
-				set_flag_le64(
-				&adapter->shared->devRead.misc.uptFeatures,
-				UPT1_F_RXCSUM);
+				adapter->shared->devRead.misc.uptFeatures |=
+				UPT1_F_RXCSUM;
 			else
-				reset_flag_le64(
-				&adapter->shared->devRead.misc.uptFeatures,
-				UPT1_F_RXCSUM);
+				adapter->shared->devRead.misc.uptFeatures &=
+				~UPT1_F_RXCSUM;
 
 			VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
 					       VMXNET3_CMD_UPDATE_FEATURE);
@@ -292,10 +290,10 @@ vmxnet3_set_flags(struct net_device *netdev, u32 data)
 		/* update harware LRO capability accordingly */
 		if (lro_requested)
 			adapter->shared->devRead.misc.uptFeatures |=
-						cpu_to_le64(UPT1_F_LRO);
+							UPT1_F_LRO;
 		else
 			adapter->shared->devRead.misc.uptFeatures &=
-						cpu_to_le64(~UPT1_F_LRO);
+							~UPT1_F_LRO;
 		VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
 				       VMXNET3_CMD_UPDATE_FEATURE);
 	}
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 2121c73..46aee6d 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -353,10 +353,6 @@ struct vmxnet3_adapter {
 #define VMXNET3_MAX_ETH_HDR_SIZE    22
 #define VMXNET3_MAX_SKB_BUF_SIZE    (3*1024)
 
-void set_flag_le16(__le16 *data, u16 flag);
-void set_flag_le64(__le64 *data, u64 flag);
-void reset_flag_le64(__le64 *data, u64 flag);
-
 int
 vmxnet3_quiesce_dev(struct vmxnet3_adapter *adapter);
 
-- 
1.7.1


^ permalink raw reply related

* Re: Question w.r.t debugfs / netdevice pass-through IOCTL
From: Stephen Hemminger @ 2010-10-21  4:19 UTC (permalink / raw)
  To: Debashis Dutt; +Cc: netdev@vger.kernel.org
In-Reply-To: <F363E7AC84E1B646A0358B281A46F4AEABA0FFCC68@HQ1-EXCH03.corp.brocade.com>

On Wed, 20 Oct 2010 20:26:50 -0700
Debashis Dutt <ddutt@Brocade.COM> wrote:

> Hi, 
> 
> For the Brocade 10G Ethernet driver (bna) we want to implement a set of operations which is not supported by current tools like ethtool. 
> 
> Examples of such operations would be 
>        a) Queries related to CEE, if the link is CEE.
>        b) Get traces from firmware.

> 
> I was wondering what would be right approach to take here:
>                 a) use debugfs (like the Chelsio cxgb4 driver)
That works as long as they really are debug operations. debugfs isn't always
available, so support for it should be a config option in your driver.
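
A minimal sketch of that pattern, for illustration: the debug hook compiles
to a no-op stub when the option is off, so callers need no #ifdef of their
own. CONFIG_BNA_DEBUGFS and bna_debugfs_init() are hypothetical names, not
taken from the bna driver, and the printf() stands in for the real debugfs
calls.

```c
#include <stdio.h>

/* Hypothetical Kconfig symbol; a real driver would get it from Kconfig. */
/* #define CONFIG_BNA_DEBUGFS 1 */

#ifdef CONFIG_BNA_DEBUGFS
/* Debug support built in: the debugfs entries would be created here. */
static int bna_debugfs_init(const char *ifname)
{
	printf("%s: creating debug entries\n", ifname);
	return 1;	/* number of entries created in this sketch */
}
#else
/* Debug support compiled out: the call collapses to nothing. */
static inline int bna_debugfs_init(const char *ifname)
{
	(void)ifname;
	return 0;
}
#endif
```

With the symbol left undefined, as above, bna_debugfs_init("eth0") simply
returns 0 and the driver works without debugfs.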

>                 b) use SIOCDEVPRIVATE for the pass through IOCTL defined in
>                     struct net_device_ops{}

The problem with ioctl is that it doesn't work for 32-bit user space
compatibility. The ioctl compat layer does not have enough context
to translate SIOCDEVPRIVATE.
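
A small sketch of why the layout matters, assuming a private request that
carries a user pointer. Both struct names are made up for illustration; the
point is only that the payload format is known to the driver and not to the
generic compat code.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical private-ioctl payload as a native application declares it. */
struct priv_req_native {
	void     *buf;	/* 8 bytes on a 64-bit build, 4 on a 32-bit one */
	uint32_t  len;
};

/* The same payload as a 32-bit application lays it out. */
struct priv_req_compat32 {
	uint32_t buf;	/* user pointer held in 32 bits */
	uint32_t len;
};

/* Nonzero when the two layouts disagree, i.e. when a translation step
 * would be needed that only the driver knows how to perform. */
static int layouts_differ(void)
{
	return sizeof(struct priv_req_native) !=
	       sizeof(struct priv_req_compat32) ||
	       offsetof(struct priv_req_native, len) !=
	       offsetof(struct priv_req_compat32, len);
}
```

On a 64-bit build the layouts differ, so a 32-bit binary passing the compat
struct would be misread unless someone translates it field by field.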

>                     As per comments in the header file, b) should not be used
>                     since this IOCTL is supposed to be deprecated.
>                 c) use procfs / sysfs (these may not scale, in our opinion)

Although it is less common, some drivers have put things in /proc/net/xxx/ethX.



-- 

^ permalink raw reply

* [RFC PATCH] net: consolidate 8021q tagging
From: John Fastabend @ 2010-10-21  3:40 UTC (permalink / raw)
  To: netdev; +Cc: john.r.fastabend, jesse, davem

This is an example to illustrate my comment on Jesse Gross's
patch, where he adds vlan tagging for the non-offload case
to dev_hard_start_xmit. It compiles, but otherwise I have not
tested it.

If we tag vlan packets in dev_hard_start_xmit, we no longer
need to actually tag them here; we can just set the vlan_tci
field in the skb and let the stack tag them at the bottom.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
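
For readers following along, a user-space sketch of the two ways of carrying
a tag that this patch chooses between: writing the 802.1Q header into the
frame, or only recording the TCI next to the packet. The pkt struct and the
function names are illustrative, not kernel API; in the kernel the two roles
are played by __vlan_put_tag() and __vlan_hwaccel_put_tag(). TAG_PRESENT is
an assumed marker bit.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN     6
#define VLAN_HLEN    4
#define ETH_P_8021Q  0x8100
#define TAG_PRESENT  0x1000	/* assumed marker bit for "tci is valid" */

struct pkt {
	uint8_t  data[64];	/* frame bytes, starting at the destination MAC */
	size_t   len;
	uint16_t vlan_tci;	/* out-of-band tag, 0 when unused */
};

/* Software tagging: open a 4-byte gap after the two MAC addresses and
 * write TPID + TCI into it. Returns -1 if the buffer is too small. */
static int put_tag_in_frame(struct pkt *p, uint16_t tci)
{
	if (p->len < 2 * ETH_ALEN || p->len + VLAN_HLEN > sizeof(p->data))
		return -1;
	memmove(p->data + 2 * ETH_ALEN + VLAN_HLEN,
		p->data + 2 * ETH_ALEN, p->len - 2 * ETH_ALEN);
	p->data[12] = (uint8_t)(ETH_P_8021Q >> 8);
	p->data[13] = (uint8_t)(ETH_P_8021Q & 0xff);
	p->data[14] = (uint8_t)(tci >> 8);
	p->data[15] = (uint8_t)(tci & 0xff);
	p->len += VLAN_HLEN;
	return 0;
}

/* Deferred tagging: leave the frame alone and only record the tag. */
static void put_tag_out_of_band(struct pkt *p, uint16_t tci)
{
	p->vlan_tci = (uint16_t)(tci | TAG_PRESENT);
}
```

The second form costs nothing at the vlan device and leaves the decision of
who inserts the bytes to the bottom of the transmit path.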

 net/8021q/vlan_dev.c |  105 +++-----------------------------------------------
 1 files changed, 7 insertions(+), 98 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 14e3d1f..78b1618 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -326,24 +326,12 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
 	 */
 	if (veth->h_vlan_proto != htons(ETH_P_8021Q) ||
 	    vlan_dev_info(dev)->flags & VLAN_FLAG_REORDER_HDR) {
-		unsigned int orig_headroom = skb_headroom(skb);
 		u16 vlan_tci;
-
-		vlan_dev_info(dev)->cnt_encap_on_xmit++;
-
 		vlan_tci = vlan_dev_info(dev)->vlan_id;
 		vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb);
-		skb = __vlan_put_tag(skb, vlan_tci);
-		if (!skb) {
-			txq->tx_dropped++;
-			return NETDEV_TX_OK;
-		}
-
-		if (orig_headroom < VLAN_HLEN)
-			vlan_dev_info(dev)->cnt_inc_headroom_on_tx++;
+		skb = __vlan_hwaccel_put_tag(skb, vlan_tci);
 	}
 
-
 	skb_set_dev(skb, vlan_dev_info(dev)->real_dev);
 	len = skb->len;
 	ret = dev_queue_xmit(skb);
@@ -357,32 +345,6 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
 	return ret;
 }
 
-static netdev_tx_t vlan_dev_hwaccel_hard_start_xmit(struct sk_buff *skb,
-						    struct net_device *dev)
-{
-	int i = skb_get_queue_mapping(skb);
-	struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-	u16 vlan_tci;
-	unsigned int len;
-	int ret;
-
-	vlan_tci = vlan_dev_info(dev)->vlan_id;
-	vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb);
-	skb = __vlan_hwaccel_put_tag(skb, vlan_tci);
-
-	skb->dev = vlan_dev_info(dev)->real_dev;
-	len = skb->len;
-	ret = dev_queue_xmit(skb);
-
-	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
-		txq->tx_packets++;
-		txq->tx_bytes += len;
-	} else
-		txq->tx_dropped++;
-
-	return ret;
-}
-
 static u16 vlan_dev_select_queue(struct net_device *dev, struct sk_buff *skb)
 {
 	struct net_device *rdev = vlan_dev_info(dev)->real_dev;
@@ -719,8 +681,7 @@ static const struct header_ops vlan_header_ops = {
 	.parse	 = eth_header_parse,
 };
 
-static const struct net_device_ops vlan_netdev_ops, vlan_netdev_accel_ops,
-		    vlan_netdev_ops_sq, vlan_netdev_accel_ops_sq;
+static const struct net_device_ops vlan_netdev_ops, vlan_netdev_ops_sq;
 
 static int vlan_dev_init(struct net_device *dev)
 {
@@ -755,19 +716,16 @@ static int vlan_dev_init(struct net_device *dev)
 	if (real_dev->features & NETIF_F_HW_VLAN_TX) {
 		dev->header_ops      = real_dev->header_ops;
 		dev->hard_header_len = real_dev->hard_header_len;
-		if (real_dev->netdev_ops->ndo_select_queue)
-			dev->netdev_ops = &vlan_netdev_accel_ops_sq;
-		else
-			dev->netdev_ops = &vlan_netdev_accel_ops;
 	} else {
 		dev->header_ops      = &vlan_header_ops;
 		dev->hard_header_len = real_dev->hard_header_len + VLAN_HLEN;
-		if (real_dev->netdev_ops->ndo_select_queue)
-			dev->netdev_ops = &vlan_netdev_ops_sq;
-		else
-			dev->netdev_ops = &vlan_netdev_ops;
 	}
 
+	if (real_dev->netdev_ops->ndo_select_queue)
+		dev->netdev_ops = &vlan_netdev_ops_sq;
+	else
+		dev->netdev_ops = &vlan_netdev_ops;
+
 	if (is_vlan_dev(real_dev))
 		subclass = 1;
 
@@ -908,30 +866,6 @@ static const struct net_device_ops vlan_netdev_ops = {
 #endif
 };
 
-static const struct net_device_ops vlan_netdev_accel_ops = {
-	.ndo_change_mtu		= vlan_dev_change_mtu,
-	.ndo_init		= vlan_dev_init,
-	.ndo_uninit		= vlan_dev_uninit,
-	.ndo_open		= vlan_dev_open,
-	.ndo_stop		= vlan_dev_stop,
-	.ndo_start_xmit =  vlan_dev_hwaccel_hard_start_xmit,
-	.ndo_validate_addr	= eth_validate_addr,
-	.ndo_set_mac_address	= vlan_dev_set_mac_address,
-	.ndo_set_rx_mode	= vlan_dev_set_rx_mode,
-	.ndo_set_multicast_list	= vlan_dev_set_rx_mode,
-	.ndo_change_rx_flags	= vlan_dev_change_rx_flags,
-	.ndo_do_ioctl		= vlan_dev_ioctl,
-	.ndo_neigh_setup	= vlan_dev_neigh_setup,
-	.ndo_get_stats64	= vlan_dev_get_stats64,
-#if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
-	.ndo_fcoe_ddp_setup	= vlan_dev_fcoe_ddp_setup,
-	.ndo_fcoe_ddp_done	= vlan_dev_fcoe_ddp_done,
-	.ndo_fcoe_enable	= vlan_dev_fcoe_enable,
-	.ndo_fcoe_disable	= vlan_dev_fcoe_disable,
-	.ndo_fcoe_get_wwn	= vlan_dev_fcoe_get_wwn,
-#endif
-};
-
 static const struct net_device_ops vlan_netdev_ops_sq = {
 	.ndo_select_queue	= vlan_dev_select_queue,
 	.ndo_change_mtu		= vlan_dev_change_mtu,
@@ -957,31 +891,6 @@ static const struct net_device_ops vlan_netdev_ops_sq = {
 #endif
 };
 
-static const struct net_device_ops vlan_netdev_accel_ops_sq = {
-	.ndo_select_queue	= vlan_dev_select_queue,
-	.ndo_change_mtu		= vlan_dev_change_mtu,
-	.ndo_init		= vlan_dev_init,
-	.ndo_uninit		= vlan_dev_uninit,
-	.ndo_open		= vlan_dev_open,
-	.ndo_stop		= vlan_dev_stop,
-	.ndo_start_xmit =  vlan_dev_hwaccel_hard_start_xmit,
-	.ndo_validate_addr	= eth_validate_addr,
-	.ndo_set_mac_address	= vlan_dev_set_mac_address,
-	.ndo_set_rx_mode	= vlan_dev_set_rx_mode,
-	.ndo_set_multicast_list	= vlan_dev_set_rx_mode,
-	.ndo_change_rx_flags	= vlan_dev_change_rx_flags,
-	.ndo_do_ioctl		= vlan_dev_ioctl,
-	.ndo_neigh_setup	= vlan_dev_neigh_setup,
-	.ndo_get_stats64	= vlan_dev_get_stats64,
-#if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
-	.ndo_fcoe_ddp_setup	= vlan_dev_fcoe_ddp_setup,
-	.ndo_fcoe_ddp_done	= vlan_dev_fcoe_ddp_done,
-	.ndo_fcoe_enable	= vlan_dev_fcoe_enable,
-	.ndo_fcoe_disable	= vlan_dev_fcoe_disable,
-	.ndo_fcoe_get_wwn	= vlan_dev_fcoe_get_wwn,
-#endif
-};
-
 void vlan_setup(struct net_device *dev)
 {
 	ether_setup(dev);


^ permalink raw reply related

* Re: [PATCH v2 04/14] vlan: Enable software emulation for vlan accleration.
From: John Fastabend @ 2010-10-21  3:32 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <1287618974-4714-5-git-send-email-jesse@nicira.com>

On 10/20/2010 4:56 PM, Jesse Gross wrote:
> Currently users of hardware vlan acceleration need to know whether
> the device supports it before generating packets.  However, vlan
> acceleration will soon be available in a more flexible manner so
> knowing ahead of time becomes much more difficult.  This adds
> a software fallback path for vlan packets on devices without the
> necessary offloading support, similar to other types of hardware
> acceleration.
> 
> Signed-off-by: Jesse Gross <jesse@nicira.com>
> ---
>  include/linux/netdevice.h |   14 +++++++++++---
>  net/core/dev.c            |   36 +++++++++++++++++++++++++++++++++---
>  2 files changed, 44 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 880d565..2861565 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2248,9 +2248,17 @@ static inline int skb_gso_ok(struct sk_buff *skb, int features)
>  
>  static inline int netif_needs_gso(struct net_device *dev, struct sk_buff *skb)
>  {
> -	return skb_is_gso(skb) &&
> -	       (!skb_gso_ok(skb, dev->features) ||
> -		unlikely(skb->ip_summed != CHECKSUM_PARTIAL));
> +	if (skb_is_gso(skb)) {
> +		int features = dev->features;
> +
> +		if (skb->protocol == htons(ETH_P_8021Q) || skb->vlan_tci)
> +			features &= dev->vlan_features;
> +
> +		return (!skb_gso_ok(skb, features) ||
> +			unlikely(skb->ip_summed != CHECKSUM_PARTIAL));
> +	}
> +
> +	return 0;
>  }
>  
>  static inline void netif_set_gso_max_size(struct net_device *dev,
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 4c3ac53..1bfd96b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1694,7 +1694,12 @@ static bool can_checksum_protocol(unsigned long features, __be16 protocol)
>  
>  static bool dev_can_checksum(struct net_device *dev, struct sk_buff *skb)
>  {
> -	if (can_checksum_protocol(dev->features, skb->protocol))
> +	int features = dev->features;
> +
> +	if (vlan_tx_tag_present(skb))
> +		features &= dev->vlan_features;
> +
> +	if (can_checksum_protocol(features, skb->protocol))
>  		return true;
>  
>  	if (skb->protocol == htons(ETH_P_8021Q)) {
> @@ -1793,6 +1798,16 @@ struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features)
>  	__be16 type = skb->protocol;
>  	int err;
>  
> +	if (type == htons(ETH_P_8021Q)) {
> +		struct vlan_ethhdr *veh;
> +
> +		if (unlikely(!pskb_may_pull(skb, VLAN_ETH_HLEN)))
> +			return ERR_PTR(-EINVAL);
> +
> +		veh = (struct vlan_ethhdr *)skb->data;
> +		type = veh->h_vlan_encapsulated_proto;
> +	}
> +
>  	skb_reset_mac_header(skb);
>  	skb->mac_len = skb->network_header - skb->mac_header;
>  	__skb_pull(skb, skb->mac_len);
> @@ -1964,9 +1979,14 @@ static inline void skb_orphan_try(struct sk_buff *skb)
>  static inline int skb_needs_linearize(struct sk_buff *skb,
>  				      struct net_device *dev)
>  {
> +	int features = dev->features;
> +
> +	if (skb->protocol == htons(ETH_P_8021Q) || vlan_tx_tag_present(skb))
> +		features &= dev->vlan_features;
> +
>  	return skb_is_nonlinear(skb) &&
> -	       ((skb_has_frag_list(skb) && !(dev->features & NETIF_F_FRAGLIST)) ||
> -	        (skb_shinfo(skb)->nr_frags && (!(dev->features & NETIF_F_SG) ||
> +	       ((skb_has_frag_list(skb) && !(features & NETIF_F_FRAGLIST)) ||
> +		(skb_shinfo(skb)->nr_frags && (!(features & NETIF_F_SG) ||
>  					      illegal_highdma(dev, skb))));
>  }
>  
> @@ -1989,6 +2009,15 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  
>  		skb_orphan_try(skb);
>  
> +		if (vlan_tx_tag_present(skb) &&
> +		    !(dev->features & NETIF_F_HW_VLAN_TX)) {
> +			skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
> +			if (unlikely(!skb))
> +				goto out;
> +
> +			skb->vlan_tci = 0;
> +		}
> +

Nice set of patches! If we tag frames in dev_hard_start_xmit(), can we consolidate
the offload-enabled and non-offloaded net_device_ops in 8021q, and then not tag in
vlan_dev_hard_start_xmit()? I'll post an example; just thinking out loud here.
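
Roughly, the decision at the bottom of the stack would then look like the
sketch below. This is user-space C with made-up names and bit values, not
the kernel code; it only mirrors the condition in the hunk quoted above.

```c
#include <stdint.h>

#define F_HW_VLAN_TX 0x80u	/* illustrative feature bit */
#define TAG_PRESENT  0x1000u	/* assumed "tag valid" marker in vlan_tci */

enum tag_action { TAG_NONE, TAG_IN_SOFTWARE, TAG_IN_HARDWARE };

/* Decide, at the bottom of the transmit path, who inserts the tag. */
static enum tag_action xmit_tag_action(uint32_t dev_features, uint16_t vlan_tci)
{
	if (!(vlan_tci & TAG_PRESENT))
		return TAG_NONE;		/* untagged packet */
	if (dev_features & F_HW_VLAN_TX)
		return TAG_IN_HARDWARE;		/* NIC inserts the tag */
	return TAG_IN_SOFTWARE;			/* stack falls back */
}
```

With that check in one place, the vlan device would only have to set
vlan_tci, whatever the underlying device supports.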

Thanks,
John.


>  		if (netif_needs_gso(dev, skb)) {
>  			if (unlikely(dev_gso_segment(skb)))
>  				goto out_kfree_skb;
> @@ -2050,6 +2079,7 @@ out_kfree_gso_skb:
>  		skb->destructor = DEV_GSO_CB(skb)->destructor;
>  out_kfree_skb:
>  	kfree_skb(skb);
> +out:
>  	return rc;
>  }
>  


^ permalink raw reply

* Re: [PATCH v2 07/14] ethtool: Add support for vlan accleration.
From: John Fastabend @ 2010-10-21  3:27 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <1287618974-4714-8-git-send-email-jesse@nicira.com>

On 10/20/2010 4:56 PM, Jesse Gross wrote:
> Now that vlan acceleration is handled consistently regardless of usage,
> it is possible to enable and disable it at will.  This adds support for
> Ethtool operations that change the offloading status for debugging
> purposes, similar to other forms of hardware acceleration.
> 

Jesse,

Not sure if this is enough to get dynamic toggling like this.
dev->hard_header_len is set depending on offloads at init time in
vlan_dev_init(). If the offload setting changes afterwards,
LL_RESERVED_SPACE won't work correctly and we end up having to call
pskb_expand_head(). I think this might end up hurting performance.

That said, I think I can probably get this working by fixing up the
header_ops in vlan_dev.c and, while I'm at it, adding vlan_header_cache
and vlan_header_cache_update routines. I'll try to get something out
tomorrow; in the meantime nothing too bad is happening.
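
To put numbers on the headroom problem, a sketch assuming an Ethernet real
device (hard_header_len of 14). The helper names are made up; the two
header lengths follow the branches in vlan_dev_init().

```c
#include <stddef.h>

#define ETH_HLEN  14
#define VLAN_HLEN 4

/* Link-layer header space a vlan device reserves in front of the
 * payload, depending on who inserts the tag. */
static size_t vlan_hard_header_len(int tx_offload)
{
	return tx_offload ? ETH_HLEN : ETH_HLEN + VLAN_HLEN;
}

/* Bytes of headroom missing when a packet was built for one setting and
 * is sent under another. Nonzero means the buffer has to be expanded. */
static size_t headroom_shortfall(int offload_at_init, int offload_now)
{
	size_t reserved = vlan_hard_header_len(offload_at_init);
	size_t needed = vlan_hard_header_len(offload_now);

	return needed > reserved ? needed - reserved : 0;
}
```

Turning offload off on a device that was initialised with it on leaves every
packet 4 bytes short, which is the pskb_expand_head() case above.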

Thanks,
John.

> Signed-off-by: Jesse Gross <jesse@nicira.com>
> ---
>  include/linux/ethtool.h |    2 ++
>  net/core/ethtool.c      |    3 ++-
>  2 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index 8a3338c..6628a50 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -309,6 +309,8 @@ struct ethtool_perm_addr {
>   * flag differs from the read-only value.
>   */
>  enum ethtool_flags {
> +	ETH_FLAG_TXVLAN		= (1 << 7),	/* TX VLAN offload enabled */
> +	ETH_FLAG_RXVLAN		= (1 << 8),	/* RX VLAN offload enabled */
>  	ETH_FLAG_LRO		= (1 << 15),	/* LRO is enabled */
>  	ETH_FLAG_NTUPLE		= (1 << 27),	/* N-tuple filters enabled */
>  	ETH_FLAG_RXHASH		= (1 << 28),
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index 685c700..956a9f4 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -132,7 +132,8 @@ EXPORT_SYMBOL(ethtool_op_set_ufo);
>   * NETIF_F_xxx values in include/linux/netdevice.h
>   */
>  static const u32 flags_dup_features =
> -	(ETH_FLAG_LRO | ETH_FLAG_NTUPLE | ETH_FLAG_RXHASH);
> +	(ETH_FLAG_LRO | ETH_FLAG_RXVLAN | ETH_FLAG_TXVLAN | ETH_FLAG_NTUPLE |
> +	 ETH_FLAG_RXHASH);
>  
>  u32 ethtool_op_get_flags(struct net_device *dev)
>  {


^ permalink raw reply

* Question w.r.t debugfs / netdevice pass-through IOCTL
From: Debashis Dutt @ 2010-10-21  3:26 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hi, 

For the Brocade 10G Ethernet driver (bna) we want to implement a set of operations which is not supported by current tools like ethtool. 

Examples of such operations would be 
       a) Queries related to CEE, if the link is CEE.
       b) Get traces from firmware.

I was wondering what would be right approach to take here:
                a) use debugfs (like the Chelsio cxgb4 driver)
                b) use SIOCDEVPRIVATE for the pass through IOCTL defined in
                    struct net_device_ops{}
                    As per comments in the header file, b) should not be used
                    since this IOCTL is supposed to be deprecated.
                c) use procfs / sysfs (these may not scale, in our opinion)

Please suggest.

Thanks
--Debashis


^ permalink raw reply

* (unknown), 
From: Debashis Dutt @ 2010-10-21  3:07 UTC (permalink / raw)
  To: netdev@vger.kernel.org, David S. Miller
  Cc: Rasesh Mody, Jing Huang, Akshay Mathur

Hi, 

For the Brocade 10G Ethernet driver (bna) we want to implement a set of 
operations which is not supported by current tools like ethtool. 

Examples of such operations would be 
       a) Queries related to CEE, if the link is CEE.
       b) Get traces from firmware.

I was wondering what would be right approach to take here:
                a) use debugfs (like the Chelsio cxgb4 driver)
                b) use SIOCDEVPRIVATE for the pass through IOCTL defined in 
                    struct net_device_ops{}
                    As per comments in the header file, b) should not be used
                    since this IOCTL is supposed to be deprecated.
                c) use procfs / sysfs (these may not scale, in our opinion)

Please suggest.

Thanks
--Debashis


^ permalink raw reply

* Re: [PATCH 1/2] r6040: fix multicast operations
From: Ben Hutchings @ 2010-10-21  2:55 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, Shawn Lin, Marc Leclerc, Albert Chen, David Miller
In-Reply-To: <201010202309.43812.florian@openwrt.org>

On Wed, 2010-10-20 at 22:25 +0100, Florian Fainelli wrote:
> This patch fixes the following issues with the r6040 NIC operating in
> multicast:
> 
> 1) When the IFF_ALLMULTI flag is set, we should write 0xffff to the NIC hash
>    table registers to make it process multicast traffic
> 2) When the number of multicast addresses to handle is smaller than MCAST_MAX
>    we should use the NIC multicast registers MID1_{L,M,H}.
> 3) The hashing of the address was not correct, due to an invalid subtraction
>    (15 - (crc & 0x0f)) instead of (crc & 0x0f)

> Reported-by: Marc Leclerc <marc-leclerc@signaturealpha.com>
> Tested-by: Marc Leclerc <marc-leclerc@signaturealpha.com>
> Signed-off-by: Shawn Lin <shawn@dmp.com.tw>
> Signed-off-by: Albert Chen <albert.chen@rdc.com.tw>
> Signed-off-by: Florian Fainelli <florian@openwrt.org>
> CC: stable@kernel.org

Remember you'll need to provide a different version for 2.6.27.y and
2.6.32.y.

> ---
> diff --git a/drivers/net/r6040.c b/drivers/net/r6040.c
> index 68a8419..3843363 100644
> --- a/drivers/net/r6040.c
> +++ b/drivers/net/r6040.c
> @@ -852,74 +852,90 @@ static void r6040_multicast_list(struct net_device *dev)
>  	struct r6040_private *lp = netdev_priv(dev);
>  	void __iomem *ioaddr = lp->base;
>  	u16 *adrp;
> -	u16 reg;
>  	unsigned long flags;
>  	struct netdev_hw_addr *ha;
>  	int i;
>  
> -	/* MAC Address */
> -	adrp = (u16 *)dev->dev_addr;
> -	iowrite16(adrp[0], ioaddr + MID_0L);
> -	iowrite16(adrp[1], ioaddr + MID_0M);
> -	iowrite16(adrp[2], ioaddr + MID_0H);
> -
> -	/* Promiscous Mode */
>  	spin_lock_irqsave(&lp->lock, flags);
>  
>  	/* Clear AMCP & PROM bits */
> -	reg = ioread16(ioaddr) & ~0x0120;
> -	if (dev->flags & IFF_PROMISC) {
> -		reg |= 0x0020;
> +	lp->mcr0 = ioread16(ioaddr) & ~0x0120;
> +
> +	/* Promiscuous Mode */
> +	if (dev->flags & IFF_PROMISC)
>  		lp->mcr0 |= 0x0020;
> -	}
> -	/* Too many multicast addresses
> -	 * accept all traffic */
> -	else if ((netdev_mc_count(dev) > MCAST_MAX) ||
> -		 (dev->flags & IFF_ALLMULTI))
> -		reg |= 0x0020;
>  
> -	iowrite16(reg, ioaddr);
> -	spin_unlock_irqrestore(&lp->lock, flags);
> +	/* Enable multicast hash table function to
> +	 * receive all multicast packets.
> +	 */
> +	else if (dev->flags & IFF_ALLMULTI) {
> +		lp->mcr0 |= 0x0100;

Please give these flags names.

>  
> -	/* Build the hash table */
> -	if (netdev_mc_count(dev) > MCAST_MAX) {
> -		u16 hash_table[4];
> +		for (i = 0; i < MCAST_MAX ; i++) {
> +			iowrite16(0, ioaddr + MID_1L + 8 * i);
> +			iowrite16(0, ioaddr + MID_1M + 8 * i);
> +			iowrite16(0, ioaddr + MID_1H + 8 * i);
> +		}
> +
> +		iowrite16(0xffff, ioaddr + MAR0);
> +		iowrite16(0xffff, ioaddr + MAR1);
> +		iowrite16(0xffff, ioaddr + MAR2);
> +		iowrite16(0xffff, ioaddr + MAR3);
> +	}
> +
> +	/* Use internal multicast address registers
> +	 * if the number of multicast addresses is not greater than MCAST_MAX.
> +	 */
> +	else if (netdev_mc_empty(dev)) {
> +		for (i = 0; i < MCAST_MAX ; i++) {
> +			iowrite16(0, ioaddr + MID_1L + 8 * i);
> +			iowrite16(0, ioaddr + MID_1M + 8 * i);
> +			iowrite16(0, ioaddr + MID_1H + 8 * i);
> +		}
> +	} else if (netdev_mc_count(dev) <= MCAST_MAX) {
> +		i = 0;
> +		netdev_for_each_mc_addr(ha, dev) {
> +			adrp = (u16 *) ha->addr;
> +			iowrite16(adrp[0], ioaddr + MID_1L + 8 * i);
> +			iowrite16(adrp[1], ioaddr + MID_1M + 8 * i);
> +			iowrite16(adrp[2], ioaddr + MID_1H + 8 * i);
> +			i++;
> +		}

What about the unused exact match entries?  And why is the empty case
special?

> +	}
> +	/* Otherwise, Enable multicast hash table function. */
> +	else {
> +		u16 hash_table[4] = { 0, };
>  		u32 crc;
>  
> -		for (i = 0; i < 4; i++)
> -			hash_table[i] = 0;
> +		lp->mcr0 |= 0x0100;
>  
> +		for (i = 0; i < MCAST_MAX ; i++) {
> +			iowrite16(0, ioaddr + MID_1L + 8 * i);
> +			iowrite16(0, ioaddr + MID_1M + 8 * i);
> +			iowrite16(0, ioaddr + MID_1H + 8 * i);
> +		}
> +
> +		/* Build multicast hash table */
>  		netdev_for_each_mc_addr(ha, dev) {
>  			char *addrs = ha->addr;
>  
>  			if (!(*addrs & 1))
>  				continue;
>  
> -			crc = ether_crc_le(6, addrs);
> +			crc = ether_crc(ETH_ALEN, addrs);

You're reversing the order of bits in the CRC, which is not mentioned in
the commit message; are you sure that's right?
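
For reference, a user-space sketch of how the two helpers relate, assuming
(as I recall from linux/crc32.h) that ether_crc_le() is crc32_le() seeded
with ~0 and ether_crc() is the bit-reversal of that value. The *_sketch
names are stand-ins, not the kernel functions.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise little-endian CRC-32 (reflected polynomial 0xEDB88320), with the
 * seed passed in and no final inversion. */
static uint32_t crc32_le_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
	}
	return crc;
}

/* Reverse the order of all 32 bits. */
static uint32_t bitrev32_sketch(uint32_t x)
{
	uint32_t r = 0;

	for (int i = 0; i < 32; i++, x >>= 1)
		r = (r << 1) | (x & 1);
	return r;
}

/* Stand-ins for ether_crc_le() and ether_crc(). */
static uint32_t ether_crc_le_sketch(size_t len, const unsigned char *p)
{
	return crc32_le_sketch(0xFFFFFFFFu, p, len);
}

static uint32_t ether_crc_sketch(size_t len, const unsigned char *p)
{
	return bitrev32_sketch(ether_crc_le_sketch(len, p));
}
```

After the shift by 26, the old code selects the hash bit from the top six
bits of the little-endian CRC, while the new code uses its low six bits in
reversed order, so the two generally land in different buckets.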

>  			crc >>= 26;
> -			hash_table[crc >> 4] |= 1 << (15 - (crc & 0xf));
> +			hash_table[crc >> 4] |= 1 << (crc & 0xf);
>  		}
> +
>  		/* Fill the MAC hash tables with their values */
>  		iowrite16(hash_table[0], ioaddr + MAR0);
>  		iowrite16(hash_table[1], ioaddr + MAR1);
>  		iowrite16(hash_table[2], ioaddr + MAR2);
>  		iowrite16(hash_table[3], ioaddr + MAR3);
>  	}
> -	/* Multicast Address 1~4 case */
> -	i = 0;
> -	netdev_for_each_mc_addr(ha, dev) {
> -		if (i < MCAST_MAX) {
> -			adrp = (u16 *) ha->addr;
> -			iowrite16(adrp[0], ioaddr + MID_1L + 8 * i);
> -			iowrite16(adrp[1], ioaddr + MID_1M + 8 * i);
> -			iowrite16(adrp[2], ioaddr + MID_1H + 8 * i);
> -		} else {
> -			iowrite16(0xffff, ioaddr + MID_1L + 8 * i);
> -			iowrite16(0xffff, ioaddr + MID_1M + 8 * i);
> -			iowrite16(0xffff, ioaddr + MID_1H + 8 * i);
> -		}

This conflicts with my patch in
<http://article.gmane.org/gmane.linux.network/174926> which Dave has
already applied (but not pushed out).

Ben.

> -		i++;
> -	}
> +	iowrite16(lp->mcr0, ioaddr);
> +
> +	spin_unlock_irqrestore(&lp->lock, flags);
>  }
>  
>  static void netdev_get_drvinfo(struct net_device *dev,

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

^ permalink raw reply

* Re: [PATCH v2 00/14] Move vlan acceleration into networking core.
From: David Dillow @ 2010-10-21  2:02 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev
In-Reply-To: <1287618974-4714-1-git-send-email-jesse@nicira.com>

On Wed, 2010-10-20 at 16:56 -0700, Jesse Gross wrote:
> The first eleven patches can be applied immediately, while the last three need
> to wait until all drivers that support vlan acceleration are updated.  If
> people agree that this patch set makes sense I will go ahead and switch over
> the dozen or so drivers that would need to change.

Here's a first pass at converting typhoon to the new methods. It is
compile tested, but I have to put the hardware back in a machine to do
some testing, which I may not be able to do before the weekend.

Of course, we could just change it to not offload by default if we need
to push this sooner.

Applies to net-next-2.6 with the vlan changes.
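
The constraint that typhoon_set_flags() enforces in the patch below can be
stated on its own as follows. This is a stand-alone sketch: the helper name
is made up, and the flag values are the ones from the ethtool patch in this
series.

```c
#include <stdint.h>

#define ETH_FLAG_TXVLAN (1u << 7)	/* TX VLAN offload enabled */
#define ETH_FLAG_RXVLAN (1u << 8)	/* RX VLAN offload enabled */

/* Returns 1 if the requested combination is one the 3XP can honour:
 * TX offload is only allowed together with RX offload. */
static int typhoon_vlan_flags_ok(uint32_t data)
{
	return !((data & ETH_FLAG_TXVLAN) && !(data & ETH_FLAG_RXVLAN));
}
```

So "rx only" and "rx + tx" are accepted, and "tx only" is rejected with
-EINVAL.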



diff --git a/drivers/net/typhoon.c b/drivers/net/typhoon.c
index 1cc6713..5ee0324 100644
--- a/drivers/net/typhoon.c
+++ b/drivers/net/typhoon.c
@@ -280,8 +280,6 @@ struct typhoon {
 	struct pci_dev *	pdev;
 	struct net_device *	dev;
 	struct napi_struct	napi;
-	spinlock_t		state_lock;
-	struct vlan_group *	vlgrp;
 	struct basic_ring	rxHiRing;
 	struct basic_ring	rxBuffRing;
 	struct rxbuff_ent	rxbuffers[RXENT_ENTRIES];
@@ -695,42 +693,39 @@ out:
 	return err;
 }
 
-static void
-typhoon_vlan_rx_register(struct net_device *dev, struct vlan_group *grp)
+static int
+typhoon_offload_vlan(struct net_device *dev, bool enabled)
 {
 	struct typhoon *tp = netdev_priv(dev);
+	__le32 offload = tp->offload;
 	struct cmd_desc xp_cmd;
 	int err;
 
-	spin_lock_bh(&tp->state_lock);
-	if(!tp->vlgrp != !grp) {
-		/* We've either been turned on for the first time, or we've
-		 * been turned off. Update the 3XP.
-		 */
-		if(grp)
-			tp->offload |= TYPHOON_OFFLOAD_VLAN;
-		else
-			tp->offload &= ~TYPHOON_OFFLOAD_VLAN;
-
-		/* If the interface is up, the runtime is running -- and we
-		 * must be up for the vlan core to call us.
-		 *
-		 * Do the command outside of the spin lock, as it is slow.
-		 */
-		INIT_COMMAND_WITH_RESPONSE(&xp_cmd,
-					TYPHOON_CMD_SET_OFFLOAD_TASKS);
-		xp_cmd.parm2 = tp->offload;
-		xp_cmd.parm3 = tp->offload;
-		spin_unlock_bh(&tp->state_lock);
-		err = typhoon_issue_command(tp, 1, &xp_cmd, 0, NULL);
-		if(err < 0)
-			netdev_err(tp->dev, "vlan offload error %d\n", -err);
-		spin_lock_bh(&tp->state_lock);
-	}
-
-	/* now make the change visible */
-	tp->vlgrp = grp;
-	spin_unlock_bh(&tp->state_lock);
+	if (enabled)
+		offload |= TYPHOON_OFFLOAD_VLAN;
+	else
+		offload &= ~TYPHOON_OFFLOAD_VLAN;
+
+	if (offload == tp->offload)
+		return 0;
+
+	/* We've either been turned on for the first time, or we've
+	 * been turned off. Save the setting, and update the 3XP if the
+	 * runtime is active.
+	 *
+	 * Caller must hold the RTNL lock.
+	 */
+	tp->offload = offload;
+	if (!netif_running(dev))
+		return 0;
+
+	INIT_COMMAND_WITH_RESPONSE(&xp_cmd, TYPHOON_CMD_SET_OFFLOAD_TASKS);
+	xp_cmd.parm2 = tp->offload;
+	xp_cmd.parm3 = tp->offload;
+	err = typhoon_issue_command(tp, 1, &xp_cmd, 0, NULL);
+	if(err < 0)
+		netdev_err(tp->dev, "vlan offload error %d\n", -err);
+	return err;
 }
 
 static inline void
@@ -1198,6 +1193,30 @@ typhoon_get_rx_csum(struct net_device *dev)
 	return 1;
 }
 
+static int
+typhoon_set_flags(struct net_device *dev, u32 data)
+{
+	u32 orig_flags = dev->features;
+	int rc;
+
+	/* VLAN offloading is a package deal on the 3XP -- if enabled,
+	 * we'll always have RX offload active, but we can choose to
+	 * not use the TX offload.
+	 */
+	if ((data & ETH_FLAG_TXVLAN) && !(data & ETH_FLAG_RXVLAN))
+		return -EINVAL;
+
+	rc = ethtool_op_set_flags(dev, data, ETH_FLAG_RXVLAN | ETH_FLAG_TXVLAN);
+	if (rc)
+		return rc;
+
+	data &= ETH_FLAG_RXVLAN | ETH_FLAG_TXVLAN;
+	rc = typhoon_offload_vlan(dev, data);
+	if (rc)
+		dev->features = orig_flags;
+	return rc;
+}
+
 static void
 typhoon_get_ringparam(struct net_device *dev, struct ethtool_ringparam *ering)
 {
@@ -1224,6 +1243,8 @@ static const struct ethtool_ops typhoon_ethtool_ops = {
 	.set_sg			= ethtool_op_set_sg,
 	.set_tso		= ethtool_op_set_tso,
 	.get_ringparam		= typhoon_get_ringparam,
+	.set_flags		= typhoon_set_flags,
+	.get_flags		= ethtool_op_get_flags,
 };
 
 static int
@@ -1309,9 +1330,9 @@ typhoon_init_interface(struct typhoon *tp)
 
 	tp->offload = TYPHOON_OFFLOAD_IP_CHKSUM | TYPHOON_OFFLOAD_TCP_CHKSUM;
 	tp->offload |= TYPHOON_OFFLOAD_UDP_CHKSUM | TSO_OFFLOAD_ON;
+	tp->offload |= TYPHOON_OFFLOAD_VLAN;
 
 	spin_lock_init(&tp->command_lock);
-	spin_lock_init(&tp->state_lock);
 
 	/* Force the writes to the shared memory area out before continuing. */
 	wmb();
@@ -1762,13 +1783,10 @@ typhoon_rx(struct typhoon *tp, struct basic_ring *rxRing, volatile __le32 * read
 		} else
 			skb_checksum_none_assert(new_skb);
 
-		spin_lock(&tp->state_lock);
-		if(tp->vlgrp != NULL && rx->rxStatus & TYPHOON_RX_VLAN)
-			vlan_hwaccel_receive_skb(new_skb, tp->vlgrp,
-						 ntohl(rx->vlanTag) & 0xffff);
-		else
-			netif_receive_skb(new_skb);
-		spin_unlock(&tp->state_lock);
+		if (rx->rxStatus & TYPHOON_RX_VLAN)
+			__vlan_hwaccel_put_tag(new_skb,
+					       ntohl(rx->vlanTag) & 0xffff);
+		netif_receive_skb(new_skb);
 
 		received++;
 		budget--;
@@ -1989,11 +2007,9 @@ typhoon_start_runtime(struct typhoon *tp)
 		goto error_out;
 
 	INIT_COMMAND_NO_RESPONSE(&xp_cmd, TYPHOON_CMD_SET_OFFLOAD_TASKS);
-	spin_lock_bh(&tp->state_lock);
 	xp_cmd.parm2 = tp->offload;
 	xp_cmd.parm3 = tp->offload;
 	err = typhoon_issue_command(tp, 1, &xp_cmd, 0, NULL);
-	spin_unlock_bh(&tp->state_lock);
 	if(err < 0)
 		goto error_out;
 
@@ -2231,13 +2247,9 @@ typhoon_suspend(struct pci_dev *pdev, pm_message_t state)
 	if(!netif_running(dev))
 		return 0;
 
-	spin_lock_bh(&tp->state_lock);
-	if(tp->vlgrp && tp->wol_events & TYPHOON_WAKE_MAGIC_PKT) {
-		spin_unlock_bh(&tp->state_lock);
-		netdev_err(dev, "cannot do WAKE_MAGIC with VLANS\n");
-		return -EBUSY;
-	}
-	spin_unlock_bh(&tp->state_lock);
+	if(tp->offload & TYPHOON_OFFLOAD_VLAN &&
+				tp->wol_events & TYPHOON_WAKE_MAGIC_PKT)
+		netdev_warn(dev, "WAKE_MAGIC does not work with VLANS\n");
 
 	netif_device_detach(dev);
 
@@ -2338,7 +2350,6 @@ static const struct net_device_ops typhoon_netdev_ops = {
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_mac_address	= typhoon_set_mac_address,
 	.ndo_change_mtu		= eth_change_mtu,
-	.ndo_vlan_rx_register	= typhoon_vlan_rx_register,
 };
 
 static int __devinit



^ permalink raw reply related

* [PATCH v2 14/14] vlan: Remove accleration legacy functions.
From: Jesse Gross @ 2010-10-20 23:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1287618974-4714-1-git-send-email-jesse@nicira.com>

This removes the explicit vlan acceleration functions that acted
as shims in favor of the main receive functions that can now
handle vlans.

Signed-off-by: Jesse Gross <jesse@nicira.com>
--
This patch can only be applied once all drivers that use vlan acceleration
have been converted over to the new model.
---
 include/linux/if_vlan.h   |   63 +++------------------------------------------
 include/linux/netdevice.h |    8 -----
 net/8021q/vlan.c          |    7 +----
 net/8021q/vlan_core.c     |   25 ------------------
 4 files changed, 5 insertions(+), 98 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index a0d9786..e607256 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -74,6 +74,10 @@ static inline struct vlan_ethhdr *vlan_eth_hdr(const struct sk_buff *skb)
 /* found in socket.c */
 extern void vlan_ioctl_set(int (*hook)(struct net *, void __user *));
 
+#define vlan_tx_tag_present(__skb)	((__skb)->vlan_tci & VLAN_TAG_PRESENT)
+#define vlan_tx_tag_get(__skb)		((__skb)->vlan_tci & ~VLAN_TAG_PRESENT)
+
+#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
 /* if this changes, algorithm will have to be reworked because this
  * depends on completely exhausting the VLAN identifier space.  Thus
  * it gives constant time look-up, but in many cases it wastes memory.
@@ -111,10 +115,6 @@ static inline void vlan_group_set_device(struct vlan_group *vg,
 	array[vlan_id % VLAN_GROUP_ARRAY_PART_LEN] = dev;
 }
 
-#define vlan_tx_tag_present(__skb)	((__skb)->vlan_tci & VLAN_TAG_PRESENT)
-#define vlan_tx_tag_get(__skb)		((__skb)->vlan_tci & ~VLAN_TAG_PRESENT)
-
-#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
 /* Must be invoked with rcu_read_lock or with RTNL. */
 static inline struct net_device *vlan_find_dev(struct net_device *real_dev,
 					       u16 vlan_id)
@@ -130,15 +130,7 @@ static inline struct net_device *vlan_find_dev(struct net_device *real_dev,
 extern struct net_device *vlan_dev_real_dev(const struct net_device *dev);
 extern u16 vlan_dev_vlan_id(const struct net_device *dev);
 
-extern int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
-			     u16 vlan_tci, int polling);
 extern bool vlan_hwaccel_do_receive(struct sk_buff **skb);
-extern gro_result_t
-vlan_gro_receive(struct napi_struct *napi, struct vlan_group *grp,
-		 unsigned int vlan_tci, struct sk_buff *skb);
-extern gro_result_t
-vlan_gro_frags(struct napi_struct *napi, struct vlan_group *grp,
-	       unsigned int vlan_tci);
 
 #else
 static inline struct net_device *vlan_find_dev(struct net_device *real_dev,
@@ -159,61 +151,14 @@ static inline u16 vlan_dev_vlan_id(const struct net_device *dev)
 	return 0;
 }
 
-static inline int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
-				    u16 vlan_tci, int polling)
-{
-	BUG();
-	return NET_XMIT_SUCCESS;
-}
-
 static inline bool vlan_hwaccel_do_receive(struct sk_buff **skb)
 {
 	BUG();
 	return false;
 }
-
-static inline gro_result_t
-vlan_gro_receive(struct napi_struct *napi, struct vlan_group *grp,
-		 unsigned int vlan_tci, struct sk_buff *skb)
-{
-	return GRO_DROP;
-}
-
-static inline gro_result_t
-vlan_gro_frags(struct napi_struct *napi, struct vlan_group *grp,
-	       unsigned int vlan_tci)
-{
-	return GRO_DROP;
-}
 #endif
 
 /**
- * vlan_hwaccel_rx - netif_rx wrapper for VLAN RX acceleration
- * @skb: buffer
- * @grp: vlan group
- * @vlan_tci: VLAN TCI as received from the card
- */
-static inline int vlan_hwaccel_rx(struct sk_buff *skb,
-				  struct vlan_group *grp,
-				  u16 vlan_tci)
-{
-	return __vlan_hwaccel_rx(skb, grp, vlan_tci, 0);
-}
-
-/**
- * vlan_hwaccel_receive_skb - netif_receive_skb wrapper for VLAN RX acceleration
- * @skb: buffer
- * @grp: vlan group
- * @vlan_tci: VLAN TCI as received from the card
- */
-static inline int vlan_hwaccel_receive_skb(struct sk_buff *skb,
-					   struct vlan_group *grp,
-					   u16 vlan_tci)
-{
-	return __vlan_hwaccel_rx(skb, grp, vlan_tci, 1);
-}
-
-/**
  * __vlan_put_tag - regular VLAN tag inserting
  * @skb: skbuff to tag
  * @vlan_tci: VLAN TCI to insert
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ed7db7e..22c3c46 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -682,12 +682,6 @@ struct netdev_rx_queue {
  *	3. Update dev->stats asynchronously and atomically, and define
  *	   neither operation.
  *
- * void (*ndo_vlan_rx_register)(struct net_device *dev, struct vlan_group *grp);
- *	If device support VLAN receive accleration
- *	(ie. dev->features & NETIF_F_HW_VLAN_RX), then this function is called
- *	when vlan groups for the device changes.  Note: grp is NULL
- *	if no vlan's groups are being used.
- *
  * void (*ndo_vlan_rx_add_vid)(struct net_device *dev, unsigned short vid);
  *	If device support VLAN filtering (dev->features & NETIF_F_HW_VLAN_FILTER)
  *	this function is called when a VLAN id is registered.
@@ -739,8 +733,6 @@ struct net_device_ops {
 						     struct rtnl_link_stats64 *storage);
 	struct net_device_stats* (*ndo_get_stats)(struct net_device *dev);
 
-	void			(*ndo_vlan_rx_register)(struct net_device *dev,
-						        struct vlan_group *grp);
 	void			(*ndo_vlan_rx_add_vid)(struct net_device *dev,
 						       unsigned short vid);
 	void			(*ndo_vlan_rx_kill_vid)(struct net_device *dev,
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 05b867e..4e91db3 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -135,8 +135,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
 		vlan_gvrp_uninit_applicant(real_dev);
 
 		rcu_assign_pointer(real_dev->vlgrp, NULL);
-		if (ops->ndo_vlan_rx_register)
-			ops->ndo_vlan_rx_register(real_dev, NULL);
 
 		/* Free the group, after all cpu's are done. */
 		call_rcu(&grp->rcu, vlan_rcu_free);
@@ -207,11 +205,8 @@ int register_vlan_dev(struct net_device *dev)
 	vlan_group_set_device(grp, vlan_id, dev);
 	grp->nr_vlans++;
 
-	if (ngrp) {
-		if (ops->ndo_vlan_rx_register)
-			ops->ndo_vlan_rx_register(real_dev, ngrp);
+	if (ngrp)
 		rcu_assign_pointer(real_dev->vlgrp, ngrp);
-	}
 	if (real_dev->features & NETIF_F_HW_VLAN_FILTER)
 		ops->ndo_vlan_rx_add_vid(real_dev, vlan_id);
 
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 69b2f79..4d6a2b8 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -63,28 +63,3 @@ u16 vlan_dev_vlan_id(const struct net_device *dev)
 	return vlan_dev_info(dev)->vlan_id;
 }
 EXPORT_SYMBOL(vlan_dev_vlan_id);
-
-/* VLAN rx hw acceleration helper.  This acts like netif_{rx,receive_skb}(). */
-int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
-		      u16 vlan_tci, int polling)
-{
-	__vlan_hwaccel_put_tag(skb, vlan_tci);
-	return polling ? netif_receive_skb(skb) : netif_rx(skb);
-}
-EXPORT_SYMBOL(__vlan_hwaccel_rx);
-
-gro_result_t vlan_gro_receive(struct napi_struct *napi, struct vlan_group *grp,
-			      unsigned int vlan_tci, struct sk_buff *skb)
-{
-	__vlan_hwaccel_put_tag(skb, vlan_tci);
-	return napi_gro_receive(napi, skb);
-}
-EXPORT_SYMBOL(vlan_gro_receive);
-
-gro_result_t vlan_gro_frags(struct napi_struct *napi, struct vlan_group *grp,
-			    unsigned int vlan_tci)
-{
-	__vlan_hwaccel_put_tag(napi->skb, vlan_tci);
-	return napi_gro_frags(napi);
-}
-EXPORT_SYMBOL(vlan_gro_frags);
-- 
1.7.1


^ permalink raw reply related

* [PATCH v2 13/14] bonding: Update bonding for new vlan model.
From: Jesse Gross @ 2010-10-20 23:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1287618974-4714-1-git-send-email-jesse@nicira.com>

It is no longer necessary to register vlan groups, so update bonding
to not do that on its slaves.  Although the new vlan acceleration
model allows additional flexibility, bonding continues to require vlan
devices since it needs additional system state to handle ARP/IGMP.  This
also removes fallback code for non-vlan accelerated slaves since core
networking now handles that.
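
As a rough illustration of the main pattern in this patch, the sketch below
replaces a "does the bond have a vlan group pointer" test with "is the
bond's vlan_list non-empty". It is userspace C with a tiny hand-rolled list
modelled on the kernel's struct list_head; all sketch_* names and the
reduced struct layouts are hypothetical, and only the list_empty() idea and
the -EPERM result in the enslave path come from the diff.

```c
#include <assert.h>
#include <errno.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

void sketch_list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

int sketch_list_empty(const struct list_head *h)
{
	return h->next == h;
}

void sketch_list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

void sketch_list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n;
	n->prev = n;
}

/* Reduced versions of the bonding structures; the vlgrp member is gone. */
struct vlan_entry {
	struct list_head vlan_list;
	unsigned short vlan_id;
};

struct bonding {
	struct list_head vlan_list;
};

/* Replaces the old "bond->vlgrp != NULL" test. */
int sketch_bond_has_vlans(const struct bonding *bond)
{
	return !sketch_list_empty(&bond->vlan_list);
}

/* Same decision as the bond_enslave() hunk: a VLAN-challenged slave is
 * refused only while the bond has vlans configured. */
int sketch_can_enslave(const struct bonding *bond, int slave_vlan_challenged)
{
	if (slave_vlan_challenged && sketch_bond_has_vlans(bond))
		return -EPERM;
	return 0;
}

void sketch_demo(void)
{
	struct bonding bond;
	struct vlan_entry v = { .vlan_id = 10 };

	sketch_list_init(&bond.vlan_list);
	assert(sketch_can_enslave(&bond, 1) == 0);

	sketch_list_add_tail(&v.vlan_list, &bond.vlan_list);
	assert(sketch_can_enslave(&bond, 1) == -EPERM);

	sketch_list_del(&v.vlan_list);
	assert(!sketch_bond_has_vlans(&bond));
}
```

The list is state that bonding already maintains itself, which is why the
patch can drop the separately registered group pointer.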

Signed-off-by: Jesse Gross <jesse@nicira.com>
--
This patch can only be applied once all drivers that use vlan acceleration
have been converted over to the new model.
---
 drivers/net/bonding/bond_alb.c  |    8 +--
 drivers/net/bonding/bond_ipv6.c |    5 +-
 drivers/net/bonding/bond_main.c |  143 ++++++++-------------------------------
 drivers/net/bonding/bonding.h   |    1 -
 4 files changed, 32 insertions(+), 125 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 26bb118..c911456 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -685,10 +685,8 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
 			client_info->ntt = 0;
 		}
 
-		if (bond->vlgrp) {
-			if (!vlan_get_tag(skb, &client_info->vlan_id))
-				client_info->tag = 1;
-		}
+		if (!vlan_get_tag(skb, &client_info->vlan_id))
+			client_info->tag = 1;
 
 		if (!client_info->assigned) {
 			u32 prev_tbl_head = bond_info->rx_hashtbl_head;
@@ -907,7 +905,7 @@ static void alb_send_learning_packets(struct slave *slave, u8 mac_addr[])
 		skb->priority = TC_PRIO_CONTROL;
 		skb->dev = slave->dev;
 
-		if (bond->vlgrp) {
+		if (!list_empty(&bond->vlan_list)) {
 			struct vlan_entry *vlan;
 
 			vlan = bond_next_vlan(bond,
diff --git a/drivers/net/bonding/bond_ipv6.c b/drivers/net/bonding/bond_ipv6.c
index 121b073..c276b5a 100644
--- a/drivers/net/bonding/bond_ipv6.c
+++ b/drivers/net/bonding/bond_ipv6.c
@@ -178,10 +178,7 @@ static int bond_inet6addr_event(struct notifier_block *this,
 		}
 
 		list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
-			if (!bond->vlgrp)
-				continue;
-			vlan_dev = vlan_group_get_device(bond->vlgrp,
-							 vlan->vlan_id);
+			vlan_dev = vlan_find_dev(bond->dev, vlan->vlan_id);
 			if (vlan_dev == event_dev) {
 				switch (event) {
 				case NETDEV_UP:
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 6b9a7bd..1b89b61 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -418,36 +418,11 @@ struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr)
  * @bond: bond device that got this skb for tx.
  * @skb: hw accel VLAN tagged skb to transmit
  * @slave_dev: slave that is supposed to xmit this skbuff
- *
- * When the bond gets an skb to transmit that is
- * already hardware accelerated VLAN tagged, and it
- * needs to relay this skb to a slave that is not
- * hw accel capable, the skb needs to be "unaccelerated",
- * i.e. strip the hwaccel tag and re-insert it as part
- * of the payload.
  */
 int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
 			struct net_device *slave_dev)
 {
-	unsigned short uninitialized_var(vlan_id);
-
-	/* Test vlan_list not vlgrp to catch and handle 802.1p tags */
-	if (!list_empty(&bond->vlan_list) &&
-	    !(slave_dev->features & NETIF_F_HW_VLAN_TX) &&
-	    vlan_get_tag(skb, &vlan_id) == 0) {
-		skb->dev = slave_dev;
-		skb = vlan_put_tag(skb, vlan_id);
-		if (!skb) {
-			/* vlan_put_tag() frees the skb in case of error,
-			 * so return success here so the calling functions
-			 * won't attempt to free is again.
-			 */
-			return 0;
-		}
-	} else {
-		skb->dev = slave_dev;
-	}
-
+	skb->dev = slave_dev;
 	skb->priority = 1;
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	if (unlikely(bond->dev->priv_flags & IFF_IN_NETPOLL)) {
@@ -464,8 +439,8 @@ int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
 }
 
 /*
- * In the following 3 functions, bond_vlan_rx_register(), bond_vlan_rx_add_vid
- * and bond_vlan_rx_kill_vid, We don't protect the slave list iteration with a
+ * In the following 2 functions, bond_vlan_rx_add_vid and
+ * bond_vlan_rx_kill_vid, we don't protect the slave list iteration with a
  * lock because:
  * a. This operation is performed in IOCTL context,
  * b. The operation is protected by the RTNL semaphore in the 8021q code,
@@ -482,33 +457,6 @@ int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
 */
 
 /**
- * bond_vlan_rx_register - Propagates registration to slaves
- * @bond_dev: bonding net device that got called
- * @grp: vlan group being registered
- */
-static void bond_vlan_rx_register(struct net_device *bond_dev,
-				  struct vlan_group *grp)
-{
-	struct bonding *bond = netdev_priv(bond_dev);
-	struct slave *slave;
-	int i;
-
-	write_lock(&bond->lock);
-	bond->vlgrp = grp;
-	write_unlock(&bond->lock);
-
-	bond_for_each_slave(bond, slave, i) {
-		struct net_device *slave_dev = slave->dev;
-		const struct net_device_ops *slave_ops = slave_dev->netdev_ops;
-
-		if ((slave_dev->features & NETIF_F_HW_VLAN_RX) &&
-		    slave_ops->ndo_vlan_rx_register) {
-			slave_ops->ndo_vlan_rx_register(slave_dev, grp);
-		}
-	}
-}
-
-/**
  * bond_vlan_rx_add_vid - Propagates adding an id to slaves
  * @bond_dev: bonding net device that got called
  * @vid: vlan id being added
@@ -545,7 +493,6 @@ static void bond_vlan_rx_kill_vid(struct net_device *bond_dev, uint16_t vid)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct slave *slave;
-	struct net_device *vlan_dev;
 	int i, res;
 
 	bond_for_each_slave(bond, slave, i) {
@@ -553,14 +500,8 @@ static void bond_vlan_rx_kill_vid(struct net_device *bond_dev, uint16_t vid)
 		const struct net_device_ops *slave_ops = slave_dev->netdev_ops;
 
 		if ((slave_dev->features & NETIF_F_HW_VLAN_FILTER) &&
-		    slave_ops->ndo_vlan_rx_kill_vid) {
-			/* Save and then restore vlan_dev in the grp array,
-			 * since the slave's driver might clear it.
-			 */
-			vlan_dev = vlan_group_get_device(bond->vlgrp, vid);
+		    slave_ops->ndo_vlan_rx_kill_vid)
 			slave_ops->ndo_vlan_rx_kill_vid(slave_dev, vid);
-			vlan_group_set_device(bond->vlgrp, vid, vlan_dev);
-		}
 	}
 
 	res = bond_del_vlan(bond, vid);
@@ -575,13 +516,6 @@ static void bond_add_vlans_on_slave(struct bonding *bond, struct net_device *sla
 	struct vlan_entry *vlan;
 	const struct net_device_ops *slave_ops = slave_dev->netdev_ops;
 
-	if (!bond->vlgrp)
-		return;
-
-	if ((slave_dev->features & NETIF_F_HW_VLAN_RX) &&
-	    slave_ops->ndo_vlan_rx_register)
-		slave_ops->ndo_vlan_rx_register(slave_dev, bond->vlgrp);
-
 	if (!(slave_dev->features & NETIF_F_HW_VLAN_FILTER) ||
 	    !(slave_ops->ndo_vlan_rx_add_vid))
 		return;
@@ -595,30 +529,17 @@ static void bond_del_vlans_from_slave(struct bonding *bond,
 {
 	const struct net_device_ops *slave_ops = slave_dev->netdev_ops;
 	struct vlan_entry *vlan;
-	struct net_device *vlan_dev;
-
-	if (!bond->vlgrp)
-		return;
 
 	if (!(slave_dev->features & NETIF_F_HW_VLAN_FILTER) ||
 	    !(slave_ops->ndo_vlan_rx_kill_vid))
-		goto unreg;
+		return;
 
 	list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
 		if (!vlan->vlan_id)
 			continue;
-		/* Save and then restore vlan_dev in the grp array,
-		 * since the slave's driver might clear it.
-		 */
-		vlan_dev = vlan_group_get_device(bond->vlgrp, vlan->vlan_id);
+
 		slave_ops->ndo_vlan_rx_kill_vid(slave_dev, vlan->vlan_id);
-		vlan_group_set_device(bond->vlgrp, vlan->vlan_id, vlan_dev);
 	}
-
-unreg:
-	if ((slave_dev->features & NETIF_F_HW_VLAN_RX) &&
-	    slave_ops->ndo_vlan_rx_register)
-		slave_ops->ndo_vlan_rx_register(slave_dev, NULL);
 }
 
 /*------------------------------- Link status -------------------------------*/
@@ -896,23 +817,22 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
 	struct vlan_entry *vlan;
 
 	read_lock(&bond->lock);
+	rcu_read_lock();
 
 	/* rejoin all groups on bond device */
 	__bond_resend_igmp_join_requests(bond->dev);
 
 	/* rejoin all groups on vlan devices */
-	if (bond->vlgrp) {
-		list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
-			vlan_dev = vlan_group_get_device(bond->vlgrp,
-							 vlan->vlan_id);
-			if (vlan_dev)
-				__bond_resend_igmp_join_requests(vlan_dev);
-		}
+	list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
+		vlan_dev = vlan_find_dev(bond->dev, vlan->vlan_id);
+		if (vlan_dev)
+			__bond_resend_igmp_join_requests(vlan_dev);
 	}
 
 	if (--bond->igmp_retrans > 0)
 		queue_delayed_work(bond->wq, &bond->mcast_work, HZ/5);
 
+	rcu_read_unlock();
 	read_unlock(&bond->lock);
 }
 
@@ -1386,8 +1306,7 @@ static int bond_sethwaddr(struct net_device *bond_dev,
 }
 
 #define BOND_VLAN_FEATURES \
-	(NETIF_F_VLAN_CHALLENGED | NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX | \
-	 NETIF_F_HW_VLAN_FILTER)
+	(NETIF_F_VLAN_CHALLENGED | NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_FILTER)
 
 /*
  * Compute the common dev->feature set available to all slaves.  Some
@@ -1483,7 +1402,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	/* no need to lock since we're protected by rtnl_lock */
 	if (slave_dev->features & NETIF_F_VLAN_CHALLENGED) {
 		pr_debug("%s: NETIF_F_VLAN_CHALLENGED\n", slave_dev->name);
-		if (bond->vlgrp) {
+		if (!list_empty(&bond->vlan_list)) {
 			pr_err("%s: Error: cannot enslave VLAN challenged slave %s on VLAN enabled bond %s\n",
 			       bond_dev->name, slave_dev->name, bond_dev->name);
 			return -EPERM;
@@ -1976,9 +1895,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 		 */
 		memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
 
-		if (!bond->vlgrp) {
-			bond_dev->features |= NETIF_F_VLAN_CHALLENGED;
-		} else {
+		if (!list_empty(&bond->vlan_list)) {
 			pr_warning("%s: Warning: clearing HW address of %s while it still has VLANs.\n",
 				   bond_dev->name, bond_dev->name);
 			pr_warning("%s: When re-adding slaves, make sure the bond's HW address matches its VLANs'.\n",
@@ -2167,9 +2084,7 @@ static int bond_release_all(struct net_device *bond_dev)
 	 */
 	memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
 
-	if (!bond->vlgrp) {
-		bond_dev->features |= NETIF_F_VLAN_CHALLENGED;
-	} else {
+	if (!list_empty(&bond->vlan_list)) {
 		pr_warning("%s: Warning: clearing HW address of %s while it still has VLANs.\n",
 			   bond_dev->name, bond_dev->name);
 		pr_warning("%s: When re-adding slaves, make sure the bond's HW address matches its VLANs'.\n",
@@ -2604,11 +2519,13 @@ static void bond_arp_send_all(struct bonding *bond, struct slave *slave)
 	struct flowi fl;
 	struct rtable *rt;
 
+	rcu_read_lock();
+
 	for (i = 0; (i < BOND_MAX_ARP_TARGETS); i++) {
 		if (!targets[i])
 			break;
 		pr_debug("basa: target %x\n", targets[i]);
-		if (!bond->vlgrp) {
+		if (list_empty(&bond->vlan_list)) {
 			pr_debug("basa: empty vlan: arp_send\n");
 			bond_arp_send(slave->dev, ARPOP_REQUEST, targets[i],
 				      bond->master_ip, 0);
@@ -2646,7 +2563,7 @@ static void bond_arp_send_all(struct bonding *bond, struct slave *slave)
 
 		vlan_id = 0;
 		list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
-			vlan_dev = vlan_group_get_device(bond->vlgrp, vlan->vlan_id);
+			vlan_dev = vlan_find_dev(bond->dev, vlan->vlan_id);
 			if (vlan_dev == rt->dst.dev) {
 				vlan_id = vlan->vlan_id;
 				pr_debug("basa: vlan match on %s %d\n",
@@ -2669,6 +2586,8 @@ static void bond_arp_send_all(struct bonding *bond, struct slave *slave)
 		}
 		ip_rt_put(rt);
 	}
+
+	rcu_read_unlock();
 }
 
 /*
@@ -2697,16 +2616,17 @@ static void bond_send_gratuitous_arp(struct bonding *bond)
 				bond->master_ip, 0);
 	}
 
-	if (!bond->vlgrp)
-		return;
+	rcu_read_lock();
 
 	list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
-		vlan_dev = vlan_group_get_device(bond->vlgrp, vlan->vlan_id);
+		vlan_dev = vlan_find_dev(bond->dev, vlan->vlan_id);
 		if (vlan->vlan_ip) {
 			bond_arp_send(slave->dev, ARPOP_REPLY, vlan->vlan_ip,
 				      vlan->vlan_ip, vlan->vlan_id);
 		}
 	}
+
+	rcu_read_unlock();
 }
 
 static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32 sip, __be32 tip)
@@ -3660,9 +3580,7 @@ static int bond_inetaddr_event(struct notifier_block *this, unsigned long event,
 		}
 
 		list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
-			if (!bond->vlgrp)
-				continue;
-			vlan_dev = vlan_group_get_device(bond->vlgrp, vlan->vlan_id);
+			vlan_dev = vlan_find_dev(bond->dev, vlan->vlan_id);
 			if (vlan_dev == event_dev) {
 				switch (event) {
 				case NETDEV_UP:
@@ -4670,7 +4588,6 @@ static const struct net_device_ops bond_netdev_ops = {
 	.ndo_change_mtu		= bond_change_mtu,
 	.ndo_set_mac_address 	= bond_set_mac_address,
 	.ndo_neigh_setup	= bond_neigh_setup,
-	.ndo_vlan_rx_register	= bond_vlan_rx_register,
 	.ndo_vlan_rx_add_vid 	= bond_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid	= bond_vlan_rx_kill_vid,
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -4730,13 +4647,9 @@ static void bond_setup(struct net_device *bond_dev)
 	bond_dev->features |= NETIF_F_LLTX;
 
 	/* By default, we declare the bond to be fully
-	 * VLAN hardware accelerated capable. Special
-	 * care is taken in the various xmit functions
-	 * when there are slaves that are not hw accel
-	 * capable
+	 * VLAN hardware accelerated capable.
 	 */
 	bond_dev->features |= (NETIF_F_HW_VLAN_TX |
-			       NETIF_F_HW_VLAN_RX |
 			       NETIF_F_HW_VLAN_FILTER);
 
 	/* By default, we enable GRO on bonding devices.
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 2c12a5f..e9de73a 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -248,7 +248,6 @@ struct bonding {
 	struct   alb_bond_info alb_info;
 	struct   bond_params params;
 	struct   list_head vlan_list;
-	struct   vlan_group *vlgrp;
 	struct   packet_type arp_mon_pt;
 	struct   workqueue_struct *wq;
 	struct   delayed_work mii_work;
-- 
1.7.1


^ permalink raw reply related

* [PATCH v2 12/14] lro: Remove explicit vlan support.
From: Jesse Gross @ 2010-10-20 23:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1287618974-4714-1-git-send-email-jesse@nicira.com>

Using the new vlan acceleration model, LRO no longer needs to be
explicitly passed the vlan information because it is contained in
the skb.  Since all LRO did was pass the vlan through, this removes
that knowledge.
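
A small userspace sketch of the resulting flush path may help. The
descriptor no longer records a vlan group or tag; whatever tag the driver
stored in the parent skb simply travels with it, and the only remaining
decision is NAPI versus non-NAPI delivery. The structs are mocks, the
sketch_* names are hypothetical, and the value of LRO_F_NAPI is an
assumption about include/linux/inet_lro.h.

```c
#include <assert.h>
#include <stdint.h>

#define LRO_F_NAPI 1	/* assumed to match include/linux/inet_lro.h */

enum sketch_rx_path { SKETCH_NETIF_RX, SKETCH_NETIF_RECEIVE_SKB };

/* Mock skb: the tag lives here, not in the LRO descriptor. */
struct sk_buff {
	uint16_t vlan_tci;
};

struct net_lro_mgr {
	unsigned long features;
};

/* Reduced descriptor: no vgrp, vlan_tag or vlan_packet members. */
struct net_lro_desc {
	struct sk_buff *parent;
	int active;
};

/* Picks the delivery function and reports the TCI the stack would see. */
enum sketch_rx_path sketch_lro_flush(const struct net_lro_mgr *mgr,
				     struct net_lro_desc *desc,
				     uint16_t *seen_tci)
{
	*seen_tci = desc->parent->vlan_tci;	/* untouched by LRO */
	desc->active = 0;

	if (mgr->features & LRO_F_NAPI)
		return SKETCH_NETIF_RECEIVE_SKB;
	return SKETCH_NETIF_RX;
}

void sketch_demo(void)
{
	struct sk_buff skb = { .vlan_tci = 0x1064 };
	struct net_lro_desc desc = { .parent = &skb, .active = 1 };
	struct net_lro_mgr napi_mgr = { .features = LRO_F_NAPI };
	uint16_t tci;

	assert(sketch_lro_flush(&napi_mgr, &desc, &tci) == SKETCH_NETIF_RECEIVE_SKB);
	assert(tci == 0x1064);
	assert(desc.active == 0);
}
```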

Signed-off-by: Jesse Gross <jesse@nicira.com>
--
This patch can only be applied once all drivers that use LRO and vlan acceleration
have been converted over to the new model.
---
 include/linux/inet_lro.h |   20 ------------
 net/ipv4/inet_lro.c      |   74 +++++++---------------------------------------
 2 files changed, 11 insertions(+), 83 deletions(-)

diff --git a/include/linux/inet_lro.h b/include/linux/inet_lro.h
index c4335fa..667281a 100644
--- a/include/linux/inet_lro.h
+++ b/include/linux/inet_lro.h
@@ -50,7 +50,6 @@ struct net_lro_desc {
 	struct skb_frag_struct *next_frag;
 	struct iphdr *iph;
 	struct tcphdr *tcph;
-	struct vlan_group *vgrp;
 	__wsum  data_csum;
 	__be32 tcp_rcv_tsecr;
 	__be32 tcp_rcv_tsval;
@@ -60,9 +59,7 @@ struct net_lro_desc {
 	u16 ip_tot_len;
 	u16 tcp_saw_tstamp; 		/* timestamps enabled */
 	__be16 tcp_window;
-	u16 vlan_tag;
 	int pkt_aggr_cnt;		/* counts aggregated packets */
-	int vlan_packet;
 	int mss;
 	int active;
 };
@@ -137,16 +134,6 @@ void lro_receive_skb(struct net_lro_mgr *lro_mgr,
 		     void *priv);
 
 /*
- * Processes a SKB with VLAN HW acceleration support
- */
-
-void lro_vlan_hwaccel_receive_skb(struct net_lro_mgr *lro_mgr,
-				  struct sk_buff *skb,
-				  struct vlan_group *vgrp,
-				  u16 vlan_tag,
-				  void *priv);
-
-/*
  * Processes a fragment list
  *
  * This functions aggregate fragments and generate SKBs do pass
@@ -165,13 +152,6 @@ void lro_receive_frags(struct net_lro_mgr *lro_mgr,
 		       struct skb_frag_struct *frags,
 		       int len, int true_size, void *priv, __wsum sum);
 
-void lro_vlan_hwaccel_receive_frags(struct net_lro_mgr *lro_mgr,
-				    struct skb_frag_struct *frags,
-				    int len, int true_size,
-				    struct vlan_group *vgrp,
-				    u16 vlan_tag,
-				    void *priv, __wsum sum);
-
 /*
  * Forward all aggregated SKBs held by lro_mgr to network stack
  */
diff --git a/net/ipv4/inet_lro.c b/net/ipv4/inet_lro.c
index 47038cb..8945a1d 100644
--- a/net/ipv4/inet_lro.c
+++ b/net/ipv4/inet_lro.c
@@ -146,8 +146,7 @@ static __wsum lro_tcp_data_csum(struct iphdr *iph, struct tcphdr *tcph, int len)
 }
 
 static void lro_init_desc(struct net_lro_desc *lro_desc, struct sk_buff *skb,
-			  struct iphdr *iph, struct tcphdr *tcph,
-			  u16 vlan_tag, struct vlan_group *vgrp)
+			  struct iphdr *iph, struct tcphdr *tcph)
 {
 	int nr_frags;
 	__be32 *ptr;
@@ -173,8 +172,6 @@ static void lro_init_desc(struct net_lro_desc *lro_desc, struct sk_buff *skb,
 	}
 
 	lro_desc->mss = tcp_data_len;
-	lro_desc->vgrp = vgrp;
-	lro_desc->vlan_tag = vlan_tag;
 	lro_desc->active = 1;
 
 	lro_desc->data_csum = lro_tcp_data_csum(iph, tcph,
@@ -309,29 +306,17 @@ static void lro_flush(struct net_lro_mgr *lro_mgr,
 
 	skb_shinfo(lro_desc->parent)->gso_size = lro_desc->mss;
 
-	if (lro_desc->vgrp) {
-		if (lro_mgr->features & LRO_F_NAPI)
-			vlan_hwaccel_receive_skb(lro_desc->parent,
-						 lro_desc->vgrp,
-						 lro_desc->vlan_tag);
-		else
-			vlan_hwaccel_rx(lro_desc->parent,
-					lro_desc->vgrp,
-					lro_desc->vlan_tag);
-
-	} else {
-		if (lro_mgr->features & LRO_F_NAPI)
-			netif_receive_skb(lro_desc->parent);
-		else
-			netif_rx(lro_desc->parent);
-	}
+	if (lro_mgr->features & LRO_F_NAPI)
+		netif_receive_skb(lro_desc->parent);
+	else
+		netif_rx(lro_desc->parent);
 
 	LRO_INC_STATS(lro_mgr, flushed);
 	lro_clear_desc(lro_desc);
 }
 
 static int __lro_proc_skb(struct net_lro_mgr *lro_mgr, struct sk_buff *skb,
-			  struct vlan_group *vgrp, u16 vlan_tag, void *priv)
+			  void *priv)
 {
 	struct net_lro_desc *lro_desc;
 	struct iphdr *iph;
@@ -360,7 +345,7 @@ static int __lro_proc_skb(struct net_lro_mgr *lro_mgr, struct sk_buff *skb,
 			goto out;
 
 		skb->ip_summed = lro_mgr->ip_summed_aggr;
-		lro_init_desc(lro_desc, skb, iph, tcph, vlan_tag, vgrp);
+		lro_init_desc(lro_desc, skb, iph, tcph);
 		LRO_INC_STATS(lro_mgr, aggregated);
 		return 0;
 	}
@@ -433,8 +418,7 @@ static struct sk_buff *lro_gen_skb(struct net_lro_mgr *lro_mgr,
 static struct sk_buff *__lro_proc_segment(struct net_lro_mgr *lro_mgr,
 					  struct skb_frag_struct *frags,
 					  int len, int true_size,
-					  struct vlan_group *vgrp,
-					  u16 vlan_tag, void *priv, __wsum sum)
+					  void *priv, __wsum sum)
 {
 	struct net_lro_desc *lro_desc;
 	struct iphdr *iph;
@@ -480,7 +464,7 @@ static struct sk_buff *__lro_proc_segment(struct net_lro_mgr *lro_mgr,
 		tcph = (void *)((u8 *)skb->data + vlan_hdr_len
 				+ IP_HDR_LEN(iph));
 
-		lro_init_desc(lro_desc, skb, iph, tcph, 0, NULL);
+		lro_init_desc(lro_desc, skb, iph, tcph);
 		LRO_INC_STATS(lro_mgr, aggregated);
 		return NULL;
 	}
@@ -514,7 +498,7 @@ void lro_receive_skb(struct net_lro_mgr *lro_mgr,
 		     struct sk_buff *skb,
 		     void *priv)
 {
-	if (__lro_proc_skb(lro_mgr, skb, NULL, 0, priv)) {
+	if (__lro_proc_skb(lro_mgr, skb, priv)) {
 		if (lro_mgr->features & LRO_F_NAPI)
 			netif_receive_skb(skb);
 		else
@@ -523,29 +507,13 @@ void lro_receive_skb(struct net_lro_mgr *lro_mgr,
 }
 EXPORT_SYMBOL(lro_receive_skb);
 
-void lro_vlan_hwaccel_receive_skb(struct net_lro_mgr *lro_mgr,
-				  struct sk_buff *skb,
-				  struct vlan_group *vgrp,
-				  u16 vlan_tag,
-				  void *priv)
-{
-	if (__lro_proc_skb(lro_mgr, skb, vgrp, vlan_tag, priv)) {
-		if (lro_mgr->features & LRO_F_NAPI)
-			vlan_hwaccel_receive_skb(skb, vgrp, vlan_tag);
-		else
-			vlan_hwaccel_rx(skb, vgrp, vlan_tag);
-	}
-}
-EXPORT_SYMBOL(lro_vlan_hwaccel_receive_skb);
-
 void lro_receive_frags(struct net_lro_mgr *lro_mgr,
 		       struct skb_frag_struct *frags,
 		       int len, int true_size, void *priv, __wsum sum)
 {
 	struct sk_buff *skb;
 
-	skb = __lro_proc_segment(lro_mgr, frags, len, true_size, NULL, 0,
-				 priv, sum);
+	skb = __lro_proc_segment(lro_mgr, frags, len, true_size, priv, sum);
 	if (!skb)
 		return;
 
@@ -556,26 +524,6 @@ void lro_receive_frags(struct net_lro_mgr *lro_mgr,
 }
 EXPORT_SYMBOL(lro_receive_frags);
 
-void lro_vlan_hwaccel_receive_frags(struct net_lro_mgr *lro_mgr,
-				    struct skb_frag_struct *frags,
-				    int len, int true_size,
-				    struct vlan_group *vgrp,
-				    u16 vlan_tag, void *priv, __wsum sum)
-{
-	struct sk_buff *skb;
-
-	skb = __lro_proc_segment(lro_mgr, frags, len, true_size, vgrp,
-				 vlan_tag, priv, sum);
-	if (!skb)
-		return;
-
-	if (lro_mgr->features & LRO_F_NAPI)
-		vlan_hwaccel_receive_skb(skb, vgrp, vlan_tag);
-	else
-		vlan_hwaccel_rx(skb, vgrp, vlan_tag);
-}
-EXPORT_SYMBOL(lro_vlan_hwaccel_receive_frags);
-
 void lro_flush_all(struct net_lro_mgr *lro_mgr)
 {
 	int i;
-- 
1.7.1


^ permalink raw reply related

* [PATCH v2 11/14] bnx2x: Update bnx2x to use new vlan acceleration.
From: Jesse Gross @ 2010-10-20 23:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Hao Zheng, Eilon Greenstein
In-Reply-To: <1287618974-4714-1-git-send-email-jesse@nicira.com>

From: Hao Zheng <hzheng@nicira.com>

Make the bnx2x driver use the new vlan acceleration model.
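
The receive-side change in this driver follows one pattern, sketched below
in userspace C: if the completion entry says the frame carried a vlan tag,
store the tag in the skb, then hand the skb to GRO unconditionally. The
completion-entry layout, the SKETCH_FLAGS_VLAN bit and all sketch_* names
are hypothetical and do not reflect the real bnx2x firmware interface; the
value of VLAN_TAG_PRESENT is an assumption about the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_TAG_PRESENT	0x1000	/* assumed kernel value (CFI bit) */
#define SKETCH_FLAGS_VLAN	0x0001	/* hypothetical, not PARSING_FLAGS_VLAN */

/* Mock skb holding only the tag. */
struct sk_buff {
	uint16_t vlan_tci;
};

/* Hypothetical completion entry; fields are little-endian byte pairs. */
struct sketch_cqe {
	uint8_t pars_flags[2];
	uint8_t vlan_tag[2];
};

/* Portable stand-in for le16_to_cpu(). */
static uint16_t sketch_le16_to_cpu(const uint8_t b[2])
{
	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Returns 1 if the skb was tagged. *gro_calls counts hand-offs that would
 * go to napi_gro_receive() in the driver. */
int sketch_rx_one(struct sk_buff *skb, const struct sketch_cqe *cqe,
		  int *gro_calls)
{
	int tagged = 0;

	if (sketch_le16_to_cpu(cqe->pars_flags) & SKETCH_FLAGS_VLAN) {
		/* Assumed effect of __vlan_hwaccel_put_tag(). */
		skb->vlan_tci = VLAN_TAG_PRESENT |
				sketch_le16_to_cpu(cqe->vlan_tag);
		tagged = 1;
	}

	(*gro_calls)++;	/* one receive call for tagged and untagged frames */
	return tagged;
}

void sketch_demo(void)
{
	struct sketch_cqe tagged_cqe = { { 0x01, 0x00 }, { 0x64, 0x00 } };
	struct sk_buff skb = {0};
	int gro_calls = 0;

	assert(sketch_rx_one(&skb, &tagged_cqe, &gro_calls) == 1);
	assert(skb.vlan_tci == 0x1064);
	assert(gro_calls == 1);
}
```

Compared with the old code there is no else branch: the driver no longer
needs a vlan group to decide which receive function to call.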

Signed-off-by: Hao Zheng <hzheng@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x.h         |   10 ------
 drivers/net/bnx2x/bnx2x_cmn.c     |   60 +++++++------------------------------
 drivers/net/bnx2x/bnx2x_ethtool.c |   33 ++++++++++----------
 drivers/net/bnx2x/bnx2x_main.c    |    8 -----
 4 files changed, 27 insertions(+), 84 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x.h b/drivers/net/bnx2x/bnx2x.h
index 3bf236b..9571ecf 100644
--- a/drivers/net/bnx2x/bnx2x.h
+++ b/drivers/net/bnx2x/bnx2x.h
@@ -24,10 +24,6 @@
 #define DRV_MODULE_RELDATE      "2010/10/19"
 #define BNX2X_BC_VER            0x040200
 
-#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
-#define BCM_VLAN			1
-#endif
-
 #define BNX2X_MULTI_QUEUE
 
 #define BNX2X_NEW_NAPI
@@ -858,10 +854,6 @@ struct bnx2x {
 
 	int			tx_ring_size;
 
-#ifdef BCM_VLAN
-	struct vlan_group	*vlgrp;
-#endif
-
 	u32			rx_csum;
 	u32			rx_buf_size;
 /* L2 header size + 2*VLANs (8 bytes) + LLC SNAP (8 bytes) */
@@ -925,8 +917,6 @@ struct bnx2x {
 #define NO_MCP_FLAG			0x100
 #define DISABLE_MSI_FLAG		0x200
 #define BP_NOMCP(bp)			(bp->flags & NO_MCP_FLAG)
-#define HW_VLAN_TX_FLAG			0x400
-#define HW_VLAN_RX_FLAG			0x800
 #define MF_FUNC_DIS			0x1000
 
 	int			pf_num;	/* absolute PF number */
diff --git a/drivers/net/bnx2x/bnx2x_cmn.c b/drivers/net/bnx2x/bnx2x_cmn.c
index 6905b2e..bc58375 100644
--- a/drivers/net/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/bnx2x/bnx2x_cmn.c
@@ -16,16 +16,13 @@
  */
 
 #include <linux/etherdevice.h>
+#include <linux/if_vlan.h>
 #include <linux/ip.h>
 #include <net/ipv6.h>
 #include <net/ip6_checksum.h>
 #include <linux/firmware.h>
 #include "bnx2x_cmn.h"
 
-#ifdef BCM_VLAN
-#include <linux/if_vlan.h>
-#endif
-
 #include "bnx2x_init.h"
 
 
@@ -346,13 +343,6 @@ static void bnx2x_tpa_stop(struct bnx2x *bp, struct bnx2x_fastpath *fp,
 	if (likely(new_skb)) {
 		/* fix ip xsum and give it to the stack */
 		/* (no need to map the new skb) */
-#ifdef BCM_VLAN
-		int is_vlan_cqe =
-			(le16_to_cpu(cqe->fast_path_cqe.pars_flags.flags) &
-			 PARSING_FLAGS_VLAN);
-		int is_not_hwaccel_vlan_cqe =
-			(is_vlan_cqe && (!(bp->flags & HW_VLAN_RX_FLAG)));
-#endif
 
 		prefetch(skb);
 		prefetch(((char *)(skb)) + L1_CACHE_BYTES);
@@ -377,28 +367,18 @@ static void bnx2x_tpa_stop(struct bnx2x *bp, struct bnx2x_fastpath *fp,
 			struct iphdr *iph;
 
 			iph = (struct iphdr *)skb->data;
-#ifdef BCM_VLAN
-			/* If there is no Rx VLAN offloading -
-			   take VLAN tag into an account */
-			if (unlikely(is_not_hwaccel_vlan_cqe))
-				iph = (struct iphdr *)((u8 *)iph + VLAN_HLEN);
-#endif
 			iph->check = 0;
 			iph->check = ip_fast_csum((u8 *)iph, iph->ihl);
 		}
 
 		if (!bnx2x_fill_frag_skb(bp, fp, skb,
 					 &cqe->fast_path_cqe, cqe_idx)) {
-#ifdef BCM_VLAN
-			if ((bp->vlgrp != NULL) &&
-				(le16_to_cpu(cqe->fast_path_cqe.
-				pars_flags.flags) & PARSING_FLAGS_VLAN))
-				vlan_gro_receive(&fp->napi, bp->vlgrp,
+			if ((le16_to_cpu(cqe->fast_path_cqe.
+			    pars_flags.flags) & PARSING_FLAGS_VLAN))
+				__vlan_hwaccel_put_tag(skb,
 						 le16_to_cpu(cqe->fast_path_cqe.
-							     vlan_tag), skb);
-			else
-#endif
-				napi_gro_receive(&fp->napi, skb);
+							     vlan_tag));
+			napi_gro_receive(&fp->napi, skb);
 		} else {
 			DP(NETIF_MSG_RX_STATUS, "Failed to allocate new pages"
 			   " - dropping packet!\n");
@@ -633,15 +613,11 @@ reuse_rx:
 
 		skb_record_rx_queue(skb, fp->index);
 
-#ifdef BCM_VLAN
-		if ((bp->vlgrp != NULL) && (bp->flags & HW_VLAN_RX_FLAG) &&
-		    (le16_to_cpu(cqe->fast_path_cqe.pars_flags.flags) &
-		     PARSING_FLAGS_VLAN))
-			vlan_gro_receive(&fp->napi, bp->vlgrp,
-				le16_to_cpu(cqe->fast_path_cqe.vlan_tag), skb);
-		else
-#endif
-			napi_gro_receive(&fp->napi, skb);
+		if (le16_to_cpu(cqe->fast_path_cqe.pars_flags.flags) &
+		     PARSING_FLAGS_VLAN)
+			__vlan_hwaccel_put_tag(skb,
+				le16_to_cpu(cqe->fast_path_cqe.vlan_tag));
+		napi_gro_receive(&fp->napi, skb);
 
 
 next_rx:
@@ -2025,14 +2001,12 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	   "sending pkt %u @%p  next_idx %u  bd %u @%p\n",
 	   pkt_prod, tx_buf, fp->tx_pkt_prod, bd_prod, tx_start_bd);
 
-#ifdef BCM_VLAN
 	if (vlan_tx_tag_present(skb)) {
 		tx_start_bd->vlan_or_ethertype =
 		    cpu_to_le16(vlan_tx_tag_get(skb));
 		tx_start_bd->bd_flags.as_bitfield |=
 		    (X_ETH_OUTBAND_VLAN << ETH_TX_BD_FLAGS_VLAN_MODE_SHIFT);
 	} else
-#endif
 		tx_start_bd->vlan_or_ethertype = cpu_to_le16(pkt_prod);
 
 	/* turn on parsing and get a BD */
@@ -2317,18 +2291,6 @@ void bnx2x_tx_timeout(struct net_device *dev)
 	schedule_delayed_work(&bp->reset_task, 0);
 }
 
-#ifdef BCM_VLAN
-/* called with rtnl_lock */
-void bnx2x_vlan_rx_register(struct net_device *dev,
-				   struct vlan_group *vlgrp)
-{
-	struct bnx2x *bp = netdev_priv(dev);
-
-	bp->vlgrp = vlgrp;
-}
-
-#endif
-
 int bnx2x_suspend(struct pci_dev *pdev, pm_message_t state)
 {
 	struct net_device *dev = pci_get_drvdata(pdev);
diff --git a/drivers/net/bnx2x/bnx2x_ethtool.c b/drivers/net/bnx2x/bnx2x_ethtool.c
index 54fe061..daefef6 100644
--- a/drivers/net/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/bnx2x/bnx2x_ethtool.c
@@ -1117,35 +1117,34 @@ static int bnx2x_set_flags(struct net_device *dev, u32 data)
 	int changed = 0;
 	int rc = 0;
 
-	if (data & ~(ETH_FLAG_LRO | ETH_FLAG_RXHASH))
-		return -EINVAL;
-
 	if (bp->recovery_state != BNX2X_RECOVERY_DONE) {
 		printk(KERN_ERR "Handling parity error recovery. Try again later\n");
 		return -EAGAIN;
 	}
 
+	if (!(data & ETH_FLAG_RXVLAN))
+		return -EOPNOTSUPP;
+
+	if ((data & ETH_FLAG_LRO) && bp->rx_csum && bp->disable_tpa)
+		return -EINVAL;
+
+	rc = ethtool_op_set_flags(dev, data, ETH_FLAG_LRO | ETH_FLAG_RXVLAN |
+					ETH_FLAG_TXVLAN | ETH_FLAG_RXHASH);
+	if (rc)
+		return rc;
+
 	/* TPA requires Rx CSUM offloading */
 	if ((data & ETH_FLAG_LRO) && bp->rx_csum) {
-		if (!bp->disable_tpa) {
-			if (!(dev->features & NETIF_F_LRO)) {
-				dev->features |= NETIF_F_LRO;
-				bp->flags |= TPA_ENABLE_FLAG;
-				changed = 1;
-			}
-		} else
-			rc = -EINVAL;
-	} else if (dev->features & NETIF_F_LRO) {
+		if (!(bp->flags & TPA_ENABLE_FLAG)) {
+			bp->flags |= TPA_ENABLE_FLAG;
+			changed = 1;
+		}
+	} else if (bp->flags & TPA_ENABLE_FLAG) {
 		dev->features &= ~NETIF_F_LRO;
 		bp->flags &= ~TPA_ENABLE_FLAG;
 		changed = 1;
 	}
 
-	if (data & ETH_FLAG_RXHASH)
-		dev->features |= NETIF_F_RXHASH;
-	else
-		dev->features &= ~NETIF_F_RXHASH;
-
 	if (changed && netif_running(dev)) {
 		bnx2x_nic_unload(bp, UNLOAD_NORMAL);
 		rc = bnx2x_nic_load(bp, LOAD_NORMAL);
diff --git a/drivers/net/bnx2x/bnx2x_main.c b/drivers/net/bnx2x/bnx2x_main.c
index f22e283..ff99a2f 100644
--- a/drivers/net/bnx2x/bnx2x_main.c
+++ b/drivers/net/bnx2x/bnx2x_main.c
@@ -2371,10 +2371,8 @@ static inline u16 bnx2x_get_cl_flags(struct bnx2x *bp,
 	flags |= QUEUE_FLG_HC;
 	flags |= IS_MF(bp) ? QUEUE_FLG_OV : 0;
 
-#ifdef BCM_VLAN
 	flags |= QUEUE_FLG_VLAN;
 	DP(NETIF_MSG_IFUP, "vlan removal enabled\n");
-#endif
 
 	if (!fp->disable_tpa)
 		flags |= QUEUE_FLG_TPA;
@@ -8630,9 +8628,6 @@ static const struct net_device_ops bnx2x_netdev_ops = {
 	.ndo_do_ioctl		= bnx2x_ioctl,
 	.ndo_change_mtu		= bnx2x_change_mtu,
 	.ndo_tx_timeout		= bnx2x_tx_timeout,
-#ifdef BCM_VLAN
-	.ndo_vlan_rx_register	= bnx2x_vlan_rx_register,
-#endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= poll_bnx2x,
 #endif
@@ -8764,9 +8759,7 @@ static int __devinit bnx2x_init_dev(struct pci_dev *pdev,
 		dev->features |= NETIF_F_HIGHDMA;
 	dev->features |= (NETIF_F_TSO | NETIF_F_TSO_ECN);
 	dev->features |= NETIF_F_TSO6;
-#ifdef BCM_VLAN
 	dev->features |= (NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX);
-	bp->flags |= (HW_VLAN_RX_FLAG | HW_VLAN_TX_FLAG);
 
 	dev->vlan_features |= NETIF_F_SG;
 	dev->vlan_features |= NETIF_F_HW_CSUM;
@@ -8774,7 +8767,6 @@ static int __devinit bnx2x_init_dev(struct pci_dev *pdev,
 		dev->vlan_features |= NETIF_F_HIGHDMA;
 	dev->vlan_features |= (NETIF_F_TSO | NETIF_F_TSO_ECN);
 	dev->vlan_features |= NETIF_F_TSO6;
-#endif
 
 	/* get_port_hwinfo() will set prtad and mmds properly */
 	bp->mdio.prtad = MDIO_PRTAD_NONE;
-- 
1.7.1



* [PATCH v2 10/14] ixgbe: Update ixgbe to use new vlan acceleration.
From: Jesse Gross @ 2010-10-20 23:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Peter Waskiewicz, Emil Tantilov, Jeff Kirsher
In-Reply-To: <1287618974-4714-1-git-send-email-jesse@nicira.com>

Make the ixgbe driver use the new vlan acceleration model.

Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Peter Waskiewicz <peter.p.waskiewicz.jr@intel.com>
CC: Emil Tantilov <emil.s.tantilov@intel.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ixgbe/ixgbe.h         |    4 +-
 drivers/net/ixgbe/ixgbe_ethtool.c |   12 +++-
 drivers/net/ixgbe/ixgbe_main.c    |  137 ++++++++++++++++---------------------
 3 files changed, 74 insertions(+), 79 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
index a8c47b0..5e38de7 100644
--- a/drivers/net/ixgbe/ixgbe.h
+++ b/drivers/net/ixgbe/ixgbe.h
@@ -28,11 +28,13 @@
 #ifndef _IXGBE_H_
 #define _IXGBE_H_
 
+#include <linux/bitops.h>
 #include <linux/types.h>
 #include <linux/pci.h>
 #include <linux/netdevice.h>
 #include <linux/cpumask.h>
 #include <linux/aer.h>
+#include <linux/if_vlan.h>
 
 #include "ixgbe_type.h"
 #include "ixgbe_common.h"
@@ -287,7 +289,7 @@ struct ixgbe_q_vector {
 /* board specific private data structure */
 struct ixgbe_adapter {
 	struct timer_list watchdog_timer;
-	struct vlan_group *vlgrp;
+	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
 	u16 bd_number;
 	struct work_struct reset_task;
 	struct ixgbe_q_vector *q_vector[MAX_MSIX_Q_VECTORS];
diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c b/drivers/net/ixgbe/ixgbe_ethtool.c
index d4ac943..dbfd62f 100644
--- a/drivers/net/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ixgbe/ixgbe_ethtool.c
@@ -2113,7 +2113,17 @@ static int ixgbe_set_flags(struct net_device *netdev, u32 data)
 	bool need_reset = false;
 	int rc;
 
-	rc = ethtool_op_set_flags(netdev, data, ETH_FLAG_LRO | ETH_FLAG_NTUPLE);
+#ifdef CONFIG_IXGBE_DCB
+	if ((adapter->flags & IXGBE_FLAG_DCB_ENABLED) &&
+	    !(data & ETH_FLAG_RXVLAN))
+		return -EINVAL;
+#endif
+
+	need_reset = (data & ETH_FLAG_RXVLAN) !=
+		     (netdev->features & NETIF_F_HW_VLAN_RX);
+
+	rc = ethtool_op_set_flags(netdev, data, ETH_FLAG_LRO |
+					ETH_FLAG_RXVLAN | ETH_FLAG_TXVLAN);
 	if (rc)
 		return rc;
 
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 998debe..56f6b80 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -954,17 +954,13 @@ static void ixgbe_receive_skb(struct ixgbe_q_vector *q_vector,
 	bool is_vlan = (status & IXGBE_RXD_STAT_VP);
 	u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
 
-	if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL)) {
-		if (adapter->vlgrp && is_vlan && (tag & VLAN_VID_MASK))
-			vlan_gro_receive(napi, adapter->vlgrp, tag, skb);
-		else
-			napi_gro_receive(napi, skb);
-	} else {
-		if (adapter->vlgrp && is_vlan && (tag & VLAN_VID_MASK))
-			vlan_hwaccel_rx(skb, adapter->vlgrp, tag);
-		else
-			netif_rx(skb);
-	}
+	if (is_vlan && (tag & VLAN_VID_MASK))
+		__vlan_hwaccel_put_tag(skb, tag);
+
+	if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
+		napi_gro_receive(napi, skb);
+	else
+		netif_rx(skb);
 }
 
 /**
@@ -3065,6 +3061,7 @@ static void ixgbe_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
 
 	/* add VID to filter table */
 	hw->mac.ops.set_vfta(&adapter->hw, vid, pool_ndx, true);
+	set_bit(vid, adapter->active_vlans);
 }
 
 static void ixgbe_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
@@ -3073,16 +3070,9 @@ static void ixgbe_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
 	struct ixgbe_hw *hw = &adapter->hw;
 	int pool_ndx = adapter->num_vfs;
 
-	if (!test_bit(__IXGBE_DOWN, &adapter->state))
-		ixgbe_irq_disable(adapter);
-
-	vlan_group_set_device(adapter->vlgrp, vid, NULL);
-
-	if (!test_bit(__IXGBE_DOWN, &adapter->state))
-		ixgbe_irq_enable(adapter, true, true);
-
 	/* remove VID from filter table */
 	hw->mac.ops.set_vfta(&adapter->hw, vid, pool_ndx, false);
+	clear_bit(vid, adapter->active_vlans);
 }
 
 /**
@@ -3092,27 +3082,45 @@ static void ixgbe_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
 static void ixgbe_vlan_filter_disable(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
-	u32 vlnctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
+	u32 vlnctrl;
+
+	vlnctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
+	vlnctrl &= ~(IXGBE_VLNCTRL_VFE | IXGBE_VLNCTRL_CFIEN);
+	IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, vlnctrl);
+}
+
+/**
+ * ixgbe_vlan_filter_enable - helper to enable hw vlan filtering
+ * @adapter: driver data
+ */
+static void ixgbe_vlan_filter_enable(struct ixgbe_adapter *adapter)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	u32 vlnctrl;
+
+	vlnctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
+	vlnctrl |= IXGBE_VLNCTRL_VFE;
+	vlnctrl &= ~IXGBE_VLNCTRL_CFIEN;
+	IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, vlnctrl);
+}
+
+/**
+ * ixgbe_vlan_strip_disable - helper to disable hw vlan stripping
+ * @adapter: driver data
+ */
+static void ixgbe_vlan_strip_disable(struct ixgbe_adapter *adapter)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	u32 vlnctrl;
 	int i, j;
 
 	switch (hw->mac.type) {
 	case ixgbe_mac_82598EB:
-		vlnctrl &= ~IXGBE_VLNCTRL_VFE;
-#ifdef CONFIG_IXGBE_DCB
-		if (!(adapter->flags & IXGBE_FLAG_DCB_ENABLED))
-			vlnctrl &= ~IXGBE_VLNCTRL_VME;
-#endif
-		vlnctrl &= ~IXGBE_VLNCTRL_CFIEN;
+		vlnctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
+		vlnctrl &= ~IXGBE_VLNCTRL_VME;
 		IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, vlnctrl);
 		break;
 	case ixgbe_mac_82599EB:
-		vlnctrl &= ~IXGBE_VLNCTRL_VFE;
-		vlnctrl &= ~IXGBE_VLNCTRL_CFIEN;
-		IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, vlnctrl);
-#ifdef CONFIG_IXGBE_DCB
-		if (adapter->flags & IXGBE_FLAG_DCB_ENABLED)
-			break;
-#endif
 		for (i = 0; i < adapter->num_rx_queues; i++) {
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
@@ -3126,25 +3134,22 @@ static void ixgbe_vlan_filter_disable(struct ixgbe_adapter *adapter)
 }
 
 /**
- * ixgbe_vlan_filter_enable - helper to enable hw vlan filtering
+ * ixgbe_vlan_strip_enable - helper to enable hw vlan stripping
  * @adapter: driver data
  */
-static void ixgbe_vlan_filter_enable(struct ixgbe_adapter *adapter)
+static void ixgbe_vlan_strip_enable(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
-	u32 vlnctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
+	u32 vlnctrl;
 	int i, j;
 
 	switch (hw->mac.type) {
 	case ixgbe_mac_82598EB:
-		vlnctrl |= IXGBE_VLNCTRL_VME | IXGBE_VLNCTRL_VFE;
-		vlnctrl &= ~IXGBE_VLNCTRL_CFIEN;
+		vlnctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
+		vlnctrl |= IXGBE_VLNCTRL_VME;
 		IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, vlnctrl);
 		break;
 	case ixgbe_mac_82599EB:
-		vlnctrl |= IXGBE_VLNCTRL_VFE;
-		vlnctrl &= ~IXGBE_VLNCTRL_CFIEN;
-		IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, vlnctrl);
 		for (i = 0; i < adapter->num_rx_queues; i++) {
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
@@ -3157,40 +3162,14 @@ static void ixgbe_vlan_filter_enable(struct ixgbe_adapter *adapter)
 	}
 }
 
-static void ixgbe_vlan_rx_register(struct net_device *netdev,
-				   struct vlan_group *grp)
-{
-	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-
-	if (!test_bit(__IXGBE_DOWN, &adapter->state))
-		ixgbe_irq_disable(adapter);
-	adapter->vlgrp = grp;
-
-	/*
-	 * For a DCB driver, always enable VLAN tag stripping so we can
-	 * still receive traffic from a DCB-enabled host even if we're
-	 * not in DCB mode.
-	 */
-	ixgbe_vlan_filter_enable(adapter);
-
-	ixgbe_vlan_rx_add_vid(netdev, 0);
-
-	if (!test_bit(__IXGBE_DOWN, &adapter->state))
-		ixgbe_irq_enable(adapter, true, true);
-}
-
 static void ixgbe_restore_vlan(struct ixgbe_adapter *adapter)
 {
-	ixgbe_vlan_rx_register(adapter->netdev, adapter->vlgrp);
+	u16 vid;
 
-	if (adapter->vlgrp) {
-		u16 vid;
-		for (vid = 0; vid < VLAN_N_VID; vid++) {
-			if (!vlan_group_get_device(adapter->vlgrp, vid))
-				continue;
-			ixgbe_vlan_rx_add_vid(adapter->netdev, vid);
-		}
-	}
+	ixgbe_vlan_rx_add_vid(adapter->netdev, 0);
+
+	for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID)
+		ixgbe_vlan_rx_add_vid(adapter->netdev, vid);
 }
 
 /**
@@ -3305,6 +3284,11 @@ void ixgbe_set_rx_mode(struct net_device *netdev)
 	}
 
 	IXGBE_WRITE_REG(hw, IXGBE_FCTRL, fctrl);
+
+	if (netdev->features & NETIF_F_HW_VLAN_RX)
+		ixgbe_vlan_strip_enable(adapter);
+	else
+		ixgbe_vlan_strip_disable(adapter);
 }
 
 static void ixgbe_napi_enable_all(struct ixgbe_adapter *adapter)
@@ -3388,7 +3372,7 @@ static void ixgbe_configure_dcb(struct ixgbe_adapter *adapter)
 		IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(j), txdctl);
 	}
 	/* Enable VLAN tag insert/strip */
-	ixgbe_vlan_filter_enable(adapter);
+	adapter->netdev->features |= NETIF_F_HW_VLAN_RX;
 
 	hw->mac.ops.set_vfta(&adapter->hw, 0, 0, true);
 }
@@ -3400,13 +3384,13 @@ static void ixgbe_configure(struct ixgbe_adapter *adapter)
 	struct ixgbe_hw *hw = &adapter->hw;
 	int i;
 
-	ixgbe_set_rx_mode(netdev);
-
-	ixgbe_restore_vlan(adapter);
 #ifdef CONFIG_IXGBE_DCB
 	ixgbe_configure_dcb(adapter);
 #endif
 
+	ixgbe_set_rx_mode(netdev);
+	ixgbe_restore_vlan(adapter);
+
 #ifdef IXGBE_FCOE
 	if (adapter->flags & IXGBE_FLAG_FCOE_ENABLED)
 		ixgbe_configure_fcoe(adapter);
@@ -6569,7 +6553,6 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_set_mac_address	= ixgbe_set_mac,
 	.ndo_change_mtu		= ixgbe_change_mtu,
 	.ndo_tx_timeout		= ixgbe_tx_timeout,
-	.ndo_vlan_rx_register	= ixgbe_vlan_rx_register,
 	.ndo_vlan_rx_add_vid	= ixgbe_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid	= ixgbe_vlan_rx_kill_vid,
 	.ndo_do_ioctl		= ixgbe_ioctl,
-- 
1.7.1
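
The ixgbe patch replaces the vlan_group pointer with an active_vlans bitmap that is updated in add_vid/kill_vid and walked again in ixgbe_restore_vlan(). A minimal userspace sketch of that bookkeeping follows. It is an illustration only, not driver code: struct model_adapter, the model_* helpers and the vid_fn callback are hypothetical names, and BITS_TO_LONGS is re-implemented here because the kernel header is not available in userspace.

```c
#include <limits.h>
#include <stddef.h>

/* Userspace re-implementations of the kernel macros used by the patch. */
#define VLAN_N_VID	4096
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical stand-in for the one ixgbe_adapter field of interest. */
struct model_adapter {
	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
};

/* Hypothetical callback standing in for "program this VID into hardware". */
typedef void (*vid_fn)(unsigned int vid, void *ctx);

/* Mirrors set_bit(vid, adapter->active_vlans), without atomicity. */
void model_add_vid(struct model_adapter *a, unsigned int vid)
{
	if (vid < VLAN_N_VID)
		a->active_vlans[vid / BITS_PER_LONG] |= 1UL << (vid % BITS_PER_LONG);
}

/* Mirrors clear_bit(vid, adapter->active_vlans), without atomicity. */
void model_kill_vid(struct model_adapter *a, unsigned int vid)
{
	if (vid < VLAN_N_VID)
		a->active_vlans[vid / BITS_PER_LONG] &= ~(1UL << (vid % BITS_PER_LONG));
}

int model_vid_active(const struct model_adapter *a, unsigned int vid)
{
	if (vid >= VLAN_N_VID)
		return 0;
	return (a->active_vlans[vid / BITS_PER_LONG] >> (vid % BITS_PER_LONG)) & 1UL;
}

/*
 * Walk every set bit, as for_each_set_bit() does in ixgbe_restore_vlan(),
 * and hand each VID to the callback. Returns the number of VIDs visited.
 */
unsigned int model_restore_vlans(const struct model_adapter *a,
				 vid_fn program, void *ctx)
{
	unsigned int vid, count = 0;

	for (vid = 0; vid < VLAN_N_VID; vid++) {
		if (!model_vid_active(a, vid))
			continue;
		if (program)
			program(vid, ctx);
		count++;
	}
	return count;
}
```

The point of the bitmap is that the driver no longer depends on the VLAN layer's group structure to know which filters to reprogram after a reset; it keeps only the per-VID state it needs.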



* [PATCH v2 07/14] ethtool: Add support for vlan acceleration.
From: Jesse Gross @ 2010-10-20 23:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1287618974-4714-1-git-send-email-jesse@nicira.com>

Now that vlan acceleration is handled consistently regardless of usage,
it is possible to enable and disable it at will.  This adds support for
Ethtool operations that change the offloading status for debugging
purposes, similar to other forms of hardware acceleration.

Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 include/linux/ethtool.h |    2 ++
 net/core/ethtool.c      |    3 ++-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 8a3338c..6628a50 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -309,6 +309,8 @@ struct ethtool_perm_addr {
  * flag differs from the read-only value.
  */
 enum ethtool_flags {
+	ETH_FLAG_TXVLAN		= (1 << 7),	/* TX VLAN offload enabled */
+	ETH_FLAG_RXVLAN		= (1 << 8),	/* RX VLAN offload enabled */
 	ETH_FLAG_LRO		= (1 << 15),	/* LRO is enabled */
 	ETH_FLAG_NTUPLE		= (1 << 27),	/* N-tuple filters enabled */
 	ETH_FLAG_RXHASH		= (1 << 28),
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 685c700..956a9f4 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -132,7 +132,8 @@ EXPORT_SYMBOL(ethtool_op_set_ufo);
  * NETIF_F_xxx values in include/linux/netdevice.h
  */
 static const u32 flags_dup_features =
-	(ETH_FLAG_LRO | ETH_FLAG_NTUPLE | ETH_FLAG_RXHASH);
+	(ETH_FLAG_LRO | ETH_FLAG_RXVLAN | ETH_FLAG_TXVLAN | ETH_FLAG_NTUPLE |
+	 ETH_FLAG_RXHASH);
 
 u32 ethtool_op_get_flags(struct net_device *dev)
 {
-- 
1.7.1
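
To make the effect of the two new flags concrete, here is a small userspace model of how a "supported" mask and flags_dup_features interact when a driver's set_flags handler passes the request on. The flag values are copied from the enum hunk above; struct model_dev and model_set_flags() are hypothetical names, and the body is an assumption about how ethtool_op_set_flags() behaves in this kernel version, not a copy of it.

```c
#include <errno.h>
#include <stdint.h>

/* Flag values as listed in the enum ethtool_flags hunk of this patch. */
enum model_ethtool_flags {
	ETH_FLAG_TXVLAN	= (1 << 7),	/* TX VLAN offload enabled */
	ETH_FLAG_RXVLAN	= (1 << 8),	/* RX VLAN offload enabled */
	ETH_FLAG_LRO	= (1 << 15),
	ETH_FLAG_NTUPLE	= (1 << 27),
	ETH_FLAG_RXHASH	= (1 << 28),
};

/* Flags that share their bit position with the NETIF_F_xxx feature bits. */
static const uint32_t flags_dup_features =
	(ETH_FLAG_LRO | ETH_FLAG_RXVLAN | ETH_FLAG_TXVLAN | ETH_FLAG_NTUPLE |
	 ETH_FLAG_RXHASH);

/* Hypothetical stand-in for the features word of struct net_device. */
struct model_dev {
	uint32_t features;
};

/*
 * Assumed behaviour: reject any requested flag the driver did not list
 * as supported, otherwise mirror the requested flags into the features.
 */
int model_set_flags(struct model_dev *dev, uint32_t data, uint32_t supported)
{
	if (data & ~supported)
		return -EINVAL;

	dev->features = (dev->features & ~flags_dup_features) |
			(data & flags_dup_features);
	return 0;
}
```

Under this model a driver opts in to runtime toggling simply by adding ETH_FLAG_RXVLAN and ETH_FLAG_TXVLAN to the mask it passes, which is what the driver patches in this series do.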



* [PATCH v2 09/14] bnx2: Update bnx2 to use new vlan acceleration.
From: Jesse Gross @ 2010-10-20 23:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Michael Chan
In-Reply-To: <1287618974-4714-1-git-send-email-jesse@nicira.com>

Make the bnx2 driver use the new vlan acceleration model.

Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |   97 +++++++++++++++-------------------------------------
 drivers/net/bnx2.h |    4 --
 2 files changed, 28 insertions(+), 73 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 363ca8b..bf3c830 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -37,9 +37,6 @@
 #include <linux/ethtool.h>
 #include <linux/mii.h>
 #include <linux/if_vlan.h>
-#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
-#define BCM_VLAN 1
-#endif
 #include <net/ip.h>
 #include <net/tcp.h>
 #include <net/checksum.h>
@@ -3087,8 +3084,6 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 		struct sw_bd *rx_buf, *next_rx_buf;
 		struct sk_buff *skb;
 		dma_addr_t dma_addr;
-		u16 vtag = 0;
-		int hw_vlan __maybe_unused = 0;
 
 		sw_ring_cons = RX_RING_IDX(sw_cons);
 		sw_ring_prod = RX_RING_IDX(sw_prod);
@@ -3168,23 +3163,8 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 			goto next_rx;
 
 		if ((status & L2_FHDR_STATUS_L2_VLAN_TAG) &&
-		    !(bp->rx_mode & BNX2_EMAC_RX_MODE_KEEP_VLAN_TAG)) {
-			vtag = rx_hdr->l2_fhdr_vlan_tag;
-#ifdef BCM_VLAN
-			if (bp->vlgrp)
-				hw_vlan = 1;
-			else
-#endif
-			{
-				struct vlan_ethhdr *ve = (struct vlan_ethhdr *)
-					__skb_push(skb, 4);
-
-				memmove(ve, skb->data + 4, ETH_ALEN * 2);
-				ve->h_vlan_proto = htons(ETH_P_8021Q);
-				ve->h_vlan_TCI = htons(vtag);
-				len += 4;
-			}
-		}
+		    !(bp->rx_mode & BNX2_EMAC_RX_MODE_KEEP_VLAN_TAG))
+			__vlan_hwaccel_put_tag(skb, rx_hdr->l2_fhdr_vlan_tag);
 
 		skb->protocol = eth_type_trans(skb, bp->dev);
 
@@ -3211,14 +3191,7 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 			skb->rxhash = rx_hdr->l2_fhdr_hash;
 
 		skb_record_rx_queue(skb, bnapi - &bp->bnx2_napi[0]);
-
-#ifdef BCM_VLAN
-		if (hw_vlan)
-			vlan_gro_receive(&bnapi->napi, bp->vlgrp, vtag, skb);
-		else
-#endif
-			napi_gro_receive(&bnapi->napi, skb);
-
+		napi_gro_receive(&bnapi->napi, skb);
 		rx_pkt++;
 
 next_rx:
@@ -3533,13 +3506,9 @@ bnx2_set_rx_mode(struct net_device *dev)
 	rx_mode = bp->rx_mode & ~(BNX2_EMAC_RX_MODE_PROMISCUOUS |
 				  BNX2_EMAC_RX_MODE_KEEP_VLAN_TAG);
 	sort_mode = 1 | BNX2_RPM_SORT_USER0_BC_EN;
-#ifdef BCM_VLAN
-	if (!bp->vlgrp && (bp->flags & BNX2_FLAG_CAN_KEEP_VLAN))
+	if (!(dev->features & NETIF_F_HW_VLAN_RX) &&
+	     (bp->flags & BNX2_FLAG_CAN_KEEP_VLAN))
 		rx_mode |= BNX2_EMAC_RX_MODE_KEEP_VLAN_TAG;
-#else
-	if (bp->flags & BNX2_FLAG_CAN_KEEP_VLAN)
-		rx_mode |= BNX2_EMAC_RX_MODE_KEEP_VLAN_TAG;
-#endif
 	if (dev->flags & IFF_PROMISC) {
 		/* Promiscuous mode. */
 		rx_mode |= BNX2_EMAC_RX_MODE_PROMISCUOUS;
@@ -6365,29 +6334,6 @@ bnx2_tx_timeout(struct net_device *dev)
 	schedule_work(&bp->reset_task);
 }
 
-#ifdef BCM_VLAN
-/* Called with rtnl_lock */
-static void
-bnx2_vlan_rx_register(struct net_device *dev, struct vlan_group *vlgrp)
-{
-	struct bnx2 *bp = netdev_priv(dev);
-
-	if (netif_running(dev))
-		bnx2_netif_stop(bp, false);
-
-	bp->vlgrp = vlgrp;
-
-	if (!netif_running(dev))
-		return;
-
-	bnx2_set_rx_mode(dev);
-	if (bp->flags & BNX2_FLAG_CAN_KEEP_VLAN)
-		bnx2_fw_sync(bp, BNX2_DRV_MSG_CODE_KEEP_VLAN_UPDATE, 0, 1);
-
-	bnx2_netif_start(bp, false);
-}
-#endif
-
 /* Called with netif_tx_lock.
  * bnx2_tx_int() runs without netif_tx_lock unless it needs to call
  * netif_wake_queue().
@@ -6428,12 +6374,11 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		vlan_tag_flags |= TX_BD_FLAGS_TCP_UDP_CKSUM;
 	}
 
-#ifdef BCM_VLAN
 	if (vlan_tx_tag_present(skb)) {
 		vlan_tag_flags |=
 			(TX_BD_FLAGS_VLAN_TAG | (vlan_tx_tag_get(skb) << 16));
 	}
-#endif
+
 	if ((mss = skb_shinfo(skb)->gso_size)) {
 		u32 tcp_opt_len;
 		struct iphdr *iph;
@@ -7578,7 +7523,28 @@ bnx2_set_tx_csum(struct net_device *dev, u32 data)
 static int
 bnx2_set_flags(struct net_device *dev, u32 data)
 {
-	return ethtool_op_set_flags(dev, data, ETH_FLAG_RXHASH);
+	struct bnx2 *bp = netdev_priv(dev);
+	int rc;
+
+	if (!(bp->flags & BNX2_FLAG_CAN_KEEP_VLAN) &&
+	    !(data & ETH_FLAG_RXVLAN))
+		return -EOPNOTSUPP;
+
+	rc = ethtool_op_set_flags(dev, data, ETH_FLAG_RXHASH | ETH_FLAG_RXVLAN |
+				  ETH_FLAG_TXVLAN);
+	if (rc)
+		return rc;
+
+	if ((!!(data & ETH_FLAG_RXVLAN) !=
+	    !!(bp->rx_mode & BNX2_EMAC_RX_MODE_KEEP_VLAN_TAG)) &&
+	    netif_running(dev)) {
+		bnx2_netif_stop(bp, false);
+		bnx2_set_rx_mode(dev);
+		bnx2_fw_sync(bp, BNX2_DRV_MSG_CODE_KEEP_VLAN_UPDATE, 0, 1);
+		bnx2_netif_start(bp, false);
+	}
+
+	return 0;
 }
 
 static const struct ethtool_ops bnx2_ethtool_ops = {
@@ -8318,9 +8284,6 @@ static const struct net_device_ops bnx2_netdev_ops = {
 	.ndo_set_mac_address	= bnx2_change_mac_addr,
 	.ndo_change_mtu		= bnx2_change_mtu,
 	.ndo_tx_timeout		= bnx2_tx_timeout,
-#ifdef BCM_VLAN
-	.ndo_vlan_rx_register	= bnx2_vlan_rx_register,
-#endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= poll_bnx2,
 #endif
@@ -8328,9 +8291,7 @@ static const struct net_device_ops bnx2_netdev_ops = {
 
 static void inline vlan_features_add(struct net_device *dev, unsigned long flags)
 {
-#ifdef BCM_VLAN
 	dev->vlan_features |= flags;
-#endif
 }
 
 static int __devinit
@@ -8379,9 +8340,7 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		dev->features |= NETIF_F_IPV6_CSUM;
 		vlan_features_add(dev, NETIF_F_IPV6_CSUM);
 	}
-#ifdef BCM_VLAN
 	dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
-#endif
 	dev->features |= NETIF_F_TSO | NETIF_F_TSO_ECN;
 	vlan_features_add(dev, NETIF_F_TSO | NETIF_F_TSO_ECN);
 	if (CHIP_NUM(bp) == CHIP_NUM_5709) {
diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
index efdfbc2..4f44db6 100644
--- a/drivers/net/bnx2.h
+++ b/drivers/net/bnx2.h
@@ -6742,10 +6742,6 @@ struct bnx2 {
 
 	struct bnx2_napi	bnx2_napi[BNX2_MAX_MSIX_VEC];
 
-#ifdef BCM_VLAN
-	struct			vlan_group *vlgrp;
-#endif
-
 	u32			rx_buf_use_size;	/* useable size */
 	u32			rx_buf_size;		/* with alignment */
 	u32			rx_copy_thresh;
-- 
1.7.1
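
For readers wondering what the block removed from bnx2_rx_int() was doing: when no vlan group was registered, the driver put the stripped tag back into the packet by hand. The sketch below reproduces that header rebuild on a plain byte buffer. It is a userspace illustration only; model_reinsert_vlan() is a hypothetical name, and the caller is assumed to provide at least VLAN_HLEN bytes of writable headroom in front of the frame, as __skb_push() relies on in the driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN	6	/* bytes in a MAC address */
#define VLAN_HLEN	4	/* bytes added by an 802.1Q tag */
#define ETH_P_8021Q	0x8100	/* 802.1Q tag protocol identifier */

/*
 * frame points at an untagged Ethernet frame of *len bytes with at
 * least VLAN_HLEN bytes of writable headroom before it. The two MAC
 * addresses are moved forward by four bytes and the TPID and TCI are
 * written, in network byte order, into the gap; the original EtherType
 * is already in the right place. Returns the new start of the frame
 * and adds VLAN_HLEN to *len.
 */
uint8_t *model_reinsert_vlan(uint8_t *frame, size_t *len, uint16_t tci)
{
	uint8_t *start = frame - VLAN_HLEN;

	/* Regions overlap, so memmove rather than memcpy. */
	memmove(start, frame, ETH_ALEN * 2);

	start[12] = ETH_P_8021Q >> 8;
	start[13] = ETH_P_8021Q & 0xff;
	start[14] = tci >> 8;
	start[15] = tci & 0xff;

	*len += VLAN_HLEN;
	return start;
}
```

With the new model the driver no longer needs this step: it records the tag with __vlan_hwaccel_put_tag() and leaves any reinsertion to the core networking code.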



