Netdev List
* Re: [Fwd: Re: Bug#538372: header failure including netlink.h (or uio.h)]
From: Jarek Poplawski @ 2009-09-29  9:27 UTC
  To: Manuel Prinz; +Cc: netdev
In-Reply-To: <1254137084.4756.11.camel@ce170155.zmb.uni-duisburg-essen.de>

On 28-09-2009 13:24, Manuel Prinz wrote:
> Hi everyone,
> 
> I'm forwarding this bug in Debian (http://bugs.debian.org/538372) as
> requested by the Debian kernel team. A patch is available. Applying just
> the first hunk fixes the issue for me. I've not enough kernel knowledge
> to judge if this fix is a proper solution, though.
> 
> It would be really great if someone could have a look at it. Thanks in
> advance! (And please CC me in replies. Thanks!)

I've tried it with current include/linux and it works OK. Replacing
uio.h on Debian really was not enough, but it looks like missing
compiler.h entries could be the reason. Otherwise, please send your
compile error log.

Best regards,
Jarek P.


* Re: [PATCH 2/3] iwmc3200wifi: select IWMC3200TOP in Kconfig
From: Zhu Yi @ 2009-09-29  9:22 UTC
  To: Winkler, Tomas
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Perez-Gonzalez, Inaky, Kao, Cindy H, Cohen, Guy, Rindjunsky, Ron
In-Reply-To: <1253662724-16497-3-git-send-email-tomas.winkler-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

On Wed, 2009-09-23 at 07:38 +0800, Winkler, Tomas wrote:
> iwmc3200wifi requires iwmc3200top  for its operation
> 
> Signed-off-by: Tomas Winkler <tomas.winkler-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Acked-by: Zhu Yi <yi.zhu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Thanks,
-yi

> ---
>  drivers/net/wireless/iwmc3200wifi/Kconfig |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/wireless/iwmc3200wifi/Kconfig b/drivers/net/wireless/iwmc3200wifi/Kconfig
> index c62da43..69faaf1 100644
> --- a/drivers/net/wireless/iwmc3200wifi/Kconfig
> +++ b/drivers/net/wireless/iwmc3200wifi/Kconfig
> @@ -3,6 +3,7 @@ config IWM
>  	depends on MMC && WLAN_80211 && EXPERIMENTAL
>  	depends on CFG80211
>  	select FW_LOADER
> +	select IWMC3200TOP
>  	help
>  	  The Intel Wireless Multicomm 3200 hardware is a combo
>  	  card with GPS, Bluetooth, WiMax and 802.11 radios. It



* Re: WARNING: at net/ipv4/af_inet.c:154 inet_sock_destruct
From: Eric Dumazet @ 2009-09-29  9:18 UTC
  To: Francis Moreau
  Cc: Linux Kernel Mailing List, Linux Netdev List, David S. Miller
In-Reply-To: <38b2ab8a0909290109m3f82c161j4fb0f1266152877e@mail.gmail.com>

Francis Moreau wrote:
> Hello,
> 
> I got this kernel warning when stopping nfsd:
> 
> [260104.553720] WARNING: at net/ipv4/af_inet.c:154
> inet_sock_destruct+0x164/0x182()
> [260104.553722] Hardware name: P5K-VM
> [260104.553724] Modules linked in: jfs loop nfsd lockd nfs_acl
> auth_rpcgss exportfs sunrpc [last unloaded: microcode]
> [260104.553736] Pid: 858, comm: nfsd Tainted: G   M       2.6.31 #13
> [260104.553738] Call Trace:
> [260104.553743]  [<ffffffff813ed53a>] ? inet_sock_destruct+0x164/0x182
> [260104.553748]  [<ffffffff81044471>] warn_slowpath_common+0x7c/0xa9
> [260104.553751]  [<ffffffff810444b2>] warn_slowpath_null+0x14/0x16
> [260104.553754]  [<ffffffff813ed53a>] inet_sock_destruct+0x164/0x182
> [260104.553759]  [<ffffffff8138e1c0>] __sk_free+0x23/0xe7
> [260104.553762]  [<ffffffff8138e2fd>] sk_free+0x1f/0x21
> [260104.553765]  [<ffffffff8138e3c7>] sk_common_release+0xc8/0xcd
> [260104.553769]  [<ffffffff813e4459>] udp_lib_close+0xe/0x10
> [260104.553772]  [<ffffffff813ecfe2>] inet_release+0x55/0x5c
> [260104.553775]  [<ffffffff8138b746>] sock_release+0x1f/0x71
> [260104.553778]  [<ffffffff8138b7bf>] sock_close+0x27/0x2b
> [260104.553782]  [<ffffffff810d0641>] __fput+0xfb/0x1c0
> [260104.553787]  [<ffffffff8104a197>] ? local_bh_disable+0x12/0x14
> [260104.553790]  [<ffffffff810d0723>] fput+0x1d/0x1f
> [260104.553810]  [<ffffffffa0014035>] svc_sock_free+0x40/0x56 [sunrpc]
> [260104.553827]  [<ffffffffa001dea0>] svc_xprt_free+0x43/0x53 [sunrpc]
> [260104.553843]  [<ffffffffa001de5d>] ? svc_xprt_free+0x0/0x53 [sunrpc]
> [260104.553847]  [<ffffffff811b4641>] kref_put+0x43/0x4f
> [260104.553863]  [<ffffffffa001d224>] svc_close_xprt+0x55/0x5e [sunrpc]
> [260104.553879]  [<ffffffffa001d27d>] svc_close_all+0x50/0x69 [sunrpc]
> [260104.553894]  [<ffffffffa0012922>] svc_destroy+0x9e/0x142 [sunrpc]
> [260104.553910]  [<ffffffffa0012a7f>] svc_exit_thread+0xb9/0xc2 [sunrpc]
> [260104.553922]  [<ffffffffa00707b1>] ? nfsd+0x0/0x151 [nfsd]
> [260104.553932]  [<ffffffffa00708e8>] nfsd+0x137/0x151 [nfsd]
> [260104.553936]  [<ffffffff8105ad28>] kthread+0x94/0x9c
> [260104.553941]  [<ffffffff8100c1fa>] child_rip+0xa/0x20
> [260104.553944]  [<ffffffff81047b00>] ? do_exit+0x5d7/0x691
> [260104.553948]  [<ffffffff81039cf8>] ? finish_task_switch+0x6a/0xc7
> [260104.553953]  [<ffffffff8100bb6d>] ? restore_args+0x0/0x30
> [260104.553956]  [<ffffffff8105ac94>] ? kthread+0x0/0x9c
> [260104.553959]  [<ffffffff8100c1f0>] ? child_rip+0x0/0x20
> 
> It happens on 2.6.31 and older kernels as well though I don't remember
> when it really started.

Could you please try the following patch?

Thanks

[PATCH] net: Fix sock_wfree() race

Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
opens a window in sock_wfree() where another cpu
might free the socket we are working on.

A fix is to call sk->sk_write_space(sk) while still
holding a reference on sk.


Reported-by: Jike Song <albcamus@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/sock.c |   19 ++++++++++++-------
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 30d5446..e1f034e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1228,17 +1228,22 @@ void __init sk_init(void)
 void sock_wfree(struct sk_buff *skb)
 {
 	struct sock *sk = skb->sk;
-	int res;
+	unsigned int len = skb->truesize;
 
-	/* In case it might be waiting for more memory. */
-	res = atomic_sub_return(skb->truesize, &sk->sk_wmem_alloc);
-	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE))
+	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE)) {
+		/*
+		 * Keep a reference on sk_wmem_alloc, this will be released
+		 * after sk_write_space() call
+		 */
+		atomic_sub(len - 1, &sk->sk_wmem_alloc);
 		sk->sk_write_space(sk);
+		len = 1;
+	}
 	/*
-	 * if sk_wmem_alloc reached 0, we are last user and should
-	 * free this sock, as sk_free() call could not do it.
+	 * if sk_wmem_alloc reaches 0, we must finish what sk_free()
+	 * could not do because of in-flight packets
 	 */
-	if (res == 0)
+	if (atomic_sub_and_test(len, &sk->sk_wmem_alloc))
 		__sk_free(sk);
 }
 EXPORT_SYMBOL(sock_wfree);
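
To illustrate the ordering the patch above relies on, here is a minimal
userspace sketch using C11 atomics. Everything named demo_* is hypothetical
and models only the counter handling; it assumes the counter carries one
extra unit on behalf of the socket owner, and that the owner's release drops
that unit. It is not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock: only what the sketch needs. */
struct demo_sock {
	atomic_uint wmem_alloc;  /* bytes in flight, plus 1 held by the owner */
	bool use_write_queue;    /* models SOCK_USE_WRITE_QUEUE */
	int write_space_calls;   /* counts callback invocations */
	bool freed;              /* set where __sk_free() would run */
};

/* Models sk->sk_write_space(): only legal while the sock is alive. */
static void demo_write_space(struct demo_sock *sk)
{
	assert(!sk->freed);
	sk->write_space_calls++;
}

/* True if subtracting n brought the counter to zero. */
static bool demo_sub_and_test(atomic_uint *v, unsigned int n)
{
	return atomic_fetch_sub(v, n) == n;
}

static void demo_wfree(struct demo_sock *sk, unsigned int truesize)
{
	unsigned int len = truesize;

	if (!sk->use_write_queue) {
		/* Drop all but one unit, so the counter cannot reach zero yet. */
		atomic_fetch_sub(&sk->wmem_alloc, len - 1);
		demo_write_space(sk);
		len = 1;
	}
	/* Release what is left; whoever reaches zero frees the sock. */
	if (demo_sub_and_test(&sk->wmem_alloc, len))
		sk->freed = true;
}
```

The point of the sketch is the order of operations: the callback runs while
this path still owns one unit of the counter, so no other releaser can bring
it to zero in between.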


* Re: [PATCH] /proc/net/tcp, overhead removed
From: Yakov Lerner @ 2009-09-29  8:55 UTC
  To: Eric Dumazet; +Cc: netdev, davem
In-Reply-To: <4AC1BDAD.1010400@gmail.com>

On Tue, Sep 29, 2009 at 10:56, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Yakov Lerner wrote:
> > Take 2.
> >
> > "Sharp improvement in performance of /proc/net/tcp when number of
> > sockets is large and hashsize is large.
> > O(numsock * hashsize) time becomes O(numsock + hashsize). On slow
> > processors, speed difference can be x100 and more."
> >
> > I must say that I'm not fully satisfied with my choice of "st->sbucket"
> > for the new preserved index. The better name would be "st->snum".
> > Re-using "st->sbucket" saves 4 bytes, and keeps the patch to one sourcefile.
> > But "st->sbucket" has different meaning in OPENREQ and LISTEN states;
> > this can be confusing.
> > Maybe better add "snum" member to struct tcp_iter_state ?
> >
> > Shall I change subject when sending "take N+1", or keep the old subject ?
> >
> > Signed-off-by: Yakov Lerner <iler.ml@gmail.com>
> > ---
> >  net/ipv4/tcp_ipv4.c |   35 +++++++++++++++++++++++++++++++++--
> >  1 files changed, 33 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index 7cda24b..e4c4f19 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1994,13 +1994,14 @@ static inline int empty_bucket(struct tcp_iter_state *st)
> >               hlist_nulls_empty(&tcp_hashinfo.ehash[st->bucket].twchain);
> >  }
> >
> > -static void *established_get_first(struct seq_file *seq)
> > +static void *established_get_first_after(struct seq_file *seq, int bucket)
> >  {
> >       struct tcp_iter_state *st = seq->private;
> >       struct net *net = seq_file_net(seq);
> >       void *rc = NULL;
> >
> > -     for (st->bucket = 0; st->bucket < tcp_hashinfo.ehash_size; ++st->bucket) {
> > +     for (st->bucket = bucket; st->bucket < tcp_hashinfo.ehash_size;
> > +          ++st->bucket) {
> >               struct sock *sk;
> >               struct hlist_nulls_node *node;
> >               struct inet_timewait_sock *tw;
> > @@ -2010,6 +2011,8 @@ static void *established_get_first(struct seq_file *seq)
> >               if (empty_bucket(st))
> >                       continue;
> >
> > +             st->sbucket = st->num;
> > +
> >               spin_lock_bh(lock);
> >               sk_nulls_for_each(sk, node, &tcp_hashinfo.ehash[st->bucket].chain) {
> >                       if (sk->sk_family != st->family ||
> > @@ -2036,6 +2039,11 @@ out:
> >       return rc;
> >  }
> >
> > +static void *established_get_first(struct seq_file *seq)
> > +{
> > +     return established_get_first_after(seq, 0);
> > +}
> > +
> >  static void *established_get_next(struct seq_file *seq, void *cur)
> >  {
> >       struct sock *sk = cur;
> > @@ -2064,6 +2072,9 @@ get_tw:
> >               while (++st->bucket < tcp_hashinfo.ehash_size &&
> >                               empty_bucket(st))
> >                       ;
> > +
> > +             st->sbucket = st->num;
> > +
> >               if (st->bucket >= tcp_hashinfo.ehash_size)
> >                       return NULL;
> >
> > @@ -2107,6 +2118,7 @@ static void *tcp_get_idx(struct seq_file *seq, loff_t pos)
> >
> >       if (!rc) {
> >               st->state = TCP_SEQ_STATE_ESTABLISHED;
> > +             st->sbucket = 0;
> >               rc        = established_get_idx(seq, pos);
> >       }
> >
> > @@ -2116,6 +2128,25 @@ static void *tcp_get_idx(struct seq_file *seq, loff_t pos)
> >  static void *tcp_seq_start(struct seq_file *seq, loff_t *pos)
> >  {
> >       struct tcp_iter_state *st = seq->private;
> > +
> > +     if (*pos && *pos >= st->sbucket &&
> > +         (st->state == TCP_SEQ_STATE_ESTABLISHED ||
> > +          st->state == TCP_SEQ_STATE_TIME_WAIT)) {
> > +             void *cur;
> > +             int nskip;
> > +
> > +             /* for states estab and tw, st->sbucket is index (*pos) */
> > +             /* corresponding to the beginning of bucket st->bucket */
> > +
> > +             st->num = st->sbucket;
> > +             /* jump to st->bucket, then skip (*pos - st->sbucket) items */
> > +             st->state = TCP_SEQ_STATE_ESTABLISHED;
> > +             cur = established_get_first_after(seq, st->bucket);
> > +             for (nskip = *pos - st->num; cur && nskip > 0; --nskip)
> > +                     cur = established_get_next(seq, cur);
> > +             return cur;
> > +     }
> > +
> >       st->state = TCP_SEQ_STATE_LISTENING;
> >       st->num = 0;
> >       return *pos ? tcp_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
>
> Just in case you are working on "take 3" of the patch, there is a fundamental problem.
>
> All the scalability problems come from the fact that tcp_seq_start()
> *has* to rescan all the tables from the beginning, because of the lseek() capability
> on the /proc/net/tcp file.
>
> We probably could disable llseek() (on other positions than start of the file),
> and rely only on internal state (listening/established hashtable, hash bucket, position in chain)
>
> I cannot imagine how an application could rely on lseek() on >0 position in this file.


I thought /proc/net/tcp could be both fast and allow lseek():
(1) when no lseek() was issued since the last read
(we can detect this), /proc/net/tcp can jump to the
last known bucket (the common case), vs.
(2) switch to slow mode (scan from the beginning of the hash)
when lseek() was used, no?
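
A minimal userspace sketch of the two modes described above, under stated
assumptions: the table is a fixed array of per-bucket entry counts, it does
not change between reads (the real hash table can), and all demo_* names are
hypothetical. A read that continues exactly where the previous one stopped
resumes from the remembered bucket; any other position is treated as an
lseek() and rescans from bucket 0.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NBUCKETS 8

/* Hypothetical table: count[b] is the number of entries in bucket b. */
struct demo_table {
	int count[DEMO_NBUCKETS];
};

/* Hypothetical iterator state kept between reads. */
struct demo_iter {
	int bucket;      /* bucket holding the last returned entry */
	long bucket_pos; /* position of that bucket's first entry */
	long next_pos;   /* position expected if reads are sequential */
	long steps;      /* buckets visited, to compare the two paths */
};

/*
 * Walk forward from (bucket, bucket_pos) to the entry at position pos.
 * Returns the bucket containing pos, or -1 if pos is past the end.
 */
static int demo_walk(const struct demo_table *t, struct demo_iter *it,
		     int bucket, long bucket_pos, long pos)
{
	for (; bucket < DEMO_NBUCKETS; bucket++) {
		it->steps++;
		if (pos < bucket_pos + t->count[bucket]) {
			it->bucket = bucket;
			it->bucket_pos = bucket_pos;
			it->next_pos = pos + 1;
			return bucket;
		}
		bucket_pos += t->count[bucket];
	}
	it->next_pos = -1; /* no valid resume point */
	return -1;
}

static int demo_seek(const struct demo_table *t, struct demo_iter *it,
		     long pos)
{
	/* Fast mode: pos continues the previous read, resume in place. */
	if (pos > 0 && pos == it->next_pos)
		return demo_walk(t, it, it->bucket, it->bucket_pos, pos);
	/* Slow mode: first read or a seek happened, rescan from bucket 0. */
	return demo_walk(t, it, 0, 0, pos);
}
```

With sequential reads, each call visits only the buckets between the previous
entry and the next one, so a full pass costs on the order of the number of
entries plus the number of buckets rather than their product.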


* Re: [PATCH 2.6.31-rc9] net: VMware virtual Ethernet NIC driver: vmxnet3
From: Chris Wright @ 2009-09-29  8:53 UTC
  To: Shreyas Bhatewara
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Stephen Hemminger, David S. Miller, Jeff Garzik, Anthony Liguori,
	Chris Wright, Greg Kroah-Hartman, Andrew Morton, virtualization,
	pv-drivers@vmware.com
In-Reply-To: <89E2752CFA8EC044846EB849981913410173CDFAF6@EXCH-MBX-4.vmware.com>

* Shreyas Bhatewara (sbhatewara@vmware.com) wrote:
> Some of the features of vmxnet3 are :
>         PCIe 2.0 compliant PCI device: Vendor ID 0x15ad, Device ID 0x07b0
>         INTx, MSI, MSI-X (25 vectors) interrupts
>         16 Rx queues, 8 Tx queues

Driver doesn't appear to actually support more than a single MSI-X interrupt.
What is your plan for doing real multiqueue?

>         Offloads: TCP/UDP checksum, TSO over IPv4/IPv6,
>                     802.1q VLAN tag insertion, filtering, stripping
>                     Multicast filtering, Jumbo Frames

How about GRO conversion?

>         Wake-on-LAN, PCI Power Management D0-D3 states
>         PXE-ROM for boot support
> 

Whole thing appears to be space indented, and is fairly noisy w/ printk.
Also, heavy use of BUG_ON() (counted 51 of them), are you sure that none
of them can be triggered by guest or remote (esp. the ones that happen
in interrupt context)?  Some initial thoughts below.

<snip>
> diff --git a/drivers/net/vmxnet3/upt1_defs.h b/drivers/net/vmxnet3/upt1_defs.h
> new file mode 100644
> index 0000000..b50f91b
> --- /dev/null
> +++ b/drivers/net/vmxnet3/upt1_defs.h
> @@ -0,0 +1,104 @@
> +/*
> + * Linux driver for VMware's vmxnet3 ethernet NIC.
> + *
> + * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; version 2 of the License and no later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Maintained by: Shreyas Bhatewara <pv-drivers@vmware.com>
> + *
> + */
> +
> +/* upt1_defs.h
> + *
> + *      Definitions for Uniform Pass Through.
> + */

Most of the source files have this format (some include -- after file
name).  Could just keep it all w/in the same comment block.  Since you
went to the trouble of saying what the file does, something a tad more
descriptive would be welcome.

> +
> +#ifndef _UPT1_DEFS_H
> +#define _UPT1_DEFS_H
> +
> +#define UPT1_MAX_TX_QUEUES  64
> +#define UPT1_MAX_RX_QUEUES  64

This is different than the 16/8 described above (and seemingly all moot
since it becomes a single queue device).

> +
> +/* interrupt moderation level */
> +#define UPT1_IML_NONE     0 /* no interrupt moderation */
> +#define UPT1_IML_HIGHEST  7 /* least intr generated */
> +#define UPT1_IML_ADAPTIVE 8 /* adpative intr moderation */

enum?  also only appears to support adaptive mode?

> +/* values for UPT1_RSSConf.hashFunc */
> +enum {
> +       UPT1_RSS_HASH_TYPE_NONE      = 0x0,
> +       UPT1_RSS_HASH_TYPE_IPV4      = 0x01,
> +       UPT1_RSS_HASH_TYPE_TCP_IPV4  = 0x02,
> +       UPT1_RSS_HASH_TYPE_IPV6      = 0x04,
> +       UPT1_RSS_HASH_TYPE_TCP_IPV6  = 0x08,
> +};
> +
> +enum {
> +       UPT1_RSS_HASH_FUNC_NONE      = 0x0,
> +       UPT1_RSS_HASH_FUNC_TOEPLITZ  = 0x01,
> +};
> +
> +#define UPT1_RSS_MAX_KEY_SIZE        40
> +#define UPT1_RSS_MAX_IND_TABLE_SIZE  128
> +
> +struct UPT1_RSSConf {
> +       uint16_t   hashType;
> +       uint16_t   hashFunc;
> +       uint16_t   hashKeySize;
> +       uint16_t   indTableSize;
> +       uint8_t    hashKey[UPT1_RSS_MAX_KEY_SIZE];
> +       uint8_t    indTable[UPT1_RSS_MAX_IND_TABLE_SIZE];
> +};
> +
> +/* features */
> +enum {
> +       UPT1_F_RXCSUM      = 0x0001,   /* rx csum verification */
> +       UPT1_F_RSS         = 0x0002,
> +       UPT1_F_RXVLAN      = 0x0004,   /* VLAN tag stripping */
> +       UPT1_F_LRO         = 0x0008,
> +};
> +#endif
> diff --git a/drivers/net/vmxnet3/vmxnet3_defs.h b/drivers/net/vmxnet3/vmxnet3_defs.h
> new file mode 100644
> index 0000000..a33a90b
> --- /dev/null
> +++ b/drivers/net/vmxnet3/vmxnet3_defs.h
> @@ -0,0 +1,534 @@
> +/*
> + * Linux driver for VMware's vmxnet3 ethernet NIC.
> + *
> + * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; version 2 of the License and no later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Maintained by: Shreyas Bhatewara <pv-drivers@vmware.com>
> + *
> + */
> +
> +/*
> + * vmxnet3_defs.h --

Not particularly useful ;-)

> + */
> +
> +#ifndef _VMXNET3_DEFS_H_
> +#define _VMXNET3_DEFS_H_
> +
> +#include "upt1_defs.h"
> +
> +/* all registers are 32 bit wide */
> +/* BAR 1 */
> +enum {
> +       VMXNET3_REG_VRRS  = 0x0,        /* Vmxnet3 Revision Report Selection */
> +       VMXNET3_REG_UVRS  = 0x8,        /* UPT Version Report Selection */
> +       VMXNET3_REG_DSAL  = 0x10,       /* Driver Shared Address Low */
> +       VMXNET3_REG_DSAH  = 0x18,       /* Driver Shared Address High */
> +       VMXNET3_REG_CMD   = 0x20,       /* Command */
> +       VMXNET3_REG_MACL  = 0x28,       /* MAC Address Low */
> +       VMXNET3_REG_MACH  = 0x30,       /* MAC Address High */
> +       VMXNET3_REG_ICR   = 0x38,       /* Interrupt Cause Register */
> +       VMXNET3_REG_ECR   = 0x40        /* Event Cause Register */
> +};
> +
> +/* BAR 0 */
> +enum {
> +       VMXNET3_REG_IMR      = 0x0,     /* Interrupt Mask Register */
> +       VMXNET3_REG_TXPROD   = 0x600,   /* Tx Producer Index */
> +       VMXNET3_REG_RXPROD   = 0x800,   /* Rx Producer Index for ring 1 */
> +       VMXNET3_REG_RXPROD2  = 0xA00    /* Rx Producer Index for ring 2 */
> +};
> +
> +#define VMXNET3_PT_REG_SIZE     4096   /* BAR 0 */
> +#define VMXNET3_VD_REG_SIZE     4096   /* BAR 1 */
> +
> +#define VMXNET3_REG_ALIGN       8      /* All registers are 8-byte aligned. */
> +#define VMXNET3_REG_ALIGN_MASK  0x7
> +
> +/* I/O Mapped access to registers */
> +#define VMXNET3_IO_TYPE_PT              0
> +#define VMXNET3_IO_TYPE_VD              1
> +#define VMXNET3_IO_ADDR(type, reg)      (((type) << 24) | ((reg) & 0xFFFFFF))
> +#define VMXNET3_IO_TYPE(addr)           ((addr) >> 24)
> +#define VMXNET3_IO_REG(addr)            ((addr) & 0xFFFFFF)
> +
> +enum {
> +       VMXNET3_CMD_FIRST_SET = 0xCAFE0000,
> +       VMXNET3_CMD_ACTIVATE_DEV = VMXNET3_CMD_FIRST_SET,
> +       VMXNET3_CMD_QUIESCE_DEV,
> +       VMXNET3_CMD_RESET_DEV,
> +       VMXNET3_CMD_UPDATE_RX_MODE,
> +       VMXNET3_CMD_UPDATE_MAC_FILTERS,
> +       VMXNET3_CMD_UPDATE_VLAN_FILTERS,
> +       VMXNET3_CMD_UPDATE_RSSIDT,
> +       VMXNET3_CMD_UPDATE_IML,
> +       VMXNET3_CMD_UPDATE_PMCFG,
> +       VMXNET3_CMD_UPDATE_FEATURE,
> +       VMXNET3_CMD_LOAD_PLUGIN,
> +
> +       VMXNET3_CMD_FIRST_GET = 0xF00D0000,
> +       VMXNET3_CMD_GET_QUEUE_STATUS = VMXNET3_CMD_FIRST_GET,
> +       VMXNET3_CMD_GET_STATS,
> +       VMXNET3_CMD_GET_LINK,
> +       VMXNET3_CMD_GET_PERM_MAC_LO,
> +       VMXNET3_CMD_GET_PERM_MAC_HI,
> +       VMXNET3_CMD_GET_DID_LO,
> +       VMXNET3_CMD_GET_DID_HI,
> +       VMXNET3_CMD_GET_DEV_EXTRA_INFO,
> +       VMXNET3_CMD_GET_CONF_INTR
> +};
> +
> +struct Vmxnet3_TxDesc {
> +       uint64_t addr;
> +
> +       uint32_t len:14;
> +       uint32_t gen:1;      /* generation bit */
> +       uint32_t rsvd:1;
> +       uint32_t dtype:1;    /* descriptor type */
> +       uint32_t ext1:1;
> +       uint32_t msscof:14;  /* MSS, checksum offset, flags */
> +
> +       uint32_t hlen:10;    /* header len */
> +       uint32_t om:2;       /* offload mode */
> +       uint32_t eop:1;      /* End Of Packet */
> +       uint32_t cq:1;       /* completion request */
> +       uint32_t ext2:1;
> +       uint32_t ti:1;       /* VLAN Tag Insertion */
> +       uint32_t tci:16;     /* Tag to Insert */
> +};
> +
> +/* TxDesc.OM values */
> +#define VMXNET3_OM_NONE  0
> +#define VMXNET3_OM_CSUM  2
> +#define VMXNET3_OM_TSO   3
> +
> +/* fields in TxDesc we access w/o using bit fields */
> +#define VMXNET3_TXD_EOP_SHIFT 12
> +#define VMXNET3_TXD_CQ_SHIFT  13
> +#define VMXNET3_TXD_GEN_SHIFT 14
> +
> +#define VMXNET3_TXD_CQ  (1 << VMXNET3_TXD_CQ_SHIFT)
> +#define VMXNET3_TXD_EOP (1 << VMXNET3_TXD_EOP_SHIFT)
> +#define VMXNET3_TXD_GEN (1 << VMXNET3_TXD_GEN_SHIFT)
> +
> +#define VMXNET3_HDR_COPY_SIZE   128
> +
> +
> +struct Vmxnet3_TxDataDesc {
> +       uint8_t data[VMXNET3_HDR_COPY_SIZE];
> +};
> +
> +
> +struct Vmxnet3_TxCompDesc {
> +       uint32_t txdIdx:12;    /* Index of the EOP TxDesc */
> +       uint32_t ext1:20;
> +
> +       uint32_t ext2;
> +       uint32_t ext3;
> +
> +       uint32_t rsvd:24;
> +       uint32_t type:7;       /* completion type */
> +       uint32_t gen:1;        /* generation bit */
> +};
> +
> +
> +struct Vmxnet3_RxDesc {
> +       uint64_t addr;
> +
> +       uint32_t len:14;
> +       uint32_t btype:1;      /* Buffer Type */
> +       uint32_t dtype:1;      /* Descriptor type */
> +       uint32_t rsvd:15;
> +       uint32_t gen:1;        /* Generation bit */
> +
> +       uint32_t ext1;
> +};
> +
> +/* values of RXD.BTYPE */
> +#define VMXNET3_RXD_BTYPE_HEAD   0    /* head only */
> +#define VMXNET3_RXD_BTYPE_BODY   1    /* body only */
> +
> +/* fields in RxDesc we access w/o using bit fields */
> +#define VMXNET3_RXD_BTYPE_SHIFT  14
> +#define VMXNET3_RXD_GEN_SHIFT    31
> +
> +
> +struct Vmxnet3_RxCompDesc {
> +       uint32_t rxdIdx:12;    /* Index of the RxDesc */
> +       uint32_t ext1:2;
> +       uint32_t eop:1;        /* End of Packet */
> +       uint32_t sop:1;        /* Start of Packet */
> +       uint32_t rqID:10;      /* rx queue/ring ID */
> +       uint32_t rssType:4;    /* RSS hash type used */
> +       uint32_t cnc:1;        /* Checksum Not Calculated */
> +       uint32_t ext2:1;
> +
> +       uint32_t rssHash;      /* RSS hash value */
> +
> +       uint32_t len:14;       /* data length */
> +       uint32_t err:1;        /* Error */
> +       uint32_t ts:1;         /* Tag is stripped */
> +       uint32_t tci:16;       /* Tag stripped */
> +
> +       uint32_t csum:16;
> +       uint32_t tuc:1;        /* TCP/UDP Checksum Correct */
> +       uint32_t udp:1;        /* UDP packet */
> +       uint32_t tcp:1;        /* TCP packet */
> +       uint32_t ipc:1;        /* IP Checksum Correct */
> +       uint32_t v6:1;         /* IPv6 */
> +       uint32_t v4:1;         /* IPv4 */
> +       uint32_t frg:1;        /* IP Fragment */
> +       uint32_t fcs:1;        /* Frame CRC correct */
> +       uint32_t type:7;       /* completion type */
> +       uint32_t gen:1;        /* generation bit */
> +};
> +
> +/* fields in RxCompDesc we access via Vmxnet3_GenericDesc.dword[3] */
> +#define VMXNET3_RCD_TUC_SHIFT  16
> +#define VMXNET3_RCD_IPC_SHIFT  19
> +
> +/* fields in RxCompDesc we access via Vmxnet3_GenericDesc.qword[1] */
> +#define VMXNET3_RCD_TYPE_SHIFT 56
> +#define VMXNET3_RCD_GEN_SHIFT  63
> +
> +/* csum OK for TCP/UDP pkts over IP */
> +#define VMXNET3_RCD_CSUM_OK (1 << VMXNET3_RCD_TUC_SHIFT | \
> +                            1 << VMXNET3_RCD_IPC_SHIFT)
> +
> +/* value of RxCompDesc.rssType */
> +enum {
> +       VMXNET3_RCD_RSS_TYPE_NONE     = 0,
> +       VMXNET3_RCD_RSS_TYPE_IPV4     = 1,
> +       VMXNET3_RCD_RSS_TYPE_TCPIPV4  = 2,
> +       VMXNET3_RCD_RSS_TYPE_IPV6     = 3,
> +       VMXNET3_RCD_RSS_TYPE_TCPIPV6  = 4,
> +};
> +
> +/* a union for accessing all cmd/completion descriptors */
> +union Vmxnet3_GenericDesc {
> +       uint64_t                        qword[2];
> +       uint32_t                        dword[4];
> +       uint16_t                        word[8];
> +       struct Vmxnet3_TxDesc           txd;
> +       struct Vmxnet3_RxDesc           rxd;
> +       struct Vmxnet3_TxCompDesc       tcd;
> +       struct Vmxnet3_RxCompDesc       rcd;
> +};
> +
> +#define VMXNET3_INIT_GEN       1
> +
> +/* Max size of a single tx buffer */
> +#define VMXNET3_MAX_TX_BUF_SIZE  (1 << 14)
> +
> +/* # of tx desc needed for a tx buffer size */
> +#define VMXNET3_TXD_NEEDED(size) (((size) + VMXNET3_MAX_TX_BUF_SIZE - 1) / \
> +                                 VMXNET3_MAX_TX_BUF_SIZE)
> +
> +/* max # of tx descs for a non-tso pkt */
> +#define VMXNET3_MAX_TXD_PER_PKT 16
> +
> +/* Max size of a single rx buffer */
> +#define VMXNET3_MAX_RX_BUF_SIZE  ((1 << 14) - 1)
> +/* Minimum size of a type 0 buffer */
> +#define VMXNET3_MIN_T0_BUF_SIZE  128
> +#define VMXNET3_MAX_CSUM_OFFSET  1024
> +
> +/* Ring base address alignment */
> +#define VMXNET3_RING_BA_ALIGN   512
> +#define VMXNET3_RING_BA_MASK    (VMXNET3_RING_BA_ALIGN - 1)
> +
> +/* Ring size must be a multiple of 32 */
> +#define VMXNET3_RING_SIZE_ALIGN 32
> +#define VMXNET3_RING_SIZE_MASK  (VMXNET3_RING_SIZE_ALIGN - 1)
> +
> +/* Max ring size */
> +#define VMXNET3_TX_RING_MAX_SIZE   4096
> +#define VMXNET3_TC_RING_MAX_SIZE   4096
> +#define VMXNET3_RX_RING_MAX_SIZE   4096
> +#define VMXNET3_RC_RING_MAX_SIZE   8192
> +
> +/* a list of reasons for queue stop */
> +
> +enum {
> + VMXNET3_ERR_NOEOP        = 0x80000000,  /* cannot find the EOP desc of a pkt */
> + VMXNET3_ERR_TXD_REUSE    = 0x80000001,  /* reuse TxDesc before tx completion */
> + VMXNET3_ERR_BIG_PKT      = 0x80000002,  /* too many TxDesc for a pkt */
> + VMXNET3_ERR_DESC_NOT_SPT = 0x80000003,  /* descriptor type not supported */
> + VMXNET3_ERR_SMALL_BUF    = 0x80000004,  /* type 0 buffer too small */
> + VMXNET3_ERR_STRESS       = 0x80000005,  /* stress option firing in vmkernel */
> + VMXNET3_ERR_SWITCH       = 0x80000006,  /* mode switch failure */
> + VMXNET3_ERR_TXD_INVALID  = 0x80000007,  /* invalid TxDesc */
> +};
> +
> +/* completion descriptor types */
> +#define VMXNET3_CDTYPE_TXCOMP      0    /* Tx Completion Descriptor */
> +#define VMXNET3_CDTYPE_RXCOMP      3    /* Rx Completion Descriptor */
> +
> +enum {
> +       VMXNET3_GOS_BITS_UNK    = 0,   /* unknown */
> +       VMXNET3_GOS_BITS_32     = 1,
> +       VMXNET3_GOS_BITS_64     = 2,
> +};
> +
> +#define VMXNET3_GOS_TYPE_LINUX 1
> +
> +/* All structures in DriverShared are padded to multiples of 8 bytes */
> +
> +
> +struct Vmxnet3_GOSInfo {
> +       uint32_t   gosBits:2;   /* 32-bit or 64-bit? */
> +       uint32_t   gosType:4;   /* which guest */
> +       uint32_t   gosVer:16;   /* gos version */
> +       uint32_t   gosMisc:10;  /* other info about gos */
> +};
> +
> +
> +struct Vmxnet3_DriverInfo {
> +       uint32_t          version;        /* driver version */
> +       struct Vmxnet3_GOSInfo gos;
> +       uint32_t          vmxnet3RevSpt;  /* vmxnet3 revision supported */
> +       uint32_t          uptVerSpt;      /* upt version supported */
> +};
> +
> +#define VMXNET3_REV1_MAGIC  0xbabefee1
> 
> +
> +/*
> + * QueueDescPA must be 128 bytes aligned. It points to an array of
> + * Vmxnet3_TxQueueDesc followed by an array of Vmxnet3_RxQueueDesc.
> + * The number of Vmxnet3_TxQueueDesc/Vmxnet3_RxQueueDesc are specified by
> + * Vmxnet3_MiscConf.numTxQueues/numRxQueues, respectively.
> + */
> +#define VMXNET3_QUEUE_DESC_ALIGN  128

Lot of inconsistent spacing between types and names in the structure def'ns

> +struct Vmxnet3_MiscConf {
> +       struct Vmxnet3_DriverInfo driverInfo;
> +       uint64_t             uptFeatures;
> +       uint64_t             ddPA;         /* driver data PA */
> +       uint64_t             queueDescPA;  /* queue descriptor table PA */
> +       uint32_t             ddLen;        /* driver data len */
> +       uint32_t             queueDescLen; /* queue desc. table len in bytes */
> +       uint32_t             mtu;
> +       uint16_t             maxNumRxSG;
> +       uint8_t              numTxQueues;
> +       uint8_t              numRxQueues;
> +       uint32_t             reserved[4];
> +};

Should this be packed (and the others that are shared w/ the device)?  I
assume you've already checked the 32-bit vs 64-bit layout here.
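
One way to make the layout question checkable at build time is a set of
compile-time assertions. The sketch below is plain C11 with a hypothetical
struct, demo_txq_conf, whose field order mirrors Vmxnet3_TxQueueConf from
the patch; the expected offsets are computed from that field order and are
an assumption of the sketch, not something taken from a device
specification. In kernel code BUILD_BUG_ON() would play the same role.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of a structure shared with the device. */
struct demo_txq_conf {
	uint64_t tx_ring_base_pa;
	uint64_t data_ring_base_pa;
	uint64_t comp_ring_base_pa;
	uint64_t dd_pa;
	uint64_t reserved;
	uint32_t tx_ring_size;
	uint32_t data_ring_size;
	uint32_t comp_ring_size;
	uint32_t dd_len;
	uint8_t  intr_idx;
	uint8_t  pad[7];   /* explicit padding up to a multiple of 8 */
};

/* The build fails if a compiler or ABI lays the struct out differently. */
_Static_assert(sizeof(struct demo_txq_conf) == 64,
	       "demo_txq_conf must be 64 bytes");
_Static_assert(offsetof(struct demo_txq_conf, tx_ring_size) == 40,
	       "32-bit fields must start at offset 40");
_Static_assert(offsetof(struct demo_txq_conf, intr_idx) == 56,
	       "intr_idx must sit at offset 56");

/* Runtime accessor so callers can also check the size. */
static size_t demo_txq_conf_size(void)
{
	return sizeof(struct demo_txq_conf);
}
```

Because every field sits at a naturally aligned offset and the tail is padded
by hand, the assertions should hold without a packed attribute on common
32-bit and 64-bit ABIs; if one of them fails, the build fails instead of the
device reading a shifted field.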

> +struct Vmxnet3_TxQueueConf {
> +       uint64_t    txRingBasePA;
> +       uint64_t    dataRingBasePA;
> +       uint64_t    compRingBasePA;
> +       uint64_t    ddPA;         /* driver data */
> +       uint64_t    reserved;
> +       uint32_t    txRingSize;   /* # of tx desc */
> +       uint32_t    dataRingSize; /* # of data desc */
> +       uint32_t    compRingSize; /* # of comp desc */
> +       uint32_t    ddLen;        /* size of driver data */
> +       uint8_t     intrIdx;
> +       uint8_t     _pad[7];
> +};
> +
> +
> +struct Vmxnet3_RxQueueConf {
> +       uint64_t    rxRingBasePA[2];
> +       uint64_t    compRingBasePA;
> +       uint64_t    ddPA;            /* driver data */
> +       uint64_t    reserved;
> +       uint32_t    rxRingSize[2];   /* # of rx desc */
> +       uint32_t    compRingSize;    /* # of rx comp desc */
> +       uint32_t    ddLen;           /* size of driver data */
> +       uint8_t     intrIdx;
> +       uint8_t     _pad[7];
> +};
> +
> +enum vmxnet3_intr_mask_mode {
> +       VMXNET3_IMM_AUTO   = 0,
> +       VMXNET3_IMM_ACTIVE = 1,
> +       VMXNET3_IMM_LAZY   = 2
> +};
> +
> +enum vmxnet3_intr_type {
> +       VMXNET3_IT_AUTO = 0,
> +       VMXNET3_IT_INTX = 1,
> +       VMXNET3_IT_MSI  = 2,
> +       VMXNET3_IT_MSIX = 3
> +};
> +
> +#define VMXNET3_MAX_TX_QUEUES  8
> +#define VMXNET3_MAX_RX_QUEUES  16

different to UPT, I must've missed some layering here

> +/* addition 1 for events */
> +#define VMXNET3_MAX_INTRS      25
> +
> +
<snip>

> --- /dev/null
> +++ b/drivers/net/vmxnet3/vmxnet3_drv.c
> @@ -0,0 +1,2608 @@
> +/*
> + * Linux driver for VMware's vmxnet3 ethernet NIC.
<snip>
> +/*
> + * vmxnet3_drv.c --
> + *
> + *      Linux driver for VMware's vmxnet3 NIC
> + */

Not useful

> +static void
> +vmxnet3_enable_intr(struct vmxnet3_adapter *adapter, unsigned intr_idx)
> +{
> +       VMXNET3_WRITE_BAR0_REG(adapter, VMXNET3_REG_IMR + intr_idx * 8, 0);

	writel(0, adapter->hw_addr0 + VMXNET3_REG_IMR + intr_idx * 8)

seems just as clear to me.

> +vmxnet3_enable_all_intrs(struct vmxnet3_adapter *adapter)
> +{
> +       int i;
> +
> +       for (i = 0; i < adapter->intr.num_intrs; i++)
> +               vmxnet3_enable_intr(adapter, i);
> +}
> +
> +static void
> +vmxnet3_disable_all_intrs(struct vmxnet3_adapter *adapter)
> +{
> +       int i;
> +
> +       for (i = 0; i < adapter->intr.num_intrs; i++)
> +               vmxnet3_disable_intr(adapter, i);
> +}

num_intrs is only ever 1, so is there a plan to bump this up and make
these wrappers useful?

> +static void
> +vmxnet3_ack_events(struct vmxnet3_adapter *adapter, u32 events)
> +{
> +       VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_ECR, events);
> +}
> +
> +
> +static bool
> +vmxnet3_tq_stopped(struct vmxnet3_tx_queue *tq, struct vmxnet3_adapter *adapter)
> +{
> +       return netif_queue_stopped(adapter->netdev);
> +}
> +
> +
> +static void
> +vmxnet3_tq_start(struct vmxnet3_tx_queue *tq, struct vmxnet3_adapter  *adapter)
> +{
> +       tq->stopped = false;

is tq->stopped used besides just toggling back and forth?

> +       netif_start_queue(adapter->netdev);
> +}

> +static void
> +vmxnet3_process_events(struct vmxnet3_adapter *adapter)

Should be trivial to break this out to its own MSI-X vector; it's basically
set up to do that already.

> +{
> +       u32 events = adapter->shared->ecr;
> +       if (!events)
> +               return;
> +
> +       vmxnet3_ack_events(adapter, events);
> +
> +       /* Check if link state has changed */
> +       if (events & VMXNET3_ECR_LINK)
> +               vmxnet3_check_link(adapter);
> +
> +       /* Check if there is an error on xmit/recv queues */
> +       if (events & (VMXNET3_ECR_TQERR | VMXNET3_ECR_RQERR)) {
> +               VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
> +                                      VMXNET3_CMD_GET_QUEUE_STATUS);
> +
> +               if (adapter->tqd_start->status.stopped) {
> +                       printk(KERN_ERR "%s: tq error 0x%x\n",
> +                              adapter->netdev->name,
> +                              adapter->tqd_start->status.error);
> +               }
> +               if (adapter->rqd_start->status.stopped) {
> +                       printk(KERN_ERR "%s: rq error 0x%x\n",
> +                              adapter->netdev->name,
> +                              adapter->rqd_start->status.error);
> +               }
> +
> +               schedule_work(&adapter->work);
> +       }
> +}
<snip>

> +
> +       tq->buf_info = kcalloc(sizeof(tq->buf_info[0]), tq->tx_ring.size,
> +                              GFP_KERNEL);

kcalloc args look backwards (element count comes first, then element size)
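
For reference, a userspace sketch of the expected argument order, using calloc() as a stand-in for kcalloc(): element count first, element size second. struct buf_info and alloc_buf_info() are hypothetical names used only for illustration; they are not taken from the driver.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-descriptor bookkeeping. */
struct buf_info {
	unsigned int map_type;
	unsigned int len;
};

/* Count first, element size second, matching calloc(nmemb, size). */
static struct buf_info *alloc_buf_info(size_t ring_size)
{
	return calloc(ring_size, sizeof(struct buf_info));
}
```

Swapping the two arguments yields the same number of bytes, so this is a readability and convention issue rather than a functional bug.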

<snip>
> +static int
> +vmxnet3_alloc_pci_resources(struct vmxnet3_adapter *adapter, bool *dma64)
> +{
> +       int err;
> +       unsigned long mmio_start, mmio_len;
> +       struct pci_dev *pdev = adapter->pdev;
> +
> +       err = pci_enable_device(pdev);

looks ioport free, can be pci_enable_device_mem()...

> +       if (err) {
> +               printk(KERN_ERR "Failed to enable adapter %s: error %d\n",
> +                      pci_name(pdev), err);
> +               return err;
> +       }
> +
> +       if (pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) == 0) {
> +               if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64)) != 0) {
> +                       printk(KERN_ERR "pci_set_consistent_dma_mask failed "
> +                              "for adapter %s\n", pci_name(pdev));
> +                       err = -EIO;
> +                       goto err_set_mask;
> +               }
> +               *dma64 = true;
> +       } else {
> +               if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32)) != 0) {
> +                       printk(KERN_ERR "pci_set_dma_mask failed for adapter "
> +                              "%s\n",  pci_name(pdev));
> +                       err = -EIO;
> +                       goto err_set_mask;
> +               }
> +               *dma64 = false;
> +       }
> +
> +       err = pci_request_regions(pdev, vmxnet3_driver_name);

...pci_request_selected_regions()

> +       if (err) {
> +               printk(KERN_ERR "Failed to request region for adapter %s: "
> +                      "error %d\n", pci_name(pdev), err);
> +               goto err_set_mask;
> +       }
> +
> +       pci_set_master(pdev);
> +
> +       mmio_start = pci_resource_start(pdev, 0);
> +       mmio_len = pci_resource_len(pdev, 0);
> +       adapter->hw_addr0 = ioremap(mmio_start, mmio_len);
> +       if (!adapter->hw_addr0) {
> +               printk(KERN_ERR "Failed to map bar0 for adapter %s\n",
> +                      pci_name(pdev));
> +               err = -EIO;
> +               goto err_ioremap;
> +       }
> +
> +       mmio_start = pci_resource_start(pdev, 1);
> +       mmio_len = pci_resource_len(pdev, 1);
> +       adapter->hw_addr1 = ioremap(mmio_start, mmio_len);
> +       if (!adapter->hw_addr1) {
> +               printk(KERN_ERR "Failed to map bar1 for adapter %s\n",
> +                      pci_name(pdev));
> +               err = -EIO;
> +               goto err_bar1;
> +       }
> +       return 0;
> +
> +err_bar1:
> +       iounmap(adapter->hw_addr0);
> +err_ioremap:
> +       pci_release_regions(pdev);

...and pci_release_selected_regions()

> +err_set_mask:
> +       pci_disable_device(pdev);
> +       return err;
> +}
> +

<snip>
> +vmxnet3_declare_features(struct vmxnet3_adapter *adapter, bool dma64)
> +{
> +       struct net_device *netdev = adapter->netdev;
> +
> +       netdev->features = NETIF_F_SG |
> +               NETIF_F_HW_CSUM |
> +               NETIF_F_HW_VLAN_TX |
> +               NETIF_F_HW_VLAN_RX |
> +               NETIF_F_HW_VLAN_FILTER |
> +               NETIF_F_TSO |
> +               NETIF_F_TSO6;
> +
> +       printk(KERN_INFO "features: sg csum vlan jf tso tsoIPv6");
> +
> +       adapter->rxcsum = true;
> +       adapter->jumbo_frame = true;
> +
> +       if (!disable_lro) {
> +               adapter->lro = true;
> +               printk(" lro");
> +       }

Plan to switch to GRO?

> +       if (dma64) {
> +               netdev->features |= NETIF_F_HIGHDMA;
> +               printk(" highDMA");
> +       }
> +
> +       netdev->vlan_features = netdev->features;
> +       printk("\n");
> +}
> +
> +static int __devinit
> +vmxnet3_probe_device(struct pci_dev *pdev,
> +                    const struct pci_device_id *id)
> +{
> +       static const struct net_device_ops vmxnet3_netdev_ops = {
> +               .ndo_open  = vmxnet3_open,
> +               .ndo_stop  = vmxnet3_close,
> +               .ndo_start_xmit = vmxnet3_xmit_frame,
> +               .ndo_set_mac_address = vmxnet3_set_mac_addr,
> +               .ndo_change_mtu = vmxnet3_change_mtu,
> +               .ndo_get_stats = vmxnet3_get_stats,
> +               .ndo_tx_timeout = vmxnet3_tx_timeout,
> +               .ndo_set_multicast_list = vmxnet3_set_mc,
> +               .ndo_vlan_rx_register = vmxnet3_vlan_rx_register,
> +               .ndo_vlan_rx_add_vid = vmxnet3_vlan_rx_add_vid,
> +               .ndo_vlan_rx_kill_vid = vmxnet3_vlan_rx_kill_vid,
> +#   ifdef CONFIG_NET_POLL_CONTROLLER
> +               .ndo_poll_controller = vmxnet3_netpoll,
> +#   endif

#ifdef
#endif

is more typical style here

> +       };
> +       int err;
> +       bool dma64 = false; /* stupid gcc */
> +       u32 ver;
> +       struct net_device *netdev;
> +       struct vmxnet3_adapter *adapter;
> +       u8  mac[ETH_ALEN];

extra space between type and name

> +
> +       netdev = alloc_etherdev(sizeof(struct vmxnet3_adapter));
> +       if (!netdev) {
> +               printk(KERN_ERR "Failed to alloc ethernet device for adapter "
> +                       "%s\n", pci_name(pdev));
> +               return -ENOMEM;
> +       }
> +
> +       pci_set_drvdata(pdev, netdev);
> +       adapter = netdev_priv(netdev);
> +       adapter->netdev = netdev;
> +       adapter->pdev = pdev;
> +
> +       adapter->shared = pci_alloc_consistent(adapter->pdev,
> +                         sizeof(struct Vmxnet3_DriverShared),
> +                         &adapter->shared_pa);
> +       if (!adapter->shared) {
> +               printk(KERN_ERR "Failed to allocate memory for %s\n",
> +                       pci_name(pdev));
> +               err = -ENOMEM;
> +               goto err_alloc_shared;
> +       }
> +
> +       adapter->tqd_start  = pci_alloc_consistent(adapter->pdev,

extra space before =

> diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c b/drivers/net/vmxnet3/vmxnet3_ethtool.c
> new file mode 100644
> index 0000000..490577f
> --- /dev/null
> +++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
> +#include "vmxnet3_int.h"
> +
> +struct vmxnet3_stat_desc {
> +       char desc[ETH_GSTRING_LEN];
> +       int  offset;
> +};
> +
> +
> +static u32
> +vmxnet3_get_rx_csum(struct net_device *netdev)
> +{
> +       struct vmxnet3_adapter *adapter = netdev_priv(netdev);
> +       return adapter->rxcsum;
> +}
> +
> +
> +static int
> +vmxnet3_set_rx_csum(struct net_device *netdev, u32 val)
> +{
> +       struct vmxnet3_adapter *adapter = netdev_priv(netdev);
> +
> +       if (adapter->rxcsum != val) {
> +               adapter->rxcsum = val;
> +               if (netif_running(netdev)) {
> +                       if (val)
> +                               adapter->shared->devRead.misc.uptFeatures |=
> +                                                               UPT1_F_RXCSUM;
> +                       else
> +                               adapter->shared->devRead.misc.uptFeatures &=
> +                                                               ~UPT1_F_RXCSUM;
> +
> +                       VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
> +                                              VMXNET3_CMD_UPDATE_FEATURE);
> +               }
> +       }
> +       return 0;
> +}
> +
> +
> +static u32
> +vmxnet3_get_tx_csum(struct net_device *netdev)
> +{
> +       return (netdev->features & NETIF_F_HW_CSUM) != 0;
> +}

Not needed

> +static int
> +vmxnet3_set_tx_csum(struct net_device *netdev, u32 val)
> +{
> +       if (val)
> +               netdev->features |= NETIF_F_HW_CSUM;
> +       else
> +               netdev->features &= ~NETIF_F_HW_CSUM;
> +
> +       return 0;
> +}

This is just ethtool_op_set_tx_hw_csum()

> +static int
> +vmxnet3_set_sg(struct net_device *netdev, u32 val)
> +{
> +       ethtool_op_set_sg(netdev, val);
> +       return 0;
> +}

Useless wrapper

> +static int
> +vmxnet3_set_tso(struct net_device *netdev, u32 val)
> +{
> +       ethtool_op_set_tso(netdev, val);
> +       return 0;
> +}

Useless wrapper

> +struct net_device_stats*
> +vmxnet3_get_stats(struct net_device *netdev)
> +{
> +       struct vmxnet3_adapter *adapter;
> +       struct vmxnet3_tq_driver_stats *drvTxStats;
> +       struct vmxnet3_rq_driver_stats *drvRxStats;
> +       struct UPT1_TxStats *devTxStats;
> +       struct UPT1_RxStats *devRxStats;
> +
> +       adapter = netdev_priv(netdev);
> +
> +       /* Collect the dev stats into the shared area */
> +       VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_STATS);
> +
> +       /* Assuming that we have a single queue device */
> +       devTxStats = &adapter->tqd_start->stats;
> +       devRxStats = &adapter->rqd_start->stats;

Another single queue assumption

> +
> +       /* Get access to the driver stats per queue */
> +       drvTxStats = &adapter->tx_queue.stats;
> +       drvRxStats = &adapter->rx_queue.stats;
> +
> +       memset(&adapter->net_stats, 0, sizeof(adapter->net_stats));
> +
> +       adapter->net_stats.rx_packets = devRxStats->ucastPktsRxOK +
> +                                       devRxStats->mcastPktsRxOK +
> +                                       devRxStats->bcastPktsRxOK;
> +
> +       adapter->net_stats.tx_packets = devTxStats->ucastPktsTxOK +
> +                                       devTxStats->mcastPktsTxOK +
> +                                       devTxStats->bcastPktsTxOK;
> +
> +       adapter->net_stats.rx_bytes = devRxStats->ucastBytesRxOK +
> +                                       devRxStats->mcastBytesRxOK +
> +                                       devRxStats->bcastBytesRxOK;
> +
> +       adapter->net_stats.tx_bytes = devTxStats->ucastBytesTxOK +
> +                                       devTxStats->mcastBytesTxOK +
> +                                       devTxStats->bcastBytesTxOK;
> +
> +       adapter->net_stats.rx_errors = devRxStats->pktsRxError;
> +       adapter->net_stats.tx_errors = devTxStats->pktsTxError;
> +       adapter->net_stats.rx_dropped = drvRxStats->drop_total;
> +       adapter->net_stats.tx_dropped = drvTxStats->drop_total;
> +       adapter->net_stats.multicast =  devRxStats->mcastPktsRxOK;
> +
> +       return &adapter->net_stats;
> +}
> +
> +static int
> +vmxnet3_get_stats_count(struct net_device *netdev)
> +{
> +       return ARRAY_SIZE(vmxnet3_tq_dev_stats) +
> +               ARRAY_SIZE(vmxnet3_tq_driver_stats) +
> +               ARRAY_SIZE(vmxnet3_rq_dev_stats) +
> +               ARRAY_SIZE(vmxnet3_rq_driver_stats) +
> +               ARRAY_SIZE(vmxnet3_global_stats);
> +}
> +
> +
> +static int
> +vmxnet3_get_regs_len(struct net_device *netdev)
> +{
> +       return 20 * sizeof(u32);
> +}
> +
> +
> +static void
> +vmxnet3_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo)
> +{
> +       struct vmxnet3_adapter *adapter = netdev_priv(netdev);
> +
> +       strncpy(drvinfo->driver, vmxnet3_driver_name, sizeof(drvinfo->driver));
> +       drvinfo->driver[sizeof(drvinfo->driver) - 1] = '\0';
> +
> +       strncpy(drvinfo->version, VMXNET3_DRIVER_VERSION_REPORT,
> +               sizeof(drvinfo->version));
> +       drvinfo->driver[sizeof(drvinfo->version) - 1] = '\0';
> +
> +       strncpy(drvinfo->fw_version, "N/A", sizeof(drvinfo->fw_version));
> +       drvinfo->fw_version[sizeof(drvinfo->fw_version) - 1] = '\0';
> +
> +       strncpy(drvinfo->bus_info,   pci_name(adapter->pdev),
> +               ETHTOOL_BUSINFO_LEN);

simplify all these to strlcpy
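
Note that the quoted code terminates drvinfo->driver a second time where drvinfo->version was presumably meant, and never terminates bus_info; a single bounded-copy call per field avoids that class of slip. A minimal userspace sketch of the pattern follows. copy_field() is a hypothetical strlcpy-like helper built on snprintf(), and struct drvinfo_sketch is an illustrative stand-in, not the real ethtool structure.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of fixed-size string fields such as those in drvinfo. */
struct drvinfo_sketch {
	char driver[32];
	char version[32];
};

/*
 * strlcpy-like helper: copies at most size - 1 bytes and always
 * NUL-terminates when size > 0. Returns strlen(src), so a return
 * value >= size signals truncation.
 */
static size_t copy_field(char *dst, const char *src, size_t size)
{
	if (size > 0)
		snprintf(dst, size, "%s", src);
	return strlen(src);
}

/* One call per field; no separate termination line to get wrong. */
static void fill_drvinfo(struct drvinfo_sketch *info,
			 const char *name, const char *version)
{
	copy_field(info->driver, name, sizeof(info->driver));
	copy_field(info->version, version, sizeof(info->version));
}
```

In the kernel, strlcpy(dst, src, sizeof(dst)) plays the role of copy_field() here.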

> +       drvinfo->n_stats = vmxnet3_get_stats_count(netdev);
> +       drvinfo->testinfo_len = 0;
> +       drvinfo->eedump_len   = 0;
> +       drvinfo->regdump_len  = vmxnet3_get_regs_len(netdev);
> +}

> +static int
> +vmxnet3_set_ringparam(struct net_device *netdev,
> +               struct ethtool_ringparam *param)
> +{
> +       struct vmxnet3_adapter *adapter = netdev_priv(netdev);
> +       u32 new_tx_ring_size, new_rx_ring_size;
> +       u32 sz;
> +       int err = 0;
> +
> +       if (param->tx_pending == 0 || param->tx_pending >
> +                                               VMXNET3_TX_RING_MAX_SIZE) {
> +               printk(KERN_ERR "%s: invalid tx ring size %u\n", netdev->name,
> +                       param->tx_pending);

Seems noisy

> +               return -EINVAL;
> +       }
> +       if (param->rx_pending == 0 || param->rx_pending >
> +                                       VMXNET3_RX_RING_MAX_SIZE) {
> +               printk(KERN_ERR "%s: invalid rx ring size %u\n", netdev->name,
> +                       param->rx_pending);

Same here

> +               return -EINVAL;
> +       }
> +
> +       /* round it up to a multiple of VMXNET3_RING_SIZE_ALIGN */
> +       new_tx_ring_size = (param->tx_pending + VMXNET3_RING_SIZE_MASK) &
> +                                                       ~VMXNET3_RING_SIZE_MASK;
> +       new_tx_ring_size = min_t(u32, new_tx_ring_size,
> +                                VMXNET3_TX_RING_MAX_SIZE);
> +       BUG_ON(new_tx_ring_size > VMXNET3_TX_RING_MAX_SIZE);
> +       BUG_ON(new_tx_ring_size % VMXNET3_RING_SIZE_ALIGN != 0);

Don't use BUG_ON for validating user input
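
Something along these lines would do: reject bad requests with -EINVAL, then round and clamp, with nothing left to assert. This is only a userspace sketch; the constants below are illustrative placeholders rather than the driver's actual values, and round_tx_ring_size() is a hypothetical helper name.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative values only; the real constants live in the driver headers. */
#define RING_SIZE_ALIGN 32u
#define RING_SIZE_MASK  (RING_SIZE_ALIGN - 1)
#define TX_RING_MAX     4096u

/*
 * Round a requested ring size up to the alignment. Out-of-range requests
 * are rejected with -EINVAL rather than asserted on.
 */
static int round_tx_ring_size(uint32_t requested, uint32_t *out)
{
	uint32_t rounded;

	if (requested == 0 || requested > TX_RING_MAX)
		return -EINVAL;

	/* Power-of-two alignment: add mask, then clear the low bits. */
	rounded = (requested + RING_SIZE_MASK) & ~RING_SIZE_MASK;
	if (rounded > TX_RING_MAX)
		rounded = TX_RING_MAX;

	*out = rounded;
	return 0;
}
```

With the range check up front and the clamp afterwards, the conditions the BUG_ON()s test cannot occur as long as the maximum is itself a multiple of the alignment.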

> +
> +       /* ring0 has to be a multiple of
> +        * rx_buf_per_pkt * VMXNET3_RING_SIZE_ALIGN
> +        */
> +       sz = adapter->rx_buf_per_pkt * VMXNET3_RING_SIZE_ALIGN;
> +       new_rx_ring_size = (param->rx_pending + sz - 1) / sz * sz;
> +       new_rx_ring_size = min_t(u32, new_rx_ring_size,
> +                                VMXNET3_RX_RING_MAX_SIZE / sz * sz);
> +       BUG_ON(new_rx_ring_size > VMXNET3_RX_RING_MAX_SIZE);
> +       BUG_ON(new_rx_ring_size % sz != 0);
> +
> +       if (new_tx_ring_size == adapter->tx_queue.tx_ring.size &&
> +                       new_rx_ring_size == adapter->rx_queue.rx_ring[0].size) {
> +               return 0;
> +       }
> +
> +       /*
> +        * Reset_work may be in the middle of resetting the device, wait for its
> +        * completion.
> +        */
> +       while (test_and_set_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state))
> +               msleep(1);
> +
> +       if (netif_running(netdev)) {
> +               vmxnet3_quiesce_dev(adapter);
> +               vmxnet3_reset_dev(adapter);
> +
> +               /* recreate the rx queue and the tx queue based on the
> +                * new sizes */
> +               vmxnet3_tq_destroy(&adapter->tx_queue, adapter);
> +               vmxnet3_rq_destroy(&adapter->rx_queue, adapter);
> +
> +               err = vmxnet3_create_queues(adapter, new_tx_ring_size,
> +                       new_rx_ring_size, VMXNET3_DEF_RX_RING_SIZE);
> +               if (err) {
> +                       /* failed, most likely because of OOM, try default
> +                        * size */
> +                       printk(KERN_ERR "%s: failed to apply new sizes, try the"
> +                               " default ones\n", netdev->name);
> +                       err = vmxnet3_create_queues(adapter,
> +                                                   VMXNET3_DEF_TX_RING_SIZE,
> +                                                   VMXNET3_DEF_RX_RING_SIZE,
> +                                                   VMXNET3_DEF_RX_RING_SIZE);
> +                       if (err) {
> +                               printk(KERN_ERR "%s: failed to create queues "
> +                                       "with default sizes. Closing it\n",
> +                                       netdev->name);
> +                               goto out;
> +                       }
> +               }
> +
> +               err = vmxnet3_activate_dev(adapter);
> +               if (err) {
> +                       printk(KERN_ERR "%s: failed to re-activate, error %d."
> +                               " Closing it\n", netdev->name, err);
> +                       goto out;

Going to out: anyway...

> +               }
> +       }
> +
> +out:
> +       clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
> +       if (err)
> +               vmxnet3_force_close(adapter);
> +
> +       return err;
> +}
> +
> +
> +static struct ethtool_ops vmxnet3_ethtool_ops = {
> +       .get_settings      = vmxnet3_get_settings,
> +       .get_drvinfo       = vmxnet3_get_drvinfo,
> +       .get_regs_len      = vmxnet3_get_regs_len,
> +       .get_regs          = vmxnet3_get_regs,
> +       .get_wol           = vmxnet3_get_wol,
> +       .set_wol           = vmxnet3_set_wol,
> +       .get_link          = ethtool_op_get_link,
> +       .get_rx_csum       = vmxnet3_get_rx_csum,
> +       .set_rx_csum       = vmxnet3_set_rx_csum,
> +       .get_tx_csum       = vmxnet3_get_tx_csum,
> +       .set_tx_csum       = vmxnet3_set_tx_csum,
> +       .get_sg            = ethtool_op_get_sg,
> +       .set_sg            = vmxnet3_set_sg,
> +       .get_tso           = ethtool_op_get_tso,
> +       .set_tso           = vmxnet3_set_tso,
> +       .get_strings       = vmxnet3_get_strings,
> +       .get_stats_count   = vmxnet3_get_stats_count,

use get_sset_count instead

> +       .get_ethtool_stats = vmxnet3_get_ethtool_stats,
> +       .get_ringparam     = vmxnet3_get_ringparam,
> +       .set_ringparam     = vmxnet3_set_ringparam,
> +};
> +
> +void vmxnet3_set_ethtool_ops(struct net_device *netdev)
> +{
> +       SET_ETHTOOL_OPS(netdev, &vmxnet3_ethtool_ops);
> +}
<snip>

^ permalink raw reply

* Re: [PATCH] /proc/net/tcp, overhead removed
From: Eric Dumazet @ 2009-09-29  7:56 UTC (permalink / raw)
  To: Yakov Lerner; +Cc: netdev, davem
In-Reply-To: <1254178906-5293-1-git-send-email-iler.ml@gmail.com>

Yakov Lerner wrote:
> Take 2. 
> 
> "Sharp improvement in performance of /proc/net/tcp when number of 
> sockets is large and hashsize is large. 
> O(numsock * hashsize) time becomes O(numsock + hashsize). On slow
> processors, speed difference can be x100 and more."
> 
> I must say that I'm not fully satisfied with my choice of "st->sbucket" 
> for the new preserved index. The better name would be "st->snum". 
> Re-using "st->sbucket" saves 4 bytes, and keeps the patch to one sourcefile.
> But "st->sbucket" has different meaning in OPENREQ and LISTEN states;
> this can be confusing. 
> Maybe better add "snum" member to struct tcp_iter_state ?
> 
> Shall I change subject when sending "take N+1", or keep the old subject ?
> 
> Signed-off-by: Yakov Lerner <iler.ml@gmail.com>
> ---
>  net/ipv4/tcp_ipv4.c |   35 +++++++++++++++++++++++++++++++++--
>  1 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 7cda24b..e4c4f19 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1994,13 +1994,14 @@ static inline int empty_bucket(struct tcp_iter_state *st)
>  		hlist_nulls_empty(&tcp_hashinfo.ehash[st->bucket].twchain);
>  }
>  
> -static void *established_get_first(struct seq_file *seq)
> +static void *established_get_first_after(struct seq_file *seq, int bucket)
>  {
>  	struct tcp_iter_state *st = seq->private;
>  	struct net *net = seq_file_net(seq);
>  	void *rc = NULL;
>  
> -	for (st->bucket = 0; st->bucket < tcp_hashinfo.ehash_size; ++st->bucket) {
> +	for (st->bucket = bucket; st->bucket < tcp_hashinfo.ehash_size;
> +	     ++st->bucket) {
>  		struct sock *sk;
>  		struct hlist_nulls_node *node;
>  		struct inet_timewait_sock *tw;
> @@ -2010,6 +2011,8 @@ static void *established_get_first(struct seq_file *seq)
>  		if (empty_bucket(st))
>  			continue;
>  
> +		st->sbucket = st->num;
> +
>  		spin_lock_bh(lock);
>  		sk_nulls_for_each(sk, node, &tcp_hashinfo.ehash[st->bucket].chain) {
>  			if (sk->sk_family != st->family ||
> @@ -2036,6 +2039,11 @@ out:
>  	return rc;
>  }
>  
> +static void *established_get_first(struct seq_file *seq)
> +{
> +	return established_get_first_after(seq, 0);
> +}
> +
>  static void *established_get_next(struct seq_file *seq, void *cur)
>  {
>  	struct sock *sk = cur;
> @@ -2064,6 +2072,9 @@ get_tw:
>  		while (++st->bucket < tcp_hashinfo.ehash_size &&
>  				empty_bucket(st))
>  			;
> +
> +		st->sbucket = st->num;
> +
>  		if (st->bucket >= tcp_hashinfo.ehash_size)
>  			return NULL;
>  
> @@ -2107,6 +2118,7 @@ static void *tcp_get_idx(struct seq_file *seq, loff_t pos)
>  
>  	if (!rc) {
>  		st->state = TCP_SEQ_STATE_ESTABLISHED;
> +		st->sbucket = 0;
>  		rc	  = established_get_idx(seq, pos);
>  	}
>  
> @@ -2116,6 +2128,25 @@ static void *tcp_get_idx(struct seq_file *seq, loff_t pos)
>  static void *tcp_seq_start(struct seq_file *seq, loff_t *pos)
>  {
>  	struct tcp_iter_state *st = seq->private;
> +
> +	if (*pos && *pos >= st->sbucket &&
> +	    (st->state == TCP_SEQ_STATE_ESTABLISHED ||
> +	     st->state == TCP_SEQ_STATE_TIME_WAIT)) {
> +		void *cur;
> +		int nskip;
> +
> +		/* for states estab and tw, st->sbucket is index (*pos) */
> +		/* corresponding to the beginning of bucket st->bucket */
> +
> +		st->num = st->sbucket;
> +		/* jump to st->bucket, then skip (*pos - st->sbucket) items */
> +		st->state = TCP_SEQ_STATE_ESTABLISHED;
> +		cur = established_get_first_after(seq, st->bucket);
> +		for (nskip = *pos - st->num; cur && nskip > 0; --nskip)
> +			cur = established_get_next(seq, cur);
> +		return cur;
> +	}
> +
>  	st->state = TCP_SEQ_STATE_LISTENING;
>  	st->num = 0;
>  	return *pos ? tcp_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;

Just in case you are working on "take 3" of the patch, there is a fundamental problem.

All the scalability problems come from the fact that tcp_seq_start()
*has* to rescan all the tables from the beginning, because of the lseek() capability
on the /proc/net/tcp file.

We could probably disable llseek() (for positions other than the start of the file),
and rely only on internal state (listening/established hashtable, hash bucket, position in chain).

I cannot imagine how an application could rely on lseek() on >0 position in this file.
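
To make the difference between rescanning and relying on saved iterator state concrete, here is a toy userspace model. It is not the kernel code: the hash table is reduced to an array of per-bucket entry counts, and struct iter_state, seek_from_zero() and seek_resume() are hypothetical names. Each function returns how many buckets it had to visit to locate entry number pos.

```c
#include <stddef.h>

/* Toy model: counts[b] is the number of entries chained in bucket b. */
struct iter_state {
	size_t bucket;  /* bucket the last successful lookup ended in */
	size_t base;    /* position of that bucket's first entry */
};

/*
 * Find the bucket holding entry number pos, scanning from bucket 0.
 * Returns the number of buckets visited, or 0 if pos is past the end.
 */
static size_t seek_from_zero(const size_t *counts, size_t nbuckets,
			     size_t pos, struct iter_state *st)
{
	size_t base = 0, b;

	for (b = 0; b < nbuckets; b++) {
		if (pos < base + counts[b]) {
			st->bucket = b;
			st->base = base;
			return b + 1;
		}
		base += counts[b];
	}
	return 0;
}

/*
 * Same lookup, but resuming from the cached bucket when pos lies at or
 * after that bucket's first entry; otherwise fall back to a full scan.
 */
static size_t seek_resume(const size_t *counts, size_t nbuckets,
			  size_t pos, struct iter_state *st)
{
	size_t base, b, visited = 0;

	if (pos < st->base)
		return seek_from_zero(counts, nbuckets, pos, st);

	base = st->base;
	for (b = st->bucket; b < nbuckets; b++) {
		visited++;
		if (pos < base + counts[b]) {
			st->bucket = b;
			st->base = base;
			return visited;
		}
		base += counts[b];
	}
	return 0;
}
```

For sequential reads, the resumed lookup only walks forward from where the previous one stopped, so the total work over a whole read is bounded by the table size plus the number of entries. A backward seek is the case that forces the full rescan.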



^ permalink raw reply

* Re: [PATCH] /proc/net/tcp, overhead removed
From: Yakov Lerner @ 2009-09-29  7:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, netdev, David Miller
In-Reply-To: <20090928162417.59640672@nehalam>

On Tue, Sep 29, 2009 at 02:24, Stephen Hemminger <shemminger@vyatta.com> wrote:
> On Tue, 29 Sep 2009 00:20:07 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> Yakov Lerner wrote:
>> > On Sun, Sep 27, 2009 at 12:53, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >> Yakov Lerner wrote:
>> >>> /proc/net/tcp does 20,000 sockets in 60-80 milliseconds, with this patch.
>> >>>
>> >>> The overhead was in tcp_seq_start(). See analysis (3) below.
>> >>> The patch is against Linus git tree (1). The patch is small.
>> >>>
>> >>> ------------  -----------   ------------------------------------
>> >>> Before patch  After patch   20,000 sockets (10,000 tw + 10,000 estab)(2)
>> >>> ------------  -----------   ------------------------------------
>> >>> 6 sec          0.06 sec     dd bs=1k if=/proc/net/tcp >/dev/null
>> >>> 1.5 sec        0.06 sec     dd bs=4k if=/proc/net/tcp >/dev/null
>> >>>
>> >>> 1.9 sec        0.16 sec     netstat -4ant >/dev/null
>> >>> ------------  -----------   ------------------------------------
>> >>>
>> >>> This is ~ x25 improvement.
>> >>> The new time is not dependent on read blocksize.
>> >>> Speed of netstat, naturally, improves, too; both -4 and -6.
>> >>> /proc/net/tcp6 does 20,000 sockets in 100 millisec.
>> >>>
>> >>> (1) against git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>> >>>
>> >>> (2) Used 'manysock' utility to stress system with large number of sockets:
>> >>>   "manysock 10000 10000"    - 10,000 tw + 10,000 estab ip4 sockets.
>> >>>   "manysock -6 10000 10000" - 10,000 tw + 10,000 estab ip6 sockets.
>> >>> Found at http://ilerner.3b1.org/manysock/manysock.c
>> >>>
>> >>> (3) Algorithmic analysis.
>> >>>     Old algorithm.
>> >>>
>> >>> During 'cat </proc/net/tcp', tcp_seq_start() is called O(numsockets) times (4).
>> >>> On average, every call to tcp_seq_start() scans half the whole hashtable. Ouch.
>> >>> This is O(numsockets * hashsize). 95-99% of 'cat </proc/net/tcp' is spent in
>> >>> tcp_seq_start()->tcp_get_idx. This overhead is eliminated by new algorithm,
>> >>> which is O(numsockets + hashsize).
>> >>>
>> >>>     New algorithm.
>> >>>
>> >>> New algorithm is O(numsockets + hashsize). We jump to the right
>> >>> hash bucket in tcp_seq_start(), without scanning half the hash.
>> >>> To jump right to the hash bucket corresponding to *pos in tcp_seq_start(),
>> >>> we reuse three pieces of state (st->num, st->bucket, st->sbucket)
>> >>> as follows:
>> >>>  - we check that requested pos >= last seen pos (st->num), the typical case.
>> >>>  - if so, we jump to bucket st->bucket
>> >>>  - to arrive at the right item after beginning of st->bucket, we
>> >>> keep in st->sbucket the position corresponding to the beginning of
>> >>> bucket.
>> >>>
>> >>> (4) Explanation of O( numsockets * hashsize) of old algorithm.
>> >>>
>> >>> tcp_seq_start() is called once for every ~7 lines of netstat output
>> >>> if readsize is 1kb, or once for every ~28 lines if readsize >= 4kb.
>> >>> Since record length of /proc/net/tcp records is 150 bytes, formula for
>> >>> number of calls to tcp_seq_start() is
>> >>>             (numsockets * 150 / min(4096,readsize)).
>> >>> Netstat uses 4kb readsize (newer versions), or 1kb (older versions).
>> >>> Note that speed of old algorithm does not improve above 4kb blocksize.
>> >>>
>> >>> Speed of the new algorithm does not depend on blocksize.
>> >>>
>> >>> Speed of the new algorithm does not perceptibly depend on hashsize (which
>> >>> depends on ramsize). Speed of old algorithm drops with bigger hashsize.
>> >>>
>> >>> (5) Reporting order.
>> >>>
>> >>> Reporting order is exactly same as before if hash does not change underfoot.
>> >>> When hash elements come and go during report, reporting order will be
>> >>> same as that of tcpdiag.
>> >>>
>> >>> Signed-off-by: Yakov Lerner <iler.ml@gmail.com>
>
> Does the netlink interface used by ss command have the problem?

No. It's /proc/net/tcp that has the fixable problem.

Yakov
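
As a worked check of the formula in (4), the estimate can be written out as below. The 150-byte record length and the 4096-byte cap are taken from the analysis quoted above; est_start_calls() is a hypothetical helper name, and the result is an estimate, not a measurement.

```c
#include <stddef.h>

#define RECORD_LEN 150u  /* bytes per /proc/net/tcp record, per the analysis */
#define MAX_CHUNK  4096u /* reads larger than this do not reduce the call count */

/* Estimated number of tcp_seq_start() calls for one full read of the file. */
static size_t est_start_calls(size_t numsockets, size_t readsize)
{
	size_t chunk = readsize < MAX_CHUNK ? readsize : MAX_CHUNK;

	if (chunk == 0)
		return 0;
	return numsockets * RECORD_LEN / chunk;
}
```

For 20,000 sockets this gives 2929 calls with 1 kB reads and 732 calls with 4 kB reads. That 4x ratio is consistent with the 6 sec versus 1.5 sec "before patch" timings in the table.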

^ permalink raw reply

* [PATCH] Phonet: fix mutex imbalance
From: Rémi Denis-Courmont @ 2009-09-29  7:16 UTC (permalink / raw)
  To: netdev; +Cc: Rémi Denis-Courmont

From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>

port_mutex was unlocked twice.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
---
 net/phonet/socket.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 07aa9f0..aa5b5a9 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -407,7 +407,6 @@ int pn_sock_get_port(struct sock *sk, unsigned short sport)
 	return -EADDRINUSE;
 
 found:
-	mutex_unlock(&port_mutex);
 	pn->sobject = pn_object(pn_addr(pn->sobject), sport);
 	return 0;
 }
-- 
1.6.0.4


^ permalink raw reply related

* Re: b43 is broken in latest net-2.6 and linux-2.6
From: Oliver Hartkopp @ 2009-09-29  7:13 UTC (permalink / raw)
  To: John W. Linville; +Cc: Michael Buesch, Linux Netdev List
In-Reply-To: <20090928184211.GC4737@tuxdriver.com>

John W. Linville wrote:
> On Sat, Sep 19, 2009 at 01:23:11PM +0200, Oliver Hartkopp wrote:
>> Hello Michael,
>>
>> my b43 wireless card (Dell 830) is not working with the latest net-2.6 (and
>> also linux-2.6 2.6.31-05767-gdf58bee).
>>
>> net-2.6 2.6.31-03263-gc29854e is working
>> net-2.6 2.6.31-03301-ga97e178 is broken
>>
>> I removed the patch with the work_queue stuff which did not help - so it's
>> probably the other patch you added to b43 recently.
>>
>> Don't know ... the wlan0 link does not become ready anymore.
>>
>> If you need some more information - please let me know.
> 
> Is this working better now, with 2.6.31-rc1?

Thanks for coming back on this.

Yes it is fixed now.

   'cfg80211: fix SME connect'

broke it and

   'cfg80211: don't overwrite privacy setting'

fixed it afterwards (at least in my setup).

So it's pretty well working now.

Best regards,
Oliver

^ permalink raw reply

* [GIT]: Networking
From: David Miller @ 2009-09-29  6:17 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Refcount bugs in ax25, from Ralf Baechle and Jarek Poplawski.

2) Wireless regression fixes:

   sony laptop rfkill handling loses state over suspend/resume,
   and hard block isn't checked at load time, from Alan Jenkins

   sysfs registry of wireless devices is borked, fix from Johannes Berg.

   cfg80211 can set privacy without key, this was hitting quite a few
   folks, fix from Johannes

   Memory leak and ucode info retrieval iwlwifi fixes.  One of the
   commits looks sizable, but it's predominantly moving code around.
   From Reinette Chatre.

3) netlink/dcbnl fixes from John Fastabend.  In particular, vlan messages
   weren't being sized large enough, nlmsg size for error ACKs was
   completely wrong, and dcbnl could double free message SKB.

4) ISDN driver build fix from Randy Dunlap.

5) Several e1000 bug fixes and fixups from Jesse Brandeburg:

   Timers stopped incorrectly.

   PCI-E support code is completely unused, identical code is only
   active in e1000e driver, not here.

   MTU changing is racy

   queues aren't stopped correctly during shutdown

   Fix namespacecheck warnings.

   Hopefully this driver is less of a screaming pile of poo than it used
   to be.

6) User bound checks in net/socket.c and wireless extensions, from Arjan
   van de Ven.

7) Revert stateless autoconf support for ipv6 isatap in SIT driver.

   It was not implemented according to spec properly, and doing it
   fully and correctly is a lot of code and thus should be done in
   userspace.

Please pull, thanks a lot!

The following changes since commit 17d857be649a21ca90008c6dc425d849fa83db5c:
  Linus Torvalds (1):
        Linux 2.6.32-rc1

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Alan Jenkins (2):
      sony-laptop: check for rfkill hard block at load time
      sony-laptop: re-read the rfkill state when resuming from suspend

Arjan van de Ven (2):
      net: Add explicit bound checks in net/socket.c
      wext: Add bound checks for copy_from_user

David S. Miller (1):
      Merge branch 'master' of ssh://master.kernel.org/.../linville/wireless-2.6

Don Skidmore (1):
      e1000: cleanup unused prototype

Jeff Hansen (1):
      bridge: Fix double-free in br_add_if.

Jesse Brandeburg (12):
      e1000: drop dead pcie code from e1000
      e1000: remove unused functions
      e1000: use netif_tx_disable
      e1000: stop timers at appropriate times
      e1000: test link state conclusively
      e1000: fix tx waking queue after queue stopped during shutdown
      e1000: two workarounds were incomplete, fix them
      e1000: remove races when changing mtu
      e1000: drop redunant line of code, cleanup
      e1000: updated whitespace and comments
      e1000: drop unused functionality for eeprom write/read
      e1000: fix namespacecheck warnings

Johannes Berg (5):
      cfg80211: wext: don't display BSSID unless associated
      cfg80211: don't set privacy w/o key
      cfg80211: always get BSS
      mac80211: improve/fix mlme messages
      wext: add back wireless/ dir in sysfs for cfg80211 interfaces

John Fastabend (3):
      net: fix vlan_get_size to include vlan_flags size
      net: fix nlmsg len size for skb when error bit is set.
      net: fix double skb free in dcbnl

Juha Leppanen (1):
      atm: dereference of he_dev->rbps_virt in he_init_group()

Ralf Baechle (1):
      ax25: Add missing dev_put in ax25_setsockopt

Randy Dunlap (1):
      isdn: fix netjet/isdnhdlc build errors

Reinette Chatre (3):
      iwlwifi: fix debugfs buffer handling
      iwlwifi: fix memory leak in command queue handling
      iwlwifi: fix 3945 ucode info retrieval after failure

Sascha Hlusiak (1):
      Revert "sit: stateless autoconf for isatap"

 drivers/atm/he.c                            |   14 +-
 drivers/isdn/hardware/mISDN/Kconfig         |    1 +
 drivers/isdn/i4l/Kconfig                    |    3 +-
 drivers/net/e1000/e1000.h                   |    3 -
 drivers/net/e1000/e1000_ethtool.c           |  202 +-
 drivers/net/e1000/e1000_hw.c                |12914 ++++++++++-----------------
 drivers/net/e1000/e1000_hw.h                | 3231 +++----
 drivers/net/e1000/e1000_main.c              |  825 +--
 drivers/net/e1000/e1000_param.c             |   22 -
 drivers/net/wireless/iwlwifi/iwl-1000.c     |    2 +
 drivers/net/wireless/iwlwifi/iwl-3945.c     |    2 +
 drivers/net/wireless/iwlwifi/iwl-3945.h     |    2 +
 drivers/net/wireless/iwlwifi/iwl-4965.c     |    2 +
 drivers/net/wireless/iwlwifi/iwl-5000.c     |    4 +
 drivers/net/wireless/iwlwifi/iwl-6000.c     |    2 +
 drivers/net/wireless/iwlwifi/iwl-agn.c      |  185 +
 drivers/net/wireless/iwlwifi/iwl-core.c     |  187 +-
 drivers/net/wireless/iwlwifi/iwl-core.h     |   14 +
 drivers/net/wireless/iwlwifi/iwl-debugfs.c  |    8 +-
 drivers/net/wireless/iwlwifi/iwl-tx.c       |    6 +
 drivers/net/wireless/iwlwifi/iwl3945-base.c |   31 +-
 drivers/platform/x86/sony-laptop.c          |    9 +
 include/linux/if_tunnel.h                   |    2 +-
 include/net/ipip.h                          |    7 -
 include/net/wext.h                          |    1 +
 net/8021q/vlan_netlink.c                    |    1 +
 net/ax25/af_ax25.c                          |   19 +-
 net/bridge/br_if.c                          |    1 +
 net/core/net-sysfs.c                        |   12 +-
 net/dcb/dcbnl.c                             |   15 +-
 net/ipv6/ndisc.c                            |    1 -
 net/ipv6/sit.c                              |   58 -
 net/mac80211/mlme.c                         |   18 +-
 net/netlink/af_netlink.c                    |    2 +-
 net/socket.c                                |    7 +-
 net/wireless/sme.c                          |    5 +-
 net/wireless/wext-sme.c                     |    8 +-
 net/wireless/wext.c                         |   11 +-
 38 files changed, 6792 insertions(+), 11045 deletions(-)

^ permalink raw reply

* Re: [PATCH][RESEND] IPv6: 6rd tunnel mode
From: YOSHIFUJI Hideaki @ 2009-09-29  5:57 UTC (permalink / raw)
  To: Mark Townsley; +Cc: acassen, Brian Haley, netdev
In-Reply-To: <4AC19CD1.6060600@cisco.com>

So you mean, just in case there are any IPRs?

--yoshfuji

Mark Townsley wrote:
> This is a general problem with all rfc editor submissions right now. It's not 
> because there is any IPR on the mechanism itself.
> 
> This is the current standards track version:
> 
> http://tools.ietf.org/id/draft-ietf-softwire-ipv6-6rd-00.txt
> 
> We are updating it now based on some discussions, should be published in a week 
> or so. Mostly minor updates. We are working with Remi closely on all of this.
> 
> This is the latest of Remi's independent drafts:
> 
> http://tools.ietf.org/html/draft-despres-6rd-03
> 
> - Mark
> 
> YOSHIFUJI Hideaki wrote:
>> Hello.
>>
>> Alexandre Cassen wrote:
>>
>>   
>>>> I couldn't find RFC 5569 (delayed due to IPR rights?), although I did find
>>>> the latest 6rd draft, -03.  It was showing as Informational, not Standards
>>>> track, is that right?  Just curious.
>>>>       
>>> In fact there is currently two draft :
>>>
>>> 1) https://datatracker.ietf.org/idtracker/draft-despres-6rd/
>>>
>>>    This draft is targeting informational RFC as an independent
>>> submission. It is currently queued and has been delayed since may for
>>> IPR.
>>>     
>>
>> Do you have any pointer about this (IPR)?
>> I probably missed something but I could not find any
>> information in IETF IPR page...
>>
>> --yoshfuji
>>
>>   
> 


^ permalink raw reply

* Re: [2.6.31-git17] WARNING: at kernel/hrtimer.c:648 hres_timers_resume+0x40/0x50()/WARNING: at drivers/base/sys.c:353 __sysdev_resume+0xc3/0xe0()
From: Yong Zhang @ 2009-09-29  5:44 UTC (permalink / raw)
  To: Maciej Rutecki
  Cc: Linux Kernel Mailing List, Rafael J. Wysocki, clemens,
	venkatesh.pallipadi, gregkh, zambrano, davem, netdev
In-Reply-To: <8db1092f0909281138t18a379d1qdf999b0610ed6414@mail.gmail.com>

On Tue, Sep 29, 2009 at 2:38 AM, Maciej Rutecki
<maciej.rutecki@gmail.com> wrote:
> 2009/9/28 Yong Zhang <yong.zhang0@gmail.com>:
>
>>>
>>
>> If you could, then please do it. It can give us some helpful information.
>
> Add patch and remove previous:
> http://unixy.pl/maciek/download/kernel/2.6.31-git17/gumis/dmesg-debug.txt
>
> s2disk&resume twice.
>
> no "timekeeping_resume() called with IRQs enabled!".
>
> I found some interesting thing, warnings appear only once, during
> first s2disk, on second don't appear.
>

Yeah, because WARN_ONCE only prints once.
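
The one-shot behaviour comes from a static flag kept at each call site. A minimal userspace sketch of that idea follows, assuming a hypothetical `WARN_ONCE_SKETCH` macro and hypothetical helper names; it is not the kernel's actual `WARN_ONCE` definition, which also dumps a stack trace:

```c
#include <assert.h>
#include <stdio.h>

/* Counts how many warnings were actually printed (illustration only). */
static int warnings_printed;

/*
 * Hypothetical userspace stand-in for a warn-once macro: each expansion
 * gets its own static flag, so a given call site reports at most once.
 */
#define WARN_ONCE_SKETCH(cond, msg)                              \
	do {                                                     \
		static int warned_;                              \
		if ((cond) && !warned_) {                        \
			warned_ = 1;                             \
			warnings_printed++;                      \
			fprintf(stderr, "WARNING: %s\n", (msg)); \
		}                                                \
	} while (0)

/* Simulates a resume path that hits the same bad condition every time. */
static void resume_sketch(int irqs_enabled)
{
	WARN_ONCE_SKETCH(irqs_enabled, "resume called with IRQs enabled");
}

/* Runs the resume path `cycles` times; returns the warnings printed so far. */
int run_resume_cycles(int cycles)
{
	assert(cycles >= 0);
	for (int i = 0; i < cycles; i++)
		resume_sketch(1);
	return warnings_printed;
}
```

With this sketch, a second suspend/resume cycle hits the same condition but prints nothing, which matches a warning showing up only during the first s2disk.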

Thanks,
Yong

> Regards
> --
> Maciej Rutecki
> http://www.maciek.unixy.pl
>

^ permalink raw reply

* Re: [PATCH][RESEND] IPv6: 6rd tunnel mode
From: YOSHIFUJI Hideaki @ 2009-09-29  5:11 UTC (permalink / raw)
  To: acassen; +Cc: Brian Haley, netdev, townsley
In-Reply-To: <1253602772.17175.13.camel@lnxos-dev>

Hello.

Alexandre Cassen wrote:

>> I couldn't find RFC 5569 (delayed due to IPR rights?), although I did find
>> the latest 6rd draft, -03.  It was showing as Informational, not Standards
>> track, is that right?  Just curious.
> 
> In fact there is currently two draft :
> 
> 1) https://datatracker.ietf.org/idtracker/draft-despres-6rd/
> 
>    This draft is targeting informational RFC as an independent
> submission. It is currently queued and has been delayed since may for
> IPR.

Do you have any pointer about this (IPR)?
I probably missed something but I could not find any
information in IETF IPR page...

--yoshfuji

^ permalink raw reply

* Re: [PATCH] /proc/net/tcp, overhead removed
From: Eric Dumazet @ 2009-09-29  4:39 UTC (permalink / raw)
  To: Yakov Lerner; +Cc: netdev, davem
In-Reply-To: <1254178906-5293-1-git-send-email-iler.ml@gmail.com>

Yakov Lerner wrote:
> Take 2. 
> 
> "Sharp improvement in performance of /proc/net/tcp when number of 
> sockets is large and hashsize is large. 
> O(numsock * hashsize) time becomes O(numsock + hashsize). On slow
> processors, speed difference can be x100 and more."
> 
> I must say that I'm not fully satisfied with my choice of "st->sbucket" 
> for the new preserved index. The better name would be "st->snum". 
> Re-using "st->sbucket" saves 4 bytes, and keeps the patch to one sourcefile.
> But "st->sbucket" has different meaning in OPENREQ and LISTEN states;
> this can be confusing. 
> Maybe better add "snum" member to struct tcp_iter_state ?

You can add more fields to tcp_iter_state if it makes the code easier to read
and faster.

This structure is allocated once at open("/proc/net/tcp") time and could
be any reasonable size. You can add 10 longs to it; it is not a big deal.

> 
> Shall I change subject when sending "take N+1", or keep the old subject ?

Not a big deal, but keeping the old subject is probably the usual way.

[PATCH v2] tcp: Remove /proc/net/tcp O(N*H) overhead

> 
> Signed-off-by: Yakov Lerner <iler.ml@gmail.com>
> ---
>  net/ipv4/tcp_ipv4.c |   35 +++++++++++++++++++++++++++++++++--
>  1 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 7cda24b..e4c4f19 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1994,13 +1994,14 @@ static inline int empty_bucket(struct tcp_iter_state *st)
>  		hlist_nulls_empty(&tcp_hashinfo.ehash[st->bucket].twchain);
>  }
>  
> -static void *established_get_first(struct seq_file *seq)
> +static void *established_get_first_after(struct seq_file *seq, int bucket)
>  {
>  	struct tcp_iter_state *st = seq->private;
>  	struct net *net = seq_file_net(seq);
>  	void *rc = NULL;
>  
> -	for (st->bucket = 0; st->bucket < tcp_hashinfo.ehash_size; ++st->bucket) {
> +	for (st->bucket = bucket; st->bucket < tcp_hashinfo.ehash_size;
> +	     ++st->bucket) {
>  		struct sock *sk;
>  		struct hlist_nulls_node *node;
>  		struct inet_timewait_sock *tw;
> @@ -2010,6 +2011,8 @@ static void *established_get_first(struct seq_file *seq)
>  		if (empty_bucket(st))
>  			continue;
>  

> +		st->sbucket = st->num;
> +

oh this is ugly...

Check tcp_seq_stop() to see why st->sbucket should not change after getting
lock. Any reader of this will have a heart attack :)

>  		spin_lock_bh(lock);
>  		sk_nulls_for_each(sk, node, &tcp_hashinfo.ehash[st->bucket].chain) {
>  			if (sk->sk_family != st->family ||
> @@ -2036,6 +2039,11 @@ out:
>  	return rc;
>  }
>  
> +static void *established_get_first(struct seq_file *seq)
> +{
> +	return established_get_first_after(seq, 0);
> +}
> +
>  static void *established_get_next(struct seq_file *seq, void *cur)
>  {
>  	struct sock *sk = cur;
> @@ -2064,6 +2072,9 @@ get_tw:
>  		while (++st->bucket < tcp_hashinfo.ehash_size &&
>  				empty_bucket(st))
>  			;
> +
> +		st->sbucket = st->num;

same here, this is ugly, even if it happens to work.

> +
>  		if (st->bucket >= tcp_hashinfo.ehash_size)
>  			return NULL;
>  
> @@ -2107,6 +2118,7 @@ static void *tcp_get_idx(struct seq_file *seq, loff_t pos)
>  
>  	if (!rc) {
>  		st->state = TCP_SEQ_STATE_ESTABLISHED;
> +		st->sbucket = 0;
>  		rc	  = established_get_idx(seq, pos);
>  	}
>  
> @@ -2116,6 +2128,25 @@ static void *tcp_get_idx(struct seq_file *seq, loff_t pos)
>  static void *tcp_seq_start(struct seq_file *seq, loff_t *pos)
>  {
>  	struct tcp_iter_state *st = seq->private;
> +
> +	if (*pos && *pos >= st->sbucket &&
> +	    (st->state == TCP_SEQ_STATE_ESTABLISHED ||
> +	     st->state == TCP_SEQ_STATE_TIME_WAIT)) {
> +		void *cur;
> +		int nskip;
> +
> +		/* for states estab and tw, st->sbucket is index (*pos) */
> +		/* corresponding to the beginning of bucket st->bucket */
> +
> +		st->num = st->sbucket;
ugly...
> +		/* jump to st->bucket, then skip (*pos - st->sbucket) items */
> +		st->state = TCP_SEQ_STATE_ESTABLISHED;
> +		cur = established_get_first_after(seq, st->bucket);
> +		for (nskip = *pos - st->num; cur && nskip > 0; --nskip)
> +			cur = established_get_next(seq, cur);
> +		return cur;
> +	}
> +

I don't think you need this chunk in tcp_seq_start(), and it's also probably buggy,
even if it's hard to prove this claim; we'll need some program to get TIME_WAIT sockets
in a reproducible form.

Jumping to the right hash slot is more than enough to avoid the O(N*H) problem.
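
To make the cost difference concrete, here is a toy userspace model (not the kernel code) that counts how many hash buckets get inspected while a table is dumped through repeated read() calls. The function names, the `chunk` parameter and the demo table are assumptions made for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of dumping a hash table through repeated read() calls.
 * chain_len[b] is the number of sockets in bucket b; each read() emits
 * `chunk` entries. Returns the total number of buckets inspected.
 *
 * resume == 0: every read() walks from bucket 0 to its position again.
 * resume != 0: every read() continues from the bucket saved by the
 *              previous one (the "jump to the right hash slot" idea).
 */
unsigned long bucket_visits(const int *chain_len, size_t nbuckets,
			    size_t chunk, int resume)
{
	unsigned long visits = 0;
	size_t end = 0;      /* index one past the last entry emitted */
	size_t sbucket = 0;  /* bucket where the previous read() stopped */
	size_t sbase = 0;    /* index of the first entry in that bucket */

	assert(chunk > 0);
	for (;;) {
		size_t b = resume ? sbucket : 0;
		size_t base = resume ? sbase : 0;

		end += chunk;
		for (; b < nbuckets; b++) {
			visits++;
			if (base + (size_t)chain_len[b] >= end)
				break;  /* this read() ends inside bucket b */
			base += (size_t)chain_len[b];
		}
		if (b == nbuckets)
			return visits;  /* walked off the table: dump done */
		sbucket = b;
		sbase = base;
	}
}

/* Hypothetical demo: 1000 buckets, one socket each, 10 entries per read(). */
unsigned long demo_visits(int resume)
{
	enum { NBUCKETS = 1000 };
	static int chain_len[NBUCKETS];

	for (size_t b = 0; b < NBUCKETS; b++)
		chain_len[b] = 1;
	return bucket_visits(chain_len, NBUCKETS, 10, resume);
}
```

In this model the restart-from-zero variant grows roughly with (number of reads) * (hash size), while the resuming variant grows roughly with (number of reads) + (hash size). Only the saved hash slot is needed for that; no per-entry bookkeeping is involved.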

You should try to optimize both the established and listening algorithms, so that
the code stays readable and maintainable. In pathological cases, we can also have 10000
sockets in LISTENING/OPENREQ state.

Maybe we need a first patch to clean up the code, since it's really complex,
then a patch to optimize it?

IMHO the /proc/net/tcp file suffers from bugs first, and from a performance problem second.

Currently, we can fail to output some live sockets in the dump, if:

Thread A gets a block from /proc/net/tcp and stops in hash slot N, socket X.
Thread B deletes socket X, before socket Y in the hash chain, or any socket
in previous hash slots.
Thread A gets the 'next block', missing socket Y and possibly Y+1, Y+2....

-> Thread A doesn't see socket Y as an established/timewait socket.

So I believe being able to store the hash slot could really help both performance and
avoid skipping a lot of sockets in case a thread B destroys sockets 'before our cursor'.

The remaining window would be small, as only deleting sockets in our hash slot could
make us skip live sockets. (And closing this hole is really tricky; inet_diag has
the same problem, I believe.)
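
The scenario above can be reproduced in a toy userspace model. This is only a sketch under stated assumptions: `struct table`, the helper names and the socket ids are hypothetical, and the real iterator works on kernel hash chains under locks rather than on arrays:

```c
#include <assert.h>

enum { NBUCKETS = 3, CHAIN_MAX = 4 };

/* Toy hash table: id[b][0..len[b]-1] are the socket ids chained in bucket b. */
struct table {
	int len[NBUCKETS];
	int id[NBUCKETS][CHAIN_MAX];
};

/* Remove socket `id` from bucket b, shifting the rest of the chain down. */
void table_delete(struct table *t, int b, int id)
{
	int i = 0;

	while (i < t->len[b] && t->id[b][i] != id)
		i++;
	assert(i < t->len[b]);  /* the socket must be in this bucket */
	for (; i + 1 < t->len[b]; i++)
		t->id[b][i] = t->id[b][i + 1];
	t->len[b]--;
}

/* Resume by global position: skip `pos` entries starting at bucket 0. */
int next_by_position(const struct table *t, int pos)
{
	for (int b = 0; b < NBUCKETS; b++) {
		if (pos < t->len[b])
			return t->id[b][pos];
		pos -= t->len[b];
	}
	return -1;  /* end of table */
}

/* Resume from a saved hash slot plus an offset inside that slot. */
int next_by_slot(const struct table *t, int bucket, int offset)
{
	for (int b = bucket; b < NBUCKETS; b++, offset = 0)
		if (offset < t->len[b])
			return t->id[b][offset];
	return -1;  /* end of table */
}

/*
 * Reader A has already emitted sockets 1, 2 and 3 and stopped in slot 1;
 * the next live socket is 4. Thread B then deletes socket 1 in slot 0.
 * Returns the socket id reader A sees next.
 */
int demo_next_after_delete(int use_slot)
{
	struct table t = {
		.len = { 2, 2, 1 },
		.id = { { 1, 2 }, { 3, 4 }, { 5 } },
	};

	table_delete(&t, 0, 1);
	return use_slot ? next_by_slot(&t, 1, 1) : next_by_position(&t, 3);
}
```

In this toy run the position-based resume returns socket 5 and never reports live socket 4, while the slot-based resume returns 4. A deletion inside slot 1 itself would still shift the offset, which corresponds to the small remaining window.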

The following program establishes 10000 sockets in listening state, and 2*10000 in
established state. It uses non-random ports so that we can compare before/after patches.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>

/*
 * Needs an open-file limit well above the usual default of 1024
 * (the child ends up holding roughly 2 * NSOCK descriptors).
 */
#define NSOCK 10000
#define PORT 2222

int fdlisten[NSOCK];

int main(void)
{
        int i;
        struct sockaddr_in sockaddr, locaddr;

        /* One listener per loopback address 127.0.0.1 + i, all on PORT. */
        for (i = 0; i < NSOCK; i++) {
                fdlisten[i] = socket(AF_INET, SOCK_STREAM, 0);
                if (fdlisten[i] == -1) {
                        perror("socket");
                        return 1;
                }
                memset(&sockaddr, 0, sizeof(sockaddr));
                sockaddr.sin_family = AF_INET;
                sockaddr.sin_port = htons(PORT);
                sockaddr.sin_addr.s_addr = htonl(0x7f000001 + i);
                if (bind(fdlisten[i], (struct sockaddr *)&sockaddr, sizeof(sockaddr)) == -1) {
                        perror("bind");
                        return 1;
                }
                if (listen(fdlisten[i], 1) == -1) {
                        perror("listen");
                        return 1;
                }
        }

        /* Child: accept connections round-robin and keep them open. */
        if (fork() == 0) {
                i = 0;
                while (1) {
                        socklen_t len = sizeof(sockaddr);
                        int newfd = accept(fdlisten[i++], (struct sockaddr *)&sockaddr, &len);

                        if (newfd == -1)
                                perror("accept");
                        if (i == NSOCK)
                                i = 0;
                }
        }

        /* Parent: one client per listener, from fixed local port 20000 + i. */
        for (i = 0; i < NSOCK; i++) {
                int fd;

                close(fdlisten[i]);
                fd = socket(AF_INET, SOCK_STREAM, 0);
                if (fd == -1) {
                        perror("socket");
                        break;
                }
                memset(&locaddr, 0, sizeof(locaddr));
                locaddr.sin_family = AF_INET;
                locaddr.sin_port = htons(i + 20000);
                locaddr.sin_addr.s_addr = htonl(0x7f000001 + i);
                if (bind(fd, (struct sockaddr *)&locaddr, sizeof(locaddr)) == -1)
                        perror("bind");

                memset(&sockaddr, 0, sizeof(sockaddr));
                sockaddr.sin_family = AF_INET;
                sockaddr.sin_port = htons(PORT);
                sockaddr.sin_addr.s_addr = htonl(0x7f000001 + i);
                if (connect(fd, (struct sockaddr *)&sockaddr, sizeof(sockaddr)) == -1)
                        perror("connect");
        }

        /* Keep all sockets open so /proc/net/tcp can be inspected. */
        pause();
        return 0;
}

^ permalink raw reply

* Re: [Bonding-devel] [PATCH 4/4] bonding: add sysfs files to display tlb and alb hash table contents
From: Stephen Hemminger @ 2009-09-29  3:00 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: Andy Gospodarek, netdev, fubar, bonding-devel
In-Reply-To: <20090929013713.GG4436@gospo.rdu.redhat.com>

On Mon, 28 Sep 2009 21:37:13 -0400
Andy Gospodarek <andy@greyhouse.net> wrote:

> On Mon, Sep 28, 2009 at 05:34:20PM -0700, Stephen Hemminger wrote:
> > On Mon, 28 Sep 2009 20:12:03 -0400
> > Andy Gospodarek <andy@greyhouse.net> wrote:
> > 
> > > On Mon, Sep 28, 2009 at 04:22:37PM -0700, Stephen Hemminger wrote:
> > > > On Fri, 11 Sep 2009 17:13:17 -0400
> > > > Andy Gospodarek <andy@greyhouse.net> wrote:
> > > > 
> > > > > 
> > > > > bonding: add sysfs files to display tlb and alb hash table contents
> > > > > 
> > > > > While debugging some problems with alb (mode 6) bonding I realized that
> > > > > being able to output the contents of both hash tables would be helpful.
> > > > > This is what the output looks like for the two files:
> > > > > 
> > > > > device  load
> > > > > eth1    491
> > > > > eth2    491
> > > > > hash device   last device   tx bytes       load        next previous
> > > > > 2    eth1     eth1          2254           491         0    0
> > > > > 3    eth2     eth2          2744           491         0    0
> > > > > 6             eth2          0              488         0    0
> > > > > 8             eth2          0              461698      0    0
> > > > > 1b            eth2          0              249         0    0
> > > > > eb            eth2          0              21          0    0
> > > > > ff            eth2          0              22          0    0
> > > > > 
> > > > > hash ip_src          ip_dst          mac_dst           slave assign ntt
> > > > > 2    10.0.3.2        10.0.3.11       00:e0:81:71:ee:a9 eth1  1      0
> > > > > 3    10.0.3.2        10.0.3.10       00:e0:81:71:ee:a9 eth2  1      0
> > > > > 8    10.0.3.2        10.0.3.1        00:e0:81:71:ee:a9 eth2  1      0
> > > > > 
> > > > > These were a great help debugging the fixes I have just posted and they
> > > > > might be helpful for others, so I decided to include them in my
> > > > > patchset.
> > > > > 
> > > > > Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> > > > 
> > > > No.
> > > > 
> > > > Please don't put formatted output in sysfs, it is not meant to be
> > > > used like proc, there is supposed to be only one value per file.
> > > 
> > > Then based on the over 300 files in /sys/ that are more than 1 line on
> > > my currently running kernel, it seems there is significant work to do.
> > > 
> > > Seemingly arbitrary requests like this are extremely annoying when the
> > > current kernel violates them all over the place.
> > > 
> > 
> > The rules are documented in Documentation/sysfs-rules.txt. If you want
> > to change the rules, submit a change to the rules.
> > 
> 
> That specific request is actually in filesystems/sysfs.txt in the
> 'Attributes' section, but the fact that it's actually outlined somewhere
> makes the request seem less 'arbitrary.'  ;-)
> 

Ah, that is where the note is:
----------------------

Attributes
~~~~~~~~~~

Attributes can be exported for kobjects in the form of regular files in
the filesystem. Sysfs forwards file I/O operations to methods defined
for the attributes, providing a means to read and write kernel
attributes.

Attributes should be ASCII text files, preferably with only one value
per file. It is noted that it may not be efficient to contain only one
value per file, so it is socially acceptable to express an array of
values of the same type. 

Mixing types, expressing multiple lines of data, and doing fancy
formatting of data is heavily frowned upon. Doing these things may get
you publically humiliated and your code rewritten without notice.
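
In the spirit of that rule, a per-slave value such as the load could be exposed as one attribute per slave instead of a formatted table. A userspace sketch of what a show-style callback would produce, assuming hypothetical names (`struct slave_sketch`, `load_show_sketch`); real sysfs code would go through the kernel's attribute API rather than a plain function like this:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-slave state; not the bonding driver's real structure. */
struct slave_sketch {
	const char *name;
	unsigned long load;
};

/*
 * Show-style callback: writes exactly one value and a newline into buf
 * and returns the number of bytes written, or -1 if buf is too small.
 */
int load_show_sketch(const struct slave_sketch *s, char *buf, size_t len)
{
	int n;

	assert(s != NULL && buf != NULL);
	n = snprintf(buf, len, "%lu\n", s->load);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Returns 1 if a slave with load 491 is rendered as the single line "491". */
int demo_single_value(void)
{
	struct slave_sketch eth1 = { "eth1", 491 };
	char buf[32];

	return load_show_sketch(&eth1, buf, sizeof(buf)) == 4 &&
	       strcmp(buf, "491\n") == 0;
}
```

The multi-column hash table dumps from the patch do not fit this shape, which is why they would be a poor match for sysfs.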

-- 

^ permalink raw reply

* Re: [Bonding-devel] [PATCH 4/4] bonding: add sysfs files to display tlb and alb hash table contents
From: Andy Gospodarek @ 2009-09-29  1:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Andy Gospodarek, netdev, fubar, bonding-devel
In-Reply-To: <20090928173420.07e9dfad@nehalam>

On Mon, Sep 28, 2009 at 05:34:20PM -0700, Stephen Hemminger wrote:
> On Mon, 28 Sep 2009 20:12:03 -0400
> Andy Gospodarek <andy@greyhouse.net> wrote:
> 
> > On Mon, Sep 28, 2009 at 04:22:37PM -0700, Stephen Hemminger wrote:
> > > On Fri, 11 Sep 2009 17:13:17 -0400
> > > Andy Gospodarek <andy@greyhouse.net> wrote:
> > > 
> > > > 
> > > > bonding: add sysfs files to display tlb and alb hash table contents
> > > > 
> > > > While debugging some problems with alb (mode 6) bonding I realized that
> > > > being able to output the contents of both hash tables would be helpful.
> > > > This is what the output looks like for the two files:
> > > > 
> > > > device  load
> > > > eth1    491
> > > > eth2    491
> > > > hash device   last device   tx bytes       load        next previous
> > > > 2    eth1     eth1          2254           491         0    0
> > > > 3    eth2     eth2          2744           491         0    0
> > > > 6             eth2          0              488         0    0
> > > > 8             eth2          0              461698      0    0
> > > > 1b            eth2          0              249         0    0
> > > > eb            eth2          0              21          0    0
> > > > ff            eth2          0              22          0    0
> > > > 
> > > > hash ip_src          ip_dst          mac_dst           slave assign ntt
> > > > 2    10.0.3.2        10.0.3.11       00:e0:81:71:ee:a9 eth1  1      0
> > > > 3    10.0.3.2        10.0.3.10       00:e0:81:71:ee:a9 eth2  1      0
> > > > 8    10.0.3.2        10.0.3.1        00:e0:81:71:ee:a9 eth2  1      0
> > > > 
> > > > These were a great help debugging the fixes I have just posted and they
> > > > might be helpful for others, so I decided to include them in my
> > > > patchset.
> > > > 
> > > > Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> > > 
> > > No.
> > > 
> > > Please don't put formatted output in sysfs, it is not meant to be
> > > used like proc, there is supposed to be only one value per file.
> > 
> > Then based on the over 300 files in /sys/ that are more than 1 line on
> > my currently running kernel, it seems there is significant work to do.
> > 
> > Seemingly arbitrary requests like this are extremely annoying when the
> > current kernel violates them all over the place.
> > 
> 
> The rules are documented in Documentation/sysfs-rules.txt. If you want
> to change the rules, submit a change to the rules.
> 

That specific request is actually in filesystems/sysfs.txt in the
'Attributes' section, but the fact that it's actually outlined somewhere
makes the request seem less 'arbitrary.'  ;-)


^ permalink raw reply

* Re: [Pv-drivers] [PATCH 2.6.31-rc9] net: VMware virtual Ethernet NIC driver: vmxnet3
From: David Miller @ 2009-09-29  1:17 UTC (permalink / raw)
  To: akataria
  Cc: greg, sbhatewara, pv-drivers, netdev, shemminger, linux-kernel,
	virtualization, chrisw, anthony, akpm, jgarzik
In-Reply-To: <1254185499.13456.40.camel@ank32.eng.vmware.com>

From: Alok Kataria <akataria@vmware.com>
Date: Mon, 28 Sep 2009 17:51:39 -0700

> As a side note, were there any changes in the networking APIs that we
> should look out for in the merge cycle?
> If not, I think the rebase should be pretty trivial.

Just off the top of my head, the return type of the driver transmit
function was changed to netdev_tx_t, for one thing.

But there were likely numerous others.  You'll have to check.
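
For a driver being rebased, that change mostly affects the signature of the transmit hook. A userspace sketch of the shape follows. The definitions of `struct sk_buff`, `struct net_device` and the `netdev_tx_t` values below are stand-ins so the example compiles outside the kernel (the real ones live in the kernel headers and should be checked there), and `sketch_start_xmit` and `tx_ring_free` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types; the enum values are placeholders. */
typedef enum { NETDEV_TX_OK, NETDEV_TX_BUSY } netdev_tx_t;
struct sk_buff { int len; };
struct net_device { int tx_ring_free; };

/*
 * Transmit hook returning netdev_tx_t instead of a plain int.
 * tx_ring_free is a hypothetical count of free TX descriptors.
 */
netdev_tx_t sketch_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	assert(skb != NULL && dev != NULL);
	if (dev->tx_ring_free == 0)
		return NETDEV_TX_BUSY;  /* ring full: the stack should retry later */
	dev->tx_ring_free--;            /* pretend the packet was queued */
	return NETDEV_TX_OK;
}

/* Sends one packet on a device with `free_slots` free descriptors. */
int demo_xmit(int free_slots)
{
	struct sk_buff skb = { 64 };
	struct net_device dev = { free_slots };

	return sketch_start_xmit(&skb, &dev);
}
```

A real driver would typically also stop its queue before reporting that the ring is busy; that part is left out here.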

^ permalink raw reply

* Re: tg3 and Broadcom PHY driver
From: Ben Hutchings @ 2009-09-29  0:53 UTC (permalink / raw)
  To: David Miller; +Cc: felix, mcarlson, netdev
In-Reply-To: <20090928.145522.151077608.davem@davemloft.net>

On Mon, 2009-09-28 at 14:55 -0700, David Miller wrote:
> From: Felix Radensky <felix@embedded-sol.com>
> Date: Mon, 28 Sep 2009 23:52:54 +0200
> 
> > Yes, moving CONFIG_TIGON3 right after CONFIG_PHYLIB in
> > drivers/net/Makefile fixes the problem for me.
> 
> Thanks for testing.
> 
> We really need to fix this generically.
> 
> Does anyone think that moving the MDIO/MII/PHY layer objects
> to the top of drivers/net/Makefile will break anything?
> 
> If not, that's what we should do I think.

Only the phylib drivers actually need to be moved to fix the
initialisation order, but moving the others shouldn't hurt.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [Pv-drivers] [PATCH 2.6.31-rc9] net: VMware virtual Ethernet NIC driver: vmxnet3
From: Alok Kataria @ 2009-09-29  0:51 UTC (permalink / raw)
  To: Greg KH
  Cc: Shreyas Bhatewara, pv-drivers@vmware.com, netdev@vger.kernel.org,
	Stephen Hemminger, linux-kernel@vger.kernel.org, virtualization,
	Chris Wright, Anthony Liguori, Andrew Morton, Jeff Garzik,
	David S. Miller
In-Reply-To: <20090929002207.GB31372@kroah.com>


On Mon, 2009-09-28 at 17:22 -0700, Greg KH wrote:
> On Mon, Sep 28, 2009 at 04:56:45PM -0700, Shreyas Bhatewara wrote:

> > The patch applies to 2.6.31-rc9.
> 
> 2.6.32-rc1 is out, you should rebase to it as a few tens of thousands of
> changes have already happened since 2.6.31-rc9 :)
> 

Yep, we should rebase this. We will do that while incorporating the
comments that we got from David, and any others that you think we should
be making.

As a side note, were there any changes in the networking APIs that we
should look out for in the merge cycle?
If not, I think the rebase should be pretty trivial.

Thanks,
Alok

> thanks,
> 
> greg k-h
> _______________________________________________
> Pv-drivers mailing list
> Pv-drivers@vmware.com
> http://mailman2.vmware.com/mailman/listinfo/pv-drivers

^ permalink raw reply

* Re: [Pv-drivers] [PATCH 2.6.31-rc9] net: VMware virtual Ethernet NIC driver: vmxnet3
From: Alok Kataria @ 2009-09-29  0:47 UTC (permalink / raw)
  To: Greg KH
  Cc: Shreyas Bhatewara, pv-drivers@vmware.com, netdev@vger.kernel.org,
	Stephen Hemminger, linux-kernel@vger.kernel.org, virtualization,
	Chris Wright, Anthony Liguori, Andrew Morton, Jeff Garzik,
	David S. Miller
In-Reply-To: <20090929002056.GA31372@kroah.com>

Hi Greg,

On Mon, 2009-09-28 at 17:20 -0700, Greg KH wrote:
> On Mon, Sep 28, 2009 at 04:56:45PM -0700, Shreyas Bhatewara wrote:
> > Ethernet NIC driver for VMware's vmxnet3
> > 
> > From: Shreyas Bhatewara <sbhatewara@vmware.com>
> > 
> > This patch adds driver support for VMware's virtual Ethernet NIC : vmxnet3
> > Guests running on VMware hypervisors supporting vmxnet3 device will thus
> > have access to improved network functionalities and performance.
> > 
> > Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>
> 
> I thought this was going to be submitted for the drivers/staging/ tree.
> What happened?

We managed to do most of the cleanups in-house over the weekend and
think this shouldn't need any major cleanup now. That's why we thought
it better to submit directly.

Thanks,
Alok

> 
> thanks,
> 
> greg k-h
> _______________________________________________
> Pv-drivers mailing list
> Pv-drivers@vmware.com
> http://mailman2.vmware.com/mailman/listinfo/pv-drivers


^ permalink raw reply

* Re: [Bonding-devel] [PATCH 4/4] bonding: add sysfs files to display tlb and alb hash table contents
From: David Miller @ 2009-09-29  0:44 UTC (permalink / raw)
  To: andy; +Cc: shemminger, netdev, fubar, bonding-devel
In-Reply-To: <20090929001203.GE4436@gospo.rdu.redhat.com>

From: Andy Gospodarek <andy@greyhouse.net>
Date: Mon, 28 Sep 2009 20:12:03 -0400

> Seemingly arbitrary requests like this are extremely annoying when the
> current kernel violates them all over the place.

And two wrongs don't make a right.

Just because things are not enforced properly elsewhere doesn't
mean we can be knowingly oblivious to the issue on new stuff.

Please adhere to Stephen's request, it's legitimate.


^ permalink raw reply

* Re: [Bonding-devel] [PATCH 4/4] bonding: add sysfs files to display tlb and alb hash table contents
From: Stephen Hemminger @ 2009-09-29  0:34 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: Andy Gospodarek, netdev, fubar, bonding-devel
In-Reply-To: <20090929001203.GE4436@gospo.rdu.redhat.com>

On Mon, 28 Sep 2009 20:12:03 -0400
Andy Gospodarek <andy@greyhouse.net> wrote:

> On Mon, Sep 28, 2009 at 04:22:37PM -0700, Stephen Hemminger wrote:
> > On Fri, 11 Sep 2009 17:13:17 -0400
> > Andy Gospodarek <andy@greyhouse.net> wrote:
> > 
> > > 
> > > bonding: add sysfs files to display tlb and alb hash table contents
> > > 
> > > While debugging some problems with alb (mode 6) bonding I realized that
> > > being able to output the contents of both hash tables would be helpful.
> > > This is what the output looks like for the two files:
> > > 
> > > device  load
> > > eth1    491
> > > eth2    491
> > > hash device   last device   tx bytes       load        next previous
> > > 2    eth1     eth1          2254           491         0    0
> > > 3    eth2     eth2          2744           491         0    0
> > > 6             eth2          0              488         0    0
> > > 8             eth2          0              461698      0    0
> > > 1b            eth2          0              249         0    0
> > > eb            eth2          0              21          0    0
> > > ff            eth2          0              22          0    0
> > > 
> > > hash ip_src          ip_dst          mac_dst           slave assign ntt
> > > 2    10.0.3.2        10.0.3.11       00:e0:81:71:ee:a9 eth1  1      0
> > > 3    10.0.3.2        10.0.3.10       00:e0:81:71:ee:a9 eth2  1      0
> > > 8    10.0.3.2        10.0.3.1        00:e0:81:71:ee:a9 eth2  1      0
> > > 
> > > These were a great help debugging the fixes I have just posted and they
> > > might be helpful for others, so I decided to include them in my
> > > patchset.
> > > 
> > > Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> > 
> > No.
> > 
> > Please don't put formatted output in sysfs, it is not meant to be
> > used like proc, there is supposed to be only one value per file.
> 
> Then based on the over 300 files in /sys/ that are more than 1 line on
> my currently running kernel, it seems there is significant work to do.
> 
> Seemingly arbitrary requests like this are extremely annoying when the
> current kernel violates them all over the place.
> 

The rules are documented in Documentation/sysfs-rules.txt. If you want
to change the rules, submit a change to the rules.

-- 

^ permalink raw reply

* Re: [PATCH 2.6.31-rc9] net: VMware virtual Ethernet NIC driver: vmxnet3
From: Greg KH @ 2009-09-29  0:22 UTC (permalink / raw)
  To: Shreyas Bhatewara
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Stephen Hemminger, David S. Miller, Jeff Garzik, Anthony Liguori,
	Chris Wright, Andrew Morton, virtualization,
	pv-drivers@vmware.com
In-Reply-To: <89E2752CFA8EC044846EB849981913410173CDFAF6@EXCH-MBX-4.vmware.com>

On Mon, Sep 28, 2009 at 04:56:45PM -0700, Shreyas Bhatewara wrote:
> 
> Please consider this for inclusion in the linux net tree. I will be
> glad to receive your review comments and answer queries in order to be
> accepted in mainline in 2.6.32 release cycle.

It's usually a bit late given that the big merge window for 2.6.32 is
now closed.

> The patch applies to 2.6.31-rc9.

2.6.32-rc1 is out, you should rebase to it as a few tens of thousands of
changes have already happened since 2.6.31-rc9 :)

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH 2.6.31-rc9] net: VMware virtual Ethernet NIC driver: vmxnet3
From: Greg KH @ 2009-09-29  0:20 UTC (permalink / raw)
  To: Shreyas Bhatewara
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Stephen Hemminger, David S. Miller, Jeff Garzik, Anthony Liguori,
	Chris Wright, Andrew Morton, virtualization,
	pv-drivers@vmware.com
In-Reply-To: <89E2752CFA8EC044846EB849981913410173CDFAF6@EXCH-MBX-4.vmware.com>

On Mon, Sep 28, 2009 at 04:56:45PM -0700, Shreyas Bhatewara wrote:
> Ethernet NIC driver for VMware's vmxnet3
> 
> From: Shreyas Bhatewara <sbhatewara@vmware.com>
> 
> This patch adds driver support for VMware's virtual Ethernet NIC : vmxnet3
> Guests running on VMware hypervisors supporting vmxnet3 device will thus
> have access to improved network functionalities and performance.
> 
> Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>

I thought this was going to be submitted for the drivers/staging/ tree.
What happened?

thanks,

greg k-h

^ permalink raw reply

