Netdev List


* Re: [PATCH v3] net-tcp: TCP/IP stack bypass for loopback connections
From: Eric Dumazet @ 2012-09-20 11:51 UTC (permalink / raw)
  To: sclark46; +Cc: Bruce Curtis, David Miller, edumazet, netdev
In-Reply-To: <505AFDE9.4080602@earthlink.net>

On Thu, 2012-09-20 at 07:28 -0400, Stephen Clark wrote:
>  
> Does this mean traffic on the loopback interface will not traverse 
> netfilter?
> 

Yes, this was already mentioned.

Only the SYN / SYNACK messages will traverse it.

All data will bypass the IP stack, qdisc (if any), loopback driver, and
netfilter.


* Re: [PATCH v3] net-tcp: TCP/IP stack bypass for loopback connections
From: Stephen Clark @ 2012-09-20 11:28 UTC (permalink / raw)
  To: Bruce Curtis; +Cc: Eric Dumazet, David Miller, edumazet, netdev
In-Reply-To: <CAEkNxbEJjAu3+3yDGPGSzzee-LY_797RdNbBgcC6=-aDHfEAJQ@mail.gmail.com>

On 09/19/2012 05:19 PM, Bruce Curtis wrote:
> On Wed, Sep 19, 2012 at 2:03 PM, Eric Dumazet<eric.dumazet@gmail.com>  wrote:
>    
>> On Wed, 2012-09-19 at 16:34 -0400, David Miller wrote:
>>
>>      
>>> I have an idea on how to handle this.
>>>
>>> In drivers/net/loopback.c:loopback_tx(), skip the SKB orphan operation
>>> if there is a friend socket at skb->friend.
>>>
>>> When sending such friend SKBs out at connection startup, arrange it
>>> such that the skb->destructor will zap the skb->friend pointer to
>>> NULL.
>>>
>>> Also, in skb_orphan*(), if necessary, set skb->friend to NULL.
>>>
>>> skb->sk will hold a reference to the socket, and since skb->friend
>>> will be equal, this will make sure a pointer to an unreferenced
>>> socket does not escape.
>>>        
>> I now am wondering if we still need skb->friend field.
>>
>> If skb->sk is not zeroed by a premature skb_orphan(), then
>>
>> skb->sk->sk_friend gives the friend ?
>>
>>
>>      
Does this mean traffic on the loopback interface will not traverse 
netfilter?

-- 

"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety."  (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases."  (Thomas Jefferson)


* Re: [PATCH net-next] net: only run neigh_forced_gc() from one cpu
From: Lorenzo Colitti @ 2012-09-20 11:22 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, maze, therbert
In-Reply-To: <20120919.235102.1659819445753338481.davem@davemloft.net>

On Thu, Sep 20, 2012 at 12:51 PM, David Miller <davem@davemloft.net> wrote:
>> If this patch makes IPv6 performance better without affecting IPv4, it's a
>> good idea to apply it anyway, right? IPv6 dst entry garbage collection can
>> potentially cause serious performance issues on any server with a public
>> IPv6 address, and this patch substantially improves the situation.
>
> He's targeting net-next, and I've told him both in previous public
> discussions and in recent private communication that the correct fix
> is to make ipv6 routes use ref-count-less neighbour handling schemes
> like ipv4.

Fair enough. Removing the cache is a better solution - requiring a
separate cache entry for every address you want to send a packet to is
not suited to a world where every user has 2^64 addresses or more. But
if removing the route cache for IPv6 is a large amount of work that
nobody will sign up for, then fixing the symptoms might be better than
nothing.

The performance degradation could become an attack vector. Of course
the people that run IPv6 servers today can maintain their own patches,
but that's sort of suboptimal.

Is there something else that can be done other than moving to
non-refcounted neighbours?


* Re: Oops with latest (netfilter) nf-next tree, when unloading iptable_nat
From: Patrick McHardy @ 2012-09-20 10:31 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Jesper Dangaard Brouer, Florian Westphal, netfilter-devel, netdev,
	yongjun_wei
In-Reply-To: <20120920100859.GB20828@1984>

>> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
>> index dcb2791..0f241be 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -1224,6 +1224,8 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
>>  	spin_lock_bh(&nf_conntrack_lock);
>>  	for (; *bucket < net->ct.htable_size; (*bucket)++) {
>>  		hlist_nulls_for_each_entry(h, n, &net->ct.hash[*bucket], hnnode) {
>> +			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
>> +				continue;
>
> I think this will make the deletion of entries via `conntrack -F'
> slower, as we'll have to iterate over more entries (we won't delete
> entries for the reply tuple).

Slightly maybe, but I doubt it makes much of a difference.

> I think I prefer Florian's patch, it's fairly small and it does not
> require changing the current nf_ct_iterate behaviour or adding some
> nf_nat_iterate cleanup.

I don't think I've received it. Could you forward it to me please?
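
A minimal user-space sketch of the direction filter discussed above, not
the kernel code itself. It assumes a simplified model in which every
connection owns two hash entries, one per direction; the names conn,
tuple_hash, conn_init and iterate are hypothetical. The walk still
touches every entry, but with the filter the callback fires once per
connection rather than twice:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel API. */
enum dir { DIR_ORIGINAL = 0, DIR_REPLY = 1 };

struct conn;

struct tuple_hash {
	enum dir dir;      /* direction this hash entry represents */
	struct conn *ct;   /* connection that owns the entry */
};

struct conn {
	struct tuple_hash th[2];  /* one hash entry per direction */
	int visits;               /* times the callback has seen this connection */
};

static void conn_init(struct conn *ct)
{
	for (int d = DIR_ORIGINAL; d <= DIR_REPLY; d++) {
		ct->th[d].dir = (enum dir)d;
		ct->th[d].ct = ct;
	}
	ct->visits = 0;
}

/*
 * Walk every hash entry and "invoke the callback" (here: bump visits).
 * With original_only set, reply-direction entries are skipped, so each
 * connection is seen exactly once. Returns the number of invocations.
 */
static int iterate(struct tuple_hash *const *table, size_t n, int original_only)
{
	int calls = 0;

	assert(table != NULL);
	for (size_t i = 0; i < n; i++) {
		struct tuple_hash *h = table[i];

		if (original_only && h->dir != DIR_ORIGINAL)
			continue;
		h->ct->visits++;
		calls++;
	}
	return calls;
}
```

Calling iterate() on a table holding both entries of two connections
gives four invocations without the filter and two with it. The loop
length is the same in both cases, which is in line with the "slightly
maybe" estimate above.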


* Re: [PATCH v2] USB: remove dbg() usage in USB networking drivers
From: Greg Kroah-Hartman @ 2012-09-20 10:30 UTC (permalink / raw)
  To: Joe Perches; +Cc: netdev, linux-usb, linux-kernel
In-Reply-To: <1348135633.5604.5.camel@joe2Laptop>

On Thu, Sep 20, 2012 at 03:07:13AM -0700, Joe Perches wrote:
> On Wed, 2012-09-19 at 20:46 +0100, Greg Kroah-Hartman wrote:
> > The dbg() USB macro is so old, it predates me.  The USB networking drivers are
> > the last hold-out using this macro, and we want to get rid of it, so replace
> > the usage of it with the proper netdev_dbg() or dev_dbg() (depending on the
> > context) calls.
> 
> OK, one more bit of trivia
> 
> > diff --git a/drivers/net/usb/net1080.c b/drivers/net/usb/net1080.c
> []
> > @@ -422,8 +419,9 @@ static int net1080_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
> >  	if (!(skb->len & 0x01)) {
> >  #ifdef DEBUG
> >  		struct net_device	*net = dev->net;
> > -		dbg("rx framesize %d range %d..%d mtu %d", skb->len,
> > -			net->hard_header_len, dev->hard_mtu, net->mtu);
> > +		netdev_dbg(dev->net, "rx framesize %d range %d..%d mtu %d\n",
> > +			   skb->len, net->hard_header_len, dev->hard_mtu,
> > +			   net->mtu);
> >  #endif
> 
> maybe
> 		netdev_dbg(net, ...
> 
> or remove the odd #ifdef DEBUG surround used to guard
> the otherwise unused net variable and use:
> 
> 		netdev_dbg(dev->net, "rx framesize %d range %d..%d mtu %d\n",
> 			   skb->len, dev->net->hard_header_len, dev->hard_mtu,
> 			   dev->net->mtu);
> 

Yeah, that would be better.

Even better would be just to delete all of this debug crud from these
drivers.  Almost all of the messages are there from when the developer
originally wrote the driver, trying to figure out what was going on.
From what I have seen in doing all of these cleanups, there is a need
for maybe a few debug lines that can be used if users have issues, but
the majority are just useless.

But, as I'm not the author or maintainer of these drivers, I'll be nice
and just leave them in, all I want to do is get rid of the old, foolish,
macros for debugging and use the proper dynamic debug code that works so
much better.

So I'll leave this change alone, and if someone wants to do the cleanup
better, the 3 liner above is fine with me to add later.

thanks,

greg k-h


* Re: Oops with latest (netfilter) nf-next tree, when unloading iptable_nat
From: Pablo Neira Ayuso @ 2012-09-20 10:08 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Jesper Dangaard Brouer, Florian Westphal, netfilter-devel, netdev,
	yongjun_wei
In-Reply-To: <Pine.GSO.4.63.1209200855500.8409@stinky-local.trash.net>

On Thu, Sep 20, 2012 at 08:57:04AM +0200, Patrick McHardy wrote:
> On Wed, 19 Sep 2012, Jesper Dangaard Brouer wrote:
> 
> >On Fri, 2012-09-14 at 15:15 +0200, Patrick McHardy wrote:
> >>On Fri, 14 Sep 2012, Pablo Neira Ayuso wrote:
> >>
> >[...cut...]
> >>>>Patrick, any other idea?
> >>>
> >[...cut...]
> >>>>
> >>>We can add nf_nat_iterate_cleanup that can iterate over the NAT
> >>>hashtable to replace current usage of nf_ct_iterate_cleanup.
> >>
> >>Lets just bail out when IPS_SRC_NAT_DONE is not set, that should also fix
> >>it. Could you try this patch please?
> >
> >On Fri, 2012-09-14 at 15:15 +0200, Patrick McHardy wrote:
> >diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> >>index 29d4452..8b5d220 100644
> >>--- a/net/netfilter/nf_nat_core.c
> >>+++ b/net/netfilter/nf_nat_core.c
> >>@@ -481,6 +481,8 @@ static int nf_nat_proto_clean(struct nf_conn *i,
> >void *data)
> >>
> >>        if (!nat)
> >>                return 0;
> >>+       if (!(i->status & IPS_SRC_NAT_DONE))
> >>+               return 0;
> >>        if ((clean->l3proto && nf_ct_l3num(i) != clean->l3proto) ||
> >>            (clean->l4proto && nf_ct_protonum(i) != clean->l4proto))
> >>                return 0;
> >>
> >
> >No it does not work :-(
> 
> Ok I think I understand the problem now, we're invoking the NAT cleanup
> callback twice with clean->hash = true, once for each direction of the
> conntrack.
> 
> Does this patch fix the problem?

> commit 6c46a3bfb2776ca098565daf7e872a3283d14e0d
> Author: Patrick McHardy <kaber@trash.net>
> Date:   Thu Sep 20 08:43:02 2012 +0200
> 
>     netfilter: nf_nat: fix oops when unloading protocol modules
>     
>     When unloading a protocol module nf_ct_iterate_cleanup() is used to
>     remove all conntracks using the protocol from the bysource hash and
>     clean their NAT sections. Since the conntrack isn't actually killed,
>     the NAT callback is invoked twice, once for each direction, which
>     causes an oops when trying to delete it from the bysource hash for
>     the second time.
>     
>     The same oops can also happen when removing both an L3 and L4 protocol
>     since the cleanup function doesn't check whether the conntrack has
>     already been cleaned up.
>     
>     Pid: 4052, comm: modprobe Not tainted 3.6.0-rc3-test-nat-unload-fix+ #32 Red Hat KVM
>     RIP: 0010:[<ffffffffa002c303>]  [<ffffffffa002c303>] nf_nat_proto_clean+0x73/0xd0 [nf_nat]
>     RSP: 0018:ffff88007808fe18  EFLAGS: 00010246
>     RAX: 0000000000000000 RBX: ffff8800728550c0 RCX: ffff8800756288b0
>     RDX: dead000000200200 RSI: ffff88007808fe88 RDI: ffffffffa002f208
>     RBP: ffff88007808fe28 R08: ffff88007808e000 R09: 0000000000000000
>     R10: dead000000200200 R11: dead000000100100 R12: ffffffff81c6dc00
>     R13: ffff8800787582b8 R14: ffff880078758278 R15: ffff88007808fe88
>     FS:  00007f515985d700(0000) GS:ffff88007cd00000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>     CR2: 00007f515986a000 CR3: 000000007867a000 CR4: 00000000000006e0
>     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>     DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>     Process modprobe (pid: 4052, threadinfo ffff88007808e000, task ffff8800756288b0)
>     Stack:
>      ffff88007808fe68 ffffffffa002c290 ffff88007808fe78 ffffffff815614e3
>      ffffffff00000000 00000aeb00000246 ffff88007808fe68 ffffffff81c6dc00
>      ffff88007808fe88 ffffffffa00358a0 0000000000000000 000000000040f5b0
>     Call Trace:
>      [<ffffffffa002c290>] ? nf_nat_net_exit+0x50/0x50 [nf_nat]
>      [<ffffffff815614e3>] nf_ct_iterate_cleanup+0xc3/0x170
>      [<ffffffffa002c55a>] nf_nat_l3proto_unregister+0x8a/0x100 [nf_nat]
>      [<ffffffff812a0303>] ? compat_prepare_timeout+0x13/0xb0
>      [<ffffffffa0035848>] nf_nat_l3proto_ipv4_exit+0x10/0x23 [nf_nat_ipv4]
>      ...
>     
>     To fix this,
>     
>     - check whether the conntrack has already been cleaned up in
>       nf_nat_proto_clean
>     
>     - change nf_ct_iterate_cleanup() to only invoke the callback function
>       once for each conntrack (IP_CT_DIR_ORIGINAL).
>     
>     The second change doesn't affect other callers since when conntracks are
>     actually killed, both directions are removed from the hash immediately
>     and the callback is already only invoked once. If it is not killed, the
>     second callback invocation will always return the same decision not to
>     kill it.
>     
>     Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
>     Signed-off-by: Patrick McHardy <kaber@trash.net>
> 
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index dcb2791..0f241be 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1224,6 +1224,8 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
>  	spin_lock_bh(&nf_conntrack_lock);
>  	for (; *bucket < net->ct.htable_size; (*bucket)++) {
>  		hlist_nulls_for_each_entry(h, n, &net->ct.hash[*bucket], hnnode) {
> +			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
> +				continue;

I think this will make the deletion of entries via `conntrack -F'
slower, as we'll have to iterate over more entries (we won't delete
entries for the reply tuple).

I think I prefer Florian's patch, it's fairly small and it does not
require changing the current nf_ct_iterate behaviour or adding some
nf_nat_iterate cleanup.

>  			ct = nf_ct_tuplehash_to_ctrack(h);
>  			if (iter(ct, data))
>  				goto found;
> diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> index 1816ad3..65cf694 100644
> --- a/net/netfilter/nf_nat_core.c
> +++ b/net/netfilter/nf_nat_core.c
> @@ -481,6 +481,8 @@ static int nf_nat_proto_clean(struct nf_conn *i, void *data)
>  
>  	if (!nat)
>  		return 0;
> +	if (!(i->status & IPS_SRC_NAT_DONE))
> +		return 0;
>  	if ((clean->l3proto && nf_ct_l3num(i) != clean->l3proto) ||
>  	    (clean->l4proto && nf_ct_protonum(i) != clean->l4proto))
>  		return 0;
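
The first half of the fix in the commit message quoted above (skip
conntracks that have already been cleaned) can be pictured with a small
user-space model. This is a sketch under stated assumptions: struct
conn, SRC_NAT_DONE, bysource_unlinks and nat_clean are hypothetical
stand-ins, and the model assumes the flag is cleared in the same step
that unlinks the entry, which may not match the real cleanup sequence.

```c
#include <assert.h>
#include <stddef.h>

#define SRC_NAT_DONE 0x1u   /* hypothetical stand-in for IPS_SRC_NAT_DONE */

struct conn {
	unsigned int status;
	int in_bysource;    /* 1 while linked into the modelled bysource hash */
};

static int bysource_unlinks;  /* counts real unlink operations */

/*
 * Modelled cleanup callback. It always returns 0, i.e. it never asks
 * the iterator to kill the connection, so the iterator may call it
 * again for the same connection.
 */
static int nat_clean(struct conn *ct)
{
	assert(ct != NULL);

	/* Guard: nothing was set up, or a previous call already cleaned it. */
	if (!(ct->status & SRC_NAT_DONE))
		return 0;

	ct->in_bysource = 0;           /* the unlink that must happen only once */
	bysource_unlinks++;
	ct->status &= ~SRC_NAT_DONE;   /* assumption: cleared together with the unlink */
	return 0;
}
```

Calling nat_clean() twice on the same connection performs one unlink;
the second call returns at the guard. Without the guard the second call
would unlink an entry that is no longer on the list, which is the
situation the commit message describes.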



* [net-next 4/4 v2] ixgbevf: scheduling while atomic in reset hw path
From: Jeff Kirsher @ 2012-09-20 10:07 UTC (permalink / raw)
  To: davem; +Cc: John Fastabend, netdev, gospo, sassmann, Eric Dumazet,
	Jeff Kirsher
In-Reply-To: <1348135637-17857-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: John Fastabend <john.r.fastabend@intel.com>

In ixgbevf_reset_hw_vf(), msleep() is called while holding mbx_lock,
resulting in a "scheduling while atomic" bug (trace below).

This patch uses mdelay() instead.

BUG: scheduling while atomic: ip/6539/0x00000002
2 locks held by ip/6539:
 #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81419cc3>] rtnl_lock+0x17/0x19
 #1:  (&(&adapter->mbx_lock)->rlock){+.+...}, at: [<ffffffffa0030855>] ixgbevf_reset+0x30/0xc1 [ixgbevf]
Modules linked in: ixgbevf ixgbe mdio libfc scsi_transport_fc 8021q scsi_tgt garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 uinput igb coretemp hwmon crc32c_intel ioatdma i2c_i801 shpchp microcode lpc_ich mfd_core i2c_core joydev dca pcspkr serio_raw pata_acpi ata_generic usb_storage pata_jmicron
Pid: 6539, comm: ip Not tainted 3.6.0-rc3jk-net-next+ #104
Call Trace:
 [<ffffffff81072202>] __schedule_bug+0x6a/0x79
 [<ffffffff814bc7e0>] __schedule+0xa2/0x684
 [<ffffffff8108f85f>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffff814bd0c0>] schedule+0x64/0x66
 [<ffffffff814bb5e2>] schedule_timeout+0xa6/0xca
 [<ffffffff810536b9>] ? lock_timer_base+0x52/0x52
 [<ffffffff812629e0>] ? __udelay+0x15/0x17
 [<ffffffff814bb624>] schedule_timeout_uninterruptible+0x1e/0x20
 [<ffffffff810541c0>] msleep+0x1b/0x22
 [<ffffffffa002e723>] ixgbevf_reset_hw_vf+0x90/0xe5 [ixgbevf]
 [<ffffffffa0030860>] ixgbevf_reset+0x3b/0xc1 [ixgbevf]
 [<ffffffffa0032fba>] ixgbevf_open+0x43/0x43e [ixgbevf]
 [<ffffffff81409610>] ? dev_set_rx_mode+0x2e/0x33
 [<ffffffff8140b0f1>] __dev_open+0xa0/0xe5
 [<ffffffff814097ed>] __dev_change_flags+0xbe/0x142
 [<ffffffff8140b01c>] dev_change_flags+0x21/0x56
 [<ffffffff8141a843>] do_setlink+0x2e2/0x7f4
 [<ffffffff81016e36>] ? native_sched_clock+0x37/0x39
 [<ffffffff8141b0ac>] rtnl_newlink+0x277/0x4bb
 [<ffffffff8141aee9>] ? rtnl_newlink+0xb4/0x4bb
 [<ffffffff812217d1>] ? selinux_capable+0x32/0x3a
 [<ffffffff8104fb17>] ? ns_capable+0x4f/0x67
 [<ffffffff81419cc3>] ? rtnl_lock+0x17/0x19
 [<ffffffff81419f28>] rtnetlink_rcv_msg+0x236/0x253
 [<ffffffff81419cf2>] ? rtnetlink_rcv+0x2d/0x2d
 [<ffffffff8142fd42>] netlink_rcv_skb+0x43/0x94
 [<ffffffff81419ceb>] rtnetlink_rcv+0x26/0x2d
 [<ffffffff8142faf1>] netlink_unicast+0xee/0x174
 [<ffffffff81430327>] netlink_sendmsg+0x26a/0x288
 [<ffffffff813fb04f>] ? rcu_read_unlock+0x56/0x67
 [<ffffffff813f5e6d>] __sock_sendmsg_nosec+0x58/0x61
 [<ffffffff813f81b7>] __sock_sendmsg+0x3d/0x48
 [<ffffffff813f8339>] sock_sendmsg+0x6e/0x87
 [<ffffffff81107c9f>] ? might_fault+0xa5/0xac
 [<ffffffff81402a72>] ? copy_from_user+0x2a/0x2c
 [<ffffffff81402e62>] ? verify_iovec+0x54/0xaa
 [<ffffffff813f9834>] __sys_sendmsg+0x206/0x288
 [<ffffffff810694fa>] ? up_read+0x23/0x3d
 [<ffffffff811307e5>] ? fcheck_files+0xac/0xea
 [<ffffffff8113095e>] ? fget_light+0x3a/0xb9
 [<ffffffff813f9a2e>] sys_sendmsg+0x42/0x60
 [<ffffffff814c5ba9>] system_call_fastpath+0x16/0x1b

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-By: Robert Garrett <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/vf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
index 690801b..87b3f3b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -100,7 +100,7 @@ static s32 ixgbevf_reset_hw_vf(struct ixgbe_hw *hw)
 	msgbuf[0] = IXGBE_VF_RESET;
 	mbx->ops.write_posted(hw, msgbuf, 1);
 
-	msleep(10);
+	mdelay(10);
 
 	/* set our "perm_addr" based on info provided by PF */
 	/* also set up the mc_filter_type which is piggy backed
-- 
1.7.11.4
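
A user-space analogy for the change above, offered as a sketch rather
than kernel code. msleep() gives up the CPU, which is what the
scheduler complains about when a spinlock is held; mdelay() spins until
the time has passed. The helper names now_ns and busy_delay_ms are
hypothetical, and the sketch assumes a POSIX system with
CLOCK_MONOTONIC:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until at least `ms` milliseconds have elapsed. The loop never
 * blocks or yields voluntarily, which is the property that makes a
 * busy-wait usable where sleeping is not allowed.
 * Returns the elapsed time in nanoseconds.
 */
static uint64_t busy_delay_ms(unsigned int ms)
{
	uint64_t start = now_ns();
	uint64_t deadline = start + (uint64_t)ms * 1000000ull;
	uint64_t t;

	do {
		t = now_ns();
	} while (t < deadline);

	return t - start;
}
```

The cost of this approach is that the CPU does no other work for the
whole delay, so it is only reasonable for short waits such as the 10 ms
in the patch.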


* [net-next 3/4] ixgbevf: Add support for VF API negotiation
From: Jeff Kirsher @ 2012-09-20 10:07 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348135637-17857-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change makes it so that the VF can support the PF/VF API negotiation
protocol.  Specifically in this case we are adding support for API 1.0
which will mean that the VF is capable of cleaning up buffers that span
multiple descriptors without triggering an error.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/defines.h      |  1 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 23 ++++++++++++++
 drivers/net/ethernet/intel/ixgbevf/mbx.h          | 21 +++++++++++--
 drivers/net/ethernet/intel/ixgbevf/vf.c           | 37 +++++++++++++++++++++++
 drivers/net/ethernet/intel/ixgbevf/vf.h           |  3 ++
 5 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/defines.h b/drivers/net/ethernet/intel/ixgbevf/defines.h
index 418af82..da17ccf 100644
--- a/drivers/net/ethernet/intel/ixgbevf/defines.h
+++ b/drivers/net/ethernet/intel/ixgbevf/defines.h
@@ -272,5 +272,6 @@ struct ixgbe_adv_tx_context_desc {
 /* Error Codes */
 #define IXGBE_ERR_INVALID_MAC_ADDR              -1
 #define IXGBE_ERR_RESET_FAILED                  -2
+#define IXGBE_ERR_INVALID_ARGUMENT              -3
 
 #endif /* _IXGBEVF_DEFINES_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index a5d9cc5..c5ffe1d 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1334,6 +1334,25 @@ static void ixgbevf_init_last_counter_stats(struct ixgbevf_adapter *adapter)
 	adapter->stats.base_vfmprc = adapter->stats.last_vfmprc;
 }
 
+static void ixgbevf_negotiate_api(struct ixgbevf_adapter *adapter)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	int api[] = { ixgbe_mbox_api_10,
+		      ixgbe_mbox_api_unknown };
+	int err = 0, idx = 0;
+
+	spin_lock(&adapter->mbx_lock);
+
+	while (api[idx] != ixgbe_mbox_api_unknown) {
+		err = ixgbevf_negotiate_api_version(hw, api[idx]);
+		if (!err)
+			break;
+		idx++;
+	}
+
+	spin_unlock(&adapter->mbx_lock);
+}
+
 static void ixgbevf_up_complete(struct ixgbevf_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
@@ -1399,6 +1418,8 @@ void ixgbevf_up(struct ixgbevf_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 
+	ixgbevf_negotiate_api(adapter);
+
 	ixgbevf_configure(adapter);
 
 	ixgbevf_up_complete(adapter);
@@ -2388,6 +2409,8 @@ static int ixgbevf_open(struct net_device *netdev)
 		}
 	}
 
+	ixgbevf_negotiate_api(adapter);
+
 	/* allocate transmit descriptors */
 	err = ixgbevf_setup_all_tx_resources(adapter);
 	if (err)
diff --git a/drivers/net/ethernet/intel/ixgbevf/mbx.h b/drivers/net/ethernet/intel/ixgbevf/mbx.h
index cf9131c..946ce86 100644
--- a/drivers/net/ethernet/intel/ixgbevf/mbx.h
+++ b/drivers/net/ethernet/intel/ixgbevf/mbx.h
@@ -76,12 +76,29 @@
 /* bits 23:16 are used for exra info for certain messages */
 #define IXGBE_VT_MSGINFO_MASK     (0xFF << IXGBE_VT_MSGINFO_SHIFT)
 
+/* definitions to support mailbox API version negotiation */
+
+/*
+ * each element denotes a version of the API; existing numbers may not
+ * change; any additions must go at the end
+ */
+enum ixgbe_pfvf_api_rev {
+	ixgbe_mbox_api_10,	/* API version 1.0, linux/freebsd VF driver */
+	ixgbe_mbox_api_20,	/* API version 2.0, solaris Phase1 VF driver */
+	/* This value should always be last */
+	ixgbe_mbox_api_unknown,	/* indicates that API version is not known */
+};
+
+/* mailbox API, legacy requests */
 #define IXGBE_VF_RESET            0x01 /* VF requests reset */
 #define IXGBE_VF_SET_MAC_ADDR     0x02 /* VF requests PF to set MAC addr */
 #define IXGBE_VF_SET_MULTICAST    0x03 /* VF requests PF to set MC addr */
 #define IXGBE_VF_SET_VLAN         0x04 /* VF requests PF to set VLAN */
-#define IXGBE_VF_SET_LPE          0x05 /* VF requests PF to set VMOLR.LPE */
-#define IXGBE_VF_SET_MACVLAN      0x06 /* VF requests PF for unicast filter */
+
+/* mailbox API, version 1.0 VF requests */
+#define IXGBE_VF_SET_LPE	0x05 /* VF requests PF to set VMOLR.LPE */
+#define IXGBE_VF_SET_MACVLAN	0x06 /* VF requests PF for unicast filter */
+#define IXGBE_VF_API_NEGOTIATE	0x08 /* negotiate API version */
 
 /* length of permanent address message returned from PF */
 #define IXGBE_VF_PERMADDR_MSG_LEN 4
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
index 3d555a1..690801b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -79,6 +79,9 @@ static s32 ixgbevf_reset_hw_vf(struct ixgbe_hw *hw)
 	/* Call adapter stop to disable tx/rx and clear interrupts */
 	hw->mac.ops.stop_adapter(hw);
 
+	/* reset the api version */
+	hw->api_version = ixgbe_mbox_api_10;
+
 	IXGBE_WRITE_REG(hw, IXGBE_VFCTRL, IXGBE_CTRL_RST);
 	IXGBE_WRITE_FLUSH(hw);
 
@@ -433,6 +436,40 @@ void ixgbevf_rlpml_set_vf(struct ixgbe_hw *hw, u16 max_size)
 	ixgbevf_write_msg_read_ack(hw, msgbuf, 2);
 }
 
+/**
+ *  ixgbevf_negotiate_api_version - Negotiate supported API version
+ *  @hw: pointer to the HW structure
+ *  @api: integer containing requested API version
+ **/
+int ixgbevf_negotiate_api_version(struct ixgbe_hw *hw, int api)
+{
+	int err;
+	u32 msg[3];
+
+	/* Negotiate the mailbox API version */
+	msg[0] = IXGBE_VF_API_NEGOTIATE;
+	msg[1] = api;
+	msg[2] = 0;
+	err = hw->mbx.ops.write_posted(hw, msg, 3);
+
+	if (!err)
+		err = hw->mbx.ops.read_posted(hw, msg, 3);
+
+	if (!err) {
+		msg[0] &= ~IXGBE_VT_MSGTYPE_CTS;
+
+		/* Store value and return 0 on success */
+		if (msg[0] == (IXGBE_VF_API_NEGOTIATE | IXGBE_VT_MSGTYPE_ACK)) {
+			hw->api_version = api;
+			return 0;
+		}
+
+		err = IXGBE_ERR_INVALID_ARGUMENT;
+	}
+
+	return err;
+}
+
 static const struct ixgbe_mac_operations ixgbevf_mac_ops = {
 	.init_hw             = ixgbevf_init_hw_vf,
 	.reset_hw            = ixgbevf_reset_hw_vf,
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.h b/drivers/net/ethernet/intel/ixgbevf/vf.h
index 07fd876..47f11a5 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.h
@@ -137,6 +137,8 @@ struct ixgbe_hw {
 
 	u8  revision_id;
 	bool adapter_stopped;
+
+	int api_version;
 };
 
 struct ixgbevf_hw_stats {
@@ -171,5 +173,6 @@ struct ixgbevf_info {
 };
 
 void ixgbevf_rlpml_set_vf(struct ixgbe_hw *hw, u16 max_size);
+int ixgbevf_negotiate_api_version(struct ixgbe_hw *hw, int api);
 #endif /* __IXGBE_VF_H__ */
 
-- 
1.7.11.4
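
The negotiation loop in this patch walks a list of API versions and
stops at the first one the PF accepts. The user-space sketch below
models that pattern with a mock PF. It is illustrative only: mock_pf,
pf_negotiate and negotiate are hypothetical names, the mock's accept
rule is an assumption, and the preference list with a 2.0 entry is for
demonstration (the patch itself only offers 1.0).

```c
#include <assert.h>
#include <stddef.h>

/* Same ordering idea as the enum added in mbx.h; values are local to this sketch. */
enum api_rev { API_10, API_20, API_UNKNOWN };

/* Hypothetical stand-in for the PF: accepts any version up to max_supported. */
struct mock_pf {
	int max_supported;  /* -1 models a PF that rejects every request */
	int requests;       /* number of negotiation messages received */
};

static int pf_negotiate(struct mock_pf *pf, int api)
{
	pf->requests++;
	return (api >= 0 && api <= pf->max_supported) ? 0 : -1;
}

/*
 * Try each candidate in preference order; API_UNKNOWN terminates the
 * list. Returns the accepted version, or `fallback` when the PF
 * rejects every candidate.
 */
static int negotiate(struct mock_pf *pf, const int *candidates, int fallback)
{
	assert(pf != NULL && candidates != NULL);

	for (size_t i = 0; candidates[i] != API_UNKNOWN; i++) {
		if (pf_negotiate(pf, candidates[i]) == 0)
			return candidates[i];
	}
	return fallback;
}
```

Using API_10 as the fallback mirrors the patch, where
ixgbevf_reset_hw_vf() resets hw->api_version to ixgbe_mbox_api_10 and a
failed negotiation leaves that value in place.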


* [net-next 2/4] igb: Support to enable EEE on all eee_supported devices
From: Jeff Kirsher @ 2012-09-20 10:07 UTC (permalink / raw)
  To: davem; +Cc: Akeem G. Abodunrin, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348135637-17857-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: "Akeem G. Abodunrin" <akeem.g.abodunrin@intel.com>

The current implementation enables EEE only on i350 devices. This patch
enables EEE on all eee_supported devices. It also configures the LPI clock
to keep running before EEE is enabled on i210 and i211 devices.

Signed-off-by: Akeem G. Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Jeff Pieper  <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/e1000_82575.c   | 17 +++++++++++++----
 drivers/net/ethernet/intel/igb/e1000_defines.h |  3 ++-
 drivers/net/ethernet/intel/igb/e1000_regs.h    |  1 +
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c
index ba994fb..ca4641e 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.c
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.c
@@ -2223,11 +2223,10 @@ out:
 s32 igb_set_eee_i350(struct e1000_hw *hw)
 {
 	s32 ret_val = 0;
-	u32 ipcnfg, eeer, ctrl_ext;
+	u32 ipcnfg, eeer;
 
-	ctrl_ext = rd32(E1000_CTRL_EXT);
-	if ((hw->mac.type != e1000_i350) ||
-	    (ctrl_ext & E1000_CTRL_EXT_LINK_MODE_MASK))
+	if ((hw->mac.type < e1000_i350) ||
+	    (hw->phy.media_type != e1000_media_type_copper))
 		goto out;
 	ipcnfg = rd32(E1000_IPCNFG);
 	eeer = rd32(E1000_EEER);
@@ -2240,6 +2239,14 @@ s32 igb_set_eee_i350(struct e1000_hw *hw)
 			E1000_EEER_RX_LPI_EN |
 			E1000_EEER_LPI_FC);
 
+		/* keep the LPI clock running before EEE is enabled */
+		if (hw->mac.type == e1000_i210 || hw->mac.type == e1000_i211) {
+			u32 eee_su;
+			eee_su = rd32(E1000_EEE_SU);
+			eee_su &= ~E1000_EEE_SU_LPI_CLK_STP;
+			wr32(E1000_EEE_SU, eee_su);
+		}
+
 	} else {
 		ipcnfg &= ~(E1000_IPCNFG_EEE_1G_AN |
 			E1000_IPCNFG_EEE_100M_AN);
@@ -2249,6 +2256,8 @@ s32 igb_set_eee_i350(struct e1000_hw *hw)
 	}
 	wr32(E1000_IPCNFG, ipcnfg);
 	wr32(E1000_EEER, eeer);
+	rd32(E1000_IPCNFG);
+	rd32(E1000_EEER);
 out:
 
 	return ret_val;
diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
index cae3070..de4b41e 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -857,8 +857,9 @@
 #define E1000_IPCNFG_EEE_100M_AN     0x00000004  /* EEE Enable 100M AN */
 #define E1000_EEER_TX_LPI_EN         0x00010000  /* EEE Tx LPI Enable */
 #define E1000_EEER_RX_LPI_EN         0x00020000  /* EEE Rx LPI Enable */
-#define E1000_EEER_FRC_AN            0x10000000 /* Enable EEE in loopback */
+#define E1000_EEER_FRC_AN            0x10000000  /* Enable EEE in loopback */
 #define E1000_EEER_LPI_FC            0x00040000  /* EEE Enable on FC */
+#define E1000_EEE_SU_LPI_CLK_STP     0X00800000  /* EEE LPI Clock Stop */
 
 /* SerDes Control */
 #define E1000_GEN_CTL_READY             0x80000000
diff --git a/drivers/net/ethernet/intel/igb/e1000_regs.h b/drivers/net/ethernet/intel/igb/e1000_regs.h
index faec840..e5db485 100644
--- a/drivers/net/ethernet/intel/igb/e1000_regs.h
+++ b/drivers/net/ethernet/intel/igb/e1000_regs.h
@@ -349,6 +349,7 @@
 /* Energy Efficient Ethernet "EEE" register */
 #define E1000_IPCNFG  0x0E38  /* Internal PHY Configuration */
 #define E1000_EEER    0x0E30  /* Energy Efficient Ethernet */
+#define E1000_EEE_SU  0X0E34  /* EEE Setup */
 
 /* Thermal Sensor Register */
 #define E1000_THSTAT    0x08110 /* Thermal Sensor Status */
-- 
1.7.11.4
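
The i210/i211 part of this patch is a read-modify-write that clears one
bit and leaves the rest of the register alone. A small user-space
sketch of that pattern follows; the register is simulated by a plain
variable, and fake_eee_su, reg_read, reg_write and
keep_lpi_clock_running are hypothetical names. Only the bit value is
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EEE_SU_LPI_CLK_STP 0x00800000u  /* bit value as defined in the patch */

static uint32_t fake_eee_su;  /* hypothetical stand-in for the device register */

static uint32_t reg_read(void)
{
	return fake_eee_su;
}

static void reg_write(uint32_t v)
{
	fake_eee_su = v;
}

/* Clear the "LPI clock stop" bit while preserving every other bit. */
static void keep_lpi_clock_running(void)
{
	uint32_t v = reg_read();

	v &= ~EEE_SU_LPI_CLK_STP;
	reg_write(v);

	assert((reg_read() & EEE_SU_LPI_CLK_STP) == 0);
}
```

Starting from 0x00800003, the call leaves 0x00000003 in the simulated
register: the clock-stop bit is cleared and the two low bits are
untouched.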


* [net-next 1/4] igb: Remove artificial restriction on RQDPC stat reading
From: Jeff Kirsher @ 2012-09-20 10:07 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348135637-17857-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

For some reason the reading of the RQDPC register was being artificially
limited to 4K.  Instead of limiting the value we should read the value and
add the full amount.  Otherwise this can lead to a misleading number of
dropped packets when the actual value is in fact much higher.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Jeff Pieper   <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 19d7666..246646b 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4681,11 +4681,13 @@ void igb_update_stats(struct igb_adapter *adapter,
 	bytes = 0;
 	packets = 0;
 	for (i = 0; i < adapter->num_rx_queues; i++) {
-		u32 rqdpc_tmp = rd32(E1000_RQDPC(i)) & 0x0FFF;
+		u32 rqdpc = rd32(E1000_RQDPC(i));
 		struct igb_ring *ring = adapter->rx_ring[i];
 
-		ring->rx_stats.drops += rqdpc_tmp;
-		net_stats->rx_fifo_errors += rqdpc_tmp;
+		if (rqdpc) {
+			ring->rx_stats.drops += rqdpc;
+			net_stats->rx_fifo_errors += rqdpc;
+		}
 
 		do {
 			start = u64_stats_fetch_begin_bh(&ring->rx_syncp);
-- 
1.7.11.4


* [net-next 0/4 v2][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-09-20 10:07 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains updates to igb and ixgbevf.

v2: updated patch description in 04 patch (ixgbevf: scheduling while
    atomic in reset hw path)

The following are changes since commit aee77e4accbeb2c86b1d294cd84fec4a12dde3bd:
  r8169: use unlimited DMA burst for TX
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Akeem G. Abodunrin (1):
  igb: Support to enable EEE on all eee_supported devices

Alexander Duyck (2):
  igb: Remove artificial restriction on RQDPC stat reading
  ixgbevf: Add support for VF API negotiation

John Fastabend (1):
  ixgbevf: scheduling while atomic in reset hw path

 drivers/net/ethernet/intel/igb/e1000_82575.c      | 17 +++++++---
 drivers/net/ethernet/intel/igb/e1000_defines.h    |  3 +-
 drivers/net/ethernet/intel/igb/e1000_regs.h       |  1 +
 drivers/net/ethernet/intel/igb/igb_main.c         |  8 +++--
 drivers/net/ethernet/intel/ixgbevf/defines.h      |  1 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 23 +++++++++++++
 drivers/net/ethernet/intel/ixgbevf/mbx.h          | 21 ++++++++++--
 drivers/net/ethernet/intel/ixgbevf/vf.c           | 39 ++++++++++++++++++++++-
 drivers/net/ethernet/intel/ixgbevf/vf.h           |  3 ++
 9 files changed, 105 insertions(+), 11 deletions(-)

-- 
1.7.11.4


* Re: [PATCH v2] USB: remove dbg() usage in USB networking drivers
From: Joe Perches @ 2012-09-20 10:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: netdev, linux-usb, linux-kernel
In-Reply-To: <20120919194614.GA17585@kroah.com>

On Wed, 2012-09-19 at 20:46 +0100, Greg Kroah-Hartman wrote:
> The dbg() USB macro is so old, it predates me.  The USB networking drivers are
> the last hold-out using this macro, and we want to get rid of it, so replace
> the usage of it with the proper netdev_dbg() or dev_dbg() (depending on the
> context) calls.

OK, one more bit of trivia

> diff --git a/drivers/net/usb/net1080.c b/drivers/net/usb/net1080.c
[]
> @@ -422,8 +419,9 @@ static int net1080_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
>  	if (!(skb->len & 0x01)) {
>  #ifdef DEBUG
>  		struct net_device	*net = dev->net;
> -		dbg("rx framesize %d range %d..%d mtu %d", skb->len,
> -			net->hard_header_len, dev->hard_mtu, net->mtu);
> +		netdev_dbg(dev->net, "rx framesize %d range %d..%d mtu %d\n",
> +			   skb->len, net->hard_header_len, dev->hard_mtu,
> +			   net->mtu);
>  #endif

maybe
		netdev_dbg(net, ...

or remove the odd #ifdef DEBUG surround used to guard
the otherwise unused net variable and use:

		netdev_dbg(dev->net, "rx framesize %d range %d..%d mtu %d\n",
			   skb->len, dev->net->hard_header_len, dev->hard_mtu,
			   dev->net->mtu);

^ permalink raw reply

* Re: [PATCH] tcp: restore rcv_wscale in a repair mode (v2)
From: Pavel Emelyanov @ 2012-09-20  9:31 UTC (permalink / raw)
  To: David S. Miller
  Cc: Andrew Vagin, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <1348083600-3881500-1-git-send-email-avagin@openvz.org>

On 09/19/2012 11:40 PM, Andrew Vagin wrote:
> rcv_wscale is a symmetric parameter with snd_wscale.
> 
> Both of these parameters are set during the connection handshake.
> 
> Without this value the remote window size cannot be interpreted correctly,
> because the value from a packet should be shifted by rcv_wscale.
> 
> And one more thing is that wscale_ok should be set too.
> 
> This patch doesn't break backward compatibility.
> If someone uses the old scheme, the rcv window
> will be restored with the same bug (rcv_wscale = 0).
> 
> v2: Preserve backward compatibility on big-endian systems. Previously
>     the first two bytes were snd_wscale and the second two bytes were
>     rcv_wscale. Now snd_wscale is opt_val & 0xFFFF and rcv_wscale is
>     opt_val >> 16. This approach is independent of byte ordering.
> 
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: James Morris <jmorris@namei.org>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Patrick McHardy <kaber@trash.net>
> CC: Pavel Emelyanov <xemul@parallels.com>
> Signed-off-by: Andrew Vagin <avagin@openvz.org>

Acked-by: Pavel Emelyanov <xemul@parallels.com>
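
For readers following along, the v2 encoding quoted above can be sketched
in plain C. This is only an illustration of the arithmetic from the commit
message, not the kernel code: the helper names wscale_pack(), wscale_snd()
and wscale_rcv() are made up for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not kernel API): both window-scale values share
 * one 32-bit option value.  Shifts and masks work on the integer value,
 * so the result does not depend on the byte order of the machine.
 */
static inline uint32_t wscale_pack(uint16_t snd_wscale, uint16_t rcv_wscale)
{
	return (uint32_t)snd_wscale | ((uint32_t)rcv_wscale << 16);
}

/* Low 16 bits: snd_wscale = opt_val & 0xFFFF */
static inline uint16_t wscale_snd(uint32_t opt_val)
{
	return (uint16_t)(opt_val & 0xFFFF);
}

/* High 16 bits: rcv_wscale = opt_val >> 16 */
static inline uint16_t wscale_rcv(uint32_t opt_val)
{
	return (uint16_t)(opt_val >> 16);
}
```

A caller that still passes only a small snd_wscale value gets rcv_wscale = 0
back from wscale_rcv(), which matches the "same bug as before" behaviour
described in the commit message.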

^ permalink raw reply

* Re: linux-next: build failure after merge of the final tree (net-next tree related)
From: Mika Westerberg @ 2012-09-20  9:10 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: David Miller, netdev, linux-next, linux-kernel
In-Reply-To: <20120920173622.2aa7209cd241a3945f4384d4@canb.auug.org.au>

On Thu, Sep 20, 2012 at 05:36:22PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the final tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
> 
> drivers/net/ethernet/i825xx/znet.c: In function 'hardware_init':
> drivers/net/ethernet/i825xx/znet.c:868:2: error: implicit declaration of function 'isa_virt_to_bus' [-Werror=implicit-function-declaration]
> 
> Caused by commit 1d3ff76759b7 ("i825xx: znet: fix compiler warnings when
> building a 64-bit kernel").  Is there some Kconfig dependency missing (CONFIG_ISA)?

If we make it depend on CONFIG_ISA then the driver cannot be built with
a 64-bit kernel. Then again, is anyone running a 64-bit kernel on a Zenith
Z-Note notebook? From the pictures it looks like a very ancient "laptop".

An alternative is to make it depend on X86 like this:

diff --git a/drivers/net/ethernet/i825xx/Kconfig b/drivers/net/ethernet/i825xx/Kconfig
index fed5080..959faf7 100644
--- a/drivers/net/ethernet/i825xx/Kconfig
+++ b/drivers/net/ethernet/i825xx/Kconfig
@@ -150,7 +150,7 @@ config SUN3_82586
 
 config ZNET
 	tristate "Zenith Z-Note support (EXPERIMENTAL)"
-	depends on EXPERIMENTAL && ISA_DMA_API
+	depends on EXPERIMENTAL && ISA_DMA_API && X86
 	---help---
 	  The Zenith Z-Note notebook computer has a built-in network
 	  (Ethernet) card, and this is the Linux driver for it. Note that the

^ permalink raw reply related

* Re: Regarding ethernet directory between IP and SoC chip vendor.
From: Jeff Kirsher @ 2012-09-20  8:35 UTC (permalink / raw)
  To: byungho an
  Cc: netdev, davem, peppe.cavallaro, deepak.silki, francesco.virlinzi,
	eilong, alexander.h.duyck, bhutchings, linville, wey-ty.w.guy,
	coelho, e.wahlig, aditya.ps, ihlee215
In-Reply-To: <CAG4h5ywgCS0n8t70uUNmSOVhKBPkOLeVLaO+D4tYfUT2qN7qTg@mail.gmail.com>

On Wed, 2012-09-19 at 21:39 +0900, byungho an wrote:
> Hi all,
> 
> I have one suggestion for ethernet dir.
> Currently It is well-defined and good for management.
> 
> But if the IP vendor is different from the SoC vendor, it is a bit confusing
> to guess the dir name.
> For example, stmmac is using Synopsys dwmac.
> In this case, if another SoC vendor tries to use the Synopsys IP, should
> they make their own dir under their own name? Even though the IP is the same...
> 
> If there were a common dir for the IP vendor, it would be clearer.
> In that case, other SoC vendors that use the IP could create their own
> directories and drivers intuitively.
> 
> What do you think about it?
> I want to exchange opinions and find a reasonable and rational way.
> 
> Thank you.
> Andy

A lot of thought went into how to organize all the Ethernet drivers, and
after much discussion, it seemed best to organize the drivers by the
manufacturer rather than by who was supporting/writing the driver.  The
reason is that the manufacturer of the silicon was not going to
change as frequently as the driver supporters.  We did not want to have
to change the location of a driver every time a company was bought.

I personally am open to suggestions on improving the directory structure,
so if you have an idea on how to improve it, please
provide a patch.  Keep in mind that drivers cannot keep moving because
some company bought another company.

^ permalink raw reply

* Re: [RFC PATCH 0/3] usbnet: support runtime PM triggered by link change
From: Bjørn Mork @ 2012-09-20  8:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, David S. Miller, Greg Kroah-Hartman, Fink Dmitry,
	Rafael Wysocki, Alan Stern, netdev, linux-usb
In-Reply-To: <CACVXFVONb+pKkydhp3iesLt0tKvc+56uv6sd5S0UwUnGZiowYA@mail.gmail.com>

Ming Lei <ming.lei@canonical.com> writes:
> On Mon, Sep 17, 2012 at 4:04 PM, Oliver Neukum <oneukum@suse.de> wrote:
>
>> 1) Does it actually save power? You are constantly waking up a CPU.
>
> Of course it does. I don't know how much power it will save on the
> usbnet device alone, but it may at least save power in the usb hub and
> usb host controller on the bus.
>
> Anyway we don't need to waste power if the link of usbnet is down.
>
> It just wakes up the CPU once every 3 seconds. In a modern linux
> distribution, the CPU is woken up tens of times per second, so it
> shouldn't be a big problem.

I do not buy that constantly polling a device necessarily saves any
power compared to keeping the USB link to the device alive.  You need to
measure the savings if you want us to believe that.

You are not only waking the host CPU.  You are also waking the device
CPU. 

Any usbnet device will consist of more than one building block, having
at least a network block, a USB block and a CPU.  For all you know, the
device CPU might be in deep sleep until it has to service either the
network or USB block, and those might also be sleeping until they see an
interrupt.  Constantly polling the device to receive network link status
might use considerably more power than keeping the USB link up waiting
for a link notification.

If you were to implement this feature anyway, then I would prefer a
userspace based approach instead.  I see no reason to keep the timer in
the kernel.  You could decide to suspend whenever the netdev is down,
and only poll the link when userspace tries to bring up the netdev.
That would let a userspace daemon implement the same feature by trying
to bring up the netdev every 3 seconds (or whatever frequency the user
selected).  And it would allow me to avoid the polling until I know I
have plugged in a cable.

>> From that perspective it is possible that leaving on the ethernet is actually
>> better in terms of power. Only measurements can answer that question.
>
> You know it is a bit difficult to measure power savings in this situation.
> And most runtime PM patches didn't provide power saving data. In fact,
> I'd like to do it, but I don't have the equipment to measure it, :-(.

We don't know, you don't know.  But you claim that your change saves
power, so please provide some documentation showing that it does.

>> 2) Do we have many devices that would be serviced with this approach?
>
> At least I found asix can be serviced by this approach. Considering that
> asix is quite popular, it is worth the effort. Also the devices below can
> be serviced by the patch in theory:
>
>                    dm9601.c / mcs7830.c / sierra_net.c

The sierra_net.c driver is only used for wireless devices AFAIK. I
really don't see the use case for network link based resume of that.
There is no cable to plug.  Userspace will have to initiate a
connection.

And the DirectIP device I've got (MC7710) supports remote wakeup.  I
assume that will be the case for all such devices, given that this is
mostly a firmware feature. So the correct fix for sierra_net.c is to add
support for remote wakeup.  Then you will be able to suspend the device
regardless of whether the link is up or down, without the constant
polling.

Bjørn

^ permalink raw reply

* [PATCH v2] ucc_geth: Reduce IRQ off in xmit path
From: Joakim Tjernlund @ 2012-09-20  8:17 UTC (permalink / raw)
  To: netdev, Francois Romieu; +Cc: Joakim Tjernlund
In-Reply-To: <20120919223416.GA16087@electric-eye.fr.zoreil.com>

Currently ucc_geth_start_xmit wraps IRQ off for the
whole body just to be safe.
Reduce the IRQ off period to a minimum.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---

 v2: Move assignment of ugeth->tx_skbuff[txQ][ugeth->skb_curtx[txQ]]
     inside IRQ off section to prevent racing against
     ucc_geth_tx(). Spotted by Francois Romieu <romieu@fr.zoreil.com>

 drivers/net/ethernet/freescale/ucc_geth.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index 9ac14f8..0100bca 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -3181,21 +3181,20 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	ugeth_vdbg("%s: IN", __func__);
 
-	spin_lock_irqsave(&ugeth->lock, flags);
-
 	dev->stats.tx_bytes += skb->len;
 
 	/* Start from the next BD that should be filled */
 	bd = ugeth->txBd[txQ];
 	bd_status = in_be32((u32 __iomem *)bd);
-	/* Save the skb pointer so we can free it later */
-	ugeth->tx_skbuff[txQ][ugeth->skb_curtx[txQ]] = skb;
 
 	/* Update the current skb pointer (wrapping if this was the last) */
 	ugeth->skb_curtx[txQ] =
 	    (ugeth->skb_curtx[txQ] +
 	     1) & TX_RING_MOD_MASK(ugeth->ug_info->bdRingLenTx[txQ]);
 
+	spin_lock_irqsave(&ugeth->lock, flags);
+	/* Save the skb pointer so we can free it later */
+	ugeth->tx_skbuff[txQ][ugeth->skb_curtx[txQ]] = skb;
 	/* set up the buffer descriptor */
 	out_be32(&((struct qe_bd __iomem *)bd)->buf,
 		      dma_map_single(ugeth->dev, skb->data,
@@ -3207,6 +3206,8 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	/* set bd status and length */
 	out_be32((u32 __iomem *)bd, bd_status);
+	spin_unlock_irqrestore(&ugeth->lock, flags);
+
 
 	/* Move to next BD in the ring */
 	if (!(bd_status & T_W))
@@ -3238,8 +3239,6 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	uccf = ugeth->uccf;
 	out_be16(uccf->p_utodr, UCC_FAST_TOD);
 #endif
-	spin_unlock_irqrestore(&ugeth->lock, flags);
-
 	return NETDEV_TX_OK;
 }
 
-- 
1.7.8.6

^ permalink raw reply related

* Re: [PATCH 1/5] ucc_geth: Reduce IRQ off in xmit path
From: Joakim Tjernlund @ 2012-09-20  8:06 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev
In-Reply-To: <20120919223416.GA16087@electric-eye.fr.zoreil.com>

Francois Romieu <romieu@fr.zoreil.com> wrote on 2012/09/20 00:34:16:
>
> Joakim Tjernlund <Joakim.Tjernlund@transmode.se> :
> > Currently ucc_geth_start_xmit wraps IRQ off for the
> > whole body just to be safe.
> > Reduce the IRQ off period to a minimum.
>
> It opens a window in ucc_geth_start_xmit where the skb slot in
> ugeth->tx_skbuff[txQ] is set and T_RA has not been written into
> the descriptor status. Consider a racing poll : the !skb test in
> ucc_geth_tx may not work as expected.

Right, good catch!

Surprisingly the driver never showed any malfunction even though I hit
it pretty hard.

I will send a V2 of this patch where I move the assignment inside the IRQ
off part:
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -3200,8 +3200,6 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
        /* Start from the next BD that should be filled */
        bd = ugeth->txBd[txQ];
        bd_status = in_be32((u32 __iomem *)bd);
-       /* Save the skb pointer so we can free it later */
-       ugeth->tx_skbuff[txQ][ugeth->skb_curtx[txQ]] = skb;

        /* Update the current skb pointer (wrapping if this was the last) */
        ugeth->skb_curtx[txQ] =
@@ -3209,6 +3207,8 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
             1) & TX_RING_MOD_MASK(ugeth->ug_info->bdRingLenTx[txQ]);

        spin_lock_irqsave(&ugeth->lock, flags);
+       /* Save the skb pointer so we can free it later */
+       ugeth->tx_skbuff[txQ][ugeth->skb_curtx[txQ]] = skb;
        /* set up the buffer descriptor */
        out_be32(&((struct qe_bd __iomem *)bd)->buf,
                      dma_map_single(ugeth->dev, skb->data,

 Jocke

^ permalink raw reply

* Re: [PATCH net-next 01/11] pps/ptp: Allow PHC devices to adjust PPS events for known delay
From: Rodolfo Giometti @ 2012-09-20  7:29 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, netdev, linux-net-drivers, Richard Cochran,
	Andrew Jackson
In-Reply-To: <1348082024.2636.16.camel@bwh-desktop.uk.solarflarecom.com>

On Wed, Sep 19, 2012 at 08:13:44PM +0100, Ben Hutchings wrote:
> Initial version by Stuart Hodgson <smhodgson@solarflare.com>
> 
> Some PHC device drivers may deliver PPS events with a significant
> and variable delay, but still be able to measure precisely what
> that delay is.
> 
> Add a pps_sub_ts() function for subtracting a delay from the
> timestamp(s) in a PPS event, and a PTP event type (PTP_CLOCK_PPSUSR)
> for which the caller provides a complete PPS event.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Acked-by: Rodolfo Giometti <giometti@enneenne.com>

-- 

GNU/Linux Solutions                  e-mail: giometti@enneenne.com
Linux Device Driver                          giometti@linux.it
Embedded Systems                     phone:  +39 349 2432127
UNIX programming                     skype:  rodolfo.giometti
Freelance ICT Italia - Consulente ICT Italia - www.consulenti-ict.it

^ permalink raw reply

* Re: [PATCH] at91ether: return PTR_ERR if call to clk_get fails
From: Nicolas Ferre @ 2012-09-20  7:42 UTC (permalink / raw)
  To: Devendra Naga, netdev, David Miller; +Cc: linux-arm-kernel
In-Reply-To: <1348124676-6627-1-git-send-email-devendra.aaru@gmail.com>

On 09/20/2012 09:04 AM, Devendra Naga :
> We currently return -ENODEV. Since clk_get may give an exact
> error code in its returned pointer, assign it to res using the
> PTR_ERR function, so that the subsequent goto label jumps to the
> error path, cleans up the driver and returns the error correctly.
> 
> Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>

Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>

Thanks,

> ---
>  drivers/net/ethernet/cadence/at91_ether.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/at91_ether.c b/drivers/net/ethernet/cadence/at91_ether.c
> index 7788419..4e980a7 100644
> --- a/drivers/net/ethernet/cadence/at91_ether.c
> +++ b/drivers/net/ethernet/cadence/at91_ether.c
> @@ -1086,7 +1086,7 @@ static int __init at91ether_probe(struct platform_device *pdev)
>  	/* Clock */
>  	lp->ether_clk = clk_get(&pdev->dev, "ether_clk");
>  	if (IS_ERR(lp->ether_clk)) {
> -		res = -ENODEV;
> +		res = PTR_ERR(lp->ether_clk);
>  		goto err_ioumap;
>  	}
>  	clk_enable(lp->ether_clk);
> 
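
As background for the change quoted above, here is a minimal userspace
sketch of the error-pointer idea behind IS_ERR() and PTR_ERR(). It assumes
the usual convention that the top 4095 address values encode negative errno
codes. The names err_ptr(), is_err(), ptr_err(), fake_clk_get() and probe()
are stand-ins invented for this example; they are not the kernel
implementation.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* assumed size of the reserved error range */

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* A pointer is an error if it falls in the last MAX_ERRNO addresses. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Recover the negative errno value from an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical stand-in for clk_get(): fails with -fail_with if nonzero. */
static void *fake_clk_get(int fail_with)
{
	static int clk;

	return fail_with ? err_ptr(-fail_with) : (void *)&clk;
}

/* Propagate the exact error instead of a hard-coded -ENODEV. */
static int probe(int fail_with)
{
	void *clk = fake_clk_get(fail_with);

	if (is_err(clk))
		return (int)ptr_err(clk);
	return 0;
}
```

With this shape, the caller of probe() sees whichever code the lookup
produced rather than a single fixed value.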


-- 
Nicolas Ferre

^ permalink raw reply

* Re: [PATCH 2/4] ipv6: unify conntrack reassembly expire code with standard one
From: Cong Wang @ 2012-09-20  7:41 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Netfilter Developers, Herbert Xu, Michal Kubeček,
	David Miller, Hideaki YOSHIFUJI, Patrick McHardy,
	Pablo Neira Ayuso
In-Reply-To: <Pine.LNX.4.64.1209191659150.12601@ask.diku.dk>

On Wed, 2012-09-19 at 17:12 +0200, Jesper Dangaard Brouer wrote:
> On Wed, 19 Sep 2012, Cong Wang wrote:
> 
> [cut]
> > With this patch applied, I can see ICMP Time Exceeded sent
> > from the receiver when the sender sent out 3/4 fragmented
> > IPv6 UPD packet.
> 
> Typo "UPD" -> "UDP"
> 
> If people want to redo the IPv6 UDP fragment tests, they can use my scapy 
> script, and comment out sending the last fragment:
>   https://github.com/netoptimizer/network-testing/blob/master/scapy/ipv6_fragment01.py
> 
> Another thing, could you please "mark"/put the version of the patch in the 
> subject line, like:
> 
>   [PATCH V4 2/4] ipv6: ...
> 
> This makes it easier, to follow on which version of the patch people are 
> replying to.
> 
> With git send-email I think you have to do:
> 
>    git send-email --subject-prefix="PATCH V4"
> 
> And with stg (stacked git) I usually do:
> 
>    stg mail --version "V4" --to netdev ...

Thanks, Jesper!

Unfortunately, git-send-email on F16 doesn't have --subject-prefix
option (git-format-patch does), that is why I didn't add "V4" to every
patch. Perhaps I should use git-format-patch + git-send-email next time.



^ permalink raw reply

* Re: [PATCH 5/6] xfrm_user: ensure user supplied esn replay window is valid
From: Mathias Krause @ 2012-09-20  7:37 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Ben Hutchings, David S. Miller, netdev, linux-kernel,
	Martin Willi
In-Reply-To: <20120920070508.GA4221@secunet.com>

On Thu, Sep 20, 2012 at 9:05 AM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> On Thu, Sep 20, 2012 at 08:12:11AM +0200, Mathias Krause wrote:
>> On Thu, Sep 20, 2012 at 12:38 AM, Ben Hutchings
>> <bhutchings@solarflare.com> wrote:
>> > On Wed, 2012-09-19 at 23:33 +0200, Mathias Krause wrote:
>>
>> > I'm a little worried that the user-provided
>> > xfrm_replay_state_esn::bmp_len is not being directly validated anywhere.
>>
>> That's what my P.S. in the cover letter tried to hint at -- a missing
>> upper limit check. But as I wanted to avoid lengthy discussions about
>> the concrete value and the possible need for some sysctl knob to tune
>> this even further, I just left this as an exercise for someone else
>> who is more familiar with the code ;)
>>
>
> I think we should limit bmp_len to some sane value. RFC 4303 recommends
> an anti replay window size of 64 packets, so limiting bmp_len to cover
> 4096 packets should be more than enough. Also we can increase this value
> later without changing the user API if this is needed.

Okay. If no-one objects, I'll add an upper limit check for 4096
packets to verify_replay().
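
A rough sketch of what such a check could look like, written as standalone
C rather than against the real xfrm structures. It assumes that bmp_len
counts 32-bit bitmap words with one bit per packet, so a 4096 packet limit
corresponds to 128 words. The struct esn_state type, ESN_MAX_PACKETS and the
two functions are hypothetical names for illustration only.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ESN_MAX_PACKETS 4096	/* illustrative limit from the discussion */

/* Simplified, hypothetical stand-in for the user-supplied ESN state. */
struct esn_state {
	uint32_t bmp_len;	/* assumed: number of 32-bit bitmap words */
	uint32_t replay_window;	/* window size in packets */
	uint32_t bmp[];		/* one bit per packet */
};

/* Reject oversized bitmaps and windows that do not fit in the bitmap. */
static int esn_verify(uint32_t bmp_len, uint32_t replay_window)
{
	const uint32_t bits_per_word = 8 * sizeof(uint32_t);

	if (bmp_len > ESN_MAX_PACKETS / bits_per_word)
		return -EINVAL;
	if (replay_window > bmp_len * bits_per_word)
		return -EINVAL;
	return 0;
}

/* With bmp_len bounded as above, this sum cannot overflow. */
static size_t esn_state_len(uint32_t bmp_len)
{
	return sizeof(struct esn_state) + (size_t)bmp_len * sizeof(uint32_t);
}
```

Because esn_verify() caps bmp_len at 128 before any multiplication, both
bmp_len * bits_per_word and the length calculation stay far below the range
where an int or size_t could wrap.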

>> [...]
>> I disagree. The value of nla_len() is ensured to be in the range of
>> [sizeof(*up), USHRT_MAX-NLA_HDRLEN], i.e. a positive 16 bit number,
>> when it passes nlmsg_parse() in xfrm_user_rcv_msg(). This in turn
>> allows us to assume the int value returned by nla_len() is actually
>> positive and the compiler can safely make it unsigned for the compare
>> -- no sign bit, no hassle.
>
> I think xfrm_replay_state_esn_len() should return the same type as
> nla_len(), no matter what we can assume from the current code base.

The type of the expression calculated in xfrm_replay_state_esn_len()
is size_t; the functions the value gets passed on to (k*alloc, kmemdup,
memcpy, memcmp) expect a size_t argument; expressions where the value
is evaluated to calculate sizes (e.g. in xfrm_sa_len) operate on
size_t types. So size_t feels just natural.

> Also it should not return anything else than the other xfrm length
> calculation functions.

So the other functions should have a return type of size_t, too?

Anyway, such a cleanup should go into a separate patch as the other
functions are not vulnerable to an overflow like it could happen in
xfrm_replay_state_esn_len().

> Once we limit bmp_len, xfrm_replay_state_esn_len() should always
> return a positive value.

True. So int it'll be then again for xfrm_replay_state_esn_len() in v3
of the patch.


Thanks,
Mathias

^ permalink raw reply

* Re: [Patch net-next] l2tp: fix compile error when CONFIG_IPV6=m and CONFIG_L2TP=y
From: Cong Wang @ 2012-09-20  7:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120919.164403.1265757047954729748.davem@davemloft.net>

On Wed, 2012-09-19 at 16:44 -0400, David Miller wrote:
> From: Cong Wang <amwang@redhat.com>
> Date: Tue, 18 Sep 2012 13:54:02 +0800
> 
> > When CONFIG_IPV6=m and CONFIG_L2TP=y, I got the following compile error:
>  ...
> > This is because l2tp uses symbols from IPV6, so when l2tp is
> > builtin, IPV6 has to be builtin too.
> > 
> > Cc: David Miller <davem@davemloft.net>
> > Signed-off-by: Cong Wang <amwang@redhat.com>
> 
> The correct way to express this is:
> 
> 	depends on (IPV6 || IPV6=n)
> 
> Which results in the KCONFIG option only being offered in modes
> compatible with the dependency.  Using a 'select' doesn't work
> properly in these kinds of cases.
> 
> Anyways, grep for that string to see how it is used in other similar
> situations.
> 

Thanks for the hints! I will update this patch.

^ permalink raw reply

* linux-next: build failure after merge of the final tree (net-next tree related)
From: Stephen Rothwell @ 2012-09-20  7:36 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: linux-next, linux-kernel, Mika Westerberg

Hi all,

After merging the final tree, today's linux-next build (powerpc
allyesconfig) failed like this:

drivers/net/ethernet/i825xx/znet.c: In function 'hardware_init':
drivers/net/ethernet/i825xx/znet.c:868:2: error: implicit declaration of function 'isa_virt_to_bus' [-Werror=implicit-function-declaration]

Caused by commit 1d3ff76759b7 ("i825xx: znet: fix compiler warnings when
building a 64-bit kernel").  Is there some Kconfig dependency missing (CONFIG_ISA)?

I have reverted that commit for today.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

^ permalink raw reply

* Re: Oops with latest (netfilter) nf-next tree, when unloading iptable_nat
From: Patrick McHardy @ 2012-09-20  7:31 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Pablo Neira Ayuso, Florian Westphal, netfilter-devel, netdev,
	yongjun_wei
In-Reply-To: <1348126142.2761.172.camel@localhost>

On Thu, 20 Sep 2012, Jesper Dangaard Brouer wrote:

> On Thu, 2012-09-20 at 08:57 +0200, Patrick McHardy wrote:
>> On Wed, 19 Sep 2012, Jesper Dangaard Brouer wrote:
>>
>>> On Fri, 2012-09-14 at 15:15 +0200, Patrick McHardy wrote:
>>>> On Fri, 14 Sep 2012, Pablo Neira Ayuso wrote:
>>>>
>>> [...cut...]
>>>>>> Patrick, any other idea?
>>>>>
>>> [...cut...]
> [... (hair)cut(?)...]
>
>>> No it does not work :-(
>>
>> Ok I think I understand the problem now, we're invoking the NAT cleanup
>> callback twice with clean->hash = true, once for each direction of the
>> conntrack.
>>
>> Does this patch fix the problem?
>
> Yes, it fixes the problem :-)
>
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

Great, thanks for testing.

^ permalink raw reply
