Netdev List
* [PATCH] rtnl: Add #ifdef CONFIG_RPS around num_rx_queues reference
From: Mark A. Greer @ 2012-07-20 23:35 UTC
  To: netdev; +Cc: davem, Mark A. Greer, Jiri Pirko

From: "Mark A. Greer" <mgreer@animalcreek.com>

Commit 76ff5cc91935c51fcf1a6a99ffa28b97a6e7a884
(rtnl: allow to specify number of rx and tx queues
on device creation) added a reference to the net_device
structure's 'num_rx_queues' member in

	net/core/rtnetlink.c:rtnl_fill_ifinfo()

However, the definition for 'num_rx_queues' is surrounded
by an '#ifdef CONFIG_RPS' while the new reference to it is
not.  This causes a compile error when CONFIG_RPS is not
defined.

Fix the compile error by surrounding the new reference to
'num_rx_queues' by an '#ifdef CONFIG_RPS'.

CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
---

The problem can be easily reproduced by compiling with
davinci_all_defconfig (ARCH=arm).  I don't know this
area well enough to know whether that (and other)
defconfigs should have CONFIG_RPS enabled or not, or
whether there is some missing Kconfig logic to enable
it.

 net/core/rtnetlink.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 5bb1ebc..334b930 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -892,7 +892,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	    nla_put_u32(skb, IFLA_GROUP, dev->group) ||
 	    nla_put_u32(skb, IFLA_PROMISCUITY, dev->promiscuity) ||
 	    nla_put_u32(skb, IFLA_NUM_TX_QUEUES, dev->num_tx_queues) ||
+#ifdef CONFIG_RPS
 	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
+#endif
 	    (dev->ifindex != dev->iflink &&
 	     nla_put_u32(skb, IFLA_LINK, dev->iflink)) ||
 	    (dev->master &&
-- 
1.7.11.2
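
The failure mode described above can be reproduced outside the kernel. The following is a minimal userspace sketch, not the kernel code: demo_net_device, demo_put_u32() and demo_fill_ifinfo() are hypothetical stand-ins for struct net_device, nla_put_u32() and rtnl_fill_ifinfo(). It shows that a member defined under '#ifdef CONFIG_RPS' can only be referenced under the same guard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device: as in the patch description,
 * num_rx_queues exists only when CONFIG_RPS is defined. */
struct demo_net_device {
	unsigned int num_tx_queues;
#ifdef CONFIG_RPS
	unsigned int num_rx_queues;
#endif
};

/* Hypothetical stand-in for nla_put_u32(): counts the attribute and
 * returns 0 on success. */
static int demo_put_u32(unsigned int *count, unsigned int value)
{
	assert(count != NULL);
	(void)value; /* a real implementation would append the value to a message */
	(*count)++;
	return 0;
}

/* Returns 0 on success; *count receives the number of attributes emitted.
 * Without the inner #ifdef, this function would not compile when
 * CONFIG_RPS is undefined, because the member would not exist. */
static int demo_fill_ifinfo(const struct demo_net_device *dev,
			    unsigned int *count)
{
	*count = 0;
	if (demo_put_u32(count, dev->num_tx_queues) ||
#ifdef CONFIG_RPS
	    demo_put_u32(count, dev->num_rx_queues) ||
#endif
	    0)
		return -1;
	return 0;
}
```

Built without -DCONFIG_RPS the sketch emits one attribute; built with -DCONFIG_RPS it emits two.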

* [PATCH RFT] net: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value
From: Shuah Khan @ 2012-07-20 23:34 UTC
  To: davem, mcarlson, bhutchings, eric.dumazet, mchan; +Cc: netdev, LKML, shuahkhan

Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return
value to be consistent with the rest of the checks after niu_rbr_add_page()
calls in this file.

Signed-off-by: Shuah Khan <shuah.khan@hp.com>
---
 drivers/net/ethernet/sun/niu.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sun/niu.c b/drivers/net/ethernet/sun/niu.c
index 60d5c03..c2a0fe3 100644
--- a/drivers/net/ethernet/sun/niu.c
+++ b/drivers/net/ethernet/sun/niu.c
@@ -3517,7 +3517,7 @@ static int niu_rbr_fill(struct niu *np, struct rx_ring_info *rp, gfp_t mask)
 	err = 0;
 	while (index < (rp->rbr_table_size - blocks_per_page)) {
 		err = niu_rbr_add_page(np, rp, mask, index);
-		if (err)
+		if (unlikely(err))
 			break;
 
 		index += blocks_per_page;
-- 
1.7.9.5
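
For context, unlikely() is a branch-prediction hint and does not change the result of the test. The following is a minimal userspace sketch under stated assumptions: demo_unlikely() approximates the kernel macro using the GCC/Clang builtin __builtin_expect, and demo_add_page() and demo_fill() are hypothetical stand-ins for niu_rbr_add_page() and niu_rbr_fill().

```c
#include <assert.h>

/* Userspace approximation of the kernel's unlikely(); assumes a compiler
 * that provides __builtin_expect (GCC or Clang). */
#define demo_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for niu_rbr_add_page(): fails at index fail_at. */
static int demo_add_page(int index, int fail_at)
{
	return index == fail_at ? -12 /* -ENOMEM */ : 0;
}

/* Adds pages for indices 0, step, 2*step, ... below limit and stops at the
 * first error. The number of pages added is returned through *added. */
static int demo_fill(int limit, int step, int fail_at, int *added)
{
	int err = 0;
	int index = 0;

	assert(step > 0);
	*added = 0;
	while (index < limit) {
		err = demo_add_page(index, fail_at);
		if (demo_unlikely(err)) /* hint only: same truth value as if (err) */
			break;
		index += step;
		(*added)++;
	}
	return err;
}
```

The loop behaves identically with or without the hint; only the compiler's code layout for the error path may differ.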

* Re: [net-next 4/6] e1000: configure and read MDI settings
From: Ben Hutchings @ 2012-07-20 23:27 UTC
  To: Jeff Kirsher
  Cc: davem, Jesse Brandeburg, netdev, gospo, sassmann, Tushar Dave
In-Reply-To: <1342820631-19738-5-git-send-email-jeffrey.t.kirsher@intel.com>

On Fri, 2012-07-20 at 14:43 -0700, Jeff Kirsher wrote:
> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 
> This is the implementation in e1000 to allow ethtool to force
> MDI state, allowing users to work around some improperly
> behaving switches.
> 
> Current get_settings behavior slightly changes in that now when link is down
> get_settings will return the MDI state of the last link because get_settings
> needs to succeed to allow the set to work even when link is down.
> 
> Forcing in this driver is for now only allowed when auto-neg is enabled.
> 
> To use this, you must have a matching version of the ethtool
> application that supports this functionality.
> 
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> CC: Tushar Dave <tushar.n.dave@intel.com>
> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_ethtool.c |   34 ++++++++++++++++++++++
>  drivers/net/ethernet/intel/e1000/e1000_main.c    |    4 +++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
> index 3103f0b..1d96bda 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
> @@ -174,6 +174,15 @@ static int e1000_get_settings(struct net_device *netdev,
>  
>  	ecmd->autoneg = ((hw->media_type == e1000_media_type_fiber) ||
>  			 hw->autoneg) ? AUTONEG_ENABLE : AUTONEG_DISABLE;
> +
> +	/* MDI-X => 1; MDI => 0 */
> +	if (hw->media_type == e1000_media_type_copper)
> +		ecmd->eth_tp_mdix = (!!adapter->phy_info.mdix_mode ?
> +							ETH_TP_MDI_X :
> +							ETH_TP_MDI);
> +	else
> +		ecmd->eth_tp_mdix = ETH_TP_MDI_INVALID;
[...]

Why don't you set ecmd->eth_tp_mdix_ctrl here?

If you also leave it as 0, it's impossible for userland to tell whether
the current mode was forced or automatically selected.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
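
The review comment above can be illustrated with a minimal userspace sketch. Everything here is an assumption made for illustration: the DEMO_TP_* values are assumed to mirror the ETH_TP_MDI_* constants in the kernel's ethtool header, and demo_cmd, demo_phy and its 'forced' flag are hypothetical, not e1000 structures. The point is only that reporting a control value alongside the status lets userland distinguish a forced mode from an automatically selected one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the ETH_TP_MDI_* values in the kernel's ethtool header. */
enum {
	DEMO_TP_MDI_INVALID = 0, /* status unknown, or control unsupported */
	DEMO_TP_MDI         = 1, /* status MDI, or control "force MDI" */
	DEMO_TP_MDI_X       = 2, /* status MDI-X, or control "force MDI-X" */
	DEMO_TP_MDI_AUTO    = 3  /* control "auto-select" */
};

struct demo_cmd {
	uint8_t eth_tp_mdix;      /* current state */
	uint8_t eth_tp_mdix_ctrl; /* how that state was chosen */
};

struct demo_phy {
	int is_copper;
	int mdix_mode; /* nonzero means MDI-X */
	int forced;    /* hypothetical flag: nonzero if the user forced the mode */
};

static void demo_get_settings(const struct demo_phy *phy, struct demo_cmd *cmd)
{
	assert(phy != NULL && cmd != NULL);

	if (!phy->is_copper) {
		cmd->eth_tp_mdix = DEMO_TP_MDI_INVALID;
		cmd->eth_tp_mdix_ctrl = DEMO_TP_MDI_INVALID;
		return;
	}

	cmd->eth_tp_mdix = phy->mdix_mode ? DEMO_TP_MDI_X : DEMO_TP_MDI;
	/* Report the control value too, so the two cases differ for the caller. */
	cmd->eth_tp_mdix_ctrl = phy->forced ? cmd->eth_tp_mdix : DEMO_TP_MDI_AUTO;
}
```

With the control field left at 0, both the forced and the auto-selected case would look the same to the caller, which is the ambiguity raised in the question.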

* Re: [PATCH] ipv4: show pmtu in route list
From: David Miller @ 2012-07-20 23:22 UTC
  To: ja; +Cc: netdev
In-Reply-To: <alpine.LFD.2.00.1207210219470.1893@ja.ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Sat, 21 Jul 2012 02:26:28 +0300 (EEST)

> 	I'll try this weekend to reorganize the seqlock
> usage in tcp_metrics.c and to provide a method to feed
> rt_fill_info with values from this cache.

Wouldn't it be better to just export the TCP metrics via its own
file or netlink facility?

The keying of the TCP metrics is completely different from how routes
are keyed.  So I see little value in creating the illusion that these
two things live in the same keying domain.

The routing cache will be completely gone and /proc/net/rt_cache will
be an empty file.

* Re: [PATCH] ipv4: show pmtu in route list
From: Julian Anastasov @ 2012-07-20 23:26 UTC
  To: David Miller; +Cc: netdev
In-Reply-To: <20120720.111620.1609537308761180.davem@davemloft.net>


	Hello,

On Fri, 20 Jul 2012, David Miller wrote:

> From: Julian Anastasov <ja@ssi.bg>
> Date: Fri, 20 Jul 2012 12:02:08 +0300
> 
> > Is this patch still useful if routing cache is removed?
> 
> It is, since this function still gets used for rtnetlink route
> queries.  So I'll apply this, thanks Julian!
> 
> Which reminds me that we don't have a way to inspect the new TCP metrics
> cache.  Would someone like to work on that?

	I'll try this weekend to reorganize the seqlock
usage in tcp_metrics.c and to provide a method to feed
rt_fill_info with values from this cache.

Regards

--
Julian Anastasov <ja@ssi.bg>

* Re: [PATCH] forcedeth: spin_unlock_irq in interrupt handler fix
From: David Miller @ 2012-07-20 23:18 UTC
  To: yefremov.denis
  Cc: david.decotigny, edumazet, jpirko, ian.campbell, netdev,
	linux-kernel
In-Reply-To: <1342821274-20623-1-git-send-email-yefremov.denis@gmail.com>

From: Denis Efremov <yefremov.denis@gmail.com>
Date: Sat, 21 Jul 2012 01:54:34 +0400

> Replace the spin_lock_irq/spin_unlock_irq pair in the interrupt
> handler with a spin_lock_irqsave/spin_unlock_irqrestore pair.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Denis Efremov <yefremov.denis@gmail.com>

Applied, thanks.

* Re: [GIT net-next] Open vSwitch
From: David Miller @ 2012-07-20 23:17 UTC
  To: jesse-l0M0P4e3n4LQT0dZR+AlfA
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1342823210-3308-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

From: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Date: Fri, 20 Jul 2012 15:26:43 -0700

> A few bug fixes and small enhancements for net-next/3.6.
> 
> The following changes since commit bf32fecdc1851ad9ca960f56771b798d17c26cf1:
> 
>   openvswitch: Add length check when retrieving TCP flags. (2012-04-02 14:28:57 -0700)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch.git master

Pulled, thanks Jesse.

* Re: [PATCH 00/16] Remove the ipv4 routing cache
From: David Miller @ 2012-07-20 23:13 UTC
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <20120720.155412.1512575922394612084.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Fri, 20 Jul 2012 15:54:12 -0700 (PDT)

> From: David Miller <davem@davemloft.net>
> Date: Fri, 20 Jul 2012 15:50:21 -0700 (PDT)
> 
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Sat, 21 Jul 2012 00:42:46 +0200
>> 
>>> (Apparently we choke on neighbour entries count.
>>> 
>>> entries = atomic_inc_return(&tbl->entries) - 1;
>>> 
>>> We need a percpu_counter ? Or something is wrong ?
>> 
>> What do you mean we choke on it?  Does it exceed the thresholds
>> and we start garbage-collecting?
>> 
>> That would indicate a leak, or we are creating new neigh entries when
>> we shouldn't be, ie. we're not comparing the keys in the hash table
>> entries correctly during the lookup in net/ipv4/ip_output.c
> 
> I see the problem, we get the key wrong during neigh creation for
> loopback.
> 
> I'll fix this, thanks.

This should do it:

====================
[PATCH] ipv4: Fix neigh lookup keying over loopback/point-to-point devices.

We were using a special key "0" for all loopback and point-to-point
device neigh lookups under ipv4, but we wouldn't use that special
key for the neigh creation.

So basically we'd make a new neigh at each and every lookup :-)

This special case to use only one neigh for these device types
is of dubious value, so just remove it entirely.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/arp.h |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/include/net/arp.h b/include/net/arp.h
index 4617d98..7f7df93 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -21,9 +21,6 @@ static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev
 	struct neighbour *n;
 	u32 hash_val;
 
-	if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT))
-		key = 0;
-
 	hash_val = arp_hashfn(key, dev, nht->hash_rnd[0]) >> (32 - nht->hash_shift);
 	for (n = rcu_dereference_bh(nht->hash_buckets[hash_val]);
 	     n != NULL;
-- 
1.7.10.4
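
The mismatch described in the commit message can be modelled with a minimal userspace sketch. This is not the kernel's neighbour table: demo_table, demo_lookup(), demo_create() and demo_xmit() are hypothetical, and a linear array stands in for the hash table. It shows why looking up with key 0 while creating with the real key produces a new entry on every lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ENTRIES 64

/* Hypothetical stand-in for the neighbour table. */
struct demo_table {
	uint32_t keys[DEMO_MAX_ENTRIES];
	size_t entries;
};

static int demo_lookup(const struct demo_table *t, uint32_t key)
{
	for (size_t i = 0; i < t->entries; i++)
		if (t->keys[i] == key)
			return 1;
	return 0;
}

static void demo_create(struct demo_table *t, uint32_t key)
{
	assert(t->entries < DEMO_MAX_ENTRIES);
	t->keys[t->entries++] = key;
}

/* One transmit: look up the neighbour and create it on a miss.
 * special_key != 0 models the removed special case: the lookup key is
 * forced to 0, but creation still uses the real key. */
static void demo_xmit(struct demo_table *t, uint32_t key, int special_key)
{
	uint32_t lookup_key = special_key ? 0 : key;

	if (!demo_lookup(t, lookup_key))
		demo_create(t, key);
}
```

Ten transmits to the same address leave ten entries with the special case and one entry without it, which matches the growing entry count reported in the thread.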

* Re: [PATCH 00/16] Remove the ipv4 routing cache
From: David Miller @ 2012-07-20 22:54 UTC
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <20120720.155021.1919200619716435427.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Fri, 20 Jul 2012 15:50:21 -0700 (PDT)

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sat, 21 Jul 2012 00:42:46 +0200
> 
>> (Apparently we choke on neighbour entries count.
>> 
>> entries = atomic_inc_return(&tbl->entries) - 1;
>> 
>> We need a percpu_counter ? Or something is wrong ?
> 
> What do you mean we choke on it?  Does it exceed the thresholds
> and we start garbage-collecting?
> 
> That would indicate a leak, or we are creating new neigh entries when
> we shouldn't be, ie. we're not comparing the keys in the hash table
> entries correctly during the lookup in net/ipv4/ip_output.c

I see the problem, we get the key wrong during neigh creation for
loopback.

I'll fix this, thanks.

* Re: [PATCH 00/16] Remove the ipv4 routing cache
From: David Miller @ 2012-07-20 22:50 UTC
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1342824166.2626.8112.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 21 Jul 2012 00:42:46 +0200

> (Apparently we choke on neighbour entries count.
> 
> entries = atomic_inc_return(&tbl->entries) - 1;
> 
> We need a percpu_counter ? Or something is wrong ?

What do you mean we choke on it?  Does it exceed the thresholds
and we start garbage-collecting?

That would indicate a leak, or we are creating new neigh entries when
we shouldn't be, ie. we're not comparing the keys in the hash table
entries correctly during the lookup in net/ipv4/ip_output.c

* Re: [PATCH 00/16] Remove the ipv4 routing cache
From: Eric Dumazet @ 2012-07-20 22:42 UTC
  To: David Miller; +Cc: netdev
In-Reply-To: <1342821959.2626.8052.camel@edumazet-glaptop>

On Sat, 2012-07-21 at 00:06 +0200, Eric Dumazet wrote:

> Hmm, ok, please give me few hours to make some tests ;)
> 

It seems we have a big regression somewhere with net-next,
but it is already there...

(Apparently we choke on neighbour entries count.

entries = atomic_inc_return(&tbl->entries) - 1;

We need a percpu_counter ? Or something is wrong ?

We also choke on write_lock_bh(&tbl->lock); (__write_lock_failed())
 in __neigh_create()

current 'linux' tree :

tbench 24 -t 60

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    8433514     0.023     1.566
 Close        6195255     0.023     1.450
 Rename        357080     0.022     1.457
 Unlink       1702925     0.023     1.409
 Deltree          240     0.000     0.001
 Mkdir            120     0.024     0.032
 Qpathinfo    7643560     0.023     1.565
 Qfileinfo    1340393     0.023     1.566
 Qfsinfo      1401593     0.023     1.425
 Sfileinfo     686932     0.023     0.237
 Find         2955412     0.023     1.566
 WriteX       4209695     0.043     1.468
 ReadX        13218668     0.029     1.614
 LockX          27458     0.024     0.059
 UnlockX        27458     0.024     0.056
 Flush         591126     0.023     0.317

Throughput 4418.83 MB/sec  24 clients  24 procs  max_latency=2.433 ms

net-next tree with your 16 patches :

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    6545220     0.031    14.433
 Close        4808070     0.031    14.105
 Rename        277171     0.030     0.737
 Unlink       1321711     0.031     2.370
 Deltree          172     0.000     0.001
 Mkdir             86     0.033     0.134
 Qpathinfo    5932577     0.031    11.607
 Qfileinfo    1039922     0.031     6.075
 Qfsinfo      1087803     0.031    12.178
 Sfileinfo     533226     0.031     0.993
 Find         2293696     0.031    11.059
 WriteX       3264634     0.054    19.164
 ReadX        10260208     0.038    11.857
 LockX          21319     0.032     0.168
 UnlockX        21319     0.032     0.162
 Flush         458724     0.032     1.774

Throughput 3425.42 MB/sec  24 clients  24 procs  max_latency=19.174 ms



perf output for linux tree :

Samples: 6M of event 'cycles', Event count (approx.): 4966119889380
  4,18%           tbench  tbench                             [.] 0x0000000000001f49
  4,09%           tbench  libc-2.15.so                       [.] 0x000000000003cb08
  3,10%       tbench_srv  [kernel.kallsyms]                  [k] copy_user_generic_string
  2,05%       tbench_srv  [kernel.kallsyms]                  [k] ipt_do_table
  2,04%           tbench  [kernel.kallsyms]                  [k] ipt_do_table
  1,48%           tbench  [kernel.kallsyms]                  [k] copy_user_generic_string
  1,43%       tbench_srv  [kernel.kallsyms]                  [k] tcp_ack
  1,08%       tbench_srv  [kernel.kallsyms]                  [k] tcp_recvmsg
  1,06%           tbench  [kernel.kallsyms]                  [k] nf_iterate
  1,00%       tbench_srv  [kernel.kallsyms]                  [k] nf_iterate
  0,94%       tbench_srv  [nf_conntrack]                     [k] tcp_packet
  0,94%           tbench  [nf_conntrack]                     [k] tcp_packet
  0,90%       tbench_srv  [kernel.kallsyms]                  [k] __schedule
  0,87%           tbench  [kernel.kallsyms]                  [k] __schedule
  0,87%       tbench_srv  [kernel.kallsyms]                  [k] _raw_spin_lock_bh
  0,85%       tbench_srv  [kernel.kallsyms]                  [k] tcp_sendmsg
  0,80%           tbench  [kernel.kallsyms]                  [k] __switch_to
  0,79%           tbench  [kernel.kallsyms]                  [k] _raw_spin_lock_bh
  0,77%           tbench  libc-2.15.so                       [.] vfprintf
  0,76%           tbench  [kernel.kallsyms]                  [k] tcp_sendmsg
  0,74%           tbench  [kernel.kallsyms]                  [k] select_task_rq_fair
  0,72%       tbench_srv  tbench_srv                         [.] 0x0000000000001840
  0,70%       tbench_srv  libc-2.15.so                       [.] recv
  0,65%       tbench_srv  [kernel.kallsyms]                  [k] tcp_rcv_established
  0,65%           tbench  [kernel.kallsyms]                  [k] tcp_transmit_skb
  0,64%           tbench  [vdso]                             [.] 0x00007fffd93459e8
  0,63%       tbench_srv  [kernel.kallsyms]                  [k] tcp_transmit_skb
  0,63%           tbench  [kernel.kallsyms]                  [k] tcp_recvmsg
  0,55%       tbench_srv  [nf_conntrack]                     [k] nf_conntrack_in

perf for net-next tree :

Samples: 6M of event 'cycles', Event count (approx.): 4685309724658
  3,42%           tbench  tbench                         [.] 0x00000000000035ab
  3,32%           tbench  libc-2.15.so                   [.] 0x00000000000913f0
  2,52%       tbench_srv  [kernel.kallsyms]              [k] copy_user_generic_string
  1,75%           tbench  [kernel.kallsyms]              [k] ipt_do_table
  1,71%       tbench_srv  [kernel.kallsyms]              [k] ipt_do_table
  1,31%           tbench  [kernel.kallsyms]              [k] __neigh_create
  1,25%       tbench_srv  [kernel.kallsyms]              [k] __neigh_create
  1,23%           tbench  [kernel.kallsyms]              [k] nf_iterate
  1,19%           tbench  [kernel.kallsyms]              [k] copy_user_generic_string
  1,19%       tbench_srv  [kernel.kallsyms]              [k] nf_iterate
  1,02%       tbench_srv  [kernel.kallsyms]              [k] tcp_ack
  0,96%       tbench_srv  [kernel.kallsyms]              [k] tcp_recvmsg
  0,88%       tbench_srv  [kernel.kallsyms]              [k] __write_lock_failed
  0,88%           tbench  [kernel.kallsyms]              [k] __write_lock_failed
  0,82%           tbench  [kernel.kallsyms]              [k] __schedule
  0,77%       tbench_srv  [kernel.kallsyms]              [k] tcp_sendmsg
  0,76%       tbench_srv  [kernel.kallsyms]              [k] __schedule
  0,76%           tbench  [nf_conntrack]                 [k] tcp_packet
  0,74%       tbench_srv  [nf_conntrack]                 [k] tcp_packet
  0,74%           tbench  [kernel.kallsyms]              [k] __switch_to
  0,71%       tbench_srv  [kernel.kallsyms]              [k] _raw_spin_lock_bh
  0,68%           tbench  [kernel.kallsyms]              [k] tcp_sendmsg
  0,66%           tbench  [kernel.kallsyms]              [k] _raw_spin_lock_bh
  0,63%           tbench  [kernel.kallsyms]              [k] ip_finish_output
  0,63%           tbench  [kernel.kallsyms]              [k] tcp_recvmsg
  0,61%       tbench_srv  [kernel.kallsyms]              [k] ip_finish_output
  0,61%           tbench  [vdso]                         [.] 0x00007fffb57ff8d1
  0,60%       tbench_srv  libc-2.15.so                   [.] recv
  0,59%           tbench  [kernel.kallsyms]              [k] neigh_destroy

* Re: ibmveth bug?
From: Nishanth Aravamudan @ 2012-07-20 22:41 UTC
  To: santil; +Cc: anton, benh, paulus, netdev, linux-kernel
In-Reply-To: <20120515170141.GA14272@linux.vnet.ibm.com>

Ping on this ... we've tripped the same issue on a different system, it
would appear. Would appreciate if anyone can provide answers to the
questions below.

Thanks,
Nish

On 15.05.2012 [10:01:41 -0700], Nishanth Aravamudan wrote:
> Hi Santiago,
> 
> Are you still working on ibmveth?
> 
> I've found a very sporadic bug with ibmveth in some testing. PAPR
> requires that:
> 
> "Validate the Buffer Descriptor of the receive queue buffer (I/O
> addresses for entire buffer length starting at the specified I/O
> address are translated by the RTCE table, length is a multiple of 16
> bytes, and alignment is on a 16 byte boundary) else H_Parameter."
> 
> but from what I can tell ibmveth.c is not enforcing this last condition:
> 
> 	adapter->rx_queue.queue_addr =
> 		kmalloc(adapter->rx_queue.queue_len, GFP_KERNEL);
> 
> 	...
> 
> 	adapter->rx_queue.queue_dma = dma_map_single(dev,
> 		adapter->rx_queue.queue_addr, adapter->rx_queue.queue_len,
> 		DMA_BIDIRECTIONAL);
> 
> 	...
> 
> 	rxq_desc.fields.address = adapter->rx_queue.queue_dma;
> 
> 	...
> 	
> 
> 	lpar_rc = ibmveth_register_logical_lan(adapter, rxq_desc,
> 		mac_address);
> 	netdev_err(netdev, "buffer TCE:0x%llx filter TCE:0x%llx rxq "
> 	 	"desc:0x%llx MAC:0x%llx\n", adapter->buffer_list_dma,
> 	 	adapter->filter_list_dma, rxq_desc.desc, mac_address);
> 
> And I got on one install attempt:
> 
> [ 39.978430] ibmveth 30000004: eth0: h_register_logical_lan failed with -4
> [ 39.978449] ibmveth 30000004: eth0: buffer TCE:0x1000 filter TCE:0x10000 rxq desc:0x80006010000200a8 MAC:0x56754de8e904
> 
> rxq desc, as you can see, is not 16-byte aligned. kmalloc() only
> guarantees 8-byte alignment (as does gcc, I think). Initially, I thought
> we could just overallocate the queue_addr and ALIGN() down, but then we
> would need to save the original kmalloc pointer in a new struct member
> per rx_queue.
> 
> So a couple of questions:
> 
> 1) Is my analysis accurate? :)
> 
> 2) How gross would it be to save an extra pointer for every rx_queue?
> 
> 3) Based upon 2), is it better to just go ahead and create our own
> kmem_cache (which gets an alignment specified)?
> 
> For 3), I started coding this, but couldn't find a clean place to
> allocate the kmem_cache itself, as the size of each object depends on
> the run-time characteristics (afaict), but needs to be specified at
> cache creation time. Any insight you could provide would be great!
> 
> Thanks,
> Nish
>  
> -- 
> Nishanth Aravamudan <nacc@us.ibm.com>
> IBM Linux Technology Center

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
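
The over-allocate-and-align idea from the quoted message can be sketched in userspace as follows. This is an illustration under stated assumptions, not a proposed driver change: malloc() stands in for kmalloc(), demo_rx_queue and its functions are hypothetical, and the sketch aligns the CPU pointer only. In the driver the requirement applies to the I/O address placed in the descriptor, so the mapping step would need to preserve the alignment. Note that the logged descriptor ends in 0xa8, which is 8 bytes past a 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_ALIGNMENT 16u

/* Hypothetical queue bookkeeping: the extra 'raw' member is the cost
 * discussed in question 2 of the quoted message. */
struct demo_rx_queue {
	void *raw;        /* pointer returned by malloc(); needed for free() */
	void *queue_addr; /* 16-byte aligned pointer inside the raw buffer */
	size_t queue_len;
};

static int demo_rx_queue_alloc(struct demo_rx_queue *q, size_t len)
{
	uintptr_t addr;

	assert(q != NULL);
	/* Over-allocate so that an aligned region of 'len' bytes always fits. */
	q->raw = malloc(len + DEMO_ALIGNMENT - 1);
	if (!q->raw)
		return -1;

	/* Round the address up to the next multiple of DEMO_ALIGNMENT. */
	addr = ((uintptr_t)q->raw + DEMO_ALIGNMENT - 1) &
	       ~(uintptr_t)(DEMO_ALIGNMENT - 1);
	q->queue_addr = (void *)addr;
	q->queue_len = len;
	return 0;
}

static void demo_rx_queue_free(struct demo_rx_queue *q)
{
	free(q->raw); /* free the original pointer, not the aligned one */
	q->raw = NULL;
	q->queue_addr = NULL;
	q->queue_len = 0;
}
```

The design trade-off is the one raised in the message: the aligned pointer cannot be passed to the free routine, so the original pointer has to be stored per queue.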

* [PATCH net-next 7/7] openvswitch: Fix typo in documentation.
From: Jesse Gross @ 2012-07-20 22:26 UTC
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1342823210-3308-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

From: Leo Alterman <lalterman-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

Signed-off-by: Leo Alterman <lalterman-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 Documentation/networking/openvswitch.txt |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/networking/openvswitch.txt b/Documentation/networking/openvswitch.txt
index b8a048b..8fa2dd1 100644
--- a/Documentation/networking/openvswitch.txt
+++ b/Documentation/networking/openvswitch.txt
@@ -118,7 +118,7 @@ essentially like this, ignoring metadata:
 Naively, to add VLAN support, it makes sense to add a new "vlan" flow
 key attribute to contain the VLAN tag, then continue to decode the
 encapsulated headers beyond the VLAN tag using the existing field
-definitions.  With this change, an TCP packet in VLAN 10 would have a
+definitions.  With this change, a TCP packet in VLAN 10 would have a
 flow key much like this:
 
     eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)
-- 
1.7.9.5

* [PATCH net-next 6/7] openvswitch: Check gso_type for correct sk_buff in queue_gso_packets().
From: Jesse Gross @ 2012-07-20 22:26 UTC
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1342823210-3308-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

From: Ben Pfaff <blp-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

At the point where it was used, skb_shinfo(skb)->gso_type referred to a
post-GSO sk_buff.  Thus, it would always be 0.  We want to know the pre-GSO
gso_type, so we need to obtain it before segmenting.

Before this change, the kernel would pass inconsistent data to userspace:
packets for UDP fragments with nonzero offset would be passed along with
flow keys that indicate a zero offset (that is, the flow key for "later"
fragments claimed to be "first" fragments).  This inconsistency tended
to confuse Open vSwitch userspace, causing it to log messages about
"failed to flow_del" the flows with "later" fragments.

Signed-off-by: Ben Pfaff <blp-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 670e630..29dbfcb 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -263,6 +263,7 @@ err:
 static int queue_gso_packets(int dp_ifindex, struct sk_buff *skb,
 			     const struct dp_upcall_info *upcall_info)
 {
+	unsigned short gso_type = skb_shinfo(skb)->gso_type;
 	struct dp_upcall_info later_info;
 	struct sw_flow_key later_key;
 	struct sk_buff *segs, *nskb;
@@ -279,7 +280,7 @@ static int queue_gso_packets(int dp_ifindex, struct sk_buff *skb,
 		if (err)
 			break;
 
-		if (skb == segs && skb_shinfo(skb)->gso_type & SKB_GSO_UDP) {
+		if (skb == segs && gso_type & SKB_GSO_UDP) {
 			/* The initial flow key extracted by ovs_flow_extract()
 			 * in this case is for a first fragment, so we need to
 			 * properly mark later fragments.
-- 
1.7.9.5
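
The ordering problem described in the commit message can be modelled with a minimal userspace sketch. The names are hypothetical: demo_skb is a stand-in for struct sk_buff, DEMO_GSO_UDP is an arbitrary flag value rather than the real SKB_GSO_UDP, and the segments are assumed to carry gso_type 0, as the message states for post-GSO buffers. The function counts how many segments would be queued with the "later fragment" key.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_GSO_UDP 0x2 /* hypothetical flag value, for illustration only */

/* Hypothetical stand-in for struct sk_buff. */
struct demo_skb {
	unsigned short gso_type;
	struct demo_skb *next;
};

/* Walks the segment list the way the patched loop does. If capture_before
 * is nonzero, the pre-GSO gso_type saved at function entry is tested;
 * otherwise the field is read from the current (post-GSO) segment. */
static int demo_count_later(struct demo_skb *skb, struct demo_skb *segs,
			    int capture_before)
{
	unsigned short gso_type = skb->gso_type; /* value before segmenting */
	int use_later = 0;
	int later = 0;

	assert(segs != NULL);
	skb = segs; /* from here on, skb refers to a post-GSO segment */
	do {
		unsigned short t = capture_before ? gso_type : skb->gso_type;

		if (use_later)
			later++;
		if (skb == segs && (t & DEMO_GSO_UDP))
			use_later = 1; /* mark all following segments as later */
	} while ((skb = skb->next) != NULL);

	return later;
}
```

With three segments of a UDP GSO packet, the saved value marks the second and third segments as later fragments, while reading the field from the segments marks none.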

* [PATCH net-next 5/7] openvswitch: Check correct return value from skb_gso_segment()
From: Jesse Gross @ 2012-07-20 22:26 UTC
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1342823210-3308-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

From: Pravin B Shelar <pshelar-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

Fix return check typo.

Signed-off-by: Pravin B Shelar <pshelar-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index b512cb8..670e630 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -269,8 +269,8 @@ static int queue_gso_packets(int dp_ifindex, struct sk_buff *skb,
 	int err;
 
 	segs = skb_gso_segment(skb, NETIF_F_SG | NETIF_F_HW_CSUM);
-	if (IS_ERR(skb))
-		return PTR_ERR(skb);
+	if (IS_ERR(segs))
+		return PTR_ERR(segs);
 
 	/* Queue all of the segments. */
 	skb = segs;
-- 
1.7.9.5
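
The effect of testing the wrong pointer can be shown with a minimal userspace sketch. The demo_err_ptr(), demo_ptr_err() and demo_is_err() helpers are simplified approximations of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and demo_segment() is a hypothetical stand-in for skb_gso_segment(). An error encoded in the returned pointer is only caught when that pointer, not the input, is checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Simplified approximations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_buf {
	int len;
};

/* Hypothetical stand-in for skb_gso_segment(): returns an error pointer
 * when 'fail' is nonzero, otherwise a valid buffer. */
static struct demo_buf *demo_segment(struct demo_buf *in, int fail)
{
	static struct demo_buf seg = { 1 };

	(void)in;
	return fail ? (struct demo_buf *)demo_err_ptr(-22 /* -EINVAL */) : &seg;
}

/* Returns the negative error code if the check catches the failure,
 * 1 if an error pointer slipped past the check, and 0 on success. */
static long demo_queue(struct demo_buf *in, int fail, int check_input)
{
	struct demo_buf *segs = demo_segment(in, fail);

	assert(in != NULL);
	if (check_input) {
		/* Pre-fix pattern: the input is never an error pointer here. */
		if (demo_is_err(in))
			return demo_ptr_err(in);
	} else {
		/* Post-fix pattern: test the value that was just returned. */
		if (demo_is_err(segs))
			return demo_ptr_err(segs);
	}

	/* Real code would now dereference segs; the sketch only reports
	 * whether that dereference would have hit an error pointer. */
	return demo_is_err(segs) ? 1 : 0;
}
```

In the sketch, a failing segmentation is reported as -22 when the returned pointer is checked and goes unnoticed when the input pointer is checked instead.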

* [PATCH net-next 4/7] openvswitch: Reset upper layer protocol info on internal devices.
From: Jesse Gross @ 2012-07-20 22:26 UTC
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1342823210-3308-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

It's possible that packets that are sent on internal devices (from
the OVS perspective) have already traversed the local IP stack.
After they go through the internal device, they will again travel
through the IP stack which may get confused by the presence of
existing information in the skb. The problem can be observed
when switching between namespaces. This clears out that information
to avoid problems but deliberately leaves other metadata alone.
This is to provide maximum flexibility in chaining together OVS
and other Linux components.

Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/vport-internal_dev.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index de509d3..4061b9e 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -24,6 +24,9 @@
 #include <linux/ethtool.h>
 #include <linux/skbuff.h>
 
+#include <net/dst.h>
+#include <net/xfrm.h>
+
 #include "datapath.h"
 #include "vport-internal_dev.h"
 #include "vport-netdev.h"
@@ -209,6 +212,11 @@ static int internal_dev_recv(struct vport *vport, struct sk_buff *skb)
 	int len;
 
 	len = skb->len;
+
+	skb_dst_drop(skb);
+	nf_reset(skb);
+	secpath_reset(skb);
+
 	skb->dev = netdev;
 	skb->pkt_type = PACKET_HOST;
 	skb->protocol = eth_type_trans(skb, netdev);
-- 
1.7.9.5

* [PATCH net-next 3/7] openvswitch: Replace Nicira Networks.
From: Jesse Gross @ 2012-07-20 22:26 UTC
  To: David Miller
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	Raju Subramanian
In-Reply-To: <1342823210-3308-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

From: Raju Subramanian <rsubramanian-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

Replaced all instances of Nicira Networks(, Inc) with Nicira, Inc.

Signed-off-by: Raju Subramanian <rsubramanian-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Ben Pfaff <blp-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/actions.c            |    2 +-
 net/openvswitch/datapath.c           |    2 +-
 net/openvswitch/datapath.h           |    2 +-
 net/openvswitch/dp_notify.c          |    2 +-
 net/openvswitch/flow.c               |    2 +-
 net/openvswitch/flow.h               |    2 +-
 net/openvswitch/vport-internal_dev.c |    2 +-
 net/openvswitch/vport-internal_dev.h |    2 +-
 net/openvswitch/vport-netdev.c       |    2 +-
 net/openvswitch/vport-netdev.h       |    2 +-
 net/openvswitch/vport.c              |    2 +-
 net/openvswitch/vport.h              |    2 +-
 12 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 48badff..f3f96ba 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2012 Nicira Networks.
+ * Copyright (c) 2007-2012 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 4813d95..b512cb8 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2012 Nicira Networks.
+ * Copyright (c) 2007-2012 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index c73370c..c1105c1 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2011 Nicira Networks.
+ * Copyright (c) 2007-2012 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index 4673651..36dcee8 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2011 Nicira Networks.
+ * Copyright (c) 2007-2012 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index c6e1dae..1115dcf 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2011 Nicira Networks.
+ * Copyright (c) 2007-2011 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 2747dc2..9b75617 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2011 Nicira Networks.
+ * Copyright (c) 2007-2011 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index b6b1d7d..de509d3 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2011 Nicira Networks.
+ * Copyright (c) 2007-2012 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
diff --git a/net/openvswitch/vport-internal_dev.h b/net/openvswitch/vport-internal_dev.h
index 3454447..9a7d30e 100644
--- a/net/openvswitch/vport-internal_dev.h
+++ b/net/openvswitch/vport-internal_dev.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2011 Nicira Networks.
+ * Copyright (c) 2007-2011 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index c1068ae..54a456d 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2011 Nicira Networks.
+ * Copyright (c) 2007-2012 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index fd9b008..f7072a2 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2011 Nicira Networks.
+ * Copyright (c) 2007-2011 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 6c066ba..6140336 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2011 Nicira Networks.
+ * Copyright (c) 2007-2012 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index 1960962..aac680c 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2011 Nicira Networks.
+ * Copyright (c) 2007-2012 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH net-next 2/7] openvswitch: Do not send notification if ovs_vport_set_options() failed
From: Jesse Gross @ 2012-07-20 22:26 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1342823210-3308-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

From: Ansis Atteka <aatteka-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

There is no need to send a notification if ovs_vport_set_options() failed
and ovs_vport_cmd_set() did not change anything.

Signed-off-by: Ansis Atteka <aatteka-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index e44e631..4813d95 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1635,7 +1635,9 @@ static int ovs_vport_cmd_set(struct sk_buff *skb, struct genl_info *info)
 
 	if (!err && a[OVS_VPORT_ATTR_OPTIONS])
 		err = ovs_vport_set_options(vport, a[OVS_VPORT_ATTR_OPTIONS]);
-	if (!err && a[OVS_VPORT_ATTR_UPCALL_PID])
+	if (err)
+		goto exit_unlock;
+	if (a[OVS_VPORT_ATTR_UPCALL_PID])
 		vport->upcall_pid = nla_get_u32(a[OVS_VPORT_ATTR_UPCALL_PID]);
 
 	reply = ovs_vport_cmd_build_info(vport, info->snd_pid, info->snd_seq,
-- 
1.7.9.5

^ permalink raw reply related
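
The control flow after the patch above can be modelled in user space. This is only a sketch under stated assumptions: `struct fake_vport`, `fake_vport_cmd_set()` and the `notifications` counter are hypothetical stand-ins, not Open vSwitch code. It shows that an error from the options step now skips both the upcall_pid update and the notification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the vport state touched by the handler. */
struct fake_vport {
	uint32_t upcall_pid;
	int notifications;	/* counts notifications that would be sent */
};

/*
 * Models the patched ordering: bail out on error before touching
 * upcall_pid and before building/sending the notification.
 * options_err plays the role of the ovs_vport_set_options() result.
 */
int fake_vport_cmd_set(struct fake_vport *vport, int options_err,
		       bool has_pid, uint32_t pid)
{
	int err = options_err;

	if (err)
		goto exit_unlock;
	if (has_pid)
		vport->upcall_pid = pid;

	vport->notifications++;	/* stands in for the notification step */

exit_unlock:
	return err;
}
```

With a failing options step the vport is left untouched and the counter stays at zero; with a successful one the pid is stored and one notification is counted.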

* [PATCH net-next 1/7] openvswitch: Enable retrieval of TCP flags from IPv6 traffic.
From: Jesse Gross @ 2012-07-20 22:26 UTC (permalink / raw)
  To: David Miller
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	Michael Mao
In-Reply-To: <1342823210-3308-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

We currently check that a packet is IPv4 and TCP before fetching the
TCP flags.  This enables fetching from IPv6 packets as well.

Reported-by: Michael Mao <mmao-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/flow.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 2a11ec2..c6e1dae 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -182,7 +182,8 @@ void ovs_flow_used(struct sw_flow *flow, struct sk_buff *skb)
 {
 	u8 tcp_flags = 0;
 
-	if (flow->key.eth.type == htons(ETH_P_IP) &&
+	if ((flow->key.eth.type == htons(ETH_P_IP) ||
+	     flow->key.eth.type == htons(ETH_P_IPV6)) &&
 	    flow->key.ip.proto == IPPROTO_TCP &&
 	    likely(skb->len >= skb_transport_offset(skb) + sizeof(struct tcphdr))) {
 		u8 *tcp = (u8 *)tcp_hdr(skb);
-- 
1.7.9.5

^ permalink raw reply related
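
The widened condition in the patch above can be expressed as a stand-alone predicate. This is a minimal sketch, assuming host-byte-order EtherType values and a fixed 20-byte TCP header; `can_read_tcp_flags()` and the `MODEL_*` constants are hypothetical names used for illustration, not kernel symbols.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_P_IP		0x0800	/* IPv4 EtherType */
#define MODEL_ETH_P_IPV6	0x86DD	/* IPv6 EtherType */
#define MODEL_IPPROTO_TCP	6
#define MODEL_TCPHDR_LEN	20	/* TCP header without options */

/*
 * Returns true when TCP flags may be fetched: the flow is IPv4 or IPv6,
 * the L4 protocol is TCP, and the packet is long enough to hold a full
 * TCP header at the transport offset.
 */
bool can_read_tcp_flags(uint16_t eth_type, uint8_t ip_proto,
			size_t pkt_len, size_t transport_offset)
{
	return (eth_type == MODEL_ETH_P_IP ||
		eth_type == MODEL_ETH_P_IPV6) &&
	       ip_proto == MODEL_IPPROTO_TCP &&
	       pkt_len >= transport_offset + MODEL_TCPHDR_LEN;
}
```

Before the change only the `MODEL_ETH_P_IP` arm existed, so an IPv6 TCP packet would fail the first test regardless of its length.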

* [GIT net-next] Open vSwitch
From: Jesse Gross @ 2012-07-20 22:26 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

A few bug fixes and small enhancements for net-next/3.6.

The following changes since commit bf32fecdc1851ad9ca960f56771b798d17c26cf1:

  openvswitch: Add length check when retrieving TCP flags. (2012-04-02 14:28:57 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch.git master

for you to fetch changes up to efaac3bf087b1a6cec28f2a041e01c874d65390c:

  openvswitch: Fix typo in documentation. (2012-07-20 14:51:07 -0700)

----------------------------------------------------------------
Ansis Atteka (1):
      openvswitch: Do not send notification if ovs_vport_set_options() failed

Ben Pfaff (1):
      openvswitch: Check gso_type for correct sk_buff in queue_gso_packets().

Jesse Gross (2):
      openvswitch: Enable retrieval of TCP flags from IPv6 traffic.
      openvswitch: Reset upper layer protocol info on internal devices.

Leo Alterman (1):
      openvswitch: Fix typo in documentation.

Pravin B Shelar (1):
      openvswitch: Check currect return value from skb_gso_segment()

Raju Subramanian (1):
      openvswitch: Replace Nicira Networks.

 Documentation/networking/openvswitch.txt |    2 +-
 net/openvswitch/actions.c                |    2 +-
 net/openvswitch/datapath.c               |   13 ++++++++-----
 net/openvswitch/datapath.h               |    2 +-
 net/openvswitch/dp_notify.c              |    2 +-
 net/openvswitch/flow.c                   |    5 +++--
 net/openvswitch/flow.h                   |    2 +-
 net/openvswitch/vport-internal_dev.c     |   10 +++++++++-
 net/openvswitch/vport-internal_dev.h     |    2 +-
 net/openvswitch/vport-netdev.c           |    2 +-
 net/openvswitch/vport-netdev.h           |    2 +-
 net/openvswitch/vport.c                  |    2 +-
 net/openvswitch/vport.h                  |    2 +-
 13 files changed, 30 insertions(+), 18 deletions(-)

^ permalink raw reply

* Re: [PATCH] netns: correctly use per-netns ipv4 sysctl_tcp_mem
From: Glauber Costa @ 2012-07-20 22:22 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
In-Reply-To: <20120709.152100.571089964662155300.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

On 07/09/2012 07:21 PM, David Miller wrote:
> From: Huang Qiang <h.huangqiang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> Date: Mon, 9 Jul 2012 14:05:09 +0800
> 
>> From: Yang Zhenzhang <yangzhenzhang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>>
>> The kernel now allows each net namespace to independently set up its levels
>> for tcp memory pressure thresholds.
>>
>> But there seems to be a bug, which the following steps reproduce:
>>
>> [root@host socket]# lxc-start -n test -f config /bin/bash
>> [root@net-test socket]# ip route add default via 192.168.58.2
>> [root@net-test socket]# echo 0 0 0 > /proc/sys/net/ipv4/tcp_mem
>> [root@net-test socket]# scp root-Q0ErXNX1RuabR28l3DCWlg@public.gmane.org:/home/tcp_mem_test .
>>
>> and it can still transfer the "tcp_mem_test" file, which we hoped it
>> would not.
>>
>> That is because inet_init() (net/ipv4/af_inet.c) initializes
>> tcp_prot.sysctl_mem:
>> tcp_prot.sysctl_mem = init_net.ipv4.sysctl_tcp_mem;
>>
>> So when the protocol is TCP, sk->sk_prot->sysctl_mem (following code)
>> always uses the ipv4 sysctl_tcp_mem of the init_net namespace rather than
>> that of its own net namespace.
>> This patch simply sets "prot" equal to net->ipv4.sysctl_tcp_mem when
>> the protocol type is TCP.
>>
>> Signed-off-by: Yang Zhenzhang <yangzhenzhang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> 
> Another regression added by the socket memory cgroup code, BIG
> SURPRISE.
> 

Back from vacation: if I understand the submission correctly, this is
not a regression, since it seems to be only happening when those values
are set inside the network namespace - which was not possible before.

In any case, from what I can see, I believe the fix is already on
its way (I haven't gone through the whole backlog yet).

^ permalink raw reply

* Re: [PATCH 00/16] Remove the ipv4 routing cache
From: Eric Dumazet @ 2012-07-20 22:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120720.142502.1144557295933737451.davem@davemloft.net>

On Fri, 2012-07-20 at 14:25 -0700, David Miller wrote:
> [ Ok I'm going to be a little bit cranky, and I think I deserve it.
> 
>   I'm basically not going to go through the multi-hour rebase and
>   retest process again, as it's hit the point of diminishing returns
>   as NOBODY is giving me test results but I can guarantee that
>   EVERYONE will bitch and complain when I push this into net-next and
>   it breaks their favorite feature.  If you can't be bothered to test
>   these changes, I'm honestly going to tell people to take a hike and
>   fix it themselves.  I simply don't care if you don't care enough to
>   test changes of this magnitude to make sure your favorite setup
>   still works.
> 
>   To say that I'm disappointed with the amount of testing feedback
>   after posting more than a dozen iterations of this delicate patch
>   set would be an understatement.  I can think of only one person who
>   actually tested one iteration of these patches and gave feedback.
> 
>   And meanwhile I've personally reviewed, tested, and signed off on
>   everyone else's work WITHOUT DELAY during this entire process.
> 
>   I've pulled 25 hour long hacking shifts to make that a reality, so
>   that my routing cache removal work absolutely would not impact or
>   delay the patch submissions of any other networking developer.  And
>   I can't even get a handful of testers with some feedback?  You
>   really have to be kidding me.. ]

Hmm, ok, please give me a few hours to run some tests ;)

^ permalink raw reply

* [PATCH] forcedeth: spin_unlock_irq in interrupt handler fix
From: Denis Efremov @ 2012-07-20 21:54 UTC (permalink / raw)
  To: David S. Miller
  Cc: Denis Efremov, David Decotigny, Eric Dumazet, Jiri Pirko,
	Ian Campbell, netdev, linux-kernel

Replace the spin_lock_irq/spin_unlock_irq pair in the interrupt
handler with a spin_lock_irqsave/spin_unlock_irqrestore pair.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Denis Efremov <yefremov.denis@gmail.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 928913c..7e68c00 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -3776,7 +3776,7 @@ static irqreturn_t nv_nic_irq_other(int foo, void *data)
 			np->link_timeout = jiffies + LINK_TIMEOUT;
 		}
 		if (events & NVREG_IRQ_RECOVER_ERROR) {
-			spin_lock_irq(&np->lock);
+			spin_lock_irqsave(&np->lock, flags);
 			/* disable interrupts on the nic */
 			writel(NVREG_IRQ_OTHER, base + NvRegIrqMask);
 			pci_push(base);
@@ -3786,7 +3786,7 @@ static irqreturn_t nv_nic_irq_other(int foo, void *data)
 				np->recover_error = 1;
 				mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
 			}
-			spin_unlock_irq(&np->lock);
+			spin_unlock_irqrestore(&np->lock, flags);
 			break;
 		}
 		if (unlikely(i > max_interrupt_work)) {
-- 
1.7.7

^ permalink raw reply related
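
The commit message above does not spell out the hazard. A likely reading is that spin_unlock_irq() unconditionally re-enables local interrupts, while the irqsave/irqrestore pair puts back whatever state was in effect on entry. The following user-space model is a sketch of that difference only: it tracks a single boolean instead of real CPU state, takes no lock, and both function names are hypothetical.

```c
#include <stdbool.h>

/*
 * Models spin_lock_irq()/spin_unlock_irq(): interrupts are disabled on
 * lock and unconditionally enabled on unlock, whatever the entry state.
 */
bool irq_state_after_irq_pair(bool enabled_on_entry)
{
	bool irqs_enabled = enabled_on_entry;

	irqs_enabled = false;	/* lock: disable local interrupts */
	/* ... critical section ... */
	irqs_enabled = true;	/* unlock: always re-enable */
	return irqs_enabled;
}

/*
 * Models spin_lock_irqsave()/spin_unlock_irqrestore(): the entry state
 * is saved in flags and restored on unlock.
 */
bool irq_state_after_irqsave_pair(bool enabled_on_entry)
{
	bool irqs_enabled = enabled_on_entry;
	bool flags = irqs_enabled;	/* save the entry state */

	irqs_enabled = false;	/* lock: disable local interrupts */
	/* ... critical section ... */
	irqs_enabled = flags;	/* unlock: restore the saved state */
	return irqs_enabled;
}
```

When the caller enters with interrupts disabled, as an interrupt handler may, the first variant leaves them enabled on exit and the second leaves them disabled.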

* [PATCH RFT RESEND] net: Fix Neptune ethernet driver to check dma mapping error
From: Shuah Khan @ 2012-07-20 21:50 UTC (permalink / raw)
  To: davem, mcarlson, bhutchings, eric.dumazet, mchan
  Cc: netdev, LKML, shuahkhan, stable

Fix the Neptune ethernet driver to check for a DMA mapping error after
the map_page() interface returns.

Signed-off-by: Shuah Khan <shuah.khan@hp.com>
---
 drivers/net/ethernet/sun/niu.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/sun/niu.c b/drivers/net/ethernet/sun/niu.c
index 8c726b7..60d5c03 100644
--- a/drivers/net/ethernet/sun/niu.c
+++ b/drivers/net/ethernet/sun/niu.c
@@ -3335,6 +3335,10 @@ static int niu_rbr_add_page(struct niu *np, struct rx_ring_info *rp,
 
 	addr = np->ops->map_page(np->device, page, 0,
 				 PAGE_SIZE, DMA_FROM_DEVICE);
+	if (!addr) {
+		__free_page(page);
+		return -ENOMEM;
+	}
 
 	niu_hash_page(rp, page, addr);
 	if (rp->rbr_blocks_per_page > 1)
-- 
1.7.9.5

^ permalink raw reply related

* [net-next 6/6] igb: update to allow reading/setting MDI state
From: Jeff Kirsher @ 2012-07-20 21:43 UTC (permalink / raw)
  To: davem
  Cc: Jesse Brandeburg, netdev, gospo, sassmann, Carolyn Wyborny,
	Jeff Kirsher
In-Reply-To: <1342820631-19738-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

This is the implementation for igb to allow forcing MDI state
via ethtool, allowing users to work around some improperly
behaving switches.

get_settings will now return the MDI state of the last link
because get_settings needs to succeed to allow the set to work even
when link is down.

Forcing in this driver is for now only allowed when auto-neg is
enabled.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c |   37 ++++++++++++++++++++++++++
 drivers/net/ethernet/intel/igb/igb_main.c    |    4 +++
 2 files changed, 41 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index a19c84c..bc3c5b4 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -198,6 +198,14 @@ static int igb_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
 	}
 
 	ecmd->autoneg = hw->mac.autoneg ? AUTONEG_ENABLE : AUTONEG_DISABLE;
+
+	/* MDI-X => 2; MDI =>1; Invalid =>0 */
+	if (hw->phy.media_type == e1000_media_type_copper)
+		ecmd->eth_tp_mdix = hw->phy.is_mdix ? ETH_TP_MDI_X :
+						      ETH_TP_MDI;
+	else
+		ecmd->eth_tp_mdix = ETH_TP_MDI_INVALID;
+
 	return 0;
 }
 
@@ -214,6 +222,22 @@ static int igb_set_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
 		return -EINVAL;
 	}
 
+	/*
+	 * MDI setting is only allowed when autoneg enabled because
+	 * some hardware doesn't allow MDI setting when speed or
+	 * duplex is forced.
+	 */
+	if (ecmd->eth_tp_mdix_ctrl) {
+		if (hw->phy.media_type != e1000_media_type_copper)
+			return -EOPNOTSUPP;
+
+		if ((ecmd->eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) &&
+		    (ecmd->autoneg != AUTONEG_ENABLE)) {
+			dev_err(&adapter->pdev->dev, "forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n");
+			return -EINVAL;
+		}
+	}
+
 	while (test_and_set_bit(__IGB_RESETTING, &adapter->state))
 		msleep(1);
 
@@ -227,12 +251,25 @@ static int igb_set_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
 			hw->fc.requested_mode = e1000_fc_default;
 	} else {
 		u32 speed = ethtool_cmd_speed(ecmd);
+		/* calling this overrides forced MDI setting */
 		if (igb_set_spd_dplx(adapter, speed, ecmd->duplex)) {
 			clear_bit(__IGB_RESETTING, &adapter->state);
 			return -EINVAL;
 		}
 	}
 
+	/* MDI-X => 2; MDI => 1; Auto => 3 */
+	if (ecmd->eth_tp_mdix_ctrl) {
+		/*
+		 * fix up the value for auto (3 => 0) as zero is mapped
+		 * internally to auto
+		 */
+		if (ecmd->eth_tp_mdix_ctrl == ETH_TP_MDI_AUTO)
+			hw->phy.mdix = AUTO_ALL_MODES;
+		else
+			hw->phy.mdix = ecmd->eth_tp_mdix_ctrl;
+	}
+
 	/* reset the link */
 	if (netif_running(adapter->netdev)) {
 		igb_down(adapter);
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 8adeca9..4df7848 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6675,6 +6675,10 @@ int igb_set_spd_dplx(struct igb_adapter *adapter, u32 spd, u8 dplx)
 	default:
 		goto err_inval;
 	}
+
+	/* clear MDI, MDI(-X) override is only allowed when autoneg enabled */
+	adapter->hw.phy.mdix = AUTO_ALL_MODES;
+
 	return 0;
 
 err_inval:
-- 
1.7.10.4

^ permalink raw reply related
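
The validation and value mapping described in the igb patch above can be sketched as two small helpers. This is a model under stated assumptions: the numeric values follow the comments in the patch (Invalid 0, MDI 1, MDI-X 2, Auto 3, internal auto 0), and the `MODEL_*` names, `model_check_mdix_request()` and `model_mdix_from_ctrl()` are hypothetical, not driver symbols.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum {
	MODEL_MDI_INVALID = 0,	/* also means "no request" in a set call */
	MODEL_MDI	  = 1,
	MODEL_MDI_X	  = 2,
	MODEL_MDI_AUTO	  = 3,
};
#define MODEL_AUTO_ALL_MODES 0	/* internal value for auto, per the patch comment */

/* Mirrors the checks added to the set path: 0 on success, -errno otherwise. */
int model_check_mdix_request(uint8_t ctrl, bool is_copper, bool autoneg)
{
	if (!ctrl)
		return 0;		/* nothing requested */
	if (!is_copper)
		return -EOPNOTSUPP;	/* MDI only applies to copper media */
	if (ctrl != MODEL_MDI_AUTO && !autoneg)
		return -EINVAL;		/* forcing needs autoneg enabled */
	return 0;
}

/* Maps the requested control value to the internal mdix setting. */
uint8_t model_mdix_from_ctrl(uint8_t ctrl, uint8_t current_mdix)
{
	if (!ctrl)
		return current_mdix;	/* no request: leave unchanged */
	if (ctrl == MODEL_MDI_AUTO)
		return MODEL_AUTO_ALL_MODES;	/* 3 => 0 fix-up */
	return ctrl;
}
```

The sketch omits the side effect in igb_set_spd_dplx(), which in the patch resets the mdix value to auto whenever speed and duplex are forced.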

