Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 2/3] net: ethernet: cpsw-phy-sel: Remove redundant of_match_ptr
From: David Miller @ 2013-10-01 16:34 UTC (permalink / raw)
  To: sachin.kamat; +Cc: netdev
In-Reply-To: <1380515114-2823-2-git-send-email-sachin.kamat@linaro.org>

From: Sachin Kamat <sachin.kamat@linaro.org>
Date: Mon, 30 Sep 2013 09:55:13 +0530

> The data structure of_match_ptr() protects is always compiled in.
> Hence of_match_ptr() is not needed.
> 
> Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>

Applied.

^ permalink raw reply

* Re: [PATCH 1/3] net: ethernet: cpsw: Remove redundant of_match_ptr
From: David Miller @ 2013-10-01 16:34 UTC (permalink / raw)
  To: sachin.kamat; +Cc: netdev
In-Reply-To: <1380515114-2823-1-git-send-email-sachin.kamat@linaro.org>

From: Sachin Kamat <sachin.kamat@linaro.org>
Date: Mon, 30 Sep 2013 09:55:12 +0530

> The data structure of_match_ptr() protects is always compiled in.
> Hence of_match_ptr() is not needed.
> 
> Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>

Applied.

^ permalink raw reply

* Re: Established sockets remain open after iface down or address lost
From: Alexey Kuznetsov @ 2013-10-01 16:33 UTC (permalink / raw)
  To: Chris Verges; +Cc: David Miller, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <20130926060433.GA9170@cverges-dev-lnx.sentient-energy.com>

Hello!

> P.S.  I apologize in advance if I missed this answer in the netdev archives.

FYI googling f.e. "netdev tcp remove local address" instantly  finds
all that you want to know. Namrly, subj: Re: "TCP shutdown behaviour
when deleting local IP addresses"

^ permalink raw reply

* Re: [PATCH v2] ll_temac: Reset dma descriptors indexes on ndo_open
From: David Miller @ 2013-10-01 16:32 UTC (permalink / raw)
  To: ricardo.ribalda; +Cc: joe, jg1.han, gregkh, wfp5p, netdev, linux-kernel
In-Reply-To: <1380608230-29183-1-git-send-email-ricardo.ribalda@gmail.com>

From: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Date: Tue,  1 Oct 2013 08:17:10 +0200

> The dma descriptors indexes are only initialized on the probe function.
> 
> If a packet is on the buffer when temac_stop is called, the dma
> descriptors indexes can be left on a incorrect state where no other
> package can be sent.
> 
> So an interface could be left in an usable state after ifdow/ifup.
> 
> This patch makes sure that the descriptors indexes are in a proper
> status when the device is open.
> 
> Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* [PATCH] pkt_sched: fq: rate limiting improvements
From: Eric Dumazet @ 2013-10-01 16:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Steinar H. Gunderson

From: Eric Dumazet <edumazet@google.com>

FQ rate limiting suffers from two problems, reported
by Steinar :

1) FQ enforces a delay when flow quantum is exhausted in order
to reduce cpu overhead. But if packets are small, current
delay computation is slightly wrong, and observed rates can
be too high.

Steinar had this problem because he disabled TSO and GSO,
and default FQ quantum is 2*1514.

(Of course, I wish recent TSO auto sizing changes will help
to not having to disable TSO in the first place)

2) maxrate was not used for forwarded flows (skbs not attached
to a socket)

Tested:

tc qdisc add dev eth0 root est 1sec 4sec fq maxrate 8Mbit
netperf -H lpq84 -l 1000 &
sleep 10 ; tc -s qdisc show dev eth0
qdisc fq 8003: root refcnt 32 limit 10000p flow_limit 100p buckets 1024
 quantum 3028 initial_quantum 15140 maxrate 8000Kbit 
 Sent 16819357 bytes 11258 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 7831Kbit 653pps backlog 7570b 5p requeues 0 
  44 flows (43 inactive, 1 throttled), next packet delay 2977352 ns
  0 gc, 0 highprio, 5545 throttled

lpq83:~# tcpdump -p -i eth0 host lpq84 -c 12
09:02:52.079484 IP lpq83 > lpq84: . 1389536928:1389538376(1448) ack 3808678021 win 457 <nop,nop,timestamp 961812 572609068>
09:02:52.079499 IP lpq83 > lpq84: . 1448:2896(1448) ack 1 win 457 <nop,nop,timestamp 961812 572609068>
09:02:52.079906 IP lpq84 > lpq83: . ack 2896 win 16384 <nop,nop,timestamp 572609080 961812>
09:02:52.082568 IP lpq83 > lpq84: . 2896:4344(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
09:02:52.082581 IP lpq83 > lpq84: . 4344:5792(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
09:02:52.083017 IP lpq84 > lpq83: . ack 5792 win 16384 <nop,nop,timestamp 572609083 961815>
09:02:52.085678 IP lpq83 > lpq84: . 5792:7240(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
09:02:52.085693 IP lpq83 > lpq84: . 7240:8688(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
09:02:52.086117 IP lpq84 > lpq83: . ack 8688 win 16384 <nop,nop,timestamp 572609086 961818>
09:02:52.088792 IP lpq83 > lpq84: . 8688:10136(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
09:02:52.088806 IP lpq83 > lpq84: . 10136:11584(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
09:02:52.089217 IP lpq84 > lpq83: . ack 11584 win 16384 <nop,nop,timestamp 572609090 961821>

Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_fq.c |   45 ++++++++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index fc6de56..a2fef8b 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -420,6 +420,7 @@ static struct sk_buff *fq_dequeue(struct Qdisc *sch)
 	struct fq_flow_head *head;
 	struct sk_buff *skb;
 	struct fq_flow *f;
+	u32 rate;
 
 	skb = fq_dequeue_head(sch, &q->internal);
 	if (skb)
@@ -468,28 +469,34 @@ begin:
 	f->time_next_packet = now;
 	f->credit -= qdisc_pkt_len(skb);
 
-	if (f->credit <= 0 &&
-	    q->rate_enable &&
-	    skb->sk && skb->sk->sk_state != TCP_TIME_WAIT) {
-		u32 rate = skb->sk->sk_pacing_rate ?: q->flow_default_rate;
+	if (f->credit > 0 || !q->rate_enable)
+		goto out;
 
-		rate = min(rate, q->flow_max_rate);
-		if (rate) {
-			u64 len = (u64)qdisc_pkt_len(skb) * NSEC_PER_SEC;
-
-			do_div(len, rate);
-			/* Since socket rate can change later,
-			 * clamp the delay to 125 ms.
-			 * TODO: maybe segment the too big skb, as in commit
-			 * e43ac79a4bc ("sch_tbf: segment too big GSO packets")
-			 */
-			if (unlikely(len > 125 * NSEC_PER_MSEC)) {
-				len = 125 * NSEC_PER_MSEC;
-				q->stat_pkts_too_long++;
-			}
+	if (skb->sk && skb->sk->sk_state != TCP_TIME_WAIT) {
+		rate = skb->sk->sk_pacing_rate ?: q->flow_default_rate;
 
-			f->time_next_packet = now + len;
+		rate = min(rate, q->flow_max_rate);
+	} else {
+		rate = q->flow_max_rate;
+		if (rate == ~0U)
+			goto out;
+	}
+	if (rate) {
+		u32 plen = max(qdisc_pkt_len(skb), q->quantum);
+		u64 len = (u64)plen * NSEC_PER_SEC;
+
+		do_div(len, rate);
+		/* Since socket rate can change later,
+		 * clamp the delay to 125 ms.
+		 * TODO: maybe segment the too big skb, as in commit
+		 * e43ac79a4bc ("sch_tbf: segment too big GSO packets")
+		 */
+		if (unlikely(len > 125 * NSEC_PER_MSEC)) {
+			len = 125 * NSEC_PER_MSEC;
+			q->stat_pkts_too_long++;
 		}
+
+		f->time_next_packet = now + len;
 	}
 out:
 	qdisc_bstats_update(sch, skb);

^ permalink raw reply related

* Re: Established sockets remain open after iface down or address lost
From: Chris Verges @ 2013-10-01 16:08 UTC (permalink / raw)
  To: Rick Jones; +Cc: Eric Dumazet, davem, kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <524AEDD1.9010709@hp.com>

On Tue, Oct 01, 2013 at 08:44:17AM -0700, Rick Jones wrote:
> The protocol between client and server needs to have an
> application-layer "keepalive" mechanism added, and then the server
> will be able to detect a dangling connection without need of any
> further kernel modifications.
> 
> If that is not possible, the server can/should set SO_KEEPALIVE and
> perhaps tweak the TCP keepalive settings.  Not as good (IMO) as an
> application-layer keepalive because it only shows that the connection
> is good as far as TCP, but I suppose it could do in a pinch.

I agree that some form of keepalives would solve the problem where
blocking reads need to be interrupted.  However, this creates traffic
across the link -- directly proportional to the keepalive interval.

The underlying physical layer is such that we pay for all traffic going
across it -- including any keepalives at either the application or TCP
layers.  Paying for this keepalive traffic when the link is operational
is not desired.

Thanks for the suggestion.  I, too, am hoping that a kernel mod isn't
required to do what is being asked.  But I'm also willing to put in the
work if needed.  My hope on engaging with the netdev list early in this
process is to (1) figure out whether this has already been fully solved
and (2) determine whether there would be interest in this patch for
general use.

Thanks,
Chris

^ permalink raw reply

* [PATCH net 2/2] ip6tnl: allow to use rtnl ops on fb tunnel
From: Nicolas Dichtel @ 2013-10-01 16:05 UTC (permalink / raw)
  To: davem; +Cc: netdev, steffen.klassert, pshelar, Nicolas Dichtel
In-Reply-To: <1380643500-5018-1-git-send-email-nicolas.dichtel@6wind.com>

rtnl ops where introduced by c075b13098b3 ("ip6tnl: advertise tunnel param via
rtnl"), but I forget to assign rtnl ops to fb tunnels.

Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because  the fallback tunnel is added to the queue
in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit 0bd8762824e7 ("ip6tnl: add x-netns support")).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/ipv6/ip6_tunnel.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 2d8f4829575b..a791552e0422 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1731,8 +1731,6 @@ static void __net_exit ip6_tnl_destroy_tunnels(struct ip6_tnl_net *ip6n)
 		}
 	}
 
-	t = rtnl_dereference(ip6n->tnls_wc[0]);
-	unregister_netdevice_queue(t->dev, &list);
 	unregister_netdevice_many(&list);
 }
 
@@ -1752,6 +1750,7 @@ static int __net_init ip6_tnl_init_net(struct net *net)
 	if (!ip6n->fb_tnl_dev)
 		goto err_alloc_dev;
 	dev_net_set(ip6n->fb_tnl_dev, net);
+	ip6n->fb_tnl_dev->rtnl_link_ops = &ip6_link_ops;
 	/* FB netdevice is special: we have one, and only one per netns.
 	 * Allowing to move it to another netns is clearly unsafe.
 	 */
-- 
1.8.2.1

^ permalink raw reply related

* [PATCH net 1/2] sit: allow to use rtnl ops on fb tunnel
From: Nicolas Dichtel @ 2013-10-01 16:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, steffen.klassert, pshelar, Nicolas Dichtel
In-Reply-To: <524AB26B.1070700@6wind.com>

rtnl ops where introduced by ba3e3f50a0e5 ("sit: advertise tunnel param via
rtnl"), but I forget to assign rtnl ops to fb tunnels.

Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because  the fallback tunnel is added to the queue
in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit 5e6700b3bf98 ("sit: add support of x-netns")).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/ipv6/sit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index afd5605aea7c..19269453a8ea 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1666,6 +1666,7 @@ static int __net_init sit_init_net(struct net *net)
 		goto err_alloc_dev;
 	}
 	dev_net_set(sitn->fb_tunnel_dev, net);
+	sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
 	/* FB netdevice is special: we have one, and only one per netns.
 	 * Allowing to move it to another netns is clearly unsafe.
 	 */
@@ -1700,7 +1701,6 @@ static void __net_exit sit_exit_net(struct net *net)
 
 	rtnl_lock();
 	sit_destroy_tunnels(sitn, &list);
-	unregister_netdevice_queue(sitn->fb_tunnel_dev, &list);
 	unregister_netdevice_many(&list);
 	rtnl_unlock();
 }
-- 
1.8.2.1

^ permalink raw reply related

* Re: [net-next  9/9] igb: Add ethtool support to configure number of channels
From: Ben Hutchings @ 2013-10-01 15:59 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Laura Mihaela Vasilescu, netdev, gospo, sassmann
In-Reply-To: <1380627236-3190-10-git-send-email-jeffrey.t.kirsher@intel.com>

On Tue, 2013-10-01 at 04:33 -0700, Jeff Kirsher wrote:
> From: Laura Mihaela Vasilescu <laura.vasilescu@rosedu.org>
> 
> This patch adds the ethtool callbacks necessary to configure the
> number of RSS queues.
> 
> The maximum number of queues is in accordance with the datasheets.
> 
> Signed-off-by: Laura Mihaela Vasilescu <laura.vasilescu@rosedu.org>
> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
[...]
> --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
> +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
[...]
> +static void igb_get_channels(struct net_device *netdev,
> +			     struct ethtool_channels *ch)
[...]
> +static int igb_set_channels(struct net_device *netdev,
> +			    struct ethtool_channels *ch)

These functions look fine to me.

[...]
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -7838,4 +7838,26 @@ s32 igb_write_i2c_byte(struct e1000_hw *hw, u8 byte_offset,
>  		return E1000_SUCCESS;
>  
>  }
> +
> +int igb_reinit_queues(struct igb_adapter *adapter)
> +{
> +	struct net_device *netdev = adapter->netdev;
> +	struct pci_dev *pdev = adapter->pdev;
> +	int err = 0;
> +
> +	if (netif_running(netdev))
> +		igb_close(netdev);
> +
> +	igb_clear_interrupt_scheme(adapter);
> +
> +	if (igb_init_interrupt_scheme(adapter, true)) {
> +		dev_err(&pdev->dev, "Unable to allocate memory for queues\n");
> +		return -ENOMEM;
> +	}
> +
> +	if (netif_running(netdev))
> +		err = igb_open(netdev);
> +
> +	return err;
> +}
>  /* igb_main.c */

In case this fails, is the interface in a consistent state where is it
safe to reconfigure the interface again or to unbind the driver?

If it fails, and the interface was up, shouldn't it call dev_close() so
that it's obviously down and the user can then try to bring it up again?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: Established sockets remain open after iface down or address lost
From: Rick Jones @ 2013-10-01 15:44 UTC (permalink / raw)
  To: Chris Verges
  Cc: Eric Dumazet, davem, kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <20131001132707.GA7442@cverges-dev-lnx.sentient-energy.com>

On 10/01/2013 06:27 AM, Chris Verges wrote:
> The client establishes a connection to the server.  It requests some
> data and gets a response.  The socket remains open.  The server then
> decides, through some asynchronous process, that the radio needs to be
> duty cycled.  So the radio is turned off.
>
> The client attempts to make another request to the device, but
> determines that the connection is dead through the normal retry
> mechanisms.  It's write() operation returns something like EPIPE.  So on
> the client's side, the connection is dead.
>
> But on the server side, the socket is still open and waiting for some
> more data.  The interface and IP address and even the remote client are
> long gone, but the socket still persists and uses system resources.

The protocol between client and server needs to have an 
application-layer "keepalive" mechanism added, and then the server will 
be able to detect a dangling connection without need of any further 
kernel modifications.

If that is not possible, the server can/should set SO_KEEPALIVE and 
perhaps tweak the TCP keepalive settings.  Not as good (IMO) as an 
application-layer keepalive because it only shows that the connection is 
good as far as TCP, but I suppose it could do in a pinch.

rick jones

^ permalink raw reply

* Re: [PATCH ethtool] sfc: Stop using bitfields in register definition structures
From: Ben Hutchings @ 2013-10-01 15:37 UTC (permalink / raw)
  To: netdev; +Cc: linux-net-drivers, David Laight
In-Reply-To: <1379960098.2485.48.camel@bwh-desktop.uk.level5networks.com>

On Mon, 2013-09-23 at 19:14 +0100, Ben Hutchings wrote:
> This doesn't make any difference to the static data size, and
> decreases the code size a little.
> 
> Reported-by: David Laight <David.Laight@ACULAB.COM>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
>  sfc.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)

I've now applied this.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH ethtool] sfc: Add support for EF10 registers
From: Ben Hutchings @ 2013-10-01 15:36 UTC (permalink / raw)
  To: netdev; +Cc: linux-net-drivers
In-Reply-To: <1379952298.2485.35.camel@bwh-desktop.uk.level5networks.com>

On Mon, 2013-09-23 at 17:04 +0100, Ben Hutchings wrote:
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> This works in conjunction with the recently submitted driver change
> 'sfc: Add EF10 registers to register dump'.
> 
> Ben.
> 
>  sfc.c | 159 +++++++++++++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 109 insertions(+), 50 deletions(-)

I've now applied this.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH ethtool 2/2] Hide state of VLAN tag offload and LRO if the kernel is too old
From: Ben Hutchings @ 2013-10-01 15:36 UTC (permalink / raw)
  To: netdev; +Cc: linux-net-drivers
In-Reply-To: <1379978791.2485.87.camel@bwh-desktop.uk.level5networks.com>

On Tue, 2013-09-24 at 00:26 +0100, Ben Hutchings wrote:
> Starting with Linux 2.6.37 and ethtool 2.6.36 it was possible to show
> the state of VLAN tag offload (using ETHTOOL_GFLAGS).  But the state
> would always be shown as 'off' for older kernel versions, even though
> VLAN tag offload had been implemented long before this.
> 
> In ethtool 3.4.2 I attempted to fix this by also reading the state of
> VLAN tag offload from the 'features' attribute in sysfs.  But this had
> to be reverted because it causes 'ethtool -K' to pass the flags back
> into ETHTOOL_SFLAGS.
> 
> Instead, hide the VLAN tag offload flags if the kernel is older than
> 2.6.37.
> 
> Similarly, LRO was implemented some time before it was exposed through
> ETHTOOL_GFLAGS in Linux 2.6.24.  So hide the LRO flag if the kernel
> is older than that.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
>  ethtool.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 46 insertions(+), 12 deletions(-)

I've now applied this.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH ethtool 1/2] Revert "Fix reporting of VLAN tag offload flags on Linux < 2.6.37"
From: Ben Hutchings @ 2013-10-01 15:35 UTC (permalink / raw)
  To: netdev; +Cc: linux-net-drivers
In-Reply-To: <1379978698.2485.85.camel@bwh-desktop.uk.level5networks.com>

On Tue, 2013-09-24 at 00:24 +0100, Ben Hutchings wrote:
> This reverts commit 1cddbe64cfc66b58988c85086d6df3a77c0c61a5.
> It causes 'ethtool -K' to pass the VLAN tag offload flags to
> ETHTOOL_SFLAGS when they are not supported and will be rejected.
> ---
>  ethtool.c | 41 -----------------------------------------
>  1 file changed, 41 deletions(-)

I've now applied this.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH 2/2] net, mellanox mlx4 Fix compile warnings [v4]
From: Prarit Bhargava @ 2013-10-01 15:25 UTC (permalink / raw)
  To: Jack Morgenstein; +Cc: netdev, dledford, amirv, davem, ogerlitz
In-Reply-To: <20131001181742.72e603d1@jpm-OptiPlex-GX620>



On 10/01/2013 11:17 AM, Jack Morgenstein wrote:
> Since we are in the middle of submitting patches which touch the file
> "resource_tracker.c", I would really like to hold off on these warning
> fixes for a bit, and I'll handle the changes for all the functions at
> once to conform (correctly!) to the format suggested by Dave Miller.
> 
>  -Jack

Hey Jack, np.  Thanks for looking.

P.

^ permalink raw reply

* Re: [PATCH 2/2] net, mellanox mlx4 Fix compile warnings [v4]
From: Jack Morgenstein @ 2013-10-01 15:17 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: netdev, dledford, amirv, davem, ogerlitz
In-Reply-To: <1380633877-24034-2-git-send-email-prarit@redhat.com>

On Tue,  1 Oct 2013 09:24:37 -0400
Prarit Bhargava <prarit@redhat.com> wrote:

Hi,

You missed some needed changes (You did not get compile warnings,
because indeed "r" was initialized in the paths below.  However,
in these error paths, "err" was ignored.
You should be setting "r" to the error return values:

diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c index
343206b..c4a0a32 100644 ---
a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c +++
b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c @@ -1085,9
+1085,9 @@ static struct res_cq *cq_res_start_move_to(struct mlx4_dev
*dev,
 	spin_lock_irq(mlx4_tlock(dev));
 	r = res_tracker_lookup(&tracker->res_tree[RES_CQ], cqn);
 	if (!r)
-               err = -ENOENT;
+               r = ERR_PTR(-ENOENT);
        else if (r->com.owner != slave)
-               err = -EPERM;
+               r = ERR_PTR(-EPERM);
        else {
                switch (state) {
                case RES_CQ_BUSY:
@@ -1140,9 +1140,9 @@ static struct res_srq
*srq_res_start_move_to(struct mlx4_dev *dev, int slave,
 	spin_lock_irq(mlx4_tlock(dev));
 	r = res_tracker_lookup(&tracker->res_tree[RES_SRQ], index);
 	if (!r)
-               err = -ENOENT;
+               r = ERR_PTR(-ENOENT);
        else if (r->com.owner != slave)
-               err = -EPERM;
+               r = ERR_PTR(-EPERM);
        else {
                switch (state) {
                case RES_SRQ_BUSY:

There are also other calls which need to be changed in a similar
fashion (eq_res_start_move_to(), and one or two others -- I'm
surprised that these others did not also generate uninitialized-var
warnings). If we change cq and srq, we should also change the others.

Since we are in the middle of submitting patches which touch the file
"resource_tracker.c", I would really like to hold off on these warning
fixes for a bit, and I'll handle the changes for all the functions at
once to conform (correctly!) to the format suggested by Dave Miller.

 -Jack

> Fix unitialized variable warnings.
> 
> drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function
> ‘mlx4_HW2SW_CQ_wrapper’:
> drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:2551:16: error:
> ‘cq’ may be used uninitialized in this function
> [-Werror=maybe-uninitialized] atomic_dec(&cq->mtt->ref_count); ^
> drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function
> ‘mlx4_HW2SW_SRQ_wrapper’:
> drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:2734:17: error:
> ‘srq’ may be used uninitialized in this function
> [-Werror=maybe-uninitialized] atomic_dec(&srq->mtt->ref_count);
> 
> [v2]: davem suggested making cq_res_start_move_to() return 'cq' as an
> error pointer instead of setting 'cq' by reference.  I also did the
> same for srq.
> 
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> Cc: dledford@redhat.com
> Cc: amirv@mellanox.com
> Cc: davem@davemloft.net
> Cc: ogerlitz@mellanox.com
> Cc: jackm@dev.mellanox.co.il
> ---
>  .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   46
> ++++++++++---------- 1 file changed, 24 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
> b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c index
> dd68763..343206b 100644 ---
> a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c +++
> b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c @@ -1073,8
> +1073,9 @@ static int eq_res_start_move_to(struct mlx4_dev *dev, int
> slave, int index, return err; }
>  
> -static int cq_res_start_move_to(struct mlx4_dev *dev, int slave, int
> cqn,
> -				enum res_cq_states state, struct
> res_cq **cq) +static struct res_cq *cq_res_start_move_to(struct
> mlx4_dev *dev,
> +						  int slave, int cqn,
> +						  enum res_cq_states
> state) {
>  	struct mlx4_priv *priv = mlx4_priv(dev);
>  	struct mlx4_resource_tracker *tracker =
> &priv->mfunc.master.res_tracker; @@ -1117,18 +1118,19 @@ static int
> cq_res_start_move_to(struct mlx4_dev *dev, int slave, int cqn,
> r->com.from_state = r->com.state; r->com.to_state = state;
>  			r->com.state = RES_CQ_BUSY;
> -			if (cq)
> -				*cq = r;
> +		} else {
> +			r = ERR_PTR(err);
>  		}
>  	}
>  
>  	spin_unlock_irq(mlx4_tlock(dev));
>  
> -	return err;
> +	return r;
>  }
>  
> -static int srq_res_start_move_to(struct mlx4_dev *dev, int slave,
> int index,
> -				 enum res_cq_states state, struct
> res_srq **srq) +static struct res_srq *srq_res_start_move_to(struct
> mlx4_dev *dev, int slave,
> +					     int index,
> +					     enum res_cq_states
> state) {
>  	struct mlx4_priv *priv = mlx4_priv(dev);
>  	struct mlx4_resource_tracker *tracker =
> &priv->mfunc.master.res_tracker; @@ -1167,14 +1169,14 @@ static int
> srq_res_start_move_to(struct mlx4_dev *dev, int slave, int index,
> r->com.from_state = r->com.state; r->com.to_state = state;
>  			r->com.state = RES_SRQ_BUSY;
> -			if (srq)
> -				*srq = r;
> +		} else {
> +			r = ERR_PTR(err);
>  		}
>  	}
>  
>  	spin_unlock_irq(mlx4_tlock(dev));
>  
> -	return err;
> +	return r;
>  }
>  
>  static void res_abort_move(struct mlx4_dev *dev, int slave,
> @@ -2530,9 +2532,9 @@ int mlx4_SW2HW_CQ_wrapper(struct mlx4_dev *dev,
> int slave, struct res_cq *cq;
>  	struct res_mtt *mtt;
>  
> -	err = cq_res_start_move_to(dev, slave, cqn, RES_CQ_HW, &cq);
> -	if (err)
> -		return err;
> +	cq = cq_res_start_move_to(dev, slave, cqn, RES_CQ_HW);
> +	if (IS_ERR(cq))
> +		return PTR_ERR(cq);
>  	err = get_res(dev, slave, mtt_base, RES_MTT, &mtt);
>  	if (err)
>  		goto out_move;
> @@ -2565,9 +2567,9 @@ int mlx4_HW2SW_CQ_wrapper(struct mlx4_dev *dev,
> int slave, int cqn = vhcr->in_modifier;
>  	struct res_cq *cq;
>  
> -	err = cq_res_start_move_to(dev, slave, cqn,
> RES_CQ_ALLOCATED, &cq);
> -	if (err)
> -		return err;
> +	cq = cq_res_start_move_to(dev, slave, cqn, RES_CQ_ALLOCATED);
> +	if (IS_ERR(cq))
> +		return PTR_ERR(cq);
>  	err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
>  	if (err)
>  		goto out_move;
> @@ -2709,9 +2711,9 @@ int mlx4_SW2HW_SRQ_wrapper(struct mlx4_dev
> *dev, int slave, if (srqn != (be32_to_cpu(srqc->state_logsize_srqn) &
> 0xffffff)) return -EINVAL;
>  
> -	err = srq_res_start_move_to(dev, slave, srqn, RES_SRQ_HW,
> &srq);
> -	if (err)
> -		return err;
> +	srq = srq_res_start_move_to(dev, slave, srqn,
> RES_SRQ_ALLOCATED);
> +	if (IS_ERR(srq))
> +		return PTR_ERR(srq);
>  	err = get_res(dev, slave, mtt_base, RES_MTT, &mtt);
>  	if (err)
>  		goto ex_abort;
> @@ -2748,9 +2750,9 @@ int mlx4_HW2SW_SRQ_wrapper(struct mlx4_dev
> *dev, int slave, int srqn = vhcr->in_modifier;
>  	struct res_srq *srq;
>  
> -	err = srq_res_start_move_to(dev, slave, srqn,
> RES_SRQ_ALLOCATED, &srq);
> -	if (err)
> -		return err;
> +	srq = srq_res_start_move_to(dev, slave, srqn,
> RES_SRQ_ALLOCATED);
> +	if (IS_ERR(srq))
> +		return PTR_ERR(srq);
>  	err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
>  	if (err)
>  		goto ex_abort;

^ permalink raw reply

* RE: [PATCH 07/11] igb: Avoid uninitialized advertised variable in eee_set_cur
From: Wyborny, Carolyn @ 2013-10-01 15:00 UTC (permalink / raw)
  To: Andi Kleen, linux-kernel@vger.kernel.org
  Cc: Andi Kleen, Kirsher, Jeffrey T, netdev@vger.kernel.org
In-Reply-To: <1380572952-30729-8-git-send-email-andi@firstfloor.org>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Andi Kleen
> Sent: Monday, September 30, 2013 1:29 PM
> To: linux-kernel@vger.kernel.org
> Cc: Andi Kleen; Kirsher, Jeffrey T; netdev@vger.kernel.org
> Subject: [PATCH 07/11] igb: Avoid uninitialized advertised variable in
> eee_set_cur
> 
> From: Andi Kleen <ak@linux.intel.com>
> 
> eee_get_cur assumes that the output data is already zeroed. It can read-modify-
> write the advertised field:
> 
>               if (ipcnfg & E1000_IPCNFG_EEE_100M_AN)
> 2594			edata->advertised |= ADVERTISED_100baseT_Full;
> 
> This is ok for the normal ethtool eee_get call, which always zeroes the input
> data before.
> 
> But eee_set_cur also calls eee_get_cur and it did not zero the input field. Later
> on it then compares agsinst the field, which can contain partial stack garbage.
> 
> Zero the input field in eee_set_cur() too.
> 
> Cc: jeffrey.t.kirsher@intel.com
> Cc: netdev@vger.kernel.org
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c
> b/drivers/net/ethernet/intel/igb/igb_ethtool.c
> index 48cbc83..41e37ff 100644
> --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
> +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
> @@ -2652,6 +2652,8 @@ static int igb_set_eee(struct net_device *netdev,
>  	    (hw->phy.media_type != e1000_media_type_copper))
>  		return -EOPNOTSUPP;
> 
> +	memset(&eee_curr, 0, sizeof(struct ethtool_eee));
> +
>  	ret_val = igb_get_eee(netdev, &eee_curr);
>  	if (ret_val)
>  		return ret_val;
> --
> 1.8.3.1
> 

ACK

Thanks,

Carolyn 

^ permalink raw reply

* Re: Established sockets remain open after iface down or address lost
From: Chris Verges @ 2013-10-01 13:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <1380203383.3165.172.camel@edumazet-glaptop>

On Thu, Sep 26, 2013 at 06:49:43AM -0700, Eric Dumazet wrote:
> On Wed, 2013-09-25 at 23:04 -0700, Chris Verges wrote:
> >   (6) Physically disconnect the USB/Ethernet adapter from the USB
> >       bus.
> >   (7) Linux removes the 'eth0' interface and associated IP address.
> > 
> > At this point, the socket _still_ shows as ESTABLISHED under
> > netstat.
> > 
> > This is the paradox.  Why is the blocking read not interrupted with
> > a socket error to indicate that the socket is no longer viable?
> 
> Because TCP layer is not sensitive to such temporary events.  You can
> plug again your iface, and IP is valid again.  Why should we give a
> permanent error for such case ?

Yes, the interface could be reconnected ... or it could not be.
Consider an embedded device where a PPP-based radio module is powered
off for a decent amount of time (hours).

  +--------+          +-----------------+
  | Client |----------| Embedded Server |
  +--------+          +-----------------+

The client establishes a connection to the server.  It requests some
data and gets a response.  The socket remains open.  The server then
decides, through some asynchronous process, that the radio needs to be
duty cycled.  So the radio is turned off.

The client attempts to make another request to the device, but
determines that the connection is dead through the normal retry
mechanisms.  It's write() operation returns something like EPIPE.  So on
the client's side, the connection is dead.

But on the server side, the socket is still open and waiting for some
more data.  The interface and IP address and even the remote client are
long gone, but the socket still persists and uses system resources.

This is primarily an issue when the server binds/listens/accepts without
using a socket pool to process sockets after the accept().  That is, the
server processes only one socket.  If the socket resource is held by
this now-dead connection, there is generally no way to reset it without
lengthy or costly keepalives (depending on whether the keepalive timer
is set long or short, respectively.)

> If network communication is cut somewhere, TCP is not supposed to
> immediately react. Normal timeouts and retransmits take place. 

I agree in the sense that "somewhere" is between the remote station
(inclusive) and the local station (exclusive.)  I would argue that the
local station could be aware of its own state changes and may choose to
respond accordingly.

I can see the arguments for why the existing behavior would be viewed
favorably.  From this particular real-world scenario that I am
encountering, I can also see why a modified behavior would be useful.
Would you consider accepting a patch that adds a new socket option to
optionally control this?  The effect would be to cause the socket to
automatically close (interrupting any blocking reads) if the underlying
address used by the socket is unregistered from the stack.  Default
behavior would be maintained.

Thanks,
Chris

^ permalink raw reply

* [PATCH 2/2] net, mellanox mlx4 Fix compile warnings [v4]
From: Prarit Bhargava @ 2013-10-01 13:24 UTC (permalink / raw)
  To: netdev; +Cc: Prarit Bhargava, dledford, amirv, davem, ogerlitz, jackm
In-Reply-To: <1380633877-24034-1-git-send-email-prarit@redhat.com>

Fix unitialized variable warnings.

drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function ‘mlx4_HW2SW_CQ_wrapper’:
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:2551:16: error: ‘cq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  atomic_dec(&cq->mtt->ref_count);
                ^
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function ‘mlx4_HW2SW_SRQ_wrapper’:
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:2734:17: error: ‘srq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  atomic_dec(&srq->mtt->ref_count);

[v2]: davem suggested making cq_res_start_move_to() return 'cq' as an error
pointer instead of setting 'cq' by reference.  I also did the same for
srq.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: dledford@redhat.com
Cc: amirv@mellanox.com
Cc: davem@davemloft.net
Cc: ogerlitz@mellanox.com
Cc: jackm@dev.mellanox.co.il
---
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   46 ++++++++++----------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index dd68763..343206b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -1073,8 +1073,9 @@ static int eq_res_start_move_to(struct mlx4_dev *dev, int slave, int index,
 	return err;
 }
 
-static int cq_res_start_move_to(struct mlx4_dev *dev, int slave, int cqn,
-				enum res_cq_states state, struct res_cq **cq)
+static struct res_cq *cq_res_start_move_to(struct mlx4_dev *dev,
+						  int slave, int cqn,
+						  enum res_cq_states state)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct mlx4_resource_tracker *tracker = &priv->mfunc.master.res_tracker;
@@ -1117,18 +1118,19 @@ static int cq_res_start_move_to(struct mlx4_dev *dev, int slave, int cqn,
 			r->com.from_state = r->com.state;
 			r->com.to_state = state;
 			r->com.state = RES_CQ_BUSY;
-			if (cq)
-				*cq = r;
+		} else {
+			r = ERR_PTR(err);
 		}
 	}
 
 	spin_unlock_irq(mlx4_tlock(dev));
 
-	return err;
+	return r;
 }
 
-static int srq_res_start_move_to(struct mlx4_dev *dev, int slave, int index,
-				 enum res_cq_states state, struct res_srq **srq)
+static struct res_srq *srq_res_start_move_to(struct mlx4_dev *dev, int slave,
+					     int index,
+					     enum res_cq_states state)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct mlx4_resource_tracker *tracker = &priv->mfunc.master.res_tracker;
@@ -1167,14 +1169,14 @@ static int srq_res_start_move_to(struct mlx4_dev *dev, int slave, int index,
 			r->com.from_state = r->com.state;
 			r->com.to_state = state;
 			r->com.state = RES_SRQ_BUSY;
-			if (srq)
-				*srq = r;
+		} else {
+			r = ERR_PTR(err);
 		}
 	}
 
 	spin_unlock_irq(mlx4_tlock(dev));
 
-	return err;
+	return r;
 }
 
 static void res_abort_move(struct mlx4_dev *dev, int slave,
@@ -2530,9 +2532,9 @@ int mlx4_SW2HW_CQ_wrapper(struct mlx4_dev *dev, int slave,
 	struct res_cq *cq;
 	struct res_mtt *mtt;
 
-	err = cq_res_start_move_to(dev, slave, cqn, RES_CQ_HW, &cq);
-	if (err)
-		return err;
+	cq = cq_res_start_move_to(dev, slave, cqn, RES_CQ_HW);
+	if (IS_ERR(cq))
+		return PTR_ERR(cq);
 	err = get_res(dev, slave, mtt_base, RES_MTT, &mtt);
 	if (err)
 		goto out_move;
@@ -2565,9 +2567,9 @@ int mlx4_HW2SW_CQ_wrapper(struct mlx4_dev *dev, int slave,
 	int cqn = vhcr->in_modifier;
 	struct res_cq *cq;
 
-	err = cq_res_start_move_to(dev, slave, cqn, RES_CQ_ALLOCATED, &cq);
-	if (err)
-		return err;
+	cq = cq_res_start_move_to(dev, slave, cqn, RES_CQ_ALLOCATED);
+	if (IS_ERR(cq))
+		return PTR_ERR(cq);
 	err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
 	if (err)
 		goto out_move;
@@ -2709,9 +2711,9 @@ int mlx4_SW2HW_SRQ_wrapper(struct mlx4_dev *dev, int slave,
 	if (srqn != (be32_to_cpu(srqc->state_logsize_srqn) & 0xffffff))
 		return -EINVAL;
 
-	err = srq_res_start_move_to(dev, slave, srqn, RES_SRQ_HW, &srq);
-	if (err)
-		return err;
+	srq = srq_res_start_move_to(dev, slave, srqn, RES_SRQ_ALLOCATED);
+	if (IS_ERR(srq))
+		return PTR_ERR(srq);
 	err = get_res(dev, slave, mtt_base, RES_MTT, &mtt);
 	if (err)
 		goto ex_abort;
@@ -2748,9 +2750,9 @@ int mlx4_HW2SW_SRQ_wrapper(struct mlx4_dev *dev, int slave,
 	int srqn = vhcr->in_modifier;
 	struct res_srq *srq;
 
-	err = srq_res_start_move_to(dev, slave, srqn, RES_SRQ_ALLOCATED, &srq);
-	if (err)
-		return err;
+	srq = srq_res_start_move_to(dev, slave, srqn, RES_SRQ_ALLOCATED);
+	if (IS_ERR(srq))
+		return PTR_ERR(srq);
 	err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
 	if (err)
 		goto ex_abort;
-- 
1.7.9.3

^ permalink raw reply related

* [PATCH 1/2] net, vxlan Fix compile warning [v4]
From: Prarit Bhargava @ 2013-10-01 13:24 UTC (permalink / raw)
  To: netdev; +Cc: Prarit Bhargava, jpirko

Fix a unintialized variable warning.

drivers/net/vxlan.c: In function ‘vxlan_sock_add’:
drivers/net/vxlan.c:2240:11: error: ‘sock’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  vs->sock = sock;
           ^
drivers/net/vxlan.c:2217:17: note: ‘sock’ was declared here
  struct socket *sock;
                 ^
[v2]: davem suggested resolving this by making create_v{4,6}_sock() return an err
pointer.
[v3]: Ben Hutchings pointed out a missed conversion

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: jpirko@redhat.com
---
 drivers/net/vxlan.c |   31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d1292fe..cdc78b4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2182,7 +2182,7 @@ static void vxlan_del_work(struct work_struct *work)
  * could be used for both IPv4 and IPv6 communications, but
  * users may set bindv6only=1.
  */
-static int create_v6_sock(struct net *net, __be16 port, struct socket **psock)
+static struct socket *create_v6_sock(struct net *net, __be16 port)
 {
 	struct sock *sk;
 	struct socket *sock;
@@ -2195,7 +2195,7 @@ static int create_v6_sock(struct net *net, __be16 port, struct socket **psock)
 	rc = sock_create_kern(AF_INET6, SOCK_DGRAM, IPPROTO_UDP, &sock);
 	if (rc < 0) {
 		pr_debug("UDPv6 socket create failed\n");
-		return rc;
+		return ERR_PTR(rc);
 	}
 
 	/* Put in proper namespace */
@@ -2210,28 +2210,27 @@ static int create_v6_sock(struct net *net, __be16 port, struct socket **psock)
 		pr_debug("bind for UDPv6 socket %pI6:%u (%d)\n",
 			 &vxlan_addr.sin6_addr, ntohs(vxlan_addr.sin6_port), rc);
 		sk_release_kernel(sk);
-		return rc;
+		return ERR_PTR(rc);
 	}
 	/* At this point, IPv6 module should have been loaded in
 	 * sock_create_kern().
 	 */
 	BUG_ON(!ipv6_stub);
 
-	*psock = sock;
 	/* Disable multicast loopback */
 	inet_sk(sk)->mc_loop = 0;
-	return 0;
+	return sock;
 }
 
 #else
 
-static int create_v6_sock(struct net *net, __be16 port, struct socket **psock)
+static struct socket *create_v6_sock(struct net *net, __be16 port)
 {
-		return -EPFNOSUPPORT;
+		return ERR_PTR(-EPFNOSUPPORT);
 }
 #endif
 
-static int create_v4_sock(struct net *net, __be16 port, struct socket **psock)
+static struct socket *create_v4_sock(struct net *net, __be16 port)
 {
 	struct sock *sk;
 	struct socket *sock;
@@ -2246,7 +2245,7 @@ static int create_v4_sock(struct net *net, __be16 port, struct socket **psock)
 	rc = sock_create_kern(AF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
 	if (rc < 0) {
 		pr_debug("UDP socket create failed\n");
-		return rc;
+		return ERR_PTR(rc);
 	}
 
 	/* Put in proper namespace */
@@ -2259,13 +2258,12 @@ static int create_v4_sock(struct net *net, __be16 port, struct socket **psock)
 		pr_debug("bind for UDP socket %pI4:%u (%d)\n",
 			 &vxlan_addr.sin_addr, ntohs(vxlan_addr.sin_port), rc);
 		sk_release_kernel(sk);
-		return rc;
+		return ERR_PTR(rc);
 	}
 
-	*psock = sock;
 	/* Disable multicast loopback */
 	inet_sk(sk)->mc_loop = 0;
-	return 0;
+	return sock;
 }
 
 /* Create new listen socket if needed */
@@ -2276,7 +2274,6 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 	struct vxlan_sock *vs;
 	struct socket *sock;
 	struct sock *sk;
-	int rc = 0;
 	unsigned int h;
 
 	vs = kmalloc(sizeof(*vs), GFP_KERNEL);
@@ -2289,12 +2286,12 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 	INIT_WORK(&vs->del_work, vxlan_del_work);
 
 	if (ipv6)
-		rc = create_v6_sock(net, port, &sock);
+		sock = create_v6_sock(net, port);
 	else
-		rc = create_v4_sock(net, port, &sock);
-	if (rc < 0) {
+		sock = create_v4_sock(net, port);
+	if (IS_ERR(sock)) {
 		kfree(vs);
-		return ERR_PTR(rc);
+		return ERR_CAST(sock);
 	}
 
 	vs->sock = sock;
-- 
1.7.9.3

^ permalink raw reply related

* Re: Question regarding failure utilizing bonding mode 5 (balance-tlb)
From: Veaceslav Falico @ 2013-10-01 12:58 UTC (permalink / raw)
  To: Yuval Mintz; +Cc: Jay Vosburgh, netdev@vger.kernel.org, Ariel Elior
In-Reply-To: <979A8436335E3744ADCD3A9F2A2B68A52ADFFFCF@SJEXCHMB10.corp.ad.broadcom.com>

On Tue, Oct 01, 2013 at 12:56:43PM +0000, Yuval Mintz wrote:
>> On Mon, Sep 30, 2013 at 11:30:40AM +0000, Yuval Mintz wrote:
>> >> > >Again, I think the permanent address is restored only when the bond
>> >> > >releases the slave, which I don't think happens when the slave is
>> unloaded.
>> >> >
>> >> > 	Given a bond0 with two slaves, eth0 and eth1, in tlb mode, eth0
>> >> > being the active,
>> >> >
>> >> > 	1) "ip link set dev eth0 down" which will fail over to eth1
>> >> > 		(swapping the contents of their dev_addr fields).
>> >> >
>> >> > 	2) "ip link set dev eth0 up" eth0 comes back up, reprograms its
>> >> > 		MAC to the wrong thing (what was in dev_addr).
>> >> >
>> >> > 	3) repeat steps 1 and 2 for eth1
>> >> >
>> >> > 	Is this correct?
>> >> >
>> >>
>> >> Yes, sorry for the earlier confusion.
>> >> I think in the case described `alb_swap_mac_addr()' will be called,
>> >> replacing eth0 and eth1's dev_addr, causing eth0 to have dev_addr
>> >> which defers from  the bond device's. Once eth0 reloads, it will use
>> >> the different MAC address for configuring FW/HW.
>> >
>> >Hi,
>> >
>> >Did you by any chance had the time to look at this issue?
>>
>> Hi Yuval,
>>
>> Sorry for getting into the discussion - but I've tried to understand the
>> problem and, possibly, find a fix.
>>
>> I'm not sure that I completely understand it, and I don't have currently
>> hardware on which to test it (though I might have it in the nearest
>> future), so, again, I really am not sure that I won't suggest something
>> completely stupid.
>>
>> Anyway, that being said, I hope that the following patch might fix the
>> problem. I've described the bug and the fix in the changelog, and the code
>> is pretty self-explanitory.
>>
>> And even if the patch fixes the issue - I'm not sure that it's the proper
>> and correct way to do it. But it's definitely worth a try... So, if it's
>> possible, could you please test this patch and see if it fixes it?
>>
>> Warning: I've just compile-tested it.
>>
>> So, FWIW...
>>
>
>Like you, I don't know if yours is the proper way of fixing the issue - but it did
>seem to fix it (the scenario that was described, at least)
>
>Tested-by: Yuval Mintz <yuvalmin@broadcom.com>

Thank you!

I'll then try now to dig it a big futher, and if it happens that this fix is
really the one we need - I'll use your Tested-by, hope that you're ok with
that. Otherwise I'll send a new patch(set) with you in the CC.

Thanks again!

>
>Thanks,
>Yuval
>
>>  From 87e6c584b0ae0f0261610d60cf83778feb9c1edb Mon Sep 17
>> 00:00:00 2001
>> From: Veaceslav Falico <vfalico@redhat.com>
>> Date: Mon, 30 Sep 2013 23:14:23 +0200
>> Subject: [PATCH] bonding: ensure that TLB mode's active slave has correct
>> mac filter
>>
>> Currently, in TLB mode we change mac addresses only by memcpy-ing the to
>> net_device->dev_addr, without actually setting them via
>> dev_set_mac_address(). This permits us to receive all the traffic always on
>> one mac address.
>>
>> However, in case the interface flips, some drivers might enforce the
>> mac filtering for its FW/HW based on current ->dev_addr, and thus we won't
>> be able to receive traffic on that interface, in case it will be selected
>> as active in TLB mode.
>>
>> Fix it by setting the mac address forcefully on every new active slave that
>> we select in TLB mode.
>>
>> CC: Jay Vosburgh <fubar@us.ibm.com>
>> CC: Andy Gospodarek <andy@greyhouse.net>
>> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>> ---
>>   drivers/net/bonding/bond_alb.c | 17 +++++++++++++++++
>>   1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>> index e960418..576ccea 100644
>> --- a/drivers/net/bonding/bond_alb.c
>> +++ b/drivers/net/bonding/bond_alb.c
>> @@ -1699,6 +1699,23 @@ void bond_alb_handle_active_change(struct
>> bonding *bond, struct slave *new_slave
>>
>>   	ASSERT_RTNL();
>>
>> +	/* in TLB mode, the slave might flip down/up with the old dev_addr,
>> +	 * and thus filter bond->dev_addr's packets, so force bond's mac
>> +	 */
>> +	if (bond->params.mode == BOND_MODE_TLB) {
>> +		struct sockaddr sa;
>> +		u8 tmp_addr[ETH_ALEN];
>> +
>> +		memcpy(tmp_addr, new_slave->dev->dev_addr, ETH_ALEN);
>> +
>> +		memcpy(sa.sa_data, bond->dev->dev_addr, bond->dev-
>> >addr_len);
>> +		sa.sa_family = bond->dev->type;
>> +		/* we don't care if it can't change its mac, best effort */
>> +		dev_set_mac_address(new_slave->dev, &sa);
>> +
>> +		memcpy(new_slave->dev->dev_addr, tmp_addr, ETH_ALEN);
>> +	}
>> +
>>   	/* curr_active_slave must be set before calling alb_swap_mac_addr
>> */
>>   	if (swap_slave) {
>>   		/* swap mac address */
>> --
>> 1.8.4
>>
>>
>> >
>> >Thanks,
>> >Yuval
>> >
>> >--
>> >To unsubscribe from this list: send the line "unsubscribe netdev" in
>> >the body of a message to majordomo@vger.kernel.org
>> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply

* RE: Question regarding failure utilizing bonding mode 5 (balance-tlb)
From: Yuval Mintz @ 2013-10-01 12:56 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: Jay Vosburgh, netdev@vger.kernel.org, Ariel Elior
In-Reply-To: <20130930212440.GC24830@redhat.com>

> On Mon, Sep 30, 2013 at 11:30:40AM +0000, Yuval Mintz wrote:
> >> > >Again, I think the permanent address is restored only when the bond
> >> > >releases the slave, which I don't think happens when the slave is
> unloaded.
> >> >
> >> > 	Given a bond0 with two slaves, eth0 and eth1, in tlb mode, eth0
> >> > being the active,
> >> >
> >> > 	1) "ip link set dev eth0 down" which will fail over to eth1
> >> > 		(swapping the contents of their dev_addr fields).
> >> >
> >> > 	2) "ip link set dev eth0 up" eth0 comes back up, reprograms its
> >> > 		MAC to the wrong thing (what was in dev_addr).
> >> >
> >> > 	3) repeat steps 1 and 2 for eth1
> >> >
> >> > 	Is this correct?
> >> >
> >>
> >> Yes, sorry for the earlier confusion.
> >> I think in the case described `alb_swap_mac_addr()' will be called,
> >> replacing eth0 and eth1's dev_addr, causing eth0 to have dev_addr
> >> which defers from  the bond device's. Once eth0 reloads, it will use
> >> the different MAC address for configuring FW/HW.
> >
> >Hi,
> >
> >Did you by any chance had the time to look at this issue?
> 
> Hi Yuval,
> 
> Sorry for getting into the discussion - but I've tried to understand the
> problem and, possibly, find a fix.
> 
> I'm not sure that I completely understand it, and I don't have currently
> hardware on which to test it (though I might have it in the nearest
> future), so, again, I really am not sure that I won't suggest something
> completely stupid.
> 
> Anyway, that being said, I hope that the following patch might fix the
> problem. I've described the bug and the fix in the changelog, and the code
> is pretty self-explanitory.
> 
> And even if the patch fixes the issue - I'm not sure that it's the proper
> and correct way to do it. But it's definitely worth a try... So, if it's
> possible, could you please test this patch and see if it fixes it?
> 
> Warning: I've just compile-tested it.
> 
> So, FWIW...
> 

Like you, I don't know if yours is the proper way of fixing the issue - but it did
seem to fix it (the scenario that was described, at least)

Tested-by: Yuval Mintz <yuvalmin@broadcom.com>

Thanks,
Yuval

>  From 87e6c584b0ae0f0261610d60cf83778feb9c1edb Mon Sep 17
> 00:00:00 2001
> From: Veaceslav Falico <vfalico@redhat.com>
> Date: Mon, 30 Sep 2013 23:14:23 +0200
> Subject: [PATCH] bonding: ensure that TLB mode's active slave has correct
> mac filter
> 
> Currently, in TLB mode we change mac addresses only by memcpy-ing the to
> net_device->dev_addr, without actually setting them via
> dev_set_mac_address(). This permits us to receive all the traffic always on
> one mac address.
> 
> However, in case the interface flips, some drivers might enforce the
> mac filtering for its FW/HW based on current ->dev_addr, and thus we won't
> be able to receive traffic on that interface, in case it will be selected
> as active in TLB mode.
> 
> Fix it by setting the mac address forcefully on every new active slave that
> we select in TLB mode.
> 
> CC: Jay Vosburgh <fubar@us.ibm.com>
> CC: Andy Gospodarek <andy@greyhouse.net>
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> ---
>   drivers/net/bonding/bond_alb.c | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
> index e960418..576ccea 100644
> --- a/drivers/net/bonding/bond_alb.c
> +++ b/drivers/net/bonding/bond_alb.c
> @@ -1699,6 +1699,23 @@ void bond_alb_handle_active_change(struct
> bonding *bond, struct slave *new_slave
> 
>   	ASSERT_RTNL();
> 
> +	/* in TLB mode, the slave might flip down/up with the old dev_addr,
> +	 * and thus filter bond->dev_addr's packets, so force bond's mac
> +	 */
> +	if (bond->params.mode == BOND_MODE_TLB) {
> +		struct sockaddr sa;
> +		u8 tmp_addr[ETH_ALEN];
> +
> +		memcpy(tmp_addr, new_slave->dev->dev_addr, ETH_ALEN);
> +
> +		memcpy(sa.sa_data, bond->dev->dev_addr, bond->dev-
> >addr_len);
> +		sa.sa_family = bond->dev->type;
> +		/* we don't care if it can't change its mac, best effort */
> +		dev_set_mac_address(new_slave->dev, &sa);
> +
> +		memcpy(new_slave->dev->dev_addr, tmp_addr, ETH_ALEN);
> +	}
> +
>   	/* curr_active_slave must be set before calling alb_swap_mac_addr
> */
>   	if (swap_slave) {
>   		/* swap mac address */
> --
> 1.8.4
> 
> 
> >
> >Thanks,
> >Yuval
> >
> >--
> >To unsubscribe from this list: send the line "unsubscribe netdev" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Hannes Frederic Sowa @ 2013-10-01 12:32 UTC (permalink / raw)
  To: Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris, kaber,
	herbert, eric.dumazet
In-Reply-To: <20131001120907.GH10771@order.stressinduktion.org>

On Tue, Oct 01, 2013 at 02:09:07PM +0200, Hannes Frederic Sowa wrote:
> On Tue, Oct 01, 2013 at 12:58:37PM +0200, Jiri Pirko wrote:
> > >> 
> > >> What if non-ufo-path-created skb is peeked?
> > >
> > >You mean like:
> > >first append a frame < mtu
> > >second frame > mtu so it gets handles by ip6_ufo_append?
> > >
> > >Currently I don't see a problem with it but I may be wrong. What is
> > >your suspicion?
> > 
> > Well the skb will have gso_size==0 and ip_summed==CHECKSUM_NONE
> > Because of that it will be not threated as ufo skb.
> > 
> > Following patch fixes it:
> > 
> > Subject: [patch net] ip6_output: do skb ufo init for peeked non ufo skb as well
> > 
> > Now, if user application does:
> > sendto len<mtu flag MSG_MORE
> > sendto len>mtu flag 0
> > The skb is not treated as fragmented one because it is not initialized
> > that way. So move the initialization to fix this.
> > 
> > Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> 
> My understanding is that the frame is only a valid gso frame iff the first skb
> queued up is setup as an UFO frame. In other cases we only need to make sure
> we append to the fragment list without updating the gso field and the skb will
> get linearized (if needed) later in the output path.
> 
> This seems not to matter for virtio_net and loopback because we seem to pass
> the skb as is. But the remaining ethernet card which sets NETIF_F_UFO is
> neterion and we have to verify if this assumption is correct, there.

Hm, ip_append_page seems to violate my assumption. I need to read up on the
code later today.

^ permalink raw reply

* RE: [PATCH net RFC 2/2] ixgbe: fix sleep bug caused by napi_disable inside local_bh_disable()d context
From: Yuval Mintz @ 2013-10-01 12:11 UTC (permalink / raw)
  To: Keller, Jacob E, Francois Romieu
  Cc: netdev@vger.kernel.org, Duyck, Alexander H, Hyong-Youb Kim,
	Dmitry Kravkov, Amir Vadai, Eliezer Tamir
In-Reply-To: <02874ECE860811409154E81DA85FBB58565F5E52@ORSMSX104.amr.corp.intel.com>

> > > I have to move the local_bh_disable in order to put napi_disable
> > outside
> > > of the call since napi_disable could sleep, causing a scheduling while
> > > atomic BUG.
> >
> > I am in violent agreement with this part.
> > --
> > Ueimor
> 
> Regards,
> Jake
> --

It seem like we've hit the same issue with the bnx2x driver.
Is there anything new about the RFC?

Thanks,
Yuval

^ permalink raw reply

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Hannes Frederic Sowa @ 2013-10-01 12:09 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert,
	eric.dumazet
In-Reply-To: <20131001105837.GA1424@minipsycho.brq.redhat.com>

On Tue, Oct 01, 2013 at 12:58:37PM +0200, Jiri Pirko wrote:
> >> 
> >> What if non-ufo-path-created skb is peeked?
> >
> >You mean like:
> >first append a frame < mtu
> >second frame > mtu so it gets handles by ip6_ufo_append?
> >
> >Currently I don't see a problem with it but I may be wrong. What is
> >your suspicion?
> 
> Well the skb will have gso_size==0 and ip_summed==CHECKSUM_NONE
> Because of that it will be not threated as ufo skb.
> 
> Following patch fixes it:
> 
> Subject: [patch net] ip6_output: do skb ufo init for peeked non ufo skb as well
> 
> Now, if user application does:
> sendto len<mtu flag MSG_MORE
> sendto len>mtu flag 0
> The skb is not treated as fragmented one because it is not initialized
> that way. So move the initialization to fix this.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

My understanding is that the frame is only a valid gso frame iff the first skb
queued up is setup as an UFO frame. In other cases we only need to make sure
we append to the fragment list without updating the gso field and the skb will
get linearized (if needed) later in the output path.

This seems not to matter for virtio_net and loopback because we seem to pass
the skb as is. But the remaining ethernet card which sets NETIF_F_UFO is
neterion and we have to verify if this assumption is correct, there.

This is also the way the IPv4 stack deals with UFO.

Either way, this needs a look from Eric Dumazet and Herbert Xu, so I added
them in Cc.

Greetings,

  Hannes

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox