Netdev List


* [PATCH 1/2] net: core: introduce netif_skb_dev_features
From: Florian Westphal @ 2014-02-10 20:35 UTC
  To: netdev; +Cc: Florian Westphal

Will be used by upcoming ipv4 forward path change that needs to
determine feature mask using skb->dst->dev instead of skb->dev.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netdevice.h |  7 ++++++-
 net/core/dev.c            | 20 +++++++++++---------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 440a02e..21d4e6b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3068,7 +3068,12 @@ void netdev_change_features(struct net_device *dev);
 void netif_stacked_transfer_operstate(const struct net_device *rootdev,
 					struct net_device *dev);
 
-netdev_features_t netif_skb_features(struct sk_buff *skb);
+netdev_features_t netif_skb_dev_features(struct sk_buff *skb,
+					 const struct net_device *dev);
+static inline netdev_features_t netif_skb_features(struct sk_buff *skb)
+{
+	return netif_skb_dev_features(skb, skb->dev);
+}
 
 static inline bool net_gso_ok(netdev_features_t features, int gso_type)
 {
diff --git a/net/core/dev.c b/net/core/dev.c
index 3721db7..94f7401 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2495,34 +2495,36 @@ static int dev_gso_segment(struct sk_buff *skb, netdev_features_t features)
 }
 
 static netdev_features_t harmonize_features(struct sk_buff *skb,
-	netdev_features_t features)
+					    const struct net_device *dev,
+					    netdev_features_t features)
 {
 	if (skb->ip_summed != CHECKSUM_NONE &&
 	    !can_checksum_protocol(features, skb_network_protocol(skb))) {
 		features &= ~NETIF_F_ALL_CSUM;
-	} else if (illegal_highdma(skb->dev, skb)) {
+	} else if (illegal_highdma(dev, skb)) {
 		features &= ~NETIF_F_SG;
 	}
 
 	return features;
 }
 
-netdev_features_t netif_skb_features(struct sk_buff *skb)
+netdev_features_t netif_skb_dev_features(struct sk_buff *skb,
+					 const struct net_device *dev)
 {
 	__be16 protocol = skb->protocol;
-	netdev_features_t features = skb->dev->features;
+	netdev_features_t features = dev->features;
 
-	if (skb_shinfo(skb)->gso_segs > skb->dev->gso_max_segs)
+	if (skb_shinfo(skb)->gso_segs > dev->gso_max_segs)
 		features &= ~NETIF_F_GSO_MASK;
 
 	if (protocol == htons(ETH_P_8021Q) || protocol == htons(ETH_P_8021AD)) {
 		struct vlan_ethhdr *veh = (struct vlan_ethhdr *)skb->data;
 		protocol = veh->h_vlan_encapsulated_proto;
 	} else if (!vlan_tx_tag_present(skb)) {
-		return harmonize_features(skb, features);
+		return harmonize_features(skb, dev, features);
 	}
 
-	features &= (skb->dev->vlan_features | NETIF_F_HW_VLAN_CTAG_TX |
+	features &= (dev->vlan_features | NETIF_F_HW_VLAN_CTAG_TX |
 					       NETIF_F_HW_VLAN_STAG_TX);
 
 	if (protocol == htons(ETH_P_8021Q) || protocol == htons(ETH_P_8021AD))
@@ -2530,9 +2532,9 @@ netdev_features_t netif_skb_features(struct sk_buff *skb)
 				NETIF_F_GEN_CSUM | NETIF_F_HW_VLAN_CTAG_TX |
 				NETIF_F_HW_VLAN_STAG_TX;
 
-	return harmonize_features(skb, features);
+	return harmonize_features(skb, dev, features);
 }
-EXPORT_SYMBOL(netif_skb_features);
+EXPORT_SYMBOL(netif_skb_dev_features);
 
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			struct netdev_queue *txq)
-- 
1.8.1.5
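
The refactoring in this patch can be sketched in userspace as follows. This is only an illustrative model, assuming simplified stand-in types: `model_dev`, `model_skb`, `F_SG`, and `F_TSO` are hypothetical names, not kernel definitions. It shows why a forwarding caller wants to pass the route's device explicitly, while existing callers keep the one-argument wrapper.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for kernel types; names and flag values are
 * illustrative assumptions, not the real definitions. */
typedef uint64_t features_t;

#define F_SG  (1ULL << 0)	/* scatter/gather */
#define F_TSO (1ULL << 1)	/* TCP segmentation offload */

struct model_dev {
	features_t features;
	unsigned int gso_max_segs;
};

struct model_skb {
	const struct model_dev *dev;		/* device the skb is attached to */
	const struct model_dev *dst_dev;	/* device the route points at */
	unsigned int gso_segs;
};

/* Feature mask computed against an explicitly passed device. */
static features_t model_skb_dev_features(const struct model_skb *skb,
					 const struct model_dev *dev)
{
	features_t features = dev->features;

	if (skb->gso_segs > dev->gso_max_segs)
		features &= ~F_TSO;
	return features;
}

/* Existing callers keep their one-argument interface. */
static inline features_t model_skb_features(const struct model_skb *skb)
{
	return model_skb_dev_features(skb, skb->dev);
}

static void model_features_demo(void)
{
	struct model_dev in  = { .features = F_SG | F_TSO, .gso_max_segs = 64 };
	struct model_dev out = { .features = F_SG, .gso_max_segs = 64 };
	struct model_skb skb = { .dev = &in, .dst_dev = &out, .gso_segs = 4 };

	/* While forwarding, skb->dev is still the ingress device ... */
	assert(model_skb_features(&skb) == (F_SG | F_TSO));
	/* ... so the egress mask has to be requested explicitly. */
	assert(model_skb_dev_features(&skb, skb.dst_dev) == F_SG);
}
```

Calling `model_features_demo()` from a `main()` exercises both paths; the asserts pass only if the wrapper and the explicit-device call return different masks for the two devices.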


* [PATCH v5 2/2] net: ip, ipv6: handle gso skbs in forwarding path
From: Florian Westphal @ 2014-02-10 20:35 UTC
  To: netdev; +Cc: Florian Westphal, Herbert Xu, Eric Dumazet
In-Reply-To: <1392064537-30646-1-git-send-email-fw@strlen.de>

Marcelo Ricardo Leitner reported problems when the forwarding link path
has a lower mtu than the incoming one if the inbound interface supports GRO.

Given:
Host <mtu1500> R1 <mtu1200> R2

Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.

In this case, the kernel will fail to send ICMP fragmentation needed
messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
checks in forward path. Instead, Linux tries to send out packets exceeding
the mtu.

When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
not fragment the packets when forwarding, and again tries to send out
packets exceeding R1-R2 link mtu.

This alters the forwarding dstmtu checks to take the individual gso
segment lengths into account.

For ipv6, we send out pkt too big error for gso if the individual
segments are too big.

For ipv4, we either send icmp fragmentation needed, or, if the DF bit
is not set, perform software segmentation and let the output path
create fragments when the packet is leaving the machine.
It is not 100% correct as the error message will contain the headers of
the GRO skb instead of the original/segmented one, but it seems to
work fine in my (limited) tests.

Eric Dumazet suggested simply shrinking mss via ->gso_size to avoid
software segmentation.

However it turns out that skb_segment() assumes skb nr_frags is related
to mss size so we would BUG there.  I don't want to mess with it considering
Herbert and Eric disagree on what the correct behavior should be.

Hannes Frederic Sowa notes that when we would shrink gso_size
skb_segment would then also need to deal with the case where
SKB_MAX_FRAGS would be exceeded.

This uses software segmentation in the forward path when we hit ipv4
non-DF packets and the outgoing link mtu is too small.  It's not perfect,
but given the lack of bug reports wrt. GRO fwd being broken this is a
rare case anyway.  Also, it's not like this could not be improved later
once the dust settles.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
Changes since V4:
 - use new netif_skb_dev_features instead of netif_skb_features
   as we need dst->dev and not skb->dev (spotted by
   Hannes Frederic Sowa)

Changes since V3:
 - use ip_dst_mtu_maybe_forward instead of dst_mtu
 - add comment wrt. DF bit not being set

Changes since V2:
 - make this thing apply to current -net tree
 - kill unused variables in ip_forward/ip6_output

Changes since V1:
 suggestions from Eric Dumazet:
  - skip more expensive computation for small packets in fwd path
  - use netif_skb_features() feature mask and remove GSO flags
    instead of using 0 feature set.

 include/linux/skbuff.h | 17 ++++++++++++
 net/ipv4/ip_forward.c  | 71 ++++++++++++++++++++++++++++++++++++++++++++++++--
 net/ipv6/ip6_output.c  | 17 ++++++++++--
 3 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f589c9a..3ebbbe7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2916,5 +2916,22 @@ static inline bool skb_head_is_locked(const struct sk_buff *skb)
 {
 	return !skb->head_frag || skb_cloned(skb);
 }
+
+/**
+ * skb_gso_network_seglen - Return length of individual segments of a gso packet
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_network_seglen is used to determine the real size of the
+ * individual segments, including Layer3 (IP, IPv6) and L4 headers (TCP/UDP).
+ *
+ * The MAC/L2 header is not accounted for.
+ */
+static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb)
+{
+	unsigned int hdr_len = skb_transport_header(skb) -
+			       skb_network_header(skb);
+	return hdr_len + skb_gso_transport_seglen(skb);
+}
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index e9f1217..f3869c1 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -39,6 +39,71 @@
 #include <net/route.h>
 #include <net/xfrm.h>
 
+static bool ip_may_fragment(const struct sk_buff *skb)
+{
+	return unlikely((ip_hdr(skb)->frag_off & htons(IP_DF)) == 0) ||
+	       skb->local_df;
+}
+
+static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
+{
+	if (skb->len <= mtu || skb->local_df)
+		return false;
+
+	if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu)
+		return false;
+
+	return true;
+}
+
+static bool ip_gso_exceeds_dst_mtu(const struct sk_buff *skb)
+{
+	unsigned int mtu;
+
+	if (skb->local_df || !skb_is_gso(skb))
+		return false;
+
+	mtu = ip_dst_mtu_maybe_forward(skb_dst(skb), true);
+
+	/* if seglen > mtu, do software segmentation for IP fragmentation on
+	 * output.  DF bit cannot be set since ip_forward would have sent
+	 * icmp error.
+	 */
+	return skb_gso_network_seglen(skb) > mtu;
+}
+
+/* called if GSO skb needs to be fragmented on forward */
+static int ip_forward_finish_gso(struct sk_buff *skb)
+{
+	struct dst_entry *dst = skb_dst(skb);
+	netdev_features_t features;
+	struct sk_buff *segs;
+	int ret = 0;
+
+	features = netif_skb_dev_features(skb, dst->dev);
+	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+	if (IS_ERR(segs)) {
+		kfree_skb(skb);
+		return -ENOMEM;
+	}
+
+	consume_skb(skb);
+
+	do {
+		struct sk_buff *nskb = segs->next;
+		int err;
+
+		segs->next = NULL;
+		err = dst_output(segs);
+
+		if (err && ret == 0)
+			ret = err;
+		segs = nskb;
+	} while (segs);
+
+	return ret;
+}
+
 static int ip_forward_finish(struct sk_buff *skb)
 {
 	struct ip_options *opt	= &(IPCB(skb)->opt);
@@ -49,6 +114,9 @@ static int ip_forward_finish(struct sk_buff *skb)
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
 
+	if (ip_gso_exceeds_dst_mtu(skb))
+		return ip_forward_finish_gso(skb);
+
 	return dst_output(skb);
 }
 
@@ -91,8 +159,7 @@ int ip_forward(struct sk_buff *skb)
 
 	IPCB(skb)->flags |= IPSKB_FORWARDED;
 	mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
-	if (unlikely(skb->len > mtu && !skb_is_gso(skb) &&
-		     (ip_hdr(skb)->frag_off & htons(IP_DF))) && !skb->local_df) {
+	if (!ip_may_fragment(skb) && ip_exceeds_mtu(skb, mtu)) {
 		IP_INC_STATS(dev_net(rt->dst.dev), IPSTATS_MIB_FRAGFAILS);
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 			  htonl(mtu));
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ef02b26..070a2fa 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -342,6 +342,20 @@ static unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst)
 	return mtu;
 }
 
+static bool ip6_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
+{
+	if (skb->len <= mtu || skb->local_df)
+		return false;
+
+	if (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu)
+		return true;
+
+	if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu)
+		return false;
+
+	return true;
+}
+
 int ip6_forward(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
@@ -466,8 +480,7 @@ int ip6_forward(struct sk_buff *skb)
 	if (mtu < IPV6_MIN_MTU)
 		mtu = IPV6_MIN_MTU;
 
-	if ((!skb->local_df && skb->len > mtu && !skb_is_gso(skb)) ||
-	    (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu)) {
+	if (ip6_pkt_too_big(skb, mtu)) {
 		/* Again, force OUTPUT device used as source address */
 		skb->dev = dst->dev;
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
-- 
1.8.1.5
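
The per-segment MTU check described in the commit message can be modelled in userspace like this. It is a minimal sketch under stated assumptions: `model_pkt` and its fields are hypothetical stand-ins for the relevant skb state, and the `local_df` handling from the patch is deliberately left out. The numbers follow the Host/R1/R2 example above, with a 1460-byte MSS and 20-byte IPv4 and TCP headers assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the skb fields the check looks at. */
struct model_pkt {
	unsigned int len;		/* total L3 length, possibly GRO-aggregated */
	unsigned int l3_hdr_len;	/* network header bytes */
	unsigned int l4_hdr_len;	/* transport header bytes */
	unsigned int gso_size;		/* payload bytes per segment, 0 if not GSO */
};

/* Size of one segment on the wire, without the L2 header. */
static unsigned int model_gso_network_seglen(const struct model_pkt *p)
{
	return p->l3_hdr_len + p->l4_hdr_len + p->gso_size;
}

/* A GSO packet only "exceeds" the MTU if its individual segments do. */
static bool model_exceeds_mtu(const struct model_pkt *p, unsigned int mtu)
{
	if (p->len <= mtu)
		return false;
	if (p->gso_size && model_gso_network_seglen(p) <= mtu)
		return false;
	return true;
}

static void model_mtu_demo(void)
{
	/* Ten 1460-byte segments merged by GRO on R1. */
	struct model_pkt gro = {
		.len = 40 + 10 * 1460,
		.l3_hdr_len = 20, .l4_hdr_len = 20, .gso_size = 1460,
	};

	assert(model_gso_network_seglen(&gro) == 1500);
	assert(!model_exceeds_mtu(&gro, 1500));	/* segments fit the ingress link */
	assert(model_exceeds_mtu(&gro, 1200));	/* but not the R1-R2 link */
}
```

The old forwarding check skipped GSO packets entirely; in this model that would correspond to returning false whenever `gso_size` is non-zero, which is why oversized segments went unnoticed.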


* [PATCH] net: fix macvtap type name in Kconfig
From: Jan Luebbe @ 2014-02-10 20:40 UTC
  To: David S. Miller, netdev

The netlink kind (and iproute2 type option) is actually called
'macvtap', not 'macvlan'.

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
---
 drivers/net/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f342278..494b888 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -139,7 +139,7 @@ config MACVTAP
 	  This adds a specialized tap character device driver that is based
 	  on the MAC-VLAN network interface, called macvtap. A macvtap device
 	  can be added in the same way as a macvlan device, using 'type
-	  macvlan', and then be accessed through the tap user space interface.
+	  macvtap', and then be accessed through the tap user space interface.
 
 	  To compile this driver as a module, choose M here: the module
 	  will be called macvtap.
-- 
1.8.5.2


* Re: [PATCH iproute2 net-next-for-3.13 2/2] iplink_bond: add support for displaying bond slave attributes
From: Stephen Hemminger @ 2014-02-10 20:53 UTC
  To: Michal Kubecek; +Cc: netdev
In-Reply-To: <20140210081156.787D5E5697@unicorn.suse.cz>

On Mon, 10 Feb 2014 09:11:56 +0100 (CET)
Michal Kubecek <mkubecek@suse.cz> wrote:

> +static const char *slave_state_tbl[] = {
> +	"active",
> +	"backup",
> +	NULL
> +};
> +
> +static const char *slave_mii_status_tbl[] = {
> +	"up",
> +	"going_down",
> +	"down",
> +	"going_back",
> +	NULL,
> +};
> +

Is there some correlation between these states and values in a header file?
Something like:

static const char *slave_mii_status_tbl[] = {
	[BOND_LINK_UP] = "active",
	[BOND_LINK_FAIL] = "fail",
	[BOND_LINK_DOWN] = "down",
	[BOND_LINK_BACK] = "back",
};


And don't use null terminated arrays, use ARRAY_SIZE() instead.
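
A lookup built along those lines might look like the sketch below. The `MODEL_LINK_*` enum is a hypothetical stand-in for the BOND_LINK_* constants, the strings are taken from the table in the quoted patch, and `ARRAY_SIZE` is defined locally so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-ins for the BOND_LINK_* constants. */
enum model_link_state {
	MODEL_LINK_UP,
	MODEL_LINK_FAIL,
	MODEL_LINK_DOWN,
	MODEL_LINK_BACK,
};

/* Designated initializers tie each string to its constant. */
static const char *const model_mii_status_tbl[] = {
	[MODEL_LINK_UP]   = "up",
	[MODEL_LINK_FAIL] = "going_down",
	[MODEL_LINK_DOWN] = "down",
	[MODEL_LINK_BACK] = "going_back",
};

/* Bounds come from ARRAY_SIZE(), so no NULL terminator is needed. */
static const char *model_mii_status_name(unsigned int state)
{
	if (state >= ARRAY_SIZE(model_mii_status_tbl) ||
	    !model_mii_status_tbl[state])
		return "unknown";
	return model_mii_status_tbl[state];
}

static void model_status_demo(void)
{
	assert(strcmp(model_mii_status_name(MODEL_LINK_DOWN), "down") == 0);
	assert(strcmp(model_mii_status_name(42), "unknown") == 0);
}
```

With this layout, reordering or extending the enum cannot silently shift the strings, and an out-of-range value from the kernel falls back to "unknown" instead of reading past the table.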
	


* Re: [PATCH] This extends tx_data and and iscsit_do_tx_data with the additional parameter flags and avoids sending multiple TCP packets in iscsit_fe_sendpage_sg
From: Thomas Glanzmann @ 2014-02-10 20:56 UTC
  To: Nicholas A. Bellinger
  Cc: Eric Dumazet, John Ogness, Eric Dumazet, David S. Miller,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1392058718.17867.8.camel@haakon3.risingtidesystems.com>

Hello Nab,

> This looks correct to me.  Thomas, once your able to confirm please
> include your 'Tested-by' and I'll include for the next -rc3 PULL
> request.

Eric is currently reviewing our latest iteration with MSG_MORE for
kernel_sendmsg and MSG_MORE | MSG_SENDPAGE_NOTLAST for sendpage. However,
with the last iteration we again had a high RTT for some packets. But
then Eric had me tune net.ipv4.tcp_min_tso_segs to 8 and the RTT went
down to almost what it used to be before auto corking was enabled.

I'm on a steep learning curve, but Eric hopefully knows how to get
this back in check. Nevertheless, the regressions I saw are history
because Eric has submitted the patch to David S. Miller which
fixes the two bugs that killed the iSCSI performance when tcp auto
corking was on. So currently we're just optimizing to get the last 20%
or so out of it. Quite interesting, especially how much bandwidth can be
saved by coalescing packets.

Cheers,
        Thomas


* Re: [PATCH net] net: Clear local_df only if crossing namespace.
From: Pravin Shelar @ 2014-02-10 21:00 UTC
  To: Pravin Shelar, David Miller, netdev, Templin, Fred L,
	Nicolas Dichtel
In-Reply-To: <20140208005843.GE16198@order.stressinduktion.org>

On Fri, Feb 7, 2014 at 4:58 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> [Cc Nicolas]
>
> On Fri, Feb 07, 2014 at 02:49:20PM -0800, Pravin Shelar wrote:
>> On Fri, Feb 7, 2014 at 2:28 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > Hi!
>> >
>> > On Fri, Feb 07, 2014 at 02:12:38PM -0800, Pravin wrote:
>> >> --- a/net/core/skbuff.c
>> >> +++ b/net/core/skbuff.c
>> >> @@ -3905,12 +3905,13 @@ EXPORT_SYMBOL(skb_try_coalesce);
>> >>   */
>> >>  void skb_scrub_packet(struct sk_buff *skb, bool xnet)
>> >>  {
>> >> -     if (xnet)
>> >> +     if (xnet) {
>> >>               skb_orphan(skb);
>> >> +             skb->local_df = 0;
>> >> +     }
>> >>       skb->tstamp.tv64 = 0;
>> >>       skb->pkt_type = PACKET_HOST;
>> >>       skb->skb_iif = 0;
>> >> -     skb->local_df = 0;
>> >>       skb_dst_drop(skb);
>> >>       skb->mark = 0;
>> >>       secpath_reset(skb);
>> >
>> > I wonder if this should be the right behaviour for tunnels, which should just
>> > do fragmentation based on IP_DF, even if the packet originated locally from a
>> > socket which allowed local fragmentation (inet->pmtudisc < IP_PMTUDISC_DO).
>> >
>> This is not about tunneling, skb_scrub_packet() is generic function
>> which should not reset local_df on all packets.
>>
>> We can have separate discussion about use of local_df and tunneling in
>> another thread.
>
> This change only affects tunnel code as of current net branch, how do
> you not expect a discussion about that in this thread, I really wonder?
>
> May I know because of which vport, vxlan or gre, you did this change?
>
It affects both gre and vxlan.

> I am feeling a bit uncomfortable handling remote and local packets that
> differently on lower tunnel output (local_df is mostly set on locally
> originating packets).

For ip traffic it makes sense to turn on local_df only for local
traffic, since for the remote case we can send icmp (frag-needed) back to
the source. No such thing exists for OVS tunnels: ICMP packets are not
returned to the source for the tunnels. That is why, to be on the safe side,
local_df is turned on for tunnels in OVS.

Thanks.


* Re: [PATCH] This extends tx_data and and iscsit_do_tx_data with the additional parameter flags and avoids sending multiple TCP packets in iscsit_fe_sendpage_sg
From: Eric Dumazet @ 2014-02-10 21:01 UTC
  To: Thomas Glanzmann
  Cc: Nicholas A. Bellinger, John Ogness, Eric Dumazet, David S. Miller,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140210205634.GC19621@glanzmann.de>

On Mon, 2014-02-10 at 21:56 +0100, Thomas Glanzmann wrote:
> Hello Nab,
> 
> > This looks correct to me.  Thomas, once your able to confirm please
> > include your 'Tested-by' and I'll include for the next -rc3 PULL
> > request.
> 
> Eric is currently reviewing our latest iteration with MSG_MORE for
> kernel_sendmsg and MSG_MORE | MSG_SENDPAGE_NOTLAST for sendpage. However,
> with the last iteration we again had a high RTT for some packets. But
> then Eric had me tune net.ipv4.tcp_min_tso_segs to 8 and the RTT went
> down to almost what it used to be before auto corking was enabled.
> 

Hmm.. I was not aware of high RTT for some packets.

Can you spot this on the pcap you provided ?


> I'm on a steep learning curve, but Eric hopefully knows how to get
> this back in check. Nevertheless, the regressions I saw are history
> because Eric has submitted the patch to David S. Miller which
> fixes the two bugs that killed the iSCSI performance when tcp auto
> corking was on. So currently we're just optimizing to get the last 20%
> or so out of it. Quite interesting, especially how much bandwidth can be
> saved by coalescing packets.

It depends on the ratio payload/headers.

The beginning of your pcap shows a lot of 512-byte requests, so for this
kind of request the gain is huge (maybe 50%), but for 32K or 64K
requests the gain would be marginal.


* Re: [PATCH] This extends tx_data and and iscsit_do_tx_data with the additional parameter flags and avoids sending multiple TCP packets in iscsit_fe_sendpage_sg
From: Thomas Glanzmann @ 2014-02-10 21:14 UTC
  To: Eric Dumazet
  Cc: Nicholas A. Bellinger, John Ogness, Eric Dumazet, David S. Miller,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1392066100.6615.55.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

> Hmm.. I was not aware of high RTT for some packets.
> Can you spot this on the pcap you provided ?

with the latest patch as in:

(node-62) [~/work/linux-2.6] git diff | pbot
http://pbot.rmdir.de/CQwqI6b7wJProw_xaukmEg

with net.ipv4.tcp_min_tso_segs=2 we had this pcap:
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched_tcp_more_notlast.pcap.bz2
And here is the RTT TCP graph:
Wireshark > Statistics > TCP Stream Graph > Round Trip Time Graph
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-10-22_04_30.png

with net.ipv4.tcp_min_tso_segs=8 we have this pcap:
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched_tcp_more_notlast_min_tso_segs_8.pcap.bz2
And here is the RTT TCP graph:
Wireshark > Statistics > TCP Stream Graph > Round Trip Time Graph

This gives us 0.0015 seconds RTT (1.5 ms)

Without TCP autocorking we had 0.0005 (0.5 ms).

https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png

Cheers,
        Thomas


* irda: BUG: looking up invalid subclass: 4294967295
From: Dave Jones @ 2014-02-10 21:31 UTC
  To: Peter Zijlstra; +Cc: netdev

Is this irda being crap, or some weird lockdep corner case ?

(no idea what the W taint was, some unrelated spew pushed it out
 of the dmesg buffer before it got chance to hit the log)

	Dave

BUG: looking up invalid subclass: 4294967295
turning off the locking correctness validator.
CPU: 0 PID: 1965 Comm: trinity-main Tainted: G        W    3.14.0-rc1+ #108
 00000000ffffffff 00000000a8765326 ffff880205ee7c40 ffffffff9872ae0a
 00000000ffffffff ffff880205ee7c58 ffffffff9872732c ffff880084069730
 ffff880205ee7cd0 ffffffff980c10e2 000000010016000f 00000000ffffffff
Call Trace:
 [<ffffffff9872ae0a>] dump_stack+0x4e/0x7a
 [<ffffffff9872732c>] look_up_lock_class.part.18+0x2f/0x34
 [<ffffffff980c10e2>] __lock_acquire.isra.28+0x722/0xa50
 [<ffffffff9873a01b>] ? preempt_count_sub+0x6b/0xf0
 [<ffffffff980c1b6d>] lock_acquire+0x8d/0x120
 [<ffffffffc05438f2>] ? hashbin_delete+0xf2/0x100 [irda]
 [<ffffffffc0545e30>] ? irias_delete_value+0x30/0x30 [irda]
 [<ffffffff987357fc>] _raw_spin_lock_irqsave_nested+0x4c/0x70
 [<ffffffffc05438f2>] ? hashbin_delete+0xf2/0x100 [irda]
 [<ffffffffc05438f2>] hashbin_delete+0xf2/0x100 [irda]
 [<ffffffffc0546266>] __irias_delete_object+0x26/0x40 [irda]
 [<ffffffffc05462a4>] irias_delete_object+0x24/0x30 [irda]
 [<ffffffffc05489a5>] irda_release+0x65/0x160 [irda]
 [<ffffffff985fa59f>] sock_release+0x1f/0x80
 [<ffffffff985fa612>] sock_close+0x12/0x20
 [<ffffffff981bf0fa>] __fput+0xea/0x2c0
 [<ffffffff981bf31e>] ____fput+0xe/0x10
 [<ffffffff98090924>] task_work_run+0xb4/0xe0
 [<ffffffff9806e791>] do_exit+0x2e1/0xb50
 [<ffffffff980ab791>] ? vtime_account_user+0x91/0xa0
 [<ffffffff9814f69b>] ? context_tracking_user_exit+0x9b/0x100
 [<ffffffff9806ffec>] do_group_exit+0x4c/0xc0
 [<ffffffff98070074>] SyS_exit_group+0x14/0x20
 [<ffffffff9873e66a>] tracesys+0xd4/0xd9


* Re: [PATCH] net: rfkill-regulator: Add devicetree support.
From: Belisko Marek @ 2014-02-10 21:35 UTC
  To: Mark Rutland
  Cc: robh+dt@kernel.org, Pawel Moll, ijc+devicetree@hellion.org.uk,
	galak@codeaurora.org, rob@landley.net, linville@tuxdriver.com,
	johannes@sipsolutions.net, davem@davemloft.net,
	grant.likely@linaro.org, neilb@suse.de, hns@goldelico.com,
	devicetree@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20140210101842.GS25314@e106331-lin.cambridge.arm.com>

On Mon, Feb 10, 2014 at 11:18 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Fri, Feb 07, 2014 at 07:48:49PM +0000, Marek Belisko wrote:
>> Signed-off-by: NeilBrown <neilb@suse.de>
>> Signed-off-by: Marek Belisko <marek@goldelico.com>
>> ---
>> Based on Neil's patch and extend for documentation and bindings include.
>>
>>  .../bindings/net/rfkill/rfkill-relugator.txt       | 28 ++++++++++++++++
>>  include/dt-bindings/net/rfkill-regulator.h         | 23 +++++++++++++
>>  net/rfkill/rfkill-regulator.c                      | 38 ++++++++++++++++++++++
>>  3 files changed, 89 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/net/rfkill/rfkill-relugator.txt
>>  create mode 100644 include/dt-bindings/net/rfkill-regulator.h
>>
>> diff --git a/Documentation/devicetree/bindings/net/rfkill/rfkill-relugator.txt b/Documentation/devicetree/bindings/net/rfkill/rfkill-relugator.txt
>> new file mode 100644
>> index 0000000..cdb7dd7
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/net/rfkill/rfkill-relugator.txt
>> @@ -0,0 +1,28 @@
>> +Regulator consumer for rfkill devices
>
> What exactly is an "rfkill" device? How is it used? How does it relate
> to other devices in the DT?
>
> To me, this looks like a leak of a Linux abstraction.
>
>> +
>> +Required properties:
>> +- compatible   : Must be "rfkill-regulator".
>> +- label  : Name of rfkill device.
>
> What's this for? Why does this need a label in the DT? Surely this can
> be implied by the relationship to a particular radio device?
This label is used by rfkill (it is converted to pdata->name in the probe
function) and is used for display.
Maybe "label" isn't the right name for that purpose.
>
>> +- type  : Type of rfkill device.
>> +
>> +Possible values (defined in include/dt-bindings/net/rfkill-regulator.h):
>> +     RFKILL_TYPE_ALL
>> +     RFKILL_TYPE_WLAN
>> +     RFKILL_TYPE_BLUETOOTH
>> +     RFKILL_TYPE_UWB
>> +     RFKILL_TYPE_WIMAX
>> +     RFKILL_TYPE_WWAN
>> +     RFKILL_TYPE_GPS
>> +     RFKILL_TYPE_FM
>> +     RFKILL_TYPE_NFC
>
> What do these mean? Why can these not be implied by a relationship to
> any devices of these particular types?
I mapped platform data to DT 1:1. Maybe we don't need to export
those to a separate include file and can use the raw numbers instead.
>
>> +
>> +- vrfkill-supply - regulator device.
>
> Why isn't this described on the radio device node? It's a supply to the
> radio, not to the rfkill concept.
The rfkill-regulator probe function checks for the vrfkill regulator, so I
added it to the description; without it rfkill-regulator doesn't make sense.
>
>> +
>> +Example:
>> +     gps-rfkill {
>> +             compatible = "rfkill-regulator";
>> +             label = "GPS";
>> +             type = <RFKILL_TYPE_GPS>;
>> +             vrfkill-supply = <&reg>;
>> +     };
>
> Why is this not bound to the particular GPS device in some way?
We can do something like:
gps-device {
    compatible = "my-desired-gps";
    <other device properties>
    rfkill = <&gps-rfkill>;
};
>
> What if I have more than one of any of the types of device this
> supports, which device is this expected to control?
rfkill-regulator is linked to a regulator, so if you have another
device it is probably controlled by another regulator.

>
> Why is it described as a separate device in the device tree at all?
>
> I do not think this binding is the right way to describe this.
Some time ago an rfkill-gpio DT binding conversion was posted that used
nearly the same bindings as we propose, and there was no issue with that.

>
> Thanks,
> Mark.
>
>> +
>> diff --git a/include/dt-bindings/net/rfkill-regulator.h b/include/dt-bindings/net/rfkill-regulator.h
>> new file mode 100644
>> index 0000000..ae32273
>> --- /dev/null
>> +++ b/include/dt-bindings/net/rfkill-regulator.h
>> @@ -0,0 +1,23 @@
>> +/*
>> + * This header provides macros for rfkill-regulator bindings.
>> + *
>> + * Copyright (C) 2014 Marek Belisko <marek@goldelico.com>
>> + *
>> + * GPLv2 only
>> + */
>> +
>> +#ifndef __DT_BINDINGS_RFKILL_REGULATOR_H__
>> +#define __DT_BINDINGS_RFKILL_REGULATOR_H__
>> +
>> +
>> +#define RFKILL_TYPE_ALL              (0)
>> +#define RFKILL_TYPE_WLAN     (1)
>> +#define RFKILL_TYPE_BLUETOOTH        (2)
>> +#define RFKILL_TYPE_UWB              (3)
>> +#define RFKILL_TYPE_WIMAX    (4)
>> +#define RFKILL_TYPE_WWAN     (5)
>> +#define RFKILL_TYPE_GPS              (6)
>> +#define RFKILL_TYPE_FM               (7)
>> +#define RFKILL_TYPE_NFC              (8)
>> +
>> +#endif /* __DT_BINDINGS_RFKILL_REGULATOR_H__ */
>> diff --git a/net/rfkill/rfkill-regulator.c b/net/rfkill/rfkill-regulator.c
>> index cf5b145..a04aff8 100644
>> --- a/net/rfkill/rfkill-regulator.c
>> +++ b/net/rfkill/rfkill-regulator.c
>> @@ -19,6 +19,7 @@
>>  #include <linux/regulator/consumer.h>
>>  #include <linux/rfkill.h>
>>  #include <linux/rfkill-regulator.h>
>> +#include <linux/of_platform.h>
>>
>>  struct rfkill_regulator_data {
>>       struct rfkill *rf_kill;
>> @@ -57,6 +58,31 @@ static struct rfkill_ops rfkill_regulator_ops = {
>>       .set_block = rfkill_regulator_set_block,
>>  };
>>
>> +#ifdef CONFIG_OF
>> +static struct rfkill_regulator_platform_data *
>> +rfkill_regulator_parse_pdata(struct device *dev)
>> +{
>> +     struct rfkill_regulator_platform_data *pdata;
>> +     struct device_node *np = dev->of_node;
>> +     u32 num;
>> +     if (!np)
>> +             return NULL;
>> +     pdata = devm_kzalloc(dev, sizeof(*pdata), GFP_KERNEL);
>> +     if (!pdata)
>> +             return NULL;
>> +     if (of_property_read_u32(np, "type", &num) == 0)
>> +             pdata->type = num;
>> +     of_property_read_string(np, "label", &pdata->name);
>> +     return pdata;
>> +}
>> +#else
>> +static inline struct rfkill_regulator_platform_data *
>> +rfkill_regulator_parse_pdata(struct device *dev)
>> +{
>> +     return NULL;
>> +}
>> +#endif
>> +
>>  static int rfkill_regulator_probe(struct platform_device *pdev)
>>  {
>>       struct rfkill_regulator_platform_data *pdata = pdev->dev.platform_data;
>> @@ -65,6 +91,9 @@ static int rfkill_regulator_probe(struct platform_device *pdev)
>>       struct rfkill *rf_kill;
>>       int ret = 0;
>>
>> +     if (!pdata)
>> +             pdata = rfkill_regulator_parse_pdata(&pdev->dev);
>> +
>>       if (pdata == NULL) {
>>               dev_err(&pdev->dev, "no platform data\n");
>>               return -ENODEV;
>> @@ -137,12 +166,21 @@ static int rfkill_regulator_remove(struct platform_device *pdev)
>>       return 0;
>>  }
>>
>> +#ifdef CONFIG_OF
>> +static const struct of_device_id rfkill_regulator_match[] = {
>> +     {.compatible = "rfkill-regulator"},
>> +     {}
>> +};
>> +MODULE_DEVICE_TABLE(of, rfkill_regulator_match);
>> +#endif
>> +
>>  static struct platform_driver rfkill_regulator_driver = {
>>       .probe = rfkill_regulator_probe,
>>       .remove = rfkill_regulator_remove,
>>       .driver = {
>>               .name = "rfkill-regulator",
>>               .owner = THIS_MODULE,
>> +             .of_match_table = of_match_ptr(rfkill_regulator_match),
>>       },
>>  };
>>
>> --
>> 1.8.3.2
>>
>>

BR,

marek



* Re: irda: BUG: looking up invalid subclass: 4294967295
From: Peter Zijlstra @ 2014-02-10 21:39 UTC
  To: Dave Jones; +Cc: netdev
In-Reply-To: <20140210213126.GA26078@redhat.com>

On Mon, Feb 10, 2014 at 04:31:26PM -0500, Dave Jones wrote:
> Is this irda being crap, or some weird lockdep corner case ?
> 
> (no idea what the W taint was, some unrelated spew pushed it out
>  of the dmesg buffer before it got chance to hit the log)
> 
> 	Dave
> 
> BUG: looking up invalid subclass: 4294967295

That's -1 if I'm not mistaken, that's indeed an invalid subclass.
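
The arithmetic can be checked with a few lines of C. This is only a sketch of the conversion, assuming the subclass is carried in a 32-bit unsigned value; `as_u32` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Converting a negative int to a 32-bit unsigned wraps modulo 2^32. */
static uint32_t as_u32(int value)
{
	return (uint32_t)value;
}

static void subclass_demo(void)
{
	assert(as_u32(-1) == 4294967295u);	/* the value in the BUG line */
	assert(as_u32(-1) == UINT32_MAX);
}
```

So a caller passing -1 as the subclass would produce exactly the number printed in the report.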

> turning off the locking correctness validator.
> CPU: 0 PID: 1965 Comm: trinity-main Tainted: G        W    3.14.0-rc1+ #108
>  00000000ffffffff 00000000a8765326 ffff880205ee7c40 ffffffff9872ae0a
>  00000000ffffffff ffff880205ee7c58 ffffffff9872732c ffff880084069730
>  ffff880205ee7cd0 ffffffff980c10e2 000000010016000f 00000000ffffffff
> Call Trace:
>  [<ffffffff9872ae0a>] dump_stack+0x4e/0x7a
>  [<ffffffff9872732c>] look_up_lock_class.part.18+0x2f/0x34
>  [<ffffffff980c10e2>] __lock_acquire.isra.28+0x722/0xa50
>  [<ffffffff9873a01b>] ? preempt_count_sub+0x6b/0xf0
>  [<ffffffff980c1b6d>] lock_acquire+0x8d/0x120
>  [<ffffffffc05438f2>] ? hashbin_delete+0xf2/0x100 [irda]
>  [<ffffffffc0545e30>] ? irias_delete_value+0x30/0x30 [irda]
>  [<ffffffff987357fc>] _raw_spin_lock_irqsave_nested+0x4c/0x70
>  [<ffffffffc05438f2>] ? hashbin_delete+0xf2/0x100 [irda]
>  [<ffffffffc05438f2>] hashbin_delete+0xf2/0x100 [irda]

I tried looking at that code but gave up real quick.. That code is
'creative'.


* Re: [PATCH] DT: net: document Ethernet bindings in one place
From: Grant Likely @ 2014-02-10 22:05 UTC
  To: Sergei Shtylyov, Florian Fainelli, Rob Herring
  Cc: netdev, Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell,
	Kumar Gala, devicetree@vger.kernel.org, Rob Landley,
	linux-doc@vger.kernel.org
In-Reply-To: <52F396DE.8010306@cogentembedded.com>

On Thu, 06 Feb 2014 18:06:22 +0400, Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> wrote:
> Hello.
> 
> On 06-02-2014 13:43, Grant Likely wrote:
> 
> >>>>>>>>>>>        I'm afraid that's too late, it has spread very far, so that
> >>>>>>>>>>> of_get_phy_mode() handles that property, not "phy-connection-type".
> 
> >>>>>>>>>> Uggg, I guess this is a case of a de facto standard then if the kernel
> >>>>>>>>>> doesn't even support it.
> 
> >>>>>>>>> Maybe I forgot to CC you on patch sent to Grant only, I sent a patch a
> >>>>>>>>> while ago for of_get_phy_mode() to look for both "phy-mode" and
> >>>>>>>>> "phy-connection-type" since the former has been a Linux invention, but
> >>>>>>>>> the latter is ePAPR specified.
> 
> >>>>>>>> Here is a link to the actual patch in question, not sure which tree
> >>>>>>>> Grant applied it to though:
> 
> >>>>>>>> http://lkml.indiana.edu/hypermail/linux/kernel/1311.2/00048.html
> 
> >>>>>>>        It's not the patch mail, it's Grant's "applied" reply, patch is mangled in
> >>>>>>> this reply, and I couldn't follow the thread. Here's the actual patch mail:
> 
> >>>>>>> http://marc.info/?l=devicetree&m=138449662807254
> 
> >>>>>>        Florian, I didn't find this patch in Grant's official tree, so maybe you
> >>>>>> should ask him where is the patch already?
> 
> >>>>> Sorry, I accidentally dropped it. It will be in the next merge window.
> 
> >>>>       Already saw it, thanks. Would that it was in 3.14 instead of course, so
> >>>> that I could use "phy-connection-type" in my binding...
> 
> >>> Is 3.14 broken because of missing the patch? If so I'll get it merged as
> >>   > a bug fix.
> 
> >>      No, it's not. I could have used "phy-connection-type" in my binding
> >> destined for 3.15 and document it as a preferred property as well.
> 
> > You still can. We just need to make sure that your patch is applied on
> 
>     Patches.
> 
> > top of the phy-connection-type patch.
> 
>     I'm not sure this trick is possible if the patches are merged via the 
> different trees...

There are two ways to do it. A) by having a common merge commit
containing that patch and merged into both branches, or B) just merging
the patch in the same tree.

Normally I'd suggest B), but I've already picked up the patch and I try
very hard not to rebase my commit tree. However, since the branch is
stable, you can ask for my branch to be merged into the net branch
before applying the dependent patches. The relevant commit id is
cf4c9eb5a4, and it is in my devicetree/next branch on
git://git.secretlab.ca/git/linux

g.

^ permalink raw reply

* [RFC 1/2] ipv6: disable autoconfiguration and DAD on non-multicast links
From: Luis R. Rodriguez @ 2014-02-10 22:29 UTC (permalink / raw)
  To: netdev
  Cc: xen-devel, Luis R. Rodriguez, Olaf Kirch, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy
In-Reply-To: <1392071391-13215-1-git-send-email-mcgrof@do-not-panic.com>

From: "Luis R. Rodriguez" <mcgrof@suse.com>

RFC4862 [0], "IPv6 Stateless Address Autoconfiguration", states in
Sections 4 and 5 that autoconfiguration is performed only on
multicast-capable links. Multicast is used to ensure that the
automatically assigned address is unique: the node sends Neighbor
Solicitation messages and listens for these same messages on both
the all-nodes multicast address and the solicited-node multicast
address of the tentative address. This is called Duplicate Address
Detection (DAD) and is documented in Section 5.4. DAD has an
optimization, Optimistic DAD [1], which also requires multicast.
Skip autoconfiguration and all forms of DAD on non-multicast links.

We don't *fully* disable IPv6 for non-multicast links, as
there are signs that IPv6 on non-multicast devices is meant
to be supported, one example being the ipv6 autoconf module
parameter. It should be noted, however, that RFC4862 Section 5.4
makes it clear that DAD *MUST* be performed on all unicast
addresses prior to assigning them to an interface, regardless of
whether they are obtained through stateless autoconfiguration,
DHCPv6, or manual configuration, with the following exceptions:

   -  When DupAddrDetectTransmits is set to zero, DAD
      can be skipped
   -  Anycast addresses can skip DAD

If autoconfiguration is disabled, the interface is still
assigned a temporary address via ipv6_create_tempaddr();
however, it will be kept as temporary (IFA_F_TEMPORARY).

[0] http://tools.ietf.org/html/rfc4862
[1] http://tools.ietf.org/html/rfc4429

Cc: Olaf Kirch <okir@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 net/ipv6/addrconf.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index ad23569..362f64f 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2211,7 +2211,8 @@ void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
 
 	/* Try to figure out our local address for this prefix */
 
-	if (pinfo->autoconf && in6_dev->cnf.autoconf) {
+	if (pinfo->autoconf && in6_dev->cnf.autoconf &&
+	    dev->flags & IFF_MULTICAST) {
 		struct inet6_ifaddr *ifp;
 		struct in6_addr addr;
 		int create = 0, update_lft = 0;
@@ -2248,7 +2249,8 @@ ok:
 
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
 			if (in6_dev->cnf.optimistic_dad &&
-			    !net->ipv6.devconf_all->forwarding && sllao)
+			    !net->ipv6.devconf_all->forwarding && sllao &&
+			    dev->flags & IFF_MULTICAST)
 				addr_flags = IFA_F_OPTIMISTIC;
 #endif
 
@@ -3161,6 +3163,7 @@ static void addrconf_dad_start(struct inet6_ifaddr *ifp)
 		goto out;
 
 	if (dev->flags&(IFF_NOARP|IFF_LOOPBACK) ||
+	    !(dev->flags&IFF_MULTICAST) ||
 	    idev->cnf.accept_dad < 1 ||
 	    !(ifp->flags&IFA_F_TENTATIVE) ||
 	    ifp->flags & IFA_F_NODAD) {
@@ -3288,6 +3291,7 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
 	send_rs = send_mld &&
 		  ipv6_accept_ra(ifp->idev) &&
 		  ifp->idev->cnf.rtr_solicits > 0 &&
+		  (dev->flags&IFF_MULTICAST) &&
 		  (dev->flags&IFF_LOOPBACK) == 0;
 	read_unlock_bh(&ifp->idev->lock);
 
@@ -4192,8 +4196,9 @@ errout:
 		rtnl_set_sk_err(net, RTNLGRP_IPV6_IFADDR, err);
 }
 
-static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
-				__s32 *array, int bytes)
+static inline void ipv6_store_devconf(struct net_device *dev,
+				      struct ipv6_devconf *cnf,
+				      __s32 *array, int bytes)
 {
 	BUG_ON(bytes < (DEVCONF_MAX * 4));
 
@@ -4203,7 +4208,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
 	array[DEVCONF_MTU6] = cnf->mtu6;
 	array[DEVCONF_ACCEPT_RA] = cnf->accept_ra;
 	array[DEVCONF_ACCEPT_REDIRECTS] = cnf->accept_redirects;
-	array[DEVCONF_AUTOCONF] = cnf->autoconf;
+	if (dev->flags & IFF_MULTICAST)
+		array[DEVCONF_AUTOCONF] = cnf->autoconf;
 	array[DEVCONF_DAD_TRANSMITS] = cnf->dad_transmits;
 	array[DEVCONF_RTR_SOLICITS] = cnf->rtr_solicits;
 	array[DEVCONF_RTR_SOLICIT_INTERVAL] =
@@ -4326,7 +4332,7 @@ static int inet6_fill_ifla6_attrs(struct sk_buff *skb, struct inet6_dev *idev)
 	nla = nla_reserve(skb, IFLA_INET6_CONF, DEVCONF_MAX * sizeof(s32));
 	if (nla == NULL)
 		goto nla_put_failure;
-	ipv6_store_devconf(&idev->cnf, nla_data(nla), nla_len(nla));
+	ipv6_store_devconf(idev->dev, &idev->cnf, nla_data(nla), nla_len(nla));
 
 	/* XXX - MC not implemented */
 
-- 
1.8.5.3
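
The condition that the addrconf_dad_start() hunk above extends can be
sketched as a small user-space predicate. This is an illustration only,
not the kernel function: the SKETCH_* names are hypothetical, and their
values are assumed to mirror the corresponding IFF_* and IFA_F_* constants
in the Linux uapi headers.

```c
#include <stdbool.h>

/* Local stand-ins; values assumed to mirror the Linux uapi headers. */
#define SKETCH_IFF_LOOPBACK    0x0008u
#define SKETCH_IFF_NOARP       0x0080u
#define SKETCH_IFF_MULTICAST   0x1000u
#define SKETCH_IFA_F_NODAD     0x02u
#define SKETCH_IFA_F_TENTATIVE 0x40u

/*
 * Hypothetical predicate: returns true when DAD would be attempted,
 * following the order of the tests in the patched condition.
 */
static bool sketch_should_run_dad(unsigned int dev_flags,
				  unsigned int ifa_flags,
				  int accept_dad)
{
	if (dev_flags & (SKETCH_IFF_NOARP | SKETCH_IFF_LOOPBACK))
		return false;
	if (!(dev_flags & SKETCH_IFF_MULTICAST))
		return false;	/* the test this patch adds */
	if (accept_dad < 1)
		return false;
	if (!(ifa_flags & SKETCH_IFA_F_TENTATIVE))
		return false;
	if (ifa_flags & SKETCH_IFA_F_NODAD)
		return false;
	return true;
}
```

With this shape, clearing the multicast flag on a device is enough to make
the predicate return false regardless of the address flags.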

^ permalink raw reply related

* [RFC 0/2] xen-backend interfaces and IFF_MULTICAST
From: Luis R. Rodriguez @ 2014-02-10 22:29 UTC (permalink / raw)
  To: netdev; +Cc: xen-devel, Luis R. Rodriguez

Virtualization hypervisors do not use backend Ethernet interfaces the way
typical Ethernet interfaces are used. In my current testing, xen-netback
requires at least a struct in_device with a proper MTU setting in order for
the frontend interface to function properly. Under the current design this
kicks off IPv6 autoconfiguration and DAD on these interfaces. This does
not happen for Xen's TAP interface when HVM is used. KVM only uses TAP
interfaces, but its backend interfaces *do not* perform IPv6 autoconfiguration
and DAD even though its TAP interfaces do have multicast enabled.

In Xen's case, some users used to run into issues with the current
architecture when bundles of Xen guests were on the same network and IPv6
autoconfiguration was performed. This happens because the MAC address is
static. While this can be corrected by randomizing it, an IPv6 address
is simply not needed for them. There is currently no way to disable IPv6
on specific types of interfaces, but this begs for a review of the
architecture: why is an interface even needed at all? And what about
IPv4 addresses: why do we need inetdev_init() on these virtualized
interfaces?

Disabling multicast on an interface should disable IPv6 autoconfiguration
and DAD, but the note in include/uapi/linux/if_link.h makes it clear that
IFF_MULTICAST should be considered carefully, given that non-NBMA links are
known to support multicast; this includes all IFF_POINTOPOINT and
IFF_BROADCAST links as well. If we are to follow the RFCs on IPv6
autoconfiguration and DAD, however, it's clear that multicast is required.
But if we have no reliable way of determining this capability, we won't know
when we can perform autoconfiguration and DAD properly.

If the patch to require IFF_MULTICAST for autoconfiguration and DAD is
valid, then xen-netback can simply clear the flag; clearing it is required
because ether_setup() is used during the net_device allocation. I'm currently
reviewing the need for any proper-mtu interface on xen-netback, but in the
meantime I'd like some feedback on IFF_MULTICAST and the following
patches.

Luis R. Rodriguez (2):
  ipv6: disable autoconfiguration and DAD on non-multicast links
  xen-netback: disable multicast and use a random hw MAC address

 drivers/net/xen-netback/interface.c | 14 +++++---------
 net/ipv6/addrconf.c                 | 18 ++++++++++++------
 2 files changed, 17 insertions(+), 15 deletions(-)

-- 
1.8.5.3

^ permalink raw reply

* [RFC 2/2] xen-netback: disable multicast and use a random hw MAC address
From: Luis R. Rodriguez @ 2014-02-10 22:29 UTC (permalink / raw)
  To: netdev; +Cc: xen-devel, Luis R. Rodriguez, Paul Durrant, Ian Campbell, Wei Liu
In-Reply-To: <1392071391-13215-1-git-send-email-mcgrof@do-not-panic.com>

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Although the xen-netback interfaces do not participate in the
link as typical Ethernet devices, interfaces for them are
still required under the current architecture. IPv6 addresses
do not need to be created or assigned on the xen-netback interfaces,
however, even if the frontend devices do need them, so clear the
multicast flag to ensure the net core does not initiate IPv6
Stateless Address Autoconfiguration. Clearing the multicast
flag is required given that the net_device is using the
ether_setup() helper.

There's also no good reason why the special MAC address of
FE:FF:FF:FF:FF:FF is being used other than to avoid issues
with STP. Since using it can create an issue if a user
decides to enable multicast on the backend interfaces, simply
use a random MAC address with the Xen OUI prefix, as is done
with the frontend through the Xen udev scripts.

Cc: Paul Durrant <Paul.Durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: xen-devel@lists.xenproject.org
Cc: netdev@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/net/xen-netback/interface.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index b9de31e..479fbd1 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -42,6 +42,8 @@
 #define XENVIF_QUEUE_LENGTH 32
 #define XENVIF_NAPI_WEIGHT  64

+static const u8 xen_oui[3] = { 0x00, 0x16, 0x3e };
+
 int xenvif_schedulable(struct xenvif *vif)
 {
 	return netif_running(vif->dev) && netif_carrier_ok(vif->dev);
@@ -347,15 +349,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	for (i = 0; i < MAX_PENDING_REQS; i++)
 		vif->mmap_pages[i] = NULL;

-	/*
-	 * Initialise a dummy MAC address. We choose the numerically
-	 * largest non-broadcast address to prevent the address getting
-	 * stolen by an Ethernet bridge for STP purposes.
-	 * (FE:FF:FF:FF:FF:FF)
-	 */
-	memset(dev->dev_addr, 0xFF, ETH_ALEN);
-	dev->dev_addr[0] &= ~0x01;
-
+	eth_hw_addr_random(dev);
+	memcpy(dev->dev_addr, xen_oui, 3);
+	dev->flags &= ~IFF_MULTICAST;
 	netif_napi_add(dev, &vif->napi, xenvif_poll, XENVIF_NAPI_WEIGHT);

 	netif_carrier_off(dev);
-- 
1.8.5.3
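
For comparison, the two address schemes in the hunk above can be sketched
in user space. This is a sketch under stated assumptions: the sketch_*
names are hypothetical, and the random bytes are passed in by the caller
to keep the example deterministic, whereas the patch itself obtains them
through eth_hw_addr_random().

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Same three bytes as the xen_oui[] array added by the patch. */
static const uint8_t sketch_xen_oui[3] = { 0x00, 0x16, 0x3e };

/* Old scheme: all ones with the group bit cleared, FE:FF:FF:FF:FF:FF. */
static void sketch_legacy_dummy_mac(uint8_t mac[SKETCH_ETH_ALEN])
{
	memset(mac, 0xFF, SKETCH_ETH_ALEN);
	mac[0] &= (uint8_t)~0x01;
}

/* New scheme: Xen OUI followed by three caller-supplied random bytes. */
static void sketch_xen_mac(uint8_t mac[SKETCH_ETH_ALEN], const uint8_t rnd[3])
{
	memcpy(mac, sketch_xen_oui, 3);
	memcpy(mac + 3, rnd, 3);
}

/* The lowest bit of the first octet marks a group (multicast) address. */
static int sketch_is_multicast_mac(const uint8_t mac[SKETCH_ETH_ALEN])
{
	return mac[0] & 0x01;
}
```

Both schemes produce a unicast address; the difference is that the second
one varies per interface instead of being shared by every backend.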

^ permalink raw reply related

* Re: [PATCH net-next 0/9] bnx2x: Enhancements & semantic changes series
From: David Miller @ 2014-02-10 22:32 UTC (permalink / raw)
  To: yuvalmin; +Cc: netdev, ariele
In-Reply-To: <1392045422-5437-1-git-send-email-yuvalmin@broadcom.com>

From: Yuval Mintz <yuvalmin@broadcom.com>
Date: Mon, 10 Feb 2014 17:16:53 +0200

> This patch series contains several semantic (or mostly semantic) patches,
> as well as adding support for packet aggregations on the receive path
> of windows VMs and updating bnx2x to the new FW recently accepted upstream.
> 
> Please consider applying these patches to `net-next'.

The net-next tree is not open yet, please resubmit this when I open
that tree back up.

Thanks.

^ permalink raw reply

* Re: [PATCH v3 net 0/9] bridge: Fix corner case problems around local fdb entries
From: David Miller @ 2014-02-10 22:36 UTC (permalink / raw)
  To: makita.toshiaki; +Cc: stephen, vyasevic, netdev
In-Reply-To: <1391759306-24956-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>

From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Date: Fri,  7 Feb 2014 16:48:17 +0900

> There are so many corner cases that are not handled properly around local
> fdb entries.
> - We might fail to delete the old entry and might delete an arbitrary local
>   entry when changing mac address of a bridge port.
> - We always fail to delete the old entry when changing mac address of the
>   bridge device.
> - We might incorrectly delete a necessary entry when detaching a bridge port.
> - We might incorrectly delete a necessary entry when deleting a vlan.
> and so on.
> 
> This is a patch series to fix these issues.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH] iproute: Properly handle protocol level diag module absence
From: Stephen Hemminger @ 2014-02-10 22:37 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Stephen Hemminger, Linux Netdev List,
	François-Xavier Le Bail
In-Reply-To: <52E7E990.7050501@parallels.com>

On Tue, 28 Jan 2014 21:32:00 +0400
Pavel Emelyanov <xemul@parallels.com> wrote:

> When a *_diag module is missing in the kernel, the ss tool should go
> and read legacy /proc/* files.
> 
> This is the case when all *_diag stuff is missing, but in case the
> inet_diag.ko is loaded, but (tcp|udp)_diag.ko is not, the ss tool
> doesn't notice this and produces empty output. The reason for that
> is -- error from the inet_diag module (which means, that e.g. the
> udp_diag is missing) is reported in the NLMSG_DONE message body.
> 
> That said, we need to check the NLMSG_DONE's message return code
> and act respectively.
> 
> Reported-by: François-Xavier Le Bail <fx.lebail@yahoo.com>
> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
> 

Silently ignoring the error seems wrong.
The fallback is good, but we should try and report the error so that
the user fixes the kernel config.
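
A rough sketch of what "fall back, but report" could look like, assuming
(as the quoted description says) that the error arrives as a negative
integer in the body of the NLMSG_DONE message. The sketch_* helpers are
hypothetical and are not taken from iproute2; only the NLMSG_* macros and
struct nlmsghdr come from <linux/netlink.h>.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: return 0 when a NLMSG_DONE message carries no
 * error, or a positive errno-style value when its payload is negative.
 * Treating the payload as a single int is an assumption.
 */
static int sketch_done_error(const struct nlmsghdr *h)
{
	int code;

	if (h->nlmsg_type != NLMSG_DONE)
		return 0;
	if (h->nlmsg_len < NLMSG_LENGTH(sizeof(int)))
		return 0;

	code = *(const int *)NLMSG_DATA(h);
	return code < 0 ? -code : 0;
}

/* Report the error before the caller falls back to /proc parsing. */
static int sketch_check_done(const struct nlmsghdr *h, const char *proto)
{
	int err = sketch_done_error(h);

	if (err)
		fprintf(stderr, "%s diag request failed: %s, falling back to /proc\n",
			proto, strerror(err));
	return err;
}
```

The caller would still use the /proc fallback when sketch_check_done()
returns non-zero; the only change is that the failure becomes visible.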

^ permalink raw reply

* Re: [PATCH 09/28] Remove ATHEROS_AR231X
From: Florian Fainelli @ 2014-02-10 22:37 UTC (permalink / raw)
  To: Sergey Ryazanov
  Cc: Oleksij Rempel, Richard Weinberger, Jonathan Bither,
	OpenWrt Development List, Hauke Mehrtens, Jiri Slaby,
	Nick Kossifidis, Luis R. Rodriguez, John W. Linville,
	open list:ATHEROS ATH5K WIR..., open list:ATHEROS ATH5K WIR...,
	open, list@hauke-m.de:NETWORKING DRIVERS, open list,
	antonynpavlov@gmail.com
In-Reply-To: <CAHNKnsQF5Hdf3JT91moPraRQzn4kMpdD2NSiJ7b30wQhS1W4MA@mail.gmail.com>

2014-02-10 4:38 GMT-08:00 Sergey Ryazanov <ryazanov.s.a@gmail.com>:
> 2014-02-10 16:17 GMT+04:00 Oleksij Rempel <linux@rempel-privat.de>:
>> Am 10.02.2014 13:05, schrieb Sergey Ryazanov:
>>> 2014-02-10 0:03 GMT+04:00 Richard Weinberger <richard@nod.at>:
>>>> Am 09.02.2014 20:18, schrieb Hauke Mehrtens:
>>>>> On 02/09/2014 07:47 PM, Richard Weinberger wrote:
>>>>>> The symbol is an orphan, get rid of it.
>>>>>>
>>>>>> Signed-off-by: Richard Weinberger <richard@nod.at>
>>>>>> ---
>>>>>>  drivers/net/wireless/ath/ath5k/Kconfig | 10 +++++-----
>>>>>>  drivers/net/wireless/ath/ath5k/ath5k.h | 28 ----------------------------
>>>>>>  drivers/net/wireless/ath/ath5k/base.c  | 14 --------------
>>>>>>  drivers/net/wireless/ath/ath5k/led.c   |  7 -------
>>>>>>  4 files changed, 5 insertions(+), 54 deletions(-)
>>>>>>
>>>>>
>>>>> This code is used in OpenWrt with an out of tree arch code for the
>>>>> Atheros 231x/531x SoC. [0] I do not think anyone is working on adding
>>>>> this code to mainline Linux kernel, because of lack of time/interest.
>>>>
>>>> Sorry, we don't maintain out of tree code.
>>>>
>>>
>>> Oleksij, Jonathan, are you still working to make ar231x devices work
>>> with upstream, since your posts [1, 2]? Or maybe someone from the OpenWrt
>>> team would like to add upstream support?
>>>
>>> 1. https://lkml.org/lkml/2013/5/13/321
>>> 2. https://lkml.org/lkml/2013/5/13/358
>>>
>>
>> Hi,
>> my current target was to provide barebox and openocd support.
>> - ar2313 is already upstream on barebox.
>> - ar2315-2318 (barebox) awaiting review by Anthony Pavlov.
>> - openocd (EJTAG) support is ready and i'll push it ASUP.
>>
> WOW, Impressive.

That's a nice toy project, although since there is an existing
bootloader with sources, I would have shifted the priority towards
getting the kernel support merged such that the bootloader can be used
for something. BTW I sent a few devices to Jonathan, not sure if he
ever got those...

>
>> I hope Jonathan does the kernel part. If not, I can provide some work, since I
>> have test boards and experience with this hardware.
>>
> If you need, I can test kernel part, or even do some porting work. I
> have some AR231x based boards, e.g. Ubnt LS2 and NS2.

I guess you could start splitting the OpenWrt patches into a format
that makes them suitable for being merged upstream and starting with
the MIPS parts. There might be a bunch of checkpatch.pl cleanup work
to do before getting those submitted.
-- 
Florian

^ permalink raw reply

* Re: [PATCH v2 1/3] net: stmmac:sti: Add STi SOC glue driver.
From: David Miller @ 2014-02-10 22:40 UTC (permalink / raw)
  To: srinivas.kandagatla
  Cc: netdev, robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	rob, linux, stuart.menefy, peppe.cavallaro, devicetree, linux-doc,
	linux-kernel, linux-arm-kernel, kernel
In-Reply-To: <1391770525-24349-1-git-send-email-srinivas.kandagatla@st.com>

From: <srinivas.kandagatla@st.com>
Date: Fri, 7 Feb 2014 10:55:25 +0000

> +		if (dwmac->interface == PHY_INTERFACE_MODE_MII ||
> +			dwmac->interface == PHY_INTERFACE_MODE_GMII) {

This is not indented correctly. The first character on the second line should
line up exactly at the column after the opening parenthesis on the first
line.

The objective is not to indent using only TAB characters, which you
are doing here.

Rather, the objective is to use the appropriate number of TAB _and_
space characters necessary to reach the proper column.
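
To make the alignment rule concrete, here is a small self-contained sketch
with a continuation line indented by one TAB plus four spaces, so that it
starts in the column after the opening parenthesis. The enum and function
names are hypothetical stand-ins for the driver's own identifiers.

```c
/* Hypothetical stand-ins for the PHY interface modes in the patch. */
enum sketch_phy_mode {
	SKETCH_MODE_MII,
	SKETCH_MODE_GMII,
	SKETCH_MODE_RGMII,
};

static int sketch_is_mii_or_gmii(enum sketch_phy_mode mode)
{
	/* Second line: one TAB, then four spaces to reach the column. */
	if (mode == SKETCH_MODE_MII ||
	    mode == SKETCH_MODE_GMII) {
		return 1;
	}

	return 0;
}
```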

> +		const char *rs;
> +		err = of_property_read_string(np, "st,tx-retime-src", &rs);

Please add an empty line after the local variable declaration.

> +		if (!strcasecmp(rs, "clk_125"))
> +			dwmac->is_tx_retime_src_clk_125 = true;
> +
> +	}

That empty line is superfluous, please delete it.

^ permalink raw reply

* Re: [Patch iproute2] pedit: do not print debugging information by default
From: Stephen Hemminger @ 2014-02-10 22:44 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, Jamal Hadi Salim
In-Reply-To: <1391037179-3510-1-git-send-email-xiyou.wangcong@gmail.com>

On Wed, 29 Jan 2014 15:12:59 -0800
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> 
> ---
> diff --git a/tc/m_pedit.c b/tc/m_pedit.c
> index 452d96f..16dd277 100644
> --- a/tc/m_pedit.c
> +++ b/tc/m_pedit.c
> @@ -30,7 +30,7 @@
>  #include "m_pedit.h"
>  
>  static struct m_pedit_util *pedit_list;
> -int pedit_debug = 1;
> +static int pedit_debug;
>  
>  static void
>  explain(void)

Applied

^ permalink raw reply

* Re: [PATCH iproute2 0/3] Support for tcp-metrics source address
From: Stephen Hemminger @ 2014-02-10 22:47 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: netdev
In-Reply-To: <1391724904-30504-1-git-send-email-christoph.paasch@uclouvain.be>

On Thu,  6 Feb 2014 23:15:01 +0100
Christoph Paasch <christoph.paasch@uclouvain.be> wrote:

> This patchset implements support for showing and deleting tcp-metrics
> based on the source-address.
> 
> Christoph Paasch (3):
>   tcp_metrics: Rename addr to daddr and add local variable
>   tcp_metrics: Display source-address
>   tcp_metrics: Allow removal based on the source-IP
> 
>  include/linux/tcp_metrics.h |   2 +
>  ip/tcp_metrics.c            | 154 +++++++++++++++++++++++++++++++-------------
>  2 files changed, 111 insertions(+), 45 deletions(-)
> 

Applied

^ permalink raw reply

* Re: [PATCH] iproute: Fix Netid value for multi-families output
From: Stephen Hemminger @ 2014-02-10 22:48 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Stephen Hemminger, Linux Netdev List,
	François-Xavier Le Bail
In-Reply-To: <52E74D0F.4050806@parallels.com>

On Tue, 28 Jan 2014 10:24:15 +0400
Pavel Emelyanov <xemul@parallels.com> wrote:

> When requesting simultaneous output of TCP and UDP sockets
> the netid field shows "tcp" always.

Applied

^ permalink raw reply

* Latest version of zero-copy fix
From: David Miller @ 2014-02-10 23:21 UTC (permalink / raw)
  To: ryao; +Cc: netdev

So now the patch only tests is_vmalloc_addr(). Did you test this
version with the situation that triggers the given backtrace?

[<ffffffff814878ce>] p9_virtio_zc_request+0x45e/0x510
[<ffffffff814814ed>] p9_client_zc_rpc.constprop.16+0xfd/0x4f0
[<ffffffff814839dd>] p9_client_read+0x15d/0x240
[<ffffffff811c8440>] v9fs_fid_readn+0x50/0xa0
[<ffffffff811c84a0>] v9fs_file_readn+0x10/0x20
[<ffffffff811c84e7>] v9fs_file_read+0x37/0x70
[<ffffffff8114e3fb>] vfs_read+0x9b/0x160
[<ffffffff81153571>] kernel_read+0x41/0x60
[<ffffffff810c83ab>] copy_module_from_fd.isra.34+0xfb/0x180

This is reading from v9fs into a module address.

It's going to generate the same backtrace with your fix.

I don't think the situation was sufficiently explained to Linus.  In
fact, I didn't see the above backtrace mentioned at all.

^ permalink raw reply

* Re: [PATCH] tcp: tsq: fix nonagle handling
From: David Miller @ 2014-02-10 23:24 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.ogness, thomas
In-Reply-To: <1392000011.6615.15.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 09 Feb 2014 18:40:11 -0800

> From: John Ogness <john.ogness@linutronix.de>
> 
> Commit 46d3ceabd8d9 ("tcp: TCP Small Queues") introduced a possible
> regression for applications using TCP_NODELAY.
> 
> If TCP session is throttled because of tsq, we should consult
> tp->nonagle when TX completion is done and allow us to send additional
> segment, especially if this segment is not a full MSS.
> Otherwise this segment is sent after an RTO.
> 
> [edumazet] : Cooked the changelog, added another fix about testing
> sk_wmem_alloc twice because TX completion can happen right before
> setting TSQ_THROTTLED bit.
> 
> This problem is particularly visible with recent auto corking,
> but might also be triggered with low tcp_limit_output_bytes
> values or NIC drivers delaying TX completion by hundreds of usec,
> and very low rtt.
> 
> Thomas Glanzmann for example reported an iscsi regression, caused
> by tcp auto corking making this bug quite visible.
> 
> Fixes: 46d3ceabd8d9 ("tcp: TCP Small Queues")
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Thomas Glanzmann <thomas@glanzmann.de>

Applied and queued up for -stable, thanks!

^ permalink raw reply
