Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] This extends tx_data and and iscsit_do_tx_data with the additional parameter flags and avoids sending multiple TCP packets in iscsit_fe_sendpage_sg
From: Thomas Glanzmann @ 2014-02-10 21:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Nicholas A. Bellinger, John Ogness, Eric Dumazet, David S. Miller,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1392066100.6615.55.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

> Hmm.. I was not aware of high RTT for some packets.
> Can you spot this on the pcap you provided ?

with the latest patch as in:

(node-62) [~/work/linux-2.6] git diff | pbot
http://pbot.rmdir.de/CQwqI6b7wJProw_xaukmEg

with net.ipv4.tcp_min_tso_segs=2 we had this pcap:
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched_tcp_more_notlast.pcap.bz2
And here is the RTT TCP graph:
Wireshark > Statistics > TCP Stream Graph > Round Trip Time Graph
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-10-22_04_30.png

with net.ipv4.tcp_min_tso_segs=8 we have this pcap:
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched_tcp_more_notlast_min_tso_segs_8.pcap.bz2
And here is the RTT TCP graph:
Wireshark > Statistics > TCP Stream Graph > Round Trip Time Graph

This gives us 0.0015 seconds RTT (1.5 ms)

Without TCP autocorking we had 0.0005 (0.5 ms).

https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png

Cheers,
        Thomas

^ permalink raw reply

* Re: [PATCH] This extends tx_data and and iscsit_do_tx_data with the additional parameter flags and avoids sending multiple TCP packets in iscsit_fe_sendpage_sg
From: Eric Dumazet @ 2014-02-10 21:01 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: Nicholas A. Bellinger, John Ogness, Eric Dumazet, David S. Miller,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140210205634.GC19621@glanzmann.de>

On Mon, 2014-02-10 at 21:56 +0100, Thomas Glanzmann wrote:
> Hello Nab,
> 
> > This looks correct to me.  Thomas, once your able to confirm please
> > include your 'Tested-by' and I'll include for the next -rc3 PULL
> > request.
> 
> Eric is currently reviewing our latest iteration with MSG_MORE for
> kernel_sendmsg and MSG_MORE | MSG_SENDPAGE_NOTLAST for sendpage. However
> with the last iteration we had again a high RTT for some packets. But
> than Eric let me tune net.ipv4.tcp_min_tso_segs to 8 and the RTT went
> down to what it used before auto corking was enabled. At least almost.
> 

Hmm.. I was not aware of high RTT for some packets.

Can you spot this on the pcap you provided ?


> I'm having a steep learning curve but Eric hopefully knows how to get
> this back in check. Nevertheless the regression I saw are history
> because I saw that Eric has submitted the patch to David S. Miller which
> fixes the two bugs that killed the iSCSI performance when tcp auto
> corking was on. So currently we're just optimizing to get the last 20%
> or so out of it. Quite interesting. Especially how much bandwidth can be
> saved by coalescing packets.

It depends on the ratio payload/headers.

The beginning of your pcap show a lot of 512 bytes requests, so for this
kind of requests, the gain is huge (maybe 50%), but for 32K or 64K
request, gain would be marginal.

^ permalink raw reply

* Re: [PATCH net] net: Clear local_df only if crossing namespace.
From: Pravin Shelar @ 2014-02-10 21:00 UTC (permalink / raw)
  To: Pravin Shelar, David Miller, netdev, Templin, Fred L,
	Nicolas Dichtel
In-Reply-To: <20140208005843.GE16198@order.stressinduktion.org>

On Fri, Feb 7, 2014 at 4:58 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> [Cc Nicolas]
>
> On Fri, Feb 07, 2014 at 02:49:20PM -0800, Pravin Shelar wrote:
>> On Fri, Feb 7, 2014 at 2:28 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > Hi!
>> >
>> > On Fri, Feb 07, 2014 at 02:12:38PM -0800, Pravin wrote:
>> >> --- a/net/core/skbuff.c
>> >> +++ b/net/core/skbuff.c
>> >> @@ -3905,12 +3905,13 @@ EXPORT_SYMBOL(skb_try_coalesce);
>> >>   */
>> >>  void skb_scrub_packet(struct sk_buff *skb, bool xnet)
>> >>  {
>> >> -     if (xnet)
>> >> +     if (xnet) {
>> >>               skb_orphan(skb);
>> >> +             skb->local_df = 0;
>> >> +     }
>> >>       skb->tstamp.tv64 = 0;
>> >>       skb->pkt_type = PACKET_HOST;
>> >>       skb->skb_iif = 0;
>> >> -     skb->local_df = 0;
>> >>       skb_dst_drop(skb);
>> >>       skb->mark = 0;
>> >>       secpath_reset(skb);
>> >
>> > I wonder if this should be the right behaviour for tunnels, which should just
>> > do fragmentation based on IP_DF, even if the packet originated locally from a
>> > socket which allowed local fragmentation (inet->pmtudisc < IP_PMTUDISC_DO).
>> >
>> This is not about tunneling, skb_scrub_packet() is generic function
>> which should not reset local_df on all packets.
>>
>> We can have separate discussion about use of local_df and tunneling in
>> another thread.
>
> This change only affects tunnel code as of current net branch, how do
> you not expect a discussion about that in this thread, I really wonder?
>
> May I know because of wich vport, vxlan or gre, you did this change?
>
It affects both gre and vxlan.

> I am feeling a bit uncomfortable handling remote and local packets that
> differently on lower tunnel output (local_df is mostly set on locally
> originating packets).

For ip traffic it make sense to turn on local_df only for local
traffic, since for remote case we can send icmp (frag-needed) back to
source. No such thing exist for OVS tunnels. ICMP packet are not
returned to source for the tunnels. That is why to be on safe side,
local_df is turned on for tunnels in OVS.

Thanks.

^ permalink raw reply

* Re: [PATCH] This extends tx_data and and iscsit_do_tx_data with the additional parameter flags and avoids sending multiple TCP packets in iscsit_fe_sendpage_sg
From: Thomas Glanzmann @ 2014-02-10 20:56 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Eric Dumazet, John Ogness, Eric Dumazet, David S. Miller,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1392058718.17867.8.camel@haakon3.risingtidesystems.com>

Hello Nab,

> This looks correct to me.  Thomas, once your able to confirm please
> include your 'Tested-by' and I'll include for the next -rc3 PULL
> request.

Eric is currently reviewing our latest iteration with MSG_MORE for
kernel_sendmsg and MSG_MORE | MSG_SENDPAGE_NOTLAST for sendpage. However
with the last iteration we had again a high RTT for some packets. But
than Eric let me tune net.ipv4.tcp_min_tso_segs to 8 and the RTT went
down to what it used before auto corking was enabled. At least almost.

I'm having a steep learning curve but Eric hopefully knows how to get
this back in check. Nevertheless the regression I saw are history
because I saw that Eric has submitted the patch to David S. Miller which
fixes the two bugs that killed the iSCSI performance when tcp auto
corking was on. So currently we're just optimizing to get the last 20%
or so out of it. Quite interesting. Especially how much bandwidth can be
saved by coalescing packets.

Cheers,
        Thomas

^ permalink raw reply

* Re: [PATCH iproute2 net-next-for-3.13 2/2] iplink_bond: add support for displaying bond slave attributes
From: Stephen Hemminger @ 2014-02-10 20:53 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: netdev
In-Reply-To: <20140210081156.787D5E5697@unicorn.suse.cz>

On Mon, 10 Feb 2014 09:11:56 +0100 (CET)
Michal Kubecek <mkubecek@suse.cz> wrote:

> +static const char *slave_state_tbl[] = {
> +	"active",
> +	"backup",
> +	NULL
> +};
> +
> +static const char *slave_mii_status_tbl[] = {
> +	"up",
> +	"going_down",
> +	"down",
> +	"going_back",
> +	NULL,
> +};
> +

Is there some correlation of these states to values in a header file.
Something like

static const char *slave_mii_status_tbl[] = {
	[BOND_LINK_UP] = "active",
	[BOND_LINK_FAIL] = "fail",
	[BOND_LINK_DOWN] = "down",
	[BOND_LINK_BACK] = "back",
};


And don't use null terminated arrays, use ARRAY_SIZE() instead.
	

^ permalink raw reply

* [PATCH] net: fix macvtap type name in Kconfig
From: Jan Luebbe @ 2014-02-10 20:40 UTC (permalink / raw)
  To: David S. Miller, netdev

The netlink kind (and iproute2 type option) is actually called
'macvtap', not 'macvlan'.

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
---
 drivers/net/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f342278..494b888 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -139,7 +139,7 @@ config MACVTAP
 	  This adds a specialized tap character device driver that is based
 	  on the MAC-VLAN network interface, called macvtap. A macvtap device
 	  can be added in the same way as a macvlan device, using 'type
-	  macvlan', and then be accessed through the tap user space interface.
+	  macvtap', and then be accessed through the tap user space interface.
 
 	  To compile this driver as a module, choose M here: the module
 	  will be called macvtap.
-- 
1.8.5.2

^ permalink raw reply related

* [PATCH v5 2/2] net: ip, ipv6: handle gso skbs in forwarding path
From: Florian Westphal @ 2014-02-10 20:35 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal, Herbert Xu, Eric Dumazet
In-Reply-To: <1392064537-30646-1-git-send-email-fw@strlen.de>

Marcelo Ricardo Leitner reported problems when the forwarding link path
has a lower mtu than the incoming one if the inbound interface supports GRO.

Given:
Host <mtu1500> R1 <mtu1200> R2

Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.

In this case, the kernel will fail to send ICMP fragmentation needed
messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
checks in forward path. Instead, Linux tries to send out packets exceeding
the mtu.

When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
not fragment the packets when forwarding, and again tries to send out
packets exceeding R1-R2 link mtu.

This alters the forwarding dstmtu checks to take the individual gso
segment lengths into account.

For ipv6, we send out pkt too big error for gso if the individual
segments are too big.

For ipv4, we either send icmp fragmentation needed, or, if the DF bit
is not set, perform software segmentation and let the output path
create fragments when the packet is leaving the machine.
It is not 100% correct as the error message will contain the headers of
the GRO skb instead of the original/segmented one, but it seems to
work fine in my (limited) tests.

Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid
sofware segmentation.

However it turns out that skb_segment() assumes skb nr_frags is related
to mss size so we would BUG there.  I don't want to mess with it considering
Herbert and Eric disagree on what the correct behavior should be.

Hannes Frederic Sowa notes that when we would shrink gso_size
skb_segment would then also need to deal with the case where
SKB_MAX_FRAGS would be exceeded.

This uses sofware segmentation in the forward path when we hit ipv4
non-DF packets and the outgoing link mtu is too small.  Its not perfect,
but given the lack of bug reports wrt. GRO fwd being broken this is a
rare case anyway.  Also its not like this could not be improved later
once the dust settles.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
Changes since V4:
 - use new netif_skb_dev_features instead of netif_skb_features
   as we need dst->dev and not skb->dev (spotted by
   Hannes Frederic Sowa)

Changes since V3:
 - use ip_dst_mtu_maybe_forward instead of dst_mtu
 - add comment wrt. DF bit not being set

Changes since V2:
 - make this thing apply to current -net tree
 - kill unused variables in ip_forward/ip6_output

Changes since V1:
 suggestions from Eric Dumazet:
  - skip more expensive computation for small packets in fwd path
  - use netif_skb_features() feature mask and remove GSO flags
    instead of using 0 feature set.

 include/linux/skbuff.h | 17 ++++++++++++
 net/ipv4/ip_forward.c  | 71 ++++++++++++++++++++++++++++++++++++++++++++++++--
 net/ipv6/ip6_output.c  | 17 ++++++++++--
 3 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f589c9a..3ebbbe7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2916,5 +2916,22 @@ static inline bool skb_head_is_locked(const struct sk_buff *skb)
 {
 	return !skb->head_frag || skb_cloned(skb);
 }
+
+/**
+ * skb_gso_network_seglen - Return length of individual segments of a gso packet
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_network_seglen is used to determine the real size of the
+ * individual segments, including Layer3 (IP, IPv6) and L4 headers (TCP/UDP).
+ *
+ * The MAC/L2 header is not accounted for.
+ */
+static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb)
+{
+	unsigned int hdr_len = skb_transport_header(skb) -
+			       skb_network_header(skb);
+	return hdr_len + skb_gso_transport_seglen(skb);
+}
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index e9f1217..f3869c1 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -39,6 +39,71 @@
 #include <net/route.h>
 #include <net/xfrm.h>
 
+static bool ip_may_fragment(const struct sk_buff *skb)
+{
+	return unlikely((ip_hdr(skb)->frag_off & htons(IP_DF)) == 0) ||
+	       !skb->local_df;
+}
+
+static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
+{
+	if (skb->len <= mtu || skb->local_df)
+		return false;
+
+	if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu)
+		return false;
+
+	return true;
+}
+
+static bool ip_gso_exceeds_dst_mtu(const struct sk_buff *skb)
+{
+	unsigned int mtu;
+
+	if (skb->local_df || !skb_is_gso(skb))
+		return false;
+
+	mtu = ip_dst_mtu_maybe_forward(skb_dst(skb), true);
+
+	/* if seglen > mtu, do software segmentation for IP fragmentation on
+	 * output.  DF bit cannot be set since ip_forward would have sent
+	 * icmp error.
+	 */
+	return skb_gso_network_seglen(skb) > mtu;
+}
+
+/* called if GSO skb needs to be fragmented on forward */
+static int ip_forward_finish_gso(struct sk_buff *skb)
+{
+	struct dst_entry *dst = skb_dst(skb);
+	netdev_features_t features;
+	struct sk_buff *segs;
+	int ret = 0;
+
+	features = netif_skb_dev_features(skb, dst->dev);
+	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+	if (IS_ERR(segs)) {
+		kfree_skb(skb);
+		return -ENOMEM;
+	}
+
+	consume_skb(skb);
+
+	do {
+		struct sk_buff *nskb = segs->next;
+		int err;
+
+		segs->next = NULL;
+		err = dst_output(segs);
+
+		if (err && ret == 0)
+			ret = err;
+		segs = nskb;
+	} while (segs);
+
+	return ret;
+}
+
 static int ip_forward_finish(struct sk_buff *skb)
 {
 	struct ip_options *opt	= &(IPCB(skb)->opt);
@@ -49,6 +114,9 @@ static int ip_forward_finish(struct sk_buff *skb)
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
 
+	if (ip_gso_exceeds_dst_mtu(skb))
+		return ip_forward_finish_gso(skb);
+
 	return dst_output(skb);
 }
 
@@ -91,8 +159,7 @@ int ip_forward(struct sk_buff *skb)
 
 	IPCB(skb)->flags |= IPSKB_FORWARDED;
 	mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
-	if (unlikely(skb->len > mtu && !skb_is_gso(skb) &&
-		     (ip_hdr(skb)->frag_off & htons(IP_DF))) && !skb->local_df) {
+	if (!ip_may_fragment(skb) && ip_exceeds_mtu(skb, mtu)) {
 		IP_INC_STATS(dev_net(rt->dst.dev), IPSTATS_MIB_FRAGFAILS);
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 			  htonl(mtu));
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ef02b26..070a2fa 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -342,6 +342,20 @@ static unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst)
 	return mtu;
 }
 
+static bool ip6_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
+{
+	if (skb->len <= mtu || skb->local_df)
+		return false;
+
+	if (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu)
+		return true;
+
+	if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu)
+		return false;
+
+	return true;
+}
+
 int ip6_forward(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
@@ -466,8 +480,7 @@ int ip6_forward(struct sk_buff *skb)
 	if (mtu < IPV6_MIN_MTU)
 		mtu = IPV6_MIN_MTU;
 
-	if ((!skb->local_df && skb->len > mtu && !skb_is_gso(skb)) ||
-	    (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu)) {
+	if (ip6_pkt_too_big(skb, mtu)) {
 		/* Again, force OUTPUT device used as source address */
 		skb->dev = dst->dev;
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
-- 
1.8.1.5

^ permalink raw reply related

* [PATCH 1/2] net: core: introduce netif_skb_dev_features
From: Florian Westphal @ 2014-02-10 20:35 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

Will be used by upcoming ipv4 forward path change that needs to
determine feature mask using skb->dst->dev instead of skb->dev.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netdevice.h |  7 ++++++-
 net/core/dev.c            | 20 +++++++++++---------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 440a02e..21d4e6b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3068,7 +3068,12 @@ void netdev_change_features(struct net_device *dev);
 void netif_stacked_transfer_operstate(const struct net_device *rootdev,
 					struct net_device *dev);
 
-netdev_features_t netif_skb_features(struct sk_buff *skb);
+netdev_features_t netif_skb_dev_features(struct sk_buff *skb,
+					 const struct net_device *dev);
+static inline netdev_features_t netif_skb_features(struct sk_buff *skb)
+{
+	return netif_skb_dev_features(skb, skb->dev);
+}
 
 static inline bool net_gso_ok(netdev_features_t features, int gso_type)
 {
diff --git a/net/core/dev.c b/net/core/dev.c
index 3721db7..94f7401 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2495,34 +2495,36 @@ static int dev_gso_segment(struct sk_buff *skb, netdev_features_t features)
 }
 
 static netdev_features_t harmonize_features(struct sk_buff *skb,
-	netdev_features_t features)
+					    const struct net_device *dev,
+					    netdev_features_t features)
 {
 	if (skb->ip_summed != CHECKSUM_NONE &&
 	    !can_checksum_protocol(features, skb_network_protocol(skb))) {
 		features &= ~NETIF_F_ALL_CSUM;
-	} else if (illegal_highdma(skb->dev, skb)) {
+	} else if (illegal_highdma(dev, skb)) {
 		features &= ~NETIF_F_SG;
 	}
 
 	return features;
 }
 
-netdev_features_t netif_skb_features(struct sk_buff *skb)
+netdev_features_t netif_skb_dev_features(struct sk_buff *skb,
+					 const struct net_device *dev)
 {
 	__be16 protocol = skb->protocol;
-	netdev_features_t features = skb->dev->features;
+	netdev_features_t features = dev->features;
 
-	if (skb_shinfo(skb)->gso_segs > skb->dev->gso_max_segs)
+	if (skb_shinfo(skb)->gso_segs > dev->gso_max_segs)
 		features &= ~NETIF_F_GSO_MASK;
 
 	if (protocol == htons(ETH_P_8021Q) || protocol == htons(ETH_P_8021AD)) {
 		struct vlan_ethhdr *veh = (struct vlan_ethhdr *)skb->data;
 		protocol = veh->h_vlan_encapsulated_proto;
 	} else if (!vlan_tx_tag_present(skb)) {
-		return harmonize_features(skb, features);
+		return harmonize_features(skb, dev, features);
 	}
 
-	features &= (skb->dev->vlan_features | NETIF_F_HW_VLAN_CTAG_TX |
+	features &= (dev->vlan_features | NETIF_F_HW_VLAN_CTAG_TX |
 					       NETIF_F_HW_VLAN_STAG_TX);
 
 	if (protocol == htons(ETH_P_8021Q) || protocol == htons(ETH_P_8021AD))
@@ -2530,9 +2532,9 @@ netdev_features_t netif_skb_features(struct sk_buff *skb)
 				NETIF_F_GEN_CSUM | NETIF_F_HW_VLAN_CTAG_TX |
 				NETIF_F_HW_VLAN_STAG_TX;
 
-	return harmonize_features(skb, features);
+	return harmonize_features(skb, dev, features);
 }
-EXPORT_SYMBOL(netif_skb_features);
+EXPORT_SYMBOL(netif_skb_dev_features);
 
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			struct netdev_queue *txq)
-- 
1.8.1.5

^ permalink raw reply related

* Re: [PATCH] 6lowpan: Remove unused pointer in lowpan_header_create()
From: David Miller @ 2014-02-10 20:25 UTC (permalink / raw)
  To: alex.aring
  Cc: cengelma, netdev, alex.bluesman.smirnov, dbaryshkov,
	jukka.rissanen
In-Reply-To: <20140210174056.GB12594@omega>

From: Alexander Aring <alex.aring@gmail.com>
Date: Mon, 10 Feb 2014 18:40:58 +0100

> I need the hdr pointer for my upcomming patches. Nevertheless it
> should be removed... it's not used anymore.
> 
> I am wondering why this patch was accepted, it's not a bug fix. Is
> net-next open again?

Unused variables make code difficult to audit for bugs and similar.

Sometimes I apply patches like these for net, net-next is not open
yet.

^ permalink raw reply

* [PATCHv3 1/2] net: stmmac: Add SOCFPGA glue driver
From: dinguyen @ 2014-02-10 19:48 UTC (permalink / raw)
  To: netdev, devicetree
  Cc: dinh.linux, Dinh Nguyen, Giuseppe Cavallaro, Rob Herring,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Vince Bridgers

From: Dinh Nguyen <dinguyen@altera.com>

Like the STi series SOCs, Altera's SOCFPGA also needs a glue layer on top of the
Synopsys gmac IP.

This patch adds the platform driver for the glue layer which configures the IP
before the generic STMMAC driver takes over.

Signed-off-by: Dinh Nguyen <dinguyen@altera.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ian Campbell <ijc+devicetree@hellion.org.uk>
Cc: Kumar Gala <galak@codeaurora.org>
Cc: Vince Bridgers <vbridgers2013@gmail.com>
---
v3: Remove stray empty line at end of dwmac-socfpga.c.
v2: Use the dwmac-sti as an example for a glue layer and split patch up
to have dts as a separate patch. Also cc dts maintainers since there is
a new binding.
---
 drivers/net/ethernet/stmicro/stmmac/Makefile       |    1 +
 .../net/ethernet/stmicro/stmmac/dwmac-socfpga.c    |  183 ++++++++++++++++++++
 2 files changed, 184 insertions(+)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile b/drivers/net/ethernet/stmicro/stmmac/Makefile
index ecadece..73df8b6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_STMMAC_ETH) += stmmac.o
+stmmac-$(CONFIG_ARCH_SOCFPGA) += dwmac-socfpga.o
 stmmac-$(CONFIG_STMMAC_PLATFORM) += stmmac_platform.o
 stmmac-$(CONFIG_STMMAC_PCI) += stmmac_pci.o
 stmmac-$(CONFIG_DWMAC_SUNXI) += dwmac-sunxi.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
new file mode 100644
index 0000000..c7f034b
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
@@ -0,0 +1,183 @@
+/*  Copyright (C) 2014 Altera Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Adopted from dwmac-sti.c
+ */
+
+#include <linux/clk.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/mfd/syscon.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_net.h>
+#include <linux/of_platform.h>
+#include <linux/phy.h>
+#include <linux/platform_device.h>
+#include <linux/regmap.h>
+#include <linux/stmmac.h>
+
+#define SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_GMII_MII 0x0
+#define SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_RGMII 0x1
+#define SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_RMII 0x2
+#define SYSMGR_EMACGRP_CTRL_PHYSEL_WIDTH 2
+#define SYSMGR_EMACGRP_CTRL_PHYSEL_MASK 0x00000003
+
+struct socfpga_dwmac {
+	int	interface;
+	u32	reg_offset;
+	struct	device *dev;
+	struct regmap *sys_mgr_base_addr;
+	struct	device_node *dwmac_np;
+};
+
+static int socfpga_dwmac_parse_data(struct socfpga_dwmac *dwmac, struct device *dev)
+{
+	struct device_node *np	= dev->of_node;
+	struct device_node *stmmac_np;
+	struct regmap *sys_mgr_base_addr;
+	u32 reg_offset;
+	int ret;
+
+	stmmac_np = of_get_next_available_child(np, NULL);
+	if (!stmmac_np) {
+		dev_info(dev, "No dwmac node found\n");
+		return -EINVAL;
+	}
+
+	if (!of_device_is_compatible(stmmac_np, "snps,dwmac")) {
+		dev_info(dev, "dwmac node isn't compatible with snps,dwmac\n");
+		return -EINVAL;
+	}
+
+	dwmac->interface = of_get_phy_mode(stmmac_np);
+	of_node_put(stmmac_np);
+
+	sys_mgr_base_addr = syscon_regmap_lookup_by_phandle(np, "altr,sysmgr-syscon");
+	if (IS_ERR(sys_mgr_base_addr)) {
+		dev_info(dev, "No sysmgr-syscon node found\n");
+		return PTR_ERR(sys_mgr_base_addr);
+	}
+
+	ret = of_property_read_u32_index(np, "altr,sysmgr-syscon", 1, &reg_offset);
+	if (ret) {
+		dev_info(dev, "Could not reg_offset into sysmgr-syscon!\n");
+		return -EINVAL;
+	}
+
+	dwmac->reg_offset = reg_offset;
+	dwmac->sys_mgr_base_addr = sys_mgr_base_addr;
+	dwmac->dwmac_np = stmmac_np;
+	dwmac->dev = dev;
+
+	return 0;
+}
+
+static int socfpga_dwmac_setup(struct socfpga_dwmac *dwmac)
+{
+	struct regmap *sys_mgr_base_addr = dwmac->sys_mgr_base_addr;
+	int phymode = dwmac->interface;
+	u32 reg_offset = dwmac->reg_offset;
+	u32 ctrl, val, shift = 0;
+
+	if (of_machine_is_compatible("altr,socfpga-vt"))
+		return 0;
+
+	switch (phymode) {
+	case PHY_INTERFACE_MODE_RGMII:
+		val = SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_RGMII;
+		break;
+	case PHY_INTERFACE_MODE_MII:
+	case PHY_INTERFACE_MODE_GMII:
+		val = SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_GMII_MII;
+		break;
+	default:
+		dev_err(dwmac->dev, "bad phy mode %d\n", phymode);
+		return -EINVAL;
+	}
+
+	regmap_read(sys_mgr_base_addr, reg_offset, &ctrl);
+	ctrl &= ~(SYSMGR_EMACGRP_CTRL_PHYSEL_MASK << shift);
+	ctrl |= val << shift;
+
+	regmap_write(sys_mgr_base_addr, reg_offset, ctrl);
+	return 0;
+}
+
+static int socfpga_dwmac_probe(struct platform_device *pdev)
+{
+	struct device		*dev = &pdev->dev;
+	struct device_node	*node = dev->of_node;
+	int			ret = -ENOMEM;
+	struct socfpga_dwmac	*dwmac;
+
+	dwmac = devm_kzalloc(dev, sizeof(*dwmac), GFP_KERNEL);
+	if (!dwmac)
+		return -ENOMEM;
+
+	ret = socfpga_dwmac_parse_data(dwmac, dev);
+	if (ret) {
+		dev_err(dev, "Unable to parse OF data\n");
+		return ret;
+	}
+
+	ret = socfpga_dwmac_setup(dwmac);
+	if (ret) {
+		dev_err(dev, "couldn't setup SoC glue (%d)\n", ret);
+		return ret;
+	}
+
+	if (node) {
+		ret = of_platform_populate(node, NULL, NULL, dev);
+		if (ret) {
+			dev_err(dev, "failed to add dwmac core\n");
+			return ret;
+		}
+	} else {
+		dev_err(dev, "no device node, failed to add dwmac core\n");
+		return -ENODEV;
+	}
+
+	platform_set_drvdata(pdev, dwmac);
+
+	return 0;
+}
+
+static int socfpga_dwmac_remove(struct platform_device *pdev)
+{
+	return 0;
+}
+
+static const struct of_device_id socfpga_dwmac_match[] = {
+	{ .compatible = "altr,socfpga-stmmac" },
+	{},
+};
+MODULE_DEVICE_TABLE(of, socfpga_dwmac_match);
+
+static struct platform_driver socfpga_dwmac_driver = {
+	.probe		= socfpga_dwmac_probe,
+	.remove		= socfpga_dwmac_remove,
+	.driver		= {
+		.name	= "socfpga-dwmac",
+		.of_match_table = of_match_ptr(socfpga_dwmac_match),
+	},
+};
+
+module_platform_driver(socfpga_dwmac_driver);
+
+MODULE_ALIAS("platform:socfpga-dwmac");
+MODULE_AUTHOR("Dinh Nguyen <dinguyen@altera.com>");
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Altera SOCFPGA DWMAC Glue Layer");
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH] net: rfkill-regulator: Add devicetree support.
From: Belisko Marek @ 2014-02-10 20:05 UTC (permalink / raw)
  To: Dr. H. Nikolaus Schaller
  Cc: Johannes Berg, Rob Herring, Pawel Moll, Mark Rutland,
	ijc+devicetree@hellion.org.uk, Kumar Gala, Rob Landley,
	John W. Linville, David Miller, Grant Likely, NeilBrown Brown,
	devicetree@vger.kernel.org, linux-doc@vger.kernel.org, LKML,
	linux-wireless, netdev
In-Reply-To: <BF9457FD-CEED-4290-9150-57F51F6E42FC@goldelico.com>

On Mon, Feb 10, 2014 at 9:54 AM, Dr. H. Nikolaus Schaller
<hns@goldelico.com> wrote:
> Am 10.02.2014 um 09:27 schrieb Johannes Berg:
>
>> On Fri, 2014-02-07 at 20:48 +0100, Marek Belisko wrote:
>>
>>> +#define RFKILL_TYPE_ALL             (0)
>>> +#define RFKILL_TYPE_WLAN    (1)
>>> +#define RFKILL_TYPE_BLUETOOTH       (2)
>>> +#define RFKILL_TYPE_UWB             (3)
>>> +#define RFKILL_TYPE_WIMAX   (4)
>>> +#define RFKILL_TYPE_WWAN    (5)
>>> +#define RFKILL_TYPE_GPS             (6)
>>> +#define RFKILL_TYPE_FM              (7)
>>> +#define RFKILL_TYPE_NFC             (8)
>>
>> This seems like a bad idea since there's an enum elsewhere in userspace
>> API already.
I know this is duplicate but in device tree we cannot use enums
(AFAIU include/dt-bindings are defines used in device tree files)
>
> Yes,
> you are right. It is defined in include/uapi/linux/rfkill.h
>
> Tnx,
> Nikolaus
>

BR,

marek

-- 
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

^ permalink raw reply

* [PATCHv3 2/2] dts: socfpga: Add DTS entry for adding the stmmac glue layer for stmmac.
From: dinguyen @ 2014-02-10 19:48 UTC (permalink / raw)
  To: netdev, devicetree
  Cc: dinh.linux, Dinh Nguyen, Giuseppe Cavallaro, Rob Herring,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Vince Bridgers
In-Reply-To: <1392061697-20193-1-git-send-email-dinguyen@altera.com>

From: Dinh Nguyen <dinguyen@altera.com>

This patch adds the dts bindings documenation for the Altera SOCFPGA glue
layer for the Synopsys STMMAC ethernet driver.

Signed-off-by: Dinh Nguyen <dinguyen@altera.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ian Campbell <ijc+devicetree@hellion.org.uk>
Cc: Kumar Gala <galak@codeaurora.org>
Cc: Vince Bridgers <vbridgers2013@gmail.com>
---
v3: Remove stray empty line at end of socfpga_cyclone5_socdk.dts
v2: Use the dwmac-sti as an example for a glue layer and split patch up
to have dts as a separate patch. Also cc dts maintainers since there is
a new binding.
---
 .../devicetree/bindings/net/socfpga-dwmac.txt      |   35 ++++++++++++++
 arch/arm/boot/dts/socfpga.dtsi                     |   51 +++++++++++++-------
 arch/arm/boot/dts/socfpga_arria5_socdk.dts         |   24 +++++++++
 arch/arm/boot/dts/socfpga_cyclone5.dtsi            |    6 ---
 arch/arm/boot/dts/socfpga_cyclone5_socdk.dts       |   17 +++++++
 arch/arm/boot/dts/socfpga_cyclone5_sockit.dts      |   22 ++++++++-
 arch/arm/boot/dts/socfpga_vt.dts                   |   13 +++--
 7 files changed, 139 insertions(+), 29 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/socfpga-dwmac.txt

diff --git a/Documentation/devicetree/bindings/net/socfpga-dwmac.txt b/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
new file mode 100644
index 0000000..d53d376
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
@@ -0,0 +1,35 @@
+Altera SOCFPGA SoC DWMAC controller
+
+The device node has following properties.
+
+Required properties:
+ - compatible	: Should contain "altr,socfpga-stmmac"
+ - altr,sysmgr-syscon : Should be the phandle to the system manager node that
+   encompasses the glue register, and the register offset.
+
+Sub-nodes:
+The dwmac core should be added as subnode to SOCFPGA dwmac glue.
+- dwmac :	The binding details of dwmac can be found in
+  Documentation/devicetree/bindings/net/stmmac.txt
+
+Example:
+
+ethernet0: ethernet0 {
+	#address-cells = <1>;
+	#size-cells = <1>;
+
+	compatible = "altr,socfpga-stmmac";
+	altr,sysmgr-syscon = <&sysmgr 0x60>;
+	status = "disabled";
+	ranges;
+
+	gmac0: gmac0@ff700000 {
+		compatible = "snps,dwmac-3.70a", "snps,dwmac";
+		reg = <0xff700000 0x2000>;
+		interrupts = <0 115 4>;
+		interrupt-names = "macirq";
+		mac-address = [00 00 00 00 00 00];/* Filled in by U-Boot */
+		clocks = <&emac0_clk>;
+		clock-names = "stmmaceth";
+	};
+};
diff --git a/arch/arm/boot/dts/socfpga.dtsi b/arch/arm/boot/dts/socfpga.dtsi
index 8c4adb7..ebf6113 100644
--- a/arch/arm/boot/dts/socfpga.dtsi
+++ b/arch/arm/boot/dts/socfpga.dtsi
@@ -442,26 +442,43 @@
 				};
 			};
 
-		gmac0: ethernet@ff700000 {
-			compatible = "altr,socfpga-stmmac", "snps,dwmac-3.70a", "snps,dwmac";
-			reg = <0xff700000 0x2000>;
-			interrupts = <0 115 4>;
-			interrupt-names = "macirq";
-			mac-address = [00 00 00 00 00 00];/* Filled in by U-Boot */
-			clocks = <&emac0_clk>;
-			clock-names = "stmmaceth";
+		ethernet0: ethernet0 {
+			#address-cells = <1>;
+			#size-cells = <1>;
+			compatible = "altr,socfpga-stmmac";
+			altr,sysmgr-syscon = <&sysmgr 0x60>;
 			status = "disabled";
+			ranges;
+
+			gmac0: gmac0@ff700000 {
+				compatible = "snps,dwmac-3.70a", "snps,dwmac";
+				reg = <0xff700000 0x2000>;
+				interrupts = <0 115 4>;
+				interrupt-names = "macirq";
+				mac-address = [00 00 00 00 00 00];/* Filled in by U-Boot */
+				clocks = <&emac0_clk>;
+				clock-names = "stmmaceth";
+			};
 		};
 
-		gmac1: ethernet@ff702000 {
-			compatible = "altr,socfpga-stmmac", "snps,dwmac-3.70a", "snps,dwmac";
-			reg = <0xff702000 0x2000>;
-			interrupts = <0 120 4>;
-			interrupt-names = "macirq";
-			mac-address = [00 00 00 00 00 00];/* Filled in by U-Boot */
-			clocks = <&emac1_clk>;
-			clock-names = "stmmaceth";
+		ethernet1: ethernet1 {
+			#address-cells = <1>;
+			#size-cells = <1>;
+			compatible = "altr,socfpga-stmmac";
+			altr,sysmgr-syscon = <&sysmgr 0x60>;
 			status = "disabled";
+			ranges;
+
+			gmac1: gmac1@ff702000 {
+				device_type = "network";
+				compatible = "snps,dwmac-3.70a", "snps,dwmac";
+				reg = <0xff702000 0x2000>;
+				interrupts = <0 120 4>;
+				interrupt-names = "macirq";
+				mac-address = [00 00 00 00 00 00];/* Filled in by U-Boot */
+				clocks = <&emac1_clk>;
+				clock-names = "stmmaceth";
+			};
 		};
 
 		L2: l2-cache@fffef000 {
@@ -538,7 +555,7 @@
 			reg = <0xffd05000 0x1000>;
 		};
 
-		sysmgr@ffd08000 {
+		sysmgr: sysmgr@ffd08000 {
 				compatible = "altr,sys-mgr", "syscon";
 				reg = <0xffd08000 0x4000>;
 			};
diff --git a/arch/arm/boot/dts/socfpga_arria5_socdk.dts b/arch/arm/boot/dts/socfpga_arria5_socdk.dts
index 5beffb2..2d6b38b 100644
--- a/arch/arm/boot/dts/socfpga_arria5_socdk.dts
+++ b/arch/arm/boot/dts/socfpga_arria5_socdk.dts
@@ -37,4 +37,28 @@
 		*/
 		ethernet0 = &gmac1;
 	};
+
+	aliases {
+		/* this allow the ethaddr uboot environmnet variable contents
+		 * to be added to the gmac1 device tree blob.
+		 */
+		ethernet0 = &gmac1;
+	};
+};
+
+&ethernet1 {
+	status = "okay";
+};
+
+&gmac1 {
+	phy-mode = "rgmii";
+
+	rxd0-skew-ps = <0>;
+	rxd1-skew-ps = <0>;
+	rxd2-skew-ps = <0>;
+	rxd3-skew-ps = <0>;
+	txen-skew-ps = <0>;
+	txc-skew-ps = <2600>;
+	rxdv-skew-ps = <0>;
+	rxc-skew-ps = <2000>;
 };
diff --git a/arch/arm/boot/dts/socfpga_cyclone5.dtsi b/arch/arm/boot/dts/socfpga_cyclone5.dtsi
index ca41b0e..454148d 100644
--- a/arch/arm/boot/dts/socfpga_cyclone5.dtsi
+++ b/arch/arm/boot/dts/socfpga_cyclone5.dtsi
@@ -39,12 +39,6 @@
 			};
 		};
 
-		ethernet@ff702000 {
-			phy-mode = "rgmii";
-			phy-addr = <0xffffffff>; /* probe for phy addr */
-			status = "okay";
-		};
-
 		timer0@ffc08000 {
 			clock-frequency = <100000000>;
 		};
diff --git a/arch/arm/boot/dts/socfpga_cyclone5_socdk.dts b/arch/arm/boot/dts/socfpga_cyclone5_socdk.dts
index 2ee52ab..26c63a0 100644
--- a/arch/arm/boot/dts/socfpga_cyclone5_socdk.dts
+++ b/arch/arm/boot/dts/socfpga_cyclone5_socdk.dts
@@ -38,3 +38,20 @@
 		ethernet0 = &gmac1;
 	};
 };
+
+&ethernet1 {
+	status = "okay";
+};
+
+&gmac1 {
+	phy-mode = "rgmii";
+
+	rxd0-skew-ps = <0>;
+	rxd1-skew-ps = <0>;
+	rxd2-skew-ps = <0>;
+	rxd3-skew-ps = <0>;
+	txen-skew-ps = <0>;
+	txc-skew-ps = <2600>;
+	rxdv-skew-ps = <0>;
+	rxc-skew-ps = <2000>;
+};
diff --git a/arch/arm/boot/dts/socfpga_cyclone5_sockit.dts b/arch/arm/boot/dts/socfpga_cyclone5_sockit.dts
index 50b99a2..469bb5c 100644
--- a/arch/arm/boot/dts/socfpga_cyclone5_sockit.dts
+++ b/arch/arm/boot/dts/socfpga_cyclone5_sockit.dts
@@ -30,8 +30,28 @@
 		device_type = "memory";
 		reg = <0x0 0x40000000>; /* 1GB */
 	};
+
+	aliases {
+		/* this allow the ethaddr uboot environmnet variable contents
+		 * to be added to the gmac1 device tree blob.
+		 */
+		ethernet0 = &gmac1;
+	};
 };
 
-&gmac1 {
+&ethernet1 {
 	status = "okay";
 };
+
+&gmac1 {
+	phy-mode = "rgmii";
+
+	rxd0-skew-ps = <0>;
+	rxd1-skew-ps = <0>;
+	rxd2-skew-ps = <0>;
+	rxd3-skew-ps = <0>;
+	txen-skew-ps = <0>;
+	txc-skew-ps = <2600>;
+	rxdv-skew-ps = <0>;
+	rxc-skew-ps = <2000>;
+};
diff --git a/arch/arm/boot/dts/socfpga_vt.dts b/arch/arm/boot/dts/socfpga_vt.dts
index 222313f..418472c 100644
--- a/arch/arm/boot/dts/socfpga_vt.dts
+++ b/arch/arm/boot/dts/socfpga_vt.dts
@@ -52,11 +52,6 @@
 			};
 		};
 
-		ethernet@ff700000 {
-			phy-mode = "gmii";
-			status = "okay";
-		};
-
 		timer0@ffc08000 {
 			clock-frequency = <7000000>;
 		};
@@ -86,3 +81,11 @@
 		};
 	};
 };
+
+&ethernet0 {
+        status = "okay";
+};
+
+&gmac0 {
+	phy-mode = "gmii";
+};
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH 1/2] mm/vmalloc: export is_vmalloc_or_module_addr
From: Linus Torvalds @ 2014-02-10 19:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Richard Yao, Mel Gorman, Andrew Morton, Rik van Riel,
	Eric Van Hensbergen, Ron Minnich, Latchesar Ionkov,
	David S. Miller, V9FS Develooper Mailing List,
	Linux Netdev Mailing List, Linux Kernel Mailing List,
	Aneesh Kumar K.V, Will Deacon, Christopher Covington,
	Matthew Thode
In-Reply-To: <52F929A6.8040209@mit.edu>

On Mon, Feb 10, 2014 at 11:33 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> I agree that reading to module space is awful, but is it obviously
> terrible for a module to do this:
>
> static const char header[] = {...};
> kernel_write(file, header, sizeof(header), 0);

Yes, it is terrible. Don't do it. It shouldn't work.

Now, whether it is "obvious" or not is immaterial. It might be hard to
notice, but let's face it, we are kernel programmers. Our interfaces
aren't designed for convenience, they are for working well and
efficiently.  And if that happens to mean that you shouldn't do IO on
kmap'ed pages, or use random static data in your modules, that is what
it means.

We have lots of other rules too. People should deal with it, and
realize that in the kernel we absolutely *have* to try to minimize the
problem space as much as possible.

The usual computer sciencey approach of "make things as generic as
possible so that you can reuse the code" stuff is generally bullshit
even in other contexts (example: STL), but it's _particularly_ bad for
the kernel. It's much better to have strict rules about what is
acceptable.

                Linus

^ permalink raw reply

* [PATCH] 6lowpan: fix lockdep splats
From: Eric Dumazet @ 2014-02-10 19:42 UTC (permalink / raw)
  To: Alexander Aring, David Miller; +Cc: netdev
In-Reply-To: <1391949707.10160.130.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <edumazet@google.com>

When a device ndo_start_xmit() calls again dev_queue_xmit(),
lockdep can complain because dev_queue_xmit() is re-entered and the
spinlocks protecting tx queues share a common lockdep class.

Same issue was fixed for bonding/l2tp/ppp in commits 

0daa2303028a6 ("[PATCH] bonding: lockdep annotation")
49ee49202b4ac ("bonding: set qdisc_tx_busylock to avoid LOCKDEP splat")
23d3b8bfb8eb2 ("net: qdisc busylock needs lockdep annotations ")
303c07db487be ("ppp: set qdisc_tx_busylock to avoid LOCKDEP splat ")

Reported-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexander Aring <alex.aring@gmail.com>
---
 net/ieee802154/6lowpan.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 8bfb40153fe7..8edfea5da572 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -530,7 +530,27 @@ static struct header_ops lowpan_header_ops = {
 	.create	= lowpan_header_create,
 };
 
+static struct lock_class_key lowpan_tx_busylock;
+static struct lock_class_key lowpan_netdev_xmit_lock_key;
+
+static void lowpan_set_lockdep_class_one(struct net_device *dev,
+					 struct netdev_queue *txq,
+					 void *_unused)
+{
+	lockdep_set_class(&txq->_xmit_lock,
+			  &lowpan_netdev_xmit_lock_key);
+}
+
+
+static int lowpan_dev_init(struct net_device *dev)
+{
+	netdev_for_each_tx_queue(dev, lowpan_set_lockdep_class_one, NULL);
+	dev->qdisc_tx_busylock = &lowpan_tx_busylock;
+	return 0;
+}
+
 static const struct net_device_ops lowpan_netdev_ops = {
+	.ndo_init		= lowpan_dev_init,
 	.ndo_start_xmit		= lowpan_xmit,
 	.ndo_set_mac_address	= lowpan_set_address,
 };

^ permalink raw reply related

* [PATCH] alx: add missing stats_lock spinlock init
From: John Greene @ 2014-02-10 19:34 UTC (permalink / raw)
  To: netdev

Trivial fix for init time stack trace occuring in
alx_get_stats64 upon start up. Should have been part of
commit adding the spinlock:
f1b6b106 alx: add alx_get_stats64 operation

Signed-off-by: John Greene <jogreene@redhat.com>
---
 drivers/net/ethernet/atheros/alx/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/atheros/alx/main.c b/drivers/net/ethernet/atheros/alx/main.c
index e92ffd6..2e45f6e 100644
--- a/drivers/net/ethernet/atheros/alx/main.c
+++ b/drivers/net/ethernet/atheros/alx/main.c
@@ -1292,6 +1292,7 @@ static int alx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	alx = netdev_priv(netdev);
 	spin_lock_init(&alx->hw.mdio_lock);
 	spin_lock_init(&alx->irq_lock);
+	spin_lock_init(&alx->stats_lock);
 	alx->dev = netdev;
 	alx->hw.pdev = pdev;
 	alx->msg_enable = NETIF_MSG_LINK | NETIF_MSG_HW | NETIF_MSG_IFUP |
-- 
1.8.1.4

^ permalink raw reply related

* Re: [PATCH 1/2] mm/vmalloc: export is_vmalloc_or_module_addr
From: Andy Lutomirski @ 2014-02-10 19:33 UTC (permalink / raw)
  To: Linus Torvalds, Richard Yao
  Cc: Mel Gorman, Andrew Morton, Rik van Riel, Eric Van Hensbergen,
	Ron Minnich, Latchesar Ionkov, David S. Miller,
	V9FS Develooper Mailing List, Linux Netdev Mailing List,
	Linux Kernel Mailing List, Aneesh Kumar K.V, Will Deacon,
	Christopher Covington, Matthew Thode
In-Reply-To: <CA+55aFzzxcrLrUaCrzFbwS_74B2iHd3OvQ=WC-DR-rqpOA0j0A@mail.gmail.com>

On 02/08/2014 02:24 PM, Linus Torvalds wrote:
> On Sat, Feb 8, 2014 at 12:44 PM, Richard Yao <ryao@gentoo.org> wrote:
>>
>> However, is_vmalloc_addr() only applies to the vmalloc region. While all
>> architectures load kernel modules into virtual memory (to my knowledge),
>> some architectures do not load them into the vmalloc region.
> 
> So?
> 
> People shouldn't do IO to module data, so who cares if something is a
> module address or not?
> 
> The thing is, even module *loading* doesn't do IO to the magic module
> addresses - it loads the module data into regular vmalloc space, and
> then copies it into the final location separately.
> 
> And no module should ever do any IO on random static data (and
> certainly not on code).
> 
> So there is _zero_ reason for a driver or a filesystem to use
> is_vmalloc_or_module_addr(). It's just not a valid question to ask.
> 
> If somebody uses module data/code addresses, we're *better* off with a
> oops or other nasty behavior than to try to make it "work".

I agree that reading to module space is awful, but is it obviously
terrible for a module to do this:

static const char header[] = {...};
kernel_write(file, header, sizeof(header), 0);

The current nasty behavior is doing the I/O to the wrong place if the
appropriate CONFIG_DEBUG option isn't set.  That IMO sucks.

--Andy

^ permalink raw reply

* Re: 6lowpan: lockless tx queue of routing netlink device
From: Alexander Aring @ 2014-02-10 19:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-zigbee-devel
In-Reply-To: <1392060372.6615.41.camel@edumazet-glaptop2.roam.corp.google.com>

On Mon, Feb 10, 2014 at 11:26:12AM -0800, Eric Dumazet wrote:
> On Mon, 2014-02-10 at 19:21 +0100, Alexander Aring wrote:
> 
> > no problem. Maybe you can try to give me some explanation what you did
> > there?
> > 
> > I see you make a lockdep_set_class for each tx queue and assign the 
> > lowpan_netdev_xmit_lock_key. So why it's better to make a own
> > "lock_key".
> 
> This will be explained in the patch I'll cook.
> 
Okay, then you can add a

Tested-by: Alexander Aring <alex.aring@gmail.com>

Thanks!

- Alex

^ permalink raw reply

* Re: 6lowpan: lockless tx queue of routing netlink device
From: Eric Dumazet @ 2014-02-10 19:26 UTC (permalink / raw)
  To: Alexander Aring; +Cc: netdev, linux-zigbee-devel
In-Reply-To: <20140210182138.GB15674@omega>

On Mon, 2014-02-10 at 19:21 +0100, Alexander Aring wrote:

> no problem. Maybe you can try to give me some explanation what you did
> there?
> 
> I see you make a lockdep_set_class for each tx queue and assign the 
> lowpan_netdev_xmit_lock_key. So why it's better to make a own
> "lock_key".

This will be explained in the patch I'll cook.

^ permalink raw reply

* Re: [PATCH] This extends tx_data and and iscsit_do_tx_data with the additional parameter flags and avoids sending multiple TCP packets in iscsit_fe_sendpage_sg
From: Nicholas A. Bellinger @ 2014-02-10 18:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: John Ogness, Eric Dumazet, David S. Miller, target-devel,
	Linux Network Development, LKML, thomas
In-Reply-To: <1391949039.10160.129.camel@edumazet-glaptop2.roam.corp.google.com>

Hi Eric & Thomas,

On Sun, 2014-02-09 at 04:30 -0800, Eric Dumazet wrote:
> On Sun, 2014-02-09 at 08:42 +0100, Eric Dumazet wrote:
> > The new infrastructure is used in iscsit_fe_sendpage_sg to avoid sending three
> > TCP packets instead of one by settings the MSG_MORE when calling kernel_sendmsg
> > via the wrapper functions tx_data and iscsit_do_tx_data. This reduces the TCP
> > overhead by sending the same data in less TCP packets and minimized the TCP RTP
> > when TCP auto corking is enabled. When creating a 500 GB VMFS filesystem the
> > filesystem is created in 3 seconds instead of 4 seconds.
> > 
> > Signed-off-by: Thomas Glanzmann <thomas@glanzmann.de>
> > X-tested-by: Thomas Glanzmann <thomas@glanzmann.de>
> > ---
> 
> Hmm, thanks but this is not how to do this.
> 
> When you submit a patch written by someone else, you should :
> 
> 1) Use your own identity as the sender, not impersonate me.
> ( thats standard convention )
> 
> 2) Put following line as first line of the mail
> ( Documentation/SubmittingPatches lines ~565)
> 
> From: Eric Dumazet <edumazet@google.com>
> 
> Then I'll add my :
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> Anyway, patch is not yet complete : We also want to set
> MSG_MORE/MSG_SENDPAGE_NOTLAST for all pages but last one in a sg list.
> 
> 
> This will fix suboptimal traffic :
> 
> 13:32:04.976923 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [.], seq 289953:292849, ack 45792, win 795, options [nop,nop,TS val 4294914045 ecr 1577012], length 2896
> 13:32:04.976936 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [.], seq 292849:295745, ack 45792, win 795, options [nop,nop,TS val 4294914045 ecr 1577012], length 2896
> 13:32:04.976944 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 295745:298193, ack 45792, win 795, options [nop,nop,TS val 4294914045 ecr 1577012], length 2448
> 13:32:04.976952 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [.], seq 298193:301089, ack 45792, win 795, options [nop,nop,TS val 4294914045 ecr 1577012], length 2896
> 13:32:04.976960 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [.], seq 301089:303985, ack 45792, win 795, options [nop,nop,TS val 4294914045 ecr 1577012], length 2896
> 13:32:04.976998 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 303985:306385, ack 45792, win 795, options [nop,nop,TS val 4294914045 ecr 1577012], length 2400
> 
> Please try following updated patch, thanks !
> 
> Once tested, we'll submit it formally.
> 
>  drivers/target/iscsi/iscsi_target_parameters.c |    2 
>  drivers/target/iscsi/iscsi_target_util.c       |   38 +++++++++------
>  drivers/target/iscsi/iscsi_target_util.h       |    2 
>  3 files changed, 25 insertions(+), 17 deletions(-)
> 

This looks correct to me..

Thomas, once your able to confirm please include your 'Tested-by' and
I'll include for the next -rc3 PULL request.

Thanks!

--nab

> diff --git a/drivers/target/iscsi/iscsi_target_parameters.c b/drivers/target/iscsi/iscsi_target_parameters.c
> index 4d2e23fc76fd..b80239250a1c 100644
> --- a/drivers/target/iscsi/iscsi_target_parameters.c
> +++ b/drivers/target/iscsi/iscsi_target_parameters.c
> @@ -79,7 +79,7 @@ int iscsi_login_tx_data(
>  	 */
>  	conn->if_marker += length;
>  
> -	tx_sent = tx_data(conn, &iov[0], iov_cnt, length);
> +	tx_sent = tx_data(conn, &iov[0], iov_cnt, length, 0);
>  	if (tx_sent != length) {
>  		pr_err("tx_data returned %d, expecting %d.\n",
>  				tx_sent, length);
> diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
> index 0819e688a398..3c529f7c61ce 100644
> --- a/drivers/target/iscsi/iscsi_target_util.c
> +++ b/drivers/target/iscsi/iscsi_target_util.c
> @@ -1165,7 +1165,7 @@ send_data:
>  		iov_count = cmd->iov_misc_count;
>  	}
>  
> -	tx_sent = tx_data(conn, &iov[0], iov_count, tx_size);
> +	tx_sent = tx_data(conn, &iov[0], iov_count, tx_size, 0);
>  	if (tx_size != tx_sent) {
>  		if (tx_sent == -EAGAIN) {
>  			pr_err("tx_data() returned -EAGAIN\n");
> @@ -1196,7 +1196,8 @@ send_hdr:
>  	iov.iov_base = cmd->pdu;
>  	iov.iov_len = tx_hdr_size;
>  
> -	tx_sent = tx_data(conn, &iov, 1, tx_hdr_size);
> +	tx_sent = tx_data(conn, &iov, 1, tx_hdr_size,
> +			  cmd->tx_size != tx_hdr_size ? MSG_MORE : 0);
>  	if (tx_hdr_size != tx_sent) {
>  		if (tx_sent == -EAGAIN) {
>  			pr_err("tx_data() returned -EAGAIN\n");
> @@ -1225,18 +1226,24 @@ send_hdr:
>  	while (data_len) {
>  		u32 space = (sg->length - offset);
>  		u32 sub_len = min_t(u32, data_len, space);
> +		int flags = 0;
> +	
> +		if ((data_len != sub_len) || cmd->padding ||
> +		    conn->conn_ops->DataDigest)
> +			flags = MSG_SENDPAGE_NOTLAST | MSG_MORE;
> +
>  send_pg:
>  		tx_sent = conn->sock->ops->sendpage(conn->sock,
> -					sg_page(sg), sg->offset + offset, sub_len, 0);
> +						    sg_page(sg),
> +						    sg->offset + offset,
> +						    sub_len, flags);
>  		if (tx_sent != sub_len) {
>  			if (tx_sent == -EAGAIN) {
> -				pr_err("tcp_sendpage() returned"
> -						" -EAGAIN\n");
> +				pr_err("tcp_sendpage() returned -EAGAIN\n");
>  				goto send_pg;
>  			}
>  
> -			pr_err("tcp_sendpage() failure: %d\n",
> -					tx_sent);
> +			pr_err("tcp_sendpage() failure: %d\n", tx_sent);
>  			return -1;
>  		}
>  
> @@ -1249,7 +1256,8 @@ send_padding:
>  	if (cmd->padding) {
>  		struct kvec *iov_p = &cmd->iov_data[iov_off++];
>  
> -		tx_sent = tx_data(conn, iov_p, 1, cmd->padding);
> +		tx_sent = tx_data(conn, iov_p, 1, cmd->padding,
> +				  conn->conn_ops->DataDigest ? MSG_MORE : 0);
>  		if (cmd->padding != tx_sent) {
>  			if (tx_sent == -EAGAIN) {
>  				pr_err("tx_data() returned -EAGAIN\n");
> @@ -1263,7 +1271,7 @@ send_datacrc:
>  	if (conn->conn_ops->DataDigest) {
>  		struct kvec *iov_d = &cmd->iov_data[iov_off];
>  
> -		tx_sent = tx_data(conn, iov_d, 1, ISCSI_CRC_LEN);
> +		tx_sent = tx_data(conn, iov_d, 1, ISCSI_CRC_LEN, 0);
>  		if (ISCSI_CRC_LEN != tx_sent) {
>  			if (tx_sent == -EAGAIN) {
>  				pr_err("tx_data() returned -EAGAIN\n");
> @@ -1349,11 +1357,12 @@ static int iscsit_do_rx_data(
>  
>  static int iscsit_do_tx_data(
>  	struct iscsi_conn *conn,
> -	struct iscsi_data_count *count)
> +	struct iscsi_data_count *count,
> +	int flags)
>  {
>  	int data = count->data_length, total_tx = 0, tx_loop = 0, iov_len;
>  	struct kvec *iov_p;
> -	struct msghdr msg;
> +	struct msghdr msg = { .msg_flags = flags };
>  
>  	if (!conn || !conn->sock || !conn->conn_ops)
>  		return -1;
> @@ -1363,8 +1372,6 @@ static int iscsit_do_tx_data(
>  		return -1;
>  	}
>  
> -	memset(&msg, 0, sizeof(struct msghdr));
> -
>  	iov_p = count->iov;
>  	iov_len = count->iov_count;
>  
> @@ -1408,7 +1415,8 @@ int tx_data(
>  	struct iscsi_conn *conn,
>  	struct kvec *iov,
>  	int iov_count,
> -	int data)
> +	int data,
> +	int flags)
>  {
>  	struct iscsi_data_count c;
>  
> @@ -1421,7 +1429,7 @@ int tx_data(
>  	c.data_length = data;
>  	c.type = ISCSI_TX_DATA;
>  
> -	return iscsit_do_tx_data(conn, &c);
> +	return iscsit_do_tx_data(conn, &c, flags);
>  }
>  
>  void iscsit_collect_login_stats(
> diff --git a/drivers/target/iscsi/iscsi_target_util.h b/drivers/target/iscsi/iscsi_target_util.h
> index e4fc34a02f57..1b4f06801adc 100644
> --- a/drivers/target/iscsi/iscsi_target_util.h
> +++ b/drivers/target/iscsi/iscsi_target_util.h
> @@ -54,7 +54,7 @@ extern int iscsit_print_dev_to_proc(char *, char **, off_t, int);
>  extern int iscsit_print_sessions_to_proc(char *, char **, off_t, int);
>  extern int iscsit_print_tpg_to_proc(char *, char **, off_t, int);
>  extern int rx_data(struct iscsi_conn *, struct kvec *, int, int);
> -extern int tx_data(struct iscsi_conn *, struct kvec *, int, int);
> +extern int tx_data(struct iscsi_conn *, struct kvec *, int, int, int);
>  extern void iscsit_collect_login_stats(struct iscsi_conn *, u8, u8);
>  extern struct iscsi_tiqn *iscsit_snmp_get_tiqn(struct iscsi_conn *);
>  
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] dp83640: Get gpio and master/slave configuration from DT
From: Richard Cochran @ 2014-02-10 18:46 UTC (permalink / raw)
  To: Stefan Sørensen
  Cc: grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1392037240-30913-1-git-send-email-stefan.sorensen-usnHOLptxrsHrNJx0XZkJA@public.gmane.org>

On Mon, Feb 10, 2014 at 02:00:40PM +0100, Stefan Sørensen wrote:
> This patch removes the module parameters gpio_tab and chosen_phy in favour of
> retrieving the configuration from DT through the properties

Can we please keep the module parameters? I have two platforms with
phyters neither of which will ever support DT, namely ixp and m68k,
and I want to run recent kernels on them.

Are module parameters even considered user ABI?

Even if not, still I want to retain the module params until there is
an alternative for non-DT platforms.

> 	dp83640,slave
> 	dp83640,calibrate-gpio
> 	dp83640,perout-gpios
> 	dp83640,extts-gpios
> The configuration is now stored for each master clock device, allowing different 
> gpio setups for each master.

What do you mean by "each master"? Do you mean each individual PHY device
or each group of PHYs on the same MDIO bus?

Thanks,
Richard
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH iproute2 net-next-for-3.13 1/2] iplink: display slave information
From: Michal Kubecek @ 2014-02-10 18:24 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20140210090916.34d4577e@nehalam.linuxnetplumber.net>

On Mon, Feb 10, 2014 at 09:09:16AM -0800, Stephen Hemminger wrote:
> On Mon, 10 Feb 2014 09:11:46 +0100 (CET)
> Michal Kubecek <mkubecek@suse.cz> wrote:
> 
> > +	if (1) {
> > +		struct rtattr *attr[lu->maxslattr+1], **data = NULL;
> > +
> > +		if (linkinfo[IFLA_INFO_SLAVE_DATA]) {
> > +			parse_rtattr_nested(attr, lu->maxslattr,
> > +					    linkinfo[IFLA_INFO_SLAVE_DATA]);
> > +			data = attr;
> > +		}
> > +		lu->print_slave_opt(lu, fp, data);
> > +	}
> >  }
> 
> Ugly, please don't use if (1)

Normally I woudn't but it was already there this way in the original
code so I assumed there was a reason for that (even if I couldn't see
an obvious one).

Should I send a v2 or are you going to use the implementation submitted
by Jiri Pirko?

                                                       Michal Kubecek

^ permalink raw reply

* Re: 6lowpan: lockless tx queue of routing netlink device
From: Alexander Aring @ 2014-02-10 18:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-zigbee-devel
In-Reply-To: <1392056186.6615.40.camel@edumazet-glaptop2.roam.corp.google.com>

Hi,

On Mon, Feb 10, 2014 at 10:16:26AM -0800, Eric Dumazet wrote:
> On Mon, 2014-02-10 at 18:33 +0100, Alexander Aring wrote:
> > Hi Eric,
> > 
> > On Sun, Feb 09, 2014 at 04:41:47AM -0800, Eric Dumazet wrote:
> > > 
> > > Please try the following fix, thanks for this report !
> > > 
> > > diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
> > > index 48b25c0af4d0..069af33013c4 100644
> > > --- a/net/ieee802154/6lowpan.c
> > > +++ b/net/ieee802154/6lowpan.c
> > > @@ -533,7 +533,27 @@ static struct header_ops lowpan_header_ops = {
> > >  	.create	= lowpan_header_create,
> > >  };
> > >  
> > > +static struct lock_class_key lowpan_tx_busylock;
> > > +static struct lock_class_key lowpan_netdev_xmit_lock_key;
> > > +
> > > +static void lowpan_set_lockdep_class_one(struct net_device *dev,
> > > +					 struct netdev_queue *txq,
> > > +					 void *_unused)
> > > +{
> > > +	lockdep_set_class(&txq->_xmit_lock,
> > > +			  &lowpan_netdev_xmit_lock_key);
> > > +}
> > > +
> > > +
> > > +static int lowpan_dev_init(struct net_device *dev)
> > > +{
> > > +	netdev_for_each_tx_queue(dev, lowpan_set_lockdep_class_one, NULL);
> > > +	dev->qdisc_tx_busylock = &lowpan_tx_busylock;
> > > +	return 0;
> > > +}
> > > +
> > >  static const struct net_device_ops lowpan_netdev_ops = {
> > > +	.ndo_init		= lowpan_dev_init,
> > >  	.ndo_start_xmit		= lowpan_xmit,
> > >  	.ndo_set_mac_address	= lowpan_set_address,
> > >  };
> > 
> > thanks, this fixed the issue. What we should do as next?
> > 
> > Should I create a patch for this or do you want to make a patch?
> 
> I'll take care of this, don't worry ;)
> 
> Thanks for testing !
> 
no problem. Maybe you can try to give me some explanation what you did
there?

I see you make a lockdep_set_class for each tx queue and assign the 
lowpan_netdev_xmit_lock_key. So why it's better to make a own
"lock_key".

thanks.

- Alex

^ permalink raw reply

* Re: 6lowpan: lockless tx queue of routing netlink device
From: Eric Dumazet @ 2014-02-10 18:16 UTC (permalink / raw)
  To: Alexander Aring
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-zigbee-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
In-Reply-To: <20140210173314.GA12594@omega>

On Mon, 2014-02-10 at 18:33 +0100, Alexander Aring wrote:
> Hi Eric,
> 
> On Sun, Feb 09, 2014 at 04:41:47AM -0800, Eric Dumazet wrote:
> > 
> > Please try the following fix, thanks for this report !
> > 
> > diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
> > index 48b25c0af4d0..069af33013c4 100644
> > --- a/net/ieee802154/6lowpan.c
> > +++ b/net/ieee802154/6lowpan.c
> > @@ -533,7 +533,27 @@ static struct header_ops lowpan_header_ops = {
> >  	.create	= lowpan_header_create,
> >  };
> >  
> > +static struct lock_class_key lowpan_tx_busylock;
> > +static struct lock_class_key lowpan_netdev_xmit_lock_key;
> > +
> > +static void lowpan_set_lockdep_class_one(struct net_device *dev,
> > +					 struct netdev_queue *txq,
> > +					 void *_unused)
> > +{
> > +	lockdep_set_class(&txq->_xmit_lock,
> > +			  &lowpan_netdev_xmit_lock_key);
> > +}
> > +
> > +
> > +static int lowpan_dev_init(struct net_device *dev)
> > +{
> > +	netdev_for_each_tx_queue(dev, lowpan_set_lockdep_class_one, NULL);
> > +	dev->qdisc_tx_busylock = &lowpan_tx_busylock;
> > +	return 0;
> > +}
> > +
> >  static const struct net_device_ops lowpan_netdev_ops = {
> > +	.ndo_init		= lowpan_dev_init,
> >  	.ndo_start_xmit		= lowpan_xmit,
> >  	.ndo_set_mac_address	= lowpan_set_address,
> >  };
> 
> thanks, this fixed the issue. What we should do as next?
> 
> Should I create a patch for this or do you want to make a patch?

I'll take care of this, don't worry ;)

Thanks for testing !



------------------------------------------------------------------------------
Android&trade; apps run on BlackBerry&reg;10
Introducing the new BlackBerry 10.2.1 Runtime for Android apps.
Now with support for Jelly Bean, Bluetooth, Mapview and more.
Get your Android app in front of a whole new audience.  Start now.
http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk

^ permalink raw reply

* Re: [PATCH v2] SUNRPC: Allow one callback request to be received from two sk_buff
From: Trond Myklebust @ 2014-02-10 17:46 UTC (permalink / raw)
  To: shaobingqing
  Cc: bfields-H+wXaHxf7aLQT0dZR+AlfA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1391826543-3102-1-git-send-email-shaobingqing-Gb3srWounXyPt1CcHtbs0g@public.gmane.org>

On Sat, 2014-02-08 at 10:29 +0800, shaobingqing wrote:
> In current code, there only one struct rpc_rqst is prealloced. If one
> callback request is received from two sk_buff, the xprt_alloc_bc_request
> would be execute two times with the same transport->xid. The first time
> xprt_alloc_bc_request will alloc one struct rpc_rqst and the TCP_RCV_COPY_DATA
> bit of transport->tcp_flags will not be cleared. The second time
> xprt_alloc_bc_request could not alloc struct rpc_rqst any more and NULL
> pointer will be returned, then xprt_force_disconnect occur. I think one
> callback request can be allowed to be received from two sk_buff.
> 
> Signed-off-by: shaobingqing <shaobingqing-Gb3srWounXyPt1CcHtbs0g@public.gmane.org>
> ---

There are still several problems with this patch.
For one thing, xprt->req_first is never cleared, which means that the
tests you add in xs_tcp_read_callback() could trigger even if the
request is in use by the callback server.
Furthermore, if the xprt is destroyed before the RPC callback has been
filled, then the request is leaked: it is no longer on the xprt
backchannel list, so nothing will clean it up there, and nothing ever
checks xprt->req_first, so it won't get cleaned up there either.

Instead of adding a new tracking mechanism, what say we just keep the
request on the xprt callback list until it has been completely filled.

IOW: something like the following patch:

8<-------------------------------------------------------------------
>From 2aa57b6c6c6025e17d7ca164bbda01767af662be Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
Date: Mon, 10 Feb 2014 11:18:39 -0500
Subject: [PATCH] SUNRPC: RPC callbacks may be split across several TCP
 segments

Since TCP is a stream protocol, our callback read code needs to take into
account the fact that RPC callbacks are not always confined to a single
TCP segment.
This patch adds support for multiple TCP segments by ensuring that we
only remove the rpc_rqst structure from the 'free backchannel requests'
list once the data has been completely received. We rely on the fact
that TCP data is ordered for the duration of the connection.

Reported-by: shaobingqing <shaobingqing-Gb3srWounXyPt1CcHtbs0g@public.gmane.org>
Signed-off-by: Trond Myklebust <trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
---
 include/linux/sunrpc/bc_xprt.h |  3 +-
 net/sunrpc/backchannel_rqst.c  | 92 +++++++++++++++++++++++++++++-------------
 net/sunrpc/xprtsock.c          | 28 ++++---------
 3 files changed, 73 insertions(+), 50 deletions(-)

diff --git a/include/linux/sunrpc/bc_xprt.h b/include/linux/sunrpc/bc_xprt.h
index 969c0a671dbf..2ca67b55e0fe 100644
--- a/include/linux/sunrpc/bc_xprt.h
+++ b/include/linux/sunrpc/bc_xprt.h
@@ -32,7 +32,8 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #include <linux/sunrpc/sched.h>
 
 #ifdef CONFIG_SUNRPC_BACKCHANNEL
-struct rpc_rqst *xprt_alloc_bc_request(struct rpc_xprt *xprt);
+struct rpc_rqst *xprt_lookup_bc_request(struct rpc_xprt *xprt, __be32 xid);
+void xprt_complete_bc_request(struct rpc_rqst *req, uint32_t copied);
 void xprt_free_bc_request(struct rpc_rqst *req);
 int xprt_setup_backchannel(struct rpc_xprt *, unsigned int min_reqs);
 void xprt_destroy_backchannel(struct rpc_xprt *, unsigned int max_reqs);
diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
index 890a29912d5a..7cceeb06b939 100644
--- a/net/sunrpc/backchannel_rqst.c
+++ b/net/sunrpc/backchannel_rqst.c
@@ -210,39 +210,23 @@ out:
 }
 EXPORT_SYMBOL_GPL(xprt_destroy_backchannel);
 
-/*
- * One or more rpc_rqst structure have been preallocated during the
- * backchannel setup.  Buffer space for the send and private XDR buffers
- * has been preallocated as well.  Use xprt_alloc_bc_request to allocate
- * to this request.  Use xprt_free_bc_request to return it.
- *
- * We know that we're called in soft interrupt context, grab the spin_lock
- * since there is no need to grab the bottom half spin_lock.
- *
- * Return an available rpc_rqst, otherwise NULL if non are available.
- */
-struct rpc_rqst *xprt_alloc_bc_request(struct rpc_xprt *xprt)
+static struct rpc_rqst *xprt_alloc_bc_request(struct rpc_xprt *xprt, __be32 xid)
 {
-	struct rpc_rqst *req;
+	struct rpc_rqst *req = NULL;
 
 	dprintk("RPC:       allocate a backchannel request\n");
-	spin_lock(&xprt->bc_pa_lock);
-	if (!list_empty(&xprt->bc_pa_list)) {
-		req = list_first_entry(&xprt->bc_pa_list, struct rpc_rqst,
-				rq_bc_pa_list);
-		list_del(&req->rq_bc_pa_list);
-	} else {
-		req = NULL;
-	}
-	spin_unlock(&xprt->bc_pa_lock);
+	if (list_empty(&xprt->bc_pa_list))
+		goto not_found;
 
-	if (req != NULL) {
-		set_bit(RPC_BC_PA_IN_USE, &req->rq_bc_pa_state);
-		req->rq_reply_bytes_recvd = 0;
-		req->rq_bytes_sent = 0;
-		memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
+	req = list_first_entry(&xprt->bc_pa_list, struct rpc_rqst,
+				rq_bc_pa_list);
+	req->rq_reply_bytes_recvd = 0;
+	req->rq_bytes_sent = 0;
+	memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
 			sizeof(req->rq_private_buf));
-	}
+	req->rq_xid = xid;
+	req->rq_connect_cookie = xprt->connect_cookie;
+not_found:
 	dprintk("RPC:       backchannel req=%p\n", req);
 	return req;
 }
@@ -257,6 +241,7 @@ void xprt_free_bc_request(struct rpc_rqst *req)
 
 	dprintk("RPC:       free backchannel req=%p\n", req);
 
+	req->rq_connect_cookie = xprt->connect_cookie - 1;
 	smp_mb__before_clear_bit();
 	WARN_ON_ONCE(!test_bit(RPC_BC_PA_IN_USE, &req->rq_bc_pa_state));
 	clear_bit(RPC_BC_PA_IN_USE, &req->rq_bc_pa_state);
@@ -279,7 +264,56 @@ void xprt_free_bc_request(struct rpc_rqst *req)
 	 * may be reused by a new callback request.
 	 */
 	spin_lock_bh(&xprt->bc_pa_lock);
-	list_add(&req->rq_bc_pa_list, &xprt->bc_pa_list);
+	list_add_tail(&req->rq_bc_pa_list, &xprt->bc_pa_list);
 	spin_unlock_bh(&xprt->bc_pa_lock);
 }
 
+/*
+ * One or more rpc_rqst structure have been preallocated during the
+ * backchannel setup.  Buffer space for the send and private XDR buffers
+ * has been preallocated as well.  Use xprt_alloc_bc_request to allocate
+ * to this request.  Use xprt_free_bc_request to return it.
+ *
+ * We know that we're called in soft interrupt context, grab the spin_lock
+ * since there is no need to grab the bottom half spin_lock.
+ *
+ * Return an available rpc_rqst, otherwise NULL if non are available.
+ */
+struct rpc_rqst *xprt_lookup_bc_request(struct rpc_xprt *xprt, __be32 xid)
+{
+	struct rpc_rqst *req;
+
+	spin_lock(&xprt->bc_pa_lock);
+	list_for_each_entry(req, &xprt->bc_pa_list, rq_bc_pa_list) {
+		if (req->rq_connect_cookie != xprt->connect_cookie)
+			continue;
+		if (req->rq_xid == xid)
+			goto found;
+	}
+	req = xprt_alloc_bc_request(xprt, xid);
+found:
+	spin_unlock(&xprt->bc_pa_lock);
+	return req;
+}
+
+/*
+ * Add callback request to callback list.  The callback
+ * service sleeps on the sv_cb_waitq waiting for new
+ * requests.  Wake it up after adding enqueing the
+ * request.
+ */
+void xprt_complete_bc_request(struct rpc_rqst *req, uint32_t copied)
+{
+	struct rpc_xprt *xprt = req->rq_xprt;
+	struct svc_serv *bc_serv = xprt->bc_serv;
+
+	req->rq_private_buf.len = copied;
+	set_bit(RPC_BC_PA_IN_USE, &req->rq_bc_pa_state);
+
+	dprintk("RPC:       add callback request to list\n");
+	spin_lock(&bc_serv->sv_cb_lock);
+	list_move(&req->rq_bc_list, &bc_serv->sv_cb_list);
+	wake_up(&bc_serv->sv_cb_waitq);
+	spin_unlock(&bc_serv->sv_cb_lock);
+}
+
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 817a1e523969..6497c221612c 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1302,41 +1302,29 @@ static inline int xs_tcp_read_reply(struct rpc_xprt *xprt,
  * If we're unable to obtain the rpc_rqst we schedule the closing of the
  * connection and return -1.
  */
-static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
+static int xs_tcp_read_callback(struct rpc_xprt *xprt,
 				       struct xdr_skb_reader *desc)
 {
 	struct sock_xprt *transport =
 				container_of(xprt, struct sock_xprt, xprt);
 	struct rpc_rqst *req;
 
-	req = xprt_alloc_bc_request(xprt);
+	/* Look up and lock the request corresponding to the given XID */
+	spin_lock(&xprt->transport_lock);
+	req = xprt_lookup_bc_request(xprt, transport->tcp_xid);
 	if (req == NULL) {
+		spin_unlock(&xprt->transport_lock);
 		printk(KERN_WARNING "Callback slot table overflowed\n");
 		xprt_force_disconnect(xprt);
 		return -1;
 	}
 
-	req->rq_xid = transport->tcp_xid;
 	dprintk("RPC:       read callback  XID %08x\n", ntohl(req->rq_xid));
 	xs_tcp_read_common(xprt, desc, req);
 
-	if (!(transport->tcp_flags & TCP_RCV_COPY_DATA)) {
-		struct svc_serv *bc_serv = xprt->bc_serv;
-
-		/*
-		 * Add callback request to callback list.  The callback
-		 * service sleeps on the sv_cb_waitq waiting for new
-		 * requests.  Wake it up after adding enqueing the
-		 * request.
-		 */
-		dprintk("RPC:       add callback request to list\n");
-		spin_lock(&bc_serv->sv_cb_lock);
-		list_add(&req->rq_bc_list, &bc_serv->sv_cb_list);
-		spin_unlock(&bc_serv->sv_cb_lock);
-		wake_up(&bc_serv->sv_cb_waitq);
-	}
-
-	req->rq_private_buf.len = transport->tcp_copied;
+	if (!(transport->tcp_flags & TCP_RCV_COPY_DATA))
+		xprt_complete_bc_request(req, transport->tcp_copied);
+	spin_unlock(&xprt->transport_lock);
 
 	return 0;
 }
-- 
1.8.5.3


-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH] 6lowpan: Remove unused pointer in lowpan_header_create()
From: Alexander Aring @ 2014-02-10 17:40 UTC (permalink / raw)
  To: David Miller
  Cc: cengelma, netdev, alex.bluesman.smirnov, dbaryshkov,
	jukka.rissanen
In-Reply-To: <20140209.183829.1382019488417715676.davem@davemloft.net>

Hi,

I need the hdr pointer for my upcomming patches. Nevertheless it
should be removed... it's not used anymore.

I am wondering why this patch was accepted, it's not a bug fix. Is
net-next open again?

- Alex

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox