Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next] ipv6: Remove superfluous NULL pointer check in ipv6_local_rxpmtu
From: Steffen Klassert @ 2011-10-11 12:01 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

The pointer to mtu_info is taken from the common buffer
of the skb, thus it can't be a NULL pointer. This patch
removes this check on mtu_info.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/datagram.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index b46e9f8..e248069 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -297,10 +297,6 @@ void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu)
 	ipv6_addr_copy(&iph->daddr, &fl6->daddr);
 
 	mtu_info = IP6CBMTU(skb);
-	if (!mtu_info) {
-		kfree_skb(skb);
-		return;
-	}
 
 	mtu_info->ip6m_mtu = mtu;
 	mtu_info->ip6m_addr.sin6_family = AF_INET6;
-- 
1.7.0.4

^ permalink raw reply related

* Re: e100 + VLANs?
From: Michael Tokarev @ 2011-10-11 11:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Lamparter, jeffrey.t.kirsher, netdev
In-Reply-To: <CAL4WiiphpyizVaNkvOdJp1+UK53TkGw9RXnC-vzH7fTNpBAAEA@mail.gmail.com>

11.10.2011 15:25, Eric Dumazet wrote:
>> So, is that a hardware limitation?
> 
> Its a driver bug
> 
> This comes from fact that sizeof(struct rfd) = 16
> 
> and VLAN_ETH_FRAME_LEN is 1518
> 
> driver mixes VLAN_ETH_FRAME_LEN and 1500+4+sizeof(struct rfd)   (1520, not 1518)
> 
> It therefore misses 2 bytes for large frames (VLAN tagged)
> 
> Fix is to remove VLAN_ETH_FRAME_LEN references for good...
> 
> diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> index c1352c6..3287d31 100644

Hmm.  This does not _exactly_ work.

I applied it on top of 2.6.32, patch applied but with some fuzz - I checked
manually and it appears to be ok.

Now, with this patch applied, I see on the e100 side:

00:1f:c6:ef:e5:1b > 00:90:27:30:6d:1c, ethertype 802.1Q (0x8100), length 1504: vlan 6, p 0, ethertype IPv4, truncated-ip - 1 bytes missing! (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 1487)
    10.48.6.1 > 10.48.6.2: ICMP echo request, id 27613, seq 6, length 1467

when pinging it with -s 1459 (1487 total size).

I added extra +40 for RFD_BUF_LEN define and recompiled
(this resulted in RFD_BUF_LEN=1560, - I've added a printk
just to be sure).

Now I see the same effect as before the patch:  maximum
packet size that goes and can be seen on the e100 side
is 1468(1496) (ping), which results in this on e100 side:

15:54:44.830941 00:1f:c6:ef:e5:1b > 00:90:27:30:6d:1c, ethertype 802.1Q (0x8100), length 1514: vlan 6, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 1496)
    10.48.6.1 > 10.48.6.2: ICMP echo request, id 28735, seq 4, length 1476
15:54:44.831025 00:90:27:30:6d:1c > 00:1f:c6:ef:e5:1b, ethertype 802.1Q (0x8100), length 1514: vlan 6, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 12010, offset 0, flags [none], proto ICMP (1), length 1496)
    10.48.6.2 > 10.48.6.1: ICMP echo reply, id 28735, seq 4, length 1476

(and it works).  With -s 1469, no ICMP packets can be seen
on e100 side.

So it still may be the hardware... :)

Thank you!

/mjt

^ permalink raw reply

* [PATCH net-next] xfrm: Simplify the replay check and advance functions
From: Steffen Klassert @ 2011-10-11 11:58 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, netdev

The replay check and replay advance functions had some code
duplications. This patch removes the duplications.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_replay.c |   98 ++++++++++++++++-------------------------------
 1 files changed, 34 insertions(+), 64 deletions(-)

diff --git a/net/xfrm/xfrm_replay.c b/net/xfrm/xfrm_replay.c
index b11ea69..6ca3574 100644
--- a/net/xfrm/xfrm_replay.c
+++ b/net/xfrm/xfrm_replay.c
@@ -203,8 +203,6 @@ static int xfrm_replay_check_bmp(struct xfrm_state *x,
 	if (!replay_esn->replay_window)
 		return 0;
 
-	pos = (replay_esn->seq - 1) % replay_esn->replay_window;
-
 	if (unlikely(seq == 0))
 		goto err;
 
@@ -216,19 +214,18 @@ static int xfrm_replay_check_bmp(struct xfrm_state *x,
 		goto err;
 	}
 
-	if (pos >= diff) {
+	pos = (replay_esn->seq - 1) % replay_esn->replay_window;
+
+	if (pos >= diff)
 		bitnr = (pos - diff) % replay_esn->replay_window;
-		nr = bitnr >> 5;
-		bitnr = bitnr & 0x1F;
-		if (replay_esn->bmp[nr] & (1U << bitnr))
-			goto err_replay;
-	} else {
+	else
 		bitnr = replay_esn->replay_window - (diff - pos);
-		nr = bitnr >> 5;
-		bitnr = bitnr & 0x1F;
-		if (replay_esn->bmp[nr] & (1U << bitnr))
-			goto err_replay;
-	}
+
+	nr = bitnr >> 5;
+	bitnr = bitnr & 0x1F;
+	if (replay_esn->bmp[nr] & (1U << bitnr))
+		goto err_replay;
+
 	return 0;
 
 err_replay:
@@ -259,39 +256,27 @@ static void xfrm_replay_advance_bmp(struct xfrm_state *x, __be32 net_seq)
 				bitnr = bitnr & 0x1F;
 				replay_esn->bmp[nr] &=  ~(1U << bitnr);
 			}
-
-			bitnr = (pos + diff) % replay_esn->replay_window;
-			nr = bitnr >> 5;
-			bitnr = bitnr & 0x1F;
-			replay_esn->bmp[nr] |= (1U << bitnr);
 		} else {
 			nr = (replay_esn->replay_window - 1) >> 5;
 			for (i = 0; i <= nr; i++)
 				replay_esn->bmp[i] = 0;
-
-			bitnr = (pos + diff) % replay_esn->replay_window;
-			nr = bitnr >> 5;
-			bitnr = bitnr & 0x1F;
-			replay_esn->bmp[nr] |= (1U << bitnr);
 		}
 
+		bitnr = (pos + diff) % replay_esn->replay_window;
 		replay_esn->seq = seq;
 	} else {
 		diff = replay_esn->seq - seq;
 
-		if (pos >= diff) {
+		if (pos >= diff)
 			bitnr = (pos - diff) % replay_esn->replay_window;
-			nr = bitnr >> 5;
-			bitnr = bitnr & 0x1F;
-			replay_esn->bmp[nr] |= (1U << bitnr);
-		} else {
+		else
 			bitnr = replay_esn->replay_window - (diff - pos);
-			nr = bitnr >> 5;
-			bitnr = bitnr & 0x1F;
-			replay_esn->bmp[nr] |= (1U << bitnr);
-		}
 	}
 
+	nr = bitnr >> 5;
+	bitnr = bitnr & 0x1F;
+	replay_esn->bmp[nr] |= (1U << bitnr);
+
 	if (xfrm_aevent_is_on(xs_net(x)))
 		xfrm_replay_notify(x, XFRM_REPLAY_UPDATE);
 }
@@ -390,8 +375,6 @@ static int xfrm_replay_check_esn(struct xfrm_state *x,
 	if (!wsize)
 		return 0;
 
-	pos = (replay_esn->seq - 1) % replay_esn->replay_window;
-
 	if (unlikely(seq == 0 && replay_esn->seq_hi == 0 &&
 		     (replay_esn->seq < replay_esn->replay_window - 1)))
 		goto err;
@@ -415,19 +398,18 @@ static int xfrm_replay_check_esn(struct xfrm_state *x,
 		goto err;
 	}
 
-	if (pos >= diff) {
+	pos = (replay_esn->seq - 1) % replay_esn->replay_window;
+
+	if (pos >= diff)
 		bitnr = (pos - diff) % replay_esn->replay_window;
-		nr = bitnr >> 5;
-		bitnr = bitnr & 0x1F;
-		if (replay_esn->bmp[nr] & (1U << bitnr))
-			goto err_replay;
-	} else {
+	else
 		bitnr = replay_esn->replay_window - (diff - pos);
-		nr = bitnr >> 5;
-		bitnr = bitnr & 0x1F;
-		if (replay_esn->bmp[nr] & (1U << bitnr))
-			goto err_replay;
-	}
+
+	nr = bitnr >> 5;
+	bitnr = bitnr & 0x1F;
+	if (replay_esn->bmp[nr] & (1U << bitnr))
+		goto err_replay;
+
 	return 0;
 
 err_replay:
@@ -465,22 +447,13 @@ static void xfrm_replay_advance_esn(struct xfrm_state *x, __be32 net_seq)
 				bitnr = bitnr & 0x1F;
 				replay_esn->bmp[nr] &=  ~(1U << bitnr);
 			}
-
-			bitnr = (pos + diff) % replay_esn->replay_window;
-			nr = bitnr >> 5;
-			bitnr = bitnr & 0x1F;
-			replay_esn->bmp[nr] |= (1U << bitnr);
 		} else {
 			nr = (replay_esn->replay_window - 1) >> 5;
 			for (i = 0; i <= nr; i++)
 				replay_esn->bmp[i] = 0;
-
-			bitnr = (pos + diff) % replay_esn->replay_window;
-			nr = bitnr >> 5;
-			bitnr = bitnr & 0x1F;
-			replay_esn->bmp[nr] |= (1U << bitnr);
 		}
 
+		bitnr = (pos + diff) % replay_esn->replay_window;
 		replay_esn->seq = seq;
 
 		if (unlikely(wrap > 0))
@@ -488,19 +461,16 @@ static void xfrm_replay_advance_esn(struct xfrm_state *x, __be32 net_seq)
 	} else {
 		diff = replay_esn->seq - seq;
 
-		if (pos >= diff) {
+		if (pos >= diff)
 			bitnr = (pos - diff) % replay_esn->replay_window;
-			nr = bitnr >> 5;
-			bitnr = bitnr & 0x1F;
-			replay_esn->bmp[nr] |= (1U << bitnr);
-		} else {
+		else
 			bitnr = replay_esn->replay_window - (diff - pos);
-			nr = bitnr >> 5;
-			bitnr = bitnr & 0x1F;
-			replay_esn->bmp[nr] |= (1U << bitnr);
-		}
 	}
 
+	nr = bitnr >> 5;
+	bitnr = bitnr & 0x1F;
+	replay_esn->bmp[nr] |= (1U << bitnr);
+
 	if (xfrm_aevent_is_on(xs_net(x)))
 		xfrm_replay_notify(x, XFRM_REPLAY_UPDATE);
 }
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH 2/2] xfrm6: Don't call icmpv6_send on local error
From: Steffen Klassert @ 2011-10-11 11:44 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, netdev
In-Reply-To: <20111011114251.GH1830@secunet.com>

Calling icmpv6_send() on a local message size error leads to
an incorrect update of the path mtu. So use xfrm6_local_rxpmtu()
to notify about the pmtu if the IPV6_DONTFRAG socket option is
set on an udp or raw socket, according RFC 3542 and use
ipv6_local_error() otherwise.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/xfrm6_output.c |   56 +++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 49a91c5..faae417 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -28,6 +28,43 @@ int xfrm6_find_1stfragopt(struct xfrm_state *x, struct sk_buff *skb,
 
 EXPORT_SYMBOL(xfrm6_find_1stfragopt);
 
+static int xfrm6_local_dontfrag(struct sk_buff *skb)
+{
+	int proto;
+	struct sock *sk = skb->sk;
+
+	if (sk) {
+		proto = sk->sk_protocol;
+
+		if (proto == IPPROTO_UDP || proto == IPPROTO_RAW)
+			return inet6_sk(sk)->dontfrag;
+	}
+
+	return 0;
+}
+
+static void xfrm6_local_rxpmtu(struct sk_buff *skb, u32 mtu)
+{
+	struct flowi6 fl6;
+	struct sock *sk = skb->sk;
+
+	fl6.flowi6_oif = sk->sk_bound_dev_if;
+	ipv6_addr_copy(&fl6.daddr, &ipv6_hdr(skb)->daddr);
+
+	ipv6_local_rxpmtu(sk, &fl6, mtu);
+}
+
+static void xfrm6_local_error(struct sk_buff *skb, u32 mtu)
+{
+	struct flowi6 fl6;
+	struct sock *sk = skb->sk;
+
+	fl6.fl6_dport = inet_sk(sk)->inet_dport;
+	ipv6_addr_copy(&fl6.daddr, &ipv6_hdr(skb)->daddr);
+
+	ipv6_local_error(sk, EMSGSIZE, &fl6, mtu);
+}
+
 static int xfrm6_tunnel_check_size(struct sk_buff *skb)
 {
 	int mtu, ret = 0;
@@ -39,7 +76,13 @@ static int xfrm6_tunnel_check_size(struct sk_buff *skb)
 
 	if (!skb->local_df && skb->len > mtu) {
 		skb->dev = dst->dev;
-		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+
+		if (xfrm6_local_dontfrag(skb))
+			xfrm6_local_rxpmtu(skb, mtu);
+		else if (skb->sk)
+			xfrm6_local_error(skb, mtu);
+		else
+			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
 		ret = -EMSGSIZE;
 	}
 
@@ -93,9 +136,18 @@ static int __xfrm6_output(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
 	struct xfrm_state *x = dst->xfrm;
+	int mtu = ip6_skb_dst_mtu(skb);
+
+	if (skb->len > mtu && xfrm6_local_dontfrag(skb)) {
+		xfrm6_local_rxpmtu(skb, mtu);
+		return -EMSGSIZE;
+	} else if (!skb->local_df && skb->len > mtu && skb->sk) {
+		xfrm6_local_error(skb, mtu);
+		return -EMSGSIZE;
+	}
 
 	if ((x && x->props.mode == XFRM_MODE_TUNNEL) &&
-	    ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
+	    ((skb->len > mtu && !skb_is_gso(skb)) ||
 		dst_allfrag(skb_dst(skb)))) {
 			return ip6_fragment(skb, x->outer_mode->afinfo->output_finish);
 	}
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH 1/2] ipv6: Fix IPsec slowpath fragmentation problem
From: Steffen Klassert @ 2011-10-11 11:43 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, netdev
In-Reply-To: <20111011114251.GH1830@secunet.com>

ip6_append_data() builds packets based on the mtu from dst_mtu(rt->dst.path).
On IPsec the effective mtu is lower because we need to add the protocol
headers and trailers later when we do the IPsec transformations. So after
the IPsec transformations the packet might be too big, which leads to a
slowpath fragmentation then. This patch fixes this by building the packets
based on the lower IPsec mtu from dst_mtu(&rt->dst) and adapts the exthdr
handling to this.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/ip6_output.c |   18 ++++++++++++------
 net/ipv6/raw.c        |    3 +--
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 835c04b..1e20b64 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1193,6 +1193,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 	struct sk_buff *skb;
 	unsigned int maxfraglen, fragheaderlen;
 	int exthdrlen;
+	int dst_exthdrlen;
 	int hh_len;
 	int mtu;
 	int copy;
@@ -1248,7 +1249,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 		np->cork.hop_limit = hlimit;
 		np->cork.tclass = tclass;
 		mtu = np->pmtudisc == IPV6_PMTUDISC_PROBE ?
-		      rt->dst.dev->mtu : dst_mtu(rt->dst.path);
+		      rt->dst.dev->mtu : dst_mtu(&rt->dst);
 		if (np->frag_size < mtu) {
 			if (np->frag_size)
 				mtu = np->frag_size;
@@ -1259,16 +1260,17 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 		cork->length = 0;
 		sk->sk_sndmsg_page = NULL;
 		sk->sk_sndmsg_off = 0;
-		exthdrlen = rt->dst.header_len + (opt ? opt->opt_flen : 0) -
-			    rt->rt6i_nfheader_len;
+		exthdrlen = (opt ? opt->opt_flen : 0) - rt->rt6i_nfheader_len;
 		length += exthdrlen;
 		transhdrlen += exthdrlen;
+		dst_exthdrlen = rt->dst.header_len;
 	} else {
 		rt = (struct rt6_info *)cork->dst;
 		fl6 = &inet->cork.fl.u.ip6;
 		opt = np->cork.opt;
 		transhdrlen = 0;
 		exthdrlen = 0;
+		dst_exthdrlen = 0;
 		mtu = cork->fragsize;
 	}
 
@@ -1368,6 +1370,8 @@ alloc_new_skb:
 			else
 				alloclen = datalen + fragheaderlen;
 
+			alloclen += dst_exthdrlen;
+
 			/*
 			 * The last fragment gets additional space at tail.
 			 * Note: we overallocate on fragments with MSG_MODE
@@ -1419,9 +1423,9 @@ alloc_new_skb:
 			/*
 			 *	Find where to start putting bytes
 			 */
-			data = skb_put(skb, fraglen);
-			skb_set_network_header(skb, exthdrlen);
-			data += fragheaderlen;
+			data = skb_put(skb, fraglen + dst_exthdrlen);
+			skb_set_network_header(skb, exthdrlen + dst_exthdrlen);
+			data += fragheaderlen + dst_exthdrlen;
 			skb->transport_header = (skb->network_header +
 						 fragheaderlen);
 			if (fraggap) {
@@ -1434,6 +1438,7 @@ alloc_new_skb:
 				pskb_trim_unique(skb_prev, maxfraglen);
 			}
 			copy = datalen - transhdrlen - fraggap;
+
 			if (copy < 0) {
 				err = -EINVAL;
 				kfree_skb(skb);
@@ -1448,6 +1453,7 @@ alloc_new_skb:
 			length -= datalen - fraggap;
 			transhdrlen = 0;
 			exthdrlen = 0;
+			dst_exthdrlen = 0;
 			csummode = CHECKSUM_NONE;
 
 			/*
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 3486f62..6f7824e 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -542,8 +542,7 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
 		goto out;
 
 	offset = rp->offset;
-	total_len = inet_sk(sk)->cork.base.length - (skb_network_header(skb) -
-						     skb->data);
+	total_len = inet_sk(sk)->cork.base.length;
 	if (offset >= total_len - 1) {
 		err = -EINVAL;
 		ip6_flush_pending_frames(sk);
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH net-next 0/2] ipv6: IPsec pmtu/fragmentation fixes
From: Steffen Klassert @ 2011-10-11 11:42 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, netdev

This patchset fixes two issues for IPsec on ipv6. These issues
are arround, basically forever.

The patchset is based on current net-next/master.

1) Fix for a IPv6 IPsec slowpath fragmentation problem.
   Already fixed for IPv4 with
   git commit 353e5c9 (ipv4: Fix IPsec slowpath fragmentation problem).

2) Fix an incorrect update of the path mtu on IPv6 IPsec.
   Already fixed for IPv4 with
   git commit b00897b (xfrm4: Don't call icmp_send on local error).

^ permalink raw reply

* Re: e100 + VLANs?
From: Eric Dumazet @ 2011-10-11 11:25 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: David Lamparter, jeffrey.t.kirsher, netdev
In-Reply-To: <4E941198.9000307@msgid.tls.msk.ru>

[-- Attachment #1: Type: text/plain, Size: 1976 bytes --]

> So, is that a hardware limitation?

Its a driver bug

This comes from fact that sizeof(struct rfd) = 16

and VLAN_ETH_FRAME_LEN is 1518

driver mixes VLAN_ETH_FRAME_LEN and 1500+4+sizeof(struct rfd)   (1520, not 1518)

It therefore misses 2 bytes for large frames (VLAN tagged)

Fix is to remove VLAN_ETH_FRAME_LEN references for good...

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index c1352c6..3287d31 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -435,6 +435,7 @@ struct rfd {
        __le16 actual_size;
        __le16 size;
 };
+#define RFD_BUF_LEN (sizeof(struct rfd) + ETH_DATA_LEN + VLAN_HLEN)

 struct rx {
        struct rx *next, *prev;
@@ -1075,7 +1076,7 @@ static void e100_get_defaults(struct nic *nic)
        /* Template for a freshly allocated RFD */
        nic->blank_rfd.command = 0;
        nic->blank_rfd.rbd = cpu_to_le32(0xFFFFFFFF);
-       nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN);
+       nic->blank_rfd.size = cpu_to_le16(RFD_BUF_LEN);

        /* MII setup */
        nic->mii.phy_id_mask = 0x1F;
@@ -1881,7 +1882,6 @@ static inline void e100_start_receiver(struct
nic *nic, struct rx *rx)
        }
 }

-#define RFD_BUF_LEN (sizeof(struct rfd) + VLAN_ETH_FRAME_LEN)
 static int e100_rx_alloc_skb(struct nic *nic, struct rx *rx)
 {
        if (!(rx->skb = netdev_alloc_skb_ip_align(nic->netdev, RFD_BUF_LEN)))
@@ -2058,7 +2058,7 @@ static void e100_rx_clean(struct nic *nic,
unsigned int *work_done,
                pci_dma_sync_single_for_device(nic->pdev,
                        old_before_last_rx->dma_addr, sizeof(struct rfd),
                        PCI_DMA_BIDIRECTIONAL);
-               old_before_last_rfd->size = cpu_to_le16(VLAN_ETH_FRAME_LEN);
+               old_before_last_rfd->size = cpu_to_le16(RFD_BUF_LEN);
                pci_dma_sync_single_for_device(nic->pdev,
                        old_before_last_rx->dma_addr, sizeof(struct rfd),
                        PCI_DMA_BIDIRECTIONAL);

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 1434 bytes --]

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index c1352c6..3287d31 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -435,6 +435,7 @@ struct rfd {
 	__le16 actual_size;
 	__le16 size;
 };
+#define RFD_BUF_LEN (sizeof(struct rfd) + ETH_DATA_LEN + VLAN_HLEN)
 
 struct rx {
 	struct rx *next, *prev;
@@ -1075,7 +1076,7 @@ static void e100_get_defaults(struct nic *nic)
 	/* Template for a freshly allocated RFD */
 	nic->blank_rfd.command = 0;
 	nic->blank_rfd.rbd = cpu_to_le32(0xFFFFFFFF);
-	nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN);
+	nic->blank_rfd.size = cpu_to_le16(RFD_BUF_LEN);
 
 	/* MII setup */
 	nic->mii.phy_id_mask = 0x1F;
@@ -1881,7 +1882,6 @@ static inline void e100_start_receiver(struct nic *nic, struct rx *rx)
 	}
 }
 
-#define RFD_BUF_LEN (sizeof(struct rfd) + VLAN_ETH_FRAME_LEN)
 static int e100_rx_alloc_skb(struct nic *nic, struct rx *rx)
 {
 	if (!(rx->skb = netdev_alloc_skb_ip_align(nic->netdev, RFD_BUF_LEN)))
@@ -2058,7 +2058,7 @@ static void e100_rx_clean(struct nic *nic, unsigned int *work_done,
 		pci_dma_sync_single_for_device(nic->pdev,
 			old_before_last_rx->dma_addr, sizeof(struct rfd),
 			PCI_DMA_BIDIRECTIONAL);
-		old_before_last_rfd->size = cpu_to_le16(VLAN_ETH_FRAME_LEN);
+		old_before_last_rfd->size = cpu_to_le16(RFD_BUF_LEN);
 		pci_dma_sync_single_for_device(nic->pdev,
 			old_before_last_rx->dma_addr, sizeof(struct rfd),
 			PCI_DMA_BIDIRECTIONAL);

^ permalink raw reply related

* [PATCH 4/4] ipv4: Fix inetpeer expire time information
From: Steffen Klassert @ 2011-10-11 11:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111011110842.GC1830@secunet.com>

As we update the learned pmtu informations on demand, we might
report a nagative expiration time value to userspace if the
pmtu informations are already expired and we have not send a
packet to that inetpeer after expiration. With this patch we
send a expire time of null to userspace after expiration
until the next packet is send to that inetpeer.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/route.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index cda370c..501a3c0 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2832,7 +2832,7 @@ static int rt_fill_info(struct net *net,
 	struct rtable *rt = skb_rtable(skb);
 	struct rtmsg *r;
 	struct nlmsghdr *nlh;
-	long expires = 0;
+	unsigned long expires = 0;
 	const struct inet_peer *peer = rt->peer;
 	u32 id = 0, ts = 0, tsage = 0, error;
 
@@ -2889,8 +2889,12 @@ static int rt_fill_info(struct net *net,
 			tsage = get_seconds() - peer->tcp_ts_stamp;
 		}
 		expires = ACCESS_ONCE(peer->pmtu_expires);
-		if (expires)
-			expires -= jiffies;
+		if (expires) {
+			if (time_before(jiffies, expires))
+				expires -= jiffies;
+			else
+				expires = 0;
+		}
 	}
 
 	if (rt_is_input_route(rt)) {
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH 3/4] ipv4: Fix inetpeer expiration handling
From: Steffen Klassert @ 2011-10-11 11:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111011110842.GC1830@secunet.com>

We leak a trigger to check if the learned pmtu information has
expired, so the learned pmtu informations never expire. This patch
add such a trigger to ipv4_dst_check().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/route.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9a6623e..cda370c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1660,6 +1660,10 @@ static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
 
 	if (rt_is_expired(rt))
 		return NULL;
+
+	if (rt->peer && peer_pmtu_expired(rt->peer))
+		dst_metric_set(dst, RTAX_MTU, rt->peer->pmtu_orig);
+
 	if (rt->rt_peer_genid != rt_peer_genid()) {
 		struct inet_peer *peer;
 
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes
From: Steffen Klassert @ 2011-10-11 11:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111011110842.GC1830@secunet.com>

The pmtu informations on the inetpeer are visible for output and
input routes. On packet forwarding, we might propagate a learned
pmtu to the sender. As we update the pmtu informations of the
inetpeer on demand, the original sender of the forwarded packets
might never notice when the pmtu to that inetpeer increases.
So propagate the nexthop mtu instead of the pmtu to the final
destination.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/route.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 075212e..9a6623e 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1579,9 +1579,10 @@ unsigned short ip_rt_frag_needed(struct net *net, const struct iphdr *iph,
 
 static void check_peer_pmtu(struct dst_entry *dst, struct inet_peer *peer)
 {
+	struct rtable *rt = (struct rtable *) dst;
 	unsigned long expires = ACCESS_ONCE(peer->pmtu_expires);
 
-	if (!expires)
+	if (rt_is_input_route(rt) || !expires)
 		return;
 	if (time_before(jiffies, expires)) {
 		u32 orig_dst_mtu = dst_mtu(dst);
@@ -1803,6 +1804,7 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
 			    struct fib_info *fi)
 {
 	struct inet_peer *peer;
+	struct dst_entry *dst = &rt->dst;
 	int create = 0;
 
 	/* If a peer entry exists for this destination, we must hook
@@ -1817,9 +1819,14 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
 		if (inet_metrics_new(peer))
 			memcpy(peer->metrics, fi->fib_metrics,
 			       sizeof(u32) * RTAX_MAX);
-		dst_init_metrics(&rt->dst, peer->metrics, false);
 
-		check_peer_pmtu(&rt->dst, peer);
+		dst_init_metrics(dst, peer->metrics, false);
+		check_peer_pmtu(dst, peer);
+
+		if (rt_is_input_route(rt))
+			dst_metric_set(dst, RTAX_MTU,
+				       dst->ops->default_mtu(dst));
+
 		if (peer->redirect_learned.a4 &&
 		    peer->redirect_learned.a4 != rt->rt_gateway) {
 			rt->rt_gateway = peer->redirect_learned.a4;
@@ -1830,7 +1837,7 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
 			rt->fi = fi;
 			atomic_inc(&fi->fib_clntref);
 		}
-		dst_init_metrics(&rt->dst, fi->fib_metrics, true);
+		dst_init_metrics(dst, fi->fib_metrics, true);
 	}
 }
 
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH 1/4] ipv4: Fix pmtu propagating
From: Steffen Klassert @ 2011-10-11 11:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111011110842.GC1830@secunet.com>

Since commit 2c8cec5c (ipv4: Cache learned PMTU information in inetpeer)
we cache the learned pmtu informations in inetpeer and propagate these
informations with the dst_ops->check() functions. However, these functions
might not be called for udp and raw packets. As a consequence, we don't
use the learned pmtu informations in these cases. With this patch we
call dst_check() from ip_setup_cork() to propagate the pmtu informations.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/ip_output.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 8c65633..f682437 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1063,6 +1063,11 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	rt = *rtp;
 	if (unlikely(!rt))
 		return -EFAULT;
+
+	rt = (struct rtable *)dst_check(&rt->dst, 0);
+	if (!rt)
+		return -EHOSTUNREACH;
+
 	/*
 	 * We steal reference to this route, caller should not release it
 	 */
-- 
1.7.0.4

^ permalink raw reply related

* Re: [net-next PATCH] net: allow vlan traffic to be received under bond
From: Hans Schillstrom @ 2011-10-11 11:08 UTC (permalink / raw)
  To: Jesse Gross
  Cc: John Fastabend, Jiri Pirko, davem@davemloft.net,
	netdev@vger.kernel.org, fubar@us.ibm.com
In-Reply-To: <CAEP_g=83tCQMVPOTT82GBw6GamodpqU-SAeYyx9noWnzbGfUpg@mail.gmail.com>

Hello
On Tuesday 11 October 2011 04:43:03 Jesse Gross wrote:
> On Mon, Oct 10, 2011 at 7:07 PM, John Fastabend
> <john.r.fastabend@intel.com> wrote:
> > On 10/10/2011 3:37 PM, Jiri Pirko wrote:
> >> Mon, Oct 10, 2011 at 09:16:41PM CEST, john.r.fastabend@intel.com wrote:
> >>> The following configuration used to work as I expected. At least
> >>> we could use the fcoe interfaces to do MPIO and the bond0 iface
> >>> to do load balancing or failover.
> >>>
> >>>       ---eth2.228-fcoe
> >>>       |
> >>> eth2 -----|
> >>>          |
> >>>          |---- bond0
> >>>          |
> >>> eth3 -----|
> >>>       |
> >>>       ---eth3.228-fcoe
> >>>
> >>> This worked because of a change we added to allow inactive slaves
> >>> to rx 'exact' matches. This functionality was kept intact with the
> >>> rx_handler mechanism. However now the vlan interface attached to the
> >>> active slave never receives traffic because the bonding rx_handler
> >>> updates the skb->dev and goto's another_round. Previously, the
> >>> vlan_do_receive() logic was called before the bonding rx_handler.
> >>>
> >>> Now by the time vlan_do_receive calls vlan_find_dev() the
> >>> skb->dev is set to bond0 and it is clear no vlan is attached
> >>> to this iface. The vlan lookup fails.
> >>>
> >>> This patch moves the VLAN check above the rx_handler. A VLAN
> >>> tagged frame is now routed to the eth2.228-fcoe iface in the
> >>> above schematic. Untagged frames continue to the bond0 as
> >>> normal. This case also remains intact,
> >>>
> >>> eth2 --> bond0 --> vlan.228
> >>>
> >>> Here the skb is VLAN tagged but the vlan lookup fails on eth2
> >>> causing the bonding rx_handler to be called. On the second
> >>> pass the vlan lookup is on the bond0 iface and completes as
> >>> expected.
> >>>
> >>> Putting a VLAN.228 on both the bond0 and eth2 device will
> >>> result in eth2.228 receiving the skb. I don't think this is
> >>> completely unexpected and was the result prior to the rx_handler
> >>> result.

I think this OK, but I do have a question
if bond0 is in Active/Backup mode, eth2 and eth3 got the same MAC.addr,
what about the VLAN:s ?
(or is just one of thme working ??)

> >>>
> >>> Note, the same setup is also used for other storage traffic that
> >>> MPIO is used with eg. iSCSI and similar setups can be contrived
> >>> without storage protocols.
> >>>
> >>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >>> ---
> >>>
> >>> net/core/dev.c |   22 +++++++++++-----------
> >>> 1 files changed, 11 insertions(+), 11 deletions(-)
> >>>
> >>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>> index 70ecb86..8b6118a 100644
> >>> --- a/net/core/dev.c
> >>> +++ b/net/core/dev.c
> >>> @@ -3231,6 +3231,17 @@ another_round:
> >>> ncls:
> >>> #endif
> >>>
> >>> +    if (vlan_tx_tag_present(skb)) {
> >>> +            if (pt_prev) {
> >>> +                    ret = deliver_skb(skb, pt_prev, orig_dev);
> >>> +                    pt_prev = NULL;
> >>> +            }
> >>> +            if (vlan_do_receive(&skb))
> >>> +                    goto another_round;
> >>> +            else if (unlikely(!skb))
> >>> +                    goto out;
> >>> +    }
> >>> +
> >>>      rx_handler = rcu_dereference(skb->dev->rx_handler);
> >>>      if (rx_handler) {
> >>>              if (pt_prev) {
> >>> @@ -3251,17 +3262,6 @@ ncls:
> >>>              }
> >>>      }
> >>>
> >>> -    if (vlan_tx_tag_present(skb)) {
> >>> -            if (pt_prev) {
> >>> -                    ret = deliver_skb(skb, pt_prev, orig_dev);
> >>> -                    pt_prev = NULL;
> >>> -            }
> >>> -            if (vlan_do_receive(&skb))
> >>> -                    goto another_round;
> >>> -            else if (unlikely(!skb))
> >>> -                    goto out;
> >>> -    }
> >>> -
> >>>      /* deliver only exact match when indicated */
> >>>      null_or_dev = deliver_exact ? skb->dev : NULL;
> >>>
> >>>
> >>
> >> Hmm, I must look at this again tomorrow but I have strong feeling this
> >> will break some some scenario including vlan-bridge-macvlan.
> >
> > Yes please review... I tested cases with vlan, bridge, and macvlan
> > components and believe this works unless I missed something.
> >
> > Maybe Jesse, can comment though on why this commit that moved (and
> > cleaned up) the vlan tag handling put the vlan_do_receive below the
> > rx_handler rather than above it. Was this intended to fix something?
> 
> The original reason was to ensure packets received from NICs that do
> stripping behaved the same as those that don't.  At the time, the
> packets with inline vlan tags were handled by the same code that
> handled upper layer protocols so it was important that code for vlan
> stripped tags be immediately before that.  Otherwise, packets might be
> taken either by the bridge hook or vlan code depending the the type of
> device.
> 
> However, that's no longer an issue because we now emulate vlan
> acceleration by untagging packets at the beginning of
> __netif_receive_skb(), so the code path will always be the same.
> Furthermore, based on feedback received since that patch it seems
> pretty clear that people prefer the behavior where vlan devices take
> traffic first, so now that we can have both that and consistent
> behavior it seems to be the way to go.
> 
> This looks correct to me:
> Acked-by: Jesse Gross <jesse@nicira.com>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply

* [PATCH net 0/4] ipv4: various pmtu discovery fixes
From: Steffen Klassert @ 2011-10-11 11:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

This patchset addresses some issues I found during investigating
pmtu discovery. These issues were introduced with
git commit 2c8cec5 (ipv4: Cache learned PMTU information in inetpeer).

The patchset is based on current net/master.

Steffen

^ permalink raw reply

* Re: [net-next PATCH] net: allow vlan traffic to be received under bond
From: Jiri Pirko @ 2011-10-11 10:57 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, netdev, jesse, fubar
In-Reply-To: <20111010223752.GB2373@minipsycho>

Tue, Oct 11, 2011 at 12:37:53AM CEST, jpirko@redhat.com wrote:
>Mon, Oct 10, 2011 at 09:16:41PM CEST, john.r.fastabend@intel.com wrote:
>>The following configuration used to work as I expected. At least
>>we could use the fcoe interfaces to do MPIO and the bond0 iface
>>to do load balancing or failover.
>>
>>       ---eth2.228-fcoe
>>       |
>>eth2 -----|
>>          |
>>          |---- bond0
>>          |
>>eth3 -----|
>>       |
>>       ---eth3.228-fcoe
>>
>>This worked because of a change we added to allow inactive slaves
>>to rx 'exact' matches. This functionality was kept intact with the
>>rx_handler mechanism. However now the vlan interface attached to the
>>active slave never receives traffic because the bonding rx_handler
>>updates the skb->dev and goto's another_round. Previously, the
>>vlan_do_receive() logic was called before the bonding rx_handler.
>>
>>Now by the time vlan_do_receive calls vlan_find_dev() the
>>skb->dev is set to bond0 and it is clear no vlan is attached
>>to this iface. The vlan lookup fails.
>>
>>This patch moves the VLAN check above the rx_handler. A VLAN
>>tagged frame is now routed to the eth2.228-fcoe iface in the
>>above schematic. Untagged frames continue to the bond0 as
>>normal. This case also remains intact,
>>
>>eth2 --> bond0 --> vlan.228
>>
>>Here the skb is VLAN tagged but the vlan lookup fails on eth2
>>causing the bonding rx_handler to be called. On the second
>>pass the vlan lookup is on the bond0 iface and completes as
>>expected.
>>
>>Putting a VLAN.228 on both the bond0 and eth2 device will
>>result in eth2.228 receiving the skb. I don't think this is
>>completely unexpected and was the result prior to the rx_handler
>>result.
>>
>>Note, the same setup is also used for other storage traffic that
>>MPIO is used with eg. iSCSI and similar setups can be contrived
>>without storage protocols.
>>
>>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>---
>>
>> net/core/dev.c |   22 +++++++++++-----------
>> 1 files changed, 11 insertions(+), 11 deletions(-)
>>
>>diff --git a/net/core/dev.c b/net/core/dev.c
>>index 70ecb86..8b6118a 100644
>>--- a/net/core/dev.c
>>+++ b/net/core/dev.c
>>@@ -3231,6 +3231,17 @@ another_round:
>> ncls:
>> #endif
>> 
>>+	if (vlan_tx_tag_present(skb)) {
>>+		if (pt_prev) {
>>+			ret = deliver_skb(skb, pt_prev, orig_dev);
>>+			pt_prev = NULL;
>>+		}
>>+		if (vlan_do_receive(&skb))
>>+			goto another_round;
>>+		else if (unlikely(!skb))
>>+			goto out;
>>+	}
>>+
>> 	rx_handler = rcu_dereference(skb->dev->rx_handler);
>> 	if (rx_handler) {
>> 		if (pt_prev) {
>>@@ -3251,17 +3262,6 @@ ncls:
>> 		}
>> 	}
>> 
>>-	if (vlan_tx_tag_present(skb)) {
>>-		if (pt_prev) {
>>-			ret = deliver_skb(skb, pt_prev, orig_dev);
>>-			pt_prev = NULL;
>>-		}
>>-		if (vlan_do_receive(&skb))
>>-			goto another_round;
>>-		else if (unlikely(!skb))
>>-			goto out;
>>-	}
>>-
>> 	/* deliver only exact match when indicated */
>> 	null_or_dev = deliver_exact ? skb->dev : NULL;
>> 
>>
>
>Hmm, I must look at this again tomorrow but I have strong feeling this
>will break some some scenario including vlan-bridge-macvlan.

I didn't find out anything.

Reviewed-by: Jiri Pirko <jpirko@redhat.com>

^ permalink raw reply

* Re: e100 + VLANs?
From: Michael Tokarev @ 2011-10-11  9:51 UTC (permalink / raw)
  To: David Lamparter; +Cc: Eric Dumazet, jeffrey.t.kirsher, netdev
In-Reply-To: <4E932278.8010802@tls.msk.ru>

10.10.2011 20:51, Michael Tokarev wrote:
> 10.10.2011 19:13, David Lamparter wrote:
>> On Mon, Oct 10, 2011 at 05:05:52PM +0200, Eric Dumazet wrote:
>>>> When pinging this NIC from another machine over VLAN5, I see
>>>> ARP packets coming to it, gets recognized and replies going
>>>> back, all on vlan 5.  But on the other side, replies comes
>>>> WITHOUT a VLAN tag!
>>>>
>>>> From this NIC's point of view, capturing on whole ethX:
>>>>
>>>> 00:1f:c6:ef:e5:1b > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 60: vlan 5, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.48.11.2 tell 10.48.11.1, length 42
>>>> 00:90:27:30:6d:1c > 00:1f:c6:ef:e5:1b, ethertype 802.1Q (0x8100), length 46: vlan 5, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.48.11.2 is-at 00:90:27:30:6d:1c, length 28
>>>>
>>>> From the partner point of view, also on whole ethX:
>>>>
>>>> 00:1f:c6:ef:e5:1b > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 5, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.48.11.2 tell 10.48.11.1, length 28
>>>> 00:90:27:30:6d:1c > 00:1f:c6:ef:e5:1b, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.48.11.2 is-at 00:90:27:30:6d:1c, length 46
>>>>
>>>> So, the tag gets eaten somewhere along the way... ;)
>>
>> Hmm. Looks like broken VLAN TX offload, but the driver doesn't even
>> implement VLAN offload. Maybe it's broken in its non-implementation...
>>
>> Your "partner" is a known-good setup and can be assumed to be working
>> correctly? This is over a crossover cable, no evil switches involved?
> 
> There are just two machines involved, both connected to the
> same _switch_ - no, it is not over cross-over cable.  It's a
> good idea to test one, I'll try it tomorrow (will insert a
> second "known good" nic into another machine).

Ok, I found the issue - it was my misconfiguration of vlan5 in the
switch.  So tags are correctly set and processed by e100 NIC and
the driver.

I tried direct (without a switch) connection, and it shows the same
problem - the MTU issues, large packets does not work, exactly as
shown below...

> The second machine, the "partner", has this NIC:
> 
> 02:00.0 Ethernet controller: Atheros Communications L1 Gigabit Ethernet (rev b0)
> 
> and it is a known-good implementation - it worked with and without vlan
> tags (we had a weird mixed tagged/untagged setup) for over 2 years without
> any issues, and which works now as well - it's our main server which is
> in two VLANs, connected to an interface marked as tagged in the switch.
> It communicates with the other machine when that other machine uses
> already mentioned VIA RhineIII NIC - which I used to replace this non-working
> E100.
> 
> So it's 2 machines, one with 2 nics - VIA Rhine (working) and e100 (non-working),
> both connected to two "tagged" ports in the switch.  And another, with atl1 NIC,
> also connected to a "tagged" port in the switch.
[]
> Ok.  So now I can reproduce the initial problem.
> 
> So, `ping -s 1469' from atl1 side, so that the resulting packet side
s/side/size/

> is 1497 bytes (1468 is the largest size that works) -- the packets
> does not arrive at e100 side at all - it's 100% quiet in tcpdump there.
> 
> When pinging from e100 side and tcpdump'ing on atl1 side (replies does
> not come back to e100):
> 
> 20:49:33.322646 00:90:27:30:6d:1c > 00:1f:c6:ef:e5:1b, ethertype 802.1Q (0x8100), length 1515: vlan 8, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 1497)
>     10.48.8.2 > 10.48.8.1: ICMP echo request, id 5785, seq 72, length 1477
> 20:49:33.322691 00:1f:c6:ef:e5:1b > 00:90:27:30:6d:1c, ethertype 802.1Q (0x8100), length 1515: vlan 8, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 23781, offset 0, flags [none], proto ICMP (1), length 1497)
>     10.48.8.1 > 10.48.8.2: ICMP echo reply, id 5785, seq 72, length 1477
> 
> So it appears that on e100 side, the _receive_ buffer is too small
> somehow.

So, is that a hardware limitation?

Thank you!

/mjt

^ permalink raw reply

* Re: [PATCH] netconsole: enable netconsole can make net_device refcnt
From: Gao feng @ 2011-10-11  8:05 UTC (permalink / raw)
  To: Wanlong Gao; +Cc: netdev, davem
In-Reply-To: <1318319438-7159-1-git-send-email-gaowanlong@cn.fujitsu.com>

Im so sorry.
the first patch has some format err.
Please use this one.
thanks wanlong! ^V^

11.10.2011 15:50, Wanlong Gao wrote:
> There is no check if netconsole is enabled current.
> so when exec echo 1 > enabled;
> the reference of net_device will increment always.
> 
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> ---
>  drivers/net/netconsole.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> index ed2a397..4e6323df 100644
> --- a/drivers/net/netconsole.c
> +++ b/drivers/net/netconsole.c
> @@ -307,6 +307,8 @@ static ssize_t store_enabled(struct netconsole_target *nt,
>  		return err;
>  	if (enabled < 0 || enabled > 1)
>  		return -EINVAL;
> +	if (enabled == nt->enabled)
> +		return err;
>  
>  	if (enabled) {	/* 1 */
>  

^ permalink raw reply

* [net-next 4/5] stmmac: Stop advertising 1000Base capabilties for non GMII iface (v2).
From: Giuseppe CAVALLARO @ 2011-10-11  7:30 UTC (permalink / raw)
  To: netdev; +Cc: Srinivas Kandagatla, Giuseppe Cavallaro
In-Reply-To: <1318318246-1326-1-git-send-email-peppe.cavallaro@st.com>

From: Srinivas Kandagatla <srinivas.kandagatla@st.com>

This patch stops advertising 1000Base capablities if GMAC is either
configured for MII or RMII mode and on board there is a GPHY plugged on.
Without this patch if an GBit switch is connected on MII interface,
Ethernet stops working at all.

Discovered as part of
https://bugzilla.stlinux.com/show_bug.cgi?id=14148 triage

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 87767d5..4f7af52 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -304,7 +304,7 @@ static int stmmac_init_phy(struct net_device *dev)
 	struct phy_device *phydev;
 	char phy_id[MII_BUS_ID_SIZE + 3];
 	char bus_id[MII_BUS_ID_SIZE];
-
+	int interface = priv->plat->interface;
 	priv->oldlink = 0;
 	priv->speed = 0;
 	priv->oldduplex = -1;
@@ -314,14 +314,21 @@ static int stmmac_init_phy(struct net_device *dev)
 		 priv->plat->phy_addr);
 	pr_debug("stmmac_init_phy:  trying to attach to %s\n", phy_id);
 
-	phydev = phy_connect(dev, phy_id, &stmmac_adjust_link, 0,
-			     priv->plat->interface);
+	phydev = phy_connect(dev, phy_id, &stmmac_adjust_link, 0, interface);
 
 	if (IS_ERR(phydev)) {
 		pr_err("%s: Could not attach to PHY\n", dev->name);
 		return PTR_ERR(phydev);
 	}
 
+	/* Stop Advertising 1000BASE Capability if interface is not GMII */
+	if ((interface == PHY_INTERFACE_MODE_MII) ||
+	    (interface == PHY_INTERFACE_MODE_RMII)) {
+		phydev->supported &= (PHY_BASIC_FEATURES | SUPPORTED_Pause |
+				      SUPPORTED_Asym_Pause);
+		priv->phydev->advertising = priv->phydev->supported;
+	}
+
 	/*
 	 * Broken HW is sometimes missing the pull-up resistor on the
 	 * MDIO line, which results in reads to non-existent devices returning
-- 
1.7.4.4

^ permalink raw reply related

* [PATCH] netconsole: enable netconsole can make net_device refcnt
From: Wanlong Gao @ 2011-10-11  7:50 UTC (permalink / raw)
  To: netdev; +Cc: davem, Wanlong Gao, Gao feng
In-Reply-To: <4E93E238.7000105@cn.fujitsu.com>

There is no check if netconsole is enabled current.
so when exec echo 1 > enabled;
the reference of net_device will increment always.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 drivers/net/netconsole.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index ed2a397..4e6323df 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -307,6 +307,8 @@ static ssize_t store_enabled(struct netconsole_target *nt,
 		return err;
 	if (enabled < 0 || enabled > 1)
 		return -EINVAL;
+	if (enabled == nt->enabled)
+		return err;
 
 	if (enabled) {	/* 1 */
 
-- 
1.7.7.rc1

^ permalink raw reply related

* [net-next] net/phy: extra delay only for RGMII interfaces for IC+ IP 1001
From: Giuseppe CAVALLARO @ 2011-10-11  7:37 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro

The extra delay of 2ns to adjust RX clock phase is actually needed
in RGMII mode. Tested on the HDK7108 (STx7108c2).

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/phy/icplus.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/phy/icplus.c b/drivers/net/phy/icplus.c
index d66bd8d..c81f136 100644
--- a/drivers/net/phy/icplus.c
+++ b/drivers/net/phy/icplus.c
@@ -128,12 +128,15 @@ static int ip1001_config_init(struct phy_device *phydev)
 	if (c < 0)
 		return c;
 
-	/* Additional delay (2ns) used to adjust RX clock phase
-	 * at GMII/ RGMII interface */
-	c = phy_read(phydev, IP10XX_SPEC_CTRL_STATUS);
-	c |= IP1001_PHASE_SEL_MASK;
+	if (phydev->interface == PHY_INTERFACE_MODE_RGMII) {
+		/* Additional delay (2ns) used to adjust RX clock phase
+		 * at RGMII interface */
+		c = phy_read(phydev, IP10XX_SPEC_CTRL_STATUS);
+		c |= IP1001_PHASE_SEL_MASK;
+		c = phy_write(phydev, IP10XX_SPEC_CTRL_STATUS, c);
+	}
 
-	return phy_write(phydev, IP10XX_SPEC_CTRL_STATUS, c);
+	return c;
 }
 
 static int ip101a_config_init(struct phy_device *phydev)
-- 
1.7.4.4

^ permalink raw reply related

* Re: loopback IP alias breaks tftp?
From: Francis SOUYRI @ 2011-10-11  7:28 UTC (permalink / raw)
  To: Olaf van der Spek
  Cc: Eric Dumazet, Josh Boyer, Joel Sing, Julian Anastasov,
	netdev@vger.kernel.org
In-Reply-To: <CAGVGHmt4_kf5XTS745aMa9kuxb1Fu9v8f6boG-vewVeto5jJ=w@mail.gmail.com>

Hello,

I put this in the Red Hat Bugzilla – Bug 739534:

#####	

I do the test with the bind of the IP alias (for test purpose 
127.0.0.2), but I have the same problem.

# ifconfig lo; ifconfig lo:0
lo        Link encap:Local Loopback
           inet addr:127.0.0.1  Mask:255.0.0.0
           inet6 addr: ::1/128 Scope:Host
           UP LOOPBACK RUNNING  MTU:16436  Metric:1
           RX packets:24418 errors:0 dropped:0 overruns:0 frame:0
           TX packets:24418 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0
           RX bytes:3798250 (3.6 MiB)  TX bytes:3798250 (3.6 MiB)

lo:0      Link encap:Local Loopback
           inet addr:127.0.0.2  Mask:255.0.0.0
           UP LOOPBACK RUNNING  MTU:16436  Metric:1

# netstat -an | grep 69 | grep udp
udp        0      0 127.0.0.2:69                0.0.0.0:* 

udp        0      0 :::52697                    :::* 

# tftp 127.0.0.2
tftp> trace
Packet tracing on.
tftp> get /thinstation/dell/vmlinuz
sent RRQ <file=/thinstation/dell/vmlinuz, mode=netascii>
received DATA <block=1, 512 bytes>
sent ACK <block=1>
received DATA <block=1, 512 bytes>
sent ACK <block=1>
received DATA <block=1, 512 bytes>
sent ACK <block=1>
received DATA <block=1, 512 bytes>
sent ACK <block=1>
sent ACK <block=1>
received DATA <block=1, 512 bytes>
sent ACK <block=1>
sent ACK <block=1>
sent ACK <block=1>
sent ACK <block=1>
received DATA <block=1, 512 bytes>
sent ACK <block=1>
Transfer timed out.

tftp>

I do also this test, I changed the local route (the loopback used is 
only for test purpose, in production I used 192.168.100.0/24 network).

# ip route show table local | grep 127.0.0.
broadcast 127.0.0.0 dev lo  proto kernel  scope link  src 127.0.0.1
local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
local 127.0.0.1 dev lo  proto kernel  scope host  src 127.0.0.1
local 127.0.0.2 dev lo  proto kernel  scope host  src 127.0.0.1
broadcast 127.255.255.255 dev lo  proto kernel  scope link  src 127.0.0.1
#
# ip route change local 127.0.0.2 dev lo  proto kernel  scope host  src
127.0.0.2
#
# ip route show table local | grep 127.0.0.
broadcast 127.0.0.0 dev lo  proto kernel  scope link  src 127.0.0.1
local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
local 127.0.0.1 dev lo  proto kernel  scope host  src 127.0.0.1
local 127.0.0.2 dev lo  proto kernel  scope host  src 127.0.0.2
broadcast 127.255.255.255 dev lo  proto kernel  scope link  src 127.0.0.1
#
# tftp 127.0.0.2
tftp> get /thinstation/dell/vmlinuz
tftp> quit
#
# ip route change local 127.0.0.2 dev lo  proto kernel  scope host  src
127.0.0.1
# tftp 127.0.0.2
tftp> get /thinstation/dell/vmlinuz
Transfer timed out.

tftp> quit
#

#####

Best regards.

Francis

On 10/10/2011 05:25 PM, Olaf van der Spek wrote:
> On Mon, Oct 10, 2011 at 5:22 PM, Eric Dumazet<eric.dumazet@gmail.com>  wrote:
>>> Isn't that a bad way to work around this issue?
>>> It'd require you to duplicate your IP config for every daemon that
>>> listens on UDP interfaces.
>>> What about IP addresses that are added/deleted after the daemon is launchad?
>>>
>>> Olaf
>>
>> Thats a pretty common problem, even prior to discussed commit.
>>
>> If you take a look at common UDP daemons, they have to setup a listener
>> for each IP address, OR use the correct API ( setsockopt(fd, IPPROTO_IP,
>> &on, sizeof(on)) and handle IP_PKTINFO ancillary message)
>
> Only the latter solution seems right.
>
> Olaf
>

^ permalink raw reply

* [net-next 5/5] stmmac: update the driver version and doc
From: Giuseppe CAVALLARO @ 2011-10-11  7:30 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro
In-Reply-To: <1318318246-1326-1-git-send-email-peppe.cavallaro@st.com>

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 Documentation/networking/stmmac.txt          |   11 ++++++++++-
 drivers/net/ethernet/stmicro/stmmac/stmmac.h |    2 +-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 40ec92c..8d67980 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -76,7 +76,16 @@ core.
 
 4.5) DMA descriptors
 Driver handles both normal and enhanced descriptors. The latter has been only
-tested on DWC Ether MAC 10/100/1000 Universal version 3.41a.
+tested on DWC Ether MAC 10/100/1000 Universal version 3.41a and later.
+
+STMMAC supports DMA descriptor to operate both in dual buffer (RING)
+and linked-list(CHAINED) mode. In RING each descriptor points to two
+data buffer pointers whereas in CHAINED mode they point to only one data
+buffer pointer. RING mode is the default.
+
+In CHAINED mode each descriptor will have pointer to next descriptor in
+the list, hence creating the explicit chaining in the descriptor itself,
+whereas such explicit chaining is not possible in RING mode.
 
 4.6) Ethtool support
 Ethtool is supported. Driver statistics and internal errors can be taken using:
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 50e95d8..49a4af3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -20,7 +20,7 @@
   Author: Giuseppe Cavallaro <peppe.cavallaro@st.com>
 *******************************************************************************/
 
-#define DRV_MODULE_VERSION	"Aug_2011"
+#define DRV_MODULE_VERSION	"Oct_2011"
 #include <linux/stmmac.h>
 
 #include "common.h"
-- 
1.7.4.4

^ permalink raw reply related

* [net-next 3/5] stmmac: protect tx process with lock
From: Giuseppe CAVALLARO @ 2011-10-11  7:30 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro
In-Reply-To: <1318318246-1326-1-git-send-email-peppe.cavallaro@st.com>

This patch fixes a problem raised on Orly ARM SMP platform
where, in case of fragmented frames, the descriptors
in the TX ring resulted broken. This was due to a missing lock
protection in the tx process.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h      |    1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    8 ++++++++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 1434bdb..50e95d8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -70,6 +70,7 @@ struct stmmac_priv {
 
 	u32 msg_enable;
 	spinlock_t lock;
+	spinlock_t tx_lock;
 	int wolopts;
 	int wolenabled;
 	int wol_irq;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 0a7d326..87767d5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -620,6 +620,8 @@ static void stmmac_tx(struct stmmac_priv *priv)
 {
 	unsigned int txsize = priv->dma_tx_size;
 
+	spin_lock(&priv->tx_lock);
+
 	while (priv->dirty_tx != priv->cur_tx) {
 		int last;
 		unsigned int entry = priv->dirty_tx % txsize;
@@ -685,6 +687,7 @@ static void stmmac_tx(struct stmmac_priv *priv)
 		}
 		netif_tx_unlock(priv->dev);
 	}
+	spin_unlock(&priv->tx_lock);
 }
 
 static inline void stmmac_enable_irq(struct stmmac_priv *priv)
@@ -1161,6 +1164,8 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 		return NETDEV_TX_BUSY;
 	}
 
+	spin_lock(&priv->tx_lock);
+
 	entry = priv->cur_tx % txsize;
 
 #ifdef STMMAC_XMIT_DEBUG
@@ -1257,6 +1262,8 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	priv->hw->dma->enable_dma_transmission(priv->ioaddr);
 
+	spin_unlock(&priv->tx_lock);
+
 	return NETDEV_TX_OK;
 }
 
@@ -1824,6 +1831,7 @@ static int stmmac_probe(struct net_device *dev)
 			"please, use ifconfig or nwhwconfig!\n");
 
 	spin_lock_init(&priv->lock);
+	spin_lock_init(&priv->tx_lock);
 
 	ret = register_netdev(dev);
 	if (ret) {
-- 
1.7.4.4

^ permalink raw reply related

* [net-next 2/5] stmmac: allow mtu bigger than 1500 in case of normal desc.
From: Giuseppe CAVALLARO @ 2011-10-11  7:30 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro, Deepak SIKRI
In-Reply-To: <1318318246-1326-1-git-send-email-peppe.cavallaro@st.com>

This patch allows to set the mtu bigger than 1500
in case of normal descriptors.
This is helping some SPEAr customers.

Signed-off-by: Deepak SIKRI <deepak.sikri@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/norm_desc.c   |   10 +++++++++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    6 +++---
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
index 6c40a38..c3456f8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
@@ -127,6 +127,7 @@ static void ndesc_init_rx_desc(struct dma_desc *p, unsigned int ring_size,
 		p->des01.rx.own = 1;
 		p->des01.rx.buffer1_size = BUF_SIZE_2KiB - 1;
 #if defined(CONFIG_STMMAC_RING)
+		p->des01.rx.buffer2_size = BUF_SIZE_2KiB - 1;
 		if (i == (ring_size - 1))
 			p->des01.rx.end_ring = 1;
 #else
@@ -196,7 +197,14 @@ static void ndesc_prepare_tx_desc(struct dma_desc *p, int is_fs, int len,
 				  int csum_flag)
 {
 	p->des01.tx.first_segment = is_fs;
-	p->des01.tx.buffer1_size = len;
+
+#if defined(CONFIG_STMMAC_RING)
+	if (unlikely(len > BUF_SIZE_2KiB)) {
+		p->des01.etx.buffer1_size = BUF_SIZE_2KiB - 1;
+		p->des01.etx.buffer2_size = len - p->des01.etx.buffer1_size;
+	} else
+#endif
+		p->des01.tx.buffer1_size = len;
 }
 
 static void ndesc_clear_tx_ic(struct dma_desc *p)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 54f1e76..0a7d326 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1491,17 +1491,17 @@ static void stmmac_set_rx_mode(struct net_device *dev)
 static int stmmac_change_mtu(struct net_device *dev, int new_mtu)
 {
 	struct stmmac_priv *priv = netdev_priv(dev);
-	int max_mtu;
+	int max_mtu = ETH_DATA_LEN;
 
 	if (netif_running(dev)) {
 		pr_err("%s: must be stopped to change its MTU\n", dev->name);
 		return -EBUSY;
 	}
 
-	if (priv->plat->has_gmac)
+	if (priv->plat->enh_desc)
 		max_mtu = JUMBO_LEN;
 	else
-		max_mtu = ETH_DATA_LEN;
+		max_mtu = BUF_SIZE_4KiB;
 
 	if ((new_mtu < 46) || (new_mtu > max_mtu)) {
 		pr_err("%s: invalid MTU, max MTU is: %d\n", dev->name, max_mtu);
-- 
1.7.4.4

^ permalink raw reply related

* [net-next 1/5] stmmac: add CHAINED descriptor mode support
From: Giuseppe CAVALLARO @ 2011-10-11  7:30 UTC (permalink / raw)
  To: netdev; +Cc: Rayagond Kokatanur
In-Reply-To: <1318318246-1326-1-git-send-email-peppe.cavallaro@st.com>

From: Rayagond Kokatanur <rayagond@vayavyalabs.com>

This patch enhances the STMMAC driver to support CHAINED mode of
descriptor (useful also on validation side).

STMMAC supports DMA descriptor to operate both in dual buffer(RING)
and linked-list(CHAINED) mode. In RING mode (default) each descriptor
points to two data buffer pointers whereas in CHAINED mode they point
to only one data buffer pointer.

In CHAINED mode each descriptor will have pointer to next descriptor in
the list, hence creating the explicit chaining in the descriptor itself,
whereas such explicit chaining is not possible in RING mode.

Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Hacked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/Kconfig       |   18 +++
 drivers/net/ethernet/stmicro/stmmac/enh_desc.c    |   24 +++-
 drivers/net/ethernet/stmicro/stmmac/norm_desc.c   |   17 +++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |  136 ++++++++++++++++++---
 4 files changed, 171 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/net/ethernet/stmicro/stmmac/Kconfig
index 8cd9dde..ac6f190 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
+++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
@@ -63,4 +63,22 @@ config STMMAC_RTC_TIMER
 
 endchoice
 
+choice
+	prompt "Select the DMA TX/RX descriptor operating modes"
+	depends on STMMAC_ETH
+	---help---
+	  This driver supports DMA descriptor to operate both in dual buffer
+	  (RING) and linked-list(CHAINED) mode. In RING mode each descriptor
+	  points to two data buffer pointers whereas in CHAINED mode they
+	  points to only one data buffer pointer.
+
+config STMMAC_RING
+	bool "Enable Descriptor Ring Mode"
+
+config STMMAC_CHAINED
+	bool "Enable Descriptor Chained Mode"
+
+endchoice
+
+
 endif
diff --git a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
index e5dfb6a..a5f95dd 100644
--- a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
@@ -234,9 +234,13 @@ static void enh_desc_init_rx_desc(struct dma_desc *p, unsigned int ring_size,
 		p->des01.erx.own = 1;
 		p->des01.erx.buffer1_size = BUF_SIZE_8KiB - 1;
 		/* To support jumbo frames */
+#if defined(CONFIG_STMMAC_RING)
 		p->des01.erx.buffer2_size = BUF_SIZE_8KiB - 1;
-		if (i == ring_size - 1)
+		if (i == (ring_size - 1))
 			p->des01.erx.end_ring = 1;
+#else
+		p->des01.erx.second_address_chained = 1;
+#endif
 		if (disable_rx_ic)
 			p->des01.erx.disable_ic = 1;
 		p++;
@@ -249,8 +253,12 @@ static void enh_desc_init_tx_desc(struct dma_desc *p, unsigned int ring_size)
 
 	for (i = 0; i < ring_size; i++) {
 		p->des01.etx.own = 0;
-		if (i == ring_size - 1)
+#if defined(CONFIG_STMMAC_RING)
+		if (i == (ring_size - 1))
 			p->des01.etx.end_ring = 1;
+#else
+		p->des01.etx.second_address_chained = 1;
+#endif
 		p++;
 	}
 }
@@ -282,22 +290,30 @@ static int enh_desc_get_tx_ls(struct dma_desc *p)
 
 static void enh_desc_release_tx_desc(struct dma_desc *p)
 {
+#if defined(CONFIG_STMMAC_RING)
 	int ter = p->des01.etx.end_ring;
 
 	memset(p, 0, offsetof(struct dma_desc, des2));
 	p->des01.etx.end_ring = ter;
+#else
+	memset(p, 0, offsetof(struct dma_desc, des2));
+	p->des01.etx.second_address_chained = 1;
+#endif
 }
 
 static void enh_desc_prepare_tx_desc(struct dma_desc *p, int is_fs, int len,
 				     int csum_flag)
 {
 	p->des01.etx.first_segment = is_fs;
+
+#if defined(CONFIG_STMMAC_RING)
 	if (unlikely(len > BUF_SIZE_4KiB)) {
 		p->des01.etx.buffer1_size = BUF_SIZE_4KiB;
 		p->des01.etx.buffer2_size = len - BUF_SIZE_4KiB;
-	} else {
+	} else
+#endif
 		p->des01.etx.buffer1_size = len;
-	}
+
 	if (likely(csum_flag))
 		p->des01.etx.checksum_insertion = cic_full;
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
index 029c2a2..6c40a38 100644
--- a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
@@ -126,8 +126,12 @@ static void ndesc_init_rx_desc(struct dma_desc *p, unsigned int ring_size,
 	for (i = 0; i < ring_size; i++) {
 		p->des01.rx.own = 1;
 		p->des01.rx.buffer1_size = BUF_SIZE_2KiB - 1;
-		if (i == ring_size - 1)
+#if defined(CONFIG_STMMAC_RING)
+		if (i == (ring_size - 1))
 			p->des01.rx.end_ring = 1;
+#else
+		p->des01.rx.second_address_chained = 1;
+#endif
 		if (disable_rx_ic)
 			p->des01.rx.disable_ic = 1;
 		p++;
@@ -139,8 +143,12 @@ static void ndesc_init_tx_desc(struct dma_desc *p, unsigned int ring_size)
 	int i;
 	for (i = 0; i < ring_size; i++) {
 		p->des01.tx.own = 0;
-		if (i == ring_size - 1)
+#if defined(CONFIG_STMMAC_RING)
+		if (i == (ring_size - 1))
 			p->des01.tx.end_ring = 1;
+#else
+		p->des01.tx.second_address_chained = 1;
+#endif
 		p++;
 	}
 }
@@ -172,11 +180,16 @@ static int ndesc_get_tx_ls(struct dma_desc *p)
 
 static void ndesc_release_tx_desc(struct dma_desc *p)
 {
+#if defined(CONFIG_STMMAC_RING)
 	int ter = p->des01.tx.end_ring;
 
 	memset(p, 0, offsetof(struct dma_desc, des2));
 	/* set termination field */
 	p->des01.tx.end_ring = ter;
+#else
+	memset(p, 0, offsetof(struct dma_desc, des2));
+	p->des01.tx.second_address_chained = 1;
+#endif
 }
 
 static void ndesc_prepare_tx_desc(struct dma_desc *p, int is_fs, int len,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c0ee6b6..54f1e76 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -381,11 +381,32 @@ static void display_ring(struct dma_desc *p, int size)
 	}
 }
 
+#if defined(CONFIG_STMMAC_CHAINED)
+/* In chained mode des3 points to the next element in the ring.
+ * The latest element has to point to the head.
+ */
+static void init_dma_chain(struct dma_desc *des, dma_addr_t phy_addr,
+			   unsigned int size)
+{
+	int i;
+	struct dma_desc *p = des;
+	dma_addr_t dma_phy = phy_addr;
+
+	for (i = 0; i < (size - 1); i++) {
+		dma_phy += sizeof(struct dma_desc);
+		p->des3 = (unsigned int) dma_phy;
+		p++;
+	}
+	p->des3 = (unsigned int) phy_addr;
+}
+#endif
+
 /**
  * init_dma_desc_rings - init the RX/TX descriptor rings
  * @dev: net device structure
  * Description:  this function initializes the DMA RX/TX descriptors
- * and allocates the socket buffers.
+ * and allocates the socket buffers. It suppors the chained and ring
+ * modes.
  */
 static void init_dma_desc_rings(struct net_device *dev)
 {
@@ -395,14 +416,21 @@ static void init_dma_desc_rings(struct net_device *dev)
 	unsigned int txsize = priv->dma_tx_size;
 	unsigned int rxsize = priv->dma_rx_size;
 	unsigned int bfsize = priv->dma_buf_sz;
-	int buff2_needed = 0, dis_ic = 0;
+	int dis_ic = 0;
+#if defined(CONFIG_STMMAC_RING)
+	int des3_as_data_buf = 0;
 
 	/* Set the Buffer size according to the MTU;
 	 * indeed, in case of jumbo we need to bump-up the buffer sizes.
+	 * Note that device can handle only 8KiB in chained mode.
+	 * In ring mode if the mtu exceeds the 8KiB use des3.
 	 */
-	if (unlikely(dev->mtu >= BUF_SIZE_8KiB))
+	if (unlikely(dev->mtu >= BUF_SIZE_8KiB)) {
 		bfsize = BUF_SIZE_16KiB;
-	else if (unlikely(dev->mtu >= BUF_SIZE_4KiB))
+		des3_as_data_buf = 1;
+	} else
+#endif
+	if (unlikely(dev->mtu >= BUF_SIZE_4KiB))
 		bfsize = BUF_SIZE_8KiB;
 	else if (unlikely(dev->mtu >= BUF_SIZE_2KiB))
 		bfsize = BUF_SIZE_4KiB;
@@ -416,9 +444,6 @@ static void init_dma_desc_rings(struct net_device *dev)
 	if (likely(priv->tm->enable))
 		dis_ic = 1;
 #endif
-	/* If the MTU exceeds 8k so use the second buffer in the chain */
-	if (bfsize >= BUF_SIZE_8KiB)
-		buff2_needed = 1;
 
 	DBG(probe, INFO, "stmmac: txsize %d, rxsize %d, bfsize %d\n",
 	    txsize, rxsize, bfsize);
@@ -446,9 +471,15 @@ static void init_dma_desc_rings(struct net_device *dev)
 		return;
 	}
 
-	DBG(probe, INFO, "stmmac (%s) DMA desc rings: virt addr (Rx %p, "
+	DBG(probe, INFO, "stmmac (%s) DMA desc %s mode: virt addr (Rx %p, "
 	    "Tx %p)\n\tDMA phy addr (Rx 0x%08x, Tx 0x%08x)\n",
-	    dev->name, priv->dma_rx, priv->dma_tx,
+	    dev->name,
+#if defined(CONFIG_STMMAC_RING)
+		"ring",
+#else
+		"chained",
+#endif
+		priv->dma_rx, priv->dma_tx,
 	    (unsigned int)priv->dma_rx_phy, (unsigned int)priv->dma_tx_phy);
 
 	/* RX INITIALIZATION */
@@ -468,8 +499,11 @@ static void init_dma_desc_rings(struct net_device *dev)
 						bfsize, DMA_FROM_DEVICE);
 
 		p->des2 = priv->rx_skbuff_dma[i];
-		if (unlikely(buff2_needed))
+
+#if defined(CONFIG_STMMAC_RING)
+		if (unlikely(des3_as_data_buf))
 			p->des3 = p->des2 + BUF_SIZE_8KiB;
+#endif
 		DBG(probe, INFO, "[%p]\t[%p]\t[%x]\n", priv->rx_skbuff[i],
 			priv->rx_skbuff[i]->data, priv->rx_skbuff_dma[i]);
 	}
@@ -483,6 +517,11 @@ static void init_dma_desc_rings(struct net_device *dev)
 		priv->tx_skbuff[i] = NULL;
 		priv->dma_tx[i].des2 = 0;
 	}
+
+#if defined(CONFIG_STMMAC_CHAINED)
+	init_dma_chain(priv->dma_rx, priv->dma_rx_phy, rxsize);
+	init_dma_chain(priv->dma_tx, priv->dma_tx_phy, txsize);
+#endif
 	priv->dirty_tx = 0;
 	priv->cur_tx = 0;
 
@@ -611,8 +650,10 @@ static void stmmac_tx(struct stmmac_priv *priv)
 			dma_unmap_single(priv->device, p->des2,
 					 priv->hw->desc->get_tx_len(p),
 					 DMA_TO_DEVICE);
+#if defined(CONFIG_STMMAC_RING)
 		if (unlikely(p->des3))
 			p->des3 = 0;
+#endif
 
 		if (likely(skb != NULL)) {
 			/*
@@ -1014,35 +1055,84 @@ static unsigned int stmmac_handle_jumbo_frames(struct sk_buff *skb,
 	unsigned int txsize = priv->dma_tx_size;
 	unsigned int entry = priv->cur_tx % txsize;
 	struct dma_desc *desc = priv->dma_tx + entry;
+	unsigned int buf_max_size;
+	int len;
 
-	if (nopaged_len > BUF_SIZE_8KiB) {
+#if defined(CONFIG_STMMAC_RING)
+	if (priv->plat->enh_desc)
+		buf_max_size = BUF_SIZE_8KiB;
+	else
+		buf_max_size = BUF_SIZE_2KiB;
 
-		int buf2_size = nopaged_len - BUF_SIZE_8KiB;
+	len = nopaged_len - buf_max_size;
+
+	if (nopaged_len > BUF_SIZE_8KiB) {
 
 		desc->des2 = dma_map_single(priv->device, skb->data,
-					    BUF_SIZE_8KiB, DMA_TO_DEVICE);
+					    buf_max_size, DMA_TO_DEVICE);
 		desc->des3 = desc->des2 + BUF_SIZE_4KiB;
-		priv->hw->desc->prepare_tx_desc(desc, 1, BUF_SIZE_8KiB,
+		priv->hw->desc->prepare_tx_desc(desc, 1, buf_max_size,
 						csum_insertion);
 
 		entry = (++priv->cur_tx) % txsize;
 		desc = priv->dma_tx + entry;
 
 		desc->des2 = dma_map_single(priv->device,
-					skb->data + BUF_SIZE_8KiB,
-					buf2_size, DMA_TO_DEVICE);
+					skb->data + buf_max_size,
+					len, DMA_TO_DEVICE);
 		desc->des3 = desc->des2 + BUF_SIZE_4KiB;
-		priv->hw->desc->prepare_tx_desc(desc, 0, buf2_size,
+		priv->hw->desc->prepare_tx_desc(desc, 0, len,
 						csum_insertion);
 		priv->hw->desc->set_tx_owner(desc);
 		priv->tx_skbuff[entry] = NULL;
 	} else {
 		desc->des2 = dma_map_single(priv->device, skb->data,
-					nopaged_len, DMA_TO_DEVICE);
+					    nopaged_len, DMA_TO_DEVICE);
 		desc->des3 = desc->des2 + BUF_SIZE_4KiB;
 		priv->hw->desc->prepare_tx_desc(desc, 1, nopaged_len,
 						csum_insertion);
 	}
+#else
+	unsigned i = 1;
+
+	if (priv->plat->enh_desc)
+		buf_max_size = BUF_SIZE_8KiB;
+	else
+		buf_max_size = BUF_SIZE_2KiB;
+
+	len = nopaged_len - buf_max_size;
+
+	desc->des2 = dma_map_single(priv->device, skb->data,
+				    buf_max_size, DMA_TO_DEVICE);
+	priv->hw->desc->prepare_tx_desc(desc, 1, buf_max_size,
+					csum_insertion);
+
+	while (len != 0) {
+		entry = (++priv->cur_tx) % txsize;
+		desc = priv->dma_tx + entry;
+
+		if (len > buf_max_size) {
+			desc->des2 = dma_map_single(priv->device,
+						(skb->data + buf_max_size * i),
+						buf_max_size, DMA_TO_DEVICE);
+			priv->hw->desc->prepare_tx_desc(desc, 0, buf_max_size,
+							csum_insertion);
+			priv->hw->desc->set_tx_owner(desc);
+			priv->tx_skbuff[entry] = NULL;
+			len -= buf_max_size;
+			i++;
+		} else {
+			desc->des2 = dma_map_single(priv->device,
+						(skb->data + buf_max_size * i),
+						len, DMA_TO_DEVICE);
+			priv->hw->desc->prepare_tx_desc(desc, 0, len,
+							csum_insertion);
+			priv->hw->desc->set_tx_owner(desc);
+			priv->tx_skbuff[entry] = NULL;
+			len = 0;
+		}
+	}
+#endif
 	return entry;
 }
 
@@ -1094,9 +1184,17 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 		       skb->len, skb_headlen(skb), nfrags, skb->ip_summed);
 #endif
 	priv->tx_skbuff[entry] = skb;
+
+#if defined(CONFIG_STMMAC_RING)
 	if (unlikely(skb->len >= BUF_SIZE_4KiB)) {
 		entry = stmmac_handle_jumbo_frames(skb, dev, csum_insertion);
 		desc = priv->dma_tx + entry;
+#else
+	if ((priv->plat->enh_desc && unlikely(skb->len > BUF_SIZE_8KiB)) ||
+	(!priv->plat->enh_desc && unlikely(skb->len > BUF_SIZE_2KiB))) {
+		entry = stmmac_handle_jumbo_frames(skb, dev, csum_insertion);
+		desc = priv->dma_tx + entry;
+#endif
 	} else {
 		unsigned int nopaged_len = skb_headlen(skb);
 		desc->des2 = dma_map_single(priv->device, skb->data,
@@ -1187,11 +1285,13 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv)
 					   DMA_FROM_DEVICE);
 
 			(p + entry)->des2 = priv->rx_skbuff_dma[entry];
+#if defined(CONFIG_STMMAC_RING)
 			if (unlikely(priv->plat->has_gmac)) {
 				if (bfsize >= BUF_SIZE_8KiB)
 					(p + entry)->des3 =
 					    (p + entry)->des2 + BUF_SIZE_8KiB;
 			}
+#endif
 			RX_DBG(KERN_INFO "\trefill entry #%d\n", entry);
 		}
 		wmb();
-- 
1.7.4.4

^ permalink raw reply related

* [net-next 0/5] stmmac: update to Oct 2011 version
From: Giuseppe CAVALLARO @ 2011-10-11  7:30 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro

This patches update the driver adding the chained
descriptor mode and some new useful fixes.

Giuseppe Cavallaro (3):
  stmmac: allow mtu bigger than 1500 in case of normal desc.
  stmmac: protect tx process with lock
  stmmac: update the driver version and doc

Rayagond Kokatanur (1):
  stmmac: add CHAINED descriptor mode support

Srinivas Kandagatla (1):
  stmmac: Stop advertising 1000Base capabilties for non GMII iface
    (v2).

 Documentation/networking/stmmac.txt               |   11 ++-
 drivers/net/ethernet/stmicro/stmmac/Kconfig       |   18 +++
 drivers/net/ethernet/stmicro/stmmac/enh_desc.c    |   24 +++-
 drivers/net/ethernet/stmicro/stmmac/norm_desc.c   |   27 +++-
 drivers/net/ethernet/stmicro/stmmac/stmmac.h      |    3 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |  163 ++++++++++++++++++---
 6 files changed, 213 insertions(+), 33 deletions(-)

-- 
1.7.4.4

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox