Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/2] net: ll_temac: fix interrupt bug when interrupt 0 is used
From: John Williams @ 2010-05-27  4:12 UTC (permalink / raw)
  To: John Linn
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, michal.simek,
	Brian Hill
In-Reply-To: <6f264a7a-e5da-494a-a24d-1578ca422807@VA3EHSMHS016.ehs.local>

On Thu, May 27, 2010 at 3:29 AM, John Linn <john.linn@xilinx.com> wrote:
> The code is not checking the interrupt for DMA correctly so that an
> interrupt number of 0 will cause a false error.
>
> Signed-off-by: Brian Hill <brian.hill@xilinx.com>
> Signed-off-by: John Linn <john.linn@xilinx.com>
> ---
>  drivers/net/ll_temac_main.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/ll_temac_main.c b/drivers/net/ll_temac_main.c
> index fa7620e..0615737 100644
> --- a/drivers/net/ll_temac_main.c
> +++ b/drivers/net/ll_temac_main.c
> @@ -950,7 +950,7 @@ temac_of_probe(struct of_device *op, const struct of_device_id *match)
>
>        lp->rx_irq = irq_of_parse_and_map(np, 0);
>        lp->tx_irq = irq_of_parse_and_map(np, 1);
> -       if (!lp->rx_irq || !lp->tx_irq) {
> +       if ((lp->rx_irq == NO_IRQ) || (lp->tx_irq == NO_IRQ)) {

Personally I think this is the right thing to do.  But, I thought the
IRQ 0 == NO_IRQ (AKA "all-the-world's-an-x86-and-if-not-it-should-be")
holy war was already fought and won (or lost, depending on your
perspective)?

I seem to recall giving reluctant assent to a patch from Grant a few
months ago that touched MicroBlaze thus?

John

^ permalink raw reply

* Re: [PATCH 9/17] net/iucv: Add missing spin_unlock
From: David Miller @ 2010-05-27  4:10 UTC (permalink / raw)
  To: julia
  Cc: ursula.braun, linux390, linux-s390, netdev, linux-kernel,
	kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005261756330.23743@ask.diku.dk>

From: Julia Lawall <julia@diku.dk>
Date: Wed, 26 May 2010 17:56:48 +0200 (CEST)

> Add a spin_unlock missing on the error path.  There seems like no reason
> why the lock should continue to be held if the kzalloc fail.

Applied, thanks.

^ permalink raw reply

* Re: Warning in net/ipv4/af_inet.c:154
From: David Miller @ 2010-05-27  4:06 UTC (permalink / raw)
  To: anton; +Cc: eric.dumazet, netdev
In-Reply-To: <20100527035617.GB28295@kryten>

From: Anton Blanchard <anton@samba.org>
Date: Thu, 27 May 2010 13:56:17 +1000

> I'm somewhat confused by the two stage locking in the socket lock
> (ie sk_lock.slock and sk_lock.owned).
> 
> What state should the socket lock be in for serialising updates of
> sk_forward_alloc? In some cases we appear to update it with sk_lock.slock =
> unlocked, sk_lock.owned = 1:

If sh_lock.owned=1 the user has grabbed exclusive sleepable lock on the
socket, it does this via something approximating:

retry:
	spin_lock(&sk_lock.slock);
	was_locked = sk_lock.owned;
	if (!was_locked)
		sk_lock.owned = 1;
	spin_unlock(&sk_lock.slock);
	if (was_locked) {
		sleep_on(condition(sk_lock.owned));
		goto retry;
	}

This allows user context code to sleep with exclusive access to the
socket.

Code that cannot sleep takes the spinlock, and queues the work if the
owner field if non-zero.  Else, it keeps the spinlock held and does
the work.

In either case, socket modifications are thus done completely protected
from other contexts.

^ permalink raw reply

* Re: Warning in net/ipv4/af_inet.c:154
From: Anton Blanchard @ 2010-05-27  3:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1274868776.2672.96.camel@edumazet-laptop>

Hi Eric,

> You are 100% right David, maybe we should add a test when changing
> sk_forward_alloc to test if socket is locked (lockdep only test), but
> that's for 2.6.36 :)

Thanks for the patch, unfortunately I can still hit the WARN_ON. I'm somewhat
confused by the two stage locking in the socket lock (ie sk_lock.slock and
sk_lock.owned).

What state should the socket lock be in for serialising updates of
sk_forward_alloc? In some cases we appear to update it with sk_lock.slock =
unlocked, sk_lock.owned = 1:

NIP [c0000000005b4ad0] .sock_queue_rcv_skb
LR [c0000000005b4acc] .sock_queue_rcv_skb
Call Trace:
[c0000000005f9fcc] .ip_queue_rcv_skb
[c00000000061d604] .__udp_queue_rcv_skb
[c0000000005b1a38] .release_sock
[c0000000006205f0] .udp_sendmsg
[c0000000006290d4] .inet_sendmsg
[c0000000005abfb4] .sock_sendmsg
[c0000000005ae9dc] .SyS_sendto
[c0000000005ab6c0] .SyS_send

And other times we update it with sk_lock.slock = locked, sk_lock.owned = 0:

NIP [c0000000005b2b6c] .sock_rfree
LR [c0000000005b2b68] .sock_rfree
Call Trace:
[c0000000005bca10] .skb_free_datagram_locked
[c00000000061fe88] .udp_recvmsg
[c0000000006285e8] .inet_recvmsg
[c0000000005abe0c] .sock_recvmsg
[c0000000005ae358] .SyS_recvfrom

I see we sometimes take sk_lock.slock then check the owned field, but we
aren't doing that all the time.

Anton

^ permalink raw reply

* Re: [PATCH 2/2] net: ll_temac: fix checksum offload logic
From: David Miller @ 2010-05-27  3:45 UTC (permalink / raw)
  To: john.linn
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, john.williams,
	michal.simek, brian.hill
In-Reply-To: <0690e798-1e5e-4416-90dc-04a9df16600f@SG2EHSMHS005.ehs.local>

aFrom: John Linn <john.linn@xilinx.com>
Date: Wed, 26 May 2010 11:29:19 -0600

> The current checksum offload code does not work and this corrects
> that functionality. It also updates the interrupt coallescing
> initialization so than there are fewer interrupts and performance
> is increased.
> 
> Signed-off-by: Brian Hill <brian.hill@xilinx.com>
> Signed-off-by: John Linn <john.linn@xilinx.com>

Applied, although I changed "temac_features" to be explicitly
"unsigned int" instead of just plain "unsigned"


^ permalink raw reply

* Re: [PATCH 1/2] net: ll_temac: fix interrupt bug when interrupt 0 is used
From: David Miller @ 2010-05-27  3:44 UTC (permalink / raw)
  To: john.linn
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, john.williams,
	michal.simek, brian.hill
In-Reply-To: <6f264a7a-e5da-494a-a24d-1578ca422807@VA3EHSMHS016.ehs.local>

From: John Linn <john.linn@xilinx.com>
Date: Wed, 26 May 2010 11:29:18 -0600

> The code is not checking the interrupt for DMA correctly so that an
> interrupt number of 0 will cause a false error.
> 
> Signed-off-by: Brian Hill <brian.hill@xilinx.com>
> Signed-off-by: John Linn <john.linn@xilinx.com>

Applied.

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: David Miller @ 2010-05-27  3:04 UTC (permalink / raw)
  To: hagen; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526231512.GD2684@nuttenaction>

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Thu, 27 May 2010 01:15:12 +0200

> How can a domain defined as {process,peer-IP} fair to the 1MB
> bottleneck link?

You're asking about a network level issue in terms of what can be done
on a local end-node.

All an end-node can do is abide by congestion control rules and respond
to packet drops, as has been going on for decades.

People have basically (especially in Europe) given up on crazy crap
like RSVP and other forms of bandwidth limiting and reservation.  They
just oversubscribe their links, and increase their capacity as traffic
increases dictate.  It just isn't all that manageable to put people's
traffic into classes and control what they do on a large scale.

I'm also skeptical about those who say the fight belongs squarely at
the end nodes.  If you want to control the network traffic of the
meeting point of your dumbbell, you'll need a machine there doing RED
or traffic limiting.  End-host schemes simply aren't going to work
because I can just add more end-hosts to reintroduce the problem.

The dumbbell situation is independant of the end-node issues, that's
all I'm really saying.

^ permalink raw reply

* UDP Fragmentation and DF bit..
From: Vincent, Pradeep @ 2010-05-27  1:45 UTC (permalink / raw)
  To: netdev@vger.kernel.org

After running into issues with UDP in multi-MTU network environment, I
started digging through the code and found somewhat inconsistent behavior
(if I read the code right)

OMan 7 ip¹ declares that ³The  system-wide  default  is  controlled  by  the
ip_no_pmtu_disc  sysctl  for SOCK_STREAM  sockets,  and  disabled  on all
others.² which led me to think ODF¹ bit will not be set for UDP packets.
But..

The code in ip_output.c seems to do the following,

1. If packet size <= current PMTU, then set the DF=1 and send the packet
out.
2. If packet size > current PMTU, then set DF=0 and send the packet out
after fragmentation.

In a network environment where MTU-big and MTU-small co-exist (and have
router¹s fragmentation turned off in favor of PMTU discovery), UDP packets
that are > MTU-small and < MTU-big find the PMTU effectively but UDP packets
that are > MTU-big get dropped. This looks like inconsistent behavior to me
and doesn¹t seem to match the advertised behavior.

Is there a reason why PMTU support for UDP is somewhat inconsistent ?

Is there a reason ODF¹ bit cannot be set on fragmented packets on UDP
transmission ? I couldn¹t find anything in RFC for IP protocol that
prohibited DF bit on fragmented packets. Did I miss
something here ?

Would it be reasonable if PMTU discovery is performed (DF bit set +
appropriate icmp logic) even for locally fragmented packets ? I think this
will be a great help for UDP users that don¹t have PMTU handling logic in
the application (most udp applications belong to this category in my
experience). Thoughts ?

Thanks,

- Pradeep Vincent

^ permalink raw reply

* issue about virtio-net driver to suppoprt vhost mergeable buffer with zero-copy to support PS mode
From: Xin, Xiaohui @ 2010-05-27  1:21 UTC (permalink / raw)
  To: mst@redhat.com
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, Herbert Xu

Michael,
I'm now looking into the vhost mergeable buffer, and I tried to use it to support PS mode with zero-copy. And I found an issue there that I have to modify the guest virito-net driver.

When guest virtio-net driver submits mergeable buffers, it submits multiple pages outside. In zero-copy case, vhost cannot know which page is used to put header, and which page is used to put payload. Then vhost can only reserves 12 bytes for each page. That means, the page_offset of the payload DMAed into the guest buffer is always 12 bytes. But guest virtio-net driver always use offset 0 to put the data (See receive_mergeable()). That's where the zero-copy use mergeable buffer must modify.

Have I missed something here? And how do you think about it?

Thanks 
Xiaohui 

^ permalink raw reply

* [PATCH] fs_enet: Adjust BDs after tx error
From: Mark Ware @ 2010-05-27  1:12 UTC (permalink / raw)
  To: Linuxppc-dev Development, netdev; +Cc: Vitaly Bordug

This patch fixes an occasional transmit lockup in the mac-fcc which
occurs after a tx error.  The test scenario had the local port set
to autoneg and the other end fixed at 100FD, resulting in a large
number of late collisions.

According to the MPC8280RM 30.10.1.3 (also 8272RM 29.10.1.3), after
a tx error occurs, TBPTR may sometimes point beyond BDs still marked
as ready.  This patch walks back through the BDs and points TBPTR to
the earliest one marked as ready.

Tested on a custom board with a MPC8280.

Signed-off-by: Mark Ware <mware@elphinstone.net>
---
 drivers/net/fs_enet/mac-fcc.c |   49 ++++++++++++++++++++++++++++++++++++-----
 1 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/net/fs_enet/mac-fcc.c b/drivers/net/fs_enet/mac-fcc.c
index 22e5a84..6023f15 100644
--- a/drivers/net/fs_enet/mac-fcc.c
+++ b/drivers/net/fs_enet/mac-fcc.c
@@ -503,17 +503,54 @@ static int get_regs_len(struct net_device *dev)
 }
 
 /* Some transmit errors cause the transmitter to shut
- * down.  We now issue a restart transmit.  Since the
- * errors close the BD and update the pointers, the restart
- * _should_ pick up without having to reset any of our
- * pointers either.  Also, To workaround 8260 device erratum
- * CPM37, we must disable and then re-enable the transmitter
- * following a Late Collision, Underrun, or Retry Limit error.
+ * down.  We now issue a restart transmit.
+ * Also, to workaround 8260 device erratum CPM37, we must
+ * disable and then re-enable the transmitterfollowing a
+ * Late Collision, Underrun, or Retry Limit error.
+ * In addition, tbptr may point beyond BDs beyond still marked
+ * as ready due to internal pipelining, so we need to look back
+ * through the BDs and adjust tbptr to point to the last BD
+ * marked as ready.  This may result in some buffers being
+ * retransmitted.
  */
 static void tx_restart(struct net_device *dev)
 {
 	struct fs_enet_private *fep = netdev_priv(dev);
 	fcc_t __iomem *fccp = fep->fcc.fccp;
+	const struct fs_platform_info *fpi = fep->fpi;
+	fcc_enet_t __iomem *ep = fep->fcc.ep;
+	cbd_t __iomem *curr_tbptr;
+	cbd_t __iomem *recheck_bd;
+	cbd_t __iomem *prev_bd;
+	cbd_t __iomem *last_tx_bd;
+
+	last_tx_bd = fep->tx_bd_base + (fpi->tx_ring * sizeof(cbd_t));
+
+	/* get the current bd held in TBPTR  and scan back from this point */
+	recheck_bd = curr_tbptr = (cbd_t __iomem *)
+		((R32(ep, fen_genfcc.fcc_tbptr) - fep->ring_mem_addr) +
+		fep->ring_base);
+
+	prev_bd = (recheck_bd == fep->tx_bd_base) ? last_tx_bd : recheck_bd - 1;
+
+	/* Move through the bds in reverse, look for the earliest buffer
+	 * that is not ready.  Adjust TBPTR to the following buffer */
+	while ((CBDR_SC(prev_bd) & BD_ENET_TX_READY) != 0) {
+		/* Go back one buffer */
+		recheck_bd = prev_bd;
+
+		/* update the previous buffer */
+		prev_bd = (prev_bd == fep->tx_bd_base) ? last_tx_bd : prev_bd - 1;
+
+		/* We should never see all bds marked as ready, check anyway */
+		if (recheck_bd == curr_tbptr)
+			break;
+	}
+	/* Now update the TBPTR and dirty flag to the current buffer */
+	W32(ep, fen_genfcc.fcc_tbptr,
+		(uint) (((void *)recheck_bd - fep->ring_base) +
+		fep->ring_mem_addr));
+	fep->dirty_tx = recheck_bd;
 
 	C32(fccp, fcc_gfmr, FCC_GFMR_ENT);
 	udelay(10);
-- 
1.5.6.5

^ permalink raw reply related

* Re: [REGRESSION,BISECTED] MIPv6 support broken by f4f914b58019f0
From: Brian Haley @ 2010-05-27  0:48 UTC (permalink / raw)
  To: Arnaud Ebalard
  Cc: David Miller,
	YOSHIFUJI Hideaki / 吉藤英明, Jiri Olsa,
	Scott Otto, netdev
In-Reply-To: <87zkzmppfg.fsf@small.ssi.corp>

On 05/26/2010 01:01 PM, Arnaud Ebalard wrote:
> Hi,
> 
> I just updated my laptop's kernel to 2.6.34 (previously running .33 and
> configured to act as an IPsec/IKE-protected MIPv6 Mobile Node using
> racoon and umip): after rebooting on the new kernel, the transport mode
> SA protecting MIPv6 signaling traffic are missing.
> 
> I bisected the issue down to f4f914b58019f0e50d521bbbadfaee260d766f95
> (net: ipv6 bind to device issue) which was added after 2.6.34-rc5: 
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index c2438e8..05ebd78 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -815,7 +815,7 @@ struct dst_entry * ip6_route_output(struct net *net, struct sock *sk,
>  {
>         int flags = 0;
>  
> -       if (rt6_need_strict(&fl->fl6_dst))
> +       if (fl->oif || rt6_need_strict(&fl->fl6_dst))
>                 flags |= RT6_LOOKUP_F_IFACE;

Can you see if fl->oif is at least a sane value here?  Maybe there's some
partially un-initialized flowi getting passed-in, a quick source code check
didn't find anything obvious.

The other thought is that it's the tunnel code calling it, as it's going
to set 'oif' (actually it caches a whole flowi) from the tunnel parms ifindex/link
value.  It could have been setting it forever, but ip6_route_output() just
never enforced it until now.

My $.02.

-Brian

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: Hagen Paul Pfeifer @ 2010-05-26 23:15 UTC (permalink / raw)
  To: David Miller; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526.151014.70204145.davem@davemloft.net>

* David Miller | 2010-05-26 15:10:14 [-0700]:

>From: Andi Kleen <andi@firstfloor.org>
>Date: Wed, 26 May 2010 23:27:45 +0200
>
>> As I understand the idea was that the application knows
>> what flows belong to a single peer and wants to have
>> a single cwnd for all of those. Perhaps there would
>> be a way to generalize that to tell it to the kernel.
>> 
>> e.g. have a "peer id"  that is known by applications
>> and the kernel could manage cwnds shared between connections
>> associated with the same peer id?
>> 
>> Just an idea, I admit I haven't thought very deeply
>> about this. Feel free to poke holes into it.
>
>Yes, a CWND "domain" that can include multiple sockets is
>something that might gain some traction.
>
>The "domain" could just simply be the tuple {process,peer-IP}

This discussion - as once a month - is about fairness. But if we define a
domain as a tuple of {process,peer-IP} the fairness is applied only for the
last link before "peer-IP".

But fairness applies to *all* links in between! For example: consider a
dumpbell scenario:


+------+                                   +------+ 
|      |                                   |      |  
|  H1  |                                   |  H3  | 
|      |                                   |      |  
+------+                                   +------+  
  10MB  \   +------+            +------+  / 10MB
         \  |      |   1MB/s    |      | / 
          > |  R1  |------------|  R2  |<    
         /  |      |            |      | \      
  10MB  /   +------+            +------+  \ 10MB 
+------+                                   +------+  
|      |                                   |      |        
|  H2  |                                   |  H4  | 
|      |                                   |      | 
+------+                                   +------+


How can a domain defined as {process,peer-IP} fair to the 1MB bottleneck link?
It is not fair! And it is also not fair to open n simultaneous streams and so
on. This problem is discussed in several RFC's.

.02


Best regards, Hagen


-- 
Hagen Paul Pfeifer <hagen@jauu.net>  ||  http://jauu.net/
Telephone: +49 174 5455209           ||  Key Id: 0x98350C22
Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22


^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: Rick Jones @ 2010-05-26 22:29 UTC (permalink / raw)
  To: andi; +Cc: David Miller, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526.151014.70204145.davem@davemloft.net>

David Miller wrote:
> From: Andi Kleen <andi@firstfloor.org>
> Date: Wed, 26 May 2010 23:27:45 +0200
> 
>>As I understand the idea was that the application knows
>>what flows belong to a single peer and wants to have
>>a single cwnd for all of those. Perhaps there would
>>be a way to generalize that to tell it to the kernel.
>>
>>e.g. have a "peer id"  that is known by applications
>>and the kernel could manage cwnds shared between connections
>>associated with the same peer id?

Then all the app does is say "I'am in peer id foo" right?  Is that really that 
much different from making the setsockopt() call for a different cwnd value? 
Particularly if say the limit were not a global sysctl, but based on the 
existing per-route value (perhaps expanded to have a min, max and default?)

>>Just an idea, I admit I haven't thought very deeply
>>about this. Feel free to poke holes into it.
> 
> Yes, a CWND "domain" that can include multiple sockets is
> something that might gain some traction.
> 
> The "domain" could just simply be the tuple {process,peer-IP}

Name or PID?

rick jones

^ permalink raw reply

* Re: ipv6: Add GSO support on forwarding path
From: Herbert Xu @ 2010-05-26 22:16 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: David S. Miller, netdev
In-Reply-To: <20100526152019.GA11985@linux-mips.org>

On Wed, May 26, 2010 at 04:20:19PM +0100, Ralf Baechle wrote:
> 
> I tested this on top of a 2.6.34 release kernel and asked the user
> experiencing the problem to re-test and the problem still persists.

OK, it looks like my patch doesn't work because the transport
header isn't set on the forwarding path.  Here's an updated patch:

ipv6: Add GSO support on forwarding path

Currently we disallow GSO packets on the IPv6 forward path.
This patch fixes this.

Note that I discovered that our existing GSO MTU checks (e.g.,
IPv4 forwarding) are buggy in that they skip the check altogether, 
hen they really should be checking gso_size instead.

I have also been lazy here in that I haven't bothered to segment
the GSO packet by hand before generating an ICMP message.  Someone
should add that to be 100% correct.

Reported-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7cdfb4d..6ec5238 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2117,6 +2117,13 @@ static inline int skb_is_gso(const struct sk_buff *skb)
 	return skb_shinfo(skb)->gso_size;
 }

+static inline int skb_gso_len(const struct sk_buff *skb)
+{
+	return skb_is_gso(skb) ?
+	       skb_shinfo(skb)->gso_size +
+	       (skb->csum_start - skb_headroom(skb)) : skb->len;
+}
+
 static inline int skb_is_gso_v6(const struct sk_buff *skb)
 {
 	return skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index cd963f6..8904767 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -507,7 +507,7 @@ int ip6_forward(struct sk_buff *skb)
 	if (mtu < IPV6_MIN_MTU)
 		mtu = IPV6_MIN_MTU;

-	if (skb->len > mtu) {
+	if (skb_gso_len(skb) > mtu) {
 		/* Again, force OUTPUT device used as source address */
 		skb->dev = dst->dev;
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related

* Re: [PATCH] tcp: Socket option to set congestion window
From: David Miller @ 2010-05-26 22:10 UTC (permalink / raw)
  To: andi; +Cc: therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526212745.GC24615@basil.fritz.box>

From: Andi Kleen <andi@firstfloor.org>
Date: Wed, 26 May 2010 23:27:45 +0200

> As I understand the idea was that the application knows
> what flows belong to a single peer and wants to have
> a single cwnd for all of those. Perhaps there would
> be a way to generalize that to tell it to the kernel.
> 
> e.g. have a "peer id"  that is known by applications
> and the kernel could manage cwnds shared between connections
> associated with the same peer id?
> 
> Just an idea, I admit I haven't thought very deeply
> about this. Feel free to poke holes into it.

Yes, a CWND "domain" that can include multiple sockets is
something that might gain some traction.

The "domain" could just simply be the tuple {process,peer-IP}

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: Andi Kleen @ 2010-05-26 21:27 UTC (permalink / raw)
  To: David Miller; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526.140818.245406045.davem@davemloft.net>

> > Yes all of Saudi-Arabia used to be (is?) one IP address...
> > 
> > Caching anything per IP is bogus.
> 
> And letting the applications choose the CWND is better?!?!

No I actually agree with you on that. Just saying that
anything that relies on per IP caching is bad too.

As I understand the idea was that the application knows
what flows belong to a single peer and wants to have
a single cwnd for all of those. Perhaps there would
be a way to generalize that to tell it to the kernel.

e.g. have a "peer id"  that is known by applications
and the kernel could manage cwnds shared between connections
associated with the same peer id?

Just an idea, I admit I haven't thought very deeply
about this. Feel free to poke holes into it.

-Andi

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: David Miller @ 2010-05-26 21:08 UTC (permalink / raw)
  To: andi; +Cc: therbert, shemminger, netdev, ycheng
In-Reply-To: <87fx1e1sat.fsf@basil.nowhere.org>

From: Andi Kleen <andi@firstfloor.org>
Date: Wed, 26 May 2010 19:33:46 +0200

> Tom Herbert <therbert@google.com> writes:
>>>
>> Thanks to NAT, the concept of a network path or even host specific
>> path is a weakened concept.  On the Internet this may be a path
>> characteristic per client, which unfortunately has no visibility in
>> the kernel other than per connection state.  When a single IP address
>> may have thousands of hosts behind it, caching TCP parameters for that
>> IP address is implicitly doing a huge aggregation-- probably dicey...
> 
> Yes all of Saudi-Arabia used to be (is?) one IP address...
> 
> Caching anything per IP is bogus.

And letting the applications choose the CWND is better?!?!

Every single proposal being mentioned in this thread has huge,
obvious, downsides.

Just because there are some cases of people NAT'ing many machines
behind one IP address doesn't mean we kill performance for the rest of
the world (the majority of internet usage btw) by not caching TCP path
characteristics per IP address.

And just because applications open up many sockets to get better TCP
latency and work around per-connection CWND limits DOES NOT mean we
let the application increase the initial CWND so it can abuse this
EVEN MORE and cause EVEN BIGGER problems.

If people have real, sane, ideas about how to attack this problem I am
all ears.  But everything proposed here so far is complete and utter
crap.

^ permalink raw reply

* RE: NULL Pointer Deference: NFS & Telnet
From: Eric Dumazet @ 2010-05-26 20:48 UTC (permalink / raw)
  To: Arce, Abraham
  Cc: David Miller, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-omap@vger.kernel.org, tony@atomide.com, Shilimkar, Santosh,
	Ha, Tristram
In-Reply-To: <27F9C60D11D683428E133F85D2BB4A53043E3EE6A3@dlee03.ent.ti.com>

Le mercredi 26 mai 2010 à 15:19 -0500, Arce, Abraham a écrit :

> By increasing the allocation length of our rx skbuff the corruption issue is fixed... I have increased it by 2... Were we writing outside our boundaries of skb data?
> 
> Please let me know about this approach...
> 
> diff --git a/drivers/net/ks8851.c b/drivers/net/ks8851.c
> index b4fb07a..6da81e1 100644
> --- a/drivers/net/ks8851.c
> +++ b/drivers/net/ks8851.c
> @@ -504,7 +504,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
>                                ks->rc_rxqcr | RXQCR_SDA | RXQCR_ADRFE);
> 
>                 if (rxlen > 0) {
> -                       skb = netdev_alloc_skb(ks->netdev, rxlen + 2 + 8);
> +                       skb = netdev_alloc_skb(ks->netdev, rxlen + 4 + 8);
>                         if (!skb) {
> 
> Best Regards
> Abraham
> 

Yes that makes sense, nr_frag is right after the packet (padded to L1
cache size)

But please do the correct allocation ?

Also, we dont need FCS ?

diff --git a/drivers/net/ks8851.c b/drivers/net/ks8851.c
index b4fb07a..05bd312 100644
--- a/drivers/net/ks8851.c
+++ b/drivers/net/ks8851.c
@@ -503,8 +503,9 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
 		ks8851_wrreg16(ks, KS_RXQCR,
 			       ks->rc_rxqcr | RXQCR_SDA | RXQCR_ADRFE);
 
-		if (rxlen > 0) {
-			skb = netdev_alloc_skb(ks->netdev, rxlen + 2 + 8);
+		if (rxlen > 4) {
+			rxlen -= 4;
+			skb = netdev_alloc_skb(ks->netdev, 2 + 8 + ALIGN(rxlen, 4));
 			if (!skb) {
 				/* todo - dump frame and move on */
 			}
@@ -513,7 +514,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
 			 * for the status header and 4 bytes of garbage */
 			skb_reserve(skb, 2 + 4 + 4);
 
-			rxpkt = skb_put(skb, rxlen - 4) - 8;
+			rxpkt = skb_put(skb, rxlen) - 8;
 
 			/* align the packet length to 4 bytes, and add 4 bytes
 			 * as we're getting the rx status header as well */
@@ -526,7 +527,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
 			netif_rx(skb);
 
 			ks->netdev->stats.rx_packets++;
-			ks->netdev->stats.rx_bytes += rxlen - 4;
+			ks->netdev->stats.rx_bytes += rxlen;
 		}
 
 		ks8851_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr);

^ permalink raw reply related

* RE: NULL Pointer Deference: NFS & Telnet
From: Arce, Abraham @ 2010-05-26 20:19 UTC (permalink / raw)
  To: Eric Dumazet, David Miller
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nfs@vger.kernel.org, linux-omap@vger.kernel.org,
	tony@atomide.com, Shilimkar, Santosh, Ha, Tristram
In-Reply-To: <1274851741.25136.16.camel@edumazet-laptop>

Thanks Eric, David,

[..]

> > > > -		if (skb_shinfo(skb)->nr_frags) {
> > > > +		if (skb_shinfo(skb)->nr_frags && skb_has_frags(skb)) {
> > > >  			int i;
> > > >  			for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
> > > >  				put_page(skb_shinfo(skb)->frags[i].page);
> > >
> > > skb_shinfo(skb)->nr_frags counts the number of entries contained
> > > in the skb_shinfo(skb)->frags[] array.
> > >
> > > This has nothing to do with the frag list pointer,
> > > skb_shinfo(skb)->frag_list, which is what skb_has_frags()
> > > tests.
> > >
> > > You've got some kind of memory corruption going on and it
> > > appears to have nothing to do with the code paths you're
> > > playing with here.
> >
> > Do you have any recommendation on debugging technique/tool for this memory
> corruption issue?

[..]

> It seems quite strange. You have a skb->nr_frags > 0 value, but a
> frags[i].page = 0 value
> 
> You might add following function :
> 
> shinfo_check(struct sk_buff *skb)
> {
> 	struct skb_shared_info *shinfo = skb_shinfo(skb);
> 	int i;
> 
> 	WARN_ON(shinfo->nr_frags >= MAX_SKB_FRAGS);
> 	for (i = 0; i < shinfo->nr_frags; i++)
> 		WARN_ON(!shinfo->frags[i].page);
> }
> 
> And call it from various points, to check who corrupts your skb.

By increasing the allocation length of our rx skbuff the corruption issue is fixed... I have increased it by 2... Were we writing outside our boundaries of skb data?

Please let me know about this approach...

diff --git a/drivers/net/ks8851.c b/drivers/net/ks8851.c
index b4fb07a..6da81e1 100644
--- a/drivers/net/ks8851.c
+++ b/drivers/net/ks8851.c
@@ -504,7 +504,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
                               ks->rc_rxqcr | RXQCR_SDA | RXQCR_ADRFE);

                if (rxlen > 0) {
-                       skb = netdev_alloc_skb(ks->netdev, rxlen + 2 + 8);
+                       skb = netdev_alloc_skb(ks->netdev, rxlen + 4 + 8);
                        if (!skb) {

Best Regards
Abraham

^ permalink raw reply related

* Re: [patch] caif: unlock on error path in cfserl_receive()
From: Sjur Brændeland @ 2010-05-26 18:18 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: Sjur Braendeland, David S. Miller, netdev, kernel-janitors
In-Reply-To: <20100526151648.GA15439@bicker>

Hi Dan.

Dan Carpenter <error27@gmail.com> wrote:
> There was an spin_unlock missing on the error path.  The spin_lock was
> tucked in with the declarations so it was hard to spot.  I added a new
> line.
>
> Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Braendeland

Well spotted! Thanks for reviewing and fixing this.

Regards
Sjur

^ permalink raw reply

* Re: [PATCH 8/17] net/caif: Add missing spin_unlock
From: Sjur Brændeland @ 2010-05-26 18:13 UTC (permalink / raw)
  To: Julia Lawall; +Cc: David S. Miller, netdev, linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005261756150.23743@ask.diku.dk>

Julia Lawall <julia@diku.dk> wrote:
> From: Julia Lawall <julia@diku.dk>
>
> Add a spin_unlock missing on the error path.  The spin lock is used in a
> balanced way elsewhere in the file.

This bug is already fixed an part of net-2.6 in
"[PATCH net-2.6 5/6] caif: Bugfix - missing spin_unlock".

Regards
Sjur

^ permalink raw reply

* Re: [net-next PATCH] be2net: Adding PCI SRIOV support
From: Venugopal Busireddy @ 2010-05-26 17:55 UTC (permalink / raw)
  To: netdev
In-Reply-To: <6f264a7a-e5da-494a-a24d-1578ca422807@VA3EHSMHS016.ehs.local>


David,
>> From: Sarveshwar Bandi <sarveshwarb@serverengines.com>
>> Date: Wed, 31 Mar 2010 18:26:12 +0530
>> - Patch adds support to enable PCI SRIOV in the driver and changes to
handle \
>>                initialization of PCI virtual functions.
>> - Function handler to change mac addresses for VF from its
corresponding PF.
>> 
>> Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
>Applied, thanks.
 I checked the 2.6.34 kernel, and could not see this patch applied in
it. In which kernel would this patch be available? Also, if I were to
apply this patch, which kernel version would this patch require?
Thanks,
Venu

^ permalink raw reply

* Re: [PATCH 1/17] net/rds: Add missing mutex_unlock
From: Andy Grover @ 2010-05-26 17:55 UTC (permalink / raw)
  To: Julia Lawall
  Cc: David S. Miller, rds-devel, netdev, linux-kernel, kernel-janitors,
	Zach Brown
In-Reply-To: <Pine.LNX.4.64.1005261754030.23743@ask.diku.dk>

Reviewed-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Andy Grover <andy.grover@oracle.com>

-- Andy

On 05/26/2010 08:54 AM, Julia Lawall wrote:
> From: Julia Lawall<julia@diku.dk>
>
> Add a mutex_unlock missing on the error path.  In each case, whenever the
> label out is reached from elsewhere in the function, mutex is not locked.
>
> The semantic match that finds this problem is as follows:
> (http://coccinelle.lip6.fr/)
>
> //<smpl>
> @@
> expression E1;
> @@
>
> * mutex_lock(E1);
>    <+... when != E1
>    if (...) {
>      ... when != E1
> *   return ...;
>    }
>    ...+>
> * mutex_unlock(E1);
> //</smpl>
>
> Signed-off-by: Julia Lawall<julia@diku.dk>
>
> ---
>   net/rds/ib_cm.c |    1 +
>   net/rds/iw_cm.c |    1 +
>   2 files changed, 2 insertions(+)
>
> diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
> index 10ed0d5..f688327 100644
> --- a/net/rds/ib_cm.c
> +++ b/net/rds/ib_cm.c
> @@ -475,6 +475,7 @@ int rds_ib_cm_handle_connect(struct rdma_cm_id *cm_id,
>   	err = rds_ib_setup_qp(conn);
>   	if (err) {
>   		rds_ib_conn_error(conn, "rds_ib_setup_qp failed (%d)\n", err);
> +		mutex_unlock(&conn->c_cm_lock);
>   		goto out;
>   	}
>
> diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c
> index a9d951b..b5dd6ac 100644
> --- a/net/rds/iw_cm.c
> +++ b/net/rds/iw_cm.c
> @@ -452,6 +452,7 @@ int rds_iw_cm_handle_connect(struct rdma_cm_id *cm_id,
>   	err = rds_iw_setup_qp(conn);
>   	if (err) {
>   		rds_iw_conn_error(conn, "rds_iw_setup_qp failed (%d)\n", err);
> +		mutex_unlock(&conn->c_cm_lock);
>   		goto out;
>   	}
>

^ permalink raw reply

* Re: [Linux-ATM-General] RX/close vcc race with solos/atmtcp/usbatm/he
From: Stanislaw Gruszka @ 2010-05-26 18:51 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-atm-general, netdev
In-Reply-To: <1274872584.20576.13579.camel@macbook.infradead.org>

On Wed, May 26, 2010 at 12:16:24PM +0100, David Woodhouse wrote:
> The problem is that the 'find_vcc' functions in these drivers are
> returning a vcc with the ATM_VF_READY bit cleared, because it's already
> in the process of being destroyed. If we fix that simple oversight,
> there's still a race condition because the socket can still be closed
> (and completely freed, afaict) between our call to find_vcc() and our
> call to vcc->push() in the RX tasklet.
> 
> Here's a patch for solos-pci which should fix it. We prevent the race by
> making the dev->ops->close() function wait for the RX tasklet to
> complete, so it can't still be using the vcc in question.

I do not think this is problem with usbatm. We use custom vcc_list with
custom usbatm_vcc_data entries. We remove entry on atmdev_ops->close()
within tasklet_disable()/tasklet_enable() section. After that
rx tasklet will not find usbatm_vcc_data for corresponding vpi, vpi.

Stanislaw

^ permalink raw reply

* [PATCH 1/2] net: ll_temac: fix interrupt bug when interrupt 0 is used
From: John Linn @ 2010-05-26 17:29 UTC (permalink / raw)
  To: netdev, linuxppc-dev, grant.likely, jwboyer
  Cc: john.williams, michal.simek, John Linn, Brian Hill

The code is not checking the interrupt for DMA correctly so that an
interrupt number of 0 will cause a false error.

Signed-off-by: Brian Hill <brian.hill@xilinx.com>
Signed-off-by: John Linn <john.linn@xilinx.com>
---
 drivers/net/ll_temac_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ll_temac_main.c b/drivers/net/ll_temac_main.c
index fa7620e..0615737 100644
--- a/drivers/net/ll_temac_main.c
+++ b/drivers/net/ll_temac_main.c
@@ -950,7 +950,7 @@ temac_of_probe(struct of_device *op, const struct of_device_id *match)

 	lp->rx_irq = irq_of_parse_and_map(np, 0);
 	lp->tx_irq = irq_of_parse_and_map(np, 1);
-	if (!lp->rx_irq || !lp->tx_irq) {
+	if ((lp->rx_irq == NO_IRQ) || (lp->tx_irq == NO_IRQ)) {
 		dev_err(&op->dev, "could not determine irqs\n");
 		rc = -ENOMEM;
 		goto nodev;
-- 
1.6.2.1

This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox