Netdev List
* [PATCH] net: ipv4: tcp_probe: cleanup snprintf() use
From: Vasiliy Kulikov @ 2010-11-14 17:06 UTC (permalink / raw)
  To: kernel-janitors
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
	linux-kernel

snprintf() returns the number of bytes that would have been written had the
buffer been large enough; this equals the number of bytes copied only if there
is no overflow.  This code uses the return value as the number of copied bytes.
Theoretically the format string
'%lu.%09lu %pI4:%u %pI4:%u %d %#x %#x %u %u %u %u\n' may be expanded up to
163 bytes.  In reality tv.tv_sec is just a few bytes instead of 20, the 2 ports
are just 5 bytes each instead of 10, and length is 5 bytes instead of 10.  The
rest is untrusted input.  Theoretically, if tv_sec is big, then copy_to_user()
would read past the end of tbuf.

tbuf was increased to fit 163 bytes plus the terminating NUL.  scnprintf() is
used so that the return value is the number of bytes actually written.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
---
 Compile tested.

Format length:
	20 for '%lu'
	1 for '.'
	9 for '%09lu'
	1 for ' '
	15 for '%pI4'
	1 for ':'
	10 for '%u'
	1 for ' '
	15 for '%pI4'
	1 for ':'
	10 for '%u'
	1 for ' '
	11 for '%d'
	1 for ' '
	10 for '%#x'
	1 for ' '
	10 for '%#x'
	1 for ' '
	10 for '%u'
	1 for ' '
	10 for '%u'
	1 for ' '
	10 for '%u'
	1 for ' '
	10 for '%u'
	1 for '\n'
163 for '%lu.%09lu %pI4:%u %pI4:%u %d %#x %#x %u %u %u %u\n'
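
For readers unfamiliar with the difference in return values, here is a minimal
userspace sketch. my_scnprintf() is a hypothetical stand-in written for this
illustration, not the kernel implementation; it only assumes that the kernel's
scnprintf() reports the number of characters actually stored, excluding the
trailing NUL.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for scnprintf(): returns the number of
 * characters actually stored in buf, not counting the trailing NUL.
 */
static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int len;

	assert(buf != NULL);
	if (size == 0)
		return 0;

	va_start(args, fmt);
	len = vsnprintf(buf, size, fmt, args);	/* length the output wanted */
	va_end(args);

	if (len < 0)
		return 0;			/* formatting error */
	if ((size_t)len >= size)
		return (int)(size - 1);		/* output was truncated */
	return len;
}
```

With an 8-byte buffer and the 11-character string "hello world", snprintf()
returns 11 while the wrapper returns 7, which is the only value that is safe
to pass on as a copy length.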

 net/ipv4/tcp_probe.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c
index 6211e21..3b7bf19 100644
--- a/net/ipv4/tcp_probe.c
+++ b/net/ipv4/tcp_probe.c
@@ -154,7 +154,7 @@ static int tcpprobe_sprint(char *tbuf, int n)
 	struct timespec tv
 		= ktime_to_timespec(ktime_sub(p->tstamp, tcp_probe.start));
 
-	return snprintf(tbuf, n,
+	return scnprintf(tbuf, n,
 			"%lu.%09lu %pI4:%u %pI4:%u %d %#x %#x %u %u %u %u\n",
 			(unsigned long) tv.tv_sec,
 			(unsigned long) tv.tv_nsec,
@@ -174,7 +174,7 @@ static ssize_t tcpprobe_read(struct file *file, char __user *buf,
 		return -EINVAL;
 
 	while (cnt < len) {
-		char tbuf[128];
+		char tbuf[164];
 		int width;
 
 		/* Wait for data in buffer */
-- 
1.7.0.4


* [PATCH] tcp: restrict net.ipv4.tcp_adv_win_scale (#20312)
From: Alexey Dobriyan @ 2010-11-14 15:18 UTC (permalink / raw)
  To: davem; +Cc: shemminger, netdev

tcp_win_from_space() does the following:

	if (sysctl_tcp_adv_win_scale <= 0)
		return space >> (-sysctl_tcp_adv_win_scale);
	else
		return space - (space >> sysctl_tcp_adv_win_scale);

"space" is int.

As per C99 6.5.7 (3), shifting an int by 32 or more bits is
undefined behaviour.

Indeed, if sysctl_tcp_adv_win_scale is exactly 32, space >> 32 equals
space and the function returns 0,

which means we busy-loop in tcp_fixup_rcvbuf().

Restrict net.ipv4.tcp_adv_win_scale to [-31, 31].

Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312

Steps to reproduce:

	echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale
	wget www.kernel.org
	[softlockup]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---
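
A minimal userspace sketch of the arithmetic, for illustration only. The names
adv_win_scale_valid() and win_from_space() are made up here; the second one
copies the two branches quoted above with the scale passed in explicitly, and
the first models a range check that rejects values outside [-31, 31].

```c
#include <assert.h>
#include <stdbool.h>

#define ADV_WIN_SCALE_MIN (-31)
#define ADV_WIN_SCALE_MAX 31

/* Range check: true only for scales where both shifts are well defined. */
static bool adv_win_scale_valid(int scale)
{
	return scale >= ADV_WIN_SCALE_MIN && scale <= ADV_WIN_SCALE_MAX;
}

/* Same arithmetic as the snippet above, with the scale as a parameter. */
static int win_from_space(int space, int scale)
{
	assert(space >= 0);
	assert(adv_win_scale_valid(scale));

	if (scale <= 0)
		return space >> (-scale);
	return space - (space >> scale);
}
```

For space = 1024, a scale of 2 gives 768 and a scale of -2 gives 256. At the
upper limit of 31 the result is still 1024, so a positive scale inside the
allowed range cannot turn non-zero space into a zero window.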

--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -26,6 +26,8 @@ static int zero;
 static int tcp_retr1_max = 255;
 static int ip_local_port_range_min[] = { 1, 1 };
 static int ip_local_port_range_max[] = { 65535, 65535 };
+static int _minus_31 = -31;
+static int _31 = 31;
 
 /* Update system visible IP port range */
 static void set_local_port_range(int range[2])
@@ -426,7 +428,9 @@ static struct ctl_table ipv4_table[] = {
 		.data		= &sysctl_tcp_adv_win_scale,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &_minus_31,
+		.extra2		= &_31,
 	},
 	{
 		.procname	= "tcp_tw_reuse",


* Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps
From: Zhang Le @ 2010-11-14 15:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linux-kernel, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy
In-Reply-To: <1289724745.2743.61.camel@edumazet-laptop>


On 09:52 Sun 14 Nov     , Eric Dumazet wrote:
> Le dimanche 14 novembre 2010 à 15:35 +0800, Zhang Le a écrit :
> > Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
> > req->ts_recent. In this case, theoretically the req should not be ignored.
> > 
> > But in fact, it could be ignored, if peer->tcp_ts is so small that the
> > difference between these two numbers is larger than 2 to the power of 31.
> > 
> > I understand that under this situation, timestamp does not make sense any more,
> > because it actually comes from different machines. However, if anyone
> > ever needs to do the same investigation which I have done, this will
> > save some time for him.
> > 
> > Signed-off-by: Zhang Le <r0bertz@gentoo.org>
> > ---
> >  net/ipv4/tcp_ipv4.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index 8f8527d..1eb4974 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> >  		    peer->v4daddr == saddr) {
> >  			inet_peer_refcheck(peer);
> >  			if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
> > -			    (s32)(peer->tcp_ts - req->ts_recent) >
> > -							TCP_PAWS_WINDOW) {
> > +			    ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
> > +			     peer->tcp_ts > req->ts_recent)) {
> >  				NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
> >  				goto drop_and_release;
> >  			}
> 
> This seems very wrong to me.
> 
> Adding a : if (peer->tcp_ts > req->ts_recent) condition is _not_ going
> to help. And it might break some working setups, because of wrap around.

Yeah, you are right. And sorry for overlooking this.

I should have reviewed time_{before,after}'s implementation before posting this.

So it seems we can't do anything to improve this except to add some warning in
documentation. Maybe some comments in the code too.
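
To make the wrap-around point concrete, here is a small userspace sketch. It
is an illustration under assumptions, not kernel code: PAWS_WINDOW is assumed
to mirror TCP_PAWS_WINDOW, and the function names are made up. It compares the
original signed-difference test with the extra condition from my patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAWS_WINDOW 1	/* assumption: mirrors TCP_PAWS_WINDOW */

/* Serial-number style difference: positive when a is "after" b. */
static int32_t ts_delta(uint32_t a, uint32_t b)
{
	/* The conversion wraps on two's complement targets. */
	return (int32_t)(a - b);
}

/* Original check: reject when the cached timestamp is ahead of the request. */
static bool paws_would_reject(uint32_t peer_ts, uint32_t req_ts)
{
	return ts_delta(peer_ts, req_ts) > PAWS_WINDOW;
}

/* Check with the extra plain comparison from the withdrawn patch. */
static bool patched_would_reject(uint32_t peer_ts, uint32_t req_ts)
{
	return paws_would_reject(peer_ts, req_ts) && peer_ts > req_ts;
}

static void paws_examples(void)
{
	/* Same host, counter wrapped: cached 100 is really ahead of 0xFFFFFF00. */
	assert(paws_would_reject(100u, 0xFFFFFF00u));
	assert(!patched_would_reject(100u, 0xFFFFFF00u));	/* check is lost */

	/* Different hosts behind a NAT, more than 2^31 apart. */
	assert(paws_would_reject(1000u, 3000000000u));
	assert(!patched_would_reject(1000u, 3000000000u));
}
```

Both pairs look the same to the extra comparison (small cached value, large
request value), so it cannot tell a legitimate wrap from unrelated clocks.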

> 
> Really, if you have multiple clients behind a common NAT, you cannot use
> this code at all, since NAT doesn't usually change TCP timestamps.
> 
> What about following patch instead ?
> 
> [PATCH] doc: extend tcp_tw_recycle documentation
> 
> tcp_tw_recycle should not be used on a server if there is a chance
> clients are behind a same NAT. Document this fact before too many users
> discover this too late.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  Documentation/networking/ip-sysctl.txt |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index c7165f4..406f0d5 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER
>  tcp_tw_recycle - BOOLEAN
>  	Enable fast recycling TIME-WAIT sockets. Default value is 0.
>  	It should not be changed without advice/request of technical
> -	experts.
> +	experts. If you set it to 1, make sure you dont miss connections
> +	attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter).
> +	In particular, this might break if several clients are behind
> +	a common NAT device, since their TCP timestamp wont be changed
> +	by the NAT. tcp_tw_recycle should be used with care, most
> +	probably in private networks.
>  
>  tcp_tw_reuse - BOOLEAN
>  	Allow to reuse TIME-WAIT sockets for new connections when it is
> 
> 

-- 
Zhang, Le
Gentoo/Loongson Developer
http://zhangle.is-a-geek.org
0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973



* Re: [PATCH 2/2] ucc_geth: Fix deadlock
From: Joakim Tjernlund @ 2010-11-14 14:43 UTC (permalink / raw)
  To: Anton Vorontsov; +Cc: linuxppc-dev, netdev
In-Reply-To: <20101112140947.GB28223@oksana.dev.rtsoft.ru>

Anton Vorontsov <cbouatmailru@gmail.com> wrote on 2010/11/12 15:09:47:
>
> On Fri, Nov 12, 2010 at 02:55:09PM +0100, Joakim Tjernlund wrote:
> > This script:
> >  while [ 1==1 ] ; do ifconfig eth0 up; usleep 1950000 ;ifconfig eth0 down; dmesg -c ;done
> > causes in just a second or two:
> > INFO: task ifconfig:572 blocked for more than 120 seconds.
> [...]
> > The reason appears to be ucc_geth_stop meets adjust_link as the
> > PHY reports PHY changes. I believe adjust_link hangs somewhere,
> > holding the PHY lock, because ucc_geth_stop disabled the
> > controller HW.
> > Fix is to stop the PHY before disabling the controller.
> >
> > Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
>
> It's unclear where exactly adjust_link() hangs, but the patch
> looks like the right thing overall.

Yes, I too cannot find where it is hanging, just that it is hanging somewhere.
I am starting to think it is hanging somewhere else. Anyhow, the hang
goes away 100% when this patch is applied.

 Jocke



* Re: [PATCH update 2] firewire: net: throttle TX queue before running out of tlabels
From: Stefan Richter @ 2010-11-14 13:35 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: netdev, linux1394-devel, linux-kernel
In-Reply-To: <1289735532.24539.12.camel@maxim-laptop>

On 14 Nov, Maxim Levitsky wrote:
> On Sun, 2010-11-14 at 10:25 +0100, Stefan Richter wrote:
>> Maxim Levitsky wrote:
>> > However the 'update 2' (maybe update 1 too, didn't test), lowers
>> > desktop->laptop throughput somewhat.
>> > (250 vs 227 Mbits/s). I tested this many times.
>> > 
>> > Actual raw throughput possible with a UDP stream and either no throttling
>> > or a higher packets-in-flight count (I tested 50/30) is 280 Mbits/s.
>> 
>> Good, I will test deeper queues with a few different controllers here.  As
>> long as we keep a margin to 64 so that other traffic besides IPover1394 still
>> has a chance to acquire transaction labels, it's OK.
> Just tested the 'update 2' with 8-16 margin. Gives me ~250 Mbits/s TCP
> easily, and ~280 Mbit/s UDP. Pretty much the maximum it's possible to get
> out of this hardware.

Good, update below.  Tested also with an OS X peer on my side to exclude
throughput regression.

>> > BTW, I still don't understand fully  why my laptop sends only at 180
>> > Mbits/s pretty much always regardless of patches or TCP/UDP.
>> 
>> If it is not CPU bound, then it is because Ricoh did not optimize the AR DMA
>> unit as well as Texas Instruments did.
> You mean AT, because in the fast case (desktop->laptop), the TI
> transmits and Ricoh receives. In the slow case Ricoh transmits and TI
> receives.

Yes, I meant to write 'AT'.

> Anyway speeds of new stack beat the old one by significant margin.

Gap count optimization surely plays a big role in this.

---- 8< ----
[PATCH update 3] firewire: net: throttle TX queue before running out of tlabels

This prevents firewire-net from submitting write requests in fast
succession until failure because all 64 transaction labels were used up
for unfinished split transactions.  The netif_stop/wake_queue API is
used for this purpose.

Without this stop/wake mechanism, datagrams were simply lost whenever
the tlabel pool was exhausted.  Plus, tlabel exhaustion by firewire-net
also prevented other unrelated outbound transactions from being initiated.

The chosen queue depth was checked by me to hit the maximum possible
throughput with an OS X peer whose receive DMA is good enough to never
reject requests due to busy inbound request FIFO.  Current Linux peers
show a mixed picture of -5%...+15% change in bandwidth; their current
bottleneck is RCODE_BUSY situations (fewer or more, depending on TX
queue depth) due to too small AR buffer in firewire-ohci.

Maxim Levitsky tested this change with similar watermarks with a Linux
peer and some pending firewire-ohci improvements that address the
RCODE_BUSY problem and confirmed that these TX queue limits are good.

Note:  This removes some netif_wake_queue calls from reception code paths.
They were apparently copy&paste artefacts from a nonsensical
netif_wake_queue use in the older eth1394 driver.  Waking the queue
belongs only in the transmit path.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Tested-by: Maxim Levitsky <maximlevitsky@gmail.com>
---
 drivers/firewire/net.c |   59 ++++++++++++++++++++++++-----------------
 1 file changed, 35 insertions(+), 24 deletions(-)
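
The stop/wake hysteresis can be modelled outside the driver. The sketch below
is a simplified illustration with made-up names (txq_model, txq_enqueue,
txq_complete); it assumes a single counter protected by the caller and uses
the same two watermarks as the patch, 20 and 10.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_QUEUED 20	/* stop the queue at this depth */
#define MIN_QUEUED 10	/* wake it again once drained to this depth */

/* Toy model of the TX queue state; not the driver's data structure. */
struct txq_model {
	int queued;
	bool stopped;
};

/* Models the transmit path: count the datagram, stop at the high mark. */
static void txq_enqueue(struct txq_model *q)
{
	assert(!q->stopped);	/* nothing is submitted on a stopped queue */
	if (++q->queued == MAX_QUEUED)
		q->stopped = true;
}

/* Models a completion: uncount the datagram, wake at the low mark. */
static void txq_complete(struct txq_model *q)
{
	assert(q->queued > 0);
	if (--q->queued == MIN_QUEUED)
		q->stopped = false;
}
```

After 20 enqueues the model is stopped; it stays stopped while 9 datagrams
complete and is woken by the 10th completion, when the depth reaches 10.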

Index: b/drivers/firewire/net.c
===================================================================
--- a/drivers/firewire/net.c
+++ b/drivers/firewire/net.c
@@ -28,8 +28,14 @@
 #include <asm/unaligned.h>
 #include <net/arp.h>
 
-#define FWNET_MAX_FRAGMENTS	25	/* arbitrary limit */
-#define FWNET_ISO_PAGE_COUNT	(PAGE_SIZE < 16 * 1024 ? 4 : 2)
+/* rx limits */
+#define FWNET_MAX_FRAGMENTS		30 /* arbitrary, > TX queue depth */
+#define FWNET_ISO_PAGE_COUNT		(PAGE_SIZE < 16*1024 ? 4 : 2)
+
+/* tx limits */
+#define FWNET_MAX_QUEUED_DATAGRAMS	20 /* < 64 = number of tlabels */
+#define FWNET_MIN_QUEUED_DATAGRAMS	10 /* should keep AT DMA busy enough */
+#define FWNET_TX_QUEUE_LEN		FWNET_MAX_QUEUED_DATAGRAMS /* ? */
 
 #define IEEE1394_BROADCAST_CHANNEL	31
 #define IEEE1394_ALL_NODES		(0xffc0 | 0x003f)
@@ -641,8 +647,6 @@ static int fwnet_finish_incoming_packet(
 		net->stats.rx_packets++;
 		net->stats.rx_bytes += skb->len;
 	}
-	if (netif_queue_stopped(net))
-		netif_wake_queue(net);
 
 	return 0;
 
@@ -651,8 +655,6 @@ static int fwnet_finish_incoming_packet(
 	net->stats.rx_dropped++;
 
 	dev_kfree_skb_any(skb);
-	if (netif_queue_stopped(net))
-		netif_wake_queue(net);
 
 	return -ENOENT;
 }
@@ -784,15 +786,10 @@ static int fwnet_incoming_packet(struct 
 	 * Datagram is not complete, we're done for the
 	 * moment.
 	 */
-	spin_unlock_irqrestore(&dev->lock, flags);
-
-	return 0;
+	retval = 0;
  fail:
 	spin_unlock_irqrestore(&dev->lock, flags);
 
-	if (netif_queue_stopped(net))
-		netif_wake_queue(net);
-
 	return retval;
 }
 
@@ -892,6 +889,13 @@ static void fwnet_free_ptask(struct fwne
 	kmem_cache_free(fwnet_packet_task_cache, ptask);
 }
 
+/* Caller must hold dev->lock. */
+static void dec_queued_datagrams(struct fwnet_device *dev)
+{
+	if (--dev->queued_datagrams == FWNET_MIN_QUEUED_DATAGRAMS)
+		netif_wake_queue(dev->netdev);
+}
+
 static int fwnet_send_packet(struct fwnet_packet_task *ptask);
 
 static void fwnet_transmit_packet_done(struct fwnet_packet_task *ptask)
@@ -908,7 +912,7 @@ static void fwnet_transmit_packet_done(s
 	/* Check whether we or the networking TX soft-IRQ is last user. */
 	free = (ptask->outstanding_pkts == 0 && ptask->enqueued);
 	if (free)
-		dev->queued_datagrams--;
+		dec_queued_datagrams(dev);
 
 	if (ptask->outstanding_pkts == 0) {
 		dev->netdev->stats.tx_packets++;
@@ -979,7 +983,7 @@ static void fwnet_transmit_packet_failed
 	/* Check whether we or the networking TX soft-IRQ is last user. */
 	free = ptask->enqueued;
 	if (free)
-		dev->queued_datagrams--;
+		dec_queued_datagrams(dev);
 
 	dev->netdev->stats.tx_dropped++;
 	dev->netdev->stats.tx_errors++;
@@ -1064,7 +1068,7 @@ static int fwnet_send_packet(struct fwne
 		if (!free)
 			ptask->enqueued = true;
 		else
-			dev->queued_datagrams--;
+			dec_queued_datagrams(dev);
 
 		spin_unlock_irqrestore(&dev->lock, flags);
 
@@ -1083,7 +1087,7 @@ static int fwnet_send_packet(struct fwne
 	if (!free)
 		ptask->enqueued = true;
 	else
-		dev->queued_datagrams--;
+		dec_queued_datagrams(dev);
 
 	spin_unlock_irqrestore(&dev->lock, flags);
 
@@ -1249,6 +1253,15 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 	struct fwnet_peer *peer;
 	unsigned long flags;
 
+	spin_lock_irqsave(&dev->lock, flags);
+
+	/* Can this happen? */
+	if (netif_queue_stopped(dev->netdev)) {
+		spin_unlock_irqrestore(&dev->lock, flags);
+
+		return NETDEV_TX_BUSY;
+	}
+
 	ptask = kmem_cache_alloc(fwnet_packet_task_cache, GFP_ATOMIC);
 	if (ptask == NULL)
 		goto fail;
@@ -1267,9 +1280,6 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 	proto = hdr_buf.h_proto;
 	dg_size = skb->len;
 
-	/* serialize access to peer, including peer->datagram_label */
-	spin_lock_irqsave(&dev->lock, flags);
-
 	/*
 	 * Set the transmission type for the packet.  ARP packets and IP
 	 * broadcast packets are sent via GASP.
@@ -1291,7 +1301,7 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 
 		peer = fwnet_peer_find_by_guid(dev, be64_to_cpu(guid));
 		if (!peer || peer->fifo == FWNET_NO_FIFO_ADDR)
-			goto fail_unlock;
+			goto fail;
 
 		generation         = peer->generation;
 		dest_node          = peer->node_id;
@@ -1345,7 +1355,8 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 		max_payload += RFC2374_FRAG_HDR_SIZE;
 	}
 
-	dev->queued_datagrams++;
+	if (++dev->queued_datagrams == FWNET_MAX_QUEUED_DATAGRAMS)
+		netif_stop_queue(dev->netdev);
 
 	spin_unlock_irqrestore(&dev->lock, flags);
 
@@ -1356,9 +1367,9 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 
 	return NETDEV_TX_OK;
 
- fail_unlock:
-	spin_unlock_irqrestore(&dev->lock, flags);
  fail:
+	spin_unlock_irqrestore(&dev->lock, flags);
+
 	if (ptask)
 		kmem_cache_free(fwnet_packet_task_cache, ptask);
 
@@ -1415,7 +1426,7 @@ static void fwnet_init_dev(struct net_de
 	net->addr_len		= FWNET_ALEN;
 	net->hard_header_len	= FWNET_HLEN;
 	net->type		= ARPHRD_IEEE1394;
-	net->tx_queue_len	= 10;
+	net->tx_queue_len	= FWNET_TX_QUEUE_LEN;
 	SET_ETHTOOL_OPS(net, &fwnet_ethtool_ops);
 }
 



-- 
Stefan Richter
-=====-==-=- =-== -===-
http://arcgraph.de/sr/




* Re: [PATCH update 2] firewire: net: throttle TX queue before running out of tlabels
From: Maxim Levitsky @ 2010-11-14 11:52 UTC (permalink / raw)
  To: Stefan Richter; +Cc: netdev, linux1394-devel, linux-kernel
In-Reply-To: <4CDFAB10.5050800@s5r6.in-berlin.de>

On Sun, 2010-11-14 at 10:25 +0100, Stefan Richter wrote:
> Maxim Levitsky wrote:
> > In fact after a lot of testing I see that the original patch, 
> > '[PATCH 4/4] firewire: net: throttle TX queue before running out of
> > tlabels' works the best here.
> > With AR fixes, I don't see even a single fwnet_write_complete error on
> > either side.
> 
> Well, that version missed that the rx path opened up the tx queue again. I.e.
> it did not work as intended.
> 
> > However the 'update 2' (maybe update 1 too, didn't test), lowers
> > desktop->laptop throughput somewhat.
> > (250 vs 227 Mbits/s). I tested this many times.
> > 
> > Actual raw throughput possible with a UDP stream and either no throttling
> > or a higher packets-in-flight count (I tested 50/30) is 280 Mbits/s.
> 
> Good, I will test deeper queues with a few different controllers here.  As
> long as we keep a margin to 64 so that other traffic besides IPover1394 still
> has a chance to acquire transaction labels, it's OK.
Just tested the 'update 2' with 8-16 margin. Gives me ~250 Mbits/s TCP
easily, and ~280 Mbit/s UDP. Pretty much the maximum it's possible to get
out of this hardware.

> 
> > BTW, I still don't understand fully  why my laptop sends only at 180
> > Mbits/s pretty much always regardless of patches or TCP/UDP.
> 
> If it is not CPU bound, then it is because Ricoh did not optimize the AR DMA
> unit as well as Texas Instruments did.
You mean AT, because in the fast case (desktop->laptop), the TI
transmits and Ricoh receives. In the slow case Ricoh transmits and TI
receives.
Anyway speeds of new stack beat the old one by significant margin.

Best regards,
	Maxim Levitsky





* [PATCH] netfilter: define nat_pptp_info as needed
From: Changli Gao @ 2010-11-14 11:47 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David S. Miller, netfilter-devel, netdev, Changli Gao

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 include/net/netfilter/nf_nat.h |    2 ++
 1 file changed, 2 insertions(+)
diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index f5f09f0..d082938 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -56,7 +56,9 @@ struct nf_nat_multi_range_compat {
 /* per conntrack: nat application helper private data */
 union nf_conntrack_nat_help {
 	/* insert nat helper private data here */
+#if defined(CONFIG_NF_NAT_PPTP) || defined(CONFIG_NF_NAT_PPTP_MODULE)
 	struct nf_nat_pptp nat_pptp_info;
+#endif
 };
 
 struct nf_conn;


* sky2 for on-board 88e8055 fails to detect mac address
From: Guillaume Leclanche @ 2010-11-14 11:42 UTC (permalink / raw)
  To: netdev

Hello,

I use a "Marvell Technology Group Ltd. 88E8055 PCI-E Gigabit Ethernet
Controller".
It seems that the sky2 module doesn't manage to grab the mac address
from the HW :

[    0.741173] sky2: driver version 1.28
[    0.741206] sky2 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[    0.741216] sky2 0000:02:00.0: setting latency timer to 64
[    0.741244] sky2 0000:02:00.0: Yukon-2 EC Ultra chip revision 2
[    0.741339] sky2 0000:02:00.0: irq 44 for MSI/MSI-X
[    0.744781] sky2 0000:02:00.0: eth0: addr 00:00:00:00:00:00

Of course I can set the mac address manually with "ip link set eth0
addr ..." (and then it works fine) but I guess that's not the normal
behavior from the driver?

I'd be happy to troubleshoot but I'm not very used to kernel
internals, so tell me if you need some output.
Below a few targeted info about my system.

Guillaume

2.6.35-22-generic #35-Ubuntu SMP Sat Oct 16 20:36:48 UTC 2010 i686 GNU/Linux

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8055
PCI-E Gigabit Ethernet Controller
	Subsystem: Marvell Technology Group Ltd. 88E8055 PCI-E Gigabit
Ethernet Controller
	Flags: bus master, fast devsel, latency 0, IRQ 44
	Memory at fe8fc000 (64-bit, non-prefetchable) [size=16K]
	I/O ports at a800 [size=256]
	Expansion ROM at fe8c0000 [disabled] [size=128K]
	Capabilities: [48] Power Management version 3
	Capabilities: [50] Vital Product Data
	Capabilities: [5c] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [e0] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Kernel driver in use: sky2
	Kernel modules: sky2


* Re: Fwd: a Great Idea - include Kademlia networking protocol in kernel -- REVISITED
From: Eric Dumazet @ 2010-11-14  9:39 UTC (permalink / raw)
  To: Marcos; +Cc: netdev, Stephen Guerin
In-Reply-To: <AANLkTimkPpWob8ANySeWBvDE+Pq2wy4SQWJOSYzbS7QG@mail.gmail.com>

Le dimanche 14 novembre 2010 à 02:14 -0700, Marcos a écrit :
> > I have no idea why and how kademlia would be added to "linux kernel"
> >
> > It's a protocol based on UDP, and probably already done in userland.
> >
> > What am I missing ?
> 
> The idea is to tightly couple it to the operating system to create a
> sort of "super operating system" that is seamless to the application
> layers above.  Just like memory stores are tightly integrated as to be
> unnoticeable....
> 

But we don't want a "super operating system". We want a good one.

Memory stores done in userland are as fast as memory stores done in
kernel.

Once you need to access files, perform complex searches, timers,
logging, and all the stuff, you really want to do it from userland, in
high level language that many programmers master, or get something that
is too complex/buggy.





* Re: [PATCH update 2] firewire: net: throttle TX queue before running out of tlabels
From: Stefan Richter @ 2010-11-14  9:25 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: linux1394-devel, linux-kernel, netdev
In-Reply-To: <1289710228.8581.16.camel@maxim-laptop>

Maxim Levitsky wrote:
> In fact after a lot of testing I see that the original patch, 
> '[PATCH 4/4] firewire: net: throttle TX queue before running out of
> tlabels' works the best here.
> With AR fixes, I don't see even a single fwnet_write_complete error on
> either side.

Well, that version missed that the rx path opened up the tx queue again. I.e.
it did not work as intended.

> However the 'update 2' (maybe update 1 too, didn't test), lowers
> desktop->laptop throughput somewhat.
> (250 vs 227 Mbits/s). I tested this many times.
> 
> Actual raw throughput possible with a UDP stream and either no throttling
> or a higher packets-in-flight count (I tested 50/30) is 280 Mbits/s.

Good, I will test deeper queues with a few different controllers here.  As
long as we keep a margin to 64 so that other traffic besides IPover1394 still
has a chance to acquire transaction labels, it's OK.

> BTW, I still don't understand fully  why my laptop sends only at 180
> Mbits/s pretty much always regardless of patches or TCP/UDP.

If it is not CPU bound, then it is because Ricoh did not optimize the AR DMA
unit as well as Texas Instruments did.
-- 
Stefan Richter
-=====-==-=- =-== -===-
http://arcgraph.de/sr/


* Re: Fwd: a Great Idea - include Kademlia networking protocol in kernel -- REVISITED
From: Marcos @ 2010-11-14  9:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Stephen Guerin
In-Reply-To: <1289724643.2743.58.camel@edumazet-laptop>

> I have no idea why and how kademlia would be added to "linux kernel"
>
> It's a protocol based on UDP, and probably already done in userland.
>
> What am I missing ?

The idea is to tightly couple it to the operating system to create a
sort of "super operating system" that is seamless to the application
layers above.  Just like memory stores are tightly integrated as to be
unnoticeable....

marcos


* [PATCH] netfilter: don't use atomic bit operation
From: Changli Gao @ 2010-11-14  9:05 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David S. Miller, netfilter-devel, netdev, Changli Gao

As we own ct, and the others can't see it until we confirm it, we don't
need to use atomic bit operation on ct->status.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 include/net/netfilter/nf_nat_core.h |    4 ++--
 net/ipv4/netfilter/nf_nat_core.c    |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/net/netfilter/nf_nat_core.h b/include/net/netfilter/nf_nat_core.h
index 33602ab..52ac1d8 100644
--- a/include/net/netfilter/nf_nat_core.h
+++ b/include/net/netfilter/nf_nat_core.h
@@ -21,9 +21,9 @@ static inline int nf_nat_initialized(struct nf_conn *ct,
 				     enum nf_nat_manip_type manip)
 {
 	if (manip == IP_NAT_MANIP_SRC)
-		return test_bit(IPS_SRC_NAT_DONE_BIT, &ct->status);
+		return ct->status & IPS_SRC_NAT_DONE;
 	else
-		return test_bit(IPS_DST_NAT_DONE_BIT, &ct->status);
+		return ct->status & IPS_DST_NAT_DONE;
 }
 
 struct nlattr;
diff --git a/net/ipv4/netfilter/nf_nat_core.c b/net/ipv4/netfilter/nf_nat_core.c
index c04787c..ab877ac 100644
--- a/net/ipv4/netfilter/nf_nat_core.c
+++ b/net/ipv4/netfilter/nf_nat_core.c
@@ -323,9 +323,9 @@ nf_nat_setup_info(struct nf_conn *ct,
 
 	/* It's done. */
 	if (maniptype == IP_NAT_MANIP_DST)
-		set_bit(IPS_DST_NAT_DONE_BIT, &ct->status);
+		ct->status |= IPS_DST_NAT_DONE;
 	else
-		set_bit(IPS_SRC_NAT_DONE_BIT, &ct->status);
+		ct->status |= IPS_SRC_NAT_DONE;
 
 	return NF_ACCEPT;
 }


* Re: [PATCH/RFC] netfilter: nf_conntrack_sip: Handle quirky Cisco phones
From: Eric Dumazet @ 2010-11-14  8:59 UTC (permalink / raw)
  To: Kevin Cernekee
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
	netfilter-devel, netfilter, coreteam, linux-kernel, netdev
In-Reply-To: <28d666269c390965f1a4edca42f93c12@localhost>

Le dimanche 14 novembre 2010 à 00:32 -0800, Kevin Cernekee a écrit :
> Most SIP devices use a source port of 5060/udp on SIP requests, so the
> response automatically comes back to port 5060:
> 
> phone_ip:5060 -> proxy_ip:5060   REGISTER
> proxy_ip:5060 -> phone_ip:5060   100 Trying
> 
> The newer Cisco IP phones, however, use a randomly chosen high source
> port for the SIP request but expect the response on port 5060:
> 
> phone_ip:49173 -> proxy_ip:5060  REGISTER
> proxy_ip:5060 -> phone_ip:5060   100 Trying
> 
> Standard Linux NAT, with or without nf_nat_sip, will send the reply back
> to port 49173, not 5060:
> 
> phone_ip:49173 -> proxy_ip:5060  REGISTER
> proxy_ip:5060 -> phone_ip:49173  100 Trying
> 
> But the phone is not listening on 49173, so it will never see the reply.
> 
> This issue was seen on a Cisco CP-7965G, firmware 8-5(3).  It appears
> to be a well-known problem on 7941 and newer:
> 
> http://www.voip-info.org/wiki/view/Standalone+Cisco+7941%252F7961+without+a+local+PBX
> 
> Search for "Connecting to the outside world"
> 
> I contacted Cisco support and they were not amenable to changing the
> behavior.  It appears to be RFC3261-compliant, as the "Sent-by port"
> field in the request specifies 5060:
> 

There is a difference between being RFC compliant, and being usable.

Most SIP software I know will break with such a stupid Cisco behavior.



> 18.2.2 Sending Responses
> 
>    The server transport uses the value of the top Via header field in
>    order to determine where to send a response.  It MUST follow the
>    following process:
> 
> ...
> 
>       o  Otherwise (for unreliable unicast transports), if the top Via
>          has a "received" parameter, the response MUST be sent to the
>          address in the "received" parameter, using the port indicated
>          in the "sent-by" value, or using port 5060 if none is specified
>          explicitly.  If this fails, for example, elicits an ICMP "port
>          unreachable" response, the procedures of Section 5 of [4]
>          SHOULD be used to determine where to send the response.
> 
> This patch modifies nf_*_sip to work around this quirk, by rewriting
> the response port to 5060 when the following conditions are met:
> 
>  - User-Agent starts with "Cisco"
> 
>  - Incoming TTL was exactly 64 (meaning that our system is the phone's
>    local router, not an intermediate router)
> 

This seems a hack to me, sorry. How many different vendors will switch
to "Cisco" broken way, and we have to patch over and over ?

I would like to get an exact SIP exchange to make sure there is not
another way to handle this without adding a "Cisco" string somewhere...

Please provide a pcap or tcpdump -A

Thanks




* Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps
From: Eric Dumazet @ 2010-11-14  8:52 UTC (permalink / raw)
  To: Zhang Le
  Cc: netdev, linux-kernel, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy
In-Reply-To: <1289720156-30118-1-git-send-email-r0bertz@gentoo.org>

Le dimanche 14 novembre 2010 à 15:35 +0800, Zhang Le a écrit :
> Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
> req->ts_recent. In this case, theoretically the req should not be ignored.
> 
> But in fact, it could be ignored, if peer->tcp_ts is so small that the
> difference between these two numbers is larger than 2 to the power of 31.
> 
> I understand that in this situation the timestamp does not make sense any more,
> because it actually comes from different machines. However, if anyone
> ever needs to do the same investigation that I have done, this will
> save them some time.
> 
> Signed-off-by: Zhang Le <r0bertz@gentoo.org>
> ---
>  net/ipv4/tcp_ipv4.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 8f8527d..1eb4974 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>  		    peer->v4daddr == saddr) {
>  			inet_peer_refcheck(peer);
>  			if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
> -			    (s32)(peer->tcp_ts - req->ts_recent) >
> -							TCP_PAWS_WINDOW) {
> +			    ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
> +			     peer->tcp_ts > req->ts_recent)) {
>  				NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
>  				goto drop_and_release;
>  			}

This seems very wrong to me.

Adding an if (peer->tcp_ts > req->ts_recent) condition is _not_ going
to help. And it might break some working setups, because of wraparound.

Really, if you have multiple clients behind a common NAT, you cannot use
this code at all, since NAT usually doesn't change TCP timestamps.
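
The wraparound concern can be illustrated with a small user-space sketch.
It is not the kernel code: paws_reject() and paws_reject_patched() are
hypothetical helper names, and PAWS_WINDOW is assumed to be 1 as a stand-in
for TCP_PAWS_WINDOW.

```c
#include <stdint.h>

/* Assumed stand-in for the kernel's TCP_PAWS_WINDOW. */
#define PAWS_WINDOW 1

/* Existing check: signed difference of 32-bit timestamps, which stays
 * correct when the timestamp counter wraps around. */
static int paws_reject(uint32_t peer_ts, uint32_t req_ts)
{
	return (int32_t)(peer_ts - req_ts) > PAWS_WINDOW;
}

/* Check as proposed in the patch: additionally requires a plain
 * unsigned comparison, which is not wrap-safe. */
static int paws_reject_patched(uint32_t peer_ts, uint32_t req_ts)
{
	return paws_reject(peer_ts, req_ts) && peer_ts > req_ts;
}
```

With peer_ts = 0x00000010 and req_ts = 0xfffffff0 (the remembered timestamp
has wrapped and is 0x20 ticks newer than the request), paws_reject() still
rejects the stale request, while paws_reject_patched() lets it through.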

What about the following patch instead?

[PATCH] doc: extend tcp_tw_recycle documentation

tcp_tw_recycle should not be used on a server if there is a chance that
clients are behind the same NAT. Document this fact before too many users
discover it too late.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 Documentation/networking/ip-sysctl.txt |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index c7165f4..406f0d5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER
 tcp_tw_recycle - BOOLEAN
 	Enable fast recycling TIME-WAIT sockets. Default value is 0.
 	It should not be changed without advice/request of technical
-	experts.
+	experts. If you set it to 1, make sure you dont miss connections
+	attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter).
+	In particular, this might break if several clients are behind
+	a common NAT device, since their TCP timestamp wont be changed
+	by the NAT. tcp_tw_recycle should be used with care, most
+	probably in private networks.
 
 tcp_tw_reuse - BOOLEAN
 	Allow to reuse TIME-WAIT sockets for new connections when it is



^ permalink raw reply related

* Re: Fwd: a Great Idea - include Kademlia networking protocol in kernel -- REVISITED
From: Eric Dumazet @ 2010-11-14  8:50 UTC (permalink / raw)
  To: Marcos; +Cc: netdev, Stephen Guerin
In-Reply-To: <AANLkTinWUmQ91cCULC8ZXFLwSKz6SNt3BpszrBEhbgcu@mail.gmail.com>

On Sunday, 14 November 2010 at 00:21 -0700, Marcos wrote:
> [Fwd from [linux-kernel], thought I'd follow the suggestion to post
> this to netdev:]
> 
> After seeing some attention this idea generated in the linux press,
> I'd like to re-visit this suggestion.  I'm a nobody on this list, but
> do have some expertise in complex systems (i.e. complexity theory).
> 
> The Kademlia protocol is simple: it has four commands (and won't
> likely grow more): PING, STORE, FIND_NODE, FIND_VALUE.
> It is computationally effortless: it generates random node IDs and
> computes distance on a distributed hash table using a simple XOR
> function.
> It is (probably optimally) efficient:  O(log(n)) for n nodes.
> Ultimately, it could increase security: by creating a system for
> tracking trusted peers, a new topology of content-sharing can be
> generated.
> 
> [From the (kademlia) wikipedia article]: "The first generation peer-to-peer file
> sharing networks, such as Napster, relied on a central database to
> co-ordinate look ups on the network. Second generation peer-to-peer
> networks, such as Gnutella, used flooding to locate files, searching
> every node on the network. Third generation peer-to-peer networks use
> Distributed Hash Tables to look up files in the network. Distributed
> hash tables store resource locations throughout the network. A major
> criterion for these protocols is locating the desired nodes quickly."
> 
> Putting a simple, but robust p2p network layer in the kernel offers
> several novel and very interesting possibilities.
> 
> 1. Cutting-edge cool factor:  It would put linux way ahead of the
> net's general evolution to a full-fledged "Internet Operating
> System".  The world needs an open source solution over Google's,
> Microsoft's (or any other's) attempt to create such a solution.
> Dismiss any attempts to see such a request as warez-d00ds looking to
> make a more efficient pirating network.
> 
> 2. Lower maintenance:  Through unification, it would simplify the many
> (currently disparate) linux solutions for large-scale aggregation of
> computational and storage resources that are distributed across many
> machines.  Additionally, NFS (the networking protocol that *IS* in the
> kernel) is stale, has high administrative and operational overhead,
> and is not made to scale to millions of shared nodes in a graph
> topology.
> 
> 3. Excite a new wave of Linux development:  90% of linux machines are
> on the net, but don't utilize the real value of peer connectivity
> (which can grow profoundly faster than Metcalfe's N^2 "value of the
> network" law).  Putting p2p in kernel space communicates to every
> developer that linux is serious about creating a unified and complete
> solution for creating such an infrastructure.  Let the cloud
> applications and such be in user space, but keep the main
> connection-tracking in the kernel.   Such a move would make for many
> (unforeseeable) complex emergent behaviors and optimizations to arise
> -- see Wikipedia on Reed's Law for a sense of it (to wit: "even if the
> utility of groups available to be joined is very small on a peer-group
> basis, eventually the network effect of potential group membership ...
> dominate[s] the overall economics of the system").
> 
> Consider, for example, social networking: it is an inherently p2p
> structure and is lying in wait to explode the next wave of internet
> evolution and new-value generation.  There's no doubt that this is the
> trend of the future -- best that open source be there first.  Users
> are creating value on their machines *every day*, but there's little
> infrastructure to take advantage of it.  Currently, it's either lost
> or exploited.  Solution and vision trajectories:  Diaspora comes to
> mind, mash-up applications like Photosynth aggregating the millions of
> photos on people's computers (see the TED.com presentation), open
> currencies and meritocratic market systems using such a "meta-linux"
> as a backbone, etc. -- whole new governance models for sharing content
> would undoubtedly arise.  HTTP/HTML is too much of an all-or-nothing
> and coarse approach to organizing the world's content.  The net needs
> a backbone for sharing personal content and grouping it to create new
> abstractions and wealth.  See pangaia.sourceforge.net for some of the
> ideas I've personally been developing.
> 
> Anyway, I'm with hp_fk on this one.  Ignore at the peril and risk of
> the future...  :)
> 

I have no idea why or how Kademlia would be added to the Linux kernel.

It's a protocol based on UDP, and is probably already implemented in userland.

What am I missing?




^ permalink raw reply

* [PATCH/RFC] netfilter: nf_conntrack_sip: Handle quirky Cisco phones
From: Kevin Cernekee @ 2010-11-14  8:32 UTC (permalink / raw)
  To: Patrick McHardy, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris <jmorris@
  Cc: netfilter-devel, netfilter, coreteam, linux-kernel, netdev

Most SIP devices use a source port of 5060/udp on SIP requests, so the
response automatically comes back to port 5060:

phone_ip:5060 -> proxy_ip:5060   REGISTER
proxy_ip:5060 -> phone_ip:5060   100 Trying

The newer Cisco IP phones, however, use a randomly chosen high source
port for the SIP request but expect the response on port 5060:

phone_ip:49173 -> proxy_ip:5060  REGISTER
proxy_ip:5060 -> phone_ip:5060   100 Trying

Standard Linux NAT, with or without nf_nat_sip, will send the reply back
to port 49173, not 5060:

phone_ip:49173 -> proxy_ip:5060  REGISTER
proxy_ip:5060 -> phone_ip:49173  100 Trying

But the phone is not listening on 49173, so it will never see the reply.

This issue was seen on a Cisco CP-7965G, firmware 8-5(3).  It appears
to be a well-known problem on 7941 and newer:

http://www.voip-info.org/wiki/view/Standalone+Cisco+7941%252F7961+without+a+local+PBX

Search for "Connecting to the outside world"

I contacted Cisco support and they were not amenable to changing the
behavior.  It appears to be RFC3261-compliant, as the "Sent-by port"
field in the request specifies 5060:

18.2.2 Sending Responses

   The server transport uses the value of the top Via header field in
   order to determine where to send a response.  It MUST follow the
   following process:

...

      o  Otherwise (for unreliable unicast transports), if the top Via
         has a "received" parameter, the response MUST be sent to the
         address in the "received" parameter, using the port indicated
         in the "sent-by" value, or using port 5060 if none is specified
         explicitly.  If this fails, for example, elicits an ICMP "port
         unreachable" response, the procedures of Section 5 of [4]
         SHOULD be used to determine where to send the response.
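
As a rough illustration of the port-selection rule quoted above, here is a
user-space sketch. The function name sip_response_port() is hypothetical,
and the parser only handles a "host" or "host:port" sent-by value (no
bracketed IPv6 literals); it is not code from nf_conntrack_sip.

```c
#include <stdlib.h>
#include <string.h>

#define SIP_DEFAULT_PORT 5060

/* Return the port a response should be sent to, given the sent-by value
 * of the top Via header: the explicit port if one is present and valid,
 * otherwise the default SIP port. */
static unsigned int sip_response_port(const char *sent_by)
{
	const char *colon = strrchr(sent_by, ':');
	unsigned long port;
	char *end;

	if (!colon || colon[1] == '\0')
		return SIP_DEFAULT_PORT;

	port = strtoul(colon + 1, &end, 10);
	if (*end != '\0' || port == 0 || port > 65535)
		return SIP_DEFAULT_PORT;

	return (unsigned int)port;
}
```

Under this rule a request whose Via sent-by names port 5060 is answered on
5060 regardless of the UDP source port it arrived from, which is the
behavior the phone expects.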

This patch modifies nf_*_sip to work around this quirk, by rewriting
the response port to 5060 when the following conditions are met:

 - User-Agent starts with "Cisco"

 - Incoming TTL was exactly 64 (meaning that our system is the phone's
   local router, not an intermediate router)

Tested on Linus' latest 2.6.37-rc tree.

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
---
 include/linux/netfilter/nf_conntrack_sip.h |    2 ++
 net/ipv4/netfilter/nf_nat_sip.c            |   12 ++++++++++++
 net/netfilter/nf_conntrack_sip.c           |   25 +++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/include/linux/netfilter/nf_conntrack_sip.h b/include/linux/netfilter/nf_conntrack_sip.h
index 0ce91d5..a6ea620 100644
--- a/include/linux/netfilter/nf_conntrack_sip.h
+++ b/include/linux/netfilter/nf_conntrack_sip.h
@@ -8,6 +8,7 @@
 struct nf_ct_sip_master {
 	unsigned int	register_cseq;
 	unsigned int	invite_cseq;
+	unsigned int	cisco_port_mangle;
 };
 
 enum sip_expectation_classes {
@@ -90,6 +91,7 @@ enum sip_header_types {
 	SIP_HDR_EXPIRES,
 	SIP_HDR_CONTENT_LENGTH,
 	SIP_HDR_CALL_ID,
+	SIP_HDR_USER_AGENT,
 };
 
 enum sdp_header_types {
diff --git a/net/ipv4/netfilter/nf_nat_sip.c b/net/ipv4/netfilter/nf_nat_sip.c
index e40cf78..4b9a46d 100644
--- a/net/ipv4/netfilter/nf_nat_sip.c
+++ b/net/ipv4/netfilter/nf_nat_sip.c
@@ -121,6 +121,7 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+	struct nf_conn_help *help = nfct_help(ct);
 	unsigned int coff, matchoff, matchlen;
 	enum sip_header_types hdr;
 	union nf_inet_addr addr;
@@ -225,6 +226,17 @@ next:
 			return NF_DROP;
 	}
 
+	/* Mangle destination port for Cisco phones, then fix up checksums */
+	if (help->help.ct_sip_info.cisco_port_mangle) {
+		struct udphdr *uh;
+
+		uh = (struct udphdr *)(skb->data + ip_hdrlen(skb));
+		uh->dest = htons(SIP_PORT);
+
+		if (!nf_nat_mangle_udp_packet(skb, ct, ctinfo, 0, 0, NULL, 0))
+			return NF_DROP;
+	}
+
 	if (!map_sip_addr(skb, dataoff, dptr, datalen, SIP_HDR_FROM) ||
 	    !map_sip_addr(skb, dataoff, dptr, datalen, SIP_HDR_TO))
 		return NF_DROP;
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index bcf47eb..6042f66 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -18,6 +18,7 @@
 #include <linux/udp.h>
 #include <linux/tcp.h>
 #include <linux/netfilter.h>
+#include <linux/ip.h>
 
 #include <net/netfilter/nf_conntrack.h>
 #include <net/netfilter/nf_conntrack_core.h>
@@ -338,6 +339,7 @@ static const struct sip_header ct_sip_hdrs[] = {
 	[SIP_HDR_EXPIRES]		= SIP_HDR("Expires", NULL, NULL, digits_len),
 	[SIP_HDR_CONTENT_LENGTH]	= SIP_HDR("Content-Length", "l", NULL, digits_len),
 	[SIP_HDR_CALL_ID]		= SIP_HDR("Call-Id", "i", NULL, callid_len),
+	[SIP_HDR_USER_AGENT]		= SIP_HDR("User-Agent", NULL, NULL, string_len),
 };
 
 static const char *sip_follow_continuation(const char *dptr, const char *limit)
@@ -1366,6 +1368,29 @@ static int process_sip_request(struct sk_buff *skb, unsigned int dataoff,
 	unsigned int matchoff, matchlen;
 	unsigned int cseq, i;
 
+	/* Many Cisco IP phones use a high source port for SIP requests, but
+	 * listen for the response on port 5060.  If we are the local
+	 * router for one of these phones, flag the connection here so that
+	 * responses will be redirected to the correct port.
+	 */
+	do {
+		static const char cisco[] = "Cisco";
+		struct iphdr *iph = ip_hdr(skb);
+		struct nf_conn_help *help = nfct_help(ct);
+
+		if (iph->ttl != 63)
+			break;
+		if (ct_sip_get_header(ct, *dptr, 0, *datalen,
+				SIP_HDR_USER_AGENT, &matchoff, &matchlen) <= 0)
+			break;
+		if (matchlen < strlen(cisco))
+			break;
+		if (strnicmp(*dptr + matchoff, cisco, strlen(cisco)) != 0)
+			break;
+
+		help->help.ct_sip_info.cisco_port_mangle = 1;
+	} while (0);
+
 	for (i = 0; i < ARRAY_SIZE(sip_handlers); i++) {
 		const struct sip_handler *handler;
 
-- 
1.7.0.4

^ permalink raw reply related

* Re: [PATCH v2] drivers/net/tile/: on-chip network drivers for the tile architecture
From: Sam Ravnborg @ 2010-11-14  7:52 UTC (permalink / raw)
  To: Chris Metcalf; +Cc: linux-kernel, netdev, Stephen Hemminger, Eric Dumazet
In-Reply-To: <201011140335.oAE3Zuqg032279@farm-0027.internal.tilera.com>

> diff --git a/drivers/net/tile/Makefile b/drivers/net/tile/Makefile
> new file mode 100644
> index 0000000..f634f14
> --- /dev/null
> +++ b/drivers/net/tile/Makefile
> @@ -0,0 +1,10 @@
> +#
> +# Makefile for the TILE on-chip networking support.
> +#
> +
> +obj-$(CONFIG_TILE_NET) += tile_net.o
> +ifdef CONFIG_TILEGX
> +tile_net-objs := tilegx.o mpipe.o iorpc_mpipe.o dma_queue.o
> +else
> +tile_net-objs := tilepro.o
> +endif

The -objs syntax is deprecated these days.
Preferred syntax for the above is:

ifdef CONFIG_TILEGX
tile_net-y := tilegx.o mpipe.o iorpc_mpipe.o dma_queue.o
else
tile_net-y := tilepro.o
endif

We could make this even shorter like this (assuming TILEGX is a bool):
tile_net-y := tilepro.o
tile_net-$(CONFIG_TILEGX) := tilegx.o mpipe.o iorpc_mpipe.o dma_queue.o

But this is less readable - so the longer version is better.


	Sam

^ permalink raw reply

* [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps
From: Zhang Le @ 2010-11-14  7:35 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Zhang Le, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy

Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
req->ts_recent. In this case, theoretically the req should not be ignored.

But in fact, it could be ignored, if peer->tcp_ts is so small that the
difference between these two numbers is larger than 2 to the power of 31.

I understand that in this situation the timestamp does not make sense any more,
because it actually comes from different machines. However, if anyone
ever needs to do the same investigation that I have done, this will
save them some time.

Signed-off-by: Zhang Le <r0bertz@gentoo.org>
---
 net/ipv4/tcp_ipv4.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 8f8527d..1eb4974 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 		    peer->v4daddr == saddr) {
 			inet_peer_refcheck(peer);
 			if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
-			    (s32)(peer->tcp_ts - req->ts_recent) >
-							TCP_PAWS_WINDOW) {
+			    ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
+			     peer->tcp_ts > req->ts_recent)) {
 				NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
 				goto drop_and_release;
 			}
-- 
1.7.3.2

^ permalink raw reply related

* Fwd: a Great Idea - include Kademlia networking protocol in kernel -- REVISITED
From: Marcos @ 2010-11-14  7:21 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Guerin
In-Reply-To: <AANLkTi=VgBQ5rwkjZeo2zsRx1dSBy+4PBXeLLPQ4LRES@mail.gmail.com>

[Fwd from [linux-kernel], thought I'd follow the suggestion to post
this to netdev:]

After seeing some attention this idea generated in the linux press,
I'd like to re-visit this suggestion.  I'm a nobody on this list, but
do have some expertise in complex systems (i.e. complexity theory).

The Kademlia protocol is simple: it has four commands (and won't
likely grow more): PING, STORE, FIND_NODE, FIND_VALUE.
It is computationally effortless: it generates random node IDs and
computes distance on a distributed hash table using a simple XOR
function.
It is (probably optimally) efficient:  O(log(n)) for n nodes.
Ultimately, it could increase security: by creating a system for
tracking trusted peers, a new topology of content-sharing can be
generated.
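
To make the XOR metric concrete, here is a minimal sketch. It assumes
64-bit node IDs for brevity (Kademlia as published uses 160-bit IDs), and
the names kad_id, kad_distance() and kad_bucket_index() are hypothetical.

```c
#include <stdint.h>

/* 64-bit IDs are an assumption made to keep the sketch short. */
typedef uint64_t kad_id;

/* Distance between two node IDs is their bitwise XOR. */
static kad_id kad_distance(kad_id a, kad_id b)
{
	return a ^ b;
}

/* Index of the highest set bit of the distance, which selects the
 * routing-table bucket for the other node; -1 if the IDs are equal. */
static int kad_bucket_index(kad_id self, kad_id other)
{
	kad_id d = kad_distance(self, other);
	int idx = -1;

	while (d) {
		d >>= 1;
		idx++;
	}
	return idx;
}
```

For example, IDs 0xA and 0x6 are at distance 0xC and fall into bucket 3.
Because each lookup step moves to a node sharing at least one more leading
bit with the target, lookups take O(log(n)) steps.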

[From the (kademlia) wikipedia article]: "The first generation peer-to-peer file
sharing networks, such as Napster, relied on a central database to
co-ordinate look ups on the network. Second generation peer-to-peer
networks, such as Gnutella, used flooding to locate files, searching
every node on the network. Third generation peer-to-peer networks use
Distributed Hash Tables to look up files in the network. Distributed
hash tables store resource locations throughout the network. A major
criterion for these protocols is locating the desired nodes quickly."

Putting a simple, but robust p2p network layer in the kernel offers
several novel and very interesting possibilities.

1. Cutting-edge cool factor:  It would put linux way ahead of the
net's general evolution to a full-fledged "Internet Operating
System".  The world needs an open source solution over Google's,
Microsoft's (or any other's) attempt to create such a solution.
Dismiss any attempts to see such a request as warez-d00ds looking to
make a more efficient pirating network.

2. Lower maintenance:  Through unification, it would simplify the many
(currently disparate) linux solutions for large-scale aggregation of
computational and storage resources that are distributed across many
machines.  Additionally, NFS (the networking protocol that *IS* in the
kernel) is stale, has high administrative and operational overhead,
and is not made to scale to millions of shared nodes in a graph
topology.

3. Excite a new wave of Linux development:  90% of linux machines are
on the net, but don't utilize the real value of peer connectivity
(which can grow profoundly faster than Metcalfe's N^2 "value of the
network" law).  Putting p2p in kernel space communicates to every
developer that linux is serious about creating a unified and complete
solution for creating such an infrastructure.  Let the cloud
applications and such be in user space, but keep the main
connection-tracking in the kernel.   Such a move would make for many
(unforeseeable) complex emergent behaviors and optimizations to arise
-- see Wikipedia on Reed's Law for a sense of it (to wit: "even if the
utility of groups available to be joined is very small on a peer-group
basis, eventually the network effect of potential group membership ...
dominate[s] the overall economics of the system").

Consider, for example, social networking: it is an inherently p2p
structure and is lying in wait to explode the next wave of internet
evolution and new-value generation.  There's no doubt that this is the
trend of the future -- best that open source be there first.  Users
are creating value on their machines *every day*, but there's little
infrastructure to take advantage of it.  Currently, it's either lost
or exploited.  Solution and vision trajectories:  Diaspora comes to
mind, mash-up applications like Photosynth aggregating the millions of
photos on people's computers (see the TED.com presentation), open
currencies and meritocratic market systems using such a "meta-linux"
as a backbone, etc. -- whole new governance models for sharing content
would undoubtedly arise.  HTTP/HTML is too much of an all-or-nothing
and coarse approach to organizing the world's content.  The net needs
a backbone for sharing personal content and grouping it to create new
abstractions and wealth.  See pangaia.sourceforge.net for some of the
ideas I've personally been developing.

Anyway, I'm with hp_fk on this one.  Ignore at the peril and risk of
the future...  :)

marcos

^ permalink raw reply

* [PATCH] netfilter: define ct_*_info as needed
From: Changli Gao @ 2010-11-14  6:40 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David S. Miller, netfilter-devel, netdev, Changli Gao

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 include/net/netfilter/nf_conntrack.h |   13 +++++++++++++
 1 file changed, 13 insertions(+)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index caf17db..25c34af 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -50,11 +50,24 @@ union nf_conntrack_expect_proto {
 /* per conntrack: application helper private data */
 union nf_conntrack_help {
 	/* insert conntrack helper private data (master) here */
+#if defined(CONFIG_NF_CONNTRACK_FTP) || defined(CONFIG_NF_CONNTRACK_FTP_MODULE)
 	struct nf_ct_ftp_master ct_ftp_info;
+#endif
+#if defined(CONFIG_NF_CONNTRACK_PPTP) || \
+    defined(CONFIG_NF_CONNTRACK_PPTP_MODULE)
 	struct nf_ct_pptp_master ct_pptp_info;
+#endif
+#if defined(CONFIG_NF_CONNTRACK_H323) || \
+    defined(CONFIG_NF_CONNTRACK_H323_MODULE)
 	struct nf_ct_h323_master ct_h323_info;
+#endif
+#if defined(CONFIG_NF_CONNTRACK_SANE) || \
+    defined(CONFIG_NF_CONNTRACK_SANE_MODULE)
 	struct nf_ct_sane_master ct_sane_info;
+#endif
+#if defined(CONFIG_NF_CONNTRACK_SIP) || defined(CONFIG_NF_CONNTRACK_SIP_MODULE)
 	struct nf_ct_sip_master ct_sip_info;
+#endif
 };
 
 #include <linux/types.h>

^ permalink raw reply related

* [PATCH] netfilter: fix the wrong alloc_size
From: Changli Gao @ 2010-11-14  6:35 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David S. Miller, netfilter-devel, netdev, Changli Gao

In update_alloc_size(), sizeof(struct nf_ct_ext) is wrongly added twice
when computing alloc_size.
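
The arithmetic can be checked with a small user-space sketch. The header
size is passed in as a parameter because the real sizeof(struct nf_ct_ext)
depends on the configuration; ALIGN_UP and the two function names are
hypothetical stand-ins for the kernel's ALIGN() and the old and new
expressions.

```c
#include <stddef.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Old expression: the header size is counted once on its own and
 * once more inside the aligned term. */
static size_t alloc_size_old(size_t hdr, size_t align, size_t len)
{
	return hdr + ALIGN_UP(hdr, align) + len;
}

/* New expression: aligned header followed by the extension data. */
static size_t alloc_size_new(size_t hdr, size_t align, size_t len)
{
	return ALIGN_UP(hdr, align) + len;
}
```

With an assumed 8-byte header, 8-byte alignment and a 24-byte extension,
the old expression yields 40 bytes and the new one 32, so the old code
over-allocated by exactly the header size.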

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 net/netfilter/nf_conntrack_extend.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nf_conntrack_extend.c b/net/netfilter/nf_conntrack_extend.c
index bd82450..920f924 100644
--- a/net/netfilter/nf_conntrack_extend.c
+++ b/net/netfilter/nf_conntrack_extend.c
@@ -144,9 +144,8 @@ static void update_alloc_size(struct nf_ct_ext_type *type)
 		if (!t1)
 			continue;
 
-		t1->alloc_size = sizeof(struct nf_ct_ext)
-				 + ALIGN(sizeof(struct nf_ct_ext), t1->align)
-				 + t1->len;
+		t1->alloc_size = ALIGN(sizeof(struct nf_ct_ext), t1->align) +
+				 t1->len;
 		for (j = 0; j < NF_CT_EXT_NUM; j++) {
 			t2 = nf_ct_ext_types[j];
 			if (t2 == NULL || t2 == t1 ||

^ permalink raw reply related

* Re: [PATCH update 2] firewire: net: throttle TX queue before running out of tlabels
From: Maxim Levitsky @ 2010-11-14  4:50 UTC (permalink / raw)
  To: Stefan Richter; +Cc: netdev, linux1394-devel, linux-kernel
In-Reply-To: <tkrat.83117f4523f0d928@s5r6.in-berlin.de>

On Sat, 2010-11-13 at 23:07 +0100, Stefan Richter wrote:
> This prevents firewire-net from submitting write requests in fast
> succession until failure due to all 64 transaction labels were used up
> for unfinished split transactions.  The netif_stop/wake_queue API is
> used for this purpose.
> 
> Without this stop/wake mechanism, datagrams were simply lost whenever
> the tlabel pool was exhausted.  Plus, tlabel exhaustion by firewire-net
> also prevented other unrelated outbound transactions to be initiated.
> 
> The high watermark is set to considerably less than 64 (I chose 8)
> because peers which run current Linux firewire-ohci are still easily
> saturated by this (i.e. some datagrams are dropped with ack-busy-*
> events), depending on the hardware at transmitter and receiver side.
> 
> I did not see changes to resulting throughput that were discernible from
> the usual measuring noise.  To do:  Revisit the choice of queue depth
> once firewire-ohci's AR DMA was improved.
> 
> I wonder what a good net_device.tx_queue_len value is.  I just set it
> to the same value as the chosen watermark for now.
> 
> Note:  This removes some netif_wake_queue from reception code paths.
> They were apparently copy&paste artefacts from a nonsensical
> netif_wake_queue use in the older eth1394 driver.  This belongs only
> into the transmit path.
> 
> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
> ---
> Update 2:  Maxim told me to de-obfuscate status tracking.  I realized
> that netif_queue_stopped can be used for that and thereby noticed bogus
> usages of it in the rx path.
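
The stop/wake scheme described in the quoted text can be sketched in user
space roughly as follows. The high watermark of 8 comes from the
description; the low watermark of 4 and all names here are assumptions for
illustration, and the comments only mark where a driver would call
netif_stop_queue() and netif_wake_queue().

```c
#include <stdbool.h>

#define TX_HIGH_WATERMARK 8	/* value chosen in the quoted description */
#define TX_LOW_WATERMARK  4	/* hypothetical; not stated in the thread */

struct tx_throttle {
	unsigned int in_flight;	/* datagrams submitted but not completed */
	bool stopped;		/* mirrors the stopped state of the queue */
};

/* Called when a datagram is handed to the hardware. */
static void tx_submitted(struct tx_throttle *t)
{
	if (++t->in_flight >= TX_HIGH_WATERMARK)
		t->stopped = true;	/* driver: netif_stop_queue() */
}

/* Called from the write-completion path. */
static void tx_completed(struct tx_throttle *t)
{
	if (t->in_flight > 0)
		t->in_flight--;
	if (t->stopped && t->in_flight <= TX_LOW_WATERMARK)
		t->stopped = false;	/* driver: netif_wake_queue() */
}
```

Using separate high and low marks avoids stopping and waking the queue on
every single completion once the limit has been reached.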

In fact, after a lot of testing I see that the original patch,
'[PATCH 4/4] firewire: net: throttle TX queue before running out of
tlabels', works best here.
With the AR fixes, I don't see even a single fwnet_write_complete error on
either side.

However, 'update 2' (maybe update 1 too, I didn't test it) lowers
desktop->laptop throughput somewhat
(250 vs 227 Mbits/s). I tested this many times.

The actual raw throughput possible with a UDP stream and either no throttling
or a higher packets-in-flight count (I tested 50/30) is 280 Mbits/s.

BTW, I still don't fully understand why my laptop sends at only 180
Mbits/s pretty much always, regardless of patches or TCP/UDP.

I also tested the performance impact of the other patches, and it is too small
to see through the noise.

Not bad, eh? From a complete trainwreck, IP over 1394 has turned into a
very stable and fast connection that beats 100 Mbit ethernet a bit.

Now, next on my list are the POC (piece of cake) items.

I need to figure out why s2ram hoses the network connection.
In fact usually, firewire-ohci does work, and reload of firewire-net
restores the connection.

Also, I need to add all required bits to make firewire-net work with NM.

I need to make resets more robust. Currently, after the cable is plugged in,
it takes some time until the connection starts working.

So thanks again, especially to Clemens Ladisch for the hardest fixes
that made all this possible.

And of course feel free not to merge my AR rewrite; it was mostly done
as a proof of concept to see whether my hardware is buggy or not.
I am sure these patches can be done better.

Best regards,
	Maxim Levitsky



^ permalink raw reply

* [PATCH  kernel 2.6.37-rc1]ipg.c: remove id [SUNDANCE, 0x1021]
From: Ken Kawasaki @ 2010-11-13 23:42 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20101107001124.7d8ef6c4.ken_kawasaki@spring.nifty.jp>


ipg.c:
  The id [SUNDANCE, 0x1021] (=[0x13f0, 0x1021]) is defined
  in both dl2k.h and ipg.c,
  but this device works better with the dl2k driver.
  
  This problem is similar to the one fixed by commit
  [25cca5352712561fba97bd37c495593d641c1d39
  ipg: Remove device claimed by dl2k from pci id table]
  of 11 Feb 2010.

Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>

---

--- linux-2.6.37-rc1/drivers/net/ipg.c.orig	2010-11-13 19:54:53.000000000 +0900
+++ linux-2.6.37-rc1/drivers/net/ipg.c	2010-11-13 19:57:03.000000000 +0900
@@ -88,16 +88,14 @@ static const char *ipg_brand_name[] = {
 	"IC PLUS IP1000 1000/100/10 based NIC",
 	"Sundance Technology ST2021 based NIC",
 	"Tamarack Microelectronics TC9020/9021 based NIC",
-	"Tamarack Microelectronics TC9020/9021 based NIC",
 	"D-Link NIC IP1000A"
 };
 
 static DEFINE_PCI_DEVICE_TABLE(ipg_pci_tbl) = {
 	{ PCI_VDEVICE(SUNDANCE,	0x1023), 0 },
 	{ PCI_VDEVICE(SUNDANCE,	0x2021), 1 },
-	{ PCI_VDEVICE(SUNDANCE,	0x1021), 2 },
-	{ PCI_VDEVICE(DLINK,	0x9021), 3 },
-	{ PCI_VDEVICE(DLINK,	0x4020), 4 },
+	{ PCI_VDEVICE(DLINK,	0x9021), 2 },
+	{ PCI_VDEVICE(DLINK,	0x4020), 3 },
 	{ 0, }
 };
 

^ permalink raw reply

* bridge netpoll support: mismatch between net core and bridge headers
From: Mike Frysinger @ 2010-11-13 23:26 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

commit 91d2c34a4eed32876ca333b0ca44f3bc56645805 added this bit of code
to net/bridge/br_private.h:
struct net_bridge_port {
    ....
+#ifdef CONFIG_NET_POLL_CONTROLLER
+   struct netpoll          *np;
+#endif
};
....
#ifdef CONFIG_NET_POLL_CONTROLLER
+static inline struct netpoll_info *br_netpoll_info(struct net_bridge *br)
+{
+   return br->dev->npinfo;
+}
....

unfortunately, this is not the same #ifdef guard that is used in the
core net code (include/linux/netdevice.h):
#ifdef CONFIG_NETPOLL
    struct netpoll_info *npinfo;
#endif
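
a reduced user-space sketch of the mismatch, with simplified stand-in
structs and a hypothetical accessor name dev_netpoll_info(). it shows one
way to keep things consistent, guarding the accessor with the same symbol
as the member it touches; it is not necessarily the fix that should go in:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct netpoll_info {
	int rx_flags;
};

struct net_device {
	const char *name;
#ifdef CONFIG_NETPOLL
	struct netpoll_info *npinfo;	/* only exists with CONFIG_NETPOLL */
#endif
};

/* The accessor uses the same guard as the member, so it compiles
 * whether or not CONFIG_NETPOLL is defined. */
#ifdef CONFIG_NETPOLL
static inline struct netpoll_info *dev_netpoll_info(struct net_device *dev)
{
	return dev->npinfo;
}
#else
static inline struct netpoll_info *dev_netpoll_info(struct net_device *dev)
{
	(void)dev;
	return NULL;
}
#endif
```

guarding the accessor with a different symbol than the member, as in the
bridge header, fails to compile whenever only the accessor's symbol is set.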

so in my randconfig builds, i'm now seeing frequent failures along the lines of:

In file included from net/bridge/br.c:24:
net/bridge/br_private.h: In function ‘br_netpoll_info’:
net/bridge/br_private.h:293: error: ‘struct net_device’ has no member
named ‘npinfo’
make[2]: *** [net/bridge/br.o] Error 1
In file included from net/bridge/br_device.c:24:
net/bridge/br_private.h: In function ‘br_netpoll_info’:
net/bridge/br_private.h:293: error: ‘struct net_device’ has no member
named ‘npinfo’
In file included from net/bridge/br_fdb.c:27:
net/bridge/br_private.h: In function ‘br_netpoll_info’:
net/bridge/br_private.h:293: error: ‘struct net_device’ has no member
named ‘npinfo’
make[2]: *** [net/bridge/br_fdb.o] Error 1
make[2]: *** [net/bridge/br_device.o] Error 1
In file included from net/bridge/br_forward.c:23:
net/bridge/br_private.h: In function ‘br_netpoll_info’:
net/bridge/br_private.h:293: error: ‘struct net_device’ has no member
named ‘npinfo’
make[2]: *** [net/bridge/br_forward.o] Error 1

seems to be a regression introduced during the 2.6.36 cycle
-mike

^ permalink raw reply

