Netdev List
 help / color / mirror / Atom feed
* Re: Null pointer dereference when bringing up bonding device on kernel-2.6.24-2.fc9.i686
From: Jay Vosburgh @ 2008-01-28 20:17 UTC (permalink / raw)
  To: Benny Amorsen; +Cc: netdev
In-Reply-To: <m3bq75en2y.fsf@ursa.amorsen.dk>

Benny Amorsen <benny+usenet@amorsen.dk> wrote:

>https://bugzilla.redhat.com/show_bug.cgi?id=430391

	I know what this is, I'll fix it.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Null pointer dereference when bringing up bonding device on kernel-2.6.24-2.fc9.i686
From: Benny Amorsen @ 2008-01-28 20:01 UTC (permalink / raw)
  To: netdev

https://bugzilla.redhat.com/show_bug.cgi?id=430391

Bringing up interface bond0:
BUG: unable to handle kernel NULL
pointer dereference at virtual address 00000000
printing eip: c0506fd8 *pde = 7f5f8067 
Oops: 0000 [#1] SMP 
Modules linked in: bonding ipv6 xt_pkttype ipt_LOG ipt_iprange
ipt_REJECT xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack_tftp
nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables
dm_mirror dm_mod rtc_cmos iTCO_wdt pcspkr iTCO_vendor_support e1000
i2c_i801 i2c_core button tg3 e752x_edac edac_core sr_mod sg cdrom
ata_generic ata_piix pata_acpi libata sd_mod scsi_mod raid456
async_xor async_memcpy async_tx xor raid1 ext3 jbd mbcache uhci_hcd
ohci_hcd ehci_hcd

Pid: 1620, comm: modprobe Not tainted (2.6.24-2.fc9 #1)
EIP: 0060:[<c0506fd8>] EFLAGS: 00210202 CPU: 3
EIP is at strnicmp+0x25/0x7c
EAX: f6d64062 EBX: 00000000 ECX: 00000010 EDX: 00000000
ESI: 00000010 EDI: 00000000 EBP: f6f2ae64 ESP: f6f2ae50
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process modprobe (pid: 1620, ti=f6f2a000 task=f6fc8000
task.ti=f6f2a000)
Stack: 00000000 f6d64000 f8cf7a14 00000000 f780f630 f6f2ae88 f8ce6fb6 00000000 
       f8cf09b4 00000000 f8cf8f80 00000000 00000001 f780f630 f6f2aeb0 f89a0778 
       f8cf1568 00000064 f6f40540 f8cf7e00 f6f2aeb0 f6f40568 f8cf7e00 f8cf7e00 
Call Trace:
 [<c04064b6>] show_trace_log_lvl+0x1a/0x2f
 [<c0406566>] show_stack_log_lvl+0x9b/0xa3
 [<c0406615>] show_registers+0xa7/0x178
 [<c040681b>] die+0x135/0x220
 [<c0640bd3>] do_page_fault+0x553/0x631
 [<c063f25a>] error_code+0x72/0x78
 [<f8ce6fb6>] bond_create+0x40/0x2da [bonding]
 [<f89a0778>] bonding_init+0x778/0x802 [bonding]
 [<c045482d>] sys_init_module+0x1430/0x155c
 [<c040526e>] syscall_call+0x7/0xb
 =======================
Code: 89 d0 5e 5f 5d c3 55 89 e5 57 31 ff 56 89 ce 53 31 db 83 ec 08
85 c9 89 45 f0 89 55 ec eb 50 31 ff eb 4e 8b 55 f0 8a 02 8b 55 ec <8a>
1a 42 ff 45 f0 84 c0 89 55 ec 74 e7 84 db 89 c7 74 33 38 d8 
EIP: [<c0506fd8>] strnicmp+0x25/0x7c SS:ESP 0068:f6f2ae50
---[ end trace b1f7597239228b36 ]---
./network-functions: line 216:  1620 Segmentation fault      modprobe
$1 > /dev/null 2>&1


Sorry if bugs in distribution kernels are less interesting.


/Benny



^ permalink raw reply

* sky2: tx hang on dual-port Yukon XL when rx csum disabled
From: Tony Battersby @ 2008-01-28 18:43 UTC (permalink / raw)
  To: Stephen Hemminger, netdev

I am experiencing network tx hangs on a dual-port SK-9E22 with sky2 in
2.6.24.  The problem is triggered by both ports transmitting at high
speed simultaneously.  This problem is 100% quickly reproducible.  Here
is the setup:

PC #1 with Intel PRO/1000 NIC:
e1000 IP address 192.168.1.1
running iperf -s

PC #2 with Intel PRO/1000 NIC:
e1000 IP address 192.168.2.1
running iperf -s

PC #3 with SysKonnect SK-9E22 (dual-port copper PCI-express)
sky2 IP address 192.168.1.2
sky2 IP address 192.168.2.2

So basically, I have two PCs with Intel PRO/1000 NICs running "iperf
-s".  Each of these Intel NICs is directly cabled to one of the two
ports of the SysKonnect NIC.

When I run:
(PC #3 tty1) iperf -c 192.168.1.1 -t 30
(wait for a second or two)
(PC #3 tty2) iperf -c 192.168.2.1 -t 30

"iperf -c 192.168.1.1" never finishes, but "iperf -c 192.168.2.1" does
finish.  Press Ctrl-C to abort the hung iperf.  Ping 192.168.1.1 does
not respond.  Ping 192.168.2.1 does respond, but each ping has almost
exactly 1 second latency (the latency should be < 1 ms).

When I switch the order of the tests, whichever iperf -c was started
_first_ is the one that locks up with no ping afterward, and whichever
was started _second_ is the one that finishes, but with a 1-second ping
latency afterward.  So the problem follows the ordering of the tests
rather than a specific port.

Also, the trigger seems to be transmitting, not receiving.  If I run
"iperf -s" on the SysKonnect PC and "iperf -c" on the two Intel PRO/1000
PCs, then the tests pass.

When I do "ethtool -K eth0 rx on; ethtool -K eth1 rx on" to turn on rx
checksumming on both ports of the SysKonnect NIC, both tests pass
successfully.  Commit 8b31cfbcd1b54362ef06c85beb40e65a349169a2 "sky2:
disable rx checksum on Yukon XL" disabled rx checksumming by default on
this NIC to get rid of some "hw csum failure" messages
(http://marc.info/?l=linux-netdev&m=119497815523843&w=4).  However, this
seems to have exposed a different (and arguably worse) bug.

I also tried booting with "maxcpus=1 pci=nomsi", but that didn't affect
the problem.

As a temporary workaround, I will use ethtool to turn on rx checksumming
and live with the "hw csum failure" messages, since they are better than
network lockups.

Let me know if I can be of any further assistance in tracking down this
problem.

Tony Battersby
Cybernetics


^ permalink raw reply

* ipcomp regression in 2.6.24
From: Beschorner Daniel @ 2008-01-28 19:00 UTC (permalink / raw)
  To: netdev

Has someone an idea which patch could have damaged ipcomp?

Daniel

> With 2.6.24 IPSEC/ESP tunnels to older kernels establish fine, data
> flows in both directions, but no data comes out of the tunnel.
> Needed to disable ipcomp.

^ permalink raw reply

* Re: [PATCH 1/3] ethtool: fix typo on setting speed 10000
From: Kok, Auke @ 2008-01-28 18:55 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, jesse.brandeburg, davem
In-Reply-To: <4789434C.70801@garzik.org>

Jeff Garzik wrote:
> Auke Kok wrote:
>> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
>>
>> fix the typo in speed 10000 setting.
>>
>> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
>> ---
>>
>>  ethtool.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> patches 1 and 2 failed to apply.  applied #3

can you refresh the tree on http://git.kernel.org/?p=network/ethtool/ethtool.git ?

I did indeed send the first two patches out earlier but they did not show up on
git.kernel.org, hence I posted them again.

Thanks,

Auke

^ permalink raw reply

* Re: [PATCH 3/6] [DCCP]: Bug-Fix - AWL was never updated
From: Ian McDonald @ 2008-01-28 17:59 UTC (permalink / raw)
  To: Gerrit Renker; +Cc: acme, dccp, netdev
In-Reply-To: <1201515376-8280-4-git-send-email-gerrit@erg.abdn.ac.uk>

On Jan 28, 2008 11:16 PM, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> This patch was triggered by finding the  following message in the syslog:
>  "kernel: dccp_check_seqno: DCCP: Step 6 failed for DATAACK packet, [...]
>    P.ackno exists or LAWL(82947089) <= P.ackno(82948208)
>                                     <= S.AWH(82948728), sending SYNC..."
>
> Note the difference between AWH and AWL: it is 1639 packets (while Sequence
> Window was actually at 100). A closer look at the trace showed that
> LAWL = AWL = 82947089 equalled the ISS on the Response.
>
> The cause of the bug was that AWL was only ever set on the first packet - the
> DCCP-Request sent by dccp_v{4,6}_connect().
>
> The fix is to continually update AWL/AWH with each new packet (as GSS=AWH).
>
> In addition, AWL/AWH are now updated to enforce more stringent checks on the
> initial sequence numbers when connecting:
>  * AWL is initialised to ISS and remains at this value;
>  * AWH is always set to GSS (via dccp_update_gss());
>  * so on the first Request: AWL =      AWH = ISS,
>    and on the n-th Request: AWL = ISS, AWH = ISS+n.
>
> As a consequence, only Response packets that refer to Requests sent by this
> host will pass, all others are discarded. This is the intention and in effect
> implements the initial adjustments for AWL as specified in RFC 4340, 7.5.1.
>
> Note: A problem that remains is that ISS can potentially be under-run even after
>       the initial handshake; this is addressed a subsequent patch.
>
> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

Yes I had seen this and had worked out that variables weren't being
updated as they should be but hadn't got as far as a fix before I
stopped my coding days so much :-(

Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

^ permalink raw reply

* [PATCH] [AX25]: Kill ax25_bind() user triggable printk.
From: maximilian attems @ 2008-01-28 17:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, maximilian attems

on the last run overlooked that sfuzz triggable message.
move the message to the corresponding comment.

Signed-off-by: maximilian attems <max@stro.at>
---
 net/ax25/af_ax25.c |   13 +++++--------
 1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index b4725ff..a2cc7c1 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1038,16 +1038,13 @@ static int ax25_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	int err = 0;
 
 	if (addr_len != sizeof(struct sockaddr_ax25) &&
-	    addr_len != sizeof(struct full_sockaddr_ax25)) {
-		/* support for old structure may go away some time */
+	    addr_len != sizeof(struct full_sockaddr_ax25))
+		/* support for old structure may go away some time
+		 * ax25_bind(): uses old (6 digipeater) socket structure.
+		 */
 		if ((addr_len < sizeof(struct sockaddr_ax25) + sizeof(ax25_address) * 6) ||
-		    (addr_len > sizeof(struct full_sockaddr_ax25))) {
+		    (addr_len > sizeof(struct full_sockaddr_ax25)))
 			return -EINVAL;
-	}
-
-		printk(KERN_WARNING "ax25_bind(): %s uses old (6 digipeater) socket structure.\n",
-			current->comm);
-	}
 
 	if (addr->fsa_ax25.sax25_family != AF_AX25)
 		return -EINVAL;
-- 
1.5.4.rc4


^ permalink raw reply related

* [PATCH] [AX25]: Beautify x25_init() version printk.
From: maximilian attems @ 2008-01-28 17:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, maximilian attems
In-Reply-To: <1201541470-25312-1-git-send-email-max@stro.at>

kill ref to old version and dup Linux.

Signed-off-by: maximilian attems <max@stro.at>
---
 net/x25/af_x25.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 92cfe8e..d0dbce0 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1652,7 +1652,7 @@ static int __init x25_init(void)
 
 	register_netdevice_notifier(&x25_dev_notifier);
 
-	printk(KERN_INFO "X.25 for Linux. Version 0.2 for Linux 2.1.15\n");
+	printk(KERN_INFO "X.25 for Linux Version 0.2\n");
 
 #ifdef CONFIG_SYSCTL
 	x25_register_sysctl();
-- 
1.5.4.rc4


^ permalink raw reply related

* Re: appearing again: kernel: eth0: too many iterations (6) in nv_nic_irq
From: Stephen Hemminger @ 2008-01-28 16:06 UTC (permalink / raw)
  To: Carlos Carvalho; +Cc: netdev
In-Reply-To: <18333.55687.621653.838538@fisica.ufpr.br>

On Mon, 28 Jan 2008 11:32:55 -0200
carlos@fisica.ufpr.br (Carlos Carvalho) wrote:

> It seems that this problem with NVidia's nic comes up more and more...
> From time to time we get this in the log:
> 
> Jan 27 14:43:12 duvel kernel: eth0: too many iterations (6) in nv_nic_irq.
> 
> We algo get
> 
> Jan 27 11:32:43 duvel kernel: KERNEL: assertion ((int)tcp_packets_in_flight(tp) >= 0) failed at net/ipv4/tcp_input.c (1274)
> 
> But at different moments, as shown above. Are they related? What's the
> meaning of the "assertion failed" one?
> 
> The messages are more likely to appear when traffic is high
> (>500Mb/s). This is with 2.6.22.16.
> 
> Any suggestions?
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Use NAPI which is available as configuration option in this driver.

Increase the max_interrupt_work from the ridiculously low value of 5
to something more larger like 15, with module parameter in /etc/modprobe.d/options:

options forcedeth max_interrupt_work=15

Also, see if you motherboard supports MSI, if so add "msi=1" module parameter

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply

* Re: 2.6.24 regression: Wake On Lan in sky2 broken on Mac mini
From: Stephen Hemminger @ 2008-01-28 15:59 UTC (permalink / raw)
  To: Tino Keitel; +Cc: netdev
In-Reply-To: <20080128084855.GA5926@dose.home.local>

On Mon, 28 Jan 2008 09:48:55 +0100
Tino Keitel <tino.keitel@gmx.de> wrote:

> On Mon, Jan 28, 2008 at 09:21:30 +0100, Mikael Pettersson wrote:
> > Tino Keitel writes:
> >  > Hi folks,
> >  > 
> >  > with 2.6.24-rc8, Wake On LAN doesn't work anymore as it used to with
> >  > 2.6.23 on my Mac mini Core Duo. I saw that this was reported in
> >  > http://bugzilla.kernel.org/show_bug.cgi?id=9721 and on netdev a patch
> >  > for the sky2 driver was sent by Stephen Hemminger. This patch fixed WOL
> >  > for me after applying it to 2.6.24-rc8.
> >  > 
> >  > However, it seems as the patch never made it into the kernel. Instead,
> >  > the commit that was suspected to break WOL
> >  > (84cd2dfb04d23a961c5f537baa243fa54d0987ac) was reverted
> >  > (be63a21c9573fbf88106ff0f030da5974551257b).
> >  > 
> >  > Now I tried the 2.6.24 release and noticed that WOL is still broken.
> >  > I'll be happy to test any patches that can make it into 2.6.24.1.
> > 
> > 1. Wrong mailing list; use netdev (@vger) instead.
> 
> Done.
> 
> > 2. The reverted commit had much much more serious consequences than
> >    "wol doesn't work", it actually caused BIOS hangs and failed reboots.
> 
> Yes, but even with the reverted commit, WOL still doesn't work. I just
> tried the patch from the netdev mailing list with the 2.6.24 release
> version and now WOL works for me.

Patch went to Jeff, but never made it into 2.6.24.  The timing was just wrong.

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply

* Re: [Cbe-oss-dev] [PATCH 1/5] spidernet: add missing initialization
From: Jens Osterkamp @ 2008-01-28 13:40 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: cbe-oss-dev, netdev, Ishizaki Kou
In-Reply-To: <200801231453.42736.jens@de.ibm.com>


Jeff,

this series of 5 patches by Ishizaki-san runs fine on the blade, I tested it
for a few days. Besides it fixes a nasty bug in spidernet RX descriptor handling.

Can you please consider it for 2.6.25 ?

Thanks !

Jens

On Wednesday 23 January 2008, Jens Osterkamp wrote:
> On Friday 11 January 2008, Ishizaki Kou wrote:
> > This patch fixes initialization of "aneg_count" and "medium" fields in
> > spider_net_card to make spidernet driver correctly sets "link status".
> > 
> > Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
> 
> Acked-by: Jens Osterkamp <jens@de.ibm.com>
> 
> > ---
> > 
> > Index: linux-powerpc-git/drivers/net/spider_net.c
> > ===================================================================
> > --- linux-powerpc-git.orig/drivers/net/spider_net.c
> > +++ linux-powerpc-git/drivers/net/spider_net.c
> > @@ -1399,6 +1399,8 @@ spider_net_link_reset(struct net_device 
> >  	spider_net_write_reg(card, SPIDER_NET_GMACINTEN, 0);
> > 
> >  	/* reset phy and setup aneg */
> > +	card->aneg_count = 0;
> > +	card->medium = BCM54XX_COPPER;
> >  	spider_net_setup_aneg(card);
> >  	mod_timer(&card->aneg_timer, jiffies + SPIDER_NET_ANEG_TIMER);
> > 
> > @@ -1982,6 +1984,8 @@ spider_net_open(struct net_device *netde
> >  		goto init_firmware_failed;
> > 
> >  	/* start probing with copper */
> > +	card->aneg_count = 0;
> > +	card->medium = BCM54XX_COPPER;
> >  	spider_net_setup_aneg(card);
> >  	if (card->phy.def->phy_id)
> >  		mod_timer(&card->aneg_timer, jiffies + SPIDER_NET_ANEG_TIMER);
> > _______________________________________________
> > cbe-oss-dev mailing list
> > cbe-oss-dev@ozlabs.org
> > https://ozlabs.org/mailman/listinfo/cbe-oss-dev
> >



-- 

Gruß,

Jens

IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher 
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply

* [PACTH 1/1] drivers/net/usb: AX88178 100Mbps problem
From: Reinin Oyama @ 2008-01-28 13:32 UTC (permalink / raw)
  To: netdev; +Cc: David Hollis

[-- Attachment #1: Type: text/plain, Size: 106 bytes --]

Asix 88178 does not work under 100Mbps connection.
This patch correct the problem.
kernel version: 2.6.24

[-- Attachment #2: asix_driver_patch.gz --]
[-- Type: application/gzip, Size: 399 bytes --]

^ permalink raw reply

* appearing again: kernel: eth0: too many iterations (6) in nv_nic_irq
From: Carlos Carvalho @ 2008-01-28 13:32 UTC (permalink / raw)
  To: netdev

It seems that this problem with NVidia's nic comes up more and more...
>From time to time we get this in the log:

Jan 27 14:43:12 duvel kernel: eth0: too many iterations (6) in nv_nic_irq.

We algo get

Jan 27 11:32:43 duvel kernel: KERNEL: assertion ((int)tcp_packets_in_flight(tp) >= 0) failed at net/ipv4/tcp_input.c (1274)

But at different moments, as shown above. Are they related? What's the
meaning of the "assertion failed" one?

The messages are more likely to appear when traffic is high
(>500Mb/s). This is with 2.6.22.16.

Any suggestions?

^ permalink raw reply

* [PATCH net-2.6.25] [IPV6] ADDRLABEL: Fix double free on label deletion.
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2008-01-28 12:02 UTC (permalink / raw)
  To: davem; +Cc: yoshfuji, mitch, netdev

If an entry is being deleted because it has only one reference, 
we immediately delete it and blindly register the rcu handler for it,
This results in oops by double freeing that object.

This patch fixes it by consolidating the code paths for the deletion;
let its rcu handler delete the object if it has no more reference.

Bug was found by Mitsuru Chinen <mitch@linux.vnet.ibm.com>

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
---

diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c
index 6f1ca60..7a706c4 100644
--- a/net/ipv6/addrlabel.c
+++ b/net/ipv6/addrlabel.c
@@ -106,6 +106,11 @@ static inline void ip6addrlbl_free(struct ip6addrlbl_entry *p)
 	kfree(p);
 }
 
+static void ip6addrlbl_free_rcu(struct rcu_head *h)
+{
+	ip6addrlbl_free(container_of(h, struct ip6addrlbl_entry, rcu));
+}
+
 static inline int ip6addrlbl_hold(struct ip6addrlbl_entry *p)
 {
 	return atomic_inc_not_zero(&p->refcnt);
@@ -114,12 +119,7 @@ static inline int ip6addrlbl_hold(struct ip6addrlbl_entry *p)
 static inline void ip6addrlbl_put(struct ip6addrlbl_entry *p)
 {
 	if (atomic_dec_and_test(&p->refcnt))
-		ip6addrlbl_free(p);
-}
-
-static void ip6addrlbl_free_rcu(struct rcu_head *h)
-{
-	ip6addrlbl_free(container_of(h, struct ip6addrlbl_entry, rcu));
+		call_rcu(&p->rcu, ip6addrlbl_free_rcu);
 }
 
 /* Find label */
@@ -240,7 +240,6 @@ int __ip6addrlbl_add(struct ip6addrlbl_entry *newp, int replace)
 				}
 				hlist_replace_rcu(&p->list, &newp->list);
 				ip6addrlbl_put(p);
-				call_rcu(&p->rcu, ip6addrlbl_free_rcu);
 				goto out;
 			} else if ((p->prefixlen == newp->prefixlen && !p->ifindex) ||
 				   (p->prefixlen < newp->prefixlen)) {
@@ -300,7 +299,6 @@ int __ip6addrlbl_del(const struct in6_addr *prefix, int prefixlen,
 		    ipv6_addr_equal(&p->prefix, prefix)) {
 			hlist_del_rcu(&p->list);
 			ip6addrlbl_put(p);
-			call_rcu(&p->rcu, ip6addrlbl_free_rcu);
 			ret = 0;
 			break;
 		}

-- 
YOSHIFUJI Hideaki @ USAGI Project  <yoshfuji@linux-ipv6.org>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA

^ permalink raw reply related

* [INET]: Prevent out-of-sync truesize on ip_fragment slow path
From: Herbert Xu @ 2008-01-28 11:12 UTC (permalink / raw)
  To: David S. Miller, netdev

Hi:

[INET]: Prevent out-of-sync truesize on ip_fragment slow path

When ip_fragment has to hit the slow path the value of skb->truesize
may go out of sync because we would have updated it without changing
the packet length.  This violates the constraints on truesize.

This patch postpones the update of skb->truesize to prevent this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index fd99fbd..76ec973 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -462,6 +462,7 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff*))
 	if (skb_shinfo(skb)->frag_list) {
 		struct sk_buff *frag;
 		int first_len = skb_pagelen(skb);
+		int truesizes = 0;
 
 		if (first_len - hlen > mtu ||
 		    ((first_len - hlen) & 7) ||
@@ -485,7 +486,7 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff*))
 				sock_hold(skb->sk);
 				frag->sk = skb->sk;
 				frag->destructor = sock_wfree;
-				skb->truesize -= frag->truesize;
+				truesizes += frag->truesize;
 			}
 		}
 
@@ -496,6 +497,7 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff*))
 		frag = skb_shinfo(skb)->frag_list;
 		skb_shinfo(skb)->frag_list = NULL;
 		skb->data_len = first_len - skb_headlen(skb);
+		skb->truesize -= truesizes;
 		skb->len = first_len;
 		iph->tot_len = htons(first_len);
 		iph->frag_off = htons(IP_MF);
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6338a9c..bb9b183 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -609,6 +609,7 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 
 	if (skb_shinfo(skb)->frag_list) {
 		int first_len = skb_pagelen(skb);
+		int truesizes = 0;
 
 		if (first_len - hlen > mtu ||
 		    ((first_len - hlen) & 7) ||
@@ -631,7 +632,7 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 				sock_hold(skb->sk);
 				frag->sk = skb->sk;
 				frag->destructor = sock_wfree;
-				skb->truesize -= frag->truesize;
+				truesizes += frag->truesize;
 			}
 		}
 
@@ -662,6 +663,7 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 
 		first_len = skb_pagelen(skb);
 		skb->data_len = first_len - skb_headlen(skb);
+		skb->truesize -= truesizes;
 		skb->len = first_len;
 		ipv6_hdr(skb)->payload_len = htons(first_len -
 						   sizeof(struct ipv6hdr));

^ permalink raw reply related

* Re: [PATCH][INET_DIAG]: Fix inet_diag_lock_handler error path
From: Herbert Xu @ 2008-01-28 11:03 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, David S. Miller, spike, netdev, stable
In-Reply-To: <20080128022050.GQ27661@ghostprotocols.net>

On Mon, Jan 28, 2008 at 12:20:50AM -0200, Arnaldo Carvalho de Melo wrote:
> Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=9825
> 
> The inet_diag_lock_handler function uses ERR_PTR to encode errors but
> its callers were testing against NULL.

Thanks for catching this Arnaldo!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [PATCH 1/1] replace logical- by bit-or in smctr_make_tx_status_code(), drivers/net/tokenring/smctr.c
From: Roel Kluin @ 2008-01-28 10:54 UTC (permalink / raw)
  To: mikep; +Cc: netdev

drivers/net/tokenring/smctr.h:50:#define      IBM_PASS_SOURCE_ADDR    0x01
--
replace logical- by bit-or for IBM_PASS_SOURCE_ADDR

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
---
diff --git a/drivers/net/tokenring/smctr.c b/drivers/net/tokenring/smctr.c
index 93da3a3..ec29766 100644
--- a/drivers/net/tokenring/smctr.c
+++ b/drivers/net/tokenring/smctr.c
@@ -3413,7 +3413,7 @@ static int smctr_make_tx_status_code(struct net_device *dev,
         tsv->svi = TRANSMIT_STATUS_CODE;
         tsv->svl = S_TRANSMIT_STATUS_CODE;
 
-        tsv->svv[0] = ((tx_fstatus & 0x0100 >> 6) || IBM_PASS_SOURCE_ADDR);
+        tsv->svv[0] = ((tx_fstatus & 0x0100 >> 6) | IBM_PASS_SOURCE_ADDR);
 
         /* Stripped frame status of Transmitted Frame */
         tsv->svv[1] = tx_fstatus & 0xff;

^ permalink raw reply related

* [PATCH 4/6] [DCCP]: Fix the adjustments to AWL and SWL
From: Gerrit Renker @ 2008-01-28 10:16 UTC (permalink / raw)
  To: acme; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <1201515376-8280-4-git-send-email-gerrit@erg.abdn.ac.uk>

This fixes a problem and a potential loophole with regard to seqno/ackno
validity: the problem is that the initial adjustments to AWL/SWL were
only performed at the begin of the connection, during the handshake.

Since the Sequence Window feature is always greater than Wmin=32 (7.5.2),
it is however necessary to perform these adjustments at least for the first
W/W' (variables as per 7.5.1) packets in the lifetime of a connection.

This requirement is complicated by the fact that W/W' can change at any time
during the lifetime of a connection.

Therefore the consequence is to perform this safety check each time SWL/AWL
are updated.

A second problem solved by this patch is that the remote/local Sequence Window
feature values (which set the bounds for AWL/SWL/SWH) are undefined until the
feature negotiation has completed.

During the initial handshake we have more stringent sequence number protection,
the changes added by this patch effect that {A,S}W{L,H} are within the correct
bounds at the instant that feature negotiation completes (since the SeqWin
feature activation handlers call dccp_update_gsr/gss()).

A detailed rationale is below -- can be removed from the commit message.


1. Server sequence number checks during initial handshake
---------------------------------------------------------
The server can not use the fields of the listening socket for seqno/ackno checks
and thus needs to store all relevant information on a per-connection basis on
the dccp_request socket. This is a size-constrained structure and has currently
only ISS (dreq_iss) and ISR (dreq_isr) defined.
Adding further fields (SW{L,H}, AW{L,H}) would increase the size of the struct
and it is questionable whether this will have any practical gain. The currently
implemented solution is as follows.
 * receiving first Request: dccp_v{4,6}_conn_request sets
                            ISR := P.seqno, ISS := dccp_v{4,6}_init_sequence()

 * sending first Response:  dccp_v{4,6}_send_response via dccp_make_response()
                            sets P.seqno := ISS, sets P.ackno := ISR

 * receiving retransmitted Request: dccp_check_req() overrides ISR := P.seqno

 * answering retransmitted Request: dccp_make_response() sets ISS += 1,
                                    otherwise as per first Response

 * completing the handshake: succeeds in dccp_check_req() for the first Ack
                             where P.ackno == ISS (P.seqno is not tested)

 * creating child socket: ISS, ISR are copied from the request_sock

This solution will succeed whenever the server can receive the Request and the
subsequent Ack in succession, without retransmissions. If there is packet loss,
the client needs to retransmit until this condition succeeds; it will otherwise
eventually give up. Adding further fields to the request_sock could increase
the robustness a bit, in that it would make possible to let a reordered Ack
(from a retransmitted Response) pass. The argument against such a solution is
that if the packet loss is not persistent and an Ack gets through, why not
wait for the one answering the original response: if the loss is persistent, it
is probably better to not start the connection in the first place.

Long story short: the present design (by Arnaldo) is simple and will likely work
just as well as a more complicated solution. As a consequence, {A,S}W{L,H} are
not needed until the moment the request_sock is cloned into the accept queue.

At that stage feature negotiation has completed, so that the values for the local
and remote Sequence Window feature (7.5.2) are known, i.e. we are now in a better
position to compute {A,S}W{L,H}.


2. Client sequence number checks during initial handshake
---------------------------------------------------------
Until entering PARTOPEN the client does not need the adjustments, since it
constrains the Ack window to the packet it sent.

 * sending first Request: dccp_v{4,6}_connect() choose ISS,
                          dccp_connect() then sets GAR := ISS (as per 8.5),
			  dccp_transmit_skb() (with the previous bug fix) sets
			         GSS := ISS, AWL := ISS, AWH := GSS
 * n-th retransmitted Request (with previous patch):
	                  dccp_retransmit_skb() via timer calls
			  dccp_transmit_skb(), which sets GSS := ISS+n
                          and then AWL := ISS, AWH := ISS+n

 * receiving any Response: dccp_rcv_request_sent_state_process()
	                   -- accepts packet if AWL <= P.ackno <= AWH;
			   -- sets GSR = ISR = P.seqno

 * sending the Ack completing the handshake: dccp_send_ack() calls
                           dccp_transmit_skb(), which sets GSS += 1
			   and AWL := ISS, AWH := GSS


Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/dccp.h      |   20 ++++++++++++++++++++
 net/dccp/input.c     |   18 ++++++------------
 net/dccp/minisocks.c |   30 +++++++++---------------------
 3 files changed, 35 insertions(+), 33 deletions(-)

--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -392,6 +392,23 @@ static inline void dccp_update_gsr(struct sock *sk, u64 seq)
 	dp->dccps_gsr = seq;
 	/* Sequence validity window depends on remote Sequence Window (7.5.1) */
 	dp->dccps_swl = SUB48(ADD48(dp->dccps_gsr, 1), dp->dccps_r_seq_win / 4);
+	/*
+	 * Adjust SWL so that it is not below ISR. In contrast to RFC 4340,
+	 * 7.5.1 we perform this check beyond the initial handshake: W/W' are
+	 * always > 32, so for the first W/W' packets in the lifetime of a
+	 * connection we always have to adjust SWL.
+	 * A second reason why we are doing this is that the window depends on
+	 * the feature-remote value of Sequence Window: nothing stops the peer
+	 * from updating this value while we are busy adjusting SWL for the
+	 * first W packets (we would have to count from scratch again then).
+	 * Therefore it is safer to always make sure that the Sequence Window
+	 * is not artificially extended by a peer who grows SWL downwards by
+	 * continually updating the feature-remote Sequence-Window.
+	 * If sequence numbers wrap it is bad luck. But that will take a while
+	 * (48 bit), and this measure prevents Sequence-number attacks.
+	 */
+	if (before48(dp->dccps_swl, dp->dccps_isr))
+		dp->dccps_swl = dp->dccps_isr;
 	dp->dccps_swh = ADD48(dp->dccps_gsr, (3 * dp->dccps_r_seq_win) / 4);
 }
 
@@ -402,6 +419,9 @@ static inline void dccp_update_gss(struct sock *sk, u64 seq)
 	dp->dccps_gss = seq;
 	/* Ack validity window depends on local Sequence Window value (7.5.1) */
 	dp->dccps_awl = SUB48(ADD48(dp->dccps_gss, 1), dp->dccps_l_seq_win);
+	/* Adjust AWL so that it is not below ISS - see comment above for SWL */
+	if (before48(dp->dccps_awl, dp->dccps_iss))
+		dp->dccps_awl = dp->dccps_iss;
 	dp->dccps_awh = dp->dccps_gss;
 }
 
--- a/net/dccp/input.c
+++ b/net/dccp/input.c
@@ -440,20 +440,14 @@ static int dccp_rcv_request_sent_state_process(struct sock *sk,
 			dp->dccps_syn_rtt = dccp_sample_rtt(sk, 10 * (tstamp -
 			    dp->dccps_options_received.dccpor_timestamp_echo));
 
-		dp->dccps_isr = DCCP_SKB_CB(skb)->dccpd_seq;
-		dccp_update_gsr(sk, dp->dccps_isr);
 		/*
-		 * SWL and AWL are initially adjusted so that they are not less than
-		 * the initial Sequence Numbers received and sent, respectively:
-		 *	SWL := max(GSR + 1 - floor(W/4), ISR),
-		 *	AWL := max(GSS - W' + 1, ISS).
-		 * These adjustments MUST be applied only at the beginning of the
-		 * connection.
-		 *
-		 * AWL was adjusted in dccp_v4_connect -acme
+		 * Set ISR, GSR from packet. ISS was set in dccp_v{4,6}_connect
+		 * and GSS in dccp_transmit_skb(). Setting AWL/AWH and SWL/SWH
+		 * is done as part of activating the feature values below, since
+		 * these settings depend on the local/remote Sequence Window
+		 * features, which were undefined or not confirmed until now.
 		 */
-		dccp_set_seqno(&dp->dccps_swl,
-			       max48(dp->dccps_swl, dp->dccps_isr));
+		dp->dccps_gsr = dp->dccps_isr = DCCP_SKB_CB(skb)->dccpd_seq;
 
 		dccp_sync_mss(sk, icsk->icsk_pmtu_cookie);
 
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -120,30 +120,18 @@ struct sock *dccp_create_openreq_child(struct sock *sk,
 		 *
 		 *    Choose S.ISS (initial seqno) or set from Init Cookies
 		 *    Initialize S.GAR := S.ISS
-		 *    Set S.ISR, S.GSR, S.SWL, S.SWH from packet or Init Cookies
-		 */
-		newdp->dccps_gar = newdp->dccps_iss = dreq->dreq_iss;
-		dccp_update_gss(newsk, dreq->dreq_iss);
-
-		newdp->dccps_isr = dreq->dreq_isr;
-		dccp_update_gsr(newsk, dreq->dreq_isr);
-
-		/*
-		 * SWL and AWL are initially adjusted so that they are not less than
-		 * the initial Sequence Numbers received and sent, respectively:
-		 *	SWL := max(GSR + 1 - floor(W/4), ISR),
-		 *	AWL := max(GSS - W' + 1, ISS).
-		 * These adjustments MUST be applied only at the beginning of the
-		 * connection.
+		 *    Set S.ISR, S.GSR from packet (or Init Cookies)
+		 *
+		 *    Setting AWL/AWH and SWL/SWH happens as part of the feature
+		 *    activation below, as these windows all depend on the local
+		 *    and remote Sequence Window feature values (7.5.2).
 		 */
-		dccp_set_seqno(&newdp->dccps_swl,
-			       max48(newdp->dccps_swl, newdp->dccps_isr));
-		dccp_set_seqno(&newdp->dccps_awl,
-			       max48(newdp->dccps_awl, newdp->dccps_iss));
+		newdp->dccps_gss = newdp->dccps_iss = dreq->dreq_iss;
+		newdp->dccps_gar = newdp->dccps_iss;
+		newdp->dccps_gsr = newdp->dccps_isr = dreq->dreq_isr;
 
 		/*
-		 * Activate features after initialising the sequence numbers since
-		 * CCID initialisation may depend on GSS, ISR, ISS etc.
+		 * Activate features: initialise CCIDs, sequence windows etc.
 		 */
 		if (dccp_feat_activate_values(newsk, &dreq->dreq_featneg)) {
 			/* It is still raw copy of parent, so invalidate

^ permalink raw reply

* [PATCH 6/6] [DCCP-PROBE]: Reduce noise in output and convert to ktime_t
From: Gerrit Renker @ 2008-01-28 10:16 UTC (permalink / raw)
  To: acme; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <1201515376-8280-6-git-send-email-gerrit@erg.abdn.ac.uk>

This fixes the problem that dccp_probe output can grow quite large without
apparent benefit (many identical data points), creating huge files (up to
over one Gigabyte for a few minutes' test run) which are very hard to
post-process (in one instance it got so bad that gnuplot ate up all memory
plus swap).

The cause for the problem is that the kprobe is inserted into dccp_sendmsg,
which can be called in a polling-mode (whenever the TX queue is full due to
congestion-control issues, EAGAIN is returned). This creates many very
similar data points, i.e. the increase of processing time does not increase
the quality/information of the probe output.

The fix is to attach the probe to a different function -- write_xmit was
chosen since it gets called continually (both via userspace and timer);
an input-path function would stop sampling as soon as the other end stops
sending feedback.

For comparison the output file sizes for the same 20 second test
run over a lossy link:
           * before / without patch:  118   Megabytes
           * after  / with patch:       1.2 Megabytes
and there was much less noise in the output.

To allow backward compatibility with scripts that people use, the now-unused
`size' field in the output has been replaced with the CCID identifier. This
also serves for future compatibility - support for CCID2 is work in progress
(depends on the still unfinished SRTT/RTTVAR updates).

While at it, the update to ktime_t was also performed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/probe.c |   68 ++++++++++++++++++-----------------------------------
 1 files changed, 23 insertions(+), 45 deletions(-)

--- a/net/dccp/probe.c
+++ b/net/dccp/probe.c
@@ -46,77 +46,55 @@ struct {
 	struct kfifo	  *fifo;
 	spinlock_t	  lock;
 	wait_queue_head_t wait;
-	struct timeval	  tstart;
+	ktime_t		  start;
 } dccpw;
 
-static void printl(const char *fmt, ...)
-{
-	va_list args;
-	int len;
-	struct timeval now;
-	char tbuf[256];
-
-	va_start(args, fmt);
-	do_gettimeofday(&now);
-
-	now.tv_sec -= dccpw.tstart.tv_sec;
-	now.tv_usec -= dccpw.tstart.tv_usec;
-	if (now.tv_usec < 0) {
-		--now.tv_sec;
-		now.tv_usec += 1000000;
-	}
-
-	len = sprintf(tbuf, "%lu.%06lu ",
-		      (unsigned long) now.tv_sec,
-		      (unsigned long) now.tv_usec);
-	len += vscnprintf(tbuf+len, sizeof(tbuf)-len, fmt, args);
-	va_end(args);
-
-	kfifo_put(dccpw.fifo, tbuf, len);
-	wake_up(&dccpw.wait);
-}
-
-static int jdccp_sendmsg(struct kiocb *iocb, struct sock *sk,
-			 struct msghdr *msg, size_t size)
+static void jdccp_write_xmit(struct sock *sk)
 {
 	const struct inet_sock *inet = inet_sk(sk);
 	struct ccid3_hc_tx_sock *hctx = NULL;
+	struct timespec tv;
+	char tbuf[256];
+	int len, ccid = ccid_get_current_id(dccp_sk(sk), false);
 
-	if (ccid_get_current_id(dccp_sk(sk), false) == DCCPC_CCID3)
+	if (ccid == DCCPC_CCID3)
 		hctx = ccid3_hc_tx_sk(sk);
 
-	if (port == 0 || ntohs(inet->dport) == port ||
-	    ntohs(inet->sport) == port) {
-		if (hctx)
-			printl("%d.%d.%d.%d:%u %d.%d.%d.%d:%u %d %d %d %d %u "
-			       "%llu %llu %d\n",
+	if (!port || ntohs(inet->dport) == port || ntohs(inet->sport) == port) {
+
+		tv  = ktime_to_timespec(ktime_sub(ktime_get(), dccpw.start));
+		len = sprintf(tbuf, "%lu.%09lu %d.%d.%d.%d:%u %d.%d.%d.%d:%u %d",
+			       (unsigned long) tv.tv_sec,
+			       (unsigned long) tv.tv_nsec,
 			       NIPQUAD(inet->saddr), ntohs(inet->sport),
-			       NIPQUAD(inet->daddr), ntohs(inet->dport), size,
+			       NIPQUAD(inet->daddr), ntohs(inet->dport), ccid);
+
+		if (hctx)
+			len += sprintf(tbuf + len, " %d %d %d %u %llu %llu %d",
 			       hctx->ccid3hctx_s, hctx->ccid3hctx_rtt,
 			       hctx->ccid3hctx_p, hctx->ccid3hctx_x_calc,
 			       hctx->ccid3hctx_x_recv >> 6,
 			       hctx->ccid3hctx_x >> 6, hctx->ccid3hctx_t_ipi);
-		else
-			printl("%d.%d.%d.%d:%u %d.%d.%d.%d:%u %d\n",
-			       NIPQUAD(inet->saddr), ntohs(inet->sport),
-			       NIPQUAD(inet->daddr), ntohs(inet->dport), size);
+
+		len += sprintf(tbuf + len, "\n");
+		kfifo_put(dccpw.fifo, tbuf, len);
+		wake_up(&dccpw.wait);
 	}
 
 	jprobe_return();
-	return 0;
 }
 
 static struct jprobe dccp_send_probe = {
 	.kp	= {
-		.symbol_name = "dccp_sendmsg",
+		.symbol_name = "dccp_write_xmit",
 	},
-	.entry	= jdccp_sendmsg,
+	.entry	= jdccp_write_xmit,
 };
 
 static int dccpprobe_open(struct inode *inode, struct file *file)
 {
 	kfifo_reset(dccpw.fifo);
-	do_gettimeofday(&dccpw.tstart);
+	dccpw.start = ktime_get();
 	return 0;
 }
 

^ permalink raw reply

* [PATCH 5/6] [DCCP]: Merge now-reduced connect_init() function
From: Gerrit Renker @ 2008-01-28 10:16 UTC (permalink / raw)
  To: acme; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <1201515376-8280-5-git-send-email-gerrit@erg.abdn.ac.uk>

After moving the assignment of GAR/ISS from dccp_connect_init() to
dccp_transmit_skb(), the former function becomes very small, so that
a merger with dccp_connect() suggested itself.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/output.c |   18 +++++-------------
 1 files changed, 5 insertions(+), 13 deletions(-)

--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -444,8 +444,9 @@ int dccp_send_reset(struct sock *sk, enum dccp_reset_codes code)
 /*
  * Do all connect socket setups that can be done AF independent.
  */
-static inline void dccp_connect_init(struct sock *sk)
+int dccp_connect(struct sock *sk)
 {
+	struct sk_buff *skb;
 	struct dccp_sock *dp = dccp_sk(sk);
 	struct dst_entry *dst = __sk_dst_get(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
@@ -455,22 +456,12 @@ static inline void dccp_connect_init(struct sock *sk)
 
 	dccp_sync_mss(sk, dst_mtu(dst));
 
-	/* Initialise GAR as per 8.5; AWL/AWH are set in dccp_transmit_skb() */
-	dp->dccps_gar = dp->dccps_iss;
-
-	icsk->icsk_retransmits = 0;
-}
-
-int dccp_connect(struct sock *sk)
-{
-	struct sk_buff *skb;
-	struct inet_connection_sock *icsk = inet_csk(sk);
-
 	/* do not connect if feature negotiation setup fails */
 	if (dccp_feat_finalise_settings(dccp_sk(sk)))
 		return -EPROTO;
 
-	dccp_connect_init(sk);
+	/* Initialise GAR as per 8.5; AWL/AWH are set in dccp_transmit_skb() */
+	dp->dccps_gar = dp->dccps_iss;
 
 	skb = alloc_skb(sk->sk_prot->max_header, sk->sk_allocation);
 	if (unlikely(skb == NULL))
@@ -486,6 +477,7 @@ int dccp_connect(struct sock *sk)
 	DCCP_INC_STATS(DCCP_MIB_ACTIVEOPENS);
 
 	/* Timer for repeating the REQUEST until an answer. */
+	icsk->icsk_retransmits = 0;
 	inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
 				  icsk->icsk_rto, DCCP_RTO_MAX);
 	return 0;

^ permalink raw reply

* [PATCH 3/6] [DCCP]: Bug-Fix - AWL was never updated
From: Gerrit Renker @ 2008-01-28 10:16 UTC (permalink / raw)
  To: acme; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <1201515376-8280-3-git-send-email-gerrit@erg.abdn.ac.uk>

This patch was triggered by finding the  following message in the syslog:
 "kernel: dccp_check_seqno: DCCP: Step 6 failed for DATAACK packet, [...]
   P.ackno exists or LAWL(82947089) <= P.ackno(82948208)
	                            <= S.AWH(82948728), sending SYNC..."

Note the difference between AWH and AWL: it is 1639 packets (while Sequence
Window was actually at 100). A closer look at the trace showed that
LAWL = AWL = 82947089 equalled the ISS on the Response.

The cause of the bug was that AWL was only ever set on the first packet - the
DCCP-Request sent by dccp_v{4,6}_connect().

The fix is to continually update AWL/AWH with each new packet (as GSS=AWH).

In addition, AWL/AWH are now updated to enforce more stringent checks on the
initial sequence numbers when connecting:
 * AWL is initialised to ISS and remains at this value;
 * AWH is always set to GSS (via dccp_update_gss());
 * so on the first Request: AWL =      AWH = ISS,
   and on the n-th Request: AWL = ISS, AWH = ISS+n.

As a consequence, only Response packets that refer to Requests sent by this
host will pass, all others are discarded. This is the intention and in effect
implements the initial adjustments for AWL as specified in RFC 4340, 7.5.1.

Note: A problem that remains is that ISS can potentially be under-run even after
      the initial handshake; this is addressed a subsequent patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/output.c |   34 +++++++++++++++-------------------
 1 files changed, 15 insertions(+), 19 deletions(-)

--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -53,8 +53,11 @@ static int dccp_transmit_skb(struct sock *sk, struct sk_buff *skb)
 					  dccp_packet_hdr_len(dcb->dccpd_type);
 		int err, set_ack = 1;
 		u64 ackno = dp->dccps_gsr;
-
-		dccp_inc_seqno(&dp->dccps_gss);
+		/*
+		 * Increment GSS here already in case the option code needs it.
+		 * Update GSS for real only if option processing below succeeds.
+		 */
+		dcb->dccpd_seq = ADD48(dp->dccps_gss, 1);
 
 		switch (dcb->dccpd_type) {
 		case DCCP_PKT_DATA:
@@ -66,6 +69,9 @@ static int dccp_transmit_skb(struct sock *sk, struct sk_buff *skb)
 
 		case DCCP_PKT_REQUEST:
 			set_ack = 0;
+			/* Use ISS on the first (non-retransmitted) Request. */
+			if (icsk->icsk_retransmits == 0)
+				dcb->dccpd_seq = dp->dccps_iss;
 			/* fall through */
 
 		case DCCP_PKT_SYNC:
@@ -84,14 +90,11 @@ static int dccp_transmit_skb(struct sock *sk, struct sk_buff *skb)
 			break;
 		}
 
-		dcb->dccpd_seq = dp->dccps_gss;
-
 		if (dccp_insert_options(sk, skb)) {
 			kfree_skb(skb);
 			return -EPROTO;
 		}
 
-
 		/* Build DCCP header and checksum it. */
 		dh = dccp_zeroed_hdr(skb, dccp_header_size);
 		dh->dccph_type	= dcb->dccpd_type;
@@ -103,7 +106,7 @@ static int dccp_transmit_skb(struct sock *sk, struct sk_buff *skb)
 		/* XXX For now we're using only 48 bits sequence numbers */
 		dh->dccph_x	= 1;
 
-		dp->dccps_awh = dp->dccps_gss;
+		dccp_update_gss(sk, dcb->dccpd_seq);
 		dccp_hdr_set_seq(dh, dp->dccps_gss);
 		if (set_ack)
 			dccp_hdr_set_ack(dccp_hdr_ack_bits(skb), ackno);
@@ -112,6 +115,11 @@ static int dccp_transmit_skb(struct sock *sk, struct sk_buff *skb)
 		case DCCP_PKT_REQUEST:
 			dccp_hdr_request(skb)->dccph_req_service =
 							dp->dccps_service;
+			/*
+			 * Limit Ack window to ISS <= P.ackno <= GSS, so that
+			 * only Responses to Requests we sent are considered.
+			 */
+			dp->dccps_awl = dp->dccps_iss;
 			break;
 		case DCCP_PKT_RESET:
 			dccp_hdr_reset(skb)->dccph_reset_code =
@@ -447,19 +455,7 @@ static inline void dccp_connect_init(struct sock *sk)
 
 	dccp_sync_mss(sk, dst_mtu(dst));
 
-	/*
-	 * SWL and AWL are initially adjusted so that they are not less than
-	 * the initial Sequence Numbers received and sent, respectively:
-	 *	SWL := max(GSR + 1 - floor(W/4), ISR),
-	 *	AWL := max(GSS - W' + 1, ISS).
-	 * These adjustments MUST be applied only at the beginning of the
-	 * connection.
-	 */
-	dccp_update_gss(sk, dp->dccps_iss);
-	dccp_set_seqno(&dp->dccps_awl, max48(dp->dccps_awl, dp->dccps_iss));
-
-	/* S.GAR - greatest valid acknowledgement number received on a non-Sync;
-	 *         initialized to S.ISS (sec. 8.5)                            */
+	/* Initialise GAR as per 8.5; AWL/AWH are set in dccp_transmit_skb() */
 	dp->dccps_gar = dp->dccps_iss;
 
 	icsk->icsk_retransmits = 0;

^ permalink raw reply

* [PATCH 1/6] [CCID3]: Function returned wrong value
From: Gerrit Renker @ 2008-01-28 10:16 UTC (permalink / raw)
  To: acme; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <1201515376-8280-1-git-send-email-gerrit@erg.abdn.ac.uk>

This fixes a bug in the reverse lookup of p, given a value f(p) (used in the
reverse-lookup of the first/synthetic loss interval in RFC 3448, 6.3.1):
instead of p, the function returned the smallest table value f(p).

The smallest tabulated value of

   10^6 * f(p) =  sqrt(2*p/3) + 12 * sqrt(3*p/8) * (32 * p^3 + p)

for p=0.0001 is 8172.

Since this value is scaled by 10^6, the impact of this bug is that a loss
of 8172/10^6 = 0.8172% was reported whenever the input was below the table
resolution of 0.01%.

This means that the value was over 80 times too high, resulting in large spikes
of the initial loss interval, and thus unnecessarily reducing the throughput.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/ccids/lib/tfrc_equation.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/dccp/ccids/lib/tfrc_equation.c
+++ b/net/dccp/ccids/lib/tfrc_equation.c
@@ -677,11 +677,11 @@ u32 tfrc_calc_x_reverse_lookup(u32 fvalue)
 	/* Error cases. */
 	if (fvalue < tfrc_calc_x_lookup[0][1]) {
 		DCCP_WARN("fvalue %d smaller than resolution\n", fvalue);
-		return tfrc_calc_x_lookup[0][1];
+		return TFRC_SMALLEST_P;
 	}
 	if (fvalue > tfrc_calc_x_lookup[TFRC_CALC_X_ARRSIZE - 1][0]) {
 		DCCP_WARN("fvalue %d exceeds bounds!\n", fvalue);
-		return 1000000;
+		return 1000000;		/* The maximum: 100% scaled by 10^6 */
 	}
 
 	if (fvalue <= tfrc_calc_x_lookup[TFRC_CALC_X_ARRSIZE - 1][1]) {

^ permalink raw reply

* [DCCP] [Patch 0/6] [BUG-Fix]: Fixes for CCID3, seqnos, and dccp_probe
From: Gerrit Renker @ 2008-01-28 10:16 UTC (permalink / raw)
  To: acme; +Cc: dccp, netdev

This is a set of bug fixes for CCID3 and general DCCP. 

Please consider patches #1, #2, #3. The remainder are for the test tree
(but are fixes nonetheless) and may not apply directly onto mainline; with
regard to patch #6, please see note at end of message.

Patch #1: Fixes a CCID3 bug: when loss was encountered, the value returned at 
          minimum resolution was wrong and over 80 times too high.

Patch #2: Fixes a bug in the assignment of GAR (used ISR instead of ISS).

Patch #3: Another bug fix: AWL was only ever set after the Response.

Patch #4: Fixes adjustments to AWL and SWL. These were only updated at the
          begin of the connection (where the statements reduced to much 
          simpler assignments), but need to be adjusted for the reception
	  of subsequent packets also.
	  (Note: This patch is best used after going through the feature
	         negotiation patches, since the feature handlers for Sequence
		 Window ensure that these values are up to date.)

Patch #5: As a consequence of the preceding patches, the dccp_connect_init
          function may be merged into dccp_connect. Is a suggestion (which
          would make sense), not a bug fix.

Patch #6: A fix for dccp_probe - this used to be noisy and created huge
          files. By attaching the dccp_probe onto a different function, this
	  was fixed, the output file size shrank by a factor of over 100,
	  with qualitatively the same or better output.


I have put these after the feature-negotiation patches onto

	git://eden-feed.erg.abdn.ac.uk/dccp_exp 	{dccp,ccid4}

and updated the CCID4 tree to take advantage of the dccp_probe update.

If people would like to use the dccp_probe update for a mainline kernel,
I have uploaded a similar patch to patch #6 onto
http://www.erg.abdn.ac.uk/users/gerrit/dccp/dccp_probe_for_mainline.diff

^ permalink raw reply

* [PATCH 2/6] [DCCP]: Bug in initial acknowledgment number assignment
From: Gerrit Renker @ 2008-01-28 10:16 UTC (permalink / raw)
  To: acme; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <1201515376-8280-2-git-send-email-gerrit@erg.abdn.ac.uk>

Step 8.5 in RFC 4340 says for the newly cloned socket

           Initialize S.GAR := S.ISS,

but what in fact the code (minisocks.c) does is

           Initialize S.GAR := S.ISR,

which is wrong (typo?) -- fixed by the patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/minisocks.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -122,13 +122,12 @@ struct sock *dccp_create_openreq_child(struct sock *sk,
 		 *    Initialize S.GAR := S.ISS
 		 *    Set S.ISR, S.GSR, S.SWL, S.SWH from packet or Init Cookies
 		 */
+		newdp->dccps_gar = newdp->dccps_iss = dreq->dreq_iss;
+		dccp_update_gss(newsk, dreq->dreq_iss);
 
-		newdp->dccps_gar = newdp->dccps_isr = dreq->dreq_isr;
+		newdp->dccps_isr = dreq->dreq_isr;
 		dccp_update_gsr(newsk, dreq->dreq_isr);
 
-		newdp->dccps_iss = dreq->dreq_iss;
-		dccp_update_gss(newsk, dreq->dreq_iss);
-
 		/*
 		 * SWL and AWL are initially adjusted so that they are not less than
 		 * the initial Sequence Numbers received and sent, respectively:

^ permalink raw reply

* 2.6.24 regression: Wake On Lan in sky2 broken on Mac mini
From: Tino Keitel @ 2008-01-28  8:42 UTC (permalink / raw)
  To: netdev

Hi folks,

with 2.6.24-rc8, Wake On LAN doesn't work anymore as it used to with
2.6.23 on my Mac mini Core Duo. I saw that this was reported in
http://bugzilla.kernel.org/show_bug.cgi?id=9721 and on netdev a patch
for the sky2 driver was sent by Stephen Hemminger. This patch fixed WOL
for me after applying it to 2.6.24-rc8.

However, it seems as the patch never made it into the kernel. Instead,
the commit that was suspected to break WOL
(84cd2dfb04d23a961c5f537baa243fa54d0987ac) was reverted
(be63a21c9573fbf88106ff0f030da5974551257b).

Now I tried the 2.6.24 release and noticed that WOL is still broken.
I'll be happy to test any patches that can make it into 2.6.24.1.

Regards,
Tino

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox