Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 0/5] Candidate fix for increased number of GFP_ATOMIC failures V2
From: Karol Lewandowski @ 2009-10-28 11:42 UTC (permalink / raw)
  To: Mel LKML
  Cc: Karol Lewandowski, Mel Gorman, Frans Pop, Jiri Kosina,
	Sven Geggus, Tobias Oetiker, Rafael J. Wysocki, David Miller,
	Reinette Chatre, Kalle Valo, David Rientjes, KOSAKI Motohiro,
	Mohamed Abbas, Jens Axboe, John W. Linville, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Greg Kroah-Hartman,
	Stephan von Krawczynski, Kernel Testers List,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	"linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" <li
In-Reply-To: <9ec2d7290910240646p75b93c68v6ea1648d628a9660-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Sat, Oct 24, 2009 at 02:46:56PM +0100, Mel LKML wrote:
> Hi,

Hi,

> On 10/23/09, Karol Lewandowski <karol.k.lewandowski-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Fri, Oct 23, 2009 at 06:58:10PM +0200, Karol Lewandowski wrote:

> > Ok, I've tested patches 1+2+4 and bug, while very hard to trigger, is
> > still present. I'll test complete 1-4 patchset as time permits.

Sorry for silence, I've been quite busy lately.


> And also patch 5 please which is the revert. Patch 5 as pointed out is
> probably a red herring. Hwoever, it has changed the timing and made a
> difference for some testing so I'd like to know if it helps yours as
> well.

I've tested patches 1+2+3+4 in my normal usage scenario (do some work,
suspend, do work, suspend, ...) and it failed today after 4 days (== 4
suspend-resume cycles).

I'll test 1-5 now.


Thanks.

^ permalink raw reply

* Re: [PATCH 0/5] Candidate fix for increased number of GFP_ATOMIC failures V2
From: Mel Gorman @ 2009-10-28 11:59 UTC (permalink / raw)
  To: Karol Lewandowski
  Cc: Mel LKML, Frans Pop, Jiri Kosina, Sven Geggus, Tobias Oetiker,
	Rafael J. Wysocki, David Miller, Reinette Chatre, Kalle Valo,
	David Rientjes, KOSAKI Motohiro, Mohamed Abbas, Jens Axboe,
	John W. Linville, Pekka Enberg, Bartlomiej Zolnierkiewicz,
	Greg Kroah-Hartman, Stephan von Krawczynski, Kernel Testers List,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
In-Reply-To: <20091028114208.GA14476-nLtalAL5mPp2RxbNQum0x1nzlInOXLuq@public.gmane.org>

On Wed, Oct 28, 2009 at 12:42:08PM +0100, Karol Lewandowski wrote:
> On Sat, Oct 24, 2009 at 02:46:56PM +0100, Mel LKML wrote:
> > Hi,
> 
> Hi,
> 
> > On 10/23/09, Karol Lewandowski <karol.k.lewandowski-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > > On Fri, Oct 23, 2009 at 06:58:10PM +0200, Karol Lewandowski wrote:
> 
> > > Ok, I've tested patches 1+2+4 and bug, while very hard to trigger, is
> > > still present. I'll test complete 1-4 patchset as time permits.
> 
> Sorry for silence, I've been quite busy lately.
> 
> 
> > And also patch 5 please which is the revert. Patch 5 as pointed out is
> > probably a red herring. Hwoever, it has changed the timing and made a
> > difference for some testing so I'd like to know if it helps yours as
> > well.
> 
> I've tested patches 1+2+3+4 in my normal usage scenario (do some work,
> suspend, do work, suspend, ...) and it failed today after 4 days (== 4
> suspend-resume cycles).
> 
> I'll test 1-5 now.
> 

I was digging through commits for suspend-related changes. Rafael, is
there any chance that some change to suspend is responsible for this
regression? This commit for example is a vague possibility;
c6f37f12197ac3bd2e5a35f2f0e195ae63d437de: PM/Suspend: Do not shrink memory before suspend

I say vague because FREE_PAGE_NUMBER is so small.

Also, what was the behaviour of the e100 driver when suspending before
this commit?

6905b1f1a03a48dcf115a2927f7b87dba8d5e566: Net / e100: Fix suspend of devices that cannot be power managed

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* RE: [PATCH v2 4/7] fsl_pq_mdio: Add Suport for etsec2.0 devices.
From: Kumar Gopalpet-B05799 @ 2009-10-28 12:00 UTC (permalink / raw)
  To: netdev; +Cc: David Miller
In-Reply-To: <20091028.024325.134189895.davem@davemloft.net>

 

>-----Original Message-----
>From: David Miller [mailto:davem@davemloft.net] 
>Sent: Wednesday, October 28, 2009 3:13 PM
>To: Kumar Gopalpet-B05799
>Cc: netdev@vger.kernel.org
>Subject: Re: [PATCH v2 4/7] fsl_pq_mdio: Add Suport for 
>etsec2.0 devices.
>
>From: Sandeep Gopalpet <sandeep.kumar@freescale.com>
>Date: Tue, 27 Oct 2009 22:25:18 +0530
>
>> This patch adds mdio support for etsec2.0 devices.
>> 
>> Modified the fsl_pq_mdio structure to include the new mdio members.
>> 
>> Signed-off-by: Sandeep Gopalpet <sandeep.kumar@freescale.com>
>
>This is the third time you've submitted this patch, and for 
>the third time it DOES NOT apply to net-next-2.6 at all when I 
>try to apply this gianfar patch series.
>
>You must be patching against another tree that has some 
>changes that conflict with this one.
>

I had rebased and tested on the master branch of the following tree
http://www.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6.git

(The git protocol clone was not working, hence we used http)
If this is not the right tree, kindly correct me.
I could not find any other net-next-2.6 tree/branch on kernel.org.


I have updated the tree again, (though there is no change)
the last commit on the tree being :

commit b37b62fea1d1bf68ca51818f8eb1035188efd030
Author: Ben Hutchings <bhutchings@solarflare.com>
Date:   Fri Oct 23 08:33:42 2009 +0000

    sfc: Rename 'xfp' file and functions to reflect reality

    The 'XFP' driver is really a driver for the QT2022C2 and QT2025C
PHYs,
    covering both more and less than XFP.  Rename its functions and
    constants to reflect reality and to reduce namespace pollution when
    sfc is a built-in driver.

    Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

 
Also, I have merged it with the latest linux-next tree 
(It is already updated with the net-next-2.6.git)

>Sort this out before submitting this again.

I am working on it.
>
>If you submit once more this same series, and it doesn't apply 
>properly to net-next-2.6, I will flat our ignore your 
>submissions for a week or so.
>

>You are wasting that much of my time by doing this over and over.
I apologize for that.
>
>Get your act together.
>


^ permalink raw reply

* Re: [PATCH V2]NET/KS8695: add support NAPI for Rx
From: Daniel Silverstone @ 2009-10-28 12:06 UTC (permalink / raw)
  To: Figo.zhang; +Cc: David S. Miller, netdev, Ben Dooks
In-Reply-To: <1256653422.2148.23.camel@myhost>

On Tue, Oct 27, 2009 at 10:23:42PM +0800, Figo.zhang wrote:
> for wan, irq = 29; for lan ,irq = 16.
> so we can do this read the interrupt status:
> 
> unsigned long mask_bit = 1 << ksp->rx_irq;
> status = readl(KS8695_IRQ_VA + KS8695_INTST);

I hate that there's no proper IRQ functions for managing these, as Ben has
commented.  Although I can understand that writing such is beyond the scope of
this patch.

>  #define MODULENAME	"ks8695_ether"
>  #define MODULEVERSION	"1.01"

You still didn't update the module version.  This is a pity because you've
potentially radically changed behaviour and you definitely have radically
changed implementation.

> +	struct napi_struct	napi;
> +	spinlock_t rx_lock;

You have not added documentation for these fields in the structure's
documentation string.

> + *	Use NAPI to receive packets.

"Inform NAPI that packet reception needs to be scheduled." might be better.

> +static int ks8695_rx(struct net_device *ndev, int budget)

This routine lacks a documentation string. Please write one.

> -	/* Kick the RX DMA engine, in case it became suspended */
> -	ks8695_writereg(ksp, KS8695_DRSC, 0);

I can't see where you have moved this to.  Without it, sometimes the KS8695's
RX DMA engine will falter and packets won't be transferred properly.

> +static int ks8695_poll(struct napi_struct *napi, int budget)

This routine also lacks a documentation string.

> +	netif_napi_add(ndev, &ksp->napi, ks8695_poll, 64);

This '64' seems quite arbitrary.  Is it a standard default? Did you work it out
from something else?  Some explanation would be nice.


I see that Dave Miller has accepted your patch into net-next-2.6.  I'd like to
see the above fixed before that gets merged any further.

Regards,

Daniel.

-- 
Daniel Silverstone                              http://www.simtec.co.uk/


^ permalink raw reply

* Re: [PATCH V2]NET/KS8695: add support NAPI for Rx
From: David Miller @ 2009-10-28 12:14 UTC (permalink / raw)
  To: dsilvers; +Cc: figo1802, netdev, ben
In-Reply-To: <20091028120643.GA7883@digital-scurf.org>

From: Daniel Silverstone <dsilvers@simtec.co.uk>
Date: Wed, 28 Oct 2009 12:06:44 +0000

> I see that Dave Miller has accepted your patch into net-next-2.6.
> I'd like to see the above fixed before that gets merged any further.

Any such change would need to be relative to what's already
in net-next-2.6

^ permalink raw reply

* [PATCH]udev:Extend udev to support move events
From: Narendra K @ 2009-10-28 12:46 UTC (permalink / raw)
  To: linux-hotplug
  Cc: netdev, matt_domsch, jordan_hargrave, charles_rose,
	sandeep_k_shandilya, dannf

As of now, udev does not support move events that are generated when
network interfaces are renamed. This patch extends udev to support move
events. With this patch udev would support rules like 

ACTION=="move", SUBSYSTEM=="net", PROGRAM="netif_id %k"

Signed-off-by: Narendra K <Narendra_K@dell.com>
---
 udev/udev-event.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/udev/udev-event.c b/udev/udev-event.c
index f4d7121..4a77753 100644
--- a/udev/udev-event.c
+++ b/udev/udev-event.c
@@ -647,6 +647,13 @@ exit_add:
 		goto exit;
 	}
 
+	/* handle "move" event */
+	if (strcmp(udev_device_get_subsystem(dev), "net") == 0 && strcmp(udev_device_get_action(dev), "move") == 0) {
+		udev_rules_apply_to_event(rules, event);
+		udev_device_update_db(dev);
+		goto exit;
+	}
+
 	/* remove device node */
 	if (major(udev_device_get_devnum(dev)) != 0 && strcmp(udev_device_get_action(dev), "remove") == 0) {
 		/* import database entry and delete it */
-- 

With regards,
Narendra K


^ permalink raw reply related

* Re: [PATCH 0/5] Candidate fix for increased number of GFP_ATOMIC failures V2
From: Tobi Oetiker @ 2009-10-28 12:55 UTC (permalink / raw)
  To: Karol Lewandowski
  Cc: Mel LKML, Mel Gorman, Frans Pop, Jiri Kosina, Sven Geggus,
	Rafael J. Wysocki, David Miller, Reinette Chatre, Kalle Valo,
	David Rientjes, KOSAKI Motohiro, Mohamed Abbas, Jens Axboe,
	John W. Linville, Pekka Enberg, Bartlomiej Zolnierkiewicz,
	Greg Kroah-Hartman, Stephan von Krawczynski, Kernel Testers List,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
In-Reply-To: <20091028114208.GA14476-nLtalAL5mPp2RxbNQum0x1nzlInOXLuq@public.gmane.org>

 Today Karol Lewandowski wrote:

> On Sat, Oct 24, 2009 at 02:46:56PM +0100, Mel LKML wrote:
> > Hi,
>
> Hi,
>
> > On 10/23/09, Karol Lewandowski <karol.k.lewandowski-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > > On Fri, Oct 23, 2009 at 06:58:10PM +0200, Karol Lewandowski wrote:
>
> > > Ok, I've tested patches 1+2+4 and bug, while very hard to trigger, is
> > > still present. I'll test complete 1-4 patchset as time permits.
>
> Sorry for silence, I've been quite busy lately.
>
>
> > And also patch 5 please which is the revert. Patch 5 as pointed out is
> > probably a red herring. Hwoever, it has changed the timing and made a
> > difference for some testing so I'd like to know if it helps yours as
> > well.
>
> I've tested patches 1+2+3+4 in my normal usage scenario (do some work,
> suspend, do work, suspend, ...) and it failed today after 4 days (== 4
> suspend-resume cycles).

I have been testing 1+2,1+2+3 as well as 3+4 and have been of the
assumption that 3+4 does help ... I have now been runing a modified
version of 4 which prints a warning instead of doing anything ... I
have now seen the allocation issue again without the warning being
printed. So in other words

1+2+3 make the problem less severe, but do not solve it
4 seems to be a red hering.

cheers
tobi


-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi-7K0TWYW2a3pyDzI6CaY1VQ@public.gmane.org ++41 62 775 9902 / sb: -9900

^ permalink raw reply

* Re: [PATCH 2/3] net: TCP thin linear timeouts
From: Arnd Hannemann @ 2009-10-28 12:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andreas Petlund, netdev, linux-kernel, shemminger, ilpo.jarvinen,
	davem
In-Reply-To: <4AE7262B.1060703@gmail.com>

Eric Dumazet schrieb:
> Andreas Petlund a écrit :
>> This patch will make TCP use only linear timeouts if the stream is thin. This will help to avoid the very high latencies that thin stream suffer because of exponential backoff. This mechanism is only active if enabled by iocontrol or syscontrol and the stream is identified as thin.
>>
> 
> Wont this reduce the session timeout to something very small, ie 15 retransmits, way under the minute ?

The session timeout no longer depends on the actual number of retransmits. Instead its a time interval,
which is roughly equivalent to the time a TCP, performing exponential backoff would need to perform
15 retransmits.

However, addressing the proposal:
I wonder how one can seriously suggest to just skip congestion response during timeout-based
loss recovery? I believe that in a heavily congested scenarios, this would lead to a goodput
goodput disaster... Not to mention that in a heavily congested scenario, suddenly every flow
will become "thin", so this will even amplify the problems. Or did I miss something?

Best regards,
Arnd

^ permalink raw reply

* Re: [PATCH] udev: create empty regular files to represent net interfaces
From: Matt Domsch @ 2009-10-28 13:03 UTC (permalink / raw)
  To: Kay Sievers
  Cc: dann frazier, linux-hotplug, Narendra_K, netdev, Jordan_Hargrave,
	Charles_Rose, Ben Hutchings
In-Reply-To: <ac3eb2510910280123g3c0e3d95wb38a239238906027@mail.gmail.com>

On Wed, Oct 28, 2009 at 09:23:57AM +0100, Kay Sievers wrote:
> On Tue, Oct 27, 2009 at 21:55, Matt Domsch <Matt_Domsch@dell.com> wrote:
> > On Thu, Oct 22, 2009 at 12:36:20AM -0600, dann frazier wrote:
> >> Here's a proof of concept to further the discussion..
> >>
> >> The default filename uses the format:
> >> ?? /dev/netdev/by-ifindex/$ifindex
> >>
> >> This provides the infrastructure to permit udev rules to create aliases for
> >> network devices using symlinks, for example:
> >>
> >> ?? /dev/netdev/by-name/eth0 -> ../by-ifindex/1
> >> ?? /dev/netdev/by-biosname/LOM0 -> ../by-ifindex/3
> >>
> >> A library (such as the proposed libnetdevname) could use this information
> >> to provide an alias->realname mapping for network utilities.
> >
> > yes, this could work, as IFINDEX is already exported in the uevents,
> > and that's the primary value udev needs to set up the mapping.
> >
> > While I like the little ifindex2name script you've got, I think udev
> > could simply call if_indextoname() to get this, and not call an
> > external program? ??I suppose it could be a really really simple
> > external program too.
> 
> What's the point of all this? Why would udev ever need to find the
> name of a device by the ifindex? The device name is the primary value
> for the kernel events udev acts on.

Ultimately, udev doesn't care.  I just want to use udev to keep track
of the pathname to device connections, like it does for all other
types of devices.

Applications such as net-tools, iproute, ethtool, etc.  take a kernel
device name.  I want to extend them to also take a path, and resolve
that path to a kernel device name.  libnetdevname currently is _one
small function_ which does this.  It need not even be in a library.
But whatever the mechanism, the path names need to be anchored
somewhere, so the library or all apps doing this kind of lookup know
where to look.

> That all sounds very much like something which will hit us back some
> day. I'm not sure, if udev should publish such dead text files in
> /dev, it does not seem to fit the usual APIs/assumptions where /sys
> and /dev match, and libudev provides access to both. It all sounds
> more like a database for a possible netdevname library, which does not
> need to be public in /dev, right?

Right, it doesn't need to be in /dev.  We could have udev rules that
simply call yet another program to maintain that database, in yet
another way.  But I really like how udev maintains the database of
symlinks for other device types, using symlinks in /dev/, and which
people are quite familiar with.  Why can't it be extended to do
likewise for network device names too?

There is a completely different approach possible here, if people
don't want to use something like /dev to track device name aliases.
We could put the whole name alias mechanism in the kernel, with new
netlink commands to add/remove/list aliases (and now we've overloaded that
term, as the old eth0:1 "alias" and dmz -> eth1 "alias" wouldn't be
the same thing).  But that idea hasn't met with a lot of interest
either.


-- 
Matt Domsch
Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux

^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: Narendra K @ 2009-10-28 13:06 UTC (permalink / raw)
  To: notting, scott
  Cc: netdev, linux-hotplug, matt_domsch, jordan_hargrave, rose_charles
In-Reply-To: <EDA0A4495861324DA2618B4C45DCB3EE58964E@blrx3m08.blr.amer.dell.com>

On Wed, Oct 28, 2009 at 06:21:49PM +0530, K, Narendra wrote:
> > At the moment, we do not appear to get the proper change uevents from 
> > things like 'ip link set dev <foo> address <bar>', so we can't 
> > currently maintain these symlinks.
> > 
> 
> I have observed that the kernel does generate a "move" event when
> interfaces are renamed. Looks like udev at present doesn't handle this
> event, but i suppose it could be extended to hanlde this event.
> 

With the patch "[PATCH]udev:Extend udev to support move events"
(http://marc.info/?l=linux-hotplug&m=125673399217656&w=2) udev would be
able to handle "move" events that are generated when interfaces are
renamed by commands like nameif. And we can maintain the symlinks by
having rules to handle this move event.

With regards,
Narendra K

^ permalink raw reply

* Re: [PATCH] udev: create empty regular files to represent net interfaces
From: Ben Hutchings @ 2009-10-28 13:06 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Matt Domsch, dann frazier, linux-hotplug, Narendra_K, netdev,
	Jordan_Hargrave, Charles_Rose
In-Reply-To: <ac3eb2510910280123g3c0e3d95wb38a239238906027@mail.gmail.com>

On Wed, 2009-10-28 at 09:23 +0100, Kay Sievers wrote:
> On Tue, Oct 27, 2009 at 21:55, Matt Domsch <Matt_Domsch@dell.com> wrote:
> > On Thu, Oct 22, 2009 at 12:36:20AM -0600, dann frazier wrote:
> >> Here's a proof of concept to further the discussion..
> >>
> >> The default filename uses the format:
> >>   /dev/netdev/by-ifindex/$ifindex
> >>
> >> This provides the infrastructure to permit udev rules to create aliases for
> >> network devices using symlinks, for example:
> >>
> >>   /dev/netdev/by-name/eth0 -> ../by-ifindex/1
> >>   /dev/netdev/by-biosname/LOM0 -> ../by-ifindex/3
> >>
> >> A library (such as the proposed libnetdevname) could use this information
> >> to provide an alias->realname mapping for network utilities.
> >
> > yes, this could work, as IFINDEX is already exported in the uevents,
> > and that's the primary value udev needs to set up the mapping.
> >
> > While I like the little ifindex2name script you've got, I think udev
> > could simply call if_indextoname() to get this, and not call an
> > external program?  I suppose it could be a really really simple
> > external program too.
> 
> What's the point of all this? Why would udev ever need to find the
> name of a device by the ifindex? The device name is the primary value
> for the kernel events udev acts on.
[...]

Since net devices can be renamed, unlike other devices, the ifindex is
the proper stable identifier.  Using the name as an identifier opens up
race conditions.  If there are events that don't include the ifindex,
this should be fixed.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH] Multicast packet reassembly can fail
From: Steve Chen @ 2009-10-28 13:29 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev
In-Reply-To: <4AE780CB.8070401@hp.com>

On Tue, 2009-10-27 at 16:22 -0700, Rick Jones wrote:
> Steve Chen wrote:
> > Multicast packet reassembly can fail
> > 
> > When multicast connections with multiple fragments are received by the same
> > node from more than one Ethernet ports, race condition between fragments
> > from each Ethernet port can cause fragment reassembly to fail leading to
> > packet drop.  This is because packets from each Ethernet port appears identical
> > to the the code that reassembles the Ethernet packet.
> > 
> > The solution is evaluate the Ethernet interface number in addition to all other
> > parameters so that every packet can be uniquely identified.  The existing
> > iif field in struct ipq is now used to generate the hash key, and iif is also
> > used for comparison in case of hash collision.
> > 
> > Please note that q->saddr ^ (q->iif << 5) is now being passed into
> > ipqhashfn to generate the hash key.  This is borrowed from the routing
> > code.
> > 
> > Signed-off-by: Steve Chen <schen@mvista.com>
> > Signed-off-by: Mark Huth <mhuth@mvista.com>
> 
> It has been hours since my last good Emily Litella moment so I'll ask - isn't 
> the combination of source and dest addr, protocol, IP ID and fragment offset 
> supposed to take care of this?  How does the ingress interface have anything to 
> do with it?

Here is the scenario this patch tries to address

<src node> ---->  <switch>  ----> <eth0 dest node>
                            \--->  <eth1 dest node>

For this specific case, src/dst address, protocol, IP ID and fragment
offset are all identical.  The only difference is the ingress interface.
A good follow up question would be why would anyone in their right mind
multicast to the same destination?  well, I don't know.  I can not get
the people who reported the problem to tell me either.   Since someone
found the need to do this,  perhaps others may find it useful too.

Regards,

Steve


^ permalink raw reply

* [PATCH V3]NET/KS8695: add support NAPI for Rx
From: Figo.zhang @ 2009-10-28 13:23 UTC (permalink / raw)
  To: Daniel Silverstone, David S. Miller; +Cc: netdev, Ben Dooks

Add support NAPI Rx API for KS8695NET driver.

v2, change the Rx function to NAPI.

in <KS8695X Integrated Multi-port Gateway Solution Register Description
 v1.0>:
Interrupt Enable Register (offset 0xE204)
Bit29 : WAN MAC Receive Interrupt Enable
Bit16 : LAN MAC Receive Interrupt Enable

Interrupt Status Register (Offset 0xF208)
Bit29: WAN MAC Receive Status
Bit16: LAN MAC Receive Status

see arch/arm/mach-ks8695/devices.c:
ks8695_wan_resources[] and ks8695_lan_resources[]
have IORESOURCE_IRQ , it have define the RX irq,
for wan, irq = 29; for lan ,irq = 16.
so we can do this read the interrupt status:

unsigned long mask_bit = 1 << ksp->rx_irq;
status = readl(KS8695_IRQ_VA + KS8695_INTST);

In v3, some changes that adviced by Daniel Silverstone
and Ben Dooks.

Add k8695_get_rx_enable_bit() for get Rx interrupt enable/status
bit.

Signed-off-by: Figo.zhang <figo1802@gmail.com>
--- 

 drivers/net/arm/ks8695net.c |  145 +++++++++++++++++++++++++++++++++++--------
 1 files changed, 120 insertions(+), 25 deletions(-)

diff --git a/drivers/net/arm/ks8695net.c b/drivers/net/arm/ks8695net.c
index 2a7b774..7051bcc 100644
--- a/drivers/net/arm/ks8695net.c
+++ b/drivers/net/arm/ks8695net.c
@@ -35,11 +35,13 @@
 
 #include <mach/regs-switch.h>
 #include <mach/regs-misc.h>
+#include <asm/mach/irq.h>
+#include <mach/regs-irq.h>
 
 #include "ks8695net.h"
 
 #define MODULENAME	"ks8695_ether"
-#define MODULEVERSION	"1.01"
+#define MODULEVERSION	"1.02"
 
 /*
  * Transmit and device reset timeout, default 5 seconds.
@@ -95,6 +97,9 @@ struct ks8695_skbuff {
 #define MAX_RX_DESC 16
 #define MAX_RX_DESC_MASK 0xf
 
+/*napi_weight have better more than rx DMA buffers*/
+#define NAPI_WEIGHT   64
+
 #define MAX_RXBUF_SIZE 0x700
 
 #define TX_RING_DMA_SIZE (sizeof(struct tx_ring_desc) * MAX_TX_DESC)
@@ -120,6 +125,7 @@ enum ks8695_dtype {
  *	@dev: The platform device object for this interface
  *	@dtype: The type of this device
  *	@io_regs: The ioremapped registers for this interface
+ *      @napi : Add support NAPI for Rx
  *	@rx_irq_name: The textual name of the RX IRQ from the platform data
  *	@tx_irq_name: The textual name of the TX IRQ from the platform data
  *	@link_irq_name: The textual name of the link IRQ from the
@@ -143,6 +149,7 @@ enum ks8695_dtype {
  *	@rx_ring_dma: The DMA mapped equivalent of rx_ring
  *	@rx_buffers: The sk_buff mappings for the RX ring
  *	@next_rx_desc_read: The next RX descriptor to read from on IRQ
+ *      @rx_lock: A lock to protect Rx irq function
  *	@msg_enable: The flags for which messages to emit
  */
 struct ks8695_priv {
@@ -152,6 +159,8 @@ struct ks8695_priv {
 	enum ks8695_dtype dtype;
 	void __iomem *io_regs;
 
+	struct napi_struct	napi;
+
 	const char *rx_irq_name, *tx_irq_name, *link_irq_name;
 	int rx_irq, tx_irq, link_irq;
 
@@ -172,6 +181,7 @@ struct ks8695_priv {
 	dma_addr_t rx_ring_dma;
 	struct ks8695_skbuff rx_buffers[MAX_RX_DESC];
 	int next_rx_desc_read;
+	spinlock_t rx_lock;
 
 	int msg_enable;
 };
@@ -392,29 +402,82 @@ ks8695_tx_irq(int irq, void *dev_id)
 }
 
 /**
+ *	k8695_get_rx_enable_bit - Get rx interrupt enable/status bit
+ *	@ksp: Private data for the KS8695 Ethernet
+ *
+ *    For KS8695 document:
+ *    Interrupt Enable Register (offset 0xE204)
+ *        Bit29 : WAN MAC Receive Interrupt Enable
+ *        Bit16 : LAN MAC Receive Interrupt Enable
+ *    Interrupt Status Register (Offset 0xF208)
+ *        Bit29: WAN MAC Receive Status
+ *        Bit16: LAN MAC Receive Status
+ *    So, this Rx interrrupt enable/status bit number is equal
+ *    as Rx IRQ number.
+ */
+static inline u32 k8695_get_rx_enable_bit(struct ks8695_priv *ksp)
+{
+	return ksp->rx_irq;
+}
+
+/**
  *	ks8695_rx_irq - Receive IRQ handler
  *	@irq: The IRQ which went off (ignored)
  *	@dev_id: The net_device for the interrupt
  *
- *	Process the RX ring, passing any received packets up to the
- *	host.  If we received anything other than errors, we then
- *	refill the ring.
+ *	Inform NAPI that packet reception needs to be scheduled
  */
+
 static irqreturn_t
 ks8695_rx_irq(int irq, void *dev_id)
 {
 	struct net_device *ndev = (struct net_device *)dev_id;
 	struct ks8695_priv *ksp = netdev_priv(ndev);
+	unsigned long status;
+
+	unsigned long mask_bit = 1 << k8695_get_rx_enable_bit();
+
+	spin_lock(&ksp->rx_lock);
+
+	status = readl(KS8695_IRQ_VA + KS8695_INTST);
+
+	/*clean rx status bit*/
+	writel(status | mask_bit , KS8695_IRQ_VA + KS8695_INTST);
+
+	if (status & mask_bit) {
+		if (napi_schedule_prep(&ksp->napi)) {
+			/*disable rx interrupt*/
+			status &= ~mask_bit;
+			writel(status , KS8695_IRQ_VA + KS8695_INTEN);
+			__napi_schedule(&ksp->napi);
+		}
+	}
+
+	spin_unlock(&ksp->rx_lock);
+	return IRQ_HANDLED;
+}
+
+/**
+ *	ks8695_rx - Receive packets  called by NAPI poll method
+ *	@ksp: Private data for the KS8695 Ethernet
+ *	@budget: The max packets would be receive
+ */
+
+static int ks8695_rx(struct ks8695_priv *ksp, int budget)
+{
+	struct net_device *ndev = ksp->ndev;
 	struct sk_buff *skb;
 	int buff_n;
 	u32 flags;
 	int pktlen;
 	int last_rx_processed = -1;
+	int received = 0;
 
 	buff_n = ksp->next_rx_desc_read;
-	do {
-		if (ksp->rx_buffers[buff_n].skb &&
-		    !(ksp->rx_ring[buff_n].status & cpu_to_le32(RDES_OWN))) {
+	while (received < budget
+			&& ksp->rx_buffers[buff_n].skb
+			&& (!(ksp->rx_ring[buff_n].status &
+					cpu_to_le32(RDES_OWN)))) {
 			rmb();
 			flags = le32_to_cpu(ksp->rx_ring[buff_n].status);
 			/* Found an SKB which we own, this means we
@@ -464,7 +527,7 @@ ks8695_rx_irq(int irq, void *dev_id)
 			/* Relinquish the SKB to the network layer */
 			skb_put(skb, pktlen);
 			skb->protocol = eth_type_trans(skb, ndev);
-			netif_rx(skb);
+			netif_receive_skb(skb);
 
 			/* Record stats */
 			ndev->stats.rx_packets++;
@@ -478,29 +541,57 @@ rx_failure:
 			/* Give the ring entry back to the hardware */
 			ksp->rx_ring[buff_n].status = cpu_to_le32(RDES_OWN);
 rx_finished:
+			received++;
 			/* And note this as processed so we can start
 			 * from here next time
 			 */
 			last_rx_processed = buff_n;
-		} else {
-			/* Ran out of things to process, stop now */
-			break;
-		}
-		buff_n = (buff_n + 1) & MAX_RX_DESC_MASK;
-	} while (buff_n != ksp->next_rx_desc_read);
-
-	/* And note which RX descriptor we last did anything with */
-	if (likely(last_rx_processed != -1))
-		ksp->next_rx_desc_read =
-			(last_rx_processed + 1) & MAX_RX_DESC_MASK;
-
-	/* And refill the buffers */
-	ks8695_refill_rxbuffers(ksp);
+			buff_n = (buff_n + 1) & MAX_RX_DESC_MASK;
+			/*And note which RX descriptor we last did */
+			if (likely(last_rx_processed != -1))
+				ksp->next_rx_desc_read =
+					(last_rx_processed + 1) &
+					MAX_RX_DESC_MASK;
+
+			/* And refill the buffers */
+			ks8695_refill_rxbuffers(ksp);
+
+			/* Kick the RX DMA engine, in case it became
+			 *  suspended */
+			ks8695_writereg(ksp, KS8695_DRSC, 0);
+	}
+	return received;
+}
 
-	/* Kick the RX DMA engine, in case it became suspended */
-	ks8695_writereg(ksp, KS8695_DRSC, 0);
 
-	return IRQ_HANDLED;
+/**
+ *	ks8695_poll - Receive packet by NAPI poll method
+ *	@ksp: Private data for the KS8695 Ethernet
+ *	@budget: The remaining number packets for network subsystem
+ *
+ *     Invoked by the network core when it requests for new
+ *     packets from the driver
+ */
+static int ks8695_poll(struct napi_struct *napi, int budget)
+{
+	struct ks8695_priv *ksp = container_of(napi, struct ks8695_priv, napi);
+	struct net_device *dev = ksp->ndev;
+	unsigned long  work_done;
+
+	unsigned long isr = readl(KS8695_IRQ_VA + KS8695_INTEN);
+	unsigned long mask_bit = 1 << k8695_get_rx_enable_bit();
+
+	work_done = ks8695_rx(ksp, budget);
+
+	if (work_done < budget) {
+		unsigned long flags;
+		spin_lock_irqsave(&ksp->rx_lock, flags);
+		/*enable rx interrupt*/
+		writel(isr | mask_bit, KS8695_IRQ_VA + KS8695_INTEN);
+		__napi_complete(napi);
+		spin_unlock_irqrestore(&ksp->rx_lock, flags);
+	}
+	return work_done;
 }
 
 /**
@@ -1472,6 +1563,8 @@ ks8695_probe(struct platform_device *pdev)
 	SET_ETHTOOL_OPS(ndev, &ks8695_ethtool_ops);
 	ndev->watchdog_timeo	 = msecs_to_jiffies(watchdog);
 
+	netif_napi_add(ndev, &ksp->napi, ks8695_poll, NAPI_WEIGHT);
+
 	/* Retrieve the default MAC addr from the chip. */
 	/* The bootloader should have left it in there for us. */
 
@@ -1505,6 +1598,7 @@ ks8695_probe(struct platform_device *pdev)
 
 	/* And initialise the queue's lock */
 	spin_lock_init(&ksp->txq_lock);
+	spin_lock_init(&ksp->rx_lock);
 
 	/* Specify the RX DMA ring buffer */
 	ksp->rx_ring = ksp->ring_base + TX_RING_DMA_SIZE;
@@ -1626,6 +1720,7 @@ ks8695_drv_remove(struct platform_device *pdev)
 	struct ks8695_priv *ksp = netdev_priv(ndev);
 
 	platform_set_drvdata(pdev, NULL);
+	netif_napi_del(&ksp->napi);
 
 	unregister_netdev(ndev);
 	ks8695_release_device(ksp);



^ permalink raw reply related

* Re: [PATCH] Multicast packet reassembly can fail
From: Steve Chen @ 2009-10-28 13:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <4AE81A70.5060307@gmail.com>

On Wed, 2009-10-28 at 11:18 +0100, Eric Dumazet wrote:
> Steve Chen a écrit :
> > Multicast packet reassembly can fail
> > 
> > When multicast connections with multiple fragments are received by the same
> > node from more than one Ethernet ports, race condition between fragments
> > from each Ethernet port can cause fragment reassembly to fail leading to
> > packet drop.  This is because packets from each Ethernet port appears identical
> > to the the code that reassembles the Ethernet packet.
> > 
> > The solution is evaluate the Ethernet interface number in addition to all other
> > parameters so that every packet can be uniquely identified.  The existing
> > iif field in struct ipq is now used to generate the hash key, and iif is also
> > used for comparison in case of hash collision.
> > 
> > Please note that q->saddr ^ (q->iif << 5) is now being passed into
> > ipqhashfn to generate the hash key.  This is borrowed from the routing
> > code.
> > 
> > Signed-off-by: Steve Chen <schen@mvista.com>
> > Signed-off-by: Mark Huth <mhuth@mvista.com>
> > 
> 
> This makes no sense to me, but I need to check the code.
> 
> How interface could matter in IP defragmentation ?
> And why multicast is part of the equation ?
> 
> If defrag fails, this must be for other reason,
> and probably needs another fix.
> 
> Check line 219 of net/ipv4/inet_fragment.c
> 
> #ifdef CONFIG_SMP
>         /* With SMP race we have to recheck hash table, because
>          * such entry could be created on other cpu, while we
>          * promoted read lock to write lock.
>          */
>         hlist_for_each_entry(qp, n, &f->hash[hash], list) {
>                 if (qp->net == nf && f->match(qp, arg)) {
>                         atomic_inc(&qp->refcnt);
>                         write_unlock(&f->lock);
>                         qp_in->last_in |= INET_FRAG_COMPLETE;   <<< HERE >>>
>                         inet_frag_put(qp_in, f);
>                         return qp;
>                 }
>         }
> #endif
> 
> I really wonder why we set INET_FRAG_COMPLETE here

I sent the specific scenario the patch tries to address to the list in
an earlier e-mail.  Would it be beneficial if I post the test code
somewhere so everyone can have access?

Regards,

Steve


^ permalink raw reply

* WAN device configuration, again...
From: Krzysztof Halasa @ 2009-10-28 13:28 UTC (permalink / raw)
  To: netdev

Hi,

I'm currently at final stages of "producing" two WAN drivers and there
is one thing to solve: they have really complex options. It's no longer
a V.35 with ca. 4 clock modes, a clock rate and few encodings etc. They
need many options unique to each driver/board. I think I need a more
capable interface to configure the devices than the current ioctl-based
one.

I think of something:
- using netlink or similar interface
- with potentially unlimited "payload" size (data may be transfered in
  smaller packets)
- the "command" and "response" should be variable-length ASCII-based,
  instead of fixed structures. This way I don't have to duplicate all
  option handling in userspace, only the specific driver has to know
  about them.

Comments? Perhaps there is already an example?
Should I use something else?

I also thought about using /sys read/write calls, but I'm not sure it's
a good idea.
-- 
Krzysztof Halasa

^ permalink raw reply

* Re: [PATCH V3]NET/KS8695: add support NAPI for Rx
From: David Miller @ 2009-10-28 13:37 UTC (permalink / raw)
  To: figo1802; +Cc: dsilvers, netdev, ben
In-Reply-To: <1256736189.2148.30.camel@myhost>


I said to send a relative patch against V2, so that I can
apply it on top what you've already sent.

Why are you sending a full fresh patch after being instructed
not to do that?

^ permalink raw reply

* Re: [PATCH] Multicast packet reassembly can fail
From: Eric Dumazet @ 2009-10-28 13:30 UTC (permalink / raw)
  To: Steve Chen; +Cc: netdev
In-Reply-To: <1256736757.3153.412.camel@linux-1lbu>

Steve Chen a écrit :
 
> I sent the specific scenario the patch tries to address to the list in
> an earlier e-mail.  Would it be beneficial if I post the test code
> somewhere so everyone can have access?
> 

Yes please, I cannot find your previous mail in my archives.

Thanks


^ permalink raw reply

* [PATCHv4 0/7] Per route TCP options support kill switches
From: Gilad Ben-Yossef @ 2009-10-28 14:15 UTC (permalink / raw)
  To: netdev; +Cc: ori

Allow selectively turning off support for specific TCP options
on a per route basis.

One normally want to disable SACK, DSACK, time stamp or window
scale if one got a piece of broken networking equipment somewhere
as a stop gap until you can bring a big enough hammer to deal with
the broken network equipment. It doesn't make sense to "punish" the
entire connections going through the machine to destinations not
related to the broken equipment.

This is doubly true when one is dealing with network containers
used to isolate several virtual domains.

Per route options implemented in free bits in the features route
entry property, which in some cases were reserved by name for these
options, so this does not inflate any structure.

Global sysctls for these options are still preserved and retain 
the exact original meaning (e.g. you have to have both the global 
sysctl turned on and not turn off the TCP option parsing in the
specific route to have it proccessed).

It is not possible to turn off globally an option but turn it on
per route, so as to not subtly change the meaning of current
establish sysctls (and this is a rare need anyway).

Tested on x86 using Qemu/KVM.

Working but crude matching patch to iproute2 sent earlier to the list.

Patchset based on original work by Ori Finkelman and Yony Amit
from ComSleep Ltd.

The author wishes to thank Eric Dumazaet, William Allen Simpson, 
Bill Fink and Ilpo Jarvinen for their feedback.


Gilad Ben-Yossef (7):
  Only parse time stamp TCP option in time wait sock
  Allow tcp_parse_options to consult dst entry
  Add dst_feature to query route entry features
  Add the no SACK route option feature
  Allow disabling TCP timestamp options per route
  Allow to turn off TCP window scale opt per route
  Allow disabling of DSACK TCP option per route

 include/linux/rtnetlink.h |    6 ++++--
 include/net/dst.h         |    8 +++++++-
 include/net/tcp.h         |    3 ++-
 net/ipv4/syncookies.c     |   27 ++++++++++++++-------------
 net/ipv4/tcp_input.c      |   26 ++++++++++++++++++--------
 net/ipv4/tcp_ipv4.c       |   21 ++++++++++++---------
 net/ipv4/tcp_minisocks.c  |    9 ++++++---
 net/ipv4/tcp_output.c     |   18 +++++++++++++-----
 net/ipv6/syncookies.c     |   28 +++++++++++++++-------------
 net/ipv6/tcp_ipv6.c       |    3 ++-
 10 files changed, 93 insertions(+), 56 deletions(-)


^ permalink raw reply

* [PATCHv4 2/7] Allow tcp_parse_options to consult dst entry
From: Gilad Ben-Yossef @ 2009-10-28 14:15 UTC (permalink / raw)
  To: netdev; +Cc: ori
In-Reply-To: <1256739327-11576-1-git-send-email-gilad@codefidence.com>

We need tcp_parse_options to be aware of dst_entry to
take into account per dst_entry TCP options settings

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
---
 include/net/tcp.h        |    3 ++-
 net/ipv4/syncookies.c    |   27 ++++++++++++++-------------
 net/ipv4/tcp_input.c     |    9 ++++++---
 net/ipv4/tcp_ipv4.c      |   21 ++++++++++++---------
 net/ipv4/tcp_minisocks.c |    7 +++++--
 net/ipv6/syncookies.c    |   28 +++++++++++++++-------------
 net/ipv6/tcp_ipv6.c      |    3 ++-
 7 files changed, 56 insertions(+), 42 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..740d09b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -409,7 +409,8 @@ extern int			tcp_recvmsg(struct kiocb *iocb, struct sock *sk,
 
 extern void			tcp_parse_options(struct sk_buff *skb,
 						  struct tcp_options_received *opt_rx,
-						  int estab);
+						  int estab,
+						  struct dst_entry *dst);
 
 extern u8			*tcp_parse_md5sig_option(struct tcphdr *th);
 
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index a6e0e07..4990dd4 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -276,13 +276,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV);
 
-	/* check for timestamp cookie support */
-	memset(&tcp_opt, 0, sizeof(tcp_opt));
-	tcp_parse_options(skb, &tcp_opt, 0);
-
-	if (tcp_opt.saw_tstamp)
-		cookie_check_timestamp(&tcp_opt);
-
 	ret = NULL;
 	req = inet_reqsk_alloc(&tcp_request_sock_ops); /* for safety */
 	if (!req)
@@ -298,12 +291,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 	ireq->loc_addr		= ip_hdr(skb)->daddr;
 	ireq->rmt_addr		= ip_hdr(skb)->saddr;
 	ireq->ecn_ok		= 0;
-	ireq->snd_wscale	= tcp_opt.snd_wscale;
-	ireq->rcv_wscale	= tcp_opt.rcv_wscale;
-	ireq->sack_ok		= tcp_opt.sack_ok;
-	ireq->wscale_ok		= tcp_opt.wscale_ok;
-	ireq->tstamp_ok		= tcp_opt.saw_tstamp;
-	req->ts_recent		= tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
 
 	/* We throwed the options of the initial SYN away, so we hope
 	 * the ACK carries the same options again (see RFC1122 4.2.3.8)
@@ -351,6 +338,20 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 		}
 	}
 
+	/* check for timestamp cookie support */
+	memset(&tcp_opt, 0, sizeof(tcp_opt));
+	tcp_parse_options(skb, &tcp_opt, 0, &rt->u.dst);
+
+	if (tcp_opt.saw_tstamp)
+		cookie_check_timestamp(&tcp_opt);
+
+	ireq->snd_wscale        = tcp_opt.snd_wscale;
+	ireq->rcv_wscale        = tcp_opt.rcv_wscale;
+	ireq->sack_ok           = tcp_opt.sack_ok;
+	ireq->wscale_ok         = tcp_opt.wscale_ok;
+	ireq->tstamp_ok         = tcp_opt.saw_tstamp;
+	req->ts_recent          = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
+
 	/* Try to redo what tcp_v4_send_synack did. */
 	req->window_clamp = tp->window_clamp ? :dst_metric(&rt->u.dst, RTAX_WINDOW);
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d86784b..d502f49 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3698,12 +3698,14 @@ old_ack:
  * the fast version below fails.
  */
 void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
-		       int estab)
+		       int estab,  struct dst_entry *dst)
 {
 	unsigned char *ptr;
 	struct tcphdr *th = tcp_hdr(skb);
 	int length = (th->doff * 4) - sizeof(struct tcphdr);
 
+	BUG_ON(!estab && !dst);
+
 	ptr = (unsigned char *)(th + 1);
 	opt_rx->saw_tstamp = 0;
 
@@ -3820,7 +3822,7 @@ static int tcp_fast_parse_options(struct sk_buff *skb, struct tcphdr *th,
 		if (tcp_parse_aligned_timestamp(tp, th))
 			return 1;
 	}
-	tcp_parse_options(skb, &tp->rx_opt, 1);
+	tcp_parse_options(skb, &tp->rx_opt, 1, NULL);
 	return 1;
 }
 
@@ -5364,8 +5366,9 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	int saved_clamp = tp->rx_opt.mss_clamp;
+	struct dst_entry *dst = __sk_dst_get(sk);
 
-	tcp_parse_options(skb, &tp->rx_opt, 0);
+	tcp_parse_options(skb, &tp->rx_opt, 0, dst);
 
 	if (th->ack) {
 		/* rfc793:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7cda24b..1d611e3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1256,11 +1256,21 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops;
 #endif
 
+	ireq = inet_rsk(req);
+	ireq->loc_addr = daddr;
+	ireq->rmt_addr = saddr;
+	ireq->no_srccheck = inet_sk(sk)->transparent;
+	ireq->opt = tcp_v4_save_options(sk, skb);
+
+	dst = inet_csk_route_req(sk, req);
+	if(!dst)
+		goto drop_and_free;
+
 	tcp_clear_options(&tmp_opt);
 	tmp_opt.mss_clamp = 536;
 	tmp_opt.user_mss  = tcp_sk(sk)->rx_opt.user_mss;
 
-	tcp_parse_options(skb, &tmp_opt, 0);
+	tcp_parse_options(skb, &tmp_opt, 0, dst);
 
 	if (want_cookie && !tmp_opt.saw_tstamp)
 		tcp_clear_options(&tmp_opt);
@@ -1269,14 +1279,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 
 	tcp_openreq_init(req, &tmp_opt, skb);
 
-	ireq = inet_rsk(req);
-	ireq->loc_addr = daddr;
-	ireq->rmt_addr = saddr;
-	ireq->no_srccheck = inet_sk(sk)->transparent;
-	ireq->opt = tcp_v4_save_options(sk, skb);
-
 	if (security_inet_conn_request(sk, skb, req))
-		goto drop_and_free;
+		goto drop_and_release;
 
 	if (!want_cookie)
 		TCP_ECN_create_request(req, tcp_hdr(skb));
@@ -1301,7 +1305,6 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 		 */
 		if (tmp_opt.saw_tstamp &&
 		    tcp_death_row.sysctl_tw_recycle &&
-		    (dst = inet_csk_route_req(sk, req)) != NULL &&
 		    (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
 		    peer->v4daddr == saddr) {
 			if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 8c8c6e6..8bb560d 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -102,7 +102,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 
 	if (th->doff > (sizeof(*th) >> 2) && tcptw->tw_ts_recent_stamp) {
 		tmp_opt.tstamp_ok = 1;
-		tcp_parse_options(skb, &tmp_opt, 1);
+		tcp_parse_options(skb, &tmp_opt, 1, NULL);
 
 		if (tmp_opt.saw_tstamp) {
 			tmp_opt.ts_recent	= tcptw->tw_ts_recent;
@@ -500,10 +500,11 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 	int paws_reject = 0;
 	struct tcp_options_received tmp_opt;
 	struct sock *child;
+	struct dst_entry *dst = inet_csk_route_req(sk, req);
 
 	tmp_opt.saw_tstamp = 0;
 	if (th->doff > (sizeof(struct tcphdr)>>2)) {
-		tcp_parse_options(skb, &tmp_opt, 0);
+		tcp_parse_options(skb, &tmp_opt, 0, dst);
 
 		if (tmp_opt.saw_tstamp) {
 			tmp_opt.ts_recent = req->ts_recent;
@@ -516,6 +517,8 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 		}
 	}
 
+	dst_release(dst);
+
 	/* Check for pure retransmitted SYN. */
 	if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn &&
 	    flg == TCP_FLAG_SYN &&
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 6b6ae91..6ece408 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -184,13 +184,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV);
 
-	/* check for timestamp cookie support */
-	memset(&tcp_opt, 0, sizeof(tcp_opt));
-	tcp_parse_options(skb, &tcp_opt, 0);
-
-	if (tcp_opt.saw_tstamp)
-		cookie_check_timestamp(&tcp_opt);
-
 	ret = NULL;
 	req = inet6_reqsk_alloc(&tcp6_request_sock_ops);
 	if (!req)
@@ -224,12 +217,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 	req->expires = 0UL;
 	req->retrans = 0;
 	ireq->ecn_ok		= 0;
-	ireq->snd_wscale	= tcp_opt.snd_wscale;
-	ireq->rcv_wscale	= tcp_opt.rcv_wscale;
-	ireq->sack_ok		= tcp_opt.sack_ok;
-	ireq->wscale_ok		= tcp_opt.wscale_ok;
-	ireq->tstamp_ok		= tcp_opt.saw_tstamp;
-	req->ts_recent		= tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
 	treq->rcv_isn = ntohl(th->seq) - 1;
 	treq->snt_isn = cookie;
 
@@ -264,6 +251,21 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 			goto out_free;
 	}
 
+	/* check for timestamp cookie support */
+	memset(&tcp_opt, 0, sizeof(tcp_opt));
+	tcp_parse_options(skb, &tcp_opt, 0, dst);
+
+	if (tcp_opt.saw_tstamp)
+		cookie_check_timestamp(&tcp_opt);
+
+	req->ts_recent          = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
+
+	ireq->snd_wscale        = tcp_opt.snd_wscale;
+	ireq->rcv_wscale        = tcp_opt.rcv_wscale;
+	ireq->sack_ok           = tcp_opt.sack_ok;
+	ireq->wscale_ok         = tcp_opt.wscale_ok;
+	ireq->tstamp_ok         = tcp_opt.saw_tstamp;
+
 	req->window_clamp = tp->window_clamp ? :dst_metric(dst, RTAX_WINDOW);
 	tcp_select_initial_window(tcp_full_space(sk), req->mss,
 				  &req->rcv_wnd, &req->window_clamp,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 21d100b..2eebab5 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1165,6 +1165,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct request_sock *req = NULL;
 	__u32 isn = TCP_SKB_CB(skb)->when;
+	struct dst_entry *dst = __sk_dst_get(sk);
 #ifdef CONFIG_SYN_COOKIES
 	int want_cookie = 0;
 #else
@@ -1203,7 +1204,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	tmp_opt.mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr);
 	tmp_opt.user_mss = tp->rx_opt.user_mss;
 
-	tcp_parse_options(skb, &tmp_opt, 0);
+	tcp_parse_options(skb, &tmp_opt, 0, dst);
 
 	if (want_cookie && !tmp_opt.saw_tstamp)
 		tcp_clear_options(&tmp_opt);
-- 
1.5.6.3


^ permalink raw reply related

* [PATCHv4 6/7]  Allow to turn off TCP window scale opt per route
From: Gilad Ben-Yossef @ 2009-10-28 14:15 UTC (permalink / raw)
  To: netdev; +Cc: ori
In-Reply-To: <1256739327-11576-1-git-send-email-gilad@codefidence.com>

Add and use no window scale bit in the features field.

Note that this is not the same as setting a window scale of 0
as would happen with window limit on route.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
---
 include/linux/rtnetlink.h |    1 +
 net/ipv4/tcp_input.c      |    3 ++-
 net/ipv4/tcp_output.c     |    6 ++++--
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 2ab8c75..6784b34 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -380,6 +380,7 @@ enum
 #define RTAX_FEATURE_NO_SACK	0x00000002
 #define RTAX_FEATURE_NO_TSTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
+#define RTAX_FEATURE_NO_WSCALE	0x00000010
 
 struct rta_session
 {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d2f9742..4f5e914 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3739,7 +3739,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
 				break;
 			case TCPOPT_WINDOW:
 				if (opsize == TCPOLEN_WINDOW && th->syn &&
-				    !estab && sysctl_tcp_window_scaling) {
+				    !estab && sysctl_tcp_window_scaling &&
+				    !dst_feature(dst, RTAX_FEATURE_NO_WSCALE)) {
 					__u8 snd_wscale = *(__u8 *)ptr;
 					opt_rx->wscale_ok = 1;
 					if (snd_wscale > 14) {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8f30c18..ff60a21 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -496,7 +496,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 		opts->tsecr = tp->rx_opt.ts_recent;
 		size += TCPOLEN_TSTAMP_ALIGNED;
 	}
-	if (likely(sysctl_tcp_window_scaling)) {
+	if (likely(sysctl_tcp_window_scaling &&
+		   !dst_feature(dst, RTAX_FEATURE_NO_WSCALE))) {
 		opts->ws = tp->rx_opt.rcv_wscale;
 		opts->options |= OPTION_WSCALE;
 		size += TCPOLEN_WSCALE_ALIGNED;
@@ -2347,7 +2348,8 @@ static void tcp_connect_init(struct sock *sk)
 				  tp->advmss - (tp->rx_opt.ts_recent_stamp ? tp->tcp_header_len - sizeof(struct tcphdr) : 0),
 				  &tp->rcv_wnd,
 				  &tp->window_clamp,
-				  sysctl_tcp_window_scaling,
+				  (sysctl_tcp_window_scaling &&
+				   !dst_feature(dst, RTAX_FEATURE_NO_WSCALE)),
 				  &rcv_wscale);
 
 	tp->rx_opt.rcv_wscale = rcv_wscale;
-- 
1.5.6.3


^ permalink raw reply related

* [PATCHv4 4/7] Add the no SACK route option feature
From: Gilad Ben-Yossef @ 2009-10-28 14:15 UTC (permalink / raw)
  To: netdev; +Cc: ori
In-Reply-To: <1256739327-11576-1-git-send-email-gilad@codefidence.com>

Implement querying and acting upon the no sack bit in the features
field.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
---
 include/linux/rtnetlink.h |    2 +-
 net/ipv4/tcp_input.c      |    3 ++-
 net/ipv4/tcp_output.c     |    4 +++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index adf2068..9c802a6 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -377,7 +377,7 @@ enum
 #define RTAX_MAX (__RTAX_MAX - 1)
 
 #define RTAX_FEATURE_ECN	0x00000001
-#define RTAX_FEATURE_SACK	0x00000002
+#define RTAX_FEATURE_NO_SACK	0x00000002
 #define RTAX_FEATURE_TIMESTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d502f49..b14f780 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3763,7 +3763,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
 				break;
 			case TCPOPT_SACK_PERM:
 				if (opsize == TCPOLEN_SACK_PERM && th->syn &&
-				    !estab && sysctl_tcp_sack) {
+				    !estab && sysctl_tcp_sack &&
+				    !dst_feature(dst, RTAX_FEATURE_NO_SACK)) {
 					opt_rx->sack_ok = 1;
 					tcp_sack_reset(opt_rx);
 				}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fcd278a..64db8dd 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -464,6 +464,7 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 				struct tcp_md5sig_key **md5) {
 	struct tcp_sock *tp = tcp_sk(sk);
 	unsigned size = 0;
+	struct dst_entry *dst = __sk_dst_get(sk);
 
 #ifdef CONFIG_TCP_MD5SIG
 	*md5 = tp->af_specific->md5_lookup(sk, sk);
@@ -498,7 +499,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 		opts->options |= OPTION_WSCALE;
 		size += TCPOLEN_WSCALE_ALIGNED;
 	}
-	if (likely(sysctl_tcp_sack)) {
+	if (likely(sysctl_tcp_sack &&
+		   !dst_feature(dst, RTAX_FEATURE_NO_SACK))) {
 		opts->options |= OPTION_SACK_ADVERTISE;
 		if (unlikely(!(OPTION_TS & opts->options)))
 			size += TCPOLEN_SACKPERM_ALIGNED;
-- 
1.5.6.3


^ permalink raw reply related

* [PATCHv4 3/7] Add dst_feature to query route entry features
From: Gilad Ben-Yossef @ 2009-10-28 14:15 UTC (permalink / raw)
  To: netdev; +Cc: ori
In-Reply-To: <1256739327-11576-1-git-send-email-gilad@codefidence.com>

Adding an accessor to existing  dst_entry feautres field and
refactor the only supported feature (allfrag) to use it.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
---
 include/net/dst.h |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 5a900dd..b562be3 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -111,6 +111,12 @@ dst_metric(const struct dst_entry *dst, int metric)
 	return dst->metrics[metric-1];
 }
 
+static inline u32
+dst_feature(const struct dst_entry *dst, u32 feature)
+{
+	return dst_metric(dst, RTAX_FEATURES) & feature;
+}
+
 static inline u32 dst_mtu(const struct dst_entry *dst)
 {
 	u32 mtu = dst_metric(dst, RTAX_MTU);
@@ -136,7 +142,7 @@ static inline void set_dst_metric_rtt(struct dst_entry *dst, int metric,
 static inline u32
 dst_allfrag(const struct dst_entry *dst)
 {
-	int ret = dst_metric(dst, RTAX_FEATURES) & RTAX_FEATURE_ALLFRAG;
+	int ret = dst_feature(dst,  RTAX_FEATURE_ALLFRAG);
 	/* Yes, _exactly_. This is paranoia. */
 	barrier();
 	return ret;
-- 
1.5.6.3


^ permalink raw reply related

* [PATCHv4 5/7] Allow disabling TCP timestamp options per route
From: Gilad Ben-Yossef @ 2009-10-28 14:15 UTC (permalink / raw)
  To: netdev; +Cc: ori
In-Reply-To: <1256739327-11576-1-git-send-email-gilad@codefidence.com>

Implement querying and acting upon the no timestamp bit in the feature
field.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
---
 include/linux/rtnetlink.h |    2 +-
 net/ipv4/tcp_input.c      |    3 ++-
 net/ipv4/tcp_output.c     |    8 ++++++--
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 9c802a6..2ab8c75 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -378,7 +378,7 @@ enum
 
 #define RTAX_FEATURE_ECN	0x00000001
 #define RTAX_FEATURE_NO_SACK	0x00000002
-#define RTAX_FEATURE_TIMESTAMP	0x00000004
+#define RTAX_FEATURE_NO_TSTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
 
 struct rta_session
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b14f780..d2f9742 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3755,7 +3755,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
 			case TCPOPT_TIMESTAMP:
 				if ((opsize == TCPOLEN_TIMESTAMP) &&
 				    ((estab && opt_rx->tstamp_ok) ||
-				     (!estab && sysctl_tcp_timestamps))) {
+				     (!estab && sysctl_tcp_timestamps &&
+				      !dst_feature(dst, RTAX_FEATURE_NO_TSTAMP)))) {
 					opt_rx->saw_tstamp = 1;
 					opt_rx->rcv_tsval = get_unaligned_be32(ptr);
 					opt_rx->rcv_tsecr = get_unaligned_be32(ptr + 4);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 64db8dd..8f30c18 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -488,7 +488,9 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 	opts->mss = tcp_advertise_mss(sk);
 	size += TCPOLEN_MSS_ALIGNED;
 
-	if (likely(sysctl_tcp_timestamps && *md5 == NULL)) {
+	if (likely(sysctl_tcp_timestamps &&
+		   !dst_feature(dst, RTAX_FEATURE_NO_TSTAMP) &&
+		   *md5 == NULL)) {
 		opts->options |= OPTION_TS;
 		opts->tsval = TCP_SKB_CB(skb)->when;
 		opts->tsecr = tp->rx_opt.ts_recent;
@@ -2317,7 +2319,9 @@ static void tcp_connect_init(struct sock *sk)
 	 * See tcp_input.c:tcp_rcv_state_process case TCP_SYN_SENT.
 	 */
 	tp->tcp_header_len = sizeof(struct tcphdr) +
-		(sysctl_tcp_timestamps ? TCPOLEN_TSTAMP_ALIGNED : 0);
+		(sysctl_tcp_timestamps &&
+		(!dst_feature(dst, RTAX_FEATURE_NO_TSTAMP) ?
+		  TCPOLEN_TSTAMP_ALIGNED : 0));
 
 #ifdef CONFIG_TCP_MD5SIG
 	if (tp->af_specific->md5_lookup(sk, sk) != NULL)
-- 
1.5.6.3


^ permalink raw reply related

* [PATCHv4 7/7] Allow disabling of DSACK TCP option per route
From: Gilad Ben-Yossef @ 2009-10-28 14:15 UTC (permalink / raw)
  To: netdev; +Cc: ori
In-Reply-To: <1256739327-11576-1-git-send-email-gilad@codefidence.com>

Add and use no DSCAK bit in the features field.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
---
 include/linux/rtnetlink.h |    1 +
 net/ipv4/tcp_input.c      |    8 ++++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 6784b34..e78b60c 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -381,6 +381,7 @@ enum
 #define RTAX_FEATURE_NO_TSTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
 #define RTAX_FEATURE_NO_WSCALE	0x00000010
+#define RTAX_FEATURE_NO_DSACK	0x00000020
 
 struct rta_session
 {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4f5e914..4262da5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4080,8 +4080,10 @@ static inline int tcp_sack_extend(struct tcp_sack_block *sp, u32 seq,
 static void tcp_dsack_set(struct sock *sk, u32 seq, u32 end_seq)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	struct dst_entry *dst = __sk_dst_get(sk);
 
-	if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
+	if (tcp_is_sack(tp) && sysctl_tcp_dsack &&
+	    !dst_feature(dst, RTAX_FEATURE_NO_DSACK)) {
 		int mib_idx;
 
 		if (before(seq, tp->rcv_nxt))
@@ -4110,13 +4112,15 @@ static void tcp_dsack_extend(struct sock *sk, u32 seq, u32 end_seq)
 static void tcp_send_dupack(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	struct dst_entry *dst = __sk_dst_get(sk);
 
 	if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
 	    before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
 		tcp_enter_quickack_mode(sk);
 
-		if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
+		if (tcp_is_sack(tp) && sysctl_tcp_dsack &&
+		    !dst_feature(dst, RTAX_FEATURE_NO_DSACK)) {
 			u32 end_seq = TCP_SKB_CB(skb)->end_seq;
 
 			if (after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt))
-- 
1.5.6.3


^ permalink raw reply related

* [PATCHv4 1/7] Only parse time stamp TCP option in time wait sock
From: Gilad Ben-Yossef @ 2009-10-28 14:15 UTC (permalink / raw)
  To: netdev; +Cc: ori
In-Reply-To: <1256739327-11576-1-git-send-email-gilad@codefidence.com>

Since we only use tcp_parse_options here to check for the exietence
of TCP timestamp option in the header, it is better to call with
the "established" flag on.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>
---
 net/ipv4/tcp_minisocks.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 624c3c9..8c8c6e6 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -100,9 +100,9 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 	struct tcp_options_received tmp_opt;
 	int paws_reject = 0;
 
-	tmp_opt.saw_tstamp = 0;
 	if (th->doff > (sizeof(*th) >> 2) && tcptw->tw_ts_recent_stamp) {
-		tcp_parse_options(skb, &tmp_opt, 0);
+		tmp_opt.tstamp_ok = 1;
+		tcp_parse_options(skb, &tmp_opt, 1);
 
 		if (tmp_opt.saw_tstamp) {
 			tmp_opt.ts_recent	= tcptw->tw_ts_recent;
-- 
1.5.6.3


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox