Netdev List


* Re: linux-next: Tree for Jun 18 (netfilter nfconntrack)
From: Pablo Neira Ayuso @ 2012-06-19  3:19 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Stephen Rothwell, linux-next, LKML, netdev, netfilter-devel,
	coreteam
In-Reply-To: <4FDF65F6.6010002@xenotime.net>

On Mon, Jun 18, 2012 at 10:31:34AM -0700, Randy Dunlap wrote:
> On 06/17/2012 11:53 PM, Stephen Rothwell wrote:
> 
> > Hi all,
> > 
> > Changes since 20120615:
> 
> 
> 
> on i386 or x86_64:
> 
> # CONFIG_NF_CONNTRACK is not set
> 
>   CC [M]  net/netfilter/nfnetlink_cthelper.o
> In file included from include/net/netfilter/nf_conntrack_helper.h:12:0,
>                  from net/netfilter/nfnetlink_cthelper.c:23:
> include/net/netfilter/nf_conntrack.h:77:22: error: field 'ct_general' has incomplete type
> include/net/netfilter/nf_conntrack.h: In function 'nf_ct_get':
> include/net/netfilter/nf_conntrack.h:157:30: error: 'const struct sk_buff' has no member named 'nfct'
> include/net/netfilter/nf_conntrack.h: In function 'nf_ct_put':
> include/net/netfilter/nf_conntrack.h:164:2: error: implicit declaration of function 'nf_conntrack_put'
> make[3]: *** [net/netfilter/nfnetlink_cthelper.o] Error 1

I've sent a patch to David to solve this:

 netfilter: fix compilation of the nfnl_cthelper if NF_CONNTRACK is unset

It seems to resolve the issue for me here.

Thanks for the report.

^ permalink raw reply

* Re: [PATCH 0/4] netfilter updates for net-next (batch 3)
From: David Miller @ 2012-06-19  3:28 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1340075789-6196-1-git-send-email-pablo@netfilter.org>

From: pablo@netfilter.org
Date: Tue, 19 Jun 2012 05:16:25 +0200

> The patches provide:
> 
> * compilation fixes if CONFIG_NF_CONNTRACK is disabled: I moved all the
>   conntrack code from nfnetlink_queue.c to nfnetlink_queue_ct.c to avoid
>   peppering the entire code with lots of ifdefs. I needed to rename
>   nfnetlink_queue.c to nfnetlink_queue_core.c to get it working with the
>   Makefile tweaks I've added.
> 
> * fix NULL pointer dereference via ctnetlink while trying to change the helper
>   for an existing conntrack entry. I don't find any reasonable use case for
>   changing the helper from one to another in run-time. Thus, now ctnetlink
>   returns -EOPNOTSUPP for this operation.
> 
> * fix possible out-of-bound zeroing of the conntrack extension area due to
>   the helper automatic assignation routine.
> 
> You can pull these changes from:
> 
> git://1984.lsi.us.es/nf-next master

Pulled, thanks.

^ permalink raw reply

* Re: pull request: batman-adv 2012-06-18
From: David Miller @ 2012-06-19  3:28 UTC (permalink / raw)
  To: ordex; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <1340051963-14836-1-git-send-email-ordex@autistici.org>

From: Antonio Quartulli <ordex@autistici.org>
Date: Mon, 18 Jun 2012 22:39:04 +0200

> Hello David,
> 	here is our first set of changes intended for net-next/linux-3.6.
> 
> Patch 2 fixes a major bug in the TranslationTable code where the old value of
> skb->data is used for memory access even if the data was relocated.
> Patches 4, 10, 11, 13, 14 are endianness-related cleanups that we wrote thanks
> to Al Viro's advice and help.
> Thanks to Martin Hundebøll batman-adv now supports the ethtool API and can
> export several counters that are specific to our module (patch 5).
> Then patch 16 improves the routing protocol API by making part of the
> TranslationTable code routing agnostic.
> The rest are minor fixes and other cleanups.

Pulled, thanks.

^ permalink raw reply

* Re: [PATCH 0/4] netfilter updates for net-next (batch 3)
From: Pablo Neira Ayuso @ 2012-06-19  3:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1340075789-6196-1-git-send-email-pablo@netfilter.org>

[-- Attachment #1: Type: text/plain, Size: 333 bytes --]

On Tue, Jun 19, 2012 at 05:16:25AM +0200, pablo@netfilter.org wrote:
[...]
> You can pull these changes from:
> 
> git://1984.lsi.us.es/nf-next master

Please also take the small patch attached after this 4-patch series. It
fixes one linking issue.

Sorry, I'll take more care next time to test compilation options more
extensively.

[-- Attachment #2: 0001-netfilter-fix-missing-symbols-if-CONFIG_NETFILTER_NE.patch --]
[-- Type: text/x-diff, Size: 1358 bytes --]

From af6b248c22759fb7448668bbe495f1cbe0a9109d Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 19 Jun 2012 05:25:46 +0200
Subject: [PATCH] netfilter: fix missing symbols if
 CONFIG_NETFILTER_NETLINK_QUEUE_CT unset

ERROR: "nfqnl_ct_parse" [net/netfilter/nfnetlink_queue.ko] undefined!
ERROR: "nfqnl_ct_seq_adjust" [net/netfilter/nfnetlink_queue.ko] undefined!
ERROR: "nfqnl_ct_put" [net/netfilter/nfnetlink_queue.ko] undefined!
ERROR: "nfqnl_ct_get" [net/netfilter/nfnetlink_queue.ko] undefined!

We have to use CONFIG_NETFILTER_NETLINK_QUEUE_CT in
include/net/netfilter/nfnetlink_queue.h, not CONFIG_NF_CONNTRACK.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nfnetlink_queue.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/netfilter/nfnetlink_queue.h b/include/net/netfilter/nfnetlink_queue.h
index 9f8095c..86267a5 100644
--- a/include/net/netfilter/nfnetlink_queue.h
+++ b/include/net/netfilter/nfnetlink_queue.h
@@ -5,7 +5,7 @@
 
 struct nf_conn;
 
-#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+#ifdef CONFIG_NETFILTER_NETLINK_QUEUE_CT
 struct nf_conn *nfqnl_ct_get(struct sk_buff *entskb, size_t *size,
 			     enum ip_conntrack_info *ctinfo);
 struct nf_conn *nfqnl_ct_parse(const struct sk_buff *skb,
-- 
1.7.10


^ permalink raw reply related
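The guard change in the patch above can be illustrated with a minimal userspace sketch. It assumes the usual header pattern: prototypes are exposed only under the same symbol that causes the implementing object to be built, and callers get an inline stub otherwise. DEMO_QUEUE_CT, struct demo_conn and the demo_* functions are hypothetical names for illustration, and the #else stub is an assumption, since the quoted diff shows only the #if line.

```c
#include <stddef.h>

/*
 * DEMO_QUEUE_CT is a hypothetical stand-in for a Kconfig symbol such as
 * CONFIG_NETFILTER_NETLINK_QUEUE_CT. It is left undefined here, so the
 * inline stub below is what callers see.
 */
/* #define DEMO_QUEUE_CT 1 */

struct demo_conn {
	int id;
};

#ifdef DEMO_QUEUE_CT
/*
 * Real implementation, expected to live in a separate object file that is
 * only built when DEMO_QUEUE_CT is set. Exposing this prototype under a
 * different symbol than the one that builds that object is what leads to
 * "undefined" errors at link time.
 */
struct demo_conn *demo_ct_get(struct demo_conn *table, size_t n, int id);
#else
/* Stub used when the feature is compiled out: there is never an entry. */
static inline struct demo_conn *demo_ct_get(struct demo_conn *table,
					    size_t n, int id)
{
	(void)table;
	(void)n;
	(void)id;
	return NULL;
}
#endif

/* Core code calls the helper unconditionally, with no ifdefs of its own. */
int demo_core_has_ct(struct demo_conn *table, size_t n, int id)
{
	return demo_ct_get(table, n, id) != NULL;
}
```

With DEMO_QUEUE_CT left undefined, demo_core_has_ct() always returns 0 and the file links on its own.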

* Re: [PATCH] ipv4: Early TCP socket demux.
From: Stephen Hemminger @ 2012-06-19  4:03 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120618.194016.2282814982594761206.davem@davemloft.net>

> 
> You know you want it.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

David, I understand it, Eric understands it, and maybe one or
two others. But on the principle of what is "good for the goose
is good for the gander", you really need to provide a reasonable
change log entry. Just because you are the network maintainer
doesn't mean you get to skip all the documented rules about submitting
patches.

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: Eric Dumazet @ 2012-06-19  4:07 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120618.194016.2282814982594761206.davem@davemloft.net>

On Mon, 2012-06-18 at 19:40 -0700, David Miller wrote:
> You know you want it.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

Yeah, very good idea David ;)

needs some polishing of course.

This reminds me of the idea of having a separate dst per TCP socket, to
remove the dst refcnt contention as well.

^ permalink raw reply

* Re: [PATCH 0/4] netfilter updates for net-next (batch 3)
From: David Miller @ 2012-06-19  4:09 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <20120619033745.GA31405@1984>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 19 Jun 2012 05:37:45 +0200

> On Tue, Jun 19, 2012 at 05:16:25AM +0200, pablo@netfilter.org wrote:
> [...]
>> You can pull these changes from:
>> 
>> git://1984.lsi.us.es/nf-next master
> 
> Please also take the small patch attached after this 4-patch series. It
> fixes one linking issue.
> 
> Sorry, I'll take more care next time to test compilation options more
> extensively.

Done.

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: David Miller @ 2012-06-19  4:10 UTC (permalink / raw)
  To: stephen.hemminger; +Cc: netdev
In-Reply-To: <a22edf13-1bf5-49fb-8ebe-05054f68c129@tahiti.vyatta.com>

From: Stephen Hemminger <stephen.hemminger@vyatta.com>
Date: Mon, 18 Jun 2012 21:03:26 -0700 (PDT)

> David, I understand it, Eric understands it, and maybe one or
> two others. But on the principle of what is "good for the goose
> is good for the gander", you really need to provide a reasonable
> change log entry. Just because you are the network maintainer
> doesn't mean you get to skip all the documented rules about submitting
> patches.

This wasn't going to be the final commit log entry.

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: Changli Gao @ 2012-06-19  4:13 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <a22edf13-1bf5-49fb-8ebe-05054f68c129@tahiti.vyatta.com>

On Tue, Jun 19, 2012 at 12:03 PM, Stephen Hemminger
<stephen.hemminger@vyatta.com> wrote:
>
>>
>> You know you want it.
>>
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> David, I understand it, Eric understands it, and maybe one or
> two others. But on the principle of what is "good for the goose
> is good for the gander", you really need to provide a reasonable
> change log entry. Just because you are the network maintainer
> doesn't mean you get to skip all the documented rules about submitting
> patches.

Agree. Thanks.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: David Miller @ 2012-06-19  4:15 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1340078846.7491.2127.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 19 Jun 2012 06:07:26 +0200

> On Mon, 2012-06-18 at 19:40 -0700, David Miller wrote:
>> You know you want it.
>> 
>> Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> Yeah, very good idea David ;)
> 
> needs some polishing of course.

Such as?  IPv6 support?

> This reminds me of the idea of having a separate dst per TCP socket, to
> remove the dst refcnt contention as well.

I'm leery of this.

We're going to move towards having dst entries more strongly shared
as we move to remove the routing cache.

In fact I have another short-term change planned that adjusts which
keys the routing cache uses based upon what kinds of keys are actually
active in the current FIB rule configuration.

I think we want to encourage sharing and make the route footprint
smaller rather than expanding its size artificially on even the
socket level.

If you really care about this refcount problem, then it's another
reason to never orphan socket sourced packets.  Then we wouldn't need
to ever refcount the route just to send a packet, we'd just use the
implicit reference held by the socket instead.  Socket route releasing
would be held back by the presence of any packets in the socket send
queue.  If we have to reset the dst mid-lifetime due to route flushes,
we'd need to use a specific packet as a sequence point.

That to me sounds like a more reasonable approach than just making
more and more routes.

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: David Miller @ 2012-06-19  4:16 UTC (permalink / raw)
  To: xiaosuo; +Cc: stephen.hemminger, netdev
In-Reply-To: <CABa6K_E+EzeJQF_m4aoq1WAYUwWQxd8_=k7HT0XZ6P5EpBR+4g@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Tue, 19 Jun 2012 12:13:48 +0800

> On Tue, Jun 19, 2012 at 12:03 PM, Stephen Hemminger
> <stephen.hemminger@vyatta.com> wrote:
>>
>>>
>>> You know you want it.
>>>
>>> Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>> David, I understand it, Eric understands it, and maybe one or
>> two others. But on the principle of what is "good for the goose
>> is good for the gander", you really need to provide a reasonable
>> change log entry. Just because you are the network maintainer
>> doesn't mean you get to skip all the documented rules about submitting
>> patches.
> 
> Agree. Thanks.

That's the last time I try to be even slightly humorous on this
list.

Thanks for killing the fun Stephen.

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: Eric Dumazet @ 2012-06-19  4:23 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120618.211539.105938285016510975.davem@davemloft.net>

On Mon, 2012-06-18 at 21:15 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 19 Jun 2012 06:07:26 +0200
> 
> > On Mon, 2012-06-18 at 19:40 -0700, David Miller wrote:
> >> You know you want it.
> >> 
> >> Signed-off-by: David S. Miller <davem@davemloft.net>
> > 
> > Yeah, very good idea David ;)
> > 
> > needs some polishing of course.
> 
> Such as?  IPv6 support?
> 

I was referring to socket leak in :

+       if (sk) {
+               skb->sk = sk;
+               skb->destructor = sock_edemux;
+               if (sk->sk_state != TCP_TIME_WAIT) {
+                       struct dst_entry *dst = sk->sk_rx_dst;
+                       if (dst) {
+                               skb_dst_set_noref(skb, dst);
+                               err = 0;
+                       }
+               }
+       }

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: Eric Dumazet @ 2012-06-19  4:25 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1340079813.7491.2164.camel@edumazet-glaptop>

On Tue, 2012-06-19 at 06:23 +0200, Eric Dumazet wrote:
> On Mon, 2012-06-18 at 21:15 -0700, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Tue, 19 Jun 2012 06:07:26 +0200
> > 
> > > On Mon, 2012-06-18 at 19:40 -0700, David Miller wrote:
> > >> You know you want it.
> > >> 
> > >> Signed-off-by: David S. Miller <davem@davemloft.net>
> > > 
> > > Yeah, very good idea David ;)
> > > 
> > > needs some polishing of course.
> > 
> > Such as?  IPv6 support?
> > 
> 
> I was referring to socket leak in :
> 
> +       if (sk) {
> +               skb->sk = sk;
> +               skb->destructor = sock_edemux;
> +               if (sk->sk_state != TCP_TIME_WAIT) {
> +                       struct dst_entry *dst = sk->sk_rx_dst;
> +                       if (dst) {
> +                               skb_dst_set_noref(skb, dst);
> +                               err = 0;
> +                       }
> +               }
> +       }
> 
> 

It's not a leak, but seems strange to keep it around if we don't use it
yet.

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: David Miller @ 2012-06-19  4:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1340079813.7491.2164.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 19 Jun 2012 06:23:33 +0200

> I was referring to socket leak in :
> 
> +       if (sk) {
> +               skb->sk = sk;
> +               skb->destructor = sock_edemux;
> +               if (sk->sk_state != TCP_TIME_WAIT) {
> +                       struct dst_entry *dst = sk->sk_rx_dst;
> +                       if (dst) {
> +                               skb_dst_set_noref(skb, dst);
> +                               err = 0;
> +                       }
> +               }
> +       }
> 

I see no leak.

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: David Miller @ 2012-06-19  4:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1340079938.7491.2172.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 19 Jun 2012 06:25:38 +0200

> On Tue, 2012-06-19 at 06:23 +0200, Eric Dumazet wrote:
>> On Mon, 2012-06-18 at 21:15 -0700, David Miller wrote:
>> > From: Eric Dumazet <eric.dumazet@gmail.com>
>> > Date: Tue, 19 Jun 2012 06:07:26 +0200
>> > 
>> > > On Mon, 2012-06-18 at 19:40 -0700, David Miller wrote:
>> > >> You know you want it.
>> > >> 
>> > >> Signed-off-by: David S. Miller <davem@davemloft.net>
>> > > 
>> > > Yeah, very good idea David ;)
>> > > 
>> > > needs some polishing of course.
>> > 
>> > Such as?  IPv6 support?
>> > 
>> 
>> I was referring to socket leak in :
>> 
>> +       if (sk) {
>> +               skb->sk = sk;
>> +               skb->destructor = sock_edemux;
>> +               if (sk->sk_state != TCP_TIME_WAIT) {
>> +                       struct dst_entry *dst = sk->sk_rx_dst;
>> +                       if (dst) {
>> +                               skb_dst_set_noref(skb, dst);
>> +                               err = 0;
>> +                       }
>> +               }
>> +       }
>> 
>> 
> 
> It's not a leak, but seems strange to keep it around if we don't use it
> yet.

How are we not using it?  We use the cached SKB socket no matter what
happens.

Look at how inet hash lookup works.

The error tells the caller solely whether a route lookup is still
necessary.

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: Eric Dumazet @ 2012-06-19  4:31 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120618.211539.105938285016510975.davem@davemloft.net>

On Mon, 2012-06-18 at 21:15 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>

> > This reminds me of the idea of having a separate dst per TCP socket, to
> > remove the dst refcnt contention as well.
> 
> I'm leery of this.
> 
> We're going to move towards having dst entries more strongly shared
> as we move to remove the routing cache.
> 
> In fact I have another short-term change planned that adjusts which
> keys the routing cache uses based upon what kinds of keys are actually
> active in the current FIB rule configuration.
> 
> I think we want to encourage sharing and make the route footprint
> smaller rather than expanding its size artificially on even the
> socket level.
> 
> If you really care about this refcount problem, then it's another
> reason to never orphan socket sourced packets.  Then we wouldn't need
> to ever refcount the route just to send a packet, we'd just use the
> implicit reference held by the socket instead.  Socket route releasing
> would be held back by the presence of any packets in the socket send
> queue.  If we have to reset the dst mid-lifetime due to route flushes,
> we'd need to use a specific packet as a sequence point.
> 
> That to me sounds like a more reasonable approach than just making
> more and more routes.

We already don't touch dst refcnt on TCP xmit path, unless packet is
parked in Qdisc queue...

But with BQL this is becoming less effective.

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: Eric Dumazet @ 2012-06-19  4:37 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120618.212721.2148947025741866390.davem@davemloft.net>

On Mon, 2012-06-18 at 21:27 -0700, David Miller wrote:

> How are we not using it?  We use the cached SKB socket no matter what
> happens.
> 
> Look at how inet hash lookup works.
> 
> The error tells the caller solely whether a route lookup is still
> necessary.

OK, remove the unlikely() in __inet_lookup_skb() so that it's obvious we
have this skb_steal_sock() thing :)

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: Changli Gao @ 2012-06-19  4:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tom Herbert
In-Reply-To: <20120618.194016.2282814982594761206.davem@davemloft.net>

On Tue, Jun 19, 2012 at 10:40 AM, David Miller <davem@davemloft.net> wrote:
> @@ -324,19 +324,34 @@ static int ip_rcv_finish(struct sk_buff *skb)
>         *      how the packet travels inside Linux networking.
>         */
>        if (skb_dst(skb) == NULL) {
> -               int err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
> -                                              iph->tos, skb->dev);
> -               if (unlikely(err)) {
> -                       if (err == -EHOSTUNREACH)
> -                               IP_INC_STATS_BH(dev_net(skb->dev),
> -                                               IPSTATS_MIB_INADDRERRORS);
> -                       else if (err == -ENETUNREACH)
> -                               IP_INC_STATS_BH(dev_net(skb->dev),
> -                                               IPSTATS_MIB_INNOROUTES);
> -                       else if (err == -EXDEV)
> -                               NET_INC_STATS_BH(dev_net(skb->dev),
> -                                                LINUX_MIB_IPRPFILTER);
> -                       goto drop;
> +               const struct net_protocol *ipprot;
> +               int protocol = iph->protocol;
> +               int hash, err;
> +
> +               hash = protocol & (MAX_INET_PROTOS - 1);
> +
> +               rcu_read_lock();
> +               ipprot = rcu_dereference(inet_protos[hash]);
> +               err = -ENOENT;
> +               if (ipprot && ipprot->early_demux)
> +                       err = ipprot->early_demux(skb);

I am afraid that this lookup will hurt the performance of the
forwarding path. A knob?

If this approach is acceptable, maybe we can use sockets to do finer RFS.

Thanks.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply
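The dispatch in the quoted hunk can be sketched in userspace C. This is only an illustration under assumptions: struct demo_pkt, struct demo_proto, DEMO_MAX_PROTOS and the demo_* functions are hypothetical, RCU is left out entirely, and the fallback to a route lookup on a non-zero return follows David's description elsewhere in the thread (the error only tells the caller whether a route lookup is still necessary) rather than the truncated diff.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_PROTOS 256	/* hypothetical table size, a power of two */

/* Hypothetical packet: only the fields this sketch needs. */
struct demo_pkt {
	int protocol;		/* IP protocol number, e.g. 6 for TCP */
	int has_route;		/* non-zero once a route is attached */
	int route_lookups;	/* how many full route lookups were done */
};

/* Per-protocol handler with an optional early demux hook. */
struct demo_proto {
	int (*early_demux)(struct demo_pkt *pkt);
};

static const struct demo_proto *demo_protos[DEMO_MAX_PROTOS];

/* Pretend a socket with a cached input route was found for the packet. */
static int demo_tcp_early_demux(struct demo_pkt *pkt)
{
	pkt->has_route = 1;
	return 0;
}

static const struct demo_proto demo_tcp = {
	.early_demux = demo_tcp_early_demux,
};

void demo_register_tcp(void)
{
	demo_protos[6] = &demo_tcp;
}

/* Returns the number of full route lookups performed for this packet. */
int demo_rcv_finish(struct demo_pkt *pkt)
{
	const struct demo_proto *p;
	int err = -ENOENT;

	if (!pkt->has_route) {
		p = demo_protos[pkt->protocol & (DEMO_MAX_PROTOS - 1)];
		if (p && p->early_demux)
			err = p->early_demux(pkt);
		if (err) {
			/* No cached route: fall back to the full lookup. */
			pkt->route_lookups++;
			pkt->has_route = 1;
		}
	}
	return pkt->route_lookups;
}
```

In this sketch a TCP packet skips the route lookup once demo_register_tcp() has run, while a protocol with no early_demux hook pays for one table load and two NULL checks before the normal lookup. That extra work on every received packet, including forwarded ones, is the cost being asked about above.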

* Re: [PATCH] ipv4: Early TCP socket demux.
From: Eric Dumazet @ 2012-06-19  4:47 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, netdev, Tom Herbert
In-Reply-To: <CABa6K_Ge-Y1ne3iBcN1HSRN+G4bW6hSAHYFGKuA5QKW6CT3grQ@mail.gmail.com>

On Tue, 2012-06-19 at 12:43 +0800, Changli Gao wrote:

> I am afraid that this lookup will hurt the performance of the
> forwarding path. A knob?
> 

ip_rcv() & ip_rcv_finish() in the forwarding path ?

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: Changli Gao @ 2012-06-19  4:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Tom Herbert
In-Reply-To: <1340081276.7491.2228.camel@edumazet-glaptop>

On Tue, Jun 19, 2012 at 12:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2012-06-19 at 12:43 +0800, Changli Gao wrote:
>
>> I am afraid that this lookup will hurt the performance of the
>> forwarding path. A knob?
>>
>
> ip_rcv() & ip_rcv_finish() in the forwarding path ?
>
>

Yes, the two routines are shared by both. I think you mean
ip_local_deliver() and ip_local_deliver_finish().

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [RFC] TCP:  Support configurable delayed-ack parameters.
From: Eric Dumazet @ 2012-06-19  5:11 UTC (permalink / raw)
  To: greearb; +Cc: netdev, Daniel Baluta
In-Reply-To: <1340067163-29329-1-git-send-email-greearb@candelatech.com>

On Mon, 2012-06-18 at 17:52 -0700, greearb@candelatech.com wrote:
> From: Ben Greear <greearb@candelatech.com>
> 
> RFC2581 ($4.2) specifies when an ACK should be generated as follows:
> 
> " .. an ACK SHOULD be generated for at least every second
>   full-sized segment, and MUST be generated within 500 ms
>   of the arrival of the first unacknowledged packet.
> "
> 
> We export the number of segments and the timeout limits
> specified above, so that a user can tune them according
> to their needs.
> 
> Specifically:
> 	* /proc/sys/net/ipv4/tcp_default_delack_segs, represents
> 	the threshold for the number of segments.
> 	* /proc/sys/net/ipv4/tcp_default_delack_min, specifies
> 	the minimum timeout value
> 	* /proc/sys/net/ipv4/tcp_default_delack_max, specifies
> 	the maximum timeout value.
> 
> In addition, new TCP socket options are added to allow
> per-socket configuration:
> 
> TCP_DELACK_SEGS
> TCP_DELACK_MIN
> TCP_DELACK_MAX
> 
> In order to keep a multiply out of the hot path, the segs * mss
> computation is recalculated and cached whenever segs or mss changes.
> 

I know David was worried about this multiply, but current cpus do a
multiply in at most 3 cycles.

Adding a u32 field to the socket structure adds 1/16 of a cache line, and
adds more penalty.

Avoiding building/sending an ACK packet can save us so many cpu cycles that
the multiply is pure noise.

^ permalink raw reply
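The trade-off described above can be made concrete with a small sketch that computes the byte threshold on demand instead of caching it. The names (struct demo_sock, demo_delack_threshold, demo_ack_now) and the exact comparison are hypothetical and not taken from the RFC patch; the only inputs assumed are a per-socket segment count and an MSS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket state: no cached segs * mss product is stored. */
struct demo_sock {
	uint32_t delack_segs;	/* ACK at least every N full-sized segments */
	uint32_t mss;		/* full segment size in bytes */
};

/* One multiply per check, in exchange for not adding a u32 to the socket. */
static inline uint32_t demo_delack_threshold(const struct demo_sock *sk)
{
	return sk->delack_segs * sk->mss;
}

/* Should an ACK go out now, given the bytes received but not yet ACKed? */
bool demo_ack_now(const struct demo_sock *sk, uint32_t unacked_bytes)
{
	return unacked_bytes >= demo_delack_threshold(sk);
}
```

With delack_segs = 2 and mss = 1460 the threshold is 2920 bytes. The "1/16 of a cache line" figure corresponds to a 4-byte field on a 64-byte cache line.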

* RE: [PATCH] net: added support for 40GbE link.
From: Parav.Pandit @ 2012-06-19  5:20 UTC (permalink / raw)
  To: rick.jones2; +Cc: netdev, bhutchings
In-Reply-To: <4FDF56FB.9080509@hp.com>



> -----Original Message-----
> From: Rick Jones [mailto:rick.jones2@hp.com]
> Sent: Monday, June 18, 2012 9:58 PM
> To: Pandit, Parav
> Cc: netdev@vger.kernel.org; bhutchings@solarflare.com
> Subject: Re: [PATCH] net: added support for 40GbE link.
> 
> On 06/18/2012 05:44 AM, Parav Pandit wrote:
> 
> > diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index
> > 297370a..1ebfa6e 100644
> > --- a/include/linux/ethtool.h
> > +++ b/include/linux/ethtool.h
> > @@ -1153,6 +1153,10 @@ struct ethtool_ops {
> >   #define SUPPORTED_10000baseR_FEC	(1<<  20)
> >   #define SUPPORTED_20000baseMLD2_Full	(1<<  21)
> >   #define SUPPORTED_20000baseKR2_Full	(1<<  22)
> > +#define SUPPORTED_40000baseKR4_Full	(1<<  23)
> > +#define SUPPORTED_40000baseCR4_Full	(1<<  24)
> > +#define SUPPORTED_40000baseSR4_Full	(1<<  25)
> > +#define SUPPORTED_40000baseLR4_Full	(1<<  26)
> >
> >   /* Indicates what features are advertised by the interface. */
> >   #define ADVERTISED_10baseT_Half		(1<<  0)
> > @@ -1178,6 +1182,10 @@ struct ethtool_ops {
> >   #define ADVERTISED_10000baseR_FEC	(1<<  20)
> >   #define ADVERTISED_20000baseMLD2_Full	(1<<  21)
> >   #define ADVERTISED_20000baseKR2_Full	(1<<  22)
> > +#define ADVERTISED_40000baseKR4_Full	(1<<  23)
> > +#define ADVERTISED_40000baseCR4_Full	(1<<  24)
> > +#define ADVERTISED_40000baseSR4_Full	(1<<  25)
> > +#define ADVERTISED_40000baseLR4_Full	(1<<  26)
> 
> Any idea how many defines will be wanted for 100 Gbit Ethernet?
> Supported and advertising in ethtool_cmd are __u32s...
> 
100G supports CR10, ER4, LR4, Base-R and SR10 interfaces. So 5 bits. We have space from 27 to 31 bits for 100G as well.

> rick jones

^ permalink raw reply
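The bit budget in the reply above can be checked with a short sketch. The 40G positions (23 to 26) come from the quoted patch; the DEMO_100000base* names and their ordering within bits 27 to 31 are hypothetical, since no 100G defines appear in the thread. The sketch uses unsigned shifts because left-shifting a signed 1 into bit 31 is not well defined in C.

```c
#include <stdint.h>

/* 40G link modes at the bit positions used in the quoted patch. */
#define DEMO_40000baseKR4_Full	(1u << 23)
#define DEMO_40000baseCR4_Full	(1u << 24)
#define DEMO_40000baseSR4_Full	(1u << 25)
#define DEMO_40000baseLR4_Full	(1u << 26)

/* Hypothetical 100G link modes filling the remaining five bits of a u32. */
#define DEMO_100000baseCR10_Full	(1u << 27)
#define DEMO_100000baseER4_Full		(1u << 28)
#define DEMO_100000baseLR4_Full		(1u << 29)
#define DEMO_100000baseR_Full		(1u << 30)
#define DEMO_100000baseSR10_Full	(1u << 31)

/* All 40G bits ORed together. */
uint32_t demo_40g_mask(void)
{
	return DEMO_40000baseKR4_Full | DEMO_40000baseCR4_Full |
	       DEMO_40000baseSR4_Full | DEMO_40000baseLR4_Full;
}

/* All hypothetical 100G bits ORed together. */
uint32_t demo_100g_mask(void)
{
	return DEMO_100000baseCR10_Full | DEMO_100000baseER4_Full |
	       DEMO_100000baseLR4_Full | DEMO_100000baseR_Full |
	       DEMO_100000baseSR10_Full;
}
```

Under these assumptions the five 100G modes occupy exactly the top five bits of the 32-bit mask, so nothing would be left for any later link mode.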

* Re: [PATCH] ipv4: Early TCP socket demux.
From: David Miller @ 2012-06-19  6:07 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1340080661.7491.2205.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 19 Jun 2012 06:37:41 +0200

> On Mon, 2012-06-18 at 21:27 -0700, David Miller wrote:
> 
>> How are we not using it?  We use the cached SKB socket no matter what
>> happens.
>> 
>> Look at how inet hash lookup works.
>> 
>> The error tells the caller solely whether a route lookup is still
>> necessary.
> 
> OK, remove the unlikely() in __inet_lookup_skb() so that it's obvious we
> have this skb_steal_sock() thing :)

Sure thing.

We also need to add some dst->ops->check() handling as well.

^ permalink raw reply

* Re: [PATCH] ipv4: Early TCP socket demux.
From: Eric Dumazet @ 2012-06-19  6:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120618.230723.612232275251003129.davem@davemloft.net>

On Mon, 2012-06-18 at 23:07 -0700, David Miller wrote:

> Sure thing.
> 
> We also need to add some dst->ops->check() handling as well.

Yes, rp_filter comes to mind.

^ permalink raw reply

* Re: [PATCH] sctp: fix warning when compiling without IPv6
From: David Miller @ 2012-06-19  7:28 UTC (permalink / raw)
  To: dhalperi; +Cc: netdev
In-Reply-To: <B7062EEE-046D-4435-B5FC-54FF3F763645@cs.washington.edu>

From: Daniel Halperin <dhalperi@cs.washington.edu>
Date: Mon, 18 Jun 2012 14:04:55 -0700

> net/sctp/protocol.c: In function ‘sctp_addr_wq_timeout_handler’:
> net/sctp/protocol.c:676: warning: label ‘free_next’ defined but not used
> 
> Signed-off-by: Daniel Halperin <dhalperi@cs.washington.edu>

Applied.

^ permalink raw reply
