Netdev List


* Re: [PATCH 4/4] rps: Inspect GRE encapsulated packets to get flow hash
From: David Miller @ 2011-05-19 19:39 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev
In-Reply-To: <1305821795.3028.82.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 19 May 2011 18:16:35 +0200

> For sure it helps if this machine is the final host for these packets.
> 
> If I am a firewall or router [and not looking into GRE packets], maybe I
> don't want to spread all packets received on a tunnel to several CPUs and
> reorder them when forwarded.

Maybe is the operative word here.

Unless you look inside of the tunnel, you may not have any entropy
at all for packet steering and that seems to be what Tom is trying
to attack here.

Also, if we are properly keying the inner-flow, reordering isn't an
issue.  Actual flows will not be reordered.

> Maybe we need to add a table, so that upper layer (GRE or IPIP tunnels)
> can instruct __skb_get_rxhash() that we want to deep inspect packets.

Keep in mind that we have essentially already established that the
goal of this code is to obtain as much hash steering entropy as
possible without causing the reordering of traffic within a
TCP/UDP/SCTP/etc. connection.

And Tom's changes are consistent with those goals.

If we want to start having knobs and ways to change this policy that
is a totally separate discussion from whether Tom's changes are
correct and ready to be applied, which I think they essentially are.
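
The point about keying on the inner flow can be sketched in a few lines of C. This is only an illustration: `struct inner_flow`, `flow_hash()` and `pick_cpu()` are hypothetical names, and the mixing function is a toy stand-in rather than the hash the kernel actually uses. What it shows is that steering depends only on the inner 4-tuple, so every packet of one connection lands on the same CPU and cannot be reordered against its own flow.

```c
#include <stdint.h>

/* Hypothetical 4-tuple of the inner (encapsulated) flow; not a kernel structure. */
struct inner_flow {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/* Toy mixing function, for illustration only (assumption: any deterministic hash works here). */
static uint32_t flow_hash(const struct inner_flow *f)
{
	uint32_t h = f->saddr ^ (f->daddr * 0x9e3779b1u);

	h ^= ((uint32_t)f->sport << 16) | f->dport;
	h ^= h >> 15;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	return h;
}

/* Identical inner 4-tuples always produce the same CPU index. */
static unsigned int pick_cpu(const struct inner_flow *f, unsigned int ncpus)
{
	return flow_hash(f) % ncpus;
}
```

Different inner flows carried by the same tunnel may spread across CPUs, which is where the extra steering entropy comes from.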


* Re: [PATCH v3] ipconfig wait for carrier
From: David Miller @ 2011-05-19 19:35 UTC (permalink / raw)
  To: micha; +Cc: dcbw, netdev
In-Reply-To: <4DD56E7A.3010901@neli.hopto.org>

From: Micha Nelissen <micha@neli.hopto.org>
Date: Thu, 19 May 2011 21:24:42 +0200

> Dan Williams wrote:
>> Shouldn't the code still wait at *least* one second?  Not all drivers
>> support carrier detect, and those that don't set the carrier always-on.
>> Thus older devices that used to have 1s to get carrier in line (even if
>> they don't report it) now have only 10ms.
> 
> Btw, it does not matter much, there are 2 cases:
> 1) DHCP: dhcp will retry every few seconds, so if link is not up, then a
> later try will succeed
> 2) Static IP: an ARP request is performed every second, so the second
> request will be answered instead of the first.
> 
> Even if link is "fake up" by driver and not actually up after 10 msecs,
> things will continue to work (eventually), after a second, just like now.

But this eats one of the CONF_SEND_RETRIES attempts, which is only 6.


* Re: [PATCH v3] ipconfig wait for carrier
From: Dan Williams @ 2011-05-19 19:36 UTC (permalink / raw)
  To: Micha Nelissen; +Cc: David Miller, netdev
In-Reply-To: <4DD56E7A.3010901@neli.hopto.org>

On Thu, 2011-05-19 at 21:24 +0200, Micha Nelissen wrote:
> Dan Williams wrote:
> > Shouldn't the code still wait at *least* one second?  Not all drivers
> > support carrier detect, and those that don't set the carrier always-on.
> > Thus older devices that used to have 1s to get carrier in line (even if
> > they don't report it) now have only 10ms.
> 
> Btw, it does not matter much, there are 2 cases:
> 1) DHCP: dhcp will retry every few seconds, so if link is not up, then a
> later try will succeed
> 2) Static IP: an ARP request is performed every second, so the second
> request will be answered instead of the first.
> 
> Even if link is "fake up" by driver and not actually up after 10 msecs,
> things will continue to work (eventually), after a second, just like now.

I don't particularly care what happens here, I was simply pointing out
that previous assumptions about older driver behavior are broken by this
patch, and this can cause a change in behavior.  The simplest thing to
do here is to revert only the hunks that change CONF_POST_OPEN, ie set
CONF_POST_OPEN back to 1, and revert the ssleep() -> msleep() bit.  The
rest of it looks fine to me.

But if davem wants to take the patch anyway, that's fine with me too,
since I believe all drivers that don't support carrier detect should be
put out of their misery by a quick bullet to the back of the head at the
end of a dark alley anyway.

Dan


* Re: [PATCH v3] ipconfig wait for carrier
From: Dan Williams @ 2011-05-19 19:32 UTC (permalink / raw)
  To: Micha Nelissen; +Cc: David Miller, netdev
In-Reply-To: <4DD56701.8040007@neli.hopto.org>

On Thu, 2011-05-19 at 20:52 +0200, Micha Nelissen wrote:
> Dan Williams wrote:
> > On Wed, 2011-05-18 at 18:14 -0400, David Miller wrote:
> >> Please fix ic_is_init_dev() to return a proper boolean "false" instead
> >> of "0" when IFF_LOOPBACK is set.
> 
> Ok. Had an int before, but boolean is better.
> 
> > Shouldn't the code still wait at *least* one second?  Not all drivers
> > support carrier detect, and those that don't set the carrier always-on.
> > Thus older devices that used to have 1s to get carrier in line (even if
> > they don't report it) now have only 10ms.
> > 
> > I think it should wait at least one second like the code currently does,
> > and then if the carrier still isn't up, wait longer.
> 
> What is the 1 second based on?
> 
> If a driver does not support carrier detect, then this code will wait
> for the timeout period. Or do those older drivers set carrier detect
> immediately when device is probed?

Older devices that do not support carrier detect have the carrier always
on, ie IFF_RUNNING is always set, and netif_carrier_ok() always returns
1.  There is currently no way to determine whether a device supports
carrier detection or not, since drivers do not set a flag anywhere to
that effect (though I'd like it if they did).

I think the bit we're talking about is the change to CONF_POST_OPEN and
the corresponding change from ssleep(1) -> msleep(10).  For drivers that
don't support carrier detect, the patch would effectively decrease the
time that these drivers have to establish a carrier (even if they don't
report it to the kernel!) from 1 second to 10ms.  Then the code in
ic_open_devs() will immediately fall down to have_carrier because these
drivers have the carrier always on.

Net result: previously there was a 1 second window for this older
hardware to establish a link, now there is a 10ms window.  That's a
behavior change that could break stuff that used to work.

Dan

> Micha
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
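
The behavior being asked for (always wait a minimum grace period, then keep polling only while the carrier is still down) might look roughly like the sketch below. It is a userspace illustration, not the ipconfig code: `wait_for_carrier()`, the `carrier_fn` callback and the millisecond parameters are all assumptions made for the example, and the callback merely stands in for a check such as netif_carrier_ok().

```c
#include <stdbool.h>

/* Hypothetical callback standing in for a carrier check such as netif_carrier_ok(). */
typedef bool (*carrier_fn)(unsigned int elapsed_ms);

/*
 * Returns how many milliseconds were spent waiting before proceeding.
 * min_ms, max_ms and step_ms are illustrative parameters, not kernel constants.
 */
static unsigned int wait_for_carrier(carrier_fn carrier_ok,
				     unsigned int min_ms,
				     unsigned int max_ms,
				     unsigned int step_ms)
{
	unsigned int elapsed = min_ms;	/* unconditional grace period */

	while (elapsed < max_ms && !carrier_ok(elapsed))
		elapsed += step_ms;	/* a real implementation would sleep here */

	return elapsed;
}

/* Driver without carrier detect: carrier is reported as up from the start. */
static bool always_on(unsigned int elapsed_ms)
{
	(void)elapsed_ms;
	return true;
}

/* Driver with carrier detect whose link comes up after 2.5 seconds. */
static bool up_after_2500ms(unsigned int elapsed_ms)
{
	return elapsed_ms >= 2500;
}
```

With a 1000 ms minimum, the always-on driver still gets its full second, while a driver that reports carrier honestly only waits as long as the link actually takes (up to the timeout).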


* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
From: David Miller @ 2011-05-19 19:27 UTC (permalink / raw)
  To: tsunanet
  Cc: alexander.zimmermann, hagen, kuznet, pekkas, jmorris, yoshfuji,
	kaber, eric.dumazet, netdev, linux-kernel
In-Reply-To: <BANLkTikZasOVfr2zTfXVjPqQsnzy9Hk5uA@mail.gmail.com>

From: tsuna <tsunanet@gmail.com>
Date: Thu, 19 May 2011 10:11:50 -0700

> Looking through the kernel, I see that SCTP already has knobs for
> this: sctp_rto_initial, sctp_rto_min, sctp_rto_max.  You can even
> control the constants used to update rttvar and srtt: sctp_rto_alpha,
> sctp_rto_beta

SCTP is 1) not even a sliver of deployment compared to TCP and 2)
doesn't get nearly the same scrutiny on patch review that TCP
changes do.

I basically let the SCTP folks play in their own sandbox, because
frankly SCTP doesn't matter.

The only time I care about an SCTP change is when it has an impact on
the rest of the networking code.

So using SCTP as an example of "see we do this already over here" is a
non-starter.  Don't do it.


* Re: [PATCH v3] ipconfig wait for carrier
From: Micha Nelissen @ 2011-05-19 19:24 UTC (permalink / raw)
  To: Dan Williams; +Cc: David Miller, netdev
In-Reply-To: <1305819120.3271.3.camel@dcbw.foobar.com>

Dan Williams wrote:
> Shouldn't the code still wait at *least* one second?  Not all drivers
> support carrier detect, and those that don't set the carrier always-on.
> Thus older devices that used to have 1s to get carrier in line (even if
> they don't report it) now have only 10ms.

Btw, it does not matter much, there are 2 cases:
1) DHCP: dhcp will retry every few seconds, so if link is not up, then a
later try will succeed
2) Static IP: an ARP request is performed every second, so the second
request will be answered instead of the first.

Even if link is "fake up" by driver and not actually up after 10 msecs,
things will continue to work (eventually), after a second, just like now.

Micha


* Re: [PATCH v3] ipconfig wait for carrier
From: David Miller @ 2011-05-19 19:19 UTC (permalink / raw)
  To: dcbw; +Cc: micha, netdev
In-Reply-To: <1305819120.3271.3.camel@dcbw.foobar.com>

From: Dan Williams <dcbw@redhat.com>
Date: Thu, 19 May 2011 10:31:58 -0500

> On Wed, 2011-05-18 at 18:14 -0400, David Miller wrote:
>> From: Micha Nelissen <micha@neli.hopto.org>
>> Date: Wed, 18 May 2011 08:59:32 +0200
>> 
>> > On 2011-05-18 8:37, David Miller wrote:
>> >> From: Micha Nelissen<micha@neli.hopto.org>
>> >> Date: Wed, 18 May 2011 08:32:35 +0200
>> >>
>> >>> I'm confused. Against which tree/commit do you want it then?
>> >>
>> >> Linus's current tree would be fine as would:
>> >>
>> >> 	git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.git
>> > 
>> > Ok I see, thanks. The patch should apply just fine to your tree, there
>> > is only a spelling change since 2.6.38 which does not conflict.
>> 
>> Please fix ic_is_init_dev() to return a proper boolean "false" instead
>> of "0" when IFF_LOOPBACK is set.
> 
> Shouldn't the code still wait at *least* one second?  Not all drivers
> support carrier detect, and those that don't set the carrier always-on.
> Thus older devices that used to have 1s to get carrier in line (even if
> they don't report it) now have only 10ms.
> 
> I think it should wait at least one second like the code currently does,
> and then if the carrier still isn't up, wait longer.

Agreed.


* RE: [RFC 2/3] RDMA/cma: Add support for netlink statistics export
From: Hefty, Sean @ 2011-05-19 19:03 UTC (permalink / raw)
  To: Roland Dreier
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <BANLkTimjhAVfpJQX-PshVBgcshzfh-taRw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

> And actually looking at https://patchwork.kernel.org/patch/90204/
> it looks reasonable to merge now... or I could just use
> cma_is_ud_ps() in the netlink stuff.
> 
> I'm inclined to just take your patch now if that makes sense to you
> (it looks pretty independent of everything else)
> 
> Is that the latest version in patchwork?

The only difference that I see between that patch and what's in my tree are the line numbers.  The patch looks safe enough to me to go in separately from anything else.  Thanks

- Sean

* Re: [PATCH net-next-2.6] netfilter: add more values to enum ip_conntrack_info
From: David Miller @ 2011-05-19 19:02 UTC (permalink / raw)
  To: eric.dumazet; +Cc: kaber, netfilter-devel, netdev
In-Reply-To: <1305812667.3028.66.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 19 May 2011 15:44:27 +0200

> Following error is raised (and other similar ones) :
> 
> net/ipv4/netfilter/nf_nat_standalone.c: In function ‘nf_nat_fn’:
> net/ipv4/netfilter/nf_nat_standalone.c:119:2: warning: case value ‘4’
> not in enumerated type ‘enum ip_conntrack_info’
> 
> gcc barfs on adding two enum values and getting a not enumerated
> result :
> 
> case IP_CT_RELATED+IP_CT_IS_REPLY:
> 
> Add missing enum values
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>
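
For readers who have not seen this warning before, here is a standalone sketch of the situation. The `ct_info` enum and its `CT_*` names are a simplified mirror invented for this example, not the kernel header. With only the first four enumerators, `case CT_RELATED + CT_IS_REPLY:` evaluates to 4, which is not a named value of the enum, so gcc warns. Naming the sums as enumerators makes the case labels members of the type.

```c
/* Standalone mirror of the enum under discussion; all names here are assumed. */
enum ct_info {
	CT_ESTABLISHED,
	CT_RELATED,
	CT_NEW,
	CT_IS_REPLY,
	/* Added values: the sums that were previously written out in case labels. */
	CT_ESTABLISHED_REPLY = CT_ESTABLISHED + CT_IS_REPLY,
	CT_RELATED_REPLY = CT_RELATED + CT_IS_REPLY,
	CT_NEW_REPLY = CT_NEW + CT_IS_REPLY
};

static const char *ct_info_name(enum ct_info ctinfo)
{
	switch (ctinfo) {
	case CT_ESTABLISHED:
		return "established";
	case CT_RELATED:
		return "related";
	case CT_NEW:
		return "new";
	case CT_ESTABLISHED_REPLY:	/* same value as CT_IS_REPLY */
		return "established-reply";
	case CT_RELATED_REPLY:		/* was: case CT_RELATED + CT_IS_REPLY */
		return "related-reply";
	case CT_NEW_REPLY:
		return "new-reply";
	}
	return "unknown";
}
```

Note that `CT_ESTABLISHED_REPLY` has the same value as `CT_IS_REPLY`, so only one of the two can appear as a case label in a given switch.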


* Re: [RFC 2/3] RDMA/cma: Add support for netlink statistics export
From: Roland Dreier @ 2011-05-19 18:53 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <BANLkTimahta6QYrjTE5+DmN6RZXY_c72=Q@mail.gmail.com>

On Thu, May 19, 2011 at 11:49 AM, Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Thu, May 19, 2011 at 11:35 AM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>> One of the patches in the af_ib patch set adds the qp_type to struct rdma_cm_id.  I'm guessing that patch will also be needed for xrc.
>
> Would it make sense to pull that patch at least in?
>
> Sorry I haven't had time to really think about AF_IB in general but
> maybe I can at least merge the netlink stuff this cycle?

And actually looking at https://patchwork.kernel.org/patch/90204/
it looks reasonable to merge now... or I could just use
cma_is_ud_ps() in the netlink stuff.

I'm inclined to just take your patch now if that makes sense to you
(it looks pretty independent of everything else)

Is that the latest version in patchwork?

Thanks,
 - R.

* Re: [PATCH v3] ipconfig wait for carrier
From: Micha Nelissen @ 2011-05-19 18:52 UTC (permalink / raw)
  To: Dan Williams; +Cc: David Miller, netdev
In-Reply-To: <1305819120.3271.3.camel@dcbw.foobar.com>

Dan Williams wrote:
> On Wed, 2011-05-18 at 18:14 -0400, David Miller wrote:
>> Please fix ic_is_init_dev() to return a proper boolean "false" instead
>> of "0" when IFF_LOOPBACK is set.

Ok. Had an int before, but boolean is better.

> Shouldn't the code still wait at *least* one second?  Not all drivers
> support carrier detect, and those that don't set the carrier always-on.
> Thus older devices that used to have 1s to get carrier in line (even if
> they don't report it) now have only 10ms.
> 
> I think it should wait at least one second like the code currently does,
> and then if the carrier still isn't up, wait longer.

What is the 1 second based on?

If a driver does not support carrier detect, then this code will wait
for the timeout period. Or do those older drivers set carrier detect
immediately when device is probed?

Micha


* Re: net: add seq_before/seq_after functions
From: David Miller @ 2011-05-19 18:52 UTC (permalink / raw)
  To: sven; +Cc: ordex, linux-kernel, netdev, paulus, linux-ppp
In-Reply-To: <201105191121.23888.sven@narfation.org>

From: Sven Eckelmann <sven@narfation.org>
Date: Thu, 19 May 2011 11:21:21 +0200

> This is currently used by vis.c in net/batman-adv and could also be used by
> ppp-generic.c (with my changes of course). And it is planned to be used by
> transtable.c in net/batman-adv. The idea was to propose this to
> linux-kernel/netdev before we move it to a place where only batman-adv can use
> it (the current situation is that only vis.c in batman-adv can use it).
> 
> It is ok that you say that it should be batman-adv specific - we only wanted 
> to ask first.

Well, this is a purely networking change, the header you're touching is
networking specific, so really in this case linux-kernel didn't need to
get involved :-)


* Re: [RFC 2/3] RDMA/cma: Add support for netlink statistics export
From: Roland Dreier @ 2011-05-19 18:49 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1828884A29C6694DAF28B7E6B8A82373FF9E-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>

On Thu, May 19, 2011 at 11:35 AM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> One of the patches in the af_ib patch set adds the qp_type to struct rdma_cm_id.  I'm guessing that patch will also be needed for xrc.

Would it make sense to pull that patch at least in?

Sorry I haven't had time to really think about AF_IB in general but
maybe I can at least merge the netlink stuff this cycle?

 - R.

* RE: [RFC 2/3] RDMA/cma: Add support for netlink statistics export
From: Hefty, Sean @ 2011-05-19 18:35 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org
In-Reply-To: <BANLkTin1uVyrzdn_030bGmC=QHtfkvOTNA@mail.gmail.com>

> Is there an easy way to get the qp_type from a struct rdma_cm_id?
> 
> ie what code needs to go into cma_get_id_stats() to handle this?

With the current code, you'd need to map from the port space:

static inline enum ib_qp_type cma_get_qp_type(struct rdma_cm_id *id)
{
	if (id->ps == RDMA_PS_IPOIB || id->ps == RDMA_PS_UDP)
		return IB_QPT_UD;
	else
		return IB_QPT_RC;
}

	..
	id_stats->qp_type = cma_get_qp_type(id);
	..

One of the patches in the af_ib patch set adds the qp_type to struct rdma_cm_id.  I'm guessing that patch will also be needed for xrc.

- Sean


* Re: [RFC 2/3] RDMA/cma: Add support for netlink statistics export
From: Roland Dreier @ 2011-05-19 18:10 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1828884A29C6694DAF28B7E6B8A82373F428-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>

On Fri, May 13, 2011 at 10:21 AM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>> +struct rdma_cm_id_stats {
>> +     __u32   qp_num;
>> +     __u32   bound_dev_if;
>> +     __u32   port_space;
>> +     __s32   pid;
>> +     __u8    cm_state;
>> +     __u8    node_type;
>> +     __u8    port_num;
>> +     __u8    reserved;
>> +};
>
> We may also want to add qp_type

Is there an easy way to get the qp_type from a struct rdma_cm_id?

ie what code needs to go into cma_get_id_stats() to handle this?

Thanks,
  Roland

* Re: [PATCH] tcp: Lower the initial RTO to 1s as per draft RFC 2988bis-02.
From: Yuchung Cheng @ 2011-05-19 17:42 UTC (permalink / raw)
  To: Benoit Sigoure; +Cc: netdev, linux-kernel, Benoit Sigoure, Hsiao-keng Jerry Chu
In-Reply-To: <1305786976-84532-1-git-send-email-tsunanet@gmail.com>

Hi Benoit,

AFAICT, the passive open side would not fall back the
RTO to 3sec in this change because SYNACK timeouts are not
recorded in icsk_retransmits but reqsk->retrans?

Yuchung

On Wed, May 18, 2011 at 11:36 PM, Benoit Sigoure <tsunanet@gmail.com> wrote:
>
> From: Benoit Sigoure <tsuna@stumbleupon.com>
>
> Draft RFC 2988bis-02 recommends that the initial RTO be lowered
> from 3 seconds down to 1 second, and that in case of a timeout
> during the TCP 3WHS, the RTO should fall back to 3 seconds when
> data transmission begins.
> ---
>
> On Wed, May 18, 2011 at 10:46 PM, David Miller <davem@davemloft.net> wrote:
> > From: tsuna <tsunanet@gmail.com>
> > Date: Wed, 18 May 2011 21:33:21 -0700
> >
> >> On Wed, May 18, 2011 at 9:14 PM, David Miller <davem@davemloft.net> wrote:
> >>> I really would rather see the initial RTO be static and be set to 1
> >>> with fallback RTO of 3.
> >>
> >> I can also provide a simple patch for this if you want to start from
> >> there.  And then maybe we can discuss having a runtime knob some more
> >> :-)
> >
> > Yeah why don't we do that :-)
>
> Alright, here we go.
>
>
>  include/net/tcp.h    |    5 ++++-
>  net/ipv4/tcp_input.c |   13 +++++++++----
>  2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index cda30ea..274d761 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -122,7 +122,10 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
>  #endif
>  #define TCP_RTO_MAX    ((unsigned)(120*HZ))
>  #define TCP_RTO_MIN    ((unsigned)(HZ/5))
> -#define TCP_TIMEOUT_INIT ((unsigned)(3*HZ))    /* RFC 1122 initial RTO value   */
> +/* The next 2 values come from Draft RFC 2988bis-02. */
> +#define TCP_TIMEOUT_INIT ((unsigned)(1*HZ))            /* initial RTO value    */
> +#define TCP_TIMEOUT_INIT_FALLBACK ((unsigned)(3*HZ))   /* initial RTO to fallback to when
> +                                                        * a timeout happens during the 3WHS.   */
>
>  #define TCP_RESOURCE_PROBE_INTERVAL ((unsigned)(HZ/2U)) /* Maximal interval between probes
>                                                         * for local resources.
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index bef9f04..a36bc35 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -868,6 +868,11 @@ static void tcp_init_metrics(struct sock *sk)
>  {
>        struct tcp_sock *tp = tcp_sk(sk);
>        struct dst_entry *dst = __sk_dst_get(sk);
> +       /* If we had to retransmit anything during the 3WHS, use
> +        * the initial fallback RTO as per draft RFC 2988bis-02.
> +        */
> +       int init_rto = inet_csk(sk)->icsk_retransmits ?
> +               TCP_TIMEOUT_INIT_FALLBACK : TCP_TIMEOUT_INIT;
>
>        if (dst == NULL)
>                goto reset;
> @@ -890,7 +895,7 @@ static void tcp_init_metrics(struct sock *sk)
>        if (dst_metric(dst, RTAX_RTT) == 0)
>                goto reset;
>
> -       if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (TCP_TIMEOUT_INIT << 3))
> +       if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (init_rto << 3))
>                goto reset;
>
>        /* Initial rtt is determined from SYN,SYN-ACK.
> @@ -916,7 +921,7 @@ static void tcp_init_metrics(struct sock *sk)
>                tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk));
>        }
>        tcp_set_rto(sk);
> -       if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp) {
> +       if (inet_csk(sk)->icsk_rto < init_rto && !tp->rx_opt.saw_tstamp) {
>  reset:
>                /* Play conservative. If timestamps are not
>                 * supported, TCP will fail to recalculate correct
> @@ -924,8 +929,8 @@ reset:
>                 */
>                if (!tp->rx_opt.saw_tstamp && tp->srtt) {
>                        tp->srtt = 0;
> -                       tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT;
> -                       inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT;
> +                       tp->mdev = tp->mdev_max = tp->rttvar = init_rto;
> +                       inet_csk(sk)->icsk_rto = init_rto;
>                }
>        }
>        tp->snd_cwnd = tcp_init_cwnd(tp, dst);
> --
> 1.7.0.4
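
Stripped of kernel context, the two-level rule in the patch above reduces to one decision. The sketch below is a simplification under stated assumptions: `HZ` is fixed at 1000 for the example, the macro names are shortened, and `initial_rto()` is a hypothetical helper that takes the handshake retransmit count as an argument. As the reply at the top of this message points out, on the passive-open side that count is not kept in icsk_retransmits, so a caller would have to pass in the right counter for its side of the connection.

```c
#define HZ			1000	/* assumed tick rate for this sketch */
#define TIMEOUT_INIT		(1 * HZ)	/* initial RTO: 1 second */
#define TIMEOUT_INIT_FALLBACK	(3 * HZ)	/* used after a timeout during the 3WHS */

/*
 * Hypothetical helper: choose the RTO to start data transmission with,
 * given how many retransmissions happened during the handshake.
 */
static unsigned int initial_rto(unsigned int handshake_retransmits)
{
	return handshake_retransmits ? TIMEOUT_INIT_FALLBACK : TIMEOUT_INIT;
}
```

A clean handshake therefore starts at 1 second, and any handshake timeout reverts to the conservative 3 seconds.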

* Re: [PATCH] Add libertas_disablemesh module parameter to disable mesh interface
From: Dan Williams @ 2011-05-19 17:16 UTC (permalink / raw)
  To: Sascha Silbe
  Cc: linux-wireless, devel, John, W.Linville, linville, libertas-dev,
	netdev, linux-kernel
In-Reply-To: <1305290935-sup-4547@xo15-sascha.sascha.silbe.org>

On Fri, 2011-05-13 at 15:16 +0200, Sascha Silbe wrote:
> Excerpts from Dan Williams's message of Thu May 12 05:11:36 +0200 2011:
> > On Wed, 2011-05-11 at 14:52 +0200, Sascha Silbe wrote:
> > > This allows individual users and deployments to disable mesh support at
> > > runtime, i.e. without having to build and maintain a custom kernel.
> 
> > Does the mesh interface somehow cause problems, even when nothing is
> > using it?
> 
> Some people suspect it does, but there's no hard data showing that. But
> then the problems are often hard to reproduce in the first place, so
> proving a correlation with mesh is even harder.

That's not an excuse for not finding and fixing the problem though.
What problems are we actually talking about here?

> The hardware based mesh support is based on an outdated draft of
> 802.11s and not interoperable with any other device AFAIK. For most
> users Ad-hoc networks are the better option. Disabling mesh support as
> low-level as possible makes it less likely that any remains are causing
> trouble. With at least four layers (firmware, kernel, NM, Sugar)
> involved in managing connectivity and one of them (firmware) being closed
> source, I prefer to simplify things by eliminating three layers for
> functionality we don't intend to use. It makes debugging (and
> blaming ;) ) a lot easier.
> 
> In the field, mesh support is currently disabled using
> /sys/class/net/eth0/lbs_mesh. However, it comes back after resume
> (possibly only if powercycled) and needs to be disabled again by
> post-resume hacks. Race conditions with NM are possible.

That's a parameter handled by the driver; so shouldn't we make sure it's
respected again on resume?

> A user space option would be to teach NM to disable mesh support (at
> runtime - we don't want to ship a custom NM package). I'd expect the
> patch to be much more invasive than the one posted for libertas.

Not really, but we already have on/off for a bunch of other stuff, I
don't see why we can't add one for OLPC mesh.

Dan


* [PATCH RFC] virtio_ring: fix delayed enable cb implementation
From: Michael S. Tsirkin @ 2011-05-19 17:12 UTC (permalink / raw)
  To: rusty, habanero, Shirley Ma, Krishna Kumar2, kvm, steved,
	Tom Lendacky <tahm@
  Cc: virtualization, netdev, linux-kernel

Fix some signed/unsigned mistakes in virtqueue_enable_cb_delayed
by using u16 math all over.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

I'll put this on my v1 branch as well

@@ -398,7 +397,7 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb);
 bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
-	int bufs;
+	u16 bufs;
 
 	START_USE(vq);
 
@@ -412,7 +411,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 	bufs = (vq->vring.avail->idx - vq->last_used_idx) * 3 / 4;
 	vring_used_event(&vq->vring) = vq->last_used_idx + bufs;
 	virtio_mb();
-	if (unlikely(vq->vring.used->idx - vq->last_used_idx > bufs)) {
+	if (unlikely((u16)(vq->vring.used->idx - vq->last_used_idx) > bufs)) {
 		END_USE(vq);
 		return false;
 	}
 
-- 
1.7.5.53.gc233e
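
Why the cast matters can be shown with a small standalone example. The two helpers below are hypothetical and cover only the final comparison from the patch, with `uint16_t` standing in for the kernel's `u16`. Ring indices are free-running 16-bit counters, so once `used_idx` wraps past 65535 the plain subtraction, which C performs in `int`, goes negative and the test silently fails.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old form: operands are promoted to int, so a wrapped index yields a negative difference. */
static bool more_used_int(uint16_t used_idx, uint16_t last_used_idx, uint16_t bufs)
{
	return used_idx - last_used_idx > bufs;
}

/* Fixed form: truncating to 16 bits gives the distance modulo 2^16. */
static bool more_used_u16(uint16_t used_idx, uint16_t last_used_idx, uint16_t bufs)
{
	return (uint16_t)(used_idx - last_used_idx) > bufs;
}
```

For example, with `last_used_idx = 65534`, `used_idx = 2` and `bufs = 3`, four buffers have been used since the last check. The `int` form computes -65532 and reports false, while the `u16` form computes 4 and reports true. Away from the wrap point both forms agree.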


* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
From: tsuna @ 2011-05-19 17:11 UTC (permalink / raw)
  To: Alexander Zimmermann
  Cc: Hagen Paul Pfeifer, David Miller, kuznet, pekkas, jmorris,
	yoshfuji, kaber, eric.dumazet, netdev, linux-kernel
In-Reply-To: <1456193D-84D1-46E2-B930-8FD0A5B8C409@comsys.rwth-aachen.de>

On Thu, May 19, 2011 at 9:55 AM, Alexander Zimmermann
<alexander.zimmermann@comsys.rwth-aachen.de> wrote:
> Exactly. This is the point. It's *your* environment. However, TCP is
> general purpose. And for the wider internet 1s is known to be safe. See the
> measurements in the draft that Mark Allman ran.

That's right, there's no one-size-fits-all solution.  That's why I'm
in favor of keeping a reasonably conservative default (say 1s to 3s,
so we don't break the Internets) and giving people a knob to adjust it
to whatever makes sense for them.

Looking through the kernel, I see that SCTP already has knobs for
this: sctp_rto_initial, sctp_rto_min, sctp_rto_max.  You can even
control the constants used to update rttvar and srtt: sctp_rto_alpha,
sctp_rto_beta

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com


* Re: kernel BUG at net/ipv4/tcp_output.c:1006!
From: Eric Dumazet @ 2011-05-19 17:11 UTC (permalink / raw)
  To: TB; +Cc: David Miller, linux-kernel, netdev
In-Reply-To: <4DD54E89.7050707@techboom.com>

On Thursday, 19 May 2011 at 13:08 -0400, TB wrote:
> On 11-05-13 04:01 PM, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Fri, 13 May 2011 21:47:38 +0200
> > 
> >> I suspect we should push commit 2fceec13375e5d98 (tcp: len check is
> >> unnecessarily devastating, change to WARN_ON) to stable if not already
> >> done...
> >>
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2fceec13375e5d98
> >>
> >> David, is this commit in your stable queue ?
> > 
> > No, but now it is.
> 
> We've put this commit with the previous tcp_cubic patch on 60 of our
> servers and we're waiting to see how it goes.

Don't expect too much. It only lets the machine survive after logging
messages, instead of halting ;)


* Re: kernel BUG at net/ipv4/tcp_output.c:1006!
From: TB @ 2011-05-19 17:08 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, linux-kernel, netdev
In-Reply-To: <20110513.160138.1477780250019480052.davem@davemloft.net>

On 11-05-13 04:01 PM, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 13 May 2011 21:47:38 +0200
> 
>> I suspect we should push commit 2fceec13375e5d98 (tcp: len check is
>> unnecessarily devastating, change to WARN_ON) to stable if not already
>> done...
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2fceec13375e5d98
>>
>> David, is this commit in your stable queue ?
> 
> No, but now it is.

We've put this commit with the previous tcp_cubic patch on 60 of our
servers and we're waiting to see how it goes.


* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
From: Alexander Zimmermann @ 2011-05-19 16:55 UTC (permalink / raw)
  To: tsuna
  Cc: Hagen Paul Pfeifer, David Miller, kuznet, pekkas, jmorris,
	yoshfuji, kaber, eric.dumazet, netdev, linux-kernel
In-Reply-To: <BANLkTimSZEbnNVzi3UvBFndHp25S0ow7YQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1647 bytes --]


On 19.05.2011 at 18:40, tsuna wrote:

> On Thu, May 19, 2011 at 1:02 AM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
>> So yes, it CAN be wise to choose other lower/upper bounds. But keep in
>> mind that we should NOT artificially limit ourselves. I can imagine data center
>> scenarios where an initial RTO of <1 matches perfectly.
> 
> Yes that's exactly the point I was trying to make when talking to
> Alexander offline.  On today's Internet, RTTs are easily in the
> hundreds of ms, and initRTO is 3s, so there's 2 orders of magnitude of
> difference.  In my environment,

Exactly. This is the point. It's *your* environment. However, TCP is
general purpose. And for the wider internet 1s is known to be safe. See the
measurements in the draft that Mark Allman ran.

> if my RTT is ~2µs, an initRTO of 200ms
> means that there's a gap of 6 orders of magnitude (!).

Currently, initRTO is 3s. So the gap is even larger.

> And yes,
> although I don't work for High Frequency Trading companies in Wall
> Street, I'm already buying switches full of line-rate 10Gb ports with
> a port-to-port latency of 500ns for L2/L3 forwarding/switching.  I
> expect this kind of network gear will quickly become prevalent in
> datacenter/backend environments.
> 
> -- 
> Benoit "tsuna" Sigoure
> Software Engineer @ www.StumbleUpon.com

//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22222
// email: zimmermann@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//


[-- Attachment #2: Signed part of the message --]
[-- Type: application/pgp-signature, Size: 243 bytes --]


* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
From: tsuna @ 2011-05-19 16:40 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: Alexander Zimmermann, David Miller, kuznet, pekkas, jmorris,
	yoshfuji, kaber, eric.dumazet, netdev, linux-kernel
In-Reply-To: <ef84de89c2597793d4cca5eee446ba90@localhost>

On Thu, May 19, 2011 at 1:02 AM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> So yes, it CAN be wise to choose other lower/upper bounds. But keep in
> mind that we should NOT artificially limit ourselves. I can imagine data center
> scenarios where an initial RTO of <1 matches perfectly.

Yes that's exactly the point I was trying to make when talking to
Alexander offline.  On today's Internet, RTTs are easily in the
hundreds of ms, and initRTO is 3s, so there's 2 orders of magnitude of
difference.  In my environment, if my RTT is ~2µs, an initRTO of 200ms
means that there's a gap of 5 orders of magnitude (!).  And yes,
although I don't work for High Frequency Trading companies in Wall
Street, I'm already buying switches full of line-rate 10Gb ports with
a port-to-port latency of 500ns for L2/L3 forwarding/switching.  I
expect this kind of network gear will quickly become prevalent in
datacenter/backend environments.
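
The ratios in this comparison can be checked with a few lines of integer
arithmetic. This is only a sketch: the helper name orders_of_magnitude and
the choice of nanoseconds as the unit are assumptions made for illustration,
not anything taken from the patch under discussion.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of whole decimal orders of magnitude by
 * which larger_ns exceeds smaller_ns. Both values are in nanoseconds
 * and smaller_ns must be non-zero.
 */
static unsigned orders_of_magnitude(uint64_t larger_ns, uint64_t smaller_ns)
{
    uint64_t ratio = larger_ns / smaller_ns;
    unsigned orders = 0;

    /* Count how many times the ratio can be divided by ten. */
    while (ratio >= 10) {
        ratio /= 10;
        orders++;
    }
    return orders;
}
```

Calling orders_of_magnitude(200000000, 2000), i.e. 200ms against 2µs, gives 5
(a ratio of 10^5); replacing 200ms with the current 3s initRTO gives 6 (a
ratio of 1.5 * 10^6).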

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com


* Re: [PATCH 4/4] rps: Inspect GRE encapsulated packets to get flow hash
From: Eric Dumazet @ 2011-05-19 16:16 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <alpine.DEB.2.00.1105190827420.12804@pokey.mtv.corp.google.com>

On Thursday, 19 May 2011 at 08:39 -0700, Tom Herbert wrote:
> Crack open GRE packets in __skb_get_rxhash to compute a 4-tuple hash on
> the encapsulated packet.  Note that this is used only when the
> __skb_get_rxhash path is taken, in particular only when the device does
> not provide the rxhash (i.e. the feature is disabled).
> 
> This was tested by creating a single GRE tunnel between two 16 core
> AMD machines.  200 netperf TCP_RR streams were run with 1-byte
> request and response sizes.
> 
> Without patch: 157497 tps, 50/90/99% latencies 1250/1292/1364 usecs
> With patch: 325896 tps, 50/90/99% latencies 603/848/1169 usecs
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  net/core/dev.c |   22 ++++++++++++++++++++++
>  1 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0c83494..7799bbd 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2552,6 +2552,28 @@ again:
>  	}
>  
>  	switch (ip_proto) {
> +	case IPPROTO_GRE:
> +		if (pskb_may_pull(skb, nhoff + 16)) {
> +			u8 *h = skb->data + nhoff;
> +			__be16 flags = *(__be16 *)h;
> +
> +			/*
> +			 * Only look inside GRE if version zero and no
> +			 * routing
> +			 */
> +			if (!(flags & (GRE_VERSION|GRE_ROUTING))) {
> +				proto = *(__be16 *)(h + 2);
> +				nhoff += 4;
> +				if (flags & GRE_CSUM)
> +					nhoff += 4;
> +				if (flags & GRE_KEY)
> +					nhoff += 4;
> +				if (flags & GRE_SEQ)
> +					nhoff += 4;
> +				goto again;
> +			}
> +		}
> +		break;
>  	default:
>  		break;
>  	}
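
The offset arithmetic in the quoted hunk can be exercised outside the kernel.
The following is a minimal userspace sketch, not the kernel code: the name
sk_gre_hdr_len and the SK_GRE_* constants are hypothetical, the flag values
are written in host byte order following the RFC 1701 bit layout, and the
16-byte length check simply mirrors the pskb_may_pull(skb, nhoff + 16) call.

```c
#include <stddef.h>
#include <stdint.h>

/* First 16-bit GRE word, host byte order (hypothetical names). */
#define SK_GRE_CSUM    0x8000u
#define SK_GRE_ROUTING 0x4000u
#define SK_GRE_KEY     0x2000u
#define SK_GRE_SEQ     0x1000u
#define SK_GRE_VERSION 0x0007u

/*
 * Returns the number of bytes to skip to reach the inner header and
 * stores the inner protocol (host order) in *proto. Returns 0 when the
 * header should not be opened: too short, non-zero version, or routing
 * present.
 */
static size_t sk_gre_hdr_len(const uint8_t *h, size_t len, uint16_t *proto)
{
    uint16_t flags;
    size_t off = 4;                 /* flags + protocol type */

    if (len < 16)                   /* worst case: all optional fields */
        return 0;

    flags = (uint16_t)((h[0] << 8) | h[1]);
    if (flags & (SK_GRE_VERSION | SK_GRE_ROUTING))
        return 0;

    *proto = (uint16_t)((h[2] << 8) | h[3]);
    if (flags & SK_GRE_CSUM)
        off += 4;                   /* checksum + reserved/offset */
    if (flags & SK_GRE_KEY)
        off += 4;
    if (flags & SK_GRE_SEQ)
        off += 4;
    return off;
}
```

A header with no optional fields yields an offset of 4, and one with
checksum, key and sequence number all present yields 16, which is why the
patch pulls 16 bytes before reading anything.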

Hi Tom

For sure it helps if this machine is the final host for these packets.

If I am a firewall or router [and not looking into GRE packets], maybe I
don't want to spread all packets received on a tunnel across several CPUs and
reorder them when forwarded.

Maybe we need to add a table, so that an upper layer (GRE or IPIP tunnels)
can instruct __skb_get_rxhash() that we want to deep-inspect packets.

1) Say we keep the first rxhash evaluation as the one we have today.

2) Do a hash lookup in a new table to tell whether an upper layer handled a
previous packet for this first-level flow and wants more inspection.

3) The table could contain 'pointers' to decoding functions that would
compute a new rxhash.

4) Find a way to "clean the table"; garbage collection or expiration times
would do.

This way we can add stuff in the GRE and IPIP modules [and other kinds of
tunnels] without layer violations?
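
To make the four steps concrete, here is a rough userspace sketch of such a
table. Everything in it is an assumption for illustration: the di_* names,
the fixed 256-slot direct-mapped layout, the caller-supplied clock, and the
stand-in decoder. It ignores locking and is not a proposal for the actual
kernel data structure.

```c
#include <stddef.h>
#include <stdint.h>

#define DI_SLOTS 256u   /* must be a power of two */

/* Step 3: a decoder recomputes the hash from the inner packet. */
typedef uint32_t (*di_decode_fn)(const uint8_t *pkt, size_t len);

struct di_entry {
    uint32_t     outer_hash;  /* first-level rxhash (step 1) */
    uint64_t     expires;     /* expiry time in caller-defined ticks */
    di_decode_fn decode;      /* NULL marks an empty slot */
};

struct di_table {
    struct di_entry slot[DI_SLOTS];
};

/* Called by the tunnel layer after it has handled a packet of this flow. */
static void di_register(struct di_table *t, uint32_t outer_hash,
                        di_decode_fn fn, uint64_t now, uint64_t ttl)
{
    struct di_entry *e = &t->slot[outer_hash & (DI_SLOTS - 1)];

    e->outer_hash = outer_hash;
    e->decode = fn;
    e->expires = now + ttl;
}

/* Step 2: look up the first-level hash; fall back to it on a miss. */
static uint32_t di_rxhash(struct di_table *t, uint32_t outer_hash,
                          const uint8_t *pkt, size_t len, uint64_t now)
{
    struct di_entry *e = &t->slot[outer_hash & (DI_SLOTS - 1)];

    if (!e->decode || e->outer_hash != outer_hash)
        return outer_hash;
    if (now >= e->expires) {   /* step 4: lazy expiration */
        e->decode = NULL;
        return outer_hash;
    }
    return e->decode(pkt, len);
}

/* Stand-in decoder: treats the first four bytes as the inner hash. */
static uint32_t di_demo_decode(const uint8_t *pkt, size_t len)
{
    if (len < 4)
        return 0;
    return ((uint32_t)pkt[0] << 24) | ((uint32_t)pkt[1] << 16) |
           ((uint32_t)pkt[2] << 8) | (uint32_t)pkt[3];
}
```

A router that never terminates the tunnel would never call di_register(), so
its packets keep the first-level hash and stay on one CPU; only flows that
the tunnel module has seen get the deeper hash.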


* Re: [Bridge] [Patch] bridge: call NETDEV_ENSLAVE notifiers when adding a slave
From: Stephen Hemminger @ 2011-05-19 16:04 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Amerigo Wang, Neil Horman, bridge, netdev, Jay Vosburgh,
	linux-kernel, akpm, David S. Miller
In-Reply-To: <20110519081213.15c05da2@nehalam>

On Thu, 19 May 2011 08:12:13 -0700
Stephen Hemminger <shemminger@linux-foundation.org> wrote:

> On Thu, 19 May 2011 18:24:17 +0800
> Amerigo Wang <amwang@redhat.com> wrote:
> 
> > In the previous patch I added NETDEV_ENSLAVE, now
> > we can notify netconsole when adding a device to a bridge too.
> > 
> > By the way, s/netdev_bonding_change/call_netdevice_notifiers/ in
> > bond_main.c, since this is not bonding specific.
> > 
> > Signed-off-by: WANG Cong <amwang@redhat.com>
> > Cc: Neil Horman <nhorman@redhat.com>
> > 
> 
> Is there a usage for this? What listens for this notification?

Never mind, it was in the first patch, which you did not send.
You should always put a number on each patch in a group and send
them to all parties.

Also, sending networking patches to LKML is a waste of bandwidth;
please don't bother.

