Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [SOCK] Avoid integer divides where not necessary in include/net/sock.h
From: David Miller @ 2007-12-21  9:55 UTC (permalink / raw)
  To: dada1; +Cc: netdev
In-Reply-To: <476B5AC0.9060604@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Fri, 21 Dec 2007 07:18:40 +0100

> Because sk_wmem_queued, sk_sndbuf are signed, a divide per two
> forces compiler to use an integer divide. We can instead use
> a right shift.
> 
> SK_STREAM_MEM_QUANTUM deserves to be declared as an unsigned
> quantity, so that sk_stream_pages() and __sk_stream_mem_reclaim()
> can use right shifts instead of integer divides.
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Everything that works with SK_STREAM_MEM_QUANTUM is an int.

We do a DIV_ROUND_UP() using SK_STREAM_MEM_QUANTUM and an int.

We do sk->sk_forward_alloc modifications using multiplies on
SK_STREAM_MEM_QUANTUM and an int, sk_forward_alloc is an int
too.

Changing the type of SK_STREAM_MEM_QUANTUM does nothing more than
create an exception in the typing used in these operations for no real
gain, in fact it makes this code's correctness harder to verify.

I'm fine with the rest of your change, so please resubmit this patch
with the type change removed.

Thanks.

^ permalink raw reply

* Re: [TCP] tcp_write_timeout.c cleanup
From: David Miller @ 2007-12-21  9:51 UTC (permalink / raw)
  To: dada1; +Cc: netdev
In-Reply-To: <476B5581.6040603@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Fri, 21 Dec 2007 06:56:17 +0100

> Before submiting a patch to change a divide to a right shift, I felt
> necessary to create a helper function tcp_mtu_probing() to reduce length of 
> lines exceeding 100 chars in tcp_write_timeout().
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Thanks for not mixing code cleanups with functional changes,
it makes my life so much easier and make me so much want to
review and apply your patches.

^ permalink raw reply

* Fwd: Increasing initial cwnd size
From: Sourav Chakraborty @ 2007-12-21  9:49 UTC (permalink / raw)
  To: netdev
In-Reply-To: <a5ed8fe60712202358o7afc0bddhe77447d0d77631d8@mail.gmail.com>

---------- Forwarded message ----------
From: Sourav Chakraborty <sourav1877@gmail.com>
Date: Dec 21, 2007 1:28 PM
Subject: Increasing initial cwnd size
To: linux-kernel@vger.kernel.org

Hello,

We are trying to test with kernel version 2.6.20,the RFC 3390
implementation of increased initial congestion window size(~4 when
MSS=1460 bytes).We are not getting consistent results
meaning,sometimes we see 4 data pkts sent without an ACK received and
sometimes we see 3.Also we would like to test by setting the initial
cwnd to a value more than 4.Can you guys advise us on how to do
that(where in the code should we change and what)?Also mention if
using linux 2.6.23 kernel instead of  the one we are using,would help
in this work.

Thanks and Regards
Sourav

^ permalink raw reply

* Re: [INET] Avoid an integer divide in rt_garbage_collect()
From: David Miller @ 2007-12-21  9:49 UTC (permalink / raw)
  To: dada1; +Cc: netdev
In-Reply-To: <476B7762.30908@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Fri, 21 Dec 2007 09:20:50 +0100

> Since 'goal' is a signed int, compiler may emit an integer divide
> to compute goal/2.
> 
> Using a right shift is OK here and less expensive.
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Applied, thanks.

^ permalink raw reply

* Re: [TCP]: Convert several length variable to unsigned.
From: David Miller @ 2007-12-21  9:47 UTC (permalink / raw)
  To: yoshfuji; +Cc: netdev
In-Reply-To: <20071221.164811.126977620.yoshfuji@linux-ipv6.org>

From: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Date: Fri, 21 Dec 2007 16:48:11 +0900 (JST)

> Several length variables cannot be negative, so convert int to
> unsigned int.  This also allows us to do sane shift operations
> on those variables.
> 
> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

Applied to net-2.6.25, thanks.

^ permalink raw reply

* Re: [TCP] IPV6 : Change a divide into a right shift in tcp_v6_send_ack()
From: David Miller @ 2007-12-21  9:46 UTC (permalink / raw)
  To: dada1; +Cc: yoshfuji, netdev
In-Reply-To: <476B6DAC.2030102@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Fri, 21 Dec 2007 08:39:24 +0100

> I didnt chose this path, because David was against changing some
> fields from 'int' to 'unsigned'. If you look in other parts of
> networking, we have many >> 1 or >> 2 already there.

I don't remember making this statement, where did I say this?
I'm genuinely curious :-)

But I am indeed against it in cases where the variable is compared
against signed quantities.

I think the shifts are more pretty and more closely match what the
programmer wants to come out of the compiler.  Getting all of these
divides is an awful surprise for me.

I've learned over the years to never explicitly code divides
or multiplies by powers of two and to always use shifts.  As
a result I am never surprised.  In fact I've been burnt every
time I mistakedly didn't use a shift.

Nevertheless, this tcplen arg is always assigned to and used
with unsigned quantities so I'll apply Yoshifuji's version of
the fix.

^ permalink raw reply

* Re: [PATCH 2/3] XFRM: RFC4303 compliant auditing
From: David Miller @ 2007-12-21  9:43 UTC (permalink / raw)
  To: paul.moore; +Cc: netdev, linux-audit, latten
In-Reply-To: <20071220214225.12122.48656.stgit@flek.lan>

From: Paul Moore <paul.moore@hp.com>
Date: Thu, 20 Dec 2007 16:42:25 -0500

> This patch adds a number of new IPsec audit events to meet the auditing
> requirements of RFC4303.  This includes audit hooks for the following events:
> 
>  * Could not find a valid SA [sections 2.1, 3.4.2]
>    . xfrm_audit_state_notfound()
>    . xfrm_audit_state_notfound_simple()
> 
>  * Sequence number overflow [section 3.3.3]
>    . xfrm_audit_state_replay_overflow()
> 
>  * Replayed packet [section 3.4.3]
>    . xfrm_audit_state_replay()
> 
>  * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
>    . xfrm_audit_state_icvfail()
> 
> While RFC4304 deals only with ESP most of the changes in this patch apply to
> IPsec in general, i.e. both AH and ESP.  The one case, integrity check
> failure, where ESP specific code had to be modified the same was done to the
> AH code for the sake of consistency.
> 
> Signed-off-by: Paul Moore <paul.moore@hp.com>

This doesn't apply at all to net-2.6.25, in particular
xfrm6_input_addr() doesn't even have a local variable
named "xfrm_vec_one" let alone the conditional where you're
adding the state notfound audit hook.

Please respin this and the third patch, thanks.

^ permalink raw reply

* Re: [PATCH] OOPS with NETLINK_FIB_LOOKUP netlink socket
From: Denis V. Lunev @ 2007-12-21  9:39 UTC (permalink / raw)
  To: David Miller; +Cc: den, devel, netdev, kaber, kuznet
In-Reply-To: <20071221.013321.164620125.davem@davemloft.net>

David Miller wrote:
> From: "Denis V. Lunev" <den@openvz.org>
> Date: Fri, 21 Dec 2007 12:00:43 +0300
> 
>> nl_fib_input re-reuses incoming skb to send the reply. This means that this
>> packet will be freed twice, namely in:
>> - netlink_unicast_kernel
>> - on receive path
>> Use clone to send as a cure, the caller is responsible for kfree_skb on error.
>>
>> Thanks to Alexey Dobryan, who originally found the problem.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
> 
> What introduced this bug?  This code didn't have this
> problem previously.
> 

commit cd40b7d3983c708aabe3d3008ec64ffce56d33b0
Author: Denis V. Lunev <den@openvz.org>
Date:   Wed Oct 10 21:15:29 2007 -0700

^ permalink raw reply

* Re: TSO trimming question
From: David Miller @ 2007-12-21  9:36 UTC (permalink / raw)
  To: herbert; +Cc: billfink, ilpo.jarvinen, netdev, jheffner
In-Reply-To: <20071221092927.GA32434@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 21 Dec 2007 17:29:27 +0800

> On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote:
> >
> > It's two shifts, and this gets scheduled along with the other
> > instructions on many cpus so it's effectively free.
> > 
> > I don't see why this is even worth mentioning and discussing.
> 
> I totally agree.  Two shifts are way better than a branch.

We take probably a thousand+ 100+ cycle cache misses in the TCP stack
on big window openning ACKs.

Instead of discussing ways to solve that huge performance killer we're
wanking about two friggin' integer shifts.

It's hilarious isn't it? :-)


^ permalink raw reply

* Re: [PATCH] OOPS with NETLINK_FIB_LOOKUP netlink socket
From: David Miller @ 2007-12-21  9:33 UTC (permalink / raw)
  To: den; +Cc: devel, netdev, kaber, kuznet
In-Reply-To: <20071221090043.GA25484@iris.sw.ru>

From: "Denis V. Lunev" <den@openvz.org>
Date: Fri, 21 Dec 2007 12:00:43 +0300

> nl_fib_input re-reuses incoming skb to send the reply. This means that this
> packet will be freed twice, namely in:
> - netlink_unicast_kernel
> - on receive path
> Use clone to send as a cure, the caller is responsible for kfree_skb on error.
> 
> Thanks to Alexey Dobryan, who originally found the problem.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>

What introduced this bug?  This code didn't have this
problem previously.

^ permalink raw reply

* Re: TSO trimming question
From: Herbert Xu @ 2007-12-21  9:29 UTC (permalink / raw)
  To: David Miller; +Cc: billfink, ilpo.jarvinen, netdev, jheffner
In-Reply-To: <20071221.012720.67812056.davem@davemloft.net>

On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote:
>
> It's two shifts, and this gets scheduled along with the other
> instructions on many cpus so it's effectively free.
> 
> I don't see why this is even worth mentioning and discussing.

I totally agree.  Two shifts are way better than a branch.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: TSO trimming question
From: Ilpo Järvinen @ 2007-12-21  9:28 UTC (permalink / raw)
  To: Bill Fink; +Cc: David Miller, Netdev, Herbert Xu, John Heffner
In-Reply-To: <Pine.LNX.4.64.0712211055550.31652@kivilampi-30.cs.helsinki.fi>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 717 bytes --]

On Fri, 21 Dec 2007, Ilpo Järvinen wrote:

> On Fri, 21 Dec 2007, Bill Fink wrote:
> 
> > If so it seems like a lot of unnecessary
> > work just to avoid a 1 in 4 billion event, since it's my understanding
> > that the whole tcp_tso_should_defer function is just an optimization
> > and not a criticality to the proper functioning of TCP, especially
> > considering it hasn't even been executing at all up to now.
> 
> It would still be good to not return 1 in that case we didn't flag the 
> deferral, how about adding one additional tick (+comment) in the zero 
> case making tso_deferred == 0 again unambiguously defined, e.g.:
> 
> 	tp->tso_deferred = min_t(u32, jiffies, 1);

Blah, max_t of course :-).


-- 
 i.

^ permalink raw reply

* Re: TSO trimming question
From: David Miller @ 2007-12-21  9:27 UTC (permalink / raw)
  To: billfink; +Cc: ilpo.jarvinen, netdev, herbert, jheffner
In-Reply-To: <20071221030648.389669c4.billfink@mindspring.com>

From: Bill Fink <billfink@mindspring.com>
Date: Fri, 21 Dec 2007 03:06:48 -0500

> What's with all the shifting back and forth?  Here with:
> 
> 	((jiffies<<1)>>1) - (tp->tso_deferred>>1)
> 
> and later with:
> 
> 	/* Ok, it looks like it is advisable to defer.  */
> 	tp->tso_deferred = 1 | (jiffies<<1);
> 
> Is this just done to try and avoid the special case of jiffies==0 
> when the jiffies wrap?  If so it seems like a lot of unnecessary
> work just to avoid a 1 in 4 billion event, since it's my understanding
> that the whole tcp_tso_should_defer function is just an optimization
> and not a criticality to the proper functioning of TCP, especially
> considering it hasn't even been executing at all up to now.

How else would you avoid the incorrect result when jiffies is
indeed zero?

It's two shifts, and this gets scheduled along with the other
instructions on many cpus so it's effectively free.

I don't see why this is even worth mentioning and discussing.

^ permalink raw reply

* Re: TSO trimming question
From: Ilpo Järvinen @ 2007-12-21  9:26 UTC (permalink / raw)
  To: Bill Fink; +Cc: David Miller, Netdev, Herbert Xu, John Heffner
In-Reply-To: <20071221030648.389669c4.billfink@mindspring.com>

On Fri, 21 Dec 2007, Bill Fink wrote:

> I meant to ask about this a while back but then got distracted by
> other things.  But now since the subject has come up, I had a couple
> of more questions about this code.
> 
> What's with all the shifting back and forth?  Here with:
> 
> 	((jiffies<<1)>>1) - (tp->tso_deferred>>1)
> 
> and later with:
> 
> 	/* Ok, it looks like it is advisable to defer.  */
> 	tp->tso_deferred = 1 | (jiffies<<1);
> 
> Is this just done to try and avoid the special case of jiffies==0 
> when the jiffies wrap? 

I thought that it must be the reason. I couldn't figure out any other 
explination while thinking the same thing but since I saw no problem in 
that rather weird approach, I didn't want to touch that in a patch which 
had net-2.6 (or stable) potential.

> If so it seems like a lot of unnecessary
> work just to avoid a 1 in 4 billion event, since it's my understanding
> that the whole tcp_tso_should_defer function is just an optimization
> and not a criticality to the proper functioning of TCP, especially
> considering it hasn't even been executing at all up to now.

It would still be good to not return 1 in that case we didn't flag the 
deferral, how about adding one additional tick (+comment) in the zero 
case making tso_deferred == 0 again unambiguously defined, e.g.:

	tp->tso_deferred = min_t(u32, jiffies, 1);

...I'm relatively sure that nobody would ever notice that tick :-) and we 
kept return value consistent with tso_deferred state invariant.

> My second question is more basic and if I'm not mistaken actually
> relates to a remaining bug in the (corrected) test:
> 
> 	/* Defer for less than two clock ticks. */
> 	if (tp->tso_deferred &&
> 	    ((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1)
> 
> Since jiffies is an unsigned long, which is 64-bits on a 64-bit system,
> whereas tp->tso_deferred is a u32, once jiffies exceeds 31-bits, which
> will happen in about 25 days if HZ=1000, won't the second part of the
> test always be true after that?  Or am I missing something obvious?

I didn't notice that earlier but I think you're right though my knowledge 
about jiffies and such is quite limited.

...Feel free to submit a patch to correct these.


-- 
 i.

^ permalink raw reply

* Re: [PATCH] PS3: gelic: Add wireless support for PS3
From: Masakazu Mokuno @ 2007-12-21  9:26 UTC (permalink / raw)
  To: David Woodhouse
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	geoffrey.levand-mEdOJwZ7QcZBDgjK7y7TUQ, Geert Uytterhoeven
In-Reply-To: <1197614933.3798.36.camel-Fexsq3y4057IgHVZqg5X0TlWvGAXklZc@public.gmane.org>

	Hi David,

On Fri, 14 Dec 2007 01:48:52 -0500
David Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:

> --- linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.c~	2007-12-14 01:31:50.000000000 -0500
> +++ linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.c	2007-12-14 01:39:25.000000000 -0500
> @@ -1139,7 +1139,7 @@ static irqreturn_t gelic_card_interrupt(
>   *
>   * see Documentation/networking/netconsole.txt
>   */
> -static void gelic_net_poll_controller(struct net_device *netdev)
> +void gelic_net_poll_controller(struct net_device *netdev)
>  {
>  	struct gelic_card *card = netdev_card(netdev);
>  
> --- linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.h~	2007-12-14 01:31:50.000000000 -0500
> +++ linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.h	2007-12-14 01:39:47.000000000 -0500
> @@ -346,6 +346,7 @@ extern void gelic_net_tx_timeout(struct 
>  extern int gelic_net_change_mtu(struct net_device *netdev, int new_mtu);
>  extern int gelic_net_setup_netdev(struct net_device *netdev,
>  				  struct gelic_card *card);
> +extern void gelic_net_poll_controller(struct net_device *netdev);
>  
>  /* shared ethtool ops */
>  extern void gelic_net_get_drvinfo(struct net_device *netdev,


Thank you for your reviewing.  I'll fix it.

-- 
Masakazu MOKUNO

^ permalink raw reply

* [PATCH] OOPS with NETLINK_FIB_LOOKUP netlink socket
From: Denis V. Lunev @ 2007-12-21  9:00 UTC (permalink / raw)
  To: davem; +Cc: devel, netdev, kaber, kuznet

nl_fib_input re-reuses incoming skb to send the reply. This means that this
packet will be freed twice, namely in:
- netlink_unicast_kernel
- on receive path
Use clone to send as a cure, the caller is responsible for kfree_skb on error.

Thanks to Alexey Dobryan, who originally found the problem.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index b03c4c6..ac6238a 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -832,10 +832,13 @@ static void nl_fib_input(struct sk_buff *skb)
 
 	nlh = nlmsg_hdr(skb);
 	if (skb->len < NLMSG_SPACE(0) || skb->len < nlh->nlmsg_len ||
-	    nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*frn))) {
-		kfree_skb(skb);
+	    nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*frn)))
 		return;
-	}
+
+	skb = skb_clone(skb, GFP_KERNEL);
+	if (skb == NULL)
+		return;
+	nlh = nlmsg_hdr(skb);
 
 	frn = (struct fib_result_nl *) NLMSG_DATA(nlh);
 	tb = fib_get_table(frn->tb_id_in);

^ permalink raw reply related

* Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2007-12-21  8:53 UTC (permalink / raw)
  To: satoru.satoh, shemminger; +Cc: netdev, yoshfuji
In-Reply-To: <d72b7ace0712201824k3754962cse377256cb5452b8a@mail.gmail.com>

In article <d72b7ace0712201824k3754962cse377256cb5452b8a@mail.gmail.com> (at Fri, 21 Dec 2007 11:24:54 +0900), "Satoru SATOH" <satoru.satoh@gmail.com> says:

> 2007/12/21, Jarek Poplawski <jarkao2@gmail.com>:
> > Jarek Poplawski wrote, On 12/20/2007 09:24 PM:
> > ...
> >
> > > but since it's your patch, I hope you do some additional checking
> > > if it's always like this...
> >
> >
> > ...or maybe only changing this all a little bit will make it look safer!
> >
> > Jarek P.
> 
> 
> OK, how about this?
> 
> Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com>
> 
>  ip/iproute.c |   12 ++++++++----
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/ip/iproute.c b/ip/iproute.c
> index f4200ae..c771b34 100644
> --- a/ip/iproute.c
> +++ b/ip/iproute.c
> @@ -510,16 +510,20 @@ int print_route(const struct sockaddr_nl *who,
> struct nlmsghdr *n, void *arg)
>  				fprintf(fp, " %u", *(unsigned*)RTA_DATA(mxrta[i]));
>  			else {
>  				unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
> +				unsigned hz1 = hz;
> +				if (hz1 > 1000)

Why don't you simply use unsigned long long (or maybe uint64_t) here?

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

--- 
diff --git a/ip/iproute.c b/ip/iproute.c
index f4200ae..db9a3b6 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -509,16 +509,21 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 			    i != RTAX_RTO_MIN)
 				fprintf(fp, " %u", *(unsigned*)RTA_DATA(mxrta[i]));
 			else {
-				unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
+				unsigned long long val = *(unsigned*)RTA_DATA(mxrta[i]);
+				unsigned div = 1;
 
-				val *= 1000;
 				if (i == RTAX_RTT)
-					val /= 8;
+					div = 8;
 				else if (i == RTAX_RTTVAR)
-					val /= 4;
-				if (val >= hz)
-					fprintf(fp, " %ums", val/hz);
+					div = 4;
 				else
+					div = 1;
+
+				val = val * 1000ULL / div;
+
+				if (val >= hz) {
+					fprintf(fp, " %llums", val/hz);
+				} else
 					fprintf(fp, " %.2fms", (float)val/hz);
 			}
 		}

--yoshfuji

^ permalink raw reply related

* Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly
From: Jarek Poplawski @ 2007-12-21  8:34 UTC (permalink / raw)
  To: Satoru SATOH; +Cc: netdev
In-Reply-To: <d72b7ace0712201824k3754962cse377256cb5452b8a@mail.gmail.com>

On 21-12-2007 03:24, Satoru SATOH wrote:
> 2007/12/21, Jarek Poplawski <jarkao2@gmail.com>:
>> Jarek Poplawski wrote, On 12/20/2007 09:24 PM:
>> ...
>>
>>> but since it's your patch, I hope you do some additional checking
>>> if it's always like this...
>>
>> ...or maybe only changing this all a little bit will make it look safer!
>>
>> Jarek P.
> 
> 
> OK, how about this?
> 
> Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com>
> 
>  ip/iproute.c |   12 ++++++++----
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/ip/iproute.c b/ip/iproute.c
> index f4200ae..c771b34 100644
> --- a/ip/iproute.c
> +++ b/ip/iproute.c
> @@ -510,16 +510,20 @@ int print_route(const struct sockaddr_nl *who,
> struct nlmsghdr *n, void *arg)
>  				fprintf(fp, " %u", *(unsigned*)RTA_DATA(mxrta[i]));
>  			else {
>  				unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
> +				unsigned hz1 = hz;
> +				if (hz1 > 1000)

Looks OK (safe) to me: it's compatible both with old and new way.
I'd only suggest to maybe change this to '(hz1 > 1024)', because
it's the biggest HZ currently in the kernel, so this compatibility
should be 100%. I think, you could leave 1 empty line before this
'if', as well. (Btw., aren't these overflows connected with
CONFIG_HIGH_RES_TIMERS?)

On the other hand this 'hz' still looks 'strange' here - I don't
understand, why, a bit earlier it's:

	if (!hz)
		hz = get_hz();

while 'else' would use: hz == get_user_hz();
So, probably I miss something, but even after your patch, there
could be different outputs here...

Thanks,
Jarek P.

PS: did you CC Stephen Hemminger on this?

> +					hz1 /= 1000;
> +				else
> +					val *= 1000;
> 
> -				val *= 1000;
>  				if (i == RTAX_RTT)
>  					val /= 8;
>  				else if (i == RTAX_RTTVAR)
>  					val /= 4;
> -				if (val >= hz)
> -					fprintf(fp, " %ums", val/hz);
> +				if (val >= hz1)
> +					fprintf(fp, " %ums", val/hz1);
>  				else
> -					fprintf(fp, " %.2fms", (float)val/hz);
> +					fprintf(fp, " %.2fms", (float)val/hz1);
>  			}
>  		}
>  	}
> 
> 
> Thanks,
> Satoru SATOH

^ permalink raw reply

* [INET] Avoid an integer divide in rt_garbage_collect()
From: Eric Dumazet @ 2007-12-21  8:20 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 189 bytes --]

Since 'goal' is a signed int, compiler may emit an integer divide
to compute goal/2.

Using a right shift is OK here and less expensive.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>


[-- Attachment #2: route_divide.patch --]
[-- Type: text/plain, Size: 789 bytes --]

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e35076e..10915bb 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -851,14 +851,14 @@ static int rt_garbage_collect(void)
 			equilibrium = ipv4_dst_ops.gc_thresh;
 		goal = atomic_read(&ipv4_dst_ops.entries) - equilibrium;
 		if (goal > 0) {
-			equilibrium += min_t(unsigned int, goal / 2, rt_hash_mask + 1);
+			equilibrium += min_t(unsigned int, goal >> 1, rt_hash_mask + 1);
 			goal = atomic_read(&ipv4_dst_ops.entries) - equilibrium;
 		}
 	} else {
 		/* We are in dangerous area. Try to reduce cache really
 		 * aggressively.
 		 */
-		goal = max_t(unsigned int, goal / 2, rt_hash_mask + 1);
+		goal = max_t(unsigned int, goal >> 1, rt_hash_mask + 1);
 		equilibrium = atomic_read(&ipv4_dst_ops.entries) - goal;
 	}
 

^ permalink raw reply related

* Re: [TCP] IPV6 : Change a divide into a right shift in tcp_v6_send_ack()
From: Eric Dumazet @ 2007-12-21  8:11 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki / 吉藤英明; +Cc: davem, netdev
In-Reply-To: <20071221.162833.82587283.yoshfuji@linux-ipv6.org>

YOSHIFUJI Hideaki / 吉藤英明 a écrit :
> In article <476B65F8.10201@cosmosbay.com> (at Fri, 21 Dec 2007 08:06:32 +0100), Eric Dumazet <dada1@cosmosbay.com> says:
> 
>> YOSHIFUJI Hideaki / 吉藤英明 a écrit :
>>> In article <476B574E.80601@cosmosbay.com> (at Fri, 21 Dec 2007 07:03:58 +0100), Eric Dumazet <dada1@cosmosbay.com> says:
>>>
>>>> Because tot_len is signed in tcp_v6_send_ack(), tot_len/4 forces compiler
>>>> to emit an integer divide, while we can help it to use a right shift,
>>>> less expensive.
>>> Are you really sure?
>>> At least, gcc-4.1.2-20061115 (debian) does not make any difference.
>>>
>>> And, IMHO, because shift for signed variable is fragile, so we should
>>> avoid using it.
>>>
>> Yes I am sure, but maybe you are on x86_64 ?
>>
>> gcc-4.2.2 on x86
> 
> I'm on gcc-4.1.2 20061115 (prerelease) (Debian 4.1.1-21), on x86 (i686).
> Maybe compiler difference?!

hum, more probably a .config difference. Try with

CONFIG_CC_OPTIMIZE_FOR_SIZE=Y

(default config)



^ permalink raw reply

* Re: TSO trimming question
From: Bill Fink @ 2007-12-21  8:06 UTC (permalink / raw)
  To: David Miller; +Cc: ilpo.jarvinen, netdev, herbert, jheffner
In-Reply-To: <20071220.035621.147230372.davem@davemloft.net>

On Thu, 20 Dec 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)
> 
> > [PATCH] [TCP]: Fix TSO deferring
> > 
> > I'd say that most of what tcp_tso_should_defer had in between
> > there was dead code because of this.
> > 
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> 
> Yikes!!!!!
> 
> John, we've been living a lie for more than a year. :-/
> 
> On the bright side this explains a lot of small TSO frames I've been
> seeing in traces over the past year but never got a chance to
> investigate.
> 
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 8dafda9..693b9f6 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -1217,7 +1217,8 @@ static int tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb)
> >  		goto send_now;
> >  
> >  	/* Defer for less than two clock ticks. */
> > -	if (!tp->tso_deferred && ((jiffies<<1)>>1) - (tp->tso_deferred>>1) > 1)
> > +	if (tp->tso_deferred &&
> > +	    ((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1)
> >  		goto send_now;
> >  
> >  	in_flight = tcp_packets_in_flight(tp);

I meant to ask about this a while back but then got distracted by
other things.  But now since the subject has come up, I had a couple
of more questions about this code.

What's with all the shifting back and forth?  Here with:

	((jiffies<<1)>>1) - (tp->tso_deferred>>1)

and later with:

	/* Ok, it looks like it is advisable to defer.  */
	tp->tso_deferred = 1 | (jiffies<<1);

Is this just done to try and avoid the special case of jiffies==0 
when the jiffies wrap?  If so it seems like a lot of unnecessary
work just to avoid a 1 in 4 billion event, since it's my understanding
that the whole tcp_tso_should_defer function is just an optimization
and not a criticality to the proper functioning of TCP, especially
considering it hasn't even been executing at all up to now.

My second question is more basic and if I'm not mistaken actually
relates to a remaining bug in the (corrected) test:

	/* Defer for less than two clock ticks. */
	if (tp->tso_deferred &&
	    ((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1)

Since jiffies is an unsigned long, which is 64-bits on a 64-bit system,
whereas tp->tso_deferred is a u32, once jiffies exceeds 31-bits, which
will happen in about 25 days if HZ=1000, won't the second part of the
test always be true after that?  Or am I missing something obvious?

						-Bill

^ permalink raw reply

* Re: [TCP] IPV6 : Change a divide into a right shift in tcp_v6_send_ack()
From: Eric Dumazet @ 2007-12-21  7:50 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki / 吉藤英明; +Cc: davem, netdev
In-Reply-To: <20071221.164458.51184496.yoshfuji@linux-ipv6.org>

YOSHIFUJI Hideaki / 吉藤英明 a écrit :
> In article <476B6DAC.2030102@cosmosbay.com> (at Fri, 21 Dec 2007 08:39:24 +0100), Eric Dumazet <dada1@cosmosbay.com> says:
> 
>>> Okay, anyway, I'll convert them to unsigned int, which is more
>>> appropriate.
>> I didnt chose this path, because David was against changing some fields from 
>> 'int' to 'unsigned'. If you look in other parts of networking, we have many >> 
>> 1 or >> 2 already there.
> 
> I do think it is safe to convert them here.
> 

Sure :)




^ permalink raw reply

* [TCP]: Convert several length variable to unsigned.
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2007-12-21  7:48 UTC (permalink / raw)
  To: davem; +Cc: netdev, yoshfuji

Several length variables cannot be negative, so convert int to
unsigned int.  This also allows us to do sane shift operations
on those variables.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

--
diff --git a/include/net/tcp.h b/include/net/tcp.h
index cb5b033..f663a85 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1153,7 +1153,8 @@ extern int			tcp_v4_calc_md5_hash(char *md5_hash,
 						     struct dst_entry *dst,
 						     struct request_sock *req,
 						     struct tcphdr *th,
-						     int protocol, int tcplen);
+						     int protocol,
+						     unsigned int tcplen);
 extern struct tcp_md5sig_key	*tcp_v4_md5_lookup(struct sock *sk,
 						   struct sock *addr_sk);
 
@@ -1359,7 +1360,8 @@ struct tcp_sock_af_ops {
 						  struct dst_entry *dst,
 						  struct request_sock *req,
 						  struct tcphdr *th,
-						  int protocol, int len);
+						  int protocol,
+						  unsigned int len);
 	int			(*md5_add) (struct sock *sk,
 					    struct sock *addr_sk,
 					    u8 *newkey,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 652c323..601b4ca 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -99,7 +99,7 @@ static struct tcp_md5sig_key *tcp_v4_md5_do_lookup(struct sock *sk,
 static int tcp_v4_do_calc_md5_hash(char *md5_hash, struct tcp_md5sig_key *key,
 				   __be32 saddr, __be32 daddr,
 				   struct tcphdr *th, int protocol,
-				   int tcplen);
+				   unsigned int tcplen);
 #endif
 
 struct inet_hashinfo __cacheline_aligned tcp_hashinfo = {
@@ -1020,7 +1020,7 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char __user *optval,
 static int tcp_v4_do_calc_md5_hash(char *md5_hash, struct tcp_md5sig_key *key,
 				   __be32 saddr, __be32 daddr,
 				   struct tcphdr *th, int protocol,
-				   int tcplen)
+				   unsinged int tcplen)
 {
 	struct scatterlist sg[4];
 	__u16 data_len;
@@ -1113,7 +1113,7 @@ int tcp_v4_calc_md5_hash(char *md5_hash, struct tcp_md5sig_key *key,
 			 struct dst_entry *dst,
 			 struct request_sock *req,
 			 struct tcphdr *th, int protocol,
-			 int tcplen)
+			 unsigned int tcplen)
 {
 	__be32 saddr, daddr;
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 93980c3..3b4169c 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -733,7 +733,7 @@ static int tcp_v6_do_calc_md5_hash(char *md5_hash, struct tcp_md5sig_key *key,
 				   struct in6_addr *saddr,
 				   struct in6_addr *daddr,
 				   struct tcphdr *th, int protocol,
-				   int tcplen)
+				   unsigned int tcplen)
 {
 	struct scatterlist sg[4];
 	__u16 data_len;
@@ -818,7 +818,7 @@ static int tcp_v6_calc_md5_hash(char *md5_hash, struct tcp_md5sig_key *key,
 				struct dst_entry *dst,
 				struct request_sock *req,
 				struct tcphdr *th, int protocol,
-				int tcplen)
+				unsigned int tcplen)
 {
 	struct in6_addr *saddr, *daddr;
 
@@ -985,7 +985,7 @@ static void tcp_v6_send_reset(struct sock *sk, struct sk_buff *skb)
 	struct tcphdr *th = tcp_hdr(skb), *t1;
 	struct sk_buff *buff;
 	struct flowi fl;
-	int tot_len = sizeof(*th);
+	unsigned int tot_len = sizeof(*th);
 #ifdef CONFIG_TCP_MD5SIG
 	struct tcp_md5sig_key *key;
 #endif
@@ -1085,7 +1085,7 @@ static void tcp_v6_send_ack(struct tcp_timewait_sock *tw,
 	struct tcphdr *th = tcp_hdr(skb), *t1;
 	struct sk_buff *buff;
 	struct flowi fl;
-	int tot_len = sizeof(struct tcphdr);
+	unsigned int tot_len = sizeof(struct tcphdr);
 	__be32 *topt;
 #ifdef CONFIG_TCP_MD5SIG
 	struct tcp_md5sig_key *key;

^ permalink raw reply related

* Re: [TCP] IPV6 : Change a divide into a right shift in tcp_v6_send_ack()
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2007-12-21  7:44 UTC (permalink / raw)
  To: dada1; +Cc: davem, netdev, yoshfuji
In-Reply-To: <476B6DAC.2030102@cosmosbay.com>

In article <476B6DAC.2030102@cosmosbay.com> (at Fri, 21 Dec 2007 08:39:24 +0100), Eric Dumazet <dada1@cosmosbay.com> says:

> > Okay, anyway, I'll convert them to unsigned int, which is more
> > appropriate.
> 
> I didnt chose this path, because David was against changing some fields from 
> 'int' to 'unsigned'. If you look in other parts of networking, we have many >> 
> 1 or >> 2 already there.

I do think it is safe to convert them here.

--yoshfuji

^ permalink raw reply

* Re: [TCP] IPV6 : Change a divide into a right shift in tcp_v6_send_ack()
From: Eric Dumazet @ 2007-12-21  7:39 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki / 吉藤英明; +Cc: davem, netdev
In-Reply-To: <20071221.162833.82587283.yoshfuji@linux-ipv6.org>

YOSHIFUJI Hideaki / 吉藤英明 a écrit :
> In article <476B65F8.10201@cosmosbay.com> (at Fri, 21 Dec 2007 08:06:32 +0100), Eric Dumazet <dada1@cosmosbay.com> says:
> 
>> YOSHIFUJI Hideaki / 吉藤英明 a écrit :
>>> In article <476B574E.80601@cosmosbay.com> (at Fri, 21 Dec 2007 07:03:58 +0100), Eric Dumazet <dada1@cosmosbay.com> says:
>>>
>>>> Because tot_len is signed in tcp_v6_send_ack(), tot_len/4 forces compiler
>>>> to emit an integer divide, while we can help it to use a right shift,
>>>> less expensive.
>>> Are you really sure?
>>> At least, gcc-4.1.2-20061115 (debian) does not make any difference.
>>>
>>> And, IMHO, because shift for signed variable is fragile, so we should
>>> avoid using it.
>>>
>> Yes I am sure, but maybe you are on x86_64 ?
>>
>> gcc-4.2.2 on x86
> 
> I'm on gcc-4.1.2 20061115 (prerelease) (Debian 4.1.1-21), on x86 (i686).
> Maybe compiler difference?!
> 
>> If you think tot_len can be negative, I understand you can be against this 
>> patch. But I am sure it's allways > 0, even if I am a total ipv6 newbie :)
> 
> Okay, anyway, I'll convert them to unsigned int, which is more
> appropriate.

I didnt chose this path, because David was against changing some fields from 
'int' to 'unsigned'. If you look in other parts of networking, we have many >> 
1 or >> 2 already there.



^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox