Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/2] net: reimplement completion_queue as a FIFO queue
From: Eric Dumazet @ 2010-04-26  7:21 UTC (permalink / raw)
  To: David Miller; +Cc: xiaosuo, netdev
In-Reply-To: <20100425.232758.101478192.davem@davemloft.net>

Le dimanche 25 avril 2010 à 23:27 -0700, David Miller a écrit :

> No matter what is faster, we have to process packets in
> FIFO order, otherwise we get reordering within a flow
> which is to be absolutely avoided.

Hmm, completion queue is about freeing skb, and its name is misleading.

This queue is feed by dev_kfree_skb_irq() only




^ permalink raw reply

* Re: [PATCH 1/2] net: reimplement completion_queue as a FIFO queue
From: David Miller @ 2010-04-26  7:22 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiaosuo, netdev
In-Reply-To: <1272266509.2346.2.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 26 Apr 2010 09:21:49 +0200

> Le dimanche 25 avril 2010 à 23:27 -0700, David Miller a écrit :
> 
>> No matter what is faster, we have to process packets in
>> FIFO order, otherwise we get reordering within a flow
>> which is to be absolutely avoided.
> 
> Hmm, completion queue is about freeing skb, and its name is misleading.
> 
> This queue is feed by dev_kfree_skb_irq() only

Ok I see, thanks for explaining.

^ permalink raw reply

* Re: RPS and forwarding
From: Eric Dumazet @ 2010-04-26  7:25 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, therbert, netdev
In-Reply-To: <20100426063136.GA22401@gondor.apana.org.au>

Le lundi 26 avril 2010 à 14:31 +0800, Herbert Xu a écrit :
> On Sun, Apr 25, 2010 at 11:28:57PM -0700, David Miller wrote:

> 
> > It might be possible to somehow make the sock table get bypassed for
> > forwarded traffic, but I can't think of a cheap way to do that at
> > the moment.
> 
> Yeah I thought about that too but as we need to go through the
> routing table before we know whether something is forwarded or
> not, it certainly seems non-trivial.
> 
> Hmm, maybe if we used a routing cache keyed by rxhash we could
> make an appromixation? After all, we only need to make sure that
> no forwarded traffic is redirected, and not the reverse.  IOW,
> it's OK if we incorrectly classify some local traffic as forwarded.
> 

RFS is already working like that, if RPS is not used in conjonction.

Forwarded traffic will see a non tagged rfs flow entry, so this skb will
be processed by this cpu, not a remote one.

Currently, only an application can set a RFS flow entry.




^ permalink raw reply

* Re: RPS and forwarding
From: Herbert Xu @ 2010-04-26  7:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, therbert, netdev
In-Reply-To: <1272266716.2346.5.camel@edumazet-laptop>

On Mon, Apr 26, 2010 at 09:25:16AM +0200, Eric Dumazet wrote:
>
> RFS is already working like that, if RPS is not used in conjonction.
> 
> Forwarded traffic will see a non tagged rfs flow entry, so this skb will
> be processed by this cpu, not a remote one.
> 
> Currently, only an application can set a RFS flow entry.

I was referring to the case when a forwarded flow gets hashed to
the same index as a local flow.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: RPS and forwarding
From: Eric Dumazet @ 2010-04-26  7:38 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, therbert, netdev
In-Reply-To: <20100426073017.GA22803@gondor.apana.org.au>

Le lundi 26 avril 2010 à 15:30 +0800, Herbert Xu a écrit :
> On Mon, Apr 26, 2010 at 09:25:16AM +0200, Eric Dumazet wrote:
> >
> > RFS is already working like that, if RPS is not used in conjonction.
> > 
> > Forwarded traffic will see a non tagged rfs flow entry, so this skb will
> > be processed by this cpu, not a remote one.
> > 
> > Currently, only an application can set a RFS flow entry.
> 
> I was referring to the case when a forwarded flow gets hashed to
> the same index as a local flow.

Ah OK, with same rxhash I guess ?

(we could store rxhash next to cpu in struct rps_dev_flow, or 16bit part
of it in the existing padding)




^ permalink raw reply

* Re: RPS and forwarding
From: Herbert Xu @ 2010-04-26  7:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, therbert, netdev
In-Reply-To: <1272267534.2346.7.camel@edumazet-laptop>

On Mon, Apr 26, 2010 at 09:38:54AM +0200, Eric Dumazet wrote:
>
> Ah OK, with same rxhash I guess ?

It's worse than that I think, because the actual table contains
less than 4G entries.  So either the same rxhash or the same index
in the table.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 1/7] Topcliff GbE: Add The Main code
From: Masayuki Ohtake @ 2010-04-26  7:53 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: NETDEV, Wang, Yong Y, Wang, Qi, Intel OTC, Andrew
In-Reply-To: <201004231727.23246.arnd@arndb.de>

On: Fri, 23 Apr 2010 17:27:23 +0200, Arnd Bergmann <arnd@arndb.de> wrote.

Hi Arnd

Thank you for your review.
We will confirm them and update the new patch.
Our company have a vacation.
Therefore we will update patch after May 6.

Best regards,
Masayuki Ohtake <masa-korg@dsn.okisemi.com>

^ permalink raw reply

* Re: [PATCH] Topcliff PHUB: Add The Packet Hub driver [1/2]
From: Masayuki Ohtake @ 2010-04-26  7:53 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: NETDEV, Wang, Yong Y, Wang, Qi, andrew.chih.howe.khor
In-Reply-To: <201004231657.57466.arnd@arndb.de>

On: Fri, 23 Apr 2010 16:57:57 +0200, Arnd Bergmann <arnd@arndb.de>  wrote:

Hi Arnd

Thank you for your review.
We will confirm them and update the new patch.
Our company have a vacation.
Therefore we will update patch after May 6.

Best regards,
Masayuki Ohtake <masa-korg@dsn.okisemi.com>

^ permalink raw reply

* Re: [PATCH 1/7] Topcliff GbE: Add The Main code [1/3]
From: Masayuki Ohtake @ 2010-04-26  8:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: NETDEV, Wang, Yong Y, Wang, Qi, Intel OTC, Andrew
In-Reply-To: <20100423082624.147de667@nehalam>

On: Fri, 23 Apr 2010 08:26:24 -0700, Stephen Hemminger
<shemminger@vyatta.com> wrote.

Hi Stephen

Thank you for your review.
We will confirm them and update the new patch.
Our company have a vacation.
Therefore we will update patch after May 6.

Best regards,
Masayuki Ohtake <masa-korg@dsn.okisemi.com>


^ permalink raw reply

* Re: [PATCH net-next-2.6] net: sk_sleep() helper
From: Eric Dumazet @ 2010-04-26  8:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100420.163919.82125824.davem@davemloft.net>

Le mardi 20 avril 2010 à 16:39 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 21 Apr 2010 01:03:51 +0200
> 
> > [PATCH net-next-2.6] net: sk_sleep() helper
> 
> Looks good, applied, thanks Eric.

Here is a followup to this patch, I missed three files in this
conversion.

Thanks !

[PATCH net-next-2.6] net: use sk_sleep()

Commit aa395145 (net: sk_sleep() helper) missed three files in the
conversion.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/macvtap.c  |    6 +++---
 net/caif/caif_socket.c |   30 +++++++++++++++---------------
 net/rxrpc/ar-recvmsg.c |    6 +++---
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 85d6420..d97e1fd 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -181,7 +181,7 @@ static int macvtap_forward(struct net_device *dev, struct sk_buff *skb)
 		return -ENOLINK;
 
 	skb_queue_tail(&q->sk.sk_receive_queue, skb);
-	wake_up_interruptible_poll(q->sk.sk_sleep, POLLIN | POLLRDNORM | POLLRDBAND);
+	wake_up_interruptible_poll(sk_sleep(&q->sk), POLLIN | POLLRDNORM | POLLRDBAND);
 	return 0;
 }
 
@@ -562,7 +562,7 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
 	struct sk_buff *skb;
 	ssize_t ret = 0;
 
-	add_wait_queue(q->sk.sk_sleep, &wait);
+	add_wait_queue(sk_sleep(&q->sk), &wait);
 	while (len) {
 		current->state = TASK_INTERRUPTIBLE;
 
@@ -587,7 +587,7 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
 	}
 
 	current->state = TASK_RUNNING;
-	remove_wait_queue(q->sk.sk_sleep, &wait);
+	remove_wait_queue(sk_sleep(&q->sk), &wait);
 	return ret;
 }
 
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 90317e7..d455375 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -169,7 +169,7 @@ static int caif_sktrecv_cb(struct cflayer *layr, struct cfpkt *pkt)
 
 	/* Signal reader that data is available. */
 
-	wake_up_interruptible(cf_sk->sk.sk_sleep);
+	wake_up_interruptible(sk_sleep(&cf_sk->sk));
 
 	return 0;
 }
@@ -203,7 +203,7 @@ static void caif_sktflowctrl_cb(struct cflayer *layr,
 		dbfs_atomic_inc(&cnt.num_tx_flow_on_ind);
 		/* Signal reader that data is available. */
 		SET_TX_FLOW_ON(cf_sk);
-		wake_up_interruptible(cf_sk->sk.sk_sleep);
+		wake_up_interruptible(sk_sleep(&cf_sk->sk));
 		break;
 
 	case CAIF_CTRLCMD_FLOW_OFF_IND:
@@ -217,7 +217,7 @@ static void caif_sktflowctrl_cb(struct cflayer *layr,
 		caif_assert(STATE_IS_OPEN(cf_sk));
 		SET_PENDING_OFF(cf_sk);
 		SET_TX_FLOW_ON(cf_sk);
-		wake_up_interruptible(cf_sk->sk.sk_sleep);
+		wake_up_interruptible(sk_sleep(&cf_sk->sk));
 		break;
 
 	case CAIF_CTRLCMD_DEINIT_RSP:
@@ -225,8 +225,8 @@ static void caif_sktflowctrl_cb(struct cflayer *layr,
 		caif_assert(!STATE_IS_OPEN(cf_sk));
 		SET_PENDING_OFF(cf_sk);
 		if (!STATE_IS_PENDING_DESTROY(cf_sk)) {
-			if (cf_sk->sk.sk_sleep != NULL)
-				wake_up_interruptible(cf_sk->sk.sk_sleep);
+			if (sk_sleep(&cf_sk->sk) != NULL)
+				wake_up_interruptible(sk_sleep(&cf_sk->sk));
 		}
 		dbfs_atomic_inc(&cnt.num_deinit);
 		sock_put(&cf_sk->sk);
@@ -238,7 +238,7 @@ static void caif_sktflowctrl_cb(struct cflayer *layr,
 		SET_STATE_CLOSED(cf_sk);
 		SET_PENDING_OFF(cf_sk);
 		SET_TX_FLOW_OFF(cf_sk);
-		wake_up_interruptible(cf_sk->sk.sk_sleep);
+		wake_up_interruptible(sk_sleep(&cf_sk->sk));
 		break;
 
 	case CAIF_CTRLCMD_REMOTE_SHUTDOWN_IND:
@@ -247,7 +247,7 @@ static void caif_sktflowctrl_cb(struct cflayer *layr,
 		/* Use sk_shutdown to indicate remote shutdown indication */
 		cf_sk->sk.sk_shutdown |= RCV_SHUTDOWN;
 		cf_sk->file_mode = 0;
-		wake_up_interruptible(cf_sk->sk.sk_sleep);
+		wake_up_interruptible(sk_sleep(&cf_sk->sk));
 		break;
 
 	default:
@@ -325,7 +325,7 @@ static int caif_recvmsg(struct kiocb *iocb, struct socket *sock,
 		release_sock(&cf_sk->sk);
 
 		result =
-		    wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		    wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 					     !STATE_IS_PENDING(cf_sk));
 
 		lock_sock(&(cf_sk->sk));
@@ -365,7 +365,7 @@ static int caif_recvmsg(struct kiocb *iocb, struct socket *sock,
 		release_sock(&cf_sk->sk);
 
 		/* Block reader until data arrives or socket is closed. */
-		if (wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		if (wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 					cfpkt_qpeek(cf_sk->pktq)
 					|| STATE_IS_REMOTE_SHUTDOWN(cf_sk)
 					|| !STATE_IS_OPEN(cf_sk)) ==
@@ -537,7 +537,7 @@ static int caif_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		 * for its conclusion.
 		 */
 		result =
-		    wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		    wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 					     !STATE_IS_PENDING(cf_sk));
 		/* I want to be alone on cf_sk (except status and queue) */
 		lock_sock(&(cf_sk->sk));
@@ -573,7 +573,7 @@ static int caif_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		release_sock(&cf_sk->sk);
 
 		/* Wait until flow is on or socket is closed */
-		if (wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		if (wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 					TX_FLOW_IS_ON(cf_sk)
 					|| !STATE_IS_OPEN(cf_sk)
 					|| STATE_IS_REMOTE_SHUTDOWN(cf_sk)
@@ -650,7 +650,7 @@ static int caif_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		release_sock(&cf_sk->sk);
 
 		/* Wait until flow is on or socket is closed */
-		if (wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		if (wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 					TX_FLOW_IS_ON(cf_sk)
 					|| !STATE_IS_OPEN(cf_sk)
 					|| STATE_IS_REMOTE_SHUTDOWN(cf_sk)
@@ -898,7 +898,7 @@ static int caif_connect(struct socket *sock, struct sockaddr *uservaddr,
 			 * for its conclusion.
 			 */
 			result =
-			    wait_event_interruptible(*cf_sk->sk.sk_sleep,
+			    wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 						     !STATE_IS_PENDING(cf_sk));
 
 			lock_sock(&(cf_sk->sk));
@@ -965,7 +965,7 @@ static int caif_connect(struct socket *sock, struct sockaddr *uservaddr,
 		release_sock(&cf_sk->sk);
 
 		result =
-		    wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		    wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 					     !STATE_IS_PENDING(cf_sk));
 
 		lock_sock(&(cf_sk->sk));
@@ -1107,7 +1107,7 @@ static int caif_release(struct socket *sock)
 	 * CAIF stack.
 	 */
 	if (!(sock->file->f_flags & O_NONBLOCK)) {
-		res = wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		res = wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 						!STATE_IS_PENDING(cf_sk));
 
 		if (res == -ERESTARTSYS) {
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index 60c2b94..0c65013 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -91,7 +91,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 			/* wait for a message to turn up */
 			release_sock(&rx->sk);
-			prepare_to_wait_exclusive(rx->sk.sk_sleep, &wait,
+			prepare_to_wait_exclusive(sk_sleep(&rx->sk), &wait,
 						  TASK_INTERRUPTIBLE);
 			ret = sock_error(&rx->sk);
 			if (ret)
@@ -102,7 +102,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 					goto wait_interrupted;
 				timeo = schedule_timeout(timeo);
 			}
-			finish_wait(rx->sk.sk_sleep, &wait);
+			finish_wait(sk_sleep(&rx->sk), &wait);
 			lock_sock(&rx->sk);
 			continue;
 		}
@@ -356,7 +356,7 @@ csum_copy_error:
 wait_interrupted:
 	ret = sock_intr_errno(timeo);
 wait_error:
-	finish_wait(rx->sk.sk_sleep, &wait);
+	finish_wait(sk_sleep(&rx->sk), &wait);
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	if (copied)



^ permalink raw reply related

* Re: [PATCH v5] rfs: Receive Flow Steering
From: Eric Dumazet @ 2010-04-26  8:41 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, netdev
In-Reply-To: <20100419.141947.129754682.davem@davemloft.net>

Le lundi 19 avril 2010 à 14:19 -0700, David Miller a écrit :

> 
> I was thinking also about how we could compute rxhash in the
> loopback driver :-)

This would be easy if rxhash was not a "struct inet_sock" field but a
"struct sock" one

sock_alloc_send_pskb() (or skb_set_owner_w())

skb->rxhash = sk->rxhash;




^ permalink raw reply

* Re: [PATCH] e100: expose broadcast_disabled as a module option
From: Erwan Velu @ 2010-04-26  8:49 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jeff Kirsher, netdev, David Miller, linux-kernel,
	jesse.brandeburg, bruce.w.allan, alexander.h.duyck,
	peter.p.waskiewicz.jr, john.ronciak
In-Reply-To: <20100423150207.7969c9f6@nehalam>

Agreed that will be a better implementation. Changes of IFF_BROADCAST
could play with this "broadcast_disabled" configuration switch to
increase the cpu efficiency.

 I'm not a kernel expert and I don't really figure how the changes of
the IFF_BROADCAST should be forwarded to the interfaces and how it can
be manipulated. I saw that ifconfig have a '-broadcast' option but
looks like none of my drivers are compatible with that.

Does any one you could help me understanding what should be the good way to
1- enabled/disabled IFF_BROADCAST for a given interface
2- populate this changes to the driver

Once I'll be able to have a proper patch using that, I'll post it
again to the list.

Cheers,
Erwan

2010/4/24 Stephen Hemminger <shemminger@vyatta.com>:
> On Fri, 23 Apr 2010 23:03:59 +0200
[...]
> The point is that the driver can look at IFF_BROADCAST rather than having
> module parameter. Module parameters are device driver specific and should
> be avoid as much as possible in favor of general mechanism. This is a repeated
> problem where users and vendors make special hooks that only work with their
> driver, which makes life hard for other users and distribution providers.
>

^ permalink raw reply

* Re: [PATCH 2/2] net: reimplement softnet_data.output_queue as a FIFO queue
From: Changli Gao @ 2010-04-26  9:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Changli Gao
In-Reply-To: <1272252361-11799-1-git-send-email-xiaosuo@gmail.com>

On Mon, Apr 26, 2010 at 11:26 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> reimplement softnet_data.output_queue as a FIFO queue.
>
> reimplement softnet_data.output_queue as a FIFO queue to keep the fairness among
> the qdiscs rescheduled.
>

Dear Eric:
  do you think it is worth continuing? I didn't received any comments
even negative ones. I am really sorry for disrupting you, but I don't
know what did it mean.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH 2/2] net: reimplement softnet_data.output_queue as a FIFO  queue
From: Eric Dumazet @ 2010-04-26  9:30 UTC (permalink / raw)
  To: Changli Gao; +Cc: netdev
In-Reply-To: <q2p412e6f7f1004260211t19efd3ebk183577ac4a39d31e@mail.gmail.com>

Le lundi 26 avril 2010 à 17:11 +0800, Changli Gao a écrit :
> On Mon, Apr 26, 2010 at 11:26 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> > reimplement softnet_data.output_queue as a FIFO queue.
> >
> > reimplement softnet_data.output_queue as a FIFO queue to keep the fairness among
> > the qdiscs rescheduled.
> >
> 
> Dear Eric:
>   do you think it is worth continuing? I didn't received any comments
> even negative ones. I am really sorry for disrupting you, but I don't
> know what did it mean.
> 

For this patch I had no comment, I believe it is fine, at a first
glance...

Please give some time for netdev people to review it and Ack it.




^ permalink raw reply

* Re: [PATCH v3] TCP: avoid to send keepalive probes if receiving data
From: Ilpo Järvinen @ 2010-04-26  9:47 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: Netdev, David Miller, Eric Dumazet
In-Reply-To: <1272249659-2292-1-git-send-email-fleitner@redhat.com>

On Sun, 25 Apr 2010, Flavio Leitner wrote:

> RFC 1122 says the following:
> ...
>   Keep-alive packets MUST only be sent when no data or
>   acknowledgement packets have been received for the
>   connection within an interval.
> ...
> 
> The acknowledgement packet is reseting the keepalive
> timer but the data packet isn't. This patch fixes it by
> checking the timestamp of the last received data packet
> too when the keepalive timer expires.
> 
> Signed-off-by: Flavio Leitner <fleitner@redhat.com>
> ---
>  net/ipv4/tcp.c       |    5 ++++-
>  net/ipv4/tcp_timer.c |    8 ++++++++
>  2 files changed, 12 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 0f8caf6..94f605c 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2298,7 +2298,10 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>  			if (sock_flag(sk, SOCK_KEEPOPEN) &&
>  			    !((1 << sk->sk_state) &
>  			      (TCPF_CLOSE | TCPF_LISTEN))) {
> -				__u32 elapsed = tcp_time_stamp - tp->rcv_tstamp;
> +				u32 elapsed = min_t(u32,
> +				      tcp_time_stamp - icsk->icsk_ack.lrcvtime,
> +				      tcp_time_stamp - tp->rcv_tstamp);
> +

I'd put this into a helper in include/net/tcp.h

>  				if (tp->keepalive_time > elapsed)
>  					elapsed = tp->keepalive_time - elapsed;
>  				else
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index 8a0ab29..9d1bb6f 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -554,6 +554,14 @@ static void tcp_keepalive_timer (unsigned long data)
>  	if (tp->packets_out || tcp_send_head(sk))
>  		goto resched;
>  
> +	elapsed = tcp_time_stamp - icsk->icsk_ack.lrcvtime;
> +
> +	/* receiving data means alive */
> +	if (elapsed < keepalive_time_when(tp)) {
> +		elapsed = keepalive_time_when(tp) - elapsed;
> +		goto resched;
> +	}
> +
>  	elapsed = tcp_time_stamp - tp->rcv_tstamp;

...And then use it here too for setting the elapsed and doing the test 
and what follows only once?

>  	if (elapsed >= keepalive_time_when(tp)) {
> 

-- 
 i.

^ permalink raw reply

* [PATCH torvalds-2.6] cdc_ether: fix autosuspend for mbm devices
From: Torgny Johansson @ 2010-04-26 10:14 UTC (permalink / raw)
  To: oliver, davem, netdev

Hi!

In 2.6.33 it seems autosuspend is broken for mbm devices.

Autosuspend works until you bring the wwan interface up, then the device does not enter autosuspend anymore.

The following patch fixes the problem by setting the .manage_power field in the mbm_info struct 
to the same as in the cdc_info struct (cdc_manager_power).

I am unsure exactly what that does and why autosuspend doesn't work without it. 
Can you guys comment on that? Is this fix the correct approach?

Also, if you like this patch, is it possible to get it included in a minor release of 2.6.33 (e.g. 2.6.33.3) as a bugfix?

Regards
Torgny Johansson

---
Signed-off-by: Torgny Johansson <torgny.johansson@ericsson.com>

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index c8cdb7f..3547cf1 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -431,6 +431,7 @@ static const struct driver_info mbm_info = {
 	.bind = 	cdc_bind,
 	.unbind =	usbnet_cdc_unbind,
 	.status =	cdc_status,
+	.manage_power =	cdc_manage_power,
 };

 /*-------------------------------------------------------------------------*/

^ permalink raw reply related

* Re: [PATCH torvalds-2.6] cdc_ether: fix autosuspend for mbm devices
From: Oliver Neukum @ 2010-04-26 10:30 UTC (permalink / raw)
  To: Torgny Johansson; +Cc: netdev
In-Reply-To: <201004261214.13836.torgny.johansson@ericsson.com>

Am Montag, 26. April 2010 12:14:13 schrieb Torgny Johansson:
> Hi!
> 
> In 2.6.33 it seems autosuspend is broken for mbm devices.

It is hardly broken. It simply wasn't implemented.

> Autosuspend works until you bring the wwan interface up, then the device does not enter autosuspend anymore.
> 
> The following patch fixes the problem by setting the .manage_power field in the mbm_info struct 
> to the same as in the cdc_info struct (cdc_manager_power).
> 
> I am unsure exactly what that does and why autosuspend doesn't work without it. 
> Can you guys comment on that? Is this fix the correct approach?

The patch is correct. If you use another struct driver_info it is a different
driver as far as usbnet is concerned. A driver needs to give some
minimal support for autosuspend. mbm didn't do so.
 
> Also, if you like this patch, is it possible to get it included in a minor release of 2.6.33 (e.g. 2.6.33.3) as a bugfix?

Up to David.

	Regards
		Oliver

^ permalink raw reply

* Re: [PATCH net-next-2.6] netns: call ops_free right after ops_exit
From: Jiri Pirko @ 2010-04-26 10:43 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: David Miller, netdev
In-Reply-To: <m1r5m2ud40.fsf@fess.ebiederm.org>

Mon, Apr 26, 2010 at 05:06:07AM CEST, ebiederm@xmission.com wrote:
>Jiri Pirko <jpirko@redhat.com> writes:
>
>> Sun, Apr 25, 2010 at 04:50:34PM CEST, ebiederm@xmission.com wrote:
>>>David Miller <davem@davemloft.net> writes:
>>>
>>>> From: Jiri Pirko <jpirko@redhat.com>
>>>> Date: Sun, 25 Apr 2010 11:26:01 +0200
>>>>
>>>>> There's no need to iterate this twice. We can free net generic
>>>>> variables right after exit is called.
>>>>>
>>>>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>>>
>>>> Are you sure there are no problems with doing this?
>>>>
>>>> What if there are inter-net variable reference dependencies
>>>> or something like that?
>>>>
>>>> I really suspect it is being done this way on purpose, but
>>>> in the end I defer to experts like Eric B. :-)
>>>
>>>I am pretty certain there is a problem.  My memory is fuzzy this
>>>morning but I believe we can have rcu references between various
>>>pieces of the networking stack for a single network namespace.  So we
>>>need to cause all of the network namespace to exit before it is safe
>>>to free those pieces.
>>
>> Hmm, that doesn't make much sense to me. Since the allocated memory in question
>> is used locally, after exit() is called, the memory chunk should not be used by
>> anyone and if it is, I think it's a bug.
>>
>> Earlier, when the memory wasn't allocated automatically (by filling .size)
>> memory was individually freed in exit(). From what I understood from your reply,
>> you are telling this was buggy?
>
>Now I remember clearly.  The use case for not freeing memory
>immediately is the delayed freeing of network devices.  Ideally we
>delay unregistering all of the network devices into
>default_device_exit_batch, when we terminate one or namespaces.
>
>Since network devices are freed outside of their exit routines we need
>to keep their per net memory around until they are freed.
>
>Ultimately all of this is much easier to think about if these chunks of
>memory can be logically thought of as living on struct net and have the
>same lifetime rules.

Ok, I see your point. So scratch this.

Thanks,

	Jirka
>
>Eric

^ permalink raw reply

* [PATCH net-2.6] netns: assign NULL after pernet memory is freed
From: Jiri Pirko @ 2010-04-26 11:18 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm

This is needed to let know appropriate code (driver) know that pernet memory
chunk  was freed already. For example when driver has also registered netdev
notifier, it can be called after memory is freed which could eventually lead
to panic. Also a null check should be present in these notifiers.

For example, previously, when drivers were responsible for pernet memory
allocation/freeing, in pppoe this assign was done in pppoe_exit_net() right
after pernet memory was freed. The check in notifier stayed (in pppoe_flush_dev
called from pppoe_device_event) but makes no sense now without this patch.

I already know about several notifiers where the check should be present,
patches will follow up.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index bd8c471..29f622c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -51,6 +51,7 @@ static void ops_free(const struct pernet_operations *ops, struct net *net)
 	if (ops->id && ops->size) {
 		int id = *ops->id;
 		kfree(net_generic(net, id));
+		net_assign_generic(net, id, NULL);
 	}
 }

^ permalink raw reply related

* Re: rps perfomance WAS(Re: rps: question
From: jamal @ 2010-04-26 11:35 UTC (permalink / raw)
  To: Changli Gao
  Cc: Eric Dumazet, Rick Jones, David Miller, therbert, netdev, robert,
	andi
In-Reply-To: <t2r412e6f7f1004241931ie9e70e3aka77b49557a7872e3@mail.gmail.com>

On Sun, 2010-04-25 at 10:31 +0800, Changli Gao wrote:

> I read the code again, and find that we don't use spin_lock_irqsave(),
> and we use local_irq_save() and spin_lock() instead, so
> _raw_spin_lock_irqsave() and _raw_spin_lock_irqrestore() should not be
> related to backlog. the lock maybe sk_receive_queue.lock.

Possible.
I am wondering if there's a way we can precisely nail where that is
happening? is lockstat any use? 
Fixing _raw_spin_lock_irqsave and friend is the lowest hanging fruit.

So looking at your patch now i see it is likely there was an improvement
made for non-rps case (moving out of loop some irq_enable etc).
i.e my results may not be crazy after adding your patch and seeing an
improvement for non-rps case.
However, whatever your patch did - it did not help the rps case case:
call_function_single_interrupt() comes out higher in the profile,
and # of IPIs seems to have gone up (although i did not measure this, I
can see the interrupts/second went up by almost 50-60%)

> Jamal, did you use a single socket to serve all the clients?

Socket per detected cpu.

> BTW:  completion_queue and output_queue in softnet_data both are LIFO
> queues. For completion_queue, FIFO is better, as the last used skb is
> more likely in cache, and should be used first. Since slab has always
> cache the last used memory at the head, we'd better free the skb in
> FIFO manner. For output_queue, FIFO is good for fairness among qdiscs.

I think it will depend on how many of those skbs are sitting in the
completion queue, cache warmth etc. LIFO is always safest, you have
higher probability of finding a cached skb infront.

cheers,
jamal

^ permalink raw reply

* [patch v2] bluetooth: handle l2cap_create_connless_pdu() errors
From: Dan Carpenter @ 2010-04-26 11:36 UTC (permalink / raw)
  To: Gustavo F. Padovan
  Cc: Marcel Holtmann, David S. Miller, Andrei Emeltchenko,
	linux-bluetooth, netdev, kernel-janitors
In-Reply-To: <20100422123347.GA5265@vigoh>

l2cap_create_connless_pdu() can sometimes return ERR_PTR(-ENOMEM) or
ERR_PTR(-EFAULT).

Signed-off-by: Dan Carpenter <error27@gmail.com>
---
In v2 I wrote the patch on top of Gustavo Padovon's devel tree

diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c
index a668ef4..b32682c 100644
--- a/net/bluetooth/l2cap.c
+++ b/net/bluetooth/l2cap.c
@@ -1712,8 +1712,12 @@ static int l2cap_sock_sendmsg(struct kiocb *iocb, struct socket *sock, struct ms
 	/* Connectionless channel */
 	if (sk->sk_type == SOCK_DGRAM) {
 		skb = l2cap_create_connless_pdu(sk, msg, len);
-		l2cap_do_send(sk, skb);
-		err = len;
+		if (IS_ERR(skb)) {
+			err = PTR_ERR(skb);
+		} else {
+			l2cap_do_send(sk, skb);
+			err = len;
+		}
 		goto done;
 	}
 

^ permalink raw reply related

* [PATCH net-next-2.6] pppoe: use pppoe_pernet instead of directly net_generic
From: Jiri Pirko @ 2010-04-26 11:46 UTC (permalink / raw)
  To: netdev; +Cc: davem, mostrows


Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index cdd11ba..c059c8d 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -258,7 +258,7 @@ static inline struct pppox_sock *get_item_by_addr(struct net *net,
 	dev = dev_get_by_name_rcu(net, sp->sa_addr.pppoe.dev);
 	if (dev) {
 		ifindex = dev->ifindex;
-		pn = net_generic(net, pppoe_net_id);
+		pn = pppoe_pernet(net);
 		pppox_sock = get_item(pn, sp->sa_addr.pppoe.sid,
 				sp->sa_addr.pppoe.remote, ifindex);
 	}

^ permalink raw reply related

* Re: [PATCH net-2.6] netns: assign NULL after pernet memory is freed
From: Jiri Pirko @ 2010-04-26 12:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm
In-Reply-To: <20100426111806.GB2941@psychotron.lab.eng.brq.redhat.com>

Mon, Apr 26, 2010 at 01:18:07PM CEST, jpirko@redhat.com wrote:
>This is needed to let know appropriate code (driver) know that pernet memory
>chunk  was freed already. For example when driver has also registered netdev
>notifier, it can be called after memory is freed which could eventually lead
>to panic. Also a null check should be present in these notifiers.
>
>For example, previously, when drivers were responsible for pernet memory
>allocation/freeing, in pppoe this assign was done in pppoe_exit_net() right
>after pernet memory was freed. The check in notifier stayed (in pppoe_flush_dev
>called from pppoe_device_event) but makes no sense now without this patch.

Hmm, thinking about this, since the life of pernet memories is directly connected
with the life of net, as described by Eric, this should not be needed. So please
scrach this one too. Sorry.

Anyway, I'm still wondering how to prevent netdev_notifiers from using this when
net is gone. Going to do more research, I'm probably missing something.


>
>I already know about several notifiers where the check should be present,
>patches will follow up.
>
>Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>index bd8c471..29f622c 100644
>--- a/net/core/net_namespace.c
>+++ b/net/core/net_namespace.c
>@@ -51,6 +51,7 @@ static void ops_free(const struct pernet_operations *ops, struct net *net)
> 	if (ops->id && ops->size) {
> 		int id = *ops->id;
> 		kfree(net_generic(net, id));
>+		net_assign_generic(net, id, NULL);
> 	}
> }
> 

^ permalink raw reply

* Re: [PATCH net-2.6] netns: assign NULL after pernet memory is freed
From: Jiri Pirko @ 2010-04-26 13:30 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm
In-Reply-To: <20100426120716.GD2941@psychotron.lab.eng.brq.redhat.com>

Mon, Apr 26, 2010 at 02:07:17PM CEST, jpirko@redhat.com wrote:
>Mon, Apr 26, 2010 at 01:18:07PM CEST, jpirko@redhat.com wrote:
>>This is needed to let know appropriate code (driver) know that pernet memory
>>chunk  was freed already. For example when driver has also registered netdev
>>notifier, it can be called after memory is freed which could eventually lead
>>to panic. Also a null check should be present in these notifiers.
>>
>>For example, previously, when drivers were responsible for pernet memory
>>allocation/freeing, in pppoe this assign was done in pppoe_exit_net() right
>>after pernet memory was freed. The check in notifier stayed (in pppoe_flush_dev
>>called from pppoe_device_event) but makes no sense now without this patch.
>
>Hmm, thinking about this, since the life of pernet memories is directly connected
>with the life of net, as described by Eric, this should not be needed. So please
>scrach this one too. Sorry.
>
>Anyway, I'm still wondering how to prevent netdev_notifiers from using this when
>net is gone. Going to do more research, I'm probably missing something.

Right, it becomes part of init_ns then. So this looks good then. One thing I can
still imagine what may cause problem, since netdev_notifier is registered before
register_pernet_device, the notifier can be called in between. So I think this
should be reordered, am I right?

Thanks

Jirka

>
>
>>
>>I already know about several notifiers where the check should be present,
>>patches will follow up.
>>
>>Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>
>>diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>>index bd8c471..29f622c 100644
>>--- a/net/core/net_namespace.c
>>+++ b/net/core/net_namespace.c
>>@@ -51,6 +51,7 @@ static void ops_free(const struct pernet_operations *ops, struct net *net)
>> 	if (ops->id && ops->size) {
>> 		int id = *ops->id;
>> 		kfree(net_generic(net, id));
>>+		net_assign_generic(net, id, NULL);
>> 	}
>> }
>> 

^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: Changli Gao @ 2010-04-26 13:35 UTC (permalink / raw)
  To: hadi; +Cc: Eric Dumazet, Rick Jones, David Miller, therbert, netdev, robert,
	andi
In-Reply-To: <1272281709.8918.35.camel@bigi>

On Mon, Apr 26, 2010 at 7:35 PM, jamal <hadi@cyberus.ca> wrote:
> On Sun, 2010-04-25 at 10:31 +0800, Changli Gao wrote:
>
>> I read the code again, and find that we don't use spin_lock_irqsave(),
>> and we use local_irq_save() and spin_lock() instead, so
>> _raw_spin_lock_irqsave() and _raw_spin_lock_irqrestore() should not be
>> related to backlog. the lock maybe sk_receive_queue.lock.
>
> Possible.
> I am wondering if there's a way we can precisely nail where that is
> happening? is lockstat any use?
> Fixing _raw_spin_lock_irqsave and friend is the lowest hanging fruit.
>

Maybe lockstat can help in this case.

> So looking at your patch now i see it is likely there was an improvement
> made for non-rps case (moving out of loop some irq_enable etc).
> i.e my results may not be crazy after adding your patch and seeing an
> improvement for non-rps case.
> However, whatever your patch did - it did not help the rps case case:
> call_function_single_interrupt() comes out higher in the profile,
> and # of IPIs seems to have gone up (although i did not measure this, I
> can see the interrupts/second went up by almost 50-60%)

Did you apply the patch from Eric? It would reduce the number of
local_irq_disable() calls but increase the number of IPIs.

>
>> Jamal, did you use a single socket to serve all the clients?
>
> Socket per detected cpu.

Ignore it. I made a mistake here.

>
>> BTW:  completion_queue and output_queue in softnet_data both are LIFO
>> queues. For completion_queue, FIFO is better, as the last used skb is
>> more likely in cache, and should be used first. Since slab has always
>> cache the last used memory at the head, we'd better free the skb in
>> FIFO manner. For output_queue, FIFO is good for fairness among qdiscs.
>
> I think it will depend on how many of those skbs are sitting in the
> completion queue, cache warmth etc. LIFO is always safest, you have
> higher probability of finding a cached skb infront.
>

we call kfree_skb() to release skbs to slab allocator, then slab
allocator stores them in a LIFO queue. If completion queue is also a
LIFO queue, the latest unused skb will be in the front of the queue,
and will be released to slab allocator at first. At the next time, we
call alloc_skb(), the memory used by the skb in the end of the
completion queue will be returned instead of the hot one.

However, as Eric said, new drivers don't rely on completion queue, it
isn't a real problem, especially in your test case.


-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox