Netdev List


* Re: [PATCH] b44: add 64 bit stats
From: Ben Hutchings @ 2012-07-20 14:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Kevin Groeneveld, netdev
In-Reply-To: <1342761865.2626.5572.camel@edumazet-glaptop>

On Fri, 2012-07-20 at 07:24 +0200, Eric Dumazet wrote:
> On Fri, 2012-07-20 at 06:53 +0200, Eric Dumazet wrote:
> > On Thu, 2012-07-19 at 21:56 -0400, Kevin Groeneveld wrote:
> > 
> > > I am still trying to make sure I understand this fully.  I want to
> > > update some other drivers with 64 bit stats as well.  What you said
> > > seems to make sense, but...
> > > 
> > > I was looking at the virtio_net.c driver.  One spot in this driver
> > > which updates the stats is the receive_buf function.  receive_buf is
> > > called from virtnet_poll which is registered as a napi poll function.
> > > According to Documentation/networking/netdevices.txt the poll function
> > > is called in a softirq context.  However, the function which reads the
> > > stats uses u64_stats_fetch_begin/u64_stats_fetch_retry.  Shouldn't
> > > this be u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh for the
> > > exact reasons you described for my b44 patch?
> > 
> > Absolutely. You can argue that probably nobody uses this driver on a
> > 32bit UP machine, but technically speaking the current implementation is
> > racy.
> > 
> 
> In fact all network drivers should use the _bh version.
> 
> Could you send a patch for all of them, based on net-next tree ?
> 
> Thanks !

Don't we need an _irq variant for drivers that support netpoll?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH net-next] tcp: fix ABC in tcp_slow_start()
From: John Heffner @ 2012-07-20 14:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Tom Herbert, Yuchung Cheng, Neal Cardwell,
	Nandita Dukkipati, Stephen Hemminger
In-Reply-To: <1342762841.2626.5633.camel@edumazet-glaptop>

It might be clearer to instead introduce a temporary variable to
calculate the snd_cwnd change in the while loop.  That is:

        unsigned int snd_cwnd_delta = 0;
...
        tp->snd_cwnd_cnt += cnt;
        while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
                tp->snd_cwnd_cnt -= tp->snd_cwnd;
                snd_cwnd_delta++;
        }
        tp->snd_cwnd = min(tp->snd_cwnd + snd_cwnd_delta, tp->snd_cwnd_clamp);



On Fri, Jul 20, 2012 at 1:40 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> When/if sysctl_tcp_abc > 1, we expect to increase cwnd by 2 if the
> received ACK acknowledges more than 2*MSS bytes, in tcp_slow_start()
>
> Problem is this RFC 3465 statement is not correctly coded, as
> the while () loop increases snd_cwnd one by one.
>
> So to reach the "cwnd += 2" goal, we need to use "cnt = 2*cnt + 1"
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Tom Herbert <therbert@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Nandita Dukkipati <nanditad@google.com>
> Cc: John Heffner <johnwheffner@gmail.com>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> ---
>  net/ipv4/tcp_cong.c |   13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
> index 04dbd7a..486379a 100644
> --- a/net/ipv4/tcp_cong.c
> +++ b/net/ipv4/tcp_cong.c
> @@ -306,7 +306,7 @@ EXPORT_SYMBOL_GPL(tcp_is_cwnd_limited);
>   */
>  void tcp_slow_start(struct tcp_sock *tp)
>  {
> -       int cnt; /* increase in packets */
> +       unsigned int cnt; /* increase in packets */
>
>         /* RFC3465: ABC Slow start
>          * Increase only after a full MSS of bytes is acked
> @@ -318,16 +318,19 @@ void tcp_slow_start(struct tcp_sock *tp)
>         if (sysctl_tcp_abc && tp->bytes_acked < tp->mss_cache)
>                 return;
>
> -       if (sysctl_tcp_max_ssthresh > 0 && tp->snd_cwnd > sysctl_tcp_max_ssthresh)
> +       cnt = tp->snd_cwnd;                     /* exponential increase */
> +       if (sysctl_tcp_max_ssthresh > 0 &&
> +           tp->snd_cwnd > sysctl_tcp_max_ssthresh)
>                 cnt = sysctl_tcp_max_ssthresh >> 1;     /* limited slow start */
> -       else
> -               cnt = tp->snd_cwnd;                     /* exponential increase */
>
>         /* RFC3465: ABC
>          * We MAY increase by 2 if discovered delayed ack
> +        * The "+ 1" in the expression is needed if we want to increase
> +        * cwnd by 2, because of the way the following loop is coded.
>          */
>         if (sysctl_tcp_abc > 1 && tp->bytes_acked >= 2*tp->mss_cache)
> -               cnt <<= 1;
> +               cnt = 2*cnt + 1;
> +
>         tp->bytes_acked = 0;
>
>         tp->snd_cwnd_cnt += cnt;
>
>


* Re: [PATCH net-next] tcp: fix ABC in tcp_slow_start()
From: Eric Dumazet @ 2012-07-20 14:56 UTC (permalink / raw)
  To: John Heffner
  Cc: David Miller, netdev, Tom Herbert, Yuchung Cheng, Neal Cardwell,
	Nandita Dukkipati, Stephen Hemminger
In-Reply-To: <CABrhC0kdw7Uv-67t7fFcfh-Pve1LsNAQ3HGnUmCOJVB7b0dUVQ@mail.gmail.com>

On Fri, 2012-07-20 at 10:41 -0400, John Heffner wrote:
> It might be clearer to instead introduce a temporary variable to
> calculate the snd_cwnd change in the while loop.  That is:
> 
>         unsigned int snd_cwnd_delta = 0;
> ...
>         tp->snd_cwnd_cnt += cnt;
>         while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
>                 tp->snd_cwnd_cnt -= tp->snd_cwnd;
>                 snd_cwnd_delta++;
>         }
>         tp->snd_cwnd = min(tp->snd_cwnd + snd_cwnd_delta, tp->snd_cwnd_clamp);
> 

Good idea, thanks, I'll send a v2
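
The difference between the two loop shapes can be checked outside the kernel. The following is a minimal userspace sketch, assuming a hypothetical struct cwnd_state that stands in for the three tcp_sock fields involved; the function names are illustrative and not part of the patch. With snd_cwnd = 10 and cnt = 20, the in-loop increment raises the window only to 11, while the delta form raises it to 12.

```c
#include <assert.h>

/* Hypothetical stand-in for the three tcp_sock fields the loop touches. */
struct cwnd_state {
	unsigned int snd_cwnd;       /* congestion window, in packets */
	unsigned int snd_cwnd_cnt;   /* credit accumulated toward the next increase */
	unsigned int snd_cwnd_clamp; /* upper bound for snd_cwnd */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Old shape: snd_cwnd grows inside the loop, so each later step costs more. */
static void grow_in_loop(struct cwnd_state *s, unsigned int cnt)
{
	assert(s->snd_cwnd > 0); /* a zero window would never terminate */
	s->snd_cwnd_cnt += cnt;
	while (s->snd_cwnd_cnt >= s->snd_cwnd) {
		s->snd_cwnd_cnt -= s->snd_cwnd;
		if (s->snd_cwnd < s->snd_cwnd_clamp)
			s->snd_cwnd++;
	}
}

/* Suggested shape: count steps against the unchanged window, apply once. */
static void grow_with_delta(struct cwnd_state *s, unsigned int cnt)
{
	unsigned int delta = 0;

	assert(s->snd_cwnd > 0);
	s->snd_cwnd_cnt += cnt;
	while (s->snd_cwnd_cnt >= s->snd_cwnd) {
		s->snd_cwnd_cnt -= s->snd_cwnd;
		delta++;
	}
	s->snd_cwnd = min_uint(s->snd_cwnd + delta, s->snd_cwnd_clamp);
}
```

Applying the clamp once after the loop also means the loop never spins on a window that is already at its limit.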


* [PATCH v2 net-next] tcp: fix ABC in tcp_slow_start()
From: Eric Dumazet @ 2012-07-20 15:02 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Tom Herbert, Neal Cardwell, Yuchung Cheng,
	Stephen Hemminger, John Heffner, Nandita Dukkipati

From: Eric Dumazet <edumazet@google.com>

When/if sysctl_tcp_abc > 1, we expect to increase cwnd by 2 if the
received ACK acknowledges more than 2*MSS bytes, in tcp_slow_start()

Problem is this RFC 3465 statement is not correctly coded, as
the while () loop increases snd_cwnd one by one.

Add a new variable to avoid this off-by-one error.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: John Heffner <johnwheffner@gmail.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
---
v2: added John's suggestion

 net/ipv4/tcp_cong.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 04dbd7a..4d4db16 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -307,6 +307,7 @@ EXPORT_SYMBOL_GPL(tcp_is_cwnd_limited);
 void tcp_slow_start(struct tcp_sock *tp)
 {
 	int cnt; /* increase in packets */
+	unsigned int delta = 0;
 
 	/* RFC3465: ABC Slow start
 	 * Increase only after a full MSS of bytes is acked
@@ -333,9 +334,9 @@ void tcp_slow_start(struct tcp_sock *tp)
 	tp->snd_cwnd_cnt += cnt;
 	while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
 		tp->snd_cwnd_cnt -= tp->snd_cwnd;
-		if (tp->snd_cwnd < tp->snd_cwnd_clamp)
-			tp->snd_cwnd++;
+		delta++;
 	}
+	tp->snd_cwnd = min(tp->snd_cwnd + delta, tp->snd_cwnd_clamp);
 }
 EXPORT_SYMBOL_GPL(tcp_slow_start);
 


* Re: [patch net-next 5/6] bond_sysfs: use ream_num_tx_queues rather than params.tx_queue
From: Ben Hutchings @ 2012-07-20 15:03 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, edumazet, shemminger, fubar, andy
In-Reply-To: <1342787331-1866-6-git-send-email-jiri@resnulli.us>

Typo in the subject line. :-)

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH v2 net-next] tcp: fix ABC in tcp_slow_start()
From: Yuchung Cheng @ 2012-07-20 15:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Tom Herbert, Neal Cardwell,
	Stephen Hemminger, John Heffner, Nandita Dukkipati
In-Reply-To: <1342796553.2626.7389.camel@edumazet-glaptop>

On Fri, Jul 20, 2012 at 8:02 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> When/if sysctl_tcp_abc > 1, we expect to increase cwnd by 2 if the
> received ACK acknowledges more than 2*MSS bytes, in tcp_slow_start()
>
> Problem is this RFC 3465 statement is not correctly coded, as
> the while () loop increases snd_cwnd one by one.
>
> Add a new variable to avoid this off-by-one error.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
> Cc: Tom Herbert <therbert@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Nandita Dukkipati <nanditad@google.com>
> Cc: John Heffner <johnwheffner@gmail.com>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> ---
> v2: added John's suggestion
>
>  net/ipv4/tcp_cong.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
> index 04dbd7a..4d4db16 100644
> --- a/net/ipv4/tcp_cong.c
> +++ b/net/ipv4/tcp_cong.c
> @@ -307,6 +307,7 @@ EXPORT_SYMBOL_GPL(tcp_is_cwnd_limited);
>  void tcp_slow_start(struct tcp_sock *tp)
>  {
>         int cnt; /* increase in packets */
> +       unsigned int delta = 0;
>
>         /* RFC3465: ABC Slow start
>          * Increase only after a full MSS of bytes is acked
> @@ -333,9 +334,9 @@ void tcp_slow_start(struct tcp_sock *tp)
>         tp->snd_cwnd_cnt += cnt;
>         while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
>                 tp->snd_cwnd_cnt -= tp->snd_cwnd;
> -               if (tp->snd_cwnd < tp->snd_cwnd_clamp)
> -                       tp->snd_cwnd++;
> +               delta++;
Nice! this also removes wasteful iteration when clamp << cwnd_cnt.
>         }
> +       tp->snd_cwnd = min(tp->snd_cwnd + delta, tp->snd_cwnd_clamp);
>  }
>  EXPORT_SYMBOL_GPL(tcp_slow_start);
>
>
>


* Re: [PATCH 00/16]: Kill the ipv4 routing cache.
From: Ben Hutchings @ 2012-07-20 15:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120719.143403.1702333347618917729.davem@davemloft.net>

On Thu, 2012-07-19 at 14:34 -0700, David Miller wrote:
> The ipv4 routing cache is non-deterministic, performance wise, and is
> subject to reasonably easy to launch denial of service attacks.
[...]

This is a great explanation, but it still doesn't appear to be going
into the commit log...

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH] b44: add 64 bit stats
From: Eric Dumazet @ 2012-07-20 15:12 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Kevin Groeneveld, netdev
In-Reply-To: <1342794821.2678.8.camel@bwh-desktop.uk.solarflarecom.com>

On Fri, 2012-07-20 at 15:33 +0100, Ben Hutchings wrote:

> Don't we need an _irq variant for drivers that support netpoll?
> 

netpoll is such a hack that I would not worry about a 0.000001 % chance of
getting an "about to be wrapped 64bit counter" on a 32bit cpu.
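
For readers following the race discussed in this thread, here is a minimal single-threaded sketch of why a 64-bit counter read as two 32-bit halves can be torn when the writer runs between the half-reads, and how a sequence check catches it. It is a generic model and not the kernel's u64_stats implementation; struct stat64 and all function names are hypothetical, and the "interrupt" callback only simulates a writer running in another context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit counter as a 32-bit CPU sees it:
 * two halves that cannot be read in one step, plus a sequence number. */
struct stat64 {
	unsigned int seq; /* odd while an update is in progress */
	uint32_t lo;
	uint32_t hi;
};

/* Writer side: bump the sequence before and after touching the halves. */
static void stat64_add(struct stat64 *s, uint32_t n)
{
	uint64_t v = ((uint64_t)s->hi << 32) | s->lo;

	v += n;
	s->seq++;                    /* begin: sequence becomes odd */
	s->lo = (uint32_t)v;
	s->hi = (uint32_t)(v >> 32);
	s->seq++;                    /* end: sequence is even again */
}

/* Reader side. `interrupt` models a writer that fires between the two
 * half-reads on the first attempt only; pass NULL for a quiet read. */
static uint64_t stat64_read(struct stat64 *s,
			    void (*interrupt)(struct stat64 *),
			    unsigned int *retries)
{
	unsigned int start;
	uint32_t lo, hi;

	*retries = 0;
	for (;;) {
		start = s->seq;
		assert((start & 1) == 0); /* model: no update in flight here */
		lo = s->lo;
		if (interrupt) {
			interrupt(s);
			interrupt = NULL;
		}
		hi = s->hi;
		if (s->seq == start)      /* nothing changed: halves match */
			return ((uint64_t)hi << 32) | lo;
		(*retries)++;             /* torn read detected, try again */
	}
}

/* Example writer used to disturb a read. */
static void bump_by_one(struct stat64 *s)
{
	stat64_add(s, 1);
}
```

Starting from 0xffffffff, a writer that adds 1 between the half-reads would yield the torn value 0x1ffffffff without the sequence check; with it, the reader retries once and returns 0x100000000.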


* Re: [ethtool 1/2] ethtool.h: implement new MDI-X set defines
From: Ben Hutchings @ 2012-07-20 15:24 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: Jesse Brandeburg, netdev, davem
In-Reply-To: <1342765524-29711-1-git-send-email-jeffrey.t.kirsher@intel.com>

On Thu, 2012-07-19 at 23:25 -0700, Jeff Kirsher wrote:
> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 
> These changes implement the kernel side of the interface
> for allowing drivers to set MDI-X state on twisted pair.
> 
> Changes implemented as suggested by Ben Hutchings, thanks Ben!
> 
> see ethtool patches titled:
> ethtool: allow setting MDI-X state
[...]

I don't see the corresponding kernel changes, but I'll hold onto these
for a while in the hope that they show up.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: New commands to configure IOV features
From: Chris Friesen @ 2012-07-20 15:27 UTC (permalink / raw)
  To: David Miller
  Cc: ddutile, yuvalmin, bhutchings, gregory.v.rose, netdev, linux-pci
In-Reply-To: <20120717.141153.46613285253481776.davem@davemloft.net>

On 07/17/2012 03:11 PM, David Miller wrote:
> From: Chris Friesen<chris.friesen@genband.com>
> Date: Tue, 17 Jul 2012 15:08:45 -0600
>
>>  From that perspective a sysfs-based interface is ideal since it is
>> directly scriptable.
>
> As is anything ethtool or netlink based, since we have 'ethtool'
> and 'ip' for scripting.

I'm not picky...whatever works.

To me the act of creating virtual functions seems generic enough (I'm 
aware of SR-IOV capable storage controllers, I'm sure there is other 
hardware as well) that ethtool/ip don't really seem like the most 
appropriate tools for the job.

I would have thought it would make more sense as a generic PCI 
functionality, in which case I'm not aware of an existing binary tool 
that would be a logical choice to extend.

Chris


* Re: [PATCH net-next 4/7] sfc: Add support for IEEE-1588 PTP
From: Richard Cochran @ 2012-07-20 15:30 UTC (permalink / raw)
  To: Stuart Hodgson
  Cc: Ben Hutchings, David Miller, netdev, linux-net-drivers,
	Andrew Jackson
In-Reply-To: <500921C2.1080001@solarflare.com>

On Fri, Jul 20, 2012 at 10:15:46AM +0100, Stuart Hodgson wrote:
> 
> Do you mean using the PPS kernel consumer to govern the system time?

Well, I meant just using the PPS subsystem, which does not necessarily
mean that the kernel consumer has to be used. In my experience, it is
better to handle the servo in user space, but in any case, the user
has the choice of what to do.

> >>>> +	ptp_pps_evt.type = PTP_CLOCK_EXTTS;
> >>>> +	ptp_pps_evt.timestamp = ktime_to_ns(gen_time_host);
> >>>> +	ptp_clock_event(ptp->phc_clock, &ptp_pps_evt);

> In order for a PPS to arrive at the kernel consumer ptp_clock_event
> needs to be called with PTP_CLOCK_PPS. This then calls pps_get_ts
> and stamps the event with the current system time, not the time
> that was put into the event.

Oops, I meant PTP_CLOCK_PPS. I overlooked that your code is making an
external timestamp event, but the basic idea is similar.

> Using PTP_CLOCK_EXTTS the PPS is visible to userspace via a read
> on the phc device and can then be used in our modified ptpd2.

How does your program use this information?

> > ... why can't you also just set the time?
> 
> Our hardware can only have an offset applied to the clock. In order to set time
> we need to know the time now, then work out and offset to get to the target time.
> At the point that we apply this offset the clock will have moved on and not be
> set to the target time. We can apply some measured average times to the offset
> to get closer but with this hardware settime will not leave the NIC clock at the
> desired time.

It does not matter if setting the time introduces a small error. That
usually happens, but it is no big deal, since the servo in the PTP
stack will correct the error.
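
The offset-only behaviour described above can be illustrated with a small model. This is a sketch under stated assumptions, not driver code: struct offset_only_clock and every function here are hypothetical, times are plain nanosecond counts, and latency_ns stands for whatever time elapses between reading the clock and applying the offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clock that can only be shifted by an offset and never
 * written directly. Times are in nanoseconds. */
struct offset_only_clock {
	int64_t now_ns;
};

static int64_t clock_read(const struct offset_only_clock *c)
{
	return c->now_ns;
}

static void clock_adjust(struct offset_only_clock *c, int64_t offset_ns)
{
	c->now_ns += offset_ns;
}

/* Model of time passing between the read and the adjustment. */
static void clock_advance(struct offset_only_clock *c, int64_t elapsed_ns)
{
	c->now_ns += elapsed_ns;
}

/* Emulated settime: read, compute the offset, apply it. Returns how far
 * the clock ends up from target_ns (clock minus target). */
static int64_t emulated_settime(struct offset_only_clock *c,
				int64_t target_ns, int64_t latency_ns)
{
	int64_t offset_ns = target_ns - clock_read(c);

	assert(latency_ns >= 0);
	clock_advance(c, latency_ns);
	clock_adjust(c, offset_ns);
	return clock_read(c) - target_ns;
}
```

In this model the clock ends up ahead of the target by exactly the elapsed latency, which is the kind of small, bounded error a servo loop is expected to remove.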

Thanks,
Richard


* [PATCH net-next] tcp: improve latencies of timer triggered events
From: Eric Dumazet @ 2012-07-20 15:45 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Tom Herbert, Yuchung Cheng, Neal Cardwell,
	Nandita Dukkipati, John Heffner, H.K. Jerry Chu

From: Eric Dumazet <edumazet@google.com>

The modern TCP stack depends heavily on tcp_write_timer() having a small
latency, but the current implementation does not meet that
expectation.

When a timer fires but finds the socket is owned by the user, it rearms
itself for an additional delay, hoping the next run will be more
successful.

tcp_write_timer(), for example, uses a 50ms delay for the next try, which
defeats many attempts to get predictable TCP behavior in terms of
latency.

Use the recently introduced tcp_release_cb(), so that the user owning
the socket will call various handlers right before socket release.

This will permit us to post a followup patch to address the
tcp_tso_should_defer() syndrome (some deferred packets have to wait
for the RTO timer to be transmitted, while cwnd should allow us to send
them sooner).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: John Heffner <johnwheffner@gmail.com>
---
 include/linux/tcp.h   |    4 +-
 include/net/tcp.h     |    2 +
 net/ipv4/tcp_output.c |   46 ++++++++++++++++----------
 net/ipv4/tcp_timer.c  |   70 +++++++++++++++++++++-------------------
 4 files changed, 71 insertions(+), 51 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 9febfb6..2761856 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -515,7 +515,9 @@ struct tcp_sock {
 enum tsq_flags {
 	TSQ_THROTTLED,
 	TSQ_QUEUED,
-	TSQ_OWNED, /* tcp_tasklet_func() found socket was locked */
+	TCP_TSQ_DEFERRED,	   /* tcp_tasklet_func() found socket was owned */
+	TCP_WRITE_TIMER_DEFERRED,  /* tcp_write_timer() found socket was owned */
+	TCP_DELACK_TIMER_DEFERRED, /* tcp_delack_timer() found socket was owned */
 };
 
 static inline struct tcp_sock *tcp_sk(const struct sock *sk)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index bc7c134..e19124b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -350,6 +350,8 @@ extern int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 extern int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 			size_t size, int flags);
 extern void tcp_release_cb(struct sock *sk);
+extern void tcp_write_timer_handler(struct sock *sk);
+extern void tcp_delack_timer_handler(struct sock *sk);
 extern int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 extern int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
 				 const struct tcphdr *th, unsigned int len);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 27a32ac..950aebf 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -837,6 +837,13 @@ struct tsq_tasklet {
 };
 static DEFINE_PER_CPU(struct tsq_tasklet, tsq_tasklet);
 
+static void tcp_tsq_handler(struct sock *sk)
+{
+	if ((1 << sk->sk_state) &
+	    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_CLOSING |
+	     TCPF_CLOSE_WAIT  | TCPF_LAST_ACK))
+		tcp_write_xmit(sk, tcp_current_mss(sk), 0, 0, GFP_ATOMIC);
+}
 /*
  * One tasklest per cpu tries to send more skbs.
  * We run in tasklet context but need to disable irqs when
@@ -864,16 +871,10 @@ static void tcp_tasklet_func(unsigned long data)
 		bh_lock_sock(sk);
 
 		if (!sock_owned_by_user(sk)) {
-			if ((1 << sk->sk_state) &
-			    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 |
-			     TCPF_CLOSING | TCPF_CLOSE_WAIT | TCPF_LAST_ACK))
-				tcp_write_xmit(sk,
-					       tcp_current_mss(sk),
-					       0, 0,
-					       GFP_ATOMIC);
+			tcp_tsq_handler(sk);
 		} else {
 			/* defer the work to tcp_release_cb() */
-			set_bit(TSQ_OWNED, &tp->tsq_flags);
+			set_bit(TCP_TSQ_DEFERRED, &tp->tsq_flags);
 		}
 		bh_unlock_sock(sk);
 
@@ -882,6 +883,9 @@ static void tcp_tasklet_func(unsigned long data)
 	}
 }
 
+#define TCP_DEFERRED_ALL ((1UL << TCP_TSQ_DEFERRED) |		\
+			  (1UL << TCP_WRITE_TIMER_DEFERRED) |	\
+			  (1UL << TCP_DELACK_TIMER_DEFERRED))
 /**
  * tcp_release_cb - tcp release_sock() callback
  * @sk: socket
@@ -892,16 +896,24 @@ static void tcp_tasklet_func(unsigned long data)
 void tcp_release_cb(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	unsigned long flags, nflags;
 
-	if (test_and_clear_bit(TSQ_OWNED, &tp->tsq_flags)) {
-		if ((1 << sk->sk_state) &
-		    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 |
-		     TCPF_CLOSING | TCPF_CLOSE_WAIT | TCPF_LAST_ACK))
-			tcp_write_xmit(sk,
-				       tcp_current_mss(sk),
-				       0, 0,
-				       GFP_ATOMIC);
-	}
+	/* perform an atomic operation only if at least one flag is set */
+	do {
+		flags = tp->tsq_flags;
+		if (!(flags & TCP_DEFERRED_ALL))
+			return;
+		nflags = flags & ~TCP_DEFERRED_ALL;
+	} while (cmpxchg(&tp->tsq_flags, flags, nflags) != flags);
+
+	if (flags & (1UL << TCP_TSQ_DEFERRED))
+		tcp_tsq_handler(sk);
+
+	if (flags & (1UL << TCP_WRITE_TIMER_DEFERRED))
+		tcp_write_timer_handler(sk);
+
+	if (flags & (1UL << TCP_DELACK_TIMER_DEFERRED))
+		tcp_delack_timer_handler(sk);
 }
 EXPORT_SYMBOL(tcp_release_cb);
 
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index e911e6c..6df36ad 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -32,17 +32,6 @@ int sysctl_tcp_retries2 __read_mostly = TCP_RETR2;
 int sysctl_tcp_orphan_retries __read_mostly;
 int sysctl_tcp_thin_linear_timeouts __read_mostly;
 
-static void tcp_write_timer(unsigned long);
-static void tcp_delack_timer(unsigned long);
-static void tcp_keepalive_timer (unsigned long data);
-
-void tcp_init_xmit_timers(struct sock *sk)
-{
-	inet_csk_init_xmit_timers(sk, &tcp_write_timer, &tcp_delack_timer,
-				  &tcp_keepalive_timer);
-}
-EXPORT_SYMBOL(tcp_init_xmit_timers);
-
 static void tcp_write_err(struct sock *sk)
 {
 	sk->sk_err = sk->sk_err_soft ? : ETIMEDOUT;
@@ -205,21 +194,11 @@ static int tcp_write_timeout(struct sock *sk)
 	return 0;
 }
 
-static void tcp_delack_timer(unsigned long data)
+void tcp_delack_timer_handler(struct sock *sk)
 {
-	struct sock *sk = (struct sock *)data;
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
 
-	bh_lock_sock(sk);
-	if (sock_owned_by_user(sk)) {
-		/* Try again later. */
-		icsk->icsk_ack.blocked = 1;
-		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOCKED);
-		sk_reset_timer(sk, &icsk->icsk_delack_timer, jiffies + TCP_DELACK_MIN);
-		goto out_unlock;
-	}
-
 	sk_mem_reclaim_partial(sk);
 
 	if (sk->sk_state == TCP_CLOSE || !(icsk->icsk_ack.pending & ICSK_ACK_TIMER))
@@ -260,7 +239,21 @@ static void tcp_delack_timer(unsigned long data)
 out:
 	if (sk_under_memory_pressure(sk))
 		sk_mem_reclaim(sk);
-out_unlock:
+}
+
+static void tcp_delack_timer(unsigned long data)
+{
+	struct sock *sk = (struct sock *)data;
+
+	bh_lock_sock(sk);
+	if (!sock_owned_by_user(sk)) {
+		tcp_delack_timer_handler(sk);
+	} else {
+		inet_csk(sk)->icsk_ack.blocked = 1;
+		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOCKED);
+		/* delegate our work to tcp_release_cb() */
+		set_bit(TCP_DELACK_TIMER_DEFERRED, &tcp_sk(sk)->tsq_flags);
+	}
 	bh_unlock_sock(sk);
 	sock_put(sk);
 }
@@ -450,19 +443,11 @@ out_reset_timer:
 out:;
 }
 
-static void tcp_write_timer(unsigned long data)
+void tcp_write_timer_handler(struct sock *sk)
 {
-	struct sock *sk = (struct sock *)data;
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	int event;
 
-	bh_lock_sock(sk);
-	if (sock_owned_by_user(sk)) {
-		/* Try again later */
-		sk_reset_timer(sk, &icsk->icsk_retransmit_timer, jiffies + (HZ / 20));
-		goto out_unlock;
-	}
-
 	if (sk->sk_state == TCP_CLOSE || !icsk->icsk_pending)
 		goto out;
 
@@ -485,7 +470,19 @@ static void tcp_write_timer(unsigned long data)
 
 out:
 	sk_mem_reclaim(sk);
-out_unlock:
+}
+
+static void tcp_write_timer(unsigned long data)
+{
+	struct sock *sk = (struct sock *)data;
+
+	bh_lock_sock(sk);
+	if (!sock_owned_by_user(sk)) {
+		tcp_write_timer_handler(sk);
+	} else {
+		/* delegate our work to tcp_release_cb() */
+		set_bit(TCP_WRITE_TIMER_DEFERRED, &tcp_sk(sk)->tsq_flags);
+	}
 	bh_unlock_sock(sk);
 	sock_put(sk);
 }
@@ -602,3 +599,10 @@ out:
 	bh_unlock_sock(sk);
 	sock_put(sk);
 }
+
+void tcp_init_xmit_timers(struct sock *sk)
+{
+	inet_csk_init_xmit_timers(sk, &tcp_write_timer, &tcp_delack_timer,
+				  &tcp_keepalive_timer);
+}
+EXPORT_SYMBOL(tcp_init_xmit_timers);
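
The "record a deferred bit, then take and clear all deferred bits in one atomic step" pattern used by the patch can be tried in userspace. The following is a minimal sketch using C11 <stdatomic.h> rather than the kernel's set_bit() and cmpxchg(); the bit names and both functions are hypothetical and only mirror the shape of the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit numbers mirroring the three deferred flags. */
enum {
	TSQ_DEFERRED_BIT          = 2,
	WRITE_TIMER_DEFERRED_BIT  = 3,
	DELACK_TIMER_DEFERRED_BIT = 4
};

#define DEFERRED_ALL ((1UL << TSQ_DEFERRED_BIT) |		\
		      (1UL << WRITE_TIMER_DEFERRED_BIT) |	\
		      (1UL << DELACK_TIMER_DEFERRED_BIT))

/* Timer side: the socket is owned by the user, so only record the work. */
static void defer_work(atomic_ulong *flags, int bit)
{
	assert(bit >= 0 && bit < 32);
	atomic_fetch_or(flags, 1UL << bit);
}

/* Release side: atomically take and clear every deferred bit, leaving
 * unrelated bits alone. Returns the deferred bits that were set. When
 * none is set, no atomic read-modify-write is issued at all. */
static unsigned long take_deferred(atomic_ulong *flags)
{
	unsigned long old = atomic_load(flags);

	do {
		if (!(old & DEFERRED_ALL))
			return 0;
		/* on failure, `old` is refreshed with the current value */
	} while (!atomic_compare_exchange_weak(flags, &old,
					       old & ~DEFERRED_ALL));

	return old & DEFERRED_ALL;
}
```

The caller would then test each returned bit and run the matching handler, so work recorded while the lock was held runs at release time instead of waiting for a rearmed timer.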


* Re: [PATCH net/for-next V1 1/1] IB/ipoib: break linkage to neighbouring system
From: Or Gerlitz @ 2012-07-20 15:49 UTC (permalink / raw)
  To: roland, davem; +Cc: linux-rdma, erezsh, Shlomo Pongratz, Or Gerlitz, netdev
In-Reply-To: <1342703938-29904-2-git-send-email-ogerlitz@mellanox.com>

On Thu, Jul 19, 2012 at 4:18 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> From: Shlomo Pongratz <shlomop@mellanox.com>
>
> Dave Miller <davem@davemloft.net> provided a detailed description of why the
> way IPoIB is using neighbours for its own ipoib_neigh struct is buggy:
[...]

> This patch aims to solve the race conditions found in the IPoIB driver.
>
> The patch breaks the connection between the core networking neighbour structure
> and the ipoib_neigh structure. Besides avoiding the race, it allows operation
> under a setup where SKBs carrying IP packets that don't have any associated
> neighbour are transmitted through IPoIB.
>
> We add an ipoib_neigh hash table with 1024 buckets. The hash table key is the destination
> hardware address. Thus the ipoib_neigh is fetched from the hash table and not
> dereferenced from the stashed location at the neighbour structure. The hash table uses
> both RCU and reference count mechanisms to guarantee that no ipoib_neigh instance is
> ever deleted while in use.
>
> Fetching the ipoib_neigh structure instance from the hash also makes the special
> code in ipoib_start_xmit that handles remote and local bonding failover redundant.
>
> Aged ipoib_neigh instances are deleted by a garbage collection task that runs every
> 30 seconds and deletes every ipoib_neigh instance that was idle for at least 60
> seconds. The deletion is safe since the ipoib_neigh instances are protected
> using RCU and reference count mechanisms.

Hi Dave, Roland, Eric

So how does this look? Is it heading in the right direction? Is there anything
that needs to be fixed?

Or.


* Re: [patch iproute2] iplink: add support for num[tr]xqueues
From: Stephen Hemminger @ 2012-07-20 15:52 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, David Miller, edumazet, shemminger

I like the option, but numtxqueues is too verbose for the syntax model
of iproute. Why not use txq and rxq?

Sent from my ASUS Pad

Jiri Pirko <jiri@resnulli.us> wrote:

>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>---
> include/linux/if_link.h |    2 ++
> ip/iplink.c             |   20 ++++++++++++++++++++
> man/man8/ip-link.8.in   |   13 +++++++++++++
> 3 files changed, 35 insertions(+)
>
>diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>index 00e5868..46f03db 100644
>--- a/include/linux/if_link.h
>+++ b/include/linux/if_link.h
>@@ -140,6 +140,8 @@ enum {
> 	IFLA_EXT_MASK,		/* Extended info mask, VFs, etc */
> 	IFLA_PROMISCUITY,	/* Promiscuity count: > 0 means acts PROMISC */
> #define IFLA_PROMISCUITY IFLA_PROMISCUITY
>+	IFLA_NUM_TX_QUEUES,
>+	IFLA_NUM_RX_QUEUES,
> 	__IFLA_MAX
> };
> 
>diff --git a/ip/iplink.c b/ip/iplink.c
>index 679091e..0baa128 100644
>--- a/ip/iplink.c
>+++ b/ip/iplink.c
>@@ -48,6 +48,8 @@ void iplink_usage(void)
> 		fprintf(stderr, "                   [ address LLADDR ]\n");
> 		fprintf(stderr, "                   [ broadcast LLADDR ]\n");
> 		fprintf(stderr, "                   [ mtu MTU ]\n");
>+		fprintf(stderr, "                   [ numtxqueues QUEUE_COUNT ]\n");
>+		fprintf(stderr, "                   [ numrxqueues QUEUE_COUNT ]\n");
> 		fprintf(stderr, "                   type TYPE [ ARGS ]\n");
> 		fprintf(stderr, "       ip link delete DEV type TYPE [ ARGS ]\n");
> 		fprintf(stderr, "\n");
>@@ -279,6 +281,8 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
> 	int mtu = -1;
> 	int netns = -1;
> 	int vf = -1;
>+	int numtxqueues = -1;
>+	int numrxqueues = -1;
> 
> 	*group = -1;
> 	ret = argc;
>@@ -445,6 +449,22 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
> 				invarg("Invalid operstate\n", *argv);
> 
> 			addattr8(&req->n, sizeof(*req), IFLA_OPERSTATE, state);
>+		} else if (strcmp(*argv, "numtxqueues") == 0) {
>+			NEXT_ARG();
>+			if (numtxqueues != -1)
>+				duparg("numtxqueues", *argv);
>+			if (get_integer(&numtxqueues, *argv, 0))
>+				invarg("Invalid \"numtxqueues\" value\n", *argv);
>+			addattr_l(&req->n, sizeof(*req), IFLA_NUM_TX_QUEUES,
>+				  &numtxqueues, 4);
>+		} else if (strcmp(*argv, "numrxqueues") == 0) {
>+			NEXT_ARG();
>+			if (numrxqueues != -1)
>+				duparg("numrxqueues", *argv);
>+			if (get_integer(&numrxqueues, *argv, 0))
>+				invarg("Invalid \"numrxqueues\" value\n", *argv);
>+			addattr_l(&req->n, sizeof(*req), IFLA_NUM_RX_QUEUES,
>+				  &numrxqueues, 4);
> 		} else {
> 			if (strcmp(*argv, "dev") == 0) {
> 				NEXT_ARG();
>diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
>index 9386cc6..8a24e51 100644
>--- a/man/man8/ip-link.8.in
>+++ b/man/man8/ip-link.8.in
>@@ -40,6 +40,11 @@ ip-link \- network device configuration
> .RB "[ " mtu
> .IR MTU " ]"
> .br
>+.RB "[ " numtxqueues
>+.IR QUEUE_COUNT " ]"
>+.RB "[ " numrxqueues
>+.IR QUEUE_COUNT " ]"
>+.br
> .BR type " TYPE"
> .RI "[ " ARGS " ]"
> 
>@@ -156,6 +161,14 @@ Link types:
> - Ethernet Bridge device
> .in -8
> 
>+.TP
>+.BI numtxqueues " QUEUE_COUNT "
>+specifies the number of transmit queues for new device.
>+
>+.TP
>+.BI numrxqueues " QUEUE_COUNT "
>+specifies the number of receive queues for new device.
>+
> .SS ip link delete - delete virtual link
> .I DEVICE
> specifies the virtual  device to act operate on.
>-- 
>1.7.10.4
>


* Re: New commands to configure IOV features
From: Don Dutile @ 2012-07-20 15:56 UTC (permalink / raw)
  To: Chris Friesen
  Cc: David Miller, yuvalmin, bhutchings, gregory.v.rose, netdev,
	linux-pci
In-Reply-To: <500978C7.5050004@genband.com>

On 07/20/2012 11:27 AM, Chris Friesen wrote:
> On 07/17/2012 03:11 PM, David Miller wrote:
>> From: Chris Friesen<chris.friesen@genband.com>
>> Date: Tue, 17 Jul 2012 15:08:45 -0600
>>
>>> From that perspective a sysfs-based interface is ideal since it is
>>> directly scriptable.
>>
>> As is anything ethtool or netlink based, since we have 'ethtool'
>> and 'ip' for scripting.
>
> I'm not picky...whatever works.
>
> To me the act of creating virtual functions seems generic enough (I'm aware of SR-IOV capable storage controllers, I'm sure there is other hardware as well) that ethtool/ip don't really seem like the most appropriate tools for the job.
>
Yes, and then there are 'other network' controllers too ... IB  which I don't know if it adheres to ethtool, since it's not an Ethernet device ... isn't that why they call it Infiniband ... ;-) )
In the telecom space, they use NTBs and PCI as a 'network' ... I know, not common in Linux space, and VFs in that space aren't being discussed (that I've ever heard), but another example where 'network' != Ethernet, so ethtool doesn't solve PCI-level configuration/use.

So, VFs are a PCI defined entity, so their enable/disablement should be handled by PCI.

Conversely, when dealing with networking attributes of a PCI-VF ethernet-nic, networking tools should be used, not PCI tools.

> I would have thought it would make more sense as a generic PCI functionality, in which case I'm not aware of an existing binary tool that would be a logical choice to extend.
>
> Chris


* RE: [RFC] r8169 : why SG / TX checksum are default disabled
From: hayeswang @ 2012-07-20 16:01 UTC (permalink / raw)
  To: 'Francois Romieu'; +Cc: 'David Miller', eric.dumazet, netdev
In-Reply-To: <20120720100846.GA17398@electric-eye.fr.zoreil.com>

 Francois Romieu [mailto:romieu@fr.zoreil.com] 

[...]
> > I find that the total length field of IP header would be modified if the hw
> > checksum is enabled. Therefore, skb_padto + hw checksum wouldn't work.
> 
> Ok, my patch completely ignored the fact that skb_padto does not change the
> length.
> 
> However skb_padto + length adjustment + hw checksum should work (at least in
> theory if not in the patch below) ?

If the hw only fills in the checksum fields of IP header, UDP header, and TCP
header, the patch would work. However, the hw would also fill in the total
length field of IP header, so it causes problems. For example, I send a packet
with ethernet header 14 bytes + IP header 20 bytes + data 20 bytes = 54 bytes.

Case 1: Software checksum + pad zeroes to 60 bytes
   The receiver gets this packet and finds that the total length in the IP
header is 40 bytes. Therefore, the receiver knows the data is
40 - 20 (IP header) = 20 bytes.

Case 2: pad zeroes to 60 bytes + hw checksum
   The receiver gets this packet and finds that the total length in the IP
header is 40 + (60-54) = 46 bytes, not 40 bytes. Therefore, the receiver
considers the data to be 46 - 20 = 26 bytes. However, the final 6 bytes are
padding and should not be part of the data.
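
The arithmetic in the two cases can be sketched in C. This is only a model of
the behaviour described above, assuming the hardware rewrites the IP total
length as the padded frame length minus the Ethernet header. The macro and
function names are made up for illustration and are not taken from the r8169
driver.

```c
#define TOY_ETH_HDR_LEN   14  /* Ethernet header, bytes */
#define TOY_IP_HDR_LEN    20  /* IPv4 header without options, bytes */
#define TOY_MIN_FRAME_LEN 60  /* minimum frame length the sender pads to */

/* Case 1: software sets the IP total length before padding is added. */
unsigned int toy_ip_total_len_sw(unsigned int payload_len)
{
	return TOY_IP_HDR_LEN + payload_len;
}

/*
 * Case 2 (assumed hardware behaviour): the total length is derived from
 * the frame length after padding, so the pad bytes are counted as data.
 */
unsigned int toy_ip_total_len_hw(unsigned int payload_len)
{
	unsigned int frame_len = TOY_ETH_HDR_LEN + TOY_IP_HDR_LEN + payload_len;
	unsigned int padded_len = frame_len < TOY_MIN_FRAME_LEN ?
				  TOY_MIN_FRAME_LEN : frame_len;

	return padded_len - TOY_ETH_HDR_LEN;
}

/* Payload length the receiver derives from a given IP total length. */
unsigned int toy_payload_seen(unsigned int ip_total_len)
{
	return ip_total_len - TOY_IP_HDR_LEN;
}
```

For the 20-byte payload in the example, the first function gives 40 and the
second gives 46, so the receiver would see 20 and 26 bytes of data
respectively.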

Best Regards,
Hayes

^ permalink raw reply

* Re: [PATCH v2 net-next] tcp: fix ABC in tcp_slow_start()
From: Neal Cardwell @ 2012-07-20 16:03 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Eric Dumazet, David Miller, netdev, Tom Herbert,
	Stephen Hemminger, John Heffner, Nandita Dukkipati
In-Reply-To: <CAK6E8=dkwruGMtW2uQ--xZ_so9aZQt-KcmHsy2yE5GApKphrJQ@mail.gmail.com>

On Fri, Jul 20, 2012 at 8:07 AM, Yuchung Cheng <ycheng@google.com> wrote:
> On Fri, Jul 20, 2012 at 8:02 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>         tp->snd_cwnd_cnt += cnt;
>>         while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {

Nice catch, Eric.

One thing that's always bothered me about the tp->snd_cwnd_cnt code is
that the slow start and congestion avoidance use different criteria
for incrementing snd_cwnd_cnt. tcp_slow_start() increments
snd_cwnd_cnt by snd_cwnd for each ACKed packet, and congestion
avoidance increases snd_cwnd_cnt by just 1 for each packet.

This means that if we exit slow start and enter congestion avoidance,
then we think we can have a "credit" for a bunch of ACKs that never
happened (up to snd_cwnd-1), so we can conceivably do our first
additive increase in congestion avoidance up to almost 1RTT too
early. Can we just get rid of the use of snd_cwnd_cnt in slow start,
and just use local variables in tcp_slow_start() rather than trying to
carry state between ACKs?
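
To make the "credit" concrete, here is a minimal toy model of the
additive-increase step. It is a sketch, not the kernel code: struct toy_tcp
and the toy_* helpers are hypothetical names, and the per-ACK rule is a
simplified assumption about how congestion avoidance consumes snd_cwnd_cnt.

```c
/* Toy state; field names follow the thread, not struct tcp_sock. */
struct toy_tcp {
	unsigned int snd_cwnd;      /* congestion window, in packets */
	unsigned int snd_cwnd_cnt;  /* credit toward the next window increase */
};

/* One ACK in congestion avoidance: add one credit, grow once per window. */
void toy_cong_avoid(struct toy_tcp *tp)
{
	if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
		tp->snd_cwnd++;
		tp->snd_cwnd_cnt = 0;
	} else {
		tp->snd_cwnd_cnt++;
	}
}

/*
 * Count the congestion-avoidance ACKs needed before snd_cwnd first grows.
 * The state is passed by value so the caller's copy is left untouched.
 */
unsigned int toy_acks_until_growth(struct toy_tcp tp)
{
	unsigned int start = tp.snd_cwnd;
	unsigned int acks = 0;

	while (tp.snd_cwnd == start) {
		toy_cong_avoid(&tp);
		acks++;
	}
	return acks;
}
```

With snd_cwnd = 200, a counter that starts at 0 needs 201 ACKs in this model
before the window grows, while a counter carried over from slow start at 150
needs only 51.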

neal

^ permalink raw reply

* Re: [PATCH v2 net-next] tcp: fix ABC in tcp_slow_start()
From: Eric Dumazet @ 2012-07-20 16:08 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Yuchung Cheng, David Miller, netdev, Tom Herbert,
	Stephen Hemminger, John Heffner, Nandita Dukkipati
In-Reply-To: <CADVnQy=nJz33pzd+o1WoCOR6Mk2pbB2x+bujqbfLUi8-+J=hGA@mail.gmail.com>

On Fri, 2012-07-20 at 09:03 -0700, Neal Cardwell wrote:
> On Fri, Jul 20, 2012 at 8:07 AM, Yuchung Cheng <ycheng@google.com> wrote:
> > On Fri, Jul 20, 2012 at 8:02 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>         tp->snd_cwnd_cnt += cnt;
> >>         while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
> 
> Nice catch, Eric.
> 
> One thing that's always bothered me about the tp->snd_cwnd_cnt code is
> that the slow start and congestion avoidance use different criteria
> for incrementing snd_cwnd_cnt. tcp_slow_start() increments
> snd_cwnd_cnt by snd_cwnd for each ACKed packet, and congestion
> avoidance increases snd_cwnd_cnt by just 1 for each packet.
> 
> This means that if we exit slow start and enter congestion avoidance,
> then we think we can have a "credit" for a bunch of ACKs that never
> happened (up to snd_cwnd-1), so we can conceivably do our first
> additive increase in congestion avoidance up to almost 1RTT too
> early. Can we just get rid of the use of snd_cwnd_cnt in slow start,
> and just use local variables in tcp_slow_start() rather than trying to
> carry state between ACKs?

Apparently tcp_slow_start() needs snd_cwnd_cnt in case
"limited slow start" is used:

cnt = sysctl_tcp_max_ssthresh >> 1;

So to address your point, maybe we should clear snd_cwnd_cnt
when leaving slow start for the congestion avoidance phase?

^ permalink raw reply

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
From: David Miller @ 2012-07-20 16:17 UTC (permalink / raw)
  To: hayeswang; +Cc: eric.dumazet, romieu, netdev
In-Reply-To: <EF06A7B3C1E6432BB660CDEF8E1D2A1B@realtek.com.tw>

From: hayeswang <hayeswang@realtek.com>
Date: Fri, 20 Jul 2012 15:14:26 +0800

> I find that the total length field of IP header would be modified if the hw
> checksum is enabled. Therefore, skb_padto + hw checksum wouldn't work. The
> software checksum is necessary.

It's really strange that the hardware has any reason to update the IP
header length field when it is asked to compute the checksum.

^ permalink raw reply

* Re: [PATCH 00/16]: Kill the ipv4 routing cache.
From: David Miller @ 2012-07-20 16:26 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev
In-Reply-To: <1342796969.2678.12.camel@bwh-desktop.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Fri, 20 Jul 2012 16:09:29 +0100

> On Thu, 2012-07-19 at 14:34 -0700, David Miller wrote:
>> The ipv4 routing cache is non-deterministic, performance wise, and is
>> subject to reasonably easy to launch denial of service attacks.
> [...]
> 
> This is a great explanation, but it still doesn't appear to be going
> into the commit log...

What in the world do you mean?

When I actually commit this stuff, I'll have it all in a branch and
merge it into net-next's master using "git merge --no-ff" and then use
the merge commit to add this commit message text.

Every damn set of changes I've committed over the past few weeks has
used this technique. Are you simply not paying attention?

^ permalink raw reply

* pull-request: can-next 2012-07-20
From: Marc Kleine-Budde @ 2012-07-20 16:27 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, linux-can@vger.kernel.org

Hello David,

the fifth pull request for upcoming v3.6 net-next cleans up and
improves the janz-ican3 driver (6 patches by Ira W. Snyder, one by me).
A patch by Steffen Trumtrar adds imx53 support to the flexcan driver.
Another patch by me marks the bit timing constants in the CAN
drivers as "const".

regards, Marc

---

The following changes since commit 769162e38b91e1d300752e666260fa6c7b203fbc:

  Merge branch 'net' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile (2012-07-19 13:39:27 -0700)

are available in the git repository at:


  git://gitorious.org/linux-can/linux-can-next.git for-davem

for you to fetch changes up to 3b5c6b9e49f78f07ebcd34b38c1185e57a0fd9eb:

  can: janz-ican3: add support for one shot mode (2012-07-20 17:49:05 +0200)

----------------------------------------------------------------
Ira W. Snyder (6):
      can: janz-ican3: remove dead code
      can: janz-ican3: drop invalid skbs
      can: janz-ican3: fix error and byte counters
      can: janz-ican3: fix support for CAN_RAW_RECV_OWN_MSGS
      can: janz-ican3: avoid firmware lockup caused by infinite bus error quota
      can: janz-ican3: add support for one shot mode

Marc Kleine-Budde (2):
      can: mark bittiming_const pointer in struct can_priv as const
      can: janz-ican3: cleanup of ican3_to_can_frame and can_frame_to_ican3

Steffen Trumtrar (1):
      can: flexcan: add 2nd clock to support imx53 and newer

 drivers/net/can/at91_can.c                   |    2 +-
 drivers/net/can/bfin_can.c                   |    2 +-
 drivers/net/can/c_can/c_can.c                |    2 +-
 drivers/net/can/cc770/cc770.c                |    2 +-
 drivers/net/can/flexcan.c                    |   47 +++--
 drivers/net/can/janz-ican3.c                 |  241 ++++++++++++++++++++------
 drivers/net/can/mcp251x.c                    |    2 +-
 drivers/net/can/mscan/mscan.c                |    2 +-
 drivers/net/can/pch_can.c                    |    2 +-
 drivers/net/can/sja1000/sja1000.c            |    2 +-
 drivers/net/can/ti_hecc.c                    |    2 +-
 drivers/net/can/usb/ems_usb.c                |    2 +-
 drivers/net/can/usb/esd_usb2.c               |    2 +-
 drivers/net/can/usb/peak_usb/pcan_usb_core.h |    2 +-
 include/linux/can/dev.h                      |    2 +-
 15 files changed, 226 insertions(+), 88 deletions(-)

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



^ permalink raw reply

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
From: David Miller @ 2012-07-20 16:28 UTC (permalink / raw)
  To: hayeswang; +Cc: romieu, eric.dumazet, netdev
In-Reply-To: <A4FD513AF09D40849B8A75C371F1D431@realtek.com.tw>

From: hayeswang <hayeswang@realtek.com>
Date: Sat, 21 Jul 2012 00:01:38 +0800

> However, the hw would also fill in the total length field of IP
> header, so it causes problems.

Why does the HW modify fields we did not ask it to?

^ permalink raw reply

* Re: [RFC PATCH] net: Add support for virtual machine device queues (VMDQ)
From: John Fastabend @ 2012-07-20 16:30 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: or.gerlitz, davem, roland, netdev, ali, sean.hefty, shlomop,
	Ronciak, John
In-Reply-To: <20120719064258.GA1665@minipsycho.orion>

On 7/18/2012 11:42 PM, Jiri Pirko wrote:
> Thu, Jul 19, 2012 at 12:05:44AM CEST, john.r.fastabend@intel.com wrote:
>> This adds support to allow virtual net devices to be created. These
>> devices can be managed independently of the physical function but
>> use the same physical link.

[...]

>> +
>> +size_t vmdq_getpriv_size(struct net *src_net, struct nlattr *tb[])
>> +{
>> +	struct net_device *lowerdev;
>> +
>> +	if (!tb[IFLA_LINK])
>> +		return -EINVAL;
>> +
>> +	lowerdev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
>> +	if (!lowerdev)
>> +		return -ENODEV;
>> +
>> +	return sizeof(netdev_priv(lowerdev));
>> +}
>
> Why exactly do you need to have the priv of same size as lowerdev? I do
> not see you use that anywhere...
>

When we add a child device the hardware/sw may have some private data
it needs to manage this device.

I made an assumption here that the priv space for child devices is the
same as the lowerdev but this might be a bad assumption.
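
One thing worth checking against that assumption: in C, sizeof applied to a
call that returns a pointer gives the size of the pointer, not of the object
it points to. The sketch below illustrates this with stand-in types; struct
toy_priv, struct toy_netdev and the toy_* functions are hypothetical and only
assume an accessor that, like netdev_priv(), returns void *.

```c
#include <stddef.h>

/* Stand-in for a driver's private data area. */
struct toy_priv {
	char state[256];
};

/* Stand-in for a device that embeds its private data. */
struct toy_netdev {
	struct toy_priv priv;
};

/* Accessor that hands back the private area as an untyped pointer. */
void *toy_netdev_priv(struct toy_netdev *dev)
{
	return &dev->priv;
}

/*
 * sizeof looks only at the type of the expression (void *); the call is
 * not evaluated, so this returns the pointer size.
 */
size_t toy_priv_size_via_accessor(struct toy_netdev *dev)
{
	return sizeof(toy_netdev_priv(dev));
}

/* Naming the type explicitly gives the size of the private area. */
size_t toy_priv_size_explicit(void)
{
	return sizeof(struct toy_priv);
}
```

In this sketch the first helper returns sizeof(void *) on any platform, while
the second returns 256.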

.John

^ permalink raw reply

* Re: [PATCH 00/16]: Kill the ipv4 routing cache.
From: Eric Dumazet @ 2012-07-20 16:43 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, netdev
In-Reply-To: <20120720.092614.1285553995467252576.davem@davemloft.net>

On Fri, 2012-07-20 at 09:26 -0700, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Fri, 20 Jul 2012 16:09:29 +0100
> 
> > On Thu, 2012-07-19 at 14:34 -0700, David Miller wrote:
> >> The ipv4 routing cache is non-deterministic, performance wise, and is
> >> subject to reasonably easy to launch denial of service attacks.
> > [...]
> > 
> > This is a great explanation, but it still doesn't appear to be going
> > into the commit log...
> 
> What in the world do you mean?
> 
> When I actually commit this stuff, I'll have it all in a branch and
> merge it into net-next's master using "git merge --no-ff" and then use
> the merge commit to add this commit message text.

I am not familiar with this "git merge --no-ff", so the following
might be completely irrelevant :

It would be nice if you could copy the changelog of 00/16 to 01/16.

01/16 has no changelog, and it probably can confuse git users in the
future, since this patch is not trivial.

^ permalink raw reply

* Re: [PATCH v2 net-next] tcp: fix ABC in tcp_slow_start()
From: Neal Cardwell @ 2012-07-20 16:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Yuchung Cheng, David Miller, netdev, Tom Herbert,
	Stephen Hemminger, John Heffner, Nandita Dukkipati
In-Reply-To: <1342800536.2626.7670.camel@edumazet-glaptop>

On Fri, Jul 20, 2012 at 9:08 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> So to address your point, maybe we should clear  snd_cwnd_cnt
> when leaving slow start for congestion avoidance phase ?

Sounds good. That can be a separate commit to add the new logic to the
end of tcp_slow_start() to check to see if we've bumped into ssthresh
and reset snd_cwnd_cnt.
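
A rough sketch of what that could look like, using a toy model rather than
the real tcp_slow_start(): struct toy_tcp, toy_slow_start() and the
max_ssthresh parameter are hypothetical stand-ins, and the limited slow start
branch is a simplified assumption based on the snippet quoted earlier in the
thread.

```c
/* Toy state; field names follow the thread, not struct tcp_sock. */
struct toy_tcp {
	unsigned int snd_cwnd;      /* congestion window, in packets */
	unsigned int snd_cwnd_cnt;  /* credit toward the next window increase */
	unsigned int snd_ssthresh;  /* slow start threshold */
};

/* One ACK in slow start; max_ssthresh == 0 disables limited slow start. */
void toy_slow_start(struct toy_tcp *tp, unsigned int max_ssthresh)
{
	unsigned int cnt;

	if (max_ssthresh > 0 && tp->snd_cwnd > max_ssthresh)
		cnt = max_ssthresh >> 1;  /* limited slow start */
	else
		cnt = tp->snd_cwnd;       /* exponential increase */

	tp->snd_cwnd_cnt += cnt;
	while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
		tp->snd_cwnd_cnt -= tp->snd_cwnd;
		tp->snd_cwnd++;
	}

	/*
	 * Proposed addition: once the window has reached ssthresh, drop any
	 * leftover credit so congestion avoidance starts counting from zero.
	 */
	if (tp->snd_cwnd >= tp->snd_ssthresh)
		tp->snd_cwnd_cnt = 0;
}
```

While the window is still below ssthresh the counter is kept, so limited slow
start continues to accumulate credit across ACKs as before.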

neal

^ permalink raw reply
