* [PATCH for-2.6.35] virtio_net: fix oom handling on tx
From: Michael S. Tsirkin @ 2010-06-10 15:20 UTC (permalink / raw)
To: virtualization, Rusty Russell, Michael S. Tsirkin, Jiri Pirko,
Shirley Ma
virtio net will never try to overflow the TX ring, so the only reason
add_buf may fail is out of memory. Thus, we can not stop the
device until some request completes - there's no guarantee anything
at all is outstanding.
Make the error message clearer as well: error here does not
indicate queue full.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/net/virtio_net.c | 15 ++++++++-------
1 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 85615a3..e48a06f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -563,7 +563,6 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
struct virtnet_info *vi = netdev_priv(dev);
int capacity;
-again:
/* Free up any pending old buffers before queueing new ones. */
free_old_xmit_skbs(vi);
@@ -572,12 +571,14 @@ again:
/* This can happen with OOM and indirect buffers. */
if (unlikely(capacity < 0)) {
- netif_stop_queue(dev);
- dev_warn(&dev->dev, "Unexpected full queue\n");
- if (unlikely(!virtqueue_enable_cb(vi->svq))) {
- virtqueue_disable_cb(vi->svq);
- netif_start_queue(dev);
- goto again;
+ if (net_ratelimit()) {
+ if (likely(capacity == -ENOMEM))
+ dev_warn(&dev->dev,
+ "TX queue failure: out of memory\n");
+ else
+ dev_warn(&dev->dev,
+ "Unexpected TX queue failure: %d\n",
+ capacity);
}
return NETDEV_TX_BUSY;
}
--
1.7.1.12.g42b7f
^ permalink raw reply related
* Re: [PATCH] virtio_net: indicate oom when addbuf returns failure
From: Bruce Rogers @ 2010-06-10 15:46 UTC (permalink / raw)
To: Michael S. Tsirkin, Rusty Russell
Cc: Herbert Xu, stable, virtualization, netdev
In-Reply-To: <20100606201258.GA21196@redhat.com>
>>> On 6/6/2010 at 02:13 PM, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Jun 04, 2010 at 10:28:56AM +0930, Rusty Russell wrote:
>> This patch is a subset of an already upstream patch, but this portion
>> is useful in earlier releases.
>>
>> Please consider for the 2.6.32 and 2.6.33 stable trees.
>>
>> If the add_buf operation fails, indicate failure to the caller.
>>
>> Signed-off-by: Bruce Rogers <brogers@novell.com>
>> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>
> Actually this code looks strange:
> Note that add_buf inicates out of memory
> condition with a positive return value, and ring full
> (which is not an error!) with -ENOSPC.
>
> So it seems that this patch (and upstream code) will fill
> the ring and then end up setting oom = true and rescheduling the work
> forever. And I suspect I actually saw this at some point
> on one of my systems: observed BW would drop
> with high CPU usage until reboot.
> Can't reproduce it now anymore ..
>
Thanks for looking into this.
We've decided not to use this patch, since it is not a part of the
solution we need. The upstream patch from whence it came at first glance seemed useful, but is problematic as you point out.
We've retested without that patch and are still getting good results.
Bruce
^ permalink raw reply
* Re: [PATCH 1/2] pktgen: increasing transmission granularity
From: robert @ 2010-06-10 15:44 UTC (permalink / raw)
To: Daniel Turull; +Cc: David Miller, netdev, robert, jens.laas
In-Reply-To: <4C10A603.2080402@gmail.com>
Thanks,
So the scheduling of the next transmission is based on the previous
transmission rather rather than now() + delay.
Cheers
--ro
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
>Daniel Turull wrote: This patch correct a bug in the delay of pktgen.
>It makes sure the inter-packet interval is accurate.
>
>Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
>
>---
>diff --git a/net/core/pktgen.c b/net/core/pktgen.c index
>2ad68da..1dacd7b 100644 --- a/net/core/pktgen.c +++
>b/net/core/pktgen.c @@ -2170,7 +2170,7 @@ static void spin(struct
>pktgen_dev *pkt_dev, ktime_t spin_until)
> end_time = ktime_now();
>
> pkt_dev->idle_acc += ktime_to_ns(ktime_sub(end_time,
> start_time));
>- pkt_dev->next_tx = ktime_add_ns(end_time, pkt_dev->delay);
>+ pkt_dev->next_tx = ktime_add_ns(spin_until, pkt_dev->delay);
> }
>
> static inline void set_pkt_overhead(struct pktgen_dev *pkt_dev)
^ permalink raw reply
* Re: [PATCH v3] netfilter: Xtables: idletimer target implementation
From: Jan Engelhardt @ 2010-06-10 15:55 UTC (permalink / raw)
To: Luciano Coelho
Cc: ext Patrick McHardy, netfilter-devel@vger.kernel.org,
netdev@vger.kernel.org, Timo Teras
In-Reply-To: <1276176776.5453.4.camel@chilepepper>
On Thursday 2010-06-10 15:32, Luciano Coelho wrote:
>
>> I also agree that it makes more sense to lookup and store the timer in
>> the targe.
>
>One quick question about this: did you mean to put it in the target info
>structure, in this case struct idletimer_tg_info? So is it okay to have
>internal data in this structure even though it is mostly for passing
>data from the userspace to the kernel?
You put a kernel-private pointer inside idletimer_tginfo,
similar to what other modules do.
That would also allow you to get a reference to the timer
so that you do not have to call find_by_label on every
idletimer_target invocation.
^ permalink raw reply
* Re: BUG: unable to handle kernel paging request at 000041ed00000001
From: Eric Dumazet @ 2010-06-10 16:00 UTC (permalink / raw)
To: Arturas; +Cc: netdev
In-Reply-To: <9D7251E7-0EFD-4645-BC30-A96191D1046E@res.lt>
Le jeudi 10 juin 2010 à 16:45 +0300, Arturas a écrit :
> Hello,
>
> I'm not sure if i'm writing to right mailling list, but i hope I am. I'm doing
> bonding, bridging and traffic shaping on linux. With such setup i have no
> panics for a few days. But when I add ip address on br0, assign bridge
> interface and ip address to different routing table and using iptables n
> at REDIRECT I'm getting an oops (see attachment). An oops triggers only
> after some traffic. Dmesg and .config also attached. I don't know is it
> enough information for you, but if not, just say what I should do to get
> more information and i'll try. Older kernels have deadlocks for such setup
> except bridge routing, so I can't try older kernels (>=2.6.32). If someone is interested i can write call traces.
>
> Bonding is not multiqueue aware right now and someone promised to make it mq aware
> (not just patch bonding with netdev_alloc_mq). Maybe someone knows what is a status? I can test patches.
>
> Performance tips for .config are very welcome.
>
> --
> Arturas
>
>
>
This is right mailing list :)
I would try following patch for 2.6.34,
not blindly trusting sk_tx_queue_get(sk)
--- net/core/dev.c.orig 2010-06-10 17:52:17.000000000 +0200
+++ net/core/dev.c 2010-06-10 17:54:56.000000000 +0200
@@ -1958,12 +1958,10 @@
static inline u16 dev_cap_txqueue(struct net_device *dev, u16 queue_index)
{
if (unlikely(queue_index >= dev->real_num_tx_queues)) {
- if (net_ratelimit()) {
- WARN(1, "%s selects TX queue %d, but "
- "real number of TX queues is %d\n",
- dev->name, queue_index,
- dev->real_num_tx_queues);
- }
+ WARN_ONCE("%s selects TX queue %d, but "
+ "real number of TX queues is %d\n",
+ dev->name, queue_index,
+ dev->real_num_tx_queues);
return 0;
}
return queue_index;
@@ -1977,6 +1975,7 @@
if (sk_tx_queue_recorded(sk)) {
queue_index = sk_tx_queue_get(sk);
+ queue_index = dev_cap_txqueue(dev, queue_index);
} else {
const struct net_device_ops *ops = dev->netdev_ops;
^ permalink raw reply
* [PATCH] allow to configure tcp_retries1 and tcp_retries2 per TCP socket
From: Salvador Fandino @ 2010-06-10 16:09 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, "; linux-kernel"
Hi,
The included patch adds support for setting the tcp_retries1 and
tcp_retries2 options in a per socket fashion as it is done for the
keepalive options TCP_KEEPIDLE, TCP_KEEPCNT and TCP_KEEPINTVL.
The issue I am trying to solve is that when a socket has data queued for
delivering, the keepalive logic is not triggered. Instead, the
tcp_retries1/2 parameters are used to determine how many delivering
attempts should be performed before giving up.
The patch is very straight forward and just replicates similar
functionality. There is one thing I am not completely sure and is if the
new per-socket fields should go into inet_connection_sock instead of
into tcp_sock.
Regards
Signed-off-by: Salvador Fandino <salvador@qindel.com>
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index a778ee0..15ca599 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -105,6 +105,8 @@ enum {
#define TCP_COOKIE_TRANSACTIONS 15 /* TCP Cookie Transactions */
#define TCP_THIN_LINEAR_TIMEOUTS 16 /* Use linear timeouts for thin streams*/
#define TCP_THIN_DUPACK 17 /* Fast retrans. after 1 dupack */
+#define TCP_RETRIES1 18 /* Number of attempts to retransmit packet normally */
+#define TCP_RETRIES2 19 /* Number of attempts to retransmit packet */
/* for TCP_INFO socket option */
#define TCPI_OPT_TIMESTAMPS 1
@@ -424,6 +426,9 @@ struct tcp_sock {
unsigned int keepalive_time; /* time before keep alive takes place */
unsigned int keepalive_intvl; /* time interval between keep alive probes */
+ int retries1; /* number of attempts to retransmit packed normally */
+ int retries2; /* number of attempts to retransmit packed */
+
int linger2;
/* Receiver side RTT estimation */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index a144914..6d13c97 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -138,6 +138,8 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
#define MAX_TCP_KEEPIDLE 32767
#define MAX_TCP_KEEPINTVL 32767
#define MAX_TCP_KEEPCNT 127
+#define MAX_TCP_RETRIES1 255
+#define MAX_TCP_RETRIES2 32767
#define MAX_TCP_SYNCNT 127
#define TCP_SYNQ_INTERVAL (HZ/5) /* Period of SYNACK timer */
@@ -1041,6 +1043,16 @@ static inline u32 keepalive_time_elapsed(const struct tcp_sock *tp)
tcp_time_stamp - tp->rcv_tstamp);
}
+static inline int tcp_retries1_when(const struct tcp_sock *tp)
+{
+ return tp->retries1 ? : sysctl_tcp_retries1;
+}
+
+static inline int tcp_retries2_when(const struct tcp_sock *tp)
+{
+ return tp->retries2 ? : sysctl_tcp_retries2;
+}
+
static inline int tcp_fin_time(const struct sock *sk)
{
int fin_timeout = tcp_sk(sk)->linger2 ? : sysctl_tcp_fin_timeout;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index d96c1da..d4f6c4a 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -23,7 +23,7 @@
#include <net/inet_frag.h>
static int zero;
-static int tcp_retr1_max = 255;
+static int tcp_retr1_max = MAX_TCP_RETRIES1;
static int ip_local_port_range_min[] = { 1, 1 };
static int ip_local_port_range_max[] = { 65535, 65535 };
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6596b4f..1fb25d5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2319,6 +2319,18 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
else
tp->keepalive_probes = val;
break;
+ case TCP_RETRIES1:
+ if (val < 1 || val > MAX_TCP_RETRIES1)
+ err = -EINVAL;
+ else
+ tp->retries1 = val;
+ break;
+ case TCP_RETRIES2:
+ if (val < 1 || val > MAX_TCP_RETRIES2)
+ err = -EINVAL;
+ else
+ tp->retries2 = val;
+ break;
case TCP_SYNCNT:
if (val < 1 || val > MAX_TCP_SYNCNT)
err = -EINVAL;
@@ -2511,6 +2523,12 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
case TCP_KEEPCNT:
val = keepalive_probes(tp);
break;
+ case TCP_RETRIES1:
+ val = tcp_retries1_when(tp);
+ break;
+ case TCP_RETRIES2:
+ val = tcp_retries2_when(tp);
+ break;
case TCP_SYNCNT:
val = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries;
break;
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 440a5c6..26db67b 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -167,6 +167,7 @@ static bool retransmits_timed_out(struct sock *sk,
static int tcp_write_timeout(struct sock *sk)
{
struct inet_connection_sock *icsk = inet_csk(sk);
+ struct tcp_sock *tp = tcp_sk(sk);
int retry_until;
bool do_reset;
@@ -175,14 +176,14 @@ static int tcp_write_timeout(struct sock *sk)
dst_negative_advice(sk);
retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries;
} else {
- if (retransmits_timed_out(sk, sysctl_tcp_retries1)) {
+ if (retransmits_timed_out(sk, tcp_retries1_when(tp))) {
/* Black hole detection */
tcp_mtu_probing(icsk, sk);
dst_negative_advice(sk);
}
- retry_until = sysctl_tcp_retries2;
+ retry_until = tcp_retries2_when(tp);
if (sock_flag(sk, SOCK_DEAD)) {
const int alive = (icsk->icsk_rto < TCP_RTO_MAX);
@@ -290,7 +291,7 @@ static void tcp_probe_timer(struct sock *sk)
* with RFCs, only probe timer combines both retransmission timeout
* and probe timeout in one bottle. --ANK
*/
- max_probes = sysctl_tcp_retries2;
+ max_probes = tcp_retries2_when(tp);
if (sock_flag(sk, SOCK_DEAD)) {
const int alive = ((icsk->icsk_rto << icsk->icsk_backoff) < TCP_RTO_MAX);
@@ -437,7 +438,7 @@ out_reset_timer:
icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
}
inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
- if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1))
+ if (retransmits_timed_out(sk, tcp_retries1_when(tp) + 1))
__sk_dst_reset(sk);
out:;
^ permalink raw reply related
* Re: [PATCH] allow to configure tcp_retries1 and tcp_retries2 per TCP socket
From: Andi Kleen @ 2010-06-10 17:00 UTC (permalink / raw)
To: Salvador Fandino; +Cc: netdev, David S. Miller, linux-kernel, "
In-Reply-To: <1276186161.2419.10.camel@topo>
Salvador Fandino <salvador@qindel.com> writes:
> The included patch adds support for setting the tcp_retries1 and
> tcp_retries2 options in a per socket fashion as it is done for the
> keepalive options TCP_KEEPIDLE, TCP_KEEPCNT and TCP_KEEPINTVL.
>
> The issue I am trying to solve is that when a socket has data queued for
> delivering, the keepalive logic is not triggered. Instead, the
> tcp_retries1/2 parameters are used to determine how many delivering
> attempts should be performed before giving up.
And why exactly do you need new tunables to solve this?
>
> The patch is very straight forward and just replicates similar
> functionality. There is one thing I am not completely sure and is if the
> new per-socket fields should go into inet_connection_sock instead of
> into tcp_sock.
tcp_sock is already quite big (>2k on 64bit)
IMHO any new fields in there need very good justification.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply
* Re: tunnel creation: using fb_tnl_dev inside struct ip6_tnl_net
From: kernel coder @ 2010-06-10 17:12 UTC (permalink / raw)
To: netdev
In-Reply-To: <AANLkTil_o71Qz2y9UJGw4uKVNQAOlYBVQLRjYtghyrni@mail.gmail.com>
Hi,
I need to port some code from linux kernel 2.6.18 to kernel 2.6.29.
One of the file where changes are involved is <net/ipv6/ip6_tunnel.c>,
where a new structure struct ip6_tnl_net was added into kernel2.6.29
that didn't exist in 2.6.18 as follows:
struct ip6_tnl_net {
/* the IPv6 tunnel fallback device */
struct net_device *fb_tnl_dev;
/* lists for storing tunnels in use */
struct ip6_tnl *tnls_r_l[HASH_SIZE];
struct ip6_tnl *tnls_wc[1];
struct ip6_tnl **tnls[2];
};
All those fields in struct ip6_tnl_net were in global namespace in
2.6.18, but wrapped in this struct for some reason (what benefit does
this gain?)
2.6.18 code in function ip6_tunnel_init() works fine as follows. After
being ported to kernel 2.6.29 as shown below, after kernel/module
build, install, reboot hangs. I inserted some printk message as shown,
only 111111 to 44444 were displayed, which indicates that
"register_netdev(ip6n->fb_tnl_dev)" failed.
Any inputs? Thanks.
====================Kernel 2.6.18 code, works fine====================
static int __init ip6_tunnel_init(void)
{
int err;
struct ip6_tnl *tun0;
if (xfrm6_tunnel_register(&ip6ip6_handler)) {
printk(KERN_ERR "ip6ip6 init: can't register tunnel\n");
return -EAGAIN;
}
ip6ip6_fb_tnl_dev = alloc_netdev(sizeof(struct ip6_tnl), "ip6tnl0",
ip6ip6_tnl_dev_setup);
if (!ip6ip6_fb_tnl_dev) {
err = -ENOMEM;
goto fail;
}
ip6ip6_fb_tnl_dev->init = ip6ip6_fb_tnl_dev_init;
if ((err = register_netdev(ip6ip6_fb_tnl_dev))) {
free_netdev(ip6ip6_fb_tnl_dev);
goto fail;
}
tun0 = ip6ip6_fb_tnl_dev->priv;
tun0->parms.flags |= (IP6_TNL_F_CAP_XMIT | IP6_TNL_F_CAP_RCV);
return 0;
fail:
xfrm6_tunnel_deregister(&ip6ip6_handler);
return err;
}
=================modified Kernel 2.6.29 code, buggy==============
static int __init ip6_tunnel_init(void)
{
int err;
struct ip6_tnl_net *ip6n; /*added due to kernel struct change*/
struct ip6_tnl *tun0; /*added 06082010*/
if (xfrm6_tunnel_register(&ip4ip6_handler, AF_INET)) {
printk(KERN_ERR "ip6_tunnel init: can't register ip4ip6\n");
err = -EAGAIN;
goto out;
}
if (xfrm6_tunnel_register(&ip6ip6_handler, AF_INET6)) {
printk(KERN_ERR "ip6_tunnel init: can't register ip6ip6\n");
err = -EAGAIN;
goto unreg_ip4ip6;
}
err = register_pernet_gen_device(&ip6_tnl_net_id, &ip6_tnl_net_ops);
if (err < 0)
goto err_pernet;
printk(KERN_ERR "=========1111111111111111111========\n");
//alloc_netdev() and init the device, added
err = -ENOMEM;
ip6n = kzalloc(sizeof(struct ip6_tnl_net), GFP_KERNEL);
if (ip6n == NULL)
goto out;
ip6n->fb_tnl_dev = alloc_netdev(sizeof(struct ip6_tnl), "ip6tnl0",
ip6_tnl_dev_setup);
printk(KERN_ERR "=======222222222222222222=========\n");
if (!ip6n->fb_tnl_dev) {
err = -ENOMEM;
goto out;
}
printk(KERN_ERR "=========3333333333333333333========\n");
ip6n->fb_tnl_dev->init = ip6_fb_tnl_dev_init2;
//ip6_fb_tnl_dev_init(ip6n->fb_tnl_dev);
printk(KERN_ERR "=========4444444444444444444========\n");
err = register_netdev(ip6n->fb_tnl_dev);
if (err<0) {
free_netdev(ip6n->fb_tnl_dev);
goto err_pernet;
}
printk(KERN_ERR "=========555555555555555555========\n");
// tun0 = ip6ip6_fb_tnl_dev->priv; /*added per patch*/
tun0 = netdev_priv(ip6n->fb_tnl_dev); /*modified */
printk(KERN_ERR "=========666666666666666666========\n");
tun0->parms.flags |= (IP6_TNL_F_CAP_XMIT | IP6_TNL_F_CAP_RCV);
printk(KERN_ERR "=========777777777777777777========\n");
return 0;
err_pernet:
xfrm6_tunnel_deregister(&ip6ip6_handler, AF_INET6);
printk(KERN_ERR "========err pernet=========\n");
unreg_ip4ip6:
xfrm6_tunnel_deregister(&ip4ip6_handler, AF_INET);
printk(KERN_ERR "========err unreg_ip4ip6=========\n");
out:
printk(KERN_ERR "========err out==================\n");
return err;
}
^ permalink raw reply
* Re: [PATCH for-2.6.35] virtio_net: fix oom handling on tx
From: Sridhar Samudrala @ 2010-06-10 17:17 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: virtualization, Rusty Russell, Jiri Pirko, Shirley Ma, netdev,
linux-kernel
In-Reply-To: <20100610152041.GA3480@redhat.com>
On Thu, 2010-06-10 at 18:20 +0300, Michael S. Tsirkin wrote:
> virtio net will never try to overflow the TX ring, so the only reason
> add_buf may fail is out of memory. Thus, we can not stop the
> device until some request completes - there's no guarantee anything
> at all is outstanding.
>
> Make the error message clearer as well: error here does not
> indicate queue full.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/net/virtio_net.c | 15 ++++++++-------
> 1 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 85615a3..e48a06f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -563,7 +563,6 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> struct virtnet_info *vi = netdev_priv(dev);
> int capacity;
>
> -again:
> /* Free up any pending old buffers before queueing new ones. */
> free_old_xmit_skbs(vi);
>
> @@ -572,12 +571,14 @@ again:
>
> /* This can happen with OOM and indirect buffers. */
> if (unlikely(capacity < 0)) {
> - netif_stop_queue(dev);
> - dev_warn(&dev->dev, "Unexpected full queue\n");
> - if (unlikely(!virtqueue_enable_cb(vi->svq))) {
> - virtqueue_disable_cb(vi->svq);
> - netif_start_queue(dev);
> - goto again;
> + if (net_ratelimit()) {
> + if (likely(capacity == -ENOMEM))
> + dev_warn(&dev->dev,
> + "TX queue failure: out of memory\n");
> + else
> + dev_warn(&dev->dev,
> + "Unexpected TX queue failure: %d\n",
> + capacity);
> }
> return NETDEV_TX_BUSY;
> }
It is not clear to me how xmit_skb() can return -ENOMEM.
xmit_skb() calls virtqueue_add_buf_gfp() which can return -ENOSPC.
Even vring_add_indirect() doesn't return -ENOMEM on kmalloc failure.
Thanks
Sridhar
^ permalink raw reply
* Re: [PATCH 1/2] pktgen: increasing transmission granularity
From: Robert Olsson @ 2010-06-10 17:21 UTC (permalink / raw)
To: Daniel Turull; +Cc: David Miller, netdev, robert, jens.laas
In-Reply-To: <4C10A735.20405@gmail.com>
Thanks,
So with this improved rate control we could do RFC2544-like testing as well.
Cheers
--ro
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Daniel Turull writes:
> This patch increases the granularity of the rate generated by pktgen.
> The previous version of pktgen uses micro seconds (udelay) resolution when it
> was delayed causing gaps in the rates. It is changed to nanosecond (ndelay).
> Now any rate is possible.
>
> Also it allows to set, the desired rate in Mb/s or packets per second.
>
> The documentation has been updated.
>
> Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
>
> ---
>
> diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt
> index 61bb645..75e4fd7 100644
> --- a/Documentation/networking/pktgen.txt
> +++ b/Documentation/networking/pktgen.txt
> @@ -151,6 +151,8 @@ Examples:
>
> pgset stop aborts injection. Also, ^C aborts generator.
>
> + pgset "rate 300M" set rate to 300 Mb/s
> + pgset "ratep 1000000" set rate to 1Mpps
>
> Example scripts
> ===============
> @@ -241,6 +243,9 @@ src6
> flows
> flowlen
>
> +rate
> +ratep
> +
> References:
> ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/
> ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/examples/
> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> index 1dacd7b..6428653 100644
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -169,7 +169,7 @@
> #include <asm/dma.h>
> #include <asm/div64.h> /* do_div */
>
> -#define VERSION "2.73"
> +#define VERSION "2.74"
> #define IP_NAME_SZ 32
> #define MAX_MPLS_LABELS 16 /* This is the max label stack depth */
> #define MPLS_STACK_BOTTOM htonl(0x00000100)
> @@ -980,6 +980,40 @@ static ssize_t pktgen_if_write(struct file *file,
> (unsigned long long) pkt_dev->delay);
> return count;
> }
> + if (!strcmp(name, "rate")) {
> + len = num_arg(&user_buffer[i], 10, &value);
> + if (len < 0)
> + return len;
> +
> + i += len;
> + if (!value)
> + return len;
> + pkt_dev->delay = pkt_dev->min_pkt_size*8*NSEC_PER_USEC/value;
> + if (debug)
> + printk(KERN_INFO
> + "pktgen: Delay set at: %llu ns\n",
> + pkt_dev->delay);
> +
> + sprintf(pg_result, "OK: rate=%lu", value);
> + return count;
> + }
> + if (!strcmp(name, "ratep")) {
> + len = num_arg(&user_buffer[i], 10, &value);
> + if (len < 0)
> + return len;
> +
> + i += len;
> + if (!value)
> + return len;
> + pkt_dev->delay = NSEC_PER_SEC/value;
> + if (debug)
> + printk(KERN_INFO
> + "pktgen: Delay set at: %llu ns\n",
> + pkt_dev->delay);
> +
> + sprintf(pg_result, "OK: rate=%lu", value);
> + return count;
> + }
> if (!strcmp(name, "udp_src_min")) {
> len = num_arg(&user_buffer[i], 10, &value);
> if (len < 0)
> @@ -2142,15 +2176,15 @@ static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
> hrtimer_init_on_stack(&t.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> hrtimer_set_expires(&t.timer, spin_until);
>
> - remaining = ktime_to_us(hrtimer_expires_remaining(&t.timer));
> + remaining = ktime_to_ns(hrtimer_expires_remaining(&t.timer));
> if (remaining <= 0) {
> pkt_dev->next_tx = ktime_add_ns(spin_until, pkt_dev->delay);
> return;
> }
>
> start_time = ktime_now();
> - if (remaining < 100)
> - udelay(remaining); /* really small just spin */
> + if (remaining < 100000)
> + ndelay(remaining); /* really small just spin */
> else {
> /* see do_nanosleep */
> hrtimer_init_sleeper(&t, current);
^ permalink raw reply
* Re: [PATCH for-2.6.35] virtio_net: fix oom handling on tx
From: Stephen Hemminger @ 2010-06-10 17:46 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: Michael S. Tsirkin, virtualization, Rusty Russell, Jiri Pirko,
Shirley Ma, netdev, linux-kernel
In-Reply-To: <1276190227.22064.19.camel@w-sridhar.beaverton.ibm.com>
On Thu, 10 Jun 2010 10:17:07 -0700
Sridhar Samudrala <sri@us.ibm.com> wrote:
> On Thu, 2010-06-10 at 18:20 +0300, Michael S. Tsirkin wrote:
> > virtio net will never try to overflow the TX ring, so the only reason
> > add_buf may fail is out of memory. Thus, we can not stop the
> > device until some request completes - there's no guarantee anything
> > at all is outstanding.
> >
> > Make the error message clearer as well: error here does not
> > indicate queue full.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > drivers/net/virtio_net.c | 15 ++++++++-------
> > 1 files changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 85615a3..e48a06f 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -563,7 +563,6 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > struct virtnet_info *vi = netdev_priv(dev);
> > int capacity;
> >
> > -again:
> > /* Free up any pending old buffers before queueing new ones. */
> > free_old_xmit_skbs(vi);
> >
> > @@ -572,12 +571,14 @@ again:
> >
> > /* This can happen with OOM and indirect buffers. */
> > if (unlikely(capacity < 0)) {
> > - netif_stop_queue(dev);
> > - dev_warn(&dev->dev, "Unexpected full queue\n");
> > - if (unlikely(!virtqueue_enable_cb(vi->svq))) {
> > - virtqueue_disable_cb(vi->svq);
> > - netif_start_queue(dev);
> > - goto again;
> > + if (net_ratelimit()) {
> > + if (likely(capacity == -ENOMEM))
> > + dev_warn(&dev->dev,
> > + "TX queue failure: out of memory\n");
> > + else
> > + dev_warn(&dev->dev,
> > + "Unexpected TX queue failure: %d\n",
> > + capacity);
> > }
> > return NETDEV_TX_BUSY;
> > }
>
> It is not clear to me how xmit_skb() can return -ENOMEM.
> xmit_skb() calls virtqueue_add_buf_gfp() which can return -ENOSPC.
> Even vring_add_indirect() doesn't return -ENOMEM on kmalloc failure.
It makes more sense to have the device increment tx_droppped,
and return NETDEV_TX_OK. Skip the message (or make it a pr_debug()).
Network devices do not guarantee packet delivery, and if out of
resources then holding more data in the
queue is going to hurt not help the situation.
--
^ permalink raw reply
* [PATCH net-2.6] ixgbe: fix automatic LRO/RSC settings for low latency
From: Andy Gospodarek @ 2010-06-10 18:12 UTC (permalink / raw)
To: netdev, Jeff Kirsher, Jesse Brandeburg; +Cc: Stanislaw Gruszka, stable
This patch added to 2.6.34:
commit f8d1dcaf88bddc7f282722ec1fdddbcb06a72f18
Author: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date: Tue Apr 27 01:37:20 2010 +0000
ixgbe: enable extremely low latency
introduced a feature where LRO (called RSC on the hardware) was disabled
automatically when setting rx-usecs to 0 via ethtool. Some might not
like the fact that LRO was disabled automatically, but I'm fine with
that. What I don't like is that LRO/RSC is automatically enabled when
rx-usecs is set >0 via ethtool.
This would certainly be a problem if the device was used for forwarding
and it was determined that the low latency wasn't needed after the
device was already forwarding. I played around with saving the state of
LRO in the driver, but it just didn't seem worthwhile and would require
a small change to dev_disable_lro() that I did not like.
This patch simply leaves LRO disabled when setting rx-usecs >0 and
requires that the user enable it again. An extra informational message
will also now appear in the log so users can understand why LRO isn't
being enabled as they expect.
Inconsistency of LRO setting first noticed by Stanislaw Gruszka.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
CC: Stanislaw Gruszka <sgruszka@redhat.com>
CC: stable@kernel.org
---
drivers/net/ixgbe/ixgbe_ethtool.c | 37 ++++++++-----------------------------
1 files changed, 8 insertions(+), 29 deletions(-)
diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c b/drivers/net/ixgbe/ixgbe_ethtool.c
index c50a754..9e672ee 100644
--- a/drivers/net/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ixgbe/ixgbe_ethtool.c
@@ -2077,25 +2077,6 @@ static int ixgbe_get_coalesce(struct net_device *netdev,
return 0;
}
-/*
- * this function must be called before setting the new value of
- * rx_itr_setting
- */
-static bool ixgbe_reenable_rsc(struct ixgbe_adapter *adapter,
- struct ethtool_coalesce *ec)
-{
- /* check the old value and enable RSC if necessary */
- if ((adapter->rx_itr_setting == 0) &&
- (adapter->flags2 & IXGBE_FLAG2_RSC_CAPABLE)) {
- adapter->flags2 |= IXGBE_FLAG2_RSC_ENABLED;
- adapter->netdev->features |= NETIF_F_LRO;
- DPRINTK(PROBE, INFO, "rx-usecs set to %d, re-enabling RSC\n",
- ec->rx_coalesce_usecs);
- return true;
- }
- return false;
-}
-
static int ixgbe_set_coalesce(struct net_device *netdev,
struct ethtool_coalesce *ec)
{
@@ -2124,9 +2105,6 @@ static int ixgbe_set_coalesce(struct net_device *netdev,
(1000000/ec->rx_coalesce_usecs < IXGBE_MIN_INT_RATE))
return -EINVAL;
- /* check the old value and enable RSC if necessary */
- need_reset = ixgbe_reenable_rsc(adapter, ec);
-
/* store the value in ints/second */
adapter->rx_eitr_param = 1000000/ec->rx_coalesce_usecs;
@@ -2135,9 +2113,6 @@ static int ixgbe_set_coalesce(struct net_device *netdev,
/* clear the lower bit as its used for dynamic state */
adapter->rx_itr_setting &= ~1;
} else if (ec->rx_coalesce_usecs == 1) {
- /* check the old value and enable RSC if necessary */
- need_reset = ixgbe_reenable_rsc(adapter, ec);
-
/* 1 means dynamic mode */
adapter->rx_eitr_param = 20000;
adapter->rx_itr_setting = 1;
@@ -2157,10 +2132,11 @@ static int ixgbe_set_coalesce(struct net_device *netdev,
*/
if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) {
adapter->flags2 &= ~IXGBE_FLAG2_RSC_ENABLED;
- netdev->features &= ~NETIF_F_LRO;
- DPRINTK(PROBE, INFO,
- "rx-usecs set to 0, disabling RSC\n");
-
+ if (netdev->features & NETIF_F_LRO) {
+ netdev->features &= ~NETIF_F_LRO;
+ DPRINTK(PROBE, INFO, "rx-usecs set to 0, "
+ "disabling LRO/RSC\n");
+ }
need_reset = true;
}
}
@@ -2255,6 +2231,9 @@ static int ixgbe_set_flags(struct net_device *netdev, u32 data)
}
} else if (!adapter->rx_itr_setting) {
netdev->features &= ~ETH_FLAG_LRO;
+ if (data & ETH_FLAG_LRO)
+ DPRINTK(PROBE, INFO, "rx-usecs set to 0, "
+ "LRO/RSC cannot be enabled.\n");
}
}
^ permalink raw reply related
* Re: [PATCH for-2.6.35] virtio_net: fix oom handling on tx
From: Michael S. Tsirkin @ 2010-06-10 18:57 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: virtualization, Rusty Russell, Jiri Pirko, Shirley Ma, netdev,
linux-kernel
In-Reply-To: <1276190227.22064.19.camel@w-sridhar.beaverton.ibm.com>
On Thu, Jun 10, 2010 at 10:17:07AM -0700, Sridhar Samudrala wrote:
> On Thu, 2010-06-10 at 18:20 +0300, Michael S. Tsirkin wrote:
> > virtio net will never try to overflow the TX ring, so the only reason
> > add_buf may fail is out of memory. Thus, we can not stop the
> > device until some request completes - there's no guarantee anything
> > at all is outstanding.
> >
> > Make the error message clearer as well: error here does not
> > indicate queue full.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > drivers/net/virtio_net.c | 15 ++++++++-------
> > 1 files changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 85615a3..e48a06f 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -563,7 +563,6 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > struct virtnet_info *vi = netdev_priv(dev);
> > int capacity;
> >
> > -again:
> > /* Free up any pending old buffers before queueing new ones. */
> > free_old_xmit_skbs(vi);
> >
> > @@ -572,12 +571,14 @@ again:
> >
> > /* This can happen with OOM and indirect buffers. */
> > if (unlikely(capacity < 0)) {
> > - netif_stop_queue(dev);
> > - dev_warn(&dev->dev, "Unexpected full queue\n");
> > - if (unlikely(!virtqueue_enable_cb(vi->svq))) {
> > - virtqueue_disable_cb(vi->svq);
> > - netif_start_queue(dev);
> > - goto again;
> > + if (net_ratelimit()) {
> > + if (likely(capacity == -ENOMEM))
> > + dev_warn(&dev->dev,
> > + "TX queue failure: out of memory\n");
> > + else
> > + dev_warn(&dev->dev,
> > + "Unexpected TX queue failure: %d\n",
> > + capacity);
> > }
> > return NETDEV_TX_BUSY;
> > }
>
> It is not clear to me how xmit_skb() can return -ENOMEM.
> xmit_skb() calls virtqueue_add_buf_gfp() which can return -ENOSPC.
> Even vring_add_indirect() doesn't return -ENOMEM on kmalloc failure.
>
> Thanks
> Sridhar
A separate patch fixes vring_add_indirect to return -ENOMEM.
-ENOSPC really means ring is full so nothing to do
and no need to retry.
--
MST
^ permalink raw reply
* Re: [PATCH for-2.6.35] virtio_net: fix oom handling on tx
From: Michael S. Tsirkin @ 2010-06-10 19:03 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Sridhar Samudrala, virtualization, Rusty Russell, Jiri Pirko,
Shirley Ma, netdev, linux-kernel
In-Reply-To: <20100610104653.1aed2ecc@nehalam>
On Thu, Jun 10, 2010 at 10:46:53AM -0700, Stephen Hemminger wrote:
> On Thu, 10 Jun 2010 10:17:07 -0700
> Sridhar Samudrala <sri@us.ibm.com> wrote:
>
> > On Thu, 2010-06-10 at 18:20 +0300, Michael S. Tsirkin wrote:
> > > virtio net will never try to overflow the TX ring, so the only reason
> > > add_buf may fail is out of memory. Thus, we can not stop the
> > > device until some request completes - there's no guarantee anything
> > > at all is outstanding.
> > >
> > > Make the error message clearer as well: error here does not
> > > indicate queue full.
> > >
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > > drivers/net/virtio_net.c | 15 ++++++++-------
> > > 1 files changed, 8 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 85615a3..e48a06f 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -563,7 +563,6 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > struct virtnet_info *vi = netdev_priv(dev);
> > > int capacity;
> > >
> > > -again:
> > > /* Free up any pending old buffers before queueing new ones. */
> > > free_old_xmit_skbs(vi);
> > >
> > > @@ -572,12 +571,14 @@ again:
> > >
> > > /* This can happen with OOM and indirect buffers. */
> > > if (unlikely(capacity < 0)) {
> > > - netif_stop_queue(dev);
> > > - dev_warn(&dev->dev, "Unexpected full queue\n");
> > > - if (unlikely(!virtqueue_enable_cb(vi->svq))) {
> > > - virtqueue_disable_cb(vi->svq);
> > > - netif_start_queue(dev);
> > > - goto again;
> > > + if (net_ratelimit()) {
> > > + if (likely(capacity == -ENOMEM))
> > > + dev_warn(&dev->dev,
> > > + "TX queue failure: out of memory\n");
> > > + else
> > > + dev_warn(&dev->dev,
> > > + "Unexpected TX queue failure: %d\n",
> > > + capacity);
> > > }
> > > return NETDEV_TX_BUSY;
> > > }
> >
> > It is not clear to me how xmit_skb() can return -ENOMEM.
> > xmit_skb() calls virtqueue_add_buf_gfp() which can return -ENOSPC.
> > Even vring_add_indirect() doesn't return -ENOMEM on kmalloc failure.
>
> It makes more sense to have the device increment tx_droppped,
> and return NETDEV_TX_OK. Skip the message (or make it a pr_debug()).
> Network devices do not guarantee packet delivery, and if out of
> resources then holding more data in the
> queue is going to hurt not help the situation.
>
> --
Well, I only keep the existing behaviour around. The changes you propose
would be 2.6.36 material.
I have it on my todo list to look for a way to test performance under
GFP_ATOMIC failure scenario. Any suggestions?
--
MST
^ permalink raw reply
* Re: [PATCH] ethoc: Write bus addresses to registers
From: Jonas Bonn @ 2010-06-10 19:56 UTC (permalink / raw)
To: netdev; +Cc: davem
In-Reply-To: <1276181958-31172-1-git-send-email-jonas@southpole.se>
[-- Attachment #1: Type: text/plain, Size: 947 bytes --]
Hmm... found a little problem here... this bit is dependent on another
patch that should be in there before it... I'd appreciate comment on the
approach of the whole thing and then I'll submit the missing bits in the
correct order. Thanks and sorry for the confusion...
/Jonas
See my comment below...
> @@ -978,6 +989,12 @@ static int ethoc_probe(struct platform_device *pdev)
> priv->dma_alloc = buffer_size;
> }
---> Missing bits are here: calculate number of transmission buffers...
>
> + priv->vma = devm_kzalloc(&pdev->dev, priv->num_tx*sizeof(void*), GFP_KERNEL);
---> And here: should be
priv->vma = devm_kzalloc(&pdev->dev, (priv->num_tx
+priv->num_rx)*sizeof(void*), GFP_KERNEL);
> + if (!priv->vma) {
> + ret = -ENOMEM;
> + goto error;
> + }
> +
> /* Allow the platform setup code to pass in a MAC address. */
> if (pdev->dev.platform_data) {
> struct ethoc_platform_data *pdata =
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply
* Re: [PATCH net-2.6] ixgbe: fix automatic LRO/RSC settings for low latency
From: Jeff Kirsher @ 2010-06-10 20:17 UTC (permalink / raw)
To: Andy Gospodarek; +Cc: netdev, Jesse Brandeburg, Stanislaw Gruszka, stable
In-Reply-To: <20100610181204.GO7497@gospo.rdu.redhat.com>
On Thu, Jun 10, 2010 at 11:12, Andy Gospodarek <andy@greyhouse.net> wrote:
>
> This patch added to 2.6.34:
>
> commit f8d1dcaf88bddc7f282722ec1fdddbcb06a72f18
> Author: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Date: Tue Apr 27 01:37:20 2010 +0000
>
> ixgbe: enable extremely low latency
>
> introduced a feature where LRO (called RSC on the hardware) was disabled
> automatically when setting rx-usecs to 0 via ethtool. Some might not
> like the fact that LRO was disabled automatically, but I'm fine with
> that. What I don't like is that LRO/RSC is automatically enabled when
> rx-usecs is set >0 via ethtool.
>
> This would certainly be a problem if the device was used for forwarding
> and it was determined that the low latency wasn't needed after the
> device was already forwarding. I played around with saving the state of
> LRO in the driver, but it just didn't seem worthwhile and would require
> a small change to dev_disable_lro() that I did not like.
>
> This patch simply leaves LRO disabled when setting rx-usecs >0 and
> requires that the user enable it again. An extra informational message
> will also now appear in the log so users can understand why LRO isn't
> being enabled as they expect.
>
> Inconsistency of LRO setting first noticed by Stanislaw Gruszka.
>
> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> CC: Stanislaw Gruszka <sgruszka@redhat.com>
> CC: stable@kernel.org
>
> ---
> drivers/net/ixgbe/ixgbe_ethtool.c | 37 ++++++++-----------------------------
> 1 files changed, 8 insertions(+), 29 deletions(-)
>
Thanks Andy, I have added the patch to my queue.
^ permalink raw reply
* Re: QoS weirdness : HTB accuracy
From: Andrew Beverley @ 2010-06-10 21:22 UTC (permalink / raw)
To: Julien Vehent; +Cc: Philip A. Prindeville, Netdev, netfilter
In-Reply-To: <e6bd0cff8dafc8b3af0760facb5fdc8c@localhost>
> > Sorry for the late response: could this be an "aliasing" issue caused
> > by sampling intervals (granularity)?
> >
>
> I was, in fact, an error in my ruleset. I had put the 'linklayer atm' at
> both the branch and leaf levels, so the overhead was computed twice,
> creating those holes in the bandwidth.
I am seeing similar behaviour with my setup. Am I making the same
mistake? A subset of my rules is as follows:
tc qdisc add dev ppp0 root handle 1: htb r2q 1
tc class add dev ppp0 parent 1: classid 1:1 htb \
rate ${DOWNLINK}kbit ceil ${DOWNLINK}kbit \
overhead $overhead linklayer atm <------- Here
tc class add dev ppp0 parent 1:1 classid 1:10 htb \
rate 612kbit ceil 612kbit prio 0 \
overhead $overhead linklayer atm <------- And here
tc qdisc add dev ppp0 parent 1:10 handle 4210: \
sfq perturb 10 limit 50
tc filter add dev ppp0 parent 1:0 protocol ip \
prio 10 handle 10 fw flowid 1:10
Thanks,
Andy
^ permalink raw reply
* Re: [0/8] netpoll/bridge fixes
From: Herbert Xu @ 2010-06-10 21:56 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
WANG Cong, Matt Mackall
In-Reply-To: <20100610074912.1303e3da@nehalam>
On Thu, Jun 10, 2010 at 07:49:12AM -0700, Stephen Hemminger wrote:
>
> Thanks for looking at this, and yes the original code does
> look buggy. Not sure if I like the use of in_irq() to handle the netpoll
> packets as special case. Is in_irq() really reliable for this?
Yes because netpoll always calls ndo_start_xmit with IRQs disabled,
while on the normal TX path IRQs must be enabled (as otherwise
enabling BH would warn).
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [0/8] netpoll/bridge fixes
From: Stephen Hemminger @ 2010-06-10 21:59 UTC (permalink / raw)
To: Herbert Xu
Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
WANG Cong, Matt Mackall
In-Reply-To: <20100610215643.GA22048@gondor.apana.org.au>
On Fri, 11 Jun 2010 07:56:43 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Thu, Jun 10, 2010 at 07:49:12AM -0700, Stephen Hemminger wrote:
> >
> > Thanks for looking at this, and yes the original code does
> > look buggy. Not sure if I like the use of in_irq() to handle the netpoll
> > packets as special case. Is in_irq() really reliable for this?
>
> Yes because netpoll always calls ndo_start_xmit with IRQs disabled,
> while on the normal TX path IRQs must be enabled (as otherwise
> enabling BH would warn).
Okay, then add a comment where in_irq is used?
^ permalink raw reply
* Re: [0/8] netpoll/bridge fixes
From: Herbert Xu @ 2010-06-10 22:48 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
WANG Cong, Matt Mackall
In-Reply-To: <20100610145915.721a86b7@nehalam>
On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
>
> Okay, then add a comment where in_irq is used?
Actually let me put it into a wrapper. I'll respin the patches.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* puzzled of the congestion control window of Scalable TCP
From: jellyaaa @ 2010-06-11 1:15 UTC (permalink / raw)
To: netdev
I am now doing something related to the STCP (Scalable TCP). By pringking the
dynamic snd_cwnd in linux (kernel 2.6.18) ,the result puzzled me . Theoryly,
if any congestion occurs, the snd_cwnd = snd_cwnd * 0.875,but I haven't got
the result I expected. The snd_cwnd changes like this :
snd_cwnd = 1 , snd_ssthresh = 2147483647
snd_cwnd = 2 , snd_ssthresh = 2147483647
snd_cwnd = 3, snd_ssthresh = 2147483647
......
snd_cwnd = 19022, snd_ssthresh = 2147483647 // keeping this value for
several ms
snd_cwnd = 1769, snd_ssthresh = 16645 // congestion occurs
snd_cwnd = 1770, snd_ssthresh = 16645
......
>From the above data. when congestion occurs , snd_ssthresh = 16645 ~=~
19022*0.875=16626 corresponds to the theory calculated result. no problems.
but what is the mean of snd_cwnd = 1770? how this value is calculated .who
(where )calculated it ?
--
View this message in context: http://old.nabble.com/puzzled-of-the-congestion-control-window-of-Scalable-TCP-tp28850421p28850421.html
Sent from the netdev mailing list archive at Nabble.com.
^ permalink raw reply
* [PATCH] virtio_net: implements ethtool_ops.get_drvinfo
From: Taku Izumi @ 2010-06-11 1:29 UTC (permalink / raw)
To: David S. Miller, netdev@vger.kernel.org, rusty
This patch implements ethtool_ops.get_drvinfo interface of virtio_net driver.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
---
drivers/net/virtio_net.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
Index: net-next.35/drivers/net/virtio_net.c
===================================================================
--- net-next.35.orig/drivers/net/virtio_net.c
+++ net-next.35/drivers/net/virtio_net.c
@@ -701,6 +701,18 @@ static int virtnet_close(struct net_devi
return 0;
}
+static void virtnet_get_drvinfo(struct net_device *dev,
+ struct ethtool_drvinfo *drvinfo)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ struct virtio_device *vdev = vi->vdev;
+
+ strncpy(drvinfo->driver, KBUILD_MODNAME, 32);
+ strncpy(drvinfo->version, "N/A", 32);
+ strncpy(drvinfo->fw_version, "N/A", 32);
+ strncpy(drvinfo->bus_info, dev_name(&vdev->dev), 32);
+}
+
static int virtnet_set_tx_csum(struct net_device *dev, u32 data)
{
struct virtnet_info *vi = netdev_priv(dev);
@@ -813,6 +825,7 @@ static void virtnet_vlan_rx_kill_vid(str
}
static const struct ethtool_ops virtnet_ethtool_ops = {
+ .get_drvinfo = virtnet_get_drvinfo,
.set_tx_csum = virtnet_set_tx_csum,
.set_sg = ethtool_op_set_sg,
.set_tso = ethtool_op_set_tso,
^ permalink raw reply
* Re: [0/8] netpoll/bridge fixes
From: Herbert Xu @ 2010-06-11 2:11 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
WANG Cong, Matt Mackall
In-Reply-To: <20100610224839.GA22469@gondor.apana.org.au>
On Fri, Jun 11, 2010 at 08:48:39AM +1000, Herbert Xu wrote:
> On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
> >
> > Okay, then add a comment where in_irq is used?
>
> Actually let me put it into a wrapper. I'll respin the patches.
OK here is a repost. And this time it really is 8 patches :)
I've tested it lightly.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* [PATCH 2/8] bridge: Remove redundant npinfo NULL setting
From: Herbert Xu @ 2010-06-11 2:12 UTC (permalink / raw)
To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
WANG Cong, Stephen
In-Reply-To: <20100611021142.GA24490@gondor.apana.org.au>
bridge: Remove redundant npinfo NULL setting
Now that netpoll always zaps npinfo we no longer needs to do it
in bridge.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
net/bridge/br_device.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index eedf2c9..dce0611 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -231,7 +231,6 @@ void br_netpoll_cleanup(struct net_device *dev)
struct net_bridge_port *p, *n;
const struct net_device_ops *ops;
- br->dev->npinfo = NULL;
list_for_each_entry_safe(p, n, &br->port_list, list) {
if (p->dev) {
ops = p->dev->netdev_ops;
^ permalink raw reply related
* [PATCH 1/8] netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup
From: Herbert Xu @ 2010-06-11 2:12 UTC (permalink / raw)
To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
WANG Cong, Stephen
In-Reply-To: <20100611021142.GA24490@gondor.apana.org.au>
netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup
Sinec we have to null npinfo regardless of whether there is a
ndo_netpoll_cleanup, it makes sense to do this unconditionally
in netpoll_cleanup rather than having every driver do it by
themselves.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
net/core/netpoll.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 94825b1..748c930 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -898,8 +898,7 @@ void netpoll_cleanup(struct netpoll *np)
ops = np->dev->netdev_ops;
if (ops->ndo_netpoll_cleanup)
ops->ndo_netpoll_cleanup(np->dev);
- else
- np->dev->npinfo = NULL;
+ np->dev->npinfo = NULL;
}
}
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox