Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] ucc_geth: Fix hung tasks.
From: Joakim Tjernlund @ 2010-11-10 12:05 UTC (permalink / raw)
  Cc: Anton Vorontsov, netdev
In-Reply-To: <1289211819-21746-1-git-send-email-Joakim.Tjernlund@transmode.se>

Ping?

Even though this patch didn't solve my hang it is still a bug.

     Jocke

Joakim Tjernlund <Joakim.Tjernlund@transmode.se> wrote on 2010/11/08 11:23:39:

> From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> To: linuxppc-dev@lists.ozlabs.org, netdev@vger.kernel.org, Anton Vorontsov <avorontsov@ru.mvista.com>
> Cc: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> Date: 2010/11/08 11:23
> Subject: [PATCH] ucc_geth: Fix hung tasks.
>
> We noticed a few hangs like this:
>
> INFO: task ifconfig:572 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> ifconfig      D 0ff65760     0   572    369 0x00000000
> Call Trace:
> [c6157be0] [c6008460] 0xc6008460 (unreliable)
> [c6157ca0] [c0008608] __switch_to+0x4c/0x6c
> [c6157cb0] [c028fecc] schedule+0x184/0x310
> [c6157ce0] [c0290e54] __mutex_lock_slowpath+0xa4/0x150
> [c6157d20] [c0290c48] mutex_lock+0x44/0x48
> [c6157d30] [c01aba74] phy_stop+0x20/0x70
> [c6157d40] [c01aef40] ucc_geth_stop+0x30/0x98
> [c6157d60] [c01b18fc] ucc_geth_close+0x9c/0xdc
> [c6157d80] [c01db0cc] __dev_close+0xa0/0xd0
> [c6157d90] [c01deddc] __dev_change_flags+0x8c/0x148
> [c6157db0] [c01def54] dev_change_flags+0x1c/0x64
> [c6157dd0] [c0237ac8] devinet_ioctl+0x678/0x784
> [c6157e50] [c0239a58] inet_ioctl+0xb0/0xbc
> [c6157e60] [c01cafa8] sock_ioctl+0x174/0x2a0
> [c6157e80] [c009a16c] vfs_ioctl+0xcc/0xe0
> [c6157ea0] [c009a998] do_vfs_ioctl+0xc4/0x79c
> [c6157f10] [c009b0b0] sys_ioctl+0x40/0x74
> [c6157f40] [c00117c4] ret_from_syscall+0x0/0x38
>
> I THINK this is due to a missing cancel_work_sync in the driver
> although we cannot be sure. I found this by comparing
> ucc_geth with gianfar.
>
> Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> ---
>  drivers/net/ucc_geth.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
> index 97f9f7d..6647ed7 100644
> --- a/drivers/net/ucc_geth.c
> +++ b/drivers/net/ucc_geth.c
> @@ -3556,6 +3556,7 @@ static int ucc_geth_close(struct net_device *dev)
>
>     napi_disable(&ugeth->napi);
>
> +   cancel_work_sync(&ugeth->timeout_work);
>     ucc_geth_stop(ugeth);
>
>     free_irq(ugeth->ug_info->uf_info.irq, ugeth->ndev);
> --
> 1.7.2.2
>


^ permalink raw reply

* Re: [PATCH] Prevent reading uninitialized memory with socket filters
From: Dan Rosenberg @ 2010-11-10 13:19 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, stable, security
In-Reply-To: <1289387567.7380.63.camel@dan>

For reasons I don't understand, this code seems to cause some people's
machines to lock up:

BUG: soft lockup - CPU#0 stuck for 10s! [dz:11844]

I haven't been able to reproduce this myself.  Very strange.  I'll send
a stack trace if I can reproduce.

-Dan

On Wed, 2010-11-10 at 06:12 -0500, Dan Rosenberg wrote:
> > 
> > Prove it.
> 
> I hope this was a joke.  In either case, my reply is a joke:
> 
> http://lists.grok.org.uk/pipermail/full-disclosure/2010-November/077321.html



^ permalink raw reply

* possible kernel oops from user MSS
From: Steve Chen @ 2010-11-10 13:24 UTC (permalink / raw)
  To: netdev

Hello

With commit f5fff5dc8a7a3f395b0525c02ba92c95d42b7390, a user program
can pass in TCP_MAXSEG of 12 (or TCPOLEN_TSTAMP_ALIGNED), and cause
kernel oops with division by 0
 in tcp_select_initial_window.  One way to prevent it is to change the
minimum value for TCP_MAXSEG in do_tcp_setsockopt from 8 to some value
over 12.  Two questions.

1.  Is this the right solution?
2.  If it is, what is a good minimum value?

Thanks,

Steve

^ permalink raw reply

* Re: [PATCH] ucc_geth: Fix hung tasks.
From: Joakim Tjernlund @ 2010-11-10 14:11 UTC (permalink / raw)
  Cc: Anton Vorontsov, netdev
In-Reply-To: <OFEBFE0319.7D12AA7E-ONC12577D7.00423B59-C12577D7.00426B71@LocalDomain>

Actually, there is something wrong anyway with TX timeout
so don't use this patch. I must investigate more but
it seems like cancel_work_sync hangs whenever an TX timeout
occurs.

      Jocke

Joakim Tjernlund/Transmode wrote on 2010/11/10 13:05:28:
>
> Ping?
>
> Even though this patch didn't solve my hang it is still a bug.
>
>      Jocke
>
> Joakim Tjernlund <Joakim.Tjernlund@transmode.se> wrote on 2010/11/08 11:23:39:
>
> > From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> > To: linuxppc-dev@lists.ozlabs.org, netdev@vger.kernel.org, Anton Vorontsov <avorontsov@ru.mvista.com>
> > Cc: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> > Date: 2010/11/08 11:23
> > Subject: [PATCH] ucc_geth: Fix hung tasks.
> >
> > We noticed a few hangs like this:
> >
> > INFO: task ifconfig:572 blocked for more than 120 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > ifconfig      D 0ff65760     0   572    369 0x00000000
> > Call Trace:
> > [c6157be0] [c6008460] 0xc6008460 (unreliable)
> > [c6157ca0] [c0008608] __switch_to+0x4c/0x6c
> > [c6157cb0] [c028fecc] schedule+0x184/0x310
> > [c6157ce0] [c0290e54] __mutex_lock_slowpath+0xa4/0x150
> > [c6157d20] [c0290c48] mutex_lock+0x44/0x48
> > [c6157d30] [c01aba74] phy_stop+0x20/0x70
> > [c6157d40] [c01aef40] ucc_geth_stop+0x30/0x98
> > [c6157d60] [c01b18fc] ucc_geth_close+0x9c/0xdc
> > [c6157d80] [c01db0cc] __dev_close+0xa0/0xd0
> > [c6157d90] [c01deddc] __dev_change_flags+0x8c/0x148
> > [c6157db0] [c01def54] dev_change_flags+0x1c/0x64
> > [c6157dd0] [c0237ac8] devinet_ioctl+0x678/0x784
> > [c6157e50] [c0239a58] inet_ioctl+0xb0/0xbc
> > [c6157e60] [c01cafa8] sock_ioctl+0x174/0x2a0
> > [c6157e80] [c009a16c] vfs_ioctl+0xcc/0xe0
> > [c6157ea0] [c009a998] do_vfs_ioctl+0xc4/0x79c
> > [c6157f10] [c009b0b0] sys_ioctl+0x40/0x74
> > [c6157f40] [c00117c4] ret_from_syscall+0x0/0x38
> >
> > I THINK this is due to a missing cancel_work_sync in the driver
> > although we cannot be sure. I found this by comparing
> > ucc_geth with gianfar.
> >
> > Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> > ---
> >  drivers/net/ucc_geth.c |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
> > index 97f9f7d..6647ed7 100644
> > --- a/drivers/net/ucc_geth.c
> > +++ b/drivers/net/ucc_geth.c
> > @@ -3556,6 +3556,7 @@ static int ucc_geth_close(struct net_device *dev)
> >
> >     napi_disable(&ugeth->napi);
> >
> > +   cancel_work_sync(&ugeth->timeout_work);
> >     ucc_geth_stop(ugeth);
> >
> >     free_irq(ugeth->ug_info->uf_info.irq, ugeth->ndev);
> > --
> > 1.7.2.2
> >


^ permalink raw reply

* Re: [PATCH] Prevent reading uninitialized memory with socketfilters
From: Tetsuo Handa @ 2010-11-10 14:25 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1289373771.2700.110.camel@edumazet-laptop>

Just I thought...

> unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen)
> {
> 	struct sock_filter *fentry;     /* We walk down these */
Can't this be "const struct sock_filter *"?
> (...snipped...)
> 	for (pc = 0; pc < flen; pc++) {
> 		fentry = &filter[pc];
Can't we do
		u32 f_k = fentry->k;
and replace 27 repetition of fentry->k with f_k?

^ permalink raw reply

* Re: Routing over multiple interfaces
From: David Woodhouse @ 2010-11-10 14:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, uweber
In-Reply-To: <1288647330.2660.116.camel@edumazet-laptop>

On Mon, 2010-11-01 at 22:35 +0100, Eric Dumazet wrote:
> 
> David W. probably wants to use teql or some bonding ?
> 
> # tc qdisc add dev ppp0 root teql0
> # tc qdisc add dev ppp1 root teql0
> # ip link set dev teql0 up
> # ip route add default src 90.155.92.214 dev teql0

That works; thanks. Or it should do... there's a slight complication
right now in that one of my lines is actually routed through a separate
ADSL<->Ethernet box, because one of the ports on my Solos card is a bit
shagged and syncs a lot slower than it should. So it's actually eth1 and
ppp1 that I'm sharing. That should be fixed relatively soon as they're
sending me a new box.

There's a minor issue with this setup though — ideally, any given TCP
connection would tend to use the *same* interface as much as possible,
and only 'overflow' onto the other interface if it's actually saturating
the uplink on the first. That would tend to reduce the amount of packet
re-ordering. At the moment it will split outbound packets over both
lines even when they're relatively idle, introducing packet re-ordering
where it wasn't really necessary.

That said, the world is supposed to cope with packet re-ordering, so I'm
not going to lose a lot of sleep over it. And I have a feeling the
downstream link from the ISP does it exactly the same.

-- 
dwmw2

^ permalink raw reply

* Re: Routing over multiple interfaces
From: Eric Dumazet @ 2010-11-10 15:08 UTC (permalink / raw)
  To: David Woodhouse; +Cc: David Miller, netdev, uweber
In-Reply-To: <1289400654.2721.7.camel@macbook.infradead.org>

Le mercredi 10 novembre 2010 à 14:50 +0000, David Woodhouse a écrit :
> On Mon, 2010-11-01 at 22:35 +0100, Eric Dumazet wrote:
> > 
> > David W. probably wants to use teql or some bonding ?
> > 
> > # tc qdisc add dev ppp0 root teql0
> > # tc qdisc add dev ppp1 root teql0
> > # ip link set dev teql0 up
> > # ip route add default src 90.155.92.214 dev teql0
> 
> That works; thanks. Or it should do... there's a slight complication
> right now in that one of my lines is actually routed through a separate
> ADSL<->Ethernet box, because one of the ports on my Solos card is a bit
> shagged and syncs a lot slower than it should. So it's actually eth1 and
> ppp1 that I'm sharing. That should be fixed relatively soon as they're
> sending me a new box.
> 
> There's a minor issue with this setup though — ideally, any given TCP
> connection would tend to use the *same* interface as much as possible,
> and only 'overflow' onto the other interface if it's actually saturating
> the uplink on the first. That would tend to reduce the amount of packet
> re-ordering. At the moment it will split outbound packets over both
> lines even when they're relatively idle, introducing packet re-ordering
> where it wasn't really necessary.
> 
> That said, the world is supposed to cope with packet re-ordering, so I'm
> not going to lose a lot of sleep over it. And I have a feeling the
> downstream link from the ISP does it exactly the same.
> 

Its a bit problematic, as you said you want to use both links to do one
upload ;)

Maybe it is possible to instruct teql to try to not change links unless
one link is full...




^ permalink raw reply

* Re: [RESEND PATCH] virtio-net: init link state correctly
From: Michael S. Tsirkin @ 2010-11-10 15:08 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Jason Wang, netdev, linux-kernel, davem, markmc, kvm
In-Reply-To: <201011080941.28190.rusty@rustcorp.com.au>

On Mon, Nov 08, 2010 at 09:41:27AM +1030, Rusty Russell wrote:
> On Fri, 5 Nov 2010 08:17:18 pm Jason Wang wrote:
> > For device that supports VIRTIO_NET_F_STATUS, there's no need to
> > assume the link is up and we need to call nerif_carrier_off() before
> > querying device status, otherwise we may get wrong operstate after
> > diver was loaded because the link watch event was not fired as
> > expected.
> > 
> > For device that does not support VIRITO_NET_F_STATUS, we could not get
> > its status through virtnet_update_status() and what we can only do is
> > always assuming the link is up.
> > 
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> Thanks!
> Rusty.

Rusty, just to verify: who shall be queueing this up?

^ permalink raw reply

* [PATCH] macvlan: lockless tx path
From: Eric Dumazet @ 2010-11-10 15:41 UTC (permalink / raw)
  To: David Miller; +Cc: Patrick McHardy, netdev

macvlan is a stacked device, like tunnels. We should use the lockless
mechanism we are using in tunnels and loopback.

This patch completely removes locking in TX path.

tx stat counters are added into existing percpu stat structure, renamed
from rx_stats to pcpu_stats.

Note : this reverts commit 2c11455321f37 (macvlan: add multiqueue
capability)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/macvlan.c      |   72 +++++++++++++----------------------
 include/linux/if_macvlan.h |   26 +++++++-----
 2 files changed, 43 insertions(+), 55 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 0fc9dc7..fce80a7 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -243,17 +243,19 @@ xmit_world:
 netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 			       struct net_device *dev)
 {
-	int i = skb_get_queue_mapping(skb);
-	struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
 	unsigned int len = skb->len;
 	int ret;
+	const struct macvlan_dev *vlan = netdev_priv(dev);
+	struct macvlan_pcpu_stats *pcpu_stats = this_cpu_ptr(vlan->pcpu_stats);
 
 	ret = macvlan_queue_xmit(skb, dev);
 	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
-		txq->tx_packets++;
-		txq->tx_bytes += len;
+		u64_stats_update_begin(&pcpu_stats->syncp);
+		pcpu_stats->tx_packets++;
+		pcpu_stats->tx_bytes += len;
+		u64_stats_update_end(&pcpu_stats->syncp);
 	} else
-		txq->tx_dropped++;
+		pcpu_stats->tx_dropped++;
 
 	return ret;
 }
@@ -414,14 +416,15 @@ static int macvlan_init(struct net_device *dev)
 	dev->state		= (dev->state & ~MACVLAN_STATE_MASK) |
 				  (lowerdev->state & MACVLAN_STATE_MASK);
 	dev->features 		= lowerdev->features & MACVLAN_FEATURES;
+	dev->features		|= NETIF_F_LLTX;
 	dev->gso_max_size	= lowerdev->gso_max_size;
 	dev->iflink		= lowerdev->ifindex;
 	dev->hard_header_len	= lowerdev->hard_header_len;
 
 	macvlan_set_lockdep_class(dev);
 
-	vlan->rx_stats = alloc_percpu(struct macvlan_rx_stats);
-	if (!vlan->rx_stats)
+	vlan->pcpu_stats = alloc_percpu(struct macvlan_pcpu_stats);
+	if (!vlan->pcpu_stats)
 		return -ENOMEM;
 
 	return 0;
@@ -431,7 +434,7 @@ static void macvlan_uninit(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 
-	free_percpu(vlan->rx_stats);
+	free_percpu(vlan->pcpu_stats);
 }
 
 static struct rtnl_link_stats64 *macvlan_dev_get_stats64(struct net_device *dev,
@@ -439,33 +442,34 @@ static struct rtnl_link_stats64 *macvlan_dev_get_stats64(struct net_device *dev,
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 
-	dev_txq_stats_fold(dev, stats);
-
-	if (vlan->rx_stats) {
-		struct macvlan_rx_stats *p, accum = {0};
-		u64 rx_packets, rx_bytes, rx_multicast;
+	if (vlan->pcpu_stats) {
+		struct macvlan_pcpu_stats *p;
+		u64 rx_packets, rx_bytes, rx_multicast, tx_packets, tx_bytes;
 		unsigned int start;
 		int i;
 
 		for_each_possible_cpu(i) {
-			p = per_cpu_ptr(vlan->rx_stats, i);
+			p = per_cpu_ptr(vlan->pcpu_stats, i);
 			do {
 				start = u64_stats_fetch_begin_bh(&p->syncp);
 				rx_packets	= p->rx_packets;
 				rx_bytes	= p->rx_bytes;
 				rx_multicast	= p->rx_multicast;
+				tx_packets	= p->tx_packets;
+				tx_bytes	= p->tx_bytes;
 			} while (u64_stats_fetch_retry_bh(&p->syncp, start));
-			accum.rx_packets	+= rx_packets;
-			accum.rx_bytes		+= rx_bytes;
-			accum.rx_multicast	+= rx_multicast;
-			/* rx_errors is an ulong, updated without syncp protection */
-			accum.rx_errors		+= p->rx_errors;
+			stats->rx_packets	+= rx_packets;
+			stats->rx_bytes		+= rx_bytes;
+			stats->multicast	+= rx_multicast;
+			stats->tx_packets	+= tx_packets;
+			stats->tx_bytes		+= tx_bytes;
+			/* rx_errors & tx_dropped are ulong, updated
+			 * without syncp protection
+			 */
+			stats->rx_errors	+= p->rx_errors;
+			stats->tx_dropped	+= p->tx_dropped;
 		}
-		stats->rx_packets = accum.rx_packets;
-		stats->rx_bytes   = accum.rx_bytes;
-		stats->rx_errors  = accum.rx_errors;
-		stats->rx_dropped = accum.rx_errors;
-		stats->multicast  = accum.rx_multicast;
+		stats->rx_dropped = stats->rx_errors;
 	}
 	return stats;
 }
@@ -601,25 +605,6 @@ static int macvlan_validate(struct nlattr *tb[], struct nlattr *data[])
 	return 0;
 }
 
-static int macvlan_get_tx_queues(struct net *net,
-				 struct nlattr *tb[],
-				 unsigned int *num_tx_queues,
-				 unsigned int *real_num_tx_queues)
-{
-	struct net_device *real_dev;
-
-	if (!tb[IFLA_LINK])
-		return -EINVAL;
-
-	real_dev = __dev_get_by_index(net, nla_get_u32(tb[IFLA_LINK]));
-	if (!real_dev)
-		return -ENODEV;
-
-	*num_tx_queues      = real_dev->num_tx_queues;
-	*real_num_tx_queues = real_dev->real_num_tx_queues;
-	return 0;
-}
-
 int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 			   struct nlattr *tb[], struct nlattr *data[],
 			   int (*receive)(struct sk_buff *skb),
@@ -743,7 +728,6 @@ int macvlan_link_register(struct rtnl_link_ops *ops)
 {
 	/* common fields */
 	ops->priv_size		= sizeof(struct macvlan_dev);
-	ops->get_tx_queues	= macvlan_get_tx_queues;
 	ops->validate		= macvlan_validate;
 	ops->maxtype		= IFLA_MACVLAN_MAX;
 	ops->policy		= macvlan_policy;
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 8a2fd66..3779b5f 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -25,19 +25,23 @@ struct macvlan_port;
 struct macvtap_queue;
 
 /**
- *	struct macvlan_rx_stats - MACVLAN percpu rx stats
+ *	struct macvlan_pcpu_stats - MACVLAN percpu stats
  *	@rx_packets: number of received packets
  *	@rx_bytes: number of received bytes
  *	@rx_multicast: number of received multicast packets
+ *	@tx_
  *	@syncp: synchronization point for 64bit counters
  *	@rx_errors: number of errors
  */
-struct macvlan_rx_stats {
+struct macvlan_pcpu_stats {
 	u64			rx_packets;
 	u64			rx_bytes;
 	u64			rx_multicast;
+	u64			tx_packets;
+	u64			tx_bytes;
 	struct u64_stats_sync	syncp;
 	unsigned long		rx_errors;
+	unsigned long		tx_dropped;
 };
 
 /*
@@ -52,7 +56,7 @@ struct macvlan_dev {
 	struct hlist_node	hlist;
 	struct macvlan_port	*port;
 	struct net_device	*lowerdev;
-	struct macvlan_rx_stats __percpu *rx_stats;
+	struct macvlan_pcpu_stats __percpu *pcpu_stats;
 	enum macvlan_mode	mode;
 	int (*receive)(struct sk_buff *skb);
 	int (*forward)(struct net_device *dev, struct sk_buff *skb);
@@ -64,18 +68,18 @@ static inline void macvlan_count_rx(const struct macvlan_dev *vlan,
 				    unsigned int len, bool success,
 				    bool multicast)
 {
-	struct macvlan_rx_stats *rx_stats;
+	struct macvlan_pcpu_stats *pcpu_stats;
 
-	rx_stats = this_cpu_ptr(vlan->rx_stats);
+	pcpu_stats = this_cpu_ptr(vlan->pcpu_stats);
 	if (likely(success)) {
-		u64_stats_update_begin(&rx_stats->syncp);
-		rx_stats->rx_packets++;;
-		rx_stats->rx_bytes += len;
+		u64_stats_update_begin(&pcpu_stats->syncp);
+		pcpu_stats->rx_packets++;;
+		pcpu_stats->rx_bytes += len;
 		if (multicast)
-			rx_stats->rx_multicast++;
-		u64_stats_update_end(&rx_stats->syncp);
+			pcpu_stats->rx_multicast++;
+		u64_stats_update_end(&pcpu_stats->syncp);
 	} else {
-		rx_stats->rx_errors++;
+		pcpu_stats->rx_errors++;
 	}
 }
 



^ permalink raw reply related

* Re: Routing over multiple interfaces
From: David Woodhouse @ 2010-11-10 15:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, uweber
In-Reply-To: <1289401704.2860.182.camel@edumazet-laptop>

On Wed, 2010-11-10 at 16:08 +0100, Eric Dumazet wrote:
> Its a bit problematic, as you said you want to use both links to do
> one upload ;)

When it's *necessary*, yes :)

> Maybe it is possible to instruct teql to try to not change links
> unless one link is full... 

Hm... teql doesn't have any visibility into which packets belong to
which flow, does it?

Perhaps the answer is based on my previous setup ('ip route add default
src 90.155.92.214 nexthop dev ppp0 nexthop dev ppp1'). That would
naturally distribute connections between the interfaces, so all we need
on top is a qdisc which will move packets from one interface to the
other when one queue becomes full (or at a given rate limit).

-- 
dwmw2

^ permalink raw reply

* Re: [PATCH 3/3] net: tipc: fix information leak to userland
From: Vasiliy Kulikov @ 2010-11-10 15:54 UTC (permalink / raw)
  To: walter harms
  Cc: David Miller, kernel-janitors, jon.maloy, allan.stephens,
	tipc-discussion, netdev, linux-kernel
In-Reply-To: <4CDA88FE.8040801@bfs.de>

On Wed, Nov 10, 2010 at 12:58 +0100, walter harms wrote:
> NTL the core problem was that sizeof sa_data is 14 while dev->name is IFNAMESZ=15.

With this code it is NOT a bug because the output buffer is much bigger
than 14 (128 bytes).  I think it was just designed to overflow 14 bytes,
assign sa_data[14] = 0 and ignore it (lack of snprintf() those days?).

Anywhere else sa_data[14] = ... is a bug.

-- 
Vasiliy

^ permalink raw reply

* Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Michael S. Tsirkin @ 2010-11-10 16:16 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: anthony, arnd, avi, davem, eric.dumazet, kvm, netdev, rusty
In-Reply-To: <OFFE63B7F4.D491BE12-ON652577D6.005B4090-652577D6.005F565A@in.ibm.com>

On Tue, Nov 09, 2010 at 10:54:57PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 11/09/2010 09:03:25 PM:
> 
> > > > Something strange here, right?
> > > > 1. You are consistently getting >10G/s here, and even with a single
> > > stream?
> > >
> > > Sorry, I should have mentioned this though I had stated in my
> > > earlier mails. Each test result has two iterations, each of 60
> > > seconds, except when #netperfs is 1 for which I do 10 iteration
> > > (sum across 10 iterations).
> >
> > So need to divide the number by 10?
> 
> Yes, that is what I get with 512/1K macvtap I/O size :)
> 
> > >  I started doing many more iterations
> > > for 1 netperf after finding the issue earlier with single stream.
> > > So the BW is only 4.5-7 Gbps.
> > >
> > > > 2. With 2 streams, is where we get < 10G/s originally. Instead of
> > > >    doubling that we get a marginal improvement with 2 queues and
> > > >    about 30% worse with 1 queue.
> > >
> > > (doubling happens consistently for guest -> host, but never for
> > > remote host) I tried 512/txqs=2 and 1024/txqs=8 to get a varied
> > > testing scenario. In first case, there is a slight improvement in
> > > BW and good reduction in SD. In the second case, only SD improves
> > > (though BW drops for 2 stream for some reason).  In both cases,
> > > BW and SD improves as the number of sessions increase.
> >
> > I guess this is another indication that something's wrong.
> 
> The patch - both virtio-net and vhost-net, doesn't have any
> locking/mutex's/ or any synchronization method. Guest -> host
> performance improvement of upto 100% shows the patch is not
> doing anything wrong.

My concern is this: we don't seem to do anything in tap or macvtap to
help packets from separate virtio queues get to separate queues in the
hardware device and to avoid reordering when we do this.

- skb_tx_hash calculation will get different results
- hash math that e.g. tcp does will run on guest and seems to be discarded

etc

Maybe it's as simple as some tap/macvtap ioctls to set up the queue number
in skbs. Or maybe we need to pass the skb hash from guest to host.
It's this last option that should make us especially cautios as it'll
affect guest/host interface.

Also see d5a9e24afb4ab38110ebb777588ea0bd0eacbd0a: if we have
hardware which records an RX queue, it appears important to
pass that info to guest and to use that in selecting the TX queue.
Of course we won't see this in netperf runs but this needs to
be given thought too - supporting this seems to suggest either
sticking the hash in the virtio net header for both tx and rx,
or using multiplease RX queues.

> > We are quite far from line rate, the fact BW does not scale
> > means there's some contention in the code.
> 
> Attaining line speed with macvtap seems to be a generic issue
> and unrelated to my patch specifically. IMHO if there is nothing
> wrong in the code (review) and is accepted, it will benefit as
> others can also help to find what needs to be implemented in
> vhost/macvtap/qemu to get line speed for guest->remote-host.

No problem, I will queue these patches in some branch
to help enable cooperation, as well as help you
iterate with incremental patches instead of resending it all each time.

> PS: bare-metal performance for host->remote-host is also
>     2.7 Gbps and 2.8 Gbps for 512/1024 for the same card.
> 
> Thanks,

You mean native linux BW does not scale for your host with
# of connections either? I guess this just means need another
setup for testing?

> - KK

^ permalink raw reply

* Re: [PATCH 0/2] net: Changes in queue allocation and freeing
From: Tom Herbert @ 2010-11-10 16:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev
In-Reply-To: <1289385696.2860.147.camel@edumazet-laptop>

On Wed, Nov 10, 2010 at 2:41 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mardi 09 novembre 2010 à 12:47 -0800, Tom Herbert a écrit :
>> Changes to both RX and TX queue allocation.  In both cases allocate
>> in alloc_netdev_mq and free in free_netdev.  For RX the reference
>> couting also changed, the device reference count can now be used.
>
> Oh well :)
>
> Are they preliminary patches so that XPS also dont need the "reference
> counts specific to TX queues" ? ;)
>
Yes, this should allow the xps maps to be in the net_device also.
Sorry I neglected to mention that.

Also I noticed that the comment about RX queues refcnts is no longer
valid.  I can respin patch if necessary.

diff --git a/net/core/dev.c b/net/core/dev.c
index 87d89ba..34a42a8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5029,10 +5029,6 @@ static int netif_alloc_rx_queues(struct net_device *dev)
        }
        dev->_rx = rx;

-       /*
-        * Set a pointer to first element in the array which holds the
-        * reference count.
-        */
        for (i = 0; i < count; i++)
                rx[i].dev = dev;
 #endif

>
>
>

^ permalink raw reply related

* Re: [PATCH 1/1] UDEV - Add 'udevlom' command line param to start_udev
From: Harald Hoyer @ 2010-11-10 16:32 UTC (permalink / raw)
  To: Narendra_K
  Cc: linux-hotplug, netdev, Matt_Domsch, Jordan_Hargrave, Charles_Rose
In-Reply-To: <20101103165505.GA3281@fedora-14-r710.oslab.blr.amer.dell.com>

On 11/03/2010 05:55 PM, Narendra_K@Dell.com wrote:
> Hello,
>
> This patch allows users to specify if they want the onboard network
> interfaces to be renamed to lomN by implementing a command line param
> 'udevlom'.
>
> From: Narendra K<narendra_k@dell.com>
> Subject: [PATCH] UDEV - Add 'udevlom' command line param to start_udev
>
> This patch implements 'udevlom' command line parameter, which
> when passed, results in onboard network interfaces getting
> renamed to lomN.
>
> Signed-off-by: Narendra K<narendra_k@dell.com>
> ---
>   start_udev |    5 +++++
>   1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/start_udev b/start_udev
> index 49fc286..57d60c9 100755
> --- a/start_udev
> +++ b/start_udev
> @@ -32,6 +32,7 @@ export TZ=/etc/localtime
>   . /etc/init.d/functions
>
>   prog=udev
> +cmdline=`cat /proc/cmdline`
>
>   touch_recursive() {
>   	( cd $1;
> @@ -60,6 +61,10 @@ fi
>
>   ret=$[$ret + $?]
>
> +if strstr "$cmdline" udevlom; then
> +	/sbin/udevadm control --env=UDEVLOM="y"
> +fi
> +
>   /sbin/udevadm trigger --type=subsystems --action=add
>   /sbin/udevadm trigger --type=devices --action=add
>   /sbin/udevadm settle

start_udev is obsolete with the use of systemd service files anyway in Fedora>=15

^ permalink raw reply

* Re: [PATCH 1/1] UDEV - Add 'udevlom' command line param to start_udev
From: Harald Hoyer @ 2010-11-10 16:37 UTC (permalink / raw)
  To: Narendra_K
  Cc: linux-hotplug, netdev, Matt_Domsch, Jordan_Hargrave, Charles_Rose
In-Reply-To: <4CDAC930.4010801@redhat.com>

On 11/10/2010 05:32 PM, Harald Hoyer wrote:
> On 11/03/2010 05:55 PM, Narendra_K@Dell.com wrote:
>> Hello,
>>
>> This patch allows users to specify if they want the onboard network
>> interfaces to be renamed to lomN by implementing a command line param
>> 'udevlom'.
>>
>> From: Narendra K<narendra_k@dell.com>
>> Subject: [PATCH] UDEV - Add 'udevlom' command line param to start_udev
>>
>> This patch implements 'udevlom' command line parameter, which
>> when passed, results in onboard network interfaces getting
>> renamed to lomN.
>>
>> Signed-off-by: Narendra K<narendra_k@dell.com>
>> ---
>> start_udev | 5 +++++
>> 1 files changed, 5 insertions(+), 0 deletions(-)
>>
>> diff --git a/start_udev b/start_udev
>> index 49fc286..57d60c9 100755
>> --- a/start_udev
>> +++ b/start_udev
>> @@ -32,6 +32,7 @@ export TZ=/etc/localtime
>> . /etc/init.d/functions
>>
>> prog=udev
>> +cmdline=`cat /proc/cmdline`
>>
>> touch_recursive() {
>> ( cd $1;
>> @@ -60,6 +61,10 @@ fi
>>
>> ret=$[$ret + $?]
>>
>> +if strstr "$cmdline" udevlom; then
>> + /sbin/udevadm control --env=UDEVLOM="y"
>> +fi
>> +
>> /sbin/udevadm trigger --type=subsystems --action=add
>> /sbin/udevadm trigger --type=devices --action=add
>> /sbin/udevadm settle
>
> start_udev is obsolete with the use of systemd service files anyway in Fedora>=15

not saying that we really should use "udevlom" on the kernel command line, but 
you could use:

IMPORT{cmdline}="udevlom"
KERNEL="eth*", ENV{udevlom}==1, ....

^ permalink raw reply

* Re: sk->sk_socket seems to disappear before connection termination
From: Eric Dumazet @ 2010-11-10 16:44 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: David Howells, Netfilter Developer Mailing List, netdev,
	Rafał Maj
In-Reply-To: <alpine.LNX.2.01.1011101148060.18018@obet.zrqbmnf.qr>

Le mercredi 10 novembre 2010 à 11:53 +0100, Jan Engelhardt a écrit :
> On Wednesday 2010-11-10 06:47, Eric Dumazet wrote:
> >Le mercredi 10 novembre 2010 à 02:09 +0100, Jan Engelhardt a écrit :
> >> Hi,
> >> 
> >> Rafał reported this to us on IRC, paraphrasing what has been observed:
> >> 
> >> Using a simple rule like `iptables -A OUTPUT -p tcp --dport 80 -j LOG 
> >> --log-uid`, one can observe on creating a connection and terminating
> >> it that the trailing packets have skb->sk->sk_socket == NULL.
> >> Is this intended? Is the socket not retained until after TCP has
> >> sent out the closing exchange?
> >> 
> >> As I can reproduce:
> >> 
> >> $ telnet 134.76.13.21 80
> >> Trying 134.76.13.21...
> >> Connected to 134.76.13.21.
> >> Escape character is '^]'.
> >> ^]
> >> telnet> ^D
> >> Connection closed.
> >> 
> >> [491419.500978] IN= OUT=tun0 SRC=134.76.2.163 DST=134.76.13.21 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=35420 DF PROTO=TCP SPT=58613 DPT=80 WINDOW=5488 RES=0x00 SYN URGP=0 UID=25121 GID=100 
> >> [491419.511533] IN= OUT=tun0 SRC=134.76.2.163 DST=134.76.13.21 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=35421 DF PROTO=TCP SPT=58613 DPT=80 WINDOW=86 RES=0x00 ACK URGP=0 UID=25121 GID=100 
> >> [491420.052182] IN= OUT=tun0 SRC=134.76.2.163 DST=134.76.13.21 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=35422 DF PROTO=TCP SPT=58613 DPT=80 WINDOW=86 RES=0x00 ACK FIN URGP=0 UID=25121 GID=100 
> >> [491420.063619] IN= OUT=tun0 SRC=134.76.2.163 DST=134.76.13.21 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=35423 DF PROTO=TCP SPT=58613 DPT=80 WINDOW=86 RES=0x00 ACK URGP=0 
> >
> >Hmmm... skb->sk->sk_socket is really NULL ?
> >Are you sure its not skb->sk->sk_socket->file which is NULL ?
> 
> I am certain of it, having augmented ipt_LOG/xt_LOGMARK temporarily by 
> appropriate printks.
> 
> >In this case, you might need to use sock_i_uid() / sock_i_ino() as a
> >fallback ? (expensive because they take a rwlock)
> 
> No, sock_i_uid also uses sk->sk_socket. What is interesting though is 
> that sock_i_uid uses SOCK_INODE(sk->sk_socket)->i_uid, but xt_owner uses 
> sk->sk_socket->file->f_cred->fsuid. Would you have an idea as to why 
> that is?
> Dave Howells (cced) did the last change on it.
> 

Hmm, this is not what I get here. Could you please recheck ?

diff --git a/net/ipv4/netfilter/ipt_LOG.c b/net/ipv4/netfilter/ipt_LOG.c
index 72ffc8f..b0933b7 100644
--- a/net/ipv4/netfilter/ipt_LOG.c
+++ b/net/ipv4/netfilter/ipt_LOG.c
@@ -337,6 +337,8 @@ static void dump_packet(struct sbuff *m,
        /* Max length: 15 "UID=4294967295 " */
        if ((logflags & IPT_LOG_UID) && !iphoff && skb->sk) {
                read_lock_bh(&skb->sk->sk_callback_lock);
+               pr_err("sk=%p sk->sk_socket=%p file=%p\n",
+                       skb->sk, skb->sk->sk_socket, skb->sk->sk_socket ? skb->sk->sk_socket->file : NULL);
                if (skb->sk->sk_socket && skb->sk->sk_socket->file)
                        sb_add(m, "UID=%u GID=%u ",
                                skb->sk->sk_socket->file->f_cred->fsuid,


[ 9917.808796] ipt_LOG: sk=ffff880118bd32c0 sk->sk_socket=ffff88011d0d8c00 file=ffff88011cd4e100
[ 9917.808851] IN= OUT=eth1 SRC=192.168.20.108 DST=192.168.20.110 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=63701 DF PROTO=TCP SPT=60088 DPT=22 WINDOW=4380 RES=0x00 SYN URGP=0 UID=0 GID=0 
[ 9917.809091] ipt_LOG: sk=ffff880118bd32c0 sk->sk_socket=ffff88011d0d8c00 file=ffff88011cd4e100
[ 9917.809142] IN= OUT=eth1 SRC=192.168.20.108 DST=192.168.20.110 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=63702 DF PROTO=TCP SPT=60088 DPT=22 WINDOW=35 RES=0x00 ACK URGP=0 UID=0 GID=0 
[ 9917.814199] ipt_LOG: sk=ffff880118bd32c0 sk->sk_socket=ffff88011d0d8c00 file=ffff88011cd4e100
[ 9917.814251] IN= OUT=eth1 SRC=192.168.20.108 DST=192.168.20.110 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=63703 DF PROTO=TCP SPT=60088 DPT=22 WINDOW=35 RES=0x00 ACK URGP=0 UID=0 GID=0 
[ 9920.234680] ipt_LOG: sk=ffff880118bd32c0 sk->sk_socket=ffff88011d0d8c00 file=ffff88011cd4e100
[ 9920.234731] IN= OUT=eth1 SRC=192.168.20.108 DST=192.168.20.110 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=63704 DF PROTO=TCP SPT=60088 DPT=22 WINDOW=35 RES=0x00 ACK FIN URGP=0 UID=0 GID=0 
[ 9920.235221] ipt_LOG: sk=ffff880078998000 sk->sk_socket=ffff880078c58300 file=          (null)
[ 9920.235271] IN= OUT=eth1 SRC=192.168.20.108 DST=192.168.20.110 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=60088 DPT=22 WINDOW=35 RES=0x00 ACK URGP=0 


You can see in my log, that the last packet seems to be from a different socket !
(sk pointer changed to ffff880078998000 !)

Well well well, thats an ACK, in answer to FIN packet received from remote side.






^ permalink raw reply related

* Re: [PATCH] ucc_geth: Fix hung tasks.
From: Joakim Tjernlund @ 2010-11-10 16:57 UTC (permalink / raw)
  Cc: Anton Vorontsov, netdev
In-Reply-To: <OF7C87528F.1727C94B-ONC12577D7.004DB53C-C12577D7.004DF235@LocalDomain>

Joakim Tjernlund/Transmode wrote on 2010/11/10 15:11:22:
>
> Actually, there is something wrong anyway with TX timeout
> so don't use this patch. I must investigate more but
> it seems like cancel_work_sync hangs whenever an TX timeout
> occurs.

OK, found the problem. Currently ucc_geth bring the IF down and up
each time a TX timeout occurs which means you cannot do cancel_work_sync()
in ucc_geth_close as it will dead lock.

Looking at gianfar, it just reinits the controller and PHY and
I guess ucc_geth really should do the same.
This patch tries to do that but I am not sure it recovers
after a TX timeout.

Anton, what do think? If OK with you I will write up
a proper patch.

 diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 6647ed7..133aaba 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -2065,9 +2065,6 @@ static void ucc_geth_stop(struct ucc_geth_private *ugeth)
 	/* Disable Rx and Tx */
 	clrbits32(&ug_regs->maccfg1, MACCFG1_ENABLE_RX | MACCFG1_ENABLE_TX);

-	phy_disconnect(ugeth->phydev);
-	ugeth->phydev = NULL;
-
 	ucc_geth_memclean(ugeth);
 }

@@ -3558,6 +3555,8 @@ static int ucc_geth_close(struct net_device *dev)

 	cancel_work_sync(&ugeth->timeout_work);
 	ucc_geth_stop(ugeth);
+	phy_disconnect(ugeth->phydev);
+	ugeth->phydev = NULL;

 	free_irq(ugeth->ug_info->uf_info.irq, ugeth->ndev);

@@ -3586,8 +3585,12 @@ static void ucc_geth_timeout_work(struct work_struct *work)
 		 * Must reset MAC *and* PHY. This is done by reopening
 		 * the device.
 		 */
-		ucc_geth_close(dev);
-		ucc_geth_open(dev);
+		netif_tx_stop_all_queues(dev);
+		ucc_geth_stop(ugeth);
+		ucc_geth_init_mac(ugeth);
+		/* Must start PHY here */
+		phy_start(ugeth->phydev);
+		netif_tx_start_all_queues(dev);
 	}

 	netif_tx_schedule_all(dev);

>
> Joakim Tjernlund/Transmode wrote on 2010/11/10 13:05:28:
> >
> > Ping?
> >
> > Even though this patch didn't solve my hang it is still a bug.
> >
> >      Jocke
> >
> > Joakim Tjernlund <Joakim.Tjernlund@transmode.se> wrote on 2010/11/08 11:23:39:
> >
> > > From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> > > To: linuxppc-dev@lists.ozlabs.org, netdev@vger.kernel.org, Anton Vorontsov <avorontsov@ru.mvista.com>
> > > Cc: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> > > Date: 2010/11/08 11:23
> > > Subject: [PATCH] ucc_geth: Fix hung tasks.
> > >
> > > We noticed a few hangs like this:
> > >
> > > INFO: task ifconfig:572 blocked for more than 120 seconds.
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > ifconfig      D 0ff65760     0   572    369 0x00000000
> > > Call Trace:
> > > [c6157be0] [c6008460] 0xc6008460 (unreliable)
> > > [c6157ca0] [c0008608] __switch_to+0x4c/0x6c
> > > [c6157cb0] [c028fecc] schedule+0x184/0x310
> > > [c6157ce0] [c0290e54] __mutex_lock_slowpath+0xa4/0x150
> > > [c6157d20] [c0290c48] mutex_lock+0x44/0x48
> > > [c6157d30] [c01aba74] phy_stop+0x20/0x70
> > > [c6157d40] [c01aef40] ucc_geth_stop+0x30/0x98
> > > [c6157d60] [c01b18fc] ucc_geth_close+0x9c/0xdc
> > > [c6157d80] [c01db0cc] __dev_close+0xa0/0xd0
> > > [c6157d90] [c01deddc] __dev_change_flags+0x8c/0x148
> > > [c6157db0] [c01def54] dev_change_flags+0x1c/0x64
> > > [c6157dd0] [c0237ac8] devinet_ioctl+0x678/0x784
> > > [c6157e50] [c0239a58] inet_ioctl+0xb0/0xbc
> > > [c6157e60] [c01cafa8] sock_ioctl+0x174/0x2a0
> > > [c6157e80] [c009a16c] vfs_ioctl+0xcc/0xe0
> > > [c6157ea0] [c009a998] do_vfs_ioctl+0xc4/0x79c
> > > [c6157f10] [c009b0b0] sys_ioctl+0x40/0x74
> > > [c6157f40] [c00117c4] ret_from_syscall+0x0/0x38
> > >
> > > I THINK this is due to a missing cancel_work_sync in the driver
> > > although we cannot be sure. I found this by comparing
> > > ucc_geth with gianfar.
> > >
> > > Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> > > ---
> > >  drivers/net/ucc_geth.c |    1 +
> > >  1 files changed, 1 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
> > > index 97f9f7d..6647ed7 100644
> > > --- a/drivers/net/ucc_geth.c
> > > +++ b/drivers/net/ucc_geth.c
> > > @@ -3556,6 +3556,7 @@ static int ucc_geth_close(struct net_device *dev)
> > >
> > >     napi_disable(&ugeth->napi);
> > >
> > > +   cancel_work_sync(&ugeth->timeout_work);
> > >     ucc_geth_stop(ugeth);
> > >
> > >     free_irq(ugeth->ug_info->uf_info.irq, ugeth->ndev);
> > > --
> > > 1.7.2.2
> > >


^ permalink raw reply related

* Re: sk->sk_socket seems to disappear before connection termination
From: Jan Engelhardt @ 2010-11-10 17:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Howells, Netfilter Developer Mailing List, netdev,
	Rafał Maj
In-Reply-To: <1289407451.2860.239.camel@edumazet-laptop>


On Wednesday 2010-11-10 17:44, Eric Dumazet wrote:
>
>[ 9920.234680] ipt_LOG: sk=ffff880118bd32c0 sk->sk_socket=ffff88011d0d8c00 file=ffff88011cd4e100
>[ 9920.234731] IN= OUT=eth1 SRC=192.168.20.108 DST=192.168.20.110 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=63704 DF PROTO=TCP SPT=60088 DPT=22 WINDOW=35 RES=0x00 ACK FIN URGP=0 UID=0 GID=0 
>[ 9920.235221] ipt_LOG: sk=ffff880078998000 sk->sk_socket=ffff880078c58300 file=          (null)
>[ 9920.235271] IN= OUT=eth1 SRC=192.168.20.108 DST=192.168.20.110 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=60088 DPT=22 WINDOW=35 RES=0x00 ACK URGP=0 
>
>You can see in my log, that the last packet seems to be from a different
>socket ! (sk pointer changed to ffff880078998000 !)

Yes, that's it.

>Well well well, thats an ACK, in answer to FIN packet received from remote
>side.

But why is it not handled by sk ffff880118bd32c0 anymore?
It does have, after all, the same (addr,port) tuple.
And it is sort of a hiccup for xt_owner users.

^ permalink raw reply

* Re: sk->sk_socket seems to disappear before connection termination
From: Eric Dumazet @ 2010-11-10 17:37 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: David Howells, Netfilter Developer Mailing List, netdev,
	Rafał Maj
In-Reply-To: <alpine.LNX.2.01.1011101816330.2886@obet.zrqbmnf.qr>

Le mercredi 10 novembre 2010 à 18:17 +0100, Jan Engelhardt a écrit :
> On Wednesday 2010-11-10 17:44, Eric Dumazet wrote:
> >
> >[ 9920.234680] ipt_LOG: sk=ffff880118bd32c0 sk->sk_socket=ffff88011d0d8c00 file=ffff88011cd4e100
> >[ 9920.234731] IN= OUT=eth1 SRC=192.168.20.108 DST=192.168.20.110 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=63704 DF PROTO=TCP SPT=60088 DPT=22 WINDOW=35 RES=0x00 ACK FIN URGP=0 UID=0 GID=0 
> >[ 9920.235221] ipt_LOG: sk=ffff880078998000 sk->sk_socket=ffff880078c58300 file=          (null)
> >[ 9920.235271] IN= OUT=eth1 SRC=192.168.20.108 DST=192.168.20.110 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=60088 DPT=22 WINDOW=35 RES=0x00 ACK URGP=0 
> >
> >You can see in my log, that the last packet seems to be from a different
> >socket ! (sk pointer changed to ffff880078998000 !)
> 
> Yes, that's it.
> 
> >Well well well, thats an ACK, in answer to FIN packet received from remote
> >side.
> 
> But why is it not handled by sk ffff880118bd32c0 anymore?
> It does have, after all, the same (addr,port) tuple.
> And it is sort of a hiccup for xt_owner users.

Its because of TIMEWAIT state : no more socket

We use a special tcp socket (net->ipv4.tcp_sock) in tcp_v4_send_ack()

Its yet another contention point on SMP machines :(




^ permalink raw reply

* Re: [PATCH] macvlan: lockless tx path
From: Ben Greear @ 2010-11-10 17:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev
In-Reply-To: <1289403709.2860.216.camel@edumazet-laptop>

On 11/10/2010 07:41 AM, Eric Dumazet wrote:
> macvlan is a stacked device, like tunnels. We should use the lockless
> mechanism we are using in tunnels and loopback.
>
> This patch completely removes locking in TX path.
>
> tx stat counters are added into existing percpu stat structure, renamed
> from rx_stats to pcpu_stats.
>
> Note : this reverts commit 2c11455321f37 (macvlan: add multiqueue
> capability)
>
> Signed-off-by: Eric Dumazet<eric.dumazet@gmail.com>
> ---
>   drivers/net/macvlan.c      |   72 +++++++++++++----------------------
>   include/linux/if_macvlan.h |   26 +++++++-----
>   2 files changed, 43 insertions(+), 55 deletions(-)
>

> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
> index 8a2fd66..3779b5f 100644
> --- a/include/linux/if_macvlan.h
> +++ b/include/linux/if_macvlan.h
> @@ -25,19 +25,23 @@ struct macvlan_port;
>   struct macvtap_queue;
>
>   /**
> - *	struct macvlan_rx_stats - MACVLAN percpu rx stats
> + *	struct macvlan_pcpu_stats - MACVLAN percpu stats
>    *	@rx_packets: number of received packets
>    *	@rx_bytes: number of received bytes
>    *	@rx_multicast: number of received multicast packets
> + *	@tx_

Minor nit..seems you missed a few there?

>    *	@syncp: synchronization point for 64bit counters
>    *	@rx_errors: number of errors
>    */
> -struct macvlan_rx_stats {
> +struct macvlan_pcpu_stats {
>   	u64			rx_packets;
>   	u64			rx_bytes;
>   	u64			rx_multicast;
> +	u64			tx_packets;
> +	u64			tx_bytes;
>   	struct u64_stats_sync	syncp;
>   	unsigned long		rx_errors;
> +	unsigned long		tx_dropped;

Any reason to not also make those u64?


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Your email id has won 750,000 GBP in gnld xmas promo.send info
From: richard-mullen @ 2010-11-10 17:43 UTC (permalink / raw)



Name::::::






Country::::::::

^ permalink raw reply

* Re: [PATCH] macvlan: lockless tx path
From: Eric Dumazet @ 2010-11-10 17:43 UTC (permalink / raw)
  To: Ben Greear; +Cc: David Miller, Patrick McHardy, netdev
In-Reply-To: <4CDAD8C8.20807@candelatech.com>

Le mercredi 10 novembre 2010 à 09:39 -0800, Ben Greear a écrit :

> >   /**
> > - *	struct macvlan_rx_stats - MACVLAN percpu rx stats
> > + *	struct macvlan_pcpu_stats - MACVLAN percpu stats
> >    *	@rx_packets: number of received packets
> >    *	@rx_bytes: number of received bytes
> >    *	@rx_multicast: number of received multicast packets
> > + *	@tx_
> 
> Minor nit..seems you missed a few there?
> 

Arg... you're right !

> >    *	@syncp: synchronization point for 64bit counters
> >    *	@rx_errors: number of errors
> >    */
> > -struct macvlan_rx_stats {
> > +struct macvlan_pcpu_stats {
> >   	u64			rx_packets;
> >   	u64			rx_bytes;
> >   	u64			rx_multicast;
> > +	u64			tx_packets;
> > +	u64			tx_bytes;
> >   	struct u64_stats_sync	syncp;
> >   	unsigned long		rx_errors;
> > +	unsigned long		tx_dropped;
> 
> Any reason to not also make those u64?
> 

Well, they are supposed to be not incremented often, and they are packet
counts only, so a wrap around in less than 5 seconds is very unlikely.




^ permalink raw reply

* Re: [PATCH v15 00/17] Provide a zero-copy method on KVM virtio-net.
From: David Miller @ 2010-11-10 17:46 UTC (permalink / raw)
  To: xiaohui.xin; +Cc: netdev, kvm, linux-kernel, mst, mingo, herbert, jdike
In-Reply-To: <1289381008-5484-1-git-send-email-xiaohui.xin@intel.com>

From: xiaohui.xin@intel.com
Date: Wed, 10 Nov 2010 17:23:28 +0800

> From: Xin Xiaohui <xiaohui.xin@intel.com>
> 
>>2) The idea to key off of skb->dev in skb_release_data() is
>>   fundamentally flawed since many actions can change skb->dev on you,
>>   which will end up causing a leak of your external data areas.
> 
> How about this one? If the destructor_arg is not a good candidate,
> then I have to add an apparent field in shinfo.

If destructor_arg is actually a net_device pointer or similar,
you will need to take a reference count on it or similar.

Which means --> good bye performance especially on SMP.

You're going to be adding new serialization points and at
least two new atomics per packet.

^ permalink raw reply

* Re: [PATCH] Fix CAN info leak/minor heap overflow
From: David Miller @ 2010-11-10 17:51 UTC (permalink / raw)
  To: socketcan; +Cc: urs, netdev, drosenberg, security, torvalds
In-Reply-To: <4CDA412B.90900@hartkopp.net>

From: Oliver Hartkopp <socketcan@hartkopp.net>
Date: Wed, 10 Nov 2010 07:52:27 +0100

> IMHO the patch improves the historic situation and fixes the useless leakage
> of kernel addresses. Please consider to apply that procfs changes.

I'm only fine with fixing the kernel pointer fields in some way.

But moving forward any other change to the procfs file is simply
a waste of time.

You should create sysfs files and add logic to your tools to look
for them and use them if they exist.

Your forward path _SHOULD NOT_ be continuing this procfs versioning
madness.  Use something sane and do the work to make userland start
to be ready for this transition.

^ permalink raw reply

* Re: [PATCH] macvlan: lockless tx path
From: Ben Greear @ 2010-11-10 17:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev
In-Reply-To: <1289411027.2860.248.camel@edumazet-laptop>

On 11/10/2010 09:43 AM, Eric Dumazet wrote:
> Le mercredi 10 novembre 2010 à 09:39 -0800, Ben Greear a écrit :
>
>>>    /**
>>> - *	struct macvlan_rx_stats - MACVLAN percpu rx stats
>>> + *	struct macvlan_pcpu_stats - MACVLAN percpu stats
>>>     *	@rx_packets: number of received packets
>>>     *	@rx_bytes: number of received bytes
>>>     *	@rx_multicast: number of received multicast packets
>>> + *	@tx_
>>
>> Minor nit..seems you missed a few there?
>>
>
> Arg... you're right !
>
>>>     *	@syncp: synchronization point for 64bit counters
>>>     *	@rx_errors: number of errors
>>>     */
>>> -struct macvlan_rx_stats {
>>> +struct macvlan_pcpu_stats {
>>>    	u64			rx_packets;
>>>    	u64			rx_bytes;
>>>    	u64			rx_multicast;
>>> +	u64			tx_packets;
>>> +	u64			tx_bytes;
>>>    	struct u64_stats_sync	syncp;
>>>    	unsigned long		rx_errors;
>>> +	unsigned long		tx_dropped;
>>
>> Any reason to not also make those u64?
>>
>
> Well, they are supposed to be not incremented often, and they are packet
> counts only, so a wrap around in less than 5 seconds is very unlikely.

I agree, but if these can be read from user-space, it can be tricky to make
solid code to deal with wraps when the thing wrapping can be 32 or 64 bits,
depending on whether the kernel is compiled 32-bit or 64-bit.

So, my preference is to use u32 or u64 so there is no guesswork involved.

To be sure, this problem exists in lots of places already (/proc/net/dev comes to mind),
but the fewer places the better in my opinion.

That said, I don't feel too strongly about it, so if you want to keep these
stats as they are, I shall argue no more :)

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox