Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2 3/3] pppoatm: protect against freeing of vcc
From: Krzysztof Mazur @ 2012-11-29 16:28 UTC (permalink / raw)
  To: David Woodhouse
  Cc: David Laight, chas williams - CONTRACTOR, davem, netdev,
	linux-kernel, nathan
In-Reply-To: <1354204077.21562.172.camel@shinybook.infradead.org>

On Thu, Nov 29, 2012 at 03:47:57PM +0000, David Woodhouse wrote:
> On Thu, 2012-11-29 at 16:09 +0100, Krzysztof Mazur wrote:
> > 
> > I don't like two thinks about this patch:
> > 
> >         - if allos_skb(sizeof(*header), GFP_ATOMIC) at beginning of
> >           pclose() fails we will crash
> > 
> >         - if card wakes up after this timeout we will probably crash too
> > 
> > That's why proposed different approach, but it has other problems.
> 
> How about this variant on what you suggested. Yes, we can definitely
> remove everything that's in the queue... as long as we use
> skb_queue_walk_safe() instead of skb_queue_walk().
> 
> We can use GFP_KERNEL instead of GFP_ATOMIC, which at least reduces the
> likelihood of failing to close the vcc.
> 
> We end up waiting *only* if there is a packet which is *currently* being
> DMA'd to the card. And if the card doesn't take that within 5 seconds,
> it almost certainly never will. So I can live with that.
> 

Yeah, that shouldn't happen.

> +				if (!test_bit(ATM_VF_READY, &vcc->flags))
> +					wake_up(&card->param_wq);
> +			} else

according to CodingStyle:

+ } else {
>  				dev_kfree_skb_irq(oldskb);
> -			}
+ }

Krzysiek

^ permalink raw reply

* RE: [net-next PATCH V2 9/9] net: increase frag queue hash size andcache-line
From: David Laight @ 2012-11-29 16:39 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Eric Dumazet, David S. Miller,
	Florian Westphal
  Cc: netdev, Pablo Neira Ayuso, Thomas Graf, Cong Wang,
	Patrick McHardy, Paul E. McKenney, Herbert Xu
In-Reply-To: <20121129161552.17754.86087.stgit@dragon>

> Increase frag queue hash size and assure cache-line alignment to
> avoid false sharing.  Hash size is set to 256, because I have
> observed 206 frag queues in use at 4x10G with packet size 4416 bytes
> (three fragments).

Since it is a hash list, won't there always be workloads where
there are hash collisions?
I'm not sure I'm in favour of massively padding out structure
to the size of cache lines.

Both these changes add a moderate amount to the kernel data size
(those people who are worried about not being able to discard
code because hot_plug is always configured really ought to
be worried about the footprint of some of these hash tables).

We run Linux on embedded (small) ppc where there might only
be a handful of TCP connections, these sort of tables use up
precious memory.

While padding to cache line might reduce the number of cache
snoops when attacking the code, in a real life situation I
suspect the effect of making another cache line busy is as
likely to flush out some other important data.
This is similar to the reasons that excessive function inlining
and loop unrolling will speed up benchmarks but slow down real
code.

	David


^ permalink raw reply

* Re: [net-next PATCH V2 9/9] net: increase frag queue hash size and cache-line
From: Eric Dumazet @ 2012-11-29 16:55 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David S. Miller, Florian Westphal, netdev, Pablo Neira Ayuso,
	Thomas Graf, Cong Wang, Patrick McHardy, Paul E. McKenney,
	Herbert Xu
In-Reply-To: <20121129161552.17754.86087.stgit@dragon>

On Thu, 2012-11-29 at 17:16 +0100, Jesper Dangaard Brouer wrote:
> Increase frag queue hash size and assure cache-line alignment to
> avoid false sharing.  Hash size is set to 256, because I have
> observed 206 frag queues in use at 4x10G with packet size 4416 bytes
> (three fragments).
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
> 
>  include/net/inet_frag.h  |    5 ++---
>  net/ipv4/inet_fragment.c |    2 +-
>  2 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> index 431f68e..c8ad7e4 100644
> --- a/include/net/inet_frag.h
> +++ b/include/net/inet_frag.h
> @@ -47,13 +47,12 @@ struct inet_frag_queue {
>  	u16			max_size;
>  };
>  
> -#define INETFRAGS_HASHSZ		64
> -
> +#define INETFRAGS_HASHSZ	256
>  
>  struct inet_frag_bucket {
>  	struct hlist_head	chain;
>  	spinlock_t		chain_lock;
> -};
> +} ____cacheline_aligned_in_smp;
>  

This is a waste of memory.

Most linux powered devices dont care at all about fragments.

Just increase hashsz if you really want, and rely on hash dispersion
to avoid false sharing.

You gave no performance results for this patch anyway.

^ permalink raw reply

* Re: [net-next PATCH V2 5/9] net: frag, per CPU resource, mem limit and LRU list accounting
From: Eric Dumazet @ 2012-11-29 17:06 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David S. Miller, Florian Westphal, netdev, Pablo Neira Ayuso,
	Thomas Graf, Cong Wang, Patrick McHardy, Paul E. McKenney,
	Herbert Xu
In-Reply-To: <20121129161303.17754.47046.stgit@dragon>

On Thu, 2012-11-29 at 17:13 +0100, Jesper Dangaard Brouer wrote:
> The major performance bottleneck on NUMA systems, is the mem limit
> counter which is based an atomic counter.  This patch removes the
> cache-bouncing of the atomic counter, by moving this accounting to be
> bound to each CPU.  The LRU list also need to be done per CPU,
> in-order to keep the accounting straight.
> 
> If fragments belonging together is "sprayed" across CPUs, performance
> will still suffer, but due to NIC rxhashing this is not very common.
> Correct accounting in this situation is maintained by recording and
> "assigning" a CPU to a frag queue when its allocated (caused by the
> first packet associated packet).
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> 
> ---
> V2:
>  - Rename struct cpu_resource -> frag_cpu_limit
>  - Move init functions from inet_frag.h to inet_fragment.c
>  - Cleanup per CPU in inet_frags_exit_net()
> 
>  include/net/inet_frag.h                 |   64 +++++++++++++++++++------------
>  net/ipv4/inet_fragment.c                |   50 ++++++++++++++++++------
>  net/ipv4/ip_fragment.c                  |    3 +
>  net/ipv6/netfilter/nf_conntrack_reasm.c |    2 -
>  net/ipv6/reassembly.c                   |    2 -
>  5 files changed, 80 insertions(+), 41 deletions(-)
> 
> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> index 9bbef17..8421904 100644
> --- a/include/net/inet_frag.h
> +++ b/include/net/inet_frag.h
> @@ -1,11 +1,22 @@
>  #ifndef __NET_FRAG_H__
>  #define __NET_FRAG_H__
>  
> +#include <linux/spinlock.h>
> +#include <linux/atomic.h>
> +
> +/* Need to maintain these resource limits per CPU, else we will kill
> + * performance due to cache-line bouncing
> + */
> +struct frag_cpu_limit {
> +	atomic_t                mem;
> +	struct list_head        lru_list;
> +	spinlock_t              lru_lock;
> +} ____cacheline_aligned_in_smp;
> +

This looks like a big patch introducing a specific infrastructure, while
we already have lib/percpu_counter.c

Not counting the addition of a NR_CPUS array, which is really
unfortunate these days.

^ permalink raw reply

* Re: [net-next PATCH V2 8/9] net: frag queue locking per hash bucket
From: Eric Dumazet @ 2012-11-29 17:08 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David S. Miller, Florian Westphal, netdev, Pablo Neira Ayuso,
	Thomas Graf, Cong Wang, Patrick McHardy, Paul E. McKenney,
	Herbert Xu
In-Reply-To: <20121129161512.17754.93933.stgit@dragon>

On Thu, 2012-11-29 at 17:15 +0100, Jesper Dangaard Brouer wrote:
> This patch implements per hash bucket locking for the frag queue
> hash.  This removes two write locks, and the only remaining write
> lock is for protecting hash rebuild.  This essentially reduce the
> readers-writer lock to a rebuild lock.

So we still have this huge contention on the reader-writer lock cache
line...

I would just remove it. (And remove hash rebuild, or make it RCU
compatible )

^ permalink raw reply

* Re: [PATCH 1/2] of_mdio: Honour "status=disabled" property of device
From: Rob Herring @ 2012-11-29 17:12 UTC (permalink / raw)
  To: Alexander Sverdlin
  Cc: Stephen Warren, devicetree-discuss, Rob Herring, Grant Likely,
	netdev, Barry.Song, w.sang, alexander sverdlin
In-Reply-To: <50B71290.2060200@sysgo.com>

On 11/29/2012 01:45 AM, Alexander Sverdlin wrote:
> From: Alexander Sverdlin <alexander.sverdlin@sysgo.com>
> 
> of_mdio: Honour "status=disabled" property of device
> 
> Currently of_mdiobus_register() function registers all PHY devices,
> independetly from their status property in device tree. According to
> "ePAPR 1.1" spec, device should only be registered if there is no
> "status" property, or it has "ok" (or "okay") value (see
> of_device_is_available()). In case of "platform devices",
> of_platform_device_create_pdata() checks for "status" and ensures
> that disabled devices are not pupulated. But such check for MDIO buses
> was missing until now. Fix it.
> 
> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@sysgo.com>
> ---

Applied.

Rob

> --- linux.orig/drivers/of/of_mdio.c
> +++ linux/drivers/of/of_mdio.c
> @@ -53,7 +53,7 @@ int of_mdiobus_register(struct mii_bus *
>  		return rc;
>  
>  	/* Loop over the child nodes and register a phy_device for each one */
> -	for_each_child_of_node(np, child) {
> +	for_each_available_child_of_node(np, child) {
>  		const __be32 *paddr;
>  		u32 addr;
>  		int len;
> _______________________________________________
> devicetree-discuss mailing list
> devicetree-discuss@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/devicetree-discuss
> 

^ permalink raw reply

* [PATCH net-next 0/3] mlx4_en: set number of rx/tx channels using ethtool
From: Amir Vadai @ 2012-11-29 17:15 UTC (permalink / raw)
  To: David S. Miller; +Cc: Amir Vadai, Or Gerlitz, Oren Duer, netdev
In-Reply-To: <1354194894-16527-1-git-send-email-amirv@mellanox.com>

1. Added a record in the MAINTAINERS file for the mlx4_en driver
2. Fix set_ringparam not to forget tx moderation info + remove code duplication
3. Add support to changing number of rx/tx channels using ethtool

Amir Vadai (3):
  MAINTAINERS: Add Mellanox ethernet driver - mlx4_en
  net/mlx4_en: Fix TX moderation info loss after set_ringparam is
    called
  net/mlx4_en: Set number of rx/tx channels using ethtool

 MAINTAINERS                                     |    7 ++
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |  128 +++++++++++++++++-----
 drivers/net/ethernet/mellanox/mlx4/en_main.c    |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |   26 +++--
 drivers/net/ethernet/mellanox/mlx4/en_tx.c      |    2 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |    8 ++-
 6 files changed, 130 insertions(+), 43 deletions(-)

-- 
1.7.8.2

^ permalink raw reply

* [PATCH net-next 2/3] net/mlx4_en: Fix TX moderation info loss after set_ringparam is called
From: Amir Vadai @ 2012-11-29 17:15 UTC (permalink / raw)
  To: David S. Miller; +Cc: Amir Vadai, Or Gerlitz, Oren Duer, netdev
In-Reply-To: <1354209333-27672-1-git-send-email-amirv@mellanox.com>

We need to re-set tx moderation information after calling set_ringparam
else default tx moderation will be used.
Also avoid related code duplication, by putting it in a utility function.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |   59 ++++++++++++-----------
 1 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 9d0b88e..dc8ccb4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -43,6 +43,34 @@
 #define EN_ETHTOOL_SHORT_MASK cpu_to_be16(0xffff)
 #define EN_ETHTOOL_WORD_MASK  cpu_to_be32(0xffffffff)
 
+static int mlx4_en_moderation_update(struct mlx4_en_priv *priv)
+{
+	int i;
+	int err = 0;
+
+	for (i = 0; i < priv->tx_ring_num; i++) {
+		priv->tx_cq[i].moder_cnt = priv->tx_frames;
+		priv->tx_cq[i].moder_time = priv->tx_usecs;
+		err = mlx4_en_set_cq_moder(priv, &priv->tx_cq[i]);
+		if (err)
+			return err;
+	}
+
+	if (priv->adaptive_rx_coal)
+		return 0;
+
+	for (i = 0; i < priv->rx_ring_num; i++) {
+		priv->rx_cq[i].moder_cnt = priv->rx_frames;
+		priv->rx_cq[i].moder_time = priv->rx_usecs;
+		priv->last_moder_time[i] = MLX4_EN_AUTO_CONF;
+		err = mlx4_en_set_cq_moder(priv, &priv->rx_cq[i]);
+		if (err)
+			return err;
+	}
+
+	return err;
+}
+
 static void
 mlx4_en_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *drvinfo)
 {
@@ -381,7 +409,6 @@ static int mlx4_en_set_coalesce(struct net_device *dev,
 			      struct ethtool_coalesce *coal)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
-	int err, i;
 
 	priv->rx_frames = (coal->rx_max_coalesced_frames ==
 			   MLX4_EN_AUTO_CONF) ?
@@ -397,14 +424,6 @@ static int mlx4_en_set_coalesce(struct net_device *dev,
 	    coal->tx_max_coalesced_frames != priv->tx_frames) {
 		priv->tx_usecs = coal->tx_coalesce_usecs;
 		priv->tx_frames = coal->tx_max_coalesced_frames;
-		for (i = 0; i < priv->tx_ring_num; i++) {
-			priv->tx_cq[i].moder_cnt = priv->tx_frames;
-			priv->tx_cq[i].moder_time = priv->tx_usecs;
-			if (mlx4_en_set_cq_moder(priv, &priv->tx_cq[i])) {
-				en_warn(priv, "Failed changing moderation "
-					      "for TX cq %d\n", i);
-			}
-		}
 	}
 
 	/* Set adaptive coalescing params */
@@ -414,18 +433,8 @@ static int mlx4_en_set_coalesce(struct net_device *dev,
 	priv->rx_usecs_high = coal->rx_coalesce_usecs_high;
 	priv->sample_interval = coal->rate_sample_interval;
 	priv->adaptive_rx_coal = coal->use_adaptive_rx_coalesce;
-	if (priv->adaptive_rx_coal)
-		return 0;
 
-	for (i = 0; i < priv->rx_ring_num; i++) {
-		priv->rx_cq[i].moder_cnt = priv->rx_frames;
-		priv->rx_cq[i].moder_time = priv->rx_usecs;
-		priv->last_moder_time[i] = MLX4_EN_AUTO_CONF;
-		err = mlx4_en_set_cq_moder(priv, &priv->rx_cq[i]);
-		if (err)
-			return err;
-	}
-	return 0;
+	return mlx4_en_moderation_update(priv);
 }
 
 static int mlx4_en_set_pauseparam(struct net_device *dev,
@@ -466,7 +475,6 @@ static int mlx4_en_set_ringparam(struct net_device *dev,
 	u32 rx_size, tx_size;
 	int port_up = 0;
 	int err = 0;
-	int i;
 
 	if (param->rx_jumbo_pending || param->rx_mini_pending)
 		return -EINVAL;
@@ -505,14 +513,7 @@ static int mlx4_en_set_ringparam(struct net_device *dev,
 			en_err(priv, "Failed starting port\n");
 	}
 
-	for (i = 0; i < priv->rx_ring_num; i++) {
-		priv->rx_cq[i].moder_cnt = priv->rx_frames;
-		priv->rx_cq[i].moder_time = priv->rx_usecs;
-		priv->last_moder_time[i] = MLX4_EN_AUTO_CONF;
-		err = mlx4_en_set_cq_moder(priv, &priv->rx_cq[i]);
-		if (err)
-			goto out;
-	}
+	err = mlx4_en_moderation_update(priv);
 
 out:
 	mutex_unlock(&mdev->state_lock);
-- 
1.7.8.2

^ permalink raw reply related

* [PATCH net-next 1/3] MAINTAINERS: Add Mellanox ethernet driver - mlx4_en
From: Amir Vadai @ 2012-11-29 17:15 UTC (permalink / raw)
  To: David S. Miller
  Cc: Amir Vadai, Or Gerlitz, Oren Duer, netdev, Yevgeny Petrilin
In-Reply-To: <1354209333-27672-1-git-send-email-amirv@mellanox.com>

Set mlx4_en maintainer to Amir Vadai instead of Yevgeny Petrilin.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.com>
---
 MAINTAINERS |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5d72dd5..67fd3de 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4822,6 +4822,13 @@ F:	Documentation/scsi/megaraid.txt
 F:	drivers/scsi/megaraid.*
 F:	drivers/scsi/megaraid/
 
+MELLANOX ETHERNET DRIVER (mlx4_en)
+M:	Amir Vadai <amirv@mellanox.com>
+L: 	netdev@vger.kernel.org
+S:	Supported
+W:	http://www.mellanox.com
+Q:	http://patchwork.ozlabs.org/project/netdev/list/
+
 MEMORY MANAGEMENT
 L:	linux-mm@kvack.org
 W:	http://www.linux-mm.org
-- 
1.7.8.2

^ permalink raw reply related

* [PATCH net-next 3/3] net/mlx4_en: Set number of rx/tx channels using ethtool
From: Amir Vadai @ 2012-11-29 17:15 UTC (permalink / raw)
  To: David S. Miller; +Cc: Amir Vadai, Or Gerlitz, Oren Duer, netdev
In-Reply-To: <1354209333-27672-1-git-send-email-amirv@mellanox.com>

Add support to changing number of rx/tx channels using
ethtool ('ethtool -[lL]'). Where the number of tx channels specified in ethtool
is the number of rings per user priority - not total number of tx rings.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |   69 +++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_main.c    |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |   26 +++++----
 drivers/net/ethernet/mellanox/mlx4/en_tx.c      |    2 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |    8 ++-
 5 files changed, 93 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index dc8ccb4..681bc1b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -999,6 +999,73 @@ static int mlx4_en_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
 	return err;
 }
 
+static void mlx4_en_get_channels(struct net_device *dev,
+		struct ethtool_channels *channel)
+{
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+
+	memset(channel, 0, sizeof(*channel));
+
+	channel->max_rx = MAX_RX_RINGS;
+	channel->max_tx = MLX4_EN_MAX_TX_RING_P_UP;
+
+	channel->rx_count = priv->rx_ring_num;
+	channel->tx_count = priv->tx_ring_num / MLX4_EN_NUM_UP;
+}
+
+static int mlx4_en_set_channels(struct net_device *dev,
+		struct ethtool_channels *channel)
+{
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_en_dev *mdev = priv->mdev;
+	int port_up;
+	int err = 0;
+
+	if (channel->other_count || channel->combined_count ||
+	    channel->tx_count > channel->max_tx ||
+	    channel->rx_count > channel->max_rx ||
+	    !channel->tx_count || !channel->rx_count)
+		return -EINVAL;
+
+	mutex_lock(&mdev->state_lock);
+	if (priv->port_up) {
+		port_up = 1;
+		mlx4_en_stop_port(dev);
+	}
+
+	mlx4_en_free_resources(priv);
+
+	priv->num_tx_rings_p_up = channel->tx_count;
+	priv->tx_ring_num = channel->tx_count * MLX4_EN_NUM_UP;
+	priv->rx_ring_num = channel->rx_count;
+
+	err = mlx4_en_alloc_resources(priv);
+	if (err) {
+		en_err(priv, "Failed reallocating port resources\n");
+		goto out;
+	}
+
+	netif_set_real_num_tx_queues(dev, priv->tx_ring_num);
+	netif_set_real_num_rx_queues(dev, priv->rx_ring_num);
+
+	mlx4_en_setup_tc(dev, MLX4_EN_NUM_UP);
+
+	en_warn(priv, "Using %d TX rings\n", priv->tx_ring_num);
+	en_warn(priv, "Using %d RX rings\n", priv->rx_ring_num);
+
+	if (port_up) {
+		err = mlx4_en_start_port(dev);
+		if (err)
+			en_err(priv, "Failed starting port\n");
+	}
+
+	err = mlx4_en_moderation_update(priv);
+
+out:
+	mutex_unlock(&mdev->state_lock);
+	return err;
+}
+
 const struct ethtool_ops mlx4_en_ethtool_ops = {
 	.get_drvinfo = mlx4_en_get_drvinfo,
 	.get_settings = mlx4_en_get_settings,
@@ -1023,6 +1090,8 @@ const struct ethtool_ops mlx4_en_ethtool_ops = {
 	.get_rxfh_indir_size = mlx4_en_get_rxfh_indir_size,
 	.get_rxfh_indir = mlx4_en_get_rxfh_indir,
 	.set_rxfh_indir = mlx4_en_set_rxfh_indir,
+	.get_channels = mlx4_en_get_channels,
+	.set_channels = mlx4_en_set_channels,
 };
 
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index a52922e..3a2b8c6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -250,7 +250,7 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
 				rounddown_pow_of_two(max_t(int, MIN_RX_RINGS,
 							   min_t(int,
 								 dev->caps.num_comp_vectors,
-								 MAX_RX_RINGS)));
+								 DEF_RX_RINGS)));
 		} else {
 			mdev->profile.prof[i].rx_ring_num = rounddown_pow_of_two(
 				min_t(int, dev->caps.comp_pool/
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 2b23ca2..7d1287f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -47,11 +47,11 @@
 #include "mlx4_en.h"
 #include "en_port.h"
 
-static int mlx4_en_setup_tc(struct net_device *dev, u8 up)
+int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	int i;
-	unsigned int q, offset = 0;
+	unsigned int offset = 0;
 
 	if (up && up != MLX4_EN_NUM_UP)
 		return -EINVAL;
@@ -59,10 +59,9 @@ static int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 	netdev_set_num_tc(dev, up);
 
 	/* Partition Tx queues evenly amongst UP's */
-	q = priv->tx_ring_num / up;
 	for (i = 0; i < up; i++) {
-		netdev_set_tc_queue(dev, i, q, offset);
-		offset += q;
+		netdev_set_tc_queue(dev, i, priv->num_tx_rings_p_up, offset);
+		offset += priv->num_tx_rings_p_up;
 	}
 
 	return 0;
@@ -1114,7 +1113,7 @@ int mlx4_en_start_port(struct net_device *dev)
 		/* Configure ring */
 		tx_ring = &priv->tx_ring[i];
 		err = mlx4_en_activate_tx_ring(priv, tx_ring, cq->mcq.cqn,
-			i / priv->mdev->profile.num_tx_rings_p_up);
+			i / priv->num_tx_rings_p_up);
 		if (err) {
 			en_err(priv, "Failed allocating Tx ring\n");
 			mlx4_en_deactivate_cq(priv, cq);
@@ -1564,10 +1563,13 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	int err;
 
 	dev = alloc_etherdev_mqs(sizeof(struct mlx4_en_priv),
-	    prof->tx_ring_num, prof->rx_ring_num);
+				 MAX_TX_RINGS, MAX_RX_RINGS);
 	if (dev == NULL)
 		return -ENOMEM;
 
+	netif_set_real_num_tx_queues(dev, prof->tx_ring_num);
+	netif_set_real_num_rx_queues(dev, prof->rx_ring_num);
+
 	SET_NETDEV_DEV(dev, &mdev->dev->pdev->dev);
 	dev->dev_id =  port - 1;
 
@@ -1586,15 +1588,17 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	priv->flags = prof->flags;
 	priv->ctrl_flags = cpu_to_be32(MLX4_WQE_CTRL_CQ_UPDATE |
 			MLX4_WQE_CTRL_SOLICITED);
+	priv->num_tx_rings_p_up = mdev->profile.num_tx_rings_p_up;
 	priv->tx_ring_num = prof->tx_ring_num;
-	priv->tx_ring = kzalloc(sizeof(struct mlx4_en_tx_ring) *
-			priv->tx_ring_num, GFP_KERNEL);
+
+	priv->tx_ring = kzalloc(sizeof(struct mlx4_en_tx_ring) * MAX_TX_RINGS,
+				GFP_KERNEL);
 	if (!priv->tx_ring) {
 		err = -ENOMEM;
 		goto out;
 	}
-	priv->tx_cq = kzalloc(sizeof(struct mlx4_en_cq) * priv->tx_ring_num,
-			GFP_KERNEL);
+	priv->tx_cq = kzalloc(sizeof(struct mlx4_en_cq) * MAX_RX_RINGS,
+			      GFP_KERNEL);
 	if (!priv->tx_cq) {
 		err = -ENOMEM;
 		goto out;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index b35094c..1f571d0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -523,7 +523,7 @@ static void build_inline_wqe(struct mlx4_en_tx_desc *tx_desc, struct sk_buff *sk
 u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
-	u16 rings_p_up = priv->mdev->profile.num_tx_rings_p_up;
+	u16 rings_p_up = priv->num_tx_rings_p_up;
 	u8 up = 0;
 
 	if (dev->num_tc)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index d3eba8b..334ec48 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -67,7 +67,8 @@
 
 #define MLX4_EN_PAGE_SHIFT	12
 #define MLX4_EN_PAGE_SIZE	(1 << MLX4_EN_PAGE_SHIFT)
-#define MAX_RX_RINGS		16
+#define DEF_RX_RINGS		16
+#define MAX_RX_RINGS		128
 #define MIN_RX_RINGS		4
 #define TXBB_SIZE		64
 #define HEADROOM		(2048 / TXBB_SIZE + 1)
@@ -118,6 +119,8 @@ enum {
 #define MLX4_EN_NUM_UP			8
 #define MLX4_EN_DEF_TX_RING_SIZE	512
 #define MLX4_EN_DEF_RX_RING_SIZE  	1024
+#define MAX_TX_RINGS			(MLX4_EN_MAX_TX_RING_P_UP * \
+					 MLX4_EN_NUM_UP)
 
 /* Target number of packets to coalesce with interrupt moderation */
 #define MLX4_EN_RX_COAL_TARGET	44
@@ -476,6 +479,7 @@ struct mlx4_en_priv {
 	u32 flags;
 #define MLX4_EN_FLAG_PROMISC	0x1
 #define MLX4_EN_FLAG_MC_PROMISC	0x2
+	u8 num_tx_rings_p_up;
 	u32 tx_ring_num;
 	u32 rx_ring_num;
 	u32 rx_skb_size;
@@ -596,6 +600,8 @@ int mlx4_en_QUERY_PORT(struct mlx4_en_dev *mdev, u8 port);
 extern const struct dcbnl_rtnl_ops mlx4_en_dcbnl_ops;
 #endif
 
+int mlx4_en_setup_tc(struct net_device *dev, u8 up);
+
 #ifdef CONFIG_RFS_ACCEL
 void mlx4_en_cleanup_filters(struct mlx4_en_priv *priv,
 			     struct mlx4_en_rx_ring *rx_ring);
-- 
1.7.8.2

^ permalink raw reply related

* Re: [PATCH v2 3/3] pppoatm: protect against freeing of vcc
From: chas williams - CONTRACTOR @ 2012-11-29 17:17 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Krzysztof Mazur, David Laight, davem, netdev, linux-kernel,
	nathan
In-Reply-To: <1354206269.21562.189.camel@shinybook.infradead.org>

On Thu, 29 Nov 2012 16:24:29 +0000
David Woodhouse <dwmw2@infradead.org> wrote:

> On Thu, 2012-11-29 at 10:59 -0500, chas williams - CONTRACTOR wrote:
> > the part that bothers me (and i dont have the programmer's guide for
> > the solos hardware) is that you are watching for the PKT_PCLOSE to be
> > sent to the card.  shouldnt you be watching for the PKT_PCLOSE to be
> > returned from the card (assuming it does such a thing) so that you can
> > be assured that the tx/rx for this vpi/vci pair has been "stopped"?
> 
> Define "stopped".
> 
> For the RX case... the other end may *always* take it upon itself to
> send us a packet marked with arbitrary VCI/VPI, right? There's no
> connection setup for it "on the wire", in the case of PVC?

most atm hardware that i am familiar with, wont deliver vpi/vci data
unless you are actually trying to receive it.  however, this hardware
is generally doing its reassembly in hardware and delivering aal5
pdu's and needs to manage its memory resources carefully.  you might be
trying to reassemble 1000 pdu's from different vpi/vci's.

> So bearing that in mind: from the moment ATM_VF_READY gets cleared, as
> far as the ATM core is concerned, we will no longer receive packets on
> the given VCC. If we receive any, we'll just complain about receiving
> packets for an unknown VCI/VPI.
> 
> For the TX case ... yes, we need to be sure we aren't continuing to send
> packets after our close() routine completes. We *used* to, but the
> resulting ->pop() calls were causing problems, and that's why we're
> looking at this code path closer. The currently proposed patches (except
> one suggestion from Krzyztof that we both shouted down) would fix that.

again part of this is poor synchronization.  the detach protocol (i.e.
push of a NULL skb) should be flushing any pending transmits and
shutting down whatever in the protocol is doing any sending and
receiving.  however, i release this might be difficult to do since the
detach protocol is invoked in such a strange way.

^ permalink raw reply

* Re: [net-next RFC] pktgen: don't wait for the device who doesn't free skb immediately after sent
From: Stephen Hemminger @ 2012-11-29 17:21 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization, davem
In-Reply-To: <20121129113012.GA7866@redhat.com>

On Thu, 29 Nov 2012 13:30:13 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Nov 29, 2012 at 06:13:13PM +0800, Jason Wang wrote:
> > On Wednesday, November 28, 2012 08:53:05 AM Stephen Hemminger wrote:
> > > On Wed, 28 Nov 2012 14:48:52 +0800
> > > 
> > > Jason Wang <jasowang@redhat.com> wrote:
> > > > On 11/28/2012 12:49 AM, Stephen Hemminger wrote:
> > > > > On Tue, 27 Nov 2012 14:45:13 +0800
> > > > > 
> > > > > Jason Wang <jasowang@redhat.com> wrote:
> > > > >> On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
> > > > >>> On Mon, 26 Nov 2012 15:56:52 +0800
> > > > >>> 
> > > > >>> Jason Wang <jasowang@redhat.com> wrote:
> > > > >>>> Some deivces do not free the old tx skbs immediately after it has
> > > > >>>> been sent
> > > > >>>> (usually in tx interrupt). One such example is virtio-net which
> > > > >>>> optimizes for virt and only free the possible old tx skbs during the
> > > > >>>> next packet sending. This would lead the pktgen to wait forever in
> > > > >>>> the refcount of the skb if no other pakcet will be sent afterwards.
> > > > >>>> 
> > > > >>>> Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY
> > > > >>>> which could notify the pktgen that the device does not free skb
> > > > >>>> immediately after it has been sent and let it not to wait for the
> > > > >>>> refcount to be one.
> > > > >>>> 
> > > > >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > >>> 
> > > > >>> Another alternative would be using skb_orphan() and skb->destructor.
> > > > >>> There are other cases where skb's are not freed right away.
> > > > >>> --
> > > > >>> To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > > >>> the body of a message to majordomo@vger.kernel.org
> > > > >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > >> 
> > > > >> Hi Stephen:
> > > > >> 
> > > > >> Do you mean registering a skb->destructor for pktgen then set and check
> > > > >> bits in skb->tx_flag?
> > > > > 
> > > > > Yes. Register a destructor that does something like update a counter
> > > > > (number of packets pending), then just spin while number of packets
> > > > > pending is over threshold.
> > > > > --
> > > > 
> > > > Not sure this is the best method, since pktgen was used to test the tx
> > > > process of the device driver and NIC. If we use skb_orhpan(), we would
> > > > miss the test of tx completion part.
> > > 
> > > There are other places that delay freeing and your solution would mean
> > > finding and fixing all those. Code that does that already has to use
> > > skb_orphan() to work, and I was looking for a way that could use that.
> > > Introducing another flag value seems like a long term burden.
> > > 
> > 
> > Get the point, will draft another version.
> >
> > > Alternatively, virtio could do cleanup more aggressively. Maybe in response
> > > to ring getting half full, or add a cleanup timer or something to avoid the
> > > problem.
> 
> Timer would prevent complete deadlock but it is very expensive
> in the virt scenario.
> pulling at ring half full would only help if ring gets half full :)
> which it does not have to.

A timer that fires once (when idle) to mop up the last packet in a stream
would be not very expensive. Most of the time, it would just be continually
rescheduled.

^ permalink raw reply

* Re: [PATCH net-next 1/3] MAINTAINERS: Add Mellanox ethernet driver - mlx4_en
From: Joe Perches @ 2012-11-29 17:29 UTC (permalink / raw)
  To: Amir Vadai
  Cc: David S. Miller, Or Gerlitz, Oren Duer, netdev, Yevgeny Petrilin
In-Reply-To: <1354209333-27672-2-git-send-email-amirv@mellanox.com>

On Thu, 2012-11-29 at 19:15 +0200, Amir Vadai wrote:
> Set mlx4_en maintainer to Amir Vadai instead of Yevgeny Petrilin.
> 
> Signed-off-by: Amir Vadai <amirv@mellanox.com>
> Cc: Yevgeny Petrilin <yevgenyp@mellanox.com>
> ---
>  MAINTAINERS |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5d72dd5..67fd3de 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4822,6 +4822,13 @@ F:	Documentation/scsi/megaraid.txt
>  F:	drivers/scsi/megaraid.*
>  F:	drivers/scsi/megaraid/
>  
> +MELLANOX ETHERNET DRIVER (mlx4_en)
> +M:	Amir Vadai <amirv@mellanox.com>
> +L: 	netdev@vger.kernel.org
> +S:	Supported
> +W:	http://www.mellanox.com
> +Q:	http://patchwork.ozlabs.org/project/netdev/list/

What about a file pattern?  Maybe
F:	drivers/net/ethernet/mellanox/mlx/

^ permalink raw reply

* Re: [PATCH v3 net-next] sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call
From: Vlad Yasevich @ 2012-11-29 17:31 UTC (permalink / raw)
  To: Michele Baldessari
  Cc: linux-sctp, Neil Horman, Thomas Graf, netdev, David S. Miller
In-Reply-To: <1354131552-24889-1-git-send-email-michele@acksyn.org>

On 11/28/2012 02:39 PM, Michele Baldessari wrote:

> diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
> index 397296f..7edc89d 100644
> --- a/net/sctp/inqueue.c
> +++ b/net/sctp/inqueue.c
> @@ -105,6 +105,9 @@ void sctp_inq_push(struct sctp_inq *q, struct sctp_chunk *chunk)
>   	 */
>   	list_add_tail(&chunk->list, &q->in_chunk_list);
>   	q->immediate.func(&q->immediate);
> +
> +	if (chunk->asoc)
> +		chunk->asoc->stats.ipackets++;

you may want to consider putting this before the immediate.func() call, 
so that you record an incoming packet before it's fully processed.  This
would mimic the behavior of the rest of the stats you are collecting, as
you typically increment the stat and then process the data.

>   }
>
>   /* Peek at the next chunk on the inqeue. */

>
> +/*
> + * SCTP_GET_ASSOC_STATS
> + *
> + * This option retrieves local per endpoint statistics. It is modeled
> + * after OpenSolaris' implementation
> + */
> +static int sctp_getsockopt_assoc_stats(struct sock *sk, int len,
> +				       char __user *optval,
> +				       int __user *optlen)
> +{
> +	struct sctp_assoc_stats sas;
> +	struct sctp_association *asoc = NULL;
> +
> +	/* User must provide at least the assoc id */
> +	if (len < sizeof(sctp_assoc_t))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&sas, optval, len))
> +		return -EFAULT;
> +
> +	asoc = sctp_id2assoc(sk, sas.sas_assoc_id);
> +	if (!asoc)
> +		return -EINVAL;
> +
> +	sas.sas_rtxchunks = asoc->stats.rtxchunks;
> +	sas.sas_gapcnt = asoc->stats.gapcnt;
> +	sas.sas_outofseqtsns = asoc->stats.outofseqtsns;
> +	sas.sas_osacks = asoc->stats.osacks;
> +	sas.sas_isacks = asoc->stats.isacks;
> +	sas.sas_octrlchunks = asoc->stats.octrlchunks;
> +	sas.sas_ictrlchunks = asoc->stats.ictrlchunks;
> +	sas.sas_oodchunks = asoc->stats.oodchunks;
> +	sas.sas_iodchunks = asoc->stats.iodchunks;
> +	sas.sas_ouodchunks = asoc->stats.ouodchunks;
> +	sas.sas_iuodchunks = asoc->stats.iuodchunks;
> +	sas.sas_idupchunks = asoc->stats.idupchunks;
> +	sas.sas_opackets = asoc->stats.opackets;
> +	sas.sas_ipackets = asoc->stats.ipackets;
> +
> +	/* New high max rto observed, will return 0 if not a single
> +	 * RTO update took place. obs_rto_ipaddr will be bogus
> +	 * in such a case
> +	 */
> +	sas.sas_maxrto = asoc->stats.max_obs_rto;
> +	memcpy(&sas.sas_obs_rto_ipaddr, &asoc->stats.obs_rto_ipaddr,
> +		sizeof(struct sockaddr_storage));
> +
> +	/* Mark beginning of a new observation period */
> +	asoc->stats.max_obs_rto = 0;

Ok.  That's better.  If there are 2 fast queries you get 0 on the 
second, but that still feels a bit strange to me.  I think resetting
max_obs_rto to rto_min might make more sense.  rto_min is the low bound
and rto can't go below that.  So, on the next rto measurement, that'll
be the smallest value of max_obs_rto.  IMO, it might be better to reset
the max_obs_rto to rto_min, so that
  1) We always see a valid rto value in the field.
  2) We can more easily see the variances in rto over time.

What do you think?

-vlad
> +
> +	/* Allow the struct to grow and fill in as much as possible */
> +	len = min_t(size_t, len, sizeof(sas));
> +
> +	if (put_user(len, optlen))
> +		return -EFAULT;
> +
> +	SCTP_DEBUG_PRINTK("sctp_getsockopt_assoc_stat(%d): %d\n",
> +			  len, sas.sas_assoc_id);
> +
> +	if (copy_to_user(optval, &sas, len))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>   SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
>   				char __user *optval, int __user *optlen)
>   {
> @@ -5776,6 +5842,9 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
>   	case SCTP_PEER_ADDR_THLDS:
>   		retval = sctp_getsockopt_paddr_thresholds(sk, optval, len, optlen);
>   		break;
> +	case SCTP_GET_ASSOC_STATS:
> +		retval = sctp_getsockopt_assoc_stats(sk, len, optval, optlen);
> +		break;
>   	default:
>   		retval = -ENOPROTOOPT;
>   		break;
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index 953c21e..8c6920d 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -350,6 +350,7 @@ void sctp_transport_update_rto(struct sctp_transport *tp, __u32 rtt)
>
>   	/* 6.3.1 C3) After the computation, update RTO <- SRTT + 4 * RTTVAR. */
>   	tp->rto = tp->srtt + (tp->rttvar << 2);
> +	sctp_max_rto(tp->asoc, tp);
>
>   	/* 6.3.1 C6) Whenever RTO is computed, if it is less than RTO.Min
>   	 * seconds then it is rounded up to RTO.Min seconds.
> @@ -620,6 +621,7 @@ void sctp_transport_reset(struct sctp_transport *t)
>   	t->burst_limited = 0;
>   	t->ssthresh = asoc->peer.i.a_rwnd;
>   	t->rto = asoc->rto_initial;
> +	sctp_max_rto(asoc, t);
>   	t->rtt = 0;
>   	t->srtt = 0;
>   	t->rttvar = 0;
>

^ permalink raw reply

* Re: [net-next PATCH V2 5/9] net: frag, per CPU resource, mem limit and LRU list accounting
From: David Miller @ 2012-11-29 17:31 UTC (permalink / raw)
  To: eric.dumazet
  Cc: brouer, fw, netdev, pablo, tgraf, amwang, kaber, paulmck, herbert
In-Reply-To: <1354208776.14302.1898.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 29 Nov 2012 09:06:16 -0800

> Not counting the addition of a NR_CPUS array, which is really
> unfortunate these days.

That is addressed in patch #6.

^ permalink raw reply

* Re: [PATCH 1/2] of_mdio: Honour "status=disabled" property of device
From: Grant Likely @ 2012-11-29 17:36 UTC (permalink / raw)
  To: Alexander Sverdlin, Stephen Warren, devicetree-discuss,
	Rob Herring
  Cc: alexander sverdlin, w.sang, Barry.Song, netdev
In-Reply-To: <50B71290.2060200@sysgo.com>

On Thu, 29 Nov 2012 08:45:20 +0100, Alexander Sverdlin <asv@sysgo.com> wrote:
> From: Alexander Sverdlin <alexander.sverdlin@sysgo.com>
> 
> of_mdio: Honour "status=disabled" property of device
> 
> Currently of_mdiobus_register() function registers all PHY devices,
> independetly from their status property in device tree. According to
> "ePAPR 1.1" spec, device should only be registered if there is no
> "status" property, or it has "ok" (or "okay") value (see
> of_device_is_available()). In case of "platform devices",
> of_platform_device_create_pdata() checks for "status" and ensures
> that disabled devices are not pupulated. But such check for MDIO buses
> was missing until now. Fix it.
> 
> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@sysgo.com>

Applied, thanks.

g.

> ---
> --- linux.orig/drivers/of/of_mdio.c
> +++ linux/drivers/of/of_mdio.c
> @@ -53,7 +53,7 @@ int of_mdiobus_register(struct mii_bus *
>  		return rc;
>  
>  	/* Loop over the child nodes and register a phy_device for each one */
> -	for_each_child_of_node(np, child) {
> +	for_each_available_child_of_node(np, child) {
>  		const __be32 *paddr;
>  		u32 addr;
>  		int len;

-- 
Grant Likely, B.Sc, P.Eng.
Secret Lab Technologies, Ltd.

^ permalink raw reply

* Re: [PATCH v2] bonding: fix miimon and arp_interval delayed work race conditions
From: Jay Vosburgh @ 2012-11-29 17:37 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, andy, davem
In-Reply-To: <1354188691-15509-1-git-send-email-nikolay@redhat.com>

Nikolay Aleksandrov <nikolay@redhat.com> wrote:

>First I would give three observations which will be used later.
>Observation 1: if (delayed_work_pending(wq)) cancel_delayed_work(wq)
> This usage is wrong because the pending bit is cleared just before the
> work's fn is executed and if the function re-arms itself we might end up
> with the work still running. It's safe to call cancel_delayed_work_sync()
> even if the work is not queued at all.
>Observation 2: Use of INIT_DELAYED_WORK()
> Work needs to be initialized only once prior to (de/en)queueing.
>Observation 3: IFF_UP is set only after ndo_open is called
>
>Related race conditions:
>1. Race between bonding_store_miimon() and bonding_store_arp_interval()
> Because of Obs.1 we can end up having both works enqueued.
>2. Multiple races with INIT_DELAYED_WORK()
> Since the works are not protected by anything between INIT_DELAYED_WORK()
> and calls to (en/de)queue it is possible for races between the following
> functions:
> (races are also possible between the calls to INIT_DELAYED_WORK()
>  and workqueue code)
> bonding_store_miimon() - bonding_store_arp_interval(), bond_close(),
>			  bond_open(), enqueued functions
> bonding_store_arp_interval() - bonding_store_miimon(), bond_close(),
>				bond_open(), enqueued functions
>3. By Obs.1 we need to change bond_cancel_all()
>
>Bugs 1 and 2 are fixed by moving all work initializations in bond_open
>which by Obs. 2 and Obs. 3 and the fact that we make sure that all works
>are cancelled in bond_close(), is guaranteed not to have any work
>enqueued.
>Also RTNL lock is now acquired in bonding_store_miimon/arp_interval so
>they can't race with bond_close and bond_open. The opposing work is
>cancelled only if the IFF_UP flag is set and it is cancelled
>unconditionally. The opposing work is already cancelled if the interface
>is down so no need to cancel it again. This way we don't need new
>synchronizations for the bonding workqueue. These bugs (and fixes) are
>tied together and belong in the same patch.
>Note: I have left 1 line intentionally over 80 characters (84) because I
>      didn't like how it looks broken down. If you'd prefer it otherwise,
>      then simply break it.
>
> v2: Make description text < 75 columns
>
>Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

>---
> drivers/net/bonding/bond_main.c  | 88 ++++++++++++----------------------------
> drivers/net/bonding/bond_sysfs.c | 34 +++++-----------
> 2 files changed, 36 insertions(+), 86 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 5f5b69f..1445c7d 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -3459,6 +3459,28 @@ static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
>
> /*-------------------------- Device entry points ----------------------------*/
>
>+static void bond_work_init_all(struct bonding *bond)
>+{
>+	INIT_DELAYED_WORK(&bond->mcast_work,
>+			  bond_resend_igmp_join_requests_delayed);
>+	INIT_DELAYED_WORK(&bond->alb_work, bond_alb_monitor);
>+	INIT_DELAYED_WORK(&bond->mii_work, bond_mii_monitor);
>+	if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
>+		INIT_DELAYED_WORK(&bond->arp_work, bond_activebackup_arp_mon);
>+	else
>+		INIT_DELAYED_WORK(&bond->arp_work, bond_loadbalance_arp_mon);
>+	INIT_DELAYED_WORK(&bond->ad_work, bond_3ad_state_machine_handler);
>+}
>+
>+static void bond_work_cancel_all(struct bonding *bond)
>+{
>+	cancel_delayed_work_sync(&bond->mii_work);
>+	cancel_delayed_work_sync(&bond->arp_work);
>+	cancel_delayed_work_sync(&bond->alb_work);
>+	cancel_delayed_work_sync(&bond->ad_work);
>+	cancel_delayed_work_sync(&bond->mcast_work);
>+}
>+
> static int bond_open(struct net_device *bond_dev)
> {
> 	struct bonding *bond = netdev_priv(bond_dev);
>@@ -3481,41 +3503,27 @@ static int bond_open(struct net_device *bond_dev)
> 	}
> 	read_unlock(&bond->lock);
>
>-	INIT_DELAYED_WORK(&bond->mcast_work, bond_resend_igmp_join_requests_delayed);
>+	bond_work_init_all(bond);
>
> 	if (bond_is_lb(bond)) {
> 		/* bond_alb_initialize must be called before the timer
> 		 * is started.
> 		 */
>-		if (bond_alb_initialize(bond, (bond->params.mode == BOND_MODE_ALB))) {
>-			/* something went wrong - fail the open operation */
>+		if (bond_alb_initialize(bond, (bond->params.mode == BOND_MODE_ALB)))
> 			return -ENOMEM;
>-		}
>-
>-		INIT_DELAYED_WORK(&bond->alb_work, bond_alb_monitor);
> 		queue_delayed_work(bond->wq, &bond->alb_work, 0);
> 	}
>
>-	if (bond->params.miimon) {  /* link check interval, in milliseconds. */
>-		INIT_DELAYED_WORK(&bond->mii_work, bond_mii_monitor);
>+	if (bond->params.miimon)  /* link check interval, in milliseconds. */
> 		queue_delayed_work(bond->wq, &bond->mii_work, 0);
>-	}
>
> 	if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
>-		if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
>-			INIT_DELAYED_WORK(&bond->arp_work,
>-					  bond_activebackup_arp_mon);
>-		else
>-			INIT_DELAYED_WORK(&bond->arp_work,
>-					  bond_loadbalance_arp_mon);
>-
> 		queue_delayed_work(bond->wq, &bond->arp_work, 0);
> 		if (bond->params.arp_validate)
> 			bond->recv_probe = bond_arp_rcv;
> 	}
>
> 	if (bond->params.mode == BOND_MODE_8023AD) {
>-		INIT_DELAYED_WORK(&bond->ad_work, bond_3ad_state_machine_handler);
> 		queue_delayed_work(bond->wq, &bond->ad_work, 0);
> 		/* register to receive LACPDUs */
> 		bond->recv_probe = bond_3ad_lacpdu_recv;
>@@ -3530,34 +3538,10 @@ static int bond_close(struct net_device *bond_dev)
> 	struct bonding *bond = netdev_priv(bond_dev);
>
> 	write_lock_bh(&bond->lock);
>-
> 	bond->send_peer_notif = 0;
>-
> 	write_unlock_bh(&bond->lock);
>
>-	if (bond->params.miimon) {  /* link check interval, in milliseconds. */
>-		cancel_delayed_work_sync(&bond->mii_work);
>-	}
>-
>-	if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
>-		cancel_delayed_work_sync(&bond->arp_work);
>-	}
>-
>-	switch (bond->params.mode) {
>-	case BOND_MODE_8023AD:
>-		cancel_delayed_work_sync(&bond->ad_work);
>-		break;
>-	case BOND_MODE_TLB:
>-	case BOND_MODE_ALB:
>-		cancel_delayed_work_sync(&bond->alb_work);
>-		break;
>-	default:
>-		break;
>-	}
>-
>-	if (delayed_work_pending(&bond->mcast_work))
>-		cancel_delayed_work_sync(&bond->mcast_work);
>-
>+	bond_work_cancel_all(bond);
> 	if (bond_is_lb(bond)) {
> 		/* Must be called only after all
> 		 * slaves have been released
>@@ -4436,26 +4420,6 @@ static void bond_setup(struct net_device *bond_dev)
> 	bond_dev->features |= bond_dev->hw_features;
> }
>
>-static void bond_work_cancel_all(struct bonding *bond)
>-{
>-	if (bond->params.miimon && delayed_work_pending(&bond->mii_work))
>-		cancel_delayed_work_sync(&bond->mii_work);
>-
>-	if (bond->params.arp_interval && delayed_work_pending(&bond->arp_work))
>-		cancel_delayed_work_sync(&bond->arp_work);
>-
>-	if (bond->params.mode == BOND_MODE_ALB &&
>-	    delayed_work_pending(&bond->alb_work))
>-		cancel_delayed_work_sync(&bond->alb_work);
>-
>-	if (bond->params.mode == BOND_MODE_8023AD &&
>-	    delayed_work_pending(&bond->ad_work))
>-		cancel_delayed_work_sync(&bond->ad_work);
>-
>-	if (delayed_work_pending(&bond->mcast_work))
>-		cancel_delayed_work_sync(&bond->mcast_work);
>-}
>-
> /*
> * Destroy a bonding device.
> * Must be under rtnl_lock when this function is called.
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index ef8d2a0..3327a07 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -513,6 +513,8 @@ static ssize_t bonding_store_arp_interval(struct device *d,
> 	int new_value, ret = count;
> 	struct bonding *bond = to_bond(d);
>
>+	if (!rtnl_trylock())
>+		return restart_syscall();
> 	if (sscanf(buf, "%d", &new_value) != 1) {
> 		pr_err("%s: no arp_interval value specified.\n",
> 		       bond->dev->name);
>@@ -539,10 +541,6 @@ static ssize_t bonding_store_arp_interval(struct device *d,
> 		pr_info("%s: ARP monitoring cannot be used with MII monitoring. %s Disabling MII monitoring.\n",
> 			bond->dev->name, bond->dev->name);
> 		bond->params.miimon = 0;
>-		if (delayed_work_pending(&bond->mii_work)) {
>-			cancel_delayed_work(&bond->mii_work);
>-			flush_workqueue(bond->wq);
>-		}
> 	}
> 	if (!bond->params.arp_targets[0]) {
> 		pr_info("%s: ARP monitoring has been set up, but no ARP targets have been specified.\n",
>@@ -554,19 +552,12 @@ static ssize_t bonding_store_arp_interval(struct device *d,
> 		 * timer will get fired off when the open function
> 		 * is called.
> 		 */
>-		if (!delayed_work_pending(&bond->arp_work)) {
>-			if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
>-				INIT_DELAYED_WORK(&bond->arp_work,
>-						  bond_activebackup_arp_mon);
>-			else
>-				INIT_DELAYED_WORK(&bond->arp_work,
>-						  bond_loadbalance_arp_mon);
>-
>-			queue_delayed_work(bond->wq, &bond->arp_work, 0);
>-		}
>+		cancel_delayed_work_sync(&bond->mii_work);
>+		queue_delayed_work(bond->wq, &bond->arp_work, 0);
> 	}
>
> out:
>+	rtnl_unlock();
> 	return ret;
> }
> static DEVICE_ATTR(arp_interval, S_IRUGO | S_IWUSR,
>@@ -962,6 +953,8 @@ static ssize_t bonding_store_miimon(struct device *d,
> 	int new_value, ret = count;
> 	struct bonding *bond = to_bond(d);
>
>+	if (!rtnl_trylock())
>+		return restart_syscall();
> 	if (sscanf(buf, "%d", &new_value) != 1) {
> 		pr_err("%s: no miimon value specified.\n",
> 		       bond->dev->name);
>@@ -993,10 +986,6 @@ static ssize_t bonding_store_miimon(struct device *d,
> 				bond->params.arp_validate =
> 					BOND_ARP_VALIDATE_NONE;
> 			}
>-			if (delayed_work_pending(&bond->arp_work)) {
>-				cancel_delayed_work(&bond->arp_work);
>-				flush_workqueue(bond->wq);
>-			}
> 		}
>
> 		if (bond->dev->flags & IFF_UP) {
>@@ -1005,15 +994,12 @@ static ssize_t bonding_store_miimon(struct device *d,
> 			 * timer will get fired off when the open function
> 			 * is called.
> 			 */
>-			if (!delayed_work_pending(&bond->mii_work)) {
>-				INIT_DELAYED_WORK(&bond->mii_work,
>-						  bond_mii_monitor);
>-				queue_delayed_work(bond->wq,
>-						   &bond->mii_work, 0);
>-			}
>+			cancel_delayed_work_sync(&bond->arp_work);
>+			queue_delayed_work(bond->wq, &bond->mii_work, 0);
> 		}
> 	}
> out:
>+	rtnl_unlock();
> 	return ret;
> }
> static DEVICE_ATTR(miimon, S_IRUGO | S_IWUSR,
>-- 
>1.7.11.7
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [PATCH v2] bonding: fix race condition in bonding_store_slaves_active
From: Jay Vosburgh @ 2012-11-29 17:37 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, andy, davem
In-Reply-To: <1354189079-15754-1-git-send-email-nikolay@redhat.com>

Nikolay Aleksandrov <nikolay@redhat.com> wrote:

> Race between bonding_store_slaves_active() and slave manipulation 
> functions. The bond_for_each_slave use in bonding_store_slaves_active()
> is not protected by any synchronization mechanism.
> NULL pointer dereference is easy to reach.
> Fixed by acquiring the bond->lock for the slave walk.
>
> v2: Make description text < 75 columns
>
>Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

>---
> drivers/net/bonding/bond_sysfs.c | 2 ++
> 1 file changed, 2 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index ef8d2a0..ba4f95b 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -1582,6 +1582,7 @@ static ssize_t bonding_store_slaves_active(struct device *d,
> 		goto out;
> 	}
>
>+	read_lock(&bond->lock);
> 	bond_for_each_slave(bond, slave, i) {
> 		if (!bond_is_active_slave(slave)) {
> 			if (new_value)
>@@ -1590,6 +1591,7 @@ static ssize_t bonding_store_slaves_active(struct device *d,
> 				slave->inactive = 1;
> 		}
> 	}
>+	read_unlock(&bond->lock);
> out:
> 	return ret;
> }
>-- 
>1.7.11.7
>

^ permalink raw reply

* Re: [PATCH v3] bonding: make arp_ip_target parameter checks consistent with sysfs
From: Jay Vosburgh @ 2012-11-29 17:39 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, andy, davem
In-Reply-To: <1354188846-15577-1-git-send-email-nikolay@redhat.com>

Nikolay Aleksandrov <nikolay@redhat.com> wrote:

> The module can be loaded with arp_ip_target="255.255.255.255" which makes
> it impossible to remove as the function in sysfs checks for that value,
> so we make the parameter checks consistent with sysfs.
>
> v2: Fix formatting
> v3: Make description text < 75 columns
>
>Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

>---
> drivers/net/bonding/bond_main.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 5f5b69f..d29159a 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -4706,12 +4706,13 @@ static int bond_check_params(struct bond_params *params)
> 	     arp_ip_count++) {
> 		/* not complete check, but should be good enough to
> 		   catch mistakes */
>-		if (!isdigit(arp_ip_target[arp_ip_count][0])) {
>+		__be32 ip = in_aton(arp_ip_target[arp_ip_count]);
>+		if (!isdigit(arp_ip_target[arp_ip_count][0]) ||
>+		    ip == 0 || ip == htonl(INADDR_BROADCAST)) {
> 			pr_warning("Warning: bad arp_ip_target module parameter (%s), ARP monitoring will not be performed\n",
> 				   arp_ip_target[arp_ip_count]);
> 			arp_interval = 0;
> 		} else {
>-			__be32 ip = in_aton(arp_ip_target[arp_ip_count]);
> 			arp_target[arp_ip_count] = ip;
> 		}
> 	}
>-- 
>1.7.11.7
>

^ permalink raw reply

* Re: [net-next PATCH V2 3/9] net: frag, move LRU list maintenance outside of rwlock
From: Eric Dumazet @ 2012-11-29 17:43 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David S. Miller, Florian Westphal, netdev, Pablo Neira Ayuso,
	Thomas Graf, Cong Wang, Patrick McHardy, Paul E. McKenney,
	Herbert Xu
In-Reply-To: <20121129161137.17754.48002.stgit@dragon>

On Thu, 2012-11-29 at 17:12 +0100, Jesper Dangaard Brouer wrote:
> Updating the fragmentation queues LRU (Least-Recently-Used) list,
> required taking the hash writer lock.  However, the LRU list isn't
> tied to the hash at all, so we can use a separate lock for it.
> 
> This change, in it self, does not improve performance significantly.
> But its part of making the fragmentation code scale.


I would just remove the LRU completely.

All this stuff is 20 years old, we dont need it anymore now we can use
more than 64KB of memory. The design was OK on !SMP

Here is what I would suggest :

Use a schem with a hash table of 256 (or 1024) slots.

Each slot/bucket has : 
  - Its own spinlock.
  - List of items
  - A limit of 5 (or so) elems in the list.

No more LRU, no more rehash (thanks to jhash and the random seed at boot
or first frag created), no more reader-writer lock.

Use a percpu_counter to implement ipfrag_low_thresh/ipfrag_high_thresh

^ permalink raw reply

* Re: [net-next PATCH V2 1/9] net: frag evictor, avoid killing warm frag queues
From: David Miller @ 2012-11-29 17:44 UTC (permalink / raw)
  To: brouer
  Cc: eric.dumazet, fw, netdev, pablo, tgraf, amwang, kaber, paulmck,
	herbert
In-Reply-To: <20121129161052.17754.85017.stgit@dragon>

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Thu, 29 Nov 2012 17:11:09 +0100

> The fragmentation evictor system have a very unfortunate eviction
> system for killing fragment, when the system is put under pressure.
> If packets are coming in too fast, the evictor code kills "warm"
> fragments too quickly.  Resulting in a massive performance drop,
> because we drop frag lists where we have already queue up a lot of
> fragments/work, which gets killed before they have a chance to
> complete.

I think this is a trade-off where the decision is somewhat
arbitrary.

If you kill warm entries, the sending of all of the fragments is
wasted.  If you retain warm entries and drop incoming new fragments,
well then the sending of all of those newer fragments is wasted too.

The only way I could see this making sense is if some "probability
of fulfillment" was taken into account.  For example, if you have
more than half of the fragments already, then yes it may be
advisable to retain the warm entry.

Otherwise, as I said, the decision seems arbitrary.

Let's take a step back and think about why this is happening at all.

I wonder how reasonable the high and low thresholds really are.  Even
once you move them to per-cpu, I think the limits are far too small.

I'm under the impression that it's common for skb->truesize for 1500
MTU frames to be something rounded up to the next power of 2, so
2048 bytes, or something like that.  Then add in the sk_buff control
overhead, as well as the inet_frag head.

So a 64K fragmented frame probably consumes close to 100K.

So once we have three 64K frames in flight, we're already over the
high threshold and will start dropping things.

That's beyond stupid.

^ permalink raw reply

* Re: [net-next PATCH V2 3/9] net: frag, move LRU list maintenance outside of rwlock
From: David Miller @ 2012-11-29 17:48 UTC (permalink / raw)
  To: eric.dumazet
  Cc: brouer, fw, netdev, pablo, tgraf, amwang, kaber, paulmck, herbert
In-Reply-To: <1354211004.3299.12.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 29 Nov 2012 09:43:24 -0800

> Use a schem with a hash table of 256 (or 1024) slots.
> 
> Each slot/bucket has : 
>   - Its own spinlock.
>   - List of items
>   - A limit of 5 (or so) elems in the list.
> 
> No more LRU, no more rehash (thanks to jhash and the random seed at boot
> or first frag created), no more reader-writer lock.
> 
> Use a percpu_counter to implement ipfrag_low_thresh/ipfrag_high_thresh

If we limit the chain sizes to 5 elements, there is no need for
any thresholds at all.

^ permalink raw reply

* Re: [net-next PATCH V2 3/9] net: frag, move LRU list maintenance outside of rwlock
From: Eric Dumazet @ 2012-11-29 17:54 UTC (permalink / raw)
  To: David Miller
  Cc: brouer, fw, netdev, pablo, tgraf, amwang, kaber, paulmck, herbert
In-Reply-To: <20121129.124839.963269461515687321.davem@davemloft.net>

On Thu, 2012-11-29 at 12:48 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 29 Nov 2012 09:43:24 -0800
> 
> > Use a schem with a hash table of 256 (or 1024) slots.
> > 
> > Each slot/bucket has : 
> >   - Its own spinlock.
> >   - List of items
> >   - A limit of 5 (or so) elems in the list.
> > 
> > No more LRU, no more rehash (thanks to jhash and the random seed at boot
> > or first frag created), no more reader-writer lock.
> > 
> > Use a percpu_counter to implement ipfrag_low_thresh/ipfrag_high_thresh
> 
> If we limit the chain sizes to 5 elements, there is no need for
> any thresholds at all.

One element can hold about 100KB.

I guess some systems could have some worries if we consume 1024 * 5 *
100 KB

So lets call the threshold a limit ;)

I agree the ipfrag_low_thresh should disappear :

One we hit the ipfrag_high_thresh (under softirq), we really dont want
to scan table to perform gc, as it might take more than 10 ms

^ permalink raw reply

* Re: [net-next PATCH V2 3/9] net: frag, move LRU list maintenance outside of rwlock
From: David Miller @ 2012-11-29 18:05 UTC (permalink / raw)
  To: eric.dumazet
  Cc: brouer, fw, netdev, pablo, tgraf, amwang, kaber, paulmck, herbert
In-Reply-To: <1354211659.3299.15.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 29 Nov 2012 09:54:19 -0800

> One element can hold about 100KB.
> 
> I guess some systems could have some worries if we consume 1024 * 5 *
> 100 KB

That's true.

Replace 1024 in your formula with X and the limit is therefore
controlled by X.

So it seems the high_thresh can be replaced with an appropriate
determination of X to size the hash.

If X is 256, that limits us to ~130MB per cpu.

We could also tweak the chain limit, call it Y, as well.

^ permalink raw reply

* Re: [PATCH v2 3/3] pppoatm: protect against freeing of vcc
From: David Woodhouse @ 2012-11-29 18:11 UTC (permalink / raw)
  To: chas williams - CONTRACTOR
  Cc: Krzysztof Mazur, David Laight, davem, netdev, linux-kernel,
	nathan
In-Reply-To: <20121129121701.21ab7c0b@thirdoffive.cmf.nrl.navy.mil>

[-- Attachment #1: Type: text/plain, Size: 2181 bytes --]

On Thu, 2012-11-29 at 12:17 -0500, chas williams - CONTRACTOR wrote:
> most atm hardware that i am familiar with, wont deliver vpi/vci data
> unless you are actually trying to receive it.  however, this hardware
> is generally doing its reassembly in hardware and delivering aal5
> pdu's and needs to manage its memory resources carefully.  you might be
> trying to reassemble 1000 pdu's from different vpi/vci's.

In almost *all* cases, there is only one VCC. So much so, in fact, that
I only just realised its firmware seems to crash if you send it a
PKT_POPEN for a new VPI/VCI pair while it's already got one open. But
yes, I think it does actually work in the same way.

We do see the 'packet received for unknown VCC' complaint, after we
reboot the host without resetting the card. And as shown in the commit I
just referenced, we rely on the !ATM_VF_READY check in order to prevent
a use-after-free when the tasklet is sending packets up a VCC that's
just been closed.

> > So bearing that in mind: from the moment ATM_VF_READY gets cleared, as
> > far as the ATM core is concerned, we will no longer receive packets on
> > the given VCC. If we receive any, we'll just complain about receiving
> > packets for an unknown VCI/VPI.
> > 
> > For the TX case ... yes, we need to be sure we aren't continuing to send
> > packets after our close() routine completes. We *used* to, but the
> > resulting ->pop() calls were causing problems, and that's why we're
> > looking at this code path closer. The currently proposed patches (except
> > one suggestion from Krzyztof that we both shouted down) would fix that.
> 
> again part of this is poor synchronization.  the detach protocol (i.e.
> push of a NULL skb) should be flushing any pending transmits and
> shutting down whatever in the protocol is doing any sending and
> receiving.  however, i release this might be difficult to do since the
> detach protocol is invoked in such a strange way.

You mean that (e.g.) pppoatm_push(vcc, NULL) should be waiting for any
pending TX skb (that has already been passed off to the driver) to
*complete*? How would it even do that?

-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox