Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v3 1/3] phylib: Convert MDIO and PHY Lib drivers to support 10G
From: David Daney @ 2011-10-13 16:00 UTC (permalink / raw)
  To: Andy Fleming; +Cc: davem, netdev
In-Reply-To: <1318516660-25452-2-git-send-email-afleming@freescale.com>

On 10/13/2011 07:37 AM, Andy Fleming wrote:
> 10G MDIO is a totally different protocol (clause 45 of 802.3).
> Supporting this new protocol requires a couple of changes:
>
> * Add a new parameter to the mdiobus_read functions to specify the
>    "device address" inside the PHY.
> * Add a phy45_read/write function which takes advantage of that
>    new parameter
> * Convert all of the existing drivers to use the new format
>
> I created a new clause-45-specific read/write functions because:
> 1) phy_read and phy_write are highly overloaded functions, and
>     finding every instance which is actually the PHY Lib version
>     was quite difficult
> 2) Most code which invokes phy_read/phy_write inside PHY Lib is
>     Clause-22-specific. None of the phy_read/phy_write invocations
>     were useable on 10G PHYs
>

I think converting all these phy_read/phy_write to take an extra
parameter is a mistake.  99% of the users have no need for the "device
address".  Also you are still passing the protocol mode as a high
order bit in the register address, so that part is still quite ugly.

The existing infrastructure where we pass the "device address" in bits
16..20 of the register number is much less disruptive.

If you don't like it, an easy and much less intrusive approach might
be a simple (untested) wrapper:

static inline int phy45_read(struct phy_device *phydev,
                              int devad, u16 regnum)
{
	u32 c45_reg = MII_ADDR_C45 | ((devad & 0x1f) << 16) | regnum;
	return phy_read(phydev, c45_reg)
}

static inline int phy45_write(struct phy_device *phydev,
                               int devad, u16 regnum, u16 val)
{
	u32 c45_reg = MII_ADDR_C45 | ((devad & 0x1f) << 16) | regnum;
	return phy_write(phydev, c45_reg, val)
}


> Signed-off-by: Andy Fleming<afleming@freescale.com>
> ---
> v2: Convert newer buses, split out generic PHY support
> v3: Make patch series more coherent
>
>   Documentation/networking/phy.txt                  |   15 +++--
>   arch/powerpc/platforms/pasemi/gpio_mdio.c         |    6 +-
>   drivers/net/ethernet/adi/bfin_mac.c               |    7 +-
>   drivers/net/ethernet/aeroflex/greth.c             |    5 +-
>   drivers/net/ethernet/amd/au1000_eth.c             |    7 +-
>   drivers/net/ethernet/broadcom/bcm63xx_enet.c      |    4 +-
>   drivers/net/ethernet/broadcom/sb1250-mac.c        |    7 +-
>   drivers/net/ethernet/broadcom/tg3.c               |    5 +-
>   drivers/net/ethernet/cadence/macb.c               |    7 +-
>   drivers/net/ethernet/dnet.c                       |    7 +-
>   drivers/net/ethernet/ethoc.c                      |    5 +-
>   drivers/net/ethernet/faraday/ftgmac100.c          |    5 +-
>   drivers/net/ethernet/freescale/fec.c              |    7 +-
>   drivers/net/ethernet/freescale/fec_mpc52xx_phy.c  |    7 +-
>   drivers/net/ethernet/freescale/fs_enet/mii-fec.c  |    6 +-
>   drivers/net/ethernet/freescale/fsl_pq_mdio.c      |   13 ++--
>   drivers/net/ethernet/freescale/fsl_pq_mdio.h      |   11 ++-
>   drivers/net/ethernet/lantiq_etop.c                |    5 +-
>   drivers/net/ethernet/marvell/mv643xx_eth.c        |    5 +-
>   drivers/net/ethernet/marvell/pxa168_eth.c         |    7 +-
>   drivers/net/ethernet/rdc/r6040.c                  |    5 +-
>   drivers/net/ethernet/s6gmac.c                     |    5 +-
>   drivers/net/ethernet/smsc/smsc911x.c              |   22 ++++---
>   drivers/net/ethernet/smsc/smsc9420.c              |   10 ++-
>   drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c |    9 ++-
>   drivers/net/ethernet/ti/cpmac.c                   |    4 +-
>   drivers/net/ethernet/ti/davinci_mdio.c            |    5 +-
>   drivers/net/ethernet/toshiba/tc35815.c            |    5 +-
>   drivers/net/ethernet/xilinx/ll_temac_mdio.c       |    5 +-
>   drivers/net/ethernet/xilinx/xilinx_emaclite.c     |    9 ++-
>   drivers/net/ethernet/xscale/ixp4xx_eth.c          |    7 +-
>   drivers/net/phy/fixed.c                           |    5 +-
>   drivers/net/phy/icplus.c                          |   17 +++--
>   drivers/net/phy/mdio-bitbang.c                    |    5 +-
>   drivers/net/phy/mdio-octeon.c                     |    5 +-
>   drivers/net/phy/mdio_bus.c                        |    8 +-
>   drivers/net/phy/phy.c                             |    5 +-
>   drivers/net/phy/phy_device.c                      |   64 +++++++++++++------
>   include/linux/phy.h                               |   70 ++++++++++++++++++---
>   net/dsa/slave.c                                   |    5 +-
>   40 files changed, 270 insertions(+), 141 deletions(-)
>

^ permalink raw reply

* Re: [PATCH net-next] net: more accurate skb truesize
From: Ben Hutchings @ 2011-10-13 16:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Andi Kleen
In-Reply-To: <1318519581.2393.18.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Thu, 2011-10-13 at 17:26 +0200, Eric Dumazet wrote:
> skb truesize currently accounts for sk_buff struct and part of skb head.
> 
> Considering that skb_shared_info is larger than sk_buff, its time to
> take it into account for better memory accounting.
> 
> This patch introduces SKB_TRUESIZE(X) macro to centralize various
> assumptions into a single place.
> 
> At skb alloc phase, we put skb_shared_info struct at the exact end of
> skb head, to allow a better use of memory (lowering number of
> reallocations), since kmalloc() gives us power-of-two memory blocks.
[...]
> index 5b2c5f1..be66154 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -184,11 +184,15 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  		goto out;
>  	prefetchw(skb);
>  
> -	size = SKB_DATA_ALIGN(size);
> -	data = kmalloc_node_track_caller(size + sizeof(struct skb_shared_info),
> -			gfp_mask, node);
> +	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
[...]

If we want to put the data and skb_shared_info on separate cache-lines
then we should use:
	size = SKB_DATA_ALIGN(size) + sizeof(struct skb_shared_info);
(which is effectively what we're doing now).

If that's not important, and we just want to be sure that the allocation
occupies at least a whole cache line, then it should be:
	size = SKB_DATA_ALIGN(size + sizeof(struct skb_shared_info));

But I don't think it makes sense to use SKB_DATA_ALIGN(sizeof(struct
skb_shared_info)).

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH v3 1/3] phylib: Convert MDIO and PHY Lib drivers to support 10G
From: David Daney @ 2011-10-13 16:18 UTC (permalink / raw)
  To: Andy Fleming; +Cc: davem, netdev
In-Reply-To: <1318516660-25452-2-git-send-email-afleming@freescale.com>

On 10/13/2011 07:37 AM, Andy Fleming wrote:

>   /**
> + * is_10g_interface - Distinguishes 10G from 10/100/1000
> + * @interface: PHY interface type
> + *
> + * Returns true if the passed interface is capable of 10G,
> + * and therefore indicates the need for Clause-45-style
> + * MDIO transactions.
> + *
> + * For now, XGMII is the only 10G interface
> + */
> +static inline bool is_10g_interface(phy_interface_t interface)
> +{
> +	return interface == PHY_INTERFACE_MODE_XGMII;
> +}
> +

The packet interface to the PHY and the protocol used on its MDIO bus 
are two separate things.  This function conflates them.

Although it is not perfect, my alternate approach:

http://marc.info/?l=linux-netdev&m=131844284003871

Adds a flag bit to struct phy_device that indicates the MDIO bus 
protocol to be used.  Have you considered doing something like that instead?

David Daney

^ permalink raw reply

* Re: [PATCH net-next] net: more accurate skb truesize
From: Eric Dumazet @ 2011-10-13 16:24 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, Andi Kleen
In-Reply-To: <1318521901.2745.18.camel@bwh-desktop>

Le jeudi 13 octobre 2011 à 17:05 +0100, Ben Hutchings a écrit :
> On Thu, 2011-10-13 at 17:26 +0200, Eric Dumazet wrote:
> > skb truesize currently accounts for sk_buff struct and part of skb head.
> > 
> > Considering that skb_shared_info is larger than sk_buff, its time to
> > take it into account for better memory accounting.
> > 
> > This patch introduces SKB_TRUESIZE(X) macro to centralize various
> > assumptions into a single place.
> > 
> > At skb alloc phase, we put skb_shared_info struct at the exact end of
> > skb head, to allow a better use of memory (lowering number of
> > reallocations), since kmalloc() gives us power-of-two memory blocks.
> [...]
> > index 5b2c5f1..be66154 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -184,11 +184,15 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
> >  		goto out;
> >  	prefetchw(skb);
> >  
> > -	size = SKB_DATA_ALIGN(size);
> > -	data = kmalloc_node_track_caller(size + sizeof(struct skb_shared_info),
> > -			gfp_mask, node);
> > +	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> [...]
> 
> If we want to put the data and skb_shared_info on separate cache-lines
> then we should use:
> 	size = SKB_DATA_ALIGN(size) + sizeof(struct skb_shared_info);
> (which is effectively what we're doing now).
> 

Same behavior after my patch : skb_shared_info starts at a cache-line
boundary, like before, unless kmalloc() gives us unaligned memory (it
can in certain debugging situations)

So previous part (before skb_shared_info) will also be a multiple of
SMP_CACHE_BYTES because of kmalloc() behavior.

> If that's not important, and we just want to be sure that the allocation
> occupies at least a whole cache line, then it should be:
> 	size = SKB_DATA_ALIGN(size + sizeof(struct skb_shared_info));
> 
> But I don't think it makes sense to use SKB_DATA_ALIGN(sizeof(struct
> skb_shared_info)).

If you take a closer look, you'll see that my patch addresses your
concerns, but at minimal cpu cost.

kmalloc(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))

will give same result than :

kmalloc(SKB_DATA_ALIGN(size) + SKB_DATA_ALIGN(sizeof(struct
skb_shared_info)))

But my version is a bit faster (a single add of a compiler known
constant)

^ permalink raw reply

* Re: [PATCH 1/3] netdev/phy: Handle IEEE802.3 clause 45 Ethernet PHYs
From: Ben Hutchings @ 2011-10-13 16:27 UTC (permalink / raw)
  To: David Daney
  Cc: devicetree-discuss, grant.likely, linux-kernel, netdev, davem,
	afleming
In-Reply-To: <1318442783-29058-2-git-send-email-david.daney@cavium.com>

On Wed, 2011-10-12 at 11:06 -0700, David Daney wrote:
> The IEEE802.3 clause 45 MDIO bus protocol allows for directly
> addressing PHY registers using a 21 bit address, and is used by many
> 10G Ethernet PHYS.  Already existing is the ability of MDIO bus
> drivers to use clause 45, with the MII_ADDR_C45 flag.  Here we add
> some support in the PHY and device tree infrastructure to use these
> PHYs.
> 
> Normally the MII_ADDR_C45 flag is ORed with the register address to
> indicate a clause 45 transaction.  Here we also use this flag in the
> *device* address passed to get_phy_id() and get_phy_device() to
> indicate that probing should be done with clause 45 transactions.  If
> a PHY is successfully probed with MII_ADDR_C45, the new struct
> phy_device is_c45 flag is set for the PHY.

That deserves a comment next to the definition of the macro.

> Signed-off-by: David Daney <david.daney@cavium.com>
> ---
>  drivers/net/phy/phy_device.c |   25 ++++++++++++++++++++++---
>  include/linux/phy.h          |    3 +++
>  2 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 83a5a5a..7e4d61b 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -171,6 +171,8 @@ static struct phy_device* phy_device_create(struct mii_bus *bus,
>  
>  	dev->autoneg = AUTONEG_ENABLE;
>  
> +	dev->is_c45 = (addr & MII_ADDR_C45) != 0;
> +	addr &= ~MII_ADDR_C45;
>  	dev->addr = addr;
>  	dev->phy_id = phy_id;
>  	dev->bus = bus;
> @@ -205,15 +207,24 @@ static struct phy_device* phy_device_create(struct mii_bus *bus,
>   * @phy_id: where to store the ID retrieved.
>   *
>   * Description: Reads the ID registers of the PHY at @addr on the
> - *   @bus, stores it in @phy_id and returns zero on success.
> + *   @bus, stores it in @phy_id and returns zero on success.  If the
> + *   @addr has been ORed with MII_ADDR_C45, mdio clause 45 data
> + *   transfer is used to read ID from the PHY device, otherwise the
> + *   standard protocol (clause 22) is used.
>   */
>  int get_phy_id(struct mii_bus *bus, int addr, u32 *phy_id)
>  {
>  	int phy_reg;
> +	u32 c45_reg_base = 0;
>  
> +	if (addr & MII_ADDR_C45) {
> +		addr &= ~MII_ADDR_C45;
> +		/* Access the PHY's PHY XS registers with C45 mode. */
> +		c45_reg_base = MII_ADDR_C45 | 0x40000;
> +	}
[...]

I'm not sure it's a safe assumption that every PHY has a PHY XS block.
That said, I've not seen any that don't.  mdio45_probe() should work out
which blocks are there, if you can set up an mdio_if_info for it.

It would be nice if someone would try to improve integration between
mdio/mii and phylib rather than duplicating logic and interfaces.  I'm
afraid I'm not going to be spending time on MDIO, though, since we've
putting PHY drivers in firmware rather than on the host now.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH net-next] be2net: fix truesize errors
From: Eric Dumazet @ 2011-10-13 16:31 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Sathya Perla, Subbu Seetharaman, Ajit Khaparde

Fix skb truesize underestimations of this driver.

Each frag truesize is exactly rx_frag_size bytes. (2048 bytes per
default)

A driver should not use "sizeof(struct sk_buff)" at all.

Signed-off-by: Eric Dumazet <eric.dumazet>
---
 drivers/net/ethernet/emulex/benet/be_main.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 816ce56..679b804 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -1071,6 +1071,7 @@ static void skb_fill_rx_data(struct be_adapter *adapter, struct be_rx_obj *rxo,
 					page_info->page_offset + hdr_len;
 		skb_shinfo(skb)->frags[0].size = curr_frag_len - hdr_len;
 		skb->data_len = curr_frag_len - hdr_len;
+		skb->truesize += rx_frag_size;
 		skb->tail += hdr_len;
 	}
 	page_info->page = NULL;
@@ -1103,7 +1104,7 @@ static void skb_fill_rx_data(struct be_adapter *adapter, struct be_rx_obj *rxo,
 		skb_shinfo(skb)->frags[j].size += curr_frag_len;
 		skb->len += curr_frag_len;
 		skb->data_len += curr_frag_len;
-
+		skb->truesize += rx_frag_size;
 		remaining -= curr_frag_len;
 		index_inc(&rxcp->rxq_idx, rxq->len);
 		page_info->page = NULL;
@@ -1133,7 +1134,6 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 	else
 		skb_checksum_none_assert(skb);
 
-	skb->truesize = skb->len + sizeof(struct sk_buff);
 	skb->protocol = eth_type_trans(skb, netdev);
 	if (adapter->netdev->features & NETIF_F_RXHASH)
 		skb->rxhash = rxcp->rss_hash;
@@ -1181,7 +1181,7 @@ static void be_rx_compl_process_gro(struct be_adapter *adapter,
 			put_page(page_info->page);
 		}
 		skb_shinfo(skb)->frags[j].size += curr_frag_len;
-
+		skb->truesize += rx_frag_size;
 		remaining -= curr_frag_len;
 		index_inc(&rxcp->rxq_idx, rxq->len);
 		memset(page_info, 0, sizeof(*page_info));
@@ -1191,7 +1191,6 @@ static void be_rx_compl_process_gro(struct be_adapter *adapter,
 	skb_shinfo(skb)->nr_frags = j + 1;
 	skb->len = rxcp->pkt_size;
 	skb->data_len = rxcp->pkt_size;
-	skb->truesize += rxcp->pkt_size;
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
 	if (adapter->netdev->features & NETIF_F_RXHASH)
 		skb->rxhash = rxcp->rss_hash;

^ permalink raw reply related

* Re: [PATCH net-next] be2net: fix truesize errors
From: Eric Dumazet @ 2011-10-13 16:32 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Sathya Perla, Subbu Seetharaman, Ajit Khaparde
In-Reply-To: <1318523462.2393.33.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

Le jeudi 13 octobre 2011 à 18:31 +0200, Eric Dumazet a écrit :
> Fix skb truesize underestimations of this driver.
> 
> Each frag truesize is exactly rx_frag_size bytes. (2048 bytes per
> default)
> 
> A driver should not use "sizeof(struct sk_buff)" at all.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet>

Oh well, garbled Signed-off-by, sorry !

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

^ permalink raw reply

* Re: [PATCH net-next] net: more accurate skb truesize
From: Ben Hutchings @ 2011-10-13 16:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Andi Kleen
In-Reply-To: <1318523090.2393.28.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Thu, 2011-10-13 at 18:24 +0200, Eric Dumazet wrote:
> Le jeudi 13 octobre 2011 à 17:05 +0100, Ben Hutchings a écrit :
[...]
> > If that's not important, and we just want to be sure that the allocation
> > occupies at least a whole cache line, then it should be:
> > 	size = SKB_DATA_ALIGN(size + sizeof(struct skb_shared_info));
> > 
> > But I don't think it makes sense to use SKB_DATA_ALIGN(sizeof(struct
> > skb_shared_info)).
> 
> If you take a closer look, you'll see that my patch addresses your
> concerns, but at minimal cpu cost.
> 
> kmalloc(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> 
> will give same result than :
> 
> kmalloc(SKB_DATA_ALIGN(size) + SKB_DATA_ALIGN(sizeof(struct
> skb_shared_info)))
> 
> But my version is a bit faster (a single add of a compiler known
> constant)

Fair enough, but please add a comment explaining this.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* RE: [RFC PATCH V2 1/2] net/core/ethtool: New Commands to Configure IOV features
From: Rose, Gregory V @ 2011-10-13 17:04 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev@vger.kernel.org
In-Reply-To: <1318034250.2771.158.camel@bwh-desktop>

> -----Original Message-----
> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> Sent: Friday, October 07, 2011 5:38 PM
> To: Rose, Gregory V
> Cc: netdev@vger.kernel.org
> Subject: Re: [RFC PATCH V2 1/2] net/core/ethtool: New Commands to
> Configure IOV features
> 
> On Thu, 2011-09-22 at 14:35 -0700, Greg Rose wrote:
> > The only currently supported method of configuring the number of VFs
> > is through the max_vfs module parameter.  This method is inadequate to
> > support scenarios in which the user might wish to have varying numbers
> > of VFs per PF.  There is additional support for drivers that want to
> > partition some driver resources to additional net devices and for
> > configuring the number of Tx/Rx queue pairs per VF.
> >
> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > ---
> >
> >  include/linux/ethtool.h |   30 +++++++++++++++++++++++++++++-
> >  net/core/ethtool.c      |   38 ++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 67 insertions(+), 1 deletions(-)
> >
> > diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> > index 45f00b6..448730f 100644
> > --- a/include/linux/ethtool.h
> > +++ b/include/linux/ethtool.h
> > @@ -720,6 +720,27 @@ enum ethtool_sfeatures_retval_bits {
> >  #define ETHTOOL_F_WISH          (1 << ETHTOOL_F_WISH__BIT)
> >  #define ETHTOOL_F_COMPAT        (1 << ETHTOOL_F_COMPAT__BIT)
> >
> > +enum ethtool_iov_modes {
> > +	ETHTOOL_IOV_MODE_NONE,
> > +	ETHTOOL_IOV_MODE_SRIOV,
> > +	ETHTOOL_IOV_MODE_NETDEVS,
> > +};
> > +
> > +#define	ETHTOOL_IOV_CMD_CONFIGURE_SRIOV		1
> > +#define	ETHTOOL_IOV_CMD_CONFIGURE_NETDEVS	2
> > +#define	ETHTOOL_IOV_CMD_CONFIGURE_VF_QUEUES	3
> > +
> > +struct ethtool_iov_set_cmd {
> > +	__u32 cmd;
> > +	__u32 set_cmd;
> > +	__u32 cmd_param;
> > +};
> 
> Why introduce sub-commands?

For maximum flexibility.  It's difficult (to me anyway) to foresee all the things we might want to configure for I/O virtualization features.

> 
> > +struct ethtool_iov_get_cmd {
> > +	__u32 cmd;
> > +	__u32 mode;
> > +};
> 
> Why is it only possible to get the mode?

To my way of thinking it was the only interesting thing to query.  I suppose I could also return the number of VFs or net devices along with the mode.

> 
> It really seems to make more sense to group these three parameters
> together and get/set them all at once.  But if you can justify making
> them separate then struct ethtool_value is the type to use.

I can do that.

> 
> In any case, any new types or macros need comments.

OK.

> 
> [...]
> > +static int ethtool_iov_get_command(struct net_device *dev,
> > +				  void __user *useraddr)
> > +{
> > +	int ret;
> > +	struct ethtool_iov_get_cmd iov_cmd;
> > +
> > +	if (!dev->ethtool_ops->iov_get_cmd)
> > +		return -EOPNOTSUPP;
> > +	if (copy_from_user(&iov_cmd, useraddr, sizeof(iov_cmd)))
> > +		return -EFAULT;
> > +
> > +	ret = dev->ethtool_ops->iov_get_cmd(dev, &iov_cmd);
> > +	if (!ret)
> > +		ret = copy_to_user(useraddr, &iov_cmd, sizeof(iov_cmd));
> 
> 	if (!ret && copy_to_user(...))
> 		ret = -EFAULT;
> 
> > +	return ret;
> > +}
> [...]
> 
> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* RE: [RFC PATCH V2] ethtool:  Add command to configure IOV features
From: Rose, Gregory V @ 2011-10-13 17:07 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev@vger.kernel.org
In-Reply-To: <1318034485.2771.162.camel@bwh-desktop>

> -----Original Message-----
> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> Sent: Friday, October 07, 2011 5:41 PM
> To: Rose, Gregory V
> Cc: netdev@vger.kernel.org
> Subject: Re: [RFC PATCH V2] ethtool: Add command to configure IOV features
> 
> On Thu, 2011-09-22 at 14:35 -0700, Greg Rose wrote:
> > New command to allow configuration of IOV features such as the number of
> > Virtual Functions to allocate for a given Physical Function interface,
> > the number of semi-independent net devices to allocate from partitioned
> > I/O resources in the PF and to set the number of queues per VF.
> [...]
> > @@ -266,6 +270,8 @@ static struct option {
> >      { "-W", "--set-dump", MODE_SET_DUMP,
> >  		"Set dump flag of the device",
> >  		"		N\n"},
> > +    { "-v", "--get-iov", MODE_GET_IOV, "Get IOV parameters", "\n" },
> > +    { "-V", "--set_iov", MODE_SET_IOV, "Set IOV parameters", "[ N ]\n"
> },
> 
> Doesn't match the implementation.

Whoops... OK.

> 
> [...]
> > +static int do_get_iov(int fd, struct ifreq *ifr)
> > +{
> > +	int err;
> > +	struct ethtool_iov_get_cmd iov_cmd;
> > +
> > +	iov_cmd.cmd = ETHTOOL_IOV_GET_CMD;
> > +	ifr->ifr_data = (caddr_t)&iov_cmd;
> > +	err = send_ioctl(fd, ifr);
> > +	if (err < 0) {
> > +		perror("Can not get current IOV mode\n");
> > +		return 1;
> > +	}
> > +
> > +	memcpy(&iov_cmd, ifr->ifr_data, sizeof(iov_cmd));
> 
> But ifr->ifr_data == &iov_cmd.  So this is both pointless and dangerous
> (as memcpy() doesn't handle overlapping source and destination).

Huh... what was I thinking?  Yeah, I'll fix that.

> 
> [...]
> > +static int do_set_iov(int fd, struct ifreq *ifr)
> > +{
> > +	int err;
> > +	struct ethtool_iov_set_cmd iov_cmd;
> > +
> > +	if (iov_changed) {
> > +		iov_cmd.cmd = ETHTOOL_IOV_SET_CMD;
> > +		if (iov_numvfs_wanted >= 0) {
> > +			iov_cmd.set_cmd = ETHTOOL_IOV_CMD_CONFIGURE_SRIOV;
> > +			iov_cmd.cmd_param = iov_numvfs_wanted;
> > +		} else if (iov_numnetdevs_wanted >= 0) {
> > +			iov_cmd.set_cmd = ETHTOOL_IOV_CMD_CONFIGURE_NETDEVS;
> > +			iov_cmd.cmd_param = iov_numnetdevs_wanted;
> > +		} else if (iov_numvqueues_wanted >= 0) {
> > +			iov_cmd.set_cmd = ETHTOOL_IOV_CMD_CONFIGURE_VF_QUEUES;
> > +			iov_cmd.cmd_param = iov_numvqueues_wanted;
> > +		} else {
> > +			return -EINVAL;
> > +		}
> [...]
> 
> So what if the user specifies multiple keywords?

I hadn't contemplated that.

> 
> Also missing an update to the manual page.

When I get through the RFC process and we've settled on something acceptable to the community I'll update the man page.

Thanks for the review Ben.

- Greg


^ permalink raw reply

* Re: [PATCH net-next] net: more accurate skb truesize
From: Eric Dumazet @ 2011-10-13 17:11 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, Andi Kleen
In-Reply-To: <1318523564.2745.32.camel@bwh-desktop>

Le jeudi 13 octobre 2011 à 17:32 +0100, Ben Hutchings a écrit :

> Fair enough, but please add a comment explaining this.
> 


What about :

	/* We do our best to align skb_shared_info on a separate cache
	 * line. It usually works because kmalloc(X > CACHE_BYTES) gives
	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
	 */
	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

^ permalink raw reply

* Re: [PATCH net-next] net: more accurate skb truesize
From: Ben Hutchings @ 2011-10-13 17:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Andi Kleen
In-Reply-To: <1318525877.2393.41.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Thu, 2011-10-13 at 19:11 +0200, Eric Dumazet wrote:
> Le jeudi 13 octobre 2011 à 17:32 +0100, Ben Hutchings a écrit :
> 
> > Fair enough, but please add a comment explaining this.
> > 
> 
> 
> What about :
> 
> 	/* We do our best to align skb_shared_info on a separate cache
> 	 * line. It usually works because kmalloc(X > CACHE_BYTES) gives
> 	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
> 	 */
> 	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

Yes, that seems like a good explanation.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH net-next] bonding: fix wrong port enabling in 802.3ad
From: Flavio Leitner @ 2011-10-13 17:21 UTC (permalink / raw)
  To: netdev; +Cc: Jay Vosburgh, Andy Gospodarek, Flavio Leitner

The port shouldn't be enabled unless its current MUX
state is DISTRIBUTING which is correctly handled by
ad_mux_machine(), otherwise the packet sent can be
lost because the other end may not be ready.

The issue happens on every port initialization, but
as the ports are expected to move quickly to DISTRIBUTING,
it doesn't cause much problem.  However, it does cause
constant packet loss if the other peer has the port
configured to stay in STANDBY (i.e. SYNC set to OFF).

Signed-off-by: Flavio Leitner <fbl@redhat.com>
---

The comments there suggests it was a workaround for losses
of link events, but I couldn't track the changelog as it
seems to be pretty old.  Thus, as all the link notification
stuff has been improved a lot, maybe this is not an issue
anymore.  At least, I didn't find any problem while
unplugging/plugging cables here.

 drivers/net/bonding/bond_3ad.c |    7 -------
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 47b928e..b33c099 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1135,13 +1135,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
 			__record_pdu(lacpdu, port);
 			port->sm_rx_timer_counter = __ad_timer_to_ticks(AD_CURRENT_WHILE_TIMER, (u16)(port->actor_oper_port_state & AD_STATE_LACP_TIMEOUT));
 			port->actor_oper_port_state &= ~AD_STATE_EXPIRED;
-			// verify that if the aggregator is enabled, the port is enabled too.
-			//(because if the link goes down for a short time, the 802.3ad will not
-			// catch it, and the port will continue to be disabled)
-			if (port->aggregator
-			    && port->aggregator->is_active
-			    && !__port_is_enabled(port))
-				__enable_port(port);
 			break;
 		default:    //to silence the compiler
 			break;
-- 
1.7.6

^ permalink raw reply related

* Re: [PATCH 0/5] Better namespace handling for /sys/class/net/bonding_masters
From: Greg KH @ 2011-10-13 17:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Miller, linux-kernel, netdev, Tejun Heo, Jay Vosburgh,
	Andy Gospodarek
In-Reply-To: <m1hb3dbae5.fsf@fess.ebiederm.org>

On Thu, Oct 13, 2011 at 12:47:46AM -0700, Eric W. Biederman wrote:
> 
> When I was looking at another sysfs issue that Al pointed out (since
> fixed) I realized that I had implemented a trivial code size but overly
> clever way to handle /sys/class/net/bonding_masters.
> 
> This patchset removes the support for untagged entries in tagged
> directories (that is currently used to support bonding_masters)
> and replaces it with support for tagged sysfs attributes.
> 
> In the process this fixes a small misfeature in how bonding_masters
> derives the network namespace we are dealing with.  This change
> allows bonding_masters to derive the network namespace from the
> copy of bonding_masters we open instead of magically from current.
> 
> The final patch of this patchset adds sanity checks to sysfs.  To
> ensure that we don't accidentally mishandle tagged sysfs entities.
> 
> I have tested this code against 3.1-rc9 on my laptop with a mostly yes
> config and I am not seeing any problems.  The loud screaming warnings I
> have added in the last patch should catch any corner cases in how people
> use sysfs that I might have overlooked.
> 
> Greg, Dave I'm don't know whose tree to merge this through as this code
> is equally device-core and networking.  I am hoping that we can get this
> improvement merged for 3.2.

These all look fine to me, and I have no problem with them going through
the network tree as that will most likely be the easiest way due to any
potential merge issues.

David, feel free to add an:
	Acked-by: Greg Kroah-Hartman <gregkh@suse.de>

to these patches and take them through your tree if that is ok with you.
If not, I can also always take them through mine, it's your choice.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH net-next] net: more accurate skb truesize
From: Eric Dumazet @ 2011-10-13 17:28 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, Andi Kleen
In-Reply-To: <1318525990.2745.33.camel@bwh-desktop>

Le jeudi 13 octobre 2011 à 18:13 +0100, Ben Hutchings a écrit :
> On Thu, 2011-10-13 at 19:11 +0200, Eric Dumazet wrote:
> > Le jeudi 13 octobre 2011 à 17:32 +0100, Ben Hutchings a écrit :
> > 
> > > Fair enough, but please add a comment explaining this.
> > > 
> > 
> > 
> > What about :
> > 
> > 	/* We do our best to align skb_shared_info on a separate cache
> > 	 * line. It usually works because kmalloc(X > CACHE_BYTES) gives
> > 	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
> > 	 */
> > 	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> 
> Yes, that seems like a good explanation.

Thanks !

Here is the second version of the patch, including this comment.

[PATCH V2 net-next] net: more accurate skb truesize

skb truesize currently accounts for sk_buff struct and part of skb head.
kmalloc() roundings are also ignored.

Considering that skb_shared_info is larger than sk_buff, its time to
take it into account for better memory accounting.

This patch introduces SKB_TRUESIZE(X) macro to centralize various
assumptions into a single place.

At skb alloc phase, we put skb_shared_info struct at the exact end of
skb head, to allow a better use of memory (lowering number of
reallocations), since kmalloc() gives us power-of-two memory blocks.

Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are
aligned to cache lines, as before.

Note: This patch might trigger performance regressions because of
misconfigured protocol stacks, hitting per socket or global memory
limits that were previously not reached. But its a necessary step for a
more accurate memory accounting.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andi Kleen <ak@linux.intel.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
---
 include/linux/skbuff.h |    5 +++++
 net/core/skbuff.c      |   18 ++++++++++++++----
 net/core/sock.c        |    2 +-
 net/ipv4/icmp.c        |    5 ++---
 net/ipv4/tcp_input.c   |   14 +++++++-------
 net/ipv6/icmp.c        |    3 +--
 net/iucv/af_iucv.c     |    2 +-
 net/sctp/protocol.c    |    2 +-
 8 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ac6b05a..64f8695 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -46,6 +46,11 @@
 #define SKB_MAX_HEAD(X)		(SKB_MAX_ORDER((X), 0))
 #define SKB_MAX_ALLOC		(SKB_MAX_ORDER(0, 2))
 
+/* return minimum truesize of one skb containing X bytes of data */
+#define SKB_TRUESIZE(X) ((X) +						\
+			 SKB_DATA_ALIGN(sizeof(struct sk_buff)) +	\
+			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
 /* A. Checksumming of received packets by device.
  *
  *	NONE: device failed to checksum this packet.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b2c5f1..a7f855d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -184,11 +184,20 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 		goto out;
 	prefetchw(skb);
 
-	size = SKB_DATA_ALIGN(size);
-	data = kmalloc_node_track_caller(size + sizeof(struct skb_shared_info),
-			gfp_mask, node);
+	/* We do our best to align skb_shared_info on a separate cache
+	 * line. It usually works because kmalloc(X > SMP_CACHE_BYTES) gives
+	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
+	 * Both skb->head and skb_shared_info are cache line aligned.
+	 */
+	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	data = kmalloc_node_track_caller(size, gfp_mask, node);
 	if (!data)
 		goto nodata;
+	/* kmalloc(size) might give us more room than requested.
+	 * Put skb_shared_info exactly at the end of allocated zone,
+	 * to allow max possible filling before reallocation.
+	 */
+	size = SKB_WITH_OVERHEAD(ksize(data));
 	prefetchw(data + size);
 
 	/*
@@ -197,7 +206,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * the tail pointer in struct sk_buff!
 	 */
 	memset(skb, 0, offsetof(struct sk_buff, tail));
-	skb->truesize = size + sizeof(struct sk_buff);
+	/* Account for allocated memory : skb + skb->head */
+	skb->truesize = SKB_TRUESIZE(size);
 	atomic_set(&skb->users, 1);
 	skb->head = data;
 	skb->data = data;
diff --git a/net/core/sock.c b/net/core/sock.c
index 83c462d..5a08762 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -207,7 +207,7 @@ static struct lock_class_key af_callback_keys[AF_MAX];
  * not depend upon such differences.
  */
 #define _SK_MEM_PACKETS		256
-#define _SK_MEM_OVERHEAD	(sizeof(struct sk_buff) + 256)
+#define _SK_MEM_OVERHEAD	SKB_TRUESIZE(256)
 #define SK_WMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
 #define SK_RMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
 
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 23ef31b..ab188ae 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -1152,10 +1152,9 @@ static int __net_init icmp_sk_init(struct net *net)
 		net->ipv4.icmp_sk[i] = sk;
 
 		/* Enough space for 2 64K ICMP packets, including
-		 * sk_buff struct overhead.
+		 * sk_buff/skb_shared_info struct overhead.
 		 */
-		sk->sk_sndbuf =
-			(2 * ((64 * 1024) + sizeof(struct sk_buff)));
+		sk->sk_sndbuf =	2 * SKB_TRUESIZE(64 * 1024);
 
 		/*
 		 * Speedup sock_wfree()
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 81cae64..c1653fe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -265,8 +265,7 @@ static inline int TCP_ECN_rcv_ecn_echo(struct tcp_sock *tp, struct tcphdr *th)
 
 static void tcp_fixup_sndbuf(struct sock *sk)
 {
-	int sndmem = tcp_sk(sk)->rx_opt.mss_clamp + MAX_TCP_HEADER + 16 +
-		     sizeof(struct sk_buff);
+	int sndmem = SKB_TRUESIZE(tcp_sk(sk)->rx_opt.mss_clamp + MAX_TCP_HEADER);
 
 	if (sk->sk_sndbuf < 3 * sndmem) {
 		sk->sk_sndbuf = 3 * sndmem;
@@ -349,7 +348,7 @@ static void tcp_grow_window(struct sock *sk, struct sk_buff *skb)
 static void tcp_fixup_rcvbuf(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	int rcvmem = tp->advmss + MAX_TCP_HEADER + 16 + sizeof(struct sk_buff);
+	int rcvmem = SKB_TRUESIZE(tp->advmss + MAX_TCP_HEADER);
 
 	/* Try to select rcvbuf so that 4 mss-sized segments
 	 * will fit to window and corresponding skbs will fit to our rcvbuf.
@@ -540,8 +539,7 @@ void tcp_rcv_space_adjust(struct sock *sk)
 			space /= tp->advmss;
 			if (!space)
 				space = 1;
-			rcvmem = (tp->advmss + MAX_TCP_HEADER +
-				  16 + sizeof(struct sk_buff));
+			rcvmem = SKB_TRUESIZE(tp->advmss + MAX_TCP_HEADER);
 			while (tcp_win_from_space(rcvmem) < tp->advmss)
 				rcvmem += 128;
 			space *= rcvmem;
@@ -4950,8 +4948,10 @@ static void tcp_new_space(struct sock *sk)
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	if (tcp_should_expand_sndbuf(sk)) {
-		int sndmem = max_t(u32, tp->rx_opt.mss_clamp, tp->mss_cache) +
-			MAX_TCP_HEADER + 16 + sizeof(struct sk_buff);
+		int sndmem = SKB_TRUESIZE(max_t(u32,
+						tp->rx_opt.mss_clamp,
+						tp->mss_cache) +
+					  MAX_TCP_HEADER);
 		int demanded = max_t(unsigned int, tp->snd_cwnd,
 				     tp->reordering + 1);
 		sndmem *= 2 * demanded;
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 2b59154..90868fb 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -835,8 +835,7 @@ static int __net_init icmpv6_sk_init(struct net *net)
 		/* Enough space for 2 64K ICMP packets, including
 		 * sk_buff struct overhead.
 		 */
-		sk->sk_sndbuf =
-			(2 * ((64 * 1024) + sizeof(struct sk_buff)));
+		sk->sk_sndbuf = 2 * SKB_TRUESIZE(64 * 1024);
 	}
 	return 0;
 
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index c39f3a4..274d150 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1819,7 +1819,7 @@ static void iucv_callback_rx(struct iucv_path *path, struct iucv_message *msg)
 		goto save_message;
 
 	len = atomic_read(&sk->sk_rmem_alloc);
-	len += iucv_msg_length(msg) + sizeof(struct sk_buff);
+	len += SKB_TRUESIZE(iucv_msg_length(msg));
 	if (len > sk->sk_rcvbuf)
 		goto save_message;
 
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 91784f4..61b9fca 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1299,7 +1299,7 @@ SCTP_STATIC __init int sctp_init(void)
 	max_share = min(4UL*1024*1024, limit);
 
 	sysctl_sctp_rmem[0] = SK_MEM_QUANTUM; /* give each asoc 1 page min */
-	sysctl_sctp_rmem[1] = (1500 *(sizeof(struct sk_buff) + 1));
+	sysctl_sctp_rmem[1] = 1500 * SKB_TRUESIZE(1);
 	sysctl_sctp_rmem[2] = max(sysctl_sctp_rmem[1], max_share);
 
 	sysctl_sctp_wmem[0] = SK_MEM_QUANTUM;

^ permalink raw reply related

* How are you
From: Kobi Tema @ 2011-10-13 16:42 UTC (permalink / raw)




Dear friend,

I am Mr. Kobi Tema, I work with bank here in Accra Ghana. I have a proposition
involving the sum of $36 million USD in our bank invested by one Iraq man who is
late now, this deal will be of mutual benefit for both of us.

Should you be interested kindly get back to me

Your earliest response to this mail will be highly appreciated.

Kind Regards
Kobi Tema.

^ permalink raw reply

* Re: [net-next PATCH] net: allow vlan traffic to be received under bond
From: John Fastabend @ 2011-10-13 17:42 UTC (permalink / raw)
  To: Hans Schillström
  Cc: Jiri Pirko, Maxime Bizon, davem@davemloft.net,
	netdev@vger.kernel.org, jesse@nicira.com, fubar@us.ibm.com
In-Reply-To: <C8A6796DE7C66C4ABCBC18106CB6C1CC106D903179@ESESSCMS0356.eemea.ericsson.se>

On 10/13/2011 8:59 AM, Hans Schillström wrote:
> Thu, Oct 13, 2011 at 05:04:34PM CEST, mbizon@freebox.fr wrote:
>>>
>>> On Tue, 2011-10-11 at 00:37 +0200, Jiri Pirko wrote:
>>>
>>>> Hmm, I must look at this again tomorrow but I have strong feeling this
>>>> will break some some scenario including vlan-bridge-macvlan.
>>>
>>> unless I'm mistaken, today's behaviour:
>>>
>>> # vconfig add eth0 100
>>> # brctl addbr br0
>>> # brctl addif br0 eth0
>>>
>>> => eth0.100 gets no more packets, br0.100 is to be used
>>>
>>> after the patch won't we get the opposite ?
>>
>> Looks like it. The question is what is the correct behaviour...
> 
> I think this it become correct now, you should not destroy lover level if possible.
> I.e. as John wrote "it's not an unexpected behaviour"
> 
> Consider adding a bridge to a vlan like this
> 
> vconfig add eth0 100
> brctl addbr br1
> brctl addif br1 eth0.100
> 
> If you later add a bridge (or bond) should the previous added bridge still work ? 
> Yes I think so, for me it's the expected behaviour.
> 
> brctl addbr br0
> brctl addif br0 eth0
> 
> Regards
> Hans
> 
> 

Sorry I'm not entirely sure I followed the above two posts. Note
this patch restores behavior that has existed for most of the
2.6.x kernels. In the 3.x kernels VLAN100 is dropped in the
schematic below (assuming eth0 is active), I think this is
incorrect.

       ---> eth0.100
       |
       |
eth0 -----> br0

With this patch VLAN100 is delivered to eth0.100 as I expect. Now
adding a VLAN100 to br0.

       ---> eth0.100
       |
       |
eth0 -----> br0---> br0.100

With this patch in the above case VLAN100 is delivered to the
first matching vlan. Here eth0.100 will receive the packet If
you really want br0.100 to receive the packet remove eth0.100.
Without this patch the packet will be delivered to br0.100,
but no configuration will allow a packet to be delivered to
eth0.100 with a bond.

The first schematic is used for doing bonding on LAN traffic
and SAN (storage) traffic with MPIO. So I think my expectations
are correct and have a real use case.

Thanks,
John

^ permalink raw reply

* Re: [PATCH net-next] bonding: fix wrong port enabling in 802.3ad
From: Jay Vosburgh @ 2011-10-13 17:46 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: netdev, Andy Gospodarek
In-Reply-To: <1318526483-16073-1-git-send-email-fbl@redhat.com>

Flavio Leitner <fbl@redhat.com> wrote:

>The port shouldn't be enabled unless its current MUX
>state is DISTRIBUTING which is correctly handled by
>ad_mux_machine(), otherwise the packet sent can be
>lost because the other end may not be ready.
>
>The issue happens on every port initialization, but
>as the ports are expected to move quickly to DISTRIBUTING,
>it doesn't cause much problem.  However, it does cause
>constant packet loss if the other peer has the port
>configured to stay in STANDBY (i.e. SYNC set to OFF).

	This may explain another misbehavior I've been looking into: if
the bond's outgoing LACPDUs are lost (never received by the switch), but
the switch's incoming LACPDUs are received, bonding puts the port into
use, and packets to the switch are dropped by the switch.

>Signed-off-by: Flavio Leitner <fbl@redhat.com>
>---
>
>The comments there suggests it was a workaround for losses
>of link events, but I couldn't track the changelog as it
>seems to be pretty old.  Thus, as all the link notification
>stuff has been improved a lot, maybe this is not an issue
>anymore.  At least, I didn't find any problem while
>unplugging/plugging cables here.

	I believe this code fragment is original to the 802.3ad
submission, which would have been around 2003 or so.

	Did you check the standard for what it says should happen in
this case?  I'm guessing this is something not specified by the
standard, given the comment, but we should check to make sure.

	-J

> drivers/net/bonding/bond_3ad.c |    7 -------
> 1 files changed, 0 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index 47b928e..b33c099 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -1135,13 +1135,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> 			__record_pdu(lacpdu, port);
> 			port->sm_rx_timer_counter = __ad_timer_to_ticks(AD_CURRENT_WHILE_TIMER, (u16)(port->actor_oper_port_state & AD_STATE_LACP_TIMEOUT));
> 			port->actor_oper_port_state &= ~AD_STATE_EXPIRED;
>-			// verify that if the aggregator is enabled, the port is enabled too.
>-			//(because if the link goes down for a short time, the 802.3ad will not
>-			// catch it, and the port will continue to be disabled)
>-			if (port->aggregator
>-			    && port->aggregator->is_active
>-			    && !__port_is_enabled(port))
>-				__enable_port(port);
> 			break;
> 		default:    //to silence the compiler
> 			break;
>-- 
>1.7.6
>

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* [PATCH net-next] bnx2: fix skb truesize underestimation
From: Eric Dumazet @ 2011-10-13 17:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Michael Chan

bnx2 allocates a full page per fragment. We must account PAGE_SIZE
increments on skb->truesize, not the actual frag length.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/ethernet/broadcom/bnx2.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 3c221be..6ff7636 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -3051,7 +3051,6 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
 						&skb_shinfo(skb)->frags[i - 1];
 					frag->size -= tail;
 					skb->data_len -= tail;
-					skb->truesize -= tail;
 				}
 				return 0;
 			}
@@ -3083,7 +3082,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
 
 			frag_size -= frag_len;
 			skb->data_len += frag_len;
-			skb->truesize += frag_len;
+			skb->truesize += PAGE_SIZE;
 			skb->len += frag_len;
 
 			pg_prod = NEXT_RX_BD(pg_prod);

^ permalink raw reply related

* [PATCH net-next] e1000: fix skb truesize underestimation
From: Eric Dumazet @ 2011-10-13 17:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jeff Kirsher

e1000 allocates a full page per skb fragment. We must account PAGE_SIZE
increments on skb->truesize, not the actual frag length.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index a42421f..7b54d72 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -3737,7 +3737,7 @@ static void e1000_consume_page(struct e1000_buffer *bi, struct sk_buff *skb,
 	bi->page = NULL;
 	skb->len += length;
 	skb->data_len += length;
-	skb->truesize += length;
+	skb->truesize += PAGE_SIZE;
 }
 
 /**

^ permalink raw reply related

* [PATCH net-next] igb: fix skb truesize underestimation
From: Eric Dumazet @ 2011-10-13 17:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jeff Kirsher

e1000 allocates half a page per skb fragment. We must account
PAGE_SIZE/2 increments on skb->truesize, not the actual frag length.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 10670f9..1adcdf8 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5946,7 +5946,7 @@ static bool igb_clean_rx_irq(struct igb_q_vector *q_vector, int budget)
 
 			skb->len += length;
 			skb->data_len += length;
-			skb->truesize += length;
+			skb->truesize += PAGE_SIZE / 2;
 
 			if ((page_count(buffer_info->page) != 1) ||
 			    (page_to_nid(buffer_info->page) != current_node))

^ permalink raw reply related

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
From: David Miller @ 2011-10-13 17:58 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev
In-Reply-To: <20111013100950.GO1830@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 13 Oct 2011 12:09:50 +0200

> At least it seems that raw_sendmsg() and ping_sendmsg() don't use
> a cached route, they do the route lookup in any case. I don't see
> where we check if we learned a new pmtu in this cases. 

A freshly looked up route should not have ->obsolete set.

That's why we don't do dst_check() in that part of the ip_output.c
helper code you're modifying.

Please find out exactly why dst->obsolete is non-zero on a freshly
looked up route.  It's unexpected.

^ permalink raw reply

* [PATCH net-next] ixgbe: fix skb truesize underestimation
From: Eric Dumazet @ 2011-10-13 17:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jeff Kirsher

ixgbe allocates half a page per skb fragment. We must account
PAGE_SIZE/2 increments on skb->truesize, not the actual frag length.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1519a23..fdb6ae4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1326,7 +1326,7 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 			skb->len += upper_len;
 			skb->data_len += upper_len;
-			skb->truesize += upper_len;
+			skb->truesize += PAGE_SIZE / 2;
 		}
 
 		i++;

^ permalink raw reply related

* Re: [net-next 00/11 v2][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2011-10-13 18:00 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1318490112-5092-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 13 Oct 2011 00:15:01 -0700

> The following series contains updates to ixgbe and igb.  This
> version of the series contains the following changes:
> 
> - ixgbe add FCoE stats, add protection from invalid DMA and fix
>   check for change in FCoE priority
> - igb version bump, fix sparse warnings and finish up the cleanup
>   work of igb by Alex Duyck.
> 
> - v2: drop the "igb: move DMA Coalescing feature code into separate function."
>   patch while Carolyn fixes the patch based on David Miller's comments.
> 
> The following are changes since commit 9687c637388f63b87fcc18eee6e65bcfca4f49ca:
>   Merge branch 'for-davem' of git://git.infradead.org/users/linville/wireless-next
> and are available in the git repository at
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next.git

Pulled, thanks.

^ permalink raw reply

* [PATCH net-next] e1000e: fix skb truesize underestimation
From: Eric Dumazet @ 2011-10-13 18:03 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jeff Kirsher

e1000e allocates a page per skb fragment. We must account
PAGE_SIZE increments on skb->truesize, not the actual frag length.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 78c5d21..035ce73 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1300,7 +1300,7 @@ static bool e1000_clean_rx_irq_ps(struct e1000_adapter *adapter,
 			ps_page->page = NULL;
 			skb->len += length;
 			skb->data_len += length;
-			skb->truesize += length;
+			skb->truesize += PAGE_SIZE;
 		}
 
 		/* strip the ethernet crc, problem is we're using pages now so
@@ -1360,7 +1360,7 @@ static void e1000_consume_page(struct e1000_buffer *bi, struct sk_buff *skb,
 	bi->page = NULL;
 	skb->len += length;
 	skb->data_len += length;
-	skb->truesize += length;
+	skb->truesize += PAGE_SIZE;
 }
 
 /**

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox