Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Eric Dumazet @ 2011-02-23 16:48 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen
In-Reply-To: <4D6534C3.1080305@trash.net>

Le mercredi 23 février 2011 à 17:24 +0100, Patrick McHardy a écrit :

> This needs to be a per-skb property, otherwise you could have the
> situation:
> 
> - enqueue skb, double_buffering=0, increment buffer 0
> - enable double buffering
> - swap buffers
> - dequeue same skb, decrement buffer 1
> 
> after which the qlen values of buffer 1 will be incorrect.
> 

Normally its OK, because we bzero() the zone, and the "decrement" is
0-bounded.

I had this idea (of storing two bits per skb), but :

- It means that swap_buffer() should not touch (bzero) the 'old' bins

- Since hash perturbator is changed, we have to store the two hash
values per skb (instead of one u32 / classid).


> 
> > +		slot ^= 1;
> > +		sfbhash = sfb_hash(skb, slot, q);
> 
> Isn't there room in the cb to store both hash values?

Yes, I am going to implement your idea, its probably OK to use two u32
on skb_cb for this.

Thanks !



^ permalink raw reply

* Re: [RFC PATCH 09/10] netdev: octeon_mgmt: Convert to use device tree.
From: Grant Likely @ 2011-02-23 16:32 UTC (permalink / raw)
  To: David Daney
  Cc: linux-mips-6z/3iImG2C8G8FEW9MqTrA, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, ralf-6z/3iImG2C8G8FEW9MqTrA,
	David S. Miller
In-Reply-To: <1298408274-20856-10-git-send-email-ddaney-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>

On Tue, Feb 22, 2011 at 12:57:53PM -0800, David Daney wrote:
> The device tree will supply the register bank base addresses, make
> register addressing relative to those.  PHY connection is now
> described by the device tree.
> 
> Signed-off-by: David Daney <ddaney-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> ---
>  arch/mips/cavium-octeon/octeon-platform.c |   62 -------
>  drivers/net/octeon/octeon_mgmt.c          |  265 +++++++++++++++++------------
>  2 files changed, 160 insertions(+), 167 deletions(-)
> 
> diff --git a/drivers/net/octeon/octeon_mgmt.c b/drivers/net/octeon/octeon_mgmt.c
> index b264f0f..6c301c7 100644
> --- a/drivers/net/octeon/octeon_mgmt.c
> +++ b/drivers/net/octeon/octeon_mgmt.c
> @@ -6,16 +6,21 @@
>   * Copyright (C) 2009 Cavium Networks
>   */
>  
> -#include <linux/capability.h>
> -#include <linux/dma-mapping.h>
> -#include <linux/init.h>
>  #include <linux/platform_device.h>
> -#include <linux/netdevice.h>
> +#include <linux/dma-mapping.h>
>  #include <linux/etherdevice.h>
> +#include <linux/of_platform.h>
> +#include <linux/capability.h>
> +#include <linux/of_address.h>
> +#include <linux/netdevice.h>
> +#include <linux/spinlock.h>
>  #include <linux/if_vlan.h>
> +#include <linux/of_mdio.h>
> +#include <linux/of_irq.h>
> +#include <linux/of_net.h>
> +#include <linux/init.h>
>  #include <linux/slab.h>
>  #include <linux/phy.h>
> -#include <linux/spinlock.h>
>  
>  #include <asm/octeon/octeon.h>
>  #include <asm/octeon/cvmx-mixx-defs.h>
> @@ -55,8 +60,56 @@ union mgmt_port_ring_entry {
>  	} s;
>  };
>  
> +#define MIX_ORING1	0x0
> +#define MIX_ORING2	0x8
> +#define MIX_IRING1	0x10
> +#define MIX_IRING2	0x18
> +#define MIX_CTL		0x20
> +#define MIX_IRHWM	0x28
> +#define MIX_IRCNT	0x30
> +#define MIX_ORHWM	0x38
> +#define MIX_ORCNT	0x40
> +#define MIX_ISR		0x48
> +#define MIX_INTENA	0x50
> +#define MIX_REMCNT	0x58
> +#define MIX_BIST	0x78
> +
> +#define AGL_GMX_PRT_CFG			0x10
> +#define AGL_GMX_RX_FRM_CTL		0x18
> +#define AGL_GMX_RX_FRM_MAX		0x30
> +#define AGL_GMX_RX_JABBER		0x38
> +#define AGL_GMX_RX_STATS_CTL		0x50
> +
> +#define AGL_GMX_RX_STATS_PKTS_DRP	0xb0
> +#define AGL_GMX_RX_STATS_OCTS_DRP	0xb8
> +#define AGL_GMX_RX_STATS_PKTS_BAD	0xc0
> +
> +#define AGL_GMX_RX_ADR_CTL		0x100
> +#define AGL_GMX_RX_ADR_CAM_EN		0x108
> +#define AGL_GMX_RX_ADR_CAM0		0x180
> +#define AGL_GMX_RX_ADR_CAM1		0x188
> +#define AGL_GMX_RX_ADR_CAM2		0x190
> +#define AGL_GMX_RX_ADR_CAM3		0x198
> +#define AGL_GMX_RX_ADR_CAM4		0x1a0
> +#define AGL_GMX_RX_ADR_CAM5		0x1a8
> +
> +#define AGL_GMX_TX_STATS_CTL		0x268
> +#define AGL_GMX_TX_CTL			0x270
> +#define AGL_GMX_TX_STAT0		0x280
> +#define AGL_GMX_TX_STAT1		0x288
> +#define AGL_GMX_TX_STAT2		0x290
> +#define AGL_GMX_TX_STAT3		0x298
> +#define AGL_GMX_TX_STAT4		0x2a0
> +#define AGL_GMX_TX_STAT5		0x2a8
> +#define AGL_GMX_TX_STAT6		0x2b0
> +#define AGL_GMX_TX_STAT7		0x2b8
> +#define AGL_GMX_TX_STAT8		0x2c0
> +#define AGL_GMX_TX_STAT9		0x2c8
> +
>  struct octeon_mgmt {
>  	struct net_device *netdev;
> +	u64 mix;
> +	u64 agl;
>  	int port;
>  	int irq;
>  	u64 *tx_ring;
> @@ -82,31 +135,30 @@ struct octeon_mgmt {
>  	struct napi_struct napi;
>  	struct tasklet_struct tx_clean_tasklet;
>  	struct phy_device *phydev;
> +	struct device_node *phy_np;
>  };
>  
>  static void octeon_mgmt_set_rx_irq(struct octeon_mgmt *p, int enable)
>  {
> -	int port = p->port;
>  	union cvmx_mixx_intena mix_intena;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&p->lock, flags);
> -	mix_intena.u64 = cvmx_read_csr(CVMX_MIXX_INTENA(port));
> +	mix_intena.u64 = cvmx_read_csr(p->mix + MIX_INTENA);
>  	mix_intena.s.ithena = enable ? 1 : 0;
> -	cvmx_write_csr(CVMX_MIXX_INTENA(port), mix_intena.u64);
> +	cvmx_write_csr(p->mix + MIX_INTENA, mix_intena.u64);
>  	spin_unlock_irqrestore(&p->lock, flags);
>  }
>  
>  static void octeon_mgmt_set_tx_irq(struct octeon_mgmt *p, int enable)
>  {
> -	int port = p->port;
>  	union cvmx_mixx_intena mix_intena;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&p->lock, flags);
> -	mix_intena.u64 = cvmx_read_csr(CVMX_MIXX_INTENA(port));
> +	mix_intena.u64 = cvmx_read_csr(p->mix + MIX_INTENA);
>  	mix_intena.s.othena = enable ? 1 : 0;
> -	cvmx_write_csr(CVMX_MIXX_INTENA(port), mix_intena.u64);
> +	cvmx_write_csr(p->mix + MIX_INTENA, mix_intena.u64);
>  	spin_unlock_irqrestore(&p->lock, flags);
>  }
>  
> @@ -143,7 +195,6 @@ static unsigned int ring_size_to_bytes(unsigned int ring_size)
>  static void octeon_mgmt_rx_fill_ring(struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  
>  	while (p->rx_current_fill < ring_max_fill(OCTEON_MGMT_RX_RING_SIZE)) {
>  		unsigned int size;
> @@ -174,24 +225,23 @@ static void octeon_mgmt_rx_fill_ring(struct net_device *netdev)
>  			(p->rx_next_fill + 1) % OCTEON_MGMT_RX_RING_SIZE;
>  		p->rx_current_fill++;
>  		/* Ring the bell.  */
> -		cvmx_write_csr(CVMX_MIXX_IRING2(port), 1);
> +		cvmx_write_csr(p->mix + MIX_IRING2, 1);
>  	}
>  }
>  
>  static void octeon_mgmt_clean_tx_buffers(struct octeon_mgmt *p)
>  {
> -	int port = p->port;
>  	union cvmx_mixx_orcnt mix_orcnt;
>  	union mgmt_port_ring_entry re;
>  	struct sk_buff *skb;
>  	int cleaned = 0;
>  	unsigned long flags;
>  
> -	mix_orcnt.u64 = cvmx_read_csr(CVMX_MIXX_ORCNT(port));
> +	mix_orcnt.u64 = cvmx_read_csr(p->mix + MIX_ORCNT);
>  	while (mix_orcnt.s.orcnt) {
>  		spin_lock_irqsave(&p->tx_list.lock, flags);
>  
> -		mix_orcnt.u64 = cvmx_read_csr(CVMX_MIXX_ORCNT(port));
> +		mix_orcnt.u64 = cvmx_read_csr(p->mix + MIX_ORCNT);
>  
>  		if (mix_orcnt.s.orcnt == 0) {
>  			spin_unlock_irqrestore(&p->tx_list.lock, flags);
> @@ -211,7 +261,7 @@ static void octeon_mgmt_clean_tx_buffers(struct octeon_mgmt *p)
>  		mix_orcnt.s.orcnt = 1;
>  
>  		/* Acknowledge to hardware that we have the buffer.  */
> -		cvmx_write_csr(CVMX_MIXX_ORCNT(port), mix_orcnt.u64);
> +		cvmx_write_csr(p->mix + MIX_ORCNT, mix_orcnt.u64);
>  		p->tx_current_fill--;
>  
>  		spin_unlock_irqrestore(&p->tx_list.lock, flags);
> @@ -221,7 +271,7 @@ static void octeon_mgmt_clean_tx_buffers(struct octeon_mgmt *p)
>  		dev_kfree_skb_any(skb);
>  		cleaned++;
>  
> -		mix_orcnt.u64 = cvmx_read_csr(CVMX_MIXX_ORCNT(port));
> +		mix_orcnt.u64 = cvmx_read_csr(p->mix + MIX_ORCNT);
>  	}
>  
>  	if (cleaned && netif_queue_stopped(p->netdev))
> @@ -238,13 +288,12 @@ static void octeon_mgmt_clean_tx_tasklet(unsigned long arg)
>  static void octeon_mgmt_update_rx_stats(struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	unsigned long flags;
>  	u64 drop, bad;
>  
>  	/* These reads also clear the count registers.  */
> -	drop = cvmx_read_csr(CVMX_AGL_GMX_RXX_STATS_PKTS_DRP(port));
> -	bad = cvmx_read_csr(CVMX_AGL_GMX_RXX_STATS_PKTS_BAD(port));
> +	drop = cvmx_read_csr(p->agl + AGL_GMX_RX_STATS_PKTS_DRP);
> +	bad = cvmx_read_csr(p->agl + AGL_GMX_RX_STATS_PKTS_BAD);
>  
>  	if (drop || bad) {
>  		/* Do an atomic update. */
> @@ -258,15 +307,14 @@ static void octeon_mgmt_update_rx_stats(struct net_device *netdev)
>  static void octeon_mgmt_update_tx_stats(struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	unsigned long flags;
>  
>  	union cvmx_agl_gmx_txx_stat0 s0;
>  	union cvmx_agl_gmx_txx_stat1 s1;
>  
>  	/* These reads also clear the count registers.  */
> -	s0.u64 = cvmx_read_csr(CVMX_AGL_GMX_TXX_STAT0(port));
> -	s1.u64 = cvmx_read_csr(CVMX_AGL_GMX_TXX_STAT1(port));
> +	s0.u64 = cvmx_read_csr(p->agl + AGL_GMX_TX_STAT0);
> +	s1.u64 = cvmx_read_csr(p->agl + AGL_GMX_TX_STAT1);
>  
>  	if (s0.s.xsdef || s0.s.xscol || s1.s.scol || s1.s.mcol) {
>  		/* Do an atomic update. */
> @@ -305,7 +353,6 @@ static u64 octeon_mgmt_dequeue_rx_buffer(struct octeon_mgmt *p,
>  
>  static int octeon_mgmt_receive_one(struct octeon_mgmt *p)
>  {
> -	int port = p->port;
>  	struct net_device *netdev = p->netdev;
>  	union cvmx_mixx_ircnt mix_ircnt;
>  	union mgmt_port_ring_entry re;
> @@ -378,18 +425,17 @@ done:
>  	/* Tell the hardware we processed a packet.  */
>  	mix_ircnt.u64 = 0;
>  	mix_ircnt.s.ircnt = 1;
> -	cvmx_write_csr(CVMX_MIXX_IRCNT(port), mix_ircnt.u64);
> +	cvmx_write_csr(p->mix + MIX_IRCNT, mix_ircnt.u64);
>  	return rc;
>  }
>  
>  static int octeon_mgmt_receive_packets(struct octeon_mgmt *p, int budget)
>  {
> -	int port = p->port;
>  	unsigned int work_done = 0;
>  	union cvmx_mixx_ircnt mix_ircnt;
>  	int rc;
>  
> -	mix_ircnt.u64 = cvmx_read_csr(CVMX_MIXX_IRCNT(port));
> +	mix_ircnt.u64 = cvmx_read_csr(p->mix + MIX_IRCNT);
>  	while (work_done < budget && mix_ircnt.s.ircnt) {
>  
>  		rc = octeon_mgmt_receive_one(p);
> @@ -397,7 +443,7 @@ static int octeon_mgmt_receive_packets(struct octeon_mgmt *p, int budget)
>  			work_done++;
>  
>  		/* Check for more packets. */
> -		mix_ircnt.u64 = cvmx_read_csr(CVMX_MIXX_IRCNT(port));
> +		mix_ircnt.u64 = cvmx_read_csr(p->mix + MIX_IRCNT);
>  	}
>  
>  	octeon_mgmt_rx_fill_ring(p->netdev);
> @@ -431,16 +477,16 @@ static void octeon_mgmt_reset_hw(struct octeon_mgmt *p)
>  	union cvmx_agl_gmx_bist agl_gmx_bist;
>  
>  	mix_ctl.u64 = 0;
> -	cvmx_write_csr(CVMX_MIXX_CTL(p->port), mix_ctl.u64);
> +	cvmx_write_csr(p->mix + MIX_CTL, mix_ctl.u64);
>  	do {
> -		mix_ctl.u64 = cvmx_read_csr(CVMX_MIXX_CTL(p->port));
> +		mix_ctl.u64 = cvmx_read_csr(p->mix + MIX_CTL);
>  	} while (mix_ctl.s.busy);
>  	mix_ctl.s.reset = 1;
> -	cvmx_write_csr(CVMX_MIXX_CTL(p->port), mix_ctl.u64);
> -	cvmx_read_csr(CVMX_MIXX_CTL(p->port));
> +	cvmx_write_csr(p->mix + MIX_CTL, mix_ctl.u64);
> +	cvmx_read_csr(p->mix + MIX_CTL);
>  	cvmx_wait(64);
>  
> -	mix_bist.u64 = cvmx_read_csr(CVMX_MIXX_BIST(p->port));
> +	mix_bist.u64 = cvmx_read_csr(p->mix + MIX_BIST);
>  	if (mix_bist.u64)
>  		dev_warn(p->dev, "MIX failed BIST (0x%016llx)\n",
>  			(unsigned long long)mix_bist.u64);
> @@ -471,7 +517,6 @@ static void octeon_mgmt_cam_state_add(struct octeon_mgmt_cam_state *cs,
>  static void octeon_mgmt_set_rx_filtering(struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	union cvmx_agl_gmx_rxx_adr_ctl adr_ctl;
>  	union cvmx_agl_gmx_prtx_cfg agl_gmx_prtx;
>  	unsigned long flags;
> @@ -517,29 +562,29 @@ static void octeon_mgmt_set_rx_filtering(struct net_device *netdev)
>  	spin_lock_irqsave(&p->lock, flags);
>  
>  	/* Disable packet I/O. */
> -	agl_gmx_prtx.u64 = cvmx_read_csr(CVMX_AGL_GMX_PRTX_CFG(port));
> +	agl_gmx_prtx.u64 = cvmx_read_csr(p->agl + AGL_GMX_PRT_CFG);
>  	prev_packet_enable = agl_gmx_prtx.s.en;
>  	agl_gmx_prtx.s.en = 0;
> -	cvmx_write_csr(CVMX_AGL_GMX_PRTX_CFG(port), agl_gmx_prtx.u64);
> +	cvmx_write_csr(p->agl + AGL_GMX_PRT_CFG, agl_gmx_prtx.u64);
>  
>  	adr_ctl.u64 = 0;
>  	adr_ctl.s.cam_mode = cam_mode;
>  	adr_ctl.s.mcst = multicast_mode;
>  	adr_ctl.s.bcst = 1;     /* Allow broadcast */
>  
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CTL(port), adr_ctl.u64);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CTL, adr_ctl.u64);
>  
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM0(port), cam_state.cam[0]);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM1(port), cam_state.cam[1]);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM2(port), cam_state.cam[2]);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM3(port), cam_state.cam[3]);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM4(port), cam_state.cam[4]);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM5(port), cam_state.cam[5]);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM_EN(port), cam_state.cam_mask);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM0, cam_state.cam[0]);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM1, cam_state.cam[1]);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM2, cam_state.cam[2]);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM3, cam_state.cam[3]);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM4, cam_state.cam[4]);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM5, cam_state.cam[5]);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM_EN, cam_state.cam_mask);
>  
>  	/* Restore packet I/O. */
>  	agl_gmx_prtx.s.en = prev_packet_enable;
> -	cvmx_write_csr(CVMX_AGL_GMX_PRTX_CFG(port), agl_gmx_prtx.u64);
> +	cvmx_write_csr(p->agl + AGL_GMX_PRT_CFG, agl_gmx_prtx.u64);
>  
>  	spin_unlock_irqrestore(&p->lock, flags);
>  }
> @@ -561,7 +606,6 @@ static int octeon_mgmt_set_mac_address(struct net_device *netdev, void *addr)
>  static int octeon_mgmt_change_mtu(struct net_device *netdev, int new_mtu)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	int size_without_fcs = new_mtu + OCTEON_MGMT_RX_HEADROOM;
>  
>  	/*
> @@ -577,8 +621,8 @@ static int octeon_mgmt_change_mtu(struct net_device *netdev, int new_mtu)
>  
>  	netdev->mtu = new_mtu;
>  
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_FRM_MAX(port), size_without_fcs);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_JABBER(port),
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_FRM_MAX, size_without_fcs);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_JABBER,
>  		       (size_without_fcs + 7) & 0xfff8);
>  
>  	return 0;
> @@ -588,14 +632,13 @@ static irqreturn_t octeon_mgmt_interrupt(int cpl, void *dev_id)
>  {
>  	struct net_device *netdev = dev_id;
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	union cvmx_mixx_isr mixx_isr;
>  
> -	mixx_isr.u64 = cvmx_read_csr(CVMX_MIXX_ISR(port));
> +	mixx_isr.u64 = cvmx_read_csr(p->mix + MIX_ISR);
>  
>  	/* Clear any pending interrupts */
> -	cvmx_write_csr(CVMX_MIXX_ISR(port), mixx_isr.u64);
> -	cvmx_read_csr(CVMX_MIXX_ISR(port));
> +	cvmx_write_csr(p->mix + MIX_ISR, mixx_isr.u64);
> +	cvmx_read_csr(p->mix + MIX_ISR);
>  
>  	if (mixx_isr.s.irthresh) {
>  		octeon_mgmt_disable_rx_irq(p);
> @@ -626,7 +669,6 @@ static int octeon_mgmt_ioctl(struct net_device *netdev,
>  static void octeon_mgmt_adjust_link(struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	union cvmx_agl_gmx_prtx_cfg prtx_cfg;
>  	unsigned long flags;
>  	int link_changed = 0;
> @@ -637,11 +679,9 @@ static void octeon_mgmt_adjust_link(struct net_device *netdev)
>  			link_changed = 1;
>  		if (p->last_duplex != p->phydev->duplex) {
>  			p->last_duplex = p->phydev->duplex;
> -			prtx_cfg.u64 =
> -				cvmx_read_csr(CVMX_AGL_GMX_PRTX_CFG(port));
> +			prtx_cfg.u64 = cvmx_read_csr(p->agl + AGL_GMX_PRT_CFG);
>  			prtx_cfg.s.duplex = p->phydev->duplex;
> -			cvmx_write_csr(CVMX_AGL_GMX_PRTX_CFG(port),
> -				       prtx_cfg.u64);
> +			cvmx_write_csr(p->agl + AGL_GMX_PRT_CFG, prtx_cfg.u64);
>  		}
>  	} else {
>  		if (p->last_link)
> @@ -667,18 +707,16 @@ static void octeon_mgmt_adjust_link(struct net_device *netdev)
>  static int octeon_mgmt_init_phy(struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	char phy_id[20];
>  
> -	if (octeon_is_simulation()) {
> +	if (octeon_is_simulation() || p->phy_np == NULL) {
>  		/* No PHYs in the simulator. */
>  		netif_carrier_on(netdev);
>  		return 0;
>  	}
>  
> -	snprintf(phy_id, sizeof(phy_id), PHY_ID_FMT, "0", p->port);
> -
> -	p->phydev = phy_connect(netdev, phy_id, octeon_mgmt_adjust_link, 0,
> -				PHY_INTERFACE_MODE_MII);
> +	p->phydev = of_phy_connect(netdev, p->phy_np,
> +				   octeon_mgmt_adjust_link, 0,
> +				   PHY_INTERFACE_MODE_MII);
>  
>  	if (IS_ERR(p->phydev)) {
>  		p->phydev = NULL;
> @@ -734,14 +772,14 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  
>  	octeon_mgmt_reset_hw(p);
>  
> -	mix_ctl.u64 = cvmx_read_csr(CVMX_MIXX_CTL(port));
> +	mix_ctl.u64 = cvmx_read_csr(p->mix + MIX_CTL);
>  
>  	/* Bring it out of reset if needed. */
>  	if (mix_ctl.s.reset) {
>  		mix_ctl.s.reset = 0;
> -		cvmx_write_csr(CVMX_MIXX_CTL(port), mix_ctl.u64);
> +		cvmx_write_csr(p->mix + MIX_CTL, mix_ctl.u64);
>  		do {
> -			mix_ctl.u64 = cvmx_read_csr(CVMX_MIXX_CTL(port));
> +			mix_ctl.u64 = cvmx_read_csr(p->mix + MIX_CTL);
>  		} while (mix_ctl.s.reset);
>  	}
>  
> @@ -752,17 +790,17 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  	oring1.u64 = 0;
>  	oring1.s.obase = p->tx_ring_handle >> 3;
>  	oring1.s.osize = OCTEON_MGMT_TX_RING_SIZE;
> -	cvmx_write_csr(CVMX_MIXX_ORING1(port), oring1.u64);
> +	cvmx_write_csr(p->mix + MIX_ORING1, oring1.u64);
>  
>  	iring1.u64 = 0;
>  	iring1.s.ibase = p->rx_ring_handle >> 3;
>  	iring1.s.isize = OCTEON_MGMT_RX_RING_SIZE;
> -	cvmx_write_csr(CVMX_MIXX_IRING1(port), iring1.u64);
> +	cvmx_write_csr(p->mix + MIX_IRING1, iring1.u64);
>  
>  	/* Disable packet I/O. */
> -	prtx_cfg.u64 = cvmx_read_csr(CVMX_AGL_GMX_PRTX_CFG(port));
> +	prtx_cfg.u64 = cvmx_read_csr(p->agl + AGL_GMX_PRT_CFG);
>  	prtx_cfg.s.en = 0;
> -	cvmx_write_csr(CVMX_AGL_GMX_PRTX_CFG(port), prtx_cfg.u64);
> +	cvmx_write_csr(p->agl + AGL_GMX_PRT_CFG, prtx_cfg.u64);
>  
>  	memcpy(sa.sa_data, netdev->dev_addr, ETH_ALEN);
>  	octeon_mgmt_set_mac_address(netdev, &sa);
> @@ -779,7 +817,7 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  	mix_ctl.s.nbtarb = 0;       /* Arbitration mode */
>  	/* MII CB-request FIFO programmable high watermark */
>  	mix_ctl.s.mrq_hwm = 1;
> -	cvmx_write_csr(CVMX_MIXX_CTL(port), mix_ctl.u64);
> +	cvmx_write_csr(p->mix + MIX_CTL, mix_ctl.u64);
>  
>  	if (OCTEON_IS_MODEL(OCTEON_CN56XX_PASS1_X)
>  	    || OCTEON_IS_MODEL(OCTEON_CN52XX_PASS1_X)) {
> @@ -806,16 +844,16 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  
>  	/* Clear statistics. */
>  	/* Clear on read. */
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_STATS_CTL(port), 1);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_STATS_PKTS_DRP(port), 0);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_STATS_PKTS_BAD(port), 0);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_STATS_CTL, 1);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_STATS_PKTS_DRP, 0);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_STATS_PKTS_BAD, 0);
>  
> -	cvmx_write_csr(CVMX_AGL_GMX_TXX_STATS_CTL(port), 1);
> -	cvmx_write_csr(CVMX_AGL_GMX_TXX_STAT0(port), 0);
> -	cvmx_write_csr(CVMX_AGL_GMX_TXX_STAT1(port), 0);
> +	cvmx_write_csr(p->agl + AGL_GMX_TX_STATS_CTL, 1);
> +	cvmx_write_csr(p->agl + AGL_GMX_TX_STAT0, 0);
> +	cvmx_write_csr(p->agl + AGL_GMX_TX_STAT1, 0);
>  
>  	/* Clear any pending interrupts */
> -	cvmx_write_csr(CVMX_MIXX_ISR(port), cvmx_read_csr(CVMX_MIXX_ISR(port)));
> +	cvmx_write_csr(p->mix + MIX_ISR, cvmx_read_csr(p->mix + MIX_ISR));
>  
>  	if (request_irq(p->irq, octeon_mgmt_interrupt, 0, netdev->name,
>  			netdev)) {
> @@ -826,18 +864,18 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  	/* Interrupt every single RX packet */
>  	mix_irhwm.u64 = 0;
>  	mix_irhwm.s.irhwm = 0;
> -	cvmx_write_csr(CVMX_MIXX_IRHWM(port), mix_irhwm.u64);
> +	cvmx_write_csr(p->mix + MIX_IRHWM, mix_irhwm.u64);
>  
>  	/* Interrupt when we have 1 or more packets to clean.  */
>  	mix_orhwm.u64 = 0;
>  	mix_orhwm.s.orhwm = 1;
> -	cvmx_write_csr(CVMX_MIXX_ORHWM(port), mix_orhwm.u64);
> +	cvmx_write_csr(p->mix + MIX_ORHWM, mix_orhwm.u64);
>  
>  	/* Enable receive and transmit interrupts */
>  	mix_intena.u64 = 0;
>  	mix_intena.s.ithena = 1;
>  	mix_intena.s.othena = 1;
> -	cvmx_write_csr(CVMX_MIXX_INTENA(port), mix_intena.u64);
> +	cvmx_write_csr(p->mix + MIX_INTENA, mix_intena.u64);
>  
>  
>  	/* Enable packet I/O. */
> @@ -868,7 +906,7 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  	 * frame.  GMX checks that the PREAMBLE is sent correctly.
>  	 */
>  	rxx_frm_ctl.s.pre_chk = 1;
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_FRM_CTL(port), rxx_frm_ctl.u64);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_FRM_CTL, rxx_frm_ctl.u64);
>  
>  	/* Enable the AGL block */
>  	agl_gmx_inf_mode.u64 = 0;
> @@ -876,13 +914,13 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  	cvmx_write_csr(CVMX_AGL_GMX_INF_MODE, agl_gmx_inf_mode.u64);
>  
>  	/* Configure the port duplex and enables */
> -	prtx_cfg.u64 = cvmx_read_csr(CVMX_AGL_GMX_PRTX_CFG(port));
> +	prtx_cfg.u64 = cvmx_read_csr(p->agl + AGL_GMX_PRT_CFG);
>  	prtx_cfg.s.tx_en = 1;
>  	prtx_cfg.s.rx_en = 1;
>  	prtx_cfg.s.en = 1;
>  	p->last_duplex = 1;
>  	prtx_cfg.s.duplex = p->last_duplex;
> -	cvmx_write_csr(CVMX_AGL_GMX_PRTX_CFG(port), prtx_cfg.u64);
> +	cvmx_write_csr(p->agl + AGL_GMX_PRT_CFG, prtx_cfg.u64);
>  
>  	p->last_link = 0;
>  	netif_carrier_off(netdev);
> @@ -946,7 +984,6 @@ static int octeon_mgmt_stop(struct net_device *netdev)
>  static int octeon_mgmt_xmit(struct sk_buff *skb, struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	union mgmt_port_ring_entry re;
>  	unsigned long flags;
>  	int rv = NETDEV_TX_BUSY;
> @@ -990,7 +1027,7 @@ static int octeon_mgmt_xmit(struct sk_buff *skb, struct net_device *netdev)
>  	netdev->stats.tx_bytes += skb->len;
>  
>  	/* Ring the bell.  */
> -	cvmx_write_csr(CVMX_MIXX_ORING2(port), 1);
> +	cvmx_write_csr(p->mix + MIX_ORING2, 1);
>  
>  	rv = NETDEV_TX_OK;
>  out:
> @@ -1067,12 +1104,14 @@ static const struct net_device_ops octeon_mgmt_ops = {
>  #endif
>  };
>  
> -static int __devinit octeon_mgmt_probe(struct platform_device *pdev)
> +static int __devinit octeon_mgmt_probe(struct platform_device *pdev,
> +				       const struct of_device_id *match)

Same as my comments on the i2c patch; this driver does *not* need to
be converted into an of_platform_driver.  Normal platform_driver will
work just fine.

>  {
> -	struct resource *res_irq;
>  	struct net_device *netdev;
>  	struct octeon_mgmt *p;
> -	int i;
> +	const __be32 *data;
> +	const u8 *mac;
> +	int len;
>  
>  	netdev = alloc_etherdev(sizeof(struct octeon_mgmt));
>  	if (netdev == NULL)
> @@ -1086,14 +1125,21 @@ static int __devinit octeon_mgmt_probe(struct platform_device *pdev)
>  	p->netdev = netdev;
>  	p->dev = &pdev->dev;
>  
> -	p->port = pdev->id;
> +	data = of_get_property(pdev->dev.of_node, "unit-number", &len);
> +	if (data && len == sizeof(*data))
> +		p->port = be32_to_cpup(data);
> +	else
> +		p->port = 0;

Port numbers should be dynamically assigned.  If you *really* want to
assign logical names to devices, then add properties to the /aliases
node.  unit-number is the sort of thing that causes problems in the
future.

> +
>  	snprintf(netdev->name, IFNAMSIZ, "mgmt%d", p->port);
>  
> -	res_irq = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
> -	if (!res_irq)
> +	p->irq = irq_of_parse_and_map(pdev->dev.of_node, 0);
> +	if (!p->irq)
>  		goto err;

Drop this hunk

>  
> -	p->irq = res_irq->start;
> +	p->mix = (u64)of_iomap(pdev->dev.of_node, 0);
> +	p->agl = (u64)of_iomap(pdev->dev.of_node, 1);
> +
>  	spin_lock_init(&p->lock);
>  
>  	skb_queue_head_init(&p->tx_list);
> @@ -1104,15 +1150,15 @@ static int __devinit octeon_mgmt_probe(struct platform_device *pdev)
>  	netdev->netdev_ops = &octeon_mgmt_ops;
>  	netdev->ethtool_ops = &octeon_mgmt_ethtool_ops;
>  
> -	/* The mgmt ports get the first N MACs.  */
> -	for (i = 0; i < 6; i++)
> -		netdev->dev_addr[i] = octeon_bootinfo->mac_addr_base[i];
> -	netdev->dev_addr[5] += p->port;
> +	mac = of_get_mac_address(pdev->dev.of_node);
> +
> +	if (mac)
> +		memcpy(netdev->dev_addr, mac, 6);
>  
> -	if (p->port >= octeon_bootinfo->mac_addr_count)
> -		dev_err(&pdev->dev,
> -			"Error %s: Using MAC outside of the assigned range: %pM\n",
> -			netdev->name, netdev->dev_addr);
> +	p->phy_np = of_parse_phandle(pdev->dev.of_node, "phy-handle", 0);
> +
> +	pdev->dev.coherent_dma_mask = DMA_BIT_MASK(64);
> +	pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
>  
>  	if (register_netdev(netdev))
>  		goto err;
> @@ -1133,10 +1179,19 @@ static int __devexit octeon_mgmt_remove(struct platform_device *pdev)
>  	return 0;
>  }
>  
> -static struct platform_driver octeon_mgmt_driver = {
> +static struct of_device_id octeon_mgmt_match[] = {
> +	{
> +		.compatible = "octeon,mgmt",

Ditto to the i2c comment.  Compatible value should be in the form
<vendor>,<soc-part-number>-<device>, and it needs to be documented in
Documentation/devicetree/bindings

> +	},
> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, octeon_mgmt_match);
> +
> +static struct of_platform_driver octeon_mgmt_driver = {

of_platform_driver is deprecated; just stick with a normal
platform_driver.

>  	.driver = {
>  		.name		= "octeon_mgmt",
>  		.owner		= THIS_MODULE,
> +		.of_match_table = octeon_mgmt_match,
>  	},
>  	.probe		= octeon_mgmt_probe,
>  	.remove		= __devexit_p(octeon_mgmt_remove),
> @@ -1148,12 +1203,12 @@ static int __init octeon_mgmt_mod_init(void)
>  {
>  	/* Force our mdiobus driver module to be loaded first. */
>  	octeon_mdiobus_force_mod_depencency();
> -	return platform_driver_register(&octeon_mgmt_driver);
> +	return of_register_platform_driver(&octeon_mgmt_driver);

Drop this hunk

>  }
>  
>  static void __exit octeon_mgmt_mod_exit(void)
>  {
> -	platform_driver_unregister(&octeon_mgmt_driver);
> +	of_unregister_platform_driver(&octeon_mgmt_driver);

Ditto

>  }
>  
>  module_init(octeon_mgmt_mod_init);
> -- 
> 1.7.2.3
> 

^ permalink raw reply

* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Patrick McHardy @ 2011-02-23 16:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen
In-Reply-To: <1298474091.3301.364.camel@edumazet-laptop>

Am 23.02.2011 16:14, schrieb Eric Dumazet:
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 16626a0..f40d32e 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -218,6 +218,7 @@ struct tcf_proto {
>  
>  struct qdisc_skb_cb {
>  	unsigned int		pkt_len;
> +	unsigned int		sfb_classid;
>  	char			data[];
>  };

This could be moved into a SFB specific cb, similar to what netem
does.

> diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
> new file mode 100644
> index 0000000..b7f1c6e
> --- /dev/null
> +++ b/net/sched/sch_sfb.c
> @@ -0,0 +1,696 @@
> +/*
> + * net/sched/sch_sfb.c	  Stochastic Fair Blue
> + *
> + * Copyright (c) 2008-2011 Juliusz Chroboczek <jch@pps.jussieu.fr>
> + * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * version 2 as published by the Free Software Foundation.
> + *
> + * W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: 
> + * A New Class of Active Queue Management Algorithms. 
> + * U. Michigan CSE-TR-387-99, April 1999.
> + *
> + * http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf
> + *
> + */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/skbuff.h>
> +#include <linux/random.h>
> +#include <linux/jhash.h>
> +#include <net/ip.h>
> +#include <net/pkt_sched.h>
> +#include <net/inet_ecn.h>
> +
> +/*
> + * SFB uses two B[l][n] : L x N arrays of bins (L levels, N bins per level)
> + * This implementation uses L = 8 and N = 16
> + * This permits us to split one 32bit hash (provided per packet by rxhash or
> + * external classifier) into 8 subhashes of 4 bits.
> + */
> +#define SFB_BUCKET_SHIFT 4

If you want to make this dynamic, there are a couple of papers analyzing
combined hash functions for bloom filters, f.i.
"Less Hashing, Same Performance: Building a Better Bloom Filter".

> +/*
> + * If using 'internal' SFB flow classifier, sfb_classid is skb rxhash
> + * If using external classifier, sfb_classid contains the classid.
> + */
> +static u32 sfb_hash(const struct sk_buff *skb, u32 slot,
> +		    struct sfb_sched_data *q)
> +{
> +	return jhash_1word(qdisc_skb_cb(skb)->sfb_classid,
> +			   q->bins[slot].perturbation);
> +}
> +
> +/* Probabilities are coded as Q0.16 fixed-point values,
> + * with 0xFFFF representing 65535/65536 (almost 1.0)
> + * Addition and subtraction are saturating in [0, 65535]
> + */
> +static u32 prob_plus(u32 p1, u32 p2)
> +{
> +	u32 res = p1 + p2;
> +
> +	return min_t(u32, res, SFB_MAX_PROB);
> +}
> +
> +static u32 prob_minus(u32 p1, u32 p2)
> +{
> +	return p1 > p2 ? p1 - p2 : 0;
> +}
> +
> +static void increment_one_qlen(u32 sfbhash, u32 slot, struct sfb_sched_data *q)
> +{
> +	int i;
> +	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
> +
> +	for (i = 0; i < SFB_LEVELS; i++) {
> +		u32 hash = sfbhash & SFB_BUCKET_MASK;
> +
> +		sfbhash >>= SFB_BUCKET_SHIFT;
> +		if (b[hash].qlen < 0xFFFF)
> +			b[hash].qlen++;
> +		b += SFB_NUMBUCKETS; /* next level */
> +	}
> +}
> +
> +static void increment_qlen(u32 hashes[2], struct sfb_sched_data *q)
> +{
> +	u32 slot = q->slot;
> +
> +	increment_one_qlen(hashes[slot], slot, q);
> +	if (q->double_buffering) {
> +		slot ^= 1;
> +		increment_one_qlen(hashes[slot], slot, q);
> +	}
> +}
> +
> +static void decrement_one_qlen(u32 sfbhash, u32 slot,
> +			       struct sfb_sched_data *q)
> +{
> +	int i;
> +	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
> +
> +	for (i = 0; i < SFB_LEVELS; i++) {
> +		u32 hash = sfbhash & SFB_BUCKET_MASK;
> +
> +		sfbhash >>= SFB_BUCKET_SHIFT;
> +		if (b[hash].qlen > 0)
> +			b[hash].qlen--;
> +		b += SFB_NUMBUCKETS; /* next level */
> +	}
> +}
> +
> +static void decrement_qlen(struct sk_buff *skb, struct sfb_sched_data *q)
> +{
> +	u32 slot = q->slot;
> +	u32 sfbhash = sfb_hash(skb, slot, q);
> +
> +	decrement_one_qlen(sfbhash, slot, q);
> +	if (q->double_buffering) {

This needs to be a per-skb property, otherwise you could have the
situation:

- enqueue skb, double_buffering=0, increment buffer 0
- enable double buffering
- swap buffers
- dequeue same skb, decrement buffer 1

after which the qlen values of buffer 1 will be incorrect.


> +		slot ^= 1;
> +		sfbhash = sfb_hash(skb, slot, q);

Isn't there room in the cb to store both hash values?

> +		decrement_one_qlen(sfbhash, slot, q);
> +	}
> +}
> +

^ permalink raw reply

* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Patrick McHardy @ 2011-02-23 16:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, David Miller, Juliusz Chroboczek,
	John W. Linville, netdev, Andi Kleen
In-Reply-To: <1298477600.3301.367.camel@edumazet-laptop>

Am 23.02.2011 17:13, schrieb Eric Dumazet:
> Le mercredi 23 février 2011 à 07:43 -0800, Stephen Hemminger a écrit :
>> On Wed, 23 Feb 2011 16:14:51 +0100
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>> 1) SFB default child qdisc is pfifo_fast. It can be changed by another
>>> qdisc but a child qdisc MUST not drop a packet previously queued. This
>>> is because SFB needs to handle a dequeued packet in order to maintain
>>> its virtual queue states. pfifo_head_drop or CHOKe should not be used.
>>
>> Why not add a flag field to Qdisc_ops and to mark qdisc's that
>> are (or not) work conserving?
>>
> 
> That was my initial idea, but have no idea how to implement it (outside
> of fast path, I mean...)

This also doesn't really have anything to do with work-conserving
qdiscs, SFB f.i. is work conserving, but still might drop other
packets. Actually I don't think there's any qdisc besides the
*fifos that can reasonably be used with SFB, so we might as well
only support a built-in qdisc.


^ permalink raw reply

* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Eric Dumazet @ 2011-02-23 16:13 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Patrick McHardy, netdev, Andi Kleen
In-Reply-To: <20110223074305.755c5713@nehalam>

Le mercredi 23 février 2011 à 07:43 -0800, Stephen Hemminger a écrit :
> On Wed, 23 Feb 2011 16:14:51 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > 1) SFB default child qdisc is pfifo_fast. It can be changed by another
> > qdisc but a child qdisc MUST not drop a packet previously queued. This
> > is because SFB needs to handle a dequeued packet in order to maintain
> > its virtual queue states. pfifo_head_drop or CHOKe should not be used.
> 
> Why not add a flag field to Qdisc_ops and to mark qdisc's that
> are (or not) work conserving?
> 

That was my initial idea, but have no idea how to implement it (outside
of fast path, I mean...)




^ permalink raw reply

* Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Michael S. Tsirkin @ 2011-02-23 15:55 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: anthony, arnd, avi, davem, eric.dumazet, Simon Horman, kvm,
	netdev, rusty
In-Reply-To: <OFA5C641DD.CCB1702C-ON65257840.0024C22E-65257840.002545D9@in.ibm.com>

On Wed, Feb 23, 2011 at 12:18:36PM +0530, Krishna Kumar2 wrote:
> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 02/23/2011 12:09:15 PM:
> 
> Hi Michael,
> 
> > > Yes. Michael Tsirkin had wanted to see how the MQ RX patch
> > > would look like, so I was in the process of getting the two
> > > working together. The patch is ready and is being tested.
> > > Should I send a RFC patch at this time?
> >
> > Yes, please do.
> 
> Sure, will get a build/test on latest bits and send in 1-2 days.
> 
> > > The TX-only patch helped the guest TX path but didn't help
> > > host->guest much (as tested using TCP_MAERTS from the guest).
> > > But with the TX+RX patch, both directions are getting
> > > improvements.
> >
> > Also, my hope is that with appropriate queue mapping,
> > we might be able to do away with heuristics to detect
> > single stream load that TX only code needs.
> 
> Yes, that whole stuff is removed, and the TX/RX path is
> unchanged with this patch (thankfully :)

Cool. I was wondering whether in that case, we can
do without host kernel changes at all,
and use a separate fd for each TX/RX pair.
The advantage of that approach is that this way,
the max fd limit naturally sets an upper bound
on the amount of resources userspace can use up.

Thoughts?

In any case, pls don't let the above delay
sending an RFC.

> > > Remote testing is still to be done.
> >
> > Others might be able to help here once you post the patch.
> 
> That's great, will appreciate any help.
> 
> Thanks,
> 
> - KK

^ permalink raw reply

* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Stephen Hemminger @ 2011-02-23 15:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Patrick McHardy, netdev, Andi Kleen
In-Reply-To: <1298474091.3301.364.camel@edumazet-laptop>

On Wed, 23 Feb 2011 16:14:51 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> 1) SFB default child qdisc is pfifo_fast. It can be changed by another
> qdisc but a child qdisc MUST not drop a packet previously queued. This
> is because SFB needs to handle a dequeued packet in order to maintain
> its virtual queue states. pfifo_head_drop or CHOKe should not be used.

Why not add a flag field to Qdisc_ops and to mark qdisc's that
are (or not) work conserving?

-- 

^ permalink raw reply

* Re: [V4 PATCH 3/3] bond: service netpoll arp queue on master device
From: Cong Wang @ 2011-02-23 15:34 UTC (permalink / raw)
  To: Neil Horman
  Cc: linux-kernel, Herbert Xu, David S. Miller, Neil Horman,
	Eric Dumazet, netdev
In-Reply-To: <20110223153041.GE11496@hmsreliant.think-freely.org>

于 2011年02月23日 23:30, Neil Horman 写道:
>
> seems reasonable, thanks!
> Acked-by: Neil Horman<nhorman@tuxdriver.com>
>

Thanks.

^ permalink raw reply

* Re: [V4 PATCH 3/3] bond: service netpoll arp queue on master device
From: Neil Horman @ 2011-02-23 15:30 UTC (permalink / raw)
  To: Cong Wang
  Cc: linux-kernel, Herbert Xu, David S. Miller, Neil Horman,
	Eric Dumazet, netdev
In-Reply-To: <4D6526BD.6030306@redhat.com>

On Wed, Feb 23, 2011 at 11:24:45PM +0800, Cong Wang wrote:
> Neil, ping. :)
> 
> 于 2011年02月18日 17:43, Amerigo Wang 写道:
> >Neil pointed out that we can't send ARP reply on behalf of slaves,
> >we need to move the arp queue to their bond device.
> >
> >Signed-off-by: WANG Cong<amwang@redhat.com>
> >Cc: Neil Horman<nhorman@redhat.com>
> >
> >---
> >  net/core/netpoll.c |    9 +++++++++
> >  1 files changed, 9 insertions(+), 0 deletions(-)
> >
> >diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> >index f68e694..013e04a 100644
> >--- a/net/core/netpoll.c
> >+++ b/net/core/netpoll.c
> >@@ -193,6 +193,15 @@ void netpoll_poll_dev(struct net_device *dev)
> >
> >  	poll_napi(dev);
> >
> >+	if (dev->priv_flags&  IFF_SLAVE) {
> >+		if (dev->npinfo) {
> >+			struct net_device *bond_dev = dev->master;
> >+			struct sk_buff *skb;
> >+			while ((skb = skb_dequeue(&dev->npinfo->arp_tx)))
> >+				skb_queue_tail(&bond_dev->npinfo->arp_tx, skb);
> >+		}
> >+	}
> >+
> >  	service_arp_queue(dev->npinfo);
> >
> >  	zap_completion_queue();
> 
> 
> -- 
> 我认识的人越多，我就越喜欢狗。

seems reasonable, thanks!
Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

* Re: [V4 PATCH 3/3] bond: service netpoll arp queue on master device
From: Cong Wang @ 2011-02-23 15:24 UTC (permalink / raw)
  To: Amerigo Wang
  Cc: linux-kernel, Neil Horman, Herbert Xu, David S. Miller,
	Neil Horman, Eric Dumazet, netdev
In-Reply-To: <1298022215-21059-3-git-send-email-amwang@redhat.com>

Neil, ping. :)

于 2011年02月18日 17:43, Amerigo Wang 写道:
> Neil pointed out that we can't send ARP reply on behalf of slaves,
> we need to move the arp queue to their bond device.
>
> Signed-off-by: WANG Cong<amwang@redhat.com>
> Cc: Neil Horman<nhorman@redhat.com>
>
> ---
>   net/core/netpoll.c |    9 +++++++++
>   1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index f68e694..013e04a 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -193,6 +193,15 @@ void netpoll_poll_dev(struct net_device *dev)
>
>   	poll_napi(dev);
>
> +	if (dev->priv_flags&  IFF_SLAVE) {
> +		if (dev->npinfo) {
> +			struct net_device *bond_dev = dev->master;
> +			struct sk_buff *skb;
> +			while ((skb = skb_dequeue(&dev->npinfo->arp_tx)))
> +				skb_queue_tail(&bond_dev->npinfo->arp_tx, skb);
> +		}
> +	}
> +
>   	service_arp_queue(dev->npinfo);
>
>   	zap_completion_queue();


-- 
我认识的人越多，我就越喜欢狗。

^ permalink raw reply

* [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Eric Dumazet @ 2011-02-23 15:14 UTC (permalink / raw)
  To: David Miller, Juliusz Chroboczek
  Cc: John W. Linville, Stephen Hemminger, Patrick McHardy, netdev,
	Andi Kleen
In-Reply-To: <1298390536.2861.9.camel@edumazet-laptop>

Hi David & Juliusz

Here is v3 of SFB. (previous ones were from Juliusz)

Thanks

[PATCH net-next-2.6 v3] net_sched: SFB flow scheduler

This is the Stochastic Fair Blue scheduler, based on work from :

W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.

http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf

This implementation is based on work done by Juliusz Chroboczek

General SFB algorithm can be found in figure 14, page 15:

B[l][n] : L x N array of bins (L levels, N bins per level)
enqueue()
Calculate hash function values h{0}, h{1}, .. h{L-1}
Update bins at each level
for i = 0 to L - 1
   if (B[i][h{i}].qlen > bin_size)
      B[i][h{i}].pm += delta;
   else if (B[i][h{i}].qlen == 0)
      B[i][h{i}].pm -= delta;
pmin = min(B[0][h{0}].pm ... B[L-1][h{L-1}].pm);
if (pmin == 1.0)
    ratelimit();
else
    mark/drop with probabilty pmin;

I did the adaptation of Juliusz code to meet current kernel standards,
and various changes to address previous comments :

http://thread.gmane.org/gmane.linux.network/90225
http://thread.gmane.org/gmane.linux.network/90375

Default flow classifier is the rxhash introduced by RPS in 2.6.35, but
we can use an external flow classifier if wanted.

tc qdisc add dev $IFB parent 1:11 handle 11:  \
	est 0.5sec 2sec sfb limit 128

tc filter add dev $DEV protocol ip parent 11: handle 3 \
	flow hash keys dst divisor 1024

Notes:

1) SFB default child qdisc is pfifo_fast. It can be changed by another
qdisc but a child qdisc MUST not drop a packet previously queued. This
is because SFB needs to handle a dequeued packet in order to maintain
its virtual queue states. pfifo_head_drop or CHOKe should not be used.

2) I added one field in qdisc_skb_cb because SFB needs to remember the
hash/classid of an skb to decrement virtual queue lengthes at dequeue()
time.

3) ECN is enabled by default, unlike RED/CHOKe/GRED

With help from Patrick McHardy & Andi Kleen

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: John W. Linville <linville@tuxdriver.com>
---
 include/linux/pkt_sched.h |   38 +
 include/net/sch_generic.h |    1 
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    1 
 net/sched/sch_sfb.c       |  696 ++++++++++++++++++++++++++++++++++++
 5 files changed, 747 insertions(+)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index d4bb6f5..629a8b0 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -522,4 +522,42 @@ struct tc_mqprio_qopt {
 	__u16	offset[TC_QOPT_MAX_QUEUE];
 };
 
+/* SFB */
+
+enum {
+	TCA_SFB_UNSPEC,
+	TCA_SFB_PARMS,
+	__TCA_SFB_MAX,
+};
+
+#define TCA_SFB_MAX (__TCA_SFB_MAX - 1)
+
+/*
+ * Note: increment, decrement are Q0.16 fixed-point values.
+ */
+struct tc_sfb_qopt {
+	__u32 rehash_interval;	/* delay between hash flip, in seconds */
+	__u32 db_interval;	/* double buffering interval in seconds (db_interval < rehash_interval) */
+	__u32 max;		/* max len of qlen_min */
+	__u32 target;		/* bin_size */
+	__u32 increment;	/* delta, (d1 in Blue) */
+	__u32 decrement;	/* delta, (d2 in Blue) */
+	__u32 limit;		/* max SFB queue length */
+	__u32 penalty_rate;
+	__u32 penalty_burst;
+};
+
+struct tc_sfb_xstats {
+	__u32 earlydrop;
+	__u32 penaltydrop;
+	__u32 bucketdrop;
+	__u32 queuedrop;
+	__u32 childdrop; /* drops in child qdisc */
+	__u32 marked;
+	__u32 maxqlen;
+	__u32 maxprob;
+};
+
+#define SFB_MAX_PROB 0xFFFF
+
 #endif
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 16626a0..f40d32e 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -218,6 +218,7 @@ struct tcf_proto {
 
 struct qdisc_skb_cb {
 	unsigned int		pkt_len;
+	unsigned int		sfb_classid;
 	char			data[];
 };
 
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 8c19b6e..a7a5583 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -126,6 +126,17 @@ config NET_SCH_RED
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_red.
 
+config NET_SCH_SFB
+	tristate "Stochastic Fair Blue (SFB)"
+	---help---
+	  Say Y here if you want to use the Stochastic Fair Blue (SFB)
+	  packet scheduling algorithm.
+
+	  See the top of <file:net/sched/sch_sfb.c> for more details.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_sfb.
+
 config NET_SCH_SFQ
 	tristate "Stochastic Fairness Queueing (SFQ)"
 	---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 06c6cdf..2e77b8d 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_NET_SCH_RED)	+= sch_red.o
 obj-$(CONFIG_NET_SCH_GRED)	+= sch_gred.o
 obj-$(CONFIG_NET_SCH_INGRESS)	+= sch_ingress.o 
 obj-$(CONFIG_NET_SCH_DSMARK)	+= sch_dsmark.o
+obj-$(CONFIG_NET_SCH_SFB)	+= sch_sfb.o
 obj-$(CONFIG_NET_SCH_SFQ)	+= sch_sfq.o
 obj-$(CONFIG_NET_SCH_TBF)	+= sch_tbf.o
 obj-$(CONFIG_NET_SCH_TEQL)	+= sch_teql.o
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
new file mode 100644
index 0000000..b7f1c6e
--- /dev/null
+++ b/net/sched/sch_sfb.c
@@ -0,0 +1,696 @@
+/*
+ * net/sched/sch_sfb.c	  Stochastic Fair Blue
+ *
+ * Copyright (c) 2008-2011 Juliusz Chroboczek <jch@pps.jussieu.fr>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: 
+ * A New Class of Active Queue Management Algorithms. 
+ * U. Michigan CSE-TR-387-99, April 1999.
+ *
+ * http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/random.h>
+#include <linux/jhash.h>
+#include <net/ip.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+
+/*
+ * SFB uses two B[l][n] : L x N arrays of bins (L levels, N bins per level)
+ * This implementation uses L = 8 and N = 16
+ * This permits us to split one 32bit hash (provided per packet by rxhash or
+ * external classifier) into 8 subhashes of 4 bits.
+ */
+#define SFB_BUCKET_SHIFT 4
+#define SFB_NUMBUCKETS	(1 << SFB_BUCKET_SHIFT) /* N bins per Level */
+#define SFB_BUCKET_MASK (SFB_NUMBUCKETS - 1)
+#define SFB_LEVELS	(32 / SFB_BUCKET_SHIFT) /* L */
+
+/* SFB algo uses a virtual queue, named "bin" */
+struct sfb_bucket {
+	u16		qlen; /* length of virtual queue */
+	u16		pm; /* marking probability */
+};
+
+/* We use a double buffering right before hash change
+ * (Section 4.4 of SFB reference : moving hash functions)
+ */
+struct sfb_bins {
+	u32		  perturbation; /* jhash perturbation */
+	struct sfb_bucket bins[SFB_LEVELS][SFB_NUMBUCKETS];
+};
+
+struct sfb_sched_data {
+	struct Qdisc	*qdisc;
+	struct tcf_proto *filter_list;
+	unsigned long	rehash_interval;
+	unsigned long	db_interval;	/* interval of double buffering */
+	u32		max;
+	u32		target;		/* bin_size */
+	u32		increment;	/* d1 */
+	u32		decrement;	/* d2 */
+	u32		limit;		/* HARD maximal queue length */
+	u32		penalty_rate;
+	u32		penalty_burst;
+	u32		tokens_avail;
+	unsigned long	rehash_time;
+	unsigned long	token_time;
+
+	u8		slot;		/* current active bins (0 or 1) */
+	bool		double_buffering;
+	struct sfb_bins bins[2];
+
+	struct {
+		u32	earlydrop;
+		u32	penaltydrop;
+		u32	bucketdrop;
+		u32	queuedrop;
+		u32	childdrop;	/* drops in child qdisc */
+		u32	marked;		/* ECN mark */
+	} stats;
+};
+
+/*
+ * If using 'internal' SFB flow classifier, sfb_classid is skb rxhash
+ * If using external classifier, sfb_classid contains the classid.
+ */
+static u32 sfb_hash(const struct sk_buff *skb, u32 slot,
+		    struct sfb_sched_data *q)
+{
+	return jhash_1word(qdisc_skb_cb(skb)->sfb_classid,
+			   q->bins[slot].perturbation);
+}
+
+/* Probabilities are coded as Q0.16 fixed-point values,
+ * with 0xFFFF representing 65535/65536 (almost 1.0)
+ * Addition and subtraction are saturating in [0, 65535]
+ */
+static u32 prob_plus(u32 p1, u32 p2)
+{
+	u32 res = p1 + p2;
+
+	return min_t(u32, res, SFB_MAX_PROB);
+}
+
+static u32 prob_minus(u32 p1, u32 p2)
+{
+	return p1 > p2 ? p1 - p2 : 0;
+}
+
+static void increment_one_qlen(u32 sfbhash, u32 slot, struct sfb_sched_data *q)
+{
+	int i;
+	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
+
+	for (i = 0; i < SFB_LEVELS; i++) {
+		u32 hash = sfbhash & SFB_BUCKET_MASK;
+
+		sfbhash >>= SFB_BUCKET_SHIFT;
+		if (b[hash].qlen < 0xFFFF)
+			b[hash].qlen++;
+		b += SFB_NUMBUCKETS; /* next level */
+	}
+}
+
+static void increment_qlen(u32 hashes[2], struct sfb_sched_data *q)
+{
+	u32 slot = q->slot;
+
+	increment_one_qlen(hashes[slot], slot, q);
+	if (q->double_buffering) {
+		slot ^= 1;
+		increment_one_qlen(hashes[slot], slot, q);
+	}
+}
+
+static void decrement_one_qlen(u32 sfbhash, u32 slot,
+			       struct sfb_sched_data *q)
+{
+	int i;
+	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
+
+	for (i = 0; i < SFB_LEVELS; i++) {
+		u32 hash = sfbhash & SFB_BUCKET_MASK;
+
+		sfbhash >>= SFB_BUCKET_SHIFT;
+		if (b[hash].qlen > 0)
+			b[hash].qlen--;
+		b += SFB_NUMBUCKETS; /* next level */
+	}
+}
+
+static void decrement_qlen(struct sk_buff *skb, struct sfb_sched_data *q)
+{
+	u32 slot = q->slot;
+	u32 sfbhash = sfb_hash(skb, slot, q);
+
+	decrement_one_qlen(sfbhash, slot, q);
+	if (q->double_buffering) {
+		slot ^= 1;
+		sfbhash = sfb_hash(skb, slot, q);
+		decrement_one_qlen(sfbhash, slot, q);
+	}
+}
+
+static void decrement_prob(struct sfb_bucket *b, struct sfb_sched_data *q)
+{
+	b->pm =	prob_minus(b->pm, q->decrement);
+}
+
+static void increment_prob(struct sfb_bucket *b, struct sfb_sched_data *q)
+{
+	b->pm = prob_plus(b->pm, q->increment);
+}
+
+static void sfb_zero_all_buckets(int slot, struct sfb_sched_data *q)
+{
+	memset(&q->bins[slot], 0, sizeof(q->bins[slot]));
+}
+
+/*
+ * compute max qlen and max pm
+ */
+static u32 sfb_compute_qlen(u32 *prob_r, const struct sfb_sched_data *q)
+{
+	int i;
+	u32 qlen = 0, prob = 0;
+	const struct sfb_bucket *b = &q->bins[q->slot].bins[0][0];
+
+	for (i = 0; i < SFB_LEVELS * SFB_NUMBUCKETS; i++) {
+		if (qlen < b->qlen)
+			qlen = b->qlen;
+		if (prob < b->pm)
+			prob = b->pm;
+		b++;
+	}
+	*prob_r = prob;
+	return qlen;
+}
+
+
+static void sfb_init_perturbation(u32 slot, struct sfb_sched_data *q)
+{
+	q->bins[slot].perturbation = net_random();
+}
+
+static void sfb_swap_buffers(struct sfb_sched_data *q)
+{
+	sfb_zero_all_buckets(q->slot, q);
+	sfb_init_perturbation(q->slot, q);
+	q->slot ^= 1;
+	q->double_buffering = false;
+}
+
+/* Non elastic flows are allowed to use part of the bandwidth, expressed
+ * in "penalty_rate" packets per second, with "penalty_burst" burst
+ */
+static bool sfb_rate_limit(struct sk_buff *skb, struct sfb_sched_data *q)
+{
+	if (q->penalty_rate == 0 || q->penalty_burst == 0)
+		return true;
+
+	if (q->tokens_avail < 1) {
+		unsigned long age = min(10UL * HZ, jiffies - q->token_time);
+
+		q->tokens_avail = (age * q->penalty_rate) / HZ;
+		if (q->tokens_avail > q->penalty_burst)
+			q->tokens_avail = q->penalty_burst;
+		q->token_time = jiffies;
+		if (q->tokens_avail < 1)
+			return true;
+	}
+
+	q->tokens_avail--;
+	return false;
+}
+
+static bool sfb_classify(struct sk_buff *skb, struct sfb_sched_data *q,
+			 int *qerr)
+{
+	struct tcf_result res;
+	int result;
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return false;
+		}
+#endif
+		qdisc_skb_cb(skb)->sfb_classid = TC_H_MIN(res.classid);
+		return true;
+	}
+	return false;
+}
+
+static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = q->qdisc;
+	int i;
+	u32 minprob = SFB_MAX_PROB;
+	u32 minqlen = ~0;
+	u32 r, slot, hashes[2], sfbhash;
+	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	if (q->filter_list) {
+		/* If using external classifiers, get result and record it. */
+		if (!sfb_classify(skb, q, &ret))
+			goto other_drop;
+	} else {
+		qdisc_skb_cb(skb)->sfb_classid = skb_get_rxhash(skb);
+	}
+
+	if (q->rehash_interval > 0) {
+		unsigned long limit = q->rehash_time + q->rehash_interval;
+
+		if (unlikely(time_after(jiffies, limit))) {
+			sfb_swap_buffers(q);
+			q->rehash_time = jiffies;
+		} else if (unlikely(!q->double_buffering && q->db_interval > 0 &&
+				    time_after(jiffies, limit - q->db_interval))) {
+			q->double_buffering = true;
+		}
+	}
+
+	slot = q->slot;
+
+	hashes[slot] = sfbhash = sfb_hash(skb, slot, q);
+	for (i = 0; i < SFB_LEVELS; i++) {
+		u32 hash = sfbhash & SFB_BUCKET_MASK;
+		struct sfb_bucket *b = &q->bins[slot].bins[i][hash];
+
+		sfbhash >>= SFB_BUCKET_SHIFT;
+		if (b->qlen == 0)
+			decrement_prob(b, q);
+		else if (unlikely(b->qlen >= q->target))
+			increment_prob(b, q);
+		if (minqlen > b->qlen)
+			minqlen = b->qlen;
+		if (minprob > b->pm)
+			minprob = b->pm;
+	}
+
+	if (q->double_buffering) {
+		slot ^= 1;
+		hashes[slot] = sfbhash = sfb_hash(skb, slot, q);
+		for (i = 0; i < SFB_LEVELS; i++) {
+			u32 hash = sfbhash & SFB_BUCKET_MASK;
+			struct sfb_bucket *b = &q->bins[slot].bins[i][hash];
+
+			sfbhash >>= SFB_BUCKET_SHIFT;
+			if (b->qlen == 0)
+				decrement_prob(b, q);
+			else if (unlikely(b->qlen >= q->target))
+				increment_prob(b, q);
+		}
+	}
+
+	if (unlikely(minqlen >= q->max || sch->q.qlen >= q->limit)) {
+		sch->qstats.overlimits++;
+		if (minqlen >= q->max)
+			q->stats.bucketdrop++;
+		else
+			q->stats.queuedrop++;
+		goto drop;
+	}
+
+	if (unlikely(minprob >= SFB_MAX_PROB)) {
+		/* Inelastic flow */
+		if (sfb_rate_limit(skb, q)) {
+			sch->qstats.overlimits++;
+			q->stats.penaltydrop++;
+			goto drop;
+		}
+		goto enqueue;
+	}
+
+	r = net_random() & SFB_MAX_PROB;
+
+	if (unlikely(r < minprob)) {
+		if (unlikely(minprob > SFB_MAX_PROB / 2)) {
+			/* If we're marking that many packets, then either
+			 * this flow is unresponsive, or we're badly congested.
+			 * In either case, we want to start dropping packets.
+			 */
+			if (r < (minprob - SFB_MAX_PROB / 2) * 2) {
+				q->stats.earlydrop++;
+				goto drop;
+			}
+		}
+		if (INET_ECN_set_ce(skb)) {
+			q->stats.marked++;
+		} else {
+			q->stats.earlydrop++;
+			goto drop;
+		}
+	}
+
+enqueue:
+	ret = qdisc_enqueue(skb, child);
+	if (likely(ret == NET_XMIT_SUCCESS)) {
+		sch->q.qlen++;
+		increment_qlen(hashes, q);
+	} else if (net_xmit_drop_count(ret)) {
+		q->stats.childdrop++;
+		sch->qstats.drops++;
+	}
+	return ret;
+
+drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+other_drop:
+	if (ret & __NET_XMIT_BYPASS)
+		sch->qstats.drops++;
+	kfree_skb(skb);
+	return ret;
+}
+
+static struct sk_buff *sfb_dequeue(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = q->qdisc;
+	struct sk_buff *skb;
+
+	skb = child->dequeue(q->qdisc);
+
+	if (skb) {
+		qdisc_bstats_update(sch, skb);
+		sch->q.qlen--;
+		decrement_qlen(skb, q);
+	}
+
+	return skb;
+}
+
+static struct sk_buff *sfb_peek(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = q->qdisc;
+
+	return child->ops->peek(child);
+}
+
+/* No sfb_drop -- impossible since the child doesn't return the dropped skb. */
+
+static void sfb_reset(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	qdisc_reset(q->qdisc);
+	sch->q.qlen = 0;
+	q->slot = 0;
+	q->double_buffering = false;
+	sfb_zero_all_buckets(0, q);
+	sfb_zero_all_buckets(1, q);
+	sfb_init_perturbation(0, q);
+}
+
+static void sfb_destroy(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	qdisc_destroy(q->qdisc);
+}
+
+static const struct nla_policy sfb_policy[TCA_SFB_MAX + 1] = {
+	[TCA_SFB_PARMS]	= { .len = sizeof(struct tc_sfb_qopt) },
+};
+
+static int sfb_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = NULL;
+	struct nlattr *tb[TCA_SFB_MAX + 1];
+	struct tc_sfb_qopt *ctl;
+	u32 rehash_interval, db_interval;
+	u32 limit;
+	u32 max, target;
+	u32 increment, decrement;
+	u32 penalty_rate, penalty_burst;
+	int err;
+
+	if (opt == NULL) {
+		rehash_interval = 600;
+		db_interval = 60;
+		limit = 0;
+		max = 25;
+		target = 20;
+		increment = (SFB_MAX_PROB + 500) / 1000; /* 0.1 % */
+		decrement = (SFB_MAX_PROB + 3000) / 6000;
+		penalty_rate = 10;
+		penalty_burst = 20;
+	} else {
+		err = nla_parse_nested(tb, TCA_SFB_MAX, opt, sfb_policy);
+		if (err < 0)
+			return -EINVAL;
+
+		if (tb[TCA_SFB_PARMS] == NULL)
+			return -EINVAL;
+
+		ctl = nla_data(tb[TCA_SFB_PARMS]);
+
+		rehash_interval = ctl->rehash_interval;
+		db_interval = ctl->db_interval;
+		limit = ctl->limit;
+		max = ctl->max;
+		target = ctl->target;
+		increment = ctl->increment;
+		decrement = ctl->decrement;
+		penalty_rate = ctl->penalty_rate;
+		penalty_burst = ctl->penalty_burst;
+	}
+
+	if (limit == 0)
+		limit = qdisc_dev(sch)->tx_queue_len;
+	if (limit == 0)
+		limit = 1;
+
+	child = fifo_create_dflt(sch, &pfifo_qdisc_ops, limit);
+	if (IS_ERR(child))
+		return PTR_ERR(child);
+
+	sch_tree_lock(sch);
+
+	qdisc_tree_decrease_qlen(q->qdisc, q->qdisc->q.qlen);
+	qdisc_destroy(q->qdisc);
+	q->qdisc = child;
+
+	q->rehash_interval = (unsigned long)rehash_interval * HZ;
+	q->db_interval = (unsigned long)db_interval * HZ;
+	q->rehash_time = jiffies;
+	q->limit = limit;
+	q->increment = increment;
+	q->decrement = decrement;
+	q->max = max;
+	q->target = target;
+	q->penalty_rate = penalty_rate;
+	q->penalty_burst = penalty_burst;
+	q->tokens_avail = penalty_burst;
+	q->token_time = jiffies;
+
+	q->slot = 0;
+	q->double_buffering = false;
+	sfb_zero_all_buckets(0, q);
+	sfb_zero_all_buckets(1, q);
+	sfb_init_perturbation(0, q);
+
+	sch_tree_unlock(sch);
+
+	return 0;
+}
+
+static int sfb_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	q->qdisc = &noop_qdisc;
+	return sfb_change(sch, opt);
+}
+
+static int sfb_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts;
+	struct tc_sfb_qopt opt = {
+		.rehash_interval = q->rehash_interval / HZ,
+		.db_interval = q->db_interval / HZ,
+		.limit = q->limit,
+		.max = q->max,
+		.target = q->target,
+		.increment = q->increment,
+		.decrement = q->decrement,
+		.penalty_rate = q->penalty_rate,
+		.penalty_burst = q->penalty_burst,
+	};
+
+	sch->qstats.backlog = q->qdisc->qstats.backlog;
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	NLA_PUT(skb, TCA_SFB_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int sfb_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct tc_sfb_xstats st = {
+		.earlydrop = q->stats.earlydrop,
+		.penaltydrop = q->stats.penaltydrop,
+		.bucketdrop = q->stats.bucketdrop,
+		.queuedrop = q->stats.queuedrop,
+		.childdrop = q->stats.childdrop,
+		.marked = q->stats.marked,
+	};
+
+	st.maxqlen = sfb_compute_qlen(&st.maxprob, q);
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static int sfb_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	return -ENOSYS;
+}
+
+static int sfb_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
+		     struct Qdisc **old)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	if (new == NULL)
+		new = &noop_qdisc;
+
+	sch_tree_lock(sch);
+	*old = q->qdisc;
+	q->qdisc = new;
+	qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
+	qdisc_reset(*old);
+	sch_tree_unlock(sch);
+	return 0;
+}
+
+static struct Qdisc *sfb_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	return q->qdisc;
+}
+
+static unsigned long sfb_get(struct Qdisc *sch, u32 classid)
+{
+	return 1;
+}
+
+static void sfb_put(struct Qdisc *sch, unsigned long arg)
+{
+}
+
+static int sfb_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
+			    struct nlattr **tca, unsigned long *arg)
+{
+	return -ENOSYS;
+}
+
+static int sfb_delete(struct Qdisc *sch, unsigned long cl)
+{
+	return -ENOSYS;
+}
+
+static void sfb_walk(struct Qdisc *sch, struct qdisc_walker *walker)
+{
+	if (!walker->stop) {
+		if (walker->count >= walker->skip)
+			if (walker->fn(sch, 1, walker) < 0) {
+				walker->stop = 1;
+				return;
+			}
+		walker->count++;
+	}
+}
+
+static struct tcf_proto **sfb_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static unsigned long sfb_bind(struct Qdisc *sch, unsigned long parent,
+			      u32 classid)
+{
+	return 0;
+}
+
+
+static const struct Qdisc_class_ops sfb_class_ops = {
+	.graft		=	sfb_graft,
+	.leaf		=	sfb_leaf,
+	.get		=	sfb_get,
+	.put		=	sfb_put,
+	.change		=	sfb_change_class,
+	.delete		=	sfb_delete,
+	.walk		=	sfb_walk,
+	.tcf_chain	=	sfb_find_tcf,
+	.bind_tcf	=	sfb_bind,
+	.unbind_tcf	=	sfb_put,
+	.dump		=	sfb_dump_class,
+};
+
+struct Qdisc_ops sfb_qdisc_ops __read_mostly = {
+	.id		=	"sfb",
+	.priv_size	=	sizeof(struct sfb_sched_data),
+	.cl_ops		=	&sfb_class_ops,
+	.enqueue	=	sfb_enqueue,
+	.dequeue	=	sfb_dequeue,
+	.peek		=	sfb_peek,
+	.init		=	sfb_init,
+	.reset		=	sfb_reset,
+	.destroy	=	sfb_destroy,
+	.change		=	sfb_change,
+	.dump		=	sfb_dump,
+	.dump_stats	=	sfb_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init sfb_module_init(void)
+{
+	return register_qdisc(&sfb_qdisc_ops);
+}
+
+static void __exit sfb_module_exit(void)
+{
+	unregister_qdisc(&sfb_qdisc_ops);
+}
+
+module_init(sfb_module_init)
+module_exit(sfb_module_exit)
+
+MODULE_DESCRIPTION("Stochastic Fair Blue queue discipline");
+MODULE_AUTHOR("Juliusz Chroboczek");
+MODULE_LICENSE("GPL");



^ permalink raw reply related

* [PATCH 2/2 net-next-2.6] r6040: bump to version 0.27 and date 23Feb2011
From: Florian Fainelli @ 2011-02-23 14:32 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: David Miller

From: Florian Fainelli <florian@openwrt.org>

Signed-off-by: Florian Fainelli <florian@openwrt.org>
---
diff --git a/drivers/net/r6040.c b/drivers/net/r6040.c
index 7965ae4..e3ebd90 100644
--- a/drivers/net/r6040.c
+++ b/drivers/net/r6040.c
@@ -49,8 +49,8 @@
 #include <asm/processor.h>

 #define DRV_NAME	"r6040"
-#define DRV_VERSION	"0.26"
-#define DRV_RELDATE	"30May2010"
+#define DRV_VERSION	"0.27"
+#define DRV_RELDATE	"23Feb2011"

 /* PHY CHIP Address */
 #define PHY1_ADDR	1	/* For MAC1 */
--
1.7.2.3


^ permalink raw reply related

* [PATCH 1/2 net-next-2.6] r6040: fix multicast operations
From: Florian Fainelli @ 2011-02-23 14:32 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: David Miller

From: Shawn Lin <shawn@dmp.com.tw>

The original code does not work well when the number of mulitcast
address to handle is greater than MCAST_MAX. It only enable promiscous
mode instead of multicast hash table mode, so the hash table function
will not be activated and all multicast frames will be recieved in this
condition.

This patch fixes the following issues with the r6040 NIC operating in
multicast:

1) When the IFF_ALLMULTI flag is set, we should write 0xffff to the NIC
hash table registers to make it process multicast traffic.

2) When the number of multicast address to handle is smaller than
MCAST_MAX, we should use the NIC multicast registers MID1_{L,M,H}.

3) The hashing of the address was not correct, due to an invalid
substraction (15 - (crc & 0x0f)) instead of (crc & 0x0f) and an
incorrect crc algorithm (ether_crc_le) instead of (ether_crc).

4) If necessary, we should set HASH_EN flag in MCR0 to enable multicast
hash table function.

CC: stable@kernel.org
Reported-by: Marc Leclerc <marc-leclerc@signaturealpha.com>
Tested-by: Marc Leclerc <marc-leclerc@signaturealpha.com>
Signed-off-by: Shawn Lin <shawn@dmp.com.tw>
Signed-off-by: Albert Chen <albert.chen@rdc.com.tw>
Signed-off-by: Florian Fainelli <florian@openwrt.org>
---
Stable: 2.6.33+

diff --git a/drivers/net/r6040.c b/drivers/net/r6040.c
index 27e6f6d..7965ae4 100644
--- a/drivers/net/r6040.c
+++ b/drivers/net/r6040.c
@@ -69,6 +69,8 @@

 /* MAC registers */
 #define MCR0		0x00	/* Control register 0 */
+#define  MCR0_PROMISC	0x0020	/* Promiscuous mode */
+#define  MCR0_HASH_EN	0x0100	/* Enable multicast hash table function */
 #define MCR1		0x04	/* Control register 1 */
 #define  MAC_RST	0x0001	/* Reset the MAC */
 #define MBCR		0x08	/* Bus control */
@@ -851,77 +853,92 @@ static void r6040_multicast_list(struct net_device *dev)
 {
 	struct r6040_private *lp = netdev_priv(dev);
 	void __iomem *ioaddr = lp->base;
-	u16 *adrp;
-	u16 reg;
 	unsigned long flags;
 	struct netdev_hw_addr *ha;
 	int i;
+	u16 *adrp;
+	u16 hash_table[4] = { 0 };
+
+	spin_lock_irqsave(&lp->lock, flags);

-	/* MAC Address */
+	/* Keep our MAC Address */
 	adrp = (u16 *)dev->dev_addr;
 	iowrite16(adrp[0], ioaddr + MID_0L);
 	iowrite16(adrp[1], ioaddr + MID_0M);
 	iowrite16(adrp[2], ioaddr + MID_0H);

-	/* Promiscous Mode */
-	spin_lock_irqsave(&lp->lock, flags);
-
 	/* Clear AMCP & PROM bits */
-	reg = ioread16(ioaddr) & ~0x0120;
-	if (dev->flags & IFF_PROMISC) {
-		reg |= 0x0020;
-		lp->mcr0 |= 0x0020;
-	}
-	/* Too many multicast addresses
-	 * accept all traffic */
-	else if ((netdev_mc_count(dev) > MCAST_MAX) ||
-		 (dev->flags & IFF_ALLMULTI))
-		reg |= 0x0020;
+	lp->mcr0 = ioread16(ioaddr + MCR0) & ~(MCR0_PROMISC | MCR0_HASH_EN);

-	iowrite16(reg, ioaddr);
-	spin_unlock_irqrestore(&lp->lock, flags);
+	/* Promiscuous mode */
+	if (dev->flags & IFF_PROMISC)
+		lp->mcr0 |= MCR0_PROMISC;

-	/* Build the hash table */
-	if (netdev_mc_count(dev) > MCAST_MAX) {
-		u16 hash_table[4];
-		u32 crc;
+	/* Enable multicast hash table function to
+	 * receive all multicast packets. */
+	else if (dev->flags & IFF_ALLMULTI) {
+		lp->mcr0 |= MCR0_HASH_EN;

-		for (i = 0; i < 4; i++)
-			hash_table[i] = 0;
+		for (i = 0; i < MCAST_MAX ; i++) {
+			iowrite16(0, ioaddr + MID_1L + 8 * i);
+			iowrite16(0, ioaddr + MID_1M + 8 * i);
+			iowrite16(0, ioaddr + MID_1H + 8 * i);
+		}

+		for (i = 0; i < 4; i++)
+			hash_table[i] = 0xffff;
+	}
+	/* Use internal multicast address registers if the number of
+	 * multicast addresses is not greater than MCAST_MAX. */
+	else if (netdev_mc_count(dev) <= MCAST_MAX) {
+		i = 0;
 		netdev_for_each_mc_addr(ha, dev) {
-			char *addrs = ha->addr;
+			u16 *adrp = (u16 *) ha->addr;
+			iowrite16(adrp[0], ioaddr + MID_1L + 8 * i);
+			iowrite16(adrp[1], ioaddr + MID_1M + 8 * i);
+			iowrite16(adrp[2], ioaddr + MID_1H + 8 * i);
+			i++;
+		}
+		while (i < MCAST_MAX) {
+			iowrite16(0, ioaddr + MID_1L + 8 * i);
+			iowrite16(0, ioaddr + MID_1M + 8 * i);
+			iowrite16(0, ioaddr + MID_1H + 8 * i);
+			i++;
+		}
+	}
+	/* Otherwise, Enable multicast hash table function. */
+	else {
+		u32 crc;

-			if (!(*addrs & 1))
-				continue;
+		lp->mcr0 |= MCR0_HASH_EN;
+
+		for (i = 0; i < MCAST_MAX ; i++) {
+			iowrite16(0, ioaddr + MID_1L + 8 * i);
+			iowrite16(0, ioaddr + MID_1M + 8 * i);
+			iowrite16(0, ioaddr + MID_1H + 8 * i);
+		}

-			crc = ether_crc_le(6, addrs);
+		/* Build multicast hash table */
+		netdev_for_each_mc_addr(ha, dev) {
+			u8 *addrs = ha->addr;
+
+			crc = ether_crc(ETH_ALEN, addrs);
 			crc >>= 26;
-			hash_table[crc >> 4] |= 1 << (15 - (crc & 0xf));
+			hash_table[crc >> 4] |= 1 << (crc & 0xf);
 		}
-		/* Fill the MAC hash tables with their values */
+	}
+
+	iowrite16(lp->mcr0, ioaddr + MCR0);
+
+	/* Fill the MAC hash tables with their values */
+	if (lp->mcr0 && MCR0_HASH_EN) {
 		iowrite16(hash_table[0], ioaddr + MAR0);
 		iowrite16(hash_table[1], ioaddr + MAR1);
 		iowrite16(hash_table[2], ioaddr + MAR2);
 		iowrite16(hash_table[3], ioaddr + MAR3);
 	}
-	/* Multicast Address 1~4 case */
-	i = 0;
-	netdev_for_each_mc_addr(ha, dev) {
-		if (i >= MCAST_MAX)
-			break;
-		adrp = (u16 *) ha->addr;
-		iowrite16(adrp[0], ioaddr + MID_1L + 8 * i);
-		iowrite16(adrp[1], ioaddr + MID_1M + 8 * i);
-		iowrite16(adrp[2], ioaddr + MID_1H + 8 * i);
-		i++;
-	}
-	while (i < MCAST_MAX) {
-		iowrite16(0xffff, ioaddr + MID_1L + 8 * i);
-		iowrite16(0xffff, ioaddr + MID_1M + 8 * i);
-		iowrite16(0xffff, ioaddr + MID_1H + 8 * i);
-		i++;
-	}
+
+	spin_unlock_irqrestore(&lp->lock, flags);
 }

 static void netdev_get_drvinfo(struct net_device *dev,
--
1.7.2.3


^ permalink raw reply related

* check for return value of clk_get() in mpc512x_can_get_clock()
From: Andrej Krutak @ 2011-02-23 13:35 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: socketcan-core, netdev

Hi,

just wanted to inform you - in the mpc512x_can_get_clock() function 
(drivers/net/can/mscan/mpc5xxx_can.c), there are calls to clg_get(), and later 
checks

if (!sys_clk) {
and
if (!ref_clk) {

but these should be

if (IS_ERR(sys_clk)) {
and
if (IS_ERR(ref_clk)) {

(otherwise you'll get an oops if something goes wrong with clg_get()).


-- 
Best regards | S pozdravom | Mit freundlichen Grüßen

Andrej Krutak

^ permalink raw reply

* [PATCHv2 NEXT 2/2] qlcnic: Remove validation for max tx and max rx queues
From: Amit Kumar Salecha @ 2011-02-23 13:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty, Rajesh Borundia
In-Reply-To: <1298467285-7201-1-git-send-email-amit.salecha@qlogic.com>

From: Rajesh Borundia <rajesh.borundia@qlogic.com>

Max rx queues and tx queues are governed by fimware.
So driver should not validate these values.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    4 ----
 drivers/net/qlcnic/qlcnic_main.c |    6 ++----
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index fa7f794..dc44564 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -1132,14 +1132,10 @@ struct qlcnic_eswitch {
 #define MAX_BW			100	/* % of link speed */
 #define MAX_VLAN_ID		4095
 #define MIN_VLAN_ID		2
-#define MAX_TX_QUEUES		1
-#define MAX_RX_QUEUES		4
 #define DEFAULT_MAC_LEARN	1
 
 #define IS_VALID_VLAN(vlan)	(vlan >= MIN_VLAN_ID && vlan < MAX_VLAN_ID)
 #define IS_VALID_BW(bw)		(bw <= MAX_BW)
-#define IS_VALID_TX_QUEUES(que)	(que > 0 && que <= MAX_TX_QUEUES)
-#define IS_VALID_RX_QUEUES(que)	(que > 0 && que <= MAX_RX_QUEUES)
 
 struct qlcnic_pci_func_cfg {
 	u16	func_type;
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 4994b94..cd88c7e 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -3653,10 +3653,8 @@ validate_npar_config(struct qlcnic_adapter *adapter,
 		if (adapter->npars[pci_func].type != QLCNIC_TYPE_NIC)
 			return QL_STATUS_INVALID_PARAM;
 
-		if (!IS_VALID_BW(np_cfg[i].min_bw)
-				|| !IS_VALID_BW(np_cfg[i].max_bw)
-				|| !IS_VALID_RX_QUEUES(np_cfg[i].max_rx_queues)
-				|| !IS_VALID_TX_QUEUES(np_cfg[i].max_tx_queues))
+		if (!IS_VALID_BW(np_cfg[i].min_bw) ||
+		    !IS_VALID_BW(np_cfg[i].max_bw))
 			return QL_STATUS_INVALID_PARAM;
 	}
 	return 0;
-- 
1.7.3.2


^ permalink raw reply related

* [PATCHv2 NEXT 1/2] qlcnic: fix checks for auto_fw_reset
From: Amit Kumar Salecha @ 2011-02-23 13:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty
In-Reply-To: <1298467285-7201-1-git-send-email-amit.salecha@qlogic.com>

o Remove checks of 1 for auto_fw_reset module parameter.
  auto_fw_reset is of type int and can have value > 1.
o Remove unnecessary #define for 1

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    1 -
 drivers/net/qlcnic/qlcnic_main.c |    9 ++++-----
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 44e316f..fa7f794 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -867,7 +867,6 @@ struct qlcnic_nic_intr_coalesce {
 #define LINKEVENT_LINKSPEED_MBPS	0
 #define LINKEVENT_LINKSPEED_ENCODED	1
 
-#define AUTO_FW_RESET_ENABLED	0x01
 /* firmware response header:
  *	63:58 - message type
  *	57:56 - owner
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 37c04b4..4994b94 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -42,7 +42,7 @@ static int use_msi_x = 1;
 module_param(use_msi_x, int, 0444);
 MODULE_PARM_DESC(use_msi_x, "MSI-X interrupt (0=disabled, 1=enabled");
 
-static int auto_fw_reset = AUTO_FW_RESET_ENABLED;
+static int auto_fw_reset = 1;
 module_param(auto_fw_reset, int, 0644);
 MODULE_PARM_DESC(auto_fw_reset, "Auto firmware reset (0=disabled, 1=enabled");
 
@@ -2959,8 +2959,7 @@ qlcnic_check_health(struct qlcnic_adapter *adapter)
 		if (adapter->need_fw_reset)
 			goto detach;
 
-		if (adapter->reset_context &&
-		    auto_fw_reset == AUTO_FW_RESET_ENABLED) {
+		if (adapter->reset_context && auto_fw_reset) {
 			qlcnic_reset_hw_context(adapter);
 			adapter->netdev->trans_start = jiffies;
 		}
@@ -2973,7 +2972,7 @@ qlcnic_check_health(struct qlcnic_adapter *adapter)
 
 	qlcnic_dev_request_reset(adapter);
 
-	if ((auto_fw_reset == AUTO_FW_RESET_ENABLED))
+	if (auto_fw_reset)
 		clear_bit(__QLCNIC_FW_ATTACHED, &adapter->state);
 
 	dev_info(&netdev->dev, "firmware hang detected\n");
@@ -2982,7 +2981,7 @@ detach:
 	adapter->dev_state = (state == QLCNIC_DEV_NEED_QUISCENT) ? state :
 		QLCNIC_DEV_NEED_RESET;
 
-	if ((auto_fw_reset == AUTO_FW_RESET_ENABLED) &&
+	if (auto_fw_reset &&
 		!test_and_set_bit(__QLCNIC_RESETTING, &adapter->state)) {
 
 		qlcnic_schedule_work(adapter, qlcnic_detach_work, 0);
-- 
1.7.3.2


^ permalink raw reply related

* [PATCHv2 NEXT 0/2]qlcnic: minor fixes
From: Amit Kumar Salecha @ 2011-02-23 13:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty

Hi
  Series v2 of minor fixes. Apply them on net-next.
  Now, patch 1 doesn't change module parameters name and its properties.

-Amit

^ permalink raw reply

* Re: [PATCH V3] Export ACPI _DSM provided firmware instance number and string name to sysfs
From: Narendra_K @ 2011-02-23 13:00 UTC (permalink / raw)
  To: mjg59
  Cc: linux-pci, linux-hotplug, netdev, Matt_Domsch, Charles_Rose,
	Jordan_Hargrave, Shyam_Iyer
In-Reply-To: <20110223124419.GA5695@srcf.ucam.org>

On Wed, Feb 23, 2011 at 06:14:19PM +0530, Matthew Garrett wrote:
> I think this version will still break the build. You need to depend on 
> CONFIG_NLS.

Matthew,

Thanks. I posted a patch to linux-next to fix the build failure.

With regards,
Narendra K

^ permalink raw reply

* [PATCH] f_phonet: avoid pskb_pull(), fix OOPS with CONFIG_HIGHMEM
From: Rémi Denis-Courmont @ 2011-02-23 12:51 UTC (permalink / raw)
  To: netdev
In-Reply-To: <linux-usb@vger.kernel.org>

This is similar to what we already do in cdc-phonet.c in the same
situation. pskb_pull() refuses to work with HIGHMEM, even if it is
known that the socket buffer is entirely in "low" memory.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
---
 drivers/usb/gadget/f_phonet.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/gadget/f_phonet.c b/drivers/usb/gadget/f_phonet.c
index 3c6e1a0..5e14950 100644
--- a/drivers/usb/gadget/f_phonet.c
+++ b/drivers/usb/gadget/f_phonet.c
@@ -346,14 +346,19 @@ static void pn_rx_complete(struct usb_ep *ep, struct usb_request *req)
 
 		if (unlikely(!skb))
 			break;
-		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, 0,
-				req->actual);
-		page = NULL;
 
-		if (req->actual < req->length) { /* Last fragment */
+		if (skb->len == 0) { /* First fragment */
 			skb->protocol = htons(ETH_P_PHONET);
 			skb_reset_mac_header(skb);
-			pskb_pull(skb, 1);
+			/* Can't use pskb_pull() on page in IRQ */
+			memcpy(skb_put(skb, 1), page_address(page), 1);
+		}
+
+		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
+				skb->len == 0, req->actual);
+		page = NULL;
+
+		if (req->actual < req->length) { /* Last fragment */
 			skb->dev = dev;
 			dev->stats.rx_packets++;
 			dev->stats.rx_bytes += skb->len;
-- 
1.7.1


^ permalink raw reply related

* Re: [PATCH V3] Export ACPI _DSM provided firmware instance number and string name to sysfs
From: Matthew Garrett @ 2011-02-23 12:44 UTC (permalink / raw)
  To: Narendra_K
  Cc: linux-pci, linux-hotplug, netdev, Matt_Domsch, Charles_Rose,
	Jordan_Hargrave, Shyam_Iyer
In-Reply-To: <20110223125741.GA16473@fedora14-r610.blr.amer.dell.com>

I think this version will still break the build. You need to depend on 
CONFIG_NLS.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply

* [PATCH V3] Export ACPI _DSM provided firmware instance number and string name to sysfs
From: Narendra_K @ 2011-02-23 12:36 UTC (permalink / raw)
  To: linux-pci, linux-hotplug
  Cc: netdev, mjg, Matt_Domsch, Charles_Rose, Jordan_Hargrave,
	Shyam_Iyer

Hello,

This patch exports ACPI _DSM provided firmware instance number and
string name to sysfs.

V1 -> V2:
The attribute 'index' is changed to 'acpi_index' as the semantics of
SMBIOS provided device type instance and ACPI _DSM provided firmware
instance number are different.

V2 -> V3:
Matthew Garrett pointed out that 'sysfs_create_groups' does return an
error when there are no ACPI _DSM attributes available and because of
that the fallback to SMBIOS will not happen. As a result SMBIOS provided
attributes are not created. This version of the patch addresses the issue.

Please consider this patch for inclusion.

From: Narendra K <narendra_k@dell.com>
Subject: [PATCH] Export ACPI _DSM provided firmware instance number and string to sysfs

This patch exports ACPI _DSM (Device Specific Method) provided firmware
instance number and string name of PCI devices as defined by
'PCI Firmware Specification Revision 3.1' section 4.6.7.( DSM for Naming
a PCI or PCI Express Device Under Operating Systems) to sysfs.

New files created are:
  /sys/bus/pci/devices/.../label which contains the firmware name for
the device in question, and
  /sys/bus/pci/devices/.../acpi_index which contains the firmware device type
instance for the given device.

cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/acpi_index
1
cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/label
Embedded Broadcom 5709C NIC 1

cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/acpi_index
2
cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/label
Embedded Broadcom 5709C NIC 2

The ACPI _DSM provided firmware 'instance number' and 'string name' will
be given priority if the firmware also provides 'SMBIOS type 41 device
type instance and string'.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jordan Hargrave <jordan_hargrave@dell.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
---
 Documentation/ABI/testing/sysfs-bus-pci |   31 +++-
 drivers/pci/Makefile                    |    3 +-
 drivers/pci/pci-label.c                 |  247 ++++++++++++++++++++++++++++++-
 drivers/pci/pci.h                       |    2 +-
 4 files changed, 268 insertions(+), 15 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index f979d82..36bf454 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -145,9 +145,11 @@ Date:		July 2010
 Contact:	Narendra K <narendra_k@dell.com>, linux-bugs@dell.com
 Description:
 		Reading this attribute will provide the firmware
-		given name(SMBIOS type 41 string) of the PCI device.
-		The attribute will be created only if the firmware
-		has given a name to the PCI device.
+		given name (SMBIOS type 41 string or ACPI _DSM string) of
+		the PCI device.	The attribute will be created only
+		if the firmware	has given a name to the PCI device.
+		ACPI _DSM string name will be given priority if the
+		system firmware provides SMBIOS type 41 string also.
 Users:
 		Userspace applications interested in knowing the
 		firmware assigned name of the PCI device.
@@ -157,12 +159,27 @@ Date:		July 2010
 Contact:	Narendra K <narendra_k@dell.com>, linux-bugs@dell.com
 Description:
 		Reading this attribute will provide the firmware
-		given instance(SMBIOS type 41 device type instance)
-		of the PCI device. The attribute will be created
-		only if the firmware has given a device type instance
-		to the PCI device.
+		given instance (SMBIOS type 41 device type instance) of the
+		PCI device. The attribute will be created only if the firmware
+		has given an instance number to the PCI device.
 Users:
 		Userspace applications interested in knowing the
 		firmware assigned device type instance of the PCI
 		device that can help in understanding the firmware
 		intended order of the PCI device.
+
+What:		/sys/bus/pci/devices/.../acpi_index
+Date:		July 2010
+Contact:	Narendra K <narendra_k@dell.com>, linux-bugs@dell.com
+Description:
+		Reading this attribute will provide the firmware
+		given instance (ACPI _DSM instance number) of the PCI device.
+		The attribute will be created only if the firmware has given
+		an instance number to the PCI device. ACPI _DSM instance number
+		will be given priority if the system firmware provides SMBIOS
+		type 41 device type instance also.
+Users:
+		Userspace applications interested in knowing the
+		firmware assigned instance number of the PCI
+		device that can help in understanding the firmware
+		intended order of the PCI device.
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 98e6fdf..bb1d3b2 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -53,8 +53,9 @@ obj-$(CONFIG_TILE) += setup-bus.o setup-irq.o
 
 #
 # ACPI Related PCI FW Functions
+# ACPI _DSM provided firmware instance and string name
 #
-obj-$(CONFIG_ACPI)    += pci-acpi.o
+obj-$(CONFIG_ACPI)    += pci-acpi.o pci-label.o
 
 # SMBIOS provided firmware instance and labels
 obj-$(CONFIG_DMI)    += pci-label.o
diff --git a/drivers/pci/pci-label.c b/drivers/pci/pci-label.c
index 90c0a72..824e247 100644
--- a/drivers/pci/pci-label.c
+++ b/drivers/pci/pci-label.c
@@ -5,6 +5,13 @@
  * by Narendra K <Narendra_K@dell.com>,
  * Jordan Hargrave <Jordan_Hargrave@dell.com>
  *
+ * PCI Firmware Specification Revision 3.1 section 4.6.7 (DSM for Naming a
+ * PCI or PCI Express Device Under Operating Systems) defines an instance
+ * number and string name. This code retrieves them and exports them to sysfs.
+ * If the system firmware does not provide the ACPI _DSM (Device Specific
+ * Method), then the SMBIOS type 41 instance number and string is exported to
+ * sysfs.
+ *
  * SMBIOS defines type 41 for onboard pci devices. This code retrieves
  * the instance number and string from the type 41 record and exports
  * it to sysfs.
@@ -19,8 +26,30 @@
 #include <linux/pci_ids.h>
 #include <linux/module.h>
 #include <linux/device.h>
+#include <linux/nls.h>
+#include <linux/acpi.h>
+#include <linux/pci-acpi.h>
+#include <acpi/acpi_drivers.h>
+#include <acpi/acpi_bus.h>
 #include "pci.h"
 
+#define	DEVICE_LABEL_DSM	0x07
+
+#ifndef CONFIG_DMI
+
+static inline int
+pci_create_smbiosname_file(struct pci_dev *pdev)
+{
+	return -1;
+}
+
+static inline void
+pci_remove_smbiosname_file(struct pci_dev *pdev)
+{
+}
+
+#else
+
 enum smbios_attr_enum {
 	SMBIOS_ATTR_NONE = 0,
 	SMBIOS_ATTR_LABEL_SHOW,
@@ -120,9 +149,7 @@ static struct attribute_group smbios_attr_group = {
 static int
 pci_create_smbiosname_file(struct pci_dev *pdev)
 {
-	if (!sysfs_create_group(&pdev->dev.kobj, &smbios_attr_group))
-		return 0;
-	return -ENODEV;
+	return sysfs_create_group(&pdev->dev.kobj, &smbios_attr_group);
 }
 
 static void
@@ -131,13 +158,221 @@ pci_remove_smbiosname_file(struct pci_dev *pdev)
 	sysfs_remove_group(&pdev->dev.kobj, &smbios_attr_group);
 }
 
+#endif
+
+#ifndef CONFIG_ACPI
+
+static inline int
+pci_create_acpi_index_label_files(struct pci_dev *pdev)
+{
+	return -1;
+}
+
+static inline int
+pci_remove_acpi_index_label_files(struct pci_dev *pdev)
+{
+	return -1;
+}
+
+#else
+
+static const char device_label_dsm_uuid[] = {
+	0xD0, 0x37, 0xC9, 0xE5, 0x53, 0x35, 0x7A, 0x4D,
+	0x91, 0x17, 0xEA, 0x4D, 0x19, 0xC3, 0x43, 0x4D
+};
+
+enum acpi_attr_enum {
+	ACPI_ATTR_NONE = 0,
+	ACPI_ATTR_LABEL_SHOW,
+	ACPI_ATTR_INDEX_SHOW,
+};
+
+static void dsm_label_utf16s_to_utf8s(union acpi_object *obj, char *buf)
+{
+	int len;
+	len = utf16s_to_utf8s((const wchar_t *)obj->
+			      package.elements[1].string.pointer,
+			      obj->package.elements[1].string.length,
+			      UTF16_LITTLE_ENDIAN,
+			      buf, PAGE_SIZE);
+	buf[len] = '\n';
+}
+
+static int
+dsm_get_label(acpi_handle handle, int func,
+	      struct acpi_buffer *output,
+	      char *buf, enum acpi_attr_enum attribute)
+{
+	struct acpi_object_list input;
+	union acpi_object params[4];
+	union acpi_object *obj;
+	int len = 0;
+
+	int err;
+
+	input.count = 4;
+	input.pointer = params;
+	params[0].type = ACPI_TYPE_BUFFER;
+	params[0].buffer.length = sizeof(device_label_dsm_uuid);
+	params[0].buffer.pointer = (char *)device_label_dsm_uuid;
+	params[1].type = ACPI_TYPE_INTEGER;
+	params[1].integer.value = 0x02;
+	params[2].type = ACPI_TYPE_INTEGER;
+	params[2].integer.value = func;
+	params[3].type = ACPI_TYPE_PACKAGE;
+	params[3].package.count = 0;
+	params[3].package.elements = NULL;
+
+	err = acpi_evaluate_object(handle, "_DSM", &input, output);
+	if (err)
+		return -1;
+
+	obj = (union acpi_object *)output->pointer;
+
+	switch (obj->type) {
+	case ACPI_TYPE_PACKAGE:
+		if (obj->package.count != 2)
+			break;
+		len = obj->package.elements[0].integer.value;
+		if (buf) {
+			if (attribute == ACPI_ATTR_INDEX_SHOW)
+				scnprintf(buf, PAGE_SIZE, "%llu\n",
+				obj->package.elements[0].integer.value);
+			else if (attribute == ACPI_ATTR_LABEL_SHOW)
+				dsm_label_utf16s_to_utf8s(obj, buf);
+			kfree(output->pointer);
+			return strlen(buf);
+		}
+		kfree(output->pointer);
+		return len;
+	break;
+	default:
+		kfree(output->pointer);
+	}
+	return -1;
+}
+
+static bool
+device_has_dsm(struct device *dev)
+{
+	acpi_handle handle;
+	struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL};
+
+	handle = DEVICE_ACPI_HANDLE(dev);
+
+	if (!handle)
+		return FALSE;
+
+	if (dsm_get_label(handle, DEVICE_LABEL_DSM, &output, NULL,
+			  ACPI_ATTR_NONE) > 0)
+		return TRUE;
+
+	return FALSE;
+}
+
+static mode_t
+acpi_index_string_exist(struct kobject *kobj, struct attribute *attr, int n)
+{
+	struct device *dev;
+
+	dev = container_of(kobj, struct device, kobj);
+
+	if (device_has_dsm(dev))
+		return S_IRUGO;
+
+	return 0;
+}
+
+static ssize_t
+acpilabel_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL};
+	acpi_handle handle;
+	int length;
+
+	handle = DEVICE_ACPI_HANDLE(dev);
+
+	if (!handle)
+		return -1;
+
+	length = dsm_get_label(handle, DEVICE_LABEL_DSM,
+			       &output, buf, ACPI_ATTR_LABEL_SHOW);
+
+	if (length < 1)
+		return -1;
+
+	return length;
+}
+
+static ssize_t
+acpiindex_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL};
+	acpi_handle handle;
+	int length;
+
+	handle = DEVICE_ACPI_HANDLE(dev);
+
+	if (!handle)
+		return -1;
+
+	length = dsm_get_label(handle, DEVICE_LABEL_DSM,
+			       &output, buf, ACPI_ATTR_INDEX_SHOW);
+
+	if (length < 0)
+		return -1;
+
+	return length;
+
+}
+
+static struct device_attribute acpi_attr_label = {
+	.attr = {.name = "label", .mode = 0444},
+	.show = acpilabel_show,
+};
+
+static struct device_attribute acpi_attr_index = {
+	.attr = {.name = "acpi_index", .mode = 0444},
+	.show = acpiindex_show,
+};
+
+static struct attribute *acpi_attributes[] = {
+	&acpi_attr_label.attr,
+	&acpi_attr_index.attr,
+	NULL,
+};
+
+static struct attribute_group acpi_attr_group = {
+	.attrs = acpi_attributes,
+	.is_visible = acpi_index_string_exist,
+};
+
+static int
+pci_create_acpi_index_label_files(struct pci_dev *pdev)
+{
+	return sysfs_create_group(&pdev->dev.kobj, &acpi_attr_group);
+}
+
+static int
+pci_remove_acpi_index_label_files(struct pci_dev *pdev)
+{
+	sysfs_remove_group(&pdev->dev.kobj, &acpi_attr_group);
+	return 0;
+}
+#endif
+
 void pci_create_firmware_label_files(struct pci_dev *pdev)
 {
-	if (!pci_create_smbiosname_file(pdev))
-		;
+	if (device_has_dsm(&pdev->dev))
+		pci_create_acpi_index_label_files(pdev);
+	else
+		pci_create_smbiosname_file(pdev);
 }
 
 void pci_remove_firmware_label_files(struct pci_dev *pdev)
 {
-	pci_remove_smbiosname_file(pdev);
+	if (device_has_dsm(&pdev->dev))
+		pci_remove_acpi_index_label_files(pdev);
+	else
+		pci_remove_smbiosname_file(pdev);
 }
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index f69d6e0..a6ec200 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -11,7 +11,7 @@
 extern int pci_uevent(struct device *dev, struct kobj_uevent_env *env);
 extern int pci_create_sysfs_dev_files(struct pci_dev *pdev);
 extern void pci_remove_sysfs_dev_files(struct pci_dev *pdev);
-#ifndef CONFIG_DMI
+#if !defined(CONFIG_DMI) && !defined(CONFIG_ACPI)
 static inline void pci_create_firmware_label_files(struct pci_dev *pdev)
 { return; }
 static inline void pci_remove_firmware_label_files(struct pci_dev *pdev)
-- 
1.7.3.1

With regards,
Narendra K

^ permalink raw reply related

* Re: [PATCH] cls_u32: fix sparse warnings
From: jamal @ 2011-02-23 12:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20110220181423.492594b7@nehalam>

On Sun, 2011-02-20 at 18:14 -0800, Stephen Hemminger wrote:
> The variable _data is used in asm-generic to define sections
> which causes sparse warnings, so just rename the variable.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>

Hey, while you are sparsing away you may wanna fix 
br_fdb_cleanup() as well ;-> /me runs

cheers,
jamal


^ permalink raw reply

* Re: Mass udp flow reboot linux with RealTek RTL-8169 Gigabit
From: Hans Nieser @ 2011-02-23 12:21 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel
In-Reply-To: <20110223095538.GA12662@electric-eye.fr.zoreil.com>

On Wed, 2011-02-23 at 10:55 +0100, Francois Romieu wrote:
> Hans Nieser <hnsr@xs4all.nl> :
> [...]
> > With your patches applied to 2.6.38-rc6, I have gathered some of the
> > info you requested from Seblu as well, I hope it's helpful:
> > 
> > 1: see attachment
> 
> Ok.
> 
> The chipset requires no trivial last minute regression fix (yet).
> 
> > 2: I'm not sure how to check the size of the packets, but I'm just
> > fetching a (large) file over http/tcp, so I guess they are mostly of the
> > size of my MTU which is 1500 looking at ifconfig output
> 
> Fine.
> 
> Your testcases are always based on a real download, whence including some
> disk activity, as opposed to a pure network test, right ?

Yeah, I just had a little script that wgetted a file from a webserver in
my LAN and saved it to separate (non-root) fs, then removed it - in a
loop. When testing on the 2.6.35 and 2.6.35.9 kernels it did max out at
about 107MiB/s, sometimes falling down a little presumably when disk was
being touched.

> > For the other vmstat/ethtool/interrupts output, I started the following
> > commands remotely via ssh a second or two before starting the download,
> > and the machine locked up a few seconds later:
> 
> SysRq is enabled (/etc/sysctl.conf::kernel.sysrq = 1), the computer was
> switched back on a no-X console before the test. Then the keyboard leds
> ignore keypresses and the sysrq keys don't display anything in the
> console, right ?

Yep I had X shutdown and switched to VT1, after lock up the LEDs can't
be toggled anymore and sysrq key combo was nonresponsive (it works if I
do it before it locks up)

> You may enable PCIEASPM_DEBUG, force 'pcie_aspm=off' and switch from
> SLUB to SLAB but it's a bit cargo-cultish.

I'll give that a try this evening

> A bisection could help. Bisecting 2.6.35 .. 2.6.35.9 may be enough if
> 2.6.35.9 works well.

Hmm did you mean bisecting 2.6.36 - 2.6.35.9 ? Since with 2.6.36 and
above I can get the machine to hang within seconds and performance is
really bad (10-20MiB/s with wget), while with 2.6.35.9 and 2.6.35
performance was really good (reaching 107MiB/s most of the time) and
lock up took 5-10 minutes instead of seconds (I guess I didn't mention
this in my last e-mail but I managed to get both 2.6.35 and 2.6.35.9 to
lock up eventually) - but I guess something changed between .35 and .36
that made the issue easier to trigger.

I can also try even older kernels to see if there is one that doesn't
lock up at all

^ permalink raw reply

* FW: tproxy + bridge  - possible redundant skb_orphan call ?
From: Haim Daniel @ 2011-02-23 10:56 UTC (permalink / raw)
  To: 'netdev@vger.kernel.org'

Hi,
I use 2.6.34 on a transparent PEP:

A daemon listening to port 50800 on PEP is set.
The following transparent proxy rule is set for PEP:
iptables -t mangle -A PREROUTING  -p tcp --dport 666 -j TPROXY --tproxy-mark 0x1/0x1 --on-port 50800

In case of bridging: 
	the mangle PREROUTING is hooked from ebtables _before_ ip_rcv().
	the skb hits the TPROXY rule.
	nf_tproxy_assign_sock() sets skb->sk to local listen socket on port 50800.

However, ip_rcv() invokes later and orphans the skb. skb->sk is nulled => it prevents the
skb from reaching the local socket.

When the call to skb_orphan(skb) in ip_rcv() is removed, the skb reaches the local sk - ok.

On other cases, when the bridging is not involved, ip_rcv() is called before mangle PREROUTING ,hence skb
reaches the TPROXY target and being orphaned before skb->sk is set in nf_tproxy_assign_sock().

I don't understand what is the purpose of skb_orphan(skb) call in ip_rcv.
The comment above it states that it's related to tproxy:

/* Must drop socket now because of tproxy. */
skb_orphan(skb);

but since nf_tproxy_assign_sock() orphans the skb, it looks like
ip_rcv() skb_orphan() is redundant ?

appreciate your response,

best regards,

--haim d.
--eyal g.

^ permalink raw reply

* Re: [PATCH V11 0/4] ptp: IEEE 1588 hardware clock support
From: Richard Cochran @ 2011-02-23 10:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-api, netdev, devicetree-discuss, linux-arm-kernel,
	linuxppc-dev, Alan Cox, Arnd Bergmann, Christoph Lameter,
	David Miller, John Stultz, Krzysztof Halasa, Peter Zijlstra,
	Rodolfo Giometti, Thomas Gleixner, Benjamin Herrenschmidt,
	Mike Frysinger, Paul Mackerras, Russell King
In-Reply-To: <cover.1298447722.git.richard.cochran@omicron.at>

* Previous Discussions

  This patch set previously appeared on the netdev list. Since V5 of
  the character device patch set, the discussion has moved to the
  lkml.

  - IEEE 1588 hardware clock support [V5]
    http://lkml.org/lkml/2010/8/16/90

  - POSIX clock tuning syscall with static clock ids
    http://lkml.org/lkml/2010/8/23/49

  - POSIX clock tuning syscall with dynamic clock ids
    http://lkml.org/lkml/2010/9/3/119

  - IEEE 1588 hardware clock support [V6]
    http://lkml.org/lkml/2010/9/23/310

  - Dynamic clock devices [RFC]
    http://lkml.org/lkml/2010/11/4/290

  - IEEE 1588 hardware clock support [V7]
    http://lkml.org/lkml/2010/12/16/195

  - IEEE 1588 hardware clock support [V8]
    http://lkml.org/lkml/2010/12/31/128

  - IEEE 1588 hardware clock support [V9]
    http://lkml.org/lkml/2011/1/13/65

  - IEEE 1588 hardware clock support [V10]
    http://lkml.org/lkml/2011/1/27/71

  - Thomas Gleixner: Rework of the PTP support series core code
    http://lkml.org/lkml/2011/2/1/137

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox