Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] sctp: implement SIOCINQ ioctl() (take 3)
From: Diego Elio Pettenò @ 2010-10-01  9:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-sctp
In-Reply-To: <20100930.173557.108787840.davem@davemloft.net>

Il giorno gio, 30/09/2010 alle 17.35 -0700, David Miller ha scritto:
> 
> 
> Please put the break statement inside of the basic block.
> 
> The way you have it here the indentation looks not so nice. 

Okay will resend; please do note that I copied the style (and most of
the code) from dccp anyway.

-- 
Diego Elio Pettenò — “Flameeyes”
http://blog.flameeyes.eu/

If you found a .asc file in this mail and know not what it is,
it's a GnuPG digital signature: http://www.gnupg.org/



^ permalink raw reply

* [PATCH] sctp: implement SIOCINQ ioctl() (take 3 bis)
From: Diego Elio Pettenò @ 2010-10-01  9:56 UTC (permalink / raw)
  To: netdev, linux-sctp; +Cc: Diego Elio 'Flameeyes' Pettenò
In-Reply-To: <20100930.173557.108787840.davem@davemloft.net>

From: Diego Elio 'Flameeyes' Pettenò <flameeyes@gmail.com>

This simple patch copies the current approach for SIOCINQ ioctl() from DCCP
into SCTP so that the userland code working with SCTP can use a similar
interface across different protocols to know how much space to allocate for
a buffer.

Signed-off-by: Diego Elio Pettenò <flameeyes@gmail.com>
---
 net/sctp/socket.c |   35 ++++++++++++++++++++++++++++++++++-
 1 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ca44917..8872028 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -3595,7 +3595,40 @@ out:
 /* The SCTP ioctl handler. */
 SCTP_STATIC int sctp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 {
-	return -ENOIOCTLCMD;
+	int rc = -ENOTCONN;
+
+	sctp_lock_sock(sk);
+
+	/*
+	 * SEQPACKET-style sockets in LISTENING state are valid, for
+	 * SCTP, so only discard TCP-style sockets in LISTENING state.
+	 */
+	if (sctp_style(sk, TCP) && sctp_sstate(sk, LISTENING))
+		goto out;
+
+	switch (cmd) {
+	case SIOCINQ: {
+		struct sk_buff *skb;
+		unsigned int amount = 0;
+
+		skb = skb_peek(&sk->sk_receive_queue);
+		if (skb != NULL) {
+			/*
+			 * We will only return the amount of this packet since
+			 * that is all that will be read.
+			 */
+			amount = skb->len;
+		}
+		rc = put_user(amount, (int __user *)arg);
+		break;
+	}
+	default:
+		rc = -ENOIOCTLCMD;
+		break;
+	}
+out:
+	sctp_release_sock(sk);
+	return rc;
 }
 
 /* This is the function which gets called during socket creation to
-- 
1.7.3.1


^ permalink raw reply related

* Re: [MeeGo-Dev][PATCH v3] Topcliff: Update PCH_CAN driver to 2.6.35
From: Masayuki Ohtake @ 2010-10-01 10:02 UTC (permalink / raw)
  To: Wolfgang Grandegger
  Cc: andrew.chih.howe.khor-ral2JQCrhuEAvxtiuMwx3w,
	qi.wang-ral2JQCrhuEAvxtiuMwx3w,
	margie.foster-ral2JQCrhuEAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, yong.y.wang-ral2JQCrhuEAvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	kok.howg.ewe-ral2JQCrhuEAvxtiuMwx3w, Christian Pellegrin,
	Tomoya MORINAGA, meego-dev-WXzIur8shnEAvxtiuMwx3w,
	David S. Miller, joel.clark-ral2JQCrhuEAvxtiuMwx3w, Samuel Ortiz
In-Reply-To: <4CA4541F.5040804@grandegger.com>

Hi Wolfgang Grandegger,

Thank you for your comments.

We will modify and re-post ASAP.

I have a comment about below.
> In this driver you are using just *one* RX object. This means that the
> CPU must handle new messages as quickly as possible otherwise message
> losses will happen, right?. For sure, this will not make user's happy.
> Any chance to use more RX objects in FIFO mode?

In case implementing with FIFO mode,
received packets may be our of order.
Because our CAN register access is slow.

I am confirming our CAN HW spec and the possibility of our-of-order.

Thanks, Ohtake(OKISemi)
---- Original Message ----- 
From: "Wolfgang Grandegger" <wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>
To: "Masayuki Ohtak" <masa-korg-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>
Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>; "Wolfram Sang" <w.sang-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>; "Christian Pellegrin"
<chripell-VaTbYqLCNhc@public.gmane.org>; "Barry Song" <21cnbao-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>; "Samuel Ortiz" <sameo-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>;
<socketcan-core-0fE9KPoRgkgATYTw5x5z8w@public.gmane.org>; <netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>; <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>; <meego-dev-WXzIur8shnEAvxtiuMwx3w@public.gmane.org>;
<andrew.chih.howe.khor-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; <qi.wang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; <margie.foster-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; <yong.y.wang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>;
<kok.howg.ewe-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; "Tomoya MORINAGA" <morinaga526-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>; <joel.clark-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Sent: Thursday, September 30, 2010 6:10 PM
Subject: Re: [MeeGo-Dev][PATCH v3] Topcliff: Update PCH_CAN driver to 2.6.35


> Hi Ohtake,
>
> here comes my review, sorry for delay.
>
> On 09/24/2010 12:24 PM, Masayuki Ohtak wrote:
> > Hi Wolfgang and Marc,
> >
> > We have modified a pretty amount of our driver based on other accepted Socket CAN driver.
> > Additionally, We have reduced the number of lines 3601 to 1444.
>
> Much better, but I believe it could be reduced even further.
>
> > Please check below.
> >
> > Thanks, Ohtake(OKISemi)
> >
> > ---
> > CAN driver of Topcliff PCH
> >
> > Topcliff PCH is the platform controller hub that is going to be used in
> > Intel's upcoming general embedded platform. All IO peripherals in
> > Topcliff PCH are actually devices sitting on AMBA bus.
> > Topcliff PCH has CAN I/F. This driver enables CAN function.
> >
> > Signed-off-by: Masayuki Ohtake <masa-korg-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>
> > ---
> >  drivers/net/can/Kconfig   |    8 +
> >  drivers/net/can/Makefile  |    1 +
> >  drivers/net/can/pch_can.c | 1444 +++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 1453 insertions(+), 0 deletions(-)
> >  create mode 100644 drivers/net/can/pch_can.c
> >
> > diff --git a/drivers/net/can/Kconfig b/drivers/net/can/Kconfig
> > index 2c5227c..5c98a20 100644
> > --- a/drivers/net/can/Kconfig
> > +++ b/drivers/net/can/Kconfig
> > @@ -73,6 +73,14 @@ config CAN_JANZ_ICAN3
> >    This driver can also be built as a module. If so, the module will be
> >    called janz-ican3.ko.
> >
> > +config PCH_CAN
> > + tristate "PCH CAN"
> > + depends on  CAN_DEV
> > + ---help---
> > +   This driver is for PCH CAN of Topcliff which is an IOH for x86
> > +   embedded processor.
> > +   This driver can access CAN bus.
> > +
> >  source "drivers/net/can/mscan/Kconfig"
> >
> >  source "drivers/net/can/sja1000/Kconfig"
> > diff --git a/drivers/net/can/Makefile b/drivers/net/can/Makefile
> > index 9047cd0..3ddc6a7 100644
> > --- a/drivers/net/can/Makefile
> > +++ b/drivers/net/can/Makefile
> > @@ -16,5 +16,6 @@ obj-$(CONFIG_CAN_TI_HECC) += ti_hecc.o
> >  obj-$(CONFIG_CAN_MCP251X) += mcp251x.o
> >  obj-$(CONFIG_CAN_BFIN) += bfin_can.o
> >  obj-$(CONFIG_CAN_JANZ_ICAN3) += janz-ican3.o
> > +obj-$(CONFIG_PCH_CAN) += pch_can.o
>
> Please provide patches against David Millers "net-next-2.6" GIT tree and
> use the prefix "can: " in your subject next time. See
> http://svn.berlios.de/wsvn/socketcan/trunk/README.submitting-patches
> for further information.
>
> >  ccflags-$(CONFIG_CAN_DEBUG_DEVICES) := -DDEBUG
> > diff --git a/drivers/net/can/pch_can.c b/drivers/net/can/pch_can.c
> > new file mode 100644
> > index 0000000..8c1731b
> > --- /dev/null
> > +++ b/drivers/net/can/pch_can.c
> > @@ -0,0 +1,1444 @@
> > +/*
> > + * Copyright (C) 1999 - 2010 Intel Corporation.
> > + * Copyright (C) 2010 OKI SEMICONDUCTOR Co., LTD.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; version 2 of the License.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307, USA.
> > + */
> > +
> > +#include <linux/interrupt.h>
> > +#include <linux/delay.h>
> > +#include <linux/io.h>
> > +#include <linux/module.h>
> > +#include <linux/sched.h>
> > +#include <linux/pci.h>
> > +#include <linux/init.h>
> > +#include <linux/kernel.h>
> > +#include <linux/types.h>
> > +#include <linux/errno.h>
> > +#include <linux/netdevice.h>
> > +#include <linux/skbuff.h>
> > +#include <linux/can.h>
> > +#include <linux/can/dev.h>
> > +#include <linux/can/error.h>
> > +
> > +#define MAX_BITRATE 0x3e8
>
> Dead code? At least it's not used anywhere.
>
> > +
> > +#define MAX_MSG_OBJ 32
> > +#define MSG_OBJ_RX 0 /* The receive message object flag. */
> > +#define MSG_OBJ_TX 1 /* The transmit message object flag. */
> > +
> > +#define ENABLE 1 /* The enable flag */
> > +#define DISABLE 0 /* The disable flag */
> > +#define CAN_CTRL_INIT 0x0001 /* The INIT bit of CANCONT register. */
> > +#define CAN_CTRL_IE 0x0002 /* The IE bit of CAN control register */
> > +#define CAN_CTRL_IE_SIE_EIE 0x000e
> > +#define CAN_CTRL_CCE 0x0040
> > +#define CAN_CTRL_OPT 0x0080 /* The OPT bit of CANCONT register. */
> > +#define CAN_OPT_SILENT 0x0008 /* The Silent bit of CANOPT reg. */
> > +#define CAN_OPT_LBACK 0x0010 /* The LoopBack bit of CANOPT reg. */
> > +#define CAN_CMASK_RX_TX_SET 0x00f3
> > +#define CAN_CMASK_RX_TX_GET 0x0073
> > +#define CAN_CMASK_ALL 0xff
> > +#define CAN_CMASK_RDWR 0x80
> > +#define CAN_CMASK_ARB 0x20
> > +#define CAN_CMASK_CTRL 0x10
> > +#define CAN_CMASK_MASK 0x40
> > +
> > +#define CAN_IF_MCONT_NEWDAT 0x8000
> > +#define CAN_IF_MCONT_INTPND 0x2000
> > +#define CAN_IF_MCONT_UMASK 0x1000
> > +#define CAN_IF_MCONT_TXIE 0x0800
> > +#define CAN_IF_MCONT_RXIE 0x0400
> > +#define CAN_IF_MCONT_RMTEN 0x0200
> > +#define CAN_IF_MCONT_TXRQXT 0x0100
> > +#define CAN_IF_MCONT_EOB 0x0080
> > +#define CAN_IF_MCONT_MSGLOST 0x4000
> > +#define CAN_MASK2_MDIR_MXTD 0xc000
> > +#define CAN_ID2_DIR 0x2000
> > +#define CAN_ID_MSGVAL 0x8000
> > +
> > +#define CAN_STATUS_INT 0x8000
> > +#define CAN_IF_CREQ_BUSY 0x8000
> > +#define CAN_ID2_XTD 0x4000
> > +
> > +#define CAN_REC 0x00007f00
> > +#define CAN_TEC 0x000000ff
> > +
> > +#define PCH_RX_OK 0x00000010
> > +#define PCH_TX_OK 0x00000008
> > +#define PCH_BUS_OFF 0x00000080
> > +#define PCH_EWARN 0x00000040
> > +#define PCH_EPASSIV 0x00000020
>
> > +#define PCH_LEC0 0x00000001
> > +#define PCH_LEC1 0x00000002
> > +#define PCH_LEC2 0x00000004
> > +#define PCH_LEC_ALL (PCH_LEC0 | PCH_LEC1 | PCH_LEC2)
> > +#define PCH_STUF_ERR PCH_LEC0
> > +#define PCH_FORM_ERR PCH_LEC1
> > +#define PCH_ACK_ERR (PCH_LEC0 | PCH_LEC1)
> > +#define PCH_BIT1_ERR PCH_LEC2
> > +#define PCH_BIT0_ERR (PCH_LEC0 | PCH_LEC2)
> > +#define PCH_CRC_ERR (PCH_LEC1 | PCH_LEC2)
>
> enum {
>   PCH_LEC_STUF_ERR = 0,
> PCH_LEC_FORM_ERR,
> PCH_LEC_ACK_ERR,
> ...
> PCH_LEC_ALL
> };
>
> Seems more appropriate. More comments below.
>
> > +
> > +/* bit position of certain controller bits. */
> > +#define BIT_BITT_BRP 0
> > +#define BIT_BITT_SJW 6
> > +#define BIT_BITT_TSEG1 8
> > +#define BIT_BITT_TSEG2 12
> > +#define BIT_IF1_MCONT_RXIE 10
> > +#define BIT_IF2_MCONT_TXIE 11
> > +#define BIT_BRPE_BRPE 6
> > +#define BIT_ES_TXERRCNT 0
> > +#define BIT_ES_RXERRCNT 8
> > +#define MSK_BITT_BRP 0x3f
> > +#define MSK_BITT_SJW 0xc0
> > +#define MSK_BITT_TSEG1 0xf00
> > +#define MSK_BITT_TSEG2 0x7000
> > +#define MSK_BRPE_BRPE 0x3c0
> > +#define MSK_BRPE_GET 0x0f
> > +#define MSK_CTRL_IE_SIE_EIE 0x07
> > +#define MSK_MCONT_TXIE 0x08
> > +#define MSK_MCONT_RXIE 0x10
> > +#define PCH_CAN_NO_TX_BUFF 1
> > +#define PCI_DEVICE_ID_INTEL_PCH1_CAN 0x8818
> > +#define COUNTER_LIMIT 0xFFFF
>
> Keep alignment?
>
> > +#define PCH_CAN_CLK 50000 /* 50MHz */
>
> Please specify it in Hz already here.
>
> > +
> > +/* Total 32 OBJs */
> > +#define PCH_RX_OBJ_NUM 1
> > +#define PCH_TX_OBJ_NUM 1
> > +#define PCH_OBJ_NUM (PCH_TX_OBJ_NUM + PCH_RX_OBJ_NUM)
>
> Please explain biefly what message object are use for what purpose.
> Either here or in the initialization code.
>
> > +
> > +#define PCH_CAN_ACTIVE 0
> > +#define PCH_CAN_LISTEN 1
> > +#define PCH_CAN_STOP 0
> > +#define PCH_CAN_RUN 1
> > +
> > +#define PCH_CAN_ENABLE 0
> > +#define PCH_CAN_DISABLE 1
> > +#define PCH_CAN_ALL 2
> > +#define PCH_CAN_NONE 3
>
> The above are used in switch case and should therefore be anonymous
> enums. I suggested to remove them because I'm not a real friend of the
> helper functions which are just called *once*.
>
> > +
> > +struct pch_can_regs {
> > + u32 cont;
> > + u32 stat;
> > + u32 errc;
> > + u32 bitt;
> > + u32 intr;
> > + u32 opt;
> > + u32 brpe;
> > + u32 reserve1;
> > + u32 if1_creq;
> > + u32 if1_cmask;
> > + u32 if1_mask1;
> > + u32 if1_mask2;
> > + u32 if1_id1;
> > + u32 if1_id2;
> > + u32 if1_mcont;
> > + u32 if1_dataa1;
> > + u32 if1_dataa2;
> > + u32 if1_datab1;
> > + u32 if1_datab2;
> > + u32 reserve2;
> > + u32 reserve3[12];
> > + u32 if2_creq;
> > + u32 if2_cmask;
> > + u32 if2_mask1;
> > + u32 if2_mask2;
> > + u32 if2_id1;
> > + u32 if2_id2;
> > + u32 if2_mcont;
> > + u32 if2_dataa1;
> > + u32 if2_dataa2;
> > + u32 if2_datab1;
> > + u32 if2_datab2;
> > + u32 reserve4;
> > + u32 reserve5[20];
> > + u32 treq1;
> > + u32 treq2;
> > + u32 reserve6[2];
> > + u32 reserve7[56];
> > + u32 reserve8[3];
> > + u32 srst;
> > +};
>
> Nice.
>
> > +struct pch_can_priv {
> > + struct can_priv can;
> > + void __iomem *base;
> > + unsigned int can_num;
> > + struct pci_dev *dev;
> > + unsigned int tx_enable[MAX_MSG_OBJ];
> > + unsigned int rx_enable[MAX_MSG_OBJ];
> > + unsigned int rx_link[MAX_MSG_OBJ];
> > + unsigned int int_enables;
> > + unsigned int int_stat;
> > + unsigned int bus_off_interrupt;
> > + struct net_device *ndev;
> > + spinlock_t msgif_reg_lock; /* Message Interface Registers Access Lock*/
> > + unsigned int msg_obj[MAX_MSG_OBJ];
> > + struct pch_can_regs *regs;
>
> Please add __iomem. Do you need both, regs *and* base?
>
> > +};
> > +
> > +static struct can_bittiming_const pch_can_bittiming_const = {
> > + .name = KBUILD_MODNAME,
>
> Not sure what KBUILD_MODNAME is. Should be "pch_can", the name of the
> driver.
>
> > + .tseg1_min = 1,
> > + .tseg1_max = 16,
> > + .tseg2_min = 1,
> > + .tseg2_max = 8,
> > + .sjw_max = 4,
> > + .brp_min = 1,
> > + .brp_max = 1024, /* 6bit + extended 4bit */
> > + .brp_inc = 1,
> > +};
> > +
> > +static const struct pci_device_id pch_can_pcidev_id[] __devinitdata = {
> > + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PCH1_CAN)},
> > + {}
> > +};
>
> Please use DEFINE_PCI_DEVICE_TABLE.
>
> > +static inline void pch_can_bit_set(u32 *addr, u32 mask)
> > +{
> > + iowrite32((ioread32(addr) | mask), addr);
>
> Outer brackets not needed!
>
> > +}
> > +
> > +static inline void pch_can_bit_clear(u32 *addr, u32 mask)
> > +{
> > + iowrite32((ioread32(addr) & ~(mask)), addr);
>
> Ditto.
>
> > +}
> > +
> > +static void pch_can_set_run_mode(struct pch_can_priv *priv, u32 mode)
> > +{
> > + switch (mode) {
> > + case PCH_CAN_RUN:
> > + pch_can_bit_clear(&(priv->regs)->cont, CAN_CTRL_INIT);
> > + break;
> > +
> > + case PCH_CAN_STOP:
> > + pch_can_bit_set(&(priv->regs)->cont, CAN_CTRL_INIT);
> > + break;
> > +
> > + default:
> > + dev_err(&priv->ndev->dev, "%s -> Invalid Mode.\n", __func__);
> > + break;
> > + }
> > +}
> > +
> > +static void pch_can_get_run_mode(struct pch_can_priv *priv, u32 *mode)
> > +{
> > + u32 reg_val = ioread32(&(priv->regs)->cont);
>
> I don't think you need the brackets around "priv->regs". Therefore I
> suggest s/&(priv->regs)/&priv->regs/ for the whole file.
>
> > +
> > + if (reg_val & CAN_CTRL_INIT)
> > + *mode = PCH_CAN_STOP;
> > + else
> > + *mode = PCH_CAN_RUN;
> > +}
>
> These are the helper functions I complained about above. And reg_val is
> not really needed.
>
> > +static void pch_can_set_optmode(struct pch_can_priv *priv)
> > +{
> > + u32 reg_val = ioread32(&(priv->regs)->opt);
> > +
> > + if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY)
> > + reg_val |= CAN_OPT_SILENT;
> > +
> > + if (priv->can.ctrlmode & CAN_CTRLMODE_LOOPBACK)
> > + reg_val |= CAN_OPT_LBACK;
> > +
> > + pch_can_bit_set(&(priv->regs)->cont, CAN_CTRL_OPT);
> > + iowrite32(reg_val, &(priv->regs)->opt);
> > +}
> > +
> > +static void pch_can_set_int_custom(struct pch_can_priv *priv)
> > +{
> > + /* Clearing the IE, SIE and EIE bits of Can control register. */
> > + pch_can_bit_clear(&(priv->regs)->cont, CAN_CTRL_IE_SIE_EIE);
> > +
> > + /* Appropriately setting them. */
> > + pch_can_bit_set(&(priv->regs)->cont,
> > + ((priv->int_enables & MSK_CTRL_IE_SIE_EIE) << 1));
> > +}
> > +
> > +/* This function retrieves interrupt enabled for the CAN device. */
> > +static void pch_can_get_int_enables(struct pch_can_priv *priv, u32 *enables)
> > +{
> > + u32 reg_ctrl_val = ioread32(&(priv->regs)->cont);
> > +
> > + /* Obtaining the status of IE, SIE and EIE interrupt bits. */
> > + *enables = ((reg_ctrl_val & CAN_CTRL_IE_SIE_EIE) >> 1);
>
> Do you really need an extra variable?
>
> > +}
> > +
> > +static void pch_can_set_int_enables(struct pch_can_priv *priv, u32 interrupt_no)
> > +{
> > + switch (interrupt_no) {
> > + case PCH_CAN_ENABLE:
> > + pch_can_bit_set(&(priv->regs)->cont, CAN_CTRL_IE);
> > + break;
> > +
> > + case PCH_CAN_DISABLE:
> > + pch_can_bit_clear(&(priv->regs)->cont, CAN_CTRL_IE);
> > + break;
> > +
> > + case PCH_CAN_ALL:
> > + pch_can_bit_set(&(priv->regs)->cont, CAN_CTRL_IE_SIE_EIE);
> > + break;
> > +
> > + case PCH_CAN_NONE:
> > + pch_can_bit_clear(&(priv->regs)->cont, CAN_CTRL_IE_SIE_EIE);
> > + break;
> > +
> > + default:
> > + dev_err(&priv->ndev->dev, "Invalid interrupt number.\n");
> > + break;
> > + }
> > +}
> > +
> > +static void pch_can_check_if1_busy(struct pch_can_priv *priv, u32 num)
> > +{
> > + u32 counter = COUNTER_LIMIT;
> > + u32 if1_creq;
> > +
> > + iowrite32(num, &(priv->regs)->if1_creq);
> > + while (counter) {
> > + if1_creq = (ioread32(&(priv->regs)->if1_creq)) &
> > +      CAN_IF_CREQ_BUSY;
> > + if (!if1_creq)
> > + break;
> > + counter--;
> > + }
> > + if (!counter)
> > + dev_err(&priv->ndev->dev, "IF1 BUSY Flag is set forever.\n");
>
> Please use a defined delay for the above timeout. How long does it
> usually take the bit to toggle? A small delay, e.g. udelay(1) could be
> fine. This function is called in the time critical path!
>
> > +}
> > +
> > +static void pch_can_check_if2_busy(struct pch_can_priv *priv, u32 num)
> > +{
> > + u32 counter = COUNTER_LIMIT;
> > + u32 if2_creq;
> > +
> > + iowrite32(num, &(priv->regs)->if2_creq);
> > + while (counter) {
> > + if2_creq = (ioread32(&(priv->regs)->if2_creq)) &
> > +      CAN_IF_CREQ_BUSY;
> > + if (!if2_creq)
> > + break;
> > + counter--;
> > + }
> > + if (!counter)
> > + dev_err(&priv->ndev->dev, "IF2 BUSY Flag is set forever.\n");
> > +}
>
> Duplicated code!
>
> > +static void pch_can_set_rx_enable(struct pch_can_priv *priv, u32 buff_num,
> > +   u32 set)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&priv->msgif_reg_lock, flags);
> > + /*Reading the receive buffer data from RAM to Interface1 registers */
>
> Space after /* ?
>
> > + iowrite32(CAN_CMASK_RX_TX_GET, &(priv->regs)->if1_cmask);
> > + pch_can_check_if1_busy(priv, buff_num); /* Read from MsgRAN */
> > +
> > + /* Setting the IF1MASK1 register to access MsgVal and RxIE bits */
> > + iowrite32((CAN_CMASK_RDWR | CAN_CMASK_ARB | CAN_CMASK_CTRL),
> > +   (&(priv->regs)->if1_cmask));
> > +
> > + if (set == ENABLE) {
> > + /* Setting the MsgVal and RxIE bits */
> > + pch_can_bit_set(&(priv->regs)->if1_mcont, CAN_IF_MCONT_RXIE);
> > + pch_can_bit_set(&(priv->regs)->if1_id2, CAN_ID_MSGVAL);
> > +
> > + } else if (set == DISABLE) {
> > + /* Resetting the MsgVal and RxIE bits */
> > + pch_can_bit_clear(&(priv->regs)->if1_mcont, CAN_IF_MCONT_RXIE);
> > + pch_can_bit_clear(&(priv->regs)->if1_id2, CAN_ID_MSGVAL);
> > + }
> > +
> > + pch_can_check_if1_busy(priv, buff_num); /* Write to MsgRAM */
> > + spin_unlock_irqrestore(&priv->msgif_reg_lock, flags);
> > +}
> > +
> > +static void pch_can_rx_enable_all(struct pch_can_priv *priv)
> > +{
> > + u32 i;
> > +
> > + /* Traversing to obtain the object configured as receivers. */
> > + for (i = 0; i < PCH_OBJ_NUM; i++) {
> > + if (priv->msg_obj[i] == MSG_OBJ_RX)
> > + pch_can_set_rx_enable(priv, i + 1, ENABLE);
> > + }
> > +}
> > +
> > +static void pch_can_rx_disable_all(struct pch_can_priv *priv)
> > +{
> > + u32 i;
> > +
> > + /* Traversing to obtain the object configured as receivers. */
> > + for (i = 0; i < PCH_OBJ_NUM; i++) {
> > + if (priv->msg_obj[i] == MSG_OBJ_RX)
> > + pch_can_set_rx_enable(priv, (i + 1), DISABLE);
> > + }
> > +}
> > +
> > +static void pch_can_set_tx_enable(struct pch_can_priv *priv, u32 buff_num,
> > + u32 set)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&priv->msgif_reg_lock, flags);
> > + /* Reading the Msg buffer from Message RAM to Interface2 registers. */
> > + iowrite32(CAN_CMASK_RX_TX_GET, (&(priv->regs)->if1_cmask));
> > + pch_can_check_if1_busy(priv, buff_num);
> > +
> > + /* Setting the IF2CMASK register for accessing the
> > + MsgVal and TxIE bits */
> > + iowrite32((CAN_CMASK_RDWR | CAN_CMASK_ARB | CAN_CMASK_CTRL),
> > + (&(priv->regs)->if1_cmask));
> > +
> > + if (set == ENABLE) {
> > + /* Setting the MsgVal and TxIE bits */
> > + pch_can_bit_set(&(priv->regs)->if1_mcont, CAN_IF_MCONT_TXIE);
> > + pch_can_bit_set(&(priv->regs)->if1_id2, CAN_ID_MSGVAL);
> > + } else if (set == DISABLE) {
> > + /* Resetting the MsgVal and TxIE bits. */
> > + pch_can_bit_clear(&(priv->regs)->if1_mcont, CAN_IF_MCONT_TXIE);
> > + pch_can_bit_clear(&(priv->regs)->if1_id2, CAN_ID_MSGVAL);
> > + }
> > +
> > + pch_can_check_if1_busy(priv, buff_num); /* Write to MsgRAM */
> > + spin_unlock_irqrestore(&priv->msgif_reg_lock, flags);
> > +}
> > +
> > +static void pch_can_tx_enable_all(struct pch_can_priv *priv)
> > +{
> > + u32 i;
> > +
> > + /* Traversing to obtain the object configured as transmit object. */
> > + for (i = 0; i < PCH_OBJ_NUM; i++) {
> > + if (priv->msg_obj[i] == MSG_OBJ_TX)
> > + pch_can_set_tx_enable(priv, (i + 1), ENABLE);
> > + }
> > +}
> > +
> > +static void pch_can_tx_disable_all(struct pch_can_priv *priv)
> > +{
> > + u32 i;
> > +
> > + /* Traversing to obtain the object configured as transmit object. */
> > + for (i = 0; i < PCH_OBJ_NUM; i++) {
> > + if (priv->msg_obj[i] == MSG_OBJ_TX)
> > + pch_can_set_tx_enable(priv, (i + 1), DISABLE);
> > + }
> > +}
> > +
> > +static void pch_can_get_rx_enable(struct pch_can_priv *priv, u32 buff_num,
> > + u32 *enable)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&priv->msgif_reg_lock, flags);
> > + iowrite32(CAN_CMASK_RX_TX_GET, (&(priv->regs)->if1_cmask));
> > + pch_can_check_if1_busy(priv, buff_num);
> > +
> > + if (((ioread32(&(priv->regs)->if1_id2)) & CAN_ID_MSGVAL) &&
> > + ((ioread32(&(priv->regs)->if1_mcont)) &
> > + CAN_IF_MCONT_RXIE))
> > + *enable = ENABLE;
> > + else
> > + *enable = DISABLE;
> > + spin_unlock_irqrestore(&priv->msgif_reg_lock, flags);
> > +}
> > +
> > +static void pch_can_get_tx_enable(struct pch_can_priv *priv, u32 buff_num,
> > + u32 *enable)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&priv->msgif_reg_lock, flags);
> > + iowrite32(CAN_CMASK_RX_TX_GET, &(priv->regs)->if1_cmask);
> > + pch_can_check_if1_busy(priv, buff_num);
> > +
> > + if (((ioread32(&(priv->regs)->if1_id2)) & CAN_ID_MSGVAL) &&
> > + ((ioread32(&(priv->regs)->if1_mcont)) &
> > + CAN_IF_MCONT_TXIE)) {
> > + *enable = ENABLE;
> > + } else {
> > + *enable = DISABLE;
> > + }
> > + spin_unlock_irqrestore(&priv->msgif_reg_lock, flags);
> > +}
> > +
> > +static int pch_can_int_pending(struct pch_can_priv *priv)
> > +{
> > + return ioread32(&(priv->regs)->intr) & 0xffff;
> > +}
> > +
> > +static void pch_can_set_rx_buffer_link(struct pch_can_priv *priv,
> > +        u32 buffer_num, u32 set)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&priv->msgif_reg_lock, flags);
> > + iowrite32(CAN_CMASK_RX_TX_GET, &(priv->regs)->if1_cmask);
> > + pch_can_check_if1_busy(priv, buffer_num);
> > + iowrite32((CAN_CMASK_RDWR | CAN_CMASK_CTRL), &(priv->regs)->if1_cmask);
> > + if (set == ENABLE)
> > + pch_can_bit_clear(&(priv->regs)->if1_mcont, CAN_IF_MCONT_EOB);
> > + else
> > + pch_can_bit_set(&(priv->regs)->if1_mcont, CAN_IF_MCONT_EOB);
> > +
> > + pch_can_check_if1_busy(priv, buffer_num);
> > + spin_unlock_irqrestore(&priv->msgif_reg_lock, flags);
> > +}
> > +
> > +static void pch_can_get_rx_buffer_link(struct pch_can_priv *priv,
> > +        u32 buffer_num, u32 *link)
> > +{
> > + u32 reg_val;
>
> Really needed?
>
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&priv->msgif_reg_lock, flags);
> > + iowrite32(CAN_CMASK_RX_TX_GET, &(priv->regs)->if1_cmask);
> > + pch_can_check_if1_busy(priv, buffer_num);
> > +
> > + reg_val = ioread32(&(priv->regs)->if1_mcont);
> > + if (reg_val & CAN_IF_MCONT_EOB)
> > + *link = DISABLE;
> > + else
> > + *link = ENABLE;
> > + spin_unlock_irqrestore(&priv->msgif_reg_lock, flags);
> > +}
> > +
> > +static void pch_can_clear_buffers(struct pch_can_priv *priv)
> > +{
> > + u32 i;
> > + u32 rx_buff_num;
> > + u32 tx_buff_num;
>
> Really needed?
>
> > +
> > + iowrite32(CAN_CMASK_RX_TX_SET, &(priv->regs)->if1_cmask);
> > + iowrite32(CAN_CMASK_RX_TX_SET, &(priv->regs)->if2_cmask);
> > + iowrite32(0xffff, &(priv->regs)->if1_mask1);
> > + iowrite32(0xffff, &(priv->regs)->if1_mask2);
> > + iowrite32(0xffff, &(priv->regs)->if2_mask1);
> > + iowrite32(0xffff, &(priv->regs)->if2_mask2);
> > +
> > + iowrite32(0x0, &(priv->regs)->if1_id1);
> > + iowrite32(0x0, &(priv->regs)->if1_id2);
> > + iowrite32(0x0, &(priv->regs)->if2_id1);
> > + iowrite32(0x0, &(priv->regs)->if2_id2);
> > + iowrite32(0x0, &(priv->regs)->if1_mcont);
> > + iowrite32(0x0, &(priv->regs)->if2_mcont);
> > + iowrite32(0x0, &(priv->regs)->if1_dataa1);
> > + iowrite32(0x0, &(priv->regs)->if1_dataa2);
> > + iowrite32(0x0, &(priv->regs)->if1_datab1);
> > + iowrite32(0x0, &(priv->regs)->if1_datab2);
> > + iowrite32(0x0, &(priv->regs)->if2_dataa1);
> > + iowrite32(0x0, &(priv->regs)->if2_dataa2);
> > + iowrite32(0x0, &(priv->regs)->if2_datab1);
> > + iowrite32(0x0, &(priv->regs)->if2_datab2);
> > +
> > + for (i = 1; i <= (MAX_MSG_OBJ / 2); i++) {
> > + rx_buff_num = 2 * i;
> > + tx_buff_num = (2 * i) - 1;
> > +
> > + iowrite32(rx_buff_num, &(priv->regs)->if1_creq);
> > + iowrite32(tx_buff_num, &(priv->regs)->if2_creq);
> > +
> > + mdelay(10);
> > + }
> > +}
> > +
> > +static void pch_can_config_rx_tx_buffers(struct pch_can_priv *priv)
> > +{
> > + u32 i;
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&priv->msgif_reg_lock, flags);
> > + /* For accssing MsgVal, ID and EOB bit */
> > + iowrite32((CAN_CMASK_RDWR | CAN_CMASK_ARB | CAN_CMASK_CTRL),
> > + (&(priv->regs)->if1_cmask));
> > + iowrite32((CAN_CMASK_RDWR | CAN_CMASK_ARB | CAN_CMASK_CTRL),
> > + (&(priv->regs)->if2_cmask));
> > + iowrite32(0x0, (&(priv->regs)->if1_id1));
> > + iowrite32(0x0, (&(priv->regs)->if1_id2));
> > +
> > + /* Resetting DIR bit for reception */
> > + iowrite32(0x0, (&(priv->regs)->if2_id1));
> > +
> > + /* Setting DIR bit for transmission */
> > + iowrite32((CAN_ID2_DIR | (0x7ff << 2)),
> > + (&(priv->regs)->if2_id2));
> > +
> > + /* Setting EOB bit for receiver */
> > + iowrite32(CAN_IF_MCONT_EOB, &(priv->regs)->if1_mcont);
> > +
> > + /* Setting EOB bit for transmitter */
> > + iowrite32(CAN_IF_MCONT_EOB, (&(priv->regs)->if2_mcont));
> > +
> > + for (i = 0; i < PCH_OBJ_NUM; i++) {
> > + if (priv->msg_obj[i] == MSG_OBJ_RX)
> > + pch_can_check_if1_busy(priv, i + 1);
> > + else if (priv->msg_obj[i] == MSG_OBJ_TX)
> > + pch_can_check_if2_busy(priv, i + 1);
> > + else
> > + dev_err(&priv->ndev->dev, "Invalid OBJ\n");
> > + }
> > +
> > + for (i = 0; i < PCH_OBJ_NUM; i++) {
> > + if (priv->msg_obj[i] == MSG_OBJ_RX) {
> > + iowrite32(CAN_CMASK_RX_TX_GET,
> > + &(priv->regs)->if1_cmask);
> > + pch_can_check_if1_busy(priv, i+1);
> > +
> > + pch_can_bit_clear(&(priv->regs)->if1_id2, 0x1fff);
> > + pch_can_bit_clear(&(priv->regs)->if1_id2, CAN_ID2_XTD);
>
> Could'nt it be set just by one call?
>
> > + iowrite32(0, (&(priv->regs)->if1_id1));
> > + pch_can_bit_set(&(priv->regs)->if1_id2, 0);
> > + pch_can_bit_set(&(priv->regs)->if1_mcont,
> > + CAN_IF_MCONT_UMASK);
> > + pch_can_bit_set(&(priv->regs)->if2_mcont,
> > + CAN_IF_MCONT_UMASK);
> > +
> > + iowrite32(0, &(priv->regs)->if1_mask1);
> > + pch_can_bit_clear(&(priv->regs)->if1_mask2, 0x1fff);
> > + pch_can_bit_clear(&(priv->regs)->if1_mask2,
> > +   CAN_MASK2_MDIR_MXTD);
> > +
> > + iowrite32(0, &(priv->regs)->if2_mask1);
> > + pch_can_bit_clear(&(priv->regs)->if2_mask2, 0x1fff);
> > +
> > + /* Setting CMASK for writing */
> > + iowrite32((CAN_CMASK_RDWR | CAN_CMASK_MASK |
> > +    CAN_CMASK_ARB | CAN_CMASK_CTRL),
> > +   (&(priv->regs)->if1_cmask));
> > +
> > + pch_can_check_if1_busy(priv, i+1);
> > + }
> > + }
> > + spin_unlock_irqrestore(&priv->msgif_reg_lock, flags);
> > +}
> > +
> > +static void pch_can_open(struct pch_can_priv *priv)
>
> Probably pch_can_init is the better name.
>
> > +{
> > + /* Stopping the Can device. */
> > + pch_can_set_run_mode(priv, PCH_CAN_STOP);
> > +
> > + /* Clearing all the message object buffers. */
> > + pch_can_clear_buffers(priv);
> > +
> > + /* Configuring the respective message object as either rx/tx object. */
> > + pch_can_config_rx_tx_buffers(priv);
> > +
> > + /* Enabling all receive objects. */
> > + pch_can_rx_enable_all(priv);
> > +
> > + /* Enabling all transmit objects. */
> > + pch_can_tx_enable_all(priv);
> > +
> > + /* Enabling the interrupts. */
> > + pch_can_set_int_enables(priv, PCH_CAN_ALL);
> > +
> > + /* Setting the CAN to run mode. */
> > + pch_can_set_run_mode(priv, PCH_CAN_RUN);
>
> Hm, you start the controller here... more later.
>
> > +}
> > +
> > +static void pch_can_release(struct pch_can_priv *priv)
> > +{
> > + /* Stooping the CAN device. */
> > + pch_can_set_run_mode(priv, PCH_CAN_STOP);
> > +
> > + /* Disabling the interrupts. */
> > + pch_can_set_int_enables(priv, PCH_CAN_NONE);
> > +
> > + /* Disabling all the receive object. */
> > + pch_can_rx_disable_all(priv);
> > +
> > + /* Disabling all the transmit object. */
> > + pch_can_tx_disable_all(priv);
> > +}
> > +
> > +/* This function clears interrupt(s) from the CAN device. */
> > +static void pch_can_int_clr(struct pch_can_priv *priv, u32 mask)
> > +{
> > + if (mask == CAN_STATUS_INT) {
> > + ioread32(&(priv->regs)->stat);
> > + return;
> > + }
> > +
> > + /* Clear interrupt for transmit object */
> > + if (priv->msg_obj[mask - 1] == MSG_OBJ_TX) {
> > + /* Setting CMASK for clearing interrupts for
> > + frame transmission. */
> > + iowrite32((CAN_CMASK_RDWR | CAN_CMASK_CTRL | CAN_CMASK_ARB),
> > + (&(priv->regs)->if2_cmask));
> > +
> > + /* Resetting the ID registers. */
> > + pch_can_bit_set(&(priv->regs)->if2_id2,
> > +        (CAN_ID2_DIR | (0x7ff << 2)));
> > + iowrite32(0x0, (&(priv->regs)->if2_id1));
> > +
> > + /* Claring NewDat, TxRqst & IntPnd */
> > + pch_can_bit_clear(&(priv->regs)->if2_mcont,
> > +   (CAN_IF_MCONT_NEWDAT | CAN_IF_MCONT_INTPND |
> > +    CAN_IF_MCONT_TXRQXT));
> > + pch_can_check_if2_busy(priv, mask);
> > + }
> > + /* Clear interrupt for receive object */
> > + else if (priv->msg_obj[mask - 1] == MSG_OBJ_RX) {
>
> Should be "} else if ..."
>
> > + /* Setting CMASK for clearing the reception interrupts. */
> > + iowrite32((CAN_CMASK_RDWR | CAN_CMASK_CTRL | CAN_CMASK_ARB),
> > +   (&(priv->regs)->if2_cmask));
> > +
> > + /* Clearing the Dir bit. */
> > + pch_can_bit_clear(&(priv->regs)->if2_id2, CAN_ID2_DIR);
> > +
> > + /* Clearing NewDat & IntPnd */
> > + pch_can_bit_clear(&(priv->regs)->if2_mcont,
> > +   (CAN_IF_MCONT_NEWDAT | CAN_IF_MCONT_INTPND));
> > +
> > + pch_can_check_if2_busy(priv, mask);
> > + }
> > +}
> > +
> > +static int pch_can_get_buffer_status(struct pch_can_priv *priv)
> > +{
> > + u32 reg_treq1;
> > + u32 reg_treq2;
>
> Really needed?
>
> > +
> > + /* Reading the transmission request registers. */
> > + reg_treq1 = (ioread32(&(priv->regs)->treq1) & 0xffff);
> > + reg_treq2 = ((ioread32(&(priv->regs)->treq2) & 0xffff) << 16);
> > +
> > + return reg_treq1 | reg_treq2;
> > +}
> > +
> > +static void pch_can_reset(struct pch_can_priv *priv)
> > +{
> > + /* write to sw reset register */
> > + iowrite32(1, (&(priv->regs)->srst));
> > + iowrite32(0, (&(priv->regs)->srst));
> > +}
> > +
> > +static void pch_can_msg_obj(struct net_device *ndev, u32 status)
> > +{
> > + struct pch_can_priv *priv = netdev_priv(ndev);
> > + u32 reg;
> > + struct sk_buff *skb;
> > + struct can_frame *cf;
> > + canid_t id;
> > + u32 ide;
> > + u32 rtr;
> > + int i, j;
> > + struct net_device_stats *stats = &(priv->ndev->stats);
> > +
> > + /* Reading the messsage object from the Message RAM */
> > + iowrite32(CAN_CMASK_RX_TX_GET, &(priv->regs)->if2_cmask);
> > + pch_can_check_if2_busy(priv, status);
> > +
> > + /* Reading the MCONT register. */
> > + reg = ioread32(&(priv->regs)->if2_mcont);
> > + reg &= 0xffff;
> > +
> > + /* If MsgLost bit set. */
> > + if (reg & CAN_IF_MCONT_MSGLOST) {
> > + pch_can_bit_clear(&(priv->regs)->if2_mcont,
> > +   CAN_IF_MCONT_MSGLOST);
> > + dev_err(&priv->ndev->dev, "Msg Obj is overwritten.\n");
>
> That should create an error message as well.
>
> > + }
> > + /* Read the direction bit for determination of remote frame . */
> > + rtr = (ioread32((&(priv->regs)->if2_id2)) &  CAN_ID2_DIR);
> > + /* Clearing interrupts. */
> > + pch_can_int_clr(priv, status);
> > + /* Hanlde reception interrupt */
>
> Typo!
>
> > + if (priv->msg_obj[status - 1] == MSG_OBJ_RX) {
> > + if (!(reg & CAN_IF_MCONT_NEWDAT)) {
> > + dev_err(&priv->ndev->dev, "MCONT_NEWDAT isn't SET.\n");
> > + return;
> > + }
> > + skb = alloc_can_skb(priv->ndev, &cf);
> > + if (!skb)
> > + return;
> > +
> > + ide = ((ioread32(&(priv->regs)->if2_id2)) & CAN_ID2_XTD) >> 14;
> > + if (ide) {
> > + id = (ioread32(&(priv->regs)->if2_id1) & 0xffff);
> > + id |= (((ioread32(&(priv->regs)->if2_id2)) &
> > +     0x1fff) << 16);
> > + cf->can_id = (id & CAN_EFF_MASK) | CAN_EFF_FLAG;
> > + } else {
> > + id = (((ioread32(&(priv->regs)->if2_id2)) &
> > +   (CAN_SFF_MASK << 2)) >> 2);
> > + cf->can_id = (id & CAN_SFF_MASK);
> > + }
> > +
> > + if (rtr) {
> > + cf->can_dlc = 0;
> > + cf->can_id |= CAN_RTR_FLAG;
> > + } else {
> > + cf->can_dlc = ((ioread32(&(priv->regs)->if2_mcont)) &
> > +    0x0f);
> > + }
> > +
> > + /* Reading back the data. */
> > + for (i = 0, j = 0; i < cf->can_dlc; j++) {
> > + reg = ioread32(&(priv->regs)->if2_dataa1 + j*4);
> > + cf->data[i++] = cpu_to_le32(reg & 0xff);
> > + if (i == cf->can_dlc)
> > + break;
> > + cf->data[i++] = cpu_to_le32((reg & (0xff << 8)) >> 8);
> > + }
> > + netif_rx(skb);
> > + stats->rx_packets++;
> > + stats->rx_bytes += cf->can_dlc;
> > + } else if (priv->msg_obj[status - 1] == MSG_OBJ_TX) {
> > + /* Hanlde transmission interrupt */
>
> Typo!
>
> > + can_get_echo_skb(priv->ndev, 0);
> > + netif_wake_queue(priv->ndev);
> > + }
> > +}
> > +
> > +static void pch_can_error(struct net_device *ndev, u32 status)
> > +{
> > + struct sk_buff *skb;
> > + struct pch_can_priv *priv = netdev_priv(ndev);
> > + struct can_frame *cf;
> > + u32 errc;
> > + struct net_device_stats *stats = &(priv->ndev->stats);
> > +
> > + skb = alloc_can_err_skb(ndev, &cf);
> > + if (!skb) {
> > + dev_err(&ndev->dev, "%s -> No memory.\n", __func__);
>
> Please drop the error message.
>
> > + return;
> > + }
> > +
> > + if (status & PCH_BUS_OFF) {
> > + if (!priv->bus_off_interrupt) {
> > + pch_can_tx_disable_all(priv);
> > + pch_can_rx_disable_all(priv);
> > +
> > + priv->can.state = CAN_STATE_BUS_OFF;
> > + cf->can_id |= CAN_ERR_BUSOFF;
> > + can_bus_off(ndev);
> > +
> > + priv->bus_off_interrupt = 1;
> > + pch_can_set_run_mode(priv, PCH_CAN_RUN);
>
> Hm, you automatically restart the contoller after a bus-off. That's not
> the intended behaviour. It's up to the user to define how and when the
> device should recover from bus-off. For further information read
>
> http://lxr.linux.no/#linux+v2.6.35.7/Documentation/networking/can.txt#L767
>
> > + }
> > + }
>
> > + /* Warning interrupt. */
> > + if (status & PCH_EWARN) {
> > + priv->can.state = CAN_STATE_ERROR_WARNING;
> > + priv->can.can_stats.error_warning++;
> > + cf->can_id |= CAN_ERR_CRTL;
> > + errc = ioread32((&(priv->regs)->errc));
> > + if (((errc & CAN_REC) >> 8) > 96)
> > + cf->data[1] |= CAN_ERR_CRTL_RX_WARNING;
> > + if ((errc & CAN_TEC) > 96)
> > + cf->data[1] |= CAN_ERR_CRTL_TX_WARNING;
> > + dev_warn(&ndev->dev, "%s -> Warning interrupt.\n", __func__);
> > + }
> > + /* Error passive interrupt. */
> > + if (status & PCH_EPASSIV) {
> > + priv->can.can_stats.error_passive++;
> > + priv->can.state = CAN_STATE_ERROR_PASSIVE;
> > + cf->can_id |= CAN_ERR_CRTL;
> > + errc = ioread32((&(priv->regs)->errc));
> > + if (((errc & CAN_REC) >> 8) > 127)
> > + cf->data[1] |= CAN_ERR_CRTL_RX_PASSIVE;
> > + if ((errc & CAN_TEC) > 127)
> > + cf->data[1] |= CAN_ERR_CRTL_TX_PASSIVE;
> > + dev_err(&ndev->dev,
> > + "%s -> Error interrupt.\n", __func__);
> > + }
> > +
> > + if (status & PCH_STUF_ERR)
> > + cf->data[2] |= CAN_ERR_PROT_STUFF;
> > +
> > + if (status & PCH_FORM_ERR)
> > + cf->data[2] |= CAN_ERR_PROT_FORM;
> > +
> > + if (status & PCH_ACK_ERR)
> > + cf->data[2] |= CAN_ERR_PROT_LOC_ACK | CAN_ERR_PROT_LOC_ACK_DEL;
> > +
> > + if ((status & PCH_BIT1_ERR) || (status & PCH_BIT0_ERR))
> > + cf->data[2] |= CAN_ERR_PROT_BIT;
> > +
> > + if (status & PCH_CRC_ERR)
> > + cf->data[2] |= CAN_ERR_PROT_LOC_CRC_SEQ |
> > + CAN_ERR_PROT_LOC_CRC_DEL;
> > +
> > + if (status & PCH_LEC_ALL)
> > + iowrite32(status | PCH_LEC_ALL,
> > +   &(priv->regs)->stat);
>
> A bit-wise test of the above values is wrong, I believe. Please use the
> switch statement instead.
>
> > +
> > + stats->rx_packets++;
> > + stats->rx_bytes += cf->can_dlc;
> > + netif_rx(skb);
> > +}
> > +
> > +static irqreturn_t pch_can_handler(int irq, void *dev_id)
>
> A better name making clear that it's the interrupt handler would be nice.
>
> > +{
> > + u32 int_stat;
> > + u32 reg_stat;
> > + struct net_device *ndev = (struct net_device *)dev_id;
> > + struct pch_can_priv *priv = netdev_priv(ndev);
> > + int_stat = pch_can_int_pending(priv);
> > +
> > + if (!int_stat)
> > + return IRQ_NONE;
> > +
> > + if (int_stat == CAN_STATUS_INT) {
> > + reg_stat = ioread32((&(priv->regs)->stat));
> > + if (reg_stat & (PCH_BUS_OFF | PCH_LEC_ALL | PCH_EWARN |
> > + PCH_EPASSIV)) {
> > + if ((reg_stat & PCH_LEC_ALL) != PCH_LEC_ALL)
> > + pch_can_error(ndev, reg_stat);
> > + }
> > +
> > + /* Recover from Bus Off */
> > + if (!reg_stat && priv->bus_off_interrupt) {
> > + priv->bus_off_interrupt = 0;
> > + pch_can_tx_enable_all(priv);
> > + pch_can_rx_enable_all(priv);
> > +
> > + dev_info(&priv->ndev->dev, "BusOff stage recovered.\n");
>
> Bogus bus-off handling, more later...
>
> > + }
> > +
> > + if (reg_stat & PCH_RX_OK)
> > + pch_can_bit_clear(&(priv->regs)->stat, PCH_RX_OK);
> > +
> > + if (reg_stat & PCH_TX_OK)
> > + pch_can_bit_clear(&(priv->regs)->stat, PCH_TX_OK);
>
> Could be done in one call, I think.
>
> > + int_stat = pch_can_int_pending(priv);
> > + }
> > +
> > + if ((int_stat > 0) && (int_stat <= MAX_MSG_OBJ))
> > + pch_can_msg_obj(ndev, int_stat);
> > +
> > + return IRQ_HANDLED;
> > +}
> > +
> > +static int pch_set_bittiming(struct net_device *ndev)
> > +{
> > + struct pch_can_priv *priv = netdev_priv(ndev);
> > + const struct can_bittiming *bt = &priv->can.bittiming;
> > + u32 curr_mode;
> > + u32 reg1; /* CANBIT */
> > + u32 reg2; /* BEPE */
>
> Why not "u32 canbit" then?
>
> > + u32 brp;
> > +
> > + pch_can_get_run_mode(priv, &curr_mode);
> > + if (curr_mode == PCH_CAN_RUN)
> > + pch_can_set_run_mode(priv, PCH_CAN_STOP);
>
> The device is stopped when this function is called. Please remove.
>
> > +
> > + /* Setting the CCE bit for accessing the Can Timing register. */
> > + pch_can_bit_set(&(priv->regs)->cont, CAN_CTRL_CCE);
> > +
> > + brp = (bt->tq) / (1000000/PCH_CAN_CLK) - 1;
> > + reg1 = brp & MSK_BITT_BRP;
> > + reg1 |= (bt->sjw - 1) << BIT_BITT_SJW;
> > + reg1 |= (bt->phase_seg1 + bt->prop_seg - 1) << BIT_BITT_TSEG1;
> > + reg1 |= (bt->phase_seg2 - 1) << BIT_BITT_TSEG2;
> > + reg2 = (brp & MSK_BRPE_BRPE) >> BIT_BRPE_BRPE;
> > + iowrite32(reg1, (&(priv->regs)->bitt));
> > + iowrite32(reg2, (&(priv->regs)->brpe));
> > + pch_can_bit_clear(&(priv->regs)->cont, CAN_CTRL_CCE);
> > +
> > + if (curr_mode == PCH_CAN_RUN)
> > + pch_can_set_run_mode(priv, PCH_CAN_RUN);
>
> Ditto.
>
> > + return 0;
> > +}
> > +
> > +static void pch_can_start(struct net_device *ndev)
> > +{
> > + struct pch_can_priv *priv = netdev_priv(ndev);
> > +
> > + if (priv->can.state != CAN_STATE_STOPPED)
> > + pch_can_reset(priv);
> > +
> > + pch_set_bittiming(ndev);
> > + pch_can_set_optmode(priv);
> > + priv->can.state = CAN_STATE_ERROR_ACTIVE;
>
> Hm, where do you really start the controller. I'm missing
> pch_can_set_run_mode(priv, PCH_CAN_RUN).
>
> > + return;
> > +}
> > +
> > +static int pch_can_do_set_mode(struct net_device *ndev, enum can_mode mode)
> > +{
> > + int ret = 0;
> > +
> > + switch (mode) {
> > + case CAN_MODE_START:
> > + pch_can_start(ndev);
> > + netif_wake_queue(ndev);
> > + break;
> > + default:
> > + ret = -EOPNOTSUPP;
> > + break;
> > + }
> > +
> > + return ret;
> > +}
>
> Note that this function is called when the device will recover from bus-off.
>
> > +static int pch_can_get_state(const struct net_device *ndev,
> > +      enum can_state *state)
> > +{
> > + struct pch_can_priv *priv = netdev_priv(ndev);
> > +
> > + *state = priv->can.state;
> > + return 0;
> > +}
>
> There is no need for that function as the driver handles state changes
> in the interrupt handler.
>
> > +static int pch_open(struct net_device *ndev)
>
> That's confussing! Please use the prefix pch_can throught this file.
>
> > +{
> > + struct pch_can_priv *priv = netdev_priv(ndev);
> > + int retval;
> > +
> > + pch_can_open(priv);
>
> This function already starts the controller, which is too *early*.
>
> > +
> > + retval = pci_enable_msi(priv->dev);
> > + if (retval) {
> > + dev_err(&ndev->dev, "Unable to allocate MSI ret=%d\n", retval);
> > + goto pci_en_msi_err;
> > + }
> > +
> > + /* Regsitering the interrupt. */
> > + retval = request_irq(priv->dev->irq, pch_can_handler, IRQF_SHARED,
> > +      ndev->name, ndev);
> > + if (retval) {
> > + dev_err(&ndev->dev, "request_irq failed.\n");
> > + goto req_irq_err;
> > + }
> > +
> > + /* Assuming that no bus off interrupt. */
> > + priv->bus_off_interrupt = 0;
> > +
> > + /* Open common can device */
> > + retval = open_candev(ndev);
> > + if (retval) {
> > + dev_err(ndev->dev.parent, "open_candev() failed %d\n", retval);
> > + goto err_open_candev;
> > + }
> > +
> > + pch_can_start(ndev);
>
> Thde above function should finally start the controller.
>
> > + netif_start_queue(ndev);
> > +
> > + return 0;
> > +
> > +err_open_candev:
> > + free_irq(priv->dev->irq, ndev);
> > +req_irq_err:
> > + pci_disable_msi(priv->dev);
> > +pci_en_msi_err:
> > + pch_can_release(priv);
> > +
> > + return retval;
> > +}
> > +
> > +static int pch_close(struct net_device *ndev)
> > +{
> > + struct pch_can_priv *priv = netdev_priv(ndev);
> > +
> > + netif_stop_queue(ndev);
> > + pch_can_release(priv);
> > + free_irq(priv->dev->irq, ndev);
> > + pci_disable_msi(priv->dev);
> > + close_candev(ndev);
> > + priv->can.state = CAN_STATE_STOPPED;
> > + return 0;
> > +}
> > +
> > +static int pch_get_free_msg_obj(struct net_device *ndev)
> > +{
> > + u32 buffer_status = 0;
> > + u32 tx_disable_counter = 0;
> > + u32 tx_buffer_avail = 0;
> > + u32 status;
> > + s32 i;
> > + struct pch_can_priv *priv = netdev_priv(ndev);
> > +
> > + /* Getting the message object status. */
> > + buffer_status = (u32) pch_can_get_buffer_status(priv);
> > +
> > + /* Getting the free transmit message object. */
> > + for (i = 0; i < PCH_OBJ_NUM; i++) {
> > + if ((priv->msg_obj[i] == MSG_OBJ_TX)) {
> > + /* Checking whether the object is enabled. */
> > + pch_can_get_tx_enable(priv, i + 1, &status);
> > + if (status == ENABLE) {
> > + if (!((buffer_status >> i) & 1)) {
> > + tx_buffer_avail = (i + 1);
> > + break;
> > + }
> > + } else {
> > + tx_disable_counter++;
> > + }
> > + }
> > + }
> > +
> > + /* If no transmit object available. */
> > + if (!tx_buffer_avail) {
> > + /* If no object is enabled. */
> > + if ((tx_disable_counter == PCH_TX_OBJ_NUM)) {
> > + dev_err(&ndev->dev, "All tx buffers are disabled.\n");
> > + return -EPERM;
> > + } else {
> > + dev_err(&ndev->dev, "%s:No tx buf free.\n", __func__);
> > + return -PCH_CAN_NO_TX_BUFF;
> > + }
> > + }
> > + return tx_buffer_avail;
> > +}
> > +
> > +static netdev_tx_t pch_xmit(struct sk_buff *skb, struct net_device *ndev)
> > +{
> > + canid_t id;
> > + u32 id1 = 0;
> > + u32 id2 = 0;
>
> Need these values to be preset?
>
> > + u32 run_mode;
> > + u32 i, j;
>
> It's common to use type "int" for the usual incrementer value... as you
> do in other places as well. Please check!
>
> > + unsigned long flags;
> > + struct pch_can_priv *priv = netdev_priv(ndev);
> > + struct can_frame *cf = (struct can_frame *)skb->data;
> > + struct net_device_stats *stats = &ndev->stats;
> > + u32 tx_buffer_avail = 0;
> > +
> > + if (can_dropped_invalid_skb(ndev, skb))
> > + return NETDEV_TX_OK;
> > +
> > + /* Getting the current CAN mode. */
> > + pch_can_get_run_mode(priv, &run_mode);
> > + if (run_mode != PCH_CAN_RUN) {
> > + dev_err(&ndev->dev, "CAN stopped on transmit attempt.\n");
> > + return -EPERM;
> > + }
>
> Can this happen? I think this check can be removed. Anyway, -EPERM is
> not a valid return value for that function.
>
> > +
> > + tx_buffer_avail = pch_get_free_msg_obj(ndev);
> > + if (tx_buffer_avail < 0)
> > + return tx_buffer_avail;
>
> Wrong return value?
>
> > +
> > + /* Attaining the lock. */
> > + spin_lock_irqsave(&priv->msgif_reg_lock, flags);
> > +
> > + /* Reading the Msg Obj from the Msg RAM to the Interface register. */
> > + iowrite32(CAN_CMASK_RX_TX_GET, &(priv->regs)->if1_cmask);
> > + pch_can_check_if1_busy(priv, tx_buffer_avail);
> > +
> > + /* Setting the CMASK register. */
> > + pch_can_bit_set(&(priv->regs)->if1_cmask, CAN_CMASK_ALL);
> > +
> > + /* If ID extended is set. */
> > + if (cf->can_id & CAN_EFF_FLAG) {
> > + id =  cf->can_id & CAN_EFF_MASK;
> > + id1 = id & 0xffff;
> > + id2 = ((id & (0x1fff << 16)) >> 16) | CAN_ID2_XTD;
>
> Please use some more macro definitions for the sake of readability.
>
> > + } else {
> > + id =  cf->can_id & CAN_SFF_MASK;
> > + id1 = 0;
> > + id2 = ((id & CAN_SFF_MASK) << 2);
> > + }
> > + pch_can_bit_clear(&(priv->regs)->if1_id1, 0xffff);
> > + pch_can_bit_clear(&(priv->regs)->if1_id2, 0x1fff | CAN_ID2_XTD);
> > + pch_can_bit_set(&(priv->regs)->if1_id1, id1);
> > + pch_can_bit_set(&(priv->regs)->if1_id2, id2);
> > +
> > + /* If remote frame has to be transmitted.. */
> > + if (cf->can_id & CAN_RTR_FLAG)
> > + pch_can_bit_clear(&(priv->regs)->if1_id2, CAN_ID2_DIR);
> > +
> > + for (i = 0, j = 0; i < cf->can_dlc; j++) {
> > + iowrite32(le32_to_cpu(cf->data[i++]),
> > + (&(priv->regs)->if1_dataa1) + j*4);
> > + if (i == cf->can_dlc)
> > + break;
> > + iowrite32(le32_to_cpu(cf->data[i++] << 8),
> > + (&(priv->regs)->if1_dataa1) + j*4);
> > + }
> > + can_put_echo_skb(skb, ndev, 0);
> > +
> > + /* Updating the size of the data. */
> > + pch_can_bit_clear(&(priv->regs)->if1_mcont, 0x0f);
> > + pch_can_bit_set(&(priv->regs)->if1_mcont, cf->can_dlc);
> > +
> > + /* Clearing IntPend, NewDat & TxRqst */
> > + pch_can_bit_clear(&(priv->regs)->if1_mcont,
> > +    (CAN_IF_MCONT_NEWDAT | CAN_IF_MCONT_INTPND |
> > +     CAN_IF_MCONT_TXRQXT));
> > +
> > + /* Setting NewDat, TxRqst bits */
> > + pch_can_bit_set(&(priv->regs)->if1_mcont,
> > + (CAN_IF_MCONT_NEWDAT | CAN_IF_MCONT_TXRQXT));
> > +
> > + pch_can_check_if1_busy(priv, tx_buffer_avail);
> > + spin_unlock_irqrestore(&priv->msgif_reg_lock, flags);
> > +
> > + stats->tx_bytes += cf->can_dlc;
> > + stats->tx_packets++;
>
> That shoould be incremented when the TX done interrupt is handled.
>
> > +
> > + return NETDEV_TX_OK;
> > +}
> > +
> > +static const struct net_device_ops pch_can_netdev_ops = {
> > + .ndo_open = pch_open,
> > + .ndo_stop = pch_close,
> > + .ndo_start_xmit = pch_xmit,
> > +};
> > +
> > +static void __devexit pch_can_remove(struct pci_dev *pdev)
> > +{
> > + struct net_device *ndev = pci_get_drvdata(pdev);
> > + struct pch_can_priv *priv = netdev_priv(ndev);
> > +
> > + unregister_candev(priv->ndev);
> > + free_candev(priv->ndev);
> > + pci_iounmap(pdev, priv->base);
> > + pci_release_regions(pdev);
> > + pci_disable_device(pdev);
> > + pci_set_drvdata(pdev, NULL);
> > + pch_can_reset(priv);
> > +}
> > +
> > +#ifdef CONFIG_PM
> > +static int pch_can_suspend(struct pci_dev *pdev, pm_message_t state)
> > +{
> > + int i; /* Counter variable. */
> > + int retval; /* Return value. */
> > + u32 buf_stat; /* Variable for reading the transmit buffer status. */
> > + u32 counter = 0xFFFFFF;
> > +
> > + struct net_device *dev = pci_get_drvdata(pdev);
> > + struct pch_can_priv *priv = netdev_priv(dev);
> > +
> > + /* Stop the CAN controller */
> > + pch_can_set_run_mode(priv, PCH_CAN_STOP);
> > +
> > + /* Indicate that we are aboutto/in suspend */
> > + priv->can.state = CAN_STATE_SLEEPING;
> > +
> > + /* Waiting for all transmission to complete. */
> > + while (counter) {
> > + buf_stat = pch_can_get_buffer_status(priv);
> > + if (!buf_stat)
> > + break;
> > + counter--;
> > + }
> > + if (!counter)
> > + dev_err(&pdev->dev, "%s -> Transmission time out.\n", __func__);
>
> Timeout without defined delay!
>
> > + /* Save interrupt configuration and then disable them */
> > + pch_can_get_int_enables(priv, &(priv->int_enables));
> > + pch_can_set_int_enables(priv, PCH_CAN_DISABLE);
> > +
> > + /* Save Tx buffer enable state */
> > + for (i = 0; i < PCH_OBJ_NUM; i++) {
> > + if (priv->msg_obj[i] == MSG_OBJ_TX)
> > + pch_can_get_tx_enable(priv, (i + 1),
> > +       &(priv->tx_enable[i]));
> > + }
> > +
> > + /* Disable all Transmit buffers */
> > + pch_can_tx_disable_all(priv);
> > +
> > + /* Save Rx buffer enable state */
> > + for (i = 0; i < PCH_OBJ_NUM; i++) {
> > + if (priv->msg_obj[i] == MSG_OBJ_RX) {
> > + pch_can_get_rx_enable(priv, (i + 1),
> > + &(priv->rx_enable[i]));
> > + pch_can_get_rx_buffer_link(priv, (i + 1),
> > + &(priv->rx_link[i]));
> > + }
> > + }
> > +
> > + /* Disable all Receive buffers */
> > + pch_can_rx_disable_all(priv);
> > + retval = pci_save_state(pdev);
> > + if (retval) {
> > + dev_err(&pdev->dev, "pci_save_state failed.\n");
> > + } else {
> > + pci_enable_wake(pdev, PCI_D3hot, 0);
> > + pci_disable_device(pdev);
> > + pci_set_power_state(pdev, pci_choose_state(pdev, state));
> > + }
> > +
> > + return retval;
> > +}
> > +
> > +static int pch_can_resume(struct pci_dev *pdev)
> > +{
> > + int i; /* Counter variable. */
> > + int retval; /* Return variable. */
> > + struct net_device *dev = pci_get_drvdata(pdev);
> > + struct pch_can_priv *priv = netdev_priv(dev);
> > +
> > + pci_set_power_state(pdev, PCI_D0);
> > + pci_restore_state(pdev);
> > + retval = pci_enable_device(pdev);
> > + if (retval) {
> > + dev_err(&pdev->dev, "pci_enable_device failed.\n");
> > + return retval;
> > + }
> > +
> > + pci_enable_wake(pdev, PCI_D3hot, 0);
> > +
> > + priv->can.state = CAN_STATE_ERROR_ACTIVE;
> > +
> > + /* Disabling all interrupts. */
> > + pch_can_set_int_enables(priv, PCH_CAN_DISABLE);
> > +
> > + /* Setting the CAN device in Stop Mode. */
> > + pch_can_set_run_mode(priv, PCH_CAN_STOP);
> > +
> > + /* Configuring the transmit and receive buffers. */
> > + pch_can_config_rx_tx_buffers(priv);
> > +
> > + /* Restore the CAN state */
> > + pch_set_bittiming(dev);
> > +
> > + /* Listen/Active */
> > + pch_can_set_optmode(priv);
> > +
> > + /* Enabling the transmit buffer. */
> > + for (i = 0; i < PCH_OBJ_NUM; i++) {
> > + if (priv->msg_obj[i] == MSG_OBJ_TX) {
> > + pch_can_set_tx_enable(priv, i + 1,
> > +       priv->tx_enable[i]);
> > + }
> > + }
> > +
> > + /* Configuring the receive buffer and enabling them. */
> > + for (i = 0; i < PCH_OBJ_NUM; i++) {
> > + if (priv->msg_obj[i] == MSG_OBJ_RX) {
> > + /* Restore buffer link */
> > + pch_can_set_rx_buffer_link(priv, i + 1,
> > +    priv->rx_link[i]);
> > +
> > + /* Restore buffer enables */
> > + pch_can_set_rx_enable(priv, i + 1, priv->rx_enable[i]);
> > + }
> > + }
> > +
> > + /* Enable CAN Interrupts */
> > + pch_can_set_int_custom(priv);
> > +
> > + /* Restore Run Mode */
> > + pch_can_set_run_mode(priv, PCH_CAN_RUN);
> > +
> > + return retval;
> > +}
>
> Are the suspend and resume functions tested?
>
> > +#else
> > +#define pch_can_suspend NULL
> > +#define pch_can_resume NULL
> > +#endif
>
> Add empty line here
>
> > +static int __devinit pch_can_probe(struct pci_dev *pdev,
> > +    const struct pci_device_id *id)
> > +{
> > + struct net_device *ndev;
> > + struct pch_can_priv *priv;
> > + int rc;
> > + int index;
> > + void __iomem *addr;
> > +
> > + rc = pci_enable_device(pdev);
> > + if (rc) {
> > + dev_err(&pdev->dev, "Failed pci_enable_device %d\n", rc);
> > + goto probe_exit_endev;
> > + }
> > +
> > + rc = pci_request_regions(pdev, KBUILD_MODNAME);
> > + if (rc) {
> > + dev_err(&pdev->dev, "Failed pci_request_regions %d\n", rc);
> > + goto probe_exit_pcireq;
> > + }
> > +
> > + addr = pci_iomap(pdev, 1, 0);
> > + if (!addr) {
> > + rc = -EIO;
> > + dev_err(&pdev->dev, "Failed pci_iomap\n");
> > + goto probe_exit_ipmap;
> > + }
> > +
> > + ndev = alloc_candev(sizeof(struct pch_can_priv), 1);
> > + if (!ndev) {
> > + rc = -ENOMEM;
> > + dev_err(&pdev->dev, "Failed alloc_candev\n");
> > + goto probe_exit_alloc_candev;
> > + }
> > +
> > + priv = netdev_priv(ndev);
> > + priv->ndev = ndev;
> > + priv->base = addr;
> > + priv->regs = addr;
> > + priv->dev = pdev;
> > + priv->can.bittiming_const = &pch_can_bittiming_const;
> > + priv->can.do_set_mode = pch_can_do_set_mode;
> > + priv->can.do_get_state = pch_can_get_state;
>
> Not needed! See above.
>
> Could you please also implement do_get_berr_counter().
>
> > + priv->can.ctrlmode_supported = CAN_CTRLMODE_LISTENONLY |
> > +        CAN_CTRLMODE_LOOPBACK;
> > + ndev->irq = pdev->irq;
> > + ndev->flags |= IFF_ECHO;
> > +
> > + pci_set_drvdata(pdev, ndev);
> > + SET_NETDEV_DEV(ndev, &pdev->dev);
> > + ndev->netdev_ops = &pch_can_netdev_ops;
> > +
> > + priv->can.clock.freq = PCH_CAN_CLK * 1000; /* Hz to KHz) */
> > + for (index = 0; index < PCH_RX_OBJ_NUM;)
> > + priv->msg_obj[index++] = MSG_OBJ_RX;
> > +
> > + for (index = index;  index < PCH_OBJ_NUM;)
> > + priv->msg_obj[index++] = MSG_OBJ_TX;
> > +
> > + rc = register_candev(ndev);
> > + if (rc) {
> > + dev_err(&pdev->dev, "Failed register_candev %d\n", rc);
> > + goto probe_exit_reg_candev;
> > + }
> > +
> > + return 0;
> > +
> > +probe_exit_reg_candev:
> > + free_candev(ndev);
> > +probe_exit_alloc_candev:
> > + pci_iounmap(pdev, addr);
> > +probe_exit_ipmap:
> > + pci_release_regions(pdev);
> > +probe_exit_pcireq:
> > + pci_disable_device(pdev);
> > +probe_exit_endev:
> > + return rc;
> > +}
> > +
> > +static struct pci_driver pch_can_pcidev = {
> > + .name = KBUILD_MODNAME,
> > + .id_table = pch_can_pcidev_id,
> > + .probe = pch_can_probe,
> > + .remove = __devexit_p(pch_can_remove),
> > + .suspend = pch_can_suspend,
> > + .resume = pch_can_resume,
> > +};
> > +
> > +static int __init pch_can_pci_init(void)
> > +{
> > + return pci_register_driver(&pch_can_pcidev);
> > +}
> > +module_init(pch_can_pci_init);
> > +
> > +static void __exit pch_can_pci_exit(void)
> > +{
> > + pci_unregister_driver(&pch_can_pcidev);
> > +}
> > +module_exit(pch_can_pci_exit);
> > +
> > +MODULE_DESCRIPTION("Controller Area Network Driver");
> > +MODULE_LICENSE("GPL");
>
> GPL v2 ?
>
> > +MODULE_VERSION("0.94");
> > +MODULE_DEVICE_TABLE(pci, pch_can_pcidev_id);
>
> Please add it below the declaration of pch_can_pcidev_id.
>
> In this driver you are using just *one* RX object. This means that the
> CPU must handle new messages as quickly as possible otherwise message
> losses will happen, right?. For sure, this will not make user's happy.
> Any chance to use more RX objects in FIFO mode?
>
> Thanks,
>
> Wolfgang.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: Packet time delays on multi-core systems
From: Alexey Vlasov @ 2010-10-01 10:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Mailing List, netdev
In-Reply-To: <1285872765.2615.1008.camel@edumazet-laptop>

On Thu, Sep 30, 2010 at 08:52:45PM +0200, Eric Dumazet wrote:
> 
> If you have a burst of 'LOG' matches, it can really slow down the whole
> thing.
> 
> You should add a limiter (eg: no more than 5 messages per second)
> 
> http://netfilter.org/documentation/HOWTO/packet-filtering-HOWTO-7.html

Yes, sometimes there aren't even 5 packets a second, but also it slows down.
So I don't think this is the reason.

I have also found that:
1. rx overruns is increasing.
2. rx_queue_drop_packet_count is increasing.
# ethtool -S eth0 | grep drop
     tx_dropped: 0
     rx_queue_drop_packet_count: 1260743751
     dropped_smbus: 0
     rx_queue_0_drops: 0
     rx_queue_1_drops: 0
     rx_queue_2_drops: 0
     rx_queue_3_drops: 0

3. By sending SYN-packets by hping, RST packet doesn't send, but I don't know may 
be it is just some feature in 2.6.32.
newbox # hping -c 1 -S -p 80 111.111.111.111
HPING 111.111.111.111 (eth0 111.111.111.111): S set, 40 headers + 0 data bytes
len=46 ip=111.111.111.111 ttl=58 DF id=11471 sport=80 flags=SA seq=0 win=65535 rtt=99.0 ms

--- 111.111.111.111 hping statistic ---
1 packets tramitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 99.0/99.0/99.0 ms

13:59:07.439528 IP newbox.2777 > 111.111.111.111.80: S 345595033:345595033(0) win 512
13:59:07.439626 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
13:59:10.439368 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
13:59:16.439313 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
13:59:28.439206 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>

As a result I got doubles:
DUP! len=46 ip=111.111.111.111 ttl=58 DF id=27454 sport=80 flags=SA seq=0 win=65535 rtt=3137.8 ms

Example of another TCP-session from 2.6.28 kernel:
oldbox # hping -c 1 -S -p 80 111.111.111.111
HPING 111.111.111.111 (eth0 111.111.111.111): S set, 40 headers + 0 data bytes
len=46 ip=111.111.111.111 ttl=58 DF id=53180 sport=80 flags=SA seq=0 win=65535 rtt=2.9 ms

--- 111.111.111.111 hping statistic ---
1 packets tramitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 2.9/2.9/2.9 ms

14:01:45.225136 IP oldbox.2776 > 111.111.111.111.80: S 1983626200:1983626200(0) win 512
14:01:45.225288 IP 111.111.111.111.80 > oldbox.2776: S 3796385036:3796385036(0) ack 1983626201 win 65535 <mss 1460>
14:01:45.227990 IP oldbox.2776 > 111.111.111.111.80: R 1983626201:1983626201(0) win 0

-- 
BRGDS. Alexey Vlasov.

^ permalink raw reply

* [PATCH net-next-2.6] be2net: add multiple RX queue support
From: Sathya Perla @ 2010-10-01 10:41 UTC (permalink / raw)
  To: netdev

Dave, incorporated your comment to discover the num of supported MSIx vectors
dynamically; thanks.

This patch adds multiple RX queue support to be2net. There are
upto 4 extra rx-queues per port into which TCP/UDP traffic can be hashed into.
Some of the ethtool stats are now displayed on a per queue basis.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
 drivers/net/benet/be.h         |   82 +++---
 drivers/net/benet/be_cmds.c    |   40 +++-
 drivers/net/benet/be_cmds.h    |   32 ++-
 drivers/net/benet/be_ethtool.c |  174 ++++++++-----
 drivers/net/benet/be_main.c    |  561 ++++++++++++++++++++++------------------
 5 files changed, 526 insertions(+), 363 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index 4faf696..1afabb1 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -78,6 +78,8 @@ static inline char *nic_name(struct pci_dev *pdev)
 #define MCC_Q_LEN		128	/* total size not to exceed 8 pages */
 #define MCC_CQ_LEN		256
 
+#define MAX_RSS_QS		4	/* BE limit is 4 queues/port */
+#define BE_MAX_MSIX_VECTORS	(MAX_RSS_QS + 1 + 1)/* RSS qs + 1 def Rx + Tx */
 #define BE_NAPI_WEIGHT		64
 #define MAX_RX_POST 		BE_NAPI_WEIGHT /* Frags posted at a time */
 #define RX_FRAGS_REFILL_WM	(RX_Q_LEN - MAX_RX_POST)
@@ -157,10 +159,9 @@ struct be_mcc_obj {
 	bool rearm_cq;
 };
 
-struct be_drvr_stats {
+struct be_tx_stats {
 	u32 be_tx_reqs;		/* number of TX requests initiated */
 	u32 be_tx_stops;	/* number of times TX Q was stopped */
-	u32 be_fwd_reqs;	/* number of send reqs through forwarding i/f */
 	u32 be_tx_wrbs;		/* number of tx WRBs used */
 	u32 be_tx_events;	/* number of tx completion events  */
 	u32 be_tx_compl;	/* number of tx completion entries processed */
@@ -169,35 +170,6 @@ struct be_drvr_stats {
 	u64 be_tx_bytes_prev;
 	u64 be_tx_pkts;
 	u32 be_tx_rate;
-
-	u32 cache_barrier[16];
-
-	u32 be_ethrx_post_fail;/* number of ethrx buffer alloc failures */
-	u32 be_rx_polls;	/* number of times NAPI called poll function */
-	u32 be_rx_events;	/* number of ucast rx completion events  */
-	u32 be_rx_compl;	/* number of rx completion entries processed */
-	ulong be_rx_jiffies;
-	u64 be_rx_bytes;
-	u64 be_rx_bytes_prev;
-	u64 be_rx_pkts;
-	u32 be_rx_rate;
-	u32 be_rx_mcast_pkt;
-	/* number of non ether type II frames dropped where
-	 * frame len > length field of Mac Hdr */
-	u32 be_802_3_dropped_frames;
-	/* number of non ether type II frames malformed where
-	 * in frame len < length field of Mac Hdr */
-	u32 be_802_3_malformed_frames;
-	u32 be_rxcp_err;	/* Num rx completion entries w/ err set. */
-	ulong rx_fps_jiffies;	/* jiffies at last FPS calc */
-	u32 be_rx_frags;
-	u32 be_prev_rx_frags;
-	u32 be_rx_fps;		/* Rx frags per second */
-};
-
-struct be_stats_obj {
-	struct be_drvr_stats drvr_stats;
-	struct be_dma_mem cmd;
 };
 
 struct be_tx_obj {
@@ -215,10 +187,34 @@ struct be_rx_page_info {
 	bool last_page_user;
 };
 
+struct be_rx_stats {
+	u32 rx_post_fail;/* number of ethrx buffer alloc failures */
+	u32 rx_polls;	/* number of times NAPI called poll function */
+	u32 rx_events;	/* number of ucast rx completion events  */
+	u32 rx_compl;	/* number of rx completion entries processed */
+	ulong rx_jiffies;
+	u64 rx_bytes;
+	u64 rx_bytes_prev;
+	u64 rx_pkts;
+	u32 rx_rate;
+	u32 rx_mcast_pkts;
+	u32 rxcp_err;	/* Num rx completion entries w/ err set. */
+	ulong rx_fps_jiffies;	/* jiffies at last FPS calc */
+	u32 rx_frags;
+	u32 prev_rx_frags;
+	u32 rx_fps;		/* Rx frags per second */
+};
+
 struct be_rx_obj {
+	struct be_adapter *adapter;
 	struct be_queue_info q;
 	struct be_queue_info cq;
 	struct be_rx_page_info page_info_tbl[RX_Q_LEN];
+	struct be_eq_obj rx_eq;
+	struct be_rx_stats stats;
+	u8 rss_id;
+	bool rx_post_starved;	/* Zero rx frags have been posted to BE */
+	u32 cache_line_barrier[16];
 };
 
 struct be_vf_cfg {
@@ -229,7 +225,6 @@ struct be_vf_cfg {
 	u32 vf_tx_rate;
 };
 
-#define BE_NUM_MSIX_VECTORS		2	/* 1 each for Tx and Rx */
 #define BE_INVALID_PMAC_ID		0xffffffff
 struct be_adapter {
 	struct pci_dev *pdev;
@@ -249,21 +244,21 @@ struct be_adapter {
 	spinlock_t mcc_lock;	/* For serializing mcc cmds to BE card */
 	spinlock_t mcc_cq_lock;
 
-	struct msix_entry msix_entries[BE_NUM_MSIX_VECTORS];
+	struct msix_entry msix_entries[BE_MAX_MSIX_VECTORS];
 	bool msix_enabled;
 	bool isr_registered;
 
 	/* TX Rings */
 	struct be_eq_obj tx_eq;
 	struct be_tx_obj tx_obj;
+	struct be_tx_stats tx_stats;
 
 	u32 cache_line_break[8];
 
 	/* Rx rings */
-	struct be_eq_obj rx_eq;
-	struct be_rx_obj rx_obj;
+	struct be_rx_obj rx_obj[MAX_RSS_QS + 1]; /* one default non-rss Q */
+	u32 num_rx_qs;
 	u32 big_page_size;	/* Compounded page size shared by rx wrbs */
-	bool rx_post_starved;	/* Zero rx frags have been posted to BE */
 
 	struct vlan_group *vlan_grp;
 	u16 vlans_added;
@@ -271,7 +266,7 @@ struct be_adapter {
 	u8 vlan_tag[VLAN_GROUP_ARRAY_LEN];
 	struct be_dma_mem mc_cmd_mem;
 
-	struct be_stats_obj stats;
+	struct be_dma_mem stats_cmd;
 	/* Work queue used to perform periodic tasks like getting statistics */
 	struct delayed_work work;
 
@@ -287,6 +282,7 @@ struct be_adapter {
 	bool promiscuous;
 	bool wol;
 	u32 function_mode;
+	u32 function_caps;
 	u32 rx_fc;		/* Rx flow control */
 	u32 tx_fc;		/* Tx flow control */
 	bool ue_detected;
@@ -313,10 +309,20 @@ struct be_adapter {
 
 extern const struct ethtool_ops be_ethtool_ops;
 
-#define drvr_stats(adapter)		(&adapter->stats.drvr_stats)
+#define tx_stats(adapter)		(&adapter->tx_stats)
+#define rx_stats(rxo)			(&rxo->stats)
 
 #define BE_SET_NETDEV_OPS(netdev, ops)	(netdev->netdev_ops = ops)
 
+#define for_all_rx_queues(adapter, rxo, i)				\
+	for (i = 0, rxo = &adapter->rx_obj[i]; i < adapter->num_rx_qs;	\
+		i++, rxo++)
+
+/* Just skip the first default non-rss queue */
+#define for_all_rss_queues(adapter, rxo, i)				\
+	for (i = 0, rxo = &adapter->rx_obj[i+1]; i < (adapter->num_rx_qs - 1);\
+		i++, rxo++)
+
 #define PAGE_SHIFT_4K		12
 #define PAGE_SIZE_4K		(1 << PAGE_SHIFT_4K)
 
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 0db28b4..bf2dc26 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -71,7 +71,7 @@ static int be_mcc_compl_process(struct be_adapter *adapter,
 	if (compl_status == MCC_STATUS_SUCCESS) {
 		if (compl->tag0 == OPCODE_ETH_GET_STATISTICS) {
 			struct be_cmd_resp_get_stats *resp =
-						adapter->stats.cmd.va;
+						adapter->stats_cmd.va;
 			be_dws_le_to_cpu(&resp->hw_stats,
 						sizeof(resp->hw_stats));
 			netdev_stats_update(adapter);
@@ -754,7 +754,7 @@ int be_cmd_txq_create(struct be_adapter *adapter,
 /* Uses mbox */
 int be_cmd_rxq_create(struct be_adapter *adapter,
 		struct be_queue_info *rxq, u16 cq_id, u16 frag_size,
-		u16 max_frame_size, u32 if_id, u32 rss)
+		u16 max_frame_size, u32 if_id, u32 rss, u8 *rss_id)
 {
 	struct be_mcc_wrb *wrb;
 	struct be_cmd_req_eth_rx_create *req;
@@ -785,6 +785,7 @@ int be_cmd_rxq_create(struct be_adapter *adapter,
 		struct be_cmd_resp_eth_rx_create *resp = embedded_payload(wrb);
 		rxq->id = le16_to_cpu(resp->id);
 		rxq->created = true;
+		*rss_id = resp->rss_id;
 	}
 
 	spin_unlock(&adapter->mbox_lock);
@@ -1259,7 +1260,8 @@ err:
 }
 
 /* Uses mbox */
-int be_cmd_query_fw_cfg(struct be_adapter *adapter, u32 *port_num, u32 *mode)
+int be_cmd_query_fw_cfg(struct be_adapter *adapter, u32 *port_num,
+		u32 *mode, u32 *caps)
 {
 	struct be_mcc_wrb *wrb;
 	struct be_cmd_req_query_fw_cfg *req;
@@ -1281,6 +1283,7 @@ int be_cmd_query_fw_cfg(struct be_adapter *adapter, u32 *port_num, u32 *mode)
 		struct be_cmd_resp_query_fw_cfg *resp = embedded_payload(wrb);
 		*port_num = le32_to_cpu(resp->phys_port);
 		*mode = le32_to_cpu(resp->function_mode);
+		*caps = le32_to_cpu(resp->function_caps);
 	}
 
 	spin_unlock(&adapter->mbox_lock);
@@ -1311,6 +1314,37 @@ int be_cmd_reset_function(struct be_adapter *adapter)
 	return status;
 }
 
+int be_cmd_rss_config(struct be_adapter *adapter, u8 *rsstable, u16 table_size)
+{
+	struct be_mcc_wrb *wrb;
+	struct be_cmd_req_rss_config *req;
+	u32 myhash[10];
+	int status;
+
+	spin_lock(&adapter->mbox_lock);
+
+	wrb = wrb_from_mbox(adapter);
+	req = embedded_payload(wrb);
+
+	be_wrb_hdr_prepare(wrb, sizeof(*req), true, 0,
+		OPCODE_ETH_RSS_CONFIG);
+
+	be_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_ETH,
+		OPCODE_ETH_RSS_CONFIG, sizeof(*req));
+
+	req->if_id = cpu_to_le32(adapter->if_handle);
+	req->enable_rss = cpu_to_le16(RSS_ENABLE_TCP_IPV4 | RSS_ENABLE_IPV4);
+	req->cpu_table_size_log2 = cpu_to_le16(fls(table_size) - 1);
+	memcpy(req->cpu_table, rsstable, table_size);
+	memcpy(req->hash, myhash, sizeof(myhash));
+	be_dws_cpu_to_le(req->hash, sizeof(req->hash));
+
+	status = be_mbox_notify_wait(adapter);
+
+	spin_unlock(&adapter->mbox_lock);
+	return status;
+}
+
 /* Uses sync mcc */
 int be_cmd_set_beacon_state(struct be_adapter *adapter, u8 port_num,
 			u8 bcn, u8 sts, u8 state)
diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index ad1e6fa..b7a40b1 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -147,6 +147,7 @@ struct be_mcc_mailbox {
 #define OPCODE_COMMON_READ_TRANSRECV_DATA		73
 #define OPCODE_COMMON_GET_PHY_DETAILS			102
 
+#define OPCODE_ETH_RSS_CONFIG				1
 #define OPCODE_ETH_ACPI_CONFIG				2
 #define OPCODE_ETH_PROMISCUOUS				3
 #define OPCODE_ETH_GET_STATISTICS			4
@@ -409,7 +410,7 @@ struct be_cmd_req_eth_rx_create {
 struct be_cmd_resp_eth_rx_create {
 	struct be_cmd_resp_hdr hdr;
 	u16 id;
-	u8 cpu_id;
+	u8 rss_id;
 	u8 rsvd0;
 } __packed;
 
@@ -739,9 +740,10 @@ struct be_cmd_resp_modify_eq_delay {
 } __packed;
 
 /******************** Get FW Config *******************/
+#define BE_FUNCTION_CAPS_RSS			0x2
 struct be_cmd_req_query_fw_cfg {
 	struct be_cmd_req_hdr hdr;
-	u32 rsvd[30];
+	u32 rsvd[31];
 };
 
 struct be_cmd_resp_query_fw_cfg {
@@ -751,6 +753,26 @@ struct be_cmd_resp_query_fw_cfg {
 	u32 phys_port;
 	u32 function_mode;
 	u32 rsvd[26];
+	u32 function_caps;
+};
+
+/******************** RSS Config *******************/
+/* RSS types */
+#define RSS_ENABLE_NONE				0x0
+#define RSS_ENABLE_IPV4				0x1
+#define RSS_ENABLE_TCP_IPV4			0x2
+#define RSS_ENABLE_IPV6				0x4
+#define RSS_ENABLE_TCP_IPV6			0x8
+
+struct be_cmd_req_rss_config {
+	struct be_cmd_req_hdr hdr;
+	u32 if_id;
+	u16 enable_rss;
+	u16 cpu_table_size_log2;
+	u32 hash[10];
+	u8 cpu_table[128];
+	u8 flush;
+	u8 rsvd0[3];
 };
 
 /******************** Port Beacon ***************************/
@@ -937,7 +959,7 @@ extern int be_cmd_txq_create(struct be_adapter *adapter,
 extern int be_cmd_rxq_create(struct be_adapter *adapter,
 			struct be_queue_info *rxq, u16 cq_id,
 			u16 frag_size, u16 max_frame_size, u32 if_id,
-			u32 rss);
+			u32 rss, u8 *rss_id);
 extern int be_cmd_q_destroy(struct be_adapter *adapter, struct be_queue_info *q,
 			int type);
 extern int be_cmd_link_status_query(struct be_adapter *adapter,
@@ -960,8 +982,10 @@ extern int be_cmd_set_flow_control(struct be_adapter *adapter,
 extern int be_cmd_get_flow_control(struct be_adapter *adapter,
 			u32 *tx_fc, u32 *rx_fc);
 extern int be_cmd_query_fw_cfg(struct be_adapter *adapter,
-			u32 *port_num, u32 *cap);
+			u32 *port_num, u32 *function_mode, u32 *function_caps);
 extern int be_cmd_reset_function(struct be_adapter *adapter);
+extern int be_cmd_rss_config(struct be_adapter *adapter, u8 *rsstable,
+			u16 table_size);
 extern int be_process_mcc(struct be_adapter *adapter, int *status);
 extern int be_cmd_set_beacon_state(struct be_adapter *adapter,
 			u8 port_num, u8 beacon, u8 status, u8 state);
diff --git a/drivers/net/benet/be_ethtool.c b/drivers/net/benet/be_ethtool.c
index d920634..0f46366 100644
--- a/drivers/net/benet/be_ethtool.c
+++ b/drivers/net/benet/be_ethtool.c
@@ -26,14 +26,16 @@ struct be_ethtool_stat {
 	int offset;
 };
 
-enum {NETSTAT, PORTSTAT, MISCSTAT, DRVSTAT, ERXSTAT};
+enum {NETSTAT, PORTSTAT, MISCSTAT, DRVSTAT_TX, DRVSTAT_RX, ERXSTAT};
 #define FIELDINFO(_struct, field) FIELD_SIZEOF(_struct, field), \
 					offsetof(_struct, field)
 #define NETSTAT_INFO(field) 	#field, NETSTAT,\
 					FIELDINFO(struct net_device_stats,\
 						field)
-#define DRVSTAT_INFO(field) 	#field, DRVSTAT,\
-					FIELDINFO(struct be_drvr_stats, field)
+#define DRVSTAT_TX_INFO(field)	#field, DRVSTAT_TX,\
+					FIELDINFO(struct be_tx_stats, field)
+#define DRVSTAT_RX_INFO(field)	#field, DRVSTAT_RX,\
+					FIELDINFO(struct be_rx_stats, field)
 #define MISCSTAT_INFO(field) 	#field, MISCSTAT,\
 					FIELDINFO(struct be_rxf_stats, field)
 #define PORTSTAT_INFO(field) 	#field, PORTSTAT,\
@@ -51,21 +53,12 @@ static const struct be_ethtool_stat et_stats[] = {
 	{NETSTAT_INFO(tx_errors)},
 	{NETSTAT_INFO(rx_dropped)},
 	{NETSTAT_INFO(tx_dropped)},
-	{DRVSTAT_INFO(be_tx_reqs)},
-	{DRVSTAT_INFO(be_tx_stops)},
-	{DRVSTAT_INFO(be_fwd_reqs)},
-	{DRVSTAT_INFO(be_tx_wrbs)},
-	{DRVSTAT_INFO(be_rx_polls)},
-	{DRVSTAT_INFO(be_tx_events)},
-	{DRVSTAT_INFO(be_rx_events)},
-	{DRVSTAT_INFO(be_tx_compl)},
-	{DRVSTAT_INFO(be_rx_compl)},
-	{DRVSTAT_INFO(be_rx_mcast_pkt)},
-	{DRVSTAT_INFO(be_ethrx_post_fail)},
-	{DRVSTAT_INFO(be_802_3_dropped_frames)},
-	{DRVSTAT_INFO(be_802_3_malformed_frames)},
-	{DRVSTAT_INFO(be_tx_rate)},
-	{DRVSTAT_INFO(be_rx_rate)},
+	{DRVSTAT_TX_INFO(be_tx_rate)},
+	{DRVSTAT_TX_INFO(be_tx_reqs)},
+	{DRVSTAT_TX_INFO(be_tx_wrbs)},
+	{DRVSTAT_TX_INFO(be_tx_stops)},
+	{DRVSTAT_TX_INFO(be_tx_events)},
+	{DRVSTAT_TX_INFO(be_tx_compl)},
 	{PORTSTAT_INFO(rx_unicast_frames)},
 	{PORTSTAT_INFO(rx_multicast_frames)},
 	{PORTSTAT_INFO(rx_broadcast_frames)},
@@ -106,11 +99,24 @@ static const struct be_ethtool_stat et_stats[] = {
 	{MISCSTAT_INFO(rx_drops_too_many_frags)},
 	{MISCSTAT_INFO(rx_drops_invalid_ring)},
 	{MISCSTAT_INFO(forwarded_packets)},
-	{MISCSTAT_INFO(rx_drops_mtu)},
-	{ERXSTAT_INFO(rx_drops_no_fragments)},
+	{MISCSTAT_INFO(rx_drops_mtu)}
 };
 #define ETHTOOL_STATS_NUM ARRAY_SIZE(et_stats)
 
+/* Stats related to multi RX queues */
+static const struct be_ethtool_stat et_rx_stats[] = {
+	{DRVSTAT_RX_INFO(rx_bytes)},
+	{DRVSTAT_RX_INFO(rx_pkts)},
+	{DRVSTAT_RX_INFO(rx_rate)},
+	{DRVSTAT_RX_INFO(rx_polls)},
+	{DRVSTAT_RX_INFO(rx_events)},
+	{DRVSTAT_RX_INFO(rx_compl)},
+	{DRVSTAT_RX_INFO(rx_mcast_pkts)},
+	{DRVSTAT_RX_INFO(rx_post_fail)},
+	{ERXSTAT_INFO(rx_drops_no_fragments)}
+};
+#define ETHTOOL_RXSTATS_NUM (ARRAY_SIZE(et_rx_stats))
+
 static const char et_self_tests[][ETH_GSTRING_LEN] = {
 	"MAC Loopback test",
 	"PHY Loopback test",
@@ -143,7 +149,7 @@ static int
 be_get_coalesce(struct net_device *netdev, struct ethtool_coalesce *coalesce)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
-	struct be_eq_obj *rx_eq = &adapter->rx_eq;
+	struct be_eq_obj *rx_eq = &adapter->rx_obj[0].rx_eq;
 	struct be_eq_obj *tx_eq = &adapter->tx_eq;
 
 	coalesce->rx_coalesce_usecs = rx_eq->cur_eqd;
@@ -167,25 +173,49 @@ static int
 be_set_coalesce(struct net_device *netdev, struct ethtool_coalesce *coalesce)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
-	struct be_eq_obj *rx_eq = &adapter->rx_eq;
+	struct be_rx_obj *rxo;
+	struct be_eq_obj *rx_eq;
 	struct be_eq_obj *tx_eq = &adapter->tx_eq;
 	u32 tx_max, tx_min, tx_cur;
 	u32 rx_max, rx_min, rx_cur;
-	int status = 0;
+	int status = 0, i;
 
 	if (coalesce->use_adaptive_tx_coalesce == 1)
 		return -EINVAL;
 
-	/* if AIC is being turned on now, start with an EQD of 0 */
-	if (rx_eq->enable_aic == 0 &&
-		coalesce->use_adaptive_rx_coalesce == 1) {
-		rx_eq->cur_eqd = 0;
+	for_all_rx_queues(adapter, rxo, i) {
+		rx_eq = &rxo->rx_eq;
+
+		if (!rx_eq->enable_aic && coalesce->use_adaptive_rx_coalesce)
+			rx_eq->cur_eqd = 0;
+		rx_eq->enable_aic = coalesce->use_adaptive_rx_coalesce;
+
+		rx_max = coalesce->rx_coalesce_usecs_high;
+		rx_min = coalesce->rx_coalesce_usecs_low;
+		rx_cur = coalesce->rx_coalesce_usecs;
+
+		if (rx_eq->enable_aic) {
+			if (rx_max > BE_MAX_EQD)
+				rx_max = BE_MAX_EQD;
+			if (rx_min > rx_max)
+				rx_min = rx_max;
+			rx_eq->max_eqd = rx_max;
+			rx_eq->min_eqd = rx_min;
+			if (rx_eq->cur_eqd > rx_max)
+				rx_eq->cur_eqd = rx_max;
+			if (rx_eq->cur_eqd < rx_min)
+				rx_eq->cur_eqd = rx_min;
+		} else {
+			if (rx_cur > BE_MAX_EQD)
+				rx_cur = BE_MAX_EQD;
+			if (rx_eq->cur_eqd != rx_cur) {
+				status = be_cmd_modify_eqd(adapter, rx_eq->q.id,
+						rx_cur);
+				if (!status)
+					rx_eq->cur_eqd = rx_cur;
+			}
+		}
 	}
-	rx_eq->enable_aic = coalesce->use_adaptive_rx_coalesce;
-
-	rx_max = coalesce->rx_coalesce_usecs_high;
-	rx_min = coalesce->rx_coalesce_usecs_low;
-	rx_cur = coalesce->rx_coalesce_usecs;
 
 	tx_max = coalesce->tx_coalesce_usecs_high;
 	tx_min = coalesce->tx_coalesce_usecs_low;
@@ -199,27 +229,6 @@ be_set_coalesce(struct net_device *netdev, struct ethtool_coalesce *coalesce)
 			tx_eq->cur_eqd = tx_cur;
 	}
 
-	if (rx_eq->enable_aic) {
-		if (rx_max > BE_MAX_EQD)
-			rx_max = BE_MAX_EQD;
-		if (rx_min > rx_max)
-			rx_min = rx_max;
-		rx_eq->max_eqd = rx_max;
-		rx_eq->min_eqd = rx_min;
-		if (rx_eq->cur_eqd > rx_max)
-			rx_eq->cur_eqd = rx_max;
-		if (rx_eq->cur_eqd < rx_min)
-			rx_eq->cur_eqd = rx_min;
-	} else {
-		if (rx_cur > BE_MAX_EQD)
-			rx_cur = BE_MAX_EQD;
-		if (rx_eq->cur_eqd != rx_cur) {
-			status = be_cmd_modify_eqd(adapter, rx_eq->q.id,
-					rx_cur);
-			if (!status)
-				rx_eq->cur_eqd = rx_cur;
-		}
-	}
 	return 0;
 }
 
@@ -247,32 +256,25 @@ be_get_ethtool_stats(struct net_device *netdev,
 		struct ethtool_stats *stats, uint64_t *data)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
-	struct be_drvr_stats *drvr_stats = &adapter->stats.drvr_stats;
-	struct be_hw_stats *hw_stats = hw_stats_from_cmd(adapter->stats.cmd.va);
-	struct be_rxf_stats *rxf_stats = &hw_stats->rxf;
-	struct be_port_rxf_stats *port_stats =
-			&rxf_stats->port[adapter->port_num];
-	struct net_device_stats *net_stats = &netdev->stats;
+	struct be_hw_stats *hw_stats = hw_stats_from_cmd(adapter->stats_cmd.va);
 	struct be_erx_stats *erx_stats = &hw_stats->erx;
+	struct be_rx_obj *rxo;
 	void *p = NULL;
-	int i;
+	int i, j;
 
 	for (i = 0; i < ETHTOOL_STATS_NUM; i++) {
 		switch (et_stats[i].type) {
 		case NETSTAT:
-			p = net_stats;
+			p = &netdev->stats;
 			break;
-		case DRVSTAT:
-			p = drvr_stats;
+		case DRVSTAT_TX:
+			p = &adapter->tx_stats;
 			break;
 		case PORTSTAT:
-			p = port_stats;
+			p = &hw_stats->rxf.port[adapter->port_num];
 			break;
 		case MISCSTAT:
-			p = rxf_stats;
-			break;
-		case ERXSTAT: /* Currently only one ERX stat is provided */
-			p = (u32 *)erx_stats + adapter->rx_obj.q.id;
+			p = &hw_stats->rxf;
 			break;
 		}
 
@@ -280,19 +282,44 @@ be_get_ethtool_stats(struct net_device *netdev,
 		data[i] = (et_stats[i].size == sizeof(u64)) ?
 				*(u64 *)p: *(u32 *)p;
 	}
+
+	for_all_rx_queues(adapter, rxo, j) {
+		for (i = 0; i < ETHTOOL_RXSTATS_NUM; i++) {
+			switch (et_rx_stats[i].type) {
+			case DRVSTAT_RX:
+				p = (u8 *)&rxo->stats + et_rx_stats[i].offset;
+				break;
+			case ERXSTAT:
+				p = (u32 *)erx_stats + rxo->q.id;
+				break;
+			}
+			data[ETHTOOL_STATS_NUM + j * ETHTOOL_RXSTATS_NUM + i] =
+				(et_rx_stats[i].size == sizeof(u64)) ?
+					*(u64 *)p: *(u32 *)p;
+		}
+	}
 }
 
 static void
 be_get_stat_strings(struct net_device *netdev, uint32_t stringset,
 		uint8_t *data)
 {
-	int i;
+	struct be_adapter *adapter = netdev_priv(netdev);
+	int i, j;
+
 	switch (stringset) {
 	case ETH_SS_STATS:
 		for (i = 0; i < ETHTOOL_STATS_NUM; i++) {
 			memcpy(data, et_stats[i].desc, ETH_GSTRING_LEN);
 			data += ETH_GSTRING_LEN;
 		}
+		for (i = 0; i < adapter->num_rx_qs; i++) {
+			for (j = 0; j < ETHTOOL_RXSTATS_NUM; j++) {
+				sprintf(data, "rxq%d: %s", i,
+					et_rx_stats[j].desc);
+				data += ETH_GSTRING_LEN;
+			}
+		}
 		break;
 	case ETH_SS_TEST:
 		for (i = 0; i < ETHTOOL_TESTS_NUM; i++) {
@@ -305,11 +332,14 @@ be_get_stat_strings(struct net_device *netdev, uint32_t stringset,
 
 static int be_get_sset_count(struct net_device *netdev, int stringset)
 {
+	struct be_adapter *adapter = netdev_priv(netdev);
+
 	switch (stringset) {
 	case ETH_SS_TEST:
 		return ETHTOOL_TESTS_NUM;
 	case ETH_SS_STATS:
-		return ETHTOOL_STATS_NUM;
+		return ETHTOOL_STATS_NUM +
+			adapter->num_rx_qs * ETHTOOL_RXSTATS_NUM;
 	default:
 		return -EINVAL;
 	}
@@ -424,10 +454,10 @@ be_get_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
 
-	ring->rx_max_pending = adapter->rx_obj.q.len;
+	ring->rx_max_pending = adapter->rx_obj[0].q.len;
 	ring->tx_max_pending = adapter->tx_obj.q.len;
 
-	ring->rx_pending = atomic_read(&adapter->rx_obj.q.used);
+	ring->rx_pending = atomic_read(&adapter->rx_obj[0].q.used);
 	ring->tx_pending = atomic_read(&adapter->tx_obj.q.used);
 }
 
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 43a3a57..9a1cd28 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -32,6 +32,10 @@ module_param(num_vfs, uint, S_IRUGO);
 MODULE_PARM_DESC(rx_frag_size, "Size of a fragment that holds rcvd data.");
 MODULE_PARM_DESC(num_vfs, "Number of PCI VFs to initialize");
 
+static bool multi_rxq = true;
+module_param(multi_rxq, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(multi_rxq, "Multi Rx Queue support. Enabled by default");
+
 static DEFINE_PCI_DEVICE_TABLE(be_dev_ids) = {
 	{ PCI_DEVICE(BE_VENDOR_ID, BE_DEVICE_ID1) },
 	{ PCI_DEVICE(BE_VENDOR_ID, BE_DEVICE_ID2) },
@@ -111,6 +115,11 @@ static char *ue_status_hi_desc[] = {
 	"Unknown"
 };
 
+static inline bool be_multi_rxq(struct be_adapter *adapter)
+{
+	return (adapter->num_rx_qs > 1);
+}
+
 static void be_queue_free(struct be_adapter *adapter, struct be_queue_info *q)
 {
 	struct be_dma_mem *mem = &q->dma_mem;
@@ -236,18 +245,27 @@ netdev_addr:
 
 void netdev_stats_update(struct be_adapter *adapter)
 {
-	struct be_hw_stats *hw_stats = hw_stats_from_cmd(adapter->stats.cmd.va);
+	struct be_hw_stats *hw_stats = hw_stats_from_cmd(adapter->stats_cmd.va);
 	struct be_rxf_stats *rxf_stats = &hw_stats->rxf;
 	struct be_port_rxf_stats *port_stats =
 			&rxf_stats->port[adapter->port_num];
 	struct net_device_stats *dev_stats = &adapter->netdev->stats;
 	struct be_erx_stats *erx_stats = &hw_stats->erx;
+	struct be_rx_obj *rxo;
+	int i;
 
-	dev_stats->rx_packets = drvr_stats(adapter)->be_rx_pkts;
-	dev_stats->tx_packets = drvr_stats(adapter)->be_tx_pkts;
-	dev_stats->rx_bytes = drvr_stats(adapter)->be_rx_bytes;
-	dev_stats->tx_bytes = drvr_stats(adapter)->be_tx_bytes;
-	dev_stats->multicast = drvr_stats(adapter)->be_rx_mcast_pkt;
+	memset(dev_stats, 0, sizeof(*dev_stats));
+	for_all_rx_queues(adapter, rxo, i) {
+		dev_stats->rx_packets += rx_stats(rxo)->rx_pkts;
+		dev_stats->rx_bytes += rx_stats(rxo)->rx_bytes;
+		dev_stats->multicast += rx_stats(rxo)->rx_mcast_pkts;
+		/*  no space in linux buffers: best possible approximation */
+		dev_stats->rx_dropped +=
+			erx_stats->rx_drops_no_fragments[rxo->q.id];
+	}
+
+	dev_stats->tx_packets = tx_stats(adapter)->be_tx_pkts;
+	dev_stats->tx_bytes = tx_stats(adapter)->be_tx_bytes;
 
 	/* bad pkts received */
 	dev_stats->rx_errors = port_stats->rx_crc_errors +
@@ -264,18 +282,11 @@ void netdev_stats_update(struct be_adapter *adapter)
 		port_stats->rx_ip_checksum_errs +
 		port_stats->rx_udp_checksum_errs;
 
-	/*  no space in linux buffers: best possible approximation */
-	dev_stats->rx_dropped =
-		erx_stats->rx_drops_no_fragments[adapter->rx_obj.q.id];
-
 	/* detailed rx errors */
 	dev_stats->rx_length_errors = port_stats->rx_in_range_errors +
 		port_stats->rx_out_range_errors +
 		port_stats->rx_frame_too_long;
 
-	/* receive ring buffer overflow */
-	dev_stats->rx_over_errors = 0;
-
 	dev_stats->rx_crc_errors = port_stats->rx_crc_errors;
 
 	/* frame alignment errors */
@@ -286,23 +297,6 @@ void netdev_stats_update(struct be_adapter *adapter)
 	dev_stats->rx_fifo_errors = port_stats->rx_fifo_overflow +
 					port_stats->rx_input_fifo_overflow +
 					rxf_stats->rx_drops_no_pbuf;
-	/* receiver missed packetd */
-	dev_stats->rx_missed_errors = 0;
-
-	/*  packet transmit problems */
-	dev_stats->tx_errors = 0;
-
-	/* no space available in linux */
-	dev_stats->tx_dropped = 0;
-
-	dev_stats->collisions = 0;
-
-	/* detailed tx_errors */
-	dev_stats->tx_aborted_errors = 0;
-	dev_stats->tx_carrier_errors = 0;
-	dev_stats->tx_fifo_errors = 0;
-	dev_stats->tx_heartbeat_errors = 0;
-	dev_stats->tx_window_errors = 0;
 }
 
 void be_link_status_update(struct be_adapter *adapter, bool link_up)
@@ -326,10 +320,10 @@ void be_link_status_update(struct be_adapter *adapter, bool link_up)
 }
 
 /* Update the EQ delay n BE based on the RX frags consumed / sec */
-static void be_rx_eqd_update(struct be_adapter *adapter)
+static void be_rx_eqd_update(struct be_adapter *adapter, struct be_rx_obj *rxo)
 {
-	struct be_eq_obj *rx_eq = &adapter->rx_eq;
-	struct be_drvr_stats *stats = &adapter->stats.drvr_stats;
+	struct be_eq_obj *rx_eq = &rxo->rx_eq;
+	struct be_rx_stats *stats = &rxo->stats;
 	ulong now = jiffies;
 	u32 eqd;
 
@@ -346,12 +340,12 @@ static void be_rx_eqd_update(struct be_adapter *adapter)
 	if ((now - stats->rx_fps_jiffies) < HZ)
 		return;
 
-	stats->be_rx_fps = (stats->be_rx_frags - stats->be_prev_rx_frags) /
+	stats->rx_fps = (stats->rx_frags - stats->prev_rx_frags) /
 			((now - stats->rx_fps_jiffies) / HZ);
 
 	stats->rx_fps_jiffies = now;
-	stats->be_prev_rx_frags = stats->be_rx_frags;
-	eqd = stats->be_rx_fps / 110000;
+	stats->prev_rx_frags = stats->rx_frags;
+	eqd = stats->rx_fps / 110000;
 	eqd = eqd << 3;
 	if (eqd > rx_eq->max_eqd)
 		eqd = rx_eq->max_eqd;
@@ -378,7 +372,7 @@ static u32 be_calc_rate(u64 bytes, unsigned long ticks)
 
 static void be_tx_rate_update(struct be_adapter *adapter)
 {
-	struct be_drvr_stats *stats = drvr_stats(adapter);
+	struct be_tx_stats *stats = tx_stats(adapter);
 	ulong now = jiffies;
 
 	/* Wrapped around? */
@@ -400,7 +394,7 @@ static void be_tx_rate_update(struct be_adapter *adapter)
 static void be_tx_stats_update(struct be_adapter *adapter,
 			u32 wrb_cnt, u32 copied, u32 gso_segs, bool stopped)
 {
-	struct be_drvr_stats *stats = drvr_stats(adapter);
+	struct be_tx_stats *stats = tx_stats(adapter);
 	stats->be_tx_reqs++;
 	stats->be_tx_wrbs += wrb_cnt;
 	stats->be_tx_bytes += copied;
@@ -651,14 +645,8 @@ static int be_vid_config(struct be_adapter *adapter, bool vf, u32 vf_num)
 static void be_vlan_register(struct net_device *netdev, struct vlan_group *grp)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
-	struct be_eq_obj *rx_eq = &adapter->rx_eq;
-	struct be_eq_obj *tx_eq = &adapter->tx_eq;
 
-	be_eq_notify(adapter, rx_eq->q.id, false, false, 0);
-	be_eq_notify(adapter, tx_eq->q.id, false, false, 0);
 	adapter->vlan_grp = grp;
-	be_eq_notify(adapter, rx_eq->q.id, true, false, 0);
-	be_eq_notify(adapter, tx_eq->q.id, true, false, 0);
 }
 
 static void be_vlan_add_vid(struct net_device *netdev, u16 vid)
@@ -820,40 +808,38 @@ static int be_set_vf_tx_rate(struct net_device *netdev,
 	return status;
 }
 
-static void be_rx_rate_update(struct be_adapter *adapter)
+static void be_rx_rate_update(struct be_rx_obj *rxo)
 {
-	struct be_drvr_stats *stats = drvr_stats(adapter);
+	struct be_rx_stats *stats = &rxo->stats;
 	ulong now = jiffies;
 
 	/* Wrapped around */
-	if (time_before(now, stats->be_rx_jiffies)) {
-		stats->be_rx_jiffies = now;
+	if (time_before(now, stats->rx_jiffies)) {
+		stats->rx_jiffies = now;
 		return;
 	}
 
 	/* Update the rate once in two seconds */
-	if ((now - stats->be_rx_jiffies) < 2 * HZ)
+	if ((now - stats->rx_jiffies) < 2 * HZ)
 		return;
 
-	stats->be_rx_rate = be_calc_rate(stats->be_rx_bytes
-					  - stats->be_rx_bytes_prev,
-					 now - stats->be_rx_jiffies);
-	stats->be_rx_jiffies = now;
-	stats->be_rx_bytes_prev = stats->be_rx_bytes;
+	stats->rx_rate = be_calc_rate(stats->rx_bytes - stats->rx_bytes_prev,
+				now - stats->rx_jiffies);
+	stats->rx_jiffies = now;
+	stats->rx_bytes_prev = stats->rx_bytes;
 }
 
-static void be_rx_stats_update(struct be_adapter *adapter,
+static void be_rx_stats_update(struct be_rx_obj *rxo,
 		u32 pktsize, u16 numfrags, u8 pkt_type)
 {
-	struct be_drvr_stats *stats = drvr_stats(adapter);
-
-	stats->be_rx_compl++;
-	stats->be_rx_frags += numfrags;
-	stats->be_rx_bytes += pktsize;
-	stats->be_rx_pkts++;
+	struct be_rx_stats *stats = &rxo->stats;
 
+	stats->rx_compl++;
+	stats->rx_frags += numfrags;
+	stats->rx_bytes += pktsize;
+	stats->rx_pkts++;
 	if (pkt_type == BE_MULTICAST_PACKET)
-		stats->be_rx_mcast_pkt++;
+		stats->rx_mcast_pkts++;
 }
 
 static inline bool do_pkt_csum(struct be_eth_rx_compl *rxcp, bool cso)
@@ -873,12 +859,14 @@ static inline bool do_pkt_csum(struct be_eth_rx_compl *rxcp, bool cso)
 }
 
 static struct be_rx_page_info *
-get_rx_page_info(struct be_adapter *adapter, u16 frag_idx)
+get_rx_page_info(struct be_adapter *adapter,
+		struct be_rx_obj *rxo,
+		u16 frag_idx)
 {
 	struct be_rx_page_info *rx_page_info;
-	struct be_queue_info *rxq = &adapter->rx_obj.q;
+	struct be_queue_info *rxq = &rxo->q;
 
-	rx_page_info = &adapter->rx_obj.page_info_tbl[frag_idx];
+	rx_page_info = &rxo->page_info_tbl[frag_idx];
 	BUG_ON(!rx_page_info->page);
 
 	if (rx_page_info->last_page_user) {
@@ -893,9 +881,10 @@ get_rx_page_info(struct be_adapter *adapter, u16 frag_idx)
 
 /* Throwaway the data in the Rx completion */
 static void be_rx_compl_discard(struct be_adapter *adapter,
-			struct be_eth_rx_compl *rxcp)
+		struct be_rx_obj *rxo,
+		struct be_eth_rx_compl *rxcp)
 {
-	struct be_queue_info *rxq = &adapter->rx_obj.q;
+	struct be_queue_info *rxq = &rxo->q;
 	struct be_rx_page_info *page_info;
 	u16 rxq_idx, i, num_rcvd;
 
@@ -903,7 +892,7 @@ static void be_rx_compl_discard(struct be_adapter *adapter,
 	num_rcvd = AMAP_GET_BITS(struct amap_eth_rx_compl, numfrags, rxcp);
 
 	for (i = 0; i < num_rcvd; i++) {
-		page_info = get_rx_page_info(adapter, rxq_idx);
+		page_info = get_rx_page_info(adapter, rxo, rxq_idx);
 		put_page(page_info->page);
 		memset(page_info, 0, sizeof(*page_info));
 		index_inc(&rxq_idx, rxq->len);
@@ -914,11 +903,11 @@ static void be_rx_compl_discard(struct be_adapter *adapter,
  * skb_fill_rx_data forms a complete skb for an ether frame
  * indicated by rxcp.
  */
-static void skb_fill_rx_data(struct be_adapter *adapter,
+static void skb_fill_rx_data(struct be_adapter *adapter, struct be_rx_obj *rxo,
 			struct sk_buff *skb, struct be_eth_rx_compl *rxcp,
 			u16 num_rcvd)
 {
-	struct be_queue_info *rxq = &adapter->rx_obj.q;
+	struct be_queue_info *rxq = &rxo->q;
 	struct be_rx_page_info *page_info;
 	u16 rxq_idx, i, j;
 	u32 pktsize, hdr_len, curr_frag_len, size;
@@ -929,7 +918,7 @@ static void skb_fill_rx_data(struct be_adapter *adapter,
 	pktsize = AMAP_GET_BITS(struct amap_eth_rx_compl, pktsize, rxcp);
 	pkt_type = AMAP_GET_BITS(struct amap_eth_rx_compl, cast_enc, rxcp);
 
-	page_info = get_rx_page_info(adapter, rxq_idx);
+	page_info = get_rx_page_info(adapter, rxo, rxq_idx);
 
 	start = page_address(page_info->page) + page_info->page_offset;
 	prefetch(start);
@@ -967,7 +956,7 @@ static void skb_fill_rx_data(struct be_adapter *adapter,
 	for (i = 1, j = 0; i < num_rcvd; i++) {
 		size -= curr_frag_len;
 		index_inc(&rxq_idx, rxq->len);
-		page_info = get_rx_page_info(adapter, rxq_idx);
+		page_info = get_rx_page_info(adapter, rxo, rxq_idx);
 
 		curr_frag_len = min(size, rx_frag_size);
 
@@ -993,11 +982,12 @@ static void skb_fill_rx_data(struct be_adapter *adapter,
 	BUG_ON(j > MAX_SKB_FRAGS);
 
 done:
-	be_rx_stats_update(adapter, pktsize, num_rcvd, pkt_type);
+	be_rx_stats_update(rxo, pktsize, num_rcvd, pkt_type);
 }
 
 /* Process the RX completion indicated by rxcp when GRO is disabled */
 static void be_rx_compl_process(struct be_adapter *adapter,
+			struct be_rx_obj *rxo,
 			struct be_eth_rx_compl *rxcp)
 {
 	struct sk_buff *skb;
@@ -1014,11 +1004,11 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 	if (unlikely(!skb)) {
 		if (net_ratelimit())
 			dev_warn(&adapter->pdev->dev, "skb alloc failed\n");
-		be_rx_compl_discard(adapter, rxcp);
+		be_rx_compl_discard(adapter, rxo, rxcp);
 		return;
 	}
 
-	skb_fill_rx_data(adapter, skb, rxcp, num_rcvd);
+	skb_fill_rx_data(adapter, rxo, skb, rxcp, num_rcvd);
 
 	if (do_pkt_csum(rxcp, adapter->rx_csum))
 		skb_checksum_none_assert(skb);
@@ -1051,12 +1041,13 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 
 /* Process the RX completion indicated by rxcp when GRO is enabled */
 static void be_rx_compl_process_gro(struct be_adapter *adapter,
-			struct be_eth_rx_compl *rxcp)
+		struct be_rx_obj *rxo,
+		struct be_eth_rx_compl *rxcp)
 {
 	struct be_rx_page_info *page_info;
 	struct sk_buff *skb = NULL;
-	struct be_queue_info *rxq = &adapter->rx_obj.q;
-	struct be_eq_obj *eq_obj =  &adapter->rx_eq;
+	struct be_queue_info *rxq = &rxo->q;
+	struct be_eq_obj *eq_obj =  &rxo->rx_eq;
 	u32 num_rcvd, pkt_size, remaining, vlanf, curr_frag_len;
 	u16 i, rxq_idx = 0, vid, j;
 	u8 vtm;
@@ -1080,13 +1071,13 @@ static void be_rx_compl_process_gro(struct be_adapter *adapter,
 
 	skb = napi_get_frags(&eq_obj->napi);
 	if (!skb) {
-		be_rx_compl_discard(adapter, rxcp);
+		be_rx_compl_discard(adapter, rxo, rxcp);
 		return;
 	}
 
 	remaining = pkt_size;
 	for (i = 0, j = -1; i < num_rcvd; i++) {
-		page_info = get_rx_page_info(adapter, rxq_idx);
+		page_info = get_rx_page_info(adapter, rxo, rxq_idx);
 
 		curr_frag_len = min(remaining, rx_frag_size);
 
@@ -1127,12 +1118,12 @@ static void be_rx_compl_process_gro(struct be_adapter *adapter,
 		vlan_gro_frags(&eq_obj->napi, adapter->vlan_grp, vid);
 	}
 
-	be_rx_stats_update(adapter, pkt_size, num_rcvd, pkt_type);
+	be_rx_stats_update(rxo, pkt_size, num_rcvd, pkt_type);
 }
 
-static struct be_eth_rx_compl *be_rx_compl_get(struct be_adapter *adapter)
+static struct be_eth_rx_compl *be_rx_compl_get(struct be_rx_obj *rxo)
 {
-	struct be_eth_rx_compl *rxcp = queue_tail_node(&adapter->rx_obj.cq);
+	struct be_eth_rx_compl *rxcp = queue_tail_node(&rxo->cq);
 
 	if (rxcp->dw[offsetof(struct amap_eth_rx_compl, valid) / 32] == 0)
 		return NULL;
@@ -1140,7 +1131,7 @@ static struct be_eth_rx_compl *be_rx_compl_get(struct be_adapter *adapter)
 	rmb();
 	be_dws_le_to_cpu(rxcp, sizeof(*rxcp));
 
-	queue_tail_inc(&adapter->rx_obj.cq);
+	queue_tail_inc(&rxo->cq);
 	return rxcp;
 }
 
@@ -1166,22 +1157,23 @@ static inline struct page *be_alloc_pages(u32 size)
  * Allocate a page, split it to fragments of size rx_frag_size and post as
  * receive buffers to BE
  */
-static void be_post_rx_frags(struct be_adapter *adapter)
+static void be_post_rx_frags(struct be_rx_obj *rxo)
 {
-	struct be_rx_page_info *page_info_tbl = adapter->rx_obj.page_info_tbl;
+	struct be_adapter *adapter = rxo->adapter;
+	struct be_rx_page_info *page_info_tbl = rxo->page_info_tbl;
 	struct be_rx_page_info *page_info = NULL, *prev_page_info = NULL;
-	struct be_queue_info *rxq = &adapter->rx_obj.q;
+	struct be_queue_info *rxq = &rxo->q;
 	struct page *pagep = NULL;
 	struct be_eth_rx_d *rxd;
 	u64 page_dmaaddr = 0, frag_dmaaddr;
 	u32 posted, page_offset = 0;
 
-	page_info = &page_info_tbl[rxq->head];
+	page_info = &rxo->page_info_tbl[rxq->head];
 	for (posted = 0; posted < MAX_RX_POST && !page_info->page; posted++) {
 		if (!pagep) {
 			pagep = be_alloc_pages(adapter->big_page_size);
 			if (unlikely(!pagep)) {
-				drvr_stats(adapter)->be_ethrx_post_fail++;
+				rxo->stats.rx_post_fail++;
 				break;
 			}
 			page_dmaaddr = pci_map_page(adapter->pdev, pagep, 0,
@@ -1220,7 +1212,7 @@ static void be_post_rx_frags(struct be_adapter *adapter)
 		be_rxq_notify(adapter, rxq->id, posted);
 	} else if (atomic_read(&rxq->used) == 0) {
 		/* Let be_worker replenish when memory is available */
-		adapter->rx_post_starved = true;
+		rxo->rx_post_starved = true;
 	}
 }
 
@@ -1323,17 +1315,17 @@ static void be_eq_clean(struct be_adapter *adapter,
 		be_eq_notify(adapter, eq_obj->q.id, false, true, num);
 }
 
-static void be_rx_q_clean(struct be_adapter *adapter)
+static void be_rx_q_clean(struct be_adapter *adapter, struct be_rx_obj *rxo)
 {
 	struct be_rx_page_info *page_info;
-	struct be_queue_info *rxq = &adapter->rx_obj.q;
-	struct be_queue_info *rx_cq = &adapter->rx_obj.cq;
+	struct be_queue_info *rxq = &rxo->q;
+	struct be_queue_info *rx_cq = &rxo->cq;
 	struct be_eth_rx_compl *rxcp;
 	u16 tail;
 
 	/* First cleanup pending rx completions */
-	while ((rxcp = be_rx_compl_get(adapter)) != NULL) {
-		be_rx_compl_discard(adapter, rxcp);
+	while ((rxcp = be_rx_compl_get(rxo)) != NULL) {
+		be_rx_compl_discard(adapter, rxo, rxcp);
 		be_rx_compl_reset(rxcp);
 		be_cq_notify(adapter, rx_cq->id, true, 1);
 	}
@@ -1341,7 +1333,7 @@ static void be_rx_q_clean(struct be_adapter *adapter)
 	/* Then free posted rx buffer that were not used */
 	tail = (rxq->head + rxq->len - atomic_read(&rxq->used)) % rxq->len;
 	for (; atomic_read(&rxq->used) > 0; index_inc(&tail, rxq->len)) {
-		page_info = get_rx_page_info(adapter, tail);
+		page_info = get_rx_page_info(adapter, rxo, tail);
 		put_page(page_info->page);
 		memset(page_info, 0, sizeof(*page_info));
 	}
@@ -1519,92 +1511,101 @@ tx_eq_free:
 static void be_rx_queues_destroy(struct be_adapter *adapter)
 {
 	struct be_queue_info *q;
-
-	q = &adapter->rx_obj.q;
-	if (q->created) {
-		be_cmd_q_destroy(adapter, q, QTYPE_RXQ);
-
-		/* After the rxq is invalidated, wait for a grace time
-		 * of 1ms for all dma to end and the flush compl to arrive
-		 */
-		mdelay(1);
-		be_rx_q_clean(adapter);
+	struct be_rx_obj *rxo;
+	int i;
+
+	for_all_rx_queues(adapter, rxo, i) {
+		q = &rxo->q;
+		if (q->created) {
+			be_cmd_q_destroy(adapter, q, QTYPE_RXQ);
+			/* After the rxq is invalidated, wait for a grace time
+			 * of 1ms for all dma to end and the flush compl to
+			 * arrive
+			 */
+			mdelay(1);
+			be_rx_q_clean(adapter, rxo);
+		}
+		be_queue_free(adapter, q);
+
+		q = &rxo->cq;
+		if (q->created)
+			be_cmd_q_destroy(adapter, q, QTYPE_CQ);
+		be_queue_free(adapter, q);
+
+		/* Clear any residual events */
+		q = &rxo->rx_eq.q;
+		if (q->created) {
+			be_eq_clean(adapter, &rxo->rx_eq);
+			be_cmd_q_destroy(adapter, q, QTYPE_EQ);
+		}
+		be_queue_free(adapter, q);
 	}
-	be_queue_free(adapter, q);
-
-	q = &adapter->rx_obj.cq;
-	if (q->created)
-		be_cmd_q_destroy(adapter, q, QTYPE_CQ);
-	be_queue_free(adapter, q);
-
-	/* Clear any residual events */
-	be_eq_clean(adapter, &adapter->rx_eq);
-
-	q = &adapter->rx_eq.q;
-	if (q->created)
-		be_cmd_q_destroy(adapter, q, QTYPE_EQ);
-	be_queue_free(adapter, q);
 }
 
 static int be_rx_queues_create(struct be_adapter *adapter)
 {
 	struct be_queue_info *eq, *q, *cq;
-	int rc;
+	struct be_rx_obj *rxo;
+	int rc, i;
 
 	adapter->big_page_size = (1 << get_order(rx_frag_size)) * PAGE_SIZE;
-	adapter->rx_eq.max_eqd = BE_MAX_EQD;
-	adapter->rx_eq.min_eqd = 0;
-	adapter->rx_eq.cur_eqd = 0;
-	adapter->rx_eq.enable_aic = true;
-
-	/* Alloc Rx Event queue */
-	eq = &adapter->rx_eq.q;
-	rc = be_queue_alloc(adapter, eq, EVNT_Q_LEN,
-				sizeof(struct be_eq_entry));
-	if (rc)
-		return rc;
-
-	/* Ask BE to create Rx Event queue */
-	rc = be_cmd_eq_create(adapter, eq, adapter->rx_eq.cur_eqd);
-	if (rc)
-		goto rx_eq_free;
-
-	/* Alloc RX eth compl queue */
-	cq = &adapter->rx_obj.cq;
-	rc = be_queue_alloc(adapter, cq, RX_CQ_LEN,
-			sizeof(struct be_eth_rx_compl));
-	if (rc)
-		goto rx_eq_destroy;
-
-	/* Ask BE to create Rx eth compl queue */
-	rc = be_cmd_cq_create(adapter, cq, eq, false, false, 3);
-	if (rc)
-		goto rx_cq_free;
-
-	/* Alloc RX eth queue */
-	q = &adapter->rx_obj.q;
-	rc = be_queue_alloc(adapter, q, RX_Q_LEN, sizeof(struct be_eth_rx_d));
-	if (rc)
-		goto rx_cq_destroy;
-
-	/* Ask BE to create Rx eth queue */
-	rc = be_cmd_rxq_create(adapter, q, cq->id, rx_frag_size,
-		BE_MAX_JUMBO_FRAME_SIZE, adapter->if_handle, false);
-	if (rc)
-		goto rx_q_free;
+	for_all_rx_queues(adapter, rxo, i) {
+		rxo->adapter = adapter;
+		rxo->rx_eq.max_eqd = BE_MAX_EQD;
+		rxo->rx_eq.enable_aic = true;
+
+		/* EQ */
+		eq = &rxo->rx_eq.q;
+		rc = be_queue_alloc(adapter, eq, EVNT_Q_LEN,
+					sizeof(struct be_eq_entry));
+		if (rc)
+			goto err;
+
+		rc = be_cmd_eq_create(adapter, eq, rxo->rx_eq.cur_eqd);
+		if (rc)
+			goto err;
+
+		/* CQ */
+		cq = &rxo->cq;
+		rc = be_queue_alloc(adapter, cq, RX_CQ_LEN,
+				sizeof(struct be_eth_rx_compl));
+		if (rc)
+			goto err;
+
+		rc = be_cmd_cq_create(adapter, cq, eq, false, false, 3);
+		if (rc)
+			goto err;
+
+		/* Rx Q */
+		q = &rxo->q;
+		rc = be_queue_alloc(adapter, q, RX_Q_LEN,
+				sizeof(struct be_eth_rx_d));
+		if (rc)
+			goto err;
+
+		rc = be_cmd_rxq_create(adapter, q, cq->id, rx_frag_size,
+			BE_MAX_JUMBO_FRAME_SIZE, adapter->if_handle,
+			(i > 0) ? 1 : 0/* rss enable */, &rxo->rss_id);
+		if (rc)
+			goto err;
+	}
+
+	if (be_multi_rxq(adapter)) {
+		u8 rsstable[MAX_RSS_QS];
+
+		for_all_rss_queues(adapter, rxo, i)
+			rsstable[i] = rxo->rss_id;
+
+		rc = be_cmd_rss_config(adapter, rsstable,
+			adapter->num_rx_qs - 1);
+		if (rc)
+			goto err;
+	}
 
 	return 0;
-rx_q_free:
-	be_queue_free(adapter, q);
-rx_cq_destroy:
-	be_cmd_q_destroy(adapter, cq, QTYPE_CQ);
-rx_cq_free:
-	be_queue_free(adapter, cq);
-rx_eq_destroy:
-	be_cmd_q_destroy(adapter, eq, QTYPE_EQ);
-rx_eq_free:
-	be_queue_free(adapter, eq);
-	return rc;
+err:
+	be_rx_queues_destroy(adapter);
+	return -1;
 }
 
 /* There are 8 evt ids per func. Retruns the evt id's bit number */
@@ -1616,24 +1617,31 @@ static inline int be_evt_bit_get(struct be_adapter *adapter, u32 eq_id)
 static irqreturn_t be_intx(int irq, void *dev)
 {
 	struct be_adapter *adapter = dev;
-	int isr;
+	struct be_rx_obj *rxo;
+	int isr, i;
 
 	isr = ioread32(adapter->csr + CEV_ISR0_OFFSET +
 		(adapter->tx_eq.q.id/ 8) * CEV_ISR_SIZE);
 	if (!isr)
 		return IRQ_NONE;
 
-	event_handle(adapter, &adapter->tx_eq);
-	event_handle(adapter, &adapter->rx_eq);
+	if ((1 << be_evt_bit_get(adapter, adapter->tx_eq.q.id) & isr))
+		event_handle(adapter, &adapter->tx_eq);
+
+	for_all_rx_queues(adapter, rxo, i) {
+		if ((1 << be_evt_bit_get(adapter, rxo->rx_eq.q.id) & isr))
+			event_handle(adapter, &rxo->rx_eq);
+	}
 
 	return IRQ_HANDLED;
 }
 
 static irqreturn_t be_msix_rx(int irq, void *dev)
 {
-	struct be_adapter *adapter = dev;
+	struct be_rx_obj *rxo = dev;
+	struct be_adapter *adapter = rxo->adapter;
 
-	event_handle(adapter, &adapter->rx_eq);
+	event_handle(adapter, &rxo->rx_eq);
 
 	return IRQ_HANDLED;
 }
@@ -1647,14 +1655,14 @@ static irqreturn_t be_msix_tx_mcc(int irq, void *dev)
 	return IRQ_HANDLED;
 }
 
-static inline bool do_gro(struct be_adapter *adapter,
+static inline bool do_gro(struct be_adapter *adapter, struct be_rx_obj *rxo,
 			struct be_eth_rx_compl *rxcp)
 {
 	int err = AMAP_GET_BITS(struct amap_eth_rx_compl, err, rxcp);
 	int tcp_frame = AMAP_GET_BITS(struct amap_eth_rx_compl, tcpf, rxcp);
 
 	if (err)
-		drvr_stats(adapter)->be_rxcp_err++;
+		rxo->stats.rxcp_err++;
 
 	return (tcp_frame && !err) ? true : false;
 }
@@ -1662,29 +1670,29 @@ static inline bool do_gro(struct be_adapter *adapter,
 int be_poll_rx(struct napi_struct *napi, int budget)
 {
 	struct be_eq_obj *rx_eq = container_of(napi, struct be_eq_obj, napi);
-	struct be_adapter *adapter =
-		container_of(rx_eq, struct be_adapter, rx_eq);
-	struct be_queue_info *rx_cq = &adapter->rx_obj.cq;
+	struct be_rx_obj *rxo = container_of(rx_eq, struct be_rx_obj, rx_eq);
+	struct be_adapter *adapter = rxo->adapter;
+	struct be_queue_info *rx_cq = &rxo->cq;
 	struct be_eth_rx_compl *rxcp;
 	u32 work_done;
 
-	adapter->stats.drvr_stats.be_rx_polls++;
+	rxo->stats.rx_polls++;
 	for (work_done = 0; work_done < budget; work_done++) {
-		rxcp = be_rx_compl_get(adapter);
+		rxcp = be_rx_compl_get(rxo);
 		if (!rxcp)
 			break;
 
-		if (do_gro(adapter, rxcp))
-			be_rx_compl_process_gro(adapter, rxcp);
+		if (do_gro(adapter, rxo, rxcp))
+			be_rx_compl_process_gro(adapter, rxo, rxcp);
 		else
-			be_rx_compl_process(adapter, rxcp);
+			be_rx_compl_process(adapter, rxo, rxcp);
 
 		be_rx_compl_reset(rxcp);
 	}
 
 	/* Refill the queue */
-	if (atomic_read(&adapter->rx_obj.q.used) < RX_FRAGS_REFILL_WM)
-		be_post_rx_frags(adapter);
+	if (atomic_read(&rxo->q.used) < RX_FRAGS_REFILL_WM)
+		be_post_rx_frags(rxo);
 
 	/* All consumed */
 	if (work_done < budget) {
@@ -1738,8 +1746,8 @@ static int be_poll_tx_mcc(struct napi_struct *napi, int budget)
 			netif_wake_queue(adapter->netdev);
 		}
 
-		drvr_stats(adapter)->be_tx_events++;
-		drvr_stats(adapter)->be_tx_compl += tx_compl;
+		tx_stats(adapter)->be_tx_events++;
+		tx_stats(adapter)->be_tx_compl += tx_compl;
 	}
 
 	return 1;
@@ -1788,20 +1796,24 @@ static void be_worker(struct work_struct *work)
 {
 	struct be_adapter *adapter =
 		container_of(work, struct be_adapter, work.work);
+	struct be_rx_obj *rxo;
+	int i;
 
 	if (!adapter->stats_ioctl_sent)
-		be_cmd_get_stats(adapter, &adapter->stats.cmd);
-
-	/* Set EQ delay */
-	be_rx_eqd_update(adapter);
+		be_cmd_get_stats(adapter, &adapter->stats_cmd);
 
 	be_tx_rate_update(adapter);
-	be_rx_rate_update(adapter);
 
-	if (adapter->rx_post_starved) {
-		adapter->rx_post_starved = false;
-		be_post_rx_frags(adapter);
+	for_all_rx_queues(adapter, rxo, i) {
+		be_rx_rate_update(rxo);
+		be_rx_eqd_update(adapter, rxo);
+
+		if (rxo->rx_post_starved) {
+			rxo->rx_post_starved = false;
+			be_post_rx_frags(rxo);
+		}
 	}
+
 	if (!adapter->ue_detected)
 		be_detect_dump_ue(adapter);
 
@@ -1816,17 +1828,45 @@ static void be_msix_disable(struct be_adapter *adapter)
 	}
 }
 
+static int be_num_rxqs_get(struct be_adapter *adapter)
+{
+	if (multi_rxq && (adapter->function_caps & BE_FUNCTION_CAPS_RSS) &&
+		!adapter->sriov_enabled && !(adapter->function_mode & 0x400)) {
+		return 1 + MAX_RSS_QS; /* one default non-RSS queue */
+	} else {
+		dev_warn(&adapter->pdev->dev,
+			"No support for multiple RX queues\n");
+		return 1;
+	}
+}
+
 static void be_msix_enable(struct be_adapter *adapter)
 {
+#define BE_MIN_MSIX_VECTORS	(1 + 1) /* Rx + Tx */
 	int i, status;
 
-	for (i = 0; i < BE_NUM_MSIX_VECTORS; i++)
+	adapter->num_rx_qs = be_num_rxqs_get(adapter);
+
+	for (i = 0; i < (adapter->num_rx_qs + 1); i++)
 		adapter->msix_entries[i].entry = i;
 
 	status = pci_enable_msix(adapter->pdev, adapter->msix_entries,
-		BE_NUM_MSIX_VECTORS);
-	if (status == 0)
-		adapter->msix_enabled = true;
+			adapter->num_rx_qs + 1);
+	if (status == 0) {
+		goto done;
+	} else if (status >= BE_MIN_MSIX_VECTORS) {
+		if (pci_enable_msix(adapter->pdev, adapter->msix_entries,
+				status) == 0) {
+			adapter->num_rx_qs = status - 1;
+			dev_warn(&adapter->pdev->dev,
+				"Could alloc only %d MSIx vectors. "
+				"Using %d RX Qs\n", status, adapter->num_rx_qs);
+			goto done;
+		}
+	}
+	return;
+done:
+	adapter->msix_enabled = true;
 }
 
 static void be_sriov_enable(struct be_adapter *adapter)
@@ -1860,38 +1900,50 @@ static inline int be_msix_vec_get(struct be_adapter *adapter, u32 eq_id)
 
 static int be_request_irq(struct be_adapter *adapter,
 		struct be_eq_obj *eq_obj,
-		void *handler, char *desc)
+		void *handler, char *desc, void *context)
 {
 	struct net_device *netdev = adapter->netdev;
 	int vec;
 
 	sprintf(eq_obj->desc, "%s-%s", netdev->name, desc);
 	vec = be_msix_vec_get(adapter, eq_obj->q.id);
-	return request_irq(vec, handler, 0, eq_obj->desc, adapter);
+	return request_irq(vec, handler, 0, eq_obj->desc, context);
 }
 
-static void be_free_irq(struct be_adapter *adapter, struct be_eq_obj *eq_obj)
+static void be_free_irq(struct be_adapter *adapter, struct be_eq_obj *eq_obj,
+			void *context)
 {
 	int vec = be_msix_vec_get(adapter, eq_obj->q.id);
-	free_irq(vec, adapter);
+	free_irq(vec, context);
 }
 
 static int be_msix_register(struct be_adapter *adapter)
 {
-	int status;
+	struct be_rx_obj *rxo;
+	int status, i;
+	char qname[10];
 
-	status = be_request_irq(adapter, &adapter->tx_eq, be_msix_tx_mcc, "tx");
+	status = be_request_irq(adapter, &adapter->tx_eq, be_msix_tx_mcc, "tx",
+				adapter);
 	if (status)
 		goto err;
 
-	status = be_request_irq(adapter, &adapter->rx_eq, be_msix_rx, "rx");
-	if (status)
-		goto free_tx_irq;
+	for_all_rx_queues(adapter, rxo, i) {
+		sprintf(qname, "rxq%d", i);
+		status = be_request_irq(adapter, &rxo->rx_eq, be_msix_rx,
+				qname, rxo);
+		if (status)
+			goto err_msix;
+	}
 
 	return 0;
 
-free_tx_irq:
-	be_free_irq(adapter, &adapter->tx_eq);
+err_msix:
+	be_free_irq(adapter, &adapter->tx_eq, adapter);
+
+	for (i--, rxo = &adapter->rx_obj[i]; i >= 0; i--, rxo--)
+		be_free_irq(adapter, &rxo->rx_eq, rxo);
+
 err:
 	dev_warn(&adapter->pdev->dev,
 		"MSIX Request IRQ failed - err %d\n", status);
@@ -1931,6 +1983,8 @@ done:
 static void be_irq_unregister(struct be_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
+	struct be_rx_obj *rxo;
+	int i;
 
 	if (!adapter->isr_registered)
 		return;
@@ -1942,8 +1996,11 @@ static void be_irq_unregister(struct be_adapter *adapter)
 	}
 
 	/* MSIx */
-	be_free_irq(adapter, &adapter->tx_eq);
-	be_free_irq(adapter, &adapter->rx_eq);
+	be_free_irq(adapter, &adapter->tx_eq, adapter);
+
+	for_all_rx_queues(adapter, rxo, i)
+		be_free_irq(adapter, &rxo->rx_eq, rxo);
+
 done:
 	adapter->isr_registered = false;
 }
@@ -1951,9 +2008,9 @@ done:
 static int be_close(struct net_device *netdev)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
-	struct be_eq_obj *rx_eq = &adapter->rx_eq;
+	struct be_rx_obj *rxo;
 	struct be_eq_obj *tx_eq = &adapter->tx_eq;
-	int vec;
+	int vec, i;
 
 	cancel_delayed_work_sync(&adapter->work);
 
@@ -1968,14 +2025,19 @@ static int be_close(struct net_device *netdev)
 	if (adapter->msix_enabled) {
 		vec = be_msix_vec_get(adapter, tx_eq->q.id);
 		synchronize_irq(vec);
-		vec = be_msix_vec_get(adapter, rx_eq->q.id);
-		synchronize_irq(vec);
+
+		for_all_rx_queues(adapter, rxo, i) {
+			vec = be_msix_vec_get(adapter, rxo->rx_eq.q.id);
+			synchronize_irq(vec);
+		}
 	} else {
 		synchronize_irq(netdev->irq);
 	}
 	be_irq_unregister(adapter);
 
-	napi_disable(&rx_eq->napi);
+	for_all_rx_queues(adapter, rxo, i)
+		napi_disable(&rxo->rx_eq.napi);
+
 	napi_disable(&tx_eq->napi);
 
 	/* Wait for all pending tx completions to arrive so that
@@ -1989,17 +2051,17 @@ static int be_close(struct net_device *netdev)
 static int be_open(struct net_device *netdev)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
-	struct be_eq_obj *rx_eq = &adapter->rx_eq;
 	struct be_eq_obj *tx_eq = &adapter->tx_eq;
+	struct be_rx_obj *rxo;
 	bool link_up;
-	int status;
+	int status, i;
 	u8 mac_speed;
 	u16 link_speed;
 
-	/* First time posting */
-	be_post_rx_frags(adapter);
-
-	napi_enable(&rx_eq->napi);
+	for_all_rx_queues(adapter, rxo, i) {
+		be_post_rx_frags(rxo);
+		napi_enable(&rxo->rx_eq.napi);
+	}
 	napi_enable(&tx_eq->napi);
 
 	be_irq_register(adapter);
@@ -2007,12 +2069,12 @@ static int be_open(struct net_device *netdev)
 	be_intr_set(adapter, true);
 
 	/* The evt queues are created in unarmed state; arm them */
-	be_eq_notify(adapter, rx_eq->q.id, true, false, 0);
+	for_all_rx_queues(adapter, rxo, i) {
+		be_eq_notify(adapter, rxo->rx_eq.q.id, true, false, 0);
+		be_cq_notify(adapter, rxo->cq.id, true, 0);
+	}
 	be_eq_notify(adapter, tx_eq->q.id, true, false, 0);
 
-	/* Rx compl queue may be in unarmed state; rearm it */
-	be_cq_notify(adapter, adapter->rx_obj.cq.id, true, 0);
-
 	/* Now that interrupts are on we can process async mcc */
 	be_async_mcc_enable(adapter);
 
@@ -2088,7 +2150,7 @@ static int be_setup_wol(struct be_adapter *adapter, bool enable)
 static inline int be_vf_eth_addr_config(struct be_adapter *adapter)
 {
 	u32 vf = 0;
-	int status;
+	int status = 0;
 	u8 mac[ETH_ALEN];
 
 	be_vf_eth_addr_generate(adapter, mac);
@@ -2134,6 +2196,11 @@ static int be_setup(struct be_adapter *adapter)
 				BE_IF_FLAGS_PROMISCUOUS |
 				BE_IF_FLAGS_PASS_L3L4_ERRORS;
 		en_flags |= BE_IF_FLAGS_PASS_L3L4_ERRORS;
+
+		if (be_multi_rxq(adapter)) {
+			cap_flags |= BE_IF_FLAGS_RSS;
+			en_flags |= BE_IF_FLAGS_RSS;
+		}
 	}
 
 	status = be_cmd_if_create(adapter, cap_flags, en_flags,
@@ -2455,6 +2522,8 @@ static struct net_device_ops be_netdev_ops = {
 static void be_netdev_init(struct net_device *netdev)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
+	struct be_rx_obj *rxo;
+	int i;
 
 	netdev->features |= NETIF_F_SG | NETIF_F_HW_VLAN_RX | NETIF_F_TSO |
 		NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_FILTER | NETIF_F_HW_CSUM |
@@ -2476,8 +2545,10 @@ static void be_netdev_init(struct net_device *netdev)
 
 	SET_ETHTOOL_OPS(netdev, &be_ethtool_ops);
 
-	netif_napi_add(netdev, &adapter->rx_eq.napi, be_poll_rx,
-		BE_NAPI_WEIGHT);
+	for_all_rx_queues(adapter, rxo, i)
+		netif_napi_add(netdev, &rxo->rx_eq.napi, be_poll_rx,
+				BE_NAPI_WEIGHT);
+
 	netif_napi_add(netdev, &adapter->tx_eq.napi, be_poll_tx_mcc,
 		BE_NAPI_WEIGHT);
 
@@ -2611,8 +2682,7 @@ done:
 
 static void be_stats_cleanup(struct be_adapter *adapter)
 {
-	struct be_stats_obj *stats = &adapter->stats;
-	struct be_dma_mem *cmd = &stats->cmd;
+	struct be_dma_mem *cmd = &adapter->stats_cmd;
 
 	if (cmd->va)
 		pci_free_consistent(adapter->pdev, cmd->size,
@@ -2621,8 +2691,7 @@ static void be_stats_cleanup(struct be_adapter *adapter)
 
 static int be_stats_init(struct be_adapter *adapter)
 {
-	struct be_stats_obj *stats = &adapter->stats;
-	struct be_dma_mem *cmd = &stats->cmd;
+	struct be_dma_mem *cmd = &adapter->stats_cmd;
 
 	cmd->size = sizeof(struct be_cmd_req_get_stats);
 	cmd->va = pci_alloc_consistent(adapter->pdev, cmd->size, &cmd->dma);
@@ -2667,8 +2736,8 @@ static int be_get_config(struct be_adapter *adapter)
 	if (status)
 		return status;
 
-	status = be_cmd_query_fw_cfg(adapter,
-				&adapter->port_num, &adapter->function_mode);
+	status = be_cmd_query_fw_cfg(adapter, &adapter->port_num,
+			&adapter->function_mode, &adapter->function_caps);
 	if (status)
 		return status;
 
@@ -2703,7 +2772,6 @@ static int __devinit be_probe(struct pci_dev *pdev,
 	struct be_adapter *adapter;
 	struct net_device *netdev;
 
-
 	status = pci_enable_device(pdev);
 	if (status)
 		goto do_none;
@@ -2736,11 +2804,8 @@ static int __devinit be_probe(struct pci_dev *pdev,
 	adapter->pdev = pdev;
 	pci_set_drvdata(pdev, adapter);
 	adapter->netdev = netdev;
-	be_netdev_init(netdev);
 	SET_NETDEV_DEV(netdev, &pdev->dev);
 
-	be_msix_enable(adapter);
-
 	status = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
 	if (!status) {
 		netdev->features |= NETIF_F_HIGHDMA;
@@ -2784,12 +2849,15 @@ static int __devinit be_probe(struct pci_dev *pdev,
 	if (status)
 		goto stats_clean;
 
+	be_msix_enable(adapter);
+
 	INIT_DELAYED_WORK(&adapter->work, be_worker);
 
 	status = be_setup(adapter);
 	if (status)
-		goto stats_clean;
+		goto msix_disable;
 
+	be_netdev_init(netdev);
 	status = register_netdev(netdev);
 	if (status != 0)
 		goto unsetup;
@@ -2799,12 +2867,13 @@ static int __devinit be_probe(struct pci_dev *pdev,
 
 unsetup:
 	be_clear(adapter);
+msix_disable:
+	be_msix_disable(adapter);
 stats_clean:
 	be_stats_cleanup(adapter);
 ctrl_clean:
 	be_ctrl_cleanup(adapter);
 free_netdev:
-	be_msix_disable(adapter);
 	be_sriov_disable(adapter);
 	free_netdev(adapter->netdev);
 	pci_set_drvdata(pdev, NULL);
-- 
1.6.5.2


^ permalink raw reply related

* Re: [PATCH net-next2.6] net: dynamic ingress_queue allocation
From: jamal @ 2010-10-01 11:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jarek Poplawski, David Miller, netdev
In-Reply-To: <1285887509.2705.33.camel@edumazet-laptop>

On Fri, 2010-10-01 at 00:58 +0200, Eric Dumazet wrote:
> Hi Jamal
> 
> Here is the dynamic allocation I promised. I lightly tested it, could
> you review it please ?
> Thanks !
> 
> [PATCH net-next2.6] net: dynamic ingress_queue allocation
> 
> ingress being not used very much, and net_device->ingress_queue being
> quite a big object (128 or 256 bytes), use a dynamic allocation if
> needed (tc qdisc add dev eth0 ingress ...)

I agree with the principle that it is valuable in making it dynamic for
people who dont want it; but, but (like my kid would say, sniff, sniff)
you are making me sad saying it is not used very much ;-> It is used
very much in my world. My friend Jarek uses it;->


> +#ifdef CONFIG_NET_CLS_ACT

I think appropriately this should be NET_SCH_INGRESS (everywhere else as
well).

> +static inline struct netdev_queue *dev_ingress_queue(struct net_device *dev)
> +{
> +#ifdef CONFIG_NET_CLS_ACT
> +	return dev->ingress_queue;
> +#else
> +	return NULL;
> +#endif

Above, if you just returned dev->ingress_queue wouldnt it always be 
NULL if it was not allocated?


> @@ -2737,7 +2734,9 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
>  					 struct packet_type **pt_prev,
>  					 int *ret, struct net_device *orig_dev)
>  {
> -	if (skb->dev->ingress_queue.qdisc == &noop_qdisc)
> +	struct netdev_queue *rxq = dev_ingress_queue(skb->dev);
> +
> +	if (!rxq || rxq->qdisc == &noop_qdisc)
>  		goto out;

I stared at above a little longer since this is the only fast path
affected; is it a few more cycles now for people who love ingress?

> @@ -690,6 +693,8 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>  		    (new && new->flags & TCQ_F_INGRESS)) {
>  			num_q = 1;
>  			ingress = 1;
> +			if (!dev_ingress_queue(dev))
> +				return -ENOENT;
>  		}
>  

The above looks clever but worries me because it changes the old flow.
If you have time,  the following tests will alleviate my fears

1) compile support for ingress and add/delete ingress qdisc
2) Dont compile support and add/delete ingress qdisc
3) Compile ingress as a module and add/delete ingress qdisc


Other than that excellent work Eric. And you can add my
Acked/reviewed-by etc.

BTW, did i say i like your per-cpu stats stuff? It applies nicely to
qdiscs, actions etc ;->

cheers,
jamal


^ permalink raw reply

* Re: wireless-testing (2.6.36-rc6):  inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
From: Johannes Berg @ 2010-10-01 11:49 UTC (permalink / raw)
  To: Ben Greear; +Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, NetDev
In-Reply-To: <4CA4F000.2060509-my8/4N5VtI7c+919tysfdA@public.gmane.org>

On Thu, 2010-09-30 at 13:16 -0700, Ben Greear wrote:
> We saw this on a system that has two ath9k APs, some extra routing tables
> and rules to use them, and a user-space 'bridge' that uses packet-sockets.
> 
> Aside from a few patches to help virtualize wireless devices (and none directly to ath9k),
> this is today's wireless-testing tree.
> 
> =================================
> [ INFO: inconsistent lock state ]
> 2.6.36-rc6-wl+ #20
> ---------------------------------
> inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> kworker/u:0/5 [HC0[0]:SC0[0]:HE1:SE1] takes:
>   (&(&list->lock)->rlock){+.?...}, at: [<c0741066>] packet_rcv+0x1f3/0x27a

I think this is due to

commit 5ed3bc7288487bd4f891f420a07319e0b538b4fe
Author: John W. Linville <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
Date:   Thu Jun 24 14:38:30 2010 -0400

    mac80211: use netif_receive_skb in ieee80211_tx_status callpath


since

/**
 *      netif_receive_skb - process receive buffer from network
 *      @skb: buffer to process
 *
 *      netif_receive_skb() is the main receive data processing function.
 *      It always succeeds. The buffer may be dropped during processing   
 *      for congestion control or by the protocol layers.
 *
 *      This function may only be called from softirq context and interrupts
 *      should be enabled.

but we don't explicitly disable BHs. I suppose we should impose the same
on drivers calling ieee80211_tx_status().

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [MeeGo-Dev][PATCH v3] Topcliff: Update PCH_CAN driver to 2.6.35
From: Wolfgang Grandegger @ 2010-10-01 12:40 UTC (permalink / raw)
  To: Masayuki Ohtake
  Cc: andrew.chih.howe.khor-ral2JQCrhuEAvxtiuMwx3w,
	qi.wang-ral2JQCrhuEAvxtiuMwx3w,
	margie.foster-ral2JQCrhuEAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	yong.y.wang-ral2JQCrhuEAvxtiuMwx3w,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	kok.howg.ewe-ral2JQCrhuEAvxtiuMwx3w,
	joel.clark-ral2JQCrhuEAvxtiuMwx3w, Tomoya MORINAGA,
	meego-dev-WXzIur8shnEAvxtiuMwx3w, David S. Miller,
	Christian Pellegrin, Samuel Ortiz
In-Reply-To: <000401cb614f$c932c750$66f8800a-a06+6cuVnkTSQfdrb5gaxUEOCMrvLtNR@public.gmane.org>

On 10/01/2010 12:02 PM, Masayuki Ohtake wrote:
> Hi Wolfgang Grandegger,
> 
> Thank you for your comments.
> 
> We will modify and re-post ASAP.
> 
> I have a comment about below.
>> In this driver you are using just *one* RX object. This means that the
>> CPU must handle new messages as quickly as possible otherwise message
>> losses will happen, right?. For sure, this will not make user's happy.
>> Any chance to use more RX objects in FIFO mode?
> 
> In case implementing with FIFO mode,
> received packets may be our of order.

Hm, FIFO means "First in First out"! It might be tricky to implement,
though.

> Because our CAN register access is slow.
> 
> I am confirming our CAN HW spec and the possibility of our-of-order.

I don't understand?

Wolfgang.

^ permalink raw reply

* [patch 0/1] s390: qeth patch for net-next
From: frank.blaschka @ 2010-10-01 12:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390

Hi Dave,

here is one more qeth patch for net-next.

shortlog:
Ursula Braun (1)
qeth: tagging with VLAN-ID 0

Thanks,
        Frank

^ permalink raw reply

* [patch 1/1] [PATCH] qeth: tagging with VLAN-ID 0
From: frank.blaschka @ 2010-10-01 12:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Ursula Braun
In-Reply-To: <20101001125112.459999476@de.ibm.com>

[-- Attachment #1: vlan0.patch --]
[-- Type: text/plain, Size: 3098 bytes --]

From: Ursula Braun <ursula.braun@de.ibm.com>

This patch adapts qeth to handle tagged frames with VLAN-ID 0 and
with or without priority information in the tag. It enables qeth to
receive priority-tagged frames on a base interface, for example from
z/OS, without configuring an additional VLAN interface.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---
 drivers/s390/net/qeth_l2_main.c |    2 ++
 drivers/s390/net/qeth_l3_main.c |   27 +++++++++++++--------------
 2 files changed, 15 insertions(+), 14 deletions(-)

--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -310,6 +310,8 @@ static void qeth_l2_vlan_rx_add_vid(stru
 	struct qeth_vlan_vid *id;
 
 	QETH_CARD_TEXT_(card, 4, "aid:%d", vid);
+	if (!vid)
+		return;
 	if (card->info.type == QETH_CARD_TYPE_OSM) {
 		QETH_CARD_TEXT(card, 3, "aidOSM");
 		return;
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -2013,13 +2013,14 @@ static void qeth_l3_vlan_rx_kill_vid(str
 	qeth_l3_set_multicast_list(card->dev);
 }
 
-static inline __u16 qeth_l3_rebuild_skb(struct qeth_card *card,
-			struct sk_buff *skb, struct qeth_hdr *hdr)
+static inline int qeth_l3_rebuild_skb(struct qeth_card *card,
+			struct sk_buff *skb, struct qeth_hdr *hdr,
+			unsigned short *vlan_id)
 {
-	unsigned short vlan_id = 0;
 	__be16 prot;
 	struct iphdr *ip_hdr;
 	unsigned char tg_addr[MAX_ADDR_LEN];
+	int is_vlan = 0;
 
 	if (!(hdr->hdr.l3.flags & QETH_HDR_PASSTHRU)) {
 		prot = htons((hdr->hdr.l3.flags & QETH_HDR_IPV6)? ETH_P_IPV6 :
@@ -2082,8 +2083,9 @@ static inline __u16 qeth_l3_rebuild_skb(
 
 	if (hdr->hdr.l3.ext_flags &
 	    (QETH_HDR_EXT_VLAN_FRAME | QETH_HDR_EXT_INCLUDE_VLAN_TAG)) {
-		vlan_id = (hdr->hdr.l3.ext_flags & QETH_HDR_EXT_VLAN_FRAME)?
+		*vlan_id = (hdr->hdr.l3.ext_flags & QETH_HDR_EXT_VLAN_FRAME) ?
 		 hdr->hdr.l3.vlan_id : *((u16 *)&hdr->hdr.l3.dest_addr[12]);
+		is_vlan = 1;
 	}
 
 	switch (card->options.checksum_type) {
@@ -2104,7 +2106,7 @@ static inline __u16 qeth_l3_rebuild_skb(
 			skb->ip_summed = CHECKSUM_NONE;
 	}
 
-	return vlan_id;
+	return is_vlan;
 }
 
 static int qeth_l3_process_inbound_buffer(struct qeth_card *card,
@@ -2114,6 +2116,7 @@ static int qeth_l3_process_inbound_buffe
 	struct sk_buff *skb;
 	struct qeth_hdr *hdr;
 	__u16 vlan_tag = 0;
+	int is_vlan;
 	unsigned int len;
 
 	*done = 0;
@@ -2129,16 +2132,12 @@ static int qeth_l3_process_inbound_buffe
 		skb->dev = card->dev;
 		switch (hdr->hdr.l3.id) {
 		case QETH_HEADER_TYPE_LAYER3:
-			vlan_tag = qeth_l3_rebuild_skb(card, skb, hdr);
+			is_vlan = qeth_l3_rebuild_skb(card, skb, hdr,
+						      &vlan_tag);
 			len = skb->len;
-			if (vlan_tag && !card->options.sniffer)
-				if (card->vlangrp)
-					vlan_gro_receive(&card->napi,
-						card->vlangrp, vlan_tag, skb);
-				else {
-					dev_kfree_skb_any(skb);
-					continue;
-				}
+			if (is_vlan && !card->options.sniffer)
+				vlan_gro_receive(&card->napi, card->vlangrp,
+					vlan_tag, skb);
 			else
 				napi_gro_receive(&card->napi, skb);
 			break;


^ permalink raw reply

* Re: Packet time delays on multi-core systems
From: Eric Dumazet @ 2010-10-01 12:59 UTC (permalink / raw)
  To: Alexey Vlasov; +Cc: Linux Kernel Mailing List, netdev
In-Reply-To: <20101001101650.GD4094@beaver.vrungel.ru>

Le vendredi 01 octobre 2010 à 14:16 +0400, Alexey Vlasov a écrit :

> I have also found that:
> 1. rx overruns is increasing.
> 2. rx_queue_drop_packet_count is increasing.

So you flood machine with packets, its not an idle one ?

I thought you were doing experiments with light trafic.


> # ethtool -S eth0 | grep drop
>      tx_dropped: 0
>      rx_queue_drop_packet_count: 1260743751
>      dropped_smbus: 0
>      rx_queue_0_drops: 0
>      rx_queue_1_drops: 0
>      rx_queue_2_drops: 0
>      rx_queue_3_drops: 0
> 


ethtool -S eth0   (full output, not small parts)


> 3. By sending SYN-packets by hping, RST packet doesn't send, but I don't know may 
> be it is just some feature in 2.6.32.
> newbox # hping -c 1 -S -p 80 111.111.111.111
> HPING 111.111.111.111 (eth0 111.111.111.111): S set, 40 headers + 0 data bytes
> len=46 ip=111.111.111.111 ttl=58 DF id=11471 sport=80 flags=SA seq=0 win=65535 rtt=99.0 ms
> 
> --- 111.111.111.111 hping statistic ---
> 1 packets tramitted, 1 packets received, 0% packet loss
> round-trip min/avg/max = 99.0/99.0/99.0 ms
> 
> 13:59:07.439528 IP newbox.2777 > 111.111.111.111.80: S 345595033:345595033(0) win 512
> 13:59:07.439626 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
> 13:59:10.439368 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
> 13:59:16.439313 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
> 13:59:28.439206 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
> 
> As a result I got doubles:

Are you playing with trafic shaping ?

tc -s -d qdisc



> DUP! len=46 ip=111.111.111.111 ttl=58 DF id=27454 sport=80 flags=SA seq=0 win=65535 rtt=3137.8 ms
> 
> Example of another TCP-session from 2.6.28 kernel:
> oldbox # hping -c 1 -S -p 80 111.111.111.111
> HPING 111.111.111.111 (eth0 111.111.111.111): S set, 40 headers + 0 data bytes
> len=46 ip=111.111.111.111 ttl=58 DF id=53180 sport=80 flags=SA seq=0 win=65535 rtt=2.9 ms
> 
> --- 111.111.111.111 hping statistic ---
> 1 packets tramitted, 1 packets received, 0% packet loss
> round-trip min/avg/max = 2.9/2.9/2.9 ms
> 
> 14:01:45.225136 IP oldbox.2776 > 111.111.111.111.80: S 1983626200:1983626200(0) win 512
> 14:01:45.225288 IP 111.111.111.111.80 > oldbox.2776: S 3796385036:3796385036(0) ack 1983626201 win 65535 <mss 1460>
> 14:01:45.227990 IP oldbox.2776 > 111.111.111.111.80: R 1983626201:1983626201(0) win 0
> 



^ permalink raw reply

* GOOD DAY
From: Mrs. Zeng Q Zhen @ 2010-10-01  5:47 UTC (permalink / raw)




I AM MRS. ZENG Q ZHEN, I HAVE A SECURED BUSINESS PROPOSAL FOR YOU. IF
INTERESTED E_MAIL(mrs-zengzin.hk@qatar.io)

^ permalink raw reply

* Re: [PATCH v4 1/2] HID: Add Support for Setting and Getting Feature Reports from hidraw
From: Jiri Kosina @ 2010-10-01 13:30 UTC (permalink / raw)
  To: Antonio Ospite
  Cc: Alan Ott, Stefan Achatz, Alexey Dobriyan, Tejun Heo, Alan Stern,
	Greg Kroah-Hartman, Marcel Holtmann, Stephane Chatty,
	Michael Poole, David S. Miller, Bastien Nocera, Eric Dumazet,
	linux-input-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20100928153011.32750e5d.ospite-aNJ+ML1ZbiP93QAQaVx+gl6hYfS7NtTn@public.gmane.org>

On Tue, 28 Sep 2010, Antonio Ospite wrote:

> Hi Alan, I am doing some stress testing on hidraw, if I have a loop with
> HIDIOCGFEATURE on a given report and I disconnect the device while the
> loop is running I get this:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> IP: [<ffffffffa02c66b4>] hidraw_ioctl+0xfc/0x32c [hid]
> 
> Full log attached along with the test program, the device is a Sony PS3
> Controller (sixaxis).
> 
> If my objdump analysis is right, hidraw_ioctl+0xfc should be around line
> 361 in hidraw.c (with your patch applied):
> 
> struct hid_device *hid = dev->hid;
> 
> It looks like 'dev' (which is hidraw_table[minor]) can be NULL
> sometimes, can't it?
> This is not introduced by your changes tho.
> 
> Just as a side note, the bug does not show up if the userspace program
> handles return values properly and exits as soon as it gets an error
> from the HID layer, see the XXX comment in test_hidraw_feature.c.
> 
> This fixes it, if it looks ok I will resend the patch rebased on
> mainline code:
> 
> diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
> index 7df1310..3c040c6 100644
> --- a/drivers/hid/hidraw.c
> +++ b/drivers/hid/hidraw.c
> @@ -322,6 +322,10 @@ static long hidraw_ioctl(struct file *file, unsigned int cmd,
> 
>         mutex_lock(&minors_lock);
>         dev = hidraw_table[minor];
> +       if (!dev) {
> +               ret = -ENODEV;
> +               goto out;
> +       }
> 
>         switch (cmd) {
>                 case HIDIOCGRDESCSIZE:
> @@ -412,6 +416,7 @@ static long hidraw_ioctl(struct file *file, unsigned int cmd,
> 
>                 ret = -ENOTTY;
>         }
> +out:
>         mutex_unlock(&minors_lock);
>         return ret;
>  }

Yes, this patch makes sense even for current mainline code. Could you 
please resend it to me with Signed-off-by: and changelog text, so that I 
could apply it?

Thanks!

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: [PATCH 1/2] drivers/net/usb/qcusbnet: Checkpatch cleanups
From: Paulius Zaleckas @ 2010-10-01 13:26 UTC (permalink / raw)
  To: Joe Perches; +Cc: Elly Jones, netdev, dbrownell, mjg59, jglasgow, msb, olofj
In-Reply-To: <0aa502d0e385f2333f8bc12dafdcde88e5ca0262.1285727642.git.joe@perches.com>

On 09/29/2010 05:39 AM, Joe Perches wrote:
> Whitespace and removal of KERNEL_VERSION tests
> Neaten DBG macro

Why not use dev_dbg istead of this ugly DBG macro?

^ permalink raw reply

* Re: [PATCH] Add Qualcomm Gobi 2000 driver.
From: Paulius Zaleckas @ 2010-10-01 13:37 UTC (permalink / raw)
  To: Elly Jones; +Cc: netdev, dbrownell, mjg59, jglasgow, msb, olofj
In-Reply-To: <20100928171026.GB6083@google.com>

On 09/28/2010 08:10 PM, Elly Jones wrote:
> From: Elizabeth Jones<ellyjones@google.com>
>
> This driver is a rewrite of the original Qualcomm GPL driver, released as part
> of Qualcomm's "Code Aurora" initiative. The driver has been transformed into
> Linux kernel style and made to use kernel APIs where appropriate; some bugs have
> also been fixed. Note that the device in question requires firmware and a
> firmware loader; the latter has been written by mjg (see
> http://www.codon.org.uk/~mjg59/gobi_loader/).

Why not use already existing in kernel firmware upload API?

> Signed-Off-By: Elizabeth Jones<ellyjones@google.com>
> Signed-Off-By: Jason Glasgow<jglasgow@google.com>

^ permalink raw reply

* Re: [PATCH V4] fs: allow for more than 2^31 files
From: Robin Holt @ 2010-10-01 13:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Robin Holt, David Miller, dipankar, viro, bcrl, den, mingo,
	mszeredi, cmm, npiggin, xemul, linux-kernel, netdev
In-Reply-To: <1285910958.2705.56.camel@edumazet-laptop>

Looks good.

Reviewed-by: Robin Holt <holt@sgi.com>
Tested-by: Robin Holt <holt@sgi.com>

I don't mean to flood this with my name, merely that I do find this
patch acceptable, worthy, and have tested it.  Feel free to lop off any
of these lines that are offensive.

Robin

On Fri, Oct 01, 2010 at 07:29:18AM +0200, Eric Dumazet wrote:
> Le vendredi 01 octobre 2010 à 07:03 +0200, Eric Dumazet a écrit :
> > Le jeudi 30 septembre 2010 à 23:34 -0500, Robin Holt a écrit :
> > 
> > > The proc_handler used to be proc_nr_files() which would call
> > > get_nr_files() and deposit the result in files_stat.nr_files then cascade
> > > to proc_dointvec() which would dump the 3 values.  Now it will dump the
> > > three values, but not update the middle (nr_files) value first.
> > > 
> > 
> > Ah I get it now, thanks !
> > 
> > I'll send a V4 shortly.
> > 
> > 
> 
> In this v4, I call proc_nr_files() again, and proc_nr_files() calls
> proc_doulongvec_minmax() instead of proc_dointvec()
> 
> Added the "cat /proc/sys/fs/file-nr" in Changelog
> 
> Thanks again Robin
> 
> [PATCH V3] fs: allow for more than 2^31 files
> 
> Robin Holt tried to boot a 16TB system and found af_unix was overflowing
> a 32bit value :
> 
> <quote>
> 
> We were seeing a failure which prevented boot.  The kernel was incapable
> of creating either a named pipe or unix domain socket.  This comes down
> to a common kernel function called unix_create1() which does:
> 
>         atomic_inc(&unix_nr_socks);
>         if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
>                 goto out;
> 
> The function get_max_files() is a simple return of files_stat.max_files.
> files_stat.max_files is a signed integer and is computed in
> fs/file_table.c's files_init().
> 
>         n = (mempages * (PAGE_SIZE / 1024)) / 10;
>         files_stat.max_files = n;
> 
> In our case, mempages (total_ram_pages) is approx 3,758,096,384
> (0xe0000000).  That leaves max_files at approximately 1,503,238,553.
> This causes 2 * get_max_files() to integer overflow.
> 
> </quote>
> 
> Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
> integers, and change af_unix to use an atomic_long_t instead of
> atomic_t.
> 
> get_max_files() is changed to return an unsigned long.
> get_nr_files() is changed to return a long.
> 
> unix_nr_socks is changed from atomic_t to atomic_long_t, while not
> strictly needed to address Robin problem.
>  
> Before patch (on a 64bit kernel) :
> # echo 2147483648 >/proc/sys/fs/file-max
> # cat /proc/sys/fs/file-max
> -18446744071562067968
> 
> After patch:
> # echo 2147483648 >/proc/sys/fs/file-max
> # cat /proc/sys/fs/file-max
> 2147483648
> # cat /proc/sys/fs/file-nr
> 704	0	2147483648
> 
> 
> Reported-by: Robin Holt <holt@sgi.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  fs/file_table.c    |   17 +++++++----------
>  include/linux/fs.h |    8 ++++----
>  kernel/sysctl.c    |    6 +++---
>  net/unix/af_unix.c |   14 +++++++-------
>  4 files changed, 21 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/file_table.c b/fs/file_table.c
> index a04bdd8..c3dee38 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -60,7 +60,7 @@ static inline void file_free(struct file *f)
>  /*
>   * Return the total number of open files in the system
>   */
> -static int get_nr_files(void)
> +static long get_nr_files(void)
>  {
>  	return percpu_counter_read_positive(&nr_files);
>  }
> @@ -68,7 +68,7 @@ static int get_nr_files(void)
>  /*
>   * Return the maximum number of open files in the system
>   */
> -int get_max_files(void)
> +unsigned long get_max_files(void)
>  {
>  	return files_stat.max_files;
>  }
> @@ -82,7 +82,7 @@ int proc_nr_files(ctl_table *table, int write,
>                       void __user *buffer, size_t *lenp, loff_t *ppos)
>  {
>  	files_stat.nr_files = get_nr_files();
> -	return proc_dointvec(table, write, buffer, lenp, ppos);
> +	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
>  }
>  #else
>  int proc_nr_files(ctl_table *table, int write,
> @@ -105,7 +105,7 @@ int proc_nr_files(ctl_table *table, int write,
>  struct file *get_empty_filp(void)
>  {
>  	const struct cred *cred = current_cred();
> -	static int old_max;
> +	static long old_max;
>  	struct file * f;
>  
>  	/*
> @@ -140,8 +140,7 @@ struct file *get_empty_filp(void)
>  over:
>  	/* Ran out of filps - report that */
>  	if (get_nr_files() > old_max) {
> -		printk(KERN_INFO "VFS: file-max limit %d reached\n",
> -					get_max_files());
> +		pr_info("VFS: file-max limit %lu reached\n", get_max_files());
>  		old_max = get_nr_files();
>  	}
>  	goto fail;
> @@ -487,7 +486,7 @@ retry:
>  
>  void __init files_init(unsigned long mempages)
>  { 
> -	int n; 
> +	unsigned long n;
>  
>  	filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
>  			SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
> @@ -498,9 +497,7 @@ void __init files_init(unsigned long mempages)
>  	 */ 
>  
>  	n = (mempages * (PAGE_SIZE / 1024)) / 10;
> -	files_stat.max_files = n; 
> -	if (files_stat.max_files < NR_FILE)
> -		files_stat.max_files = NR_FILE;
> +	files_stat.max_files = max_t(unsigned long, n, NR_FILE);
>  	files_defer_init();
>  	lg_lock_init(files_lglock);
>  	percpu_counter_init(&nr_files, 0);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 63d069b..8c06590 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -34,9 +34,9 @@
>  
>  /* And dynamically-tunable limits and defaults: */
>  struct files_stat_struct {
> -	int nr_files;		/* read only */
> -	int nr_free_files;	/* read only */
> -	int max_files;		/* tunable */
> +	unsigned long nr_files;		/* read only */
> +	unsigned long nr_free_files;	/* read only */
> +	unsigned long max_files;		/* tunable */
>  };
>  
>  struct inodes_stat_t {
> @@ -404,7 +404,7 @@ extern void __init inode_init_early(void);
>  extern void __init files_init(unsigned long);
>  
>  extern struct files_stat_struct files_stat;
> -extern int get_max_files(void);
> +extern unsigned long get_max_files(void);
>  extern int sysctl_nr_open;
>  extern struct inodes_stat_t inodes_stat;
>  extern int leases_enable, lease_break_time;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index f88552c..f789a0a 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1352,16 +1352,16 @@ static struct ctl_table fs_table[] = {
>  	{
>  		.procname	= "file-nr",
>  		.data		= &files_stat,
> -		.maxlen		= 3*sizeof(int),
> +		.maxlen		= sizeof(files_stat),
>  		.mode		= 0444,
>  		.proc_handler	= proc_nr_files,
>  	},
>  	{
>  		.procname	= "file-max",
>  		.data		= &files_stat.max_files,
> -		.maxlen		= sizeof(int),
> +		.maxlen		= sizeof(files_stat.max_files),
>  		.mode		= 0644,
> -		.proc_handler	= proc_dointvec,
> +		.proc_handler	= proc_doulongvec_minmax,
>  	},
>  	{
>  		.procname	= "nr_open",
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 0b39b24..3e1d7d1 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -117,7 +117,7 @@
>  
>  static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
>  static DEFINE_SPINLOCK(unix_table_lock);
> -static atomic_t unix_nr_socks = ATOMIC_INIT(0);
> +static atomic_long_t unix_nr_socks;
>  
>  #define unix_sockets_unbound	(&unix_socket_table[UNIX_HASH_SIZE])
>  
> @@ -360,13 +360,13 @@ static void unix_sock_destructor(struct sock *sk)
>  	if (u->addr)
>  		unix_release_addr(u->addr);
>  
> -	atomic_dec(&unix_nr_socks);
> +	atomic_long_dec(&unix_nr_socks);
>  	local_bh_disable();
>  	sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
>  	local_bh_enable();
>  #ifdef UNIX_REFCNT_DEBUG
> -	printk(KERN_DEBUG "UNIX %p is destroyed, %d are still alive.\n", sk,
> -		atomic_read(&unix_nr_socks));
> +	printk(KERN_DEBUG "UNIX %p is destroyed, %ld are still alive.\n", sk,
> +		atomic_long_read(&unix_nr_socks));
>  #endif
>  }
>  
> @@ -606,8 +606,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock)
>  	struct sock *sk = NULL;
>  	struct unix_sock *u;
>  
> -	atomic_inc(&unix_nr_socks);
> -	if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
> +	atomic_long_inc(&unix_nr_socks);
> +	if (atomic_long_read(&unix_nr_socks) > 2 * get_max_files())
>  		goto out;
>  
>  	sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_proto);
> @@ -632,7 +632,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock)
>  	unix_insert_socket(unix_sockets_unbound, sk);
>  out:
>  	if (sk == NULL)
> -		atomic_dec(&unix_nr_socks);
> +		atomic_long_dec(&unix_nr_socks);
>  	else {
>  		local_bh_disable();
>  		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: [PATCH 1/2] drivers/net/usb/qcusbnet: Checkpatch cleanups
From: Joe Perches @ 2010-10-01 13:41 UTC (permalink / raw)
  To: Paulius Zaleckas
  Cc: Elly Jones, netdev, dbrownell, mjg59, jglasgow, msb, olofj
In-Reply-To: <4CA5E185.3090902@gmail.com>

On Fri, 2010-10-01 at 16:26 +0300, Paulius Zaleckas wrote:
> On 09/29/2010 05:39 AM, Joe Perches wrote:
> > Whitespace and removal of KERNEL_VERSION tests
> > Neaten DBG macro
> 
> Why not use dev_dbg istead of this ugly DBG macro?

Currently, like a lot of other macros
in the tree, it uses a runtime flag to
control output.

dev_dbg doesn't have that capability.



^ permalink raw reply

* Re: [PATCH] Add Qualcomm Gobi 2000 driver.
From: Matthew Garrett @ 2010-10-01 13:50 UTC (permalink / raw)
  To: Paulius Zaleckas; +Cc: Elly Jones, netdev, dbrownell, jglasgow, msb, olofj
In-Reply-To: <4CA5E40D.4080507@gmail.com>

On Fri, Oct 01, 2010 at 04:37:17PM +0300, Paulius Zaleckas wrote:
> On 09/28/2010 08:10 PM, Elly Jones wrote:
>> From: Elizabeth Jones<ellyjones@google.com>
>>
>> This driver is a rewrite of the original Qualcomm GPL driver, released as part
>> of Qualcomm's "Code Aurora" initiative. The driver has been transformed into
>> Linux kernel style and made to use kernel APIs where appropriate; some bugs have
>> also been fixed. Note that the device in question requires firmware and a
>> firmware loader; the latter has been written by mjg (see
>> http://www.codon.org.uk/~mjg59/gobi_loader/).
>
> Why not use already existing in kernel firmware upload API?

Because chosing the correct firmware requires making a runtime policy 
decision, the firmware is of variable size (anywhere from 12MB to 16MB) 
and it needs to be reencoded with something resembling PPP framing 
before it gets dumped to the device. It's certainly possible to do it 
in-kernel, but it'd be rather a lot more code and allocating that much 
unswappable space sounds like something that would make some usecases 
impossible.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply

* Re: [PATCH 1/2] drivers/net/usb/qcusbnet: Checkpatch cleanups
From: Paulius Zaleckas @ 2010-10-01 13:54 UTC (permalink / raw)
  To: Joe Perches; +Cc: Elly Jones, netdev, dbrownell, mjg59, jglasgow, msb, olofj
In-Reply-To: <1285940519.752.10.camel@Joe-Laptop>

On Fri, Oct 1, 2010 at 4:41 PM, Joe Perches <joe@perches.com> wrote:
> On Fri, 2010-10-01 at 16:26 +0300, Paulius Zaleckas wrote:
>> On 09/29/2010 05:39 AM, Joe Perches wrote:
>> > Whitespace and removal of KERNEL_VERSION tests
>> > Neaten DBG macro
>>
>> Why not use dev_dbg istead of this ugly DBG macro?
>
> Currently, like a lot of other macros
> in the tree, it uses a runtime flag to
> control output.
>
> dev_dbg doesn't have that capability.

It has runtime flag if CONFIG_DYNAMIC_DEBUG is enabled.

^ permalink raw reply

* [PATCH net-next V2] net: dynamic ingress_queue allocation
From: Eric Dumazet @ 2010-10-01 13:56 UTC (permalink / raw)
  To: hadi; +Cc: Jarek Poplawski, David Miller, netdev
In-Reply-To: <1285933506.3553.176.camel@bigi>

Le vendredi 01 octobre 2010 à 07:45 -0400, jamal a écrit :
> On Fri, 2010-10-01 at 00:58 +0200, Eric Dumazet wrote:
> > Hi Jamal
> > 
> > Here is the dynamic allocation I promised. I lightly tested it, could
> > you review it please ?
> > Thanks !
> > 
> > [PATCH net-next2.6] net: dynamic ingress_queue allocation
> > 
> > ingress being not used very much, and net_device->ingress_queue being
> > quite a big object (128 or 256 bytes), use a dynamic allocation if
> > needed (tc qdisc add dev eth0 ingress ...)
> 
> I agree with the principle that it is valuable in making it dynamic for
> people who dont want it; but, but (like my kid would say, sniff, sniff)
> you are making me sad saying it is not used very much ;-> It is used
> very much in my world. My friend Jarek uses it;->
> 

;)

> 
> > +#ifdef CONFIG_NET_CLS_ACT
> 
> I think appropriately this should be NET_SCH_INGRESS (everywhere else as
> well).
> 

I first thought of this, and found it would add a new dependence on
vmlinux :

If someone wants to add NET_SCH_INGRESS module, he would need to
recompile whole kernel and reboot.

This is probably why ing_filter() and handle_ing() are enclosed with
CONFIG_NET_CLS_ACT, not CONFIG_NET_SCH_INGRESS.

Since struct net_dev only holds one pointer (after this patch), I
believe its better to use same dependence.

> 
> > +static inline struct netdev_queue *dev_ingress_queue(struct net_device *dev)
> > +{
> > +#ifdef CONFIG_NET_CLS_ACT
> > +	return dev->ingress_queue;
> > +#else
> > +	return NULL;
> > +#endif
> 
> Above, if you just returned dev->ingress_queue wouldnt it always be 
> NULL if it was not allocated?
> 

ingress_queue is not defined in "struct net_device *dev" if 
!CONFIG_NET_CLS_ACT

Returning NULL here permits dead code elimination by compiler.

Then, probably nobody unset CONFIG_NET_CLS_ACT, so we can probably avoid
this preprocessor stuff.

> 
> > @@ -2737,7 +2734,9 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
> >  					 struct packet_type **pt_prev,
> >  					 int *ret, struct net_device *orig_dev)
> >  {
> > -	if (skb->dev->ingress_queue.qdisc == &noop_qdisc)
> > +	struct netdev_queue *rxq = dev_ingress_queue(skb->dev);
> > +
> > +	if (!rxq || rxq->qdisc == &noop_qdisc)
> >  		goto out;
> 
> I stared at above a little longer since this is the only fast path
> affected; is it a few more cycles now for people who love ingress?

I see, this adds an indirection and a conditional branch, but this
should be in cpu cache and well predicted.

I thought adding a fake "struct netdev_queue" object, with a qdisc
pointing to noop_qdisc. But this would slow down a bit non ingress
users ;)

For people not using ingress, my solution removes an access to an extra
cache line. So network latency is improved a bit when cpu caches are
full of user land data.

> 
> > @@ -690,6 +693,8 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
> >  		    (new && new->flags & TCQ_F_INGRESS)) {
> >  			num_q = 1;
> >  			ingress = 1;
> > +			if (!dev_ingress_queue(dev))
> > +				return -ENOENT;
> >  		}
> >  
> 
> The above looks clever but worries me because it changes the old flow.
> If you have time,  the following tests will alleviate my fears
> 
> 1) compile support for ingress and add/delete ingress qdisc

This worked for me, but I dont know complex setups.

> 2) Dont compile support and add/delete ingress qdisc

tc gives an error (a bit like !CONFIG_NET_SCH_INGRESS)

# tc qdisc add dev eth0 ingress
RTNETLINK answers: No such file or directory
# tc -s -d qdisc show dev eth0
qdisc mq 0: root 
 Sent 636 bytes 10 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 


> 3) Compile ingress as a module and add/delete ingress qdisc
> 
> 

Seems to work like 1)

> Other than that excellent work Eric. And you can add my
> Acked/reviewed-by etc.
> 
> BTW, did i say i like your per-cpu stats stuff? It applies nicely to
> qdiscs, actions etc ;->

I took a look at ifb as suggested by Stephen but could not see trivial
changes (LLTX or per-cpu stats), since central lock is needed I am
afraid. And qdisc are the same, stats updates are mostly free as we
dirtied cache line for the lock.

Thanks Jamal !

Here is the V2, with two #ifdef removed.


[PATCH net-next V2] net: dynamic ingress_queue allocation

ingress being not used very much, and net_device->ingress_queue being
quite a big object (128 or 256 bytes), use a dynamic allocation if
needed (tc qdisc add dev eth0 ingress ...)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/netdevice.h |   11 ++++++--
 net/core/dev.c            |   48 +++++++++++++++++++++++++++---------
 net/sched/sch_api.c       |   40 ++++++++++++++++++++----------
 net/sched/sch_generic.c   |   36 +++++++++++++++------------
 4 files changed, 92 insertions(+), 43 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ceed347..4f86009 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -986,8 +986,7 @@ struct net_device {
 	rx_handler_func_t	*rx_handler;
 	void			*rx_handler_data;
 
-	struct netdev_queue	ingress_queue; /* use two cache lines */
-
+	struct netdev_queue	*ingress_queue;
 /*
  * Cache lines mostly used on transmit path
  */
@@ -1115,6 +1114,14 @@ static inline void netdev_for_each_tx_queue(struct net_device *dev,
 		f(dev, &dev->_tx[i], arg);
 }
 
+
+static inline struct netdev_queue *dev_ingress_queue(struct net_device *dev)
+{
+	return dev->ingress_queue;
+}
+
+extern struct netdev_queue *dev_ingress_queue_create(struct net_device *dev);
+
 /*
  * Net namespace inlines
  */
diff --git a/net/core/dev.c b/net/core/dev.c
index a313bab..e3bb8c9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2702,11 +2702,10 @@ EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook);
  * the ingress scheduler, you just cant add policies on ingress.
  *
  */
-static int ing_filter(struct sk_buff *skb)
+static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
 {
 	struct net_device *dev = skb->dev;
 	u32 ttl = G_TC_RTTL(skb->tc_verd);
-	struct netdev_queue *rxq;
 	int result = TC_ACT_OK;
 	struct Qdisc *q;
 
@@ -2720,8 +2719,6 @@ static int ing_filter(struct sk_buff *skb)
 	skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
 
-	rxq = &dev->ingress_queue;
-
 	q = rxq->qdisc;
 	if (q != &noop_qdisc) {
 		spin_lock(qdisc_lock(q));
@@ -2737,7 +2734,9 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 					 struct packet_type **pt_prev,
 					 int *ret, struct net_device *orig_dev)
 {
-	if (skb->dev->ingress_queue.qdisc == &noop_qdisc)
+	struct netdev_queue *rxq = dev_ingress_queue(skb->dev);
+
+	if (!rxq || rxq->qdisc == &noop_qdisc)
 		goto out;
 
 	if (*pt_prev) {
@@ -2745,7 +2744,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 		*pt_prev = NULL;
 	}
 
-	switch (ing_filter(skb)) {
+	switch (ing_filter(skb, rxq)) {
 	case TC_ACT_SHOT:
 	case TC_ACT_STOLEN:
 		kfree_skb(skb);
@@ -4932,15 +4931,17 @@ static void __netdev_init_queue_locks_one(struct net_device *dev,
 					  struct netdev_queue *dev_queue,
 					  void *_unused)
 {
-	spin_lock_init(&dev_queue->_xmit_lock);
-	netdev_set_xmit_lockdep_class(&dev_queue->_xmit_lock, dev->type);
-	dev_queue->xmit_lock_owner = -1;
+	if (dev_queue) {
+		spin_lock_init(&dev_queue->_xmit_lock);
+		netdev_set_xmit_lockdep_class(&dev_queue->_xmit_lock, dev->type);
+		dev_queue->xmit_lock_owner = -1;
+	}
 }
 
 static void netdev_init_queue_locks(struct net_device *dev)
 {
 	netdev_for_each_tx_queue(dev, __netdev_init_queue_locks_one, NULL);
-	__netdev_init_queue_locks_one(dev, &dev->ingress_queue, NULL);
+	__netdev_init_queue_locks_one(dev, dev_ingress_queue(dev), NULL);
 }
 
 unsigned long netdev_fix_features(unsigned long features, const char *name)
@@ -5447,16 +5448,37 @@ static void netdev_init_one_queue(struct net_device *dev,
 				  struct netdev_queue *queue,
 				  void *_unused)
 {
-	queue->dev = dev;
+	if (queue)
+		queue->dev = dev;
 }
 
 static void netdev_init_queues(struct net_device *dev)
 {
-	netdev_init_one_queue(dev, &dev->ingress_queue, NULL);
+	netdev_init_one_queue(dev, dev_ingress_queue(dev), NULL);
 	netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
 	spin_lock_init(&dev->tx_global_lock);
 }
 
+struct netdev_queue *dev_ingress_queue_create(struct net_device *dev)
+{
+	struct netdev_queue *queue = dev_ingress_queue(dev);
+
+#ifdef CONFIG_NET_CLS_ACT
+	if (queue)
+		return queue;
+	queue = kzalloc(sizeof(*queue), GFP_KERNEL);
+	if (!queue)
+		return NULL;
+	netdev_init_one_queue(dev, queue, NULL);
+	__netdev_init_queue_locks_one(dev, queue, NULL);
+	queue->qdisc = &noop_qdisc;
+	queue->qdisc_sleeping = &noop_qdisc;
+	smp_wmb();
+	dev->ingress_queue = queue;
+#endif
+	return queue;
+}
+
 /**
  *	alloc_netdev_mq - allocate network device
  *	@sizeof_priv:	size of private data to allocate space for
@@ -5559,6 +5581,8 @@ void free_netdev(struct net_device *dev)
 
 	kfree(dev->_tx);
 
+	kfree(dev_ingress_queue(dev));
+
 	/* Flush device addresses */
 	dev_addr_flush(dev);
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index b802078..8635110 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -240,7 +240,10 @@ struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
 	if (q)
 		goto out;
 
-	q = qdisc_match_from_root(dev->ingress_queue.qdisc_sleeping, handle);
+	if (!dev_ingress_queue(dev))
+		goto out;
+	q = qdisc_match_from_root(dev_ingress_queue(dev)->qdisc_sleeping,
+				  handle);
 out:
 	return q;
 }
@@ -690,6 +693,8 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 		    (new && new->flags & TCQ_F_INGRESS)) {
 			num_q = 1;
 			ingress = 1;
+			if (!dev_ingress_queue(dev))
+				return -ENOENT;
 		}
 
 		if (dev->flags & IFF_UP)
@@ -701,7 +706,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 		}
 
 		for (i = 0; i < num_q; i++) {
-			struct netdev_queue *dev_queue = &dev->ingress_queue;
+			struct netdev_queue *dev_queue = dev_ingress_queue(dev);
 
 			if (!ingress)
 				dev_queue = netdev_get_tx_queue(dev, i);
@@ -979,7 +984,8 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 					return -ENOENT;
 				q = qdisc_leaf(p, clid);
 			} else { /* ingress */
-				q = dev->ingress_queue.qdisc_sleeping;
+				if (dev_ingress_queue(dev))
+					q = dev_ingress_queue(dev)->qdisc_sleeping;
 			}
 		} else {
 			q = dev->qdisc;
@@ -1044,7 +1050,8 @@ replay:
 					return -ENOENT;
 				q = qdisc_leaf(p, clid);
 			} else { /*ingress */
-				q = dev->ingress_queue.qdisc_sleeping;
+				if (dev_ingress_queue_create(dev))
+					q = dev_ingress_queue(dev)->qdisc_sleeping;
 			}
 		} else {
 			q = dev->qdisc;
@@ -1123,11 +1130,14 @@ replay:
 create_n_graft:
 	if (!(n->nlmsg_flags&NLM_F_CREATE))
 		return -ENOENT;
-	if (clid == TC_H_INGRESS)
-		q = qdisc_create(dev, &dev->ingress_queue, p,
-				 tcm->tcm_parent, tcm->tcm_parent,
-				 tca, &err);
-	else {
+	if (clid == TC_H_INGRESS) {
+		if (dev_ingress_queue(dev))
+			q = qdisc_create(dev, dev_ingress_queue(dev), p,
+					 tcm->tcm_parent, tcm->tcm_parent,
+					 tca, &err);
+		else
+			err = -ENOENT;
+	} else {
 		struct netdev_queue *dev_queue;
 
 		if (p && p->ops->cl_ops && p->ops->cl_ops->select_queue)
@@ -1304,8 +1314,10 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct netlink_callback *cb)
 		if (tc_dump_qdisc_root(dev->qdisc, skb, cb, &q_idx, s_q_idx) < 0)
 			goto done;
 
-		dev_queue = &dev->ingress_queue;
-		if (tc_dump_qdisc_root(dev_queue->qdisc_sleeping, skb, cb, &q_idx, s_q_idx) < 0)
+		dev_queue = dev_ingress_queue(dev);
+		if (dev_queue &&
+		    tc_dump_qdisc_root(dev_queue->qdisc_sleeping, skb, cb,
+				       &q_idx, s_q_idx) < 0)
 			goto done;
 
 cont:
@@ -1595,8 +1607,10 @@ static int tc_dump_tclass(struct sk_buff *skb, struct netlink_callback *cb)
 	if (tc_dump_tclass_root(dev->qdisc, skb, tcm, cb, &t, s_t) < 0)
 		goto done;
 
-	dev_queue = &dev->ingress_queue;
-	if (tc_dump_tclass_root(dev_queue->qdisc_sleeping, skb, tcm, cb, &t, s_t) < 0)
+	dev_queue = dev_ingress_queue(dev);
+	if (dev_queue &&
+	    tc_dump_tclass_root(dev_queue->qdisc_sleeping, skb, tcm, cb,
+				&t, s_t) < 0)
 		goto done;
 
 done:
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 545278a..c42dec5 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -721,16 +721,18 @@ static void transition_one_qdisc(struct net_device *dev,
 				 struct netdev_queue *dev_queue,
 				 void *_need_watchdog)
 {
-	struct Qdisc *new_qdisc = dev_queue->qdisc_sleeping;
-	int *need_watchdog_p = _need_watchdog;
+	if (dev_queue) {
+		struct Qdisc *new_qdisc = dev_queue->qdisc_sleeping;
+		int *need_watchdog_p = _need_watchdog;
 
-	if (!(new_qdisc->flags & TCQ_F_BUILTIN))
-		clear_bit(__QDISC_STATE_DEACTIVATED, &new_qdisc->state);
+		if (!(new_qdisc->flags & TCQ_F_BUILTIN))
+			clear_bit(__QDISC_STATE_DEACTIVATED, &new_qdisc->state);
 
-	rcu_assign_pointer(dev_queue->qdisc, new_qdisc);
-	if (need_watchdog_p && new_qdisc != &noqueue_qdisc) {
-		dev_queue->trans_start = 0;
-		*need_watchdog_p = 1;
+		rcu_assign_pointer(dev_queue->qdisc, new_qdisc);
+		if (need_watchdog_p && new_qdisc != &noqueue_qdisc) {
+			dev_queue->trans_start = 0;
+			*need_watchdog_p = 1;
+		}
 	}
 }
 
@@ -753,7 +755,7 @@ void dev_activate(struct net_device *dev)
 
 	need_watchdog = 0;
 	netdev_for_each_tx_queue(dev, transition_one_qdisc, &need_watchdog);
-	transition_one_qdisc(dev, &dev->ingress_queue, NULL);
+	transition_one_qdisc(dev, dev_ingress_queue(dev), NULL);
 
 	if (need_watchdog) {
 		dev->trans_start = jiffies;
@@ -768,7 +770,7 @@ static void dev_deactivate_queue(struct net_device *dev,
 	struct Qdisc *qdisc_default = _qdisc_default;
 	struct Qdisc *qdisc;
 
-	qdisc = dev_queue->qdisc;
+	qdisc = dev_queue ? dev_queue->qdisc : NULL;
 	if (qdisc) {
 		spin_lock_bh(qdisc_lock(qdisc));
 
@@ -812,7 +814,7 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 void dev_deactivate(struct net_device *dev)
 {
 	netdev_for_each_tx_queue(dev, dev_deactivate_queue, &noop_qdisc);
-	dev_deactivate_queue(dev, &dev->ingress_queue, &noop_qdisc);
+	dev_deactivate_queue(dev, dev_ingress_queue(dev), &noop_qdisc);
 
 	dev_watchdog_down(dev);
 
@@ -830,15 +832,17 @@ static void dev_init_scheduler_queue(struct net_device *dev,
 {
 	struct Qdisc *qdisc = _qdisc;
 
-	dev_queue->qdisc = qdisc;
-	dev_queue->qdisc_sleeping = qdisc;
+	if (dev_queue) {
+		dev_queue->qdisc = qdisc;
+		dev_queue->qdisc_sleeping = qdisc;
+	}
 }
 
 void dev_init_scheduler(struct net_device *dev)
 {
 	dev->qdisc = &noop_qdisc;
 	netdev_for_each_tx_queue(dev, dev_init_scheduler_queue, &noop_qdisc);
-	dev_init_scheduler_queue(dev, &dev->ingress_queue, &noop_qdisc);
+	dev_init_scheduler_queue(dev, dev_ingress_queue(dev), &noop_qdisc);
 
 	setup_timer(&dev->watchdog_timer, dev_watchdog, (unsigned long)dev);
 }
@@ -847,7 +851,7 @@ static void shutdown_scheduler_queue(struct net_device *dev,
 				     struct netdev_queue *dev_queue,
 				     void *_qdisc_default)
 {
-	struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
+	struct Qdisc *qdisc = dev_queue ? dev_queue->qdisc_sleeping : NULL;
 	struct Qdisc *qdisc_default = _qdisc_default;
 
 	if (qdisc) {
@@ -861,7 +865,7 @@ static void shutdown_scheduler_queue(struct net_device *dev,
 void dev_shutdown(struct net_device *dev)
 {
 	netdev_for_each_tx_queue(dev, shutdown_scheduler_queue, &noop_qdisc);
-	shutdown_scheduler_queue(dev, &dev->ingress_queue, &noop_qdisc);
+	shutdown_scheduler_queue(dev, dev_ingress_queue(dev), &noop_qdisc);
 	qdisc_destroy(dev->qdisc);
 	dev->qdisc = &noop_qdisc;
 



^ permalink raw reply related

* INVESTMENT PROJECT IN YOUR COUNTRY. (HOW CAN I INVEST IN YOUR COUNTRY)?
From: kunihira barbara @ 2010-10-01  7:47 UTC (permalink / raw)


Dear Sir, 
 
I am Mr.Sam Gapke, The Managing Director of the REA. I am interested in investing the sum of US $20M (Twenty Million US Dollars) in any Lucrative Business in the area related to your Business line. Thanks for anticipate cooperation, your urgent response will be highly appreciated. 
Feel free to contact me with this my personal email address:sam.pk01@gmail.com
Best regards, 
 
Mr. Sam Gapke

^ permalink raw reply

* Re: Packet time delays on multi-core systems
From: Alexey Vlasov @ 2010-10-01 14:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Mailing List, netdev
In-Reply-To: <1285937966.2641.117.camel@edumazet-laptop>

On Fri, Oct 01, 2010 at 02:59:26PM +0200, Eric Dumazet wrote:
> Le vendredi 01 octobre 2010 ?? 14:16 +0400, Alexey Vlasov a ??crit :
> 
> > I have also found that:
> > 1. rx overruns is increasing.
> > 2. rx_queue_drop_packet_count is increasing.
> 
> So you flood machine with packets, its not an idle one ?
> 
> I thought you were doing experiments with light trafic.

No, it's a usual working server for shared hosting. There're about 1000 
clients' website, and I don't flood it specially. Franlky speaking I don't 
see any network suspicious activity.
 
> > # ethtool -S eth0 | grep drop
> >      tx_dropped: 0
> >      rx_queue_drop_packet_count: 1260743751
> >      dropped_smbus: 0
> >      rx_queue_0_drops: 0
> >      rx_queue_1_drops: 0
> >      rx_queue_2_drops: 0
> >      rx_queue_3_drops: 0
> > 
> 
> 
> ethtool -S eth0   (full output, not small parts)

NIC statistics:
     rx_packets: 2973717440
     tx_packets: 3032670910
     rx_bytes: 1892633650741
     tx_bytes: 2536130682695
     rx_broadcast: 118773199
     tx_broadcast: 68013
     rx_multicast: 95257
     tx_multicast: 0
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 95257
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 7939
     rx_queue_drop_packet_count: 1324025520
     rx_missed_errors: 146631
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 50715
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 344724062
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 1892633650741
     rx_csum_offload_good: 2973697420
     rx_csum_offload_errors: 6235
     tx_dma_out_of_sync: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 9327
     rx_smbus: 118531661
     dropped_smbus: 0
     tx_queue_0_packets: 797617475
     tx_queue_0_bytes: 630191908685
     tx_queue_1_packets: 719681297
     tx_queue_1_bytes: 625907304846
     tx_queue_2_packets: 718841556
     tx_queue_2_bytes: 620522418855
     tx_queue_3_packets: 796521255
     tx_queue_3_bytes: 646196024585
     rx_queue_0_packets: 788885797
     rx_queue_0_bytes: 458936338699
     rx_queue_0_drops: 0
     rx_queue_1_packets: 701354604
     rx_queue_1_bytes: 457490536453
     rx_queue_1_drops: 0
     rx_queue_2_packets: 791887663
     rx_queue_2_bytes: 534425333616
     rx_queue_2_drops: 0
     rx_queue_3_packets: 691579028
     rx_queue_3_bytes: 429887244557
     rx_queue_3_drops: 0

> > 3. By sending SYN-packets by hping, RST packet doesn't send, but I don't know may 
> > be it is just some feature in 2.6.32.
> > newbox # hping -c 1 -S -p 80 111.111.111.111
> > HPING 111.111.111.111 (eth0 111.111.111.111): S set, 40 headers + 0 data bytes
> > len=46 ip=111.111.111.111 ttl=58 DF id=11471 sport=80 flags=SA seq=0 win=65535 rtt=99.0 ms
> > 
> > --- 111.111.111.111 hping statistic ---
> > 1 packets tramitted, 1 packets received, 0% packet loss
> > round-trip min/avg/max = 99.0/99.0/99.0 ms
> > 
> > 13:59:07.439528 IP newbox.2777 > 111.111.111.111.80: S 345595033:345595033(0) win 512
> > 13:59:07.439626 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
> > 13:59:10.439368 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
> > 13:59:16.439313 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
> > 13:59:28.439206 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
> > 
> > As a result I got doubles:
> 
> Are you playing with trafic shaping ?
> 
> tc -s -d qdisc
 
No, nothing alike, no shapers.

# tc -s -d qdisc
bash: tc: command not found
 
> > DUP! len=46 ip=111.111.111.111 ttl=58 DF id=27454 sport=80 flags=SA seq=0 win=65535 rtt=3137.8 ms
> > 
> > Example of another TCP-session from 2.6.28 kernel:
> > oldbox # hping -c 1 -S -p 80 111.111.111.111
> > HPING 111.111.111.111 (eth0 111.111.111.111): S set, 40 headers + 0 data bytes
> > len=46 ip=111.111.111.111 ttl=58 DF id=53180 sport=80 flags=SA seq=0 win=65535 rtt=2.9 ms
> > 
> > --- 111.111.111.111 hping statistic ---
> > 1 packets tramitted, 1 packets received, 0% packet loss
> > round-trip min/avg/max = 2.9/2.9/2.9 ms
> > 
> > 14:01:45.225136 IP oldbox.2776 > 111.111.111.111.80: S 1983626200:1983626200(0) win 512
> > 14:01:45.225288 IP 111.111.111.111.80 > oldbox.2776: S 3796385036:3796385036(0) ack 1983626201 win 65535 <mss 1460>
> > 14:01:45.227990 IP oldbox.2776 > 111.111.111.111.80: R 1983626201:1983626201(0) win 0
> > 

-- 
BRGDS. Alexey Vlasov.

^ permalink raw reply

* [patch v2 00/12] IPVS: SIP Persistence Engine
From: Simon Horman @ 2010-10-01 14:35 UTC (permalink / raw)
  To: lvs-devel, netdev, netfilter, netfilter-devel
  Cc: Jan Engelhardt, Stephen Hemminger, Wensong Zhang,
	Julian Anastasov, Patrick McHardy

This patch series adds load-balancing of UDP SIP based on Call-ID to
IPVS as well as a frame-work for extending IPVS to handle alternate
persistence requirements.

REVISIONS

This v2 of this patch series which fixes several problems
including non-atomic allocations while running atomic, and a memory leak.

v1 of this series addressed a few minor problems.

Internally there were 4 rfc versions, 0.1, 0.2, 0.3 and 0.4.

All changes are noted on a per-patch basis.

OVERVIEW

The approach that I have taken is what I call persistence engines.
The basic idea being that you can provide a module to LVS that alters
the way that it handles connection templates, which are at the core
of persistence. In particular, an additional key can be added, and
any of the normal IP address, port and protocol information can either
be used or ignored.

In the case of the SIP persistence engine, the only persistence engine, all
the keys used by the default persistence behaviour are used and the callid
is added as an extra key. I originally intended to ignore the cip, but this
can optionally be done by setting the persistence mask (-M) to 0.0.0.0
while allowing the flexibility of other mask values.

It is envisaged that the SIP persistence engine will be used in conjunction
with one-packet scheduling. I'm interested to hear if that doesn't fit your
needs.


CONFIGURATION

A persistence engine is associated with a virtual service
(as are schedulers). I have added the --pe option to the
ivpsadm -A and -E commands to allow the persistence engine
of a virtual service to be added, changed, or deleted.

e.g. ipvsadm -A -u 10.4.3.192:5060 -p 60 -M 0.0.0.0 -o --pe sip

There are no other configuration parameters at this time.


RUNNING

When a connection template is created, if its virtual service
has a persistence engine, then the persistence engine can add
an extra key to the connection template. For the SIP module this
is the callid. More generically, it is known as "pe data". And
both the name of the persistence engine, "pe name", and "pe data"
can be viewed in /proc/net/ip_vs_conn and by passing the
--persistent-conn option to ipvsadm -Lc.

e.g.
# ipvsadm -Lcn --persistent-conn
UDP 00:38  UDP         10.4.3.0:0         10.4.3.192:5060    127.0.0.1:5060 sip 193373839

Here we see a single persistence template (cport is 0), which has been
handled by the sip persistence engine. The pe data (callid) is 193373839.

In the case where the persistence engine can't match a packet for some
reason, the connection will fall back to the normal persistence handling.
This seems reasonable, as that if the packet ought to be dropped, iptables
could be used.

A limited amount of debugging information has been added which
can be enabled using a value of 9 or greater in
/proc/sys/net/ipv4/vs/debug_level

CODE AVAILABILITY

The kernel patches (12) are available in git as the pe-2 branch of
git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git

The ipvsadm patches (2) are available in git as the pe-2 branch of
git://github.com/horms/ipvsadm-test.git

I will post the ipvsadm patches separately


^ permalink raw reply

* [patch v2 01/12] [PATCH 01/12] netfilter: nf_conntrack_sip: Allow ct_sip_get_header() to be called with a null ct argument
From: Simon Horman @ 2010-10-01 14:35 UTC (permalink / raw)
  To: lvs-devel, netdev, netfilter, netfilter-devel
  Cc: Jan Engelhardt, Stephen Hemminger, Wensong Zhang,
	Julian Anastasov, Patrick McHardy
In-Reply-To: <20101001143517.645421976@akiko.akashicho.tokyo.vergenet.net>

[-- Attachment #1: 0001-netfilter-nf_conntrack_sip-Allow-ct_sip_get_header-t.patch --]
[-- Type: text/plain, Size: 476 bytes --]

Signed-off-by: Simon Horman <horms@verge.net.au>

diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index 53d8922..2fd1ea2 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -152,6 +152,9 @@ static int parse_addr(const struct nf_conn *ct, const char *cp,
 	const char *end;
 	int ret = 0;
 
+	if (!ct)
+		return 0;
+
 	memset(addr, 0, sizeof(*addr));
 	switch (nf_ct_l3num(ct)) {
 	case AF_INET:
-- 
1.7.1



^ permalink raw reply related

* [patch v2 03/12] [PATCH 03/12] IPVS: compact ip_vs_sched_persist()
From: Simon Horman @ 2010-10-01 14:35 UTC (permalink / raw)
  To: lvs-devel, netdev, netfilter, netfilter-devel
  Cc: Jan Engelhardt, Stephen Hemminger, Wensong Zhang,
	Julian Anastasov, Patrick McHardy
In-Reply-To: <20101001143517.645421976@akiko.akashicho.tokyo.vergenet.net>

[-- Attachment #1: 0003-IPVS-compact-ip_vs_sched_persist.patch --]
[-- Type: text/plain, Size: 5731 bytes --]

Compact ip_vs_sched_persist() by setting up parameters
and calling functions once.

Signed-off-by: Simon Horman <horms@verge.net.au>
---

v2
* Make "union nf_inet_addr fwmark" const
* Don't remove the comment next to the declaration of dport
* Add a comment to the declaration of vport

Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_core.c	2010-10-01 21:56:39.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c	2010-10-01 22:02:41.000000000 +0900
@@ -193,10 +193,14 @@ ip_vs_sched_persist(struct ip_vs_service
 	struct ip_vs_iphdr iph;
 	struct ip_vs_dest *dest;
 	struct ip_vs_conn *ct;
-	__be16  dport;			/* destination port to forward */
+	int protocol = iph.protocol;
+	__be16 dport = 0;		/* destination port to forward */
+	__be16 vport = 0;		/* virtual service port */
 	unsigned int flags;
 	union nf_inet_addr snet;	/* source network of the client,
 					   after masking */
+	const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
+	const union nf_inet_addr *vaddr = &iph.daddr;
 
 	ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
 
@@ -227,119 +231,58 @@ ip_vs_sched_persist(struct ip_vs_service
 	 * service, and a template like <caddr, 0, vaddr, vport, daddr, dport>
 	 * is created for other persistent services.
 	 */
-	if (ports[1] == svc->port) {
-		/* Check if a template already exists */
-		if (svc->port != FTPPORT)
-			ct = ip_vs_ct_in_get(svc->af, iph.protocol, &snet, 0,
-					     &iph.daddr, ports[1]);
-		else
-			ct = ip_vs_ct_in_get(svc->af, iph.protocol, &snet, 0,
-					     &iph.daddr, 0);
-
-		if (!ct || !ip_vs_check_template(ct)) {
-			/*
-			 * No template found or the dest of the connection
-			 * template is not available.
-			 */
-			dest = svc->scheduler->schedule(svc, skb);
-			if (dest == NULL) {
-				IP_VS_DBG(1, "p-schedule: no dest found.\n");
-				return NULL;
-			}
-
-			/*
-			 * Create a template like <protocol,caddr,0,
-			 * vaddr,vport,daddr,dport> for non-ftp service,
-			 * and <protocol,caddr,0,vaddr,0,daddr,0>
-			 * for ftp service.
+	{
+		if (ports[1] == svc->port) {
+			/* non-FTP template:
+			 * <protocol, caddr, 0, vaddr, vport, daddr, dport>
+			 * FTP template:
+			 * <protocol, caddr, 0, vaddr, 0, daddr, 0>
 			 */
 			if (svc->port != FTPPORT)
-				ct = ip_vs_conn_new(svc->af, iph.protocol,
-						    &snet, 0,
-						    &iph.daddr,
-						    ports[1],
-						    &dest->addr, dest->port,
-						    IP_VS_CONN_F_TEMPLATE,
-						    dest);
-			else
-				ct = ip_vs_conn_new(svc->af, iph.protocol,
-						    &snet, 0,
-						    &iph.daddr, 0,
-						    &dest->addr, 0,
-						    IP_VS_CONN_F_TEMPLATE,
-						    dest);
-			if (ct == NULL)
-				return NULL;
-
-			ct->timeout = svc->timeout;
+				vport = ports[1];
 		} else {
-			/* set destination with the found template */
-			dest = ct->dest;
-		}
-		dport = dest->port;
-	} else {
-		/*
-		 * Note: persistent fwmark-based services and persistent
-		 * port zero service are handled here.
-		 * fwmark template: <IPPROTO_IP,caddr,0,fwmark,0,daddr,0>
-		 * port zero template: <protocol,caddr,0,vaddr,0,daddr,0>
-		 */
-		if (svc->fwmark) {
-			union nf_inet_addr fwmark = {
-				.ip = htonl(svc->fwmark)
-			};
-
-			ct = ip_vs_ct_in_get(svc->af, IPPROTO_IP, &snet, 0,
-					     &fwmark, 0);
-		} else
-			ct = ip_vs_ct_in_get(svc->af, iph.protocol, &snet, 0,
-					     &iph.daddr, 0);
-
-		if (!ct || !ip_vs_check_template(ct)) {
-			/*
-			 * If it is not persistent port zero, return NULL,
-			 * otherwise create a connection template.
+			/* Note: persistent fwmark-based services and
+			 * persistent port zero service are handled here.
+			 * fwmark template:
+			 * <IPPROTO_IP,caddr,0,fwmark,0,daddr,0>
+			 * port zero template:
+			 * <protocol,caddr,0,vaddr,0,daddr,0>
 			 */
-			if (svc->port)
-				return NULL;
-
-			dest = svc->scheduler->schedule(svc, skb);
-			if (dest == NULL) {
-				IP_VS_DBG(1, "p-schedule: no dest found.\n");
-				return NULL;
+			if (svc->fwmark) {
+				protocol = IPPROTO_IP;
+				vaddr = &fwmark;
 			}
+		}
+	}
 
-			/*
-			 * Create a template according to the service
-			 */
-			if (svc->fwmark) {
-				union nf_inet_addr fwmark = {
-					.ip = htonl(svc->fwmark)
-				};
-
-				ct = ip_vs_conn_new(svc->af, IPPROTO_IP,
-						    &snet, 0,
-						    &fwmark, 0,
-						    &dest->addr, 0,
-						    IP_VS_CONN_F_TEMPLATE,
-						    dest);
-			} else
-				ct = ip_vs_conn_new(svc->af, iph.protocol,
-						    &snet, 0,
-						    &iph.daddr, 0,
-						    &dest->addr, 0,
-						    IP_VS_CONN_F_TEMPLATE,
-						    dest);
-			if (ct == NULL)
-				return NULL;
+	/* Check if a template already exists */
+	ct = ip_vs_ct_in_get(svc->af, protocol, &snet, 0, vaddr, vport);
 
-			ct->timeout = svc->timeout;
-		} else {
-			/* set destination with the found template */
-			dest = ct->dest;
+	if (!ct || !ip_vs_check_template(ct)) {
+		/* No template found or the dest of the connection
+		 * template is not available.
+		 */
+		dest = svc->scheduler->schedule(svc, skb);
+		if (!dest) {
+			IP_VS_DBG(1, "p-schedule: no dest found.\n");
+			return NULL;
 		}
-		dport = ports[1];
-	}
+
+		if (ports[1] == svc->port && svc->port != FTPPORT)
+			dport = dest->port;
+
+		/* Create a template */
+		ct = ip_vs_conn_new(svc->af, protocol, &snet, 0,vaddr, vport,
+				    &dest->addr, dport,
+				    IP_VS_CONN_F_TEMPLATE, dest);
+		if (ct == NULL)
+			return NULL;
+
+		ct->timeout = svc->timeout;
+	} else
+		/* set destination with the found template */
+		dest = ct->dest;
+	dport = dest->port;
 
 	flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
 		 && iph.protocol == IPPROTO_UDP)?


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox