Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] ipv4: use daddr to get inet_peer
From: Hannes Frederic Sowa @ 2014-02-14  9:41 UTC (permalink / raw)
  To: Duan Jiong; +Cc: David Miller, netdev
In-Reply-To: <52FDE10F.5010903@cn.fujitsu.com>

On Fri, Feb 14, 2014 at 05:25:35PM +0800, Duan Jiong wrote:
> 
> since commit 1d861aa4("inet: Minimize use of cached route inetpeer"),
> ip_error() uses saddr to get inet_peer, so ip_error() and icmpv4_xrlim_allow()
> use the same inet_peer to limit icmp error message twice.
> 
> In ip_error(), peer->rate_tokens is set to ip_rt_error_burst, but in
> inet_peer_xrlim_allow() peer->rate_tokens is set to XRLIM_BURST_FACTOR.
> XRLIM_BURST_FACTOR is defined to 6, so user seting ip_rt_error_burst makes
> no sense.
> 
> In my opinion, the ip_rt_error_burst is used to limit icmp error messages
> for daddr instead of saddr.

Hmmm...

ip_error is a dst_input function, as such it gets called with the incoming
packet. saddr is the address we send the reply back (see
icmp_send->icmp_route_lookup).

Sorry, I don't think the patch is correct.

Bye,

  Hannes

^ permalink raw reply

* Re: [PATCH] ipv4: use daddr to get inet_peer
From: Duan Jiong @ 2014-02-14  9:51 UTC (permalink / raw)
  To: hannes; +Cc: David Miller, netdev
In-Reply-To: <20140214094151.GA12549@order.stressinduktion.org>

于 2014年02月14日 17:41, Hannes Frederic Sowa 写道:
> On Fri, Feb 14, 2014 at 05:25:35PM +0800, Duan Jiong wrote:
>>
>> since commit 1d861aa4("inet: Minimize use of cached route inetpeer"),
>> ip_error() uses saddr to get inet_peer, so ip_error() and icmpv4_xrlim_allow()
>> use the same inet_peer to limit icmp error message twice.
>>
>> In ip_error(), peer->rate_tokens is set to ip_rt_error_burst, but in
>> inet_peer_xrlim_allow() peer->rate_tokens is set to XRLIM_BURST_FACTOR.
>> XRLIM_BURST_FACTOR is defined to 6, so user seting ip_rt_error_burst makes
>> no sense.
>>
>> In my opinion, the ip_rt_error_burst is used to limit icmp error messages
>> for daddr instead of saddr.
> 
> Hmmm...
> 
> ip_error is a dst_input function, as such it gets called with the incoming
> packet. saddr is the address we send the reply back (see
> icmp_send->icmp_route_lookup).
> 

But if we still use saddr to get inet_peer, seting ip_rt_error_burst will make
no sense, because it will be overwrited by XRLIM_BURST_FACTOR.

Thanks,
  Duan


> Sorry, I don't think the patch is correct.
> 
> Bye,
> 
>   Hannes
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH net-next] net: remove unnecessary return's
From: Julia Lawall @ 2014-02-14  9:58 UTC (permalink / raw)
  To: David Miller; +Cc: davej, joe, stephen, netdev
In-Reply-To: <20140213.181416.1185115361356253646.davem@davemloft.net>

On Thu, 13 Feb 2014, David Miller wrote:

> From: Julia Lawall <julia.lawall@lip6.fr>
> Date: Thu, 13 Feb 2014 23:28:27 +0100 (CET)
>
> > On Thu, 13 Feb 2014, Dave Jones wrote:
> >
> >> On Thu, Feb 13, 2014 at 10:55:23PM +0100, Julia Lawall wrote:
> >>
> >>  > The patch below converts label: return; } to label: ; }.  I have only
> >>  > scanned through the patches, not patched the code and looked at the
> >>  > results, so I am not sure that it is completely correct.  On the other
> >>  > hand, I'm also not sure that label: ; } is better than label: return; },
> >>  > either.  If people think it is, then I can cheack the results in more
> >>  > detail.
> >>
> >> Why not delete the label, and just replace the goto with a return if
> >> the label is at the end of the function ?
> >
> > Here is an example.  Perhaps the uniformity of the if ... goto pattern is
> > valuable, though?
>
> I think it is valuable, it's so much easier to audit the return paths
> via a process of elimination with that kind of layout.  A return in
> the middle of that looks out of place at best.

Actually, I had a student who made a tool that went the other way around,
and introduced goto labels for sharable error handling code.  We didn't
get around to using it to send patches, though.  In that tool, we didn't
create labels just for returns, with the thought that in that case there
was no point to introduce a goto if there was nothing to share.

julia

^ permalink raw reply

* Re: [PATCH v2] can: xilinx CAN controller support.
From: Marc Kleine-Budde @ 2014-02-14  9:59 UTC (permalink / raw)
  To: Appana Durga Kedareswara Rao, wg@grandegger.com, Michal Simek,
	grant.likely@linaro.org, robh+dt@kernel.org,
	linux-can@vger.kernel.org
  Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, devicetree@vger.kernel.org
In-Reply-To: <ad62e4f0-fe91-4dac-9a74-ffe9c9128cb2@CH1EHSMHS024.ehs.local>

[-- Attachment #1: Type: text/plain, Size: 14479 bytes --]

On 02/14/2014 10:36 AM, Appana Durga Kedareswara Rao wrote:
>>> +/* CAN register bit masks - XCAN_<REG>_<BIT>_MASK */
>>> +#define XCAN_SRR_CEN_MASK          0x00000002 /* CAN enable */
>>> +#define XCAN_SRR_RESET_MASK                0x00000001 /* Soft Reset the
>> CAN core */
>>> +#define XCAN_MSR_LBACK_MASK                0x00000002 /* Loop back
>> mode select */
>>> +#define XCAN_MSR_SLEEP_MASK                0x00000001 /* Sleep mode
>> select */
>>> +#define XCAN_BRPR_BRP_MASK         0x000000FF /* Baud rate
>> prescaler */
>>> +#define XCAN_BTR_SJW_MASK          0x00000180 /* Synchronous
>> jump width */
>>> +#define XCAN_BTR_TS2_MASK          0x00000070 /* Time segment
>> 2 */
>>> +#define XCAN_BTR_TS1_MASK          0x0000000F /* Time segment
>> 1 */
>>> +#define XCAN_ECR_REC_MASK          0x0000FF00 /* Receive error
>> counter */
>>> +#define XCAN_ECR_TEC_MASK          0x000000FF /* Transmit error
>> counter */
>>> +#define XCAN_ESR_ACKER_MASK                0x00000010 /* ACK error */
>>> +#define XCAN_ESR_BERR_MASK         0x00000008 /* Bit error */
>>> +#define XCAN_ESR_STER_MASK         0x00000004 /* Stuff error */
>>> +#define XCAN_ESR_FMER_MASK         0x00000002 /* Form error */
>>> +#define XCAN_ESR_CRCER_MASK                0x00000001 /* CRC error */
>>> +#define XCAN_SR_TXFLL_MASK         0x00000400 /* TX FIFO is full
>> */
>>> +#define XCAN_SR_ESTAT_MASK         0x00000180 /* Error status */
>>> +#define XCAN_SR_ERRWRN_MASK                0x00000040 /* Error warning
>> */
>>> +#define XCAN_SR_NORMAL_MASK                0x00000008 /* Normal mode
>> */
>>> +#define XCAN_SR_LBACK_MASK         0x00000002 /* Loop back
>> mode */
>>> +#define XCAN_SR_CONFIG_MASK                0x00000001 /* Configuration
>> mode */
>>> +#define XCAN_IXR_TXFEMP_MASK               0x00004000 /* TX FIFO Empty
>> */
>>> +#define XCAN_IXR_WKUP_MASK         0x00000800 /* Wake up
>> interrupt */
>>> +#define XCAN_IXR_SLP_MASK          0x00000400 /* Sleep
>> interrupt */
>>> +#define XCAN_IXR_BSOFF_MASK                0x00000200 /* Bus off
>> interrupt */
>>> +#define XCAN_IXR_ERROR_MASK                0x00000100 /* Error interrupt
>> */
>>> +#define XCAN_IXR_RXNEMP_MASK               0x00000080 /* RX FIFO
>> NotEmpty intr */
>>> +#define XCAN_IXR_RXOFLW_MASK               0x00000040 /* RX FIFO
>> Overflow intr */
>>> +#define XCAN_IXR_RXOK_MASK         0x00000010 /* Message
>> received intr */
>>> +#define XCAN_IXR_TXOK_MASK         0x00000002 /* TX successful
>> intr */
>>> +#define XCAN_IXR_ARBLST_MASK               0x00000001 /* Arbitration
>> lost intr */
>>> +#define XCAN_IDR_ID1_MASK          0xFFE00000 /* Standard msg
>> identifier */
>>> +#define XCAN_IDR_SRR_MASK          0x00100000 /* Substitute
>> remote TXreq */
>>> +#define XCAN_IDR_IDE_MASK          0x00080000 /* Identifier
>> extension */
>>> +#define XCAN_IDR_ID2_MASK          0x0007FFFE /* Extended
>> message ident */
>>> +#define XCAN_IDR_RTR_MASK          0x00000001 /* Remote TX
>> request */
>>> +#define XCAN_DLCR_DLC_MASK         0xF0000000 /* Data length
>> code */
>>> +
> 
> Need to use BIT() Macro for the Masks?

You can, but it IMHO only makes sense where only a single bit is set.

[...]

>>> +   int waiting_ech_skb_num;
>>> +   int xcan_echo_skb_max_tx;
>>> +   int xcan_echo_skb_max_rx;
>>> +   struct napi_struct napi;
>>> +   spinlock_t ech_skb_lock;
>>> +   u32 (*read_reg)(const struct xcan_priv *priv, int reg);
>>> +   void (*write_reg)(const struct xcan_priv *priv, int reg, u32 val);
>>
>> Please remove read_reg, write_reg, as long as there isn't any BE support in
>> the driver, call them directly.
>>
> 
> 
> As per yours and Michal discussion I am keeping this here (endianess
> of the IP is not fixed at compile time).

Ok.

[...]

>>> +/**
>>> + * xcan_start_xmit - Starts the transmission
>>> + * @skb:   sk_buff pointer that contains data to be Txed
>>> + * @ndev:  Pointer to net_device structure
>>> + *
>>> + * This function is invoked from upper layers to initiate
>>> +transmission. This
>>> + * function uses the next available free txbuff and populates their
>>> +fields to
>>> + * start the transmission.
>>> + *
>>> + * Return: 0 on success and failure value on error  */ static int
>>> +xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev) {
>>> +   struct xcan_priv *priv = netdev_priv(ndev);
>>> +   struct net_device_stats *stats = &ndev->stats;
>>> +   struct can_frame *cf = (struct can_frame *)skb->data;
>>> +   u32 id, dlc, data[2] = {0, 0}, rtr = 0;
>>
>> I think you can drop the rtr varibale and use cf->can_id & CAN_RTR_FLAG
>> instead.
>>
> 
> OK
>>> +   unsigned long flags;
>>> +
>>> +   if (can_dropped_invalid_skb(ndev, skb))
>>> +           return NETDEV_TX_OK;
>>> +
>>> +   /* Watch carefully on the bit sequence */
>>> +   if (cf->can_id & CAN_EFF_FLAG) {
>>> +           /* Extended CAN ID format */
>>> +           id = ((cf->can_id & CAN_EFF_MASK) << XCAN_IDR_ID2_SHIFT)
>> &
>>> +                   XCAN_IDR_ID2_MASK;
>>> +           id |= (((cf->can_id & CAN_EFF_MASK) >>
>>> +                   (CAN_EFF_ID_BITS-CAN_SFF_ID_BITS)) <<
>>> +                   XCAN_IDR_ID1_SHIFT) & XCAN_IDR_ID1_MASK;
>>> +
>>> +           /* The substibute remote TX request bit should be "1"
>>> +            * for extended frames as in the Xilinx CAN datasheet
>>> +            */
>>> +           id |= XCAN_IDR_IDE_MASK | XCAN_IDR_SRR_MASK;
>>> +
>>> +           if (cf->can_id & CAN_RTR_FLAG) {
>>> +                   /* Extended frames remote TX request */
>>> +                   id |= XCAN_IDR_RTR_MASK;
>>> +                   rtr = 1;
>>> +           }
>>> +   } else {
>>> +           /* Standard CAN ID format */
>>> +           id = ((cf->can_id & CAN_SFF_MASK) << XCAN_IDR_ID1_SHIFT)
>> &
>>> +                   XCAN_IDR_ID1_MASK;
>>> +
>>> +           if (cf->can_id & CAN_RTR_FLAG) {
>>> +                   /* Extended frames remote TX request */
>>> +                   id |= XCAN_IDR_SRR_MASK;
>>> +                   rtr = 1;
>>> +           }
>>> +   }
>>> +
>>> +   dlc = (cf->can_dlc & 0xf) << XCAN_DLCR_DLC_SHIFT;
>>
>> No need to mask dlc, it's valid.
>>
> OK
> 
>>> +
>>> +   if (dlc > 0)
>>
>> You've copied my speudo code :)
>> But you have to use (cf->can_dlc > 0) here, as dlc is the shifted value.
> 
> Yes :) I missed it will change
> 
>>
>>> +           data[0] = be32_to_cpup((__be32 *)(cf->data + 0));
>>> +   if (dlc > 4)
>>> +           data[1] = be32_to_cpup((__be32 *)(cf->data + 4));
>>> +
>>> +   can_put_echo_skb(skb, ndev, priv->ech_skb_next);
>>
>>       can_put_echo_skb(skb, ndev,
>>               priv->tx_head % priv->xcan_echo_skb_max_tx);
>>
>>       priv->tx_head++;
>>
> 
> Ok
>>> +
>>> +   /* Write the Frame to Xilinx CAN TX FIFO */
>>> +   priv->write_reg(priv, XCAN_TXFIFO_ID_OFFSET, id);
>>> +   priv->write_reg(priv, XCAN_TXFIFO_DLC_OFFSET, dlc);
>>> +   if (!rtr) {
>>> +           priv->write_reg(priv, XCAN_TXFIFO_DW1_OFFSET, data[0]);
>>> +           priv->write_reg(priv, XCAN_TXFIFO_DW2_OFFSET, data[1]);
>>> +           stats->tx_bytes += cf->can_dlc;
>>
>> Please add a comment which write triggers the tx. What in case of the rtr?
>> Which write triggers the tx then?
>>
> 
> Ok  Will Add
> 
>  In RTR Case  the below write triggers the trasmission
> priv->write_reg(priv, XCAN_TXFIFO_DLC_OFFSET, dlc);
> In Normal case  this write
> priv->write_reg(priv, XCAN_TXFIFO_DW2_OFFSET, data[1]);
> Triggers the transmission.
> 
> In Btw: Due to the limitations in the IP( Tx DLC register is a write only Register)
> I can't put this stats->tx_bytes += cf->can_dlc; in the tx interrupt routine.

No problem.

[...]

>>> +   if (work_done < quota) {
>>> +           napi_complete(napi);
>>> +           ier = priv->read_reg(priv, XCAN_IER_OFFSET);
>>> +           ier |= (XCAN_IXR_RXOK_MASK |
>> XCAN_IXR_RXNEMP_MASK);
>>> +           priv->write_reg(priv, XCAN_IER_OFFSET, ier);
>>
>> Is this a read-modify-write register? I mean will an interrupt get disabled, if
>> you write a 0-bit in the IER register? What does the ICR register?
> 
> ISR- Interrupt Status Register (Read only register)
> IER- Interrupt Enable Register(R/w register)
> ICR- Interrupt Clear Register.(write only register)
> 
> 
> The Interrupt Status Register (ISR) contains bits that are set when a particular interrupt condition occurs. If
> the corresponding mask bit in the Interrupt Enable Register is set, an interrupt is generated.
> Interrupt bits in the ISR can be cleared by writing to the Interrupt Clear Register

Thanks.

[...]

>>> +/**
>>> + * xcan_open - Driver open routine
>>> + * @ndev:  Pointer to net_device structure
>>> + *
>>> + * This is the driver open routine.
>>> + * Return: 0 on success and failure value on error  */ static int
>>> +xcan_open(struct net_device *ndev) {
>>> +   struct xcan_priv *priv = netdev_priv(ndev);
>>> +   int ret;
>>> +
>>> +   ret = request_irq(ndev->irq, xcan_interrupt, priv->irq_flags,
>>> +                   ndev->name, (void *)ndev);
>>> +   if (ret < 0) {
>>> +           netdev_err(ndev, "Irq allocation for CAN failed\n");
>>> +           return ret;
>>> +   }
>>> +
>>> +   /* Set chip into reset mode */
>>> +   ret = set_reset_mode(ndev);
>>> +   if (ret < 0)
>>> +           netdev_err(ndev, "mode resetting failed failed!\n");
>>
>> Is this critical?
> 
> This condition usually won't fail.
> But if the controller has a problems at the h/w level in that case putted this err print.
> If you want me to change it as a warning will do

If there is a hardware level problem, is it better to return here with
an error (and free the IRQ)?

[...]

>>> +/**
>>> + * xcan_probe - Platform registration call
>>> + * @pdev:  Handle to the platform device structure
>>> + *
>>> + * This function does all the memory allocation and registration for
>>> +the CAN
>>> + * device.
>>> + *
>>> + * Return: 0 on success and failure value on error  */ static int
>>> +xcan_probe(struct platform_device *pdev) {
>>> +   struct resource *res; /* IO mem resources */
>>> +   struct net_device *ndev;
>>> +   struct xcan_priv *priv;
>>> +   int ret, fifodep;
>>> +
>>> +   /* Create a CAN device instance */
>>> +   ndev = alloc_candev(sizeof(struct xcan_priv),
>> XCAN_ECHO_SKB_MAX);
>>> +   if (!ndev)
>>> +           return -ENOMEM;
>>> +
>>> +   priv = netdev_priv(ndev);
>>> +   priv->dev = ndev;
>>> +   priv->can.bittiming_const = &xcan_bittiming_const;
>>> +   priv->can.do_set_bittiming = xcan_set_bittiming;
>>> +   priv->can.do_set_mode = xcan_do_set_mode;
>>> +   priv->can.do_get_berr_counter = xcan_get_berr_counter;
>>> +   priv->can.ctrlmode_supported = CAN_CTRLMODE_LOOPBACK |
>>> +                                   CAN_CTRLMODE_BERR_REPORTING;
>>> +
>>> +   /* Get IRQ for the device */
>>> +   ndev->irq = platform_get_irq(pdev, 0);
>>> +
>>> +   spin_lock_init(&priv->ech_skb_lock);
>>> +   ndev->flags |= IFF_ECHO;        /* We support local echo */
>>> +
>>> +   platform_set_drvdata(pdev, ndev);
>>> +   SET_NETDEV_DEV(ndev, &pdev->dev);
>>> +   ndev->netdev_ops = &xcan_netdev_ops;
>>> +
>>> +   /* Get the virtual base address for the device */
>>> +   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>>> +   priv->reg_base = devm_ioremap_resource(&pdev->dev, res);
>>> +   if (IS_ERR(priv->reg_base)) {
>>> +           ret = PTR_ERR(priv->reg_base);
>>> +           goto err_free;
>>> +   }
>>> +   ndev->mem_start = res->start;
>>> +   ndev->mem_end = res->end;
>>> +
>>> +   priv->write_reg = xcan_write_reg;
>>> +   priv->read_reg = xcan_read_reg;
>>> +
>>> +   ret = of_property_read_u32(pdev->dev.of_node, "tx-fifo-depth",
>>> +                           &fifodep);
>>> +   if (ret < 0)
>>> +           goto err_free;
>>> +   priv->xcan_echo_skb_max_tx = fifodep;
>>> +
>>> +   ret = of_property_read_u32(pdev->dev.of_node, "rx-fifo-depth",
>>> +                           &fifodep);
>>> +   if (ret < 0)
>>> +           goto err_free;
>>> +   priv->xcan_echo_skb_max_rx = fifodep;
>>> +
>>> +   /* Getting the CAN devclk info */
>>> +   priv->devclk = devm_clk_get(&pdev->dev, "ref_clk");
>>> +   if (IS_ERR(priv->devclk)) {
>>> +           dev_err(&pdev->dev, "Device clock not found.\n");
>>> +           ret = PTR_ERR(priv->devclk);
>>> +           goto err_free;
>>> +   }
>>> +
>>> +   /* Check for type of CAN device */
>>> +   if (of_device_is_compatible(pdev->dev.of_node,
>>> +                               "xlnx,zynq-can-1.00.a")) {
>>> +           priv->aperclk = devm_clk_get(&pdev->dev, "aper_clk");
>>> +           if (IS_ERR(priv->aperclk)) {
>>> +                   dev_err(&pdev->dev, "aper clock not found\n");
>>> +                   ret = PTR_ERR(priv->aperclk);
>>> +                   goto err_free;
>>> +           }
>>> +   } else {
>>> +           priv->aperclk = priv->devclk;
>>> +   }
>>> +
>>> +   ret = clk_prepare_enable(priv->devclk);
>>> +   if (ret) {
>>> +           dev_err(&pdev->dev, "unable to enable device clock\n");
>>> +           goto err_free;
>>> +   }
>>> +
>>> +   ret = clk_prepare_enable(priv->aperclk);
>>> +   if (ret) {
>>> +           dev_err(&pdev->dev, "unable to enable aper clock\n");
>>> +           goto err_unprepar_disabledev;
>>> +   }
>>
>> Can you keep your clocks disaled if the interface is not up?
> 
> I didn't get it will you please explain?

This feature s optional, but a a good practice.

This function gets called when the driver is loaded, i.e. during boot.
So the complete CAN core will be enabled and powered, even if the
interface is down. To reduce power consumption it's better to enable the
clocks in the open() function and disable in close(). If you access some
CAN registers during probe you have to enable the clock and you probably
have to enable it in the do_get_berr_counter callback, as it may be
called if the interface is still down.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 242 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v5 09/10] Documentation: add Device tree bindings for Broadcom GENET
From: Mark Rutland @ 2014-02-14 10:19 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev@vger.kernel.org, davem@davemloft.net,
	devicetree@vger.kernel.org, cernekee@gmail.com,
	romieu@fr.zoreil.com
In-Reply-To: <1392336531-28875-10-git-send-email-f.fainelli@gmail.com>

On Fri, Feb 14, 2014 at 12:08:50AM +0000, Florian Fainelli wrote:
> This patch adds the Device Tree bindings for the Broadcom GENET Gigabit
> Ethernet controller. A bunch of examples are provided to illustrate the
> versatile aspect of the hardare.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
> Changes in v5:
> - respin due to the CONFIG_OF dependency fix
> 
> Changes in v4:
> - respin due to the Kconfig dependency
> 
> Changes in v3
> - improve compatible property description to specify all compatible strings
>   supported
> - move MDIO bus node to a separate section, to separate from the properties
> - detail which exact phy-mode strings are supported and that this refers to
>   the ePAPR 'phy-connection-type' modes
> - fixed MDIO bus node compatible string to match what is really provided by
>   the bootloader and use the same details as for the GENET compatible string
> - add address-cells and size-cells required properties for the GENET device
>   tree node
> - add a better description for the 'fixed-link' property with reference to
>   existing binding documentation
> - add a reference to the Ethernet PHY device tree binding
> - add a description for the interrupts properties
> - add a documentation for the optional clocks properties
> - removed unused and deprecated device_type properties
> 
> Changes in v2:
> - rebased against net-next/master
> 
>  .../devicetree/bindings/net/broadcom-bcmgenet.txt  | 121 +++++++++++++++++++++
>  1 file changed, 121 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt b/Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt
> new file mode 100644
> index 0000000..afd31f9
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt
> @@ -0,0 +1,121 @@
> +* Broadcom BCM7xxx Ethernet Controller (GENET)
> +
> +Required properties:
> +- compatible: should contain one of "brcm,genet-v1", "brcm,genet-v2",
> +  "brcm,genet-v3", "brcm,genet-v4".
> +- reg: address and length of the register set for the device
> +- interrupts: must be two cells, the first cell is the general purpose
> +  interrupt line, while the second cell is the interrupt for the ring
> +  RX and TX queues operating in ring mode
> +- phy-mode: String, operation mode of the PHY interface. Supported values are
> +  "mii", "rgmii", "rgmii-txid", "rev-mii", "moca". Analogous to ePAPR
> +  "phy-connection-type" values
> +- address-cells: should be 1
> +- size-cells: should be 1
> +
> +Optional properties:
> +- clocks: When provided, must be two cells, first one is the main GENET clock
> +  and the second cell is the Genet Wake-on-LAN clock.

Clocks aren't just cells. They are referred to with phandle +
clock-specificer pairs.

This doesn't document the names that the driver requests the clocks by.
Just listing the clocks isn't sufficient, as any dts will have to add a
clock-names property to get those clocks probed.

Thanks,
Mark.

^ permalink raw reply

* Re: [PATCH net] packet: check for ndo_select_queue during queue selection
From: Daniel Borkmann @ 2014-02-14 10:21 UTC (permalink / raw)
  To: David Miller; +Cc: mathias.kretschmer, netdev, brouer
In-Reply-To: <20140214.004048.616136663605863975.davem@davemloft.net>

On 02/14/2014 06:40 AM, David Miller wrote:
> From: Daniel Borkmann <dborkman@redhat.com>
> Date: Thu, 13 Feb 2014 18:18:55 +0100
>
>> The original pktgen scenario for which PACKET_QDISC_BYPASS was
>> designed for is still intact with this change anyway. We think
>
> I guess you have no intention of using this feature on bnx2x, ixgbe,
> or mlx4 devices then?  That covers a rather large component of gigabit
> ethernet devices out there, doesn't it?
>
> They both hook up an ndo_select_queue method.

Yes, it's suboptimal :/ The other two possibilities I see is 1) to check
for dev->ieee80211_ptr on setsockopt(2) and bind(2) (which could happen
_after_ the option was set) and just disallow that option for those cases
(doesn't make sense to use it for ieee80211 devs anyway, and they seem to
be the only special-cased users), or, preferably 2) to just leave the
code as it currently is, that is, similar to pktgen.

^ permalink raw reply

* [PATCH] ipv4: distinguish EHOSTUNREACH from the ENETUNREACH
From: Duan Jiong @ 2014-02-14 10:26 UTC (permalink / raw)
  To: David Miller; +Cc: netdev


since commit 251da413("ipv4: Cache ip_error() routes even when not forwarding."),
the counter IPSTATS_MIB_INADDRERRORS can't work correctly, because the value of
err was always set to ENETUNREACH.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
---
 net/ipv4/route.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 25071b4..6f6dd85 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1695,8 +1695,11 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	fl4.daddr = daddr;
 	fl4.saddr = saddr;
 	err = fib_lookup(net, &fl4, &res);
-	if (err != 0)
+	if (err != 0) {
+		if (!IN_DEV_FORWARD(in_dev))
+			err = -EHOSTUNREACH;
 		goto no_route;
+	}
 
 	RT_CACHE_STAT_INC(in_slow_tot);
 
@@ -1712,8 +1715,10 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		goto local_input;
 	}
 
-	if (!IN_DEV_FORWARD(in_dev))
+	if (!IN_DEV_FORWARD(in_dev)) {
+		err = -EHOSTUNREACH;
 		goto no_route;
+	}
 	if (res.type != RTN_UNICAST)
 		goto martian_destination;
 
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] ipv4: use daddr to get inet_peer
From: Hannes Frederic Sowa @ 2014-02-14 10:35 UTC (permalink / raw)
  To: Duan Jiong; +Cc: David Miller, netdev
In-Reply-To: <52FDE70B.2020901@cn.fujitsu.com>

On Fri, Feb 14, 2014 at 05:51:07PM +0800, Duan Jiong wrote:
> 于 2014年02月14日 17:41, Hannes Frederic Sowa 写道:
> > On Fri, Feb 14, 2014 at 05:25:35PM +0800, Duan Jiong wrote:
> >>
> >> since commit 1d861aa4("inet: Minimize use of cached route inetpeer"),
> >> ip_error() uses saddr to get inet_peer, so ip_error() and icmpv4_xrlim_allow()
> >> use the same inet_peer to limit icmp error message twice.
> >>
> >> In ip_error(), peer->rate_tokens is set to ip_rt_error_burst, but in
> >> inet_peer_xrlim_allow() peer->rate_tokens is set to XRLIM_BURST_FACTOR.
> >> XRLIM_BURST_FACTOR is defined to 6, so user seting ip_rt_error_burst makes
> >> no sense.
> >>
> >> In my opinion, the ip_rt_error_burst is used to limit icmp error messages
> >> for daddr instead of saddr.
> > 
> > Hmmm...
> > 
> > ip_error is a dst_input function, as such it gets called with the incoming
> > packet. saddr is the address we send the reply back (see
> > icmp_send->icmp_route_lookup).
> > 
> 
> But if we still use saddr to get inet_peer, seting ip_rt_error_burst will make
> no sense, because it will be overwrited by XRLIM_BURST_FACTOR.

Sorry, I cannot follow you.

On output we refetch the inetpeer with the destination address. I don't
see how the patch helps.

Greetings,

  Hannes

^ permalink raw reply

* [patch net] ovs: fix dp check in ovs_dp_reset_user_features
From: Jiri Pirko @ 2014-02-14 10:42 UTC (permalink / raw)
  To: netdev; +Cc: davem, jesse, dev, tgraf

This fixes crash when userspace does "ovs-dpctl add-dp dev" where dev is
existing non-dp netdevice.

Introduced by:
commit 44da5ae5fbea4686f667dc854e5ea16814e44c59
"openvswitch: Drop user features if old user space attempted to create datapath"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/openvswitch/datapath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index e9a48ba..e42340d 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1174,7 +1174,7 @@ static void ovs_dp_reset_user_features(struct sk_buff *skb, struct genl_info *in
 	struct datapath *dp;
 
 	dp = lookup_datapath(sock_net(skb->sk), info->userhdr, info->attrs);
-	if (!dp)
+	if (IS_ERR(dp))
 		return;
 
 	WARN(dp->user_features, "Dropping previously announced user features\n");
-- 
1.8.5.3

^ permalink raw reply related

* Re: suspicious code in net/rxrpc/ar-error.c
From: David Howells @ 2014-02-14 11:21 UTC (permalink / raw)
  To: Dave Jones; +Cc: dhowells, netdev
In-Reply-To: <20140214022815.GA19030@redhat.com>

Dave Jones <davej@redhat.com> wrote:

> What was intended here ? Perhaps one of those mtu comparisons should be
> comparing peer->mtu ?

Good spot, thanks.  Yeah, I should be loading mtu with peer->mtu immediately
after the comment.

David

^ permalink raw reply

* [PATCH V2 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues
From: Andrew J. Bennieston @ 2014-02-14 11:50 UTC (permalink / raw)
  To: xen-devel; +Cc: netdev, paul.durrant, wei.liu2, ian.campbell

This patch series implements multiple transmit and receive queues (i.e.
multiple shared rings) for the xen virtual network interfaces.

The series is split up as follows:
 - Patches 1 and 3 factor out the queue-specific data for netback and
    netfront respectively, and modify the rest of the code to use these
    as appropriate.
 - Patches 2 and 4 introduce new XenStore keys to negotiate and use
   multiple shared rings and event channels, and code to connect these
   as appropriate.
 - Patch 5 documents the XenStore keys required for the new feature
   in include/xen/interface/io/netif.h

All other transmit and receive processing remains unchanged, i.e. there
is a kthread per queue and a NAPI context per queue.

The performance of these patches has been analysed in detail, with
results available at:

http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing

To summarise:
  * Using multiple queues allows a VM to transmit at line rate on a 10
    Gbit/s NIC, compared with a maximum aggregate throughput of 6 Gbit/s
    with a single queue.
  * For intra-host VM--VM traffic, eight queues provide 171% of the
    throughput of a single queue; almost 12 Gbit/s instead of 6 Gbit/s.
  * There is a corresponding increase in total CPU usage, i.e. this is a
    scaling out over available resources, not an efficiency improvement.
  * Results depend on the availability of sufficient CPUs, as well as the
    distribution of interrupts and the distribution of TCP streams across
    the queues.

Queue selection is currently achieved via an L4 hash on the packet (i.e.
TCP src/dst port, IP src/dst address) and is not negotiated between the
frontend and backend, since only one option exists. Future patches to
support other frontends (particularly Windows) will need to add some
capability to negotiate not only the hash algorithm selection, but also
allow the frontend to specify some parameters to this.

Queue-specific XenStore entries for ring references and event channels
are stored hierarchically, i.e. under .../queue-N/... where N varies
from 0 to one less than the requested number of queues (inclusive). If
only one queue is requested, it falls back to the flat structure where
the ring references and event channels are written at the same level as
other vif information.

V2:
- Rebase onto net-next
- Change queue->number to queue->id
- Add atomic operations around the small number of stats variables that
  are not queue-specific or per-cpu
- Fixup formatting and style issues
- XenStore protocol changes documented in netif.h
- Default max. number of queues to num_online_cpus()
- Check requested number of queues does not exceed maximum

--
Andrew J. Bennieston

^ permalink raw reply

* [PATCH V2 net-next 1/5] xen-netback: Factor queue-specific data into queue struct.
From: Andrew J. Bennieston @ 2014-02-14 11:50 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew J. Bennieston, netdev, paul.durrant, wei.liu2,
	ian.campbell
In-Reply-To: <1392378624-6123-1-git-send-email-andrew.bennieston@citrix.com>

From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>

In preparation for multi-queue support in xen-netback, move the
queue-specific data from struct xenvif into struct xenvif_queue, and
update the rest of the code to use this.

Also adds loops over queues where appropriate, even though only one is
configured at this point, and uses alloc_netdev_mq() and the
corresponding multi-queue netif wake/start/stop functions in preparation
for multiple active queues.

Finally, implements a trivial queue selection function suitable for
ndo_select_queue, which simply returns 0 for a single queue and uses
skb_get_hash() to compute the queue index otherwise.

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
---
 drivers/net/xen-netback/common.h    |   81 ++++--
 drivers/net/xen-netback/interface.c |  314 ++++++++++++++-------
 drivers/net/xen-netback/netback.c   |  528 ++++++++++++++++++-----------------
 drivers/net/xen-netback/xenbus.c    |   87 ++++--
 4 files changed, 593 insertions(+), 417 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index ae413a2..2550867 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -108,17 +108,36 @@ struct xenvif_rx_meta {
  */
 #define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
 
-struct xenvif {
-	/* Unique identifier for this interface. */
-	domid_t          domid;
-	unsigned int     handle;
+/* Queue name is interface name with "-qNNN" appended */
+#define QUEUE_NAME_SIZE (IFNAMSIZ + 6)
+
+/* IRQ name is queue name with "-tx" or "-rx" appended */
+#define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 4)
+
+struct xenvif;
+
+struct xenvif_stats {
+	/* Stats fields to be updated per-queue.
+	 * A subset of struct net_device_stats that contains only the
+	 * fields that are updated in netback.c for each queue.
+	 */
+	unsigned int rx_bytes;
+	unsigned int rx_packets;
+	unsigned int tx_bytes;
+	unsigned int tx_packets;
+};
+
+struct xenvif_queue { /* Per-queue data for xenvif */
+	unsigned int id; /* Queue ID, 0-based */
+	char name[QUEUE_NAME_SIZE]; /* DEVNAME-qN */
+	struct xenvif *vif; /* Parent VIF */
 
 	/* Use NAPI for guest TX */
 	struct napi_struct napi;
 	/* When feature-split-event-channels = 0, tx_irq = rx_irq. */
 	unsigned int tx_irq;
 	/* Only used when feature-split-event-channels = 1 */
-	char tx_irq_name[IFNAMSIZ+4]; /* DEVNAME-tx */
+	char tx_irq_name[IRQ_NAME_SIZE]; /* DEVNAME-qN-tx */
 	struct xen_netif_tx_back_ring tx;
 	struct sk_buff_head tx_queue;
 	struct page *mmap_pages[MAX_PENDING_REQS];
@@ -140,19 +159,34 @@ struct xenvif {
 	/* When feature-split-event-channels = 0, tx_irq = rx_irq. */
 	unsigned int rx_irq;
 	/* Only used when feature-split-event-channels = 1 */
-	char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
+	char rx_irq_name[IRQ_NAME_SIZE]; /* DEVNAME-qN-rx */
 	struct xen_netif_rx_back_ring rx;
 	struct sk_buff_head rx_queue;
 	RING_IDX rx_last_skb_slots;
 
-	/* This array is allocated seperately as it is large */
-	struct gnttab_copy *grant_copy_op;
+	struct gnttab_copy grant_copy_op[MAX_GRANT_COPY_OPS];
 
 	/* We create one meta structure per ring request we consume, so
 	 * the maximum number is the same as the ring size.
 	 */
 	struct xenvif_rx_meta meta[XEN_NETIF_RX_RING_SIZE];
 
+	/* Transmit shaping: allow 'credit_bytes' every 'credit_usec'. */
+	unsigned long   credit_bytes;
+	unsigned long   credit_usec;
+	unsigned long   remaining_credit;
+	struct timer_list credit_timeout;
+	u64 credit_window_start;
+
+	/* Statistics */
+	struct xenvif_stats stats;
+};
+
+struct xenvif {
+	/* Unique identifier for this interface. */
+	domid_t          domid;
+	unsigned int     handle;
+
 	u8               fe_dev_addr[6];
 
 	/* Frontend feature information. */
@@ -166,15 +200,12 @@ struct xenvif {
 	/* Internal feature information. */
 	u8 can_queue:1;	    /* can queue packets for receiver? */
 
-	/* Transmit shaping: allow 'credit_bytes' every 'credit_usec'. */
-	unsigned long   credit_bytes;
-	unsigned long   credit_usec;
-	unsigned long   remaining_credit;
-	struct timer_list credit_timeout;
-	u64 credit_window_start;
+	/* Queues */
+	unsigned int num_queues;
+	struct xenvif_queue *queues;
 
 	/* Statistics */
-	unsigned long rx_gso_checksum_fixup;
+	atomic_t rx_gso_checksum_fixup;
 
 	/* Miscellaneous private stuff. */
 	struct net_device *dev;
@@ -189,7 +220,9 @@ struct xenvif *xenvif_alloc(struct device *parent,
 			    domid_t domid,
 			    unsigned int handle);
 
-int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
+void xenvif_init_queue(struct xenvif_queue *queue);
+
+int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref,
 		   unsigned long rx_ring_ref, unsigned int tx_evtchn,
 		   unsigned int rx_evtchn);
 void xenvif_disconnect(struct xenvif *vif);
@@ -200,31 +233,31 @@ void xenvif_xenbus_fini(void);
 
 int xenvif_schedulable(struct xenvif *vif);
 
-int xenvif_must_stop_queue(struct xenvif *vif);
+int xenvif_must_stop_queue(struct xenvif_queue *queue);
 
 /* (Un)Map communication rings. */
-void xenvif_unmap_frontend_rings(struct xenvif *vif);
-int xenvif_map_frontend_rings(struct xenvif *vif,
+void xenvif_unmap_frontend_rings(struct xenvif_queue *queue);
+int xenvif_map_frontend_rings(struct xenvif_queue *queue,
 			      grant_ref_t tx_ring_ref,
 			      grant_ref_t rx_ring_ref);
 
 /* Check for SKBs from frontend and schedule backend processing */
-void xenvif_check_rx_xenvif(struct xenvif *vif);
+void xenvif_check_rx_xenvif(struct xenvif_queue *queue);
 
 /* Prevent the device from generating any further traffic. */
 void xenvif_carrier_off(struct xenvif *vif);
 
-int xenvif_tx_action(struct xenvif *vif, int budget);
+int xenvif_tx_action(struct xenvif_queue *queue, int budget);
 
 int xenvif_kthread(void *data);
-void xenvif_kick_thread(struct xenvif *vif);
+void xenvif_kick_thread(struct xenvif_queue *queue);
 
 /* Determine whether the needed number of slots (req) are available,
  * and set req_event if not.
  */
-bool xenvif_rx_ring_slots_available(struct xenvif *vif, int needed);
+bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue, int needed);
 
-void xenvif_stop_queue(struct xenvif *vif);
+void xenvif_carrier_on(struct xenvif *vif);
 
 extern bool separate_tx_rx_irq;
 
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 7669d49..4cde112 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -34,7 +34,6 @@
 #include <linux/ethtool.h>
 #include <linux/rtnetlink.h>
 #include <linux/if_vlan.h>
-#include <linux/vmalloc.h>
 
 #include <xen/events.h>
 #include <asm/xen/hypercall.h>
@@ -42,6 +41,16 @@
 #define XENVIF_QUEUE_LENGTH 32
 #define XENVIF_NAPI_WEIGHT  64
 
+static inline void xenvif_stop_queue(struct xenvif_queue *queue)
+{
+	struct net_device *dev = queue->vif->dev;
+
+	if (!queue->vif->can_queue)
+		return;
+
+	netif_tx_stop_queue(netdev_get_tx_queue(dev, queue->id));
+}
+
 int xenvif_schedulable(struct xenvif *vif)
 {
 	return netif_running(vif->dev) && netif_carrier_ok(vif->dev);
@@ -49,20 +58,20 @@ int xenvif_schedulable(struct xenvif *vif)
 
 static irqreturn_t xenvif_tx_interrupt(int irq, void *dev_id)
 {
-	struct xenvif *vif = dev_id;
+	struct xenvif_queue *queue = dev_id;
 
-	if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx))
-		napi_schedule(&vif->napi);
+	if (RING_HAS_UNCONSUMED_REQUESTS(&queue->tx))
+		napi_schedule(&queue->napi);
 
 	return IRQ_HANDLED;
 }
 
-static int xenvif_poll(struct napi_struct *napi, int budget)
+int xenvif_poll(struct napi_struct *napi, int budget)
 {
-	struct xenvif *vif = container_of(napi, struct xenvif, napi);
+	struct xenvif_queue *queue = container_of(napi, struct xenvif_queue, napi);
 	int work_done;
 
-	work_done = xenvif_tx_action(vif, budget);
+	work_done = xenvif_tx_action(queue, budget);
 
 	if (work_done < budget) {
 		int more_to_do = 0;
@@ -86,7 +95,7 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
 
 		local_irq_save(flags);
 
-		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
+		RING_FINAL_CHECK_FOR_REQUESTS(&queue->tx, more_to_do);
 		if (!more_to_do)
 			__napi_complete(napi);
 
@@ -98,9 +107,9 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
 
 static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
 {
-	struct xenvif *vif = dev_id;
+	struct xenvif_queue *queue = dev_id;
 
-	xenvif_kick_thread(vif);
+	xenvif_kick_thread(queue);
 
 	return IRQ_HANDLED;
 }
@@ -113,15 +122,48 @@ static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static u16 xenvif_select_queue(struct net_device *dev, struct sk_buff *skb,
+							   void *accel_priv)
+{
+	struct xenvif *vif = netdev_priv(dev);
+	u32 hash;
+	u16 queue_index;
+
+	/* First, check if there is only one queue to optimise the
+	 * single-queue or old frontend scenario.
+	 */
+	if (vif->num_queues == 1) {
+		queue_index = 0;
+	} else {
+		/* Use skb_get_hash to obtain an L4 hash if available */
+		hash = skb_get_hash(skb);
+		queue_index = (u16) (((u64)hash * vif->num_queues) >> 32);
+	}
+
+	return queue_index;
+}
+
 static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct xenvif *vif = netdev_priv(dev);
+	struct xenvif_queue *queue = NULL;
+	u16 index;
 	int min_slots_needed;
 
 	BUG_ON(skb->dev != dev);
 
+	/* Drop the packet if queues are not set up */
+	if (vif->num_queues < 1)
+		goto drop;
+
+	/* Obtain the queue to be used to transmit this packet */
+	index = skb_get_queue_mapping(skb);
+	if (index >= vif->num_queues)
+		index = 0; /* Fall back to queue 0 if out of range */
+	queue = &vif->queues[index];
+
 	/* Drop the packet if vif is not ready */
-	if (vif->task == NULL || !xenvif_schedulable(vif))
+	if (queue->task == NULL || !xenvif_schedulable(vif))
 		goto drop;
 
 	/* At best we'll need one slot for the header and one for each
@@ -140,11 +182,11 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	 * then turn off the queue to give the ring a chance to
 	 * drain.
 	 */
-	if (!xenvif_rx_ring_slots_available(vif, min_slots_needed))
-		xenvif_stop_queue(vif);
+	if (!xenvif_rx_ring_slots_available(queue, min_slots_needed))
+		xenvif_stop_queue(queue);
 
-	skb_queue_tail(&vif->rx_queue, skb);
-	xenvif_kick_thread(vif);
+	skb_queue_tail(&queue->rx_queue, skb);
+	xenvif_kick_thread(queue);
 
 	return NETDEV_TX_OK;
 
@@ -157,25 +199,58 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
 {
 	struct xenvif *vif = netdev_priv(dev);
+	struct xenvif_queue *queue = NULL;
+	unsigned long rx_bytes = 0;
+	unsigned long rx_packets = 0;
+	unsigned long tx_bytes = 0;
+	unsigned long tx_packets = 0;
+	unsigned int index;
+
+	/* Aggregate tx and rx stats from each queue */
+	for (index = 0; index < vif->num_queues; ++index) {
+		queue = &vif->queues[index];
+		rx_bytes += queue->stats.rx_bytes;
+		rx_packets += queue->stats.rx_packets;
+		tx_bytes += queue->stats.tx_bytes;
+		tx_packets += queue->stats.tx_packets;
+	}
+
+	vif->dev->stats.rx_bytes = rx_bytes;
+	vif->dev->stats.rx_packets = rx_packets;
+	vif->dev->stats.tx_bytes = tx_bytes;
+	vif->dev->stats.tx_packets = tx_packets;
+
 	return &vif->dev->stats;
 }
 
 static void xenvif_up(struct xenvif *vif)
 {
-	napi_enable(&vif->napi);
-	enable_irq(vif->tx_irq);
-	if (vif->tx_irq != vif->rx_irq)
-		enable_irq(vif->rx_irq);
-	xenvif_check_rx_xenvif(vif);
+	struct xenvif_queue *queue = NULL;
+	unsigned int queue_index;
+
+	for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {
+		queue = &vif->queues[queue_index];
+		napi_enable(&queue->napi);
+		enable_irq(queue->tx_irq);
+		if (queue->tx_irq != queue->rx_irq)
+			enable_irq(queue->rx_irq);
+		xenvif_check_rx_xenvif(queue);
+	}
 }
 
 static void xenvif_down(struct xenvif *vif)
 {
-	napi_disable(&vif->napi);
-	disable_irq(vif->tx_irq);
-	if (vif->tx_irq != vif->rx_irq)
-		disable_irq(vif->rx_irq);
-	del_timer_sync(&vif->credit_timeout);
+	struct xenvif_queue *queue = NULL;
+	unsigned int queue_index;
+
+	for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {
+		queue = &vif->queues[queue_index];
+		napi_disable(&queue->napi);
+		disable_irq(queue->tx_irq);
+		if (queue->tx_irq != queue->rx_irq)
+			disable_irq(queue->rx_irq);
+		del_timer_sync(&queue->credit_timeout);
+	}
 }
 
 static int xenvif_open(struct net_device *dev)
@@ -183,7 +258,7 @@ static int xenvif_open(struct net_device *dev)
 	struct xenvif *vif = netdev_priv(dev);
 	if (netif_carrier_ok(dev))
 		xenvif_up(vif);
-	netif_start_queue(dev);
+	netif_tx_start_all_queues(dev);
 	return 0;
 }
 
@@ -192,7 +267,7 @@ static int xenvif_close(struct net_device *dev)
 	struct xenvif *vif = netdev_priv(dev);
 	if (netif_carrier_ok(dev))
 		xenvif_down(vif);
-	netif_stop_queue(dev);
+	netif_tx_stop_all_queues(dev);
 	return 0;
 }
 
@@ -253,7 +328,7 @@ static void xenvif_get_ethtool_stats(struct net_device *dev,
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(xenvif_stats); i++)
-		data[i] = *(unsigned long *)(vif + xenvif_stats[i].offset);
+		data[i] = atomic_read((atomic_t *)vif + xenvif_stats[i].offset);
 }
 
 static void xenvif_get_strings(struct net_device *dev, u32 stringset, u8 * data)
@@ -286,6 +361,7 @@ static const struct net_device_ops xenvif_netdev_ops = {
 	.ndo_fix_features = xenvif_fix_features,
 	.ndo_set_mac_address = eth_mac_addr,
 	.ndo_validate_addr   = eth_validate_addr,
+	.ndo_select_queue = xenvif_select_queue,
 };
 
 struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
@@ -295,10 +371,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	struct net_device *dev;
 	struct xenvif *vif;
 	char name[IFNAMSIZ] = {};
-	int i;
 
 	snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle);
-	dev = alloc_netdev(sizeof(struct xenvif), name, ether_setup);
+	dev = alloc_netdev_mq(sizeof(struct xenvif), name, ether_setup, 1);
 	if (dev == NULL) {
 		pr_warn("Could not allocate netdev for %s\n", name);
 		return ERR_PTR(-ENOMEM);
@@ -308,24 +383,15 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 
 	vif = netdev_priv(dev);
 
-	vif->grant_copy_op = vmalloc(sizeof(struct gnttab_copy) *
-				     MAX_GRANT_COPY_OPS);
-	if (vif->grant_copy_op == NULL) {
-		pr_warn("Could not allocate grant copy space for %s\n", name);
-		free_netdev(dev);
-		return ERR_PTR(-ENOMEM);
-	}
-
 	vif->domid  = domid;
 	vif->handle = handle;
 	vif->can_sg = 1;
 	vif->ip_csum = 1;
 	vif->dev = dev;
 
-	vif->credit_bytes = vif->remaining_credit = ~0UL;
-	vif->credit_usec  = 0UL;
-	init_timer(&vif->credit_timeout);
-	vif->credit_window_start = get_jiffies_64();
+	/* Start out with no queues */
+	vif->num_queues = 0;
+	vif->queues = NULL;
 
 	dev->netdev_ops	= &xenvif_netdev_ops;
 	dev->hw_features = NETIF_F_SG |
@@ -336,16 +402,6 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 
 	dev->tx_queue_len = XENVIF_QUEUE_LENGTH;
 
-	skb_queue_head_init(&vif->rx_queue);
-	skb_queue_head_init(&vif->tx_queue);
-
-	vif->pending_cons = 0;
-	vif->pending_prod = MAX_PENDING_REQS;
-	for (i = 0; i < MAX_PENDING_REQS; i++)
-		vif->pending_ring[i] = i;
-	for (i = 0; i < MAX_PENDING_REQS; i++)
-		vif->mmap_pages[i] = NULL;
-
 	/*
 	 * Initialise a dummy MAC address. We choose the numerically
 	 * largest non-broadcast address to prevent the address getting
@@ -355,8 +411,6 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	memset(dev->dev_addr, 0xFF, ETH_ALEN);
 	dev->dev_addr[0] &= ~0x01;
 
-	netif_napi_add(dev, &vif->napi, xenvif_poll, XENVIF_NAPI_WEIGHT);
-
 	netif_carrier_off(dev);
 
 	err = register_netdev(dev);
@@ -373,85 +427,111 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	return vif;
 }
 
-int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
+void xenvif_init_queue(struct xenvif_queue *queue)
+{
+	int i;
+
+	queue->credit_bytes = queue->remaining_credit = ~0UL;
+	queue->credit_usec  = 0UL;
+	init_timer(&queue->credit_timeout);
+	queue->credit_window_start = get_jiffies_64();
+
+	skb_queue_head_init(&queue->rx_queue);
+	skb_queue_head_init(&queue->tx_queue);
+
+	queue->pending_cons = 0;
+	queue->pending_prod = MAX_PENDING_REQS;
+	for (i = 0; i < MAX_PENDING_REQS; ++i) {
+		queue->pending_ring[i] = i;
+		queue->mmap_pages[i] = NULL;
+	}
+
+	netif_napi_add(queue->vif->dev, &queue->napi, xenvif_poll,
+			XENVIF_NAPI_WEIGHT);
+}
+
+void xenvif_carrier_on(struct xenvif *vif)
+{
+	rtnl_lock();
+	if (!vif->can_sg && vif->dev->mtu > ETH_DATA_LEN)
+		dev_set_mtu(vif->dev, ETH_DATA_LEN);
+	netdev_update_features(vif->dev);
+	netif_carrier_on(vif->dev);
+	if (netif_running(vif->dev))
+		xenvif_up(vif);
+	rtnl_unlock();
+}
+
+int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref,
 		   unsigned long rx_ring_ref, unsigned int tx_evtchn,
 		   unsigned int rx_evtchn)
 {
 	struct task_struct *task;
 	int err = -ENOMEM;
 
-	BUG_ON(vif->tx_irq);
-	BUG_ON(vif->task);
+	BUG_ON(queue->tx_irq);
+	BUG_ON(queue->task);
 
-	err = xenvif_map_frontend_rings(vif, tx_ring_ref, rx_ring_ref);
+	err = xenvif_map_frontend_rings(queue, tx_ring_ref, rx_ring_ref);
 	if (err < 0)
 		goto err;
 
-	init_waitqueue_head(&vif->wq);
+	init_waitqueue_head(&queue->wq);
 
 	if (tx_evtchn == rx_evtchn) {
 		/* feature-split-event-channels == 0 */
 		err = bind_interdomain_evtchn_to_irqhandler(
-			vif->domid, tx_evtchn, xenvif_interrupt, 0,
-			vif->dev->name, vif);
+			queue->vif->domid, tx_evtchn, xenvif_interrupt, 0,
+			queue->name, queue);
 		if (err < 0)
 			goto err_unmap;
-		vif->tx_irq = vif->rx_irq = err;
-		disable_irq(vif->tx_irq);
+		queue->tx_irq = queue->rx_irq = err;
+		disable_irq(queue->tx_irq);
 	} else {
 		/* feature-split-event-channels == 1 */
-		snprintf(vif->tx_irq_name, sizeof(vif->tx_irq_name),
-			 "%s-tx", vif->dev->name);
+		snprintf(queue->tx_irq_name, sizeof(queue->tx_irq_name),
+			 "%s-tx", queue->name);
 		err = bind_interdomain_evtchn_to_irqhandler(
-			vif->domid, tx_evtchn, xenvif_tx_interrupt, 0,
-			vif->tx_irq_name, vif);
+			queue->vif->domid, tx_evtchn, xenvif_tx_interrupt, 0,
+			queue->tx_irq_name, queue);
 		if (err < 0)
 			goto err_unmap;
-		vif->tx_irq = err;
-		disable_irq(vif->tx_irq);
+		queue->tx_irq = err;
+		disable_irq(queue->tx_irq);
 
-		snprintf(vif->rx_irq_name, sizeof(vif->rx_irq_name),
-			 "%s-rx", vif->dev->name);
+		snprintf(queue->rx_irq_name, sizeof(queue->rx_irq_name),
+			 "%s-rx", queue->name);
 		err = bind_interdomain_evtchn_to_irqhandler(
-			vif->domid, rx_evtchn, xenvif_rx_interrupt, 0,
-			vif->rx_irq_name, vif);
+			queue->vif->domid, rx_evtchn, xenvif_rx_interrupt, 0,
+			queue->rx_irq_name, queue);
 		if (err < 0)
 			goto err_tx_unbind;
-		vif->rx_irq = err;
-		disable_irq(vif->rx_irq);
+		queue->rx_irq = err;
+		disable_irq(queue->rx_irq);
 	}
 
 	task = kthread_create(xenvif_kthread,
-			      (void *)vif, "%s", vif->dev->name);
+			      (void *)queue, "%s", queue->name);
 	if (IS_ERR(task)) {
-		pr_warn("Could not allocate kthread for %s\n", vif->dev->name);
+		pr_warn("Could not allocate kthread for %s\n", queue->name);
 		err = PTR_ERR(task);
 		goto err_rx_unbind;
 	}
 
-	vif->task = task;
-
-	rtnl_lock();
-	if (!vif->can_sg && vif->dev->mtu > ETH_DATA_LEN)
-		dev_set_mtu(vif->dev, ETH_DATA_LEN);
-	netdev_update_features(vif->dev);
-	netif_carrier_on(vif->dev);
-	if (netif_running(vif->dev))
-		xenvif_up(vif);
-	rtnl_unlock();
+	queue->task = task;
 
-	wake_up_process(vif->task);
+	wake_up_process(queue->task);
 
 	return 0;
 
 err_rx_unbind:
-	unbind_from_irqhandler(vif->rx_irq, vif);
-	vif->rx_irq = 0;
+	unbind_from_irqhandler(queue->rx_irq, queue);
+	queue->rx_irq = 0;
 err_tx_unbind:
-	unbind_from_irqhandler(vif->tx_irq, vif);
-	vif->tx_irq = 0;
+	unbind_from_irqhandler(queue->tx_irq, queue);
+	queue->tx_irq = 0;
 err_unmap:
-	xenvif_unmap_frontend_rings(vif);
+	xenvif_unmap_frontend_rings(queue);
 err:
 	module_put(THIS_MODULE);
 	return err;
@@ -470,34 +550,52 @@ void xenvif_carrier_off(struct xenvif *vif)
 
 void xenvif_disconnect(struct xenvif *vif)
 {
+	struct xenvif_queue *queue = NULL;
+	unsigned int queue_index;
+
 	if (netif_carrier_ok(vif->dev))
 		xenvif_carrier_off(vif);
 
-	if (vif->task) {
-		kthread_stop(vif->task);
-		vif->task = NULL;
-	}
+	for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {
+		queue = &vif->queues[queue_index];
 
-	if (vif->tx_irq) {
-		if (vif->tx_irq == vif->rx_irq)
-			unbind_from_irqhandler(vif->tx_irq, vif);
-		else {
-			unbind_from_irqhandler(vif->tx_irq, vif);
-			unbind_from_irqhandler(vif->rx_irq, vif);
+		if (queue->task) {
+			kthread_stop(queue->task);
+			queue->task = NULL;
 		}
-		vif->tx_irq = 0;
+
+		if (queue->tx_irq) {
+			if (queue->tx_irq == queue->rx_irq)
+				unbind_from_irqhandler(queue->tx_irq, queue);
+			else {
+				unbind_from_irqhandler(queue->tx_irq, queue);
+				unbind_from_irqhandler(queue->rx_irq, queue);
+			}
+			queue->tx_irq = 0;
+		}
+
+		xenvif_unmap_frontend_rings(queue);
 	}
 
-	xenvif_unmap_frontend_rings(vif);
+
 }
 
 void xenvif_free(struct xenvif *vif)
 {
-	netif_napi_del(&vif->napi);
+	struct xenvif_queue *queue = NULL;
+	unsigned int queue_index;
 
-	unregister_netdev(vif->dev);
+	for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {
+		queue = &vif->queues[queue_index];
+		netif_napi_del(&queue->napi);
+	}
 
-	vfree(vif->grant_copy_op);
+	/* Free the array of queues */
+	vfree(vif->queues);
+	vif->num_queues = 0;
+	vif->queues = 0;
+
+	unregister_netdev(vif->dev);
 	free_netdev(vif->dev);
 
 	module_put(THIS_MODULE);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index e5284bc..46b2f5b 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -75,38 +75,38 @@ module_param(fatal_skb_slots, uint, 0444);
  * one or more merged tx requests, otherwise it is the continuation of
  * previous tx request.
  */
-static inline int pending_tx_is_head(struct xenvif *vif, RING_IDX idx)
+static inline int pending_tx_is_head(struct xenvif_queue *queue, RING_IDX idx)
 {
-	return vif->pending_tx_info[idx].head != INVALID_PENDING_RING_IDX;
+	return queue->pending_tx_info[idx].head != INVALID_PENDING_RING_IDX;
 }
 
-static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
+static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
 			       u8 status);
 
-static void make_tx_response(struct xenvif *vif,
+static void make_tx_response(struct xenvif_queue *queue,
 			     struct xen_netif_tx_request *txp,
 			     s8       st);
 
-static inline int tx_work_todo(struct xenvif *vif);
-static inline int rx_work_todo(struct xenvif *vif);
+static inline int tx_work_todo(struct xenvif_queue *queue);
+static inline int rx_work_todo(struct xenvif_queue *queue);
 
-static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
+static struct xen_netif_rx_response *make_rx_response(struct xenvif_queue *queue,
 					     u16      id,
 					     s8       st,
 					     u16      offset,
 					     u16      size,
 					     u16      flags);
 
-static inline unsigned long idx_to_pfn(struct xenvif *vif,
+static inline unsigned long idx_to_pfn(struct xenvif_queue *queue,
 				       u16 idx)
 {
-	return page_to_pfn(vif->mmap_pages[idx]);
+	return page_to_pfn(queue->mmap_pages[idx]);
 }
 
-static inline unsigned long idx_to_kaddr(struct xenvif *vif,
+static inline unsigned long idx_to_kaddr(struct xenvif_queue *queue,
 					 u16 idx)
 {
-	return (unsigned long)pfn_to_kaddr(idx_to_pfn(vif, idx));
+	return (unsigned long)pfn_to_kaddr(idx_to_pfn(queue, idx));
 }
 
 /* This is a miniumum size for the linear area to avoid lots of
@@ -131,30 +131,30 @@ static inline pending_ring_idx_t pending_index(unsigned i)
 	return i & (MAX_PENDING_REQS-1);
 }
 
-static inline pending_ring_idx_t nr_pending_reqs(struct xenvif *vif)
+static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue)
 {
 	return MAX_PENDING_REQS -
-		vif->pending_prod + vif->pending_cons;
+		queue->pending_prod + queue->pending_cons;
 }
 
-bool xenvif_rx_ring_slots_available(struct xenvif *vif, int needed)
+bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue, int needed)
 {
 	RING_IDX prod, cons;
 
 	do {
-		prod = vif->rx.sring->req_prod;
-		cons = vif->rx.req_cons;
+		prod = queue->rx.sring->req_prod;
+		cons = queue->rx.req_cons;
 
 		if (prod - cons >= needed)
 			return true;
 
-		vif->rx.sring->req_event = prod + 1;
+		queue->rx.sring->req_event = prod + 1;
 
 		/* Make sure event is visible before we check prod
 		 * again.
 		 */
 		mb();
-	} while (vif->rx.sring->req_prod != prod);
+	} while (queue->rx.sring->req_prod != prod);
 
 	return false;
 }
@@ -208,13 +208,13 @@ struct netrx_pending_operations {
 	grant_ref_t copy_gref;
 };
 
-static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
+static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue,
 						 struct netrx_pending_operations *npo)
 {
 	struct xenvif_rx_meta *meta;
 	struct xen_netif_rx_request *req;
 
-	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
+	req = RING_GET_REQUEST(&queue->rx, queue->rx.req_cons++);
 
 	meta = npo->meta + npo->meta_prod++;
 	meta->gso_type = XEN_NETIF_GSO_TYPE_NONE;
@@ -232,7 +232,7 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
  * Set up the grant operations for this fragment. If it's a flipping
  * interface, we also set up the unmap request from here.
  */
-static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
+static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb,
 				 struct netrx_pending_operations *npo,
 				 struct page *page, unsigned long size,
 				 unsigned long offset, int *head)
@@ -267,7 +267,7 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 			 */
 			BUG_ON(*head);
 
-			meta = get_next_rx_buffer(vif, npo);
+			meta = get_next_rx_buffer(queue, npo);
 		}
 
 		if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
@@ -281,7 +281,7 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		copy_gop->source.u.gmfn = virt_to_mfn(page_address(page));
 		copy_gop->source.offset = offset;
 
-		copy_gop->dest.domid = vif->domid;
+		copy_gop->dest.domid = queue->vif->domid;
 		copy_gop->dest.offset = npo->copy_off;
 		copy_gop->dest.u.ref = npo->copy_gref;
 
@@ -306,8 +306,8 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		else
 			gso_type = XEN_NETIF_GSO_TYPE_NONE;
 
-		if (*head && ((1 << gso_type) & vif->gso_mask))
-			vif->rx.req_cons++;
+		if (*head && ((1 << gso_type) & queue->vif->gso_mask))
+			queue->rx.req_cons++;
 
 		*head = 0; /* There must be something in this buffer now. */
 
@@ -327,7 +327,8 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
  * frontend-side LRO).
  */
 static int xenvif_gop_skb(struct sk_buff *skb,
-			  struct netrx_pending_operations *npo)
+			  struct netrx_pending_operations *npo,
+			  struct xenvif_queue *queue)
 {
 	struct xenvif *vif = netdev_priv(skb->dev);
 	int nr_frags = skb_shinfo(skb)->nr_frags;
@@ -355,7 +356,7 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 
 	/* Set up a GSO prefix descriptor, if necessary */
 	if ((1 << gso_type) & vif->gso_prefix_mask) {
-		req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
+		req = RING_GET_REQUEST(&queue->rx, queue->rx.req_cons++);
 		meta = npo->meta + npo->meta_prod++;
 		meta->gso_type = gso_type;
 		meta->gso_size = gso_size;
@@ -363,7 +364,7 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 		meta->id = req->id;
 	}
 
-	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
+	req = RING_GET_REQUEST(&queue->rx, queue->rx.req_cons++);
 	meta = npo->meta + npo->meta_prod++;
 
 	if ((1 << gso_type) & vif->gso_mask) {
@@ -387,13 +388,13 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 		if (data + len > skb_tail_pointer(skb))
 			len = skb_tail_pointer(skb) - data;
 
-		xenvif_gop_frag_copy(vif, skb, npo,
+		xenvif_gop_frag_copy(queue, skb, npo,
 				     virt_to_page(data), len, offset, &head);
 		data += len;
 	}
 
 	for (i = 0; i < nr_frags; i++) {
-		xenvif_gop_frag_copy(vif, skb, npo,
+		xenvif_gop_frag_copy(queue, skb, npo,
 				     skb_frag_page(&skb_shinfo(skb)->frags[i]),
 				     skb_frag_size(&skb_shinfo(skb)->frags[i]),
 				     skb_shinfo(skb)->frags[i].page_offset,
@@ -429,7 +430,7 @@ static int xenvif_check_gop(struct xenvif *vif, int nr_meta_slots,
 	return status;
 }
 
-static void xenvif_add_frag_responses(struct xenvif *vif, int status,
+static void xenvif_add_frag_responses(struct xenvif_queue *queue, int status,
 				      struct xenvif_rx_meta *meta,
 				      int nr_meta_slots)
 {
@@ -450,7 +451,7 @@ static void xenvif_add_frag_responses(struct xenvif *vif, int status,
 			flags = XEN_NETRXF_more_data;
 
 		offset = 0;
-		make_rx_response(vif, meta[i].id, status, offset,
+		make_rx_response(queue, meta[i].id, status, offset,
 				 meta[i].size, flags);
 	}
 }
@@ -459,12 +460,12 @@ struct skb_cb_overlay {
 	int meta_slots_used;
 };
 
-void xenvif_kick_thread(struct xenvif *vif)
+void xenvif_kick_thread(struct xenvif_queue *queue)
 {
-	wake_up(&vif->wq);
+	wake_up(&queue->wq);
 }
 
-static void xenvif_rx_action(struct xenvif *vif)
+static void xenvif_rx_action(struct xenvif_queue *queue)
 {
 	s8 status;
 	u16 flags;
@@ -478,13 +479,13 @@ static void xenvif_rx_action(struct xenvif *vif)
 	bool need_to_notify = false;
 
 	struct netrx_pending_operations npo = {
-		.copy  = vif->grant_copy_op,
-		.meta  = vif->meta,
+		.copy  = queue->grant_copy_op,
+		.meta  = queue->meta,
 	};
 
 	skb_queue_head_init(&rxq);
 
-	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
+	while ((skb = skb_dequeue(&queue->rx_queue)) != NULL) {
 		RING_IDX max_slots_needed;
 		int i;
 
@@ -505,41 +506,41 @@ static void xenvif_rx_action(struct xenvif *vif)
 			max_slots_needed++;
 
 		/* If the skb may not fit then bail out now */
-		if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
-			skb_queue_head(&vif->rx_queue, skb);
+		if (!xenvif_rx_ring_slots_available(queue, max_slots_needed)) {
+			skb_queue_head(&queue->rx_queue, skb);
 			need_to_notify = true;
-			vif->rx_last_skb_slots = max_slots_needed;
+			queue->rx_last_skb_slots = max_slots_needed;
 			break;
 		} else
-			vif->rx_last_skb_slots = 0;
+			queue->rx_last_skb_slots = 0;
 
 		sco = (struct skb_cb_overlay *)skb->cb;
-		sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
+		sco->meta_slots_used = xenvif_gop_skb(skb, &npo, queue);
 		BUG_ON(sco->meta_slots_used > max_slots_needed);
 
 		__skb_queue_tail(&rxq, skb);
 	}
 
-	BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta));
+	BUG_ON(npo.meta_prod > ARRAY_SIZE(queue->meta));
 
 	if (!npo.copy_prod)
 		goto done;
 
 	BUG_ON(npo.copy_prod > MAX_GRANT_COPY_OPS);
-	gnttab_batch_copy(vif->grant_copy_op, npo.copy_prod);
+	gnttab_batch_copy(queue->grant_copy_op, npo.copy_prod);
 
 	while ((skb = __skb_dequeue(&rxq)) != NULL) {
 		sco = (struct skb_cb_overlay *)skb->cb;
 
-		if ((1 << vif->meta[npo.meta_cons].gso_type) &
-		    vif->gso_prefix_mask) {
-			resp = RING_GET_RESPONSE(&vif->rx,
-						 vif->rx.rsp_prod_pvt++);
+		if ((1 << queue->meta[npo.meta_cons].gso_type) &
+		    queue->vif->gso_prefix_mask) {
+			resp = RING_GET_RESPONSE(&queue->rx,
+						 queue->rx.rsp_prod_pvt++);
 
 			resp->flags = XEN_NETRXF_gso_prefix | XEN_NETRXF_more_data;
 
-			resp->offset = vif->meta[npo.meta_cons].gso_size;
-			resp->id = vif->meta[npo.meta_cons].id;
+			resp->offset = queue->meta[npo.meta_cons].gso_size;
+			resp->id = queue->meta[npo.meta_cons].id;
 			resp->status = sco->meta_slots_used;
 
 			npo.meta_cons++;
@@ -547,10 +548,10 @@ static void xenvif_rx_action(struct xenvif *vif)
 		}
 
 
-		vif->dev->stats.tx_bytes += skb->len;
-		vif->dev->stats.tx_packets++;
+		queue->stats.tx_bytes += skb->len;
+		queue->stats.tx_packets++;
 
-		status = xenvif_check_gop(vif, sco->meta_slots_used, &npo);
+		status = xenvif_check_gop(queue->vif, sco->meta_slots_used, &npo);
 
 		if (sco->meta_slots_used == 1)
 			flags = 0;
@@ -564,22 +565,22 @@ static void xenvif_rx_action(struct xenvif *vif)
 			flags |= XEN_NETRXF_data_validated;
 
 		offset = 0;
-		resp = make_rx_response(vif, vif->meta[npo.meta_cons].id,
+		resp = make_rx_response(queue, queue->meta[npo.meta_cons].id,
 					status, offset,
-					vif->meta[npo.meta_cons].size,
+					queue->meta[npo.meta_cons].size,
 					flags);
 
-		if ((1 << vif->meta[npo.meta_cons].gso_type) &
-		    vif->gso_mask) {
+		if ((1 << queue->meta[npo.meta_cons].gso_type) &
+		    queue->vif->gso_mask) {
 			struct xen_netif_extra_info *gso =
 				(struct xen_netif_extra_info *)
-				RING_GET_RESPONSE(&vif->rx,
-						  vif->rx.rsp_prod_pvt++);
+				RING_GET_RESPONSE(&queue->rx,
+						  queue->rx.rsp_prod_pvt++);
 
 			resp->flags |= XEN_NETRXF_extra_info;
 
-			gso->u.gso.type = vif->meta[npo.meta_cons].gso_type;
-			gso->u.gso.size = vif->meta[npo.meta_cons].gso_size;
+			gso->u.gso.type = queue->meta[npo.meta_cons].gso_type;
+			gso->u.gso.size = queue->meta[npo.meta_cons].gso_size;
 			gso->u.gso.pad = 0;
 			gso->u.gso.features = 0;
 
@@ -587,11 +588,11 @@ static void xenvif_rx_action(struct xenvif *vif)
 			gso->flags = 0;
 		}
 
-		xenvif_add_frag_responses(vif, status,
-					  vif->meta + npo.meta_cons + 1,
+		xenvif_add_frag_responses(queue, status,
+					  queue->meta + npo.meta_cons + 1,
 					  sco->meta_slots_used);
 
-		RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->rx, ret);
+		RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->rx, ret);
 
 		need_to_notify |= !!ret;
 
@@ -601,20 +602,20 @@ static void xenvif_rx_action(struct xenvif *vif)
 
 done:
 	if (need_to_notify)
-		notify_remote_via_irq(vif->rx_irq);
+		notify_remote_via_irq(queue->rx_irq);
 }
 
-void xenvif_check_rx_xenvif(struct xenvif *vif)
+void xenvif_check_rx_xenvif(struct xenvif_queue *queue)
 {
 	int more_to_do;
 
-	RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
+	RING_FINAL_CHECK_FOR_REQUESTS(&queue->tx, more_to_do);
 
 	if (more_to_do)
-		napi_schedule(&vif->napi);
+		napi_schedule(&queue->napi);
 }
 
-static void tx_add_credit(struct xenvif *vif)
+static void tx_add_credit(struct xenvif_queue *queue)
 {
 	unsigned long max_burst, max_credit;
 
@@ -622,37 +623,37 @@ static void tx_add_credit(struct xenvif *vif)
 	 * Allow a burst big enough to transmit a jumbo packet of up to 128kB.
 	 * Otherwise the interface can seize up due to insufficient credit.
 	 */
-	max_burst = RING_GET_REQUEST(&vif->tx, vif->tx.req_cons)->size;
+	max_burst = RING_GET_REQUEST(&queue->tx, queue->tx.req_cons)->size;
 	max_burst = min(max_burst, 131072UL);
-	max_burst = max(max_burst, vif->credit_bytes);
+	max_burst = max(max_burst, queue->credit_bytes);
 
 	/* Take care that adding a new chunk of credit doesn't wrap to zero. */
-	max_credit = vif->remaining_credit + vif->credit_bytes;
-	if (max_credit < vif->remaining_credit)
+	max_credit = queue->remaining_credit + queue->credit_bytes;
+	if (max_credit < queue->remaining_credit)
 		max_credit = ULONG_MAX; /* wrapped: clamp to ULONG_MAX */
 
-	vif->remaining_credit = min(max_credit, max_burst);
+	queue->remaining_credit = min(max_credit, max_burst);
 }
 
 static void tx_credit_callback(unsigned long data)
 {
-	struct xenvif *vif = (struct xenvif *)data;
-	tx_add_credit(vif);
-	xenvif_check_rx_xenvif(vif);
+	struct xenvif_queue *queue = (struct xenvif_queue *)data;
+	tx_add_credit(queue);
+	xenvif_check_rx_xenvif(queue);
 }
 
-static void xenvif_tx_err(struct xenvif *vif,
+static void xenvif_tx_err(struct xenvif_queue *queue,
 			  struct xen_netif_tx_request *txp, RING_IDX end)
 {
-	RING_IDX cons = vif->tx.req_cons;
+	RING_IDX cons = queue->tx.req_cons;
 
 	do {
-		make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR);
+		make_tx_response(queue, txp, XEN_NETIF_RSP_ERROR);
 		if (cons == end)
 			break;
-		txp = RING_GET_REQUEST(&vif->tx, cons++);
+		txp = RING_GET_REQUEST(&queue->tx, cons++);
 	} while (1);
-	vif->tx.req_cons = cons;
+	queue->tx.req_cons = cons;
 }
 
 static void xenvif_fatal_tx_err(struct xenvif *vif)
@@ -661,12 +662,12 @@ static void xenvif_fatal_tx_err(struct xenvif *vif)
 	xenvif_carrier_off(vif);
 }
 
-static int xenvif_count_requests(struct xenvif *vif,
+static int xenvif_count_requests(struct xenvif_queue *queue,
 				 struct xen_netif_tx_request *first,
 				 struct xen_netif_tx_request *txp,
 				 int work_to_do)
 {
-	RING_IDX cons = vif->tx.req_cons;
+	RING_IDX cons = queue->tx.req_cons;
 	int slots = 0;
 	int drop_err = 0;
 	int more_data;
@@ -678,10 +679,10 @@ static int xenvif_count_requests(struct xenvif *vif,
 		struct xen_netif_tx_request dropped_tx = { 0 };
 
 		if (slots >= work_to_do) {
-			netdev_err(vif->dev,
+			netdev_err(queue->vif->dev,
 				   "Asked for %d slots but exceeds this limit\n",
 				   work_to_do);
-			xenvif_fatal_tx_err(vif);
+			xenvif_fatal_tx_err(queue->vif);
 			return -ENODATA;
 		}
 
@@ -689,10 +690,10 @@ static int xenvif_count_requests(struct xenvif *vif,
 		 * considered malicious.
 		 */
 		if (unlikely(slots >= fatal_skb_slots)) {
-			netdev_err(vif->dev,
+			netdev_err(queue->vif->dev,
 				   "Malicious frontend using %d slots, threshold %u\n",
 				   slots, fatal_skb_slots);
-			xenvif_fatal_tx_err(vif);
+			xenvif_fatal_tx_err(queue->vif);
 			return -E2BIG;
 		}
 
@@ -705,7 +706,7 @@ static int xenvif_count_requests(struct xenvif *vif,
 		 */
 		if (!drop_err && slots >= XEN_NETBK_LEGACY_SLOTS_MAX) {
 			if (net_ratelimit())
-				netdev_dbg(vif->dev,
+				netdev_dbg(queue->vif->dev,
 					   "Too many slots (%d) exceeding limit (%d), dropping packet\n",
 					   slots, XEN_NETBK_LEGACY_SLOTS_MAX);
 			drop_err = -E2BIG;
@@ -714,7 +715,7 @@ static int xenvif_count_requests(struct xenvif *vif,
 		if (drop_err)
 			txp = &dropped_tx;
 
-		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + slots),
+		memcpy(txp, RING_GET_REQUEST(&queue->tx, cons + slots),
 		       sizeof(*txp));
 
 		/* If the guest submitted a frame >= 64 KiB then
@@ -728,7 +729,7 @@ static int xenvif_count_requests(struct xenvif *vif,
 		 */
 		if (!drop_err && txp->size > first->size) {
 			if (net_ratelimit())
-				netdev_dbg(vif->dev,
+				netdev_dbg(queue->vif->dev,
 					   "Invalid tx request, slot size %u > remaining size %u\n",
 					   txp->size, first->size);
 			drop_err = -EIO;
@@ -738,9 +739,9 @@ static int xenvif_count_requests(struct xenvif *vif,
 		slots++;
 
 		if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
-			netdev_err(vif->dev, "Cross page boundary, txp->offset: %x, size: %u\n",
+			netdev_err(queue->vif->dev, "Cross page boundary, txp->offset: %x, size: %u\n",
 				 txp->offset, txp->size);
-			xenvif_fatal_tx_err(vif);
+			xenvif_fatal_tx_err(queue->vif);
 			return -EINVAL;
 		}
 
@@ -752,14 +753,14 @@ static int xenvif_count_requests(struct xenvif *vif,
 	} while (more_data);
 
 	if (drop_err) {
-		xenvif_tx_err(vif, first, cons + slots);
+		xenvif_tx_err(queue, first, cons + slots);
 		return drop_err;
 	}
 
 	return slots;
 }
 
-static struct page *xenvif_alloc_page(struct xenvif *vif,
+static struct page *xenvif_alloc_page(struct xenvif_queue *queue,
 				      u16 pending_idx)
 {
 	struct page *page;
@@ -767,12 +768,12 @@ static struct page *xenvif_alloc_page(struct xenvif *vif,
 	page = alloc_page(GFP_ATOMIC|__GFP_COLD);
 	if (!page)
 		return NULL;
-	vif->mmap_pages[pending_idx] = page;
+	queue->mmap_pages[pending_idx] = page;
 
 	return page;
 }
 
-static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
+static struct gnttab_copy *xenvif_get_requests(struct xenvif_queue *queue,
 					       struct sk_buff *skb,
 					       struct xen_netif_tx_request *txp,
 					       struct gnttab_copy *gop)
@@ -803,7 +804,7 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 	for (shinfo->nr_frags = slot = start; slot < nr_slots;
 	     shinfo->nr_frags++) {
 		struct pending_tx_info *pending_tx_info =
-			vif->pending_tx_info;
+			queue->pending_tx_info;
 
 		page = alloc_page(GFP_ATOMIC|__GFP_COLD);
 		if (!page)
@@ -815,7 +816,7 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 			gop->flags = GNTCOPY_source_gref;
 
 			gop->source.u.ref = txp->gref;
-			gop->source.domid = vif->domid;
+			gop->source.domid = queue->vif->domid;
 			gop->source.offset = txp->offset;
 
 			gop->dest.domid = DOMID_SELF;
@@ -840,9 +841,9 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 				gop->len = txp->size;
 				dst_offset += gop->len;
 
-				index = pending_index(vif->pending_cons++);
+				index = pending_index(queue->pending_cons++);
 
-				pending_idx = vif->pending_ring[index];
+				pending_idx = queue->pending_ring[index];
 
 				memcpy(&pending_tx_info[pending_idx].req, txp,
 				       sizeof(*txp));
@@ -851,7 +852,7 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 				 * fields for head tx req will be set
 				 * to correct values after the loop.
 				 */
-				vif->mmap_pages[pending_idx] = (void *)(~0UL);
+				queue->mmap_pages[pending_idx] = (void *)(~0UL);
 				pending_tx_info[pending_idx].head =
 					INVALID_PENDING_RING_IDX;
 
@@ -871,7 +872,7 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 		first->req.offset = 0;
 		first->req.size = dst_offset;
 		first->head = start_idx;
-		vif->mmap_pages[head_idx] = page;
+		queue->mmap_pages[head_idx] = page;
 		frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx);
 	}
 
@@ -881,18 +882,18 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 err:
 	/* Unwind, freeing all pages and sending error responses. */
 	while (shinfo->nr_frags-- > start) {
-		xenvif_idx_release(vif,
+		xenvif_idx_release(queue,
 				frag_get_pending_idx(&frags[shinfo->nr_frags]),
 				XEN_NETIF_RSP_ERROR);
 	}
 	/* The head too, if necessary. */
 	if (start)
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
+		xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_ERROR);
 
 	return NULL;
 }
 
-static int xenvif_tx_check_gop(struct xenvif *vif,
+static int xenvif_tx_check_gop(struct xenvif_queue *queue,
 			       struct sk_buff *skb,
 			       struct gnttab_copy **gopp)
 {
@@ -907,7 +908,7 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	/* Check status of header. */
 	err = gop->status;
 	if (unlikely(err))
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
+		xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_ERROR);
 
 	/* Skip first skb fragment if it is on same page as header fragment. */
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
@@ -917,7 +918,7 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 		pending_ring_idx_t head;
 
 		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
-		tx_info = &vif->pending_tx_info[pending_idx];
+		tx_info = &queue->pending_tx_info[pending_idx];
 		head = tx_info->head;
 
 		/* Check error status: if okay then remember grant handle. */
@@ -925,19 +926,19 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 			newerr = (++gop)->status;
 			if (newerr)
 				break;
-			peek = vif->pending_ring[pending_index(++head)];
-		} while (!pending_tx_is_head(vif, peek));
+			peek = queue->pending_ring[pending_index(++head)];
+		} while (!pending_tx_is_head(queue, peek));
 
 		if (likely(!newerr)) {
 			/* Had a previous error? Invalidate this fragment. */
 			if (unlikely(err))
-				xenvif_idx_release(vif, pending_idx,
+				xenvif_idx_release(queue, pending_idx,
 						   XEN_NETIF_RSP_OKAY);
 			continue;
 		}
 
 		/* Error on this fragment: respond to client with an error. */
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
+		xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_ERROR);
 
 		/* Not the first error? Preceding frags already invalidated. */
 		if (err)
@@ -945,10 +946,10 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 
 		/* First error: invalidate header and preceding fragments. */
 		pending_idx = *((u16 *)skb->data);
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
+		xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_OKAY);
 		for (j = start; j < i; j++) {
 			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
-			xenvif_idx_release(vif, pending_idx,
+			xenvif_idx_release(queue, pending_idx,
 					   XEN_NETIF_RSP_OKAY);
 		}
 
@@ -960,7 +961,7 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	return err;
 }
 
-static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
+static void xenvif_fill_frags(struct xenvif_queue *queue, struct sk_buff *skb)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	int nr_frags = shinfo->nr_frags;
@@ -974,46 +975,46 @@ static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
 
 		pending_idx = frag_get_pending_idx(frag);
 
-		txp = &vif->pending_tx_info[pending_idx].req;
-		page = virt_to_page(idx_to_kaddr(vif, pending_idx));
+		txp = &queue->pending_tx_info[pending_idx].req;
+		page = virt_to_page(idx_to_kaddr(queue, pending_idx));
 		__skb_fill_page_desc(skb, i, page, txp->offset, txp->size);
 		skb->len += txp->size;
 		skb->data_len += txp->size;
 		skb->truesize += txp->size;
 
 		/* Take an extra reference to offset xenvif_idx_release */
-		get_page(vif->mmap_pages[pending_idx]);
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
+		get_page(queue->mmap_pages[pending_idx]);
+		xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_OKAY);
 	}
 }
 
-static int xenvif_get_extras(struct xenvif *vif,
+static int xenvif_get_extras(struct xenvif_queue *queue,
 				struct xen_netif_extra_info *extras,
 				int work_to_do)
 {
 	struct xen_netif_extra_info extra;
-	RING_IDX cons = vif->tx.req_cons;
+	RING_IDX cons = queue->tx.req_cons;
 
 	do {
 		if (unlikely(work_to_do-- <= 0)) {
-			netdev_err(vif->dev, "Missing extra info\n");
-			xenvif_fatal_tx_err(vif);
+			netdev_err(queue->vif->dev, "Missing extra info\n");
+			xenvif_fatal_tx_err(queue->vif);
 			return -EBADR;
 		}
 
-		memcpy(&extra, RING_GET_REQUEST(&vif->tx, cons),
+		memcpy(&extra, RING_GET_REQUEST(&queue->tx, cons),
 		       sizeof(extra));
 		if (unlikely(!extra.type ||
 			     extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
-			vif->tx.req_cons = ++cons;
-			netdev_err(vif->dev,
+			queue->tx.req_cons = ++cons;
+			netdev_err(queue->vif->dev,
 				   "Invalid extra type: %d\n", extra.type);
-			xenvif_fatal_tx_err(vif);
+			xenvif_fatal_tx_err(queue->vif);
 			return -EINVAL;
 		}
 
 		memcpy(&extras[extra.type - 1], &extra, sizeof(extra));
-		vif->tx.req_cons = ++cons;
+		queue->tx.req_cons = ++cons;
 	} while (extra.flags & XEN_NETIF_EXTRA_FLAG_MORE);
 
 	return work_to_do;
@@ -1058,7 +1059,7 @@ static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
 	 * recalculate the partial checksum.
 	 */
 	if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
-		vif->rx_gso_checksum_fixup++;
+		atomic_inc(&vif->rx_gso_checksum_fixup);
 		skb->ip_summed = CHECKSUM_PARTIAL;
 		recalculate_partial_csum = true;
 	}
@@ -1070,31 +1071,31 @@ static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
 	return skb_checksum_setup(skb, recalculate_partial_csum);
 }
 
-static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
+static bool tx_credit_exceeded(struct xenvif_queue *queue, unsigned size)
 {
 	u64 now = get_jiffies_64();
-	u64 next_credit = vif->credit_window_start +
-		msecs_to_jiffies(vif->credit_usec / 1000);
+	u64 next_credit = queue->credit_window_start +
+		msecs_to_jiffies(queue->credit_usec / 1000);
 
 	/* Timer could already be pending in rare cases. */
-	if (timer_pending(&vif->credit_timeout))
+	if (timer_pending(&queue->credit_timeout))
 		return true;
 
 	/* Passed the point where we can replenish credit? */
 	if (time_after_eq64(now, next_credit)) {
-		vif->credit_window_start = now;
-		tx_add_credit(vif);
+		queue->credit_window_start = now;
+		tx_add_credit(queue);
 	}
 
 	/* Still too big to send right now? Set a callback. */
-	if (size > vif->remaining_credit) {
-		vif->credit_timeout.data     =
-			(unsigned long)vif;
-		vif->credit_timeout.function =
+	if (size > queue->remaining_credit) {
+		queue->credit_timeout.data     =
+			(unsigned long)queue;
+		queue->credit_timeout.function =
 			tx_credit_callback;
-		mod_timer(&vif->credit_timeout,
+		mod_timer(&queue->credit_timeout,
 			  next_credit);
-		vif->credit_window_start = next_credit;
+		queue->credit_window_start = next_credit;
 
 		return true;
 	}
@@ -1102,15 +1103,15 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 	return false;
 }
 
-static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
+static unsigned xenvif_tx_build_gops(struct xenvif_queue *queue, int budget)
 {
-	struct gnttab_copy *gop = vif->tx_copy_ops, *request_gop;
+	struct gnttab_copy *gop = queue->tx_copy_ops, *request_gop;
 	struct sk_buff *skb;
 	int ret;
 
-	while ((nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX
+	while ((nr_pending_reqs(queue) + XEN_NETBK_LEGACY_SLOTS_MAX
 		< MAX_PENDING_REQS) &&
-	       (skb_queue_len(&vif->tx_queue) < budget)) {
+	       (skb_queue_len(&queue->tx_queue) < budget)) {
 		struct xen_netif_tx_request txreq;
 		struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX];
 		struct page *page;
@@ -1121,69 +1122,69 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 		unsigned int data_len;
 		pending_ring_idx_t index;
 
-		if (vif->tx.sring->req_prod - vif->tx.req_cons >
+		if (queue->tx.sring->req_prod - queue->tx.req_cons >
 		    XEN_NETIF_TX_RING_SIZE) {
-			netdev_err(vif->dev,
+			netdev_err(queue->vif->dev,
 				   "Impossible number of requests. "
 				   "req_prod %d, req_cons %d, size %ld\n",
-				   vif->tx.sring->req_prod, vif->tx.req_cons,
+				   queue->tx.sring->req_prod, queue->tx.req_cons,
 				   XEN_NETIF_TX_RING_SIZE);
-			xenvif_fatal_tx_err(vif);
+			xenvif_fatal_tx_err(queue->vif);
 			continue;
 		}
 
-		work_to_do = RING_HAS_UNCONSUMED_REQUESTS(&vif->tx);
+		work_to_do = RING_HAS_UNCONSUMED_REQUESTS(&queue->tx);
 		if (!work_to_do)
 			break;
 
-		idx = vif->tx.req_cons;
+		idx = queue->tx.req_cons;
 		rmb(); /* Ensure that we see the request before we copy it. */
-		memcpy(&txreq, RING_GET_REQUEST(&vif->tx, idx), sizeof(txreq));
+		memcpy(&txreq, RING_GET_REQUEST(&queue->tx, idx), sizeof(txreq));
 
 		/* Credit-based scheduling. */
-		if (txreq.size > vif->remaining_credit &&
-		    tx_credit_exceeded(vif, txreq.size))
+		if (txreq.size > queue->remaining_credit &&
+		    tx_credit_exceeded(queue, txreq.size))
 			break;
 
-		vif->remaining_credit -= txreq.size;
+		queue->remaining_credit -= txreq.size;
 
 		work_to_do--;
-		vif->tx.req_cons = ++idx;
+		queue->tx.req_cons = ++idx;
 
 		memset(extras, 0, sizeof(extras));
 		if (txreq.flags & XEN_NETTXF_extra_info) {
-			work_to_do = xenvif_get_extras(vif, extras,
+			work_to_do = xenvif_get_extras(queue, extras,
 						       work_to_do);
-			idx = vif->tx.req_cons;
+			idx = queue->tx.req_cons;
 			if (unlikely(work_to_do < 0))
 				break;
 		}
 
-		ret = xenvif_count_requests(vif, &txreq, txfrags, work_to_do);
+		ret = xenvif_count_requests(queue, &txreq, txfrags, work_to_do);
 		if (unlikely(ret < 0))
 			break;
 
 		idx += ret;
 
 		if (unlikely(txreq.size < ETH_HLEN)) {
-			netdev_dbg(vif->dev,
+			netdev_dbg(queue->vif->dev,
 				   "Bad packet size: %d\n", txreq.size);
-			xenvif_tx_err(vif, &txreq, idx);
+			xenvif_tx_err(queue, &txreq, idx);
 			break;
 		}
 
 		/* No crossing a page as the payload mustn't fragment. */
 		if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
-			netdev_err(vif->dev,
+			netdev_err(queue->vif->dev,
 				   "txreq.offset: %x, size: %u, end: %lu\n",
 				   txreq.offset, txreq.size,
 				   (txreq.offset&~PAGE_MASK) + txreq.size);
-			xenvif_fatal_tx_err(vif);
+			xenvif_fatal_tx_err(queue->vif);
 			break;
 		}
 
-		index = pending_index(vif->pending_cons);
-		pending_idx = vif->pending_ring[index];
+		index = pending_index(queue->pending_cons);
+		pending_idx = queue->pending_ring[index];
 
 		data_len = (txreq.size > PKT_PROT_LEN &&
 			    ret < XEN_NETBK_LEGACY_SLOTS_MAX) ?
@@ -1192,9 +1193,9 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 		skb = alloc_skb(data_len + NET_SKB_PAD + NET_IP_ALIGN,
 				GFP_ATOMIC | __GFP_NOWARN);
 		if (unlikely(skb == NULL)) {
-			netdev_dbg(vif->dev,
+			netdev_dbg(queue->vif->dev,
 				   "Can't allocate a skb in start_xmit.\n");
-			xenvif_tx_err(vif, &txreq, idx);
+			xenvif_tx_err(queue, &txreq, idx);
 			break;
 		}
 
@@ -1205,7 +1206,7 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 			struct xen_netif_extra_info *gso;
 			gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1];
 
-			if (xenvif_set_skb_gso(vif, skb, gso)) {
+			if (xenvif_set_skb_gso(queue->vif, skb, gso)) {
 				/* Failure in xenvif_set_skb_gso is fatal. */
 				kfree_skb(skb);
 				break;
@@ -1213,15 +1214,15 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 		}
 
 		/* XXX could copy straight to head */
-		page = xenvif_alloc_page(vif, pending_idx);
+		page = xenvif_alloc_page(queue, pending_idx);
 		if (!page) {
 			kfree_skb(skb);
-			xenvif_tx_err(vif, &txreq, idx);
+			xenvif_tx_err(queue, &txreq, idx);
 			break;
 		}
 
 		gop->source.u.ref = txreq.gref;
-		gop->source.domid = vif->domid;
+		gop->source.domid = queue->vif->domid;
 		gop->source.offset = txreq.offset;
 
 		gop->dest.u.gmfn = virt_to_mfn(page_address(page));
@@ -1233,9 +1234,9 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 
 		gop++;
 
-		memcpy(&vif->pending_tx_info[pending_idx].req,
+		memcpy(&queue->pending_tx_info[pending_idx].req,
 		       &txreq, sizeof(txreq));
-		vif->pending_tx_info[pending_idx].head = index;
+		queue->pending_tx_info[pending_idx].head = index;
 		*((u16 *)skb->data) = pending_idx;
 
 		__skb_put(skb, data_len);
@@ -1250,45 +1251,45 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 					     INVALID_PENDING_IDX);
 		}
 
-		vif->pending_cons++;
+		queue->pending_cons++;
 
-		request_gop = xenvif_get_requests(vif, skb, txfrags, gop);
+		request_gop = xenvif_get_requests(queue, skb, txfrags, gop);
 		if (request_gop == NULL) {
 			kfree_skb(skb);
-			xenvif_tx_err(vif, &txreq, idx);
+			xenvif_tx_err(queue, &txreq, idx);
 			break;
 		}
 		gop = request_gop;
 
-		__skb_queue_tail(&vif->tx_queue, skb);
+		__skb_queue_tail(&queue->tx_queue, skb);
 
-		vif->tx.req_cons = idx;
+		queue->tx.req_cons = idx;
 
-		if ((gop-vif->tx_copy_ops) >= ARRAY_SIZE(vif->tx_copy_ops))
+		if ((gop - queue->tx_copy_ops) >= ARRAY_SIZE(queue->tx_copy_ops))
 			break;
 	}
 
-	return gop - vif->tx_copy_ops;
+	return gop - queue->tx_copy_ops;
 }
 
 
-static int xenvif_tx_submit(struct xenvif *vif)
+static int xenvif_tx_submit(struct xenvif_queue *queue)
 {
-	struct gnttab_copy *gop = vif->tx_copy_ops;
+	struct gnttab_copy *gop = queue->tx_copy_ops;
 	struct sk_buff *skb;
 	int work_done = 0;
 
-	while ((skb = __skb_dequeue(&vif->tx_queue)) != NULL) {
+	while ((skb = __skb_dequeue(&queue->tx_queue)) != NULL) {
 		struct xen_netif_tx_request *txp;
 		u16 pending_idx;
 		unsigned data_len;
 
 		pending_idx = *((u16 *)skb->data);
-		txp = &vif->pending_tx_info[pending_idx].req;
+		txp = &queue->pending_tx_info[pending_idx].req;
 
 		/* Check the remap error code. */
-		if (unlikely(xenvif_tx_check_gop(vif, skb, &gop))) {
-			netdev_dbg(vif->dev, "netback grant failed.\n");
+		if (unlikely(xenvif_tx_check_gop(queue, skb, &gop))) {
+			netdev_dbg(queue->vif->dev, "netback grant failed.\n");
 			skb_shinfo(skb)->nr_frags = 0;
 			kfree_skb(skb);
 			continue;
@@ -1296,7 +1297,7 @@ static int xenvif_tx_submit(struct xenvif *vif)
 
 		data_len = skb->len;
 		memcpy(skb->data,
-		       (void *)(idx_to_kaddr(vif, pending_idx)|txp->offset),
+		       (void *)(idx_to_kaddr(queue, pending_idx)|txp->offset),
 		       data_len);
 		if (data_len < txp->size) {
 			/* Append the packet payload as a fragment. */
@@ -1304,7 +1305,7 @@ static int xenvif_tx_submit(struct xenvif *vif)
 			txp->size -= data_len;
 		} else {
 			/* Schedule a response immediately. */
-			xenvif_idx_release(vif, pending_idx,
+			xenvif_idx_release(queue, pending_idx,
 					   XEN_NETIF_RSP_OKAY);
 		}
 
@@ -1313,19 +1314,19 @@ static int xenvif_tx_submit(struct xenvif *vif)
 		else if (txp->flags & XEN_NETTXF_data_validated)
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-		xenvif_fill_frags(vif, skb);
+		xenvif_fill_frags(queue, skb);
 
 		if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
 			int target = min_t(int, skb->len, PKT_PROT_LEN);
 			__pskb_pull_tail(skb, target - skb_headlen(skb));
 		}
 
-		skb->dev      = vif->dev;
+		skb->dev      = queue->vif->dev;
 		skb->protocol = eth_type_trans(skb, skb->dev);
 		skb_reset_network_header(skb);
 
-		if (checksum_setup(vif, skb)) {
-			netdev_dbg(vif->dev,
+		if (checksum_setup(queue->vif, skb)) {
+			netdev_dbg(queue->vif->dev,
 				   "Can't setup checksum in net_tx_action\n");
 			kfree_skb(skb);
 			continue;
@@ -1347,8 +1348,8 @@ static int xenvif_tx_submit(struct xenvif *vif)
 				DIV_ROUND_UP(skb->len - hdrlen, mss);
 		}
 
-		vif->dev->stats.rx_bytes += skb->len;
-		vif->dev->stats.rx_packets++;
+		queue->stats.rx_bytes += skb->len;
+		queue->stats.rx_packets++;
 
 		work_done++;
 
@@ -1359,53 +1360,53 @@ static int xenvif_tx_submit(struct xenvif *vif)
 }
 
 /* Called after netfront has transmitted */
-int xenvif_tx_action(struct xenvif *vif, int budget)
+int xenvif_tx_action(struct xenvif_queue *queue, int budget)
 {
 	unsigned nr_gops;
 	int work_done;
 
-	if (unlikely(!tx_work_todo(vif)))
+	if (unlikely(!tx_work_todo(queue)))
 		return 0;
 
-	nr_gops = xenvif_tx_build_gops(vif, budget);
+	nr_gops = xenvif_tx_build_gops(queue, budget);
 
 	if (nr_gops == 0)
 		return 0;
 
-	gnttab_batch_copy(vif->tx_copy_ops, nr_gops);
+	gnttab_batch_copy(queue->tx_copy_ops, nr_gops);
 
-	work_done = xenvif_tx_submit(vif);
+	work_done = xenvif_tx_submit(queue);
 
 	return work_done;
 }
 
-static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
+static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
 			       u8 status)
 {
 	struct pending_tx_info *pending_tx_info;
 	pending_ring_idx_t head;
 	u16 peek; /* peek into next tx request */
 
-	BUG_ON(vif->mmap_pages[pending_idx] == (void *)(~0UL));
+	BUG_ON(queue->mmap_pages[pending_idx] == (void *)(~0UL));
 
 	/* Already complete? */
-	if (vif->mmap_pages[pending_idx] == NULL)
+	if (queue->mmap_pages[pending_idx] == NULL)
 		return;
 
-	pending_tx_info = &vif->pending_tx_info[pending_idx];
+	pending_tx_info = &queue->pending_tx_info[pending_idx];
 
 	head = pending_tx_info->head;
 
-	BUG_ON(!pending_tx_is_head(vif, head));
-	BUG_ON(vif->pending_ring[pending_index(head)] != pending_idx);
+	BUG_ON(!pending_tx_is_head(queue, head));
+	BUG_ON(queue->pending_ring[pending_index(head)] != pending_idx);
 
 	do {
 		pending_ring_idx_t index;
 		pending_ring_idx_t idx = pending_index(head);
-		u16 info_idx = vif->pending_ring[idx];
+		u16 info_idx = queue->pending_ring[idx];
 
-		pending_tx_info = &vif->pending_tx_info[info_idx];
-		make_tx_response(vif, &pending_tx_info->req, status);
+		pending_tx_info = &queue->pending_tx_info[info_idx];
+		make_tx_response(queue, &pending_tx_info->req, status);
 
 		/* Setting any number other than
 		 * INVALID_PENDING_RING_IDX indicates this slot is
@@ -1413,50 +1414,50 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
 		 */
 		pending_tx_info->head = 0;
 
-		index = pending_index(vif->pending_prod++);
-		vif->pending_ring[index] = vif->pending_ring[info_idx];
+		index = pending_index(queue->pending_prod++);
+		queue->pending_ring[index] = queue->pending_ring[info_idx];
 
-		peek = vif->pending_ring[pending_index(++head)];
+		peek = queue->pending_ring[pending_index(++head)];
 
-	} while (!pending_tx_is_head(vif, peek));
+	} while (!pending_tx_is_head(queue, peek));
 
-	put_page(vif->mmap_pages[pending_idx]);
-	vif->mmap_pages[pending_idx] = NULL;
+	put_page(queue->mmap_pages[pending_idx]);
+	queue->mmap_pages[pending_idx] = NULL;
 }
 
 
-static void make_tx_response(struct xenvif *vif,
+static void make_tx_response(struct xenvif_queue *queue,
 			     struct xen_netif_tx_request *txp,
 			     s8       st)
 {
-	RING_IDX i = vif->tx.rsp_prod_pvt;
+	RING_IDX i = queue->tx.rsp_prod_pvt;
 	struct xen_netif_tx_response *resp;
 	int notify;
 
-	resp = RING_GET_RESPONSE(&vif->tx, i);
+	resp = RING_GET_RESPONSE(&queue->tx, i);
 	resp->id     = txp->id;
 	resp->status = st;
 
 	if (txp->flags & XEN_NETTXF_extra_info)
-		RING_GET_RESPONSE(&vif->tx, ++i)->status = XEN_NETIF_RSP_NULL;
+		RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL;
 
-	vif->tx.rsp_prod_pvt = ++i;
-	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->tx, notify);
+	queue->tx.rsp_prod_pvt = ++i;
+	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify);
 	if (notify)
-		notify_remote_via_irq(vif->tx_irq);
+		notify_remote_via_irq(queue->tx_irq);
 }
 
-static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
+static struct xen_netif_rx_response *make_rx_response(struct xenvif_queue *queue,
 					     u16      id,
 					     s8       st,
 					     u16      offset,
 					     u16      size,
 					     u16      flags)
 {
-	RING_IDX i = vif->rx.rsp_prod_pvt;
+	RING_IDX i = queue->rx.rsp_prod_pvt;
 	struct xen_netif_rx_response *resp;
 
-	resp = RING_GET_RESPONSE(&vif->rx, i);
+	resp = RING_GET_RESPONSE(&queue->rx, i);
 	resp->offset     = offset;
 	resp->flags      = flags;
 	resp->id         = id;
@@ -1464,39 +1465,39 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
 	if (st < 0)
 		resp->status = (s16)st;
 
-	vif->rx.rsp_prod_pvt = ++i;
+	queue->rx.rsp_prod_pvt = ++i;
 
 	return resp;
 }
 
-static inline int rx_work_todo(struct xenvif *vif)
+static inline int rx_work_todo(struct xenvif_queue *queue)
 {
-	return !skb_queue_empty(&vif->rx_queue) &&
-	       xenvif_rx_ring_slots_available(vif, vif->rx_last_skb_slots);
+	return !skb_queue_empty(&queue->rx_queue) &&
+	       xenvif_rx_ring_slots_available(queue, queue->rx_last_skb_slots);
 }
 
-static inline int tx_work_todo(struct xenvif *vif)
+static inline int tx_work_todo(struct xenvif_queue *queue)
 {
 
-	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) &&
-	    (nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX
+	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&queue->tx)) &&
+	    (nr_pending_reqs(queue) + XEN_NETBK_LEGACY_SLOTS_MAX
 	     < MAX_PENDING_REQS))
 		return 1;
 
 	return 0;
 }
 
-void xenvif_unmap_frontend_rings(struct xenvif *vif)
+void xenvif_unmap_frontend_rings(struct xenvif_queue *queue)
 {
-	if (vif->tx.sring)
-		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif),
-					vif->tx.sring);
-	if (vif->rx.sring)
-		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif),
-					vif->rx.sring);
+	if (queue->tx.sring)
+		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(queue->vif),
+					queue->tx.sring);
+	if (queue->rx.sring)
+		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(queue->vif),
+					queue->rx.sring);
 }
 
-int xenvif_map_frontend_rings(struct xenvif *vif,
+int xenvif_map_frontend_rings(struct xenvif_queue *queue,
 			      grant_ref_t tx_ring_ref,
 			      grant_ref_t rx_ring_ref)
 {
@@ -1506,67 +1507,72 @@ int xenvif_map_frontend_rings(struct xenvif *vif,
 
 	int err = -ENOMEM;
 
-	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
+	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(queue->vif),
 				     tx_ring_ref, &addr);
 	if (err)
 		goto err;
 
 	txs = (struct xen_netif_tx_sring *)addr;
-	BACK_RING_INIT(&vif->tx, txs, PAGE_SIZE);
+	BACK_RING_INIT(&queue->tx, txs, PAGE_SIZE);
 
-	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
+	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(queue->vif),
 				     rx_ring_ref, &addr);
 	if (err)
 		goto err;
 
 	rxs = (struct xen_netif_rx_sring *)addr;
-	BACK_RING_INIT(&vif->rx, rxs, PAGE_SIZE);
+	BACK_RING_INIT(&queue->rx, rxs, PAGE_SIZE);
 
 	return 0;
 
 err:
-	xenvif_unmap_frontend_rings(vif);
+	xenvif_unmap_frontend_rings(queue);
 	return err;
 }
 
-void xenvif_stop_queue(struct xenvif *vif)
+static inline void xenvif_wake_queue(struct xenvif_queue *queue)
 {
-	if (!vif->can_queue)
-		return;
+	struct net_device *dev = queue->vif->dev;
+	netif_tx_wake_queue(netdev_get_tx_queue(dev, queue->id));
+}
 
-	netif_stop_queue(vif->dev);
+static void xenvif_start_queue(struct xenvif_queue *queue)
+{
+	if (xenvif_schedulable(queue->vif))
+		xenvif_wake_queue(queue);
 }
 
-static void xenvif_start_queue(struct xenvif *vif)
+static int xenvif_queue_stopped(struct xenvif_queue *queue)
 {
-	if (xenvif_schedulable(vif))
-		netif_wake_queue(vif->dev);
+	struct net_device *dev = queue->vif->dev;
+	unsigned int id = queue->id;
+	return netif_tx_queue_stopped(netdev_get_tx_queue(dev, id));
 }
 
 int xenvif_kthread(void *data)
 {
-	struct xenvif *vif = data;
+	struct xenvif_queue *queue = data;
 	struct sk_buff *skb;
 
 	while (!kthread_should_stop()) {
-		wait_event_interruptible(vif->wq,
-					 rx_work_todo(vif) ||
+		wait_event_interruptible(queue->wq,
+					 rx_work_todo(queue) ||
 					 kthread_should_stop());
 		if (kthread_should_stop())
 			break;
 
-		if (!skb_queue_empty(&vif->rx_queue))
-			xenvif_rx_action(vif);
+		if (!skb_queue_empty(&queue->rx_queue))
+			xenvif_rx_action(queue);
 
-		if (skb_queue_empty(&vif->rx_queue) &&
-		    netif_queue_stopped(vif->dev))
-			xenvif_start_queue(vif);
+		if (skb_queue_empty(&queue->rx_queue) &&
+		    xenvif_queue_stopped(queue))
+			xenvif_start_queue(queue);
 
 		cond_resched();
 	}
 
 	/* Bin any remaining skbs */
-	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL)
+	while ((skb = skb_dequeue(&queue->rx_queue)) != NULL)
 		dev_kfree_skb(skb);
 
 	return 0;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 7a206cf..f23ea0a 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -19,6 +19,7 @@
 */
 
 #include "common.h"
+#include <linux/vmalloc.h>
 
 struct backend_info {
 	struct xenbus_device *dev;
@@ -34,8 +35,9 @@ struct backend_info {
 	u8 have_hotplug_status_watch:1;
 };
 
-static int connect_rings(struct backend_info *);
-static void connect(struct backend_info *);
+static int connect_rings(struct backend_info *be, struct xenvif_queue *queue);
+static void connect(struct backend_info *be);
+static int read_xenbus_vif_flags(struct backend_info *be);
 static void backend_create_xenvif(struct backend_info *be);
 static void unregister_hotplug_status_watch(struct backend_info *be);
 static void set_backend_state(struct backend_info *be,
@@ -485,10 +487,9 @@ static void connect(struct backend_info *be)
 {
 	int err;
 	struct xenbus_device *dev = be->dev;
-
-	err = connect_rings(be);
-	if (err)
-		return;
+	unsigned long credit_bytes, credit_usec;
+	unsigned int queue_index;
+	struct xenvif_queue *queue;
 
 	err = xen_net_read_mac(dev, be->vif->fe_dev_addr);
 	if (err) {
@@ -496,9 +497,30 @@ static void connect(struct backend_info *be)
 		return;
 	}
 
-	xen_net_read_rate(dev, &be->vif->credit_bytes,
-			  &be->vif->credit_usec);
-	be->vif->remaining_credit = be->vif->credit_bytes;
+	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
+	read_xenbus_vif_flags(be);
+
+	be->vif->num_queues = 1;
+	be->vif->queues = vzalloc(be->vif->num_queues *
+			sizeof(struct xenvif_queue));
+
+	for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index) {
+		queue = &be->vif->queues[queue_index];
+		queue->vif = be->vif;
+		queue->id = queue_index;
+		snprintf(queue->name, sizeof(queue->name), "%s-q%u",
+				be->vif->dev->name, queue->id);
+
+		xenvif_init_queue(queue);
+
+		queue->remaining_credit = credit_bytes;
+
+		err = connect_rings(be, queue);
+		if (err)
+			goto err;
+	}
+
+	xenvif_carrier_on(be->vif);
 
 	unregister_hotplug_status_watch(be);
 	err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch,
@@ -507,18 +529,24 @@ static void connect(struct backend_info *be)
 	if (!err)
 		be->have_hotplug_status_watch = 1;
 
-	netif_wake_queue(be->vif->dev);
+	netif_tx_wake_all_queues(be->vif->dev);
+
+	return;
+
+err:
+	vfree(be->vif->queues);
+	be->vif->queues = NULL;
+	be->vif->num_queues = 0;
+	return;
 }
 
 
-static int connect_rings(struct backend_info *be)
+static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
 {
-	struct xenvif *vif = be->vif;
 	struct xenbus_device *dev = be->dev;
 	unsigned long tx_ring_ref, rx_ring_ref;
-	unsigned int tx_evtchn, rx_evtchn, rx_copy;
+	unsigned int tx_evtchn, rx_evtchn;
 	int err;
-	int val;
 
 	err = xenbus_gather(XBT_NIL, dev->otherend,
 			    "tx-ring-ref", "%lu", &tx_ring_ref,
@@ -546,6 +574,27 @@ static int connect_rings(struct backend_info *be)
 		rx_evtchn = tx_evtchn;
 	}
 
+	/* Map the shared frame, irq etc. */
+	err = xenvif_connect(queue, tx_ring_ref, rx_ring_ref,
+			     tx_evtchn, rx_evtchn);
+	if (err) {
+		xenbus_dev_fatal(dev, err,
+				 "mapping shared-frames %lu/%lu port tx %u rx %u",
+				 tx_ring_ref, rx_ring_ref,
+				 tx_evtchn, rx_evtchn);
+		return err;
+	}
+
+	return 0;
+}
+
+static int read_xenbus_vif_flags(struct backend_info *be)
+{
+	struct xenvif *vif = be->vif;
+	struct xenbus_device *dev = be->dev;
+	unsigned int rx_copy;
+	int err, val;
+
 	err = xenbus_scanf(XBT_NIL, dev->otherend, "request-rx-copy", "%u",
 			   &rx_copy);
 	if (err == -ENOENT) {
@@ -621,16 +670,6 @@ static int connect_rings(struct backend_info *be)
 		val = 0;
 	vif->ipv6_csum = !!val;
 
-	/* Map the shared frame, irq etc. */
-	err = xenvif_connect(vif, tx_ring_ref, rx_ring_ref,
-			     tx_evtchn, rx_evtchn);
-	if (err) {
-		xenbus_dev_fatal(dev, err,
-				 "mapping shared-frames %lu/%lu port tx %u rx %u",
-				 tx_ring_ref, rx_ring_ref,
-				 tx_evtchn, rx_evtchn);
-		return err;
-	}
 	return 0;
 }
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH V2 net-next 2/5] xen-netback: Add support for multiple queues
From: Andrew J. Bennieston @ 2014-02-14 11:50 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew J. Bennieston, netdev, paul.durrant, wei.liu2,
	ian.campbell
In-Reply-To: <1392378624-6123-1-git-send-email-andrew.bennieston@citrix.com>

From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>

Builds on the refactoring of the previous patch to implement multiple
queues between xen-netfront and xen-netback.

Writes the maximum supported number of queues into XenStore, and reads
the values written by the frontend to determine how many queues to use.

Ring references and event channels are read from XenStore on a per-queue
basis and rings are connected accordingly.

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
---
 drivers/net/xen-netback/common.h    |    2 +
 drivers/net/xen-netback/interface.c |    7 +++-
 drivers/net/xen-netback/netback.c   |    6 +++
 drivers/net/xen-netback/xenbus.c    |   76 ++++++++++++++++++++++++++++++-----
 4 files changed, 80 insertions(+), 11 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 2550867..8180929 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -261,4 +261,6 @@ void xenvif_carrier_on(struct xenvif *vif);
 
 extern bool separate_tx_rx_irq;
 
+extern unsigned int xenvif_max_queues;
+
 #endif /* __XEN_NETBACK__COMMON_H__ */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 4cde112..4dc092c 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -373,7 +373,12 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	char name[IFNAMSIZ] = {};
 
 	snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle);
-	dev = alloc_netdev_mq(sizeof(struct xenvif), name, ether_setup, 1);
+	/* Allocate a netdev with the max. supported number of queues.
+	 * When the guest selects the desired number, it will be updated
+	 * via netif_set_real_num_tx_queues().
+	 */
+	dev = alloc_netdev_mq(sizeof(struct xenvif), name, ether_setup,
+						  xenvif_max_queues);
 	if (dev == NULL) {
 		pr_warn("Could not allocate netdev for %s\n", name);
 		return ERR_PTR(-ENOMEM);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 46b2f5b..aeb5ffa 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -54,6 +54,9 @@
 bool separate_tx_rx_irq = 1;
 module_param(separate_tx_rx_irq, bool, 0644);
 
+unsigned int xenvif_max_queues;
+module_param(xenvif_max_queues, uint, 0644);
+
 /*
  * This is the maximum slots a skb can have. If a guest sends a skb
  * which exceeds this limit it is considered malicious.
@@ -1585,6 +1588,9 @@ static int __init netback_init(void)
 	if (!xen_domain())
 		return -ENODEV;
 
+	/* Allow as many queues as there are CPUs, by default */
+	xenvif_max_queues = num_online_cpus();
+
 	if (fatal_skb_slots < XEN_NETBK_LEGACY_SLOTS_MAX) {
 		pr_info("fatal_skb_slots too small (%d), bump it to XEN_NETBK_LEGACY_SLOTS_MAX (%d)\n",
 			fatal_skb_slots, XEN_NETBK_LEGACY_SLOTS_MAX);
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index f23ea0a..3f97f6f 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -20,6 +20,7 @@
 
 #include "common.h"
 #include <linux/vmalloc.h>
+#include <linux/rtnetlink.h>
 
 struct backend_info {
 	struct xenbus_device *dev;
@@ -159,6 +160,12 @@ static int netback_probe(struct xenbus_device *dev,
 	if (err)
 		pr_debug("Error writing feature-split-event-channels\n");
 
+	/* Multi-queue support: This is an optional feature. */
+	err = xenbus_printf(XBT_NIL, dev->nodename,
+			"multi-queue-max-queues", "%u", xenvif_max_queues);
+	if (err)
+		pr_debug("Error writing multi-queue-max-queues\n");
+
 	err = xenbus_switch_state(dev, XenbusStateInitWait);
 	if (err)
 		goto fail;
@@ -490,6 +497,23 @@ static void connect(struct backend_info *be)
 	unsigned long credit_bytes, credit_usec;
 	unsigned int queue_index;
 	struct xenvif_queue *queue;
+	unsigned int requested_num_queues;
+
+	/* Check whether the frontend requested multiple queues
+	 * and read the number requested.
+	 */
+	err = xenbus_scanf(XBT_NIL, dev->otherend,
+			"multi-queue-num-queues",
+			"%u", &requested_num_queues);
+	if (err < 0) {
+		requested_num_queues = 1; /* Fall back to single queue */
+	} else if (requested_num_queues > xenvif_max_queues) {
+		/* buggy or malicious guest */
+		xenbus_dev_fatal(dev, err,
+						 "guest requested %u queues, exceeding the maximum of %u.",
+						 requested_num_queues, xenvif_max_queues);
+		return;
+	}
 
 	err = xen_net_read_mac(dev, be->vif->fe_dev_addr);
 	if (err) {
@@ -500,9 +524,13 @@ static void connect(struct backend_info *be)
 	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
 	read_xenbus_vif_flags(be);
 
-	be->vif->num_queues = 1;
+	/* Use the number of queues requested by the frontend */
+	be->vif->num_queues = requested_num_queues;
 	be->vif->queues = vzalloc(be->vif->num_queues *
 			sizeof(struct xenvif_queue));
+	rtnl_lock();
+	netif_set_real_num_tx_queues(be->vif->dev, be->vif->num_queues);
+	rtnl_unlock();
 
 	for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index) {
 		queue = &be->vif->queues[queue_index];
@@ -547,29 +575,52 @@ static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
 	unsigned long tx_ring_ref, rx_ring_ref;
 	unsigned int tx_evtchn, rx_evtchn;
 	int err;
+	char *xspath = NULL;
+	size_t xspathsize;
+	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
+
+	/* If the frontend requested 1 queue, or we have fallen back
+	 * to single queue due to lack of frontend support for multi-
+	 * queue, expect the remaining XenStore keys in the toplevel
+	 * directory. Otherwise, expect them in a subdirectory called
+	 * queue-N.
+	 */
+	if (queue->vif->num_queues == 1)
+		xspath = (char *)dev->otherend;
+	else {
+		xspathsize = strlen(dev->otherend) + xenstore_path_ext_size;
+		xspath = kzalloc(xspathsize, GFP_KERNEL);
+		if (!xspath) {
+			xenbus_dev_fatal(dev, -ENOMEM,
+					"reading ring references");
+			return -ENOMEM;
+		}
+		snprintf(xspath, xspathsize, "%s/queue-%u", dev->otherend,
+				 queue->id);
+	}
 
-	err = xenbus_gather(XBT_NIL, dev->otherend,
+	err = xenbus_gather(XBT_NIL, xspath,
 			    "tx-ring-ref", "%lu", &tx_ring_ref,
 			    "rx-ring-ref", "%lu", &rx_ring_ref, NULL);
 	if (err) {
 		xenbus_dev_fatal(dev, err,
 				 "reading %s/ring-ref",
-				 dev->otherend);
-		return err;
+				 xspath);
+		goto err;
 	}
 
 	/* Try split event channels first, then single event channel. */
-	err = xenbus_gather(XBT_NIL, dev->otherend,
+	err = xenbus_gather(XBT_NIL, xspath,
 			    "event-channel-tx", "%u", &tx_evtchn,
 			    "event-channel-rx", "%u", &rx_evtchn, NULL);
 	if (err < 0) {
-		err = xenbus_scanf(XBT_NIL, dev->otherend,
+		err = xenbus_scanf(XBT_NIL, xspath,
 				   "event-channel", "%u", &tx_evtchn);
 		if (err < 0) {
 			xenbus_dev_fatal(dev, err,
 					 "reading %s/event-channel(-tx/rx)",
-					 dev->otherend);
-			return err;
+					 xspath);
+			goto err;
 		}
 		rx_evtchn = tx_evtchn;
 	}
@@ -582,10 +633,15 @@ static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
 				 "mapping shared-frames %lu/%lu port tx %u rx %u",
 				 tx_ring_ref, rx_ring_ref,
 				 tx_evtchn, rx_evtchn);
-		return err;
+		goto err;
 	}
 
-	return 0;
+	err = 0;
+err: /* Regular return falls through with err == 0 */
+	if (xspath != dev->otherend)
+		kfree(xspath);
+
+	return err;
 }
 
 static int read_xenbus_vif_flags(struct backend_info *be)
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH V2 net-next 3/5] xen-netfront: Factor queue-specific data into queue struct.
From: Andrew J. Bennieston @ 2014-02-14 11:50 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew J. Bennieston, netdev, paul.durrant, wei.liu2,
	ian.campbell
In-Reply-To: <1392378624-6123-1-git-send-email-andrew.bennieston@citrix.com>

From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>

In preparation for multi-queue support in xen-netfront, move the
queue-specific data from struct netfront_info to struct netfront_queue,
and update the rest of the code to use this.

Also adds loops over queues where appropriate, even though only one is
configured at this point, and uses alloc_etherdev_mq() and the
corresponding multi-queue netif wake/start/stop functions in preparation
for multiple active queues.

Finally, implements a trivial queue selection function suitable for
ndo_select_queue, which simply returns 0, selecting the first (and
only) queue.

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
---
 drivers/net/xen-netfront.c |  944 ++++++++++++++++++++++++++------------------
 1 file changed, 551 insertions(+), 393 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index f9daa9e..d4239b9 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -73,6 +73,12 @@ struct netfront_cb {
 #define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
 #define TX_MAX_TARGET min_t(int, NET_TX_RING_SIZE, 256)
 
+/* Queue name is interface name with "-qNNN" appended */
+#define QUEUE_NAME_SIZE (IFNAMSIZ + 6)
+
+/* IRQ name is queue name with "-tx" or "-rx" appended */
+#define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 4)
+
 struct netfront_stats {
 	u64			rx_packets;
 	u64			tx_packets;
@@ -81,9 +87,12 @@ struct netfront_stats {
 	struct u64_stats_sync	syncp;
 };
 
-struct netfront_info {
-	struct list_head list;
-	struct net_device *netdev;
+struct netfront_info;
+
+struct netfront_queue {
+	unsigned int id; /* Queue ID, 0-based */
+	char name[QUEUE_NAME_SIZE]; /* DEVNAME-qN */
+	struct netfront_info *info;
 
 	struct napi_struct napi;
 
@@ -93,10 +102,8 @@ struct netfront_info {
 	unsigned int tx_evtchn, rx_evtchn;
 	unsigned int tx_irq, rx_irq;
 	/* Only used when split event channels support is enabled */
-	char tx_irq_name[IFNAMSIZ+4]; /* DEVNAME-tx */
-	char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
-
-	struct xenbus_device *xbdev;
+	char tx_irq_name[IRQ_NAME_SIZE]; /* DEVNAME-qN-tx */
+	char rx_irq_name[IRQ_NAME_SIZE]; /* DEVNAME-qN-rx */
 
 	spinlock_t   tx_lock;
 	struct xen_netif_tx_front_ring tx;
@@ -140,11 +147,22 @@ struct netfront_info {
 	unsigned long rx_pfn_array[NET_RX_RING_SIZE];
 	struct multicall_entry rx_mcl[NET_RX_RING_SIZE+1];
 	struct mmu_update rx_mmu[NET_RX_RING_SIZE];
+};
+
+struct netfront_info {
+	struct list_head list;
+	struct net_device *netdev;
+
+	struct xenbus_device *xbdev;
+
+	/* Multi-queue support */
+	unsigned int num_queues;
+	struct netfront_queue *queues;
 
 	/* Statistics */
 	struct netfront_stats __percpu *stats;
 
-	unsigned long rx_gso_checksum_fixup;
+	atomic_t rx_gso_checksum_fixup;
 };
 
 struct netfront_rx_info {
@@ -187,21 +205,21 @@ static int xennet_rxidx(RING_IDX idx)
 	return idx & (NET_RX_RING_SIZE - 1);
 }
 
-static struct sk_buff *xennet_get_rx_skb(struct netfront_info *np,
+static struct sk_buff *xennet_get_rx_skb(struct netfront_queue *queue,
 					 RING_IDX ri)
 {
 	int i = xennet_rxidx(ri);
-	struct sk_buff *skb = np->rx_skbs[i];
-	np->rx_skbs[i] = NULL;
+	struct sk_buff *skb = queue->rx_skbs[i];
+	queue->rx_skbs[i] = NULL;
 	return skb;
 }
 
-static grant_ref_t xennet_get_rx_ref(struct netfront_info *np,
+static grant_ref_t xennet_get_rx_ref(struct netfront_queue *queue,
 					    RING_IDX ri)
 {
 	int i = xennet_rxidx(ri);
-	grant_ref_t ref = np->grant_rx_ref[i];
-	np->grant_rx_ref[i] = GRANT_INVALID_REF;
+	grant_ref_t ref = queue->grant_rx_ref[i];
+	queue->grant_rx_ref[i] = GRANT_INVALID_REF;
 	return ref;
 }
 
@@ -221,41 +239,40 @@ static bool xennet_can_sg(struct net_device *dev)
 
 static void rx_refill_timeout(unsigned long data)
 {
-	struct net_device *dev = (struct net_device *)data;
-	struct netfront_info *np = netdev_priv(dev);
-	napi_schedule(&np->napi);
+	struct netfront_queue *queue = (struct netfront_queue *)data;
+	napi_schedule(&queue->napi);
 }
 
-static int netfront_tx_slot_available(struct netfront_info *np)
+static int netfront_tx_slot_available(struct netfront_queue *queue)
 {
-	return (np->tx.req_prod_pvt - np->tx.rsp_cons) <
+	return (queue->tx.req_prod_pvt - queue->tx.rsp_cons) <
 		(TX_MAX_TARGET - MAX_SKB_FRAGS - 2);
 }
 
-static void xennet_maybe_wake_tx(struct net_device *dev)
+static void xennet_maybe_wake_tx(struct netfront_queue *queue)
 {
-	struct netfront_info *np = netdev_priv(dev);
+	struct net_device *dev = queue->info->netdev;
+	struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, queue->id);
 
-	if (unlikely(netif_queue_stopped(dev)) &&
-	    netfront_tx_slot_available(np) &&
+	if (unlikely(netif_tx_queue_stopped(dev_queue)) &&
+	    netfront_tx_slot_available(queue) &&
 	    likely(netif_running(dev)))
-		netif_wake_queue(dev);
+		netif_tx_wake_queue(netdev_get_tx_queue(dev, queue->id));
 }
 
-static void xennet_alloc_rx_buffers(struct net_device *dev)
+static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 {
 	unsigned short id;
-	struct netfront_info *np = netdev_priv(dev);
 	struct sk_buff *skb;
 	struct page *page;
 	int i, batch_target, notify;
-	RING_IDX req_prod = np->rx.req_prod_pvt;
+	RING_IDX req_prod = queue->rx.req_prod_pvt;
 	grant_ref_t ref;
 	unsigned long pfn;
 	void *vaddr;
 	struct xen_netif_rx_request *req;
 
-	if (unlikely(!netif_carrier_ok(dev)))
+	if (unlikely(!netif_carrier_ok(queue->info->netdev)))
 		return;
 
 	/*
@@ -264,9 +281,10 @@ static void xennet_alloc_rx_buffers(struct net_device *dev)
 	 * allocator, so should reduce the chance of failed allocation requests
 	 * both for ourself and for other kernel subsystems.
 	 */
-	batch_target = np->rx_target - (req_prod - np->rx.rsp_cons);
-	for (i = skb_queue_len(&np->rx_batch); i < batch_target; i++) {
-		skb = __netdev_alloc_skb(dev, RX_COPY_THRESHOLD + NET_IP_ALIGN,
+	batch_target = queue->rx_target - (req_prod - queue->rx.rsp_cons);
+	for (i = skb_queue_len(&queue->rx_batch); i < batch_target; i++) {
+		skb = __netdev_alloc_skb(queue->info->netdev,
+					 RX_COPY_THRESHOLD + NET_IP_ALIGN,
 					 GFP_ATOMIC | __GFP_NOWARN);
 		if (unlikely(!skb))
 			goto no_skb;
@@ -279,7 +297,7 @@ static void xennet_alloc_rx_buffers(struct net_device *dev)
 			kfree_skb(skb);
 no_skb:
 			/* Could not allocate any skbuffs. Try again later. */
-			mod_timer(&np->rx_refill_timer,
+			mod_timer(&queue->rx_refill_timer,
 				  jiffies + (HZ/10));
 
 			/* Any skbuffs queued for refill? Force them out. */
@@ -289,44 +307,44 @@ no_skb:
 		}
 
 		skb_add_rx_frag(skb, 0, page, 0, 0, PAGE_SIZE);
-		__skb_queue_tail(&np->rx_batch, skb);
+		__skb_queue_tail(&queue->rx_batch, skb);
 	}
 
 	/* Is the batch large enough to be worthwhile? */
-	if (i < (np->rx_target/2)) {
-		if (req_prod > np->rx.sring->req_prod)
+	if (i < (queue->rx_target/2)) {
+		if (req_prod > queue->rx.sring->req_prod)
 			goto push;
 		return;
 	}
 
 	/* Adjust our fill target if we risked running out of buffers. */
-	if (((req_prod - np->rx.sring->rsp_prod) < (np->rx_target / 4)) &&
-	    ((np->rx_target *= 2) > np->rx_max_target))
-		np->rx_target = np->rx_max_target;
+	if (((req_prod - queue->rx.sring->rsp_prod) < (queue->rx_target / 4)) &&
+	    ((queue->rx_target *= 2) > queue->rx_max_target))
+		queue->rx_target = queue->rx_max_target;
 
  refill:
 	for (i = 0; ; i++) {
-		skb = __skb_dequeue(&np->rx_batch);
+		skb = __skb_dequeue(&queue->rx_batch);
 		if (skb == NULL)
 			break;
 
-		skb->dev = dev;
+		skb->dev = queue->info->netdev;
 
 		id = xennet_rxidx(req_prod + i);
 
-		BUG_ON(np->rx_skbs[id]);
-		np->rx_skbs[id] = skb;
+		BUG_ON(queue->rx_skbs[id]);
+		queue->rx_skbs[id] = skb;
 
-		ref = gnttab_claim_grant_reference(&np->gref_rx_head);
+		ref = gnttab_claim_grant_reference(&queue->gref_rx_head);
 		BUG_ON((signed short)ref < 0);
-		np->grant_rx_ref[id] = ref;
+		queue->grant_rx_ref[id] = ref;
 
 		pfn = page_to_pfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
 		vaddr = page_address(skb_frag_page(&skb_shinfo(skb)->frags[0]));
 
-		req = RING_GET_REQUEST(&np->rx, req_prod + i);
+		req = RING_GET_REQUEST(&queue->rx, req_prod + i);
 		gnttab_grant_foreign_access_ref(ref,
-						np->xbdev->otherend_id,
+						queue->info->xbdev->otherend_id,
 						pfn_to_mfn(pfn),
 						0);
 
@@ -337,72 +355,76 @@ no_skb:
 	wmb();		/* barrier so backend seens requests */
 
 	/* Above is a suitable barrier to ensure backend will see requests. */
-	np->rx.req_prod_pvt = req_prod + i;
+	queue->rx.req_prod_pvt = req_prod + i;
  push:
-	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->rx, notify);
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&queue->rx, notify);
 	if (notify)
-		notify_remote_via_irq(np->rx_irq);
+		notify_remote_via_irq(queue->rx_irq);
 }
 
 static int xennet_open(struct net_device *dev)
 {
 	struct netfront_info *np = netdev_priv(dev);
-
-	napi_enable(&np->napi);
-
-	spin_lock_bh(&np->rx_lock);
-	if (netif_carrier_ok(dev)) {
-		xennet_alloc_rx_buffers(dev);
-		np->rx.sring->rsp_event = np->rx.rsp_cons + 1;
-		if (RING_HAS_UNCONSUMED_RESPONSES(&np->rx))
-			napi_schedule(&np->napi);
+	unsigned int i = 0;
+	struct netfront_queue *queue = NULL;
+
+	for (i = 0; i < np->num_queues; ++i) {
+		queue = &np->queues[i];
+		napi_enable(&queue->napi);
+
+		spin_lock_bh(&queue->rx_lock);
+		if (netif_carrier_ok(dev)) {
+			xennet_alloc_rx_buffers(queue);
+			queue->rx.sring->rsp_event = queue->rx.rsp_cons + 1;
+			if (RING_HAS_UNCONSUMED_RESPONSES(&queue->rx))
+				napi_schedule(&queue->napi);
+		}
+		spin_unlock_bh(&queue->rx_lock);
 	}
-	spin_unlock_bh(&np->rx_lock);
 
-	netif_start_queue(dev);
+	netif_tx_start_all_queues(dev);
 
 	return 0;
 }
 
-static void xennet_tx_buf_gc(struct net_device *dev)
+static void xennet_tx_buf_gc(struct netfront_queue *queue)
 {
 	RING_IDX cons, prod;
 	unsigned short id;
-	struct netfront_info *np = netdev_priv(dev);
 	struct sk_buff *skb;
 
-	BUG_ON(!netif_carrier_ok(dev));
+	BUG_ON(!netif_carrier_ok(queue->info->netdev));
 
 	do {
-		prod = np->tx.sring->rsp_prod;
+		prod = queue->tx.sring->rsp_prod;
 		rmb(); /* Ensure we see responses up to 'rp'. */
 
-		for (cons = np->tx.rsp_cons; cons != prod; cons++) {
+		for (cons = queue->tx.rsp_cons; cons != prod; cons++) {
 			struct xen_netif_tx_response *txrsp;
 
-			txrsp = RING_GET_RESPONSE(&np->tx, cons);
+			txrsp = RING_GET_RESPONSE(&queue->tx, cons);
 			if (txrsp->status == XEN_NETIF_RSP_NULL)
 				continue;
 
 			id  = txrsp->id;
-			skb = np->tx_skbs[id].skb;
+			skb = queue->tx_skbs[id].skb;
 			if (unlikely(gnttab_query_foreign_access(
-				np->grant_tx_ref[id]) != 0)) {
+				queue->grant_tx_ref[id]) != 0)) {
 				pr_alert("%s: warning -- grant still in use by backend domain\n",
 					 __func__);
 				BUG();
 			}
 			gnttab_end_foreign_access_ref(
-				np->grant_tx_ref[id], GNTMAP_readonly);
+				queue->grant_tx_ref[id], GNTMAP_readonly);
 			gnttab_release_grant_reference(
-				&np->gref_tx_head, np->grant_tx_ref[id]);
-			np->grant_tx_ref[id] = GRANT_INVALID_REF;
-			np->grant_tx_page[id] = NULL;
-			add_id_to_freelist(&np->tx_skb_freelist, np->tx_skbs, id);
+				&queue->gref_tx_head, queue->grant_tx_ref[id]);
+			queue->grant_tx_ref[id] = GRANT_INVALID_REF;
+			queue->grant_tx_page[id] = NULL;
+			add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, id);
 			dev_kfree_skb_irq(skb);
 		}
 
-		np->tx.rsp_cons = prod;
+		queue->tx.rsp_cons = prod;
 
 		/*
 		 * Set a new event, then check for race with update of tx_cons.
@@ -412,21 +434,20 @@ static void xennet_tx_buf_gc(struct net_device *dev)
 		 * data is outstanding: in such cases notification from Xen is
 		 * likely to be the only kick that we'll get.
 		 */
-		np->tx.sring->rsp_event =
-			prod + ((np->tx.sring->req_prod - prod) >> 1) + 1;
+		queue->tx.sring->rsp_event =
+			prod + ((queue->tx.sring->req_prod - prod) >> 1) + 1;
 		mb();		/* update shared area */
-	} while ((cons == prod) && (prod != np->tx.sring->rsp_prod));
+	} while ((cons == prod) && (prod != queue->tx.sring->rsp_prod));
 
-	xennet_maybe_wake_tx(dev);
+	xennet_maybe_wake_tx(queue);
 }
 
-static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
+static void xennet_make_frags(struct sk_buff *skb, struct netfront_queue *queue,
 			      struct xen_netif_tx_request *tx)
 {
-	struct netfront_info *np = netdev_priv(dev);
 	char *data = skb->data;
 	unsigned long mfn;
-	RING_IDX prod = np->tx.req_prod_pvt;
+	RING_IDX prod = queue->tx.req_prod_pvt;
 	int frags = skb_shinfo(skb)->nr_frags;
 	unsigned int offset = offset_in_page(data);
 	unsigned int len = skb_headlen(skb);
@@ -443,19 +464,19 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 		data += tx->size;
 		offset = 0;
 
-		id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
-		np->tx_skbs[id].skb = skb_get(skb);
-		tx = RING_GET_REQUEST(&np->tx, prod++);
+		id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
+		queue->tx_skbs[id].skb = skb_get(skb);
+		tx = RING_GET_REQUEST(&queue->tx, prod++);
 		tx->id = id;
-		ref = gnttab_claim_grant_reference(&np->gref_tx_head);
+		ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
 		BUG_ON((signed short)ref < 0);
 
 		mfn = virt_to_mfn(data);
-		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
+		gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
 						mfn, GNTMAP_readonly);
 
-		np->grant_tx_page[id] = virt_to_page(data);
-		tx->gref = np->grant_tx_ref[id] = ref;
+		queue->grant_tx_page[id] = virt_to_page(data);
+		tx->gref = queue->grant_tx_ref[id] = ref;
 		tx->offset = offset;
 		tx->size = len;
 		tx->flags = 0;
@@ -487,21 +508,21 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 
 			tx->flags |= XEN_NETTXF_more_data;
 
-			id = get_id_from_freelist(&np->tx_skb_freelist,
-						  np->tx_skbs);
-			np->tx_skbs[id].skb = skb_get(skb);
-			tx = RING_GET_REQUEST(&np->tx, prod++);
+			id = get_id_from_freelist(&queue->tx_skb_freelist,
+						  queue->tx_skbs);
+			queue->tx_skbs[id].skb = skb_get(skb);
+			tx = RING_GET_REQUEST(&queue->tx, prod++);
 			tx->id = id;
-			ref = gnttab_claim_grant_reference(&np->gref_tx_head);
+			ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
 			BUG_ON((signed short)ref < 0);
 
 			mfn = pfn_to_mfn(page_to_pfn(page));
 			gnttab_grant_foreign_access_ref(ref,
-							np->xbdev->otherend_id,
+							queue->info->xbdev->otherend_id,
 							mfn, GNTMAP_readonly);
 
-			np->grant_tx_page[id] = page;
-			tx->gref = np->grant_tx_ref[id] = ref;
+			queue->grant_tx_page[id] = page;
+			tx->gref = queue->grant_tx_ref[id] = ref;
 			tx->offset = offset;
 			tx->size = bytes;
 			tx->flags = 0;
@@ -518,7 +539,7 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 		}
 	}
 
-	np->tx.req_prod_pvt = prod;
+	queue->tx.req_prod_pvt = prod;
 }
 
 /*
@@ -544,6 +565,12 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
 	return pages;
 }
 
+static u16 xennet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	/* Stub for later implementation of queue selection */
+	return 0;
+}
+
 static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	unsigned short id;
@@ -559,6 +586,15 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	unsigned int offset = offset_in_page(data);
 	unsigned int len = skb_headlen(skb);
 	unsigned long flags;
+	struct netfront_queue *queue = NULL;
+	u16 queue_index;
+
+	/* Drop the packet if no queues are set up */
+	if (np->num_queues < 1)
+		goto drop;
+	/* Determine which queue to transmit this SKB on */
+	queue_index = skb_get_queue_mapping(skb);
+	queue = &np->queues[queue_index];
 
 	/* If skb->len is too big for wire format, drop skb and alert
 	 * user about misconfiguration.
@@ -578,30 +614,30 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 	}
 
-	spin_lock_irqsave(&np->tx_lock, flags);
+	spin_lock_irqsave(&queue->tx_lock, flags);
 
 	if (unlikely(!netif_carrier_ok(dev) ||
 		     (slots > 1 && !xennet_can_sg(dev)) ||
 		     netif_needs_gso(skb, netif_skb_features(skb)))) {
-		spin_unlock_irqrestore(&np->tx_lock, flags);
+		spin_unlock_irqrestore(&queue->tx_lock, flags);
 		goto drop;
 	}
 
-	i = np->tx.req_prod_pvt;
+	i = queue->tx.req_prod_pvt;
 
-	id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
-	np->tx_skbs[id].skb = skb;
+	id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
+	queue->tx_skbs[id].skb = skb;
 
-	tx = RING_GET_REQUEST(&np->tx, i);
+	tx = RING_GET_REQUEST(&queue->tx, i);
 
 	tx->id   = id;
-	ref = gnttab_claim_grant_reference(&np->gref_tx_head);
+	ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
 	BUG_ON((signed short)ref < 0);
 	mfn = virt_to_mfn(data);
 	gnttab_grant_foreign_access_ref(
-		ref, np->xbdev->otherend_id, mfn, GNTMAP_readonly);
-	np->grant_tx_page[id] = virt_to_page(data);
-	tx->gref = np->grant_tx_ref[id] = ref;
+		ref, queue->info->xbdev->otherend_id, mfn, GNTMAP_readonly);
+	queue->grant_tx_page[id] = virt_to_page(data);
+	tx->gref = queue->grant_tx_ref[id] = ref;
 	tx->offset = offset;
 	tx->size = len;
 
@@ -617,7 +653,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		struct xen_netif_extra_info *gso;
 
 		gso = (struct xen_netif_extra_info *)
-			RING_GET_REQUEST(&np->tx, ++i);
+			RING_GET_REQUEST(&queue->tx, ++i);
 
 		tx->flags |= XEN_NETTXF_extra_info;
 
@@ -632,14 +668,14 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		gso->flags = 0;
 	}
 
-	np->tx.req_prod_pvt = i + 1;
+	queue->tx.req_prod_pvt = i + 1;
 
-	xennet_make_frags(skb, dev, tx);
+	xennet_make_frags(skb, queue, tx);
 	tx->size = skb->len;
 
-	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->tx, notify);
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&queue->tx, notify);
 	if (notify)
-		notify_remote_via_irq(np->tx_irq);
+		notify_remote_via_irq(queue->tx_irq);
 
 	u64_stats_update_begin(&stats->syncp);
 	stats->tx_bytes += skb->len;
@@ -647,12 +683,12 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	u64_stats_update_end(&stats->syncp);
 
 	/* Note: It is not safe to access skb after xennet_tx_buf_gc()! */
-	xennet_tx_buf_gc(dev);
+	xennet_tx_buf_gc(queue);
 
-	if (!netfront_tx_slot_available(np))
-		netif_stop_queue(dev);
+	if (!netfront_tx_slot_available(queue))
+		netif_tx_stop_queue(netdev_get_tx_queue(dev, queue->id));
 
-	spin_unlock_irqrestore(&np->tx_lock, flags);
+	spin_unlock_irqrestore(&queue->tx_lock, flags);
 
 	return NETDEV_TX_OK;
 
@@ -665,32 +701,37 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 static int xennet_close(struct net_device *dev)
 {
 	struct netfront_info *np = netdev_priv(dev);
-	netif_stop_queue(np->netdev);
-	napi_disable(&np->napi);
+	unsigned int i;
+	struct netfront_queue *queue;
+	netif_tx_stop_all_queues(np->netdev);
+	for (i = 0; i < np->num_queues; ++i) {
+		queue = &np->queues[i];
+		napi_disable(&queue->napi);
+	}
 	return 0;
 }
 
-static void xennet_move_rx_slot(struct netfront_info *np, struct sk_buff *skb,
+static void xennet_move_rx_slot(struct netfront_queue *queue, struct sk_buff *skb,
 				grant_ref_t ref)
 {
-	int new = xennet_rxidx(np->rx.req_prod_pvt);
-
-	BUG_ON(np->rx_skbs[new]);
-	np->rx_skbs[new] = skb;
-	np->grant_rx_ref[new] = ref;
-	RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id = new;
-	RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref = ref;
-	np->rx.req_prod_pvt++;
+	int new = xennet_rxidx(queue->rx.req_prod_pvt);
+
+	BUG_ON(queue->rx_skbs[new]);
+	queue->rx_skbs[new] = skb;
+	queue->grant_rx_ref[new] = ref;
+	RING_GET_REQUEST(&queue->rx, queue->rx.req_prod_pvt)->id = new;
+	RING_GET_REQUEST(&queue->rx, queue->rx.req_prod_pvt)->gref = ref;
+	queue->rx.req_prod_pvt++;
 }
 
-static int xennet_get_extras(struct netfront_info *np,
+static int xennet_get_extras(struct netfront_queue *queue,
 			     struct xen_netif_extra_info *extras,
 			     RING_IDX rp)
 
 {
 	struct xen_netif_extra_info *extra;
-	struct device *dev = &np->netdev->dev;
-	RING_IDX cons = np->rx.rsp_cons;
+	struct device *dev = &queue->info->netdev->dev;
+	RING_IDX cons = queue->rx.rsp_cons;
 	int err = 0;
 
 	do {
@@ -705,7 +746,7 @@ static int xennet_get_extras(struct netfront_info *np,
 		}
 
 		extra = (struct xen_netif_extra_info *)
-			RING_GET_RESPONSE(&np->rx, ++cons);
+			RING_GET_RESPONSE(&queue->rx, ++cons);
 
 		if (unlikely(!extra->type ||
 			     extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
@@ -718,33 +759,33 @@ static int xennet_get_extras(struct netfront_info *np,
 			       sizeof(*extra));
 		}
 
-		skb = xennet_get_rx_skb(np, cons);
-		ref = xennet_get_rx_ref(np, cons);
-		xennet_move_rx_slot(np, skb, ref);
+		skb = xennet_get_rx_skb(queue, cons);
+		ref = xennet_get_rx_ref(queue, cons);
+		xennet_move_rx_slot(queue, skb, ref);
 	} while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE);
 
-	np->rx.rsp_cons = cons;
+	queue->rx.rsp_cons = cons;
 	return err;
 }
 
-static int xennet_get_responses(struct netfront_info *np,
+static int xennet_get_responses(struct netfront_queue *queue,
 				struct netfront_rx_info *rinfo, RING_IDX rp,
 				struct sk_buff_head *list)
 {
 	struct xen_netif_rx_response *rx = &rinfo->rx;
 	struct xen_netif_extra_info *extras = rinfo->extras;
-	struct device *dev = &np->netdev->dev;
-	RING_IDX cons = np->rx.rsp_cons;
-	struct sk_buff *skb = xennet_get_rx_skb(np, cons);
-	grant_ref_t ref = xennet_get_rx_ref(np, cons);
+	struct device *dev = &queue->info->netdev->dev;
+	RING_IDX cons = queue->rx.rsp_cons;
+	struct sk_buff *skb = xennet_get_rx_skb(queue, cons);
+	grant_ref_t ref = xennet_get_rx_ref(queue, cons);
 	int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD);
 	int slots = 1;
 	int err = 0;
 	unsigned long ret;
 
 	if (rx->flags & XEN_NETRXF_extra_info) {
-		err = xennet_get_extras(np, extras, rp);
-		cons = np->rx.rsp_cons;
+		err = xennet_get_extras(queue, extras, rp);
+		cons = queue->rx.rsp_cons;
 	}
 
 	for (;;) {
@@ -753,7 +794,7 @@ static int xennet_get_responses(struct netfront_info *np,
 			if (net_ratelimit())
 				dev_warn(dev, "rx->offset: %x, size: %u\n",
 					 rx->offset, rx->status);
-			xennet_move_rx_slot(np, skb, ref);
+			xennet_move_rx_slot(queue, skb, ref);
 			err = -EINVAL;
 			goto next;
 		}
@@ -774,7 +815,7 @@ static int xennet_get_responses(struct netfront_info *np,
 		ret = gnttab_end_foreign_access_ref(ref, 0);
 		BUG_ON(!ret);
 
-		gnttab_release_grant_reference(&np->gref_rx_head, ref);
+		gnttab_release_grant_reference(&queue->gref_rx_head, ref);
 
 		__skb_queue_tail(list, skb);
 
@@ -789,9 +830,9 @@ next:
 			break;
 		}
 
-		rx = RING_GET_RESPONSE(&np->rx, cons + slots);
-		skb = xennet_get_rx_skb(np, cons + slots);
-		ref = xennet_get_rx_ref(np, cons + slots);
+		rx = RING_GET_RESPONSE(&queue->rx, cons + slots);
+		skb = xennet_get_rx_skb(queue, cons + slots);
+		ref = xennet_get_rx_ref(queue, cons + slots);
 		slots++;
 	}
 
@@ -802,7 +843,7 @@ next:
 	}
 
 	if (unlikely(err))
-		np->rx.rsp_cons = cons + slots;
+		queue->rx.rsp_cons = cons + slots;
 
 	return err;
 }
@@ -836,17 +877,17 @@ static int xennet_set_skb_gso(struct sk_buff *skb,
 	return 0;
 }
 
-static RING_IDX xennet_fill_frags(struct netfront_info *np,
+static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
 				  struct sk_buff *skb,
 				  struct sk_buff_head *list)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
-	RING_IDX cons = np->rx.rsp_cons;
+	RING_IDX cons = queue->rx.rsp_cons;
 	struct sk_buff *nskb;
 
 	while ((nskb = __skb_dequeue(list))) {
 		struct xen_netif_rx_response *rx =
-			RING_GET_RESPONSE(&np->rx, ++cons);
+			RING_GET_RESPONSE(&queue->rx, ++cons);
 		skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
 
 		if (shinfo->nr_frags == MAX_SKB_FRAGS) {
@@ -879,7 +920,7 @@ static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
 	 */
 	if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
 		struct netfront_info *np = netdev_priv(dev);
-		np->rx_gso_checksum_fixup++;
+		atomic_inc(&np->rx_gso_checksum_fixup)
 		skb->ip_summed = CHECKSUM_PARTIAL;
 		recalculate_partial_csum = true;
 	}
@@ -891,11 +932,10 @@ static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
 	return skb_checksum_setup(skb, recalculate_partial_csum);
 }
 
-static int handle_incoming_queue(struct net_device *dev,
+static int handle_incoming_queue(struct netfront_queue *queue,
 				 struct sk_buff_head *rxq)
 {
-	struct netfront_info *np = netdev_priv(dev);
-	struct netfront_stats *stats = this_cpu_ptr(np->stats);
+	struct netfront_stats *stats = this_cpu_ptr(queue->info->stats);
 	int packets_dropped = 0;
 	struct sk_buff *skb;
 
@@ -906,12 +946,12 @@ static int handle_incoming_queue(struct net_device *dev,
 			__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
 
 		/* Ethernet work: Delayed to here as it peeks the header. */
-		skb->protocol = eth_type_trans(skb, dev);
+		skb->protocol = eth_type_trans(skb, queue->info->netdev);
 
-		if (checksum_setup(dev, skb)) {
+		if (checksum_setup(queue->info->netdev, skb)) {
 			kfree_skb(skb);
 			packets_dropped++;
-			dev->stats.rx_errors++;
+			queue->info->netdev->stats.rx_errors++;
 			continue;
 		}
 
@@ -921,7 +961,7 @@ static int handle_incoming_queue(struct net_device *dev,
 		u64_stats_update_end(&stats->syncp);
 
 		/* Pass it up. */
-		napi_gro_receive(&np->napi, skb);
+		napi_gro_receive(&queue->napi, skb);
 	}
 
 	return packets_dropped;
@@ -929,8 +969,8 @@ static int handle_incoming_queue(struct net_device *dev,
 
 static int xennet_poll(struct napi_struct *napi, int budget)
 {
-	struct netfront_info *np = container_of(napi, struct netfront_info, napi);
-	struct net_device *dev = np->netdev;
+	struct netfront_queue *queue = container_of(napi, struct netfront_queue, napi);
+	struct net_device *dev = queue->info->netdev;
 	struct sk_buff *skb;
 	struct netfront_rx_info rinfo;
 	struct xen_netif_rx_response *rx = &rinfo.rx;
@@ -943,29 +983,29 @@ static int xennet_poll(struct napi_struct *napi, int budget)
 	unsigned long flags;
 	int err;
 
-	spin_lock(&np->rx_lock);
+	spin_lock(&queue->rx_lock);
 
 	skb_queue_head_init(&rxq);
 	skb_queue_head_init(&errq);
 	skb_queue_head_init(&tmpq);
 
-	rp = np->rx.sring->rsp_prod;
+	rp = queue->rx.sring->rsp_prod;
 	rmb(); /* Ensure we see queued responses up to 'rp'. */
 
-	i = np->rx.rsp_cons;
+	i = queue->rx.rsp_cons;
 	work_done = 0;
 	while ((i != rp) && (work_done < budget)) {
-		memcpy(rx, RING_GET_RESPONSE(&np->rx, i), sizeof(*rx));
+		memcpy(rx, RING_GET_RESPONSE(&queue->rx, i), sizeof(*rx));
 		memset(extras, 0, sizeof(rinfo.extras));
 
-		err = xennet_get_responses(np, &rinfo, rp, &tmpq);
+		err = xennet_get_responses(queue, &rinfo, rp, &tmpq);
 
 		if (unlikely(err)) {
 err:
 			while ((skb = __skb_dequeue(&tmpq)))
 				__skb_queue_tail(&errq, skb);
 			dev->stats.rx_errors++;
-			i = np->rx.rsp_cons;
+			i = queue->rx.rsp_cons;
 			continue;
 		}
 
@@ -977,7 +1017,7 @@ err:
 
 			if (unlikely(xennet_set_skb_gso(skb, gso))) {
 				__skb_queue_head(&tmpq, skb);
-				np->rx.rsp_cons += skb_queue_len(&tmpq);
+				queue->rx.rsp_cons += skb_queue_len(&tmpq);
 				goto err;
 			}
 		}
@@ -991,7 +1031,7 @@ err:
 		skb->data_len = rx->status;
 		skb->len += rx->status;
 
-		i = xennet_fill_frags(np, skb, &tmpq);
+		i = xennet_fill_frags(queue, skb, &tmpq);
 
 		if (rx->flags & XEN_NETRXF_csum_blank)
 			skb->ip_summed = CHECKSUM_PARTIAL;
@@ -1000,22 +1040,22 @@ err:
 
 		__skb_queue_tail(&rxq, skb);
 
-		np->rx.rsp_cons = ++i;
+		queue->rx.rsp_cons = ++i;
 		work_done++;
 	}
 
 	__skb_queue_purge(&errq);
 
-	work_done -= handle_incoming_queue(dev, &rxq);
+	work_done -= handle_incoming_queue(queue, &rxq);
 
 	/* If we get a callback with very few responses, reduce fill target. */
 	/* NB. Note exponential increase, linear decrease. */
-	if (((np->rx.req_prod_pvt - np->rx.sring->rsp_prod) >
-	     ((3*np->rx_target) / 4)) &&
-	    (--np->rx_target < np->rx_min_target))
-		np->rx_target = np->rx_min_target;
+	if (((queue->rx.req_prod_pvt - queue->rx.sring->rsp_prod) >
+	     ((3*queue->rx_target) / 4)) &&
+	    (--queue->rx_target < queue->rx_min_target))
+		queue->rx_target = queue->rx_min_target;
 
-	xennet_alloc_rx_buffers(dev);
+	xennet_alloc_rx_buffers(queue);
 
 	if (work_done < budget) {
 		int more_to_do = 0;
@@ -1024,14 +1064,14 @@ err:
 
 		local_irq_save(flags);
 
-		RING_FINAL_CHECK_FOR_RESPONSES(&np->rx, more_to_do);
+		RING_FINAL_CHECK_FOR_RESPONSES(&queue->rx, more_to_do);
 		if (!more_to_do)
 			__napi_complete(napi);
 
 		local_irq_restore(flags);
 	}
 
-	spin_unlock(&np->rx_lock);
+	spin_unlock(&queue->rx_lock);
 
 	return work_done;
 }
@@ -1079,43 +1119,43 @@ static struct rtnl_link_stats64 *xennet_get_stats64(struct net_device *dev,
 	return tot;
 }
 
-static void xennet_release_tx_bufs(struct netfront_info *np)
+static void xennet_release_tx_bufs(struct netfront_queue *queue)
 {
 	struct sk_buff *skb;
 	int i;
 
 	for (i = 0; i < NET_TX_RING_SIZE; i++) {
 		/* Skip over entries which are actually freelist references */
-		if (skb_entry_is_link(&np->tx_skbs[i]))
+		if (skb_entry_is_link(&queue->tx_skbs[i]))
 			continue;
 
-		skb = np->tx_skbs[i].skb;
-		get_page(np->grant_tx_page[i]);
-		gnttab_end_foreign_access(np->grant_tx_ref[i],
+		skb = queue->tx_skbs[i].skb;
+		get_page(queue->grant_tx_page[i]);
+		gnttab_end_foreign_access(queue->grant_tx_ref[i],
 					  GNTMAP_readonly,
-					  (unsigned long)page_address(np->grant_tx_page[i]));
-		np->grant_tx_page[i] = NULL;
-		np->grant_tx_ref[i] = GRANT_INVALID_REF;
-		add_id_to_freelist(&np->tx_skb_freelist, np->tx_skbs, i);
+					  (unsigned long)page_address(queue->grant_tx_page[i]));
+		queue->grant_tx_page[i] = NULL;
+		queue->grant_tx_ref[i] = GRANT_INVALID_REF;
+		add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, i);
 		dev_kfree_skb_irq(skb);
 	}
 }
 
-static void xennet_release_rx_bufs(struct netfront_info *np)
+static void xennet_release_rx_bufs(struct netfront_queue *queue)
 {
 	int id, ref;
 
-	spin_lock_bh(&np->rx_lock);
+	spin_lock_bh(&queue->rx_lock);
 
 	for (id = 0; id < NET_RX_RING_SIZE; id++) {
 		struct sk_buff *skb;
 		struct page *page;
 
-		skb = np->rx_skbs[id];
+		skb = queue->rx_skbs[id];
 		if (!skb)
 			continue;
 
-		ref = np->grant_rx_ref[id];
+		ref = queue->grant_rx_ref[id];
 		if (ref == GRANT_INVALID_REF)
 			continue;
 
@@ -1127,21 +1167,27 @@ static void xennet_release_rx_bufs(struct netfront_info *np)
 		get_page(page);
 		gnttab_end_foreign_access(ref, 0,
 					  (unsigned long)page_address(page));
-		np->grant_rx_ref[id] = GRANT_INVALID_REF;
+		queue->grant_rx_ref[id] = GRANT_INVALID_REF;
 
 		kfree_skb(skb);
 	}
 
-	spin_unlock_bh(&np->rx_lock);
+	spin_unlock_bh(&queue->rx_lock);
 }
 
 static void xennet_uninit(struct net_device *dev)
 {
 	struct netfront_info *np = netdev_priv(dev);
-	xennet_release_tx_bufs(np);
-	xennet_release_rx_bufs(np);
-	gnttab_free_grant_references(np->gref_tx_head);
-	gnttab_free_grant_references(np->gref_rx_head);
+	struct netfront_queue *queue;
+	unsigned int i;
+
+	for (i = 0; i < np->num_queues; ++i) {
+		queue = &np->queues[i];
+		xennet_release_tx_bufs(queue);
+		xennet_release_rx_bufs(queue);
+		gnttab_free_grant_references(queue->gref_tx_head);
+		gnttab_free_grant_references(queue->gref_rx_head);
+	}
 }
 
 static netdev_features_t xennet_fix_features(struct net_device *dev,
@@ -1202,25 +1248,24 @@ static int xennet_set_features(struct net_device *dev,
 
 static irqreturn_t xennet_tx_interrupt(int irq, void *dev_id)
 {
-	struct netfront_info *np = dev_id;
-	struct net_device *dev = np->netdev;
+	struct netfront_queue *queue = dev_id;
 	unsigned long flags;
 
-	spin_lock_irqsave(&np->tx_lock, flags);
-	xennet_tx_buf_gc(dev);
-	spin_unlock_irqrestore(&np->tx_lock, flags);
+	spin_lock_irqsave(&queue->tx_lock, flags);
+	xennet_tx_buf_gc(queue);
+	spin_unlock_irqrestore(&queue->tx_lock, flags);
 
 	return IRQ_HANDLED;
 }
 
 static irqreturn_t xennet_rx_interrupt(int irq, void *dev_id)
 {
-	struct netfront_info *np = dev_id;
-	struct net_device *dev = np->netdev;
+	struct netfront_queue *queue = dev_id;
+	struct net_device *dev = queue->info->netdev;
 
 	if (likely(netif_carrier_ok(dev) &&
-		   RING_HAS_UNCONSUMED_RESPONSES(&np->rx)))
-			napi_schedule(&np->napi);
+		   RING_HAS_UNCONSUMED_RESPONSES(&queue->rx)))
+			napi_schedule(&queue->napi);
 
 	return IRQ_HANDLED;
 }
@@ -1235,7 +1280,11 @@ static irqreturn_t xennet_interrupt(int irq, void *dev_id)
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void xennet_poll_controller(struct net_device *dev)
 {
-	xennet_interrupt(0, dev);
+	/* Poll each queue */
+	struct netfront_info *info = netdev_priv(dev);
+	unsigned int i;
+	for (i = 0; i < info->num_queues; ++i)
+		xennet_interrupt(0, &info->queues[i]);
 }
 #endif
 
@@ -1250,6 +1299,7 @@ static const struct net_device_ops xennet_netdev_ops = {
 	.ndo_validate_addr   = eth_validate_addr,
 	.ndo_fix_features    = xennet_fix_features,
 	.ndo_set_features    = xennet_set_features,
+	.ndo_select_queue    = xennet_select_queue,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller = xennet_poll_controller,
 #endif
@@ -1261,24 +1311,15 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	struct net_device *netdev;
 	struct netfront_info *np;
 
-	netdev = alloc_etherdev(sizeof(struct netfront_info));
+	netdev = alloc_etherdev_mq(sizeof(struct netfront_info), 1);
 	if (!netdev)
 		return ERR_PTR(-ENOMEM);
 
 	np                   = netdev_priv(netdev);
 	np->xbdev            = dev;
 
-	spin_lock_init(&np->tx_lock);
-	spin_lock_init(&np->rx_lock);
-
-	skb_queue_head_init(&np->rx_batch);
-	np->rx_target     = RX_DFL_MIN_TARGET;
-	np->rx_min_target = RX_DFL_MIN_TARGET;
-	np->rx_max_target = RX_MAX_TARGET;
-
-	init_timer(&np->rx_refill_timer);
-	np->rx_refill_timer.data = (unsigned long)netdev;
-	np->rx_refill_timer.function = rx_refill_timeout;
+	np->num_queues = 0;
+	np->queues = NULL;
 
 	err = -ENOMEM;
 	np->stats = alloc_percpu(struct netfront_stats);
@@ -1291,38 +1332,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 		u64_stats_init(&xen_nf_stats->syncp);
 	}
 
-	/* Initialise tx_skbs as a free chain containing every entry. */
-	np->tx_skb_freelist = 0;
-	for (i = 0; i < NET_TX_RING_SIZE; i++) {
-		skb_entry_set_link(&np->tx_skbs[i], i+1);
-		np->grant_tx_ref[i] = GRANT_INVALID_REF;
-	}
-
-	/* Clear out rx_skbs */
-	for (i = 0; i < NET_RX_RING_SIZE; i++) {
-		np->rx_skbs[i] = NULL;
-		np->grant_rx_ref[i] = GRANT_INVALID_REF;
-		np->grant_tx_page[i] = NULL;
-	}
-
-	/* A grant for every tx ring slot */
-	if (gnttab_alloc_grant_references(TX_MAX_TARGET,
-					  &np->gref_tx_head) < 0) {
-		pr_alert("can't alloc tx grant refs\n");
-		err = -ENOMEM;
-		goto exit_free_stats;
-	}
-	/* A grant for every rx ring slot */
-	if (gnttab_alloc_grant_references(RX_MAX_TARGET,
-					  &np->gref_rx_head) < 0) {
-		pr_alert("can't alloc rx grant refs\n");
-		err = -ENOMEM;
-		goto exit_free_tx;
-	}
-
 	netdev->netdev_ops	= &xennet_netdev_ops;
 
-	netif_napi_add(netdev, &np->napi, xennet_poll, 64);
 	netdev->features        = NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
 				  NETIF_F_GSO_ROBUST;
 	netdev->hw_features	= NETIF_F_SG |
@@ -1348,10 +1359,6 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 
 	return netdev;
 
- exit_free_tx:
-	gnttab_free_grant_references(np->gref_tx_head);
- exit_free_stats:
-	free_percpu(np->stats);
  exit:
 	free_netdev(netdev);
 	return ERR_PTR(err);
@@ -1409,30 +1416,35 @@ static void xennet_end_access(int ref, void *page)
 
 static void xennet_disconnect_backend(struct netfront_info *info)
 {
-	/* Stop old i/f to prevent errors whilst we rebuild the state. */
-	spin_lock_bh(&info->rx_lock);
-	spin_lock_irq(&info->tx_lock);
-	netif_carrier_off(info->netdev);
-	spin_unlock_irq(&info->tx_lock);
-	spin_unlock_bh(&info->rx_lock);
-
-	if (info->tx_irq && (info->tx_irq == info->rx_irq))
-		unbind_from_irqhandler(info->tx_irq, info);
-	if (info->tx_irq && (info->tx_irq != info->rx_irq)) {
-		unbind_from_irqhandler(info->tx_irq, info);
-		unbind_from_irqhandler(info->rx_irq, info);
-	}
-	info->tx_evtchn = info->rx_evtchn = 0;
-	info->tx_irq = info->rx_irq = 0;
+	unsigned int i = 0;
+	struct netfront_queue *queue = NULL;
+
+	for (i = 0; i < info->num_queues; ++i) {
+		/* Stop old i/f to prevent errors whilst we rebuild the state. */
+		spin_lock_bh(&queue->rx_lock);
+		spin_lock_irq(&queue->tx_lock);
+		netif_carrier_off(queue->info->netdev);
+		spin_unlock_irq(&queue->tx_lock);
+		spin_unlock_bh(&queue->rx_lock);
+
+		if (queue->tx_irq && (queue->tx_irq == queue->rx_irq))
+			unbind_from_irqhandler(queue->tx_irq, queue);
+		if (queue->tx_irq && (queue->tx_irq != queue->rx_irq)) {
+			unbind_from_irqhandler(queue->tx_irq, queue);
+			unbind_from_irqhandler(queue->rx_irq, queue);
+		}
+		queue->tx_evtchn = queue->rx_evtchn = 0;
+		queue->tx_irq = queue->rx_irq = 0;
 
-	/* End access and free the pages */
-	xennet_end_access(info->tx_ring_ref, info->tx.sring);
-	xennet_end_access(info->rx_ring_ref, info->rx.sring);
+		/* End access and free the pages */
+		xennet_end_access(queue->tx_ring_ref, queue->tx.sring);
+		xennet_end_access(queue->rx_ring_ref, queue->rx.sring);
 
-	info->tx_ring_ref = GRANT_INVALID_REF;
-	info->rx_ring_ref = GRANT_INVALID_REF;
-	info->tx.sring = NULL;
-	info->rx.sring = NULL;
+		queue->tx_ring_ref = GRANT_INVALID_REF;
+		queue->rx_ring_ref = GRANT_INVALID_REF;
+		queue->tx.sring = NULL;
+		queue->rx.sring = NULL;
+	}
 }
 
 /**
@@ -1473,100 +1485,86 @@ static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
 	return 0;
 }
 
-static int setup_netfront_single(struct netfront_info *info)
+static int setup_netfront_single(struct netfront_queue *queue)
 {
 	int err;
 
-	err = xenbus_alloc_evtchn(info->xbdev, &info->tx_evtchn);
+	err = xenbus_alloc_evtchn(queue->info->xbdev, &queue->tx_evtchn);
 	if (err < 0)
 		goto fail;
 
-	err = bind_evtchn_to_irqhandler(info->tx_evtchn,
+	err = bind_evtchn_to_irqhandler(queue->tx_evtchn,
 					xennet_interrupt,
-					0, info->netdev->name, info);
+					0, queue->info->netdev->name, queue);
 	if (err < 0)
 		goto bind_fail;
-	info->rx_evtchn = info->tx_evtchn;
-	info->rx_irq = info->tx_irq = err;
+	queue->rx_evtchn = queue->tx_evtchn;
+	queue->rx_irq = queue->tx_irq = err;
 
 	return 0;
 
 bind_fail:
-	xenbus_free_evtchn(info->xbdev, info->tx_evtchn);
-	info->tx_evtchn = 0;
+	xenbus_free_evtchn(queue->info->xbdev, queue->tx_evtchn);
+	queue->tx_evtchn = 0;
 fail:
 	return err;
 }
 
-static int setup_netfront_split(struct netfront_info *info)
+static int setup_netfront_split(struct netfront_queue *queue)
 {
 	int err;
 
-	err = xenbus_alloc_evtchn(info->xbdev, &info->tx_evtchn);
+	err = xenbus_alloc_evtchn(queue->info->xbdev, &queue->tx_evtchn);
 	if (err < 0)
 		goto fail;
-	err = xenbus_alloc_evtchn(info->xbdev, &info->rx_evtchn);
+	err = xenbus_alloc_evtchn(queue->info->xbdev, &queue->rx_evtchn);
 	if (err < 0)
 		goto alloc_rx_evtchn_fail;
 
-	snprintf(info->tx_irq_name, sizeof(info->tx_irq_name),
-		 "%s-tx", info->netdev->name);
-	err = bind_evtchn_to_irqhandler(info->tx_evtchn,
+	snprintf(queue->tx_irq_name, sizeof(queue->tx_irq_name),
+		 "%s-tx", queue->name);
+	err = bind_evtchn_to_irqhandler(queue->tx_evtchn,
 					xennet_tx_interrupt,
-					0, info->tx_irq_name, info);
+					0, queue->tx_irq_name, queue);
 	if (err < 0)
 		goto bind_tx_fail;
-	info->tx_irq = err;
+	queue->tx_irq = err;
 
-	snprintf(info->rx_irq_name, sizeof(info->rx_irq_name),
-		 "%s-rx", info->netdev->name);
-	err = bind_evtchn_to_irqhandler(info->rx_evtchn,
+	snprintf(queue->rx_irq_name, sizeof(queue->rx_irq_name),
+		 "%s-rx", queue->name);
+	err = bind_evtchn_to_irqhandler(queue->rx_evtchn,
 					xennet_rx_interrupt,
-					0, info->rx_irq_name, info);
+					0, queue->rx_irq_name, queue);
 	if (err < 0)
 		goto bind_rx_fail;
-	info->rx_irq = err;
+	queue->rx_irq = err;
 
 	return 0;
 
 bind_rx_fail:
-	unbind_from_irqhandler(info->tx_irq, info);
-	info->tx_irq = 0;
+	unbind_from_irqhandler(queue->tx_irq, queue);
+	queue->tx_irq = 0;
 bind_tx_fail:
-	xenbus_free_evtchn(info->xbdev, info->rx_evtchn);
-	info->rx_evtchn = 0;
+	xenbus_free_evtchn(queue->info->xbdev, queue->rx_evtchn);
+	queue->rx_evtchn = 0;
 alloc_rx_evtchn_fail:
-	xenbus_free_evtchn(info->xbdev, info->tx_evtchn);
-	info->tx_evtchn = 0;
+	xenbus_free_evtchn(queue->info->xbdev, queue->tx_evtchn);
+	queue->tx_evtchn = 0;
 fail:
 	return err;
 }
 
-static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
+static int setup_netfront(struct xenbus_device *dev,
+			struct netfront_queue *queue, unsigned int feature_split_evtchn)
 {
 	struct xen_netif_tx_sring *txs;
 	struct xen_netif_rx_sring *rxs;
 	int err;
-	struct net_device *netdev = info->netdev;
-	unsigned int feature_split_evtchn;
 
-	info->tx_ring_ref = GRANT_INVALID_REF;
-	info->rx_ring_ref = GRANT_INVALID_REF;
-	info->rx.sring = NULL;
-	info->tx.sring = NULL;
-	netdev->irq = 0;
-
-	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-			   "feature-split-event-channels", "%u",
-			   &feature_split_evtchn);
-	if (err < 0)
-		feature_split_evtchn = 0;
-
-	err = xen_net_read_mac(dev, netdev->dev_addr);
-	if (err) {
-		xenbus_dev_fatal(dev, err, "parsing %s/mac", dev->nodename);
-		goto fail;
-	}
+	queue->tx_ring_ref = GRANT_INVALID_REF;
+	queue->rx_ring_ref = GRANT_INVALID_REF;
+	queue->rx.sring = NULL;
+	queue->tx.sring = NULL;
 
 	txs = (struct xen_netif_tx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
 	if (!txs) {
@@ -1575,13 +1573,13 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
 		goto fail;
 	}
 	SHARED_RING_INIT(txs);
-	FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE);
+	FRONT_RING_INIT(&queue->tx, txs, PAGE_SIZE);
 
 	err = xenbus_grant_ring(dev, virt_to_mfn(txs));
 	if (err < 0)
 		goto grant_tx_ring_fail;
+	queue->tx_ring_ref = err;
 
-	info->tx_ring_ref = err;
 	rxs = (struct xen_netif_rx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
 	if (!rxs) {
 		err = -ENOMEM;
@@ -1589,21 +1587,21 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
 		goto alloc_rx_ring_fail;
 	}
 	SHARED_RING_INIT(rxs);
-	FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE);
+	FRONT_RING_INIT(&queue->rx, rxs, PAGE_SIZE);
 
 	err = xenbus_grant_ring(dev, virt_to_mfn(rxs));
 	if (err < 0)
 		goto grant_rx_ring_fail;
-	info->rx_ring_ref = err;
+	queue->rx_ring_ref = err;
 
 	if (feature_split_evtchn)
-		err = setup_netfront_split(info);
+		err = setup_netfront_split(queue);
 	/* setup single event channel if
 	 *  a) feature-split-event-channels == 0
 	 *  b) feature-split-event-channels == 1 but failed to setup
 	 */
 	if (!feature_split_evtchn || (feature_split_evtchn && err))
-		err = setup_netfront_single(info);
+		err = setup_netfront_single(queue);
 
 	if (err)
 		goto alloc_evtchn_fail;
@@ -1614,17 +1612,77 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
 	 * granted pages because backend is not accessing it at this point.
 	 */
 alloc_evtchn_fail:
-	gnttab_end_foreign_access_ref(info->rx_ring_ref, 0);
+	gnttab_end_foreign_access_ref(queue->rx_ring_ref, 0);
 grant_rx_ring_fail:
 	free_page((unsigned long)rxs);
 alloc_rx_ring_fail:
-	gnttab_end_foreign_access_ref(info->tx_ring_ref, 0);
+	gnttab_end_foreign_access_ref(queue->tx_ring_ref, 0);
 grant_tx_ring_fail:
 	free_page((unsigned long)txs);
 fail:
 	return err;
 }
 
+/* Queue-specific initialisation
+ * This used to be done in xennet_create_dev() but must now
+ * be run per-queue.
+ */
+static int xennet_init_queue(struct netfront_queue *queue)
+{
+	unsigned short i;
+	int err = 0;
+
+	spin_lock_init(&queue->tx_lock);
+	spin_lock_init(&queue->rx_lock);
+
+	skb_queue_head_init(&queue->rx_batch);
+	queue->rx_target     = RX_DFL_MIN_TARGET;
+	queue->rx_min_target = RX_DFL_MIN_TARGET;
+	queue->rx_max_target = RX_MAX_TARGET;
+
+	init_timer(&queue->rx_refill_timer);
+	queue->rx_refill_timer.data = (unsigned long)queue;
+	queue->rx_refill_timer.function = rx_refill_timeout;
+
+	/* Initialise tx_skbs as a free chain containing every entry. */
+	queue->tx_skb_freelist = 0;
+	for (i = 0; i < NET_TX_RING_SIZE; i++) {
+		skb_entry_set_link(&queue->tx_skbs[i], i+1);
+		queue->grant_tx_ref[i] = GRANT_INVALID_REF;
+	}
+
+	/* Clear out rx_skbs */
+	for (i = 0; i < NET_RX_RING_SIZE; i++) {
+		queue->rx_skbs[i] = NULL;
+		queue->grant_rx_ref[i] = GRANT_INVALID_REF;
+	}
+
+	/* A grant for every tx ring slot */
+	if (gnttab_alloc_grant_references(TX_MAX_TARGET,
+					  &queue->gref_tx_head) < 0) {
+		pr_alert("can't alloc tx grant refs\n");
+		err = -ENOMEM;
+		goto exit;
+	}
+
+	/* A grant for every rx ring slot */
+	if (gnttab_alloc_grant_references(RX_MAX_TARGET,
+					  &queue->gref_rx_head) < 0) {
+		pr_alert("can't alloc rx grant refs\n");
+		err = -ENOMEM;
+		goto exit_free_tx;
+	}
+
+	netif_napi_add(queue->info->netdev, &queue->napi, xennet_poll, 64);
+
+	return 0;
+
+ exit_free_tx:
+	gnttab_free_grant_references(queue->gref_tx_head);
+ exit:
+	return err;
+}
+
 /* Common code used when first setting up, and when resuming. */
 static int talk_to_netback(struct xenbus_device *dev,
 			   struct netfront_info *info)
@@ -1632,13 +1690,72 @@ static int talk_to_netback(struct xenbus_device *dev,
 	const char *message;
 	struct xenbus_transaction xbt;
 	int err;
+	unsigned int feature_split_evtchn;
+	unsigned int i = 0;
+	struct netfront_queue *queue = NULL;
 
-	/* Create shared ring, alloc event channel. */
-	err = setup_netfront(dev, info);
-	if (err)
+	info->netdev->irq = 0;
+
+	/* Check feature-split-event-channels */
+	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+			   "feature-split-event-channels", "%u",
+			   &feature_split_evtchn);
+	if (err < 0)
+		feature_split_evtchn = 0;
+
+	/* Read mac addr. */
+	err = xen_net_read_mac(dev, info->netdev->dev_addr);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "parsing %s/mac", dev->nodename);
 		goto out;
+	}
+
+	/* Allocate array of queues */
+	info->queues = kcalloc(1, sizeof(struct netfront_queue), GFP_KERNEL);
+	if (!info->queues) {
+		err = -ENOMEM;
+		goto out;
+	}
+	info->num_queues = 1;
+
+	/* Create shared ring, alloc event channel -- for each queue */
+	for (i = 0; i < info->num_queues; ++i) {
+		queue = &info->queues[i];
+		queue->id = i;
+		queue->info = info;
+		err = xennet_init_queue(queue);
+		if (err) {
+			/* xennet_init_queue() cleans up after itself on failure,
+			 * but we still have to clean up any previously initialised
+			 * queues. If i > 0, set info->num_queues to i, then goto
+			 * destroy_ring, which calls xennet_disconnect_backend()
+			 * to tidy up.
+			 */
+			if (i > 0) {
+				info->num_queues = i;
+				goto destroy_ring;
+			} else {
+				goto out;
+			}
+		}
+		err = setup_netfront(dev, queue, feature_split_evtchn);
+		if (err) {
+			/* As for xennet_init_queue(), setup_netfront() will tidy
+			 * up the current queue on error, but we need to clean up
+			 * those already allocated.
+			 */
+			if (i > 0) {
+				info->num_queues = i;
+				goto destroy_ring;
+			} else {
+				goto out;
+			}
+		}
+	}
 
 again:
+	queue = &info->queues[0]; /* Use first queue only */
+
 	err = xenbus_transaction_start(&xbt);
 	if (err) {
 		xenbus_dev_fatal(dev, err, "starting transaction");
@@ -1646,34 +1763,34 @@ again:
 	}
 
 	err = xenbus_printf(xbt, dev->nodename, "tx-ring-ref", "%u",
-			    info->tx_ring_ref);
+			    queue->tx_ring_ref);
 	if (err) {
 		message = "writing tx ring-ref";
 		goto abort_transaction;
 	}
 	err = xenbus_printf(xbt, dev->nodename, "rx-ring-ref", "%u",
-			    info->rx_ring_ref);
+			    queue->rx_ring_ref);
 	if (err) {
 		message = "writing rx ring-ref";
 		goto abort_transaction;
 	}
 
-	if (info->tx_evtchn == info->rx_evtchn) {
+	if (queue->tx_evtchn == queue->rx_evtchn) {
 		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel", "%u", info->tx_evtchn);
+				    "event-channel", "%u", queue->tx_evtchn);
 		if (err) {
 			message = "writing event-channel";
 			goto abort_transaction;
 		}
 	} else {
 		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel-tx", "%u", info->tx_evtchn);
+				    "event-channel-tx", "%u", queue->tx_evtchn);
 		if (err) {
 			message = "writing event-channel-tx";
 			goto abort_transaction;
 		}
 		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel-rx", "%u", info->rx_evtchn);
+				    "event-channel-rx", "%u", queue->rx_evtchn);
 		if (err) {
 			message = "writing event-channel-rx";
 			goto abort_transaction;
@@ -1733,6 +1850,9 @@ again:
 	xenbus_dev_fatal(dev, err, "%s", message);
  destroy_ring:
 	xennet_disconnect_backend(info);
+	kfree(info->queues);
+	info->queues = NULL;
+	info->num_queues = 0;
  out:
 	return err;
 }
@@ -1745,6 +1865,8 @@ static int xennet_connect(struct net_device *dev)
 	grant_ref_t ref;
 	struct xen_netif_rx_request *req;
 	unsigned int feature_rx_copy;
+	unsigned int j = 0;
+	struct netfront_queue *queue = NULL;
 
 	err = xenbus_scanf(XBT_NIL, np->xbdev->otherend,
 			   "feature-rx-copy", "%u", &feature_rx_copy);
@@ -1765,36 +1887,40 @@ static int xennet_connect(struct net_device *dev)
 	netdev_update_features(dev);
 	rtnl_unlock();
 
-	spin_lock_bh(&np->rx_lock);
-	spin_lock_irq(&np->tx_lock);
+	/* By now, the queue structures have been set up */
+	for (j = 0; j < np->num_queues; ++j) {
+		queue = &np->queues[j];
+		spin_lock_bh(&queue->rx_lock);
+		spin_lock_irq(&queue->tx_lock);
 
-	/* Step 1: Discard all pending TX packet fragments. */
-	xennet_release_tx_bufs(np);
+		/* Step 1: Discard all pending TX packet fragments. */
+		xennet_release_tx_bufs(queue);
 
-	/* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */
-	for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) {
-		skb_frag_t *frag;
-		const struct page *page;
-		if (!np->rx_skbs[i])
-			continue;
+		/* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */
+		for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) {
+			skb_frag_t *frag;
+			const struct page *page;
+			if (!queue->rx_skbs[i])
+				continue;
 
-		skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i);
-		ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i);
-		req = RING_GET_REQUEST(&np->rx, requeue_idx);
+			skb = queue->rx_skbs[requeue_idx] = xennet_get_rx_skb(queue, i);
+			ref = queue->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(queue, i);
+			req = RING_GET_REQUEST(&queue->rx, requeue_idx);
 
-		frag = &skb_shinfo(skb)->frags[0];
-		page = skb_frag_page(frag);
-		gnttab_grant_foreign_access_ref(
-			ref, np->xbdev->otherend_id,
-			pfn_to_mfn(page_to_pfn(page)),
-			0);
-		req->gref = ref;
-		req->id   = requeue_idx;
+			frag = &skb_shinfo(skb)->frags[0];
+			page = skb_frag_page(frag);
+			gnttab_grant_foreign_access_ref(
+				ref, queue->info->xbdev->otherend_id,
+				pfn_to_mfn(page_to_pfn(page)),
+				0);
+			req->gref = ref;
+			req->id   = requeue_idx;
 
-		requeue_idx++;
-	}
+			requeue_idx++;
+		}
 
-	np->rx.req_prod_pvt = requeue_idx;
+		queue->rx.req_prod_pvt = requeue_idx;
+	}
 
 	/*
 	 * Step 3: All public and private state should now be sane.  Get
@@ -1803,14 +1929,17 @@ static int xennet_connect(struct net_device *dev)
 	 * packets.
 	 */
 	netif_carrier_on(np->netdev);
-	notify_remote_via_irq(np->tx_irq);
-	if (np->tx_irq != np->rx_irq)
-		notify_remote_via_irq(np->rx_irq);
-	xennet_tx_buf_gc(dev);
-	xennet_alloc_rx_buffers(dev);
-
-	spin_unlock_irq(&np->tx_lock);
-	spin_unlock_bh(&np->rx_lock);
+	for (j = 0; j < np->num_queues; ++j) {
+		queue = &np->queues[j];
+		notify_remote_via_irq(queue->tx_irq);
+		if (queue->tx_irq != queue->rx_irq)
+			notify_remote_via_irq(queue->rx_irq);
+		xennet_tx_buf_gc(queue);
+		xennet_alloc_rx_buffers(queue);
+
+		spin_unlock_irq(&queue->tx_lock);
+		spin_unlock_bh(&queue->rx_lock);
+	}
 
 	return 0;
 }
@@ -1883,7 +2012,7 @@ static void xennet_get_ethtool_stats(struct net_device *dev,
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(xennet_stats); i++)
-		data[i] = *(unsigned long *)(np + xennet_stats[i].offset);
+		data[i] = atomic_read((atomic_t *)(vif + xenvif_stats[i].offset));
 }
 
 static void xennet_get_strings(struct net_device *dev, u32 stringset, u8 * data)
@@ -1915,7 +2044,10 @@ static ssize_t show_rxbuf_min(struct device *dev,
 	struct net_device *netdev = to_net_dev(dev);
 	struct netfront_info *info = netdev_priv(netdev);
 
-	return sprintf(buf, "%u\n", info->rx_min_target);
+	if (info->num_queues)
+		return sprintf(buf, "%u\n", info->queues[0].rx_min_target);
+	else
+		return sprintf(buf, "%u\n", RX_MIN_TARGET);
 }
 
 static ssize_t store_rxbuf_min(struct device *dev,
@@ -1926,6 +2058,8 @@ static ssize_t store_rxbuf_min(struct device *dev,
 	struct netfront_info *np = netdev_priv(netdev);
 	char *endp;
 	unsigned long target;
+	unsigned int i;
+	struct netfront_queue *queue;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -1939,16 +2073,19 @@ static ssize_t store_rxbuf_min(struct device *dev,
 	if (target > RX_MAX_TARGET)
 		target = RX_MAX_TARGET;
 
-	spin_lock_bh(&np->rx_lock);
-	if (target > np->rx_max_target)
-		np->rx_max_target = target;
-	np->rx_min_target = target;
-	if (target > np->rx_target)
-		np->rx_target = target;
+	for (i = 0; i < np->num_queues; ++i) {
+		queue = &np->queues[i];
+		spin_lock_bh(&queue->rx_lock);
+		if (target > queue->rx_max_target)
+			queue->rx_max_target = target;
+		queue->rx_min_target = target;
+		if (target > queue->rx_target)
+			queue->rx_target = target;
 
-	xennet_alloc_rx_buffers(netdev);
+		xennet_alloc_rx_buffers(queue);
 
-	spin_unlock_bh(&np->rx_lock);
+		spin_unlock_bh(&queue->rx_lock);
+	}
 	return len;
 }
 
@@ -1958,7 +2095,10 @@ static ssize_t show_rxbuf_max(struct device *dev,
 	struct net_device *netdev = to_net_dev(dev);
 	struct netfront_info *info = netdev_priv(netdev);
 
-	return sprintf(buf, "%u\n", info->rx_max_target);
+	if (info->num_queues)
+		return sprintf(buf, "%u\n", info->queues[0].rx_max_target);
+	else
+		return sprintf(buf, "%u\n", RX_MAX_TARGET);
 }
 
 static ssize_t store_rxbuf_max(struct device *dev,
@@ -1969,6 +2109,8 @@ static ssize_t store_rxbuf_max(struct device *dev,
 	struct netfront_info *np = netdev_priv(netdev);
 	char *endp;
 	unsigned long target;
+	unsigned int i = 0;
+	struct netfront_queue *queue = NULL;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -1982,16 +2124,19 @@ static ssize_t store_rxbuf_max(struct device *dev,
 	if (target > RX_MAX_TARGET)
 		target = RX_MAX_TARGET;
 
-	spin_lock_bh(&np->rx_lock);
-	if (target < np->rx_min_target)
-		np->rx_min_target = target;
-	np->rx_max_target = target;
-	if (target < np->rx_target)
-		np->rx_target = target;
+	for (i = 0; i < np->num_queues; ++i) {
+		queue = &np->queues[i];
+		spin_lock_bh(&queue->rx_lock);
+		if (target < queue->rx_min_target)
+			queue->rx_min_target = target;
+		queue->rx_max_target = target;
+		if (target < queue->rx_target)
+			queue->rx_target = target;
 
-	xennet_alloc_rx_buffers(netdev);
+		xennet_alloc_rx_buffers(queue);
 
-	spin_unlock_bh(&np->rx_lock);
+		spin_unlock_bh(&queue->rx_lock);
+	}
 	return len;
 }
 
@@ -2001,7 +2146,10 @@ static ssize_t show_rxbuf_cur(struct device *dev,
 	struct net_device *netdev = to_net_dev(dev);
 	struct netfront_info *info = netdev_priv(netdev);
 
-	return sprintf(buf, "%u\n", info->rx_target);
+	if (info->num_queues)
+		return sprintf(buf, "%u\n", info->queues[0].rx_target);
+	else
+		return sprintf(buf, "0\n");
 }
 
 static struct device_attribute xennet_attrs[] = {
@@ -2048,17 +2196,27 @@ static const struct xenbus_device_id netfront_ids[] = {
 static int xennet_remove(struct xenbus_device *dev)
 {
 	struct netfront_info *info = dev_get_drvdata(&dev->dev);
+	struct netfront_queue *queue = NULL;
+	unsigned int i = 0;
 
 	dev_dbg(&dev->dev, "%s\n", dev->nodename);
 
 	xennet_disconnect_backend(info);
 
+	for (i = 0; i < info->num_queues; ++i) {
+		queue = &info->queues[i];
+		del_timer_sync(&queue->rx_refill_timer);
+	}
+
+	if (info->num_queues) {
+		kfree(info->queues);
+		info->queues = NULL;
+	}
+
 	xennet_sysfs_delif(info->netdev);
 
 	unregister_netdev(info->netdev);
 
-	del_timer_sync(&info->rx_refill_timer);
-
 	free_percpu(info->stats);
 
 	free_netdev(info->netdev);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH V2 net-next 4/5] xen-netfront: Add support for multiple queues
From: Andrew J. Bennieston @ 2014-02-14 11:50 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew J. Bennieston, netdev, paul.durrant, wei.liu2,
	ian.campbell
In-Reply-To: <1392378624-6123-1-git-send-email-andrew.bennieston@citrix.com>

From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>

Build on the refactoring of the previous patch to implement multiple
queues between xen-netfront and xen-netback.

Check XenStore for multi-queue support, and set up the rings and event
channels accordingly.

Write ring references and event channels to XenStore in a queue
hierarchy if appropriate, or flat when using only one queue.

Update the xennet_select_queue() function to choose the queue on which
to transmit a packet based on the skb hash result.

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
---
 drivers/net/xen-netfront.c |  176 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 138 insertions(+), 38 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index d4239b9..d584fa4 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -57,6 +57,10 @@
 #include <xen/interface/memory.h>
 #include <xen/interface/grant_table.h>
 
+/* Module parameters */
+unsigned int xennet_max_queues;
+module_param(xennet_max_queues, uint, 0644);
+
 static const struct ethtool_ops xennet_ethtool_ops;
 
 struct netfront_cb {
@@ -565,10 +569,22 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
 	return pages;
 }
 
-static u16 xennet_select_queue(struct net_device *dev, struct sk_buff *skb)
+static u16 xennet_select_queue(struct net_device *dev, struct sk_buff *skb,
+							   void *accel_priv)
 {
-	/* Stub for later implementation of queue selection */
-	return 0;
+	struct netfront_info *info = netdev_priv(dev);
+	u32 hash;
+	u16 queue_idx;
+
+	/* First, check if there is only one queue */
+	if (info->num_queues == 1)
+		queue_idx = 0;
+	else {
+		hash = skb_get_hash(skb);
+		queue_idx = (u16) (((u64)hash * info->num_queues) >> 32);
+	}
+
+	return queue_idx;
 }
 
 static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -1311,7 +1327,7 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	struct net_device *netdev;
 	struct netfront_info *np;
 
-	netdev = alloc_etherdev_mq(sizeof(struct netfront_info), 1);
+	netdev = alloc_etherdev_mq(sizeof(struct netfront_info), xennet_max_queues);
 	if (!netdev)
 		return ERR_PTR(-ENOMEM);
 
@@ -1683,6 +1699,88 @@ static int xennet_init_queue(struct netfront_queue *queue)
 	return err;
 }
 
+static int write_queue_xenstore_keys(struct netfront_queue *queue,
+			   struct xenbus_transaction *xbt, int write_hierarchical)
+{
+	/* Write the queue-specific keys into XenStore in the traditional
+	 * way for a single queue, or in a queue subkeys for multiple
+	 * queues.
+	 */
+	struct xenbus_device *dev = queue->info->xbdev;
+	int err;
+	const char *message;
+	char *path;
+	size_t pathsize;
+
+	/* Choose the correct place to write the keys */
+	if (write_hierarchical) {
+		pathsize = strlen(dev->nodename) + 10;
+		path = kzalloc(pathsize, GFP_KERNEL);
+		if (!path) {
+			err = -ENOMEM;
+			message = "out of memory while writing ring references";
+			goto error;
+		}
+		snprintf(path, pathsize, "%s/queue-%u",
+				dev->nodename, queue->id);
+	} else {
+		path = (char *)dev->nodename;
+	}
+
+	/* Write ring references */
+	err = xenbus_printf(*xbt, path, "tx-ring-ref", "%u",
+			queue->tx_ring_ref);
+	if (err) {
+		message = "writing tx-ring-ref";
+		goto error;
+	}
+
+	err = xenbus_printf(*xbt, path, "rx-ring-ref", "%u",
+			queue->rx_ring_ref);
+	if (err) {
+		message = "writing rx-ring-ref";
+		goto error;
+	}
+
+	/* Write event channels; taking into account both shared
+	 * and split event channel scenarios.
+	 */
+	if (queue->tx_evtchn == queue->rx_evtchn) {
+		/* Shared event channel */
+		err = xenbus_printf(*xbt, path,
+				"event-channel", "%u", queue->tx_evtchn);
+		if (err) {
+			message = "writing event-channel";
+			goto error;
+		}
+	} else {
+		/* Split event channels */
+		err = xenbus_printf(*xbt, path,
+				"event-channel-tx", "%u", queue->tx_evtchn);
+		if (err) {
+			message = "writing event-channel-tx";
+			goto error;
+		}
+
+		err = xenbus_printf(*xbt, path,
+				"event-channel-rx", "%u", queue->rx_evtchn);
+		if (err) {
+			message = "writing event-channel-rx";
+			goto error;
+		}
+	}
+
+	if (write_hierarchical)
+		kfree(path);
+	return 0;
+
+error:
+	if (write_hierarchical)
+		kfree(path);
+	xenbus_dev_fatal(dev, err, "%s", message);
+	return err;
+}
+
 /* Common code used when first setting up, and when resuming. */
 static int talk_to_netback(struct xenbus_device *dev,
 			   struct netfront_info *info)
@@ -1692,10 +1790,21 @@ static int talk_to_netback(struct xenbus_device *dev,
 	int err;
 	unsigned int feature_split_evtchn;
 	unsigned int i = 0;
+	unsigned int max_queues = 0;
 	struct netfront_queue *queue = NULL;
 
 	info->netdev->irq = 0;
 
+	/* Check if backend supports multiple queues */
+	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+			"multi-queue-max-queues", "%u", &max_queues);
+	if (err < 0) {
+		max_queues = 1;
+	} else if (max_queues > xennet_max_queues) {
+		/* limit to frontend max if backend max is higher */
+		max_queues = xennet_max_queues;
+	}
+
 	/* Check feature-split-event-channels */
 	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
 			   "feature-split-event-channels", "%u",
@@ -1711,12 +1820,13 @@ static int talk_to_netback(struct xenbus_device *dev,
 	}
 
 	/* Allocate array of queues */
-	info->queues = kcalloc(1, sizeof(struct netfront_queue), GFP_KERNEL);
+	info->queues = kcalloc(max_queues, sizeof(struct netfront_queue), GFP_KERNEL);
 	if (!info->queues) {
 		err = -ENOMEM;
 		goto out;
 	}
-	info->num_queues = 1;
+	info->num_queues = max_queues;
+	netif_set_real_num_tx_queues(info->netdev, info->num_queues);
 
 	/* Create shared ring, alloc event channel -- for each queue */
 	for (i = 0; i < info->num_queues; ++i) {
@@ -1754,49 +1864,35 @@ static int talk_to_netback(struct xenbus_device *dev,
 	}
 
 again:
-	queue = &info->queues[0]; /* Use first queue only */
-
 	err = xenbus_transaction_start(&xbt);
 	if (err) {
 		xenbus_dev_fatal(dev, err, "starting transaction");
 		goto destroy_ring;
 	}
 
-	err = xenbus_printf(xbt, dev->nodename, "tx-ring-ref", "%u",
-			    queue->tx_ring_ref);
-	if (err) {
-		message = "writing tx ring-ref";
-		goto abort_transaction;
-	}
-	err = xenbus_printf(xbt, dev->nodename, "rx-ring-ref", "%u",
-			    queue->rx_ring_ref);
-	if (err) {
-		message = "writing rx ring-ref";
-		goto abort_transaction;
-	}
-
-	if (queue->tx_evtchn == queue->rx_evtchn) {
-		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel", "%u", queue->tx_evtchn);
-		if (err) {
-			message = "writing event-channel";
-			goto abort_transaction;
-		}
+	if (info->num_queues == 1) {
+		err = write_queue_xenstore_keys(&info->queues[0], &xbt, 0); /* flat */
+		if (err)
+			goto abort_transaction_no_dev_fatal;
 	} else {
-		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel-tx", "%u", queue->tx_evtchn);
+		/* Write the number of queues */
+		err = xenbus_printf(xbt, dev->nodename, "multi-queue-num-queues",
+				"%u", info->num_queues);
 		if (err) {
-			message = "writing event-channel-tx";
-			goto abort_transaction;
+			message = "writing multi-queue-num-queues";
+			goto abort_transaction_no_dev_fatal;
 		}
-		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel-rx", "%u", queue->rx_evtchn);
-		if (err) {
-			message = "writing event-channel-rx";
-			goto abort_transaction;
+
+		/* Write the keys for each queue */
+		for (i = 0; i < info->num_queues; ++i) {
+			queue = &info->queues[i];
+			err = write_queue_xenstore_keys(queue, &xbt, 1); /* hierarchical */
+			if (err)
+				goto abort_transaction_no_dev_fatal;
 		}
 	}
 
+	/* The remaining keys are not queue-specific */
 	err = xenbus_printf(xbt, dev->nodename, "request-rx-copy", "%u",
 			    1);
 	if (err) {
@@ -1846,8 +1942,9 @@ again:
 	return 0;
 
  abort_transaction:
-	xenbus_transaction_end(xbt, 1);
 	xenbus_dev_fatal(dev, err, "%s", message);
+abort_transaction_no_dev_fatal:
+	xenbus_transaction_end(xbt, 1);
  destroy_ring:
 	xennet_disconnect_backend(info);
 	kfree(info->queues);
@@ -2241,6 +2338,9 @@ static int __init netif_init(void)
 
 	pr_info("Initialising Xen virtual ethernet driver\n");
 
+	/* Allow as many queues as there are CPUs, by default */
+	xennet_max_queues = num_online_cpus();
+
 	return xenbus_register_frontend(&netfront_driver);
 }
 module_init(netif_init);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH V2 net-next 5/5] xen-net{back, front}: Document multi-queue feature in netif.h
From: Andrew J. Bennieston @ 2014-02-14 11:50 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew J. Bennieston, netdev, paul.durrant, wei.liu2,
	ian.campbell
In-Reply-To: <1392378624-6123-1-git-send-email-andrew.bennieston@citrix.com>

From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>

Document the multi-queue feature in terms of XenStore keys to be written
by the backend and by the frontend.

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
---
 include/xen/interface/io/netif.h |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index c50061d..8868c51 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -51,6 +51,27 @@
  */
 
 /*
+ * Multiple transmit and receive queues:
+ * If supported, the backend will write "multi-queue-max-queues" and set its
+ * value to the maximum supported number of queues.
+ * Frontends that are aware of this feature and wish to use it can write the
+ * key "multi-queue-num-queues", set to the number they wish to use.
+ *
+ * Queues replicate the shared rings and event channels, and
+ * "feature-split-event-channels" is required when using multiple queues.
+ *
+ * For frontends requesting just one queue, the usual event-channel and
+ * ring-ref keys are written as before, simplifying the backend processing
+ * to avoid distinguishing between a frontend that doesn't understand the
+ * multi-queue feature, and one that does, but requested only one queue.
+ *
+ * Frontends requesting two or more queues must not write the toplevel
+ * event-channel and ring-ref keys, instead writing them under sub-keys having
+ * the name "queue-N" where N is the integer ID of the queue for which those
+ * keys belong. Queues are indexed from zero.
+ */
+
+/*
  * "feature-no-csum-offload" should be used to turn IPv4 TCP/UDP checksum
  * offload off or on. If it is missing then the feature is assumed to be on.
  * "feature-ipv6-csum-offload" should be used to turn IPv6 TCP/UDP checksum
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next 0/6] gianfar: Device configuration fixes
From: Claudiu Manoil @ 2014-02-14 12:03 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

Hi David,

This patchset represents the first part of an effort
to solve some old device configuration issues in gianfar,
especially run-time reset and re-configuration problems.
I'm referring to "on-the-fly" configuration of registers
against HW specification, concurrency issues during device
reset / re-configuration operations, and implementing HW
advisories for these operations.

There's also a good deal of code cleanup and refactoring,
and some other (minor) fixes as well.

Thank you. 

Claudiu Manoil (6):
  gianfar: Cleanup/Fix gfar_probe and the hw init code
  gianfar: Replace sysfs stubs with module params (fix)
  gianfar: Remove useless HAS_PADDING device flag
  gianfar: Factor out enabling/disabling of hw interrupts
  gianfar: Add missing graceful reset steps and fixes
  gianfar: Remove clean_rx_ring race from gfar_ethtool

 drivers/net/ethernet/freescale/Makefile          |   2 +-
 drivers/net/ethernet/freescale/gianfar.c         | 503 +++++++++++------------
 drivers/net/ethernet/freescale/gianfar.h         |  41 +-
 drivers/net/ethernet/freescale/gianfar_ethtool.c |  64 +--
 drivers/net/ethernet/freescale/gianfar_param.c   | 123 ++++++
 drivers/net/ethernet/freescale/gianfar_sysfs.c   | 340 ---------------
 6 files changed, 405 insertions(+), 668 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/gianfar_param.c
 delete mode 100644 drivers/net/ethernet/freescale/gianfar_sysfs.c

-- 
1.7.11.7

^ permalink raw reply

* [PATCH net-next 3/6] gianfar: Remove useless HAS_PADDING device flag
From: Claudiu Manoil @ 2014-02-14 12:04 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller
In-Reply-To: <1392379445-28358-1-git-send-email-claudiu.manoil@freescale.com>

The RCTRL updates of the FSL_GIANFAR_DEV_HAS_PADDING device
flag get overriden by the FSL_GIANFAR_DEV_HAS_TIMER flag
settings, which impose a Rx padding alignment of 8 bytes.
As all the eTSEC devices that set HAS_PADDING also set the
HAS_TIMER flag, the HAS_PADDING flag is now obsolete.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar.c | 15 +++------------
 drivers/net/ethernet/freescale/gianfar.h |  1 -
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 27cc8b2..7c7d3aa 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -375,13 +375,6 @@ static void gfar_init_mac(struct net_device *ndev)
 		rctrl |= RCTRL_PADDING(priv->padding);
 	}
 
-	/* Insert receive time stamps into padding alignment bytes */
-	if (priv->device_flags & FSL_GIANFAR_DEV_HAS_TIMER) {
-		rctrl &= ~RCTRL_PAL_MASK;
-		rctrl |= RCTRL_PADDING(8);
-		priv->padding = 8;
-	}
-
 	/* Enable HW time stamping if requested from user space */
 	if (priv->hwts_rx_en) {
 		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE;
@@ -770,7 +763,6 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
 				     FSL_GIANFAR_DEV_HAS_COALESCE |
 				     FSL_GIANFAR_DEV_HAS_RMON |
 				     FSL_GIANFAR_DEV_HAS_MULTI_INTR |
-				     FSL_GIANFAR_DEV_HAS_PADDING |
 				     FSL_GIANFAR_DEV_HAS_CSUM |
 				     FSL_GIANFAR_DEV_HAS_VLAN |
 				     FSL_GIANFAR_DEV_HAS_MAGIC_PACKET |
@@ -1171,10 +1163,9 @@ static int gfar_probe(struct platform_device *ofdev)
 
 	gfar_init_addr_hash_table(priv);
 
-	if (priv->device_flags & FSL_GIANFAR_DEV_HAS_PADDING)
-		priv->padding = DEFAULT_PADDING;
-	else
-		priv->padding = 0;
+	/* Insert receive time stamps into padding alignment bytes */
+	if (priv->device_flags & FSL_GIANFAR_DEV_HAS_TIMER)
+		priv->padding = 8;
 
 	if (dev->features & NETIF_F_IP_CSUM ||
 	    priv->device_flags & FSL_GIANFAR_DEV_HAS_TIMER)
diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index 20121ef..9fcde7c 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -879,7 +879,6 @@ struct gfar {
 #define FSL_GIANFAR_DEV_HAS_CSUM		0x00000010
 #define FSL_GIANFAR_DEV_HAS_VLAN		0x00000020
 #define FSL_GIANFAR_DEV_HAS_EXTENDED_HASH	0x00000040
-#define FSL_GIANFAR_DEV_HAS_PADDING		0x00000080
 #define FSL_GIANFAR_DEV_HAS_MAGIC_PACKET	0x00000100
 #define FSL_GIANFAR_DEV_HAS_BD_STASHING		0x00000200
 #define FSL_GIANFAR_DEV_HAS_BUF_STASHING	0x00000400
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH net-next 1/6] gianfar: Cleanup/Fix gfar_probe and the hw init code
From: Claudiu Manoil @ 2014-02-14 12:04 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller
In-Reply-To: <1392379445-28358-1-git-send-email-claudiu.manoil@freescale.com>

Factor out gfar_hw_init() to contain all the controller hw
initialization steps for a better control of register writes,
and to significantly simplify the tangled code from gfar_probe().
This results in code size and stack usage reduction (besides
code readability).

Fix memory leak on device removal, by freeing the rx_/tx_queue
structures.

Replace custom bit swapping function with a library one (bitrev8).

Move allocation of rx_/tx_queue struct arrays before the group
structure init, because in order to assign Rx/Tx queues
to groups we need to have the queues first.  This also allows
earlier bail out of gfar_probe(), in case the memory allocation
fails.

The flow control checks for maccfg1 were removed from gfar_probe(),
since flow control is disabled at probe time (priv->rx_/tx_pause_en
are 0). Redundant initializations (by 0) also removed.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar.c | 331 +++++++++++++++----------------
 drivers/net/ethernet/freescale/gianfar.h |  34 +++-
 2 files changed, 188 insertions(+), 177 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index ad5a5aa..ab915b0 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -9,7 +9,7 @@
  * Maintainer: Kumar Gala
  * Modifier: Sandeep Gopalpet <sandeep.kumar@freescale.com>
  *
- * Copyright 2002-2009, 2011 Freescale Semiconductor, Inc.
+ * Copyright 2002-2009, 2011-2013 Freescale Semiconductor, Inc.
  * Copyright 2007 MontaVista Software, Inc.
  *
  * This program is free software; you can redistribute  it and/or modify it
@@ -511,7 +511,43 @@ void unlock_tx_qs(struct gfar_private *priv)
 		spin_unlock(&priv->tx_queue[i]->txlock);
 }
 
-static void free_tx_pointers(struct gfar_private *priv)
+static int gfar_alloc_tx_queues(struct gfar_private *priv)
+{
+	int i;
+
+	for (i = 0; i < priv->num_tx_queues; i++) {
+		priv->tx_queue[i] = kzalloc(sizeof(struct gfar_priv_tx_q),
+					    GFP_KERNEL);
+		if (!priv->tx_queue[i])
+			return -ENOMEM;
+
+		priv->tx_queue[i]->tx_skbuff = NULL;
+		priv->tx_queue[i]->qindex = i;
+		priv->tx_queue[i]->dev = priv->ndev;
+		spin_lock_init(&(priv->tx_queue[i]->txlock));
+	}
+	return 0;
+}
+
+static int gfar_alloc_rx_queues(struct gfar_private *priv)
+{
+	int i;
+
+	for (i = 0; i < priv->num_rx_queues; i++) {
+		priv->rx_queue[i] = kzalloc(sizeof(struct gfar_priv_rx_q),
+					    GFP_KERNEL);
+		if (!priv->rx_queue[i])
+			return -ENOMEM;
+
+		priv->rx_queue[i]->rx_skbuff = NULL;
+		priv->rx_queue[i]->qindex = i;
+		priv->rx_queue[i]->dev = priv->ndev;
+		spin_lock_init(&(priv->rx_queue[i]->rxlock));
+	}
+	return 0;
+}
+
+static void gfar_free_tx_queues(struct gfar_private *priv)
 {
 	int i;
 
@@ -519,7 +555,7 @@ static void free_tx_pointers(struct gfar_private *priv)
 		kfree(priv->tx_queue[i]);
 }
 
-static void free_rx_pointers(struct gfar_private *priv)
+static void gfar_free_rx_queues(struct gfar_private *priv)
 {
 	int i;
 
@@ -608,6 +644,30 @@ static int gfar_parse_group(struct device_node *np,
 		grp->rx_bit_map = 0xFF;
 		grp->tx_bit_map = 0xFF;
 	}
+
+	/* bit_map's MSB is q0 (from q0 to q7) but, for_each_set_bit parses
+	 * right to left, so we need to revert the 8 bits to get the q index
+	 */
+	grp->rx_bit_map = bitrev8(grp->rx_bit_map);
+	grp->tx_bit_map = bitrev8(grp->tx_bit_map);
+
+	/* Calculate RSTAT, TSTAT, RQUEUE and TQUEUE values,
+	 * also assign queues to groups
+	 */
+	for_each_set_bit(i, &grp->rx_bit_map, priv->num_rx_queues) {
+		grp->num_rx_queues++;
+		grp->rstat |= (RSTAT_CLEAR_RHALT >> i);
+		priv->rqueue |= ((RQUEUE_EN0 | RQUEUE_EX0) >> i);
+		priv->rx_queue[i]->grp = grp;
+	}
+
+	for_each_set_bit(i, &grp->tx_bit_map, priv->num_tx_queues) {
+		grp->num_tx_queues++;
+		grp->tstat |= (TSTAT_CLEAR_THALT >> i);
+		priv->tqueue |= (TQUEUE_EN0 >> i);
+		priv->tx_queue[i]->grp = grp;
+	}
+
 	priv->num_grps++;
 
 	return 0;
@@ -664,7 +724,14 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
 	priv->num_tx_queues = num_tx_qs;
 	netif_set_real_num_rx_queues(dev, num_rx_qs);
 	priv->num_rx_queues = num_rx_qs;
-	priv->num_grps = 0x0;
+
+	err = gfar_alloc_tx_queues(priv);
+	if (err)
+		goto tx_alloc_failed;
+
+	err = gfar_alloc_rx_queues(priv);
+	if (err)
+		goto rx_alloc_failed;
 
 	/* Init Rx queue filer rule set linked list */
 	INIT_LIST_HEAD(&priv->rx_list.list);
@@ -691,38 +758,6 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
 			goto err_grp_init;
 	}
 
-	for (i = 0; i < priv->num_tx_queues; i++)
-		priv->tx_queue[i] = NULL;
-	for (i = 0; i < priv->num_rx_queues; i++)
-		priv->rx_queue[i] = NULL;
-
-	for (i = 0; i < priv->num_tx_queues; i++) {
-		priv->tx_queue[i] = kzalloc(sizeof(struct gfar_priv_tx_q),
-					    GFP_KERNEL);
-		if (!priv->tx_queue[i]) {
-			err = -ENOMEM;
-			goto tx_alloc_failed;
-		}
-		priv->tx_queue[i]->tx_skbuff = NULL;
-		priv->tx_queue[i]->qindex = i;
-		priv->tx_queue[i]->dev = dev;
-		spin_lock_init(&(priv->tx_queue[i]->txlock));
-	}
-
-	for (i = 0; i < priv->num_rx_queues; i++) {
-		priv->rx_queue[i] = kzalloc(sizeof(struct gfar_priv_rx_q),
-					    GFP_KERNEL);
-		if (!priv->rx_queue[i]) {
-			err = -ENOMEM;
-			goto rx_alloc_failed;
-		}
-		priv->rx_queue[i]->rx_skbuff = NULL;
-		priv->rx_queue[i]->qindex = i;
-		priv->rx_queue[i]->dev = dev;
-		spin_lock_init(&(priv->rx_queue[i]->rxlock));
-	}
-
-
 	stash = of_get_property(np, "bd-stash", NULL);
 
 	if (stash) {
@@ -784,12 +819,12 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
 
 	return 0;
 
-rx_alloc_failed:
-	free_rx_pointers(priv);
-tx_alloc_failed:
-	free_tx_pointers(priv);
 err_grp_init:
 	unmap_group_regs(priv);
+rx_alloc_failed:
+	gfar_free_rx_queues(priv);
+tx_alloc_failed:
+	gfar_free_tx_queues(priv);
 	free_gfar_dev(priv);
 	return err;
 }
@@ -875,19 +910,6 @@ static int gfar_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	return phy_mii_ioctl(priv->phydev, rq, cmd);
 }
 
-static unsigned int reverse_bitmap(unsigned int bit_map, unsigned int max_qs)
-{
-	unsigned int new_bit_map = 0x0;
-	int mask = 0x1 << (max_qs - 1), i;
-
-	for (i = 0; i < max_qs; i++) {
-		if (bit_map & mask)
-			new_bit_map = new_bit_map + (1 << i);
-		mask = mask >> 0x1;
-	}
-	return new_bit_map;
-}
-
 static u32 cluster_entry_per_class(struct gfar_private *priv, u32 rqfar,
 				   u32 class)
 {
@@ -1005,19 +1027,88 @@ static void gfar_detect_errata(struct gfar_private *priv)
 			 priv->errata);
 }
 
+static void gfar_hw_init(struct gfar_private *priv)
+{
+	struct gfar __iomem *regs = priv->gfargrp[0].regs;
+	u32 tempval;
+
+	/* Reset MAC layer */
+	gfar_write(&regs->maccfg1, MACCFG1_SOFT_RESET);
+
+	/* We need to delay at least 3 TX clocks */
+	udelay(2);
+
+	/* the soft reset bit is not self-resetting, so we need to
+	 * clear it before resuming normal operation
+	 */
+	gfar_write(&regs->maccfg1, 0);
+
+	/* Initialize MACCFG2. */
+	tempval = MACCFG2_INIT_SETTINGS;
+	if (gfar_has_errata(priv, GFAR_ERRATA_74))
+		tempval |= MACCFG2_HUGEFRAME | MACCFG2_LENGTHCHECK;
+	gfar_write(&regs->maccfg2, tempval);
+
+	/* Initialize ECNTRL */
+	gfar_write(&regs->ecntrl, ECNTRL_INIT_SETTINGS);
+
+	/* Program the interrupt steering regs, only for MG devices */
+	if (priv->num_grps > 1)
+		gfar_write_isrg(priv);
+
+	/* Enable all Rx/Tx queues after MAC reset */
+	gfar_write(&regs->rqueue, priv->rqueue);
+	gfar_write(&regs->tqueue, priv->tqueue);
+}
+
+static void __init gfar_init_addr_hash_table(struct gfar_private *priv)
+{
+	struct gfar __iomem *regs = priv->gfargrp[0].regs;
+
+	if (priv->device_flags & FSL_GIANFAR_DEV_HAS_EXTENDED_HASH) {
+		priv->extended_hash = 1;
+		priv->hash_width = 9;
+
+		priv->hash_regs[0] = &regs->igaddr0;
+		priv->hash_regs[1] = &regs->igaddr1;
+		priv->hash_regs[2] = &regs->igaddr2;
+		priv->hash_regs[3] = &regs->igaddr3;
+		priv->hash_regs[4] = &regs->igaddr4;
+		priv->hash_regs[5] = &regs->igaddr5;
+		priv->hash_regs[6] = &regs->igaddr6;
+		priv->hash_regs[7] = &regs->igaddr7;
+		priv->hash_regs[8] = &regs->gaddr0;
+		priv->hash_regs[9] = &regs->gaddr1;
+		priv->hash_regs[10] = &regs->gaddr2;
+		priv->hash_regs[11] = &regs->gaddr3;
+		priv->hash_regs[12] = &regs->gaddr4;
+		priv->hash_regs[13] = &regs->gaddr5;
+		priv->hash_regs[14] = &regs->gaddr6;
+		priv->hash_regs[15] = &regs->gaddr7;
+
+	} else {
+		priv->extended_hash = 0;
+		priv->hash_width = 8;
+
+		priv->hash_regs[0] = &regs->gaddr0;
+		priv->hash_regs[1] = &regs->gaddr1;
+		priv->hash_regs[2] = &regs->gaddr2;
+		priv->hash_regs[3] = &regs->gaddr3;
+		priv->hash_regs[4] = &regs->gaddr4;
+		priv->hash_regs[5] = &regs->gaddr5;
+		priv->hash_regs[6] = &regs->gaddr6;
+		priv->hash_regs[7] = &regs->gaddr7;
+	}
+}
+
 /* Set up the ethernet device structure, private data,
  * and anything else we need before we start
  */
 static int gfar_probe(struct platform_device *ofdev)
 {
-	u32 tempval;
 	struct net_device *dev = NULL;
 	struct gfar_private *priv = NULL;
-	struct gfar __iomem *regs = NULL;
-	int err = 0, i, grp_idx = 0;
-	u32 rstat = 0, tstat = 0, rqueue = 0, tqueue = 0;
-	u32 isrg = 0;
-	u32 __iomem *baddr;
+	int err = 0, i;
 
 	err = gfar_of_init(ofdev, &dev);
 
@@ -1034,7 +1125,6 @@ static int gfar_probe(struct platform_device *ofdev)
 	INIT_WORK(&priv->reset_task, gfar_reset_task);
 
 	platform_set_drvdata(ofdev, priv);
-	regs = priv->gfargrp[0].regs;
 
 	gfar_detect_errata(priv);
 
@@ -1043,33 +1133,10 @@ static int gfar_probe(struct platform_device *ofdev)
 	 */
 	gfar_halt(dev);
 
-	/* Reset MAC layer */
-	gfar_write(&regs->maccfg1, MACCFG1_SOFT_RESET);
-
-	/* We need to delay at least 3 TX clocks */
-	udelay(2);
-
-	tempval = 0;
-	if (!priv->pause_aneg_en && priv->tx_pause_en)
-		tempval |= MACCFG1_TX_FLOW;
-	if (!priv->pause_aneg_en && priv->rx_pause_en)
-		tempval |= MACCFG1_RX_FLOW;
-	/* the soft reset bit is not self-resetting, so we need to
-	 * clear it before resuming normal operation
-	 */
-	gfar_write(&regs->maccfg1, tempval);
-
-	/* Initialize MACCFG2. */
-	tempval = MACCFG2_INIT_SETTINGS;
-	if (gfar_has_errata(priv, GFAR_ERRATA_74))
-		tempval |= MACCFG2_HUGEFRAME | MACCFG2_LENGTHCHECK;
-	gfar_write(&regs->maccfg2, tempval);
-
-	/* Initialize ECNTRL */
-	gfar_write(&regs->ecntrl, ECNTRL_INIT_SETTINGS);
+	gfar_hw_init(priv);
 
 	/* Set the dev->base_addr to the gfar reg region */
-	dev->base_addr = (unsigned long) regs;
+	dev->base_addr = (unsigned long) priv->gfargrp[0].regs;
 
 	/* Fill in the dev structure */
 	dev->watchdog_timeo = TX_TIMEOUT;
@@ -1099,40 +1166,7 @@ static int gfar_probe(struct platform_device *ofdev)
 		dev->features |= NETIF_F_HW_VLAN_CTAG_RX;
 	}
 
-	if (priv->device_flags & FSL_GIANFAR_DEV_HAS_EXTENDED_HASH) {
-		priv->extended_hash = 1;
-		priv->hash_width = 9;
-
-		priv->hash_regs[0] = &regs->igaddr0;
-		priv->hash_regs[1] = &regs->igaddr1;
-		priv->hash_regs[2] = &regs->igaddr2;
-		priv->hash_regs[3] = &regs->igaddr3;
-		priv->hash_regs[4] = &regs->igaddr4;
-		priv->hash_regs[5] = &regs->igaddr5;
-		priv->hash_regs[6] = &regs->igaddr6;
-		priv->hash_regs[7] = &regs->igaddr7;
-		priv->hash_regs[8] = &regs->gaddr0;
-		priv->hash_regs[9] = &regs->gaddr1;
-		priv->hash_regs[10] = &regs->gaddr2;
-		priv->hash_regs[11] = &regs->gaddr3;
-		priv->hash_regs[12] = &regs->gaddr4;
-		priv->hash_regs[13] = &regs->gaddr5;
-		priv->hash_regs[14] = &regs->gaddr6;
-		priv->hash_regs[15] = &regs->gaddr7;
-
-	} else {
-		priv->extended_hash = 0;
-		priv->hash_width = 8;
-
-		priv->hash_regs[0] = &regs->gaddr0;
-		priv->hash_regs[1] = &regs->gaddr1;
-		priv->hash_regs[2] = &regs->gaddr2;
-		priv->hash_regs[3] = &regs->gaddr3;
-		priv->hash_regs[4] = &regs->gaddr4;
-		priv->hash_regs[5] = &regs->gaddr5;
-		priv->hash_regs[6] = &regs->gaddr6;
-		priv->hash_regs[7] = &regs->gaddr7;
-	}
+	gfar_init_addr_hash_table(priv);
 
 	if (priv->device_flags & FSL_GIANFAR_DEV_HAS_PADDING)
 		priv->padding = DEFAULT_PADDING;
@@ -1143,59 +1177,6 @@ static int gfar_probe(struct platform_device *ofdev)
 	    priv->device_flags & FSL_GIANFAR_DEV_HAS_TIMER)
 		dev->needed_headroom = GMAC_FCB_LEN;
 
-	/* Program the isrg regs only if number of grps > 1 */
-	if (priv->num_grps > 1) {
-		baddr = &regs->isrg0;
-		for (i = 0; i < priv->num_grps; i++) {
-			isrg |= (priv->gfargrp[i].rx_bit_map << ISRG_SHIFT_RX);
-			isrg |= (priv->gfargrp[i].tx_bit_map << ISRG_SHIFT_TX);
-			gfar_write(baddr, isrg);
-			baddr++;
-			isrg = 0x0;
-		}
-	}
-
-	/* Need to reverse the bit maps as  bit_map's MSB is q0
-	 * but, for_each_set_bit parses from right to left, which
-	 * basically reverses the queue numbers
-	 */
-	for (i = 0; i< priv->num_grps; i++) {
-		priv->gfargrp[i].tx_bit_map =
-			reverse_bitmap(priv->gfargrp[i].tx_bit_map, MAX_TX_QS);
-		priv->gfargrp[i].rx_bit_map =
-			reverse_bitmap(priv->gfargrp[i].rx_bit_map, MAX_RX_QS);
-	}
-
-	/* Calculate RSTAT, TSTAT, RQUEUE and TQUEUE values,
-	 * also assign queues to groups
-	 */
-	for (grp_idx = 0; grp_idx < priv->num_grps; grp_idx++) {
-		priv->gfargrp[grp_idx].num_rx_queues = 0x0;
-
-		for_each_set_bit(i, &priv->gfargrp[grp_idx].rx_bit_map,
-				 priv->num_rx_queues) {
-			priv->gfargrp[grp_idx].num_rx_queues++;
-			priv->rx_queue[i]->grp = &priv->gfargrp[grp_idx];
-			rstat = rstat | (RSTAT_CLEAR_RHALT >> i);
-			rqueue = rqueue | ((RQUEUE_EN0 | RQUEUE_EX0) >> i);
-		}
-		priv->gfargrp[grp_idx].num_tx_queues = 0x0;
-
-		for_each_set_bit(i, &priv->gfargrp[grp_idx].tx_bit_map,
-				 priv->num_tx_queues) {
-			priv->gfargrp[grp_idx].num_tx_queues++;
-			priv->tx_queue[i]->grp = &priv->gfargrp[grp_idx];
-			tstat = tstat | (TSTAT_CLEAR_THALT >> i);
-			tqueue = tqueue | (TQUEUE_EN0 >> i);
-		}
-		priv->gfargrp[grp_idx].rstat = rstat;
-		priv->gfargrp[grp_idx].tstat = tstat;
-		rstat = tstat =0;
-	}
-
-	gfar_write(&regs->rqueue, rqueue);
-	gfar_write(&regs->tqueue, tqueue);
-
 	priv->rx_buffer_size = DEFAULT_RX_BUFFER_SIZE;
 
 	/* Initializing some of the rx/tx queue level parameters */
@@ -1272,8 +1253,8 @@ static int gfar_probe(struct platform_device *ofdev)
 
 register_fail:
 	unmap_group_regs(priv);
-	free_tx_pointers(priv);
-	free_rx_pointers(priv);
+	gfar_free_rx_queues(priv);
+	gfar_free_tx_queues(priv);
 	if (priv->phy_node)
 		of_node_put(priv->phy_node);
 	if (priv->tbi_node)
@@ -1293,6 +1274,8 @@ static int gfar_remove(struct platform_device *ofdev)
 
 	unregister_netdev(priv->ndev);
 	unmap_group_regs(priv);
+	gfar_free_rx_queues(priv);
+	gfar_free_tx_queues(priv);
 	free_gfar_dev(priv);
 
 	return 0;
diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index 52bb2b0..63c830c 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -9,7 +9,7 @@
  * Maintainer: Kumar Gala
  * Modifier: Sandeep Gopalpet <sandeep.kumar@freescale.com>
  *
- * Copyright 2002-2009, 2011 Freescale Semiconductor, Inc.
+ * Copyright 2002-2009, 2011-2013 Freescale Semiconductor, Inc.
  *
  * This program is free software; you can redistribute  it and/or modify it
  * under  the terms of  the GNU General  Public License as published by the
@@ -892,8 +892,8 @@ struct gfar {
 #define DEFAULT_MAPPING 	0xFF
 #endif
 
-#define ISRG_SHIFT_TX	0x10
-#define ISRG_SHIFT_RX	0x18
+#define ISRG_RR0	0x80000000
+#define ISRG_TR0	0x00800000
 
 /* The same driver can operate in two modes */
 /* SQ_SG_MODE: Single Queue Single Group Mode
@@ -1113,6 +1113,9 @@ struct gfar_private {
 	unsigned int total_tx_ring_size;
 	unsigned int total_rx_ring_size;
 
+	u32 rqueue;
+	u32 tqueue;
+
 	/* RX per device parameters */
 	unsigned int rx_stash_size;
 	unsigned int rx_stash_index;
@@ -1176,6 +1179,31 @@ static inline void gfar_read_filer(struct gfar_private *priv,
 	*fpr = gfar_read(&regs->rqfpr);
 }
 
+static inline void gfar_write_isrg(struct gfar_private *priv)
+{
+	struct gfar __iomem *regs = priv->gfargrp[0].regs;
+	u32 __iomem *baddr = &regs->isrg0;
+	u32 isrg = 0;
+	int grp_idx, i;
+
+	for (grp_idx = 0; grp_idx < priv->num_grps; grp_idx++) {
+		struct gfar_priv_grp *grp = &priv->gfargrp[grp_idx];
+
+		for_each_set_bit(i, &grp->rx_bit_map, priv->num_rx_queues) {
+			isrg |= (ISRG_RR0 >> i);
+		}
+
+		for_each_set_bit(i, &grp->tx_bit_map, priv->num_tx_queues) {
+			isrg |= (ISRG_TR0 >> i);
+		}
+
+		gfar_write(baddr, isrg);
+
+		baddr++;
+		isrg = 0;
+	}
+}
+
 void lock_rx_qs(struct gfar_private *priv);
 void lock_tx_qs(struct gfar_private *priv);
 void unlock_rx_qs(struct gfar_private *priv);
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH net-next 4/6] gianfar: Factor out enabling/disabling of hw interrupts
From: Claudiu Manoil @ 2014-02-14 12:04 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller
In-Reply-To: <1392379445-28358-1-git-send-email-claudiu.manoil@freescale.com>

Throughout the code there are places where the controller's
hw interrupt sources need to get disabled/enabled (masked/
un-masked) all at once.  The recommendation for disabling
the interrupts is to clear the ievent first then the imask
register (not the other way around).
Use the gfar_ints_enable/disable() helpers to make these
operations consistent.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar.c | 60 ++++++++++++++++----------------
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 7c7d3aa..158c4e8 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -448,6 +448,29 @@ static const struct net_device_ops gfar_netdev_ops = {
 #endif
 };
 
+static void gfar_ints_disable(struct gfar_private *priv)
+{
+	int i;
+	for (i = 0; i < priv->num_grps; i++) {
+		struct gfar __iomem *regs = priv->gfargrp[i].regs;
+		/* Clear IEVENT */
+		gfar_write(&regs->ievent, IEVENT_INIT_CLEAR);
+
+		/* Initialize IMASK */
+		gfar_write(&regs->imask, IMASK_INIT_CLEAR);
+	}
+}
+
+static void gfar_ints_enable(struct gfar_private *priv)
+{
+	int i;
+	for (i = 0; i < priv->num_grps; i++) {
+		struct gfar __iomem *regs = priv->gfargrp[i].regs;
+		/* Unmask the interrupts we look for */
+		gfar_write(&regs->imask, IMASK_DEFAULT);
+	}
+}
+
 void lock_rx_qs(struct gfar_private *priv)
 {
 	int i;
@@ -1551,19 +1574,10 @@ static void gfar_configure_serdes(struct net_device *dev)
 static void init_registers(struct net_device *dev)
 {
 	struct gfar_private *priv = netdev_priv(dev);
-	struct gfar __iomem *regs = NULL;
-	int i;
-
-	for (i = 0; i < priv->num_grps; i++) {
-		regs = priv->gfargrp[i].regs;
-		/* Clear IEVENT */
-		gfar_write(&regs->ievent, IEVENT_INIT_CLEAR);
+	struct gfar __iomem *regs = priv->gfargrp[0].regs;
 
-		/* Initialize IMASK */
-		gfar_write(&regs->imask, IMASK_INIT_CLEAR);
-	}
+	gfar_ints_disable(priv);
 
-	regs = priv->gfargrp[0].regs;
 	/* Init hash registers to zero */
 	gfar_write(&regs->igaddr0, 0);
 	gfar_write(&regs->igaddr1, 0);
@@ -1625,20 +1639,11 @@ static int __gfar_is_rx_idle(struct gfar_private *priv)
 static void gfar_halt_nodisable(struct net_device *dev)
 {
 	struct gfar_private *priv = netdev_priv(dev);
-	struct gfar __iomem *regs = NULL;
+	struct gfar __iomem *regs = priv->gfargrp[0].regs;
 	u32 tempval;
-	int i;
-
-	for (i = 0; i < priv->num_grps; i++) {
-		regs = priv->gfargrp[i].regs;
-		/* Mask all interrupts */
-		gfar_write(&regs->imask, IMASK_INIT_CLEAR);
 
-		/* Clear all interrupts */
-		gfar_write(&regs->ievent, IEVENT_INIT_CLEAR);
-	}
+	gfar_ints_disable(priv);
 
-	regs = priv->gfargrp[0].regs;
 	/* Stop the DMA, and wait for it to stop */
 	tempval = gfar_read(&regs->dmactrl);
 	if ((tempval & (DMACTRL_GRS | DMACTRL_GTS)) !=
@@ -1826,10 +1831,10 @@ void gfar_start(struct net_device *dev)
 		/* Clear THLT/RHLT, so that the DMA starts polling now */
 		gfar_write(&regs->tstat, priv->gfargrp[i].tstat);
 		gfar_write(&regs->rstat, priv->gfargrp[i].rstat);
-		/* Unmask the interrupts we look for */
-		gfar_write(&regs->imask, IMASK_DEFAULT);
 	}
 
+	gfar_ints_enable(priv);
+
 	dev->trans_start = jiffies; /* prevent tx timeout */
 }
 
@@ -1934,15 +1939,10 @@ err_irq_fail:
 int startup_gfar(struct net_device *ndev)
 {
 	struct gfar_private *priv = netdev_priv(ndev);
-	struct gfar __iomem *regs = NULL;
 	int err, i, j;
 
-	for (i = 0; i < priv->num_grps; i++) {
-		regs= priv->gfargrp[i].regs;
-		gfar_write(&regs->imask, IMASK_INIT_CLEAR);
-	}
+	gfar_ints_disable(priv);
 
-	regs= priv->gfargrp[0].regs;
 	err = gfar_alloc_skb_resources(ndev);
 	if (err)
 		return err;
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH net-next 5/6] gianfar: Add missing graceful reset steps and fixes
From: Claudiu Manoil @ 2014-02-14 12:04 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller
In-Reply-To: <1392379445-28358-1-git-send-email-claudiu.manoil@freescale.com>

gfar_halt() and gfar_start() are responsible for stopping
and starting the DMA and the Rx/Tx hw rings. They implement
the support for the "graceful Rx/Tx stop/start" hw procedure,
and also disable/enable eTSEC's hw interrupts in the process.

The GRS/GTS procedure requires however to have the RQUEUE/TQUEUE
registers cleared first and to wait for a period of time for the
current frame to pass through the interface (around ~10ms for a
jumbo frame). Only then may the GTS and GRS bits from DMACTRL be
set to shut down the DMA, and finally the Tx_EN and Rx_EN bits in
MACCFG1 may be cleared to disable the Tx/Rx blocks.

The same register programming order applies to start the Rx/Tx:
enabling the RQUEUE/TQUEUE *before* clearing the GRS/GTS bits.

This is a HW recommendation in order to avoid a possible
controller "lock up" during graceful reset.

Cleanup the gfar_halt()/start() prototypes, to take priv instead
of ndev as their purpose is to operate on HW. Enabling the
RQUEUE/TQUEUE in the hw_init() is not needed anymore since
that's the job of gfar_start().

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar.c         | 53 ++++++++++++------------
 drivers/net/ethernet/freescale/gianfar.h         |  3 +-
 drivers/net/ethernet/freescale/gianfar_ethtool.c |  5 +--
 3 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 158c4e8..7418b2c 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -138,9 +138,7 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit);
 static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue);
 static void gfar_process_frame(struct net_device *dev, struct sk_buff *skb,
 			       int amount_pull, struct napi_struct *napi);
-void gfar_halt(struct net_device *dev);
-static void gfar_halt_nodisable(struct net_device *dev);
-void gfar_start(struct net_device *dev);
+static void gfar_halt_nodisable(struct gfar_private *priv);
 static void gfar_clear_exact_match(struct net_device *dev);
 static void gfar_set_mac_for_addr(struct net_device *dev, int num,
 				  const u8 *addr);
@@ -1070,10 +1068,6 @@ static void gfar_hw_init(struct gfar_private *priv)
 	/* Program the interrupt steering regs, only for MG devices */
 	if (priv->num_grps > 1)
 		gfar_write_isrg(priv);
-
-	/* Enable all Rx/Tx queues after MAC reset */
-	gfar_write(&regs->rqueue, priv->rqueue);
-	gfar_write(&regs->tqueue, priv->tqueue);
 }
 
 static void __init gfar_init_addr_hash_table(struct gfar_private *priv)
@@ -1149,7 +1143,7 @@ static int gfar_probe(struct platform_device *ofdev)
 	/* Stop the DMA engine now, in case it was running before
 	 * (The firmware could have used it, and left it running).
 	 */
-	gfar_halt(dev);
+	gfar_halt(priv);
 
 	gfar_hw_init(priv);
 
@@ -1317,7 +1311,7 @@ static int gfar_suspend(struct device *dev)
 		lock_tx_qs(priv);
 		lock_rx_qs(priv);
 
-		gfar_halt_nodisable(ndev);
+		gfar_halt_nodisable(priv);
 
 		/* Disable Tx, and Rx if wake-on-LAN is disabled. */
 		tempval = gfar_read(&regs->maccfg1);
@@ -1381,7 +1375,7 @@ static int gfar_resume(struct device *dev)
 	tempval &= ~MACCFG2_MPEN;
 	gfar_write(&regs->maccfg2, tempval);
 
-	gfar_start(ndev);
+	gfar_start(priv);
 
 	unlock_rx_qs(priv);
 	unlock_tx_qs(priv);
@@ -1413,7 +1407,7 @@ static int gfar_restore(struct device *dev)
 	init_registers(ndev);
 	gfar_set_mac_address(ndev);
 	gfar_init_mac(ndev);
-	gfar_start(ndev);
+	gfar_start(priv);
 
 	priv->oldlink = 0;
 	priv->oldspeed = 0;
@@ -1636,9 +1630,8 @@ static int __gfar_is_rx_idle(struct gfar_private *priv)
 }
 
 /* Halt the receive and transmit queues */
-static void gfar_halt_nodisable(struct net_device *dev)
+static void gfar_halt_nodisable(struct gfar_private *priv)
 {
-	struct gfar_private *priv = netdev_priv(dev);
 	struct gfar __iomem *regs = priv->gfargrp[0].regs;
 	u32 tempval;
 
@@ -1664,15 +1657,20 @@ static void gfar_halt_nodisable(struct net_device *dev)
 }
 
 /* Halt the receive and transmit queues */
-void gfar_halt(struct net_device *dev)
+void gfar_halt(struct gfar_private *priv)
 {
-	struct gfar_private *priv = netdev_priv(dev);
 	struct gfar __iomem *regs = priv->gfargrp[0].regs;
 	u32 tempval;
 
-	gfar_halt_nodisable(dev);
+	/* Dissable the Rx/Tx hw queues */
+	gfar_write(&regs->rqueue, 0);
+	gfar_write(&regs->tqueue, 0);
 
-	/* Disable Rx and Tx */
+	mdelay(10);
+
+	gfar_halt_nodisable(priv);
+
+	/* Disable Rx/Tx DMA */
 	tempval = gfar_read(&regs->maccfg1);
 	tempval &= ~(MACCFG1_RX_EN | MACCFG1_TX_EN);
 	gfar_write(&regs->maccfg1, tempval);
@@ -1699,7 +1697,7 @@ void stop_gfar(struct net_device *dev)
 	lock_tx_qs(priv);
 	lock_rx_qs(priv);
 
-	gfar_halt(dev);
+	gfar_halt(priv);
 
 	unlock_rx_qs(priv);
 	unlock_tx_qs(priv);
@@ -1804,17 +1802,15 @@ static void free_skb_resources(struct gfar_private *priv)
 			  priv->tx_queue[0]->tx_bd_dma_base);
 }
 
-void gfar_start(struct net_device *dev)
+void gfar_start(struct gfar_private *priv)
 {
-	struct gfar_private *priv = netdev_priv(dev);
 	struct gfar __iomem *regs = priv->gfargrp[0].regs;
 	u32 tempval;
 	int i = 0;
 
-	/* Enable Rx and Tx in MACCFG1 */
-	tempval = gfar_read(&regs->maccfg1);
-	tempval |= (MACCFG1_RX_EN | MACCFG1_TX_EN);
-	gfar_write(&regs->maccfg1, tempval);
+	/* Enable Rx/Tx hw queues */
+	gfar_write(&regs->rqueue, priv->rqueue);
+	gfar_write(&regs->tqueue, priv->tqueue);
 
 	/* Initialize DMACTRL to have WWR and WOP */
 	tempval = gfar_read(&regs->dmactrl);
@@ -1833,9 +1829,14 @@ void gfar_start(struct net_device *dev)
 		gfar_write(&regs->rstat, priv->gfargrp[i].rstat);
 	}
 
+	/* Enable Rx/Tx DMA */
+	tempval = gfar_read(&regs->maccfg1);
+	tempval |= (MACCFG1_RX_EN | MACCFG1_TX_EN);
+	gfar_write(&regs->maccfg1, tempval);
+
 	gfar_ints_enable(priv);
 
-	dev->trans_start = jiffies; /* prevent tx timeout */
+	priv->ndev->trans_start = jiffies; /* prevent tx timeout */
 }
 
 static void gfar_configure_coalescing(struct gfar_private *priv,
@@ -1959,7 +1960,7 @@ int startup_gfar(struct net_device *ndev)
 	}
 
 	/* Start the controller */
-	gfar_start(ndev);
+	gfar_start(priv);
 
 	phy_start(priv->phydev);
 
diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index 9fcde7c..f288ee9 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -1209,7 +1209,8 @@ void unlock_tx_qs(struct gfar_private *priv);
 irqreturn_t gfar_receive(int irq, void *dev_id);
 int startup_gfar(struct net_device *dev);
 void stop_gfar(struct net_device *dev);
-void gfar_halt(struct net_device *dev);
+void gfar_halt(struct gfar_private *priv);
+void gfar_start(struct gfar_private *priv);
 void gfar_phy_test(struct mii_bus *bus, struct phy_device *phydev, int enable,
 		   u32 regnum, u32 read);
 void gfar_configure_coalescing_all(struct gfar_private *priv);
diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
index 63d2344..69fab72 100644
--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
@@ -44,7 +44,6 @@
 
 #include "gianfar.h"
 
-extern void gfar_start(struct net_device *dev);
 extern int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue,
 			      int rx_work_limit);
 
@@ -504,7 +503,7 @@ static int gfar_sringparam(struct net_device *dev,
 		lock_tx_qs(priv);
 		lock_rx_qs(priv);
 
-		gfar_halt(dev);
+		gfar_halt(priv);
 
 		unlock_rx_qs(priv);
 		unlock_tx_qs(priv);
@@ -627,7 +626,7 @@ int gfar_set_features(struct net_device *dev, netdev_features_t features)
 		lock_tx_qs(priv);
 		lock_rx_qs(priv);
 
-		gfar_halt(dev);
+		gfar_halt(priv);
 
 		unlock_tx_qs(priv);
 		unlock_rx_qs(priv);
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH net-next 2/6] gianfar: Replace sysfs stubs with module params (fix)
From: Claudiu Manoil @ 2014-02-14 12:04 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller
In-Reply-To: <1392379445-28358-1-git-send-email-claudiu.manoil@freescale.com>

Removing the sysfs stubs for the Tx FIFOCFG and ATTRELI
(stashing) config registers, as these registers may only
be configured after a MAC reset, with the controller stopped
(i.e. during hw init, at probe() time). The current sysfs
stubs allow on-the-fly updates of these registers (the locking
measures are useless and only add unecessary code).

Changing these registers on-the-fly is strogly discouraged.
In this regard, this patch may be seen as a security fix.

To address this issue and not lose entirely these config
params, they are now accessible as driver module parameters,
and their names and default values have been preserved.

Moreover, the stasing configuration options were effectively
disabled (didn't get to the hw anyway if changed) because
the stashing device_flags (HAS_BD_STASHING|HAS_BUF_STASHING)
were "accidentally" cleared during probe(). The patch fixes
this bug as well.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
 drivers/net/ethernet/freescale/Makefile        |   2 +-
 drivers/net/ethernet/freescale/gianfar.c       |  60 ++---
 drivers/net/ethernet/freescale/gianfar.h       |   3 +-
 drivers/net/ethernet/freescale/gianfar_param.c | 123 +++++++++
 drivers/net/ethernet/freescale/gianfar_sysfs.c | 340 -------------------------
 5 files changed, 155 insertions(+), 373 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/gianfar_param.c
 delete mode 100644 drivers/net/ethernet/freescale/gianfar_sysfs.c

diff --git a/drivers/net/ethernet/freescale/Makefile b/drivers/net/ethernet/freescale/Makefile
index 549ce13..e324b7d 100644
--- a/drivers/net/ethernet/freescale/Makefile
+++ b/drivers/net/ethernet/freescale/Makefile
@@ -15,6 +15,6 @@ obj-$(CONFIG_GIANFAR) += gianfar_driver.o
 obj-$(CONFIG_PTP_1588_CLOCK_GIANFAR) += gianfar_ptp.o
 gianfar_driver-objs := gianfar.o \
 		gianfar_ethtool.o \
-		gianfar_sysfs.o
+		gianfar_param.o
 obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o
 ucc_geth_driver-objs := ucc_geth.o ucc_geth_ethtool.o
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index ab915b0..27cc8b2 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -338,7 +338,6 @@ static void gfar_init_mac(struct net_device *ndev)
 	struct gfar __iomem *regs = priv->gfargrp[0].regs;
 	u32 rctrl = 0;
 	u32 tctrl = 0;
-	u32 attrs = 0;
 
 	/* write the tx/rx base registers */
 	gfar_init_tx_rx_base(priv);
@@ -409,29 +408,6 @@ static void gfar_init_mac(struct net_device *ndev)
 	}
 
 	gfar_write(&regs->tctrl, tctrl);
-
-	/* Set the extraction length and index */
-	attrs = ATTRELI_EL(priv->rx_stash_size) |
-		ATTRELI_EI(priv->rx_stash_index);
-
-	gfar_write(&regs->attreli, attrs);
-
-	/* Start with defaults, and add stashing or locking
-	 * depending on the approprate variables
-	 */
-	attrs = ATTR_INIT_SETTINGS;
-
-	if (priv->bd_stash_en)
-		attrs |= ATTR_BDSTASH;
-
-	if (priv->rx_stash_size != 0)
-		attrs |= ATTR_BUFSTASH;
-
-	gfar_write(&regs->attr, attrs);
-
-	gfar_write(&regs->fifo_tx_thr, priv->fifo_threshold);
-	gfar_write(&regs->fifo_tx_starve, priv->fifo_starve);
-	gfar_write(&regs->fifo_tx_starve_shutoff, priv->fifo_starve_off);
 }
 
 static struct net_device_stats *gfar_get_stats(struct net_device *dev)
@@ -784,13 +760,13 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
 		memcpy(dev->dev_addr, mac_addr, ETH_ALEN);
 
 	if (model && !strcasecmp(model, "TSEC"))
-		priv->device_flags = FSL_GIANFAR_DEV_HAS_GIGABIT |
+		priv->device_flags |= FSL_GIANFAR_DEV_HAS_GIGABIT |
 				     FSL_GIANFAR_DEV_HAS_COALESCE |
 				     FSL_GIANFAR_DEV_HAS_RMON |
 				     FSL_GIANFAR_DEV_HAS_MULTI_INTR;
 
 	if (model && !strcasecmp(model, "eTSEC"))
-		priv->device_flags = FSL_GIANFAR_DEV_HAS_GIGABIT |
+		priv->device_flags |= FSL_GIANFAR_DEV_HAS_GIGABIT |
 				     FSL_GIANFAR_DEV_HAS_COALESCE |
 				     FSL_GIANFAR_DEV_HAS_RMON |
 				     FSL_GIANFAR_DEV_HAS_MULTI_INTR |
@@ -1030,7 +1006,7 @@ static void gfar_detect_errata(struct gfar_private *priv)
 static void gfar_hw_init(struct gfar_private *priv)
 {
 	struct gfar __iomem *regs = priv->gfargrp[0].regs;
-	u32 tempval;
+	u32 tempval, attrs;
 
 	/* Reset MAC layer */
 	gfar_write(&regs->maccfg1, MACCFG1_SOFT_RESET);
@@ -1052,6 +1028,30 @@ static void gfar_hw_init(struct gfar_private *priv)
 	/* Initialize ECNTRL */
 	gfar_write(&regs->ecntrl, ECNTRL_INIT_SETTINGS);
 
+	/* Set the extraction length and index */
+	attrs = ATTRELI_EL(priv->rx_stash_size) |
+		ATTRELI_EI(priv->rx_stash_index);
+
+	gfar_write(&regs->attreli, attrs);
+
+	/* Start with defaults, and add stashing
+	 * depending on driver parameters
+	 */
+	attrs = ATTR_INIT_SETTINGS;
+
+	if (priv->bd_stash_en)
+		attrs |= ATTR_BDSTASH;
+
+	if (priv->rx_stash_size != 0)
+		attrs |= ATTR_BUFSTASH;
+
+	gfar_write(&regs->attr, attrs);
+
+	/* FIFO configs */
+	gfar_write(&regs->fifo_tx_thr, priv->fifo_threshold);
+	gfar_write(&regs->fifo_tx_starve, priv->fifo_starve);
+	gfar_write(&regs->fifo_tx_starve_shutoff, priv->fifo_starve_off);
+
 	/* Program the interrupt steering regs, only for MG devices */
 	if (priv->num_grps > 1)
 		gfar_write_isrg(priv);
@@ -1108,6 +1108,7 @@ static int gfar_probe(struct platform_device *ofdev)
 {
 	struct net_device *dev = NULL;
 	struct gfar_private *priv = NULL;
+	static int dev_idx;
 	int err = 0, i;
 
 	err = gfar_of_init(ofdev, &dev);
@@ -1128,6 +1129,8 @@ static int gfar_probe(struct platform_device *ofdev)
 
 	gfar_detect_errata(priv);
 
+	gfar_get_params(dev, dev_idx++);
+
 	/* Stop the DMA engine now, in case it was running before
 	 * (The firmware could have used it, and left it running).
 	 */
@@ -1232,9 +1235,6 @@ static int gfar_probe(struct platform_device *ofdev)
 	/* Initialize the filer table */
 	gfar_init_filer_table(priv);
 
-	/* Create all the sysfs files */
-	gfar_init_sysfs(dev);
-
 	/* Print out the device info */
 	netdev_info(dev, "mac: %pM\n", dev->dev_addr);
 
diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index 63c830c..20121ef 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -17,7 +17,6 @@
  * option) any later version.
  *
  *  Still left to do:
- *      -Add support for module parameters
  *	-Add patch for ethtool phys id
  */
 #ifndef __GIANFAR_H
@@ -1215,7 +1214,7 @@ void gfar_halt(struct net_device *dev);
 void gfar_phy_test(struct mii_bus *bus, struct phy_device *phydev, int enable,
 		   u32 regnum, u32 read);
 void gfar_configure_coalescing_all(struct gfar_private *priv);
-void gfar_init_sysfs(struct net_device *dev);
+void gfar_get_params(struct net_device *dev, int i);
 int gfar_set_features(struct net_device *dev, netdev_features_t features);
 void gfar_check_rx_parser_mode(struct gfar_private *priv);
 void gfar_vlan_mode(struct net_device *dev, netdev_features_t features);
diff --git a/drivers/net/ethernet/freescale/gianfar_param.c b/drivers/net/ethernet/freescale/gianfar_param.c
new file mode 100644
index 0000000..7ed6745
--- /dev/null
+++ b/drivers/net/ethernet/freescale/gianfar_param.c
@@ -0,0 +1,123 @@
+/* Copyright 2002-2009, 2013 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ *
+ * Derived from:
+ * drivers/net/ethernet/freescale/gianfar_sysfs.c
+ */
+
+#include "gianfar.h"
+
+#define GFAR_DEV_MAXCNT 8
+
+#define PARAM_UNSET	-1
+
+#define GFAR_PARAM_INIT { [0 ... GFAR_DEV_MAXCNT - 1] = PARAM_UNSET }
+#define GFAR_PARAM(X, desc) \
+	static int X[GFAR_DEV_MAXCNT] = GFAR_PARAM_INIT; \
+	module_param_array_named(X, X, int, NULL, S_IRUGO); \
+	MODULE_PARM_DESC(X, desc);
+
+
+GFAR_PARAM(fifo_threshold, "Tx FIFO threshold to trigger frame unload");
+GFAR_PARAM(fifo_starve, "Tx FIFO thershold to enter starve mode");
+GFAR_PARAM(fifo_starve_off, "Tx FIFO thershold to exit starve mode");
+GFAR_PARAM(bd_stash, "Stash Rx buffer descriptors to L2");
+GFAR_PARAM(rx_stash_size, "Stash this many Rx frame bytes to L2");
+GFAR_PARAM(rx_stash_index, "Stash Rx frame starting from this offset");
+
+
+/* FIFO Tx Threshold - the numerical SRAM entry (0-511 for 2-Kbyte FIFO)
+ * to trigger the unloading of FIFO data to the PHY
+ */
+static void gfar_fifo_threshold_update(struct gfar_private *priv, int thr)
+{
+	if (thr >= 0 && thr <= GFAR_MAX_FIFO_THRESHOLD)
+		priv->fifo_threshold = thr;
+}
+
+/* FIFO Tx Starve/Shutoff - numerical SRAM entry (0-511 for 2-Kbyte FIFO).
+ * If the # of valid entries in the FIFO is less than or equal to the
+ * starve register value the controller enters starve state
+ * (i.e. triggers diagnostic interrupt to signal that Tx underrun is
+ * imminent and increases priority of Tx memory transactions).
+ * If the starve state is in effect and the number of valid entries in the
+ * FIFO becomes greater than or equal to the value in the FIFO transmit starve
+ * shutoff register, the starve condition ends.
+ */
+static void gfar_fifo_starve_update(struct gfar_private *priv,
+				    int thr_strv, int thr_strv_off)
+{
+	if (thr_strv >= 0 && thr_strv <= GFAR_MAX_FIFO_STARVE &&
+	    thr_strv_off >= 0 && thr_strv_off <= GFAR_MAX_FIFO_STARVE_OFF &&
+	    thr_strv < thr_strv_off) {
+		priv->fifo_starve = thr_strv;
+		priv->fifo_starve_off = thr_strv_off;
+	}
+}
+
+static void gfar_bd_stash_update(struct gfar_private *priv, int en)
+{
+	if (!(priv->device_flags & FSL_GIANFAR_DEV_HAS_BD_STASHING))
+		return;
+
+	if (en == 0 || en == 1)
+		priv->bd_stash_en = en;
+}
+
+static void gfar_rx_stash_update(struct gfar_private *priv, int size, int idx)
+{
+	if (!(priv->device_flags & FSL_GIANFAR_DEV_HAS_BUF_STASHING))
+		return;
+
+	/* valid sizes are multiples of 8 */
+	if (size <= 0 || size > DEFAULT_RX_BUFFER_SIZE || size & 0x7)
+		return;
+
+	/* index must be multiple of 64 */
+	if (idx < 0 || idx > size || idx & 0x3f)
+		return;
+
+	priv->rx_stash_index = idx;
+	priv->rx_stash_size = size;
+}
+
+void __init gfar_get_params(struct net_device *ndev, int i)
+{
+	struct gfar_private *priv = netdev_priv(ndev);
+
+	/* Initialize the default values */
+	priv->fifo_threshold = DEFAULT_FIFO_TX_THR;
+	priv->fifo_starve = DEFAULT_FIFO_TX_STARVE;
+	priv->fifo_starve_off = DEFAULT_FIFO_TX_STARVE_OFF;
+
+	if (i >= GFAR_DEV_MAXCNT) {
+		netdev_warn(ndev, "dev config missing, using defaults\n");
+		return;
+	}
+
+	/* FIFO threshold validation */
+	if (fifo_threshold[i] != PARAM_UNSET)
+		gfar_fifo_threshold_update(priv, fifo_threshold[i]);
+
+	/* FIFO starve mode hysteresis thresholds validation */
+	if (fifo_starve[i] != PARAM_UNSET &&
+	    fifo_starve_off[i] != PARAM_UNSET) {
+		gfar_fifo_starve_update(priv, fifo_starve[i],
+					fifo_starve_off[i]);
+	}
+
+	/* BD stashing validation */
+	if (bd_stash[i] != PARAM_UNSET)
+		gfar_bd_stash_update(priv, bd_stash[i]);
+
+	/* Rx frame stashing validation */
+	if (rx_stash_size[i] != PARAM_UNSET &&
+	    rx_stash_index[i] != PARAM_UNSET) {
+		gfar_rx_stash_update(priv, rx_stash_size[i],
+				     rx_stash_index[i]);
+	}
+}
diff --git a/drivers/net/ethernet/freescale/gianfar_sysfs.c b/drivers/net/ethernet/freescale/gianfar_sysfs.c
deleted file mode 100644
index e02dd13..0000000
--- a/drivers/net/ethernet/freescale/gianfar_sysfs.c
+++ /dev/null
@@ -1,340 +0,0 @@
-/*
- * drivers/net/ethernet/freescale/gianfar_sysfs.c
- *
- * Gianfar Ethernet Driver
- * This driver is designed for the non-CPM ethernet controllers
- * on the 85xx and 83xx family of integrated processors
- * Based on 8260_io/fcc_enet.c
- *
- * Author: Andy Fleming
- * Maintainer: Kumar Gala (galak@kernel.crashing.org)
- * Modifier: Sandeep Gopalpet <sandeep.kumar@freescale.com>
- *
- * Copyright 2002-2009 Freescale Semiconductor, Inc.
- *
- * This program is free software; you can redistribute  it and/or modify it
- * under  the terms of  the GNU General  Public License as published by the
- * Free Software Foundation;  either version 2 of the  License, or (at your
- * option) any later version.
- *
- * Sysfs file creation and management
- */
-
-#include <linux/kernel.h>
-#include <linux/string.h>
-#include <linux/errno.h>
-#include <linux/unistd.h>
-#include <linux/delay.h>
-#include <linux/etherdevice.h>
-#include <linux/spinlock.h>
-#include <linux/mm.h>
-#include <linux/device.h>
-
-#include <asm/uaccess.h>
-#include <linux/module.h>
-
-#include "gianfar.h"
-
-static ssize_t gfar_show_bd_stash(struct device *dev,
-				  struct device_attribute *attr, char *buf)
-{
-	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
-
-	return sprintf(buf, "%s\n", priv->bd_stash_en ? "on" : "off");
-}
-
-static ssize_t gfar_set_bd_stash(struct device *dev,
-				 struct device_attribute *attr,
-				 const char *buf, size_t count)
-{
-	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
-	struct gfar __iomem *regs = priv->gfargrp[0].regs;
-	int new_setting = 0;
-	u32 temp;
-	unsigned long flags;
-
-	if (!(priv->device_flags & FSL_GIANFAR_DEV_HAS_BD_STASHING))
-		return count;
-
-
-	/* Find out the new setting */
-	if (!strncmp("on", buf, count - 1) || !strncmp("1", buf, count - 1))
-		new_setting = 1;
-	else if (!strncmp("off", buf, count - 1) ||
-		 !strncmp("0", buf, count - 1))
-		new_setting = 0;
-	else
-		return count;
-
-
-	local_irq_save(flags);
-	lock_rx_qs(priv);
-
-	/* Set the new stashing value */
-	priv->bd_stash_en = new_setting;
-
-	temp = gfar_read(&regs->attr);
-
-	if (new_setting)
-		temp |= ATTR_BDSTASH;
-	else
-		temp &= ~(ATTR_BDSTASH);
-
-	gfar_write(&regs->attr, temp);
-
-	unlock_rx_qs(priv);
-	local_irq_restore(flags);
-
-	return count;
-}
-
-static DEVICE_ATTR(bd_stash, 0644, gfar_show_bd_stash, gfar_set_bd_stash);
-
-static ssize_t gfar_show_rx_stash_size(struct device *dev,
-				       struct device_attribute *attr, char *buf)
-{
-	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
-
-	return sprintf(buf, "%d\n", priv->rx_stash_size);
-}
-
-static ssize_t gfar_set_rx_stash_size(struct device *dev,
-				      struct device_attribute *attr,
-				      const char *buf, size_t count)
-{
-	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
-	struct gfar __iomem *regs = priv->gfargrp[0].regs;
-	unsigned int length = simple_strtoul(buf, NULL, 0);
-	u32 temp;
-	unsigned long flags;
-
-	if (!(priv->device_flags & FSL_GIANFAR_DEV_HAS_BUF_STASHING))
-		return count;
-
-	local_irq_save(flags);
-	lock_rx_qs(priv);
-
-	if (length > priv->rx_buffer_size)
-		goto out;
-
-	if (length == priv->rx_stash_size)
-		goto out;
-
-	priv->rx_stash_size = length;
-
-	temp = gfar_read(&regs->attreli);
-	temp &= ~ATTRELI_EL_MASK;
-	temp |= ATTRELI_EL(length);
-	gfar_write(&regs->attreli, temp);
-
-	/* Turn stashing on/off as appropriate */
-	temp = gfar_read(&regs->attr);
-
-	if (length)
-		temp |= ATTR_BUFSTASH;
-	else
-		temp &= ~(ATTR_BUFSTASH);
-
-	gfar_write(&regs->attr, temp);
-
-out:
-	unlock_rx_qs(priv);
-	local_irq_restore(flags);
-
-	return count;
-}
-
-static DEVICE_ATTR(rx_stash_size, 0644, gfar_show_rx_stash_size,
-		   gfar_set_rx_stash_size);
-
-/* Stashing will only be enabled when rx_stash_size != 0 */
-static ssize_t gfar_show_rx_stash_index(struct device *dev,
-					struct device_attribute *attr,
-					char *buf)
-{
-	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
-
-	return sprintf(buf, "%d\n", priv->rx_stash_index);
-}
-
-static ssize_t gfar_set_rx_stash_index(struct device *dev,
-				       struct device_attribute *attr,
-				       const char *buf, size_t count)
-{
-	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
-	struct gfar __iomem *regs = priv->gfargrp[0].regs;
-	unsigned short index = simple_strtoul(buf, NULL, 0);
-	u32 temp;
-	unsigned long flags;
-
-	if (!(priv->device_flags & FSL_GIANFAR_DEV_HAS_BUF_STASHING))
-		return count;
-
-	local_irq_save(flags);
-	lock_rx_qs(priv);
-
-	if (index > priv->rx_stash_size)
-		goto out;
-
-	if (index == priv->rx_stash_index)
-		goto out;
-
-	priv->rx_stash_index = index;
-
-	temp = gfar_read(&regs->attreli);
-	temp &= ~ATTRELI_EI_MASK;
-	temp |= ATTRELI_EI(index);
-	gfar_write(&regs->attreli, temp);
-
-out:
-	unlock_rx_qs(priv);
-	local_irq_restore(flags);
-
-	return count;
-}
-
-static DEVICE_ATTR(rx_stash_index, 0644, gfar_show_rx_stash_index,
-		   gfar_set_rx_stash_index);
-
-static ssize_t gfar_show_fifo_threshold(struct device *dev,
-					struct device_attribute *attr,
-					char *buf)
-{
-	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
-
-	return sprintf(buf, "%d\n", priv->fifo_threshold);
-}
-
-static ssize_t gfar_set_fifo_threshold(struct device *dev,
-				       struct device_attribute *attr,
-				       const char *buf, size_t count)
-{
-	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
-	struct gfar __iomem *regs = priv->gfargrp[0].regs;
-	unsigned int length = simple_strtoul(buf, NULL, 0);
-	u32 temp;
-	unsigned long flags;
-
-	if (length > GFAR_MAX_FIFO_THRESHOLD)
-		return count;
-
-	local_irq_save(flags);
-	lock_tx_qs(priv);
-
-	priv->fifo_threshold = length;
-
-	temp = gfar_read(&regs->fifo_tx_thr);
-	temp &= ~FIFO_TX_THR_MASK;
-	temp |= length;
-	gfar_write(&regs->fifo_tx_thr, temp);
-
-	unlock_tx_qs(priv);
-	local_irq_restore(flags);
-
-	return count;
-}
-
-static DEVICE_ATTR(fifo_threshold, 0644, gfar_show_fifo_threshold,
-		   gfar_set_fifo_threshold);
-
-static ssize_t gfar_show_fifo_starve(struct device *dev,
-				     struct device_attribute *attr, char *buf)
-{
-	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
-
-	return sprintf(buf, "%d\n", priv->fifo_starve);
-}
-
-static ssize_t gfar_set_fifo_starve(struct device *dev,
-				    struct device_attribute *attr,
-				    const char *buf, size_t count)
-{
-	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
-	struct gfar __iomem *regs = priv->gfargrp[0].regs;
-	unsigned int num = simple_strtoul(buf, NULL, 0);
-	u32 temp;
-	unsigned long flags;
-
-	if (num > GFAR_MAX_FIFO_STARVE)
-		return count;
-
-	local_irq_save(flags);
-	lock_tx_qs(priv);
-
-	priv->fifo_starve = num;
-
-	temp = gfar_read(&regs->fifo_tx_starve);
-	temp &= ~FIFO_TX_STARVE_MASK;
-	temp |= num;
-	gfar_write(&regs->fifo_tx_starve, temp);
-
-	unlock_tx_qs(priv);
-	local_irq_restore(flags);
-
-	return count;
-}
-
-static DEVICE_ATTR(fifo_starve, 0644, gfar_show_fifo_starve,
-		   gfar_set_fifo_starve);
-
-static ssize_t gfar_show_fifo_starve_off(struct device *dev,
-					 struct device_attribute *attr,
-					 char *buf)
-{
-	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
-
-	return sprintf(buf, "%d\n", priv->fifo_starve_off);
-}
-
-static ssize_t gfar_set_fifo_starve_off(struct device *dev,
-					struct device_attribute *attr,
-					const char *buf, size_t count)
-{
-	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
-	struct gfar __iomem *regs = priv->gfargrp[0].regs;
-	unsigned int num = simple_strtoul(buf, NULL, 0);
-	u32 temp;
-	unsigned long flags;
-
-	if (num > GFAR_MAX_FIFO_STARVE_OFF)
-		return count;
-
-	local_irq_save(flags);
-	lock_tx_qs(priv);
-
-	priv->fifo_starve_off = num;
-
-	temp = gfar_read(&regs->fifo_tx_starve_shutoff);
-	temp &= ~FIFO_TX_STARVE_OFF_MASK;
-	temp |= num;
-	gfar_write(&regs->fifo_tx_starve_shutoff, temp);
-
-	unlock_tx_qs(priv);
-	local_irq_restore(flags);
-
-	return count;
-}
-
-static DEVICE_ATTR(fifo_starve_off, 0644, gfar_show_fifo_starve_off,
-		   gfar_set_fifo_starve_off);
-
-void gfar_init_sysfs(struct net_device *dev)
-{
-	struct gfar_private *priv = netdev_priv(dev);
-	int rc;
-
-	/* Initialize the default values */
-	priv->fifo_threshold = DEFAULT_FIFO_TX_THR;
-	priv->fifo_starve = DEFAULT_FIFO_TX_STARVE;
-	priv->fifo_starve_off = DEFAULT_FIFO_TX_STARVE_OFF;
-
-	/* Create our sysfs files */
-	rc = device_create_file(&dev->dev, &dev_attr_bd_stash);
-	rc |= device_create_file(&dev->dev, &dev_attr_rx_stash_size);
-	rc |= device_create_file(&dev->dev, &dev_attr_rx_stash_index);
-	rc |= device_create_file(&dev->dev, &dev_attr_fifo_threshold);
-	rc |= device_create_file(&dev->dev, &dev_attr_fifo_starve);
-	rc |= device_create_file(&dev->dev, &dev_attr_fifo_starve_off);
-	if (rc)
-		dev_err(&dev->dev, "Error creating gianfar sysfs files\n");
-}
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH net-next 6/6] gianfar: Remove clean_rx_ring race from gfar_ethtool
From: Claudiu Manoil @ 2014-02-14 12:04 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller
In-Reply-To: <1392379445-28358-1-git-send-email-claudiu.manoil@freescale.com>

gfar_clean_rx_ring() was designed to be called from napi
(rx softirq) context to do the Rx processing. Calling it
from a process context like this is a bug as it will
clearly race with the napi Rx processing.

There's also no point in initializing num_txbdfree since
startup_gfar() already does that, when bringing the device
up again (after reset). Changing num_txbdfree "on-the-fly"
like this is also subject to race conditions.  num_txbdfree
is handled by the Tx processing path and the device reset
procedure.  Also, don't assume that num_rx_queues is always
equal to num_tx_queues.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar_ethtool.c | 63 +++---------------------
 1 file changed, 8 insertions(+), 55 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
index 69fab72..19557ec 100644
--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
@@ -44,9 +44,6 @@
 
 #include "gianfar.h"
 
-extern int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue,
-			      int rx_work_limit);
-
 #define GFAR_MAX_COAL_USECS 0xffff
 #define GFAR_MAX_COAL_FRAMES 0xff
 static void gfar_fill_stats(struct net_device *dev, struct ethtool_stats *dummy,
@@ -466,15 +463,13 @@ static void gfar_gringparam(struct net_device *dev,
 }
 
 /* Change the current ring parameters, stopping the controller if
- * necessary so that we don't mess things up while we're in
- * motion.  We wait for the ring to be clean before reallocating
- * the rings.
+ * necessary so that we don't mess things up while we're in motion.
  */
 static int gfar_sringparam(struct net_device *dev,
 			   struct ethtool_ringparam *rvals)
 {
 	struct gfar_private *priv = netdev_priv(dev);
-	int err = 0, i = 0;
+	int err = 0, i;
 
 	if (rvals->rx_pending > GFAR_RX_MAX_RING_SIZE)
 		return -EINVAL;
@@ -492,38 +487,15 @@ static int gfar_sringparam(struct net_device *dev,
 		return -EINVAL;
 	}
 
-
-	if (dev->flags & IFF_UP) {
-		unsigned long flags;
-
-		/* Halt TX and RX, and process the frames which
-		 * have already been received
-		 */
-		local_irq_save(flags);
-		lock_tx_qs(priv);
-		lock_rx_qs(priv);
-
-		gfar_halt(priv);
-
-		unlock_rx_qs(priv);
-		unlock_tx_qs(priv);
-		local_irq_restore(flags);
-
-		for (i = 0; i < priv->num_rx_queues; i++)
-			gfar_clean_rx_ring(priv->rx_queue[i],
-					   priv->rx_queue[i]->rx_ring_size);
-
-		/* Now we take down the rings to rebuild them */
+	if (dev->flags & IFF_UP)
 		stop_gfar(dev);
-	}
 
-	/* Change the size */
-	for (i = 0; i < priv->num_rx_queues; i++) {
+	/* Change the sizes */
+	for (i = 0; i < priv->num_rx_queues; i++)
 		priv->rx_queue[i]->rx_ring_size = rvals->rx_pending;
+
+	for (i = 0; i < priv->num_tx_queues; i++)
 		priv->tx_queue[i]->tx_ring_size = rvals->tx_pending;
-		priv->tx_queue[i]->num_txbdfree =
-			priv->tx_queue[i]->tx_ring_size;
-	}
 
 	/* Rebuild the rings with the new size */
 	if (dev->flags & IFF_UP) {
@@ -607,10 +579,8 @@ static int gfar_spauseparam(struct net_device *dev,
 
 int gfar_set_features(struct net_device *dev, netdev_features_t features)
 {
-	struct gfar_private *priv = netdev_priv(dev);
-	unsigned long flags;
-	int err = 0, i = 0;
 	netdev_features_t changed = dev->features ^ features;
+	int err = 0;
 
 	if (changed & (NETIF_F_HW_VLAN_CTAG_TX|NETIF_F_HW_VLAN_CTAG_RX))
 		gfar_vlan_mode(dev, features);
@@ -619,23 +589,6 @@ int gfar_set_features(struct net_device *dev, netdev_features_t features)
 		return 0;
 
 	if (dev->flags & IFF_UP) {
-		/* Halt TX and RX, and process the frames which
-		 * have already been received
-		 */
-		local_irq_save(flags);
-		lock_tx_qs(priv);
-		lock_rx_qs(priv);
-
-		gfar_halt(priv);
-
-		unlock_tx_qs(priv);
-		unlock_rx_qs(priv);
-		local_irq_restore(flags);
-
-		for (i = 0; i < priv->num_rx_queues; i++)
-			gfar_clean_rx_ring(priv->rx_queue[i],
-					   priv->rx_queue[i]->rx_ring_size);
-
 		/* Now we take down the rings to rebuild them */
 		stop_gfar(dev);
 
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH -next V3] ip_tunnel: return more precise errno value when adding tunnel fails
From: Florian Westphal @ 2014-02-14 12:14 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

Currently this always returns ENOBUFS, because the return value of
__ip_tunnel_create is discarded.

A more common failure is a duplicate name (EEXIST).  Propagate the real
error code so userspace can display a more meaningful error message.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
Changes since v2:
 Fix subject prefix (s/gre/ip_tunnel/)
Changes since v1:
  Use ERR_CAST() (suggested by Ben Hutchings)

 net/ipv4/ip_tunnel.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 50228be..f137c5d 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -443,7 +443,7 @@ static struct ip_tunnel *ip_tunnel_create(struct net *net,
 	fbt = netdev_priv(itn->fb_tunnel_dev);
 	dev = __ip_tunnel_create(net, itn->fb_tunnel_dev->rtnl_link_ops, parms);
 	if (IS_ERR(dev))
-		return NULL;
+		return ERR_CAST(dev);
 
 	dev->mtu = ip_tunnel_bind_dev(dev);
 
@@ -796,9 +796,13 @@ int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
 
 		t = ip_tunnel_find(itn, p, itn->fb_tunnel_dev->type);
 
-		if (!t && (cmd == SIOCADDTUNNEL))
+		if (!t && (cmd == SIOCADDTUNNEL)) {
 			t = ip_tunnel_create(net, itn, p);
-
+			if (IS_ERR(t)) {
+				err = PTR_ERR(t);
+				break;
+			}
+		}
 		if (dev != itn->fb_tunnel_dev && cmd == SIOCCHGTUNNEL) {
 			if (t != NULL) {
 				if (t->dev != dev) {
@@ -825,8 +829,9 @@ int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
 		if (t) {
 			err = 0;
 			ip_tunnel_update(itn, t, dev, p, true);
-		} else
-			err = (cmd == SIOCADDTUNNEL ? -ENOBUFS : -ENOENT);
+		} else {
+			err = -ENOENT;
+		}
 		break;
 
 	case SIOCDELTUNNEL:
-- 
1.8.1.5

^ permalink raw reply related

* Re: [PATCH net-next 6/7] sch_netem: clear old rate when old qdisc's replaced
From: Eric Dumazet @ 2014-02-14 12:43 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem, stephen
In-Reply-To: <1392366970-11592-7-git-send-email-yangyingliang@huawei.com>

On Fri, 2014-02-14 at 16:36 +0800, Yang Yingliang wrote:
> If we set a netem qdisc with rate option, while we
> use "#tc qdisc replace ..." that without rate option
> to replace the old qdisc, the old rate is still there.
> We need clear old rate after qdisc's replaced.

Wait... Have you tested :

tc qdisc change ...

This is far more needed than 'replace' : 

You (meaning user scripts) can implement replace by delete + create, but
'tc qdisc change' needs current code.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox