* Re: [PATCHv3] drivers/net/usb: Add new driver ipheth
From: "L. Alberto Giménez" @ 2010-04-05 18:51 UTC (permalink / raw)
To: Oliver Neukum
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
linville-2XuSBdqkA4R54TAoqtyWWQ, j.dumon-x9gZzRpC1QbQT0dZR+AlfA,
steve.glendinning-sdUf+H5yV5I, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
gregkh-l3A5Bk7waGM, dgiagio-Re5JQEeQqe8AvxtiuMwx3w,
dborca-/E1597aS9LQAvxtiuMwx3w
In-Reply-To: <201004040924.43949.oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>
On 04/04/2010 09:24 AM, Oliver Neukum wrote:
> Am Freitag, 2. April 2010 20:23:21 schrieb L. Alberto Giménez:
>> On 03/31/2010 10:33 PM, Oliver Neukum wrote:
>>> Am Mittwoch, 31. März 2010 21:42:07 schrieb L. Alberto Giménez:
>>>> +static struct usb_driver ipheth_driver = {
>>>> + .name = "ipheth",
>>>> + .probe = ipheth_probe,
>>>> + .disconnect = ipheth_disconnect,
>>>> + .id_table = ipheth_table,
>>>> + .supports_autosuspend = 0,
>>> redundant
>> Why?
>
> 0 is the default.
Heh, I thought that you meant that the whole struct was redundant, and
that puzzled me a little bit. Now everything is clear (my fault for not
realizing the redundant initialization to 0).
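For anyone else reading along, here is a minimal sketch of the point. The struct below is a hypothetical, cut-down stand-in (not the real struct usb_driver); it only shows that a member left out of a designated initializer is zero-initialized, so writing "= 0" explicitly changes nothing.

```c
/* Hypothetical stand-in for a driver descriptor; names are illustrative. */
struct demo_driver {
	const char *name;
	int (*probe)(void);
	unsigned int supports_autosuspend;
};

static int demo_probe(void)
{
	return 0;
}

/* Field spelled out explicitly, as in the posted patch. */
static struct demo_driver explicit_zero = {
	.name = "demo",
	.probe = demo_probe,
	.supports_autosuspend = 0,
};

/* Field omitted: C zero-initializes members that are not named. */
static struct demo_driver implicit_zero = {
	.name = "demo",
	.probe = demo_probe,
};

/* Returns 1 when both variants carry the same flag value. */
static int same_autosuspend(void)
{
	return explicit_zero.supports_autosuspend ==
	       implicit_zero.supports_autosuspend;
}
```

Both objects end up with the flag cleared, which is why the explicit line can simply be dropped.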
Regards,
--
L. Alberto Giménez
JabberID agimenez-eu7EghD4TOHJ13y34KW5H97lo5+wdyHW@public.gmane.org
GnuPG key ID 0x3BAABDE1
* Re: [PATCH] [V2] Add non-Virtex5 support for LL TEMAC driver
From: Grant Likely @ 2010-04-05 18:10 UTC (permalink / raw)
To: David Miller
Cc: netdev, linuxppc-dev, jwboyer, john.williams, michal.simek,
John Tyner, John Linn
In-Reply-To: <fa686aa41003171302v46738069pae51061ba83d0818@mail.gmail.com>
David, are you going to pick up this patch, or would you like me to?
Thanks,
g
On Wed, Mar 17, 2010 at 2:02 PM, Grant Likely <grant.likely@secretlab.ca> wrote:
> On Fri, Mar 12, 2010 at 7:05 PM, John Linn <john.linn@xilinx.com> wrote:
>> This patch adds support for using the LL TEMAC Ethernet driver on
>> non-Virtex 5 platforms by adding support for accessing the Soft DMA
>> registers as if they were memory mapped instead of solely through the
>> DCR's (available on the Virtex 5).
>>
>> The patch also updates the driver so that it runs on the MicroBlaze.
>> The changes were tested on the PowerPC 440, PowerPC 405, and the
>> MicroBlaze platforms.
>>
>> Signed-off-by: John Tyner <jtyner@cs.ucr.edu>
>> Signed-off-by: John Linn <john.linn@xilinx.com>
>> ---
>
> I've not booted this, but it looks right, and it compiles fine. The
> issues that Michal raised need to be dealt with too, but they are
> preexisting bugs unrelated to this change which you should fix up in a
> separate patch.
>
> Acked-by: Grant Likely <grant.likely@secretlab.ca>
>
>>
>> V2 - Incorporated comments from Grant and added more logic to allow the driver
>> to work on MicroBlaze.
>>
>> drivers/net/Kconfig | 1 -
>> drivers/net/ll_temac.h | 17 +++++-
>> drivers/net/ll_temac_main.c | 124 ++++++++++++++++++++++++++++++++++---------
>> 3 files changed, 113 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
>> index 9b6efe1..5402105 100644
>> --- a/drivers/net/Kconfig
>> +++ b/drivers/net/Kconfig
>> @@ -2443,7 +2443,6 @@ config MV643XX_ETH
>> config XILINX_LL_TEMAC
>> tristate "Xilinx LL TEMAC (LocalLink Tri-mode Ethernet MAC) driver"
>> select PHYLIB
>> - depends on PPC_DCR_NATIVE
>> help
>> This driver supports the Xilinx 10/100/1000 LocalLink TEMAC
>> core used in Xilinx Spartan and Virtex FPGAs
>> diff --git a/drivers/net/ll_temac.h b/drivers/net/ll_temac.h
>> index 1af66a1..915aa34 100644
>> --- a/drivers/net/ll_temac.h
>> +++ b/drivers/net/ll_temac.h
>> @@ -5,8 +5,11 @@
>> #include <linux/netdevice.h>
>> #include <linux/of.h>
>> #include <linux/spinlock.h>
>> +
>> +#ifdef CONFIG_PPC_DCR
>> #include <asm/dcr.h>
>> #include <asm/dcr-regs.h>
>> +#endif
>>
>> /* packet size info */
>> #define XTE_HDR_SIZE 14 /* size of Ethernet header */
>> @@ -290,8 +293,12 @@ This option defaults to enabled (set) */
>>
>> #define TX_CONTROL_CALC_CSUM_MASK 1
>>
>> +/* Align the IP data in the packet on word boundaries as MicroBlaze
>> + * needs it.
>> + */
>> +
>> #define XTE_ALIGN 32
>> -#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN)
>> +#define BUFFER_ALIGN(adr) ((34 - ((u32) adr)) % XTE_ALIGN)
>>
>> #define MULTICAST_CAM_TABLE_NUM 4
>>
>> @@ -335,9 +342,15 @@ struct temac_local {
>> struct mii_bus *mii_bus; /* MII bus reference */
>> int mdio_irqs[PHY_MAX_ADDR]; /* IRQs table for MDIO bus */
>>
>> - /* IO registers and IRQs */
>> + /* IO registers, dma functions and IRQs */
>> void __iomem *regs;
>> + void __iomem *sdma_regs;
>> +#ifdef CONFIG_PPC_DCR
>> dcr_host_t sdma_dcrs;
>> +#endif
>> + u32 (*dma_in)(struct temac_local *, int);
>> + void (*dma_out)(struct temac_local *, int, u32);
>> +
>> int tx_irq;
>> int rx_irq;
>> int emac_num;
>> diff --git a/drivers/net/ll_temac_main.c b/drivers/net/ll_temac_main.c
>> index a18e348..9aedf9b 100644
>> --- a/drivers/net/ll_temac_main.c
>> +++ b/drivers/net/ll_temac_main.c
>> @@ -20,9 +20,6 @@
>> * or rx, so this should be okay.
>> *
>> * TODO:
>> - * - Fix driver to work on more than just Virtex5. Right now the driver
>> - * assumes that the locallink DMA registers are accessed via DCR
>> - * instructions.
>> * - Factor out locallink DMA code into separate driver
>> * - Fix multicast assignment.
>> * - Fix support for hardware checksumming.
>> @@ -115,17 +112,86 @@ void temac_indirect_out32(struct temac_local *lp, int reg, u32 value)
>> temac_iow(lp, XTE_CTL0_OFFSET, CNTLREG_WRITE_ENABLE_MASK | reg);
>> }
>>
>> +/**
>> + * temac_dma_in32 - Memory mapped DMA read, this function expects a
>> + * register input that is based on DCR word addresses which
>> + * are then converted to memory mapped byte addresses
>> + */
>> static u32 temac_dma_in32(struct temac_local *lp, int reg)
>> {
>> - return dcr_read(lp->sdma_dcrs, reg);
>> + return in_be32((u32 *)(lp->sdma_regs + (reg << 2)));
>> }
>>
>> +/**
>> + * temac_dma_out32 - Memory mapped DMA write, this function expects a
>> + * register input that is based on DCR word addresses which
>> + * are then converted to memory mapped byte addresses
>> + */
>> static void temac_dma_out32(struct temac_local *lp, int reg, u32 value)
>> {
>> + out_be32((u32 *)(lp->sdma_regs + (reg << 2)), value);
>> +}
>> +
>> +/* DMA register access functions can be DCR based or memory mapped.
>> + * The PowerPC 440 is DCR based, the PowerPC 405 and MicroBlaze are both
>> + * memory mapped.
>> + */
>> +#ifdef CONFIG_PPC_DCR
>> +
>> +/**
>> + * temac_dma_dcr_in - DCR based DMA read
>> + */
>> +static u32 temac_dma_dcr_in(struct temac_local *lp, int reg)
>> +{
>> + return dcr_read(lp->sdma_dcrs, reg);
>> +}
>> +
>> +/**
>> + * temac_dma_dcr_out - DCR based DMA write
>> + */
>> +static void temac_dma_dcr_out(struct temac_local *lp, int reg, u32 value)
>> +{
>> dcr_write(lp->sdma_dcrs, reg, value);
>> }
>>
>> /**
>> + * temac_dcr_setup - If the DMA is DCR based, then setup the address and
>> + * I/O functions
>> + */
>> +static int temac_dcr_setup(struct temac_local *lp, struct of_device *op,
>> + struct device_node *np)
>> +{
>> + unsigned int dcrs;
>> +
>> + /* setup the dcr address mapping if it's in the device tree */
>> +
>> + dcrs = dcr_resource_start(np, 0);
>> + if (dcrs != 0) {
>> + lp->sdma_dcrs = dcr_map(np, dcrs, dcr_resource_len(np, 0));
>> + lp->dma_in = temac_dma_dcr_in;
>> + lp->dma_out = temac_dma_dcr_out;
>> + dev_dbg(&op->dev, "DCR base: %x\n", dcrs);
>> + return 0;
>> + }
>> + /* no DCR in the device tree, indicate a failure */
>> + return -1;
>> +}
>> +
>> +#else
>> +
>> +/*
>> + * temac_dcr_setup - This is a stub for when DCR is not supported,
>> + * such as with MicroBlaze
>> + */
>> +static int temac_dcr_setup(struct temac_local *lp, struct of_device *op,
>> + struct device_node *np)
>> +{
>> + return -1;
>> +}
>> +
>> +#endif
>> +
>> +/**
>> * temac_dma_bd_init - Setup buffer descriptor rings
>> */
>> static int temac_dma_bd_init(struct net_device *ndev)
>> @@ -172,23 +238,23 @@ static int temac_dma_bd_init(struct net_device *ndev)
>> lp->rx_bd_v[i].app0 = STS_CTRL_APP0_IRQONEND;
>> }
>>
>> - temac_dma_out32(lp, TX_CHNL_CTRL, 0x10220400 |
>> + lp->dma_out(lp, TX_CHNL_CTRL, 0x10220400 |
>> CHNL_CTRL_IRQ_EN |
>> CHNL_CTRL_IRQ_DLY_EN |
>> CHNL_CTRL_IRQ_COAL_EN);
>> /* 0x10220483 */
>> /* 0x00100483 */
>> - temac_dma_out32(lp, RX_CHNL_CTRL, 0xff010000 |
>> + lp->dma_out(lp, RX_CHNL_CTRL, 0xff010000 |
>> CHNL_CTRL_IRQ_EN |
>> CHNL_CTRL_IRQ_DLY_EN |
>> CHNL_CTRL_IRQ_COAL_EN |
>> CHNL_CTRL_IRQ_IOE);
>> /* 0xff010283 */
>>
>> - temac_dma_out32(lp, RX_CURDESC_PTR, lp->rx_bd_p);
>> - temac_dma_out32(lp, RX_TAILDESC_PTR,
>> + lp->dma_out(lp, RX_CURDESC_PTR, lp->rx_bd_p);
>> + lp->dma_out(lp, RX_TAILDESC_PTR,
>> lp->rx_bd_p + (sizeof(*lp->rx_bd_v) * (RX_BD_NUM - 1)));
>> - temac_dma_out32(lp, TX_CURDESC_PTR, lp->tx_bd_p);
>> + lp->dma_out(lp, TX_CURDESC_PTR, lp->tx_bd_p);
>>
>> return 0;
>> }
>> @@ -426,9 +492,9 @@ static void temac_device_reset(struct net_device *ndev)
>> temac_indirect_out32(lp, XTE_RXC1_OFFSET, val & ~XTE_RXC1_RXEN_MASK);
>>
>> /* Reset Local Link (DMA) */
>> - temac_dma_out32(lp, DMA_CONTROL_REG, DMA_CONTROL_RST);
>> + lp->dma_out(lp, DMA_CONTROL_REG, DMA_CONTROL_RST);
>> timeout = 1000;
>> - while (temac_dma_in32(lp, DMA_CONTROL_REG) & DMA_CONTROL_RST) {
>> + while (lp->dma_in(lp, DMA_CONTROL_REG) & DMA_CONTROL_RST) {
>> udelay(1);
>> if (--timeout == 0) {
>> dev_err(&ndev->dev,
>> @@ -436,7 +502,7 @@ static void temac_device_reset(struct net_device *ndev)
>> break;
>> }
>> }
>> - temac_dma_out32(lp, DMA_CONTROL_REG, DMA_TAIL_ENABLE);
>> + lp->dma_out(lp, DMA_CONTROL_REG, DMA_TAIL_ENABLE);
>>
>> temac_dma_bd_init(ndev);
>>
>> @@ -597,7 +663,7 @@ static int temac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>> lp->tx_bd_tail = 0;
>>
>> /* Kick off the transfer */
>> - temac_dma_out32(lp, TX_TAILDESC_PTR, tail_p); /* DMA start */
>> + lp->dma_out(lp, TX_TAILDESC_PTR, tail_p); /* DMA start */
>>
>> return NETDEV_TX_OK;
>> }
>> @@ -663,7 +729,7 @@ static void ll_temac_recv(struct net_device *ndev)
>> cur_p = &lp->rx_bd_v[lp->rx_bd_ci];
>> bdstat = cur_p->app0;
>> }
>> - temac_dma_out32(lp, RX_TAILDESC_PTR, tail_p);
>> + lp->dma_out(lp, RX_TAILDESC_PTR, tail_p);
>>
>> spin_unlock_irqrestore(&lp->rx_lock, flags);
>> }
>> @@ -674,8 +740,8 @@ static irqreturn_t ll_temac_tx_irq(int irq, void *_ndev)
>> struct temac_local *lp = netdev_priv(ndev);
>> unsigned int status;
>>
>> - status = temac_dma_in32(lp, TX_IRQ_REG);
>> - temac_dma_out32(lp, TX_IRQ_REG, status);
>> + status = lp->dma_in(lp, TX_IRQ_REG);
>> + lp->dma_out(lp, TX_IRQ_REG, status);
>>
>> if (status & (IRQ_COAL | IRQ_DLY))
>> temac_start_xmit_done(lp->ndev);
>> @@ -692,8 +758,8 @@ static irqreturn_t ll_temac_rx_irq(int irq, void *_ndev)
>> unsigned int status;
>>
>> /* Read and clear the status registers */
>> - status = temac_dma_in32(lp, RX_IRQ_REG);
>> - temac_dma_out32(lp, RX_IRQ_REG, status);
>> + status = lp->dma_in(lp, RX_IRQ_REG);
>> + lp->dma_out(lp, RX_IRQ_REG, status);
>>
>> if (status & (IRQ_COAL | IRQ_DLY))
>> ll_temac_recv(lp->ndev);
>> @@ -794,7 +860,7 @@ static ssize_t temac_show_llink_regs(struct device *dev,
>> int i, len = 0;
>>
>> for (i = 0; i < 0x11; i++)
>> - len += sprintf(buf + len, "%.8x%s", temac_dma_in32(lp, i),
>> + len += sprintf(buf + len, "%.8x%s", lp->dma_in(lp, i),
>> (i % 8) == 7 ? "\n" : " ");
>> len += sprintf(buf + len, "\n");
>>
>> @@ -820,7 +886,6 @@ temac_of_probe(struct of_device *op, const struct of_device_id *match)
>> struct net_device *ndev;
>> const void *addr;
>> int size, rc = 0;
>> - unsigned int dcrs;
>>
>> /* Init network device structure */
>> ndev = alloc_etherdev(sizeof(*lp));
>> @@ -870,13 +935,20 @@ temac_of_probe(struct of_device *op, const struct of_device_id *match)
>> goto nodev;
>> }
>>
>> - dcrs = dcr_resource_start(np, 0);
>> - if (dcrs == 0) {
>> - dev_err(&op->dev, "could not get DMA register address\n");
>> - goto nodev;
>> + /* Setup the DMA register accesses, could be DCR or memory mapped */
>> + if (temac_dcr_setup(lp, op, np)) {
>> +
>> + /* no DCR in the device tree, try non-DCR */
>> + lp->sdma_regs = of_iomap(np, 0);
>> + if (lp->sdma_regs) {
>> + lp->dma_in = temac_dma_in32;
>> + lp->dma_out = temac_dma_out32;
>> + dev_dbg(&op->dev, "MEM base: %p\n", lp->sdma_regs);
>> + } else {
>> + dev_err(&op->dev, "unable to map DMA registers\n");
>> + goto nodev;
>> + }
>> }
>> - lp->sdma_dcrs = dcr_map(np, dcrs, dcr_resource_len(np, 0));
>> - dev_dbg(&op->dev, "DCR base: %x\n", dcrs);
>>
>> lp->rx_irq = irq_of_parse_and_map(np, 0);
>> lp->tx_irq = irq_of_parse_and_map(np, 1);
>> --
>> 1.6.2.1
>
>
>
> --
> Grant Likely, B.Sc., P.Eng.
> Secret Lab Technologies Ltd.
>
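As a rough userspace sketch of the approach in the quoted patch (not the driver code itself): the accessors are selected once at setup time and every later register access goes through function pointers. All demo_* names are hypothetical, a plain array stands in for the DCR bus, and native byte order replaces the driver's in_be32()/out_be32().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_dev {
	unsigned char *mmio;	/* stands in for lp->sdma_regs; NULL if unmapped */
	uint32_t *dcr;		/* stands in for the DCR bus; NULL if absent */
	uint32_t (*dma_in)(struct demo_dev *, int);
	void (*dma_out)(struct demo_dev *, int, uint32_t);
};

/* Memory-mapped access: word register index -> byte offset (reg << 2). */
static uint32_t demo_mmio_in(struct demo_dev *d, int reg)
{
	uint32_t v;

	memcpy(&v, d->mmio + ((size_t)reg << 2), sizeof(v));
	return v;
}

static void demo_mmio_out(struct demo_dev *d, int reg, uint32_t value)
{
	memcpy(d->mmio + ((size_t)reg << 2), &value, sizeof(value));
}

/* DCR-style access: the register number is already a word index. */
static uint32_t demo_dcr_in(struct demo_dev *d, int reg)
{
	return d->dcr[reg];
}

static void demo_dcr_out(struct demo_dev *d, int reg, uint32_t value)
{
	d->dcr[reg] = value;
}

/* Prefer DCR when present, otherwise fall back to the mapped window. */
static int demo_setup(struct demo_dev *d)
{
	if (d->dcr) {
		d->dma_in = demo_dcr_in;
		d->dma_out = demo_dcr_out;
		return 0;
	}
	if (d->mmio) {
		d->dma_in = demo_mmio_in;
		d->dma_out = demo_mmio_out;
		return 0;
	}
	return -1;	/* neither access method is available */
}
```

Callers then use d->dma_in(d, reg) and d->dma_out(d, reg, value) without knowing which backend was chosen, which is the same shape as the lp->dma_in / lp->dma_out calls in the diff.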
--
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.
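A note on the BUFFER_ALIGN() change, offered as my reading rather than a statement from the patch authors: 34 is 32 + 2, so the macro yields the padding that places the frame start 2 bytes past a 32-byte boundary, and the 14-byte Ethernet header (XTE_HDR_SIZE) then leaves the IP header on a word boundary. The sketch below checks that arithmetic in userspace; the demo_* names are hypothetical.

```c
#include <stdint.h>

#define DEMO_ALIGN	32u	/* mirrors XTE_ALIGN */
#define DEMO_ETH_HLEN	14u	/* mirrors XTE_HDR_SIZE */

/* Same arithmetic as the patched BUFFER_ALIGN(), on a plain integer. */
static uint32_t demo_buffer_align(uint32_t addr)
{
	/* Unsigned wraparound is harmless here: 2^32 is a multiple of 32. */
	return (34u - addr) % DEMO_ALIGN;
}

/* Address where the IP header would start after padding and the MAC header. */
static uint32_t demo_ip_header_addr(uint32_t addr)
{
	return addr + demo_buffer_align(addr) + DEMO_ETH_HLEN;
}

/* Returns 1 if every start address in [0, 64) gives a word-aligned IP header. */
static int demo_ip_always_aligned(void)
{
	uint32_t a;

	for (a = 0; a < 64u; a++)
		if (demo_ip_header_addr(a) % 4u != 0u)
			return 0;
	return 1;
}
```

For example, a buffer at 0x1003 needs 31 bytes of padding, which puts the frame at 0x1022 and the IP header at 0x1030.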
* Re: Increased Latencies when upgrading kernel version
From: Taylor Lewick @ 2010-04-05 17:34 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, linux-kernel
In-Reply-To: <r2pd585dc4f1004011853q405eaadcq14b50a9e7a4dcf21@mail.gmail.com>
Okay, I'm not sure how to officially file this, as a performance
regression or something else, but here is the data. Again, I've
noticed that system and network latency appear to have worsened with
later kernel versions.
I was turned onto this problem via the following links:
http://www.kernel.org/pub/linux/kernel/people/christoph/ols2009/ols-2009-paper.pdf
and http://kerneltrap.org/mailarchive/linux-netdev/2009/4/16/5491284
So I set up a test on two servers with identical hardware, NICs, etc.,
and used hackbench, udpping, and an internally written app to compare
latency.
Here are just the hackbench results, with the averages across 5 runs
for two different hackbench tests. The 2.6.16 and 2.6.27 kernels were
configured with voluntary preemption and 250 HZ, so I initially
repeated that for the 2.6.33.1 test. I also tested no preemption at
the same HZ setting of 250.
I ran 2.6.16.60 on one server, and the other kernel versions on
another server. These tests are repeatable across different servers,
i.e. I verified I don't have a bad server.
Kernel Version        HB1 (25 process 300)   HB2 (100 process 300)
2.6.16.60             .5402                  1.8946
2.6.27.19             .619                   2.6268
2.6.32.3-voluntary    .5636                  2.3484
2.6.33.1-voluntary    .5404                  2.2872
2.6.33.1-nopreempt    .5606                  2.3466
So 2.6.16.60 is fast, 2.6.27.19 is slow, and 2.6.33.1 with voluntary
preemption is the next best, but its results didn't hold up as the
hackbench tests used larger numbers of groups. For example, 2.6.16.60
and 2.6.33.1-voluntary were basically the same for HB1, but that
didn't hold when hackbench used more groups (HB2).
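To put the gap in relative terms, here is a small sketch that computes how much slower each kernel is than the 2.6.16.60 baseline, assuming the hackbench figures are run times where lower is better. The struct and function names are made up for illustration; the numbers are the averages reported above.

```c
/* Averages from the hackbench runs reported in this mail. */
struct hb_result {
	const char *kernel;
	double hb1;	/* 25 process 300 */
	double hb2;	/* 100 process 300 */
};

static const struct hb_result hb_results[] = {
	{ "2.6.16.60",          0.5402, 1.8946 },	/* baseline */
	{ "2.6.27.19",          0.619,  2.6268 },
	{ "2.6.32.3-voluntary", 0.5636, 2.3484 },
	{ "2.6.33.1-voluntary", 0.5404, 2.2872 },
	{ "2.6.33.1-nopreempt", 0.5606, 2.3466 },
};

/* Percentage by which time t exceeds the baseline time. */
static double pct_slower(double baseline, double t)
{
	return (t - baseline) / baseline * 100.0;
}
```

With these inputs, 2.6.33.1-voluntary comes out well under 1% slower on HB1 but roughly 21% slower on HB2, and 2.6.27.19 is roughly 39% slower on HB2.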
At this point, I'm looking for ideas on which kernel build options to
tweak (I'm not a developer): SLAB vs SLUB, sparse vs dense IRQ
numbering, etc.
Running a -rt kernel isn't an option at this time. I did test that as
well, and latencies were quite a bit worse, but I wasn't adjusting
code to take advantage of a real time OS.
I can make some changes or repeat tests.
Below are some hardware comparisons between the two machines.
The differences I noticed were more interrupts and more CPU flags on
the later kernel version.
HostA 2.6.16.60
cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
   0:  108509762          0          0          0          0          0          0          0   IO-APIC-edge   timer
   8:          1          0          0          0          0          0          0          0   IO-APIC-edge   rtc
   9:          0          0          0          0          0          0          0          0   IO-APIC-level  acpi
  58:        305          0    5157735        220    2980100       5927       1187          0   IO-APIC-level  libata
 162:          0          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb1
 170:          0          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb2
 177:       6326          0     229018          0     283720      35597        367          0   IO-APIC-level  megasas
 178:        122          0       1784       1103       3531         20       1457          0   IO-APIC-level  uhci_hcd:usb3, ehci_hcd:usb6
 186:          0          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb4
 194:         22          0          0          0          0          0          0          0   IO-APIC-level  ehci_hcd:usb5
 210:    1790109        577          0          0          0          0          0          0   PCI-MSI-X      eth4-0
 218:     233811         93          0          0          0          0          0          0   PCI-MSI-X      eth4-1
 NMI:          0          0          0          0          0          0          0          0
 LOC:  108509683  108509662  108509637  108509614  108509588  108509566  108509541  108509516
 ERR:          7
 MIS:          0
lspci
00:00.0 Host bridge: Intel Corporation QuickPath Architecture I/O Hub
to ESI Port (rev 13)
00:01.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub
PCI Express Root Port 1 (rev 13)
00:03.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub
PCI Express Root Port 3 (rev 13)
00:07.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub
PCI Express Root Port 7 (rev 13)
00:09.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub
PCI Express Root Port 9 (rev 13)
00:14.0 PIC: Intel Corporation QuickPath Architecture I/O Hub System
Management Registers (rev 13)
00:14.1 PIC: Intel Corporation QuickPath Architecture I/O Hub GPIO and
Scratch Pad Registers (rev 13)
00:14.2 PIC: Intel Corporation QuickPath Architecture I/O Hub Control
Status and RAS Registers (rev 13)
00:16.0 System peripheral: Intel Corporation DMA Engine (rev 13)
00:16.1 System peripheral: Intel Corporation DMA Engine (rev 13)
00:16.2 System peripheral: Intel Corporation DMA Engine (rev 13)
00:16.3 System peripheral: Intel Corporation DMA Engine (rev 13)
00:16.4 System peripheral: Intel Corporation DMA Engine (rev 13)
00:16.5 System peripheral: Intel Corporation DMA Engine (rev 13)
00:16.6 System peripheral: Intel Corporation DMA Engine (rev 13)
00:16.7 System peripheral: Intel Corporation DMA Engine (rev 13)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2
EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express
Port 1 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #2 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2
EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface
Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801IB (ICH9) 2 port SATA
IDE Controller (rev 02)
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS
1078 (rev 04)
04:00.0 PCI bridge: Integrated Device Technology, Inc. Unknown device
8018 (rev 0e)
05:02.0 PCI bridge: Integrated Device Technology, Inc. Unknown device
8018 (rev 0e)
05:04.0 PCI bridge: Integrated Device Technology, Inc. Unknown device
8018 (rev 0e)
06:00.0 Ethernet controller: Intel Corporation 82575GB Gigabit Network
Connection (rev 02)
06:00.1 Ethernet controller: Intel Corporation 82575GB Gigabit Network
Connection (rev 02)
07:00.0 Ethernet controller: Intel Corporation 82575GB Gigabit Network
Connection (rev 02)
07:00.1 Ethernet controller: Intel Corporation 82575GB Gigabit Network
Connection (rev 02)
08:00.0 Ethernet controller: Solarflare Communications Unknown device
0710 (rev 02)
09:03.0 VGA compatible controller: Matrox Graphics, Inc. Unknown
device 0532 (rev 0a)
cat /proc/cpuinfo (just showing first CPU for brevity)
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
cpu MHz : 2926.090
cache size : 8192 KB
physical id : 1
siblings : 4
core id : 0
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr dca
popcnt lahf_lm
bogomips : 5857.34
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
ethtool -c eth4
Coalesce parameters for eth4:
Adaptive RX: on TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 0
rx-frames: 0
rx-usecs-irq: 60
rx-frames-irq: 0
tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
HostB 2.6.33.1
cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
   0:       8637          0          0          0          0          0          0          0   IO-APIC-edge     timer
   1:          2          0          0          0          0          0          0          0   IO-APIC-edge     i8042
   3:          2          0          0          0          0          0          0          0   IO-APIC-edge
   4:          2          0          0          0          0          0          0          0   IO-APIC-edge
   8:          1          0          0          0          0          0          0          0   IO-APIC-edge     rtc0
   9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi  acpi
  12:          4          0          0          0          0          0          0          0   IO-APIC-edge     i8042
  16:       7434        683          0          0          0          0          0          0   IO-APIC-fasteoi  megasas
  17:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi  uhci_hcd:usb3
  18:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi  uhci_hcd:usb4
  19:         23          0          0          0          0          0          0          0   IO-APIC-fasteoi  ehci_hcd:usb1
  20:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi  uhci_hcd:usb6
  21:        129          0         15          0          0          0          0          0   IO-APIC-fasteoi  ehci_hcd:usb2, uhci_hcd:usb5
  23:        369          0          0          0          0          0          0          0   IO-APIC-fasteoi  ata_piix
  67:       2346        731          0          0          0          0          0          0   PCI-MSI-edge     eth4-0
  68:       1809        404          0          0          0          0          0          0   PCI-MSI-edge     eth4-1
 NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
 LOC:      33071      38348      47397      23246      15715      11065       9004      10391   Local timer interrupts
 SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
 PND:          0          0          0          0          0          0          0          0   Performance pending work
 RES:       2490       2124       4187       4974       1724       5548       1892       2871   Rescheduling interrupts
 CAL:        497       2166        141        115        133        144        140        144   Function call interrupts
 TLB:        243        244        928        945        289        187        134         93   TLB shootdowns
 TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
 MCP:          2          2          2          2          2          2          2          2   Machine check polls
 ERR:          7
 MIS:          0
lspci
00:00.0 Host bridge: Intel Corporation X58 I/O Hub to ESI Port (rev 13)
00:01.0 PCI bridge: Intel Corporation X58 I/O Hub PCI Express Root
Port 1 (rev 13)
00:03.0 PCI bridge: Intel Corporation X58 I/O Hub PCI Express Root
Port 3 (rev 13)
00:07.0 PCI bridge: Intel Corporation X58 I/O Hub PCI Express Root
Port 7 (rev 13)
00:09.0 PCI bridge: Intel Corporation X58 I/O Hub PCI Express Root
Port 9 (rev 13)
00:14.0 PIC: Intel Corporation X58 I/O Hub System Management Registers (rev 13)
00:14.1 PIC: Intel Corporation X58 I/O Hub GPIO and Scratch Pad
Registers (rev 13)
00:14.2 PIC: Intel Corporation X58 I/O Hub Control Status and RAS
Registers (rev 13)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2
EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express
Port 1 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
UHCI Controller #2 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2
EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface
Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801IB (ICH9) 2 port SATA
IDE Controller (rev 02)
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS
1078 (rev 04)
04:00.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI
Express Switch (rev 0e)
05:02.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI
Express Switch (rev 0e)
05:04.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI
Express Switch (rev 0e)
06:00.0 Ethernet controller: Intel Corporation 82575GB Gigabit Network
Connection (rev 02)
06:00.1 Ethernet controller: Intel Corporation 82575GB Gigabit Network
Connection (rev 02)
07:00.0 Ethernet controller: Intel Corporation 82575GB Gigabit Network
Connection (rev 02)
07:00.1 Ethernet controller: Intel Corporation 82575GB Gigabit Network
Connection (rev 02)
08:00.0 Ethernet controller: Solarflare Communications SFC4000 rev B
[Solarstorm] (rev 02)
09:03.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW
WPCM450 (rev 0a)
cat /proc/cpuinfo (just showing first CPU for brevity)
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
cpu MHz : 2925.888
cache size : 8192 KB
physical id : 1
siblings : 4
core id : 0
cpu cores : 4
apicid : 16
initial apicid : 16
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2
ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow
vnmi flexpriority ept vpid
bogomips : 5851.77
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
ethtool -c eth4
Coalesce parameters for eth4:
Adaptive RX: on TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 0
rx-frames: 0
rx-usecs-irq: 60
rx-frames-irq: 0
tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
On Thu, Apr 1, 2010 at 8:53 PM, Taylor Lewick <taylor.lewick@gmail.com> wrote:
> Okay. I will get this info out to the list Monday. Briefly, I'm
> using identical hardware (server), identical NICs, same drivers,
> connected to same switch, and using udpping, hackbench, and an
> internally written app to test latency. Without exception the
> evolution has looked like the following.
>
> 2.6.16.60 latencies for system and network are fast. Meaning
> hackbench and udpping win, and win by quite a bit.
>
> 2.6.27.19 was awful. 2.6.32.1 and 2.6.33.1 were better for networking
> (with some tweaks, i.e. disable netfilter, etc), and I was able to get
> networking latencies to within 1-3 microseconds of 2.6.16.60
> latencies, but the hackbench results are still pretty bad.
>
> Again, I'll post numbers and more detailed hardware info on Monday
> when I'm back at office...
>
> On Thu, Apr 1, 2010 at 4:19 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Le jeudi 01 avril 2010 à 14:12 -0500, Taylor Lewick a écrit :
>>> For some time now we've been running an older kernel, 2.6.16.60. When
>>> we tried to upgrade, first going to 2.6.27.19 and then to 2.6.32.1 and
>>> 2.6.33.1 we noticed that latencies increased. At first we noticed it
>>> by doing network tests via udpping, netperf, etc. We made some
>>> tweaks, and were able to get network latency to within 1 to 2
>>> microseconds of where we were previously on 2.6.16.60. Then we did
>>> some more testing, and noticed that system latency also seems higher.
>>>
>>> We've done our tests on identical hardware servers, same NICs,
>>> connected through same network gear. Basically, we've tried to keep
>>> everything identical except the kernel versions, and we are unable to
>>> achieve the same performance for system latency on the newer kernels,
>>> despite adjusting various kernel settings and recompiling.
>>>
>>> The latency differences are about 15 microseconds per transaction.
>>>
>>> At this point, I don't know what else to try. I haven't played around
>>> with the /proc/sys/kernel/sched_* parameters under the newer kernels
>>> yet. Have tried changing pre-emption modes with little effect, in
>>> fact, voluntary preemption seems to be performing the best for us.
>>>
>>> At this time the realtime patch isn't really an option for us to
>>> consider, at least not yet.
>>>
>>> Any suggestions? Is this a known issue when upgrading to more recent
>>> kernel versions?
>>>
>>
>> Hi Taylor
>>
>> Well, it is a bit difficult to give a generic answer to your generic
>> question. 15 us more latency per transaction seems pretty bad.
>>
>> Some inputs would be nice, describing your workload and
>> software/hardware architecture.
>>
>> lspci
>> cat /proc/cpuinfo
>> cat /proc/interrupts
>> dmesg
>> ethtool -S eth0
>> ethtool -c eth0
>>
>>
>>
>>
>
* tulip_stop_rxtx() failed (CSR5 0xf0260000 CSR6 0xb3862002) on DEC Alpha Personal Workstation 433au
From: Adrian Glaubitz @ 2010-04-05 17:13 UTC (permalink / raw)
To: Grant Grundler, Kyle McMartin, David S. Miller, Joe Perches
Cc: netdev, linux-kernel
Hi guys,
I installed Debian unstable on an old digital workstation "DEC Digital
Personal Workstation 433au" (Miata) which has an on-board tulip
network controller. I'm not really using that network controller but
an off-board intel e1000 controller. However, I found that the tulip
driver produces a lot of noise in the message log, the following
message is repeated periodically and spams the whole message log:
0000:00:03.0: tulip_stop_rxtx() failed (CSR5 0xf0260000 CSR6 0xb3862002)
Do you think this is related to the fact that no cable is connected to
the network controller?
The lspci output of the hardware looks like this:
test-adrian1:~# lspci
00:03.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 30)
00:07.0 ISA bridge: Contaq Microsystems 82c693
00:07.1 IDE interface: Contaq Microsystems 82c693
00:07.2 IDE interface: Contaq Microsystems 82c693
00:07.3 USB Controller: Contaq Microsystems 82c693
00:0b.0 VGA compatible controller: Matrox Graphics, Inc. MGA 2064W [Millennium] (rev 01)
00:14.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03)
01:04.0 SCSI storage controller: QLogic Corp. ISP1020 Fast-wide SCSI (rev 05)
01:09.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller
If you need any more verbose or debug output, please let me know.
Adrian
* Re: tulip_stop_rxtx() failed (CSR5 0xf0260000 CSR6 0xb3862002) on DEC Alpha Personal Workstation 433au
From: Joe Perches @ 2010-04-05 17:36 UTC (permalink / raw)
To: Adrian Glaubitz
Cc: Grant Grundler, Kyle McMartin, David S. Miller, netdev,
linux-kernel
In-Reply-To: <20100405171318.GA18915@physik.fu-berlin.de>
On Mon, 2010-04-05 at 19:13 +0200, Adrian Glaubitz wrote:
> Hi guys,
>
> I installed Debian unstable on an old digital workstation "DEC Digital
> Personal Workstation 433au" (Miata) which has an on-board tulip
> network controller. I'm not really using that network controller but
> an off-board intel e1000 controller. However, I found that the tulip
> driver produces a lot of noise in the message log, the following
> message is repeated periodically and spams the whole message log:
>
> 0000:00:03.0: tulip_stop_rxtx() failed (CSR5 0xf0260000 CSR6 0xb3862002)
>
> Do you think this is related to the fact that no cable is connected to
> the network controller?
Probably something is trying periodically to open the device.
Maybe this helps reduce the message log noise:
Signed-off-by: Joe Perches <joe@perches.com>
---
diff --git a/drivers/net/tulip/tulip.h b/drivers/net/tulip/tulip.h
index 0afa2d4..8c675aa 100644
--- a/drivers/net/tulip/tulip.h
+++ b/drivers/net/tulip/tulip.h
@@ -515,12 +515,11 @@ static inline void tulip_stop_rxtx(struct tulip_private *tp)
while (--i && (ioread32(ioaddr + CSR5) & (CSR5_TS|CSR5_RS)))
udelay(10);
- if (!i)
- printk(KERN_DEBUG "%s: tulip_stop_rxtx() failed"
- " (CSR5 0x%x CSR6 0x%x)\n",
- pci_name(tp->pdev),
- ioread32(ioaddr + CSR5),
- ioread32(ioaddr + CSR6));
+ if (!i && tulip_debug > 1)
+ printk(KERN_DEBUG "%s: tulip_stop_rxtx() failed (CSR5 0x%x CSR6 0x%x)\n",
+ pci_name(tp->pdev),
+ ioread32(ioaddr + CSR5),
+ ioread32(ioaddr + CSR6));
}
}
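For readers following along outside the kernel tree, the idea of the change above (poll with a bounded retry budget, and report a timeout only at a raised debug level) can be sketched in plain C. This is only a model under stated assumptions: read_status(), the STATUS_* bits, debug_level and the retry budget of 1300 are hypothetical stand-ins, not the driver's definitions.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's CSR5 state bits. */
#define STATUS_TX_BUSY 0x1u
#define STATUS_RX_BUSY 0x2u

/* Plays the role of tulip_debug in this model. */
static int debug_level = 1;

/* Hypothetical register read: reports "busy" for the next N calls. */
static uint32_t read_status(unsigned *busy_reads_left)
{
	if (*busy_reads_left > 0) {
		(*busy_reads_left)--;
		return STATUS_TX_BUSY | STATUS_RX_BUSY;
	}
	return 0;
}

/*
 * Poll until both engines report idle or the retry budget runs out.
 * Returns 0 on success and -1 on timeout. The timeout is logged only
 * when debug_level > 1; *logged records whether a message was printed.
 */
static int stop_rxtx(unsigned busy_reads_left, int *logged)
{
	unsigned i = 1300;	/* illustrative retry budget */

	*logged = 0;
	while (--i && (read_status(&busy_reads_left) &
		       (STATUS_TX_BUSY | STATUS_RX_BUSY)))
		;		/* the driver waits with udelay(10) here */

	if (!i) {
		if (debug_level > 1) {
			fprintf(stderr, "stop_rxtx() failed\n");
			*logged = 1;
		}
		return -1;
	}
	return 0;
}
```

With debug_level left at 1 a timeout is still returned to the caller but nothing is printed, which is the noise reduction the patch is after.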
^ permalink raw reply related
* Re: [PATCH] vhost: Make it more scalable by creating a vhost thread per device.
From: Sridhar Samudrala @ 2010-04-05 17:35 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Tom Lendacky, netdev, kvm@vger.kernel.org
In-Reply-To: <20100404111433.GD3189@redhat.com>
On Sun, 2010-04-04 at 14:14 +0300, Michael S. Tsirkin wrote:
> On Fri, Apr 02, 2010 at 10:31:20AM -0700, Sridhar Samudrala wrote:
> > Make vhost scalable by creating a separate vhost thread per vhost
> > device. This provides better scaling across multiple guests and with
> > multiple interfaces in a guest.
>
> Thanks for looking into this. An alternative approach is
> to simply replace create_singlethread_workqueue with
> create_workqueue which would get us a thread per host CPU.
>
> It seems that in theory this should be the optimal approach
> wrt CPU locality, however, in practice a single thread
> seems to get better numbers. I have a TODO to investigate this.
> Could you try looking into this?
Yes. I tried using create_workqueue(), but the results were not good,
at least when the number of guest interfaces is less than the number
of CPUs. I didn't try more than 8 guests.
Creating a separate thread per guest interface seems to be more
scalable based on the testing I have done so far.
I will try some more tests and get some numbers to compare the following
3 options.
- single vhost thread
- vhost thread per cpu
- vhost thread per guest virtio interface
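
To make the three options concrete, here is a rough model of how many worker threads each one creates and which worker would service a given piece of work. It is a sketch only: the enum and function names are made up for illustration and do not exist in the vhost code.

```c
#include <stddef.h>

/* Hypothetical names for the three threading options listed above. */
enum vhost_thread_policy {
	POLICY_SINGLE,		/* single vhost thread */
	POLICY_PER_CPU,		/* vhost thread per cpu */
	POLICY_PER_DEVICE,	/* vhost thread per guest virtio interface */
};

/* Number of worker threads each policy would create. */
static size_t worker_count(enum vhost_thread_policy p,
			   size_t ncpus, size_t ndevs)
{
	switch (p) {
	case POLICY_SINGLE:	return 1;
	case POLICY_PER_CPU:	return ncpus;
	case POLICY_PER_DEVICE:	return ndevs;
	}
	return 0;
}

/* Index of the worker that services work queued on `cpu` for `dev`. */
static size_t pick_worker(enum vhost_thread_policy p,
			  size_t cpu, size_t dev)
{
	switch (p) {
	case POLICY_SINGLE:	return 0;
	case POLICY_PER_CPU:	return cpu;
	case POLICY_PER_DEVICE:	return dev;
	}
	return 0;
}
```

In this model the per-cpu option only creates more threads than the per-interface one when there are fewer interfaces than CPUs, which is the case where the results above were poor.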
Thanks
Sridhar
>
> >
> > I am seeing better aggregate throughput/latency when running netperf
> > across multiple guests or multiple interfaces in a guest in parallel
> > with this patch.
>
> Any numbers? What happens to CPU utilization?
>
> > Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index a6a88df..29aa80f 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -339,8 +339,10 @@ static int vhost_net_open(struct inode *inode, struct file *f)
> > return r;
> > }
> >
> > - vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT);
> > - vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN);
> > + vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT,
> > + &n->dev);
> > + vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN,
> > + &n->dev);
> > n->tx_poll_state = VHOST_NET_POLL_DISABLED;
> >
> > f->private_data = n;
> > @@ -643,25 +645,14 @@ static struct miscdevice vhost_net_misc = {
> >
> > int vhost_net_init(void)
> > {
> > - int r = vhost_init();
> > - if (r)
> > - goto err_init;
> > - r = misc_register(&vhost_net_misc);
> > - if (r)
> > - goto err_reg;
> > - return 0;
> > -err_reg:
> > - vhost_cleanup();
> > -err_init:
> > - return r;
> > -
> > + return misc_register(&vhost_net_misc);
> > }
> > +
> > module_init(vhost_net_init);
> >
> > void vhost_net_exit(void)
> > {
> > misc_deregister(&vhost_net_misc);
> > - vhost_cleanup();
> > }
> > module_exit(vhost_net_exit);
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 7bd7a1e..243f4d3 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -36,8 +36,6 @@ enum {
> > VHOST_MEMORY_F_LOG = 0x1,
> > };
> >
> > -static struct workqueue_struct *vhost_workqueue;
> > -
> > static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
> > poll_table *pt)
> > {
> > @@ -56,18 +54,19 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
> > if (!((unsigned long)key & poll->mask))
> > return 0;
> >
> > - queue_work(vhost_workqueue, &poll->work);
> > + queue_work(poll->dev->wq, &poll->work);
> > return 0;
> > }
> >
> > /* Init poll structure */
> > void vhost_poll_init(struct vhost_poll *poll, work_func_t func,
> > - unsigned long mask)
> > + unsigned long mask, struct vhost_dev *dev)
> > {
> > INIT_WORK(&poll->work, func);
> > init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
> > init_poll_funcptr(&poll->table, vhost_poll_func);
> > poll->mask = mask;
> > + poll->dev = dev;
> > }
> >
> > /* Start polling a file. We add ourselves to file's wait queue. The caller must
> > @@ -96,7 +95,7 @@ void vhost_poll_flush(struct vhost_poll *poll)
> >
> > void vhost_poll_queue(struct vhost_poll *poll)
> > {
> > - queue_work(vhost_workqueue, &poll->work);
> > + queue_work(poll->dev->wq, &poll->work);
> > }
> >
> > static void vhost_vq_reset(struct vhost_dev *dev,
> > @@ -128,6 +127,11 @@ long vhost_dev_init(struct vhost_dev *dev,
> > struct vhost_virtqueue *vqs, int nvqs)
> > {
> > int i;
> > +
> > + dev->wq = create_singlethread_workqueue("vhost");
> > + if (!dev->wq)
> > + return -ENOMEM;
> > +
> > dev->vqs = vqs;
> > dev->nvqs = nvqs;
> > mutex_init(&dev->mutex);
> > @@ -143,7 +147,7 @@ long vhost_dev_init(struct vhost_dev *dev,
> > if (dev->vqs[i].handle_kick)
> > vhost_poll_init(&dev->vqs[i].poll,
> > dev->vqs[i].handle_kick,
> > - POLLIN);
> > + POLLIN, dev);
> > }
> > return 0;
> > }
> > @@ -216,6 +220,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > if (dev->mm)
> > mmput(dev->mm);
> > dev->mm = NULL;
> > +
> > + destroy_workqueue(dev->wq);
> > }
> >
> > static int log_access_ok(void __user *log_base, u64 addr, unsigned long sz)
> > @@ -1095,16 +1101,3 @@ void vhost_disable_notify(struct vhost_virtqueue *vq)
> > vq_err(vq, "Failed to enable notification at %p: %d\n",
> > &vq->used->flags, r);
> > }
> > -
> > -int vhost_init(void)
> > -{
> > - vhost_workqueue = create_singlethread_workqueue("vhost");
> > - if (!vhost_workqueue)
> > - return -ENOMEM;
> > - return 0;
> > -}
> > -
> > -void vhost_cleanup(void)
> > -{
> > - destroy_workqueue(vhost_workqueue);
> > -}
> > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> > index 44591ba..60fefd0 100644
> > --- a/drivers/vhost/vhost.h
> > +++ b/drivers/vhost/vhost.h
> > @@ -29,10 +29,11 @@ struct vhost_poll {
> > /* struct which will handle all actual work. */
> > struct work_struct work;
> > unsigned long mask;
> > + struct vhost_dev *dev;
> > };
> >
> > void vhost_poll_init(struct vhost_poll *poll, work_func_t func,
> > - unsigned long mask);
> > + unsigned long mask, struct vhost_dev *dev);
> > void vhost_poll_start(struct vhost_poll *poll, struct file *file);
> > void vhost_poll_stop(struct vhost_poll *poll);
> > void vhost_poll_flush(struct vhost_poll *poll);
> > @@ -110,6 +111,7 @@ struct vhost_dev {
> > int nvqs;
> > struct file *log_file;
> > struct eventfd_ctx *log_ctx;
> > + struct workqueue_struct *wq;
> > };
> >
> > long vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue *vqs, int nvqs);
> > @@ -136,9 +138,6 @@ bool vhost_enable_notify(struct vhost_virtqueue *);
> > int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
> > unsigned int log_num, u64 len);
> >
> > -int vhost_init(void);
> > -void vhost_cleanup(void);
> > -
> > #define vq_err(vq, fmt, ...) do { \
> > pr_debug(pr_fmt(fmt), ##__VA_ARGS__); \
> > if ((vq)->error_ctx) \
> >
> >
> >
^ permalink raw reply
* Re: [PATCH] ethtool: add names of newer Marvell chips
From: Stephen Hemminger @ 2010-04-05 17:34 UTC (permalink / raw)
To: Mark Ryden; +Cc: Jeff Garzik, netdev
In-Reply-To: <q2sdac45061004030020n28a1ad17p71ed2c074ebb6450@mail.gmail.com>
On Sat, 3 Apr 2010 10:20:40 +0300
Mark Ryden <markryde@gmail.com> wrote:
> Hi,
>
> > + case 0xba: printf("Yukon Ultra 2"); break;
> > + case 0xbc: printf("Yukon Optima"); break;
>
> What about 0xbb?
> Is there any reason for not using 0xbb for Yukon Optima?
>
> Is it something with blackberry (bb)? :-)
The value comes from a chip register.
The vendor hardware engineers didn't choose to use that version
yet.
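
As a small illustration of why a gap like 0xbb is harmless, a lookup over the register value can simply fall through to "unknown" for IDs the vendor has not assigned. The sketch below lists only the two values quoted in this thread; the function name is hypothetical and is not ethtool's.

```c
#include <stddef.h>

/*
 * Map a chip ID register value to a name. Returns NULL for values
 * with no known assignment (for example 0xbb). Only the two IDs
 * quoted in this thread are listed.
 */
static const char *yukon_chip_name(unsigned char id)
{
	switch (id) {
	case 0xba: return "Yukon Ultra 2";
	case 0xbc: return "Yukon Optima";
	default:   return NULL;
	}
}
```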
^ permalink raw reply
* [PATCH 1/4] flow: virtualize flow cache entry methods
From: Timo Teras @ 2010-04-05 17:01 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev, Timo Teras
In-Reply-To: <20100405091228.GA17059@gondor.apana.org.au>
This allows the cached object to be validated before it is returned.
It also allows the object to be destructed properly if the last
reference was held in the flow cache. This is also a preparation for
caching bundles in the flow cache.
In return for virtualizing the methods, we save on:
- not having to regenerate the whole flow cache on policy removal:
each flow matching a killed policy gets refreshed when the getter
function notices that the policy is dead.
- not having to call flow_cache_flush from policy gc, since the
flow cache now properly deletes the object if it held any references
Signed-off-by: Timo Teras <timo.teras@iki.fi>
---
include/net/flow.h | 23 +++++++--
include/net/xfrm.h | 2 +
net/core/flow.c | 122 +++++++++++++++++++++++++-----------------------
net/xfrm/xfrm_policy.c | 112 +++++++++++++++++++++++++++++---------------
4 files changed, 158 insertions(+), 101 deletions(-)
diff --git a/include/net/flow.h b/include/net/flow.h
index 809970b..bb08692 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -86,11 +86,26 @@ struct flowi {
struct net;
struct sock;
-typedef int (*flow_resolve_t)(struct net *net, struct flowi *key, u16 family,
- u8 dir, void **objp, atomic_t **obj_refp);
+struct flow_cache_ops;
+
+struct flow_cache_object {
+ const struct flow_cache_ops *ops;
+};
+
+struct flow_cache_ops {
+ struct flow_cache_object *(*get)(struct flow_cache_object *);
+ int (*check)(struct flow_cache_object *);
+ void (*delete)(struct flow_cache_object *);
+};
+
+typedef struct flow_cache_object *(*flow_resolve_t)(
+ struct net *net, struct flowi *key, u16 family,
+ u8 dir, struct flow_cache_object *oldobj, void *ctx);
+
+extern struct flow_cache_object *flow_cache_lookup(
+ struct net *net, struct flowi *key, u16 family,
+ u8 dir, flow_resolve_t resolver, void *ctx);
-extern void *flow_cache_lookup(struct net *net, struct flowi *key, u16 family,
- u8 dir, flow_resolve_t resolver);
extern void flow_cache_flush(void);
extern atomic_t flow_cache_genid;
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index d74e080..35396e2 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -19,6 +19,7 @@
#include <net/route.h>
#include <net/ipv6.h>
#include <net/ip6_fib.h>
+#include <net/flow.h>
#include <linux/interrupt.h>
@@ -481,6 +482,7 @@ struct xfrm_policy {
atomic_t refcnt;
struct timer_list timer;
+ struct flow_cache_object flo;
u32 priority;
u32 index;
struct xfrm_mark mark;
diff --git a/net/core/flow.c b/net/core/flow.c
index 1d27ca6..4e9fd37 100644
--- a/net/core/flow.c
+++ b/net/core/flow.c
@@ -26,17 +26,16 @@
#include <linux/security.h>
struct flow_cache_entry {
- struct flow_cache_entry *next;
- u16 family;
- u8 dir;
- u32 genid;
- struct flowi key;
- void *object;
- atomic_t *object_ref;
+ struct flow_cache_entry *next;
+ u16 family;
+ u8 dir;
+ u32 genid;
+ struct flowi key;
+ struct flow_cache_object *object;
};
struct flow_cache_percpu {
- struct flow_cache_entry ** hash_table;
+ struct flow_cache_entry **hash_table;
int hash_count;
u32 hash_rnd;
int hash_rnd_recalc;
@@ -44,7 +43,7 @@ struct flow_cache_percpu {
};
struct flow_flush_info {
- struct flow_cache * cache;
+ struct flow_cache *cache;
atomic_t cpuleft;
struct completion completion;
};
@@ -52,7 +51,7 @@ struct flow_flush_info {
struct flow_cache {
u32 hash_shift;
unsigned long order;
- struct flow_cache_percpu * percpu;
+ struct flow_cache_percpu *percpu;
struct notifier_block hotcpu_notifier;
int low_watermark;
int high_watermark;
@@ -78,12 +77,21 @@ static void flow_cache_new_hashrnd(unsigned long arg)
add_timer(&fc->rnd_timer);
}
+static int flow_entry_valid(struct flow_cache_entry *fle)
+{
+ if (atomic_read(&flow_cache_genid) != fle->genid)
+ return 0;
+ if (fle->object && !fle->object->ops->check(fle->object))
+ return 0;
+ return 1;
+}
+
static void flow_entry_kill(struct flow_cache *fc,
struct flow_cache_percpu *fcp,
struct flow_cache_entry *fle)
{
if (fle->object)
- atomic_dec(fle->object_ref);
+ fle->object->ops->delete(fle->object);
kmem_cache_free(flow_cachep, fle);
fcp->hash_count--;
}
@@ -96,16 +104,18 @@ static void __flow_cache_shrink(struct flow_cache *fc,
int i;
for (i = 0; i < flow_cache_hash_size(fc); i++) {
- int k = 0;
+ int saved = 0;
flp = &fcp->hash_table[i];
- while ((fle = *flp) != NULL && k < shrink_to) {
- k++;
- flp = &fle->next;
- }
while ((fle = *flp) != NULL) {
- *flp = fle->next;
- flow_entry_kill(fc, fcp, fle);
+ if (saved < shrink_to &&
+ flow_entry_valid(fle)) {
+ saved++;
+ flp = &fle->next;
+ } else {
+ *flp = fle->next;
+ flow_entry_kill(fc, fcp, fle);
+ }
}
}
}
@@ -166,18 +176,21 @@ static int flow_key_compare(struct flowi *key1, struct flowi *key2)
return 0;
}
-void *flow_cache_lookup(struct net *net, struct flowi *key, u16 family, u8 dir,
- flow_resolve_t resolver)
+struct flow_cache_object *
+flow_cache_lookup(struct net *net, struct flowi *key, u16 family, u8 dir,
+ flow_resolve_t resolver, void *ctx)
{
struct flow_cache *fc = &flow_cache_global;
struct flow_cache_percpu *fcp;
struct flow_cache_entry *fle, **head;
+ struct flow_cache_object *flo;
unsigned int hash;
local_bh_disable();
fcp = per_cpu_ptr(fc->percpu, smp_processor_id());
fle = NULL;
+ flo = NULL;
/* Packet really early in init? Making flow_cache_init a
* pre-smp initcall would solve this. --RR */
if (!fcp->hash_table)
@@ -185,24 +198,14 @@ void *flow_cache_lookup(struct net *net, struct flowi *key, u16 family, u8 dir,
if (fcp->hash_rnd_recalc)
flow_new_hash_rnd(fc, fcp);
- hash = flow_hash_code(fc, fcp, key);
+ hash = flow_hash_code(fc, fcp, key);
head = &fcp->hash_table[hash];
for (fle = *head; fle; fle = fle->next) {
if (fle->family == family &&
fle->dir == dir &&
- flow_key_compare(key, &fle->key) == 0) {
- if (fle->genid == atomic_read(&flow_cache_genid)) {
- void *ret = fle->object;
-
- if (ret)
- atomic_inc(fle->object_ref);
- local_bh_enable();
-
- return ret;
- }
+ flow_key_compare(key, &fle->key) == 0)
break;
- }
}
if (!fle) {
@@ -219,33 +222,35 @@ void *flow_cache_lookup(struct net *net, struct flowi *key, u16 family, u8 dir,
fle->object = NULL;
fcp->hash_count++;
}
+ } else if (fle->genid == atomic_read(&flow_cache_genid)) {
+ flo = fle->object;
+ if (!flo)
+ goto ret_object;
+ flo = flo->ops->get(flo);
+ if (flo)
+ goto ret_object;
}
nocache:
- {
- int err;
- void *obj;
- atomic_t *obj_ref;
-
- err = resolver(net, key, family, dir, &obj, &obj_ref);
-
- if (fle && !err) {
- fle->genid = atomic_read(&flow_cache_genid);
-
- if (fle->object)
- atomic_dec(fle->object_ref);
-
- fle->object = obj;
- fle->object_ref = obj_ref;
- if (obj)
- atomic_inc(fle->object_ref);
- }
- local_bh_enable();
-
- if (err)
- obj = ERR_PTR(err);
- return obj;
+ flo = NULL;
+ if (fle) {
+ flo = fle->object;
+ fle->object = NULL;
+ }
+ flo = resolver(net, key, family, dir, flo, ctx);
+ if (fle) {
+ fle->genid = atomic_read(&flow_cache_genid);
+ if (!IS_ERR(flo))
+ fle->object = flo;
+ else
+ fle->genid--;
+ } else {
+ if (flo && !IS_ERR(flo))
+ flo->ops->delete(flo);
}
+ret_object:
+ local_bh_enable();
+ return flo;
}
static void flow_cache_flush_tasklet(unsigned long data)
@@ -261,13 +266,12 @@ static void flow_cache_flush_tasklet(unsigned long data)
fle = fcp->hash_table[i];
for (; fle; fle = fle->next) {
- unsigned genid = atomic_read(&flow_cache_genid);
-
- if (!fle->object || fle->genid == genid)
+ if (flow_entry_valid(fle))
continue;
+ if (fle->object)
+ fle->object->ops->delete(fle->object);
fle->object = NULL;
- atomic_dec(fle->object_ref);
}
}
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 82789cf..7722bae 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -216,6 +216,35 @@ expired:
xfrm_pol_put(xp);
}
+static struct flow_cache_object *xfrm_policy_flo_get(struct flow_cache_object *flo)
+{
+ struct xfrm_policy *pol = container_of(flo, struct xfrm_policy, flo);
+
+ if (unlikely(pol->walk.dead))
+ flo = NULL;
+ else
+ xfrm_pol_hold(pol);
+
+ return flo;
+}
+
+static int xfrm_policy_flo_check(struct flow_cache_object *flo)
+{
+ struct xfrm_policy *pol = container_of(flo, struct xfrm_policy, flo);
+
+ return !pol->walk.dead;
+}
+
+static void xfrm_policy_flo_delete(struct flow_cache_object *flo)
+{
+ xfrm_pol_put(container_of(flo, struct xfrm_policy, flo));
+}
+
+static const struct flow_cache_ops xfrm_policy_fc_ops = {
+ .get = xfrm_policy_flo_get,
+ .check = xfrm_policy_flo_check,
+ .delete = xfrm_policy_flo_delete,
+};
/* Allocate xfrm_policy. Not used here, it is supposed to be used by pfkeyv2
* SPD calls.
@@ -236,6 +265,7 @@ struct xfrm_policy *xfrm_policy_alloc(struct net *net, gfp_t gfp)
atomic_set(&policy->refcnt, 1);
setup_timer(&policy->timer, xfrm_policy_timer,
(unsigned long)policy);
+ policy->flo.ops = &xfrm_policy_fc_ops;
}
return policy;
}
@@ -269,9 +299,6 @@ static void xfrm_policy_gc_kill(struct xfrm_policy *policy)
if (del_timer(&policy->timer))
atomic_dec(&policy->refcnt);
- if (atomic_read(&policy->refcnt) > 1)
- flow_cache_flush();
-
xfrm_pol_put(policy);
}
@@ -661,10 +688,8 @@ struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u8 type,
}
write_unlock_bh(&xfrm_policy_lock);
- if (ret && delete) {
- atomic_inc(&flow_cache_genid);
+ if (ret && delete)
xfrm_policy_kill(ret);
- }
return ret;
}
EXPORT_SYMBOL(xfrm_policy_bysel_ctx);
@@ -703,10 +728,8 @@ struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u8 type,
}
write_unlock_bh(&xfrm_policy_lock);
- if (ret && delete) {
- atomic_inc(&flow_cache_genid);
+ if (ret && delete)
xfrm_policy_kill(ret);
- }
return ret;
}
EXPORT_SYMBOL(xfrm_policy_byid);
@@ -822,7 +845,6 @@ int xfrm_policy_flush(struct net *net, u8 type, struct xfrm_audit *audit_info)
}
if (!cnt)
err = -ESRCH;
- atomic_inc(&flow_cache_genid);
out:
write_unlock_bh(&xfrm_policy_lock);
return err;
@@ -976,32 +998,35 @@ fail:
return ret;
}
-static int xfrm_policy_lookup(struct net *net, struct flowi *fl, u16 family,
- u8 dir, void **objp, atomic_t **obj_refp)
+static struct flow_cache_object *
+xfrm_policy_lookup(struct net *net, struct flowi *fl, u16 family,
+ u8 dir, struct flow_cache_object *old_obj, void *ctx)
{
struct xfrm_policy *pol;
- int err = 0;
+
+ if (old_obj)
+ xfrm_pol_put(container_of(old_obj, struct xfrm_policy, flo));
#ifdef CONFIG_XFRM_SUB_POLICY
pol = xfrm_policy_lookup_bytype(net, XFRM_POLICY_TYPE_SUB, fl, family, dir);
- if (IS_ERR(pol)) {
- err = PTR_ERR(pol);
- pol = NULL;
- }
- if (pol || err)
- goto end;
+ if (IS_ERR(pol))
+ return ERR_CAST(pol);
+ if (pol)
+ goto found;
#endif
pol = xfrm_policy_lookup_bytype(net, XFRM_POLICY_TYPE_MAIN, fl, family, dir);
- if (IS_ERR(pol)) {
- err = PTR_ERR(pol);
- pol = NULL;
- }
-#ifdef CONFIG_XFRM_SUB_POLICY
-end:
-#endif
- if ((*objp = (void *) pol) != NULL)
- *obj_refp = &pol->refcnt;
- return err;
+ if (IS_ERR(pol))
+ return ERR_CAST(pol);
+ if (pol)
+ goto found;
+ return NULL;
+
+found:
+ /* Resolver returns two references:
+ * one for cache and one for caller of flow_cache_lookup() */
+ xfrm_pol_hold(pol);
+
+ return &pol->flo;
}
static inline int policy_to_flow_dir(int dir)
@@ -1091,8 +1116,6 @@ int xfrm_policy_delete(struct xfrm_policy *pol, int dir)
pol = __xfrm_policy_unlink(pol, dir);
write_unlock_bh(&xfrm_policy_lock);
if (pol) {
- if (dir < XFRM_POLICY_MAX)
- atomic_inc(&flow_cache_genid);
xfrm_policy_kill(pol);
return 0;
}
@@ -1578,18 +1601,24 @@ restart:
}
if (!policy) {
+ struct flow_cache_object *flo;
+
/* To accelerate a bit... */
if ((dst_orig->flags & DST_NOXFRM) ||
!net->xfrm.policy_count[XFRM_POLICY_OUT])
goto nopol;
- policy = flow_cache_lookup(net, fl, dst_orig->ops->family,
- dir, xfrm_policy_lookup);
- err = PTR_ERR(policy);
- if (IS_ERR(policy)) {
+ flo = flow_cache_lookup(net, fl, dst_orig->ops->family,
+ dir, xfrm_policy_lookup, NULL);
+ err = PTR_ERR(flo);
+ if (IS_ERR(flo)) {
XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTPOLERROR);
goto dropdst;
}
+ if (flo)
+ policy = container_of(flo, struct xfrm_policy, flo);
+ else
+ policy = NULL;
}
if (!policy)
@@ -1939,9 +1968,16 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
}
}
- if (!pol)
- pol = flow_cache_lookup(net, &fl, family, fl_dir,
- xfrm_policy_lookup);
+ if (!pol) {
+ struct flow_cache_object *flo;
+
+ flo = flow_cache_lookup(net, &fl, family, fl_dir,
+ xfrm_policy_lookup, NULL);
+ if (IS_ERR_OR_NULL(flo))
+ pol = ERR_CAST(flo);
+ else
+ pol = container_of(flo, struct xfrm_policy, flo);
+ }
if (IS_ERR(pol)) {
XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR);
--
1.6.3.3
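
The virtualized-entry pattern in this patch (an ops table embedded in the cached object, recovered with container_of) can be tried out in userspace with a simplified model. This is only a sketch under stated assumptions: struct policy, its plain int refcount, and cache_hit() are hypothetical simplifications, and there is no locking or genid handling here.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct flow_cache_object;

struct flow_cache_ops {
	struct flow_cache_object *(*get)(struct flow_cache_object *);
	int (*check)(struct flow_cache_object *);
	void (*delete)(struct flow_cache_object *);
};

struct flow_cache_object {
	const struct flow_cache_ops *ops;
};

/* Hypothetical, simplified policy: plain int refcount, no locking. */
struct policy {
	int refcnt;
	int dead;
	struct flow_cache_object flo;
};

/* Take a reference unless the policy is dead. */
static struct flow_cache_object *policy_get(struct flow_cache_object *flo)
{
	struct policy *pol = container_of(flo, struct policy, flo);

	if (pol->dead)
		return NULL;
	pol->refcnt++;
	return flo;
}

/* Nonzero while the cached object is still usable. */
static int policy_check(struct flow_cache_object *flo)
{
	return !container_of(flo, struct policy, flo)->dead;
}

/* Drop the reference that the cache held. */
static void policy_delete(struct flow_cache_object *flo)
{
	container_of(flo, struct policy, flo)->refcnt--;
}

static const struct flow_cache_ops policy_ops = {
	.get	= policy_get,
	.check	= policy_check,
	.delete	= policy_delete,
};

/*
 * Cache-hit path of the model: return a referenced object, or NULL
 * when the caller has to fall back to the resolver.
 */
static struct flow_cache_object *cache_hit(struct flow_cache_object *cached)
{
	if (!cached)
		return NULL;
	return cached->ops->get(cached);
}
```

The point of the indirection is that the cache itself never needs to know what kind of object it holds; it only calls get, check and delete through the ops pointer.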
^ permalink raw reply related
* Re: Undefined behaviour of connect(fd, NULL, 0);
From: Andreas Schwab @ 2010-04-05 16:25 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Changli Gao, David Miller, neilb, shemminger, netdev
In-Reply-To: <1270483012.4722.161.camel@edumazet-laptop>
Eric Dumazet <eric.dumazet@gmail.com> writes:
> Solaris man page extract :
>
> "Datagram sockets can dissolve the association by connecting to a null
> address."
>
> What is a null address ?
>
> 1) A null pointer ?
> 2) a pointer to a zone, but length of this zone is 0
> 3) Or a pointer to a zone filled with NULL bytes ?
Btw., POSIX.1 has changed the description from "If address is a null
address for the protocol, the socket's peer address shall be reset" in
the 2004 edition to "If the sa_family member of address is AF_UNSPEC,
the socket's peer address shall be reset" in the 2009 edition.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply
* Re: [PATCH] sky2: rx hash offload
From: Eric Dumazet @ 2010-04-05 16:14 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20100405084800.3bcec66a@nehalam>
Le lundi 05 avril 2010 à 08:48 -0700, Stephen Hemminger a écrit :
> Marvell Yukon 2 hardware supports hardware receive hash calculation.
> Now that Receive Packet Steering is available, add support
> to enable it.
>
> Note: still experimental, tested on only a few variants.
> No performance testing has been done.
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
> ---
> drivers/net/sky2.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> drivers/net/sky2.h | 23 ++++++++++++++++
> 2 files changed, 96 insertions(+), 2 deletions(-)
>
Cool :)
I believe some bits are needed in receive_copy() to transfer rxhash to
the new skb?
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index d8ec4c1..f420255 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -2295,6 +2295,8 @@ static struct sk_buff *receive_copy(struct sky2_port *sky2,
skb_copy_from_linear_data(re->skb, skb->data, length);
skb->ip_summed = re->skb->ip_summed;
skb->csum = re->skb->csum;
+ skb->rxhash = re->skb->rxhash;
+ re->skb->rxhash = 0;
pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
length, PCI_DMA_FROMDEVICE);
re->skb->ip_summed = CHECKSUM_NONE;
^ permalink raw reply related
* Re: Undefined behaviour of connect(fd, NULL, 0);
From: Eric Dumazet @ 2010-04-05 15:56 UTC (permalink / raw)
To: Changli Gao; +Cc: David Miller, neilb, shemminger, netdev
In-Reply-To: <w2n412e6f7f1004050223j3e15df91tcdf133670c636a85@mail.gmail.com>
Le lundi 05 avril 2010 à 17:23 +0800, Changli Gao a écrit :
> I found this from the man page of FreeBSD's connect(2).
>
> Generally, stream sockets may successfully connect() only
> once; datagram sockets may use connect() multiple times to change their
> association. Datagram sockets may dissolve the association by connecting
> to an invalid address, such as a null address.
>
> And this from the man page of Darwin's connect(2).
>
> Datagram sockets may dissolve the association by connecting to an
> invalid address, such as a null address or an address with the address
> family set to AF_UNSPEC (the error EAFNOSUPPORT will be harmlessly
> returned).
>
> Since null address behavior has been defined by the others, I think
> Linux should be compatible with them. So the patch I submitted on
> this should not be applied. I'll work out another patch later.
>
As David pointed out, no sane application would use this facility for a
decade, so I wonder why you insist so much on this minor detail.
Solaris man page extract :
"Datagram sockets can dissolve the association by connecting to a null
address."
What is a null address ?
1) A null pointer ?
2) a pointer to a zone, but length of this zone is 0
3) Or a pointer to a zone filled with NULL bytes ?
Linux implements the latter interpretation. It's more than enough.
If a NULL pointer was implemented, man pages would use the following
words : "Datagram sockets can dissolve the association by connecting to
a NULL pointer (NULL second argument to connect())."
If you submit a patch to change connect() behavior, don't forget to send
appropriate changes to Michael, because in the end, nobody but you knows
how things are supposed to work if not documented.
MAN-PAGES: MANUAL PAGES FOR LINUX -- Sections 2, 3, 4, 5, and 7
M: Michael Kerrisk <mtk.manpages@gmail.com>
W: http://www.kernel.org/doc/man-pages
^ permalink raw reply
* [PATCH] sky2: rx hash offload
From: Stephen Hemminger @ 2010-04-05 15:48 UTC (permalink / raw)
To: David Miller; +Cc: netdev
Marvell Yukon 2 hardware supports hardware receive hash calculation.
Now that Receive Packet Steering is available, add support
to enable it.
Note: still experimental, tested on only a few variants.
No performance testing has been done.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
drivers/net/sky2.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++--
drivers/net/sky2.h | 23 ++++++++++++++++
2 files changed, 96 insertions(+), 2 deletions(-)
--- a/drivers/net/sky2.c 2010-04-04 15:04:22.582288437 -0700
+++ b/drivers/net/sky2.c 2010-04-05 08:37:47.924236795 -0700
@@ -1195,6 +1195,39 @@ static void rx_set_checksum(struct sky2_
? BMU_ENA_RX_CHKSUM : BMU_DIS_RX_CHKSUM);
}
+/* Enable/disable receive hash calculation (RSS) */
+static void rx_set_rss(struct net_device *dev)
+{
+ struct sky2_port *sky2 = netdev_priv(dev);
+ struct sky2_hw *hw = sky2->hw;
+ int i, nkeys = 4;
+
+ /* Supports IPv6 and other modes */
+ if (hw->flags & SKY2_HW_NEW_LE) {
+ nkeys = 10;
+ sky2_write32(hw, SK_REG(sky2->port, RSS_CFG), HASH_ALL);
+ }
+
+ /* Program RSS initial values */
+ if (dev->features & NETIF_F_RXHASH) {
+ u32 key[nkeys];
+
+ get_random_bytes(key, nkeys * sizeof(u32));
+ for (i = 0; i < nkeys; i++)
+ sky2_write32(hw, SK_REG(sky2->port, RSS_KEY + i * 4),
+ key[i]);
+
+ /* Need to turn on (undocumented) flag to make hashing work */
+ sky2_write32(hw, SK_REG(sky2->port, RX_GMF_CTRL_T),
+ RX_STFW_ENA);
+
+ sky2_write32(hw, Q_ADDR(rxqaddr[sky2->port], Q_CSR),
+ BMU_ENA_RX_RSS_HASH);
+ } else
+ sky2_write32(hw, Q_ADDR(rxqaddr[sky2->port], Q_CSR),
+ BMU_DIS_RX_RSS_HASH);
+}
+
/*
* The RX Stop command will not work for Yukon-2 if the BMU does not
* reach the end of packet and since we can't make sure that we have
@@ -1427,6 +1460,9 @@ static void sky2_rx_start(struct sky2_po
if (!(hw->flags & SKY2_HW_NEW_LE))
rx_set_checksum(sky2);
+ if (!(hw->flags & SKY2_HW_RSS_BROKEN))
+ rx_set_rss(sky2->netdev);
+
/* submit Rx ring */
for (i = 0; i < sky2->rx_pending; i++) {
re = sky2->rx_ring + i;
@@ -2536,6 +2572,14 @@ static void sky2_rx_checksum(struct sky2
}
}
+static void sky2_rx_hash(struct sky2_port *sky2, u32 status)
+{
+ struct sk_buff *skb;
+
+ skb = sky2->rx_ring[sky2->rx_next].skb;
+ skb->rxhash = le32_to_cpu(status);
+}
+
/* Process status response ring */
static int sky2_status_intr(struct sky2_hw *hw, int to_do, u16 idx)
{
@@ -2608,6 +2652,10 @@ static int sky2_status_intr(struct sky2_
sky2_rx_checksum(sky2, status);
break;
+ case OP_RSS_HASH:
+ sky2_rx_hash(sky2, status);
+ break;
+
case OP_TXINDEXLE:
/* TX index reports status for both ports */
sky2_tx_done(hw->dev[0], status & 0xfff);
@@ -2962,6 +3010,8 @@ static int __devinit sky2_init(struct sk
switch(hw->chip_id) {
case CHIP_ID_YUKON_XL:
hw->flags = SKY2_HW_GIGABIT | SKY2_HW_NEWER_PHY;
+ if (hw->chip_rev < CHIP_REV_YU_XL_A2)
+ hw->flags |= SKY2_HW_RSS_BROKEN;
break;
case CHIP_ID_YUKON_EC_U:
@@ -2987,10 +3037,11 @@ static int __devinit sky2_init(struct sk
dev_err(&hw->pdev->dev, "unsupported revision Yukon-EC rev A1\n");
return -EOPNOTSUPP;
}
- hw->flags = SKY2_HW_GIGABIT;
+ hw->flags = SKY2_HW_GIGABIT | SKY2_HW_RSS_BROKEN;
break;
case CHIP_ID_YUKON_FE:
+ hw->flags = SKY2_HW_RSS_BROKEN;
break;
case CHIP_ID_YUKON_FE_P:
@@ -4114,6 +4165,28 @@ static int sky2_set_eeprom(struct net_de
return sky2_vpd_write(sky2->hw, cap, data, eeprom->offset, eeprom->len);
}
+static int sky2_set_flags(struct net_device *dev, u32 data)
+{
+ struct sky2_port *sky2 = netdev_priv(dev);
+
+ if (data & ETH_FLAG_LRO)
+ return -EOPNOTSUPP;
+
+ if (data & ETH_FLAG_NTUPLE)
+ return -EOPNOTSUPP;
+
+ if (data & ETH_FLAG_RXHASH) {
+ if (sky2->hw->flags & SKY2_HW_RSS_BROKEN)
+ return -EINVAL;
+
+ dev->features |= NETIF_F_RXHASH;
+ } else
+ dev->features &= ~NETIF_F_RXHASH;
+
+ rx_set_rss(dev);
+
+ return 0;
+}
static const struct ethtool_ops sky2_ethtool_ops = {
.get_settings = sky2_get_settings,
@@ -4145,6 +4218,7 @@ static const struct ethtool_ops sky2_eth
.phys_id = sky2_phys_id,
.get_sset_count = sky2_get_sset_count,
.get_ethtool_stats = sky2_get_ethtool_stats,
+ .set_flags = sky2_set_flags,
};
#ifdef CONFIG_SKY2_DEBUG
@@ -4497,6 +4571,11 @@ static __devinit struct net_device *sky2
if (highmem)
dev->features |= NETIF_F_HIGHDMA;
+#ifdef CONFIG_RPS
+ if (!(hw->flags & SKY2_HW_RSS_BROKEN))
+ dev->features |= NETIF_F_RXHASH;
+#endif
+
#ifdef SKY2_VLAN_TAG_USED
/* The workaround for FE+ status conflicts with VLAN tag detection. */
if (!(sky2->hw->chip_id == CHIP_ID_YUKON_FE_P &&
--- a/drivers/net/sky2.h 2010-04-02 15:18:00.289206825 -0700
+++ b/drivers/net/sky2.h 2010-04-04 15:05:22.161352031 -0700
@@ -694,8 +694,21 @@ enum {
TXA_CTRL = 0x0210,/* 8 bit Tx Arbiter Control Register */
TXA_TEST = 0x0211,/* 8 bit Tx Arbiter Test Register */
TXA_STAT = 0x0212,/* 8 bit Tx Arbiter Status Register */
+
+ RSS_KEY = 0x0220, /* RSS Key setup */
+ RSS_CFG = 0x0248, /* RSS Configuration */
};
+enum {
+ HASH_TCP_IPV6_EX_CTRL = 1<<5,
+ HASH_IPV6_EX_CTRL = 1<<4,
+ HASH_TCP_IPV6_CTRL = 1<<3,
+ HASH_IPV6_CTRL = 1<<2,
+ HASH_TCP_IPV4_CTRL = 1<<1,
+ HASH_IPV4_CTRL = 1<<0,
+
+ HASH_ALL = 0x3f,
+};
enum {
B6_EXT_REG = 0x0300,/* External registers (GENESIS only) */
@@ -2261,6 +2274,7 @@ struct sky2_hw {
#define SKY2_HW_NEW_LE 0x00000020 /* new LSOv2 format */
#define SKY2_HW_AUTO_TX_SUM 0x00000040 /* new IP decode for Tx */
#define SKY2_HW_ADV_POWER_CTL 0x00000080 /* additional PHY power regs */
+#define SKY2_HW_RSS_BROKEN 0x00000100
u8 chip_id;
u8 chip_rev;
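
The decision logic of sky2_set_flags() in the patch above can be exercised outside the kernel with a small model. This is a sketch under stated assumptions: the bit values, struct model_dev and model_set_flags() are illustrative stand-ins, not the kernel's ETH_FLAG_*, NETIF_F_* or SKY2_HW_* definitions, and the register programming done by rx_set_rss() is left out.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative bit values; the kernel's real constants differ. */
#define FLAG_LRO	(1u << 0)
#define FLAG_NTUPLE	(1u << 1)
#define FLAG_RXHASH	(1u << 2)
#define FEATURE_RXHASH	(1u << 0)
#define HW_RSS_BROKEN	(1u << 0)

/* Hypothetical stand-in for the pieces of device state involved. */
struct model_dev {
	uint32_t hw_flags;
	uint32_t features;
};

/*
 * Reject unsupported flags, refuse RX hashing on hardware marked as
 * broken, otherwise switch the feature bit on or off.
 */
static int model_set_flags(struct model_dev *dev, uint32_t data)
{
	if (data & (FLAG_LRO | FLAG_NTUPLE))
		return -EOPNOTSUPP;

	if (data & FLAG_RXHASH) {
		if (dev->hw_flags & HW_RSS_BROKEN)
			return -EINVAL;
		dev->features |= FEATURE_RXHASH;
	} else {
		dev->features &= ~FEATURE_RXHASH;
	}
	return 0;
}
```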
^ permalink raw reply
* RE: CAIF device
From: Sjur BRENDELAND @ 2010-04-05 11:19 UTC (permalink / raw)
To: Alan, netdev@vger.kernel.org
In-Reply-To: <20100401160916.2a2574f4@lxorguk.ukuu.org.uk>
Hi Alan.
Alan wrote:
> I was reading through the CAIF code and I noticed a couple of bugs
>
> Doesn't check there is a write method so set on a read only
> device it's not good news (doubly so as there seem to be no
> permission checks ?) plus no permissions checks and also the
> following which looks unsafe
>
> dev_close(ser->dev);
> unregister_netdevice(ser->dev);
> list_del(&ser->node);
> debugfs_deinit(ser);
>
> Now ser is the netdev private data so what stops it going away when
> unregister_netdev is called ?
I think this should work fine, as ser->dev is unregistered under rtnl_lock,
which delays the freeing of the device until rtnl_unlock.
>
> Secondly tty devices are ref counted and this for some reason didn't
> get fixed in the driver yet.
>
> [Patches to follow for the write and kref bugs, the others need the
> authors and someone who knows the netdev code these days to fix]
Thanks, looking forward to reviewing your patches.
BR/Sjur
^ permalink raw reply
* Re: Undefined behaviour of connect(fd, NULL, 0);
From: Changli Gao @ 2010-04-05 9:23 UTC (permalink / raw)
To: David Miller; +Cc: neilb, shemminger, netdev
In-Reply-To: <20100401.002319.236233308.davem@davemloft.net>
On Thu, Apr 1, 2010 at 3:23 PM, David Miller <davem@davemloft.net> wrote:
>
> This seems logical, but I believe it is wrong.
>
> We already know for a fact that it is guaranteed to not work
> reliably for every single kernel in existence in the world
> right now.
>
> Every system. Ones that have been deployed for 10 years as
> well as those built from GIT 10 seconds ago.
>
> So you tell me, if you put this into an application that you
> wish to deploy anywhere, are you not being completely stupid?
>
> Therefore, if it's illogical to use this in an application, what value
> is there in starting to support it now in the kernel?
>
> I'll tell you, the value is absolutely zero.
>
> Yes we need to add the length check, but the behavior we give to this
> case as a result, is completely arbitrary. And I would in fact argue
> for a hard error in these cases.
>
> Simply mark it as invalid to call connect() this way.
>
I found this in the man page of FreeBSD's connect(2).
Generally, stream sockets may successfully connect() only
once; datagram sockets may use connect() multiple times to change their
association. Datagram sockets may dissolve the association by connecting
to an invalid address, such as a null address.
And this from the man page of Darwin's connect(2).
Datagram sockets may dissolve the association by connecting to an
invalid address, such as a null address or an address with the address
family set to AF_UNSPEC (the error EAFNOSUPPORT will be harmlessly
returned).
Since the null address behavior has already been defined by other systems, I
think Linux should be compatible with them. So the patch I submitted for
this should not be applied. I'll work out another patch later.
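For illustration, the behavior described in those man pages can be sketched in user space as below. This is only a minimal sketch, not the patch under discussion: the helper names has_peer() and dissolve_udp_association() are made up for this example, and the loopback peer and port 9 are arbitrary assumptions (no packet is sent).

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd has a peer, 0 if not (ENOTCONN), -1 on error. */
static int has_peer(int fd)
{
	struct sockaddr_storage ss;
	socklen_t len = sizeof(ss);

	if (getpeername(fd, (struct sockaddr *)&ss, &len) == 0)
		return 1;
	return errno == ENOTCONN ? 0 : -1;
}

/*
 * Hypothetical helper: associate a UDP socket with a loopback peer, then
 * dissolve the association by connecting to an AF_UNSPEC address.
 * Returns 0 if the socket had a peer before and has none afterwards.
 */
int dissolve_udp_association(void)
{
	struct sockaddr_in peer;
	struct sockaddr unspec;
	int fd, ret = -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&peer, 0, sizeof(peer));
	peer.sin_family = AF_INET;
	peer.sin_port = htons(9);	/* arbitrary port, nothing is sent */
	peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0)
		goto out;
	if (has_peer(fd) != 1)
		goto out;

	memset(&unspec, 0, sizeof(unspec));
	unspec.sa_family = AF_UNSPEC;
	/* Darwin documents a harmless EAFNOSUPPORT here, so tolerate it. */
	if (connect(fd, &unspec, sizeof(unspec)) < 0 && errno != EAFNOSUPPORT)
		goto out;

	if (has_peer(fd) == 0)
		ret = 0;
out:
	close(fd);
	return ret;
}
```

Note that this passes a real sockaddr with sa_family set to AF_UNSPEC, which is the well-defined variant; it does not exercise connect(fd, NULL, 0).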
--
Regards,
Changli Gao(xiaosuo@gmail.com)
^ permalink raw reply
* [v2 Patch 3/3] bonding: make bonding support netpoll
From: Amerigo Wang @ 2010-04-05 9:12 UTC (permalink / raw)
To: linux-kernel
Cc: Matt Mackall, netdev, bridge, Andy Gospodarek, Neil Horman,
Amerigo Wang, Jeff Moyer, Stephen Hemminger, bonding-devel,
Jay Vosburgh, David Miller
In-Reply-To: <20100405091605.4890.31181.sendpatchset@localhost.localdomain>
Based on Andy's work, with substantial modifications.
Similar to the bridge patch, this patch:
1) implements the two methods needed to support netpoll for bonding;
2) modifies the netpoll state while forwarding packets via bonding;
3) disables netpoll support for the bond when a device that does not
support netpoll is added to it;
4) enables netpoll support when all underlying devices support netpoll.
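The enable/disable rule in points 3) and 4) can be sketched in plain C roughly as follows. This is a simplified user-space model, not the kernel code: struct sketch_dev, SKETCH_DISABLE_NETPOLL and the function names are stand-ins invented for this example, and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_DISABLE_NETPOLL 0x1u	/* stand-in for IFF_DISABLE_NETPOLL */

/* Hypothetical, heavily reduced model of a net_device. */
struct sketch_dev {
	unsigned int priv_flags;
	void (*poll_controller)(struct sketch_dev *dev);	/* NULL if unsupported */
};

/* Dummy poll method so that a device can claim netpoll support. */
void sketch_poll(struct sketch_dev *dev)
{
	(void)dev;
}

/*
 * True only if there is at least one underlying device and every one of
 * them both provides a poll method and has not disabled netpoll.
 */
bool sketch_all_support_netpoll(const struct sketch_dev *devs, size_t n)
{
	bool ret = true;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((devs[i].priv_flags & SKETCH_DISABLE_NETPOLL) ||
		    !devs[i].poll_controller)
			ret = false;
	}
	return n != 0 && ret;
}
```

An empty set of underlying devices counts as "not supported", which matches the `i != 0 && ret` return in the patch below.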
Cc: Andy Gospodarek <gospo@redhat.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: WANG Cong <amwang@redhat.com>
---
Index: linux-2.6/drivers/net/bonding/bond_main.c
===================================================================
--- linux-2.6.orig/drivers/net/bonding/bond_main.c
+++ linux-2.6/drivers/net/bonding/bond_main.c
@@ -59,6 +59,7 @@
#include <linux/uaccess.h>
#include <linux/errno.h>
#include <linux/netdevice.h>
+#include <linux/netpoll.h>
#include <linux/inetdevice.h>
#include <linux/igmp.h>
#include <linux/etherdevice.h>
@@ -430,7 +431,18 @@ int bond_dev_queue_xmit(struct bonding *
}
skb->priority = 1;
- dev_queue_xmit(skb);
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ if (bond->dev->priv_flags & IFF_IN_NETPOLL) {
+ struct netpoll *np = bond->dev->npinfo->netpoll;
+ slave_dev->npinfo = bond->dev->npinfo;
+ np->real_dev = np->dev = skb->dev;
+ slave_dev->priv_flags |= IFF_IN_NETPOLL;
+ netpoll_send_skb(np, skb);
+ slave_dev->priv_flags &= ~IFF_IN_NETPOLL;
+ np->dev = bond->dev;
+ } else
+#endif
+ dev_queue_xmit(skb);
return 0;
}
@@ -1329,6 +1341,60 @@ static void bond_detach_slave(struct bon
bond->slave_cnt--;
}
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static bool slaves_support_netpoll(struct net_device *bond_dev)
+{
+ struct bonding *bond = netdev_priv(bond_dev);
+ struct slave *slave;
+ int i = 0;
+ bool ret = true;
+
+ read_lock(&bond->lock);
+ bond_for_each_slave(bond, slave, i) {
+ if ((slave->dev->priv_flags & IFF_DISABLE_NETPOLL)
+ || !slave->dev->netdev_ops->ndo_poll_controller)
+ ret = false;
+ }
+ read_unlock(&bond->lock);
+ return i != 0 && ret;
+}
+
+static void bond_poll_controller(struct net_device *bond_dev)
+{
+ struct net_device *dev = bond_dev->npinfo->netpoll->real_dev;
+ if (dev != bond_dev)
+ netpoll_poll_dev(dev);
+}
+
+static void bond_netpoll_cleanup(struct net_device *bond_dev)
+{
+ struct bonding *bond = netdev_priv(bond_dev);
+ struct slave *slave;
+ const struct net_device_ops *ops;
+ int i;
+
+ read_lock(&bond->lock);
+ bond_dev->npinfo = NULL;
+ bond_for_each_slave(bond, slave, i) {
+ if (slave->dev) {
+ ops = slave->dev->netdev_ops;
+ if (ops->ndo_netpoll_cleanup)
+ ops->ndo_netpoll_cleanup(slave->dev);
+ else
+ slave->dev->npinfo = NULL;
+ }
+ }
+ read_unlock(&bond->lock);
+}
+
+#else
+
+static void bond_netpoll_cleanup(struct net_device *bond_dev)
+{
+}
+
+#endif
+
/*---------------------------------- IOCTL ----------------------------------*/
static int bond_sethwaddr(struct net_device *bond_dev,
@@ -1746,6 +1812,18 @@ int bond_enslave(struct net_device *bond
new_slave->state == BOND_STATE_ACTIVE ? "n active" : " backup",
new_slave->link != BOND_LINK_DOWN ? "n up" : " down");
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ if (slaves_support_netpoll(bond_dev)) {
+ bond_dev->priv_flags &= ~IFF_DISABLE_NETPOLL;
+ if (bond_dev->npinfo)
+ slave_dev->npinfo = bond_dev->npinfo;
+ } else if (!(bond_dev->priv_flags & IFF_DISABLE_NETPOLL)) {
+ bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
+ pr_info("New slave device %s does not support netpoll\n",
+ slave_dev->name);
+ pr_info("Disabling netpoll support for %s\n", bond_dev->name);
+ }
+#endif
/* enslave is successful */
return 0;
@@ -1929,6 +2007,15 @@ int bond_release(struct net_device *bond
netdev_set_master(slave_dev, NULL);
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ if (slaves_support_netpoll(bond_dev))
+ bond_dev->priv_flags &= ~IFF_DISABLE_NETPOLL;
+ if (slave_dev->netdev_ops->ndo_netpoll_cleanup)
+ slave_dev->netdev_ops->ndo_netpoll_cleanup(slave_dev);
+ else
+ slave_dev->npinfo = NULL;
+#endif
+
/* close slave before restoring its mac address */
dev_close(slave_dev);
@@ -4448,6 +4535,10 @@ static const struct net_device_ops bond_
.ndo_vlan_rx_register = bond_vlan_rx_register,
.ndo_vlan_rx_add_vid = bond_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = bond_vlan_rx_kill_vid,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ .ndo_netpoll_cleanup = bond_netpoll_cleanup,
+ .ndo_poll_controller = bond_poll_controller,
+#endif
};
static void bond_setup(struct net_device *bond_dev)
@@ -4533,6 +4624,8 @@ static void bond_uninit(struct net_devic
{
struct bonding *bond = netdev_priv(bond_dev);
+ bond_netpoll_cleanup(bond_dev);
+
/* Release the bonded slaves */
bond_release_all(bond_dev);
^ permalink raw reply
* [v2 Patch 2/3] bridge: make bridge support netpoll
From: Amerigo Wang @ 2010-04-05 9:12 UTC (permalink / raw)
To: linux-kernel
Cc: Stephen Hemminger, netdev, bridge, Andy Gospodarek, Neil Horman,
Amerigo Wang, Jeff Moyer, Matt Mackall, bonding-devel,
Jay Vosburgh, David Miller
In-Reply-To: <20100405091605.4890.31181.sendpatchset@localhost.localdomain>
Based on the previous patch, make bridge support netpoll by:
1) implementing the two methods needed to support netpoll for bridge;
2) modifying the netpoll state while forwarding packets via the bridge;
3) disabling netpoll support for the bridge when a device that does not
support netpoll is added to it;
4) enabling netpoll support when all underlying devices support netpoll.
Cc: David Miller <davem@davemloft.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
---
Index: linux-2.6/net/bridge/br_device.c
===================================================================
--- linux-2.6.orig/net/bridge/br_device.c
+++ linux-2.6/net/bridge/br_device.c
@@ -13,8 +13,10 @@
#include <linux/kernel.h>
#include <linux/netdevice.h>
+#include <linux/netpoll.h>
#include <linux/etherdevice.h>
#include <linux/ethtool.h>
+#include <linux/list.h>
#include <asm/uaccess.h>
#include "br_private.h"
@@ -162,6 +164,59 @@ static int br_set_tx_csum(struct net_dev
return 0;
}
+#ifdef CONFIG_NET_POLL_CONTROLLER
+bool br_devices_support_netpoll(struct net_bridge *br)
+{
+ struct net_bridge_port *p;
+ bool ret = true;
+ int count = 0;
+ unsigned long flags;
+
+ spin_lock_irqsave(&br->lock, flags);
+ list_for_each_entry(p, &br->port_list, list) {
+ count++;
+ if (p->dev->priv_flags & IFF_DISABLE_NETPOLL
+ || !p->dev->netdev_ops->ndo_poll_controller)
+ ret = false;
+ }
+ spin_unlock_irqrestore(&br->lock, flags);
+ return count != 0 && ret;
+}
+
+static void br_poll_controller(struct net_device *br_dev)
+{
+ struct netpoll *np = br_dev->npinfo->netpoll;
+
+ if (np->real_dev != br_dev)
+ netpoll_poll_dev(np->real_dev);
+}
+
+void br_netpoll_cleanup(struct net_device *br_dev)
+{
+ struct net_bridge *br = netdev_priv(br_dev);
+ struct net_bridge_port *p, *n;
+ const struct net_device_ops *ops;
+
+ br->dev->npinfo = NULL;
+ list_for_each_entry_safe(p, n, &br->port_list, list) {
+ if (p->dev) {
+ ops = p->dev->netdev_ops;
+ if (ops->ndo_netpoll_cleanup)
+ ops->ndo_netpoll_cleanup(p->dev);
+ else
+ p->dev->npinfo = NULL;
+ }
+ }
+}
+
+#else
+
+void br_netpoll_cleanup(struct net_device *br_dev)
+{
+}
+
+#endif
+
static const struct ethtool_ops br_ethtool_ops = {
.get_drvinfo = br_getinfo,
.get_link = ethtool_op_get_link,
@@ -184,6 +239,10 @@ static const struct net_device_ops br_ne
.ndo_set_multicast_list = br_dev_set_multicast_list,
.ndo_change_mtu = br_change_mtu,
.ndo_do_ioctl = br_dev_ioctl,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ .ndo_netpoll_cleanup = br_netpoll_cleanup,
+ .ndo_poll_controller = br_poll_controller,
+#endif
};
void br_dev_setup(struct net_device *dev)
Index: linux-2.6/net/bridge/br_forward.c
===================================================================
--- linux-2.6.orig/net/bridge/br_forward.c
+++ linux-2.6/net/bridge/br_forward.c
@@ -14,6 +14,7 @@
#include <linux/err.h>
#include <linux/kernel.h>
#include <linux/netdevice.h>
+#include <linux/netpoll.h>
#include <linux/skbuff.h>
#include <linux/if_vlan.h>
#include <linux/netfilter_bridge.h>
@@ -49,7 +50,13 @@ int br_dev_queue_push_xmit(struct sk_buf
else {
skb_push(skb, ETH_HLEN);
- dev_queue_xmit(skb);
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ if (skb->dev->priv_flags & IFF_IN_NETPOLL) {
+ netpoll_send_skb(skb->dev->npinfo->netpoll, skb);
+ skb->dev->priv_flags &= ~IFF_IN_NETPOLL;
+ } else
+#endif
+ dev_queue_xmit(skb);
}
}
@@ -65,9 +72,23 @@ int br_forward_finish(struct sk_buff *sk
static void __br_deliver(const struct net_bridge_port *to, struct sk_buff *skb)
{
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ struct net_bridge *br = to->br;
+ if (br->dev->priv_flags & IFF_IN_NETPOLL) {
+ struct netpoll *np;
+ to->dev->npinfo = skb->dev->npinfo;
+ np = skb->dev->npinfo->netpoll;
+ np->real_dev = np->dev = to->dev;
+ to->dev->priv_flags |= IFF_IN_NETPOLL;
+ }
+#endif
skb->dev = to->dev;
NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
br_forward_finish);
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ if (skb->dev->npinfo)
+ skb->dev->npinfo->netpoll->dev = br->dev;
+#endif
}
static void __br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
Index: linux-2.6/net/bridge/br_if.c
===================================================================
--- linux-2.6.orig/net/bridge/br_if.c
+++ linux-2.6/net/bridge/br_if.c
@@ -19,6 +19,7 @@
#include <linux/init.h>
#include <linux/rtnetlink.h>
#include <linux/if_ether.h>
+#include <linux/netpoll.h>
#include <net/sock.h>
#include "br_private.h"
@@ -152,6 +153,14 @@ static void del_nbp(struct net_bridge_po
kobject_uevent(&p->kobj, KOBJ_REMOVE);
kobject_del(&p->kobj);
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ if (br_devices_support_netpoll(br))
+ br->dev->priv_flags &= ~IFF_DISABLE_NETPOLL;
+ if (dev->netdev_ops->ndo_netpoll_cleanup)
+ dev->netdev_ops->ndo_netpoll_cleanup(dev);
+ else
+ dev->npinfo = NULL;
+#endif
call_rcu(&p->rcu, destroy_nbp_rcu);
}
@@ -164,6 +173,8 @@ static void del_br(struct net_bridge *br
del_nbp(p);
}
+ br_netpoll_cleanup(br->dev);
+
del_timer_sync(&br->gc_timer);
br_sysfs_delbr(br->dev);
@@ -437,6 +448,20 @@ int br_add_if(struct net_bridge *br, str
kobject_uevent(&p->kobj, KOBJ_ADD);
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ if (br_devices_support_netpoll(br)) {
+ br->dev->priv_flags &= ~IFF_DISABLE_NETPOLL;
+ if (br->dev->npinfo)
+ dev->npinfo = br->dev->npinfo;
+ } else if (!(br->dev->priv_flags & IFF_DISABLE_NETPOLL)) {
+ br->dev->priv_flags |= IFF_DISABLE_NETPOLL;
+ printk(KERN_INFO "New device %s does not support netpoll\n",
+ dev->name);
+ printk(KERN_INFO "Disabling netpoll for %s\n",
+ br->dev->name);
+ }
+#endif
+
return 0;
err2:
br_fdb_delete_by_port(br, p, 1);
Index: linux-2.6/net/bridge/br_private.h
===================================================================
--- linux-2.6.orig/net/bridge/br_private.h
+++ linux-2.6/net/bridge/br_private.h
@@ -233,6 +233,8 @@ static inline int br_is_root_bridge(cons
extern void br_dev_setup(struct net_device *dev);
extern netdev_tx_t br_dev_xmit(struct sk_buff *skb,
struct net_device *dev);
+extern bool br_devices_support_netpoll(struct net_bridge *br);
+extern void br_netpoll_cleanup(struct net_device *br_dev);
/* br_fdb.c */
extern int br_fdb_init(void);
^ permalink raw reply
* [v2 Patch 1/3] netpoll: add generic support for bridge and bonding devices
From: Amerigo Wang @ 2010-04-05 9:12 UTC (permalink / raw)
To: linux-kernel
Cc: Matt Mackall, netdev, bridge, Andy Gospodarek, Neil Horman,
Amerigo Wang, Jeff Moyer, Stephen Hemminger, bonding-devel,
Jay Vosburgh, David Miller
V2:
Fix some bugs in the previous version.
Remove ->netpoll_setup and ->netpoll_xmit; they are not necessary.
Don't poll all underlying devices; poll ->real_dev in struct netpoll.
Thanks to David for suggesting the above.
--------->
This patchset adds netpoll support to bridge and bonding
devices. I have tested it with bridge, bonding, bridge over bonding,
and bonding over bridge. It looks fine now.
Please comment.
To make bridge and bonding support netpoll, we need to adjust
some generic netpoll code. This patch does the following:
1) introduce two new priv_flags for struct net_device:
IFF_IN_NETPOLL, which indicates that we are processing a netpoll;
IFF_DISABLE_NETPOLL, which is used to disable netpoll support for a device
at run-time;
2) introduce one new method for netdev_ops:
->ndo_netpoll_cleanup(), which is used to clean up netpoll when a device is
removed;
3) introduce netpoll_poll_dev(), which takes a struct net_device * parameter;
export netpoll_send_skb() and netpoll_poll_dev(), which will be used later;
4) store a pointer to struct netpoll in struct netpoll_info, also for later use;
5) introduce ->real_dev for struct netpoll.
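As a rough user-space model of points 1) and 2), the sketch below brackets a transmit call with an "in netpoll" flag and falls back to clearing npinfo when a device has no cleanup method. All sketch_* names and the reduced structures are assumptions made for this example; only the overall shape follows the hunks in net/core/netpoll.c below.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_IN_NETPOLL 0x1000u	/* stand-in for IFF_IN_NETPOLL */

struct sketch_dev;

/* Hypothetical, reduced model of net_device_ops. */
struct sketch_ops {
	int (*start_xmit)(struct sketch_dev *dev);
	void (*netpoll_cleanup)(struct sketch_dev *dev);	/* optional */
};

struct sketch_dev {
	unsigned int priv_flags;
	void *npinfo;
	const struct sketch_ops *ops;
	unsigned int flags_seen_in_xmit;	/* recorded for demonstration only */
};

/* Dummy transmit method that records which flags were set when it ran. */
int sketch_xmit(struct sketch_dev *dev)
{
	dev->flags_seen_in_xmit = dev->priv_flags;
	return 0;
}

/* The flag is set only for the duration of the transmit call. */
int sketch_netpoll_xmit(struct sketch_dev *dev)
{
	int status;

	dev->priv_flags |= SKETCH_IN_NETPOLL;
	status = dev->ops->start_xmit(dev);
	dev->priv_flags &= ~SKETCH_IN_NETPOLL;
	return status;
}

/* Use the device's cleanup method if it has one, else just drop npinfo. */
void sketch_netpoll_release(struct sketch_dev *dev)
{
	if (dev->ops->netpoll_cleanup)
		dev->ops->netpoll_cleanup(dev);
	else
		dev->npinfo = NULL;
}
```

A stacked device such as a bridge or bond would supply its own cleanup method so that it can also clear the npinfo pointers it copied to its underlying devices.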
Cc: David Miller <davem@davemloft.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
---
Index: linux-2.6/include/linux/if.h
===================================================================
--- linux-2.6.orig/include/linux/if.h
+++ linux-2.6/include/linux/if.h
@@ -71,6 +71,8 @@
* release skb->dst
*/
#define IFF_DONT_BRIDGE 0x800 /* disallow bridging this ether dev */
+#define IFF_IN_NETPOLL 0x1000 /* whether we are processing netpoll */
+#define IFF_DISABLE_NETPOLL 0x2000 /* disable netpoll at run-time */
#define IF_GET_IFACE 0x0001 /* for querying only */
#define IF_GET_PROTO 0x0002
Index: linux-2.6/include/linux/netdevice.h
===================================================================
--- linux-2.6.orig/include/linux/netdevice.h
+++ linux-2.6/include/linux/netdevice.h
@@ -667,6 +667,7 @@ struct net_device_ops {
unsigned short vid);
#ifdef CONFIG_NET_POLL_CONTROLLER
void (*ndo_poll_controller)(struct net_device *dev);
+ void (*ndo_netpoll_cleanup)(struct net_device *dev);
#endif
int (*ndo_set_vf_mac)(struct net_device *dev,
int queue, u8 *mac);
Index: linux-2.6/include/linux/netpoll.h
===================================================================
--- linux-2.6.orig/include/linux/netpoll.h
+++ linux-2.6/include/linux/netpoll.h
@@ -14,6 +14,7 @@
struct netpoll {
struct net_device *dev;
+ struct net_device *real_dev;
char dev_name[IFNAMSIZ];
const char *name;
void (*rx_hook)(struct netpoll *, int, char *, int);
@@ -36,8 +37,11 @@ struct netpoll_info {
struct sk_buff_head txq;
struct delayed_work tx_work;
+
+ struct netpoll *netpoll;
};
+void netpoll_poll_dev(struct net_device *dev);
void netpoll_poll(struct netpoll *np);
void netpoll_send_udp(struct netpoll *np, const char *msg, int len);
void netpoll_print_options(struct netpoll *np);
@@ -47,6 +51,7 @@ int netpoll_trap(void);
void netpoll_set_trap(int trap);
void netpoll_cleanup(struct netpoll *np);
int __netpoll_rx(struct sk_buff *skb);
+void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb);
#ifdef CONFIG_NETPOLL
Index: linux-2.6/net/core/netpoll.c
===================================================================
--- linux-2.6.orig/net/core/netpoll.c
+++ linux-2.6/net/core/netpoll.c
@@ -178,9 +178,8 @@ static void service_arp_queue(struct net
}
}
-void netpoll_poll(struct netpoll *np)
+void netpoll_poll_dev(struct net_device *dev)
{
- struct net_device *dev = np->dev;
const struct net_device_ops *ops;
if (!dev || !netif_running(dev))
@@ -200,6 +199,11 @@ void netpoll_poll(struct netpoll *np)
zap_completion_queue();
}
+void netpoll_poll(struct netpoll *np)
+{
+ netpoll_poll_dev(np->dev);
+}
+
static void refill_skbs(void)
{
struct sk_buff *skb;
@@ -281,7 +285,7 @@ static int netpoll_owner_active(struct n
return 0;
}
-static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
+void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
{
int status = NETDEV_TX_BUSY;
unsigned long tries;
@@ -307,7 +311,9 @@ static void netpoll_send_skb(struct netp
tries > 0; --tries) {
if (__netif_tx_trylock(txq)) {
if (!netif_tx_queue_stopped(txq)) {
+ dev->priv_flags |= IFF_IN_NETPOLL;
status = ops->ndo_start_xmit(skb, dev);
+ dev->priv_flags &= ~IFF_IN_NETPOLL;
if (status == NETDEV_TX_OK)
txq_trans_update(txq);
}
@@ -755,7 +761,10 @@ int netpoll_setup(struct netpoll *np)
atomic_inc(&npinfo->refcnt);
}
- if (!ndev->netdev_ops->ndo_poll_controller) {
+ npinfo->netpoll = np;
+
+ if (ndev->priv_flags & IFF_DISABLE_NETPOLL
+ || !ndev->netdev_ops->ndo_poll_controller) {
printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
np->name, np->dev_name);
err = -ENOTSUPP;
@@ -877,6 +886,7 @@ void netpoll_cleanup(struct netpoll *np)
}
if (atomic_dec_and_test(&npinfo->refcnt)) {
+ const struct net_device_ops *ops;
skb_queue_purge(&npinfo->arp_tx);
skb_queue_purge(&npinfo->txq);
cancel_rearming_delayed_work(&npinfo->tx_work);
@@ -884,7 +894,11 @@ void netpoll_cleanup(struct netpoll *np)
/* clean after last, unfinished work */
__skb_queue_purge(&npinfo->txq);
kfree(npinfo);
- np->dev->npinfo = NULL;
+ ops = np->dev->netdev_ops;
+ if (ops->ndo_netpoll_cleanup)
+ ops->ndo_netpoll_cleanup(np->dev);
+ else
+ np->dev->npinfo = NULL;
}
}
@@ -907,6 +921,7 @@ void netpoll_set_trap(int trap)
atomic_dec(&trapped);
}
+EXPORT_SYMBOL(netpoll_send_skb);
EXPORT_SYMBOL(netpoll_set_trap);
EXPORT_SYMBOL(netpoll_trap);
EXPORT_SYMBOL(netpoll_print_options);
@@ -914,4 +929,5 @@ EXPORT_SYMBOL(netpoll_parse_options);
EXPORT_SYMBOL(netpoll_setup);
EXPORT_SYMBOL(netpoll_cleanup);
EXPORT_SYMBOL(netpoll_send_udp);
+EXPORT_SYMBOL(netpoll_poll_dev);
EXPORT_SYMBOL(netpoll_poll);
^ permalink raw reply
* Re: [PATCH 1/4] flow: virtualize flow cache entry methods
From: Herbert Xu @ 2010-04-05 9:12 UTC (permalink / raw)
To: Timo Teräs; +Cc: netdev
In-Reply-To: <4BB9A4F3.9050003@iki.fi>
On Mon, Apr 05, 2010 at 11:53:07AM +0300, Timo Teräs wrote:
>
> Right. I'm fine either way. Is there any preference?
I don't mind either. You can choose either or maybe come up with
something better :)
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH 1/2] benet: use the dma state API instead of the pci equivalents
From: Sathya Perla @ 2010-04-05 9:04 UTC (permalink / raw)
To: FUJITA Tomonori; +Cc: sathyap, subbus, sarveshwarb, ajitk, netdev
In-Reply-To: <1270176803-8561-1-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Thanks.
Acked-by: Sathya Perla <sathyap@serverengines.com>
On 02/04/10 11:53 +0900, FUJITA Tomonori wrote:
> The DMA API is preferred.
>
> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> ---
> drivers/net/benet/be.h | 2 +-
> drivers/net/benet/be_main.c | 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
> index 8f07525..bddae08 100644
> --- a/drivers/net/benet/be.h
> +++ b/drivers/net/benet/be.h
> @@ -206,7 +206,7 @@ struct be_tx_obj {
> /* Struct to remember the pages posted for rx frags */
> struct be_rx_page_info {
> struct page *page;
> - dma_addr_t bus;
> + DEFINE_DMA_UNMAP_ADDR(bus);
> u16 page_offset;
> bool last_page_user;
> };
> diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
> index 17282df..8d5e27b 100644
> --- a/drivers/net/benet/be_main.c
> +++ b/drivers/net/benet/be_main.c
> @@ -682,7 +682,7 @@ get_rx_page_info(struct be_adapter *adapter, u16 frag_idx)
> BUG_ON(!rx_page_info->page);
>
> if (rx_page_info->last_page_user) {
> - pci_unmap_page(adapter->pdev, pci_unmap_addr(rx_page_info, bus),
> + pci_unmap_page(adapter->pdev, dma_unmap_addr(rx_page_info, bus),
> adapter->big_page_size, PCI_DMA_FROMDEVICE);
> rx_page_info->last_page_user = false;
> }
> @@ -993,7 +993,7 @@ static void be_post_rx_frags(struct be_adapter *adapter)
> }
> page_offset = page_info->page_offset;
> page_info->page = pagep;
> - pci_unmap_addr_set(page_info, bus, page_dmaaddr);
> + dma_unmap_addr_set(page_info, bus, page_dmaaddr);
> frag_dmaaddr = page_dmaaddr + page_info->page_offset;
>
> rxd = queue_head_node(rxq);
> --
> 1.7.0
>
^ permalink raw reply
* Re: [PATCH 1/4] flow: virtualize flow cache entry methods
From: Timo Teräs @ 2010-04-05 8:53 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev
In-Reply-To: <20100405084902.GA16912@gondor.apana.org.au>
Herbert Xu wrote:
> On Mon, Apr 05, 2010 at 04:44:22PM +0800, Herbert Xu wrote:
>>> It might actually make more sense to pass struct flow_cache_object**
>>> so the resolver can twiddle the flow_cache_entry's object. Then it'd
>>> be more explicit that the resolver is replacing entries.
>> Yes that sounds good.
>
> Alternatively you can pass in a struct flow_cache_entry *.
>
> Yet another way would be to keep it the same but move the NULL
> setting before the resolver call:
>
> flo = NULL;
> if (fle) {
> flo = fle->object;
> fle->object = NULL;
> }
> flo = resolver(..., flo, ...);
>
> This way it's obvious that we've given the reference over to
> the resolver.
Right. I'm fine either way. Is there any preference?
^ permalink raw reply
* Re: [PATCH 1/4] flow: virtualize flow cache entry methods
From: Herbert Xu @ 2010-04-05 8:49 UTC (permalink / raw)
To: Timo Teräs; +Cc: netdev
In-Reply-To: <20100405084422.GB16788@gondor.apana.org.au>
On Mon, Apr 05, 2010 at 04:44:22PM +0800, Herbert Xu wrote:
>
> > It might actually make more sense to pass struct flow_cache_object**
> > so the resolver can twiddle the flow_cache_entry's object. Then it'd
> > be more explicit that the resolver is replacing entries.
>
> Yes that sounds good.
Alternatively you can pass in a struct flow_cache_entry *.
Yet another way would be to keep it the same but move the NULL
setting before the resolver call:
flo = NULL;
if (fle) {
flo = fle->object;
fle->object = NULL;
}
flo = resolver(..., flo, ...);
This way it's obvious that we've given the reference over to
the resolver.
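Fleshed out as a self-contained user-space sketch, the hand-over could look like the code below. The types and the resolver here are hypothetical stand-ins, not the real flow cache API; in particular, storing the result back into the entry after the call is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for flow_cache_object / flow_cache_entry. */
struct sketch_object {
	int refs;
};

struct sketch_entry {
	struct sketch_object *object;
};

typedef struct sketch_object *(*sketch_resolver_t)(struct sketch_object *old);

/* Hypothetical resolver: takes over 'old', releases it, returns a new object. */
struct sketch_object *sketch_resolver(struct sketch_object *old)
{
	struct sketch_object *fresh;

	free(old);	/* the resolver owns the old object at this point */
	fresh = malloc(sizeof(*fresh));
	if (fresh)
		fresh->refs = 1;
	return fresh;
}

struct sketch_object *sketch_lookup(struct sketch_entry *fle,
				    sketch_resolver_t resolver)
{
	struct sketch_object *flo = NULL;

	if (fle) {
		/* Detach first, so the entry never points at a released object. */
		flo = fle->object;
		fle->object = NULL;
	}
	flo = resolver(flo);
	if (fle)
		fle->object = flo;	/* assumption: cache whatever came back */
	return flo;
}
```

Because the entry is cleared before the call, an error return from the resolver leaves the entry empty without any extra cleanup in the caller.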
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH 1/4] flow: virtualize flow cache entry methods
From: Herbert Xu @ 2010-04-05 8:44 UTC (permalink / raw)
To: Timo Teräs; +Cc: netdev
In-Reply-To: <4BB9A113.30601@iki.fi>
On Mon, Apr 05, 2010 at 11:36:35AM +0300, Timo Teräs wrote:
> Herbert Xu wrote:
>> On Mon, Apr 05, 2010 at 10:00:21AM +0300, Timo Teras wrote:
>>> @@ -219,33 +222,32 @@ void *flow_cache_lookup(struct net *net, struct flowi *key, u16 family, u8 dir,
>>> + flo = resolver(net, key, family, dir, fle ? fle->object : NULL, ctx);
>>> + if (fle) {
>>> + fle->genid = atomic_read(&flow_cache_genid);
>>> + if (IS_ERR(flo)) {
>>> + fle->genid--;
>>> + fle->object = NULL;
>>
>> Shouldn't we call fle->object->ops->delete here?
>
> The resolver function releases the old object.
I see.
> It might actually make more sense to pass struct flow_cache_object**
> so the resolver can twiddle the flow_cache_entry's object. Then it'd
> be more explicit that the resolver is replacing entries.
Yes that sounds good.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [RFC PATCH 1/2] netdev: buffer infrastructure to log network driver's information
From: Eric Dumazet @ 2010-04-05 8:42 UTC (permalink / raw)
To: Koki Sanagi
Cc: netdev, izumi.taku, kaneshige.kenji, davem, nhorman,
jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
alexander.h.duyck, peter.p.waskiewicz.jr, john.ronciak
In-Reply-To: <4BB988C9.9070709@jp.fujitsu.com>
Le lundi 05 avril 2010 à 15:52 +0900, Koki Sanagi a écrit :
> This patch implements buffer infrastructure under driver/net.
> This buffer records information from network driver.
>
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> ---
> drivers/net/Kconfig | 8 +
> drivers/net/Makefile | 1 +
> drivers/net/ndrvbuf.c | 535 +++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/ndrvbuf.h | 57 +++++
> 4 files changed, 601 insertions(+), 0 deletions(-)
>
Wow, 600 lines... that's what I call bloat...
Why not use a very simple (printk-like) interface?
xxx_printf(dev->trace, "xmit qidx=%u ntu=%d->%d\n",
tx_ring->queue_index, first, tx_ring->next_to_use);
Anyway, this has nothing to do with network drivers and should be
discussed on lkml.
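To make the idea concrete, a printf-like trace call writing into a small ring of fixed-size lines might look like the user-space sketch below. The names sketch_trace, sketch_trace_printf() and sketch_trace_last(), as well as the slot count and line size, are all invented for this example; this is not the interface from the RFC patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_TRACE_SLOTS 4	/* arbitrary ring size */
#define SKETCH_TRACE_LINE 64	/* arbitrary maximum line length */

/* Hypothetical per-device trace buffer; zero-initialize before use. */
struct sketch_trace {
	char lines[SKETCH_TRACE_SLOTS][SKETCH_TRACE_LINE];
	unsigned int next;	/* total number of records written so far */
};

/* Format one record into the next slot, overwriting the oldest one. */
void sketch_trace_printf(struct sketch_trace *t, const char *fmt, ...)
{
	char *slot = t->lines[t->next % SKETCH_TRACE_SLOTS];
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(slot, SKETCH_TRACE_LINE, fmt, ap);	/* truncates long lines */
	va_end(ap);
	t->next++;
}

/* Most recently written record, or NULL if nothing was logged yet. */
const char *sketch_trace_last(const struct sketch_trace *t)
{
	if (t->next == 0)
		return NULL;
	return t->lines[(t->next - 1) % SKETCH_TRACE_SLOTS];
}
```

A caller would then write something like sketch_trace_printf(&trace, "xmit qidx=%u ntu=%d->%d\n", qidx, first, next), mirroring the call shown above.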
^ permalink raw reply
* Re: [PATCH 2/2] benet: fix the misusage of zero dma address
From: FUJITA Tomonori @ 2010-04-05 8:38 UTC (permalink / raw)
To: sathyap; +Cc: fujita.tomonori, subbus, sarveshwarb, ajitk, netdev
In-Reply-To: <20100405082202.GB32671@serverengines.com>
On Mon, 5 Apr 2010 13:52:02 +0530
Sathya Perla <sathyap@serverengines.com> wrote:
> On 05/04/10 16:40 +0900, FUJITA Tomonori wrote:
> > > > + wrb->frag_len = 0;
> > > Why does wrb->frag_len need to be reset here?
> > > In the TX path, it is set to the proper value for data wrbs and zero
> > > for dummy and hdr wrbs.
> >
> > I guess that I misunderstood why unmap_tx_frag() checks a dma address.
> > The checking is necessary to avoid calling pci_unmap_* API for dummy
> > hdr wrbs?
> Yes.
> >
> > Anyway, if wrb->frag_len doesn't need to be reset here, the following
> > patch is ok?
> Yes. Thanks.
>
> Acked-by: Sathya Perla <sathyap@serverengines.com>
Thanks!
Can I also get your ack on "[PATCH 1/2] benet: use the dma state API
instead of the pci equivalents"?
http://patchwork.ozlabs.org/patch/49265/
The reason why I want to replace the PCI DMA state API is:
http://marc.info/?l=linux-netdev&m=127037540020276&w=2
Note that I use DEFINE_DMA_UNMAP_ADDR for bus in struct
be_rx_page_info because dma_unmap_addr and DEFINE_DMA_UNMAP_ADDR are
supposed to be used together.
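A simplified user-space model of why the two go together is sketched below. The SKETCH_* and sketch_* names are stand-ins chosen for this example and only approximate the kernel macros: the point is that the field declared by the DEFINE macro may not exist at all in some configurations, so it must only be touched through the matching accessors.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_dma_addr_t;	/* stand-in for dma_addr_t */

/* Assumption: 1 models a platform that needs to remember unmap state. */
#ifndef SKETCH_NEED_DMA_MAP_STATE
#define SKETCH_NEED_DMA_MAP_STATE 1
#endif

#if SKETCH_NEED_DMA_MAP_STATE
#define SKETCH_DEFINE_DMA_UNMAP_ADDR(name)	sketch_dma_addr_t name
#define sketch_dma_unmap_addr(ptr, name)	((ptr)->name)
#define sketch_dma_unmap_addr_set(ptr, name, val) (((ptr)->name) = (val))
#else
/* The field disappears, so direct access to it would not even compile. */
#define SKETCH_DEFINE_DMA_UNMAP_ADDR(name)
#define sketch_dma_unmap_addr(ptr, name)	(0)
#define sketch_dma_unmap_addr_set(ptr, name, val) do { } while (0)
#endif

/* Reduced model of struct be_rx_page_info. */
struct sketch_rx_page_info {
	void *page;
	SKETCH_DEFINE_DMA_UNMAP_ADDR(bus);
	uint16_t page_offset;
};

/* Remember the mapping address through the accessor, never directly. */
void sketch_post_page(struct sketch_rx_page_info *info, sketch_dma_addr_t addr)
{
	sketch_dma_unmap_addr_set(info, bus, addr);
}

/* Fetch the address that would be handed to the unmap call. */
sketch_dma_addr_t sketch_unmap_target(struct sketch_rx_page_info *info)
{
	return sketch_dma_unmap_addr(info, bus);
}
```

Declaring the field as a plain dma_addr_t while reading it through dma_unmap_addr() happens to work when the state is needed, but it wastes the space on platforms where the accessors compile away.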
^ permalink raw reply
* Re: [PATCH 1/4] flow: virtualize flow cache entry methods
From: Timo Teräs @ 2010-04-05 8:36 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev
In-Reply-To: <20100405083302.GA16636@gondor.apana.org.au>
Herbert Xu wrote:
> On Mon, Apr 05, 2010 at 10:00:21AM +0300, Timo Teras wrote:
>> @@ -219,33 +222,32 @@ void *flow_cache_lookup(struct net *net, struct flowi *key, u16 family, u8 dir,
>> + flo = resolver(net, key, family, dir, fle ? fle->object : NULL, ctx);
>> + if (fle) {
>> + fle->genid = atomic_read(&flow_cache_genid);
>> + if (IS_ERR(flo)) {
>> + fle->genid--;
>> + fle->object = NULL;
>
> Shouldn't we call fle->object->ops->delete here?
The resolver function releases the old object.
It might actually make more sense to pass struct flow_cache_object**
so the resolver can twiddle the flow_cache_entry's object. Then it'd
be more explicit that the resolver is replacing entries.
^ permalink raw reply