* Re: [net-next 1/2] net: fix a sparse warning
From: David Miller @ 2014-10-07 4:11 UTC (permalink / raw)
To: azhou; +Cc: netdev
In-Reply-To: <1412626971-30271-1-git-send-email-azhou@nicira.com>
From: Andy Zhou <azhou@nicira.com>
Date: Mon, 6 Oct 2014 13:22:50 -0700
> Fix a sparse warning introduced by Commit
> 0b5e8b8eeae40bae6ad7c7e91c97c3c0d0e57882 (net: Add Geneve tunneling
> protocol driver) caught by kbuild test robot:
>
> # apt-get install sparse
> # git checkout 0b5e8b8eeae40bae6ad7c7e91c97c3c0d0e57882
> # make ARCH=x86_64 allmodconfig
> # make C=1 CF=-D__CHECK_ENDIAN__
> #
> #
> # sparse warnings: (new ones prefixed by >>)
> #
> # >> net/ipv4/geneve.c:230:42: sparse: incorrect type in assignment (different base types)
> # net/ipv4/geneve.c:230:42: expected restricted __be32 [addressable] [assigned] [usertype] s_addr
> # net/ipv4/geneve.c:230:42: got unsigned long [unsigned] <noident>
> #
>
> Reported-by: kbuild test robot <fengguang.wu@intel.com>
> Signed-off-by: Andy Zhou <azhou@nicira.com>
Applied.
^ permalink raw reply
* Re: [PATCH v2 net-next] net: phy: adjust fixed_phy_register() return value
From: David Miller @ 2014-10-07 4:07 UTC (permalink / raw)
To: pgynther; +Cc: netdev, f.fainelli, thomas.petazzoni
In-Reply-To: <20141006183830.D4E82100BC1@puck.mtv.corp.google.com>
From: Petri Gynther <pgynther@google.com>
Date: Mon, 6 Oct 2014 11:38:30 -0700 (PDT)
> Adjust fixed_phy_register() to return struct phy_device *, so that
> it becomes easy to use fixed PHYs without device tree support:
>
> phydev = fixed_phy_register(PHY_POLL, &fixed_phy_status, NULL);
> fixed_phy_set_link_update(phydev, fixed_phy_link_update);
> phy_connect_direct(netdev, phydev, handler_fn, phy_interface);
>
> This change is a prerequisite for modifying bcmgenet driver to work
> without a device tree on Broadcom's MIPS-based 7xxx platforms.
>
> Signed-off-by: Petri Gynther <pgynther@google.com>
If the caller gets this 'phy' pointer and does something with it,
something seems amiss.
We don't hold an extra reference to the 'phy' object for the caller,
so another thread of control can unregister it and kill that last
reference and therefore free it up.
I think to be legitimate, you have to hold an extra reference on
'phy' for the caller. And now that means that code paths that
don't need to do anything with 'phy' now will need to release
that reference.
^ permalink raw reply
* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
From: Eric Dumazet @ 2014-10-07 4:04 UTC (permalink / raw)
To: Eric Wheeler; +Cc: Shahed Shaikh, Stephen Hemminger, netdev, Rasesh Mody
In-Reply-To: <alpine.DEB.2.00.1410061956410.7392@ware.dreamhost.com>
On Mon, 2014-10-06 at 20:05 -0700, Eric Wheeler wrote:
> The patch in question implements a simple mempool for the interrupt page
> allocs---which definitely fixes the problem even if a better solution
> might exist. I have no problem giving up a small amount of memory to
> guarantee page allocs in the interrupt handler.
You have no such guarantee.
Once a page is consumed by some networking frame, this frame can be
sitting in some socket receive queue. The page can't be reused by the driver.
If you receive a (small) burst of 100 frames, your 32-element mempool will be
depleted anyway.
Even if you use a mempool with 10000 elements, there is no guarantee, it
really depends on the number of sockets and their SO_RCVBUF limits.
If your NIC depends on having order-2 pages available for its operation,
then I am afraid it cannot possibly work.
Memory will eventually be fragmented and order-2 allocations will fail.
Run your patch with 100 concurrent TCP flows, add some losses to force
usage of the out of order queue, you'll get errors quite fast.
^ permalink raw reply
* Re: [PATCH v2 net-next 0/5] ipv6: cleanup after rt6_genid removal
From: David Miller @ 2014-10-07 4:03 UTC (permalink / raw)
To: hannes; +Cc: netdev, hideaki, kafai, cwang
In-Reply-To: <cover.1412618014.git.hannes@stressinduktion.org>
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Mon, 6 Oct 2014 19:58:33 +0200
> Leftover patches after rt6_genid removal after 705f1c869d577c ("ipv6:
> remove rt6i_genid").
>
> Major two changes are:
> * keep fib6_sernum per namespace to reduce number of flushes in case
> system has high number of namespaces
> * make fn_sernum updates cheaper
>
> v2: Incorporated feedback from Cong Wang, thanks a lot!
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH net] bna: allow transmit tagged frames
From: David Miller @ 2014-10-07 4:02 UTC (permalink / raw)
To: ivecera; +Cc: netdev, rasesh.mody
In-Reply-To: <1412614957-18549-1-git-send-email-ivecera@redhat.com>
From: Ivan Vecera <ivecera@redhat.com>
Date: Mon, 6 Oct 2014 19:02:37 +0200
> When Tx VLAN offloading is disabled frames with size ~ MTU are not
> transmitted as the driver does not account 4 bytes of VLAN header added
> by stack. It should use VLAN_ETH_HLEN instead of ETH_HLEN.
>
> The second problem is with newer BNA chips (BNA 1860). These chips filter
> out any VLAN tagged frames in Tx path. This is a problem when Tx VLAN
> offloading is disabled and frames are tagged by stack. Older chips like
> 1010/1020 are not affected as they probably don't do such filtering.
>
> Cc: Rasesh Mody <rasesh.mody@qlogic.com>
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH net-next] net: bcmgenet: fix Tx ring priority programming
From: David Miller @ 2014-10-07 3:59 UTC (permalink / raw)
To: pgynther; +Cc: netdev, f.fainelli
In-Reply-To: <20141007005001.61C1D100BC1@puck.mtv.corp.google.com>
From: Petri Gynther <pgynther@google.com>
Date: Mon, 6 Oct 2014 17:50:01 -0700 (PDT)
> @@ -1731,11 +1744,12 @@ static void bcmgenet_init_multiq(struct net_device *dev)
> reg |= ring_cfg;
> bcmgenet_tdma_writel(priv, reg, DMA_RING_CFG);
>
> - /* Use configured rings priority and set ring #16 priority */
> - reg = bcmgenet_tdma_readl(priv, DMA_RING_PRIORITY);
> - reg |= ((GENET_Q0_PRIORITY + priv->hw_params->tx_queues) << 20);
> - reg |= dma_priority;
> - bcmgenet_tdma_writel(priv, reg, DMA_PRIORITY);
> + /* Set ring 16 priority and program the hardware registers */
> + dma_priority[2] |=
> + ((GENET_Q0_PRIORITY + priv->hw_params->tx_queues) << 20);
Please use "<< (16 - 12) * DMA_RING_BUF_PRIORITY_SHIFT" otherwise this
constant is magic.
You might, optionally, add macros for the subtraction adjustment each
priority register uses (0, 6, 12, respectively).
^ permalink raw reply
* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
From: Eric Dumazet @ 2014-10-07 3:15 UTC (permalink / raw)
To: Eric Wheeler; +Cc: Shahed Shaikh, Stephen Hemminger, netdev, Rasesh Mody
In-Reply-To: <alpine.DEB.2.00.1410061956410.7392@ware.dreamhost.com>
On Mon, 2014-10-06 at 20:05 -0700, Eric Wheeler wrote:
> On Mon, 6 Oct 2014, Eric Dumazet wrote:
> > On Mon, 2014-10-06 at 18:57 -0700, Eric Wheeler wrote:
> >> This patch fixes an order:2 memory allocation error backtrace by
> >> guaranteeing that memory is available during simultaneous high memory
> >> pressure and packet rates when using 9k jumbo frames.
> >
> > This is highly suspect to me.
> > Most likely yet another truesize lie.
> > At a first glance, bnad_cq_setup_skb_frags() is buggy here :
> > skb->truesize += totlen;
>
> skb->truesize wasn't part of my patch, can you explain in more detail what
> you suggest a better fix might be? If you write a quick patch I can test
> it.
>
> The patch in question implements a simple mempool for the interrupt page
> allocs---which definitely fixes the problem even if a better solution
> might exist. I have no problem giving up a small amount of memory to
> guarantee page allocs in the interrupt handler.
>
> It would be great to see this patch pushed through since it does fix the
> problem---at least until we can come up with a better fix. I'm happy to
> test if you can send a patch.
It seems many drivers assume that a frame of 1000 bytes
consumes 1000 bytes of memory.
In reality the driver allocated more memory, because it cannot predict
how many bytes are going to be received from the network.
By lying about skb->truesize (underestimating the real memory cost), the
driver prevents the networking stack from making appropriate memory checks.
Here, it seems clear to me the following fix is needed, at the very minimum.
And it might be that something better is needed: MTU=9000 might force
the driver to allocate 16384 bytes per frame, not 9018.
So it is possible that unmap->vector.len needs to be changed to the real
size of the memory region (for example: PAGE_SIZE << 2)
diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index ffc92a41d75be550d27698af6ca3e600d9a146fe..ce867219e2ceaf33b17595b67bae99e964d5a6b6 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -550,6 +550,7 @@ bnad_cq_setup_skb_frags(struct bna_rcb *rcb, struct sk_buff *skb,
dma_unmap_addr(&unmap->vector, dma_addr),
unmap->vector.len, DMA_FROM_DEVICE);
+ skb->truesize += unmap->vector.len;
len = (vec == nvecs) ?
last_fraglen : unmap->vector.len;
totlen += len;
@@ -563,7 +564,6 @@ bnad_cq_setup_skb_frags(struct bna_rcb *rcb, struct sk_buff *skb,
skb->len += totlen;
skb->data_len += totlen;
- skb->truesize += totlen;
}
static inline void
^ permalink raw reply related
* RE: r8168 is needed to enter P-state: Package State 6 (pc6) on Haswell hardware
From: Hayes Wang @ 2014-10-07 2:50 UTC (permalink / raw)
To: Francois Romieu; +Cc: Ceriel Jacobs, nic_swsd, netdev@vger.kernel.org
In-Reply-To: <20141006221307.GB10936@electric-eye.fr.zoreil.com>
Francois Romieu [mailto:romieu@fr.zoreil.com]
> Sent: Tuesday, October 07, 2014 6:13 AM
[...]
> Realtek's r8168 driver defaults to CONFIG_ASPM=1 but I guess
> some users
> need to disable it and there's no known pattern / blacklist, right ?
Enabling ASPM affects throughput. It is hard to choose between
performance and power saving. Therefore, we keep the config option
and let the user decide.
Best Regards,
Hayes
^ permalink raw reply
* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
From: Eric Wheeler @ 2014-10-07 3:05 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Shahed Shaikh, Stephen Hemminger, netdev, Rasesh Mody
In-Reply-To: <1412646735.11091.93.camel@edumazet-glaptop2.roam.corp.google.com>
On Mon, 6 Oct 2014, Eric Dumazet wrote:
> On Mon, 2014-10-06 at 18:57 -0700, Eric Wheeler wrote:
>> This patch fixes an order:2 memory allocation error backtrace by
>> guaranteeing that memory is available during simultaneous high memory
>> pressure and packet rates when using 9k jumbo frames.
>
> This is highly suspect to me.
> Most likely yet another truesize lie.
> At a first glance, bnad_cq_setup_skb_frags() is buggy here :
> skb->truesize += totlen;
skb->truesize wasn't part of my patch, can you explain in more detail what
you suggest a better fix might be? If you write a quick patch I can test
it.
The patch in question implements a simple mempool for the interrupt page
allocs---which definitely fixes the problem even if a better solution
might exist. I have no problem giving up a small amount of memory to
guarantee page allocs in the interrupt handler.
It would be great to see this patch pushed through since it does fix the
problem---at least until we can come up with a better fix. I'm happy to
test if you can send a patch.
-Eric
>
> With this kind of lies, system can OOM very fast.
^ permalink raw reply
* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
From: Eric Dumazet @ 2014-10-07 1:52 UTC (permalink / raw)
To: Eric Wheeler
Cc: Shahed Shaikh, Stephen Hemminger, netdev, rmody@brocade.com,
Rasesh Mody
In-Reply-To: <alpine.DEB.2.00.1410061850120.5129@ware.dreamhost.com>
On Mon, 2014-10-06 at 18:57 -0700, Eric Wheeler wrote:
> This patch fixes an order:2 memory allocation error backtrace by
> guaranteeing that memory is available during simultaneous high memory
> pressure and packet rates when using 9k jumbo frames.
>
> Tests between two systems (one patched, one not) succeeded with ~1TB of
> data transferred over DRBD. As expected, the unpatched host gave
> warn_alloc_failed's, and the patched host worked correctly. This patch
> increases kernel memory usage by 32 order-2 allocations when this module is
> loaded (512k on x86) which should be negligible on hosts that use 10GbE
> cards.
This is highly suspect to me.
Most likely yet another truesize lie.
At a first glance, bnad_cq_setup_skb_frags() is buggy here :
skb->truesize += totlen;
With this kind of lies, system can OOM very fast.
^ permalink raw reply
* [PATCH net] bna: page allocation during interrupts to use a mempool.
From: Eric Wheeler @ 2014-10-07 1:57 UTC (permalink / raw)
To: Shahed Shaikh; +Cc: Stephen Hemminger, netdev, rmody@brocade.com, Rasesh Mody
In-Reply-To: <262CB373A6D1F14F9B81E82F74F77D5A4704FAE7@avmb2.qlogic.org>
This patch fixes an order:2 memory allocation error backtrace by
guaranteeing that memory is available during simultaneous high memory
pressure and packet rates when using 9k jumbo frames.
Tests between two systems (one patched, one not) succeeded with ~1TB of
data transferred over DRBD. As expected, the unpatched host gave
warn_alloc_failed's, and the patched host worked correctly. This patch
increases kernel memory usage by 32 order-2 allocations when this module is
loaded (512k on x86) which should be negligible on hosts that use 10GbE
cards.
Fixes:
[<ffffffff8113b17b>] warn_alloc_failed+0xeb/0x150
[<ffffffff8113e2f0>] __alloc_pages_slowpath+0x4a0/0x7e0
[<ffffffff8113e8e2>] __alloc_pages_nodemask+0x2b2/0x2c0
[<ffffffff81181232>] alloc_pages_current+0xb2/0x170
[<ffffffffa0239cb4>] bnad_rxq_refill_page+0x154/0x1e0 [bna]
[<ffffffffa023c282>] bnad_cq_process+0x462/0x840 [bna]
[<ffffffffa023c6af>] bnad_napi_poll_rx+0x4f/0xc0 [bna]
[<ffffffff814dcf4c>] net_rx_action+0xfc/0x280
[<ffffffff8105ba83>] __do_softirq+0xf3/0x2c0
[<ffffffff8105bd5d>] irq_exit+0xbd/0xd0
[<ffffffff815ae697>] do_IRQ+0x67/0x110
[<ffffffff815a39ad>] common_interrupt+0x6d/0x6d
<EOI>
[<ffffffff81486275>] ? cpuidle_enter_state+0x55/0xd0
[<ffffffff8148626b>] ? cpuidle_enter_state+0x4b/0xd0
[<ffffffff814863b7>] cpuidle_idle_call+0xc7/0x160
[<ffffffff8100d73e>] arch_cpu_idle+0xe/0x30
[<ffffffff810b187e>] cpu_idle_loop+0x9e/0x250
[<ffffffff810b1aa0>] cpu_startup_entry+0x70/0x80
[<ffffffff810350d2>] start_secondary+0xd2/0xe0
Signed-off-by: Eric Wheeler <netdev@lists.ewheeler.net>
---
drivers/net/ethernet/brocade/bna/bnad.c | 27 ++++++++++++++++++++++++---
1 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index 3a77f9e..8906ad1 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -26,6 +26,7 @@
#include <linux/ip.h>
#include <linux/prefetch.h>
#include <linux/module.h>
+#include <linux/mempool.h>
#include "bnad.h"
#include "bna.h"
@@ -58,6 +59,8 @@ static struct mutex bnad_list_mutex;
static LIST_HEAD(bnad_list);
static const u8 bnad_bcast_addr[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
+static mempool_t *rxq_mempool_o2 = NULL;
+
/*
* Local MACROS
*/
@@ -321,7 +324,7 @@ bnad_rxq_cleanup_page(struct bnad *bnad, struct bnad_rx_unmap *unmap)
dma_unmap_page(&bnad->pcidev->dev,
dma_unmap_addr(&unmap->vector, dma_addr),
unmap->vector.len, DMA_FROM_DEVICE);
- put_page(unmap->page);
+ mempool_free(unmap->page, rxq_mempool_o2);
unmap->page = NULL;
dma_unmap_addr_set(&unmap->vector, dma_addr, 0);
unmap->vector.len = 0;
@@ -380,8 +383,11 @@ bnad_rxq_refill_page(struct bnad *bnad, struct bna_rcb *rcb, u32 nalloc)
unmap = &unmap_q->unmap[prod];
if (unmap_q->reuse_pi < 0) {
- page = alloc_pages(GFP_ATOMIC | __GFP_COMP,
- unmap_q->alloc_order);
+ if (unmap_q->alloc_order == 2)
+ page = mempool_alloc(rxq_mempool_o2, GFP_ATOMIC | __GFP_COMP);
+ else
+ page = alloc_pages(GFP_ATOMIC | __GFP_COMP,
+ unmap_q->alloc_order);
page_offset = 0;
} else {
prev = &unmap_q->unmap[unmap_q->reuse_pi];
@@ -3861,6 +3867,16 @@ static struct pci_driver bnad_pci_driver = {
.remove = bnad_pci_remove,
};
+void *bnad_rxq_mempool_alloc_o2(gfp_t gfp_mask, void *pool_data)
+{
+ return (void*) alloc_pages(gfp_mask, 2);
+}
+
+void bnad_rxq_mempool_free_o2(void *page, void *pool_data)
+{
+ put_page((struct page*)page);
+}
+
static int __init
bnad_module_init(void)
{
@@ -3878,6 +3894,10 @@ bnad_module_init(void)
return err;
}
+ rxq_mempool_o2 = mempool_create(32, /* how many do we really need? */
+ bnad_rxq_mempool_alloc_o2,
+ bnad_rxq_mempool_free_o2, NULL);
+
return 0;
}
@@ -3886,6 +3906,7 @@ bnad_module_exit(void)
{
pci_unregister_driver(&bnad_pci_driver);
release_firmware(bfi_fw);
+ mempool_destroy(rxq_mempool_o2);
}
module_init(bnad_module_init);
--
1.7.1
^ permalink raw reply related
* [PATCH net-next] net: bcmgenet: fix Tx ring priority programming
From: Petri Gynther @ 2014-10-07 0:50 UTC (permalink / raw)
To: netdev; +Cc: davem, f.fainelli
GENET MAC has three Tx ring priority registers:
- GENET_x_TDMA_PRIORITY0 for queues 0-5
- GENET_x_TDMA_PRIORITY1 for queues 6-11
- GENET_x_TDMA_PRIORITY2 for queues 12-16
Fix bcmgenet_init_multiq() to program them correctly.
Signed-off-by: Petri Gynther <pgynther@google.com>
---
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 46 +++++++++++++++++---------
1 file changed, 30 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index e0a6238..151d5b2 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -191,8 +191,9 @@ enum dma_reg {
DMA_STATUS,
DMA_SCB_BURST_SIZE,
DMA_ARB_CTRL,
- DMA_PRIORITY,
- DMA_RING_PRIORITY,
+ DMA_PRIORITY_0,
+ DMA_PRIORITY_1,
+ DMA_PRIORITY_2,
};
static const u8 bcmgenet_dma_regs_v3plus[] = {
@@ -201,8 +202,9 @@ static const u8 bcmgenet_dma_regs_v3plus[] = {
[DMA_STATUS] = 0x08,
[DMA_SCB_BURST_SIZE] = 0x0C,
[DMA_ARB_CTRL] = 0x2C,
- [DMA_PRIORITY] = 0x30,
- [DMA_RING_PRIORITY] = 0x38,
+ [DMA_PRIORITY_0] = 0x30,
+ [DMA_PRIORITY_1] = 0x34,
+ [DMA_PRIORITY_2] = 0x38,
};
static const u8 bcmgenet_dma_regs_v2[] = {
@@ -211,8 +213,9 @@ static const u8 bcmgenet_dma_regs_v2[] = {
[DMA_STATUS] = 0x08,
[DMA_SCB_BURST_SIZE] = 0x0C,
[DMA_ARB_CTRL] = 0x30,
- [DMA_PRIORITY] = 0x34,
- [DMA_RING_PRIORITY] = 0x3C,
+ [DMA_PRIORITY_0] = 0x34,
+ [DMA_PRIORITY_1] = 0x38,
+ [DMA_PRIORITY_2] = 0x3C,
};
static const u8 bcmgenet_dma_regs_v1[] = {
@@ -220,8 +223,9 @@ static const u8 bcmgenet_dma_regs_v1[] = {
[DMA_STATUS] = 0x04,
[DMA_SCB_BURST_SIZE] = 0x0C,
[DMA_ARB_CTRL] = 0x30,
- [DMA_PRIORITY] = 0x34,
- [DMA_RING_PRIORITY] = 0x3C,
+ [DMA_PRIORITY_0] = 0x34,
+ [DMA_PRIORITY_1] = 0x38,
+ [DMA_PRIORITY_2] = 0x3C,
};
/* Set at runtime once bcmgenet version is known */
@@ -1696,7 +1700,8 @@ static void bcmgenet_init_multiq(struct net_device *dev)
{
struct bcmgenet_priv *priv = netdev_priv(dev);
unsigned int i, dma_enable;
- u32 reg, dma_ctrl, ring_cfg = 0, dma_priority = 0;
+ u32 reg, dma_ctrl, ring_cfg = 0;
+ u32 dma_priority[3] = {0, 0, 0};
if (!netif_is_multiqueue(dev)) {
netdev_warn(dev, "called with non multi queue aware HW\n");
@@ -1721,9 +1726,17 @@ static void bcmgenet_init_multiq(struct net_device *dev)
/* Configure ring as descriptor ring and setup priority */
ring_cfg |= 1 << i;
- dma_priority |= ((GENET_Q0_PRIORITY + i) <<
- (GENET_MAX_MQ_CNT + 1) * i);
dma_ctrl |= 1 << (i + DMA_RING_BUF_EN_SHIFT);
+
+ if (i < 6)
+ dma_priority[0] |= ((GENET_Q0_PRIORITY + i) <<
+ (i * DMA_RING_BUF_PRIORITY_SHIFT));
+ else if (i < 12)
+ dma_priority[1] |= ((GENET_Q0_PRIORITY + i) <<
+ ((i - 6) * DMA_RING_BUF_PRIORITY_SHIFT));
+ else
+ dma_priority[2] |= ((GENET_Q0_PRIORITY + i) <<
+ ((i - 12) * DMA_RING_BUF_PRIORITY_SHIFT));
}
/* Enable rings */
@@ -1731,11 +1744,12 @@ static void bcmgenet_init_multiq(struct net_device *dev)
reg |= ring_cfg;
bcmgenet_tdma_writel(priv, reg, DMA_RING_CFG);
- /* Use configured rings priority and set ring #16 priority */
- reg = bcmgenet_tdma_readl(priv, DMA_RING_PRIORITY);
- reg |= ((GENET_Q0_PRIORITY + priv->hw_params->tx_queues) << 20);
- reg |= dma_priority;
- bcmgenet_tdma_writel(priv, reg, DMA_PRIORITY);
+ /* Set ring 16 priority and program the hardware registers */
+ dma_priority[2] |=
+ ((GENET_Q0_PRIORITY + priv->hw_params->tx_queues) << 20);
+ bcmgenet_tdma_writel(priv, dma_priority[0], DMA_PRIORITY_0);
+ bcmgenet_tdma_writel(priv, dma_priority[1], DMA_PRIORITY_1);
+ bcmgenet_tdma_writel(priv, dma_priority[2], DMA_PRIORITY_2);
/* Configure ring as descriptor ring and re-enable DMA if enabled */
reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
--
2.1.0.rc2.206.gedb03e5
^ permalink raw reply related
* Re: [PATCH iproute2 1/5] iplink: Fix setting of -1 as ifindex
From: Cong Wang @ 2014-10-07 0:42 UTC (permalink / raw)
To: Tom Herbert; +Cc: David Miller, Stephen Hemminger, netdev
In-Reply-To: <1412351718-22921-2-git-send-email-therbert@google.com>
On Fri, Oct 3, 2014 at 8:55 AM, Tom Herbert <therbert@google.com> wrote:
> Commit 3c682146aeff157ec3540 ("iplink: forbid negative ifindex and
> modifying ifindex") initializes index to -1 in iplink_modify. When
> creating a link, req.i.ifi_index is then set to -1 if the link option is
> not used. In the kernel this is then used to set dev->ifindex. For
> dev->ifindex, zero is considered to be unset and -1 is treated as
> a set index, so when a second tunnel is created the new device conflicts
> with the old one (both have ifindex of -1) so -EBUSY is returned.
>
> This patch sets zero in req.i.ifi_index if index is unset (still -1).
There was a patch before yours:
https://patchwork.ozlabs.org/patch/395404/
^ permalink raw reply
* [Patch net] net_sched: copy exts->type in tcf_exts_change()
From: Cong Wang @ 2014-10-07 0:21 UTC (permalink / raw)
To: netdev; +Cc: Cong Wang, Jamal Hadi Salim, John Fastabend
We need to copy exts->type when committing the change, otherwise
it would be always 0. This is a quick fix for -net and -stable,
for net-next tcf_exts will be removed.
Fixes: commit 33be627159913b094bb578e83 ("net_sched: act: use standard struct list_head")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
net/sched/cls_api.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c28b0d3..4f4e08b 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -549,6 +549,7 @@ void tcf_exts_change(struct tcf_proto *tp, struct tcf_exts *dst,
tcf_tree_lock(tp);
list_splice_init(&dst->actions, &tmp);
list_splice(&src->actions, &dst->actions);
+ dst->type = src->type;
tcf_tree_unlock(tp);
tcf_action_destroy(&tmp, TCA_ACT_UNBIND);
#endif
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH] net: Add ndo_gso_check
From: Tom Herbert @ 2014-10-07 0:17 UTC (permalink / raw)
To: Jesse Gross
Cc: Or Gerlitz, Alexander Duyck, John Fastabend, Jeff Kirsher,
David Miller, Linux Netdev List, Thomas Graf, Pravin Shelar,
Andy Zhou
In-Reply-To: <CAEP_g=-S2b3mev+q-qPBHeBw7jb-Y2XvbTeZPgaLgVj=c+3Bjg@mail.gmail.com>
On Mon, Oct 6, 2014 at 3:33 PM, Jesse Gross <jesse@nicira.com> wrote:
> On Mon, Oct 6, 2014 at 10:59 AM, Tom Herbert <therbert@google.com> wrote:
>> On Sun, Oct 5, 2014 at 12:13 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Sun, Oct 5, 2014 at 9:49 PM, Tom Herbert <therbert@google.com> wrote:
>>>> On Sun, Oct 5, 2014 at 7:04 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>> On Thu, Oct 2, 2014 at 2:06 AM, Tom Herbert <therbert@google.com> wrote:
>>>>>> On Wed, Oct 1, 2014 at 1:58 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>>>> On Tue, Sep 30, 2014 at 6:34 PM, Tom Herbert <therbert@google.com> wrote:
>>>>>>>> On Tue, Sep 30, 2014 at 7:30 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>> [...]
>>>>>> Solution #4: apply this patch and implement the check functions as
>>>>>> needed in those 4 or 5 drivers. If a device can only do VXLAN/NVGRE
>>>>>> then I believe the check function is something like:
>>>>>>
>>>>>> bool mydev_gso_check(struct sk_buff *skb, struct net_device *dev)
>>>>>> {
>>>>>> if ((skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) &&
>>>>>> (skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
>>>>>> skb->protocol != htons(ETH_P_TEB) ||
>>>>>> skb_inner_mac_header(skb) - skb_transport_header(skb) != 12))
>>>>>> return false;
>>>>>>
>>>>>> return true;
>>>>>> }
>>>>>
>>>>> Yep, such a helper can basically be made to work and let the 4-5
>>>>> drivers that can
>>>>> do GSO offloading for vxlan but not for any FOU/GUE packets signal
>>>>> that to the stack.
>>>>>
>>>>> Re the 12 constant, you were referring to the udp+vxlan headers? it's 8+8
>>>>>
>>>>> Also, we need a way for drivers that can support VXLAN or NVGRE but
>>>>> not concurrently
>>>>> on the same port @ the same time to only let vxlan packet to pass
>>>>> successfully through the helper.
>>>
>>>> Or, there should be no difference in GSO processing between VXLAN and
>>>> NVGRE. Can you explain why you feel you need to differentiate them for GSO?
>>>
>>>
>>> RX wise, Linux tells the driver that UDP port X would be used for
>>> VXLAN, right? and indeed, it's possible for some HW implementations
>>> not to support RX offloading (checksum) for both VXLAN and NVGRE @ the
>>> same time over the same port. But TX/GRO wise, you're probably
>>> correct. The thing is that from the user POV they need solution that
>>> works for both RX and TX offloading.
>>
>> I think from a user POV we want a solution that supports RX and TX
>> offloading across the widest range of protocols. This is accomplished
>> by implementing protocol agnostic mechanisms like CHECKSUM_COMPLETE
>> and protocol agnostic UDP tunnel TSO like we've described. IMO, the
>> fact that we have devices that implement protocol specific mechanisms
>> for NVGRE and VXLAN should be considered legacy support in the stack,
>> for new UDP encapsulation protocols we should not expose specifics in
>> the stack in either by adding a GSO type for each protocol, nor
>> ndo_add_foo_port for each protocol-- these things will not scale and
>> unnecessarily complicate the core stack.
>
> It's not clear to me that allowing devices to know what protocols are
> running on what ports actually complicates the stack. The part that is
> complicated is usually the types of operations that are being
> offloaded (checksum, TSO, etc.). In all of these tunnel cases, the
> operations are same and if you have a clean registration mechanism
> then nothing in the core has to see this - only the protocol doing the
> registering and the driver that is supporting it.
>
We already have an ntuple filtering interface that allows configuring
a device for special processing of RX packets. I don't see why that
shouldn't apply to the use case protocol processing for specific ports
in the encapsulation use case.
> I have no disagreement with trying to be generic across protocols. I'm
> just not convinced that it is a realistic plan. It's obvious that it
> is not doable today nor will it be in the next generation of NICs
> (which are guaranteed to add support for new protocols). Furthermore,
> there will be more advanced stuff coming in the future that I think
> will be difficult or impossible to make protocol agnostic. Rather than
> pretending that this doesn't exist or will never happen, it's better
> to focus on how to integrate it cleanly.
Sorry, but I don't understand how supporting new protocols in a
device for the purposes of returning CHECKSUM_UNNECESSARY is better or
easier to implement than just returning CHECKSUM_COMPLETE. Same thing
for trying to use NETIF_F_IP_CSUM with encapsulation rather than
NETIF_F_HW_CSUM. I'm not a hardware guy, so it's possible I'm missing
something obvious...
Can you be more specific about this "advanced stuff"?
Thanks,
Tom
^ permalink raw reply
* Re: [PATCH RFC v3 net 0/2] ipv6: Avoid restarting fib6_lookup() for RTF_CACHE hit
From: Hannes Frederic Sowa @ 2014-10-07 0:10 UTC (permalink / raw)
To: Martin KaFai Lau, netdev
In-Reply-To: <1412640315-22472-1-git-send-email-kafai@fb.com>
On Tue, Oct 7, 2014, at 02:05, Martin KaFai Lau wrote:
> Notes: Same as the last two versions but fixed a title and missing
> Signed-Off issue.
>
> I am trying to understand why there is a need to restart fib6_lookup()
> after
> getting rt with RTF_CACHE.
>
> I have adapted davem's udpflood test
> (https://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git)
> to
> support IPv6 and here is the result:
>
> #root > time ./udpflood -l 20000000 -c 250 2401:db00:face:face::2
>
> Before:
> real 0m33.224s
> user 0m2.941s
> sys 0m30.232s
>
> After:
> real 0m31.517s
> user 0m2.938s
> sys 0m28.536s
Thanks, Martin! I'll review them ASAP.
Bye,
Hannes
^ permalink raw reply
* [PATCH RFC v3 net 0/2] ipv6: Avoid restarting fib6_lookup() for RTF_CACHE hit
From: Martin KaFai Lau @ 2014-10-07 0:05 UTC (permalink / raw)
To: netdev; +Cc: Hannes Frederic Sowa
Notes: Same as the last two versions but fixed a title and missing
Signed-Off issue.
I am trying to understand why there is a need to restart fib6_lookup() after
getting rt with RTF_CACHE.
I have adapted davem's udpflood test
(https://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git) to
support IPv6 and here is the result:
#root > time ./udpflood -l 20000000 -c 250 2401:db00:face:face::2
Before:
real 0m33.224s
user 0m2.941s
sys 0m30.232s
After:
real 0m31.517s
user 0m2.938s
sys 0m28.536s
/****************************** udpflood.c ******************************/
/* This is an adaptation of Eric Dumazet's and David Miller's
 * udpflood tool that adds IPv6 support.
 */
#define _GNU_SOURCE	/* must precede all #includes to take effect */
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <malloc.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <stdint.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <getopt.h>
typedef uint32_t u32;
static int debug = 0;
/* Allow -fstrict-aliasing */
typedef union sa_u {
struct sockaddr_storage a46;
struct sockaddr_in a4;
struct sockaddr_in6 a6;
} sa_u;
static int usage(void)
{
printf("usage: udpflood [ -d ] [ -l count ] [ -s message_size ] [ -p port ] [ -c num_ip_addrs ] IP_ADDRESS\n");
return -1;
}
static u32 get_last32h(const sa_u *sa)
{
if (sa->a46.ss_family == PF_INET)
return ntohl(sa->a4.sin_addr.s_addr);
else
return ntohl(sa->a6.sin6_addr.s6_addr32[3]);
}
static void set_last32h(sa_u *sa, u32 last32h)
{
if (sa->a46.ss_family == PF_INET)
sa->a4.sin_addr.s_addr = htonl(last32h);
else
sa->a6.sin6_addr.s6_addr32[3] = htonl(last32h);
}
static void print_saddr(const sa_u *sa, const char *msg)
{
char buf[64];
if (!debug)
return;
switch (sa->a46.ss_family) {
case PF_INET:
inet_ntop(PF_INET, &(sa->a4.sin_addr.s_addr), buf,
sizeof(buf));
break;
case PF_INET6:
inet_ntop(PF_INET6, &(sa->a6.sin6_addr), buf, sizeof(buf));
break;
}
printf("%s: %s\n", msg, buf);
}
static int send_packets(const sa_u *sa, size_t num_addrs, int count, int msg_sz)
{
char *msg = malloc(msg_sz);
sa_u saddr;
u32 start_addr32h, end_addr32h, cur_addr32h;
int fd, i, err;
if (!msg)
return -ENOMEM;
memset(msg, 0, msg_sz);
memcpy(&saddr, sa, sizeof(saddr));
cur_addr32h = start_addr32h = get_last32h(&saddr);
end_addr32h = start_addr32h + num_addrs;
fd = socket(saddr.a46.ss_family, SOCK_DGRAM, 0);
if (fd < 0) {
perror("socket");
err = fd;
goto out_nofd;
}
/* connect() so that the kernel does not spend time working
* out the source address for every packet (i.e. pin the src address)
*/
err = connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));
if (err < 0) {
perror("connect");
goto out;
}
print_saddr(&saddr, "start_addr");
for (i = 0; i < count; i++) {
print_saddr(&saddr, "sendto");
err = sendto(fd, msg, msg_sz, 0, (struct sockaddr *)&saddr,
sizeof(saddr));
if (err < 0) {
perror("sendto");
goto out;
}
if (++cur_addr32h >= end_addr32h)
cur_addr32h = start_addr32h;
set_last32h(&saddr, cur_addr32h);
}
err = 0;
out:
close(fd);
out_nofd:
free(msg);
return err;
}
int main(int argc, char **argv, char **envp)
{
int port, msg_sz, count, num_addrs, ret;
sa_u start_addr;
port = 6000;
msg_sz = 32;
count = 10000000;
num_addrs = 1;
while ((ret = getopt(argc, argv, "dl:s:p:c:")) >= 0) {
switch (ret) {
case 'l':
sscanf(optarg, "%d", &count);
break;
case 's':
sscanf(optarg, "%d", &msg_sz);
break;
case 'p':
sscanf(optarg, "%d", &port);
break;
case 'c':
sscanf(optarg, "%d", &num_addrs);
break;
case 'd':
debug = 1;
break;
case '?':
return usage();
}
}
if (num_addrs < 1)
return usage();
if (!argv[optind])
return usage();
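/* Zero the whole union so that fields we never set (e.g. sin6_flowinfo,
* sin6_scope_id) do not carry stack garbage into connect()/sendto().
*/
memset(&start_addr, 0, sizeof(start_addr));
/* On Linux, sin_port and sin6_port sit at the same offset,
* so this one store covers both address families.
*/
start_addr.a4.sin_port = htons(port);
/* inet_pton() returns 1 only on a successful conversion */
if (inet_pton(PF_INET, argv[optind], &start_addr.a4.sin_addr) == 1)
start_addr.a46.ss_family = PF_INET;
else if (inet_pton(PF_INET6, argv[optind], &start_addr.a6.sin6_addr) == 1)
start_addr.a46.ss_family = PF_INET6;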
else
return usage();
return send_packets(&start_addr, num_addrs, count, msg_sz);
}
^ permalink raw reply
* [PATCH RFC v3 net 1/2] ipv6: Remove the net->ipv6.ip6_null_entry check
From: Martin KaFai Lau @ 2014-10-07 0:05 UTC (permalink / raw)
To: netdev; +Cc: Hannes Frederic Sowa
In-Reply-To: <1412640315-22472-1-git-send-email-kafai@fb.com>
The BACKTRACK above has already caught the rt == net->ipv6.ip6_null_entry case.
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
net/ipv6/route.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index bafde82..d53dc4f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -936,8 +936,7 @@ restart:
if (rt->rt6i_nsiblings)
rt = rt6_multipath_select(rt, fl6, oif, strict | reachable);
BACKTRACK(net, &fl6->saddr);
- if (rt == net->ipv6.ip6_null_entry ||
- rt->rt6i_flags & RTF_CACHE)
+ if (rt->rt6i_flags & RTF_CACHE)
goto out;
dst_hold(&rt->dst);
--
1.8.1
^ permalink raw reply related
* [PATCH RFC v3 net 2/2] ipv6: Avoid restarting fib6_lookup() for RTF_CACHE hit case
From: Martin KaFai Lau @ 2014-10-07 0:05 UTC (permalink / raw)
To: netdev; +Cc: Hannes Frederic Sowa
In-Reply-To: <1412640315-22472-1-git-send-email-kafai@fb.com>
When there is an RTF_CACHE hit, there is no need to redo fib6_lookup()
with reachable=0.
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
net/ipv6/route.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d53dc4f..e40b5dc 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -937,7 +937,7 @@ restart:
rt = rt6_multipath_select(rt, fl6, oif, strict | reachable);
BACKTRACK(net, &fl6->saddr);
if (rt->rt6i_flags & RTF_CACHE)
- goto out;
+ goto out1;
dst_hold(&rt->dst);
read_unlock_bh(&table->tb6_lock);
@@ -974,6 +974,7 @@ out:
reachable = 0;
goto restart_2;
}
+out1:
dst_hold(&rt->dst);
read_unlock_bh(&table->tb6_lock);
out2:
--
1.8.1
^ permalink raw reply related
* [Patch net-next] net_sched: fix unused variables in __gnet_stats_copy_basic_cpu()
From: Cong Wang @ 2014-10-07 0:01 UTC (permalink / raw)
To: netdev; +Cc: Cong Wang, John Fastabend
Probably not a big deal, but we'd better just use the
values we read inside the retry loop.
Fixes: commit 22e0f8b9322cb1a48b1357e8 ("net: sched: make bstats per cpu and estimator RCU safe")
Reported-by: Joe Perches <joe@perches.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
net/core/gen_stats.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 14681b9..0c08062 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -106,8 +106,8 @@ __gnet_stats_copy_basic_cpu(struct gnet_stats_basic_packed *bstats,
for_each_possible_cpu(i) {
struct gnet_stats_basic_cpu *bcpu = per_cpu_ptr(cpu, i);
unsigned int start;
- __u64 bytes;
- __u32 packets;
+ u64 bytes;
+ u32 packets;
do {
start = u64_stats_fetch_begin_irq(&bcpu->syncp);
@@ -115,8 +115,8 @@ __gnet_stats_copy_basic_cpu(struct gnet_stats_basic_packed *bstats,
packets = bcpu->bstats.packets;
} while (u64_stats_fetch_retry_irq(&bcpu->syncp, start));
- bstats->bytes += bcpu->bstats.bytes;
- bstats->packets += bcpu->bstats.packets;
+ bstats->bytes += bytes;
+ bstats->packets += packets;
}
}
--
1.8.3.1
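To illustrate why the values read inside the retry loop are the ones to accumulate, here is a minimal userspace sketch of a sequence-counter reader. It is an analogue built on C11 atomics, not the kernel's u64_stats_sync implementation; the struct and function names are hypothetical, and a single writer is assumed. Only the local copies taken between the two sequence reads are known to be consistent, so re-reading the shared fields after the loop would defeat the purpose of the loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-CPU-style counter block protected by a sequence count. */
struct counters {
	atomic_uint seq;          /* even: stable, odd: update in progress */
	_Atomic uint64_t bytes;
	_Atomic uint32_t packets;
};

/* Single writer: bump seq to odd, update the data, bump seq to even. */
static void counters_update(struct counters *c, uint64_t bytes, uint32_t packets)
{
	unsigned int s = atomic_load_explicit(&c->seq, memory_order_relaxed);

	atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->bytes,
		atomic_load_explicit(&c->bytes, memory_order_relaxed) + bytes,
		memory_order_relaxed);
	atomic_store_explicit(&c->packets,
		atomic_load_explicit(&c->packets, memory_order_relaxed) + packets,
		memory_order_relaxed);
	atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader: retry until a snapshot was taken with no concurrent update. */
static void counters_snapshot(struct counters *c, uint64_t *bytes, uint32_t *packets)
{
	unsigned int start;
	uint64_t b;
	uint32_t p;

	assert(c && bytes && packets);
	do {
		start = atomic_load_explicit(&c->seq, memory_order_acquire);
		b = atomic_load_explicit(&c->bytes, memory_order_relaxed);
		p = atomic_load_explicit(&c->packets, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1) ||
		 start != atomic_load_explicit(&c->seq, memory_order_relaxed));

	/* Hand out the local copies, never a fresh read of the shared fields. */
	*bytes = b;
	*packets = p;
}
```

A caller summing several such blocks would add the returned b and p into its totals, which mirrors what the patch does with its local bytes and packets variables.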
^ permalink raw reply related
* Re: [PATCH net-next] icmp6: Add new icmpv6 type for RPL control message
From: Hannes Frederic Sowa @ 2014-10-06 23:53 UTC (permalink / raw)
To: David Miller, simon.vincent; +Cc: netdev
In-Reply-To: <20141006.181322.2253854250567400185.davem@davemloft.net>
On Tue, Oct 7, 2014, at 00:13, David Miller wrote:
> From: Simon Vincent <simon.vincent@xsilon.com>
> Date: Mon, 6 Oct 2014 11:37:06 +0100
>
> > IANA has defined a type value of 155 for RPL control messages.
> > We do nothing if we receive one of these messages. This patch is to
> > avoid getting lots of icmpv6 unknown type messages when using RPL.
> >
> > Signed-off-by: Simon Vincent <simon.vincent@xsilon.com>
>
> If we agree that pretty much our policy is that we treat as "known"
> any ICMPv6 type assigned officially by IANA, then we should simply
> add everything missing from the table at:
>
> http://www.iana.org/assignments/icmpv6-parameters/icmpv6-parameters.xhtml
>
> Any objections?
Might be possible, but I would prefer to get rid of the printk or move
the test for informational icmp notifications up.
Some of the icmp packets with type < 128 (non-informational) are also
reported to user space, so we cannot just add them to a blacklist.
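As a minimal sketch of the test referred to above: ICMPv6 splits its type space by the high-order bit, with types 0 to 127 being error messages and 128 to 255 informational ones. The macro and function names below are local to this example and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

#define EX_ICMP6_INFO_MASK   0x80 /* high bit set: informational message */
#define EX_ICMP6_RPL_CONTROL 155  /* IANA-assigned type for RPL control */

/* Returns nonzero for informational types (128..255), zero for errors. */
static int ex_icmp6_is_informational(uint8_t type)
{
	return (type & EX_ICMP6_INFO_MASK) != 0;
}
```

With this check, ex_icmp6_is_informational(EX_ICMP6_RPL_CONTROL) is true, so an unknown informational type such as RPL control could be ignored quietly without touching the handling of error types below 128.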
Bye,
Hannes
^ permalink raw reply
* Re: [net-next PATCH v1 1/3] net: sched: af_packet support for direct ring access
From: Hannes Frederic Sowa @ 2014-10-06 23:26 UTC (permalink / raw)
To: John Fastabend
Cc: Daniel Borkmann, John Fastabend, Jesper Dangaard Brouer,
John W. Linville, Neil Horman, Florian Westphal, gerlitz.or,
netdev, john.ronciak, amirv, eric.dumazet, danny.zhou,
Willem de Bruijn
In-Reply-To: <5432FD6D.2020102@intel.com>
Hi John,
On Mon, Oct 6, 2014, at 22:37, John Fastabend wrote:
> > I find the six additional ndo ops a bit worrisome as we are adding more
> > and more subsystem specific ndoops to this struct. I would like to see
> > some unification here, but currently cannot make concrete proposals,
> > sorry.
>
> I agree it seems like a bit much. One thought was to split the ndo
> ops into categories. Switch ops, MACVLAN ops, basic ops and with this
> userspace queue ops. This sort of goes along with some of the switch
> offload work which is going to add a handful more ops as best I can
> tell.
Thanks for your mail, you answered all of my questions.
Have you looked at <https://code.google.com/p/kernel/wiki/ProjectUnetq>?
Willem (also in Cc) used sysfs files which get mmapped to represent the
tx/rx descriptors. The representation was independent of the device and
IIRC the prototype used a write(fd, "", 1) to signal the kernel it
should proceed with tx. I agree, it would be great to be syscall-free
here.
For the semantics of the descriptors we could also easily generate files
in sysfs. I thought about something like tracepoints already do for
representing the data in the ringbuffer depending on the event:
-- >8 --
# cat /sys/kernel/debug/tracing/events/net/net_dev_queue/format
name: net_dev_queue
ID: 1006
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:void * skbaddr; offset:8; size:8; signed:0;
field:unsigned int len; offset:16; size:4; signed:0;
field:__data_loc char[] name; offset:20; size:4; signed:1;
print fmt: "dev=%s skbaddr=%p len=%u", __get_str(name), REC->skbaddr, REC->len
-- >8 --
Maybe the macros from tracing are reusable (TP_STRUCT__entry), although
e.g. endianness would need to be added. Hopefully there is already a user
space parser somewhere in the perf sources. An easier to parse binary
representation could be added easily, and maybe even something vDSO-like
if people care about that.
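To give an idea of how little a user space consumer of such a text description needs, here is a rough sketch that parses one "field:" line of the format shown above. It is not the perf parser; the struct and function names are hypothetical, and it assumes the whole field description sits on one line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed field description (hypothetical layout for this example). */
struct ex_tp_field {
	char decl[128];      /* C declaration, e.g. "unsigned int len" */
	unsigned int offset; /* byte offset within the record */
	unsigned int size;   /* size in bytes */
	int is_signed;       /* 1 if the field is signed */
};

/* Returns 0 on success, -1 if the line is not a field description.
 * On failure the contents of *out are unspecified.
 */
static int ex_tp_parse_field(const char *line, struct ex_tp_field *out)
{
	int n;

	assert(line != NULL && out != NULL);
	/* Whitespace in the format matches any run of blanks or tabs. */
	n = sscanf(line, " field:%127[^;]; offset:%u; size:%u; signed:%d;",
		   out->decl, &out->offset, &out->size, &out->is_signed);
	return n == 4 ? 0 : -1;
}
```

Feeding it the "unsigned int len" line yields offset 16, size 4 and signed 0, while lines such as "name: net_dev_queue" are rejected.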
Maybe this open/mmap per queue also kills some of the ndo_ops?
Bye,
Hannes
^ permalink raw reply
* [PATCH ethtool v2 2/3] ethtool: Add copybreak support
From: Govindarajulu Varadarajan @ 2014-10-06 23:12 UTC (permalink / raw)
To: ben; +Cc: netdev, ogerlitz, yevgenyp, Govindarajulu Varadarajan
In-Reply-To: <1412637141-3205-1-git-send-email-_govind@gmx.com>
This patch adds support for getting and setting the driver's copybreak
values (rx and tx). The values are read and written through the new ethtool
tunable interface.
This was added to net-next in
commit: f0db9b073415848709dd59a6394969882f517da9
ethtool: Add generic options for tunables
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
ethtool.c | 177 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 177 insertions(+)
diff --git a/ethtool.c b/ethtool.c
index bf583f3..4045356 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -179,6 +179,12 @@ static const struct flag_info flags_msglvl[] = {
{ "wol", NETIF_MSG_WOL },
};
+static const char *tunable_name[] = {
+ [ETHTOOL_ID_UNSPEC] = "Unspec",
+ [ETHTOOL_RX_COPYBREAK] = "rx",
+ [ETHTOOL_TX_COPYBREAK] = "tx",
+};
+
struct off_flag_def {
const char *short_name;
const char *long_name;
@@ -1805,6 +1811,173 @@ static int do_gring(struct cmd_context *ctx)
return 0;
}
+static int get_u32tunable(struct cmd_context *ctx, enum tunable_id id,
+ __u32 *value)
+{
+ struct ethtool_tunable *etuna;
+ int ret;
+
+ etuna = calloc(sizeof(*etuna) + sizeof(__u32), 1);
+ if (!etuna)
+ return 1;
+ etuna->cmd = ETHTOOL_GTUNABLE;
+ etuna->id = id;
+ etuna->type_id = ETHTOOL_TUNABLE_U32;
+ etuna->len = sizeof(__u32);
+ ret = send_ioctl(ctx, etuna);
+ *value = *(__u32 *)((void *)etuna + sizeof(*etuna));
+ free(etuna);
+
+ return ret;
+}
+
+static int print_u32tunable(int err, enum tunable_id id, const __u32 value)
+{
+ if (err) {
+ switch (errno) {
+ /* Driver does not support this particular tunable
+ * Usually displays 0
+ */
+ case EINVAL:
+ goto print;
+ /* Driver does not support get tunables ops or no such device
+ * No point in proceeding further
+ */
+ case EOPNOTSUPP:
+ case ENODEV:
+ perror("Cannot get device settings");
+ exit(err);
+ default:
+ perror(tunable_name[id]);
+ return err;
+ }
+ }
+print:
+ fprintf(stdout, "%s: %u\n", tunable_name[id], value);
+
+ return 0;
+}
+
+static int do_gcopybreak(struct cmd_context *ctx)
+{
+ int err, anyerror = 0;
+ __u32 u32value;
+
+ if (ctx->argc != 0)
+ exit_bad_args();
+
+ fprintf(stdout, "Copybreak settings for device %s\n", ctx->devname);
+
+ err = get_u32tunable(ctx, ETHTOOL_RX_COPYBREAK, &u32value);
+ err = print_u32tunable(err, ETHTOOL_RX_COPYBREAK, u32value);
+ if (err)
+ anyerror = err;
+
+ err = get_u32tunable(ctx, ETHTOOL_TX_COPYBREAK, &u32value);
+ err = print_u32tunable(err, ETHTOOL_TX_COPYBREAK, u32value);
+ if (err)
+ anyerror = err;
+
+ if (anyerror)
+ fprintf(stderr, "Failed to get all settings. displayed partial settings\n");
+
+ return anyerror;
+}
+
+static int set_u32tunable(struct cmd_context *ctx, enum tunable_id id,
+ const __u32 value)
+{
+ struct ethtool_tunable *etuna;
+ int ret;
+ __u32 *data;
+
+ etuna = malloc(sizeof(*etuna) + sizeof(__u32));
+ if (!etuna) {
+ perror(tunable_name[id]);
+ return 1;
+ }
+ data = (void *)etuna + sizeof(*etuna);
+ *data = value;
+ etuna->cmd = ETHTOOL_STUNABLE;
+ etuna->id = id;
+ etuna->type_id = ETHTOOL_TUNABLE_U32;
+ etuna->len = sizeof(__u32);
+ ret = send_ioctl(ctx, etuna);
+ free(etuna);
+
+ return ret;
+}
+
+static int check_set_u32tunable(int err, enum tunable_id id)
+{
+ if (err) {
+ switch (errno) {
+ /* Driver does not support get tunables ops or no such device
+ * No point in proceeding further
+ */
+ case EOPNOTSUPP:
+ case ENODEV:
+ perror("Cannot set device settings");
+ exit(err);
+ default:
+ perror(tunable_name[id]);
+ return err;
+ }
+ }
+
+ return 0;
+}
+
+static int do_scopybreak(struct cmd_context *ctx)
+{
+ int err, anyerr = 0;
+ int copybreak_changed = 0;
+ __u32 rx, tx;
+ s32 rx_seen = 0;
+ s32 tx_seen = 0;
+
+ struct cmdline_info cmdline_channels[] = {
+ { .name = "rx",
+ .type = CMDL_U32,
+ .wanted_val = &rx,
+ .seen_val = &rx_seen, },
+
+ { .name = "tx",
+ .type = CMDL_U32,
+ .wanted_val = &tx,
+ .seen_val = &tx_seen, },
+ };
+
+ parse_generic_cmdline(ctx, &copybreak_changed, cmdline_channels,
+ ARRAY_SIZE(cmdline_channels));
+
+ if (!copybreak_changed) {
+ fprintf(stderr, "no copybreak parameters changed\n");
+ return 0;
+ }
+
+ if (rx_seen) {
+ err = set_u32tunable(ctx, ETHTOOL_RX_COPYBREAK, rx);
+ err = check_set_u32tunable(err, ETHTOOL_RX_COPYBREAK);
+ if (err)
+ anyerr = err;
+ }
+
+ if (tx_seen) {
+ err = set_u32tunable(ctx, ETHTOOL_TX_COPYBREAK, tx);
+ err = check_set_u32tunable(err, ETHTOOL_TX_COPYBREAK);
+ if (err)
+ anyerr = err;
+ }
+
+ if (anyerr) {
+ fprintf(stderr, "Failed to set all requested parameters\n");
+ return anyerr;
+ }
+
+ return 0;
+}
+
static int do_schannels(struct cmd_context *ctx)
{
struct ethtool_channels echannels;
@@ -4055,6 +4228,10 @@ static const struct option {
" [ rx-mini N ]\n"
" [ rx-jumbo N ]\n"
" [ tx N ]\n" },
+ { "-b|--show-copybreak", 1, do_gcopybreak, "Show copybreak values" },
+ { "-B|--set-copybreak", 1, do_scopybreak, "Set copybreak values",
+ " [ rx N]\n"
+ " [ tx N]\n" },
{ "-k|--show-features|--show-offload", 1, do_gfeatures,
"Get state of protocol offload and other features" },
{ "-K|--features|--offload", 1, do_sfeatures,
--
2.1.0
^ permalink raw reply related