* Re: [PATCH net-next 0/3] myri10ge: LRO to GRO conversion
From: Eric Dumazet @ 2012-11-14 14:27 UTC (permalink / raw)
To: Andrew Gallatin; +Cc: netdev
In-Reply-To: <50A39747.3030207@myri.com>
On Wed, 2012-11-14 at 08:06 -0500, Andrew Gallatin wrote:
> Hi,
>
> The following patchset converts myri10ge from using the old inet_lro
> interface to GRO.
Very nice work, it seems INET_LRO could die eventually ;)
^ permalink raw reply
* Re: [patch net] net: correct check in dev_addr_del()
From: Eric Dumazet @ 2012-11-14 14:18 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev, davem, shemminger, john.r.fastabend
In-Reply-To: <1352897464-832-1-git-send-email-jiri@resnulli.us>
On Wed, 2012-11-14 at 13:51 +0100, Jiri Pirko wrote:
> Check (ha->addr == dev->dev_addr) is always true because dev_addr_init()
> sets this. Correct the check to behave properly on addr removal.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
Please add in the changelog which commit added the bug, to ease
stable teams work.
Thanks
^ permalink raw reply
* Re: [PATCH v6 2/7] net: mvneta: driver for Marvell Armada 370/XP network unit
From: Thomas Petazzoni @ 2012-11-14 14:04 UTC (permalink / raw)
To: Francois Romieu
Cc: David S. Miller, Lennert Buytenhek, netdev, linux-arm-kernel,
Jason Cooper, Andrew Lunn, Gregory Clement, Lior Amsalem,
Dmitri Epshtein
In-Reply-To: <20121113231248.GA30391@electric-eye.fr.zoreil.com>
Francois,
Thanks again for your detailed review. I have a few comments/questions
below.
On Wed, 14 Nov 2012 00:12:48 +0100, Francois Romieu wrote:
> Thomas Petazzoni <thomas.petazzoni@free-electrons.com> :
> [...]
> > +static int mvneta_mac_addr_set(struct mvneta_port *pp, unsigned
> > char *addr,
> > + int queue)
> > +{
> > + unsigned int mac_h;
> > + unsigned int mac_l;
> > +
> > + if (queue >= 1) {
> > + netdev_err(pp->dev, "RX queue #%d is out of
> > range\n", queue);
> > + return -EINVAL;
> > + }
>
> (whence q <= 0)
After changing the code as per your below comments, I have removed this
check.
> > +
> > + if (queue != -1) {
> > + mac_l = (addr[4] << 8) | (addr[5]);
> > + mac_h = (addr[0] << 24) | (addr[1] << 16) |
> > + (addr[2] << 8) | (addr[3] << 0);
> > +
> > + mvreg_write(pp, MVNETA_MAC_ADDR_LOW, mac_l);
> > + mvreg_write(pp, MVNETA_MAC_ADDR_HIGH, mac_h);
> > + }
>
> What is it trying to achieve with the "queue" argument ?
Basically: queue == -1 is a special value to say "I want to discard the
entries in the ucast table". It's used when switching from one MAC
address to another: we remove the old entries by specifying queue=-1,
and then add the new entries by specifying queue=0.
The thing is that the driver does not yet support the multiqueue
aspects of the device, but since many registers have queue-related
aspects, we still have to worry about queues a little bit in the
driver. Complete multiqueue support is planned as a second step.
> [...]
> > +/* Refill processing */
> > +static int mvneta_rx_refill(struct mvneta_port *pp,
> > + struct mvneta_rx_desc *rx_desc)
> > +
> > +{
> > + dma_addr_t phys_addr;
> > + struct sk_buff *skb;
> > +
> > + skb = netdev_alloc_skb(pp->dev, pp->pkt_size);
> > + if (!skb)
> > + return -ENOMEM;
>
> Could netdev_alloc_skb_ip_align be an option ?
If I'm correct netdev_alloc_skb_ip_align() automatically reserve the
two bytes at the beginning of the Ethernet header to ensure that the IP
header will be aligned.
However, with the Marvell hardware, it isn't needed: the hardware
automatically pads with two empty bytes in the receiving RX buffer
before putting the real data (Ethernet header).
> [...]
> > +static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
> > + struct mvneta_rx_queue *rxq)
> > +{
> > + struct net_device *dev = pp->dev;
> > + int rx_done, rx_filled;
> > +
> > + /* Get number of received packets */
> > + rx_done = mvneta_rxq_busy_desc_num_get(pp, rxq);
> > +
> > + if (rx_todo > rx_done)
> > + rx_todo = rx_done;
> > +
> > + rx_done = 0;
> > + rx_filled = 0;
> > +
> > + /* Fairness NAPI loop */
> > + while (rx_done < rx_todo) {
> > + struct mvneta_rx_desc *rx_desc =
> > mvneta_rxq_next_desc_get(rxq);
> > + struct sk_buff *skb;
> > + u32 rx_status;
> > + int rx_bytes, err;
> > +
> > + prefetch(rx_desc);
> > + rx_done++;
> > + rx_filled++;
> > + rx_status = rx_desc->status;
> > + skb = (struct sk_buff *)rx_desc->buf_cookie;
> > +
> > + if (!mvneta_rxq_desc_is_first_last(rx_desc) ||
> > + (rx_status & MVNETA_RXD_ERR_SUMMARY)) {
> > + dev->stats.rx_errors++;
> > + mvneta_rx_error(pp, rx_desc);
> > + mvneta_rx_desc_fill(rx_desc,
> > rx_desc->buf_phys_addr,
> > + (u32)skb);
> > + continue;
> > + }
> > +
> > + dma_unmap_single(pp->dev->dev.parent,
> > rx_desc->buf_phys_addr,
> > + rx_desc->data_size,
> > DMA_FROM_DEVICE); +
> > + rx_bytes = rx_desc->data_size -
> > + (MVNETA_ETH_CRC_SIZE + MVNETA_MH_SIZE);
> > + u64_stats_update_begin(&pp->rx_stats.syncp);
> > + pp->rx_stats.packets++;
> > + pp->rx_stats.bytes += rx_bytes;
> > + u64_stats_update_end(&pp->rx_stats.syncp);
> > +
> > + /* Linux processing */
> > + skb_reserve(skb, MVNETA_MH_SIZE);
> > + skb_put(skb, rx_bytes);
>
> (I suggested to use skb_reserve / skb_put instead of hand-crafted
> code but I may have ignored the elephant in the living room)
>
> I understand the skb_put, ok. What is the purpose of the skb_reserve ?
As explained above, to skip the starting 2 bytes filled with zeroes by
the Marvell hardware, before passing the SKB up to the networking stack.
> Are MVNETA_ETH_CRC_SIZE (4) and MVNETA_MH_SIZE (2) really different
> from ETH_FCS_LEN and NET_IP_ALIGN ?
I have replaced MVNETA_ETH_CRC_SIZE with ETH_FCS_LEN since they are
indeed the same. However, I am not sure using NET_IP_ALIGN directly
instead of MVNETA_MH_SIZE is correct: the Marvell Header (MH) is always
two bytes: those two bytes are filled with zeroes in the normal case,
or otherwise they are used for some custom interaction between the
Marvell Ethernet controller and specific Marvell switches (this
feature is not supported by the driver being submitted). On the other
hand, the NET_IP_ALIGN value is not necessarily zero apparently, so I
have the feeling that those things should remain two separate values.
> > + if (napi_schedule_prep(&pp->napi))
> > + __napi_schedule(&pp->napi);
>
> Hand-crafted napi_schedule.
Agreed, fixed.
> > + ret = mvneta_mac_addr_set(pp, dev->dev_addr, rxq_def);
> > + if (ret < 0) {
> > + netdev_err(dev, "mvneta_mac_addr_set failed\n");
> > + goto mac_addr_set_failure;
> > + }
>
> AFAIU mvneta_mac_addr_set can only fail when (module parameter)
> rxq_def is not set correctly. It imho calls for a check in
> probe/remove and a removal of the failure case in mvneta_mac_addr_set.
Good point, fixed.
> > + /* Connect to port interrupt line */
> > + ret = request_irq(pp->dev->irq, mvneta_isr, IRQF_DISABLED,
> > + MVNETA_DRIVER_NAME, pp);
>
> include/linux/interrupt.h
> [...]
> * IRQF_DISABLED - keep irqs disabled when calling the action handler.
> * DEPRECATED. This flag is a NOOP and scheduled to
> be removed
Fixed.
> > + if (mvneta_init(pp, phy_addr)) {
> > + dev_err(&pdev->dev, "can't init eth hal\n");
> > + err = -ENODEV;
> > + goto err_base;
> > + }
>
> The error code from mvneta_init() should be used.
Fixed.
> > + if (register_netdev(dev)) {
> > + dev_err(&pdev->dev, "failed to register\n");
> > + err = ENOMEM;
> > + goto err_base;
> > + }
>
> - The error code from register_netdev() should be used.
> - Leak: allocations in mvneta_init() are not balanced.
> - Nit: the "err_where_did_it_trigger_first" style of label shows its
> downside when compared to the "err_what_should_be_done" when
> it gets used several times.
Fixed.
> > +/* Device removal routine */
> > +static int __devexit mvneta_remove(struct platform_device *pdev)
> > +{
> > + struct net_device *dev = platform_get_drvdata(pdev);
> > + struct mvneta_port *pp = netdev_priv(dev);
> > +
> > + iounmap(pp->base);
> > +
> > + unregister_netdev(dev);
> > + irq_dispose_mapping(dev->irq);
> > + free_netdev(dev);
> > + mvneta_deinit(pp);
>
> One may expect the same ordering as the mvneta_probe unroll sequence.
Fixed as well!
Thanks,
Thomas
--
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com
^ permalink raw reply
* Re: [PATCHES] Networking fixes for -stable.
From: Peter Senna Tschudin @ 2012-11-14 14:02 UTC (permalink / raw)
To: Ben Hutchings; +Cc: David Miller, Greg Kroah-Hartman, stable, netdev
In-Reply-To: <1352836286.16264.102.camel@deadeye.wl.decadent.org.uk>
Hi Ben,
On Tue, Nov 13, 2012 at 8:51 PM, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Mon, 2012-11-12 at 00:25 -0500, David Miller wrote:
>> Please queue up the following networking bug fixes for
>> 3.0.x, 3.2.x, 3.4.x, and 3.6.x, respectively.
> [...]
>> From 2204849a85383fbde75680aa199142abe504adbb Mon Sep 17 00:00:00 2001
>> From: Peter Senna Tschudin <peter.senna@gmail.com>
>> Date: Sun, 28 Oct 2012 06:12:01 +0000
>> Subject: [PATCH 7/9] drivers/net/phy/mdio-bitbang.c: Call mdiobus_unregister
>> before mdiobus_free
>>
>> [ Upstream commit aa731872f7d33dcb8b54dad0cfb82d4e4d195d7e ]
>>
>> Based on commit b27393aecf66199f5ddad37c302d3e0cfadbe6c0
>>
>> Calling mdiobus_free without calling mdiobus_unregister causes
>> BUG_ON(). This patch fixes the issue.
> [...]
>
> This introduces a regresssion, as mdiobus_unregister() is not safe to
> call if the bus isn't registered. Registration is controlled by the
> drivers that use mdio-bitbang, and they should take care of
> unregistration too - and most of them do.
Is mdiobus_free() safe to call if the bus isn't registered?
I found calls to free_mdio_bitbang() after call to
mdiobus_unregister() that may be the worst cases for my patch. But I
also found places that call free_mdio_bitbang() without first calling
mdiobus_unregister(). See:
linux-next/arch/powerpc/platforms/82xx/ep8248e.c:152
linux-next/drivers/net/ethernet/8390/ax88796.c:647
linux-next/drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c:197
linux-next/drivers/net/phy/mdio-gpio.c:157
linux-next/drivers/net/phy/mdio-gpio.c:172
Are those calls to free_mdio_bitbang() wrong by not calling
mdiobus_unregister() before?
>
> This should be reverted in mainline and not applied to any stable
> series.
>
> Ben.
>
> --
> Ben Hutchings
> For every complex problem
> there is a solution that is simple, neat, and wrong.
Thank you for reviewing,
Peter
--
Peter
^ permalink raw reply
* iputils-s20121114
From: yoshfuji @ 2012-11-14 13:50 UTC (permalink / raw)
To: netdev
Hello.
iputils-s20121114 comes with a lot of bug fixes and improvements, thanks to
distro bug reports.
Files:
https://sourceforge.net/projects/iputils/files/
http://www.skbuff.net/iputils/
Tree:
http://www.linux-ipv6.org/gitweb/gitweb.cgi?p=gitroot/iputils.git
https://sourceforge.net/p/iputils/code/ci/HEAD/tree/
Regards,
--yoshfuji
Changes between s20121106 and s20121112:
Sergey Fionov (1):
ping,ping6: Fallback to numeric addresses while exiting
YOSHIFUJI Hideaki (19):
ping,ping6: Rework capability support and Make sure -m and -I options work.
ping,tracepath: Spelling fixes in manpages.
ping,ping6: Fix integer overflow with large interval value (-i option).
clockdiff: Make it work with large pid.
ping,ping6: Make in_pr_addr volatile.
arping: Do not quit too early with large deadline value (-w option).
arping: Maintain minimum capabilities for SO_BINDTODEVICE(-I option).
ping: Fix recorded route comparison.
arping: Use getifaddrs() to get broadcast address.
ping6: Fix typo in error message.
ping6: Generate NI Group Address and Subject Name at once.
ping,ping6: Unmask signals on start-up.
arping: Build with USE_CAP=no.
arping,ping,ping6,tracepath,tracepath6,traceroute6: Experimental IDN support.
ping6: IDN support for the Subject Name in NI Query.
tracepath,tracepath6: Introduce -p option for port.
ping6: Add missing definitions/declarations for flowlabel management (-F option).
makefile: Do not include merge commits in RELNOTES.
iputils-s20121112
Changes since iputils-s20121112:
Jan Synacek (2):
clockdiff: remove unused variable
ping: Wrap SO_BINDTODEVICE with the correct capability.
YOSHIFUJI Hideaki (14):
ping: IP_MULTICAST_IF does not need CAP_NET_RAW.
ping6: Check ranges of flowlabel (-F option) and tclass (-Q option) arguments.
ping6: Accept 0x-notation for flowlabel (-F option) and tclass (-Q option) arguments.
ping,ping6: Manual update regarding -F, -Q and -N option.
arping,ping,ping6: Defer exitting to allow users to see usage.
arping,ping,ping6,ninfod: Change euid to uid (non-root) even if capabiliy is enabled.
ninfod: Add configure.
ninfod: libcap support to drop capabilities.
ninfod: Add run as user (-u user) option.
ninfod: Fix usage message.
arping,clockdiff,rarpd,rdisc,tftpd: Change RFC source to tools.ietf.org.
ninfod: Add ninfod(8) manpage.
makefile: Add ninfod, distclean targets.
iputils-s20121114
^ permalink raw reply
* [PATCH net-next 1/3] myri10ge: Convert from LRO to GRO
From: Andrew Gallatin @ 2012-11-14 13:06 UTC (permalink / raw)
To: netdev
Convert myri10ge from LRO to GRO, and simplify the driver by removing
various LRO-related code which is no longer needed including
ndo_fix_features op, custom skb building from frags, and LRO
header parsing.
Signed-off-by: Andrew Gallatin <gallatin@myri.com>
---
drivers/net/ethernet/myricom/Kconfig | 1 -
drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 227
+++-------------------
2 files changed, 31 insertions(+), 197 deletions(-)
diff --git a/drivers/net/ethernet/myricom/Kconfig
b/drivers/net/ethernet/myricom/Kconfig
index 540f0c6..3932d08 100644
--- a/drivers/net/ethernet/myricom/Kconfig
+++ b/drivers/net/ethernet/myricom/Kconfig
@@ -23,7 +23,6 @@ config MYRI10GE
depends on PCI && INET
select FW_LOADER
select CRC32
- select INET_LRO
---help---
This driver supports Myricom Myri-10G Dual Protocol interface in
Ethernet mode. If the eeprom on your board is not recent enough,
diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index 83516e3..a5ab2f2 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -50,7 +50,6 @@
#include <linux/etherdevice.h>
#include <linux/if_ether.h>
#include <linux/if_vlan.h>
-#include <linux/inet_lro.h>
#include <linux/dca.h>
#include <linux/ip.h>
#include <linux/inet.h>
@@ -96,8 +95,6 @@ MODULE_LICENSE("Dual BSD/GPL");
#define MYRI10GE_EEPROM_STRINGS_SIZE 256
#define MYRI10GE_MAX_SEND_DESC_TSO ((65536 / 2048) * 2)
-#define MYRI10GE_MAX_LRO_DESCRIPTORS 8
-#define MYRI10GE_LRO_MAX_PKTS 64
#define MYRI10GE_NO_CONFIRM_DATA htonl(0xffffffff)
#define MYRI10GE_NO_RESPONSE_RESULT 0xffffffff
@@ -165,8 +162,6 @@ struct myri10ge_rx_done {
dma_addr_t bus;
int cnt;
int idx;
- struct net_lro_mgr lro_mgr;
- struct net_lro_desc lro_desc[MYRI10GE_MAX_LRO_DESCRIPTORS];
};
struct myri10ge_slice_netstats {
@@ -338,11 +333,6 @@ static int myri10ge_debug = -1; /* defaults above */
module_param(myri10ge_debug, int, 0);
MODULE_PARM_DESC(myri10ge_debug, "Debug level (0=none,...,16=all)");
-static int myri10ge_lro_max_pkts = MYRI10GE_LRO_MAX_PKTS;
-module_param(myri10ge_lro_max_pkts, int, S_IRUGO);
-MODULE_PARM_DESC(myri10ge_lro_max_pkts,
- "Number of LRO packets to be aggregated");
-
static int myri10ge_fill_thresh = 256;
module_param(myri10ge_fill_thresh, int, S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(myri10ge_fill_thresh, "Number of empty rx slots
allowed");
@@ -1197,36 +1187,6 @@ static inline void myri10ge_vlan_ip_csum(struct
sk_buff *skb, __wsum hw_csum)
}
}
-static inline void
-myri10ge_rx_skb_build(struct sk_buff *skb, u8 * va,
- struct skb_frag_struct *rx_frags, int len, int hlen)
-{
- struct skb_frag_struct *skb_frags;
-
- skb->len = skb->data_len = len;
- /* attach the page(s) */
-
- skb_frags = skb_shinfo(skb)->frags;
- while (len > 0) {
- memcpy(skb_frags, rx_frags, sizeof(*skb_frags));
- len -= skb_frag_size(rx_frags);
- skb_frags++;
- rx_frags++;
- skb_shinfo(skb)->nr_frags++;
- }
-
- /* pskb_may_pull is not available in irq context, but
- * skb_pull() (for ether_pad and eth_type_trans()) requires
- * the beginning of the packet in skb_headlen(), move it
- * manually */
- skb_copy_to_linear_data(skb, va, hlen);
- skb_shinfo(skb)->frags[0].page_offset += hlen;
- skb_frag_size_sub(&skb_shinfo(skb)->frags[0], hlen);
- skb->data_len -= hlen;
- skb->tail += hlen;
- skb_pull(skb, MXGEFW_PAD);
-}
-
static void
myri10ge_alloc_rx_pages(struct myri10ge_priv *mgp, struct
myri10ge_rx_buf *rx,
int bytes, int watchdog)
@@ -1304,18 +1264,14 @@ myri10ge_unmap_rx_page(struct pci_dev *pdev,
}
}
-#define MYRI10GE_HLEN 64 /* The number of bytes to copy from a
- * page into an skb */
-
static inline int
-myri10ge_rx_done(struct myri10ge_slice_state *ss, int len, __wsum csum,
- bool lro_enabled)
+myri10ge_rx_done(struct myri10ge_slice_state *ss, int len, __wsum csum)
{
struct myri10ge_priv *mgp = ss->mgp;
struct sk_buff *skb;
- struct skb_frag_struct rx_frags[MYRI10GE_MAX_FRAGS_PER_FRAME];
+ struct skb_frag_struct *rx_frags;
struct myri10ge_rx_buf *rx;
- int i, idx, hlen, remainder, bytes;
+ int i, idx, remainder, bytes;
struct pci_dev *pdev = mgp->pdev;
struct net_device *dev = mgp->dev;
u8 *va;
@@ -1332,6 +1288,20 @@ myri10ge_rx_done(struct myri10ge_slice_state *ss,
int len, __wsum csum,
idx = rx->cnt & rx->mask;
va = page_address(rx->info[idx].page) + rx->info[idx].page_offset;
prefetch(va);
+
+ skb = napi_get_frags(&ss->napi);
+ if (unlikely(skb == NULL)) {
+ ss->stats.rx_dropped++;
+ for (i = 0, remainder = len; remainder > 0; i++) {
+ myri10ge_unmap_rx_page(pdev, &rx->info[idx], bytes);
+ put_page(rx->info[idx].page);
+ rx->cnt++;
+ idx = rx->cnt & rx->mask;
+ remainder -= MYRI10GE_ALLOC_SIZE;
+ }
+ return 0;
+ }
+ rx_frags = skb_shinfo(skb)->frags;
/* Fill skb_frag_struct(s) with data from our receive */
for (i = 0, remainder = len; remainder > 0; i++) {
myri10ge_unmap_rx_page(pdev, &rx->info[idx], bytes);
@@ -1345,54 +1315,23 @@ myri10ge_rx_done(struct myri10ge_slice_state
*ss, int len, __wsum csum,
idx = rx->cnt & rx->mask;
remainder -= MYRI10GE_ALLOC_SIZE;
}
+ skb_shinfo(skb)->nr_frags = i;
- if (lro_enabled) {
- rx_frags[0].page_offset += MXGEFW_PAD;
- skb_frag_size_sub(&rx_frags[0], MXGEFW_PAD);
- len -= MXGEFW_PAD;
- lro_receive_frags(&ss->rx_done.lro_mgr, rx_frags,
- /* opaque, will come back in get_frag_header */
- len, len,
- (void *)(__force unsigned long)csum, csum);
+ /* remove padding */
+ rx_frags[0].page_offset += MXGEFW_PAD;
+ rx_frags[0].size -= MXGEFW_PAD;
+ len -= MXGEFW_PAD;
- return 1;
- }
-
- hlen = MYRI10GE_HLEN > len ? len : MYRI10GE_HLEN;
-
- /* allocate an skb to attach the page(s) to. This is done
- * after trying LRO, so as to avoid skb allocation overheads */
-
- skb = netdev_alloc_skb(dev, MYRI10GE_HLEN + 16);
- if (unlikely(skb == NULL)) {
- ss->stats.rx_dropped++;
- do {
- i--;
- __skb_frag_unref(&rx_frags[i]);
- } while (i != 0);
- return 0;
- }
-
- /* Attach the pages to the skb, and trim off any padding */
- myri10ge_rx_skb_build(skb, va, rx_frags, len, hlen);
- if (skb_frag_size(&skb_shinfo(skb)->frags[0]) <= 0) {
- skb_frag_unref(skb, 0);
- skb_shinfo(skb)->nr_frags = 0;
- } else {
- skb->truesize += bytes * skb_shinfo(skb)->nr_frags;
+ skb->len = len;
+ skb->data_len = len;
+ skb->truesize += len;
+ if (dev->features & NETIF_F_RXCSUM) {
+ skb->ip_summed = CHECKSUM_COMPLETE;
+ skb->csum = csum;
}
- skb->protocol = eth_type_trans(skb, dev);
skb_record_rx_queue(skb, ss - &mgp->ss[0]);
- if (dev->features & NETIF_F_RXCSUM) {
- if ((skb->protocol == htons(ETH_P_IP)) ||
- (skb->protocol == htons(ETH_P_IPV6))) {
- skb->csum = csum;
- skb->ip_summed = CHECKSUM_COMPLETE;
- } else
- myri10ge_vlan_ip_csum(skb, csum);
- }
- netif_receive_skb(skb);
+ napi_gro_frags(&ss->napi);
return 1;
}
@@ -1480,18 +1419,11 @@ myri10ge_clean_rx_done(struct
myri10ge_slice_state *ss, int budget)
u16 length;
__wsum checksum;
- /*
- * Prevent compiler from generating more than one ->features memory
- * access to avoid theoretical race condition with functions that
- * change NETIF_F_LRO flag at runtime.
- */
- bool lro_enabled = !!(ACCESS_ONCE(mgp->dev->features) & NETIF_F_LRO);
-
while (rx_done->entry[idx].length != 0 && work_done < budget) {
length = ntohs(rx_done->entry[idx].length);
rx_done->entry[idx].length = 0;
checksum = csum_unfold(rx_done->entry[idx].checksum);
- rx_ok = myri10ge_rx_done(ss, length, checksum, lro_enabled);
+ rx_ok = myri10ge_rx_done(ss, length, checksum);
rx_packets += rx_ok;
rx_bytes += rx_ok * (unsigned long)length;
cnt++;
@@ -1503,9 +1435,6 @@ myri10ge_clean_rx_done(struct myri10ge_slice_state
*ss, int budget)
ss->stats.rx_packets += rx_packets;
ss->stats.rx_bytes += rx_bytes;
- if (lro_enabled)
- lro_flush_all(&rx_done->lro_mgr);
-
/* restock receive rings if needed */
if (ss->rx_small.fill_cnt - ss->rx_small.cnt < myri10ge_fill_thresh)
myri10ge_alloc_rx_pages(mgp, &ss->rx_small,
@@ -1779,7 +1708,6 @@ static const char
myri10ge_gstrings_slice_stats[][ETH_GSTRING_LEN] = {
"tx_pkt_start", "tx_pkt_done", "tx_req", "tx_done",
"rx_small_cnt", "rx_big_cnt",
"wake_queue", "stop_queue", "tx_linearized",
- "LRO aggregated", "LRO flushed", "LRO avg aggr", "LRO no_desc",
};
#define MYRI10GE_NET_STATS_LEN 21
@@ -1880,14 +1808,6 @@ myri10ge_get_ethtool_stats(struct net_device *netdev,
data[i++] = (unsigned int)ss->tx.wake_queue;
data[i++] = (unsigned int)ss->tx.stop_queue;
data[i++] = (unsigned int)ss->tx.linearized;
- data[i++] = ss->rx_done.lro_mgr.stats.aggregated;
- data[i++] = ss->rx_done.lro_mgr.stats.flushed;
- if (ss->rx_done.lro_mgr.stats.flushed)
- data[i++] = ss->rx_done.lro_mgr.stats.aggregated /
- ss->rx_done.lro_mgr.stats.flushed;
- else
- data[i++] = 0;
- data[i++] = ss->rx_done.lro_mgr.stats.no_desc;
}
}
@@ -2271,67 +2191,6 @@ static void myri10ge_free_irq(struct
myri10ge_priv *mgp)
pci_disable_msix(pdev);
}
-static int
-myri10ge_get_frag_header(struct skb_frag_struct *frag, void **mac_hdr,
- void **ip_hdr, void **tcpudp_hdr,
- u64 * hdr_flags, void *priv)
-{
- struct ethhdr *eh;
- struct vlan_ethhdr *veh;
- struct iphdr *iph;
- u8 *va = skb_frag_address(frag);
- unsigned long ll_hlen;
- /* passed opaque through lro_receive_frags() */
- __wsum csum = (__force __wsum) (unsigned long)priv;
-
- /* find the mac header, aborting if not IPv4 */
-
- eh = (struct ethhdr *)va;
- *mac_hdr = eh;
- ll_hlen = ETH_HLEN;
- if (eh->h_proto != htons(ETH_P_IP)) {
- if (eh->h_proto == htons(ETH_P_8021Q)) {
- veh = (struct vlan_ethhdr *)va;
- if (veh->h_vlan_encapsulated_proto != htons(ETH_P_IP))
- return -1;
-
- ll_hlen += VLAN_HLEN;
-
- /*
- * HW checksum starts ETH_HLEN bytes into
- * frame, so we must subtract off the VLAN
- * header's checksum before csum can be used
- */
- csum = csum_sub(csum, csum_partial(va + ETH_HLEN,
- VLAN_HLEN, 0));
- } else {
- return -1;
- }
- }
- *hdr_flags = LRO_IPV4;
-
- iph = (struct iphdr *)(va + ll_hlen);
- *ip_hdr = iph;
- if (iph->protocol != IPPROTO_TCP)
- return -1;
- if (ip_is_fragment(iph))
- return -1;
- *hdr_flags |= LRO_TCP;
- *tcpudp_hdr = (u8 *) (*ip_hdr) + (iph->ihl << 2);
-
- /* verify the IP checksum */
- if (unlikely(ip_fast_csum((u8 *) iph, iph->ihl)))
- return -1;
-
- /* verify the checksum */
- if (unlikely(csum_tcpudp_magic(iph->saddr, iph->daddr,
- ntohs(iph->tot_len) - (iph->ihl << 2),
- IPPROTO_TCP, csum)))
- return -1;
-
- return 0;
-}
-
static int myri10ge_get_txrx(struct myri10ge_priv *mgp, int slice)
{
struct myri10ge_cmd cmd;
@@ -2402,7 +2261,6 @@ static int myri10ge_open(struct net_device *dev)
struct myri10ge_cmd cmd;
int i, status, big_pow2, slice;
u8 *itable;
- struct net_lro_mgr *lro_mgr;
if (mgp->running != MYRI10GE_ETH_STOPPED)
return -EBUSY;
@@ -2513,19 +2371,6 @@ static int myri10ge_open(struct net_device *dev)
goto abort_with_rings;
}
- lro_mgr = &ss->rx_done.lro_mgr;
- lro_mgr->dev = dev;
- lro_mgr->features = LRO_F_NAPI;
- lro_mgr->ip_summed = CHECKSUM_COMPLETE;
- lro_mgr->ip_summed_aggr = CHECKSUM_UNNECESSARY;
- lro_mgr->max_desc = MYRI10GE_MAX_LRO_DESCRIPTORS;
- lro_mgr->lro_arr = ss->rx_done.lro_desc;
- lro_mgr->get_frag_header = myri10ge_get_frag_header;
- lro_mgr->max_aggr = myri10ge_lro_max_pkts;
- lro_mgr->frag_align_pad = 2;
- if (lro_mgr->max_aggr > MAX_SKB_FRAGS)
- lro_mgr->max_aggr = MAX_SKB_FRAGS;
-
/* must happen prior to any irq */
napi_enable(&(ss)->napi);
}
@@ -3143,15 +2988,6 @@ static int myri10ge_set_mac_address(struct
net_device *dev, void *addr)
return 0;
}
-static netdev_features_t myri10ge_fix_features(struct net_device *dev,
- netdev_features_t features)
-{
- if (!(features & NETIF_F_RXCSUM))
- features &= ~NETIF_F_LRO;
-
- return features;
-}
-
static int myri10ge_change_mtu(struct net_device *dev, int new_mtu)
{
struct myri10ge_priv *mgp = netdev_priv(dev);
@@ -3878,7 +3714,6 @@ static const struct net_device_ops
myri10ge_netdev_ops = {
.ndo_get_stats64 = myri10ge_get_stats,
.ndo_validate_addr = eth_validate_addr,
.ndo_change_mtu = myri10ge_change_mtu,
- .ndo_fix_features = myri10ge_fix_features,
.ndo_set_rx_mode = myri10ge_set_multicast_list,
.ndo_set_mac_address = myri10ge_set_mac_address,
};
@@ -4018,7 +3853,7 @@ static int myri10ge_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
netdev->netdev_ops = &myri10ge_netdev_ops;
netdev->mtu = myri10ge_initial_mtu;
- netdev->hw_features = mgp->features | NETIF_F_LRO | NETIF_F_RXCSUM;
+ netdev->hw_features = mgp->features | NETIF_F_RXCSUM;
netdev->features = netdev->hw_features;
if (dac_enabled)
--
1.7.9.5
^ permalink raw reply related
* Re: [RFC PATCH 00/13] Always build GSO/GRO functionality into the kernel
From: Vlad Yasevich @ 2012-11-14 13:10 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, davem
In-Reply-To: <1352859928.4497.1.camel@edumazet-glaptop>
On 11/13/2012 09:25 PM, Eric Dumazet wrote:
> On Tue, 2012-11-13 at 20:24 -0500, Vlad Yasevich wrote:
>> This patch series is a revision suggested by Eric to solve the problem where
>> a host without IPv6 support drops GSO frames from the guest.
>>
>> The problem is that GSO/GRO support is per protocol, and when said protocol
>> is not loaded or is disabled, packets attempting to go through GSO/GRO code paths
>> are dropped. This causes retransmissions and a two orders of magnitude drop in
>> performance.
>>
>> Prior attempt to solve the problem simply enabled enough GSO/GRO functionality
>> for IPv6 protocol when IPv6 was diabled. This did not solve the problem when
>> the protocol was not build in or was blacklisted.
>> To solve the problem, it was suggested that we separate GSO/GRO callback
>> registration from packet processing registrations. That way
>> GSO/GRO callbacks can be built into the kernel and always be there.
>> This patch series attempts to do just that.
>> * Patches 1 and 2 split the GSO/GRO handlers from packet_type and convert
>> to the new structure.
>> * Patches 3, 4 and 5 do the same thing for net_protocol structure.
>> * The rest of the patches try to incrementally move the IPv6 GSO/GRO
>> code out of the module and into the static kernel build. Some IPv6
>> helper functions also had to move as well.
>>
>> I am currently testing the patches, but if folks could look this over
>> and send me any comments, I'd appreciate it.
>
> Seems very nice !
>
> I am just wondering if GSO/GRO is fully enabled at every step ?
>
I think so. I ran basic tests along most of the steps and it seemed to
be enabled. That's why this is a 13 patch series :) Tried to do it
incrementally without impacting functionality.
-vlad
^ permalink raw reply
* Re: [RFC PATCH 03/13] net: Add net protocol offload registration infrustructure
From: Vlad Yasevich @ 2012-11-14 13:08 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: netdev, eric.dumazet, davem
In-Reply-To: <50A354C6.20805@6wind.com>
On 11/14/2012 03:22 AM, Nicolas Dichtel wrote:
> Le 14/11/2012 02:24, Vlad Yasevich a écrit :
>> diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
>> index 8918eff..1278db8 100644
>> --- a/net/ipv4/protocol.c
>> +++ b/net/ipv4/protocol.c
>> @@ -29,6 +29,7 @@
>> #include <net/protocol.h>
>>
>> const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS]
>> __read_mostly;
>> +const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS]
>> __read_mostly;
>>
>> /*
>> * Add a protocol handler to the hash tables
>> @@ -41,6 +42,13 @@ int inet_add_protocol(const struct net_protocol
>> *prot, unsigned char protocol)
>> }
>> EXPORT_SYMBOL(inet_add_protocol);
>>
>> +int inet_add_offload(const struct net_offload *prot, unsigned char
>> protocol)
>> +{
>> + return !cmpxchg((const struct net_offload
>> **)&inet_offloads[protocol],
>> + NULL, prot) ? 0 : -1;
>> +}
>> +EXPORT_SYMBOL(inet_add_offload);
>> +
>> /*
>> * Remove a protocol from the hash tables.
>> */
>> @@ -56,4 +64,16 @@ int inet_del_protocol(const struct net_protocol
>> *prot, unsigned char protocol)
>>
>> return ret;
>> }
>> -EXPORT_SYMBOL(inet_del_protocol);
> This line should probably not be removed ;-)
Yep, good catch... thanks...
-vald
^ permalink raw reply
* [PATCH net-next 3/3] myri10ge: Use skb_fill_page_desc().
From: Andrew Gallatin @ 2012-11-14 13:06 UTC (permalink / raw)
To: netdev
Now that LRO is gone, the receive routine is much simpler, and
we are able to use the standard skb_fill_page_desc() in myri10ge.
Signed-off-by: Andrew Gallatin <gallatin@myri.com>
---
drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index b9b6dfd..82e9dcc 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -1347,17 +1347,14 @@ myri10ge_rx_done(struct myri10ge_slice_state
*ss, int len, __wsum csum)
/* Fill skb_frag_struct(s) with data from our receive */
for (i = 0, remainder = len; remainder > 0; i++) {
myri10ge_unmap_rx_page(pdev, &rx->info[idx], bytes);
- __skb_frag_set_page(&rx_frags[i], rx->info[idx].page);
- rx_frags[i].page_offset = rx->info[idx].page_offset;
- if (remainder < MYRI10GE_ALLOC_SIZE)
- skb_frag_size_set(&rx_frags[i], remainder);
- else
- skb_frag_size_set(&rx_frags[i], MYRI10GE_ALLOC_SIZE);
+ skb_fill_page_desc(skb, i, rx->info[idx].page,
+ rx->info[idx].page_offset,
+ remainder < MYRI10GE_ALLOC_SIZE ?
+ remainder : MYRI10GE_ALLOC_SIZE);
rx->cnt++;
idx = rx->cnt & rx->mask;
remainder -= MYRI10GE_ALLOC_SIZE;
}
- skb_shinfo(skb)->nr_frags = i;
/* remove padding */
rx_frags[0].page_offset += MXGEFW_PAD;
--
1.7.9.5
^ permalink raw reply related
* [PATCH net-next 2/3] myri10ge: Add vlan rx for better GRO perf.
From: Andrew Gallatin @ 2012-11-14 13:06 UTC (permalink / raw)
To: netdev
Unlike LRO, GRO requires that vlan tags be removed before
aggregation can occur. Since the myri10ge NIC does not support
hardware vlan tag offload, we must remove the tag in the driver
to achieve performance comparable to LRO for vlan tagged frames.
Signed-off-by: Andrew Gallatin <gallatin@myri.com>
---
drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 47
++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index a5ab2f2..b9b6dfd 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -1264,6 +1264,48 @@ myri10ge_unmap_rx_page(struct pci_dev *pdev,
}
}
+/*
+ * GRO does not support acceleration of tagged vlan frames, and
+ * this NIC does not support vlan tag offload, so we must pop
+ * the tag ourselves to be able to achieve GRO performance that
+ * is comparable to LRO.
+ */
+
+static inline void
+myri10ge_vlan_rx(struct net_device *dev, void *addr, struct sk_buff *skb)
+{
+ u8 *va;
+ struct vlan_ethhdr *veh;
+ struct ethhdr *eh;
+ struct skb_frag_struct *frag;
+ u16 proto;
+
+ va = addr;
+ va += MXGEFW_PAD;
+ veh = (struct vlan_ethhdr *) va;
+ if ((dev->features & (NETIF_F_HW_VLAN_RX | NETIF_F_GRO)) ==
+ (NETIF_F_HW_VLAN_RX | NETIF_F_GRO) &&
+ (veh->h_vlan_proto == ntohs(ETH_P_8021Q))) {
+ /* fixup csum if needed */
+ if (skb->ip_summed == CHECKSUM_COMPLETE)
+ skb->csum = csum_sub(skb->csum,
+ csum_partial(va + ETH_HLEN,
+ VLAN_HLEN, 0));
+ /* pop tag */
+ __vlan_hwaccel_put_tag(skb, ntohs(veh->h_vlan_TCI));
+ proto = veh->h_vlan_encapsulated_proto;
+ memmove(va + VLAN_HLEN, va, ETH_HLEN);
+ va += VLAN_HLEN;
+ eh = (struct ethhdr *)va;
+ eh->h_proto = proto;
+ skb->len -= VLAN_HLEN;
+ skb->data_len -= VLAN_HLEN;
+ frag = skb_shinfo(skb)->frags;
+ frag->page_offset += VLAN_HLEN;
+ skb_frag_size_set(frag, skb_frag_size(frag) - VLAN_HLEN);
+ }
+}
+
static inline int
myri10ge_rx_done(struct myri10ge_slice_state *ss, int len, __wsum csum)
{
@@ -1329,6 +1371,7 @@ myri10ge_rx_done(struct myri10ge_slice_state *ss,
int len, __wsum csum)
skb->ip_summed = CHECKSUM_COMPLETE;
skb->csum = csum;
}
+ myri10ge_vlan_rx(mgp->dev, va, skb);
skb_record_rx_queue(skb, ss - &mgp->ss[0]);
napi_gro_frags(&ss->napi);
@@ -3854,6 +3897,10 @@ static int myri10ge_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
netdev->netdev_ops = &myri10ge_netdev_ops;
netdev->mtu = myri10ge_initial_mtu;
netdev->hw_features = mgp->features | NETIF_F_RXCSUM;
+
+ /* fake NETIF_F_HW_VLAN_RX for good GRO performance */
+ netdev->hw_features |= NETIF_F_HW_VLAN_RX;
+
netdev->features = netdev->hw_features;
if (dac_enabled)
--
1.7.9.5
^ permalink raw reply related
* [PATCH net-next 0/3] myri10ge: LRO to GRO conversion
From: Andrew Gallatin @ 2012-11-14 13:06 UTC (permalink / raw)
To: netdev
Hi,
The following patchset converts myri10ge from using the old inet_lro
interface to GRO.
Note that a naive LRO->GRO conversion of myri10ge will result in a
performance regression for vlan tagged frames. This is because
myri10ge does not offer hardware vlan tag offload, and because GRO
requires hardware vlan tag offload to aggregate vlan tagged frames.
To address this performance regression, I have implemented vlan tag
popping in the myri10ge driver, as it seems to be the lesser of two
evils. As eric.dumazet@gmail.com commented when I asked about this on
netdev last week: "Given GRO assumes NIC does hardware vlan
offloading, I guess I would chose to do that. It seems unfortunate to
add vlan decap in GRO path, already very complex."
Andrew Gallatin (3):
myri10ge: Convert from LRO to GRO
myri10ge: Add vlan rx for better GRO perf.
myri10ge: Use skb_fill_page_desc().
drivers/net/ethernet/myricom/Kconfig | 1 -
drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 281
++++++----------------
2 files changed, 80 insertions(+), 202 deletions(-)
^ permalink raw reply
* Re: [RFC PATCH 01/13] net: Add generic packet offload infrastructure.
From: Vlad Yasevich @ 2012-11-14 13:03 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, davem
In-Reply-To: <1352859861.4497.0.camel@edumazet-glaptop>
On 11/13/2012 09:24 PM, Eric Dumazet wrote:
> On Tue, 2012-11-13 at 20:24 -0500, Vlad Yasevich wrote:
>> Create a new data structure to contain the GRO/GSO callbacks and add
>> a new registration mechanism.
>>
>> Singed-off-by: Vlad Yasevich <vyasevic@redhat.com>
>> ---
>> include/linux/netdevice.h | 15 ++++++++
>> net/core/dev.c | 80 +++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 95 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index f8eda02..d15af51 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1511,6 +1511,18 @@ struct packet_type {
>> struct list_head list;
>> };
>>
>> +struct packet_offload {
>> + __be16 type; /* This is really htons(ether_type). */
>> + struct net_device *dev; /* NULL is wildcarded here */
>
> Shouldnt this dev pointer be removed at some point in the patch serie ?
yes. i was thinking about this as well. I actually shouldn't have been
carried into this struct to begin with since its not really being used
by the offload calls.
-vlad
>
>> + struct sk_buff *(*gso_segment)(struct sk_buff *skb,
>> + netdev_features_t features);
>> + int (*gso_send_check)(struct sk_buff *skb);
>> + struct sk_buff **(*gro_receive)(struct sk_buff **head,
>> + struct sk_buff *skb);
>> + int (*gro_complete)(struct sk_buff *skb);
>> + struct list_head list;
>> +};
>> +
>> #include <linux/notifier.h>
>>
>
^ permalink raw reply
* [PATCH 2/2] ping: Don't free an unintialized value.
From: Jan Synacek @ 2012-11-14 12:57 UTC (permalink / raw)
To: yoshfuji; +Cc: netdev, Jan Synacek
In-Reply-To: <1352897836-17603-1-git-send-email-jsynacek@redhat.com>
---
ping.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/ping.c b/ping.c
index fe9ff8a..9de3d08 100644
--- a/ping.c
+++ b/ping.c
@@ -122,7 +122,7 @@ main(int argc, char **argv)
u_char *packet;
char *target;
#ifdef USE_IDN
- char *hnamebuf;
+ char *hnamebuf = NULL;
#else
char hnamebuf[MAX_HOSTNAMELEN];
#endif
@@ -263,8 +263,10 @@ main(int argc, char **argv)
#ifdef USE_IDN
int rc;
- free(hnamebuf);
- hnamebuf = NULL;
+ if (hnamebuf) {
+ free(hnamebuf);
+ hnamebuf = NULL;
+ }
rc = idna_to_ascii_lz(target, &idn, 0);
if (rc != IDNA_SUCCESS) {
--
1.7.11.7
^ permalink raw reply related
* [PATCH 1/2] ping,ping6: Add newline to error message.
From: Jan Synacek @ 2012-11-14 12:57 UTC (permalink / raw)
To: yoshfuji; +Cc: netdev, Jan Synacek
In-Reply-To: <1352897836-17603-1-git-send-email-jsynacek@redhat.com>
---
ping_common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ping_common.c b/ping_common.c
index 86dbeaa..1605ca9 100644
--- a/ping_common.c
+++ b/ping_common.c
@@ -265,7 +265,7 @@ void common_options(int ch)
char *endp;
mark = (int)strtoul(optarg, &endp, 10);
if (mark < 0 || *endp != '\0') {
- fprintf(stderr, "mark cannot be negative");
+ fprintf(stderr, "mark cannot be negative\n");
exit(2);
}
options |= F_MARK;
--
1.7.11.7
^ permalink raw reply related
* [PATCH 0/2] iputils: minor fixes
From: Jan Synacek @ 2012-11-14 12:57 UTC (permalink / raw)
To: yoshfuji; +Cc: netdev, Jan Synacek
Jan Synacek (2):
ping,ping6: Add newline to error message.
ping: Don't free an unintialized value.
ping.c | 8 +++++---
ping_common.c | 2 +-
2 files changed, 6 insertions(+), 4 deletions(-)
--
1.7.11.7
^ permalink raw reply
* [patch net] net: correct check in dev_addr_del()
From: Jiri Pirko @ 2012-11-14 12:51 UTC (permalink / raw)
To: netdev; +Cc: davem, eric.dumazet, shemminger, john.r.fastabend
Check (ha->addr == dev->dev_addr) is always true because dev_addr_init()
sets this. Correct the check to behave properly on addr removal.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
net/core/dev_addr_lists.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index 87cc17d..b079c7b 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -319,7 +319,8 @@ int dev_addr_del(struct net_device *dev, const unsigned char *addr,
*/
ha = list_first_entry(&dev->dev_addrs.list,
struct netdev_hw_addr, list);
- if (ha->addr == dev->dev_addr && ha->refcount == 1)
+ if (!memcmp(ha->addr, addr, dev->addr_len) &&
+ ha->type == addr_type && ha->refcount == 1)
return -ENOENT;
err = __hw_addr_del(&dev->dev_addrs, addr, dev->addr_len,
--
1.8.0
^ permalink raw reply related
* RE: [PATCH v2 3/3] ipgre: capture inner headers during encapsulation
From: Dmitry Kravkov @ 2012-11-14 12:32 UTC (permalink / raw)
To: Joseph Gasparakis, davem@davemloft.net, shemminger@vyatta.com,
chrisw@sous-sol.org
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Peter P Waskiewicz Jr
In-Reply-To: <1352689450-4092-1-git-send-email-joseph.gasparakis@intel.com>
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Joseph Gasparakis
> Sent: Monday, November 12, 2012 5:04 AM
> To: davem@davemloft.net; shemminger@vyatta.com; chrisw@sous-sol.org
> Cc: Joseph Gasparakis; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> Peter P Waskiewicz Jr
> Subject: [PATCH v2 3/3] ipgre: capture inner headers during encapsulation
>
> Populating the inner header pointers of skb for ipgre
> This patch has been compile-tested only.
>
> v2 Makes sure that checksumming does not take place if the offload flag is set in
> the skb's netdev features
>
> Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> ---
> net/ipv4/ip_gre.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 7240f8e..e35ed52 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -766,8 +766,10 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff
> *skb, struct net_device *dev
> int gre_hlen;
> __be32 dst;
> int mtu;
> + unsigned int offset;
>
> - if (skb->ip_summed == CHECKSUM_PARTIAL &&
> + if (!(skb->dev->features & NETIF_F_HW_CSUM_ENC_BIT) &&
You should test for NETIF_F_HW_CSUM_ENC and not for NETIF_F_HW_CSUM_ENC_BIT
^ permalink raw reply
* RE: [PATCH 1/3] net: Add support for hardware-offloaded encapsulation
From: Dmitry Kravkov @ 2012-11-14 12:23 UTC (permalink / raw)
To: Joseph Gasparakis, davem@davemloft.net, shemminger@vyatta.com,
chrisw@sous-sol.org
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Peter P Waskiewicz Jr
In-Reply-To: <1352427483-4380-2-git-send-email-joseph.gasparakis@intel.com>
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Joseph Gasparakis
> Sent: Friday, November 09, 2012 4:18 AM
> To: davem@davemloft.net; shemminger@vyatta.com; chrisw@sous-sol.org
> Cc: Joseph Gasparakis; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> Peter P Waskiewicz Jr
> Subject: [PATCH 1/3] net: Add support for hardware-offloaded encapsulation
>
> This patch adds support in the kernel for offloading in the NIC Tx and Rx
> checksumming for encapsulated packets (such as VXLAN and IP GRE)
>
> Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
[D.K.]
> NETIF_F_HW_CSUM_BIT, /* Can checksum all the packets.
> */
> + NETIF_F_HW_CSUM_ENC_BIT, /* Can checksum all inner headers */
> NETIF_F_IPV6_CSUM_BIT, /* Can checksum TCP/UDP over
> IPV6 */
Also #define NETIF_F_HW_CSUM_ENC should be added
^ permalink raw reply
* [PATCH] net/smsc911x: Fix ready check in cases where WORD_SWAP is needed
From: Kamlakant Patel @ 2012-11-14 11:41 UTC (permalink / raw)
To: netdev; +Cc: steve, davem, linus.walleij, robert.marklund, Kamlakant Patel
The chip ready check added by the commit 3ac3546e [Always wait for
the chip to be ready] does not work when the register read/write
is word swapped. This check has been added before the WORD_SWAP
register is programmed, so we need to check for swapped register
value as well.
Bit 16 is marked as RESERVED in SMSC datasheet, Steve Glendinning
<steve@shawell.net> checked with SMSC and wrote:
The chip architects have concluded we should be reading PMT_CTRL
until we see any of bits 0, 8, 16 or 24 set. Then we should read
BYTE_TEST to check the byte order is correct (as we already do).
The rationale behind this is that some of the chip variants have
word order swapping features too, so the READY bit could actually
be in any of the 4 possible locations. The architects have confirmed
that if any of these 4 positions is set the chip is ready. The other
3 locations will either never be set or can only go high after READY
does (so also indicate the device is ready).
This change will check for the READY bit at the 16th position. We do
not check the other two cases (bit 8 and 24) since the driver does not
support byte-swapped register read/write.
Signed-off-by: Kamlakant Patel <kamlakant.patel@broadcom.com>
---
drivers/net/ethernet/smsc/smsc911x.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index 62d1baf..c53c0f4 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -2110,7 +2110,7 @@ static void __devinit smsc911x_read_mac_address(struct net_device *dev)
static int __devinit smsc911x_init(struct net_device *dev)
{
struct smsc911x_data *pdata = netdev_priv(dev);
- unsigned int byte_test;
+ unsigned int byte_test, mask;
unsigned int to = 100;
SMSC_TRACE(pdata, probe, "Driver Parameters:");
@@ -2130,9 +2130,22 @@ static int __devinit smsc911x_init(struct net_device *dev)
/*
* poll the READY bit in PMT_CTRL. Any other access to the device is
* forbidden while this bit isn't set. Try for 100ms
+ *
+ * Note that this test is done before the WORD_SWAP register is
+ * programmed. So in some configurations the READY bit is at 16 before
+ * WORD_SWAP is written to. This issue is worked around by waiting
+ * until either bit 0 or bit 16 gets set in PMT_CTRL.
+ *
+ * SMSC has confirmed that checking bit 16 (marked as reserved in
+ * the datasheet) is fine since these bits "will either never be set
+ * or can only go high after READY does (so also indicate the device
+ * is ready)".
*/
- while (!(smsc911x_reg_read(pdata, PMT_CTRL) & PMT_CTRL_READY_) && --to)
+
+ mask = PMT_CTRL_READY_ | swahw32(PMT_CTRL_READY_);
+ while (!(smsc911x_reg_read(pdata, PMT_CTRL) & mask) && --to)
udelay(1000);
+
if (to == 0) {
pr_err("Device not READY in 100ms aborting\n");
return -ENODEV;
--
1.7.6
^ permalink raw reply related
* Re: [PATCH net-next] add DOVE extensions for VXLAN
From: David Stevens @ 2012-11-14 10:02 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20121113144148.66a86bc7@nehalam.linuxnetplumber.net>
Stephen Hemminger <shemminger@vyatta.com> wrote on 11/13/2012 05:41:48 PM:
> > Someone who doesn't want all of them shouldn't use this
feature.
> > If we're dropping the "dove" flag in favor of individual flags for
each
> > feature, then I could make this into "l2miss" and "l3miss" flags and
> > they should default off, of course.
> >
> > +-DLS
>
> Maybe a OVS style "here is the homeless packet" message is needed.
> That would allow for controller in user space to populate table
> on as needed basis.
The netlink notifications are sending just the link information
and one of a MAC address or an IP address, depending on the sort of
miss we have. Sending the whole packet to user space would add up
to 64K of user packet data that is completely useless to us.
Rate-limiting based on destination means keeping state of
some sort, which means an attacker can present you with new state
to keep on every packet (like sequentially sending to all possible
Internet addresses).
I think a DoS attack from a hosted VM is no different than
any other, unintentional, disproportionate use of resources by a VM
and is ultimately a matter for the host admin.
The netlink notifications include minimal information for
filling the forwarding table on demand and even the ordinary VXLAN
way of sending all the user data via multicast will result in
replication of packet data (up to 64K) to send to multiple destinations,
all
but one of which will ultimately drop it in the unicast dst case. This,
until a valid unicast forwarding table entry is in place.
I think, by comparison, a (smaller) netlink message locally
on every miss is fairly lightweight. We could keep state of what
destinations we've done notifications on, but then we have to age
these and this does nothing for DoS attacks which only have to send to
lots of different destinations to overflow our state table.
The code I sent only turns on notifications at all when the
"dove" flag is set and we can split this into separate flags to allow
finer control. But I don't think sending multiple copies of whole
packets remotely or a single copy of the whole packet locally is
better than sending just the address info locally for unknown
destinations.
+-DLS
^ permalink raw reply
* Re: [REGRESSION] r8169: jumbo fixes caused jumbo regressions!
From: Kirill Smelkov @ 2012-11-14 9:25 UTC (permalink / raw)
To: Francois Romieu
Cc: Realtek linux nic maintainers, Hayes Wang, David S. Miller,
Greg Kroah-Hartman, netdev
In-Reply-To: <20121113223512.GA29342@electric-eye.fr.zoreil.com>
On Tue, Nov 13, 2012 at 11:35:12PM +0100, Francois Romieu wrote:
> Kirill Smelkov <kirr@mns.spb.ru> :
> [...]
> > My test is to stream raw video from 8 PAL cameras to net - 4 for 720x576@25 and
> > 4 for 360x288@25 which for YUYV format occupies ~ 860 Mbps of bandwidth. The
> > program to transmit/receive video is here: http://repo.or.cz/w/rawv.git
>
> $ git clone http://repo.or.cz/w/rawv.git
> Cloning into 'rawv'...
> fatal: http://repo.or.cz/w/rawv.git/info/refs not valid: is this a git repository?
That's a gitweb view. The actual git repo is here:
git://repo.or.cz/rawv.git
sorry for confusion.
( if you'll want to test with vivi on a slow system, you'll need my
patches, currently staging in media-tree patchwork, or available here:
git://repo.or.cz/linux-2.6/kirr.git vivi-speedup-and-fps.over-net-next )
> [...]
> > (by the way, on atom system, without tx csum offload, half of cpu time
> > is spent only to calculate checksums...)
>
> :o(
yes, that large. In top, my workload is
%sy %id %si
default driver load ~25 ~45 ~27
(ethtool -k shows
tx-checksumming: off)
after ~8 ~81 ~11
`ethtool -K eth0 tx on`
that's why the issue is important.
> > Now I wonder, where that 6K limit came from and why they say it is now
> > not possible to use jumbos together with tx csum offload ?
>
> Here is an excerpt from a mail where Hayes explained the rules of
> engagement back in may 2011 (John Lumby and Chris Friesen were Cced then):
Can't find that mail in gmane netdev archive and on google, to restore
full context. Was that in private?
> ! The Max tx sizes for 8168 series are as following:
> !
> ! 8168B is 4K bytes.
> ! 8168C and 8168CP are 6K bytes.
> ! 8168D and later are 9K bytes.
> !
> ! Note that these sizes all include head size. That is, the mtu must less than
> ! these values.
> ! You have to enable Jumbo frame feature when the tx size is large, otherwise the
> ! packet would not be sent. Because the hw doesn't provide the threshold, the
> ! checking for MTU > 1500 is just for convenience for sw.
This part is clear.
> ! The TSO couldn't work with some feature which need to disable hw checksum, such
> ! as Jumbo frame. The hw checksum have to be disabled in certain situations, so
> ! the TSO feature should be checked in these situations, too.
I don't enable TSO nor I need it. The text indirectly says that hw
checksum should be disabled when jumbo frames are used.
~~~~
Hayes, Realtek linux nic maintainers,
could you please confirm that for all 8168C and 8168CP jumbo_max is
6K and that when jumbos are used, tx checksumming should be off?
If so, how come my two chips work stable with ~7K jumbos and tx checksum
offload on (tested this night again for ~16 hours without any problem).
thanks beforehand.
> > Is my testing enough to justify raising the limits and allowing tx offload ?
>
> I don't oppose knobs to go off-limits but I'll need some rather good reason
> before changing the manufacturer's suggested defaults.
Thanks. Let's see what Realtek people say.
Kirill
^ permalink raw reply
* Re: [PATCH v5 3/9] net: xfrm: use __this_cpu_read per-cpu helper
From: Steffen Klassert @ 2012-11-14 8:43 UTC (permalink / raw)
To: Shan Wei
Cc: Christoph Lameter, David Miller, NetDev, Herbert Xu,
Kernel-Maillist
In-Reply-To: <50A23EB0.1060808@gmail.com>
On Tue, Nov 13, 2012 at 08:36:00PM +0800, Shan Wei wrote:
> Steffen Klassert said, at 2012/11/13 18:48:
> >
> > Ok, so please add a commit message to describe your changes.
> >
> > Thanks.
> >
>
> [PATCH v5] net: xfrm: use __this_cpu_read per-cpu helper
>
> this_cpu_ptr/this_cpu_read is faster than per_cpu_ptr(p, smp_processor_id())
> and can reduce memory accesses.
> The latter helper needs to find the offset for current cpu,
> and needs more assembler instructions which objdump shows in following.
>
> this_cpu_ptr relocates and address. this_cpu_read() relocates the address
> and performs the fetch. this_cpu_read() saves you more instructions
> since it can do the relocation and the fetch in one instruction.
>
> per_cpu_ptr(p, smp_processor_id()):
> 1e: 65 8b 04 25 00 00 00 00 mov %gs:0x0,%eax
> 26: 48 98 cltq
> 28: 31 f6 xor %esi,%esi
> 2a: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
> 31: 48 8b 04 c5 00 00 00 00 mov 0x0(,%rax,8),%rax
> 39: c7 44 10 04 14 00 00 00 movl $0x14,0x4(%rax,%rdx,1)
>
> this_cpu_ptr(p)
> 1e: 65 48 03 14 25 00 00 00 00 add %gs:0x0,%rdx
> 27: 31 f6 xor %esi,%esi
> 29: c7 42 04 14 00 00 00 movl $0x14,0x4(%rdx)
> 30: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
>
> Signed-off-by: Shan Wei <davidshan@tencent.com>
Applied to ipsec-next, thanks!
^ permalink raw reply
* IPv6 bonding support?
From: Roy Sigurd Karlsbakk @ 2012-11-14 8:26 UTC (permalink / raw)
To: netdev
Hi all
I've setup bonding on a few machines here. They're all blades in a Dell blade chassis, so they're both connected to the four internal switches (2 gigE, 2 10GigE). Using 10GigE only, I've setup bonding in active-backup mode with the ARP driver. While this works well, we're planning ahead for running IPv6 only.
Does anyone kow if something's in the works to allow redundant links in an IPv6-only setup?
Please cc: to me as I'm not on the list
Vennlige hilsener / Best regards
roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av idiomer med xenotyp etymologi. I de fleste tilfeller eksisterer adekvate og relevante synonymer på norsk.
^ permalink raw reply
* Re: [RFC PATCH 03/13] net: Add net protocol offload registration infrustructure
From: Nicolas Dichtel @ 2012-11-14 8:22 UTC (permalink / raw)
To: Vlad Yasevich; +Cc: netdev, eric.dumazet, davem
In-Reply-To: <1352856254-29667-4-git-send-email-vyasevic@redhat.com>
Le 14/11/2012 02:24, Vlad Yasevich a écrit :
> Create a new data structure for IPv4 protocols that holds GRO/GSO
> callbacks and a new array to track the protocols that register GRO/GSO.
>
> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
> ---
> include/net/protocol.h | 12 ++++++++++++
> net/ipv4/af_inet.c | 12 ++++++++++++
> net/ipv4/protocol.c | 22 +++++++++++++++++++++-
> 3 files changed, 45 insertions(+), 1 deletions(-)
>
> diff --git a/include/net/protocol.h b/include/net/protocol.h
> index 929528c..d8ecb17 100644
> --- a/include/net/protocol.h
> +++ b/include/net/protocol.h
> @@ -77,6 +77,15 @@ struct inet6_protocol {
> #define INET6_PROTO_GSO_EXTHDR 0x4
> #endif
>
> +struct net_offload {
> + int (*gso_send_check)(struct sk_buff *skb);
> + struct sk_buff *(*gso_segment)(struct sk_buff *skb,
> + netdev_features_t features);
> + struct sk_buff **(*gro_receive)(struct sk_buff **head,
> + struct sk_buff *skb);
> + int (*gro_complete)(struct sk_buff *skb);
> +};
> +
> /* This is used to register socket interfaces for IP protocols. */
> struct inet_protosw {
> struct list_head list;
> @@ -96,6 +105,7 @@ struct inet_protosw {
> #define INET_PROTOSW_ICSK 0x04 /* Is this an inet_connection_sock? */
>
> extern const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS];
> +extern const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS];
>
> #if IS_ENABLED(CONFIG_IPV6)
> extern const struct inet6_protocol __rcu *inet6_protos[MAX_INET_PROTOS];
> @@ -103,6 +113,8 @@ extern const struct inet6_protocol __rcu *inet6_protos[MAX_INET_PROTOS];
>
> extern int inet_add_protocol(const struct net_protocol *prot, unsigned char num);
> extern int inet_del_protocol(const struct net_protocol *prot, unsigned char num);
> +extern int inet_add_offload(const struct net_offload *prot, unsigned char num);
> +extern int inet_del_offload(const struct net_offload *prot, unsigned char num);
> extern void inet_register_protosw(struct inet_protosw *p);
> extern void inet_unregister_protosw(struct inet_protosw *p);
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 4c99c5f..3918d86 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1566,6 +1566,13 @@ static const struct net_protocol tcp_protocol = {
> .netns_ok = 1,
> };
>
> +static const struct net_offload tcp_offload = {
> + .gso_send_check = tcp_v4_gso_send_check,
> + .gso_segment = tcp_tso_segment,
> + .gro_receive = tcp4_gro_receive,
> + .gro_complete = tcp4_gro_complete,
> +};
> +
> static const struct net_protocol udp_protocol = {
> .handler = udp_rcv,
> .err_handler = udp_err,
> @@ -1575,6 +1582,11 @@ static const struct net_protocol udp_protocol = {
> .netns_ok = 1,
> };
>
> +static const struct net_offload udp_offload = {
> + .gso_send_check = udp4_ufo_send_check,
> + .gso_segment = udp4_ufo_fragment,
> +};
> +
> static const struct net_protocol icmp_protocol = {
> .handler = icmp_rcv,
> .err_handler = ping_err,
> diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
> index 8918eff..1278db8 100644
> --- a/net/ipv4/protocol.c
> +++ b/net/ipv4/protocol.c
> @@ -29,6 +29,7 @@
> #include <net/protocol.h>
>
> const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
> +const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly;
>
> /*
> * Add a protocol handler to the hash tables
> @@ -41,6 +42,13 @@ int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol)
> }
> EXPORT_SYMBOL(inet_add_protocol);
>
> +int inet_add_offload(const struct net_offload *prot, unsigned char protocol)
> +{
> + return !cmpxchg((const struct net_offload **)&inet_offloads[protocol],
> + NULL, prot) ? 0 : -1;
> +}
> +EXPORT_SYMBOL(inet_add_offload);
> +
> /*
> * Remove a protocol from the hash tables.
> */
> @@ -56,4 +64,16 @@ int inet_del_protocol(const struct net_protocol *prot, unsigned char protocol)
>
> return ret;
> }
> -EXPORT_SYMBOL(inet_del_protocol);
This line should probably not be removed ;-)
^ permalink raw reply
* Re: [v6 PATCH 1/8] cxgb4/cxgb4vf: Chelsio FCoE offload driver submission (common header updates).
From: James Bottomley @ 2012-11-14 2:26 UTC (permalink / raw)
To: Naresh Kumar Inna; +Cc: linux-scsi, dm, leedom, netdev, chethan
In-Reply-To: <1351158621-32222-2-git-send-email-naresh@chelsio.com>
On Thu, 2012-10-25 at 15:20 +0530, Naresh Kumar Inna wrote:
> This patch contains updates to firmware/hardware header files shared
> between csiostor and cxgb4/cxgb4vf, and the resulting changes to the
> cxgb4/cxgb4vf source files.
>
> Signed-off-by: Naresh Kumar Inna <naresh@chelsio.com>
> ---
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 8 ++--
> drivers/net/ethernet/chelsio/cxgb4/sge.c | 6 ++--
> drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 20 +++++-----
> drivers/net/ethernet/chelsio/cxgb4/t4_msg.h | 1 +
> drivers/net/ethernet/chelsio/cxgb4/t4_regs.h | 36 +++++++++++++++++++-
> drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h | 41 ++++++++++++++++++++---
> drivers/net/ethernet/chelsio/cxgb4vf/sge.c | 8 ++--
> 7 files changed, 92 insertions(+), 28 deletions(-)
This still doesn't compile:
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: In function ‘adap_init0’:
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:3932:10: error: ‘struct fw_caps_config_cmd’ has no member named ‘retval_len16’
make[5]: *** [drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.o] Error 1
James
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox