* LRO/GSO interaction when packets are forwarded
@ 2008-03-07 14:09 Kieran Mansley
2008-03-07 16:25 ` Stephen Hemminger
0 siblings, 1 reply; 21+ messages in thread
From: Kieran Mansley @ 2008-03-07 14:09 UTC (permalink / raw)
To: netdev
We've seen a couple of problems when using a bridge or IP forwarding
combined with LRO packets generated by a network device driver. As you
know, LRO packets can be either be page based (and passed up with
lro_receive_page()) or use the skb frag_list (and passed up with
lro_receive_skb()). In both cases it is likely that the device driver
will have set CHECKSUM_UNNECESSARY to indicate that the packet has been
checksummed by the device, and gso_size to mark it as an LRO packet and
indicate the actual received MSS.
If this skb goes directly to the network stack everything is fine. The
problem comes when this packet instead goes into a bridge and is then
retransmitted on another device. The skb seems to pass through the
bridge relatively unmodified and because it has gso_size set the
transmit path will attempt to segment it. If page-based allocation has
been used, this is fine, but if the skb frag_list has been used the
transmit path BUGs in skb_gso_segment():
http://lxr.linux.no/linux+v2.6.24.3/net/core/dev.c#L1410
Secondly, the same function hopes that a GSO packet will have
CHECKSUM_PARTIAL set - if this packet had originated from a stack rather
than from an LRO device this would be the case - but instead it will
most likely have CHECKSUM_UNNECESSARY.
Both of these problems are essentially being caused by gso_size and the
ip_summed field have slightly different meanings on the receive and
transmit paths, and the bridge/IP forwarding stuff not translating from
one to the other. To be fair to the bridge, it would not be obvious to
it that it will be passing the packet to a real device (that will invoke
the transmit path) or to a stack.
This leads me to my questions:
- any idea why other drivers aren't hitting this problem? One
possibility is that they're using lro_receive_page rather then
lro_receive_skb, but I'd still expect to see the CHECKSUM_PARTIAL
warning. I'm wondering if having LRO and forwarding between devices is
a relatively rare thing, and so it just hasn't been tested.
- any suggestion as to the best place to try and fix this up? My
preference is making the transmit path cope with a packet that has the
frag_list in use. Making it cope with CHECKSUM_UNNECESSARY should also
be possible but to be honest I'm finding skb_gso_segment's handling of
CHECKSUM_PARTIAL a bit hard to follow. The alternative would be I
suppose to get the bridge and IP forwarding code to fix the socket
buffer up before transmitting it, or for the driver to somehow know that
it this packet will be forwarded and so it shouldn't use LRO.
Of course, if we're hitting this because we're doing something wrong and
you're confident it's not a problem in Linux, I'd be grateful to know!
Here's a stack trace showing the path a packet that hits this might
take:
[<c0106831>] die+0x111/0x210
[<c0106d67>] do_trap+0x97/0xf0
[<c0107149>] do_invalid_op+0x89/0xa0
[<c033c2fa>] error_code+0x72/0x78
[<c02d41de>] dev_hard_start_xmit+0x1ae/0x2c0
[<c02e276f>] __qdisc_run+0x4f/0x1d0
[<c02d45c1>] dev_queue_xmit+0x2d1/0x350
[<f8ae4054>] br_dev_queue_push_xmit+0x64/0xb0 [bridge]
[<f8ae8bd3>] br_nf_dev_queue_xmit+0x13/0x40 [bridge]
[<f8ae90b0>] br_nf_post_routing+0x1b0/0x1f0 [bridge]
[<c02e724b>] nf_iterate+0x5b/0x90
[<c02e72ca>] nf_hook_slow+0x4a/0xc0
[<f8ae41b6>] br_forward_finish+0x46/0x60 [bridge]
[<f8ae9317>] br_nf_forward_finish+0xc7/0x160 [bridge]
[<f8ae98e7>] br_nf_forward_ip+0x137/0x1b0 [bridge]
[<c02e724b>] nf_iterate+0x5b/0x90
[<c02e72ca>] nf_hook_slow+0x4a/0xc0
[<f8ae4225>] __br_forward+0x55/0x80 [bridge]
[<f8ae4307>] br_forward+0x27/0x30 [bridge]
[<f8ae4cfd>] br_handle_frame_finish+0xed/0x150 [bridge]
[<f8ae960e>] br_nf_pre_routing_finish+0x1be/0x360 [bridge]
[<f8ae9f15>] br_nf_pre_routing+0x425/0x6e0 [bridge]
[<c02e724b>] nf_iterate+0x5b/0x90
[<c02e72ca>] nf_hook_slow+0x4a/0xc0
[<f8ae4ecb>] br_handle_frame+0x16b/0x210 [bridge]
[<c02d4856>] netif_receive_skb+0x216/0x310
[<c02d49b6>] process_backlog+0x66/0xd0
[<c02d0c72>] net_rx_action+0xd2/0x170
[<c0131f72>] __do_softirq+0x82/0x100
[<c0107f11>] do_softirq+0x71/0xc0
skb_gso_segment is called from dev_gso_segment, which is called from
dev_hard_start_xmit, which is shown in the stack trace.
Thanks
Kieran
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: LRO/GSO interaction when packets are forwarded
2008-03-07 14:09 LRO/GSO interaction when packets are forwarded Kieran Mansley
@ 2008-03-07 16:25 ` Stephen Hemminger
2008-03-07 17:06 ` Kieran Mansley
0 siblings, 1 reply; 21+ messages in thread
From: Stephen Hemminger @ 2008-03-07 16:25 UTC (permalink / raw)
To: Kieran Mansley; +Cc: netdev
On Fri, 07 Mar 2008 14:09:57 +0000
Kieran Mansley <kmansley@solarflare.com> wrote:
> We've seen a couple of problems when using a bridge or IP forwarding
> combined with LRO packets generated by a network device driver. As you
> know, LRO packets can be either be page based (and passed up with
> lro_receive_page()) or use the skb frag_list (and passed up with
> lro_receive_skb()). In both cases it is likely that the device driver
> will have set CHECKSUM_UNNECESSARY to indicate that the packet has been
> checksummed by the device, and gso_size to mark it as an LRO packet and
> indicate the actual received MSS.
First off, no hardware should ever do LRO on non-local packets. If the
hardware isn't smart enough to do this, I guess the bridge code to have
an API to turn it off. IP should also turn it off if ip_forwarding
is enabled on that device.
> If this skb goes directly to the network stack everything is fine. The
> problem comes when this packet instead goes into a bridge and is then
> retransmitted on another device. The skb seems to pass through the
> bridge relatively unmodified and because it has gso_size set the
> transmit path will attempt to segment it. If page-based allocation has
> been used, this is fine, but if the skb frag_list has been used the
> transmit path BUGs in skb_gso_segment():
You can't do LRO with bridging, it is that simple, it is a protocol
layering violation.
> http://lxr.linux.no/linux+v2.6.24.3/net/core/dev.c#L1410
>
> Secondly, the same function hopes that a GSO packet will have
> CHECKSUM_PARTIAL set - if this packet had originated from a stack rather
> than from an LRO device this would be the case - but instead it will
> most likely have CHECKSUM_UNNECESSARY.
>
> Both of these problems are essentially being caused by gso_size and the
> ip_summed field have slightly different meanings on the receive and
> transmit paths, and the bridge/IP forwarding stuff not translating from
> one to the other. To be fair to the bridge, it would not be obvious to
> it that it will be passing the packet to a real device (that will invoke
> the transmit path) or to a stack.
>
> This leads me to my questions:
>
> - any idea why other drivers aren't hitting this problem? One
> possibility is that they're using lro_receive_page rather then
> lro_receive_skb, but I'd still expect to see the CHECKSUM_PARTIAL
> warning. I'm wondering if having LRO and forwarding between devices is
> a relatively rare thing, and so it just hasn't been tested.
> - any suggestion as to the best place to try and fix this up? My
> preference is making the transmit path cope with a packet that has the
> frag_list in use. Making it cope with CHECKSUM_UNNECESSARY should also
> be possible but to be honest I'm finding skb_gso_segment's handling of
> CHECKSUM_PARTIAL a bit hard to follow. The alternative would be I
> suppose to get the bridge and IP forwarding code to fix the socket
> buffer up before transmitting it, or for the driver to somehow know that
> it this packet will be forwarded and so it shouldn't use LRO.
In br_add_if, it should have a way to tell the device to turn LRO off.
dev_change_flags?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: LRO/GSO interaction when packets are forwarded
2008-03-07 16:25 ` Stephen Hemminger
@ 2008-03-07 17:06 ` Kieran Mansley
2008-03-07 21:43 ` [PATCH] ethtool: command line support for lro Stephen Hemminger
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Kieran Mansley @ 2008-03-07 17:06 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
On Fri, 2008-03-07 at 08:25 -0800, Stephen Hemminger wrote:
> On Fri, 07 Mar 2008 14:09:57 +0000
> Kieran Mansley <kmansley@solarflare.com> wrote:
>
> > We've seen a couple of problems when using a bridge or IP forwarding
> > combined with LRO packets generated by a network device driver. As you
> > know, LRO packets can be either be page based (and passed up with
> > lro_receive_page()) or use the skb frag_list (and passed up with
> > lro_receive_skb()). In both cases it is likely that the device driver
> > will have set CHECKSUM_UNNECESSARY to indicate that the packet has been
> > checksummed by the device, and gso_size to mark it as an LRO packet and
> > indicate the actual received MSS.
>
> First off, no hardware should ever do LRO on non-local packets. If the
> hardware isn't smart enough to do this, I guess the bridge code to have
> an API to turn it off. IP should also turn it off if ip_forwarding
> is enabled on that device.
If the only way to deal with this is to prevent LRO in those cases,
having an API to turn if off would clearly be helpful: working out in
the hardware or driver which packets can be forwarded or not is hard,
and would probably be a layering violation in itself.
However, if there were a way in which it could be made to work, that
would in many ways be preferable.
> > If this skb goes directly to the network stack everything is fine. The
> > problem comes when this packet instead goes into a bridge and is then
> > retransmitted on another device. The skb seems to pass through the
> > bridge relatively unmodified and because it has gso_size set the
> > transmit path will attempt to segment it. If page-based allocation has
> > been used, this is fine, but if the skb frag_list has been used the
> > transmit path BUGs in skb_gso_segment():
>
> You can't do LRO with bridging, it is that simple, it is a protocol
> layering violation.
Agreed. LRO is a layering violation with or without bridging. If the
consensus is that LRO with bridging is not viable, then that's
straightforward for us, but there are a lot of people who will want this
for performance. Bridging and virtualisation are common partners, and
virtualisation and high performance networking are also rather popular
at the moment.
Given that there seem to be a small number of places where this is
currently causing problems (LRO through a bridge to a stack is fine,
it's only if the packet is forwarded that things go wrong), and that an
LRO packet should in most cases be very easy to segment (it's already in
network-sized fragments) I wonder if making it work would be
easier/better than preventing it from happening.
Kieran
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] ethtool: command line support for lro
2008-03-07 17:06 ` Kieran Mansley
@ 2008-03-07 21:43 ` Stephen Hemminger
2008-03-10 18:07 ` Ben Hutchings
2008-03-11 16:50 ` LRO/GSO interaction when packets are forwarded Kieran Mansley
2008-04-22 21:15 ` Ben Hutchings
2 siblings, 1 reply; 21+ messages in thread
From: Stephen Hemminger @ 2008-03-07 21:43 UTC (permalink / raw)
To: Kieran Mansley, Jeff Garzik; +Cc: netdev
Add lro support to command in similar manner to TSO, GSO, etc.
The file ethtool-copy.h is updated to be sanitised version of
ethtool.h from 2.6.25-rc4 (ie make headers_install)
Not tested on actual LRO hardware.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
ethtool-copy.h | 134 +++++++------------------------------------------------
ethtool.c | 51 +++++++++++++++++++---
2 files changed, 62 insertions(+), 123 deletions(-)
diff --git a/ethtool-copy.h b/ethtool-copy.h
index 3a63224..87c7c9a 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -39,7 +39,8 @@ struct ethtool_drvinfo {
char bus_info[ETHTOOL_BUSINFO_LEN]; /* Bus info for this IF. */
/* For PCI devices, use pci_name(pci_dev). */
char reserved1[32];
- char reserved2[16];
+ char reserved2[12];
+ __u32 n_priv_flags; /* number of flags valid in ETHTOOL_GPFLAGS */
__u32 n_stats; /* number of u64's from ETHTOOL_GSTATS */
__u32 testinfo_len;
__u32 eedump_len; /* Size of data from ETHTOOL_GEEPROM (bytes) */
@@ -219,6 +220,7 @@ struct ethtool_pauseparam {
enum ethtool_stringset {
ETH_SS_TEST = 0,
ETH_SS_STATS,
+ ETH_SS_PRIV_FLAGS,
};
/* for passing string sets for data tagging */
@@ -256,125 +258,19 @@ struct ethtool_perm_addr {
__u8 data[0];
};
-#ifdef __KERNEL__
-
-struct net_device;
-
-/* Some generic methods drivers may use in their ethtool_ops */
-u32 ethtool_op_get_link(struct net_device *dev);
-u32 ethtool_op_get_tx_csum(struct net_device *dev);
-int ethtool_op_set_tx_csum(struct net_device *dev, u32 data);
-int ethtool_op_set_tx_hw_csum(struct net_device *dev, u32 data);
-int ethtool_op_set_tx_ipv6_csum(struct net_device *dev, u32 data);
-u32 ethtool_op_get_sg(struct net_device *dev);
-int ethtool_op_set_sg(struct net_device *dev, u32 data);
-u32 ethtool_op_get_tso(struct net_device *dev);
-int ethtool_op_set_tso(struct net_device *dev, u32 data);
-int ethtool_op_get_perm_addr(struct net_device *dev,
- struct ethtool_perm_addr *addr, u8 *data);
-u32 ethtool_op_get_ufo(struct net_device *dev);
-int ethtool_op_set_ufo(struct net_device *dev, u32 data);
-
-/**
- * ðtool_ops - Alter and report network device settings
- * get_settings: Get device-specific settings
- * set_settings: Set device-specific settings
- * get_drvinfo: Report driver information
- * get_regs: Get device registers
- * get_wol: Report whether Wake-on-Lan is enabled
- * set_wol: Turn Wake-on-Lan on or off
- * get_msglevel: Report driver message level
- * set_msglevel: Set driver message level
- * nway_reset: Restart autonegotiation
- * get_link: Get link status
- * get_eeprom: Read data from the device EEPROM
- * set_eeprom: Write data to the device EEPROM
- * get_coalesce: Get interrupt coalescing parameters
- * set_coalesce: Set interrupt coalescing parameters
- * get_ringparam: Report ring sizes
- * set_ringparam: Set ring sizes
- * get_pauseparam: Report pause parameters
- * set_pauseparam: Set pause paramters
- * get_rx_csum: Report whether receive checksums are turned on or off
- * set_rx_csum: Turn receive checksum on or off
- * get_tx_csum: Report whether transmit checksums are turned on or off
- * set_tx_csum: Turn transmit checksums on or off
- * get_sg: Report whether scatter-gather is enabled
- * set_sg: Turn scatter-gather on or off
- * get_tso: Report whether TCP segmentation offload is enabled
- * set_tso: Turn TCP segmentation offload on or off
- * get_ufo: Report whether UDP fragmentation offload is enabled
- * set_ufo: Turn UDP fragmentation offload on or off
- * self_test: Run specified self-tests
- * get_strings: Return a set of strings that describe the requested objects
- * phys_id: Identify the device
- * get_stats: Return statistics about the device
- * get_perm_addr: Gets the permanent hardware address
- *
- * Description:
+/* boolean flags controlling per-interface behavior characteristics.
+ * When reading, the flag indicates whether or not a certain behavior
+ * is enabled/present. When writing, the flag indicates whether
+ * or not the driver should turn on (set) or off (clear) a behavior.
*
- * get_settings:
- * @get_settings is passed an ðtool_cmd to fill in. It returns
- * an negative errno or zero.
- *
- * set_settings:
- * @set_settings is passed an ðtool_cmd and should attempt to set
- * all the settings this device supports. It may return an error value
- * if something goes wrong (otherwise 0).
- *
- * get_eeprom:
- * Should fill in the magic field. Don't need to check len for zero
- * or wraparound. Fill in the data argument with the eeprom values
- * from offset to offset + len. Update len to the amount read.
- * Returns an error or zero.
- *
- * set_eeprom:
- * Should validate the magic field. Don't need to check len for zero
- * or wraparound. Update len to the amount written. Returns an error
- * or zero.
+ * Some behaviors may read-only (unconditionally absent or present).
+ * If such is the case, return EINVAL in the set-flags operation if the
+ * flag differs from the read-only value.
*/
-struct ethtool_ops {
- int (*get_settings)(struct net_device *, struct ethtool_cmd *);
- int (*set_settings)(struct net_device *, struct ethtool_cmd *);
- void (*get_drvinfo)(struct net_device *, struct ethtool_drvinfo *);
- int (*get_regs_len)(struct net_device *);
- void (*get_regs)(struct net_device *, struct ethtool_regs *, void *);
- void (*get_wol)(struct net_device *, struct ethtool_wolinfo *);
- int (*set_wol)(struct net_device *, struct ethtool_wolinfo *);
- u32 (*get_msglevel)(struct net_device *);
- void (*set_msglevel)(struct net_device *, u32);
- int (*nway_reset)(struct net_device *);
- u32 (*get_link)(struct net_device *);
- int (*get_eeprom_len)(struct net_device *);
- int (*get_eeprom)(struct net_device *, struct ethtool_eeprom *, u8 *);
- int (*set_eeprom)(struct net_device *, struct ethtool_eeprom *, u8 *);
- int (*get_coalesce)(struct net_device *, struct ethtool_coalesce *);
- int (*set_coalesce)(struct net_device *, struct ethtool_coalesce *);
- void (*get_ringparam)(struct net_device *, struct ethtool_ringparam *);
- int (*set_ringparam)(struct net_device *, struct ethtool_ringparam *);
- void (*get_pauseparam)(struct net_device *, struct ethtool_pauseparam*);
- int (*set_pauseparam)(struct net_device *, struct ethtool_pauseparam*);
- u32 (*get_rx_csum)(struct net_device *);
- int (*set_rx_csum)(struct net_device *, u32);
- u32 (*get_tx_csum)(struct net_device *);
- int (*set_tx_csum)(struct net_device *, u32);
- u32 (*get_sg)(struct net_device *);
- int (*set_sg)(struct net_device *, u32);
- u32 (*get_tso)(struct net_device *);
- int (*set_tso)(struct net_device *, u32);
- int (*self_test_count)(struct net_device *);
- void (*self_test)(struct net_device *, struct ethtool_test *, u64 *);
- void (*get_strings)(struct net_device *, u32 stringset, u8 *);
- int (*phys_id)(struct net_device *, u32);
- int (*get_stats_count)(struct net_device *);
- void (*get_ethtool_stats)(struct net_device *, struct ethtool_stats *, u64 *);
- int (*get_perm_addr)(struct net_device *, struct ethtool_perm_addr *, u8 *);
- int (*begin)(struct net_device *);
- void (*complete)(struct net_device *);
- u32 (*get_ufo)(struct net_device *);
- int (*set_ufo)(struct net_device *, u32);
+enum ethtool_flags {
+ ETH_FLAG_LRO = (1 << 15), /* LRO is enabled */
};
-#endif /* __KERNEL__ */
+
/* CMDs currently supported */
#define ETHTOOL_GSET 0x00000001 /* Get settings. */
@@ -414,6 +310,10 @@ struct ethtool_ops {
#define ETHTOOL_SUFO 0x00000022 /* Set UFO enable (ethtool_value) */
#define ETHTOOL_GGSO 0x00000023 /* Get GSO enable (ethtool_value) */
#define ETHTOOL_SGSO 0x00000024 /* Set GSO enable (ethtool_value) */
+#define ETHTOOL_GFLAGS 0x00000025 /* Get flags bitmap(ethtool_value) */
+#define ETHTOOL_SFLAGS 0x00000026 /* Set flags bitmap(ethtool_value) */
+#define ETHTOOL_GPFLAGS 0x00000027 /* Get driver-private flags bitmap */
+#define ETHTOOL_SPFLAGS 0x00000028 /* Set driver-private flags bitmap */
/* compatibility with older code */
#define SPARC_ETH_GSET ETHTOOL_GSET
diff --git a/ethtool.c b/ethtool.c
index a668b49..8a063fa 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -151,7 +151,9 @@ static struct option {
" [ sg on|off ]\n"
" [ tso on|off ]\n"
" [ ufo on|off ]\n"
- " [ gso on|off ]\n" },
+ " [ gso on|off ]\n"
+ " [ lro on|off ]\n"
+ },
{ "-i", "--driver", MODE_GDRV, "Show driver information" },
{ "-d", "--register-dump", MODE_GREGS, "Do a register dump",
" [ raw on|off ]\n"
@@ -200,6 +202,7 @@ static int off_sg_wanted = -1;
static int off_tso_wanted = -1;
static int off_ufo_wanted = -1;
static int off_gso_wanted = -1;
+static int off_lro_wanted = -1;
static struct ethtool_pauseparam epause;
static int gpause_changed = 0;
@@ -310,6 +313,7 @@ static struct cmdline_info cmdline_offload[] = {
{ "tso", CMDL_BOOL, &off_tso_wanted, NULL },
{ "ufo", CMDL_BOOL, &off_ufo_wanted, NULL },
{ "gso", CMDL_BOOL, &off_gso_wanted, NULL },
+ { "lro", CMDL_BOOL, &off_lro_wanted, NULL },
};
static struct cmdline_info cmdline_pause[] = {
@@ -1217,7 +1221,7 @@ static int dump_coalesce(void)
return 0;
}
-static int dump_offload (int rx, int tx, int sg, int tso, int ufo, int gso)
+static int dump_offload (int rx, int tx, int sg, int tso, int ufo, int gso, int lro)
{
fprintf(stdout,
"rx-checksumming: %s\n"
@@ -1225,13 +1229,15 @@ static int dump_offload (int rx, int tx, int sg, int tso, int ufo, int gso)
"scatter-gather: %s\n"
"tcp segmentation offload: %s\n"
"udp fragmentation offload: %s\n"
- "generic segmentation offload: %s\n",
+ "generic segmentation offload: %s\n"
+ "large receive offload: %s\n",
rx ? "on" : "off",
tx ? "on" : "off",
sg ? "on" : "off",
tso ? "on" : "off",
ufo ? "on" : "off",
- gso ? "on" : "off");
+ gso ? "on" : "off",
+ lro ? "on" : "off");
return 0;
}
@@ -1495,7 +1501,8 @@ static int do_scoalesce(int fd, struct ifreq *ifr)
static int do_goffload(int fd, struct ifreq *ifr)
{
struct ethtool_value eval;
- int err, allfail = 1, rx = 0, tx = 0, sg = 0, tso = 0, ufo = 0, gso = 0;
+ int err, allfail = 1, rx = 0, tx = 0, sg = 0;
+ int tso = 0, ufo = 0, gso = 0, lro = 0;
fprintf(stdout, "Offload parameters for %s:\n", devname);
@@ -1559,12 +1566,20 @@ static int do_goffload(int fd, struct ifreq *ifr)
allfail = 0;
}
+ eval.cmd = ETHTOOL_GFLAGS;
+ ifr->ifr_data = (caddr_t)&eval;
+ err = ioctl(fd, SIOCETHTOOL, ifr);
+ if (!err) {
+ lro = eval.data & ETH_FLAG_LRO;
+ allfail = 0;
+ }
+
if (allfail) {
fprintf(stdout, "no offload info available\n");
return 83;
}
- return dump_offload(rx, tx, sg, tso, ufo, gso);
+ return dump_offload(rx, tx, sg, tso, ufo, gso, lro);
}
static int do_soffload(int fd, struct ifreq *ifr)
@@ -1641,6 +1656,30 @@ static int do_soffload(int fd, struct ifreq *ifr)
return 90;
}
}
+ if (off_lro_wanted >= 0) {
+ changed = 1;
+ eval.cmd = ETHTOOL_GFLAGS;
+ eval.data = 0;
+ ifr->ifr_data = (caddr_t)&eval;
+ err = ioctl(fd, SIOCETHTOOL, ifr);
+ if (err) {
+ perror("Cannot get device flag settings");
+ return 90;
+ }
+
+ eval.cmd = ETHTOOL_SFLAGS;
+ if (off_lro_wanted == 1)
+ eval.data |= ETH_FLAG_LRO;
+ else
+ eval.data &= ~ETH_FLAG_LRO;
+
+ err = ioctl(fd, SIOCETHTOOL, ifr);
+ if (err) {
+ perror("Cannot set large receive offload settings");
+ return 90;
+ }
+ }
+
if (!changed) {
fprintf(stdout, "no offload settings changed\n");
}
--
1.5.4.3
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] ethtool: command line support for lro
2008-03-07 21:43 ` [PATCH] ethtool: command line support for lro Stephen Hemminger
@ 2008-03-10 18:07 ` Ben Hutchings
2008-03-10 18:29 ` Stephen Hemminger
2008-04-17 12:11 ` Ben Hutchings
0 siblings, 2 replies; 21+ messages in thread
From: Ben Hutchings @ 2008-03-10 18:07 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Kieran Mansley, Jeff Garzik, netdev
Stephen Hemminger wrote:
> Add lro support to command in similar manner to TSO, GSO, etc.
> The file ethtool-copy.h is updated to be sanitised version of
> ethtool.h from 2.6.25-rc4 (ie make headers_install)
I already posted a patch to do this, though I didn't update
ethtool-copy.h.
> Not tested on actual LRO hardware.
Mine was, and this looks very similar.
> @@ -1559,12 +1566,20 @@ static int do_goffload(int fd, struct ifreq *ifr)
> allfail = 0;
> }
>
> + eval.cmd = ETHTOOL_GFLAGS;
> + ifr->ifr_data = (caddr_t)&eval;
> + err = ioctl(fd, SIOCETHTOOL, ifr);
> + if (!err) {
> + lro = eval.data & ETH_FLAG_LRO;
> + allfail = 0;
> + }
> +
To be consistent, this should print a specific error if the ioctl
fails.
> @@ -1641,6 +1656,30 @@ static int do_soffload(int fd, struct ifreq *ifr)
> return 90;
> }
> }
> + if (off_lro_wanted >= 0) {
> + changed = 1;
> + eval.cmd = ETHTOOL_GFLAGS;
> + eval.data = 0;
> + ifr->ifr_data = (caddr_t)&eval;
> + err = ioctl(fd, SIOCETHTOOL, ifr);
> + if (err) {
> + perror("Cannot get device flag settings");
> + return 90;
> + }
I didn't bother fetching the existing flags because only ETH_FLAG_LRO
is defined. But this would be more future-proof.
> +
> + eval.cmd = ETHTOOL_SFLAGS;
> + if (off_lro_wanted == 1)
> + eval.data |= ETH_FLAG_LRO;
> + else
> + eval.data &= ~ETH_FLAG_LRO;
> +
> + err = ioctl(fd, SIOCETHTOOL, ifr);
> + if (err) {
> + perror("Cannot set large receive offload settings");
> + return 90;
> + }
The error return codes are unique so far, so these error paths
should return 91 and 92, not 90.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ethtool: command line support for lro
2008-03-10 18:07 ` Ben Hutchings
@ 2008-03-10 18:29 ` Stephen Hemminger
2008-03-10 18:50 ` Ben Hutchings
2008-04-17 12:11 ` Ben Hutchings
1 sibling, 1 reply; 21+ messages in thread
From: Stephen Hemminger @ 2008-03-10 18:29 UTC (permalink / raw)
To: Ben Hutchings; +Cc: Kieran Mansley, Jeff Garzik, netdev
On Mon, 10 Mar 2008 18:07:07 +0000
Ben Hutchings <bhutchings@solarflare.com> wrote:
> Stephen Hemminger wrote:
> > Add lro support to command in similar manner to TSO, GSO, etc.
> > The file ethtool-copy.h is updated to be sanitised version of
> > ethtool.h from 2.6.25-rc4 (ie make headers_install)
>
> I already posted a patch to do this, though I didn't update
> ethtool-copy.h.
>
> > Not tested on actual LRO hardware.
>
> Mine was, and this looks very similar.
>
> > @@ -1559,12 +1566,20 @@ static int do_goffload(int fd, struct ifreq *ifr)
> > allfail = 0;
> > }
> >
> > + eval.cmd = ETHTOOL_GFLAGS;
> > + ifr->ifr_data = (caddr_t)&eval;
> > + err = ioctl(fd, SIOCETHTOOL, ifr);
> > + if (!err) {
> > + lro = eval.data & ETH_FLAG_LRO;
> > + allfail = 0;
> > + }
> > +
>
> To be consistent, this should print a specific error if the ioctl
> fails.
No, since most hardware won't support LRO or the flags, it makes sense not
to complain when fetching the value.
> > @@ -1641,6 +1656,30 @@ static int do_soffload(int fd, struct ifreq *ifr)
> > return 90;
> > }
> > }
> > + if (off_lro_wanted >= 0) {
> > + changed = 1;
> > + eval.cmd = ETHTOOL_GFLAGS;
> > + eval.data = 0;
> > + ifr->ifr_data = (caddr_t)&eval;
> > + err = ioctl(fd, SIOCETHTOOL, ifr);
> > + if (err) {
> > + perror("Cannot get device flag settings");
> > + return 90;
> > + }
>
> I didn't bother fetching the existing flags because only ETH_FLAG_LRO
> is defined. But this would be more future-proof.
>
> > +
> > + eval.cmd = ETHTOOL_SFLAGS;
> > + if (off_lro_wanted == 1)
> > + eval.data |= ETH_FLAG_LRO;
> > + else
> > + eval.data &= ~ETH_FLAG_LRO;
> > +
> > + err = ioctl(fd, SIOCETHTOOL, ifr);
> > + if (err) {
> > + perror("Cannot set large receive offload settings");
> > + return 90;
> > + }
>
> The error return codes are unique so far, so these error paths
> should return 91 and 92, not 90.
>
> Ben.
>
Sure.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ethtool: command line support for lro
2008-03-10 18:29 ` Stephen Hemminger
@ 2008-03-10 18:50 ` Ben Hutchings
0 siblings, 0 replies; 21+ messages in thread
From: Ben Hutchings @ 2008-03-10 18:50 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Kieran Mansley, Jeff Garzik, netdev
Stephen Hemminger wrote:
> On Mon, 10 Mar 2008 18:07:07 +0000
> Ben Hutchings <bhutchings@solarflare.com> wrote:
>
> > Stephen Hemminger wrote:
<snip>
> > > @@ -1559,12 +1566,20 @@ static int do_goffload(int fd, struct ifreq *ifr)
> > > allfail = 0;
> > > }
> > >
> > > + eval.cmd = ETHTOOL_GFLAGS;
> > > + ifr->ifr_data = (caddr_t)&eval;
> > > + err = ioctl(fd, SIOCETHTOOL, ifr);
> > > + if (!err) {
> > > + lro = eval.data & ETH_FLAG_LRO;
> > > + allfail = 0;
> > > + }
> > > +
> >
> > To be consistent, this should print a specific error if the ioctl
> > fails.
>
> No, since most hardware won't support LRO or the flags, it makes sense not
> to complain when fetching the value.
<snip>
Many of the other offloads have the same issue, particularly when
using a new ethtool with an old kernel. Maybe we should not print an
error message if the error is EOPNOTSUPP or EPERM (unknown commands
are assumed to require admin capability). But that's a separate
change and again it's not specific to LRO.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: LRO/GSO interaction when packets are forwarded
2008-03-07 17:06 ` Kieran Mansley
2008-03-07 21:43 ` [PATCH] ethtool: command line support for lro Stephen Hemminger
@ 2008-03-11 16:50 ` Kieran Mansley
2008-04-22 21:15 ` Ben Hutchings
2 siblings, 0 replies; 21+ messages in thread
From: Kieran Mansley @ 2008-03-11 16:50 UTC (permalink / raw)
To: netdev
On Fri, 2008-03-07 at 17:06 +0000, Kieran Mansley wrote:
> Given that there seem to be a small number of places where this is
> currently causing problems (LRO through a bridge to a stack is fine,
> it's only if the packet is forwarded that things go wrong), and that an
> LRO packet should in most cases be very easy to segment (it's already in
> network-sized fragments) I wonder if making it work would be
> easier/better than preventing it from happening.
I hope to get some time to work on this in the not-too-distant future.
Kieran
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ethtool: command line support for lro
2008-03-10 18:07 ` Ben Hutchings
2008-03-10 18:29 ` Stephen Hemminger
@ 2008-04-17 12:11 ` Ben Hutchings
2008-04-30 18:36 ` Kok, Auke
2008-09-14 2:09 ` Jeff Garzik
1 sibling, 2 replies; 21+ messages in thread
From: Ben Hutchings @ 2008-04-17 12:11 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger, Kieran Mansley, Jeff Garzik
Add lro support to command in similar manner to TSO, GSO, etc.
The file ethtool-copy.h is updated to be sanitised version of
ethtool.h from 2.6.25-rc4 (ie make headers_install)
Based on patch by Stephen Hemminger
<http://article.gmane.org/gmane.linux.network/88124>.
My changes:
- Changed error return codes for setting LRO to be unique.
- Report failure to get device flags, consistent with other offload settings.
- Fixed code spacing.
Tested with the sfc driver.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
diff --git a/ethtool-copy.h b/ethtool-copy.h
index 3a63224..87c7c9a 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -39,7 +39,8 @@ struct ethtool_drvinfo {
char bus_info[ETHTOOL_BUSINFO_LEN]; /* Bus info for this IF. */
/* For PCI devices, use pci_name(pci_dev). */
char reserved1[32];
- char reserved2[16];
+ char reserved2[12];
+ __u32 n_priv_flags; /* number of flags valid in ETHTOOL_GPFLAGS */
__u32 n_stats; /* number of u64's from ETHTOOL_GSTATS */
__u32 testinfo_len;
__u32 eedump_len; /* Size of data from ETHTOOL_GEEPROM (bytes) */
@@ -219,6 +220,7 @@ struct ethtool_pauseparam {
enum ethtool_stringset {
ETH_SS_TEST = 0,
ETH_SS_STATS,
+ ETH_SS_PRIV_FLAGS,
};
/* for passing string sets for data tagging */
@@ -256,125 +258,19 @@ struct ethtool_perm_addr {
__u8 data[0];
};
-#ifdef __KERNEL__
-
-struct net_device;
-
-/* Some generic methods drivers may use in their ethtool_ops */
-u32 ethtool_op_get_link(struct net_device *dev);
-u32 ethtool_op_get_tx_csum(struct net_device *dev);
-int ethtool_op_set_tx_csum(struct net_device *dev, u32 data);
-int ethtool_op_set_tx_hw_csum(struct net_device *dev, u32 data);
-int ethtool_op_set_tx_ipv6_csum(struct net_device *dev, u32 data);
-u32 ethtool_op_get_sg(struct net_device *dev);
-int ethtool_op_set_sg(struct net_device *dev, u32 data);
-u32 ethtool_op_get_tso(struct net_device *dev);
-int ethtool_op_set_tso(struct net_device *dev, u32 data);
-int ethtool_op_get_perm_addr(struct net_device *dev,
- struct ethtool_perm_addr *addr, u8 *data);
-u32 ethtool_op_get_ufo(struct net_device *dev);
-int ethtool_op_set_ufo(struct net_device *dev, u32 data);
-
-/**
- * ðtool_ops - Alter and report network device settings
- * get_settings: Get device-specific settings
- * set_settings: Set device-specific settings
- * get_drvinfo: Report driver information
- * get_regs: Get device registers
- * get_wol: Report whether Wake-on-Lan is enabled
- * set_wol: Turn Wake-on-Lan on or off
- * get_msglevel: Report driver message level
- * set_msglevel: Set driver message level
- * nway_reset: Restart autonegotiation
- * get_link: Get link status
- * get_eeprom: Read data from the device EEPROM
- * set_eeprom: Write data to the device EEPROM
- * get_coalesce: Get interrupt coalescing parameters
- * set_coalesce: Set interrupt coalescing parameters
- * get_ringparam: Report ring sizes
- * set_ringparam: Set ring sizes
- * get_pauseparam: Report pause parameters
- * set_pauseparam: Set pause paramters
- * get_rx_csum: Report whether receive checksums are turned on or off
- * set_rx_csum: Turn receive checksum on or off
- * get_tx_csum: Report whether transmit checksums are turned on or off
- * set_tx_csum: Turn transmit checksums on or off
- * get_sg: Report whether scatter-gather is enabled
- * set_sg: Turn scatter-gather on or off
- * get_tso: Report whether TCP segmentation offload is enabled
- * set_tso: Turn TCP segmentation offload on or off
- * get_ufo: Report whether UDP fragmentation offload is enabled
- * set_ufo: Turn UDP fragmentation offload on or off
- * self_test: Run specified self-tests
- * get_strings: Return a set of strings that describe the requested objects
- * phys_id: Identify the device
- * get_stats: Return statistics about the device
- * get_perm_addr: Gets the permanent hardware address
- *
- * Description:
+/* boolean flags controlling per-interface behavior characteristics.
+ * When reading, the flag indicates whether or not a certain behavior
+ * is enabled/present. When writing, the flag indicates whether
+ * or not the driver should turn on (set) or off (clear) a behavior.
*
- * get_settings:
- * @get_settings is passed an ðtool_cmd to fill in. It returns
- * an negative errno or zero.
- *
- * set_settings:
- * @set_settings is passed an ðtool_cmd and should attempt to set
- * all the settings this device supports. It may return an error value
- * if something goes wrong (otherwise 0).
- *
- * get_eeprom:
- * Should fill in the magic field. Don't need to check len for zero
- * or wraparound. Fill in the data argument with the eeprom values
- * from offset to offset + len. Update len to the amount read.
- * Returns an error or zero.
- *
- * set_eeprom:
- * Should validate the magic field. Don't need to check len for zero
- * or wraparound. Update len to the amount written. Returns an error
- * or zero.
+ * Some behaviors may read-only (unconditionally absent or present).
+ * If such is the case, return EINVAL in the set-flags operation if the
+ * flag differs from the read-only value.
*/
-struct ethtool_ops {
- int (*get_settings)(struct net_device *, struct ethtool_cmd *);
- int (*set_settings)(struct net_device *, struct ethtool_cmd *);
- void (*get_drvinfo)(struct net_device *, struct ethtool_drvinfo *);
- int (*get_regs_len)(struct net_device *);
- void (*get_regs)(struct net_device *, struct ethtool_regs *, void *);
- void (*get_wol)(struct net_device *, struct ethtool_wolinfo *);
- int (*set_wol)(struct net_device *, struct ethtool_wolinfo *);
- u32 (*get_msglevel)(struct net_device *);
- void (*set_msglevel)(struct net_device *, u32);
- int (*nway_reset)(struct net_device *);
- u32 (*get_link)(struct net_device *);
- int (*get_eeprom_len)(struct net_device *);
- int (*get_eeprom)(struct net_device *, struct ethtool_eeprom *, u8 *);
- int (*set_eeprom)(struct net_device *, struct ethtool_eeprom *, u8 *);
- int (*get_coalesce)(struct net_device *, struct ethtool_coalesce *);
- int (*set_coalesce)(struct net_device *, struct ethtool_coalesce *);
- void (*get_ringparam)(struct net_device *, struct ethtool_ringparam *);
- int (*set_ringparam)(struct net_device *, struct ethtool_ringparam *);
- void (*get_pauseparam)(struct net_device *, struct ethtool_pauseparam*);
- int (*set_pauseparam)(struct net_device *, struct ethtool_pauseparam*);
- u32 (*get_rx_csum)(struct net_device *);
- int (*set_rx_csum)(struct net_device *, u32);
- u32 (*get_tx_csum)(struct net_device *);
- int (*set_tx_csum)(struct net_device *, u32);
- u32 (*get_sg)(struct net_device *);
- int (*set_sg)(struct net_device *, u32);
- u32 (*get_tso)(struct net_device *);
- int (*set_tso)(struct net_device *, u32);
- int (*self_test_count)(struct net_device *);
- void (*self_test)(struct net_device *, struct ethtool_test *, u64 *);
- void (*get_strings)(struct net_device *, u32 stringset, u8 *);
- int (*phys_id)(struct net_device *, u32);
- int (*get_stats_count)(struct net_device *);
- void (*get_ethtool_stats)(struct net_device *, struct ethtool_stats *, u64 *);
- int (*get_perm_addr)(struct net_device *, struct ethtool_perm_addr *, u8 *);
- int (*begin)(struct net_device *);
- void (*complete)(struct net_device *);
- u32 (*get_ufo)(struct net_device *);
- int (*set_ufo)(struct net_device *, u32);
+enum ethtool_flags {
+ ETH_FLAG_LRO = (1 << 15), /* LRO is enabled */
};
-#endif /* __KERNEL__ */
+
/* CMDs currently supported */
#define ETHTOOL_GSET 0x00000001 /* Get settings. */
@@ -414,6 +310,10 @@ struct ethtool_ops {
#define ETHTOOL_SUFO 0x00000022 /* Set UFO enable (ethtool_value) */
#define ETHTOOL_GGSO 0x00000023 /* Get GSO enable (ethtool_value) */
#define ETHTOOL_SGSO 0x00000024 /* Set GSO enable (ethtool_value) */
+#define ETHTOOL_GFLAGS 0x00000025 /* Get flags bitmap(ethtool_value) */
+#define ETHTOOL_SFLAGS 0x00000026 /* Set flags bitmap(ethtool_value) */
+#define ETHTOOL_GPFLAGS 0x00000027 /* Get driver-private flags bitmap */
+#define ETHTOOL_SPFLAGS 0x00000028 /* Set driver-private flags bitmap */
/* compatibility with older code */
#define SPARC_ETH_GSET ETHTOOL_GSET
diff --git a/ethtool.c b/ethtool.c
index a668b49..5f519bc 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -151,7 +151,9 @@ static struct option {
" [ sg on|off ]\n"
" [ tso on|off ]\n"
" [ ufo on|off ]\n"
- " [ gso on|off ]\n" },
+ " [ gso on|off ]\n"
+ " [ lro on|off ]\n"
+ },
{ "-i", "--driver", MODE_GDRV, "Show driver information" },
{ "-d", "--register-dump", MODE_GREGS, "Do a register dump",
" [ raw on|off ]\n"
@@ -200,6 +202,7 @@ static int off_sg_wanted = -1;
static int off_tso_wanted = -1;
static int off_ufo_wanted = -1;
static int off_gso_wanted = -1;
+static int off_lro_wanted = -1;
static struct ethtool_pauseparam epause;
static int gpause_changed = 0;
@@ -310,6 +313,7 @@ static struct cmdline_info cmdline_offload[] = {
{ "tso", CMDL_BOOL, &off_tso_wanted, NULL },
{ "ufo", CMDL_BOOL, &off_ufo_wanted, NULL },
{ "gso", CMDL_BOOL, &off_gso_wanted, NULL },
+ { "lro", CMDL_BOOL, &off_lro_wanted, NULL },
};
static struct cmdline_info cmdline_pause[] = {
@@ -1217,7 +1221,7 @@ static int dump_coalesce(void)
return 0;
}
-static int dump_offload (int rx, int tx, int sg, int tso, int ufo, int gso)
+static int dump_offload(int rx, int tx, int sg, int tso, int ufo, int gso, int lro)
{
fprintf(stdout,
"rx-checksumming: %s\n"
@@ -1225,13 +1229,15 @@ static int dump_offload (int rx, int tx, int sg, int tso, int ufo, int gso)
"scatter-gather: %s\n"
"tcp segmentation offload: %s\n"
"udp fragmentation offload: %s\n"
- "generic segmentation offload: %s\n",
+ "generic segmentation offload: %s\n"
+ "large receive offload: %s\n",
rx ? "on" : "off",
tx ? "on" : "off",
sg ? "on" : "off",
tso ? "on" : "off",
ufo ? "on" : "off",
- gso ? "on" : "off");
+ gso ? "on" : "off",
+ lro ? "on" : "off");
return 0;
}
@@ -1495,7 +1501,8 @@ static int do_scoalesce(int fd, struct ifreq *ifr)
static int do_goffload(int fd, struct ifreq *ifr)
{
struct ethtool_value eval;
- int err, allfail = 1, rx = 0, tx = 0, sg = 0, tso = 0, ufo = 0, gso = 0;
+ int err, allfail = 1, rx = 0, tx = 0, sg = 0;
+ int tso = 0, ufo = 0, gso = 0, lro = 0;
fprintf(stdout, "Offload parameters for %s:\n", devname);
@@ -1559,12 +1566,22 @@ static int do_goffload(int fd, struct ifreq *ifr)
allfail = 0;
}
+ eval.cmd = ETHTOOL_GFLAGS;
+ ifr->ifr_data = (caddr_t)&eval;
+ err = ioctl(fd, SIOCETHTOOL, ifr);
+ if (err) {
+ perror("Cannot get device flags");
+ } else {
+ lro = (eval.data & ETH_FLAG_LRO) != 0;
+ allfail = 0;
+ }
+
if (allfail) {
fprintf(stdout, "no offload info available\n");
return 83;
}
- return dump_offload(rx, tx, sg, tso, ufo, gso);
+ return dump_offload(rx, tx, sg, tso, ufo, gso, lro);
}
static int do_soffload(int fd, struct ifreq *ifr)
@@ -1641,6 +1658,30 @@ static int do_soffload(int fd, struct ifreq *ifr)
return 90;
}
}
+ if (off_lro_wanted >= 0) {
+ changed = 1;
+ eval.cmd = ETHTOOL_GFLAGS;
+ eval.data = 0;
+ ifr->ifr_data = (caddr_t)&eval;
+ err = ioctl(fd, SIOCETHTOOL, ifr);
+ if (err) {
+ perror("Cannot get device flag settings");
+ return 91;
+ }
+
+ eval.cmd = ETHTOOL_SFLAGS;
+ if (off_lro_wanted == 1)
+ eval.data |= ETH_FLAG_LRO;
+ else
+ eval.data &= ~ETH_FLAG_LRO;
+
+ err = ioctl(fd, SIOCETHTOOL, ifr);
+ if (err) {
+ perror("Cannot set large receive offload settings");
+ return 92;
+ }
+ }
+
if (!changed) {
fprintf(stdout, "no offload settings changed\n");
}
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: LRO/GSO interaction when packets are forwarded
2008-03-07 17:06 ` Kieran Mansley
2008-03-07 21:43 ` [PATCH] ethtool: command line support for lro Stephen Hemminger
2008-03-11 16:50 ` LRO/GSO interaction when packets are forwarded Kieran Mansley
@ 2008-04-22 21:15 ` Ben Hutchings
2008-04-22 23:01 ` Stephen Hemminger
2 siblings, 1 reply; 21+ messages in thread
From: Ben Hutchings @ 2008-04-22 21:15 UTC (permalink / raw)
To: Kieran Mansley; +Cc: Stephen Hemminger, netdev
Kieran Mansley wrote:
> On Fri, 2008-03-07 at 08:25 -0800, Stephen Hemminger wrote:
> > On Fri, 07 Mar 2008 14:09:57 +0000
> > Kieran Mansley <kmansley@solarflare.com> wrote:
> >
> > > We've seen a couple of problems when using a bridge or IP forwarding
> > > combined with LRO packets generated by a network device driver. As you
> > > know, LRO packets can be either be page based (and passed up with
> > > lro_receive_page()) or use the skb frag_list (and passed up with
> > > lro_receive_skb()). In both cases it is likely that the device driver
> > > will have set CHECKSUM_UNNECESSARY to indicate that the packet has been
> > > checksummed by the device, and gso_size to mark it as an LRO packet and
> > > indicate the actual received MSS.
> >
> > First off, no hardware should ever do LRO on non-local packets. If the
> > hardware isn't smart enough to do this, I guess the bridge code to have
> > an API to turn it off. IP should also turn it off if ip_forwarding
> > is enabled on that device.
>
> If the only way to deal with this is to prevent LRO in those cases,
> having an API to turn if off would clearly be helpful: working out in
> the hardware or driver which packets can be forwarded or not is hard,
> and would probably be a layering violation in itself.
<snip>
Here's a first try at disabling LRO where it should not be used. I've
given it a little testing and would appreciate comments on whether this
is a reasonable approach.
There's one piece clearly missing: since the disabling of LRO can race
with packet reception, and since LRO can potentially be reenabled with
ethtool, the IP forwarding and bridging code needs to detect and drop
LRO'd packets.
Ben.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7c1d446..d7e8f1f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -866,6 +866,7 @@ extern struct net_device *__dev_get_by_name(struct net *net, const char *name);
extern int dev_alloc_name(struct net_device *dev, const char *name);
extern int dev_open(struct net_device *dev);
extern int dev_close(struct net_device *dev);
+extern void dev_disable_lro(struct net_device *dev);
extern int dev_queue_xmit(struct sk_buff *skb);
extern int register_netdevice(struct net_device *dev);
extern void unregister_netdevice(struct net_device *dev);
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 298e0f4..1bd6631 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -387,6 +387,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
goto err2;
rcu_assign_pointer(dev->br_port, p);
+ dev_disable_lro(dev);
dev_set_promiscuity(dev, 1);
list_add_rcu(&p->list, &br->port_list);
diff --git a/net/core/dev.c b/net/core/dev.c
index e1df1ab..62cca32 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -90,6 +90,7 @@
#include <linux/if_ether.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
#include <linux/notifier.h>
#include <linux/skbuff.h>
#include <net/net_namespace.h>
@@ -1108,6 +1109,29 @@ int dev_close(struct net_device *dev)
}
+/**
+ * dev_disable_lro - disable Large Receive Offload on a device
+ * @dev: device
+ *
+ * Disable Large Receive Offload (LRO) on a net device. Must be
+ * called under RTNL. This is needed if received packets may be
+ * forwarded to another interface.
+ */
+void dev_disable_lro(struct net_device *dev)
+{
+ if (dev->ethtool_ops && dev->ethtool_ops->get_flags &&
+ dev->ethtool_ops->set_flags) {
+ u32 flags = dev->ethtool_ops->get_flags(dev);
+ if (flags & ETH_FLAG_LRO) {
+ flags &= ~ETH_FLAG_LRO;
+ dev->ethtool_ops->set_flags(dev, flags);
+ }
+ }
+ WARN_ON(dev->features & NETIF_F_LRO);
+}
+EXPORT_SYMBOL(dev_disable_lro);
+
+
static int dev_boot_phase = 1;
/*
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 6848e47..f88d395 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -171,6 +171,8 @@ static struct in_device *inetdev_init(struct net_device *dev)
in_dev->dev = dev;
if ((in_dev->arp_parms = neigh_parms_alloc(dev, &arp_tbl)) == NULL)
goto out_kfree;
+ if (IPV4_DEVCONF(in_dev->cnf, FORWARDING))
+ dev_disable_lro(dev);
/* Reference in_dev->dev */
dev_hold(dev);
/* Account for reference dev->ip_ptr (below) */
@@ -1250,6 +1252,8 @@ static void inet_forward_change(struct net *net)
read_lock(&dev_base_lock);
for_each_netdev(net, dev) {
struct in_device *in_dev;
+ if (on)
+ dev_disable_lro(dev);
rcu_read_lock();
in_dev = __in_dev_get_rcu(dev);
if (in_dev)
@@ -1257,8 +1261,6 @@ static void inet_forward_change(struct net *net)
rcu_read_unlock();
}
read_unlock(&dev_base_lock);
-
- rt_cache_flush(0);
}
static int devinet_conf_proc(ctl_table *ctl, int write,
@@ -1344,10 +1346,19 @@ static int devinet_sysctl_forward(ctl_table *ctl, int write,
if (write && *valp != val) {
struct net *net = ctl->extra2;
- if (valp == &IPV4_DEVCONF_ALL(net, FORWARDING))
- inet_forward_change(net);
- else if (valp != &IPV4_DEVCONF_DFLT(net, FORWARDING))
+ if (valp != &IPV4_DEVCONF_DFLT(net, FORWARDING)) {
+ rtnl_lock();
+ if (valp == &IPV4_DEVCONF_ALL(net, FORWARDING)) {
+ inet_forward_change(net);
+ } else if (*valp) {
+ struct ipv4_devconf *cnf = ctl->extra1;
+ struct in_device *idev =
+ container_of(cnf, struct in_device, cnf);
+ dev_disable_lro(idev->dev);
+ }
+ rtnl_unlock();
rt_cache_flush(0);
+ }
}
return ret;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 8a0fd40..5be6874 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -344,6 +344,8 @@ static struct inet6_dev * ipv6_add_dev(struct net_device *dev)
kfree(ndev);
return NULL;
}
+ if (ndev->cnf.forwarding)
+ dev_disable_lro(dev);
/* We refer to the device */
dev_hold(dev);
@@ -438,6 +440,8 @@ static void dev_forward_change(struct inet6_dev *idev)
if (!idev)
return;
dev = idev->dev;
+ if (idev->cnf.forwarding)
+ dev_disable_lro(dev);
if (dev && (dev->flags & IFF_MULTICAST)) {
if (idev->cnf.forwarding)
ipv6_dev_mc_inc(dev, &in6addr_linklocal_allrouters);
@@ -483,12 +487,14 @@ static void addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
if (p == &net->ipv6.devconf_dflt->forwarding)
return;
+ rtnl_lock();
if (p == &net->ipv6.devconf_all->forwarding) {
__s32 newf = net->ipv6.devconf_all->forwarding;
net->ipv6.devconf_dflt->forwarding = newf;
addrconf_forward_change(net, newf);
} else if ((!*p) ^ (!old))
dev_forward_change((struct inet6_dev *)table->extra1);
+ rtnl_unlock();
if (*p)
rt6_purge_dflt_routers(net);
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: LRO/GSO interaction when packets are forwarded
2008-04-22 21:15 ` Ben Hutchings
@ 2008-04-22 23:01 ` Stephen Hemminger
2008-04-23 6:00 ` Jarek Poplawski
2008-04-23 10:04 ` Ben Hutchings
0 siblings, 2 replies; 21+ messages in thread
From: Stephen Hemminger @ 2008-04-22 23:01 UTC (permalink / raw)
To: Ben Hutchings; +Cc: Kieran Mansley, Stephen Hemminger, netdev
On Tue, 22 Apr 2008 22:15:13 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:
> Kieran Mansley wrote:
> > On Fri, 2008-03-07 at 08:25 -0800, Stephen Hemminger wrote:
> > > On Fri, 07 Mar 2008 14:09:57 +0000
> > > Kieran Mansley <kmansley@solarflare.com> wrote:
> > >
> > > > We've seen a couple of problems when using a bridge or IP forwarding
> > > > combined with LRO packets generated by a network device driver. As you
> > > > know, LRO packets can be either be page based (and passed up with
> > > > lro_receive_page()) or use the skb frag_list (and passed up with
> > > > lro_receive_skb()). In both cases it is likely that the device driver
> > > > will have set CHECKSUM_UNNECESSARY to indicate that the packet has been
> > > > checksummed by the device, and gso_size to mark it as an LRO packet and
> > > > indicate the actual received MSS.
> > >
> > > First off, no hardware should ever do LRO on non-local packets. If the
> > > hardware isn't smart enough to do this, I guess the bridge code to have
> > > an API to turn it off. IP should also turn it off if ip_forwarding
> > > is enabled on that device.
> >
> > If the only way to deal with this is to prevent LRO in those cases,
> > having an API to turn if off would clearly be helpful: working out in
> > the hardware or driver which packets can be forwarded or not is hard,
> > and would probably be a layering violation in itself.
> <snip>
>
> Here's a first try at disabling LRO where it should not be used. I've
> given it a little testing and would appreciate comments on whether this
> is a reasonable approach.
>
> There's one piece clearly missing: since the disabling of LRO can race
> with packet reception, and since LRO can potentially be reenabled with
> ethtool, the IP forwarding and bridging code needs to detect and drop
> LRO'd packets.
>
> Ben.
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 7c1d446..d7e8f1f 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -866,6 +866,7 @@ extern struct net_device *__dev_get_by_name(struct net *net, const char *name);
> extern int dev_alloc_name(struct net_device *dev, const char *name);
> extern int dev_open(struct net_device *dev);
> extern int dev_close(struct net_device *dev);
> +extern void dev_disable_lro(struct net_device *dev);
> extern int dev_queue_xmit(struct sk_buff *skb);
> extern int register_netdevice(struct net_device *dev);
> extern void unregister_netdevice(struct net_device *dev);
> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index 298e0f4..1bd6631 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -387,6 +387,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
> goto err2;
>
> rcu_assign_pointer(dev->br_port, p);
> + dev_disable_lro(dev);
> dev_set_promiscuity(dev, 1);
>
> list_add_rcu(&p->list, &br->port_list);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index e1df1ab..62cca32 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -90,6 +90,7 @@
> #include <linux/if_ether.h>
> #include <linux/netdevice.h>
> #include <linux/etherdevice.h>
> +#include <linux/ethtool.h>
> #include <linux/notifier.h>
> #include <linux/skbuff.h>
> #include <net/net_namespace.h>
> @@ -1108,6 +1109,29 @@ int dev_close(struct net_device *dev)
> }
>
>
> +/**
> + * dev_disable_lro - disable Large Receive Offload on a device
> + * @dev: device
> + *
> + * Disable Large Receive Offload (LRO) on a net device. Must be
> + * called under RTNL. This is needed if received packets may be
> + * forwarded to another interface.
> + */
> +void dev_disable_lro(struct net_device *dev)
> +{
> + if (dev->ethtool_ops && dev->ethtool_ops->get_flags &&
> + dev->ethtool_ops->set_flags) {
> + u32 flags = dev->ethtool_ops->get_flags(dev);
> + if (flags & ETH_FLAG_LRO) {
> + flags &= ~ETH_FLAG_LRO;
> + dev->ethtool_ops->set_flags(dev, flags);
> + }
> + }
> + WARN_ON(dev->features & NETIF_F_LRO);
> +}
> +EXPORT_SYMBOL(dev_disable_lro);
Maybe add ASSERT_RTNL() instead of comment.
> +
> +
> static int dev_boot_phase = 1;
>
> /*
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index 6848e47..f88d395 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -171,6 +171,8 @@ static struct in_device *inetdev_init(struct net_device *dev)
> in_dev->dev = dev;
> if ((in_dev->arp_parms = neigh_parms_alloc(dev, &arp_tbl)) == NULL)
> goto out_kfree;
> + if (IPV4_DEVCONF(in_dev->cnf, FORWARDING))
> + dev_disable_lro(dev);
> /* Reference in_dev->dev */
> dev_hold(dev);
> /* Account for reference dev->ip_ptr (below) */
> @@ -1250,6 +1252,8 @@ static void inet_forward_change(struct net *net)
> read_lock(&dev_base_lock);
> for_each_netdev(net, dev) {
> struct in_device *in_dev;
> + if (on)
> + dev_disable_lro(dev);
> rcu_read_lock();
> in_dev = __in_dev_get_rcu(dev);
> if (in_dev)
> @@ -1257,8 +1261,6 @@ static void inet_forward_change(struct net *net)
> rcu_read_unlock();
> }
> read_unlock(&dev_base_lock);
> -
> - rt_cache_flush(0);
> }
Why did you delete the route cache flush here?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: LRO/GSO interaction when packets are forwarded
2008-04-22 23:01 ` Stephen Hemminger
@ 2008-04-23 6:00 ` Jarek Poplawski
2008-04-23 6:15 ` Jarek Poplawski
2008-04-23 10:04 ` Ben Hutchings
1 sibling, 1 reply; 21+ messages in thread
From: Jarek Poplawski @ 2008-04-23 6:00 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Ben Hutchings, Kieran Mansley, Stephen Hemminger, netdev
On 23-04-2008 01:01, Stephen Hemminger wrote:
...
>>>> First off, no hardware should ever do LRO on non-local packets. If the
>>>> hardware isn't smart enough to do this, I guess the bridge code to have
>>>> an API to turn it off. IP should also turn it off if ip_forwarding
>>>> is enabled on that device.
Could you explain this more? (I can't see any obvious reason why
forwarding between local networks should differ here from bridging?)
Thanks,
Jarek P.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: LRO/GSO interaction when packets are forwarded
2008-04-23 6:00 ` Jarek Poplawski
@ 2008-04-23 6:15 ` Jarek Poplawski
2008-04-23 10:07 ` Ben Hutchings
0 siblings, 1 reply; 21+ messages in thread
From: Jarek Poplawski @ 2008-04-23 6:15 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Ben Hutchings, Kieran Mansley, Stephen Hemminger, netdev
On Wed, Apr 23, 2008 at 06:00:18AM +0000, Jarek Poplawski wrote:
> On 23-04-2008 01:01, Stephen Hemminger wrote:
> ...
> >>>> First off, no hardware should ever do LRO on non-local packets. If the
> >>>> hardware isn't smart enough to do this, I guess the bridge code to have
> >>>> an API to turn it off. IP should also turn it off if ip_forwarding
> >>>> is enabled on that device.
>
> Could you explain this more? (I can't see any obvious reason why
> forwarding between local networks should differ here from bridging?)
...and the second question: is only ip_forwarding flag checking right
way to disable something destined for local packets?
Jarek P.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: LRO/GSO interaction when packets are forwarded
2008-04-22 23:01 ` Stephen Hemminger
2008-04-23 6:00 ` Jarek Poplawski
@ 2008-04-23 10:04 ` Ben Hutchings
1 sibling, 0 replies; 21+ messages in thread
From: Ben Hutchings @ 2008-04-23 10:04 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Kieran Mansley, Stephen Hemminger, netdev
Stephen Hemminger wrote:
<snip>
> > diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> > index 6848e47..f88d395 100644
> > --- a/net/ipv4/devinet.c
> > +++ b/net/ipv4/devinet.c
> > @@ -171,6 +171,8 @@ static struct in_device *inetdev_init(struct net_device *dev)
> > in_dev->dev = dev;
> > if ((in_dev->arp_parms = neigh_parms_alloc(dev, &arp_tbl)) == NULL)
> > goto out_kfree;
> > + if (IPV4_DEVCONF(in_dev->cnf, FORWARDING))
> > + dev_disable_lro(dev);
> > /* Reference in_dev->dev */
> > dev_hold(dev);
> > /* Account for reference dev->ip_ptr (below) */
> > @@ -1250,6 +1252,8 @@ static void inet_forward_change(struct net *net)
> > read_lock(&dev_base_lock);
> > for_each_netdev(net, dev) {
> > struct in_device *in_dev;
> > + if (on)
> > + dev_disable_lro(dev);
> > rcu_read_lock();
> > in_dev = __in_dev_get_rcu(dev);
> > if (in_dev)
> > @@ -1257,8 +1261,6 @@ static void inet_forward_change(struct net *net)
> > rcu_read_unlock();
> > }
> > read_unlock(&dev_base_lock);
> > -
> > - rt_cache_flush(0);
> > }
>
> Why did you delete the route cache flush here?
I changed devinet_sysctl_forward() (the only caller of inet_forward_change())
to combine code for the change-one and change-all cases. It calls
rt_cache_flush(0) in both cases.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: LRO/GSO interaction when packets are forwarded
2008-04-23 6:15 ` Jarek Poplawski
@ 2008-04-23 10:07 ` Ben Hutchings
2008-04-23 10:38 ` Jarek Poplawski
0 siblings, 1 reply; 21+ messages in thread
From: Ben Hutchings @ 2008-04-23 10:07 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Stephen Hemminger, Kieran Mansley, Stephen Hemminger, netdev
Jarek Poplawski wrote:
> On Wed, Apr 23, 2008 at 06:00:18AM +0000, Jarek Poplawski wrote:
> > On 23-04-2008 01:01, Stephen Hemminger wrote:
> > ...
> > >>>> First off, no hardware should ever do LRO on non-local packets. If the
> > >>>> hardware isn't smart enough to do this, I guess the bridge code to have
> > >>>> an API to turn it off. IP should also turn it off if ip_forwarding
> > >>>> is enabled on that device.
> >
> > Could you explain this more? (I can't see any obvious reason why
> > forwarding between local networks should differ here from bridging?)
>
> ...and the second question: is only ip_forwarding flag checking right
> way to disable something destined for local packets?
"Non-local" here simply means destined for another host. It doesn't
matter whether that host is on the same LAN or not.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: LRO/GSO interaction when packets are forwarded
2008-04-23 10:07 ` Ben Hutchings
@ 2008-04-23 10:38 ` Jarek Poplawski
2008-04-23 10:42 ` David Miller
0 siblings, 1 reply; 21+ messages in thread
From: Jarek Poplawski @ 2008-04-23 10:38 UTC (permalink / raw)
To: Ben Hutchings
Cc: Stephen Hemminger, Kieran Mansley, Stephen Hemminger, netdev
On Wed, Apr 23, 2008 at 11:07:04AM +0100, Ben Hutchings wrote:
> Jarek Poplawski wrote:
> > On Wed, Apr 23, 2008 at 06:00:18AM +0000, Jarek Poplawski wrote:
> > > On 23-04-2008 01:01, Stephen Hemminger wrote:
> > > ...
> > > >>>> First off, no hardware should ever do LRO on non-local packets. If the
> > > >>>> hardware isn't smart enough to do this, I guess the bridge code to have
> > > >>>> an API to turn it off. IP should also turn it off if ip_forwarding
> > > >>>> is enabled on that device.
> > >
> > > Could you explain this more? (I can't see any obvious reason why
> > > forwarding between local networks should differ here from bridging?)
> >
> > ...and the second question: is only ip_forwarding flag checking right
> > way to disable something destined for local packets?
>
> "Non-local" here simply means destined for another host. It doesn't
> matter whether that host is on the same LAN or not.
Thanks for explanation! I'd still like to know why it's wrong, and
why we should turn this off on a device with ip_forwarding enabled
without checking how smart this hardware is?
Regards,
Jarek P.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: LRO/GSO interaction when packets are forwarded
2008-04-23 10:38 ` Jarek Poplawski
@ 2008-04-23 10:42 ` David Miller
2008-04-23 11:09 ` Jarek Poplawski
0 siblings, 1 reply; 21+ messages in thread
From: David Miller @ 2008-04-23 10:42 UTC (permalink / raw)
To: jarkao2; +Cc: bhutchings, stephen.hemminger, kmansley, shemminger, netdev
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 23 Apr 2008 10:38:10 +0000
> Thanks for explanation! I'd still like to know why it's wrong, and
> why we should turn this off on a device with ip_forwarding enabled
> without checking how smart this hardware is?
It's not about how smart the hardware is.
This a stateless feature, the device accumulates data into
large receive frames when it's TCP and the ports and addresses
match.
There is no knowledge of routes or anything like that, nor should
there be, because otherwise it would cease to be stateless.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: LRO/GSO interaction when packets are forwarded
2008-04-23 10:42 ` David Miller
@ 2008-04-23 11:09 ` Jarek Poplawski
0 siblings, 0 replies; 21+ messages in thread
From: Jarek Poplawski @ 2008-04-23 11:09 UTC (permalink / raw)
To: David Miller; +Cc: bhutchings, stephen.hemminger, kmansley, shemminger, netdev
On Wed, Apr 23, 2008 at 03:42:25AM -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Wed, 23 Apr 2008 10:38:10 +0000
>
> > Thanks for explanation! I'd still like to know why it's wrong, and
> > why we should turn this off on a device with ip_forwarding enabled
> > without checking how smart this hardware is?
>
> It's not about how smart the hardware is.
...I think Stephen mentioned there could be some difference?
>
> This a stateless feature, the device accumulates data into
> large receive frames when it's TCP and the ports and addresses
> match.
>
> There is no knowledge of routes or anything like that, nor should
> there be, because otherwise it would cease to be stateless.
OK, but bridging doesn't seem to need much more... But, anyway, why
(with such a smart hardware) it should be impossible to get such
local LRO packets, and do ip forwarding (with non-LRO non-local
packets)?
Thanks,
Jarek P.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ethtool: command line support for lro
2008-04-17 12:11 ` Ben Hutchings
@ 2008-04-30 18:36 ` Kok, Auke
2008-05-02 14:34 ` Ben Hutchings
2008-09-14 2:09 ` Jeff Garzik
1 sibling, 1 reply; 21+ messages in thread
From: Kok, Auke @ 2008-04-30 18:36 UTC (permalink / raw)
To: Ben Hutchings, Jeff Garzik; +Cc: netdev, Stephen Hemminger, Kieran Mansley
Ben Hutchings wrote:
> Add lro support to command in similar manner to TSO, GSO, etc.
> The file ethtool-copy.h is updated to be sanitised version of
> ethtool.h from 2.6.25-rc4 (ie make headers_install)
>
> Based on patch by Stephen Hemminger
> <http://article.gmane.org/gmane.linux.network/88124>.
>
> My changes:
> - Changed error return codes for setting LRO to be unique.
> - Report failure to get device flags, consistent with other offload settings.
> - Fixed code spacing.
>
> Tested with the sfc driver.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> diff --git a/ethtool-copy.h b/ethtool-copy.h
> index 3a63224..87c7c9a 100644
> --- a/ethtool-copy.h
> +++ b/ethtool-copy.h
> @@ -39,7 +39,8 @@ struct ethtool_drvinfo {
> char bus_info[ETHTOOL_BUSINFO_LEN]; /* Bus info for this IF. */
> /* For PCI devices, use pci_name(pci_dev). */
> char reserved1[32];
> - char reserved2[16];
> + char reserved2[12];
> + __u32 n_priv_flags; /* number of flags valid in ETHTOOL_GPFLAGS */
> __u32 n_stats; /* number of u64's from ETHTOOL_GSTATS */
> __u32 testinfo_len;
> __u32 eedump_len; /* Size of data from ETHTOOL_GEEPROM (bytes) */
> @@ -219,6 +220,7 @@ struct ethtool_pauseparam {
> enum ethtool_stringset {
> ETH_SS_TEST = 0,
> ETH_SS_STATS,
> + ETH_SS_PRIV_FLAGS,
> };
>
> /* for passing string sets for data tagging */
> @@ -256,125 +258,19 @@ struct ethtool_perm_addr {
> __u8 data[0];
> };
>
> -#ifdef __KERNEL__
> -
> -struct net_device;
> -
> -/* Some generic methods drivers may use in their ethtool_ops */
> -u32 ethtool_op_get_link(struct net_device *dev);
> -u32 ethtool_op_get_tx_csum(struct net_device *dev);
> -int ethtool_op_set_tx_csum(struct net_device *dev, u32 data);
> -int ethtool_op_set_tx_hw_csum(struct net_device *dev, u32 data);
> -int ethtool_op_set_tx_ipv6_csum(struct net_device *dev, u32 data);
> -u32 ethtool_op_get_sg(struct net_device *dev);
> -int ethtool_op_set_sg(struct net_device *dev, u32 data);
> -u32 ethtool_op_get_tso(struct net_device *dev);
> -int ethtool_op_set_tso(struct net_device *dev, u32 data);
> -int ethtool_op_get_perm_addr(struct net_device *dev,
> - struct ethtool_perm_addr *addr, u8 *data);
> -u32 ethtool_op_get_ufo(struct net_device *dev);
> -int ethtool_op_set_ufo(struct net_device *dev, u32 data);
> -
> -/**
> - * ðtool_ops - Alter and report network device settings
> - * get_settings: Get device-specific settings
> - * set_settings: Set device-specific settings
> - * get_drvinfo: Report driver information
> - * get_regs: Get device registers
> - * get_wol: Report whether Wake-on-Lan is enabled
> - * set_wol: Turn Wake-on-Lan on or off
> - * get_msglevel: Report driver message level
> - * set_msglevel: Set driver message level
> - * nway_reset: Restart autonegotiation
> - * get_link: Get link status
> - * get_eeprom: Read data from the device EEPROM
> - * set_eeprom: Write data to the device EEPROM
> - * get_coalesce: Get interrupt coalescing parameters
> - * set_coalesce: Set interrupt coalescing parameters
> - * get_ringparam: Report ring sizes
> - * set_ringparam: Set ring sizes
> - * get_pauseparam: Report pause parameters
> - * set_pauseparam: Set pause paramters
> - * get_rx_csum: Report whether receive checksums are turned on or off
> - * set_rx_csum: Turn receive checksum on or off
> - * get_tx_csum: Report whether transmit checksums are turned on or off
> - * set_tx_csum: Turn transmit checksums on or off
> - * get_sg: Report whether scatter-gather is enabled
> - * set_sg: Turn scatter-gather on or off
> - * get_tso: Report whether TCP segmentation offload is enabled
> - * set_tso: Turn TCP segmentation offload on or off
> - * get_ufo: Report whether UDP fragmentation offload is enabled
> - * set_ufo: Turn UDP fragmentation offload on or off
> - * self_test: Run specified self-tests
> - * get_strings: Return a set of strings that describe the requested objects
> - * phys_id: Identify the device
> - * get_stats: Return statistics about the device
> - * get_perm_addr: Gets the permanent hardware address
> - *
> - * Description:
> +/* boolean flags controlling per-interface behavior characteristics.
> + * When reading, the flag indicates whether or not a certain behavior
> + * is enabled/present. When writing, the flag indicates whether
> + * or not the driver should turn on (set) or off (clear) a behavior.
> *
> - * get_settings:
> - * @get_settings is passed an ðtool_cmd to fill in. It returns
> - * an negative errno or zero.
> - *
> - * set_settings:
> - * @set_settings is passed an ðtool_cmd and should attempt to set
> - * all the settings this device supports. It may return an error value
> - * if something goes wrong (otherwise 0).
> - *
> - * get_eeprom:
> - * Should fill in the magic field. Don't need to check len for zero
> - * or wraparound. Fill in the data argument with the eeprom values
> - * from offset to offset + len. Update len to the amount read.
> - * Returns an error or zero.
> - *
> - * set_eeprom:
> - * Should validate the magic field. Don't need to check len for zero
> - * or wraparound. Update len to the amount written. Returns an error
> - * or zero.
> + * Some behaviors may read-only (unconditionally absent or present).
> + * If such is the case, return EINVAL in the set-flags operation if the
> + * flag differs from the read-only value.
> */
> -struct ethtool_ops {
> - int (*get_settings)(struct net_device *, struct ethtool_cmd *);
> - int (*set_settings)(struct net_device *, struct ethtool_cmd *);
> - void (*get_drvinfo)(struct net_device *, struct ethtool_drvinfo *);
> - int (*get_regs_len)(struct net_device *);
> - void (*get_regs)(struct net_device *, struct ethtool_regs *, void *);
> - void (*get_wol)(struct net_device *, struct ethtool_wolinfo *);
> - int (*set_wol)(struct net_device *, struct ethtool_wolinfo *);
> - u32 (*get_msglevel)(struct net_device *);
> - void (*set_msglevel)(struct net_device *, u32);
> - int (*nway_reset)(struct net_device *);
> - u32 (*get_link)(struct net_device *);
> - int (*get_eeprom_len)(struct net_device *);
> - int (*get_eeprom)(struct net_device *, struct ethtool_eeprom *, u8 *);
> - int (*set_eeprom)(struct net_device *, struct ethtool_eeprom *, u8 *);
> - int (*get_coalesce)(struct net_device *, struct ethtool_coalesce *);
> - int (*set_coalesce)(struct net_device *, struct ethtool_coalesce *);
> - void (*get_ringparam)(struct net_device *, struct ethtool_ringparam *);
> - int (*set_ringparam)(struct net_device *, struct ethtool_ringparam *);
> - void (*get_pauseparam)(struct net_device *, struct ethtool_pauseparam*);
> - int (*set_pauseparam)(struct net_device *, struct ethtool_pauseparam*);
> - u32 (*get_rx_csum)(struct net_device *);
> - int (*set_rx_csum)(struct net_device *, u32);
> - u32 (*get_tx_csum)(struct net_device *);
> - int (*set_tx_csum)(struct net_device *, u32);
> - u32 (*get_sg)(struct net_device *);
> - int (*set_sg)(struct net_device *, u32);
> - u32 (*get_tso)(struct net_device *);
> - int (*set_tso)(struct net_device *, u32);
> - int (*self_test_count)(struct net_device *);
> - void (*self_test)(struct net_device *, struct ethtool_test *, u64 *);
> - void (*get_strings)(struct net_device *, u32 stringset, u8 *);
> - int (*phys_id)(struct net_device *, u32);
> - int (*get_stats_count)(struct net_device *);
> - void (*get_ethtool_stats)(struct net_device *, struct ethtool_stats *, u64 *);
> - int (*get_perm_addr)(struct net_device *, struct ethtool_perm_addr *, u8 *);
> - int (*begin)(struct net_device *);
> - void (*complete)(struct net_device *);
> - u32 (*get_ufo)(struct net_device *);
> - int (*set_ufo)(struct net_device *, u32);
> +enum ethtool_flags {
> + ETH_FLAG_LRO = (1 << 15), /* LRO is enabled */
> };
> -#endif /* __KERNEL__ */
> +
>
> /* CMDs currently supported */
> #define ETHTOOL_GSET 0x00000001 /* Get settings. */
> @@ -414,6 +310,10 @@ struct ethtool_ops {
> #define ETHTOOL_SUFO 0x00000022 /* Set UFO enable (ethtool_value) */
> #define ETHTOOL_GGSO 0x00000023 /* Get GSO enable (ethtool_value) */
> #define ETHTOOL_SGSO 0x00000024 /* Set GSO enable (ethtool_value) */
> +#define ETHTOOL_GFLAGS 0x00000025 /* Get flags bitmap(ethtool_value) */
> +#define ETHTOOL_SFLAGS 0x00000026 /* Set flags bitmap(ethtool_value) */
> +#define ETHTOOL_GPFLAGS 0x00000027 /* Get driver-private flags bitmap */
> +#define ETHTOOL_SPFLAGS 0x00000028 /* Set driver-private flags bitmap */
>
> /* compatibility with older code */
> #define SPARC_ETH_GSET ETHTOOL_GSET
> diff --git a/ethtool.c b/ethtool.c
> index a668b49..5f519bc 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -151,7 +151,9 @@ static struct option {
> " [ sg on|off ]\n"
> " [ tso on|off ]\n"
> " [ ufo on|off ]\n"
> - " [ gso on|off ]\n" },
> + " [ gso on|off ]\n"
> + " [ lro on|off ]\n"
> + },
> { "-i", "--driver", MODE_GDRV, "Show driver information" },
> { "-d", "--register-dump", MODE_GREGS, "Do a register dump",
> " [ raw on|off ]\n"
> @@ -200,6 +202,7 @@ static int off_sg_wanted = -1;
> static int off_tso_wanted = -1;
> static int off_ufo_wanted = -1;
> static int off_gso_wanted = -1;
> +static int off_lro_wanted = -1;
>
> static struct ethtool_pauseparam epause;
> static int gpause_changed = 0;
> @@ -310,6 +313,7 @@ static struct cmdline_info cmdline_offload[] = {
> { "tso", CMDL_BOOL, &off_tso_wanted, NULL },
> { "ufo", CMDL_BOOL, &off_ufo_wanted, NULL },
> { "gso", CMDL_BOOL, &off_gso_wanted, NULL },
> + { "lro", CMDL_BOOL, &off_lro_wanted, NULL },
> };
>
> static struct cmdline_info cmdline_pause[] = {
> @@ -1217,7 +1221,7 @@ static int dump_coalesce(void)
> return 0;
> }
>
> -static int dump_offload (int rx, int tx, int sg, int tso, int ufo, int gso)
> +static int dump_offload(int rx, int tx, int sg, int tso, int ufo, int gso, int lro)
> {
> fprintf(stdout,
> "rx-checksumming: %s\n"
> @@ -1225,13 +1229,15 @@ static int dump_offload (int rx, int tx, int sg, int tso, int ufo, int gso)
> "scatter-gather: %s\n"
> "tcp segmentation offload: %s\n"
> "udp fragmentation offload: %s\n"
> - "generic segmentation offload: %s\n",
> + "generic segmentation offload: %s\n"
> + "large receive offload: %s\n",
> rx ? "on" : "off",
> tx ? "on" : "off",
> sg ? "on" : "off",
> tso ? "on" : "off",
> ufo ? "on" : "off",
> - gso ? "on" : "off");
> + gso ? "on" : "off",
> + lro ? "on" : "off");
>
> return 0;
> }
> @@ -1495,7 +1501,8 @@ static int do_scoalesce(int fd, struct ifreq *ifr)
> static int do_goffload(int fd, struct ifreq *ifr)
> {
> struct ethtool_value eval;
> - int err, allfail = 1, rx = 0, tx = 0, sg = 0, tso = 0, ufo = 0, gso = 0;
> + int err, allfail = 1, rx = 0, tx = 0, sg = 0;
> + int tso = 0, ufo = 0, gso = 0, lro = 0;
>
> fprintf(stdout, "Offload parameters for %s:\n", devname);
>
> @@ -1559,12 +1566,22 @@ static int do_goffload(int fd, struct ifreq *ifr)
> allfail = 0;
> }
>
> + eval.cmd = ETHTOOL_GFLAGS;
> + ifr->ifr_data = (caddr_t)&eval;
> + err = ioctl(fd, SIOCETHTOOL, ifr);
> + if (err) {
> + perror("Cannot get device flags");
> + } else {
> + lro = (eval.data & ETH_FLAG_LRO) != 0;
> + allfail = 0;
> + }
> +
> if (allfail) {
> fprintf(stdout, "no offload info available\n");
> return 83;
> }
>
> - return dump_offload(rx, tx, sg, tso, ufo, gso);
> + return dump_offload(rx, tx, sg, tso, ufo, gso, lro);
> }
>
> static int do_soffload(int fd, struct ifreq *ifr)
> @@ -1641,6 +1658,30 @@ static int do_soffload(int fd, struct ifreq *ifr)
> return 90;
> }
> }
> + if (off_lro_wanted >= 0) {
> + changed = 1;
> + eval.cmd = ETHTOOL_GFLAGS;
> + eval.data = 0;
> + ifr->ifr_data = (caddr_t)&eval;
> + err = ioctl(fd, SIOCETHTOOL, ifr);
> + if (err) {
> + perror("Cannot get device flag settings");
> + return 91;
> + }
> +
> + eval.cmd = ETHTOOL_SFLAGS;
> + if (off_lro_wanted == 1)
> + eval.data |= ETH_FLAG_LRO;
> + else
> + eval.data &= ~ETH_FLAG_LRO;
> +
> + err = ioctl(fd, SIOCETHTOOL, ifr);
> + if (err) {
> + perror("Cannot set large receive offload settings");
> + return 92;
> + }
> + }
> +
> if (!changed) {
> fprintf(stdout, "no offload settings changed\n");
> }
>
what is the status of this patch?
I don't like the fact that you included the removal of a __KERNEL__ section in
this patch - that should have been a separate patch IMHO :)
otherwise I would like to see this patch included.
Auke
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ethtool: command line support for lro
2008-04-30 18:36 ` Kok, Auke
@ 2008-05-02 14:34 ` Ben Hutchings
0 siblings, 0 replies; 21+ messages in thread
From: Ben Hutchings @ 2008-05-02 14:34 UTC (permalink / raw)
To: Kok, Auke; +Cc: Jeff Garzik, netdev, Stephen Hemminger, Kieran Mansley
Kok, Auke wrote:
> I don't like the fact that you included the removal of a __KERNEL__ section in
> this patch - that should have been a separate patch IMHO :)
The changes to ethtool-copy.h can just as well be a separate patch, and can
be done mechanically with:
make -C <kernel-dir> INSTALL_HDR_PATH=$PWD/tmp headers_install
mv tmp/include/linux/ethtool.h ethtool-copy.h
rm -rf tmp
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ethtool: command line support for lro
2008-04-17 12:11 ` Ben Hutchings
2008-04-30 18:36 ` Kok, Auke
@ 2008-09-14 2:09 ` Jeff Garzik
1 sibling, 0 replies; 21+ messages in thread
From: Jeff Garzik @ 2008-09-14 2:09 UTC (permalink / raw)
To: Ben Hutchings; +Cc: netdev, Stephen Hemminger, Kieran Mansley
Ben Hutchings wrote:
> Add lro support to command in similar manner to TSO, GSO, etc.
> The file ethtool-copy.h is updated to be sanitised version of
> ethtool.h from 2.6.25-rc4 (ie make headers_install)
>
> Based on patch by Stephen Hemminger
> <http://article.gmane.org/gmane.linux.network/88124>.
>
> My changes:
> - Changed error return codes for setting LRO to be unique.
> - Report failure to get device flags, consistent with other offload settings.
> - Fixed code spacing.
>
> Tested with the sfc driver.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
applied
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2008-09-14 2:09 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-03-07 14:09 LRO/GSO interaction when packets are forwarded Kieran Mansley
2008-03-07 16:25 ` Stephen Hemminger
2008-03-07 17:06 ` Kieran Mansley
2008-03-07 21:43 ` [PATCH] ethtool: command line support for lro Stephen Hemminger
2008-03-10 18:07 ` Ben Hutchings
2008-03-10 18:29 ` Stephen Hemminger
2008-03-10 18:50 ` Ben Hutchings
2008-04-17 12:11 ` Ben Hutchings
2008-04-30 18:36 ` Kok, Auke
2008-05-02 14:34 ` Ben Hutchings
2008-09-14 2:09 ` Jeff Garzik
2008-03-11 16:50 ` LRO/GSO interaction when packets are forwarded Kieran Mansley
2008-04-22 21:15 ` Ben Hutchings
2008-04-22 23:01 ` Stephen Hemminger
2008-04-23 6:00 ` Jarek Poplawski
2008-04-23 6:15 ` Jarek Poplawski
2008-04-23 10:07 ` Ben Hutchings
2008-04-23 10:38 ` Jarek Poplawski
2008-04-23 10:42 ` David Miller
2008-04-23 11:09 ` Jarek Poplawski
2008-04-23 10:04 ` Ben Hutchings
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).