* Re: [GIT] Networking
From: David Miller @ 2012-12-03 0:32 UTC (permalink / raw)
To: torvalds; +Cc: akpm, netdev, linux-kernel
In-Reply-To: <CA+55aFy1sa=D-DrWqNuvjrLW8J0Tw1+GbZhXv6VHTTTxXG_DMA@mail.gmail.com>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun, 2 Dec 2012 16:13:30 -0800
> David, Willy pointed me to the recent splice crash fix
> (do_tcp_sendpages and non-0-order pages). It's apparently easily
> user-triggerable.. Should I take the patch directly, or do you have a
> tree to pull. Don't want to make a release with a known oopser..
I have a tree to pull. Coming in a few minutes.
^ permalink raw reply
* Re: [GIT] Networking
From: Linus Torvalds @ 2012-12-03 0:13 UTC (permalink / raw)
To: David Miller
Cc: Andrew Morton, Network Development, Linux Kernel Mailing List
In-Reply-To: <20121128.214732.1634269294133625782.davem@davemloft.net>
David, Willy pointed me to the recent splice crash fix
(do_tcp_sendpages and non-0-order pages). It's apparently easily
user-triggerable.. Should I take the patch directly, or do you have a
tree to pull. Don't want to make a release with a known oopser..
Linus
^ permalink raw reply
* Re: [PATCHv4] virtio-spec: virtio network device RFS support
From: Rusty Russell @ 2012-12-02 22:46 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang; +Cc: netdev, kvm, virtualization
In-Reply-To: <20121122144645.GA28284@redhat.com>
"Michael S. Tsirkin" <mst@redhat.com> writes:
> Add RFS support to virtio network device.
> Add a new feature flag VIRTIO_NET_F_RFS for this feature, a new
> configuration field max_virtqueue_pairs to detect supported number of
> virtqueues as well as a new command VIRTIO_NET_CTRL_RFS to program
> packet steering for unidirectional protocols.
Hi Michael,
Sorry for the delay, I took last week off.
> - rename multiqueue -> rfs this is what we support
> - Be more explicit about what driver should do.
> - Simplify layout making VQs functionality depend on feature.
> - Remove unused commands, only leave in programming # of queues
Thanks: this looks really nice now. Comments are about the text, not
the ideas.
> + 2N+1: transmitqN.
> + 2N+
> +\change_unchanged
> +2:controlq
> \begin_inset Foot
> status open
Hmm, controlq after xmit queues... a nice improvement.
> +VIRTIO_NET_F_RFS(2) Device supports Receive Flow Steering.
I think readers would prefer numerical order to historical order here,
so perhaps move this up in the list.
> -layout Two configuration fields are currently defined.
> +layout
> +\change_deleted 1986246365 1352743300
> +Two
> +\change_inserted 1986246365 1352743301
> +Four
> +\change_unchanged
> + configuration fields are currently defined.
Two to four? I only see three. And you didn't update the structure to
match...
> + Following this, driver should not transmit new packets on virtqueues other
> + than transmitq0 and device will not steer new packets on virtqueues other
> + than receiveq0.
"Following this" is vague. After the buffer is consumed by the device.
"Should not" is kind of meaningless. Let's make it clear: the device will
not steer new packets to RxqN, nor read from TxqN.
You should probably put in a note about the RFS control in the Device
Initialization section, too, i.e. if you have negotiated and want to use
more queues, you must initialize them, then wait for the ack of the
CTRL_RFS cmd.
Note: the following hunks didn't apply, but I'm not sure why they're in
this anyway...
> @@ -6152,13 +6385,7 @@ Virtqueues 0:receiveq(port0).
> status open
>
> \begin_layout Plain Layout
> -Ports
> -\change_inserted 1986246365 1347188327
> -1
> -\change_deleted 1986246365 1347188327
> -2
> -\change_unchanged
> - onwards only if VIRTIO_CONSOLE_F_MULTIPORT is set
> +Ports 12 onwards only if VIRTIO_CONSOLE_F_MULTIPORT is set
> \end_layout
>
> \end_inset
> @@ -6185,13 +6412,8 @@ VIRTIO_CONSOLE_F_SIZE
>
> \begin_layout Description
> VIRTIO_CONSOLE_F_MULTIPORT(1) Device has support for multiple ports; configurati
> -on fields nr_ports and max_nr_ports are valid
> -\change_inserted 1986246365 1347188404
> -; if this bit is negotiated,
> -\change_deleted 1986246365 1347188406
> - and
> -\change_unchanged
> - control virtqueues will be used.
> +on fields nr_ports and max_nr_ports are valid; if this bit is negotiated,
> + and control virtqueues will be used.
> \end_layout
>
> \end_deeper
> @@ -6260,8 +6482,7 @@ If the VIRTIO_CONSOLE_F_MULTIPORT feature is negotiated, the driver can
> spawn multiple ports, not all of which may be attached to a console.
> Some could be generic ports.
> In this case, the control virtqueues are enabled and according to the max_nr_po
> -rts configuration-space value, an appropriate number of virtqueues are
> - created.
> +rts configuration-space value, an appropriate number of virtqueues are created.
> A control message indicating the driver is ready is sent to the host.
> The host can then send control messages for adding new ports to the device.
> After creating and initializing each port, a VIRTIO_CONSOLE_PORT_READY
> @@ -6699,14 +6920,9 @@ The driver constructs an array of addresses of memory pages it has previously
> \end_layout
>
> \begin_layout Enumerate
> -If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is
> -\change_inserted 1986246365 1347188540
> -negotiated
> -\change_deleted 1986246365 1347188542
> -set
> -\change_unchanged
> -, the guest may not use these requested pages until that descriptor in the
> - deflateq has been used by the device.
> +If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is negotiatedset, the guest
> + may not use these requested pages until that descriptor in the deflateq
> + has been used by the device.
> \end_layout
>
> \begin_layout Enumerate
Cheers,
Rusty.
^ permalink raw reply
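The virtqueue numbering quoted in the review above (transmitqN at 2N+1, controlq at 2N+2) can be sketched as index arithmetic. This is only an illustration: the helper names are hypothetical, and placing receiveqN at 2N and counting pairs from 0 are assumptions inferred from the quoted fragment, not text taken from the spec draft.

```c
/*
 * Sketch of the virtqueue layout discussed in the review.
 * Assumptions (not from the spec draft): receiveqN sits at index 2N,
 * pairs are numbered from 0, and the control queue follows the last pair.
 * All function names are hypothetical.
 */

/* Index of receiveqN (assumed to be 2N). */
unsigned int rfs_receiveq_index(unsigned int n)
{
	return 2 * n;
}

/* Index of transmitqN (2N+1, as quoted in the patch). */
unsigned int rfs_transmitq_index(unsigned int n)
{
	return 2 * n + 1;
}

/* Index of controlq, where last_pair is the highest pair number N. */
unsigned int rfs_controlq_index(unsigned int last_pair)
{
	return 2 * last_pair + 2;
}
```

With four queue pairs (N = 3), this puts transmitq3 at index 7 and controlq at index 8, i.e. the control queue comes after all transmit queues, which is the ordering the review comments on.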
* Re: Optics (SFP) monitoring on ixgbe and igbe
From: Aurélien @ 2012-12-02 22:00 UTC (permalink / raw)
To: Ben Hutchings; +Cc: netdev
In-Reply-To: <CAPN4dA_Ar8rnWE14Dq6uJHhrMY9ttEE4XcWg88jcVe4zc=_e8g@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 571 bytes --]
On Sun, Dec 2, 2012 at 10:47 PM, Aurélien <footplus@gmail.com> wrote:
>>
>> This version drops the -lm completely, so it doesn't link. Maybe you
>> edited the generated Makefile or Makefile.in?
>
> No, I just stupidly forgot to make distclean & autogen after removing
> all libm checks. Re-added AC_CHECK_LIB to link with it.
>
Just after re-reading this, I thought it was silly; I instead edited
Makefile.am and reverted the configure.ac change. Here's a new
full-patch with that fix along with the rest.
Sorry for the noise.
--
Aurélien Guillaume
[-- Attachment #2: 0001-Implemented-basic-optics-diagnostics-for-SFF-8472.patch --]
[-- Type: application/octet-stream, Size: 22479 bytes --]
From c6e58988c0c30123f78e0ae83730697bb2b159d0 Mon Sep 17 00:00:00 2001
From: Aurelien Guillaume <aurelien@iwi.me>
Date: Sun, 2 Dec 2012 21:21:01 +0100
Subject: [PATCH] Implemented basic optics diagnostics for SFF-8472
Signed-off-by: Aurelien Guillaume <aurelien@iwi.me>
---
Makefile.am | 3 +-
ethtool.8.in | 8 +-
ethtool.c | 17 +++-
internal.h | 3 +
sfpdiag.c | 362 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
sfpid.c | 28 +++--
test-cmdline.c | 2 +
7 files changed, 404 insertions(+), 19 deletions(-)
create mode 100644 sfpdiag.c
diff --git a/Makefile.am b/Makefile.am
index e33f71f..ba1faa6 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -1,4 +1,5 @@
AM_CFLAGS = -Wall
+LDADD = -lm
man_MANS = ethtool.8
EXTRA_DIST = LICENSE ethtool.8 ethtool.spec.in aclocal.m4 ChangeLog autogen.sh
@@ -9,7 +10,7 @@ ethtool_SOURCES = ethtool.c ethtool-copy.h internal.h net_tstamp-copy.h \
fec_8xx.c ibm_emac.c ixgb.c ixgbe.c natsemi.c \
pcnet32.c realtek.c tg3.c marvell.c vioc.c \
smsc911x.c at76c50x-usb.c sfc.c stmmac.c \
- rxclass.c sfpid.c
+ rxclass.c sfpid.c sfpdiag.c
TESTS = test-cmdline test-features
check_PROGRAMS = test-cmdline test-features
diff --git a/ethtool.8.in b/ethtool.8.in
index a3c7fbb..e701919 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -312,7 +312,7 @@ ethtool \- query or control network driver and hardware settings
.BN other
.BN combined
.HP
-.B ethtool \-m|\-\-dump\-module\-eeprom
+.B ethtool \-m|\-\-dump\-module\-eeprom|\-\-module\-info
.I devname
.B2 raw on off
.B2 hex on off
@@ -815,8 +815,10 @@ Changes the number of channels used only for other purposes e.g. link interrupts
.BI combined \ N
Changes the number of multi-purpose channels.
.TP
-.B \-m \-\-dump\-module\-eeprom
-Retrieves and if possible decodes the EEPROM from plugin modules, e.g SFP+, QSFP
+.B \-m \-\-dump\-module\-eeprom \-\-module\-info
+Retrieves and, if possible, decodes the EEPROM from plugin modules, e.g. SFP+, QSFP.
+If the driver and module support it, the optical diagnostic information is also
+read and decoded.
.TP
.B \-\-show\-priv\-flags
Queries the specified network device for its private flags. The
diff --git a/ethtool.c b/ethtool.c
index 3db7fec..345c21c 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -3604,6 +3604,16 @@ static int do_getmodule(struct cmd_context *ctx)
return 1;
}
+ /*
+ * SFF-8079 EEPROM layout contains the memory available at A0 address on
+ * the PHY EEPROM.
+ * SFF-8472 defines a virtual extension of the EEPROM, where the
+ * microcontroller on the SFP/SFP+ generates a page at the A2 address,
+ * which contains data related to optical diagnostics.
+ * The current kernel implementation returns a blob, which contains:
+ * - ETH_MODULE_SFF_8079 => The A0 page only.
+ * - ETH_MODULE_SFF_8472 => The A0 and A2 pages concatenated.
+ */
if (geeprom_dump_raw) {
fwrite(eeprom->data, 1, eeprom->len, stdout);
} else {
@@ -3613,8 +3623,11 @@ static int do_getmodule(struct cmd_context *ctx)
} else if (!geeprom_dump_hex) {
switch (modinfo.type) {
case ETH_MODULE_SFF_8079:
+ sff8079_show_all(eeprom->data);
+ break;
case ETH_MODULE_SFF_8472:
sff8079_show_all(eeprom->data);
+ sff8472_show_all(eeprom->data);
break;
default:
geeprom_dump_hex = 1;
@@ -3831,8 +3844,8 @@ static const struct option {
{ "--show-priv-flags" , 1, do_gprivflags, "Query private flags" },
{ "--set-priv-flags", 1, do_sprivflags, "Set private flags",
" FLAG on|off ...\n" },
- { "-m|--dump-module-eeprom", 1, do_getmodule,
- "Qeuery/Decode Module EEPROM information",
+ { "-m|--dump-module-eeprom|--module-info", 1, do_getmodule,
+ "Query/Decode Module EEPROM information and optical diagnostics if available",
" [ raw on|off ]\n"
" [ hex on|off ]\n"
" [ offset N ]\n"
diff --git a/internal.h b/internal.h
index 4f96fd5..e977a81 100644
--- a/internal.h
+++ b/internal.h
@@ -253,4 +253,7 @@ int rxclass_rule_del(struct cmd_context *ctx, __u32 loc);
/* Module EEPROM parsing code */
void sff8079_show_all(const __u8 *id);
+/* Optics diagnostics */
+void sff8472_show_all(const __u8 *id);
+
#endif /* ETHTOOL_INTERNAL_H__ */
diff --git a/sfpdiag.c b/sfpdiag.c
new file mode 100644
index 0000000..f67e491
--- /dev/null
+++ b/sfpdiag.c
@@ -0,0 +1,362 @@
+/*
+ * sfpdiag.c: Implements SFF-8472 optics diagnostics.
+ *
+ * Aurelien Guillaume <aurelien@iwi.me> (C) 2012
+ * This implementation is loosely based on DOM patches
+ * from Robert Olsson <robert@herjulf.se> (C) 2009
+ * and SFF-8472 specs (ftp://ftp.seagate.com/pub/sff/SFF-8472.PDF)
+ * by SFF Committee.
+ */
+
+#include <stdio.h>
+#include <math.h>
+#include <arpa/inet.h>
+#include "internal.h"
+
+/* Offsets in decimal, for direct comparison with the SFF specs */
+
+/* A0-based EEPROM offsets for DOM support checks */
+#define SFF_A0_DOM 92
+#define SFF_A0_OPTIONS 93
+#define SFF_A0_COMP 94
+
+/* EEPROM bit values for various registers */
+#define SFF_A0_DOM_EXTCAL (1 << 4)
+#define SFF_A0_DOM_INTCAL (1 << 5)
+#define SFF_A0_DOM_IMPL (1 << 6)
+#define SFF_A0_DOM_PWRT (1 << 3)
+
+#define SFF_A0_OPTIONS_AW (1 << 7)
+
+/*
+ * See ethtool.c comments about SFF-8472, this is the offset
+ * at which the A2 page is in the EEPROM blob returned by the
+ * kernel.
+ */
+#define SFF_A2_BASE 0x100
+
+/* A2-based offsets for DOM */
+#define SFF_A2_TEMP 96
+#define SFF_A2_TEMP_HALRM 0
+#define SFF_A2_TEMP_LALRM 2
+#define SFF_A2_TEMP_HWARN 4
+#define SFF_A2_TEMP_LWARN 6
+
+#define SFF_A2_VCC 98
+#define SFF_A2_VCC_HALRM 8
+#define SFF_A2_VCC_LALRM 10
+#define SFF_A2_VCC_HWARN 12
+#define SFF_A2_VCC_LWARN 14
+
+#define SFF_A2_BIAS 100
+#define SFF_A2_BIAS_HALRM 16
+#define SFF_A2_BIAS_LALRM 18
+#define SFF_A2_BIAS_HWARN 20
+#define SFF_A2_BIAS_LWARN 22
+
+#define SFF_A2_TX_PWR 102
+#define SFF_A2_TX_PWR_HALRM 24
+#define SFF_A2_TX_PWR_LALRM 26
+#define SFF_A2_TX_PWR_HWARN 28
+#define SFF_A2_TX_PWR_LWARN 30
+
+#define SFF_A2_RX_PWR 104
+#define SFF_A2_RX_PWR_HALRM 32
+#define SFF_A2_RX_PWR_LALRM 34
+#define SFF_A2_RX_PWR_HWARN 36
+#define SFF_A2_RX_PWR_LWARN 38
+
+#define SFF_A2_ALRM_FLG 112
+#define SFF_A2_WARN_FLG 116
+
+/* 32-bit big-endian IEEE-754 calibration constants */
+#define SFF_A2_CAL_RXPWR4 56
+#define SFF_A2_CAL_RXPWR3 60
+#define SFF_A2_CAL_RXPWR2 64
+#define SFF_A2_CAL_RXPWR1 68
+#define SFF_A2_CAL_RXPWR0 72
+
+/* 16-bit big-endian calibration constants */
+#define SFF_A2_CAL_TXI_SLP 76
+#define SFF_A2_CAL_TXI_OFF 78
+#define SFF_A2_CAL_TXPWR_SLP 80
+#define SFF_A2_CAL_TXPWR_OFF 82
+#define SFF_A2_CAL_T_SLP 84
+#define SFF_A2_CAL_T_OFF 86
+#define SFF_A2_CAL_V_SLP 88
+#define SFF_A2_CAL_V_OFF 90
+
+
+struct sff8472_diags {
+
+#define MCURR 0
+#define LWARN 1
+#define HWARN 2
+#define LALRM 3
+#define HALRM 4
+
+ /* [5] tables are current, low/high warn, low/high alarm */
+ __u8 supports_dom; /* Supports DOM */
+ __u8 supports_alarms; /* Supports alarm/warning thold */
+ __u8 calibrated_ext; /* Is externally calibrated */
+ __u16 bias_cur[5]; /* Measured bias current in 2uA units */
+ __u16 tx_power[5]; /* Measured TX Power in 0.1uW units */
+ __u16 rx_power[5]; /* Measured RX Power */
+ __u8 rx_power_type; /* 0 = OMA, 1 = Average power */
+ __s16 sfp_temp[5]; /* SFP Temp in 16-bit signed 1/256 Celsius */
+ __u16 sfp_voltage[5]; /* SFP voltage in 0.1mV units */
+
+};
+
+static struct sff8472_aw_flags {
+ const char *str; /* Human-readable string, null at the end */
+ int offset; /* A2-relative address offset */
+ __u8 value; /* Flag is on if (byte at offset & value) != 0. */
+} sff8472_aw_flags[] = {
+ { "Laser bias current high alarm", SFF_A2_ALRM_FLG, (1 << 3) },
+ { "Laser bias current low alarm", SFF_A2_ALRM_FLG, (1 << 2) },
+ { "Laser bias current high warning", SFF_A2_WARN_FLG, (1 << 3) },
+ { "Laser bias current low warning", SFF_A2_WARN_FLG, (1 << 2) },
+
+ { "Laser output power high alarm", SFF_A2_ALRM_FLG, (1 << 1) },
+ { "Laser output power low alarm", SFF_A2_ALRM_FLG, (1 << 0) },
+ { "Laser output power high warning", SFF_A2_WARN_FLG, (1 << 1) },
+ { "Laser output power low warning", SFF_A2_WARN_FLG, (1 << 0) },
+
+ { "Module temperature high alarm", SFF_A2_ALRM_FLG, (1 << 7) },
+ { "Module temperature low alarm", SFF_A2_ALRM_FLG, (1 << 6) },
+ { "Module temperature high warning", SFF_A2_WARN_FLG, (1 << 7) },
+ { "Module temperature low warning", SFF_A2_WARN_FLG, (1 << 6) },
+
+ { "Module voltage high alarm", SFF_A2_ALRM_FLG, (1 << 5) },
+ { "Module voltage low alarm", SFF_A2_ALRM_FLG, (1 << 4) },
+ { "Module voltage high warning", SFF_A2_WARN_FLG, (1 << 5) },
+ { "Module voltage low warning", SFF_A2_WARN_FLG, (1 << 4) },
+
+ { "Laser rx power high alarm", SFF_A2_ALRM_FLG + 1, (1 << 7) },
+ { "Laser rx power low alarm", SFF_A2_ALRM_FLG + 1, (1 << 6) },
+ { "Laser rx power high warning", SFF_A2_WARN_FLG + 1, (1 << 7) },
+ { "Laser rx power low warning", SFF_A2_WARN_FLG + 1, (1 << 6) },
+
+ { NULL, 0, 0 },
+};
+
+static double convert_mw_to_dbm(double mw)
+{
+ return (10. * log10(mw / 1000.)) + 30.;
+}
+
+
+/* Most common case: 16-bit unsigned integer in a certain unit */
+#define A2_OFFSET_TO_U16(offset) \
+ (id[SFF_A2_BASE + (offset)] << 8 | id[SFF_A2_BASE + (offset) + 1])
+
+/* Calibration slope is a number between 0.0 included and 256.0 excluded. */
+#define A2_OFFSET_TO_SLP(offset) \
+ (id[SFF_A2_BASE + (offset)] + id[SFF_A2_BASE + (offset) + 1] / 256.)
+
+/* Calibration offset is an integer from -32768 to 32767 */
+#define A2_OFFSET_TO_OFF(offset) \
+ ((__s16)A2_OFFSET_TO_U16(offset))
+
+/* RXPWR(x) are IEEE-754 floating point numbers in big-endian format */
+#define A2_OFFSET_TO_RXPWRx(offset) \
+ (befloattoh((__u32 *)(id + SFF_A2_BASE + (offset))))
+
+/*
+ * 2-byte internal temperature conversions:
+ * First byte is a signed 8-bit integer, which is the whole-degree part.
+ * Second byte is in 1/256ths of a degree, added to the whole-degree part.
+ */
+#define A2_OFFSET_TO_TEMP(offset) ((__s16)A2_OFFSET_TO_U16(offset))
+
+
+static void sff8472_dom_parse(const __u8 *id, struct sff8472_diags *sd)
+{
+
+ sd->bias_cur[MCURR] = A2_OFFSET_TO_U16(SFF_A2_BIAS);
+ sd->bias_cur[HALRM] = A2_OFFSET_TO_U16(SFF_A2_BIAS_HALRM);
+ sd->bias_cur[LALRM] = A2_OFFSET_TO_U16(SFF_A2_BIAS_LALRM);
+ sd->bias_cur[HWARN] = A2_OFFSET_TO_U16(SFF_A2_BIAS_HWARN);
+ sd->bias_cur[LWARN] = A2_OFFSET_TO_U16(SFF_A2_BIAS_LWARN);
+
+ sd->sfp_voltage[MCURR] = A2_OFFSET_TO_U16(SFF_A2_VCC);
+ sd->sfp_voltage[HALRM] = A2_OFFSET_TO_U16(SFF_A2_VCC_HALRM);
+ sd->sfp_voltage[LALRM] = A2_OFFSET_TO_U16(SFF_A2_VCC_LALRM);
+ sd->sfp_voltage[HWARN] = A2_OFFSET_TO_U16(SFF_A2_VCC_HWARN);
+ sd->sfp_voltage[LWARN] = A2_OFFSET_TO_U16(SFF_A2_VCC_LWARN);
+
+ sd->tx_power[MCURR] = A2_OFFSET_TO_U16(SFF_A2_TX_PWR);
+ sd->tx_power[HALRM] = A2_OFFSET_TO_U16(SFF_A2_TX_PWR_HALRM);
+ sd->tx_power[LALRM] = A2_OFFSET_TO_U16(SFF_A2_TX_PWR_LALRM);
+ sd->tx_power[HWARN] = A2_OFFSET_TO_U16(SFF_A2_TX_PWR_HWARN);
+ sd->tx_power[LWARN] = A2_OFFSET_TO_U16(SFF_A2_TX_PWR_LWARN);
+
+ sd->rx_power[MCURR] = A2_OFFSET_TO_U16(SFF_A2_RX_PWR);
+ sd->rx_power[HALRM] = A2_OFFSET_TO_U16(SFF_A2_RX_PWR_HALRM);
+ sd->rx_power[LALRM] = A2_OFFSET_TO_U16(SFF_A2_RX_PWR_LALRM);
+ sd->rx_power[HWARN] = A2_OFFSET_TO_U16(SFF_A2_RX_PWR_HWARN);
+ sd->rx_power[LWARN] = A2_OFFSET_TO_U16(SFF_A2_RX_PWR_LWARN);
+
+ sd->sfp_temp[MCURR] = A2_OFFSET_TO_TEMP(SFF_A2_TEMP);
+ sd->sfp_temp[HALRM] = A2_OFFSET_TO_TEMP(SFF_A2_TEMP_HALRM);
+ sd->sfp_temp[LALRM] = A2_OFFSET_TO_TEMP(SFF_A2_TEMP_LALRM);
+ sd->sfp_temp[HWARN] = A2_OFFSET_TO_TEMP(SFF_A2_TEMP_HWARN);
+ sd->sfp_temp[LWARN] = A2_OFFSET_TO_TEMP(SFF_A2_TEMP_LWARN);
+
+}
+
+/* Converts to a float from a big-endian 4-byte source buffer. */
+static float befloattoh(const __u32 *source)
+{
+ union {
+ __u32 src;
+ float dst;
+ } converter;
+
+ converter.src = ntohl(*source);
+ return converter.dst;
+}
+
+static void sff8472_calibration(const __u8 *id, struct sff8472_diags *sd)
+{
+ int i;
+ __u16 rx_reading;
+
+ /* Calibration should occur for all values (threshold and current) */
+ for (i = 0; i < sizeof(sd->bias_cur) / sizeof(sd->bias_cur[0]); ++i) {
+ /*
+ * Apply calibration formula 1 (Temp., Voltage, Bias, Tx Power)
+ */
+ sd->bias_cur[i] *= A2_OFFSET_TO_SLP(SFF_A2_CAL_TXI_SLP);
+ sd->tx_power[i] *= A2_OFFSET_TO_SLP(SFF_A2_CAL_TXPWR_SLP);
+ sd->sfp_voltage[i] *= A2_OFFSET_TO_SLP(SFF_A2_CAL_V_SLP);
+ sd->sfp_temp[i] *= A2_OFFSET_TO_SLP(SFF_A2_CAL_T_SLP);
+
+ sd->bias_cur[i] += A2_OFFSET_TO_OFF(SFF_A2_CAL_TXI_OFF);
+ sd->tx_power[i] += A2_OFFSET_TO_OFF(SFF_A2_CAL_TXPWR_OFF);
+ sd->sfp_voltage[i] += A2_OFFSET_TO_OFF(SFF_A2_CAL_V_OFF);
+ sd->sfp_temp[i] += A2_OFFSET_TO_OFF(SFF_A2_CAL_T_OFF);
+
+ /*
+ * Apply calibration formula 2 (Rx Power only)
+ */
+ rx_reading = sd->rx_power[i];
+ sd->rx_power[i] = A2_OFFSET_TO_RXPWRx(SFF_A2_CAL_RXPWR0);
+ sd->rx_power[i] += rx_reading *
+ A2_OFFSET_TO_RXPWRx(SFF_A2_CAL_RXPWR1);
+ sd->rx_power[i] += rx_reading *
+ A2_OFFSET_TO_RXPWRx(SFF_A2_CAL_RXPWR2);
+ sd->rx_power[i] += rx_reading *
+ A2_OFFSET_TO_RXPWRx(SFF_A2_CAL_RXPWR3);
+ }
+}
+
+static void sff8472_parse_eeprom(const __u8 *id, struct sff8472_diags *sd)
+{
+ sd->supports_dom = id[SFF_A0_DOM] & SFF_A0_DOM_IMPL;
+ sd->supports_alarms = id[SFF_A0_OPTIONS] & SFF_A0_OPTIONS_AW;
+ sd->calibrated_ext = id[SFF_A0_DOM] & SFF_A0_DOM_EXTCAL;
+ sd->rx_power_type = id[SFF_A0_DOM] & SFF_A0_DOM_PWRT;
+
+ sff8472_dom_parse(id, sd);
+
+ /*
+ * If the SFP is externally calibrated, we need to read calibration data
+ * and compensate the already stored readings.
+ */
+ if (sd->calibrated_ext)
+ sff8472_calibration(id, sd);
+}
+
+void sff8472_show_all(const __u8 *id)
+{
+ struct sff8472_diags sd;
+ char *rx_power_string = NULL;
+ int i;
+
+ sff8472_parse_eeprom(id, &sd);
+
+ if (!sd.supports_dom) {
+ printf("\t%-41s : No\n", "Optical diagnostics support");
+ return ;
+ }
+ printf("\t%-41s : Yes\n", "Optical diagnostics support");
+
+#define PRINT_BIAS(string, index) \
+ printf("\t%-41s : %.3f mA\n", (string), \
+ (double)(sd.bias_cur[(index)] / 500.))
+
+# define PRINT_xX_PWR(string, var, index) \
+ printf("\t%-41s : %.4f mW / %.2f dBm\n", (string), \
+ (double)((var)[(index)] / 10000.), \
+ convert_mw_to_dbm((double)((var)[(index)] / 10000.)))
+
+#define PRINT_TEMP(string, index) \
+ printf("\t%-41s : %.2f degrees C / %.2f degrees F\n", (string), \
+ (double)(sd.sfp_temp[(index)] / 256.), \
+ (double)(sd.sfp_temp[(index)] / 256. * 1.8 + 32.))
+
+#define PRINT_VCC(string, index) \
+ printf("\t%-41s : %.4f V\n", (string), \
+ (double)(sd.sfp_voltage[(index)] / 10000.))
+
+ PRINT_BIAS("Laser bias current", MCURR);
+ PRINT_xX_PWR("Laser output power", sd.tx_power, MCURR);
+
+ if (!sd.rx_power_type)
+ rx_power_string = "Receiver signal OMA";
+ else
+ rx_power_string = "Receiver signal average optical power";
+
+ PRINT_xX_PWR(rx_power_string, sd.rx_power, MCURR);
+
+ PRINT_TEMP("Module temperature", MCURR);
+ PRINT_VCC("Module voltage", MCURR);
+
+ printf("\t%-41s : %s\n", "Alarm/warning flags implemented",
+ (sd.supports_alarms ? "Yes" : "No"));
+ if (sd.supports_alarms) {
+
+ for (i = 0; sff8472_aw_flags[i].str; ++i) {
+ printf("\t%-41s : %s\n", sff8472_aw_flags[i].str,
+ id[SFF_A2_BASE + sff8472_aw_flags[i].offset]
+ & sff8472_aw_flags[i].value ? "On" : "Off");
+ }
+
+ PRINT_BIAS("Laser bias current high alarm threshold", HALRM);
+ PRINT_BIAS("Laser bias current low alarm threshold", LALRM);
+ PRINT_BIAS("Laser bias current high warning threshold", HWARN);
+ PRINT_BIAS("Laser bias current low warning threshold", LWARN);
+
+ PRINT_xX_PWR("Laser output power high alarm threshold",
+ sd.tx_power, HALRM);
+ PRINT_xX_PWR("Laser output power low alarm threshold",
+ sd.tx_power, LALRM);
+ PRINT_xX_PWR("Laser output power high warning threshold",
+ sd.tx_power, HWARN);
+ PRINT_xX_PWR("Laser output power low warning threshold",
+ sd.tx_power, LWARN);
+
+ PRINT_TEMP("Module temperature high alarm threshold", HALRM);
+ PRINT_TEMP("Module temperature low alarm threshold", LALRM);
+ PRINT_TEMP("Module temperature high warning threshold", HWARN);
+ PRINT_TEMP("Module temperature low warning threshold", LWARN);
+
+ PRINT_VCC("Module voltage high alarm threshold", HALRM);
+ PRINT_VCC("Module voltage low alarm threshold", LALRM);
+ PRINT_VCC("Module voltage high warning threshold", HWARN);
+ PRINT_VCC("Module voltage low warning threshold", LWARN);
+
+ PRINT_xX_PWR("Laser rx power high alarm threshold",
+ sd.rx_power, HALRM);
+ PRINT_xX_PWR("Laser rx power low alarm threshold",
+ sd.rx_power, LALRM);
+ PRINT_xX_PWR("Laser rx power high warning threshold",
+ sd.rx_power, HWARN);
+ PRINT_xX_PWR("Laser rx power low warning threshold",
+ sd.rx_power, LWARN);
+ }
+
+}
+
diff --git a/sfpid.c b/sfpid.c
index a4a671d..4f88aa2 100644
--- a/sfpid.c
+++ b/sfpid.c
@@ -12,7 +12,7 @@
static void sff8079_show_identifier(const __u8 *id)
{
- printf("\tIdentifier : 0x%02x", id[0]);
+ printf("\t%-41s : 0x%02x", "Identifier", id[0]);
switch (id[0]) {
case 0x00:
printf(" (no module present, unknown, or unspecified)\n");
@@ -34,7 +34,7 @@ static void sff8079_show_identifier(const __u8 *id)
static void sff8079_show_ext_identifier(const __u8 *id)
{
- printf("\tExtended identifier : 0x%02x", id[1]);
+ printf("\t%-41s : 0x%02x", "Extended identifier", id[1]);
if (id[1] == 0x00)
printf(" (GBIC not specified / not MOD_DEF compliant)\n");
else if (id[1] == 0x04)
@@ -47,7 +47,7 @@ static void sff8079_show_ext_identifier(const __u8 *id)
static void sff8079_show_connector(const __u8 *id)
{
- printf("\tConnector : 0x%02x", id[2]);
+ printf("\t%-41s : 0x%02x", "Connector", id[2]);
switch (id[2]) {
case 0x00:
printf(" (unknown or unspecified)\n");
@@ -105,10 +105,12 @@ static void sff8079_show_connector(const __u8 *id)
static void sff8079_show_transceiver(const __u8 *id)
{
- static const char *pfx = "\t : =>";
+ static const char *pfx =
+ "\tTransceiver type :";
- printf("\tTransceiver codes : 0x%02x 0x%02x 0x%02x" \
+ printf("\t%-41s : 0x%02x 0x%02x 0x%02x " \
"0x%02x 0x%02x 0x%02x 0x%02x 0x%02x\n",
+ "Transceiver codes",
id[3], id[4], id[5], id[6],
id[7], id[8], id[9], id[10]);
/* 10G Ethernet Compliance Codes */
@@ -239,7 +241,7 @@ static void sff8079_show_transceiver(const __u8 *id)
static void sff8079_show_encoding(const __u8 *id)
{
- printf("\tEncoding : 0x%02x", id[11]);
+ printf("\t%-41s : 0x%02x", "Encoding", id[11]);
switch (id[11]) {
case 0x00:
printf(" (unspecified)\n");
@@ -270,7 +272,7 @@ static void sff8079_show_encoding(const __u8 *id)
static void sff8079_show_rate_identifier(const __u8 *id)
{
- printf("\tRate identifier : 0x%02x", id[13]);
+ printf("\t%-41s : 0x%02x", "Rate identifier", id[13]);
switch (id[13]) {
case 0x00:
printf(" (unspecified)\n");
@@ -295,14 +297,14 @@ static void sff8079_show_rate_identifier(const __u8 *id)
static void sff8079_show_oui(const __u8 *id)
{
- printf("\tVendor OUI : %02x:%02x:%02x\n",
+ printf("\t%-41s : %02x:%02x:%02x\n", "Vendor OUI",
id[37], id[38], id[39]);
}
static void sff8079_show_wavelength_or_copper_compliance(const __u8 *id)
{
if (id[8] & (1 << 2)) {
- printf("\tPassive Cu cmplnce. : 0x%02x", id[60]);
+ printf("\t%-41s : 0x%02x", "Passive Cu cmplnce.", id[60]);
switch (id[60]) {
case 0x00:
printf(" (unspecified)");
@@ -316,7 +318,7 @@ static void sff8079_show_wavelength_or_copper_compliance(const __u8 *id)
}
printf(" [SFF-8472 rev10.4 only]\n");
} else if (id[8] & (1 << 3)) {
- printf("\tActive Cu cmplnce. : 0x%02x", id[60]);
+ printf("\t%-41s : 0x%02x", "Active Cu cmplnce.", id[60]);
switch (id[60]) {
case 0x00:
printf(" (unspecified)");
@@ -333,7 +335,7 @@ static void sff8079_show_wavelength_or_copper_compliance(const __u8 *id)
}
printf(" [SFF-8472 rev10.4 only]\n");
} else {
- printf("\tLaser wavelength : %unm\n",
+ printf("\t%-41s : %unm\n", "Laser wavelength",
(id[60] << 8) | id[61]);
}
}
@@ -344,7 +346,7 @@ static void sff8079_show_value_with_unit(const __u8 *id, unsigned int reg,
{
unsigned int val = id[reg];
- printf("\t%-20s: %u%s\n", name, val * mult, unit);
+ printf("\t%-41s : %u%s\n", name, val * mult, unit);
}
static void sff8079_show_ascii(const __u8 *id, unsigned int first_reg,
@@ -352,7 +354,7 @@ static void sff8079_show_ascii(const __u8 *id, unsigned int first_reg,
{
unsigned int reg, val;
- printf("\t%-20s: ", name);
+ printf("\t%-41s : ", name);
for (reg = first_reg; reg <= last_reg; reg++) {
val = id[reg];
putchar(((val >= 32) && (val <= 126)) ? val : '_');
diff --git a/test-cmdline.c b/test-cmdline.c
index 85b4ce0..f1d4555 100644
--- a/test-cmdline.c
+++ b/test-cmdline.c
@@ -213,6 +213,8 @@ static struct test_case {
{ 0, "-m devname" },
{ 1, "--dump-module-eeprom" },
{ 0, "--dump-module-eeprom devname" },
+ { 1, "--module-info" },
+ { 0, "--module-info devname" },
{ 0, "-m devname raw on" },
{ 0, "-m devname raw off" },
{ 0, "-m devname hex on" },
--
1.7.0.4
^ permalink raw reply related
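For comparison with the Rx power step in sfpdiag.c, SFF-8472 describes external Rx power calibration as a fourth-order polynomial in the raw 16-bit reading, with coefficients Rx_PWR(4) down to Rx_PWR(0). A minimal sketch of that evaluation using Horner's method follows. The function name and the coefficient array layout are hypothetical, and decoding the coefficients from the EEPROM blob is left out.

```c
#include <stdint.h>

/*
 * Sketch only: evaluates
 *   coeff[4]*x^4 + coeff[3]*x^3 + coeff[2]*x^2 + coeff[1]*x + coeff[0]
 * for a raw 16-bit reading x, using Horner's method.
 * coeff[k] is assumed to hold the already-decoded Rx_PWR(k) constant;
 * the name and array layout are illustrative, not from the patch.
 */
double sff_rx_power_calibrate(uint16_t reading, const float coeff[5])
{
	double acc = coeff[4];
	int k;

	/* Work in double so high powers of the reading cannot overflow. */
	for (k = 3; k >= 0; --k)
		acc = acc * reading + coeff[k];

	return acc;
}
```

The result is in the same 0.1 uW units as the uncalibrated reading, so a caller that stores it back into a 16-bit field would need to clamp it first.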
* [PATCH net-next] tcp: don't abort splice() after small transfers
From: Eric Dumazet @ 2012-12-02 21:49 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Willy Tarreau
From: Willy Tarreau <w@1wt.eu>
TCP coalescing added a regression in splice(socket->pipe) performance
for some workloads, because of the way tcp_read_sock() is implemented.
The reason for this is the break when (offset + 1 != skb->len).
As we released the socket lock, this condition is possible if TCP stack
added a fragment to the skb, which can happen with TCP coalescing.
So let's go back to the beginning of the loop when this happens,
to give a chance to splice more frags per system call.
Doing so fixes the issue and makes GRO 10% faster than LRO
on CPU-bound splice() workloads instead of the opposite.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/tcp.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1aca02c..8fc5b3bd 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1494,15 +1494,19 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
copied += used;
offset += used;
}
- /*
- * If recv_actor drops the lock (e.g. TCP splice
+ /* If recv_actor drops the lock (e.g. TCP splice
* receive) the skb pointer might be invalid when
* getting here: tcp_collapse might have deleted it
* while aggregating skbs from the socket queue.
*/
- skb = tcp_recv_skb(sk, seq-1, &offset);
- if (!skb || (offset+1 != skb->len))
+ skb = tcp_recv_skb(sk, seq - 1, &offset);
+ if (!skb)
break;
+ /* TCP coalescing might have appended data to the skb.
+ * Try to splice more frags
+ */
+ if (offset + 1 != skb->len)
+ continue;
}
if (tcp_hdr(skb)->fin) {
sk_eat_skb(sk, skb, false);
^ permalink raw reply related
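The effect of replacing the break with a continue can be illustrated with a toy model. This is not kernel code: the structure, the actor and the retry flag are all hypothetical stand-ins, and the model ignores sequence numbers, FIN handling and locking. It only shows why one call consumes more data when the loop retries after the buffer has grown.

```c
#include <stddef.h>

/* Toy stand-in for an skb whose length can grow while it is being read. */
struct toy_skb {
	size_t len;	/* bytes currently held */
	size_t grow_by;	/* bytes appended once, while the "lock" is dropped */
};

/*
 * Toy actor: consumes everything available from offset, then simulates
 * coalescing appending more data to the same buffer.
 */
static size_t toy_actor(struct toy_skb *skb, size_t offset)
{
	size_t used = skb->len - offset;

	skb->len += skb->grow_by;
	skb->grow_by = 0;
	return used;
}

/*
 * Returns the bytes consumed in one call.
 * retry == 0 models the old behaviour (stop if the buffer grew),
 * retry != 0 models the patched behaviour (loop again and read the rest).
 */
size_t toy_read_sock(struct toy_skb *skb, int retry)
{
	size_t offset = 0, copied = 0;

	while (offset < skb->len) {
		size_t used = toy_actor(skb, offset);

		copied += used;
		offset += used;
		if (offset != skb->len) {
			if (retry)
				continue;	/* more data appeared: keep going */
			break;			/* old behaviour: give up early */
		}
		break;				/* buffer fully consumed */
	}
	return copied;
}
```

With a 1000-byte buffer that gains 500 bytes during the first pass, the old policy returns 1000 and leaves the remainder for another system call, while the retry policy returns 1500 in a single call.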
* Re: Optics (SFP) monitoring on ixgbe and igbe
From: Aurélien @ 2012-12-02 21:47 UTC (permalink / raw)
To: Ben Hutchings; +Cc: netdev
In-Reply-To: <1354335498.2640.23.camel@bwh-desktop.uk.solarflarecom.com>
[-- Attachment #1: Type: text/plain, Size: 2554 bytes --]
Hi Ben,
Thanks for your review. Here's a fixed-up version, according to your remarks.
On Sat, Dec 1, 2012 at 5:18 AM, Ben Hutchings <bhutchings@solarflare.com> wrote:
>
> This version drops the -lm completely, so it doesn't link. Maybe you
> edited the generated Makefile or Makefile.in?
No, I just stupidly forgot to make distclean & autogen after removing
all libm checks. Re-added AC_CHECK_LIB to link with it.
>
> The option alias should be included in the manual page and in a
> (trivial) test case in test-cmdline.c.
>
Included in the man page, along with a modified option description,
and a test equivalent to --dump-module-eeprom, which passes.
>
> The indentation is still weird, though:
>
> [...]
>
> These comments should be lined up vertically.
Yup, I had a mixup between tab/whitespaces, and my Vim config did not
help. Fixed.
>
> be32toh() is non-standard and was apparently added to glibc relatively
> recently (version 2.9). Therefore please use the equivalent ntohl()
> instead.
Did that, it indeed works just fine. I should have used that from the start.
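As an illustration of the ntohl() approach, here is a minimal sketch that decodes a big-endian IEEE-754 single from four raw bytes. It copies through memcpy() instead of casting the byte pointer, which sidesteps alignment concerns. The function name is hypothetical, and it assumes float is a 32-bit IEEE-754 type on the host.

```c
#include <arpa/inet.h>	/* ntohl() */
#include <stdint.h>
#include <string.h>

/*
 * Sketch only: convert 4 big-endian bytes into a host float.
 * Assumes sizeof(float) == 4 and IEEE-754 representation.
 */
float be_bytes_to_float(const unsigned char *src)
{
	uint32_t raw;
	float value;

	memcpy(&raw, src, sizeof(raw));		/* no alignment requirement on src */
	raw = ntohl(raw);			/* network (big-endian) to host order */
	memcpy(&value, &raw, sizeof(value));	/* reinterpret the bits as a float */
	return value;
}
```

For example, the byte sequence 3f 80 00 00 decodes to 1.0 and 41 20 00 00 decodes to 10.0.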
>
> Function-like macros generally shouldn't be defined with a trailing
> semi-colon, as that will be added at the point of use.
>
That was a copy/paste typo, fixed.
> The backslashes should be lined up on the right, and continuation lines
> within parentheses should be indented so they begin just to the right of
> the opening parenthesis, e.g.:
>
> #define PRINT_VCC(string, index) \
> printf("\t%-41s : %.4f V\n", (string), \
> (double)(sd.sfp_voltage[(index)] / 10000.))
>
> [...]
>> + PRINT_xX_PWR("Laser output power low warning threshold",
>> + sd.tx_power, LWARN);
>
> The continuation lines are over-indented here.
>
Fixed (was using wrong tabstop width at 4).
>> - printf("\tActive Cu cmplnce. : 0x%02x", id[60]);
>> + printf("\t%-41s : 0x%02x", "Active copper compliance", id[60]);
>
> If you want to change these labels, do that in a separate patch.
>
There's no real need, so I'll leave those alone, just changing the
alignment like for the other labels.
> [...]
>> - "Length (62.5um)", 10, "m");
>> + "Length (62.5um)", 10, "m");
>
> These changes are unnecessary.
>
Agreed, removed from the patch.
Best regards,
--
Aurélien Guillaume
[-- Attachment #2: 0001-Implemented-basic-optics-diagnostics-for-SFF-8472.patch --]
[-- Type: application/octet-stream, Size: 22638 bytes --]
From 9a6e14770f12aa728751e3c9256968b4fd611290 Mon Sep 17 00:00:00 2001
From: Aurelien Guillaume <aurelien@iwi.me>
Date: Sun, 2 Dec 2012 21:21:01 +0100
Subject: [PATCH] Implemented basic optics diagnostics for SFF-8472
Signed-off-by: Aurelien Guillaume <aurelien@iwi.me>
---
Makefile.am | 2 +-
configure.ac | 1 +
ethtool.8.in | 8 +-
ethtool.c | 17 +++-
internal.h | 3 +
sfpdiag.c | 362 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
sfpid.c | 28 +++--
test-cmdline.c | 2 +
8 files changed, 404 insertions(+), 19 deletions(-)
create mode 100644 sfpdiag.c
diff --git a/Makefile.am b/Makefile.am
index e33f71f..89a0d1e 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -9,7 +9,7 @@ ethtool_SOURCES = ethtool.c ethtool-copy.h internal.h net_tstamp-copy.h \
fec_8xx.c ibm_emac.c ixgb.c ixgbe.c natsemi.c \
pcnet32.c realtek.c tg3.c marvell.c vioc.c \
smsc911x.c at76c50x-usb.c sfc.c stmmac.c \
- rxclass.c sfpid.c
+ rxclass.c sfpid.c sfpdiag.c
TESTS = test-cmdline test-features
check_PROGRAMS = test-cmdline test-features
diff --git a/configure.ac b/configure.ac
index 0c597c6..5806eae 100644
--- a/configure.ac
+++ b/configure.ac
@@ -13,6 +13,7 @@ AC_PROG_GCC_TRADITIONAL
AM_PROG_CC_C_O
dnl Checks for libraries.
+AC_CHECK_LIB([m], [log10])
dnl Checks for header files.
AC_CHECK_HEADERS(sys/ioctl.h)
diff --git a/ethtool.8.in b/ethtool.8.in
index a3c7fbb..e701919 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -312,7 +312,7 @@ ethtool \- query or control network driver and hardware settings
.BN other
.BN combined
.HP
-.B ethtool \-m|\-\-dump\-module\-eeprom
+.B ethtool \-m|\-\-dump\-module\-eeprom|\-\-module\-info
.I devname
.B2 raw on off
.B2 hex on off
@@ -815,8 +815,10 @@ Changes the number of channels used only for other purposes e.g. link interrupts
.BI combined \ N
Changes the number of multi-purpose channels.
.TP
-.B \-m \-\-dump\-module\-eeprom
-Retrieves and if possible decodes the EEPROM from plugin modules, e.g SFP+, QSFP
+.B \-m \-\-dump\-module\-eeprom \-\-module\-info
+Retrieves and if possible decodes the EEPROM from plugin modules, e.g SFP+, QSFP.
+If the driver and module support it, the optical diagnostic information is also
+read and decoded.
.TP
.B \-\-show\-priv\-flags
Queries the specified network device for its private flags. The
diff --git a/ethtool.c b/ethtool.c
index 3db7fec..345c21c 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -3604,6 +3604,16 @@ static int do_getmodule(struct cmd_context *ctx)
return 1;
}
+ /*
+ * SFF-8079 EEPROM layout contains the memory available at A0 address on
+ * the PHY EEPROM.
+ * SFF-8472 defines a virtual extension of the EEPROM, where the
+ * microcontroller on the SFP/SFP+ generates a page at the A2 address,
+ * which contains data relative to optical diagnostics.
+ * The current kernel implementation returns a blob, which contains:
+ * - ETH_MODULE_SFF_8079 => The A0 page only.
+ * - ETH_MODULE_SFF_8472 => The A0 and A2 page concatenated.
+ */
if (geeprom_dump_raw) {
fwrite(eeprom->data, 1, eeprom->len, stdout);
} else {
@@ -3613,8 +3623,11 @@ static int do_getmodule(struct cmd_context *ctx)
} else if (!geeprom_dump_hex) {
switch (modinfo.type) {
case ETH_MODULE_SFF_8079:
+ sff8079_show_all(eeprom->data);
+ break;
case ETH_MODULE_SFF_8472:
sff8079_show_all(eeprom->data);
+ sff8472_show_all(eeprom->data);
break;
default:
geeprom_dump_hex = 1;
@@ -3831,8 +3844,8 @@ static const struct option {
{ "--show-priv-flags" , 1, do_gprivflags, "Query private flags" },
{ "--set-priv-flags", 1, do_sprivflags, "Set private flags",
" FLAG on|off ...\n" },
- { "-m|--dump-module-eeprom", 1, do_getmodule,
- "Qeuery/Decode Module EEPROM information",
+ { "-m|--dump-module-eeprom|--module-info", 1, do_getmodule,
+ "Query/Decode Module EEPROM information and optical diagnostics if available",
" [ raw on|off ]\n"
" [ hex on|off ]\n"
" [ offset N ]\n"
diff --git a/internal.h b/internal.h
index 4f96fd5..e977a81 100644
--- a/internal.h
+++ b/internal.h
@@ -253,4 +253,7 @@ int rxclass_rule_del(struct cmd_context *ctx, __u32 loc);
/* Module EEPROM parsing code */
void sff8079_show_all(const __u8 *id);
+/* Optics diagnostics */
+void sff8472_show_all(const __u8 *id);
+
#endif /* ETHTOOL_INTERNAL_H__ */
diff --git a/sfpdiag.c b/sfpdiag.c
new file mode 100644
index 0000000..f67e491
--- /dev/null
+++ b/sfpdiag.c
@@ -0,0 +1,362 @@
+/*
+ * sfpdiag.c: Implements SFF-8472 optics diagnostics.
+ *
+ * Aurelien Guillaume <aurelien@iwi.me> (C) 2012
+ * This implementation is loosely based on DOM patches
+ * from Robert Olsson <robert@herjulf.se> (C) 2009
+ * and SFF-8472 specs (ftp://ftp.seagate.com/pub/sff/SFF-8472.PDF)
+ * by SFF Committee.
+ */
+
+#include <stdio.h>
+#include <math.h>
+#include <arpa/inet.h>
+#include "internal.h"
+
+/* Offsets in decimal, for direct comparison with the SFF specs */
+
+/* A0-based EEPROM offsets for DOM support checks */
+#define SFF_A0_DOM 92
+#define SFF_A0_OPTIONS 93
+#define SFF_A0_COMP 94
+
+/* EEPROM bit values for various registers */
+#define SFF_A0_DOM_EXTCAL (1 << 4)
+#define SFF_A0_DOM_INTCAL (1 << 5)
+#define SFF_A0_DOM_IMPL (1 << 6)
+#define SFF_A0_DOM_PWRT (1 << 3)
+
+#define SFF_A0_OPTIONS_AW (1 << 7)
+
+/*
+ * See ethtool.c comments about SFF-8472, this is the offset
+ * at which the A2 page is in the EEPROM blob returned by the
+ * kernel.
+ */
+#define SFF_A2_BASE 0x100
+
+/* A2-based offsets for DOM */
+#define SFF_A2_TEMP 96
+#define SFF_A2_TEMP_HALRM 0
+#define SFF_A2_TEMP_LALRM 2
+#define SFF_A2_TEMP_HWARN 4
+#define SFF_A2_TEMP_LWARN 6
+
+#define SFF_A2_VCC 98
+#define SFF_A2_VCC_HALRM 8
+#define SFF_A2_VCC_LALRM 10
+#define SFF_A2_VCC_HWARN 12
+#define SFF_A2_VCC_LWARN 14
+
+#define SFF_A2_BIAS 100
+#define SFF_A2_BIAS_HALRM 16
+#define SFF_A2_BIAS_LALRM 18
+#define SFF_A2_BIAS_HWARN 20
+#define SFF_A2_BIAS_LWARN 22
+
+#define SFF_A2_TX_PWR 102
+#define SFF_A2_TX_PWR_HALRM 24
+#define SFF_A2_TX_PWR_LALRM 26
+#define SFF_A2_TX_PWR_HWARN 28
+#define SFF_A2_TX_PWR_LWARN 30
+
+#define SFF_A2_RX_PWR 104
+#define SFF_A2_RX_PWR_HALRM 32
+#define SFF_A2_RX_PWR_LALRM 34
+#define SFF_A2_RX_PWR_HWARN 36
+#define SFF_A2_RX_PWR_LWARN 38
+
+#define SFF_A2_ALRM_FLG 112
+#define SFF_A2_WARN_FLG 116
+
+/* 32-bit big-endian calibration constants */
+#define SFF_A2_CAL_RXPWR4 56
+#define SFF_A2_CAL_RXPWR3 60
+#define SFF_A2_CAL_RXPWR2 64
+#define SFF_A2_CAL_RXPWR1 68
+#define SFF_A2_CAL_RXPWR0 72
+
+/* 16-bit big-endian calibration constants */
+#define SFF_A2_CAL_TXI_SLP 76
+#define SFF_A2_CAL_TXI_OFF 78
+#define SFF_A2_CAL_TXPWR_SLP 80
+#define SFF_A2_CAL_TXPWR_OFF 82
+#define SFF_A2_CAL_T_SLP 84
+#define SFF_A2_CAL_T_OFF 86
+#define SFF_A2_CAL_V_SLP 88
+#define SFF_A2_CAL_V_OFF 90
+
+
+struct sff8472_diags {
+
+#define MCURR 0
+#define LWARN 1
+#define HWARN 2
+#define LALRM 3
+#define HALRM 4
+
+ /* [5] tables are current, low/high warn, low/high alarm */
+ __u8 supports_dom; /* Supports DOM */
+ __u8 supports_alarms; /* Supports alarm/warning thold */
+ __u8 calibrated_ext; /* Is externally calibrated */
+ __u16 bias_cur[5]; /* Measured bias current in 2uA units */
+ __u16 tx_power[5]; /* Measured TX Power in 0.1uW units */
+ __u16 rx_power[5]; /* Measured RX Power */
+ __u8 rx_power_type; /* 0 = OMA, 1 = Average power */
+ __s16 sfp_temp[5]; /* SFP Temp in 16-bit signed 1/256 Celsius */
+ __u16 sfp_voltage[5]; /* SFP voltage in 0.1mV units */
+
+};
+
+static struct sff8472_aw_flags {
+ const char *str; /* Human-readable string, null at the end */
+ int offset; /* A2-relative address offset */
+ __u8 value; /* Alarm is on if (byte at offset & value) != 0. */
+} sff8472_aw_flags[] = {
+ { "Laser bias current high alarm", SFF_A2_ALRM_FLG, (1 << 3) },
+ { "Laser bias current low alarm", SFF_A2_ALRM_FLG, (1 << 2) },
+ { "Laser bias current high warning", SFF_A2_WARN_FLG, (1 << 3) },
+ { "Laser bias current low warning", SFF_A2_WARN_FLG, (1 << 2) },
+
+ { "Laser output power high alarm", SFF_A2_ALRM_FLG, (1 << 1) },
+ { "Laser output power low alarm", SFF_A2_ALRM_FLG, (1 << 0) },
+ { "Laser output power high warning", SFF_A2_WARN_FLG, (1 << 1) },
+ { "Laser output power low warning", SFF_A2_WARN_FLG, (1 << 0) },
+
+ { "Module temperature high alarm", SFF_A2_ALRM_FLG, (1 << 7) },
+ { "Module temperature low alarm", SFF_A2_ALRM_FLG, (1 << 6) },
+ { "Module temperature high warning", SFF_A2_WARN_FLG, (1 << 7) },
+ { "Module temperature low warning", SFF_A2_WARN_FLG, (1 << 6) },
+
+ { "Module voltage high alarm", SFF_A2_ALRM_FLG, (1 << 5) },
+ { "Module voltage low alarm", SFF_A2_ALRM_FLG, (1 << 4) },
+ { "Module voltage high warning", SFF_A2_WARN_FLG, (1 << 5) },
+ { "Module voltage low warning", SFF_A2_WARN_FLG, (1 << 4) },
+
+ { "Laser rx power high alarm", SFF_A2_ALRM_FLG + 1, (1 << 7) },
+ { "Laser rx power low alarm", SFF_A2_ALRM_FLG + 1, (1 << 6) },
+ { "Laser rx power high warning", SFF_A2_WARN_FLG + 1, (1 << 7) },
+ { "Laser rx power low warning", SFF_A2_WARN_FLG + 1, (1 << 6) },
+
+ { NULL, 0, 0 },
+};
+
+static double convert_mw_to_dbm(double mw)
+{
+ return (10. * log10(mw / 1000.)) + 30.;
+}
+
+
+/* Most common case: 16-bit unsigned integer in a certain unit */
+#define A2_OFFSET_TO_U16(offset) \
+ (id[SFF_A2_BASE + (offset)] << 8 | id[SFF_A2_BASE + (offset) + 1])
+
+/* Calibration slope is a number between 0.0 included and 256.0 excluded. */
+#define A2_OFFSET_TO_SLP(offset) \
+ (id[SFF_A2_BASE + (offset)] + id[SFF_A2_BASE + (offset) + 1] / 256.)
+
+/* Calibration offset is an integer from -32768 to 32767 */
+#define A2_OFFSET_TO_OFF(offset) \
+ ((__s16)A2_OFFSET_TO_U16(offset))
+
+/* RXPWR(x) are IEEE-754 floating point numbers in big-endian format */
+#define A2_OFFSET_TO_RXPWRx(offset) \
+ (befloattoh((__u32 *)(id + SFF_A2_BASE + (offset))))
+
+/*
+ * 2-byte internal temperature conversions:
+ * First byte is a signed 8-bit integer, which is the temp integer part
+ * Second byte is in 1/256ths of a degree, added to the integer part.
+ */
+#define A2_OFFSET_TO_TEMP(offset) ((__s16)A2_OFFSET_TO_U16(offset))
+
+
+static void sff8472_dom_parse(const __u8 *id, struct sff8472_diags *sd)
+{
+
+ sd->bias_cur[MCURR] = A2_OFFSET_TO_U16(SFF_A2_BIAS);
+ sd->bias_cur[HALRM] = A2_OFFSET_TO_U16(SFF_A2_BIAS_HALRM);
+ sd->bias_cur[LALRM] = A2_OFFSET_TO_U16(SFF_A2_BIAS_LALRM);
+ sd->bias_cur[HWARN] = A2_OFFSET_TO_U16(SFF_A2_BIAS_HWARN);
+ sd->bias_cur[LWARN] = A2_OFFSET_TO_U16(SFF_A2_BIAS_LWARN);
+
+ sd->sfp_voltage[MCURR] = A2_OFFSET_TO_U16(SFF_A2_VCC);
+ sd->sfp_voltage[HALRM] = A2_OFFSET_TO_U16(SFF_A2_VCC_HALRM);
+ sd->sfp_voltage[LALRM] = A2_OFFSET_TO_U16(SFF_A2_VCC_LALRM);
+ sd->sfp_voltage[HWARN] = A2_OFFSET_TO_U16(SFF_A2_VCC_HWARN);
+ sd->sfp_voltage[LWARN] = A2_OFFSET_TO_U16(SFF_A2_VCC_LWARN);
+
+ sd->tx_power[MCURR] = A2_OFFSET_TO_U16(SFF_A2_TX_PWR);
+ sd->tx_power[HALRM] = A2_OFFSET_TO_U16(SFF_A2_TX_PWR_HALRM);
+ sd->tx_power[LALRM] = A2_OFFSET_TO_U16(SFF_A2_TX_PWR_LALRM);
+ sd->tx_power[HWARN] = A2_OFFSET_TO_U16(SFF_A2_TX_PWR_HWARN);
+ sd->tx_power[LWARN] = A2_OFFSET_TO_U16(SFF_A2_TX_PWR_LWARN);
+
+ sd->rx_power[MCURR] = A2_OFFSET_TO_U16(SFF_A2_RX_PWR);
+ sd->rx_power[HALRM] = A2_OFFSET_TO_U16(SFF_A2_RX_PWR_HALRM);
+ sd->rx_power[LALRM] = A2_OFFSET_TO_U16(SFF_A2_RX_PWR_LALRM);
+ sd->rx_power[HWARN] = A2_OFFSET_TO_U16(SFF_A2_RX_PWR_HWARN);
+ sd->rx_power[LWARN] = A2_OFFSET_TO_U16(SFF_A2_RX_PWR_LWARN);
+
+ sd->sfp_temp[MCURR] = A2_OFFSET_TO_TEMP(SFF_A2_TEMP);
+ sd->sfp_temp[HALRM] = A2_OFFSET_TO_TEMP(SFF_A2_TEMP_HALRM);
+ sd->sfp_temp[LALRM] = A2_OFFSET_TO_TEMP(SFF_A2_TEMP_LALRM);
+ sd->sfp_temp[HWARN] = A2_OFFSET_TO_TEMP(SFF_A2_TEMP_HWARN);
+ sd->sfp_temp[LWARN] = A2_OFFSET_TO_TEMP(SFF_A2_TEMP_LWARN);
+
+}
+
+/* Converts to a float from a big-endian 4-byte source buffer. */
+static float befloattoh(const __u32 *source)
+{
+ union {
+ __u32 src;
+ float dst;
+ } converter;
+
+ converter.src = ntohl(*source);
+ return converter.dst;
+}
+
+static void sff8472_calibration(const __u8 *id, struct sff8472_diags *sd)
+{
+ int i;
+ __u16 rx_reading;
+
+ /* Calibration should occur for all values (threshold and current) */
+ for (i = MCURR; i <= HALRM; ++i) {
+ /*
+ * Apply calibration formula 1 (Temp., Voltage, Bias, Tx Power)
+ */
+ sd->bias_cur[i] *= A2_OFFSET_TO_SLP(SFF_A2_CAL_TXI_SLP);
+ sd->tx_power[i] *= A2_OFFSET_TO_SLP(SFF_A2_CAL_TXPWR_SLP);
+ sd->sfp_voltage[i] *= A2_OFFSET_TO_SLP(SFF_A2_CAL_V_SLP);
+ sd->sfp_temp[i] *= A2_OFFSET_TO_SLP(SFF_A2_CAL_T_SLP);
+
+ sd->bias_cur[i] += A2_OFFSET_TO_OFF(SFF_A2_CAL_TXI_OFF);
+ sd->tx_power[i] += A2_OFFSET_TO_OFF(SFF_A2_CAL_TXPWR_OFF);
+ sd->sfp_voltage[i] += A2_OFFSET_TO_OFF(SFF_A2_CAL_V_OFF);
+ sd->sfp_temp[i] += A2_OFFSET_TO_OFF(SFF_A2_CAL_T_OFF);
+
+ /*
+ * Apply calibration formula 2 (Rx Power only)
+ */
+ rx_reading = sd->rx_power[i];
+ sd->rx_power[i] = A2_OFFSET_TO_RXPWRx(SFF_A2_CAL_RXPWR0);
+ sd->rx_power[i] += rx_reading *
+ A2_OFFSET_TO_RXPWRx(SFF_A2_CAL_RXPWR1);
+ sd->rx_power[i] += rx_reading *
+ A2_OFFSET_TO_RXPWRx(SFF_A2_CAL_RXPWR2);
+ sd->rx_power[i] += rx_reading *
+ A2_OFFSET_TO_RXPWRx(SFF_A2_CAL_RXPWR3);
+ }
+}
+
+static void sff8472_parse_eeprom(const __u8 *id, struct sff8472_diags *sd)
+{
+ sd->supports_dom = id[SFF_A0_DOM] & SFF_A0_DOM_IMPL;
+ sd->supports_alarms = id[SFF_A0_OPTIONS] & SFF_A0_OPTIONS_AW;
+ sd->calibrated_ext = id[SFF_A0_DOM] & SFF_A0_DOM_EXTCAL;
+ sd->rx_power_type = id[SFF_A0_DOM] & SFF_A0_DOM_PWRT;
+
+ sff8472_dom_parse(id, sd);
+
+ /*
+ * If the SFP is externally calibrated, we need to read calibration data
+ * and compensate the already stored readings.
+ */
+ if (sd->calibrated_ext)
+ sff8472_calibration(id, sd);
+}
+
+void sff8472_show_all(const __u8 *id)
+{
+ struct sff8472_diags sd;
+ char *rx_power_string = NULL;
+ int i;
+
+ sff8472_parse_eeprom(id, &sd);
+
+ if (!sd.supports_dom) {
+ printf("\t%-41s : No\n", "Optical diagnostics support");
+ return;
+ }
+ printf("\t%-41s : Yes\n", "Optical diagnostics support");
+
+#define PRINT_BIAS(string, index) \
+ printf("\t%-41s : %.3f mA\n", (string), \
+ (double)(sd.bias_cur[(index)] / 500.))
+
+#define PRINT_xX_PWR(string, var, index) \
+ printf("\t%-41s : %.4f mW / %.2f dBm\n", (string), \
+ (double)((var)[(index)] / 10000.), \
+ convert_mw_to_dbm((double)((var)[(index)] / 10000.)))
+
+#define PRINT_TEMP(string, index) \
+ printf("\t%-41s : %.2f degrees C / %.2f degrees F\n", (string), \
+ (double)(sd.sfp_temp[(index)] / 256.), \
+ (double)(sd.sfp_temp[(index)] / 256. * 1.8 + 32.))
+
+#define PRINT_VCC(string, index) \
+ printf("\t%-41s : %.4f V\n", (string), \
+ (double)(sd.sfp_voltage[(index)] / 10000.))
+
+ PRINT_BIAS("Laser bias current", MCURR);
+ PRINT_xX_PWR("Laser output power", sd.tx_power, MCURR);
+
+ if (!sd.rx_power_type)
+ rx_power_string = "Receiver signal OMA";
+ else
+ rx_power_string = "Receiver signal average optical power";
+
+ PRINT_xX_PWR(rx_power_string, sd.rx_power, MCURR);
+
+ PRINT_TEMP("Module temperature", MCURR);
+ PRINT_VCC("Module voltage", MCURR);
+
+ printf("\t%-41s : %s\n", "Alarm/warning flags implemented",
+ (sd.supports_alarms ? "Yes" : "No"));
+ if (sd.supports_alarms) {
+
+ for (i = 0; sff8472_aw_flags[i].str; ++i) {
+ printf("\t%-41s : %s\n", sff8472_aw_flags[i].str,
+ id[SFF_A2_BASE + sff8472_aw_flags[i].offset]
+ & sff8472_aw_flags[i].value ? "On" : "Off");
+ }
+
+ PRINT_BIAS("Laser bias current high alarm threshold", HALRM);
+ PRINT_BIAS("Laser bias current low alarm threshold", LALRM);
+ PRINT_BIAS("Laser bias current high warning threshold", HWARN);
+ PRINT_BIAS("Laser bias current low warning threshold", LWARN);
+
+ PRINT_xX_PWR("Laser output power high alarm threshold",
+ sd.tx_power, HALRM);
+ PRINT_xX_PWR("Laser output power low alarm threshold",
+ sd.tx_power, LALRM);
+ PRINT_xX_PWR("Laser output power high warning threshold",
+ sd.tx_power, HWARN);
+ PRINT_xX_PWR("Laser output power low warning threshold",
+ sd.tx_power, LWARN);
+
+ PRINT_TEMP("Module temperature high alarm threshold", HALRM);
+ PRINT_TEMP("Module temperature low alarm threshold", LALRM);
+ PRINT_TEMP("Module temperature high warning threshold", HWARN);
+ PRINT_TEMP("Module temperature low warning threshold", LWARN);
+
+ PRINT_VCC("Module voltage high alarm threshold", HALRM);
+ PRINT_VCC("Module voltage low alarm threshold", LALRM);
+ PRINT_VCC("Module voltage high warning threshold", HWARN);
+ PRINT_VCC("Module voltage low warning threshold", LWARN);
+
+ PRINT_xX_PWR("Laser rx power high alarm threshold",
+ sd.rx_power, HALRM);
+ PRINT_xX_PWR("Laser rx power low alarm threshold",
+ sd.rx_power, LALRM);
+ PRINT_xX_PWR("Laser rx power high warning threshold",
+ sd.rx_power, HWARN);
+ PRINT_xX_PWR("Laser rx power low warning threshold",
+ sd.rx_power, LWARN);
+ }
+
+}
+
diff --git a/sfpid.c b/sfpid.c
index a4a671d..4f88aa2 100644
--- a/sfpid.c
+++ b/sfpid.c
@@ -12,7 +12,7 @@
static void sff8079_show_identifier(const __u8 *id)
{
- printf("\tIdentifier : 0x%02x", id[0]);
+ printf("\t%-41s : 0x%02x", "Identifier", id[0]);
switch (id[0]) {
case 0x00:
printf(" (no module present, unknown, or unspecified)\n");
@@ -34,7 +34,7 @@ static void sff8079_show_identifier(const __u8 *id)
static void sff8079_show_ext_identifier(const __u8 *id)
{
- printf("\tExtended identifier : 0x%02x", id[1]);
+ printf("\t%-41s : 0x%02x", "Extended identifier", id[1]);
if (id[1] == 0x00)
printf(" (GBIC not specified / not MOD_DEF compliant)\n");
else if (id[1] == 0x04)
@@ -47,7 +47,7 @@ static void sff8079_show_ext_identifier(const __u8 *id)
static void sff8079_show_connector(const __u8 *id)
{
- printf("\tConnector : 0x%02x", id[2]);
+ printf("\t%-41s : 0x%02x", "Connector", id[2]);
switch (id[2]) {
case 0x00:
printf(" (unknown or unspecified)\n");
@@ -105,10 +105,12 @@ static void sff8079_show_connector(const __u8 *id)
static void sff8079_show_transceiver(const __u8 *id)
{
- static const char *pfx = "\t : =>";
+ static const char *pfx =
+ "\tTransceiver type :";
- printf("\tTransceiver codes : 0x%02x 0x%02x 0x%02x" \
+ printf("\t%-41s : 0x%02x 0x%02x 0x%02x " \
"0x%02x 0x%02x 0x%02x 0x%02x 0x%02x\n",
+ "Transceiver codes",
id[3], id[4], id[5], id[6],
id[7], id[8], id[9], id[10]);
/* 10G Ethernet Compliance Codes */
@@ -239,7 +241,7 @@ static void sff8079_show_transceiver(const __u8 *id)
static void sff8079_show_encoding(const __u8 *id)
{
- printf("\tEncoding : 0x%02x", id[11]);
+ printf("\t%-41s : 0x%02x", "Encoding", id[11]);
switch (id[11]) {
case 0x00:
printf(" (unspecified)\n");
@@ -270,7 +272,7 @@ static void sff8079_show_encoding(const __u8 *id)
static void sff8079_show_rate_identifier(const __u8 *id)
{
- printf("\tRate identifier : 0x%02x", id[13]);
+ printf("\t%-41s : 0x%02x", "Rate identifier", id[13]);
switch (id[13]) {
case 0x00:
printf(" (unspecified)\n");
@@ -295,14 +297,14 @@ static void sff8079_show_rate_identifier(const __u8 *id)
static void sff8079_show_oui(const __u8 *id)
{
- printf("\tVendor OUI : %02x:%02x:%02x\n",
+ printf("\t%-41s : %02x:%02x:%02x\n", "Vendor OUI",
id[37], id[38], id[39]);
}
static void sff8079_show_wavelength_or_copper_compliance(const __u8 *id)
{
if (id[8] & (1 << 2)) {
- printf("\tPassive Cu cmplnce. : 0x%02x", id[60]);
+ printf("\t%-41s : 0x%02x", "Passive Cu cmplnce.", id[60]);
switch (id[60]) {
case 0x00:
printf(" (unspecified)");
@@ -316,7 +318,7 @@ static void sff8079_show_wavelength_or_copper_compliance(const __u8 *id)
}
printf(" [SFF-8472 rev10.4 only]\n");
} else if (id[8] & (1 << 3)) {
- printf("\tActive Cu cmplnce. : 0x%02x", id[60]);
+ printf("\t%-41s : 0x%02x", "Active Cu cmplnce.", id[60]);
switch (id[60]) {
case 0x00:
printf(" (unspecified)");
@@ -333,7 +335,7 @@ static void sff8079_show_wavelength_or_copper_compliance(const __u8 *id)
}
printf(" [SFF-8472 rev10.4 only]\n");
} else {
- printf("\tLaser wavelength : %unm\n",
+ printf("\t%-41s : %unm\n", "Laser wavelength",
(id[60] << 8) | id[61]);
}
}
@@ -344,7 +346,7 @@ static void sff8079_show_value_with_unit(const __u8 *id, unsigned int reg,
{
unsigned int val = id[reg];
- printf("\t%-20s: %u%s\n", name, val * mult, unit);
+ printf("\t%-41s : %u%s\n", name, val * mult, unit);
}
static void sff8079_show_ascii(const __u8 *id, unsigned int first_reg,
@@ -352,7 +354,7 @@ static void sff8079_show_ascii(const __u8 *id, unsigned int first_reg,
{
unsigned int reg, val;
- printf("\t%-20s: ", name);
+ printf("\t%-41s : ", name);
for (reg = first_reg; reg <= last_reg; reg++) {
val = id[reg];
putchar(((val >= 32) && (val <= 126)) ? val : '_');
diff --git a/test-cmdline.c b/test-cmdline.c
index 85b4ce0..f1d4555 100644
--- a/test-cmdline.c
+++ b/test-cmdline.c
@@ -213,6 +213,8 @@ static struct test_case {
{ 0, "-m devname" },
{ 1, "--dump-module-eeprom" },
{ 0, "--dump-module-eeprom devname" },
+ { 1, "--module-info" },
+ { 0, "--module-info devname" },
{ 0, "-m devname raw on" },
{ 0, "-m devname raw off" },
{ 0, "-m devname hex on" },
--
1.7.0.4
* Checking struct size against sizeof(skb->cb) (was Re: [PATCH 00/17] ATM fixes for pppoatm/br2684)
From: David Woodhouse @ 2012-12-02 21:29 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <1354436040.21562.386.camel@shinybook.infradead.org>
On Sun, 2012-12-02 at 08:14 +0000, David Woodhouse wrote:
> On Sat, 2012-12-01 at 20:49 -0500, David Miller wrote:
> >
> > I actually prefer what we do now, which is do the BUILD_BUG_ON()
> > once in the subsystem specific code, usually the initializer.
> >
> > It's part of creating a new SKB cb, adding that assertion somewhere.
>
> Where it's *subsystem* code that's great...
Hmm... or maybe not. A quick check suggests that about two-thirds of
users even in net/ aren't actually doing the check. My brief
investigation gives a score of 30 to 14 (or thereabouts; there may be
some that *do* check, but I failed to spot it. And I used a Fedora
config, not allyesconfig).
If you don't want an automatic check/cast macro (which is only adding a
compile-time check; no runtime overhead), then is it worth doing a
bombing run on the offenders listed below and adding the manual checks?
And I'll look at *drivers* next... which I suspect will be worse.
I concede there are probably no actual *bugs* being hidden here — I
don't think any of them actually *do* overflow. But since the check is
free at run-time, we *ought* to be doing it. Even if a given struct is
tiny and there's no *chance* of it overflowing, people might still add
to it. After all, the solos_skb_cb struct was tiny too until I stupidly
added a completion to it.
(Actually, I'm not entirely sure about 'no bugs'. The L2TP thing with
starting its own struct at &skb->cb[sizeof(struct inet_skb_parm)]
doesn't make it overflow, but what the hell is l2tp_xmit_skb() doing
poking at IPCB(skb) anyway... is it even guaranteed to be Legacy IP? Can
it not be IPv6? And why would anything else *trust* the contents of ->cb
on a skb that just got handed to it?)
My list:
Size not checked against sizeof(skb->cb):
struct napi_gro_cb (include/linux/netdevice.h)
struct ip6_mtuinfo (include/linux/ipv6.h)
struct sctp_ulpevent (include/net/sctp/ulpevent.h)
struct bt_skb_cb (include/net/bluetooth/bluetooth.h)
struct atm_skb_data (include/linux/atmdev.h)
struct br_input_skb_cb (net/bridge/br_private.h)
struct hci_cb (include/net/bluetooth/hci_core.h)
struct sock_exterr_skb (include/linux/errqueue.h)
struct udp_skb_cb (include/net/udp.h)
struct ipx_cb (include/net/ipx.h)
struct neighbour_cb (include/net/neighbour.h)
struct ip6frag_skb_cb (net/ipv6/reassembly.c)
struct dev_gso_cb (net/core/dev.c)
struct irda_skb_cb (include/net/irda/irda_device.h)
struct cmtp_skb (net/bluetooth/cmtp/cmtp.h)
struct ipfrag_skb_cb (net/ipv4/ip_fragment.c)
struct xfrm_skb_cb (include/net/xfrm.h)
struct xfrm_mode_skb_cb (include/net/xfrm.h)
struct xfrm_spi_skb_cb (include/net/xfrm.h)
struct in_pktinfo (include/uapi/linux/in.h)
struct nf_ct_frag6_skb_cb (net/ipv6/netfilter/nf_conntrack_reasm.c)
struct l2tp_skb_cb (net/l2tp/l2tp_core.c) (less than full cb)
struct ah_skb_cb (net/ipv4/ah4.c)
struct ah_skb_cb (net/ipv6/ah6.c)
struct esp_skb_cb (net/ipv4/esp4.c)
struct esp_skb_cb (net/ipv6/esp6.c)
struct skb_eosp_msg_data (net/mac80211/ieee80211_i.h)
struct mISDNhead (include/linux/mISDNif.h)
struct ieee80211_ra_tid (net/mac80211/iface.c)
struct sctp_input_cb (net/sctp/input.c)
Checked:
struct unix_skb_parms (include/net/af_unix.h)
struct packet_skb_cb (net/packet/af_packet.c)
struct ovs_skb_cb (net/openvswitch/datapath.h)
struct ieee80211_tx_info (include/net/mac80211.h)
struct ieee80211_rx_status (include/net/mac80211.h)
struct tcp_skb_cb (include/net/tcp.h)
struct ieee802154_mac_cb (include/net/ieee802154_netdev.h)
struct netlink_cb_parms (include/linux/netlink.h)
struct inet6_skb_parm (include/linux/ipv6.h)
struct inet_skb_parm (include/net/ip.h)
struct tcp_skb_cb (include/net/tcp.h)
struct garp_skb_cb (include/net/garp.h)
struct dccp_skb_cb (net/dccp/dccp.h)
struct qdisc_skb_cb (include/net/sch_generic.h)
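
The "automatic check/cast macro" idea mentioned above can be sketched in
plain userspace C. Everything here is a hypothetical stand-in rather than
kernel code: struct fake_skb, struct example_cb and SKB_CB are invented
names, and the 48-byte cb[] size is assumed for illustration. The point is
only that the size check costs nothing at run time.

```c
/* Stand-in for struct sk_buff: only the control buffer matters here.
 * The 48-byte size is an assumption made for this illustration. */
struct fake_skb {
	_Alignas(8) unsigned char cb[48];
};

/* Hypothetical per-protocol control block. */
struct example_cb {
	unsigned int seq;
	unsigned short flags;
};

/*
 * Hypothetical check-and-cast macro.  The array type has a negative
 * size, and so fails to compile, when 'type' does not fit in cb[];
 * otherwise the sizeof term is multiplied by zero and vanishes.
 */
#define SKB_CB(type, skb) \
	((type *)((skb)->cb + \
		  0 * sizeof(char[1 - 2 * (sizeof(type) > sizeof((skb)->cb))])))

/* Returns the control block view of skb; no run-time check is emitted. */
struct example_cb *example_cb_of(struct fake_skb *skb)
{
	return SKB_CB(struct example_cb, skb);
}
```

Growing struct example_cb past the size of cb[] would turn the array
size negative and stop the build at the point of use, which is the
behaviour a manual BUILD_BUG_ON() in an initializer gives for one struct.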
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
* [PATCH net-next] net: fix sparse endianness warnings on sock_common
From: Eric Dumazet @ 2012-12-02 17:33 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Fengguang Wu, Ling Ma
From: Eric Dumazet <edumazet@google.com>
# make C=2 CF=-D__CHECK_ENDIAN__ net/ipv4/inet_hashtables.o
...
net/ipv4/inet_hashtables.c:242:7: warning: restricted __portpair degrades to integer
net/ipv4/inet_hashtables.c:242:7: warning: restricted __addrpair degrades to integer
...
Move __portpair/__addrpair from include/net/inet_hashtables.h
to include/net/sock.h where we need them in struct sock_common
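
For context, the reason the pair fields exist at all is that two adjacent
32-bit addresses can be compared with one 64-bit load. The following is a
rough userspace sketch of that idea only; union addr_pair and
addr_pair_match are hypothetical names, and the sketch does not model the
sparse __bitwise annotation that this patch is about.

```c
#include <stdint.h>

/* Hypothetical miniature of the address union in struct sock_common. */
union addr_pair {
	uint64_t pair;			/* both addresses as one word */
	struct {
		uint32_t daddr;		/* foreign address */
		uint32_t rcv_saddr;	/* bound local address */
	} a;
};

/* Return 1 when both addresses match, using a single 64-bit compare. */
int addr_pair_match(const union addr_pair *sk,
		    uint32_t daddr, uint32_t rcv_saddr)
{
	union addr_pair want;

	want.a.daddr = daddr;
	want.a.rcv_saddr = rcv_saddr;
	return sk->pair == want.pair;
}
```

Building the expected value through the same union sidesteps the
endian-specific shifts that a hand-built 64-bit cookie would need.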
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ling Ma <ling.ma.program@gmail.com>
---
include/net/inet_hashtables.h | 2 --
include/net/sock.h | 7 +++++--
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index d1de4fb..67a8fa0 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -277,7 +277,6 @@ static inline struct sock *inet_lookup_listener(struct net *net,
On 64bit targets we combine comparisons with pair of adjacent __be32
fields in the same way.
*/
-typedef __u32 __bitwise __portpair;
#ifdef __BIG_ENDIAN
#define INET_COMBINED_PORTS(__sport, __dport) \
((__force __portpair)(((__force __u32)(__be16)(__sport) << 16) | (__u32)(__dport)))
@@ -287,7 +286,6 @@ typedef __u32 __bitwise __portpair;
#endif
#if (BITS_PER_LONG == 64)
-typedef __u64 __bitwise __addrpair;
#ifdef __BIG_ENDIAN
#define INET_ADDR_COOKIE(__name, __saddr, __daddr) \
const __addrpair __name = (__force __addrpair) ( \
diff --git a/include/net/sock.h b/include/net/sock.h
index c4132c1..0a9a01a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -126,6 +126,9 @@ struct sock;
struct proto;
struct net;
+typedef __u32 __bitwise __portpair;
+typedef __u64 __bitwise __addrpair;
+
/**
* struct sock_common - minimal network layer representation of sockets
* @skc_daddr: Foreign IPv4 addr
@@ -155,7 +158,7 @@ struct sock_common {
* address on 64bit arches : cf INET_MATCH() and INET_TW_MATCH()
*/
union {
- unsigned long skc_addrpair;
+ __addrpair skc_addrpair;
struct {
__be32 skc_daddr;
__be32 skc_rcv_saddr;
@@ -167,7 +170,7 @@ struct sock_common {
};
/* skc_dport && skc_num must be grouped as well */
union {
- u32 skc_portpair;
+ __portpair skc_portpair;
struct {
__be16 skc_dport;
__u16 skc_num;
* Re: [PATCH RFC] [INET]: Get cirtical word in first 64bit of cache line
From: Eric Dumazet @ 2012-12-02 17:20 UTC (permalink / raw)
To: Ling Ma; +Cc: linux-kernel, netdev
In-Reply-To: <CAOGi=dO2wZESX5o4Jr_XZu0oPM-Qe30DKt-4f_3TVBrVoR=12Q@mail.gmail.com>
On Sun, 2012-12-02 at 21:25 +0800, Ling Ma wrote:
> Hi Eric,
>
> Attached benchmark test-cwf.c(cc -o test-cwf test-cwf.c), the result
> shows when last level cache(LLC) miss and CPU fetches data from
> memory, critical word as first 64bit member in cache line has better
> performance(costs 158290336 cycles ) than other positions(offset 0x10,
> costs 164100732 ) in cache line, the performance is improved by 3.6%
> in this case.
> cpu-info is also involved too.
>
> Thanks
> Ling
Thanks Ling.
Note that I was more interested in the case where we read more fields per
cache line, like we do in tcp lookups (skc_daddr, skc_rcv_saddr,
skc_bound_dev_if, skc_net).
I made changes to net-next to prepare your patch.
You'll have to move both skc_rxhash & skc_portpair before the
skc_addrpair.
I have to fix an endianness sparse problem, I'll send a patch for this
in a separate thread right now.
* Re: [PATCH v3]realtek:r8169: Bugfix or workaround for missing extended GigaMAC registers settings
From: Wang YanQing @ 2012-12-02 16:34 UTC (permalink / raw)
To: Francois Romieu; +Cc: nic_swsd, netdev, linux-kernel
In-Reply-To: <20121201114401.GA3989@electric-eye.fr.zoreil.com>
On Sat, Dec 01, 2012 at 12:44:01PM +0100, Francois Romieu wrote:
> Wang YanQing <udknight@gmail.com> :
> > + /*
> > + *This is a fix for BIOS forget to set
> > + *extend GigaMAC registers
> > + *Wang YanQing 12/1/2012
> > + */
>
> This part will go into the changelog.
I think a brief comment in the code is good for readability. We read
out MAC{0,4} and write them back on the next line by calling
rtl_rar_set, which makes no obvious sense to new readers, so I think
the brief comment is useful. Could you consider keeping the comment,
minus the pointless line "Wang YanQing 12/1/2012"?
>
> > + if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
> > + rtl_rar_set(tp, dev->dev_addr);
> > + }
>
> rtl_rar_set already includes a RTL_GIGA_MAC_VER_34 test and non-8168evl
> devices are already able to stand an extra MAC{0, 4} write. I'll check
> it does not hurt on different 81xx devices and submit an update.
I added the test to avoid the extra MAC{0,4} write on non-8168evl
devices. If you think that write is not an issue, then I agree with
removing the test.
Thanks.
* Re: [net-next rfc v7 3/3] virtio-net: change the number of queues through ethtool
From: Michael S. Tsirkin @ 2012-12-02 16:09 UTC (permalink / raw)
To: Jason Wang
Cc: krkumar2, kvm, netdev, linux-kernel, virtualization, bhutchings,
jwhan, shiyer
In-Reply-To: <1354011360-39479-4-git-send-email-jasowang@redhat.com>
On Tue, Nov 27, 2012 at 06:16:00PM +0800, Jason Wang wrote:
> This patch implements the {set|get}_channels methods of ethtool to allow the
> user to change the number of queues dynamically while the device is running.
> This lets the user configure it on demand.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/net/virtio_net.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 41 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index bcaa6e5..f08ec2a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1578,10 +1578,51 @@ static struct virtio_driver virtio_net_driver = {
> #endif
> };
>
> +/* TODO: Eliminate OOO packets during switching */
> +static int virtnet_set_channels(struct net_device *dev,
> + struct ethtool_channels *channels)
> +{
> + struct virtnet_info *vi = netdev_priv(dev);
> + u16 queue_pairs = channels->combined_count;
> +
> + /* We don't support separate rx/tx channels.
> + * We don't allow setting 'other' channels.
> + */
> + if (channels->rx_count || channels->tx_count || channels->other_count)
> + return -EINVAL;
> +
> + /* Only two modes were support currently */
> + if (queue_pairs != vi->max_queue_pairs && queue_pairs != 1)
> + return -EINVAL;
> +
Why the limitation?
Also how does userspace discover what the legal values are?
> + vi->curr_queue_pairs = queue_pairs;
> + BUG_ON(virtnet_set_queues(vi));
> +
> + netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
> + netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
> +
> + return 0;
> +}
> +
> +static void virtnet_get_channels(struct net_device *dev,
> + struct ethtool_channels *channels)
> +{
> + struct virtnet_info *vi = netdev_priv(dev);
> +
> + channels->combined_count = vi->curr_queue_pairs;
> + channels->max_combined = vi->max_queue_pairs;
> + channels->max_other = 0;
> + channels->rx_count = 0;
> + channels->tx_count = 0;
> + channels->other_count = 0;
> +}
> +
> static const struct ethtool_ops virtnet_ethtool_ops = {
> .get_drvinfo = virtnet_get_drvinfo,
> .get_link = ethtool_op_get_link,
> .get_ringparam = virtnet_get_ringparam,
> + .set_channels = virtnet_set_channels,
> + .get_channels = virtnet_get_channels,
> };
>
> static int __init init(void)
> --
> 1.7.1
^ permalink raw reply
* Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
From: Michael S. Tsirkin @ 2012-12-02 16:06 UTC (permalink / raw)
To: Jason Wang
Cc: rusty, krkumar2, virtualization, netdev, linux-kernel, kvm,
bhutchings, jwhan, shiyer
In-Reply-To: <1354011360-39479-3-git-send-email-jasowang@redhat.com>
On Tue, Nov 27, 2012 at 06:15:59PM +0800, Jason Wang wrote:
> This adds multiqueue support to the virtio_net driver. In multiple queue mode,
> the driver expects the number of queue pairs to be equal to the number of
> vcpus. To eliminate contention between vcpus and virtqueues, per-cpu virtqueue
> pairs are implemented by:
>
> - selecting the txq based on the smp processor id.
> - setting the smp affinity hint to the vcpu that owns the queue pair.
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/net/virtio_net.c | 454 ++++++++++++++++++++++++++++++---------
> include/uapi/linux/virtio_net.h | 16 ++
> 2 files changed, 371 insertions(+), 99 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 7975133..bcaa6e5 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -84,17 +84,25 @@ struct virtnet_info {
> struct virtio_device *vdev;
> struct virtqueue *cvq;
> struct net_device *dev;
> - struct napi_struct napi;
> - struct send_queue sq;
> - struct receive_queue rq;
> + struct send_queue *sq;
> + struct receive_queue *rq;
> unsigned int status;
>
> + /* Max # of queue pairs supported by the device */
> + u16 max_queue_pairs;
> +
> + /* # of queue pairs currently used by the driver */
> + u16 curr_queue_pairs;
> +
> /* I like... big packets and I cannot lie! */
> bool big_packets;
>
> /* Host will merge rx buffers for big packets (shake it! shake it!) */
> bool mergeable_rx_bufs;
>
> + /* Has control virtqueue */
> + bool has_cvq;
> +
> /* enable config space updates */
> bool config_enable;
>
> @@ -126,6 +134,34 @@ struct padded_vnet_hdr {
> char padding[6];
> };
>
> +static const struct ethtool_ops virtnet_ethtool_ops;
> +
> +/*
> + * Converting between virtqueue no. and kernel tx/rx queue no.
> + * 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN
> + */
Weird, this isn't what spec v5 says: it says
0:rx0 1:tx0 2:rx1 3:tx1 ... cvq
We can change the spec to match, but keeping all rx/tx
together seems a bit prettier?
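
To make the two numbering schemes concrete, here is a minimal sketch of both
mappings on plain integers. The patch_* functions mirror the arithmetic in the
quoted helpers; the spec_* functions are my reading of the layout described
for spec v5 (control vq last) and are an assumption, as are all the names:

```c
#include <assert.h>

/* Layout used by the patch: 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... */
int patch_rxq2vq(int rxq) { return rxq ? 2 * rxq + 1 : 0; }
int patch_txq2vq(int txq) { return txq ? 2 * txq + 2 : 1; }
int patch_vq2rxq(int vq)  { return vq ? (vq - 1) / 2 : 0; }
int patch_vq2txq(int vq)  { return vq == 1 ? 0 : (vq - 2) / 2; }

/* Layout with all rx/tx pairs together: 0:rx0 1:tx0 2:rx1 3:tx1 ... cvq */
int spec_rxq2vq(int rxq)      { return 2 * rxq; }
int spec_txq2vq(int txq)      { return 2 * txq + 1; }
int spec_vq2rxq(int vq)       { return vq / 2; }
int spec_vq2txq(int vq)       { return (vq - 1) / 2; }
int spec_cvq(int max_pairs)   { return 2 * max_pairs; }

/* Both layouts must map a queue number to a vq index and back again. */
void check_roundtrip(int max_pairs)
{
	int i;

	for (i = 0; i < max_pairs; i++) {
		assert(patch_vq2rxq(patch_rxq2vq(i)) == i);
		assert(patch_vq2txq(patch_txq2vq(i)) == i);
		assert(spec_vq2rxq(spec_rxq2vq(i)) == i);
		assert(spec_vq2txq(spec_txq2vq(i)) == i);
	}
}
```

The second layout needs no special case for queue 0, at the cost of the
control vq index depending on the number of pairs.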
> +static int vq2txq(struct virtqueue *vq)
> +{
> + int index = virtqueue_get_queue_index(vq);
> + return index == 1 ? 0 : (index - 2) / 2;
> +}
> +
> +static int txq2vq(int txq)
> +{
> + return txq ? 2 * txq + 2 : 1;
> +}
> +
> +static int vq2rxq(struct virtqueue *vq)
> +{
> + int index = virtqueue_get_queue_index(vq);
> + return index ? (index - 1) / 2 : 0;
> +}
> +
> +static int rxq2vq(int rxq)
> +{
> + return rxq ? 2 * rxq + 1 : 0;
> +}
> +
> static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb)
> {
> return (struct skb_vnet_hdr *)skb->cb;
> @@ -166,7 +202,7 @@ static void skb_xmit_done(struct virtqueue *vq)
> virtqueue_disable_cb(vq);
>
> /* We were probably waiting for more output buffers. */
> - netif_wake_queue(vi->dev);
> + netif_wake_subqueue(vi->dev, vq2txq(vq));
> }
>
> static void set_skb_frag(struct sk_buff *skb, struct page *page,
> @@ -503,7 +539,7 @@ static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp)
> static void skb_recv_done(struct virtqueue *rvq)
> {
> struct virtnet_info *vi = rvq->vdev->priv;
> - struct receive_queue *rq = &vi->rq;
> + struct receive_queue *rq = &vi->rq[vq2rxq(rvq)];
>
> /* Schedule NAPI, Suppress further interrupts if successful. */
> if (napi_schedule_prep(&rq->napi)) {
> @@ -650,7 +686,8 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> {
> struct virtnet_info *vi = netdev_priv(dev);
> - struct send_queue *sq = &vi->sq;
> + int qnum = skb_get_queue_mapping(skb);
> + struct send_queue *sq = &vi->sq[qnum];
> int capacity;
>
> /* Free up any pending old buffers before queueing new ones. */
> @@ -664,13 +701,14 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> if (likely(capacity == -ENOMEM)) {
> if (net_ratelimit())
> dev_warn(&dev->dev,
> - "TX queue failure: out of memory\n");
> + "TXQ (%d) failure: out of memory\n",
> + qnum);
> } else {
> dev->stats.tx_fifo_errors++;
> if (net_ratelimit())
> dev_warn(&dev->dev,
> - "Unexpected TX queue failure: %d\n",
> - capacity);
> + "Unexpected TXQ (%d) failure: %d\n",
> + qnum, capacity);
> }
> dev->stats.tx_dropped++;
> kfree_skb(skb);
> @@ -685,12 +723,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> /* Apparently nice girls don't return TX_BUSY; stop the queue
> * before it gets out of hand. Naturally, this wastes entries. */
> if (capacity < 2+MAX_SKB_FRAGS) {
> - netif_stop_queue(dev);
> + netif_stop_subqueue(dev, qnum);
> if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> /* More just got used, free them then recheck. */
> capacity += free_old_xmit_skbs(sq);
> if (capacity >= 2+MAX_SKB_FRAGS) {
> - netif_start_queue(dev);
> + netif_start_subqueue(dev, qnum);
> virtqueue_disable_cb(sq->vq);
> }
> }
> @@ -758,23 +796,13 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
> static void virtnet_netpoll(struct net_device *dev)
> {
> struct virtnet_info *vi = netdev_priv(dev);
> + int i;
>
> - napi_schedule(&vi->rq.napi);
> + for (i = 0; i < vi->curr_queue_pairs; i++)
> + napi_schedule(&vi->rq[i].napi);
> }
> #endif
>
> -static int virtnet_open(struct net_device *dev)
> -{
> - struct virtnet_info *vi = netdev_priv(dev);
> -
> - /* Make sure we have some buffers: if oom use wq. */
> - if (!try_fill_recv(&vi->rq, GFP_KERNEL))
> - schedule_delayed_work(&vi->rq.refill, 0);
> -
> - virtnet_napi_enable(&vi->rq);
> - return 0;
> -}
> -
> /*
> * Send command via the control virtqueue and check status. Commands
> * supported by the hypervisor, as indicated by feature bits, should
> @@ -830,13 +858,53 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi)
> rtnl_unlock();
> }
>
> +static int virtnet_set_queues(struct virtnet_info *vi)
> +{
> + struct scatterlist sg;
> + struct virtio_net_ctrl_rfs s;
> + struct net_device *dev = vi->dev;
> +
> + s.virtqueue_pairs = vi->curr_queue_pairs;
> + sg_init_one(&sg, &s, sizeof(s));
> +
> + if (!vi->has_cvq)
> + return -EINVAL;
> +
> + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RFS,
> + VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET, &sg, 1, 0)){
> + dev_warn(&dev->dev, "Fail to set the number of queue pairs to"
> + " %d\n", vi->curr_queue_pairs);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int virtnet_open(struct net_device *dev)
> +{
> + struct virtnet_info *vi = netdev_priv(dev);
> + int i;
> +
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + /* Make sure we have some buffers: if oom use wq. */
> + if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> + schedule_delayed_work(&vi->rq[i].refill, 0);
> + virtnet_napi_enable(&vi->rq[i]);
> + }
> +
> + return 0;
> +}
> +
> static int virtnet_close(struct net_device *dev)
> {
> struct virtnet_info *vi = netdev_priv(dev);
> + int i;
>
> /* Make sure refill_work doesn't re-enable napi! */
> - cancel_delayed_work_sync(&vi->rq.refill);
> - napi_disable(&vi->rq.napi);
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + cancel_delayed_work_sync(&vi->rq[i].refill);
> + napi_disable(&vi->rq[i].napi);
> + }
>
> return 0;
> }
> @@ -948,8 +1016,8 @@ static void virtnet_get_ringparam(struct net_device *dev,
> {
> struct virtnet_info *vi = netdev_priv(dev);
>
> - ring->rx_max_pending = virtqueue_get_vring_size(vi->rq.vq);
> - ring->tx_max_pending = virtqueue_get_vring_size(vi->sq.vq);
> + ring->rx_max_pending = virtqueue_get_vring_size(vi->rq[0].vq);
> + ring->tx_max_pending = virtqueue_get_vring_size(vi->sq[0].vq);
> ring->rx_pending = ring->rx_max_pending;
> ring->tx_pending = ring->tx_max_pending;
> }
> @@ -967,12 +1035,6 @@ static void virtnet_get_drvinfo(struct net_device *dev,
>
> }
>
> -static const struct ethtool_ops virtnet_ethtool_ops = {
> - .get_drvinfo = virtnet_get_drvinfo,
> - .get_link = ethtool_op_get_link,
> - .get_ringparam = virtnet_get_ringparam,
> -};
> -
> #define MIN_MTU 68
> #define MAX_MTU 65535
>
> @@ -984,6 +1046,20 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> return 0;
> }
>
> +/* To avoid contending for a lock held by a vcpu that would exit to host, select the
> + * txq based on the processor id.
> + */
> +static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
> +{
> + int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> + smp_processor_id();
> +
> + while (unlikely(txq >= dev->real_num_tx_queues))
> + txq -= dev->real_num_tx_queues;
> +
> + return txq;
> +}
> +
> static const struct net_device_ops virtnet_netdev = {
> .ndo_open = virtnet_open,
> .ndo_stop = virtnet_close,
> @@ -995,6 +1071,7 @@ static const struct net_device_ops virtnet_netdev = {
> .ndo_get_stats64 = virtnet_stats,
> .ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
> .ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
> + .ndo_select_queue = virtnet_select_queue,
> #ifdef CONFIG_NET_POLL_CONTROLLER
> .ndo_poll_controller = virtnet_netpoll,
> #endif
> @@ -1030,10 +1107,10 @@ static void virtnet_config_changed_work(struct work_struct *work)
>
> if (vi->status & VIRTIO_NET_S_LINK_UP) {
> netif_carrier_on(vi->dev);
> - netif_wake_queue(vi->dev);
> + netif_tx_wake_all_queues(vi->dev);
> } else {
> netif_carrier_off(vi->dev);
> - netif_stop_queue(vi->dev);
> + netif_tx_stop_all_queues(vi->dev);
> }
> done:
> mutex_unlock(&vi->config_lock);
> @@ -1046,41 +1123,212 @@ static void virtnet_config_changed(struct virtio_device *vdev)
> schedule_work(&vi->config_work);
> }
>
> -static int init_vqs(struct virtnet_info *vi)
> +static void free_receive_bufs(struct virtnet_info *vi)
> +{
> + int i;
> +
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + while (vi->rq[i].pages)
> + __free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
> + }
> +}
> +
> +/* Free memory allocated for send and receive queues */
> +static void virtnet_free_queues(struct virtnet_info *vi)
> {
> - struct virtqueue *vqs[3];
> - vq_callback_t *callbacks[] = { skb_recv_done, skb_xmit_done, NULL};
> - const char *names[] = { "input", "output", "control" };
> - int nvqs, err;
> + kfree(vi->rq);
> + vi->rq = NULL;
> + kfree(vi->sq);
> + vi->sq = NULL;
> +}
>
> - /* We expect two virtqueues, receive then send,
> - * and optionally control. */
> - nvqs = virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ? 3 : 2;
> +static void free_unused_bufs(struct virtnet_info *vi)
> +{
> + void *buf;
> + int i;
>
> - err = vi->vdev->config->find_vqs(vi->vdev, nvqs, vqs, callbacks, names);
> - if (err)
> - return err;
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + struct virtqueue *vq = vi->sq[i].vq;
> + while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> + dev_kfree_skb(buf);
> + }
>
> - vi->rq.vq = vqs[0];
> - vi->sq.vq = vqs[1];
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + struct virtqueue *vq = vi->rq[i].vq;
>
> - if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
> - vi->cvq = vqs[2];
> + while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> + if (vi->mergeable_rx_bufs || vi->big_packets)
> + give_pages(&vi->rq[i], buf);
> + else
> + dev_kfree_skb(buf);
> + --vi->rq[i].num;
> + }
> + BUG_ON(vi->rq[i].num != 0);
> + }
> +}
>
> +static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
> +{
> + int i;
> +
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + int cpu = set ? i : -1;
> + virtqueue_set_affinity(vi->rq[i].vq, cpu);
> + virtqueue_set_affinity(vi->sq[i].vq, cpu);
> + }
> +}
> +
> +static void virtnet_del_vqs(struct virtnet_info *vi)
> +{
> + struct virtio_device *vdev = vi->vdev;
> +
> + virtnet_set_affinity(vi, false);
> +
> + vdev->config->del_vqs(vdev);
> +
> + virtnet_free_queues(vi);
> +}
> +
> +static int virtnet_find_vqs(struct virtnet_info *vi)
> +{
> + vq_callback_t **callbacks;
> + struct virtqueue **vqs;
> + int ret = -ENOMEM;
> + int i, total_vqs;
> + char **names;
> +
> + /*
> + * We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by
> + * possible control virtqueue, followed by RX/TX N-1 queue pairs used
> + * in multiqueue mode.
> + */
> + total_vqs = vi->max_queue_pairs * 2 +
> + virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ);
> +
> + /* Allocate space for find_vqs parameters */
> + vqs = kzalloc(total_vqs * sizeof(*vqs), GFP_KERNEL);
> + callbacks = kzalloc(total_vqs * sizeof(*callbacks), GFP_KERNEL);
> + if (!vqs || !callbacks)
> + goto err_mem;
> + names = kzalloc(total_vqs * sizeof(*names), GFP_KERNEL);
> + if (!names)
> + goto err_mem;
> +
> + /* Parameters for control virtqueue, if any */
> + if (vi->has_cvq) {
> + callbacks[2] = NULL;
> + names[2] = kasprintf(GFP_KERNEL, "control");
> + }
> +
> + /* Allocate/initialize parameters for send/receive virtqueues */
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + callbacks[rxq2vq(i)] = skb_recv_done;
> + callbacks[txq2vq(i)] = skb_xmit_done;
> + names[rxq2vq(i)] = kasprintf(GFP_KERNEL, "input.%d", i);
> + names[txq2vq(i)] = kasprintf(GFP_KERNEL, "output.%d", i);
> + }
> +
> + ret = vi->vdev->config->find_vqs(vi->vdev, total_vqs, vqs, callbacks,
> + (const char **)names);
> + if (ret)
> + goto err_names;
> +
> + if (vi->has_cvq) {
> + vi->cvq = vqs[2];
> if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
> vi->dev->features |= NETIF_F_HW_VLAN_FILTER;
> }
> +
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + vi->rq[i].vq = vqs[rxq2vq(i)];
> + vi->sq[i].vq = vqs[txq2vq(i)];
> + }
> +
> + kfree(callbacks);
> + kfree(vqs);
> +
> + return 0;
> +
> +err_names:
> + for (i = 0; i < total_vqs; i++)
> + kfree(names[i]);
> + kfree(names);
> +
> +err_mem:
> + kfree(callbacks);
> + kfree(vqs);
> +
> + return ret;
> +}
> +
> +static int virtnet_alloc_queues(struct virtnet_info *vi)
> +{
> + int i;
> +
> + vi->sq = kzalloc(sizeof(vi->sq[0]) * vi->max_queue_pairs, GFP_KERNEL);
> + vi->rq = kzalloc(sizeof(vi->rq[0]) * vi->max_queue_pairs, GFP_KERNEL);
> + if (!vi->rq || !vi->sq)
> + goto err;
> +
> + /* setup initial receive and send queue parameters */
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + vi->rq[i].pages = NULL;
> + INIT_DELAYED_WORK(&vi->rq[i].refill, refill_work);
> + netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
> + napi_weight);
> +
> + sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
> + sg_init_table(vi->sq[i].sg, ARRAY_SIZE(vi->sq[i].sg));
> + }
> +
> +
> return 0;
> +
> +err:
> + virtnet_free_queues(vi);
> + return -ENOMEM;
> +}
> +
> +static int init_vqs(struct virtnet_info *vi)
> +{
> + int ret;
> +
> + /* Allocate send & receive queues */
> + ret = virtnet_alloc_queues(vi);
> + if (ret)
> + goto err;
> +
> + ret = virtnet_find_vqs(vi);
> + if (ret)
> + goto err_free;
> +
> + virtnet_set_affinity(vi, true);
> + return 0;
> +
> +err_free:
> + virtnet_free_queues(vi);
> +err:
> + return ret;
> }
>
> static int virtnet_probe(struct virtio_device *vdev)
> {
> - int err;
> + int i, err;
> struct net_device *dev;
> struct virtnet_info *vi;
> + u16 curr_queue_pairs;
Probably a good idea to rename this max_queue_pairs.
> +
> + /* Find if host supports multiqueue virtio_net device */
> + err = virtio_config_val(vdev, VIRTIO_NET_F_RFS,
> + offsetof(struct virtio_net_config,
> + max_virtqueue_pairs), &curr_queue_pairs);
> +
> + /* We need at least 2 queues */
> + if (err)
> + curr_queue_pairs = 1;
Let's also validate against VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MIN
and VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MAX.
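
A minimal sketch of what such a validation might look like, using the MIN/MAX
values defined later in this patch. The helper name and the choice to fall
back to a single queue pair on an out-of-range value are assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MIN 1
#define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MAX 0x8000

/*
 * Hypothetical helper: read_err is the result of reading the config field
 * (non-zero when the feature is absent), reported is the value read.
 * Returns the number of queue pairs the driver should allocate.
 */
uint16_t sanitize_max_queue_pairs(int read_err, uint16_t reported)
{
	/* Feature not offered: single queue pair. */
	if (read_err)
		return 1;

	/* Assumption: an out-of-range value also falls back to one pair. */
	if (reported < VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MIN ||
	    reported > VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MAX)
		return 1;

	return reported;
}

/* Small usage example. */
void demo_sanitize(void)
{
	assert(sanitize_max_queue_pairs(-1, 8) == 1);
	assert(sanitize_max_queue_pairs(0, 0) == 1);
	assert(sanitize_max_queue_pairs(0, 8) == 8);
	assert(sanitize_max_queue_pairs(0, 0x8001) == 1);
}
```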
>
> /* Allocate ourselves a network device with room for our info */
> - dev = alloc_etherdev(sizeof(struct virtnet_info));
> + dev = alloc_etherdev_mq(sizeof(struct virtnet_info), curr_queue_pairs);
> if (!dev)
> return -ENOMEM;
>
> @@ -1126,22 +1374,17 @@ static int virtnet_probe(struct virtio_device *vdev)
>
> /* Set up our device-specific information */
> vi = netdev_priv(dev);
> - netif_napi_add(dev, &vi->rq.napi, virtnet_poll, napi_weight);
> vi->dev = dev;
> vi->vdev = vdev;
> vdev->priv = vi;
> - vi->rq.pages = NULL;
> vi->stats = alloc_percpu(struct virtnet_stats);
> err = -ENOMEM;
> if (vi->stats == NULL)
> goto free;
>
> - INIT_DELAYED_WORK(&vi->rq.refill, refill_work);
> mutex_init(&vi->config_lock);
> vi->config_enable = true;
> INIT_WORK(&vi->config_work, virtnet_config_changed_work);
> - sg_init_table(vi->rq.sg, ARRAY_SIZE(vi->rq.sg));
> - sg_init_table(vi->sq.sg, ARRAY_SIZE(vi->sq.sg));
>
> /* If we can receive ANY GSO packets, we must allocate large ones. */
> if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> @@ -1152,10 +1395,21 @@ static int virtnet_probe(struct virtio_device *vdev)
> if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> vi->mergeable_rx_bufs = true;
>
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
> + vi->has_cvq = true;
> +
> + /* Use single tx/rx queue pair as default */
> + vi->curr_queue_pairs = 1;
> + vi->max_queue_pairs = curr_queue_pairs;
> +
> + /* Allocate/initialize the rx/tx queues, and invoke find_vqs */
> err = init_vqs(vi);
> if (err)
> goto free_stats;
>
> + netif_set_real_num_tx_queues(dev, 1);
> + netif_set_real_num_rx_queues(dev, 1);
> +
> err = register_netdev(dev);
> if (err) {
> pr_debug("virtio_net: registering device failed\n");
> @@ -1163,12 +1417,15 @@ static int virtnet_probe(struct virtio_device *vdev)
> }
>
> /* Last of all, set up some receive buffers. */
> - try_fill_recv(&vi->rq, GFP_KERNEL);
> -
> - /* If we didn't even get one input buffer, we're useless. */
> - if (vi->rq.num == 0) {
> - err = -ENOMEM;
> - goto unregister;
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + try_fill_recv(&vi->rq[i], GFP_KERNEL);
> +
> + /* If we didn't even get one input buffer, we're useless. */
> + if (vi->rq[i].num == 0) {
> + free_unused_bufs(vi);
> + err = -ENOMEM;
> + goto free_recv_bufs;
> + }
> }
>
> /* Assume link up if device can't report link status,
> @@ -1181,13 +1438,20 @@ static int virtnet_probe(struct virtio_device *vdev)
> netif_carrier_on(dev);
> }
>
> - pr_debug("virtnet: registered device %s\n", dev->name);
> + pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
> + dev->name, curr_queue_pairs);
> +
> return 0;
>
> -unregister:
> +free_recv_bufs:
> + free_receive_bufs(vi);
> unregister_netdev(dev);
> +
> free_vqs:
> - vdev->config->del_vqs(vdev);
> + for (i = 0; i <curr_queue_pairs; i++)
> + cancel_delayed_work_sync(&vi->rq[i].refill);
> + virtnet_del_vqs(vi);
> +
> free_stats:
> free_percpu(vi->stats);
> free:
> @@ -1195,28 +1459,6 @@ free:
> return err;
> }
>
> -static void free_unused_bufs(struct virtnet_info *vi)
> -{
> - void *buf;
> - while (1) {
> - buf = virtqueue_detach_unused_buf(vi->sq.vq);
> - if (!buf)
> - break;
> - dev_kfree_skb(buf);
> - }
> - while (1) {
> - buf = virtqueue_detach_unused_buf(vi->rq.vq);
> - if (!buf)
> - break;
> - if (vi->mergeable_rx_bufs || vi->big_packets)
> - give_pages(&vi->rq, buf);
> - else
> - dev_kfree_skb(buf);
> - --vi->rq.num;
> - }
> - BUG_ON(vi->rq.num != 0);
> -}
> -
> static void remove_vq_common(struct virtnet_info *vi)
> {
> vi->vdev->config->reset(vi->vdev);
> @@ -1224,10 +1466,9 @@ static void remove_vq_common(struct virtnet_info *vi)
> /* Free unused buffers in both send and recv, if any. */
> free_unused_bufs(vi);
>
> - vi->vdev->config->del_vqs(vi->vdev);
> + free_receive_bufs(vi);
>
> - while (vi->rq.pages)
> - __free_pages(get_a_page(&vi->rq, GFP_KERNEL), 0);
> + virtnet_del_vqs(vi);
> }
>
> static void __devexit virtnet_remove(struct virtio_device *vdev)
> @@ -1253,6 +1494,7 @@ static void __devexit virtnet_remove(struct virtio_device *vdev)
> static int virtnet_freeze(struct virtio_device *vdev)
> {
> struct virtnet_info *vi = vdev->priv;
> + int i;
>
> /* Prevent config work handler from accessing the device */
> mutex_lock(&vi->config_lock);
> @@ -1260,10 +1502,14 @@ static int virtnet_freeze(struct virtio_device *vdev)
> mutex_unlock(&vi->config_lock);
>
> netif_device_detach(vi->dev);
> - cancel_delayed_work_sync(&vi->rq.refill);
> + for (i = 0; i < vi->max_queue_pairs; i++)
> + cancel_delayed_work_sync(&vi->rq[i].refill);
>
> if (netif_running(vi->dev))
> - napi_disable(&vi->rq.napi);
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + napi_disable(&vi->rq[i].napi);
> + netif_napi_del(&vi->rq[i].napi);
> + }
>
> remove_vq_common(vi);
>
> @@ -1275,24 +1521,28 @@ static int virtnet_freeze(struct virtio_device *vdev)
> static int virtnet_restore(struct virtio_device *vdev)
> {
> struct virtnet_info *vi = vdev->priv;
> - int err;
> + int err, i;
>
> err = init_vqs(vi);
> if (err)
> return err;
>
> if (netif_running(vi->dev))
> - virtnet_napi_enable(&vi->rq);
> + for (i = 0; i < vi->max_queue_pairs; i++)
> + virtnet_napi_enable(&vi->rq[i]);
>
> netif_device_attach(vi->dev);
>
> - if (!try_fill_recv(&vi->rq, GFP_KERNEL))
> - schedule_delayed_work(&vi->rq.refill, 0);
> + for (i = 0; i < vi->max_queue_pairs; i++)
> + if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> + schedule_delayed_work(&vi->rq[i].refill, 0);
>
> mutex_lock(&vi->config_lock);
> vi->config_enable = true;
> mutex_unlock(&vi->config_lock);
>
> + BUG_ON(virtnet_set_queues(vi));
> +
Won't this always fail when control vq is off?
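
To illustrate the concern: as posted, virtnet_set_queues() returns -EINVAL
whenever there is no control vq, so the BUG_ON would fire on restore. A
minimal model of that behaviour next to one possible guard follows. All names
are hypothetical, send_ok stands in for the result of the control command, and
the guarded variant is only one way this could be resolved:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Models the posted logic: no control vq means failure, always. */
int set_queues_posted(bool has_cvq, unsigned int pairs, bool send_ok)
{
	(void)pairs;
	if (!has_cvq)
		return -EINVAL;
	return send_ok ? 0 : -EINVAL;
}

/*
 * Possible guard (assumption): without a control vq the device can only
 * run with one queue pair, so there is nothing to program in that case.
 */
int set_queues_guarded(bool has_cvq, unsigned int pairs, bool send_ok)
{
	if (!has_cvq)
		return pairs == 1 ? 0 : -EINVAL;
	return send_ok ? 0 : -EINVAL;
}

/* Small usage example. */
void demo_set_queues(void)
{
	/* Single queue, no control vq: posted logic fails, guard succeeds. */
	assert(set_queues_posted(false, 1, true) == -EINVAL);
	assert(set_queues_guarded(false, 1, true) == 0);

	/* With a control vq both variants follow the command result. */
	assert(set_queues_posted(true, 4, true) == 0);
	assert(set_queues_guarded(true, 4, false) == -EINVAL);
}
```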
> return 0;
> }
> #endif
> @@ -1310,7 +1560,7 @@ static unsigned int features[] = {
> VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
> VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
> VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
> - VIRTIO_NET_F_GUEST_ANNOUNCE,
> + VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_RFS,
> };
>
> static struct virtio_driver virtio_net_driver = {
> @@ -1328,6 +1578,12 @@ static struct virtio_driver virtio_net_driver = {
> #endif
> };
>
> +static const struct ethtool_ops virtnet_ethtool_ops = {
> + .get_drvinfo = virtnet_get_drvinfo,
> + .get_link = ethtool_op_get_link,
> + .get_ringparam = virtnet_get_ringparam,
> +};
> +
> static int __init init(void)
> {
> return register_virtio_driver(&virtio_net_driver);
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index 2470f54..6056cec 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -51,6 +51,7 @@
> #define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */
> #define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can announce device on the
> * network */
> +#define VIRTIO_NET_F_RFS 22 /* Device supports multiple TXQ/RXQ */
>
> #define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
> #define VIRTIO_NET_S_ANNOUNCE 2 /* Announcement is needed */
> @@ -60,6 +61,8 @@ struct virtio_net_config {
> __u8 mac[6];
> /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
> __u16 status;
> + /* Total number of RX/TX queues */
> + __u16 max_virtqueue_pairs;
> } __attribute__((packed));
>
> /* This is the first element of the scatter-gather list. If you don't
> @@ -166,4 +169,17 @@ struct virtio_net_ctrl_mac {
> #define VIRTIO_NET_CTRL_ANNOUNCE 3
> #define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
>
> +/*
> + * Control multiqueue
> + *
> + */
> +struct virtio_net_ctrl_rfs {
> + u16 virtqueue_pairs;
> +};
> +
> +#define VIRTIO_NET_CTRL_RFS 4
> + #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET 0
> + #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MIN 1
> + #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MAX 0x8000
> +
> #endif /* _LINUX_VIRTIO_NET_H */
> --
> 1.7.1
^ permalink raw reply
* [PATCH net-next v2 0/3] mlx4_en: set number of rx/tx channels using ethtool
From: Amir Vadai @ 2012-12-02 13:49 UTC (permalink / raw)
To: David S. Miller; +Cc: Ben Hutchings, Amir Vadai, Or Gerlitz, Oren Duer, netdev
1. Added a record in the MAINTAINERS file for the mlx4_en driver
2. Fix set_ringparam not to forget tx moderation info + remove code duplication
3. Add support for changing the number of rx/tx channels using ethtool
---
Changes from V1:
- Set limits to number of channels by priv->max_* and not values supplied by user
- Fix indentation
Changes from V0:
- Added file pattern to MAINTAINERS file
Amir Vadai (3):
MAINTAINERS: Add Mellanox ethernet driver - mlx4_en
net/mlx4_en: Fix TX moderation info loss after set_ringparam is
called
net/mlx4_en: Set number of rx/tx channels using ethtool
MAINTAINERS | 8 ++
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 128 +++++++++++++++++-----
drivers/net/ethernet/mellanox/mlx4/en_main.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 26 +++--
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 8 ++-
6 files changed, 131 insertions(+), 43 deletions(-)
--
1.7.8.2
^ permalink raw reply
* [PATCH net-next 13/13] bnx2x: Correct PFC disablement
From: Yuval Mintz @ 2012-12-02 14:05 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Barak Witkowski, Yuval Mintz
In-Reply-To: <1354457157-4730-1-git-send-email-yuvalmin@broadcom.com>
From: Barak Witkowski <barak@broadcom.com>
Previously the bnx2x driver could only enable PFC via dcbnl; it can now
also correctly disable it.
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
index c8c0340..10bc093 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
@@ -2052,10 +2052,13 @@ static void bnx2x_dcbnl_set_pfc_cfg(struct net_device *netdev, int prio,
if (!bnx2x_dcbnl_set_valid(bp) || prio >= MAX_PFC_PRIORITIES)
return;
- bp->dcbx_config_params.admin_pfc_bitmap |= ((setting ? 1 : 0) << prio);
- if (setting)
+ if (setting) {
+ bp->dcbx_config_params.admin_pfc_bitmap |= (1 << prio);
bp->dcbx_config_params.admin_pfc_tx_enable = 1;
+ } else {
+ bp->dcbx_config_params.admin_pfc_bitmap &= ~(1 << prio);
+ }
}
static void bnx2x_dcbnl_get_pfc_cfg(struct net_device *netdev, int prio,
--
1.7.1
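
The bitmap change above can be sketched in isolation. OR-ing in a zero bit
never clears anything, which is why the old code could not disable a
priority. The function names here are hypothetical and operate on a plain
integer rather than the driver's config structure:

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: when setting == 0 this ORs in 0, leaving the bit as it was. */
uint32_t pfc_update_old(uint32_t bitmap, int prio, int setting)
{
	return bitmap | ((uint32_t)(setting ? 1 : 0) << prio);
}

/* New behaviour: set the bit when enabling, clear it when disabling. */
uint32_t pfc_update_new(uint32_t bitmap, int prio, int setting)
{
	if (setting)
		return bitmap | ((uint32_t)1 << prio);
	return bitmap & ~((uint32_t)1 << prio);
}

/* Small usage example. */
void demo_pfc(void)
{
	/* Priority 3 enabled, then a request to disable it. */
	assert(pfc_update_old(0x08, 3, 0) == 0x08); /* bit stays set */
	assert(pfc_update_new(0x08, 3, 0) == 0x00); /* bit is cleared */

	/* Enabling behaves the same in both versions. */
	assert(pfc_update_old(0x00, 3, 1) == 0x08);
	assert(pfc_update_new(0x00, 3, 1) == 0x08);
}
```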
^ permalink raw reply related
* [PATCH net-next 12/13] bnx2x: fix 'Ethtool -A' when autoneg
From: Yuval Mintz @ 2012-12-02 14:05 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Yuval Mintz
In-Reply-To: <1354457157-4730-1-git-send-email-yuvalmin@broadcom.com>
When configuring pauses using 'ethtool -A', the requested values now take
effect when used together with autoneg (up to this point, when configured
for autoneg, the driver ignored the requested pause configuration).
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 2 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 49 ++++++++++++----------
2 files changed, 28 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index ad28074..32c3ab7 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -144,7 +144,7 @@ u32 bnx2x_fw_command(struct bnx2x *bp, u32 command, u32 param);
* @bp: driver handle
* @load_mode: current mode
*/
-u8 bnx2x_initial_phy_init(struct bnx2x *bp, int load_mode);
+int bnx2x_initial_phy_init(struct bnx2x *bp, int load_mode);
/**
* bnx2x_link_set - configure hw according to link parameters structure.
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index ab65f34..c8ec3fc 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -2103,22 +2103,25 @@ void bnx2x_calc_fc_adv(struct bnx2x *bp)
}
}
-u8 bnx2x_initial_phy_init(struct bnx2x *bp, int load_mode)
+static void bnx2x_set_requested_fc(struct bnx2x *bp)
{
- if (!BP_NOMCP(bp)) {
- u8 rc;
- int cfx_idx = bnx2x_get_link_cfg_idx(bp);
- u16 req_line_speed = bp->link_params.req_line_speed[cfx_idx];
- /*
- * Initialize link parameters structure variables
- * It is recommended to turn off RX FC for jumbo frames
- * for better performance
- */
- if (CHIP_IS_E1x(bp) && (bp->dev->mtu > 5000))
- bp->link_params.req_fc_auto_adv = BNX2X_FLOW_CTRL_TX;
- else
- bp->link_params.req_fc_auto_adv = BNX2X_FLOW_CTRL_BOTH;
+ /* Initialize link parameters structure variables
+ * It is recommended to turn off RX FC for jumbo frames
+ * for better performance
+ */
+ if (CHIP_IS_E1x(bp) && (bp->dev->mtu > 5000))
+ bp->link_params.req_fc_auto_adv = BNX2X_FLOW_CTRL_TX;
+ else
+ bp->link_params.req_fc_auto_adv = BNX2X_FLOW_CTRL_BOTH;
+}
+int bnx2x_initial_phy_init(struct bnx2x *bp, int load_mode)
+{
+ int rc, cfx_idx = bnx2x_get_link_cfg_idx(bp);
+ u16 req_line_speed = bp->link_params.req_line_speed[cfx_idx];
+
+ if (!BP_NOMCP(bp)) {
+ bnx2x_set_requested_fc(bp);
bnx2x_acquire_phy_lock(bp);
if (load_mode == LOAD_DIAG) {
@@ -2147,11 +2150,11 @@ u8 bnx2x_initial_phy_init(struct bnx2x *bp, int load_mode)
bnx2x_calc_fc_adv(bp);
- if (CHIP_REV_IS_SLOW(bp) && bp->link_vars.link_up) {
+ if (bp->link_vars.link_up) {
bnx2x_stats_handle(bp, STATS_EVENT_LINK_UP);
bnx2x_link_report(bp);
- } else
- queue_delayed_work(bnx2x_wq, &bp->period_task, 0);
+ }
+ queue_delayed_work(bnx2x_wq, &bp->period_task, 0);
bp->link_params.req_line_speed[cfx_idx] = req_line_speed;
return rc;
}
@@ -10315,11 +10318,13 @@ static void __devinit bnx2x_link_settings_requested(struct bnx2x *bp)
bp->link_params.req_flow_ctrl[idx] = (link_config &
PORT_FEATURE_FLOW_CONTROL_MASK);
- if ((bp->link_params.req_flow_ctrl[idx] ==
- BNX2X_FLOW_CTRL_AUTO) &&
- !(bp->port.supported[idx] & SUPPORTED_Autoneg)) {
- bp->link_params.req_flow_ctrl[idx] =
- BNX2X_FLOW_CTRL_NONE;
+ if (bp->link_params.req_flow_ctrl[idx] ==
+ BNX2X_FLOW_CTRL_AUTO) {
+ if (!(bp->port.supported[idx] & SUPPORTED_Autoneg))
+ bp->link_params.req_flow_ctrl[idx] =
+ BNX2X_FLOW_CTRL_NONE;
+ else
+ bnx2x_set_requested_fc(bp);
}
BNX2X_DEV_INFO("req_line_speed %d req_duplex %d req_flow_ctrl 0x%x advertising 0x%x\n",
--
1.7.1
^ permalink raw reply related
* [PATCH net-next 11/13] bnx2x: prevent DCB if disabled in nvram
From: Yuval Mintz @ 2012-12-02 14:05 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Barak Witkowski, Yuval Mintz
In-Reply-To: <1354457157-4730-1-git-send-email-yuvalmin@broadcom.com>
From: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c | 5 +++++
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 11 +++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
index c0d9b69..c8c0340 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
@@ -1904,6 +1904,11 @@ static u8 bnx2x_dcbnl_set_state(struct net_device *netdev, u8 state)
struct bnx2x *bp = netdev_priv(netdev);
DP(BNX2X_MSG_DCB, "state = %s\n", state ? "on" : "off");
+ if (state && ((bp->dcbx_enabled == BNX2X_DCBX_ENABLED_OFF) ||
+ (bp->dcbx_enabled == BNX2X_DCBX_ENABLED_INVALID))) {
+ DP(BNX2X_MSG_DCB, "Can not set dcbx to enabled while it is disabled in nvm\n");
+ return 1;
+ }
bnx2x_dcbx_set_state(bp, (state ? true : false), bp->dcbx_enabled);
return 0;
}
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index d76ca90..ab65f34 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -11120,8 +11120,15 @@ static int __devinit bnx2x_init_bp(struct bnx2x *bp)
bp->timer.data = (unsigned long) bp;
bp->timer.function = bnx2x_timer;
- bnx2x_dcbx_set_state(bp, true, BNX2X_DCBX_ENABLED_ON_NEG_ON);
- bnx2x_dcbx_init_params(bp);
+ if (SHMEM2_HAS(bp, dcbx_lldp_params_offset) &&
+ SHMEM2_HAS(bp, dcbx_lldp_dcbx_stat_offset) &&
+ SHMEM2_RD(bp, dcbx_lldp_params_offset) &&
+ SHMEM2_RD(bp, dcbx_lldp_dcbx_stat_offset)) {
+ bnx2x_dcbx_set_state(bp, true, BNX2X_DCBX_ENABLED_ON_NEG_ON);
+ bnx2x_dcbx_init_params(bp);
+ } else {
+ bnx2x_dcbx_set_state(bp, false, BNX2X_DCBX_ENABLED_OFF);
+ }
if (CHIP_IS_E1x(bp))
bp->cnic_base_cl_id = FP_SB_MAX_E1x;
--
1.7.1
* [PATCH net-next 09/13] bnx2x: mask CPL_OF interrupt
From: Yuval Mintz @ 2012-12-02 14:05 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Yuval Mintz
In-Reply-To: <1354457157-4730-1-git-send-email-yuvalmin@broadcom.com>
Leaving the CPL_OF interrupt unmasked caused spurious
"FATAL HW block attention set2 0x20" messages to appear, even though the
associated interrupt is fully recoverable.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 21 ++++++++++-----------
1 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 5ff0bcb..983a0c8 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -6059,6 +6059,8 @@ static int bnx2x_int_mem_test(struct bnx2x *bp)
static void bnx2x_enable_blocks_attention(struct bnx2x *bp)
{
+ u32 val;
+
REG_WR(bp, PXP_REG_PXP_INT_MASK_0, 0);
if (!CHIP_IS_E1x(bp))
REG_WR(bp, PXP_REG_PXP_INT_MASK_1, 0x40);
@@ -6092,17 +6094,14 @@ static void bnx2x_enable_blocks_attention(struct bnx2x *bp)
/* REG_WR(bp, CSEM_REG_CSEM_INT_MASK_0, 0); */
/* REG_WR(bp, CSEM_REG_CSEM_INT_MASK_1, 0); */
- if (CHIP_REV_IS_FPGA(bp))
- REG_WR(bp, PXP2_REG_PXP2_INT_MASK_0, 0x580000);
- else if (!CHIP_IS_E1x(bp))
- REG_WR(bp, PXP2_REG_PXP2_INT_MASK_0,
- (PXP2_PXP2_INT_MASK_0_REG_PGL_CPL_OF
- | PXP2_PXP2_INT_MASK_0_REG_PGL_CPL_AFT
- | PXP2_PXP2_INT_MASK_0_REG_PGL_PCIE_ATTN
- | PXP2_PXP2_INT_MASK_0_REG_PGL_READ_BLOCKED
- | PXP2_PXP2_INT_MASK_0_REG_PGL_WRITE_BLOCKED));
- else
- REG_WR(bp, PXP2_REG_PXP2_INT_MASK_0, 0x480000);
+ val = PXP2_PXP2_INT_MASK_0_REG_PGL_CPL_AFT |
+ PXP2_PXP2_INT_MASK_0_REG_PGL_CPL_OF |
+ PXP2_PXP2_INT_MASK_0_REG_PGL_PCIE_ATTN;
+ if (!CHIP_IS_E1x(bp))
+ val |= PXP2_PXP2_INT_MASK_0_REG_PGL_READ_BLOCKED |
+ PXP2_PXP2_INT_MASK_0_REG_PGL_WRITE_BLOCKED;
+ REG_WR(bp, PXP2_REG_PXP2_INT_MASK_0, val);
+
REG_WR(bp, TSDM_REG_TSDM_INT_MASK_0, 0);
REG_WR(bp, TSDM_REG_TSDM_INT_MASK_1, 0);
REG_WR(bp, TCM_REG_TCM_INT_MASK, 0);
--
1.7.1
* [PATCH net-next 10/13] bnx2x: Handle a rarely missed interrupt
From: Yuval Mintz @ 2012-12-02 14:05 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Yaniv Rosner, Yuval Mintz
In-Reply-To: <1354457157-4730-1-git-send-email-yuvalmin@broadcom.com>
From: Yaniv Rosner <yaniv.rosner@broadcom.com>
In rare cases the link fails to come up because an interrupt is missed:
acknowledging the IGU via the BAR races with restoring the NIG interrupt
mask via the GRC.
To solve this, wait for the IGU ack command to finish before restoring the
NIG interrupt mask.
Signed-off-by: Yaniv Rosner <yaniv.rosner@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 1 +
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 15 +++++++++++++++
2 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 03647bf..02ea644 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -915,6 +915,7 @@ struct bnx2x_common {
#define BNX2X_IGU_STAS_MSG_VF_CNT 64
#define BNX2X_IGU_STAS_MSG_PF_CNT 4
+#define MAX_IGU_ATTN_ACK_TO 100
/* end of common */
/* port */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 983a0c8..d76ca90 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -3588,6 +3588,21 @@ static void bnx2x_attn_int_asserted(struct bnx2x *bp, u32 asserted)
/* now set back the mask */
if (asserted & ATTN_NIG_FOR_FUNC) {
+ /* Verify that IGU ack through BAR was written before restoring
+ * NIG mask. This loop should exit after 2-3 iterations max.
+ */
+ if (bp->common.int_block != INT_BLOCK_HC) {
+ u32 cnt = 0, igu_acked;
+ do {
+ igu_acked = REG_RD(bp,
+ IGU_REG_ATTENTION_ACK_BITS);
+ } while (((igu_acked & ATTN_NIG_FOR_FUNC) == 0) &&
+ (++cnt < MAX_IGU_ATTN_ACK_TO));
+ if (!igu_acked)
+ DP(NETIF_MSG_HW,
+ "Failed to verify IGU ack on time\n");
+ barrier();
+ }
REG_WR(bp, nig_int_mask_addr, nig_mask);
bnx2x_release_phy_lock(bp);
}
--
1.7.1
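
The bounded wait described in this patch can be sketched in isolation as
follows. This is only an illustration, not the driver code: the register is
modelled by a hypothetical `struct fake_reg`, `ATTN_BIT` is a stand-in for
ATTN_NIG_FOR_FUNC, and the helper names are made up. The retry budget of 100
mirrors MAX_IGU_ATTN_ACK_TO. The sketch reports failure whenever the polled
bit was never observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ATTN_BIT      0x1u	/* hypothetical stand-in for ATTN_NIG_FOR_FUNC */
#define MAX_ACK_TRIES 100u	/* mirrors MAX_IGU_ATTN_ACK_TO from the patch */

/* Hypothetical register model: the ack bit becomes visible on the
 * `ready_after`-th read.
 */
struct fake_reg {
	unsigned int reads;
	unsigned int ready_after;
};

static uint32_t fake_reg_read(struct fake_reg *r)
{
	r->reads++;
	return (r->reads >= r->ready_after) ? ATTN_BIT : 0;
}

/* Poll until the ack bit is observed or the retry budget is spent.
 * Returns true if the ack bit was seen.
 */
static bool wait_for_ack(struct fake_reg *r, unsigned int max_tries)
{
	unsigned int cnt = 0;
	uint32_t val;

	assert(max_tries > 0);
	do {
		val = fake_reg_read(r);
	} while ((val & ATTN_BIT) == 0 && ++cnt < max_tries);

	return (val & ATTN_BIT) != 0;
}
```

A caller would restore the interrupt mask only after wait_for_ack() returns,
and log a debug message when it returns false.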
* [PATCH net-next 07/13] bnx2x: Ext. config accessed only on non-E1x.
From: Yuval Mintz @ 2012-12-02 14:05 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Yuval Mintz
In-Reply-To: <1354457157-4730-1-git-send-email-yuvalmin@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 89b3d10..571508d 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -10484,8 +10484,9 @@ static void __devinit bnx2x_get_fcoe_info(struct bnx2x *bp)
if (BNX2X_MF_EXT_PROTOCOL_FCOE(bp) && !CHIP_IS_E1x(bp))
bnx2x_get_ext_wwn_info(bp, func);
- } else if (IS_MF_FCOE_SD(bp))
+ } else if (IS_MF_FCOE_SD(bp) && !CHIP_IS_E1x(bp)) {
bnx2x_get_ext_wwn_info(bp, func);
+ }
BNX2X_DEV_INFO("max_fcoe_conn 0x%x\n", bp->cnic_eth_dev.max_fcoe_conn);
--
1.7.1
* [PATCH net-next 08/13] bnx2x: IGU parse error cause probe failure
From: Yuval Mintz @ 2012-12-02 14:05 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Barak Witkowski, Yuval Mintz
In-Reply-To: <1354457157-4730-1-git-send-email-yuvalmin@broadcom.com>
From: Barak Witkowski <barak@broadcom.com>
If an IGU parse error is encountered during probe, the error is now
propagated and the probe fails gracefully. Until now such errors were
ignored, causing problems later on.
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 17 ++++++++++++-----
1 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 571508d..5ff0bcb 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9974,7 +9974,7 @@ static void __devinit bnx2x_get_common_hwinfo(struct bnx2x *bp)
#define IGU_FID(val) GET_FIELD((val), IGU_REG_MAPPING_MEMORY_FID)
#define IGU_VEC(val) GET_FIELD((val), IGU_REG_MAPPING_MEMORY_VECTOR)
-static void __devinit bnx2x_get_igu_cam_info(struct bnx2x *bp)
+static int __devinit bnx2x_get_igu_cam_info(struct bnx2x *bp)
{
int pfid = BP_FUNC(bp);
int igu_sb_id;
@@ -9991,7 +9991,7 @@ static void __devinit bnx2x_get_igu_cam_info(struct bnx2x *bp)
bp->igu_dsb_id = E1HVN_MAX * FP_SB_MAX_E1x +
(CHIP_MODE_IS_4_PORT(bp) ? pfid : vn);
- return;
+ return 0;
}
/* IGU in normal mode - read CAM */
@@ -10025,8 +10025,12 @@ static void __devinit bnx2x_get_igu_cam_info(struct bnx2x *bp)
bp->igu_sb_cnt = min_t(int, bp->igu_sb_cnt, igu_sb_cnt);
#endif
- if (igu_sb_cnt == 0)
+ if (igu_sb_cnt == 0) {
BNX2X_ERR("CAM configuration error\n");
+ return -EINVAL;
+ }
+
+ return 0;
}
static void __devinit bnx2x_link_settings_supported(struct bnx2x *bp,
@@ -10706,6 +10710,8 @@ static int __devinit bnx2x_get_hwinfo(struct bnx2x *bp)
if (REG_RD(bp, IGU_REG_RESET_MEMORIES)) {
dev_err(&bp->pdev->dev,
"FORCING Normal Mode failed!!!\n");
+ bnx2x_release_hw_lock(bp,
+ HW_LOCK_RESOURCE_RESET);
return -EPERM;
}
}
@@ -10716,9 +10722,10 @@ static int __devinit bnx2x_get_hwinfo(struct bnx2x *bp)
} else
BNX2X_DEV_INFO("IGU Normal Mode\n");
- bnx2x_get_igu_cam_info(bp);
-
+ rc = bnx2x_get_igu_cam_info(bp);
bnx2x_release_hw_lock(bp, HW_LOCK_RESOURCE_RESET);
+ if (rc)
+ return rc;
}
/*
--
1.7.1
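
The pattern this patch applies (capture the result, release the lock, then
propagate the error) can be sketched on its own. The types and helpers below
(`struct dev`, hw_lock(), hw_unlock(), parse_cam(), get_hwinfo()) are
hypothetical stand-ins and not the bnx2x functions; only the control flow is
meant to match.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device state standing in for the HW reset lock. */
struct dev {
	int lock_held;
	int sb_count;	/* number of status blocks found while parsing */
};

static void hw_lock(struct dev *d)   { d->lock_held = 1; }
static void hw_unlock(struct dev *d) { d->lock_held = 0; }

/* Parse step: returns an error code instead of only logging the problem. */
static int parse_cam(const struct dev *d)
{
	if (d->sb_count == 0)
		return -EINVAL;
	return 0;
}

/* Caller: capture the result, release the lock, then propagate. */
static int get_hwinfo(struct dev *d)
{
	int rc;

	assert(!d->lock_held);	/* the sketch expects an unlocked device */
	hw_lock(d);
	rc = parse_cam(d);
	hw_unlock(d);
	if (rc)
		return rc;

	return 0;
}
```

Releasing the lock before the early return keeps every exit path balanced,
which is also why the patch adds a release before the existing -EPERM return.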
* [PATCH net-next 06/13] bnx2x: nvram enables dropless flow control
From: Yuval Mintz @ 2012-12-02 14:05 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Yuval Mintz
In-Reply-To: <1354457157-4730-1-git-send-email-yuvalmin@broadcom.com>
It is now possible to enable dropless flow control via nvram.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 2 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h | 13 +++++++++++--
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 20 +++++++++++++++++++-
3 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 641d884..03647bf 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1488,7 +1488,7 @@ struct bnx2x {
int qm_cid_count;
- int dropless_fc;
+ bool dropless_fc;
void *t2;
dma_addr_t t2_mapping;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
index 9a51d49..3369a50 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
@@ -500,7 +500,15 @@ struct port_hw_cfg { /* port 0: 0x12c port 1: 0x2bc */
u32 e3_cmn_pin_cfg1; /* 0x170 */
#define PORT_HW_CFG_E3_OVER_CURRENT_MASK 0x000000FF
#define PORT_HW_CFG_E3_OVER_CURRENT_SHIFT 0
- u32 reserved0[7]; /* 0x174 */
+
+ /* pause on host ring */
+ u32 generic_features; /* 0x174 */
+ #define PORT_HW_CFG_PAUSE_ON_HOST_RING_MASK 0x00000001
+ #define PORT_HW_CFG_PAUSE_ON_HOST_RING_SHIFT 0
+ #define PORT_HW_CFG_PAUSE_ON_HOST_RING_DISABLED 0x00000000
+ #define PORT_HW_CFG_PAUSE_ON_HOST_RING_ENABLED 0x00000001
+
+ u32 reserved0[6]; /* 0x178 */
u32 aeu_int_mask; /* 0x190 */
@@ -1518,12 +1526,13 @@ enum mf_cfg_afex_vlan_mode {
/* This structure is not applicable and should not be accessed on 57711 */
struct func_ext_cfg {
u32 func_cfg;
- #define MACP_FUNC_CFG_FLAGS_MASK 0x000000FF
+ #define MACP_FUNC_CFG_FLAGS_MASK 0x0000007F
#define MACP_FUNC_CFG_FLAGS_SHIFT 0
#define MACP_FUNC_CFG_FLAGS_ENABLED 0x00000001
#define MACP_FUNC_CFG_FLAGS_ETHERNET 0x00000002
#define MACP_FUNC_CFG_FLAGS_ISCSI_OFFLOAD 0x00000004
#define MACP_FUNC_CFG_FLAGS_FCOE_OFFLOAD 0x00000008
+ #define MACP_FUNC_CFG_PAUSE_ON_HOST_RING 0x00000080
u32 iscsi_mac_addr_upper;
u32 iscsi_mac_addr_lower;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 62fcf0f..89b3d10 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -10641,8 +10641,26 @@ static void __devinit bnx2x_get_mac_hwinfo(struct bnx2x *bp)
"bad Ethernet MAC address configuration: %pM\n"
"change it manually before bringing up the appropriate network interface\n",
bp->dev->dev_addr);
+}
+static bool __devinit bnx2x_get_dropless_info(struct bnx2x *bp)
+{
+ int tmp;
+ u32 cfg;
+ if (IS_MF(bp) && !CHIP_IS_E1x(bp)) {
+ /* Take function: tmp = func */
+ tmp = BP_ABS_FUNC(bp);
+ cfg = MF_CFG_RD(bp, func_ext_config[tmp].func_cfg);
+ cfg = !!(cfg & MACP_FUNC_CFG_PAUSE_ON_HOST_RING);
+ } else {
+ /* Take port: tmp = port */
+ tmp = BP_PORT(bp);
+ cfg = SHMEM_RD(bp,
+ dev_info.port_hw_config[tmp].generic_features);
+ cfg = !!(cfg & PORT_HW_CFG_PAUSE_ON_HOST_RING_ENABLED);
+ }
+ return cfg;
}
static int __devinit bnx2x_get_hwinfo(struct bnx2x *bp)
@@ -11063,7 +11081,7 @@ static int __devinit bnx2x_init_bp(struct bnx2x *bp)
if (CHIP_IS_E1(bp))
bp->dropless_fc = 0;
else
- bp->dropless_fc = dropless_fc;
+ bp->dropless_fc = dropless_fc | bnx2x_get_dropless_info(bp);
bp->mrrs = mrrs;
--
1.7.1
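
A standalone sketch of how the nvram bit and the module parameter combine in
this patch. The two bit values are taken from the diff; `struct cfg_inputs`
and both function names are hypothetical, and the sketch leaves out the
CHIP_IS_E1 case in which the driver forces dropless flow control off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as they appear in the patch. */
#define PORT_HW_CFG_PAUSE_ON_HOST_RING_ENABLED 0x00000001u
#define MACP_FUNC_CFG_PAUSE_ON_HOST_RING       0x00000080u

/* Hypothetical flattened view of the inputs the driver reads. */
struct cfg_inputs {
	bool multi_function;	/* stands in for IS_MF(bp) */
	bool is_e1x;		/* stands in for CHIP_IS_E1x(bp) */
	uint32_t func_cfg;	/* per-function extended config word */
	uint32_t port_features;	/* per-port generic_features word */
};

/* Multi-function, non-E1x devices use the per-function word;
 * everything else uses the per-port word.
 */
static bool nvram_dropless(const struct cfg_inputs *in)
{
	if (in->multi_function && !in->is_e1x)
		return (in->func_cfg & MACP_FUNC_CFG_PAUSE_ON_HOST_RING) != 0;
	return (in->port_features &
		PORT_HW_CFG_PAUSE_ON_HOST_RING_ENABLED) != 0;
}

/* Either the module parameter or nvram may enable the feature. */
static bool effective_dropless(bool module_param, const struct cfg_inputs *in)
{
	assert(in);
	return module_param || nvram_dropless(in);
}
```

Since the result is an OR, nvram can only turn the feature on; it cannot
override a module parameter that already enables it.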
* [PATCH net-next 04/13] bnx2x: Filter packets on FCoE rings
From: Yuval Mintz @ 2012-12-02 14:05 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Dmitry Kravkov, Yuval Mintz
In-Reply-To: <1354457157-4730-1-git-send-email-yuvalmin@broadcom.com>
From: Dmitry Kravkov <dmitry@broadcom.com>
When bnx2x cannot transmit a packet because the Tx ring is full and the
ring size is zero (indicating an FCoE ring), the driver now drops the
packet and continues gracefully.
The driver also counts such filtered packets in a new
driver_filtered_tx_pkt statistic.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 11 ++++++++---
.../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 6 +++++-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c | 1 +
drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h | 3 +++
4 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index e95174d..5e07aa5 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -3127,11 +3127,16 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
BDS_PER_TX_PKT +
NEXT_CNT_PER_TX_PKT(MAX_BDS_PER_TX_PKT))) {
/* Handle special storage cases separately */
- if (txdata->tx_ring_size != 0) {
- BNX2X_ERR("BUG! Tx ring full when queue awake!\n");
+ if (txdata->tx_ring_size == 0) {
+ struct bnx2x_eth_q_stats *q_stats =
+ bnx2x_fp_qstats(bp, txdata->parent_fp);
+ q_stats->driver_filtered_tx_pkt++;
+ dev_kfree_skb(skb);
+ return NETDEV_TX_OK;
+ }
bnx2x_fp_qstats(bp, txdata->parent_fp)->driver_xoff++;
netif_tx_stop_queue(txq);
- }
+ BNX2X_ERR("BUG! Tx ring full when queue awake!\n");
return NETDEV_TX_BUSY;
}
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index e05f981..c7270c0 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -62,7 +62,9 @@ static const struct {
8, "[%s]: tpa_aggregations" },
{ Q_STATS_OFFSET32(total_tpa_aggregated_frames_hi),
8, "[%s]: tpa_aggregated_frames"},
- { Q_STATS_OFFSET32(total_tpa_bytes_hi), 8, "[%s]: tpa_bytes"}
+ { Q_STATS_OFFSET32(total_tpa_bytes_hi), 8, "[%s]: tpa_bytes"},
+ { Q_STATS_OFFSET32(driver_filtered_tx_pkt),
+ 4, "[%s]: driver_filtered_tx_pkt" }
};
#define BNX2X_NUM_Q_STATS ARRAY_SIZE(bnx2x_q_stats_arr)
@@ -177,6 +179,8 @@ static const struct {
4, STATS_FLAGS_FUNC, "recoverable_errors" },
{ STATS_OFFSET32(unrecoverable_error),
4, STATS_FLAGS_FUNC, "unrecoverable_errors" },
+ { STATS_OFFSET32(driver_filtered_tx_pkt),
+ 4, STATS_FLAGS_FUNC, "driver_filtered_tx_pkt" },
{ STATS_OFFSET32(eee_tx_lpi),
4, STATS_FLAGS_PORT, "Tx LPI entry count"}
};
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c
index 348ed02..89ec066 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c
@@ -1149,6 +1149,7 @@ static void bnx2x_drv_stats_update(struct bnx2x *bp)
UPDATE_ESTAT_QSTAT(rx_err_discard_pkt);
UPDATE_ESTAT_QSTAT(rx_skb_alloc_failed);
UPDATE_ESTAT_QSTAT(hw_csum_err);
+ UPDATE_ESTAT_QSTAT(driver_filtered_tx_pkt);
}
}
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
index 24b8e50..b4d7b26 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
@@ -203,6 +203,7 @@ struct bnx2x_eth_stats {
/* Recovery */
u32 recoverable_error;
u32 unrecoverable_error;
+ u32 driver_filtered_tx_pkt;
/* src: Clear-on-Read register; Will not survive PMF Migration */
u32 eee_tx_lpi;
};
@@ -264,6 +265,7 @@ struct bnx2x_eth_q_stats {
u32 total_tpa_aggregated_frames_lo;
u32 total_tpa_bytes_hi;
u32 total_tpa_bytes_lo;
+ u32 driver_filtered_tx_pkt;
};
struct bnx2x_eth_stats_old {
@@ -315,6 +317,7 @@ struct bnx2x_eth_q_stats_old {
u32 rx_err_discard_pkt_old;
u32 rx_skb_alloc_failed_old;
u32 hw_csum_err_old;
+ u32 driver_filtered_tx_pkt_old;
};
struct bnx2x_net_stats_old {
--
1.7.1
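
The transmit decision described in this patch can be sketched as below. All
names are hypothetical: `struct tx_queue` is a simplified stand-in for the
driver's Tx data, and TX_OK/TX_BUSY stand in for NETDEV_TX_OK and
NETDEV_TX_BUSY. Freeing the skb is not modelled.

```c
#include <assert.h>
#include <stdint.h>

enum tx_result { TX_OK, TX_BUSY };

struct tx_queue {
	uint32_t ring_size;	/* 0 marks a ring that carries no L2 traffic */
	uint32_t free_slots;
	uint32_t filtered;	/* counterpart of driver_filtered_tx_pkt */
	uint32_t xoff;		/* counterpart of driver_xoff */
	int stopped;
};

static enum tx_result start_xmit(struct tx_queue *q, uint32_t needed)
{
	assert(q);
	if (q->free_slots < needed) {
		if (q->ring_size == 0) {
			/* Storage ring: count the packet, drop it and
			 * report success so the stack does not retry.
			 */
			q->filtered++;
			return TX_OK;
		}
		/* Regular ring: stop the queue and ask for a retry. */
		q->xoff++;
		q->stopped = 1;
		return TX_BUSY;
	}
	q->free_slots -= needed;
	return TX_OK;
}
```

Reporting success for the dropped packet means the stack never requeues
traffic that the zero-sized ring could not carry anyway.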
* [PATCH net-next 05/13] bnx2x: Correct advertised speed/duplex
From: Yuval Mintz @ 2012-12-02 14:05 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Yuval Mintz
In-Reply-To: <1354457157-4730-1-git-send-email-yuvalmin@broadcom.com>
If the link is down due to management (and not because the actual phy link
was lost), the driver should still behave as if the link is down: querying
speed/duplex via ethtool should report 'UNKNOWN', the same behaviour as
when the link is actually down.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
.../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index c7270c0..277f17e 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -231,18 +231,14 @@ static int bnx2x_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
cmd->advertising &= ~(ADVERTISED_10000baseT_Full);
}
- if ((bp->state == BNX2X_STATE_OPEN) && (bp->link_vars.link_up)) {
- if (!(bp->flags & MF_FUNC_DIS)) {
- ethtool_cmd_speed_set(cmd, bp->link_vars.line_speed);
+ if ((bp->state == BNX2X_STATE_OPEN) && bp->link_vars.link_up &&
+ !(bp->flags & MF_FUNC_DIS)) {
cmd->duplex = bp->link_vars.duplex;
- } else {
- ethtool_cmd_speed_set(
- cmd, bp->link_params.req_line_speed[cfg_idx]);
- cmd->duplex = bp->link_params.req_duplex[cfg_idx];
- }
if (IS_MF(bp) && !BP_NOMCP(bp))
ethtool_cmd_speed_set(cmd, bnx2x_get_mf_speed(bp));
+ else
+ ethtool_cmd_speed_set(cmd, bp->link_vars.line_speed);
} else {
cmd->duplex = DUPLEX_UNKNOWN;
ethtool_cmd_speed_set(cmd, SPEED_UNKNOWN);
--
1.7.1
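
A simplified sketch of the reporting rule after this patch. The struct and
function names are hypothetical, the two *_SKETCH constants are stand-ins for
ethtool's SPEED_UNKNOWN and DUPLEX_UNKNOWN, and the multi-function speed
lookup (bnx2x_get_mf_speed()) is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define SPEED_UNKNOWN_SKETCH  (-1)	/* stand-in for SPEED_UNKNOWN */
#define DUPLEX_UNKNOWN_SKETCH 0xff	/* stand-in for DUPLEX_UNKNOWN */

struct link_state {
	bool dev_open;		/* bp->state == BNX2X_STATE_OPEN */
	bool link_up;		/* bp->link_vars.link_up */
	bool func_disabled;	/* bp->flags & MF_FUNC_DIS */
	int line_speed;
	int duplex;
};

struct report {
	int speed;
	int duplex;
};

/* Real values are reported only when all three conditions hold; a
 * function disabled by management looks exactly like a lost link.
 */
static struct report get_settings(const struct link_state *s)
{
	struct report r;

	assert(s);
	if (s->dev_open && s->link_up && !s->func_disabled) {
		r.speed = s->line_speed;
		r.duplex = s->duplex;
	} else {
		r.speed = SPEED_UNKNOWN_SKETCH;
		r.duplex = DUPLEX_UNKNOWN_SKETCH;
	}
	return r;
}
```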
* [PATCH net-next 02/13] bnx2x: parity recovery flow enhancement
From: Yuval Mintz @ 2012-12-02 14:05 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Barak Witkowski, Yuval Mintz
In-Reply-To: <1354457157-4730-1-git-send-email-yuvalmin@broadcom.com>
From: Barak Witkowski <barak@broadcom.com>
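Enhance the parity recovery flow to handle a few more corner cases:
- clear the shmem validity map of the current port rather than always
  that of port 0;
- do not reset the per-port MAC blocks (EMAC, BMAC, UMAC), since on a
  4-port device they might still be owned by the MCP;
- on E3 chips, also wait for PGLUE_B_REG_TAGS_63_32 to read 0xffffffff
  before leaving the wait loop in bnx2x_process_kill();
- drop the bnx2x_pxp_prep() call from bnx2x_process_kill().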
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 33 +++++++++++++--------
1 files changed, 20 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 5a22e19..62fcf0f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -8720,7 +8720,8 @@ static void bnx2x_reset_mcp_prep(struct bnx2x *bp, u32 *magic_val)
/* Get shmem offset */
shmem = REG_RD(bp, MISC_REG_SHARED_MEM_ADDR);
- validity_offset = offsetof(struct shmem_region, validity_map[0]);
+ validity_offset =
+ offsetof(struct shmem_region, validity_map[BP_PORT(bp)]);
/* Clear validity map flags */
if (shmem > 0)
@@ -8813,7 +8814,11 @@ static void bnx2x_process_kill_chip_reset(struct bnx2x *bp, bool global)
MISC_REGISTERS_RESET_REG_2_RST_MCP_N_RESET_CMN_CPU |
MISC_REGISTERS_RESET_REG_2_RST_MCP_N_RESET_CMN_CORE;
- /* Don't reset the following blocks */
+ /* Don't reset the following blocks.
+ * Important: per port blocks (such as EMAC, BMAC, UMAC) can't be
+ * reset, as in 4 port device they might still be owned
+ * by the MCP (there is only one leader per path).
+ */
not_reset_mask1 =
MISC_REGISTERS_RESET_REG_1_RST_HC |
MISC_REGISTERS_RESET_REG_1_RST_PXPV |
@@ -8829,19 +8834,19 @@ static void bnx2x_process_kill_chip_reset(struct bnx2x *bp, bool global)
MISC_REGISTERS_RESET_REG_2_RST_MCP_N_RESET_REG_HARD_CORE |
MISC_REGISTERS_RESET_REG_2_RST_MCP_N_HARD_CORE_RST_B |
MISC_REGISTERS_RESET_REG_2_RST_ATC |
- MISC_REGISTERS_RESET_REG_2_PGLC;
+ MISC_REGISTERS_RESET_REG_2_PGLC |
+ MISC_REGISTERS_RESET_REG_2_RST_BMAC0 |
+ MISC_REGISTERS_RESET_REG_2_RST_BMAC1 |
+ MISC_REGISTERS_RESET_REG_2_RST_EMAC0 |
+ MISC_REGISTERS_RESET_REG_2_RST_EMAC1 |
+ MISC_REGISTERS_RESET_REG_2_UMAC0 |
+ MISC_REGISTERS_RESET_REG_2_UMAC1;
/*
* Keep the following blocks in reset:
* - all xxMACs are handled by the bnx2x_link code.
*/
stay_reset2 =
- MISC_REGISTERS_RESET_REG_2_RST_BMAC0 |
- MISC_REGISTERS_RESET_REG_2_RST_BMAC1 |
- MISC_REGISTERS_RESET_REG_2_RST_EMAC0 |
- MISC_REGISTERS_RESET_REG_2_RST_EMAC1 |
- MISC_REGISTERS_RESET_REG_2_UMAC0 |
- MISC_REGISTERS_RESET_REG_2_UMAC1 |
MISC_REGISTERS_RESET_REG_2_XMAC |
MISC_REGISTERS_RESET_REG_2_XMAC_SOFT;
@@ -8931,6 +8936,7 @@ static int bnx2x_process_kill(struct bnx2x *bp, bool global)
int cnt = 1000;
u32 val = 0;
u32 sr_cnt, blk_cnt, port_is_idle_0, port_is_idle_1, pgl_exp_rom2;
+ u32 tags_63_32 = 0;
/* Empty the Tetris buffer, wait for 1s */
@@ -8940,10 +8946,14 @@ static int bnx2x_process_kill(struct bnx2x *bp, bool global)
port_is_idle_0 = REG_RD(bp, PXP2_REG_RD_PORT_IS_IDLE_0);
port_is_idle_1 = REG_RD(bp, PXP2_REG_RD_PORT_IS_IDLE_1);
pgl_exp_rom2 = REG_RD(bp, PXP2_REG_PGL_EXP_ROM2);
+ if (CHIP_IS_E3(bp))
+ tags_63_32 = REG_RD(bp, PGLUE_B_REG_TAGS_63_32);
+
if ((sr_cnt == 0x7e) && (blk_cnt == 0xa0) &&
((port_is_idle_0 & 0x1) == 0x1) &&
((port_is_idle_1 & 0x1) == 0x1) &&
- (pgl_exp_rom2 == 0xffffffff))
+ (pgl_exp_rom2 == 0xffffffff) &&
+ (!CHIP_IS_E3(bp) || (tags_63_32 == 0xffffffff)))
break;
usleep_range(1000, 1000);
} while (cnt-- > 0);
@@ -9000,9 +9010,6 @@ static int bnx2x_process_kill(struct bnx2x *bp, bool global)
/* TBD: Add resetting the NO_MCP mode DB here */
- /* PXP */
- bnx2x_pxp_prep(bp);
-
/* Open the gates #2, #3 and #4 */
bnx2x_set_234_gates(bp, false);
--
1.7.1
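
The exit condition of the wait loop in bnx2x_process_kill(), as it reads
after this patch, can be written as a single predicate. The constants are
the ones in the diff; `struct pxp_snapshot` and the function name are
hypothetical and only hold the values the driver reads from the registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the polled register values. */
struct pxp_snapshot {
	uint32_t sr_cnt;
	uint32_t blk_cnt;
	uint32_t port_is_idle_0;
	uint32_t port_is_idle_1;
	uint32_t pgl_exp_rom2;
	uint32_t tags_63_32;	/* only meaningful on E3 chips */
};

/* True when every polled value has reached its idle value. The tags
 * check is applied only when is_e3 is set.
 */
static bool pxp_is_idle(const struct pxp_snapshot *s, bool is_e3)
{
	assert(s);
	return s->sr_cnt == 0x7e && s->blk_cnt == 0xa0 &&
	       (s->port_is_idle_0 & 0x1) == 0x1 &&
	       (s->port_is_idle_1 & 0x1) == 0x1 &&
	       s->pgl_exp_rom2 == 0xffffffffu &&
	       (!is_e3 || s->tags_63_32 == 0xffffffffu);
}
```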
* [PATCH net-next 03/13] bnx2x: Management can control PFC/ETS
From: Yuval Mintz @ 2012-12-02 14:05 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Barak Witkowski, Yuval Mintz
In-Reply-To: <1354457157-4730-1-git-send-email-yuvalmin@broadcom.com>
From: Barak Witkowski <barak@broadcom.com>
If PFC/ETS is configured by management, configure the chip regardless of
whether a remote peer that supports DCBX is present.
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 2 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c | 21 +++++++++++++++++----
drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h | 7 ++++++-
3 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 8779ac1..e95174d 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2426,7 +2426,7 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode)
}
if (bp->port.pmf)
- bnx2x_update_drv_flags(bp, 1 << DRV_FLAGS_DCB_CONFIGURED, 0);
+ bnx2x_update_drv_flags(bp, 1 << DRV_FLAGS_PORT_MASK, 0);
else
bnx2x__link_status_update(bp);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
index cba4a16..c0d9b69 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
@@ -413,8 +413,11 @@ static int bnx2x_dcbx_read_mib(struct bnx2x *bp,
static void bnx2x_pfc_set_pfc(struct bnx2x *bp)
{
+ int mfw_configured = SHMEM2_HAS(bp, drv_flags) &&
+ GET_FLAGS(SHMEM2_RD(bp, drv_flags),
+ 1 << DRV_FLAGS_DCB_MFW_CONFIGURED);
if (bp->dcbx_port_params.pfc.enabled &&
- !(bp->dcbx_error & DCBX_REMOTE_MIB_ERROR))
+ (!(bp->dcbx_error & DCBX_REMOTE_MIB_ERROR) || mfw_configured))
/*
* 1. Fills up common PFC structures if required
* 2. Configure NIG, MAC and BRB via the elink
@@ -552,10 +555,13 @@ static void bnx2x_dcbx_update_ets_config(struct bnx2x *bp)
static void bnx2x_dcbx_update_ets_params(struct bnx2x *bp)
{
+ int mfw_configured = SHMEM2_HAS(bp, drv_flags) &&
+ GET_FLAGS(SHMEM2_RD(bp, drv_flags),
+ 1 << DRV_FLAGS_DCB_MFW_CONFIGURED);
bnx2x_ets_disabled(&bp->link_params, &bp->link_vars);
if (!bp->dcbx_port_params.ets.enabled ||
- (bp->dcbx_error & DCBX_REMOTE_MIB_ERROR))
+ ((bp->dcbx_error & DCBX_REMOTE_MIB_ERROR) && !mfw_configured))
return;
if (CHIP_IS_E3B0(bp))
@@ -1802,11 +1808,14 @@ static void bnx2x_dcbx_fw_struct(struct bnx2x *bp,
u8 cos = 0, pri = 0;
struct priority_cos *tt2cos;
u32 *ttp = bp->dcbx_port_params.app.traffic_type_priority;
+ int mfw_configured = SHMEM2_HAS(bp, drv_flags) &&
+ GET_FLAGS(SHMEM2_RD(bp, drv_flags),
+ 1 << DRV_FLAGS_DCB_MFW_CONFIGURED);
memset(pfc_fw_cfg, 0, sizeof(*pfc_fw_cfg));
/* to disable DCB - the structure must be zeroed */
- if (bp->dcbx_error & DCBX_REMOTE_MIB_ERROR)
+ if ((bp->dcbx_error & DCBX_REMOTE_MIB_ERROR) && !mfw_configured)
return;
/*shortcut*/
@@ -2073,8 +2082,12 @@ static u8 bnx2x_dcbnl_set_all(struct net_device *netdev)
"Handling parity error recovery. Try again later\n");
return 1;
}
- if (netif_running(bp->dev))
+ if (netif_running(bp->dev)) {
+ bnx2x_update_drv_flags(bp,
+ 1 << DRV_FLAGS_DCB_MFW_CONFIGURED,
+ 1);
bnx2x_dcbx_init(bp, true);
+ }
DP(BNX2X_MSG_DCB, "set_dcbx_params done (%d)\n", rc);
if (rc)
return 1;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
index 1504e0a..9a51d49 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
@@ -2088,8 +2088,13 @@ struct shmem2_region {
/* generic flags controlled by the driver */
u32 drv_flags;
- #define DRV_FLAGS_DCB_CONFIGURED 0x1
+ #define DRV_FLAGS_DCB_CONFIGURED 0x0
+ #define DRV_FLAGS_DCB_CONFIGURATION_ABORTED 0x1
+ #define DRV_FLAGS_DCB_MFW_CONFIGURED 0x2
+ #define DRV_FLAGS_PORT_MASK ((1 << DRV_FLAGS_DCB_CONFIGURED) | \
+ (1 << DRV_FLAGS_DCB_CONFIGURATION_ABORTED) | \
+ (1 << DRV_FLAGS_DCB_MFW_CONFIGURED))
/* pointer to extended dev_info shared data copied from nvm image */
u32 extended_dev_info_shared_addr;
u32 ncsi_oem_data_addr;
--
1.7.1
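
The flag layout introduced in bnx2x_hsi.h uses bit positions for the
individual flags and a composed mask for the whole group. The sketch below
copies those definitions and adds two hypothetical helpers, update_flags()
and should_apply_pfc(), to show how the management-configured bit overrides
a remote MIB error. It is an illustration under those assumptions, not the
driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions, as defined in the patch. */
#define DRV_FLAGS_DCB_CONFIGURED		0x0
#define DRV_FLAGS_DCB_CONFIGURATION_ABORTED	0x1
#define DRV_FLAGS_DCB_MFW_CONFIGURED		0x2
/* Composed bit mask covering all three flags. */
#define DRV_FLAGS_PORT_MASK \
	((1u << DRV_FLAGS_DCB_CONFIGURED) | \
	 (1u << DRV_FLAGS_DCB_CONFIGURATION_ABORTED) | \
	 (1u << DRV_FLAGS_DCB_MFW_CONFIGURED))

/* Hypothetical helper: set or clear the bits selected by `mask`. */
static uint32_t update_flags(uint32_t cur, uint32_t mask, bool set)
{
	assert(mask != 0);
	return set ? (cur | mask) : (cur & ~mask);
}

/* PFC is applied when it is enabled and either the remote MIB is fine
 * or management firmware has configured DCB.
 */
static bool should_apply_pfc(bool pfc_enabled, bool remote_mib_error,
			     uint32_t drv_flags)
{
	bool mfw_configured =
		(drv_flags & (1u << DRV_FLAGS_DCB_MFW_CONFIGURED)) != 0;

	return pfc_enabled && (!remote_mib_error || mfw_configured);
}
```

With these definitions DRV_FLAGS_PORT_MASK evaluates to 0x7. Because it is
already a bit mask, the sketch passes it to update_flags() unshifted, while
the individual flags are shifted into place first.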