* Re: [PATCHv2] vhost-net: utilize PUBLISH_USED_IDX feature
From: Avi Kivity @ 2010-05-18 17:47 UTC
To: Michael S. Tsirkin
Cc: davem, Juan Quintela, Rusty Russell, Paul E. McKenney,
Arnd Bergmann, kvm, virtualization, netdev, linux-kernel,
alex.williamson, amit.shah
In-Reply-To: <20100518022105.GA23129@redhat.com>
On 05/18/2010 05:21 AM, Michael S. Tsirkin wrote:
> With PUBLISH_USED_IDX, guest tells us which used entries
> it has consumed. This can be used to reduce the number
> of interrupts: after we write a used entry, if the guest has not yet
> consumed the previous entry, or if the guest has already consumed the
> new entry, we do not need to interrupt.
> This improves bandwidth by 30% under some workflows.
>
Seems to be missing the cacheline alignment.
Rusty's clarification did not satisfy me; I think it's needed.
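
For readers following the thread, the notification rule quoted above can be
sketched in C roughly as follows. This is a hypothetical helper written for
illustration, not code from the patch: the name need_interrupt() and both
parameters are assumptions, and it only covers the case where the host adds
one used entry at a time.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the vhost implementation.
 *
 * prev_used_idx:  value of the used index before the host wrote the
 *                 new entry (so the new entry is number prev_used_idx)
 * guest_consumed: number of used entries the guest has published as
 *                 consumed
 *
 * If the guest is still behind (it has not consumed the previous
 * entry), it will find the new entry while it catches up. If it has
 * already consumed the new entry, there is nothing left to signal.
 * An interrupt is only needed when the guest had consumed everything
 * up to, but not including, the new entry.
 */
static bool need_interrupt(uint16_t prev_used_idx, uint16_t guest_consumed)
{
	return guest_consumed == prev_used_idx;
}
```

Because the check is a plain equality on 16-bit indices, it is not affected
by index wraparound.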
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
* [PATCH] net-2.6 : fix dev_get_valid_name
From: Daniel Lezcano @ 2010-05-18 17:35 UTC
To: davem; +Cc: opurdila, netdev
the commit:
commit d90310243fd750240755e217c5faa13e24f41536
Author: Octavian Purdila <opurdila@ixiacom.com>
Date: Wed Nov 18 02:36:59 2009 +0000
net: device name allocation cleanups
introduced a bug when there is a hash collision, making it impossible
to rename a device with eth%d. This bug is very hard to reproduce
and appears rarely.
The problem is that we pass 'dev->name', which the function modifies,
to __dev_alloc_name instead of a temporary buffer.
A detailed explanation is here:
http://marc.info/?l=linux-netdev&m=127417784011987&w=2
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
---
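
Note for readers: the general idea of building candidate names in a
temporary buffer, so that the caller's name is untouched on failure, can be
sketched in plain userspace C as below. This is only an illustration under
stated assumptions, not the kernel code: NAME_LEN, names_in_use[],
name_in_use() and alloc_name() are all made-up names, and the toy table
stands in for the per-namespace device list.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16 /* stands in for IFNAMSIZ; value assumed for the sketch */

/* Toy table of names already in use; purely illustrative. */
static const char *const names_in_use[] = { "eth0", "eth1" };

static int name_in_use(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(names_in_use) / sizeof(names_in_use[0]); i++)
		if (strcmp(names_in_use[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Find the first free "<prefix><unit>" name with unit < max_units.
 * Candidates are built in a temporary buffer, so 'out' (which must
 * hold NAME_LEN bytes) is written only on success.
 * Returns the unit number, or -1 if no candidate could be used.
 */
static int alloc_name(const char *prefix, int max_units, char *out)
{
	char tmp[NAME_LEN];
	int unit;

	for (unit = 0; unit < max_units; unit++) {
		int len = snprintf(tmp, sizeof(tmp), "%s%d", prefix, unit);

		if (len < 0 || (size_t)len >= sizeof(tmp))
			return -1; /* candidate would not fit */
		if (!name_in_use(tmp)) {
			memcpy(out, tmp, (size_t)len + 1);
			return unit;
		}
	}
	return -1;
}
```

If alloc_name() wrote each candidate straight into the caller's buffer, a
failed search would leave the last rejected candidate behind as the name.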
net/core/dev.c | 20 ++++++++++++--------
1 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 264137f..4704a1a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -936,18 +936,22 @@ int dev_alloc_name(struct net_device *dev, const char *name)
}
EXPORT_SYMBOL(dev_alloc_name);
-static int dev_get_valid_name(struct net *net, const char *name, char *buf,
- bool fmt)
+static int dev_get_valid_name(struct net_device *dev, const char *name, bool fmt)
{
+ struct net *net;
+
+ BUG_ON(!dev_net(dev));
+ net = dev_net(dev);
+
if (!dev_valid_name(name))
return -EINVAL;
if (fmt && strchr(name, '%'))
- return __dev_alloc_name(net, name, buf);
+ return dev_alloc_name(dev, name);
else if (__dev_get_by_name(net, name))
return -EEXIST;
- else if (buf != name)
- strlcpy(buf, name, IFNAMSIZ);
+ else if (strncmp(dev->name, name, IFNAMSIZ))
+ strlcpy(dev->name, name, IFNAMSIZ);
return 0;
}
@@ -979,7 +983,7 @@ int dev_change_name(struct net_device *dev, const char *newname)
memcpy(oldname, dev->name, IFNAMSIZ);
- err = dev_get_valid_name(net, newname, dev->name, 1);
+ err = dev_get_valid_name(dev, newname, 1);
if (err < 0)
return err;
@@ -5083,7 +5087,7 @@ int register_netdevice(struct net_device *dev)
}
}
- ret = dev_get_valid_name(net, dev->name, dev->name, 0);
+ ret = dev_get_valid_name(dev, dev->name, 0);
if (ret)
goto err_uninit;
@@ -5661,7 +5665,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
/* We get here if we can't use the current device name */
if (!pat)
goto out;
- if (dev_get_valid_name(net, pat, dev->name, 1))
+ if (dev_get_valid_name(dev, pat, 1))
goto out;
}
--
1.7.0.4
* [PATCH -next] bridge: fix build for CONFIG_SYSFS disabled
From: Stephen Hemminger @ 2010-05-18 17:28 UTC
To: David Miller; +Cc: randy.dunlap, sfr, linux-next, linux-kernel, netdev
In-Reply-To: <20100517.223237.55871633.davem@davemloft.net>
From: Randy Dunlap <randy.dunlap@oracle.com>
Fix build when CONFIG_SYSFS is not enabled:
net/bridge/br_if.c:136: error: 'struct net_bridge_port' has no member named 'sysfs_name'
Note: dev->name == sysfs_name except when change name is in
progress, and we are protected from that by RTNL mutex.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/net/bridge/br_if.c 2010-05-17 10:51:48.638634187 -0700
+++ b/net/bridge/br_if.c 2010-05-18 10:22:28.892111158 -0700
@@ -133,7 +133,7 @@ static void del_nbp(struct net_bridge_po
struct net_bridge *br = p->br;
struct net_device *dev = p->dev;
- sysfs_remove_link(br->ifobj, p->sysfs_name);
+ sysfs_remove_link(br->ifobj, p->dev->name);
dev_set_promiscuity(dev, -1);
* Re: [0/4] Fix addrconf race conditions
From: Stephen Hemminger @ 2010-05-18 17:25 UTC
To: Herbert Xu; +Cc: David Miller, jbohac, yoshfuji, netdev
In-Reply-To: <20100518110243.GA7750@gondor.apana.org.au>
On Tue, 18 May 2010 21:02:43 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Fri, Apr 23, 2010 at 11:05:40PM +0800, Herbert Xu wrote:
> >
> > This stuff is more broken than I thought. For example, we perform
> > a number of actions when DAD succeeds, e.g., joining an anycast
> > group. However, this is not synchronised with respect to address
> > deletion at all, so if DAD succeeds just as someone deletes the
> > address, you can easily get stuck on that anycast group.
> >
> > I will try to untangle this mess tomorrow.
>
> Tomorrow took a while to arrive :)
>
> Here is a first batch of patches. Note that this is by no means
> a comprehensive fix for all the ndisc/addrconf race conditions.
> It is just a first step in trying to address the problems.
>
> The patchset revolves around a new lock, ifp->state_lock. I
> added it instead of trying to reuse the existing ifp->lock because
> the latter has serious nesting issues that prevent it from easily
> being used. My long term plan is to restructure the locking and
> eventually phase out ifp->lock in favour of ifp->state_lock.
I wonder if so many fine grained locks are really necessary at
all. Everything but timers looks like it is under RTNL mutex
already.
* Re: [PATCH 1/4] ipv6: Replace inet6_ifaddr->dead with state
From: Stephen Hemminger @ 2010-05-18 17:23 UTC
To: Herbert Xu; +Cc: David Miller, jbohac, yoshfuji, netdev
In-Reply-To: <E1OEKax-00022W-6K@gondolin.me.apana.org.au>
On Tue, 18 May 2010 21:04:19 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> +enum {
> + INET6_IFADDR_STATE_DAD,
> + INET6_IFADDR_STATE_POSTDAD,
> + INET6_IFADDR_STATE_UP,
> + INET6_IFADDR_STATE_DEAD,
> +};
> +
> #ifdef __KERNEL__
Does this really need to be visible to user applications?
--
* [PATCH] net: Fix definition of netif_vdbg() when VERBOSE_DEBUG is not defined
From: Ben Hutchings @ 2010-05-18 16:56 UTC
To: David Miller; +Cc: netdev, linux-net-drivers, Joe Perches
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
include/linux/netdevice.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c1b2341..667c2a8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2314,7 +2314,7 @@ do { \
#define netif_vdbg(priv, type, dev, format, args...) \
({ \
if (0) \
- netif_printk(KERN_DEBUG, dev, format, ##args); \
+ netif_printk(priv, type, KERN_DEBUG, dev, format, ##args); \
0; \
})
#endif
--
1.6.2.5
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: get beyond 1Gbps with pktgen on 10Gb nic?
From: Rick Jones @ 2010-05-18 16:50 UTC
To: Jon Zhou; +Cc: netdev@vger.kernel.org
In-Reply-To: <4A6A2125329CFD4D8CC40C9E8ABCAB9F2497EFC00D@MILEXCH2.ds.jdsu.net>
Jon Zhou wrote:
> hi rick:
>
> do you mean "TCP_NODELAY" will send with packet size as I expect
> without this option,netperf might sent packet with large size? (but
> eventually it will be splitted into MTU size?)
First things first - netperf only ever calls send() with the size you give it
via the command line.
It is what happens after that which matters: specifically, when and how TCP
decides to send the data across the network. Setting TCP_NODELAY will disable
the Nagle Algorithm, which, 99 times out of 10, will cause each send() call by
the application to be a separate TCP segment. The 100th time out of 10,
something like a retransmission or a zero window from the remote etc may still
cause multiple small send() calls to be aggregated into larger segments. How
much larger will depend on the Maximum Segment Size (MSS) for the connection;
the MTU is one of the inputs to the decision of what to use for the MSS.
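
For reference, an application sets TCP_NODELAY with setsockopt() on its
socket. A minimal sketch follows; the wrapper name disable_nagle() is made up
for illustration, while the option itself is the standard sockets one.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Disable the Nagle algorithm on a TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt()).
 */
static int disable_nagle(int sockfd)
{
	int one = 1;

	return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```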
At the end of this message is a bit of boilerplate I have on the aforementioned
Nagle algorithm. It is a bit generic, not stack-specific. It discusses issues
beyond benchmarking considerations, so keep that in mind while you are reading it.
happy benchmarking,
rick jones
$ cat usenet_replies/nagle_algorithm
> I'm not familiar with this issue, and I'm mostly ignorant about what
> tcp does below the sockets interface. Can anybody briefly explain what
> "nagle" is, and how and when to turn it off? Or point me to the
> appropriate manual.
In broad terms, whenever an application does a send() call, the logic
of the Nagle algorithm is supposed to go something like this:
1) Is the quantity of data in this send, plus any queued, unsent data,
greater than the MSS (Maximum Segment Size) for this connection? If
yes, send the data in the user's send now (modulo any other
constraints such as receiver's advertised window and the TCP
congestion window). If no, go to 2.
2) Is the connection to the remote otherwise idle? That is, is there
no unACKed data outstanding on the network. If yes, send the data in
the user's send now. If no, queue the data and wait. Either the
application will continue to call send() with enough data to get to a
full MSS-worth of data, or the remote will ACK all the currently sent,
unACKed data, or our retransmission timer will expire.
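
The two steps above can be sketched in C roughly like this. It is a
simplified illustration, not code from any particular stack: the function
and parameter names are made up, the receiver's window and the congestion
window are left out, and "a full MSS-worth" is treated as enough to send.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch of the two-step Nagle check; all names are illustrative.
 *
 * send_bytes:    size of the data in the current send() call
 * queued_bytes:  data already queued but not yet sent
 * unacked_bytes: data sent but not yet ACKed by the remote
 * mss:           Maximum Segment Size for the connection
 *
 * Returns true if the data may go out now, false if it should be
 * queued until more data, an ACK, or a timeout arrives.
 */
static bool nagle_send_now(size_t send_bytes, size_t queued_bytes,
			   size_t unacked_bytes, size_t mss)
{
	/* Step 1: at least a full segment's worth is ready (assumption: >=). */
	if (send_bytes + queued_bytes >= mss)
		return true;

	/* Step 2: nothing outstanding, so the connection is idle. */
	if (unacked_bytes == 0)
		return true;

	return false;
}
```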
Now, where applications run into trouble is when they have what might
be described as "write, write, read" behaviour, where they present
logically associated data to the transport in separate 'send' calls
and those sends are typically less than the MSS for the connection.
It isn't so much that they run afoul of Nagle as they run into issues
with the interaction of Nagle and the other heuristics operating on
the remote. In particular, the delayed ACK heuristics.
When a receiving TCP is deciding whether or not to send an ACK back to
the sender, in broad handwaving terms it goes through logic similar to
this:
a) is there data being sent back to the sender? if yes, piggy-back the
ACK on the data segment.
b) is there a window update being sent back to the sender? if yes,
piggy-back the ACK on the window update.
c) has the standalone ACK timer expired? if yes, send a standalone ACK.
Window updates are generally triggered by the following heuristics:
i) would the window update be for a non-trivial fraction of the window
- typically somewhere at or above 1/4 the window, that is, has the
application "consumed" at least that much data? if yes, send a
window update. if no, check ii.
ii) would the window update be for at least 2*MSS worth of data, that
is, has the application "consumed" at least that much? if yes, send a
window update, if no wait.
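
As a rough sketch, the receiver-side rules (a to c) and the window update
heuristics (i and ii) could be written in C as below. The names are made up
for illustration, and the 1/4 fraction is just the typical value mentioned
above, not a constant taken from any specific stack.

```c
#include <stdbool.h>
#include <stddef.h>

/* Heuristics i and ii: is a window update worth sending? */
static bool window_update_due(size_t consumed, size_t window, size_t mss)
{
	if (consumed >= window / 4)	/* i: non-trivial fraction (1/4 assumed) */
		return true;
	return consumed >= 2 * mss;	/* ii: at least two segments' worth */
}

/* Rules a to c: should the receiver send an ACK now? */
static bool ack_due(bool reply_data_pending, bool window_update,
		    bool ack_timer_expired)
{
	return reply_data_pending || window_update || ack_timer_expired;
}
```

With a 64 KB window and a 1460-byte MSS, a 100-byte segment that the
application has read triggers neither heuristic, so an ACK only goes out
once there is reply data or the standalone ACK timer expires.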
Now, going back to that write, write, read application, on the sending
side, the first write will be transmitted by TCP via logic rule 2 -
the connection is otherwise idle. However, the second small send will
be delayed as there is at that point unACKnowledged data outstanding
on the connection.
At the receiver, that small TCP segment will arrive and will be passed
to the application. The application does not have the entire app-level
message, so it will not send a reply (data to TCP) back. The typical
TCP window is much much larger than the MSS, so no window update would
be triggered by heuristic i. The data just arrived is < 2*MSS, so no
window update from heuristic ii. Since there is no window update, no
ACK is sent by heuristic b.
So, that leaves heuristic c - the standalone ACK timer. That ranges
anywhere between 50 and 200 milliseconds depending on the TCP stack in
use.
If you've read this far :) now we can take a look at the effect of
various things touted as "fixes" to applications experiencing this
interaction. We take as our example a client-server application where
both the client and the server are implemented with a write of a small
application header, followed by application data. First, the
"default" case which is with Nagle enabled (TCP_NODELAY _NOT_ set) and
with standard ACK behaviour:
   Client                         Server
   Req Header              ->
                           <-     Standalone ACK after Nms
   Req Data                ->
                           <-     Possible standalone ACK
                           <-     Rsp Header
   Standalone ACK          ->
                           <-     Rsp Data
   Possible standalone ACK ->
For two "messages" we end up with at least six segments on the wire.
The possible standalone ACKs will depend on whether the server's
response time, or client's think time is longer than the standalone
ACK interval on their respective sides. Now, if TCP_NODELAY is set we
see:
   Client                         Server
   Req Header              ->
   Req Data                ->
                           <-     Possible Standalone ACK after Nms
                           <-     Rsp Header
                           <-     Rsp Data
   Possible Standalone ACK ->
In theory, we are down to four segments on the wire, which seems good,
but frankly we can do better. First though, consider what happens
when someone disables delayed ACKs:
   Client                          Server
   Req Header               ->
                            <-     Immediate Standalone ACK
   Req Data                 ->
                            <-     Immediate Standalone ACK
                            <-     Rsp Header
   Immediate Standalone ACK ->
                            <-     Rsp Data
   Immediate Standalone ACK ->
Now we definitely see 8 segments on the wire. It will also be that way
if both TCP_NODELAY is set and delayed ACKs are disabled.
How about if the application did the "right" thing in the first place?
That is, sent the logically associated data at the same time:
   Client                         Server
   Request                 ->
                           <-     Possible Standalone ACK
                           <-     Response
   Possible Standalone ACK ->
We are down to two segments on the wire.
For "small" packets, the CPU cost is about the same regardless of data
or ACK. This means that the application which is making the proper
gathering send call will spend far fewer CPU cycles in the networking
stack.
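
One way to make such a gathering send on a POSIX system is writev(), which
hands several buffers to the kernel in a single call. The sketch below is
only an illustration: send_message() and its parameters are made-up names,
and handling of short writes is left to the caller.

```c
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hand header and payload to the kernel in one call instead of two
 * separate small sends. Returns the number of bytes written, or -1 on
 * error (errno set). A short write is possible and is not retried here.
 */
static ssize_t send_message(int fd, const void *header, size_t header_len,
			    const void *payload, size_t payload_len)
{
	struct iovec iov[2];

	/* writev() only reads from the buffers, so dropping const is safe. */
	iov[0].iov_base = (void *)header;
	iov[0].iov_len = header_len;
	iov[1].iov_base = (void *)payload;
	iov[1].iov_len = payload_len;

	return writev(fd, iov, 2);
}
```

Copying the header and data into one buffer before a single send() would
have the same effect on the wire; writev() just avoids the extra copy in the
application.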
* [PATCH 2/2] ethtool: Implement named message type flags
From: Ben Hutchings @ 2010-05-18 16:33 UTC
To: Jeff Garzik; +Cc: netdev, sf-linux-drivers
In-Reply-To: <1274200336.2113.0.camel@achroite.uk.solarflarecom.com>
Allow message type flags to be turned on and off by name.
Print the names of the currently set flags below the numeric value.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
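
Note for readers: the effect of a command line such as
"msglvl link on drv off" can be sketched as below. This is a standalone
illustration, not the code in this patch (which goes through
parse_generic_cmdline() and update_flags()). The struct msg_flag type, the
shortened table and apply_msglvl_args() are made up for the example; the
names and bit values are taken from the man page hunk.

```c
#include <stdint.h>
#include <string.h>

struct msg_flag {
	const char *name;
	uint32_t bit;
};

/* Subset of the name/number table from the man page. */
static const struct msg_flag msg_flags[] = {
	{ "drv",   0x0001 },
	{ "probe", 0x0002 },
	{ "link",  0x0004 },
	{ "timer", 0x0008 },
};

/*
 * Apply "name on|off" pairs to the current flags.
 * Returns 0 and stores the result in *out, or -1 on an unknown name,
 * an odd number of words, or a word other than "on" or "off".
 */
static int apply_msglvl_args(uint32_t current, int argc,
			     const char *const argv[], uint32_t *out)
{
	int i;

	if (argc % 2 != 0)
		return -1;

	for (i = 0; i < argc; i += 2) {
		const struct msg_flag *found = NULL;
		size_t j;

		for (j = 0; j < sizeof(msg_flags) / sizeof(msg_flags[0]); j++)
			if (strcmp(msg_flags[j].name, argv[i]) == 0)
				found = &msg_flags[j];
		if (!found)
			return -1;

		if (strcmp(argv[i + 1], "on") == 0)
			current |= found->bit;
		else if (strcmp(argv[i + 1], "off") == 0)
			current &= ~found->bit;
		else
			return -1;
	}

	*out = current;
	return 0;
}
```

Starting from 0x0003 (drv and probe), "link on drv off" gives 0x0006 (probe
and link); flags that are not named keep their current value.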
ethtool.8 | 66 ++++++++++++++++++++++++++++++-
ethtool.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 178 insertions(+), 15 deletions(-)
diff --git a/ethtool.8 b/ethtool.8
index a7b43d5..5983d0e 100644
--- a/ethtool.8
+++ b/ethtool.8
@@ -200,7 +200,10 @@ ethtool \- Display or change ethernet card settings
.RB [ wol \ \*(WO]
.RB [ sopass \ \*(MA]
.RB [ msglvl
-.IR N ]
+.IR N \ |
+.BI msglvl \ type
+.A1 on off
+.RB ...]
.B ethtool \-n
.I ethX
@@ -482,9 +485,66 @@ Disable (wake on nothing). This option clears all previous options.
.B sopass \*(MA\c
Sets the SecureOn(tm) password. The argument to this option must be 6
bytes in ethernet MAC hex format (\*(MA).
-.TP
+.PP
.BI msglvl \ N
-Sets the driver message level. Meanings differ per driver.
+.br
+.BI msglvl \ type
+.A1 on off
+.RB ...
+.RS
+Sets the driver message type flags by name or number. \fItype\fR
+names the type of message to enable or disable; \fIN\fR specifies the
+new flags numerically. The defined type names and numbers are:
+.PD 0
+.TP 12
+.B drv
+0x0001 General driver status
+.TP 12
+.B probe
+0x0002 Hardware probing
+.TP 12
+.B link
+0x0004 Link state
+.TP 12
+.B timer
+0x0008 Periodic status check
+.TP 12
+.B ifdown
+0x0010 Interface being brought down
+.TP 12
+.B ifup
+0x0020 Interface being brought up
+.TP 12
+.B rx_err
+0x0040 Receive error
+.TP 12
+.B tx_err
+0x0080 Transmit error
+.TP 12
+.B tx_queued
+0x0100 Transmit queueing
+.TP 12
+.B intr
+0x0200 Interrupt handling
+.TP 12
+.B tx_done
+0x0400 Transmit completion
+.TP 12
+.B rx_status
+0x0800 Receive completion
+.TP 12
+.B pktdata
+0x1000 Packet contents
+.TP 12
+.B hw
+0x2000 Hardware status
+.TP 12
+.B wol
+0x4000 Wake-on-LAN status
+.PP
+The precise meanings of these type flags differ between drivers.
+.PD
+.RE
.TP
.B \-n \-\-show-nfc
Retrieves the receive network flow classification configurations.
diff --git a/ethtool.c b/ethtool.c
index 7004b7f..380a054 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -20,7 +20,6 @@
* * better man page (steal from mii-tool?)
* * fall back on SIOCMII* ioctl()s and possibly SIOCDEVPRIVATE*
* * abstract ioctls to allow for fallback modes of data gathering
- * * symbolic names for msglvl bitmask
*/
#ifdef HAVE_CONFIG_H
@@ -39,6 +38,7 @@
#include <net/if.h>
#include <sys/utsname.h>
#include <limits.h>
+#include <ctype.h>
#include <linux/sockios.h>
#include "ethtool-util.h"
@@ -51,6 +51,26 @@
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#endif
+#ifndef HAVE_NETIF_MSG
+enum {
+ NETIF_MSG_DRV = 0x0001,
+ NETIF_MSG_PROBE = 0x0002,
+ NETIF_MSG_LINK = 0x0004,
+ NETIF_MSG_TIMER = 0x0008,
+ NETIF_MSG_IFDOWN = 0x0010,
+ NETIF_MSG_IFUP = 0x0020,
+ NETIF_MSG_RX_ERR = 0x0040,
+ NETIF_MSG_TX_ERR = 0x0080,
+ NETIF_MSG_TX_QUEUED = 0x0100,
+ NETIF_MSG_INTR = 0x0200,
+ NETIF_MSG_TX_DONE = 0x0400,
+ NETIF_MSG_RX_STATUS = 0x0800,
+ NETIF_MSG_PKTDATA = 0x1000,
+ NETIF_MSG_HW = 0x2000,
+ NETIF_MSG_WOL = 0x4000,
+};
+#endif
+
static int parse_wolopts(char *optstr, u32 *data);
static char *unparse_wolopts(int wolopts);
static int parse_sopass(char *src, unsigned char *dest);
@@ -126,7 +146,7 @@ static struct option {
" [ xcvr internal|external ]\n"
" [ wol p|u|m|b|a|g|s|d... ]\n"
" [ sopass %x:%x:%x:%x:%x:%x ]\n"
- " [ msglvl %d ] \n" },
+ " [ msglvl %d | msglvl type on|off ... ]\n" },
{ "-a", "--show-pause", MODE_GPAUSE, "Show pause options" },
{ "-A", "--pause", MODE_SPAUSE, "Set pause options",
" [ autoneg on|off ]\n"
@@ -311,7 +331,6 @@ static int wol_change = 0;
static u8 sopass_wanted[SOPASS_MAX];
static int sopass_change = 0;
static int gwol_changed = 0; /* did anything in GWOL change? */
-static int msglvl_wanted = -1;
static int phys_id_time = 0;
static int gregs_changed = 0;
static int gregs_dump_raw = 0;
@@ -335,6 +354,25 @@ static struct ethtool_rx_ntuple_flow_spec ntuple_fs;
static char *flash_file = NULL;
static int flash = -1;
static int flash_region = -1;
+
+static int msglvl_changed = 0;
+static int msglvl_wanted = -1;
+static int msg_drv_wanted = -1;
+static int msg_probe_wanted = -1;
+static int msg_link_wanted = -1;
+static int msg_timer_wanted = -1;
+static int msg_ifdown_wanted = -1;
+static int msg_ifup_wanted = -1;
+static int msg_rx_err_wanted = -1;
+static int msg_tx_err_wanted = -1;
+static int msg_tx_queued_wanted = -1;
+static int msg_intr_wanted = -1;
+static int msg_tx_done_wanted = -1;
+static int msg_rx_status_wanted = -1;
+static int msg_pktdata_wanted = -1;
+static int msg_hw_wanted = -1;
+static int msg_wol_wanted = -1;
+
static enum {
ONLINE=0,
OFFLINE,
@@ -447,6 +485,42 @@ static struct cmdline_info cmdline_ntuple[] = {
{ "action", CMDL_INT, &ntuple_fs.action, NULL },
};
+static struct cmdline_info cmdline_msglvl[] = {
+ { "drv", CMDL_BOOL, &msg_drv_wanted, NULL },
+ { "probe", CMDL_BOOL, &msg_probe_wanted, NULL },
+ { "link", CMDL_BOOL, &msg_link_wanted, NULL },
+ { "timer", CMDL_BOOL, &msg_timer_wanted, NULL },
+ { "ifdown", CMDL_BOOL, &msg_ifdown_wanted, NULL },
+ { "ifup", CMDL_BOOL, &msg_ifup_wanted, NULL },
+ { "rx_err", CMDL_BOOL, &msg_rx_err_wanted, NULL },
+ { "tx_err", CMDL_BOOL, &msg_tx_err_wanted, NULL },
+ { "tx_queued", CMDL_BOOL, &msg_tx_queued_wanted, NULL },
+ { "intr", CMDL_BOOL, &msg_intr_wanted, NULL },
+ { "tx_done", CMDL_BOOL, &msg_tx_done_wanted, NULL },
+ { "rx_status", CMDL_BOOL, &msg_rx_status_wanted, NULL },
+ { "pktdata", CMDL_BOOL, &msg_pktdata_wanted, NULL },
+ { "hw", CMDL_BOOL, &msg_hw_wanted, NULL },
+ { "wol", CMDL_BOOL, &msg_wol_wanted, NULL },
+};
+
+static struct named_flag flag_msglvl[] = {
+ { "drv", NETIF_MSG_DRV, &msg_drv_wanted },
+ { "probe", NETIF_MSG_PROBE, &msg_probe_wanted },
+ { "link", NETIF_MSG_LINK, &msg_link_wanted },
+ { "timer", NETIF_MSG_TIMER, &msg_timer_wanted },
+ { "ifdown", NETIF_MSG_IFDOWN, &msg_ifdown_wanted },
+ { "ifup", NETIF_MSG_IFUP, &msg_ifup_wanted },
+ { "rx_err", NETIF_MSG_RX_ERR, &msg_rx_err_wanted },
+ { "tx_err", NETIF_MSG_TX_ERR, &msg_tx_err_wanted },
+ { "tx_queued", NETIF_MSG_TX_QUEUED, &msg_tx_queued_wanted },
+ { "intr", NETIF_MSG_INTR, &msg_intr_wanted },
+ { "tx_done", NETIF_MSG_TX_DONE, &msg_tx_done_wanted },
+ { "rx_status", NETIF_MSG_RX_STATUS, &msg_rx_status_wanted },
+ { "pktdata", NETIF_MSG_PKTDATA, &msg_pktdata_wanted },
+ { "hw", NETIF_MSG_HW, &msg_hw_wanted },
+ { "wol", NETIF_MSG_WOL, &msg_wol_wanted },
+};
+
static int get_int(char *str, int base)
{
long v;
@@ -877,7 +951,17 @@ static void parse_cmdline(int argc, char **argp)
i++;
if (i >= argc)
show_usage(1);
- msglvl_wanted = get_int(argp[i], 0);
+ if (isdigit((unsigned char)argp[i][0])) {
+ msglvl_wanted = get_int(argp[i], 0);
+ msglvl_changed = 1;
+ } else {
+ parse_generic_cmdline(
+ argc, argp, i,
+ &msglvl_changed,
+ cmdline_msglvl,
+ ARRAY_SIZE(cmdline_msglvl));
+ i = argc;
+ }
break;
}
show_usage(1);
@@ -2203,8 +2287,11 @@ static int do_gset(int fd, struct ifreq *ifr)
ifr->ifr_data = (caddr_t)&edata;
err = send_ioctl(fd, ifr);
if (err == 0) {
- fprintf(stdout, " Current message level: 0x%08x (%d)\n",
+ fprintf(stdout, " Current message level: 0x%08x (%d)\n"
+ " ",
edata.data, edata.data);
+ print_flags(flag_msglvl, ARRAY_SIZE(flag_msglvl), edata.data);
+ fprintf(stdout, "\n");
allfail = 0;
} else if (errno != EOPNOTSUPP) {
perror("Cannot get message level");
@@ -2327,15 +2414,31 @@ static int do_sset(int fd, struct ifreq *ifr)
}
}
- if (msglvl_wanted != -1) {
+ if (msglvl_changed) {
struct ethtool_value edata;
- edata.cmd = ETHTOOL_SMSGLVL;
- edata.data = msglvl_wanted;
- ifr->ifr_data = (caddr_t)&edata;;
- err = send_ioctl(fd, ifr);
- if (err < 0)
- perror("Cannot set new msglvl");
+ if (msglvl_wanted == -1) {
+ edata.cmd = ETHTOOL_GMSGLVL;
+ ifr->ifr_data = (caddr_t)&edata;;
+ err = send_ioctl(fd, ifr);
+ if (err < 0)
+ perror("Cannot get msglvl");
+ else
+ msglvl_wanted = update_flags(
+ flag_msglvl, ARRAY_SIZE(flag_msglvl),
+ edata.data);
+ } else {
+ err = 0;
+ }
+
+ if (err == 0) {
+ edata.cmd = ETHTOOL_SMSGLVL;
+ edata.data = msglvl_wanted;
+ ifr->ifr_data = (caddr_t)&edata;;
+ err = send_ioctl(fd, ifr);
+ if (err < 0)
+ perror("Cannot set new msglvl");
+ }
}
return 0;
--
1.6.2.5
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* [PATCH 1/2] ethtool: Add generic structure and functions for named flags
From: Ben Hutchings @ 2010-05-18 16:32 UTC
To: Jeff Garzik; +Cc: netdev, sf-linux-drivers
This will be used to support named message type flags.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
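
Note for readers: each named flag carries a tri-state "wanted" value, where
-1 means leave the bit alone, 0 means clear it and 1 means set it. The
standalone sketch below restates update_flags() from this patch with
standard C types (uint32_t instead of u32) so it compiles on its own; the
want_* variables and demo_flags[] table are made up for the example.

```c
#include <stdint.h>

struct named_flag {
	const char *name;
	uint32_t flag;
	int *wanted;	/* -1: unchanged, 0: clear, 1: set */
};

static uint32_t update_flags(const struct named_flag *flags,
			     unsigned int n_flags, uint32_t value)
{
	while (n_flags) {
		if (*flags->wanted == 0)
			value &= ~flags->flag;
		else if (*flags->wanted == 1)
			value |= flags->flag;
		++flags;
		--n_flags;
	}

	return value;
}

/* Example request: "drv off probe on", with link not mentioned. */
static int want_drv = 0;
static int want_probe = 1;
static int want_link = -1;

static const struct named_flag demo_flags[] = {
	{ "drv",   0x0001, &want_drv },
	{ "probe", 0x0002, &want_probe },
	{ "link",  0x0004, &want_link },
};
```

With these settings, update_flags(demo_flags, 3, 0x0005) clears drv, sets
probe and leaves link set, giving 0x0006.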
ethtool.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 files changed, 41 insertions(+), 0 deletions(-)
diff --git a/ethtool.c b/ethtool.c
index 4226a67..7004b7f 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -355,6 +355,12 @@ struct cmdline_info {
void *ioctl_val;
};
+struct named_flag {
+ const char *name;
+ u32 flag;
+ int *wanted;
+};
+
static struct cmdline_info cmdline_gregs[] = {
{ "raw", CMDL_BOOL, &gregs_dump_raw, NULL },
{ "hex", CMDL_BOOL, &gregs_dump_hex, NULL },
@@ -520,6 +526,41 @@ static void parse_generic_cmdline(int argc, char **argp,
}
}
+static void
+print_flags(const struct named_flag *flags, unsigned int n_flags, u32 value)
+{
+ const char *sep = "";
+
+ while (n_flags) {
+ if (value & flags->flag) {
+ printf("%s%s", sep, flags->name);
+ sep = " ";
+ value &= ~flags->flag;
+ }
+ ++flags;
+ --n_flags;
+ }
+
+ /* Print any unrecognised flags in hex */
+ if (value)
+ printf("%s%#x", sep, value);
+}
+
+static u32
+update_flags(const struct named_flag *flags, unsigned int n_flags, u32 value)
+{
+ while (n_flags) {
+ if (*flags->wanted == 0)
+ value &= ~flags->flag;
+ else if (*flags->wanted == 1)
+ value |= flags->flag;
+ ++flags;
+ --n_flags;
+ }
+
+ return value;
+}
+
static int rxflow_str_to_type(const char *str)
{
int flow_type = 0;
--
1.6.2.5
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-18 16:28 UTC
To: Michael Chan; +Cc: netdev@vger.kernel.org
In-Reply-To: <1274148718.7893.14.camel@HP1>
On 2010-05-18 04:11, Michael Chan wrote:
>
> On Tue, 2010-05-18 at 08:35 -0700, Krzysztof Olędzki wrote:
>> On 2010-05-16 20:51, Michael Chan wrote:
>>> Krzysztof Oledzki wrote:
>>>
>>>>
>>>> Why the driver registers 5 interrupts instead of 4? How to
>>>> limit it to 4?
>>>>
>>>
>>> The first vector (eth0-0) handles link interrupt and other slow
>>> path events. It also has an RX ring for non-IP packets that are
>>> not hashed by the RSS hash. The majority of the rx packets should
>>> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
>>> vectors to different CPUs.
>>
>> Did some more tests on two 4-core CPUs (8 CPUs reported to the system)
>> and on two 4-core CPUs with HT (16 CPUs reported to the system) and in
>> both cases there are 8 instead of 9 vectors: eth0-0 .. eth0-7 (irqs 61
>> .. 68). However, dmesg shows that 9 interrupts are allocated:
>>
>> bnx2 0000:01:00.0: irq 61 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 62 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 63 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 64 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 65 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 66 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 67 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 68 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 69 for MSI/MSI-X
>>
>> In such a case, which ring will be used for slow path and non-IP packets
>> and why is there no additional queue like in the 4-CPU case?
>>
>
> eth0-0 is always the one handling slow path, rx ring 0 (non-IP), and tx
> ring 0. The last vector is not used by bnx2. It is reserved for iSCSI
> which is handled by the cnic and bnx2i drivers.
Thanks again for the explanation.
Best regards,
Krzysztof Olędzki
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Michael Chan @ 2010-05-18 2:11 UTC
To: Krzysztof Olędzki; +Cc: netdev@vger.kernel.org
In-Reply-To: <4BF2B3BE.60209@ans.pl>
On Tue, 2010-05-18 at 08:35 -0700, Krzysztof Olędzki wrote:
> On 2010-05-16 20:51, Michael Chan wrote:
> > Krzysztof Oledzki wrote:
> >
> >>
> >> Why the driver registers 5 interrupts instead of 4? How to
> >> limit it to 4?
> >>
> >
> > The first vector (eth0-0) handles link interrupt and other slow
> > path events. It also has an RX ring for non-IP packets that are
> > not hashed by the RSS hash. The majority of the rx packets should
> > be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> > vectors to different CPUs.
>
> Did some more tests on two 4-core CPUs (8 CPUs reported to the system)
> and on two 4-core CPUs with HT (16 CPUs reported to the system) and in
> both cases there are 8 instead of 9 vectors: eth0-0 .. eth0-7 (irqs 61
> .. 68). However, dmesg shows that 9 interrupts are allocated:
>
> bnx2 0000:01:00.0: irq 61 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 62 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 63 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 64 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 65 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 66 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 67 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 68 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 69 for MSI/MSI-X
>
> In such a case, which ring will be used for slow path and non-IP packets
> and why is there no additional queue like in the 4-CPU case?
>
eth0-0 is always the one handling slow path, rx ring 0 (non-IP), and tx
ring 0. The last vector is not used by bnx2. It is reserved for iSCSI
which is handled by the cnic and bnx2i drivers.
* Re: [PATCH v3 3/3] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
From: Scott Wood @ 2010-05-18 16:23 UTC
To: Richard Cochran; +Cc: netdev, devicetree-discuss, linuxppc-dev
In-Reply-To: <20100518063608.GA2720@riccoc20.at.omicron.at>
On 05/18/2010 01:36 AM, Richard Cochran wrote:
> On Mon, May 17, 2010 at 01:05:54PM -0500, Scott Wood wrote:
>>>>>> + - tmr_fiper1 Fixed interval period pulse generator.
>>>>>> + - tmr_fiper2 Fixed interval period pulse generator.
>>>>
>>
>> MPC8572 and P2020 have fiper3 as well.
>
> I doubt they really have a third fiper.
>
> First of all, this signal is not routed anywhere on the boards.
OK, but that's a separate issue from whether it exists on the chip and
could be used on a different board.
> Also, according to the documentation, it has no bit in the TMR_CTRL or the
> TMR_TEMASK registers.
It does seem inconsistent -- but could just be bad docs.
> Unless there is a bit in TMR_TEMASK, you cannot
> get an interrupt from it.
>
> If you cannot use the signal externally (in the "real" world) and you
> cannot get an interrupt, what good is it to have such a periodic
> signal? Polling the bit in the TMR_TEVENT to see when a pulse occurs
> seems pointless.
>
> Scott, you have connections, right? Can you clarify this for me?
I'll ask around.
-Scott
* [PATCH net-next-2.6] bonding: make bonding_store_slaves simpler
From: Jiri Pirko @ 2010-05-18 15:46 UTC
To: netdev; +Cc: davem, fubar, bonding-devel
This patch makes bonding_store_slaves function nicer and easier to understand.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
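
Note for readers: the "+ifname" / "-ifname" dispatch that the reworked
function performs can be sketched in userspace C as below. This is an
illustration only, not the driver code: NAME_LEN, enum slave_cmd and
parse_slave_cmd() are made-up names, and the real function goes on to look
up the device and call bond_enslave() or bond_release().

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16 /* stands in for IFNAMSIZ; value assumed here */

enum slave_cmd { SLAVE_ADD, SLAVE_REMOVE, SLAVE_INVALID };

/*
 * Split a "+ifname" or "-ifname" string into a command and a name.
 * 'ifname' must hold NAME_LEN bytes and is written only on success.
 */
static enum slave_cmd parse_slave_cmd(const char *buf, char *ifname)
{
	char command[NAME_LEN + 1] = { 0 };
	enum slave_cmd cmd;

	/* Read one whitespace-delimited token of at most 16 characters. */
	if (sscanf(buf, "%16s", command) != 1)
		return SLAVE_INVALID;

	switch (command[0]) {
	case '+':
		cmd = SLAVE_ADD;
		break;
	case '-':
		cmd = SLAVE_REMOVE;
		break;
	default:
		return SLAVE_INVALID;
	}

	if (command[1] == '\0')	/* a sign with no name after it */
		return SLAVE_INVALID;

	strcpy(ifname, command + 1);	/* at most 15 characters plus NUL */
	return cmd;
}
```

From the shell, this corresponds to writing a string such as "+eth0" or
"-eth0" to the bond's slaves file in sysfs.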
drivers/net/bonding/bond_sysfs.c | 66 ++++++++++++++-----------------------
1 files changed, 25 insertions(+), 41 deletions(-)
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 7911438..a4cbaf7 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -211,7 +211,8 @@ static ssize_t bonding_show_slaves(struct device *d,
/*
* Set the slaves in the current bond. The bond interface must be
* up for this to succeed.
- * This function is largely the same flow as bonding_update_bonds().
+ * This is supposed to be only thin wrapper for bond_enslave and bond_release.
+ * All hard work should be done there.
*/
static ssize_t bonding_store_slaves(struct device *d,
struct device_attribute *attr,
@@ -219,9 +220,8 @@ static ssize_t bonding_store_slaves(struct device *d,
{
char command[IFNAMSIZ + 1] = { 0, };
char *ifname;
- int i, res, ret = count;
- struct slave *slave;
- struct net_device *dev = NULL;
+ int res, ret = count;
+ struct net_device *dev;
struct bonding *bond = to_bond(d);
/* Quick sanity check -- is the bond interface up? */
@@ -230,8 +230,6 @@ static ssize_t bonding_store_slaves(struct device *d,
bond->dev->name);
}
- /* Note: We can't hold bond->lock here, as bond_create grabs it. */
-
if (!rtnl_trylock())
return restart_syscall();
@@ -241,19 +239,17 @@ static ssize_t bonding_store_slaves(struct device *d,
!dev_valid_name(ifname))
goto err_no_cmd;
- if (command[0] == '+') {
-
- /* Got a slave name in ifname. */
-
- dev = __dev_get_by_name(dev_net(bond->dev), ifname);
- if (!dev) {
- pr_info("%s: Interface %s does not exist!\n",
- bond->dev->name, ifname);
- ret = -ENODEV;
- goto out;
- }
+ dev = __dev_get_by_name(dev_net(bond->dev), ifname);
+ if (!dev) {
+ pr_info("%s: Interface %s does not exist!\n",
+ bond->dev->name, ifname);
+ ret = -ENODEV;
+ goto out;
+ }
- pr_info("%s: Adding slave %s.\n", bond->dev->name, ifname);
+ switch (command[0]) {
+ case '+':
+ pr_info("%s: Adding slave %s.\n", bond->dev->name, dev->name);
/* If this is the first slave, then we need to set
the master's hardware address to be the same as the
@@ -263,33 +259,21 @@ static ssize_t bonding_store_slaves(struct device *d,
dev->addr_len);
res = bond_enslave(bond->dev, dev);
- if (res)
- ret = res;
+ break;
- goto out;
- }
+ case '-':
+ pr_info("%s: Removing slave %s.\n", bond->dev->name, dev->name);
+ res = bond_release(bond->dev, dev);
+ break;
- if (command[0] == '-') {
- dev = NULL;
- bond_for_each_slave(bond, slave, i)
- if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0) {
- dev = slave->dev;
- break;
- }
- if (dev) {
- pr_info("%s: Removing slave %s\n",
- bond->dev->name, dev->name);
- res = bond_release(bond->dev, dev);
- if (res)
- ret = res;
- } else {
- pr_err("unable to remove non-existent slave %s for bond %s.\n",
- ifname, bond->dev->name);
- ret = -ENODEV;
- }
- goto out;
+ default:
+ goto err_no_cmd;
}
+ if (res)
+ ret = res;
+ goto out;
+
err_no_cmd:
pr_err("no command found in slaves file for bond %s. Use +ifname or -ifname.\n",
bond->dev->name);
--
1.6.6.1
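The control flow this patch arrives at (parse one "+ifname" or "-ifname" token, then dispatch on the first character) can be sketched in userspace C. This is only an illustration under assumptions, not the driver code: `parse_slave_command`, `enum slave_op` and `NAME_MAX_LEN` are hypothetical names, and the device lookup and the calls to bond_enslave/bond_release are replaced by returning the requested operation.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 16	/* stands in for IFNAMSIZ; an assumption of this sketch */

enum slave_op { SLAVE_ADD, SLAVE_REMOVE };

/*
 * Parse "+ifname" or "-ifname" from buf.
 * On success, store the operation in *op, copy the interface name into
 * ifname (which must hold NAME_MAX_LEN bytes) and return 0.
 * Return -EINVAL when no valid command is present.
 */
static int parse_slave_command(const char *buf, enum slave_op *op,
			       char *ifname)
{
	char command[NAME_MAX_LEN + 1] = { 0 };

	/* "%16s" matches NAME_MAX_LEN: one command char plus up to 15 name chars */
	if (sscanf(buf, "%16s", command) != 1)
		return -EINVAL;

	/* A bare "+" or "-" carries no interface name. */
	if (command[1] == '\0')
		return -EINVAL;

	switch (command[0]) {
	case '+':
		*op = SLAVE_ADD;
		break;
	case '-':
		*op = SLAVE_REMOVE;
		break;
	default:
		return -EINVAL;
	}

	snprintf(ifname, NAME_MAX_LEN, "%s", command + 1);
	return 0;
}
```

With the parsing and validation done up front, each case of the switch only has to name the operation, which is the "thin wrapper" shape the new comment in the patch describes.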
^ permalink raw reply related
* [PATCH net-next-2.6] bonding: remove redundant checks from bonding_store_slaves V2
From: Jiri Pirko @ 2010-05-18 15:44 UTC (permalink / raw)
To: netdev; +Cc: davem, fubar, bonding-devel
(it's actually the same as v1)
Remove checks that duplicate similar checks in bond_enslave.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/bonding/bond_sysfs.c | 20 +-------------------
1 files changed, 1 insertions(+), 19 deletions(-)
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 29a7a8a..7911438 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -243,7 +243,7 @@ static ssize_t bonding_store_slaves(struct device *d,
if (command[0] == '+') {
- /* Got a slave name in ifname. Is it already in the list? */
+ /* Got a slave name in ifname. */
dev = __dev_get_by_name(dev_net(bond->dev), ifname);
if (!dev) {
@@ -253,24 +253,6 @@ static ssize_t bonding_store_slaves(struct device *d,
goto out;
}
- if (dev->flags & IFF_UP) {
- pr_err("%s: Error: Unable to enslave %s because it is already up.\n",
- bond->dev->name, dev->name);
- ret = -EPERM;
- goto out;
- }
-
- read_lock(&bond->lock);
- bond_for_each_slave(bond, slave, i)
- if (slave->dev == dev) {
- pr_err("%s: Interface %s is already enslaved!\n",
- bond->dev->name, ifname);
- ret = -EPERM;
- read_unlock(&bond->lock);
- goto out;
- }
- read_unlock(&bond->lock);
-
pr_info("%s: Adding slave %s.\n", bond->dev->name, ifname);
/* If this is the first slave, then we need to set
--
1.6.6.1
^ permalink raw reply related
* [PATCH net-next-2.6] bonding: move slave MTU handling from sysfs V2
From: Jiri Pirko @ 2010-05-18 15:42 UTC (permalink / raw)
To: netdev; +Cc: davem, fubar, bonding-devel, monis
V1->V2: corrected res/ret use
For some reason, MTU handling (storing and restoring) is taking place in
bond_sysfs. The correct place for this code is in bond_enslave and bond_release.
So move it there.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/bonding/bond_main.c | 15 ++++++++++++++-
drivers/net/bonding/bond_sysfs.c | 22 ++--------------------
2 files changed, 16 insertions(+), 21 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5e12462..2c3f9db 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1533,6 +1533,14 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
*/
new_slave->original_flags = slave_dev->flags;
+ /* Save slave's original mtu and then set it to match the bond */
+ new_slave->original_mtu = slave_dev->mtu;
+ res = dev_set_mtu(slave_dev, bond->dev->mtu);
+ if (res) {
+ pr_debug("Error %d calling dev_set_mtu\n", res);
+ goto err_free;
+ }
+
/*
* Save slave's original ("permanent") mac address for modes
* that need it, and for restoring it upon release, and then
@@ -1550,7 +1558,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
res = dev_set_mac_address(slave_dev, &addr);
if (res) {
pr_debug("Error %d calling set_mac_address\n", res);
- goto err_free;
+ goto err_restore_mtu;
}
}
@@ -1785,6 +1793,9 @@ err_restore_mac:
dev_set_mac_address(slave_dev, &addr);
}
+err_restore_mtu:
+ dev_set_mtu(slave_dev, new_slave->original_mtu);
+
err_free:
kfree(new_slave);
@@ -1969,6 +1980,8 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
dev_set_mac_address(slave_dev, &addr);
}
+ dev_set_mtu(slave_dev, slave->original_mtu);
+
slave_dev->priv_flags &= ~(IFF_MASTER_8023AD | IFF_MASTER_ALB |
IFF_SLAVE_INACTIVE | IFF_BONDING |
IFF_SLAVE_NEEDARP);
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 392e291..29a7a8a 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -220,7 +220,6 @@ static ssize_t bonding_store_slaves(struct device *d,
char command[IFNAMSIZ + 1] = { 0, };
char *ifname;
int i, res, ret = count;
- u32 original_mtu;
struct slave *slave;
struct net_device *dev = NULL;
struct bonding *bond = to_bond(d);
@@ -281,18 +280,7 @@ static ssize_t bonding_store_slaves(struct device *d,
memcpy(bond->dev->dev_addr, dev->dev_addr,
dev->addr_len);
- /* Set the slave's MTU to match the bond */
- original_mtu = dev->mtu;
- res = dev_set_mtu(dev, bond->dev->mtu);
- if (res) {
- ret = res;
- goto out;
- }
-
res = bond_enslave(bond->dev, dev);
- bond_for_each_slave(bond, slave, i)
- if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0)
- slave->original_mtu = original_mtu;
if (res)
ret = res;
@@ -301,23 +289,17 @@ static ssize_t bonding_store_slaves(struct device *d,
if (command[0] == '-') {
dev = NULL;
- original_mtu = 0;
bond_for_each_slave(bond, slave, i)
if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0) {
dev = slave->dev;
- original_mtu = slave->original_mtu;
break;
}
if (dev) {
pr_info("%s: Removing slave %s\n",
bond->dev->name, dev->name);
- res = bond_release(bond->dev, dev);
- if (res) {
+ res = bond_release(bond->dev, dev);
+ if (res)
ret = res;
- goto out;
- }
- /* set the slave MTU to the default */
- dev_set_mtu(dev, original_mtu);
} else {
pr_err("unable to remove non-existent slave %s for bond %s.\n",
ifname, bond->dev->name);
--
1.6.6.1
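The save, set and restore ordering that this patch moves into bond_enslave and bond_release can be illustrated with a small userspace sketch. Everything here is a hypothetical stand-in (`struct fake_dev`, `struct fake_slave`, `fake_set_mtu`, `attach`, `detach`), not the bonding or netdev API; the point is only the order in which the error labels unwind.

```c
#include <errno.h>

struct fake_dev {
	int mtu;
	int max_mtu;	/* fake_set_mtu() rejects anything larger */
};

struct fake_slave {
	struct fake_dev *dev;
	int original_mtu;
};

/* Hypothetical stand-in for dev_set_mtu(): -EINVAL when out of range. */
static int fake_set_mtu(struct fake_dev *dev, int mtu)
{
	if (mtu <= 0 || mtu > dev->max_mtu)
		return -EINVAL;
	dev->mtu = mtu;
	return 0;
}

/*
 * Attach dev to a master that uses master_mtu. later_step_result simulates
 * the outcome of a step that runs after the MTU change (for example,
 * setting the MAC address): 0 for success, negative errno for failure.
 */
static int attach(struct fake_slave *slave, struct fake_dev *dev,
		  int master_mtu, int later_step_result)
{
	int res;

	/* Save the original MTU before touching the device. */
	slave->dev = dev;
	slave->original_mtu = dev->mtu;

	res = fake_set_mtu(dev, master_mtu);
	if (res)
		goto err_out;		/* nothing changed, nothing to undo */

	res = later_step_result;
	if (res)
		goto err_restore_mtu;	/* undo the MTU change made above */

	return 0;

err_restore_mtu:
	fake_set_mtu(dev, slave->original_mtu);
err_out:
	return res;
}

/* Detach: put the saved MTU back, mirroring what the patch adds to release. */
static void detach(struct fake_slave *slave)
{
	fake_set_mtu(slave->dev, slave->original_mtu);
}
```

Keeping the saved value next to the code that changes it means a caller such as the sysfs handler no longer has to look the slave up again just to record or restore the MTU.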
^ permalink raw reply related
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-18 15:35 UTC (permalink / raw)
To: Michael Chan; +Cc: netdev@vger.kernel.org
In-Reply-To: <C27F8246C663564A84BB7AB3439772421B78147539@IRVEXCHCCR01.corp.ad.broadcom.com>
On 2010-05-16 20:51, Michael Chan wrote:
> Krzysztof Oledzki wrote:
>
>>
>> Why the driver registers 5 interrupts instead of 4? How to
>> limit it to 4?
>>
>
> The first vector (eth0-0) handles link interrupt and other slow
> path events. It also has an RX ring for non-IP packets that are
> not hashed by the RSS hash. The majority of the rx packets should
> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> vectors to different CPUs.
I did some more tests on a system with two 4-core CPUs (8 CPUs reported to the
system) and on one with two 4-core CPUs with HT (16 CPUs reported to the system).
In both cases there are 8 vectors instead of 9: eth0-0 .. eth0-7 (irqs 61
.. 68). However, dmesg shows that 9 interrupts are allocated:
bnx2 0000:01:00.0: irq 61 for MSI/MSI-X
bnx2 0000:01:00.0: irq 62 for MSI/MSI-X
bnx2 0000:01:00.0: irq 63 for MSI/MSI-X
bnx2 0000:01:00.0: irq 64 for MSI/MSI-X
bnx2 0000:01:00.0: irq 65 for MSI/MSI-X
bnx2 0000:01:00.0: irq 66 for MSI/MSI-X
bnx2 0000:01:00.0: irq 67 for MSI/MSI-X
bnx2 0000:01:00.0: irq 68 for MSI/MSI-X
bnx2 0000:01:00.0: irq 69 for MSI/MSI-X
In such a case, which ring will be used for slow path and non-IP packets,
and why is there no additional queue as in the 4-CPU case?
Best regards,
Krzysztof Olędzki
^ permalink raw reply
* [PATCH net-next] ixgbe: return error in set_rar when index out of range
From: Shirley Ma @ 2010-05-18 15:34 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: e1000-devel, netdev, davem, kvm
Return -1 when set_rar index is out of range
Signed-off-by: Shirley Ma <xma@us.ibm.com>
---
drivers/net/ixgbe/ixgbe_common.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ixgbe/ixgbe_common.c b/drivers/net/ixgbe/ixgbe_common.c
index 1159d91..77b3cf4 100644
--- a/drivers/net/ixgbe/ixgbe_common.c
+++ b/drivers/net/ixgbe/ixgbe_common.c
@@ -1188,6 +1188,7 @@ s32 ixgbe_set_rar_generic(struct ixgbe_hw *hw, u32 index, u8 *addr, u32 vmdq,
IXGBE_WRITE_REG(hw, IXGBE_RAH(index), rar_high);
} else {
hw_dbg(hw, "RAR index %d is out of range.\n", index);
+ return -1;
}
return 0;
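As a rough illustration of why the added return matters, here is a userspace sketch of a bounds-checked table write. It is not the ixgbe code: `struct addr_table`, `table_set`, `NUM_ENTRIES` and `ADDR_LEN` are hypothetical names chosen for the example.

```c
#include <stdint.h>
#include <string.h>

#define NUM_ENTRIES 4	/* hypothetical table size for this sketch */
#define ADDR_LEN 6

struct addr_table {
	uint8_t addr[NUM_ENTRIES][ADDR_LEN];
};

/* Store addr at index; return 0 on success, -1 when index is out of range. */
static int table_set(struct addr_table *t, uint32_t index, const uint8_t *addr)
{
	if (index >= NUM_ENTRIES)
		return -1;	/* report the failure instead of returning 0 */

	memcpy(t->addr[index], addr, ADDR_LEN);
	return 0;
}
```

Without the early return, a caller would see 0 for an out-of-range index and could not tell that nothing was written.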
^ permalink raw reply related
* Re: [PATCH] iproute2: rework SR-IOV VF support
From: Stephen Hemminger @ 2010-05-18 15:15 UTC (permalink / raw)
To: Chris Wright; +Cc: netdev, Williams, Mitch A
In-Reply-To: <20100518075700.GE8301@sequoia.sous-sol.org>
On Tue, 18 May 2010 00:57:00 -0700
Chris Wright <chrisw@sous-sol.org> wrote:
> The kernel interface changed just before 2.6.34 was released. This brings
> iproute2 in line with the current changes. The VF portion of setlink is
> comprised of a set of nested attributes.
>
> IFLA_VFINFO_LIST (NESTED)
> IFLA_VF_INFO (NESTED)
> IFLA_VF_MAC
> IFLA_VF_VLAN
> IFLA_VF_TX_RATE
>
> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
> ---
Applied
--
^ permalink raw reply
* Re: dev_get_valid_name buggy with hash collision
From: Daniel Lezcano @ 2010-05-18 14:55 UTC (permalink / raw)
To: Octavian Purdila; +Cc: Linux Netdev List
In-Reply-To: <201005181529.37420.opurdila@ixiacom.com>
On 05/18/2010 02:29 PM, Octavian Purdila wrote:
> On Tuesday 18 May 2010 13:17:10 you wrote:
>
>
>> the commit:
>>
>> commit d90310243fd750240755e217c5faa13e24f41536
>> Author: Octavian Purdila <opurdila@ixiacom.com>
>> Date: Wed Nov 18 02:36:59 2009 +0000
>>
>> net: device name allocation cleanups
>>
>> introduced a bug when there is a hash collision making impossible to
>> rename a device with eth%d
>>
>>
[ ... ]
>> --- net-2.6.orig/net/core/dev.c
>> +++ net-2.6/net/core/dev.c
>> @@ -936,18 +936,22 @@ int dev_alloc_name(struct net_device *de
>> }
>> EXPORT_SYMBOL(dev_alloc_name);
>>
>> -static int dev_get_valid_name(struct net *net, const char *name, char *buf,
>> - bool fmt)
>> +static int dev_get_valid_name(struct net_device *dev, const char *name, bool fmt)
>> {
>> + struct net *net;
>> +
>> + BUG_ON(!dev_net(dev));
>> + net = dev_net(dev);
>> +
>> if (!dev_valid_name(name))
>> return -EINVAL;
>>
>> if (fmt && strchr(name, '%'))
>> - return __dev_alloc_name(net, name, buf);
>> + return dev_alloc_name(dev, name);
>> else if (__dev_get_by_name(net, name))
>> return -EEXIST;
>> - else if (buf != name)
>> - strlcpy(buf, name, IFNAMSIZ);
>> + else if (strncmp(dev->name, name, IFNAMSIZ))
>> + strlcpy(dev->name, name, IFNAMSIZ);
>>
>>
> Why do the strncmp? Can't we preserve the (buf != name) condition?
The 'buf' parameter is no longer passed to the function. We have the
'dev' and the 'newname' parameters.
The pointer test was just to check 'dev_get_valid_name' was called from
the 'register_netdevice' function context with 'dev_get_valid_name(net,
dev->name, dev->name, 0)'. Comparing the strings is valid in this case.
Otherwise dev_get_valid_name is called from:
* "dev_change_net_namespace" with "dev%d" or "ifname" specified
within the netlink message. Both are different pointers, the first will
fall in the "if (fmt && strchr(name, '%'))".
* "dev_change_name", where the pointers are different and the strings
are different.
I think it is safe to do the string comparison here. But maybe there are
a few simplifications (e.g. removing fmt) to make.
If you agree, I will send this patch against net-2.6 and the
simplifications against net-next-2.6.
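
To illustrate why comparing the strings can stand in for the old pointer test, here is a small userspace sketch. It mirrors only the final branch of the function (no '%' handling and no name lookup), and `struct fake_dev`, `set_name` and `NAME_LEN` are hypothetical names, with snprintf standing in for strlcpy.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16	/* stands in for IFNAMSIZ */

struct fake_dev {
	char name[NAME_LEN];
};

/*
 * Copy name into dev->name only when the contents differ.
 * Returns 1 if a copy was made, 0 if it was skipped.
 */
static int set_name(struct fake_dev *dev, const char *name)
{
	/*
	 * Equal contents covers both name == dev->name (the case the old
	 * pointer test caught) and an equal string stored elsewhere.
	 */
	if (strncmp(dev->name, name, NAME_LEN) == 0)
		return 0;

	snprintf(dev->name, NAME_LEN, "%s", name);
	return 1;
}
```

When the caller passes dev->name itself, the comparison is trivially equal, so the copy is skipped just as it was with (buf != name); the only added cost is the comparison.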
Thanks
-- Daniel
^ permalink raw reply
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-18 14:55 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1274192761.8274.3.camel@edumazet-laptop>
On 2010-05-18 16:26, Eric Dumazet wrote:
> On Tuesday, 18 May 2010 at 16:22 +0200, Krzysztof Olędzki wrote:
>
>> Thank you. What happens if I set it to a lower/bigger value than
>> available txqueues in a NIC?
>
> lower values -> same situation as today (not all txqueues will be
> used)
>
> bigger values -> it will be capped, so it's only a bit more RAM
> allocated.
So it is safe to set a slightly bigger value than needed. Thanks.
Best regards,
Krzysztof Olędzki
^ permalink raw reply
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Eric Dumazet @ 2010-05-18 14:26 UTC (permalink / raw)
To: Krzysztof Olędzki; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <4BF2A288.7040304@ans.pl>
On Tuesday, 18 May 2010 at 16:22 +0200, Krzysztof Olędzki wrote:
> Thank you. What happens if I set it to a lower/bigger value than
> available txqueues in a NIC?
lower values -> same situation as today (not all txqueues will be
used)
bigger values -> it will be capped, so it's only a bit more RAM
allocated.
^ permalink raw reply
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-18 14:22 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1274045180.2299.38.camel@edumazet-laptop>
On 2010-05-16 23:26, Eric Dumazet wrote:
<CUT>
>> My normal workload is TCP and UDP based so if it is only ICMP then there
>> is no problem. Actually I have noticeably more UDP traffic than an
>> average network, mainly because of LWAPP/CAPWAP, so I'm interested in
>> good performance for both TCP and UDP.
>>
>> During my initial tests ICMP ping showed the same behavior as UDP/TCP
>> with iperf, so I stuck with it. I'll redo everything with UDP and TCP
>> of course. :)
>>
>>>> BTW: With a normal router workload, should I expect big performance drop
>>>> when receiving and forwarding the same packet using different CPUs?
>>>> Bonding provides very important functionality, I'm not able to drop it. :(
>>>>
>>>
>>> Not sure what you mean by forwarding same packet using different CPUs.
>>> You probably meant different queues, because in normal case, only one
>>> cpu is involved (the one receiving the packet is also the one
>>> transmitting it, unless you have congestion or traffic shaping)
>>
>> I mean receiving it on one CPU and sending it on a different one. I
>> would like to assign different vectors (eth1-0 .. eth1-4) to different
>> CPUs, but with bnx2x+bonding packets are received on queues 1-4 (eth1-1
>> .. eth1-4) and sent from queue 0 (eth1-0). So, for a one packet, two
>> different CPUs will be involved (RX on q1-q4, TX on q0).
>
> As I said, (unless you use RPS), one forwarded packet only uses one CPU.
> How tx queue is selected is another story. We try to do a 1-1 mapping.
OK, but with multi-queue NIC, I can assign each queue to a different
CPU. So, while forwarding packets from a flow, I would like to assign
the same queue on both input and output.
>>> If you have 4 cpus, you can use following patch and have a transparent
>>> bonding against multiqueue.
>>
>> Thanks! If I get it right: with the patch, packets should be sent using
>> the same CPU (queue?) that was used when receiving?
>
> Yes, for forwarding loads.
>
> (You might use 5 or 8 instead of 4, because it's not clear to me if bnx2
> has 5 txqueues or 4 in your case)
Thank you. What happens if I set it to a lower/bigger value than
available txqueues in a NIC?
Best regards,
Krzysztof Olędzki
^ permalink raw reply
* Re: [PATCH] virtio: put last seen used index into ring itself
From: Rusty Russell @ 2010-05-18 7:41 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jiri Pirko, Shirley Ma, Amit Shah, Mark McLoughlin, netdev,
linux-kernel, quintela, alex.williamson
In-Reply-To: <20100518004744.GA21359@redhat.com>
On Tue, 18 May 2010 10:17:44 am Michael S. Tsirkin wrote:
> Generally, the Host end of the virtio ring doesn't need to see where
> Guest is up to in consuming the ring. However, to completely understand
> what's going on from the outside, this information must be exposed.
> For example, host can reduce the number of interrupts by detecting
> that the guest is currently handling previous buffers.
Thanks, applied.
Cheers,
Rusty.
^ permalink raw reply
* [PATCH] net: avoid one atomic op per cloned skb
From: Eric Dumazet @ 2010-05-18 13:40 UTC (permalink / raw)
To: David Miller; +Cc: netdev
Hi David
I know you said 'only patches', but I found the following patch small
enough?
I have a followup patch to avoid two atomic ops per cloned skb on
dataref (helps TCP tx path) but will submit it for 2.6.36, since its
diffstat is a bit more than 3++- :)
Thanks
[PATCH] net: avoid one atomic op per cloned skb
skb_clone() can use atomic_set(clone_ref, 2) safely, because only
current thread can possibly touch clone_ref at this point.
Add a WARN_ON_ONCE() for a while, to catch wrong assumptions.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/core/skbuff.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c543dd2..4444f15 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -628,7 +628,8 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
n->fclone == SKB_FCLONE_UNAVAILABLE) {
atomic_t *fclone_ref = (atomic_t *) (n + 1);
n->fclone = SKB_FCLONE_CLONE;
- atomic_inc(fclone_ref);
+ WARN_ON_ONCE(atomic_read(fclone_ref) != 1);
+ atomic_set(fclone_ref, 2);
} else {
n = kmem_cache_alloc(skbuff_head_cache, gfp_mask);
if (!n)
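
The reasoning in the changelog can be sketched in userspace with C11 atomics. This is an illustration under assumptions, not the kernel code: `struct fake_fclone` and `take_clone_ref` are hypothetical, a relaxed `atomic_store_explicit` merely stands in for the kernel's atomic_set(), and the fprintf stands in for WARN_ON_ONCE().

```c
#include <stdatomic.h>
#include <stdio.h>

struct fake_fclone {
	atomic_int ref;	/* 1 while only the original buffer is in use */
};

/*
 * Take the reference for the clone. The caller must be the only thread
 * that can reach obj->ref at this point; under that assumption a plain
 * store of 2 gives the same result as an atomic increment from 1,
 * without the cost of a read-modify-write operation.
 * Returns 0 normally, 1 if the "refcount is 1" assumption did not hold
 * (the store still happens, as in the patch).
 */
static int take_clone_ref(struct fake_fclone *obj)
{
	int warned = 0;

	if (atomic_load_explicit(&obj->ref, memory_order_relaxed) != 1) {
		fprintf(stderr, "unexpected reference count\n");
		warned = 1;
	}
	atomic_store_explicit(&obj->ref, 2, memory_order_relaxed);
	return warned;
}
```

The check is only a safety net for the single-owner assumption; if another thread could touch the counter concurrently, the store would lose updates and an atomic increment would be required again.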
^ permalink raw reply related
* Re: loosing IPMI-card by loading netconsole
From: Henning Fehrmann @ 2010-05-18 13:12 UTC (permalink / raw)
To: Tejun Heo
Cc: Ronciak, John, Kirsher, Jeffrey T, Brandeburg, Jesse,
Allan, Bruce W, Waskiewicz Jr, Peter P, netdev@vger.kernel.org,
Carsten Aulbert
In-Reply-To: <4BEDCD87.4090602@kernel.org>
Hello,
> >> Yeap, sure, it would be effective but I kind of want to leave
> >> bisection as the last resort. Bisection is a somewhat painful process
> >> especially when the machine isn't right next to you and someone who
> >> has overall knowledge can often identify the problem much easier with
> >> appropriate debugging info.
> >
> > Well nothing jumps to mind in the netpoll/netconsole code and I haven't
> > heard any similar reports. My guess is it's something obscure, so I
> > think the sooner you start bisecting... Even one or two tests will get
> > us a lot closer to figuring out what changed in the last 1.5 years.
>
> I see. I was hoping it would ring a bell to someone. We'll probably
> try to provide the info Jesse asked and if that doesn't lead anywhere
> start bisecting.
Let me re-describe the symptoms.
I am not loading any IPMI-related modules, nor the netconsole
module.
When booting our current 2.6.32 kernel we cannot access the IPMI card
remotely.
We had one case where the IPMI card was accessible while using this
kernel, but that was probably because eth0 had been removed. We no
longer consider this case.
This problem does not occur when using an older kernel.
It likely has nothing to do with netconsole.
Here is the bisecting result:
The sha1 sum of the first bad commit is:
6e50912a442947d5fafd296ca6fdcbeb36b163ff
Hence, the last good commit is:
b2f8f7525c8aa1fdd8ad8c72c832dfb571d5f768
Cheers,
Henning
^ permalink raw reply