* Re: [PATCH] r6040: fix link checking with switches
From: David Miller @ 2010-05-18 5:48 UTC (permalink / raw)
To: florian; +Cc: netdev
In-Reply-To: <201005161430.57429.florian@openwrt.org>
From: Florian Fainelli <florian@openwrt.org>
Date: Sun, 16 May 2010 14:30:56 +0200
> The current link checking logic only works for one port, which is not correct
> for switches, where multiple ports can have different link status. As a result
> we would only check for link status on port 1 of the switch. Move the calls
> to mii_check_media in r6040_timer which will be polling a single PHY chip
> correctly and assume link is up for switches.
>
> Signed-off-by: Florian Fainelli <florian@openwrt.org>
Applied.
* [net-next-2.6 V9 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Scott Feldman @ 2010-05-18 5:48 UTC (permalink / raw)
To: davem; +Cc: netdev, chrisw, arnd, kaber
In-Reply-To: <20100518054330.21787.29398.stgit@savbu-pc100.cisco.com>
From: Scott Feldman <scofeldm@cisco.com>
Add new netdev ops ndo_{set|get}_vf_port to allow setting of
port-profile on a netdev interface. Extends netlink socket RTM_SETLINK/
RTM_GETLINK with two new sub msgs called IFLA_VF_PORTS and IFLA_PORT_SELF
(added to end of IFLA_cmd list). These are both nested attributes
using this layout:
  [IFLA_NUM_VF]
  [IFLA_VF_PORTS]
      [IFLA_VF_PORT]
          [IFLA_PORT_*], ...
      [IFLA_VF_PORT]
          [IFLA_PORT_*], ...
      ...
  [IFLA_PORT_SELF]
      [IFLA_PORT_*], ...
These attributes are designed to be set and get symmetrically. VF_PORTS
is a list of VF_PORTs, one for each VF, when dealing with an SR-IOV
device. PORT_SELF is for the PF of the SR-IOV device, in case it wants
to also have a port-profile, or for the case where the VF==PF, like in
enic patch 2/2 of this patch set.
A port-profile is used to configure/enable the external switch virtual port
backing the netdev interface, not to configure the host-facing side of the
netdev. A port-profile is an identifier known to the switch. How port-
profiles are installed on the switch or how available port-profiles are
made known to the host is outside the scope of this patch.
There are two types of port-profile specs in the netlink msg. The first spec
is for the 802.1Qbg (pre-)standard VDP protocol. The second spec is for devices
that run a protocol similar to VDP but in firmware, thus hiding the protocol
details. In either case, the specs have much in common, and it makes sense to
define the netlink msg as the union of the two specs. For example, both specs
have a notion of associating/disassociating a port-profile. And both specs
require some information from the hypervisor manager, such as client port
instance ID.
The general flow is the port-profile is applied to a host netdev interface
using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates with the
switch, and the switch virtual port backing the host netdev interface is
configured/enabled based on the settings defined by the port-profile. What
those settings comprise, and how those settings are managed is again
outside the scope of this patch, since this patch only deals with the
first step in the flow.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
---
include/linux/if_link.h | 75 ++++++++++++++++++++
include/linux/netdevice.h | 8 ++
net/core/rtnetlink.c | 169 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 251 insertions(+), 1 deletions(-)
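To make the nested layout from the description concrete, here is a minimal user-space sketch that packs one IFLA_VF_PORT nest inside an IFLA_VF_PORTS nest. It does not use the kernel or libnl helpers: struct demo_attr, demo_put(), demo_nest_start() and demo_nest_end() are hypothetical stand-ins, and the value of DEMO_VF_PORTS is a placeholder rather than the real IFLA_VF_PORTS number. The inner values follow the enum order in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the netlink attribute header: 16-bit length, 16-bit type. */
struct demo_attr { uint16_t len; uint16_t type; };

#define DEMO_HDRLEN   sizeof(struct demo_attr)
#define DEMO_ALIGN(n) (((n) + 3u) & ~(size_t)3u)

enum { DEMO_VF_PORTS = 100 /* placeholder value */, DEMO_VF_PORT = 1 };
enum { DEMO_PORT_VF = 1, DEMO_PORT_PROFILE = 2 };

struct demo_msg { unsigned char buf[256]; size_t used; };

/* Append one attribute; returns the offset of its header, or -1 if full. */
static int demo_put(struct demo_msg *m, uint16_t type,
                    const void *data, size_t len)
{
    size_t total = DEMO_ALIGN(DEMO_HDRLEN + len);
    struct demo_attr a;
    int off = (int)m->used;

    if (m->used + total > sizeof(m->buf))
        return -1;
    a.len = (uint16_t)(DEMO_HDRLEN + len);
    a.type = type;
    memset(m->buf + m->used, 0, total);          /* zero the padding too */
    memcpy(m->buf + m->used, &a, sizeof(a));
    if (len)
        memcpy(m->buf + m->used + DEMO_HDRLEN, data, len);
    m->used += total;
    return off;
}

/* A nest is an attribute whose payload is further attributes. */
static int demo_nest_start(struct demo_msg *m, uint16_t type)
{
    return demo_put(m, type, NULL, 0);
}

/* Closing a nest patches its length to cover everything added since. */
static void demo_nest_end(struct demo_msg *m, int off)
{
    struct demo_attr a;

    memcpy(&a, m->buf + off, sizeof(a));
    a.len = (uint16_t)(m->used - (size_t)off);
    memcpy(m->buf + off, &a, sizeof(a));
}

/* Builds [VF_PORTS] -> [VF_PORT] -> { PORT_VF, PORT_PROFILE }.
 * Returns the number of bytes used, or 0 if the buffer is too small. */
static size_t demo_build_vf_port(struct demo_msg *m, uint32_t vf,
                                 const char *profile)
{
    int ports, port;

    m->used = 0;
    ports = demo_nest_start(m, DEMO_VF_PORTS);
    if (ports < 0)
        return 0;
    port = demo_nest_start(m, DEMO_VF_PORT);
    if (port < 0)
        return 0;
    if (demo_put(m, DEMO_PORT_VF, &vf, sizeof(vf)) < 0)
        return 0;
    if (demo_put(m, DEMO_PORT_PROFILE, profile, strlen(profile) + 1) < 0)
        return 0;
    demo_nest_end(m, port);
    demo_nest_end(m, ports);
    return m->used;
}
```

Calling demo_build_vf_port(&m, 0, "gold") fills the buffer with an outer nest whose length covers the inner one, which is the shape the RTM_SETLINK handler walks with nla_for_each_nested().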
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index c3af67f..85c812d 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -113,6 +113,8 @@ enum {
IFLA_NUM_VF, /* Number of VFs if device is SR-IOV PF */
IFLA_VFINFO_LIST,
IFLA_STATS64,
+ IFLA_VF_PORTS,
+ IFLA_PORT_SELF,
__IFLA_MAX
};
@@ -274,4 +276,77 @@ struct ifla_vf_info {
__u32 qos;
__u32 tx_rate;
};
+
+/* VF ports management section
+ *
+ * Nested layout of set/get msg is:
+ *
+ * [IFLA_NUM_VF]
+ * [IFLA_VF_PORTS]
+ *     [IFLA_VF_PORT]
+ *         [IFLA_PORT_*], ...
+ *     [IFLA_VF_PORT]
+ *         [IFLA_PORT_*], ...
+ *     ...
+ * [IFLA_PORT_SELF]
+ *     [IFLA_PORT_*], ...
+ */
+
+enum {
+ IFLA_VF_PORT_UNSPEC,
+ IFLA_VF_PORT, /* nest */
+ __IFLA_VF_PORT_MAX,
+};
+
+#define IFLA_VF_PORT_MAX (__IFLA_VF_PORT_MAX - 1)
+
+enum {
+ IFLA_PORT_UNSPEC,
+ IFLA_PORT_VF, /* __u32 */
+ IFLA_PORT_PROFILE, /* string */
+ IFLA_PORT_VSI_TYPE, /* 802.1Qbg (pre-)standard VDP */
+ IFLA_PORT_INSTANCE_UUID, /* binary UUID */
+ IFLA_PORT_HOST_UUID, /* binary UUID */
+ IFLA_PORT_REQUEST, /* __u8 */
+ IFLA_PORT_RESPONSE, /* __u16, output only */
+ __IFLA_PORT_MAX,
+};
+
+#define IFLA_PORT_MAX (__IFLA_PORT_MAX - 1)
+
+#define PORT_PROFILE_MAX 40
+#define PORT_UUID_MAX 16
+#define PORT_SELF_VF -1
+
+enum {
+ PORT_REQUEST_PREASSOCIATE = 0,
+ PORT_REQUEST_PREASSOCIATE_RR,
+ PORT_REQUEST_ASSOCIATE,
+ PORT_REQUEST_DISASSOCIATE,
+};
+
+enum {
+ PORT_VDP_RESPONSE_SUCCESS = 0,
+ PORT_VDP_RESPONSE_INVALID_FORMAT,
+ PORT_VDP_RESPONSE_INSUFFICIENT_RESOURCES,
+ PORT_VDP_RESPONSE_UNUSED_VTID,
+ PORT_VDP_RESPONSE_VTID_VIOLATION,
+ PORT_VDP_RESPONSE_VTID_VERSION_VIOALTION,
+ PORT_VDP_RESPONSE_OUT_OF_SYNC,
+ /* 0x08-0xFF reserved for future VDP use */
+ PORT_PROFILE_RESPONSE_SUCCESS = 0x100,
+ PORT_PROFILE_RESPONSE_INPROGRESS,
+ PORT_PROFILE_RESPONSE_INVALID,
+ PORT_PROFILE_RESPONSE_BADSTATE,
+ PORT_PROFILE_RESPONSE_INSUFFICIENT_RESOURCES,
+ PORT_PROFILE_RESPONSE_ERROR,
+};
+
+struct ifla_port_vsi {
+ __u8 vsi_mgr_id;
+ __u8 vsi_type_id[3];
+ __u8 vsi_type_version;
+ __u8 pad[3];
+};
+
#endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c1b2341..c3487a6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -686,6 +686,9 @@ struct netdev_rx_queue {
* int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
* int (*ndo_get_vf_config)(struct net_device *dev,
* int vf, struct ifla_vf_info *ivf);
+ * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
+ * struct nlattr *port[]);
+ * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
*/
#define HAVE_NET_DEVICE_OPS
struct net_device_ops {
@@ -735,6 +738,11 @@ struct net_device_ops {
int (*ndo_get_vf_config)(struct net_device *dev,
int vf,
struct ifla_vf_info *ivf);
+ int (*ndo_set_vf_port)(struct net_device *dev,
+ int vf,
+ struct nlattr *port[]);
+ int (*ndo_get_vf_port)(struct net_device *dev,
+ int vf, struct sk_buff *skb);
#if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
int (*ndo_fcoe_enable)(struct net_device *dev);
int (*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 66db120..e4b9870 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -660,6 +660,31 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev)
return 0;
}
+static size_t rtnl_port_size(const struct net_device *dev)
+{
+ size_t port_size = nla_total_size(4) /* PORT_VF */
+ + nla_total_size(PORT_PROFILE_MAX) /* PORT_PROFILE */
+ + nla_total_size(sizeof(struct ifla_port_vsi))
+ /* PORT_VSI_TYPE */
+ + nla_total_size(PORT_UUID_MAX) /* PORT_INSTANCE_UUID */
+ + nla_total_size(PORT_UUID_MAX) /* PORT_HOST_UUID */
+ + nla_total_size(1) /* PORT_REQUEST */
+ + nla_total_size(2); /* PORT_RESPONSE */
+ size_t vf_ports_size = nla_total_size(sizeof(struct nlattr));
+ size_t vf_port_size = nla_total_size(sizeof(struct nlattr))
+ + port_size;
+ size_t port_self_size = nla_total_size(sizeof(struct nlattr))
+ + port_size;
+
+ if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent)
+ return 0;
+ if (dev_num_vf(dev->dev.parent))
+ return port_self_size + vf_ports_size +
+ vf_port_size * dev_num_vf(dev->dev.parent);
+ else
+ return port_self_size;
+}
+
static inline size_t if_nlmsg_size(const struct net_device *dev)
{
return NLMSG_ALIGN(sizeof(struct ifinfomsg))
@@ -680,9 +705,82 @@ static inline size_t if_nlmsg_size(const struct net_device *dev)
+ nla_total_size(1) /* IFLA_LINKMODE */
+ nla_total_size(4) /* IFLA_NUM_VF */
+ rtnl_vfinfo_size(dev) /* IFLA_VFINFO_LIST */
+ + rtnl_port_size(dev) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
+ rtnl_link_get_size(dev); /* IFLA_LINKINFO */
}
+static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
+{
+ struct nlattr *vf_ports;
+ struct nlattr *vf_port;
+ int vf;
+ int err;
+
+ vf_ports = nla_nest_start(skb, IFLA_VF_PORTS);
+ if (!vf_ports)
+ return -EMSGSIZE;
+
+ for (vf = 0; vf < dev_num_vf(dev->dev.parent); vf++) {
+ vf_port = nla_nest_start(skb, IFLA_VF_PORT);
+ if (!vf_port) {
+ nla_nest_cancel(skb, vf_ports);
+ return -EMSGSIZE;
+ }
+ NLA_PUT_U32(skb, IFLA_PORT_VF, vf);
+ err = dev->netdev_ops->ndo_get_vf_port(dev, vf, skb);
+ if (err) {
+nla_put_failure:
+ nla_nest_cancel(skb, vf_port);
+ continue;
+ }
+ nla_nest_end(skb, vf_port);
+ }
+
+ nla_nest_end(skb, vf_ports);
+
+ return 0;
+}
+
+static int rtnl_port_self_fill(struct sk_buff *skb, struct net_device *dev)
+{
+ struct nlattr *port_self;
+ int err;
+
+ port_self = nla_nest_start(skb, IFLA_PORT_SELF);
+ if (!port_self)
+ return -EMSGSIZE;
+
+ err = dev->netdev_ops->ndo_get_vf_port(dev, PORT_SELF_VF, skb);
+ if (err) {
+ nla_nest_cancel(skb, port_self);
+ return err;
+ }
+
+ nla_nest_end(skb, port_self);
+
+ return 0;
+}
+
+static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev)
+{
+ int err;
+
+ if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent)
+ return 0;
+
+ err = rtnl_port_self_fill(skb, dev);
+ if (err)
+ return err;
+
+ if (dev_num_vf(dev->dev.parent)) {
+ err = rtnl_vf_ports_fill(skb, dev);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
int type, u32 pid, u32 seq, u32 change,
unsigned int flags)
@@ -754,13 +852,15 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
goto nla_put_failure;
copy_rtnl_link_stats64(nla_data(attr), stats);
+ if (dev->dev.parent)
+ NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
+
if (dev->netdev_ops->ndo_get_vf_config && dev->dev.parent) {
int i;
struct nlattr *vfinfo, *vf;
int num_vfs = dev_num_vf(dev->dev.parent);
- NLA_PUT_U32(skb, IFLA_NUM_VF, num_vfs);
vfinfo = nla_nest_start(skb, IFLA_VFINFO_LIST);
if (!vfinfo)
goto nla_put_failure;
@@ -788,6 +888,10 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
}
nla_nest_end(skb, vfinfo);
}
+
+ if (rtnl_port_fill(skb, dev))
+ goto nla_put_failure;
+
if (dev->rtnl_link_ops) {
if (rtnl_link_fill(skb, dev) < 0)
goto nla_put_failure;
@@ -849,6 +953,8 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
[IFLA_NET_NS_PID] = { .type = NLA_U32 },
[IFLA_IFALIAS] = { .type = NLA_STRING, .len = IFALIASZ-1 },
[IFLA_VFINFO_LIST] = {. type = NLA_NESTED },
+ [IFLA_VF_PORTS] = { .type = NLA_NESTED },
+ [IFLA_PORT_SELF] = { .type = NLA_NESTED },
};
EXPORT_SYMBOL(ifla_policy);
@@ -870,6 +976,20 @@ static const struct nla_policy ifla_vf_policy[IFLA_VF_MAX+1] = {
.len = sizeof(struct ifla_vf_tx_rate) },
};
+static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
+ [IFLA_PORT_VF] = { .type = NLA_U32 },
+ [IFLA_PORT_PROFILE] = { .type = NLA_STRING,
+ .len = PORT_PROFILE_MAX },
+ [IFLA_PORT_VSI_TYPE] = { .type = NLA_BINARY,
+ .len = sizeof(struct ifla_port_vsi)},
+ [IFLA_PORT_INSTANCE_UUID] = { .type = NLA_BINARY,
+ .len = PORT_UUID_MAX },
+ [IFLA_PORT_HOST_UUID] = { .type = NLA_STRING,
+ .len = PORT_UUID_MAX },
+ [IFLA_PORT_REQUEST] = { .type = NLA_U8, },
+ [IFLA_PORT_RESPONSE] = { .type = NLA_U16, },
+};
+
struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
{
struct net *net;
@@ -1089,6 +1209,53 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
}
err = 0;
+ if (tb[IFLA_VF_PORTS]) {
+ struct nlattr *port[IFLA_PORT_MAX+1];
+ struct nlattr *attr;
+ int vf;
+ int rem;
+
+ err = -EOPNOTSUPP;
+ if (!ops->ndo_set_vf_port)
+ goto errout;
+
+ nla_for_each_nested(attr, tb[IFLA_VF_PORTS], rem) {
+ if (nla_type(attr) != IFLA_VF_PORT)
+ continue;
+ err = nla_parse_nested(port, IFLA_PORT_MAX,
+ attr, ifla_port_policy);
+ if (err < 0)
+ goto errout;
+ if (!port[IFLA_PORT_VF]) {
+ err = -EOPNOTSUPP;
+ goto errout;
+ }
+ vf = nla_get_u32(port[IFLA_PORT_VF]);
+ err = ops->ndo_set_vf_port(dev, vf, port);
+ if (err < 0)
+ goto errout;
+ modified = 1;
+ }
+ }
+ err = 0;
+
+ if (tb[IFLA_PORT_SELF]) {
+ struct nlattr *port[IFLA_PORT_MAX+1];
+
+ err = nla_parse_nested(port, IFLA_PORT_MAX,
+ tb[IFLA_PORT_SELF], ifla_port_policy);
+ if (err < 0)
+ goto errout;
+
+ err = -EOPNOTSUPP;
+ if (ops->ndo_set_vf_port)
+ err = ops->ndo_set_vf_port(dev, PORT_SELF_VF, port);
+ if (err < 0)
+ goto errout;
+ modified = 1;
+ }
+ err = 0;
+
errout:
if (err < 0 && modified && net_ratelimit())
printk(KERN_WARNING "A link change request failed with "
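As a worked check of the space that rtnl_port_size() reserves in the patch above, the following stand-alone sketch repeats the arithmetic in user space. demo_nla_total_size() is a hypothetical re-implementation of what nla_total_size() computes (a 4-byte header plus payload, rounded up to 4 bytes); the DEMO_* limits are copied from the if_link.h hunk. The ndo_get_vf_port and dev.parent checks are left out, so this covers only the sizing.

```c
#include <stddef.h>

/* Netlink attributes are padded to 4 bytes and carry a 4-byte header. */
#define DEMO_NLA_ALIGNTO 4u
#define DEMO_NLA_HDRLEN  4u

/* Limits copied from the if_link.h hunk in the patch. */
#define DEMO_PORT_PROFILE_MAX 40u
#define DEMO_PORT_UUID_MAX    16u
#define DEMO_PORT_VSI_SIZE    8u  /* struct ifla_port_vsi: 1 + 3 + 1 + 3 bytes */

/* Header plus payload, rounded up to the alignment. */
static size_t demo_nla_total_size(size_t payload)
{
    size_t raw = DEMO_NLA_HDRLEN + payload;

    return (raw + DEMO_NLA_ALIGNTO - 1) & ~(size_t)(DEMO_NLA_ALIGNTO - 1);
}

/* Worst-case size of the IFLA_PORT_* attributes for one port. */
static size_t demo_port_size(void)
{
    return demo_nla_total_size(4)                      /* PORT_VF: 8 */
         + demo_nla_total_size(DEMO_PORT_PROFILE_MAX)  /* PORT_PROFILE: 44 */
         + demo_nla_total_size(DEMO_PORT_VSI_SIZE)     /* PORT_VSI_TYPE: 12 */
         + demo_nla_total_size(DEMO_PORT_UUID_MAX)     /* INSTANCE_UUID: 20 */
         + demo_nla_total_size(DEMO_PORT_UUID_MAX)     /* HOST_UUID: 20 */
         + demo_nla_total_size(1)                      /* PORT_REQUEST: 8 */
         + demo_nla_total_size(2);                     /* PORT_RESPONSE: 8 */
}

/* The patch budgets nla_total_size(sizeof(struct nlattr)) for each nest
 * header; sizeof(struct nlattr) is 4, so each nest costs 8 bytes here. */
static size_t demo_rtnl_port_size(unsigned int num_vfs)
{
    size_t nest = demo_nla_total_size(4);
    size_t port_self = nest + demo_port_size();
    size_t vf_port = nest + demo_port_size();

    if (num_vfs)
        return port_self + nest + vf_port * num_vfs;
    return port_self;
}
```

Under these assumptions one port needs 120 bytes of attributes, a device without VFs reserves 128 bytes for IFLA_PORT_SELF, and a PF with two VFs reserves 128 + 8 + 2 * 128 = 392 bytes.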
* Re: [PATCH] dm9000: fix "BUG: spinlock recursion"
From: David Miller @ 2010-05-18 5:48 UTC (permalink / raw)
To: baruch; +Cc: netdev, stable, s.hauer, ben-linux
In-Reply-To: <1274004407-12323-1-git-send-email-baruch@tkos.co.il>
From: Baruch Siach <baruch@tkos.co.il>
Date: Sun, 16 May 2010 13:06:47 +0300
> dm9000_set_rx_csum and dm9000_hash_table are called from atomic context (in
> dm9000_init_dm9000), and from non-atomic context (via ethtool_ops and
> net_device_ops respectively). This causes a spinlock recursion BUG. Fix this by
> renaming these functions to *_unlocked for the atomic context, and make the
> original functions locking wrappers for use in the non-atomic context.
>
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Applied.
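The fix described in the quoted commit message follows a common pattern: keep the real work in an *_unlocked function and add a thin wrapper that takes the lock. The sketch below illustrates that pattern only. It is not the dm9000 code; struct demo_lock is a toy non-recursive lock that reports a second acquisition instead of deadlocking, and all demo_* names are hypothetical.

```c
#include <stdbool.h>

/* Toy non-recursive lock: a second acquire is reported, not spun on. */
struct demo_lock { bool held; };

static bool demo_lock_acquire(struct demo_lock *l)
{
    if (l->held)
        return false;   /* this is the "spinlock recursion" case */
    l->held = true;
    return true;
}

static void demo_lock_release(struct demo_lock *l)
{
    l->held = false;
}

struct demo_dev {
    struct demo_lock lock;
    int rx_csum;
};

/* Does the work; the caller must already hold dev->lock. */
static void demo_set_rx_csum_unlocked(struct demo_dev *dev, int on)
{
    dev->rx_csum = on;
}

/* Locking wrapper for callers that do not hold the lock.
 * Returns false if the lock was already held by the caller. */
static bool demo_set_rx_csum(struct demo_dev *dev, int on)
{
    if (!demo_lock_acquire(&dev->lock))
        return false;
    demo_set_rx_csum_unlocked(dev, on);
    demo_lock_release(&dev->lock);
    return true;
}

/* Init path: already inside the locked section, so it calls the
 * _unlocked variant directly instead of the wrapper. */
static bool demo_init(struct demo_dev *dev)
{
    if (!demo_lock_acquire(&dev->lock))
        return false;
    demo_set_rx_csum_unlocked(dev, 1);
    demo_lock_release(&dev->lock);
    return true;
}
```

If demo_init() called demo_set_rx_csum() instead of the _unlocked variant, the inner acquire would fail, which is the toy equivalent of the reported BUG.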
* [net-next-2.6 V9 PATCH 0/2] Add virtual port netlink support
From: Scott Feldman @ 2010-05-18 5:48 UTC (permalink / raw)
To: davem; +Cc: netdev, chrisw, arnd, kaber
[rebase to sync with virtif changes]
The following series adds virtual port netlink support and adds an
implementation to Cisco's enic netdev driver:
1/2: Adds virtual port netlink RTM_SETLINK/RTM_GETLINK support, and
adds matching netdev ops ndo_{set|get}_vf_port.
2/2: Adds enic support for ndo_{set|get}_vf_port for enic
dynamic devices.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
* Re: [PATCH net/next] drivers/net: remove useless semicolons
From: David Miller @ 2010-05-18 5:48 UTC (permalink / raw)
To: joe; +Cc: netdev
In-Reply-To: <1273824312.1583.47.camel@Joe-Laptop.home>
From: Joe Perches <joe@perches.com>
Date: Fri, 14 May 2010 01:05:12 -0700
> switch and while statements don't need semicolons at end of statement
>
> Signed-off-by: Joe Perches <joe@perches.com>
Applied, thanks Joe.
* Re: [patch 0/2] s390: qeth patches for 2.6.35 II
From: David Miller @ 2010-05-18 5:43 UTC (permalink / raw)
To: frank.blaschka; +Cc: netdev, linux-s390
In-Reply-To: <20100517071512.118564000@de.ibm.com>
From: frank.blaschka@de.ibm.com
Date: Mon, 17 May 2010 09:15:12 +0200
> I just got 2 more qeth patches for 2.6.35 (net-next).
> Hope they can make it into net-next before the merge window.
>
> shortlog:
> Ursula Braun (1)
> qeth: support the new OSA CHPID types OSX and OSM
>
> Julia Lawall (1)
> drivers/s390/net: Drop memory allocation cast
All applied, thanks.
* Re: [PATCH] pegasus: fix USB device ID for ETX-US2
From: David Miller @ 2010-05-18 5:42 UTC (permalink / raw)
To: tabe; +Cc: netdev
In-Reply-To: <4BF0FC1C.2020408@mvista.com>
From: Tadashi Abe <tabe@mvista.com>
Date: Mon, 17 May 2010 17:19:40 +0900
> The USB device ID definition for I-O Data ETX-US2 is wrong.
> The correct ID is 0x093a. Here's a snippet from /proc/bus/usb/devices:
>
> T: Bus=01 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 2 Spd=480 MxCh= 0
> D: Ver= 2.00 Cls=ff(vend.) Sub=ff Prot=00 MxPS=64 #Cfgs= 1
> P: Vendor=04bb ProdID=093a Rev= 1.01
> S: Manufacturer=I-O DATA DEVICE,INC.
> S: Product=I-O DATA ETX2-US2
> S: SerialNumber=A26427
> C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=224mA
> I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=00 Driver=pegasus
> E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E: Ad=83(I) Atr=03(Int.) MxPS= 8 Ivl=125us
>
> This patch enables the pegasus driver to work with ETX-US2.
>
> Signed-off-by: Tadashi Abe <tabe@mvista.com>
Applied, thanks.
* Re: [PATCH net-next-2.6] can: sja1000 platform data fixes
From: David Miller @ 2010-05-18 5:39 UTC (permalink / raw)
To: mkl; +Cc: wg, Netdev, socketcan-core
In-Reply-To: <4BF155CF.3030905@pengutronix.de>
From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Mon, 17 May 2010 16:42:23 +0200
> Wolfgang Grandegger wrote:
>> The member "clock" of struct "sja1000_platform_data" is documented as
>> "CAN bus oscillator frequency in Hz" but it's actually used as the CAN
>> clock frequency, which is half of it. To avoid further confusion, this
>> patch fixes it by renaming the member to "osc_freq". That way,
>> non-mainline users will also notice the change. The platform code for the
>> relevant boards is updated accordingly. Furthermore, pre-defined
>> values are now used for the members "ocr" and "cdr".
>>
>> Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
>> CC: Marc Kleine-Budde <mkl@pengutronix.de>
>
> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Applied, thanks everyone.
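The relationship stated in the quoted description (the CAN clock is half the oscillator frequency) can be written out as a one-line helper. This is an illustration only: demo_sja1000_can_clock() is a hypothetical name, not a function from the driver.

```c
#include <stdint.h>

/* osc_freq_hz: oscillator frequency in Hz, as the renamed member holds it.
 * Returns the CAN clock, which per the description is half of that. */
static uint32_t demo_sja1000_can_clock(uint32_t osc_freq_hz)
{
    return osc_freq_hz / 2;
}
```

For example, a board with a 16 MHz oscillator would set osc_freq to 16000000 and get an 8 MHz CAN clock.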
* Re: [PATCH] net: Introduce skb_tunnel_rx() helper
From: David Miller @ 2010-05-18 5:37 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1274136636.2567.23.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 18 May 2010 00:50:36 +0200
> skb rxhash should be cleared when a skb is handled by a tunnel before
> being delivered again, so that correct packet steering can take place.
>
> There are other cleanups and accounting that we can factorize in a new
> helper, skb_tunnel_rx()
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied, thanks Eric.
* Re: [PATCH] tcp: tcp_synack_options() fix
From: David Miller @ 2010-05-18 5:35 UTC (permalink / raw)
To: eric.dumazet
Cc: shemminger, Bijay.Singh, bhaskie, bhutchings, netdev,
ilpo.jarvinen
In-Reply-To: <1274130278.2567.10.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 17 May 2010 23:04:38 +0200
> On Monday, 17 May 2010 at 13:42 -0700, Stephen Hemminger wrote:
>
>> Since you are doing away with flag variable, why not this instead?
>>
>
> Sure, we can eliminate this doing_ts variable and save a few bytes
>
> Thanks
>
> [PATCH] tcp: tcp_synack_options() fix
Applied, thanks!
* Re: [PATCH v2] Fix SJA1000 command register writes on SMP systems
From: David Miller @ 2010-05-18 5:34 UTC (permalink / raw)
To: socketcan; +Cc: wg, netdev, socketcan-core
In-Reply-To: <4BF1A464.6070207@hartkopp.net>
From: Oliver Hartkopp <socketcan@hartkopp.net>
Date: Mon, 17 May 2010 22:17:40 +0200
> diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
> index 145b1a7..2760085 100644
> --- a/drivers/net/can/sja1000/sja1000.c
> +++ b/drivers/net/can/sja1000/sja1000.c
> @@ -84,6 +84,27 @@ static struct can_bittiming_const sja1000_bittiming_const = {
> .brp_inc = 1,
> };
> +static void sja1000_write_cmdreg(struct sja1000_priv *priv, u8 val)
> +{
> + /* the command register needs some locking on SMP systems */
> +
> +#ifdef CONFIG_SMP
Something is adding spurious leading spaces to lines in your patch.
Also, please don't SMP conditionalize this code. It makes it such that
lock debugging et al. can't be used to check this code on uniprocessor.
* Re: [PATCH -next] bridge: fix build for CONFIG_SYSFS disabled
From: David Miller @ 2010-05-18 5:32 UTC (permalink / raw)
To: randy.dunlap; +Cc: shemminger, sfr, linux-next, linux-kernel, netdev
In-Reply-To: <4BF18468.5010005@oracle.com>
From: Randy Dunlap <randy.dunlap@oracle.com>
Date: Mon, 17 May 2010 11:01:12 -0700
> On 05/17/10 10:56, Stephen Hemminger wrote:
>> On Mon, 17 May 2010 09:17:56 -0700
>> Randy Dunlap <randy.dunlap@oracle.com> wrote:
>>
>>> From: Randy Dunlap <randy.dunlap@oracle.com>
>>>
>>> Fix build when CONFIG_SYSFS is not enabled:
>>>
>>> net/bridge/br_if.c:136: error: 'struct net_bridge_port' has no member named 'sysfs_name'
>>>
>>> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
...
>> I don't like peppering code with #ifdef like this.
>
> Thanks. I didn't like it either.
>
>> Turns out that in this place sysfs_name is always the same
>> as the device name so instead:
Stephen, please give me a formal submission of this fix with proper
signoff and credit to Randy.
Thanks!
* Re: [PATCH net-next] ipv6: fix the bug of address check
From: David Miller @ 2010-05-18 5:27 UTC (permalink / raw)
To: shemminger; +Cc: shanwei, netdev
In-Reply-To: <20100517180221.3a90ffcf@nehalam>
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 17 May 2010 18:02:21 -0700
> The duplicate address check code got broken in the conversion
> to hlist (2.6.35). The earlier patch did not fix the case where
> two addresses hash to the same value. Use two exit paths,
> rather than depending on state of loop variables (from macro).
>
> Based on earlier fix by Shan Wei.
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com>
Applied, thanks everyone.
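The "two exit paths" idea from the quoted description can be shown with a small hash-chain lookup. This is a sketch under assumptions, not the ipv6 code: struct demo_addr, demo_hash() and demo_addr_exists() are hypothetical, and the hash simply uses the last address byte so that collisions are easy to produce.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUCKETS 16u

struct demo_addr {
    unsigned char bytes[16];
    struct demo_addr *next;    /* next entry in the same hash bucket */
};

/* Deliberately weak hash so different addresses can share a bucket. */
static unsigned int demo_hash(const unsigned char addr[16])
{
    return addr[15] % DEMO_BUCKETS;
}

/* Returns true if addr is already in the table.
 * A match returns from inside the loop; running off the end of the
 * chain is the separate "not found" exit. The result therefore never
 * depends on what the loop cursor holds after the loop finishes. */
static bool demo_addr_exists(struct demo_addr *const table[DEMO_BUCKETS],
                             const unsigned char addr[16])
{
    const struct demo_addr *p;

    for (p = table[demo_hash(addr)]; p != NULL; p = p->next) {
        if (memcmp(p->bytes, addr, 16) == 0)
            return true;       /* exit 1: duplicate found */
    }
    return false;              /* exit 2: chain exhausted */
}
```

With two different addresses in the same bucket, a probe for the second one is found and a probe for a third colliding address is not, which is the case the quoted description says the earlier patch missed.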
* bug fixes only please...
From: David Miller @ 2010-05-18 5:25 UTC (permalink / raw)
To: netdev
We are entering the merge window, so please do not submit any new
features or cleanups at this time.
I will make a final pass through patchwork before I submit to Linus
so anything there still has a chance. I also plan to merge the
respun virtif netlink bits, as that code was ready; it's just that
a change that went into net-2.6 the other day created conflicts.
We'll open the spigot back up after 2.6.35-rc1 goes out.
Thanks.
* Re: [RFC] netem: correlated loss generation (v3)
From: Eric Dumazet @ 2010-05-18 5:19 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Stefano Salsano, David Miller, Fabio Ludovici, netdev, netem
In-Reply-To: <20100517205621.036a06e0@nehalam>
On Monday, 17 May 2010 at 20:56 -0700, Stephen Hemminger wrote:
> Subject: netem - revised correlated loss generator
>
> This patch originated with Stefano Salsano and Fabio Ludovici.
> It provides several alternative loss models for use with netem.
> There are two state machine based models and one table driven model.
>
> To simplify the original code:
> * eliminated the debugging messages and statistics
> * reformatted for clarity
> * changed API to nested attribute relating to loss
> * changed the table to always loop across bits
> * only allocate parameters needed
>
> Still untested, for comment only...
> Should have a tested version before the 2.6.35 merge window closes.
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> + if (loss[NETEM_LOSS_SEQUENCE]) {
> + struct dlgtable *dlg;
> + size_t len = nla_len(loss[NETEM_LOSS_SEQUENCE]);
> +
> + dlg = kmalloc(sizeof(*dlg) + len, GFP_KERNEL);
No overflow check here, len comes from userland.
> + if (!dlg)
> + goto nomem;
> +
> + dlg->length = len * BITS_PER_LONG;
> + dlg->index = 0;
> + memcpy(dlg->sequence, nla_data(loss[NETEM_LOSS_SEQUENCE]), len);
> +
> + kfree(q->dlg);
> + q->dlg = dlg;
> + }
> +
> + q->loss_model = model;
> + sch_tree_unlock(sch);
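One way to address the review comment about the unchecked length is to bound it before computing the allocation size. The user-space sketch below shows the idea with malloc() in place of kmalloc(). DEMO_MAX_SEQUENCE_BYTES is a made-up policy limit and struct demo_dlgtable is a simplified stand-in for the patch's dlgtable. The sketch also counts the sequence length as eight bits per byte, which is an assumption and differs from the quoted `len * BITS_PER_LONG`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_SEQUENCE_BYTES 4096u   /* hypothetical policy limit */

struct demo_dlgtable {
    uint32_t index;             /* current place in sequence */
    uint32_t length;            /* length of the sequence, in bits */
    unsigned char sequence[];   /* flexible array member */
};

/* Copies a caller-supplied loss pattern after validating its length.
 * Returns NULL if len is out of range or the allocation fails. */
static struct demo_dlgtable *demo_dlg_alloc(const void *data, size_t len)
{
    struct demo_dlgtable *dlg;

    /* Reject empty and oversized input before touching it. */
    if (len == 0 || len > DEMO_MAX_SEQUENCE_BYTES)
        return NULL;
    /* Generic guard for the addition; redundant given the cap above,
     * but it keeps the size computation safe if the cap changes. */
    if (len > SIZE_MAX - sizeof(*dlg))
        return NULL;

    dlg = malloc(sizeof(*dlg) + len);
    if (!dlg)
        return NULL;

    dlg->index = 0;
    dlg->length = (uint32_t)(len * 8);  /* cannot overflow under the cap */
    memcpy(dlg->sequence, data, len);
    return dlg;
}
```

The caller owns the returned table and releases it with free().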
* Re: [PATCH 0/6] netns support in the kobject layer
From: David Miller @ 2010-05-18 4:21 UTC (permalink / raw)
To: greg
Cc: gregkh, ebiederm, kay.sievers, linux-kernel, tj, cornelia.huck,
eric.dumazet, bcrl, serue, netdev
In-Reply-To: <20100518040844.GB19928@kroah.com>
From: Greg KH <greg@kroah.com>
Date: Mon, 17 May 2010 21:08:44 -0700
> On Mon, May 17, 2010 at 04:48:21PM -0700, David Miller wrote:
>> Greg, this is complete bullshit.
>
> "complete bullshit"? How about just a "little bullshit" :)
Ok, it was a small turd instead of a big one :-)
>> I reviewed them last week, they are fine
>> and have been around forever.
>>
>> Merge them in now, making them wait until 2.6.36 is completely ridiculous.
>
> Ok, as they are primarily affecting your subsystem, if you don't object,
> I'll queue them up to my tree tomorrow and push them to Linus within
> this merge period.
Thanks.
* Re: [PATCH 0/6] netns support in the kobject layer
From: Greg KH @ 2010-05-18 4:08 UTC (permalink / raw)
To: David Miller
Cc: gregkh, ebiederm, kay.sievers, linux-kernel, tj, cornelia.huck,
eric.dumazet, bcrl, serue, netdev
In-Reply-To: <20100517.164821.189684939.davem@davemloft.net>
On Mon, May 17, 2010 at 04:48:21PM -0700, David Miller wrote:
> From: Greg KH <gregkh@suse.de>
> Date: Mon, 17 May 2010 14:03:18 -0700
>
> > On Mon, May 17, 2010 at 01:58:44PM -0700, Eric W. Biederman wrote:
> >> Greg KH <greg@kroah.com> writes:
> >>
> >> If I must I will resend these, but these patches are already in
> >> production use, and I had them to you weeks before the merge window
> >> closed.
> >
> > Yes, but they were not reviewed by the network maintainer until after
> > the merge window closed. I already have your sysfs-namespace patches
> > queued up for .35, and that's a big enough change for me to feel
> > comfortable with at the moment.
>
> Greg, this is complete bullshit.
"complete bullshit"? How about just a "little bullshit" :)
> I reviewed them last week, they are fine
> and have been around forever.
>
> Merge them in now, making them wait until 2.6.36 is completely ridiculous.
Ok, as they are primarily affecting your subsystem, if you don't object,
I'll queue them up to my tree tomorrow and push them to Linus within
this merge period.
thanks,
greg k-h
* Re: [PATCH] [resend] fix non-mergeable buffers packet too large error handling
From: David Miller @ 2010-05-18 4:14 UTC (permalink / raw)
To: mst; +Cc: dlstevens, netdev, kvm, virtualization
In-Reply-To: <20100518031252.GA23764@redhat.com>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 18 May 2010 06:12:52 +0300
> DaveM, just to clarify, this patch is on top of the series
> we are working on with David L Stevens. It's not for your net tree.
Understood.
* Re: pull request: wireless-next-2.6 2010-05-17
From: David Miller @ 2010-05-18 4:12 UTC (permalink / raw)
To: linville; +Cc: linux-wireless, netdev
In-Reply-To: <20100517182653.GC2436@tuxdriver.com>
From: "John W. Linville" <linville@tuxdriver.com>
Date: Mon, 17 May 2010 14:26:53 -0400
> One last big batch intended for 2.6.35 -- these have all been in
> linux-next for several days. Included are the usual driver updates for
> iwlwifi, ath9k, and rt2x00 along with a smattering of other bits
> (including some trivial fixups).
>
> Please let me know if there are problems!
Pulled, thanks John.
* Re: [PATCH] vhost-net: utilize PUBLISH_USED_IDX feature
From: David Miller @ 2010-05-18 4:08 UTC (permalink / raw)
To: mst
Cc: quintela, rusty, paulmck, arnd, kvm, virtualization, netdev,
linux-kernel, alex.williamson, amit.shah
In-Reply-To: <20100518011931.GA21918@redhat.com>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 18 May 2010 04:19:31 +0300
> With PUBLISH_USED_IDX, guest tells us which used entries
> it has consumed. This can be used to reduce the number
> of interrupts: after we write a used entry, if the guest has not yet
> consumed the previous entry, or if the guest has already consumed the
> new entry, we do not need to interrupt.
> This improves bandwidth by 30% under some workflows.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> Rusty, Dave, this patch depends on the patch
> "virtio: put last seen used index into ring itself"
> which is currently destined for Rusty's tree.
> Rusty, if you are taking that one for 2.6.35, please
> take this one as well.
> Dave, any objections?
None:
Acked-by: David S. Miller <davem@davemloft.net>
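The interrupt-suppression rule in the quoted description can be restated as a small predicate. This is one reading of that description, not the vhost implementation: demo_need_interrupt() and its parameter names are hypothetical, and the indices are assumed to be free-running 16-bit counters that wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* used_before: the used index before the host wrote one new entry.
 * guest_seen:  the index the guest has published as already consumed.
 * Returns true if the guest should be interrupted for the new entry. */
static bool demo_need_interrupt(uint16_t used_before, uint16_t guest_seen)
{
    uint16_t used_after = (uint16_t)(used_before + 1);

    /* The guest has already consumed the new entry: nothing to signal. */
    if (guest_seen == used_after)
        return false;
    /* The guest has not yet consumed the previous entry: it is still
     * working through the ring and will find the new entry itself. */
    if (guest_seen != used_before)
        return false;
    /* The guest had caught up and may be idle: signal it. */
    return true;
}
```

Under this reading, an interrupt is sent only when the guest had consumed everything up to the previous entry and has not yet seen the new one.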
* [RFC] netem: correlated loss generation (v3)
From: Stephen Hemminger @ 2010-05-18 3:56 UTC (permalink / raw)
To: Stefano Salsano, David Miller; +Cc: Fabio Ludovici, netdev, netem
In-Reply-To: <4BD84428.30904@uniroma2.it>
Subject: netem - revised correlated loss generator
This patch originated with Stefano Salsano and Fabio Ludovici.
It provides several alternative loss models for use with netem.
There are two state machine based models and one table driven model.
To simplify the original code:
* eliminated the debugging messages and statistics
* reformatted for clarity
* changed API to nested attribute relating to loss
* changed the table to always loop across bits
* only allocate parameters needed
Still untested, for comment only...
Should have a tested version before the 2.6.35 merge window closes.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
include/linux/pkt_sched.h | 26 ++++
net/sched/sch_netem.c | 287 +++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 307 insertions(+), 6 deletions(-)
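For readers who want to try the table-driven model outside the kernel, here is a minimal user-space sketch of a bit-pattern loss generator that loops across the bits of a sequence. It is modelled on the description above rather than copied from the patch: struct demo_dlg and demo_loss_pattern_next() are hypothetical names, and the word and mask arithmetic is written out by hand instead of using the kernel's BIT_WORD()/BIT_MASK() macros.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

struct demo_dlg {
    uint32_t index;                 /* current place in sequence */
    uint32_t length;                /* length of the sequence, in bits */
    const unsigned long *sequence;  /* bit i set means "drop packet i" */
};

/* Returns 1 if the next packet should be lost, 0 if it should be sent.
 * The position advances by one bit per call and wraps at length. */
static int demo_loss_pattern_next(struct demo_dlg *dlg)
{
    unsigned long word = dlg->sequence[dlg->index / DEMO_BITS_PER_LONG];
    unsigned long mask = 1UL << (dlg->index % DEMO_BITS_PER_LONG);

    if (++dlg->index >= dlg->length)
        dlg->index = 0;

    /* Keep the test in unsigned long so high bits are not truncated. */
    return (word & mask) != 0;
}
```

With a three-bit pattern of 0x5 the generator yields loss, send, loss, and then starts over from the first bit.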
--- a/net/sched/sch_netem.c 2010-05-17 20:51:43.753304581 -0700
+++ b/net/sched/sch_netem.c 2010-05-17 20:51:46.423325162 -0700
@@ -47,6 +47,21 @@
layering other disciplines. It does not need to do bandwidth
control either since that can be handled by using token
bucket or other rate control.
+
+ Correlated Loss Generator models
+
+ Added generation of correlated loss according to the
+ "Gilbert-Elliot" model, a 4-state Markov model and to a deterministic
+ loss pattern that can be given as input.
+
+ References:
+ [1] NetemCLG Home http://netgroup.uniroma2.it/NetemCLG
+ [2] S. Salsano, F. Ludovici, A. Ordine, "Definition of a general
+ and intuitive loss model for packet networks and its implementation
+ in the Netem module in the Linux kernel", available in [1]
+
+ Authors: Stefano Salsano <stefano.salsano at uniroma2.it>
+ Fabio Ludovici <fabio.ludovici at yahoo.it>
*/
struct netem_sched_data {
@@ -64,6 +79,28 @@ struct netem_sched_data {
u32 reorder;
u32 corrupt;
+ enum netem_clg_model loss_model;
+
+ /* data for Correlated Loss Generation models */
+ struct clgstate {
+ /* state of the Markov chain */
+ u8 state;
+
+ /* 4-states and Gilbert-Elliot models */
+ u32 a1; /* p13 for 4-states or p for GE */
+ u32 a2; /* p31 for 4-states or r for GE */
+ u32 a3; /* p32 for 4-states or h for GE */
+ u32 a4; /* p14 for 4-states or 1-k for GE */
+ u32 a5; /* p23 used only in 4-states */
+ } *clg;
+
+ /* Deterministic loss generator */
+ struct dlgtable {
+ u32 index; /* current place in sequence */
+ u32 length; /* length of the sequence (in bits) */
+ unsigned long sequence[0];
+ } *dlg;
+
struct crndstate {
u32 last;
u32 rho;
@@ -115,6 +152,139 @@ static u32 get_crandom(struct crndstate
return answer;
}
+/* get_loss_pattern_element - deterministic loss generator
+ * Extracts an element (1 means loss event, 0 means transmission)
+ * from the current loss pattern.
+ */
+static int get_loss_pattern_element(struct netem_sched_data *q)
+{
+ struct dlgtable *dlg = q->dlg;
+ unsigned long val = dlg->sequence[BIT_WORD(dlg->index)] & BIT_MASK(dlg->index);
+
+ if (++dlg->index >= dlg->length)
+ dlg->index = 0;
+
+ return val != 0;
+}
+
+/* get_loss_4state_element - 4-state model loss generator
+ * Generates losses according to the 4-state Markov chain adopted in
+ * the GI (General and Intuitive) loss model.
+ * returns 1 if the next packet will be lost,
+ * 0 if it will be transmitted.
+ */
+static int get_loss_4state_element(struct netem_sched_data *q)
+{
+ struct clgstate *clg = q->clg;
+ u32 rnd = net_random();
+
+ /*
+ * Makes a comparison between rnd and the transition
+ * probabilities outgoing from the current state, then decides the
+ * next state and if the next packet has to be transmitted or lost.
+ * The four states correspond to:
+ * 1 => successfully transmitted packets within a gap period
+ * 4 => isolated losses within a gap period
+ * 3 => lost packets within a burst period
+ * 2 => successfully transmitted packets within a burst period
+ */
+ switch (clg->state) {
+ case 1:
+ if (rnd < clg->a4) {
+ clg->state = 4;
+ return 1;
+ } else if (clg->a4 < rnd && rnd < clg->a1) {
+ clg->state = 3;
+ return 1;
+ } else if (clg->a1 < rnd)
+ clg->state = 1;
+
+ break;
+ case 2:
+ if (rnd < clg->a5) {
+ clg->state = 3;
+ return 1;
+ } else
+ clg->state = 2;
+
+ break;
+ case 3:
+ if (rnd < clg->a3)
+ clg->state = 2;
+ else if (clg->a3 < rnd && rnd < clg->a2 + clg->a3) {
+ clg->state = 1;
+ return 1;
+ } else if (clg->a2 + clg->a3 < rnd) {
+ clg->state = 3;
+ return 1;
+ }
+ break;
+ case 4:
+ clg->state = 1;
+ break;
+ }
+
+ return 0;
+}
+
+/* get_loss_gilb_ell_element - Gilbert-Elliot model loss generator
+ * Generates losses according to the Gilbert-Elliot loss model or
+ * its special cases (Gilbert or Simple Gilbert)
+ *
+ * Makes a comparison between random_gilb_ell and the transition
+ * probabilities outgoing from the current state, then decides the
+ * next state. A second random number is extracted and the comparison
+ * with the loss probability of the current state decides if the next
+ * packet will be transmitted or lost.
+ */
+static int get_loss_gilb_ell_element(struct netem_sched_data *q)
+{
+ struct clgstate *clg = q->clg;
+
+ switch (clg->state) {
+ case 1:
+ if (net_random() < clg->a1)
+ clg->state = 2;
+ /* State 1 applies only its own loss probability (1-k). */
+ return net_random() < clg->a4;
+ case 2:
+ if (net_random() < clg->a2)
+ clg->state = 1;
+ if (clg->a3 > net_random())
+ return 1;
+ }
+
+ return 0;
+}
+
+static int get_loss_event(struct netem_sched_data *q)
+{
+ switch (q->loss_model) {
+ case CLG_DETERMIN:
+ return get_loss_pattern_element(q);
+
+ case CLG_4_STATES:
+ /* 4state loss model algorithm (used also for GI model)
+ * Extracts a value from the markov 4 state loss generator,
+ * if it is 1 drops a packet and if needed writes the event in
+ * the kernel logs
+ */
+ return get_loss_4state_element(q);
+
+ case CLG_GILB_ELL:
+ /* Gilbert-Elliot loss model algorithm
+ * Extracts a value from the Gilbert-Elliot loss generator,
+ * if it is 1 drops a packet and if needed writes the event in
+ * the kernel logs
+ */
+ return get_loss_gilb_ell_element(q);
+
+ default:
+ return 0;
+ }
+}
+
+
/* tabledist - return a pseudo-randomly distributed value with mean mu and
* std deviation sigma. Uses table lookup to approximate the desired
* distribution, and a uniformly-distributed pseudo-random source.
@@ -171,6 +341,10 @@ static int netem_enqueue(struct sk_buff
if (q->loss && q->loss >= get_crandom(&q->loss_cor))
--count;
+ /* Deterministic loss pattern algorithm */
+ if (q->loss_model != CLG_NONE && get_loss_event(q))
+ --count;
+
if (count == 0) {
sch->qstats.drops++;
kfree_skb(skb);
@@ -370,10 +544,91 @@ static void get_corrupt(struct Qdisc *sc
init_crandom(&q->corrupt_cor, r->correlation);
}
+
+static const struct nla_policy netem_loss_nest[NETEM_LOSS_MAX + 1] = {
+ [NETEM_LOSS_MODEL] = { .type = NLA_U8, },
+ [NETEM_LOSS_STATE] = { .len = sizeof(struct tc_netem_loss_state) },
+};
+
+static int get_loss_clg(struct Qdisc *sch, const struct nlattr *attr)
+{
+ struct netem_sched_data *q = qdisc_priv(sch);
+ struct nlattr *loss[NETEM_LOSS_MAX + 1];
+ enum netem_clg_model model;
+ int ret;
+
+ ret = nla_parse_nested(loss, NETEM_LOSS_MAX,
+ attr, netem_loss_nest);
+ if (ret)
+ return ret;
+
+ if (!loss[NETEM_LOSS_MODEL]) {
+ pr_info("netem: missing loss model\n");
+ return -EINVAL;
+ }
+
+ model = nla_get_u8(loss[NETEM_LOSS_MODEL]);
+ switch (model) {
+ case CLG_GILB_ELL:
+ case CLG_4_STATES:
+ if (!loss[NETEM_LOSS_STATE]) {
+ pr_info("netem: missing state information for loss model\n");
+ return -EINVAL;
+ }
+ break;
+ case CLG_DETERMIN:
+ if (!loss[NETEM_LOSS_SEQUENCE]) {
+ pr_info("netem: missing sequence information for loss model\n");
+ return -EINVAL;
+ }
+ break;
+
+ default:
+ pr_info("netem: unknown loss model: %u\n",
+ (unsigned) model);
+ return -EINVAL;
+ }
+
+ sch_tree_lock(sch);
+ if (loss[NETEM_LOSS_STATE]) {
+ if (!q->clg) {
+ q->clg = kmalloc(sizeof(struct clgstate), GFP_KERNEL);
+ if (!q->clg)
+ goto nomem;
+ }
+ memcpy(q->clg, nla_data(loss[NETEM_LOSS_STATE]),
+ sizeof(struct clgstate));
+ }
+ if (loss[NETEM_LOSS_SEQUENCE]) {
+ struct dlgtable *dlg;
+ size_t len = nla_len(loss[NETEM_LOSS_SEQUENCE]);
+
+ dlg = kmalloc(sizeof(*dlg) + len, GFP_KERNEL);
+ if (!dlg)
+ goto nomem;
+
+ dlg->length = len * BITS_PER_BYTE;
+ dlg->index = 0;
+ memcpy(dlg->sequence, nla_data(loss[NETEM_LOSS_SEQUENCE]), len);
+
+ kfree(q->dlg);
+ q->dlg = dlg;
+ }
+
+ q->loss_model = model;
+ sch_tree_unlock(sch);
+
+ return 0;
+ nomem:
+ sch_tree_unlock(sch);
+ return -ENOMEM;
+}
+
static const struct nla_policy netem_policy[TCA_NETEM_MAX + 1] = {
[TCA_NETEM_CORR] = { .len = sizeof(struct tc_netem_corr) },
[TCA_NETEM_REORDER] = { .len = sizeof(struct tc_netem_reorder) },
[TCA_NETEM_CORRUPT] = { .len = sizeof(struct tc_netem_corrupt) },
+ [TCA_NETEM_LOSS] = { .type = NLA_NESTED },
};
static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla,
@@ -441,6 +696,9 @@ static int netem_change(struct Qdisc *sc
if (tb[TCA_NETEM_CORRUPT])
get_corrupt(sch, tb[TCA_NETEM_CORRUPT]);
+ if (tb[TCA_NETEM_LOSS])
+ get_loss_clg(sch, tb[TCA_NETEM_LOSS]);
+
return 0;
}
@@ -538,6 +796,7 @@ static int netem_init(struct Qdisc *sch,
qdisc_watchdog_init(&q->watchdog, sch);
+ q->loss_model = CLG_NONE;
q->qdisc = qdisc_create_dflt(qdisc_dev(sch), sch->dev_queue,
&tfifo_qdisc_ops,
TC_H_MAKE(sch->handle, 1));
@@ -561,13 +820,14 @@ static void netem_destroy(struct Qdisc *
qdisc_watchdog_cancel(&q->watchdog);
qdisc_destroy(q->qdisc);
kfree(q->delay_dist);
+ kfree(q->clg);
+ kfree(q->dlg);
}
static int netem_dump(struct Qdisc *sch, struct sk_buff *skb)
{
const struct netem_sched_data *q = qdisc_priv(sch);
- unsigned char *b = skb_tail_pointer(skb);
- struct nlattr *nla = (struct nlattr *) b;
+ struct nlattr *nla = (struct nlattr *) skb_tail_pointer(skb);
struct tc_netem_qopt qopt;
struct tc_netem_corr cor;
struct tc_netem_reorder reorder;
@@ -594,13 +854,28 @@ static int netem_dump(struct Qdisc *sch,
corrupt.correlation = q->corrupt_cor.rho;
NLA_PUT(skb, TCA_NETEM_CORRUPT, sizeof(corrupt), &corrupt);
- nla->nla_len = skb_tail_pointer(skb) - b;
+ if (q->loss_model != CLG_NONE) {
+ struct nlattr *nest = nla_nest_start(skb, TCA_NETEM_LOSS);
+
+ if (nest == NULL)
+ goto nla_put_failure;
+
+ NLA_PUT_U8(skb, NETEM_LOSS_MODEL, q->loss_model);
+ if (q->clg)
+ NLA_PUT(skb, NETEM_LOSS_STATE,
+ sizeof(struct tc_netem_loss_state), q->clg);
+ /*
+ * Don't bother dumping loss sequence map since it can be large
+ * and hard to display
+ */
+ nla_nest_end(skb, nest);
+ }
return skb->len;
nla_put_failure:
- nlmsg_trim(skb, b);
- return -1;
+ nlmsg_trim(skb, nla);
+ return -EMSGSIZE;
}
static struct Qdisc_ops netem_qdisc_ops __read_mostly = {
--- a/include/linux/pkt_sched.h 2010-05-17 20:51:43.763328095 -0700
+++ b/include/linux/pkt_sched.h 2010-05-17 20:52:10.263123961 -0700
@@ -435,6 +435,7 @@ enum {
TCA_NETEM_DELAY_DIST,
TCA_NETEM_REORDER,
TCA_NETEM_CORRUPT,
+ TCA_NETEM_LOSS,
__TCA_NETEM_MAX,
};
@@ -465,6 +466,31 @@ struct tc_netem_corrupt {
__u32 correlation;
};
+enum {
+ NETEM_LOSS_MODEL,
+ NETEM_LOSS_STATE,
+ NETEM_LOSS_SEQUENCE,
+ __NETEM_LOSS_MAX
+};
+#define NETEM_LOSS_MAX (__NETEM_LOSS_MAX - 1)
+
+/* definition of models for Correlated Loss Generation */
+enum netem_clg_model {
+ CLG_NONE = 0,
+ CLG_GILB_ELL,
+ CLG_4_STATES,
+ CLG_DETERMIN,
+};
+
+/* Correlated Loss Model parameters - GI and Gilbert-Elliot models */
+struct tc_netem_loss_state {
+ __u32 a1; /* p13 for GI or p for Gilbert-Elliot */
+ __u32 a2; /* p31 for GI or r for Gilbert-Elliot */
+ __u32 a3; /* p32 for GI or h for Gilbert-Elliot */
+ __u32 a4; /* p14 for GI or 1-k for Gilbert-Elliot */
+ __u32 a5; /* p23 used only in GI */
+};
+
#define NETEM_DIST_SCALE 8192
/* DRR */
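For readers following the Gilbert-Elliot generator in the patch above, here is a minimal userspace sketch of one step of the two-state chain. It is not the kernel code: the struct, the enum and the function name are hypothetical, and the two random draws are passed in by the caller so the behaviour can be checked deterministically. The field meanings follow the a1..a4 comments in the patch (p, r, h, 1-k), with each probability scaled to the full 32-bit range.

```c
#include <stdint.h>

/* Hypothetical names; only the parameter meanings come from the patch. */
enum ge_state { GE_GOOD = 1, GE_BAD = 2 };

struct ge_model {
	enum ge_state state;
	uint32_t p;     /* a1: transition probability good -> bad */
	uint32_t r;     /* a2: transition probability bad -> good */
	uint32_t h;     /* a3: loss threshold applied in the bad state */
	uint32_t one_k; /* a4: loss threshold applied in the good state */
};

/*
 * One step of the chain. rnd_move decides the state transition and
 * rnd_loss decides whether this packet is lost, using the threshold of
 * the state the step started in. Returns 1 for loss, 0 for transmission.
 */
static int ge_step(struct ge_model *m, uint32_t rnd_move, uint32_t rnd_loss)
{
	switch (m->state) {
	case GE_GOOD:
		if (rnd_move < m->p)
			m->state = GE_BAD;
		return rnd_loss < m->one_k;
	case GE_BAD:
		if (rnd_move < m->r)
			m->state = GE_GOOD;
		return rnd_loss < m->h;
	}
	return 0;
}
```

Each case returns on its own, so a step that starts in the good state never consults the bad-state threshold. In a real caller both draws would come from a uniform 32-bit random source.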
^ permalink raw reply
* Re: [PATCH] [resend] fix non-mergeable buffers packet too large error handling
From: Michael S. Tsirkin @ 2010-05-18 3:12 UTC (permalink / raw)
To: David L Stevens; +Cc: netdev, kvm, virtualization, davem
DaveM, just to clarify, this patch is on top of the series
we are working on with David L Stevens. It's not for your net tree.
--
MST
^ permalink raw reply
* [PATCHv2] vhost-net: utilize PUBLISH_USED_IDX feature
From: Michael S. Tsirkin @ 2010-05-18 2:21 UTC (permalink / raw)
To: davem, Juan Quintela, Rusty Russell, Paul E. McKenney,
Arnd Bergmann, kvm
With PUBLISH_USED_IDX, guest tells us which used entries
it has consumed. This can be used to reduce the number
of interrupts: after we write a used entry, if the guest has not yet
consumed the previous entry, or if the guest has already consumed the
new entry, we do not need to interrupt.
This improves bandwidth by 30% under some workflows.
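The rule described above can be sketched as a small standalone predicate. This is an illustration only: the function and parameter names are hypothetical, and it assumes the host index counts one past the entry just written, as vq->last_used_idx does in the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * host_used_idx:  one past the used entry the host has just written.
 * guest_seen_idx: the index the guest published after the avail ring.
 *
 * An interrupt is needed only when the guest has consumed everything
 * except the entry that was just added. If it is further behind, it is
 * still busy with earlier entries; if it is already level with the
 * host, it has seen the new entry on its own.
 */
static bool need_interrupt(uint16_t host_used_idx, uint16_t guest_seen_idx)
{
	/* The cast keeps the comparison correct across 16-bit wrap-around. */
	return guest_seen_idx == (uint16_t)(host_used_idx - 1);
}
```

For example, with the host index at 5, a guest that published 4 is interrupted, while a guest that published 3 or 5 is not.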
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
This is on top of Rusty's tree and depends on the virtio patch.
Changes from v1:
fix build
drivers/vhost/vhost.c | 27 +++++++++++++++++++++------
drivers/vhost/vhost.h | 4 ++--
2 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 750effe..18c4f6e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -278,14 +278,15 @@ static int memory_access_ok(struct vhost_dev *d, struct vhost_memory *mem,
return 1;
}
-static int vq_access_ok(unsigned int num,
+static int vq_access_ok(struct vhost_dev *d, unsigned int num,
struct vring_desc __user *desc,
struct vring_avail __user *avail,
struct vring_used __user *used)
{
+ size_t s = vhost_has_feature(d, VIRTIO_RING_F_PUBLISH_USED) ? 2 : 0;
return access_ok(VERIFY_READ, desc, num * sizeof *desc) &&
access_ok(VERIFY_READ, avail,
- sizeof *avail + num * sizeof *avail->ring) &&
+ sizeof *avail + num * sizeof *avail->ring + s) &&
access_ok(VERIFY_WRITE, used,
sizeof *used + num * sizeof *used->ring);
}
@@ -312,7 +313,7 @@ static int vq_log_access_ok(struct vhost_virtqueue *vq, void __user *log_base)
/* Caller should have vq mutex and device mutex */
int vhost_vq_access_ok(struct vhost_virtqueue *vq)
{
- return vq_access_ok(vq->num, vq->desc, vq->avail, vq->used) &&
+ return vq_access_ok(vq->dev, vq->num, vq->desc, vq->avail, vq->used) &&
vq_log_access_ok(vq, vq->log_base);
}
@@ -448,7 +449,7 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
* If it is not, we don't as size might not have been setup.
* We will verify when backend is configured. */
if (vq->private_data) {
- if (!vq_access_ok(vq->num,
+ if (!vq_access_ok(d, vq->num,
(void __user *)(unsigned long)a.desc_user_addr,
(void __user *)(unsigned long)a.avail_user_addr,
(void __user *)(unsigned long)a.used_user_addr)) {
@@ -473,6 +474,7 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
vq->log_used = !!(a.flags & (0x1 << VHOST_VRING_F_LOG));
vq->desc = (void __user *)(unsigned long)a.desc_user_addr;
vq->avail = (void __user *)(unsigned long)a.avail_user_addr;
+ vq->last_used = (u16 __user *)&vq->avail->ring[vq->num];
vq->log_addr = a.log_guest_addr;
vq->used = (void __user *)(unsigned long)a.used_user_addr;
break;
@@ -993,7 +995,8 @@ void vhost_discard_vq_desc(struct vhost_virtqueue *vq)
/* After we've used one of their buffers, we tell them about it. We'll then
* want to notify the guest, using eventfd. */
-int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
+static int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head,
+ int len)
{
struct vring_used_elem __user *used;
@@ -1034,9 +1037,10 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
}
/* This actually signals the guest, using eventfd. */
-void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+static void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
{
__u16 flags;
+ __u16 used;
/* Flush out used index updates. This is paired
* with the barrier that the Guest executes when enabling
* interrupts. */
@@ -1053,6 +1057,17 @@ void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
!vhost_has_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY)))
return;
+ if (vhost_has_feature(dev, VIRTIO_RING_F_PUBLISH_USED)) {
+ __u16 used;
+ if (get_user(used, vq->last_used)) {
+ vq_err(vq, "Failed to get last used idx");
+ return;
+ }
+
+ if (used != (u16)(vq->last_used_idx - 1))
+ return;
+ }
+
/* Signal the Guest tell them we used something up. */
if (vq->call_ctx)
eventfd_signal(vq->call_ctx, 1);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 44591ba..bd01aca 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -52,6 +52,7 @@ struct vhost_virtqueue {
unsigned int num;
struct vring_desc __user *desc;
struct vring_avail __user *avail;
+ u16 __user *last_used;
struct vring_used __user *used;
struct file *kick;
struct file *call;
@@ -126,8 +127,6 @@ unsigned vhost_get_vq_desc(struct vhost_dev *, struct vhost_virtqueue *,
struct vhost_log *log, unsigned int *log_num);
void vhost_discard_vq_desc(struct vhost_virtqueue *);
-int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
-void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
unsigned int head, int len);
void vhost_disable_notify(struct vhost_virtqueue *);
@@ -148,6 +147,7 @@ void vhost_cleanup(void);
enum {
VHOST_FEATURES = (1 << VIRTIO_F_NOTIFY_ON_EMPTY) |
(1 << VIRTIO_RING_F_INDIRECT_DESC) |
+ (1 << VIRTIO_RING_F_PUBLISH_USED) |
(1 << VHOST_F_LOG_ALL) |
(1 << VHOST_NET_F_VIRTIO_NET_HDR),
};
--
1.7.1.12.g42b7f
^ permalink raw reply related
* [PATCHv2] virtio: put last seen used index into ring itself
From: Michael S. Tsirkin @ 2010-05-18 2:13 UTC (permalink / raw)
To: Rusty Russell, Jiri Pirko, Shirley Ma, Amit Shah, Mark McLoughlin,
netdev
Generally, the Host end of the virtio ring doesn't need to see where
Guest is up to in consuming the ring. However, to completely understand
what's going on from the outside, this information must be exposed.
For example, host can reduce the number of interrupts by detecting
that the guest is currently handling previous buffers.
Fortunately, we have room to expand: the ring is always a whole number
of pages and there's hundreds of bytes of padding after the avail ring
and the used ring, whatever the number of descriptors (which must be a
power of 2).
We add a feature bit so the guest can tell the host that it's writing
out the current value there, if it wants to use that.
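To make the padding argument concrete, the sketch below computes where the avail ring ends and where the used ring starts for a given ring size. The helper names are hypothetical. The sizes assume the standard layout: 16-byte descriptors, two 16-bit header fields in the avail ring, and 16-bit avail entries.

```c
#include <stddef.h>

#define DESC_SIZE 16u /* one ring descriptor */
#define AVAIL_HDR 4u  /* avail flags + avail idx, 16 bits each */

/* Offset of the first byte after the last avail ring entry. */
static size_t avail_end_offset(unsigned int num)
{
	return (size_t)num * DESC_SIZE + AVAIL_HDR + (size_t)num * 2u;
}

/* Offset of the used ring: the avail end rounded up to align. */
static size_t used_offset(unsigned int num, size_t align)
{
	return (avail_end_offset(num) + align - 1) & ~(align - 1);
}

/*
 * Padding left once a 16-bit index is stored right after the avail
 * ring. Assumes the index fits before the used ring.
 */
static size_t padding_after_index(unsigned int num, size_t align)
{
	return used_offset(num, align) - avail_end_offset(num) - 2u;
}
```

With 256 descriptors and an assumed 4096-byte alignment, the avail ring ends at offset 4612 and the used ring starts at 8192, so 3578 bytes of padding remain after the new index.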
This is based on a patch by Rusty Russell, with the main difference
being that we dedicate a feature bit to guest to tell the host it is
writing the used index. This way we don't need to force host to publish
the last available index until we have a use for it.
Another difference is that while the feature helps virtio-net,
there have been conflicting reports wrt virtio-blk.
The reason is unknown; it could be due to the fact that
virtio-blk does not bother to disable interrupts at all.
So for now, this patch only acks this feature for -net.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes from v1:
build fix
drivers/net/virtio_net.c | 2 ++
drivers/virtio/virtio_ring.c | 21 ++++++++++++---------
include/linux/virtio_ring.h | 12 ++++++++++++
3 files changed, 26 insertions(+), 9 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c0cab7a..327f30f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -23,6 +23,7 @@
#include <linux/module.h>
#include <linux/virtio.h>
#include <linux/virtio_net.h>
+#include <linux/virtio_ring.h>
#include <linux/scatterlist.h>
#include <linux/if_vlan.h>
#include <linux/slab.h>
@@ -1056,6 +1057,7 @@ static unsigned int features[] = {
VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
+ VIRTIO_RING_F_PUBLISH_USED,
};
static struct virtio_driver virtio_net_driver = {
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1ca8890..fb06570 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -89,9 +89,6 @@ struct vring_virtqueue
/* Number we've added since last sync. */
unsigned int num_added;
- /* Last used index we've seen. */
- u16 last_used_idx;
-
/* How to notify other side. FIXME: commonalize hcalls! */
void (*notify)(struct virtqueue *vq);
@@ -285,12 +282,13 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
static inline bool more_used(const struct vring_virtqueue *vq)
{
- return vq->last_used_idx != vq->vring.used->idx;
+ return *vq->vring.last_used_idx != vq->vring.used->idx;
}
void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
{
struct vring_virtqueue *vq = to_vvq(_vq);
+ struct vring_used_elem *u;
void *ret;
unsigned int i;
@@ -310,9 +308,9 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
/* Only get used array entries after they have been exposed by host. */
virtio_rmb();
- i = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].id;
- *len = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].len;
-
+ u = &vq->vring.used->ring[*vq->vring.last_used_idx % vq->vring.num];
+ i = u->id;
+ *len = u->len;
if (unlikely(i >= vq->vring.num)) {
BAD_RING(vq, "id %u out of range\n", i);
return NULL;
@@ -325,7 +323,8 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
/* detach_buf clears data, so grab it now. */
ret = vq->data[i];
detach_buf(vq, i);
- vq->last_used_idx++;
+ (*vq->vring.last_used_idx)++;
+
END_USE(vq);
return ret;
}
@@ -348,6 +347,8 @@ bool virtqueue_enable_cb(struct virtqueue *_vq)
/* We optimistically turn back on interrupts, then check if there was
* more to do. */
vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
+ /* Besides flags write, this barrier also flushes out
+ * last available index write. */
virtio_mb();
if (unlikely(more_used(vq))) {
END_USE(vq);
@@ -431,7 +432,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
vq->vq.name = name;
vq->notify = notify;
vq->broken = false;
- vq->last_used_idx = 0;
+ *vq->vring.last_used_idx = 0;
vq->num_added = 0;
list_add_tail(&vq->vq.list, &vdev->vqs);
#ifdef DEBUG
@@ -473,6 +474,8 @@ void vring_transport_features(struct virtio_device *vdev)
switch (i) {
case VIRTIO_RING_F_INDIRECT_DESC:
break;
+ case VIRTIO_RING_F_PUBLISH_USED:
+ break;
default:
/* We don't understand this bit. */
clear_bit(i, vdev->features);
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index e4d144b..0968702 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -29,6 +29,9 @@
/* We support indirect buffer descriptors */
#define VIRTIO_RING_F_INDIRECT_DESC 28
+/* The Guest publishes last-seen used index at the end of the avail ring. */
+#define VIRTIO_RING_F_PUBLISH_USED 29
+
/* Virtio ring descriptors: 16 bytes. These can chain together via "next". */
struct vring_desc {
/* Address (guest-physical). */
@@ -69,6 +72,8 @@ struct vring {
struct vring_avail *avail;
struct vring_used *used;
+ /* Last used index seen by the Guest. */
+ __u16 *last_used_idx;
};
/* The standard layout for the ring is a continuous chunk of memory which looks
@@ -83,6 +88,7 @@ struct vring {
* __u16 avail_flags;
* __u16 avail_idx;
* __u16 available[num];
+ * __u16 last_used_idx;
*
* // Padding to the next align boundary.
* char pad[];
@@ -101,6 +107,12 @@ static inline void vring_init(struct vring *vr, unsigned int num, void *p,
vr->avail = p + num*sizeof(struct vring_desc);
vr->used = (void *)(((unsigned long)&vr->avail->ring[num] + align-1)
& ~(align - 1));
+ /* We publish the last-seen used index at the end of the available ring.
+ * It is at the end for backwards compatibility. */
+ vr->last_used_idx = &(vr)->avail->ring[num];
+ /* Verify that last used index does not spill over the used ring. */
+ BUG_ON((void *)vr->last_used_idx +
+ sizeof *vr->last_used_idx > (void *)vr->used);
}
static inline unsigned vring_size(unsigned int num, unsigned long align)
--
1.7.1.12.g42b7f
^ permalink raw reply related
* Re: [PATCH] [resend] fix non-mergeable buffers packet too large error handling
From: Michael S. Tsirkin @ 2010-05-18 1:53 UTC (permalink / raw)
To: David L Stevens; +Cc: netdev, kvm, virtualization
In-Reply-To: <1274130996.8492.7.camel@w-dls.beaverton.ibm.com>
On Mon, May 17, 2010 at 02:16:36PM -0700, David L Stevens wrote:
> This patch enforces single-buffer allocation when
> mergeable rx buffers is not enabled.
>
> Reported-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Thanks! Why are you resending?
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 309c570..c346304 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -361,13 +361,21 @@ static void handle_rx(struct vhost_net *net)
> break;
> }
> /* TODO: Should check and handle checksum. */
> - if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF) &&
> - memcpy_toiovecend(vq->hdr, (unsigned char *)&headcount,
> - offsetof(typeof(hdr), num_buffers),
> - sizeof hdr.num_buffers)) {
> - vq_err(vq, "Failed num_buffers write");
> + if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF)) {
> + if (memcpy_toiovecend(vq->hdr,
> + (unsigned char *)&headcount,
> + offsetof(typeof(hdr),
> + num_buffers),
> + sizeof hdr.num_buffers)) {
> + vq_err(vq, "Failed num_buffers write");
> + vhost_discard_desc(vq, headcount);
> + break;
> + }
> + } else if (headcount > 1) {
> + vq_err(vq, "rx packet too large (%d) for guest",
> + sock_len);
> vhost_discard_desc(vq, headcount);
> - break;
> + continue;
> }
> vhost_add_used_and_signal_n(&net->dev, vq, vq->heads,
> headcount);
>
^ permalink raw reply