Netdev List

* Re: [PATCH 1/3] wireless: implement basic ethtool support for cfg80211 devices
From: Ben Hutchings @ 2009-10-01  1:30 UTC (permalink / raw)
  To: John W. Linville
  Cc: linux-wireless, netdev, Kalle Valo, Kalle Valo, Luis R. Rodriguez
In-Reply-To: <1254359942-3483-1-git-send-email-linville@tuxdriver.com>

On Wed, 2009-09-30 at 21:19 -0400, John W. Linville wrote:
[...]
> +void cfg80211_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
> +{
> +	struct wireless_dev *wdev = dev->ieee80211_ptr;
> +
> +	strncpy(info->driver, wiphy_dev(wdev->wiphy)->driver->name,
> +		sizeof(info->driver));
> +	info->driver[sizeof(info->driver) - 1] = '\0';
[...]

Use strlcpy() instead of these two statements.
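
A minimal user-space sketch of why a single strlcpy() call can replace the two statements. demo_strlcpy() and demo_copy_two_step() are hypothetical stand-ins written for illustration; the first mimics the documented behaviour of the kernel's strlcpy() (copy at most size - 1 bytes, always NUL-terminate, return strlen(src)) and is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for strlcpy(): copies at most size - 1 bytes,
 * always NUL-terminates when size > 0, and returns strlen(src) so the
 * caller can detect truncation.
 */
static size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size != 0) {
		size_t n = (len >= size) ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* The two-statement pattern from the quoted patch. */
static void demo_copy_two_step(char *dst, const char *src, size_t size)
{
	assert(size > 0);
	strncpy(dst, src, size);   /* may leave dst unterminated on truncation */
	dst[size - 1] = '\0';      /* so terminate by hand */
}
```

Both helpers leave the same truncated, terminated string in the destination; the strlcpy()-style call additionally reports the untruncated length.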

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH 3/3] at76c50x-usb: set firmware and hardware version in wiphy
From: Ben Hutchings @ 2009-10-01  1:32 UTC (permalink / raw)
  To: John W. Linville
  Cc: linux-wireless, netdev, Kalle Valo, Kalle Valo, Luis R. Rodriguez
In-Reply-To: <1254359942-3483-3-git-send-email-linville@tuxdriver.com>

On Wed, 2009-09-30 at 21:19 -0400, John W. Linville wrote:
[...]
> +	len = sizeof(wiphy->fw_version);
> +	snprintf(wiphy->fw_version, len, "%d.%d.%d-%d",
> +		 priv->fw_version.major, priv->fw_version.minor,
> +		 priv->fw_version.patch, priv->fw_version.build);
> +	/* null terminate the strings in case they were truncated */
> +	wiphy->fw_version[len - 1] = '\0';
[...]

This last statement is unnecessary; snprintf() always null-terminates
(unless the length is zero).
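
A small user-space sketch of the point, using C99 snprintf() with the same format string as the quoted patch. demo_format_version() is a hypothetical helper, and the buffer size in the usage note is chosen only to force truncation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a version string into buf. snprintf() writes at most len - 1
 * characters plus a terminating NUL, and returns the length the full
 * string would have had.
 */
static int demo_format_version(char *buf, size_t len,
			       int major, int minor, int patch, int build)
{
	assert(len > 0);
	return snprintf(buf, len, "%d.%d.%d-%d", major, minor, patch, build);
}
```

Called with a 6-byte buffer and the version 1.103.2-7, the helper stores "1.103" followed by a NUL and returns 9, so no manual termination is needed.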

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH 2.6.32-rc1] net: VMware virtual Ethernet NIC driver: vmxnet3
From: David Miller @ 2009-10-01  2:51 UTC (permalink / raw)
  To: shemminger
  Cc: sbhatewara, linux-kernel, netdev, shemminger, jgarzik, anthony,
	chrisw, greg, akpm, virtualization, pv-drivers
In-Reply-To: <20090930173923.4520716a@s6510>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 30 Sep 2009 17:39:23 -0700

> Why not use NETIF_F_LRO and ethtool to control LRO support?

In fact, you must, in order to handle bridging and routing
correctly.

Bridging and routing is illegal with LRO enabled, so the kernel
automatically issues the necessary ethtool commands to disable
LRO in the relevant devices.

Therefore you must support the ethtool LRO operation in order to
support LRO at all.


* Re: [net-2.6 PATCH 1/5] ixgbe: Fix disabling of relaxed ordering with Tx DCA
From: David Miller @ 2009-10-01  3:04 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, peter.p.waskiewicz.jr
In-Reply-To: <20090930220705.27479.62694.stgit@localhost.localdomain>


All 5 patches applied, thanks!


* Re: [net-2.6 PATCH 0/5] qlge: Bug fixes for qlge.
From: David Miller @ 2009-10-01  3:04 UTC (permalink / raw)
  To: ron.mercer; +Cc: netdev
In-Reply-To: <1254249565-16381-1-git-send-email-ron.mercer@qlogic.com>


All applied, thanks Ron.




* Re: [PATCH] skge: Make sure both ports initialize correctly
From: David Miller @ 2009-10-01  3:04 UTC (permalink / raw)
  To: shemminger; +Cc: mikem, shemminger, netdev
In-Reply-To: <20090930172821.7c6bd127@s6510>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 30 Sep 2009 17:28:21 -0700

> On Wed, 23 Sep 2009 22:50:36 +0900
> Mike McCormack <mikem@ring3k.org> wrote:
> 
>> If allocation of the second port fails, make sure that hw->ports
>> is not 2, otherwise we'll crash trying to access the second port.
>> 
>> This fix is copied from a similar fix in the sky2 driver (ca519274...),
>> but is untested, as I don't have a skge card.
>> 
>> Signed-off-by: Mike McCormack <mikem@ring3k.org>
 ...
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.


* Re: [PATCH 2.6.31-rc9] drivers/net: ks8851_mll ethernet network driver
From: David Miller @ 2009-10-01  3:05 UTC (permalink / raw)
  To: David.Choi; +Cc: greg, netdev, Charles.Li, Choi, jgarzik, shemminger
In-Reply-To: <C43529A246480145B0A6D0234BDB0F0D021280@MELANITE.micrel.com>

From: "Choi, David" <David.Choi@Micrel.Com>
Date: Fri, 25 Sep 2009 17:42:12 -0700

> Hello David Miller,
> 
> First of all, thank you for your feedback.  Here is my new patch.
> 
> From: David J. Choi <david.choi@micrel.com>
> 
> This is the first registration of the ks8851 network driver with the
> MLL (address/data multiplexed) interface.
> 
> Signed-off-by: David J. Choi <david.choi@micrel.com>

Applied, thanks.


* Re: [PATCH] bcm63xx_enet: timeout off by one in do_mdio_op()
From: David Miller @ 2009-10-01  3:05 UTC (permalink / raw)
  To: mbizon; +Cc: roel.kluin, netdev, akpm
In-Reply-To: <1254319288.1627.758.camel@sakura.staff.proxad.net>

From: Maxime Bizon <mbizon@freebox.fr>
Date: Wed, 30 Sep 2009 16:01:28 +0200

> 
> On Mon, 2009-09-21 at 22:08 +0200, Roel Kluin wrote:
> 
> Hi Roel,
> 
>> `while (limit-- >= 0)' reaches -2 after the loop upon timeout.
> 
> The 1000us limit was chosen arbitrarily, since MDIO accesses are much
> shorter; it was just there to prevent a CPU lockup in case of a hardware bug.
> 
> But it looks like a bug, and since you're the second one reporting this,
> this should be fixed :)
> 
> 
>> Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
> 
> Acked-by: Maxime Bizon <mbizon@freebox.fr>

Applied, thanks.
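
The off-by-one in the quoted loop condition can be checked in isolation. This is a user-space sketch, not the driver code: demo_poll_ge() and demo_poll_gt() are hypothetical helpers that model a poll which never succeeds, and the `> 0` variant is shown only as one common way to bound such a loop.

```c
#include <assert.h>

struct demo_poll_stats {
	int iterations;   /* how many times the loop body ran */
	int final_limit;  /* value of the counter after the loop */
};

/* Loop condition as quoted in the report: the body runs limit + 1 times. */
static struct demo_poll_stats demo_poll_ge(int limit)
{
	struct demo_poll_stats s = { 0, 0 };

	assert(limit >= 0);
	while (limit-- >= 0)
		s.iterations++;   /* stands in for "poll the hardware, delay" */
	s.final_limit = limit;
	return s;
}

/* Alternative condition: the body runs exactly limit times. */
static struct demo_poll_stats demo_poll_gt(int limit)
{
	struct demo_poll_stats s = { 0, 0 };

	assert(limit >= 0);
	while (limit-- > 0)
		s.iterations++;
	s.final_limit = limit;
	return s;
}
```

With a limit of 1000, the `>= 0` form runs the body 1001 times and leaves the counter at -2, while the `> 0` form runs it 1000 times and leaves the counter at -1.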


* Re: [PATCH] 3c59x: Rework suspend and resume
From: David Miller @ 2009-10-01  3:11 UTC (permalink / raw)
  To: rjw; +Cc: avorontsov, stern, linux-pm, netdev
In-Reply-To: <200909252354.34511.rjw@sisk.pl>

From: "Rafael J. Wysocki" <rjw@sisk.pl>
Date: Fri, 25 Sep 2009 23:54:34 +0200

> On Friday 25 September 2009, Anton Vorontsov wrote:
>> As noticed by Alan Stern, there is still one issue with the driver:
>> we disable PCI IRQ on suspend, but other devices on the same IRQ
>> line might still need the IRQ enabled to suspend properly.
>> 
>> Nowadays, PCI core handles all power management work by itself, with
>> one condition though: if we use dev_pm_ops. So, rework the driver to
>> only quiesce 3c59x internal logic on suspend, while PCI core will
>> manage PCI device power state with IRQs disabled.
>> 
>> Suggested-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Acked-by: Rafael J. Wysocki <rjw@sisk.pl>

Applied, thanks everyone.


* [PATCH] be2net: Workaround to fix a bug in Rx Completion processing.
From: Ajit Khaparde @ 2009-10-01  4:03 UTC (permalink / raw)
  To: davem, netdev

The vtp bit in the RX completion descriptor could be wrongly set in
some SKUs of BladeEngine.  Ignore this bit if vtm is not set.
Resending because the previous patch was against net-next tree.
This patch is against the net-2.6 tree.

Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
---
 drivers/net/benet/be.h      |    1 +
 drivers/net/benet/be_cmds.c |    3 ++-
 drivers/net/benet/be_cmds.h |    3 ++-
 drivers/net/benet/be_main.c |   23 +++++++++++++++++++----
 4 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index 684c6fe..a80da0e 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -258,6 +258,7 @@ struct be_adapter {
 	bool link_up;
 	u32 port_num;
 	bool promiscuous;
+	u32 cap;
 };
 
 extern const struct ethtool_ops be_ethtool_ops;
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 3dd76c4..79d35d1 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -1068,7 +1068,7 @@ int be_cmd_get_flow_control(struct be_adapter *adapter, u32 *tx_fc, u32 *rx_fc)
 }
 
 /* Uses mbox */
-int be_cmd_query_fw_cfg(struct be_adapter *adapter, u32 *port_num)
+int be_cmd_query_fw_cfg(struct be_adapter *adapter, u32 *port_num, u32 *cap)
 {
 	struct be_mcc_wrb *wrb;
 	struct be_cmd_req_query_fw_cfg *req;
@@ -1088,6 +1088,7 @@ int be_cmd_query_fw_cfg(struct be_adapter *adapter, u32 *port_num)
 	if (!status) {
 		struct be_cmd_resp_query_fw_cfg *resp = embedded_payload(wrb);
 		*port_num = le32_to_cpu(resp->phys_port);
+		*cap = le32_to_cpu(resp->function_cap);
 	}
 
 	spin_unlock(&adapter->mbox_lock);
diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index 93e432f..8b4c2cb 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -760,7 +760,8 @@ extern int be_cmd_set_flow_control(struct be_adapter *adapter,
 			u32 tx_fc, u32 rx_fc);
 extern int be_cmd_get_flow_control(struct be_adapter *adapter,
 			u32 *tx_fc, u32 *rx_fc);
-extern int be_cmd_query_fw_cfg(struct be_adapter *adapter, u32 *port_num);
+extern int be_cmd_query_fw_cfg(struct be_adapter *adapter,
+			u32 *port_num, u32 *cap);
 extern int be_cmd_reset_function(struct be_adapter *adapter);
 extern int be_process_mcc(struct be_adapter *adapter);
 extern int be_cmd_write_flashrom(struct be_adapter *adapter,
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 409cf05..2f9b501 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -747,9 +747,16 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 			struct be_eth_rx_compl *rxcp)
 {
 	struct sk_buff *skb;
-	u32 vtp, vid;
+	u32 vlanf, vid;
+	u8 vtm;
 
-	vtp = AMAP_GET_BITS(struct amap_eth_rx_compl, vtp, rxcp);
+	vlanf = AMAP_GET_BITS(struct amap_eth_rx_compl, vtp, rxcp);
+	vtm = AMAP_GET_BITS(struct amap_eth_rx_compl, vtm, rxcp);
+
+	/* vlanf could be wrongly set in some cards.
+	 * ignore if vtm is not set */
+	if ((adapter->cap == 0x400) && !vtm)
+		vlanf = 0;
 
 	skb = netdev_alloc_skb(adapter->netdev, BE_HDR_LEN + NET_IP_ALIGN);
 	if (!skb) {
@@ -772,7 +779,7 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 	skb->protocol = eth_type_trans(skb, adapter->netdev);
 	skb->dev = adapter->netdev;
 
-	if (vtp) {
+	if (vlanf) {
 		if (!adapter->vlan_grp || adapter->num_vlans == 0) {
 			kfree_skb(skb);
 			return;
@@ -797,11 +804,18 @@ static void be_rx_compl_process_gro(struct be_adapter *adapter,
 	struct be_eq_obj *eq_obj =  &adapter->rx_eq;
 	u32 num_rcvd, pkt_size, remaining, vlanf, curr_frag_len;
 	u16 i, rxq_idx = 0, vid, j;
+	u8 vtm;
 
 	num_rcvd = AMAP_GET_BITS(struct amap_eth_rx_compl, numfrags, rxcp);
 	pkt_size = AMAP_GET_BITS(struct amap_eth_rx_compl, pktsize, rxcp);
 	vlanf = AMAP_GET_BITS(struct amap_eth_rx_compl, vtp, rxcp);
 	rxq_idx = AMAP_GET_BITS(struct amap_eth_rx_compl, fragndx, rxcp);
+	vtm = AMAP_GET_BITS(struct amap_eth_rx_compl, vtm, rxcp);
+
+	/* vlanf could be wrongly set in some cards.
+	 * ignore if vtm is not set */
+	if ((adapter->cap == 0x400) && !vtm)
+		vlanf = 0;
 
 	skb = napi_get_frags(&eq_obj->napi);
 	if (!skb) {
@@ -2045,7 +2059,8 @@ static int be_hw_up(struct be_adapter *adapter)
 	if (status)
 		return status;
 
-	status = be_cmd_query_fw_cfg(adapter, &adapter->port_num);
+	status = be_cmd_query_fw_cfg(adapter,
+				&adapter->port_num, &adapter->cap);
 	return status;
 }
 
-- 
1.6.0.4
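
The decision the patch adds to both receive paths can be sketched as a pure function. This is a user-space illustration only: demo_effective_vlanf() and DEMO_CAP_AFFECTED are hypothetical names, and 0x400 is simply the value the patch compares adapter->cap against.

```c
#include <assert.h>
#include <stdint.h>

/* Value the patch compares adapter->cap against (name is hypothetical). */
#define DEMO_CAP_AFFECTED 0x400u

/*
 * Return the VLAN flag to act on: on affected cards the vtp bit is only
 * trusted when vtm is also set; everywhere else vtp is used unchanged.
 */
static uint32_t demo_effective_vlanf(uint32_t cap, uint32_t vtp, uint8_t vtm)
{
	assert(vtp <= 1);   /* vtp is a single bit in the descriptor */
	if (cap == DEMO_CAP_AFFECTED && !vtm)
		return 0;
	return vtp;
}
```

On an affected card, a descriptor with vtp set but vtm clear is treated as untagged; on any other capability value the vtp bit passes through as before.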



* Re: [PATCH] be2net: Workaround to fix a bug in Rx Completion processing.
From: David Miller @ 2009-10-01  4:58 UTC (permalink / raw)
  To: ajitk; +Cc: netdev
In-Reply-To: <20091001040247.GA28228@serverengines.com>

From: Ajit Khaparde <ajitk@serverengines.com>
Date: Thu, 1 Oct 2009 09:33:22 +0530

> The vtp bit in the RX completion descriptor could be wrongly set in
> some SKUs of BladeEngine.  Ignore this bit if vtm is not set.
> Resending because the previous patch was against net-next tree.
> This patch is against the net-2.6 tree.
> 
> Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>

Applied, thanks.


* Re: kernel doc / docbook pdfdocs question
From: Doug Maxey @ 2009-10-01  5:42 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Randy Dunlap, netdev
In-Reply-To: <20090930173002.36da7ffd@s6510>

On Wed, 30 Sep 2009 17:30:02 PDT, Stephen Hemminger wrote:
>On Wed, 30 Sep 2009 14:59:36 -0500
>Doug Maxey <dwm@enoyolf.org> wrote:
>
>> 
>> Randy,
>> 
>> This may be slightly off topic for this list, but it does involve an
>> (as yet un-released) network driver. :)
>> 
>> Do you have any insight that could guide me toward a fix for an issue
>> seen with some header file constructs when trying to generate a pdf
>> docbook?
>> 
>
>Why clutter docbook output (which is supposed to be about general kernel
>APIs) with output for data structures in one driver?

It would be a general mechanism, and it would be to document an API.
There are other subsystems that use DECLARE_BITMAP() (e.g., scsi).
Just none at the moment that attempt to describe such a member,
possibly because there isn't a way to document it.  Dunno.  Build it
and they will come.  There is one party that is interested anyway.

Finally did find where this was getting warned about / tossed, in
kernel-doc itself. =)

++doug


* Re: [PATCH] ethtool: Add a new ethtool option to flash a firmware image from the specified file to a device.
From: Ajit Khaparde @ 2009-10-01  6:13 UTC (permalink / raw)
  To: davem, jgarzik; +Cc: netdev
In-Reply-To: <20090903030258.GA19401@serverengines.com>

On 03/09/09 08:33 +0530, Ajit Khaparde wrote:
> This patch adds a new "-f" option to the ethtool utility
> to flash a firmware image specified by a file, to a network device.
> The filename is passed to the network driver which will flash the image
> on the chip using the request_firmware path.
> 
> The region "on the chip" to be flashed can be specified by an option.
> It is up to the device driver to map the region number passed by ethtool
> to the region to be flashed.
> 
> The default behavior is to flash all the regions on the chip.
> 
> Usage:
> ethtool -f <interface name> <filename of firmware image>
> 
> ethtool -f <interface name> <filename of firmware image> [ REGION-NUMBER-TO-FLASH ]
> 
> Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>

When will this change make it to the ethtool source?
Is there anything I need to do on my side?
I hope this is the right place to submit such changes.

Thanks
-Ajit


* [net-2.6 PATCH] ixgbe: correct the parameter description
From: Jeff Kirsher @ 2009-10-01  6:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Jiri Pirko, Peter P Waskiewicz Jr, Jeff Kirsher

From: Jiri Pirko <jpirko@redhat.com>

ccffad25b5136958d4769ed6de5e87992dd9c65c changed parameters for function
ixgbe_update_uc_addr_list_generic but parameter description was not updated.
This patch corrects it.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbe/ixgbe_common.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_common.c b/drivers/net/ixgbe/ixgbe_common.c
index 143b0fc..40ff120 100644
--- a/drivers/net/ixgbe/ixgbe_common.c
+++ b/drivers/net/ixgbe/ixgbe_common.c
@@ -1355,9 +1355,7 @@ static void ixgbe_add_uc_addr(struct ixgbe_hw *hw, u8 *addr, u32 vmdq)
 /**
  *  ixgbe_update_uc_addr_list_generic - Updates MAC list of secondary addresses
  *  @hw: pointer to hardware structure
- *  @addr_list: the list of new addresses
- *  @addr_count: number of addresses
- *  @next: iterator function to walk the address list
+ *  @uc_list: the list of new addresses
  *
  *  The given list replaces any existing list.  Clears the secondary addrs from
  *  receive address registers.  Uses unused receive address registers for the



* Re: [PATCH] net: fix NOHZ: local_softirq_pending 08
From: Oliver Hartkopp @ 2009-10-01  7:08 UTC (permalink / raw)
  To: David Miller; +Cc: johannes, mb, kalle.valo, linville, linux-wireless, netdev
In-Reply-To: <20090930.163333.234658158.davem@davemloft.net>

David Miller wrote:
> From: Oliver Hartkopp <oliver@hartkopp.net>
> Date: Wed, 30 Sep 2009 20:18:25 +0200
> 
>> Socket buffers that are generated and received inside softirqs or from process
>> context must not use netif_rx() that's intended to be used from irq context only.
>>
>> This patch introduces a new helper function netif_rx_ti(skb) that tests for
>> in_interrupt() before invoking netif_rx() or netif_rx_ni().
>>
>> It fixes the ratelimited kernel warning
>>
>>         NOHZ: local_softirq_pending 08
>>
>> in the mac80211 and can subsystems.
>>
>> Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
> 
> I bet all of these code paths can use netif_receive_skb() or
> don't need this conditional blob at all.
> 
> Looking at some specific cases in this patch:
> 
> 1) VCAN:  This RX routine is only invoked from the driver's
>    ->ndo_start_xmit() handler, and therefore like the loopback
>    driver we know that BH's are already disabled and therefore
>    it can always use netif_rx() safely.
> 
>    Why did you convert this case?
> 
>    And if this needs to be converted, why doesn't loopback need
>    to be?
> 
> 2) af_can.c:  In what situation will netif_rx_ni() not be appropriate
>    here?  It should always execute in softirq or user context, not
>    hardirq context.
> 
> And the list goes on and on, I don't really like this new conditional
> testing of interrupt state.

Hello Dave,

I'm confused about in_interrupt(), in_softirq() and in_irq(), as pointed out by
Johannes here:

http://marc.info/?l=linux-wireless&m=125432410405562&w=2

Indeed, in the two cases for the CAN stuff (in vcan.c and af_can.c) the skbs
are received in process context and softirq context only.

Therefore I used netif_rx_ni() in my last change of this code.

Now I read from Johannes that in_interrupt() covers both
hardirq context /and/ softirq context, so I was just *unsure* and used the
newly introduced netif_rx_ti() for that, which tests for in_interrupt().

Indeed, I'm not really happy with that, as it is always better to check for each
receive case which context it can run in and to use exactly the right
function for it.

So if netif_rx_ni() or netif_receive_skb() is the best I can use when in
process context or in softirq context, I'll do it with pleasure.

And if that is the case, the problematic netif_rx() calls in mac80211 also
need to be sorted out in detail ...

Regards,
Oliver


* Re: [PATCH] connector: Allow permission checking in the receiver callbacks
From: Lars Ellenberg @ 2009-10-01  8:01 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Philipp Reisner, linux-kernel, netdev, Andrew Morton
In-Reply-To: <20090930192928.GA1315@ioremap.net>

On Wed, Sep 30, 2009 at 11:29:28PM +0400, Evgeniy Polyakov wrote:
> On Wed, Sep 30, 2009 at 03:20:35PM +0200, Lars Ellenberg (lars.ellenberg@linbit.com) wrote:
> > Actually it is the basis for follow-up security fixes.
> > 
> > Without this, unprivileged user space is able to send arbitrary
> > connector requests to kernel subsystems, which have no way to verify the
> > privileges of the sender anymore, because that information, even though
> > available at the netlink layer, has been dropped by the connector.
> 
> It is not. One can add some checks at receive time, which happens in
> process context, to get the sender's credentials, but nothing in netlink
> itself carries this info. Since the connector schedules a workqueue, this
> ability is lost.

Please correct me if I'm wrong.

My understanding is that in netlink_sendmsg, the credentials and
capabilities are copied into skb->cb.
During kernel side receive, these can be checked.

If we pass the skb, instead of just the msg, then even an asynchronously
scheduled receive callback, running in any workqueue or other context,
can check for these credentials.

Passing the skb instead of just the msg for use in cn_queue_wrapper
is what the first patch does.

The second patch changes the semantics of the actual callback
so that it is passed the msg _and_ the netlink_skb_parms, both
"reconstructed" from the skb.

Now, in the end-user callback, there is the actual msg,
but also the netlink_skb_parms.
So this enables the end-user callback, running in arbitrary context,
to check capabilities and other credentials of the sending process.
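
A rough user-space sketch of what such a check in an end-user callback could look like. Everything named demo_* is hypothetical: the real netlink_skb_parms and kernel capability types are more involved than the single 64-bit mask assumed here, and the callback signature is only modelled on the description above. The capability number 21 matches CAP_SYS_ADMIN on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_CAP_SYS_ADMIN 21   /* same number as CAP_SYS_ADMIN on Linux */

/* Simplified stand-in for the sender information carried with the skb. */
struct demo_skb_parms {
	uint64_t eff_cap;   /* effective capabilities of the sending process */
};

/* Simplified stand-in for cap_raised(): test one capability bit. */
static int demo_cap_raised(uint64_t caps, unsigned int cap)
{
	assert(cap < 64);
	return (int)((caps >> cap) & 1u);
}

/* Receiver callback: refuse requests from senders lacking the capability. */
static int demo_callback(const char *msg, const struct demo_skb_parms *nsp)
{
	(void)msg;   /* payload handling omitted in this sketch */
	if (!demo_cap_raised(nsp->eff_cap, DEMO_CAP_SYS_ADMIN))
		return -EPERM;
	return 0;
}
```

Because the sender's capabilities travel with the parameters, the check works no matter which context the callback eventually runs in.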

> > Once this is applied, the various in-kernel receiving connector
> > callbacks can (and need to) add cap_raised(nsb->eff_cap, cap) where
> > appropriate. For example, you don't want some guest user to be able to
> > trigger a dst_del_node callback by sending a crafted netlink message,
> > right?
> > 
> > So it _is_ a (design-) bug fix.
> > Or am I missing something?
> 
> This patchset is not a bugfix, just a cleanup, since none in patchset
> uses netlink_skb_parms

Patches 3 and 4 are in fact merely cleanups.

> and currently I see no users which are affected
> by this behaviour in the mainline branch (not counting staging tree).
> 
> But if proposed configuration changes for DM are on the way, then I
> agree and they should force this patchset into the tree as a bugfix.
> 
> -- 
> 	Evgeniy Polyakov
> 

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


* Re: [net-2.6 PATCH] ixgbe: correct the parameter description
From: David Miller @ 2009-10-01  8:10 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, jpirko, peter.p.waskiewicz.jr
In-Reply-To: <20091001065140.13279.42634.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 30 Sep 2009 23:51:41 -0700

> From: Jiri Pirko <jpirko@redhat.com>
> 
> ccffad25b5136958d4769ed6de5e87992dd9c65c changed parameters for function
> ixgbe_update_uc_addr_list_generic but parameter description was not updated.
> This patch corrects it.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH 1/2] net/netfilter/ipvs: Move #define KMSG_COMPONENT to Makefile
From: Jan Engelhardt @ 2009-10-01  8:27 UTC (permalink / raw)
  To: Joe Perches
  Cc: Patrick McHardy, David S. Miller, Simon Horman, Julian Anastasov,
	Netfilter Developer Mailing List, netdev,
	Linux Kernel Mailing List, lvs-devel
In-Reply-To: <1254358235.2960.145.camel@Joe-Laptop.home>

On Thursday 2009-10-01 02:50, Joe Perches wrote:
>On Thu, 2009-10-01 at 02:31 +0200, Jan Engelhardt wrote:
>> Well I personally prefer the #include instead of hiding such in 
>> Makefiles. You know, when newcomers could start doing `grep 
>> KMSG_COMPONENT *.[ch]`. Perhaps GCC's -include flag in a Makefile
>> to avoid #includes in .c files?
>
>I imagine an eventual goal of standardizing the default
>pr_fmt define in kernel.h to
>
>	#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
>so that all pr_<level> calls get this unless otherwise
>specified.

I like that approach. Saves me adding that line to .c
files repeatedly.
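
A user-space sketch of how that define takes effect through string-literal concatenation. KBUILD_MODNAME is normally supplied by kbuild and is set by hand here; demo_pr_info() and demo_log_link() are hypothetical stand-ins that format into a buffer instead of calling printk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KBUILD_MODNAME "demo_mod"   /* assumption: set by hand for the demo */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Hypothetical stand-in for pr_info(): the prefix is pasted at compile time. */
#define demo_pr_info(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), __VA_ARGS__)

static int demo_log_link(char *buf, size_t len, int port)
{
	assert(len > 0);
	return demo_pr_info(buf, len, "port %d link up\n", port);
}
```

Calling demo_log_link() with port 1 yields "demo_mod: port 1 link up\n", with no per-message prefix written at the call site.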

>Or perhaps better, to get rid of pr_fmt(fmt) altogether and
>have printk emit the filename/modulename, function and/or
>code offset by using something like %pS after the level.

I object to that. You would be spamming the dmesg ring buffer
with all that info, plus

filename: you would have to keep filename strings in the kernel.
Surely I do not find that thrilling when there are ~18000
non-arch .[ch] files whose pathnames amount to 542K.
The same goes for function names.

modulename: obj-y files would only get "<built-in>" or something
for KBUILD_MODNAME. Printing that to dmesg is not too useful.

I would rather keep plain printk as-is.


* [PATCH 13/34] don't use __devexit_p to wrap meth_remove
From: Uwe Kleine-König @ 2009-10-01  8:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Sam Ravnborg, Andrew Morton, David S. Miller, Ralf Baechle,
	Patrick McHardy, Johannes Berg, netdev
In-Reply-To: <20091001082607.GA2181@pengutronix.de>

The function meth_remove is defined using __exit, so don't use __devexit_p
but __exit_p to wrap it.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/meth.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/meth.c b/drivers/net/meth.c
index 92ceb68..2af8173 100644
--- a/drivers/net/meth.c
+++ b/drivers/net/meth.c
@@ -828,7 +828,7 @@ static int __exit meth_remove(struct platform_device *pdev)
 
 static struct platform_driver meth_driver = {
 	.probe	= meth_probe,
-	.remove	= __devexit_p(meth_remove),
+	.remove	= __exit_p(meth_remove),
 	.driver = {
 		.name	= "meth",
 		.owner	= THIS_MODULE,
-- 
1.6.4.3
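
A user-space sketch of why the wrapper has to match the function's annotation. The demo_* macros are hypothetical and mirror, as far as I recall them, the 2.6-era definitions: __exit_p(x) is x only when building as a module, while __devexit_p(x) is x when building as a module or with hotplug support. The configuration modelled below is a built-in driver with hotplug enabled, where code marked __exit is discarded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switches standing in for MODULE and CONFIG_HOTPLUG. */
#define DEMO_MODULE  0
#define DEMO_HOTPLUG 1

#if DEMO_MODULE
#define demo_exit_p(fn) (fn)
#else
#define demo_exit_p(fn) NULL      /* built-in: exit code is discarded */
#endif

#if DEMO_MODULE || DEMO_HOTPLUG
#define demo_devexit_p(fn) (fn)
#else
#define demo_devexit_p(fn) NULL
#endif

struct demo_driver {
	int (*remove)(void);
};

/* Imagine this function carries the __exit annotation. */
static int demo_remove(void)
{
	return 0;
}

/* Matching wrapper: the pointer vanishes together with the function. */
static const struct demo_driver demo_good = {
	.remove = demo_exit_p(demo_remove),
};

/* Mismatched wrapper: still references code that would be discarded. */
static const struct demo_driver demo_bad = {
	.remove = demo_devexit_p(demo_remove),
};

static int demo_unbind(const struct demo_driver *drv)
{
	return drv->remove ? drv->remove() : -1;
}
```

In this configuration the matching wrapper yields a NULL .remove, whereas the mismatched one keeps a pointer to a function that a real built-in kernel would have dropped at link time.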


* [PATCH 22/34] don't use __devexit_p to wrap sgiseeq_remove
From: Uwe Kleine-König @ 2009-10-01  8:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Sam Ravnborg, Andrew Morton, David S. Miller, Wang Chen,
	Ralf Baechle, Patrick McHardy, netdev
In-Reply-To: <20091001082607.GA2181@pengutronix.de>

The function sgiseeq_remove is defined using __exit, so don't use
__devexit_p but __exit_p to wrap it.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Wang Chen <wangchen@cn.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/sgiseeq.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sgiseeq.c b/drivers/net/sgiseeq.c
index ecf3279..f4dfd1f 100644
--- a/drivers/net/sgiseeq.c
+++ b/drivers/net/sgiseeq.c
@@ -826,7 +826,7 @@ static int __exit sgiseeq_remove(struct platform_device *pdev)
 
 static struct platform_driver sgiseeq_driver = {
 	.probe	= sgiseeq_probe,
-	.remove	= __devexit_p(sgiseeq_remove),
+	.remove	= __exit_p(sgiseeq_remove),
 	.driver = {
 		.name	= "sgiseeq",
 		.owner	= THIS_MODULE,
-- 
1.6.4.3


* [PATCH 30/34] move virtnet_remove to .devexit.text
From: Uwe Kleine-König @ 2009-10-01  8:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Sam Ravnborg, Andrew Morton, David S. Miller, Rusty Russell,
	Alex Williamson, Mark McLoughlin, netdev
In-Reply-To: <20091001082607.GA2181@pengutronix.de>

The function virtnet_remove is used only wrapped by __devexit_p so define
it using __devexit.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alex Williamson <alex.williamson@hp.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/virtio_net.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d445845..8d00976 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -948,7 +948,7 @@ free:
 	return err;
 }
 
-static void virtnet_remove(struct virtio_device *vdev)
+static void __devexit virtnet_remove(struct virtio_device *vdev)
 {
 	struct virtnet_info *vi = vdev->priv;
 	struct sk_buff *skb;
-- 
1.6.4.3


* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Avi Kivity @ 2009-10-01  8:34 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Ira W. Snyder, Michael S. Tsirkin, netdev, virtualization, kvm,
	linux-kernel, mingo, linux-mm, akpm, hpa, Rusty Russell, s.hetze,
	alacrityvm-devel
In-Reply-To: <4AC3B9C6.5090408@gmail.com>

On 09/30/2009 10:04 PM, Gregory Haskins wrote:


>> A 2.6.27 guest, or Windows guest with the existing virtio drivers, won't work
>> over vbus.
>>      
> Binary compatibility with existing virtio drivers, while nice to have,
> is not a specific requirement nor goal.  We will simply load an updated
> KMP/MSI into those guests and they will work again.  As previously
> discussed, this is how more or less any system works today.  It's like
> we are removing an old adapter card and adding a new one to "uprev the
> silicon".
>    

Virtualization is about not doing that.  Sometimes it's necessary (when 
you have made unfixable design mistakes), but not just to replace a bus 
with no advantages to the guest that has to be changed (other 
hypervisors or hypervisorless deployment scenarios aren't advantages to 
that guest).

>>   Further, non-shmem virtio can't work over vbus.
>>      
> Actually I misspoke earlier when I said virtio works over non-shmem.
> Thinking about it some more, both virtio and vbus fundamentally require
> shared-memory, since sharing their metadata concurrently on both sides
> is their raison d'être.
>
> The difference is that virtio utilizes a pre-translation/mapping (via
> ->add_buf) from the guest side.  OTOH, vbus uses a post translation
> scheme (via memctx) from the host-side.  If anything, vbus is actually
> more flexible because it doesn't assume the entire guest address space
> is directly mappable.
>
> In summary, your statement is incorrect (though it is my fault for
> putting that idea in your head).
>    

Well, Xen requires pre-translation (since the guest has to give the host 
(which is just another guest) permissions to access the data).  So 
neither is a superset of the other, they're just different.

It doesn't really matter since Xen is unlikely to adopt virtio.

> An interesting thing here is that you don't even need a fancy
> multi-homed setup to see the effects of my exit-ratio reduction work:
> even single port configurations suffer from the phenomenon since many
> devices have multiple signal-flows (e.g. network adapters tend to have
> at least 3 flows: rx-ready, tx-complete, and control-events (link-state,
> etc)).  What's worse is that the flows often are indirectly related (for
> instance, many host adapters will free tx skbs during rx operations, so
> you tend to get bursts of tx-completes at the same time as rx-ready).  If
> the flows map 1:1 with IDT, they will suffer the same problem.
>    

You can simply use the same vector for both rx and tx and poll both at 
every interrupt.

> In any case, here is an example run of a simple single-homed guest over
> standard GigE.  What's interesting here is the .qnotify to .notify
> ratio, as this is the interrupt-to-signal ratio.  In this case, it's
> 151918/170047, which comes out to about 11% savings in interrupt injections:
>
> vbus-guest:/home/ghaskins # netperf -H dev
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> dev.laurelwood.net (192.168.1.10) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
> 1048576  16384  16384    10.01     940.77
> vbus-guest:/home/ghaskins # cat /sys/kernel/debug/pci-to-vbus-bridge
>    .events                        : 170048
>    .qnotify                       : 151918
>    .qinject                       : 0
>    .notify                        : 170047
>    .inject                        : 18238
>    .bridgecalls                   : 18
>    .buscalls                      : 12
> vbus-guest:/home/ghaskins # cat /proc/interrupts
>              CPU0
>     0:         87   IO-APIC-edge      timer
>     1:          6   IO-APIC-edge      i8042
>     4:        733   IO-APIC-edge      serial
>     6:          2   IO-APIC-edge      floppy
>     7:          0   IO-APIC-edge      parport0
>     8:          0   IO-APIC-edge      rtc0
>     9:          0   IO-APIC-fasteoi   acpi
>    10:          0   IO-APIC-fasteoi   virtio1
>    12:         90   IO-APIC-edge      i8042
>    14:       3041   IO-APIC-edge      ata_piix
>    15:       1008   IO-APIC-edge      ata_piix
>    24:     151933   PCI-MSI-edge      vbus
>    25:          0   PCI-MSI-edge      virtio0-config
>    26:        190   PCI-MSI-edge      virtio0-input
>    27:         28   PCI-MSI-edge      virtio0-output
>   NMI:          0   Non-maskable interrupts
>   LOC:       9854   Local timer interrupts
>   SPU:          0   Spurious interrupts
>   CNT:          0   Performance counter interrupts
>   PND:          0   Performance pending work
>   RES:          0   Rescheduling interrupts
>   CAL:          0   Function call interrupts
>   TLB:          0   TLB shootdowns
>   TRM:          0   Thermal event interrupts
>   THR:          0   Threshold APIC interrupts
>   MCE:          0   Machine check exceptions
>   MCP:          1   Machine check polls
>   ERR:          0
>   MIS:          0
>
> Its important to note here that we are actually looking at the interrupt
> rate, not the exit rate (which is usually a multiple of the interrupt
> rate, since you have to factor in as many as three exits per interrupt
> (IPI, window, EOI).  Therefore we saved about 18k interrupts in this 10
> second burst, but we may have actually saved up to 54k exits in the
> process. This is only over a 10 second window at GigE rates, so YMMV.
> These numbers get even more dramatic on higher end hardware, but I
> haven't had a chance to generate new numbers yet.
>    

(irq window exits should only be required on a small percentage of 
interrupt injections, since the guest will try to disable interrupts for 
short periods only)

> Looking at some external stats paints an even bleaker picture: "exits"
> as reported by kvm_stat for virtio-pci based virtio-net tip the scales
> at 65k/s vs 36k/s for vbus based venet.  And virtio is consuming ~30% of
> my quad-core's cpu, vs 19% for venet during the test.  Its hard to know
> which innovation or innovations may be responsible for the entire
> reduction, but certainly the interrupt-to-signal ratio mentioned above
> is probably helping.
>    

Can you please stop comparing userspace-based virtio hosts to 
kernel-based venet hosts?  We know the userspace implementation sucks.

> The even worse news for 1:1 models is that the ratio of
> exits-per-interrupt climbs with load (exactly when it hurts the most)
> since that is when the probability that the vcpu will need all three
> exits is the highest.
>    

Requiring all three exits means the guest is spending most of its time 
with interrupts disabled; that's unlikely.

Thanks for the numbers.  Are those 11% attributable to rx/tx 
piggybacking from the same interface?

Also, 170K interrupts -> 17K interrupts/sec -> 55kbit/interrupt -> 
6.8kB/interrupt.  Ignoring interrupt merging and assuming equal rx/tx 
distribution, that's about 13kB/interrupt.  Seems rather low for a 
saturated link.
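
The arithmetic above can be rechecked with a small userspace sketch.  The
function names are made up for illustration; the inputs are the netperf
throughput and the .notify/.qnotify counters quoted earlier.  With those
inputs it should land close to the 6.8kB/interrupt and 11% figures.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Average payload per interrupt, in bytes.
 * mbit_per_sec: throughput in 10^6 bits/sec (netperf's unit)
 * interrupts:   interrupts counted over the whole run
 * seconds:      elapsed time of the run
 */
static double bytes_per_interrupt(double mbit_per_sec, uint64_t interrupts,
				  double seconds)
{
	assert(interrupts > 0 && seconds > 0.0);

	double irq_per_sec = (double)interrupts / seconds;
	double bits_per_irq = mbit_per_sec * 1e6 / irq_per_sec;

	return bits_per_irq / 8.0;
}

/* Fraction of signals that did not need an interrupt of their own. */
static double injection_savings(uint64_t signals, uint64_t interrupts)
{
	assert(signals > 0);

	return 1.0 - (double)interrupts / (double)signals;
}
```

For example, bytes_per_interrupt(940.77, 170047, 10.01) and
injection_savings(170047, 151918) reproduce the two numbers discussed here.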

>>      
>>> and prioritizable/nestable signals.
>>>
>>>        
>> That doesn't belong in a bus.
>>      
> Everyone is of course entitled to an opinion, but the industry as a
> whole would disagree with you.  Signal path routing (1:1, aggregated,
> etc) is at the discretion of the bus designer.  Most buses actually do
> _not_ support 1:1 with IDT (think USB, SCSI, IDE, etc).
>    

With standard PCI, they do not.  But all modern host adapters support 
MSI and they will happily give you one interrupt per queue.

> PCI is somewhat of an outlier in that regard afaict.  Its actually a
> nice feature of PCI when its used within its design spec (HW).  For
> SW/PV, 1:1 suffers from, among other issues, that "triple-exit scaling"
> issue in the signal path I mentioned above.  This is one of the many
> reasons I think PCI is not the best choice for PV.
>    

Look at the vmxnet3 submission (recently posted on virtualization@).  
It's a perfectly ordinary PCI NIC driver, apart from having so many 'V's 
in the code.  16 rx queues, 8 tx queues, 25 MSIs, BARs for the 
registers.  So while the industry as a whole might disagree with me, it 
seems VMware does not.


>>> http://developer.novell.com/wiki/images/b/b7/31-rc4_throughput.png
>>>
>>>        
>> That's a red herring.  The problem is not with virtio as an ABI, but
>> with its implementation in userspace.  vhost-net should offer equivalent
>> performance to vbus.
>>      
> That's pure speculation.  I would advise you to reserve such statements
> until after a proper bakeoff can be completed.

Let's do that then.  Please reserve the corresponding comparisons from 
your side as well.

> This is not to mention
> that vhost-net does nothing to address our other goals, like scheduler
> coordination and non-802.x fabrics.
>    

What are scheduler coordination and non-802.x fabrics?

>> Right, when you ignore the points where they don't fit, it's a perfect
>> mesh.
>>      
> Where doesn't it fit?
>    

(avoiding infinite loop)

>>>> But that's not a strong argument for vbus; instead of adding vbus you
>>>> could make virtio more friendly to non-virt
>>>>
>>>>          
>>> Actually, it _is_ a strong argument then because adding vbus is what
>>> helps makes virtio friendly to non-virt, at least for when performance
>>> matters.
>>>
>>>        
>> As vhost-net shows, you can do that without vbus
>>      
> Citation please.  Afaict, the one use case that we looked at for vhost
> outside of KVM failed to adapt properly, so I do not see how this is true.
>    

I think Ira said he can make vhost work?

>> and without breaking compatibility.
>>      
> Compatibility with what?  vhost hasn't even been officially deployed in
> KVM environments afaict, nevermind non-virt.  Therefore, how could it
> possibly have compatibility constraints with something non-virt already?
>   Citation please.
>    

virtio-net over pci is deployed.  Replacing the backend with vhost-net 
will require no guest modifications.  Replacing the frontend with venet 
or virt-net/vbus-pci will require guest modifications.

Obviously virtio-net isn't deployed in non-virt.  But if we adopt vbus, 
we have to migrate guests.



>> Of course there is such a thing as native, a pci-ready guest has tons of
>> support built into it
>>      
> I specifically mentioned that already ([1]).
>
> You are also overstating its role, since the basic OS is what implements
> the native support for bus-objects, hotswap, etc, _not_ PCI.  PCI just
> rides underneath and feeds trivial events up, as do other bus-types
> (usb, scsi, vbus, etc).

But we have to implement vbus for each guest we want to support.  That 
includes Windows and older Linux which has a different internal API, so 
we have to port the code multiple times, to get existing functionality.

> And once those events are fed, you still need a
> PV layer to actually handle the bus interface in a high-performance
> manner so its not like you really have a "native" stack in either case.
>    

virtio-net doesn't use any pv layer.

>> that doesn't need to be retrofitted.
>>      
> No, that is incorrect.  You have to heavily modify the pci model with
> layers on top to get any kind of performance out of it.  Otherwise, we
> would just use realtek emulation, which is technically the native PCI
> you are apparently so enamored with.
>    

virtio-net doesn't modify the PCI model.  And if you look at vmxnet3, 
they mention that it conforms to something called UPT, which allows 
hardware vendors to implement parts of their NIC model.  So vmxnet3 is 
apparently suitable to both hardware and software implementations.

> Not to mention there are things you just plain can't do in PCI today,
> like dynamically assign signal-paths,

You can have dynamic MSI/queue routing with virtio, and each MSI can be 
routed to a vcpu at will.

> priority, and coalescing, etc.
>    

Do you mean interrupt priority?  Well, apic allows interrupt priorities 
and Windows uses them; Linux doesn't.  I don't see a reason to provide 
more than native hardware.

>> Since
>> practically everyone (including Xen) does their paravirt drivers atop
>> pci, the claim that pci isn't suitable for high performance is incorrect.
>>      
> Actually IIUC, I think Xen bridges to their own bus as well (and only
> where they have to), just like vbus.  They don't use PCI natively.  PCI
> is perfectly suited as a bridge transport for PV, as I think the Xen and
> vbus examples have demonstrated.  Its the 1:1 device-model where PCI has
> the most problems.
>    

N:1 breaks down on large guests since one vcpu will have to process all 
events.  You could do N:M, with commands to change routings, but where's 
your userspace interface?  You can't tell from /proc/interrupts which 
vbus interrupts are active, and irqbalance can't steer them towards less 
busy cpus since they're invisible to the interrupt controller.


>>> And lastly, why would you _need_ to use the so called "native"
>>> mechanism?  The short answer is, "you don't".  Any given system (guest
>>> or bare-metal) already have a wide-range of buses (try running "tree
>>> /sys/bus" in Linux).  More importantly, the concept of adding new buses
>>> is widely supported in both the Windows and Linux driver model (and
>>> probably any other guest-type that matters).  Therefore, despite claims
>>> to the contrary, its not hard or even unusual to add a new bus to the
>>> mix.
>>>
>>>        
>> The short answer is "compatibility".
>>      
> There was a point in time where the same could be said for virtio-pci
> based drivers vs realtek and e1000, so that argument is demonstrably
> silly.  No one tried to make virtio work in a binary compatible way with
> realtek emulation, yet we all survived the requirement for loading a
> virtio driver to my knowledge.
>    

The larger your installed base, the more difficult it is.  Of course 
it's doable, but I prefer not doing it and instead improving things in a 
binary backwards compatible manner.  If there is no choice we will bow 
to the inevitable and make our users upgrade.  But at this point there 
is a choice, and I prefer to stick with vhost-net until it is proven 
that it won't work.

> The bottom line is: Binary device compatibility is not required in any
> other system (as long as you follow sensible versioning/id rules), so
> why is KVM considered special?
>    

One of the benefits of virtualization is that the guest model is 
stable.  You can live-migrate guests and upgrade the hardware 
underneath.  You can have a single guest image that you clone to 
provision new guests.  If you switch to a new model, you give up those 
benefits, or you support both models indefinitely.

Note even hardware nowadays is binary compatible.  One e1000 driver 
supports a ton of different cards, and I think (not sure) newer cards 
will work with older drivers, just without all their features.

> The fact is, it isn't special (at least not in this regard).  What _is_
> required is "support" and we fully intend to support these proposed
> components.  I assure you that at least the users that care about
> maximum performance will not generally mind loading a driver.  Most of
> them would have to anyway if they want to get beyond realtek emulation.
>    

For a new install, sure.  I'm talking about existing deployments (and 
those that will exist by the time vbus is ready for roll out).

> I am certainly in no position to tell you how to feel, but this
> declaration would seem from my perspective to be more of a means to an
> end than a legitimate concern.  Otherwise we would never have had virtio
> support in the first place, since it was not "compatible" with previous
> releases.
>    

virtio was certainly not pain free, needing Windows drivers, updates to 
management tools (you can't enable it by default, so you have to offer 
it as a choice), mkinitrd, etc.  I'd rather not have to go through that 
again.

>>   Especially if the device changed is your boot disk.
>>      
> If and when that becomes a priority concern, that would be a function
> transparently supported in the BIOS shipped with the hypervisor, and
> would thus be invisible to the user.
>    

No, you have to update the driver in your initrd (for Linux) or properly 
install the new driver (for Windows).  It's especially difficult for 
Windows.

>>   You may not care about the pain caused to users, but I do, so I will
>> continue to insist on compatibility.
>>      
> For the users that don't care about maximum performance, there is no
> change (and thus zero pain) required.  They can use realtek or virtio if
> they really want to.  Neither is going away to my knowledge, and lets
> face it: 2.6Gb/s out of virtio to userspace isn't *that* bad.  But "good
> enough" isn't good enough, and I won't rest till we get to native
> performance.

I don't want to support both virtio and vbus in parallel.  There's 
enough work already.  If we adopt vbus, we'll have to deprecate and 
eventually kill off virtio.

> 2) True pain to users is not caused by lack of binary compatibility.
> Its caused by lack of support.  And its a good thing or we would all be
> emulating 8086 architecture forever...
>
> ..oh wait, I guess we kind of do that already ;).  But at least we can
> slip in something more advanced once in a while (APIC vs PIC, USB vs
> uart, iso9660 vs floppy, for instance) and update the guest stack
> instead of insisting it must look like ISA forever for compatibility's sake.
>    

PCI is continuously updated, with MSI, MSI-X, and IOMMU support being 
some recent updates.  I'd like to ride on top of that instead of having 
to clone it for every guest I support.

>> So we have: vbus needs a connector, vhost needs a connector.  vbus
>> doesn't need userspace to program the addresses (but does need userspace
>> to instantiate the devices and to program the bus address decode)
>>      
> First of all, bus-decode is substantially easier than per-device decode
> (you have to track all those per-device/per-signal fds somewhere,
> integrate with hotswap, etc), and its only done once per guest at
> startup and left alone.  So its already not apples to apples.
>    

Right, it means you can hand off those eventfds to other qemus or other 
pure userspace servers.  It's more flexible.

> Second, while its true that the general kvm-connector bus-decode needs
> to be programmed,  that is a function of adapting to the environment
> that _you_ created for me.  The original kvm-connector was discovered
> via cpuid and hypercalls, and didn't need userspace at all to set it up.
>   Therefore it would be entirely unfair of you to turn around and somehow
> try to use that trait of the design against me since you yourself
> imposed it.
>    

No kvm feature will ever be exposed to a guest without userspace 
intervention.  It's a basic requirement.  If it causes complexity (and 
it does) we have to live with it.

>>   Does it work on Windows?
>>      
> This question doesn't make sense.  Hotswap control occurs on the host,
> which is always Linux.
>
> If you were asking about whether a windows guest will support hotswap:
> the answer is "yes".  Our windows driver presents a unique PDO/FDO pair
> for each logical device instance that is pushed out (just like the built
> in usb, pci, scsi bus drivers that windows supports natively).
>    

Ah, you have a Windows venet driver?


>>> As an added bonus, its device-model is modular.  A developer can write a
>>> new device model, compile it, insmod it to the host kernel, hotplug it
>>> to the running guest with mkdir/ln, and the come back out again
>>> (hotunplug with rmdir, rmmod, etc).  They may do this all without taking
>>> the guest down, and while eating QEMU based IO solutions for breakfast
>>> performance wise.
>>>
>>> Afaict, qemu can't do either of those things.
>>>
>>>        
>> We've seen that herring before,
>>      
> Citation?
>    

It's the compare venet-in-kernel to virtio-in-userspace thing again.  
Let's defer that until mst completes vhost-net mergeable buffers, at which 
time we can compare vhost-net to venet and see how much vbus contributes 
to performance and how much of it comes from being in-kernel.

>>>> Refactor instead of duplicating.
>>>>
>>>>          
>>> There is no duplicating.  vbus has no equivalent today as virtio doesn't
>>> define these layers.
>>>
>>>        
>> So define them if they're missing.
>>      
> I just did.
>    

Since this is getting confusing to me, I'll start from scratch looking 
at the vbus layers, top to bottom:

Guest side:
1. venet guest kernel driver - AFAICT, duplicates the virtio-net guest 
driver functionality
2. vbus guest driver (config and hotplug) - duplicates pci, or if you 
need non-pci support, virtio config and its pci bindings; needs 
reimplementation for all supported guests
3. vbus guest driver (interrupt coalescing, priority) - if needed, 
should be implemented as an irqchip (and be totally orthogonal to the 
driver); needs reimplementation for all supported guests
4. vbus guest driver (shm/ioq) - finer grained layering than virtio 
(which only supports the combination, due to the need for Xen support); 
can be retrofitted to virtio at some cost

Host side:
1. venet host kernel driver - is duplicated by vhost-net; doesn't 
support live migration, unprivileged users, or slirp
2. vbus host driver (config and hotplug) - duplicates pci support in 
userspace (which will need to be kept in any case); already has two 
userspace interfaces
3. vbus host driver (interrupt coalescing, priority) - if we think we 
need it (and I don't), should be part of kvm core, not a bus
4. vbus host driver (shm) - partially duplicated by vhost memory slots
5. vbus host driver (ioq) - duplicates userspace virtio, duplicated by vhost

>>> There is no rewriting.  vbus has no equivalent today as virtio doesn't
>>> define these layers.
>>>
>>> By your own admission, you said if you wanted that capability, use a
>>> library.  What I think you are not understanding is vbus _is_ that
>>> library.  So what is the problem, exactly?
>>>
>>>        
>> It's not compatible.
>>      
> No, that is incorrect.  What you are apparently not understanding is
> that not only is vbus that library, but its extensible.  So even if
> compatibility is your goal (it doesn't need to be IMO) it can be
> accommodated by how you interface to the library.
>    

To me, compatible means I can live migrate an image to a new system 
without the user knowing about the change.  You'll be able to do that 
with vhost-net.

>>>>
>>>>          
>>> No, it does not.  vbus just needs a relatively simple single message
>>> pipe between the guest and host (think "hypercall tunnel", if you will).
>>>
>>>        
>> That's ioeventfd.  So far so similar.
>>      
> No, that is incorrect.  For one, vhost uses them on a per-signal path
> basis, whereas vbus only has one channel for the entire guest->host.
>    

You'll probably need to change that as you start running smp guests.

> Second, I do not use ioeventfd anymore because it has too many problems
> with the surrounding technology.  However, that is a topic for a
> different thread.
>    

Please post your issues.  I see ioeventfd/irqfd as critical kvm interfaces.

>> vbus devices aren't magically instantiated.  Userspace needs to
>> instantiate them too.  Sure, there's less work on the host side since
>> you're using vbus instead of the native interface, but more work on the
>> guest side since you're using vbus instead of the native interface.
>>      
>
> No, that is incorrect.  The amount of "work" that a guest does is
> actually the same in both cases, since the guest OS performs the hotswap
> handling natively for all bus types (at least for Linux and Windows).
> You still need to have a PV layer to interface with those objects in
> both cases, as well, so there is no such thing as "native interface" for
> PV.  Its only a matter of where it occurs in the stack.
>    

I'm missing something.  Where's the pv layer for virtio-net?

Linux drivers have an abstraction layer to deal with non-pci.  But the 
Windows drivers are ordinary pci drivers with nothing that looks 
pv-ish.  You could implement virtio-net hardware if you wanted to.

>>   non-privileged-user capable?
>>      
> The short answer is "not yet (I think)".  I need to write a patch to
> properly set the mode attribute in sysfs, but I think this will be trivial.
>
>    

(and selinux label)

>> Ah, so you have two control planes.
>>      
> So what?  If anything, it goes to show how extensible the framework is
> that a new plane could be added in 119 lines of code:
>
> ~/git/linux-2.6>  stg show vbus-add-admin-ioctls.patch | diffstat
>   Makefile       |    3 -
>   config-ioctl.c |  117
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 119 insertions(+), 1 deletion(-)
>
> if and when having two control planes exceeds its utility, I will submit
> a simple patch that removes the useless one.
>    

It always begins with a 119-line patch and then grows, that's life.

>> kvm didn't have an existing counterpart in Linux when it was
>> proposed/merged.
>>      
> And likewise, neither does vbus.
>
>    

For virt uses, I don't see the need.  For non-virt, I have no opinion.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 1/3] wireless: implement basic ethtool support for cfg80211 devices
From: Johannes Berg @ 2009-10-01  8:51 UTC (permalink / raw)
  To: John W. Linville
  Cc: linux-wireless, netdev, Kalle Valo, Kalle Valo, Luis R. Rodriguez
In-Reply-To: <1254359942-3483-1-git-send-email-linville@tuxdriver.com>


On Wed, 2009-09-30 at 21:19 -0400, John W. Linville wrote:

> +		if (!dev->ethtool_ops)
> +			dev->ethtool_ops = &cfg80211_ethtool_ops;
>  		break;

I might go so far as to do it unconditionally so we get consistent
functionality across things. OTOH, full-mac drivers might be able to
support more.

> +const struct ethtool_ops cfg80211_ethtool_ops = {
> +	.get_drvinfo = cfg80211_get_drvinfo,
> +	.get_link = ethtool_op_get_link,
> +};
> +
> +void cfg80211_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)

if you change the order, you can make the latter static
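
A minimal userspace sketch of that reordering follows.  The demo_* types
and names are hypothetical stand-ins for struct ethtool_drvinfo, struct
ethtool_ops and the cfg80211 functions, not the real kernel definitions.
Because the callback is defined before the table that references it, it
needs no forward declaration and can be static.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct ethtool_drvinfo. */
struct demo_drvinfo {
	char driver[32];
};

/* Hypothetical stand-in for struct ethtool_ops. */
struct demo_ops {
	void (*get_drvinfo)(const char *name, struct demo_drvinfo *info);
};

/* Defined first, so nothing outside this file has to see it. */
static void demo_get_drvinfo(const char *name, struct demo_drvinfo *info)
{
	assert(name != NULL && info != NULL);

	/* snprintf() truncates if needed and NUL-terminates (size > 0). */
	snprintf(info->driver, sizeof(info->driver), "%s", name);
}

/* Only the table is exported; it refers to the static function above. */
const struct demo_ops demo_ethtool_ops = {
	.get_drvinfo = demo_get_drvinfo,
};
```

Callers go through the table, e.g. demo_ethtool_ops.get_drvinfo(name, &info),
so the function itself never has to be visible outside the file.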

johannes


^ permalink raw reply

* tg3: bug report, driver freeze (transmit timed out), ifdown+ifup makes interface work again
From: Jesper Dangaard Brouer @ 2009-10-01  8:34 UTC (permalink / raw)
  To: Michael Chan, Matt Carlson
  Cc: netdev@vger.kernel.org, sander.contrib, David S. Miller


A friend of mine is experiencing problems with his tg3 based NIC.  The
network stops working (transmit timed out), and he had to access the
console to get it working again.

Kernel: 2.6.26-2-686 (standard Debian package)
OS: Debian Lenny 5.0 (all upgrades)

Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 12)
 Subsystem: Dell Broadcom BCM5700
 eth1: Tigon3 [partno(none) rev 7102 PHY(5401)]

Is this a known issue? (If so, in which kernel is it fixed, so that I can
have him test it?)

Cite:
According to the kernel log the tg3 driver tries to reset itself.
However, even though it looks like the interface is up, it is not!

A manual ifdown eth1 && ifup eth1 does the trick.

According to my rtorrent I had used about 4GB of traffic (combined
down/up)..  so a qualified guess could be a 32-bit limitation in the
tg3-driver?
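
For reference, the arithmetic behind that guess can be sketched as below.
This is only an illustration with a hypothetical counter; it is not taken
from the tg3 driver and does not show that the driver has such a counter.
An unsigned 32-bit byte counter wraps back to zero after exactly 4 GiB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit byte counter; not code from the tg3 driver. */
static uint32_t count_bytes_u32(uint32_t counter, uint64_t nbytes)
{
	/* The sum is computed in 64 bits, then truncated modulo 2^32. */
	return (uint32_t)(counter + nbytes);
}

/* Adding exactly 4 GiB to the counter brings it back to its start. */
static void demo_wrap(void)
{
	uint32_t c = 0;

	c = count_bytes_u32(c, 4ULL << 30);	/* 4 GiB = 2^32 bytes */
	assert(c == 0);
}
```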


Server specs:
 DELL PowerEdge 2550
 2 GB Ram
 2x1 Ghz Pentium III (Coppermine)


Sep 30 11:45:46 samurai kernel: [1145615.063992] NETDEV WATCHDOG: eth1: transmit timed out
Sep 30 11:45:46 samurai kernel: [1145615.064028] tg3: eth1: transmit timed out, resetting
Sep 30 11:45:46 samurai kernel: [1145615.064052] tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000008]
Sep 30 11:45:46 samurai kernel: [1145615.064078] tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
Sep 30 11:45:46 samurai kernel: [1145615.064119] ------------[ cut here]------------
Sep 30 11:45:46 samurai kernel: [1145615.064141] WARNING: at net/sched/sch_generic.c:222 dev_watchdog+0x8f/0xdc()
Sep 30 11:45:46 samurai kernel: [1145615.064174] Modules linked in: iptable_mangle iptable_nat nf_nat ipt_LOG nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables ipv6 dm_snapshot dm_mirror dm_log dm_mod loop parport_pc parport evdev psmouse snd_pcm snd_timer snd soundcore snd_page_alloc serio_raw pcspkr shpchp pci_hotplug i2c_piix4 i2c_core button sworks_agp agpgart dcdbas ext3 jbd mbcache sg sd_mod ide_cd_mod cdrom ide_pci_generic serverworks ide_core floppy aacraid aic7xxx scsi_transport_spi ata_generic e100 ohci_hcd libata scsi_mod dock tg3 usbcore 8139cp 8139too mii thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Sep 30 11:45:46 samurai kernel: [1145615.064517] Pid: 0, comm: swapper Not tainted 2.6.26-2-686 #1
Sep 30 11:45:46 samurai kernel: [1145615.064549]  [<c01225f3>] warn_on_slowpath+0x40/0x66
Sep 30 11:45:46 samurai kernel: [1145615.064594]  [<c0119160>] hrtick_start_fair+0xeb/0x12c
Sep 30 11:45:46 samurai kernel: [1145615.064635]  [<c0118926>] enqueue_task+0x52/0x5d
Sep 30 11:45:46 samurai kernel: [1145615.064663]  [<c011894c>] activate_task+0x1b/0x26
Sep 30 11:45:46 samurai kernel: [1145615.064690]  [<c011b6f3>] try_to_wake_up+0xe8/0xf1
Sep 30 11:45:46 samurai kernel: [1145615.064723]  [<c01319a9>] autoremove_wake_function+0xd/0x2d
Sep 30 11:45:46 samurai kernel: [1145615.064760]  [<c01184d1>] __wake_up_common+0x2e/0x58
Sep 30 11:45:46 samurai kernel: [1145615.064792]  [<c011a6bb>] __wake_up+0x29/0x39
Sep 30 11:45:46 samurai kernel: [1145615.064822]  [<c012f11f>] insert_work+0x58/0x5c
Sep 30 11:45:46 samurai kernel: [1145615.064849]  [<c012f40d>] __queue_work+0x1c/0x28
Sep 30 11:45:46 samurai kernel: [1145615.064876]  [<c012f468>] queue_work+0x33/0x3c
Sep 30 11:45:46 samurai kernel: [1145615.064903]  [<c0267035>] dev_watchdog+0x8f/0xdc
Sep 30 11:45:46 samurai kernel: [1145615.064930]  [<c01296d4>] run_timer_softirq+0x11a/0x17c
Sep 30 11:45:46 samurai kernel: [1145615.064960]  [<c0266fa6>] dev_watchdog+0x0/0xdc
Sep 30 11:45:46 samurai kernel: [1145615.064993]  [<c01265f5>] __do_softirq+0x66/0xd3
Sep 30 11:45:46 samurai kernel: [1145615.065022]  [<c01266a7>] do_softirq+0x45/0x53
Sep 30 11:45:46 samurai kernel: [1145615.065047]  [<c012695e>] irq_exit+0x35/0x67
Sep 30 11:45:46 samurai kernel: [1145615.065070]  [<c01101c9>] smp_apic_timer_interrupt+0x6b/0x76
Sep 30 11:45:46 samurai kernel: [1145615.065098]  [<c0102656>] default_idle+0x0/0x53
Sep 30 11:45:46 samurai kernel: [1145615.065127]  [<c0104364>] apic_timer_interrupt+0x28/0x30
Sep 30 11:45:46 samurai kernel: [1145615.065156]  [<c0102656>] default_idle+0x0/0x53
Sep 30 11:45:46 samurai kernel: [1145615.065189]  [<c0114d78>] native_safe_halt+0x2/0x3
Sep 30 11:45:46 samurai kernel: [1145615.065225]  [<c0102683>] default_idle+0x2d/0x53
Sep 30 11:45:46 samurai kernel: [1145615.065250]  [<c01025ce>] cpu_idle+0xab/0xcb
Sep 30 11:45:46 samurai kernel: [1145615.065291]  =======================
Sep 30 11:45:46 samurai kernel: [1145615.065311] ---[ end trace 0dbb94f68d53053b ]---
Sep 30 11:45:46 samurai kernel: [1145615.457820] tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
Sep 30 11:45:46 samurai kernel: [1145615.557909] tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
Sep 30 11:45:46 samurai kernel: [1145615.657903] tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
Sep 30 11:45:46 samurai kernel: [1145615.758203] tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
Sep 30 11:45:47 samurai kernel: [1145615.858203] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
Sep 30 11:45:47 samurai kernel: [1145615.958203] tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
Sep 30 11:45:47 samurai kernel: [1145616.089213] tg3: eth1: Link is down.
Sep 30 11:45:49 samurai kernel: [1145618.565251] tg3: eth1: Link is up at 100 Mbps, full duplex.
Sep 30 11:45:49 samurai kernel: [1145618.565288] tg3: eth1: Flow control is off for TX and off for RX.

Sep 30 14:02:09 samurai kernel: [1154721.802641] NETDEV WATCHDOG: eth1: transmit timed out
Sep 30 14:02:09 samurai kernel: [1154721.802679] tg3: eth1: transmit timed out, resetting
Sep 30 14:02:09 samurai kernel: [1154721.802702] tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000008]
Sep 30 14:02:09 samurai kernel: [1154721.802729] tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
Sep 30 14:02:09 samurai kernel: [1154721.974663] tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
Sep 30 14:02:09 samurai kernel: [1154722.078613] tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
Sep 30 14:02:09 samurai kernel: [1154722.206614] tg3: eth1: Link is down.
Sep 30 14:02:11 samurai kernel: [1154724.209290] tg3: eth1: Link is up at 100 Mbps, full duplex.
Sep 30 14:02:11 samurai kernel: [1154724.209328] tg3: eth1: Flow control is off for TX and off for RX.

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk
  LinkedIn: http://www.linkedin.com/in/brouer

lspci -vvv
01:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 12)
        Subsystem: Dell Broadcom BCM5700
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32 (16000ns min), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at feb00000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=1
                Status: Dev=ff:1f.1 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=1 DMCRS=8 RSCEM- 266MHz- 533MHz-
        Capabilities: [48] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data <?>
        Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-
                Address: da6771daee5b44a4  Data: 889a
        Kernel driver in use: tg3
        Kernel modules: tg3


ethtool -i eth1:
driver: tg3
version: 3.92.1
firmware-version:
bus-info: 0000:01:08.0

Sep 18 22:34:19 samurai kernel: [ 4.707217] eth1: Tigon3 [partno(none) rev 7102 PHY(5401)] (PCI:66MHz:64-bit) 10/100/1000Base-T Ethernet 00:06:5b:39:d3:4a
Sep 18 22:34:19 samurai kernel: [ 4.707217] eth1: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] WireSpeed[0] TSOcap[0]
Sep 18 22:34:19 samurai kernel: [ 4.707217] eth1: dma_rwctrl[76ff000f] dma_mask[64-bit]

^ permalink raw reply

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Michael S. Tsirkin @ 2009-10-01  9:28 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Gregory Haskins, Ira W. Snyder, netdev, virtualization, kvm,
	linux-kernel, mingo, linux-mm, akpm, hpa, Rusty Russell, s.hetze,
	alacrityvm-devel
In-Reply-To: <4AC46989.7030502@redhat.com>

On Thu, Oct 01, 2009 at 10:34:17AM +0200, Avi Kivity wrote:
>> Second, I do not use ioeventfd anymore because it has too many problems
>> with the surrounding technology.  However, that is a topic for a
>> different thread.
>>    
>
> Please post your issues.  I see ioeventfd/irqfd as critical kvm interfaces.

I second that. AFAIK ioeventfd/irqfd got exposed to userspace in 2.6.32-rc1,
so if there are issues we had better nail them before 2.6.32 is out.
And yes, please start a different thread.

-- 
MST


^ permalink raw reply
