Linux virtualization list
 help / color / mirror / Atom feed
* question about napi_disable (was Re: [PATCH] virtio_net: set/cancel work on ndo_open/ndo_stop)
From: Michael S. Tsirkin @ 2012-04-04  9:32 UTC (permalink / raw)
  To: Rusty Russell; +Cc: kvm, netdev, virtualization, Amit Shah, David Miller
In-Reply-To: <87pqf7fyld.fsf@rustcorp.com.au>

On Thu, Dec 29, 2011 at 09:12:38PM +1030, Rusty Russell wrote:
> Michael S. Tsirkin noticed that we could run the refill work after
> ndo_close, which can re-enable napi - we don't disable it until
> virtnet_remove.  This is clearly wrong, so move the workqueue control
> to ndo_open and ndo_stop (aka. virtnet_open and virtnet_close).
> 
> One subtle point: virtnet_probe() could simply fail if it couldn't
> allocate a receive buffer, but that's less polite in virtnet_open() so
> we schedule a refill as we do in the normal receive path if we run out
> of memory.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Doh.
napi_disable does not prevent the following
napi_schedule, does it?

Can someone confirm that I am not seeing things please?

And this means this hack does not work:
try_fill_recv can still run in parallel with
napi, corrupting the vq.

I suspect we need to resurrect a patch that used a
dedicated flag to avoid this race.

Comments?

> ---
>  drivers/net/virtio_net.c |   17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -439,7 +439,13 @@ static int add_recvbuf_mergeable(struct 
>  	return err;
>  }
>  
> -/* Returns false if we couldn't fill entirely (OOM). */
> +/*
> + * Returns false if we couldn't fill entirely (OOM).
> + *
> + * Normally run in the receive path, but can also be run from ndo_open
> + * before we're receiving packets, or from refill_work which is
> + * careful to disable receiving (using napi_disable).
> + */
>  static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  {
>  	int err;
> @@ -719,6 +725,10 @@ static int virtnet_open(struct net_devic
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
>  
> +	/* Make sure we have some buffers: if oom use wq. */
> +	if (!try_fill_recv(vi, GFP_KERNEL))
> +		schedule_delayed_work(&vi->refill, 0);
> +
>  	virtnet_napi_enable(vi);
>  	return 0;
>  }
> @@ -772,6 +782,8 @@ static int virtnet_close(struct net_devi
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
>  
> +	/* Make sure refill_work doesn't re-enable napi! */
> +	cancel_delayed_work_sync(&vi->refill);
>  	napi_disable(&vi->napi);
>  
>  	return 0;
> @@ -1082,7 +1094,6 @@ static int virtnet_probe(struct virtio_d
>  
>  unregister:
>  	unregister_netdev(dev);
> -	cancel_delayed_work_sync(&vi->refill);
>  free_vqs:
>  	vdev->config->del_vqs(vdev);
>  free_stats:
> @@ -1121,9 +1132,7 @@ static void __devexit virtnet_remove(str
>  	/* Stop all the virtqueues. */
>  	vdev->config->reset(vdev);
>  
> -
>  	unregister_netdev(vi->dev);
> -	cancel_delayed_work_sync(&vi->refill);
>  
>  	/* Free unused buffers in both send and recv, if any. */
>  	free_unused_bufs(vi);

^ permalink raw reply

* [PATCH RFC] virtio-net: remove useless disable on freeze
From: Michael S. Tsirkin @ 2012-04-04  9:19 UTC (permalink / raw)
  To: netdev; +Cc: Amit Shah, linux-kernel, kvm, virtualization

disable_cb is just an optimization: it
can not guarantee that there are no callbacks.

I didn't yet figure out whether a callback
in freeze will trigger a bug, but disable_cb
won't address it in any case. So let's remove
the useless calls as a first step.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 019da01..971931e5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1182,11 +1182,6 @@ static int virtnet_freeze(struct virtio_device *vdev)
 {
 	struct virtnet_info *vi = vdev->priv;
 
-	virtqueue_disable_cb(vi->rvq);
-	virtqueue_disable_cb(vi->svq);
-	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ))
-		virtqueue_disable_cb(vi->cvq);
-
 	netif_device_detach(vi->dev);
 	cancel_delayed_work_sync(&vi->refill);
 
-- 
1.7.9.111.gf3fb0

^ permalink raw reply related

* Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming
From: Michael S. Tsirkin @ 2012-04-04  8:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, KVM, SCSI, LKML, VIRTUAL, James Bottomley,
	Ren Mingxin
In-Reply-To: <20120402190045.GC16226@dhcp-172-17-108-109.mtv.corp.google.com>

On Mon, Apr 02, 2012 at 12:00:45PM -0700, Tejun Heo wrote:
> Hello, James.
> 
> On Mon, Apr 02, 2012 at 11:56:18AM -0700, James Bottomley wrote:
> > So if we're agreed no other devices going forwards should ever use this
> > interface, is there any point unifying the interface?  No matter how
> > many caveats you hedge it round with, putting the API in a central place
> > will be a bit like a honey trap for careless bears.   It might be safer
> > just to leave it buried in the three current drivers.
> 
> Yeah, that was my hope but I think it would be easier to enforce to
> have a common function which is clearly marked legacy so that new
> driver writers can go look for the naming code in the existing ones,
> find out they're all using the same function which is marked legacy
> and explains what to do for newer drivers.
> 
> Thanks.

I think I'm not the only one to be confused about the
preferred direction here.
James, do you agree to the approach above?

It would be nice to fix virtio block for 3.4, so
how about this:
- I'll just apply the original bugfix patch for 3.4 -
  it only affects virtio
- Ren will repost the refactoring patch on top, and we can
  keep up the discussion

Ren if you agree, can you make this a two patch series please?

> -- 
> tejun
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [V6 PATCH] virtio-net: send gratuitous packets when needed
From: Michael S. Tsirkin @ 2012-04-04  7:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, qemu-devel, linux-kernel, virtualization
In-Reply-To: <20120403.174905.769295608759578804.davem@davemloft.net>

On Tue, Apr 03, 2012 at 05:49:05PM -0400, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Wed, 28 Mar 2012 13:44:28 +0800
> 
> > As hypervior does not have the knowledge of guest network configuration, it's
> > better to ask guest to send gratuitous packets when needed.
> > 
> > Guest tests VIRTIO_NET_S_ANNOUNCE bit during config change interrupt and when it
> > is set, a workqueue is scheduled to send gratuitous packet through
> > NETDEV_NOTIFY_PEERS. This feature is negotiated through bit
> > VIRTIO_NET_F_GUEST_ANNOUNCE.
>  ...
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> Michael and others, what's the story with this patch?

It's a new feature so it will be 3.5 material when it's ready.
Sent some comments for now.

Thanks,
MST


> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [V6 PATCH] virtio-net: send gratuitous packets when needed
From: Michael S. Tsirkin @ 2012-04-04  7:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, qemu-devel, linux-kernel, Amit Shah, virtualization,
	davem
In-Reply-To: <20120328054428.34415.75432.stgit@amd-6168-8-1.englab.nay.redhat.com>

On Wed, Mar 28, 2012 at 01:44:28PM +0800, Jason Wang wrote:
> As hypervior does not have the knowledge of guest network configuration, it's
> better to ask guest to send gratuitous packets when needed.
> 
> Guest tests VIRTIO_NET_S_ANNOUNCE bit during config change interrupt and when it
> is set, a workqueue is scheduled to send gratuitous packet through
> NETDEV_NOTIFY_PEERS. This feature is negotiated through bit
> VIRTIO_NET_F_GUEST_ANNOUNCE.
> 
> Changes from v5:
> - notify the chain before acking the link annoucement
> - ack the link announcement notification through control vq
> 
> Changes from v4:
> - typos
> - handle workqueue unconditionally
> - move VIRTIO_NET_S_ANNOUNCE to bit 8 to separate rw bits from ro bits
> 
> Changes from v3:
> - cancel the workqueue during freeze
> 
> Changes from v2:
> - fix the race between unregister_dev() and workqueue
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I think this needs some fixes. See below.
Thanks.

> ---
>  drivers/net/virtio_net.c   |   32 +++++++++++++++++++++++++++++++-
>  include/linux/virtio_net.h |   13 +++++++++++++
>  2 files changed, 44 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 4880aa8..0f60da7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -72,6 +72,9 @@ struct virtnet_info {
>  	/* Work struct for refilling if we run low on memory. */
>  	struct delayed_work refill;
>  
> +	/* Work struct for sending gratuitous packets. */

A bit confusing: it does not send anything itself.
A better comment 'for announcing the existence of device
on the network'.

> +	struct work_struct announce;
> +
>  	/* Chain pages by the private ptr. */
>  	struct page *pages;
>  
> @@ -781,12 +784,30 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>  	return status == VIRTIO_NET_OK;
>  }
>  
> +static void virtnet_ack_link_announce(struct virtnet_info *vi)
> +{
> +	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_ANNOUNCE,
> +				  VIRTIO_NET_CTRL_ANNOUNCE_ACK, NULL,
> +				  0, 0)) {

This can run in parallel with other commands. That's pretty bad -
will corrupt the cvq.
Take rtnl lock around calls to this function?

> +		dev_warn(&vi->dev->dev, "Failed to ack link nnounce.\n");

announce

> +	}

Better to drop {} around a single statement.

> +}
> +
> +static void announce_work(struct work_struct *work)
> +{
> +	struct virtnet_info *vi = container_of(work, struct virtnet_info,
> +					       announce);
> +	netif_notify_peers(vi->dev);
> +	virtnet_ack_link_announce(vi);
> +}
> +
>  static int virtnet_close(struct net_device *dev)
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
>  
>  	/* Make sure refill_work doesn't re-enable napi! */
>  	cancel_delayed_work_sync(&vi->refill);
> +	cancel_work_sync(&vi->announce);

I think that a config change event can trigger after this point,
so cancel here won't be effective. But why do it here?
virtnet_remove not a better place? We can do it
after remove_vq_common.

>  	napi_disable(&vi->napi);
>  
>  	return 0;
> @@ -962,11 +983,17 @@ static void virtnet_update_status(struct virtnet_info *vi)
>  		return;
>  
>  	/* Ignore unknown (future) status bits */
> -	v &= VIRTIO_NET_S_LINK_UP;
> +	v &= VIRTIO_NET_S_LINK_UP | VIRTIO_NET_S_ANNOUNCE;

The announce bit in vi->status is always clear,
so this is IMO confusing. I would do:

	if (v & VIRTIO_NET_S_ANNOUNCE) {
		schedule
	}

	v &= VIRTIO_NET_S_LINK_UP;

and the rest of the logic is not necessary then.

Also, this might run extra ack announce commands after
VIRTIO_NET_S_ANNOUNCE bit is clear. The spec
isn't clear on whether this is legal.

It would be very hard to fix this, so let's add a comment
stating that it's legal, and clarify the spec
in any case.


>  
>  	if (vi->status == v)
>  		return;
>  
> +	if (v & VIRTIO_NET_S_ANNOUNCE) {
> +		v &= ~VIRTIO_NET_S_ANNOUNCE;
> +		if (v & VIRTIO_NET_S_LINK_UP)
> +			schedule_work(&vi->announce);

I think we really want an nrt wq here - if this triggers
multiple times there's no good reason to try and run
the ack command many times in parallel.

> +	}
> +
>  	vi->status = v;
>  
>  	if (vi->status & VIRTIO_NET_S_LINK_UP) {


> @@ -1076,6 +1103,7 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		goto free;
>  
>  	INIT_DELAYED_WORK(&vi->refill, refill_work);
> +	INIT_WORK(&vi->announce, announce_work);
>  	sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
>  	sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
>  
> @@ -1187,6 +1215,7 @@ static int virtnet_freeze(struct virtio_device *vdev)
>  	virtqueue_disable_cb(vi->svq);
>  	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ))
>  		virtqueue_disable_cb(vi->cvq);
> +	cancel_work_sync(&vi->announce);

A config change event can trigger after this point,
so cancel here won't be effective.
Possibly, we need state like config_enable in the block
device.

Also, what exactly will happen on suspend?
As we reset, ANNOUCE bit will be clear so -
do we forget to announce? Probably not good ...

>  
>  	netif_device_detach(vi->dev);
>  	cancel_delayed_work_sync(&vi->refill);
> @@ -1233,6 +1262,7 @@ static unsigned int features[] = {
>  	VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
>  	VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
>  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
> +	VIRTIO_NET_F_GUEST_ANNOUNCE,
>  };
>  
>  static struct virtio_driver virtio_net_driver = {
> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> index 970d5a2..383e8a0 100644
> --- a/include/linux/virtio_net.h
> +++ b/include/linux/virtio_net.h
> @@ -49,8 +49,10 @@
>  #define VIRTIO_NET_F_CTRL_RX	18	/* Control channel RX mode support */
>  #define VIRTIO_NET_F_CTRL_VLAN	19	/* Control channel VLAN filtering */
>  #define VIRTIO_NET_F_CTRL_RX_EXTRA 20	/* Extra RX mode control support */
> +#define VIRTIO_NET_F_GUEST_ANNOUNCE 21  /* Guest can send gratituous packet */

Rusty aligned the comments using tabs so let's do it here too?
A better comment 'can announce device on the network'.

>  
>  #define VIRTIO_NET_S_LINK_UP	1	/* Link is up */
> +#define VIRTIO_NET_S_ANNOUNCE   2	/* Announcement is needed */

why 3 spaces before the value here?

>  
>  struct virtio_net_config {
>  	/* The config defining mac address (if VIRTIO_NET_F_MAC) */
> @@ -152,4 +154,15 @@ struct virtio_net_ctrl_mac {
>   #define VIRTIO_NET_CTRL_VLAN_ADD             0
>   #define VIRTIO_NET_CTRL_VLAN_DEL             1
>  
> +/*
> + * Control link announce acknowledgement
> + *
> + * The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
> + * driver has recevied the notification and device would clear the

s/recevied/received/

A bit clearer to replace 'and' with ; here.

> + * VIRTIO_NET_S_ANNOUNCE bit in the status filed after it received

s/filed/field/
s/received/receives/

> + * this command.
> + */
> +#define VIRTIO_NET_CTRL_ANNOUNCE       3
> + #define VIRTIO_NET_CTRL_ANNOUNCE_ACK         0
> +
>  #endif /* _LINUX_VIRTIO_NET_H */
> 
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* [PATCH] virtio-spec: add an rpmsg appendix
From: Ohad Ben-Cohen @ 2012-04-04  7:24 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, linux-arm-kernel, virtualization

Add an rpmsg appendix to the virtio spec and tag it as Appendix H.

Push the SCSI host appendix to 'I' so the appendices order stays
according to the virtio device ids.

Patch is against:
http://ozlabs.org/~rusty/virtio-spec/virtio-0.9.4.lyx

Tested with LyX 2.0.0.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: <virtualization@lists.linux-foundation.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: <linux-arm-kernel@lists.infradead.org>
---
These changes were produced using lyx: they look reasonable when viewing as a PDF,
but I didn't reviewed the actual .lyx code, so I hope they make sense.

PS. Congrats, Rusty!

 virtio-0.9.4.lyx |  489 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 486 insertions(+), 3 deletions(-)

diff --git a/virtio-0.9.4.lyx b/virtio-0.9.4.lyx
index 6c7bab1..f377294 100644
--- a/virtio-0.9.4.lyx
+++ b/virtio-0.9.4.lyx
@@ -56,6 +56,7 @@
 \html_math_output 0
 \html_css_as_file 0
 \html_be_strict false
+\author -1096565211 "Ohad,,," 
 \author -608949062 "Rusty Russell,,," 
 \author 1531152142 "pbonzini" 
 \end_header
@@ -573,8 +574,12 @@ rpmsg
 
 \begin_layout Plain Layout
 
-\change_inserted -608949062 1323409055
+\change_inserted -1096565211 1333519287
+Appendix H
+\change_deleted -1096565211 1333521034
 -
+\change_inserted -608949062 1323409055
+
 \end_layout
 
 \end_inset
@@ -609,7 +614,13 @@ SCSI host
 \begin_layout Plain Layout
 
 \change_inserted 1531152142 1322650861
-Appendix H
+Appendix 
+\change_deleted -1096565211 1333519291
+H
+\change_inserted -1096565211 1333519291
+I
+\change_inserted 1531152142 1322650861
+
 \end_layout
 
 \end_inset
@@ -6332,12 +6343,484 @@ VIRTIO_BALLOON_S_MEMFREE The amount of memory not being used for any purpose
 
 \begin_layout Description
 VIRTIO_BALLOON_S_MEMTOT The total amount of memory available (in bytes).
+\change_inserted -1096565211 1333519339
+
+\end_layout
+
+\begin_layout Chapter*
+
+\change_inserted -1096565211 1333519459
+Appendix H: Rpmsg: Remote Processor Messaging
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted -1096565211 1333519548
+Virtio rpmsg devices represent remote processors on the system which run
+ in asymmetric multi-processing (AMP) configuration, and which are usually
+ used to offload cpu-intensive tasks from the main application processor
+ (a typical SoC methodology).
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted -1096565211 1333519483
+Virtio is being used to communicate with those remote processors; empty
+ buffers are placed in one virtqueue for receiving messages, and non-empty
+ buffers, containing outbound messages, are enqueued in a second virtqueue
+ for transmission.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted -1096565211 1333519614
+Numerous communication channels can be multiplexed over those two virtqueues,
+ so different entities, running on the application and remote processor,
+ can directly communicate in a point-to-point fashion.
+\end_layout
+
+\begin_layout Section*
+
+\change_inserted -1096565211 1333519460
+Configuration
+\end_layout
+
+\begin_layout Description
+
+\change_inserted -1096565211 1333519622
+Subsystem
+\begin_inset space ~
+\end_inset
+
+Device
+\begin_inset space ~
+\end_inset
+
+ID 7
+\end_layout
+
+\begin_layout Description
+
+\change_inserted -1096565211 1333519647
+Virtqueues 0:receiveq.
+ 1:transmitq.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted -1096565211 1333519460
+Feature
+\begin_inset space ~
+\end_inset
+
+bits
+\end_layout
+
+\begin_deeper
+\begin_layout Description
+
+\change_inserted -1096565211 1333520994
+VIRTIO_RPMSG_F_NS
+\begin_inset space ~
+\end_inset
+
+(0) Device sends (and capable of receiving) name service messages announcing
+ the creation (or destruction) of a channel:
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+/**
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ * struct rpmsg_ns_msg - dynamic name service announcement message
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ * @name: name of remote service that is published
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ * @addr: address of remote service that is published
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ * @flags: indicates whether service is created or destroyed
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ *
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ * This message is sent across to publish a new service (or announce
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ * about its removal).
+ When we receives these messages, an appropriate
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ * rpmsg channel (i.e device) is created/destroyed.
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+struct rpmsg_ns_msgoon_config {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+	char name[RPMSG_NAME_SIZE];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+	u32 addr;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+	u32 flags;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+} __packed;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+/**
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ * enum rpmsg_ns_flags - dynamic name service announcement flags
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ *
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ * @RPMSG_NS_CREATE: a new remote service was just created
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ * @RPMSG_NS_DESTROY: a remote service was just destroyed
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+ */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+enum rpmsg_ns_flags {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+	RPMSG_NS_CREATE = 0,
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+	RPMSG_NS_DESTROY = 1,
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520994
+
+};
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\end_deeper
+\begin_layout Description
+
+\change_inserted -1096565211 1333520275
+Device
+\begin_inset space ~
+\end_inset
+
+configuration
+\begin_inset space ~
+\end_inset
+
+layout
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted -1096565211 1333521209
+At his point none currently defined.
+\end_layout
+
+\begin_layout Section*
+
+\change_inserted -1096565211 1333519460
+Device Initialization
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted -1096565211 1333520334
+The initialization routine should identify the receive and transmission
+ virtqueues.
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted -1096565211 1333520335
+The receive virtqueue should be filled with receive buffers.
+\end_layout
+
+\begin_layout Section*
+
+\change_inserted -1096565211 1333520381
+Device Operation
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted -1096565211 1333520426
+Messages are transmitted by placing them in the transmitq, and buffers for
+ inbound messages are placed in the receiveq.
+ In any case, messages are always preceded by the following header: 
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520439
+
+/**
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520443
+
+ * struct rpmsg_hdr - common header for all rpmsg messages
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520447
+
+ * @src: source address
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520450
+
+ * @dst: destination address
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520453
+
+ * @reserved: reserved for future use
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520456
+
+ * @len: length of payload (in bytes)
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520459
+
+ * @flags: message flags
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520462
+
+ * @data: @len bytes of message payload data
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520464
+
+ *
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520467
+
+ * Every message sent(/received) on the rpmsg bus begins with this header.
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520471
+
+ */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520475
+
+struct rpmsg_hdr {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520477
+
+	u32 src;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520480
+
+	u32 dst;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520481
+
+	u32 reserved;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520484
+
+	u16 len;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520486
+
+	u16 flags;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520489
+
+	u8 data[0];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1096565211 1333520487
+
+} __packed;
+\end_layout
+
+\end_inset
+
+
 \end_layout
 
 \begin_layout Chapter*
 
 \change_inserted 1531152142 1322571716
-Appendix H: SCSI Host Device
+Appendix 
+\change_deleted -1096565211 1333519318
+H
+\change_inserted -1096565211 1333519319
+I
+\change_inserted 1531152142 1322571716
+: SCSI Host Device
 \end_layout
 
 \begin_layout Standard
-- 
1.7.5.4

^ permalink raw reply related

* Re: [V6 PATCH] virtio-net: send gratuitous packets when needed
From: David Miller @ 2012-04-03 21:49 UTC (permalink / raw)
  To: jasowang; +Cc: mst, netdev, qemu-devel, virtualization, linux-kernel
In-Reply-To: <20120328054428.34415.75432.stgit@amd-6168-8-1.englab.nay.redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Wed, 28 Mar 2012 13:44:28 +0800

> As hypervior does not have the knowledge of guest network configuration, it's
> better to ask guest to send gratuitous packets when needed.
> 
> Guest tests VIRTIO_NET_S_ANNOUNCE bit during config change interrupt and when it
> is set, a workqueue is scheduled to send gratuitous packet through
> NETDEV_NOTIFY_PEERS. This feature is negotiated through bit
> VIRTIO_NET_F_GUEST_ANNOUNCE.
 ...
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Michael and others, what's the story with this patch?

^ permalink raw reply

* Re: [PATCH] virtio_blk: Drop unused request tracking list
From: Asias He @ 2012-04-02 23:49 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, virtualization
In-Reply-To: <20120401100737.GE22071@redhat.com>

On 04/01/2012 06:07 PM, Michael S. Tsirkin wrote:
> On Fri, Mar 30, 2012 at 11:24:10AM +0800, Asias He wrote:
>> Benchmark shows small performance improvement on fusion io device.
>>
>> Before:
>>    seq-read : io=1,024MB, bw=19,982KB/s, iops=39,964, runt= 52475msec
>>    seq-write: io=1,024MB, bw=20,321KB/s, iops=40,641, runt= 51601msec
>>    rnd-read : io=1,024MB, bw=15,404KB/s, iops=30,808, runt= 68070msec
>>    rnd-write: io=1,024MB, bw=14,776KB/s, iops=29,552, runt= 70963msec
>>
>> After:
>>    seq-read : io=1,024MB, bw=20,343KB/s, iops=40,685, runt= 51546msec
>>    seq-write: io=1,024MB, bw=20,803KB/s, iops=41,606, runt= 50404msec
>>    rnd-read : io=1,024MB, bw=16,221KB/s, iops=32,442, runt= 64642msec
>>    rnd-write: io=1,024MB, bw=15,199KB/s, iops=30,397, runt= 68991msec
>>
>> Signed-off-by: Asias He<asias@redhat.com>
>
> Thanks, the patch makes sense to me.
> Acked-by: Michael S. Tsirkin<mst@redhat.com>
>
> This is 3.5 material, correct?

Yes, I think so.

>
>> ---
>>   drivers/block/virtio_blk.c |   10 ----------
>>   1 files changed, 0 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index c4a60ba..338da9a 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -29,9 +29,6 @@ struct virtio_blk
>>   	/* The disk structure for the kernel. */
>>   	struct gendisk *disk;
>>
>> -	/* Request tracking. */
>> -	struct list_head reqs;
>> -
>>   	mempool_t *pool;
>>
>>   	/* Process context for config space updates */
>> @@ -55,7 +52,6 @@ struct virtio_blk
>>
>>   struct virtblk_req
>>   {
>> -	struct list_head list;
>>   	struct request *req;
>>   	struct virtio_blk_outhdr out_hdr;
>>   	struct virtio_scsi_inhdr in_hdr;
>> @@ -99,7 +95,6 @@ static void blk_done(struct virtqueue *vq)
>>   		}
>>
>>   		__blk_end_request_all(vbr->req, error);
>> -		list_del(&vbr->list);
>>   		mempool_free(vbr, vblk->pool);
>>   	}
>>   	/* In case queue is stopped waiting for more buffers. */
>> @@ -184,7 +179,6 @@ static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
>>   		return false;
>>   	}
>>
>> -	list_add_tail(&vbr->list,&vblk->reqs);
>>   	return true;
>>   }
>>
>> @@ -408,7 +402,6 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
>>   		goto out_free_index;
>>   	}
>>
>> -	INIT_LIST_HEAD(&vblk->reqs);
>>   	spin_lock_init(&vblk->lock);
>>   	vblk->vdev = vdev;
>>   	vblk->sg_elems = sg_elems;
>> @@ -571,9 +564,6 @@ static void __devexit virtblk_remove(struct virtio_device *vdev)
>>   	vblk->config_enable = false;
>>   	mutex_unlock(&vblk->config_lock);
>>
>> -	/* Nothing should be pending. */
>> -	BUG_ON(!list_empty(&vblk->reqs));
>> -
>>   	/* Stop all the virtqueues. */
>>   	vdev->config->reset(vdev);
>>
>> --
>> 1.7.7.6
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Asias

^ permalink raw reply

* RE: [PATCH 1/1] Drivers: scsi: storvsc: Properly handle errors from the host
From: KY Srinivasan @ 2012-04-02 23:09 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel@vger.kernel.org, devel@linuxdriverproject.org,
	virtualization@lists.osdl.org, ohering@suse.com,
	jbottomley@parallels.com, hch@infradead.org,
	linux-scsi@vger.kernel.org
In-Reply-To: <20120329235111.GA19924@kroah.com>



> -----Original Message-----
> From: Greg KH [mailto:gregkh@linuxfoundation.org]
> Sent: Thursday, March 29, 2012 7:51 PM
> To: KY Srinivasan
> Cc: linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> virtualization@lists.osdl.org; ohering@suse.com; jbottomley@parallels.com;
> hch@infradead.org; linux-scsi@vger.kernel.org
> Subject: Re: [PATCH 1/1] Drivers: scsi: storvsc: Properly handle errors from the
> host
> 
> On Thu, Mar 29, 2012 at 04:54:43PM -0700, K. Y. Srinivasan wrote:
> > If the host returns error for pass through commands, deal with
> > appropriately. I would like to thank James for patiently helping
> > me with this patch.
> >
> > Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> > ---
> >  drivers/scsi/storvsc_drv.c |   20 ++++++++++++++++----
> >  1 files changed, 16 insertions(+), 4 deletions(-)
> 
> As this driver is now in Linus's tree, and drivers/scsi/ is James's
> area, he is the one to take them, not me.

Thanks Greg; I understand James would be responsible for deciding if
this patch is the correct way to fix this issue and to check this patch in, if it is.

Regards,

K. Y

^ permalink raw reply

* Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming
From: Tejun Heo @ 2012-04-02 19:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, KVM, SCSI, Michael S. Tsirkin, LKML, VIRTUAL,
	Ren Mingxin
In-Reply-To: <1333392978.2971.25.camel@dabdike>

Hello, James.

On Mon, Apr 02, 2012 at 11:56:18AM -0700, James Bottomley wrote:
> So if we're agreed no other devices going forwards should ever use this
> interface, is there any point unifying the interface?  No matter how
> many caveats you hedge it round with, putting the API in a central place
> will be a bit like a honey trap for careless bears.   It might be safer
> just to leave it buried in the three current drivers.

Yeah, that was my hope but I think it would be easier to enforce to
have a common function which is clearly marked legacy so that new
driver writers can go look for the naming code in the existing ones,
find out they're all using the same function which is marked legacy
and explains what to do for newer drivers.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming
From: James Bottomley @ 2012-04-02 18:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, KVM, SCSI, Michael S. Tsirkin, LKML, VIRTUAL,
	Ren Mingxin
In-Reply-To: <20120402185259.GA16226@dhcp-172-17-108-109.mtv.corp.google.com>

On Mon, 2012-04-02 at 11:52 -0700, Tejun Heo wrote:
> > Probably same. Renaming existing devices will break setups.
> > I think the idea is to avoid using the
> > legacy naming in new drivers *that will be added from now on*.
> 
> Yeap.

So if we're agreed no other devices going forwards should ever use this
interface, is there any point unifying the interface?  No matter how
many caveats you hedge it round with, putting the API in a central place
will be a bit like a honey trap for careless bears.   It might be safer
just to leave it buried in the three current drivers.

James

^ permalink raw reply

* Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming
From: Tejun Heo @ 2012-04-02 18:52 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jens Axboe, KVM, SCSI, LKML, VIRTUAL, Ren Mingxin
In-Reply-To: <20120402072009.GF30360@redhat.com>

Hello,

On Mon, Apr 02, 2012 at 10:20:09AM +0300, Michael S. Tsirkin wrote:
> Pleae don't rename virtio disks, it is way too late for that:
> virtio block driver was merged around 2007, it is not new by
> any measure, and there are many systems out there using
> the current naming scheme.

There's no point in renaming device nodes of already deployed drivers.
It'll cause a lot of pain.

> > And how about the rssd in the patch 3 then?
> 
> Probably same. Renaming existing devices will break setups.
> I think the idea is to avoid using the
> legacy naming in new drivers *that will be added from now on*.

Yeap.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH] virtio_blk: Drop unused request tracking list
From: Stefan Hajnoczi @ 2012-04-02 14:29 UTC (permalink / raw)
  To: Asias He; +Cc: virtualization, kvm, Michael S. Tsirkin
In-Reply-To: <CAFO3S42y9xYv3i_gffL1qvKmbRTjVVrxdzvj8+vPF-vVYnESNQ@mail.gmail.com>

On Sat, Mar 31, 2012 at 11:22 AM, Asias He <asias.hejun@gmail.com> wrote:
> Hi, Stefan
>
> On Fri, Mar 30, 2012 at 6:14 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Fri, Mar 30, 2012 at 4:24 AM, Asias He <asias@redhat.com> wrote:
>>> Benchmark shows small performance improvement on fusion io device.
>>>
>>> Before:
>>>  seq-read : io=1,024MB, bw=19,982KB/s, iops=39,964, runt= 52475msec
>>>  seq-write: io=1,024MB, bw=20,321KB/s, iops=40,641, runt= 51601msec
>>>  rnd-read : io=1,024MB, bw=15,404KB/s, iops=30,808, runt= 68070msec
>>>  rnd-write: io=1,024MB, bw=14,776KB/s, iops=29,552, runt= 70963msec
>>>
>>> After:
>>>  seq-read : io=1,024MB, bw=20,343KB/s, iops=40,685, runt= 51546msec
>>>  seq-write: io=1,024MB, bw=20,803KB/s, iops=41,606, runt= 50404msec
>>>  rnd-read : io=1,024MB, bw=16,221KB/s, iops=32,442, runt= 64642msec
>>>  rnd-write: io=1,024MB, bw=15,199KB/s, iops=30,397, runt= 68991msec
>>>
>>> Signed-off-by: Asias He <asias@redhat.com>
>>> ---
>>>  drivers/block/virtio_blk.c |   10 ----------
>>>  1 files changed, 0 insertions(+), 10 deletions(-)
>>
>> Thanks for providing performance results.  It's a bit scary that this
>> unused list has an impact...I'm sure we have worse things elsewhere in
>> the KVM storage code path.
>
> Do you find any worse things? I saw your trace work here:
>
>   http://www.linux-kvm.org/page/Virtio/Block/Latency

I haven't updated those results in a long time because I no longer use
that benchmarking environment.

Stefan

^ permalink raw reply

* Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Raghavendra K T @ 2012-04-02 12:15 UTC (permalink / raw)
  To: Avi Kivity, H. Peter Anvin
  Cc: KVM, Alan Meadows, Peter Zijlstra, Stefano Stabellini,
	the arch/x86 maintainers, LKML, Konrad Rzeszutek Wilk, Andi Kleen,
	Jeremy Fitzhardinge, Srivatsa Vaddagiri, Attilio Rao, Ingo Molnar,
	Virtualization, Linus Torvalds, Xen Devel, Stephan Diestelhorst
In-Reply-To: <4F7976B6.5050000@linux.vnet.ibm.com>

On 04/02/2012 03:21 PM, Raghavendra K T wrote:
> On 04/01/2012 07:23 PM, Avi Kivity wrote:
>  > On 04/01/2012 04:48 PM, Raghavendra K T wrote:
>  >>>> I have patch something like below in mind to try:
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 5127668..3fa912a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1557,12 +1557,17 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
> mark_page_dirty_in_slot(kvm, memslot, gfn);
> }
>
> +#define YIELD_THRESHOLD 2048
> +static void kvm_vcpu_try_yield_to(struct kvm_vcpu *me);
> /*
> * The vCPU has executed a HLT instruction with in-kernel mode enabled.
> */
> void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> {
[...]
> + if (loop_count++ % YIELD_THRESHOLD)
> + schedule();
> + else
> + kvm_vcpu_try_yield_to(vcpu);
> }
>
> +static void kvm_vcpu_try_yield(struct kvm_vcpu *me)

yes, it is kvm_vcpu_try_yield_to. had changed the name just before
sending. sorry.

^ permalink raw reply

* Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Raghavendra K T @ 2012-04-02  9:51 UTC (permalink / raw)
  To: Avi Kivity, H. Peter Anvin
  Cc: KVM, Alan Meadows, Peter Zijlstra, Stefano Stabellini,
	the arch/x86 maintainers, LKML, Konrad Rzeszutek Wilk, Andi Kleen,
	Jeremy Fitzhardinge, Srivatsa Vaddagiri, Attilio Rao, Ingo Molnar,
	Virtualization, Linus Torvalds, Xen Devel, Stephan Diestelhorst
In-Reply-To: <4F785DCF.7020809@redhat.com>

On 04/01/2012 07:23 PM, Avi Kivity wrote:
 > On 04/01/2012 04:48 PM, Raghavendra K T wrote:
 >>>> I have patch something like below in mind to try:
 >>>>
 >>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 >>>> index d3b98b1..5127668 100644
 >>>> --- a/virt/kvm/kvm_main.c
 >>>> +++ b/virt/kvm/kvm_main.c
 >>>> @@ -1608,15 +1608,18 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 >>>>         * else and called schedule in __vcpu_run.  Hopefully that
 >>>>         * VCPU is holding the lock that we need and will release it.
 >>>>         * We approximate round-robin by starting at the last boosted
 >>>> VCPU.
 >>>> +     * Priority is given to vcpu that are unhalted.
 >>>>         */
 >>>> -    for (pass = 0; pass<   2&&   !yielded; pass++) {
 >>>> +    for (pass = 0; pass<   3&&   !yielded; pass++) {
 >>>>            kvm_for_each_vcpu(i, vcpu, kvm) {
 >>>>                struct task_struct *task = NULL;
 >>>>                struct pid *pid;
 >>>> -            if (!pass&&   i<   last_boosted_vcpu) {
 >>>> +            if (!pass&&   !vcpu->pv_unhalted)
 >>>> +                continue;
 >>>> +            else if (pass == 1&&   i<   last_boosted_vcpu) {
 >>>>                    i = last_boosted_vcpu;
 >>>>                    continue;
 >>>> -            } else if (pass&&   i>   last_boosted_vcpu)
 >>>> +            } else if (pass == 2&&   i>   last_boosted_vcpu)
 >>>>                    break;
 >>>>                if (vcpu == me)
 >>>>                    continue;
 >>>>
 >>>
 >>> Actually I think this is unneeded.  The loops tries to find vcpus that
 >>> are runnable but not running (vcpu_active(vcpu->wq)), and halted vcpus
 >>> don't match this condition.
 >>>

Oh! I think I misinterpreted your statement. hmm I got it. you told to
remove if (vcpu == me) condition.

I got some more interesting idea ( not sure there is some flaw in idea too).
Basically tried  similar idea (to PLE exit handler) in vcpu_block.

Instead of blind scheduling we try to do yield to vcpu that is kicked.
IMO it may solve some scalability problem and make LHP problem further
shrink.

I think Thomas would be happy to see the result.

results:
test setup.
===========
Host: i5-2540M CPU @ 2.60GHz laptop with 4cpu w/ hyperthreading. 8GB RAM
guest: 16 vcpu 2GB RAM  single guest.

Did kernbench run under guest:
x rc6-with ticketlock (current patchset)+ kvmpatches 
(CONFIG_PARAVIRT_SPINLOCK=y)
+ rc6-with ticketlock + kvmpatches + try_yield_patch (below one) 
(YIELD_THRESHOLD=256) (CONFIG_PARAVIRT_SPINLOCK=y)
* rc6-withticketlock + kvmpatches + try_yield_patch 
(YIELD_THRESHOLD=2048) (CONFIG_PARAVIRT_SPINLOCK=y)

N           Min           Max        Median           Avg        Stddev
x   3        162.45        165.94       165.433     164.60767     1.8857111
+   3        114.02       117.243       115.953     115.73867     1.6221548
Difference at 95.0% confidence
         -29.6882% +/- 2.42192%
*   3       115.823       120.423       117.103       117.783     2.3741946
Difference at 95.0% confidence
         -28.4462% +/- 2.9521%


improvement ~29% w.r.t to current patches.

Note: vanilla rc6 (host and guest) with (CONFIG_PARAVIRT_SPINLOCK=n)
did not finish kernbench run even after *1hr 45* minutes (above
kernbench runs took 9 minute and  6.5 min respectively). I did not try
to test it again.


Yes, I understand that  have to do some more test. and immediate TODO's
for patch are.

1) code belongs to arch/x86 directory and fill in static inline for
other archs
2) tweek YIELD_THRESHOLD value.

Ideas/suggestions welcome

Here is the try_yield_to patch.
---
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5127668..3fa912a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1557,12 +1557,17 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
  	mark_page_dirty_in_slot(kvm, memslot, gfn);
  }

+#define YIELD_THRESHOLD 2048
+static void kvm_vcpu_try_yield_to(struct kvm_vcpu *me);
  /*
   * The vCPU has executed a HLT instruction with in-kernel mode enabled.
   */
  void kvm_vcpu_block(struct kvm_vcpu *vcpu)
  {
  	DEFINE_WAIT(wait);
+	unsigned int loop_count;
+
+	loop_count = 0;

  	for (;;) {
  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
@@ -1579,7 +1584,10 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
  		if (signal_pending(current))
  			break;

-		schedule();
+		if (loop_count++ % YIELD_THRESHOLD)
+			schedule();
+		else
+			kvm_vcpu_try_yield_to(vcpu);
  	}

  	finish_wait(&vcpu->wq, &wait);
@@ -1593,6 +1601,39 @@ void kvm_resched(struct kvm_vcpu *vcpu)
  }
  EXPORT_SYMBOL_GPL(kvm_resched);

+static void kvm_vcpu_try_yield(struct kvm_vcpu *me)
+{
+
+	struct kvm *kvm = me->kvm;
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		struct task_struct *task = NULL;
+		struct pid *pid;
+		if (!vcpu->pv_unhalted)
+			continue;
+		if (waitqueue_active(&vcpu->wq))
+			continue;
+		rcu_read_lock();
+		pid = rcu_dereference(vcpu->pid);
+		if (pid)
+			task = get_pid_task(vcpu->pid, PIDTYPE_PID);
+		rcu_read_unlock();
+		if (!task)
+			continue;
+		if (task->flags & PF_VCPU) {
+			put_task_struct(task);
+			continue;
+		}
+		if (yield_to(task, 1)) {
+			put_task_struct(task);
+			break;
+		}
+		put_task_struct(task);
+	}
+}
+
  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
  {
  	struct kvm *kvm = me->kvm;
---

^ permalink raw reply related

* Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Raghavendra K T @ 2012-04-02  9:51 UTC (permalink / raw)
  To: Avi Kivity, H. Peter Anvin
  Cc: KVM, Alan Meadows, Peter Zijlstra, Stefano Stabellini,
	the arch/x86 maintainers, LKML, Konrad Rzeszutek Wilk, Andi Kleen,
	Jeremy Fitzhardinge, Srivatsa Vaddagiri, Attilio Rao, Ingo Molnar,
	Virtualization, Linus Torvalds, Xen Devel, Stephan Diestelhorst
In-Reply-To: <4F785DCF.7020809@redhat.com>

On 04/01/2012 07:23 PM, Avi Kivity wrote:
 > On 04/01/2012 04:48 PM, Raghavendra K T wrote:
 >>>> I have patch something like below in mind to try:
 >>>>
 >>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 >>>> index d3b98b1..5127668 100644
 >>>> --- a/virt/kvm/kvm_main.c
 >>>> +++ b/virt/kvm/kvm_main.c
 >>>> @@ -1608,15 +1608,18 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 >>>>         * else and called schedule in __vcpu_run.  Hopefully that
 >>>>         * VCPU is holding the lock that we need and will release it.
 >>>>         * We approximate round-robin by starting at the last boosted
 >>>> VCPU.
 >>>> +     * Priority is given to vcpu that are unhalted.
 >>>>         */
 >>>> -    for (pass = 0; pass<   2&&   !yielded; pass++) {
 >>>> +    for (pass = 0; pass<   3&&   !yielded; pass++) {
 >>>>            kvm_for_each_vcpu(i, vcpu, kvm) {
 >>>>                struct task_struct *task = NULL;
 >>>>                struct pid *pid;
 >>>> -            if (!pass&&   i<   last_boosted_vcpu) {
 >>>> +            if (!pass&&   !vcpu->pv_unhalted)
 >>>> +                continue;
 >>>> +            else if (pass == 1&&   i<   last_boosted_vcpu) {
 >>>>                    i = last_boosted_vcpu;
 >>>>                    continue;
 >>>> -            } else if (pass&&   i>   last_boosted_vcpu)
 >>>> +            } else if (pass == 2&&   i>   last_boosted_vcpu)
 >>>>                    break;
 >>>>                if (vcpu == me)
 >>>>                    continue;
 >>>>
 >>>
 >>> Actually I think this is unneeded.  The loops tries to find vcpus that
 >>> are runnable but not running (vcpu_active(vcpu->wq)), and halted vcpus
 >>> don't match this condition.
 >>>

Oh! I think I misinterpreted your statement. hmm I got it. you told to
remove if (vcpu == me) condition.

I got some more interesting idea ( not sure there is some flaw in idea too).
Basically tried  similar idea (to PLE exit handler) in vcpu_block.

Instead of blind scheduling we try to do yield to vcpu that is kicked.
IMO it may solve some scalability problem and make LHP problem further
shrink.

I think Thomas would be happy to see the result.

results:
test setup.
===========
Host: i5-2540M CPU @ 2.60GHz laptop with 4cpu w/ hyperthreading. 8GB RAM
guest: 16 vcpu 2GB RAM  single guest.

Did kernbench run under guest:
x rc6-with ticketlock (current patchset)+ kvmpatches 
(CONFIG_PARAVIRT_SPINLOCK=y)
+ rc6-with ticketlock + kvmpatches + try_yield_patch (below one) 
(YIELD_THRESHOLD=256) (CONFIG_PARAVIRT_SPINLOCK=y)
* rc6-withticketlock + kvmpatches + try_yield_patch 
(YIELD_THRESHOLD=2048) (CONFIG_PARAVIRT_SPINLOCK=y)

N           Min           Max        Median           Avg        Stddev
x   3        162.45        165.94       165.433     164.60767     1.8857111
+   3        114.02       117.243       115.953     115.73867     1.6221548
Difference at 95.0% confidence
         -29.6882% +/- 2.42192%
*   3       115.823       120.423       117.103       117.783     2.3741946
Difference at 95.0% confidence
         -28.4462% +/- 2.9521%


improvement ~29% w.r.t to current patches.

Note: vanilla rc6 (host and guest) with (CONFIG_PARAVIRT_SPINLOCK=n)
did not finish kernbench run even after *1hr 45* minutes (above
kernbench runs took 9 minute and  6.5 min respectively). I did not try
to test it again.


Yes, I understand that  have to do some more test. and immediate TODO's
for patch are.

1) code belongs to arch/x86 directory and fill in static inline for
other archs
2) tweek YIELD_THRESHOLD value.

Ideas/suggestions welcome

Here is the try_yield_to patch.
---
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5127668..3fa912a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1557,12 +1557,17 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
  	mark_page_dirty_in_slot(kvm, memslot, gfn);
  }

+#define YIELD_THRESHOLD 2048
+static void kvm_vcpu_try_yield_to(struct kvm_vcpu *me);
  /*
   * The vCPU has executed a HLT instruction with in-kernel mode enabled.
   */
  void kvm_vcpu_block(struct kvm_vcpu *vcpu)
  {
  	DEFINE_WAIT(wait);
+	unsigned int loop_count;
+
+	loop_count = 0;

  	for (;;) {
  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
@@ -1579,7 +1584,10 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
  		if (signal_pending(current))
  			break;

-		schedule();
+		if (loop_count++ % YIELD_THRESHOLD)
+			schedule();
+		else
+			kvm_vcpu_try_yield_to(vcpu);
  	}

  	finish_wait(&vcpu->wq, &wait);
@@ -1593,6 +1601,39 @@ void kvm_resched(struct kvm_vcpu *vcpu)
  }
  EXPORT_SYMBOL_GPL(kvm_resched);

+static void kvm_vcpu_try_yield(struct kvm_vcpu *me)
+{
+
+	struct kvm *kvm = me->kvm;
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		struct task_struct *task = NULL;
+		struct pid *pid;
+		if (!vcpu->pv_unhalted)
+			continue;
+		if (waitqueue_active(&vcpu->wq))
+			continue;
+		rcu_read_lock();
+		pid = rcu_dereference(vcpu->pid);
+		if (pid)
+			task = get_pid_task(vcpu->pid, PIDTYPE_PID);
+		rcu_read_unlock();
+		if (!task)
+			continue;
+		if (task->flags & PF_VCPU) {
+			put_task_struct(task);
+			continue;
+		}
+		if (yield_to(task, 1)) {
+			put_task_struct(task);
+			break;
+		}
+		put_task_struct(task);
+	}
+}
+
  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
  {
  	struct kvm *kvm = me->kvm;
---

^ permalink raw reply related

* Re: [Xen-devel] [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Ian Campbell @ 2012-04-02  9:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: the arch/x86 maintainers, KVM, Stefano Stabellini, Peter Zijlstra,
	Konrad Rzeszutek Wilk, LKML, Andi Kleen, Srivatsa Vaddagiri,
	Raghavendra K T, Jeremy Fitzhardinge, H. Peter Anvin, Attilio Rao,
	Ingo Molnar, Virtualization, Linus Torvalds, Xen Devel,
	Stephan Diestelhorst, Avi Kivity
In-Reply-To: <alpine.LFD.2.02.1203302333560.2542@ionos>

On Fri, 2012-03-30 at 23:07 +0100, Thomas Gleixner wrote:
> So if we need to fiddle with the scheduler and frankly that's the only
> way to get a real gain (the numbers, which are achieved by this
> patches, are not that impressive) then the question arises whether we
> should turn the whole thing around.

It probably doesn't materially effect your core point (which seems valid
to me) but it's worth pointing out that the numbers presented in this
thread are AFAICT mostly focused on ensuring that that the impact of
this infrastructure is acceptable on native rather than showing the
benefits for virtualized workloads.

Ian.

^ permalink raw reply

* Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Thomas Gleixner @ 2012-04-02  9:26 UTC (permalink / raw)
  To: Avi Kivity
  Cc: the arch/x86 maintainers, KVM, Konrad Rzeszutek Wilk,
	Peter Zijlstra, Stefano Stabellini, Raghavendra K T, LKML,
	Andi Kleen, Srivatsa Vaddagiri, Jeremy Fitzhardinge,
	H. Peter Anvin, Attilio Rao, Ingo Molnar, Virtualization,
	Linus Torvalds, Xen Devel, Stephan Diestelhorst
In-Reply-To: <4F7858C0.90405@redhat.com>

On Sun, 1 Apr 2012, Avi Kivity wrote:
> On 03/31/2012 01:07 AM, Thomas Gleixner wrote:
> > On Fri, 30 Mar 2012, H. Peter Anvin wrote:
> >
> > > What is the current status of this patchset?  I haven't looked at it too
> > > closely because I have been focused on 3.4 up until now...
> >
> > The real question is whether these heuristics are the correct approach
> > or not.
> >
> > If I look at it from the non virtualized kernel side then this is ass
> > backwards. We know already that we are holding a spinlock which might
> > cause other (v)cpus going into eternal spin. The non virtualized
> > kernel solves this by disabling preemption and therefor getting out of
> > the critical section as fast as possible,
> >
> > The virtualization problem reminds me a lot of the problem which RT
> > kernels are observing where non raw spinlocks are turned into
> > "sleeping spinlocks" and therefor can cause throughput issues for non
> > RT workloads.
> >
> > Though the virtualized situation is even worse. Any preempted guest
> > section which holds a spinlock is prone to cause unbound delays.
> >
> > The paravirt ticketlock solution can only mitigate the problem, but
> > not solve it. With massive overcommit there is always a way to trigger
> > worst case scenarious unless you are educating the scheduler to cope
> > with that.
> >
> > So if we need to fiddle with the scheduler and frankly that's the only
> > way to get a real gain (the numbers, which are achieved by this
> > patches, are not that impressive) then the question arises whether we
> > should turn the whole thing around.
> >
> > I know that Peter is going to go berserk on me, but if we are running
> > a paravirt guest then it's simple to provide a mechanism which allows
> > the host (aka hypervisor) to check that in the guest just by looking
> > at some global state.
> >
> > So if a guest exits due to an external event it's easy to inspect the
> > state of that guest and avoid to schedule away when it was interrupted
> > in a spinlock held section. That guest/host shared state needs to be
> > modified to indicate the guest to invoke an exit when the last nested
> > lock has been released.
> 
> Interesting idea (I think it has been raised before btw, don't recall by
> who).

Someoen posted a pointer to that old thread.

> One thing about it is that it can give many false positives.  Consider a
> fine-grained spinlock that is being accessed by many threads.  That is,
> the lock is taken and released with high frequency, but there is no
> contention, because each vcpu is accessing a different instance.  So the
> host scheduler will needlessly delay preemption of vcpus that happen to
> be holding a lock, even though this gains nothing.

You're talking about per cpu locks, right? I can see the point there,
but per cpu stuff might be worth annotating to avoid this.
 
> A second issue may happen with a lock that is taken and released with
> high frequency, with a high hold percentage.  The host scheduler may
> always sample the guest in a held state, leading it to conclude that
> it's exceeding its timeout when in fact the lock is held for a short
> time only.

Well, no. You can avoid that.

host		VCPU
		spin_lock()
		 modify_global_state()
   	exit
check_global_state()
mark_global_state()
reschedule vcpu

		spin_unlock()
		 check_global_state()
		  trigger_exit()

So when an exit occures in the locked section, then the host can mark
the global state to tell the guest to issue a trap on unlock.
 
> > Of course this needs to be time bound, so a rogue guest cannot
> > monopolize the cpu forever, but that's the least to worry about
> > problem simply because a guest which does not get out of a spinlocked
> > region within a certain amount of time is borked and elegible to
> > killing anyway.
> 
> Hopefully not killing!  Just because a guest doesn't scale well, or even
> if it's deadlocked, doesn't mean it should be killed.  Just preempt it.

:)
 
> > Thoughts ?
> 
> It's certainly interesting.  Maybe a combination is worthwhile - prevent
> lockholder preemption for a short period of time AND put waiters to
> sleep in case that period is insufficient to release the lock.

Right, but as Srivatsa pointed out this still needs the ticket lock
ordering support to avoid that guests are completely starved.

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming
From: Michael S. Tsirkin @ 2012-04-02  7:20 UTC (permalink / raw)
  To: Ren Mingxin; +Cc: Jens Axboe, KVM, SCSI, LKML, VIRTUAL, Tejun Heo
In-Reply-To: <4F78FE89.2070707@cn.fujitsu.com>

On Mon, Apr 02, 2012 at 09:19:05AM +0800, Ren Mingxin wrote:
>  On 03/30/2012 11:28 PM, Tejun Heo wrote:
> >On Fri, Mar 30, 2012 at 08:26:06AM -0700, Tejun Heo wrote:
> >>On Fri, Mar 30, 2012 at 05:53:52PM +0800, Ren Mingxin wrote:
> >>>  The current virtblk's naming algorithm only supports 263  disks.
> >>>If there are mass of virtblks(exceeding 263), there will be disks
> >>>with the same name.
> >>>
> >>>By renaming "sd_format_disk_name()" to "disk_name_format()"
> >>>and moving it into block core, virtio_blk can use this function to
> >>>support mass of disks.
> >>>
> >>>Signed-off-by: Ren Mingxin<renmx@cn.fujitsu.com>
> >>I guess it's already way too late but why couldn't they have been
> >>named vdD-P where both D and P are integers denoting disk number and
> >>partition number?  [sh]dX's were created when there weren't supposed
> >>to be too many disks, so we had to come up with the horrible alphabet
> >>based numbering scheme but vd is new enough.  I mean, naming is one
> >>thing but who wants to figure out which sequence is or guess what
> >>comes next vdzz9?  :(
> >>
> >>If we're gonna move it to block layer, let's add big blinking red
> >>comment saying "don't ever use it for any new driver".
> >And also let's make that clear in the function name - say,
> >format_legacy_disk_name() or something.
> 
> So, to legacy disks [sh]d, we'd name them as [sh]d[a-z]{1,}. To new devices
> like vd, we'd name them as vd<index>(vd<index>p<partno> as partitions)?

Pleae don't rename virtio disks, it is way too late for that:
virtio block driver was merged around 2007, it is not new by
any measure, and there are many systems out there using
the current naming scheme.

> And how about the rssd in the patch 3 then?

Probably same. Renaming existing devices will break setups.
I think the idea is to avoid using the
legacy naming in new drivers *that will be added from now on*.

> Besides, does anybody have comments?
> Looking forward to your replies ;-)
> 
> -- 
> Thanks,
> Ren

^ permalink raw reply

* Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming
From: Michael S. Tsirkin @ 2012-04-02  7:15 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, KVM, SCSI, LKML, VIRTUAL, Ren Mingxin
In-Reply-To: <20120330152606.GB28934@google.com>

On Fri, Mar 30, 2012 at 08:26:06AM -0700, Tejun Heo wrote:
> On Fri, Mar 30, 2012 at 05:53:52PM +0800, Ren Mingxin wrote:
> >  The current virtblk's naming algorithm only supports 263  disks.
> > If there are mass of virtblks(exceeding 263), there will be disks
> > with the same name.
> > 
> > By renaming "sd_format_disk_name()" to "disk_name_format()"
> > and moving it into block core, virtio_blk can use this function to
> > support mass of disks.
> > 
> > Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com>
> 
> I guess it's already way too late but why couldn't they have been
> named vdD-P where both D and P are integers denoting disk number and
> partition number?

Yes they could and yes it's way too late to change device
names for virtio - if we did this will break countless setups.

>  [sh]dX's were created when there weren't supposed
> to be too many disks, so we had to come up with the horrible alphabet
> based numbering scheme but vd is new enough.  I mean, naming is one
> thing but who wants to figure out which sequence is or guess what
> comes next vdzz9?  :(
> 
> If we're gonna move it to block layer, let's add big blinking red
> comment saying "don't ever use it for any new driver".
> 
> Thanks.

Right. virtio would have to use the legacy one though.

> -- 
> tejun

^ permalink raw reply

* Re: [Xen-devel] [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Juergen Gross @ 2012-04-02  4:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: the arch/x86 maintainers, KVM, Stefano Stabellini, Peter Zijlstra,
	Konrad Rzeszutek Wilk, LKML, Andi Kleen, Srivatsa Vaddagiri,
	Raghavendra K T, Jeremy Fitzhardinge, H. Peter Anvin, Attilio Rao,
	Ingo Molnar, Virtualization, Linus Torvalds, Xen Devel,
	Stephan Diestelhorst, Avi Kivity
In-Reply-To: <alpine.LFD.2.02.1203302333560.2542@ionos>

On 03/31/2012 12:07 AM, Thomas Gleixner wrote:
> On Fri, 30 Mar 2012, H. Peter Anvin wrote:
>
>> What is the current status of this patchset?  I haven't looked at it too
>> closely because I have been focused on 3.4 up until now...
> The real question is whether these heuristics are the correct approach
> or not.
>
> If I look at it from the non virtualized kernel side then this is ass
> backwards. We know already that we are holding a spinlock which might
> cause other (v)cpus going into eternal spin. The non virtualized
> kernel solves this by disabling preemption and therefor getting out of
> the critical section as fast as possible,
>
> The virtualization problem reminds me a lot of the problem which RT
> kernels are observing where non raw spinlocks are turned into
> "sleeping spinlocks" and therefor can cause throughput issues for non
> RT workloads.
>
> Though the virtualized situation is even worse. Any preempted guest
> section which holds a spinlock is prone to cause unbound delays.
>
> The paravirt ticketlock solution can only mitigate the problem, but
> not solve it. With massive overcommit there is always a way to trigger
> worst case scenarious unless you are educating the scheduler to cope
> with that.
>
> So if we need to fiddle with the scheduler and frankly that's the only
> way to get a real gain (the numbers, which are achieved by this
> patches, are not that impressive) then the question arises whether we
> should turn the whole thing around.
>
> I know that Peter is going to go berserk on me, but if we are running
> a paravirt guest then it's simple to provide a mechanism which allows
> the host (aka hypervisor) to check that in the guest just by looking
> at some global state.
>
> So if a guest exits due to an external event it's easy to inspect the
> state of that guest and avoid to schedule away when it was interrupted
> in a spinlock held section. That guest/host shared state needs to be
> modified to indicate the guest to invoke an exit when the last nested
> lock has been released.
>
> Of course this needs to be time bound, so a rogue guest cannot
> monopolize the cpu forever, but that's the least to worry about
> problem simply because a guest which does not get out of a spinlocked
> region within a certain amount of time is borked and elegible to
> killing anyway.
>
> Thoughts ?

I used this approach in 2008:

http://lists.xen.org/archives/html/xen-devel/2008-12/msg00740.html

It worked very well, but it was rejected at that time. I wouldn't mind trying
it again if there is some support from your side. :-)


Juergen

-- 

Juergen Gross                 Principal Developer Operating Systems
PDG ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

^ permalink raw reply

* Re: [PATCH 3/4] block: replace rssd_disk_name_format() to disk_name_format()
From: Ren Mingxin @ 2012-04-02  1:21 UTC (permalink / raw)
  To: Asai Thambi Samymuthu Pattrayasamy (asamymuthupa) [CONT - Type 2]
  Cc: Jens Axboe, KVM, SCSI, Michael S. Tsirkin, LKML, VIRTUAL,
	Tejun Heo
In-Reply-To: <4F764778.1030406@micron.com>

  On 03/31/2012 07:54 AM, Asai Thambi Samymuthu Pattrayasamy 
(asamymuthupa) [CONT - Type 2] wrote:
> On 3/30/2012 2:53 AM, Ren Mingxin wrote:
>
>>   Currently, block core has been supplied "disk_name_format()", so
>> we should remove duplicate function "rssd_disk_name_format()"
>> and use the new function to format rssd disk names.
>>
>> Signed-off-by: Ren Mingxin<renmx@cn.fujitsu.com>
>> ---
>>   mtip32xx.c |   33 +--------------------------------
>>   1 file changed, 1 insertion(+), 32 deletions(-)
>
> Looks fine.
>
> Should the subject be "mtip32xx: ..." instead of "block: ..."? I
> understand "block:" as relating to block core. I am fairly new here. If
> "block:" can be used for block drivers too, that is fine too.
>
>

Good idea! Will be fixed in the next version.

-- 
Thanks,
Ren

^ permalink raw reply

* Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming
From: Ren Mingxin @ 2012-04-02  1:19 UTC (permalink / raw)
  To: Tejun Heo, Jens Axboe, Michael S. Tsirkin, Rusty Russell
  Cc: VIRTUAL, LKML, SCSI, KVM
In-Reply-To: <20120330152808.GC28934@google.com>

  On 03/30/2012 11:28 PM, Tejun Heo wrote:
> On Fri, Mar 30, 2012 at 08:26:06AM -0700, Tejun Heo wrote:
>> On Fri, Mar 30, 2012 at 05:53:52PM +0800, Ren Mingxin wrote:
>>>   The current virtblk's naming algorithm only supports 263  disks.
>>> If there are mass of virtblks(exceeding 263), there will be disks
>>> with the same name.
>>>
>>> By renaming "sd_format_disk_name()" to "disk_name_format()"
>>> and moving it into block core, virtio_blk can use this function to
>>> support mass of disks.
>>>
>>> Signed-off-by: Ren Mingxin<renmx@cn.fujitsu.com>
>> I guess it's already way too late but why couldn't they have been
>> named vdD-P where both D and P are integers denoting disk number and
>> partition number?  [sh]dX's were created when there weren't supposed
>> to be too many disks, so we had to come up with the horrible alphabet
>> based numbering scheme but vd is new enough.  I mean, naming is one
>> thing but who wants to figure out which sequence is or guess what
>> comes next vdzz9?  :(
>>
>> If we're gonna move it to block layer, let's add big blinking red
>> comment saying "don't ever use it for any new driver".
> And also let's make that clear in the function name - say,
> format_legacy_disk_name() or something.

So, to legacy disks [sh]d, we'd name them as [sh]d[a-z]{1,}. To new devices
like vd, we'd name them as vd<index>(vd<index>p<partno> as partitions)?
And how about the rssd in the patch 3 then?

Besides, does anybody have comments?
Looking forward to your replies ;-)

-- 
Thanks,
Ren

^ permalink raw reply

* Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming
From: Ren Mingxin @ 2012-04-02  1:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, SCSI, KVM, Michael S. Tsirkin, LKML, VIRTUAL
In-Reply-To: <20120330153828.GD28934@google.com>

  On 03/30/2012 11:38 PM, Tejun Heo wrote:
> On Fri, Mar 30, 2012 at 08:26:06AM -0700, Tejun Heo wrote:
>> On Fri, Mar 30, 2012 at 05:53:52PM +0800, Ren Mingxin wrote:
>>>   The current virtblk's naming algorithm only supports 263  disks.
>>> If there are mass of virtblks(exceeding 263), there will be disks
>>> with the same name.
>>>
>>> By renaming "sd_format_disk_name()" to "disk_name_format()"
>>> and moving it into block core, virtio_blk can use this function to
>>> support mass of disks.
>>>
>>> Signed-off-by: Ren Mingxin<renmx@cn.fujitsu.com>
>> I guess it's already way too late but why couldn't they have been
>> named vdD-P where both D and P are integers denoting disk number and
> So, if a device name ends in digit the partition code adds delimiter
> 'p' automatically and this is already in use by md.  So, partitioned
> md devices are named mdDpP where D and P are base 10 number indicating
> the sequence like md12p4.  So, let's please add comment that new
> drivers should name their devices PREFIX%d where the sequence number
> can be allocated by ida.
>
>

Oh, got it. Just like md devices' names.

-- 
Thanks,
Ren

^ permalink raw reply

* Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Raghavendra K T @ 2012-04-01 13:56 UTC (permalink / raw)
  To: Avi Kivity
  Cc: KVM, Alan Meadows, Peter Zijlstra, Stefano Stabellini,
	the arch/x86 maintainers, LKML, Konrad Rzeszutek Wilk, Andi Kleen,
	Srivatsa Vaddagiri, Jeremy Fitzhardinge, H. Peter Anvin,
	Attilio Rao, Ingo Molnar, Virtualization, Linus Torvalds,
	Xen Devel, Stephan Diestelhorst
In-Reply-To: <4F785DCF.7020809@redhat.com>

On 04/01/2012 07:23 PM, Avi Kivity wrote:
> On 04/01/2012 04:48 PM, Raghavendra K T wrote:
>>>> I have patch something like below in mind to try:
>
> I'm interested in how PLE does vs. your patches, both with PLE enabled
> and disabled.
>

Sure. will update with the experimental results.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox