* RE: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP messages in the driver
From: KY Srinivasan @ 2012-03-11 20:53 UTC (permalink / raw)
To: Dan Carpenter
Cc: gregkh@linuxfoundation.org, ohering@suse.com,
linux-kernel@vger.kernel.org, virtualization@lists.osdl.org,
Alan Stern, devel@linuxdriverproject.org
In-Reply-To: <20120311184916.GD3337@mwanda>
> -----Original Message-----
> From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> Sent: Sunday, March 11, 2012 2:49 PM
> To: KY Srinivasan
> Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; virtualization@lists.osdl.org; ohering@suse.com;
> Alan Stern
> Subject: Re: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP
> messages in the driver
>
> On Sun, Mar 11, 2012 at 04:56:06PM +0000, KY Srinivasan wrote:
> > > Probably that's not enough to make a difference and we'd need to
> > > introduce a new function.
> > >
> > > Btw I don't know if utf16s_to_utf8s() counts the NUL char or not.
> > > It feels like maybe we could end up with ->value_size equal to
> > > HV_KVP_EXCHANGE_MAX_VALUE_SIZE + 1.
> >
> > The MAX value is set to accommodate the maximum string that will ever
> > be handled including the string terminator. The function utf16s_to_utf8s()
> > returns the converted string length but the returned length does not
> > include the string terminator (like strlen), hence the "+1".
> >
>
> sprintf() and friends copy the NUL terminator but utf16s_to_utf8s()
> doesn't so the code isn't right and it does seem like maybe we could
> end up with a ->value_size equal to HV_KVP_EXCHANGE_MAX_VALUE_SIZE +
> 1.
You are right in that utf16s_to_utf8s() does not copy the string terminator. This
is not an issue in this case since the buffer for the utf8 string is zeroed out to begin
with (this memory was allocated using kzalloc()). The return value of the utf16s_to_utf8s()
is the length of the utf8s string as what would be returned by strlen. I add one to take into account
the string terminator character for further processing. As I said before the MAX value takes into
account the terminating character for all the strings handled.
Regards,
K. Y
^ permalink raw reply
* Re: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP messages in the driver
From: Dan Carpenter @ 2012-03-12 5:22 UTC (permalink / raw)
To: KY Srinivasan
Cc: gregkh@linuxfoundation.org, ohering@suse.com,
linux-kernel@vger.kernel.org, virtualization@lists.osdl.org,
Alan Stern, devel@linuxdriverproject.org
In-Reply-To: <6E21E5352C11B742B20C142EB499E0481B75D433@TK5EX14MBXC122.redmond.corp.microsoft.com>
[-- Attachment #1.1: Type: text/plain, Size: 2661 bytes --]
On Sun, Mar 11, 2012 at 08:53:57PM +0000, KY Srinivasan wrote:
>
>
> > -----Original Message-----
> > From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> > Sent: Sunday, March 11, 2012 2:49 PM
> > To: KY Srinivasan
> > Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
> > devel@linuxdriverproject.org; virtualization@lists.osdl.org; ohering@suse.com;
> > Alan Stern
> > Subject: Re: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP
> > messages in the driver
> >
> > On Sun, Mar 11, 2012 at 04:56:06PM +0000, KY Srinivasan wrote:
> > > > Probably that's not enough to make a difference and we'd need to
> > > > introduce a new function.
> > > >
> > > > Btw I don't know if utf16s_to_utf8s() counts the NUL char or not.
> > > > It feels like maybe we could end up with ->value_size equal to
> > > > HV_KVP_EXCHANGE_MAX_VALUE_SIZE + 1.
> > >
> > > The MAX value is set to accommodate the maximum string that will ever
> > > be handled including the string terminator. The function utf16s_to_utf8s()
> > > returns the converted string length but the returned length does not
> > > include the string terminator (like strlen), hence the "+1".
> > >
> >
> > sprintf() and friends copy the NUL terminator but utf16s_to_utf8s()
> > doesn't so the code isn't right and it does seem like maybe we could
> > end up with a ->value_size equal to HV_KVP_EXCHANGE_MAX_VALUE_SIZE +
> > 1.
>
> You are right in that utf16s_to_utf8s() does not copy the string
> terminator. This is not an issue in this case since the buffer for
> the utf8 string is zeroed out to begin with (this memory was
> allocated using kzalloc()). The return value of the
> utf16s_to_utf8s() is the length of the utf8s string as what would
> be returned by strlen.
There is no strlen() involved... It returns the number of bytes
copied to the output string. It doesn't copy a NUL. We pass
HV_KVP_EXCHANGE_MAX_VALUE_SIZE bytes as the limit. So it fills up
the buffer with non-null characters and we have an off-by-one.
> I add one to take into account the string
> terminator character for further processing. As I said before the
> MAX value takes into account the terminating character for all the
> strings handled.
So you're saying that since we control the input string, we'll never
hit the max? Still, why not pass HV_KVP_EXCHANGE_MAX_VALUE_SIZE - 1
to leave room for the NUL just for correctness? We'd still add one
to the return value but we wouldn't go over the size of the buffer.
Again, I don't really know how utf16s_to_utf8s() works so I might
have misunderstood.
regards,
dan carpenter
[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
[-- Attachment #2: Type: text/plain, Size: 159 bytes --]
_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/devel
^ permalink raw reply
* RE: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP messages in the driver
From: KY Srinivasan @ 2012-03-12 12:36 UTC (permalink / raw)
To: Dan Carpenter
Cc: gregkh@linuxfoundation.org, ohering@suse.com,
linux-kernel@vger.kernel.org, virtualization@lists.osdl.org,
Alan Stern, devel@linuxdriverproject.org
In-Reply-To: <20120312052221.GE3337@mwanda>
> -----Original Message-----
> From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> Sent: Monday, March 12, 2012 1:22 AM
> To: KY Srinivasan
> Cc: gregkh@linuxfoundation.org; ohering@suse.com; linux-
> kernel@vger.kernel.org; virtualization@lists.osdl.org; Alan Stern;
> devel@linuxdriverproject.org
> Subject: Re: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP
> messages in the driver
>
> On Sun, Mar 11, 2012 at 08:53:57PM +0000, KY Srinivasan wrote:
> >
> >
> > > -----Original Message-----
> > > From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> > > Sent: Sunday, March 11, 2012 2:49 PM
> > > To: KY Srinivasan
> > > Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
> > > devel@linuxdriverproject.org; virtualization@lists.osdl.org;
> ohering@suse.com;
> > > Alan Stern
> > > Subject: Re: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP
> > > messages in the driver
> > >
> > > On Sun, Mar 11, 2012 at 04:56:06PM +0000, KY Srinivasan wrote:
> > > > > Probably that's not enough to make a difference and we'd need to
> > > > > introduce a new function.
> > > > >
> > > > > Btw I don't know if utf16s_to_utf8s() counts the NUL char or not.
> > > > > It feels like maybe we could end up with ->value_size equal to
> > > > > HV_KVP_EXCHANGE_MAX_VALUE_SIZE + 1.
> > > >
> > > > The MAX value is set to accommodate the maximum string that will ever
> > > > be handled including the string terminator. The function utf16s_to_utf8s()
> > > > returns the converted string length but the returned length does not
> > > > include the string terminator (like strlen), hence the "+1".
> > > >
> > >
> > > sprintf() and friends copy the NUL terminator but utf16s_to_utf8s()
> > > doesn't so the code isn't right and it does seem like maybe we could
> > > end up with a ->value_size equal to HV_KVP_EXCHANGE_MAX_VALUE_SIZE
> +
> > > 1.
> >
> > You are right in that utf16s_to_utf8s() does not copy the string
> > terminator. This is not an issue in this case since the buffer for
> > the utf8 string is zeroed out to begin with (this memory was
> > allocated using kzalloc()). The return value of the
> > utf16s_to_utf8s() is the length of the utf8s string as what would
> > be returned by strlen.
>
>
> There is no strlen() involved... It returns the number of bytes
> copied to the output string. It doesn't copy a NUL. We pass
> HV_KVP_EXCHANGE_MAX_VALUE_SIZE bytes as the limit. So it fills up
> the buffer with non-null characters and we have an off-by-one.
>
Dan,
I am sorry for not being as precise as I should be:
utf16s_to_utf8s() takes two length parameters - the length of the utf16 string
that is to be converted and the second the length of the utf8 output string.
The windows host manipulates all string in utf16 encoding and the string we get
from the host is guaranteed to be less than or equal to MAX value that we have
including the terminating character. In my code, I simply pass the length of the
utf16 string as received from the host.
The parameter that I am currently passing MAX length value is the "maxout"
parameter of the utf16s_utf8s() function. This by definition is the size of the
output buffer and in this case it happens to be MAX characters big.
> > I add one to take into account the string
> > terminator character for further processing. As I said before the
> > MAX value takes into account the terminating character for all the
> > strings handled.
>
> So you're saying that since we control the input string, we'll never
> hit the max? Still, why not pass HV_KVP_EXCHANGE_MAX_VALUE_SIZE - 1
> to leave room for the NUL just for correctness? We'd still add one
> to the return value but we wouldn't go over the size of the buffer.
As I described earlier, with the host side guarantee that the string we get from
the host is always guaranteed to be less than or equal to MAX length
(including the terminating character), there is no question of going over the
size of the output buffer which is sized based on the host specification.
Regards,
K. Y
^ permalink raw reply
* Re: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP messages in the driver
From: Dan Carpenter @ 2012-03-12 13:03 UTC (permalink / raw)
To: KY Srinivasan
Cc: gregkh@linuxfoundation.org, ohering@suse.com,
linux-kernel@vger.kernel.org, virtualization@lists.osdl.org,
Alan Stern, devel@linuxdriverproject.org
In-Reply-To: <6E21E5352C11B742B20C142EB499E0481B75D4CE@TK5EX14MBXC122.redmond.corp.microsoft.com>
[-- Attachment #1.1: Type: text/plain, Size: 2195 bytes --]
On Mon, Mar 12, 2012 at 12:36:53PM +0000, KY Srinivasan wrote:
> Dan,
> I am sorry for not being as precise as I should be:
> utf16s_to_utf8s() takes two length parameters - the length of the utf16 string
> that is to be converted and the second the length of the utf8 output string.
> The windows host manipulates all string in utf16 encoding and the string we get
> from the host is guaranteed to be less than or equal to MAX value that we have
> including the terminating character. In my code, I simply pass the length of the
> utf16 string as received from the host.
>
> The parameter that I am currently passing MAX length value is the "maxout"
> parameter of the utf16s_utf8s() function. This by definition is the size of the
> output buffer and in this case it happens to be MAX characters big.
>
I also think I'm not being as clear as I should... I understand
that you trust the input; I'm say that for correctness sake you
should specify a output size which leaves room for the NUL char.
I can't say I know this code very well so I could be wrong, but it's
what we do inside usb_string() for example. Can someone who knows
the code check if we should do something like this:
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 3b2eeaa..3a97f52 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -173,7 +173,7 @@ kvp_send_key(struct work_struct *dummy)
in_msg->body.kvp_set.data.value_size,
UTF16_LITTLE_ENDIAN,
message->body.kvp_set.data.value,
- HV_KVP_EXCHANGE_MAX_VALUE_SIZE) + 1;
+ HV_KVP_EXCHANGE_MAX_VALUE_SIZE - 1) + 1;
break;
case REG_U32:
@@ -208,7 +208,7 @@ kvp_send_key(struct work_struct *dummy)
in_msg->body.kvp_set.data.key_size,
UTF16_LITTLE_ENDIAN,
message->body.kvp_set.data.key,
- HV_KVP_EXCHANGE_MAX_KEY_SIZE) + 1;
+ HV_KVP_EXCHANGE_MAX_KEY_SIZE - 1) + 1;
break;
@@ -219,7 +219,7 @@ kvp_send_key(struct work_struct *dummy)
in_msg->body.kvp_delete.key_size,
UTF16_LITTLE_ENDIAN,
message->body.kvp_delete.key,
- HV_KVP_EXCHANGE_MAX_KEY_SIZE) + 1;
+ HV_KVP_EXCHANGE_MAX_KEY_SIZE - 1) + 1;
break;
[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
[-- Attachment #2: Type: text/plain, Size: 159 bytes --]
_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/devel
^ permalink raw reply related
* [V4 PATCH] virtio-net: send gratuitous packet when needed
From: Jason Wang @ 2012-03-13 9:08 UTC (permalink / raw)
To: netdev, rusty, virtualization, linux-kernel, mst
As hypervior does not have the knowledge of guest network configuration, it's
better to ask guest to send gratuitous packet when needed.
Guest test VIRTIO_NET_S_ANNOUNCE bit during config change interrupt and when it
is set, a workqueue is scheduled to send gratuitous packet through
NETDEV_NOTIFY_PEERS. This feature is negotiated through bit
VIRTIO_NET_F_GUEST_ANNOUNCE.
Changes from v3:
- cancel the workqueue during freeze
Changes from v2:
- fix the race between unregister_dev() and workqueue
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 31 ++++++++++++++++++++++++++++++-
include/linux/virtio_net.h | 2 ++
2 files changed, 32 insertions(+), 1 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4880aa8..45f7ac6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,9 @@ struct virtnet_info {
/* Work struct for refilling if we run low on memory. */
struct delayed_work refill;
+ /* Work struct for sending gratituous packet. */
+ struct work_struct announce;
+
/* Chain pages by the private ptr. */
struct page *pages;
@@ -512,6 +515,13 @@ static void refill_work(struct work_struct *work)
queue_delayed_work(system_nrt_wq, &vi->refill, HZ/2);
}
+static void announce_work(struct work_struct *work)
+{
+ struct virtnet_info *vi = container_of(work, struct virtnet_info,
+ announce);
+ netif_notify_peers(vi->dev);
+}
+
static int virtnet_poll(struct napi_struct *napi, int budget)
{
struct virtnet_info *vi = container_of(napi, struct virtnet_info, napi);
@@ -787,6 +797,8 @@ static int virtnet_close(struct net_device *dev)
/* Make sure refill_work doesn't re-enable napi! */
cancel_delayed_work_sync(&vi->refill);
+ if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
+ cancel_work_sync(&vi->announce);
napi_disable(&vi->napi);
return 0;
@@ -962,11 +974,23 @@ static void virtnet_update_status(struct virtnet_info *vi)
return;
/* Ignore unknown (future) status bits */
- v &= VIRTIO_NET_S_LINK_UP;
+ v &= VIRTIO_NET_S_LINK_UP | VIRTIO_NET_S_ANNOUNCE;
if (vi->status == v)
return;
+ if (v & VIRTIO_NET_S_ANNOUNCE) {
+ v &= ~VIRTIO_NET_S_ANNOUNCE;
+ vi->vdev->config->set(vi->vdev,
+ offsetof(struct virtio_net_config,
+ status),
+ &v, sizeof(v));
+
+ if ((v & VIRTIO_NET_S_LINK_UP) &&
+ virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
+ schedule_work(&vi->announce);
+ }
+
vi->status = v;
if (vi->status & VIRTIO_NET_S_LINK_UP) {
@@ -1076,6 +1100,8 @@ static int virtnet_probe(struct virtio_device *vdev)
goto free;
INIT_DELAYED_WORK(&vi->refill, refill_work);
+ if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
+ INIT_WORK(&vi->announce, announce_work);
sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
@@ -1187,6 +1213,8 @@ static int virtnet_freeze(struct virtio_device *vdev)
virtqueue_disable_cb(vi->svq);
if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ))
virtqueue_disable_cb(vi->cvq);
+ if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
+ cancel_work_sync(&vi->announce);
netif_device_detach(vi->dev);
cancel_delayed_work_sync(&vi->refill);
@@ -1233,6 +1261,7 @@ static unsigned int features[] = {
VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
+ VIRTIO_NET_F_GUEST_ANNOUNCE,
};
static struct virtio_driver virtio_net_driver = {
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 970d5a2..44a38d6 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -49,8 +49,10 @@
#define VIRTIO_NET_F_CTRL_RX 18 /* Control channel RX mode support */
#define VIRTIO_NET_F_CTRL_VLAN 19 /* Control channel VLAN filtering */
#define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */
+#define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can send gratituous packet */
#define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
+#define VIRTIO_NET_S_ANNOUNCE 2 /* Announcement is needed */
struct virtio_net_config {
/* The config defining mac address (if VIRTIO_NET_F_MAC) */
^ permalink raw reply related
* Re: [V4 PATCH] virtio-net: send gratuitous packet when needed
From: Michael S. Tsirkin @ 2012-03-13 14:33 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120313090841.11110.82654.stgit@amd-6168-8-1.englab.nay.redhat.com>
On Tue, Mar 13, 2012 at 05:08:41PM +0800, Jason Wang wrote:
> As hypervior does not have the knowledge of guest network configuration, it's
> better to ask guest to send gratuitous packet when needed.
packet -> packets
>
> Guest test VIRTIO_NET_S_ANNOUNCE bit during config change interrupt and when it
test -> tests
> is set, a workqueue is scheduled to send gratuitous packet through
> NETDEV_NOTIFY_PEERS. This feature is negotiated through bit
> VIRTIO_NET_F_GUEST_ANNOUNCE.
>
> Changes from v3:
> - cancel the workqueue during freeze
>
> Changes from v2:
> - fix the race between unregister_dev() and workqueue
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/net/virtio_net.c | 31 ++++++++++++++++++++++++++++++-
> include/linux/virtio_net.h | 2 ++
> 2 files changed, 32 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 4880aa8..45f7ac6 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -72,6 +72,9 @@ struct virtnet_info {
> /* Work struct for refilling if we run low on memory. */
> struct delayed_work refill;
>
> + /* Work struct for sending gratituous packet. */
packets
> + struct work_struct announce;
> +
> /* Chain pages by the private ptr. */
> struct page *pages;
>
> @@ -512,6 +515,13 @@ static void refill_work(struct work_struct *work)
> queue_delayed_work(system_nrt_wq, &vi->refill, HZ/2);
> }
>
> +static void announce_work(struct work_struct *work)
> +{
> + struct virtnet_info *vi = container_of(work, struct virtnet_info,
> + announce);
> + netif_notify_peers(vi->dev);
> +}
> +
> static int virtnet_poll(struct napi_struct *napi, int budget)
> {
> struct virtnet_info *vi = container_of(napi, struct virtnet_info, napi);
> @@ -787,6 +797,8 @@ static int virtnet_close(struct net_device *dev)
>
> /* Make sure refill_work doesn't re-enable napi! */
> cancel_delayed_work_sync(&vi->refill);
> + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
> + cancel_work_sync(&vi->announce);
don't make this cancel conditional on has_feature -
this is out of data path, and code will be cleaner
if we do it unconditionally.
> napi_disable(&vi->napi);
>
> return 0;
> @@ -962,11 +974,23 @@ static void virtnet_update_status(struct virtnet_info *vi)
> return;
>
> /* Ignore unknown (future) status bits */
> - v &= VIRTIO_NET_S_LINK_UP;
> + v &= VIRTIO_NET_S_LINK_UP | VIRTIO_NET_S_ANNOUNCE;
>
> if (vi->status == v)
> return;
>
> + if (v & VIRTIO_NET_S_ANNOUNCE) {
> + v &= ~VIRTIO_NET_S_ANNOUNCE;
> + vi->vdev->config->set(vi->vdev,
> + offsetof(struct virtio_net_config,
> + status),
> + &v, sizeof(v));
> +
> + if ((v & VIRTIO_NET_S_LINK_UP) &&
> + virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
> + schedule_work(&vi->announce);
> + }
> +
It's probably easier to just do this unconditionally.
The only reason a feature bit might make sense is
that this way host knows guest will announce self.
Alternatively, if you want the ability to reuse the
status bit for something else, set must be conditional as well.
> vi->status = v;
>
> if (vi->status & VIRTIO_NET_S_LINK_UP) {
> @@ -1076,6 +1100,8 @@ static int virtnet_probe(struct virtio_device *vdev)
> goto free;
>
> INIT_DELAYED_WORK(&vi->refill, refill_work);
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
> + INIT_WORK(&vi->announce, announce_work);
Do this unconditionally too.
> sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
> sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
>
> @@ -1187,6 +1213,8 @@ static int virtnet_freeze(struct virtio_device *vdev)
> virtqueue_disable_cb(vi->svq);
> if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ))
> virtqueue_disable_cb(vi->cvq);
> + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
> + cancel_work_sync(&vi->announce);
>
> netif_device_detach(vi->dev);
> cancel_delayed_work_sync(&vi->refill);
> @@ -1233,6 +1261,7 @@ static unsigned int features[] = {
> VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
> VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
> VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
> + VIRTIO_NET_F_GUEST_ANNOUNCE,
> };
>
> static struct virtio_driver virtio_net_driver = {
> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> index 970d5a2..44a38d6 100644
> --- a/include/linux/virtio_net.h
> +++ b/include/linux/virtio_net.h
> @@ -49,8 +49,10 @@
> #define VIRTIO_NET_F_CTRL_RX 18 /* Control channel RX mode support */
> #define VIRTIO_NET_F_CTRL_VLAN 19 /* Control channel VLAN filtering */
> #define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */
> +#define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can send gratituous packet */
>
> #define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
> +#define VIRTIO_NET_S_ANNOUNCE 2 /* Announcement is needed */
I would put this in bit 8 (0x100), this way low status byte
is RO, high byte is RW.
>
> struct virtio_net_config {
> /* The config defining mac address (if VIRTIO_NET_F_MAC) */
^ permalink raw reply
* Re: [PATCH 0000/0004] drivers: hv
From: Greg KH @ 2012-03-13 21:50 UTC (permalink / raw)
To: K. Y. Srinivasan; +Cc: linux-kernel, devel, virtualization, ohering
In-Reply-To: <1331422300-4330-1-git-send-email-kys@microsoft.com>
On Sat, Mar 10, 2012 at 03:31:40PM -0800, K. Y. Srinivasan wrote:
>
> This patch-set further enhances the KVP functionality for Linux
> guests:
>
> 1. Supports most of the Win8 KVP protocol.
> 2. Supports operations on all the pools.
I've only applied the first patch, due to the issues with the second.
Feel free to resend the rest when you have them worked out.
greg k-h
^ permalink raw reply
* Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices
From: Justin Gibbs @ 2012-03-14 6:32 UTC (permalink / raw)
To: Jan Beulich
Cc: jeremy@goop.org, xen-devel@lists.xen.org, Ian Campbell,
konrad.wilk@oracle.com, waldi@debian.org, netdev@vger.kernel.org,
joe.jin@oracle.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org, weiyi.huang@gmail.com,
paul.gortmaker@windriver.com, Paul Durrant, David Vrabel,
Santosh Jodh, linux-pci@vger.kernel.org,
<konrad@darnok.org>, akpm@linux-foundation.org,
"virtualization@lists.linux-foundation.org" <virtualizati>
In-Reply-To: <4F5739760200007800076DA6@nat28.tlf.novell.com>
On Mar 7, 2012, at 2:33 AM, Jan Beulich wrote:
>>>> On 06.03.12 at 18:20, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
>> -> XENBUS_MAX_RING_PAGES - why 2? Why not 4? What is the optimal
>> default size for SSD usage? 16?
>
> What do SSDs have to do with a XenBus definition? Imo it's wrong (and
> unnecessary) to introduce a limit at the XenBus level at all - each driver
> can do this for itself.
>
> As to the limit for SSDs in the block interface - I don't think the number
> of possibly simultaneous requests has anything to do with this. Instead,
> I'd expect the request number/size/segments extension that NetBSD
> apparently implements to possibly have an effect.
>
> Jan
There's another problem here that I brought up during the Xen
Hack-a-thon. The ring macros require that the ring element count
be a power of two. This doesn't mean that the ring will be a power
of 2 pages in size. To illustrate this point, I modified the FreeBSD
blkback driver to provide negotiated ring stats via sysctl.
Here's a connection to a Windows VM running the Citrix PV drivers:
dev.xbbd.2.max_requests: 128
dev.xbbd.2.max_request_segments: 11
dev.xbbd.2.max_request_size: 45056
dev.xbbd.2.ring_elem_size: 108 <= 32bit ABI
dev.xbbd.2.ring_pages: 4
dev.xbbd.2.ring_elements: 128
dev.xbbd.2.ring_waste: 2496
Over half a page is wasted when ring-page-order is 2. I'm sure you
can see where this is going. :-)
Here are the limits published by our backend to the XenStore:
max-ring-pages = "113"
max-ring-page-order = "7"
max-requests = "256"
max-request-segments = "129"
max-request-size = "524288"
Because we allow so many concurrent, large requests in our product,
the ring wastage really adds up if the front end doesn't support
the "ring-pages" variant of the extension. However, you only need
a ring-page-order of 3 with this protocol to start seeing pages of
wasted ring space.
You don't really want to negotiate "ring-pages" either. The backends
often need to support multiple ABIs. I can easily construct a set
of limits for the FreeBSD blkback driver which will cause the ring
limits to vary by a page between the 32bit and 64bit ABIs.
With all this in mind, the backend must do a dance of rounding up,
taking the max of the ring sizes for the different ABIs, and then
validating the front-end published limits taking its ABI into
account. The front-end does some of this too. Its way too messy
and error prone because we don't communicate the ring element limit
directly.
"max-ring-element-order" anyone? :-)
--
Justin
^ permalink raw reply
* Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices
From: Jan Beulich @ 2012-03-14 8:35 UTC (permalink / raw)
To: Justin Gibbs
Cc: jeremy@goop.org, Ian Campbell, netdev@vger.kernel.org,
konrad.wilk@oracle.com, waldi@debian.org, joe.jin@oracle.com,
weiyi.huang@gmail.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org,
paul.gortmaker@windriver.com, Paul Durrant, DavidVrabel,
Santosh Jodh, linux-pci@vger.kernel.org,
<konrad@darnok.org>, akpm@linux-foundation.org, xen-devel
In-Reply-To: <230A793A-C929-468E-97BE-1C68920749E1@spectralogic.com>
>>> On 14.03.12 at 07:32, Justin Gibbs <justing@spectralogic.com> wrote:
> There's another problem here that I brought up during the Xen
> Hack-a-thon. The ring macros require that the ring element count
> be a power of two. This doesn't mean that the ring will be a power
> of 2 pages in size. To illustrate this point, I modified the FreeBSD
> blkback driver to provide negotiated ring stats via sysctl.
>
> Here's a connection to a Windows VM running the Citrix PV drivers:
>
> dev.xbbd.2.max_requests: 128
> dev.xbbd.2.max_request_segments: 11
> dev.xbbd.2.max_request_size: 45056
> dev.xbbd.2.ring_elem_size: 108 <= 32bit ABI
> dev.xbbd.2.ring_pages: 4
> dev.xbbd.2.ring_elements: 128
> dev.xbbd.2.ring_waste: 2496
>
> Over half a page is wasted when ring-page-order is 2. I'm sure you
> can see where this is going. :-)
>
> Here are the limits published by our backend to the XenStore:
>
> max-ring-pages = "113"
> max-ring-page-order = "7"
> max-requests = "256"
> max-request-segments = "129"
> max-request-size = "524288"
>
> Because we allow so many concurrent, large requests in our product,
> the ring wastage really adds up if the front end doesn't support
> the "ring-pages" variant of the extension. However, you only need
> a ring-page-order of 3 with this protocol to start seeing pages of
> wasted ring space.
>
> You don't really want to negotiate "ring-pages" either. The backends
> often need to support multiple ABIs. I can easily construct a set
> of limits for the FreeBSD blkback driver which will cause the ring
> limits to vary by a page between the 32bit and 64bit ABIs.
>
> With all this in mind, the backend must do a dance of rounding up,
> taking the max of the ring sizes for the different ABIs, and then
> validating the front-end published limits taking its ABI into
> account. The front-end does some of this too. Its way too messy
> and error prone because we don't communicate the ring element limit
> directly.
>
> "max-ring-element-order" anyone? :-)
Interesting observation - yes, I think deprecating both pre-existing
methods in favor of something along those lines would be desirable.
(But I'd favor not using the term "order" here as it is - at least in
Linux - usually implied to be used on pages. "max-ringent-log2"
perhaps?)
What you say also implies that all currently floating around Linux
backend patches are flawed in their way of calculating the number
of ring entries, as this number really depends on the protocol the
frontend advertises.
Further, if you're concerned about wasting ring space (and
particularly in the context of your request number/size/segments
extension), shouldn't we bother to define pairs (or larger groups)
of struct blkif_request_segment (as currently a quarter of the space
is mere padding)? Or split grefs from {first,last}_sect altogether?
Finally, while looking at all this again, I stumbled across the use
of blkif_vdev_t in the ring structures: At least Linux'es blkback
completely ignores this field - {xen_,}vbd_translate() simply
overwrites what dispatch_rw_block_io() put there (and with this,
struct phys_req's dev and bdev members seem rather pointless too).
Does anyone recall what the original intention with this request field
was? Allowing I/O on multiple devices over a single ring?
Bottom line - shouldn't we define a blkif2 interface to cleanly
accommodate all the various extensions (and do away with the
protocol variations)?
Jan
^ permalink raw reply
* Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices
From: Jan Beulich @ 2012-03-14 15:34 UTC (permalink / raw)
To: Justin Gibbs
Cc: jeremy@goop.org, Ian Campbell, netdev@vger.kernel.org,
konrad.wilk@oracle.com, waldi@debian.org, joe.jin@oracle.com,
weiyi.huang@gmail.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org,
paul.gortmaker@windriver.com, Paul Durrant, DavidVrabel,
Santosh Jodh, linux-pci@vger.kernel.org,
<konrad@darnok.org>, akpm@linux-foundation.org, xen-devel
In-Reply-To: <230A793A-C929-468E-97BE-1C68920749E1@spectralogic.com>
>>> On 14.03.12 at 07:32, Justin Gibbs <justing@spectralogic.com> wrote:
> There's another problem here that I brought up during the Xen
> Hack-a-thon. The ring macros require that the ring element count
> be a power of two. This doesn't mean that the ring will be a power
> of 2 pages in size. To illustrate this point, I modified the FreeBSD
> blkback driver to provide negotiated ring stats via sysctl.
>
> Here's a connection to a Windows VM running the Citrix PV drivers:
>
> dev.xbbd.2.max_requests: 128
> dev.xbbd.2.max_request_segments: 11
> dev.xbbd.2.max_request_size: 45056
> dev.xbbd.2.ring_elem_size: 108 <= 32bit ABI
> dev.xbbd.2.ring_pages: 4
> dev.xbbd.2.ring_elements: 128
> dev.xbbd.2.ring_waste: 2496
>
> Over half a page is wasted when ring-page-order is 2. I'm sure you
> can see where this is going. :-)
Having looked a little closer on how the wasted space is progressing,
I find myself in the odd position that I can't explain the original (and
still active) definition of BLKIF_MAX_SEGMENTS_PER_REQUEST (11):
With ring-order zero, there's 0x240/0x1c0 bytes (32/64-bit
respectively) are unused. With 32 requests fitting in the ring, and with
each segment occupying 6 bytes (padded to 8), in the 64-bit variant
there's enough space for a 12th segment (32-bit would even have
space for a 13th). Am I missing anything here?
Plus all this assumes a page size of 4k, yet ia64 had always been using
pages of 16k iirc.
> Here are the limits published by our backend to the XenStore:
>
> max-ring-pages = "113"
> max-ring-page-order = "7"
> max-requests = "256"
> max-request-segments = "129"
> max-request-size = "524288"
Oh, so this protocol doesn't require ring-pages (and max-ring-pages)
to be a power of two? In which case I think it is a mistake to also
advertise max-ring-page-order, as at least the (Linux) frontend code
I know of interprets this as being able to set up a ring of (using the
numbers above) 128 pages (unless, of course, your backend can deal
with this regardless of the max-ring-pages value it announces).
Jan
^ permalink raw reply
* Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices
From: Justin Gibbs @ 2012-03-14 17:01 UTC (permalink / raw)
To: Jan Beulich
Cc: jeremy@goop.org, Ian Campbell, netdev@vger.kernel.org,
konrad.wilk@oracle.com, waldi@debian.org, joe.jin@oracle.com,
weiyi.huang@gmail.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org,
paul.gortmaker@windriver.com, Paul Durrant, DavidVrabel,
Santosh Jodh, linux-pci@vger.kernel.org,
<konrad@darnok.org>, akpm@linux-foundation.org, xen-devel
In-Reply-To: <4F60C888020000780007861F@nat28.tlf.novell.com>
On Mar 14, 2012, at 9:34 AM, Jan Beulich wrote:
>>>> On 14.03.12 at 07:32, Justin Gibbs <justing@spectralogic.com> wrote:
>> There's another problem here that I brought up during the Xen
>> Hack-a-thon. The ring macros require that the ring element count
>> be a power of two. This doesn't mean that the ring will be a power
>> of 2 pages in size. To illustrate this point, I modified the FreeBSD
>> blkback driver to provide negotiated ring stats via sysctl.
>>
>> Here's a connection to a Windows VM running the Citrix PV drivers:
>>
>> dev.xbbd.2.max_requests: 128
>> dev.xbbd.2.max_request_segments: 11
>> dev.xbbd.2.max_request_size: 45056
>> dev.xbbd.2.ring_elem_size: 108 <= 32bit ABI
>> dev.xbbd.2.ring_pages: 4
>> dev.xbbd.2.ring_elements: 128
>> dev.xbbd.2.ring_waste: 2496
>>
>> Over half a page is wasted when ring-page-order is 2. I'm sure you
>> can see where this is going. :-)
>
> Having looked a little closer on how the wasted space is progressing,
> I find myself in the odd position that I can't explain the original (and
> still active) definition of BLKIF_MAX_SEGMENTS_PER_REQUEST (11):
> With ring-order zero, there's 0x240/0x1c0 bytes (32/64-bit
> respectively) are unused. With 32 requests fitting in the ring, and with
> each segment occupying 6 bytes (padded to 8), in the 64-bit variant
> there's enough space for a 12th segment (32-bit would even have
> space for a 13th). Am I missing anything here?
I don't profess to know the real reason, but the only thing I can come up
with is a requirement/desire on some platforms for 16byte alignment
of the request structures. This would make the largest possible structure
112 bytes, not the 120 that would allow for more elements.
While we're talking about fixing ring data structures, can RING_IDX
be defined as a "uint32_t" instead of "unsigned int". The structure
padding in the ring macros assumes RING_IDX is exactly 4 bytes,
so this should be made explicit. ILP64 machines may still be a way
out, but the use of non-fixed sized types in places where size really
matters just isn't clean.
>
>> Here are the limits published by our backend to the XenStore:
>>
>> max-ring-pages = "113"
>> max-ring-page-order = "7"
>> max-requests = "256"
>> max-request-segments = "129"
>> max-request-size = "524288"
>
> Oh, so this protocol doesn't require ring-pages (and max-ring-pages)
> to be a power of two? In which case I think it is a mistake to also
> advertise max-ring-page-order, as at least the (Linux) frontend code
> I know of interprets this as being able to set up a ring of (using the
> numbers above) 128 pages (unless, of course, your backend can deal
> with this regardless of the max-ring-pages value it announces).
The advertised max-ring-pages is sufficient to hold the maximum allowed
number of ring elements regardless of ABI. This is then rounded up to the
next power of 2 pages to get the max-ring-page order. When the front-end
negotiates, the backend just verifies that the maximum number of ring
elements in the specified ring size doesn't exceed the backend's limit.
Fortunately, even with this large of a ring, regardless of ABI, a given
page order computes to the same number of ring elements. You just have
more wasted space.
--
Justin
^ permalink raw reply
* Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices
From: Justin T. Gibbs @ 2012-03-14 17:17 UTC (permalink / raw)
To: Jan Beulich
Cc: jeremy@goop.org, xen-devel@lists.xen.org, Ian Campbell,
konrad.wilk@oracle.com, waldi@debian.org, netdev@vger.kernel.org,
joe.jin@oracle.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org, weiyi.huang@gmail.com,
paul.gortmaker@windriver.com, Paul Durrant, David Vrabel,
Santosh Jodh, linux-pci@vger.kernel.org,
akpm@linux-foundation.org,
virtualization@lists.linux-foundation.org, lersek
In-Reply-To: <4F55DA3E0200007800076916@nat28.tlf.novell.com>
On Mar 6, 2012, at 1:34 AM, Jan Beulich wrote:
>>>> On 05.03.12 at 22:49, Santosh Jodh <Santosh.Jodh@citrix.com> wrote:
…
>> + }
>> +
>> /* Create shared ring, alloc event channel. */
>> err = setup_blkring(dev, info);
>> if (err)
>> @@ -889,12 +916,35 @@ again:
>> goto destroy_blkring;
>> }
>>
>> - err = xenbus_printf(xbt, dev->nodename,
>> - "ring-ref", "%u", info->ring_ref);
>> - if (err) {
>> - message = "writing ring-ref";
>> - goto abort_transaction;
>> + if (legacy_backend) {
>
> Why not use the simpler interface always when info->ring_order == 0?
Because, as I just found out today via a FreeBSD bug report, that's
not how XenServer works. If the front-end publishes "ring-page-order",
the backend assumes the "ring-refNN" XenStore nodes are in effect,
even if the order is 0.
I'm working on a documentation update for blkif.h now.
<sigh>
--
Justin
^ permalink raw reply
* Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices
From: Jan Beulich @ 2012-03-15 8:03 UTC (permalink / raw)
To: Justin Gibbs
Cc: jeremy@goop.org, Ian Campbell, konrad.wilk@oracle.com,
waldi@debian.org, joe.jin@oracle.com, weiyi.huang@gmail.com,
virtualization@lists.linux-foundation.org, Paul Durrant,
DavidVrabel, SantoshJodh, <konrad@darnok.org>,
dgdegra@tycho.nsa.gov, xen-devel@lists.xen.org, lersek@redhat.com
In-Reply-To: <59150D7B-8349-4249-A93A-04883F4AD744@spectralogic.com>
>>> On 14.03.12 at 18:01, Justin Gibbs <justing@spectralogic.com> wrote:
> While we're talking about fixing ring data structures, can RING_IDX
> be defined as a "uint32_t" instead of "unsigned int". The structure
> padding in the ring macros assumes RING_IDX is exactly 4 bytes,
> so this should be made explicit. ILP64 machines may still be a way
> out, but the use of non-fixed sized types in places where size really
> matters just isn't clean.
Yes, if we're going to rev the interface, then any such flaws should be
corrected.
(Also shrinking the Cc list a little.)
Jan
^ permalink raw reply
* Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices
From: Jan Beulich @ 2012-03-15 8:09 UTC (permalink / raw)
To: Justin T. Gibbs
Cc: jeremy@goop.org, Ian Campbell, konrad.wilk@oracle.com,
waldi@debian.org, joe.jin@oracle.com, weiyi.huang@gmail.com,
virtualization@lists.linux-foundation.org, Paul Durrant,
David Vrabel, Santosh Jodh, dgdegra@tycho.nsa.gov,
xen-devel@lists.xen.org, lersek@redhat.com
In-Reply-To: <96ED0F5B-AE7D-4C8E-A8E1-D386B2DED212@scsiguy.com>
>>> On 14.03.12 at 18:17, "Justin T. Gibbs" <gibbs@scsiguy.com> wrote:
> On Mar 6, 2012, at 1:34 AM, Jan Beulich wrote:
>
>>>>> On 05.03.12 at 22:49, Santosh Jodh <Santosh.Jodh@citrix.com> wrote:
>
> …
>
>>> + }
>>> +
>>> /* Create shared ring, alloc event channel. */
>>> err = setup_blkring(dev, info);
>>> if (err)
>>> @@ -889,12 +916,35 @@ again:
>>> goto destroy_blkring;
>>> }
>>>
>>> - err = xenbus_printf(xbt, dev->nodename,
>>> - "ring-ref", "%u", info->ring_ref);
>>> - if (err) {
>>> - message = "writing ring-ref";
>>> - goto abort_transaction;
>>> + if (legacy_backend) {
>>
>> Why not use the simpler interface always when info->ring_order == 0?
>
> Because, as I just found out today via a FreeBSD bug report, that's
> not how XenServer works. If the front-end publishes "ring-page-order",
> the backend assumes the "ring-refNN" XenStore nodes are in effect,
> even if the order is 0.
I was certainly implying to not write the ring-page-order and
num-ring-pages nodes in that case.
> I'm working on a documentation update for blkif.h now.
>
> <sigh>
>
> --
> Justin
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices
From: Ian Campbell @ 2012-03-15 8:51 UTC (permalink / raw)
To: Jan Beulich
Cc: jeremy@goop.org, konrad.wilk@oracle.com, waldi@debian.org,
joe.jin@oracle.com, weiyi.huang@gmail.com,
virtualization@lists.linux-foundation.org, Justin Gibbs,
Paul Durrant, David Vrabel, Santosh Jodh,
<konrad@darnok.org>, dgdegra@tycho.nsa.gov,
xen-devel@lists.xen.org, lersek@redhat.com
In-Reply-To: <4F61B04402000078000788B8@nat28.tlf.novell.com>
On Thu, 2012-03-15 at 08:03 +0000, Jan Beulich wrote:
> >>> On 14.03.12 at 18:01, Justin Gibbs <justing@spectralogic.com> wrote:
> > While we're talking about fixing ring data structures, can RING_IDX
> > be defined as a "uint32_t" instead of "unsigned int". The structure
> > padding in the ring macros assumes RING_IDX is exactly 4 bytes,
> > so this should be made explicit. ILP64 machines may still be a way
> > out, but the use of non-fixed sized types in places where size really
> > matters just isn't clean.
>
> Yes, if we're going to rev the interface, then any such flaws should be
> corrected.
There has been talk of doing something similar for netif too. IIRC the
netchannel2 work included a new generic ring scheme with support for
variable sized req/rsp elements and such.
If we are going to rev the rings then should we try and use a common
ring mechanism? I think so. If so then we could do worse than to start
from the netchannel2 ring stuff and/or concepts?
Looks like that is
http://xenbits.xen.org/ext/netchannel2/linux-2.6.18/log/075f6677a290/include/xen/interface/io/uring.h
still a bit nc2 specific though.
Ian.
^ permalink raw reply
* Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices
From: Jan Beulich @ 2012-03-15 9:31 UTC (permalink / raw)
To: Ian Campbell
Cc: jeremy@goop.org, konrad.wilk@oracle.com, waldi@debian.org,
joe.jin@oracle.com, weiyi.huang@gmail.com,
virtualization@lists.linux-foundation.org, Justin Gibbs,
Paul Durrant, David Vrabel, SantoshJodh,
<konrad@darnok.org>, dgdegra@tycho.nsa.gov,
xen-devel@lists.xen.org, lersek@redhat.com
In-Reply-To: <1331801494.26979.8.camel@zakaz.uk.xensource.com>
>>> On 15.03.12 at 09:51, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2012-03-15 at 08:03 +0000, Jan Beulich wrote:
>> >>> On 14.03.12 at 18:01, Justin Gibbs <justing@spectralogic.com> wrote:
>> > While we're talking about fixing ring data structures, can RING_IDX
>> > be defined as a "uint32_t" instead of "unsigned int". The structure
>> > padding in the ring macros assumes RING_IDX is exactly 4 bytes,
>> > so this should be made explicit. ILP64 machines may still be a way
>> > out, but the use of non-fixed sized types in places where size really
>> > matters just isn't clean.
>>
>> Yes, if we're going to rev the interface, then any such flaws should be
>> corrected.
>
> There has been talk of doing something similar for netif too. IIRC the
> netchannel2 work included a new generic ring scheme with support for
> variable sized req/rsp elements and such.
>
> If we are going to rev the rings then should we try and use a common
> ring mechanism? I think so. If so then we could do worse than to start
> from the netchannel2 ring stuff and/or concepts?
>
> Looks like that is
> http://xenbits.xen.org/ext/netchannel2/linux-2.6.18/log/075f6677a290/include
> /xen/interface/io/uring.h
> still a bit nc2 specific though.
Taking the concept (and the implementation as a starting point) would
seem like a good idea to me. Separate request and reply rings as well
as variable size entries would certainly benefit blkif too.
Jan
^ permalink raw reply
* Re: [Xen-devel] [PATCH 2/4 v2] xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
From: Konrad Rzeszutek Wilk @ 2012-03-15 17:23 UTC (permalink / raw)
To: Andrew Jones
Cc: jeremy, xen-devel, FlorianSchandinat, dmitry torokhov,
virtualization, Konrad Rzeszutek Wilk
In-Reply-To: <7e004a73-b1ef-49f4-91e8-68fdf48c3ccb@zmail13.collab.prod.int.phx2.redhat.com>
On Wed, Jan 11, 2012 at 11:29:20AM -0500, Andrew Jones wrote:
>
>
> ----- Original Message -----
> > On Mon, Jan 09, 2012 at 06:51:41PM +0100, Andrew Jones wrote:
> > > PV-on-HVM guests may want to use the xen keyboard/mouse frontend,
> > > but
> > > they don't use the xen frame buffer frontend. For this case it
> > > doesn't
> >
> > Ok, but PV does?
> > > make much sense for INPUT_XEN_KBDDEV_FRONTEND to depend on
> > > XEN_FBDEV_FRONTEND. The opposite direction always makes more sense,
> > > i.e.
> > > if you're using xenfb, then you'll want xenkbd. Switch the
> > > dependencies.
> >
> > That sounds like it would be universal irregardless if it is
> > PV or PVonHVM?
>
> This patch makes it such that if you want to use both, then you must
> select both. It also says that if you want FB, then you need the
> KBD. However, if you only want the KBD then you're fine with just
> that. So there isn't any risk of breaking configs designed to use
> FB, because FB should be manually selected for those configs anyway.
Dmitry,
I am OK with this patch. Should I pick it up on my tree for 3.4 or
are you OK doing it via your tree?
Thanks!
>
> Drew
>
> > >
> > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > ---
> > > drivers/input/misc/Kconfig | 2 +-
> > > drivers/video/Kconfig | 2 +-
> > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/input/misc/Kconfig
> > > b/drivers/input/misc/Kconfig
> > > index 22d875f..36c15bf 100644
> > > --- a/drivers/input/misc/Kconfig
> > > +++ b/drivers/input/misc/Kconfig
> > > @@ -533,7 +533,7 @@ config INPUT_CMA3000_I2C
> > >
> > > config INPUT_XEN_KBDDEV_FRONTEND
> > > tristate "Xen virtual keyboard and mouse support"
> > > - depends on XEN_FBDEV_FRONTEND
> > > + depends on XEN
> > > default y
> > > select XEN_XENBUS_FRONTEND
> > > help
> > > diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> > > index d83e967..3e38c2f 100644
> > > --- a/drivers/video/Kconfig
> > > +++ b/drivers/video/Kconfig
> > > @@ -2263,7 +2263,7 @@ config FB_VIRTUAL
> > >
> > > config XEN_FBDEV_FRONTEND
> > > tristate "Xen virtual frame buffer support"
> > > - depends on FB && XEN
> > > + depends on FB && XEN && INPUT_XEN_KBDDEV_FRONTEND
> > > select FB_SYS_FILLRECT
> > > select FB_SYS_COPYAREA
> > > select FB_SYS_IMAGEBLIT
> > > --
> > > 1.7.7.5
> > >
> > >
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@lists.xensource.com
> > > http://lists.xensource.com/xen-devel
> >
^ permalink raw reply
* RE: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to the host
From: KY Srinivasan @ 2012-03-15 23:03 UTC (permalink / raw)
To: James Bottomley
Cc: Christoph Hellwig, gregkh@linuxfoundation.org,
linux-kernel@vger.kernel.org, devel@linuxdriverproject.org,
virtualization@lists.osdl.org, ohering@suse.com,
linux-scsi@vger.kernel.org, Haiyang Zhang
In-Reply-To: <1330872513.2858.14.camel@dabdike.int.hansenpartnership.com>
> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@HansenPartnership.com]
> Sent: Sunday, March 04, 2012 9:49 AM
> To: KY Srinivasan
> Cc: Christoph Hellwig; gregkh@linuxfoundation.org; linux-
> kernel@vger.kernel.org; devel@linuxdriverproject.org;
> virtualization@lists.osdl.org; ohering@suse.com; linux-scsi@vger.kernel.org;
> Haiyang Zhang
> Subject: RE: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to
> the host
>
> On Sun, 2012-03-04 at 14:23 +0000, KY Srinivasan wrote:
> >
> > > -----Original Message-----
> > > From: Christoph Hellwig [mailto:hch@infradead.org]
> > > Sent: Sunday, March 04, 2012 4:12 AM
> > > To: KY Srinivasan
> > > Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
> > > devel@linuxdriverproject.org; virtualization@lists.osdl.org;
> ohering@suse.com;
> > > jbottomley@parallels.com; hch@infradead.org; linux-scsi@vger.kernel.org;
> > > Haiyang Zhang
> > > Subject: Re: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to
> > > the host
> > >
> > > On Fri, Mar 02, 2012 at 12:49:07PM -0800, K. Y. Srinivasan wrote:
> > > > Windows hosts don't handle the ATA_16 command; don't pass it to the
> host.
> > >
> > > Most devices don't handle it, and answer with and unsupported opcode
> > > sense reason. If hyperv iis buggy enough to crap out on it please add
> > > a comment explaining that.
> >
> > The host does not "crap out", it does return an error code but it is not
> "unsupported opcode".
> > The sense reason that comes back is a generic error SRB_STATUS code. It is
> easier for me to filter the
> > command on the outgoing side as opposed to dealing with a generic error code
> that is coming back from
> > the host.
>
> That's the wrong thing to do ... you need to unwrap the error code.
> The reason being I presume it's not impossible for Windows to host a
> device supporting ATA_16 and there are signs that this is going to be
> necessary to prevent data corruption on some USB devices ... if you just
> filter the command without checking if the host supports it, you're
> going to end up perpetuating the corruption problem.
James,
Currently, the Windows host does not provide me with granular error codes to
distinguish between the case where the operation failed and the case where it is
an illegal or unsupported operation. I have asked the windows team to return more
appropriate error codes and they may do that sometime soon. In the interim, filtering
the command on the outgoing path is the best way I can deal with this issue. Currently,
even on the host side they are filtering this command, except they are returning a failure
to the guest instead of a more appropriate error code.
While my proposal has been accepted for windows 8, I don't know when this change will ship.
Also, it is not clear when prior versions of Windows hosts that we care about will pick up this
change. This patch is the only way I know how to deal with this problems within my current
constraints.
Regards,
K. Y
>
> The general rule of thumb for avoiding this is to let the lower layers
> handle as much as possible, and only begin behaviour alterations in the
> upper layers if the lower layers have a provable and usually fatal
> failure.
>
> James
>
>
^ permalink raw reply
* RE: [PATCH 0000/0004] drivers: hv
From: KY Srinivasan @ 2012-03-15 23:26 UTC (permalink / raw)
To: Greg KH
Cc: linux-kernel@vger.kernel.org, devel@linuxdriverproject.org,
virtualization@lists.osdl.org, ohering@suse.com
In-Reply-To: <20120313215036.GA4972@kroah.com>
> -----Original Message-----
> From: Greg KH [mailto:gregkh@linuxfoundation.org]
> Sent: Tuesday, March 13, 2012 5:51 PM
> To: KY Srinivasan
> Cc: linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> virtualization@lists.osdl.org; ohering@suse.com
> Subject: Re: [PATCH 0000/0004] drivers: hv
>
> On Sat, Mar 10, 2012 at 03:31:40PM -0800, K. Y. Srinivasan wrote:
> >
> > This patch-set further enhances the KVP functionality for Linux
> > guests:
> >
> > 1. Supports most of the Win8 KVP protocol.
> > 2. Supports operations on all the pools.
>
> I've only applied the first patch, due to the issues with the second.
> Feel free to resend the rest when you have them worked out.
Thanks Greg. I don't think there are any substantive issues with the remaining
patches. I will fix them up and re-send them soon.
K. Y
^ permalink raw reply
* RE: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP messages in the driver
From: KY Srinivasan @ 2012-03-15 23:36 UTC (permalink / raw)
To: Dan Carpenter
Cc: gregkh@linuxfoundation.org, ohering@suse.com,
linux-kernel@vger.kernel.org, virtualization@lists.osdl.org,
Alan Stern, devel@linuxdriverproject.org
In-Reply-To: <20120312130353.GF3337@mwanda>
> -----Original Message-----
> From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> Sent: Monday, March 12, 2012 9:04 AM
> To: KY Srinivasan
> Cc: gregkh@linuxfoundation.org; ohering@suse.com; linux-
> kernel@vger.kernel.org; virtualization@lists.osdl.org; Alan Stern;
> devel@linuxdriverproject.org
> Subject: Re: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP
> messages in the driver
>
> On Mon, Mar 12, 2012 at 12:36:53PM +0000, KY Srinivasan wrote:
> > Dan,
> > I am sorry for not being as precise as I should be:
> > utf16s_to_utf8s() takes two length parameters - the length of the utf16 string
> > that is to be converted and the second the length of the utf8 output string.
> > The windows host manipulates all string in utf16 encoding and the string we get
> > from the host is guaranteed to be less than or equal to MAX value that we have
> > including the terminating character. In my code, I simply pass the length of the
> > utf16 string as received from the host.
> >
> > The parameter that I am currently passing MAX length value is the "maxout"
> > parameter of the utf16s_utf8s() function. This by definition is the size of the
> > output buffer and in this case it happens to be MAX characters big.
> >
>
> I also think I'm not being as clear as I should... I understand
> that you trust the input; I'm say that for correctness sake you
> should specify a output size which leaves room for the NUL char.
>
> I can't say I know this code very well so I could be wrong, but it's
> what we do inside usb_string() for example. Can someone who knows
> the code check if we should do something like this:
>
Dan,
Sorry I could not get back to you earlier. You are right, in that I am trusting
the host! I am of the firm belief that in a virtualized environment, if you don't trust
the host, there is not a whole lot you can do in a guest! I have had this arguments
with other on this mailing list in the past. Having said that, I think we have spent more
time debating this than we should; your proposal is reasonable and I will go ahead and
re-spin those patches based on your comments. I should be posting them shortly.
Regards,
K. Y
^ permalink raw reply
* [PATCH 0000/0003] drivers: hv
From: K. Y. Srinivasan @ 2012-03-16 0:48 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering
This patch-set further enhances the KVP functionality for Linux
guests:
1. Supports most of the Win8 KVP protocol.
2. Supports operations on all the pools.
This is a re-send of the series sent on March10. I have made
changes to address comments Dan Carpenter had posted.
Regards,
K. Y
^ permalink raw reply
* [PATCH 1/3] Drivers: hv: Support the newly introduced KVP messages in the driver
From: K. Y. Srinivasan @ 2012-03-16 0:48 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering; +Cc: K. Y. Srinivasan
In-Reply-To: <1331858893-775-1-git-send-email-kys@microsoft.com>
Support the newly defined KVP message types. It turns out that the host
pushes a set of standard key value pairs as soon as the guest opens the KVP channel.
Since we cannot handle these tuples until the user level daemon loads up, defer
reading the KVP channel until the user level daemon is launched.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/hv/hv_kvp.c | 200 +++++++++++++++++++++++++++++++++++-----------
include/linux/hyperv.h | 2 +
tools/hv/hv_kvp_daemon.c | 7 ++
3 files changed, 162 insertions(+), 47 deletions(-)
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 779109b..3485dee 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -42,9 +42,10 @@
static struct {
bool active; /* transaction status - active or not */
int recv_len; /* number of bytes received. */
- int index; /* current index */
+ struct hv_kvp_msg *kvp_msg; /* current message */
struct vmbus_channel *recv_channel; /* chn we got the request */
u64 recv_req_id; /* request ID. */
+ void *kvp_context; /* for the channel callback */
} kvp_transaction;
static void kvp_send_key(struct work_struct *dummy);
@@ -110,12 +111,15 @@ kvp_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
struct hv_kvp_msg_enumerate *data;
message = (struct hv_kvp_msg *)msg->data;
- if (message->kvp_hdr.operation == KVP_OP_REGISTER) {
+ switch (message->kvp_hdr.operation) {
+ case KVP_OP_REGISTER:
pr_info("KVP: user-mode registering done.\n");
kvp_register();
- }
+ kvp_transaction.active = false;
+ hv_kvp_onchannelcallback(kvp_transaction.kvp_context);
+ break;
- if (message->kvp_hdr.operation == KVP_OP_ENUMERATE) {
+ default:
data = &message->body.kvp_enum_data;
/*
* Complete the transaction by forwarding the key value
@@ -133,21 +137,104 @@ kvp_send_key(struct work_struct *dummy)
{
struct cn_msg *msg;
struct hv_kvp_msg *message;
- int index = kvp_transaction.index;
+ struct hv_kvp_msg *in_msg;
+ __u8 operation = kvp_transaction.kvp_msg->kvp_hdr.operation;
+ __u8 pool = kvp_transaction.kvp_msg->kvp_hdr.pool;
+ __u32 val32;
+ __u64 val64;
msg = kzalloc(sizeof(*msg) + sizeof(struct hv_kvp_msg) , GFP_ATOMIC);
+ if (!msg)
+ return;
- if (msg) {
- msg->id.idx = CN_KVP_IDX;
- msg->id.val = CN_KVP_VAL;
+ msg->id.idx = CN_KVP_IDX;
+ msg->id.val = CN_KVP_VAL;
- message = (struct hv_kvp_msg *)msg->data;
- message->kvp_hdr.operation = KVP_OP_ENUMERATE;
- message->body.kvp_enum_data.index = index;
- msg->len = sizeof(struct hv_kvp_msg);
- cn_netlink_send(msg, 0, GFP_ATOMIC);
- kfree(msg);
+ message = (struct hv_kvp_msg *)msg->data;
+ message->kvp_hdr.operation = operation;
+ message->kvp_hdr.pool = pool;
+ in_msg = kvp_transaction.kvp_msg;
+
+ /*
+ * The key/value strings sent from the host are encoded in
+ * in utf16; convert it to utf8 strings.
+ * The host assures us that the utf16 strings will not exceed
+ * the max lengths specified. We will however, reserve room
+ * for the string terminating character - in the utf16s_utf8s()
+ * function we limit the size of the buffer where the converted
+ * string is placed to HV_KVP_EXCHANGE_MAX_*_SIZE -1 to gaurantee
+ * that the strings can be properly terminated!
+ */
+
+ switch (message->kvp_hdr.operation) {
+ case KVP_OP_SET:
+ switch (in_msg->body.kvp_set.data.value_type) {
+ case REG_SZ:
+ /*
+ * The value is a string - utf16 encoding.
+ */
+ message->body.kvp_set.data.value_size =
+ utf16s_to_utf8s(
+ (wchar_t *)in_msg->body.kvp_set.data.value,
+ in_msg->body.kvp_set.data.value_size,
+ UTF16_LITTLE_ENDIAN,
+ message->body.kvp_set.data.value,
+ HV_KVP_EXCHANGE_MAX_VALUE_SIZE - 1) + 1;
+ break;
+
+ case REG_U32:
+ /*
+ * The value is a 32 bit scalar.
+ * We save this as a utf8 string.
+ */
+ val32 = in_msg->body.kvp_set.data.value_u32;
+ message->body.kvp_set.data.value_size =
+ sprintf(message->body.kvp_set.data.value,
+ "%d", val32) + 1;
+ break;
+
+ case REG_U64:
+ /*
+ * The value is a 64 bit scalar.
+ * We save this as a utf8 string.
+ */
+ val64 = in_msg->body.kvp_set.data.value_u64;
+ message->body.kvp_set.data.value_size =
+ sprintf(message->body.kvp_set.data.value,
+ "%llu", val64) + 1;
+ break;
+
+ }
+ case KVP_OP_GET:
+ message->body.kvp_set.data.key_size =
+ utf16s_to_utf8s(
+ (wchar_t *)in_msg->body.kvp_set.data.key,
+ in_msg->body.kvp_set.data.key_size,
+ UTF16_LITTLE_ENDIAN,
+ message->body.kvp_set.data.key,
+ HV_KVP_EXCHANGE_MAX_KEY_SIZE - 1) + 1;
+ break;
+
+ case KVP_OP_DELETE:
+ message->body.kvp_delete.key_size =
+ utf16s_to_utf8s(
+ (wchar_t *)in_msg->body.kvp_delete.key,
+ in_msg->body.kvp_delete.key_size,
+ UTF16_LITTLE_ENDIAN,
+ message->body.kvp_delete.key,
+ HV_KVP_EXCHANGE_MAX_KEY_SIZE - 1) + 1;
+ break;
+
+ case KVP_OP_ENUMERATE:
+ message->body.kvp_enum_data.index =
+ in_msg->body.kvp_enum_data.index;
+ break;
}
+
+ msg->len = sizeof(struct hv_kvp_msg);
+ cn_netlink_send(msg, 0, GFP_ATOMIC);
+ kfree(msg);
+
return;
}
@@ -159,7 +246,7 @@ static void
kvp_respond_to_host(char *key, char *value, int error)
{
struct hv_kvp_msg *kvp_msg;
- struct hv_kvp_msg_enumerate *kvp_data;
+ struct hv_kvp_exchg_msg_value *kvp_data;
char *key_name;
struct icmsg_hdr *icmsghdrp;
int keylen, valuelen;
@@ -189,6 +276,9 @@ kvp_respond_to_host(char *key, char *value, int error)
kvp_transaction.active = false;
+ icmsghdrp = (struct icmsg_hdr *)
+ &recv_buffer[sizeof(struct vmbuspipe_hdr)];
+
if (channel->onchannel_callback == NULL)
/*
* We have raced with util driver being unloaded;
@@ -196,41 +286,57 @@ kvp_respond_to_host(char *key, char *value, int error)
*/
return;
- icmsghdrp = (struct icmsg_hdr *)
- &recv_buffer[sizeof(struct vmbuspipe_hdr)];
- kvp_msg = (struct hv_kvp_msg *)
- &recv_buffer[sizeof(struct vmbuspipe_hdr) +
- sizeof(struct icmsg_hdr)];
- kvp_data = &kvp_msg->body.kvp_enum_data;
- key_name = key;
/*
* If the error parameter is set, terminate the host's enumeration.
*/
if (error) {
/*
- * We don't support this index or the we have timedout;
+ * Something failed or the we have timedout;
* terminate the host-side iteration by returning an error.
*/
icmsghdrp->status = HV_E_FAIL;
goto response_done;
}
+ icmsghdrp->status = HV_S_OK;
+
+ kvp_msg = (struct hv_kvp_msg *)
+ &recv_buffer[sizeof(struct vmbuspipe_hdr) +
+ sizeof(struct icmsg_hdr)];
+
+ switch (kvp_transaction.kvp_msg->kvp_hdr.operation) {
+ case KVP_OP_GET:
+ kvp_data = &kvp_msg->body.kvp_get.data;
+ goto copy_value;
+
+ case KVP_OP_SET:
+ case KVP_OP_DELETE:
+ goto response_done;
+
+ default:
+ break;
+ }
+
+ kvp_data = &kvp_msg->body.kvp_enum_data.data;
+ key_name = key;
+
/*
* The windows host expects the key/value pair to be encoded
* in utf16.
*/
keylen = utf8s_to_utf16s(key_name, strlen(key_name), UTF16_HOST_ENDIAN,
- (wchar_t *) kvp_data->data.key,
+ (wchar_t *) kvp_data->key,
HV_KVP_EXCHANGE_MAX_KEY_SIZE / 2);
- kvp_data->data.key_size = 2*(keylen + 1); /* utf16 encoding */
+ kvp_data->key_size = 2*(keylen + 1); /* utf16 encoding */
+
+copy_value:
valuelen = utf8s_to_utf16s(value, strlen(value), UTF16_HOST_ENDIAN,
- (wchar_t *) kvp_data->data.value,
+ (wchar_t *) kvp_data->value,
HV_KVP_EXCHANGE_MAX_VALUE_SIZE / 2);
- kvp_data->data.value_size = 2*(valuelen + 1); /* utf16 encoding */
+ kvp_data->value_size = 2*(valuelen + 1); /* utf16 encoding */
- kvp_data->data.value_type = REG_SZ; /* all our values are strings */
- icmsghdrp->status = HV_S_OK;
+ kvp_data->value_type = REG_SZ; /* all our values are strings */
response_done:
icmsghdrp->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;
@@ -257,11 +363,18 @@ void hv_kvp_onchannelcallback(void *context)
u64 requestid;
struct hv_kvp_msg *kvp_msg;
- struct hv_kvp_msg_enumerate *kvp_data;
struct icmsg_hdr *icmsghdrp;
struct icmsg_negotiate *negop = NULL;
+ if (kvp_transaction.active) {
+ /*
+ * We will defer processing this callback once
+ * the current transaction is complete.
+ */
+ kvp_transaction.kvp_context = context;
+ return;
+ }
vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE, &recvlen, &requestid);
@@ -276,29 +389,16 @@ void hv_kvp_onchannelcallback(void *context)
sizeof(struct vmbuspipe_hdr) +
sizeof(struct icmsg_hdr)];
- kvp_data = &kvp_msg->body.kvp_enum_data;
-
- /*
- * We only support the "get" operation on
- * "KVP_POOL_AUTO" pool.
- */
-
- if ((kvp_msg->kvp_hdr.pool != KVP_POOL_AUTO) ||
- (kvp_msg->kvp_hdr.operation !=
- KVP_OP_ENUMERATE)) {
- icmsghdrp->status = HV_E_FAIL;
- goto callback_done;
- }
-
/*
* Stash away this global state for completing the
* transaction; note transactions are serialized.
*/
+
kvp_transaction.recv_len = recvlen;
kvp_transaction.recv_channel = channel;
kvp_transaction.recv_req_id = requestid;
kvp_transaction.active = true;
- kvp_transaction.index = kvp_data->index;
+ kvp_transaction.kvp_msg = kvp_msg;
/*
* Get the information from the
@@ -316,8 +416,6 @@ void hv_kvp_onchannelcallback(void *context)
}
-callback_done:
-
icmsghdrp->icflags = ICMSGHDRFLAG_TRANSACTION
| ICMSGHDRFLAG_RESPONSE;
@@ -338,6 +436,14 @@ hv_kvp_init(struct hv_util_service *srv)
return err;
recv_buffer = srv->recv_buffer;
+ /*
+ * When this driver loads, the user level daemon that
+ * processes the host requests may not yet be running.
+ * Defer processing channel callbacks until the daemon
+ * has registered.
+ */
+ kvp_transaction.active = true;
+
return 0;
}
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index a2d8c54..e88a979 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -119,6 +119,8 @@
*/
#define REG_SZ 1
+#define REG_U32 4
+#define REG_U64 8
enum hv_kvp_exchg_op {
KVP_OP_GET = 0,
diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 00d3f7c..a98878c 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -389,10 +389,16 @@ int main(void)
}
continue;
+ case KVP_OP_SET:
+ case KVP_OP_GET:
+ case KVP_OP_DELETE:
default:
break;
}
+ if (hv_msg->kvp_hdr.operation != KVP_OP_ENUMERATE)
+ goto kvp_done;
+
hv_msg = (struct hv_kvp_msg *)incoming_cn_msg->data;
key_name = (char *)hv_msg->body.kvp_enum_data.data.key;
key_value = (char *)hv_msg->body.kvp_enum_data.data.value;
@@ -454,6 +460,7 @@ int main(void)
* already in the receive buffer. Update the cn_msg header to
* reflect the key value that has been added to the message
*/
+kvp_done:
incoming_cn_msg->id.idx = CN_KVP_IDX;
incoming_cn_msg->id.val = CN_KVP_VAL;
--
1.7.4.1
^ permalink raw reply related
* [PATCH 2/3] Tools: hv: Fully support the new KVP verbs in the user level daemon
From: K. Y. Srinivasan @ 2012-03-16 0:48 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering
In-Reply-To: <1331858925-827-1-git-send-email-kys@microsoft.com>
Now fully support the new KVP messages in the user level daemon. Hyper-V defines
multiple persistent pools to which the host can write/read/modify KVP tuples.
In this patch we implement a file for each specified pool, where the KVP tuples
will be stored in the guest.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
tools/hv/hv_kvp_daemon.c | 281 +++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 280 insertions(+), 1 deletions(-)
diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index a98878c..2fb9c3d 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -39,7 +39,8 @@
#include <ifaddrs.h>
#include <netdb.h>
#include <syslog.h>
-
+#include <sys/stat.h>
+#include <fcntl.h>
/*
* KVP protocol: The user mode component first registers with the
@@ -79,6 +80,250 @@ static char *os_build;
static char *lic_version;
static struct utsname uts_buf;
+
+#define MAX_FILE_NAME 100
+#define ENTRIES_PER_BLOCK 50
+
+struct kvp_record {
+ __u8 key[HV_KVP_EXCHANGE_MAX_KEY_SIZE];
+ __u8 value[HV_KVP_EXCHANGE_MAX_VALUE_SIZE];
+};
+
+struct kvp_file_state {
+ int fd;
+ int num_blocks;
+ struct kvp_record *records;
+ int num_records;
+ __u8 fname[MAX_FILE_NAME];
+};
+
+static struct kvp_file_state kvp_file_info[KVP_POOL_COUNT];
+
+static void kvp_acquire_lock(int pool)
+{
+ struct flock fl = {F_WRLCK, SEEK_SET, 0, 0, 0};
+ fl.l_pid = getpid();
+
+ if (fcntl(kvp_file_info[pool].fd, F_SETLKW, &fl) == -1) {
+ syslog(LOG_ERR, "Failed to acquire the lock pool: %d", pool);
+ exit(-1);
+ }
+}
+
+static void kvp_release_lock(int pool)
+{
+ struct flock fl = {F_UNLCK, SEEK_SET, 0, 0, 0};
+ fl.l_pid = getpid();
+
+ if (fcntl(kvp_file_info[pool].fd, F_SETLK, &fl) == -1) {
+ perror("fcntl");
+ syslog(LOG_ERR, "Failed to release the lock pool: %d", pool);
+ exit(-1);
+ }
+}
+
+static void kvp_update_file(int pool)
+{
+ FILE *filep;
+ size_t bytes_written;
+
+ /*
+ * We are going to write our in-memory registry out to
+ * disk; acquire the lock first.
+ */
+ kvp_acquire_lock(pool);
+
+ filep = fopen(kvp_file_info[pool].fname, "w");
+ if (!filep) {
+ kvp_release_lock(pool);
+ syslog(LOG_ERR, "Failed to open file, pool: %d", pool);
+ exit(-1);
+ }
+
+ bytes_written = fwrite(kvp_file_info[pool].records,
+ sizeof(struct kvp_record),
+ kvp_file_info[pool].num_records, filep);
+
+ fflush(filep);
+ kvp_release_lock(pool);
+}
+
+static int kvp_file_init(void)
+{
+ int ret, fd;
+ FILE *filep;
+ size_t records_read;
+ __u8 *fname;
+ struct kvp_record *record;
+ struct kvp_record *readp;
+ int num_blocks;
+ int i;
+ int alloc_unit = sizeof(struct kvp_record) * ENTRIES_PER_BLOCK;
+
+ if (access("/var/opt/hyperv", F_OK)) {
+ if (mkdir("/var/opt/hyperv", S_IRUSR | S_IWUSR | S_IROTH)) {
+ syslog(LOG_ERR, " Failed to create /var/opt/hyperv");
+ exit(-1);
+ }
+ }
+
+ for (i = 0; i < KVP_POOL_COUNT; i++) {
+ fname = kvp_file_info[i].fname;
+ records_read = 0;
+ num_blocks = 1;
+ sprintf(fname, "/var/opt/hyperv/.kvp_pool_%d", i);
+ fd = open(fname, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR | S_IROTH);
+
+ if (fd == -1)
+ return 1;
+
+
+ filep = fopen(fname, "r");
+ if (!filep)
+ return 1;
+
+ record = malloc(alloc_unit * num_blocks);
+ if (record == NULL) {
+ fclose(filep);
+ return 1;
+ }
+ while (!feof(filep)) {
+ readp = &record[records_read];
+ records_read += fread(readp, sizeof(struct kvp_record),
+ ENTRIES_PER_BLOCK,
+ filep);
+
+ if (!feof(filep)) {
+ /*
+ * We have more data to read.
+ */
+ num_blocks++;
+ record = realloc(record, alloc_unit *
+ num_blocks);
+ if (record == NULL) {
+ fclose(filep);
+ return 1;
+ }
+ continue;
+ }
+ break;
+ }
+ kvp_file_info[i].fd = fd;
+ kvp_file_info[i].num_blocks = num_blocks;
+ kvp_file_info[i].records = record;
+ kvp_file_info[i].num_records = records_read;
+ fclose(filep);
+
+ }
+
+ return 0;
+}
+
+static int kvp_key_delete(int pool, __u8 *key, int key_size)
+{
+ int i;
+ int j, k;
+ int num_records = kvp_file_info[pool].num_records;
+ struct kvp_record *record = kvp_file_info[pool].records;
+
+ for (i = 0; i < num_records; i++) {
+ if (memcmp(key, record[i].key, key_size))
+ continue;
+ /*
+ * Found a match; just move the remaining
+ * entries up.
+ */
+ if (i == num_records) {
+ kvp_file_info[pool].num_records--;
+ kvp_update_file(pool);
+ return 0;
+ }
+
+ j = i;
+ k = j + 1;
+ for (; k < num_records; k++) {
+ strcpy(record[j].key, record[k].key);
+ strcpy(record[j].value, record[k].value);
+ j++;
+ }
+
+ kvp_file_info[pool].num_records--;
+ kvp_update_file(pool);
+ return 0;
+ }
+ return 1;
+}
+
+static int kvp_key_add_or_modify(int pool, __u8 *key, int key_size, __u8 *value,
+ int value_size)
+{
+ int i;
+ int j, k;
+ int num_records = kvp_file_info[pool].num_records;
+ struct kvp_record *record = kvp_file_info[pool].records;
+ int num_blocks = kvp_file_info[pool].num_blocks;
+
+ if ((key_size > HV_KVP_EXCHANGE_MAX_KEY_SIZE) ||
+ (value_size > HV_KVP_EXCHANGE_MAX_VALUE_SIZE))
+ return 1;
+
+ for (i = 0; i < num_records; i++) {
+ if (memcmp(key, record[i].key, key_size))
+ continue;
+ /*
+ * Found a match; just update the value -
+ * this is the modify case.
+ */
+ memcpy(record[i].value, value, value_size);
+ kvp_update_file(pool);
+ return 0;
+ }
+
+ /*
+ * Need to add a new entry;
+ */
+ if (num_records == (ENTRIES_PER_BLOCK * num_blocks)) {
+ /* Need to allocate a larger array for reg entries. */
+ record = realloc(record, sizeof(struct kvp_record) *
+ ENTRIES_PER_BLOCK * (num_blocks + 1));
+
+ if (record == NULL)
+ return 1;
+ kvp_file_info[pool].num_blocks++;
+
+ }
+ memcpy(record[i].value, value, value_size);
+ memcpy(record[i].key, key, key_size);
+ kvp_file_info[pool].records = record;
+ kvp_file_info[pool].num_records++;
+ kvp_update_file(pool);
+ return 0;
+}
+
+static int kvp_get_value(int pool, __u8 *key, int key_size, __u8 *value,
+ int value_size)
+{
+ int i;
+ int num_records = kvp_file_info[pool].num_records;
+ struct kvp_record *record = kvp_file_info[pool].records;
+
+ if ((key_size > HV_KVP_EXCHANGE_MAX_KEY_SIZE) ||
+ (value_size > HV_KVP_EXCHANGE_MAX_VALUE_SIZE))
+ return 1;
+
+ for (i = 0; i < num_records; i++) {
+ if (memcmp(key, record[i].key, key_size))
+ continue;
+ /*
+ * Found a match; just copy the value out.
+ */
+ memcpy(value, record[i].value, value_size);
+ return 0;
+ }
+
+ return 1;
+}
+
void kvp_get_os_info(void)
{
FILE *file;
@@ -315,6 +560,11 @@ int main(void)
*/
kvp_get_os_info();
+ if (kvp_file_init()) {
+ syslog(LOG_ERR, "Failed to initialize the pools");
+ exit(-1);
+ }
+
fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
if (fd < 0) {
syslog(LOG_ERR, "netlink socket creation failed; error:%d", fd);
@@ -389,9 +639,38 @@ int main(void)
}
continue;
+ /*
+ * The current protocol with the kernel component uses a
+ * NULL key name to pass an error condition.
+ * For the SET, GET and DELETE operations,
+ * use the existing protocol to pass back error.
+ */
+
case KVP_OP_SET:
+ if (kvp_key_add_or_modify(hv_msg->kvp_hdr.pool,
+ hv_msg->body.kvp_set.data.key,
+ hv_msg->body.kvp_set.data.key_size,
+ hv_msg->body.kvp_set.data.value,
+ hv_msg->body.kvp_set.data.value_size))
+ strcpy(hv_msg->body.kvp_set.data.key, "");
+ break;
+
case KVP_OP_GET:
+ if (kvp_get_value(hv_msg->kvp_hdr.pool,
+ hv_msg->body.kvp_set.data.key,
+ hv_msg->body.kvp_set.data.key_size,
+ hv_msg->body.kvp_set.data.value,
+ hv_msg->body.kvp_set.data.value_size))
+ strcpy(hv_msg->body.kvp_set.data.key, "");
+ break;
+
case KVP_OP_DELETE:
+ if (kvp_key_delete(hv_msg->kvp_hdr.pool,
+ hv_msg->body.kvp_delete.key,
+ hv_msg->body.kvp_delete.key_size))
+ strcpy(hv_msg->body.kvp_delete.key, "");
+ break;
+
default:
break;
}
--
1.7.4.1
^ permalink raw reply related
* [PATCH 3/3] Tools: hv: Support enumeration from all the pools
From: K. Y. Srinivasan @ 2012-03-16 0:48 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering; +Cc: K. Y. Srinivasan
In-Reply-To: <1331858925-827-1-git-send-email-kys@microsoft.com>
We have supported enumeration only from the AUTO pool. Now support
enumeration from all the available pools.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/hv/hv_kvp.c | 7 ++-
include/linux/hyperv.h | 1 +
tools/hv/hv_kvp_daemon.c | 124 +++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 122 insertions(+), 10 deletions(-)
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 3485dee..843bfa3 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -288,14 +288,15 @@ kvp_respond_to_host(char *key, char *value, int error)
/*
- * If the error parameter is set, terminate the host's enumeration.
+ * If the error parameter is set, terminate the host's enumeration
+ * on this pool.
*/
if (error) {
/*
* Something failed or the we have timedout;
- * terminate the host-side iteration by returning an error.
+ * terminate the current host-side iteration.
*/
- icmsghdrp->status = HV_E_FAIL;
+ icmsghdrp->status = HV_S_CONT;
goto response_done;
}
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index e88a979..5852545 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -952,6 +952,7 @@ void vmbus_driver_unregister(struct hv_driver *hv_driver);
#define HV_S_OK 0x00000000
#define HV_E_FAIL 0x80004005
+#define HV_S_CONT 0x80070103
#define HV_ERROR_NOT_SUPPORTED 0x80070032
#define HV_ERROR_MACHINE_LOCKED 0x800704F7
diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 2fb9c3d..146fd61 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -148,6 +148,51 @@ static void kvp_update_file(int pool)
kvp_release_lock(pool);
}
+static void kvp_update_mem_state(int pool)
+{
+ FILE *filep;
+ size_t records_read = 0;
+ struct kvp_record *record = kvp_file_info[pool].records;
+ struct kvp_record *readp;
+ int num_blocks = kvp_file_info[pool].num_blocks;
+ int alloc_unit = sizeof(struct kvp_record) * ENTRIES_PER_BLOCK;
+
+ kvp_acquire_lock(pool);
+
+ filep = fopen(kvp_file_info[pool].fname, "r");
+ if (!filep) {
+ kvp_release_lock(pool);
+ syslog(LOG_ERR, "Failed to open file, pool: %d", pool);
+ exit(-1);
+ }
+ while (!feof(filep)) {
+ readp = &record[records_read];
+ records_read += fread(readp, sizeof(struct kvp_record),
+ ENTRIES_PER_BLOCK * num_blocks,
+ filep);
+
+ if (!feof(filep)) {
+ /*
+ * We have more data to read.
+ */
+ num_blocks++;
+ record = realloc(record, alloc_unit * num_blocks);
+
+ if (record == NULL) {
+ syslog(LOG_ERR, "malloc failed");
+ exit(-1);
+ }
+ continue;
+ }
+ break;
+ }
+
+ kvp_file_info[pool].num_blocks = num_blocks;
+ kvp_file_info[pool].records = record;
+ kvp_file_info[pool].num_records = records_read;
+
+ kvp_release_lock(pool);
+}
static int kvp_file_init(void)
{
int ret, fd;
@@ -223,8 +268,16 @@ static int kvp_key_delete(int pool, __u8 *key, int key_size)
{
int i;
int j, k;
- int num_records = kvp_file_info[pool].num_records;
- struct kvp_record *record = kvp_file_info[pool].records;
+ int num_records;
+ struct kvp_record *record;
+
+ /*
+ * First update the in-memory state.
+ */
+ kvp_update_mem_state(pool);
+
+ num_records = kvp_file_info[pool].num_records;
+ record = kvp_file_info[pool].records;
for (i = 0; i < num_records; i++) {
if (memcmp(key, record[i].key, key_size))
@@ -259,14 +312,23 @@ static int kvp_key_add_or_modify(int pool, __u8 *key, int key_size, __u8 *value,
{
int i;
int j, k;
- int num_records = kvp_file_info[pool].num_records;
- struct kvp_record *record = kvp_file_info[pool].records;
- int num_blocks = kvp_file_info[pool].num_blocks;
+ int num_records;
+ struct kvp_record *record;
+ int num_blocks;
if ((key_size > HV_KVP_EXCHANGE_MAX_KEY_SIZE) ||
(value_size > HV_KVP_EXCHANGE_MAX_VALUE_SIZE))
return 1;
+ /*
+ * First update the in-memory state.
+ */
+ kvp_update_mem_state(pool);
+
+ num_records = kvp_file_info[pool].num_records;
+ record = kvp_file_info[pool].records;
+ num_blocks = kvp_file_info[pool].num_blocks;
+
for (i = 0; i < num_records; i++) {
if (memcmp(key, record[i].key, key_size))
continue;
@@ -304,13 +366,21 @@ static int kvp_get_value(int pool, __u8 *key, int key_size, __u8 *value,
int value_size)
{
int i;
- int num_records = kvp_file_info[pool].num_records;
- struct kvp_record *record = kvp_file_info[pool].records;
+ int num_records;
+ struct kvp_record *record;
if ((key_size > HV_KVP_EXCHANGE_MAX_KEY_SIZE) ||
(value_size > HV_KVP_EXCHANGE_MAX_VALUE_SIZE))
return 1;
+ /*
+ * First update the in-memory state.
+ */
+ kvp_update_mem_state(pool);
+
+ num_records = kvp_file_info[pool].num_records;
+ record = kvp_file_info[pool].records;
+
for (i = 0; i < num_records; i++) {
if (memcmp(key, record[i].key, key_size))
continue;
@@ -324,6 +394,31 @@ static int kvp_get_value(int pool, __u8 *key, int key_size, __u8 *value,
return 1;
}
+static void kvp_pool_enumerate(int pool, int index, __u8 *key, int key_size,
+ __u8 *value, int value_size)
+{
+ struct kvp_record *record;
+
+ /*
+ * First update our in-memory database.
+ */
+ kvp_update_mem_state(pool);
+ record = kvp_file_info[pool].records;
+
+ if (index >= kvp_file_info[pool].num_records) {
+ /*
+ * This is an invalid index; terminate enumeration;
+ * - a NULL value will do the trick.
+ */
+ strcpy(value, "");
+ return;
+ }
+
+ memcpy(key, record[index].key, key_size);
+ memcpy(value, record[index].value, value_size);
+}
+
+
void kvp_get_os_info(void)
{
FILE *file;
@@ -678,6 +773,21 @@ int main(void)
if (hv_msg->kvp_hdr.operation != KVP_OP_ENUMERATE)
goto kvp_done;
+ /*
+ * If the pool is KVP_POOL_AUTO, dynamically generate
+ * both the key and the value; if not read from the
+ * appropriate pool.
+ */
+ if (hv_msg->kvp_hdr.pool != KVP_POOL_AUTO) {
+ kvp_pool_enumerate(hv_msg->kvp_hdr.pool,
+ hv_msg->body.kvp_enum_data.index,
+ hv_msg->body.kvp_enum_data.data.key,
+ HV_KVP_EXCHANGE_MAX_KEY_SIZE,
+ hv_msg->body.kvp_enum_data.data.value,
+ HV_KVP_EXCHANGE_MAX_VALUE_SIZE);
+ goto kvp_done;
+ }
+
hv_msg = (struct hv_kvp_msg *)incoming_cn_msg->data;
key_name = (char *)hv_msg->body.kvp_enum_data.data.key;
key_value = (char *)hv_msg->body.kvp_enum_data.data.value;
--
1.7.4.1
^ permalink raw reply related
* Re: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP messages in the driver
From: Dan Carpenter @ 2012-03-16 5:38 UTC (permalink / raw)
To: KY Srinivasan
Cc: gregkh@linuxfoundation.org, ohering@suse.com,
linux-kernel@vger.kernel.org, virtualization@lists.osdl.org,
Alan Stern, devel@linuxdriverproject.org
In-Reply-To: <6E21E5352C11B742B20C142EB499E0481B7663B1@TK5EX14MBXC126.redmond.corp.microsoft.com>
[-- Attachment #1.1: Type: text/plain, Size: 870 bytes --]
On Thu, Mar 15, 2012 at 11:36:16PM +0000, KY Srinivasan wrote:
>
> Dan,
>
> Sorry I could not get back to you earlier. You are right, in that I am trusting
> the host! I am of the firm belief that in a virtualized environment, if you don't trust
> the host, there is not a whole lot you can do in a guest! I have had this arguments
> with other on this mailing list in the past. Having said that, I think we have spent more
> time debating this than we should; your proposal is reasonable and I will go ahead and
> re-spin those patches based on your comments. I should be posting them shortly.
>
> Regards,
It's not about trusting the host or not trusting the host. It's
about "if you're going to specify a limitter, it can't be off by one
even if you don't expect to hit the limit".
But thanks for redoing these.
regards,
dan carpenter
[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
[-- Attachment #2: Type: text/plain, Size: 159 bytes --]
_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/devel
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox