* RE: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP messages in the driver
From: KY Srinivasan @ 2012-03-11 20:53 UTC (permalink / raw)
To: Dan Carpenter
Cc: gregkh@linuxfoundation.org, ohering@suse.com,
linux-kernel@vger.kernel.org, virtualization@lists.osdl.org,
Alan Stern, devel@linuxdriverproject.org
In-Reply-To: <20120311184916.GD3337@mwanda>
> -----Original Message-----
> From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> Sent: Sunday, March 11, 2012 2:49 PM
> To: KY Srinivasan
> Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; virtualization@lists.osdl.org; ohering@suse.com;
> Alan Stern
> Subject: Re: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP
> messages in the driver
>
> On Sun, Mar 11, 2012 at 04:56:06PM +0000, KY Srinivasan wrote:
> > > Probably that's not enough to make a difference and we'd need to
> > > introduce a new function.
> > >
> > > Btw I don't know if utf16s_to_utf8s() counts the NUL char or not.
> > > It feels like maybe we could end up with ->value_size equal to
> > > HV_KVP_EXCHANGE_MAX_VALUE_SIZE + 1.
> >
> > The MAX value is set to accommodate the maximum string that will ever
> > be handled including the string terminator. The function utf16s_to_utf8s()
> > returns the converted string length but the returned length does not
> > include the string terminator (like strlen), hence the "+1".
> >
>
> sprintf() and friends copy the NUL terminator but utf16s_to_utf8s()
> doesn't so the code isn't right and it does seem like maybe we could
> end up with a ->value_size equal to HV_KVP_EXCHANGE_MAX_VALUE_SIZE +
> 1.
You are right that utf16s_to_utf8s() does not copy the string terminator. That is
not an issue here because the buffer for the utf8 string is zeroed to begin with
(the memory was allocated using kzalloc()). The return value of utf16s_to_utf8s()
is the length of the utf8 string, as strlen() would return it. I add one to account
for the string terminator in further processing. As I said before, the MAX value
takes into account the terminating character for all the strings handled.
Regards,
K. Y
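The size accounting described in this reply can be sketched in user space. This is only a minimal sketch under stated assumptions: copy_no_terminator() is a hypothetical stand-in for utf16s_to_utf8s() that handles plain bytes, MAX_VALUE_SIZE stands in for HV_KVP_EXCHANGE_MAX_VALUE_SIZE, and calloc() plays the role of kzalloc(). It shows that "length + 1" relies on the converted length staying below the buffer capacity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for HV_KVP_EXCHANGE_MAX_VALUE_SIZE. */
#define MAX_VALUE_SIZE 16

/*
 * Hypothetical stand-in for the converter: copies at most maxout bytes,
 * never writes a terminator, and returns the number of bytes written.
 */
static int copy_no_terminator(const char *src, int inlen, char *dst, int maxout)
{
	int n = 0;

	while (n < inlen && n < maxout && src[n] != '\0') {
		dst[n] = src[n];
		n++;
	}
	return n;
}

/*
 * dst must point to MAX_VALUE_SIZE zeroed bytes. Returns the size to report
 * (converted length plus terminator), or -1 if the converted string leaves
 * no room for the terminator.
 */
static int value_size_with_terminator(const char *src, int inlen, char *dst)
{
	int len;

	assert(src != NULL && dst != NULL);
	len = copy_no_terminator(src, inlen, dst, MAX_VALUE_SIZE);
	if (len >= MAX_VALUE_SIZE)
		return -1;	/* len + 1 would exceed MAX_VALUE_SIZE */
	return len + 1;		/* dst[len] is still 0 from the zeroed buffer */
}
```

With a zeroed 16-byte buffer, a 3-byte input reports a size of 4 and the buffer reads back as a terminated string; a 16-byte input fills the buffer completely and is rejected.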
* Re: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP messages in the driver
From: Dan Carpenter @ 2012-03-11 18:49 UTC (permalink / raw)
To: KY Srinivasan
Cc: gregkh@linuxfoundation.org, ohering@suse.com,
linux-kernel@vger.kernel.org, virtualization@lists.osdl.org,
Alan Stern, devel@linuxdriverproject.org
In-Reply-To: <6E21E5352C11B742B20C142EB499E0481B75B403@TK5EX14MBXC122.redmond.corp.microsoft.com>
On Sun, Mar 11, 2012 at 04:56:06PM +0000, KY Srinivasan wrote:
> > Probably that's not enough to make a difference and we'd need to
> > introduce a new function.
> >
> > Btw I don't know if utf16s_to_utf8s() counts the NUL char or not.
> > It feels like maybe we could end up with ->value_size equal to
> > HV_KVP_EXCHANGE_MAX_VALUE_SIZE + 1.
>
> The MAX value is set to accommodate the maximum string that will ever
> be handled including the string terminator. The function utf16s_to_utf8s()
> returns the converted string length but the returned length does not
> include the string terminator (like strlen), hence the "+1".
>
sprintf() and friends copy the NUL terminator but utf16s_to_utf8s()
doesn't so the code isn't right and it does seem like maybe we could
end up with a ->value_size equal to HV_KVP_EXCHANGE_MAX_VALUE_SIZE +
1.
regards,
dan carpenter
_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/devel
* RE: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP messages in the driver
From: KY Srinivasan @ 2012-03-11 16:56 UTC (permalink / raw)
To: Dan Carpenter
Cc: gregkh@linuxfoundation.org, ohering@suse.com,
linux-kernel@vger.kernel.org, virtualization@lists.osdl.org,
Alan Stern, devel@linuxdriverproject.org
In-Reply-To: <20120311104230.GC3337@mwanda>
> -----Original Message-----
> From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> Sent: Sunday, March 11, 2012 6:43 AM
> To: KY Srinivasan
> Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; virtualization@lists.osdl.org; ohering@suse.com;
> Alan Stern
> Subject: Re: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP
> messages in the driver
>
> On Sat, Mar 10, 2012 at 03:32:09PM -0800, K. Y. Srinivasan wrote:
> > + switch (message->kvp_hdr.operation) {
> > + case KVP_OP_SET:
> > + switch (in_msg->body.kvp_set.data.value_type) {
> > + case REG_SZ:
> > + /*
> > + * The value is a string - utf16 encoding.
> > + */
> > + message->body.kvp_set.data.value_size =
> > + utf16s_to_utf8s(
> > + (wchar_t *)
> > + in_msg->body.kvp_set.data.value,
> > + in_msg->body.kvp_set.data.value_size,
> > + UTF16_LITTLE_ENDIAN,
> > + message->body.kvp_set.data.value,
> > + HV_KVP_EXCHANGE_MAX_VALUE_SIZE) + 1;
> > + break;
> > +
>
> This block of unreadable text is so nasty.
>
> You could return directly if the msg = kmalloc() fails and pull
> everything in one indent level. It's normally more readable to
> handle errors as soon as possible anyway.
True.
>
> Probably that's not enough to make a difference and we'd need to
> introduce a new function.
>
> Btw I don't know if utf16s_to_utf8s() counts the NUL char or not.
> It feels like maybe we could end up with ->value_size equal to
> HV_KVP_EXCHANGE_MAX_VALUE_SIZE + 1.
The MAX value is set to accommodate the maximum string that will ever
be handled including the string terminator. The function utf16s_to_utf8s()
returns the converted string length but the returned length does not
include the string terminator (like strlen), hence the "+1".
Dan, I will see if there are other comments on these patches and will
accommodate your suggestion then. If there are no other comments,
would you mind if I addressed your comments here in a separate patch?
Regards,
K. Y
>
> regards,
> dan carpenter
* Re: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP messages in the driver
From: Alan Stern @ 2012-03-11 16:01 UTC (permalink / raw)
To: Dan Carpenter
Cc: K. Y. Srinivasan, gregkh, linux-kernel, devel, virtualization,
ohering
In-Reply-To: <20120311104230.GC3337@mwanda>
On Sun, 11 Mar 2012, Dan Carpenter wrote:
> Btw I don't know if utf16s_to_utf8s() counts the NUL char or not.
> It feels like maybe we could end up with ->value_size equal to
> HV_KVP_EXCHANGE_MAX_VALUE_SIZE + 1.
It does not count NUL characters. If it encounters a NUL character in
the input, it stops right away without copying that character to the
output. If it reaches the end of the input, it does not add a
terminating NUL character to the output.
Alan Stern
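The behaviour described in this reply can be modelled with a small user-space converter. This is a simplified sketch, not the kernel's utf16s_to_utf8s(): the function name utf16_to_utf8_sketch() is hypothetical, the input is assumed to be host-endian UTF-16 from the Basic Multilingual Plane, and conversion simply stops at a surrogate. It illustrates the two points made above: a NUL in the input ends the conversion without being copied, and no terminating NUL is appended to the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of the described semantics. Returns the number of
 * bytes written to out, which is never more than maxout.
 */
static int utf16_to_utf8_sketch(const uint16_t *in, int inlen,
				uint8_t *out, int maxout)
{
	int written = 0;
	int i;

	assert(in != NULL && out != NULL);

	for (i = 0; i < inlen; i++) {
		uint16_t u = in[i];
		int need;

		if (u == 0)
			break;		/* NUL in input: stop, do not copy it */
		if (u >= 0xD800 && u <= 0xDFFF)
			break;		/* surrogates are out of scope here */

		need = (u < 0x80) ? 1 : (u < 0x800) ? 2 : 3;
		if (written + need > maxout)
			break;		/* output buffer is full */

		if (need == 1) {
			out[written++] = (uint8_t)u;
		} else if (need == 2) {
			out[written++] = (uint8_t)(0xC0 | (u >> 6));
			out[written++] = (uint8_t)(0x80 | (u & 0x3F));
		} else {
			out[written++] = (uint8_t)(0xE0 | (u >> 12));
			out[written++] = (uint8_t)(0x80 | ((u >> 6) & 0x3F));
			out[written++] = (uint8_t)(0x80 | (u & 0x3F));
		}
	}
	/* No terminating NUL is appended; the caller adds one if needed. */
	return written;
}
```

Because the return value can be as large as maxout, a caller that wants a terminated string has to reserve the extra byte itself.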
* Re: [PATCH 2/4] Drivers: hv: Support the newly introduced KVP messages in the driver
From: Dan Carpenter @ 2012-03-11 10:42 UTC (permalink / raw)
To: K. Y. Srinivasan
Cc: gregkh, ohering, linux-kernel, virtualization, Alan Stern, devel
In-Reply-To: <1331422331-4381-2-git-send-email-kys@microsoft.com>
On Sat, Mar 10, 2012 at 03:32:09PM -0800, K. Y. Srinivasan wrote:
> + switch (message->kvp_hdr.operation) {
> + case KVP_OP_SET:
> + switch (in_msg->body.kvp_set.data.value_type) {
> + case REG_SZ:
> + /*
> + * The value is a string - utf16 encoding.
> + */
> + message->body.kvp_set.data.value_size =
> + utf16s_to_utf8s(
> + (wchar_t *)
> + in_msg->body.kvp_set.data.value,
> + in_msg->body.kvp_set.data.value_size,
> + UTF16_LITTLE_ENDIAN,
> + message->body.kvp_set.data.value,
> + HV_KVP_EXCHANGE_MAX_VALUE_SIZE) + 1;
> + break;
> +
This block of unreadable text is so nasty.
You could return directly if the msg = kmalloc() fails and pull
everything in one indent level. It's normally more readable to
handle errors as soon as possible anyway.
Probably that's not enough to make a difference and we'd need to
introduce a new function.
Btw I don't know if utf16s_to_utf8s() counts the NUL char or not.
It feels like maybe we could end up with ->value_size equal to
HV_KVP_EXCHANGE_MAX_VALUE_SIZE + 1.
regards,
dan carpenter
* [PATCH 4/4] Tools: hv: Support enumeration from all the pools
From: K. Y. Srinivasan @ 2012-03-10 23:32 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering
In-Reply-To: <1331422331-4381-1-git-send-email-kys@microsoft.com>
We have supported enumeration only from the AUTO pool. Now support
enumeration from all the available pools.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/hv/hv_kvp.c | 7 ++-
include/linux/hyperv.h | 1 +
tools/hv/hv_kvp_daemon.c | 124 +++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 122 insertions(+), 10 deletions(-)
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 3b2eeaa..1a70b10 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -286,14 +286,15 @@ kvp_respond_to_host(char *key, char *value, int error)
/*
- * If the error parameter is set, terminate the host's enumeration.
+ * If the error parameter is set, terminate the host's enumeration
+ * on this pool.
*/
if (error) {
/*
* Something failed or we have timed out;
- * terminate the host-side iteration by returning an error.
+ * terminate the current host-side iteration.
*/
- icmsghdrp->status = HV_E_FAIL;
+ icmsghdrp->status = HV_S_CONT;
goto response_done;
}
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index e88a979..5852545 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -952,6 +952,7 @@ void vmbus_driver_unregister(struct hv_driver *hv_driver);
#define HV_S_OK 0x00000000
#define HV_E_FAIL 0x80004005
+#define HV_S_CONT 0x80070103
#define HV_ERROR_NOT_SUPPORTED 0x80070032
#define HV_ERROR_MACHINE_LOCKED 0x800704F7
diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 2fb9c3d..146fd61 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -148,6 +148,51 @@ static void kvp_update_file(int pool)
kvp_release_lock(pool);
}
+static void kvp_update_mem_state(int pool)
+{
+ FILE *filep;
+ size_t records_read = 0;
+ struct kvp_record *record = kvp_file_info[pool].records;
+ struct kvp_record *readp;
+ int num_blocks = kvp_file_info[pool].num_blocks;
+ int alloc_unit = sizeof(struct kvp_record) * ENTRIES_PER_BLOCK;
+
+ kvp_acquire_lock(pool);
+
+ filep = fopen(kvp_file_info[pool].fname, "r");
+ if (!filep) {
+ kvp_release_lock(pool);
+ syslog(LOG_ERR, "Failed to open file, pool: %d", pool);
+ exit(-1);
+ }
+ while (!feof(filep)) {
+ readp = &record[records_read];
+ records_read += fread(readp, sizeof(struct kvp_record),
+ ENTRIES_PER_BLOCK * num_blocks,
+ filep);
+
+ if (!feof(filep)) {
+ /*
+ * We have more data to read.
+ */
+ num_blocks++;
+ record = realloc(record, alloc_unit * num_blocks);
+
+ if (record == NULL) {
+ syslog(LOG_ERR, "malloc failed");
+ exit(-1);
+ }
+ continue;
+ }
+ break;
+ }
+
+ kvp_file_info[pool].num_blocks = num_blocks;
+ kvp_file_info[pool].records = record;
+ kvp_file_info[pool].num_records = records_read;
+
+ kvp_release_lock(pool);
+}
static int kvp_file_init(void)
{
int ret, fd;
@@ -223,8 +268,16 @@ static int kvp_key_delete(int pool, __u8 *key, int key_size)
{
int i;
int j, k;
- int num_records = kvp_file_info[pool].num_records;
- struct kvp_record *record = kvp_file_info[pool].records;
+ int num_records;
+ struct kvp_record *record;
+
+ /*
+ * First update the in-memory state.
+ */
+ kvp_update_mem_state(pool);
+
+ num_records = kvp_file_info[pool].num_records;
+ record = kvp_file_info[pool].records;
for (i = 0; i < num_records; i++) {
if (memcmp(key, record[i].key, key_size))
@@ -259,14 +312,23 @@ static int kvp_key_add_or_modify(int pool, __u8 *key, int key_size, __u8 *value,
{
int i;
int j, k;
- int num_records = kvp_file_info[pool].num_records;
- struct kvp_record *record = kvp_file_info[pool].records;
- int num_blocks = kvp_file_info[pool].num_blocks;
+ int num_records;
+ struct kvp_record *record;
+ int num_blocks;
if ((key_size > HV_KVP_EXCHANGE_MAX_KEY_SIZE) ||
(value_size > HV_KVP_EXCHANGE_MAX_VALUE_SIZE))
return 1;
+ /*
+ * First update the in-memory state.
+ */
+ kvp_update_mem_state(pool);
+
+ num_records = kvp_file_info[pool].num_records;
+ record = kvp_file_info[pool].records;
+ num_blocks = kvp_file_info[pool].num_blocks;
+
for (i = 0; i < num_records; i++) {
if (memcmp(key, record[i].key, key_size))
continue;
@@ -304,13 +366,21 @@ static int kvp_get_value(int pool, __u8 *key, int key_size, __u8 *value,
int value_size)
{
int i;
- int num_records = kvp_file_info[pool].num_records;
- struct kvp_record *record = kvp_file_info[pool].records;
+ int num_records;
+ struct kvp_record *record;
if ((key_size > HV_KVP_EXCHANGE_MAX_KEY_SIZE) ||
(value_size > HV_KVP_EXCHANGE_MAX_VALUE_SIZE))
return 1;
+ /*
+ * First update the in-memory state.
+ */
+ kvp_update_mem_state(pool);
+
+ num_records = kvp_file_info[pool].num_records;
+ record = kvp_file_info[pool].records;
+
for (i = 0; i < num_records; i++) {
if (memcmp(key, record[i].key, key_size))
continue;
@@ -324,6 +394,31 @@ static int kvp_get_value(int pool, __u8 *key, int key_size, __u8 *value,
return 1;
}
+static void kvp_pool_enumerate(int pool, int index, __u8 *key, int key_size,
+ __u8 *value, int value_size)
+{
+ struct kvp_record *record;
+
+ /*
+ * First update our in-memory database.
+ */
+ kvp_update_mem_state(pool);
+ record = kvp_file_info[pool].records;
+
+ if (index >= kvp_file_info[pool].num_records) {
+ /*
+ * This is an invalid index; terminate enumeration;
+ * - a NULL value will do the trick.
+ */
+ strcpy(value, "");
+ return;
+ }
+
+ memcpy(key, record[index].key, key_size);
+ memcpy(value, record[index].value, value_size);
+}
+
+
void kvp_get_os_info(void)
{
FILE *file;
@@ -678,6 +773,21 @@ int main(void)
if (hv_msg->kvp_hdr.operation != KVP_OP_ENUMERATE)
goto kvp_done;
+ /*
+ * If the pool is KVP_POOL_AUTO, dynamically generate
+ * both the key and the value; if not read from the
+ * appropriate pool.
+ */
+ if (hv_msg->kvp_hdr.pool != KVP_POOL_AUTO) {
+ kvp_pool_enumerate(hv_msg->kvp_hdr.pool,
+ hv_msg->body.kvp_enum_data.index,
+ hv_msg->body.kvp_enum_data.data.key,
+ HV_KVP_EXCHANGE_MAX_KEY_SIZE,
+ hv_msg->body.kvp_enum_data.data.value,
+ HV_KVP_EXCHANGE_MAX_VALUE_SIZE);
+ goto kvp_done;
+ }
+
hv_msg = (struct hv_kvp_msg *)incoming_cn_msg->data;
key_name = (char *)hv_msg->body.kvp_enum_data.data.key;
key_value = (char *)hv_msg->body.kvp_enum_data.data.value;
--
1.7.4.1
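The index-based enumeration that this patch adds for the non-AUTO pools can be sketched in isolation. The following is a minimal user-space sketch under assumptions: struct record, KEY_SIZE, VALUE_SIZE and pool_enumerate() are hypothetical names chosen for illustration, and the file refresh and locking done by the daemon are left out. It shows only the convention of returning an empty value to tell the caller that the index is past the end of the pool.

```c
#include <assert.h>
#include <string.h>

#define KEY_SIZE 32	/* hypothetical; stands in for the KVP key limit */
#define VALUE_SIZE 64	/* hypothetical; stands in for the KVP value limit */

struct record {
	char key[KEY_SIZE];
	char value[VALUE_SIZE];
};

/*
 * Copy the record at 'index' into key/value. For an out-of-range index,
 * return an empty value so the caller can end the enumeration.
 */
static void pool_enumerate(const struct record *records, int num_records,
			   int index, char *key, char *value)
{
	assert(records != NULL && key != NULL && value != NULL);

	if (index < 0 || index >= num_records) {
		value[0] = '\0';
		return;
	}
	memcpy(key, records[index].key, KEY_SIZE);
	memcpy(value, records[index].value, VALUE_SIZE);
}
```

A caller would start at index 0 and keep incrementing until the returned value is empty.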
* [PATCH 3/4] Tools: hv: Fully support the new KVP verbs in the user level daemon
From: K. Y. Srinivasan @ 2012-03-10 23:32 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering
In-Reply-To: <1331422331-4381-1-git-send-email-kys@microsoft.com>
Now fully support the new KVP messages in the user level daemon. Hyper-V defines
multiple persistent pools to which the host can write/read/modify KVP tuples.
In this patch we implement a file for each specified pool, where the KVP tuples
will be stored in the guest.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
tools/hv/hv_kvp_daemon.c | 281 +++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 280 insertions(+), 1 deletions(-)
diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index a98878c..2fb9c3d 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -39,7 +39,8 @@
#include <ifaddrs.h>
#include <netdb.h>
#include <syslog.h>
-
+#include <sys/stat.h>
+#include <fcntl.h>
/*
* KVP protocol: The user mode component first registers with the
@@ -79,6 +80,250 @@ static char *os_build;
static char *lic_version;
static struct utsname uts_buf;
+
+#define MAX_FILE_NAME 100
+#define ENTRIES_PER_BLOCK 50
+
+struct kvp_record {
+ __u8 key[HV_KVP_EXCHANGE_MAX_KEY_SIZE];
+ __u8 value[HV_KVP_EXCHANGE_MAX_VALUE_SIZE];
+};
+
+struct kvp_file_state {
+ int fd;
+ int num_blocks;
+ struct kvp_record *records;
+ int num_records;
+ __u8 fname[MAX_FILE_NAME];
+};
+
+static struct kvp_file_state kvp_file_info[KVP_POOL_COUNT];
+
+static void kvp_acquire_lock(int pool)
+{
+ struct flock fl = {F_WRLCK, SEEK_SET, 0, 0, 0};
+ fl.l_pid = getpid();
+
+ if (fcntl(kvp_file_info[pool].fd, F_SETLKW, &fl) == -1) {
+ syslog(LOG_ERR, "Failed to acquire the lock pool: %d", pool);
+ exit(-1);
+ }
+}
+
+static void kvp_release_lock(int pool)
+{
+ struct flock fl = {F_UNLCK, SEEK_SET, 0, 0, 0};
+ fl.l_pid = getpid();
+
+ if (fcntl(kvp_file_info[pool].fd, F_SETLK, &fl) == -1) {
+ perror("fcntl");
+ syslog(LOG_ERR, "Failed to release the lock pool: %d", pool);
+ exit(-1);
+ }
+}
+
+static void kvp_update_file(int pool)
+{
+ FILE *filep;
+ size_t bytes_written;
+
+ /*
+ * We are going to write our in-memory registry out to
+ * disk; acquire the lock first.
+ */
+ kvp_acquire_lock(pool);
+
+ filep = fopen(kvp_file_info[pool].fname, "w");
+ if (!filep) {
+ kvp_release_lock(pool);
+ syslog(LOG_ERR, "Failed to open file, pool: %d", pool);
+ exit(-1);
+ }
+
+ bytes_written = fwrite(kvp_file_info[pool].records,
+ sizeof(struct kvp_record),
+ kvp_file_info[pool].num_records, filep);
+
+ fflush(filep);
+ kvp_release_lock(pool);
+}
+
+static int kvp_file_init(void)
+{
+ int ret, fd;
+ FILE *filep;
+ size_t records_read;
+ __u8 *fname;
+ struct kvp_record *record;
+ struct kvp_record *readp;
+ int num_blocks;
+ int i;
+ int alloc_unit = sizeof(struct kvp_record) * ENTRIES_PER_BLOCK;
+
+ if (access("/var/opt/hyperv", F_OK)) {
+ if (mkdir("/var/opt/hyperv", S_IRUSR | S_IWUSR | S_IROTH)) {
+ syslog(LOG_ERR, " Failed to create /var/opt/hyperv");
+ exit(-1);
+ }
+ }
+
+ for (i = 0; i < KVP_POOL_COUNT; i++) {
+ fname = kvp_file_info[i].fname;
+ records_read = 0;
+ num_blocks = 1;
+ sprintf(fname, "/var/opt/hyperv/.kvp_pool_%d", i);
+ fd = open(fname, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR | S_IROTH);
+
+ if (fd == -1)
+ return 1;
+
+
+ filep = fopen(fname, "r");
+ if (!filep)
+ return 1;
+
+ record = malloc(alloc_unit * num_blocks);
+ if (record == NULL) {
+ fclose(filep);
+ return 1;
+ }
+ while (!feof(filep)) {
+ readp = &record[records_read];
+ records_read += fread(readp, sizeof(struct kvp_record),
+ ENTRIES_PER_BLOCK,
+ filep);
+
+ if (!feof(filep)) {
+ /*
+ * We have more data to read.
+ */
+ num_blocks++;
+ record = realloc(record, alloc_unit *
+ num_blocks);
+ if (record == NULL) {
+ fclose(filep);
+ return 1;
+ }
+ continue;
+ }
+ break;
+ }
+ kvp_file_info[i].fd = fd;
+ kvp_file_info[i].num_blocks = num_blocks;
+ kvp_file_info[i].records = record;
+ kvp_file_info[i].num_records = records_read;
+ fclose(filep);
+
+ }
+
+ return 0;
+}
+
+static int kvp_key_delete(int pool, __u8 *key, int key_size)
+{
+ int i;
+ int j, k;
+ int num_records = kvp_file_info[pool].num_records;
+ struct kvp_record *record = kvp_file_info[pool].records;
+
+ for (i = 0; i < num_records; i++) {
+ if (memcmp(key, record[i].key, key_size))
+ continue;
+ /*
+ * Found a match; just move the remaining
+ * entries up.
+ */
+ if (i == (num_records - 1)) {
+ kvp_file_info[pool].num_records--;
+ kvp_update_file(pool);
+ return 0;
+ }
+
+ j = i;
+ k = j + 1;
+ for (; k < num_records; k++) {
+ strcpy(record[j].key, record[k].key);
+ strcpy(record[j].value, record[k].value);
+ j++;
+ }
+
+ kvp_file_info[pool].num_records--;
+ kvp_update_file(pool);
+ return 0;
+ }
+ return 1;
+}
+
+static int kvp_key_add_or_modify(int pool, __u8 *key, int key_size, __u8 *value,
+ int value_size)
+{
+ int i;
+ int j, k;
+ int num_records = kvp_file_info[pool].num_records;
+ struct kvp_record *record = kvp_file_info[pool].records;
+ int num_blocks = kvp_file_info[pool].num_blocks;
+
+ if ((key_size > HV_KVP_EXCHANGE_MAX_KEY_SIZE) ||
+ (value_size > HV_KVP_EXCHANGE_MAX_VALUE_SIZE))
+ return 1;
+
+ for (i = 0; i < num_records; i++) {
+ if (memcmp(key, record[i].key, key_size))
+ continue;
+ /*
+ * Found a match; just update the value -
+ * this is the modify case.
+ */
+ memcpy(record[i].value, value, value_size);
+ kvp_update_file(pool);
+ return 0;
+ }
+
+ /*
+ * Need to add a new entry;
+ */
+ if (num_records == (ENTRIES_PER_BLOCK * num_blocks)) {
+ /* Need to allocate a larger array for reg entries. */
+ record = realloc(record, sizeof(struct kvp_record) *
+ ENTRIES_PER_BLOCK * (num_blocks + 1));
+
+ if (record == NULL)
+ return 1;
+ kvp_file_info[pool].num_blocks++;
+
+ }
+ memcpy(record[i].value, value, value_size);
+ memcpy(record[i].key, key, key_size);
+ kvp_file_info[pool].records = record;
+ kvp_file_info[pool].num_records++;
+ kvp_update_file(pool);
+ return 0;
+}
+
+static int kvp_get_value(int pool, __u8 *key, int key_size, __u8 *value,
+ int value_size)
+{
+ int i;
+ int num_records = kvp_file_info[pool].num_records;
+ struct kvp_record *record = kvp_file_info[pool].records;
+
+ if ((key_size > HV_KVP_EXCHANGE_MAX_KEY_SIZE) ||
+ (value_size > HV_KVP_EXCHANGE_MAX_VALUE_SIZE))
+ return 1;
+
+ for (i = 0; i < num_records; i++) {
+ if (memcmp(key, record[i].key, key_size))
+ continue;
+ /*
+ * Found a match; just copy the value out.
+ */
+ memcpy(value, record[i].value, value_size);
+ return 0;
+ }
+
+ return 1;
+}
+
void kvp_get_os_info(void)
{
FILE *file;
@@ -315,6 +560,11 @@ int main(void)
*/
kvp_get_os_info();
+ if (kvp_file_init()) {
+ syslog(LOG_ERR, "Failed to initialize the pools");
+ exit(-1);
+ }
+
fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
if (fd < 0) {
syslog(LOG_ERR, "netlink socket creation failed; error:%d", fd);
@@ -389,9 +639,38 @@ int main(void)
}
continue;
+ /*
+ * The current protocol with the kernel component uses a
+ * NULL key name to pass an error condition.
+ * For the SET, GET and DELETE operations,
+ * use the existing protocol to pass back error.
+ */
+
case KVP_OP_SET:
+ if (kvp_key_add_or_modify(hv_msg->kvp_hdr.pool,
+ hv_msg->body.kvp_set.data.key,
+ hv_msg->body.kvp_set.data.key_size,
+ hv_msg->body.kvp_set.data.value,
+ hv_msg->body.kvp_set.data.value_size))
+ strcpy(hv_msg->body.kvp_set.data.key, "");
+ break;
+
case KVP_OP_GET:
+ if (kvp_get_value(hv_msg->kvp_hdr.pool,
+ hv_msg->body.kvp_set.data.key,
+ hv_msg->body.kvp_set.data.key_size,
+ hv_msg->body.kvp_set.data.value,
+ hv_msg->body.kvp_set.data.value_size))
+ strcpy(hv_msg->body.kvp_set.data.key, "");
+ break;
+
case KVP_OP_DELETE:
+ if (kvp_key_delete(hv_msg->kvp_hdr.pool,
+ hv_msg->body.kvp_delete.key,
+ hv_msg->body.kvp_delete.key_size))
+ strcpy(hv_msg->body.kvp_delete.key, "");
+ break;
+
default:
break;
}
--
1.7.4.1
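The per-pool file handling in this patch amounts to writing a flat array of fixed-size records to a file and reading it back one block at a time, growing the in-memory array as needed. The following is a minimal user-space sketch of that idea under assumptions: struct record, save_records() and load_records() are hypothetical names, ENTRIES_PER_BLOCK is set to 2 only to exercise the growth path, and the fcntl() locking used by the daemon is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ENTRIES_PER_BLOCK 2	/* small on purpose, to force a realloc */

struct record {
	char key[16];
	char value[32];
};

/* Write n records at the current file position; 0 on success. */
static int save_records(FILE *f, const struct record *recs, size_t n)
{
	assert(f != NULL && recs != NULL);

	if (fwrite(recs, sizeof(*recs), n, f) != n)
		return -1;
	return fflush(f) == 0 ? 0 : -1;
}

/*
 * Read every record from the start of the file. Each fread() asks only for
 * the free slots left in the array, so it cannot write past the allocation.
 * Returns a malloc()ed array (caller frees) or NULL on allocation failure.
 */
static struct record *load_records(FILE *f, size_t *count)
{
	size_t capacity = ENTRIES_PER_BLOCK;
	size_t n = 0;
	struct record *recs = malloc(capacity * sizeof(*recs));

	assert(f != NULL && count != NULL);
	if (recs == NULL)
		return NULL;

	rewind(f);
	for (;;) {
		struct record *tmp;

		n += fread(&recs[n], sizeof(*recs), capacity - n, f);
		if (n < capacity)
			break;	/* short read: end of file or error */

		/* Array is full; there may be more data, so add a block. */
		tmp = realloc(recs, (capacity + ENTRIES_PER_BLOCK) * sizeof(*recs));
		if (tmp == NULL) {
			free(recs);
			return NULL;
		}
		recs = tmp;
		capacity += ENTRIES_PER_BLOCK;
	}
	*count = n;
	return recs;
}
```

Sizing each read by the remaining capacity, and detecting the end of data by a short read, avoids depending on feof() before any read has been attempted.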
* [PATCH 2/4] Drivers: hv: Support the newly introduced KVP messages in the driver
From: K. Y. Srinivasan @ 2012-03-10 23:32 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering; +Cc: K. Y. Srinivasan
In-Reply-To: <1331422331-4381-1-git-send-email-kys@microsoft.com>
Now support the newly defined KVP message types. It turns out that the host
pushes a set of standard key value pairs as soon as the guest opens the KVP channel.
Since we cannot handle these tuples until the user level daemon loads up, defer
reading the KVP channel until the user level daemon is launched.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/hv/hv_kvp.c | 184 ++++++++++++++++++++++++++++++++++++----------
include/linux/hyperv.h | 2 +
tools/hv/hv_kvp_daemon.c | 7 ++
3 files changed, 153 insertions(+), 40 deletions(-)
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 779109b..3b2eeaa 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -42,9 +42,10 @@
static struct {
bool active; /* transaction status - active or not */
int recv_len; /* number of bytes received. */
- int index; /* current index */
+ struct hv_kvp_msg *kvp_msg; /* current message */
struct vmbus_channel *recv_channel; /* chn we got the request */
u64 recv_req_id; /* request ID. */
+ void *kvp_context; /* for the channel callback */
} kvp_transaction;
static void kvp_send_key(struct work_struct *dummy);
@@ -110,12 +111,15 @@ kvp_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
struct hv_kvp_msg_enumerate *data;
message = (struct hv_kvp_msg *)msg->data;
- if (message->kvp_hdr.operation == KVP_OP_REGISTER) {
+ switch (message->kvp_hdr.operation) {
+ case KVP_OP_REGISTER:
pr_info("KVP: user-mode registering done.\n");
kvp_register();
- }
+ kvp_transaction.active = false;
+ hv_kvp_onchannelcallback(kvp_transaction.kvp_context);
+ break;
- if (message->kvp_hdr.operation == KVP_OP_ENUMERATE) {
+ default:
data = &message->body.kvp_enum_data;
/*
* Complete the transaction by forwarding the key value
@@ -133,7 +137,11 @@ kvp_send_key(struct work_struct *dummy)
{
struct cn_msg *msg;
struct hv_kvp_msg *message;
- int index = kvp_transaction.index;
+ struct hv_kvp_msg *in_msg;
+ __u8 operation = kvp_transaction.kvp_msg->kvp_hdr.operation;
+ __u8 pool = kvp_transaction.kvp_msg->kvp_hdr.pool;
+ __u32 val32;
+ __u64 val64;
msg = kzalloc(sizeof(*msg) + sizeof(struct hv_kvp_msg) , GFP_ATOMIC);
@@ -142,8 +150,85 @@ kvp_send_key(struct work_struct *dummy)
msg->id.val = CN_KVP_VAL;
message = (struct hv_kvp_msg *)msg->data;
- message->kvp_hdr.operation = KVP_OP_ENUMERATE;
- message->body.kvp_enum_data.index = index;
+ message->kvp_hdr.operation = operation;
+ message->kvp_hdr.pool = pool;
+ in_msg = kvp_transaction.kvp_msg;
+
+ /*
+ * The key/value strings sent from the host are encoded in
+ * utf16; convert them to utf8 strings.
+ */
+
+ switch (message->kvp_hdr.operation) {
+ case KVP_OP_SET:
+ switch (in_msg->body.kvp_set.data.value_type) {
+ case REG_SZ:
+ /*
+ * The value is a string - utf16 encoding.
+ */
+ message->body.kvp_set.data.value_size =
+ utf16s_to_utf8s(
+ (wchar_t *)
+ in_msg->body.kvp_set.data.value,
+ in_msg->body.kvp_set.data.value_size,
+ UTF16_LITTLE_ENDIAN,
+ message->body.kvp_set.data.value,
+ HV_KVP_EXCHANGE_MAX_VALUE_SIZE) + 1;
+ break;
+
+ case REG_U32:
+ /*
+ * The value is a 32 bit scalar.
+ * We save this as a utf8 string.
+ */
+ val32 =
+ in_msg->body.kvp_set.data.value_u32;
+ message->body.kvp_set.data.value_size =
+ sprintf(message->body.kvp_set.data.value,
+ "%u", val32) + 1;
+ break;
+
+ case REG_U64:
+ /*
+ * The value is a 64 bit scalar.
+ * We save this as a utf8 string.
+ */
+ val64 =
+ in_msg->body.kvp_set.data.value_u64;
+ message->body.kvp_set.data.value_size =
+ sprintf(message->body.kvp_set.data.value,
+ "%llu", val64) + 1;
+ break;
+
+ }
+ case KVP_OP_GET:
+ message->body.kvp_set.data.key_size =
+ utf16s_to_utf8s(
+ (wchar_t *)in_msg->body.kvp_set.data.key,
+ in_msg->body.kvp_set.data.key_size,
+ UTF16_LITTLE_ENDIAN,
+ message->body.kvp_set.data.key,
+ HV_KVP_EXCHANGE_MAX_KEY_SIZE) + 1;
+
+ break;
+
+ case KVP_OP_DELETE:
+ message->body.kvp_delete.key_size =
+ utf16s_to_utf8s(
+ (wchar_t *)in_msg->body.kvp_delete.key,
+ in_msg->body.kvp_delete.key_size,
+ UTF16_LITTLE_ENDIAN,
+ message->body.kvp_delete.key,
+ HV_KVP_EXCHANGE_MAX_KEY_SIZE) + 1;
+
+ break;
+
+ case KVP_OP_ENUMERATE:
+ message->body.kvp_enum_data.index =
+ in_msg->body.kvp_enum_data.index;
+ break;
+ }
+
msg->len = sizeof(struct hv_kvp_msg);
cn_netlink_send(msg, 0, GFP_ATOMIC);
kfree(msg);
@@ -159,7 +244,7 @@ static void
kvp_respond_to_host(char *key, char *value, int error)
{
struct hv_kvp_msg *kvp_msg;
- struct hv_kvp_msg_enumerate *kvp_data;
+ struct hv_kvp_exchg_msg_value *kvp_data;
char *key_name;
struct icmsg_hdr *icmsghdrp;
int keylen, valuelen;
@@ -189,6 +274,9 @@ kvp_respond_to_host(char *key, char *value, int error)
kvp_transaction.active = false;
+ icmsghdrp = (struct icmsg_hdr *)
+ &recv_buffer[sizeof(struct vmbuspipe_hdr)];
+
if (channel->onchannel_callback == NULL)
/*
* We have raced with util driver being unloaded;
@@ -196,41 +284,57 @@ kvp_respond_to_host(char *key, char *value, int error)
*/
return;
- icmsghdrp = (struct icmsg_hdr *)
- &recv_buffer[sizeof(struct vmbuspipe_hdr)];
- kvp_msg = (struct hv_kvp_msg *)
- &recv_buffer[sizeof(struct vmbuspipe_hdr) +
- sizeof(struct icmsg_hdr)];
- kvp_data = &kvp_msg->body.kvp_enum_data;
- key_name = key;
/*
* If the error parameter is set, terminate the host's enumeration.
*/
if (error) {
/*
- * We don't support this index or the we have timedout;
+ * Something failed or we have timed out;
* terminate the host-side iteration by returning an error.
*/
icmsghdrp->status = HV_E_FAIL;
goto response_done;
}
+ icmsghdrp->status = HV_S_OK;
+
+ kvp_msg = (struct hv_kvp_msg *)
+ &recv_buffer[sizeof(struct vmbuspipe_hdr) +
+ sizeof(struct icmsg_hdr)];
+
+ switch (kvp_transaction.kvp_msg->kvp_hdr.operation) {
+ case KVP_OP_GET:
+ kvp_data = &kvp_msg->body.kvp_get.data;
+ goto copy_value;
+
+ case KVP_OP_SET:
+ case KVP_OP_DELETE:
+ goto response_done;
+
+ default:
+ break;
+ }
+
+ kvp_data = &kvp_msg->body.kvp_enum_data.data;
+ key_name = key;
+
/*
* The windows host expects the key/value pair to be encoded
* in utf16.
*/
keylen = utf8s_to_utf16s(key_name, strlen(key_name), UTF16_HOST_ENDIAN,
- (wchar_t *) kvp_data->data.key,
+ (wchar_t *) kvp_data->key,
HV_KVP_EXCHANGE_MAX_KEY_SIZE / 2);
- kvp_data->data.key_size = 2*(keylen + 1); /* utf16 encoding */
+ kvp_data->key_size = 2*(keylen + 1); /* utf16 encoding */
+
+copy_value:
valuelen = utf8s_to_utf16s(value, strlen(value), UTF16_HOST_ENDIAN,
- (wchar_t *) kvp_data->data.value,
+ (wchar_t *) kvp_data->value,
HV_KVP_EXCHANGE_MAX_VALUE_SIZE / 2);
- kvp_data->data.value_size = 2*(valuelen + 1); /* utf16 encoding */
+ kvp_data->value_size = 2*(valuelen + 1); /* utf16 encoding */
- kvp_data->data.value_type = REG_SZ; /* all our values are strings */
- icmsghdrp->status = HV_S_OK;
+ kvp_data->value_type = REG_SZ; /* all our values are strings */
response_done:
icmsghdrp->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;
@@ -257,11 +361,18 @@ void hv_kvp_onchannelcallback(void *context)
u64 requestid;
struct hv_kvp_msg *kvp_msg;
- struct hv_kvp_msg_enumerate *kvp_data;
struct icmsg_hdr *icmsghdrp;
struct icmsg_negotiate *negop = NULL;
+ if (kvp_transaction.active) {
+ /*
+ * We will defer processing this callback until
+ * the current transaction is complete.
+ */
+ kvp_transaction.kvp_context = context;
+ return;
+ }
vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE, &recvlen, &requestid);
@@ -276,29 +387,16 @@ void hv_kvp_onchannelcallback(void *context)
sizeof(struct vmbuspipe_hdr) +
sizeof(struct icmsg_hdr)];
- kvp_data = &kvp_msg->body.kvp_enum_data;
-
- /*
- * We only support the "get" operation on
- * "KVP_POOL_AUTO" pool.
- */
-
- if ((kvp_msg->kvp_hdr.pool != KVP_POOL_AUTO) ||
- (kvp_msg->kvp_hdr.operation !=
- KVP_OP_ENUMERATE)) {
- icmsghdrp->status = HV_E_FAIL;
- goto callback_done;
- }
-
/*
* Stash away this global state for completing the
* transaction; note transactions are serialized.
*/
+
kvp_transaction.recv_len = recvlen;
kvp_transaction.recv_channel = channel;
kvp_transaction.recv_req_id = requestid;
kvp_transaction.active = true;
- kvp_transaction.index = kvp_data->index;
+ kvp_transaction.kvp_msg = kvp_msg;
/*
* Get the information from the
@@ -316,8 +414,6 @@ void hv_kvp_onchannelcallback(void *context)
}
-callback_done:
-
icmsghdrp->icflags = ICMSGHDRFLAG_TRANSACTION
| ICMSGHDRFLAG_RESPONSE;
@@ -338,6 +434,14 @@ hv_kvp_init(struct hv_util_service *srv)
return err;
recv_buffer = srv->recv_buffer;
+ /*
+ * When this driver loads, the user level daemon that
+ * processes the host requests may not yet be running.
+ * Defer processing channel callbacks until the daemon
+ * has registered.
+ */
+ kvp_transaction.active = true;
+
return 0;
}
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index a2d8c54..e88a979 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -119,6 +119,8 @@
*/
#define REG_SZ 1
+#define REG_U32 4
+#define REG_U64 8
enum hv_kvp_exchg_op {
KVP_OP_GET = 0,
diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 00d3f7c..a98878c 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -389,10 +389,16 @@ int main(void)
}
continue;
+ case KVP_OP_SET:
+ case KVP_OP_GET:
+ case KVP_OP_DELETE:
default:
break;
}
+ if (hv_msg->kvp_hdr.operation != KVP_OP_ENUMERATE)
+ goto kvp_done;
+
hv_msg = (struct hv_kvp_msg *)incoming_cn_msg->data;
key_name = (char *)hv_msg->body.kvp_enum_data.data.key;
key_value = (char *)hv_msg->body.kvp_enum_data.data.value;
@@ -454,6 +460,7 @@ int main(void)
* already in the receive buffer. Update the cn_msg header to
* reflect the key value that has been added to the message
*/
+kvp_done:
incoming_cn_msg->id.idx = CN_KVP_IDX;
incoming_cn_msg->id.val = CN_KVP_VAL;
--
1.7.4.1
* [PATCH 1/4] Drivers: hv: Add new message types to enhance KVP
From: K. Y. Srinivasan @ 2012-03-10 23:32 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering
In-Reply-To: <1331422300-4330-1-git-send-email-kys@microsoft.com>
Add additional KVP (Key Value Pair) protocol messages to
enhance KVP functionality for Linux guests on Hyper-V. As part of this
patch, define an explicit version negotiation message.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/hv/hv_kvp.c | 5 +++--
include/linux/hyperv.h | 30 +++++++++++++++++++++++++++---
tools/hv/hv_kvp_daemon.c | 2 +-
3 files changed, 31 insertions(+), 6 deletions(-)
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 0ef4c1f..779109b 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -78,7 +78,7 @@ kvp_register(void)
if (msg) {
kvp_msg = (struct hv_kvp_msg *)msg->data;
- version = kvp_msg->body.kvp_version;
+ version = kvp_msg->body.kvp_register.version;
msg->id.idx = CN_KVP_IDX;
msg->id.val = CN_KVP_VAL;
@@ -122,7 +122,8 @@ kvp_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
* to the host. But first, cancel the timeout.
*/
if (cancel_delayed_work_sync(&kvp_work))
- kvp_respond_to_host(data->data.key, data->data.value,
+ kvp_respond_to_host(data->data.key,
+ data->data.value,
!strlen(data->data.key));
}
}
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index e57a6c6..a2d8c54 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -149,7 +149,11 @@ struct hv_kvp_exchg_msg_value {
__u32 key_size;
__u32 value_size;
__u8 key[HV_KVP_EXCHANGE_MAX_KEY_SIZE];
- __u8 value[HV_KVP_EXCHANGE_MAX_VALUE_SIZE];
+ union {
+ __u8 value[HV_KVP_EXCHANGE_MAX_VALUE_SIZE];
+ __u32 value_u32;
+ __u64 value_u64;
+ };
} __attribute__((packed));
struct hv_kvp_msg_enumerate {
@@ -157,11 +161,31 @@ struct hv_kvp_msg_enumerate {
struct hv_kvp_exchg_msg_value data;
} __attribute__((packed));
+struct hv_kvp_msg_get {
+ struct hv_kvp_exchg_msg_value data;
+};
+
+struct hv_kvp_msg_set {
+ struct hv_kvp_exchg_msg_value data;
+};
+
+struct hv_kvp_msg_delete {
+ __u32 key_size;
+ __u8 key[HV_KVP_EXCHANGE_MAX_KEY_SIZE];
+};
+
+struct hv_kvp_register {
+ __u8 version[HV_KVP_EXCHANGE_MAX_KEY_SIZE];
+};
+
struct hv_kvp_msg {
struct hv_kvp_hdr kvp_hdr;
union {
- struct hv_kvp_msg_enumerate kvp_enum_data;
- char kvp_version[HV_KVP_EXCHANGE_MAX_KEY_SIZE];
+ struct hv_kvp_msg_get kvp_get;
+ struct hv_kvp_msg_set kvp_set;
+ struct hv_kvp_msg_delete kvp_delete;
+ struct hv_kvp_msg_enumerate kvp_enum_data;
+ struct hv_kvp_register kvp_register;
} body;
} __attribute__((packed));
diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 4ebf703..00d3f7c 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -378,7 +378,7 @@ int main(void)
* Driver is registering with us; stash away the version
* information.
*/
- p = (char *)hv_msg->body.kvp_version;
+ p = (char *)hv_msg->body.kvp_register.version;
lic_version = malloc(strlen(p) + 1);
if (lic_version) {
strcpy(lic_version, p);
--
1.7.4.1
* [PATCH 0000/0004] drivers: hv
From: K. Y. Srinivasan @ 2012-03-10 23:31 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering; +Cc: K. Y. Srinivasan
This patch-set further enhances the KVP functionality for Linux
guests:
1. Supports most of the Win8 KVP protocol.
2. Supports operations on all the pools.
Regards,
K. Y
* Re: [Xen-devel] [PATCH] blkfront: don't change to closing if we're busy
From: Jan Beulich @ 2012-03-09 13:32 UTC (permalink / raw)
To: joe.jin, Konrad Rzeszutek Wilk
Cc: jeremy, xen-devel, Andrew Jones, virtualization
In-Reply-To: <20120221143634.GD5652@phenom.dumpdata.com>
>>> On 21.02.12 at 15:36, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Tue, Feb 21, 2012 at 09:38:41AM +0000, Jan Beulich wrote:
>> >>> On 21.02.12 at 10:23, Andrew Jones <drjones@redhat.com> wrote:
>> >> >>> On 20.02.12 at 11:35, Andrew Jones <drjones@redhat.com> wrote:
>> >> >> On Fri, Feb 17, 2012 at 05:52:54PM +0100, Andrew Jones wrote:
>> >> >> There was another fix that sounds similar to this in the backend.
>> >> >> 6f5986bce558e64fe867bff600a2127a3cb0c006
>> >> >>
>> >> >
>> >> > Thanks for the pointer. It doesn't look like the upstream 2.6.18
>> >> > tree has that, but it probably would be a good idea there too.
>> >>
>> >> While I had seen the change and considered pulling it in, I wasn't
>> >> really convinced this is the right behavior here: After all, if the
>> >> host
>> >> admin requested a resource to be removed from a guest, it shouldn't
>> >> depend on the guest whether and when to honor that request, yet
>> >> by deferring the disconnect you basically allow the guest to continue
>> >> using the disk indefinitely.
>> >>
>> >
>> > I agree. Yesterday I wrote[1] asking if "deferred detach" is really
>> > something we want. At the moment, Igor and I are poking through
>> > xen-blkfront.c, and currently we'd rather see the feature dropped
>> > in favor of a simplified driver. One that has fewer release paths,
>> > and/or release paths with more consistent locking behavior.
>>
>> I must have missed this, or it's one more instance of delayed mail
>> delivery via xen-devel.
>>
>> Konrad - care to revert that original change as having barked up
>> the wrong tree?
>
> Meaning the 6f5986bce558e64fe867bff600a2127a3cb0c006?
>
> Let's CC Joe Jin here to get his input. I recall that the --force argument
> still works with that patch so the admin can still choose to terminate the
> state. Which I thought was the point of the --force - as in if the
> guest is still using it, we won't be yanking it out until we are
> completely sure.
Actually I meanwhile think that rather than fully reverting the
change, xen_blkif_disconnect() should be called in both the
XenbusStateClosing and XenbusStateClosed cases, not least
since the frontend only ever sets the state to Closing when there
are still active users. Not doing the disconnect can result in
grant reference (and page) leaks, e.g. when the frontend driver
gets unloaded while there are still existing (but obviously unused)
devices, and those can't even be eliminated by adding leak
handling code to gnttab_end_foreign_access() (the freeing then
gets deferred until a frontend driver gets loaded again and re-
attaches to the device - currently impossible in the upstream
driver as xlblk_exit() fails to call unregister_blkdev(); see
http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/99dc6737898b).
Jan
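The suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: the enum, the struct, and both function names are hypothetical stand-ins, not the actual xen-blkback code, where the handler works on enum xenbus_state and would call xen_blkif_disconnect().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the frontend states the backend reacts to. */
enum fe_state { FE_CONNECTED, FE_CLOSING, FE_CLOSED };

/* Hypothetical stand-in for the backend's per-device state. */
struct mock_blkif {
    bool ring_mapped;     /* represents the mapped grant references */
    int disconnect_calls; /* how often the disconnect path ran */
};

/* Stand-in for the disconnect step: release the mapped ring. */
static void mock_disconnect(struct mock_blkif *blkif)
{
    blkif->disconnect_calls++;
    blkif->ring_mapped = false;
}

/* Treat Closing and Closed alike, so the ring is always released. */
static void frontend_changed_sketch(struct mock_blkif *blkif,
                                    enum fe_state state)
{
    switch (state) {
    case FE_CLOSING:
    case FE_CLOSED:
        mock_disconnect(blkif);
        break;
    default:
        break;
    }
}
```

With both cases sharing one path, the release of the ring does not depend on the frontend ever reaching Closed.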
* CFP: The 9th Int. Conf. on Autonomic Computing (ICAC) 2012 -- deadline extension to March 16th, 2012
From: Ioan Raicu @ 2012-03-07 21:15 UTC (permalink / raw)
To: virtualization
CALL FOR PAPERS
The 9th International Conference on Autonomic Computing (ICAC 2012)
September 17-21, 2012. San Jose, CA, USA
http://icac2012.cs.fiu.edu/
-----------------------------------------------------------------
IMPORTANT DATES
Paper and Poster Submission: March 16, 2012, 11:59pm PST (EXTENDED)
Notification: May 18, 2012
Camera-ready Due: June 8, 2012
-----------------------------------------------------------------
OVERVIEW
ICAC is the leading conference on autonomic computing techniques,
foundations, and applications. Autonomic computing refers to
methods and means for automated management of performance, fault,
security, and configuration with little involvement of users or
administrators. Systems introducing new autonomic features are
becoming increasingly prevalent, motivating research that spans
a variety of areas, from computer systems, networking, software
engineering, and data management to machine learning, control
theory, and bio-inspired computing. ICAC brings together
researchers and practitioners across these disciplines to
address multiple facets of adaptation and self-management in
computing systems and applications from different perspectives.
Autonomic computing solutions are sought for clouds, grids,
data centers, enterprise software, internet services, data
services, smart phones, embedded systems, and sensor networks.
In these environments, resources and applications must be managed
to maximize performance and minimize cost, while maintaining
predictable and reliable behavior in the face of varying
workloads, failures, and malicious threats. Papers are solicited
from all areas of autonomic computing, including (but not limited
to):
* End-to-end techniques for management of resources, workloads,
performance, faults, power/cooling, security, and others.
* Self-managing components, such as server, storage, network
protocols, or specific application elements, and embedded and
mobile end systems such as smart phones.
* Decision and analysis techniques and their use, such as machine
learning, control theory, predictive methods, probability and
stochastic processes, queuing theory methodologies, emergent
behavior, rule-based systems, and bio-inspired techniques.
* Monitoring systems for autonomic computing.
* Hypervisor, operating systems, hardware, or application support
for autonomic computing.
* Novel human interfaces for monitoring and controlling autonomic
systems.
* Management topics, such as specification and modeling of
service-level agreements, behavior enforcement and tie-in with
IT governance.
* Toolkits, frameworks, principles and architectures, from
software engineering practices and experimental methodologies
to agent-based techniques and virtualization.
* Fundamental science and theory of self-managing systems:
understanding, controlling or exploiting system behaviors to
enforce autonomic properties.
* Applications of autonomic computing and experiences with
prototyped or deployed systems solving real-world problems in
science, engineering, business and society.
Papers will be judged on originality, significance, interest,
correctness, clarity and relevance to the broader community.
Papers should report on experiences, measurements, user studies,
or other evaluations, as appropriate. Evaluations of a prototype
or large-scale deployment of systems and applications are expected.
PAPER AND POSTER SUBMISSIONS
Full papers (a maximum of 10 pages in the two-column ACM proceedings
format) and posters (2 pages) are invited on a wide variety of
topics relating to autonomic computing. Submitted papers must be
original work, and may not be under consideration for another
conference or journal. Complete formatting and submission
instructions can be found on the conference web site. Accepted
papers and posters will appear in proceedings distributed at the
conference and available electronically. Relevant top ICAC'12
papers will be invited for "fast-track" submissions to the
ACM Transactions on Autonomous and Adaptive Systems (TAAS).
INDUSTRY SESSION
One of ICAC's important roles is to bring together researchers
and practitioners from academia and industry. In its industry
session, ICAC helps fulfill this role by presenting an industry
viewpoint on technologies, products, and market needs. The
industry session also addresses current challenges, and
opportunities for academic and corporate research collaborations.
We encourage industry leaders, including entrepreneurs, product
developers, architects, managers, marketers and end users,
to submit their papers and posters reflecting such industry
perspectives as part of the regular submission process.
------------------------------------------------------------------
ORGANIZERS
GENERAL CHAIR
Dejan Milojicic, HP Labs
PROGRAM CHAIRS
Dongyan Xu, Purdue University
Vanish Talwar, HP Labs
INDUSTRY CHAIR
Xiaoyun Zhu, VMware
WORKSHOPS CHAIR
Fred Douglis, EMC
POSTERS/DEMO/EXHIBITS CHAIR
Eno Thereska, Microsoft Research
FINANCE CHAIR
Michael Kozuch, Intel
LOCAL ARRANGEMENT CHAIR
Jessica Blaine
PUBLICITY CHAIRS
Daniel Batista, University of São Paulo
Vartan Padaryan, ISP/Russian Academy of Sci.
Ioan Raicu, Illinois Inst. of Technology
Jianfeng Zhan, ICT/Chinese Academy of Sci.
Ming Zhao, Florida Intl. University
PROGRAM COMMITTEE
Tarek Abdelzaher, UIUC
Umesh Bellur, IIT, Bombay
Ken Birman, Cornell University
Rajkumar Buyya, Univ. of Melbourne
Rocky Chang, Hong Kong Polytechnic University
Yuan Chen, HP Labs
Alva Couch, Tufts University
Peter Dinda, Northwestern University
Fred Douglis, EMC
Renato Figueiredo, University of Florida
Mohamed Hefeeda, Qatar Computing Research Institute
Joe Hellerstein, Google
Geoff Jiang, NEC Labs
Jeff Kephart, IBM Research
Emre Kiciman, Microsoft Research
Fabio Kon, University of São Paulo
Michael Kozuch, Intel
Dejan Milojicic, HP Labs
Klara Nahrstedt, UIUC
Priya Narasimhan, CMU
Manish Parashar, Rutgers University
Ioan Raicu, Illinois Inst. of Technology
Omer Rana, Cardiff University
Masoud Sadjadi, Florida Intl. University
Rick Schlichting, AT&T Labs
Hartmut Schmeck, KIT
Karsten Schwan, Georgia Tech
Onn Shehory, IBM Research
Eno Thereska, Microsoft Research
Xiaoyun Zhu, VMware
--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu@cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
=================================================================
* Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices
From: Konrad Rzeszutek Wilk @ 2012-03-07 15:15 UTC (permalink / raw)
To: Jan Beulich
Cc: jeremy@goop.org, Ian Campbell, konrad.wilk@oracle.com,
waldi@debian.org, netdev@vger.kernel.org, joe.jin,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
xen-devel@lists.xen.org, paul.gortmaker@windriver.com,
Paul Durrant, weiyi.huang@gmail.com, Santosh Jodh,
linux-pci@vger.kernel.org, dgdegra,
virtualization@lists.linux-foundation.org, lersek@redhat.com,
akpm@linux-foundation.org, David
In-Reply-To: <4F5739760200007800076DA6@nat28.tlf.novell.com>
On Mar 7, 2012 4:33 AM, "Jan Beulich" <JBeulich@suse.com> wrote:
>
> >>> On 06.03.12 at 18:20, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
> > -> the usage of XenbusStateInitWait? Why do we introduce that? Looks
> > like a fix to something.
>
> No, this is required to get the negotiation working (the frontend must
> not try to read the new nodes until it can be certain that the backend
> populated them). However, as already pointed out in an earlier reply
> to Santosh, the way this is done here doesn't appear to allow for the
> backend to already be in InitWait state when the frontend gets
> invoked.
OK.
>
> > -> XENBUS_MAX_RING_PAGES - why 2? Why not 4? What is the optimal
> > default size for SSD usage? 16?
>
> What do SSDs have to do with a XenBus definition? Imo it's wrong (and
> unnecessary) to introduce a limit at the XenBus level at all - each driver
> can do this for itself.
The patch should mention what the benefit of multi ring is.
>
> As to the limit for SSDs in the block interface - I don't think the number
> of possibly simultaneous requests has anything to do with this. Instead,
> I'd expect the request number/size/segments extension that NetBSD
> apparently implements to possibly have an effect.
.. which sounds to me like increasing the bandwidth of the protocol. Should
be mentioned somewhere in the git description.
>
> Jan
>
>
* Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices
From: Jan Beulich @ 2012-03-07 9:33 UTC (permalink / raw)
To: Santosh Jodh, konrad
Cc: jeremy@goop.org, Ian Campbell, netdev@vger.kernel.org,
konrad.wilk@oracle.com, waldi@debian.org, joe.jin@oracle.com,
weiyi.huang@gmail.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org,
paul.gortmaker@windriver.com, Paul Durrant, David Vrabel,
linux-pci@vger.kernel.org, akpm@linux-foundation.org,
xen-devel@lists.xen.org, lersek@redhat.com, dgdegra
In-Reply-To: <CAPbh3rsExLtohBwVd_scYuO=GN1iZE5egQQ3x5M59YUno5Rtyw@mail.gmail.com>
>>> On 06.03.12 at 18:20, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
> -> the usage of XenbusStateInitWait? Why do we introduce that? Looks
> like a fix to something.
No, this is required to get the negotiation working (the frontend must
not try to read the new nodes until it can be certain that the backend
populated them). However, as already pointed out in an earlier reply
to Santosh, the way this is done here doesn't appear to allow for the
backend to already be in InitWait state when the frontend gets
invoked.
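Once the backend has populated its nodes, the frontend's part of the negotiation comes down to choosing a ring order both sides accept. A minimal sketch, assuming the backend advertises its limit in "max-ring-page-order" as the patch description says; the helper name and its parameters are hypothetical, and a missing node is taken to mean a legacy single-page backend:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: choose the ring order the frontend will use.
 * backend_has_node says whether "max-ring-page-order" could be read;
 * when it could not, fall back to a single page (order 0).
 */
static unsigned int pick_ring_order(bool backend_has_node,
                                    unsigned int backend_max_order,
                                    unsigned int frontend_order)
{
    if (!backend_has_node)
        return 0;

    /* Never exceed what either side is prepared to handle. */
    return backend_max_order < frontend_order ?
           backend_max_order : frontend_order;
}
```

The result only means something if the read happens after the backend has written the node, which is the ordering the InitWait state is there to guarantee.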
> -> XENBUS_MAX_RING_PAGES - why 2? Why not 4? What is the optimal
> default size for SSD usage? 16?
What do SSDs have to do with a XenBus definition? Imo it's wrong (and
unnecessary) to introduce a limit at the XenBus level at all - each driver
can do this for itself.
As to the limit for SSDs in the block interface - I don't think the number
of possibly simultaneous requests has anything to do with this. Instead,
I'd expect the request number/size/segments extension that NetBSD
apparently implements to possibly have an effect.
Jan
* Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices
From: Konrad Rzeszutek Wilk @ 2012-03-06 17:20 UTC (permalink / raw)
To: Santosh Jodh
Cc: jeremy@goop.org, Ian Campbell, konrad.wilk@oracle.com,
waldi@debian.org, weiyi.huang@gmail.com, joe.jin@oracle.com,
linux-kernel@vger.kernel.org, jbeulich@novell.com,
virtualization@lists.linux-foundation.org,
paul.gortmaker@windriver.com, linux-pci@vger.kernel.org,
Paul Durrant, jbarnes@virtuousgeek.org, netdev@vger.kernel.org,
dgdegra@tycho.nsa.gov, xen-devel@lists.xen.org, lersek@redhat.com,
akpm
In-Reply-To: <7914B38A4445B34AA16EB9F1352942F1010A1FA12364@SJCPMAILBOX01.citrite.net>
On Mon, Mar 5, 2012 at 4:49 PM, Santosh Jodh <Santosh.Jodh@citrix.com> wrote:
> From: Santosh Jodh <santosh.jodh@citrix.com>
>
> Add support for multi page ring for block devices.
> The number of pages is configurable for blkback via module parameter.
> blkback reports max-ring-page-order to blkfront via xenstore.
> blkfront reports its supported ring-page-order to blkback via xenstore.
> blkfront reports multi page ring references via ring-refNN in xenstore.
> The change allows newer blkfront to work with older blkback and
> vice-versa.
> Based on original patch by Paul Durrant.
you should include his SoB in this patch.
The patch overall looks Ok, though I do have some comments:
-> the call to "xenbus_ring_ops_init();" looks like a bug-fix? If so,
it should be a separate patch.
-> the usage of XenbusStateInitWait? Why do we introduce that? Looks
like a fix to something.
-> XENBUS_MAX_RING_PAGES - why 2? Why not 4? What is the optimal
default size for SSD usage? 16?
-> don't do sprintf, use snprintf
-> don't use printk(KERN_..), use pr_info or the variant of
pr_err,pr_debug, etc.
-> don't split the printk contents. It is Ok for them to be more than
80 columns.
-> check that xen_blkif_ring_order is under XENBUS_MAX_RING_PAGES.
Otherwise a joker could do = 9999999999999999999 for ring size and we
would try to use that.
-> Split the changes to the XenBus infrastructure into a separate patch,
and fold the net* and blk* changes needed to use the extra arguments
into that patch. The patch that implements the multi-page ring in
blkback then depends on the XenBus modifications patch. Also make sure
you CC David Miller and Jens Axboe on the XenBus patch, as it modifies
the net-* side, which requires Ian's and David's Ack.
-> Have you done a sanity/test check where the backend and frontend
have different size rings? Just to make sure nothing explodes.
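Two of the points above, the range check on the ring order and the use of snprintf, can be sketched in plain C as follows. This is a userspace illustration, not driver code: SKETCH_MAX_RING_ORDER and both helper names are hypothetical, and the limit of 2 is an assumed value.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_MAX_RING_ORDER 2 /* assumed limit, for illustration only */

/* Reject a requested order outside [0, SKETCH_MAX_RING_ORDER]. */
static int validate_ring_order(long requested, unsigned int *order_out)
{
    if (requested < 0 || requested > SKETCH_MAX_RING_ORDER)
        return -EINVAL;

    *order_out = (unsigned int)requested;
    return 0;
}

/* Build a "ring-refN" node name and report truncation, not hide it. */
static int format_ring_ref_name(char *buf, size_t len, unsigned int index)
{
    int n = snprintf(buf, len, "ring-ref%u", index);

    if (n < 0 || (size_t)n >= len)
        return -ERANGE;
    return 0;
}
```

Checking the snprintf return value matters here because a 10-byte buffer holds "ring-ref9" but not "ring-ref10".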
>
> Signed-off-by: Santosh Jodh <santosh.jodh@citrix.com>
> ---
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 0088bf6..72f2e18 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -60,6 +60,39 @@ static int xen_blkif_reqs = 64;
> module_param_named(reqs, xen_blkif_reqs, int, 0);
> MODULE_PARM_DESC(reqs, "Number of blkback requests to allocate");
>
> +/* Order of maximum shared ring size advertised to the front end. */
> +int xen_blkif_max_ring_order = XENBUS_MAX_RING_ORDER;
> +
> +#define BLK_RING_SIZE(_order) __CONST_RING_SIZE(blkif, PAGE_SIZE << (_order))
> +
> +static int set_max_ring_order(const char *buf, struct kernel_param *kp)
> +{
> + int err;
> + unsigned long order;
> +
> + err = kstrtol(buf, 0, &order);
> + if (err ||
> + order < 0 ||
> + order > XENBUS_MAX_RING_ORDER)
> + return -EINVAL;
> +
> + if (xen_blkif_reqs < BLK_RING_SIZE(order))
> + printk(KERN_WARNING "WARNING: "
> + "I/O request space (%d reqs) < ring order %ld, "
> + "consider increasing %s.reqs to >= %ld.",
> + xen_blkif_reqs, order, KBUILD_MODNAME,
> + roundup_pow_of_two(BLK_RING_SIZE(order)));
> +
> + xen_blkif_max_ring_order = order;
> +
> + return 0;
> +}
> +
> +module_param_call(max_ring_order,
> + set_max_ring_order, param_get_int,
> + &xen_blkif_max_ring_order, 0644);
> +MODULE_PARM_DESC(max_ring_order, "log2 of maximum ring size, in pages.");
> +
> /* Run-time switchable: /sys/module/blkback/parameters/ */
> static unsigned int log_stats;
> module_param(log_stats, int, 0644);
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index d0ee7ed..5f33a1a 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -126,6 +126,8 @@ struct blkif_x86_64_response {
> int16_t status; /* BLKIF_RSP_??? */
> };
>
> +extern int xen_blkif_max_ring_order;
> +
> DEFINE_RING_TYPES(blkif_common, struct blkif_common_request,
> struct blkif_common_response);
> DEFINE_RING_TYPES(blkif_x86_32, struct blkif_x86_32_request,
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index 24a2fb5..7a9d71d 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -122,8 +122,8 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
> return blkif;
> }
>
> -static int xen_blkif_map(struct xen_blkif *blkif, unsigned long shared_page,
> - unsigned int evtchn)
> +static int xen_blkif_map(struct xen_blkif *blkif, int ring_ref[],
> + unsigned int ring_order, unsigned int evtchn)
> {
> int err;
>
> @@ -131,7 +131,8 @@ static int xen_blkif_map(struct xen_blkif *blkif, unsigned long shared_page,
> if (blkif->irq)
> return 0;
>
> - err = xenbus_map_ring_valloc(blkif->be->dev, shared_page, &blkif->blk_ring);
> + err = xenbus_map_ring_valloc(blkif->be->dev, ring_ref, 1 << ring_order,
> + &blkif->blk_ring);
> if (err < 0)
> return err;
>
> @@ -140,21 +141,24 @@ static int xen_blkif_map(struct xen_blkif *blkif, unsigned long shared_page,
> {
> struct blkif_sring *sring;
> sring = (struct blkif_sring *)blkif->blk_ring;
> - BACK_RING_INIT(&blkif->blk_rings.native, sring, PAGE_SIZE);
> + BACK_RING_INIT(&blkif->blk_rings.native, sring,
> + PAGE_SIZE << ring_order);
> break;
> }
> case BLKIF_PROTOCOL_X86_32:
> {
> struct blkif_x86_32_sring *sring_x86_32;
> sring_x86_32 = (struct blkif_x86_32_sring *)blkif->blk_ring;
> - BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32, PAGE_SIZE);
> + BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32,
> + PAGE_SIZE << ring_order);
> break;
> }
> case BLKIF_PROTOCOL_X86_64:
> {
> struct blkif_x86_64_sring *sring_x86_64;
> sring_x86_64 = (struct blkif_x86_64_sring *)blkif->blk_ring;
> - BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64, PAGE_SIZE);
> + BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64,
> + PAGE_SIZE << ring_order);
> break;
> }
> default:
> @@ -497,6 +501,11 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
> if (err)
> goto fail;
>
> + err = xenbus_printf(XBT_NIL, dev->nodename, "max-ring-page-order",
> + "%u", xen_blkif_max_ring_order);
> + if (err)
> + goto fail;
> +
> err = xenbus_switch_state(dev, XenbusStateInitWait);
> if (err)
> goto fail;
> @@ -744,22 +753,80 @@ again:
> static int connect_ring(struct backend_info *be)
> {
> struct xenbus_device *dev = be->dev;
> - unsigned long ring_ref;
> + int ring_ref[XENBUS_MAX_RING_PAGES];
> + unsigned int ring_order;
> unsigned int evtchn;
> char protocol[64] = "";
> int err;
>
> DPRINTK("%s", dev->otherend);
>
> - err = xenbus_gather(XBT_NIL, dev->otherend, "ring-ref", "%lu",
> - &ring_ref, "event-channel", "%u", &evtchn, NULL);
> - if (err) {
> - xenbus_dev_fatal(dev, err,
> - "reading %s/ring-ref and event-channel",
> + err = xenbus_scanf(XBT_NIL, dev->otherend, "event-channel", "%u",
> + &evtchn);
> + if (err != 1) {
> + err = -EINVAL;
> +
> + xenbus_dev_fatal(dev, err, "reading %s/event-channel",
> dev->otherend);
> return err;
> }
>
> + printk(KERN_INFO "blkback: event-channel %u\n", evtchn);
> +
> + err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
> + &ring_order);
> + if (err != 1) {
> + DPRINTK("%s: using single page handshake", dev->otherend);
> +
> + ring_order = 0;
> +
> + err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-ref",
> + "%d", &ring_ref[0]);
> + if (err != 1) {
> + err = -EINVAL;
> +
> + xenbus_dev_fatal(dev, err, "reading %s/ring-ref",
> + dev->otherend);
> + return err;
> + }
> +
> + printk(KERN_INFO "blkback: ring-ref %d\n", ring_ref[0]);
> + } else {
> + unsigned int i;
> +
> + if (ring_order > xen_blkif_max_ring_order) {
> + err = -EINVAL;
> +
> + xenbus_dev_fatal(dev, err,
> + "%s/ring-page-order too big",
> + dev->otherend);
> + return err;
> + }
> +
> + for (i = 0; i < (1u << ring_order); i++) {
> + char ring_ref_name[10];
> +
> + snprintf(ring_ref_name, sizeof(ring_ref_name),
> + "ring-ref%u", i);
> +
> + err = xenbus_scanf(XBT_NIL, dev->otherend,
> + ring_ref_name, "%d",
> + &ring_ref[i]);
> + if (err != 1) {
> + err = -EINVAL;
> +
> + xenbus_dev_fatal(dev, err,
> + "reading %s/%s",
> + dev->otherend,
> + ring_ref_name);
> + return err;
> + }
> +
> + printk(KERN_INFO "blkback: ring-ref%u %d\n", i,
> + ring_ref[i]);
> + }
> + }
> +
> be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
> err = xenbus_gather(XBT_NIL, dev->otherend, "protocol",
> "%63s", protocol, NULL);
> @@ -775,14 +842,11 @@ static int connect_ring(struct backend_info *be)
> xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
> return -1;
> }
> - pr_info(DRV_PFX "ring-ref %ld, event-channel %d, protocol %d (%s)\n",
> - ring_ref, evtchn, be->blkif->blk_protocol, protocol);
>
> /* Map the shared frame, irq etc. */
> - err = xen_blkif_map(be->blkif, ring_ref, evtchn);
> + err = xen_blkif_map(be->blkif, ring_ref, ring_order, evtchn);
> if (err) {
> - xenbus_dev_fatal(dev, err, "mapping ring-ref %lu port %u",
> - ring_ref, evtchn);
> + xenbus_dev_fatal(dev, err, "mapping ring-refs and evtchn");
> return err;
> }
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 2f22874..485813a 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -57,6 +57,10 @@
>
> #include <asm/xen/hypervisor.h>
>
> +static int xen_blkif_ring_order;
> +module_param_named(reqs, xen_blkif_ring_order, int, 0);
> +MODULE_PARM_DESC(reqs, "log2 of requested ring size, in pages.");
> +
> enum blkif_state {
> BLKIF_STATE_DISCONNECTED,
> BLKIF_STATE_CONNECTED,
> @@ -72,7 +76,8 @@ struct blk_shadow {
> static DEFINE_MUTEX(blkfront_mutex);
> static const struct block_device_operations xlvbd_block_fops;
>
> -#define BLK_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE)
> +#define BLK_RING_SIZE(_order) __CONST_RING_SIZE(blkif, PAGE_SIZE << (_order))
> +#define BLK_MAX_RING_SIZE BLK_RING_SIZE(XENBUS_MAX_RING_ORDER)
>
> /*
> * We have one of these per vbd, whether ide, scsi or 'other'. They
> @@ -87,14 +92,15 @@ struct blkfront_info
> int vdevice;
> blkif_vdev_t handle;
> enum blkif_state connected;
> - int ring_ref;
> + int ring_ref[XENBUS_MAX_RING_PAGES];
> + int ring_order;
> struct blkif_front_ring ring;
> struct scatterlist sg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> unsigned int evtchn, irq;
> struct request_queue *rq;
> struct work_struct work;
> struct gnttab_free_callback callback;
> - struct blk_shadow shadow[BLK_RING_SIZE];
> + struct blk_shadow shadow[BLK_MAX_RING_SIZE];
> unsigned long shadow_free;
> unsigned int feature_flush;
> unsigned int flush_op;
> @@ -111,9 +117,7 @@ static unsigned int nr_minors;
> static unsigned long *minors;
> static DEFINE_SPINLOCK(minor_lock);
>
> -#define MAXIMUM_OUTSTANDING_BLOCK_REQS \
> - (BLKIF_MAX_SEGMENTS_PER_REQUEST * BLK_RING_SIZE)
> -#define GRANT_INVALID_REF 0
> +#define GRANT_INVALID_REF 0
>
> #define PARTS_PER_DISK 16
> #define PARTS_PER_EXT_DISK 256
> @@ -135,7 +139,7 @@ static DEFINE_SPINLOCK(minor_lock);
> static int get_id_from_freelist(struct blkfront_info *info)
> {
> unsigned long free = info->shadow_free;
> - BUG_ON(free >= BLK_RING_SIZE);
> + BUG_ON(free >= BLK_MAX_RING_SIZE);
> info->shadow_free = info->shadow[free].req.u.rw.id;
> info->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
> return free;
> @@ -683,6 +687,8 @@ static void blkif_restart_queue(struct work_struct *work)
>
> static void blkif_free(struct blkfront_info *info, int suspend)
> {
> + int i;
> +
> /* Prevent new requests being issued until we fix things up. */
> spin_lock_irq(&blkif_io_lock);
> info->connected = suspend ?
> @@ -698,16 +704,19 @@ static void blkif_free(struct blkfront_info *info, int suspend)
> flush_work_sync(&info->work);
>
> /* Free resources associated with old device channel. */
> - if (info->ring_ref != GRANT_INVALID_REF) {
> - gnttab_end_foreign_access(info->ring_ref, 0,
> - (unsigned long)info->ring.sring);
> - info->ring_ref = GRANT_INVALID_REF;
> - info->ring.sring = NULL;
> + for (i = 0; i < (1 << info->ring_order); i++) {
> + if (info->ring_ref[i] != GRANT_INVALID_REF) {
> + gnttab_end_foreign_access(info->ring_ref[i], 0, 0);
> + info->ring_ref[i] = GRANT_INVALID_REF;
> + }
> }
> +
> + free_pages((unsigned long)info->ring.sring, info->ring_order);
> + info->ring.sring = NULL;
> +
> if (info->irq)
> unbind_from_irqhandler(info->irq, info);
> info->evtchn = info->irq = 0;
> -
> }
>
> static void blkif_completion(struct blk_shadow *s)
> @@ -828,25 +837,24 @@ static int setup_blkring(struct xenbus_device *dev,
> struct blkif_sring *sring;
> int err;
>
> - info->ring_ref = GRANT_INVALID_REF;
> -
> - sring = (struct blkif_sring *)__get_free_page(GFP_NOIO | __GFP_HIGH);
> + sring = (struct blkif_sring *)__get_free_pages(GFP_NOIO | __GFP_HIGH,
> + info->ring_order);
> if (!sring) {
> xenbus_dev_fatal(dev, -ENOMEM, "allocating shared ring");
> return -ENOMEM;
> }
> SHARED_RING_INIT(sring);
> - FRONT_RING_INIT(&info->ring, sring, PAGE_SIZE);
> + FRONT_RING_INIT(&info->ring, sring, PAGE_SIZE << info->ring_order);
>
> sg_init_table(info->sg, BLKIF_MAX_SEGMENTS_PER_REQUEST);
>
> - err = xenbus_grant_ring(dev, virt_to_mfn(info->ring.sring));
> + err = xenbus_grant_ring(dev, info->ring.sring, 1 << info->ring_order,
> + info->ring_ref);
> if (err < 0) {
> - free_page((unsigned long)sring);
> + free_pages((unsigned long)sring, info->ring_order);
> info->ring.sring = NULL;
> goto fail;
> }
> - info->ring_ref = err;
>
> err = xenbus_alloc_evtchn(dev, &info->evtchn);
> if (err)
> @@ -875,8 +883,27 @@ static int talk_to_blkback(struct xenbus_device *dev,
> {
> const char *message = NULL;
> struct xenbus_transaction xbt;
> + unsigned int ring_order;
> + int legacy_backend;
> + int i;
> int err;
>
> + for (i = 0; i < (1 << info->ring_order); i++)
> + info->ring_ref[i] = GRANT_INVALID_REF;
> +
> + err = xenbus_scanf(XBT_NIL, dev->otherend, "max-ring-page-order", "%u",
> + &ring_order);
> +
> + legacy_backend = !(err == 1);
> +
> + if (legacy_backend) {
> + info->ring_order = 0;
> + } else {
> + info->ring_order = (ring_order <= xen_blkif_ring_order) ?
> + ring_order :
> + xen_blkif_ring_order;
> + }
> +
> /* Create shared ring, alloc event channel. */
> err = setup_blkring(dev, info);
> if (err)
> @@ -889,12 +916,35 @@ again:
> goto destroy_blkring;
> }
>
> - err = xenbus_printf(xbt, dev->nodename,
> - "ring-ref", "%u", info->ring_ref);
> - if (err) {
> - message = "writing ring-ref";
> - goto abort_transaction;
> + if (legacy_backend) {
> + err = xenbus_printf(xbt, dev->nodename,
> + "ring-ref", "%d", info->ring_ref[0]);
> + if (err) {
> + message = "writing ring-ref";
> + goto abort_transaction;
> + }
> + } else {
> + for (i = 0; i < (1 << info->ring_order); i++) {
> + char key[sizeof("ring-ref") + 2];
> +
> + sprintf(key, "ring-ref%d", i);
> +
> + err = xenbus_printf(xbt, dev->nodename,
> + key, "%d", info->ring_ref[i]);
> + if (err) {
> + message = "writing ring-ref";
> + goto abort_transaction;
> + }
> + }
> +
> + err = xenbus_printf(xbt, dev->nodename,
> + "ring-page-order", "%u", info->ring_order);
> + if (err) {
> + message = "writing ring-order";
> + goto abort_transaction;
> + }
> }
> +
> err = xenbus_printf(xbt, dev->nodename,
> "event-channel", "%u", info->evtchn);
> if (err) {
> @@ -996,21 +1046,14 @@ static int blkfront_probe(struct xenbus_device *dev,
> info->connected = BLKIF_STATE_DISCONNECTED;
> INIT_WORK(&info->work, blkif_restart_queue);
>
> - for (i = 0; i < BLK_RING_SIZE; i++)
> + for (i = 0; i < BLK_MAX_RING_SIZE; i++)
> info->shadow[i].req.u.rw.id = i+1;
> - info->shadow[BLK_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
> + info->shadow[BLK_MAX_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
>
> /* Front end dir is a number, which is used as the id. */
> info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
> dev_set_drvdata(&dev->dev, info);
>
> - err = talk_to_blkback(dev, info);
> - if (err) {
> - kfree(info);
> - dev_set_drvdata(&dev->dev, NULL);
> - return err;
> - }
> -
> return 0;
> }
>
> @@ -1031,13 +1074,13 @@ static int blkif_recover(struct blkfront_info *info)
>
> /* Stage 2: Set up free list. */
> memset(&info->shadow, 0, sizeof(info->shadow));
> - for (i = 0; i < BLK_RING_SIZE; i++)
> + for (i = 0; i < BLK_MAX_RING_SIZE; i++)
> info->shadow[i].req.u.rw.id = i+1;
> info->shadow_free = info->ring.req_prod_pvt;
> - info->shadow[BLK_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
> + info->shadow[BLK_MAX_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
>
> /* Stage 3: Find pending requests and requeue them. */
> - for (i = 0; i < BLK_RING_SIZE; i++) {
> + for (i = 0; i < BLK_RING_SIZE(info->ring_order); i++) {
> /* Not in use? */
> if (!copy[i].request)
> continue;
> @@ -1299,7 +1342,6 @@ static void blkback_changed(struct xenbus_device *dev,
>
> switch (backend_state) {
> case XenbusStateInitialising:
> - case XenbusStateInitWait:
> case XenbusStateInitialised:
> case XenbusStateReconfiguring:
> case XenbusStateReconfigured:
> @@ -1307,6 +1349,10 @@ static void blkback_changed(struct xenbus_device *dev,
> case XenbusStateClosed:
> break;
>
> + case XenbusStateInitWait:
> + talk_to_blkback(dev, info);
> + break;
> +
> case XenbusStateConnected:
> blkfront_connect(info);
> break;
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 94b79c3..f93b59a 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -130,8 +130,8 @@ int xen_netbk_must_stop_queue(struct xenvif *vif);
> /* (Un)Map communication rings. */
> void xen_netbk_unmap_frontend_rings(struct xenvif *vif);
> int xen_netbk_map_frontend_rings(struct xenvif *vif,
> - grant_ref_t tx_ring_ref,
> - grant_ref_t rx_ring_ref);
> + int tx_ring_ref,
> + int rx_ring_ref);
>
> /* (De)Register a xenvif with the netback backend. */
> void xen_netbk_add_xenvif(struct xenvif *vif);
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 59effac..0b014cf 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1594,8 +1594,8 @@ void xen_netbk_unmap_frontend_rings(struct xenvif *vif)
> }
>
> int xen_netbk_map_frontend_rings(struct xenvif *vif,
> - grant_ref_t tx_ring_ref,
> - grant_ref_t rx_ring_ref)
> + int tx_ring_ref,
> + int rx_ring_ref)
> {
> void *addr;
> struct xen_netif_tx_sring *txs;
> @@ -1604,7 +1604,7 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
> int err = -ENOMEM;
>
> err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
> - tx_ring_ref, &addr);
> + &tx_ring_ref, 1, &addr);
> if (err)
> goto err;
>
> @@ -1612,7 +1612,7 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
> BACK_RING_INIT(&vif->tx, txs, PAGE_SIZE);
>
> err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
> - rx_ring_ref, &addr);
> + &rx_ring_ref, 1, &addr);
> if (err)
> goto err;
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 698b905..521a595 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1496,13 +1496,12 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
> SHARED_RING_INIT(txs);
> FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE);
>
> - err = xenbus_grant_ring(dev, virt_to_mfn(txs));
> + err = xenbus_grant_ring(dev, txs, 1, &info->tx_ring_ref);
> if (err < 0) {
> free_page((unsigned long)txs);
> goto fail;
> }
>
> - info->tx_ring_ref = err;
> rxs = (struct xen_netif_rx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
> if (!rxs) {
> err = -ENOMEM;
> @@ -1512,12 +1511,11 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
> SHARED_RING_INIT(rxs);
> FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE);
>
> - err = xenbus_grant_ring(dev, virt_to_mfn(rxs));
> + err = xenbus_grant_ring(dev, rxs, 1, &info->rx_ring_ref);
> if (err < 0) {
> free_page((unsigned long)rxs);
> goto fail;
> }
> - info->rx_ring_ref = err;
>
> err = xenbus_alloc_evtchn(dev, &info->evtchn);
> if (err)
> diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
> index 1620088..95109d8 100644
> --- a/drivers/pci/xen-pcifront.c
> +++ b/drivers/pci/xen-pcifront.c
> @@ -768,12 +768,10 @@ static int pcifront_publish_info(struct pcifront_device *pdev)
> int err = 0;
> struct xenbus_transaction trans;
>
> - err = xenbus_grant_ring(pdev->xdev, virt_to_mfn(pdev->sh_info));
> + err = xenbus_grant_ring(pdev->xdev, pdev->sh_info, 1, &pdev->gnt_ref);
> if (err < 0)
> goto out;
>
> - pdev->gnt_ref = err;
> -
> err = xenbus_alloc_evtchn(pdev->xdev, &pdev->evtchn);
> if (err)
> goto out;
> diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
> index 64b11f9..e0834cd 100644
> --- a/drivers/xen/xen-pciback/xenbus.c
> +++ b/drivers/xen/xen-pciback/xenbus.c
> @@ -108,7 +108,7 @@ static int xen_pcibk_do_attach(struct xen_pcibk_device *pdev, int gnt_ref,
> "Attaching to frontend resources - gnt_ref=%d evtchn=%d\n",
> gnt_ref, remote_evtchn);
>
> - err = xenbus_map_ring_valloc(pdev->xdev, gnt_ref, &vaddr);
> + err = xenbus_map_ring_valloc(pdev->xdev, &gnt_ref, 1, &vaddr);
> if (err < 0) {
> xenbus_dev_fatal(pdev->xdev, err,
> "Error mapping other domain page in ours.");
> diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> index 566d2ad..3a14524 100644
> --- a/drivers/xen/xenbus/xenbus_client.c
> +++ b/drivers/xen/xenbus/xenbus_client.c
> @@ -53,14 +53,16 @@ struct xenbus_map_node {
> struct vm_struct *area; /* PV */
> struct page *page; /* HVM */
> };
> - grant_handle_t handle;
> + grant_handle_t handle[XENBUS_MAX_RING_PAGES];
> + unsigned int nr_handles;
> };
>
> static DEFINE_SPINLOCK(xenbus_valloc_lock);
> static LIST_HEAD(xenbus_valloc_pages);
>
> struct xenbus_ring_ops {
> - int (*map)(struct xenbus_device *dev, int gnt, void **vaddr);
> + int (*map)(struct xenbus_device *dev, int gnt[], int nr_gnts,
> + void **vaddr);
> int (*unmap)(struct xenbus_device *dev, void *vaddr);
> };
>
> @@ -356,17 +358,38 @@ static void xenbus_switch_fatal(struct xenbus_device *dev, int depth, int err,
> /**
> * xenbus_grant_ring
> * @dev: xenbus device
> - * @ring_mfn: mfn of ring to grant
> -
> - * Grant access to the given @ring_mfn to the peer of the given device. Return
> - * 0 on success, or -errno on error. On error, the device will switch to
> - * XenbusStateClosing, and the error will be saved in the store.
> + * @vaddr: starting virtual address of the ring
> + * @nr_pages: number of page to be granted
> + * @grefs: grant reference array to be filled in
> + * Grant access to the given @vaddr to the peer of the given device.
> + * Then fill in @grefs with grant references. Return 0 on success, or
> + * -errno on error. On error, the device will switch to
> + * XenbusStateClosing, and the first error will be saved in the store.
> */
> -int xenbus_grant_ring(struct xenbus_device *dev, unsigned long ring_mfn)
> +int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
> + int nr_pages, int grefs[])
> {
> - int err = gnttab_grant_foreign_access(dev->otherend_id, ring_mfn, 0);
> - if (err < 0)
> - xenbus_dev_fatal(dev, err, "granting access to ring page");
> + int i;
> + int err;
> +
> + for (i = 0; i < nr_pages; i++) {
> + unsigned long addr = (unsigned long)vaddr +
> + (PAGE_SIZE * i);
> + err = gnttab_grant_foreign_access(dev->otherend_id,
> + virt_to_mfn(addr), 0);
> + if (err < 0) {
> + xenbus_dev_fatal(dev, err,
> + "granting access to ring page");
> + goto fail;
> + }
> + grefs[i] = err;
> + }
> +
> + return 0;
> +
> +fail:
> + for ( ; i >= 0; i--)
> + gnttab_end_foreign_access_ref(grefs[i], 0);
> return err;
> }
> EXPORT_SYMBOL_GPL(xenbus_grant_ring);
> @@ -447,7 +470,8 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
> /**
> * xenbus_map_ring_valloc
> * @dev: xenbus device
> - * @gnt_ref: grant reference
> + * @gnt_ref: grant reference array
> + * @nr_grefs: number of grant reference
> * @vaddr: pointer to address to be filled out by mapping
> *
> * Based on Rusty Russell's skeleton driver's map_page.
> @@ -458,23 +482,28 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
> * or -ENOMEM on error. If an error is returned, device will switch to
> * XenbusStateClosing and the error message will be saved in XenStore.
> */
> -int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
> +int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref[],
> + int nr_grefs, void **vaddr)
> {
> - return ring_ops->map(dev, gnt_ref, vaddr);
> + return ring_ops->map(dev, gnt_ref, nr_grefs, vaddr);
> }
> EXPORT_SYMBOL_GPL(xenbus_map_ring_valloc);
>
> +static int __xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev,
> + struct xenbus_map_node *node);
> +
> static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
> - int gnt_ref, void **vaddr)
> + int gnt_ref[], int nr_grefs, void **vaddr)
> {
> - struct gnttab_map_grant_ref op = {
> - .flags = GNTMAP_host_map | GNTMAP_contains_pte,
> - .ref = gnt_ref,
> - .dom = dev->otherend_id,
> - };
> + struct gnttab_map_grant_ref op[XENBUS_MAX_RING_PAGES];
> struct xenbus_map_node *node;
> struct vm_struct *area;
> - pte_t *pte;
> + pte_t *pte[XENBUS_MAX_RING_PAGES];
> + int i;
> + int err = 0;
> +
> + if (nr_grefs > XENBUS_MAX_RING_PAGES)
> + return -EINVAL;
>
> *vaddr = NULL;
>
> @@ -482,28 +511,44 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
> if (!node)
> return -ENOMEM;
>
> - area = alloc_vm_area(PAGE_SIZE, &pte);
> + area = alloc_vm_area(PAGE_SIZE * nr_grefs, pte);
> if (!area) {
> kfree(node);
> return -ENOMEM;
> }
>
> - op.host_addr = arbitrary_virt_to_machine(pte).maddr;
> + for (i = 0; i < nr_grefs; i++) {
> + op[i].flags = GNTMAP_host_map | GNTMAP_contains_pte,
> + op[i].ref = gnt_ref[i],
> + op[i].dom = dev->otherend_id,
> + op[i].host_addr = arbitrary_virt_to_machine(pte[i]).maddr;
> + };
>
> if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
> BUG();
>
> - if (op.status != GNTST_okay) {
> - free_vm_area(area);
> - kfree(node);
> - xenbus_dev_fatal(dev, op.status,
> - "mapping in shared page %d from domain %d",
> - gnt_ref, dev->otherend_id);
> - return op.status;
> + node->nr_handles = nr_grefs;
> + node->area = area;
> +
> + for (i = 0; i < nr_grefs; i++) {
> + if (op[i].status != GNTST_okay) {
> + err = op[i].status;
> + node->handle[i] = INVALID_GRANT_HANDLE;
> + continue;
> + }
> + node->handle[i] = op[i].handle;
> }
>
> - node->handle = op.handle;
> - node->area = area;
> + if (err != 0) {
> + for (i = 0; i < nr_grefs; i++)
> + xenbus_dev_fatal(dev, op[i].status,
> + "mapping in shared page %d from domain %d",
> + gnt_ref[i], dev->otherend_id);
> +
> + __xenbus_unmap_ring_vfree_pv(dev, node);
> +
> + return err;
> + }
>
> spin_lock(&xenbus_valloc_lock);
> list_add(&node->next, &xenbus_valloc_pages);
> @@ -514,25 +559,29 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
> }
>
> static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
> - int gnt_ref, void **vaddr)
> + int gnt_ref[], int nr_grefs, void **vaddr)
> {
> struct xenbus_map_node *node;
> int err;
> void *addr;
>
> + if (nr_grefs > XENBUS_MAX_RING_PAGES)
> + return -EINVAL;
> +
> *vaddr = NULL;
>
> node = kzalloc(sizeof(*node), GFP_KERNEL);
> if (!node)
> return -ENOMEM;
>
> - err = alloc_xenballooned_pages(1, &node->page, false /* lowmem */);
> + err = alloc_xenballooned_pages(nr_grefs, &node->page,
> + false /* lowmem */);
> if (err)
> goto out_err;
>
> addr = pfn_to_kaddr(page_to_pfn(node->page));
>
> - err = xenbus_map_ring(dev, gnt_ref, &node->handle, addr);
> + err = xenbus_map_ring(dev, gnt_ref, nr_grefs, node->handle, addr);
> if (err)
> goto out_err;
>
> @@ -544,7 +593,7 @@ static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
> return 0;
>
> out_err:
> - free_xenballooned_pages(1, &node->page);
> + free_xenballooned_pages(nr_grefs, &node->page);
> kfree(node);
> return err;
> }
> @@ -553,36 +602,51 @@ static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
> /**
> * xenbus_map_ring
> * @dev: xenbus device
> - * @gnt_ref: grant reference
> - * @handle: pointer to grant handle to be filled
> + * @gnt_ref: grant reference array
> + * @nr_grefs: number of grant references
> + * @handle: pointer to grant handle array to be filled, mind the size
> * @vaddr: address to be mapped to
> *
> - * Map a page of memory into this domain from another domain's grant table.
> + * Map pages of memory into this domain from another domain's grant table.
> * xenbus_map_ring does not allocate the virtual address space (you must do
> - * this yourself!). It only maps in the page to the specified address.
> + * this yourself!). It only maps in the pages to the specified address.
> * Returns 0 on success, and GNTST_* (see xen/include/interface/grant_table.h)
> * or -ENOMEM on error. If an error is returned, device will switch to
> - * XenbusStateClosing and the error message will be saved in XenStore.
> + * XenbusStateClosing and the last error message will be saved in XenStore.
> */
> -int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
> - grant_handle_t *handle, void *vaddr)
> +int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref[], int nr_grefs,
> + grant_handle_t handle[], void *vaddr)
> {
> - struct gnttab_map_grant_ref op;
> -
> - gnttab_set_map_op(&op, (phys_addr_t)vaddr, GNTMAP_host_map, gnt_ref,
> - dev->otherend_id);
> + struct gnttab_map_grant_ref op[XENBUS_MAX_RING_PAGES];
> + int i;
> + int err = GNTST_okay; /* 0 */
> +
> + for (i = 0; i < nr_grefs; i++) {
> + unsigned long addr = (unsigned long)vaddr +
> + (PAGE_SIZE * i);
> + gnttab_set_map_op(&op[i], (phys_addr_t)addr,
> + GNTMAP_host_map, gnt_ref[i],
> + dev->otherend_id);
> + }
>
> - if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
> + if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, op, nr_grefs))
> BUG();
>
> - if (op.status != GNTST_okay) {
> - xenbus_dev_fatal(dev, op.status,
> - "mapping in shared page %d from domain %d",
> - gnt_ref, dev->otherend_id);
> - } else
> - *handle = op.handle;
> + for (i = 0; i < nr_grefs; i++) {
> + if (op[i].status != GNTST_okay) {
> + err = op[i].status;
> + xenbus_dev_fatal(dev, err,
> + "mapping in shared page %d from domain %d",
> + gnt_ref[i], dev->otherend_id);
> + handle[i] = INVALID_GRANT_HANDLE;
> + } else
> + handle[i] = op[i].handle;
> + }
>
> - return op.status;
> + if (err != GNTST_okay)
> + xenbus_unmap_ring(dev, handle, nr_grefs, vaddr);
> +
> + return err;
> }
> EXPORT_SYMBOL_GPL(xenbus_map_ring);
>
> @@ -605,13 +669,53 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
> }
> EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree);
>
> +static int __xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev,
> + struct xenbus_map_node *node)
> +{
> + struct gnttab_unmap_grant_ref op[XENBUS_MAX_RING_PAGES];
> + unsigned int level;
> + int i, j;
> + int err = GNTST_okay;
> +
> + j = 0;
> + for (i = 0; i < node->nr_handles; i++) {
> + unsigned long vaddr = (unsigned long)node->area->addr +
> + (PAGE_SIZE * i);
> + if (node->handle[i] != INVALID_GRANT_HANDLE) {
> + memset(&op[j], 0, sizeof(op[0]));
> + op[j].host_addr = arbitrary_virt_to_machine(
> + lookup_address(vaddr, &level)).maddr;
> + op[j].handle = node->handle[i];
> + j++;
> + node->handle[i] = INVALID_GRANT_HANDLE;
> + }
> + }
> +
> + if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, op, j))
> + BUG();
> +
> + node->nr_handles = 0;
> +
> + for (i = 0; i < j; i++) {
> + if (op[i].status != GNTST_okay) {
> + err = op[i].status;
> + xenbus_dev_error(dev, err,
> + "unmapping page %d at handle %d error %d",
> + i, op[i].handle, err);
> + }
> + }
> +
> + if (err == GNTST_okay)
> + free_vm_area(node->area);
> +
> + kfree(node);
> +
> + return err;
> +}
> +
> static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
> {
> struct xenbus_map_node *node;
> - struct gnttab_unmap_grant_ref op = {
> - .host_addr = (unsigned long)vaddr,
> - };
> - unsigned int level;
>
> spin_lock(&xenbus_valloc_lock);
> list_for_each_entry(node, &xenbus_valloc_pages, next) {
> @@ -626,33 +730,18 @@ static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
>
> if (!node) {
> xenbus_dev_error(dev, -ENOENT,
> - "can't find mapped virtual address %p", vaddr);
> + "can't find mapped virtual address %p", vaddr);
> return GNTST_bad_virt_addr;
> }
>
> - op.handle = node->handle;
> - op.host_addr = arbitrary_virt_to_machine(
> - lookup_address((unsigned long)vaddr, &level)).maddr;
> -
> - if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
> - BUG();
> -
> - if (op.status == GNTST_okay)
> - free_vm_area(node->area);
> - else
> - xenbus_dev_error(dev, op.status,
> - "unmapping page at handle %d error %d",
> - node->handle, op.status);
> -
> - kfree(node);
> - return op.status;
> + return __xenbus_unmap_ring_vfree_pv(dev, node);
> }
>
> static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
> {
> int rv;
> struct xenbus_map_node *node;
> - void *addr;
> + void *addr = NULL;
>
> spin_lock(&xenbus_valloc_lock);
> list_for_each_entry(node, &xenbus_valloc_pages, next) {
> @@ -668,14 +757,14 @@ static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
>
> if (!node) {
> xenbus_dev_error(dev, -ENOENT,
> - "can't find mapped virtual address %p", vaddr);
> + "can't find mapped virtual address %p", vaddr);
> return GNTST_bad_virt_addr;
> }
>
> - rv = xenbus_unmap_ring(dev, node->handle, addr);
> + rv = xenbus_unmap_ring(dev, node->handle, node->nr_handles, addr);
>
> if (!rv)
> - free_xenballooned_pages(1, &node->page);
> + free_xenballooned_pages(node->nr_handles, &node->page);
> else
> WARN(1, "Leaking %p\n", vaddr);
>
> @@ -687,6 +776,7 @@ static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
> * xenbus_unmap_ring
> * @dev: xenbus device
> * @handle: grant handle
> + * @nr_handles: number of grant handle
> * @vaddr: addr to unmap
> *
> * Unmap a page of memory in this domain that was imported from another domain.
> @@ -694,21 +784,37 @@ static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
> * (see xen/include/interface/grant_table.h).
> */
> int xenbus_unmap_ring(struct xenbus_device *dev,
> - grant_handle_t handle, void *vaddr)
> + grant_handle_t handle[], int nr_handles,
> + void *vaddr)
> {
> - struct gnttab_unmap_grant_ref op;
> -
> - gnttab_set_unmap_op(&op, (phys_addr_t)vaddr, GNTMAP_host_map, handle);
> + struct gnttab_unmap_grant_ref op[XENBUS_MAX_RING_PAGES];
> + int i, j;
> + int err = GNTST_okay;
> +
> + j = 0;
> + for (i = 0; i < nr_handles; i++) {
> + unsigned long addr = (unsigned long)vaddr +
> + (PAGE_SIZE * i);
> + if (handle[i] != INVALID_GRANT_HANDLE) {
> + gnttab_set_unmap_op(&op[j++], (phys_addr_t)addr,
> + GNTMAP_host_map, handle[i]);
> + handle[i] = INVALID_GRANT_HANDLE;
> + }
> + }
>
> - if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
> + if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, op, j))
> BUG();
>
> - if (op.status != GNTST_okay)
> - xenbus_dev_error(dev, op.status,
> - "unmapping page at handle %d error %d",
> - handle, op.status);
> + for (i = 0; i < j; i++) {
> + if (op[i].status != GNTST_okay) {
> + err = op[i].status;
> + xenbus_dev_error(dev, err,
> + "unmapping page at handle %d error %d",
> + handle[i], err);
> + }
> + }
>
> - return op.status;
> + return err;
> }
> EXPORT_SYMBOL_GPL(xenbus_unmap_ring);
>
> diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
> index 3864967..62b92d2 100644
> --- a/drivers/xen/xenbus/xenbus_probe.c
> +++ b/drivers/xen/xenbus/xenbus_probe.c
> @@ -718,6 +718,7 @@ static int __init xenstored_local_init(void)
> return err;
> }
>
> +extern void xenbus_ring_ops_init(void);
> static int __init xenbus_init(void)
> {
> int err = 0;
> @@ -767,6 +768,8 @@ static int __init xenbus_init(void)
> proc_mkdir("xen", NULL);
> #endif
>
> + xenbus_ring_ops_init();
> +
> out_error:
> return err;
> }
> diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
> index e8c599b..cdbd948 100644
> --- a/include/xen/xenbus.h
> +++ b/include/xen/xenbus.h
> @@ -195,15 +195,23 @@ int xenbus_watch_pathfmt(struct xenbus_device *dev, struct xenbus_watch *watch,
> const char *pathfmt, ...);
>
> int xenbus_switch_state(struct xenbus_device *dev, enum xenbus_state new_state);
> -int xenbus_grant_ring(struct xenbus_device *dev, unsigned long ring_mfn);
> -int xenbus_map_ring_valloc(struct xenbus_device *dev,
> - int gnt_ref, void **vaddr);
> -int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
> - grant_handle_t *handle, void *vaddr);
> +
> +#define XENBUS_MAX_RING_ORDER 2
> +#define XENBUS_MAX_RING_PAGES (1 << XENBUS_MAX_RING_ORDER)
> +
> +#define INVALID_GRANT_HANDLE (~0U)
> +
> +int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
> + int nr_pages, int grefs[]);
> +int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref[],
> + int nr_grefs, void **vaddr);
> +int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref[], int nr_grefs,
> + grant_handle_t handle[], void *vaddr);
>
> int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr);
> int xenbus_unmap_ring(struct xenbus_device *dev,
> - grant_handle_t handle, void *vaddr);
> + grant_handle_t handle[], int nr_handles,
> + void *vaddr);
>
> int xenbus_alloc_evtchn(struct xenbus_device *dev, int *port);
> int xenbus_bind_evtchn(struct xenbus_device *dev, int remote_port, int *port);
>
* Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices
From: Wei Liu @ 2012-03-06 11:16 UTC (permalink / raw)
To: Santosh Jodh
Cc: jeremy@goop.org, wei.liu2, Ian Campbell, konrad.wilk@oracle.com,
waldi@debian.org, weiyi.huang@gmail.com, joe.jin@oracle.com,
linux-kernel@vger.kernel.org, jbeulich@novell.com,
virtualization@lists.linux-foundation.org,
paul.gortmaker@windriver.com, linux-pci@vger.kernel.org,
Paul Durrant, jbarnes@virtuousgeek.org, netdev@vger.kernel.org,
dgdegra@tycho.nsa.gov, xen-devel@lists.xen.org, lersek@redhat.com
In-Reply-To: <7914B38A4445B34AA16EB9F1352942F1010A1FA12364@SJCPMAILBOX01.citrite.net>
On Mon, 2012-03-05 at 21:49 +0000, Santosh Jodh wrote:
> From: Santosh Jodh <santosh.jodh@citrix.com>
>
> Add support for multi page ring for block devices.
> The number of pages is configurable for blkback via module parameter.
> blkback reports max-ring-page-order to blkfront via xenstore.
> blkfront reports its supported ring-page-order to blkback via xenstore.
> blkfront reports multi page ring references via ring-refNN in xenstore.
> The change allows newer blkfront to work with older blkback and
> vice-versa.
> Based on original patch by Paul Durrant.
>
> Signed-off-by: Santosh Jodh <santosh.jodh@citrix.com>
Doesn't the xenbus interface change deserve a separate patch (as a
prerequisite for the block device change)? Or, at the very least, please
mention the change in the commit message.
Wei.
* Re: [PATCH 0001/001] xen: multi page ring support for block devices
From: Jan Beulich @ 2012-03-06 8:34 UTC (permalink / raw)
To: Santosh Jodh
Cc: jeremy@goop.org, Ian Campbell, netdev@vger.kernel.org,
konrad.wilk@oracle.com, waldi@debian.org, joe.jin@oracle.com,
weiyi.huang@gmail.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org,
paul.gortmaker@windriver.com, Paul Durrant, David Vrabel,
linux-pci@vger.kernel.org, akpm@linux-foundation.org,
xen-devel@lists.xen.org, lersek@redhat.com, dgdegra
In-Reply-To: <7914B38A4445B34AA16EB9F1352942F1010A1FA12364@SJCPMAILBOX01.citrite.net>
>>> On 05.03.12 at 22:49, Santosh Jodh <Santosh.Jodh@citrix.com> wrote:
Could this be split up into 3 patches, for easier reviewing:
- one adjusting the xenbus interface to allow for multiple ring pages (and
maybe even that one should be split into the backend and frontend
related parts), syncing with the similar netback effort?
- one for the blkback changes
- one for the blkfront changes?
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -122,8 +122,8 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
> return blkif;
> }
>
> -static int xen_blkif_map(struct xen_blkif *blkif, unsigned long shared_page,
> - unsigned int evtchn)
> +static int xen_blkif_map(struct xen_blkif *blkif, int ring_ref[],
As you need to touch this anyway, can you please switch this to the
proper type (grant_ref_t) rather than using plain "int" (not just here)?
> + unsigned int ring_order, unsigned int evtchn)
> {
> int err;
>
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -135,7 +139,7 @@ static DEFINE_SPINLOCK(minor_lock);
> static int get_id_from_freelist(struct blkfront_info *info)
> {
> unsigned long free = info->shadow_free;
> - BUG_ON(free >= BLK_RING_SIZE);
> + BUG_ON(free >= BLK_MAX_RING_SIZE);
Wouldn't it be better to check against the actual limit here?
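As a userspace sketch of the point above (not driver code): the free-list head can be bounded by the number of entries in the ring that was actually negotiated, rather than by the compile-time maximum. The struct, the sizes, and the names shadow_list, ring_entries and get_free_id are hypothetical stand-ins for illustration; the driver itself would BUG_ON() instead of returning -1.

```c
#define MAX_RING_ENTRIES 128          /* hypothetical compile-time maximum */
#define FREELIST_END     0x0fffffffUL /* terminator value used in the patch */

/* Hypothetical stand-in for the shadow array and its free list. */
struct shadow_list {
    unsigned long next[MAX_RING_ENTRIES]; /* next free index, per slot */
    unsigned long free_head;              /* first free slot */
    unsigned long ring_entries;           /* entries in the negotiated ring */
};

/*
 * Pop one id from the free list. Returns -1 when the head lies outside
 * the negotiated ring, which also covers the terminator value.
 */
static long get_free_id(struct shadow_list *s)
{
    unsigned long id = s->free_head;

    if (id >= s->ring_entries)   /* actual limit, not MAX_RING_ENTRIES */
        return -1;
    s->free_head = s->next[id];
    return (long)id;
}
```

With the check written this way, an id that is valid for the largest possible ring but not for the ring in use is caught as well.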
> info->shadow_free = info->shadow[free].req.u.rw.id;
> info->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
> return free;
> @@ -698,16 +704,19 @@ static void blkif_free(struct blkfront_info *info, int suspend)
> flush_work_sync(&info->work);
>
> /* Free resources associated with old device channel. */
> - if (info->ring_ref != GRANT_INVALID_REF) {
> - gnttab_end_foreign_access(info->ring_ref, 0,
> - (unsigned long)info->ring.sring);
> - info->ring_ref = GRANT_INVALID_REF;
> - info->ring.sring = NULL;
> + for (i = 0; i < (1 << info->ring_order); i++) {
> + if (info->ring_ref[i] != GRANT_INVALID_REF) {
> + gnttab_end_foreign_access(info->ring_ref[i], 0, 0);
> + info->ring_ref[i] = GRANT_INVALID_REF;
> + }
> }
> +
> + free_pages((unsigned long)info->ring.sring, info->ring_order);
No. The freeing must continue to happen in gnttab_end_foreign_access()
(the sole exception being when a page was allocated but the grant
didn't get established), since it must be suppressed/delayed when the
grant is still in use (otherwise the kernel will die on the first re-use of
the page). I just happened to fix that problem at the end of last week
in the variant of the patch that we pulled into our tree.
Further, rather than doing a non-zero order allocation here, I'd
suggest allocating individual pages and vmap()-ing them.
> + info->ring.sring = NULL;
> +
> if (info->irq)
> unbind_from_irqhandler(info->irq, info);
> info->evtchn = info->irq = 0;
> -
> }
>
> static void blkif_completion(struct blk_shadow *s)
> @@ -875,8 +883,27 @@ static int talk_to_blkback(struct xenbus_device *dev,
> {
> const char *message = NULL;
> struct xenbus_transaction xbt;
> + unsigned int ring_order;
> + int legacy_backend;
> + int i;
> int err;
>
> + for (i = 0; i < (1 << info->ring_order); i++)
> + info->ring_ref[i] = GRANT_INVALID_REF;
> +
> + err = xenbus_scanf(XBT_NIL, dev->otherend, "max-ring-page-order", "%u",
> + &ring_order);
At least the frontend should imo also support the alternative interface
(using "max-ring-pages" etc).
> +
> + legacy_backend = !(err == 1);
> +
> + if (legacy_backend) {
> + info->ring_order = 0;
> + } else {
> + info->ring_order = (ring_order <= xen_blkif_ring_order) ?
> + ring_order :
> + xen_blkif_ring_order;
min()?
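For illustration only, the negotiation in the hunk above reduces to taking the smaller of the two orders, with 0 as the fallback for a backend that does not advertise max-ring-page-order. This is a plain C sketch; choose_ring_order and its parameters are hypothetical names, and in the driver the kernel's min() macro (or min_t() if the two types differ) would replace the open-coded comparison.

```c
/*
 * Hypothetical helper: pick the ring order both ends can support.
 * backend_advertised is non-zero when the backend published
 * "max-ring-page-order"; otherwise a single-page ring is used.
 */
static unsigned int choose_ring_order(int backend_advertised,
                                      unsigned int backend_max,
                                      unsigned int frontend_max)
{
    if (!backend_advertised)
        return 0;
    /* Equivalent to min(backend_max, frontend_max). */
    return backend_max < frontend_max ? backend_max : frontend_max;
}
```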
> + }
> +
> /* Create shared ring, alloc event channel. */
> err = setup_blkring(dev, info);
> if (err)
> @@ -889,12 +916,35 @@ again:
> goto destroy_blkring;
> }
>
> - err = xenbus_printf(xbt, dev->nodename,
> - "ring-ref", "%u", info->ring_ref);
> - if (err) {
> - message = "writing ring-ref";
> - goto abort_transaction;
> + if (legacy_backend) {
Why not use the simpler interface always when info->ring_order == 0?
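A rough userspace sketch of that idea, assuming the key names from the patch description ("ring-ref" for the single-page case, "ring-refN" otherwise): the choice of key depends only on the negotiated order, not on whether the backend is a legacy one. The helper name ring_ref_key is hypothetical, and the sketch covers only the key name, not the xenbus_printf() calls or the ring-page-order node.

```c
#include <stdio.h>

/*
 * Write the xenstore key for ring page idx into buf.
 * Order 0 always uses the plain "ring-ref" key; larger rings use
 * "ring-refN". Returns 0 on success, -1 if the key does not fit.
 */
static int ring_ref_key(char *buf, size_t len, unsigned int ring_order,
                        unsigned int idx)
{
    int n;

    if (ring_order == 0)
        n = snprintf(buf, len, "ring-ref");
    else
        n = snprintf(buf, len, "ring-ref%u", idx);

    /* snprintf() returns the length it wanted; detect truncation. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Using snprintf() with an explicit length also avoids relying on the key buffer being sized exactly for two digits.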
> + err = xenbus_printf(xbt, dev->nodename,
> + "ring-ref", "%d", info->ring_ref[0]);
> + if (err) {
> + message = "writing ring-ref";
> + goto abort_transaction;
> + }
> + } else {
> + for (i = 0; i < (1 << info->ring_order); i++) {
> + char key[sizeof("ring-ref") + 2];
> +
> + sprintf(key, "ring-ref%d", i);
> +
> + err = xenbus_printf(xbt, dev->nodename,
> + key, "%d", info->ring_ref[i]);
> + if (err) {
> + message = "writing ring-ref";
> + goto abort_transaction;
> + }
> + }
> +
> + err = xenbus_printf(xbt, dev->nodename,
> + "ring-page-order", "%u", info->ring_order);
> + if (err) {
> + message = "writing ring-order";
> + goto abort_transaction;
> + }
> }
> +
> err = xenbus_printf(xbt, dev->nodename,
> "event-channel", "%u", info->evtchn);
> if (err) {
> @@ -996,21 +1046,14 @@ static int blkfront_probe(struct xenbus_device *dev,
> info->connected = BLKIF_STATE_DISCONNECTED;
> INIT_WORK(&info->work, blkif_restart_queue);
>
> - for (i = 0; i < BLK_RING_SIZE; i++)
> + for (i = 0; i < BLK_MAX_RING_SIZE; i++)
> info->shadow[i].req.u.rw.id = i+1;
> - info->shadow[BLK_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
> + info->shadow[BLK_MAX_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
A proper terminator must also be written in talk_to_blkback() once
the actual ring size is known.
Further, blkif_recover() must be able to deal with a change of the
allowed upper bound.
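A sketch of what re-terminating the free list could look like once the real ring size is known, written as standalone C so it can be checked outside the kernel. The struct, the SHADOW_MAX bound and the function name are hypothetical; only the 0x0fffffff terminator value is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SHADOW_MAX   128           /* hypothetical compile-time upper bound */
#define FREELIST_END 0x0fffffffUL  /* terminator value used by the patch */

struct shadow_entry {
	unsigned long id;          /* index of the next free entry */
};

/*
 * Chain entries 0..ring_size-1 into a free list and terminate it at the
 * last entry actually in use, not at the last slot of the array.
 */
static void init_shadow_freelist(struct shadow_entry *shadow, size_t ring_size)
{
	size_t i;

	assert(ring_size >= 1 && ring_size <= SHADOW_MAX);

	for (i = 0; i < ring_size; i++)
		shadow[i].id = i + 1;
	shadow[ring_size - 1].id = FREELIST_END;
}
```

The same helper could be called again after a reconnect, when the negotiated ring size may differ from the previous one.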
> /* Front end dir is a number, which is used as the id. */
> info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
> dev_set_drvdata(&dev->dev, info);
>
> - err = talk_to_blkback(dev, info);
Completely removing this here is wrong afaict - what if the backend
already is in InitWait when the frontend starts?
Further, whatever is done to this call here also needs to be done in
blkfront_resume().
> - if (err) {
> - kfree(info);
> - dev_set_drvdata(&dev->dev, NULL);
> - return err;
> - }
> -
> return 0;
> }
>
> @@ -1307,6 +1349,10 @@ static void blkback_changed(struct xenbus_device *dev,
> case XenbusStateClosed:
> break;
>
> + case XenbusStateInitWait:
> + talk_to_blkback(dev, info);
This call can return an error.
> + break;
> +
> case XenbusStateConnected:
> blkfront_connect(info);
> break;
> --- a/include/xen/xenbus.h
> +++ b/include/xen/xenbus.h
> @@ -195,15 +195,23 @@ int xenbus_watch_pathfmt(struct xenbus_device *dev, struct xenbus_watch *watch,
> const char *pathfmt, ...);
>
> int xenbus_switch_state(struct xenbus_device *dev, enum xenbus_state new_state);
> -int xenbus_grant_ring(struct xenbus_device *dev, unsigned long ring_mfn);
> -int xenbus_map_ring_valloc(struct xenbus_device *dev,
> - int gnt_ref, void **vaddr);
> -int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
> - grant_handle_t *handle, void *vaddr);
> +
> +#define XENBUS_MAX_RING_ORDER 2
> +#define XENBUS_MAX_RING_PAGES (1 << XENBUS_MAX_RING_ORDER)
Why do you need an artificial global limit here? Each driver can decide
individually what its limit should be.
Jan
* RE: [PATCH 0001/001] xen: multi page ring support for block devices
From: Santosh Jodh @ 2012-03-06 6:21 UTC (permalink / raw)
To: Rusty Russell, konrad.wilk@oracle.com, jeremy@goop.org,
Ian Campbell, jbarnes@virtuousgeek.org, jbeulich@novell.com,
joe.jin@oracle.com, lersek@redhat.com, weiyi.huang@gmail.com,
dgdegra@tycho.nsa.gov, David Vrabel, paul.gortmaker@windriver.com,
akpm@linux-foundation.org, waldi@debian.org,
virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
linux-pci@vger.kernel.org
Cc: Paul Durrant
In-Reply-To: <87ty22xxee.fsf@rustcorp.com.au>
Great feedback. I removed the unsigned for the first point, changed the error code, and added the module parameter name to the printk.
Please see latest patch:
---
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 0088bf6..cc238e7 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -60,6 +60,40 @@ static int xen_blkif_reqs = 64;
module_param_named(reqs, xen_blkif_reqs, int, 0);
MODULE_PARM_DESC(reqs, "Number of blkback requests to allocate");
+/* Order of maximum shared ring size advertised to the front end. */
+int xen_blkif_max_ring_order = XENBUS_MAX_RING_ORDER;
+
+#define BLK_RING_SIZE(_order) __CONST_RING_SIZE(blkif, PAGE_SIZE << (_order))
+
+static int set_max_ring_order(const char *buf, struct kernel_param *kp)
+{
+ int err;
+ long order;
+
+ err = kstrtol(buf, 0, &order);
+ if (err ||
+ order < 0 ||
+ order > XENBUS_MAX_RING_ORDER)
+ return -ERANGE;
+
+ if (xen_blkif_reqs < BLK_RING_SIZE(order))
+ printk(KERN_WARNING "WARNING: "
+ "I/O request space (%d reqs) < ring order %ld "
+ "set by module parameter %s.max_ring_order, "
+ "consider increasing %s.reqs to >= %lu.\n",
+ xen_blkif_reqs, order, KBUILD_MODNAME, KBUILD_MODNAME,
+ roundup_pow_of_two(BLK_RING_SIZE(order)));
+
+ xen_blkif_max_ring_order = order;
+
+ return 0;
+}
+
+module_param_call(max_ring_order,
+ set_max_ring_order, param_get_int,
+ &xen_blkif_max_ring_order, 0644);
+MODULE_PARM_DESC(max_ring_order, "log2 of maximum ring size, in pages.");
+
/* Run-time switchable: /sys/module/blkback/parameters/ */
static unsigned int log_stats;
module_param(log_stats, int, 0644);
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index d0ee7ed..5f33a1a 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -126,6 +126,8 @@ struct blkif_x86_64_response {
int16_t status; /* BLKIF_RSP_??? */
};
+extern int xen_blkif_max_ring_order;
+
DEFINE_RING_TYPES(blkif_common, struct blkif_common_request,
struct blkif_common_response);
DEFINE_RING_TYPES(blkif_x86_32, struct blkif_x86_32_request,
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 24a2fb5..7a9d71d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -122,8 +122,8 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
return blkif;
}
-static int xen_blkif_map(struct xen_blkif *blkif, unsigned long shared_page,
- unsigned int evtchn)
+static int xen_blkif_map(struct xen_blkif *blkif, int ring_ref[],
+ unsigned int ring_order, unsigned int evtchn)
{
int err;
@@ -131,7 +131,8 @@ static int xen_blkif_map(struct xen_blkif *blkif, unsigned long shared_page,
if (blkif->irq)
return 0;
- err = xenbus_map_ring_valloc(blkif->be->dev, shared_page, &blkif->blk_ring);
+ err = xenbus_map_ring_valloc(blkif->be->dev, ring_ref, 1 << ring_order,
+ &blkif->blk_ring);
if (err < 0)
return err;
@@ -140,21 +141,24 @@ static int xen_blkif_map(struct xen_blkif *blkif, unsigned long shared_page,
{
struct blkif_sring *sring;
sring = (struct blkif_sring *)blkif->blk_ring;
- BACK_RING_INIT(&blkif->blk_rings.native, sring, PAGE_SIZE);
+ BACK_RING_INIT(&blkif->blk_rings.native, sring,
+ PAGE_SIZE << ring_order);
break;
}
case BLKIF_PROTOCOL_X86_32:
{
struct blkif_x86_32_sring *sring_x86_32;
sring_x86_32 = (struct blkif_x86_32_sring *)blkif->blk_ring;
- BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32, PAGE_SIZE);
+ BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32,
+ PAGE_SIZE << ring_order);
break;
}
case BLKIF_PROTOCOL_X86_64:
{
struct blkif_x86_64_sring *sring_x86_64;
sring_x86_64 = (struct blkif_x86_64_sring *)blkif->blk_ring;
- BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64, PAGE_SIZE);
+ BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64,
+ PAGE_SIZE << ring_order);
break;
}
default:
@@ -497,6 +501,11 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
if (err)
goto fail;
+ err = xenbus_printf(XBT_NIL, dev->nodename, "max-ring-page-order",
+ "%u", xen_blkif_max_ring_order);
+ if (err)
+ goto fail;
+
err = xenbus_switch_state(dev, XenbusStateInitWait);
if (err)
goto fail;
@@ -744,22 +753,80 @@ again:
static int connect_ring(struct backend_info *be)
{
struct xenbus_device *dev = be->dev;
- unsigned long ring_ref;
+ int ring_ref[XENBUS_MAX_RING_PAGES];
+ unsigned int ring_order;
unsigned int evtchn;
char protocol[64] = "";
int err;
DPRINTK("%s", dev->otherend);
- err = xenbus_gather(XBT_NIL, dev->otherend, "ring-ref", "%lu",
- &ring_ref, "event-channel", "%u", &evtchn, NULL);
- if (err) {
- xenbus_dev_fatal(dev, err,
- "reading %s/ring-ref and event-channel",
+ err = xenbus_scanf(XBT_NIL, dev->otherend, "event-channel", "%u",
+ &evtchn);
+ if (err != 1) {
+ err = -EINVAL;
+
+ xenbus_dev_fatal(dev, err, "reading %s/event-channel",
dev->otherend);
return err;
}
+ printk(KERN_INFO "blkback: event-channel %u\n", evtchn);
+
+ err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
+ &ring_order);
+ if (err != 1) {
+ DPRINTK("%s: using single page handshake", dev->otherend);
+
+ ring_order = 0;
+
+ err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-ref",
+ "%d", &ring_ref[0]);
+ if (err != 1) {
+ err = -EINVAL;
+
+ xenbus_dev_fatal(dev, err, "reading %s/ring-ref",
+ dev->otherend);
+ return err;
+ }
+
+ printk(KERN_INFO "blkback: ring-ref %d\n", ring_ref[0]);
+ } else {
+ unsigned int i;
+
+ if (ring_order > xen_blkif_max_ring_order) {
+ err = -EINVAL;
+
+ xenbus_dev_fatal(dev, err,
+ "%s/ring-page-order too big",
+ dev->otherend);
+ return err;
+ }
+
+ for (i = 0; i < (1u << ring_order); i++) {
+ char ring_ref_name[10];
+
+ snprintf(ring_ref_name, sizeof(ring_ref_name),
+ "ring-ref%u", i);
+
+ err = xenbus_scanf(XBT_NIL, dev->otherend,
+ ring_ref_name, "%d",
+ &ring_ref[i]);
+ if (err != 1) {
+ err = -EINVAL;
+
+ xenbus_dev_fatal(dev, err,
+ "reading %s/%s",
+ dev->otherend,
+ ring_ref_name);
+ return err;
+ }
+
+ printk(KERN_INFO "blkback: ring-ref%u %d\n", i,
+ ring_ref[i]);
+ }
+ }
+
be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
err = xenbus_gather(XBT_NIL, dev->otherend, "protocol",
"%63s", protocol, NULL);
@@ -775,14 +842,11 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -1;
}
- pr_info(DRV_PFX "ring-ref %ld, event-channel %d, protocol %d (%s)\n",
- ring_ref, evtchn, be->blkif->blk_protocol, protocol);
/* Map the shared frame, irq etc. */
- err = xen_blkif_map(be->blkif, ring_ref, evtchn);
+ err = xen_blkif_map(be->blkif, ring_ref, ring_order, evtchn);
if (err) {
- xenbus_dev_fatal(dev, err, "mapping ring-ref %lu port %u",
- ring_ref, evtchn);
+ xenbus_dev_fatal(dev, err, "mapping ring-refs and evtchn");
return err;
}
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 2f22874..485813a 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -57,6 +57,10 @@
#include <asm/xen/hypervisor.h>
+static int xen_blkif_ring_order;
+module_param_named(reqs, xen_blkif_ring_order, int, 0);
+MODULE_PARM_DESC(reqs, "log2 of requested ring size, in pages.");
+
enum blkif_state {
BLKIF_STATE_DISCONNECTED,
BLKIF_STATE_CONNECTED,
@@ -72,7 +76,8 @@ struct blk_shadow {
static DEFINE_MUTEX(blkfront_mutex);
static const struct block_device_operations xlvbd_block_fops;
-#define BLK_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE)
+#define BLK_RING_SIZE(_order) __CONST_RING_SIZE(blkif, PAGE_SIZE << (_order))
+#define BLK_MAX_RING_SIZE BLK_RING_SIZE(XENBUS_MAX_RING_ORDER)
/*
* We have one of these per vbd, whether ide, scsi or 'other'. They
@@ -87,14 +92,15 @@ struct blkfront_info
int vdevice;
blkif_vdev_t handle;
enum blkif_state connected;
- int ring_ref;
+ int ring_ref[XENBUS_MAX_RING_PAGES];
+ int ring_order;
struct blkif_front_ring ring;
struct scatterlist sg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
unsigned int evtchn, irq;
struct request_queue *rq;
struct work_struct work;
struct gnttab_free_callback callback;
- struct blk_shadow shadow[BLK_RING_SIZE];
+ struct blk_shadow shadow[BLK_MAX_RING_SIZE];
unsigned long shadow_free;
unsigned int feature_flush;
unsigned int flush_op;
@@ -111,9 +117,7 @@ static unsigned int nr_minors;
static unsigned long *minors;
static DEFINE_SPINLOCK(minor_lock);
-#define MAXIMUM_OUTSTANDING_BLOCK_REQS \
- (BLKIF_MAX_SEGMENTS_PER_REQUEST * BLK_RING_SIZE)
-#define GRANT_INVALID_REF 0
+#define GRANT_INVALID_REF 0
#define PARTS_PER_DISK 16
#define PARTS_PER_EXT_DISK 256
@@ -135,7 +139,7 @@ static DEFINE_SPINLOCK(minor_lock);
static int get_id_from_freelist(struct blkfront_info *info)
{
unsigned long free = info->shadow_free;
- BUG_ON(free >= BLK_RING_SIZE);
+ BUG_ON(free >= BLK_MAX_RING_SIZE);
info->shadow_free = info->shadow[free].req.u.rw.id;
info->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
return free;
@@ -683,6 +687,8 @@ static void blkif_restart_queue(struct work_struct *work)
static void blkif_free(struct blkfront_info *info, int suspend)
{
+ int i;
+
/* Prevent new requests being issued until we fix things up. */
spin_lock_irq(&blkif_io_lock);
info->connected = suspend ?
@@ -698,16 +704,19 @@ static void blkif_free(struct blkfront_info *info, int suspend)
flush_work_sync(&info->work);
/* Free resources associated with old device channel. */
- if (info->ring_ref != GRANT_INVALID_REF) {
- gnttab_end_foreign_access(info->ring_ref, 0,
- (unsigned long)info->ring.sring);
- info->ring_ref = GRANT_INVALID_REF;
- info->ring.sring = NULL;
+ for (i = 0; i < (1 << info->ring_order); i++) {
+ if (info->ring_ref[i] != GRANT_INVALID_REF) {
+ gnttab_end_foreign_access(info->ring_ref[i], 0, 0);
+ info->ring_ref[i] = GRANT_INVALID_REF;
+ }
}
+
+ free_pages((unsigned long)info->ring.sring, info->ring_order);
+ info->ring.sring = NULL;
+
if (info->irq)
unbind_from_irqhandler(info->irq, info);
info->evtchn = info->irq = 0;
-
}
static void blkif_completion(struct blk_shadow *s)
@@ -828,25 +837,24 @@ static int setup_blkring(struct xenbus_device *dev,
struct blkif_sring *sring;
int err;
- info->ring_ref = GRANT_INVALID_REF;
-
- sring = (struct blkif_sring *)__get_free_page(GFP_NOIO | __GFP_HIGH);
+ sring = (struct blkif_sring *)__get_free_pages(GFP_NOIO | __GFP_HIGH,
+ info->ring_order);
if (!sring) {
xenbus_dev_fatal(dev, -ENOMEM, "allocating shared ring");
return -ENOMEM;
}
SHARED_RING_INIT(sring);
- FRONT_RING_INIT(&info->ring, sring, PAGE_SIZE);
+ FRONT_RING_INIT(&info->ring, sring, PAGE_SIZE << info->ring_order);
sg_init_table(info->sg, BLKIF_MAX_SEGMENTS_PER_REQUEST);
- err = xenbus_grant_ring(dev, virt_to_mfn(info->ring.sring));
+ err = xenbus_grant_ring(dev, info->ring.sring, 1 << info->ring_order,
+ info->ring_ref);
if (err < 0) {
- free_page((unsigned long)sring);
+ free_pages((unsigned long)sring, info->ring_order);
info->ring.sring = NULL;
goto fail;
}
- info->ring_ref = err;
err = xenbus_alloc_evtchn(dev, &info->evtchn);
if (err)
@@ -875,8 +883,27 @@ static int talk_to_blkback(struct xenbus_device *dev,
{
const char *message = NULL;
struct xenbus_transaction xbt;
+ unsigned int ring_order;
+ int legacy_backend;
+ int i;
int err;
+ for (i = 0; i < (1 << info->ring_order); i++)
+ info->ring_ref[i] = GRANT_INVALID_REF;
+
+ err = xenbus_scanf(XBT_NIL, dev->otherend, "max-ring-page-order", "%u",
+ &ring_order);
+
+ legacy_backend = !(err == 1);
+
+ if (legacy_backend) {
+ info->ring_order = 0;
+ } else {
+ info->ring_order = (ring_order <= xen_blkif_ring_order) ?
+ ring_order :
+ xen_blkif_ring_order;
+ }
+
/* Create shared ring, alloc event channel. */
err = setup_blkring(dev, info);
if (err)
@@ -889,12 +916,35 @@ again:
goto destroy_blkring;
}
- err = xenbus_printf(xbt, dev->nodename,
- "ring-ref", "%u", info->ring_ref);
- if (err) {
- message = "writing ring-ref";
- goto abort_transaction;
+ if (legacy_backend) {
+ err = xenbus_printf(xbt, dev->nodename,
+ "ring-ref", "%d", info->ring_ref[0]);
+ if (err) {
+ message = "writing ring-ref";
+ goto abort_transaction;
+ }
+ } else {
+ for (i = 0; i < (1 << info->ring_order); i++) {
+ char key[sizeof("ring-ref") + 2];
+
+ sprintf(key, "ring-ref%d", i);
+
+ err = xenbus_printf(xbt, dev->nodename,
+ key, "%d", info->ring_ref[i]);
+ if (err) {
+ message = "writing ring-ref";
+ goto abort_transaction;
+ }
+ }
+
+ err = xenbus_printf(xbt, dev->nodename,
+ "ring-page-order", "%u", info->ring_order);
+ if (err) {
+ message = "writing ring-order";
+ goto abort_transaction;
+ }
}
+
err = xenbus_printf(xbt, dev->nodename,
"event-channel", "%u", info->evtchn);
if (err) {
@@ -996,21 +1046,14 @@ static int blkfront_probe(struct xenbus_device *dev,
info->connected = BLKIF_STATE_DISCONNECTED;
INIT_WORK(&info->work, blkif_restart_queue);
- for (i = 0; i < BLK_RING_SIZE; i++)
+ for (i = 0; i < BLK_MAX_RING_SIZE; i++)
info->shadow[i].req.u.rw.id = i+1;
- info->shadow[BLK_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
+ info->shadow[BLK_MAX_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
/* Front end dir is a number, which is used as the id. */
info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
dev_set_drvdata(&dev->dev, info);
- err = talk_to_blkback(dev, info);
- if (err) {
- kfree(info);
- dev_set_drvdata(&dev->dev, NULL);
- return err;
- }
-
return 0;
}
@@ -1031,13 +1074,13 @@ static int blkif_recover(struct blkfront_info *info)
/* Stage 2: Set up free list. */
memset(&info->shadow, 0, sizeof(info->shadow));
- for (i = 0; i < BLK_RING_SIZE; i++)
+ for (i = 0; i < BLK_MAX_RING_SIZE; i++)
info->shadow[i].req.u.rw.id = i+1;
info->shadow_free = info->ring.req_prod_pvt;
- info->shadow[BLK_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
+ info->shadow[BLK_MAX_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
/* Stage 3: Find pending requests and requeue them. */
- for (i = 0; i < BLK_RING_SIZE; i++) {
+ for (i = 0; i < BLK_RING_SIZE(info->ring_order); i++) {
/* Not in use? */
if (!copy[i].request)
continue;
@@ -1299,7 +1342,6 @@ static void blkback_changed(struct xenbus_device *dev,
switch (backend_state) {
case XenbusStateInitialising:
- case XenbusStateInitWait:
case XenbusStateInitialised:
case XenbusStateReconfiguring:
case XenbusStateReconfigured:
@@ -1307,6 +1349,10 @@ static void blkback_changed(struct xenbus_device *dev,
case XenbusStateClosed:
break;
+ case XenbusStateInitWait:
+ talk_to_blkback(dev, info);
+ break;
+
case XenbusStateConnected:
blkfront_connect(info);
break;
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 94b79c3..f93b59a 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -130,8 +130,8 @@ int xen_netbk_must_stop_queue(struct xenvif *vif);
/* (Un)Map communication rings. */
void xen_netbk_unmap_frontend_rings(struct xenvif *vif);
int xen_netbk_map_frontend_rings(struct xenvif *vif,
- grant_ref_t tx_ring_ref,
- grant_ref_t rx_ring_ref);
+ int tx_ring_ref,
+ int rx_ring_ref);
/* (De)Register a xenvif with the netback backend. */
void xen_netbk_add_xenvif(struct xenvif *vif);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 59effac..0b014cf 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1594,8 +1594,8 @@ void xen_netbk_unmap_frontend_rings(struct xenvif *vif)
}
int xen_netbk_map_frontend_rings(struct xenvif *vif,
- grant_ref_t tx_ring_ref,
- grant_ref_t rx_ring_ref)
+ int tx_ring_ref,
+ int rx_ring_ref)
{
void *addr;
struct xen_netif_tx_sring *txs;
@@ -1604,7 +1604,7 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
int err = -ENOMEM;
err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
- tx_ring_ref, &addr);
+ &tx_ring_ref, 1, &addr);
if (err)
goto err;
@@ -1612,7 +1612,7 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
BACK_RING_INIT(&vif->tx, txs, PAGE_SIZE);
err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
- rx_ring_ref, &addr);
+ &rx_ring_ref, 1, &addr);
if (err)
goto err;
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 698b905..521a595 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1496,13 +1496,12 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
SHARED_RING_INIT(txs);
FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE);
- err = xenbus_grant_ring(dev, virt_to_mfn(txs));
+ err = xenbus_grant_ring(dev, txs, 1, &info->tx_ring_ref);
if (err < 0) {
free_page((unsigned long)txs);
goto fail;
}
- info->tx_ring_ref = err;
rxs = (struct xen_netif_rx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
if (!rxs) {
err = -ENOMEM;
@@ -1512,12 +1511,11 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
SHARED_RING_INIT(rxs);
FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE);
- err = xenbus_grant_ring(dev, virt_to_mfn(rxs));
+ err = xenbus_grant_ring(dev, rxs, 1, &info->rx_ring_ref);
if (err < 0) {
free_page((unsigned long)rxs);
goto fail;
}
- info->rx_ring_ref = err;
err = xenbus_alloc_evtchn(dev, &info->evtchn);
if (err)
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index 1620088..95109d8 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -768,12 +768,10 @@ static int pcifront_publish_info(struct pcifront_device *pdev)
int err = 0;
struct xenbus_transaction trans;
- err = xenbus_grant_ring(pdev->xdev, virt_to_mfn(pdev->sh_info));
+ err = xenbus_grant_ring(pdev->xdev, pdev->sh_info, 1, &pdev->gnt_ref);
if (err < 0)
goto out;
- pdev->gnt_ref = err;
-
err = xenbus_alloc_evtchn(pdev->xdev, &pdev->evtchn);
if (err)
goto out;
diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
index 64b11f9..e0834cd 100644
--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -108,7 +108,7 @@ static int xen_pcibk_do_attach(struct xen_pcibk_device *pdev, int gnt_ref,
"Attaching to frontend resources - gnt_ref=%d evtchn=%d\n",
gnt_ref, remote_evtchn);
- err = xenbus_map_ring_valloc(pdev->xdev, gnt_ref, &vaddr);
+ err = xenbus_map_ring_valloc(pdev->xdev, &gnt_ref, 1, &vaddr);
if (err < 0) {
xenbus_dev_fatal(pdev->xdev, err,
"Error mapping other domain page in ours.");
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 566d2ad..3a14524 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -53,14 +53,16 @@ struct xenbus_map_node {
struct vm_struct *area; /* PV */
struct page *page; /* HVM */
};
- grant_handle_t handle;
+ grant_handle_t handle[XENBUS_MAX_RING_PAGES];
+ unsigned int nr_handles;
};
static DEFINE_SPINLOCK(xenbus_valloc_lock);
static LIST_HEAD(xenbus_valloc_pages);
struct xenbus_ring_ops {
- int (*map)(struct xenbus_device *dev, int gnt, void **vaddr);
+ int (*map)(struct xenbus_device *dev, int gnt[], int nr_gnts,
+ void **vaddr);
int (*unmap)(struct xenbus_device *dev, void *vaddr);
};
@@ -356,17 +358,38 @@ static void xenbus_switch_fatal(struct xenbus_device *dev, int depth, int err,
/**
* xenbus_grant_ring
* @dev: xenbus device
- * @ring_mfn: mfn of ring to grant
-
- * Grant access to the given @ring_mfn to the peer of the given device. Return
- * 0 on success, or -errno on error. On error, the device will switch to
- * XenbusStateClosing, and the error will be saved in the store.
+ * @vaddr: starting virtual address of the ring
+ * @nr_pages: number of page to be granted
+ * @grefs: grant reference array to be filled in
+ * Grant access to the given @vaddr to the peer of the given device.
+ * Then fill in @grefs with grant references. Return 0 on success, or
+ * -errno on error. On error, the device will switch to
+ * XenbusStateClosing, and the first error will be saved in the store.
*/
-int xenbus_grant_ring(struct xenbus_device *dev, unsigned long ring_mfn)
+int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
+ int nr_pages, int grefs[])
{
- int err = gnttab_grant_foreign_access(dev->otherend_id, ring_mfn, 0);
- if (err < 0)
- xenbus_dev_fatal(dev, err, "granting access to ring page");
+ int i;
+ int err;
+
+ for (i = 0; i < nr_pages; i++) {
+ unsigned long addr = (unsigned long)vaddr +
+ (PAGE_SIZE * i);
+ err = gnttab_grant_foreign_access(dev->otherend_id,
+ virt_to_mfn(addr), 0);
+ if (err < 0) {
+ xenbus_dev_fatal(dev, err,
+ "granting access to ring page");
+ goto fail;
+ }
+ grefs[i] = err;
+ }
+
+ return 0;
+
+fail:
+ for (i--; i >= 0; i--)
+ gnttab_end_foreign_access_ref(grefs[i], 0);
return err;
}
EXPORT_SYMBOL_GPL(xenbus_grant_ring);
@@ -447,7 +470,8 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
/**
* xenbus_map_ring_valloc
* @dev: xenbus device
- * @gnt_ref: grant reference
+ * @gnt_ref: grant reference array
+ * @nr_grefs: number of grant reference
* @vaddr: pointer to address to be filled out by mapping
*
* Based on Rusty Russell's skeleton driver's map_page.
@@ -458,23 +482,28 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
* or -ENOMEM on error. If an error is returned, device will switch to
* XenbusStateClosing and the error message will be saved in XenStore.
*/
-int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
+int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref[],
+ int nr_grefs, void **vaddr)
{
- return ring_ops->map(dev, gnt_ref, vaddr);
+ return ring_ops->map(dev, gnt_ref, nr_grefs, vaddr);
}
EXPORT_SYMBOL_GPL(xenbus_map_ring_valloc);
+static int __xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev,
+ struct xenbus_map_node *node);
+
static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
- int gnt_ref, void **vaddr)
+ int gnt_ref[], int nr_grefs, void **vaddr)
{
- struct gnttab_map_grant_ref op = {
- .flags = GNTMAP_host_map | GNTMAP_contains_pte,
- .ref = gnt_ref,
- .dom = dev->otherend_id,
- };
+ struct gnttab_map_grant_ref op[XENBUS_MAX_RING_PAGES];
struct xenbus_map_node *node;
struct vm_struct *area;
- pte_t *pte;
+ pte_t *pte[XENBUS_MAX_RING_PAGES];
+ int i;
+ int err = 0;
+
+ if (nr_grefs > XENBUS_MAX_RING_PAGES)
+ return -EINVAL;
*vaddr = NULL;
@@ -482,28 +511,44 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
if (!node)
return -ENOMEM;
- area = alloc_vm_area(PAGE_SIZE, &pte);
+ area = alloc_vm_area(PAGE_SIZE * nr_grefs, pte);
if (!area) {
kfree(node);
return -ENOMEM;
}
- op.host_addr = arbitrary_virt_to_machine(pte).maddr;
+ for (i = 0; i < nr_grefs; i++) {
+ op[i].flags = GNTMAP_host_map | GNTMAP_contains_pte;
+ op[i].ref = gnt_ref[i];
+ op[i].dom = dev->otherend_id;
+ op[i].host_addr = arbitrary_virt_to_machine(pte[i]).maddr;
+ }
if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
BUG();
- if (op.status != GNTST_okay) {
- free_vm_area(area);
- kfree(node);
- xenbus_dev_fatal(dev, op.status,
- "mapping in shared page %d from domain %d",
- gnt_ref, dev->otherend_id);
- return op.status;
+ node->nr_handles = nr_grefs;
+ node->area = area;
+
+ for (i = 0; i < nr_grefs; i++) {
+ if (op[i].status != GNTST_okay) {
+ err = op[i].status;
+ node->handle[i] = INVALID_GRANT_HANDLE;
+ continue;
+ }
+ node->handle[i] = op[i].handle;
}
- node->handle = op.handle;
- node->area = area;
+ if (err != 0) {
+ for (i = 0; i < nr_grefs; i++)
+ xenbus_dev_fatal(dev, op[i].status,
+ "mapping in shared page %d from domain %d",
+ gnt_ref[i], dev->otherend_id);
+
+ __xenbus_unmap_ring_vfree_pv(dev, node);
+
+ return err;
+ }
spin_lock(&xenbus_valloc_lock);
list_add(&node->next, &xenbus_valloc_pages);
@@ -514,25 +559,29 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
}
static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
- int gnt_ref, void **vaddr)
+ int gnt_ref[], int nr_grefs, void **vaddr)
{
struct xenbus_map_node *node;
int err;
void *addr;
+ if (nr_grefs > XENBUS_MAX_RING_PAGES)
+ return -EINVAL;
+
*vaddr = NULL;
node = kzalloc(sizeof(*node), GFP_KERNEL);
if (!node)
return -ENOMEM;
- err = alloc_xenballooned_pages(1, &node->page, false /* lowmem */);
+ err = alloc_xenballooned_pages(nr_grefs, &node->page,
+ false /* lowmem */);
if (err)
goto out_err;
addr = pfn_to_kaddr(page_to_pfn(node->page));
- err = xenbus_map_ring(dev, gnt_ref, &node->handle, addr);
+ err = xenbus_map_ring(dev, gnt_ref, nr_grefs, node->handle, addr);
if (err)
goto out_err;
@@ -544,7 +593,7 @@ static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
return 0;
out_err:
- free_xenballooned_pages(1, &node->page);
+ free_xenballooned_pages(nr_grefs, &node->page);
kfree(node);
return err;
}
@@ -553,36 +602,51 @@ static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
/**
* xenbus_map_ring
* @dev: xenbus device
- * @gnt_ref: grant reference
- * @handle: pointer to grant handle to be filled
+ * @gnt_ref: grant reference array
+ * @nr_grefs: number of grant references
+ * @handle: pointer to grant handle array to be filled, mind the size
* @vaddr: address to be mapped to
*
- * Map a page of memory into this domain from another domain's grant table.
+ * Map pages of memory into this domain from another domain's grant table.
* xenbus_map_ring does not allocate the virtual address space (you must do
- * this yourself!). It only maps in the page to the specified address.
+ * this yourself!). It only maps in the pages to the specified address.
* Returns 0 on success, and GNTST_* (see xen/include/interface/grant_table.h)
* or -ENOMEM on error. If an error is returned, device will switch to
- * XenbusStateClosing and the error message will be saved in XenStore.
+ * XenbusStateClosing and the last error message will be saved in XenStore.
*/
-int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
- grant_handle_t *handle, void *vaddr)
+int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref[], int nr_grefs,
+ grant_handle_t handle[], void *vaddr)
{
- struct gnttab_map_grant_ref op;
-
- gnttab_set_map_op(&op, (phys_addr_t)vaddr, GNTMAP_host_map, gnt_ref,
- dev->otherend_id);
+ struct gnttab_map_grant_ref op[XENBUS_MAX_RING_PAGES];
+ int i;
+ int err = GNTST_okay; /* 0 */
+
+ for (i = 0; i < nr_grefs; i++) {
+ unsigned long addr = (unsigned long)vaddr +
+ (PAGE_SIZE * i);
+ gnttab_set_map_op(&op[i], (phys_addr_t)addr,
+ GNTMAP_host_map, gnt_ref[i],
+ dev->otherend_id);
+ }
- if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
+ if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, op, nr_grefs))
BUG();
- if (op.status != GNTST_okay) {
- xenbus_dev_fatal(dev, op.status,
- "mapping in shared page %d from domain %d",
- gnt_ref, dev->otherend_id);
- } else
- *handle = op.handle;
+ for (i = 0; i < nr_grefs; i++) {
+ if (op[i].status != GNTST_okay) {
+ err = op[i].status;
+ xenbus_dev_fatal(dev, err,
+ "mapping in shared page %d from domain %d",
+ gnt_ref[i], dev->otherend_id);
+ handle[i] = INVALID_GRANT_HANDLE;
+ } else
+ handle[i] = op[i].handle;
+ }
- return op.status;
+ if (err != GNTST_okay)
+ xenbus_unmap_ring(dev, handle, nr_grefs, vaddr);
+
+ return err;
}
EXPORT_SYMBOL_GPL(xenbus_map_ring);
@@ -605,13 +669,53 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
}
EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree);
+static int __xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev,
+ struct xenbus_map_node *node)
+{
+ struct gnttab_unmap_grant_ref op[XENBUS_MAX_RING_PAGES];
+ unsigned int level;
+ int i, j;
+ int err = GNTST_okay;
+
+ j = 0;
+ for (i = 0; i < node->nr_handles; i++) {
+ unsigned long vaddr = (unsigned long)node->area->addr +
+ (PAGE_SIZE * i);
+ if (node->handle[i] != INVALID_GRANT_HANDLE) {
+ memset(&op[j], 0, sizeof(op[0]));
+ op[j].host_addr = arbitrary_virt_to_machine(
+ lookup_address(vaddr, &level)).maddr;
+ op[j].handle = node->handle[i];
+ j++;
+ node->handle[i] = INVALID_GRANT_HANDLE;
+ }
+ }
+
+ if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, op, j))
+ BUG();
+
+ node->nr_handles = 0;
+
+ for (i = 0; i < j; i++) {
+ if (op[i].status != GNTST_okay) {
+ err = op[i].status;
+ xenbus_dev_error(dev, err,
+ "unmapping page %d at handle %d error %d",
+ i, op[i].handle, err);
+ }
+ }
+
+ if (err == GNTST_okay)
+ free_vm_area(node->area);
+
+ kfree(node);
+
+ return err;
+}
+
static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
{
struct xenbus_map_node *node;
- struct gnttab_unmap_grant_ref op = {
- .host_addr = (unsigned long)vaddr,
- };
- unsigned int level;
spin_lock(&xenbus_valloc_lock);
list_for_each_entry(node, &xenbus_valloc_pages, next) {
@@ -626,33 +730,18 @@ static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
if (!node) {
xenbus_dev_error(dev, -ENOENT,
- "can't find mapped virtual address %p", vaddr);
+ "can't find mapped virtual address %p", vaddr);
return GNTST_bad_virt_addr;
}
- op.handle = node->handle;
- op.host_addr = arbitrary_virt_to_machine(
- lookup_address((unsigned long)vaddr, &level)).maddr;
-
- if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
- BUG();
-
- if (op.status == GNTST_okay)
- free_vm_area(node->area);
- else
- xenbus_dev_error(dev, op.status,
- "unmapping page at handle %d error %d",
- node->handle, op.status);
-
- kfree(node);
- return op.status;
+ return __xenbus_unmap_ring_vfree_pv(dev, node);
}
static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
{
int rv;
struct xenbus_map_node *node;
- void *addr;
+ void *addr = NULL;
spin_lock(&xenbus_valloc_lock);
list_for_each_entry(node, &xenbus_valloc_pages, next) {
@@ -668,14 +757,14 @@ static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
if (!node) {
xenbus_dev_error(dev, -ENOENT,
- "can't find mapped virtual address %p", vaddr);
+ "can't find mapped virtual address %p", vaddr);
return GNTST_bad_virt_addr;
}
- rv = xenbus_unmap_ring(dev, node->handle, addr);
+ rv = xenbus_unmap_ring(dev, node->handle, node->nr_handles, addr);
if (!rv)
- free_xenballooned_pages(1, &node->page);
+ free_xenballooned_pages(node->nr_handles, &node->page);
else
WARN(1, "Leaking %p\n", vaddr);
@@ -687,6 +776,7 @@ static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
* xenbus_unmap_ring
* @dev: xenbus device
* @handle: grant handle
+ * @nr_handles: number of grant handles
* @vaddr: addr to unmap
*
* Unmap a page of memory in this domain that was imported from another domain.
@@ -694,21 +784,37 @@ static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
* (see xen/include/interface/grant_table.h).
*/
int xenbus_unmap_ring(struct xenbus_device *dev,
- grant_handle_t handle, void *vaddr)
+ grant_handle_t handle[], int nr_handles,
+ void *vaddr)
{
- struct gnttab_unmap_grant_ref op;
-
- gnttab_set_unmap_op(&op, (phys_addr_t)vaddr, GNTMAP_host_map, handle);
+ struct gnttab_unmap_grant_ref op[XENBUS_MAX_RING_PAGES];
+ int i, j;
+ int err = GNTST_okay;
+
+ j = 0;
+ for (i = 0; i < nr_handles; i++) {
+ unsigned long addr = (unsigned long)vaddr +
+ (PAGE_SIZE * i);
+ if (handle[i] != INVALID_GRANT_HANDLE) {
+ gnttab_set_unmap_op(&op[j++], (phys_addr_t)addr,
+ GNTMAP_host_map, handle[i]);
+ handle[i] = INVALID_GRANT_HANDLE;
+ }
+ }
- if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
+ if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, op, j))
BUG();
- if (op.status != GNTST_okay)
- xenbus_dev_error(dev, op.status,
- "unmapping page at handle %d error %d",
- handle, op.status);
+ for (i = 0; i < j; i++) {
+ if (op[i].status != GNTST_okay) {
+ err = op[i].status;
+ xenbus_dev_error(dev, err,
+ "unmapping page at handle %d error %d",
+ handle[i], err);
+ }
+ }
- return op.status;
+ return err;
}
EXPORT_SYMBOL_GPL(xenbus_unmap_ring);
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 3864967..62b92d2 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -718,6 +718,7 @@ static int __init xenstored_local_init(void)
return err;
}
+extern void xenbus_ring_ops_init(void);
static int __init xenbus_init(void)
{
int err = 0;
@@ -767,6 +768,8 @@ static int __init xenbus_init(void)
proc_mkdir("xen", NULL);
#endif
+ xenbus_ring_ops_init();
+
out_error:
return err;
}
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index e8c599b..cdbd948 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -195,15 +195,23 @@ int xenbus_watch_pathfmt(struct xenbus_device *dev, struct xenbus_watch *watch,
const char *pathfmt, ...);
int xenbus_switch_state(struct xenbus_device *dev, enum xenbus_state new_state);
-int xenbus_grant_ring(struct xenbus_device *dev, unsigned long ring_mfn);
-int xenbus_map_ring_valloc(struct xenbus_device *dev,
- int gnt_ref, void **vaddr);
-int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
- grant_handle_t *handle, void *vaddr);
+
+#define XENBUS_MAX_RING_ORDER 2
+#define XENBUS_MAX_RING_PAGES (1 << XENBUS_MAX_RING_ORDER)
+
+#define INVALID_GRANT_HANDLE (~0U)
+
+int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
+ int nr_pages, int grefs[]);
+int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref[],
+ int nr_grefs, void **vaddr);
+int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref[], int nr_grefs,
+ grant_handle_t handle[], void *vaddr);
int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr);
int xenbus_unmap_ring(struct xenbus_device *dev,
- grant_handle_t handle, void *vaddr);
+ grant_handle_t handle[], int nr_handles,
+ void *vaddr);
int xenbus_alloc_evtchn(struct xenbus_device *dev, int *port);
int xenbus_bind_evtchn(struct xenbus_device *dev, int remote_port, int *port);
* Re: [PATCH 0001/001] xen: multi page ring support for block devices
From: Rusty Russell @ 2012-03-06 2:42 UTC (permalink / raw)
To: konrad.wilk@oracle.com, jeremy@goop.org, Ian Campbell,
jbarnes@virtuousgeek.org, jbeulich@novell.com, joe.jin@oracle.com,
lersek@redhat.com, weiyi.huang@gmail.com, dgdegra@tycho.nsa.gov,
David Vrabel, paul.gortmaker@windriver.com,
akpm@linux-foundation.org, waldi@debian.org,
virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
linux-pci@vger.kernel.org, linux-kernel
Cc: Paul Durrant, Santosh Jodh
In-Reply-To: <7914B38A4445B34AA16EB9F1352942F1010A1FA12364@SJCPMAILBOX01.citrite.net>
On Mon, 5 Mar 2012 13:49:07 -0800, Santosh Jodh <Santosh.Jodh@citrix.com> wrote:
> +/* Order of maximum shared ring size advertised to the front end. */
> +int xen_blkif_max_ring_order = XENBUS_MAX_RING_ORDER;
> +
> +#define BLK_RING_SIZE(_order) __CONST_RING_SIZE(blkif, PAGE_SIZE << (_order))
> +
> +static int set_max_ring_order(const char *buf, struct kernel_param *kp)
> +{
> + int err;
> + unsigned long order;
> +
> + err = kstrtol(buf, 0, &order);
> + if (err ||
> + order < 0 ||
> + order > XENBUS_MAX_RING_ORDER)
> + return -EINVAL;
Hmm, order can't be < 0, since it's unsigned. So did you mean
kstrtoull?
And I think returning err is cleaner (it's -EINVAL for malformed
strings, -ERANGE for ones too big).
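To make the two points above concrete, here is a minimal userspace sketch. It is an illustration only, not code from the patch: parse_ulong() is a hypothetical stand-in for an unsigned kstrto*-style helper, built on strtoul(), and MAX_RING_ORDER stands in for XENBUS_MAX_RING_ORDER. The setter takes the destination as a pointer instead of using a module global, purely so it can be exercised on its own.

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_RING_ORDER 2	/* stand-in for XENBUS_MAX_RING_ORDER */

/*
 * Hypothetical userspace stand-in for an unsigned kstrto*-style parser:
 * returns 0 on success, -EINVAL for malformed input, -ERANGE on overflow.
 */
static int parse_ulong(const char *buf, unsigned long *out)
{
	char *end;
	unsigned long val;

	/* strtoul() would silently wrap a leading minus sign; reject it. */
	if (*buf == '\0' || *buf == '-')
		return -EINVAL;

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == buf)
		return -EINVAL;
	if (*end == '\n')	/* tolerate one trailing newline */
		end++;
	if (*end != '\0')
		return -EINVAL;

	*out = val;
	return 0;
}

static int set_max_ring_order(const char *buf, unsigned int *max_order)
{
	unsigned long order;
	int err;

	err = parse_ulong(buf, &order);
	if (err)
		return err;	/* pass the parser's own error code through */

	/* No "order < 0" test: the value is unsigned, so it cannot fail. */
	if (order > MAX_RING_ORDER)
		return -EINVAL;

	*max_order = (unsigned int)order;
	return 0;
}
```

With this shape a caller can tell a malformed string (-EINVAL) from an overflowing one (-ERANGE), while an in-range parse that exceeds the maximum order is still rejected with -EINVAL.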
> + if (xen_blkif_reqs < BLK_RING_SIZE(order))
> + printk(KERN_WARNING "WARNING: "
> + "I/O request space (%d reqs) < ring order %ld, "
> + "consider increasing %s.reqs to >= %ld.",
> + xen_blkif_reqs, order, KBUILD_MODNAME,
> + roundup_pow_of_two(BLK_RING_SIZE(order)));
This message doesn't mention the module name or parameter name
anywhere. Think of the poor sysadmins!
Thanks,
Rusty.
--
How could I marry someone with more hair than me? http://baldalex.org
* [PATCH 0001/001] xen: multi page ring support for block devices
From: Santosh Jodh @ 2012-03-05 21:49 UTC (permalink / raw)
To: konrad.wilk@oracle.com, jeremy@goop.org, Ian Campbell,
jbarnes@virtuousgeek.org, jbeulich@novell.com, joe.jin@oracle.com,
lersek@redhat.com, weiyi.huang@gmail.com, rusty@rustcorp.com.au,
dgdegra@tycho.nsa.gov, David Vrabel, paul.gortmaker@windriver.com,
akpm@linux-foundation.org, waldi@debian.org,
virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
linux-pci@vger.kernel.org, linux-kernel
Cc: Paul Durrant, Santosh Jodh
In-Reply-To: <1330701099-18281-1-git-send-email-santoshprasadnayak@gmail.com>
From: Santosh Jodh <santosh.jodh@citrix.com>
Add support for multi page ring for block devices.
The number of pages is configurable for blkback via module parameter.
blkback reports max-ring-page-order to blkfront via xenstore.
blkfront reports its supported ring-page-order to blkback via xenstore.
blkfront reports multi page ring references via ring-refNN in xenstore.
The change allows newer blkfront to work with older blkback and
vice-versa.
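The handshake described above can be sketched in plain C as follows. This is a simplified, standalone illustration under stated assumptions, not the driver code: choose_ring_order() and ring_ref_key() are hypothetical helper names, and the xenstore reads and writes are left out. Only the key names (max-ring-page-order, ring-ref, ring-refNN) come from the description.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Front end side: pick the ring order to use.
 * have_backend_max is 0 when the back end did not publish
 * max-ring-page-order, i.e. it is an older, single page back end.
 */
static unsigned int choose_ring_order(int have_backend_max,
				      unsigned int backend_max,
				      unsigned int frontend_wanted)
{
	if (!have_backend_max)
		return 0;	/* legacy back end: single page ring */

	/* Otherwise use the smaller of what each side supports. */
	return backend_max < frontend_wanted ? backend_max : frontend_wanted;
}

/*
 * Build the xenstore key for ring page i: plain "ring-ref" when talking
 * to a legacy back end, "ring-refNN" otherwise.
 */
static int ring_ref_key(char *buf, size_t len, int legacy, unsigned int i)
{
	if (legacy)
		return snprintf(buf, len, "ring-ref");

	return snprintf(buf, len, "ring-ref%u", i);
}
```

A front end that finds no max-ring-page-order falls back to order 0 and the single ring-ref key, which is what lets it keep working against an older back end.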
Based on original patch by Paul Durrant.
Signed-off-by: Santosh Jodh <santosh.jodh@citrix.com>
---
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 0088bf6..72f2e18 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -60,6 +60,39 @@ static int xen_blkif_reqs = 64;
module_param_named(reqs, xen_blkif_reqs, int, 0);
MODULE_PARM_DESC(reqs, "Number of blkback requests to allocate");
+/* Order of maximum shared ring size advertised to the front end. */
+int xen_blkif_max_ring_order = XENBUS_MAX_RING_ORDER;
+
+#define BLK_RING_SIZE(_order) __CONST_RING_SIZE(blkif, PAGE_SIZE << (_order))
+
+static int set_max_ring_order(const char *buf, struct kernel_param *kp)
+{
+ int err;
+ unsigned long order;
+
+ err = kstrtol(buf, 0, &order);
+ if (err ||
+ order < 0 ||
+ order > XENBUS_MAX_RING_ORDER)
+ return -EINVAL;
+
+ if (xen_blkif_reqs < BLK_RING_SIZE(order))
+ printk(KERN_WARNING "WARNING: "
+ "I/O request space (%d reqs) < ring order %ld, "
+ "consider increasing %s.reqs to >= %ld.",
+ xen_blkif_reqs, order, KBUILD_MODNAME,
+ roundup_pow_of_two(BLK_RING_SIZE(order)));
+
+ xen_blkif_max_ring_order = order;
+
+ return 0;
+}
+
+module_param_call(max_ring_order,
+ set_max_ring_order, param_get_int,
+ &xen_blkif_max_ring_order, 0644);
+MODULE_PARM_DESC(max_ring_order, "log2 of maximum ring size, in pages.");
+
/* Run-time switchable: /sys/module/blkback/parameters/ */
static unsigned int log_stats;
module_param(log_stats, int, 0644);
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index d0ee7ed..5f33a1a 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -126,6 +126,8 @@ struct blkif_x86_64_response {
int16_t status; /* BLKIF_RSP_??? */
};
+extern int xen_blkif_max_ring_order;
+
DEFINE_RING_TYPES(blkif_common, struct blkif_common_request,
struct blkif_common_response);
DEFINE_RING_TYPES(blkif_x86_32, struct blkif_x86_32_request,
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 24a2fb5..7a9d71d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -122,8 +122,8 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
return blkif;
}
-static int xen_blkif_map(struct xen_blkif *blkif, unsigned long shared_page,
- unsigned int evtchn)
+static int xen_blkif_map(struct xen_blkif *blkif, int ring_ref[],
+ unsigned int ring_order, unsigned int evtchn)
{
int err;
@@ -131,7 +131,8 @@ static int xen_blkif_map(struct xen_blkif *blkif, unsigned long shared_page,
if (blkif->irq)
return 0;
- err = xenbus_map_ring_valloc(blkif->be->dev, shared_page, &blkif->blk_ring);
+ err = xenbus_map_ring_valloc(blkif->be->dev, ring_ref, 1 << ring_order,
+ &blkif->blk_ring);
if (err < 0)
return err;
@@ -140,21 +141,24 @@ static int xen_blkif_map(struct xen_blkif *blkif, unsigned long shared_page,
{
struct blkif_sring *sring;
sring = (struct blkif_sring *)blkif->blk_ring;
- BACK_RING_INIT(&blkif->blk_rings.native, sring, PAGE_SIZE);
+ BACK_RING_INIT(&blkif->blk_rings.native, sring,
+ PAGE_SIZE << ring_order);
break;
}
case BLKIF_PROTOCOL_X86_32:
{
struct blkif_x86_32_sring *sring_x86_32;
sring_x86_32 = (struct blkif_x86_32_sring *)blkif->blk_ring;
- BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32, PAGE_SIZE);
+ BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32,
+ PAGE_SIZE << ring_order);
break;
}
case BLKIF_PROTOCOL_X86_64:
{
struct blkif_x86_64_sring *sring_x86_64;
sring_x86_64 = (struct blkif_x86_64_sring *)blkif->blk_ring;
- BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64, PAGE_SIZE);
+ BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64,
+ PAGE_SIZE << ring_order);
break;
}
default:
@@ -497,6 +501,11 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
if (err)
goto fail;
+ err = xenbus_printf(XBT_NIL, dev->nodename, "max-ring-page-order",
+ "%u", xen_blkif_max_ring_order);
+ if (err)
+ goto fail;
+
err = xenbus_switch_state(dev, XenbusStateInitWait);
if (err)
goto fail;
@@ -744,22 +753,80 @@ again:
static int connect_ring(struct backend_info *be)
{
struct xenbus_device *dev = be->dev;
- unsigned long ring_ref;
+ int ring_ref[XENBUS_MAX_RING_PAGES];
+ unsigned int ring_order;
unsigned int evtchn;
char protocol[64] = "";
int err;
DPRINTK("%s", dev->otherend);
- err = xenbus_gather(XBT_NIL, dev->otherend, "ring-ref", "%lu",
- &ring_ref, "event-channel", "%u", &evtchn, NULL);
- if (err) {
- xenbus_dev_fatal(dev, err,
- "reading %s/ring-ref and event-channel",
+ err = xenbus_scanf(XBT_NIL, dev->otherend, "event-channel", "%u",
+ &evtchn);
+ if (err != 1) {
+ err = -EINVAL;
+
+ xenbus_dev_fatal(dev, err, "reading %s/event-channel",
dev->otherend);
return err;
}
+ printk(KERN_INFO "blkback: event-channel %u\n", evtchn);
+
+ err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
+ &ring_order);
+ if (err != 1) {
+ DPRINTK("%s: using single page handshake", dev->otherend);
+
+ ring_order = 0;
+
+ err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-ref",
+ "%d", &ring_ref[0]);
+ if (err != 1) {
+ err = -EINVAL;
+
+ xenbus_dev_fatal(dev, err, "reading %s/ring-ref",
+ dev->otherend);
+ return err;
+ }
+
+ printk(KERN_INFO "blkback: ring-ref %d\n", ring_ref[0]);
+ } else {
+ unsigned int i;
+
+ if (ring_order > xen_blkif_max_ring_order) {
+ err = -EINVAL;
+
+ xenbus_dev_fatal(dev, err,
+ "%s/ring-page-order too big",
+ dev->otherend);
+ return err;
+ }
+
+ for (i = 0; i < (1u << ring_order); i++) {
+ char ring_ref_name[10];
+
+ snprintf(ring_ref_name, sizeof(ring_ref_name),
+ "ring-ref%u", i);
+
+ err = xenbus_scanf(XBT_NIL, dev->otherend,
+ ring_ref_name, "%d",
+ &ring_ref[i]);
+ if (err != 1) {
+ err = -EINVAL;
+
+ xenbus_dev_fatal(dev, err,
+ "reading %s/%s",
+ dev->otherend,
+ ring_ref_name);
+ return err;
+ }
+
+ printk(KERN_INFO "blkback: ring-ref%u %d\n", i,
+ ring_ref[i]);
+ }
+ }
+
be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
err = xenbus_gather(XBT_NIL, dev->otherend, "protocol",
"%63s", protocol, NULL);
@@ -775,14 +842,11 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -1;
}
- pr_info(DRV_PFX "ring-ref %ld, event-channel %d, protocol %d (%s)\n",
- ring_ref, evtchn, be->blkif->blk_protocol, protocol);
/* Map the shared frame, irq etc. */
- err = xen_blkif_map(be->blkif, ring_ref, evtchn);
+ err = xen_blkif_map(be->blkif, ring_ref, ring_order, evtchn);
if (err) {
- xenbus_dev_fatal(dev, err, "mapping ring-ref %lu port %u",
- ring_ref, evtchn);
+ xenbus_dev_fatal(dev, err, "mapping ring-refs and evtchn");
return err;
}
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 2f22874..485813a 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -57,6 +57,10 @@
#include <asm/xen/hypervisor.h>
+static int xen_blkif_ring_order;
+module_param_named(reqs, xen_blkif_ring_order, int, 0);
+MODULE_PARM_DESC(reqs, "log2 of requested ring size, in pages.");
+
enum blkif_state {
BLKIF_STATE_DISCONNECTED,
BLKIF_STATE_CONNECTED,
@@ -72,7 +76,8 @@ struct blk_shadow {
static DEFINE_MUTEX(blkfront_mutex);
static const struct block_device_operations xlvbd_block_fops;
-#define BLK_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE)
+#define BLK_RING_SIZE(_order) __CONST_RING_SIZE(blkif, PAGE_SIZE << (_order))
+#define BLK_MAX_RING_SIZE BLK_RING_SIZE(XENBUS_MAX_RING_ORDER)
/*
* We have one of these per vbd, whether ide, scsi or 'other'. They
@@ -87,14 +92,15 @@ struct blkfront_info
int vdevice;
blkif_vdev_t handle;
enum blkif_state connected;
- int ring_ref;
+ int ring_ref[XENBUS_MAX_RING_PAGES];
+ int ring_order;
struct blkif_front_ring ring;
struct scatterlist sg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
unsigned int evtchn, irq;
struct request_queue *rq;
struct work_struct work;
struct gnttab_free_callback callback;
- struct blk_shadow shadow[BLK_RING_SIZE];
+ struct blk_shadow shadow[BLK_MAX_RING_SIZE];
unsigned long shadow_free;
unsigned int feature_flush;
unsigned int flush_op;
@@ -111,9 +117,7 @@ static unsigned int nr_minors;
static unsigned long *minors;
static DEFINE_SPINLOCK(minor_lock);
-#define MAXIMUM_OUTSTANDING_BLOCK_REQS \
- (BLKIF_MAX_SEGMENTS_PER_REQUEST * BLK_RING_SIZE)
-#define GRANT_INVALID_REF 0
+#define GRANT_INVALID_REF 0
#define PARTS_PER_DISK 16
#define PARTS_PER_EXT_DISK 256
@@ -135,7 +139,7 @@ static DEFINE_SPINLOCK(minor_lock);
static int get_id_from_freelist(struct blkfront_info *info)
{
unsigned long free = info->shadow_free;
- BUG_ON(free >= BLK_RING_SIZE);
+ BUG_ON(free >= BLK_MAX_RING_SIZE);
info->shadow_free = info->shadow[free].req.u.rw.id;
info->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
return free;
@@ -683,6 +687,8 @@ static void blkif_restart_queue(struct work_struct *work)
static void blkif_free(struct blkfront_info *info, int suspend)
{
+ int i;
+
/* Prevent new requests being issued until we fix things up. */
spin_lock_irq(&blkif_io_lock);
info->connected = suspend ?
@@ -698,16 +704,19 @@ static void blkif_free(struct blkfront_info *info, int suspend)
flush_work_sync(&info->work);
/* Free resources associated with old device channel. */
- if (info->ring_ref != GRANT_INVALID_REF) {
- gnttab_end_foreign_access(info->ring_ref, 0,
- (unsigned long)info->ring.sring);
- info->ring_ref = GRANT_INVALID_REF;
- info->ring.sring = NULL;
+ for (i = 0; i < (1 << info->ring_order); i++) {
+ if (info->ring_ref[i] != GRANT_INVALID_REF) {
+ gnttab_end_foreign_access(info->ring_ref[i], 0, 0);
+ info->ring_ref[i] = GRANT_INVALID_REF;
+ }
}
+
+ free_pages((unsigned long)info->ring.sring, info->ring_order);
+ info->ring.sring = NULL;
+
if (info->irq)
unbind_from_irqhandler(info->irq, info);
info->evtchn = info->irq = 0;
-
}
static void blkif_completion(struct blk_shadow *s)
@@ -828,25 +837,24 @@ static int setup_blkring(struct xenbus_device *dev,
struct blkif_sring *sring;
int err;
- info->ring_ref = GRANT_INVALID_REF;
-
- sring = (struct blkif_sring *)__get_free_page(GFP_NOIO | __GFP_HIGH);
+ sring = (struct blkif_sring *)__get_free_pages(GFP_NOIO | __GFP_HIGH,
+ info->ring_order);
if (!sring) {
xenbus_dev_fatal(dev, -ENOMEM, "allocating shared ring");
return -ENOMEM;
}
SHARED_RING_INIT(sring);
- FRONT_RING_INIT(&info->ring, sring, PAGE_SIZE);
+ FRONT_RING_INIT(&info->ring, sring, PAGE_SIZE << info->ring_order);
sg_init_table(info->sg, BLKIF_MAX_SEGMENTS_PER_REQUEST);
- err = xenbus_grant_ring(dev, virt_to_mfn(info->ring.sring));
+ err = xenbus_grant_ring(dev, info->ring.sring, 1 << info->ring_order,
+ info->ring_ref);
if (err < 0) {
- free_page((unsigned long)sring);
+ free_pages((unsigned long)sring, info->ring_order);
info->ring.sring = NULL;
goto fail;
}
- info->ring_ref = err;
err = xenbus_alloc_evtchn(dev, &info->evtchn);
if (err)
@@ -875,8 +883,27 @@ static int talk_to_blkback(struct xenbus_device *dev,
{
const char *message = NULL;
struct xenbus_transaction xbt;
+ unsigned int ring_order;
+ int legacy_backend;
+ int i;
int err;
+ for (i = 0; i < (1 << info->ring_order); i++)
+ info->ring_ref[i] = GRANT_INVALID_REF;
+
+ err = xenbus_scanf(XBT_NIL, dev->otherend, "max-ring-page-order", "%u",
+ &ring_order);
+
+ legacy_backend = !(err == 1);
+
+ if (legacy_backend) {
+ info->ring_order = 0;
+ } else {
+ info->ring_order = (ring_order <= xen_blkif_ring_order) ?
+ ring_order :
+ xen_blkif_ring_order;
+ }
+
/* Create shared ring, alloc event channel. */
err = setup_blkring(dev, info);
if (err)
@@ -889,12 +916,35 @@ again:
goto destroy_blkring;
}
- err = xenbus_printf(xbt, dev->nodename,
- "ring-ref", "%u", info->ring_ref);
- if (err) {
- message = "writing ring-ref";
- goto abort_transaction;
+ if (legacy_backend) {
+ err = xenbus_printf(xbt, dev->nodename,
+ "ring-ref", "%d", info->ring_ref[0]);
+ if (err) {
+ message = "writing ring-ref";
+ goto abort_transaction;
+ }
+ } else {
+ for (i = 0; i < (1 << info->ring_order); i++) {
+ char key[sizeof("ring-ref") + 2];
+
+ sprintf(key, "ring-ref%d", i);
+
+ err = xenbus_printf(xbt, dev->nodename,
+ key, "%d", info->ring_ref[i]);
+ if (err) {
+ message = "writing ring-ref";
+ goto abort_transaction;
+ }
+ }
+
+ err = xenbus_printf(xbt, dev->nodename,
+ "ring-page-order", "%u", info->ring_order);
+ if (err) {
+ message = "writing ring-order";
+ goto abort_transaction;
+ }
}
+
err = xenbus_printf(xbt, dev->nodename,
"event-channel", "%u", info->evtchn);
if (err) {
@@ -996,21 +1046,14 @@ static int blkfront_probe(struct xenbus_device *dev,
info->connected = BLKIF_STATE_DISCONNECTED;
INIT_WORK(&info->work, blkif_restart_queue);
- for (i = 0; i < BLK_RING_SIZE; i++)
+ for (i = 0; i < BLK_MAX_RING_SIZE; i++)
info->shadow[i].req.u.rw.id = i+1;
- info->shadow[BLK_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
+ info->shadow[BLK_MAX_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
/* Front end dir is a number, which is used as the id. */
info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
dev_set_drvdata(&dev->dev, info);
- err = talk_to_blkback(dev, info);
- if (err) {
- kfree(info);
- dev_set_drvdata(&dev->dev, NULL);
- return err;
- }
-
return 0;
}
@@ -1031,13 +1074,13 @@ static int blkif_recover(struct blkfront_info *info)
/* Stage 2: Set up free list. */
memset(&info->shadow, 0, sizeof(info->shadow));
- for (i = 0; i < BLK_RING_SIZE; i++)
+ for (i = 0; i < BLK_MAX_RING_SIZE; i++)
info->shadow[i].req.u.rw.id = i+1;
info->shadow_free = info->ring.req_prod_pvt;
- info->shadow[BLK_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
+ info->shadow[BLK_MAX_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
/* Stage 3: Find pending requests and requeue them. */
- for (i = 0; i < BLK_RING_SIZE; i++) {
+ for (i = 0; i < BLK_RING_SIZE(info->ring_order); i++) {
/* Not in use? */
if (!copy[i].request)
continue;
@@ -1299,7 +1342,6 @@ static void blkback_changed(struct xenbus_device *dev,
switch (backend_state) {
case XenbusStateInitialising:
- case XenbusStateInitWait:
case XenbusStateInitialised:
case XenbusStateReconfiguring:
case XenbusStateReconfigured:
@@ -1307,6 +1349,10 @@ static void blkback_changed(struct xenbus_device *dev,
case XenbusStateClosed:
break;
+ case XenbusStateInitWait:
+ talk_to_blkback(dev, info);
+ break;
+
case XenbusStateConnected:
blkfront_connect(info);
break;
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 94b79c3..f93b59a 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -130,8 +130,8 @@ int xen_netbk_must_stop_queue(struct xenvif *vif);
/* (Un)Map communication rings. */
void xen_netbk_unmap_frontend_rings(struct xenvif *vif);
int xen_netbk_map_frontend_rings(struct xenvif *vif,
- grant_ref_t tx_ring_ref,
- grant_ref_t rx_ring_ref);
+ int tx_ring_ref,
+ int rx_ring_ref);
/* (De)Register a xenvif with the netback backend. */
void xen_netbk_add_xenvif(struct xenvif *vif);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 59effac..0b014cf 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1594,8 +1594,8 @@ void xen_netbk_unmap_frontend_rings(struct xenvif *vif)
}
int xen_netbk_map_frontend_rings(struct xenvif *vif,
- grant_ref_t tx_ring_ref,
- grant_ref_t rx_ring_ref)
+ int tx_ring_ref,
+ int rx_ring_ref)
{
void *addr;
struct xen_netif_tx_sring *txs;
@@ -1604,7 +1604,7 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
int err = -ENOMEM;
err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
- tx_ring_ref, &addr);
+ &tx_ring_ref, 1, &addr);
if (err)
goto err;
@@ -1612,7 +1612,7 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
BACK_RING_INIT(&vif->tx, txs, PAGE_SIZE);
err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
- rx_ring_ref, &addr);
+ &rx_ring_ref, 1, &addr);
if (err)
goto err;
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 698b905..521a595 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1496,13 +1496,12 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
SHARED_RING_INIT(txs);
FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE);
- err = xenbus_grant_ring(dev, virt_to_mfn(txs));
+ err = xenbus_grant_ring(dev, txs, 1, &info->tx_ring_ref);
if (err < 0) {
free_page((unsigned long)txs);
goto fail;
}
- info->tx_ring_ref = err;
rxs = (struct xen_netif_rx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
if (!rxs) {
err = -ENOMEM;
@@ -1512,12 +1511,11 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
SHARED_RING_INIT(rxs);
FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE);
- err = xenbus_grant_ring(dev, virt_to_mfn(rxs));
+ err = xenbus_grant_ring(dev, rxs, 1, &info->rx_ring_ref);
if (err < 0) {
free_page((unsigned long)rxs);
goto fail;
}
- info->rx_ring_ref = err;
err = xenbus_alloc_evtchn(dev, &info->evtchn);
if (err)
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index 1620088..95109d8 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -768,12 +768,10 @@ static int pcifront_publish_info(struct pcifront_device *pdev)
int err = 0;
struct xenbus_transaction trans;
- err = xenbus_grant_ring(pdev->xdev, virt_to_mfn(pdev->sh_info));
+ err = xenbus_grant_ring(pdev->xdev, pdev->sh_info, 1, &pdev->gnt_ref);
if (err < 0)
goto out;
- pdev->gnt_ref = err;
-
err = xenbus_alloc_evtchn(pdev->xdev, &pdev->evtchn);
if (err)
goto out;
diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
index 64b11f9..e0834cd 100644
--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -108,7 +108,7 @@ static int xen_pcibk_do_attach(struct xen_pcibk_device *pdev, int gnt_ref,
"Attaching to frontend resources - gnt_ref=%d evtchn=%d\n",
gnt_ref, remote_evtchn);
- err = xenbus_map_ring_valloc(pdev->xdev, gnt_ref, &vaddr);
+ err = xenbus_map_ring_valloc(pdev->xdev, &gnt_ref, 1, &vaddr);
if (err < 0) {
xenbus_dev_fatal(pdev->xdev, err,
"Error mapping other domain page in ours.");
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 566d2ad..3a14524 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -53,14 +53,16 @@ struct xenbus_map_node {
struct vm_struct *area; /* PV */
struct page *page; /* HVM */
};
- grant_handle_t handle;
+ grant_handle_t handle[XENBUS_MAX_RING_PAGES];
+ unsigned int nr_handles;
};
static DEFINE_SPINLOCK(xenbus_valloc_lock);
static LIST_HEAD(xenbus_valloc_pages);
struct xenbus_ring_ops {
- int (*map)(struct xenbus_device *dev, int gnt, void **vaddr);
+ int (*map)(struct xenbus_device *dev, int gnt[], int nr_gnts,
+ void **vaddr);
int (*unmap)(struct xenbus_device *dev, void *vaddr);
};
@@ -356,17 +358,38 @@ static void xenbus_switch_fatal(struct xenbus_device *dev, int depth, int err,
/**
* xenbus_grant_ring
* @dev: xenbus device
- * @ring_mfn: mfn of ring to grant
-
- * Grant access to the given @ring_mfn to the peer of the given device. Return
- * 0 on success, or -errno on error. On error, the device will switch to
- * XenbusStateClosing, and the error will be saved in the store.
+ * @vaddr: starting virtual address of the ring
+ * @nr_pages: number of pages to be granted
+ * @grefs: grant reference array to be filled in
+ * Grant access to the given @vaddr to the peer of the given device.
+ * Then fill in @grefs with grant references. Return 0 on success, or
+ * -errno on error. On error, the device will switch to
+ * XenbusStateClosing, and the first error will be saved in the store.
*/
-int xenbus_grant_ring(struct xenbus_device *dev, unsigned long ring_mfn)
+int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
+ int nr_pages, int grefs[])
{
- int err = gnttab_grant_foreign_access(dev->otherend_id, ring_mfn, 0);
- if (err < 0)
- xenbus_dev_fatal(dev, err, "granting access to ring page");
+ int i;
+ int err;
+
+ for (i = 0; i < nr_pages; i++) {
+ unsigned long addr = (unsigned long)vaddr +
+ (PAGE_SIZE * i);
+ err = gnttab_grant_foreign_access(dev->otherend_id,
+ virt_to_mfn(addr), 0);
+ if (err < 0) {
+ xenbus_dev_fatal(dev, err,
+ "granting access to ring page");
+ goto fail;
+ }
+ grefs[i] = err;
+ }
+
+ return 0;
+
+fail:
+ for ( ; i >= 0; i--)
+ gnttab_end_foreign_access_ref(grefs[i], 0);
return err;
}
EXPORT_SYMBOL_GPL(xenbus_grant_ring);
@@ -447,7 +470,8 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
/**
* xenbus_map_ring_valloc
* @dev: xenbus device
- * @gnt_ref: grant reference
+ * @gnt_ref: grant reference array
+ * @nr_grefs: number of grant references
* @vaddr: pointer to address to be filled out by mapping
*
* Based on Rusty Russell's skeleton driver's map_page.
@@ -458,23 +482,28 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
* or -ENOMEM on error. If an error is returned, device will switch to
* XenbusStateClosing and the error message will be saved in XenStore.
*/
-int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
+int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref[],
+ int nr_grefs, void **vaddr)
{
- return ring_ops->map(dev, gnt_ref, vaddr);
+ return ring_ops->map(dev, gnt_ref, nr_grefs, vaddr);
}
EXPORT_SYMBOL_GPL(xenbus_map_ring_valloc);
+static int __xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev,
+ struct xenbus_map_node *node);
+
static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
- int gnt_ref, void **vaddr)
+ int gnt_ref[], int nr_grefs, void **vaddr)
{
- struct gnttab_map_grant_ref op = {
- .flags = GNTMAP_host_map | GNTMAP_contains_pte,
- .ref = gnt_ref,
- .dom = dev->otherend_id,
- };
+ struct gnttab_map_grant_ref op[XENBUS_MAX_RING_PAGES];
struct xenbus_map_node *node;
struct vm_struct *area;
- pte_t *pte;
+ pte_t *pte[XENBUS_MAX_RING_PAGES];
+ int i;
+ int err = 0;
+
+ if (nr_grefs > XENBUS_MAX_RING_PAGES)
+ return -EINVAL;
*vaddr = NULL;
@@ -482,28 +511,44 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
if (!node)
return -ENOMEM;
- area = alloc_vm_area(PAGE_SIZE, &pte);
+ area = alloc_vm_area(PAGE_SIZE * nr_grefs, pte);
if (!area) {
kfree(node);
return -ENOMEM;
}
- op.host_addr = arbitrary_virt_to_machine(pte).maddr;
+ for (i = 0; i < nr_grefs; i++) {
+ op[i].flags = GNTMAP_host_map | GNTMAP_contains_pte,
+ op[i].ref = gnt_ref[i],
+ op[i].dom = dev->otherend_id,
+ op[i].host_addr = arbitrary_virt_to_machine(pte[i]).maddr;
+ };
if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
BUG();
- if (op.status != GNTST_okay) {
- free_vm_area(area);
- kfree(node);
- xenbus_dev_fatal(dev, op.status,
- "mapping in shared page %d from domain %d",
- gnt_ref, dev->otherend_id);
- return op.status;
+ node->nr_handles = nr_grefs;
+ node->area = area;
+
+ for (i = 0; i < nr_grefs; i++) {
+ if (op[i].status != GNTST_okay) {
+ err = op[i].status;
+ node->handle[i] = INVALID_GRANT_HANDLE;
+ continue;
+ }
+ node->handle[i] = op[i].handle;
}
- node->handle = op.handle;
- node->area = area;
+ if (err != 0) {
+ for (i = 0; i < nr_grefs; i++)
+ xenbus_dev_fatal(dev, op[i].status,
+ "mapping in shared page %d from domain %d",
+ gnt_ref[i], dev->otherend_id);
+
+ __xenbus_unmap_ring_vfree_pv(dev, node);
+
+ return err;
+ }
spin_lock(&xenbus_valloc_lock);
list_add(&node->next, &xenbus_valloc_pages);
@@ -514,25 +559,29 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
}
static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
- int gnt_ref, void **vaddr)
+ int gnt_ref[], int nr_grefs, void **vaddr)
{
struct xenbus_map_node *node;
int err;
void *addr;
+ if (nr_grefs > XENBUS_MAX_RING_PAGES)
+ return -EINVAL;
+
*vaddr = NULL;
node = kzalloc(sizeof(*node), GFP_KERNEL);
if (!node)
return -ENOMEM;
- err = alloc_xenballooned_pages(1, &node->page, false /* lowmem */);
+ err = alloc_xenballooned_pages(nr_grefs, &node->page,
+ false /* lowmem */);
if (err)
goto out_err;
addr = pfn_to_kaddr(page_to_pfn(node->page));
- err = xenbus_map_ring(dev, gnt_ref, &node->handle, addr);
+ err = xenbus_map_ring(dev, gnt_ref, nr_grefs, node->handle, addr);
if (err)
goto out_err;
@@ -544,7 +593,7 @@ static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
return 0;
out_err:
- free_xenballooned_pages(1, &node->page);
+ free_xenballooned_pages(nr_grefs, &node->page);
kfree(node);
return err;
}
@@ -553,36 +602,51 @@ static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
/**
* xenbus_map_ring
* @dev: xenbus device
- * @gnt_ref: grant reference
- * @handle: pointer to grant handle to be filled
+ * @gnt_ref: grant reference array
+ * @nr_grefs: number of grant references
+ * @handle: pointer to grant handle array to be filled, mind the size
* @vaddr: address to be mapped to
*
- * Map a page of memory into this domain from another domain's grant table.
+ * Map pages of memory into this domain from another domain's grant table.
* xenbus_map_ring does not allocate the virtual address space (you must do
- * this yourself!). It only maps in the page to the specified address.
+ * this yourself!). It only maps in the pages to the specified address.
* Returns 0 on success, and GNTST_* (see xen/include/interface/grant_table.h)
* or -ENOMEM on error. If an error is returned, device will switch to
- * XenbusStateClosing and the error message will be saved in XenStore.
+ * XenbusStateClosing and the last error message will be saved in XenStore.
*/
-int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
- grant_handle_t *handle, void *vaddr)
+int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref[], int nr_grefs,
+ grant_handle_t handle[], void *vaddr)
{
- struct gnttab_map_grant_ref op;
-
- gnttab_set_map_op(&op, (phys_addr_t)vaddr, GNTMAP_host_map, gnt_ref,
- dev->otherend_id);
+ struct gnttab_map_grant_ref op[XENBUS_MAX_RING_PAGES];
+ int i;
+ int err = GNTST_okay; /* 0 */
+
+ for (i = 0; i < nr_grefs; i++) {
+ unsigned long addr = (unsigned long)vaddr +
+ (PAGE_SIZE * i);
+ gnttab_set_map_op(&op[i], (phys_addr_t)addr,
+ GNTMAP_host_map, gnt_ref[i],
+ dev->otherend_id);
+ }
- if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
+ if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, op, nr_grefs))
BUG();
- if (op.status != GNTST_okay) {
- xenbus_dev_fatal(dev, op.status,
- "mapping in shared page %d from domain %d",
- gnt_ref, dev->otherend_id);
- } else
- *handle = op.handle;
+ for (i = 0; i < nr_grefs; i++) {
+ if (op[i].status != GNTST_okay) {
+ err = op[i].status;
+ xenbus_dev_fatal(dev, err,
+ "mapping in shared page %d from domain %d",
+ gnt_ref[i], dev->otherend_id);
+ handle[i] = INVALID_GRANT_HANDLE;
+ } else
+ handle[i] = op[i].handle;
+ }
- return op.status;
+ if (err != GNTST_okay)
+ xenbus_unmap_ring(dev, handle, nr_grefs, vaddr);
+
+ return err;
}
EXPORT_SYMBOL_GPL(xenbus_map_ring);
@@ -605,13 +669,53 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
}
EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree);
+static int __xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev,
+ struct xenbus_map_node *node)
+{
+ struct gnttab_unmap_grant_ref op[XENBUS_MAX_RING_PAGES];
+ unsigned int level;
+ int i, j;
+ int err = GNTST_okay;
+
+ j = 0;
+ for (i = 0; i < node->nr_handles; i++) {
+ unsigned long vaddr = (unsigned long)node->area->addr +
+ (PAGE_SIZE * i);
+ if (node->handle[i] != INVALID_GRANT_HANDLE) {
+ memset(&op[j], 0, sizeof(op[0]));
+ op[j].host_addr = arbitrary_virt_to_machine(
+ lookup_address(vaddr, &level)).maddr;
+ op[j].handle = node->handle[i];
+ j++;
+ node->handle[i] = INVALID_GRANT_HANDLE;
+ }
+ }
+
+ if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, op, j))
+ BUG();
+
+ node->nr_handles = 0;
+
+ for (i = 0; i < j; i++) {
+ if (op[i].status != GNTST_okay) {
+ err = op[i].status;
+ xenbus_dev_error(dev, err,
+ "unmapping page %d at handle %d error %d",
+ i, op[i].handle, err);
+ }
+ }
+
+ if (err == GNTST_okay)
+ free_vm_area(node->area);
+
+ kfree(node);
+
+ return err;
+}
+
static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
{
struct xenbus_map_node *node;
- struct gnttab_unmap_grant_ref op = {
- .host_addr = (unsigned long)vaddr,
- };
- unsigned int level;
spin_lock(&xenbus_valloc_lock);
list_for_each_entry(node, &xenbus_valloc_pages, next) {
@@ -626,33 +730,18 @@ static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
if (!node) {
xenbus_dev_error(dev, -ENOENT,
- "can't find mapped virtual address %p", vaddr);
+ "can't find mapped virtual address %p", vaddr);
return GNTST_bad_virt_addr;
}
- op.handle = node->handle;
- op.host_addr = arbitrary_virt_to_machine(
- lookup_address((unsigned long)vaddr, &level)).maddr;
-
- if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
- BUG();
-
- if (op.status == GNTST_okay)
- free_vm_area(node->area);
- else
- xenbus_dev_error(dev, op.status,
- "unmapping page at handle %d error %d",
- node->handle, op.status);
-
- kfree(node);
- return op.status;
+ return __xenbus_unmap_ring_vfree_pv(dev, node);
}
static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
{
int rv;
struct xenbus_map_node *node;
- void *addr;
+ void *addr = NULL;
spin_lock(&xenbus_valloc_lock);
list_for_each_entry(node, &xenbus_valloc_pages, next) {
@@ -668,14 +757,14 @@ static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
if (!node) {
xenbus_dev_error(dev, -ENOENT,
- "can't find mapped virtual address %p", vaddr);
+ "can't find mapped virtual address %p", vaddr);
return GNTST_bad_virt_addr;
}
- rv = xenbus_unmap_ring(dev, node->handle, addr);
+ rv = xenbus_unmap_ring(dev, node->handle, node->nr_handles, addr);
if (!rv)
- free_xenballooned_pages(1, &node->page);
+ free_xenballooned_pages(node->nr_handles, &node->page);
else
WARN(1, "Leaking %p\n", vaddr);
@@ -687,6 +776,7 @@ static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
* xenbus_unmap_ring
* @dev: xenbus device
* @handle: grant handle
+ * @nr_handles: number of grant handles
* @vaddr: addr to unmap
*
* Unmap a page of memory in this domain that was imported from another domain.
@@ -694,21 +784,37 @@ static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
* (see xen/include/interface/grant_table.h).
*/
int xenbus_unmap_ring(struct xenbus_device *dev,
- grant_handle_t handle, void *vaddr)
+ grant_handle_t handle[], int nr_handles,
+ void *vaddr)
{
- struct gnttab_unmap_grant_ref op;
-
- gnttab_set_unmap_op(&op, (phys_addr_t)vaddr, GNTMAP_host_map, handle);
+ struct gnttab_unmap_grant_ref op[XENBUS_MAX_RING_PAGES];
+ int i, j;
+ int err = GNTST_okay;
+
+ j = 0;
+ for (i = 0; i < nr_handles; i++) {
+ unsigned long addr = (unsigned long)vaddr +
+ (PAGE_SIZE * i);
+ if (handle[i] != INVALID_GRANT_HANDLE) {
+ gnttab_set_unmap_op(&op[j++], (phys_addr_t)addr,
+ GNTMAP_host_map, handle[i]);
+ handle[i] = INVALID_GRANT_HANDLE;
+ }
+ }
- if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
+ if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, op, j))
BUG();
- if (op.status != GNTST_okay)
- xenbus_dev_error(dev, op.status,
- "unmapping page at handle %d error %d",
- handle, op.status);
+ for (i = 0; i < j; i++) {
+ if (op[i].status != GNTST_okay) {
+ err = op[i].status;
+ xenbus_dev_error(dev, err,
+ "unmapping page at handle %d error %d",
+ op[i].handle, err);
+ }
+ }
- return op.status;
+ return err;
}
EXPORT_SYMBOL_GPL(xenbus_unmap_ring);
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 3864967..62b92d2 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -718,6 +718,7 @@ static int __init xenstored_local_init(void)
return err;
}
+extern void xenbus_ring_ops_init(void);
static int __init xenbus_init(void)
{
int err = 0;
@@ -767,6 +768,8 @@ static int __init xenbus_init(void)
proc_mkdir("xen", NULL);
#endif
+ xenbus_ring_ops_init();
+
out_error:
return err;
}
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index e8c599b..cdbd948 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -195,15 +195,23 @@ int xenbus_watch_pathfmt(struct xenbus_device *dev, struct xenbus_watch *watch,
const char *pathfmt, ...);
int xenbus_switch_state(struct xenbus_device *dev, enum xenbus_state new_state);
-int xenbus_grant_ring(struct xenbus_device *dev, unsigned long ring_mfn);
-int xenbus_map_ring_valloc(struct xenbus_device *dev,
- int gnt_ref, void **vaddr);
-int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
- grant_handle_t *handle, void *vaddr);
+
+#define XENBUS_MAX_RING_ORDER 2
+#define XENBUS_MAX_RING_PAGES (1 << XENBUS_MAX_RING_ORDER)
+
+#define INVALID_GRANT_HANDLE (~0U)
+
+int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
+ int nr_pages, int grefs[]);
+int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref[],
+ int nr_grefs, void **vaddr);
+int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref[], int nr_grefs,
+ grant_handle_t handle[], void *vaddr);
int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr);
int xenbus_unmap_ring(struct xenbus_device *dev,
- grant_handle_t handle, void *vaddr);
+ grant_handle_t handle[], int nr_handles,
+ void *vaddr);
int xenbus_alloc_evtchn(struct xenbus_device *dev, int *port);
int xenbus_bind_evtchn(struct xenbus_device *dev, int remote_port, int *port);
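
For readers following the unmap path in the hunks above: the new code walks the handle array, skips slots already marked INVALID_GRANT_HANDLE, and packs one unmap op per valid handle at the front of the op array, so the op index and the handle index can differ. A minimal user-space sketch of that packing step follows. The struct, the constants and the function name build_unmap_ops() are hypothetical stand-ins for illustration, and the 4096-byte page size is an assumption; this is not the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RING_PAGES   4        /* mirrors XENBUS_MAX_RING_PAGES (1 << 2) */
#define INVALID_HANDLE   (~0U)    /* mirrors INVALID_GRANT_HANDLE */
#define SKETCH_PAGE_SIZE 4096UL   /* assumed page size, for the sketch only */

/* Hypothetical stand-in for struct gnttab_unmap_grant_ref. */
struct unmap_op {
	unsigned long host_addr;
	uint32_t handle;
	int status;
};

/*
 * Build one unmap op per valid handle, packed at the front of ops[].
 * Returns the number of ops built, or -1 if nr_handles is out of range.
 * Each consumed handle is reset to INVALID_HANDLE, as the patch does,
 * so later error reporting has to read the handle back from ops[].
 */
static int build_unmap_ops(unsigned long vaddr, uint32_t handle[],
			   int nr_handles, struct unmap_op ops[])
{
	int i, j = 0;

	if (nr_handles < 0 || nr_handles > MAX_RING_PAGES)
		return -1;

	for (i = 0; i < nr_handles; i++) {
		if (handle[i] == INVALID_HANDLE)
			continue;	/* page i was never mapped */
		ops[j].host_addr = vaddr + SKETCH_PAGE_SIZE * (unsigned long)i;
		ops[j].handle = handle[i];
		ops[j].status = 0;
		j++;
		handle[i] = INVALID_HANDLE;
	}
	return j;
}
```

With handles { 7, invalid, 9, invalid } the sketch builds two ops, and the second op refers to page 2, not page 1. That is why a status loop over the ops should report the handle stored in the op itself.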
^ permalink raw reply related
* RE: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to the host
From: KY Srinivasan @ 2012-03-05 2:29 UTC (permalink / raw)
To: James Bottomley
Cc: Christoph Hellwig, gregkh@linuxfoundation.org,
linux-kernel@vger.kernel.org, devel@linuxdriverproject.org,
virtualization@lists.osdl.org, ohering@suse.com,
linux-scsi@vger.kernel.org, Haiyang Zhang
In-Reply-To: <1330872513.2858.14.camel@dabdike.int.hansenpartnership.com>
> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@HansenPartnership.com]
> Sent: Sunday, March 04, 2012 9:49 AM
> To: KY Srinivasan
> Cc: Christoph Hellwig; gregkh@linuxfoundation.org;
> linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> virtualization@lists.osdl.org; ohering@suse.com; linux-scsi@vger.kernel.org;
> Haiyang Zhang
> Subject: RE: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to
> the host
>
> On Sun, 2012-03-04 at 14:23 +0000, KY Srinivasan wrote:
> >
> > > -----Original Message-----
> > > From: Christoph Hellwig [mailto:hch@infradead.org]
> > > Sent: Sunday, March 04, 2012 4:12 AM
> > > To: KY Srinivasan
> > > Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
> > > devel@linuxdriverproject.org; virtualization@lists.osdl.org; ohering@suse.com;
> > > jbottomley@parallels.com; hch@infradead.org; linux-scsi@vger.kernel.org;
> > > Haiyang Zhang
> > > Subject: Re: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to
> > > the host
> > >
> > > On Fri, Mar 02, 2012 at 12:49:07PM -0800, K. Y. Srinivasan wrote:
> > > > Windows hosts don't handle the ATA_16 command; don't pass it to the host.
> > >
> > > Most devices don't handle it, and answer with an unsupported opcode
> > > sense reason. If hyperv is buggy enough to crap out on it please add
> > > a comment explaining that.
> >
> > The host does not "crap out", it does return an error code but it is not "unsupported opcode".
> > The sense reason that comes back is a generic error SRB_STATUS code. It is easier for me to filter the
> > command on the outgoing side as opposed to dealing with a generic error code that is coming back from
> > the host.
>
> That's the wrong thing to do ... you need to unwrap the error code.
I will see if this is even possible based on the current error codes I get back.
> The reason being I presume it's not impossible for Windows to host a
> device supporting ATA_16 and there are signs that this is going to be
> necessary to prevent data corruption on some USB devices ... if you just
> filter the command without checking if the host supports it, you're
> going to end up perpetuating the corruption problem.
We are talking about virtual block devices exposed to Linux guests running on Windows
hosts. I don't think they will ever need to support the ATA_16 command on these virtual block
devices. I will, however, confirm with the Windows team.
Regards,
K. Y
^ permalink raw reply
* RE: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to the host
From: James Bottomley @ 2012-03-04 14:48 UTC (permalink / raw)
To: KY Srinivasan
Cc: linux-scsi@vger.kernel.org, gregkh@linuxfoundation.org,
Haiyang Zhang, ohering@suse.com, linux-kernel@vger.kernel.org,
Christoph Hellwig, virtualization@lists.osdl.org,
devel@linuxdriverproject.org
In-Reply-To: <6E21E5352C11B742B20C142EB499E0481B74886F@TK5EX14MBXC126.redmond.corp.microsoft.com>
On Sun, 2012-03-04 at 14:23 +0000, KY Srinivasan wrote:
>
> > -----Original Message-----
> > From: Christoph Hellwig [mailto:hch@infradead.org]
> > Sent: Sunday, March 04, 2012 4:12 AM
> > To: KY Srinivasan
> > Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
> > devel@linuxdriverproject.org; virtualization@lists.osdl.org; ohering@suse.com;
> > jbottomley@parallels.com; hch@infradead.org; linux-scsi@vger.kernel.org;
> > Haiyang Zhang
> > Subject: Re: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to
> > the host
> >
> > On Fri, Mar 02, 2012 at 12:49:07PM -0800, K. Y. Srinivasan wrote:
> > > Windows hosts don't handle the ATA_16 command; don't pass it to the host.
> >
> > Most devices don't handle it, and answer with an unsupported opcode
> > sense reason. If hyperv is buggy enough to crap out on it please add
> > a comment explaining that.
>
> The host does not "crap out", it does return an error code but it is not "unsupported opcode".
> The sense reason that comes back is a generic error SRB_STATUS code. It is easier for me to filter the
> command on the outgoing side as opposed to dealing with a generic error code that is coming back from
> the host.
That's the wrong thing to do ... you need to unwrap the error code.
The reason being I presume it's not impossible for Windows to host a
device supporting ATA_16 and there are signs that this is going to be
necessary to prevent data corruption on some USB devices ... if you just
filter the command without checking if the host supports it, you're
going to end up perpetuating the corruption problem.
The general rule of thumb for avoiding this is to let the lower layers
handle as much as possible, and only begin behaviour alterations in the
upper layers if the lower layers have a provable and usually fatal
failure.
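
One way to read "unwrap the error code" is to translate the host's generic failure into the sense data a SCSI device would normally return for an unsupported opcode, on the completion side, so nothing is filtered on the way out. The sketch below is only an illustration under stated assumptions: enum host_status and unwrap_host_error() are hypothetical names, and treating every generic error on ATA_16 as "invalid opcode" is an assumption, not something the host is known to guarantee. The opcode value (0x85), the ILLEGAL REQUEST sense key (0x05), the additional sense code (0x20) and the fixed-format sense layout are standard SCSI values.

```c
#include <stdint.h>
#include <string.h>

#define ATA_16_OPCODE         0x85	/* ATA PASS-THROUGH (16) */
#define SENSE_ILLEGAL_REQUEST 0x05	/* sense key */
#define ASC_INVALID_OPCODE    0x20	/* INVALID COMMAND OPERATION CODE */
#define SENSE_LEN             18	/* fixed-format sense buffer size */

/* Hypothetical stand-in for the status the host hands back. */
enum host_status { HOST_OK = 0, HOST_GENERIC_ERROR = 1 };

/*
 * If the host failed an ATA_16 command with its generic error, fill
 * sense[] with fixed-format "ILLEGAL REQUEST / invalid opcode" data
 * and return 1. Otherwise leave sense[] untouched and return 0.
 */
static int unwrap_host_error(uint8_t opcode, enum host_status st,
			     uint8_t sense[SENSE_LEN])
{
	if (st != HOST_GENERIC_ERROR || opcode != ATA_16_OPCODE)
		return 0;

	memset(sense, 0, SENSE_LEN);
	sense[0] = 0x70;			/* fixed format, current error */
	sense[2] = SENSE_ILLEGAL_REQUEST;
	sense[7] = SENSE_LEN - 8;		/* additional sense length */
	sense[12] = ASC_INVALID_OPCODE;
	sense[13] = 0x00;			/* no qualifier */
	return 1;
}
```

The point of doing the translation here is that the command still reaches the host, so a host that does support ATA_16 is never prevented from handling it.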
James
^ permalink raw reply
* RE: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to the host
From: KY Srinivasan @ 2012-03-04 14:23 UTC (permalink / raw)
To: Christoph Hellwig
Cc: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
devel@linuxdriverproject.org, virtualization@lists.osdl.org,
ohering@suse.com, jbottomley@parallels.com,
linux-scsi@vger.kernel.org, Haiyang Zhang
In-Reply-To: <20120304091225.GA27297@infradead.org>
> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@infradead.org]
> Sent: Sunday, March 04, 2012 4:12 AM
> To: KY Srinivasan
> Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; virtualization@lists.osdl.org; ohering@suse.com;
> jbottomley@parallels.com; hch@infradead.org; linux-scsi@vger.kernel.org;
> Haiyang Zhang
> Subject: Re: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to
> the host
>
> On Fri, Mar 02, 2012 at 12:49:07PM -0800, K. Y. Srinivasan wrote:
> > Windows hosts don't handle the ATA_16 command; don't pass it to the host.
>
> Most devices don't handle it, and answer with an unsupported opcode
> sense reason. If hyperv is buggy enough to crap out on it please add
> a comment explaining that.
The host does not "crap out", it does return an error code but it is not "unsupported opcode".
The sense reason that comes back is a generic error SRB_STATUS code. It is easier for me to filter the
command on the outgoing side as opposed to dealing with a generic error code that is coming back from
the host.
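
For illustration, filtering on the outgoing side amounts to a check on the first CDB byte before the request is queued to the host. This is a minimal user-space sketch, not the patch itself: should_filter_cdb() is a hypothetical name, and what the driver does with a filtered command (which result code it completes with) is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define ATA_16_OPCODE 0x85	/* ATA PASS-THROUGH (16) */

/*
 * Return 1 if the command should be completed locally instead of being
 * passed to the host, 0 otherwise. Only the opcode byte is examined.
 */
static int should_filter_cdb(const uint8_t *cdb, size_t len)
{
	return cdb != NULL && len > 0 && cdb[0] == ATA_16_OPCODE;
}
```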
Regards,
K. Y
^ permalink raw reply
* Re: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to the host
From: Christoph Hellwig @ 2012-03-04 9:12 UTC (permalink / raw)
To: K. Y. Srinivasan
Cc: gregkh, linux-kernel, devel, virtualization, ohering, jbottomley,
hch, linux-scsi, Haiyang Zhang
In-Reply-To: <1330721347-26781-1-git-send-email-kys@microsoft.com>
On Fri, Mar 02, 2012 at 12:49:07PM -0800, K. Y. Srinivasan wrote:
> Windows hosts don't handle the ATA_16 command; don't pass it to the host.
Most devices don't handle it, and answer with an unsupported opcode
sense reason. If hyperv is buggy enough to crap out on it please add
a comment explaining that.
^ permalink raw reply
* RE: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to the host
From: KY Srinivasan @ 2012-03-02 21:33 UTC (permalink / raw)
To: Greg KH
Cc: linux-scsi@vger.kernel.org, Haiyang Zhang, ohering@suse.com,
jbottomley@parallels.com, linux-kernel@vger.kernel.org,
hch@infradead.org, virtualization@lists.osdl.org,
devel@linuxdriverproject.org
In-Reply-To: <20120302213116.GA23443@kroah.com>
> -----Original Message-----
> From: Greg KH [mailto:gregkh@linuxfoundation.org]
> Sent: Friday, March 02, 2012 4:31 PM
> To: KY Srinivasan
> Cc: linux-scsi@vger.kernel.org; Haiyang Zhang; ohering@suse.com;
> jbottomley@parallels.com; linux-kernel@vger.kernel.org; hch@infradead.org;
> virtualization@lists.osdl.org; devel@linuxdriverproject.org
> Subject: Re: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to
> the host
>
> On Fri, Mar 02, 2012 at 09:22:38PM +0000, KY Srinivasan wrote:
> >
> >
> > > -----Original Message-----
> > > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > > Sent: Friday, March 02, 2012 4:14 PM
> > > To: KY Srinivasan
> > > Cc: linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> > > virtualization@lists.osdl.org; ohering@suse.com; jbottomley@parallels.com;
> > > hch@infradead.org; linux-scsi@vger.kernel.org; Haiyang Zhang
> > > Subject: Re: [PATCH 1/1] Drivers: scsi: storvsc: Don't pass ATA_16 command to
> > > the host
> > >
> > > On Fri, Mar 02, 2012 at 12:49:07PM -0800, K. Y. Srinivasan wrote:
> > > > Windows hosts don't handle the ATA_16 command; don't pass it to the host.
> > > >
> > > > Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> > > > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > > > ---
> > > > drivers/scsi/storvsc_drv.c | 2 ++
> > > > 1 files changed, 2 insertions(+), 0 deletions(-)
> > >
> > > Should this go to older kernel versions as well?
> >
> > I think it should. Do you want me to resend this patch with the correct tag?
> > Also, given that storvsc has changed so much over the last several months,
> > this patch may or may not apply to earlier versions of this driver even though
> > this patch itself is quite trivial.
>
> I'll tag it for the stable tree, then when it doesn't apply, you will
> get an email saying it didn't, so you can then send me the correct one
> :)
Thanks Greg.
K. Y
^ permalink raw reply