public inbox for linux-kernel@vger.kernel.org
* Re: [PATCH] Allow NBD to be used locally
@ 2008-02-02 17:31 devzero
  2008-02-03  0:54 ` Jan Engelhardt
  0 siblings, 1 reply; 11+ messages in thread
From: devzero @ 2008-02-02 17:31 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Laurent.Vivier, linux-kernel, pavel

> How will that work? Fuse makes up a filesystem - not helpful
> if you have a raw disk without a known fs to mount.

Take zfs-fuse or ntfs-3g, for example: you have a block device or backing file containing on-disk data structures, and FUSE makes those show up as a filesystem. I think vmware-mount is no different here.
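To make that principle concrete, here is a minimal sketch in the spirit of zfs-fuse/ntfs-3g. The container format below (a record count followed by fixed-size name/length headers) is invented purely for illustration and is not any real on-disk layout; a FUSE daemon built on such a parser would expose each record as a file.

```c
/*
 * Invented toy container format (NOT any real on-disk layout), only to
 * illustrate the principle: the backing device or file is just bytes
 * holding data structures, and a FUSE daemon parses them and presents
 * each record as a file.
 *
 * Image layout: u32 record count, then per record:
 *   char name[32] (NUL-padded), u32 length, <length> data bytes.
 */
#include <stdint.h>
#include <string.h>

struct toy_record {
	char name[32];
	uint32_t len;
	const uint8_t *data;	/* points into the image buffer */
};

/* Walk the image and fill out[]; returns the number of records found. */
static size_t toy_parse(const uint8_t *img, size_t img_len,
			struct toy_record *out, size_t max)
{
	uint32_t count;
	size_t off = 4, n = 0;

	if (img_len < 4)
		return 0;
	memcpy(&count, img, 4);	/* host byte order, illustration only */
	while (n < count && n < max && off + 36 <= img_len) {
		memcpy(out[n].name, img + off, 32);
		memcpy(&out[n].len, img + off + 32, 4);
		out[n].data = img + off + 36;
		if (off + 36 + out[n].len > img_len)
			break;
		off += 36 + out[n].len;
		n++;
	}
	return n;
}
```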

> This still does not account for compressed disk images, for example.
Unfortunately, not.

> On Feb 2 2008 15:40, devzero@web.de wrote:
> >
> >>In fact, VMware uses local nbd today for its vmware-loop helper
> >>utility, most likely because of the above-mentioned reasons. (Though
> >>it quite often hung last time I tried.)
> >
> >seems this will go away. recent vmware versions (e.g. server 2.0
> >beta) have a fuse based replacement for that.
> 
> How will that work? Fuse makes up a filesystem - not helpful
> if you have a raw disk without a known fs to mount.
> 
> >>So what we have is non-linearity -- LBA 22 comes after LBA 40 -- loop
> >>does not deal with that.
> >
> >maybe dm-loop does? http://sources.redhat.com/lvm2/wiki/DMLoop
> 
> This still does not account for compressed disk images, for example.


* Re: [PATCH] Allow NBD to be used locally
@ 2008-02-02 14:40 devzero
  2008-02-02 16:57 ` Jan Engelhardt
  0 siblings, 1 reply; 11+ messages in thread
From: devzero @ 2008-02-02 14:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: Laurent.Vivier, jengelh, pavel

>In fact, VMware uses local nbd today for its vmware-loop helper
>utility, most likely because of the above-mentioned reasons. (Though
>it quite often hung last time I tried.)

Seems this will go away: recent VMware versions (e.g. Server 2.0 beta) have a FUSE-based replacement for that.

ldd /usr/bin/vmware-mount
        linux-gate.so.1 =>  (0xffffe000)
        libz.so.1 => /lib/libz.so.1 (0xb7f95000)
! ->  libfuse.so.2 => /lib/libfuse.so.2 (0xb7f79000)   
        libpthread.so.0 => /lib/libpthread.so.0 (0xb7f61000)
        libdl.so.2 => /lib/libdl.so.2 (0xb7f5d000)
        libc.so.6 => /lib/libc.so.6 (0xb7e1c000)
        /lib/ld-linux.so.2 (0xb7fbd000)
        librt.so.1 => /lib/librt.so.1 (0xb7e13000)

I'm not sure this is the perfect approach (probably slower), but at least it shouldn't have the stability issues the nbd one has.

I always felt uncomfortable with the nbd approach; that's why I started the following discussion thread:
http://communities.vmware.com/message/854746

Anyway, I can see a point in using nbd locally, but I think it shouldn't be abused for mapping local disk images of any kind, even if it has better capabilities than loop and others. Why should local disk data be sent through the network layer? Isn't device-mapper the better infrastructure here?

>So what we have is non-linearity -- LBA 22 comes after LBA 40 -- loop
>does not deal with that.
maybe dm-loop does? http://sources.redhat.com/lvm2/wiki/DMLoop
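For the static part of such a mapping, device-mapper's "linear" target can indeed stitch non-contiguous regions of a backing device into one block device. A sketch of a dm table (the device name and sector numbers are invented for illustration); each line reads `<start sector> <length in sectors> linear <backing device> <backing offset>` and the table would be loaded with `dmsetup create <name> <table file>`:

```
0   128 linear /dev/loop0 2048
128 128 linear /dev/loop0 256
```

Each line maps a run of sectors of the new device onto an arbitrary offset of the backing device, so out-of-order layouts are expressible without any network round trip.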

regards
roland



>On Feb 2 2008 12:23, Pavel Machek wrote:
>>On Fri 2008-02-01 14:25:32, Laurent Vivier wrote:
>>> This patch allows Network Block Device to be mounted locally.
>>
>>What is local nbd good for? Use loop instead...
>
>Local NBD is good for when the content you want to make available
>through the block device is dynamic (generated on-the-fly),
>non-linear or supersparse.
>
>Take for example VMware virtual disks. Just a guess, but
>they roughly can look like this:
>
>  kilobytes  0.. 1: header
>  kilobytes  1..10: correspond to LBA 0..20
>  kilobytes 11..20: correspond to LBA 40..60
>  kilobytes 21..22: correspond to LBA 22..23
>
>So what we have is non-linearity -- LBA 22 comes after LBA 40 -- loop
>does not deal with that.
>
>And there is supersparsity -- the VMDK file itself is complete, but
>unallocated regions like LBA 24..40 are sparse/zero when projected
>onto a file/block device, respectively; loop cannot deal with that
>either.
>
>In fact, VMware uses local nbd today for its vmware-loop helper
>utility, most likely because of the above-mentioned reasons. (Though
>it quite often hung last time I tried.)
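The non-linear, supersparse layout guessed at above can be made concrete with a small sketch: a table of extents maps LBA ranges to kilobyte offsets in the image file, and any LBA outside every extent is unallocated and must read as zeros. The numbers mirror the example in the mail; this is NOT the real VMDK format.

```c
/*
 * Sketch of the non-linear, supersparse mapping guessed at above.
 * The numbers mirror the example in the mail; NOT the real VMDK format.
 */
#include <stddef.h>

struct extent {
	unsigned int first_lba, last_lba;	/* inclusive LBA range */
	unsigned int file_kb;			/* where it starts in the file */
};

static const struct extent extents[] = {
	{  0, 20,  1 },	/* kilobytes  1..10 hold LBA  0..20 */
	{ 40, 60, 11 },	/* kilobytes 11..20 hold LBA 40..60 */
	{ 22, 23, 21 },	/* kilobytes 21..22 hold LBA 22..23 */
};

/* Return the extent backing an LBA, or NULL for a sparse (all-zero) read. */
static const struct extent *locate(unsigned int lba)
{
	size_t i;

	for (i = 0; i < sizeof(extents) / sizeof(extents[0]); i++)
		if (extents[i].first_lba <= lba && lba <= extents[i].last_lba)
			return &extents[i];
	return NULL;
}
```

A plain loop device cannot express this: LBA 22's data sits at a later file offset than LBA 40's, and LBA 24..39 has no backing at all.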


* [PATCH] Allow NBD to be used locally
@ 2008-02-01 13:25 Laurent Vivier
  2008-02-02 11:23 ` Pavel Machek
  0 siblings, 1 reply; 11+ messages in thread
From: Laurent Vivier @ 2008-02-01 13:25 UTC (permalink / raw)
  To: Paul.Clements; +Cc: nbd-general, linux-kernel, Laurent Vivier

This patch allows Network Block Device to be mounted locally.

It creates a kthread to avoid the deadlock described in the NBD tools documentation: if nbd-client hangs waiting for pages, the kblockd thread can continue its work and free pages.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
---
 drivers/block/nbd.c |  146 ++++++++++++++++++++++++++++++++++-----------------
 include/linux/nbd.h |    4 +-
 2 files changed, 100 insertions(+), 50 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index b4c0888..de6685e 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -29,6 +29,7 @@
 #include <linux/kernel.h>
 #include <net/sock.h>
 #include <linux/net.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 #include <asm/system.h>
@@ -434,6 +435,87 @@ static void nbd_clear_que(struct nbd_device *lo)
 }
 
 
+static void nbd_handle_req(struct nbd_device *lo, struct request *req)
+{
+	if (!blk_fs_request(req))
+		goto error_out;
+
+	nbd_cmd(req) = NBD_CMD_READ;
+	if (rq_data_dir(req) == WRITE) {
+		nbd_cmd(req) = NBD_CMD_WRITE;
+		if (lo->flags & NBD_READ_ONLY) {
+			printk(KERN_ERR "%s: Write on read-only\n",
+					lo->disk->disk_name);
+			goto error_out;
+		}
+	}
+
+	req->errors = 0;
+
+	mutex_lock(&lo->tx_lock);
+	if (unlikely(!lo->sock)) {
+		mutex_unlock(&lo->tx_lock);
+		printk(KERN_ERR "%s: Attempted send on closed socket\n",
+		       lo->disk->disk_name);
+		req->errors++;
+		nbd_end_request(req);
+		return;
+	}
+
+	lo->active_req = req;
+
+	if (nbd_send_req(lo, req) != 0) {
+		printk(KERN_ERR "%s: Request send failed\n",
+				lo->disk->disk_name);
+		req->errors++;
+		nbd_end_request(req);
+	} else {
+		spin_lock(&lo->queue_lock);
+		list_add(&req->queuelist, &lo->queue_head);
+		spin_unlock(&lo->queue_lock);
+	}
+
+	lo->active_req = NULL;
+	mutex_unlock(&lo->tx_lock);
+	wake_up_all(&lo->active_wq);
+
+	return;
+
+error_out:
+	req->errors++;
+	nbd_end_request(req);
+}
+
+static int nbd_thread(void *data)
+{
+	struct nbd_device *lo = data;
+	struct request *req;
+
+	set_user_nice(current, -20);
+	while (!kthread_should_stop() || !list_empty(&lo->waiting_queue)) {
+		/* wait something to do */
+		wait_event_interruptible(lo->waiting_wq,
+					 kthread_should_stop() ||
+					 !list_empty(&lo->waiting_queue));
+
+		/* extract request */
+
+		if (list_empty(&lo->waiting_queue))
+			continue;
+
+		spin_lock_irq(&lo->queue_lock);
+		req = list_entry(lo->waiting_queue.next, struct request,
+				 queuelist);
+		list_del_init(&req->queuelist);
+		spin_unlock_irq(&lo->queue_lock);
+
+		/* handle request */
+
+		nbd_handle_req(lo, req);
+	}
+	return 0;
+}
+
 /*
  * We always wait for result of write, for now. It would be nice to make it optional
  * in future
@@ -449,65 +531,23 @@ static void do_nbd_request(struct request_queue * q)
 		struct nbd_device *lo;
 
 		blkdev_dequeue_request(req);
+
+		spin_unlock_irq(q->queue_lock);
+
 		dprintk(DBG_BLKDEV, "%s: request %p: dequeued (flags=%x)\n",
 				req->rq_disk->disk_name, req, req->cmd_type);
 
-		if (!blk_fs_request(req))
-			goto error_out;
-
 		lo = req->rq_disk->private_data;
 
 		BUG_ON(lo->magic != LO_MAGIC);
 
-		nbd_cmd(req) = NBD_CMD_READ;
-		if (rq_data_dir(req) == WRITE) {
-			nbd_cmd(req) = NBD_CMD_WRITE;
-			if (lo->flags & NBD_READ_ONLY) {
-				printk(KERN_ERR "%s: Write on read-only\n",
-						lo->disk->disk_name);
-				goto error_out;
-			}
-		}
+		spin_lock_irq(&lo->queue_lock);
+		list_add_tail(&req->queuelist, &lo->waiting_queue);
+		spin_unlock_irq(&lo->queue_lock);
 
-		req->errors = 0;
-		spin_unlock_irq(q->queue_lock);
-
-		mutex_lock(&lo->tx_lock);
-		if (unlikely(!lo->sock)) {
-			mutex_unlock(&lo->tx_lock);
-			printk(KERN_ERR "%s: Attempted send on closed socket\n",
-			       lo->disk->disk_name);
-			req->errors++;
-			nbd_end_request(req);
-			spin_lock_irq(q->queue_lock);
-			continue;
-		}
-
-		lo->active_req = req;
-
-		if (nbd_send_req(lo, req) != 0) {
-			printk(KERN_ERR "%s: Request send failed\n",
-					lo->disk->disk_name);
-			req->errors++;
-			nbd_end_request(req);
-		} else {
-			spin_lock(&lo->queue_lock);
-			list_add(&req->queuelist, &lo->queue_head);
-			spin_unlock(&lo->queue_lock);
-		}
-
-		lo->active_req = NULL;
-		mutex_unlock(&lo->tx_lock);
-		wake_up_all(&lo->active_wq);
+		wake_up(&lo->waiting_wq);
 
 		spin_lock_irq(q->queue_lock);
-		continue;
-
-error_out:
-		req->errors++;
-		spin_unlock(q->queue_lock);
-		nbd_end_request(req);
-		spin_lock(q->queue_lock);
 	}
 }
 
@@ -517,6 +557,7 @@ static int nbd_ioctl(struct inode *inode, struct file *file,
 	struct nbd_device *lo = inode->i_bdev->bd_disk->private_data;
 	int error;
 	struct request sreq ;
+	struct task_struct *thread;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -599,7 +640,12 @@ static int nbd_ioctl(struct inode *inode, struct file *file,
 	case NBD_DO_IT:
 		if (!lo->file)
 			return -EINVAL;
+		thread = kthread_create(nbd_thread, lo, lo->disk->disk_name);
+		if (IS_ERR(thread))
+			return PTR_ERR(thread);
+		wake_up_process(thread);
 		error = nbd_do_it(lo);
+		kthread_stop(thread);
 		if (error)
 			return error;
 		sock_shutdown(lo, 1);
@@ -684,10 +730,12 @@ static int __init nbd_init(void)
 		nbd_dev[i].file = NULL;
 		nbd_dev[i].magic = LO_MAGIC;
 		nbd_dev[i].flags = 0;
+		INIT_LIST_HEAD(&nbd_dev[i].waiting_queue);
 		spin_lock_init(&nbd_dev[i].queue_lock);
 		INIT_LIST_HEAD(&nbd_dev[i].queue_head);
 		mutex_init(&nbd_dev[i].tx_lock);
 		init_waitqueue_head(&nbd_dev[i].active_wq);
+		init_waitqueue_head(&nbd_dev[i].waiting_wq);
 		nbd_dev[i].blksize = 1024;
 		nbd_dev[i].bytesize = 0;
 		disk->major = NBD_MAJOR;
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
index cc2b472..94f40c9 100644
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -57,9 +57,11 @@ struct nbd_device {
 	int magic;
 
 	spinlock_t queue_lock;
-	struct list_head queue_head;/* Requests are added here...	*/
+	struct list_head queue_head;	/* Requests waiting result */
 	struct request *active_req;
 	wait_queue_head_t active_wq;
+	struct list_head waiting_queue;	/* Requests to be sent */
+	wait_queue_head_t waiting_wq;
 
 	struct mutex tx_lock;
 	struct gendisk *disk;
-- 
1.5.2.4




Thread overview: 11+ messages
-- links below jump to the message on this page --
2008-02-02 17:31 [PATCH] Allow NBD to be used locally devzero
2008-02-03  0:54 ` Jan Engelhardt
2008-02-03  6:02   ` Kyle Moffett
  -- strict thread matches above, loose matches on Subject: below --
2008-02-02 14:40 devzero
2008-02-02 16:57 ` Jan Engelhardt
2008-02-01 13:25 Laurent Vivier
2008-02-02 11:23 ` Pavel Machek
2008-02-02 11:52   ` Jan Engelhardt
2008-02-02 15:26   ` Laurent Vivier
2008-02-02 16:13     ` Miklos Szeredi
2008-02-02 20:54     ` Pavel Machek
