public inbox for linux-kernel@vger.kernel.org
* Re: [PATCH] Allow NBD to be used locally
@ 2008-02-02 17:31 devzero
  2008-02-03  0:54 ` Jan Engelhardt
  0 siblings, 1 reply; 11+ messages in thread
From: devzero @ 2008-02-02 17:31 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Laurent.Vivier, linux-kernel, pavel

> How will that work? Fuse makes up a filesystem - not helpful
> if you have a raw disk without a known fs to mount.

Take zfs-fuse or ntfs-3g, for example: you have a block device or backing file containing on-disk data structures, and FUSE makes those show up as a filesystem. I think vmware-mount is no different here.
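To make that principle concrete, here is a minimal sketch in the spirit of zfs-fuse/ntfs-3g. The container format below (a record count followed by fixed-size name/length headers) is invented purely for illustration and is not any real on-disk layout; a FUSE daemon built on such a parser would expose each record as a file.

```c
/*
 * Invented toy container format (NOT any real on-disk layout), only to
 * illustrate the principle: the backing device or file is just bytes
 * holding data structures, and a FUSE daemon parses them and presents
 * each record as a file.
 *
 * Image layout: u32 record count, then per record:
 *   char name[32] (NUL-padded), u32 length, <length> data bytes.
 */
#include <stdint.h>
#include <string.h>

struct toy_record {
	char name[32];
	uint32_t len;
	const uint8_t *data;	/* points into the image buffer */
};

/* Walk the image and fill out[]; returns the number of records found. */
static size_t toy_parse(const uint8_t *img, size_t img_len,
			struct toy_record *out, size_t max)
{
	uint32_t count;
	size_t off = 4, n = 0;

	if (img_len < 4)
		return 0;
	memcpy(&count, img, 4);	/* host byte order, illustration only */
	while (n < count && n < max && off + 36 <= img_len) {
		memcpy(out[n].name, img + off, 32);
		memcpy(&out[n].len, img + off + 32, 4);
		out[n].data = img + off + 36;
		if (off + 36 + out[n].len > img_len)
			break;
		off += 36 + out[n].len;
		n++;
	}
	return n;
}
```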

> This still does not account for compressed disk images, for example.
Unfortunately, not.

> On Feb 2 2008 15:40, devzero@web.de wrote:
> >
> >>In fact, VMware uses local nbd today for its vmware-loop helper
> >>utility, most likely because of the above-mentioned reasons. (Though
> >>it quite often hung last time I tried.)
> >
> >seems this will go away. recent vmware versions (e.g. server 2.0
> >beta) have a fuse based replacement for that.
> 
> How will that work? Fuse makes up a filesystem - not helpful
> if you have a raw disk without a known fs to mount.
> 
> >>So what we have is non-linearity -- LBA 22 comes after LBA 40 -- loop
> >>does not deal with that.
> >
> >maybe dm-loop does? http://sources.redhat.com/lvm2/wiki/DMLoop
> 
> This still does not account for compressed disk images, for example.


* Re: [PATCH] Allow NBD to be used locally
@ 2008-02-02 14:40 devzero
  2008-02-02 16:57 ` Jan Engelhardt
  0 siblings, 1 reply; 11+ messages in thread
From: devzero @ 2008-02-02 14:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: Laurent.Vivier, jengelh, pavel

>In fact, VMware uses local nbd today for its vmware-loop helper
>utility, most likely because of the above-mentioned reasons. (Though
>it quite often hung last time I tried.)

Seems this will go away: recent VMware versions (e.g. Server 2.0 beta) have a FUSE-based replacement for that.

ldd /usr/bin/vmware-mount
        linux-gate.so.1 =>  (0xffffe000)
        libz.so.1 => /lib/libz.so.1 (0xb7f95000)
! ->  libfuse.so.2 => /lib/libfuse.so.2 (0xb7f79000)   
        libpthread.so.0 => /lib/libpthread.so.0 (0xb7f61000)
        libdl.so.2 => /lib/libdl.so.2 (0xb7f5d000)
        libc.so.6 => /lib/libc.so.6 (0xb7e1c000)
        /lib/ld-linux.so.2 (0xb7fbd000)
        librt.so.1 => /lib/librt.so.1 (0xb7e13000)

I'm not sure this is the perfect approach (probably slower), but at least it shouldn't have the stability issues the nbd one has.

I always felt uncomfortable with the nbd approach; that's why I started the following discussion thread:
http://communities.vmware.com/message/854746

Anyway, I can see a point in using nbd locally, but I think it shouldn't be abused for mapping local disk images of any kind, even if it has better capabilities than loop and others. Why should local disk data be sent through the network layer? Isn't device-mapper the better infrastructure here?

>So what we have is non-linearity -- LBA 22 comes after LBA 40 -- loop
>does not deal with that.
maybe dm-loop does? http://sources.redhat.com/lvm2/wiki/DMLoop
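For the static part of such a mapping, device-mapper's "linear" target can indeed stitch non-contiguous regions of a backing device into one block device. A sketch of a dm table (the device name and sector numbers are invented for illustration); each line reads `<start sector> <length in sectors> linear <backing device> <backing offset>` and the table would be loaded with `dmsetup create <name> <table file>`:

```
0   128 linear /dev/loop0 2048
128 128 linear /dev/loop0 256
```

Each line maps a run of sectors of the new device onto an arbitrary offset of the backing device, so out-of-order layouts are expressible without any network round trip.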

regards
roland



>On Feb 2 2008 12:23, Pavel Machek wrote:
>>On Fri 2008-02-01 14:25:32, Laurent Vivier wrote:
>>> This patch allows Network Block Device to be mounted locally.
>>
>>What is local nbd good for? Use loop instead...
>
>Local NBD is good for when the content you want to make available
>through the block device is dynamic (generated on-the-fly),
>non-linear or supersparse.
>
>Take for example VMware virtual disks. Just a guess, but
>they roughly can look like this:
>
>  kilobytes  0.. 1: header
>  kilobytes  1..10: correspond to LBA 0..20
>  kilobytes 11..20: correspond to LBA 40..60
>  kilobytes 21..22: correspond to LBA 22..23
>
>So what we have is non-linearity -- LBA 22 comes after LBA 40 -- loop
>does not deal with that.
>
>And there is supersparsity -- the VMDK file itself is complete, but
>unallocated regions like LBA 24..40 are sparse/zero when projected
>onto a file/block device, respectively; loop cannot deal with that
>either.
>
>In fact, VMware uses local nbd today for its vmware-loop helper
>utility, most likely because of the above-mentioned reasons. (Though
>it quite often hung last time I tried.)
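The non-linear, supersparse layout guessed at above can be made concrete with a small sketch: a table of extents maps LBA ranges to kilobyte offsets in the image file, and any LBA outside every extent is unallocated and must read as zeros. The numbers mirror the example in the mail; this is NOT the real VMDK format.

```c
/*
 * Sketch of the non-linear, supersparse mapping guessed at above.
 * The numbers mirror the example in the mail; NOT the real VMDK format.
 */
#include <stddef.h>

struct extent {
	unsigned int first_lba, last_lba;	/* inclusive LBA range */
	unsigned int file_kb;			/* where it starts in the file */
};

static const struct extent extents[] = {
	{  0, 20,  1 },	/* kilobytes  1..10 hold LBA  0..20 */
	{ 40, 60, 11 },	/* kilobytes 11..20 hold LBA 40..60 */
	{ 22, 23, 21 },	/* kilobytes 21..22 hold LBA 22..23 */
};

/* Return the extent backing an LBA, or NULL for a sparse (all-zero) read. */
static const struct extent *locate(unsigned int lba)
{
	size_t i;

	for (i = 0; i < sizeof(extents) / sizeof(extents[0]); i++)
		if (extents[i].first_lba <= lba && lba <= extents[i].last_lba)
			return &extents[i];
	return NULL;
}
```

A plain loop device cannot express this: LBA 22's data sits at a later file offset than LBA 40's, and LBA 24..39 has no backing at all.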


* [PATCH] Allow NBD to be used locally
@ 2008-02-01 13:25 Laurent Vivier
  2008-02-02 11:23 ` Pavel Machek
  0 siblings, 1 reply; 11+ messages in thread
From: Laurent Vivier @ 2008-02-01 13:25 UTC (permalink / raw)
  To: Paul.Clements; +Cc: nbd-general, linux-kernel, Laurent Vivier

This patch allows Network Block Device to be mounted locally.

It creates a kthread to avoid the deadlock described in the NBD tools documentation: if nbd-client hangs waiting for pages, the kblockd thread can continue its work and free pages.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
---
 drivers/block/nbd.c |  146 ++++++++++++++++++++++++++++++++++-----------------
 include/linux/nbd.h |    4 +-
 2 files changed, 100 insertions(+), 50 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index b4c0888..de6685e 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -29,6 +29,7 @@
 #include <linux/kernel.h>
 #include <net/sock.h>
 #include <linux/net.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 #include <asm/system.h>
@@ -434,6 +435,87 @@ static void nbd_clear_que(struct nbd_device *lo)
 }
 
 
+static void nbd_handle_req(struct nbd_device *lo, struct request *req)
+{
+	if (!blk_fs_request(req))
+		goto error_out;
+
+	nbd_cmd(req) = NBD_CMD_READ;
+	if (rq_data_dir(req) == WRITE) {
+		nbd_cmd(req) = NBD_CMD_WRITE;
+		if (lo->flags & NBD_READ_ONLY) {
+			printk(KERN_ERR "%s: Write on read-only\n",
+					lo->disk->disk_name);
+			goto error_out;
+		}
+	}
+
+	req->errors = 0;
+
+	mutex_lock(&lo->tx_lock);
+	if (unlikely(!lo->sock)) {
+		mutex_unlock(&lo->tx_lock);
+		printk(KERN_ERR "%s: Attempted send on closed socket\n",
+		       lo->disk->disk_name);
+		req->errors++;
+		nbd_end_request(req);
+		return;
+	}
+
+	lo->active_req = req;
+
+	if (nbd_send_req(lo, req) != 0) {
+		printk(KERN_ERR "%s: Request send failed\n",
+				lo->disk->disk_name);
+		req->errors++;
+		nbd_end_request(req);
+	} else {
+		spin_lock(&lo->queue_lock);
+		list_add(&req->queuelist, &lo->queue_head);
+		spin_unlock(&lo->queue_lock);
+	}
+
+	lo->active_req = NULL;
+	mutex_unlock(&lo->tx_lock);
+	wake_up_all(&lo->active_wq);
+
+	return;
+
+error_out:
+	req->errors++;
+	nbd_end_request(req);
+}
+
+static int nbd_thread(void *data)
+{
+	struct nbd_device *lo = data;
+	struct request *req;
+
+	set_user_nice(current, -20);
+	while (!kthread_should_stop() || !list_empty(&lo->waiting_queue)) {
+		/* wait something to do */
+		wait_event_interruptible(lo->waiting_wq,
+					 kthread_should_stop() ||
+					 !list_empty(&lo->waiting_queue));
+
+		/* extract request */
+
+		if (list_empty(&lo->waiting_queue))
+			continue;
+
+		spin_lock_irq(&lo->queue_lock);
+		req = list_entry(lo->waiting_queue.next, struct request,
+				 queuelist);
+		list_del_init(&req->queuelist);
+		spin_unlock_irq(&lo->queue_lock);
+
+		/* handle request */
+
+		nbd_handle_req(lo, req);
+	}
+	return 0;
+}
+
 /*
  * We always wait for result of write, for now. It would be nice to make it optional
  * in future
@@ -449,65 +531,23 @@ static void do_nbd_request(struct request_queue * q)
 		struct nbd_device *lo;
 
 		blkdev_dequeue_request(req);
+
+		spin_unlock_irq(q->queue_lock);
+
 		dprintk(DBG_BLKDEV, "%s: request %p: dequeued (flags=%x)\n",
 				req->rq_disk->disk_name, req, req->cmd_type);
 
-		if (!blk_fs_request(req))
-			goto error_out;
-
 		lo = req->rq_disk->private_data;
 
 		BUG_ON(lo->magic != LO_MAGIC);
 
-		nbd_cmd(req) = NBD_CMD_READ;
-		if (rq_data_dir(req) == WRITE) {
-			nbd_cmd(req) = NBD_CMD_WRITE;
-			if (lo->flags & NBD_READ_ONLY) {
-				printk(KERN_ERR "%s: Write on read-only\n",
-						lo->disk->disk_name);
-				goto error_out;
-			}
-		}
+		spin_lock_irq(&lo->queue_lock);
+		list_add_tail(&req->queuelist, &lo->waiting_queue);
+		spin_unlock_irq(&lo->queue_lock);
 
-		req->errors = 0;
-		spin_unlock_irq(q->queue_lock);
-
-		mutex_lock(&lo->tx_lock);
-		if (unlikely(!lo->sock)) {
-			mutex_unlock(&lo->tx_lock);
-			printk(KERN_ERR "%s: Attempted send on closed socket\n",
-			       lo->disk->disk_name);
-			req->errors++;
-			nbd_end_request(req);
-			spin_lock_irq(q->queue_lock);
-			continue;
-		}
-
-		lo->active_req = req;
-
-		if (nbd_send_req(lo, req) != 0) {
-			printk(KERN_ERR "%s: Request send failed\n",
-					lo->disk->disk_name);
-			req->errors++;
-			nbd_end_request(req);
-		} else {
-			spin_lock(&lo->queue_lock);
-			list_add(&req->queuelist, &lo->queue_head);
-			spin_unlock(&lo->queue_lock);
-		}
-
-		lo->active_req = NULL;
-		mutex_unlock(&lo->tx_lock);
-		wake_up_all(&lo->active_wq);
+		wake_up(&lo->waiting_wq);
 
 		spin_lock_irq(q->queue_lock);
-		continue;
-
-error_out:
-		req->errors++;
-		spin_unlock(q->queue_lock);
-		nbd_end_request(req);
-		spin_lock(q->queue_lock);
 	}
 }
 
@@ -517,6 +557,7 @@ static int nbd_ioctl(struct inode *inode, struct file *file,
 	struct nbd_device *lo = inode->i_bdev->bd_disk->private_data;
 	int error;
 	struct request sreq ;
+	struct task_struct *thread;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -599,7 +640,12 @@ static int nbd_ioctl(struct inode *inode, struct file *file,
 	case NBD_DO_IT:
 		if (!lo->file)
 			return -EINVAL;
+		thread = kthread_create(nbd_thread, lo, lo->disk->disk_name);
+		if (IS_ERR(thread))
+			return PTR_ERR(thread);
+		wake_up_process(thread);
 		error = nbd_do_it(lo);
+		kthread_stop(thread);
 		if (error)
 			return error;
 		sock_shutdown(lo, 1);
@@ -684,10 +730,12 @@ static int __init nbd_init(void)
 		nbd_dev[i].file = NULL;
 		nbd_dev[i].magic = LO_MAGIC;
 		nbd_dev[i].flags = 0;
+		INIT_LIST_HEAD(&nbd_dev[i].waiting_queue);
 		spin_lock_init(&nbd_dev[i].queue_lock);
 		INIT_LIST_HEAD(&nbd_dev[i].queue_head);
 		mutex_init(&nbd_dev[i].tx_lock);
 		init_waitqueue_head(&nbd_dev[i].active_wq);
+		init_waitqueue_head(&nbd_dev[i].waiting_wq);
 		nbd_dev[i].blksize = 1024;
 		nbd_dev[i].bytesize = 0;
 		disk->major = NBD_MAJOR;
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
index cc2b472..94f40c9 100644
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -57,9 +57,11 @@ struct nbd_device {
 	int magic;
 
 	spinlock_t queue_lock;
-	struct list_head queue_head;/* Requests are added here...	*/
+	struct list_head queue_head;	/* Requests waiting result */
 	struct request *active_req;
 	wait_queue_head_t active_wq;
+	struct list_head waiting_queue;	/* Requests to be sent */
+	wait_queue_head_t waiting_wq;
 
 	struct mutex tx_lock;
 	struct gendisk *disk;
-- 
1.5.2.4




Thread overview: 11+ messages
-- links below jump to the message on this page --
2008-02-02 17:31 [PATCH] Allow NBD to be used locally devzero
2008-02-03  0:54 ` Jan Engelhardt
2008-02-03  6:02   ` Kyle Moffett
  -- strict thread matches above, loose matches on Subject: below --
2008-02-02 14:40 devzero
2008-02-02 16:57 ` Jan Engelhardt
2008-02-01 13:25 Laurent Vivier
2008-02-02 11:23 ` Pavel Machek
2008-02-02 11:52   ` Jan Engelhardt
2008-02-02 15:26   ` Laurent Vivier
2008-02-02 16:13     ` Miklos Szeredi
2008-02-02 20:54     ` Pavel Machek
