From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756617AbZBISWi (ORCPT );
	Mon, 9 Feb 2009 13:22:38 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1753804AbZBISW3 (ORCPT );
	Mon, 9 Feb 2009 13:22:29 -0500
Received: from p02c12o148.mxlogic.net ([208.65.145.81]:39280 "EHLO
	p02c12o148.mxlogic.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754679AbZBISW2 (ORCPT );
	Mon, 9 Feb 2009 13:22:28 -0500
Message-ID: <4990743F.1070409@steeleye.com>
Date: Mon, 09 Feb 2009 13:21:51 -0500
From: Paul Clements
User-Agent: Swiftdove 2.0.0.9 (X11/20071116)
MIME-Version: 1.0
To: Andrew Morton
CC: kernel list , jnelson-kernel-bugzilla@jamponi.net
Subject: [PATCH 1/1] NBD: fix I/O hang on disconnected nbds
Content-Type: multipart/mixed; boundary="------------030208040506030309030302"
X-OriginalArrivalTime: 09 Feb 2009 18:21:51.0384 (UTC) FILETIME=[47382180:01C98AE3]
X-Spam: [F=0.2000000000; S=0.200(2009020301)]
X-MAIL-FROM:
X-SOURCE-IP: [207.43.68.209]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

This is a multi-part message in MIME format.
--------------030208040506030309030302
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

This patch fixes a problem that causes I/O to a disconnected (or
partially initialized) nbd device to hang indefinitely. To reproduce:

# ioctl NBD_SET_SIZE_BLOCKS /dev/nbd23 514048
# dd if=/dev/nbd23 of=/dev/null bs=4096 count=1
...hangs...

This can also occur when an nbd device loses its nbd-client/server
connection. Although we clear the queue of any outstanding I/Os after
the client/server connection fails, any additional I/Os that get queued
later will hang.

This may also be the problem reported in
http://bugzilla.kernel.org/show_bug.cgi?id=12277; testing would be
needed to determine whether the two issues are the same.
This problem was introduced by the new request handling thread code
("NBD: allow nbd to be used locally", 3/2008), which entered mainline
around 2.6.25.

The fix, which is fairly simple, is to restore the check for lo->sock
being NULL in do_nbd_request. This causes I/O to an uninitialized nbd
device to fail immediately with an I/O error, as it did prior to the
introduction of this bug.

--
Paul

--------------030208040506030309030302
Content-Type: text/x-diff;
 name="nbd-io-hang.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="nbd-io-hang.diff"
Signed-off-by: Paul Clements
---

 nbd.c |    9 +++++++++
 1 files changed, 9 insertions(+)

--- ./drivers/block/nbd.c.PRISTINE	2009-02-09 12:41:09.000000000 -0500
+++ ./drivers/block/nbd.c	2009-02-09 12:41:19.000000000 -0500
@@ -547,6 +547,15 @@ static void do_nbd_request(struct reques
 
 		BUG_ON(lo->magic != LO_MAGIC);
 
+		if (unlikely(!lo->sock)) {
+			printk(KERN_ERR "%s: Attempted send on closed socket\n",
+				lo->disk->disk_name);
+			req->errors++;
+			nbd_end_request(req);
+			spin_lock_irq(q->queue_lock);
+			continue;
+		}
+
 		spin_lock_irq(&lo->queue_lock);
 		list_add_tail(&req->queuelist, &lo->waiting_queue);
 		spin_unlock_irq(&lo->queue_lock);

--------------030208040506030309030302--