From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Layton
Subject: [PATCH] have cifs_reconnect handle signals appropriately
Date: Wed, 30 May 2007 17:46:51 -0400
Message-ID: <20070530174651.2af67a97.jlayton@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
To: linux-fsdevel@vger.kernel.org
Return-path:
Received: from mx1.redhat.com ([66.187.233.31]:35007 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1760258AbXE3Vqx (ORCPT );
	Wed, 30 May 2007 17:46:53 -0400
Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254])
	by mx1.redhat.com (8.13.1/8.13.1) with ESMTP id l4ULkqjj019924
	for ; Wed, 30 May 2007 17:46:52 -0400
Received: from pobox.corp.redhat.com (pobox.corp.redhat.com [10.11.255.20])
	by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id l4ULkqgW013917
	for ; Wed, 30 May 2007 17:46:52 -0400
Received: from tleilax.poochiereds.net (vpn-14-61.rdu.redhat.com [10.11.14.61])
	by pobox.corp.redhat.com (8.13.1/8.13.1) with SMTP id l4ULkpXb022835
	for ; Wed, 30 May 2007 17:46:51 -0400
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

This patch is the result of a fairly long, drawn-out case. The problem
goes something like this:

1) mount a samba share using CIFS
2) start some continuous I/O on the mount (a loop that creates a tarball
   on the mount and removes it seems to work)
3) shut down the samba server
4) suspend the process doing I/O (via ^z)
5) kill -9 pid_of_cifsd_kthread (I have no idea why they're doing this,
   but bear with me)
6) umount -l the mount
7) start up samba again

After this, you cannot remount the samba share. Mount attempts all
return either -ENOTDIR or -EAGAIN. The only fix seems to be to reboot
the box.

While the steps for this reproducer are pathological, I think they
expose a problem with how cifsd handles signals.
If we're in cifs_reconnect and cifsd is signalled, then the connect
calls will all start returning -ERESTARTSYS and we'll never exit from
the while loop.

I *think* the following patch (or something like it) might be
appropriate. I've tested a similar patch on Steve's backported 1.48a
CIFS code and it seems to fix the problem there, but that code doesn't
have the kthread changes.

Does this look reasonable, or am I missing something important? :-)

-- 
Jeff Layton

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index f4e9266..d369dd0 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -197,6 +197,11 @@ cifs_reconnect(struct TCP_Server_Info *server)
 					server->server_RFC1001_name);
 		}
 		if(rc) {
+			if (rc == -ERESTARTSYS) {
+				cFYI(1,("reconnect interrupted by signal"));
+				kthread_stop(server->tsk);
+				continue;
+			}
 			cFYI(1,("reconnect error %d",rc));
 			msleep(3000);
 		} else {
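
For anyone who wants to see the failure mode without a kernel build,
here is a rough userspace sketch of the retry loop described above. It
is not the cifs code: fake_connect(), reconnect_attempts(), the
bail_on_signal flag and the max_attempts cap are all made up for
illustration, and the exit path is modelled as a plain break rather than
the kthread_stop()/continue pair in the patch. The only point it tries
to make is that once every connect attempt returns -ERESTARTSYS, a loop
that treats it like any other error keeps retrying for as long as it is
allowed to.

```c
#include <assert.h>

/* Kernel-internal errno value, defined locally for this simulation. */
#define ERESTARTSYS 512

/*
 * Hypothetical stand-in for the connect call: once a signal is pending,
 * every attempt fails with -ERESTARTSYS; otherwise it succeeds.
 */
static int fake_connect(int signal_pending)
{
	return signal_pending ? -ERESTARTSYS : 0;
}

/*
 * Simplified model of the retry loop. Returns the number of connect
 * attempts made before the loop exits. max_attempts only exists so the
 * simulation terminates; the loop being modelled has no such cap.
 */
static int reconnect_attempts(int signal_pending, int bail_on_signal,
			      int max_attempts)
{
	int attempts = 0;

	assert(max_attempts > 0);

	while (attempts < max_attempts) {
		int rc = fake_connect(signal_pending);

		attempts++;
		if (rc == 0)
			break;		/* reconnected */
		if (rc == -ERESTARTSYS && bail_on_signal)
			break;		/* interrupted by a signal: give up */
		/* the loop being modelled sleeps 3 seconds here and retries */
	}
	return attempts;
}
```

With no signal pending the loop exits after one attempt. With a signal
pending and bail_on_signal clear it runs all the way to the cap, and
with bail_on_signal set it exits after the first interrupted attempt.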