From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Steve French (smfltc)" Subject: Re: linux-cifs-client Digest, Vol 39, Issue 19 Date: Sun, 25 Feb 2007 22:36:37 -0600 Message-ID: <45E263D5.2020903@us.ibm.com> References: <20070224120026.4BFAC16387C@lists.samba.org> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit To: linux-cifs-client@lists.samba.org, linux-fsdevel@vger.kernel.org Return-path: Received: from e2.ny.us.ibm.com ([32.97.182.142]:40249 "EHLO e2.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933763AbXBZEg3 (ORCPT ); Sun, 25 Feb 2007 23:36:29 -0500 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e2.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id l1Q4aROn002946 for ; Sun, 25 Feb 2007 23:36:27 -0500 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v8.2) with ESMTP id l1Q4aPPH063366 for ; Sun, 25 Feb 2007 23:36:27 -0500 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l1Q4aPnh017769 for ; Sun, 25 Feb 2007 23:36:25 -0500 In-Reply-To: <20070224120026.4BFAC16387C@lists.samba.org> Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org linux-cifs-client-request@lists.samba.org wrote: >Send linux-cifs-client mailing list submissions to > linux-cifs-client@lists.samba.org > >To subscribe or unsubscribe via the World Wide Web, visit > https://lists.samba.org/mailman/listinfo/linux-cifs-client >or, via email, send a message with subject or body 'help' to > linux-cifs-client-request@lists.samba.org > >You can reach the person managing the list at > linux-cifs-client-owner@lists.samba.org > >When replying, please edit your Subject line so it is more specific >than "Re: Contents of linux-cifs-client digest..." > > >Today's Topics: > > 1. i_mutex and deadlock (Steve French (smfltc)) > 2. Re: i_mutex and deadlock (Dave Kleikamp) > > >---------------------------------------------------------------------- > >Message: 1 >Date: Fri, 23 Feb 2007 10:02:16 -0600 >From: "Steve French (smfltc)" >Subject: [linux-cifs-client] i_mutex and deadlock >To: linux-fsdevel@vger.kernel.org >Cc: linux-cifs-client@lists.samba.org >Message-ID: <45DF1008.3090903@us.ibm.com> >Content-Type: text/plain; charset=ISO-8859-1; format=flowed > >A field in i_size_write (i_size_seqcount) must be protected against >simultaneous update otherwise we risk looping in i_size_read. > >The suggestion in fs.h is to use i_mutex which seems too dangerous due >to the possibility of deadlock. > >There are 65 places in the fs directory which lock an i_mutex, and seven >more in the mm directory. The vfs does clearly lock file inodes in >some paths before calling into a particular filesystem (nfs, ext3, cifs >etc.) - in particular for fsync but probably for others that are harder >to trace. This seems to introduce the possibility of deadlock if a >filesystem also uses i_mutex to protect file size updates > >Documentation/filesystems/Locking describes the use of i_mutex (was >"i_sem" previously) and indicates that it is held by the vfs on three >additional calls on file inodes which concern me (for deadlock >possibility), setattr, truncate and unlink. > >nfs seems to limit its use of i_mutex to llseek and invalidate_mapping, >and does not appear to grab the i_mutex (or any sem for that matter) to >protect i_size_write >(nfs calls i_size_write in nfs_grow_file) - and for the case of >nfs_fhget (in which they bypass i_size_write and set i_size directly) >does not seem to grab i_mutex either. > >ext3 also does not use i_mutex for this purpose (protecting >i_size_write) - ony to protect a journalling ioctl. > >I am concerned about using i_mutex to protect the cifs calls to >i_size_write (although it seems to fix a problem reported in i_size_read >under stress) because of the following: > >1) no one else calls i_size_write AFAIK (on our file inodes) >2) we don't block inside i_size_write do we ... (so why in the world do >they take a slow mutex instead of a fast spinlock) >3) we don't really know what happens inside fsync (the paths through the >page cache code seem complex and we don't want to reenter writepage in >low memory conditions and deadlock updating the file size), and there is >some concern that the vfs takes the i_mutex in other paths on file >inodes before entering our code and could deadlock. > >Any reason, why an fs shouldn't simply use something else (a spinlock) >other than i_mutex to protect the i_size_write call? > > >------------------------------ > >Message: 2 >Date: Fri, 23 Feb 2007 10:29:53 -0600 >From: Dave Kleikamp >Subject: Re: [linux-cifs-client] i_mutex and deadlock >To: "Steve French (smfltc)" >Cc: linux-fsdevel@vger.kernel.org, linux-cifs-client@lists.samba.org >Message-ID: <1172248194.15810.10.camel@kleikamp.austin.ibm.com> >Content-Type: text/plain > >On Fri, 2007-02-23 at 10:02 -0600, Steve French (smfltc) wrote: > > >>A field in i_size_write (i_size_seqcount) must be protected against >>simultaneous update otherwise we risk looping in i_size_read. >> >>The suggestion in fs.h is to use i_mutex which seems too dangerous due >>to the possibility of deadlock. >> >> > >I'm not sure if it's as much a suggestion as a way of documenting the >locking that exists (or existed when the comment was written). > >"... i_size_write() does need locking around it (normally i_mutex) ..." > > > >>There are 65 places in the fs directory which lock an i_mutex, and seven >>more in the mm directory. The vfs does clearly lock file inodes in >>some paths before calling into a particular filesystem (nfs, ext3, cifs >>etc.) - in particular for fsync but probably for others that are harder >>to trace. This seems to introduce the possibility of deadlock if a >>filesystem also uses i_mutex to protect file size updates >> >>I am concerned about using i_mutex to protect the cifs calls to >>i_size_write (although it seems to fix a problem reported in i_size_read >>under stress) because of the following: >> >>1) no one else calls i_size_write AFAIK (on our file inodes) >> >> > >I think you're right. > > > I produced a patch (attached to http://bugzilla.kernel.org/show_bug.cgi?id=7903) which fixes this, has tested out fine, has no sideeffects/changes outside of cifs and simply uses inode->i_lock (spinlock) to protect i_size_write calls on cifs inodes which seemed faster/safer especially as there was one other path outside the cifs code which calls i_size_write (vmtruncate in mm/memory.c) which was easy to work around since cifs could call its own version of this small function. This is also a lot easier to prove/ understand as there were 72 places which called i_mutex outside of cifs (ie in mm and fs) many of which were potential deadlocks if cifs used i_mutex to protect its i_size_writes (network filesystems do updates of the file size in more places than a local filesystem of course ... so have to be careful about grabbing the same sem twice ...) I don't want to push that over akpm's objections as he knows the mm well, but this seems the simplest/safest approach.