* iget4/read_inode2 race in get_new_inode
@ 2002-04-29 16:14 Jan Harkes
2002-04-29 16:48 ` Jan Harkes
` (2 more replies)
0 siblings, 3 replies; 31+ messages in thread
From: Jan Harkes @ 2002-04-29 16:14 UTC (permalink / raw)
To: linux-fsdevel
Hi all,
I believe I found a race in get_new_inode (fs/inode.c) which affects
filesystems that use iget4 (and read_inode2). Because the 'opaque' value
of the new inode is initialized in read_inode2 after the inode_lock has
already been released. So another thread could call into get_new_inode
and not see the inode that hasn't yet been initialized in read_inode2.
Moving the spin_unlock(&inode_lock) to after the read_inode calls will
fix this, but could lead to deadlock for stacking filesystems that try
to access another filesystem/inode from their read_inode call. Coda is
safe, but at least intermezzo seems to be calling through to other
filesystems.
A solution would be to add i_ops->init_locked_inode(inode, opaque) which
would be called while we're still holding the inode lock. I couldn't see
a simple way to split such functionality out of the reiserfs read_inode2
function, but it would look something like the following.
static int reiserfs_init_locked_inode(struct inode *inode, void *opaque)
{
INODE_PKEY(inode)->k_dir_id =
((struct reiserfs_iget4_args *)opaque)->objectid;
}
Jan
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 16:14 iget4/read_inode2 race in get_new_inode Jan Harkes @ 2002-04-29 16:48 ` Jan Harkes 2002-04-29 17:49 ` Alexander Viro 2002-04-29 17:18 ` Anton Altaparmakov 2002-04-29 17:44 ` Alexander Viro 2 siblings, 1 reply; 31+ messages in thread From: Jan Harkes @ 2002-04-29 16:48 UTC (permalink / raw) To: linux-fsdevel On Mon, Apr 29, 2002 at 12:14:30PM -0400, Jan Harkes wrote: > A solution would be to add i_ops->init_locked_inode(inode, opaque) which > would be called while we're still holding the inode lock. I couldn't see > a simple way to split such functionality out of the reiserfs read_inode2 > function, but it would look something like the following. Better solution would probably be to pass the opaque value to the existing i_ops->alloc_inode call. Jan ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 16:48 ` Jan Harkes @ 2002-04-29 17:49 ` Alexander Viro 2002-04-29 19:34 ` Anton Altaparmakov 0 siblings, 1 reply; 31+ messages in thread From: Alexander Viro @ 2002-04-29 17:49 UTC (permalink / raw) To: Jan Harkes; +Cc: linux-fsdevel On Mon, 29 Apr 2002, Jan Harkes wrote: > On Mon, Apr 29, 2002 at 12:14:30PM -0400, Jan Harkes wrote: > > A solution would be to add i_ops->init_locked_inode(inode, opaque) which > > would be called while we're still holding the inode lock. I couldn't see > > a simple way to split such functionality out of the reiserfs read_inode2 > > function, but it would look something like the following. > > Better solution would probably be to pass the opaque value to the > existing i_ops->alloc_inode call. Hell, no. ->read_inode()/->read_inode2()/etc. are callbacks. Adding more mess related to iget() into real methods is a Bad Idea(tm). If anything, we might make your init_locked_inode() a second callback argument to iget4() (see the way sget() works). I would prefer that combined with "return locked inode" scheme and killed ->read_inode() as a method - let filesystem call it explicitly. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 17:49 ` Alexander Viro @ 2002-04-29 19:34 ` Anton Altaparmakov 2002-04-29 19:39 ` Anton Altaparmakov ` (3 more replies) 0 siblings, 4 replies; 31+ messages in thread From: Anton Altaparmakov @ 2002-04-29 19:34 UTC (permalink / raw) To: Alexander Viro; +Cc: Jan Harkes, linux-fsdevel At 18:49 29/04/02, Alexander Viro wrote: >On Mon, Apr 29, 2002 at 12:14:30PM -0400, Jan Harkes wrote: > > A solution would be to add i_ops->init_locked_inode(inode, opaque) which > > would be called while we're still holding the inode lock. I couldn't see > > a simple way to split such functionality out of the reiserfs read_inode2 > > function, but it would look something like the following. > >If anything, we might make your init_locked_inode() a second callback >argument to iget4() (see the way sget() works). I would prefer that >combined with "return locked inode" scheme and killed ->read_inode() >as a method - let filesystem call it explicitly. How about this as 1st patch: - Replace iget4() with iget5() with an additional argument: a callback to initialize a locked inode in order to kill the race condition Jan found. struct inode *iget5(sb, ino, find_actor, init_locked_actor, void *opaque); - Let get_new_inode() call init_locked_actor before dropping the inode_lock. It then calls ->read_inode as usual. - read_inode2 is no longer needed as init_actor already has done whatever it needed to do with opaque so no need to pass it into read_inode as well. BONUS: Existing file systems do not need modification except for nfs, ReiserFS and CODA and possibly ex-tree fs like XFS. We can either leave it at this or go on to really remove ->read_inode completely, e.g. like so, in patch 2: - Introduce VFS helper fs_iget(): (perhaps as static inline in linux/fs.h?) struct inode *fs_iget(sb, ino, find_actor, init_locked_actor, opaque, fs_read_inode) { inode = iget5(sb, ino, find_actor, init_locked_actor, opaque); if (inode has I_NEW set) { fs_read_inode(inode); unlock_new_inode(inode); } return inode; } FS which want to play games don't have to use this and write their own. - Remove ->read_inode altogether. Switch all existing fs to using fs_iget. Comments? Al, is that what you are thinking off? Best regards, Anton -- "I've not lost my mind. It's backed up on tape somewhere." - Unknown -- Anton Altaparmakov <aia21 at cantab.net> (replace at with @) Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 19:34 ` Anton Altaparmakov @ 2002-04-29 19:39 ` Anton Altaparmakov 2002-04-29 19:41 ` Steve Lord ` (2 subsequent siblings) 3 siblings, 0 replies; 31+ messages in thread From: Anton Altaparmakov @ 2002-04-29 19:39 UTC (permalink / raw) To: Alexander Viro; +Cc: Jan Harkes, linux-fsdevel At 20:34 29/04/02, Anton Altaparmakov wrote: >At 18:49 29/04/02, Alexander Viro wrote: >>On Mon, Apr 29, 2002 at 12:14:30PM -0400, Jan Harkes wrote: >> > A solution would be to add i_ops->init_locked_inode(inode, opaque) which >> > would be called while we're still holding the inode lock. I couldn't see >> > a simple way to split such functionality out of the reiserfs read_inode2 >> > function, but it would look something like the following. >> >>If anything, we might make your init_locked_inode() a second callback >>argument to iget4() (see the way sget() works). I would prefer that >>combined with "return locked inode" scheme and killed ->read_inode() >>as a method - let filesystem call it explicitly. > >How about this as 1st patch: > >- Replace iget4() with iget5() with an additional argument: a callback to >initialize a locked inode in order to kill the race condition Jan found. > >struct inode *iget5(sb, ino, find_actor, init_locked_actor, void *opaque); > >- Let get_new_inode() call init_locked_actor before dropping the >inode_lock. It then calls ->read_inode as usual. > >- read_inode2 is no longer needed as init_actor already has done whatever >it needed to do with opaque so no need to pass it into read_inode as well. > >BONUS: Existing file systems do not need modification except for nfs, >ReiserFS and CODA and possibly ex-tree fs like XFS. > >We can either leave it at this or go on to really remove ->read_inode >completely, e.g. like so, in patch 2: > >- Introduce VFS helper fs_iget(): (perhaps as static inline in linux/fs.h?) > >struct inode *fs_iget(sb, ino, find_actor, init_locked_actor, opaque, >fs_read_inode) Or better make that fs_iget5() and make a simpler fs_iget() wrapping fs_iget5() as fs_iget(sb, ino, fs_read_inode); Anton >{ > inode = iget5(sb, ino, find_actor, init_locked_actor, opaque); > if (inode has I_NEW set) { > fs_read_inode(inode); > unlock_new_inode(inode); > } > return inode; >} > >FS which want to play games don't have to use this and write their own. > >- Remove ->read_inode altogether. Switch all existing fs to using fs_iget. > >Comments? > >Al, is that what you are thinking off? > >Best regards, > > Anton > > >-- > "I've not lost my mind. It's backed up on tape somewhere." - Unknown >-- >Anton Altaparmakov <aia21 at cantab.net> (replace at with @) >Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net >WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ > >- >To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html -- "I've not lost my mind. It's backed up on tape somewhere." - Unknown -- Anton Altaparmakov <aia21 at cantab.net> (replace at with @) Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 19:34 ` Anton Altaparmakov 2002-04-29 19:39 ` Anton Altaparmakov @ 2002-04-29 19:41 ` Steve Lord 2002-04-29 19:51 ` Anton Altaparmakov 2002-04-29 19:51 ` Christoph Hellwig 2002-04-29 22:36 ` Alexander Viro 3 siblings, 1 reply; 31+ messages in thread From: Steve Lord @ 2002-04-29 19:41 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: Alexander Viro, Jan Harkes, linux-fsdevel On Mon, 2002-04-29 at 14:34, Anton Altaparmakov wrote: > > How about this as 1st patch: > > - Replace iget4() with iget5() with an additional argument: a callback to > initialize a locked inode in order to kill the race condition Jan found. > > struct inode *iget5(sb, ino, find_actor, init_locked_actor, void *opaque); > > - Let get_new_inode() call init_locked_actor before dropping the > inode_lock. It then calls ->read_inode as usual. > > - read_inode2 is no longer needed as init_actor already has done whatever > it needed to do with opaque so no need to pass it into read_inode as well. > > BONUS: Existing file systems do not need modification except for nfs, > ReiserFS and CODA and possibly ex-tree fs like XFS. Problem is, XFS needs a lot more context than that to fill in the inode at this point. If we are doing this from within a transaction we need the transaction pointer for starters. Steve -- Steve Lord voice: +1-651-683-3511 Principal Engineer, Filesystem Software email: lord@sgi.com ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 19:41 ` Steve Lord @ 2002-04-29 19:51 ` Anton Altaparmakov 0 siblings, 0 replies; 31+ messages in thread From: Anton Altaparmakov @ 2002-04-29 19:51 UTC (permalink / raw) To: Steve Lord; +Cc: Alexander Viro, Jan Harkes, linux-fsdevel At 20:41 29/04/02, Steve Lord wrote: >On Mon, 2002-04-29 at 14:34, Anton Altaparmakov wrote: > > > > How about this as 1st patch: > > > > - Replace iget4() with iget5() with an additional argument: a callback to > > initialize a locked inode in order to kill the race condition Jan found. > > > > struct inode *iget5(sb, ino, find_actor, init_locked_actor, void *opaque); > > > > - Let get_new_inode() call init_locked_actor before dropping the > > inode_lock. It then calls ->read_inode as usual. > > > > - read_inode2 is no longer needed as init_actor already has done whatever > > it needed to do with opaque so no need to pass it into read_inode as well. > > > > BONUS: Existing file systems do not need modification except for nfs, > > ReiserFS and CODA and possibly ex-tree fs like XFS. > >Problem is, XFS needs a lot more context than that to fill in the inode >at this point. If we are doing this from within a transaction we need >the transaction pointer for starters. Then proceed immediately to patch 2 removing read_inode altogether as Al seems to want to happen anyways and do whatever you want inside XFS's own version of xfs_iget(). I.e. xfs_iget() would call iget5() that would return to you a locked, new inode. You then do whatever you like with it and when it is done you do an unlock_new_inode(). Would that work or you? Anton -- "I've not lost my mind. It's backed up on tape somewhere." - Unknown -- Anton Altaparmakov <aia21 at cantab.net> (replace at with @) Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 19:34 ` Anton Altaparmakov 2002-04-29 19:39 ` Anton Altaparmakov 2002-04-29 19:41 ` Steve Lord @ 2002-04-29 19:51 ` Christoph Hellwig 2002-04-29 20:07 ` Anton Altaparmakov 2002-04-29 20:20 ` Jan Harkes 2002-04-29 22:36 ` Alexander Viro 3 siblings, 2 replies; 31+ messages in thread From: Christoph Hellwig @ 2002-04-29 19:51 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: Alexander Viro, Jan Harkes, linux-fsdevel On Mon, Apr 29, 2002 at 08:34:00PM +0100, Anton Altaparmakov wrote: > > - Replace iget4() with iget5() with an additional argument: a callback to > initialize a locked inode in order to kill the race condition Jan found. > > struct inode *iget5(sb, ino, find_actor, init_locked_actor, void *opaque); > > - Let get_new_inode() call init_locked_actor before dropping the > inode_lock. It then calls ->read_inode as usual. Don't do this - rather use the XFS-style icreate() plus per-filesystem exclusion. The VFS shouldn't have to worry about this kind of problems, especially with such an ugly API. > struct inode *fs_iget(sb, ino, find_actor, init_locked_actor, opaque, > fs_read_inode) > { > inode = iget5(sb, ino, find_actor, init_locked_actor, opaque); > if (inode has I_NEW set) { > fs_read_inode(inode); > unlock_new_inode(inode); > } > return inode; > } I already though about this alot (as generic_iget()), but I wonder whether it is really worth the effort or whether filesystems should just duplicate that 5 lines. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 19:51 ` Christoph Hellwig @ 2002-04-29 20:07 ` Anton Altaparmakov 2002-04-29 20:13 ` Christoph Hellwig 2002-04-29 20:20 ` Jan Harkes 1 sibling, 1 reply; 31+ messages in thread From: Anton Altaparmakov @ 2002-04-29 20:07 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Alexander Viro, Jan Harkes, linux-fsdevel At 20:51 29/04/02, Christoph Hellwig wrote: >On Mon, Apr 29, 2002 at 08:34:00PM +0100, Anton Altaparmakov wrote: > > > > - Replace iget4() with iget5() with an additional argument: a callback to > > initialize a locked inode in order to kill the race condition Jan found. > > > > struct inode *iget5(sb, ino, find_actor, init_locked_actor, void *opaque); > > > > - Let get_new_inode() call init_locked_actor before dropping the > > inode_lock. It then calls ->read_inode as usual. > >Don't do this - rather use the XFS-style icreate() plus per-filesystem >exclusion. The VFS shouldn't have to worry about this kind of problems, >especially with such an ugly API. Well, I find the icreate ugly... The name is extremely misleading IMO. Why do you want to bring a massive per-sb-global lock onto each fs?!? That is ridiculous, IMHO. And will certainly cause scalability problems on SMP. The two actor API is not ugly at all. If you have one actor you can have two. Same difference. The init_locked_actor would only copy fields from the opaque pointer into the fs specific part of the inode, nothing else. Just enough so that the find_actor is capable of distinguishing two inodes with equal inode number from each other. The VFS has all to worry about this kind of problems because the comparison is passed to the fs already and it has to be. The initialization to go with it fits in very nicely. > > struct inode *fs_iget(sb, ino, find_actor, init_locked_actor, opaque, > > fs_read_inode) > > { > > inode = iget5(sb, ino, find_actor, init_locked_actor, opaque); > > if (inode has I_NEW set) { > > fs_read_inode(inode); > > unlock_new_inode(inode); > > } > > return inode; > > } > >I already though about this alot (as generic_iget()), but I wonder whether >it is really worth the effort or whether filesystems should just duplicate >that 5 lines. IMHO such helpers are very well worth the effort. Otherwise why do we have all the single line bit testing helpers for example? Makes code more readable and makes writing of a new fs much easier... Just my 2p. Anton -- "I've not lost my mind. It's backed up on tape somewhere." - Unknown -- Anton Altaparmakov <aia21 at cantab.net> (replace at with @) Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 20:07 ` Anton Altaparmakov @ 2002-04-29 20:13 ` Christoph Hellwig 2002-04-29 20:24 ` Jan Harkes 2002-04-29 20:28 ` Anton Altaparmakov 0 siblings, 2 replies; 31+ messages in thread From: Christoph Hellwig @ 2002-04-29 20:13 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: Alexander Viro, Jan Harkes, linux-fsdevel On Mon, Apr 29, 2002 at 09:07:30PM +0100, Anton Altaparmakov wrote: > >Don't do this - rather use the XFS-style icreate() plus per-filesystem > >exclusion. The VFS shouldn't have to worry about this kind of problems, > >especially with such an ugly API. > > Well, I find the icreate ugly... The name is extremely misleading IMO. Find a better name.. (and no, please not 'iget_without_read_inode'). > Why do you want to bring a massive per-sb-global lock onto each fs?!? That > is ridiculous, IMHO. And will certainly cause scalability problems on SMP. How do you think it can scale worse than the global inode lock? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 20:13 ` Christoph Hellwig @ 2002-04-29 20:24 ` Jan Harkes 2002-04-29 20:28 ` Anton Altaparmakov 1 sibling, 0 replies; 31+ messages in thread From: Jan Harkes @ 2002-04-29 20:24 UTC (permalink / raw) To: linux-fsdevel On Mon, Apr 29, 2002 at 09:13:37PM +0100, Christoph Hellwig wrote: > On Mon, Apr 29, 2002 at 09:07:30PM +0100, Anton Altaparmakov wrote: > > Why do you want to bring a massive per-sb-global lock onto each fs?!? That > > is ridiculous, IMHO. And will certainly cause scalability problems on SMP. > > How do you think it can scale worse than the global inode lock? Because the massive per-sb-global lock would hold the lock over the call to read_inode which will most likely have to go to disk to get the actual inode contents. In the mean time we block out access to all inodes, even those that are already in the cache. Jan ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 20:13 ` Christoph Hellwig 2002-04-29 20:24 ` Jan Harkes @ 2002-04-29 20:28 ` Anton Altaparmakov 1 sibling, 0 replies; 31+ messages in thread From: Anton Altaparmakov @ 2002-04-29 20:28 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Alexander Viro, Jan Harkes, linux-fsdevel At 21:13 29/04/02, Christoph Hellwig wrote: >On Mon, Apr 29, 2002 at 09:07:30PM +0100, Anton Altaparmakov wrote: > > >Don't do this - rather use the XFS-style icreate() plus per-filesystem > > >exclusion. The VFS shouldn't have to worry about this kind of problems, > > >especially with such an ugly API. > > > > Well, I find the icreate ugly... The name is extremely misleading IMO. > >Find a better name.. (and no, please not 'iget_without_read_inode'). iget_locked? Short and to the point... > > Why do you want to bring a massive per-sb-global lock onto each fs?!? That > > is ridiculous, IMHO. And will certainly cause scalability problems on SMP. > >How do you think it can scale worse than the global inode lock? inode lock is a spin lock. Held for very short period. The fs lock will have to be a semaphore and you will sleep in ->alloc_inode with the fs lock held. There is no question which scales better IMHO... (-; Anton -- "I've not lost my mind. It's backed up on tape somewhere." - Unknown -- Anton Altaparmakov <aia21 at cantab.net> (replace at with @) Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 19:51 ` Christoph Hellwig 2002-04-29 20:07 ` Anton Altaparmakov @ 2002-04-29 20:20 ` Jan Harkes 2002-04-29 20:31 ` Anton Altaparmakov 1 sibling, 1 reply; 31+ messages in thread From: Jan Harkes @ 2002-04-29 20:20 UTC (permalink / raw) To: linux-fsdevel On Mon, Apr 29, 2002 at 08:51:13PM +0100, Christoph Hellwig wrote: > On Mon, Apr 29, 2002 at 08:34:00PM +0100, Anton Altaparmakov wrote: > > > > - Replace iget4() with iget5() with an additional argument: a callback to > > initialize a locked inode in order to kill the race condition Jan found. > > > > struct inode *iget5(sb, ino, find_actor, init_locked_actor, void *opaque); > > > > - Let get_new_inode() call init_locked_actor before dropping the > > inode_lock. It then calls ->read_inode as usual. > > Don't do this - rather use the XFS-style icreate() plus per-filesystem > exclusion. The VFS shouldn't have to worry about this kind of problems, > especially with such an ugly API. Correct, add find_actor and init_locked_inode to the icreate interface and drop iget4 like a hot potato. I've modified Steve Lord's patch to include the various suggestions that were made and hacked up a first attempt at converting Coda and ReiserFS to use icreate. Not sure if this works, but it compiles. Jan Patch is against 2.5.11, diff -urN orig/fs/Makefile icreate2/fs/Makefile --- orig/fs/Makefile Mon Apr 29 14:10:14 2002 +++ icreate2/fs/Makefile Mon Apr 29 15:28:02 2002 @@ -7,7 +7,7 @@ O_TARGET := fs.o -export-objs := filesystems.o open.o dcache.o buffer.o bio.o +export-objs := filesystems.o open.o dcache.o buffer.o bio.o inode.o mod-subdirs := nls obj-y := open.o read_write.o devices.o file_table.o buffer.o \ diff -urN orig/fs/coda/cnode.c icreate2/fs/coda/cnode.c --- orig/fs/coda/cnode.c Tue Apr 2 14:36:09 2002 +++ icreate2/fs/coda/cnode.c Mon Apr 29 16:15:28 2002 @@ -25,7 +25,7 @@ return 1; } -static int coda_inocmp(struct inode *inode, unsigned long ino, void *opaque) +static int coda_inocmp(struct inode *inode, void *opaque) { return (coda_fideq((ViceFid *)opaque, &(ITOC(inode)->c_fid))); } @@ -55,26 +55,30 @@ init_special_inode(inode, inode->i_mode, attr->va_rdev); } +int coda_init_inode(struct inode *inode, void *data) +{ + ITOC(inode)->c_fid = *(ViceFid *)data; + return 0; +} + struct inode * coda_iget(struct super_block * sb, ViceFid * fid, struct coda_vattr * attr) { struct inode *inode; struct coda_inode_info *cii; + struct coda_sb_info *sbi = coda_sbp(sb); ino_t ino = coda_f2i(fid); - inode = iget4(sb, ino, coda_inocmp, fid); + inode = icreate(sb, ino, coda_inocmp, coda_init_inode, fid); if (!inode) return ERR_PTR(-ENOMEM); - /* check if the inode is already initialized */ - cii = ITOC(inode); - if (coda_isnullfid(&cii->c_fid)) - /* new, empty inode found... initializing */ - cii->c_fid = *fid; - - /* we shouldnt see inode collisions anymore */ - if (!coda_fideq(fid, &cii->c_fid)) BUG(); + if (inode->i_state & I_NEW) { + cii = ITOC(inode); + list_add(&cii->c_cilist, &sbi->sbi_cihead); + unlock_new_inode(inode); + } /* always replace the attributes, type might have changed */ coda_fill_inode(inode, attr); @@ -126,12 +130,16 @@ insert_inode_hash(inode); } +int coda_fail_inode(struct inode *inode, void *opaque) +{ + return -1; +} + /* convert a fid to an inode. */ struct inode *coda_fid_to_inode(ViceFid *fid, struct super_block *sb) { ino_t nr; struct inode *inode; - struct coda_inode_info *cii; if ( !sb ) { printk("coda_fid_to_inode: no sb!\n"); @@ -139,24 +147,13 @@ } nr = coda_f2i(fid); - inode = iget4(sb, nr, coda_inocmp, fid); - if ( !inode ) { - printk("coda_fid_to_inode: null from iget, sb %p, nr %ld.\n", - sb, (long)nr); - return NULL; - } - - cii = ITOC(inode); - - /* The inode could already be purged due to memory pressure */ - if (coda_isnullfid(&cii->c_fid)) { - inode->i_nlink = 0; - iput(inode); + inode = icreate(sb, nr, coda_inocmp, coda_fail_inode, fid); + if ( !inode ) return NULL; - } - /* we shouldn't see inode collisions anymore */ - if ( !coda_fideq(fid, &cii->c_fid) ) BUG(); + /* we should never see newly created inodes because we intentionally + * fail in the initialization callback */ + BUG_ON(inode->i_state & I_NEW); return inode; } diff -urN orig/fs/coda/inode.c icreate2/fs/coda/inode.c --- orig/fs/coda/inode.c Mon Apr 29 14:09:02 2002 +++ icreate2/fs/coda/inode.c Mon Apr 29 15:55:16 2002 @@ -229,16 +229,9 @@ kfree(sbi); } -/* all filling in of inodes postponed until lookup */ static void coda_read_inode(struct inode *inode) { - struct coda_sb_info *sbi = coda_sbp(inode->i_sb); - struct coda_inode_info *cii; - - if (!sbi) BUG(); - - cii = ITOC(inode); - list_add(&cii->c_cilist, &sbi->sbi_cihead); + make_bad_inode(inode); } static void coda_clear_inode(struct inode *inode) diff -urN orig/fs/inode.c icreate2/fs/inode.c --- orig/fs/inode.c Mon Apr 29 14:09:02 2002 +++ icreate2/fs/inode.c Mon Apr 29 16:07:58 2002 @@ -17,6 +17,7 @@ #include <linux/swapctl.h> #include <linux/prefetch.h> #include <linux/locks.h> +#include <linux/module.h> /* * New inode.c implementation. @@ -809,7 +810,7 @@ continue; if (inode->i_sb != sb) continue; - if (find_actor && !find_actor(inode, ino, opaque)) + if (find_actor && !find_actor(inode, opaque)) continue; break; } @@ -842,15 +843,32 @@ return inode; } +void unlock_new_inode(struct inode *inode) +{ + BUG_ON(!(inode->i_state & I_NEW)); + /* + * This is special! We do not need the spinlock + * when clearing I_LOCK, because we're guaranteed + * that nobody else tries to do anything about the + * state of the inode when it is locked, as we + * just created it (so there can be no old holders + * that haven't tested I_LOCK). + */ + inode->i_state &= ~(I_LOCK|I_NEW); + wake_up(&inode->i_wait); +} + + /* * This is called without the inode lock held.. Be careful. * * We no longer cache the sb_flags in i_flags - see fs.h * -- rmk@arm.uk.linux.org */ -static struct inode * get_new_inode(struct super_block *sb, unsigned long ino, struct list_head *head, find_inode_t find_actor, void *opaque) +static struct inode * get_new_inode(struct super_block *sb, unsigned long ino, struct list_head *head, find_inode_t find_actor, init_inode_t init_inode, void *opaque) { struct inode * inode; + int err = 0; inode = alloc_inode(sb); if (inode) { @@ -860,35 +878,24 @@ /* We released the lock, so.. */ old = find_inode(sb, ino, head, find_actor, opaque); if (!old) { - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - list_add(&inode->i_hash, head); inode->i_ino = ino; - inode->i_state = I_LOCK; + inode->i_state = I_LOCK|I_NEW; + if (!init_inode || (err = init_inode(inode, opaque)) == 0) { + inodes_stat.nr_inodes++; + list_add(&inode->i_list, &inode_in_use); + list_add(&inode->i_hash, head); + } spin_unlock(&inode_lock); - /* reiserfs specific hack right here. We don't - ** want this to last, and are looking for VFS changes - ** that will allow us to get rid of it. - ** -- mason@suse.com - */ - if (sb->s_op->read_inode2) { - sb->s_op->read_inode2(inode, opaque) ; - } else { - sb->s_op->read_inode(inode); + /* failed to initialize? */ + if (err) { + destroy_inode(inode); + return NULL; } - /* - * This is special! We do not need the spinlock - * when clearing I_LOCK, because we're guaranteed - * that nobody else tries to do anything about the - * state of the inode when it is locked, as we - * just created it (so there can be no old holders - * that haven't tested I_LOCK). + /* Return the locked inode with I_NEW set, the + * caller is responsible for filling in the contents */ - inode->i_state &= ~I_LOCK; - wake_up(&inode->i_wait); - return inode; } @@ -968,8 +975,12 @@ return inode; } - -struct inode *iget4(struct super_block *sb, unsigned long ino, find_inode_t find_actor, void *opaque) +/* + * This is iget4 without the read_inode portion of get_new_inode + * the filesystem gets back a new locked and hashed inode and gets + * to fill it in before unlocking it via unlock_new_inode(). + */ +struct inode *icreate(struct super_block *sb, unsigned long ino, find_inode_t find_actor, init_inode_t init_inode, void *opaque) { struct list_head * head = inode_hashtable + hash(sb,ino); struct inode * inode; @@ -984,12 +995,11 @@ } spin_unlock(&inode_lock); - /* - * get_new_inode() will do the right thing, re-trying the search - * in case it had to block at any point. - */ - return get_new_inode(sb, ino, head, find_actor, opaque); + return get_new_inode(sb, ino, head, find_actor, init_inode, opaque); } + +EXPORT_SYMBOL(icreate); +EXPORT_SYMBOL(unlock_new_inode); /** * insert_inode_hash - hash an inode diff -urN orig/fs/reiserfs/inode.c icreate2/fs/reiserfs/inode.c --- orig/fs/reiserfs/inode.c Mon Apr 29 14:10:15 2002 +++ icreate2/fs/reiserfs/inode.c Mon Apr 29 16:16:46 2002 @@ -1134,19 +1134,20 @@ /* looks for stat data in the tree, and fills up the fields of in-core inode stat data fields */ -void reiserfs_read_inode2 (struct inode * inode, void *p) +int reiserfs_init_inode (struct inode * inode, void *p) +{ + struct reiserfs_iget4_args *args = (struct reiserfs_iget4_args *)p ; + INODE_PKEY(inode)->k_dir_id = cpu_to_le32(args->objectid); + return 0; +} + +void reiserfs_init_inode2 (struct inode * inode, struct reiserfs_iget4_args *args) { INITIALIZE_PATH (path_to_sd); struct cpu_key key; - struct reiserfs_iget4_args *args = (struct reiserfs_iget4_args *)p ; unsigned long dirino; int retval; - if (!p) { - reiserfs_make_bad_inode(inode) ; - return; - } - dirino = args->objectid ; /* set version 1, version 2 could be used too, because stat data @@ -1216,8 +1217,7 @@ * inode numbers (objectids) are distinguished by parent directory ids. * */ -static int reiserfs_find_actor( struct inode *inode, - unsigned long inode_no, void *opaque ) +static int reiserfs_find_actor( struct inode *inode, void *opaque ) { struct reiserfs_iget4_args *args; @@ -1232,10 +1232,15 @@ struct reiserfs_iget4_args args ; args.objectid = key->on_disk_key.k_dir_id ; - inode = iget4 (s, key->on_disk_key.k_objectid, - reiserfs_find_actor, (void *)(&args)); + inode = icreate (s, key->on_disk_key.k_objectid, + reiserfs_find_actor, reiserfs_init_inode, (void *)(&args)); if (!inode) return ERR_PTR(-ENOMEM) ; + + if (inode->i_state & I_NEW) { + reiserfs_init_inode2(inode, &args); + unlock_new_inode(inode); + } if (comp_short_keys (INODE_PKEY (inode), key) || is_bad_inode (inode)) { /* either due to i/o error or a stale NFS handle */ diff -urN orig/fs/reiserfs/super.c icreate2/fs/reiserfs/super.c --- orig/fs/reiserfs/super.c Mon Apr 29 14:09:03 2002 +++ icreate2/fs/reiserfs/super.c Mon Apr 29 16:18:28 2002 @@ -485,7 +485,6 @@ alloc_inode: reiserfs_alloc_inode, destroy_inode: reiserfs_destroy_inode, read_inode: reiserfs_read_inode, - read_inode2: reiserfs_read_inode2, write_inode: reiserfs_write_inode, dirty_inode: reiserfs_dirty_inode, delete_inode: reiserfs_delete_inode, @@ -1065,10 +1064,15 @@ s->s_flags |= MS_RDONLY ; } args.objectid = REISERFS_ROOT_PARENT_OBJECTID ; - root_inode = iget4 (s, REISERFS_ROOT_OBJECTID, 0, (void *)(&args)); + root_inode = icreate (s, REISERFS_ROOT_OBJECTID, 0, reiserfs_init_inode, (void *)(&args)); if (!root_inode) { printk ("reiserfs_fill_super: get root inode failed\n"); goto error; + } + + if (root_inode->i_state & I_NEW) { + reiserfs_init_inode2(root_inode, &args); + unlock_new_inode(root_inode); } s->s_root = d_alloc_root(root_inode); diff -urN orig/include/linux/fs.h icreate2/include/linux/fs.h --- orig/include/linux/fs.h Mon Apr 29 14:11:21 2002 +++ icreate2/include/linux/fs.h Mon Apr 29 16:03:48 2002 @@ -852,13 +852,6 @@ void (*read_inode) (struct inode *); - /* reiserfs kludge. reiserfs needs 64 bits of information to - ** find an inode. We are using the read_inode2 call to get - ** that information. We don't like this, and are waiting on some - ** VFS changes for the real solution. - ** iget4 calls read_inode2, iff it is defined - */ - void (*read_inode2) (struct inode *, void *) ; void (*dirty_inode) (struct inode *); void (*write_inode) (struct inode *, int); void (*put_inode) (struct inode *); @@ -906,6 +899,7 @@ #define I_LOCK 8 #define I_FREEING 16 #define I_CLEAR 32 +#define I_NEW 64 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES) @@ -1442,11 +1436,23 @@ extern struct inode * igrab(struct inode *); extern ino_t iunique(struct super_block *, ino_t); -typedef int (*find_inode_t)(struct inode *, unsigned long, void *); -extern struct inode * iget4(struct super_block *, unsigned long, find_inode_t, void *); +typedef int (*find_inode_t)(struct inode *, void *); +typedef int (*init_inode_t)(struct inode *, void *); +extern struct inode * icreate(struct super_block *, unsigned long, find_inode_t find_actor, init_inode_t init_inode, void *opaque); +extern void unlock_new_inode(struct inode *); + static inline struct inode *iget(struct super_block *sb, unsigned long ino) { - return iget4(sb, ino, NULL, NULL); + struct inode *inode; + + inode = icreate(sb, ino, NULL, NULL, NULL); + + if (inode && (inode->i_state & I_NEW)) { + sb->s_op->read_inode(inode); + unlock_new_inode(inode); + } + + return inode; } extern void clear_inode(struct inode *); diff -urN orig/include/linux/reiserfs_fs.h icreate2/include/linux/reiserfs_fs.h --- orig/include/linux/reiserfs_fs.h Mon Apr 29 14:10:33 2002 +++ icreate2/include/linux/reiserfs_fs.h Mon Apr 29 16:17:31 2002 @@ -1819,7 +1819,8 @@ /* inode.c */ void reiserfs_read_inode (struct inode * inode) ; -void reiserfs_read_inode2(struct inode * inode, void *p) ; +int reiserfs_init_inode(struct inode * inode, void *p) ; +void reiserfs_init_inode2(struct inode * inode, struct reiserfs_iget4_args *args) ; void reiserfs_delete_inode (struct inode * inode); void reiserfs_write_inode (struct inode * inode, int) ; struct dentry *reiserfs_fh_to_dentry(struct super_block *sb, __u32 *data, diff -urN orig/kernel/ksyms.c icreate2/kernel/ksyms.c --- orig/kernel/ksyms.c Mon Apr 29 14:11:22 2002 +++ icreate2/kernel/ksyms.c Mon Apr 29 16:02:01 2002 @@ -137,7 +137,6 @@ EXPORT_SYMBOL(fget); EXPORT_SYMBOL(igrab); EXPORT_SYMBOL(iunique); -EXPORT_SYMBOL(iget4); EXPORT_SYMBOL(iput); EXPORT_SYMBOL(inode_init_once); EXPORT_SYMBOL(force_delete); ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 20:20 ` Jan Harkes @ 2002-04-29 20:31 ` Anton Altaparmakov 2002-04-29 20:48 ` Steve Lord 2002-04-29 20:58 ` Jan Harkes 0 siblings, 2 replies; 31+ messages in thread From: Anton Altaparmakov @ 2002-04-29 20:31 UTC (permalink / raw) To: Jan Harkes; +Cc: linux-fsdevel At 21:20 29/04/02, Jan Harkes wrote: >On Mon, Apr 29, 2002 at 08:51:13PM +0100, Christoph Hellwig wrote: > > On Mon, Apr 29, 2002 at 08:34:00PM +0100, Anton Altaparmakov wrote: > > > > > > - Replace iget4() with iget5() with an additional argument: a > callback to > > > initialize a locked inode in order to kill the race condition Jan found. > > > > > > struct inode *iget5(sb, ino, find_actor, init_locked_actor, void > *opaque); > > > > > > - Let get_new_inode() call init_locked_actor before dropping the > > > inode_lock. It then calls ->read_inode as usual. > > > > Don't do this - rather use the XFS-style icreate() plus per-filesystem > > exclusion. The VFS shouldn't have to worry about this kind of problems, > > especially with such an ugly API. > >Correct, add find_actor and init_locked_inode to the icreate interface >and drop iget4 like a hot potato. I may have misunderstood in which case I appologize but I understood that Christoph said that he doesn't want init_locked_inode and instead a fs specific exclusion based approach which you would need to take / drop around icreate() or iget4() or whatever you want to call it... Which IMO would be very bad. Anton -- "I've not lost my mind. It's backed up on tape somewhere." - Unknown -- Anton Altaparmakov <aia21 at cantab.net> (replace at with @) Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 20:31 ` Anton Altaparmakov @ 2002-04-29 20:48 ` Steve Lord 2002-04-29 20:58 ` Jan Harkes 1 sibling, 0 replies; 31+ messages in thread From: Steve Lord @ 2002-04-29 20:48 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: Jan Harkes, linux-fsdevel On Mon, 2002-04-29 at 15:31, Anton Altaparmakov wrote: > > I may have misunderstood in which case I appologize but I understood that > Christoph said that he doesn't want init_locked_inode and instead a fs > specific exclusion based approach which you would need to take / drop > around icreate() or iget4() or whatever you want to call it... Which IMO > would be very bad. > > Anton > The current icreate implementation changes nothing in the locking, we create a new inode under the global inode spinlock and drop that lock with the I_LOCK still held. That closes the window of two threads looking up the same inode at once. The contents of this inode are safe from any other thread until we have populated and unlocked it, they will block in wait_on_inode(). There is no other locking requirement being imposed here. If an individual filesystem needs a filesystem wide lock to fill in the inode contents then that's its problem, but there is no such beast in XFS. Steve -- Steve Lord voice: +1-651-683-3511 Principal Engineer, Filesystem Software email: lord@sgi.com ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 20:31 ` Anton Altaparmakov 2002-04-29 20:48 ` Steve Lord @ 2002-04-29 20:58 ` Jan Harkes 2002-04-29 21:04 ` Steve Lord 1 sibling, 1 reply; 31+ messages in thread From: Jan Harkes @ 2002-04-29 20:58 UTC (permalink / raw) To: linux-fsdevel On Mon, Apr 29, 2002 at 09:31:57PM +0100, Anton Altaparmakov wrote: > At 21:20 29/04/02, Jan Harkes wrote: > >On Mon, Apr 29, 2002 at 08:51:13PM +0100, Christoph Hellwig wrote: > >> > >> Don't do this - rather use the XFS-style icreate() plus per-filesystem > >> exclusion. The VFS shouldn't have to worry about this kind of problems, > >> especially with such an ugly API. > > > >Correct, add find_actor and init_locked_inode to the icreate interface > >and drop iget4 like a hot potato. > > I may have misunderstood in which case I appologize but I understood that > Christoph said that he doesn't want init_locked_inode and instead a fs > specific exclusion based approach which you would need to take / drop > around icreate() or iget4() or whatever you want to call it... Which IMO > would be very bad. I conveniently ignored that suggestion. I took icreate as the basis to implement iget4/read_inode2 the "right" way. - Fixes the race I found without resorting to a global per-FS lock. - Replaces the reiserfs specific "hacks", no more iget4 or read_inode2. - Should still be close enough for XFS (add a couple of NULL pointers to the icreate calls). I just thought of moving the initialization of i_state up a couple of lines so that a FS doesn't have to jump through hoops to figure out if the inode passed to destroy_inode has been initialized yet. Jan ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 20:58 ` Jan Harkes @ 2002-04-29 21:04 ` Steve Lord 2002-04-29 21:34 ` Anton Altaparmakov 0 siblings, 1 reply; 31+ messages in thread From: Steve Lord @ 2002-04-29 21:04 UTC (permalink / raw) To: Jan Harkes; +Cc: linux-fsdevel On Mon, 2002-04-29 at 15:58, Jan Harkes wrote: > On Mon, Apr 29, 2002 at 09:31:57PM +0100, Anton Altaparmakov wrote: > > At 21:20 29/04/02, Jan Harkes wrote: > > >On Mon, Apr 29, 2002 at 08:51:13PM +0100, Christoph Hellwig wrote: > > >> > > >> Don't do this - rather use the XFS-style icreate() plus per-filesystem > > >> exclusion. The VFS shouldn't have to worry about this kind of problems, > > >> especially with such an ugly API. > > > > > >Correct, add find_actor and init_locked_inode to the icreate interface > > >and drop iget4 like a hot potato. > > > > I may have misunderstood in which case I appologize but I understood that > > Christoph said that he doesn't want init_locked_inode and instead a fs > > specific exclusion based approach which you would need to take / drop > > around icreate() or iget4() or whatever you want to call it... Which IMO > > would be very bad. > > I conveniently ignored that suggestion. I took icreate as the basis to > implement iget4/read_inode2 the "right" way. > > - Fixes the race I found without resorting to a global per-FS lock. > - Replaces the reiserfs specific "hacks", no more iget4 or read_inode2. > - Should still be close enough for XFS (add a couple of NULL pointers > to the icreate calls). > > I just thought of moving the initialization of i_state up a couple of > lines so that a FS doesn't have to jump through hoops to figure out if > the inode passed to destroy_inode has been initialized yet. > > Jan I think it works for XFS from looking at the code, I am still wondering about the need for the init_inode call, If we are returning to the fs with the inode locked and with I_NEW set, it can do all the work there and then unlock. What is an example of what init_inode is dealing with? Steve -- Steve Lord voice: +1-651-683-3511 Principal Engineer, Filesystem Software email: lord@sgi.com ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 21:04 ` Steve Lord @ 2002-04-29 21:34 ` Anton Altaparmakov 0 siblings, 0 replies; 31+ messages in thread From: Anton Altaparmakov @ 2002-04-29 21:34 UTC (permalink / raw) To: Steve Lord; +Cc: Jan Harkes, linux-fsdevel At 22:04 29/04/02, Steve Lord wrote: >I think it works for XFS from looking at the code, I am still wondering >about the need for the init_inode call, If we are returning to the fs >with the inode locked and with I_NEW set, it can do all the work there >and then unlock. What is an example of what init_inode is dealing with? I will give you NTFS as it is very simple. you have multiple "struct inode" with the same i_ino which are all the same inode but each struct inode signifies a different data stream (this is over simplified but will do for the example). find_actor needs a "stream_id" inopaque do distinguish which is which. You need init_inode in order to set the "stream_id" in the struct ntfs_inode, so that a subsequent find_actor can retrieve it and compare it to the current opaque during the find_inode call. If the stream_id is not set, find_actor will think this is a different inode (even though it may be the same) and you will end up with two struct inodes in the inode cache with identical i_ino AND identical stream_id. Unnecessary to say that you have just blown up the fs/VFS... And this is a very realistic smp race where two igets arrive simultaneously for same i_ino and same "stream_id". The I_LOCK does nothing here. You need the inode cache lock exclusion or find_actor will get the comparison wrong. Is it clear now or have I confused you? Not sure if I explained it well... Anton -- "I've not lost my mind. It's backed up on tape somewhere." - Unknown -- Anton Altaparmakov <aia21 at cantab.net> (replace at with @) Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 19:34 ` Anton Altaparmakov ` (2 preceding siblings ...) 2002-04-29 19:51 ` Christoph Hellwig @ 2002-04-29 22:36 ` Alexander Viro 2002-04-29 22:54 ` Jan Harkes 3 siblings, 1 reply; 31+ messages in thread From: Alexander Viro @ 2002-04-29 22:36 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: Jan Harkes, linux-fsdevel On Mon, 29 Apr 2002, Anton Altaparmakov wrote: > - Replace iget4() with iget5() with an additional argument: a callback to > initialize a locked inode in order to kill the race condition Jan found. > > struct inode *iget5(sb, ino, find_actor, init_locked_actor, void *opaque); test, set, data > - Let get_new_inode() call init_locked_actor before dropping the > inode_lock. It then calls ->read_inode as usual. > > - read_inode2 is no longer needed as init_actor already has done whatever > it needed to do with opaque so no need to pass it into read_inode as well. > > BONUS: Existing file systems do not need modification except for nfs, > ReiserFS and CODA and possibly ex-tree fs like XFS. > > We can either leave it at this or go on to really remove ->read_inode > completely, e.g. like so, in patch 2: > > - Introduce VFS helper fs_iget(): (perhaps as static inline in linux/fs.h?) > > struct inode *fs_iget(sb, ino, find_actor, init_locked_actor, opaque, > fs_read_inode) > { > inode = iget5(sb, ino, find_actor, init_locked_actor, opaque); > if (inode has I_NEW set) { > fs_read_inode(inode); > unlock_new_inode(inode); > } > return inode; > } > > FS which want to play games don't have to use this and write their own. > > - Remove ->read_inode altogether. Switch all existing fs to using fs_iget. > > Comments? That would make sense. What's more, that would allow to handle read_inode() failures in more graceful way than we do right now. HOWEVER, keep in mind that there are pieces of code that do iget() on other filesystems' inodes. Each of these will need to be fixed. Notice that knfsd used to be the worst offender, but it's getting better. CODA needs to be fixed. Don't worry about Intermezzo - in 2.5 it's deeply broken, so that will have to wait until maintainers will get around to the damn thing. grep around - there might be other instances. CODA should simply pass opened descriptors instead of inumbers. No need to bother with iget() when server had already opened the file in question. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 22:36 ` Alexander Viro @ 2002-04-29 22:54 ` Jan Harkes 0 siblings, 0 replies; 31+ messages in thread From: Jan Harkes @ 2002-04-29 22:54 UTC (permalink / raw) To: Alexander Viro; +Cc: linux-fsdevel On Mon, Apr 29, 2002 at 06:36:53PM -0400, Alexander Viro wrote: > Notice that knfsd used to be the worst offender, but it's getting better. > CODA needs to be fixed. Don't worry about Intermezzo - in 2.5 it's > deeply broken, so that will have to wait until maintainers will get > around to the damn thing. grep around - there might be other instances. > > CODA should simply pass opened descriptors instead of inumbers. No need > to bother with iget() when server had already opened the file in question. We have already been doing that since 2.3.xx, as well as using iget4 for our own inodes because we map a 96-bit fileid on the 32-bit inode number space. I changed my patch which replaces iget4 with icreate and uses the 'test', 'set', and 'data' names, interested? Jan ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 16:14 iget4/read_inode2 race in get_new_inode Jan Harkes 2002-04-29 16:48 ` Jan Harkes @ 2002-04-29 17:18 ` Anton Altaparmakov 2002-04-29 17:33 ` Jan Harkes 2002-04-29 17:44 ` Alexander Viro 2 siblings, 1 reply; 31+ messages in thread From: Anton Altaparmakov @ 2002-04-29 17:18 UTC (permalink / raw) To: Jan Harkes; +Cc: linux-fsdevel At 17:14 29/04/02, Jan Harkes wrote: >Hi all, > >I believe I found a race in get_new_inode (fs/inode.c) which affects >filesystems that use iget4 (and read_inode2). Because the 'opaque' value >of the new inode is initialized in read_inode2 after the inode_lock has >already been released. So another thread could call into get_new_inode >and not see the inode that hasn't yet been initialized in read_inode2. > >Moving the spin_unlock(&inode_lock) to after the read_inode calls will >fix this, ->read_inode can (and will in many cases in NTFS) block as it has to read the inode from disk into the page cache and acquire semaphores and other longs along the way. So this solution doesn't work... >but could lead to deadlock for stacking filesystems that try >to access another filesystem/inode from their read_inode call. Coda is >safe, but at least intermezzo seems to be calling through to other >filesystems. > >A solution would be to add i_ops->init_locked_inode(inode, opaque) which >would be called while we're still holding the inode lock. I couldn't see >a simple way to split such functionality out of the reiserfs read_inode2 >function, but it would look something like the following. > >static int reiserfs_init_locked_inode(struct inode *inode, void *opaque) >{ > INODE_PKEY(inode)->k_dir_id = > ((struct reiserfs_iget4_args *)opaque)->objectid; >} Yes, such an approach would work for ntfs, too. Basically breaking up read inode into two parts, one non-blocking and one blocking. But another posibility would be to create a new field in struct inode, i_opaque say, and have get_new_inode set it to opaque. That would be cleaner in as far as there would be no need to create another i_ops method. And it would get rid of the race completely. The only bad thing about that is that the caller than has to be careful whether or not to free the structure attached to opaque... Best regards, Anton -- "I've not lost my mind. It's backed up on tape somewhere." - Unknown -- Anton Altaparmakov <aia21 at cantab.net> (replace at with @) Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 17:18 ` Anton Altaparmakov @ 2002-04-29 17:33 ` Jan Harkes 0 siblings, 0 replies; 31+ messages in thread From: Jan Harkes @ 2002-04-29 17:33 UTC (permalink / raw) To: linux-fsdevel On Mon, Apr 29, 2002 at 06:18:43PM +0100, Anton Altaparmakov wrote: > Yes, such an approach would work for ntfs, too. Basically breaking up read > inode into two parts, one non-blocking and one blocking. > > But another posibility would be to create a new field in struct inode, > i_opaque say, and have get_new_inode set it to opaque. > That would be cleaner in as far as there would be no need to create another > i_ops method. And it would get rid of the race completely. The only bad > thing about that is that the caller than has to be careful whether or not > to free the structure attached to opaque... Yes, it is a pointer to a structure in at least Reiserfs and Coda, and when we return from iget4 the file system doesn't know whether this inode was just taken from the cache, or newly created. Allowing the filesystem to initialize the necessary data in a non-blocking part of read_inode (or alloc_inode) would be better. And I'd like to add my vote for removing the 'temporary reiserfs hack' status of read_inode2 and making it into the default read_inode operation. Jan ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 16:14 iget4/read_inode2 race in get_new_inode Jan Harkes 2002-04-29 16:48 ` Jan Harkes 2002-04-29 17:18 ` Anton Altaparmakov @ 2002-04-29 17:44 ` Alexander Viro 2002-04-29 18:02 ` Jan Harkes 2 siblings, 1 reply; 31+ messages in thread From: Alexander Viro @ 2002-04-29 17:44 UTC (permalink / raw) To: Jan Harkes; +Cc: linux-fsdevel On Mon, 29 Apr 2002, Jan Harkes wrote: > I believe I found a race in get_new_inode (fs/inode.c) which affects > filesystems that use iget4 (and read_inode2). Because the 'opaque' value > of the new inode is initialized in read_inode2 after the inode_lock has > already been released. So another thread could call into get_new_inode > and not see the inode that hasn't yet been initialized in read_inode2. What stops you from doing exclusion between iget4() calls on filesystem level? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 17:44 ` Alexander Viro @ 2002-04-29 18:02 ` Jan Harkes 2002-04-29 18:22 ` Anton Altaparmakov 0 siblings, 1 reply; 31+ messages in thread From: Jan Harkes @ 2002-04-29 18:02 UTC (permalink / raw) To: Alexander Viro; +Cc: linux-fsdevel On Mon, Apr 29, 2002 at 01:44:00PM -0400, Alexander Viro wrote: > On Mon, 29 Apr 2002, Jan Harkes wrote: > > > I believe I found a race in get_new_inode (fs/inode.c) which affects > > filesystems that use iget4 (and read_inode2). Because the 'opaque' value > > of the new inode is initialized in read_inode2 after the inode_lock has > > already been released. So another thread could call into get_new_inode > > and not see the inode that hasn't yet been initialized in read_inode2. > > What stops you from doing exclusion between iget4() calls on filesystem level? Nothing, except for the fact that the regular iget doesn't require such exclusion, and the only 'example' for how to use iget4/read_inode2 in the kernel (reiserfs) doesn't do that either. I will use that to fix the race in Coda for the moment. In another thread Alexander responsed: > > Better solution would probably be to pass the opaque value to the > > existing i_ops->alloc_inode call. > > Hell, no. ->read_inode()/->read_inode2()/etc. are callbacks. Adding > more mess related to iget() into real methods is a Bad Idea(tm). > > If anything, we might make your init_locked_inode() a second callback > argument to iget4() (see the way sget() works). I would prefer that > combined with "return locked inode" scheme and killed ->read_inode() > as a method - let filesystem call it explicitly. I figured you would have a more appropriate solution. If you want I can create a patch that implements this for 2.5 and send it to fsdevel for further comments. I'm sure that some people will not appreciate changes to the VFS interface ;) Jan ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 18:02 ` Jan Harkes @ 2002-04-29 18:22 ` Anton Altaparmakov 2002-04-29 18:30 ` Steve Lord 0 siblings, 1 reply; 31+ messages in thread From: Anton Altaparmakov @ 2002-04-29 18:22 UTC (permalink / raw) To: Jan Harkes; +Cc: Alexander Viro, linux-fsdevel At 19:02 29/04/02, Jan Harkes wrote: >In another thread Alexander responsed: > > Jan Harkes wrote: > > > Better solution would probably be to pass the opaque value to the > > > existing i_ops->alloc_inode call. > > > > Hell, no. ->read_inode()/->read_inode2()/etc. are callbacks. Adding > > more mess related to iget() into real methods is a Bad Idea(tm). > > > > If anything, we might make your init_locked_inode() a second callback > > argument to iget4() (see the way sget() works). I would prefer that > > combined with "return locked inode" scheme and killed ->read_inode() > > as a method - let filesystem call it explicitly. > >I figured you would have a more appropriate solution. If you want I can >create a patch that implements this for 2.5 and send it to fsdevel for >further comments. I'm sure that some people will not appreciate changes >to the VFS interface ;) The VFS interfaces have changed quite a bit since 2.5.0 so a few more changes won't make a difference. While we are entrenched in iget4()/->read_inode{,2}() cleanups, why does find_inode_t take an inode number as argument at all? IMHO that can be removed and find_actor can get the inode number from struct inode passed in as first argument. After all when find_actor is invoked by find_inode the two inode numbers have been tested for equality only 4 lines before. Jan, if you decide not to do the patches, ping me and I will have a go at them. But no point in duplicating work if you are going to do them already... (-; I had a look at the XFS code Christoph mentioned in the previous thread on LKML and their "return locked inode" stuff would be quite usable after a cleanup and a few renames... Best regards, Anton -- "I've not lost my mind. It's backed up on tape somewhere." - Unknown -- Anton Altaparmakov <aia21 at cantab.net> (replace at with @) Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 18:22 ` Anton Altaparmakov @ 2002-04-29 18:30 ` Steve Lord 2002-04-29 18:39 ` Christoph Hellwig 2002-04-29 19:12 ` Jan Harkes 0 siblings, 2 replies; 31+ messages in thread From: Steve Lord @ 2002-04-29 18:30 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: Jan Harkes, Alexander Viro, linux-fsdevel On Mon, 2002-04-29 at 13:22, Anton Altaparmakov wrote: > At 19:02 29/04/02, Jan Harkes wrote: > >In another thread Alexander responsed: > > > Jan Harkes wrote: > > > > Better solution would probably be to pass the opaque value to the > > > > existing i_ops->alloc_inode call. > > > > > > Hell, no. ->read_inode()/->read_inode2()/etc. are callbacks. Adding > > > more mess related to iget() into real methods is a Bad Idea(tm). > > > > > > If anything, we might make your init_locked_inode() a second callback > > > argument to iget4() (see the way sget() works). I would prefer that > > > combined with "return locked inode" scheme and killed ->read_inode() > > > as a method - let filesystem call it explicitly. > > > >I figured you would have a more appropriate solution. If you want I can > >create a patch that implements this for 2.5 and send it to fsdevel for > >further comments. I'm sure that some people will not appreciate changes > >to the VFS interface ;) > > The VFS interfaces have changed quite a bit since 2.5.0 so a few more > changes won't make a difference. > > While we are entrenched in iget4()/->read_inode{,2}() cleanups, why does > find_inode_t take an inode number as argument at all? IMHO that can be > removed and find_actor can get the inode number from struct inode passed in > as first argument. After all when find_actor is invoked by find_inode the > two inode numbers have been tested for equality only 4 lines before. > > Jan, if you decide not to do the patches, ping me and I will have a go at > them. But no point in duplicating work if you are going to do them > already... (-; I had a look at the XFS code Christoph mentioned in the > previous thread on LKML and their "return locked inode" stuff would be > quite usable after a cleanup and a few renames... > > Best regards, > > Anton > Here for the record is that code - based by the way on some suggestions Al made here to Chris Mason about other ways of doing things than read_inode2 Any comments Al, and Anton, what were the cleanups and renames? The EXPORT_SYMBOL parts of this are missing right now, they should be in inode.c not ksyms.c where we currently have them. Steve --- linux-2.5/fs/inode.c Tue Apr 16 13:26:42 2002 +++ linux-xfs/fs/inode.c Wed Apr 17 11:10:39 2002 @@ -842,6 +843,46 @@ struct inode *new_inode(struct super_blo return inode; } +static inline void _unlock_new_inode(struct inode *inode) +{ + /* + * This is special! We do not need the spinlock + * when clearing I_LOCK, because we're guaranteed + * that nobody else tries to do anything about the + * state of the inode when it is locked, as we + * just created it (so there can be no old holders + * that haven't tested I_LOCK). + */ + inode->i_state &= ~(I_LOCK|I_NEW); + wake_up(&inode->i_wait); +} + + +/* + * Broken out of create_new_inode for clarity, Calls the read_inode + * function, unlocks the populated inode, and wakes up anyone + * waiting for it to be available. + */ + +static inline void populate_inode( + struct super_block *sb, + struct inode * inode, + void *opaque) +{ + /* reiserfs specific hack right here. We don't + ** want this to last, and are looking for VFS changes + ** that will allow us to get rid of it. + ** -- mason@suse.com + */ + if (sb->s_op->read_inode2) { + sb->s_op->read_inode2(inode, opaque) ; + } else { + sb->s_op->read_inode(inode); + } + + _unlock_new_inode(inode); +} + /* * This is called without the inode lock held.. Be careful. * @@ -864,31 +905,12 @@ static struct inode * get_new_inode(stru list_add(&inode->i_list, &inode_in_use); list_add(&inode->i_hash, head); inode->i_ino = ino; - inode->i_state = I_LOCK; + inode->i_state = I_LOCK|I_NEW; spin_unlock(&inode_lock); - /* reiserfs specific hack right here. We don't - ** want this to last, and are looking for VFS changes - ** that will allow us to get rid of it. - ** -- mason@suse.com - */ - if (sb->s_op->read_inode2) { - sb->s_op->read_inode2(inode, opaque) ; - } else { - sb->s_op->read_inode(inode); - } - - /* - * This is special! We do not need the spinlock - * when clearing I_LOCK, because we're guaranteed - * that nobody else tries to do anything about the - * state of the inode when it is locked, as we - * just created it (so there can be no old holders - * that haven't tested I_LOCK). + /* Return the locked inode with I_NEW set, the + * caller is responsible for filling in the contents */ - inode->i_state &= ~I_LOCK; - wake_up(&inode->i_wait); - return inode; } @@ -968,6 +990,34 @@ struct inode *igrab(struct inode *inode) return inode; } +/* + * This is iget4 without the read_inode portion of get_new_inode + * the filesystem gets back a new locked and hashed inode and gets + * to fill it in before unlocking it via unlock_new_inode(). + */ +struct inode *icreate(struct super_block *sb, unsigned long ino) +{ + struct list_head * head = inode_hashtable + hash(sb,ino); + struct inode * inode; + + spin_lock(&inode_lock); + inode = find_inode(sb, ino, head, NULL, NULL); + if (inode) { + __iget(inode); + spin_unlock(&inode_lock); + wait_on_inode(inode); + return inode; + } + spin_unlock(&inode_lock); + + return get_new_inode(sb, ino, head, NULL, NULL); +} + +void unlock_new_inode(struct inode *inode) +{ + _unlock_new_inode(inode); +} + struct inode *iget4(struct super_block *sb, unsigned long ino, find_inode_t find_actor, void *opaque) { @@ -988,7 +1038,11 @@ struct inode *iget4(struct super_block * * get_new_inode() will do the right thing, re-trying the search * in case it had to block at any point. */ - return get_new_inode(sb, ino, head, find_actor, opaque); + inode = get_new_inode(sb, ino, head, find_actor, opaque); + if (inode && (inode->i_state & I_NEW)) + populate_inode(sb, inode, opaque); + + return inode; } /** --- linux-2.5/include/linux/fs.h Mon Apr 29 07:03:15 2002 +++ linux-xfs/include/linux/fs.h Mon Apr 29 13:25:49 2002 @@ -906,6 +906,7 @@ struct super_operations { #define I_LOCK 8 #define I_FREEING 16 #define I_CLEAR 32 +#define I_NEW 64 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES) @@ -1449,6 +1450,9 @@ static inline struct inode *iget(struct return iget4(sb, ino, NULL, NULL); } +extern struct inode * icreate(struct super_block *, unsigned long); +extern void unlock_new_inode(struct inode *); + extern void clear_inode(struct inode *); extern struct inode *new_inode(struct super_block *); extern void remove_suid(struct dentry *); -- Steve Lord voice: +1-651-683-3511 Principal Engineer, Filesystem Software email: lord@sgi.com ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 18:30 ` Steve Lord @ 2002-04-29 18:39 ` Christoph Hellwig 2002-04-29 19:20 ` Steve Lord 2002-04-29 19:12 ` Jan Harkes 1 sibling, 1 reply; 31+ messages in thread From: Christoph Hellwig @ 2002-04-29 18:39 UTC (permalink / raw) To: Steve Lord; +Cc: Anton Altaparmakov, Jan Harkes, Alexander Viro, linux-fsdevel On Mon, Apr 29, 2002 at 01:30:46PM -0500, Steve Lord wrote: > Here for the record is that code - based by the way on some suggestions > Al made here to Chris Mason about other ways of doing things than read_inode2 > > Any comments Al, and Anton, what were the cleanups and renames? The > EXPORT_SYMBOL parts of this are missing right now, they should be in > inode.c not ksyms.c where we currently have them. * kill ->read_inode2 entirely, reiserfs can be directly converted to icreate() * kill populate_inode, it is small and has only one user * either remove (_)unlock_new_inode entirely and let the callers do the two lines themselves or make it a #define in fs.h (unfortunately an inline isn't possible there due to missing wake_up() declaration) ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 18:39 ` Christoph Hellwig @ 2002-04-29 19:20 ` Steve Lord 2002-04-29 19:52 ` Christoph Hellwig 2002-04-29 22:02 ` Chris Mason 0 siblings, 2 replies; 31+ messages in thread From: Steve Lord @ 2002-04-29 19:20 UTC (permalink / raw) To: Christoph Hellwig Cc: Anton Altaparmakov, Jan Harkes, Alexander Viro, linux-fsdevel, Chris Mason On Mon, 2002-04-29 at 13:39, Christoph Hellwig wrote: > On Mon, Apr 29, 2002 at 01:30:46PM -0500, Steve Lord wrote: > > Here for the record is that code - based by the way on some suggestions > > Al made here to Chris Mason about other ways of doing things than read_inode2 > > > > Any comments Al, and Anton, what were the cleanups and renames? The > > EXPORT_SYMBOL parts of this are missing right now, they should be in > > inode.c not ksyms.c where we currently have them. > > * kill ->read_inode2 entirely, reiserfs can be directly converted to > icreate() You know, Chris Mason was going to look at that a looong time ago. Lets see what he says now. > * kill populate_inode, it is small and has only one user > * either remove (_)unlock_new_inode entirely and let the callers do the > two lines themselves or make it a #define in fs.h (unfortunately an > inline isn't possible there due to missing wake_up() declaration) Yes, these were just making what we had done more obvious. Here is a slight modification, I would prefer reiserfs expertise be involved in removal of read_inode2. Steve Here is a small cleanup then: --- linux-2.5/fs/inode.c Tue Apr 16 13:26:42 2002 +++ linux-xfs/fs/inode.c Mon Apr 29 14:05:58 2002 @@ -17,6 +17,7 @@ #include <linux/swapctl.h> #include <linux/prefetch.h> #include <linux/locks.h> +#include <linux/module.h> /* * New inode.c implementation. @@ -864,31 +866,12 @@ static struct inode * get_new_inode(stru list_add(&inode->i_list, &inode_in_use); list_add(&inode->i_hash, head); inode->i_ino = ino; - inode->i_state = I_LOCK; + inode->i_state = I_LOCK|I_NEW; spin_unlock(&inode_lock); - /* reiserfs specific hack right here. We don't - ** want this to last, and are looking for VFS changes - ** that will allow us to get rid of it. - ** -- mason@suse.com - */ - if (sb->s_op->read_inode2) { - sb->s_op->read_inode2(inode, opaque) ; - } else { - sb->s_op->read_inode(inode); - } - - /* - * This is special! We do not need the spinlock - * when clearing I_LOCK, because we're guaranteed - * that nobody else tries to do anything about the - * state of the inode when it is locked, as we - * just created it (so there can be no old holders - * that haven't tested I_LOCK). + /* Return the locked inode with I_NEW set, the + * caller is responsible for filling in the contents */ - inode->i_state &= ~I_LOCK; - wake_up(&inode->i_wait); - return inode; } @@ -968,6 +951,46 @@ struct inode *igrab(struct inode *inode) return inode; } +/* + * This is iget4 without the read_inode portion of get_new_inode + * the filesystem gets back a new locked and hashed inode and gets + * to fill it in before unlocking it via unlock_new_inode(). + */ +struct inode *icreate(struct super_block *sb, unsigned long ino) +{ + struct list_head * head = inode_hashtable + hash(sb,ino); + struct inode * inode; + + spin_lock(&inode_lock); + inode = find_inode(sb, ino, head, NULL, NULL); + if (inode) { + __iget(inode); + spin_unlock(&inode_lock); + wait_on_inode(inode); + return inode; + } + spin_unlock(&inode_lock); + + return get_new_inode(sb, ino, head, NULL, NULL); +} + +void unlock_new_inode(struct inode *inode) +{ + /* + * This is special! We do not need the spinlock + * when clearing I_LOCK, because we're guaranteed + * that nobody else tries to do anything about the + * state of the inode when it is locked, as we + * just created it (so there can be no old holders + * that haven't tested I_LOCK). + */ + inode->i_state &= ~(I_LOCK|I_NEW); + wake_up(&inode->i_wait); +} + +EXPORT_SYMBOL(icreate); +EXPORT_SYMBOL(unlock_new_inode); + struct inode *iget4(struct super_block *sb, unsigned long ino, find_inode_t find_actor, void *opaque) { @@ -988,7 +1011,23 @@ struct inode *iget4(struct super_block * * get_new_inode() will do the right thing, re-trying the search * in case it had to block at any point. */ - return get_new_inode(sb, ino, head, find_actor, opaque); + inode = get_new_inode(sb, ino, head, find_actor, opaque); + if (inode && (inode->i_state & I_NEW)) { + /* reiserfs specific hack right here. We don't + ** want this to last, and are looking for VFS changes + ** that will allow us to get rid of it. + ** -- mason@suse.com + */ + if (sb->s_op->read_inode2) { + sb->s_op->read_inode2(inode, opaque) ; + } else { + sb->s_op->read_inode(inode); + } + + unlock_new_inode(inode); + } + + return inode; } /** --- linux-2.5/fs/Makefile Mon Apr 29 07:03:14 2002 +++ linux-xfs/fs/Makefile Mon Apr 29 14:02:21 2002 @@ -7,14 +7,14 @@ O_TARGET := fs.o -export-objs := filesystems.o open.o dcache.o buffer.o bio.o +export-objs := filesystems.o open.o dcache.o buffer.o bio.o inode.o mod-subdirs := nls obj-y := open.o read_write.o devices.o file_table.o buffer.o \ bio.o super.o block_dev.o char_dev.o stat.o exec.o pipe.o \ namei.o fcntl.o ioctl.o readdir.o select.o fifo.o locks.o \ - dcache.o inode.o attr.o bad_inode.o file.o iobuf.o dnotify.o \ + dcache.o attr.o bad_inode.o file.o iobuf.o dnotify.o \ filesystems.o namespace.o seq_file.o xattr.o libfs.o filesystems.o namespace.o seq_file.o xattr.o libfs.o quota.o ifneq ($(CONFIG_NFSD),n) ifneq ($(CONFIG_NFSD),) --- linux-2.5/include/linux/fs.h Mon Apr 29 07:03:15 2002 +++ linux-xfs/include/linux/fs.h Mon Apr 29 13:25:49 2002 @@ -906,6 +906,7 @@ struct super_operations { #define I_LOCK 8 #define I_FREEING 16 #define I_CLEAR 32 +#define I_NEW 64 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES) @@ -1449,6 +1450,9 @@ static inline struct inode *iget(struct return iget4(sb, ino, NULL, NULL); } +extern struct inode * icreate(struct super_block *, unsigned long); +extern void unlock_new_inode(struct inode *); + extern void clear_inode(struct inode *); extern struct inode *new_inode(struct super_block *); extern void remove_suid(struct dentry *); -- Steve Lord voice: +1-651-683-3511 Principal Engineer, Filesystem Software email: lord@sgi.com ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 19:20 ` Steve Lord @ 2002-04-29 19:52 ` Christoph Hellwig 2002-04-29 22:02 ` Chris Mason 1 sibling, 0 replies; 31+ messages in thread From: Christoph Hellwig @ 2002-04-29 19:52 UTC (permalink / raw) To: Steve Lord Cc: Anton Altaparmakov, Jan Harkes, Alexander Viro, linux-fsdevel, Chris Mason On Mon, Apr 29, 2002 at 02:20:56PM -0500, Steve Lord wrote: > Yes, these were just making what we had done more obvious. Here is a > slight modification, I would prefer reiserfs expertise be involved in > removal of read_inode2. Code looks okay to be (besides ->read_inode2 stayying, but that can be removed as next step). ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 19:20 ` Steve Lord 2002-04-29 19:52 ` Christoph Hellwig @ 2002-04-29 22:02 ` Chris Mason 1 sibling, 0 replies; 31+ messages in thread From: Chris Mason @ 2002-04-29 22:02 UTC (permalink / raw) To: Steve Lord Cc: Christoph Hellwig, Anton Altaparmakov, Jan Harkes, Alexander Viro, linux-fsdevel On Mon, 2002-04-29 at 15:20, Steve Lord wrote: > On Mon, 2002-04-29 at 13:39, Christoph Hellwig wrote: > > On Mon, Apr 29, 2002 at 01:30:46PM -0500, Steve Lord wrote: > > > Here for the record is that code - based by the way on some suggestions > > > Al made here to Chris Mason about other ways of doing things than read_inode2 > > > > > > Any comments Al, and Anton, what were the cleanups and renames? The > > > EXPORT_SYMBOL parts of this are missing right now, they should be in > > > inode.c not ksyms.c where we currently have them. > > > > * kill ->read_inode2 entirely, reiserfs can be directly converted to > > icreate() > > You know, Chris Mason was going to look at that a looong time ago. Lets > see what he says now. I like it a lot. I forget how the last version of this patch died, but I'm coding the reiserfs usage of this tonight. -chris ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: iget4/read_inode2 race in get_new_inode 2002-04-29 18:30 ` Steve Lord 2002-04-29 18:39 ` Christoph Hellwig @ 2002-04-29 19:12 ` Jan Harkes 1 sibling, 0 replies; 31+ messages in thread From: Jan Harkes @ 2002-04-29 19:12 UTC (permalink / raw) To: Steve Lord; +Cc: linux-fsdevel On Mon, Apr 29, 2002 at 01:30:46PM -0500, Steve Lord wrote: > Here for the record is that code - based by the way on some > suggestions Al made here to Chris Mason about other ways of doing > things than read_inode2 Nice, looking pretty much like what I just started on doing. I'll just start working from this one. It's just adding the 'init_locked_inode' callback function to fix the race I mentioned. unlock_new_inode should probably try to catch filesystems that tried to initialize an inode that already existed by adding BUG_ON(!(inode->i_state & I_NEW)); We can remove iget4, any filesystem that uses it will want to do special initialization (read_inode2) anyways. Jan ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2002-04-29 22:54 UTC | newest] Thread overview: 31+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-04-29 16:14 iget4/read_inode2 race in get_new_inode Jan Harkes 2002-04-29 16:48 ` Jan Harkes 2002-04-29 17:49 ` Alexander Viro 2002-04-29 19:34 ` Anton Altaparmakov 2002-04-29 19:39 ` Anton Altaparmakov 2002-04-29 19:41 ` Steve Lord 2002-04-29 19:51 ` Anton Altaparmakov 2002-04-29 19:51 ` Christoph Hellwig 2002-04-29 20:07 ` Anton Altaparmakov 2002-04-29 20:13 ` Christoph Hellwig 2002-04-29 20:24 ` Jan Harkes 2002-04-29 20:28 ` Anton Altaparmakov 2002-04-29 20:20 ` Jan Harkes 2002-04-29 20:31 ` Anton Altaparmakov 2002-04-29 20:48 ` Steve Lord 2002-04-29 20:58 ` Jan Harkes 2002-04-29 21:04 ` Steve Lord 2002-04-29 21:34 ` Anton Altaparmakov 2002-04-29 22:36 ` Alexander Viro 2002-04-29 22:54 ` Jan Harkes 2002-04-29 17:18 ` Anton Altaparmakov 2002-04-29 17:33 ` Jan Harkes 2002-04-29 17:44 ` Alexander Viro 2002-04-29 18:02 ` Jan Harkes 2002-04-29 18:22 ` Anton Altaparmakov 2002-04-29 18:30 ` Steve Lord 2002-04-29 18:39 ` Christoph Hellwig 2002-04-29 19:20 ` Steve Lord 2002-04-29 19:52 ` Christoph Hellwig 2002-04-29 22:02 ` Chris Mason 2002-04-29 19:12 ` Jan Harkes
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.