* Race with inodes in I_FREEING state
From: Livio Baldini Soares @ 2003-06-13 3:44 UTC (permalink / raw)
To: linux-fsdevel; +Cc: neilb
Hello!
I'm developing a file system for Linux (I'm currently only using the
2.4 tree), and seem to have hit a small race where the VFS code
starts to iget() an inode while it is being freed, which is causing
my code to panic.
The race occurs in the following scenario:
1) prune_icache() is called, and inode $x$ (ino = $z$) is removed from
the inode hash.
2) dispose_list() is called, but is preempted/scheduled.
3) Another task calls iget() for inode $y$ (ino also = $z$), doesn't
find it in the hash, and reads the inode (read_inode()).
4) dispose_list() wakes up, and finally calls FS-specific clear_inode()
operation on inode $x$.
It _is_ true that $x$ in steps 1 and 4 is a different inode than $y$
in step 3. However, my FS has some hashed/shared data, kept in 'union
u', which is deleted when clear_inode() is called. So, at the end of
step 4, inode $y$ has a broken 'u' field, pointing to freed memory.
After looking around in the archive, I believe this race is similar
to the one described here, by Neil Brown:
http://marc.theaimsgroup.com/?l=linux-kernel&m=105235852013658&w=2
Does this not also happen in version 2.4.20? Can anybody tell me if
my logic is wrong, or if I'm just plain doing something stupid in my
FS?
Hope I have not troubled anyone, best regards,
--
Livio B. Soares
* Re: Race with inodes in I_FREEING state
From: Neil Brown @ 2003-06-13 5:02 UTC (permalink / raw)
To: Livio Baldini Soares; +Cc: linux-fsdevel
On Friday June 13, livio@ime.usp.br wrote:
> Hello!
>
> I'm developing a file system for Linux (I'm currently only using the
> 2.4 tree), and seem to have hit a small race where the VFS code
> starts to iget() an inode while it is being freed, which is causing
> my code to panic.
>
> The race occurs in the following scenario:
>
> 1) prune_icache() is called, and inode $x$ (ino = $z$) is removed from
> the inode hash.
>
> 2) dispose_list() is called, but is preempted/scheduled.
>
> 3) Another task calls iget() for inode $y$ (ino also = $z$), doesn't
> find it in the hash, and reads the inode (read_inode()).
>
> 4) dispose_list() wakes up, and finally calls FS-specific clear_inode()
> operation on inode $x$.
>
> It _is_ true that $x$ in steps 1 and 4 is a different inode than $y$
> in step 3. However, my FS has some hashed/shared data, kept in 'union
> u', which is deleted when clear_inode() is called. So, at the end of
> step 4, inode $y$ has a broken 'u' field, pointing to freed memory.
>
> After looking around in the archive, I believe this race is similar
> to the one described here, by Neil Brown:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=105235852013658&w=2
>
> Does this not also happen in version 2.4.20? Can anybody tell me if
> my logic is wrong, or if I'm just plain doing something stupid in my
> FS?
Yep. It sounds like the same race. I wasn't going to submit a 2.4
patch until the 2.5 one went in. I hope to submit the 2.4 equivalent
when 2.4.22-pre opens up.
NeilBrown
>
> Hope I have not troubled anyone, best regards,
>
> --
> Livio B. Soares
* Re: Race with inodes in I_FREEING state
From: Andreas Dilger @ 2003-06-13 7:35 UTC (permalink / raw)
To: Neil Brown; +Cc: Livio Baldini Soares, linux-fsdevel
On Jun 13, 2003 15:02 +1000, Neil Brown wrote:
> > I'm developing a file system for Linux (I'm currently only using the
> > 2.4 tree), and seem to have hit a small race where the VFS code
> > starts to iget() an inode while it is being freed, which is causing
> > my code to panic.
> >
> > The race occurs in the following scenario:
> >
> > 1) prune_icache() is called, and inode $x$ (ino = $z$) is removed from
> > the inode hash.
> >
> > 2) dispose_list() is called, but is preempted/scheduled.
> >
> > 3) Another task calls iget() for inode $y$ (ino also = $z$), doesn't
> > find it in the hash, and reads the inode (read_inode()).
> >
> > 4) dispose_list() wakes up, and finally calls FS-specific clear_inode()
> > operation on inode $x$.
> >
> > It _is_ true that $x$ in steps 1 and 4 is a different inode than $y$
> > in step 3. However, my FS has some hashed/shared data, kept in 'union
> > u', which is deleted when clear_inode() is called. So, at the end of
> > step 4, inode $y$ has a broken 'u' field, pointing to freed memory.
> >
> > After looking around in the archive, I believe this race is similar
> > to the one described here, by Neil Brown:
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=105235852013658&w=2
> >
> > Does this not also happen in version 2.4.20? Can anybody tell me if
> > my logic is wrong, or if I'm just plain doing something stupid in my
> > FS?
>
> Yep. It sounds like the same race. I wasn't going to submit a 2.4
> patch until the 2.5 one went in. I hope to submit the 2.4 equivalent
> when 2.4.22-pre opens up.
Sigh, we've just spent a week chasing exactly this same race in Lustre
on 2.4. It also stores pointers to shared data in the inode (DLM locks)
which are freed when clear_inode() is called. We fixed it only a few
hours ago by matching our hashed locks on the same inode _pointer_
instead of just the same inode _number_/generation, which is what the
distributed lock name is.
If only Livio had posted this email last week ;-).
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
* Re: Race with inodes in I_FREEING state
From: Livio Baldini Soares @ 2003-06-13 13:00 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Neil Brown, linux-fsdevel
Hey!
Andreas Dilger writes:
> On Jun 13, 2003 15:02 +1000, Neil Brown wrote:
> > > On Friday June 13, livio@ime.usp.br wrote:
[...]
> > > Does this not also happen in version 2.4.20? Can anybody tell me if
> > > my logic is wrong, or if I'm just plain doing something stupid in my
> > > FS?
> >
> > Yep. It sounds like the same race. I wasn't going to submit a 2.4
> > patch until the 2.5 one went in. I hope to submit the 2.4 equivalent
> > when 2.4.22-pre opens up.
Ah, _great_! Thanks a lot, Neil.
> Sigh, we've just spent a week chasing exactly this same race in Lustre
> on 2.4. It also stores pointers to shared data in the inode (DLM locks)
> which are freed when clear_inode() is called. We fixed it only a few
> hours ago by not matching our hashed locks if they are not for the same
> inode _pointer_ instead of just for the same inode _number_/generation,
> which is what the distributed lock name is.
Hmm... interesting workaround. Except that in my FS, the hashed data
I keep in the inode's private parts has no idea that an inode even
_exists_. Guess I'll have to start keeping a back pointer to the
inode until this is fixed. Darn.
> If only Livio had posted this email last week ;-).
Oops, sorry about that! :-P But you, too, could have sent this earlier
and saved me two days of doing absolutely nothing except staring into
a monitor ;-)
Cheers!!
--
Livio B. Soares