[PATCH 0/1] fsnotify: next_i is freed during fsnotify_unmount_inodes
From: Jerry Hoemann @ 2014-10-10 17:26 UTC
  To: akpm, jeffrey.t.kirsher, kenhelias
  Cc: linux-kernel, linux-fsdevel, tmak, Jerry Hoemann

Resending to CC linux-fsdevel@vger.kernel.org.....


During file system stress testing on 3.10 and 3.12 based kernels,
the umount command occasionally hung in fsnotify_unmount_inodes
in this section of code:

                spin_lock(&inode->i_lock);
                if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
                        spin_unlock(&inode->i_lock);
                        continue;
                }

Because this section of code runs with the global inode_sb_list_lock
held, other tasks eventually hang trying to acquire the lock.

Multiple crash dumps showed the same picture:

The inode->i_state was 0x60 (I_FREEING | I_CLEAR), i_count was 0, and
i_sb_list pointed back at itself.  Since the cursor is now stuck on an
entry whose links never lead back to the list head passed into the
function, the loop's termination check never fires and the kernel
never exits the loop.
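
For reference, the termination condition is the comparison against the
list head; this is the iterator macro as it appears in the 3.10-era
include/linux/list.h:

	/*
	 * The loop only stops when the cursor's list_head equals "head".
	 * An entry whose next pointer points back at itself therefore
	 * keeps the loop spinning on that single entry forever.
	 */
	#define list_for_each_entry_safe(pos, n, head, member)			\
		for (pos = list_entry((head)->next, typeof(*pos), member),	\
			n = list_entry(pos->member.next, typeof(*pos), member);\
		     &pos->member != (head);					\
		     pos = n, n = list_entry(n->member.next, typeof(*n), member))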

To help narrow down the problem, the call to list_del_init in
inode_sb_list_del was changed to list_del.  This poisons the pointers
in the i_sb_list entry and causes the kernel to panic if it traverses
a freed inode.
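
The difference between the two helpers (paraphrased from the 3.10-era
include/linux/list.h) is only in what they leave behind in the removed
entry:

	/* Removed entry points at itself: a stale iterator spins forever. */
	static inline void list_del_init(struct list_head *entry)
	{
		__list_del_entry(entry);
		INIT_LIST_HEAD(entry);
	}

	/* Removed entry is poisoned: a stale dereference oopses instead. */
	static inline void list_del(struct list_head *entry)
	{
		__list_del(entry->prev, entry->next);
		entry->next = LIST_POISON1;
		entry->prev = LIST_POISON2;
	}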

Subsequent stress testing panicked in fsnotify_unmount_inodes at the
bottom of the list_for_each_entry_safe loop, showing that next_i had
already been freed.

We believe the root cause is that next_i can be freed during the window
in which the list_for_each_entry_safe loop temporarily releases
inode_sb_list_lock to call fsnotify and fsnotify_inode_delete, as shown
in the simplified excerpt below.
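
The shape of each iteration, simplified from the 3.10-era
fs/notify/inode_mark.c (the skip checks and reference taking are
elided), is roughly:

	spin_lock(&inode_sb_list_lock);
	list_for_each_entry_safe(inode, next_i, list, i_sb_list) {
		/*
		 * Skip I_NEW/I_FREEING/I_WILL_FREE and i_count == 0 inodes,
		 * __iget(inode), and conditionally __iget(next_i) ...
		 */

		spin_unlock(&inode_sb_list_lock);	/* window opens */

		fsnotify(inode, FS_UNMOUNT, inode, FSNOTIFY_EVENT_INODE,
			 NULL, 0);
		fsnotify_inode_delete(inode);
		iput(inode);

		spin_lock(&inode_sb_list_lock);		/* window closes */
	}
	spin_unlock(&inode_sb_list_lock);

If next_i was not pinned with __iget, nothing prevents another task
from evicting and freeing it while the lock is dropped.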

The code in fsnotify_unmount_inodes attempts to prevent inode and
next_i from being freed by taking a reference with __iget.  However,
it skips the __iget call on next_i (see the excerpt below)
	if i_count == 0, or
	if i_state & (I_FREEING | I_WILL_FREE) is set.
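
For reference, the guard being replaced (visible as the removed lines
in the diff below) only pins next_i when both checks pass; otherwise
next_i is left unreferenced across the unlock window:

	/* In case the dropping of a reference would nuke next_i. */
	if ((&next_i->i_sb_list != list) &&
	    atomic_read(&next_i->i_count)) {
		spin_lock(&next_i->i_lock);
		if (!(next_i->i_state & (I_FREEING | I_WILL_FREE))) {
			__iget(next_i);
			need_iput = next_i;
		}
		spin_unlock(&next_i->i_lock);
	}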


The patch addresses this issue by advancing next_i in the above two
cases until we either find a next_i on which we can call __iget, or we
reach the end of the list.  This makes the handling of next_i more
closely match the handling of the variable "inode".

The time to reproduce the hang is highly variable (from hours to days).
We ran the stress test on a 3.10 kernel with the proposed patch for a
week without failure.

Jerry Hoemann (1):
  fsnotify: next_i is freed during fsnotify_unmount_inodes.

 fs/notify/inode_mark.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

-- 
1.7.11.3


[PATCH 1/1] fsnotify: next_i is freed during fsnotify_unmount_inodes.
From: Jerry Hoemann @ 2014-10-10 17:26 UTC
  To: akpm, jeffrey.t.kirsher, kenhelias
  Cc: linux-kernel, linux-fsdevel, tmak, Jerry Hoemann

During list_for_each_entry_safe, next_i can be freed, causing the loop
to never terminate.  Advance next_i in those cases where __iget is not
done on it.

Signed-off-by: Jerry Hoemann <jerry.hoemann@hp.com>
---
 fs/notify/inode_mark.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index 9ce0622..e849714 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -288,20 +288,25 @@ void fsnotify_unmount_inodes(struct list_head *list)
 		spin_unlock(&inode->i_lock);
 
 		/* In case the dropping of a reference would nuke next_i. */
-		if ((&next_i->i_sb_list != list) &&
-		    atomic_read(&next_i->i_count)) {
+		while (&next_i->i_sb_list != list) {
 			spin_lock(&next_i->i_lock);
-			if (!(next_i->i_state & (I_FREEING | I_WILL_FREE))) {
+			if (!(next_i->i_state & (I_FREEING | I_WILL_FREE)) &&
+						atomic_read(&next_i->i_count)) {
 				__iget(next_i);
 				need_iput = next_i;
+				spin_unlock(&next_i->i_lock);
+				break;
 			}
 			spin_unlock(&next_i->i_lock);
+			next_i = list_entry(next_i->i_sb_list.next,
+						struct inode, i_sb_list);
 		}
 
 		/*
-		 * We can safely drop inode_sb_list_lock here because we hold
-		 * references on both inode and next_i.  Also no new inodes
-		 * will be added since the umount has begun.
+		 * We can safely drop inode_sb_list_lock here because we
+		 * hold references on both inode and next_i, or next_i is
+		 * the list head (end of list).  Also no new inodes will be
+		 * added since the umount has begun.
 		 */
 		spin_unlock(&inode_sb_list_lock);
 
-- 
1.7.11.3


