From: Konstantin Khlebnikov <khlebnikov@parallels.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] tmpfs: fix race between umount and writepage
Date: Fri, 22 Apr 2011 08:05:35 +0400
Message-ID: <4DB0FE8F.9070407@parallels.com>
In-Reply-To: <20110421124424.0a10ed0c.akpm@linux-foundation.org>

Andrew Morton wrote:
> On Thu, 21 Apr 2011 10:41:50 +0400
> Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
>
>> shmem_writepage() calls igrab() on the inode for the page which came from the
>> reclaimer, to add it later to shmem_swaplist for the swap-unuse operation.
>>
>> This igrab() can race with super-block deactivating process:
>>
>> shrink_inactive_list()		deactivate_super()
>> pageout()			tmpfs_fs_type->kill_sb()
>> shmem_writepage()		kill_litter_super()
>> 				generic_shutdown_super()
>> 				 evict_inodes()
>>   igrab()
>> 				  atomic_read(&inode->i_count)
>> 				   skip-inode
>>   iput()
>> 				 if (!list_empty(&sb->s_inodes))
>> 					printk("VFS: Busy inodes after...
>>
>> This igrab-iput pair was added in commit 1b1b32f2c6f6bb3253
>> based on incorrect assumptions:
>>
>> : Ah, I'd never suspected it, but shmem_writepage's swaplist manipulation
>> : is unsafe: though still hold page lock, which would hold off inode
>> : deletion if the page were in pagecache, it doesn't hold off once it's in
>> : swapcache (free_swap_and_cache doesn't wait on locked pages).  Hmm: we
>> : could put the inode on swaplist earlier, but then shmem_unuse_inode
>> : could never prune unswapped inodes.
>>
>> An attached locked page actually protects the inode from deletion because
>> truncate_inode_pages_range() will sleep on it, so igrab() is not required.
>> This patch actually reverts the last hunk of that commit.
>>
>
> hm, is that last paragraph true?  Let's look at the resulting code.
>
>
> : 	if (swap.val && add_to_swap_cache(page, swap, GFP_ATOMIC) == 0) {
> : 		delete_from_page_cache(page);
>
> Here, the page is removed from inode->i_mapping.  So
> truncate_inode_pages() won't see that page and will not block on its
> lock.

Oops, right. Sorry. It produces a use-after-free race, but the race is quiet
and the window is small. My test uses too few files to catch it in a
reasonable time, and I ran it without slab poisoning.

So the v1 patch is correct but a little ugly, while v2 is broken.
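
To make the ordering problem concrete, here is a minimal userspace sketch. It
is a toy model under stated assumptions, not kernel code: struct toy_page,
struct toy_inode, toy_try_evict() and toy_delete_from_page_cache() are all
hypothetical names, and the single mapped_page pointer merely stands in for
inode->i_mapping. It only shows that a locked page can hold off eviction
while it is still attached to the mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; every name here is hypothetical, not a kernel API. */
struct toy_page {
	bool locked;
};

struct toy_inode {
	struct toy_page *mapped_page;	/* stands in for inode->i_mapping */
	bool freed;
};

/* Models truncate at umount: it can only wait on pages still in the mapping. */
static bool toy_try_evict(struct toy_inode *inode)
{
	if (inode->mapped_page != NULL && inode->mapped_page->locked)
		return false;	/* the real code would sleep on the page lock */
	inode->freed = true;
	return true;
}

/* Models delete_from_page_cache(): the page leaves the mapping, still locked. */
static void toy_delete_from_page_cache(struct toy_inode *inode)
{
	assert(inode->mapped_page != NULL);
	inode->mapped_page = NULL;
}
```

In this model, toy_try_evict() is refused while the locked page is attached,
and succeeds as soon as toy_delete_from_page_cache() has run, even though the
page itself is still locked. In the real code path, anything that touches
info after that point would be the use-after-free described above.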

>
> : 		shmem_swp_set(info, entry, swap.val);
> : 		shmem_swp_unmap(entry);
> : 		spin_unlock(&info->lock);
> : 		if (list_empty(&info->swaplist)) {
> : 			mutex_lock(&shmem_swaplist_mutex);
> : 			/* move instead of add in case we're racing */
> : 			list_move_tail(&info->swaplist, &shmem_swaplist);
> : 			mutex_unlock(&shmem_swaplist_mutex);
> : 		}
>
> Here, the code plays with `info', which points at storage which is
> embedded within the inode's filesystem-private part.
>
> But because the inode now has no attached locked page, a concurrent
> umount can free the inode while this code is using it.

I guess we could try moving delete_from_page_cache(page) to right before
swap_writepage(), but that would move it outside info->lock...

>
> : 		swap_shmem_alloc(swap);
> : 		BUG_ON(page_mapped(page));
> : 		swap_writepage(page, wbc);
> : 		return 0;
> : 	}
>
> However, I assume that you reran your testcase with the v2 patch and
> that things ran OK.  How come?  Either my analysis is wrong or the
> testcase doesn't trigger races in this code path?
>
