From: Konstantin Khlebnikov <khlebnikov@parallels.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] tmpfs: fix race between umount and writepage
Date: Thu, 21 Apr 2011 10:37:37 +0400
Message-ID: <4DAFD0B1.9090603@parallels.com>
In-Reply-To: <20110420130453.3985144c.akpm@linux-foundation.org>

Andrew Morton wrote:
> On Tue, 5 Apr 2011 14:34:52 +0400
> Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
>
>> shmem_writepage() calls igrab() on the inode for the page which came from
>> the reclaimer, to add it later to shmem_swaplist for the swap-unuse operation.
>>
>> This igrab() can race with the super-block deactivation process:
>>
>> shrink_inactive_list()          deactivate_super()
>> pageout()                       tmpfs_fs_type->kill_sb()
>> shmem_writepage()               kill_litter_super()
>>                                 generic_shutdown_super()
>>                                  evict_inodes()
>>  igrab()
>>                                   atomic_read(&inode->i_count)
>>                                    skip-inode
>>  iput()
>>                                  if (!list_empty(&sb->s_inodes))
>>                                         printk("VFS: Busy inodes after...
>
> Generally, ->writepage implementations shouldn't play with the inode,
> for the reasons you've discovered. A more common race is
> writepage-versus-reclaim, where writepage is playing with the inode
> when a concurrent reclaim frees the inode (and hence the
> address_space).
>
> It is safe to play with the inode while the passed-in page is locked
> because nobody will free an inode which has an attached locked page.
> But once the page is unlocked, nothing pins the inode. Typically,
> tmpfs goes and breaks this rule.
>
>
> Question is: why is shmem_writepage() doing the igrab/iput?
>
> Read 1b1b32f2c6f6bb3253 and weep.
>
> That changelog is a little incorrect:
>
> : Ah, I'd never suspected it, but shmem_writepage's swaplist manipulation
> : is unsafe: though we still hold page lock, which would hold off inode
> : deletion if the page were in pagecache, it doesn't hold off once it's in
> : swapcache (free_swap_and_cache doesn't wait on locked pages). Hmm: we
> : could put the inode on swaplist earlier, but then shmem_unuse_inode
> : could never prune unswapped inodes.
>
> We don't actually hold the page lock when altering the swaplist:
> swap_writepage() unlocks the page. Doesn't seem to matter.
>
>
> I think we should get the igrab/iput out of there and come up with a
> different way of pinning the inode in ->writepage().
>
> Can we do it in this order?
>
> 	mutex_lock(&shmem_swaplist_mutex);
> 	list_move_tail(&info->swaplist, &shmem_swaplist);
> 	delete_from_page_cache(page);
> 	shmem_swp_set(info, entry, swap.val);
> 	shmem_swp_unmap(entry);
> 	mutex_unlock(&shmem_swaplist_mutex);
> 	swap_writepage(page, wbc);
>

Yes, we can, but of course without locking shmem_swaplist_mutex if the
inode is already on shmem_swaplist.

I saw that the igrab() was redundant, but I was confused by the lock
nesting and by the conversion of the shmem_swaplist lock from a spinlock
to a mutex. It seems shmem_swaplist_mutex is already nested inside the
page lock, so all is OK.

We can simply revert the last hunk from that commit; patch follows.
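
To make the ordering above concrete, here is a minimal userspace sketch, not
kernel code and not the patch itself. It models only the idea that the inode
is put on the swaplist while the page is still locked, and that the mutex is
skipped when the inode is already on the list. All names prefixed with
model_ and the small list helpers are hypothetical stand-ins, and the
unlocked emptiness check is a simplification: whether such a check is safe
in the kernel depends on what else can remove the inode from the list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, loosely modeled on list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

/* Unlink a node and leave it self-linked; harmless if already self-linked. */
static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Link a self-linked node just before head, i.e. at the tail of the list. */
static void list_append(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical stand-ins for shmem_swaplist and shmem_swaplist_mutex. */
static struct list_node model_swaplist = { &model_swaplist, &model_swaplist };
static pthread_mutex_t model_swaplist_mutex = PTHREAD_MUTEX_INITIALIZER;
static int model_mutex_acquisitions; /* counts slow-path entries */

struct model_info {
	struct list_node swaplist; /* self-linked while off the swaplist */
	bool page_locked;          /* stands in for the page lock */
};

/*
 * Model of the writepage step. The caller holds the page lock; the inode
 * is linked onto the swaplist before that lock is dropped, so no extra
 * inode reference is taken here.
 */
static void model_writepage(struct model_info *info)
{
	assert(info->page_locked);

	if (list_is_empty(&info->swaplist)) {
		pthread_mutex_lock(&model_swaplist_mutex);
		model_mutex_acquisitions++;
		/* Move rather than add, in case it was linked meanwhile. */
		list_unlink(&info->swaplist);
		list_append(&info->swaplist, &model_swaplist);
		pthread_mutex_unlock(&model_swaplist_mutex);
	}

	/* Stands in for swap_writepage(), which unlocks the page. */
	info->page_locked = false;
}
```

In this model the first call for an inode takes the mutex once and links it
at the tail of the list; a second call for the same inode finds it already
linked and skips the mutex entirely.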