* Re: set_page_dirty vs truncate [not found] <20201218160531.GL15600@casper.infradead.org> @ 2020-12-18 22:03 ` Matthew Wilcox 2020-12-21 14:12 ` Jan Kara 0 siblings, 1 reply; 3+ messages in thread From: Matthew Wilcox @ 2020-12-18 22:03 UTC (permalink / raw) To: linux-fsdevel; +Cc: linux-ext4, Jan Kara, Theodore Ts'o, Andreas Dilger On Fri, Dec 18, 2020 at 04:05:31PM +0000, Matthew Wilcox wrote: > A number of implementations of ->set_page_dirty check whether the page > has been truncated (ie page->mapping has become NULL since entering > set_page_dirty()). Several other implementations assume that they can do > page->mapping->host to get to the inode. So either some implementations > are doing unnecessary checks or others are vulnerable to a NULL pointer > dereference if truncate() races with set_page_dirty(). > > I'm touching ->set_page_dirty() anyway as part of the page folio > conversion. I'm thinking about passing in the mapping so there's no > need to look at page->mapping. > > The comments on set_page_dirty() and set_page_dirty_lock() suggests > there's no consistency in whether truncation is blocked or not; we're > only guaranteed that the inode itself won't go away. But maybe the > comments are stale. The comments are, I believe, not stale. Here's some syzbot reports which indicate that ext4 is seeing races between set_page_dirty() and truncate(): https://groups.google.com/g/syzkaller-lts-bugs/c/s9fHu162zhQ/m/Phnf6ucaAwAJ The reproducer includes calls to ftruncate(), so that would suggest that's what's going on. I would suggest just deleting this line: WARN_ON_ONCE(!page_has_buffers(page)); I'm not sure what value the other WARN_ON_ONCE adds. Maybe just replace ext4_set_page_dirty with __set_page_dirty_buffers in the aops? I'd defer to an ext4 expert on this ... ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: set_page_dirty vs truncate 2020-12-18 22:03 ` set_page_dirty vs truncate Matthew Wilcox @ 2020-12-21 14:12 ` Jan Kara 2020-12-21 14:58 ` Matthew Wilcox 0 siblings, 1 reply; 3+ messages in thread From: Jan Kara @ 2020-12-21 14:12 UTC (permalink / raw) To: Matthew Wilcox Cc: linux-fsdevel, linux-ext4, Jan Kara, Theodore Ts'o, Andreas Dilger On Fri 18-12-20 22:03:16, Matthew Wilcox wrote: > On Fri, Dec 18, 2020 at 04:05:31PM +0000, Matthew Wilcox wrote: > > A number of implementations of ->set_page_dirty check whether the page > > has been truncated (ie page->mapping has become NULL since entering > > set_page_dirty()). Several other implementations assume that they can do > > page->mapping->host to get to the inode. So either some implementations > > are doing unnecessary checks or others are vulnerable to a NULL pointer > > dereference if truncate() races with set_page_dirty(). > > > > I'm touching ->set_page_dirty() anyway as part of the page folio > > conversion. I'm thinking about passing in the mapping so there's no > > need to look at page->mapping. > > > > The comments on set_page_dirty() and set_page_dirty_lock() suggests > > there's no consistency in whether truncation is blocked or not; we're > > only guaranteed that the inode itself won't go away. But maybe the > > comments are stale. > > The comments are, I believe, not stale. Here's some syzbot > reports which indicate that ext4 is seeing races between set_page_dirty() > and truncate(): > > https://groups.google.com/g/syzkaller-lts-bugs/c/s9fHu162zhQ/m/Phnf6ucaAwAJ > > The reproducer includes calls to ftruncate(), so that would suggest > that's what's going on. > > I would suggest just deleting this line: > > WARN_ON_ONCE(!page_has_buffers(page)); > > I'm not sure what value the other WARN_ON_ONCE adds. Maybe just replace > ext4_set_page_dirty with __set_page_dirty_buffers in the aops? I'd defer > to an ext4 expert on this ... Please no. We've added this WARN_ON_ONCE() in 6dcc693bc57 ("ext4: warn when page is dirtied without buffers") to catch problems with page pinning earlier so that we get more diagnostic information before we actually BUG_ON() in the writeback code ;). To give more context: The question in which states we can see a page in set_page_dirty() is actually filesystem dependent. Filesystems such as ext4, xfs, btrfs expect to have full control over page dirtying because for them it's a question of fs consistency (due to journalling requirements, delayed allocation accounting etc.). Generally they expect the page can be dirtied only through ->page_mkwrite() or through ->write_iter() and lock things accordingly to maintain consistency. Except there's stuff like GUP which breaks these assumptions - GUP users will trigger ->page_mkwrite() but page can be writeprotected and cleaned long before GUP user modifies page data and calls set_page_dirty(). Which is the main point why we came up with pin_user_pages() so that MM / filesystems can detect there are page references which can potentially modify & dirty a page and can count with it (the "count with it" part is still missing, I have some clear ideas how to do it but didn't get to it yet). And the syzkaller reproducer you reference above is exactly one of the paths using GUP (actually already pin_user_pages() these days) that can get fs into inconsistent state. But overall even with GUP woes fixed up, set_page_dirty() called by a PUP user could still see already truncated page. So it has to deal with it. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: set_page_dirty vs truncate 2020-12-21 14:12 ` Jan Kara @ 2020-12-21 14:58 ` Matthew Wilcox 0 siblings, 0 replies; 3+ messages in thread From: Matthew Wilcox @ 2020-12-21 14:58 UTC (permalink / raw) To: Jan Kara Cc: linux-fsdevel, linux-ext4, Jan Kara, Theodore Ts'o, Andreas Dilger On Mon, Dec 21, 2020 at 03:12:57PM +0100, Jan Kara wrote: > But overall even with GUP woes fixed up, set_page_dirty() called by a PUP > user could still see already truncated page. So it has to deal with it. Thanks! That was really helpful. We have a number of currently-buggy filesystems which assume they can do inode = page->mapping->host without checking that page->mapping is not NULL. Anyway, since I'm changing the set_page_dirty signature for folios, this feels like the right time to pass in the page's mapping. __set_page_dirty() rechecks the mapping under the i_pages lock, so we won't do anything inappropriate if the page has been truncated. You can find the whole thing at https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/folio but the important bit is: - /* Set a page dirty. Return true if this dirtied it */ - int (*set_page_dirty)(struct page *page); + /* Set a folio dirty. Return true if this dirtied it */ + bool (*set_page_dirty)(struct address_space *, struct folio *); I'm kind of tempted to rename it to ->dirty_folio(), but I'm also fine with leaving it this way. ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2020-12-21 15:17 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20201218160531.GL15600@casper.infradead.org>
2020-12-18 22:03 ` set_page_dirty vs truncate Matthew Wilcox
2020-12-21 14:12 ` Jan Kara
2020-12-21 14:58 ` Matthew Wilcox
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).