* [PATCH 1/2] ext4: Split off ext4_journalled_invalidatepage()
@ 2012-12-12 20:50 Jan Kara
2012-12-12 20:50 ` [PATCH 2/2] ext4: Fix deadlock in journal_unmap_buffer() Jan Kara
2012-12-21 19:48 ` [PATCH 1/2] ext4: Split off ext4_journalled_invalidatepage() Theodore Ts'o
0 siblings, 2 replies; 6+ messages in thread
From: Jan Kara @ 2012-12-12 20:50 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Jan Kara
In data=journal mode we don't need delalloc or DIO handling in invalidatepage
and similarly in other modes we don't need the journal handling. So split
invalidatepage implementations.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 23 ++++++++++++++++-------
include/trace/events/ext4.h | 14 +++++++++++++-
2 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b3c243b..66abac7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2799,8 +2799,6 @@ static void ext4_invalidatepage_free_endio(struct page *page, unsigned long offs
static void ext4_invalidatepage(struct page *page, unsigned long offset)
{
- journal_t *journal = EXT4_JOURNAL(page->mapping->host);
-
trace_ext4_invalidatepage(page, offset);
/*
@@ -2808,16 +2806,27 @@ static void ext4_invalidatepage(struct page *page, unsigned long offset)
*/
if (ext4_should_dioread_nolock(page->mapping->host))
ext4_invalidatepage_free_endio(page, offset);
+
+ /* No journalling happens on data buffers when this function is used */
+ WARN_ON(page_has_buffers(page) && buffer_jbd(page_buffers(page)));
+
+ block_invalidatepage(page, offset);
+}
+
+static void ext4_journalled_invalidatepage(struct page *page,
+ unsigned long offset)
+{
+ journal_t *journal = EXT4_JOURNAL(page->mapping->host);
+
+ trace_ext4_journalled_invalidatepage(page, offset);
+
/*
* If it's a full truncate we just forget about the pending dirtying
*/
if (offset == 0)
ClearPageChecked(page);
- if (journal)
- jbd2_journal_invalidatepage(journal, page, offset);
- else
- block_invalidatepage(page, offset);
+ jbd2_journal_invalidatepage(journal, page, offset);
}
static int ext4_releasepage(struct page *page, gfp_t wait)
@@ -3201,7 +3210,7 @@ static const struct address_space_operations ext4_journalled_aops = {
.write_end = ext4_journalled_write_end,
.set_page_dirty = ext4_journalled_set_page_dirty,
.bmap = ext4_bmap,
- .invalidatepage = ext4_invalidatepage,
+ .invalidatepage = ext4_journalled_invalidatepage,
.releasepage = ext4_releasepage,
.direct_IO = ext4_direct_IO,
.is_partially_uptodate = block_is_partially_uptodate,
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index d49b285..df972d9 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -450,7 +450,7 @@ DEFINE_EVENT(ext4__page_op, ext4_releasepage,
TP_ARGS(page)
);
-TRACE_EVENT(ext4_invalidatepage,
+DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
TP_PROTO(struct page *page, unsigned long offset),
TP_ARGS(page, offset),
@@ -476,6 +476,18 @@ TRACE_EVENT(ext4_invalidatepage,
(unsigned long) __entry->index, __entry->offset)
);
+DEFINE_EVENT(ext4_invalidatepage_op, ext4_invalidatepage
+ TP_PROTO(struct page *page, unsigned long offset),
+
+ TP_ARGS(page, offset)
+);
+
+DEFINE_EVENT(ext4_invalidatepage_op, ext4_journalled_invalidatepage
+ TP_PROTO(struct page *page, unsigned long offset),
+
+ TP_ARGS(page, offset)
+);
+
TRACE_EVENT(ext4_discard_blocks,
TP_PROTO(struct super_block *sb, unsigned long long blk,
unsigned long long count),
--
1.7.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] ext4: Fix deadlock in journal_unmap_buffer()
2012-12-12 20:50 [PATCH 1/2] ext4: Split off ext4_journalled_invalidatepage() Jan Kara
@ 2012-12-12 20:50 ` Jan Kara
2012-12-21 19:52 ` Theodore Ts'o
2012-12-21 19:48 ` [PATCH 1/2] ext4: Split off ext4_journalled_invalidatepage() Theodore Ts'o
1 sibling, 1 reply; 6+ messages in thread
From: Jan Kara @ 2012-12-12 20:50 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Jan Kara
We cannot wait for transaction commit in journal_unmap_buffer() because we hold
page lock which ranks below transaction start. We solve the issue by bailing
out of journal_unmap_buffer() and jbd2_journal_invalidatepage() with -EBUSY.
Caller is then responsible for waiting for transaction commit to finish and try
invalidation again. Since the issue can happen only for page stradding i_size,
it is simple enough to manually call jbd2_journal_invalidatepage() for such
page from ext4_setattr(), check the return value and wait if necessary.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 82 +++++++++++++++++++++++++++++++++++++------
fs/jbd2/transaction.c | 27 +++++++-------
include/linux/jbd2.h | 2 +-
include/trace/events/ext4.h | 4 +-
4 files changed, 88 insertions(+), 27 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 66abac7..cc0abeb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2813,8 +2813,8 @@ static void ext4_invalidatepage(struct page *page, unsigned long offset)
block_invalidatepage(page, offset);
}
-static void ext4_journalled_invalidatepage(struct page *page,
- unsigned long offset)
+static int __ext4_journalled_invalidatepage(struct page *page,
+ unsigned long offset)
{
journal_t *journal = EXT4_JOURNAL(page->mapping->host);
@@ -2826,7 +2826,14 @@ static void ext4_journalled_invalidatepage(struct page *page,
if (offset == 0)
ClearPageChecked(page);
- jbd2_journal_invalidatepage(journal, page, offset);
+ return jbd2_journal_invalidatepage(journal, page, offset);
+}
+
+/* Wrapper for aops... */
+static void ext4_journalled_invalidatepage(struct page *page,
+ unsigned long offset)
+{
+ WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
}
static int ext4_releasepage(struct page *page, gfp_t wait)
@@ -4230,6 +4237,47 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc)
}
/*
+ * In data=journal mode ext4_journalled_invalidatepage() may fail to invalidate
+ * buffers that are attached to a page stradding i_size and are undergoing
+ * commit. In that case we have to wait for commit to finish and try again.
+ */
+static void ext4_wait_for_tail_page_commit(struct inode *inode)
+{
+ struct page *page;
+ unsigned offset;
+ journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
+ tid_t commit_tid = 0;
+ int ret;
+
+ offset = inode->i_size & (PAGE_CACHE_SIZE - 1);
+ /*
+ * All buffers in the last page remain valid? Then there's nothing to
+ * do. We do the check mainly to optimize the common PAGE_CACHE_SIZE ==
+ * blocksize case
+ */
+ if (offset > PAGE_CACHE_SIZE - (1 << inode->i_blkbits))
+ return;
+ while (1) {
+ page = find_lock_page(inode->i_mapping,
+ inode->i_size >> PAGE_CACHE_SHIFT);
+ if (!page)
+ return;
+ ret = __ext4_journalled_invalidatepage(page, offset);
+ unlock_page(page);
+ page_cache_release(page);
+ if (ret != -EBUSY)
+ return;
+ commit_tid = 0;
+ read_lock(&journal->j_state_lock);
+ if (journal->j_committing_transaction)
+ commit_tid = journal->j_committing_transaction->t_tid;
+ read_unlock(&journal->j_state_lock);
+ if (commit_tid)
+ jbd2_log_wait_commit(journal, commit_tid);
+ }
+}
+
+/*
* ext4_setattr()
*
* Called from notify_change.
@@ -4342,16 +4390,28 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
}
if (attr->ia_valid & ATTR_SIZE) {
- if (attr->ia_size != i_size_read(inode)) {
- truncate_setsize(inode, attr->ia_size);
- /* Inode size will be reduced, wait for dio in flight.
- * Temporarily disable dioread_nolock to prevent
- * livelock. */
+ if (attr->ia_size != inode->i_size) {
+ loff_t oldsize = inode->i_size;
+
+ i_size_write(inode, attr->ia_size);
+ /*
+ * Blocks are going to be removed from the inode. Wait
+ * for dio in flight. Temporarily disable
+ * dioread_nolock to prevent livelock.
+ */
if (orphan) {
- ext4_inode_block_unlocked_dio(inode);
- inode_dio_wait(inode);
- ext4_inode_resume_unlocked_dio(inode);
+ if (!ext4_should_journal_data(inode)) {
+ ext4_inode_block_unlocked_dio(inode);
+ inode_dio_wait(inode);
+ ext4_inode_resume_unlocked_dio(inode);
+ } else
+ ext4_wait_for_tail_page_commit(inode);
}
+ /*
+ * Truncate pagecache after we've waited for commit
+ * in data=journal mode to make pages freeable.
+ */
+ truncate_pagecache(inode, oldsize, inode->i_size);
}
ext4_truncate(inode);
}
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a74ba46..e1475b4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1850,7 +1850,6 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
BUFFER_TRACE(bh, "entry");
-retry:
/*
* It is safe to proceed here without the j_list_lock because the
* buffers cannot be stolen by try_to_free_buffers as long as we are
@@ -1945,14 +1944,11 @@ retry:
* for commit and try again.
*/
if (partial_page) {
- tid_t tid = journal->j_committing_transaction->t_tid;
-
jbd2_journal_put_journal_head(jh);
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
write_unlock(&journal->j_state_lock);
- jbd2_log_wait_commit(journal, tid);
- goto retry;
+ return -EBUSY;
}
/*
* OK, buffer won't be reachable after truncate. We just set
@@ -2013,21 +2009,23 @@ zap_buffer_unlocked:
* @page: page to flush
* @offset: length of page to invalidate.
*
- * Reap page buffers containing data after offset in page.
- *
+ * Reap page buffers containing data after offset in page. Can return -EBUSY
+ * if buffers are part of the committing transaction and the page is straddling
+ * i_size. Caller then has to wait for current commit and try again.
*/
-void jbd2_journal_invalidatepage(journal_t *journal,
- struct page *page,
- unsigned long offset)
+int jbd2_journal_invalidatepage(journal_t *journal,
+ struct page *page,
+ unsigned long offset)
{
struct buffer_head *head, *bh, *next;
unsigned int curr_off = 0;
int may_free = 1;
+ int ret = 0;
if (!PageLocked(page))
BUG();
if (!page_has_buffers(page))
- return;
+ return 0;
/* We will potentially be playing with lists other than just the
* data lists (especially for journaled data mode), so be
@@ -2041,9 +2039,11 @@ void jbd2_journal_invalidatepage(journal_t *journal,
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
- may_free &= journal_unmap_buffer(journal, bh,
- offset > 0);
+ ret = journal_unmap_buffer(journal, bh, offset > 0);
unlock_buffer(bh);
+ if (ret < 0)
+ return ret;
+ may_free &= ret;
}
curr_off = next_off;
bh = next;
@@ -2054,6 +2054,7 @@ void jbd2_journal_invalidatepage(journal_t *journal,
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
+ return 0;
}
/*
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 3efc43f..cb7d6af 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1099,7 +1099,7 @@ extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
extern void jbd2_journal_release_buffer (handle_t *, struct buffer_head *);
extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
extern void journal_sync_buffer (struct buffer_head *);
-extern void jbd2_journal_invalidatepage(journal_t *,
+extern int jbd2_journal_invalidatepage(journal_t *,
struct page *, unsigned long);
extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
extern int jbd2_journal_stop(handle_t *);
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index df972d9..3ef522e 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -476,13 +476,13 @@ DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
(unsigned long) __entry->index, __entry->offset)
);
-DEFINE_EVENT(ext4_invalidatepage_op, ext4_invalidatepage
+DEFINE_EVENT(ext4_invalidatepage_op, ext4_invalidatepage,
TP_PROTO(struct page *page, unsigned long offset),
TP_ARGS(page, offset)
);
-DEFINE_EVENT(ext4_invalidatepage_op, ext4_journalled_invalidatepage
+DEFINE_EVENT(ext4_invalidatepage_op, ext4_journalled_invalidatepage,
TP_PROTO(struct page *page, unsigned long offset),
TP_ARGS(page, offset)
--
1.7.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] ext4: Split off ext4_journalled_invalidatepage()
2012-12-12 20:50 [PATCH 1/2] ext4: Split off ext4_journalled_invalidatepage() Jan Kara
2012-12-12 20:50 ` [PATCH 2/2] ext4: Fix deadlock in journal_unmap_buffer() Jan Kara
@ 2012-12-21 19:48 ` Theodore Ts'o
1 sibling, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2012-12-21 19:48 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4
On Wed, Dec 12, 2012 at 09:50:15PM +0100, Jan Kara wrote:
> In data=journal mode we don't need delalloc or DIO handling in invalidatepage
> and similarly in other modes we don't need the journal handling. So split
> invalidatepage implementations.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Applied, thanks.
- Ted
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] ext4: Fix deadlock in journal_unmap_buffer()
2012-12-12 20:50 ` [PATCH 2/2] ext4: Fix deadlock in journal_unmap_buffer() Jan Kara
@ 2012-12-21 19:52 ` Theodore Ts'o
2012-12-21 23:03 ` Jan Kara
0 siblings, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2012-12-21 19:52 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4, Mark Fasheh, Joel Becker, ocfs2-devel
On Wed, Dec 12, 2012 at 09:50:16PM +0100, Jan Kara wrote:
> We cannot wait for transaction commit in journal_unmap_buffer() because we hold
> page lock which ranks below transaction start. We solve the issue by bailing
> out of journal_unmap_buffer() and jbd2_journal_invalidatepage() with -EBUSY.
ocfs2 also calls jbd2_journal_invalidatepage(). I would think we
would need to apply a similar fix up to ocfs2, lest they end up having
jbd2_journal_invalidatepage() silently failing for them.
I'm going to hold off on this patch until we're sure it's not going to
cause problems for ocfs2. A quick check indicates that they also
support sub-page block sizes, which would tend to indicate that they
could get hit by the same dead lock, yes?
- Ted
> Caller is then responsible for waiting for transaction commit to finish and try
> invalidation again. Since the issue can happen only for page stradding i_size,
> it is simple enough to manually call jbd2_journal_invalidatepage() for such
> page from ext4_setattr(), check the return value and wait if necessary.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/ext4/inode.c | 82 +++++++++++++++++++++++++++++++++++++------
> fs/jbd2/transaction.c | 27 +++++++-------
> include/linux/jbd2.h | 2 +-
> include/trace/events/ext4.h | 4 +-
> 4 files changed, 88 insertions(+), 27 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 66abac7..cc0abeb 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2813,8 +2813,8 @@ static void ext4_invalidatepage(struct page *page, unsigned long offset)
> block_invalidatepage(page, offset);
> }
>
> -static void ext4_journalled_invalidatepage(struct page *page,
> - unsigned long offset)
> +static int __ext4_journalled_invalidatepage(struct page *page,
> + unsigned long offset)
> {
> journal_t *journal = EXT4_JOURNAL(page->mapping->host);
>
> @@ -2826,7 +2826,14 @@ static void ext4_journalled_invalidatepage(struct page *page,
> if (offset == 0)
> ClearPageChecked(page);
>
> - jbd2_journal_invalidatepage(journal, page, offset);
> + return jbd2_journal_invalidatepage(journal, page, offset);
> +}
> +
> +/* Wrapper for aops... */
> +static void ext4_journalled_invalidatepage(struct page *page,
> + unsigned long offset)
> +{
> + WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
> }
>
> static int ext4_releasepage(struct page *page, gfp_t wait)
> @@ -4230,6 +4237,47 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc)
> }
>
> /*
> + * In data=journal mode ext4_journalled_invalidatepage() may fail to invalidate
> + * buffers that are attached to a page stradding i_size and are undergoing
> + * commit. In that case we have to wait for commit to finish and try again.
> + */
> +static void ext4_wait_for_tail_page_commit(struct inode *inode)
> +{
> + struct page *page;
> + unsigned offset;
> + journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
> + tid_t commit_tid = 0;
> + int ret;
> +
> + offset = inode->i_size & (PAGE_CACHE_SIZE - 1);
> + /*
> + * All buffers in the last page remain valid? Then there's nothing to
> + * do. We do the check mainly to optimize the common PAGE_CACHE_SIZE ==
> + * blocksize case
> + */
> + if (offset > PAGE_CACHE_SIZE - (1 << inode->i_blkbits))
> + return;
> + while (1) {
> + page = find_lock_page(inode->i_mapping,
> + inode->i_size >> PAGE_CACHE_SHIFT);
> + if (!page)
> + return;
> + ret = __ext4_journalled_invalidatepage(page, offset);
> + unlock_page(page);
> + page_cache_release(page);
> + if (ret != -EBUSY)
> + return;
> + commit_tid = 0;
> + read_lock(&journal->j_state_lock);
> + if (journal->j_committing_transaction)
> + commit_tid = journal->j_committing_transaction->t_tid;
> + read_unlock(&journal->j_state_lock);
> + if (commit_tid)
> + jbd2_log_wait_commit(journal, commit_tid);
> + }
> +}
> +
> +/*
> * ext4_setattr()
> *
> * Called from notify_change.
> @@ -4342,16 +4390,28 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> }
>
> if (attr->ia_valid & ATTR_SIZE) {
> - if (attr->ia_size != i_size_read(inode)) {
> - truncate_setsize(inode, attr->ia_size);
> - /* Inode size will be reduced, wait for dio in flight.
> - * Temporarily disable dioread_nolock to prevent
> - * livelock. */
> + if (attr->ia_size != inode->i_size) {
> + loff_t oldsize = inode->i_size;
> +
> + i_size_write(inode, attr->ia_size);
> + /*
> + * Blocks are going to be removed from the inode. Wait
> + * for dio in flight. Temporarily disable
> + * dioread_nolock to prevent livelock.
> + */
> if (orphan) {
> - ext4_inode_block_unlocked_dio(inode);
> - inode_dio_wait(inode);
> - ext4_inode_resume_unlocked_dio(inode);
> + if (!ext4_should_journal_data(inode)) {
> + ext4_inode_block_unlocked_dio(inode);
> + inode_dio_wait(inode);
> + ext4_inode_resume_unlocked_dio(inode);
> + } else
> + ext4_wait_for_tail_page_commit(inode);
> }
> + /*
> + * Truncate pagecache after we've waited for commit
> + * in data=journal mode to make pages freeable.
> + */
> + truncate_pagecache(inode, oldsize, inode->i_size);
> }
> ext4_truncate(inode);
> }
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index a74ba46..e1475b4 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -1850,7 +1850,6 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
>
> BUFFER_TRACE(bh, "entry");
>
> -retry:
> /*
> * It is safe to proceed here without the j_list_lock because the
> * buffers cannot be stolen by try_to_free_buffers as long as we are
> @@ -1945,14 +1944,11 @@ retry:
> * for commit and try again.
> */
> if (partial_page) {
> - tid_t tid = journal->j_committing_transaction->t_tid;
> -
> jbd2_journal_put_journal_head(jh);
> spin_unlock(&journal->j_list_lock);
> jbd_unlock_bh_state(bh);
> write_unlock(&journal->j_state_lock);
> - jbd2_log_wait_commit(journal, tid);
> - goto retry;
> + return -EBUSY;
> }
> /*
> * OK, buffer won't be reachable after truncate. We just set
> @@ -2013,21 +2009,23 @@ zap_buffer_unlocked:
> * @page: page to flush
> * @offset: length of page to invalidate.
> *
> - * Reap page buffers containing data after offset in page.
> - *
> + * Reap page buffers containing data after offset in page. Can return -EBUSY
> + * if buffers are part of the committing transaction and the page is straddling
> + * i_size. Caller then has to wait for current commit and try again.
> */
> -void jbd2_journal_invalidatepage(journal_t *journal,
> - struct page *page,
> - unsigned long offset)
> +int jbd2_journal_invalidatepage(journal_t *journal,
> + struct page *page,
> + unsigned long offset)
> {
> struct buffer_head *head, *bh, *next;
> unsigned int curr_off = 0;
> int may_free = 1;
> + int ret = 0;
>
> if (!PageLocked(page))
> BUG();
> if (!page_has_buffers(page))
> - return;
> + return 0;
>
> /* We will potentially be playing with lists other than just the
> * data lists (especially for journaled data mode), so be
> @@ -2041,9 +2039,11 @@ void jbd2_journal_invalidatepage(journal_t *journal,
> if (offset <= curr_off) {
> /* This block is wholly outside the truncation point */
> lock_buffer(bh);
> - may_free &= journal_unmap_buffer(journal, bh,
> - offset > 0);
> + ret = journal_unmap_buffer(journal, bh, offset > 0);
> unlock_buffer(bh);
> + if (ret < 0)
> + return ret;
> + may_free &= ret;
> }
> curr_off = next_off;
> bh = next;
> @@ -2054,6 +2054,7 @@ void jbd2_journal_invalidatepage(journal_t *journal,
> if (may_free && try_to_free_buffers(page))
> J_ASSERT(!page_has_buffers(page));
> }
> + return 0;
> }
>
> /*
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 3efc43f..cb7d6af 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1099,7 +1099,7 @@ extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
> extern void jbd2_journal_release_buffer (handle_t *, struct buffer_head *);
> extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
> extern void journal_sync_buffer (struct buffer_head *);
> -extern void jbd2_journal_invalidatepage(journal_t *,
> +extern int jbd2_journal_invalidatepage(journal_t *,
> struct page *, unsigned long);
> extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
> extern int jbd2_journal_stop(handle_t *);
> diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> index df972d9..3ef522e 100644
> --- a/include/trace/events/ext4.h
> +++ b/include/trace/events/ext4.h
> @@ -476,13 +476,13 @@ DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
> (unsigned long) __entry->index, __entry->offset)
> );
>
> -DEFINE_EVENT(ext4_invalidatepage_op, ext4_invalidatepage
> +DEFINE_EVENT(ext4_invalidatepage_op, ext4_invalidatepage,
> TP_PROTO(struct page *page, unsigned long offset),
>
> TP_ARGS(page, offset)
> );
>
> -DEFINE_EVENT(ext4_invalidatepage_op, ext4_journalled_invalidatepage
> +DEFINE_EVENT(ext4_invalidatepage_op, ext4_journalled_invalidatepage,
> TP_PROTO(struct page *page, unsigned long offset),
>
> TP_ARGS(page, offset)
> --
> 1.7.1
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] ext4: Fix deadlock in journal_unmap_buffer()
2012-12-21 19:52 ` Theodore Ts'o
@ 2012-12-21 23:03 ` Jan Kara
2012-12-22 3:52 ` Theodore Ts'o
0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2012-12-21 23:03 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Jan Kara, linux-ext4, Mark Fasheh, Joel Becker, ocfs2-devel
On Fri 21-12-12 14:52:20, Ted Tso wrote:
> On Wed, Dec 12, 2012 at 09:50:16PM +0100, Jan Kara wrote:
> > We cannot wait for transaction commit in journal_unmap_buffer() because we hold
> > page lock which ranks below transaction start. We solve the issue by bailing
> > out of journal_unmap_buffer() and jbd2_journal_invalidatepage() with -EBUSY.
>
> ocfs2 also calls jbd2_journal_invalidatepage(). I would think we
> would need to apply a similar fix up to ocfs2, lest they end up having
> jbd2_journal_invalidatepage() silently failing for them.
>
> I'm going to hold off on this patch until we're sure it's not going to
> cause problems for ocfs2. A quick check indicates that they also
> support sub-page block sizes, which would tend to indicate that they
> could get hit by the same dead lock, yes?
I should have commented on this in the changelog :).
jbd2_journal_invalidatepage() gets called only for file pagecache pages.
Because ocfs2 doesn't do data journalling it never sees buffers that are
part of a transaction in jbd2_journal_invalidatepage() (similarly to ext4
except for data=journal case). I'll send a patch to just stop calling
jbd2_journal_invalidatepage() for ocfs2... But you are safe to merge this
patch in the mean time.
Honza
>
> - Ted
>
> > Caller is then responsible for waiting for transaction commit to finish and try
> > invalidation again. Since the issue can happen only for page stradding i_size,
> > it is simple enough to manually call jbd2_journal_invalidatepage() for such
> > page from ext4_setattr(), check the return value and wait if necessary.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/ext4/inode.c | 82 +++++++++++++++++++++++++++++++++++++------
> > fs/jbd2/transaction.c | 27 +++++++-------
> > include/linux/jbd2.h | 2 +-
> > include/trace/events/ext4.h | 4 +-
> > 4 files changed, 88 insertions(+), 27 deletions(-)
> >
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 66abac7..cc0abeb 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -2813,8 +2813,8 @@ static void ext4_invalidatepage(struct page *page, unsigned long offset)
> > block_invalidatepage(page, offset);
> > }
> >
> > -static void ext4_journalled_invalidatepage(struct page *page,
> > - unsigned long offset)
> > +static int __ext4_journalled_invalidatepage(struct page *page,
> > + unsigned long offset)
> > {
> > journal_t *journal = EXT4_JOURNAL(page->mapping->host);
> >
> > @@ -2826,7 +2826,14 @@ static void ext4_journalled_invalidatepage(struct page *page,
> > if (offset == 0)
> > ClearPageChecked(page);
> >
> > - jbd2_journal_invalidatepage(journal, page, offset);
> > + return jbd2_journal_invalidatepage(journal, page, offset);
> > +}
> > +
> > +/* Wrapper for aops... */
> > +static void ext4_journalled_invalidatepage(struct page *page,
> > + unsigned long offset)
> > +{
> > + WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
> > }
> >
> > static int ext4_releasepage(struct page *page, gfp_t wait)
> > @@ -4230,6 +4237,47 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc)
> > }
> >
> > /*
> > + * In data=journal mode ext4_journalled_invalidatepage() may fail to invalidate
> > + * buffers that are attached to a page stradding i_size and are undergoing
> > + * commit. In that case we have to wait for commit to finish and try again.
> > + */
> > +static void ext4_wait_for_tail_page_commit(struct inode *inode)
> > +{
> > + struct page *page;
> > + unsigned offset;
> > + journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
> > + tid_t commit_tid = 0;
> > + int ret;
> > +
> > + offset = inode->i_size & (PAGE_CACHE_SIZE - 1);
> > + /*
> > + * All buffers in the last page remain valid? Then there's nothing to
> > + * do. We do the check mainly to optimize the common PAGE_CACHE_SIZE ==
> > + * blocksize case
> > + */
> > + if (offset > PAGE_CACHE_SIZE - (1 << inode->i_blkbits))
> > + return;
> > + while (1) {
> > + page = find_lock_page(inode->i_mapping,
> > + inode->i_size >> PAGE_CACHE_SHIFT);
> > + if (!page)
> > + return;
> > + ret = __ext4_journalled_invalidatepage(page, offset);
> > + unlock_page(page);
> > + page_cache_release(page);
> > + if (ret != -EBUSY)
> > + return;
> > + commit_tid = 0;
> > + read_lock(&journal->j_state_lock);
> > + if (journal->j_committing_transaction)
> > + commit_tid = journal->j_committing_transaction->t_tid;
> > + read_unlock(&journal->j_state_lock);
> > + if (commit_tid)
> > + jbd2_log_wait_commit(journal, commit_tid);
> > + }
> > +}
> > +
> > +/*
> > * ext4_setattr()
> > *
> > * Called from notify_change.
> > @@ -4342,16 +4390,28 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> > }
> >
> > if (attr->ia_valid & ATTR_SIZE) {
> > - if (attr->ia_size != i_size_read(inode)) {
> > - truncate_setsize(inode, attr->ia_size);
> > - /* Inode size will be reduced, wait for dio in flight.
> > - * Temporarily disable dioread_nolock to prevent
> > - * livelock. */
> > + if (attr->ia_size != inode->i_size) {
> > + loff_t oldsize = inode->i_size;
> > +
> > + i_size_write(inode, attr->ia_size);
> > + /*
> > + * Blocks are going to be removed from the inode. Wait
> > + * for dio in flight. Temporarily disable
> > + * dioread_nolock to prevent livelock.
> > + */
> > if (orphan) {
> > - ext4_inode_block_unlocked_dio(inode);
> > - inode_dio_wait(inode);
> > - ext4_inode_resume_unlocked_dio(inode);
> > + if (!ext4_should_journal_data(inode)) {
> > + ext4_inode_block_unlocked_dio(inode);
> > + inode_dio_wait(inode);
> > + ext4_inode_resume_unlocked_dio(inode);
> > + } else
> > + ext4_wait_for_tail_page_commit(inode);
> > }
> > + /*
> > + * Truncate pagecache after we've waited for commit
> > + * in data=journal mode to make pages freeable.
> > + */
> > + truncate_pagecache(inode, oldsize, inode->i_size);
> > }
> > ext4_truncate(inode);
> > }
> > diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> > index a74ba46..e1475b4 100644
> > --- a/fs/jbd2/transaction.c
> > +++ b/fs/jbd2/transaction.c
> > @@ -1850,7 +1850,6 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
> >
> > BUFFER_TRACE(bh, "entry");
> >
> > -retry:
> > /*
> > * It is safe to proceed here without the j_list_lock because the
> > * buffers cannot be stolen by try_to_free_buffers as long as we are
> > @@ -1945,14 +1944,11 @@ retry:
> > * for commit and try again.
> > */
> > if (partial_page) {
> > - tid_t tid = journal->j_committing_transaction->t_tid;
> > -
> > jbd2_journal_put_journal_head(jh);
> > spin_unlock(&journal->j_list_lock);
> > jbd_unlock_bh_state(bh);
> > write_unlock(&journal->j_state_lock);
> > - jbd2_log_wait_commit(journal, tid);
> > - goto retry;
> > + return -EBUSY;
> > }
> > /*
> > * OK, buffer won't be reachable after truncate. We just set
> > @@ -2013,21 +2009,23 @@ zap_buffer_unlocked:
> > * @page: page to flush
> > * @offset: length of page to invalidate.
> > *
> > - * Reap page buffers containing data after offset in page.
> > - *
> > + * Reap page buffers containing data after offset in page. Can return -EBUSY
> > + * if buffers are part of the committing transaction and the page is straddling
> > + * i_size. Caller then has to wait for current commit and try again.
> > */
> > -void jbd2_journal_invalidatepage(journal_t *journal,
> > - struct page *page,
> > - unsigned long offset)
> > +int jbd2_journal_invalidatepage(journal_t *journal,
> > + struct page *page,
> > + unsigned long offset)
> > {
> > struct buffer_head *head, *bh, *next;
> > unsigned int curr_off = 0;
> > int may_free = 1;
> > + int ret = 0;
> >
> > if (!PageLocked(page))
> > BUG();
> > if (!page_has_buffers(page))
> > - return;
> > + return 0;
> >
> > /* We will potentially be playing with lists other than just the
> > * data lists (especially for journaled data mode), so be
> > @@ -2041,9 +2039,11 @@ void jbd2_journal_invalidatepage(journal_t *journal,
> > if (offset <= curr_off) {
> > /* This block is wholly outside the truncation point */
> > lock_buffer(bh);
> > - may_free &= journal_unmap_buffer(journal, bh,
> > - offset > 0);
> > + ret = journal_unmap_buffer(journal, bh, offset > 0);
> > unlock_buffer(bh);
> > + if (ret < 0)
> > + return ret;
> > + may_free &= ret;
> > }
> > curr_off = next_off;
> > bh = next;
> > @@ -2054,6 +2054,7 @@ void jbd2_journal_invalidatepage(journal_t *journal,
> > if (may_free && try_to_free_buffers(page))
> > J_ASSERT(!page_has_buffers(page));
> > }
> > + return 0;
> > }
> >
> > /*
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index 3efc43f..cb7d6af 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -1099,7 +1099,7 @@ extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
> > extern void jbd2_journal_release_buffer (handle_t *, struct buffer_head *);
> > extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
> > extern void journal_sync_buffer (struct buffer_head *);
> > -extern void jbd2_journal_invalidatepage(journal_t *,
> > +extern int jbd2_journal_invalidatepage(journal_t *,
> > struct page *, unsigned long);
> > extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
> > extern int jbd2_journal_stop(handle_t *);
> > diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> > index df972d9..3ef522e 100644
> > --- a/include/trace/events/ext4.h
> > +++ b/include/trace/events/ext4.h
> > @@ -476,13 +476,13 @@ DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
> > (unsigned long) __entry->index, __entry->offset)
> > );
> >
> > -DEFINE_EVENT(ext4_invalidatepage_op, ext4_invalidatepage
> > +DEFINE_EVENT(ext4_invalidatepage_op, ext4_invalidatepage,
> > TP_PROTO(struct page *page, unsigned long offset),
> >
> > TP_ARGS(page, offset)
> > );
> >
> > -DEFINE_EVENT(ext4_invalidatepage_op, ext4_journalled_invalidatepage
> > +DEFINE_EVENT(ext4_invalidatepage_op, ext4_journalled_invalidatepage,
> > TP_PROTO(struct page *page, unsigned long offset),
> >
> > TP_ARGS(page, offset)
> > --
> > 1.7.1
> >
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] ext4: Fix deadlock in journal_unmap_buffer()
2012-12-21 23:03 ` Jan Kara
@ 2012-12-22 3:52 ` Theodore Ts'o
0 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2012-12-22 3:52 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4, Mark Fasheh, Joel Becker, ocfs2-devel
On Sat, Dec 22, 2012 at 12:03:47AM +0100, Jan Kara wrote:
> I should have commented on this in the changelog :).
> jbd2_journal_invalidatepage() gets called only for file pagecache pages.
> Because ocfs2 doesn't do data journalling it never sees buffers that are
> part of a transaction in jbd2_journal_invalidatepage() (similarly to ext4
> except for data=journal case). I'll send a patch to just stop calling
> jbd2_journal_invalidatepage() for ocfs2... But you are safe to merge this
> patch in the mean time.
Ok, so if ocfs2 never tries journalling data blocks, then buffer_jbd()
will always be NULL, which reduces jbd2_journal_invalidate() to be
equivalent in functionality to block_invalidatepage(), and so it will
never abort and return EBUSY.
Thanks for the explanation, I'll apply the patch.
- Ted
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-12-22 3:53 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-12 20:50 [PATCH 1/2] ext4: Split off ext4_journalled_invalidatepage() Jan Kara
2012-12-12 20:50 ` [PATCH 2/2] ext4: Fix deadlock in journal_unmap_buffer() Jan Kara
2012-12-21 19:52 ` Theodore Ts'o
2012-12-21 23:03 ` Jan Kara
2012-12-22 3:52 ` Theodore Ts'o
2012-12-21 19:48 ` [PATCH 1/2] ext4: Split off ext4_journalled_invalidatepage() Theodore Ts'o
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).