* [PATCH 0/4] xfs: candidate fixes for 2.6.39-rc4 v2
@ 2011-04-21 9:34 Dave Chinner
2011-04-21 9:34 ` [PATCH 1/4] xfs: fix xfs_itruncate_start tracing Dave Chinner
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Dave Chinner @ 2011-04-21 9:34 UTC (permalink / raw)
To: xfs
Candidate bug fix patch stack. Updated to fix minor issues Lachlan
and Christoph noticed in the first version.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH 1/4] xfs: fix xfs_itruncate_start tracing 2011-04-21 9:34 [PATCH 0/4] xfs: candidate fixes for 2.6.39-rc4 v2 Dave Chinner @ 2011-04-21 9:34 ` Dave Chinner 2011-04-21 9:34 ` [PATCH 2/4] xfs: don't look forever in xfs_inode_ag_walk during async inode flushes Dave Chinner ` (2 subsequent siblings) 3 siblings, 0 replies; 7+ messages in thread From: Dave Chinner @ 2011-04-21 9:34 UTC (permalink / raw) To: xfs From: Dave Chinner <dchinner@redhat.com> Variables are ordered incorrectly in trace call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/xfs_inode.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index a37480a..da46c0f 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1354,7 +1354,7 @@ xfs_itruncate_start( return 0; } last_byte = xfs_file_last_byte(ip); - trace_xfs_itruncate_start(ip, flags, new_size, toss_start, last_byte); + trace_xfs_itruncate_start(ip, new_size, flags, toss_start, last_byte); if (last_byte > toss_start) { if (flags & XFS_ITRUNC_DEFINITE) { xfs_tosspages(ip, toss_start, -- 1.7.4.4 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/4] xfs: don't look forever in xfs_inode_ag_walk during async inode flushes 2011-04-21 9:34 [PATCH 0/4] xfs: candidate fixes for 2.6.39-rc4 v2 Dave Chinner 2011-04-21 9:34 ` [PATCH 1/4] xfs: fix xfs_itruncate_start tracing Dave Chinner @ 2011-04-21 9:34 ` Dave Chinner 2011-04-21 9:34 ` [PATCH 3/4] xfs: reset buffer pointers before freeing them Dave Chinner 2011-04-21 9:34 ` [PATCH 4/4] xfs: obey minleft values during extent allocation correctly Dave Chinner 3 siblings, 0 replies; 7+ messages in thread From: Dave Chinner @ 2011-04-21 9:34 UTC (permalink / raw) To: xfs From: Dave Chinner <dchinner@redhat.com> When the underlying inode buffer is locked and xfs_sync_inode_attr() is doing a non-blocking flush, xfs_iflush() can return EAGAIN. When this happenѕ, clear the error rather than returning it to xfs_inode_ag_walk(), as returning EAGAIN will result in the AG walk delaying for a short while and trying again. This can result in background walks getting stuck on the one AG until inode buffer is unlocked by some other means. This behaviour was noticed when analysing event traces followed by code inspection and verification of the fix via further traces. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/linux-2.6/xfs_sync.c | 10 ++++++++++ 1 files changed, 10 insertions(+), 0 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c index 7883762..3253572 100644 --- a/fs/xfs/linux-2.6/xfs_sync.c +++ b/fs/xfs/linux-2.6/xfs_sync.c @@ -267,6 +267,16 @@ xfs_sync_inode_attr( error = xfs_iflush(ip, flags); + /* + * We don't want to try again on non-blocking flushes that can't run + * again immediately. If an inode really must be written, then that's + * what the SYNC_WAIT flag is for. + */ + if (error == EAGAIN) { + ASSERT(!(flags & SYNC_WAIT)); + error = 0; + } + out_unlock: xfs_iunlock(ip, XFS_ILOCK_SHARED); return error; -- 1.7.4.4 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/4] xfs: reset buffer pointers before freeing them 2011-04-21 9:34 [PATCH 0/4] xfs: candidate fixes for 2.6.39-rc4 v2 Dave Chinner 2011-04-21 9:34 ` [PATCH 1/4] xfs: fix xfs_itruncate_start tracing Dave Chinner 2011-04-21 9:34 ` [PATCH 2/4] xfs: don't look forever in xfs_inode_ag_walk during async inode flushes Dave Chinner @ 2011-04-21 9:34 ` Dave Chinner 2011-04-21 9:34 ` [PATCH 4/4] xfs: obey minleft values during extent allocation correctly Dave Chinner 3 siblings, 0 replies; 7+ messages in thread From: Dave Chinner @ 2011-04-21 9:34 UTC (permalink / raw) To: xfs From: Dave Chinner <dchinner@redhat.com> When we free a vmapped buffer, we need to ensure the vmap address and length we free is the same as when it was allocated. In various places in the log code we change the memory the buffer is pointing to before issuing IO, but we never reset the buffer to point back to it's original memory (or no memory, if that is the case for the buffer). As a result, when we free the buffer it points to memory that is owned by something else and attempts to unmap and free it. Because the range does not match any known mapped range, it can trigger BUG_ON() traps in the vmap code, and potentially corrupt the vmap area tracking. Fix this by always resetting these buffers to their original state before freeing them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/linux-2.6/xfs_buf.c | 21 ++++++++++++ fs/xfs/linux-2.6/xfs_buf.h | 1 + fs/xfs/xfs_log.c | 8 ++++- fs/xfs/xfs_log_recover.c | 75 ++++++++++++++++++++++--------------------- 4 files changed, 67 insertions(+), 38 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c index 9ef9ed2..ecf0db6 100644 --- a/fs/xfs/linux-2.6/xfs_buf.c +++ b/fs/xfs/linux-2.6/xfs_buf.c @@ -709,6 +709,27 @@ xfs_buf_get_empty( return bp; } +/* + * Return a buffer allocated as an empty buffer and associated to external + * memory via xfs_buf_associate_memory() back to it's empty state. + */ +void +xfs_buf_set_empty( + struct xfs_buf *bp, + size_t len) +{ + if (bp->b_pages) + _xfs_buf_free_pages(bp); + + bp->b_pages = NULL; + bp->b_page_count = 0; + bp->b_addr = NULL; + bp->b_file_offset = 0; + bp->b_buffer_length = bp->b_count_desired = len; + bp->b_bn = XFS_BUF_DADDR_NULL; + bp->b_flags &= ~XBF_MAPPED; +} + static inline struct page * mem_to_page( void *addr) diff --git a/fs/xfs/linux-2.6/xfs_buf.h b/fs/xfs/linux-2.6/xfs_buf.h index a9a1c45..50a7d5f 100644 --- a/fs/xfs/linux-2.6/xfs_buf.h +++ b/fs/xfs/linux-2.6/xfs_buf.h @@ -178,6 +178,7 @@ extern xfs_buf_t *xfs_buf_read(xfs_buftarg_t *, xfs_off_t, size_t, xfs_buf_flags_t); extern xfs_buf_t *xfs_buf_get_empty(size_t, xfs_buftarg_t *); +extern void xfs_buf_set_empty(struct xfs_buf *bp, size_t len); extern xfs_buf_t *xfs_buf_get_uncached(struct xfs_buftarg *, size_t, int); extern int xfs_buf_associate_memory(xfs_buf_t *, void *, size_t); extern void xfs_buf_hold(xfs_buf_t *); diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index b612ce4..3850a91 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -1449,6 +1449,13 @@ xlog_dealloc_log(xlog_t *log) xlog_cil_destroy(log); + /* + * always need to ensure that the extra buffer does not point to memory + * owned by another log buffer before we free it. + */ + xfs_buf_set_empty(log->l_xbuf, log->l_iclog_size); + xfs_buf_free(log->l_xbuf); + iclog = log->l_iclog; for (i=0; i<log->l_iclog_bufs; i++) { xfs_buf_free(iclog->ic_bp); @@ -1458,7 +1465,6 @@ xlog_dealloc_log(xlog_t *log) } spinlock_destroy(&log->l_icloglock); - xfs_buf_free(log->l_xbuf); log->l_mp->m_log = NULL; kmem_free(log); } /* xlog_dealloc_log */ diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 5cc464a..04142ca 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -205,6 +205,35 @@ xlog_bread( } /* + * Read at an offset into the buffer. Returns with the buffer in it's original + * state regardless of the result of the read. + */ +STATIC int +xlog_bread_offset( + xlog_t *log, + xfs_daddr_t blk_no, /* block to read from */ + int nbblks, /* blocks to read */ + xfs_buf_t *bp, + xfs_caddr_t offset) +{ + xfs_caddr_t orig_offset = XFS_BUF_PTR(bp); + int orig_len = bp->b_buffer_length; + int error, error2; + + error = XFS_BUF_SET_PTR(bp, offset, BBTOB(nbblks)); + if (error) + return error; + + error = xlog_bread_noalign(log, blk_no, nbblks, bp); + + /* must reset buffer pointer even on error */ + error2 = XFS_BUF_SET_PTR(bp, orig_offset, orig_len); + if (error) + return error; + return error2; +} + +/* * Write out the buffer at the given block for the given number of blocks. * The buffer is kept locked across the write and is returned locked. * This can only be used for synchronous log writes. @@ -1229,20 +1258,12 @@ xlog_write_log_records( */ ealign = round_down(end_block, sectbb); if (j == 0 && (start_block + endcount > ealign)) { - offset = XFS_BUF_PTR(bp); - balign = BBTOB(ealign - start_block); - error = XFS_BUF_SET_PTR(bp, offset + balign, - BBTOB(sectbb)); + offset = XFS_BUF_PTR(bp) + BBTOB(ealign - start_block); + error = xlog_bread_offset(log, ealign, sectbb, + bp, offset); if (error) break; - error = xlog_bread_noalign(log, ealign, sectbb, bp); - if (error) - break; - - error = XFS_BUF_SET_PTR(bp, offset, bufblks); - if (error) - break; } offset = xlog_align(log, start_block, endcount, bp); @@ -3448,19 +3469,9 @@ xlog_do_recovery_pass( * - order is important. */ wrapped_hblks = hblks - split_hblks; - error = XFS_BUF_SET_PTR(hbp, - offset + BBTOB(split_hblks), - BBTOB(hblks - split_hblks)); - if (error) - goto bread_err2; - - error = xlog_bread_noalign(log, 0, - wrapped_hblks, hbp); - if (error) - goto bread_err2; - - error = XFS_BUF_SET_PTR(hbp, offset, - BBTOB(hblks)); + error = xlog_bread_offset(log, 0, + wrapped_hblks, hbp, + offset + BBTOB(split_hblks)); if (error) goto bread_err2; } @@ -3511,19 +3522,9 @@ xlog_do_recovery_pass( * _first_, then the log start (LR header end) * - order is important. */ - error = XFS_BUF_SET_PTR(dbp, - offset + BBTOB(split_bblks), - BBTOB(bblks - split_bblks)); - if (error) - goto bread_err2; - - error = xlog_bread_noalign(log, wrapped_hblks, - bblks - split_bblks, - dbp); - if (error) - goto bread_err2; - - error = XFS_BUF_SET_PTR(dbp, offset, h_size); + error = xlog_bread_offset(log, 0, + bblks - split_bblks, hbp, + offset + BBTOB(split_bblks)); if (error) goto bread_err2; } -- 1.7.4.4 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 4/4] xfs: obey minleft values during extent allocation correctly. 2011-04-21 9:34 [PATCH 0/4] xfs: candidate fixes for 2.6.39-rc4 v2 Dave Chinner ` (2 preceding siblings ...) 2011-04-21 9:34 ` [PATCH 3/4] xfs: reset buffer pointers before freeing them Dave Chinner @ 2011-04-21 9:34 ` Dave Chinner 2011-04-29 1:51 ` Dave Chinner 2011-04-29 6:25 ` Christoph Hellwig 3 siblings, 2 replies; 7+ messages in thread From: Dave Chinner @ 2011-04-21 9:34 UTC (permalink / raw) To: xfs From: Dave Chinner <dchinner@redhat.com> When allocating an extent that is long enough to consume the remaining free space in an AG, we need to ensure that the allocation leaves enough space in the AG for any subsequent bmap btree blocks that are needed to track the new extent. These have to be allocated in the same AG as we only reserve enough blocks in an allocation transaction for modification of the freespace trees in a single AG. xfs_alloc_fix_minleft() has been considering blocks on the AGFL as free blocks available for extent and bmbt block allocation, which is not correct - blocks on the AGFL are there exclusively for the use of the free space btrees. As a result, when minleft is less than the number of blocks on the AGFL, xfs_alloc_fix_minleft() does not trim the given extent to leave minleft blocks available for bmbt allocation, and hence we can fail allocation during bmbt record insertion. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_alloc.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c index 27d64d7..8946464 100644 --- a/fs/xfs/xfs_alloc.c +++ b/fs/xfs/xfs_alloc.c @@ -280,7 +280,6 @@ xfs_alloc_fix_minleft( return 1; agf = XFS_BUF_TO_AGF(args->agbp); diff = be32_to_cpu(agf->agf_freeblks) - + be32_to_cpu(agf->agf_flcount) - args->len - args->minleft; if (diff >= 0) return 1; -- 1.7.4.4 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 4/4] xfs: obey minleft values during extent allocation correctly. 2011-04-21 9:34 ` [PATCH 4/4] xfs: obey minleft values during extent allocation correctly Dave Chinner @ 2011-04-29 1:51 ` Dave Chinner 2011-04-29 6:25 ` Christoph Hellwig 1 sibling, 0 replies; 7+ messages in thread From: Dave Chinner @ 2011-04-29 1:51 UTC (permalink / raw) To: xfs ping? On Thu, Apr 21, 2011 at 07:34:28PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > When allocating an extent that is long enough to consume the > remaining free space in an AG, we need to ensure that the allocation > leaves enough space in the AG for any subsequent bmap btree blocks > that are needed to track the new extent. These have to be allocated > in the same AG as we only reserve enough blocks in an allocation > transaction for modification of the freespace trees in a single AG. > > xfs_alloc_fix_minleft() has been considering blocks on the AGFL as > free blocks available for extent and bmbt block allocation, which is > not correct - blocks on the AGFL are there exclusively for the use > of the free space btrees. As a result, when minleft is less than the > number of blocks on the AGFL, xfs_alloc_fix_minleft() does not trim > the given extent to leave minleft blocks available for bmbt > allocation, and hence we can fail allocation during bmbt record > insertion. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/xfs_alloc.c | 1 - > 1 files changed, 0 insertions(+), 1 deletions(-) > > diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c > index 27d64d7..8946464 100644 > --- a/fs/xfs/xfs_alloc.c > +++ b/fs/xfs/xfs_alloc.c > @@ -280,7 +280,6 @@ xfs_alloc_fix_minleft( > return 1; > agf = XFS_BUF_TO_AGF(args->agbp); > diff = be32_to_cpu(agf->agf_freeblks) > - + be32_to_cpu(agf->agf_flcount) > - args->len - args->minleft; > if (diff >= 0) > return 1; > -- > 1.7.4.4 > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs > -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 4/4] xfs: obey minleft values during extent allocation correctly. 2011-04-21 9:34 ` [PATCH 4/4] xfs: obey minleft values during extent allocation correctly Dave Chinner 2011-04-29 1:51 ` Dave Chinner @ 2011-04-29 6:25 ` Christoph Hellwig 1 sibling, 0 replies; 7+ messages in thread From: Christoph Hellwig @ 2011-04-29 6:25 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs Looks good, Reviewed-by: Christoph Hellwig <hch@lst.de> _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2011-04-29 6:21 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-04-21 9:34 [PATCH 0/4] xfs: candidate fixes for 2.6.39-rc4 v2 Dave Chinner 2011-04-21 9:34 ` [PATCH 1/4] xfs: fix xfs_itruncate_start tracing Dave Chinner 2011-04-21 9:34 ` [PATCH 2/4] xfs: don't look forever in xfs_inode_ag_walk during async inode flushes Dave Chinner 2011-04-21 9:34 ` [PATCH 3/4] xfs: reset buffer pointers before freeing them Dave Chinner 2011-04-21 9:34 ` [PATCH 4/4] xfs: obey minleft values during extent allocation correctly Dave Chinner 2011-04-29 1:51 ` Dave Chinner 2011-04-29 6:25 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox