* [PATCH 0/4] xfs: resurrect realtime subvolume support on kernel 2.6.37 @ 2012-02-17 22:46 kdasu 2012-02-17 22:51 ` [PATCH 1/4] xfs: only lock the rt bitmap inode once per allocation kdasu 0 siblings, 1 reply; 11+ messages in thread From: kdasu @ 2012-02-17 22:46 UTC (permalink / raw) To: xfs Fixed bimap inode deadlock existing on 2.6.37. [PATCH 4/4] xfs: fix deadlock in xfs_rtfree_extent with kernel v2.6.37 Also added the realtime subvolume patches from the 2.6.39 release. [PATCH 1/4] xfs: only lock the rt bitmap inode once per allocation [PATCH 2/4] xfs: fix xfs_get_extsz_hint for a zero extent size hint [PATCH 3/4] xfs: add lockdep annotations for the rt inodes These fixes were needed to resurrect realtime subvolume support on 2.6.37 kernel. I suspect the deadlock problem while freeing multiple extents exits on 2.6.39 in the xfs_rtfree_entent() call. Kamal -- View this message in context: http://old.nabble.com/-PATCH-0-4--xfs%3A-resurrect-realtime-subvolume-support-on-kernel-2.6.37-tp33345988p33345988.html Sent from the Xfs - General mailing list archive at Nabble.com. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/4] xfs: only lock the rt bitmap inode once per allocation 2012-02-17 22:46 [PATCH 0/4] xfs: resurrect realtime subvolume support on kernel 2.6.37 kdasu @ 2012-02-17 22:51 ` kdasu 2012-02-17 22:55 ` [PATCH 2/4] xfs: fix xfs_get_extsz_hint for a zero extent size hint kdasu 0 siblings, 1 reply; 11+ messages in thread From: kdasu @ 2012-02-17 22:51 UTC (permalink / raw) To: xfs Currently both xfs_rtpick_extent and xfs_rtallocate_extent call xfs_trans_iget to grab and lock the rt bitmap inode, which results in a deadlock since the removal of the lock recursion counters in commit "xfs: simplify inode to transaction joining" Fix this by acquiring and locking the inode in xfs_bmap_rtalloc before calling into xfs_rtpick_extent and xfs_rtallocate_extent. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> --- fs/xfs/xfs_bmap.c | 11 +++++++++++ fs/xfs/xfs_rtalloc.c | 34 +++++++++++++--------------------- 2 files changed, 24 insertions(+), 21 deletions(-) diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c index 4111cd3..9d9970b 100644 --- a/fs/xfs/xfs_bmap.c +++ b/fs/xfs/xfs_bmap.c @@ -2316,6 +2316,7 @@ xfs_bmap_rtalloc( xfs_extlen_t prod = 0; /* product factor for allocators */ xfs_extlen_t ralen = 0; /* realtime allocation length */ xfs_extlen_t align; /* minimum allocation alignment */ + xfs_inode_t *ip; /* bitmap incore inode */ xfs_rtblock_t rtb; kdasu@kdasu-VirtualBox:~/linux-2.6$ more 0001-xfs-only-lock-the-rt-bitmap-inode-once-per-allocatio.patch >From f59ab29cc191a5955c4d44c0b92a537f981184c2 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch@infradead.org> Date: Tue, 25 Jan 2011 09:06:19 +0000 Subject: [PATCH 1/4] xfs: only lock the rt bitmap inode once per allocation Currently both xfs_rtpick_extent and xfs_rtallocate_extent call xfs_trans_iget to grab and lock the rt bitmap inode, which results in a deadlock since the removal of the lock recursion counters in commit "xfs: simplify inode to transaction joining" Fix this by acquiring and locking the inode in xfs_bmap_rtalloc before calling into xfs_rtpick_extent and xfs_rtallocate_extent. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> --- fs/xfs/xfs_bmap.c | 11 +++++++++++ fs/xfs/xfs_rtalloc.c | 34 +++++++++++++--------------------- 2 files changed, 24 insertions(+), 21 deletions(-) diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c index 4111cd3..9d9970b 100644 --- a/fs/xfs/xfs_bmap.c +++ b/fs/xfs/xfs_bmap.c @@ -2316,6 +2316,7 @@ xfs_bmap_rtalloc( xfs_extlen_t prod = 0; /* product factor for allocators */ xfs_extlen_t ralen = 0; /* realtime allocation length */ xfs_extlen_t align; /* minimum allocation alignment */ + xfs_inode_t *ip; /* bitmap incore inode */ xfs_rtblock_t rtb; mp = ap->ip->i_mount; @@ -2348,6 +2349,16 @@ xfs_bmap_rtalloc( */ if (ralen * mp->m_sb.sb_rextsize >= MAXEXTLEN) ralen = MAXEXTLEN / mp->m_sb.sb_rextsize; + + /* + * Lock out other modifications to the RT bitmap inode. + */ + error = xfs_trans_iget(mp, ap->tp, mp->m_sb.sb_rbmino, 0, + XFS_ILOCK_EXCL, &ip); + if (error) + return error; + ASSERT(ip == mp->m_rbmip); + /* * If it's an allocation to an empty file at offset 0, * pick an extent that will space things out in the rt area. diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 12a1913..037fab1 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -2075,15 +2075,15 @@ xfs_rtallocate_extent( xfs_extlen_t prod, /* extent product factor */ xfs_rtblock_t *rtblock) /* out: start block allocated */ { + xfs_mount_t *mp = tp->t_mountp; int error; /* error value */ - xfs_inode_t *ip; /* inode for bitmap file */ - xfs_mount_t *mp; /* file system mount structure */ xfs_rtblock_t r; /* result allocated block */ xfs_fsblock_t sb; /* summary file block number */ xfs_buf_t *sumbp; /* summary file block buffer */ + ASSERT(xfs_isilocked(mp->m_rbmip, XFS_ILOCK_EXCL)); ASSERT(minlen > 0 && minlen <= maxlen); - mp = tp->t_mountp; + /* * If prod is set then figure out what to do to minlen and maxlen. */ @@ -2099,12 +2099,7 @@ xfs_rtallocate_extent( return 0; } } - /* - * Lock out other callers by grabbing the bitmap inode lock. - */ - if ((error = xfs_trans_iget(mp, tp, mp->m_sb.sb_rbmino, 0, - XFS_ILOCK_EXCL, &ip))) - return error; + sumbp = NULL; /* * Allocate by size, or near another block, or exactly at some block. @@ -2123,11 +2118,12 @@ xfs_rtallocate_extent( len, &sumbp, &sb, prod, &r); break; default: + error = EIO; ASSERT(0); } - if (error) { + if (error) return error; - } + /* * If it worked, update the superblock. */ @@ -2306,20 +2302,16 @@ xfs_rtpick_extent( xfs_rtblock_t *pick) /* result rt extent */ { xfs_rtblock_t b; /* result block */ - int error; /* error return value */ - xfs_inode_t *ip; /* bitmap incore inode */ int log2; /* log of sequence number */ __uint64_t resid; /* residual after log removed */ __uint64_t seq; /* sequence number of file creation */ __uint64_t *seqp; /* pointer to seqno in inode */ - if ((error = xfs_trans_iget(mp, tp, mp->m_sb.sb_rbmino, 0, - XFS_ILOCK_EXCL, &ip))) - return error; + sumbp = NULL; /* * Allocate by size, or near another block, or exactly at some block. @@ -2123,11 +2118,12 @@ xfs_rtallocate_extent( len, &sumbp, &sb, prod, &r); break; default: + error = EIO; ASSERT(0); } - if (error) { + if (error) return error; - } + /* * If it worked, update the superblock. */ @@ -2306,20 +2302,16 @@ xfs_rtpick_extent( xfs_rtblock_t *pick) /* result rt extent */ { xfs_rtblock_t b; /* result block */ - int error; /* error return value */ - xfs_inode_t *ip; /* bitmap incore inode */ int log2; /* log of sequence number */ __uint64_t resid; /* residual after log removed */ __uint64_t seq; /* sequence number of file creation */ __uint64_t *seqp; /* pointer to seqno in inode */ - if ((error = xfs_trans_iget(mp, tp, mp->m_sb.sb_rbmino, 0, - XFS_ILOCK_EXCL, &ip))) - return error; - ASSERT(ip == mp->m_rbmip); - seqp = (__uint64_t *)&ip->i_d.di_atime; - if (!(ip->i_d.di_flags & XFS_DIFLAG_NEWRTBM)) { - ip->i_d.di_flags |= XFS_DIFLAG_NEWRTBM; + ASSERT(xfs_isilocked(mp->m_rbmip, XFS_ILOCK_EXCL)); + + seqp = (__uint64_t *)&mp->m_rbmip->i_d.di_atime; + if (!(mp->m_rbmip->i_d.di_flags & XFS_DIFLAG_NEWRTBM)) { + mp->m_rbmip->i_d.di_flags |= XFS_DIFLAG_NEWRTBM; *seqp = 0; } seq = *seqp; @@ -2335,7 +2327,7 @@ xfs_rtpick_extent( b = mp->m_sb.sb_rextents - len; } *seqp = seq + 1; - xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + xfs_trans_log_inode(tp, mp->m_rbmip, XFS_ILOG_CORE); *pick = b; return 0; } -- 1.7.5.4 -- View this message in context: http://old.nabble.com/-PATCH-0-4--xfs%3A-resurrect-realtime-subvolume-support-on-kernel-2.6.37-tp33345988p33346009.html Sent from the Xfs - General mailing list archive at Nabble.com. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/4] xfs: fix xfs_get_extsz_hint for a zero extent size hint 2012-02-17 22:51 ` [PATCH 1/4] xfs: only lock the rt bitmap inode once per allocation kdasu @ 2012-02-17 22:55 ` kdasu 2012-02-17 22:58 ` [PATCH 3/4] xfs: add lockdep annotations for the rt inodes kdasu 0 siblings, 1 reply; 11+ messages in thread From: kdasu @ 2012-02-17 22:55 UTC (permalink / raw) To: xfs We can easily set the extsize flag without setting an extent size hint, or one that evaluates to zero. Historically the di_extsize field was only used when it was non-zero, but the commit "Cleanup inode extent size hint extraction" broke this. Restore the old behaviour, thus fixing xfsqa 090 with a debug kernel. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> --- fs/xfs/xfs_rw.c | 18 +++++------------- 1 files changed, 5 insertions(+), 13 deletions(-) diff --git a/fs/xfs/xfs_rw.c b/fs/xfs/xfs_rw.c index 56861d5..ccd3adf 100644 --- a/fs/xfs/xfs_rw.c +++ b/fs/xfs/xfs_rw.c @@ -173,17 +173,9 @@ xfs_extlen_t xfs_get_extsz_hint( struct xfs_inode *ip) { - xfs_extlen_t extsz; - - if (unlikely(XFS_IS_REALTIME_INODE(ip))) { - extsz = (ip->i_d.di_flags & XFS_DIFLAG_EXTSIZE) - ? ip->i_d.di_extsize - : ip->i_mount->m_sb.sb_rextsize; - ASSERT(extsz); - } else { - extsz = (ip->i_d.di_flags & XFS_DIFLAG_EXTSIZE) - ? ip->i_d.di_extsize : 0; - } - - return extsz; + if ((ip->i_d.di_flags & XFS_DIFLAG_EXTSIZE) && ip->i_d.di_extsize) + return ip->i_d.di_extsize; + if (XFS_IS_REALTIME_INODE(ip)) + return ip->i_mount->m_sb.sb_rextsize; + return 0; } -- 1.7.5.4 -- View this message in context: http://old.nabble.com/-PATCH-0-4--xfs%3A-resurrect-realtime-subvolume-support-on-kernel-2.6.37-tp33345988p33346035.html Sent from the Xfs - General mailing list archive at Nabble.com. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/4] xfs: add lockdep annotations for the rt inodes 2012-02-17 22:55 ` [PATCH 2/4] xfs: fix xfs_get_extsz_hint for a zero extent size hint kdasu @ 2012-02-17 22:58 ` kdasu 2012-02-17 23:00 ` [PATCH 4/4] xfs: fix deadlock in xfs_rtfree_extent with kernel v2.6.37 kdasu 0 siblings, 1 reply; 11+ messages in thread From: kdasu @ 2012-02-17 22:58 UTC (permalink / raw) To: xfs The rt bitmap and summary inodes do not participate in the normal inode locking protocol. Instead the rt bitmap inode can be locked in any transaction involving rt allocations, and the both of the rt inodes can be locked at the same time. Add specific lockdep subclasses for the rt inodes to prevent lockdep from blowing up. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> --- fs/xfs/xfs_bmap.c | 2 +- fs/xfs/xfs_inode.h | 23 +++++++++++++++-------- fs/xfs/xfs_rtalloc.c | 16 ++++++++++------ 3 files changed, 26 insertions(+), 15 deletions(-) diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c index 9d9970b..36c317c 100644 --- a/fs/xfs/xfs_bmap.c +++ b/fs/xfs/xfs_bmap.c @@ -2354,7 +2354,7 @@ xfs_bmap_rtalloc( * Lock out other modifications to the RT bitmap inode. */ error = xfs_trans_iget(mp, ap->tp, mp->m_sb.sb_rbmino, 0, - XFS_ILOCK_EXCL, &ip); + XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP, &ip); if (error) return error; ASSERT(ip == mp->m_rbmip); diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index fb2ca2e..a9e82d4 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -408,28 +408,35 @@ static inline void xfs_ifunlock(xfs_inode_t *ip) /* * Flags for lockdep annotations. * - * XFS_I[O]LOCK_PARENT - for operations that require locking two inodes - * (ie directory operations that require locking a directory inode and - * an entry inode). The first inode gets locked with this flag so it - * gets a lockdep subclass of 1 and the second lock will have a lockdep - * subclass of 0. + * XFS_LOCK_PARENT - for directory operations that require locking a + * parent directory inode and a child entry inode. The parent gets locked + * with this flag so it gets a lockdep subclass of 1 and the child entry + * lock will have a lockdep subclass of 0. + * + * XFS_LOCK_RTBITMAP/XFS_LOCK_RTSUM - the realtime device bitmap and summary + * inodes do not participate in the normal lock order, and thus have their + * own subclasses. * * XFS_LOCK_INUMORDER - for locking several inodes at the some time * with xfs_lock_inodes(). This flag is used as the starting subclass * and each subsequent lock acquired will increment the subclass by one. - * So the first lock acquired will have a lockdep subclass of 2, the - * second lock will have a lockdep subclass of 3, and so on. It is + * So the first lock acquired will have a lockdep subclass of 4, the + * second lock will have a lockdep subclass of 5, and so on. It is * the responsibility of the class builder to shift this to the correct * portion of the lock_mode lockdep mask. */ #define XFS_LOCK_PARENT 1 -#define XFS_LOCK_INUMORDER 2 +#define XFS_LOCK_RTBITMAP 2 +#define XFS_LOCK_RTSUM 3 +#define XFS_LOCK_INUMORDER 4 #define XFS_IOLOCK_SHIFT 16 #define XFS_IOLOCK_PARENT (XFS_LOCK_PARENT << XFS_IOLOCK_SHIFT) #define XFS_ILOCK_SHIFT 24 #define XFS_ILOCK_PARENT (XFS_LOCK_PARENT << XFS_ILOCK_SHIFT) +#define XFS_ILOCK_RTBITMAP (XFS_LOCK_RTBITMAP << XFS_ILOCK_SHIFT) +#define XFS_ILOCK_RTSUM (XFS_LOCK_RTSUM << XFS_ILOCK_SHIFT) #define XFS_IOLOCK_DEP_MASK 0x00ff0000 #define XFS_ILOCK_DEP_MASK 0xff000000 diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 037fab1..f592ac9 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1972,8 +1972,10 @@ xfs_growfs_rt( /* * Lock out other callers by grabbing the bitmap inode lock. */ - if ((error = xfs_trans_iget(mp, tp, mp->m_sb.sb_rbmino, 0, - XFS_ILOCK_EXCL, &ip))) + error = xfs_trans_iget(mp, tp, mp->m_sb.sb_rbmino, 0, + XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP, + &ip); + if (error) goto error_cancel; ASSERT(ip == mp->m_rbmip); /* @@ -1986,8 +1988,9 @@ xfs_growfs_rt( /* * Get the summary inode into the transaction. */ - if ((error = xfs_trans_iget(mp, tp, mp->m_sb.sb_rsumino, 0, - XFS_ILOCK_EXCL, &ip))) + error = xfs_trans_iget(mp, tp, mp->m_sb.sb_rsumino, 0, + XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM, &ip); + if (error) goto error_cancel; ASSERT(ip == mp->m_rsumip); /* @@ -2160,8 +2163,9 @@ xfs_rtfree_extent( /* * Synchronize by locking the bitmap inode. */ - if ((error = xfs_trans_iget(mp, tp, mp->m_sb.sb_rbmino, 0, - XFS_ILOCK_EXCL, &ip))) + error = xfs_trans_iget(mp, tp, mp->m_sb.sb_rbmino, 0, + XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP, &ip); + if (error) return error; #if defined(__KERNEL__) && defined(DEBUG) /* -- 1.7.5.4 -- View this message in context: http://old.nabble.com/-PATCH-0-4--xfs%3A-resurrect-realtime-subvolume-support-on-kernel-2.6.37-tp33345988p33346043.html Sent from the Xfs - General mailing list archive at Nabble.com. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 4/4] xfs: fix deadlock in xfs_rtfree_extent with kernel v2.6.37 2012-02-17 22:58 ` [PATCH 3/4] xfs: add lockdep annotations for the rt inodes kdasu @ 2012-02-17 23:00 ` kdasu 2012-02-19 22:41 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: kdasu @ 2012-02-17 23:00 UTC (permalink / raw) To: xfs On the 2.6.37 kernel, xfs_fs_evict_inode() leads to a deadlock when freeing multiple realtime extents. On further debugging the root cause it was determined to be recursive locking of the RT bitmap inode during evict operation within the same task context. The same vfs evict sequence is replayed by the xfs log recovery on mounts on a reboot after the problem happens first time. This problem exists on kernel v2.6.39 as well. Call stack: xfs_ilock <- simple task deadlock in the xfs_ilock(ip, XFS_ILOCK_EXCL) re-acquired on second iteration when the inode is cached xfs_iget_cache_hit xfs_iget xfs_trans_iget xfs_rtfree_extent <- Call to xfs_trans_iget() xfs_bmap_del_extent xfs_bunmapi <- while loop based on number of extents to free xfs_itruncate_finish xfs_inactive evict The deadlock fix has two parts : 1) check if the inode is already locked in xfs_iget.c in the xfs_iget_cache_hit() function. Do not acquire the inode lock again if ip is already locked with the XFS_ILOCK_EXCL subclass. We use the active transaction structure to detect if the inode is already lokced. 2) In addition in xfs_trans_inode.c:xfs_trans_iget() prevent joining already active transaction. The above changes are also needed along with the backport of following 2.6.39 kernel patches to 2.6.37 kernel: xfs: only lock the rt bitmap inode once per allocation xfs: fix xfs_get_extsz_hint for a zero extent size hint xfs: add lockdep annotations for the rt inodes Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> --- fs/xfs/xfs_iget.c | 12 +++++++++++- fs/xfs/xfs_trans_inode.c | 2 +- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c index 0cdd269..f05bdc2 100644 --- a/fs/xfs/xfs_iget.c +++ b/fs/xfs/xfs_iget.c @@ -143,6 +143,7 @@ xfs_inode_free( static int xfs_iget_cache_hit( struct xfs_perag *pag, + xfs_trans_t *tp, struct xfs_inode *ip, int flags, int lock_flags) __releases(pag->pag_ici_lock) @@ -234,6 +235,15 @@ xfs_iget_cache_hit( trace_xfs_iget_hit(ip); } + /* check inode already locked */ + spin_lock(&ip->i_flags_lock); + if (tp && ip->i_transp == tp) { + if ((ip->i_itemp->ili_lock_flags & lock_flags) & + (XFS_ILOCK_EXCL)) + lock_flags = 0; + } + spin_unlock(&ip->i_flags_lock); + if (lock_flags != 0) xfs_ilock(ip, lock_flags); @@ -379,7 +389,7 @@ again: ip = radix_tree_lookup(&pag->pag_ici_root, agino); if (ip) { - error = xfs_iget_cache_hit(pag, ip, flags, lock_flags); + error = xfs_iget_cache_hit(pag, tp, ip, flags, lock_flags); if (error) goto out_error_or_again; } else { diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c index ccb3453..6f8db93 100644 --- a/fs/xfs/xfs_trans_inode.c +++ b/fs/xfs/xfs_trans_inode.c @@ -58,7 +58,7 @@ xfs_trans_iget( int error; error = xfs_iget(mp, tp, ino, flags, lock_flags, ipp); - if (!error && tp) { + if (!error && tp && !((*ipp)->i_transp)) { xfs_trans_ijoin(tp, *ipp); (*ipp)->i_itemp->ili_lock_flags = lock_flags; } -- 1.7.5.4 -- View this message in context: http://old.nabble.com/-PATCH-0-4--xfs%3A-resurrect-realtime-subvolume-support-on-kernel-2.6.37-tp33345988p33346051.html Sent from the Xfs - General mailing list archive at Nabble.com. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] xfs: fix deadlock in xfs_rtfree_extent with kernel v2.6.37 2012-02-17 23:00 ` [PATCH 4/4] xfs: fix deadlock in xfs_rtfree_extent with kernel v2.6.37 kdasu @ 2012-02-19 22:41 ` Christoph Hellwig 2012-02-21 17:22 ` Kamal Dasu 0 siblings, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2012-02-19 22:41 UTC (permalink / raw) To: kdasu; +Cc: xfs On Fri, Feb 17, 2012 at 03:00:09PM -0800, kdasu wrote: > > > On the 2.6.37 kernel, xfs_fs_evict_inode() leads to a deadlock when > freeing multiple realtime extents. On further debugging the root > cause it was determined to be recursive locking of the RT bitmap > inode during evict operation within the same task context. > The same vfs evict sequence is replayed by the xfs log recovery on > mounts on a reboot after the problem happens first time. > This problem exists on kernel v2.6.39 as well. I think you're better off fixing this problem like I did for the allocation side, that is: - remove the xfs_ilock and xfs_trans_ijoin (or probably still xfs_trans_iget in your version) from xfs_rtfree_extent, and instead add asserts that the inode is locked and has an inode_item attach to it. - in xfs_bunmapi if we are dealing with an inode with the rt flag bump the reference count on the inode there and attach it to the transaction before calling into xfs_bmap_del_extent, similar to what we do in xfs_bmap_rtalloc. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] xfs: fix deadlock in xfs_rtfree_extent with kernel v2.6.37 2012-02-19 22:41 ` Christoph Hellwig @ 2012-02-21 17:22 ` Kamal Dasu 2012-02-23 16:52 ` [PATCH 4/4] V2 " Kamal Dasu 0 siblings, 1 reply; 11+ messages in thread From: Kamal Dasu @ 2012-02-21 17:22 UTC (permalink / raw) To: xfs Christoph Hellwig wrote: > > I think you're better off fixing this problem like I did for the > allocation side, that is: > > - remove the xfs_ilock and xfs_trans_ijoin (or probably still > xfs_trans_iget in your version) from xfs_rtfree_extent, and > instead add asserts that the inode is locked and has an inode_item > attach to it. > - in xfs_bunmapi if we are dealing with an inode with the rt flag > bump the reference count on the inode there and attach it to the > transaction before calling into xfs_bmap_del_extent, similar to > what we do in xfs_bmap_rtalloc. > I will make the change and test and send the new version of the patch. BTW when you say reference counting the inode do you mean I should call xfs_trans_ijoin_ref(). -- View this message in context: http://old.nabble.com/-PATCH-0-4--RFC-xfs%3A-resurrect-realtime-subvolume-support-on-kernel-2.6.37-tp33345988p33365485.html Sent from the Xfs - General mailing list archive at Nabble.com. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 4/4] V2 xfs: fix deadlock in xfs_rtfree_extent with kernel v2.6.37 2012-02-21 17:22 ` Kamal Dasu @ 2012-02-23 16:52 ` Kamal Dasu 2012-02-25 9:40 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Kamal Dasu @ 2012-02-23 16:52 UTC (permalink / raw) To: xfs To fix the deadlock caused by recursively calling xfs_rtfree_extent from xfs_bunmapi(): - removed xfs_trans_iget() from xfs_rtfree_extent(), instead added asserts that the inode is locked and has an inode_item attached to it. - in xfs_bunmapi() when dealing with an inode with the rt flag call xfs_ilock() and xfs_trans_ijoin() so that the reference count is bumped on the inode and attached it to the transaction before calling into xfs_bmap_del_extent, similar to what we do in xfs_bmap_rtalloc. Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> --- fs/xfs/xfs_bmap.c | 10 ++++++++++ fs/xfs/xfs_rtalloc.c | 13 ++++++------- 2 files changed, 16 insertions(+), 7 deletions(-) diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c index 36c317c..103a253 100644 --- a/fs/xfs/xfs_bmap.c +++ b/fs/xfs/xfs_bmap.c @@ -5058,6 +5058,16 @@ xfs_bunmapi( cur->bc_private.b.flags = 0; } else cur = NULL; + + if (isrt) { + /* + * Synchronize by locking the bitmap inode. + */ + xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL); + xfs_trans_ijoin_ref(tp, mp->m_rbmip, + XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP); + } + extno = 0; while (bno != (xfs_fileoff_t)-1 && bno >= start && lastx >= 0 && (nexts == 0 || extno < nexts)) { diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index f592ac9..d0a48d4 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -2160,13 +2160,12 @@ xfs_rtfree_extent( xfs_buf_t *sumbp; /* summary file block buffer */ mp = tp->t_mountp; - /* - * Synchronize by locking the bitmap inode. - */ - error = xfs_trans_iget(mp, tp, mp->m_sb.sb_rbmino, 0, - XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP, &ip); - if (error) - return error; + + ASSERT(mp->m_rbmip->i_itemp != NULL); + ASSERT(xfs_isilocked(mp->m_rbmip, XFS_ILOCK_EXCL)); + + ip = mp->m_rbmip; + #if defined(__KERNEL__) && defined(DEBUG) /* * Check to see that this whole range is currently allocated. -- 1.7.5.4 -- View this message in context: http://old.nabble.com/-PATCH-0-4--RFC-xfs%3A-resurrect-realtime-subvolume-support-on-kernel-2.6.37-tp33345988p33379323.html Sent from the Xfs - General mailing list archive at Nabble.com. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] V2 xfs: fix deadlock in xfs_rtfree_extent with kernel v2.6.37 2012-02-23 16:52 ` [PATCH 4/4] V2 " Kamal Dasu @ 2012-02-25 9:40 ` Christoph Hellwig 2012-02-25 15:46 ` Kamal Dasu 0 siblings, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2012-02-25 9:40 UTC (permalink / raw) To: Kamal Dasu; +Cc: xfs On Thu, Feb 23, 2012 at 08:52:57AM -0800, Kamal Dasu wrote: > > To fix the deadlock caused by recursively calling xfs_rtfree_extent > from xfs_bunmapi(): > > - removed xfs_trans_iget() from xfs_rtfree_extent(), > instead added asserts that the inode is locked and has an inode_item > attached to it. > - in xfs_bunmapi() when dealing with an inode with the rt flag > call xfs_ilock() and xfs_trans_ijoin() so that the > reference count is bumped on the inode and attached it to the > transaction before calling into xfs_bmap_del_extent, similar to > what we do in xfs_bmap_rtalloc. > > Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> This looks good, thanks a lot! Do you have an easily reproducable testcase for this which we could put into xfstests? _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] V2 xfs: fix deadlock in xfs_rtfree_extent with kernel v2.6.37 2012-02-25 9:40 ` Christoph Hellwig @ 2012-02-25 15:46 ` Kamal Dasu 2012-02-28 8:36 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Kamal Dasu @ 2012-02-25 15:46 UTC (permalink / raw) To: xfs Christoph, I have not been able to create a simple test case for this yet. Currently the only way I have is to use an a time shift recording application that stored video streams on a real-time subvolume. Sometimes when such a stream is deleted I see the problem. I have not figured out how a test to consistently get allocation where the bit map span multiple extents while freeing the inode. I am still trying to come with a simple test case. If you have any ideas let me know I will be happy to try it out. Kamal Christoph Hellwig wrote: > > On Thu, Feb 23, 2012 at 08:52:57AM -0800, Kamal Dasu wrote: >> >> To fix the deadlock caused by recursively calling xfs_rtfree_extent >> from xfs_bunmapi(): >> >> - removed xfs_trans_iget() from xfs_rtfree_extent(), >> instead added asserts that the inode is locked and has an inode_item >> attached to it. >> - in xfs_bunmapi() when dealing with an inode with the rt flag >> call xfs_ilock() and xfs_trans_ijoin() so that the >> reference count is bumped on the inode and attached it to the >> transaction before calling into xfs_bmap_del_extent, similar to >> what we do in xfs_bmap_rtalloc. >> >> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> > > This looks good, thanks a lot! > > Do you have an easily reproducable testcase for this which we could > put into xfstests? > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs > > -- View this message in context: http://old.nabble.com/-PATCH-0-4--RFC-xfs%3A-resurrect-realtime-subvolume-support-on-kernel-2.6.37-tp33345988p33390983.html Sent from the Xfs - General mailing list archive at Nabble.com. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] V2 xfs: fix deadlock in xfs_rtfree_extent with kernel v2.6.37 2012-02-25 15:46 ` Kamal Dasu @ 2012-02-28 8:36 ` Christoph Hellwig 0 siblings, 0 replies; 11+ messages in thread From: Christoph Hellwig @ 2012-02-28 8:36 UTC (permalink / raw) To: Kamal Dasu; +Cc: xfs On Sat, Feb 25, 2012 at 07:57:02AM -0800, Kamal Dasu wrote: > > Christoph, > > I have not been able to create a simple test case for this yet. > > Currently the only way I have is to use an a time shift recording > application > that stores video streams on a real-time subvolume. Sometimes when such > a stream is deleted I see the problem. I have not figured out how to create > a > test to consistently get allocation where the bit map span multiple extents > while > freeing the inode. > > I am still trying to come with a simple test case. > If you have any ideas let me know I will be happy to try it out. > > I have also posted a equivalent post for the 3.3 kernel branch. > http://old.nabble.com/-PATCH--xfs%3A-fix-deadlock-in-xfs_rtfree_extent-with-kernel-v3.x-to33375114.html That's the mail I replied to :) Either way - I'd love to get a good test case for it eventually, but for now let's put in the obvious fix even if we don't have a good reproducer. Alex, I'd suggest this is a 3.3-rc issue as it's a regression, even if an old one. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2012-02-28 8:36 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-02-17 22:46 [PATCH 0/4] xfs: resurrect realtime subvolume support on kernel 2.6.37 kdasu 2012-02-17 22:51 ` [PATCH 1/4] xfs: only lock the rt bitmap inode once per allocation kdasu 2012-02-17 22:55 ` [PATCH 2/4] xfs: fix xfs_get_extsz_hint for a zero extent size hint kdasu 2012-02-17 22:58 ` [PATCH 3/4] xfs: add lockdep annotations for the rt inodes kdasu 2012-02-17 23:00 ` [PATCH 4/4] xfs: fix deadlock in xfs_rtfree_extent with kernel v2.6.37 kdasu 2012-02-19 22:41 ` Christoph Hellwig 2012-02-21 17:22 ` Kamal Dasu 2012-02-23 16:52 ` [PATCH 4/4] V2 " Kamal Dasu 2012-02-25 9:40 ` Christoph Hellwig 2012-02-25 15:46 ` Kamal Dasu 2012-02-28 8:36 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox