* [3.0-stable PATCH 01/36] xfs: fix possible overflow in xfs_ioc_trim()
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 02/36] xfs: fix allocation length overflow in xfs_bmapi_write() Mark Tinguely
` (36 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 010 --]
[-- Type: text/plain, Size: 3135 bytes --]
From: Lukas Czerner <lczerner@redhat.com>
Upstream commit: c029a50d51b8a9520105ec903639de03389915d0
In xfs_ioc_trim() the computation of the last allocation group to
discard can overflow for big start and len values, because the result
may exceed xfs_agnumber_t, which is 32 bits wide. Fix this by not
allowing the start and end block of the range to be beyond the end of
the file system.
Note that if the start is beyond the end of the file system we have to
return -EINVAL, but in the "end" case we have to truncate it to the fs
size.
Also introduce an "end" variable, rather than using start+len, which
can be confusing to get right, as this bug shows.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
---
fs/xfs/linux-2.6/xfs_discard.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_discard.c b/fs/xfs/linux-2.6/xfs_discard.c
index 572494f..286a051 100644
--- a/fs/xfs/linux-2.6/xfs_discard.c
+++ b/fs/xfs/linux-2.6/xfs_discard.c
@@ -38,7 +38,7 @@ xfs_trim_extents(
struct xfs_mount *mp,
xfs_agnumber_t agno,
xfs_fsblock_t start,
- xfs_fsblock_t len,
+ xfs_fsblock_t end,
xfs_fsblock_t minlen,
__uint64_t *blocks_trimmed)
{
@@ -100,7 +100,7 @@ xfs_trim_extents(
* down partially overlapping ranges for now.
*/
if (XFS_AGB_TO_FSB(mp, agno, fbno) + flen < start ||
- XFS_AGB_TO_FSB(mp, agno, fbno) >= start + len) {
+ XFS_AGB_TO_FSB(mp, agno, fbno) > end) {
trace_xfs_discard_exclude(mp, agno, fbno, flen);
goto next_extent;
}
@@ -145,7 +145,7 @@ xfs_ioc_trim(
struct request_queue *q = mp->m_ddev_targp->bt_bdev->bd_disk->queue;
unsigned int granularity = q->limits.discard_granularity;
struct fstrim_range range;
- xfs_fsblock_t start, len, minlen;
+ xfs_fsblock_t start, end, minlen;
xfs_agnumber_t start_agno, end_agno, agno;
__uint64_t blocks_trimmed = 0;
int error, last_error = 0;
@@ -165,19 +165,19 @@ xfs_ioc_trim(
* matter as trimming blocks is an advisory interface.
*/
start = XFS_B_TO_FSBT(mp, range.start);
- len = XFS_B_TO_FSBT(mp, range.len);
+ end = start + XFS_B_TO_FSBT(mp, range.len) - 1;
minlen = XFS_B_TO_FSB(mp, max_t(u64, granularity, range.minlen));
- start_agno = XFS_FSB_TO_AGNO(mp, start);
- if (start_agno >= mp->m_sb.sb_agcount)
+ if (start >= mp->m_sb.sb_dblocks)
return -XFS_ERROR(EINVAL);
+ if (end > mp->m_sb.sb_dblocks - 1)
+ end = mp->m_sb.sb_dblocks - 1;
- end_agno = XFS_FSB_TO_AGNO(mp, start + len);
- if (end_agno >= mp->m_sb.sb_agcount)
- end_agno = mp->m_sb.sb_agcount - 1;
+ start_agno = XFS_FSB_TO_AGNO(mp, start);
+ end_agno = XFS_FSB_TO_AGNO(mp, end);
for (agno = start_agno; agno <= end_agno; agno++) {
- error = -xfs_trim_extents(mp, agno, start, len, minlen,
+ error = -xfs_trim_extents(mp, agno, start, end, minlen,
&blocks_trimmed);
if (error)
last_error = error;
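The overflow described above is easy to demonstrate outside the kernel. Below is a minimal userspace sketch; the types, geometry constants and helper names are hypothetical stand-ins (not the real XFS macros), chosen only to show how an unclamped start + len can wrap the 32-bit AG number, while clamping the end block to the filesystem size cannot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types: AG numbers are 32 bits,
 * filesystem block numbers are 64 bits. */
typedef uint32_t agnumber_t;
typedef uint64_t fsblock_t;

static const fsblock_t agblocks = (fsblock_t)1 << 20;   /* blocks per AG */
static const fsblock_t dblocks  = (fsblock_t)16 << 20;  /* fs size: 16 AGs */

/* Pre-patch shape: convert start + len straight to an AG number. The
 * 64-bit quotient is silently truncated to 32 bits for huge inputs. */
static agnumber_t end_agno_unclamped(fsblock_t start, fsblock_t len)
{
	return (agnumber_t)((start + len) / agblocks);
}

/* Post-patch shape: clamp the end block to the last block of the
 * filesystem first, so the resulting AG number can never overflow. */
static agnumber_t end_agno_clamped(fsblock_t start, fsblock_t len)
{
	fsblock_t end = start + len - 1;

	if (end > dblocks - 1)
		end = dblocks - 1;
	return (agnumber_t)(end / agblocks);
}
```

With len = 2^52 blocks, the unclamped conversion wraps to AG 0 and silently truncates the trim range; the clamped version lands on the last real AG.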
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [3.0-stable PATCH 02/36] xfs: fix allocation length overflow in xfs_bmapi_write()
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: a99ebf43f49f4499ab0e2a8a9132ad6ed6ba2409
When testing the new xfstests --large-fs option that does very large
file preallocations, this assert was tripped deep in
xfs_alloc_vextent():
XFS: Assertion failed: args->minlen <= args->maxlen, file: fs/xfs/xfs_alloc.c, line: 2239
The allocation was trying to allocate a zero length extent because
the lower 32 bits of the allocation length was zero. The remaining
length of the allocation to be done was an exact multiple of 2^32 -
the first case I saw was at 496TB remaining to be allocated.
This turns out to be an overflow when converting the allocation
length (a 64 bit quantity) into the extent length to allocate (a 32
bit quantity); tripping the assert requires the remaining length to
be an exact multiple of 2^32 blocks.
Fix it by limiting the extent length to allocate to MAXEXTLEN.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_bmap.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
Index: b/fs/xfs/xfs_bmap.c
===================================================================
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -4453,8 +4453,18 @@ xfs_bmapi(
xfs_bmbt_get_all(ep, &prev);
}
} else {
- alen = (xfs_extlen_t)
- XFS_FILBLKS_MIN(len, MAXEXTLEN);
+ /*
+ * There's a 32/64 bit type mismatch between the
+ * allocation length request (which can be 64
+ * bits in length) and the bma length request,
+ * which is xfs_extlen_t and therefore 32 bits.
+ * Hence we have to check for 32-bit overflows
+ * and handle them here.
+ */
+ if (len > (xfs_filblks_t)MAXEXTLEN)
+ alen = MAXEXTLEN;
+ else
+ alen = len;
if (!eof)
alen = (xfs_extlen_t)
XFS_FILBLKS_MIN(alen,
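The 32/64-bit truncation this patch guards against can be reproduced in isolation. In the sketch below the type and macro names merely mimic XFS's (they are local stand-ins): narrowing the 64-bit length before clamping turns any exact multiple of 2^32 into a zero-length request, while comparing in 64 bits first yields the properly clamped value.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t filblks_t;  /* 64-bit file-block count, as in XFS */
typedef uint32_t extlen_t;   /* 32-bit extent length */

#define MAXEXTLEN ((extlen_t)0x001fffff)  /* 21-bit on-disk extent limit */

/* Buggy shape: the value is narrowed to 32 bits before the clamp, so
 * the high bits are already gone when the comparison happens. */
static extlen_t alen_truncated(filblks_t len)
{
	extlen_t narrow = (extlen_t)len;  /* 2^32 becomes 0 here */
	return narrow < MAXEXTLEN ? narrow : MAXEXTLEN;
}

/* Fixed shape, matching the patch: compare in 64 bits, then narrow. */
static extlen_t alen_clamped(filblks_t len)
{
	if (len > (filblks_t)MAXEXTLEN)
		return MAXEXTLEN;
	return (extlen_t)len;
}
```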
* [3.0-stable PATCH 03/36] xfs: mark the xfssyncd workqueue as non-reentrant
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Christoph Hellwig <hch@infradead.org>
Upstream commit: 40d344ec5ee440596b1f3ae87556e20c7197757a
On a system with lots of memory pressure that is stuck on synchronous
inode reclaim, the workqueue code will run one instance of the inode
reclaim work item on every CPU, which is not what we want. Mark the
xfssyncd workqueue as non-reentrant to ensure there is only one
instance of each work item running globally. Also stop using special
parameters for the workqueue; now that we guarantee each fs runs only
one instance of each work item at a time, there is no need to
artificially lower max_active and compensate for that by setting the
WQ_CPU_INTENSIVE flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_super.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index e6ac98c..04dd4d4 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1617,12 +1617,12 @@ STATIC int __init
xfs_init_workqueues(void)
{
/*
- * max_active is set to 8 to give enough concurency to allow
- * multiple work operations on each CPU to run. This allows multiple
- * filesystems to be running sync work concurrently, and scales with
- * the number of CPUs in the system.
+ * We never want to the same work item to run twice, reclaiming inodes
+ * or idling the log is not going to get any faster by multiple CPUs
+ * competing for ressources. Use the default large max_active value
+ * so that even lots of filesystems can perform these task in parallel.
*/
- xfs_syncd_wq = alloc_workqueue("xfssyncd", WQ_CPU_INTENSIVE, 8);
+ xfs_syncd_wq = alloc_workqueue("xfssyncd", WQ_NON_REENTRANT, 0);
if (!xfs_syncd_wq)
return -ENOMEM;
return 0;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 46+ messages in thread* [3.0-stable PATCH 04/36] xfs: xfs_trans_add_item() - dont assign in ASSERT() when compare is intended
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (2 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 03/36] xfs: mark the xfssyncd workqueue as non-reentrant Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 05/36] xfs: only take the ILOCK in xfs_reclaim_inode() Mark Tinguely
` (33 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 035 --]
[-- Type: text/plain, Size: 1103 bytes --]
From: Jesper Juhl <jj@chaosbits.net>
Upstream commit: f65020a83ad570c1788f7d8ece67f3487166576b
It looks to me like the two ASSERT()s in xfs_trans_add_item() really
want to do a compare (==) rather than assignment (=).
This patch changes it from the latter to the former.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 05293485a0b6b1f803e8a3c0ff188c38f6969985)
---
fs/xfs/xfs_trans.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index efc147f..9b4437d 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1151,8 +1151,8 @@ xfs_trans_add_item(
{
struct xfs_log_item_desc *lidp;
- ASSERT(lip->li_mountp = tp->t_mountp);
- ASSERT(lip->li_ailp = tp->t_mountp->m_ail);
+ ASSERT(lip->li_mountp == tp->t_mountp);
+ ASSERT(lip->li_ailp == tp->t_mountp->m_ail);
lidp = kmem_zone_zalloc(xfs_log_item_desc_zone, KM_SLEEP | KM_NOFS);
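Why `=` inside an assertion is doubly harmful can be shown with a toy stand-in (none of these names are the real XFS structures): the assignment form evaluates to its non-NULL right-hand side, so the check always "passes", and it overwrites the very field it was supposed to verify.

```c
#include <assert.h>
#include <stddef.h>

struct item {
	void *mountp;
};

/* Assignment where a comparison was intended: always "true" for a
 * non-NULL expected value, and it clobbers the field as a side effect. */
static int check_with_assignment(struct item *lip, void *expected)
{
	return (lip->mountp = expected) != NULL;
}

/* The intended comparison. */
static int check_with_comparison(struct item *lip, void *expected)
{
	return lip->mountp == expected;
}

static int assignment_masks_mismatch(void)
{
	int a, b;
	struct item it = { &a };

	/* The field does not match &b, yet the check "passes"... */
	int passed = check_with_assignment(&it, &b);

	/* ...and the field has been silently corrupted to &b. */
	return passed && it.mountp == &b;
}

static int comparison_catches_mismatch(void)
{
	int a, b;
	struct item it = { &a };

	return check_with_comparison(&it, &b);
}
```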
* [3.0-stable PATCH 05/36] xfs: only take the ILOCK in xfs_reclaim_inode()
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Alex Elder <elder@dreamhost.com>
Upstream commit: ad637a10f444fc66b1f6d4a28fe30d4c61ed0161
At the end of xfs_reclaim_inode(), the inode is locked so that we
wait for a possible concurrent lookup to complete before the inode is
freed. This synchronization step was taking both the ILOCK and the
IOLOCK, but the latter was causing lockdep to produce reports of a
possible deadlock.
It turns out that there's no need to acquire the IOLOCK at this
point anyway. It may have been required in some earlier version of
the code, but there should be no need to take the IOLOCK in
xfs_iget(), so there's no (longer) any need to get it here for
synchronization. Add an assertion in xfs_iget() as a reminder
of this assumption.
Dave Chinner diagnosed this on IRC, and Christoph Hellwig suggested
no longer including the IOLOCK. I just put together the patch.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_sync.c | 10 ++++------
fs/xfs/xfs_iget.c | 9 +++++++++
2 files changed, 13 insertions(+), 6 deletions(-)
Index: b/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -921,17 +921,15 @@ reclaim:
* can reference the inodes in the cache without taking references.
*
* We make that OK here by ensuring that we wait until the inode is
- * unlocked after the lookup before we go ahead and free it. We get
- * both the ilock and the iolock because the code may need to drop the
- * ilock one but will still hold the iolock.
+ * unlocked after the lookup before we go ahead and free it.
*/
- xfs_ilock(ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
xfs_qm_dqdetach(ip);
- xfs_iunlock(ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
xfs_inode_free(ip);
- return error;
+ return error;
}
/*
Index: b/fs/xfs/xfs_iget.c
===================================================================
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -430,6 +430,15 @@ xfs_iget(
xfs_perag_t *pag;
xfs_agino_t agino;
+ /*
+ * xfs_reclaim_inode() uses the ILOCK to ensure an inode
+ * doesn't get freed while it's being referenced during a
+ * radix tree traversal here. It assumes this function
+ * aqcuires only the ILOCK (and therefore it has no need to
+ * involve the IOLOCK in this synchronization).
+ */
+ ASSERT((lock_flags & (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED)) == 0);
+
/* reject inode numbers outside existing AGs */
if (!ino || XFS_INO_TO_AGNO(mp, ino) >= mp->m_sb.sb_agcount)
return EINVAL;
* [3.0-stable PATCH 06/36] xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: ad650f5b27bc9858360b42aaa0d9204d16115316
xfsdump uses a large buffer for extended attributes, which has a
kmalloc'd shadow buffer in the kernel. This allocation can fail after
the system has been running for some time, as it is a high order
allocation. Add a fallback to vmalloc so that it doesn't require
contiguous memory and so won't randomly fail while xfsdump is
running.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_ioctl.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_ioctl.c b/fs/xfs/linux-2.6/xfs_ioctl.c
index acca2c5..ada7da9 100644
--- a/fs/xfs/linux-2.6/xfs_ioctl.c
+++ b/fs/xfs/linux-2.6/xfs_ioctl.c
@@ -450,9 +450,12 @@ xfs_attrmulti_attr_get(
if (*len > XATTR_SIZE_MAX)
return EINVAL;
- kbuf = kmalloc(*len, GFP_KERNEL);
- if (!kbuf)
- return ENOMEM;
+ kbuf = kmem_zalloc(*len, KM_SLEEP | KM_MAYFAIL);
+ if (!kbuf) {
+ kbuf = kmem_zalloc_large(*len);
+ if (!kbuf)
+ return ENOMEM;
+ }
error = xfs_attr_get(XFS_I(inode), name, kbuf, (int *)len, flags);
if (error)
@@ -462,7 +465,10 @@ xfs_attrmulti_attr_get(
error = EFAULT;
out_kfree:
- kfree(kbuf);
+ if (is_vmalloc_addr(kbuf))
+ kmem_free_large(kbuf);
+ else
+ kmem_free(kbuf);
return error;
}
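The shape of the fallback can be sketched in userspace. kmem_zalloc(), kmem_zalloc_large() and is_vmalloc_addr() are kernel interfaces, so the sketch below substitutes hypothetical allocators and an explicit provenance flag; the point is only the pattern: try the cheap allocator first, fall back for large sizes, and dispatch the free on which allocator succeeded.

```c
#include <assert.h>
#include <stdlib.h>

/* A buffer plus a record of which allocator owns it, standing in for
 * the kernel's is_vmalloc_addr() test at free time. */
struct fallback_buf {
	void *p;
	int   large;
};

/* Try the "contiguous" allocator first (here: calloc with an artificial
 * size limit, mimicking high-order kmalloc failure), then fall back. */
static int fallback_alloc(struct fallback_buf *b, size_t len, size_t small_limit)
{
	b->large = 0;
	b->p = (len <= small_limit) ? calloc(1, len) : NULL;
	if (!b->p) {
		b->large = 1;               /* "vmalloc" path */
		b->p = calloc(1, len);
	}
	return b->p != NULL;
}

static void fallback_free(struct fallback_buf *b)
{
	/* A real kernel caller picks kmem_free() vs kmem_free_large()
	 * here; userspace free() covers both stand-ins. */
	free(b->p);
	b->p = NULL;
}

/* Report which path a given size takes (-1 on allocation failure). */
static int alloc_path_taken(size_t len)
{
	struct fallback_buf b;

	if (!fallback_alloc(&b, len, 4096))
		return -1;
	int large = b.large;
	fallback_free(&b);
	return large;
}
```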
* [3.0-stable PATCH 07/36] xfs: fallback to vmalloc for large buffers in xfs_getbmap
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: f074211f6041305b645669464343d504f4e6a290
xfs_getbmap uses a large buffer for extents, which is kmalloc'd. This
allocation can fail after the system has been running for some time,
as it is a high order allocation. Add a fallback to vmalloc so that it
doesn't require contiguous memory and so won't randomly fail on files
with large extent lists.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/xfs_bmap.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
Index: b/fs/xfs/xfs_bmap.c
===================================================================
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -5457,8 +5457,12 @@ xfs_getbmap(
if (bmv->bmv_count > ULONG_MAX / sizeof(struct getbmapx))
return XFS_ERROR(ENOMEM);
out = kmem_zalloc(bmv->bmv_count * sizeof(struct getbmapx), KM_MAYFAIL);
- if (!out)
- return XFS_ERROR(ENOMEM);
+ if (!out) {
+ out = kmem_zalloc_large(bmv->bmv_count *
+ sizeof(struct getbmapx));
+ if (!out)
+ return XFS_ERROR(ENOMEM);
+ }
xfs_ilock(ip, XFS_IOLOCK_SHARED);
if (whichfork == XFS_DATA_FORK && !(iflags & BMV_IF_DELALLOC)) {
@@ -5583,7 +5587,10 @@ xfs_getbmap(
break;
}
- kmem_free(out);
+ if (is_vmalloc_addr(out))
+ kmem_free_large(out);
+ else
+ kmem_free(out);
return error;
}
* [3.0-stable PATCH 08/36] xfs: fix deadlock in xfs_rtfree_extent
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Kamal Dasu <kdasu.kdev@gmail.com>
Upstream commit: 5575acc7807595687288b3bbac15103f2a5462e1
To fix the deadlock caused by repeatedly calling xfs_rtfree_extent:
- removed xfs_ilock() and xfs_trans_ijoin() from xfs_rtfree_extent(),
and instead added asserts that the inode is locked and has an
inode_item attached to it.
- in xfs_bunmapi(), when dealing with an inode with the rt flag,
call xfs_ilock() and xfs_trans_ijoin() so that the reference count
is bumped on the inode and it is attached to the transaction before
calling into xfs_bmap_del_extent(), similar to what we do in
xfs_bmap_rtalloc().
Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/xfs_bmap.c | 9 +++++++++
fs/xfs/xfs_rtalloc.c | 9 ++++-----
2 files changed, 13 insertions(+), 5 deletions(-)
Index: b/fs/xfs/xfs_bmap.c
===================================================================
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -5041,6 +5041,15 @@ xfs_bunmapi(
cur->bc_private.b.flags = 0;
} else
cur = NULL;
+
+ if (isrt) {
+ /*
+ * Synchronize by locking the bitmap inode.
+ */
+ xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, mp->m_rbmip);
+ }
+
extno = 0;
while (bno != (xfs_fileoff_t)-1 && bno >= start && lastx >= 0 &&
(nexts == 0 || extno < nexts)) {
Index: b/fs/xfs/xfs_rtalloc.c
===================================================================
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -183,6 +183,7 @@ error_cancel:
oblocks = map.br_startoff + map.br_blockcount;
}
return 0;
+
error:
return error;
}
@@ -2149,11 +2150,9 @@ xfs_rtfree_extent(
xfs_buf_t *sumbp; /* summary file block buffer */
mp = tp->t_mountp;
- /*
- * Synchronize by locking the bitmap inode.
- */
- xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL);
- xfs_trans_ijoin_ref(tp, mp->m_rbmip, XFS_ILOCK_EXCL);
+
+ ASSERT(mp->m_rbmip->i_itemp != NULL);
+ ASSERT(xfs_isilocked(mp->m_rbmip, XFS_ILOCK_EXCL));
#if defined(__KERNEL__) && defined(DEBUG)
/*
* [3.0-stable PATCH 09/36] xfs: Fix open flag handling in open_by_handle code
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: 1a1d772433d42aaff7315b3468fef5951604f5c6
Sparse identified some unsafe handling of open flags in the xfs open
by handle ioctl code. Update the code to use the correct access
macros to ensure that we handle the open flags correctly.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_ioctl.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_ioctl.c b/fs/xfs/linux-2.6/xfs_ioctl.c
index ada7da9..b27fb6d 100644
--- a/fs/xfs/linux-2.6/xfs_ioctl.c
+++ b/fs/xfs/linux-2.6/xfs_ioctl.c
@@ -209,6 +209,7 @@ xfs_open_by_handle(
struct file *filp;
struct inode *inode;
struct dentry *dentry;
+ fmode_t fmode;
if (!capable(CAP_SYS_ADMIN))
return -XFS_ERROR(EPERM);
@@ -228,26 +229,21 @@ xfs_open_by_handle(
hreq->oflags |= O_LARGEFILE;
#endif
- /* Put open permission in namei format. */
permflag = hreq->oflags;
- if ((permflag+1) & O_ACCMODE)
- permflag++;
- if (permflag & O_TRUNC)
- permflag |= 2;
-
+ fmode = OPEN_FMODE(permflag);
if ((!(permflag & O_APPEND) || (permflag & O_TRUNC)) &&
- (permflag & FMODE_WRITE) && IS_APPEND(inode)) {
+ (fmode & FMODE_WRITE) && IS_APPEND(inode)) {
error = -XFS_ERROR(EPERM);
goto out_dput;
}
- if ((permflag & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
+ if ((fmode & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
error = -XFS_ERROR(EACCES);
goto out_dput;
}
/* Can't write directories. */
- if (S_ISDIR(inode->i_mode) && (permflag & FMODE_WRITE)) {
+ if (S_ISDIR(inode->i_mode) && (fmode & FMODE_WRITE)) {
error = -XFS_ERROR(EISDIR);
goto out_dput;
}
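The "+1" trick behind the kernel's OPEN_FMODE() is worth spelling out. Below is a simplified, hedged illustration (the DEMO_* constants are local stand-ins for FMODE_READ/FMODE_WRITE, and the real macro handles more cases): open(2) encodes O_RDONLY/O_WRONLY/O_RDWR as 0/1/2, so adding 1 turns the access mode into proper read/write permission bits that can be tested directly, which the raw oflags cannot.

```c
#include <assert.h>
#include <fcntl.h>

#define DEMO_FMODE_READ  0x1u
#define DEMO_FMODE_WRITE 0x2u

/* O_RDONLY=0, O_WRONLY=1, O_RDWR=2: adding one maps them to the bit
 * patterns 01 (read), 10 (write) and 11 (read|write). Testing
 * (oflags & FMODE_WRITE) on the raw flags, as the pre-patch code
 * effectively did, examines an unrelated bit. */
static unsigned int demo_open_fmode(int oflags)
{
	return (unsigned int)(oflags + 1) & O_ACCMODE;
}
```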
* [3.0-stable PATCH 10/36] xfs: Account log unmount transaction correctly
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: 3948659e30808fbaa7673bbe89de2ae9769e20a7
There have been a few reports of this warning appearing recently:
XFS (dm-4): xlog_space_left: head behind tail
tail_cycle = 129, tail_bytes = 20163072
GH cycle = 129, GH bytes = 20162880
The common cause appears to be lots of freeze and unfreeze cycles,
and the output from the warnings indicates that we are leaking
around 8 bytes of log space per freeze/unfreeze cycle.
When we freeze the filesystem, we write an unmount record and that
uses xlog_write directly - a special type of transaction,
effectively. What it doesn't do, however, is correctly account for
the log space it uses. The unmount record writes an 8 byte structure
with a special magic number into the log, and the space this
consumes is not accounted for in the log ticket tracking the
operation. Hence we leak 8 bytes every unmount record that is
written.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/xfs_log.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: b/fs/xfs/xfs_log.c
===================================================================
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -531,8 +531,9 @@ xfs_log_unmount_write(xfs_mount_t *mp)
.lv_iovecp = &reg,
};
- /* remove inited flag */
+ /* remove inited flag, and account for space used */
tic->t_flags = 0;
+ tic->t_curr_res -= sizeof(magic);
error = xlog_write(log, &vec, tic, &lsn,
NULL, XLOG_UNMOUNT_TRANS);
/*
* [3.0-stable PATCH 11/36] xfs: fix fstrim offset calculations
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: a66d636385d621e98a915233250356c394a437de
xfs_ioc_fstrim() doesn't treat the incoming offset and length
correctly. It treats them as a filesystem block address, rather than
a disk address. This is wrong because the range passed in is a
linear representation, while the filesystem block address notation
is a sparse representation. Hence we cannot convert the range direct
to filesystem block units and then use that for calculating the
range to trim.
While this sounds dangerous, the problem is limited to calculating
which AGs need to be trimmed. The code that calculates the actual
ranges to trim gets the right result (i.e. it only ever discards free
space), even though it uses the wrong ranges to limit what is
trimmed. Hence this is not a bug that endangers user data.
Fix this by treating the range as a disk address range and using the
appropriate functions to convert the range into the desired formats
for calculations.
Further, fix the first free extent lookup (the longest) to actually
find the largest free extent. Currently this lookup uses a <=
lookup, which finds the extent to the left of the largest because we
can never get an exact match on the largest extent. This is due to
the fact that while we know its size, we don't know its location, so
the exact match fails and we move one record to the left to get the
next largest extent. Instead, use a >= search so that the lookup
returns the largest extent regardless of the fact that we don't get
an exact match on it.
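The daddr-based conversions the patch switches to can be sketched in isolation. The helpers below mirror the rounding behaviour of XFS's BTOBB (round a byte count up to 512-byte basic blocks) and BTOBBT (truncate), and trim_end() is a hypothetical condensation of the patched range setup; fs_bb stands in for the device size in basic blocks.

```c
#include <assert.h>
#include <stdint.h>

#define BBSHIFT 9  /* 512-byte "basic blocks": the disk-address unit */

/* Round a byte count up to basic blocks (shape of XFS's BTOBB). */
static uint64_t btobb(uint64_t bytes)
{
	return (bytes + (1u << BBSHIFT) - 1) >> BBSHIFT;
}

/* Truncate a byte count down to basic blocks (shape of BTOBBT). */
static uint64_t btobbt(uint64_t bytes)
{
	return bytes >> BBSHIFT;
}

/* Hypothetical condensation of the patched range setup: work in linear
 * disk addresses and clamp the end to the last basic block, so even
 * the fstrim default length of ULLONG_MAX cannot overflow anything. */
static uint64_t trim_end(uint64_t start_bytes, uint64_t len_bytes, uint64_t fs_bb)
{
	uint64_t start = btobb(start_bytes);
	uint64_t end = start + btobbt(len_bytes) - 1;

	if (end > fs_bb - 1)
		end = fs_bb - 1;
	return end;
}
```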
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_discard.c | 61 +++++++++++++++++++++++++----------------
fs/xfs/xfs_alloc.c | 2 -
fs/xfs/xfs_alloc.h | 7 ++++
3 files changed, 46 insertions(+), 24 deletions(-)
Index: b/fs/xfs/linux-2.6/xfs_discard.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_discard.c
+++ b/fs/xfs/linux-2.6/xfs_discard.c
@@ -37,9 +37,9 @@ STATIC int
xfs_trim_extents(
struct xfs_mount *mp,
xfs_agnumber_t agno,
- xfs_fsblock_t start,
- xfs_fsblock_t end,
- xfs_fsblock_t minlen,
+ xfs_daddr_t start,
+ xfs_daddr_t end,
+ xfs_daddr_t minlen,
__uint64_t *blocks_trimmed)
{
struct block_device *bdev = mp->m_ddev_targp->bt_bdev;
@@ -67,7 +67,7 @@ xfs_trim_extents(
/*
* Look up the longest btree in the AGF and start with it.
*/
- error = xfs_alloc_lookup_le(cur, 0,
+ error = xfs_alloc_lookup_ge(cur, 0,
be32_to_cpu(XFS_BUF_TO_AGF(agbp)->agf_longest), &i);
if (error)
goto out_del_cursor;
@@ -77,8 +77,10 @@ xfs_trim_extents(
* enough to be worth discarding.
*/
while (i) {
- xfs_agblock_t fbno;
- xfs_extlen_t flen;
+ xfs_agblock_t fbno;
+ xfs_extlen_t flen;
+ xfs_daddr_t dbno;
+ xfs_extlen_t dlen;
error = xfs_alloc_get_rec(cur, &fbno, &flen, &i);
if (error)
@@ -87,9 +89,17 @@ xfs_trim_extents(
ASSERT(flen <= be32_to_cpu(XFS_BUF_TO_AGF(agbp)->agf_longest));
/*
+ * use daddr format for all range/len calculations as that is
+ * the format the range/len variables are supplied in by
+ * userspace.
+ */
+ dbno = XFS_AGB_TO_DADDR(mp, agno, fbno);
+ dlen = XFS_FSB_TO_BB(mp, flen);
+
+ /*
* Too small? Give up.
*/
- if (flen < minlen) {
+ if (dlen < minlen) {
trace_xfs_discard_toosmall(mp, agno, fbno, flen);
goto out_del_cursor;
}
@@ -99,8 +109,7 @@ xfs_trim_extents(
* supposed to discard skip it. Do not bother to trim
* down partially overlapping ranges for now.
*/
- if (XFS_AGB_TO_FSB(mp, agno, fbno) + flen < start ||
- XFS_AGB_TO_FSB(mp, agno, fbno) > end) {
+ if (dbno + dlen < start || dbno > end) {
trace_xfs_discard_exclude(mp, agno, fbno, flen);
goto next_extent;
}
@@ -115,10 +124,7 @@ xfs_trim_extents(
}
trace_xfs_discard_extent(mp, agno, fbno, flen);
- error = -blkdev_issue_discard(bdev,
- XFS_AGB_TO_DADDR(mp, agno, fbno),
- XFS_FSB_TO_BB(mp, flen),
- GFP_NOFS, 0);
+ error = -blkdev_issue_discard(bdev, dbno, dlen, GFP_NOFS, 0);
if (error)
goto out_del_cursor;
*blocks_trimmed += flen;
@@ -137,6 +143,15 @@ out_put_perag:
return error;
}
+/*
+ * trim a range of the filesystem.
+ *
+ * Note: the parameters passed from userspace are byte ranges into the
+ * filesystem which does not match to the format we use for filesystem block
+ * addressing. FSB addressing is sparse (AGNO|AGBNO), while the incoming format
+ * is a linear address range. Hence we need to use DADDR based conversions and
+ * comparisons for determining the correct offset and regions to trim.
+ */
int
xfs_ioc_trim(
struct xfs_mount *mp,
@@ -145,7 +160,7 @@ xfs_ioc_trim(
struct request_queue *q = mp->m_ddev_targp->bt_bdev->bd_disk->queue;
unsigned int granularity = q->limits.discard_granularity;
struct fstrim_range range;
- xfs_fsblock_t start, end, minlen;
+ xfs_daddr_t start, end, minlen;
xfs_agnumber_t start_agno, end_agno, agno;
__uint64_t blocks_trimmed = 0;
int error, last_error = 0;
@@ -159,22 +174,22 @@ xfs_ioc_trim(
/*
* Truncating down the len isn't actually quite correct, but using
- * XFS_B_TO_FSB would mean we trivially get overflows for values
+ * BBTOB would mean we trivially get overflows for values
* of ULLONG_MAX or slightly lower. And ULLONG_MAX is the default
* used by the fstrim application. In the end it really doesn't
* matter as trimming blocks is an advisory interface.
*/
- start = XFS_B_TO_FSBT(mp, range.start);
- end = start + XFS_B_TO_FSBT(mp, range.len) - 1;
- minlen = XFS_B_TO_FSB(mp, max_t(u64, granularity, range.minlen));
+ start = BTOBB(range.start);
+ end = start + BTOBBT(range.len) - 1;
+ minlen = BTOBB(max_t(u64, granularity, range.minlen));
- if (start >= mp->m_sb.sb_dblocks)
+ if (XFS_BB_TO_FSB(mp, start) >= mp->m_sb.sb_dblocks)
return -XFS_ERROR(EINVAL);
- if (end > mp->m_sb.sb_dblocks - 1)
- end = mp->m_sb.sb_dblocks - 1;
+ if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1)
+ end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)- 1;
- start_agno = XFS_FSB_TO_AGNO(mp, start);
- end_agno = XFS_FSB_TO_AGNO(mp, end);
+ start_agno = xfs_daddr_to_agno(mp, start);
+ end_agno = xfs_daddr_to_agno(mp, end);
for (agno = start_agno; agno <= end_agno; agno++) {
error = -xfs_trim_extents(mp, agno, start, end, minlen,
Index: b/fs/xfs/xfs_alloc.c
===================================================================
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -68,7 +68,7 @@ xfs_alloc_lookup_eq(
* Lookup the first record greater than or equal to [bno, len]
* in the btree given by cur.
*/
-STATIC int /* error */
+int /* error */
xfs_alloc_lookup_ge(
struct xfs_btree_cur *cur, /* btree cursor */
xfs_agblock_t bno, /* starting block of extent */
Index: b/fs/xfs/xfs_alloc.h
===================================================================
--- a/fs/xfs/xfs_alloc.h
+++ b/fs/xfs/xfs_alloc.h
@@ -243,6 +243,13 @@ xfs_alloc_lookup_le(
xfs_extlen_t len, /* length of extent */
int *stat); /* success/failure */
+int /* error */
+xfs_alloc_lookup_ge(
+ struct xfs_btree_cur *cur, /* btree cursor */
+ xfs_agblock_t bno, /* starting block of extent */
+ xfs_extlen_t len, /* length of extent */
+ int *stat); /* success/failure */
+
int /* error */
xfs_alloc_get_rec(
struct xfs_btree_cur *cur, /* btree cursor */
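The overflow described in the commit message, and the clamping the patch performs before computing AG numbers, can be illustrated with a small standalone sketch. The types and helpers below are simplified, hypothetical stand-ins for the kernel's macros: an allocation group number is 32 bits wide, so dividing an unclamped 64-bit block number by the AG size can silently wrap on the cast.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for xfs_agnumber_t: 32 bits wide. */
typedef uint32_t agnumber_t;

/* Convert a filesystem block number to an AG number; the cast is where
 * an unclamped 64-bit quotient silently wraps. */
static agnumber_t fsb_to_agno(uint64_t fsbno, uint64_t agblocks)
{
        return (agnumber_t)(fsbno / agblocks);
}

/* What the fix does: clamp the end of the trim range to the last block
 * of the filesystem before computing the AG number. */
static agnumber_t end_agno_clamped(uint64_t start, uint64_t len,
                                   uint64_t dblocks, uint64_t agblocks)
{
        uint64_t end = start + len - 1;

        if (end > dblocks - 1)
                end = dblocks - 1;
        return fsb_to_agno(end, agblocks);
}
```

With a 1M-block filesystem split into 64K-block AGs, an fstrim request with a huge length wraps the 32-bit quotient to 0 when unclamped, but resolves to the real last AG (15) once clamped to the filesystem size.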
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 46+ messages in thread

* [3.0-stable PATCH 12/36] xfs: dont fill statvfs with project quota for a directory
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Jie Liu <jeff.liu@oracle.com>
Upstream commit: da5bf95e3cdca348327c82568c2860229c0daaa2
if it was not enabled.
Check if the project quota is running or not before performing
xfs_qm_statvfs(), just return if not. Otherwise the ASSERT
XFS_IS_QUOTA_RUNNING in xfs_qm_dqget will be popped.
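The one-character operator change below is easiest to see as a truth-table exercise. This sketch uses made-up flag values (not the kernel's) to show why `||` sends every PROJINHERIT directory down the quota path even when project quota is not running, while `&&` requires both conditions to hold:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
#define DIFLAG_PROJINHERIT 0x1
#define PQUOTA_ACCT        0x1
#define OQUOTA_ENFD        0x2

/* Buggy version: '||' takes the quota path for any PROJINHERIT
 * directory even when project quota is off. */
static bool use_pquota_buggy(unsigned diflags, unsigned qflags)
{
        return (diflags & DIFLAG_PROJINHERIT) ||
               ((qflags & (PQUOTA_ACCT | OQUOTA_ENFD)) ==
                (PQUOTA_ACCT | OQUOTA_ENFD));
}

/* Fixed version: both the inode flag and running project quota
 * (accounting + enforcement) are required. */
static bool use_pquota_fixed(unsigned diflags, unsigned qflags)
{
        return (diflags & DIFLAG_PROJINHERIT) &&
               ((qflags & (PQUOTA_ACCT | OQUOTA_ENFD)) ==
                (PQUOTA_ACCT | OQUOTA_ENFD));
}
```

With quota off, the buggy predicate is still true for a PROJINHERIT directory, which is exactly the path that trips the XFS_IS_QUOTA_RUNNING assert.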
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: b/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1091,7 +1091,7 @@ xfs_fs_statfs(
spin_unlock(&mp->m_sb_lock);
- if ((ip->i_d.di_flags & XFS_DIFLAG_PROJINHERIT) ||
+ if ((ip->i_d.di_flags & XFS_DIFLAG_PROJINHERIT) &&
((mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_OQUOTA_ENFD))) ==
(XFS_PQUOTA_ACCT|XFS_OQUOTA_ENFD))
xfs_qm_statvfs(ip, statp);
* [3.0-stable PATCH 13/36] xfs: Ensure inode reclaim can run during quotacheck
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: 8a00ebe4cfc90eda9ecb575ba97e22021cd8cf70
Because the mount process can run a quotacheck and consume lots of
inodes, we need to be able to run periodic inode reclaim during the
mount process. This will prevent running the system out of memory
during quota checks.
This essentially reverts 2bcf6e97, but that is safe to do now that
the quota sync code that was causing problems during long quotacheck
executions is now gone.
The reclaim work is currently protected from running during the
unmount process by a check against MS_ACTIVE. Unfortunately, this
also means that the reclaim work cannot run during mount. The
unmount process should stop the reclaim cleanly before freeing
anything that the reclaim work depends on, so there is no need to
have this guard in place.
Also, the inode reclaim work is demand driven, so there is no need
to start it immediately during mount. It will be started the moment
an inode is queued for reclaim, so quotacheck will trigger it just
fine.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_super.c | 17 +++++++++--------
fs/xfs/linux-2.6/xfs_sync.c | 19 +++++++++----------
2 files changed, 18 insertions(+), 18 deletions(-)
Index: b/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -994,7 +994,6 @@ xfs_fs_put_super(
* structure so we don't have memory reclaim racing with us here.
*/
xfs_inode_shrinker_unregister(mp);
- xfs_syncd_stop(mp);
/*
* Blow away any referenced inode in the filestreams cache.
@@ -1006,6 +1005,7 @@ xfs_fs_put_super(
XFS_bflush(mp->m_ddev_targp);
xfs_unmountfs(mp);
+ xfs_syncd_stop(mp);
xfs_freesb(mp);
xfs_icsb_destroy_counters(mp);
xfs_close_devices(mp);
@@ -1378,22 +1378,22 @@ xfs_fs_fill_super(
xfs_inode_shrinker_register(mp);
- error = xfs_mountfs(mp);
+ error = xfs_syncd_init(mp);
if (error)
goto out_filestream_unmount;
- error = xfs_syncd_init(mp);
+ error = xfs_mountfs(mp);
if (error)
- goto out_unmount;
+ goto out_syncd_stop;
root = igrab(VFS_I(mp->m_rootip));
if (!root) {
error = ENOENT;
- goto out_syncd_stop;
+ goto out_unmount;
}
if (is_bad_inode(root)) {
error = EINVAL;
- goto out_syncd_stop;
+ goto out_unmount;
}
sb->s_root = d_alloc_root(root);
if (!sb->s_root) {
@@ -1403,6 +1403,8 @@ xfs_fs_fill_super(
return 0;
+ out_syncd_stop:
+ xfs_syncd_stop(mp);
out_filestream_unmount:
xfs_inode_shrinker_unregister(mp);
xfs_filestream_unmount(mp);
@@ -1420,8 +1422,6 @@ xfs_fs_fill_super(
out_iput:
iput(root);
- out_syncd_stop:
- xfs_syncd_stop(mp);
out_unmount:
xfs_inode_shrinker_unregister(mp);
@@ -1435,6 +1435,7 @@ xfs_fs_fill_super(
XFS_bflush(mp->m_ddev_targp);
xfs_unmountfs(mp);
+ xfs_syncd_stop(mp);
goto out_free_sb;
}
Index: b/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -502,7 +502,15 @@ xfs_sync_worker(
struct xfs_mount, m_sync_work);
int error;
- if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
+ /*
+ * We shouldn't write/force the log if we are in the mount/unmount
+ * process or on a read only filesystem. The workqueue still needs to be
+ * active in both cases, however, because it is used for inode reclaim
+ * during these times. hence use the MS_ACTIVE flag to avoid doing
+ * anything in these periods.
+ */
+ if (!(mp->m_super->s_flags & MS_ACTIVE) &&
+ !(mp->m_flags & XFS_MOUNT_RDONLY)) {
/* dgc: errors ignored here */
if (mp->m_super->s_frozen == SB_UNFROZEN &&
xfs_log_need_covered(mp))
@@ -531,14 +539,6 @@ xfs_syncd_queue_reclaim(
struct xfs_mount *mp)
{
- /*
- * We can have inodes enter reclaim after we've shut down the syncd
- * workqueue during unmount, so don't allow reclaim work to be queued
- * during unmount.
- */
- if (!(mp->m_super->s_flags & MS_ACTIVE))
- return;
-
rcu_read_lock();
if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
queue_delayed_work(xfs_syncd_wq, &mp->m_reclaim_work,
@@ -607,7 +607,6 @@ xfs_syncd_init(
INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
xfs_syncd_queue_sync(mp);
- xfs_syncd_queue_reclaim(mp);
return 0;
}
* [3.0-stable PATCH 14/36] xfs: use shared ilock mode for direct IO writes by default
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: 507630b29f13a3d8689895618b12015308402e22
For the direct IO write path, we only really need the ilock to be taken in
exclusive mode during IO submission if we need to do extent allocation
instead of all the time.
Change the block mapping code to take the ilock in shared mode for the
initial block mapping, and only retake it exclusively when we actually
have to perform extent allocations. We were already dropping the ilock
for the transaction allocation, so this doesn't introduce new race windows.
Based on an earlier patch from Dave Chinner.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_aops.c | 30 +++++++++++++++++++++++++----
fs/xfs/xfs_iomap.c | 45 ++++++++++++++++++--------------------------
2 files changed, 45 insertions(+), 30 deletions(-)
Index: b/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -1158,7 +1158,14 @@ __xfs_get_blocks(
if (!create && direct && offset >= i_size_read(inode))
return 0;
- if (create) {
+ /*
+ * Direct I/O is usually done on preallocated files, so try getting
+ * a block mapping without an exclusive lock first. For buffered
+ * writes we already have the exclusive iolock anyway, so avoiding
+ * a lock roundtrip here by taking the ilock exclusive from the
+ * beginning is a useful micro optimization.
+ */
+ if (create && !direct) {
lockmode = XFS_ILOCK_EXCL;
xfs_ilock(ip, lockmode);
} else {
@@ -1181,22 +1188,37 @@ __xfs_get_blocks(
(imap.br_startblock == HOLESTARTBLOCK ||
imap.br_startblock == DELAYSTARTBLOCK))) {
if (direct) {
+ /*
+ * Drop the ilock in preparation for starting the block
+ * allocation transaction. It will be retaken
+ * exclusively inside xfs_iomap_write_direct for the
+ * actual allocation.
+ */
+ xfs_iunlock(ip, lockmode);
error = xfs_iomap_write_direct(ip, offset, size,
&imap, nimaps);
+ if (error)
+ return -error;
} else {
+ /*
+ * Delalloc reservations do not require a transaction,
+ * we can go on without dropping the lock here.
+ */
error = xfs_iomap_write_delay(ip, offset, size, &imap);
+ if (error)
+ goto out_unlock;
+
+ xfs_iunlock(ip, lockmode);
}
- if (error)
- goto out_unlock;
trace_xfs_get_blocks_alloc(ip, offset, size, 0, &imap);
} else if (nimaps) {
trace_xfs_get_blocks_found(ip, offset, size, 0, &imap);
+ xfs_iunlock(ip, lockmode);
} else {
trace_xfs_get_blocks_notfound(ip, offset, size);
goto out_unlock;
}
- xfs_iunlock(ip, lockmode);
if (imap.br_startblock != HOLESTARTBLOCK &&
imap.br_startblock != DELAYSTARTBLOCK) {
Index: b/fs/xfs/xfs_iomap.c
===================================================================
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -141,11 +141,7 @@ xfs_iomap_write_direct(
int committed;
int error;
- /*
- * Make sure that the dquots are there. This doesn't hold
- * the ilock across a disk read.
- */
- error = xfs_qm_dqattach_locked(ip, 0);
+ error = xfs_qm_dqattach(ip, 0);
if (error)
return XFS_ERROR(error);
@@ -157,7 +153,7 @@ xfs_iomap_write_direct(
if ((offset + count) > ip->i_size) {
error = xfs_iomap_eof_align_last_fsb(mp, ip, extsz, &last_fsb);
if (error)
- goto error_out;
+ return XFS_ERROR(error);
} else {
if (nmaps && (imap->br_startblock == HOLESTARTBLOCK))
last_fsb = MIN(last_fsb, (xfs_fileoff_t)
@@ -189,7 +185,6 @@ xfs_iomap_write_direct(
/*
* Allocate and setup the transaction
*/
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
error = xfs_trans_reserve(tp, resblks,
XFS_WRITE_LOG_RES(mp), resrtextents,
@@ -198,15 +193,16 @@ xfs_iomap_write_direct(
/*
* Check for running out of space, note: need lock to return
*/
- if (error)
+ if (error) {
xfs_trans_cancel(tp, 0);
+ return XFS_ERROR(error);
+ }
+
xfs_ilock(ip, XFS_ILOCK_EXCL);
- if (error)
- goto error_out;
error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks, 0, quota_flag);
if (error)
- goto error1;
+ goto out_trans_cancel;
xfs_trans_ijoin(tp, ip);
@@ -225,42 +221,39 @@ xfs_iomap_write_direct(
error = xfs_bmapi(tp, ip, offset_fsb, count_fsb, bmapi_flag,
&firstfsb, 0, imap, &nimaps, &free_list);
if (error)
- goto error0;
+ goto out_bmap_cancel;
/*
* Complete the transaction
*/
error = xfs_bmap_finish(&tp, &free_list, &committed);
if (error)
- goto error0;
+ goto out_bmap_cancel;
error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
if (error)
- goto error_out;
+ goto out_unlock;
/*
* Copy any maps to caller's array and return any error.
*/
if (nimaps == 0) {
- error = ENOSPC;
- goto error_out;
+ error = XFS_ERROR(ENOSPC);
+ goto out_unlock;
}
- if (!(imap->br_startblock || XFS_IS_REALTIME_INODE(ip))) {
+ if (!(imap->br_startblock || XFS_IS_REALTIME_INODE(ip)))
error = xfs_alert_fsblock_zero(ip, imap);
- goto error_out;
- }
- return 0;
+out_unlock:
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ return error;
-error0: /* Cancel bmap, unlock inode, unreserve quota blocks, cancel trans */
+out_bmap_cancel:
xfs_bmap_cancel(&free_list);
xfs_trans_unreserve_quota_nblks(tp, ip, qblocks, 0, quota_flag);
-
-error1: /* Just cancel transaction */
+out_trans_cancel:
xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
-
-error_out:
- return XFS_ERROR(error);
+ goto out_unlock;
}
/*
* [3.0-stable PATCH 15/36] xfs: punch all delalloc blocks beyond EOF on write failure.
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: 01c84d2dc1311fb76ea217dadfd5b3a5f3cab563
I've been seeing regular ASSERT failures in xfstests when running
fsstress based tests over the past month. xfs_getbmap() has been
failing this test:
XFS: Assertion failed: ((iflags & BMV_IF_DELALLOC) != 0) ||
(map[i].br_startblock != DELAYSTARTBLOCK), file: fs/xfs/xfs_bmap.c,
line: 5650
where it is encountering a delayed allocation extent after writing
all the dirty data to disk and then walking the extent map
atomically by holding the XFS_IOLOCK_SHARED to prevent new delayed
allocation extents from being created.
Test 083 on a 512 byte block size filesystem was used to reproduce
the problem, because it only had a 5s run time and would usually fail
every 3-4 runs. This test is exercising ENOSPC behaviour by running
fsstress on a nearly full filesystem. The following trace extract
shows the final few events on the inode that tripped the assert:
xfs_ilock: flags ILOCK_EXCL caller xfs_setfilesize
xfs_setfilesize: isize 0x180000 disize 0x12d400 offset 0x17e200 count 7680
file size updated to 0x180000 by IO completion
xfs_ilock: flags ILOCK_EXCL caller xfs_iomap_write_delay
xfs_iext_insert: state idx 3 offset 3072 block 4503599627239432 count 1 flag 0 caller xfs_bmap_add_extent_hole_delay
xfs_get_blocks_alloc: size 0x180000 offset 0x180000 count 512 type startoff 0xc00 startblock -1 blockcount 0x1
xfs_ilock: flags ILOCK_EXCL caller __xfs_get_blocks
delalloc write, adding a single block at offset 0x180000
xfs_delalloc_enospc: isize 0x180000 disize 0x180000 offset 0x180200 count 512
ENOSPC trying to allocate a delalloc block at offset 0x180200
xfs_ilock: flags ILOCK_EXCL caller xfs_iomap_write_delay
xfs_get_blocks_alloc: size 0x180000 offset 0x180200 count 512 type startoff 0xc00 startblock -1 blockcount 0x2
And succeeding on retry after flushing dirty inodes.
xfs_ilock: flags ILOCK_EXCL caller __xfs_get_blocks
xfs_delalloc_enospc: isize 0x180000 disize 0x180000 offset 0x180400 count 512
ENOSPC trying to allocate a delalloc block at offset 0x180400
xfs_ilock: flags ILOCK_EXCL caller xfs_iomap_write_delay
xfs_delalloc_enospc: isize 0x180000 disize 0x180000 offset 0x180400 count 512
And failing the retry, giving a real ENOSPC error.
xfs_ilock: flags ILOCK_EXCL caller xfs_vm_write_failed
^^^^^^^^^^^^^^^^^^^
The smoking gun - the write being failed and cleaning up delalloc
blocks beyond EOF allocated by the failed write.
xfs_getattr:
xfs_ilock: flags IOLOCK_SHARED caller xfs_getbmap
xfs_ilock: flags ILOCK_SHARED caller xfs_ilock_map_shared
And that's where we died almost immediately afterwards.
xfs_bmapi_read() found delalloc extent beyond current file in memory
file size. Some debug I added to xfs_getbmap() showed the state just
before the assert failure:
ino 0x80e48: off 0xc00, fsb 0xffffffffffffffff, len 0x1, size 0x180000
start_fsb 0x106, end_fsb 0x638
ino flags 0x2 nex 0xd bmvcnt 0x555, len 0x3c58a6f23c0bf1, start 0xc00
ext 0: off 0x1fc, fsb 0x24782, len 0x254
ext 1: off 0x450, fsb 0x40851, len 0x30
ext 2: off 0x480, fsb 0xd99, len 0x1b8
ext 3: off 0x92f, fsb 0x4099a, len 0x3b
ext 4: off 0x96d, fsb 0x41844, len 0x98
ext 5: off 0xbf1, fsb 0x408ab, len 0xf
which shows that we found a single delalloc block beyond EOF (first
line of output) when we were returning the map for a length
somewhere around 10^16 bytes long (second line), and the on-disk
extents showed they didn't go past EOF (last lines).
Further debug added to xfs_vm_write_failed() showed this happened
when punching out delalloc blocks beyond the end of the file after
the failed write:
[ 132.606693] ino 0x80e48: vwf to 0x181000, sze 0x180000
[ 132.609573] start_fsb 0xc01, end_fsb 0xc08
It punched the range 0xc01 -> 0xc08, but the range we really need to
punch is 0xc00 -> 0xc07 (8 blocks from 0xc00) as this testing was
run on a 512 byte block size filesystem (8 blocks per page).
The punch start should be 0xc00. So end_fsb is correct, but start_fsb
is wrong, as we punch from start_fsb for (end_fsb - start_fsb) blocks.
Hence we are not punching the delalloc block beyond EOF in this case.
The fix is simple - it's a silly off-by-one mistake in calculating
the range. It's especially silly because the macro used to calculate
the start_fsb already takes into account the case where the inode
size is an exact multiple of the filesystem block size...
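The off-by-one is visible with plain arithmetic. Below is a minimal model of the round-up conversion XFS_B_TO_FSB performs, hard-coded to the 512-byte block size of the test filesystem (the helper name is invented for the sketch):

```c
#include <assert.h>
#include <stdint.h>

#define BLKSZ 512u

/* Round a byte count up to whole filesystem blocks, mimicking
 * XFS_B_TO_FSB for a fixed 512-byte block size. */
static uint64_t b_to_fsb(uint64_t bytes)
{
        return (bytes + BLKSZ - 1) / BLKSZ;
}
```

For the inode size 0x180000 from the trace (an exact multiple of 512), the rounded-up result 0xc00 is already the first block past EOF, so the old "+ 1" started the punch at 0xc01 and left delalloc block 0xc00 behind.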
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_aops.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: b/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -1421,7 +1421,7 @@ xfs_vm_write_failed(
* Check if there are any blocks that are outside of i_size
* that need to be trimmed back.
*/
- start_fsb = XFS_B_TO_FSB(ip->i_mount, inode->i_size) + 1;
+ start_fsb = XFS_B_TO_FSB(ip->i_mount, inode->i_size);
end_fsb = XFS_B_TO_FSB(ip->i_mount, to);
if (end_fsb <= start_fsb)
return;
* [3.0-stable PATCH 16/36] xfs: page type check in writeback only checks last buffer
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: 6ffc4db5de61d36e969a26bc94509c59246c81f8
xfs_is_delayed_page() checks to see if a page has buffers matching
the given IO type passed in. It does so by walking the buffer heads
on the page and checking if the state flags match the IO type.
However, the "acceptable" variable that is calculated is overwritten
every time a new buffer is checked. Hence if the first buffer on the
page is of the right type, this state is lost if the second buffer
is not of the correct type. This means that xfs_aops_discard_page()
may not discard delalloc regions when it is supposed to, and
xfs_convert_page() may not cluster IO as efficiently as possible.
This problem only occurs on filesystems with a block size smaller
than page size.
Also, rename xfs_is_delayed_page() to xfs_check_page_type() to
better describe what it is doing - it is not delalloc specific
anymore.
The problem was first noticed by Peter Watkins.
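The overwrite-vs-accumulate bug is a general C pitfall. The sketch below models a page's buffer heads as an array of IO types (simplified, invented helper names) and shows how `=` makes only the last buffer count while `+=` remembers every match:

```c
#include <assert.h>

enum io_type { IO_UNWRITTEN, IO_DELALLOC, IO_OVERWRITE };

/* Buggy shape: 'acceptable' is overwritten on each pass, so only the
 * state of the last buffer on the page survives the loop. */
static int page_matches_buggy(const enum io_type *bufs, int n,
                              enum io_type type)
{
        int acceptable = 0, i;

        for (i = 0; i < n; i++)
                acceptable = (bufs[i] == type);
        return acceptable;
}

/* Fixed shape: matches accumulate, so a match on any buffer makes the
 * page acceptable for the given IO type. */
static int page_matches_fixed(const enum io_type *bufs, int n,
                              enum io_type type)
{
        int acceptable = 0, i;

        for (i = 0; i < n; i++)
                acceptable += (bufs[i] == type);
        return acceptable != 0;
}
```

A page whose first buffer is delalloc but whose last is an overwrite is rejected by the buggy version and accepted by the fixed one, which is exactly the sub-page-block-size case the patch describes.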
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_aops.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
Index: b/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -636,7 +636,7 @@ xfs_map_at_offset(
* or delayed allocate extent.
*/
STATIC int
-xfs_is_delayed_page(
+xfs_check_page_type(
struct page *page,
unsigned int type)
{
@@ -650,11 +650,11 @@ xfs_is_delayed_page(
bh = head = page_buffers(page);
do {
if (buffer_unwritten(bh))
- acceptable = (type == IO_UNWRITTEN);
+ acceptable += (type == IO_UNWRITTEN);
else if (buffer_delay(bh))
- acceptable = (type == IO_DELALLOC);
+ acceptable += (type == IO_DELALLOC);
else if (buffer_dirty(bh) && buffer_mapped(bh))
- acceptable = (type == IO_OVERWRITE);
+ acceptable += (type == IO_OVERWRITE);
else
break;
} while ((bh = bh->b_this_page) != head);
@@ -697,7 +697,7 @@ xfs_convert_page(
goto fail_unlock_page;
if (page->mapping != inode->i_mapping)
goto fail_unlock_page;
- if (!xfs_is_delayed_page(page, (*ioendp)->io_type))
+ if (!xfs_check_page_type(page, (*ioendp)->io_type))
goto fail_unlock_page;
/*
@@ -847,7 +847,7 @@ xfs_aops_discard_page(
struct buffer_head *bh, *head;
loff_t offset = page_offset(page);
- if (!xfs_is_delayed_page(page, IO_DELALLOC))
+ if (!xfs_check_page_type(page, IO_DELALLOC))
goto out_invalidate;
if (XFS_FORCED_SHUTDOWN(ip->i_mount))
* [3.0-stable PATCH 17/36] xfs: punch new delalloc blocks out of failed writes inside EOF.
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: d3bc815afb549eecb3679a4b2f0df216e34df998
When a partial write inside EOF fails, it can leave delayed
allocation blocks lying around because they don't get punched back
out. This leads to assert failures like:
XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/linux-2.6/xfs_super.c, line: 847
when evicting inodes from the cache. This can be trivially triggered
by xfstests 083, which takes between 5 and 15 executions on a 512
byte block size filesystem to trip over this. Debugging shows a
failed write due to ENOSPC calling xfs_vm_write_failed such as:
[ 5012.329024] ino 0xa0026: vwf to 0x17000, sze 0x1c85ae
and no action is taken on it. This leaves behind a delayed
allocation extent that has no page covering it and no data in it:
[ 5015.867162] ino 0xa0026: blks: 0x83 delay blocks 0x1, size 0x2538c0
[ 5015.868293] ext 0: off 0x4a, fsb 0x50306, len 0x1
[ 5015.869095] ext 1: off 0x4b, fsb 0x7899, len 0x6b
[ 5015.869900] ext 2: off 0xb6, fsb 0xffffffffe0008, len 0x1
^^^^^^^^^^^^^^^
[ 5015.871027] ext 3: off 0x36e, fsb 0x7a27, len 0xd
[ 5015.872206] ext 4: off 0x4cf, fsb 0x7a1d, len 0xa
So the delayed allocation extent is one block long at offset
0x16c00. Tracing shows that a bigger write:
xfs_file_buffered_write: size 0x1c85ae offset 0x959d count 0x1ca3f ioflags
allocates the block, and then fails with ENOSPC trying to allocate
the last block on the page, leading to a failed write with stale
delalloc blocks on it.
Because we've had an ENOSPC when trying to allocate 0x16e00, it
means that we are never going to call ->write_end on the page and
so the allocated new buffer will not get marked dirty or have the
buffer_new state cleared. In other words, what the above write is
supposed to end up with is this mapping for the page:
+------+------+------+------+------+------+------+------+
UMA UMA UMA UMA UMA UMA UND FAIL
where: U = uptodate
M = mapped
N = new
A = allocated
D = delalloc
FAIL = block we ENOSPC'd on.
and the key point being the buffer_new() state for the newly
allocated delayed allocation block. Except it doesn't - we're not
marking buffers new correctly.
That buffer_new() problem goes back to the xfs_iomap removal days,
where xfs_iomap() used to return a "new" status for any map with
newly allocated blocks, so that __xfs_get_blocks() could call
set_buffer_new() on it. We still have the "new" variable and the
check for it in the set_buffer_new() logic - except we never set it
now!
Hence that newly allocated delalloc block doesn't have the new flag
set on it, so when the write fails we cannot tell which blocks we
are supposed to punch out. Why do we need the buffer_new flag? Well,
that's because we can have this case:
+------+------+------+------+------+------+------+------+
UMD UMD UMD UMD UMD UMD UND FAIL
where all the UMD buffers contain valid data from a previously
successful write() system call. We only want to punch the UND buffer
because that's the only one that we added in this write and it was
only this write that failed.
That implies that even the old buffer_new() logic was wrong -
because it would result in all those UMD buffers on the page having
set_buffer_new() called on them even though they aren't new. Hence
we should only be calling set_buffer_new() for delalloc buffers that
were allocated (i.e. were a hole before xfs_iomap_write_delay() was
called).
So, fix this set_buffer_new logic according to how we need it to
work for handling failed writes correctly. Also, restore the new
buffer logic handling for blocks allocated via
xfs_iomap_write_direct(), because it should still set the buffer_new
flag appropriately for newly allocated blocks, too.
So, now we have the buffer_new() being set appropriately in
__xfs_get_blocks(), we can detect the exact delalloc ranges that
we allocated in a failed write, and hence can now do a walk of the
buffers on a page to find them.
Except, it's not that easy. When block_write_begin() fails, it
unlocks and releases the page that we just had an error on, so we
can't use that page to handle errors anymore. We have to get access
to the page while it is still locked to walk the buffers. Hence we
have to open code block_write_begin() in xfs_vm_write_begin() to be
able to insert xfs_vm_write_failed() in the right place.
With that, we can pass the page and write range to
xfs_vm_write_failed() and walk the buffers on the page, looking for
delalloc buffers that are either new or beyond EOF and punch them
out. Handling buffers beyond EOF ensures we still handle the
existing case that xfs_vm_write_failed() handles.
Of special note is the truncate_pagecache() handling - that only
should be done for pages outside EOF - pages within EOF can still
contain valid, dirty data so we must not punch them out of the
cache.
That just leaves the xfs_vm_write_end() failure handling.
The only failure case here is that we didn't copy the entire range,
and generic_write_end() handles that by zeroing the region of the
page that wasn't copied, we don't have to punch out blocks within
the file because they are guaranteed to contain zeros. Hence we only
have to handle the existing "beyond EOF" case and don't need access
to the buffers on the page. Hence it remains largely unchanged.
Note that xfs_getbmap() can still trip over delalloc blocks beyond
EOF that are left there by speculative delayed allocation. Hence
this bug fix does not solve all known issues with bmap vs delalloc,
but it does fix all the known accidental occurrences of the
problem.
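The buffer walk the patch adds can be modelled with a tiny predicate: punch a buffer only if it is delalloc and either newly allocated by this write (buffer_new) or beyond EOF. The struct and helper below are illustrative only, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* One entry per buffer head on the page. */
struct buf {
        bool delay;     /* delalloc buffer */
        bool is_new;    /* allocated by this (failed) write */
};

/* Count the buffers a failed write should punch: delalloc buffers that
 * are newly allocated or sit at/after the EOF buffer index. */
static int count_punchable(const struct buf *b, int n, int eof_idx)
{
        int i, punch = 0;

        for (i = 0; i < n; i++)
                if (b[i].delay && (b[i].is_new || i >= eof_idx))
                        punch++;
        return punch;
}
```

For the "UMD UMD UMD UMD UMD UMD UND FAIL" page in the commit message, six old delalloc buffers inside EOF survive and only the single new one is punched; a delalloc buffer beyond EOF is punched regardless of its new flag, preserving the pre-existing behaviour.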
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_aops.c | 173 ++++++++++++++++++++++++++++++++------------
1 file changed, 126 insertions(+), 47 deletions(-)
Index: b/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -1199,11 +1199,18 @@ __xfs_get_blocks(
&imap, nimaps);
if (error)
return -error;
+ new = 1;
} else {
/*
* Delalloc reservations do not require a transaction,
- * we can go on without dropping the lock here.
+ * we can go on without dropping the lock here. If we
+ * are allocating a new delalloc block, make sure that
+ * we set the new flag so that we mark the buffer new so
+ * that we know that it is newly allocated if the write
+ * fails.
*/
+ if (nimaps && imap.br_startblock == HOLESTARTBLOCK)
+ new = 1;
error = xfs_iomap_write_delay(ip, offset, size, &imap);
if (error)
goto out_unlock;
@@ -1394,53 +1401,90 @@ xfs_vm_direct_IO(
return ret;
}
+/*
+ * Punch out the delalloc blocks we have already allocated.
+ *
+ * Don't bother with xfs_setattr given that nothing can have made it to disk yet
+ * as the page is still locked at this point.
+ */
+STATIC void
+xfs_vm_kill_delalloc_range(
+ struct inode *inode,
+ loff_t start,
+ loff_t end)
+{
+ struct xfs_inode *ip = XFS_I(inode);
+ xfs_fileoff_t start_fsb;
+ xfs_fileoff_t end_fsb;
+ int error;
+
+ start_fsb = XFS_B_TO_FSB(ip->i_mount, start);
+ end_fsb = XFS_B_TO_FSB(ip->i_mount, end);
+ if (end_fsb <= start_fsb)
+ return;
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
+ end_fsb - start_fsb);
+ if (error) {
+ /* something screwed, just bail */
+ if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
+ xfs_alert(ip->i_mount,
+ "xfs_vm_write_failed: unable to clean up ino %lld",
+ ip->i_ino);
+ }
+ }
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+}
+
STATIC void
xfs_vm_write_failed(
- struct address_space *mapping,
- loff_t to)
+ struct inode *inode,
+ struct page *page,
+ loff_t pos,
+ unsigned len)
{
- struct inode *inode = mapping->host;
+ loff_t block_offset = pos & PAGE_MASK;
+ loff_t block_start;
+ loff_t block_end;
+ loff_t from = pos & (PAGE_CACHE_SIZE - 1);
+ loff_t to = from + len;
+ struct buffer_head *bh, *head;
+
+ ASSERT(block_offset + from == pos);
+
+ head = page_buffers(page);
+ block_start = 0;
+ for (bh = head; bh != head || !block_start;
+ bh = bh->b_this_page, block_start = block_end,
+ block_offset += bh->b_size) {
+ block_end = block_start + bh->b_size;
+ /* skip buffers before the write */
+ if (block_end <= from)
+ continue;
+
+ /* if the buffer is after the write, we're done */
+ if (block_start >= to)
+ break;
- if (to > inode->i_size) {
- /*
- * punch out the delalloc blocks we have already allocated. We
- * don't call xfs_setattr() to do this as we may be in the
- * middle of a multi-iovec write and so the vfs inode->i_size
- * will not match the xfs ip->i_size and so it will zero too
- * much. Hence we jus truncate the page cache to zero what is
- * necessary and punch the delalloc blocks directly.
- */
- struct xfs_inode *ip = XFS_I(inode);
- xfs_fileoff_t start_fsb;
- xfs_fileoff_t end_fsb;
- int error;
+ if (!buffer_delay(bh))
+ continue;
- truncate_pagecache(inode, to, inode->i_size);
+ if (!buffer_new(bh) && block_offset < i_size_read(inode))
+ continue;
- /*
- * Check if there are any blocks that are outside of i_size
- * that need to be trimmed back.
- */
- start_fsb = XFS_B_TO_FSB(ip->i_mount, inode->i_size);
- end_fsb = XFS_B_TO_FSB(ip->i_mount, to);
- if (end_fsb <= start_fsb)
- return;
-
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
- end_fsb - start_fsb);
- if (error) {
- /* something screwed, just bail */
- if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
- xfs_alert(ip->i_mount,
- "xfs_vm_write_failed: unable to clean up ino %lld",
- ip->i_ino);
- }
- }
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ xfs_vm_kill_delalloc_range(inode, block_offset,
+ block_offset + bh->b_size);
}
+
}
+/*
+ * This used to call block_write_begin(), but it unlocks and releases the page
+ * on error, and we need that page to be able to punch stale delalloc blocks out
+ * on failure. hence we copy-n-waste it here and call xfs_vm_write_failed() at
+ * the appropriate point.
+*/
STATIC int
xfs_vm_write_begin(
struct file *file,
@@ -1451,15 +1495,40 @@ xfs_vm_write_begin(
struct page **pagep,
void **fsdata)
{
- int ret;
+ pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+ struct page *page;
+ int status;
- ret = block_write_begin(mapping, pos, len, flags | AOP_FLAG_NOFS,
- pagep, xfs_get_blocks);
- if (unlikely(ret))
- xfs_vm_write_failed(mapping, pos + len);
- return ret;
+ ASSERT(len <= PAGE_CACHE_SIZE);
+
+ page = grab_cache_page_write_begin(mapping, index,
+ flags | AOP_FLAG_NOFS);
+ if (!page)
+ return -ENOMEM;
+
+ status = __block_write_begin(page, pos, len, xfs_get_blocks);
+ if (unlikely(status)) {
+ struct inode *inode = mapping->host;
+
+ xfs_vm_write_failed(inode, page, pos, len);
+ unlock_page(page);
+
+ if (pos + len > i_size_read(inode))
+ truncate_pagecache(inode, pos + len, i_size_read(inode));
+
+ page_cache_release(page);
+ page = NULL;
+ }
+
+ *pagep = page;
+ return status;
}
+/*
+ * On failure, we only need to kill delalloc blocks beyond EOF because they
+ * will never be written. For blocks within EOF, generic_write_end() zeros them
+ * so they are safe to leave alone and be written with all the other valid data.
+ */
STATIC int
xfs_vm_write_end(
struct file *file,
@@ -1472,9 +1541,19 @@ xfs_vm_write_end(
{
int ret;
+ ASSERT(len <= PAGE_CACHE_SIZE);
+
ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
- if (unlikely(ret < len))
- xfs_vm_write_failed(mapping, pos + len);
+ if (unlikely(ret < len)) {
+ struct inode *inode = mapping->host;
+ size_t isize = i_size_read(inode);
+ loff_t to = pos + len;
+
+ if (to > isize) {
+ truncate_pagecache(inode, to, isize);
+ xfs_vm_kill_delalloc_range(inode, isize, to);
+ }
+ }
return ret;
}
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [3.0-stable PATCH 18/36] xfs: dont assert on delalloc regions beyond EOF
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (16 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 17/36] xfs: punch new delalloc blocks out of failed writes inside EOF Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 19/36] xfs: limit specualtive delalloc to maxioffset Mark Tinguely
` (19 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 078 --]
[-- Type: text/plain, Size: 3401 bytes --]
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: 58e20770646932fe9b758c94e8c278ea9ec93878
When we are doing speculative delayed allocation beyond EOF,
conversion of the region allocated beyond EOF is dependent on the
largest free space extent available. If the largest free extent is
smaller than the delalloc range, then after allocation we leave
a delalloc extent that starts beyond EOF. This extent cannot *ever*
be converted by flushing data, and so will remain there until either
the EOF moves into the extent or it is truncated away.
Hence if xfs_getbmap() runs on such an inode and is asked to return
extents beyond EOF, it will assert fail on this extent even though
there is nothing xfs_getbmap() can do to convert it to a real
extent. Hence we should simply report these delalloc extents rather
than assert that there should be none.
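The relaxed check can be stated as a small predicate; the function and parameter names below are placeholders for illustration (offsets in filesystem blocks), not the real xfs_getbmap() code:

```c
#include <assert.h>

/*
 * Decide whether a delalloc extent found during a bmap walk indicates a
 * bug. Extents starting beyond EOF can legitimately exist because of
 * speculative EOF preallocation, so only extents starting at or inside
 * EOF are unexpected -- and then only if the caller did not ask to see
 * delalloc extents (the BMV_IF_DELALLOC case).
 */
static int delalloc_extent_is_bug(int is_delalloc, long long startoff_fsb,
				  long long eof_fsb, int caller_wants_delalloc)
{
	if (!is_delalloc)
		return 0;
	if (startoff_fsb > eof_fsb)
		return 0;	/* beyond EOF: report it, don't assert */
	return !caller_wants_delalloc;
}
```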
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
A modified XFS_ISIZE() inline function came from:
From: Christoph Hellwig <hch@infradead.org>
upstream commit: ce7ae151ddada3dbf67301464343c154903166b3
xfs: remove the i_size field in struct xfs_inode
---
fs/xfs/xfs_bmap.c | 19 +++++++++++++++----
fs/xfs/xfs_inode.h | 8 ++++++--
2 files changed, 21 insertions(+), 6 deletions(-)
Index: b/fs/xfs/xfs_bmap.c
===================================================================
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -53,7 +53,6 @@
#include "xfs_vnodeops.h"
#include "xfs_trace.h"
-
#ifdef DEBUG
STATIC void
xfs_bmap_check_leaf_extents(xfs_btree_cur_t *cur, xfs_inode_t *ip, int whichfork);
@@ -5543,9 +5542,21 @@ xfs_getbmap(
XFS_FSB_TO_BB(mp, map[i].br_blockcount);
out[cur_ext].bmv_unused1 = 0;
out[cur_ext].bmv_unused2 = 0;
- ASSERT(((iflags & BMV_IF_DELALLOC) != 0) ||
- (map[i].br_startblock != DELAYSTARTBLOCK));
- if (map[i].br_startblock == HOLESTARTBLOCK &&
+
+ /*
+ * delayed allocation extents that start beyond EOF can
+ * occur due to speculative EOF allocation when the
+ * delalloc extent is larger than the largest freespace
+ * extent at conversion time. These extents cannot be
+ * converted by data writeback, so can exist here even
+ * if we are not supposed to be finding delalloc
+ * extents.
+ */
+ if (map[i].br_startblock == DELAYSTARTBLOCK &&
+ map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip)))
+ ASSERT((iflags & BMV_IF_DELALLOC) != 0);
+
+ if (map[i].br_startblock == HOLESTARTBLOCK &&
whichfork == XFS_ATTR_FORK) {
/* came to the end of attribute fork */
out[cur_ext].bmv_oflags |= BMV_OF_LAST;
Index: b/fs/xfs/xfs_inode.h
===================================================================
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -264,8 +264,12 @@ typedef struct xfs_inode {
struct inode i_vnode; /* embedded VFS inode */
} xfs_inode_t;
-#define XFS_ISIZE(ip) (((ip)->i_d.di_mode & S_IFMT) == S_IFREG) ? \
- (ip)->i_size : (ip)->i_d.di_size;
+static inline xfs_fsize_t XFS_ISIZE(struct xfs_inode *ip)
+{
+ if (S_ISREG(ip->i_d.di_mode))
+ return ip->i_size;
+ return ip->i_d.di_size;
+}
/* Convert from vfs inode to xfs inode */
static inline struct xfs_inode *XFS_I(struct inode *inode)
* [3.0-stable PATCH 19/36] xfs: limit specualtive delalloc to maxioffset
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (17 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 18/36] xfs: dont assert on delalloc regions beyond EOF Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 20/36] xfs: Use preallocation for inodes with extsz hints Mark Tinguely
` (18 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 079 --]
[-- Type: text/plain, Size: 1413 bytes --]
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: 3ed9116e8a3e9c0870b2076340b3da9b8f900f3b
Speculative delayed allocation beyond EOF near the maximum supported
file offset can result in creating delalloc extents beyond
mp->m_maxioffset (8EB). These can never be trimmed during
xfs_free_eof_blocks() because they are beyond mp->m_maxioffset, and
that results in assert failures in xfs_fs_destroy_inode() due to
delalloc blocks still being present. xfstests 071 exposes this
problem.
Limit speculative delalloc to mp->m_maxioffset to avoid this
problem.
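The clamp itself is simple; the sketch below mirrors it with illustrative names, and simplifies XFS_B_TO_FSB to a plain shift (the real macro rounds up, so treat this as an approximation):

```c
#include <assert.h>

/* Byte offset to filesystem blocks, rounding down -- a simplification
 * of XFS_B_TO_FSB for illustration only. */
static long long b_to_fsb(long long bytes, int blocklog)
{
	return bytes >> blocklog;
}

/* Clamp the end of a speculative preallocation so it never extends
 * beyond the largest supported file offset (mp->m_maxioffset). */
static long long clamp_prealloc_end(long long last_fsb,
				    long long maxioffset, int blocklog)
{
	long long max_fsb = b_to_fsb(maxioffset, blocklog);

	return last_fsb > max_fsb ? max_fsb : last_fsb;
}
```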
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/xfs_iomap.c | 9 +++++++++
1 file changed, 9 insertions(+)
Index: b/fs/xfs/xfs_iomap.c
===================================================================
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -417,6 +417,15 @@ retry:
return error;
}
+ /*
+ * Make sure preallocation does not create extents beyond the range we
+ * actually support in this filesystem.
+ */
+ if (last_fsb > XFS_B_TO_FSB(mp, mp->m_maxioffset))
+ last_fsb = XFS_B_TO_FSB(mp, mp->m_maxioffset);
+
+ ASSERT(last_fsb > offset_fsb);
+
nimaps = XFS_WRITE_IMAPS;
firstblock = NULLFSBLOCK;
error = xfs_bmapi(NULL, ip, offset_fsb,
* [3.0-stable PATCH 20/36] xfs: Use preallocation for inodes with extsz hints
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (18 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 19/36] xfs: limit specualtive delalloc to maxioffset Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 21/36] xfs: Dont allocate new buffers on every call to _xfs_buf_find Mark Tinguely
` (17 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 080 --]
[-- Type: text/plain, Size: 2383 bytes --]
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: aff3a9edb7080f69f07fe76a8bd089b3dfa4cb5d
xfstest 229 exposes a problem with buffered IO, delayed allocation
and extent size hints. That is when we do delayed allocation during
buffered IO, we reserve space for the extent size hint alignment and
allocate the physical space to align the extent, but we do not zero
the regions of the extent that aren't written by the write(2)
syscall. The result is that we expose stale data in unwritten
regions of the extent size hints.
There are two ways to fix this. The first is to detect that we are
doing unaligned writes, check if there is already a mapping or data
over the extent size hint range, and if not zero the page cache
first before then doing the real write. This can be very expensive
for large extent size hints, especially if the subsequent writes
fill the entire extent size before the data is written to disk.
The second, and simpler way, is simply to turn off delayed
allocation when the extent size hint is set and use preallocation
instead. This results in unwritten extents being laid down on disk
and so only the written portions will be converted. This matches the
behaviour for direct IO, and will also work for the real time
device. The disadvantage of this approach is that for small extent
size hints we can get file fragmentation, but in general extent size
hints are fairly large (e.g. stripe width sized) so this isn't a big
deal.
Implement the second approach as it is simple and effective.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_aops.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: b/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -1187,7 +1187,7 @@ __xfs_get_blocks(
(!nimaps ||
(imap.br_startblock == HOLESTARTBLOCK ||
imap.br_startblock == DELAYSTARTBLOCK))) {
- if (direct) {
+ if (direct || xfs_get_extsz_hint(ip)) {
/*
* Drop the ilock in preparation for starting the block
* allocation transaction. It will be retaken
* [3.0-stable PATCH 21/36] xfs: Dont allocate new buffers on every call to _xfs_buf_find
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (19 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 20/36] xfs: Use preallocation for inodes with extsz hints Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 22/36] xfs: clean up buffer allocation Mark Tinguely
` (16 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 012 --]
[-- Type: text/plain, Size: 5455 bytes --]
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: 3815832a2aa4df9815d15dac05227e0c8551833f
Stats show that for an 8-way unlink @ ~80,000 unlinks/s we are doing
~1 million cache hit lookups to ~3000 buffer creates. That's almost
3 orders of magnitude more cache hits than misses, so optimising for
cache hits is quite important. In the cache hit case, we do not need
to allocate a new buffer in case of a cache miss, so we are
effectively hitting the allocator for no good reason for the vast
majority of calls to _xfs_buf_find. 8-way create workloads are
showing similar cache hit/miss ratios.
The result is profiles that look like this:
samples pcnt function DSO
_______ _____ _______________________________ _________________
1036.00 10.0% _xfs_buf_find [kernel.kallsyms]
582.00 5.6% kmem_cache_alloc [kernel.kallsyms]
519.00 5.0% __memcpy [kernel.kallsyms]
468.00 4.5% __ticket_spin_lock [kernel.kallsyms]
388.00 3.7% kmem_cache_free [kernel.kallsyms]
331.00 3.2% xfs_log_commit_cil [kernel.kallsyms]
Further, there is a fair bit of work involved in initialising a new
buffer once a cache miss has occurred and we currently do that under
the rbtree spinlock. That increases spinlock hold time on what are
heavily used trees.
To fix this, remove the initialisation of the buffer from
_xfs_buf_find() and only allocate the new buffer once we've had a
cache miss. Initialise the buffer immediately after allocating it in
xfs_buf_get, too, so that it is ready for insert if we get another
cache miss after allocation. This minimises lock hold time and
avoids unnecessary allocator churn. The resulting profiles look
like:
samples pcnt function DSO
_______ _____ ___________________________ _________________
8111.00 9.1% _xfs_buf_find [kernel.kallsyms]
4380.00 4.9% __memcpy [kernel.kallsyms]
4341.00 4.8% __ticket_spin_lock [kernel.kallsyms]
3401.00 3.8% kmem_cache_alloc [kernel.kallsyms]
2856.00 3.2% xfs_log_commit_cil [kernel.kallsyms]
2625.00 2.9% __kmalloc [kernel.kallsyms]
2380.00 2.7% kfree [kernel.kallsyms]
2016.00 2.3% kmem_cache_free [kernel.kallsyms]
Showing a significant reduction in time spent doing allocation and
freeing from slabs (kmem_cache_alloc and kmem_cache_free).
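The hit-optimised lookup pattern can be sketched outside the kernel with a toy cache; everything here (the cache array, cache_find, cache_get) is an illustrative stand-in for the rbtree and xfs_buf routines, and the sketch ignores locking entirely:

```c
#include <assert.h>
#include <stdlib.h>

struct obj {
	int key;
};

#define CACHE_SLOTS 8
static struct obj *obj_cache[CACHE_SLOTS];

/*
 * Look up key; on a miss, insert new_obj if the caller supplied one.
 * Stands in for _xfs_buf_find(), which does the lookup and the
 * conditional insert under one tree walk.
 */
static struct obj *cache_find(int key, struct obj *new_obj)
{
	int slot = key % CACHE_SLOTS;

	if (obj_cache[slot] && obj_cache[slot]->key == key)
		return obj_cache[slot];		/* cache hit */
	if (new_obj)
		obj_cache[slot] = new_obj;	/* insert pre-built object */
	return new_obj;				/* NULL on a pure lookup miss */
}

static struct obj *cache_get(int key)
{
	struct obj *obj, *new_obj;

	obj = cache_find(key, NULL);		/* fast path: no allocation */
	if (obj)
		return obj;

	new_obj = malloc(sizeof(*new_obj));	/* slow path: cache miss */
	if (!new_obj)
		return NULL;
	new_obj->key = key;			/* fully initialise before insert */

	obj = cache_find(key, new_obj);		/* retry; another insert may win */
	if (obj != new_obj)
		free(new_obj);			/* lost the race: discard ours */
	return obj;
}
```

The point of the shape is that the allocator is only touched after a confirmed miss, and the object is complete before it ever becomes visible to other lookups.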
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
---
fs/xfs/linux-2.6/xfs_buf.c | 48 ++++++++++++++++++++++++++------------------
1 file changed, 28 insertions(+), 20 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 5e68099..6a2ab95 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -481,8 +481,6 @@ _xfs_buf_find(
/* No match found */
if (new_bp) {
- _xfs_buf_initialize(new_bp, btp, range_base,
- range_length, flags);
rb_link_node(&new_bp->b_rbnode, parent, rbp);
rb_insert_color(&new_bp->b_rbnode, &pag->pag_buf_tree);
/* the buffer keeps the perag reference until it is freed */
@@ -527,35 +525,53 @@ found:
}
/*
- * Assembles a buffer covering the specified range.
- * Storage in memory for all portions of the buffer will be allocated,
- * although backing storage may not be.
+ * Assembles a buffer covering the specified range. The code is optimised for
+ * cache hits, as metadata intensive workloads will see 3 orders of magnitude
+ * more hits than misses.
*/
-xfs_buf_t *
+struct xfs_buf *
xfs_buf_get(
xfs_buftarg_t *target,/* target for buffer */
xfs_off_t ioff, /* starting offset of range */
size_t isize, /* length of range */
xfs_buf_flags_t flags)
{
- xfs_buf_t *bp, *new_bp;
+ struct xfs_buf *bp;
+ struct xfs_buf *new_bp;
int error = 0;
+ bp = _xfs_buf_find(target, ioff, isize, flags, NULL);
+ if (likely(bp))
+ goto found;
+
new_bp = xfs_buf_allocate(flags);
if (unlikely(!new_bp))
return NULL;
+ _xfs_buf_initialize(new_bp, target,
+ ioff << BBSHIFT, isize << BBSHIFT, flags);
+
bp = _xfs_buf_find(target, ioff, isize, flags, new_bp);
+ if (!bp) {
+ xfs_buf_deallocate(new_bp);
+ return NULL;
+ }
+
if (bp == new_bp) {
error = xfs_buf_allocate_memory(bp, flags);
if (error)
goto no_buffer;
- } else {
+ } else
xfs_buf_deallocate(new_bp);
- if (unlikely(bp == NULL))
- return NULL;
- }
+ /*
+ * Now we have a workable buffer, fill in the block number so
+ * that we can do IO on it.
+ */
+ bp->b_bn = ioff;
+ bp->b_count_desired = bp->b_buffer_length;
+
+found:
if (!(bp->b_flags & XBF_MAPPED)) {
error = _xfs_buf_map_pages(bp, flags);
if (unlikely(error)) {
@@ -566,18 +582,10 @@ xfs_buf_get(
}
XFS_STATS_INC(xb_get);
-
- /*
- * Always fill in the block number now, the mapped cases can do
- * their own overlay of this later.
- */
- bp->b_bn = ioff;
- bp->b_count_desired = bp->b_buffer_length;
-
trace_xfs_buf_get(bp, flags, _RET_IP_);
return bp;
- no_buffer:
+no_buffer:
if (flags & (XBF_LOCK | XBF_TRYLOCK))
xfs_buf_unlock(bp);
xfs_buf_rele(bp);
* [3.0-stable PATCH 22/36] xfs: clean up buffer allocation
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (20 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 21/36] xfs: Dont allocate new buffers on every call to _xfs_buf_find Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 23/36] xfs: fix buffer lookup race on allocation failure Mark Tinguely
` (15 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 015 --]
[-- Type: text/plain, Size: 4959 bytes --]
From: Christoph Hellwig <hch@infradead.org>
Upstream commit: 4347b9d7ad4223474d315c3ab6bc1ce7cce7fa2d
Change _xfs_buf_initialize to allocate the buffer directly and rename it to
xfs_buf_alloc now that is the only buffer allocation routine. Also remove
the xfs_buf_deallocate wrapper around the kmem_zone_free calls for buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
---
fs/xfs/linux-2.6/xfs_buf.c | 50 ++++++++++++++++-----------------------------
fs/xfs/linux-2.6/xfs_buf.h | 3 +-
fs/xfs/xfs_log.c | 2 -
3 files changed, 21 insertions(+), 34 deletions(-)
Index: b/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -66,10 +66,6 @@ struct workqueue_struct *xfsconvertd_wor
#define xb_to_km(flags) \
(((flags) & XBF_DONT_BLOCK) ? KM_NOFS : KM_SLEEP)
-#define xfs_buf_allocate(flags) \
- kmem_zone_alloc(xfs_buf_zone, xb_to_km(flags))
-#define xfs_buf_deallocate(bp) \
- kmem_zone_free(xfs_buf_zone, (bp));
static inline int
xfs_buf_is_vmapped(
@@ -167,14 +163,19 @@ xfs_buf_stale(
ASSERT(atomic_read(&bp->b_hold) >= 1);
}
-STATIC void
-_xfs_buf_initialize(
- xfs_buf_t *bp,
- xfs_buftarg_t *target,
+struct xfs_buf *
+xfs_buf_alloc(
+ struct xfs_buftarg *target,
xfs_off_t range_base,
size_t range_length,
xfs_buf_flags_t flags)
{
+ struct xfs_buf *bp;
+
+ bp = kmem_zone_alloc(xfs_buf_zone, xb_to_km(flags));
+ if (unlikely(!bp))
+ return NULL;
+
/*
* We don't want certain flags to appear in b_flags.
*/
@@ -203,8 +204,9 @@ _xfs_buf_initialize(
init_waitqueue_head(&bp->b_waiters);
XFS_STATS_INC(xb_create);
-
trace_xfs_buf_init(bp, _RET_IP_);
+
+ return bp;
}
/*
@@ -277,7 +279,7 @@ xfs_buf_free(
} else if (bp->b_flags & _XBF_KMEM)
kmem_free(bp->b_addr);
_xfs_buf_free_pages(bp);
- xfs_buf_deallocate(bp);
+ kmem_zone_free(xfs_buf_zone, bp);
}
/*
@@ -544,16 +546,14 @@ xfs_buf_get(
if (likely(bp))
goto found;
- new_bp = xfs_buf_allocate(flags);
+ new_bp = xfs_buf_alloc(target, ioff << BBSHIFT, isize << BBSHIFT,
+ flags);
if (unlikely(!new_bp))
return NULL;
- _xfs_buf_initialize(new_bp, target,
- ioff << BBSHIFT, isize << BBSHIFT, flags);
-
bp = _xfs_buf_find(target, ioff, isize, flags, new_bp);
if (!bp) {
- xfs_buf_deallocate(new_bp);
+ kmem_zone_free(xfs_buf_zone, new_bp);
return NULL;
}
@@ -562,7 +562,7 @@ xfs_buf_get(
if (error)
goto no_buffer;
} else
- xfs_buf_deallocate(new_bp);
+ kmem_zone_free(xfs_buf_zone, new_bp);
/*
* Now we have a workable buffer, fill in the block number so
@@ -703,19 +703,6 @@ xfs_buf_read_uncached(
return bp;
}
-xfs_buf_t *
-xfs_buf_get_empty(
- size_t len,
- xfs_buftarg_t *target)
-{
- xfs_buf_t *bp;
-
- bp = xfs_buf_allocate(0);
- if (bp)
- _xfs_buf_initialize(bp, target, 0, len, 0);
- return bp;
-}
-
/*
* Return a buffer allocated as an empty buffer and associated to external
* memory via xfs_buf_associate_memory() back to it's empty state.
@@ -801,10 +788,9 @@ xfs_buf_get_uncached(
int error, i;
xfs_buf_t *bp;
- bp = xfs_buf_allocate(0);
+ bp = xfs_buf_alloc(target, 0, len, 0);
if (unlikely(bp == NULL))
goto fail;
- _xfs_buf_initialize(bp, target, 0, len, 0);
error = _xfs_buf_get_pages(bp, page_count, 0);
if (error)
@@ -834,7 +820,7 @@ xfs_buf_get_uncached(
__free_page(bp->b_pages[i]);
_xfs_buf_free_pages(bp);
fail_free_buf:
- xfs_buf_deallocate(bp);
+ kmem_zone_free(xfs_buf_zone, bp);
fail:
return NULL;
}
Index: b/fs/xfs/linux-2.6/xfs_buf.h
===================================================================
--- a/fs/xfs/linux-2.6/xfs_buf.h
+++ b/fs/xfs/linux-2.6/xfs_buf.h
@@ -177,7 +177,8 @@ extern xfs_buf_t *xfs_buf_get(xfs_buftar
extern xfs_buf_t *xfs_buf_read(xfs_buftarg_t *, xfs_off_t, size_t,
xfs_buf_flags_t);
-extern xfs_buf_t *xfs_buf_get_empty(size_t, xfs_buftarg_t *);
+struct xfs_buf *xfs_buf_alloc(struct xfs_buftarg *, xfs_off_t, size_t,
+ xfs_buf_flags_t);
extern void xfs_buf_set_empty(struct xfs_buf *bp, size_t len);
extern xfs_buf_t *xfs_buf_get_uncached(struct xfs_buftarg *, size_t, int);
extern int xfs_buf_associate_memory(xfs_buf_t *, void *, size_t);
Index: b/fs/xfs/xfs_log.c
===================================================================
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1054,7 +1054,7 @@ xlog_alloc_log(xfs_mount_t *mp,
xlog_get_iclog_buffer_size(mp, log);
error = ENOMEM;
- bp = xfs_buf_get_empty(log->l_iclog_size, mp->m_logdev_targp);
+ bp = xfs_buf_alloc(mp->m_logdev_targp, 0, log->l_iclog_size, 0);
if (!bp)
goto out_free_log;
XFS_BUF_SET_IODONE_FUNC(bp, xlog_iodone);
* [3.0-stable PATCH 23/36] xfs: fix buffer lookup race on allocation failure
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (21 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 22/36] xfs: clean up buffer allocation Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 24/36] xfs: use iolock on XFS_IOC_ALLOCSP calls Mark Tinguely
` (14 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 081 --]
[-- Type: text/plain, Size: 2410 bytes --]
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: fe2429b0966a7ec42b5fe3bf96f0f10de0a3b536
When memory allocation fails to add the page array or the pages to
a buffer during xfs_buf_get(), the buffer is left in the cache in a
partially initialised state. There is enough state left for the next
lookup on that buffer to find the buffer, and for the buffer to then
be used without finishing the initialisation. As a result, when an
attempt to do IO on the buffer occurs, it fails with EIO because
there are no pages attached to the buffer.
We cannot remove the buffer from the cache immediately and free it,
because there may already be a racing lookup that is blocked on the
buffer lock. Hence the moment we unlock the buffer to then free it,
the other user is woken and we have a use-after-free situation.
To avoid this race condition altogether, allocate the pages for the
buffer before we insert it into the cache. This then means that we
don't have an allocation failure case to deal with after the buffer is
already present in the cache, and hence avoid the problem
altogether. In most cases we won't have racing inserts for the same
buffer, and so won't incur the extra memory pressure that allocating
before insertion may entail.
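The ordering invariant can be expressed as a tiny state check; the step names and the visible flag below are purely illustrative, with buf_ok/pages_ok injecting allocation failures:

```c
#include <assert.h>

enum step { ALLOC_BUF, ALLOC_PAGES, INSERT_OK };

/*
 * Sketch of the fixed ordering: every allocation happens before the
 * buffer is inserted into (made visible in) the cache, so a failure
 * can never leave a half-initialised buffer for a racing lookup to
 * find. *visible models whether the cache ever saw the buffer;
 * the return value is the step that failed, or INSERT_OK on success.
 */
static enum step build_buffer(int buf_ok, int pages_ok, int *visible)
{
	*visible = 0;
	if (!buf_ok)
		return ALLOC_BUF;	/* buffer allocation failed */
	if (!pages_ok)
		return ALLOC_PAGES;	/* page allocation failed: cache untouched */
	*visible = 1;			/* only now is the buffer inserted */
	return INSERT_OK;
}
```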
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_buf.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
Index: b/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -551,18 +551,20 @@ xfs_buf_get(
if (unlikely(!new_bp))
return NULL;
+ error = xfs_buf_allocate_memory(new_bp, flags);
+ if (error) {
+ kmem_zone_free(xfs_buf_zone, new_bp);
+ return NULL;
+ }
+
bp = _xfs_buf_find(target, ioff, isize, flags, new_bp);
if (!bp) {
- kmem_zone_free(xfs_buf_zone, new_bp);
+ xfs_buf_free(new_bp);
return NULL;
}
- if (bp == new_bp) {
- error = xfs_buf_allocate_memory(bp, flags);
- if (error)
- goto no_buffer;
- } else
- kmem_zone_free(xfs_buf_zone, new_bp);
+ if (bp != new_bp)
+ xfs_buf_free(new_bp);
/*
* Now we have a workable buffer, fill in the block number so
* [3.0-stable PATCH 24/36] xfs: use iolock on XFS_IOC_ALLOCSP calls
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (22 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 23/36] xfs: fix buffer lookup race on allocation failure Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 25/36] xfs: Properly exclude IO type flags from buffer flags Mark Tinguely
` (13 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 085 --]
[-- Type: text/plain, Size: 2838 bytes --]
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: bc4010ecb8f4d4316e1a63a879a2715e49d113ad
fsstress has a particular effective way of stopping debug XFS
kernels. We keep seeing assert failures due to finding delayed
allocation extents where there should be none. This shows up when
extracting extent maps and we are holding all the locks we should be
to prevent races, so this really makes no sense to see these errors.
After checking that fsstress does not use mmap, it occurred to me
that fsstress uses something that no sane application uses - the
XFS_IOC_ALLOCSP ioctl interfaces for preallocation. These interfaces
do allocation of blocks beyond EOF without using preallocation, and
then call setattr to extend and zero the allocated blocks.
The problem here is that this is a buffered write, and hence the
allocation is a delayed allocation. Unlike the buffered IO path, the
allocation and zeroing are not serialised using the IOLOCK. Hence
the ALLOCSP operation can race with operations holding the iolock to
prevent buffered IO operations from occurring.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/xfs_vnodeops.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
Index: b/fs/xfs/xfs_vnodeops.c
===================================================================
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2781,17 +2781,32 @@ xfs_change_file_space(
case XFS_IOC_ALLOCSP64:
case XFS_IOC_FREESP:
case XFS_IOC_FREESP64:
+ /*
+ * These operations actually do IO when extending the file, but
+ * the allocation is done seperately to the zeroing that is
+ * done. This set of operations need to be serialised against
+ * other IO operations, such as truncate and buffered IO. We
+ * need to take the IOLOCK here to serialise the allocation and
+ * zeroing IO to prevent other IOLOCK holders (e.g. getbmap,
+ * truncate, direct IO) from racing against the transient
+ * allocated but not written state we can have here.
+ */
+ xfs_ilock(ip, XFS_IOLOCK_EXCL);
if (startoffset > fsize) {
error = xfs_alloc_file_space(ip, fsize,
- startoffset - fsize, 0, attr_flags);
- if (error)
+ startoffset - fsize, 0,
+ attr_flags | XFS_ATTR_NOLOCK);
+ if (error) {
+ xfs_iunlock(ip, XFS_IOLOCK_EXCL);
break;
+ }
}
iattr.ia_valid = ATTR_SIZE;
iattr.ia_size = startoffset;
- error = xfs_setattr(ip, &iattr, attr_flags);
+ error = xfs_setattr(ip, &iattr, attr_flags | XFS_ATTR_NOLOCK);
+ xfs_iunlock(ip, XFS_IOLOCK_EXCL);
if (error)
return error;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 46+ messages in thread

* [3.0-stable PATCH 25/36] xfs: Properly exclude IO type flags from buffer flags
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (23 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 24/36] xfs: use iolock on XFS_IOC_ALLOCSP calls Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 26/36] xfs: flush outstanding buffers on log mount failure Mark Tinguely
` (12 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 086 --]
[-- Type: text/plain, Size: 1411 bytes --]
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: 12bcb3f7d4371f74bd25372e98e0d2da978e82b2
Recent event tracing during a debugging session showed that flags
that define the IO type for a buffer are leaking into the flags on
the buffer incorrectly. Fix the flag exclusion mask in
xfs_buf_alloc() to avoid problems that may be caused by such
leakage.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_buf.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
Index: b/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -177,9 +177,11 @@ xfs_buf_alloc(
return NULL;
/*
- * We don't want certain flags to appear in b_flags.
+ * We don't want certain flags to appear in b_flags unless they are
+ * specifically set by later operations on the buffer.
*/
- flags &= ~(XBF_LOCK|XBF_MAPPED|XBF_DONT_BLOCK|XBF_READ_AHEAD);
+ flags &= ~(XBF_LOCK | XBF_MAPPED | XBF_DONT_BLOCK |
+ XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD);
memset(bp, 0, sizeof(xfs_buf_t));
atomic_set(&bp->b_hold, 1);
* [3.0-stable PATCH 26/36] xfs: flush outstanding buffers on log mount failure
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (24 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 25/36] xfs: Properly exclude IO type flags from buffer flags Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 27/36] xfs: protect xfs_sync_worker with s_umount semaphore Mark Tinguely
` (11 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 087 --]
[-- Type: text/plain, Size: 1780 bytes --]
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: d4f3512b0891658b6b4d5fc99567242b3fc2d6b7
When we fail to mount the log in xfs_mountfs(), we tear down all the
infrastructure we have already allocated. However, the process of
mounting the log may have progressed to the point of reading,
caching and modifying buffers in memory. Hence before we can free
all the infrastructure, we have to flush and remove all the buffers
from memory.
The problem was first reported by Eric Sandeen; later a different
incarnation was reported by Ben Myers.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/xfs_mount.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 2f011fd..beb9968 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1297,7 +1297,7 @@ xfs_mountfs(
XFS_FSB_TO_BB(mp, sbp->sb_logblocks));
if (error) {
xfs_warn(mp, "log mount failed");
- goto out_free_perag;
+ goto out_fail_wait;
}
/*
@@ -1324,7 +1324,7 @@ xfs_mountfs(
!mp->m_sb.sb_inprogress) {
error = xfs_initialize_perag_data(mp, sbp->sb_agcount);
if (error)
- goto out_free_perag;
+ goto out_fail_wait;
}
/*
@@ -1448,6 +1448,10 @@ xfs_mountfs(
IRELE(rip);
out_log_dealloc:
xfs_log_unmount(mp);
+ out_fail_wait:
+ if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
+ xfs_wait_buftarg(mp->m_logdev_targp);
+ xfs_wait_buftarg(mp->m_ddev_targp);
out_free_perag:
xfs_free_perag(mp);
out_remove_uuid:
* [3.0-stable PATCH 27/36] xfs: protect xfs_sync_worker with s_umount semaphore
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (25 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 26/36] xfs: flush outstanding buffers on log mount failure Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 28/36] xfs: fix memory reclaim deadlock on agi buffer Mark Tinguely
` (10 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 088 --]
[-- Type: text/plain, Size: 3646 bytes --]
From: Ben Myers <bpm@sgi.com>
Upstream commit: 1307bbd2af67283131728637e9489002adb26f10
xfs_sync_worker checks the MS_ACTIVE flag in s_flags to avoid doing
work during mount and unmount. This flag can be cleared by unmount
after the xfs_sync_worker checks it but before the work is completed.
This has caused crashes in the completion handler for the dummy
transaction committed by xfs_sync_worker:
PID: 27544 TASK: ffff88013544e040 CPU: 3 COMMAND: "kworker/3:0"
#0 [ffff88016fdff930] machine_kexec at ffffffff810244e9
#1 [ffff88016fdff9a0] crash_kexec at ffffffff8108d053
#2 [ffff88016fdffa70] oops_end at ffffffff813ad1b8
#3 [ffff88016fdffaa0] no_context at ffffffff8102bd48
#4 [ffff88016fdffaf0] __bad_area_nosemaphore at ffffffff8102c04d
#5 [ffff88016fdffb40] bad_area_nosemaphore at ffffffff8102c12e
#6 [ffff88016fdffb50] do_page_fault at ffffffff813afaee
#7 [ffff88016fdffc60] page_fault at ffffffff813ac635
[exception RIP: xlog_get_lowest_lsn+0x30]
RIP: ffffffffa04a9910 RSP: ffff88016fdffd10 RFLAGS: 00010246
RAX: ffffc90014e48000 RBX: ffff88014d879980 RCX: ffff88014d879980
RDX: ffff8802214ee4c0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88016fdffd10 R8: ffff88014d879a80 R9: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8802214ee400
R13: ffff88014d879980 R14: 0000000000000000 R15: ffff88022fd96605
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff88016fdffd18] xlog_state_do_callback at ffffffffa04aa186 [xfs]
#9 [ffff88016fdffd98] xlog_state_done_syncing at ffffffffa04aa568 [xfs]
Protect xfs_sync_worker by using the s_umount semaphore at the read
level to provide exclusion with unmount while work is progressing.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_sync.c | 29 ++++++++++++++++-------------
1 file changed, 16 insertions(+), 13 deletions(-)
Index: b/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -506,21 +506,24 @@ xfs_sync_worker(
* We shouldn't write/force the log if we are in the mount/unmount
* process or on a read only filesystem. The workqueue still needs to be
* active in both cases, however, because it is used for inode reclaim
- * during these times. hence use the MS_ACTIVE flag to avoid doing
- * anything in these periods.
+ * during these times. Use the s_umount semaphore to provide exclusion
+ * with unmount.
*/
- if (!(mp->m_super->s_flags & MS_ACTIVE) &&
- !(mp->m_flags & XFS_MOUNT_RDONLY)) {
- /* dgc: errors ignored here */
- if (mp->m_super->s_frozen == SB_UNFROZEN &&
- xfs_log_need_covered(mp))
- error = xfs_fs_log_dummy(mp);
- else
- xfs_log_force(mp, 0);
- error = xfs_qm_sync(mp, SYNC_TRYLOCK);
+ if (down_read_trylock(&mp->m_super->s_umount)) {
+ if (!(mp->m_super->s_flags & MS_ACTIVE) &&
+ !(mp->m_flags & XFS_MOUNT_RDONLY)) {
+ /* dgc: errors ignored here */
+ if (mp->m_super->s_frozen == SB_UNFROZEN &&
+ xfs_log_need_covered(mp))
+ error = xfs_fs_log_dummy(mp);
+ else
+ xfs_log_force(mp, 0);
+ error = xfs_qm_sync(mp, SYNC_TRYLOCK);
- /* start pushing all the metadata that is currently dirty */
- xfs_ail_push_all(mp->m_ail);
+ /* start pushing all the metadata that is currently dirty */
+ xfs_ail_push_all(mp->m_ail);
+ }
+ up_read(&mp->m_super->s_umount);
}
/* queue us up again */
* [3.0-stable PATCH 28/36] xfs: fix memory reclaim deadlock on agi buffer
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (26 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 27/36] xfs: protect xfs_sync_worker with s_umount semaphore Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 29/36] xfs: xfs_vm_writepage clear iomap_valid when Mark Tinguely
` (9 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 089 --]
[-- Type: text/plain, Size: 1223 bytes --]
From: Peter Watkins <treestem@gmail.com>
Upstream commit: 3ba316037470bbf98c8a16c2179c02794fb8862e
Note xfs_iget can be called while holding a locked agi buffer. If
it goes into memory reclaim then inode teardown may try to lock the
same buffer. Prevent the deadlock by calling radix_tree_preload
with GFP_NOFS.
Signed-off-by: Peter Watkins <treestem@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/xfs_iget.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Index: b/fs/xfs/xfs_iget.c
===================================================================
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -340,9 +340,10 @@ xfs_iget_cache_miss(
/*
* Preload the radix tree so we can insert safely under the
* write spinlock. Note that we cannot sleep inside the preload
- * region.
+ * region. Since we can be called from transaction context, don't
+ * recurse into the file system.
*/
- if (radix_tree_preload(GFP_KERNEL)) {
+ if (radix_tree_preload(GFP_NOFS)) {
error = EAGAIN;
goto out_destroy;
}
* [3.0-stable PATCH 29/36] xfs: xfs_vm_writepage clear iomap_valid when
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (27 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 28/36] xfs: fix memory reclaim deadlock on agi buffer Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 30/36] xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near Mark Tinguely
` (8 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 092 --]
[-- Type: text/plain, Size: 2521 bytes --]
From: Alain Renaud <arenaud@sgi.com>
Upstream commit: 66f9311381b4772003d595fb6c518f1647450db0
!buffer_uptodate (REV2)
On filesystems with a block size smaller than PAGE_SIZE we currently
have a problem with unwritten extents. If we have a multi-block page for
which an unwritten extent has been allocated, and only some of the
buffers have been written to, and they are not contiguous, we can expose
stale data from disk in the blocks between the writes after extent
conversion.
Example of a page with unwritten and real data.
buffer content
0 empty b_state = 0
1 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
2 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
3 empty b_state = 0
4 empty b_state = 0
5 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
6 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
7 empty b_state = 0
Buffers 1, 2, 5, and 6 have been written to, leaving 0, 3, 4, and 7
empty. Currently buffers 1, 2, 5, and 6 are added to a single ioend,
and when IO has completed, extent conversion creates a real extent from
block 1 through block 6, leaving 0 and 7 unwritten. However buffers 3
and 4 were not written to disk, so stale data is exposed from those
blocks on a subsequent read.
Fix this by setting iomap_valid = 0 when we find a buffer that is not
Uptodate. This ensures that buffers 5 and 6 are not added to the same
ioend as buffers 1 and 2. Later these blocks will be converted into two
separate real extents, leaving the blocks in between unwritten.
Signed-off-by: Alain Renaud <arenaud@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/linux-2.6/xfs_aops.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
Index: b/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -1008,10 +1008,15 @@ xfs_vm_writepage(
imap_valid = 0;
}
} else {
- if (PageUptodate(page)) {
+ if (PageUptodate(page))
ASSERT(buffer_mapped(bh));
- imap_valid = 0;
- }
+ /*
+ * This buffer is not uptodate and will not be
+ * written to disk. Ensure that we will put any
+ * subsequent writeable buffers into a new
+ * ioend.
+ */
+ imap_valid = 0;
continue;
}
* [3.0-stable PATCH 30/36] xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (28 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 29/36] xfs: xfs_vm_writepage clear iomap_valid when Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 31/36] xfs: shutdown xfs_sync_worker before the log Mark Tinguely
` (7 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 097 --]
[-- Type: text/plain, Size: 1066 bytes --]
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: 76d095388b040229ea1aad7dea45be0cfa20f589
When we fail to find a matching extent near the requested extent
specification during a left-right distance search in
xfs_alloc_ag_vextent_near, we fail to free the original cursor that
we used to look up the XFS_BTNUM_CNT tree and hence leak it.
Reported-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/xfs_alloc.c | 1 +
1 file changed, 1 insertion(+)
Index: b/fs/xfs/xfs_alloc.c
===================================================================
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -1086,6 +1086,7 @@ restart:
goto restart;
}
+ xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
trace_xfs_alloc_size_neither(args);
args->agbno = NULLAGBLOCK;
return 0;
* [3.0-stable PATCH 31/36] xfs: shutdown xfs_sync_worker before the log
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (29 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 30/36] xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 32/36] xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near Mark Tinguely
` (6 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 098 --]
[-- Type: text/plain, Size: 2899 bytes --]
From: Ben Myers <bpm@sgi.com>
Upstream commit: 8866fc6fa55e31b2bce931b7963ff16641b39dc7
Revert commit 1307bbd, which uses the s_umount semaphore to provide
exclusion between xfs_sync_worker and unmount, in favor of shutting down
the sync worker before freeing the log in xfs_log_unmount. This is a
cleaner way of resolving the race between xfs_sync_worker and unmount
than using s_umount.
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/linux-2.6/xfs_sync.c | 31 +++++++++++++++----------------
fs/xfs/xfs_log.c | 1 +
2 files changed, 16 insertions(+), 16 deletions(-)
Index: b/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -506,24 +506,23 @@ xfs_sync_worker(
* We shouldn't write/force the log if we are in the mount/unmount
* process or on a read only filesystem. The workqueue still needs to be
* active in both cases, however, because it is used for inode reclaim
- * during these times. Use the s_umount semaphore to provide exclusion
- * with unmount.
+ * during these times. Use the MS_ACTIVE flag to avoid doing anything
+ * during mount. Doing work during unmount is avoided by calling
+ * cancel_delayed_work_sync on this work queue before tearing down
+ * the ail and the log in xfs_log_unmount.
*/
- if (down_read_trylock(&mp->m_super->s_umount)) {
- if (!(mp->m_super->s_flags & MS_ACTIVE) &&
- !(mp->m_flags & XFS_MOUNT_RDONLY)) {
- /* dgc: errors ignored here */
- if (mp->m_super->s_frozen == SB_UNFROZEN &&
- xfs_log_need_covered(mp))
- error = xfs_fs_log_dummy(mp);
- else
- xfs_log_force(mp, 0);
- error = xfs_qm_sync(mp, SYNC_TRYLOCK);
+ if (!(mp->m_super->s_flags & MS_ACTIVE) &&
+ !(mp->m_flags & XFS_MOUNT_RDONLY)) {
+ /* dgc: errors ignored here */
+ if (mp->m_super->s_frozen == SB_UNFROZEN &&
+ xfs_log_need_covered(mp))
+ error = xfs_fs_log_dummy(mp);
+ else
+ xfs_log_force(mp, 0);
+ error = xfs_qm_sync(mp, SYNC_TRYLOCK);
- /* start pushing all the metadata that is currently dirty */
- xfs_ail_push_all(mp->m_ail);
- }
- up_read(&mp->m_super->s_umount);
+ /* start pushing all the metadata that is currently dirty */
+ xfs_ail_push_all(mp->m_ail);
}
/* queue us up again */
Index: b/fs/xfs/xfs_log.c
===================================================================
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -618,6 +618,7 @@ xfs_log_unmount_write(xfs_mount_t *mp)
void
xfs_log_unmount(xfs_mount_t *mp)
{
+ cancel_delayed_work_sync(&mp->m_sync_work);
xfs_trans_ail_destroy(mp);
xlog_dealloc_log(mp->m_log);
}
* [3.0-stable PATCH 32/36] xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (30 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 31/36] xfs: shutdown xfs_sync_worker before the log Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 33/36] xfs: check for stale inode before acquiring iflock on push Mark Tinguely
` (5 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 099 --]
[-- Type: text/plain, Size: 1190 bytes --]
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: e3a746f5aab71f2dd0a83116772922fb37ae29d6
The current cursor is reallocated when retrying the allocation, so
the existing cursor needs to be destroyed in both the restart and
the failure cases.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/xfs_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: b/fs/xfs/xfs_alloc.c
===================================================================
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -1080,13 +1080,13 @@ restart:
* If we couldn't get anything, give up.
*/
if (bno_cur_lt == NULL && bno_cur_gt == NULL) {
+ xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+
if (!forced++) {
trace_xfs_alloc_near_busy(args);
xfs_log_force(args->mp, XFS_LOG_SYNC);
goto restart;
}
-
- xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
trace_xfs_alloc_size_neither(args);
args->agbno = NULLAGBLOCK;
return 0;
* [3.0-stable PATCH 33/36] xfs: check for stale inode before acquiring iflock on push
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (31 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 32/36] xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 34/36] xfs: stop the sync worker before xfs_unmountfs Mark Tinguely
` (4 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: 103 --]
[-- Type: text/plain, Size: 2295 bytes --]
From: Brian Foster <bfoster@redhat.com>
Upstream commit: 9a3a5dab63461b84213052888bf38a962b22d035
An inode in the AIL can be flush locked and marked stale if
a cluster free transaction occurs at the right time. The
inode item is then marked as flushing, which causes xfsaild
to spin and leaves the filesystem stalled. This is
reproduced by running xfstests 273 in a loop for an
extended period of time.
Check for stale inodes before the flush lock. This marks
the inode as pinned, leads to a log flush and allows the
filesystem to proceed.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/xfs_inode_item.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
Index: b/fs/xfs/xfs_inode_item.c
===================================================================
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -588,6 +588,21 @@ xfs_inode_item_trylock(
if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
return XFS_ITEM_LOCKED;
+ /*
+ * Re-check the pincount now that we stabilized the value by
+ * taking the ilock.
+ */
+ if (xfs_ipincount(ip) > 0) {
+ xfs_iunlock(ip, XFS_ILOCK_SHARED);
+ return XFS_ITEM_PINNED;
+ }
+
+ /* Stale items should force out the iclog */
+ if (ip->i_flags & XFS_ISTALE) {
+ xfs_iunlock(ip, XFS_ILOCK_SHARED);
+ return XFS_ITEM_PINNED;
+ }
+
if (!xfs_iflock_nowait(ip)) {
/*
* inode has already been flushed to the backing buffer,
@@ -597,17 +612,6 @@ xfs_inode_item_trylock(
return XFS_ITEM_PUSHBUF;
}
- /* Stale items should force out the iclog */
- if (ip->i_flags & XFS_ISTALE) {
- xfs_ifunlock(ip);
- /*
- * we hold the AIL lock - notify the unlock routine of this
- * so it doesn't try to get the lock again.
- */
- xfs_iunlock(ip, XFS_ILOCK_SHARED|XFS_IUNLOCK_NONOTIFY);
- return XFS_ITEM_PINNED;
- }
-
#ifdef DEBUG
if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
ASSERT(iip->ili_format.ilf_fields != 0);
* [3.0-stable PATCH 34/36] xfs: stop the sync worker before xfs_unmountfs
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (32 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 33/36] xfs: check for stale inode before acquiring iflock on push Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 35/36] xfs: zero allocation_args on the kernel stack Mark Tinguely
` (3 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: xfs-stop-sync-worker-before-umount.patch --]
[-- Type: text/plain, Size: 3177 bytes --]
From: Ben Myers <bpm@sgi.com>
Upstream commit: 0ba6e5368c302819da7aff537e0d7eff2616c6a6
xfs: stop the sync worker before xfs_unmountfs
Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs. This prevents occasional crashes on unmount like so:
PID: 21602 TASK: ee9df060 CPU: 0 COMMAND: "kworker/0:3"
#0 [c5377d28] crash_kexec at c0292c94
#1 [c5377d80] oops_end at c07090c2
#2 [c5377d98] no_context at c06f614e
#3 [c5377dbc] __bad_area_nosemaphore at c06f6281
#4 [c5377df4] bad_area_nosemaphore at c06f629b
#5 [c5377e00] do_page_fault at c070b0cb
#6 [c5377e7c] error_code (via page_fault) at c070892c
EAX: f300c6a8 EBX: f300c6a8 ECX: 000000c0 EDX: 000000c0 EBP: c5377ed0
DS: 007b ESI: 00000000 ES: 007b EDI: 00000001 GS: ffffad20
CS: 0060 EIP: c0481ad0 ERR: ffffffff EFLAGS: 00010246
#7 [c5377eb0] atomic64_read_cx8 at c0481ad0
#8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
#9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
#14 [c5377f58] process_one_work at c024ee4c
#15 [c5377f98] worker_thread at c024f43d
#16 [c5377fbc] kthread at c025326b
#17 [c5377fe8] kernel_thread_helper at c070e834
PID: 26653 TASK: e79143b0 CPU: 3 COMMAND: "umount"
#0 [cde0fda0] __schedule at c0706595
#1 [cde0fe28] schedule at c0706b89
#2 [cde0fe30] schedule_timeout at c0705600
#3 [cde0fe94] __down_common at c0706098
#4 [cde0fec8] __down at c0706122
#5 [cde0fed0] down at c025936f
#6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
#7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
#8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
#9 [cde0ff1c] generic_shutdown_super at c0333d7a
#10 [cde0ff38] kill_block_super at c0333e0f
#11 [cde0ff48] deactivate_locked_super at c0334218
#12 [cde0ff58] deactivate_super at c033495d
#13 [cde0ff68] mntput_no_expire at c034bc13
#14 [cde0ff7c] sys_umount at c034cc69
#15 [cde0ffa0] sys_oldumount at c034ccd4
#16 [cde0ffb0] system_call at c0707e66
commit 11159a05 added this to xfs_log_unmount and needs to be cleaned up
at a later date.
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
---
fs/xfs/linux-2.6/xfs_super.c | 1 +
1 file changed, 1 insertion(+)
Index: b/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1004,6 +1004,7 @@ xfs_fs_put_super(
XFS_bflush(mp->m_ddev_targp);
+ cancel_delayed_work_sync(&mp->m_sync_work);
xfs_unmountfs(mp);
xfs_syncd_stop(mp);
xfs_freesb(mp);
* [3.0-stable PATCH 35/36] xfs: zero allocation_args on the kernel stack
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (33 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 34/36] xfs: stop the sync worker before xfs_unmountfs Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 36/36] xfs: only update the last_sync_lsn when a transaction completes Mark Tinguely
` (2 subsequent siblings)
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: xfs-zero-allocation-args-on-stack.patch --]
[-- Type: text/plain, Size: 2179 bytes --]
From: Mark Tinguely <tinguely@sgi.com>
Upstream commit: a00416844b8f4b0106344bdfd90fe45a854b1d05
xfs: zero allocation_args on the kernel stack
Zero the kernel stack space that makes up the xfs_alloc_arg structures.
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/xfs_alloc.c | 1 +
fs/xfs/xfs_bmap.c | 3 +++
fs/xfs/xfs_ialloc.c | 1 +
3 files changed, 5 insertions(+)
Index: b/fs/xfs/xfs_alloc.c
===================================================================
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -1872,6 +1872,7 @@ xfs_alloc_fix_freelist(
/*
* Initialize the args structure.
*/
+ memset(&targs, 0, sizeof(targs));
targs.tp = tp;
targs.mp = mp;
targs.agbp = agbp;
Index: b/fs/xfs/xfs_bmap.c
===================================================================
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -2561,6 +2561,7 @@ xfs_bmap_btalloc(
* Normal allocation, done through xfs_alloc_vextent.
*/
tryagain = isaligned = 0;
+ memset(&args, 0, sizeof(args));
args.tp = ap->tp;
args.mp = mp;
args.fsbno = ap->rval;
@@ -3194,6 +3195,7 @@ xfs_bmap_extents_to_btree(
* Convert to a btree with two levels, one record in root.
*/
XFS_IFORK_FMT_SET(ip, whichfork, XFS_DINODE_FMT_BTREE);
+ memset(&args, 0, sizeof(args));
args.tp = tp;
args.mp = mp;
args.firstblock = *firstblock;
@@ -3355,6 +3357,7 @@ xfs_bmap_local_to_extents(
xfs_buf_t *bp; /* buffer for extent block */
xfs_bmbt_rec_host_t *ep;/* extent record pointer */
+ memset(&args, 0, sizeof(args));
args.tp = tp;
args.mp = ip->i_mount;
args.firstblock = *firstblock;
Index: b/fs/xfs/xfs_ialloc.c
===================================================================
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -251,6 +251,7 @@ xfs_ialloc_ag_alloc(
/* boundary */
struct xfs_perag *pag;
+ memset(&args, 0, sizeof(args));
args.tp = tp;
args.mp = tp->t_mountp;
* [3.0-stable PATCH 36/36] xfs: only update the last_sync_lsn when a transaction completes
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (34 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 35/36] xfs: zero allocation_args on the kernel stack Mark Tinguely
@ 2012-12-03 23:42 ` Mark Tinguely
2012-12-04 21:44 ` [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Ben Myers
2012-12-05 21:45 ` Dave Chinner
37 siblings, 0 replies; 46+ messages in thread
From: Mark Tinguely @ 2012-12-03 23:42 UTC (permalink / raw)
To: stable; +Cc: xfs
[-- Attachment #1: xfs-update-last_sync-when-trans-completes.patch --]
[-- Type: text/plain, Size: 4013 bytes --]
From: Dave Chinner <dchinner@redhat.com>
Upstream commit: d35e88faa3b0fc2cea35c3b2dca358b5cd09b45f
xfs: only update the last_sync_lsn when a transaction completes
The log write code stamps each iclog with the current tail LSN in
the iclog header so that recovery knows where to find the tail of
the log once it has found the head. Normally this is taken from the
first item on the AIL - the log item that corresponds to the oldest
active item in the log.
The problem is that when the AIL is empty, the tail lsn is derived
from the l_last_sync_lsn, which is the LSN of the last iclog to
be written to the log. In most cases this doesn't happen, because
the AIL is rarely empty on an active filesystem. However, when it
does, it opens up an interesting case when the transaction being
committed to the iclog spans multiple iclogs.
That is, the first iclog is stamped with the l_last_sync_lsn, and IO
is issued. Then the next iclog is setup, the changes copied into the
iclog (takes some time), and then the l_last_sync_lsn is stamped
into the header and IO is issued. This is still the same
transaction, so the tail lsn of both iclogs must be the same for log
recovery to find the entire transaction to be able to replay it.
The problem arises in that the iclog buffer IO completion updates
the l_last_sync_lsn with its own LSN. Therefore, if the first iclog
completes its IO before the second iclog is filled and has the tail
lsn stamped in it, it will stamp the LSN of the first iclog into
its tail lsn field. If the system fails at this point, log recovery
will not see a complete transaction, so the transaction will not be
replayed.
The fix is simple - the l_last_sync_lsn is currently updated when an
iclog buffer IO completes, and this is incorrect. The l_last_sync_lsn
should be updated when a transaction is completed by an iclog buffer
IO. That is, only iclog buffers that have transaction commit
callbacks attached to them should update the l_last_sync_lsn. This
means that the last_sync_lsn will only move forward when a commit
record is written, not in the middle of a large transaction that is
rolling through multiple iclog buffers.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
fs/xfs/xfs_log.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
Index: b/fs/xfs/xfs_log.c
===================================================================
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2216,14 +2216,27 @@ xlog_state_do_callback(
/*
- * update the last_sync_lsn before we drop the
+ * Completion of an iclog IO does not imply that
+ * a transaction has completed, as transactions
+ * can be large enough to span many iclogs. We
+ * cannot change the tail of the log half way
+ * through a transaction as this may be the only
+ * transaction in the log and moving the tail to
+ * point to the middle of it will prevent
+ * recovery from finding the start of the
+ * transaction. Hence we should only update the
+ * last_sync_lsn if this iclog contains
+ * transaction completion callbacks on it.
+ *
+ * We have to do this before we drop the
* icloglock to ensure we are the only one that
* can update it.
*/
ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
be64_to_cpu(iclog->ic_header.h_lsn)) <= 0);
- atomic64_set(&log->l_last_sync_lsn,
- be64_to_cpu(iclog->ic_header.h_lsn));
+ if (iclog->ic_callback)
+ atomic64_set(&log->l_last_sync_lsn,
+ be64_to_cpu(iclog->ic_header.h_lsn));
} else
ioerrors++;
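The guard added in the hunk above can be modeled in isolation: advance the recorded last-sync LSN only when the completing iclog carries transaction-completion callbacks, so a transaction spanning several iclogs never moves the log tail into its own middle. A simplified C sketch (the struct and field names are hypothetical stand-ins, not the kernel's types):

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified model of an in-core log buffer: the LSN stamped in its
 * header, plus optional transaction-completion callbacks. */
struct iclog {
    uint64_t h_lsn;      /* LSN stamped in the iclog header */
    void    *callbacks;  /* non-NULL only if a transaction commits here */
};

/* The fixed rule: on iclog IO completion, only advance last_sync_lsn
 * for iclogs that complete a transaction (have callbacks attached).
 * Mid-transaction iclogs leave the tail where it is. */
static void iclog_io_done(uint64_t *last_sync_lsn, const struct iclog *ic)
{
    if (ic->callbacks)
        *last_sync_lsn = ic->h_lsn;
}
```

With the old unconditional assignment, the first iclog of a multi-iclog transaction would advance the tail past the transaction's start, and recovery after a crash at that point could not replay it.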
* Re: [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (35 preceding siblings ...)
2012-12-03 23:42 ` [3.0-stable PATCH 36/36] xfs: only update the last_sync_lsn when a transaction completes Mark Tinguely
@ 2012-12-04 21:44 ` Ben Myers
2012-12-05 21:45 ` Dave Chinner
37 siblings, 0 replies; 46+ messages in thread
From: Ben Myers @ 2012-12-04 21:44 UTC (permalink / raw)
To: Mark Tinguely; +Cc: stable, xfs
On Mon, Dec 03, 2012 at 05:42:08PM -0600, Mark Tinguely wrote:
> Here a collection of bug fixes for 3.0-stable. Many of these patches
> were also selected by Dave Chinner as possible 3.0-stable patches:
> http://oss.sgi.com/archives/xfs/2012-08/msg00255.html
>
> I chose only bug fixes and kept the changes to a minimum.
>
> Patch 21/22 are required for the bug fix in patch 23 but they are
> important changes in their own right.
Acked-by: Ben Myers <bpm@sgi.com>
* Re: [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
` (36 preceding siblings ...)
2012-12-04 21:44 ` [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Ben Myers
@ 2012-12-05 21:45 ` Dave Chinner
2012-12-06 17:27 ` Mark Tinguely
37 siblings, 1 reply; 46+ messages in thread
From: Dave Chinner @ 2012-12-05 21:45 UTC (permalink / raw)
To: Mark Tinguely; +Cc: stable, xfs
On Mon, Dec 03, 2012 at 05:42:08PM -0600, Mark Tinguely wrote:
> Here a collection of bug fixes for 3.0-stable. Many of these patches
> were also selected by Dave Chinner as possible 3.0-stable patches:
> http://oss.sgi.com/archives/xfs/2012-08/msg00255.html
>
> I chose only bug fixes and kept the changes to a minimum.
>
> Patch 21/22 are required for the bug fix in patch 23 but they are
> important changes in their own right.
So I'll ask the same question that Christoph asked me: If nobody is
reporting problems on 3.0.x, why do this and risk regression and
fallout that requires fixing?
FWIW, what testing have you done?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches
2012-12-05 21:45 ` Dave Chinner
@ 2012-12-06 17:27 ` Mark Tinguely
2012-12-07 10:06 ` Dave Chinner
0 siblings, 1 reply; 46+ messages in thread
From: Mark Tinguely @ 2012-12-06 17:27 UTC (permalink / raw)
To: Dave Chinner; +Cc: stable, xfs
On 12/05/12 15:45, Dave Chinner wrote:
> On Mon, Dec 03, 2012 at 05:42:08PM -0600, Mark Tinguely wrote:
>> Here a collection of bug fixes for 3.0-stable. Many of these patches
>> were also selected by Dave Chinner as possible 3.0-stable patches:
>> http://oss.sgi.com/archives/xfs/2012-08/msg00255.html
>>
>> I chose only bug fixes and kept the changes to a minimum.
>>
>> Patch 21/22 are required for the bug fix in patch 23 but they are
>> important changes in their own right.
>
> So I'll ask the same question that Christoph asked me: If nobody is
> reporting problems on 3.0.x, why do this and risk regression and
> fallout that requires fixing?
>
> FWIW, what testing have you done?
>
> Cheers,
>
> Dave.
Do you mean?
http://oss.sgi.com/archives/xfs/2012-09/msg00002.html
I read that message as a concern that your original Linux 3.0-stable
patch series contained some items that did not meet the -stable
criteria.
As for adding patches to 3.0-stable: I believed then and now that
proactively suggesting bug fixes into 3.0-stable is a good thing
because it is the long term stable branch.
A few days after Christoph's email, I put my "Reviewed-by:" on your
series.
http://oss.sgi.com/archives/xfs/2012-09/msg00167.html
As for testing, the whole series is spun on xfstests loops for days on
x86_32 and x86_64 boxes, just like we test a top of tree patch series.
--Mark.
* Re: [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches
2012-12-06 17:27 ` Mark Tinguely
@ 2012-12-07 10:06 ` Dave Chinner
2012-12-07 21:15 ` Ben Myers
0 siblings, 1 reply; 46+ messages in thread
From: Dave Chinner @ 2012-12-07 10:06 UTC (permalink / raw)
To: Mark Tinguely; +Cc: stable, xfs
On Thu, Dec 06, 2012 at 11:27:22AM -0600, Mark Tinguely wrote:
> On 12/05/12 15:45, Dave Chinner wrote:
> >On Mon, Dec 03, 2012 at 05:42:08PM -0600, Mark Tinguely wrote:
> >>Here a collection of bug fixes for 3.0-stable. Many of these patches
> >>were also selected by Dave Chinner as possible 3.0-stable patches:
> >> http://oss.sgi.com/archives/xfs/2012-08/msg00255.html
> >>
> >>I chose only bug fixes and kept the changes to a minimum.
> >>
> >>Patch 21/22 are required for the bug fix in patch 23 but they are
> >>important changes in their own right.
> >
> >So I'll ask the same question that Christoph asked me: If nobody is
> >reporting problems on 3.0.x, why do this and risk regression and
> >fallout that requires fixing?
> >
> >FWIW, what testing have you done?
>
> Do you mean?
>
> http://oss.sgi.com/archives/xfs/2012-09/msg00002.html
>
> I read that message as a concern that your original Linux 3.0-stable
> patch series contained some items that did not meet the -stable
> criteria.
I read it as "why change something that no-one is reporting bugs
for?". I posted that series because I had to do the work for RHEL to
address customer reported problems, not because I felt like pushing
a bunch of fixes back to 3.0.
I've spent quite a bit of time over the past few weeks dealing with
various weird regressions as a result of that backport. If you're
going to backport a significant amount of stuff to 3.0.x, then
that's what you are signing up for. i.e. doing all the bug triage
and fixing that will result from the backport...
> As for adding patches to 3.0-stable. I believed then and now that
> proactively suggesting bug fixes into 3.0-stable is a good thing
> because it is the long term stable branch.
Which is in direct contrast to what most of us think. That is, if
nobody is reporting problems, then it ain't broke and it doesn't
need fixing.
> A few days after Christoph's email, I put my "Reviewed-by:" on your
> series.
>
> http://oss.sgi.com/archives/xfs/2012-09/msg00167.html
>
> As for testing, the whole series is spun on xfstests loops for days on
> x86_32 and x86_64 boxes, just like we test a top of tree patch series.
Which we all know does not catch all possible regressions. What
about crash/shutdown testing? Or load/stress testing?
/me is playing Devil's Advocate because I'm not signing up to
triage a whole new set of 3.0.x stable kernel regressions when
nobody is currently reporting problems.....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches
2012-12-07 10:06 ` Dave Chinner
@ 2012-12-07 21:15 ` Ben Myers
2012-12-08 12:06 ` Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 46+ messages in thread
From: Ben Myers @ 2012-12-07 21:15 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, Greg KH, Mark Tinguely, stable, xfs
Hey,
On Fri, Dec 07, 2012 at 09:06:47PM +1100, Dave Chinner wrote:
> On Thu, Dec 06, 2012 at 11:27:22AM -0600, Mark Tinguely wrote:
> > On 12/05/12 15:45, Dave Chinner wrote:
> > >On Mon, Dec 03, 2012 at 05:42:08PM -0600, Mark Tinguely wrote:
> > >>Here a collection of bug fixes for 3.0-stable. Many of these patches
> > >>were also selected by Dave Chinner as possible 3.0-stable patches:
> > >> http://oss.sgi.com/archives/xfs/2012-08/msg00255.html
> > >>
> > >>I chose only bug fixes and kept the changes to a minimum.
> > >>
> > >>Patch 21/22 are required for the bug fix in patch 23 but they are
> > >>important changes in their own right.
> > >
> > >So I'll ask the same question that Christoph asked me: If nobody is
> > >reporting problems on 3.0.x, why do this and risk regression and
> > >fallout that requires fixing?
> > >
> > >FWIW, what testing have you done?
> >
> > Do you mean?
> >
> > http://oss.sgi.com/archives/xfs/2012-09/msg00002.html
> >
> > I read that message as a concern that your original Linux 3.0-stable
> > patch series contained some items that did not meet the -stable
> > criteria.
>
> I read it as "why change something that no-one is reporting bugs
> for?".
I guess we could all stop putting words in his mouth and let him speak for
himself. Christoph (cc'd), would you please clarify your position?
> I posted that series because I had to do the work for RHEL to
> address customer reported problems, not because I felt like pushing
> a bunch of fixes back to 3.0.
>
> I've spent quite a bit of time over the past few weeks dealing with
> various weird regressions as a result of that backport. If you're
> going to backport a singificant amount of stuff to 3.0.x, then
> that's what you are signing up for. i.e. doing all the bug triage
> and fixing that will result from the backport...
>
> > As for adding patches to 3.0-stable. I believed then and now that
> > proactively suggesting bug fixes into 3.0-stable is a good thing
> > because it is the long term stable branch.
>
> Which is in direct contrast to what most of us think. That is, if
> nobody is reporting problems, then it ain't broke and it doesn't
> need fixing.
Who are you speaking for?
In my opinion, proposing appropriate bug fixes for inclusion in -stable has
merit regardless of whether the bug has been reported by -stable users. If the
fix fits the -stable criteria and there is a chance a stable user will hit the
bug it can be proposed.
> > A few days after Christoph's email, I put my "Reviewed-by:" on your
> > series.
> >
> > http://oss.sgi.com/archives/xfs/2012-09/msg00167.html
> >
> > As for testing, the whole series is spun on xfstests loops for days on
> > x86_32 and x86_64 boxes, just like we test a top of tree patch series.
>
> Which we all know does not catch all possible regressions. What
> about crash/shutdown testing? Or load/stress testing?
I have no objection to additional testing of proposed -stable patches. However
I believe it is impossible to catch 'all possible regressions'. There will
always be some risk here.
> /me is playing Devil's Advocate because I'm not signing up to
> triage a whole new set of 3.0.x stable kernel regressions when
> nobody is currently reporting problems.....
SGI XFS product is based directly upon -stable branches and I'd like to track
these branches as closely as possible. This aligns the interests of the SGI
XFS team and -stable users. If there are regressions, myself, Mark, Phil,
Rich, and Andrew are signed up to fix them regardless of whether you wish to be
involved. Paying customers come first as always, but the bugs will be fixed
either way.
Stable folk (cc'd Greg), what is your disposition with regard to proposing
patches for -stable proactively? Do we really need to have a bug report from a
3.0-stable user for every bug we propose for 3.0-stable?
Thanks,
Ben
* Re: [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches
2012-12-07 21:15 ` Ben Myers
@ 2012-12-08 12:06 ` Christoph Hellwig
2012-12-08 19:12 ` Greg KH
2012-12-10 0:24 ` Dave Chinner
2 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2012-12-08 12:06 UTC (permalink / raw)
To: Ben Myers; +Cc: Mark Tinguely, Greg KH, stable, xfs, Christoph Hellwig
On Fri, Dec 07, 2012 at 03:15:36PM -0600, Ben Myers wrote:
> > > >So I'll ask the same question that Christoph asked me: If nobody is
> > > >reporting problems on 3.0.x, why do this and risk regression and
> > > >fallout that requires fixing?
> > > >
> > > >FWIW, what testing have you done?
> > >
> > > Do you mean?
> > >
> > > http://oss.sgi.com/archives/xfs/2012-09/msg00002.html
> > >
> > > I read that message as a concern that your original Linux 3.0-stable
> > > patch series contained some items that did not meet the -stable
> > > criteria.
> >
> > I read it as "why change something that no-one is reporting bugs
> > for?".
>
> I guess we could all stop putting words in his mouth and let him speak for
> himself. Christoph (cc'd), would you please clarify your position?
Well, both of the above apply. A lot of the patches are definitely
above the normal -stable criteria. If we have a really good reason we
might be able to bend the criteria, but I'd really love to see
justification for that, preferably in the form of user reports of grave
issues.
> Stable folk (cc'd Greg), what is your disposition with regard to proposing
> patches for -stable proactively? Do we really need to have a bug report from a
> 3.0-stable user for every bug we propose for 3.0-stable?
I don't think we'll need any report for things that are
a) obvious lingering big problems
and have
b) non-invasive fixes
Occasionally just one of the criteria should be enough.
* Re: [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches
2012-12-07 21:15 ` Ben Myers
2012-12-08 12:06 ` Christoph Hellwig
@ 2012-12-08 19:12 ` Greg KH
2012-12-10 0:24 ` Dave Chinner
2 siblings, 0 replies; 46+ messages in thread
From: Greg KH @ 2012-12-08 19:12 UTC (permalink / raw)
To: Ben Myers; +Cc: Christoph Hellwig, Mark Tinguely, stable, xfs
On Fri, Dec 07, 2012 at 03:15:36PM -0600, Ben Myers wrote:
> Stable folk (cc'd Greg), what is your disposition with regard to proposing
> patches for -stable proactively? Do we really need to have a bug report from a
> 3.0-stable user for every bug we propose for 3.0-stable?
My rules are if the subsystem maintainer(s) agree that the patches
should be applied, and they meet the stable tree guidelines, then I
accept them. So it's up to you all, I'll do whatever you suggest here.
thanks,
greg k-h
* Re: [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches
2012-12-07 21:15 ` Ben Myers
2012-12-08 12:06 ` Christoph Hellwig
2012-12-08 19:12 ` Greg KH
@ 2012-12-10 0:24 ` Dave Chinner
2012-12-10 22:03 ` Ben Myers
2 siblings, 1 reply; 46+ messages in thread
From: Dave Chinner @ 2012-12-10 0:24 UTC (permalink / raw)
To: Ben Myers; +Cc: Christoph Hellwig, Greg KH, Mark Tinguely, stable, xfs
On Fri, Dec 07, 2012 at 03:15:36PM -0600, Ben Myers wrote:
> > > As for adding patches to 3.0-stable. I believed then and now that
> > > proactively suggesting bug fixes into 3.0-stable is a good thing
> > > because it is the long term stable branch.
> >
> > Which is in direct contrast to what most of us think. That is, if
> > nobody is reporting problems, then it ain't broke and it doesn't
> > need fixing.
>
> Who are you speaking for?
The people who have had to maintain the stable trees for the past
few years. It doesn't take a rocket scientist to work out who those
people are...
> > /me is playing Devil's Advocate because I'm not signing up to
> > triage a whole new set of 3.0.x stable kernel regressions when
> > nobody is currently reporting problems.....
>
> SGI XFS product is based directly upon -stable branches and I'd like to track
> these branches as closely as possible.
I'd say that's an important piece of information - i.e. stating the
motivation for doing this work. Especially as you might be
including patches that fix bugs that have never been reported
outside of SGI customers.
FWIW, I had no idea that SGI is now basing their XFS-derived
products off a current mainline tree. Can you point us to the
relevant XFS source code for these product releases? I, for one, am
interested in the updated DMAPI support infrastructure and how SGI
has implemented all the little tweaks mentioned in SGI's XFS
documentation (e.g. ibound and agskip)...
> This aligns the interests of the SGI
> XFS team and -stable users.
Enough with the marketing speak, already. Be up front with your
motivations - it helps prevent a lot of misunderstandings.
> If there are regressions, myself, Mark, Phil, Rich, and Andrew are
> signed up to fix them regardless of whether you wish to be
> involved.
I'm looking forward to seeing you guys run front-line community
bug triage, then.... :)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches
2012-12-10 0:24 ` Dave Chinner
@ 2012-12-10 22:03 ` Ben Myers
0 siblings, 0 replies; 46+ messages in thread
From: Ben Myers @ 2012-12-10 22:03 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, Greg KH, Mark Tinguely, stable, xfs
Hi Dave,
On Mon, Dec 10, 2012 at 11:24:43AM +1100, Dave Chinner wrote:
> On Fri, Dec 07, 2012 at 03:15:36PM -0600, Ben Myers wrote:
> > > > As for adding patches to 3.0-stable. I believed then and now that
> > > > proactively suggesting bug fixes into 3.0-stable is a good thing
> > > > because it is the long term stable branch.
> > >
> > > Which is in direct contrast to what most of us think. That is, if
> > > nobody is reporting problems, then it ain't broke and it doesn't
> > > need fixing.
> >
> > Who are you speaking for?
>
> The people who have had to maintain the stable trees for the past
> few years. It doesn't take a rocket scientist to work out who those
> people are...
>
> > > /me is playing Devil's Advocate because I'm not signing up to
> > > triage a whole new set of 3.0.x stable kernel regressions when
> > > nobody is currently reporting problems.....
> >
> > SGI XFS product is based directly upon -stable branches and I'd like to track
> > these branches as closely as possible.
>
> I'd say that's an important piece of information - i.e. stating the
> motivation for doing this work. Especially as you might be
> including patches that fix bugs that have never been reported
> outside of SGI customers.
>
> FWIW, I had no idea that SGI is now basing their XFS-derived
> products off a current mainline tree.
Our XFS product is based on 3.0-stable. This is a new development which
I've been working toward for a little while. There are other ways we
could be doing this, but I think it's best to work directly in the
stable branches if possible.
> Can you point us to the relevant XFS source code for these product
> releases?
The sources ship with the product.
> I, for one, am interested in the updated DMAPI support infrastructure and how
> SGI has implemented all the little tweaks mentioned in SGI's XFS
> documentation (e.g. ibound and agskip)...
Sure... Take a look here for dmapi:
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs.git;a=shortlog;h=refs/heads/v3.0-xfs_dmapi
Rich volunteered to post ibound and agskip. The poor guy keeps
volunteering for things. He never learns.
> > This aligns the interests of the SGI XFS team and -stable users.
>
> Enough with the marketing speak, already. Be up front with you
> motivations - it helps prevent a lot of misunderstandings.
That accurately describes what I'm trying to do. I don't think I'd last
very long in marketing but you never know: maybe I've missed my
calling. Get over it, already. ;)
> > If there are regressions, myself, Mark, Phil, Rich, and Andrew are
> > signed up to fix them regardless of whether you wish to be
> > involved.
>
> I'm looking forward to seeing you guys run front-line community
> bug triage, then.... :)
We'll be happy to be more involved in front-line triage. I chatted with
management about this and we'll get the thing organised.
We don't have to do it this way, but as long as the work is being done
it might as well benefit the community. That is basically the same
proposition you made when you posted your series for -stable. Mark has
removed the content that seemed inappropriate and posted again. Is
there additional content in this series that you feel is not appropriate
for stable? I'm sure Mark will be willing to have another go at this.
Thanks,
Ben