* [PATCH] xfs: add discard support (at transaction commit)
@ 2010-05-09 17:50 Christoph Hellwig
2010-05-09 21:59 ` Dave Chinner
0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2010-05-09 17:50 UTC (permalink / raw)
To: xfs
Now that we have reliably tracking of deleted extents in a transaction
we can easily implement "online" discard support which calls
blkdev_issue_discard once a transaction commits. We simply have to
walk the list of busy extents after transaction commit, but before deleting
it from the rbtree tracking these busy extents.
This does not replace by background discard support patch which is probably
better for thin provisioned arrays - I will updated it to apply ontop of
this patch when I'm ready to re-submit it.
[Note: this patch needs Dave's delayed-logging series any patch titled
"xfs: simplify log item descriptor tracking" applied]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2010-05-09 19:08:34.544262404 +0200
+++ xfs/fs/xfs/xfs_alloc.c 2010-05-09 19:09:24.975032203 +0200
@@ -2761,3 +2761,25 @@ xfs_alloc_busy_clear(
kmem_free(busyp);
}
+
+int
+xfs_discard_extent(
+ struct xfs_mount *mp,
+ struct xfs_busy_extent *busyp)
+{
+ int error = 0;
+ xfs_daddr_t bno;
+ __int64_t len;
+
+ if ((mp->m_flags & XFS_MOUNT_DISCARD) == 0)
+ return 0;
+
+ bno = XFS_AGB_TO_DADDR(mp, busyp->agno, busyp->bno);
+ len = XFS_FSB_TO_BB(mp, busyp->length);
+
+ error = -blkdev_issue_discard(mp->m_ddev_targp->bt_bdev, bno, len,
+ GFP_NOFS, DISCARD_FL_BARRIER);
+ if (error && error != EOPNOTSUPP)
+ xfs_fs_cmn_err(CE_NOTE, mp, "discard failed, error %d", error);
+ return error;
+}
Index: xfs/fs/xfs/xfs_alloc.h
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.h 2010-05-09 19:08:34.551265756 +0200
+++ xfs/fs/xfs/xfs_alloc.h 2010-05-09 19:09:16.213023263 +0200
@@ -128,6 +128,9 @@ xfs_alloc_busy_insert(xfs_trans_t *tp,
void
xfs_alloc_busy_clear(struct xfs_mount *mp, struct xfs_busy_extent *busyp);
+int
+xfs_discard_extent(struct xfs_mount *mp, struct xfs_busy_extent *busyp);
+
#endif /* __KERNEL__ */
/*
Index: xfs/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_super.c 2010-05-09 19:08:34.562254302 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_super.c 2010-05-09 19:09:16.223693748 +0200
@@ -120,6 +120,8 @@ mempool_t *xfs_ioend_pool;
#define MNTOPT_DMI "dmi" /* DMI enabled (DMAPI / XDSM) */
#define MNTOPT_DELAYLOG "delaylog" /* Delayed loging enabled */
#define MNTOPT_NODELAYLOG "nodelaylog" /* Delayed loging disabled */
+#define MNTOPT_DISCARD "discard" /* Discard unused blocks */
+#define MNTOPT_NODISCARD "nodiscard" /* Do not discard unused blocks */
/*
* Table driven mount option parser.
@@ -382,6 +384,10 @@ xfs_parseargs(
"- use at your own risk.\n");
} else if (!strcmp(this_char, MNTOPT_NODELAYLOG)) {
mp->m_flags &= ~XFS_MOUNT_DELAYLOG;
+ } else if (!strcmp(this_char, MNTOPT_DISCARD)) {
+ mp->m_flags |= XFS_MOUNT_DISCARD;
+ } else if (!strcmp(this_char, MNTOPT_NODISCARD)) {
+ mp->m_flags &= ~XFS_MOUNT_DISCARD;
} else if (!strcmp(this_char, "ihashsize")) {
cmn_err(CE_WARN,
"XFS: ihashsize no longer used, option is deprecated.");
Index: xfs/fs/xfs/xfs_mount.h
===================================================================
--- xfs.orig/fs/xfs/xfs_mount.h 2010-05-09 19:08:34.571254232 +0200
+++ xfs/fs/xfs/xfs_mount.h 2010-05-09 19:09:16.231005734 +0200
@@ -274,6 +274,7 @@ typedef struct xfs_mount {
#define XFS_MOUNT_FS_SHUTDOWN (1ULL << 4) /* atomic stop of all filesystem
operations, typically for
disk errors in metadata */
+#define XFS_MOUNT_DISCARD (1ULL << 5) /* discard unused blocks */
#define XFS_MOUNT_RETERR (1ULL << 6) /* return alignment errors to
user */
#define XFS_MOUNT_NOALIGN (1ULL << 7) /* turn off stripe alignment
Index: xfs/fs/xfs/xfs_log_cil.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log_cil.c 2010-05-09 19:08:34.580004407 +0200
+++ xfs/fs/xfs/xfs_log_cil.c 2010-05-09 19:09:16.241255699 +0200
@@ -415,6 +415,7 @@ xlog_cil_committed(
int abort)
{
struct xfs_cil_ctx *ctx = args;
+ struct xfs_mount *mp = ctx->cil->xc_log->l_mp;
struct xfs_log_vec *lv;
int abortflag = abort ? XFS_LI_ABORTED : 0;
struct xfs_busy_extent *busyp, *n;
@@ -425,8 +426,10 @@ xlog_cil_committed(
abortflag);
}
- list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list)
- xfs_alloc_busy_clear(ctx->cil->xc_log->l_mp, busyp);
+ list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list) {
+ xfs_discard_extent(mp, busyp);
+ xfs_alloc_busy_clear(mp, busyp);
+ }
spin_lock(&ctx->cil->xc_cil_lock);
list_del(&ctx->committing);
Index: xfs/fs/xfs/xfs_trans.c
===================================================================
--- xfs.orig/fs/xfs/xfs_trans.c 2010-05-09 19:08:34.596025010 +0200
+++ xfs/fs/xfs/xfs_trans.c 2010-05-09 19:09:16.252005804 +0200
@@ -1398,6 +1398,7 @@ xfs_trans_committed(
int abortflag)
{
struct xfs_log_item_desc *lidp, *next;
+ struct xfs_busy_extent *busyp;
/* Call the transaction's completion callback if there is one. */
if (tp->t_callback != NULL)
@@ -1408,6 +1409,9 @@ xfs_trans_committed(
xfs_trans_free_item_desc(lidp);
}
+ list_for_each_entry(busyp, &tp->t_busy, list)
+ xfs_discard_extent(tp->t_mountp, busyp);
+
xfs_trans_free(tp);
}
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] xfs: add discard support (at transaction commit)
2010-05-09 17:50 [PATCH] xfs: add discard support (at transaction commit) Christoph Hellwig
@ 2010-05-09 21:59 ` Dave Chinner
2010-07-26 9:29 ` Christoph Hellwig
0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2010-05-09 21:59 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Sun, May 09, 2010 at 01:50:48PM -0400, Christoph Hellwig wrote:
> Now that we have reliably tracking of deleted extents in a transaction
> we can easily implement "online" discard support which calls
> blkdev_issue_discard once a transaction commits. We simply have to
> walk the list of busy extents after transaction commit, but before deleting
> it from the rbtree tracking these busy extents.
>
> This does not replace by background discard support patch which is probably
> better for thin provisioned arrays - I will updated it to apply ontop of
> this patch when I'm ready to re-submit it.
I think this can be made to work, but I don't really like it that
much, especially the barrier flush part. Is there any particular
reason we need to issue discards at this level apart from "other
filesystems are doing it" rather than doing it lazily in a
non-performance critical piece of code?
Regardless of this, some questions about the patch come to mind:
1. is it safe to block the xfslogd in the block layer in,
say, get_request()? i.e. should we be issuing IO from an IO
completion handler? That raises red flags in my head...
2. issuing discards will block xfslogd and potentially stall
the log if there are lots of discards to issue.
3. DISCARD_FL_BARRIER appears to be used to allow async
issuing of the discard to ensure any followup write has the
discard processed first. What happens if the device does not
support barriers or barriers are turned off?
To me it appears that a lack of barriers could result in a
write being reordered in front of the discard. e.g.
delalloc results in btree block freed, marked busy. New
delalloc occurs, allocates block, marked sync, forces log,
issues async discard, completes transaction and then writes
data async. Which operation does the drive see and complete
first - the discard or the data write?
4. A barrier IO on every discard? In a word: Ouch.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] xfs: add discard support (at transaction commit)
2010-05-09 21:59 ` Dave Chinner
@ 2010-07-26 9:29 ` Christoph Hellwig
0 siblings, 0 replies; 3+ messages in thread
From: Christoph Hellwig @ 2010-07-26 9:29 UTC (permalink / raw)
To: Dave Chinner; +Cc: lczerner, jmoyer, snitzer, xfs
On Mon, May 10, 2010 at 07:59:44AM +1000, Dave Chinner wrote:
> I think this can be made to work, but I don't really like it that
> much, especially the barrier flush part. Is there any particular
> reason we need to issue discards at this level apart from "other
> filesystems are doing it" rather than doing it lazily in a
> non-performance critical piece of code?
To be able to benchmark it. I finally had the opportunity to test
discard against a real life highend storage device - an iscsi attached
RAID array from a big name vendor (not quite sure about the NDA status,
so I can't name which one exactly yet).
Running postmark with our without discards enabled with a setup Lukas
created the online discard support is much faster than the offline
one. With online discard on XFS we are getting close to the speed
without discards.
Unfortunately the discard performance isn't always 100% determinstic so
we have some error margin, but with the online discard support discards
are in the range of 1.5% faster than no
discard mode to 3% slower, while the offline discard is in the range
of 4 to 8% slower. That still compares favourable to the online discard
in ext4 which is 10% to 15% slower than not using discard. The btrfs
is online discard is in the same range as XFS, with a tendency to be
even slightly faster.
> Regardless of this, some questions about the patch come to mind:
>
> 1. is it safe to block the xfslogd in the block layer in,
> say, get_request()? i.e. should we be issuing IO from an IO
> completion handler? That raises red flags in my head...
Good question - probably not a that smart idea.
>
> 2. issuing discards will block xfslogd and potentially stall
> the log if there are lots of discards to issue.
> 3. DISCARD_FL_BARRIER appears to be used to allow async
> issuing of the discard to ensure any followup write has the
> discard processed first. What happens if the device does not
> support barriers or barriers are turned off?
These days all useful devices do support barriers, and certainly
any device that supports discards does. But if not we'll get back
an EOPNOTUPP error which simply gets ignored.
Note that for a device that does not support write barriers (which
includes basically any highend storage device that supports scsi thin
provisioning) the DISCARD flag currently evaluates to a queue drain,
but even that isn't actually nessecary in this case. It will be gone
soon as part changes I plan to the barrier code.
> 4. A barrier IO on every discard? In a word: Ouch.
Whne running with the delaylog option the discard don't happen very
often anymore. Basically we'll do one discard everytime we write a log
record. Which in fact already did a barrier anyway, so the the effect
isn't that bad. We'd could in fact get away with just removing the
barrier and doing a cache flush just after the discard, but before
removing the extents from the busy list.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2010-07-26 9:26 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-09 17:50 [PATCH] xfs: add discard support (at transaction commit) Christoph Hellwig
2010-05-09 21:59 ` Dave Chinner
2010-07-26 9:29 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox