* [PATCH 0/8] xfs: current pending patches
@ 2010-08-18 11:36 Dave Chinner
2010-08-18 11:36 ` [PATCH 1/8] xfs: unlock items before allowing the CIL to commit Dave Chinner
` (7 more replies)
0 siblings, 8 replies; 12+ messages in thread
From: Dave Chinner @ 2010-08-18 11:36 UTC (permalink / raw)
To: xfs
Folks,
This is the current list of patches that I have for the main XFS tree.
Some of the patches have been reviewed, the ones that haven't got reviewed-by
tags have been updated after the last round of review and need another pass.
The patches are also available at:
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git for-oss
Note that this branch is currently based on 2.6.36-rc1, so if you pull from it
and try it you are likely to have a kernel that hangs in the writeback code. I
will rebase it on 2.6.35 once I get everything reviewed and have to update all
the tags.
Cheers,
Dave.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/8] xfs: unlock items before allowing the CIL to commit
2010-08-18 11:36 [PATCH 0/8] xfs: current pending patches Dave Chinner
@ 2010-08-18 11:36 ` Dave Chinner
2010-08-18 11:36 ` [PATCH 2/8] xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE Dave Chinner
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2010-08-18 11:36 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
When we commit a transaction using delayed logging, we need to
unlock the items in the transaciton before we unlock the CIL context
and allow it to be checkpointed. If we unlock them after we release
the CIl context lock, the CIL can checkpoint and complete before
we free the log items. This breaks stale buffer item unlock and
unpin processing as there is an implicit assumption that the unlock
will occur before the unpin.
Also, some log items need to store the LSN of the transaction commit
in the item (inodes and EFIs) and so can race with other transaction
completions if we don't prevent the CIL from checkpointing before
the unlock occurs.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_log_cil.c | 14 ++++++++++++++
fs/xfs/xfs_trans.c | 5 +----
fs/xfs/xfs_trans_priv.h | 3 ++-
3 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 31e4ea2..ef8e7d9 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -377,9 +377,23 @@ xfs_log_commit_cil(
xfs_log_done(mp, tp->t_ticket, NULL, log_flags);
xfs_trans_unreserve_and_mod_sb(tp);
+ /*
+ * Once all the items of the transaction have been copied to the CIL,
+ * the items can be unlocked and freed.
+ *
+ * This needs to be done before we drop the CIL context lock because we
+ * have to update state in the log items and unlock them before they go
+ * to disk. If we don't, then the CIL checkpoint can race with us and
+ * we can run checkpoint completion before we've updated and unlocked
+ * the log items. This affects (at least) processing of stale buffers,
+ * inodes and EFIs.
+ */
+ xfs_trans_free_items(tp, *commit_lsn, 0);
+
/* check for background commit before unlock */
if (log->l_cilp->xc_ctx->space_used > XLOG_CIL_SPACE_LIMIT(log))
push = 1;
+
up_read(&log->l_cilp->xc_ctx_lock);
/*
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index fdca741..1c47eda 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1167,7 +1167,7 @@ xfs_trans_del_item(
* Unlock all of the items of a transaction and free all the descriptors
* of that transaction.
*/
-STATIC void
+void
xfs_trans_free_items(
struct xfs_trans *tp,
xfs_lsn_t commit_lsn,
@@ -1653,9 +1653,6 @@ xfs_trans_commit_cil(
return error;
current_restore_flags_nested(&tp->t_pflags, PF_FSTRANS);
-
- /* xfs_trans_free_items() unlocks them first */
- xfs_trans_free_items(tp, *commit_lsn, 0);
xfs_trans_free(tp);
return 0;
}
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index e2d93d8..62da86c 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -25,7 +25,8 @@ struct xfs_trans;
void xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *);
void xfs_trans_del_item(struct xfs_log_item *);
-
+void xfs_trans_free_items(struct xfs_trans *tp, xfs_lsn_t commit_lsn,
+ int flags);
void xfs_trans_item_committed(struct xfs_log_item *lip,
xfs_lsn_t commit_lsn, int aborted);
void xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp);
--
1.7.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/8] xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE
2010-08-18 11:36 [PATCH 0/8] xfs: current pending patches Dave Chinner
2010-08-18 11:36 ` [PATCH 1/8] xfs: unlock items before allowing the CIL to commit Dave Chinner
@ 2010-08-18 11:36 ` Dave Chinner
2010-08-19 12:06 ` Christoph Hellwig
2010-08-18 11:36 ` [PATCH 3/8] xfs: Reduce log force overhead for delayed logging Dave Chinner
` (5 subsequent siblings)
7 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2010-08-18 11:36 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Under heavy load parallel metadata loads (e.g. dbench), we can fail
to mark all the inodes in a cluster being freed as XFS_ISTALE as we
skip inodes we cannot get the XFS_ILOCK_EXCL or the flush lock on.
When this happens and the inode cluster buffer has already been
marked stale and freed, inode reclaim can try to write the inode out
as it is dirty and not marked stale. This can result in writing th
metadata to an freed extent, or in the case it has already
been overwritten trigger a magic number check failure and return an
EUCLEAN error such as:
Filesystem "ram0": inode 0x442ba1 background reclaim flush failed with 117
Fix this by ensuring that we hoover up all in memory inodes in the
cluster and mark them XFS_ISTALE when freeing the cluster.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode.c | 49 ++++++++++++++++++++++++++-----------------------
1 files changed, 26 insertions(+), 23 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 68415cb..34798f3 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1914,6 +1914,11 @@ xfs_iunlink_remove(
return 0;
}
+/*
+ * A big issue when freeing the inode cluster is is that we _cannot_ skip any
+ * inodes that are in memory - they all must be marked stale and attached to
+ * the cluster buffer.
+ */
STATIC void
xfs_ifree_cluster(
xfs_inode_t *free_ip,
@@ -1945,8 +1950,6 @@ xfs_ifree_cluster(
}
for (j = 0; j < nbufs; j++, inum += ninodes) {
- int found = 0;
-
blkno = XFS_AGB_TO_DADDR(mp, XFS_INO_TO_AGNO(mp, inum),
XFS_INO_TO_AGBNO(mp, inum));
@@ -1965,7 +1968,9 @@ xfs_ifree_cluster(
/*
* Walk the inodes already attached to the buffer and mark them
* stale. These will all have the flush locks held, so an
- * in-memory inode walk can't lock them.
+ * in-memory inode walk can't lock them. By marking them all
+ * stale first, we will not attempt to lock them in the loop
+ * below as the XFS_ISTALE flag will be set.
*/
lip = XFS_BUF_FSPRIVATE(bp, xfs_log_item_t *);
while (lip) {
@@ -1977,11 +1982,11 @@ xfs_ifree_cluster(
&iip->ili_flush_lsn,
&iip->ili_item.li_lsn);
xfs_iflags_set(iip->ili_inode, XFS_ISTALE);
- found++;
}
lip = lip->li_bio_list;
}
+
/*
* For each inode in memory attempt to add it to the inode
* buffer and set it up for being staled on buffer IO
@@ -1993,6 +1998,7 @@ xfs_ifree_cluster(
* even trying to lock them.
*/
for (i = 0; i < ninodes; i++) {
+retry:
read_lock(&pag->pag_ici_lock);
ip = radix_tree_lookup(&pag->pag_ici_root,
XFS_INO_TO_AGINO(mp, (inum + i)));
@@ -2003,38 +2009,36 @@ xfs_ifree_cluster(
continue;
}
- /* don't try to lock/unlock the current inode */
+ /*
+ * Don't try to lock/unlock the current inode, but we
+ * _cannot_ skip the other inodes that we did not find
+ * in the list attached to the buffer and are not
+ * already marked stale. If we can't lock it, back off
+ * and retry.
+ */
if (ip != free_ip &&
!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) {
read_unlock(&pag->pag_ici_lock);
- continue;
+ delay(1);
+ goto retry;
}
read_unlock(&pag->pag_ici_lock);
- if (!xfs_iflock_nowait(ip)) {
- if (ip != free_ip)
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- continue;
- }
-
+ xfs_iflock(ip);
xfs_iflags_set(ip, XFS_ISTALE);
- if (xfs_inode_clean(ip)) {
- ASSERT(ip != free_ip);
- xfs_ifunlock(ip);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- continue;
- }
+ /*
+ * we don't need to attach clean inodes or those only
+ * with unlogged changes (which we throw away, anyway).
+ */
iip = ip->i_itemp;
- if (!iip) {
- /* inode with unlogged changes only */
+ if (!iip || xfs_inode_clean(ip)) {
ASSERT(ip != free_ip);
ip->i_update_core = 0;
xfs_ifunlock(ip);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
continue;
}
- found++;
iip->ili_last_fields = iip->ili_format.ilf_fields;
iip->ili_format.ilf_fields = 0;
@@ -2049,8 +2053,7 @@ xfs_ifree_cluster(
xfs_iunlock(ip, XFS_ILOCK_EXCL);
}
- if (found)
- xfs_trans_stale_inode_buf(tp, bp);
+ xfs_trans_stale_inode_buf(tp, bp);
xfs_trans_binval(tp, bp);
}
--
1.7.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 3/8] xfs: Reduce log force overhead for delayed logging
2010-08-18 11:36 [PATCH 0/8] xfs: current pending patches Dave Chinner
2010-08-18 11:36 ` [PATCH 1/8] xfs: unlock items before allowing the CIL to commit Dave Chinner
2010-08-18 11:36 ` [PATCH 2/8] xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE Dave Chinner
@ 2010-08-18 11:36 ` Dave Chinner
2010-08-18 11:36 ` [PATCH 4/8] xfs: dummy transactions should not dirty VFS state Dave Chinner
` (4 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2010-08-18 11:36 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Delayed logging adds some serialisation to the log force process to
ensure that it does not deference a bad commit context structure
when determining if a CIL push is necessary or not. It does this by
grabing the CIL context lock exclusively, then dropping it before
pushing the CIL if necessary. This causes serialisation of all log
forces and pushes regardless of whether a force is necessary or not.
As a result fsync heavy workloads (like dbench) can be significantly
slower with delayed logging than without.
To avoid this penalty, copy the current sequence from the context to
the CIL structure when they are swapped. This allows us to do
unlocked checks on the current sequence without having to worry
about dereferencing context structures that may have already been
freed. Hence we can remove the CIL context locking in the forcing
code and only call into the push code if the current context matches
the sequence we need to force.
By passing the sequence into the push code, we can check the
sequence again once we have the CIL lock held exclusive and abort if
the sequence has already been pushed. This avoids a lock round-trip
and unnecessary CIL pushes when we have racing push calls.
The result is that the regression in dbench performance goes away -
this change improves dbench performance on a ramdisk from ~2100MB/s
to ~2500MB/s. This compares favourably to not using delayed logging
which retuns ~2500MB/s for the same workload.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_log.c | 7 +-
fs/xfs/xfs_log_cil.c | 245 ++++++++++++++++++++++++++-----------------------
fs/xfs/xfs_log_priv.h | 13 ++-
3 files changed, 147 insertions(+), 118 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 925d572..33f718f 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3015,7 +3015,8 @@ _xfs_log_force(
XFS_STATS_INC(xs_log_force);
- xlog_cil_push(log, 1);
+ if (log->l_cilp)
+ xlog_cil_force(log);
spin_lock(&log->l_icloglock);
@@ -3167,7 +3168,7 @@ _xfs_log_force_lsn(
XFS_STATS_INC(xs_log_force);
if (log->l_cilp) {
- lsn = xlog_cil_push_lsn(log, lsn);
+ lsn = xlog_cil_force_lsn(log, lsn);
if (lsn == NULLCOMMITLSN)
return 0;
}
@@ -3724,7 +3725,7 @@ xfs_log_force_umount(
* call below.
*/
if (!logerror && (mp->m_flags & XFS_MOUNT_DELAYLOG))
- xlog_cil_push(log, 1);
+ xlog_cil_force(log);
/*
* We must hold both the GRANT lock and the LOG lock,
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index ef8e7d9..9768f24 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -68,6 +68,7 @@ xlog_cil_init(
ctx->sequence = 1;
ctx->cil = cil;
cil->xc_ctx = ctx;
+ cil->xc_current_sequence = ctx->sequence;
cil->xc_log = log;
log->l_cilp = cil;
@@ -321,94 +322,6 @@ xlog_cil_free_logvec(
}
/*
- * Commit a transaction with the given vector to the Committed Item List.
- *
- * To do this, we need to format the item, pin it in memory if required and
- * account for the space used by the transaction. Once we have done that we
- * need to release the unused reservation for the transaction, attach the
- * transaction to the checkpoint context so we carry the busy extents through
- * to checkpoint completion, and then unlock all the items in the transaction.
- *
- * For more specific information about the order of operations in
- * xfs_log_commit_cil() please refer to the comments in
- * xfs_trans_commit_iclog().
- *
- * Called with the context lock already held in read mode to lock out
- * background commit, returns without it held once background commits are
- * allowed again.
- */
-int
-xfs_log_commit_cil(
- struct xfs_mount *mp,
- struct xfs_trans *tp,
- struct xfs_log_vec *log_vector,
- xfs_lsn_t *commit_lsn,
- int flags)
-{
- struct log *log = mp->m_log;
- int log_flags = 0;
- int push = 0;
-
- if (flags & XFS_TRANS_RELEASE_LOG_RES)
- log_flags = XFS_LOG_REL_PERM_RESERV;
-
- if (XLOG_FORCED_SHUTDOWN(log)) {
- xlog_cil_free_logvec(log_vector);
- return XFS_ERROR(EIO);
- }
-
- /* lock out background commit */
- down_read(&log->l_cilp->xc_ctx_lock);
- xlog_cil_format_items(log, log_vector, tp->t_ticket, commit_lsn);
-
- /* check we didn't blow the reservation */
- if (tp->t_ticket->t_curr_res < 0)
- xlog_print_tic_res(log->l_mp, tp->t_ticket);
-
- /* attach the transaction to the CIL if it has any busy extents */
- if (!list_empty(&tp->t_busy)) {
- spin_lock(&log->l_cilp->xc_cil_lock);
- list_splice_init(&tp->t_busy,
- &log->l_cilp->xc_ctx->busy_extents);
- spin_unlock(&log->l_cilp->xc_cil_lock);
- }
-
- tp->t_commit_lsn = *commit_lsn;
- xfs_log_done(mp, tp->t_ticket, NULL, log_flags);
- xfs_trans_unreserve_and_mod_sb(tp);
-
- /*
- * Once all the items of the transaction have been copied to the CIL,
- * the items can be unlocked and freed.
- *
- * This needs to be done before we drop the CIL context lock because we
- * have to update state in the log items and unlock them before they go
- * to disk. If we don't, then the CIL checkpoint can race with us and
- * we can run checkpoint completion before we've updated and unlocked
- * the log items. This affects (at least) processing of stale buffers,
- * inodes and EFIs.
- */
- xfs_trans_free_items(tp, *commit_lsn, 0);
-
- /* check for background commit before unlock */
- if (log->l_cilp->xc_ctx->space_used > XLOG_CIL_SPACE_LIMIT(log))
- push = 1;
-
- up_read(&log->l_cilp->xc_ctx_lock);
-
- /*
- * We need to push CIL every so often so we don't cache more than we
- * can fit in the log. The limit really is that a checkpoint can't be
- * more than half the log (the current checkpoint is not allowed to
- * overwrite the previous checkpoint), but commit latency and memory
- * usage limit this to a smaller size in most cases.
- */
- if (push)
- xlog_cil_push(log, 0);
- return 0;
-}
-
-/*
* Mark all items committed and clear busy extents. We free the log vector
* chains in a separate pass so that we unpin the log items as quickly as
* possible.
@@ -441,13 +354,23 @@ xlog_cil_committed(
}
/*
- * Push the Committed Item List to the log. If the push_now flag is not set,
- * then it is a background flush and so we can chose to ignore it.
+ * Push the Committed Item List to the log. If @push_seq flag is zero, then it
+ * is a background flush and so we can chose to ignore it. Otherwise, if the
+ * current sequence is the same as @push_seq we need to do a flush. If
+ * @push_seq is less than the current sequence, then it has already been
+ * flushed and we don't need to do anything - the caller will wait for it to
+ * complete if necessary.
+ *
+ * @push_seq is a value rather than a flag because that allows us to do an
+ * unlocked check of the sequence number for a match. Hence we can allows log
+ * forces to run racily and not issue pushes for the same sequence twice. If we
+ * get a race between multiple pushes for the same sequence they will block on
+ * the first one and then abort, hence avoiding needless pushes.
*/
-int
+STATIC int
xlog_cil_push(
struct log *log,
- int push_now)
+ xfs_lsn_t push_seq)
{
struct xfs_cil *cil = log->l_cilp;
struct xfs_log_vec *lv;
@@ -467,12 +390,14 @@ xlog_cil_push(
if (!cil)
return 0;
+ ASSERT(!push_seq || push_seq <= cil->xc_ctx->sequence);
+
new_ctx = kmem_zalloc(sizeof(*new_ctx), KM_SLEEP|KM_NOFS);
new_ctx->ticket = xlog_cil_ticket_alloc(log);
/* lock out transaction commit, but don't block on background push */
if (!down_write_trylock(&cil->xc_ctx_lock)) {
- if (!push_now)
+ if (!push_seq)
goto out_free_ticket;
down_write(&cil->xc_ctx_lock);
}
@@ -483,7 +408,11 @@ xlog_cil_push(
goto out_skip;
/* check for spurious background flush */
- if (!push_now && cil->xc_ctx->space_used < XLOG_CIL_SPACE_LIMIT(log))
+ if (!push_seq && cil->xc_ctx->space_used < XLOG_CIL_SPACE_LIMIT(log))
+ goto out_skip;
+
+ /* check for a previously pushed seqeunce */
+ if (push_seq < cil->xc_ctx->sequence)
goto out_skip;
/*
@@ -529,6 +458,13 @@ xlog_cil_push(
cil->xc_ctx = new_ctx;
/*
+ * mirror the new sequence into the cil structure so that we can do
+ * unlocked checks against the current sequence in log forces without
+ * risking deferencing a freed context pointer.
+ */
+ cil->xc_current_sequence = new_ctx->sequence;
+
+ /*
* The switch is now done, so we can drop the context lock and move out
* of a shared context. We can't just go straight to the commit record,
* though - we need to synchronise with previous and future commits so
@@ -640,6 +576,94 @@ out_abort:
}
/*
+ * Commit a transaction with the given vector to the Committed Item List.
+ *
+ * To do this, we need to format the item, pin it in memory if required and
+ * account for the space used by the transaction. Once we have done that we
+ * need to release the unused reservation for the transaction, attach the
+ * transaction to the checkpoint context so we carry the busy extents through
+ * to checkpoint completion, and then unlock all the items in the transaction.
+ *
+ * For more specific information about the order of operations in
+ * xfs_log_commit_cil() please refer to the comments in
+ * xfs_trans_commit_iclog().
+ *
+ * Called with the context lock already held in read mode to lock out
+ * background commit, returns without it held once background commits are
+ * allowed again.
+ */
+int
+xfs_log_commit_cil(
+ struct xfs_mount *mp,
+ struct xfs_trans *tp,
+ struct xfs_log_vec *log_vector,
+ xfs_lsn_t *commit_lsn,
+ int flags)
+{
+ struct log *log = mp->m_log;
+ int log_flags = 0;
+ int push = 0;
+
+ if (flags & XFS_TRANS_RELEASE_LOG_RES)
+ log_flags = XFS_LOG_REL_PERM_RESERV;
+
+ if (XLOG_FORCED_SHUTDOWN(log)) {
+ xlog_cil_free_logvec(log_vector);
+ return XFS_ERROR(EIO);
+ }
+
+ /* lock out background commit */
+ down_read(&log->l_cilp->xc_ctx_lock);
+ xlog_cil_format_items(log, log_vector, tp->t_ticket, commit_lsn);
+
+ /* check we didn't blow the reservation */
+ if (tp->t_ticket->t_curr_res < 0)
+ xlog_print_tic_res(log->l_mp, tp->t_ticket);
+
+ /* attach the transaction to the CIL if it has any busy extents */
+ if (!list_empty(&tp->t_busy)) {
+ spin_lock(&log->l_cilp->xc_cil_lock);
+ list_splice_init(&tp->t_busy,
+ &log->l_cilp->xc_ctx->busy_extents);
+ spin_unlock(&log->l_cilp->xc_cil_lock);
+ }
+
+ tp->t_commit_lsn = *commit_lsn;
+ xfs_log_done(mp, tp->t_ticket, NULL, log_flags);
+ xfs_trans_unreserve_and_mod_sb(tp);
+
+ /*
+ * Once all the items of the transaction have been copied to the CIL,
+ * the items can be unlocked and freed.
+ *
+ * This needs to be done before we drop the CIL context lock because we
+ * have to update state in the log items and unlock them before they go
+ * to disk. If we don't, then the CIL checkpoint can race with us and
+ * we can run checkpoint completion before we've updated and unlocked
+ * the log items. This affects (at least) processing of stale buffers,
+ * inodes and EFIs.
+ */
+ xfs_trans_free_items(tp, *commit_lsn, 0);
+
+ /* check for background commit before unlock */
+ if (log->l_cilp->xc_ctx->space_used > XLOG_CIL_SPACE_LIMIT(log))
+ push = 1;
+
+ up_read(&log->l_cilp->xc_ctx_lock);
+
+ /*
+ * We need to push CIL every so often so we don't cache more than we
+ * can fit in the log. The limit really is that a checkpoint can't be
+ * more than half the log (the current checkpoint is not allowed to
+ * overwrite the previous checkpoint), but commit latency and memory
+ * usage limit this to a smaller size in most cases.
+ */
+ if (push)
+ xlog_cil_push(log, 0);
+ return 0;
+}
+
+/*
* Conditionally push the CIL based on the sequence passed in.
*
* We only need to push if we haven't already pushed the sequence
@@ -653,39 +677,34 @@ out_abort:
* commit lsn is there. It'll be empty, so this is broken for now.
*/
xfs_lsn_t
-xlog_cil_push_lsn(
+xlog_cil_force_lsn(
struct log *log,
- xfs_lsn_t push_seq)
+ xfs_lsn_t sequence)
{
struct xfs_cil *cil = log->l_cilp;
struct xfs_cil_ctx *ctx;
xfs_lsn_t commit_lsn = NULLCOMMITLSN;
-restart:
- down_write(&cil->xc_ctx_lock);
- ASSERT(push_seq <= cil->xc_ctx->sequence);
-
- /* check to see if we need to force out the current context */
- if (push_seq == cil->xc_ctx->sequence) {
- up_write(&cil->xc_ctx_lock);
- xlog_cil_push(log, 1);
- goto restart;
- }
+ ASSERT(sequence <= cil->xc_current_sequence);
+
+ /*
+ * check to see if we need to force out the current context.
+ * xlog_cil_push() handles racing pushes for the same sequence,
+ * so no need to deal with it here.
+ */
+ if (sequence == cil->xc_current_sequence)
+ xlog_cil_push(log, sequence);
/*
* See if we can find a previous sequence still committing.
- * We can drop the flush lock as soon as we have the cil lock
- * because we are now only comparing contexts protected by
- * the cil lock.
- *
* We need to wait for all previous sequence commits to complete
* before allowing the force of push_seq to go ahead. Hence block
* on commits for those as well.
*/
+restart:
spin_lock(&cil->xc_cil_lock);
- up_write(&cil->xc_ctx_lock);
list_for_each_entry(ctx, &cil->xc_committing, committing) {
- if (ctx->sequence > push_seq)
+ if (ctx->sequence > sequence)
continue;
if (!ctx->commit_lsn) {
/*
@@ -695,7 +714,7 @@ restart:
sv_wait(&cil->xc_commit_wait, 0, &cil->xc_cil_lock, 0);
goto restart;
}
- if (ctx->sequence != push_seq)
+ if (ctx->sequence != sequence)
continue;
/* found it! */
commit_lsn = ctx->commit_lsn;
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 8c07261..ced52b9 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -422,6 +422,7 @@ struct xfs_cil {
struct rw_semaphore xc_ctx_lock;
struct list_head xc_committing;
sv_t xc_commit_wait;
+ xfs_lsn_t xc_current_sequence;
};
/*
@@ -562,8 +563,16 @@ int xlog_cil_init(struct log *log);
void xlog_cil_init_post_recovery(struct log *log);
void xlog_cil_destroy(struct log *log);
-int xlog_cil_push(struct log *log, int push_now);
-xfs_lsn_t xlog_cil_push_lsn(struct log *log, xfs_lsn_t push_sequence);
+/*
+ * CIL force routines
+ */
+xfs_lsn_t xlog_cil_force_lsn(struct log *log, xfs_lsn_t sequence);
+
+static inline void
+xlog_cil_force(struct log *log)
+{
+ xlog_cil_force_lsn(log, log->l_cilp->xc_current_sequence);
+}
/*
* Unmount record type is used as a pseudo transaction type for the ticket.
--
1.7.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 4/8] xfs: dummy transactions should not dirty VFS state
2010-08-18 11:36 [PATCH 0/8] xfs: current pending patches Dave Chinner
` (2 preceding siblings ...)
2010-08-18 11:36 ` [PATCH 3/8] xfs: Reduce log force overhead for delayed logging Dave Chinner
@ 2010-08-18 11:36 ` Dave Chinner
2010-08-18 12:04 ` Christoph Hellwig
2010-08-18 11:36 ` [PATCH 5/8] xfs: ensure f_ffree returned by statfs() is non-negative Dave Chinner
` (3 subsequent siblings)
7 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2010-08-18 11:36 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
When we need to cover the log, we issue dummy transactions to ensure
the current log tail is on disk. Unfortunately we currently use the
root inode in the dummy transaction, and the act of committing the
transaction dirties the inode at the VFS level.
As a result, the VFS writeback of the dirty inode will prevent the
filesystem from idling long enough for the log covering state
machine to complete. The state machine gets stuck in a loop issuing
new dummy transactions to cover the log and never makes progress.
To avoid this problem, the dummy transactions should not cause
externally visible state changes. To ensure this occurs, make sure
that dummy transactions log an unchanging field in the superblock as
it's state is never propagated outside the filesystem. This allows
the log covering state machine to complete successfully and the
filesystem now correctly enters a fully idle state about 90s after
the last modification was made.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/linux-2.6/xfs_super.c | 2 +-
fs/xfs/linux-2.6/xfs_sync.c | 42 ++++++------------------------------------
fs/xfs/xfs_fsops.c | 31 ++++++++++++++++++-------------
fs/xfs/xfs_fsops.h | 2 +-
4 files changed, 26 insertions(+), 51 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 15c35b6..d09cd14 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1402,7 +1402,7 @@ xfs_fs_freeze(
xfs_save_resvblks(mp);
xfs_quiesce_attr(mp);
- return -xfs_fs_log_dummy(mp);
+ return -xfs_fs_log_dummy(mp, SYNC_WAIT);
}
STATIC int
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index dfcbd98..d59c4a6 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -34,6 +34,7 @@
#include "xfs_inode_item.h"
#include "xfs_quota.h"
#include "xfs_trace.h"
+#include "xfs_fsops.h"
#include <linux/kthread.h>
#include <linux/freezer.h>
@@ -341,38 +342,6 @@ xfs_sync_attr(
}
STATIC int
-xfs_commit_dummy_trans(
- struct xfs_mount *mp,
- uint flags)
-{
- struct xfs_inode *ip = mp->m_rootip;
- struct xfs_trans *tp;
- int error;
-
- /*
- * Put a dummy transaction in the log to tell recovery
- * that all others are OK.
- */
- tp = xfs_trans_alloc(mp, XFS_TRANS_DUMMY1);
- error = xfs_trans_reserve(tp, 0, XFS_ICHANGE_LOG_RES(mp), 0, 0, 0);
- if (error) {
- xfs_trans_cancel(tp, 0);
- return error;
- }
-
- xfs_ilock(ip, XFS_ILOCK_EXCL);
-
- xfs_trans_ijoin(tp, ip);
- xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
- error = xfs_trans_commit(tp, 0);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
- /* the log force ensures this transaction is pushed to disk */
- xfs_log_force(mp, (flags & SYNC_WAIT) ? XFS_LOG_SYNC : 0);
- return error;
-}
-
-STATIC int
xfs_sync_fsdata(
struct xfs_mount *mp)
{
@@ -432,7 +401,7 @@ xfs_quiesce_data(
/* mark the log as covered if needed */
if (xfs_log_need_covered(mp))
- error2 = xfs_commit_dummy_trans(mp, SYNC_WAIT);
+ error2 = xfs_fs_log_dummy(mp, SYNC_WAIT);
/* flush data-only devices */
if (mp->m_rtdev_targp)
@@ -563,7 +532,7 @@ xfs_flush_inodes(
/*
* Every sync period we need to unpin all items, reclaim inodes and sync
* disk quotas. We might need to cover the log to indicate that the
- * filesystem is idle.
+ * filesystem is idle and not frozen.
*/
STATIC void
xfs_sync_worker(
@@ -577,8 +546,9 @@ xfs_sync_worker(
xfs_reclaim_inodes(mp, 0);
/* dgc: errors ignored here */
error = xfs_qm_sync(mp, SYNC_TRYLOCK);
- if (xfs_log_need_covered(mp))
- error = xfs_commit_dummy_trans(mp, 0);
+ if (mp->m_super->s_frozen == SB_UNFROZEN &&
+ xfs_log_need_covered(mp))
+ error = xfs_fs_log_dummy(mp, 0);
}
mp->m_sync_seq++;
wake_up(&mp->m_wait_single_sync_task);
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index dbca5f5..43b1d56 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -604,31 +604,36 @@ out:
return 0;
}
+/*
+ * Dump a transaction into the log that contains no real change. This is needed
+ * to be able to make the log dirty or stamp the current tail LSN into the log
+ * during the covering operation.
+ *
+ * We cannot use an inode here for this - that will push dirty state back up
+ * into the VFS and then periodic inode flushing will prevent log covering from
+ * making progress. Hence we log a field in the superblock instead.
+ */
int
xfs_fs_log_dummy(
- xfs_mount_t *mp)
+ xfs_mount_t *mp,
+ int flags)
{
xfs_trans_t *tp;
- xfs_inode_t *ip;
int error;
tp = _xfs_trans_alloc(mp, XFS_TRANS_DUMMY1, KM_SLEEP);
- error = xfs_trans_reserve(tp, 0, XFS_ICHANGE_LOG_RES(mp), 0, 0, 0);
+ error = xfs_trans_reserve(tp, 0, mp->m_sb.sb_sectsize + 128, 0, 0,
+ XFS_DEFAULT_LOG_COUNT);
if (error) {
xfs_trans_cancel(tp, 0);
return error;
}
- ip = mp->m_rootip;
- xfs_ilock(ip, XFS_ILOCK_EXCL);
-
- xfs_trans_ijoin(tp, ip);
- xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
- xfs_trans_set_sync(tp);
- error = xfs_trans_commit(tp, 0);
-
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- return error;
+ /* log the UUID because it is an unchanging field */
+ xfs_mod_sb(tp, XFS_SB_UUID);
+ if (flags & SYNC_WAIT)
+ xfs_trans_set_sync(tp);
+ return xfs_trans_commit(tp, 0);
}
int
diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h
index 88435e0..a786c52 100644
--- a/fs/xfs/xfs_fsops.h
+++ b/fs/xfs/xfs_fsops.h
@@ -25,6 +25,6 @@ extern int xfs_fs_counts(xfs_mount_t *mp, xfs_fsop_counts_t *cnt);
extern int xfs_reserve_blocks(xfs_mount_t *mp, __uint64_t *inval,
xfs_fsop_resblks_t *outval);
extern int xfs_fs_goingdown(xfs_mount_t *mp, __uint32_t inflags);
-extern int xfs_fs_log_dummy(xfs_mount_t *mp);
+extern int xfs_fs_log_dummy(xfs_mount_t *mp, int flags);
#endif /* __XFS_FSOPS_H__ */
--
1.7.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 5/8] xfs: ensure f_ffree returned by statfs() is non-negative
2010-08-18 11:36 [PATCH 0/8] xfs: current pending patches Dave Chinner
` (3 preceding siblings ...)
2010-08-18 11:36 ` [PATCH 4/8] xfs: dummy transactions should not dirty VFS state Dave Chinner
@ 2010-08-18 11:36 ` Dave Chinner
2010-08-18 12:05 ` Christoph Hellwig
2010-08-18 11:36 ` [PATCH 6/8] xfs: fix untrusted inode number lookup Dave Chinner
` (2 subsequent siblings)
7 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2010-08-18 11:36 UTC (permalink / raw)
To: xfs
From: Stuart Brodsky <sbrodsky@sgi.com>
Because of delayed updates to sb_icount field in the super block, it
is possible to allocate over maxicount number of inodes. This
causes the arithmetic to calculate a negative number of free inodes
in user commands like df or stat -f.
Since maxicount is a somewhat arbitrary number, a slight over
allocation is not critical but user commands should be displayed as
0 or greater and never go negative. To do this the value in the
stats buffer f_ffree is capped to never go negative.
[ Modified to use max_t as per Christoph's comment. ]
Signed-off-by: Stu Brodsky <sbrodsky@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/linux-2.6/xfs_super.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index d09cd14..a4e0797 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1226,6 +1226,7 @@ xfs_fs_statfs(
struct xfs_inode *ip = XFS_I(dentry->d_inode);
__uint64_t fakeinos, id;
xfs_extlen_t lsize;
+ __int64_t ffree;
statp->f_type = XFS_SB_MAGIC;
statp->f_namelen = MAXNAMELEN - 1;
@@ -1249,7 +1250,11 @@ xfs_fs_statfs(
statp->f_files = min_t(typeof(statp->f_files),
statp->f_files,
mp->m_maxicount);
- statp->f_ffree = statp->f_files - (sbp->sb_icount - sbp->sb_ifree);
+
+ /* make sure statp->f_ffree does not underflow */
+ ffree = statp->f_files - (sbp->sb_icount - sbp->sb_ifree);
+ statp->f_ffree = max_t(__int64_t, ffree, 0);
+
spin_unlock(&mp->m_sb_lock);
if ((ip->i_d.di_flags & XFS_DIFLAG_PROJINHERIT) ||
--
1.7.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 6/8] xfs: fix untrusted inode number lookup
2010-08-18 11:36 [PATCH 0/8] xfs: current pending patches Dave Chinner
` (4 preceding siblings ...)
2010-08-18 11:36 ` [PATCH 5/8] xfs: ensure f_ffree returned by statfs() is non-negative Dave Chinner
@ 2010-08-18 11:36 ` Dave Chinner
2010-08-18 11:36 ` [PATCH 7/8] xfs: use range primitives for xfs page cache operations Dave Chinner
2010-08-18 11:36 ` [PATCH 8/8] xfs: Introduce XFS_IOC_ZERO_RANGE Dave Chinner
7 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2010-08-18 11:36 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Commit 7124fe0a5b619d65b739477b3b55a20bf805b06d ("xfs: validate untrusted inode
numbers during lookup") changes the inode lookup code to do btree lookups for
untrusted inode numbers. This change made an invalid assumption about the
alignment of inodes and hence incorrectly calculated the first inode in the
cluster. As a result, some inode numbers were being incorrectly considered
invalid when they were actually valid.
The issue was not picked up by the xfstests suite because it always runs fsr
and dump (the two utilities that utilise the bulkstat interface) on cache hot
inodes and hence the lookup code in the cold cache path was not sufficiently
exercised to uncover this intermittent problem.
Fix the issue by relaxing the btree lookup criteria and then checking if the
record returned contains the inode number we are lookup for. If it we get an
incorrect record, then the inode number is invalid.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_ialloc.c | 16 ++++++++++------
1 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index abf80ae..5371d2d 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -1213,7 +1213,6 @@ xfs_imap_lookup(
struct xfs_inobt_rec_incore rec;
struct xfs_btree_cur *cur;
struct xfs_buf *agbp;
- xfs_agino_t startino;
int error;
int i;
@@ -1227,13 +1226,13 @@ xfs_imap_lookup(
}
/*
- * derive and lookup the exact inode record for the given agino. If the
- * record cannot be found, then it's an invalid inode number and we
- * should abort.
+ * Lookup the inode record for the given agino. If the record cannot be
+ * found, then it's an invalid inode number and we should abort. Once
+ * we have a record, we need to ensure it contains the inode number
+ * we are looking up.
*/
cur = xfs_inobt_init_cursor(mp, tp, agbp, agno);
- startino = agino & ~(XFS_IALLOC_INODES(mp) - 1);
- error = xfs_inobt_lookup(cur, startino, XFS_LOOKUP_EQ, &i);
+ error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &i);
if (!error) {
if (i)
error = xfs_inobt_get_rec(cur, &rec, &i);
@@ -1246,6 +1245,11 @@ xfs_imap_lookup(
if (error)
return error;
+ /* check that the returned record contains the required inode */
+ if (rec.ir_startino > agino ||
+ rec.ir_startino + XFS_IALLOC_INODES(mp) <= agino)
+ return EINVAL;
+
/* for untrusted inodes check it is allocated first */
if ((flags & XFS_IGET_UNTRUSTED) &&
(rec.ir_free & XFS_INOBT_MASK(agino - rec.ir_startino)))
--
1.7.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 7/8] xfs: use range primitives for xfs page cache operations
2010-08-18 11:36 [PATCH 0/8] xfs: current pending patches Dave Chinner
` (5 preceding siblings ...)
2010-08-18 11:36 ` [PATCH 6/8] xfs: fix untrusted inode number lookup Dave Chinner
@ 2010-08-18 11:36 ` Dave Chinner
2010-08-18 11:36 ` [PATCH 8/8] xfs: Introduce XFS_IOC_ZERO_RANGE Dave Chinner
7 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2010-08-18 11:36 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
While XFS passes ranges to operate on from the core code, the
functions being called ignore the either the entire range or the end
of the range. This is historical because when the function were
written linux didn't have the necessary range operations. Update the
functions to use the correct operations.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/linux-2.6/xfs_fs_subr.c | 31 +++++++++++++++----------------
1 files changed, 15 insertions(+), 16 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_fs_subr.c b/fs/xfs/linux-2.6/xfs_fs_subr.c
index 1f279b0..ed88ed1 100644
--- a/fs/xfs/linux-2.6/xfs_fs_subr.c
+++ b/fs/xfs/linux-2.6/xfs_fs_subr.c
@@ -32,10 +32,9 @@ xfs_tosspages(
xfs_off_t last,
int fiopt)
{
- struct address_space *mapping = VFS_I(ip)->i_mapping;
-
- if (mapping->nrpages)
- truncate_inode_pages(mapping, first);
+ /* can't toss partial tail pages, so mask them out */
+ last &= ~(PAGE_SIZE - 1);
+ truncate_inode_pages_range(VFS_I(ip)->i_mapping, first, last - 1);
}
int
@@ -50,12 +49,11 @@ xfs_flushinval_pages(
trace_xfs_pagecache_inval(ip, first, last);
- if (mapping->nrpages) {
- xfs_iflags_clear(ip, XFS_ITRUNCATED);
- ret = filemap_write_and_wait(mapping);
- if (!ret)
- truncate_inode_pages(mapping, first);
- }
+ xfs_iflags_clear(ip, XFS_ITRUNCATED);
+ ret = filemap_write_and_wait_range(mapping, first,
+ last == -1 ? LLONG_MAX : last);
+ if (!ret)
+ truncate_inode_pages_range(mapping, first, last);
return -ret;
}
@@ -71,10 +69,9 @@ xfs_flush_pages(
int ret = 0;
int ret2;
- if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
- xfs_iflags_clear(ip, XFS_ITRUNCATED);
- ret = -filemap_fdatawrite(mapping);
- }
+ xfs_iflags_clear(ip, XFS_ITRUNCATED);
+ ret = -filemap_fdatawrite_range(mapping, first,
+ last == -1 ? LLONG_MAX : last);
if (flags & XBF_ASYNC)
return ret;
ret2 = xfs_wait_on_pages(ip, first, last);
@@ -91,7 +88,9 @@ xfs_wait_on_pages(
{
struct address_space *mapping = VFS_I(ip)->i_mapping;
- if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
- return -filemap_fdatawait(mapping);
+ if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
+ return -filemap_fdatawait_range(mapping, first,
+ last == -1 ? ip->i_size - 1 : last);
+ }
return 0;
}
--
1.7.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 8/8] xfs: Introduce XFS_IOC_ZERO_RANGE
2010-08-18 11:36 [PATCH 0/8] xfs: current pending patches Dave Chinner
` (6 preceding siblings ...)
2010-08-18 11:36 ` [PATCH 7/8] xfs: use range primitives for xfs page cache operations Dave Chinner
@ 2010-08-18 11:36 ` Dave Chinner
7 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2010-08-18 11:36 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
XFS_IOC_ZERO_RANGE is the equivalent of an atomic XFS_IOC_UNRESVSP/
XFS_IOC_RESVSP call pair. It enabled ranges of written data to be
turned into zeroes without requiring IO or having to free and
reallocate the extents in the range given as would occur if we had
to punch and then preallocate them separately. This enables
applications to zero parts of files very quickly without changing
the layout of the files in any way.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/linux-2.6/xfs_ioctl.c | 3 ++-
fs/xfs/linux-2.6/xfs_ioctl32.c | 1 +
fs/xfs/xfs_bmap.c | 12 +++++++++---
fs/xfs/xfs_bmap.h | 9 ++++++---
fs/xfs/xfs_fs.h | 1 +
fs/xfs/xfs_vnodeops.c | 10 ++++++++--
6 files changed, 27 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_ioctl.c b/fs/xfs/linux-2.6/xfs_ioctl.c
index 237f5ff..a9c4810 100644
--- a/fs/xfs/linux-2.6/xfs_ioctl.c
+++ b/fs/xfs/linux-2.6/xfs_ioctl.c
@@ -1292,7 +1292,8 @@ xfs_file_ioctl(
case XFS_IOC_ALLOCSP64:
case XFS_IOC_FREESP64:
case XFS_IOC_RESVSP64:
- case XFS_IOC_UNRESVSP64: {
+ case XFS_IOC_UNRESVSP64:
+ case XFS_IOC_ZERO_RANGE: {
xfs_flock64_t bf;
if (copy_from_user(&bf, arg, sizeof(bf)))
diff --git a/fs/xfs/linux-2.6/xfs_ioctl32.c b/fs/xfs/linux-2.6/xfs_ioctl32.c
index 6c83f7f..464bcc7 100644
--- a/fs/xfs/linux-2.6/xfs_ioctl32.c
+++ b/fs/xfs/linux-2.6/xfs_ioctl32.c
@@ -574,6 +574,7 @@ xfs_file_compat_ioctl(
case XFS_IOC_FSGEOMETRY_V1:
case XFS_IOC_FSGROWFSDATA:
case XFS_IOC_FSGROWFSRT:
+ case XFS_IOC_ZERO_RANGE:
return xfs_file_ioctl(filp, cmd, p);
#else
case XFS_IOC_ALLOCSP_32:
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 23f14e5..d36770f 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -4744,8 +4744,12 @@ xfs_bmapi(
* Check if writing previously allocated but
* unwritten extents.
*/
- if (wr && mval->br_state == XFS_EXT_UNWRITTEN &&
- ((flags & (XFS_BMAPI_PREALLOC|XFS_BMAPI_DELAY)) == 0)) {
+ if (wr &&
+ ((mval->br_state == XFS_EXT_UNWRITTEN &&
+ ((flags & (XFS_BMAPI_PREALLOC|XFS_BMAPI_DELAY)) == 0)) ||
+ (mval->br_state == XFS_EXT_NORM &&
+ ((flags & (XFS_BMAPI_PREALLOC|XFS_BMAPI_CONVERT)) ==
+ (XFS_BMAPI_PREALLOC|XFS_BMAPI_CONVERT))))) {
/*
* Modify (by adding) the state flag, if writing.
*/
@@ -4757,7 +4761,9 @@ xfs_bmapi(
*firstblock;
cur->bc_private.b.flist = flist;
}
- mval->br_state = XFS_EXT_NORM;
+ mval->br_state = (mval->br_state == XFS_EXT_UNWRITTEN)
+ ? XFS_EXT_NORM
+ : XFS_EXT_UNWRITTEN;
error = xfs_bmap_add_extent(ip, lastx, &cur, mval,
firstblock, flist, &tmp_logflags,
whichfork, (flags & XFS_BMAPI_RSVBLOCKS));
diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
index b13569a..71ec9b6 100644
--- a/fs/xfs/xfs_bmap.h
+++ b/fs/xfs/xfs_bmap.h
@@ -74,9 +74,12 @@ typedef struct xfs_bmap_free
#define XFS_BMAPI_IGSTATE 0x080 /* Ignore state - */
/* combine contig. space */
#define XFS_BMAPI_CONTIG 0x100 /* must allocate only one extent */
-#define XFS_BMAPI_CONVERT 0x200 /* unwritten extent conversion - */
- /* need write cache flushing and no */
- /* additional allocation alignments */
+/*
+ * unwritten extent conversion - this needs write cache flushing and no additional
+ * allocation alignments. When specified with XFS_BMAPI_PREALLOC it converts
+ * from written to unwritten, otherwise convert from unwritten to written.
+ */
+#define XFS_BMAPI_CONVERT 0x200
#define XFS_BMAPI_FLAGS \
{ XFS_BMAPI_WRITE, "WRITE" }, \
diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h
index 7cf7220..6af6518 100644
--- a/fs/xfs/xfs_fs.h
+++ b/fs/xfs/xfs_fs.h
@@ -446,6 +446,7 @@ typedef struct xfs_handle {
/* XFS_IOC_SETBIOSIZE ---- deprecated 46 */
/* XFS_IOC_GETBIOSIZE ---- deprecated 47 */
#define XFS_IOC_GETBMAPX _IOWR('X', 56, struct getbmap)
+#define XFS_IOC_ZERO_RANGE _IOW ('X', 57, struct xfs_flock64)
/*
* ioctl commands that replace IRIX syssgi()'s
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 66d585c..3cacf59 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2272,7 +2272,7 @@ xfs_alloc_file_space(
count = len;
imapp = &imaps[0];
nimaps = 1;
- bmapi_flag = XFS_BMAPI_WRITE | (alloc_type ? XFS_BMAPI_PREALLOC : 0);
+ bmapi_flag = XFS_BMAPI_WRITE | alloc_type;
startoffset_fsb = XFS_B_TO_FSBT(mp, offset);
allocatesize_fsb = XFS_B_TO_FSB(mp, count);
@@ -2704,6 +2704,7 @@ xfs_change_file_space(
xfs_off_t llen;
xfs_trans_t *tp;
struct iattr iattr;
+ int prealloc_type;
if (!S_ISREG(ip->i_d.di_mode))
return XFS_ERROR(EINVAL);
@@ -2746,12 +2747,17 @@ xfs_change_file_space(
* size to be changed.
*/
setprealloc = clrprealloc = 0;
+ prealloc_type = XFS_BMAPI_PREALLOC;
switch (cmd) {
+ case XFS_IOC_ZERO_RANGE:
+ prealloc_type |= XFS_BMAPI_CONVERT;
+ xfs_tosspages(ip, startoffset, startoffset + bf->l_len, 0);
+ /* FALLTHRU */
case XFS_IOC_RESVSP:
case XFS_IOC_RESVSP64:
error = xfs_alloc_file_space(ip, startoffset, bf->l_len,
- 1, attr_flags);
+ prealloc_type, attr_flags);
if (error)
return error;
setprealloc = 1;
--
1.7.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 4/8] xfs: dummy transactions should not dirty VFS state
2010-08-18 11:36 ` [PATCH 4/8] xfs: dummy transactions should not dirty VFS state Dave Chinner
@ 2010-08-18 12:04 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2010-08-18 12:04 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
The updated version looks good to me,
Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 5/8] xfs: ensure f_ffree returned by statfs() is non-negative
2010-08-18 11:36 ` [PATCH 5/8] xfs: ensure f_ffree returned by statfs() is non-negative Dave Chinner
@ 2010-08-18 12:05 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2010-08-18 12:05 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/8] xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE
2010-08-18 11:36 ` [PATCH 2/8] xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE Dave Chinner
@ 2010-08-19 12:06 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2010-08-19 12:06 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
This new version looks good to me,
Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2010-08-19 12:05 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-18 11:36 [PATCH 0/8] xfs: current pending patches Dave Chinner
2010-08-18 11:36 ` [PATCH 1/8] xfs: unlock items before allowing the CIL to commit Dave Chinner
2010-08-18 11:36 ` [PATCH 2/8] xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE Dave Chinner
2010-08-19 12:06 ` Christoph Hellwig
2010-08-18 11:36 ` [PATCH 3/8] xfs: Reduce log force overhead for delayed logging Dave Chinner
2010-08-18 11:36 ` [PATCH 4/8] xfs: dummy transactions should not dirty VFS state Dave Chinner
2010-08-18 12:04 ` Christoph Hellwig
2010-08-18 11:36 ` [PATCH 5/8] xfs: ensure f_ffree returned by statfs() is non-negative Dave Chinner
2010-08-18 12:05 ` Christoph Hellwig
2010-08-18 11:36 ` [PATCH 6/8] xfs: fix untrusted inode number lookup Dave Chinner
2010-08-18 11:36 ` [PATCH 7/8] xfs: use range primitives for xfs page cache operations Dave Chinner
2010-08-18 11:36 ` [PATCH 8/8] xfs: Introduce XFS_IOC_ZERO_RANGE Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox