* [PATCH 01/11] xfs: remove xfs_itruncate_data
2011-12-08 15:57 [PATCH 00/11] inode shrink and misc updates Christoph Hellwig
@ 2011-12-08 15:57 ` Christoph Hellwig
2011-12-13 21:23 ` Dave Chinner
2011-12-08 15:57 ` [PATCH 02/11] xfs: cleanup xfs_iomap_eof_align_last_fsb Christoph Hellwig
` (9 subsequent siblings)
10 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-08 15:57 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-kill-xfs_itruncate_data --]
[-- Type: text/plain, Size: 14594 bytes --]
This wrapper isn't overly useful, not to say rather confusing.
Around the call to xfs_itruncate_extents it does:
- add tracing
- add a few asserts in debug builds
- conditionally update the inode size in two places
- log the inode
Both the tracing and the inode logging can be moved to xfs_itruncate_extents
as they are useful for the attribute fork as well - in fact the attr code
already does an equivalent xfs_trans_log_inode call just after calling
xfs_itruncate_extents. The conditional size updates are a mess, and there
was no reason to do them in two places anyway, as the first one was
conditional on the inode having extents - but without extents we
xfs_itruncate_extents would be a no-op and the placement wouldn't matter
anyway. Instead move the size assignments and the asserts that make sense
to the callers that want it.
As a side effect of this clean up xfs_setattr_size by introducing variables
for the old and new inode size, and moving the size updates into a common
place.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_attr.c | 4 -
fs/xfs/xfs_inode.c | 124 +++--------------------------------------------
fs/xfs/xfs_inode.h | 2
fs/xfs/xfs_iops.c | 47 +++++++++++------
fs/xfs/xfs_qm_syscalls.c | 9 +++
fs/xfs/xfs_trace.h | 4 -
fs/xfs/xfs_vnodeops.c | 17 +++++-
7 files changed, 65 insertions(+), 142 deletions(-)
Index: xfs/fs/xfs/xfs_attr.c
===================================================================
--- xfs.orig/fs/xfs/xfs_attr.c 2011-11-30 12:58:07.820044461 +0100
+++ xfs/fs/xfs/xfs_attr.c 2011-11-30 12:58:51.519807719 +0100
@@ -827,10 +827,6 @@ xfs_attr_inactive(xfs_inode_t *dp)
if (error)
goto out;
- /*
- * Commit the last in the sequence of transactions.
- */
- xfs_trans_log_inode(trans, dp, XFS_ILOG_CORE);
error = xfs_trans_commit(trans, XFS_TRANS_RELEASE_LOG_RES);
xfs_iunlock(dp, XFS_ILOCK_EXCL);
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c 2011-11-30 12:58:07.830044408 +0100
+++ xfs/fs/xfs/xfs_inode.c 2011-11-30 12:58:51.519807719 +0100
@@ -1166,52 +1166,6 @@ xfs_ialloc(
}
/*
- * Check to make sure that there are no blocks allocated to the
- * file beyond the size of the file. We don't check this for
- * files with fixed size extents or real time extents, but we
- * at least do it for regular files.
- */
-#ifdef DEBUG
-STATIC void
-xfs_isize_check(
- struct xfs_inode *ip,
- xfs_fsize_t isize)
-{
- struct xfs_mount *mp = ip->i_mount;
- xfs_fileoff_t map_first;
- int nimaps;
- xfs_bmbt_irec_t imaps[2];
- int error;
-
- if (!S_ISREG(ip->i_d.di_mode))
- return;
-
- if (XFS_IS_REALTIME_INODE(ip))
- return;
-
- if (ip->i_d.di_flags & XFS_DIFLAG_EXTSIZE)
- return;
-
- nimaps = 2;
- map_first = XFS_B_TO_FSB(mp, (xfs_ufsize_t)isize);
- /*
- * The filesystem could be shutting down, so bmapi may return
- * an error.
- */
- error = xfs_bmapi_read(ip, map_first,
- (XFS_B_TO_FSB(mp,
- (xfs_ufsize_t)XFS_MAXIOFFSET(mp)) - map_first),
- imaps, &nimaps, XFS_BMAPI_ENTIRE);
- if (error)
- return;
- ASSERT(nimaps == 1);
- ASSERT(imaps[0].br_startblock == HOLESTARTBLOCK);
-}
-#else /* DEBUG */
-#define xfs_isize_check(ip, isize)
-#endif /* DEBUG */
-
-/*
* Free up the underlying blocks past new_size. The new size must be smaller
* than the current size. This routine can be used both for the attribute and
* data fork, and does not modify the inode size, which is left to the caller.
@@ -1258,6 +1212,8 @@ xfs_itruncate_extents(
ASSERT(ip->i_itemp->ili_lock_flags == 0);
ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
+ trace_xfs_itruncate_extents_start(ip, new_size);
+
/*
* Since it is possible for space to become allocated beyond
* the end of the file (in a crash where the space is allocated
@@ -1325,6 +1281,14 @@ xfs_itruncate_extents(
goto out;
}
+ /*
+ * Always re-log the inode so that our permanent transaction can keep
+ * on rolling it forward in the log.
+ */
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+ trace_xfs_itruncate_extents_end(ip, new_size);
+
out:
*tpp = tp;
return error;
@@ -1338,74 +1302,6 @@ out_bmap_cancel:
goto out;
}
-int
-xfs_itruncate_data(
- struct xfs_trans **tpp,
- struct xfs_inode *ip,
- xfs_fsize_t new_size)
-{
- int error;
-
- trace_xfs_itruncate_data_start(ip, new_size);
-
- /*
- * The first thing we do is set the size to new_size permanently on
- * disk. This way we don't have to worry about anyone ever being able
- * to look at the data being freed even in the face of a crash.
- * What we're getting around here is the case where we free a block, it
- * is allocated to another file, it is written to, and then we crash.
- * If the new data gets written to the file but the log buffers
- * containing the free and reallocation don't, then we'd end up with
- * garbage in the blocks being freed. As long as we make the new_size
- * permanent before actually freeing any blocks it doesn't matter if
- * they get written to.
- */
- if (ip->i_d.di_nextents > 0) {
- /*
- * If we are not changing the file size then do not update
- * the on-disk file size - we may be called from
- * xfs_inactive_free_eofblocks(). If we update the on-disk
- * file size and then the system crashes before the contents
- * of the file are flushed to disk then the files may be
- * full of holes (ie NULL files bug).
- */
- if (ip->i_size != new_size) {
- ip->i_d.di_size = new_size;
- ip->i_size = new_size;
- xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
- }
- }
-
- error = xfs_itruncate_extents(tpp, ip, XFS_DATA_FORK, new_size);
- if (error)
- return error;
-
- /*
- * If we are not changing the file size then do not update the on-disk
- * file size - we may be called from xfs_inactive_free_eofblocks().
- * If we update the on-disk file size and then the system crashes
- * before the contents of the file are flushed to disk then the files
- * may be full of holes (ie NULL files bug).
- */
- xfs_isize_check(ip, new_size);
- if (ip->i_size != new_size) {
- ip->i_d.di_size = new_size;
- ip->i_size = new_size;
- }
-
- ASSERT(new_size != 0 || ip->i_delayed_blks == 0);
- ASSERT(new_size != 0 || ip->i_d.di_nextents == 0);
-
- /*
- * Always re-log the inode so that our permanent transaction can keep
- * on rolling it forward in the log.
- */
- xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
-
- trace_xfs_itruncate_data_end(ip, new_size);
- return 0;
-}
-
/*
* This is called when the inode's link count goes to 0.
* We place the on-disk inode on a list in the AGI. It
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h 2011-11-30 12:58:08.843372251 +0100
+++ xfs/fs/xfs/xfs_inode.h 2011-11-30 12:58:51.523141034 +0100
@@ -491,8 +491,6 @@ int xfs_ifree(struct xfs_trans *, xfs_i
struct xfs_bmap_free *);
int xfs_itruncate_extents(struct xfs_trans **, struct xfs_inode *,
int, xfs_fsize_t);
-int xfs_itruncate_data(struct xfs_trans **, struct xfs_inode *,
- xfs_fsize_t);
int xfs_iunlink(struct xfs_trans *, xfs_inode_t *);
void xfs_iext_realloc(xfs_inode_t *, int, int);
Index: xfs/fs/xfs/xfs_iops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iops.c 2011-11-30 12:58:07.856710929 +0100
+++ xfs/fs/xfs/xfs_iops.c 2011-11-30 12:58:51.523141034 +0100
@@ -750,6 +750,7 @@ xfs_setattr_size(
struct xfs_mount *mp = ip->i_mount;
struct inode *inode = VFS_I(ip);
int mask = iattr->ia_valid;
+ xfs_off_t oldsize, newsize;
struct xfs_trans *tp;
int error;
uint lock_flags;
@@ -777,11 +778,13 @@ xfs_setattr_size(
lock_flags |= XFS_IOLOCK_EXCL;
xfs_ilock(ip, lock_flags);
+ oldsize = ip->i_size;
+ newsize = iattr->ia_size;
+
/*
* Short circuit the truncate case for zero length files.
*/
- if (iattr->ia_size == 0 &&
- ip->i_size == 0 && ip->i_d.di_nextents == 0) {
+ if (newsize == 0 && oldsize == 0 && ip->i_d.di_nextents == 0) {
if (!(mask & (ATTR_CTIME|ATTR_MTIME)))
goto out_unlock;
@@ -807,14 +810,14 @@ xfs_setattr_size(
* the inode to the transaction, because the inode cannot be unlocked
* once it is a part of the transaction.
*/
- if (iattr->ia_size > ip->i_size) {
+ if (newsize > oldsize) {
/*
* Do the first part of growing a file: zero any data in the
* last block that is beyond the old EOF. We need to do this
* before the inode is joined to the transaction to modify
* i_size.
*/
- error = xfs_zero_eof(ip, iattr->ia_size, ip->i_size);
+ error = xfs_zero_eof(ip, newsize, oldsize);
if (error)
goto out_unlock;
}
@@ -833,8 +836,8 @@ xfs_setattr_size(
* here and prevents waiting for other data not within the range we
* care about here.
*/
- if (ip->i_size != ip->i_d.di_size && iattr->ia_size > ip->i_d.di_size) {
- error = xfs_flush_pages(ip, ip->i_d.di_size, iattr->ia_size, 0,
+ if (oldsize != ip->i_d.di_size && newsize > ip->i_d.di_size) {
+ error = xfs_flush_pages(ip, ip->i_d.di_size, newsize, 0,
FI_NONE);
if (error)
goto out_unlock;
@@ -845,8 +848,7 @@ xfs_setattr_size(
*/
inode_dio_wait(inode);
- error = -block_truncate_page(inode->i_mapping, iattr->ia_size,
- xfs_get_blocks);
+ error = -block_truncate_page(inode->i_mapping, newsize, xfs_get_blocks);
if (error)
goto out_unlock;
@@ -857,7 +859,7 @@ xfs_setattr_size(
if (error)
goto out_trans_cancel;
- truncate_setsize(inode, iattr->ia_size);
+ truncate_setsize(inode, newsize);
commit_flags = XFS_TRANS_RELEASE_LOG_RES;
lock_flags |= XFS_ILOCK_EXCL;
@@ -876,19 +878,30 @@ xfs_setattr_size(
* these flags set. For all other operations the VFS set these flags
* explicitly if it wants a timestamp update.
*/
- if (iattr->ia_size != ip->i_size &&
- (!(mask & (ATTR_CTIME | ATTR_MTIME)))) {
+ if (newsize != oldsize && (!(mask & (ATTR_CTIME | ATTR_MTIME)))) {
iattr->ia_ctime = iattr->ia_mtime =
current_fs_time(inode->i_sb);
mask |= ATTR_CTIME | ATTR_MTIME;
}
- if (iattr->ia_size > ip->i_size) {
- ip->i_d.di_size = iattr->ia_size;
- ip->i_size = iattr->ia_size;
- } else if (iattr->ia_size <= ip->i_size ||
- (iattr->ia_size == 0 && ip->i_d.di_nextents)) {
- error = xfs_itruncate_data(&tp, ip, iattr->ia_size);
+ /*
+ * The first thing we do is set the size to new_size permanently on
+ * disk. This way we don't have to worry about anyone ever being able
+ * to look at the data being freed even in the face of a crash.
+ * What we're getting around here is the case where we free a block, it
+ * is allocated to another file, it is written to, and then we crash.
+ * If the new data gets written to the file but the log buffers
+ * containing the free and reallocation don't, then we'd end up with
+ * garbage in the blocks being freed. As long as we make the new size
+ * permanent before actually freeing any blocks it doesn't matter if
+ * they get written to.
+ */
+ ip->i_d.di_size = newsize;
+ ip->i_size = newsize;
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+ if (newsize <= oldsize) {
+ error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, newsize);
if (error)
goto out_trans_abort;
Index: xfs/fs/xfs/xfs_qm_syscalls.c
===================================================================
--- xfs.orig/fs/xfs/xfs_qm_syscalls.c 2011-11-30 12:58:07.866710876 +0100
+++ xfs/fs/xfs/xfs_qm_syscalls.c 2011-11-30 12:58:51.523141034 +0100
@@ -31,6 +31,7 @@
#include "xfs_mount.h"
#include "xfs_bmap_btree.h"
#include "xfs_inode.h"
+#include "xfs_inode_item.h"
#include "xfs_itable.h"
#include "xfs_bmap.h"
#include "xfs_rtalloc.h"
@@ -263,13 +264,19 @@ xfs_qm_scall_trunc_qfile(
xfs_ilock(ip, XFS_ILOCK_EXCL);
xfs_trans_ijoin(tp, ip, 0);
- error = xfs_itruncate_data(&tp, ip, 0);
+ ip->i_d.di_size = 0;
+ ip->i_size = 0;
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+ error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
if (error) {
xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES |
XFS_TRANS_ABORT);
goto out_unlock;
}
+ ASSERT(ip->i_d.di_nextents == 0);
+
xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
Index: xfs/fs/xfs/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/xfs_trace.h 2011-11-30 12:58:07.876710822 +0100
+++ xfs/fs/xfs/xfs_trace.h 2011-11-30 12:58:51.523141034 +0100
@@ -1090,8 +1090,8 @@ DECLARE_EVENT_CLASS(xfs_itrunc_class,
DEFINE_EVENT(xfs_itrunc_class, name, \
TP_PROTO(struct xfs_inode *ip, xfs_fsize_t new_size), \
TP_ARGS(ip, new_size))
-DEFINE_ITRUNC_EVENT(xfs_itruncate_data_start);
-DEFINE_ITRUNC_EVENT(xfs_itruncate_data_end);
+DEFINE_ITRUNC_EVENT(xfs_itruncate_extents_start);
+DEFINE_ITRUNC_EVENT(xfs_itruncate_extents_end);
TRACE_EVENT(xfs_pagecache_inval,
TP_PROTO(struct xfs_inode *ip, xfs_off_t start, xfs_off_t finish),
Index: xfs/fs/xfs/xfs_vnodeops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_vnodeops.c 2011-11-30 12:58:07.893377397 +0100
+++ xfs/fs/xfs/xfs_vnodeops.c 2011-11-30 12:58:51.526474350 +0100
@@ -226,7 +226,14 @@ xfs_free_eofblocks(
xfs_ilock(ip, XFS_ILOCK_EXCL);
xfs_trans_ijoin(tp, ip, 0);
- error = xfs_itruncate_data(&tp, ip, ip->i_size);
+ /*
+ * Do not update the on-disk file size. If we update the
+ * on-disk file size and then the system crashes before the
+ * contents of the file are flushed to disk then the files
+ * may be full of holes (ie NULL files bug).
+ */
+ error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK,
+ ip->i_size);
if (error) {
/*
* If we get an error at this point we simply don't
@@ -670,13 +677,19 @@ xfs_inactive(
xfs_ilock(ip, XFS_ILOCK_EXCL);
xfs_trans_ijoin(tp, ip, 0);
- error = xfs_itruncate_data(&tp, ip, 0);
+ ip->i_d.di_size = 0;
+ ip->i_size = 0;
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+ error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
if (error) {
xfs_trans_cancel(tp,
XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
xfs_iunlock(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL);
return VN_INACTIVE_CACHE;
}
+
+ ASSERT(ip->i_d.di_nextents == 0);
} else if (S_ISLNK(ip->i_d.di_mode)) {
/*
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 01/11] xfs: remove xfs_itruncate_data
2011-12-08 15:57 ` [PATCH 01/11] xfs: remove xfs_itruncate_data Christoph Hellwig
@ 2011-12-13 21:23 ` Dave Chinner
0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2011-12-13 21:23 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Dec 08, 2011 at 10:57:56AM -0500, Christoph Hellwig wrote:
> This wrapper isn't overly useful, not to say rather confusing.
>
> Around the call to xfs_itruncate_extents it does:
>
> - add tracing
> - add a few asserts in debug builds
> - conditionally update the inode size in two places
> - log the inode
>
> Both the tracing and the inode logging can be moved to xfs_itruncate_extents
> as they are useful for the attribute fork as well - in fact the attr code
> already does an equivalent xfs_trans_log_inode call just after calling
> xfs_itruncate_extents. The conditional size updates are a mess, and there
> was no reason to do them in two places anyway, as the first one was
> conditional on the inode having extents - but without extents we
> xfs_itruncate_extents would be a no-op and the placement wouldn't matter
> anyway. Instead move the size assignments and the asserts that make sense
> to the callers that want it.
>
> As a side effect of this clean up xfs_setattr_size by introducing variables
> for the old and new inode size, and moving the size updates into a common
> place.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
I can't see anything wrong with this, and it hasn't produced any
failures in my testing, so it looks OK.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 02/11] xfs: cleanup xfs_iomap_eof_align_last_fsb
2011-12-08 15:57 [PATCH 00/11] inode shrink and misc updates Christoph Hellwig
2011-12-08 15:57 ` [PATCH 01/11] xfs: remove xfs_itruncate_data Christoph Hellwig
@ 2011-12-08 15:57 ` Christoph Hellwig
2011-12-13 21:25 ` Dave Chinner
2011-12-08 15:57 ` [PATCH 03/11] xfs: remove the unused dm_attrs structure Christoph Hellwig
` (8 subsequent siblings)
10 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-08 15:57 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-cleanup-xfs_iomap_eof_align_last_fsb --]
[-- Type: text/plain, Size: 2230 bytes --]
Replace the nasty if, else if, elseif condition with more natural C flow
that expressed the logic we want here better.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_iomap.c | 36 ++++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)
Index: xfs/fs/xfs/xfs_iomap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iomap.c 2011-11-17 12:07:52.580802800 +0100
+++ xfs/fs/xfs/xfs_iomap.c 2011-11-30 10:57:35.649224495 +0100
@@ -57,26 +57,26 @@ xfs_iomap_eof_align_last_fsb(
xfs_fileoff_t *last_fsb)
{
xfs_fileoff_t new_last_fsb = 0;
- xfs_extlen_t align;
+ xfs_extlen_t align = 0;
int eof, error;
- if (XFS_IS_REALTIME_INODE(ip))
- ;
- /*
- * If mounted with the "-o swalloc" option, roundup the allocation
- * request to a stripe width boundary if the file size is >=
- * stripe width and we are allocating past the allocation eof.
- */
- else if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC) &&
- (ip->i_size >= XFS_FSB_TO_B(mp, mp->m_swidth)))
- new_last_fsb = roundup_64(*last_fsb, mp->m_swidth);
- /*
- * Roundup the allocation request to a stripe unit (m_dalign) boundary
- * if the file size is >= stripe unit size, and we are allocating past
- * the allocation eof.
- */
- else if (mp->m_dalign && (ip->i_size >= XFS_FSB_TO_B(mp, mp->m_dalign)))
- new_last_fsb = roundup_64(*last_fsb, mp->m_dalign);
+ if (!XFS_IS_REALTIME_INODE(ip)) {
+ /*
+ * Round up the allocation request to a stripe unit
+ * (m_dalign) boundary if the file size is >= stripe unit
+ * size, and we are allocating past the allocation eof.
+ *
+ * If mounted with the "-o swalloc" option the alignment is
+ * increased from the strip unit size to the stripe width.
+ */
+ if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC))
+ align = mp->m_swidth;
+ else if (mp->m_dalign)
+ align = mp->m_dalign;
+
+ if (align && ip->i_size >= XFS_FSB_TO_B(mp, align))
+ new_last_fsb = roundup_64(*last_fsb, align);
+ }
/*
* Always round up the allocation request to an extent boundary
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread* [PATCH 03/11] xfs: remove the unused dm_attrs structure
2011-12-08 15:57 [PATCH 00/11] inode shrink and misc updates Christoph Hellwig
2011-12-08 15:57 ` [PATCH 01/11] xfs: remove xfs_itruncate_data Christoph Hellwig
2011-12-08 15:57 ` [PATCH 02/11] xfs: cleanup xfs_iomap_eof_align_last_fsb Christoph Hellwig
@ 2011-12-08 15:57 ` Christoph Hellwig
2011-12-08 15:57 ` [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork Christoph Hellwig
` (7 subsequent siblings)
10 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-08 15:57 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-remove-dm_attrs --]
[-- Type: text/plain, Size: 1190 bytes --]
.. and the just as dead bhv_desc forward declaration while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode.h | 7 -------
1 file changed, 7 deletions(-)
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h 2011-11-30 10:56:57.396098395 +0100
+++ xfs/fs/xfs/xfs_inode.h 2011-11-30 10:57:46.375833051 +0100
@@ -211,7 +211,6 @@ typedef struct xfs_icdinode {
#ifdef __KERNEL__
-struct bhv_desc;
struct xfs_buf;
struct xfs_bmap_free;
struct xfs_bmbt_irec;
@@ -220,12 +219,6 @@ struct xfs_mount;
struct xfs_trans;
struct xfs_dquot;
-typedef struct dm_attrs_s {
- __uint32_t da_dmevmask; /* DMIG event mask */
- __uint16_t da_dmstate; /* DMIG state info */
- __uint16_t da_pad; /* DMIG extra padding */
-} dm_attrs_t;
-
typedef struct xfs_inode {
/* Inode linking and identification information. */
struct xfs_mount *i_mount; /* fs mount struct ptr */
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread* [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork
2011-12-08 15:57 [PATCH 00/11] inode shrink and misc updates Christoph Hellwig
` (2 preceding siblings ...)
2011-12-08 15:57 ` [PATCH 03/11] xfs: remove the unused dm_attrs structure Christoph Hellwig
@ 2011-12-08 15:57 ` Christoph Hellwig
2011-12-13 21:59 ` Dave Chinner
2011-12-08 15:58 ` [PATCH 05/11] xfs: make i_flags an unsigned long Christoph Hellwig
` (6 subsequent siblings)
10 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-08 15:57 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-kill-ifp-if_ext_max --]
[-- Type: text/plain, Size: 19903 bytes --]
We spent a lot of effort to maintain this field, but it always equalts to the
fork size divided by the constant size of an extent. The prime use of it is
to assert that the two stay in sync. Just divide the fork size by the extent
size in the few places that we actually use it and remove the overhead
of maintaining it. Also introduce a few helpers to consolidate the places
where we actually care about the value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_attr_leaf.c | 9 ----
fs/xfs/xfs_bmap.c | 101 +++++++++++++++++++++---------------------------
fs/xfs/xfs_dfrag.c | 39 +++++++++---------
fs/xfs/xfs_iget.c | 2
fs/xfs/xfs_inode.c | 30 ++++----------
fs/xfs/xfs_inode.h | 4 -
fs/xfs/xfs_inode_item.c | 2
fs/xfs/xfs_trace.h | 5 --
8 files changed, 78 insertions(+), 114 deletions(-)
Index: xfs/fs/xfs/xfs_attr_leaf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_attr_leaf.c 2011-12-06 15:34:26.000000000 +0100
+++ xfs/fs/xfs/xfs_attr_leaf.c 2011-12-07 11:17:02.336317626 +0100
@@ -271,10 +271,6 @@ xfs_attr_shortform_add(xfs_da_args_t *ar
dp = args->dp;
mp = dp->i_mount;
dp->i_d.di_forkoff = forkoff;
- dp->i_df.if_ext_max =
- XFS_IFORK_DSIZE(dp) / (uint)sizeof(xfs_bmbt_rec_t);
- dp->i_afp->if_ext_max =
- XFS_IFORK_ASIZE(dp) / (uint)sizeof(xfs_bmbt_rec_t);
ifp = dp->i_afp;
ASSERT(ifp->if_flags & XFS_IFINLINE);
@@ -326,7 +322,6 @@ xfs_attr_fork_reset(
ASSERT(ip->i_d.di_anextents == 0);
ASSERT(ip->i_afp == NULL);
- ip->i_df.if_ext_max = XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t);
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
}
@@ -389,10 +384,6 @@ xfs_attr_shortform_remove(xfs_da_args_t
(args->op_flags & XFS_DA_OP_ADDNAME) ||
!(mp->m_flags & XFS_MOUNT_ATTR2) ||
dp->i_d.di_format == XFS_DINODE_FMT_BTREE);
- dp->i_afp->if_ext_max =
- XFS_IFORK_ASIZE(dp) / (uint)sizeof(xfs_bmbt_rec_t);
- dp->i_df.if_ext_max =
- XFS_IFORK_DSIZE(dp) / (uint)sizeof(xfs_bmbt_rec_t);
xfs_trans_log_inode(args->trans, dp,
XFS_ILOG_CORE | XFS_ILOG_ADATA);
}
Index: xfs/fs/xfs/xfs_bmap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_bmap.c 2011-12-07 11:15:25.863506932 +0100
+++ xfs/fs/xfs/xfs_bmap.c 2011-12-07 11:17:02.339650941 +0100
@@ -249,7 +249,27 @@ xfs_bmbt_lookup_ge(
}
/*
-* Update the record referred to by cur to the value given
+ * Check if the inode needs to be converted to btree format.
+ */
+static inline bool xfs_bmap_needs_btree(struct xfs_inode *ip, int whichfork)
+{
+ return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
+ XFS_IFORK_NEXTENTS(ip, whichfork) >
+ XFS_IFORK_MAXEXT(ip, whichfork);
+}
+
+/*
+ * Check if the inode should be converted to extent format.
+ */
+static inline bool xfs_bmap_wants_extents(struct xfs_inode *ip, int whichfork)
+{
+ return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
+ XFS_IFORK_NEXTENTS(ip, whichfork) <=
+ XFS_IFORK_MAXEXT(ip, whichfork);
+}
+
+/*
+ * Update the record referred to by cur to the value given
* by [off, bno, len, state].
* This either works (return 0) or gets an EFSCORRUPTED error.
*/
@@ -683,8 +703,8 @@ xfs_bmap_add_extent_delay_real(
goto done;
XFS_WANT_CORRUPTED_GOTO(i == 1, done);
}
- if (bma->ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
- bma->ip->i_d.di_nextents > bma->ip->i_df.if_ext_max) {
+
+ if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
bma->firstblock, bma->flist,
&bma->cur, 1, &tmp_rval, XFS_DATA_FORK);
@@ -767,8 +787,8 @@ xfs_bmap_add_extent_delay_real(
goto done;
XFS_WANT_CORRUPTED_GOTO(i == 1, done);
}
- if (bma->ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
- bma->ip->i_d.di_nextents > bma->ip->i_df.if_ext_max) {
+
+ if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
bma->firstblock, bma->flist, &bma->cur, 1,
&tmp_rval, XFS_DATA_FORK);
@@ -836,8 +856,8 @@ xfs_bmap_add_extent_delay_real(
goto done;
XFS_WANT_CORRUPTED_GOTO(i == 1, done);
}
- if (bma->ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
- bma->ip->i_d.di_nextents > bma->ip->i_df.if_ext_max) {
+
+ if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
bma->firstblock, bma->flist, &bma->cur,
1, &tmp_rval, XFS_DATA_FORK);
@@ -884,8 +904,7 @@ xfs_bmap_add_extent_delay_real(
}
/* convert to a btree if necessary */
- if (XFS_IFORK_FORMAT(bma->ip, XFS_DATA_FORK) == XFS_DINODE_FMT_EXTENTS &&
- XFS_IFORK_NEXTENTS(bma->ip, XFS_DATA_FORK) > ifp->if_ext_max) {
+ if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
int tmp_logflags; /* partial log flag return val */
ASSERT(bma->cur == NULL);
@@ -1421,8 +1440,7 @@ xfs_bmap_add_extent_unwritten_real(
}
/* convert to a btree if necessary */
- if (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_EXTENTS &&
- XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) > ifp->if_ext_max) {
+ if (xfs_bmap_needs_btree(ip, XFS_DATA_FORK)) {
int tmp_logflags; /* partial log flag return val */
ASSERT(cur == NULL);
@@ -1812,8 +1830,7 @@ xfs_bmap_add_extent_hole_real(
}
/* convert to a btree if necessary */
- if (XFS_IFORK_FORMAT(bma->ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
- XFS_IFORK_NEXTENTS(bma->ip, whichfork) > ifp->if_ext_max) {
+ if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
int tmp_logflags; /* partial log flag return val */
ASSERT(bma->cur == NULL);
@@ -3037,8 +3054,7 @@ xfs_bmap_extents_to_btree(
ifp = XFS_IFORK_PTR(ip, whichfork);
ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS);
- ASSERT(ifp->if_ext_max ==
- XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
+
/*
* Make space in the inode incore.
*/
@@ -3184,13 +3200,8 @@ xfs_bmap_forkoff_reset(
ip->i_d.di_format != XFS_DINODE_FMT_BTREE) {
uint dfl_forkoff = xfs_default_attroffset(ip) >> 3;
- if (dfl_forkoff > ip->i_d.di_forkoff) {
+ if (dfl_forkoff > ip->i_d.di_forkoff)
ip->i_d.di_forkoff = dfl_forkoff;
- ip->i_df.if_ext_max =
- XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t);
- ip->i_afp->if_ext_max =
- XFS_IFORK_ASIZE(ip) / sizeof(xfs_bmbt_rec_t);
- }
}
}
@@ -3430,8 +3441,6 @@ xfs_bmap_add_attrfork(
int error; /* error return value */
ASSERT(XFS_IFORK_Q(ip) == 0);
- ASSERT(ip->i_df.if_ext_max ==
- XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t));
mp = ip->i_mount;
ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
@@ -3486,12 +3495,9 @@ xfs_bmap_add_attrfork(
error = XFS_ERROR(EINVAL);
goto error1;
}
- ip->i_df.if_ext_max =
- XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
+
ASSERT(ip->i_afp == NULL);
ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone, KM_SLEEP);
- ip->i_afp->if_ext_max =
- XFS_IFORK_ASIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
ip->i_afp->if_flags = XFS_IFEXTENTS;
logflags = 0;
xfs_bmap_init(&flist, &firstblock);
@@ -3535,20 +3541,17 @@ xfs_bmap_add_attrfork(
} else
spin_unlock(&mp->m_sb_lock);
}
- if ((error = xfs_bmap_finish(&tp, &flist, &committed)))
+
+ error = xfs_bmap_finish(&tp, &flist, &committed);
+ if (error)
goto error2;
- error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
- ASSERT(ip->i_df.if_ext_max ==
- XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t));
- return error;
+ return xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
error2:
xfs_bmap_cancel(&flist);
error1:
xfs_iunlock(ip, XFS_ILOCK_EXCL);
error0:
xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_ABORT);
- ASSERT(ip->i_df.if_ext_max ==
- XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t));
return error;
}
@@ -4379,8 +4382,6 @@ xfs_bmapi_read(
XFS_STATS_INC(xs_blk_mapr);
ifp = XFS_IFORK_PTR(ip, whichfork);
- ASSERT(ifp->if_ext_max ==
- XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
if (!(ifp->if_flags & XFS_IFEXTENTS)) {
error = xfs_iread_extents(NULL, ip, whichfork);
@@ -4871,8 +4872,6 @@ xfs_bmapi_write(
return XFS_ERROR(EIO);
ifp = XFS_IFORK_PTR(ip, whichfork);
- ASSERT(ifp->if_ext_max ==
- XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
XFS_STATS_INC(xs_blk_mapw);
@@ -4981,8 +4980,7 @@ xfs_bmapi_write(
/*
* Transform from btree to extents, give it cur.
*/
- if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
- XFS_IFORK_NEXTENTS(ip, whichfork) <= ifp->if_ext_max) {
+ if (xfs_bmap_wants_extents(ip, whichfork)) {
int tmp_logflags = 0;
ASSERT(bma.cur);
@@ -4992,10 +4990,10 @@ xfs_bmapi_write(
if (error)
goto error0;
}
- ASSERT(ifp->if_ext_max ==
- XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
+
ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE ||
- XFS_IFORK_NEXTENTS(ip, whichfork) > ifp->if_ext_max);
+ XFS_IFORK_NEXTENTS(ip, whichfork) >
+ XFS_IFORK_MAXEXT(ip, whichfork));
error = 0;
error0:
/*
@@ -5095,8 +5093,7 @@ xfs_bunmapi(
ASSERT(len > 0);
ASSERT(nexts >= 0);
- ASSERT(ifp->if_ext_max ==
- XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
+
if (!(ifp->if_flags & XFS_IFEXTENTS) &&
(error = xfs_iread_extents(tp, ip, whichfork)))
return error;
@@ -5321,8 +5318,7 @@ xfs_bunmapi(
* will be dirty.
*/
if (!wasdel && xfs_trans_get_block_res(tp) == 0 &&
- XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
- XFS_IFORK_NEXTENTS(ip, whichfork) >= ifp->if_ext_max &&
+ xfs_bmap_needs_btree(ip, whichfork) &&
del.br_startoff > got.br_startoff &&
del.br_startoff + del.br_blockcount <
got.br_startoff + got.br_blockcount) {
@@ -5353,13 +5349,11 @@ nodelete:
}
}
*done = bno == (xfs_fileoff_t)-1 || bno < start || lastx < 0;
- ASSERT(ifp->if_ext_max ==
- XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
+
/*
* Convert to a btree if necessary.
*/
- if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
- XFS_IFORK_NEXTENTS(ip, whichfork) > ifp->if_ext_max) {
+ if (xfs_bmap_needs_btree(ip, whichfork)) {
ASSERT(cur == NULL);
error = xfs_bmap_extents_to_btree(tp, ip, firstblock, flist,
&cur, 0, &tmp_logflags, whichfork);
@@ -5370,8 +5364,7 @@ nodelete:
/*
* transform from btree to extents, give it cur
*/
- else if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
- XFS_IFORK_NEXTENTS(ip, whichfork) <= ifp->if_ext_max) {
+ else if (xfs_bmap_wants_extents(ip, whichfork)) {
ASSERT(cur != NULL);
error = xfs_bmap_btree_to_extents(tp, ip, cur, &tmp_logflags,
whichfork);
@@ -5382,8 +5375,6 @@ nodelete:
/*
* transform from extents to local?
*/
- ASSERT(ifp->if_ext_max ==
- XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
error = 0;
error0:
/*
Index: xfs/fs/xfs/xfs_dfrag.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dfrag.c 2011-12-02 19:39:31.437161062 +0100
+++ xfs/fs/xfs/xfs_dfrag.c 2011-12-07 11:17:02.342984256 +0100
@@ -163,12 +163,14 @@ xfs_swap_extents_check_format(
/* Check temp in extent form to max in target */
if (tip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
- XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) > ip->i_df.if_ext_max)
+ XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) >
+ XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK))
return EINVAL;
/* Check target in extent form to max in temp */
if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
- XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) > tip->i_df.if_ext_max)
+ XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) >
+ XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK))
return EINVAL;
/*
@@ -180,18 +182,25 @@ xfs_swap_extents_check_format(
* (a common defrag case) which will occur when the temp inode is in
* extent format...
*/
- if (tip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
- ((XFS_IFORK_BOFF(ip) &&
- tip->i_df.if_broot_bytes > XFS_IFORK_BOFF(ip)) ||
- XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) <= ip->i_df.if_ext_max))
+ if (tip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
+ if (XFS_IFORK_BOFF(ip) &&
+ tip->i_df.if_broot_bytes > XFS_IFORK_BOFF(ip))
+ return EINVAL;
+ if (XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) <=
+ XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK))
return EINVAL;
+ }
/* Reciprocal target->temp btree format checks */
- if (ip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
- ((XFS_IFORK_BOFF(tip) &&
- ip->i_df.if_broot_bytes > XFS_IFORK_BOFF(tip)) ||
- XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) <= tip->i_df.if_ext_max))
+ if (ip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
+ if (XFS_IFORK_BOFF(tip) &&
+ ip->i_df.if_broot_bytes > XFS_IFORK_BOFF(tip))
+ return EINVAL;
+
+ if (XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) <=
+ XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK))
return EINVAL;
+ }
return 0;
}
@@ -349,16 +358,6 @@ xfs_swap_extents(
*tifp = *tempifp; /* struct copy */
/*
- * Fix the in-memory data fork values that are dependent on the fork
- * offset in the inode. We can't assume they remain the same as attr2
- * has dynamic fork offsets.
- */
- ifp->if_ext_max = XFS_IFORK_SIZE(ip, XFS_DATA_FORK) /
- (uint)sizeof(xfs_bmbt_rec_t);
- tifp->if_ext_max = XFS_IFORK_SIZE(tip, XFS_DATA_FORK) /
- (uint)sizeof(xfs_bmbt_rec_t);
-
- /*
* Fix the on-disk inode values
*/
tmp = (__uint64_t)ip->i_d.di_nblocks;
Index: xfs/fs/xfs/xfs_iget.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iget.c 2011-12-02 19:39:31.447161062 +0100
+++ xfs/fs/xfs/xfs_iget.c 2011-12-07 11:17:02.342984256 +0100
@@ -451,8 +451,6 @@ again:
*ipp = ip;
- ASSERT(ip->i_df.if_ext_max ==
- XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t));
/*
* If we have a real type for an on-disk inode, we can set ops(&unlock)
* now. If it's a new inode being created, xfs_ialloc will handle it.
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c 2011-12-07 11:15:33.600131684 +0100
+++ xfs/fs/xfs/xfs_inode.c 2011-12-07 11:17:02.342984256 +0100
@@ -299,11 +299,8 @@ xfs_iformat(
{
xfs_attr_shortform_t *atp;
int size;
- int error;
+ int error = 0;
xfs_fsize_t di_size;
- ip->i_df.if_ext_max =
- XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
- error = 0;
if (unlikely(be32_to_cpu(dip->di_nextents) +
be16_to_cpu(dip->di_anextents) >
@@ -409,10 +406,10 @@ xfs_iformat(
}
if (!XFS_DFORK_Q(dip))
return 0;
+
ASSERT(ip->i_afp == NULL);
ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone, KM_SLEEP | KM_NOFS);
- ip->i_afp->if_ext_max =
- XFS_IFORK_ASIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
+
switch (dip->di_aformat) {
case XFS_DINODE_FMT_LOCAL:
atp = (xfs_attr_shortform_t *)XFS_DFORK_APTR(dip);
@@ -604,10 +601,11 @@ xfs_iformat_btree(
* or the number of extents is greater than the number of
* blocks.
*/
- if (unlikely(XFS_IFORK_NEXTENTS(ip, whichfork) <= ifp->if_ext_max
- || XFS_BMDR_SPACE_CALC(nrecs) >
- XFS_DFORK_SIZE(dip, ip->i_mount, whichfork)
- || XFS_IFORK_NEXTENTS(ip, whichfork) > ip->i_d.di_nblocks)) {
+ if (unlikely(XFS_IFORK_NEXTENTS(ip, whichfork) <=
+ XFS_IFORK_MAXEXT(ip, whichfork) ||
+ XFS_BMDR_SPACE_CALC(nrecs) >
+ XFS_DFORK_SIZE(dip, ip->i_mount, whichfork) ||
+ XFS_IFORK_NEXTENTS(ip, whichfork) > ip->i_d.di_nblocks)) {
xfs_warn(ip->i_mount, "corrupt inode %Lu (btree).",
(unsigned long long) ip->i_ino);
XFS_CORRUPTION_ERROR("xfs_iformat_btree", XFS_ERRLEVEL_LOW,
@@ -835,12 +833,6 @@ xfs_iread(
* with the uninitialized part of it.
*/
ip->i_d.di_mode = 0;
- /*
- * Initialize the per-fork minima and maxima for a new
- * inode here. xfs_iformat will do it for old inodes.
- */
- ip->i_df.if_ext_max =
- XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
}
/*
@@ -1740,8 +1732,6 @@ xfs_ifree(
ip->i_d.di_flags = 0;
ip->i_d.di_dmevmask = 0;
ip->i_d.di_forkoff = 0; /* mark the attr fork not in use */
- ip->i_df.if_ext_max =
- XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
ip->i_d.di_aformat = XFS_DINODE_FMT_EXTENTS;
/*
@@ -2408,7 +2398,7 @@ xfs_iflush(
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
ASSERT(!completion_done(&ip->i_flush));
ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
- ip->i_d.di_nextents > ip->i_df.if_ext_max);
+ ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
iip = ip->i_itemp;
mp = ip->i_mount;
@@ -2524,7 +2514,7 @@ xfs_iflush_int(
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
ASSERT(!completion_done(&ip->i_flush));
ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
- ip->i_d.di_nextents > ip->i_df.if_ext_max);
+ ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
iip = ip->i_itemp;
mp = ip->i_mount;
Index: xfs/fs/xfs/xfs_inode_item.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode_item.c 2011-12-07 11:15:27.990162076 +0100
+++ xfs/fs/xfs/xfs_inode_item.c 2011-12-07 11:17:02.342984256 +0100
@@ -79,8 +79,6 @@ xfs_inode_item_size(
break;
case XFS_DINODE_FMT_BTREE:
- ASSERT(ip->i_df.if_ext_max ==
- XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t));
iip->ili_format.ilf_fields &=
~(XFS_ILOG_DDATA | XFS_ILOG_DEXT |
XFS_ILOG_DEV | XFS_ILOG_UUID);
Index: xfs/fs/xfs/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/xfs_trace.h 2011-12-07 11:15:33.603464999 +0100
+++ xfs/fs/xfs/xfs_trace.h 2011-12-07 11:17:02.346317572 +0100
@@ -1568,7 +1568,6 @@ DECLARE_EVENT_CLASS(xfs_swap_extent_clas
__field(xfs_ino_t, ino)
__field(int, format)
__field(int, nex)
- __field(int, max_nex)
__field(int, broot_size)
__field(int, fork_off)
),
@@ -1578,18 +1577,16 @@ DECLARE_EVENT_CLASS(xfs_swap_extent_clas
__entry->ino = ip->i_ino;
__entry->format = ip->i_d.di_format;
__entry->nex = ip->i_d.di_nextents;
- __entry->max_nex = ip->i_df.if_ext_max;
__entry->broot_size = ip->i_df.if_broot_bytes;
__entry->fork_off = XFS_IFORK_BOFF(ip);
),
TP_printk("dev %d:%d ino 0x%llx (%s), %s format, num_extents %d, "
- "Max in-fork extents %d, broot size %d, fork offset %d",
+ "broot size %d, fork offset %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->ino,
__print_symbolic(__entry->which, XFS_SWAPEXT_INODES),
__print_symbolic(__entry->format, XFS_INODE_FORMAT_STR),
__entry->nex,
- __entry->max_nex,
__entry->broot_size,
__entry->fork_off)
)
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h 2011-12-07 11:16:06.329954372 +0100
+++ xfs/fs/xfs/xfs_inode.h 2011-12-07 11:17:02.346317572 +0100
@@ -66,7 +66,6 @@ typedef struct xfs_ifork {
struct xfs_btree_block *if_broot; /* file's incore btree root */
short if_broot_bytes; /* bytes allocated for root */
unsigned char if_flags; /* per-fork flags */
- unsigned char if_ext_max; /* max # of extent records */
union {
xfs_bmbt_rec_host_t *if_extents;/* linear map file exts */
xfs_ext_irec_t *if_ext_irec; /* irec map file exts */
@@ -206,7 +205,8 @@ typedef struct xfs_icdinode {
((w) == XFS_DATA_FORK ? \
((ip)->i_d.di_nextents = (n)) : \
((ip)->i_d.di_anextents = (n)))
-
+#define XFS_IFORK_MAXEXT(ip, w) \
+ (XFS_IFORK_SIZE(ip, w) / sizeof(xfs_bmbt_rec_t))
#ifdef __KERNEL__
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork
2011-12-08 15:57 ` [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork Christoph Hellwig
@ 2011-12-13 21:59 ` Dave Chinner
0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2011-12-13 21:59 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Dec 08, 2011 at 10:57:59AM -0500, Christoph Hellwig wrote:
> We spent a lot of effort to maintain this field, but it always equalts to the
> fork size divided by the constant size of an extent. The prime use of it is
> to assert that the two stay in sync. Just divide the fork size by the extent
> size in the few places that we actually use it and remove the overhead
> of maintaining it.
Ok, so you are trading off the overhead of initialising once with
runtime overhead of a read, against a runtime overhead read and a
integer division. Some platforms have slow integer division, so at
face value this migh tbe a bit slower.
However, sizeof(struct xfs_bmbt_rec) == 16, which is determined at
compile time so the compiler can optimise that to a shift, which has
basically no overhead compared to a division on platforms where
division is slow.
>From taht perspective, this seems like a good tradeoff to make -
very little additional runtime overhead, much simpler code.
> Also introduce a few helpers to consolidate the places
> where we actually care about the value.
The helpers are a vast improvement.
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Couple of small formatting comments below, but otherwise consider it
Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Index: xfs/fs/xfs/xfs_dfrag.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_dfrag.c 2011-12-02 19:39:31.437161062 +0100
> +++ xfs/fs/xfs/xfs_dfrag.c 2011-12-07 11:17:02.342984256 +0100
> @@ -163,12 +163,14 @@ xfs_swap_extents_check_format(
>
> /* Check temp in extent form to max in target */
> if (tip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
> - XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) > ip->i_df.if_ext_max)
> + XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) >
> + XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK))
> return EINVAL;
I'd indent the XFS_IFORK_MAXEXT() so it's obvious it's part of a
conditional and not a new conditional expression. Maybe something
like:
+ XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) >
+ XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK))
Otherwise it's not exactly obvious how the logic flows here.
>
> /* Check target in extent form to max in temp */
> if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
> - XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) > tip->i_df.if_ext_max)
> + XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) >
> + XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK))
> return EINVAL;
Same here.
>
> /*
> @@ -180,18 +182,25 @@ xfs_swap_extents_check_format(
> * (a common defrag case) which will occur when the temp inode is in
> * extent format...
> */
> - if (tip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
> - ((XFS_IFORK_BOFF(ip) &&
> - tip->i_df.if_broot_bytes > XFS_IFORK_BOFF(ip)) ||
> - XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) <= ip->i_df.if_ext_max))
> + if (tip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
> + if (XFS_IFORK_BOFF(ip) &&
> + tip->i_df.if_broot_bytes > XFS_IFORK_BOFF(ip))
> + return EINVAL;
> + if (XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) <=
> + XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK))
> return EINVAL;
^ needs another indent.
> + }
>
> /* Reciprocal target->temp btree format checks */
> - if (ip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
> - ((XFS_IFORK_BOFF(tip) &&
> - ip->i_df.if_broot_bytes > XFS_IFORK_BOFF(tip)) ||
> - XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) <= tip->i_df.if_ext_max))
> + if (ip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
> + if (XFS_IFORK_BOFF(tip) &&
> + ip->i_df.if_broot_bytes > XFS_IFORK_BOFF(tip))
> + return EINVAL;
> +
> + if (XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) <=
> + XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK))
> return EINVAL;
Same here.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 05/11] xfs: make i_flags an unsigned long
2011-12-08 15:57 [PATCH 00/11] inode shrink and misc updates Christoph Hellwig
` (3 preceding siblings ...)
2011-12-08 15:57 ` [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork Christoph Hellwig
@ 2011-12-08 15:58 ` Christoph Hellwig
2012-01-13 19:07 ` Ben Myers
2011-12-08 15:58 ` [PATCH 06/11] xfs: replace i_flock with a sleeping bitlock Christoph Hellwig
` (5 subsequent siblings)
10 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-08 15:58 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-use-i_flags --]
[-- Type: text/plain, Size: 1314 bytes --]
To be used for bit wakeup i_flags needs to be an unsigned long or we'll
run into trouble on big endian systems. Beause of the 1-byte i_update
field right after it this actually causes a fairly large size increase
on its own (4 or 8 bytes), but that increase will be more than offset
by the next two patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h 2011-11-30 12:58:58.449770178 +0100
+++ xfs/fs/xfs/xfs_inode.h 2011-11-30 12:59:05.013067955 +0100
@@ -242,7 +242,7 @@ typedef struct xfs_inode {
wait_queue_head_t i_ipin_wait; /* inode pinning wait queue */
spinlock_t i_flags_lock; /* inode i_flags lock */
/* Miscellaneous state. */
- unsigned short i_flags; /* see defined flags below */
+ unsigned long i_flags; /* see defined flags below */
unsigned char i_update_core; /* timestamps/size is dirty */
unsigned int i_delayed_blks; /* count of delay alloc blks */
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 05/11] xfs: make i_flags an unsigned long
2011-12-08 15:58 ` [PATCH 05/11] xfs: make i_flags an unsigned long Christoph Hellwig
@ 2012-01-13 19:07 ` Ben Myers
2012-01-24 18:03 ` Christoph Hellwig
0 siblings, 1 reply; 27+ messages in thread
From: Ben Myers @ 2012-01-13 19:07 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Dec 08, 2011 at 10:58:00AM -0500, Christoph Hellwig wrote:
> To be used for bit wakeup i_flags needs to be an unsigned long or we'll
> run into trouble on big endian systems. Beause of the 1-byte i_update
Because
> field right after it this actually causes a fairly large size increase
> on its own (4 or 8 bytes), but that increase will be more than offset
> by the next two patches.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Alex Elder <aelder@sgi.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Looks good to me. But I could go for a thorough explanation of 'trouble
on big endian systems'.
Reviewed-by: Ben Myers <bpm@sgi.com>
>
> ---
> fs/xfs/xfs_inode.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: xfs/fs/xfs/xfs_inode.h
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_inode.h 2011-11-30 12:58:58.449770178 +0100
> +++ xfs/fs/xfs/xfs_inode.h 2011-11-30 12:59:05.013067955 +0100
> @@ -242,7 +242,7 @@ typedef struct xfs_inode {
> wait_queue_head_t i_ipin_wait; /* inode pinning wait queue */
> spinlock_t i_flags_lock; /* inode i_flags lock */
> /* Miscellaneous state. */
> - unsigned short i_flags; /* see defined flags below */
> + unsigned long i_flags; /* see defined flags below */
> unsigned char i_update_core; /* timestamps/size is dirty */
> unsigned int i_delayed_blks; /* count of delay alloc blks */
>
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 05/11] xfs: make i_flags an unsigned long
2012-01-13 19:07 ` Ben Myers
@ 2012-01-24 18:03 ` Christoph Hellwig
0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2012-01-24 18:03 UTC (permalink / raw)
To: Ben Myers; +Cc: Christoph Hellwig, xfs
On Fri, Jan 13, 2012 at 01:07:16PM -0600, Ben Myers wrote:
> On Thu, Dec 08, 2011 at 10:58:00AM -0500, Christoph Hellwig wrote:
> > To be used for bit wakeup i_flags needs to be an unsigned long or we'll
> > run into trouble on big endian systems. Beause of the 1-byte i_update
> Because
> > field right after it this actually causes a fairly large size increase
> > on its own (4 or 8 bytes), but that increase will be more than offset
> > by the next two patches.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Alex Elder <aelder@sgi.com>
> > Reviewed-by: Dave Chinner <dchinner@redhat.com>
>
> Looks good to me. But I could go for a thorough explanation of 'trouble
> on big endian systems'.
If we use the bitops that expect to operate on a 64-bit value on a
32-bit value on big endian we'll actually use the 32-bit longword next
to it, not the one we intended for the first half of the bits.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 06/11] xfs: replace i_flock with a sleeping bitlock
2011-12-08 15:57 [PATCH 00/11] inode shrink and misc updates Christoph Hellwig
` (4 preceding siblings ...)
2011-12-08 15:58 ` [PATCH 05/11] xfs: make i_flags an unsigned long Christoph Hellwig
@ 2011-12-08 15:58 ` Christoph Hellwig
2011-12-13 22:19 ` Dave Chinner
2011-12-08 15:58 ` [PATCH 07/11] xfs: replace i_pin_wait with a bit waitqueue Christoph Hellwig
` (4 subsequent siblings)
10 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-08 15:58 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-kill-i_flush --]
[-- Type: text/plain, Size: 9457 bytes --]
We almost never block on i_flock, the exception is synchronous inode
flushing. Instead of bloating the inode with a 16/24-byte completion
that we abuse as a semaphore just implement it as a bitlock that uses
a bit waitqueue for the rare sleeping path. This primarily is a
tradeoff between a much smaller inode and a faster non-blocking
path vs faster wakeups, and we are much better off with the former.
A small downside is that we will lose lockdep checking for i_flock, but
given that it's always taken inside the ilock that should be acceptable.
Note that for example the inode writeback locking is implemented in a
very similar way.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
fs/xfs/xfs_iget.c | 20 +++++++++++-
fs/xfs/xfs_inode.c | 4 +-
fs/xfs/xfs_inode.h | 78 ++++++++++++++++++++++++++++++------------------
fs/xfs/xfs_inode_item.c | 4 +-
fs/xfs/xfs_super.c | 7 ----
fs/xfs/xfs_sync.c | 9 ++---
6 files changed, 76 insertions(+), 46 deletions(-)
Index: xfs/fs/xfs/xfs_iget.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iget.c 2011-11-30 12:59:04.989734747 +0100
+++ xfs/fs/xfs/xfs_iget.c 2011-11-30 12:59:08.843047205 +0100
@@ -77,7 +77,7 @@ xfs_inode_alloc(
ASSERT(atomic_read(&ip->i_pincount) == 0);
ASSERT(!spin_is_locked(&ip->i_flags_lock));
- ASSERT(completion_done(&ip->i_flush));
+ ASSERT(!xfs_isiflocked(ip));
ASSERT(ip->i_ino == 0);
mrlock_init(&ip->i_iolock, MRLOCK_BARRIER, "xfsio", ip->i_ino);
@@ -151,7 +151,7 @@ xfs_inode_free(
/* asserts to verify all state is correct here */
ASSERT(atomic_read(&ip->i_pincount) == 0);
ASSERT(!spin_is_locked(&ip->i_flags_lock));
- ASSERT(completion_done(&ip->i_flush));
+ ASSERT(!xfs_isiflocked(ip));
/*
* Because we use RCU freeing we need to ensure the inode always
@@ -714,3 +714,19 @@ xfs_isilocked(
return 0;
}
#endif
+
+void
+__xfs_iflock(
+ struct xfs_inode *ip)
+{
+ wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_IFLOCK_BIT);
+ DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_IFLOCK_BIT);
+
+ do {
+ prepare_to_wait_exclusive(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
+ if (xfs_isiflocked(ip))
+ io_schedule();
+ } while (!xfs_iflock_nowait(ip));
+
+ finish_wait(wq, &wait.wait);
+}
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c 2011-11-30 12:59:05.003068007 +0100
+++ xfs/fs/xfs/xfs_inode.c 2011-11-30 12:59:08.843047205 +0100
@@ -2396,7 +2396,7 @@ xfs_iflush(
XFS_STATS_INC(xs_iflush_count);
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
- ASSERT(!completion_done(&ip->i_flush));
+ ASSERT(xfs_isiflocked(ip));
ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
@@ -2512,7 +2512,7 @@ xfs_iflush_int(
#endif
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
- ASSERT(!completion_done(&ip->i_flush));
+ ASSERT(xfs_isiflocked(ip));
ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h 2011-11-30 12:59:05.013067955 +0100
+++ xfs/fs/xfs/xfs_inode.h 2011-11-30 12:59:08.846380520 +0100
@@ -237,7 +237,6 @@ typedef struct xfs_inode {
struct xfs_inode_log_item *i_itemp; /* logging information */
mrlock_t i_lock; /* inode lock */
mrlock_t i_iolock; /* inode IO lock */
- struct completion i_flush; /* inode flush completion q */
atomic_t i_pincount; /* inode pin count */
wait_queue_head_t i_ipin_wait; /* inode pinning wait queue */
spinlock_t i_flags_lock; /* inode i_flags lock */
@@ -324,6 +323,19 @@ xfs_iflags_test_and_clear(xfs_inode_t *i
return ret;
}
+static inline int
+xfs_iflags_test_and_set(xfs_inode_t *ip, unsigned short flags)
+{
+ int ret;
+
+ spin_lock(&ip->i_flags_lock);
+ ret = ip->i_flags & flags;
+ if (!ret)
+ ip->i_flags |= flags;
+ spin_unlock(&ip->i_flags_lock);
+ return ret;
+}
+
/*
* Project quota id helpers (previously projid was 16bit only
* and using two 16bit values to hold new 32bit projid was chosen
@@ -344,35 +356,17 @@ xfs_set_projid(struct xfs_inode *ip,
}
/*
- * Manage the i_flush queue embedded in the inode. This completion
- * queue synchronizes processes attempting to flush the in-core
- * inode back to disk.
- */
-static inline void xfs_iflock(xfs_inode_t *ip)
-{
- wait_for_completion(&ip->i_flush);
-}
-
-static inline int xfs_iflock_nowait(xfs_inode_t *ip)
-{
- return try_wait_for_completion(&ip->i_flush);
-}
-
-static inline void xfs_ifunlock(xfs_inode_t *ip)
-{
- complete(&ip->i_flush);
-}
-
-/*
* In-core inode flags.
*/
-#define XFS_IRECLAIM 0x0001 /* started reclaiming this inode */
-#define XFS_ISTALE 0x0002 /* inode has been staled */
-#define XFS_IRECLAIMABLE 0x0004 /* inode can be reclaimed */
-#define XFS_INEW 0x0008 /* inode has just been allocated */
-#define XFS_IFILESTREAM 0x0010 /* inode is in a filestream directory */
-#define XFS_ITRUNCATED 0x0020 /* truncated down so flush-on-close */
-#define XFS_IDIRTY_RELEASE 0x0040 /* dirty release already seen */
+#define XFS_IRECLAIM (1 << 0) /* started reclaiming this inode */
+#define XFS_ISTALE (1 << 1) /* inode has been staled */
+#define XFS_IRECLAIMABLE (1 << 2) /* inode can be reclaimed */
+#define XFS_INEW (1 << 3) /* inode has just been allocated */
+#define XFS_IFILESTREAM (1 << 4) /* inode is in a filestream dir. */
+#define XFS_ITRUNCATED (1 << 5) /* truncated down so flush-on-close */
+#define XFS_IDIRTY_RELEASE (1 << 6) /* dirty release already seen */
+#define __XFS_IFLOCK_BIT 7 /* inode is being flushed right now */
+#define XFS_IFLOCK (1 << __XFS_IFLOCK_BIT)
/*
* Per-lifetime flags need to be reset when re-using a reclaimable inode during
@@ -385,6 +379,34 @@ static inline void xfs_ifunlock(xfs_inod
XFS_IFILESTREAM);
/*
+ * Synchronize processes attempting to flush the in-core inode back to disk.
+ */
+
+extern void __xfs_iflock(struct xfs_inode *ip);
+
+static inline int xfs_iflock_nowait(struct xfs_inode *ip)
+{
+ return !xfs_iflags_test_and_set(ip, XFS_IFLOCK);
+}
+
+static inline void xfs_iflock(struct xfs_inode *ip)
+{
+ if (!xfs_iflock_nowait(ip))
+ __xfs_iflock(ip);
+}
+
+static inline void xfs_ifunlock(struct xfs_inode *ip)
+{
+ xfs_iflags_clear(ip, XFS_IFLOCK);
+ wake_up_bit(&ip->i_flags, __XFS_IFLOCK_BIT);
+}
+
+static inline int xfs_isiflocked(struct xfs_inode *ip)
+{
+ return xfs_iflags_test(ip, XFS_IFLOCK);
+}
+
+/*
* Flags for inode locking.
* Bit ranges: 1<<1 - 1<<16-1 -- iolock/ilock modes (bitfield)
* 1<<16 - 1<<32-1 -- lockdep annotation (integers)
Index: xfs/fs/xfs/xfs_inode_item.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode_item.c 2011-11-30 12:59:05.023067900 +0100
+++ xfs/fs/xfs/xfs_inode_item.c 2011-11-30 12:59:08.846380520 +0100
@@ -717,7 +717,7 @@ xfs_inode_item_pushbuf(
* If a flush is not in progress anymore, chances are that the
* inode was taken off the AIL. So, just get out.
*/
- if (completion_done(&ip->i_flush) ||
+ if (!xfs_isiflocked(ip) ||
!(lip->li_flags & XFS_LI_IN_AIL)) {
xfs_iunlock(ip, XFS_ILOCK_SHARED);
return true;
@@ -750,7 +750,7 @@ xfs_inode_item_push(
struct xfs_inode *ip = iip->ili_inode;
ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED));
- ASSERT(!completion_done(&ip->i_flush));
+ ASSERT(xfs_isiflocked(ip));
/*
* Since we were able to lock the inode's flush lock and
Index: xfs/fs/xfs/xfs_super.c
===================================================================
--- xfs.orig/fs/xfs/xfs_super.c 2011-11-30 12:59:05.036401160 +0100
+++ xfs/fs/xfs/xfs_super.c 2011-11-30 12:59:08.846380520 +0100
@@ -829,13 +829,6 @@ xfs_fs_inode_init_once(
atomic_set(&ip->i_pincount, 0);
spin_lock_init(&ip->i_flags_lock);
init_waitqueue_head(&ip->i_ipin_wait);
- /*
- * Because we want to use a counting completion, complete
- * the flush completion once to allow a single access to
- * the flush completion without blocking.
- */
- init_completion(&ip->i_flush);
- complete(&ip->i_flush);
mrlock_init(&ip->i_lock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER,
"xfsino", ip->i_ino);
Index: xfs/fs/xfs/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/xfs_sync.c 2011-11-30 12:59:05.053067738 +0100
+++ xfs/fs/xfs/xfs_sync.c 2011-11-30 12:59:08.849713835 +0100
@@ -671,14 +671,13 @@ xfs_reclaim_inode_grab(
return 1;
/*
- * do some unlocked checks first to avoid unnecessary lock traffic.
- * The first is a flush lock check, the second is a already in reclaim
- * check. Only do these checks if we are not going to block on locks.
+ * If we are asked for non-blocking operation, do unlocked checks to
+ * see if the inode already is being flushed or in reclaim to avoid
+ * lock traffic.
*/
if ((flags & SYNC_TRYLOCK) &&
- (!ip->i_flush.done || __xfs_iflags_test(ip, XFS_IRECLAIM))) {
+ __xfs_iflags_test(ip, XFS_IFLOCK | XFS_IRECLAIM))
return 1;
- }
/*
* The radix tree lock here protects a thread in xfs_iget from racing
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 06/11] xfs: replace i_flock with a sleeping bitlock
2011-12-08 15:58 ` [PATCH 06/11] xfs: replace i_flock with a sleeping bitlock Christoph Hellwig
@ 2011-12-13 22:19 ` Dave Chinner
2011-12-18 18:11 ` Christoph Hellwig
0 siblings, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2011-12-13 22:19 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Dec 08, 2011 at 10:58:01AM -0500, Christoph Hellwig wrote:
> We almost never block on i_flock, the exception is synchronous inode
> flushing. Instead of bloating the inode with a 16/24-byte completion
> that we abuse as a semaphore just implement it as a bitlock that uses
> a bit waitqueue for the rare sleeping path. This primarily is a
> tradeoff between a much smaller inode and a faster non-blocking
> path vs faster wakeups, and we are much better off with the former.
>
> A small downside is that we will lose lockdep checking for i_flock, but
> given that it's always taken inside the ilock that should be acceptable.
I didn't think we had lockdep checking on the i_flock because it
uses completions rather than real lock primitives. Either way, I'm
not concerned about this aspect of the change - lockdep doesn't pick
up the typical sort of holdoff problems that i_flock vs delwri
trigger...
> Note that for example the inode writeback locking is implemented in a
> very similar way.
....
> +
> +static inline void xfs_ifunlock(struct xfs_inode *ip)
> +{
> + xfs_iflags_clear(ip, XFS_IFLOCK);
> + wake_up_bit(&ip->i_flags, __XFS_IFLOCK_BIT);
> +}
Should the wakeup be done whilst the ip->i_flags_lock is still held?
The VFS code does the __I_SYNC wakeup while still holding the
inode->i_lock so that the clear and wakeup are atomic, similarly the
__I_NEW bit....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 06/11] xfs: replace i_flock with a sleeping bitlock
2011-12-13 22:19 ` Dave Chinner
@ 2011-12-18 18:11 ` Christoph Hellwig
0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-18 18:11 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
On Wed, Dec 14, 2011 at 09:19:17AM +1100, Dave Chinner wrote:
> > +static inline void xfs_ifunlock(struct xfs_inode *ip)
> > +{
> > + xfs_iflags_clear(ip, XFS_IFLOCK);
> > + wake_up_bit(&ip->i_flags, __XFS_IFLOCK_BIT);
> > +}
>
> Should the wakeup be done whilst the ip->i_flags_lock is still held?
There is no reason to do so - __XFS_IFLOCK_BIT is only used as a wakeup key
in wake_up_bit, so it can easily be done after the unlocking. Doing so
is indeed a lot more efficient as the waiter needs to take the locks as
the first thing after beeing woken.
> The VFS code does the __I_SYNC wakeup while still holding the
> inode->i_lock so that the clear and wakeup are atomic, similarly the
> __I_NEW bit....
It implements a bit different semantics, and actually needs to hold the
lock for more synchronization than just the wakeup bit. It does in fact
use the bit wakeup helpers using the atomic test/set bit operations
despite holding a lock for some reason.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 07/11] xfs: replace i_pin_wait with a bit waitqueue
2011-12-08 15:57 [PATCH 00/11] inode shrink and misc updates Christoph Hellwig
` (5 preceding siblings ...)
2011-12-08 15:58 ` [PATCH 06/11] xfs: replace i_flock with a sleeping bitlock Christoph Hellwig
@ 2011-12-08 15:58 ` Christoph Hellwig
2011-12-13 22:21 ` Dave Chinner
2011-12-08 15:58 ` [PATCH 08/11] xfs: remove the i_size field in struct xfs_inode Christoph Hellwig
` (3 subsequent siblings)
10 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-08 15:58 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-kill-i_ipin_wait --]
[-- Type: text/plain, Size: 4084 bytes --]
Replace i_pin_wait, which is only used during synchronous inode flushing
with a bit waitqueue. This trades off a much smaller inode against
slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit)
bytes in the XFS inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
fs/xfs/xfs_inode.c | 27 +++++++++++++++++++++------
fs/xfs/xfs_inode.h | 3 ++-
fs/xfs/xfs_inode_item.c | 2 +-
fs/xfs/xfs_super.c | 1 -
4 files changed, 24 insertions(+), 9 deletions(-)
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c 2011-11-30 12:59:08.843047205 +0100
+++ xfs/fs/xfs/xfs_inode.c 2011-11-30 12:59:10.296372665 +0100
@@ -2037,7 +2037,7 @@ xfs_idestroy_fork(
* once someone is waiting for it to be unpinned.
*/
static void
-xfs_iunpin_nowait(
+xfs_iunpin(
struct xfs_inode *ip)
{
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
@@ -2049,14 +2049,29 @@ xfs_iunpin_nowait(
}
+static void
+__xfs_iunpin_wait(
+ struct xfs_inode *ip)
+{
+ wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_IPINNED_BIT);
+ DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_IPINNED_BIT);
+
+ xfs_iunpin(ip);
+
+ do {
+ prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
+ if (xfs_ipincount(ip))
+ io_schedule();
+ } while (xfs_ipincount(ip));
+ finish_wait(wq, &wait.wait);
+}
+
void
xfs_iunpin_wait(
struct xfs_inode *ip)
{
- if (xfs_ipincount(ip)) {
- xfs_iunpin_nowait(ip);
- wait_event(ip->i_ipin_wait, (xfs_ipincount(ip) == 0));
- }
+ if (xfs_ipincount(ip))
+ __xfs_iunpin_wait(ip);
}
/*
@@ -2415,7 +2430,7 @@ xfs_iflush(
* out for us if they occur after the log force completes.
*/
if (!(flags & SYNC_WAIT) && xfs_ipincount(ip)) {
- xfs_iunpin_nowait(ip);
+ xfs_iunpin(ip);
xfs_ifunlock(ip);
return EAGAIN;
}
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h 2011-11-30 12:59:08.846380520 +0100
+++ xfs/fs/xfs/xfs_inode.h 2011-11-30 12:59:10.296372665 +0100
@@ -238,7 +238,6 @@ typedef struct xfs_inode {
mrlock_t i_lock; /* inode lock */
mrlock_t i_iolock; /* inode IO lock */
atomic_t i_pincount; /* inode pin count */
- wait_queue_head_t i_ipin_wait; /* inode pinning wait queue */
spinlock_t i_flags_lock; /* inode i_flags lock */
/* Miscellaneous state. */
unsigned long i_flags; /* see defined flags below */
@@ -367,6 +366,8 @@ xfs_set_projid(struct xfs_inode *ip,
#define XFS_IDIRTY_RELEASE (1 << 6) /* dirty release already seen */
#define __XFS_IFLOCK_BIT 7 /* inode is being flushed right now */
#define XFS_IFLOCK (1 << __XFS_IFLOCK_BIT)
+#define __XFS_IPINNED_BIT 8 /* wakeup key for zero pin count */
+#define XFS_IPINNED (1 << __XFS_IPINNED_BIT)
/*
* Per-lifetime flags need to be reset when re-using a reclaimable inode during
Index: xfs/fs/xfs/xfs_inode_item.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode_item.c 2011-11-30 12:59:08.846380520 +0100
+++ xfs/fs/xfs/xfs_inode_item.c 2011-11-30 12:59:10.296372665 +0100
@@ -555,7 +555,7 @@ xfs_inode_item_unpin(
trace_xfs_inode_unpin(ip, _RET_IP_);
ASSERT(atomic_read(&ip->i_pincount) > 0);
if (atomic_dec_and_test(&ip->i_pincount))
- wake_up(&ip->i_ipin_wait);
+ wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
}
/*
Index: xfs/fs/xfs/xfs_super.c
===================================================================
--- xfs.orig/fs/xfs/xfs_super.c 2011-11-30 12:59:08.846380520 +0100
+++ xfs/fs/xfs/xfs_super.c 2011-11-30 12:59:10.296372665 +0100
@@ -828,7 +828,6 @@ xfs_fs_inode_init_once(
/* xfs inode */
atomic_set(&ip->i_pincount, 0);
spin_lock_init(&ip->i_flags_lock);
- init_waitqueue_head(&ip->i_ipin_wait);
mrlock_init(&ip->i_lock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER,
"xfsino", ip->i_ino);
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 07/11] xfs: replace i_pin_wait with a bit waitqueue
2011-12-08 15:58 ` [PATCH 07/11] xfs: replace i_pin_wait with a bit waitqueue Christoph Hellwig
@ 2011-12-13 22:21 ` Dave Chinner
0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2011-12-13 22:21 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Dec 08, 2011 at 10:58:02AM -0500, Christoph Hellwig wrote:
> Replace i_pin_wait, which is only used during synchronous inode flushing
> with a bit waitqueue. This trades off a much smaller inode against
> slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit)
> bytes in the XFS inode.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Alex Elder <aelder@sgi.com>
Looks fine - the pinning code never modifies the i_flags field, so
doesn't require the i_flags_lock anywhere so the concerns i had
about the previous patch are not relevant here...
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 08/11] xfs: remove the i_size field in struct xfs_inode
2011-12-08 15:57 [PATCH 00/11] inode shrink and misc updates Christoph Hellwig
` (6 preceding siblings ...)
2011-12-08 15:58 ` [PATCH 07/11] xfs: replace i_pin_wait with a bit waitqueue Christoph Hellwig
@ 2011-12-08 15:58 ` Christoph Hellwig
2011-12-13 22:58 ` Dave Chinner
2011-12-08 15:58 ` [PATCH 09/11] xfs: remove the i_new_size " Christoph Hellwig
` (2 subsequent siblings)
10 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-08 15:58 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-kill-i_size --]
[-- Type: text/plain, Size: 17985 bytes --]
There is no fundamental need to keep an in-memory inode size copy in the
XFS inode. We already have the on-disk value in the dinode, and the
separate in-memory copy that we need for regular files only in the
XFS inode.
Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use
the VFS inode i_size field for regular fields. Switch code that was
directly accessing it to either the XFS_ISIZE macro or direct access
of the VFS i_size if the code is limited to regular files and in
highlevel code.
This also allows dropping a a big bunch of fairly complicated code in the
write path which dealt with keeping the xfs_inode i_size uptodate with the
VFS i_size that is getting updated inside ->write_end.
Note that we do not bother resetting the VFS i_size when truncating a file
that gets freed to zero as there is point in doing so. Just relax the
assert in xfs_ifree to only check the on-disk size instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_aops.c | 2 +-
fs/xfs/xfs_bmap.c | 15 ++++++---------
fs/xfs/xfs_file.c | 45 +++++++++++----------------------------------
fs/xfs/xfs_fs_subr.c | 2 +-
fs/xfs/xfs_iget.c | 1 -
fs/xfs/xfs_inode.c | 8 ++------
fs/xfs/xfs_inode.h | 16 ++++++++++++----
fs/xfs/xfs_iomap.c | 12 ++++++------
fs/xfs/xfs_iops.c | 3 +--
fs/xfs/xfs_qm_syscalls.c | 1 -
fs/xfs/xfs_trace.h | 2 +-
fs/xfs/xfs_vnodeops.c | 31 +++++++++++++++----------------
12 files changed, 56 insertions(+), 82 deletions(-)
Index: xfs/fs/xfs/xfs_aops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_aops.c 2011-12-02 19:39:30.743827732 +0100
+++ xfs/fs/xfs/xfs_aops.c 2011-12-07 11:18:04.305981907 +0100
@@ -111,7 +111,7 @@ xfs_ioend_new_eof(
xfs_fsize_t bsize;
bsize = ioend->io_offset + ioend->io_size;
- isize = MAX(ip->i_size, ip->i_new_size);
+ isize = MAX(i_size_read(VFS_I(ip)), ip->i_new_size);
isize = MIN(isize, bsize);
return isize > ip->i_d.di_size ? isize : 0;
}
Index: xfs/fs/xfs/xfs_file.c
===================================================================
--- xfs.orig/fs/xfs/xfs_file.c 2011-12-02 19:39:30.753827732 +0100
+++ xfs/fs/xfs/xfs_file.c 2011-12-07 11:18:04.305981907 +0100
@@ -327,7 +327,7 @@ xfs_file_aio_read(
mp->m_rtdev_targp : mp->m_ddev_targp;
if ((iocb->ki_pos & target->bt_smask) ||
(size & target->bt_smask)) {
- if (iocb->ki_pos == ip->i_size)
+ if (iocb->ki_pos == i_size_read(inode))
return 0;
return -XFS_ERROR(EINVAL);
}
@@ -412,30 +412,6 @@ xfs_file_splice_read(
return ret;
}
-STATIC void
-xfs_aio_write_isize_update(
- struct inode *inode,
- loff_t *ppos,
- ssize_t bytes_written)
-{
- struct xfs_inode *ip = XFS_I(inode);
- xfs_fsize_t isize = i_size_read(inode);
-
- if (bytes_written > 0)
- XFS_STATS_ADD(xs_write_bytes, bytes_written);
-
- if (unlikely(bytes_written < 0 && bytes_written != -EFAULT &&
- *ppos > isize))
- *ppos = isize;
-
- if (*ppos > ip->i_size) {
- xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
- if (*ppos > ip->i_size)
- ip->i_size = *ppos;
- xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
- }
-}
-
/*
* If this was a direct or synchronous I/O that failed (such as ENOSPC) then
* part of the I/O may have been written to disk before the error occurred. In
@@ -451,8 +427,8 @@ xfs_aio_write_newsize_update(
xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
if (new_size == ip->i_new_size)
ip->i_new_size = 0;
- if (ip->i_d.di_size > ip->i_size)
- ip->i_d.di_size = ip->i_size;
+ if (ip->i_d.di_size > i_size_read(VFS_I(ip)))
+ ip->i_d.di_size = i_size_read(VFS_I(ip));
xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
}
}
@@ -492,15 +468,16 @@ xfs_file_splice_write(
new_size = *ppos + count;
xfs_ilock(ip, XFS_ILOCK_EXCL);
- if (new_size > ip->i_size)
+ if (new_size > i_size_read(inode))
ip->i_new_size = new_size;
xfs_iunlock(ip, XFS_ILOCK_EXCL);
trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags);
+ if (ret > 0)
+ XFS_STATS_ADD(xs_write_bytes, ret);
- xfs_aio_write_isize_update(inode, ppos, ret);
xfs_aio_write_newsize_update(ip, new_size);
xfs_iunlock(ip, XFS_IOLOCK_EXCL);
return ret;
@@ -728,14 +705,14 @@ restart:
* values are still valid.
*/
if ((ip->i_new_size && *pos > ip->i_new_size) ||
- (!ip->i_new_size && *pos > ip->i_size)) {
+ (!ip->i_new_size && *pos > i_size_read(inode))) {
if (*iolock == XFS_IOLOCK_SHARED) {
xfs_rw_iunlock(ip, XFS_ILOCK_EXCL | *iolock);
*iolock = XFS_IOLOCK_EXCL;
xfs_rw_ilock(ip, XFS_ILOCK_EXCL | *iolock);
goto restart;
}
- error = -xfs_zero_eof(ip, *pos, ip->i_size);
+ error = -xfs_zero_eof(ip, *pos, i_size_read(inode));
}
/*
@@ -744,7 +721,7 @@ restart:
* ip->i_new_size if this IO ends beyond any other in-flight writes.
*/
new_size = *pos + *count;
- if (new_size > ip->i_size) {
+ if (new_size > i_size_read(inode)) {
if (new_size > ip->i_new_size)
ip->i_new_size = new_size;
*new_sizep = new_size;
@@ -957,11 +934,11 @@ xfs_file_aio_write(
ret = xfs_file_buffered_aio_write(iocb, iovp, nr_segs, pos,
ocount, &new_size, &iolock);
- xfs_aio_write_isize_update(inode, &iocb->ki_pos, ret);
-
if (ret <= 0)
goto out_unlock;
+ XFS_STATS_ADD(xs_write_bytes, ret);
+
/* Handle various SYNC-type writes */
if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
loff_t end = pos + ret - 1;
Index: xfs/fs/xfs/xfs_fs_subr.c
===================================================================
--- xfs.orig/fs/xfs/xfs_fs_subr.c 2011-12-02 19:39:30.763827732 +0100
+++ xfs/fs/xfs/xfs_fs_subr.c 2011-12-07 11:18:04.305981907 +0100
@@ -90,7 +90,7 @@ xfs_wait_on_pages(
if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
return -filemap_fdatawait_range(mapping, first,
- last == -1 ? ip->i_size - 1 : last);
+ last == -1 ? XFS_ISIZE(ip) - 1 : last);
}
return 0;
}
Index: xfs/fs/xfs/xfs_iops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iops.c 2011-12-07 11:15:33.600131684 +0100
+++ xfs/fs/xfs/xfs_iops.c 2011-12-07 11:18:04.309315222 +0100
@@ -778,7 +778,7 @@ xfs_setattr_size(
lock_flags |= XFS_IOLOCK_EXCL;
xfs_ilock(ip, lock_flags);
- oldsize = ip->i_size;
+ oldsize = inode->i_size;
newsize = iattr->ia_size;
/*
@@ -897,7 +897,6 @@ xfs_setattr_size(
* they get written to.
*/
ip->i_d.di_size = newsize;
- ip->i_size = newsize;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
if (newsize <= oldsize) {
Index: xfs/fs/xfs/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/xfs_trace.h 2011-12-07 11:17:02.346317572 +0100
+++ xfs/fs/xfs/xfs_trace.h 2011-12-07 11:18:04.309315222 +0100
@@ -1038,7 +1038,7 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
TP_fast_assign(
__entry->dev = VFS_I(ip)->i_sb->s_dev;
__entry->ino = ip->i_ino;
- __entry->isize = ip->i_size;
+ __entry->isize = VFS_I(ip)->i_size;
__entry->disize = ip->i_d.di_size;
__entry->new_size = ip->i_new_size;
__entry->offset = offset;
Index: xfs/fs/xfs/xfs_bmap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_bmap.c 2011-12-07 11:17:02.339650941 +0100
+++ xfs/fs/xfs/xfs_bmap.c 2011-12-07 11:18:04.312648538 +0100
@@ -3997,11 +3997,8 @@ xfs_bmap_one_block(
xfs_bmbt_irec_t s; /* internal version of extent */
#ifndef DEBUG
- if (whichfork == XFS_DATA_FORK) {
- return S_ISREG(ip->i_d.di_mode) ?
- (ip->i_size == ip->i_mount->m_sb.sb_blocksize) :
- (ip->i_d.di_size == ip->i_mount->m_sb.sb_blocksize);
- }
+ if (whichfork == XFS_DATA_FORK)
+ return XFS_ISIZE(ip) == ip->i_mount->m_sb.sb_blocksize;
#endif /* !DEBUG */
if (XFS_IFORK_NEXTENTS(ip, whichfork) != 1)
return 0;
@@ -4013,7 +4010,7 @@ xfs_bmap_one_block(
xfs_bmbt_get_all(ep, &s);
rval = s.br_startoff == 0 && s.br_blockcount == 1;
if (rval && whichfork == XFS_DATA_FORK)
- ASSERT(ip->i_size == ip->i_mount->m_sb.sb_blocksize);
+ ASSERT(XFS_ISIZE(ip) == ip->i_mount->m_sb.sb_blocksize);
return rval;
}
@@ -5425,7 +5422,7 @@ xfs_getbmapx_fix_eof_hole(
if (startblock == HOLESTARTBLOCK) {
mp = ip->i_mount;
out->bmv_block = -1;
- fixlen = XFS_FSB_TO_BB(mp, XFS_B_TO_FSB(mp, ip->i_size));
+ fixlen = XFS_FSB_TO_BB(mp, XFS_B_TO_FSB(mp, XFS_ISIZE(ip)));
fixlen -= out->bmv_offset;
if (prealloced && out->bmv_offset + out->bmv_length == end) {
/* Came to hole at EOF. Trim it. */
@@ -5513,7 +5510,7 @@ xfs_getbmap(
fixlen = XFS_MAXIOFFSET(mp);
} else {
prealloced = 0;
- fixlen = ip->i_size;
+ fixlen = XFS_ISIZE(ip);
}
}
@@ -5542,7 +5539,7 @@ xfs_getbmap(
xfs_ilock(ip, XFS_IOLOCK_SHARED);
if (whichfork == XFS_DATA_FORK && !(iflags & BMV_IF_DELALLOC)) {
- if (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size) {
+ if (ip->i_delayed_blks || XFS_ISIZE(ip) > ip->i_d.di_size) {
error = xfs_flush_pages(ip, 0, -1, 0, FI_REMAPF);
if (error)
goto out_unlock_iolock;
Index: xfs/fs/xfs/xfs_iget.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iget.c 2011-12-07 11:17:37.032796326 +0100
+++ xfs/fs/xfs/xfs_iget.c 2011-12-07 11:18:04.312648538 +0100
@@ -94,7 +94,6 @@ xfs_inode_alloc(
ip->i_update_core = 0;
ip->i_delayed_blks = 0;
memset(&ip->i_d, 0, sizeof(xfs_icdinode_t));
- ip->i_size = 0;
ip->i_new_size = 0;
return ip;
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c 2011-12-07 11:17:50.642722594 +0100
+++ xfs/fs/xfs/xfs_inode.c 2011-12-07 11:18:04.315981854 +0100
@@ -347,7 +347,6 @@ xfs_iformat(
return XFS_ERROR(EFSCORRUPTED);
}
ip->i_d.di_size = 0;
- ip->i_size = 0;
ip->i_df.if_u2.if_rdev = xfs_dinode_get_rdev(dip);
break;
@@ -853,7 +852,6 @@ xfs_iread(
}
ip->i_delayed_blks = 0;
- ip->i_size = ip->i_d.di_size;
/*
* Mark the buffer containing the inode as something to keep
@@ -1043,7 +1041,6 @@ xfs_ialloc(
}
ip->i_d.di_size = 0;
- ip->i_size = 0;
ip->i_d.di_nextents = 0;
ASSERT(ip->i_d.di_nblocks == 0);
@@ -1198,7 +1195,7 @@ xfs_itruncate_extents(
int done = 0;
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_IOLOCK_EXCL));
- ASSERT(new_size <= ip->i_size);
+ ASSERT(new_size <= XFS_ISIZE(ip));
ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
ASSERT(ip->i_itemp != NULL);
ASSERT(ip->i_itemp->ili_lock_flags == 0);
@@ -1712,8 +1709,7 @@ xfs_ifree(
ASSERT(ip->i_d.di_nlink == 0);
ASSERT(ip->i_d.di_nextents == 0);
ASSERT(ip->i_d.di_anextents == 0);
- ASSERT((ip->i_d.di_size == 0 && ip->i_size == 0) ||
- (!S_ISREG(ip->i_d.di_mode)));
+ ASSERT(ip->i_d.di_size == 0 || !S_ISREG(ip->i_d.di_mode));
ASSERT(ip->i_d.di_nblocks == 0);
/*
Index: xfs/fs/xfs/xfs_iomap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iomap.c 2011-12-07 11:15:53.693356164 +0100
+++ xfs/fs/xfs/xfs_iomap.c 2011-12-07 11:18:04.315981854 +0100
@@ -74,7 +74,7 @@ xfs_iomap_eof_align_last_fsb(
else if (mp->m_dalign)
align = mp->m_dalign;
- if (align && ip->i_size >= XFS_FSB_TO_B(mp, align))
+ if (align && XFS_ISIZE(ip) >= XFS_FSB_TO_B(mp, align))
new_last_fsb = roundup_64(*last_fsb, align);
}
@@ -154,7 +154,7 @@ xfs_iomap_write_direct(
offset_fsb = XFS_B_TO_FSBT(mp, offset);
last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count)));
- if ((offset + count) > ip->i_size) {
+ if ((offset + count) > XFS_ISIZE(ip)) {
error = xfs_iomap_eof_align_last_fsb(mp, ip, extsz, &last_fsb);
if (error)
goto error_out;
@@ -211,7 +211,7 @@ xfs_iomap_write_direct(
xfs_trans_ijoin(tp, ip, 0);
bmapi_flag = 0;
- if (offset < ip->i_size || extsz)
+ if (offset < XFS_ISIZE(ip) || extsz)
bmapi_flag |= XFS_BMAPI_PREALLOC;
/*
@@ -286,7 +286,7 @@ xfs_iomap_eof_want_preallocate(
int found_delalloc = 0;
*prealloc = 0;
- if ((offset + count) <= ip->i_size)
+ if (offset + count <= XFS_ISIZE(ip))
return 0;
/*
@@ -340,7 +340,7 @@ xfs_iomap_prealloc_size(
* if we pass in alloc_blocks = 0. Hence the "+ 1" to
* ensure we always pass in a non-zero value.
*/
- alloc_blocks = XFS_B_TO_FSB(mp, ip->i_size) + 1;
+ alloc_blocks = XFS_B_TO_FSB(mp, XFS_ISIZE(ip)) + 1;
alloc_blocks = XFS_FILEOFF_MIN(MAXEXTLEN,
rounddown_pow_of_two(alloc_blocks));
@@ -564,7 +564,7 @@ xfs_iomap_write_allocate(
* back....
*/
nimaps = 1;
- end_fsb = XFS_B_TO_FSB(mp, ip->i_size);
+ end_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
error = xfs_bmap_last_offset(NULL, ip, &last_block,
XFS_DATA_FORK);
if (error)
Index: xfs/fs/xfs/xfs_vnodeops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_vnodeops.c 2011-12-07 11:15:33.603464999 +0100
+++ xfs/fs/xfs/xfs_vnodeops.c 2011-12-07 11:18:04.315981854 +0100
@@ -175,7 +175,7 @@ xfs_free_eofblocks(
* Figure out if there are any blocks beyond the end
* of the file. If not, then there is nothing to do.
*/
- end_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)ip->i_size));
+ end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
last_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_MAXIOFFSET(mp));
if (last_fsb <= end_fsb)
return 0;
@@ -233,7 +233,7 @@ xfs_free_eofblocks(
* may be full of holes (ie NULL files bug).
*/
error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK,
- ip->i_size);
+ XFS_ISIZE(ip));
if (error) {
/*
* If we get an error at this point we simply don't
@@ -547,8 +547,8 @@ xfs_release(
return 0;
if ((S_ISREG(ip->i_d.di_mode) &&
- ((ip->i_size > 0) || (VN_CACHED(VFS_I(ip)) > 0 ||
- ip->i_delayed_blks > 0)) &&
+ (VFS_I(ip)->i_size > 0 ||
+ (VN_CACHED(VFS_I(ip)) > 0 || ip->i_delayed_blks > 0)) &&
(ip->i_df.if_flags & XFS_IFEXTENTS)) &&
(!(ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))) {
@@ -625,7 +625,7 @@ xfs_inactive(
* only one with a reference to the inode.
*/
truncate = ((ip->i_d.di_nlink == 0) &&
- ((ip->i_d.di_size != 0) || (ip->i_size != 0) ||
+ ((ip->i_d.di_size != 0) || XFS_ISIZE(ip) != 0 ||
(ip->i_d.di_nextents > 0) || (ip->i_delayed_blks > 0)) &&
S_ISREG(ip->i_d.di_mode));
@@ -639,12 +639,12 @@ xfs_inactive(
if (ip->i_d.di_nlink != 0) {
if ((S_ISREG(ip->i_d.di_mode) &&
- ((ip->i_size > 0) || (VN_CACHED(VFS_I(ip)) > 0 ||
- ip->i_delayed_blks > 0)) &&
- (ip->i_df.if_flags & XFS_IFEXTENTS) &&
- (!(ip->i_d.di_flags &
+ (VFS_I(ip)->i_size > 0 ||
+ (VN_CACHED(VFS_I(ip)) > 0 || ip->i_delayed_blks > 0)) &&
+ (ip->i_df.if_flags & XFS_IFEXTENTS) &&
+ (!(ip->i_d.di_flags &
(XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) ||
- (ip->i_delayed_blks != 0)))) {
+ ip->i_delayed_blks != 0))) {
error = xfs_free_eofblocks(mp, ip, 0);
if (error)
return VN_INACTIVE_CACHE;
@@ -678,7 +678,6 @@ xfs_inactive(
xfs_trans_ijoin(tp, ip, 0);
ip->i_d.di_size = 0;
- ip->i_size = 0;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
@@ -1974,11 +1973,11 @@ xfs_zero_remaining_bytes(
* since nothing can read beyond eof. The space will
* be zeroed when the file is extended anyway.
*/
- if (startoff >= ip->i_size)
+ if (startoff >= XFS_ISIZE(ip))
return 0;
- if (endoff > ip->i_size)
- endoff = ip->i_size;
+ if (endoff > XFS_ISIZE(ip))
+ endoff = XFS_ISIZE(ip);
bp = xfs_buf_get_uncached(XFS_IS_REALTIME_INODE(ip) ?
mp->m_rtdev_targp : mp->m_ddev_targp,
@@ -2273,7 +2272,7 @@ xfs_change_file_space(
bf->l_start += offset;
break;
case 2: /*SEEK_END*/
- bf->l_start += ip->i_size;
+ bf->l_start += XFS_ISIZE(ip);
break;
default:
return XFS_ERROR(EINVAL);
@@ -2290,7 +2289,7 @@ xfs_change_file_space(
bf->l_whence = 0;
startoffset = bf->l_start;
- fsize = ip->i_size;
+ fsize = XFS_ISIZE(ip);
/*
* XFS_IOC_RESVSP and XFS_IOC_UNRESVSP will reserve or unreserve
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h 2011-12-07 11:17:50.642722594 +0100
+++ xfs/fs/xfs/xfs_inode.h 2011-12-07 11:18:04.319315170 +0100
@@ -246,16 +246,12 @@ typedef struct xfs_inode {
xfs_icdinode_t i_d; /* most of ondisk inode */
- xfs_fsize_t i_size; /* in-memory size */
xfs_fsize_t i_new_size; /* size when write completes */
/* VFS inode */
struct inode i_vnode; /* embedded VFS inode */
} xfs_inode_t;
-#define XFS_ISIZE(ip) S_ISREG((ip)->i_d.di_mode) ? \
- (ip)->i_size : (ip)->i_d.di_size;
-
/* Convert from vfs inode to xfs inode */
static inline struct xfs_inode *XFS_I(struct inode *inode)
{
@@ -269,6 +265,18 @@ static inline struct inode *VFS_I(struct
}
/*
+ * For regular files we only update the on-disk filesize when actually
+ * writing data back to disk. Until then only the copy in the VFS inode
+ * is uptodate.
+ */
+static inline xfs_fsize_t XFS_ISIZE(struct xfs_inode *ip)
+{
+ if (S_ISREG(ip->i_d.di_mode))
+ return i_size_read(VFS_I(ip));
+ return ip->i_d.di_size;
+}
+
+/*
* i_flags helper functions
*/
static inline void
Index: xfs/fs/xfs/xfs_qm_syscalls.c
===================================================================
--- xfs.orig/fs/xfs/xfs_qm_syscalls.c 2011-12-07 11:15:33.600131684 +0100
+++ xfs/fs/xfs/xfs_qm_syscalls.c 2011-12-07 11:18:04.319315170 +0100
@@ -265,7 +265,6 @@ xfs_qm_scall_trunc_qfile(
xfs_trans_ijoin(tp, ip, 0);
ip->i_d.di_size = 0;
- ip->i_size = 0;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 08/11] xfs: remove the i_size field in struct xfs_inode
2011-12-08 15:58 ` [PATCH 08/11] xfs: remove the i_size field in struct xfs_inode Christoph Hellwig
@ 2011-12-13 22:58 ` Dave Chinner
0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2011-12-13 22:58 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Dec 08, 2011 at 10:58:03AM -0500, Christoph Hellwig wrote:
> There is no fundamental need to keep an in-memory inode size copy in the
> XFS inode. We already have the on-disk value in the dinode, and the
> separate in-memory copy that we need for regular files only in the
> XFS inode.
>
> Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use
> the VFS inode i_size field for regular fields. Switch code that was
> directly accessing it to either the XFS_ISIZE macro or direct access
> of the VFS i_size if the code is limited to regular files and in
> highlevel code.
Welcome back, XFS_ISIZE() ;)
> This also allows dropping a a big bunch of fairly complicated code in the
> write path which dealt with keeping the xfs_inode i_size uptodate with the
> VFS i_size that is getting updated inside ->write_end.
Excellent - one less round trip on the ILOCK for each extending
write.
> Note that we do not bother resetting the VFS i_size when truncating a file
> that gets freed to zero as there is point in doing so. Just relax the
no point in doing so because
the VFS inode is no longer in
use at this point.
> assert in xfs_ifree to only check the on-disk size instead.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
......
> @@ -3997,11 +3997,8 @@ xfs_bmap_one_block(
> xfs_bmbt_irec_t s; /* internal version of extent */
>
> #ifndef DEBUG
> - if (whichfork == XFS_DATA_FORK) {
> - return S_ISREG(ip->i_d.di_mode) ?
> - (ip->i_size == ip->i_mount->m_sb.sb_blocksize) :
> - (ip->i_d.di_size == ip->i_mount->m_sb.sb_blocksize);
> - }
> + if (whichfork == XFS_DATA_FORK)
> + return XFS_ISIZE(ip) == ip->i_mount->m_sb.sb_blocksize;
> #endif /* !DEBUG */
> if (XFS_IFORK_NEXTENTS(ip, whichfork) != 1)
> return 0;
I thought that this was a small change of logic, and validated that
it was OK to do so, but turns out it is not. I soon noticed that
this was the logic inside XFS_ISIZE(), and that it is all the
ip->i_size to XFS_ISIZE(ip) conversions that change the logic:
> @@ -269,6 +265,18 @@ static inline struct inode *VFS_I(struct
> }
>
> /*
> + * For regular files we only update the on-disk filesize when actually
> + * writing data back to disk. Until then only the copy in the VFS inode
> + * is uptodate.
> + */
> +static inline xfs_fsize_t XFS_ISIZE(struct xfs_inode *ip)
> +{
> + if (S_ISREG(ip->i_d.di_mode))
> + return i_size_read(VFS_I(ip));
> + return ip->i_d.di_size;
> +}
This means we -explicity- only have a separate in-memory size for
regular files which is a significant change of logic. Preivously all
inodes, regardless of type, had both on-disk and in-memory values.
After thinking about this for a bit, the new code actually reflects
the original intent of the i_size/i_d.di_size split. i.e. it was
only necessary to separate the size into in-memory/on-disk pair so
that the on-disk size matched what -data writeback- had completed.
And data writeback can only occur on regular files, so it was only
necessary for regular files. The original implementation didn't
reflect this - it just changed all inodes to have this split.
However, For all other types of inodes, all the size updates are
made directly to i_d.di_size and a logged immediately. In those
cases, we are simultaneously updating i_size, so we don't really
need it at all. e also don't need to touch the inode->i_size,
because the VFS is already maintaining that for use in these
cases....
Ok, I've convinced myself that this is valid, and I haven't seen any
problems in testing, so:
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 09/11] xfs: remove the i_new_size field in struct xfs_inode
2011-12-08 15:57 [PATCH 00/11] inode shrink and misc updates Christoph Hellwig
` (7 preceding siblings ...)
2011-12-08 15:58 ` [PATCH 08/11] xfs: remove the i_size field in struct xfs_inode Christoph Hellwig
@ 2011-12-08 15:58 ` Christoph Hellwig
2011-12-13 23:16 ` Dave Chinner
2011-12-08 15:58 ` [PATCH 10/11] xfs: always return with the iolock held from xfs_file_aio_write_checks Christoph Hellwig
2011-12-08 15:58 ` [PATCH 11/11] xfs: cleanup xfs_file_aio_write Christoph Hellwig
10 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-08 15:58 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-kill-i_new_size --]
[-- Type: text/plain, Size: 13732 bytes --]
Now that we use the VFS i_size field throughout XFS there is no need for the
i_new_size field any more given that the VFS i_size field gets updated
in ->write_end before unlocking the page, and thus is a) always uptodate when
writeback could see a page. Removing i_new_size also has the advantage that
we will never have to trim back di_size during a failed buffered write,
given that it never gets updated past i_size.
Note that currently the generic direct I/O code only updates i_size after
calling our end_io handler, which requires a small workaround to make
sure di_size actually makes it to disk. I hope to fix this properly in
the generic code.
A downside is that we lose the support for parallel non-overlapping O_DIRECT
appending writes that recently was added. I don't think keeping the complex
and fragile i_new_size infrastructure for this is a good tradeoff - if we
really care about parallel appending writers we should investigate turning
the iolock into a range lock, which would also allow for parallel
non-overlapping buffered writers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_aops.c | 28 +++++++++++---------
fs/xfs/xfs_file.c | 72 +++++++----------------------------------------------
fs/xfs/xfs_iget.c | 1
fs/xfs/xfs_inode.h | 2 -
fs/xfs/xfs_trace.h | 18 ++-----------
5 files changed, 29 insertions(+), 92 deletions(-)
Index: xfs/fs/xfs/xfs_file.c
===================================================================
--- xfs.orig/fs/xfs/xfs_file.c 2011-11-30 12:59:11.669698558 +0100
+++ xfs/fs/xfs/xfs_file.c 2011-11-30 12:59:13.533021797 +0100
@@ -413,27 +413,6 @@ xfs_file_splice_read(
}
/*
- * If this was a direct or synchronous I/O that failed (such as ENOSPC) then
- * part of the I/O may have been written to disk before the error occurred. In
- * this case the on-disk file size may have been adjusted beyond the in-memory
- * file size and now needs to be truncated back.
- */
-STATIC void
-xfs_aio_write_newsize_update(
- struct xfs_inode *ip,
- xfs_fsize_t new_size)
-{
- if (new_size == ip->i_new_size) {
- xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
- if (new_size == ip->i_new_size)
- ip->i_new_size = 0;
- if (ip->i_d.di_size > i_size_read(VFS_I(ip)))
- ip->i_d.di_size = i_size_read(VFS_I(ip));
- xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
- }
-}
-
-/*
* xfs_file_splice_write() does not use xfs_rw_ilock() because
* generic_file_splice_write() takes the i_mutex itself. This, in theory,
* couuld cause lock inversions between the aio_write path and the splice path
@@ -451,7 +430,6 @@ xfs_file_splice_write(
{
struct inode *inode = outfilp->f_mapping->host;
struct xfs_inode *ip = XFS_I(inode);
- xfs_fsize_t new_size;
int ioflags = 0;
ssize_t ret;
@@ -465,20 +443,12 @@ xfs_file_splice_write(
xfs_ilock(ip, XFS_IOLOCK_EXCL);
- new_size = *ppos + count;
-
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- if (new_size > i_size_read(inode))
- ip->i_new_size = new_size;
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags);
if (ret > 0)
XFS_STATS_ADD(xs_write_bytes, ret);
- xfs_aio_write_newsize_update(ip, new_size);
xfs_iunlock(ip, XFS_IOLOCK_EXCL);
return ret;
}
@@ -673,16 +643,13 @@ xfs_file_aio_write_checks(
struct file *file,
loff_t *pos,
size_t *count,
- xfs_fsize_t *new_sizep,
int *iolock)
{
struct inode *inode = file->f_mapping->host;
struct xfs_inode *ip = XFS_I(inode);
- xfs_fsize_t new_size;
int error = 0;
xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
- *new_sizep = 0;
restart:
error = generic_write_checks(file, pos, count, S_ISBLK(inode->i_mode));
if (error) {
@@ -697,15 +664,13 @@ restart:
/*
* If the offset is beyond the size of the file, we need to zero any
* blocks that fall between the existing EOF and the start of this
- * write. There is no need to issue zeroing if another in-flght IO ends
- * at or before this one If zeronig is needed and we are currently
- * holding the iolock shared, we need to update it to exclusive which
- * involves dropping all locks and relocking to maintain correct locking
- * order. If we do this, restart the function to ensure all checks and
- * values are still valid.
+ * write. If zeroing is needed and we are currently holding the
+ * iolock shared, we need to update it to exclusive which involves
+ * dropping all locks and relocking to maintain correct locking order.
+ * If we do this, restart the function to ensure all checks and values
+ * are still valid.
*/
- if ((ip->i_new_size && *pos > ip->i_new_size) ||
- (!ip->i_new_size && *pos > i_size_read(inode))) {
+ if (*pos > i_size_read(inode)) {
if (*iolock == XFS_IOLOCK_SHARED) {
xfs_rw_iunlock(ip, XFS_ILOCK_EXCL | *iolock);
*iolock = XFS_IOLOCK_EXCL;
@@ -714,19 +679,6 @@ restart:
}
error = -xfs_zero_eof(ip, *pos, i_size_read(inode));
}
-
- /*
- * If this IO extends beyond EOF, we may need to update ip->i_new_size.
- * We have already zeroed space beyond EOF (if necessary). Only update
- * ip->i_new_size if this IO ends beyond any other in-flight writes.
- */
- new_size = *pos + *count;
- if (new_size > i_size_read(inode)) {
- if (new_size > ip->i_new_size)
- ip->i_new_size = new_size;
- *new_sizep = new_size;
- }
-
xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
if (error)
return error;
@@ -772,7 +724,6 @@ xfs_file_dio_aio_write(
unsigned long nr_segs,
loff_t pos,
size_t ocount,
- xfs_fsize_t *new_size,
int *iolock)
{
struct file *file = iocb->ki_filp;
@@ -817,7 +768,7 @@ xfs_file_dio_aio_write(
xfs_rw_ilock(ip, *iolock);
}
- ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
+ ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
if (ret)
return ret;
@@ -855,7 +806,6 @@ xfs_file_buffered_aio_write(
unsigned long nr_segs,
loff_t pos,
size_t ocount,
- xfs_fsize_t *new_size,
int *iolock)
{
struct file *file = iocb->ki_filp;
@@ -869,7 +819,7 @@ xfs_file_buffered_aio_write(
*iolock = XFS_IOLOCK_EXCL;
xfs_rw_ilock(ip, *iolock);
- ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
+ ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
if (ret)
return ret;
@@ -909,7 +859,6 @@ xfs_file_aio_write(
ssize_t ret;
int iolock;
size_t ocount = 0;
- xfs_fsize_t new_size = 0;
XFS_STATS_INC(xs_write_calls);
@@ -929,10 +878,10 @@ xfs_file_aio_write(
if (unlikely(file->f_flags & O_DIRECT))
ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos,
- ocount, &new_size, &iolock);
+ ocount, &iolock);
else
ret = xfs_file_buffered_aio_write(iocb, iovp, nr_segs, pos,
- ocount, &new_size, &iolock);
+ ocount, &iolock);
if (ret <= 0)
goto out_unlock;
@@ -953,7 +902,6 @@ xfs_file_aio_write(
}
out_unlock:
- xfs_aio_write_newsize_update(ip, new_size);
xfs_rw_iunlock(ip, iolock);
return ret;
}
Index: xfs/fs/xfs/xfs_aops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_aops.c 2011-11-30 12:59:11.669698558 +0100
+++ xfs/fs/xfs/xfs_aops.c 2011-12-01 08:12:10.946664057 +0100
@@ -111,8 +111,7 @@ xfs_ioend_new_eof(
xfs_fsize_t bsize;
bsize = ioend->io_offset + ioend->io_size;
- isize = MAX(i_size_read(VFS_I(ip)), ip->i_new_size);
- isize = MIN(isize, bsize);
+ isize = MIN(i_size_read(VFS_I(ip)), bsize);
return isize > ip->i_d.di_size ? isize : 0;
}
@@ -126,11 +125,7 @@ static inline bool xfs_ioend_is_append(s
}
/*
- * Update on-disk file size now that data has been written to disk. The
- * current in-memory file size is i_size. If a write is beyond eof i_new_size
- * will be the intended file size until i_size is updated. If this write does
- * not extend all the way to the valid file size then restrict this update to
- * the end of the write.
+ * Update on-disk file size now that data has been written to disk.
*
* This function does not block as blocking on the inode lock in IO completion
* can lead to IO completion order dependency deadlocks.. If it can't get the
@@ -1279,6 +1274,15 @@ xfs_end_io_direct_write(
struct xfs_ioend *ioend = iocb->private;
/*
+ * While the generic direct I/O code updates the inode size, it does
+ * so only after the end_io handler is called, which means our
+ * end_io handler thinks the on-disk size is outside the in-core
+ * size. To prevent this just update it a little bit earlier here.
+ */
+ if (offset + size > i_size_read(ioend->io_inode))
+ i_size_write(ioend->io_inode, offset + size);
+
+ /*
* blockdev_direct_IO can return an error even after the I/O
* completion handler was called. Thus we need to protect
* against double-freeing.
@@ -1340,12 +1344,10 @@ xfs_vm_write_failed(
if (to > inode->i_size) {
/*
- * punch out the delalloc blocks we have already allocated. We
- * don't call xfs_setattr() to do this as we may be in the
- * middle of a multi-iovec write and so the vfs inode->i_size
- * will not match the xfs ip->i_size and so it will zero too
- * much. Hence we jus truncate the page cache to zero what is
- * necessary and punch the delalloc blocks directly.
+ * Punch out the delalloc blocks we have already allocated.
+ *
+ * Don't bother with xfs_setattr given that nothing can have
+ * it do disk yet as the page is still locked at this point.
*/
struct xfs_inode *ip = XFS_I(inode);
xfs_fileoff_t start_fsb;
Index: xfs/fs/xfs/xfs_iget.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iget.c 2011-11-30 12:59:11.676365190 +0100
+++ xfs/fs/xfs/xfs_iget.c 2011-11-30 12:59:13.533021797 +0100
@@ -94,7 +94,6 @@ xfs_inode_alloc(
ip->i_update_core = 0;
ip->i_delayed_blks = 0;
memset(&ip->i_d, 0, sizeof(xfs_icdinode_t));
- ip->i_new_size = 0;
return ip;
}
Index: xfs/fs/xfs/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/xfs_trace.h 2011-11-30 12:59:11.673031874 +0100
+++ xfs/fs/xfs/xfs_trace.h 2011-11-30 12:59:13.536355113 +0100
@@ -891,7 +891,6 @@ DECLARE_EVENT_CLASS(xfs_file_class,
__field(dev_t, dev)
__field(xfs_ino_t, ino)
__field(xfs_fsize_t, size)
- __field(xfs_fsize_t, new_size)
__field(loff_t, offset)
__field(size_t, count)
__field(int, flags)
@@ -900,17 +899,15 @@ DECLARE_EVENT_CLASS(xfs_file_class,
__entry->dev = VFS_I(ip)->i_sb->s_dev;
__entry->ino = ip->i_ino;
__entry->size = ip->i_d.di_size;
- __entry->new_size = ip->i_new_size;
__entry->offset = offset;
__entry->count = count;
__entry->flags = flags;
),
- TP_printk("dev %d:%d ino 0x%llx size 0x%llx new_size 0x%llx "
+ TP_printk("dev %d:%d ino 0x%llx size 0x%llx "
"offset 0x%llx count 0x%zx ioflags %s",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->ino,
__entry->size,
- __entry->new_size,
__entry->offset,
__entry->count,
__print_flags(__entry->flags, "|", XFS_IO_FLAGS))
@@ -978,7 +975,6 @@ DECLARE_EVENT_CLASS(xfs_imap_class,
__field(dev_t, dev)
__field(xfs_ino_t, ino)
__field(loff_t, size)
- __field(loff_t, new_size)
__field(loff_t, offset)
__field(size_t, count)
__field(int, type)
@@ -990,7 +986,6 @@ DECLARE_EVENT_CLASS(xfs_imap_class,
__entry->dev = VFS_I(ip)->i_sb->s_dev;
__entry->ino = ip->i_ino;
__entry->size = ip->i_d.di_size;
- __entry->new_size = ip->i_new_size;
__entry->offset = offset;
__entry->count = count;
__entry->type = type;
@@ -998,13 +993,11 @@ DECLARE_EVENT_CLASS(xfs_imap_class,
__entry->startblock = irec ? irec->br_startblock : 0;
__entry->blockcount = irec ? irec->br_blockcount : 0;
),
- TP_printk("dev %d:%d ino 0x%llx size 0x%llx new_size 0x%llx "
- "offset 0x%llx count %zd type %s "
- "startoff 0x%llx startblock %lld blockcount 0x%llx",
+ TP_printk("dev %d:%d ino 0x%llx size 0x%llx offset 0x%llx count %zd "
+ "type %s startoff 0x%llx startblock %lld blockcount 0x%llx",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->ino,
__entry->size,
- __entry->new_size,
__entry->offset,
__entry->count,
__print_symbolic(__entry->type, XFS_IO_TYPES),
@@ -1031,7 +1024,6 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
__field(xfs_ino_t, ino)
__field(loff_t, isize)
__field(loff_t, disize)
- __field(loff_t, new_size)
__field(loff_t, offset)
__field(size_t, count)
),
@@ -1040,17 +1032,15 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
__entry->ino = ip->i_ino;
__entry->isize = VFS_I(ip)->i_size;
__entry->disize = ip->i_d.di_size;
- __entry->new_size = ip->i_new_size;
__entry->offset = offset;
__entry->count = count;
),
- TP_printk("dev %d:%d ino 0x%llx isize 0x%llx disize 0x%llx new_size 0x%llx "
+ TP_printk("dev %d:%d ino 0x%llx isize 0x%llx disize 0x%llx "
"offset 0x%llx count %zd",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->ino,
__entry->isize,
__entry->disize,
- __entry->new_size,
__entry->offset,
__entry->count)
);
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h 2011-11-30 12:59:11.679698505 +0100
+++ xfs/fs/xfs/xfs_inode.h 2011-12-01 08:12:10.839997391 +0100
@@ -246,8 +246,6 @@ typedef struct xfs_inode {
xfs_icdinode_t i_d; /* most of ondisk inode */
- xfs_fsize_t i_new_size; /* size when write completes */
-
/* VFS inode */
struct inode i_vnode; /* embedded VFS inode */
} xfs_inode_t;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 09/11] xfs: remove the i_new_size field in struct xfs_inode
2011-12-08 15:58 ` [PATCH 09/11] xfs: remove the i_new_size " Christoph Hellwig
@ 2011-12-13 23:16 ` Dave Chinner
2011-12-14 13:30 ` Christoph Hellwig
0 siblings, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2011-12-13 23:16 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Dec 08, 2011 at 10:58:04AM -0500, Christoph Hellwig wrote:
> Now that we use the VFS i_size field throughout XFS there is no need for the
> i_new_size field any more given that the VFS i_size field gets updated
> in ->write_end before unlocking the page, and thus is a) always uptodate when
> writeback could see a page. Removing i_new_size also has the advantage that
> we will never have to trim back di_size during a failed buffered write,
> given that it never gets updated past i_size.
>
> Note that currently the generic direct I/O code only updates i_size after
> calling our end_io handler, which requires a small workaround to make
> sure di_size actually makes it to disk. I hope to fix this properly in
> the generic code.
I think there's a couple of issues with the work around - the
generic code marks the inode dirty when it updates the inode size.
With your early setting of the size, it will no longer do this
because it doesn't see that it needs to update the inode size.
This may not be a problem if we mark the inode dirty elsewhere, but
I'm not sure we do in the direct IO path.
Apart from that, I don't see anything wrong with the change. I'll
wait for you to clarify whether we need to mark the inode dirty
before signing off on it, though....
> A downside is that we lose the support for parallel non-overlapping O_DIRECT
> appending writes that recently was added. I don't think keeping the complex
> and fragile i_new_size infrastructure for this is a good tradeoff - if we
> really care about parallel appending writers we should investigate turning
> the iolock into a range lock, which would also allow for parallel
> non-overlapping buffered writers.
Yeah, i think we can live without that for the moment. Range locks
are a much better idea for many reasons (like concurrent buffered
write support).
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 09/11] xfs: remove the i_new_size field in struct xfs_inode
2011-12-13 23:16 ` Dave Chinner
@ 2011-12-14 13:30 ` Christoph Hellwig
0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-14 13:30 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
On Wed, Dec 14, 2011 at 10:16:11AM +1100, Dave Chinner wrote:
> > Note that currently the generic direct I/O code only updates i_size after
> > calling our end_io handler, which requires a small workaround to make
> > sure di_size actually makes it to disk. I hope to fix this properly in
> > the generic code.
>
> I think there's a couple of issues with the work around - the
> generic code marks the inode dirty when it updates the inode size.
> With your early setting of the size, it will no longer do this
> because it doesn't see that it needs to update the inode size.
> This may not be a problem if we mark the inode dirty elsewhere, but
> I'm not sure we do in the direct IO path.
We do from I/O completion, once we actually update the on-disk inode
size that actually gets logged. Before the on-disk inode size has
been updated ->write_inode is a no-op as far as size updates are
concerned, so marking the inode dirty earlier doesn't buy us anything.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 10/11] xfs: always return with the iolock held from xfs_file_aio_write_checks
2011-12-08 15:57 [PATCH 00/11] inode shrink and misc updates Christoph Hellwig
` (8 preceding siblings ...)
2011-12-08 15:58 ` [PATCH 09/11] xfs: remove the i_new_size " Christoph Hellwig
@ 2011-12-08 15:58 ` Christoph Hellwig
2011-12-13 23:20 ` Dave Chinner
2011-12-08 15:58 ` [PATCH 11/11] xfs: cleanup xfs_file_aio_write Christoph Hellwig
10 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-08 15:58 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-cleanup-xfs_file_aio_write_checks --]
[-- Type: text/plain, Size: 1293 bytes --]
While xfs_iunlock is fine with 0 lockflags the calling conventions are much
cleaner if xfs_file_aio_write_checks never returns without the iolock held.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_file.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
Index: xfs/fs/xfs/xfs_file.c
===================================================================
--- xfs.orig/fs/xfs/xfs_file.c 2011-12-07 12:46:31.343897882 +0100
+++ xfs/fs/xfs/xfs_file.c 2011-12-07 12:48:33.309903801 +0100
@@ -636,7 +636,9 @@ out_lock:
/*
* Common pre-write limit and setup checks.
*
- * Returns with iolock held according to @iolock.
+ * Called with the iolocked held either shared and exclusive according to
+ * @iolock, and returns with it held. Might upgrade the iolock to exclusive
+ * if called for a direct write beyond i_size.
*/
STATIC ssize_t
xfs_file_aio_write_checks(
@@ -653,8 +655,7 @@ xfs_file_aio_write_checks(
restart:
error = generic_write_checks(file, pos, count, S_ISBLK(inode->i_mode));
if (error) {
- xfs_rw_iunlock(ip, XFS_ILOCK_EXCL | *iolock);
- *iolock = 0;
+ xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
return error;
}
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 10/11] xfs: always return with the iolock held from xfs_file_aio_write_checks
2011-12-08 15:58 ` [PATCH 10/11] xfs: always return with the iolock held from xfs_file_aio_write_checks Christoph Hellwig
@ 2011-12-13 23:20 ` Dave Chinner
2011-12-14 13:27 ` Christoph Hellwig
0 siblings, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2011-12-13 23:20 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Dec 08, 2011 at 10:58:05AM -0500, Christoph Hellwig wrote:
> While xfs_iunlock is fine with 0 lockflags the calling conventions are much
> cleaner if xfs_file_aio_write_checks never returns without the iolock held.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
I'm not sure this makes sense by itself. maybe combine it with the
next patch? Regardless:
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 10/11] xfs: always return with the iolock held from xfs_file_aio_write_checks
2011-12-13 23:20 ` Dave Chinner
@ 2011-12-14 13:27 ` Christoph Hellwig
0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-14 13:27 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
On Wed, Dec 14, 2011 at 10:20:17AM +1100, Dave Chinner wrote:
> On Thu, Dec 08, 2011 at 10:58:05AM -0500, Christoph Hellwig wrote:
> > While xfs_iunlock is fine with 0 lockflags the calling conventions are much
> > cleaner if xfs_file_aio_write_checks never returns without the iolock held.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> I'm not sure this makes sense by itself. maybe combine it with the
> next patch? Regardless:
I actually had it that way initially but split the patches because the
changes aren't directly related.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 11/11] xfs: cleanup xfs_file_aio_write
2011-12-08 15:57 [PATCH 00/11] inode shrink and misc updates Christoph Hellwig
` (9 preceding siblings ...)
2011-12-08 15:58 ` [PATCH 10/11] xfs: always return with the iolock held from xfs_file_aio_write_checks Christoph Hellwig
@ 2011-12-08 15:58 ` Christoph Hellwig
2011-12-13 23:28 ` Dave Chinner
10 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2011-12-08 15:58 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-cleanup-xfs_file_aio_write --]
[-- Type: text/plain, Size: 5634 bytes --]
With all the size field updates out of the way xfs_file_aio_write can
be further simplified by pushing all iolock handling into
xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using
the generic generic_write_sync helper for synchronous writes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_file.c | 82 ++++++++++++++++++++++++------------------------------
1 file changed, 37 insertions(+), 45 deletions(-)
Index: xfs/fs/xfs/xfs_file.c
===================================================================
--- xfs.orig/fs/xfs/xfs_file.c 2011-12-07 12:48:33.309903801 +0100
+++ xfs/fs/xfs/xfs_file.c 2011-12-07 12:48:48.546487926 +0100
@@ -724,8 +724,7 @@ xfs_file_dio_aio_write(
const struct iovec *iovp,
unsigned long nr_segs,
loff_t pos,
- size_t ocount,
- int *iolock)
+ size_t ocount)
{
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
@@ -735,10 +734,10 @@ xfs_file_dio_aio_write(
ssize_t ret = 0;
size_t count = ocount;
int unaligned_io = 0;
+ int iolock;
struct xfs_buftarg *target = XFS_IS_REALTIME_INODE(ip) ?
mp->m_rtdev_targp : mp->m_ddev_targp;
- *iolock = 0;
if ((pos & target->bt_smask) || (count & target->bt_smask))
return -XFS_ERROR(EINVAL);
@@ -753,31 +752,31 @@ xfs_file_dio_aio_write(
* EOF zeroing cases and fill out the new inode size as appropriate.
*/
if (unaligned_io || mapping->nrpages)
- *iolock = XFS_IOLOCK_EXCL;
+ iolock = XFS_IOLOCK_EXCL;
else
- *iolock = XFS_IOLOCK_SHARED;
- xfs_rw_ilock(ip, *iolock);
+ iolock = XFS_IOLOCK_SHARED;
+ xfs_rw_ilock(ip, iolock);
/*
* Recheck if there are cached pages that need invalidate after we got
* the iolock to protect against other threads adding new pages while
* we were waiting for the iolock.
*/
- if (mapping->nrpages && *iolock == XFS_IOLOCK_SHARED) {
- xfs_rw_iunlock(ip, *iolock);
- *iolock = XFS_IOLOCK_EXCL;
- xfs_rw_ilock(ip, *iolock);
+ if (mapping->nrpages && iolock == XFS_IOLOCK_SHARED) {
+ xfs_rw_iunlock(ip, iolock);
+ iolock = XFS_IOLOCK_EXCL;
+ xfs_rw_ilock(ip, iolock);
}
- ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
+ ret = xfs_file_aio_write_checks(file, &pos, &count, &iolock);
if (ret)
- return ret;
+ goto out;
if (mapping->nrpages) {
ret = -xfs_flushinval_pages(ip, (pos & PAGE_CACHE_MASK), -1,
FI_REMAPF_LOCKED);
if (ret)
- return ret;
+ goto out;
}
/*
@@ -786,15 +785,18 @@ xfs_file_dio_aio_write(
*/
if (unaligned_io)
inode_dio_wait(inode);
- else if (*iolock == XFS_IOLOCK_EXCL) {
+ else if (iolock == XFS_IOLOCK_EXCL) {
xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
- *iolock = XFS_IOLOCK_SHARED;
+ iolock = XFS_IOLOCK_SHARED;
}
trace_xfs_file_direct_write(ip, count, iocb->ki_pos, 0);
ret = generic_file_direct_write(iocb, iovp,
&nr_segs, pos, &iocb->ki_pos, count, ocount);
+out:
+ xfs_rw_iunlock(ip, iolock);
+
/* No fallback to buffered IO on errors for XFS. */
ASSERT(ret < 0 || ret == count);
return ret;
@@ -806,8 +808,7 @@ xfs_file_buffered_aio_write(
const struct iovec *iovp,
unsigned long nr_segs,
loff_t pos,
- size_t ocount,
- int *iolock)
+ size_t ocount)
{
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
@@ -815,14 +816,14 @@ xfs_file_buffered_aio_write(
struct xfs_inode *ip = XFS_I(inode);
ssize_t ret;
int enospc = 0;
+ int iolock = XFS_IOLOCK_EXCL;
size_t count = ocount;
- *iolock = XFS_IOLOCK_EXCL;
- xfs_rw_ilock(ip, *iolock);
+ xfs_rw_ilock(ip, iolock);
- ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
+ ret = xfs_file_aio_write_checks(file, &pos, &count, &iolock);
if (ret)
- return ret;
+ goto out;
/* We can write back this queue in page reclaim */
current->backing_dev_info = mapping->backing_dev_info;
@@ -836,13 +837,15 @@ write_retry:
* page locks and retry *once*
*/
if (ret == -ENOSPC && !enospc) {
- ret = -xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
- if (ret)
- return ret;
enospc = 1;
- goto write_retry;
+ ret = -xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
+ if (!ret)
+ goto write_retry;
}
+
current->backing_dev_info = NULL;
+out:
+ xfs_rw_iunlock(ip, iolock);
return ret;
}
@@ -858,7 +861,6 @@ xfs_file_aio_write(
struct inode *inode = mapping->host;
struct xfs_inode *ip = XFS_I(inode);
ssize_t ret;
- int iolock;
size_t ocount = 0;
XFS_STATS_INC(xs_write_calls);
@@ -878,32 +880,22 @@ xfs_file_aio_write(
return -EIO;
if (unlikely(file->f_flags & O_DIRECT))
- ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos,
- ocount, &iolock);
+ ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos, ocount);
else
ret = xfs_file_buffered_aio_write(iocb, iovp, nr_segs, pos,
- ocount, &iolock);
+ ocount);
- if (ret <= 0)
- goto out_unlock;
+ if (ret > 0) {
+ ssize_t err;
- XFS_STATS_ADD(xs_write_bytes, ret);
+ XFS_STATS_ADD(xs_write_bytes, ret);
- /* Handle various SYNC-type writes */
- if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
- loff_t end = pos + ret - 1;
- int error;
-
- xfs_rw_iunlock(ip, iolock);
- error = xfs_file_fsync(file, pos, end,
- (file->f_flags & __O_SYNC) ? 0 : 1);
- xfs_rw_ilock(ip, iolock);
- if (error)
- ret = error;
+ /* Handle various SYNC-type writes */
+ err = generic_write_sync(file, pos, ret);
+ if (err < 0)
+ ret = err;
}
-out_unlock:
- xfs_rw_iunlock(ip, iolock);
return ret;
}
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 11/11] xfs: cleanup xfs_file_aio_write
2011-12-08 15:58 ` [PATCH 11/11] xfs: cleanup xfs_file_aio_write Christoph Hellwig
@ 2011-12-13 23:28 ` Dave Chinner
0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2011-12-13 23:28 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Dec 08, 2011 at 10:58:06AM -0500, Christoph Hellwig wrote:
> With all the size field updates out of the way xfs_file_aio_write can
> be further simplified by pushing all iolock handling into
> xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using
> the generic generic_write_sync helper for synchronous writes.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks ok to me. The whole series cleans up the IO path nicely.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 27+ messages in thread