public inbox for linux-xfs@vger.kernel.org
* [PATCH] [RFC] xfs: increase inode cluster size for v5 filesystems
@ 2013-09-09  8:34 Dave Chinner
  2013-09-09 13:32 ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2013-09-09  8:34 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

v5 filesystems use 512 byte inodes as a minimum, so they read inodes
in clusters that hold effectively half as many inodes as a v4
filesystem with 256 byte inodes. For v5 filesystems, scale the inode
cluster size with the inode size so that we keep a constant ratio of
32 inodes per cluster for all inode IO.

This only works if mkfs.xfs sets the inode alignment appropriately
for larger inode clusters, so this functionality is made conditional
on mkfs doing the right thing. xfs_repair needs to know about
the inode alignment changes, too.

FWIW, results with lookaside cache size of 37 entries with this
patch are (Note: finobt enabled on v5 filesystems, v4 using defaults
including 256 byte inode size):

Wall time:
	create	bulkstat	find+stat	ls -R	unlink
v4	237s	161s		173s		201s	299s
v5	235s	163s		205s		 31s	356s
patched	234s	160s		182s		 29s	317s

System time:
	create	bulkstat	find+stat	ls -R	unlink
v4	2601s	2490s		1653s		1656s	2960s
v5	2637s	2497s		1681s		  20s	3216s
patched	2613s	2451s		1658s		  20s	3007s

Lookaside cache hit rate:
	create	bulkstat	find+stat	ls -R	unlink
v4	0.73	0.91		0.71		0.70	0.71
v5	0.76	0.88		0.68		0.10	0.75
patched	0.81	0.93		0.70		0.08	0.84

So, wall time is the same or down across the board, system time is
the same or down across the board, and cache hit rates all improve
except for the ls -R case, which is a pure cold-cache directory read
workload on v5 filesystems...

So, this patch removes most of the performance and CPU usage
differential between v4 and v5 filesystems on traversal related
workloads.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_mount.c      | 14 ++++++++++++++
 fs/xfs/xfs_mount.h      |  2 +-
 fs/xfs/xfs_trans_resv.c |  3 +--
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 8ac98c7..788d666d 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -722,8 +722,22 @@ xfs_mountfs(
 	 * Set the inode cluster size.
 	 * This may still be overridden by the file system
 	 * block size if it is larger than the chosen cluster size.
+	 *
+	 * For v5 filesystems, scale the cluster size with the inode size to
+	 * keep a constant ratio of inode per cluster buffer, but only if mkfs
+	 * has set the inode alignment value appropriately for larger cluster
+	 * sizes.
 	 */
 	mp->m_inode_cluster_size = XFS_INODE_BIG_CLUSTER_SIZE;
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		int	new_size = mp->m_inode_cluster_size;
+
+		new_size *= mp->m_sb.sb_inodesize / XFS_DINODE_MIN_SIZE;
+		if (mp->m_sb.sb_inoalignmt >= XFS_B_TO_FSBT(mp, new_size))
+			mp->m_inode_cluster_size = new_size;
+		xfs_info(mp, "Using inode cluster size of %d bytes",
+			 mp->m_inode_cluster_size);
+	}
 
 	/*
 	 * Set inode alignment fields
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 2a997dc..a4f7f94 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -111,7 +111,7 @@ typedef struct xfs_mount {
 	__uint8_t		m_blkbb_log;	/* blocklog - BBSHIFT */
 	__uint8_t		m_agno_log;	/* log #ag's */
 	__uint8_t		m_agino_log;	/* #bits for agino in inum */
-	__uint16_t		m_inode_cluster_size;/* min inode buf size */
+	uint			m_inode_cluster_size;/* min inode buf size */
 	uint			m_blockmask;	/* sb_blocksize-1 */
 	uint			m_blockwsize;	/* sb_blocksize in words */
 	uint			m_blockwmask;	/* blockwsize-1 */
diff --git a/fs/xfs/xfs_trans_resv.c b/fs/xfs/xfs_trans_resv.c
index ae7a185..1494f62 100644
--- a/fs/xfs/xfs_trans_resv.c
+++ b/fs/xfs/xfs_trans_resv.c
@@ -403,8 +403,7 @@ xfs_calc_ifree_reservation(
 		xfs_calc_inode_res(mp, 1) +
 		xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
 		xfs_calc_buf_res(2, XFS_FSB_TO_B(mp, 1)) +
-		MAX((__uint16_t)XFS_FSB_TO_B(mp, 1),
-		    XFS_INODE_CLUSTER_SIZE(mp)) +
+		MAX(XFS_FSB_TO_B(mp, 1), XFS_INODE_CLUSTER_SIZE(mp)) +
 		xfs_calc_buf_res(1, 0) +
 		xfs_calc_buf_res(2 + XFS_IALLOC_BLOCKS(mp) +
 				 mp->m_in_maxlevels, 0) +
-- 
1.8.3.2

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [PATCH] [RFC] xfs: increase inode cluster size for v5 filesystems
  2013-09-09  8:34 [PATCH] [RFC] xfs: increase inode cluster size for v5 filesystems Dave Chinner
@ 2013-09-09 13:32 ` Christoph Hellwig
  2013-09-09 13:54   ` Eric Sandeen
  2013-09-09 15:35   ` Dave Chinner
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2013-09-09 13:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

I can't see anything v5 format specific here, as 512 byte inodes can be
created with v4 filesystems as well.


* Re: [PATCH] [RFC] xfs: increase inode cluster size for v5 filesystems
  2013-09-09 13:32 ` Christoph Hellwig
@ 2013-09-09 13:54   ` Eric Sandeen
  2013-09-09 15:35   ` Dave Chinner
  1 sibling, 0 replies; 10+ messages in thread
From: Eric Sandeen @ 2013-09-09 13:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 9/9/13 8:32 AM, Christoph Hellwig wrote:
> I can't see anything v5 format specific here, as 512 byte inodes can be
> created with v4 filesystems as well.

Good point - why is this uniquely needed for V5 filesystems?
If it wasn't a problem w/ larger inodes before, why is it now?

This makes the dreaded test matrix go a bit more wonky...

-Eric


* Re: [PATCH] [RFC] xfs: increase inode cluster size for v5 filesystems
  2013-09-09 13:32 ` Christoph Hellwig
  2013-09-09 13:54   ` Eric Sandeen
@ 2013-09-09 15:35   ` Dave Chinner
  2013-09-11 16:21     ` Christoph Hellwig
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2013-09-09 15:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Sep 09, 2013 at 06:32:54AM -0700, Christoph Hellwig wrote:
> I can't see anything v5 format specific here, as 512 byte inodes can be
> created with v4 filesystems as well.

It's an RFC, and so I've just done the minimum I need to do to
verify it is working correctly. Indeed, I'm not sure it is all there
yet, as I haven't tested increased inode alignments on older
kernels. So I'm not sure whether it will need a feature bit or
not yet.

The test matrix of having to test everything on v4 and v5 is just
nasty, especially if we are talking about prototyping code. I'd much
prefer to bring things to v5 filesystems where we have much lower
exposure and risk of corruption problems, and then when we know it's
solid because of the QA we've done on it, then we can expose the
majority of the XFS userbase to it by bringing it back to v4
filesystems.

As I've said before - if someone wants to bring new features to v4
filesystems sooner than I do, then they need to step up and do the
work themselves, because I don't have the time and resources to
fully verify new features on v4 filesystem formats...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] [RFC] xfs: increase inode cluster size for v5 filesystems
  2013-09-09 15:35   ` Dave Chinner
@ 2013-09-11 16:21     ` Christoph Hellwig
  2013-09-17  1:04       ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2013-09-11 16:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Tue, Sep 10, 2013 at 01:35:47AM +1000, Dave Chinner wrote:
> The test matrix of having to test everything on v4 and v5 is just
> nasty, especially if we are talking about prototyping code. I'd much
> prefer to bring things to v5 filesystems where we have much lower
> exposure and risk of corruption problems, and then when we know it's
> solid because of the QA we've done on it, then we can expose the
> majority of the XFS userbase to it by bringing it back to v4
> filesystems.

I think the test matrix is a reason for not enabling this only on v5
filesystems.  Large inodes are an old and supported use case, although
probably not as heavily tested as they should be.  By introducing two
different large inode cases we don't really help increase test
coverage for a code path that is the same for v4 and v5.

That being said as long as you're still prototyping I'm not going to
interfere.


* Re: [PATCH] [RFC] xfs: increase inode cluster size for v5 filesystems
  2013-09-11 16:21     ` Christoph Hellwig
@ 2013-09-17  1:04       ` Dave Chinner
  2013-09-17 13:51         ` Mark Tinguely
  2013-09-17 14:46         ` Eric Sandeen
  0 siblings, 2 replies; 10+ messages in thread
From: Dave Chinner @ 2013-09-17  1:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Sep 11, 2013 at 09:21:59AM -0700, Christoph Hellwig wrote:
> On Tue, Sep 10, 2013 at 01:35:47AM +1000, Dave Chinner wrote:
> > The test matrix of having to test everything on v4 and v5 is just
> > nasty, especially if we are talking about prototyping code. I'd much
> > prefer to bring things to v5 filesystems where we have much lower
> > exposure and risk of corruption problems, and then when we know it's
> > solid because of the QA we've done on it, then we can expose the
> > majority of the XFS userbase to it by bringing it back to v4
> > filesystems.
> 
> I think the test matrix is a reason for not enabling this only on v5
> filesystems. 

You're assuming that someone is doing lots of QA on v4 filesystems.
Most of my attention is focussed on v5 filesystems and compared to
the amount of v5 QA I'm doing, there is very little v4 QA. All my
development and prototyping is being done on v5 filesystems, and the
code I post indicates that.

I'm not about to propose new features for v4 filesystems if I
haven't tested them robustly. And, in many cases, the new features
I'm proposing require a new filesystem to be made (like this one
does because of the inode alignment requirement) and userspace tool
support, and so it's going to be months (maybe a year) before
userspace support is in the hands of distro-based users.

People testing v5 filesystems right now are handrolling their
userspace code, and so they are following the bleeding edge of both
user and kernel space development. They are not using the bleeding
edge to test new v4 filesystem features. 

Given this, it makes sense to roll the v5 code first, then a
kernel release or 2 later roll in the v4 support once the v5 code
has been exposed and we've flushed out the problems. It minimises
our exposure to filesystem corruption issues, it gets the code into
the hands of early adopters and testers quickly, and it gets rolled
back into v4 filesystems in the same timeframe as distros will be
picking up the feature in v5 filesystems for the first time.

Nobody has yet given a technical reason why such a careful, staged
approach to new feature rollout for v4 filesystems is untenable. All
I'm hearing is people shouting at me for not bringing new features
to v4 filesystems.  Indeed, my reasons and plans to bring the
features to v4 in the near future are being completely ignored to
the point of recklessness...

> Large inodes are an old and supported use case, although
> probably not as heavily tested as they should be.  By introducing two
> different large inode cases we don't really help increase test
> coverage for a code path that is the same for v4 and v5.

I think you've got it wrong - 512 byte inodes have not been
regularly or heavily tested until we introduced v5 filesystems. Now
they are getting tested all the time on v5 filesystems, but AFAICT
there's only one person other than me regularly testing v5
filesystems and reporting bugs (Michael Semon).  Hence, AFAICT there
is very little ongoing test coverage of large inodes on v4
filesystems, and so the expansion of the test matrix to cover large
inodes on v4 filesystem is a very relevant concern.

We will be enabling both d_type and large inode clusters on v5
filesystems at all times - they won't be optional features. Hence
test matrix is simple - enable v5, all new features are enabled and
are tested.

However, for v4 filesystems, we've now got default v4, v4 X dtype,
v4 X dtype X 512 byte inodes, v4 X dtype X 512 byte inodes X inode
alignment (i.e. forwards and backwards compatibility of large inode
cluster configs on old 8k cluster kernels) and finally v4 X dtype X
512 byte inodes X inode alignment X large clusters.

IOWs, every change we make for v4 filesystems adds another
*optional* dimension to the v4 filesystem test matrix. Such an
explosion of feature configurations is not sustainable or
maintainable - ext4 has proven that beyond a doubt.  We have to
consider the cross multiplication of the optional v4 feature matrix,
and consider that everything needs to work correctly for all the
different combinations that can be made.

So, code paths might be shared between v4 and v5 filesystems, but we
don't have an optional feature matrix on v5 (yet), nor do we have
concerns about backwards and forwards compatibility, and so adding
new features to v5 filesystems has a far, far lower testing and QA
burden than adding a new feature to a v4 filesystem.

As I've repeatedly said, if someone wants to do all the v4
validation work I've mentioned above faster than I can do it, then
they can provide the patches for the v4 support in kernel and
userspace and all the tests needed to validate it on v4 filesystems.

[ And even then, the v4 dtype fiasco shows that some people have a
major misunderstanding of what is necessary to enable a feature on a
v4 filesystem. I'm still waiting for all the missing bits I
mentioned in my review of the patch to add the feature bit that were
ignored. e.g. the XFS_IOC_FSGEOM support for the feature bit, the
changes to xfs_info to emit that it's enabled, mkfs to emit that
it's enabled, xfs_db support for the field on v4 filesystems, etc.

IOWs, there is still a significant amount missing from the v4 dtype
support and so, again, I have little confidence that such things
will get done properly until I get around to doing them. I'll be
pleasantly surprised if the patches appear before I write them (the
kernel XFS_IOC_FSGEOM support needs to be in before 3.12 releases),
but I fear that I'm going to be forced to write them soon.... ]

> That being said as long as you're still prototyping I'm not going to
> interfere.

Until I see other people pro-actively fixing regressions, I don't
see that there is any scope for changing my approach. Right now the
only person I can really rely on to proactively fix problems is
myself, and I have limited time and resources...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] [RFC] xfs: increase inode cluster size for v5 filesystems
  2013-09-17  1:04       ` Dave Chinner
@ 2013-09-17 13:51         ` Mark Tinguely
  2013-09-17 21:25           ` Dave Chinner
  2013-09-17 14:46         ` Eric Sandeen
  1 sibling, 1 reply; 10+ messages in thread
From: Mark Tinguely @ 2013-09-17 13:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On 09/16/13 20:04, Dave Chinner wrote:
> On Wed, Sep 11, 2013 at 09:21:59AM -0700, Christoph Hellwig wrote:
>> On Tue, Sep 10, 2013 at 01:35:47AM +1000, Dave Chinner wrote:
>>> The test matrix of having to test everything on v4 and v5 is just
>>> nasty, especially if we are talking about prototyping code. I'd much
>>> prefer to bring things to v5 filesystems where we have much lower
>>> exposure and risk of corruption problems, and then when we know it's
>>> solid because of the QA we've done on it, then we can expose the
>>> majority of the XFS userbase to it by bringing it back to v4
>>> filesystems.
>>
>> I think the test matrix is a reason for not enabling this only on v5
>> filesystems.
....

We are *not* screaming for this on v4. Not screaming for this to be 
mandatory on v5.

It will make inode allocation more difficult as the drive fragments.

--Mark.


* Re: [PATCH] [RFC] xfs: increase inode cluster size for v5 filesystems
  2013-09-17  1:04       ` Dave Chinner
  2013-09-17 13:51         ` Mark Tinguely
@ 2013-09-17 14:46         ` Eric Sandeen
  2013-09-17 21:11           ` Dave Chinner
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Sandeen @ 2013-09-17 14:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On 9/16/13 8:04 PM, Dave Chinner wrote:
> On Wed, Sep 11, 2013 at 09:21:59AM -0700, Christoph Hellwig wrote:
>> On Tue, Sep 10, 2013 at 01:35:47AM +1000, Dave Chinner wrote:
>>> The test matrix of having to test everything on v4 and v5 is just
>>> nasty, especially if we are talking about prototyping code. I'd much
>>> prefer to bring things to v5 filesystems where we have much lower
>>> exposure and risk of corruption problems, and then when we know it's
>>> solid because of the QA we've done on it, then we can expose the
>>> majority of the XFS userbase to it by bringing it back to v4
>>> filesystems.
>>
>> I think the test matrix is a reason for not enabling this only on v5
>> filesystems. 
> 
> You're assuming that someone is doing lots of QA on v4 filesystems.
> Most of my attention is focussed on v5 filesystems and compared to
> the amount of v5 QA I'm doing, there is very little v4 QA. All my
> development and prototyping is being done on v5 filesystems, and the
> code I post indicates that.

Red Hat QE is doing quite a lot of testing of V4 at this point, although
not on totally bleeding-edge kernels.

> I'm not about to propose new features for v4 filesystems if I
> haven't tested them robustly. And, in many cases, the new features
> I'm proposing require a new filesystem to be made (like this one
> does because of the inode alignment requirement) and userspace tool
> support, and so it's going to be months (maybe a year) before
> userspace support is in the hands of distro-based users.
> 
> People testing v5 filesystems right now are handrolling their
> userspace code, and so they are following the bleeding edge of both
> user and kernel space development. They are not using the bleeding
> edge to test new v4 filesystem features. 
> 
> Given this, it makes sense to roll the v5 code first, then a
> kernel release or 2 later roll in the v4 support once the v5 code
> has been exposed and we've flushed out the problems. It minimises
> our exposure to filesystem corruption issues, it gets the code into
> the hands of early adopters and testers quickly, and it gets rolled
> back into v4 filesystems in the same timeframe as distros will be
> picking up the feature in v5 filesystems for the first time.
> 
> Nobody has yet given a technical reason why such a careful, staged
> approach to new feature rollout for v4 filesystems is untenable. All
> I'm hearing is people shouting at me for not bringing new features
> to v4 filesystems.  Indeed, my reasons and plans to bring the
> features to v4 in the near future are being completely ignored to
> the point of recklessness...

That sounds perfectly reasonable to me; from your original RFC
it wasn't clear to me that that was the plan (stage it & roll it out
for V4 later).

>> Large inodes are an old and supported use case, although
>> probably not as heavily tested as they should be.  By introducing two
>> different large inode cases we don't really help increase test
>> coverage for a code path that is the same for v4 and v5.
> 
> I think you've got it wrong - 512 byte inodes have not been
> regularly or heavily tested until we introduced v5 filesystems.

Gluster users have been advised to use 512-byte inodes for quite
some time, actually.
(http://www.gluster.org/community/documentation/index.php/QuickStart)

So there is some real-world coverage, and presumably QE as well.

I understand your valid concerns (snipped below as well) but let's not
overstate the case either; V4 and 512-byte are both seeing some coverage
even today.

-Eric


* Re: [PATCH] [RFC] xfs: increase inode cluster size for v5 filesystems
  2013-09-17 14:46         ` Eric Sandeen
@ 2013-09-17 21:11           ` Dave Chinner
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2013-09-17 21:11 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Christoph Hellwig, xfs

On Tue, Sep 17, 2013 at 09:46:15AM -0500, Eric Sandeen wrote:
> On 9/16/13 8:04 PM, Dave Chinner wrote:
> > On Wed, Sep 11, 2013 at 09:21:59AM -0700, Christoph Hellwig wrote:
> >> On Tue, Sep 10, 2013 at 01:35:47AM +1000, Dave Chinner wrote:
> >>> The test matrix of having to test everything on v4 and v5 is just
> >>> nasty, especially if we are talking about prototyping code. I'd much
> >>> prefer to bring things to v5 filesystems where we have much lower
> >>> exposure and risk of corruption problems, and then when we know it's
> >>> solid because of the QA we've done on it, then we can expose the
> >>> majority of the XFS userbase to it by bringing it back to v4
> >>> filesystems.
> >>
> >> I think the test matrix is a reason for not enabling this only on v5
> >> filesystems. 
> > 
> > You're assuming that someone is doing lots of QA on v4 filesystems.
> > Most of my attention is focussed on v5 filesystems and compared to
> > the amount of v5 QA I'm doing, there is very little v4 QA. All my
> > development and prototyping is being done on v5 filesystems, and the
> > code I post indicates that.
> 
> Red Hat QE is doing quite a lot of testing of V4 at this point, although
> not on totally bleeding-edge kernels.

Right, there is QE being done, but not by developers writing new
features....

> > I'm hearing is people shouting at me for not bringing new features
> > to v4 filesystems.  Indeed, my reasons and plans to bring the
> > features to v4 in the near future are being completely ignored to
> > the point of recklessness...
> 
> That sounds perfectly reasonable to me; from your original RFC
> it wasn't clear to me that that was the plan (stage it & roll it out
> for V4 later).

You should assume this to be true for anything I do on v5
filesystems. If something can't be brought back to v4 filesystems,
then I'll make sure that everyone knows that.

However, even if something is possible, that doesn't mean it is a
good idea. There'll be plenty of new features coming through in the
not-too-distant future, and I'm seriously questioning the wisdom of
expecting everything to be made optional on v4 filesystems due to
the test matrix explosion. If it can be done in such a way that
doesn't explode the test matrix, then it's less of an issue.

Hence it may be worthwhile waiting on v4 and batching multiple
features under a single feature bit (e.g. free inode btree, partial
inode chunk allocation and larger inode clusters all enabled by a
single v4 feature bit) once all the features are landed in v5 and
are stable. But a 1:1 mapping of features to mkfs options is simply
going to lead to problems due to insufficient QA coverage....

> >> Large inodes are an old and supported use case, although
> >> probably not as heavily tested as they should be.  By introducing two
> >> different large inode cases we don't really help increase test
> >> coverage for a code path that is the same for v4 and v5.
> > 
> > I think you've got it wrong - 512 byte inodes have not been
> > regularly or heavily tested until we introduced v5 filesystems.
> 
> Gluster users have been advised to use 512-byte inodes for quite
> some time, actually.
> (http://www.gluster.org/community/documentation/index.php/QuickStart)
> 
> So there is some real-world coverage, and presumably QE as well.

Right, but that is not testing code being developed and merged into
upstream dev trees and -rc kernels. i.e. it is upstream, bleeding
edge test coverage that I'm worried about, not on the kernels that
downstream products ship with...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] [RFC] xfs: increase inode cluster size for v5 filesystems
  2013-09-17 13:51         ` Mark Tinguely
@ 2013-09-17 21:25           ` Dave Chinner
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2013-09-17 21:25 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: Christoph Hellwig, xfs

On Tue, Sep 17, 2013 at 08:51:09AM -0500, Mark Tinguely wrote:
> On 09/16/13 20:04, Dave Chinner wrote:
> >On Wed, Sep 11, 2013 at 09:21:59AM -0700, Christoph Hellwig wrote:
> >>On Tue, Sep 10, 2013 at 01:35:47AM +1000, Dave Chinner wrote:
> >>>The test matrix of having to test everything on v4 and v5 is just
> >>>nasty, especially if we are talking about prototyping code. I'd much
> >>>prefer to bring things to v5 filesystems where we have much lower
> >>>exposure and risk of corruption problems, and then when we know it's
> >>>solid because of the QA we've done on it, then we can expose the
> >>>majority of the XFS userbase to it by bringing it back to v4
> >>>filesystems.
> >>
> >>I think the test matrix is a reason for not enabling this only on v5
> >>filesystems.
....
> We are *not* screaming for this on v4. Not screaming for this to be
> mandatory on v5.
> 
> It will make inode allocation more difficult as the drive fragments.

Yes. But we have a plan to solve that:

http://oss.sgi.com/archives/xfs/2013-08/msg00346.html

And this work follows directly after Brian's free inode btree
patches. i.e. you need to consider this patch in the context of the
architectural modifications to inode allocation that have been
posted for discussion, not as an isolated, random change.

As I've mentioned in the past, I publish design documentation so
that everyone knows what goals we're working towards and the steps
being taking to get there. This is just a small piece in that
puzzle.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

