public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] ext4: don't split xattr inode refcounts across i_ctime and i_version fields
@ 2018-01-09 14:47 Jeff Layton
  2018-01-09 18:52 ` Darrick J. Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Layton @ 2018-01-09 14:47 UTC (permalink / raw)
  To: adilger.kernel, tytso; +Cc: linux-ext4, dhowells

From: Jeff Layton <jlayton@redhat.com>

This patch is based on top of the i_version rework that I'm flogging
upstream. It's just a cleanup of some nastiness I noticed while in there.

This code uses the i_ctime and i_version fields to each store half of
a refcount. I suspect it was done this way long ago when the i_version
field was a 32 bits, and was never changed when the field was converted
to a 64 bit value.

Change the code to just store the refcount in the i_version field rather
than splitting it across both fields.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/ext4/xattr.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 63656dbafdc4..6ea78dd367ca 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -294,14 +294,12 @@ ext4_xattr_inode_hash(struct ext4_sb_info *sbi, const void *buffer, size_t size)
 
 static u64 ext4_xattr_inode_get_ref(struct inode *ea_inode)
 {
-	return ((u64)ea_inode->i_ctime.tv_sec << 32) |
-		(u32) inode_peek_iversion_raw(ea_inode);
+	return inode_peek_iversion_raw(ea_inode);
 }
 
 static void ext4_xattr_inode_set_ref(struct inode *ea_inode, u64 ref_count)
 {
-	ea_inode->i_ctime.tv_sec = (u32)(ref_count >> 32);
-	inode_set_iversion_raw(ea_inode, ref_count & 0xffffffff);
+	inode_set_iversion_raw(ea_inode, ref_count);
 }
 
 static u32 ext4_xattr_inode_get_hash(struct inode *ea_inode)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] ext4: don't split xattr inode refcounts across i_ctime and i_version fields
  2018-01-09 14:47 [RFC PATCH] ext4: don't split xattr inode refcounts across i_ctime and i_version fields Jeff Layton
@ 2018-01-09 18:52 ` Darrick J. Wong
  2018-01-09 19:05   ` Jeff Layton
  0 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2018-01-09 18:52 UTC (permalink / raw)
  To: Jeff Layton; +Cc: adilger.kernel, tytso, linux-ext4, dhowells

On Tue, Jan 09, 2018 at 09:47:01AM -0500, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> This patch is based on top of the i_version rework that I'm flogging
> upstream. It's just a cleanup of some nastiness I noticed while in there.
> 
> This code uses the i_ctime and i_version fields to each store half of
> a refcount. I suspect it was done this way long ago when the i_version

Way long ago == 2017-06-22 :)

The new 64k xattr value feature in ext4 uses hidden inodes to store attr
values that don't fit in a single block.  Since the inode can be shared
by multiple xattr keys and isn't exported via NFS (I hope...), they use
a 64-bit refcount mashed into i_ctime and i_version.  That's what this
gobbledygook is for.

https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Large_Extended_Attribute_Values

--D

> field was a 32 bits, and was never changed when the field was converted
> to a 64 bit value.
> 
> Change the code to just store the refcount in the i_version field rather
> than splitting it across both fields.
> 
> Cc: David Howells <dhowells@redhat.com>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/ext4/xattr.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 63656dbafdc4..6ea78dd367ca 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -294,14 +294,12 @@ ext4_xattr_inode_hash(struct ext4_sb_info *sbi, const void *buffer, size_t size)
>  
>  static u64 ext4_xattr_inode_get_ref(struct inode *ea_inode)
>  {
> -	return ((u64)ea_inode->i_ctime.tv_sec << 32) |
> -		(u32) inode_peek_iversion_raw(ea_inode);
> +	return inode_peek_iversion_raw(ea_inode);
>  }
>  
>  static void ext4_xattr_inode_set_ref(struct inode *ea_inode, u64 ref_count)
>  {
> -	ea_inode->i_ctime.tv_sec = (u32)(ref_count >> 32);
> -	inode_set_iversion_raw(ea_inode, ref_count & 0xffffffff);
> +	inode_set_iversion_raw(ea_inode, ref_count);
>  }
>  
>  static u32 ext4_xattr_inode_get_hash(struct inode *ea_inode)
> -- 
> 2.14.3
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] ext4: don't split xattr inode refcounts across i_ctime and i_version fields
  2018-01-09 18:52 ` Darrick J. Wong
@ 2018-01-09 19:05   ` Jeff Layton
  2018-01-11 18:39     ` Theodore Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Layton @ 2018-01-09 19:05 UTC (permalink / raw)
  To: Darrick J. Wong, tahsin; +Cc: adilger.kernel, tytso, linux-ext4, dhowells

On Tue, 2018-01-09 at 10:52 -0800, Darrick J. Wong wrote:
> On Tue, Jan 09, 2018 at 09:47:01AM -0500, Jeff Layton wrote:
> > From: Jeff Layton <jlayton@redhat.com>
> > 
> > This patch is based on top of the i_version rework that I'm flogging
> > upstream. It's just a cleanup of some nastiness I noticed while in there.
> > 
> > This code uses the i_ctime and i_version fields to each store half of
> > a refcount. I suspect it was done this way long ago when the i_version
> 
> Way long ago == 2017-06-22 :)
> 

(cc'ing Tahsin)

Hah, ok. I was thinking this was legacy code, but I guess not! I should
have probably done some git archaeology first.

> The new 64k xattr value feature in ext4 uses hidden inodes to store attr
> values that don't fit in a single block.  Since the inode can be shared
> by multiple xattr keys and isn't exported via NFS (I hope...), they use
> a 64-bit refcount mashed into i_ctime and i_version.  That's what this
> gobbledygook is for.
> 
> https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Large_Extended_Attribute_Values
> 
> --D
> 

Ahh, many thanks...

My main question was: why split the refcount across fields like this? If
it's necessary now for backward compatibility then so be it, but it's
weird and not 100% clear why it's being done that way.


> > field was a 32 bits, and was never changed when the field was converted
> > to a 64 bit value.
> > 
> > Change the code to just store the refcount in the i_version field rather
> > than splitting it across both fields.
> > 
> > Cc: David Howells <dhowells@redhat.com>
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > ---
> >  fs/ext4/xattr.c | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> > index 63656dbafdc4..6ea78dd367ca 100644
> > --- a/fs/ext4/xattr.c
> > +++ b/fs/ext4/xattr.c
> > @@ -294,14 +294,12 @@ ext4_xattr_inode_hash(struct ext4_sb_info *sbi, const void *buffer, size_t size)
> >  
> >  static u64 ext4_xattr_inode_get_ref(struct inode *ea_inode)
> >  {
> > -	return ((u64)ea_inode->i_ctime.tv_sec << 32) |
> > -		(u32) inode_peek_iversion_raw(ea_inode);
> > +	return inode_peek_iversion_raw(ea_inode);
> >  }
> >  
> >  static void ext4_xattr_inode_set_ref(struct inode *ea_inode, u64 ref_count)
> >  {
> > -	ea_inode->i_ctime.tv_sec = (u32)(ref_count >> 32);
> > -	inode_set_iversion_raw(ea_inode, ref_count & 0xffffffff);
> > +	inode_set_iversion_raw(ea_inode, ref_count);
> >  }
> >  
> >  static u32 ext4_xattr_inode_get_hash(struct inode *ea_inode)
> > -- 
> > 2.14.3
> > 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] ext4: don't split xattr inode refcounts across i_ctime and i_version fields
  2018-01-09 19:05   ` Jeff Layton
@ 2018-01-11 18:39     ` Theodore Ts'o
  2018-01-11 20:09       ` Jeff Layton
  0 siblings, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2018-01-11 18:39 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Darrick J. Wong, adilger.kernel, linux-ext4, dhowells

On Tue, Jan 09, 2018 at 02:05:31PM -0500, Jeff Layton wrote:
>
> Ahh, many thanks...
> 
> My main question was: why split the refcount across fields like this? If
> it's necessary now for backward compatibility then so be it, but it's
> weird and not 100% clear why it's being done that way.

The main reason is that the only inode that will need it is the hidden
extended attribute inode (and then only for Samba servers that are
supporting enterprise domain CIFS servers where there are more than
64k files using the same windows ACL).  So we didn't want to use extra
bytes in the inode, since it's only going to be used in a very tiny
fraction of servers.

For the right workload, though, this should allow ext4 to have
significantly better performance, since if you are serving a large
directory where all of the files have their own ACL or Windows
security ID xattrs, without shared extended attributes, when you open
a Files explorer on that directory, for each file the file system will
be forced to do lots of random 4k reads to fetch the xattrs.

   	     	     	       	  	- Ted

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] ext4: don't split xattr inode refcounts across i_ctime and i_version fields
  2018-01-11 18:39     ` Theodore Ts'o
@ 2018-01-11 20:09       ` Jeff Layton
  2018-01-11 22:41         ` Theodore Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Layton @ 2018-01-11 20:09 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Darrick J. Wong, adilger.kernel, linux-ext4, dhowells

On Thu, 2018-01-11 at 13:39 -0500, Theodore Ts'o wrote:
> On Tue, Jan 09, 2018 at 02:05:31PM -0500, Jeff Layton wrote:
> > 
> > Ahh, many thanks...
> > 
> > My main question was: why split the refcount across fields like this? If
> > it's necessary now for backward compatibility then so be it, but it's
> > weird and not 100% clear why it's being done that way.
> 
> The main reason is that the only inode that will need it is the hidden
> extended attribute inode (and then only for Samba servers that are
> supporting enterprise domain CIFS servers where there are more than
> 64k files using the same windows ACL).  So we didn't want to use extra
> bytes in the inode, since it's only going to be used in a very tiny
> fraction of servers.
> 
> For the right workload, though, this should allow ext4 to have
> significantly better performance, since if you are serving a large
> directory where all of the files have their own ACL or Windows
> security ID xattrs, without shared extended attributes, when you open
> a Files explorer on that directory, for each file the file system will
> be forced to do lots of random 4k reads to fetch the xattrs.
> 

Oh, sorry...I may not have been clear.

I wasn't really asking about the need for xattr inodes. My question was
more about why it was decided to split the refcount in two, and store
part of it in the ctime and part in the i_version field.

Wouldn't it have made more sense to just store it all in i_version field
(like this patch makes it do)? If this breaks backward compatibility
though, then I'm fine with just dropping the patch.

Cheers,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] ext4: don't split xattr inode refcounts across i_ctime and i_version fields
  2018-01-11 20:09       ` Jeff Layton
@ 2018-01-11 22:41         ` Theodore Ts'o
  0 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2018-01-11 22:41 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Darrick J. Wong, adilger.kernel, linux-ext4, dhowells

On Thu, Jan 11, 2018 at 03:09:03PM -0500, Jeff Layton wrote:
> 
> Wouldn't it have made more sense to just store it all in i_version field
> (like this patch makes it do)? If this breaks backward compatibility
> though, then I'm fine with just dropping the patch.

Yes, that part is there for compatibility for the deployed base
(Lustre Servers).

					- Ted

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-01-11 22:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-09 14:47 [RFC PATCH] ext4: don't split xattr inode refcounts across i_ctime and i_version fields Jeff Layton
2018-01-09 18:52 ` Darrick J. Wong
2018-01-09 19:05   ` Jeff Layton
2018-01-11 18:39     ` Theodore Ts'o
2018-01-11 20:09       ` Jeff Layton
2018-01-11 22:41         ` Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox