public inbox for linux-xfs@vger.kernel.org
* xattr atomicy
@ 2013-12-13 11:56 Christoph Hellwig
  2013-12-13 19:52 ` Ben Myers
  2013-12-13 21:51 ` Dave Chinner
  0 siblings, 2 replies; 4+ messages in thread
From: Christoph Hellwig @ 2013-12-13 11:56 UTC (permalink / raw)
  To: xfs

On the nfsv4 list it was recently discussed how atomic / transactional
xattr updates are.  It turns out none of that seems documented on the
syscall level, but for XFS we have an odd inconsistency in that attr
updates generally are atomic and logged, except when we go out to
remote attributes in xfs_attr_rmtval_set, in which case attr updates
are not logged, and we do synchronous writes instead.

Besides the weird semantic difference that is impossible to explain to
users, performance will also generally be bad with a synchronous buffer
write.  Is there any good reason not to log the buffer for the remote
attributes?  Given that attributes are limited to 64kB, it's not like
the value is larger than the large directory blocks that we already
support.
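
To exercise both paths from userspace, a minimal sketch along the following lines might help. It assumes Linux with <sys/xattr.h>; the helper name set_user_xattr and the fill byte are made up for illustration, and which value sizes actually go out to remote blocks depends on the filesystem geometry, so that part is an assumption rather than something the code can show.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/xattr.h>

/* Largest attribute value size mentioned in the thread (64kB). */
#define XATTR_VALUE_MAX (64 * 1024)

/*
 * Hypothetical helper: set a user xattr of `size` bytes on `path`.
 * Returns 0 on success or a negative errno value on failure.
 */
int set_user_xattr(const char *path, const char *name, size_t size)
{
	char *value;
	int ret = 0;

	assert(size > 0 && size <= XATTR_VALUE_MAX);

	value = malloc(size);
	if (!value)
		return -ENOMEM;
	memset(value, 'x', size);	/* arbitrary payload */

	/* flags == 0: create the attribute or replace an existing one */
	if (setxattr(path, name, value, size, 0) != 0)
		ret = -errno;

	free(value);
	return ret;
}
```

Calling it once with a few bytes and once with XATTR_VALUE_MAX bytes on a file on XFS would presumably hit the logged path and the xfs_attr_rmtval_set path respectively; timing the two calls is one rough way to see the cost of the synchronous write.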

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xattr atomicy
  2013-12-13 11:56 xattr atomicy Christoph Hellwig
@ 2013-12-13 19:52 ` Ben Myers
  2013-12-13 21:51 ` Dave Chinner
  1 sibling, 0 replies; 4+ messages in thread
From: Ben Myers @ 2013-12-13 19:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Hey Christoph,

On Fri, Dec 13, 2013 at 03:56:44AM -0800, Christoph Hellwig wrote:
> On the nfsv4 list it was recently discussed how atomic / transactional
> xattr updates are.  It turns out none of that seems documented on the
> syscall level, but for XFS we have an odd inconsistency in that attr
> updates generally are atomic and logged, except when we go out to
> remote attributes in xfs_attr_rmtval_set, in which case attr updates
> are not logged, and we do synchronous writes instead.
> 
> Besides the weird semantic difference that is impossible to explain to
> users, performance will also generally be bad with a synchronous buffer
> write.  Is there any good reason not to log the buffer for the remote
> attributes?  Given that attributes are limited to 64kB, it's not like
> the value is larger than the large directory blocks that we already
> support.

Looks like it's just because we're concerned about the size of the transaction:

STATIC int
xfs_attr_node_addname(xfs_da_args_t *args)
{
...
	/*
	 * If there was an out-of-line value, allocate the blocks we
	 * identified for its storage and copy the value.  This is done
	 * after we create the attribute so that we don't overflow the
	 * maximum size of a transaction and/or hit a deadlock.
	 */
	if (args->rmtblkno > 0) {
		error = xfs_attr_rmtval_set(args);
		if (error)
			return(error);
	}

I'm not clear on what the deadlock might have been.

-Ben


* Re: xattr atomicy
  2013-12-13 11:56 xattr atomicy Christoph Hellwig
  2013-12-13 19:52 ` Ben Myers
@ 2013-12-13 21:51 ` Dave Chinner
  2013-12-16 15:19   ` Christoph Hellwig
  1 sibling, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2013-12-13 21:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Dec 13, 2013 at 03:56:44AM -0800, Christoph Hellwig wrote:
> On the nfsv4 list it was recently discussed how atomic / transactional
> xattr updates are.  It turns out none of that seems documented on the
> syscall level, but for XFS we have an odd inconsistency in that attr
> updates generally are atomic and logged, except when we go out to
> remote attributes in xfs_attr_rmtval_set, in which case attr updates
> are not logged, and we do synchronous writes instead.

Yes, but they are still atomic from a user and crash recovery
point of view....

I'd been wondering a while back if we could make remote xattrs use
an ordered buffer so we don't need to log it but can leave it for
async write while still having it pin the log tail. However, I don't
think we can do that as we can't recover the attr data that is lost
if we crash. Hence I think our only option is to log it if we want
to make it an async write.

> Besides the weird semantic difference that is impossible to explain to
> users, performance will also generally be bad with a synchronous buffer
> write.  Is there any good reason not to log the buffer for the remote
> attributes?  Given that attributes are limited to 64kB, it's not like
> the value is larger than the large directory blocks that we already
> support.

Well, I think it's a bit different to the directory block case - the
directory blocks are filesystem metadata, while xattrs contain user
data. Hence if we log user xattrs a user can consume all of the log
bandwidth writing xattrs and degrade the metadata modification
performance of the rest of the filesystem.
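
As a rough back-of-the-envelope illustration of that concern, the sketch below estimates what share of log bandwidth the value payloads alone would take if remote values were logged. The function names and all the rates are hypothetical; it ignores log headers, the other items in the transaction, and any relogging.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of log traffic per second from xattr value payloads alone. */
uint64_t xattr_log_bytes_per_sec(uint64_t sets_per_sec, uint64_t value_bytes)
{
	return sets_per_sec * value_bytes;
}

/*
 * Share of the available log bandwidth (in percent, rounded down and
 * capped at 100) that those payloads would consume.
 */
unsigned int xattr_log_share_pct(uint64_t sets_per_sec, uint64_t value_bytes,
				 uint64_t log_bytes_per_sec)
{
	uint64_t used;

	assert(log_bytes_per_sec > 0);

	used = xattr_log_bytes_per_sec(sets_per_sec, value_bytes);
	if (used >= log_bytes_per_sec)
		return 100;
	return (unsigned int)(used * 100 / log_bytes_per_sec);
}
```

With made-up numbers, 1000 sets per second of 64kB values against a log that sustains 200 MiB/s comes out at about 31% of the log bandwidth going to user data.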

One issue that we'll need to deal with is that it may change the
minimum log size calculations if we add 64k of data to the attribute
transaction reservations. We currently calculate the remote attr
reservation in xfs_log_calc_max_attrsetm_res() and that will need to
change. If it pushes the remote attr reservation to be the largest
transaction reservation, we could have log size issues on existing
filesystems and that would lead to only enabling logging of remote
xattrs if the log is physically big enough.
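
The gating idea can be sketched roughly as below. This is a simplified model, not the kernel's calculation: the function names are invented, the reservation figures are inputs rather than derived from filesystem geometry, and the assumption that the minimum log size is a fixed multiple of the largest reservation (min_log_multiplier) is a stand-in for whatever the real minimum log size code does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Maximum remote attribute value size mentioned in the thread (64kB). */
#define RMT_ATTR_MAX_BYTES (64u * 1024u)

/*
 * Largest transaction reservation once the remote value is added to
 * the attr-set reservation (hypothetical model).
 */
uint64_t max_res_with_logged_rmtval(uint64_t attr_set_res,
				    uint64_t largest_other_res)
{
	uint64_t grown = attr_set_res + RMT_ATTR_MAX_BYTES;

	return grown > largest_other_res ? grown : largest_other_res;
}

/*
 * Only allow logging of remote xattrs if the existing log can still
 * hold the assumed multiple of the new largest reservation.
 */
bool can_log_remote_attrs(uint64_t log_bytes, uint64_t attr_set_res,
			  uint64_t largest_other_res,
			  unsigned int min_log_multiplier)
{
	uint64_t need;

	need = max_res_with_logged_rmtval(attr_set_res, largest_other_res) *
	       min_log_multiplier;
	return log_bytes >= need;
}
```

Under this model an existing filesystem with a small log would simply keep the current synchronous-write behaviour, while one with a large enough log could switch to logging the value.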

The thing is, it's really only user data that ever ends up in a
remote attr block - system xattrs like ACLs and selinux contexts,
application xattrs from DMF, gluster, swift, Samba, etc rarely
consume enough bytes to push the xattr out of line.

So, IMO, the first question we need to answer is whether the current
behaviour is actually a problem for anyone....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xattr atomicy
  2013-12-13 21:51 ` Dave Chinner
@ 2013-12-16 15:19   ` Christoph Hellwig
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2013-12-16 15:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Sat, Dec 14, 2013 at 08:51:17AM +1100, Dave Chinner wrote:
> Yes, but they are still atomic from a user and crash recovery
> point of view....

I can't see how we can guarantee an atomic update for them, both
in the case of an I/O error and an actual system crash.

> Well, I think it's a bit different to the directory block case - the
> directory blocks are filesystem metadata, while xattrs contain user
> data. Hence if we log user xattrs a user can consume all of the log
> bandwidth writing xattrs and degrade the metadata modification
> performance of the rest of the filesystem.

We're getting close to doing that with namespace modifications with
all your scalability work :)

I think that's a point to consider, but not really black and white.  It
just makes it a bit easier to consume log bandwidth, and increases the
need to have some form of per-user quotas for this sort of operation.

> So, IMO, the first question we need to answer is whether the current
> behaviour is actually a problem for anyone....

I've not heard of real-life problems, but an interface that has very
nice behavior for the common case but much less optimal behavior for a
corner case is bound to cause trouble.
