public inbox for linux-man@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Florian Weimer <fweimer@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>,
	Miklos Szeredi <miklos@szeredi.hu>,
	linux-fsdevel@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	Karel Zak <kzak@redhat.com>, Greg KH <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	linux-man <linux-man@vger.kernel.org>,
	LSM <linux-security-module@vger.kernel.org>,
	Ian Kent <raven@themaw.net>, David Howells <dhowells@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <christian@brauner.io>,
	Amir Goldstein <amir73il@gmail.com>,
	James Bottomley <James.Bottomley@hansenpartnership.com>
Subject: Re: [RFC PATCH] getting misc stats/attributes via xattr API
Date: Wed, 11 May 2022 09:04:47 +1000
Message-ID: <20220510230447.GC2306852@dread.disaster.area>
In-Reply-To: <87bkw5d098.fsf@oldenburg.str.redhat.com>

On Tue, May 10, 2022 at 02:45:39PM +0200, Florian Weimer wrote:
> * Dave Chinner:
> 
> > IOWs, what Linux really needs is a listxattr2() syscall that works
> > the same way that getdents/XFS_IOC_ATTRLIST_BY_HANDLE work. With the
> > list function returning value sizes and being able to iterate
> > effectively, every problem that listxattr() causes goes away.
> 
> getdents has issues of its own because it's unspecified what happens if
> the list of entries is modified during iteration.  Few file systems add
> another tree just to guarantee stable iteration.

The filesystem I care about (XFS) guarantees stable iteration and
stable seekdir/telldir cookies. It's not that hard to do, but it
requires the filesystem designer to understand that this is a
necessary feature before they start designing the on-disk directory
format and lookup algorithms....

> Maybe that's different for xattrs because they are supposed to be small
> and can just be snapshotted with a full copy?

It's different for xattrs because we directly control the API
specification for XFS_IOC_ATTRLIST_BY_HANDLE, not POSIX. We can
define the behaviour however we want. Stable iteration is what
listing keys needs.

The cursor is defined as 16 bytes of opaque data, enabling us to
encode exactly where in the hashed name btree index we have
traversed to:

/*
 * Kernel-internal version of the attrlist cursor.
 */
struct xfs_attrlist_cursor_kern {
        __u32   hashval;        /* hash value of next entry to add */
        __u32   blkno;          /* block containing entry (suggestion) */
        __u32   offset;         /* offset in list of equal-hashvals */
        __u16   pad1;           /* padding to match user-level */
        __u8    pad2;           /* padding to match user-level */
        __u8    initted;        /* T/F: cursor has been initialized */
};

Hence we have all the information in the cursor we need to reset the
btree traversal index to the exact entry we finished at (even in the
presence of hash collisions in the index). Removal of the entry the
cursor points to therefore isn't a problem for us; we just move to
the next highest sequential hash index in the btree and start again
from there.
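
To illustrate the resume rule described above, here is a minimal
userspace sketch. It is not the XFS implementation: the demo_* names
are hypothetical, a sorted array stands in for the hashed name btree,
and a linear scan stands in for the btree lookup. It assumes the
cursor's offset counts entries already returned within a run of
equal hash values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the cursor layout quoted above. */
struct demo_cursor {
        uint32_t hashval;       /* hash value of next entry to return */
        uint32_t blkno;         /* block hint, unused in this sketch */
        uint32_t offset;        /* position within a run of equal hashvals */
        uint16_t pad1;
        uint8_t  pad2;
        uint8_t  initted;       /* nonzero once the cursor is in use */
};

/* The cursor is described above as 16 bytes of opaque data. */
_Static_assert(sizeof(struct demo_cursor) == 16, "cursor must be 16 bytes");

/* Hypothetical index entry; the array stands in for the btree. */
struct demo_entry {
        uint32_t hashval;
        const char *name;
};

/*
 * Return the index at which iteration resumes, or n if nothing is left.
 * entries[] must be sorted by ascending hashval.
 */
size_t demo_resume(const struct demo_entry *entries, size_t n,
                   const struct demo_cursor *cur)
{
        size_t i = 0;
        uint32_t skip;

        assert(cur != NULL);
        if (!cur->initted)
                return 0;       /* fresh cursor: start from the beginning */

        /* Find the first entry whose hash is >= the saved hash. */
        while (i < n && entries[i].hashval < cur->hashval)
                i++;

        /* Inside a run of equal hashes, skip entries already returned. */
        for (skip = cur->offset;
             skip > 0 && i < n && entries[i].hashval == cur->hashval; skip--)
                i++;

        return i;
}
```

If every entry carrying the saved hash value has been removed, the
first loop already lands on the next higher hash value and the second
loop skips nothing, so iteration simply continues from there.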

Of course, if this is how we define listxattr2() behaviour (or maybe
we should call it "list_keys()" to make it clear we are treating
this as a key/value store instead of xattrs) then each filesystem
can put what it needs in that cursor to guarantee it can restart key
iteration correctly if the entry the cursor points to has been
removed.  We can also make the cursor larger if necessary for other
filesystems to store the information they need.
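
As a rough sketch of what such an interface could look like from the
caller's side: nothing below is an existing or proposed kernel API.
demo_list_keys(), the record layout and the private cursor contents
are all assumptions made for illustration. The caller only ever sees
16 opaque bytes; the "filesystem" decides what to store in them, and
each returned record carries the value size alongside the key name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opaque to callers; zero-filled means "start from the beginning". */
struct demo_key_cursor {
        unsigned char opaque[16];
};

/* One listed key: the name plus the size of its value. */
struct demo_key_rec {
        const char *name;
        size_t value_len;
};

/* Hypothetical store entry, kept sorted by ascending id. */
struct demo_kv {
        uint32_t id;            /* stand-in for a filesystem's index key */
        const char *name;
        size_t value_len;
};

/* What this particular "filesystem" chooses to keep in the cursor. */
struct demo_private {
        uint32_t next_id;       /* id of the next key to return */
        uint32_t initted;       /* nonzero after the first call */
        uint32_t done;          /* nonzero once the listing finished */
};

_Static_assert(sizeof(struct demo_private) <= 16, "must fit in the cursor");

/*
 * Fill out[] with up to max records and advance the cursor.
 * Returns the number of records written; 0 means the listing is complete.
 */
size_t demo_list_keys(const struct demo_kv *kv, size_t n,
                      struct demo_key_cursor *cur,
                      struct demo_key_rec *out, size_t max)
{
        struct demo_private p;
        size_t i = 0, count = 0;

        assert(cur != NULL && out != NULL);
        memcpy(&p, cur->opaque, sizeof(p));
        if (p.done)
                return 0;

        /* Skip everything before the saved position. */
        if (p.initted)
                while (i < n && kv[i].id < p.next_id)
                        i++;

        while (i < n && count < max) {
                out[count].name = kv[i].name;
                out[count].value_len = kv[i].value_len;
                count++;
                i++;
        }

        /* Remember where to resume, or mark the listing as finished. */
        p.initted = 1;
        if (i < n)
                p.next_id = kv[i].id;
        else
                p.done = 1;
        memcpy(cur->opaque, &p, sizeof(p));
        return count;
}
```

A caller zero-fills the cursor once and then calls demo_list_keys()
in a loop until it returns 0. Because the cursor records an index key
rather than an array position, a key removed between two calls does
not derail the listing: the next call resumes at the first remaining
key whose id is at or above the saved one.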

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
