From: Jan Hudec <bulb@ucw.cz>
To: David Teigland <teigland@redhat.com>
Cc: Al Viro <viro@ftp.linux.org.uk>,
linux-kernel@vger.kernel.org, akpm@osdl.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 11/16] GFS: mount and tuning options
Date: Wed, 12 Oct 2005 10:43:24 +0200
Message-ID: <20051012084323.GC21612@djinn>
In-Reply-To: <20051011213811.GA15913@redhat.com>
On Tue, Oct 11, 2005 at 16:38:11 -0500, David Teigland wrote:
> On Mon, Oct 10, 2005 at 10:37:48PM +0100, Al Viro wrote:
> > On Mon, Oct 10, 2005 at 12:10:52PM -0500, David Teigland wrote:
> > > There are a variety of mount options, tunable parameters, internal
> > > statistics, and methods of online file system manipulation.
> >
> > Could you explain WTF are you doing with rename here? This pile of
> > ioctls is every bit as bad as sys_reiser4(); kindly provide a detailed
> > description of the API you've introduced and explain why nothing saner
> > would do...
>
> First some background that I've copied from elsewhere: The superblock
> contains a pointer to a "master" directory that contains various system
> inodes. The inodes in the master directory are:
>
> 1) A directory named "jindex" containing all the journal files. The
> journals are named "journal0", "journal1", ..., "journalX"
>
> 2) A directory named "per_node" that contains a bunch of files where
> each node can store data specific to that node. Each node has
> files "inum_rangeX", "statfs_changeX", "unlinked_tagX", and
> "quota_changeX". So, there are a set of these four files for each
> journal in the jindex directory.
>
> 3) A file named "inum" that contains the next cluster-wide inode number.
>
> 4) A file named "statfs" that contains the cluster-wide statfs
> information.
>
> 5) A file named "rindex" that contains the locations of all the RGs in
> the filesystem. (RG's == resource groups == allocation groups)
>
> 6) A file named "quota" that contains the quota values (UID and GID)
> for the filesystem.
>
> 7) A directory named "root" that is the root directory of the
> user-visible filesystem.
>
> The ioctls "hfile_stat", "hfile_read", "hfile_write", "hfile_trunc" are
> used to operate on the hidden system files. I notice we're not using
> trunc, so it can be removed. stat/read/write could be replaced with a few
> specific ioctl's if that's preferred.
They are normal directories and normal files, except they are not
exposed in the mount-point, right? Then why don't you simply provide a
directory handle for the master directory and use normal filesystem
operations for the rest?
That way you would have just one ioctl -- getmasterdir. The tool would
fchdir to the handle returned and manipulate the files from there with
normal syscalls. It would still see the user-visible part through
the root directory too (since bind mounts are supported, this should not
be a problem).
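A minimal sketch of the tool side, assuming a descriptor for the master
directory has already been obtained somehow. The getmasterdir ioctl is
only a proposal and is not implemented here, and read_system_file is a
made-up helper name:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes of a system file such as "inum" or "statfs".
 * master_fd is assumed to refer to the master directory; how it is
 * obtained (e.g. the proposed getmasterdir ioctl) is left open.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t read_system_file(int master_fd, const char *name,
                         void *buf, size_t len)
{
    ssize_t n = -1;

    assert(name != NULL && buf != NULL);

    /* Remember the current directory so we can go back afterwards. */
    int saved = open(".", O_RDONLY | O_DIRECTORY);
    if (saved < 0)
        return -1;

    /* Step into the master directory, then use plain syscalls. */
    if (fchdir(master_fd) == 0) {
        int fd = open(name, O_RDONLY);
        if (fd >= 0) {
            n = read(fd, buf, len);
            close(fd);
        }
        /* Restore the caller's working directory. */
        if (fchdir(saved) != 0)
            n = -1;
    }

    close(saved);
    return n;
}
```

With something like this, read_system_file(master_fd, "inum", buf,
sizeof buf) would take the place of an hfile_read call, and stat and
write would follow the same pattern.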
Or you could even do without ioctls. Just expose the files via /proc
> The next issue is adding journals (and the associated system files) to a
> fs. The gfs2_jadd command does this with the fs online. If you created
> the fs with 8 journals and you now want 12 machines to mount it at once,
> you need to add 4 journals by running "gfs2_jadd -j 4 /path/to/fs".
>
> Say gfs2_jadd is adding a 9th journal (id 8) ...
>
> creates ordinary file /.gfs2_admin/new_inode
> writes to new_inode initializing it as an inum_range file
> moves .gfs2_admin/new_inode to per_node/inum_range8
>
> creates ordinary file /.gfs2_admin/new_inode
> writes to new_inode initializing it as a statfs_change file
> moves .gfs2_admin/new_inode to per_node/statfs_change8
>
> same for unlinked_tag8 and quota_change8
>
> creates ordinary file /.gfs2_admin/new_inode
> writes to new_inode initializing it as a journal file
> moves .gfs2_admin/new_inode to jindex/journal8
>
> (keeping in mind that the "per_node" and "jindex" dirs and the files
> under them are in the hidden/system portion of the fs)
>
> The create and write steps use ordinary system calls. The "move" step
> uses the "rename2system" ioctl to move .gfs2_admin/new_inode to the
> specified system file. The new files are synced before being renamed so
> in case of a crash only correctly formed files are found in the hidden
> dirs. Only when the final journal file is moved into place is the fs
> ready to accept a new mounter.
And with a directory handle, you would just chdir there and do:
    rename("root/.gfs2_admin/new_inode", "jindex/journal8");
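The sync-then-rename step could be sketched as below, again assuming a
descriptor for the master directory is available. install_system_file
is a hypothetical helper, and it uses openat()/renameat() relative to
the handle instead of chdir() plus rename(), which has the same effect
without changing the working directory:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Flush a freshly written file to disk, then move it to its final
 * name. Both paths are relative to master_fd, which is assumed to be
 * a descriptor for the master directory.
 * Returns 0 on success, -1 on error.
 */
int install_system_file(int master_fd, const char *tmp_path,
                        const char *final_path)
{
    assert(tmp_path != NULL && final_path != NULL);

    int fd = openat(master_fd, tmp_path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Sync first so a crash never leaves a half-written system file. */
    int rc = fsync(fd);
    close(fd);
    if (rc < 0)
        return -1;

    /* rename is atomic: the target is either absent or complete. */
    return renameat(master_fd, tmp_path, master_fd, final_path);
}
```

For the journal case above that would be
install_system_file(master_fd, "root/.gfs2_admin/new_inode",
"jindex/journal8").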
> Next is expanding the size of the fs. To do this, gfs2_grow first opens
> the device and initializes the new space with RG headers. Second, it uses
> the "resize_add_rgrps" ioctl to add new structures defining the space to
> the "rindex" system file. I'm looking into using hfile_write for this.
Ok, if it can't be done with write, it probably needs something like an
ioctl. Though it could be an ioctl on that file, not on the device...
> Other ioctls:
> get_super - copy struct gfs2_sb to user space
> get_file_stat - copy struct gfs2_dinode to user space for given file
> set_file_flag - set gfs-specific flag in inode
> get_bmap - map file block to disk block
> get_file_meta - return all the metadata for a file or dir
> do_file_flush - sync out all dirty data and drop the cache and lock
> do_quota_sync - sync outstanding quota change (moving to sysfs)
> do_quota_refresh - refresh quota lvb from the quota file (moving to sysfs)
> do_quota_read - read quota values from quota file
>
> Some of these we could do without if they're objectionable. Regardless,
> we'll take a closer look to see if any don't qualify as useful enough.
Some of them would be better off as procfs or sysfs entries.
IIRC get_bmap exists elsewhere too, so that should be ok. And
get_file_meta probably won't do without ioctl either.
Wouldn't get_file_stat be included in get_file_meta?
> Finally, how ioctl is implemented. All the commands above are multiplexed
> through one actual ioctl (GFS2_IOCTL_SUPER) that passes in:
>
> struct gfs2_ioctl {
> unsigned int gi_argc;
> char **gi_argv;
>
> char __user *gi_data;
> unsigned int gi_size;
> uint64_t gi_offset;
> };
>
> - argv[0] is the command string, e.g. "set_file_flag", "rename2system",
> - argv[x] are other string arguments for the command, e.g. for set_file_flag
> argv[1] is either "set" or "clear". For rename2system argv[1] is the
> destination directory and argv[2] is the new name.
> - gi_data, gi_size, gi_offset - data returned to caller when needed
>
> This could be exchanged, of course, for the more traditional ioctl mess if
> that's any saner.
Well, if you get rid of the access to files in the master directory by
making that directory visible somehow, you will be left with a bunch of
ioctls on files, which are different enough to warrant individual ioctl
numbers for the sake of efficiency.
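To illustrate what individual numbers could look like, here is a rough
sketch using the standard _IO/_IOW/_IOWR macros. The magic character,
the command numbers, the argument structs and gfs2_ioc_name are all
invented for the example and are not taken from the GFS2 patches:

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument layouts, one per command. */
struct gfs2_flag_arg {
    uint32_t flag;        /* which gfs-specific inode flag */
    uint32_t set;         /* 1 = set, 0 = clear */
};

struct gfs2_bmap_arg {
    uint64_t file_block;  /* in: block offset within the file */
    uint64_t disk_block;  /* out: corresponding disk block */
};

/* Hypothetical magic and numbers; real ones would need registering. */
#define GFS2_IOC_MAGIC          'G'
#define GFS2_IOC_SET_FILE_FLAG  _IOW(GFS2_IOC_MAGIC, 1, struct gfs2_flag_arg)
#define GFS2_IOC_GET_BMAP       _IOWR(GFS2_IOC_MAGIC, 2, struct gfs2_bmap_arg)
#define GFS2_IOC_FILE_FLUSH     _IO(GFS2_IOC_MAGIC, 3)

/* The encoded numbers differ, so dispatch is a plain integer switch. */
static_assert(GFS2_IOC_SET_FILE_FLAG != GFS2_IOC_GET_BMAP,
              "each command needs its own number");

/* Map a command number back to the name used in the string interface. */
const char *gfs2_ioc_name(unsigned int cmd)
{
    switch (cmd) {
    case GFS2_IOC_SET_FILE_FLAG: return "set_file_flag";
    case GFS2_IOC_GET_BMAP:      return "get_bmap";
    case GFS2_IOC_FILE_FLUSH:    return "do_file_flush";
    default:                     return NULL;
    }
}
```

Compared with passing the command as argv[0], the handler compares one
integer instead of copying and parsing strings from user space, and the
argument size and direction are encoded in the number itself.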
--
Jan 'Bulb' Hudec <bulb@ucw.cz>