From: anand jain <anand.jain@oracle.com>
To: Zach Brown <zab@redhat.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 2/2 v2] btrfs: add framework to read fs info and dev info from the kernel
Date: Tue, 11 Jun 2013 22:05:57 +0800 [thread overview]
Message-ID: <51B72EC5.3000407@oracle.com> (raw)
In-Reply-To: <20130610203054.GK24721@lenny.home.zabbo.net>
On 06/11/2013 04:30 AM, Zach Brown wrote:
> On Mon, Jun 10, 2013 at 10:59:15PM +0800, Anand Jain wrote:
>> This adds two ioctl BTRFS_IOC_GET_FSIDS and BTRFS_IOC_GET_DEVS
>> which reads the btrfs_fs_devices and btrfs_device structure
>> from the kernel respectively.
>
> Why not just use sysfs to export the device lists?
Hmm. I see sysfs being used but isn't ioctl more easy
to get stuff out from the kernel (except for dumping
address_space) ? Providing a complete sysfs interface
for btrfs can be a RFC as well. For now ioctl will help
to fix the bugs.
>> The information in these structure are useful to report the
>> device/fs information in line with the kernel operations and
>> thus immediately addresses the problem that 'btrfs fi show'
>> command reports the stale information after device device add
>> remove operation is performed. That is because btrfs fi show
>> reads the disks directly.
>
> Hmm. Would it be enough to get the block device address_space in sync
> with the btrfs stuctures? Would that get rid of the need for this
> interface?
Absolutely ! I have that to experiment in linux for
a long time. More to come on that as time permits from
the bug fixes which is kind of urgent need as of now.
> But sure, for the sake of argument and code review, let's say that we
> wanted some ioctls to read the btrfs kernel device lists.
thanks.
>> Further the frame-work provided here would help to enhance
>
> Please don't add a layer of generic encoding above these new ioctls.
> It's not necessary and it makes more work for the various parts of the
> world that want to do deep ioctl inspection (think, for example, of
> strace).
Agreed. Debugging has to be easier.
>> +static size_t get_ioctl_sz(size_t pl_list_sz, u64 mul, u64 alloc_cnt)
>> +
>> +static int get_ioctl_header_and_payload(unsigned long arg,
>> + size_t unit_sz, int default_len,
>> + struct btrfs_ioctl_header **argp, size_t *sz)
>
> This adds a lot of fiddly math and allocation that can go wrong for no
> benefit. We can restructure the interface so all of this code goes
> away.
>
> 1) Have a simple single fixed input structure for each ioctl. Maybe
> with some extra padding and a flags argument if you think stuff is going
> to be added over time. No generic header. No casting. The ioctl
> number defines the input structure. If you need a different structure
> later, use a different ioctl number.
-No generic header and No casting- Why not ? anything that
might not work with strace ?
(Header-payload model are typically favorites of IO protocol drivers.
they are easily extensible, checking for compatibility is easier, and
code re-useability is great).
Thanks for the review comments below. I will look into that.
Anand
-------
> 2) Don't have a fixed array of output structs embedded in the input
> struct. Have a pointer to the array of output structs in the input
> struct. Probably with a number of elements found there.
> 3) Don't copy the giant user output buffer into the kernel, write to it,
> then copy it back to user space. Instead have the kernel copy_to_user()
> each output structure to the userspace pointer as it goes. Have the
> ioctl() return a positive count of the number of elements copied.
>> static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
>> unsigned long arg)
>> {
>> - struct btrfs_ioctl_vol_args *vol;
>> + struct btrfs_ioctl_vol_args *vol = NULL;
>> struct btrfs_fs_devices *fs_devices;
>> - int ret = -ENOTTY;
>> + struct btrfs_ioctl_header *argp = NULL;
>> + int ret = -ENOTTY;
>> + size_t sz = 0;
>>
>> if (!capable(CAP_SYS_ADMIN))
>> return -EPERM;
>>
>> - vol = memdup_user((void __user *)arg, sizeof(*vol));
>> - if (IS_ERR(vol))
>> - return PTR_ERR(vol);
>> -
>> switch (cmd) {
>> case BTRFS_IOC_SCAN_DEV:
>> + vol = memdup_user((void __user *)arg, sizeof(*vol));
>> + if (IS_ERR(vol))
>> + return PTR_ERR(vol);
>
> Hmm, having to duplicate that previously shared setup into each ioctl
> block is a sign that the layering is wrong.
>
> Don't smush the new ioctls into case bodies of this existing simple
> fuction that worked with the _vol_args ioctls. Rename this
> btrfs_ioctl_vol_args, or something, and call it from a tiny new
> btrfs_control_ioctl() for its two ioctls. That new high level btrfs
> ioctl dispatch function can then call the two new _get_devs and
> _get_fsids functions.
>> + case BTRFS_IOC_GET_FSIDS:
>> + ret = get_ioctl_header_and_payload(arg,
>> + sizeof(struct btrfs_ioctl_fs_list),
>> + BTRFS_FSIDS_LEN, &argp, &sz);
>> + if (ret == 1) {
>> + ret = copy_to_user((void __user *)arg, argp, sz);
>> + kfree(argp);
>> + return -EFAULT;
>> + } else if (ret)
>> + return -EFAULT;
>> + ret = btrfs_get_fsids(argp);
>> + if (ret == 0 && copy_to_user((void __user *)arg, argp, sz))
>> + ret = -EFAULT;
>> + kfree(argp);
>> + break;
>
> All of this can go away if you have trivial input and array-of-output
> args that the ioctl helper works with.
>
> case BTRFS_IOC_GET_FSIDS:
> ret = btrfs_ioctl_get_fsids(arg);
> break;
>> +int btrfs_get_fsids(struct btrfs_ioctl_header *argp)
>> +{
>> + u64 cnt = 0;
>> + struct btrfs_device *device;
>> + struct btrfs_fs_devices *fs_devices;
>> + struct btrfs_ioctl_fs_list *fslist;
>> + struct btrfs_ioctl_fsinfo *fsinfo;
>> +
>> + fslist = (struct btrfs_ioctl_fs_list *)((u8 *)argp
>> + + sizeof(*argp));
>
> Instead of working with an allocated copy of the input, copy the user
> input struct onto the stack here.
>
> struct btrfs_ioctl_get_fsids_input in;
>
> if (copy_from_user(&in, argp, sizeof(in)))
> return -EFAULT;
>
> (Or you could get_user() each field.. but that's probably not worth it
> for a small input struct.)
>> +
>> + list_for_each_entry(fs_devices, &fs_uuids, list) {
>
> Surely this needs to be protected by some lock? The uuid_mutex, maybe?
>> + fsinfo = &fslist->fsinfo[cnt];
>> + memcpy(fsinfo->fsid, fs_devices->fsid,
>> + BTRFS_FSID_SIZE);
>
> And then copy each device's output struct back to userspace one at a
> time instead of building them up in the giant allocated kernel buffer.
>
> This might get a little hairy if you don't want to block under the
> uuid_mutex, but that's solvable too (dummy cursor items on the list,
> probably).
>> + fsinfo->bytes_used = device->dev_root\
>> + ->fs_info->super_copy->bytes_used;
>
> A line continuation in the middle of a pointer deref? Cache the
> super_copy pointer on the stack, maybe. It's probably better to use a
> function that copies a single device's output struct to get all those
> indentation levels back.
>
>> + /* Todo: optimize. there must be better way of doing this*/
>> + list_for_each_entry(fs_devices, &fs_uuids, list) {
>
> (same locking question)
>> + if (device->in_fs_metadata)
>> + dev->flags = dev->flags |
>> + BTRFS_DEV_IN_FS_MD;
>
> 'a = a | b' -> 'a |= b'
>> + memcpy(dev->name, rcu_str_deref(device->name),
>> + BTRFS_PATH_NAME_MAX);
>
> Hmm. Don't we need to be in rcu_read_lock() to use rcu_str_deref()?
>> +struct btrfs_ioctl_fsinfo {
>> + __u64 sz_self;
>> + __u8 fsid[BTRFS_FSID_SIZE]; /* out */
>> + __u64 num_devices;
>> + __u64 total_devices;
>> + __u64 missing_devices;
>> + __u64 total_rw_bytes;
>> + __u64 bytes_used;
>> + __u64 flags;
>> + __u64 default_mount_subvol_id;
>> + char label[BTRFS_LABEL_SIZE];
>> +}__attribute__ ((__packed__));
>
> It'd be __packed, but I'd naturally align the structs to u64s and not
> use it.
>> +struct btrfs_ioctl_fs_list {
>> + __u64 all; /* in */
>> + struct btrfs_ioctl_fsinfo fsinfo[BTRFS_FSIDS_LEN]; /* out */
>> +}__attribute__ ((__packed__));
>
> I'd turn this into something like:
>
> struct btrfs_ioctl_get_fsinfo_input {
> __u64 flags; /* bit for all */
> __u64 nr_fsinfo;
> __u64 fsinfo_ptr; /* u64 ptr to avoid compat */
> };
>> +struct btrfs_ioctl_devinfo {
>> + __u64 sz_self;
>> + __u64 flags;
>> + __u64 devid;
>> + __u64 total_bytes;
>> + __u64 disk_total_bytes;
>> + __u64 bytes_used;
>> + __u64 type;
>> + __u64 generation;
>> + __u32 io_align;
>> + __u32 io_width;
>> + __u32 sector_size;
> __u32 _padding;
>> + __u8 uuid[BTRFS_UUID_SIZE];
>> + __u8 name[BTRFS_PATH_NAME_MAX + 1];
>
> Pad that nutty 4088 byte string to a multiple of u64s and you can
> remove the packed attribute.
> - z
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
next prev parent reply other threads:[~2013-06-11 14:05 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-10 14:56 [PATCH 0/9] btrfs-progs: coalesce of patches Anand Jain
2013-06-10 14:56 ` [PATCH 1/9] btrfs-progs: btrfs_scan_for_fsid doesn't need all the arguments Anand Jain
2013-06-10 20:00 ` Eric Sandeen
2013-06-11 13:15 ` anand jain
2013-06-10 14:56 ` [PATCH 2/9 v2] btrfs-progs: label option in btrfs filesystem show is not coded Anand Jain
2013-06-10 14:56 ` [PATCH 3/9 v2] btrfs-progs: update device scan usage Anand Jain
2013-06-10 14:56 ` [PATCH 4/9 v3] btrfs-progs: congregate dev scan Anand Jain
2013-06-10 14:56 ` [PATCH 5/9 v2] btrfs-progs: btrfs_scan_one_dir not to skip links when /dev/mapper is provided Anand Jain
2013-06-10 14:56 ` [PATCH 6/9 v2] btrfs-progs: scan /dev/mapper in filesystem show and device scan Anand Jain
2013-06-10 14:56 ` [PATCH 7/9 v3] btrfs-progs: device delete to get errors from the kernel Anand Jain
2013-06-10 14:56 ` [PATCH 8/9] btrfs-progs: get_label_mounted to return label instead of print Anand Jain
2013-06-21 7:41 ` [PATCH 08/13 v2] " Anand Jain
2013-06-10 14:56 ` [PATCH 9/9 v2] btrfs-progs: introduce btrfs filesystem show --kernel Anand Jain
2013-06-10 14:59 ` [PATCH 0/2] btrfs: coalesce of patches Anand Jain
2013-06-10 14:59 ` [PATCH 1/2] btrfs: device delete to get errors from the kernel Anand Jain
2013-06-10 14:59 ` [PATCH 2/2 v2] btrfs: add framework to read fs info and dev info " Anand Jain
2013-06-10 19:40 ` Josef Bacik
2013-06-11 13:10 ` anand jain
2013-06-11 13:15 ` Josef Bacik
2013-06-10 20:30 ` Zach Brown
2013-06-11 14:05 ` anand jain [this message]
2013-06-11 17:50 ` Zach Brown
2013-06-11 14:24 ` Josef Bacik
2013-06-21 7:02 ` Anand Jain
2013-06-21 7:32 ` [PATCH 2/2 v3] btrfs: obtain used_bytes in BTRFS_IOC_FS_INFO ioctl Anand Jain
2013-06-24 17:03 ` Josef Bacik
2013-06-25 3:00 ` Anand Jain
2013-07-08 7:39 ` [PATCH 13/13] btrfs-progs: fix memory leaks of device_list_add() Anand Jain
2013-07-15 4:58 ` Anand Jain
2013-07-15 5:30 ` [PATCH 00/11 v2 (resend)] btrfs-progs: coalesce of patches Anand Jain
2013-07-15 5:30 ` [PATCH 01/11] btrfs-progs: btrfs_scan_for_fsid doesn't need all the arguments Anand Jain
2013-07-15 5:30 ` [PATCH 02/11] btrfs-progs: label option in btrfs filesystem show is not coded Anand Jain
2013-07-15 5:30 ` [PATCH 03/11] btrfs-progs: update device scan usage Anand Jain
2013-07-15 5:30 ` [PATCH 04/11] btrfs-progs: congregate dev scan Anand Jain
2013-07-15 5:30 ` [PATCH 05/11] btrfs-progs: btrfs_scan_one_dir not to skip links when /dev/mapper is provided Anand Jain
2013-08-05 16:53 ` David Sterba
2013-07-15 5:30 ` [PATCH 06/11] btrfs-progs: scan /dev/mapper in filesystem show and device scan Anand Jain
2013-08-05 17:04 ` David Sterba
2013-08-13 4:07 ` anand jain
2013-07-15 5:30 ` [PATCH 07/11] btrfs-progs: device delete to get errors from the kernel Anand Jain
2013-07-15 5:30 ` [PATCH 08/11] btrfs-progs: get_label_mounted to return label instead of print Anand Jain
2013-07-15 5:30 ` [PATCH 09/11] btrfs-progs: move out print in cmd_df to another function Anand Jain
2013-07-15 5:30 ` [PATCH 10/11] btrfs-progs: get string for the group profile and type Anand Jain
2013-07-15 5:30 ` [PATCH 11/11] btrfs-progs: introduce btrfs filesystem show --kernel Anand Jain
2013-08-05 17:36 ` [PATCH 00/11 v2 (resend)] btrfs-progs: coalesce of patches David Sterba
2013-08-06 15:08 ` anand jain
2013-08-08 8:07 ` [PATCH 0/2 v2] introduce btrfs filesystem show --kernel Anand Jain
2013-08-08 8:07 ` [PATCH 1/2] btrfs-progs: move out print in cmd_df to another function Anand Jain
2013-08-08 8:07 ` [PATCH 2/2] btrfs-progs: introduce btrfs filesystem show --kernel Anand Jain
2013-08-08 18:08 ` Zach Brown
2013-08-09 10:57 ` anand jain
2013-08-09 18:03 ` Zach Brown
2013-08-08 8:09 ` [PATCH 0/2] scan /dev/mapper in filesystem show and device scan Anand Jain
2013-08-08 8:09 ` [PATCH 1/2] btrfs-progs: btrfs_scan_one_dir not to skip links when /dev/mapper is provided Anand Jain
2013-08-08 8:09 ` [PATCH 2/2] btrfs-progs: scan /dev/mapper in filesystem show and device scan Anand Jain
2013-08-08 8:10 ` [PATCH 0/2] " anand jain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51B72EC5.3000407@oracle.com \
--to=anand.jain@oracle.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=zab@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).