All of lore.kernel.org
 help / color / mirror / Atom feed
From: anand jain <anand.jain@oracle.com>
To: Zach Brown <zab@redhat.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 2/2 v2] btrfs: add framework to read fs info and dev info from the kernel
Date: Tue, 11 Jun 2013 22:05:57 +0800	[thread overview]
Message-ID: <51B72EC5.3000407@oracle.com> (raw)
In-Reply-To: <20130610203054.GK24721@lenny.home.zabbo.net>



On 06/11/2013 04:30 AM, Zach Brown wrote:
> On Mon, Jun 10, 2013 at 10:59:15PM +0800, Anand Jain wrote:
>> This adds two ioctl BTRFS_IOC_GET_FSIDS and BTRFS_IOC_GET_DEVS
>> which reads the btrfs_fs_devices and btrfs_device structure
>> from the kernel respectively.
>
> Why not just use sysfs to export the device lists?

  Hmm. I see sysfs being used but isn't ioctl more easy
  to get stuff out from the kernel (except for dumping
  address_space) ? Providing a complete sysfs interface
  for btrfs can be a RFC as well. For now ioctl will help
  to fix the bugs.

>> The information in these structure are useful to report the
>> device/fs information in line with the kernel operations and
>> thus immediately addresses the problem that 'btrfs fi show'
>> command reports the stale information after device device add
>> remove operation is performed. That is because btrfs fi show
>> reads the disks directly.
>
> Hmm.  Would it be enough to get the block device address_space in sync
> with the btrfs stuctures?  Would that get rid of the need for this
> interface?

  Absolutely ! I have that to experiment in linux for
  a long time. More to come on that as time permits from
  the bug fixes which is kind of urgent need as of now.

> But sure, for the sake of argument and code review, let's say that we
> wanted some ioctls to read the btrfs kernel device lists.

  thanks.

>> Further the frame-work provided here would help to enhance
>
> Please don't add a layer of generic encoding above these new ioctls.
> It's not necessary and it makes more work for the various parts of the
> world that want to do deep ioctl inspection (think, for example, of
> strace).

   Agreed. Debugging has to be easier.

>> +static size_t get_ioctl_sz(size_t pl_list_sz, u64 mul, u64 alloc_cnt)
>> +
>> +static int get_ioctl_header_and_payload(unsigned long arg,
>> +		size_t unit_sz, int default_len,
>> +		struct btrfs_ioctl_header **argp, size_t *sz)
>
> This adds a lot of fiddly math and allocation that can go wrong for no
> benefit.  We can restructure the interface so all of this code goes
> away.
>
> 1) Have a simple single fixed input structure for each ioctl.  Maybe
> with some extra padding and a flags argument if you think stuff is going
> to be added over time.  No generic header.  No casting.  The ioctl
> number defines the input structure.  If you need a different structure
> later, use a different ioctl number.

  -No generic header and No casting-  Why not ? anything that
  might not work with strace ?

  (Header-payload model are typically favorites of IO protocol drivers.
  they are easily extensible, checking for compatibility is easier, and
  code re-useability is great).

  Thanks for the review comments below. I will look into that.

Anand
-------
> 2) Don't have a fixed array of output structs embedded in the input
> struct.  Have a pointer to the array of output structs in the input
> struct.  Probably with a number of elements found there.



> 3) Don't copy the giant user output buffer into the kernel, write to it,
> then copy it back to user space.  Instead have the kernel copy_to_user()
> each output structure to the userspace pointer as it goes.  Have the
> ioctl() return a positive count of the number of elements copied.



>>   static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
>>   				unsigned long arg)
>>   {
>> -	struct btrfs_ioctl_vol_args *vol;
>> +	struct btrfs_ioctl_vol_args *vol = NULL;
>>   	struct btrfs_fs_devices *fs_devices;
>> -	int ret = -ENOTTY;
>> +	struct btrfs_ioctl_header *argp = NULL;
>> + 	int ret = -ENOTTY;
>> +	size_t sz = 0;
>>
>>   	if (!capable(CAP_SYS_ADMIN))
>>   		return -EPERM;
>>
>> -	vol = memdup_user((void __user *)arg, sizeof(*vol));
>> -	if (IS_ERR(vol))
>> -		return PTR_ERR(vol);
>> -
>>   	switch (cmd) {
>>   	case BTRFS_IOC_SCAN_DEV:
>> +		vol = memdup_user((void __user *)arg, sizeof(*vol));
>> +		if (IS_ERR(vol))
>> +			return PTR_ERR(vol);
>
> Hmm, having to duplicate that previously shared setup into each ioctl
> block is a sign that the layering is wrong.
>
> Don't smush the new ioctls into case bodies of this existing simple
> fuction that worked with the _vol_args ioctls.  Rename this
> btrfs_ioctl_vol_args, or something, and call it from a tiny new
> btrfs_control_ioctl() for its two ioctls.  That new high level btrfs
> ioctl dispatch function can then call the two new _get_devs and
> _get_fsids functions.



>> +	case BTRFS_IOC_GET_FSIDS:
>> +		ret = get_ioctl_header_and_payload(arg,
>> +				sizeof(struct btrfs_ioctl_fs_list),
>> +				BTRFS_FSIDS_LEN, &argp, &sz);
>> +		if (ret == 1) {
>> +			ret = copy_to_user((void __user *)arg, argp, sz);
>> +			kfree(argp);
>> +			return -EFAULT;
>> +		} else if (ret)
>> +			return -EFAULT;
>> +		ret = btrfs_get_fsids(argp);
>> +		if (ret == 0 && copy_to_user((void __user *)arg, argp, sz))
>> +			ret = -EFAULT;
>> +		kfree(argp);
>> +		break;
>
> All of this can go away if you have trivial input and array-of-output
> args that the ioctl helper works with.
>
> 	case BTRFS_IOC_GET_FSIDS:
> 		ret = btrfs_ioctl_get_fsids(arg);
> 		break;


>> +int btrfs_get_fsids(struct btrfs_ioctl_header *argp)
>> +{
>> +	u64 cnt = 0;
>> +	struct btrfs_device *device;
>> +	struct btrfs_fs_devices *fs_devices;
>> +	struct btrfs_ioctl_fs_list *fslist;
>> +	struct btrfs_ioctl_fsinfo *fsinfo;
>> +
>> +	fslist = (struct btrfs_ioctl_fs_list *)((u8 *)argp
>> +						+ sizeof(*argp));
>
> Instead of working with an allocated copy of the input, copy the user
> input struct onto the stack here.
>
> 	struct btrfs_ioctl_get_fsids_input in;
>
> 	if (copy_from_user(&in, argp, sizeof(in)))
> 		return -EFAULT;
>
> (Or you could get_user() each field.. but that's probably not worth it
> for a small input struct.)



>> +
>> +	list_for_each_entry(fs_devices, &fs_uuids, list) {
>
> Surely this needs to be protected by some lock?  The uuid_mutex, maybe?



>> +			fsinfo = &fslist->fsinfo[cnt];
>> +			memcpy(fsinfo->fsid, fs_devices->fsid,
>> +					BTRFS_FSID_SIZE);
>
> And then copy each device's output struct back to userspace one at a
> time instead of building them up in the giant allocated kernel buffer.
>
> This might get a little hairy if you don't want to block under the
> uuid_mutex, but that's solvable too (dummy cursor items on the list,
> probably).



>> +				fsinfo->bytes_used = device->dev_root\
>> +						->fs_info->super_copy->bytes_used;
>
> A line continuation in the middle of a pointer deref?  Cache the
> super_copy pointer on the stack, maybe.  It's probably better to use a
> function that copies a single device's output struct to get all those
> indentation levels back.
>
>> +	/* Todo: optimize. there must be better way of doing this*/
>> +	list_for_each_entry(fs_devices, &fs_uuids, list) {
>
> (same locking question)



>> +				if (device->in_fs_metadata)
>> +					dev->flags = dev->flags |
>> +						BTRFS_DEV_IN_FS_MD;
>
> 	'a = a | b' -> 'a |= b'



>> +				memcpy(dev->name, rcu_str_deref(device->name),
>> +						BTRFS_PATH_NAME_MAX);
>
> Hmm.  Don't we need to be in rcu_read_lock() to use rcu_str_deref()?



>> +struct btrfs_ioctl_fsinfo {
>> +	__u64 sz_self;
>> +	__u8 fsid[BTRFS_FSID_SIZE]; /* out */
>> +	__u64 num_devices;
>> +	__u64 total_devices;
>> +	__u64 missing_devices;
>> +	__u64 total_rw_bytes;
>> +	__u64 bytes_used;
>> +	__u64 flags;
>> +	__u64 default_mount_subvol_id;
>> +	char label[BTRFS_LABEL_SIZE];
>> +}__attribute__ ((__packed__));
>
> It'd be __packed, but I'd naturally align the structs to u64s and not
> use it.



>> +struct btrfs_ioctl_fs_list {
>> +	__u64 all; /* in */
>> +	struct btrfs_ioctl_fsinfo fsinfo[BTRFS_FSIDS_LEN]; /* out */
>> +}__attribute__ ((__packed__));
>
> I'd turn this into something like:
>
> struct btrfs_ioctl_get_fsinfo_input {
> 	__u64 flags;  /* bit for all */
> 	__u64 nr_fsinfo;
> 	__u64 fsinfo_ptr; /* u64 ptr to avoid compat */
> };



>> +struct btrfs_ioctl_devinfo {
>> +	__u64 sz_self;
>> +	__u64 flags;
>> +	__u64 devid;
>> +	__u64 total_bytes;
>> +	__u64 disk_total_bytes;
>> +	__u64 bytes_used;
>> +	__u64 type;
>> +	__u64 generation;
>> +	__u32 io_align;
>> +	__u32 io_width;
>> +	__u32 sector_size;
>   __u32 _padding;
>> +	__u8 uuid[BTRFS_UUID_SIZE];
>> +	__u8 name[BTRFS_PATH_NAME_MAX + 1];
>
> Pad that nutty 4088 byte string to a multiple of u64s and you can
> remove the packed attribute.




> - z
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

  reply	other threads:[~2013-06-11 14:05 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-10 14:56 [PATCH 0/9] btrfs-progs: coalesce of patches Anand Jain
2013-06-10 14:56 ` [PATCH 1/9] btrfs-progs: btrfs_scan_for_fsid doesn't need all the arguments Anand Jain
2013-06-10 20:00   ` Eric Sandeen
2013-06-11 13:15     ` anand jain
2013-06-10 14:56 ` [PATCH 2/9 v2] btrfs-progs: label option in btrfs filesystem show is not coded Anand Jain
2013-06-10 14:56 ` [PATCH 3/9 v2] btrfs-progs: update device scan usage Anand Jain
2013-06-10 14:56 ` [PATCH 4/9 v3] btrfs-progs: congregate dev scan Anand Jain
2013-06-10 14:56 ` [PATCH 5/9 v2] btrfs-progs: btrfs_scan_one_dir not to skip links when /dev/mapper is provided Anand Jain
2013-06-10 14:56 ` [PATCH 6/9 v2] btrfs-progs: scan /dev/mapper in filesystem show and device scan Anand Jain
2013-06-10 14:56 ` [PATCH 7/9 v3] btrfs-progs: device delete to get errors from the kernel Anand Jain
2013-06-10 14:56 ` [PATCH 8/9] btrfs-progs: get_label_mounted to return label instead of print Anand Jain
2013-06-21  7:41   ` [PATCH 08/13 v2] " Anand Jain
2013-06-10 14:56 ` [PATCH 9/9 v2] btrfs-progs: introduce btrfs filesystem show --kernel Anand Jain
2013-06-10 14:59 ` [PATCH 0/2] btrfs: coalesce of patches Anand Jain
2013-06-10 14:59   ` [PATCH 1/2] btrfs: device delete to get errors from the kernel Anand Jain
2013-06-10 14:59   ` [PATCH 2/2 v2] btrfs: add framework to read fs info and dev info " Anand Jain
2013-06-10 19:40     ` Josef Bacik
2013-06-11 13:10       ` anand jain
2013-06-11 13:15         ` Josef Bacik
2013-06-10 20:30     ` Zach Brown
2013-06-11 14:05       ` anand jain [this message]
2013-06-11 17:50         ` Zach Brown
2013-06-11 14:24     ` Josef Bacik
2013-06-21  7:02       ` Anand Jain
2013-06-21  7:32     ` [PATCH 2/2 v3] btrfs: obtain used_bytes in BTRFS_IOC_FS_INFO ioctl Anand Jain
2013-06-24 17:03       ` Josef Bacik
2013-06-25  3:00         ` Anand Jain
2013-07-08  7:39 ` [PATCH 13/13] btrfs-progs: fix memory leaks of device_list_add() Anand Jain
2013-07-15  4:58   ` Anand Jain
2013-07-15  5:30 ` [PATCH 00/11 v2 (resend)] btrfs-progs: coalesce of patches Anand Jain
2013-07-15  5:30   ` [PATCH 01/11] btrfs-progs: btrfs_scan_for_fsid doesn't need all the arguments Anand Jain
2013-07-15  5:30   ` [PATCH 02/11] btrfs-progs: label option in btrfs filesystem show is not coded Anand Jain
2013-07-15  5:30   ` [PATCH 03/11] btrfs-progs: update device scan usage Anand Jain
2013-07-15  5:30   ` [PATCH 04/11] btrfs-progs: congregate dev scan Anand Jain
2013-07-15  5:30   ` [PATCH 05/11] btrfs-progs: btrfs_scan_one_dir not to skip links when /dev/mapper is provided Anand Jain
2013-08-05 16:53     ` David Sterba
2013-07-15  5:30   ` [PATCH 06/11] btrfs-progs: scan /dev/mapper in filesystem show and device scan Anand Jain
2013-08-05 17:04     ` David Sterba
2013-08-13  4:07       ` anand jain
2013-07-15  5:30   ` [PATCH 07/11] btrfs-progs: device delete to get errors from the kernel Anand Jain
2013-07-15  5:30   ` [PATCH 08/11] btrfs-progs: get_label_mounted to return label instead of print Anand Jain
2013-07-15  5:30   ` [PATCH 09/11] btrfs-progs: move out print in cmd_df to another function Anand Jain
2013-07-15  5:30   ` [PATCH 10/11] btrfs-progs: get string for the group profile and type Anand Jain
2013-07-15  5:30   ` [PATCH 11/11] btrfs-progs: introduce btrfs filesystem show --kernel Anand Jain
2013-08-05 17:36   ` [PATCH 00/11 v2 (resend)] btrfs-progs: coalesce of patches David Sterba
2013-08-06 15:08     ` anand jain
2013-08-08  8:07 ` [PATCH 0/2 v2] introduce btrfs filesystem show --kernel Anand Jain
2013-08-08  8:07   ` [PATCH 1/2] btrfs-progs: move out print in cmd_df to another function Anand Jain
2013-08-08  8:07   ` [PATCH 2/2] btrfs-progs: introduce btrfs filesystem show --kernel Anand Jain
2013-08-08 18:08     ` Zach Brown
2013-08-09 10:57       ` anand jain
2013-08-09 18:03         ` Zach Brown
2013-08-08  8:09 ` [PATCH 0/2] scan /dev/mapper in filesystem show and device scan Anand Jain
2013-08-08  8:09   ` [PATCH 1/2] btrfs-progs: btrfs_scan_one_dir not to skip links when /dev/mapper is provided Anand Jain
2013-08-08  8:09   ` [PATCH 2/2] btrfs-progs: scan /dev/mapper in filesystem show and device scan Anand Jain
2013-08-08  8:10   ` [PATCH 0/2] " anand jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51B72EC5.3000407@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=zab@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.