linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: anand jain <anand.jain@oracle.com>
To: Zach Brown <zab@redhat.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 2/2 v2] btrfs: add framework to read fs info and dev info from the kernel
Date: Tue, 11 Jun 2013 22:05:57 +0800	[thread overview]
Message-ID: <51B72EC5.3000407@oracle.com> (raw)
In-Reply-To: <20130610203054.GK24721@lenny.home.zabbo.net>



On 06/11/2013 04:30 AM, Zach Brown wrote:
> On Mon, Jun 10, 2013 at 10:59:15PM +0800, Anand Jain wrote:
>> This adds two ioctl BTRFS_IOC_GET_FSIDS and BTRFS_IOC_GET_DEVS
>> which reads the btrfs_fs_devices and btrfs_device structure
>> from the kernel respectively.
>
> Why not just use sysfs to export the device lists?

  Hmm. I see sysfs being used but isn't ioctl more easy
  to get stuff out from the kernel (except for dumping
  address_space) ? Providing a complete sysfs interface
  for btrfs can be a RFC as well. For now ioctl will help
  to fix the bugs.

>> The information in these structure are useful to report the
>> device/fs information in line with the kernel operations and
>> thus immediately addresses the problem that 'btrfs fi show'
>> command reports the stale information after device device add
>> remove operation is performed. That is because btrfs fi show
>> reads the disks directly.
>
> Hmm.  Would it be enough to get the block device address_space in sync
> with the btrfs stuctures?  Would that get rid of the need for this
> interface?

  Absolutely ! I have that to experiment in linux for
  a long time. More to come on that as time permits from
  the bug fixes which is kind of urgent need as of now.

> But sure, for the sake of argument and code review, let's say that we
> wanted some ioctls to read the btrfs kernel device lists.

  thanks.

>> Further the frame-work provided here would help to enhance
>
> Please don't add a layer of generic encoding above these new ioctls.
> It's not necessary and it makes more work for the various parts of the
> world that want to do deep ioctl inspection (think, for example, of
> strace).

   Agreed. Debugging has to be easier.

>> +static size_t get_ioctl_sz(size_t pl_list_sz, u64 mul, u64 alloc_cnt)
>> +
>> +static int get_ioctl_header_and_payload(unsigned long arg,
>> +		size_t unit_sz, int default_len,
>> +		struct btrfs_ioctl_header **argp, size_t *sz)
>
> This adds a lot of fiddly math and allocation that can go wrong for no
> benefit.  We can restructure the interface so all of this code goes
> away.
>
> 1) Have a simple single fixed input structure for each ioctl.  Maybe
> with some extra padding and a flags argument if you think stuff is going
> to be added over time.  No generic header.  No casting.  The ioctl
> number defines the input structure.  If you need a different structure
> later, use a different ioctl number.

  -No generic header and No casting-  Why not ? anything that
  might not work with strace ?

  (Header-payload model are typically favorites of IO protocol drivers.
  they are easily extensible, checking for compatibility is easier, and
  code re-useability is great).

  Thanks for the review comments below. I will look into that.

Anand
-------
> 2) Don't have a fixed array of output structs embedded in the input
> struct.  Have a pointer to the array of output structs in the input
> struct.  Probably with a number of elements found there.



> 3) Don't copy the giant user output buffer into the kernel, write to it,
> then copy it back to user space.  Instead have the kernel copy_to_user()
> each output structure to the userspace pointer as it goes.  Have the
> ioctl() return a positive count of the number of elements copied.



>>   static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
>>   				unsigned long arg)
>>   {
>> -	struct btrfs_ioctl_vol_args *vol;
>> +	struct btrfs_ioctl_vol_args *vol = NULL;
>>   	struct btrfs_fs_devices *fs_devices;
>> -	int ret = -ENOTTY;
>> +	struct btrfs_ioctl_header *argp = NULL;
>> + 	int ret = -ENOTTY;
>> +	size_t sz = 0;
>>
>>   	if (!capable(CAP_SYS_ADMIN))
>>   		return -EPERM;
>>
>> -	vol = memdup_user((void __user *)arg, sizeof(*vol));
>> -	if (IS_ERR(vol))
>> -		return PTR_ERR(vol);
>> -
>>   	switch (cmd) {
>>   	case BTRFS_IOC_SCAN_DEV:
>> +		vol = memdup_user((void __user *)arg, sizeof(*vol));
>> +		if (IS_ERR(vol))
>> +			return PTR_ERR(vol);
>
> Hmm, having to duplicate that previously shared setup into each ioctl
> block is a sign that the layering is wrong.
>
> Don't smush the new ioctls into case bodies of this existing simple
> fuction that worked with the _vol_args ioctls.  Rename this
> btrfs_ioctl_vol_args, or something, and call it from a tiny new
> btrfs_control_ioctl() for its two ioctls.  That new high level btrfs
> ioctl dispatch function can then call the two new _get_devs and
> _get_fsids functions.



>> +	case BTRFS_IOC_GET_FSIDS:
>> +		ret = get_ioctl_header_and_payload(arg,
>> +				sizeof(struct btrfs_ioctl_fs_list),
>> +				BTRFS_FSIDS_LEN, &argp, &sz);
>> +		if (ret == 1) {
>> +			ret = copy_to_user((void __user *)arg, argp, sz);
>> +			kfree(argp);
>> +			return -EFAULT;
>> +		} else if (ret)
>> +			return -EFAULT;
>> +		ret = btrfs_get_fsids(argp);
>> +		if (ret == 0 && copy_to_user((void __user *)arg, argp, sz))
>> +			ret = -EFAULT;
>> +		kfree(argp);
>> +		break;
>
> All of this can go away if you have trivial input and array-of-output
> args that the ioctl helper works with.
>
> 	case BTRFS_IOC_GET_FSIDS:
> 		ret = btrfs_ioctl_get_fsids(arg);
> 		break;


>> +int btrfs_get_fsids(struct btrfs_ioctl_header *argp)
>> +{
>> +	u64 cnt = 0;
>> +	struct btrfs_device *device;
>> +	struct btrfs_fs_devices *fs_devices;
>> +	struct btrfs_ioctl_fs_list *fslist;
>> +	struct btrfs_ioctl_fsinfo *fsinfo;
>> +
>> +	fslist = (struct btrfs_ioctl_fs_list *)((u8 *)argp
>> +						+ sizeof(*argp));
>
> Instead of working with an allocated copy of the input, copy the user
> input struct onto the stack here.
>
> 	struct btrfs_ioctl_get_fsids_input in;
>
> 	if (copy_from_user(&in, argp, sizeof(in)))
> 		return -EFAULT;
>
> (Or you could get_user() each field.. but that's probably not worth it
> for a small input struct.)



>> +
>> +	list_for_each_entry(fs_devices, &fs_uuids, list) {
>
> Surely this needs to be protected by some lock?  The uuid_mutex, maybe?



>> +			fsinfo = &fslist->fsinfo[cnt];
>> +			memcpy(fsinfo->fsid, fs_devices->fsid,
>> +					BTRFS_FSID_SIZE);
>
> And then copy each device's output struct back to userspace one at a
> time instead of building them up in the giant allocated kernel buffer.
>
> This might get a little hairy if you don't want to block under the
> uuid_mutex, but that's solvable too (dummy cursor items on the list,
> probably).



>> +				fsinfo->bytes_used = device->dev_root\
>> +						->fs_info->super_copy->bytes_used;
>
> A line continuation in the middle of a pointer deref?  Cache the
> super_copy pointer on the stack, maybe.  It's probably better to use a
> function that copies a single device's output struct to get all those
> indentation levels back.
>
>> +	/* Todo: optimize. there must be better way of doing this*/
>> +	list_for_each_entry(fs_devices, &fs_uuids, list) {
>
> (same locking question)



>> +				if (device->in_fs_metadata)
>> +					dev->flags = dev->flags |
>> +						BTRFS_DEV_IN_FS_MD;
>
> 	'a = a | b' -> 'a |= b'



>> +				memcpy(dev->name, rcu_str_deref(device->name),
>> +						BTRFS_PATH_NAME_MAX);
>
> Hmm.  Don't we need to be in rcu_read_lock() to use rcu_str_deref()?



>> +struct btrfs_ioctl_fsinfo {
>> +	__u64 sz_self;
>> +	__u8 fsid[BTRFS_FSID_SIZE]; /* out */
>> +	__u64 num_devices;
>> +	__u64 total_devices;
>> +	__u64 missing_devices;
>> +	__u64 total_rw_bytes;
>> +	__u64 bytes_used;
>> +	__u64 flags;
>> +	__u64 default_mount_subvol_id;
>> +	char label[BTRFS_LABEL_SIZE];
>> +}__attribute__ ((__packed__));
>
> It'd be __packed, but I'd naturally align the structs to u64s and not
> use it.



>> +struct btrfs_ioctl_fs_list {
>> +	__u64 all; /* in */
>> +	struct btrfs_ioctl_fsinfo fsinfo[BTRFS_FSIDS_LEN]; /* out */
>> +}__attribute__ ((__packed__));
>
> I'd turn this into something like:
>
> struct btrfs_ioctl_get_fsinfo_input {
> 	__u64 flags;  /* bit for all */
> 	__u64 nr_fsinfo;
> 	__u64 fsinfo_ptr; /* u64 ptr to avoid compat */
> };



>> +struct btrfs_ioctl_devinfo {
>> +	__u64 sz_self;
>> +	__u64 flags;
>> +	__u64 devid;
>> +	__u64 total_bytes;
>> +	__u64 disk_total_bytes;
>> +	__u64 bytes_used;
>> +	__u64 type;
>> +	__u64 generation;
>> +	__u32 io_align;
>> +	__u32 io_width;
>> +	__u32 sector_size;
>   __u32 _padding;
>> +	__u8 uuid[BTRFS_UUID_SIZE];
>> +	__u8 name[BTRFS_PATH_NAME_MAX + 1];
>
> Pad that nutty 4088 byte string to a multiple of u64s and you can
> remove the packed attribute.




> - z
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

  reply	other threads:[~2013-06-11 14:05 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-10 14:56 [PATCH 0/9] btrfs-progs: coalesce of patches Anand Jain
2013-06-10 14:56 ` [PATCH 1/9] btrfs-progs: btrfs_scan_for_fsid doesn't need all the arguments Anand Jain
2013-06-10 20:00   ` Eric Sandeen
2013-06-11 13:15     ` anand jain
2013-06-10 14:56 ` [PATCH 2/9 v2] btrfs-progs: label option in btrfs filesystem show is not coded Anand Jain
2013-06-10 14:56 ` [PATCH 3/9 v2] btrfs-progs: update device scan usage Anand Jain
2013-06-10 14:56 ` [PATCH 4/9 v3] btrfs-progs: congregate dev scan Anand Jain
2013-06-10 14:56 ` [PATCH 5/9 v2] btrfs-progs: btrfs_scan_one_dir not to skip links when /dev/mapper is provided Anand Jain
2013-06-10 14:56 ` [PATCH 6/9 v2] btrfs-progs: scan /dev/mapper in filesystem show and device scan Anand Jain
2013-06-10 14:56 ` [PATCH 7/9 v3] btrfs-progs: device delete to get errors from the kernel Anand Jain
2013-06-10 14:56 ` [PATCH 8/9] btrfs-progs: get_label_mounted to return label instead of print Anand Jain
2013-06-21  7:41   ` [PATCH 08/13 v2] " Anand Jain
2013-06-10 14:56 ` [PATCH 9/9 v2] btrfs-progs: introduce btrfs filesystem show --kernel Anand Jain
2013-06-10 14:59 ` [PATCH 0/2] btrfs: coalesce of patches Anand Jain
2013-06-10 14:59   ` [PATCH 1/2] btrfs: device delete to get errors from the kernel Anand Jain
2013-06-10 14:59   ` [PATCH 2/2 v2] btrfs: add framework to read fs info and dev info " Anand Jain
2013-06-10 19:40     ` Josef Bacik
2013-06-11 13:10       ` anand jain
2013-06-11 13:15         ` Josef Bacik
2013-06-10 20:30     ` Zach Brown
2013-06-11 14:05       ` anand jain [this message]
2013-06-11 17:50         ` Zach Brown
2013-06-11 14:24     ` Josef Bacik
2013-06-21  7:02       ` Anand Jain
2013-06-21  7:32     ` [PATCH 2/2 v3] btrfs: obtain used_bytes in BTRFS_IOC_FS_INFO ioctl Anand Jain
2013-06-24 17:03       ` Josef Bacik
2013-06-25  3:00         ` Anand Jain
2013-07-08  7:39 ` [PATCH 13/13] btrfs-progs: fix memory leaks of device_list_add() Anand Jain
2013-07-15  4:58   ` Anand Jain
2013-07-15  5:30 ` [PATCH 00/11 v2 (resend)] btrfs-progs: coalesce of patches Anand Jain
2013-07-15  5:30   ` [PATCH 01/11] btrfs-progs: btrfs_scan_for_fsid doesn't need all the arguments Anand Jain
2013-07-15  5:30   ` [PATCH 02/11] btrfs-progs: label option in btrfs filesystem show is not coded Anand Jain
2013-07-15  5:30   ` [PATCH 03/11] btrfs-progs: update device scan usage Anand Jain
2013-07-15  5:30   ` [PATCH 04/11] btrfs-progs: congregate dev scan Anand Jain
2013-07-15  5:30   ` [PATCH 05/11] btrfs-progs: btrfs_scan_one_dir not to skip links when /dev/mapper is provided Anand Jain
2013-08-05 16:53     ` David Sterba
2013-07-15  5:30   ` [PATCH 06/11] btrfs-progs: scan /dev/mapper in filesystem show and device scan Anand Jain
2013-08-05 17:04     ` David Sterba
2013-08-13  4:07       ` anand jain
2013-07-15  5:30   ` [PATCH 07/11] btrfs-progs: device delete to get errors from the kernel Anand Jain
2013-07-15  5:30   ` [PATCH 08/11] btrfs-progs: get_label_mounted to return label instead of print Anand Jain
2013-07-15  5:30   ` [PATCH 09/11] btrfs-progs: move out print in cmd_df to another function Anand Jain
2013-07-15  5:30   ` [PATCH 10/11] btrfs-progs: get string for the group profile and type Anand Jain
2013-07-15  5:30   ` [PATCH 11/11] btrfs-progs: introduce btrfs filesystem show --kernel Anand Jain
2013-08-05 17:36   ` [PATCH 00/11 v2 (resend)] btrfs-progs: coalesce of patches David Sterba
2013-08-06 15:08     ` anand jain
2013-08-08  8:07 ` [PATCH 0/2 v2] introduce btrfs filesystem show --kernel Anand Jain
2013-08-08  8:07   ` [PATCH 1/2] btrfs-progs: move out print in cmd_df to another function Anand Jain
2013-08-08  8:07   ` [PATCH 2/2] btrfs-progs: introduce btrfs filesystem show --kernel Anand Jain
2013-08-08 18:08     ` Zach Brown
2013-08-09 10:57       ` anand jain
2013-08-09 18:03         ` Zach Brown
2013-08-08  8:09 ` [PATCH 0/2] scan /dev/mapper in filesystem show and device scan Anand Jain
2013-08-08  8:09   ` [PATCH 1/2] btrfs-progs: btrfs_scan_one_dir not to skip links when /dev/mapper is provided Anand Jain
2013-08-08  8:09   ` [PATCH 2/2] btrfs-progs: scan /dev/mapper in filesystem show and device scan Anand Jain
2013-08-08  8:10   ` [PATCH 0/2] " anand jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51B72EC5.3000407@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=zab@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).