Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <kreijack@inwind.it>, <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 1/5] Avoid to consider lvm snapshots when scanning devices.
Date: Tue, 9 Dec 2014 08:32:20 +0800	[thread overview]
Message-ID: <54864314.1000403@cn.fujitsu.com> (raw)
In-Reply-To: <5485BC81.5000906@gmail.com>

Hi Goffredo

On 12/08/2014 03:02 AM, Qu Wenruo wrote:
>> -------- Original Message --------
>> Subject: [PATCH 1/5] Avoid to consider lvm snapshots when scanning devices.
>> From: Goffredo Baroncelli <kreijack@gmail.com>
>> To: <linux-btrfs@vger.kernel.org>
>> Date: 2014年12月05日 02:39
>>> LVM snapshots create a problem to the btrfs devices management.
>>> BTRFS assumes that each device haw an unique 'device UUID'.
>>> A LVM snapshot breaks this assumption.
>>>
>>> This patch skips LVM snapshots during the device scan phase.
>>> If you need to consider a LVM snapshot you have to set the
>>> environmental variable BTRFS_SKIP_LVM_SNAPSHOT to "no".
>> IMHO, it is better only to skip LVM snapshot if and only if the snapshot contains a btrfs with
>> conflicting UUID.
> Hi Qu,
>
> Currently the "scan phase" in btrfs is done 1 device at time.
> (udev finds a new device and starts "btrfs dev scan <device>"),
> and I haven't changed that. This means that btrfs[-prog] doesn't
> know which devices are already registered, or not. And if even
> it would know this information, you have to consider the case
> where the snapshot appears before the real target. So
> btrfs[-prog] is not in position to perform this analysis [see
> below my other comment]
>
>> Since LVM is such a flexible block level volume management, it is possible that some one
>> did a snapshot of a btrfs fs, and then reformat the original one to other fs.
>> In that case, the LVM snapshot skip seems overkilled.
>>
>> Also, personally, I prefer there will be some option like -i to allow user to choose which device is
>> used when conflicting uuid is detected. This seems to be the best case and user can have the full control
>> on device scan. This also makes the environment variant not needed.
>>
>> LVM snapshot skip (only when conflicting) is better to be the fallback behavior if -i is not given.
> I understood your reasons, but I don't find any solution
> compatible with the current btrfs device registration model
> (asynchronously when the device appears).
>
> In another patch set, I proposed a mount.btrfs helper which is
> in position to perform these analysis and to pick the "right"
> device (even with the user suggestion).
>
> Today the lvm-snapshot and btrfs behave very poor: it is not
> predictable which device is pick (the original or the snapshot).
> These patch *avoid* most problems skipping the snapshots, which
> to me seems a reasonable default.
> For the other case the user is still able to mount any disks
> [combination] passing them directly via command line (
> mount /dev/sdX -o device=/dev/sdY,device=/dev/sdz...  );
>
> Anyway I think for these kind of setup (btrfs on lvm-snapshot),
> passing the disks explicitly is the only solution; in fact your
> suggestion about the '-i' switch is not very different.
>
>> Thanks,
>> Qu
> BR
> G.Baroncelli
Your explains sounds quite reasonable, it's good enough before any 
better solutions.

Thanks,
Qu
>>> To check if a device is a LVM snapshot, it is checked the
>>> 'udev' device property 'DM_UDEV_LOW_PRIORITY_FLAG' .
>>> If it is set to 1, the device has to be skipped.
>>>
>>> As conseguence, btrfs now depends by libudev.
>>>
>>> Programmatically you can control this behavior with the functions:
>>> - btrfs_scan_set_skip_lvm_snapshot(int new_value)
>>> - int btrfs_scan_get_skip_lvm_snapshot( )
>>>
>>> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
>>> ---
>>>    Makefile |   4 +--
>>>    utils.c  | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>    utils.h  |   9 +++++-
>>>    3 files changed, 117 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/Makefile b/Makefile
>>> index 4cae30c..9464361 100644
>>> --- a/Makefile
>>> +++ b/Makefile
>>> @@ -26,7 +26,7 @@ TESTS = fsck-tests.sh convert-tests.sh
>>>    INSTALL = install
>>>    prefix ?= /usr/local
>>>    bindir = $(prefix)/bin
>>> -lib_LIBS = -luuid -lblkid -lm -lz -llzo2 -L.
>>> +lib_LIBS = -luuid -lblkid -lm -lz -ludev -llzo2 -L.
>>>    libdir ?= $(prefix)/lib
>>>    incdir = $(prefix)/include/btrfs
>>>    LIBS = $(lib_LIBS) $(libs_static)
>>> @@ -99,7 +99,7 @@ lib_links = libbtrfs.so.0 libbtrfs.so
>>>    headers = $(libbtrfs_headers)
>>>      # make C=1 to enable sparse
>>> -check_defs := .cc-defines.h
>>> +check_defs := .cc-defines.h
>>>    ifdef C
>>>        #
>>>        # We're trying to use sparse against glibc headers which go wild
>>> diff --git a/utils.c b/utils.c
>>> index 2a92416..9887f8b 100644
>>> --- a/utils.c
>>> +++ b/utils.c
>>> @@ -29,6 +29,7 @@
>>>    #include <sys/types.h>
>>>    #include <sys/stat.h>
>>>    #include <uuid/uuid.h>
>>> +#include <libudev.h>
>>>    #include <fcntl.h>
>>>    #include <unistd.h>
>>>    #include <mntent.h>
>>> @@ -52,6 +53,13 @@
>>>    #define BLKDISCARD    _IO(0x12,119)
>>>    #endif
>>>    +/*
>>> + * This variable controls if the lvm snapshot have to be skipped or not.
>>> + * Access this variable only via the btrfs_scan_[sg]et_skip_lvm_snapshot()
>>> + * functions
>>> + */
>>> +static int __scan_device_skip_lvm_snapshot = -1;
>>> +
>>>    static int btrfs_scan_done = 0;
>>>      static char argv0_buf[ARGV0_BUF_SIZE] = "btrfs";
>>> @@ -1593,6 +1601,9 @@ int btrfs_scan_block_devices(int run_ioctl)
>>>        char fullpath[110];
>>>        int scans = 0;
>>>        int special;
>>> +    int skip_snapshot;
>>> +
>>> +    skip_snapshot = btrfs_scan_get_skip_lvm_snapshot();
>>>      scan_again:
>>>        proc_partitions = fopen("/proc/partitions","r");
>>> @@ -1642,6 +1653,9 @@ scan_again:
>>>                continue;
>>>            }
>>>    +        if (skip_snapshot && is_low_priority_device(fullpath))
>>> +            continue;
>>> +
>>>            fd = open(fullpath, O_RDONLY);
>>>            if (fd < 0) {
>>>                if (errno != ENOMEDIUM)
>>> @@ -2182,6 +2196,30 @@ int test_dev_for_mkfs(char *file, int force_overwrite, char *estr)
>>>        return 0;
>>>    }
>>>    +int btrfs_scan_get_skip_lvm_snapshot( )
>>> +{
>>> +    const char *value;
>>> +
>>> +    if (__scan_device_skip_lvm_snapshot != -1 )
>>> +        return __scan_device_skip_lvm_snapshot;
>>> +
>>> +    value = getenv(BTRFS_SKIP_LVM_SNAPSHOT_ENV_NAME);
>>> +    if (value && !strcasecmp(value, "NO"))
>>> +        __scan_device_skip_lvm_snapshot = 0;
>>> +    else  if (value && !strcasecmp(value, "YES"))
>>> +        __scan_device_skip_lvm_snapshot = 1;
>>> +    else
>>> +        __scan_device_skip_lvm_snapshot =
>>> +            BTRFS_SKIP_LVM_SNAPSHOT_DEFAULT;
>>> +
>>> +    return __scan_device_skip_lvm_snapshot;
>>> +}
>>> +
>>> +void btrfs_scan_set_skip_lvm_snapshot(int new_value)
>>> +{
>>> +    __scan_device_skip_lvm_snapshot = !!new_value;
>>> +}
>>> +
>>>    int btrfs_scan_lblkid()
>>>    {
>>>        int fd = -1;
>>> @@ -2192,6 +2230,9 @@ int btrfs_scan_lblkid()
>>>        blkid_dev dev = NULL;
>>>        blkid_cache cache = NULL;
>>>        char path[PATH_MAX];
>>> +    int skip_snapshot;
>>> +
>>> +    skip_snapshot = btrfs_scan_get_skip_lvm_snapshot();
>>>          if (btrfs_scan_done)
>>>            return 0;
>>> @@ -2210,6 +2251,9 @@ int btrfs_scan_lblkid()
>>>            /* if we are here its definitely a btrfs disk*/
>>>            strncpy(path, blkid_dev_devname(dev), PATH_MAX);
>>>    +        if (skip_snapshot && is_low_priority_device(path))
>>> +            continue;
>>> +
>>>            fd = open(path, O_RDONLY);
>>>            if (fd < 0) {
>>>                printf("ERROR: could not open %s\n", path);
>>> @@ -2450,3 +2494,66 @@ int find_next_key(struct btrfs_path *path, struct btrfs_key *key)
>>>        }
>>>        return 1;
>>>    }
>>> +
>>> +/*
>>> + * This function return 1 if the device (path) is consdered "LOW_PRIORITY" by
>>> + * LVM2 library. These device are typically the nsapshot.
>>> + * This function return < 0 in case of error; 0 otherwise.
>>> + */
>>> +int is_low_priority_device(const char *path)
>>> +{
>>> +
>>> +    struct udev *udev=NULL;
>>> +    struct udev_device *dev;
>>> +    struct udev_enumerate *enumerate=NULL;
>>> +    struct udev_list_entry *devices;
>>> +    struct udev_list_entry *node, *list;
>>> +    int ret=-1;
>>> +    const char *value, *syspath;
>>> +    char *rpath=NULL;
>>> +
>>> +    rpath = realpath(path, NULL);
>>> +    if (!rpath) {
>>> +        fprintf(stderr, "ERROR: not enough memory\n");
>>> +        ret=-2;
>>> +        goto exit;
>>> +    }
>>> +
>>> +    /* Create the udev object */
>>> +    udev = udev_new();
>>> +    if (!udev) {
>>> +        fprintf(stderr, "ERROR: Can't create udev\n");
>>> +        ret=-1;
>>> +        goto exit;
>>> +    }
>>> +
>>> +    enumerate = udev_enumerate_new(udev);
>>> +    udev_enumerate_add_match_subsystem(enumerate, "block");
>>> +    udev_enumerate_add_match_property(enumerate, "DEVNAME", rpath);
>>> +    udev_enumerate_scan_devices(enumerate);
>>> +
>>> +    devices = udev_enumerate_get_list_entry(enumerate);
>>> +    syspath = udev_list_entry_get_name(devices);
>>> +
>>> +    dev = udev_device_new_from_syspath(udev, syspath);
>>> +
>>> +    list = udev_device_get_properties_list_entry (dev);
>>> +
>>> +    ret = 0;
>>> +    node = udev_list_entry_get_by_name(list, "DM_UDEV_LOW_PRIORITY_FLAG");
>>> +    if (node) {
>>> +        value = udev_list_entry_get_value(node);
>>> +        if (value && !strcmp(value, "1"))
>>> +            ret = 1;
>>> +    }
>>> +
>>> +exit:
>>> +    free(rpath);
>>> +
>>> +    /* Free the enumerator object */
>>> +    udev_enumerate_unref(enumerate);
>>> +
>>> +    udev_unref(udev);
>>> +
>>> +    return ret;
>>> +}
>>> diff --git a/utils.h b/utils.h
>>> index 289e86b..9855f09 100644
>>> --- a/utils.h
>>> +++ b/utils.h
>>> @@ -128,7 +128,7 @@ int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf,
>>>                   int verify);
>>>    int ask_user(char *question);
>>>    int lookup_ino_rootid(int fd, u64 *rootid);
>>> -int btrfs_scan_lblkid(void);
>>> +int btrfs_scan_lblkid();
>>>    int get_btrfs_mount(const char *dev, char *mp, size_t mp_size);
>>>    int find_mount_root(const char *path, char **mount_root);
>>>    int get_device_info(int fd, u64 devid,
>>> @@ -161,4 +161,11 @@ static inline u64 btrfs_min_dev_size(u32 leafsize)
>>>      int find_next_key(struct btrfs_path *path, struct btrfs_key *key);
>>>    +#define BTRFS_SKIP_LVM_SNAPSHOT_DEFAULT 1
>>> +#define BTRFS_SKIP_LVM_SNAPSHOT_ENV_NAME "BTRFS_SKIP_LVM_SNAPSHOT"
>>> +
>>> +int is_low_priority_device(const char *path);
>>> +void btrfs_scan_set_skip_lvm_snapshot(int new_value);
>>> +int btrfs_scan_get_skip_lvm_snapshot( );
>>> +
>>>    #endif
>>
>


  reply	other threads:[~2014-12-09  0:32 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-04 18:39 [PATCH V2][BTRFS-PROGS] Don't use LVM snapshot device Goffredo Baroncelli
2014-12-04 18:39 ` [PATCH 1/5] Avoid to consider lvm snapshots when scanning devices Goffredo Baroncelli
2014-12-08  2:02   ` Qu Wenruo
2014-12-08 14:58     ` Goffredo Baroncelli
2014-12-09  0:32       ` Qu Wenruo [this message]
2014-12-09 10:27       ` David Sterba
2014-12-09 18:19         ` Goffredo Baroncelli
2014-12-04 18:39 ` [PATCH 2/5] 'btrfs device scan' skips lvm snapshots Goffredo Baroncelli
2014-12-04 18:39 ` [PATCH 3/5] Update 'btrfs device scan' man page Goffredo Baroncelli
2014-12-04 18:39 ` [PATCH 4/5] Add reference to BTRFS_SKIP_LVM_SNAPSHOT environment variable Goffredo Baroncelli
2014-12-04 18:39 ` [PATCH 5/5] Abort in case of device uuid conflict Goffredo Baroncelli
2014-12-05  7:26 ` [PATCH V2][BTRFS-PROGS] Don't use LVM snapshot device Duncan
2014-12-05 18:39   ` Goffredo Baroncelli
2014-12-08 15:30 ` Phillip Susi
2014-12-08 17:36   ` Goffredo Baroncelli
2014-12-08 18:17     ` Phillip Susi
2014-12-08 19:22     ` Robert White
2014-12-10  7:52 ` Anand Jain
2014-12-10 18:40   ` Goffredo Baroncelli
  -- strict thread matches above, loose matches on Subject: below --
2014-12-04 18:24 [PATCH][BTRFS-PROGS] Skip " Goffredo Baroncelli
2014-12-04 18:24 ` [PATCH 1/5] Avoid to consider lvm snapshots when scanning devices Goffredo Baroncelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54864314.1000403@cn.fujitsu.com \
    --to=quwenruo@cn.fujitsu.com \
    --cc=kreijack@inwind.it \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox