* [PATCH 0/2] btrfs: enhance btrfs device path rename
@ 2024-09-24 5:17 Qu Wenruo
2024-09-24 5:17 ` [PATCH 1/2] btrfs: avoid unnecessary device path update for the same device Qu Wenruo
2024-09-24 5:17 ` [PATCH 2/2] btrfs: canonicalize the device path before adding it Qu Wenruo
0 siblings, 2 replies; 8+ messages in thread
From: Qu Wenruo @ 2024-09-24 5:17 UTC (permalink / raw)
To: linux-btrfs
Fabian Vogt reported a very interesting case where we can use
"/proc/self/fd/3" to mount a btrfs, then that /proc name will remain as
the device path for the btrfs.
Sometimes udev scan can trigger a rename, and get a proper name back,
but if that doesn't win the race, we got the "/proc/self/fd/3" in the
mtab which makes no sense, since "/proc/self" is bounded to the current
process, no one is going to know where the real device is.
So to enhance the handling:
- Do proper path comparation to determine if we want to update device
path
Currently the path comparation is done by strcmp(), which will never
handle hard nor soft links at all.
Instead go with path_equal() on the resolved paths, which is way more
reliable.
This should reduce the unnecessary path name updates in the first
place.
- Canonicalize the device path if it's not inside "/dev/"
If we pass a soft link which is not inside "/dev/", we just follow the
soft link and use the final destination instead.
It can be something like "/dev/dm-2", other than more readable LVM
names like "/dev/test/scratch1", but it's still way better than
"/proc/self/fd/3".
If the soft link itself is already inside "/dev/", then we can
directly use the path.
This is to allow fstest run properly without forced to use
"/dev/dm-*" instead of the "/dev/mapper/test-scratch1" as test
devices.
Otherwise fs tests will reject runs because it believe the btrfs
is mounted somewhere else.
Qu Wenruo (2):
btrfs: avoid unnecessary device path update for the same device
btrfs: canonicalize the device path before adding it
fs/btrfs/volumes.c | 107 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 105 insertions(+), 2 deletions(-)
--
2.46.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] btrfs: avoid unnecessary device path update for the same device
2024-09-24 5:17 [PATCH 0/2] btrfs: enhance btrfs device path rename Qu Wenruo
@ 2024-09-24 5:17 ` Qu Wenruo
2024-09-24 11:47 ` Filipe Manana
2024-09-24 5:17 ` [PATCH 2/2] btrfs: canonicalize the device path before adding it Qu Wenruo
1 sibling, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2024-09-24 5:17 UTC (permalink / raw)
To: linux-btrfs
[PROBLEM]
It is very common for udev to trigger device scan, and every time a
mounted btrfs device got re-scan from different soft links, we will get
unnecessary renames like this:
BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (11096)
BTRFS info (device dm-2): first mount of filesystem 2378be81-fe12-46d2-a9e8-68cf08dd98d5
BTRFS info (device dm-2): using crc32c (crc32c-intel) checksum algorithm
BTRFS info (device dm-2): using free-space-tree
BTRFS info: devid 1 device path /proc/self/fd/3 changed to /dev/dm-2 scanned by (udev-worker) (11092)
BTRFS info: devid 1 device path /dev/dm-2 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (11092)
The program "mount_by_fd" has the following simple source code:
#include <fcntl.h>
#include <stdio.h>
#include <sys/mount.h>
int main(int argc, char *argv[]) {
int fd = open(argv[1], O_RDWR);
char path[256];
snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
return mount(path, argv[2], "btrfs", 0, NULL);
}
Note that, all the above paths are all just pointing to "/dev/dm-2".
The "/proc/self/fd/3" is the proc fs, describing all the opened fds.
The "/dev/mapper/test-scratch1" is the soft link created by LVM2.
[CAUSE]
Inside device_list_add(), we are using a very primitive way checking if
the device has changed, strcmp().
Which can never handle links well, no matter if it's hard or soft links.
So every different link of the same device will be treated as different
device, causing the unnecessary device name update.
[FIX]
Introduce a helper, is_same_device(), and use path_equal() to properly
detect the same block device.
So that the different soft links won't trigger the rename race.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/volumes.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 995b0647f538..b713e4ebb362 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -732,6 +732,32 @@ const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb)
return has_metadata_uuid ? sb->metadata_uuid : sb->fsid;
}
+static bool is_same_device(struct btrfs_device *device, const char *new_path)
+{
+ struct path old = { .mnt = NULL, .dentry = NULL };
+ struct path new = { .mnt = NULL, .dentry = NULL };
+ char *old_path;
+ bool is_same = false;
+ int ret;
+
+ if (!device->name)
+ goto out;
+
+ old_path = rcu_str_deref(device->name);
+ ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
+ if (ret)
+ goto out;
+ ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
+ if (ret)
+ goto out;
+ if (path_equal(&old, &new))
+ is_same = true;
+out:
+ path_put(&old);
+ path_put(&new);
+ return is_same;
+}
+
/*
* Add new device to list of registered devices
*
@@ -852,7 +878,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
MAJOR(path_devt), MINOR(path_devt),
current->comm, task_pid_nr(current));
- } else if (!device->name || strcmp(device->name->str, path)) {
+ } else if (!device->name || !is_same_device(device, path)) {
/*
* When FS is already mounted.
* 1. If you are here and if the device->name is NULL that
--
2.46.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] btrfs: canonicalize the device path before adding it
2024-09-24 5:17 [PATCH 0/2] btrfs: enhance btrfs device path rename Qu Wenruo
2024-09-24 5:17 ` [PATCH 1/2] btrfs: avoid unnecessary device path update for the same device Qu Wenruo
@ 2024-09-24 5:17 ` Qu Wenruo
2024-09-24 12:12 ` Filipe Manana
1 sibling, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2024-09-24 5:17 UTC (permalink / raw)
To: linux-btrfs; +Cc: Fabian Vogt
[PROBLEM]
Currently btrfs accepts any file path for its device, resulting some
weird situation:
# ./mount_by_fd /dev/test/scratch1 /mnt/btrfs/
The program has the following source code:
#include <fcntl.h>
#include <stdio.h>
#include <sys/mount.h>
int main(int argc, char *argv[]) {
int fd = open(argv[1], O_RDWR);
char path[256];
snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
return mount(path, argv[2], "btrfs", 0, NULL);
}
Then we can have the following weird device path:
BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (18440)
Normally it's not a big deal, and later udev can trigger a device path
rename. But if udev didn't trigger, the device path "/proc/self/fd/3"
will show up in mtab.
[CAUSE]
For filename "/proc/self/fd/3", it means the opened file descriptor 3.
In above case, it's exactly the device we want to open, aka points to
"/dev/test/scratch1" which is another softlink pointing to "/dev/dm-2".
Inside btrfs we solve the path using LOOKUP_FOLLOW, which follows the
symbolic link and grab the proper block device.
But we also save the filename into btrfs_device::name, still resulting
the weird path.
[FIX]
Instead of unconditionally trust the path, check if the original file
(not following the symbolic link) is inside "/dev/", if not, then
manually lookup the path to its final destination, and use that as our
device path.
This allows us to still use symbolic links, like
"/dev/mapper/test-scratch" from LVM2, which is required for fstests runs
with LVM2 setup.
And for really weird names, like the above case, we solve it to
"/dev/dm-2" instead.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641
Reported-by: Fabian Vogt <fvogt@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/volumes.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 78 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b713e4ebb362..8acb3c465783 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -732,6 +732,70 @@ const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb)
return has_metadata_uuid ? sb->metadata_uuid : sb->fsid;
}
+/*
+ * We can have very wide soft links passed in.
+ * One example is "/proc/<uid>/fd/<fd>", which can be a soft link to
+ * a proper block device.
+ *
+ * But it's never a good idea to use those weird names.
+ * Here we check if the path (not following symlinks) is a good one inside
+ * "/dev/".
+ */
+static bool is_good_dev_path(const char *dev_path)
+{
+ struct path path = { .mnt = NULL, .dentry = NULL };
+ char *path_buf = NULL;
+ char *resolved_path;
+ bool is_good = false;
+ int ret;
+
+ path_buf = kmalloc(PATH_MAX, GFP_KERNEL);
+ if (!path_buf)
+ goto out;
+
+ /*
+ * Do not follow soft link, just check if the original path is inside
+ * "/dev/".
+ */
+ ret = kern_path(dev_path, 0, &path);
+ if (ret)
+ goto out;
+ resolved_path = d_path(&path, path_buf, PATH_MAX);
+ if (strncmp(resolved_path, "/dev/", strlen("/dev/")))
+ goto out;
+ is_good = true;
+out:
+ kfree(path_buf);
+ path_put(&path);
+ return is_good;
+}
+
+static int get_canonical_dev_path(const char *dev_path, char *canonical)
+{
+ struct path path = { .mnt = NULL, .dentry = NULL };
+ char *path_buf = NULL;
+ char *resolved_path;
+ int ret;
+
+ path_buf = kmalloc(PATH_MAX, GFP_KERNEL);
+ if (!path_buf) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = kern_path(dev_path, LOOKUP_FOLLOW, &path);
+ if (ret) {
+ pr_info("path lookup failed for %s: %d\n", dev_path, ret);
+ goto out;
+ }
+ resolved_path = d_path(&path, path_buf, PATH_MAX);
+ strncpy(canonical, resolved_path, PATH_MAX - 1);
+out:
+ kfree(path_buf);
+ path_put(&path);
+ return ret;
+}
+
static bool is_same_device(struct btrfs_device *device, const char *new_path)
{
struct path old = { .mnt = NULL, .dentry = NULL };
@@ -1408,12 +1472,23 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
bool new_device_added = false;
struct btrfs_device *device = NULL;
struct file *bdev_file;
+ char *canonical_path = NULL;
u64 bytenr;
dev_t devt;
int ret;
lockdep_assert_held(&uuid_mutex);
+ if (!is_good_dev_path(path)) {
+ canonical_path = kmalloc(PATH_MAX, GFP_KERNEL);
+ if (canonical_path) {
+ ret = get_canonical_dev_path(path, canonical_path);
+ if (ret < 0) {
+ kfree(canonical_path);
+ canonical_path = NULL;
+ }
+ }
+ }
/*
* Avoid an exclusive open here, as the systemd-udev may initiate the
* device scan which may race with the user's mount or mkfs command,
@@ -1458,7 +1533,8 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
goto free_disk_super;
}
- device = device_list_add(path, disk_super, &new_device_added);
+ device = device_list_add(canonical_path ? canonical_path : path,
+ disk_super, &new_device_added);
if (!IS_ERR(device) && new_device_added)
btrfs_free_stale_devices(device->devt, device);
@@ -1467,6 +1543,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
error_bdev_put:
fput(bdev_file);
+ kfree(canonical_path);
return device;
}
--
2.46.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] btrfs: avoid unnecessary device path update for the same device
2024-09-24 5:17 ` [PATCH 1/2] btrfs: avoid unnecessary device path update for the same device Qu Wenruo
@ 2024-09-24 11:47 ` Filipe Manana
2024-09-24 21:12 ` Qu Wenruo
0 siblings, 1 reply; 8+ messages in thread
From: Filipe Manana @ 2024-09-24 11:47 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Tue, Sep 24, 2024 at 6:17 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [PROBLEM]
> It is very common for udev to trigger device scan, and every time a
> mounted btrfs device got re-scan from different soft links, we will get
> unnecessary renames like this:
>
> BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (11096)
> BTRFS info (device dm-2): first mount of filesystem 2378be81-fe12-46d2-a9e8-68cf08dd98d5
> BTRFS info (device dm-2): using crc32c (crc32c-intel) checksum algorithm
> BTRFS info (device dm-2): using free-space-tree
> BTRFS info: devid 1 device path /proc/self/fd/3 changed to /dev/dm-2 scanned by (udev-worker) (11092)
> BTRFS info: devid 1 device path /dev/dm-2 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (11092)
>
> The program "mount_by_fd" has the following simple source code:
>
> #include <fcntl.h>
> #include <stdio.h>
> #include <sys/mount.h>
>
> int main(int argc, char *argv[]) {
> int fd = open(argv[1], O_RDWR);
> char path[256];
>
> snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
> return mount(path, argv[2], "btrfs", 0, NULL);
> }
>
> Note that, all the above paths are all just pointing to "/dev/dm-2".
> The "/proc/self/fd/3" is the proc fs, describing all the opened fds.
This is redundant to say, we all know everything inside /proc belongs
to procfs, and self/fd/X corresponds to an open fd by the calling
task.
> The "/dev/mapper/test-scratch1" is the soft link created by LVM2.
What would be more useful here is to have all the steps to reproduce
the problem:
1) How you invoke that test program, what the arguments are;
2) Any other steps that one should run to reproduce the problem.
Can we also get a test case for fstests?
Even if it's caused by a race and it doesn't always trigger, with
several people often running fstests frequently, possible regressions
will be quickly detected.
>
> [CAUSE]
> Inside device_list_add(), we are using a very primitive way checking if
> the device has changed, strcmp().
>
> Which can never handle links well, no matter if it's hard or soft links.
>
> So every different link of the same device will be treated as different
> device, causing the unnecessary device name update.
>
> [FIX]
> Introduce a helper, is_same_device(), and use path_equal() to properly
> detect the same block device.
> So that the different soft links won't trigger the rename race.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Since there's a publicly accessible report for this, at
https://bugzilla.suse.com/show_bug.cgi?id=1230641, can we please at
least get a Link tag here?
> ---
> fs/btrfs/volumes.c | 28 +++++++++++++++++++++++++++-
> 1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 995b0647f538..b713e4ebb362 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -732,6 +732,32 @@ const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb)
> return has_metadata_uuid ? sb->metadata_uuid : sb->fsid;
> }
>
> +static bool is_same_device(struct btrfs_device *device, const char *new_path)
> +{
> + struct path old = { .mnt = NULL, .dentry = NULL };
> + struct path new = { .mnt = NULL, .dentry = NULL };
This can simply be: ... = { 0 }
But I don't mind this more verbose initialization.
With the change log fixes:
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Thanks.
> + bool is_same = false;
> + int ret;
> +
> + if (!device->name)
> + goto out;
> +
> + old_path = rcu_str_deref(device->name);
> + ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
> + if (ret)
> + goto out;
> + ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
> + if (ret)
> + goto out;
> + if (path_equal(&old, &new))
> + is_same = true;
> +out:
> + path_put(&old);
> + path_put(&new);
> + return is_same;
> +}
> +
> /*
> * Add new device to list of registered devices
> *
> @@ -852,7 +878,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
> MAJOR(path_devt), MINOR(path_devt),
> current->comm, task_pid_nr(current));
>
> - } else if (!device->name || strcmp(device->name->str, path)) {
> + } else if (!device->name || !is_same_device(device, path)) {
> /*
> * When FS is already mounted.
> * 1. If you are here and if the device->name is NULL that
> --
> 2.46.1
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] btrfs: canonicalize the device path before adding it
2024-09-24 5:17 ` [PATCH 2/2] btrfs: canonicalize the device path before adding it Qu Wenruo
@ 2024-09-24 12:12 ` Filipe Manana
2024-09-24 16:21 ` Roman Mamedov
2024-09-24 21:16 ` Qu Wenruo
0 siblings, 2 replies; 8+ messages in thread
From: Filipe Manana @ 2024-09-24 12:12 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, Fabian Vogt
On Tue, Sep 24, 2024 at 6:18 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [PROBLEM]
> Currently btrfs accepts any file path for its device, resulting some
> weird situation:
>
> # ./mount_by_fd /dev/test/scratch1 /mnt/btrfs/
>
> The program has the following source code:
>
> #include <fcntl.h>
> #include <stdio.h>
> #include <sys/mount.h>
>
> int main(int argc, char *argv[]) {
> int fd = open(argv[1], O_RDWR);
> char path[256];
> snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
> return mount(path, argv[2], "btrfs", 0, NULL);
> }
>
> Then we can have the following weird device path:
>
> BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (18440)
>
> Normally it's not a big deal, and later udev can trigger a device path
> rename. But if udev didn't trigger, the device path "/proc/self/fd/3"
> will show up in mtab.
>
> [CAUSE]
> For filename "/proc/self/fd/3", it means the opened file descriptor 3.
> In above case, it's exactly the device we want to open, aka points to
> "/dev/test/scratch1" which is another softlink pointing to "/dev/dm-2".
>
> Inside btrfs we solve the path using LOOKUP_FOLLOW, which follows the
> symbolic link and grab the proper block device.
>
> But we also save the filename into btrfs_device::name, still resulting
> the weird path.
>
> [FIX]
> Instead of unconditionally trust the path, check if the original file
> (not following the symbolic link) is inside "/dev/", if not, then
> manually lookup the path to its final destination, and use that as our
> device path.
>
> This allows us to still use symbolic links, like
> "/dev/mapper/test-scratch" from LVM2, which is required for fstests runs
> with LVM2 setup.
>
> And for really weird names, like the above case, we solve it to
> "/dev/dm-2" instead.
>
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641
> Reported-by: Fabian Vogt <fvogt@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/volumes.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 78 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b713e4ebb362..8acb3c465783 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -732,6 +732,70 @@ const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb)
> return has_metadata_uuid ? sb->metadata_uuid : sb->fsid;
> }
>
> +/*
> + * We can have very wide soft links passed in.
Wide? Did you mean wild in the sense of unusual?
> + * One example is "/proc/<uid>/fd/<fd>", which can be a soft link to
> + * a proper block device.
> + *
> + * But it's never a good idea to use those weird names.
> + * Here we check if the path (not following symlinks) is a good one inside
> + * "/dev/".
> + */
> +static bool is_good_dev_path(const char *dev_path)
> +{
> + struct path path = { .mnt = NULL, .dentry = NULL };
> + char *path_buf = NULL;
> + char *resolved_path;
> + bool is_good = false;
> + int ret;
> +
> + path_buf = kmalloc(PATH_MAX, GFP_KERNEL);
> + if (!path_buf)
> + goto out;
> +
> + /*
> + * Do not follow soft link, just check if the original path is inside
> + * "/dev/".
> + */
> + ret = kern_path(dev_path, 0, &path);
> + if (ret)
> + goto out;
> + resolved_path = d_path(&path, path_buf, PATH_MAX);
> + if (strncmp(resolved_path, "/dev/", strlen("/dev/")))
> + goto out;
> + is_good = true;
> +out:
> + kfree(path_buf);
> + path_put(&path);
> + return is_good;
> +}
> +
> +static int get_canonical_dev_path(const char *dev_path, char *canonical)
> +{
> + struct path path = { .mnt = NULL, .dentry = NULL };
> + char *path_buf = NULL;
> + char *resolved_path;
> + int ret;
> +
> + path_buf = kmalloc(PATH_MAX, GFP_KERNEL);
> + if (!path_buf) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + ret = kern_path(dev_path, LOOKUP_FOLLOW, &path);
> + if (ret) {
> + pr_info("path lookup failed for %s: %d\n", dev_path, ret);
Why not btrfs_info(), or better yet, btrfs_warn()?
It accepts a NULL fs_info argument and allows for a more standardized
btrfs message, with a proper prefix, etc.
> + goto out;
> + }
> + resolved_path = d_path(&path, path_buf, PATH_MAX);
> + strncpy(canonical, resolved_path, PATH_MAX - 1);
Please don't use strncpy(). This is strongly discouraged due to
security issues, see:
https://github.com/KSPP/linux/issues/90
https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
> +out:
> + kfree(path_buf);
> + path_put(&path);
> + return ret;
> +}
> +
> static bool is_same_device(struct btrfs_device *device, const char *new_path)
> {
> struct path old = { .mnt = NULL, .dentry = NULL };
> @@ -1408,12 +1472,23 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
> bool new_device_added = false;
> struct btrfs_device *device = NULL;
> struct file *bdev_file;
> + char *canonical_path = NULL;
> u64 bytenr;
> dev_t devt;
> int ret;
>
> lockdep_assert_held(&uuid_mutex);
>
> + if (!is_good_dev_path(path)) {
> + canonical_path = kmalloc(PATH_MAX, GFP_KERNEL);
> + if (canonical_path) {
> + ret = get_canonical_dev_path(path, canonical_path);
> + if (ret < 0) {
> + kfree(canonical_path);
> + canonical_path = NULL;
> + }
> + }
> + }
> /*
> * Avoid an exclusive open here, as the systemd-udev may initiate the
> * device scan which may race with the user's mount or mkfs command,
> @@ -1458,7 +1533,8 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
> goto free_disk_super;
> }
>
> - device = device_list_add(path, disk_super, &new_device_added);
> + device = device_list_add(canonical_path ? canonical_path : path,
Can use the shortcut: canonical_path ?: path
The rest looks fine, thanks.
> + disk_super, &new_device_added);
> if (!IS_ERR(device) && new_device_added)
> btrfs_free_stale_devices(device->devt, device);
>
> @@ -1467,6 +1543,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
>
> error_bdev_put:
> fput(bdev_file);
> + kfree(canonical_path);
>
> return device;
> }
> --
> 2.46.1
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] btrfs: canonicalize the device path before adding it
2024-09-24 12:12 ` Filipe Manana
@ 2024-09-24 16:21 ` Roman Mamedov
2024-09-24 21:16 ` Qu Wenruo
1 sibling, 0 replies; 8+ messages in thread
From: Roman Mamedov @ 2024-09-24 16:21 UTC (permalink / raw)
To: Filipe Manana; +Cc: Qu Wenruo, linux-btrfs, Fabian Vogt
On Tue, 24 Sep 2024 13:12:09 +0100
Filipe Manana <fdmanana@kernel.org> wrote:
> > +/*
> > + * We can have very wide soft links passed in.
>
> Wide? Did you mean wild in the sense of unusual?
Samba uses this term to refer to links pointing outside of the particular
network share tree:
https://www.samba.org/samba/docs/current/man-html/smb.conf.5.html#WIDELINKS
https://www.samba.org/samba/docs/current/man-html/smb.conf.5.html#ALLOWINSECUREWIDELINKS
Not sure what is the origin or etymology, guess it may stem from "VFS-wide",
links across the entire pathname space.
--
With respect,
Roman
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] btrfs: avoid unnecessary device path update for the same device
2024-09-24 11:47 ` Filipe Manana
@ 2024-09-24 21:12 ` Qu Wenruo
0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2024-09-24 21:12 UTC (permalink / raw)
To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs
在 2024/9/24 21:17, Filipe Manana 写道:
> On Tue, Sep 24, 2024 at 6:17 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> [PROBLEM]
>> It is very common for udev to trigger device scan, and every time a
>> mounted btrfs device got re-scan from different soft links, we will get
>> unnecessary renames like this:
>>
>> BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (11096)
>> BTRFS info (device dm-2): first mount of filesystem 2378be81-fe12-46d2-a9e8-68cf08dd98d5
>> BTRFS info (device dm-2): using crc32c (crc32c-intel) checksum algorithm
>> BTRFS info (device dm-2): using free-space-tree
>> BTRFS info: devid 1 device path /proc/self/fd/3 changed to /dev/dm-2 scanned by (udev-worker) (11092)
>> BTRFS info: devid 1 device path /dev/dm-2 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (11092)
>>
>> The program "mount_by_fd" has the following simple source code:
>>
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <sys/mount.h>
>>
>> int main(int argc, char *argv[]) {
>> int fd = open(argv[1], O_RDWR);
>> char path[256];
>>
>> snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
>> return mount(path, argv[2], "btrfs", 0, NULL);
>> }
>>
>> Note that, all the above paths are all just pointing to "/dev/dm-2".
>> The "/proc/self/fd/3" is the proc fs, describing all the opened fds.
>
> This is redundant to say, we all know everything inside /proc belongs
> to procfs, and self/fd/X corresponds to an open fd by the calling
> task.
>
>> The "/dev/mapper/test-scratch1" is the soft link created by LVM2.
>
> What would be more useful here is to have all the steps to reproduce
> the problem:
>
> 1) How you invoke that test program, what the arguments are;
> 2) Any other steps that one should run to reproduce the problem.
>
> Can we also get a test case for fstests?
From my tests with ext4, it's really btrfs' ability to change path name
halfway saving us.
For ext4, the weird device name will stick there forever, and I do not
even know if this is the expected behavior or not.
> Even if it's caused by a race and it doesn't always trigger, with
> several people often running fstests frequently, possible regressions
> will be quickly detected.
So far I haven't hit a case where the proc name can stay persistent,
udev rescan always saves us.
I guess the only reliable way to get such situation is to disable udev
or on systems without udev completely.
>
>>
>> [CAUSE]
>> Inside device_list_add(), we are using a very primitive way checking if
>> the device has changed, strcmp().
>>
>> Which can never handle links well, no matter if it's hard or soft links.
>>
>> So every different link of the same device will be treated as different
>> device, causing the unnecessary device name update.
>>
>> [FIX]
>> Introduce a helper, is_same_device(), and use path_equal() to properly
>> detect the same block device.
>> So that the different soft links won't trigger the rename race.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Since there's a publicly accessible report for this, at
> https://bugzilla.suse.com/show_bug.cgi?id=1230641, can we please at
> least get a Link tag here?
Sure.
>
>> ---
>> fs/btrfs/volumes.c | 28 +++++++++++++++++++++++++++-
>> 1 file changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 995b0647f538..b713e4ebb362 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -732,6 +732,32 @@ const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb)
>> return has_metadata_uuid ? sb->metadata_uuid : sb->fsid;
>> }
>>
>> +static bool is_same_device(struct btrfs_device *device, const char *new_path)
>> +{
>> + struct path old = { .mnt = NULL, .dentry = NULL };
>> + struct path new = { .mnt = NULL, .dentry = NULL };
>
> This can simply be: ... = { 0 }
Unfortunately that will trigger designated-init warning since the
structure has __randomize_layout attribute.
>
> But I don't mind this more verbose initialization.
>
> With the change log fixes:
>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
Thanks for the review,
Qu
>
> Thanks.
>
>> + bool is_same = false;
>> + int ret;
>> +
>> + if (!device->name)
>> + goto out;
>> +
>> + old_path = rcu_str_deref(device->name);
>> + ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
>> + if (ret)
>> + goto out;
>> + ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
>> + if (ret)
>> + goto out;
>> + if (path_equal(&old, &new))
>> + is_same = true;
>> +out:
>> + path_put(&old);
>> + path_put(&new);
>> + return is_same;
>> +}
>> +
>> /*
>> * Add new device to list of registered devices
>> *
>> @@ -852,7 +878,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>> MAJOR(path_devt), MINOR(path_devt),
>> current->comm, task_pid_nr(current));
>>
>> - } else if (!device->name || strcmp(device->name->str, path)) {
>> + } else if (!device->name || !is_same_device(device, path)) {
>> /*
>> * When FS is already mounted.
>> * 1. If you are here and if the device->name is NULL that
>> --
>> 2.46.1
>>
>>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] btrfs: canonicalize the device path before adding it
2024-09-24 12:12 ` Filipe Manana
2024-09-24 16:21 ` Roman Mamedov
@ 2024-09-24 21:16 ` Qu Wenruo
1 sibling, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2024-09-24 21:16 UTC (permalink / raw)
To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs, Fabian Vogt
在 2024/9/24 21:42, Filipe Manana 写道:
> On Tue, Sep 24, 2024 at 6:18 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> [PROBLEM]
>> Currently btrfs accepts any file path for its device, resulting some
>> weird situation:
>>
>> # ./mount_by_fd /dev/test/scratch1 /mnt/btrfs/
>>
>> The program has the following source code:
>>
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <sys/mount.h>
>>
>> int main(int argc, char *argv[]) {
>> int fd = open(argv[1], O_RDWR);
>> char path[256];
>> snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
>> return mount(path, argv[2], "btrfs", 0, NULL);
>> }
>>
>> Then we can have the following weird device path:
>>
>> BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (18440)
>>
>> Normally it's not a big deal, and later udev can trigger a device path
>> rename. But if udev didn't trigger, the device path "/proc/self/fd/3"
>> will show up in mtab.
>>
>> [CAUSE]
>> For filename "/proc/self/fd/3", it means the opened file descriptor 3.
>> In above case, it's exactly the device we want to open, aka points to
>> "/dev/test/scratch1" which is another softlink pointing to "/dev/dm-2".
>>
>> Inside btrfs we solve the path using LOOKUP_FOLLOW, which follows the
>> symbolic link and grab the proper block device.
>>
>> But we also save the filename into btrfs_device::name, still resulting
>> the weird path.
>>
>> [FIX]
>> Instead of unconditionally trust the path, check if the original file
>> (not following the symbolic link) is inside "/dev/", if not, then
>> manually lookup the path to its final destination, and use that as our
>> device path.
>>
>> This allows us to still use symbolic links, like
>> "/dev/mapper/test-scratch" from LVM2, which is required for fstests runs
>> with LVM2 setup.
>>
>> And for really weird names, like the above case, we solve it to
>> "/dev/dm-2" instead.
>>
>> Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641
>> Reported-by: Fabian Vogt <fvogt@suse.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> fs/btrfs/volumes.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 78 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index b713e4ebb362..8acb3c465783 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -732,6 +732,70 @@ const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb)
>> return has_metadata_uuid ? sb->metadata_uuid : sb->fsid;
>> }
>>
>> +/*
>> + * We can have very wide soft links passed in.
>
> Wide? Did you mean wild in the sense of unusual?
My bad, I mean "weird".
>
>> + * One example is "/proc/<uid>/fd/<fd>", which can be a soft link to
>> + * a proper block device.
>> + *
>> + * But it's never a good idea to use those weird names.
>> + * Here we check if the path (not following symlinks) is a good one inside
>> + * "/dev/".
>> + */
>> +static bool is_good_dev_path(const char *dev_path)
>> +{
>> + struct path path = { .mnt = NULL, .dentry = NULL };
>> + char *path_buf = NULL;
>> + char *resolved_path;
>> + bool is_good = false;
>> + int ret;
>> +
>> + path_buf = kmalloc(PATH_MAX, GFP_KERNEL);
>> + if (!path_buf)
>> + goto out;
>> +
>> + /*
>> + * Do not follow soft link, just check if the original path is inside
>> + * "/dev/".
>> + */
>> + ret = kern_path(dev_path, 0, &path);
>> + if (ret)
>> + goto out;
>> + resolved_path = d_path(&path, path_buf, PATH_MAX);
>> + if (strncmp(resolved_path, "/dev/", strlen("/dev/")))
>> + goto out;
>> + is_good = true;
>> +out:
>> + kfree(path_buf);
>> + path_put(&path);
>> + return is_good;
>> +}
>> +
>> +static int get_canonical_dev_path(const char *dev_path, char *canonical)
>> +{
>> + struct path path = { .mnt = NULL, .dentry = NULL };
>> + char *path_buf = NULL;
>> + char *resolved_path;
>> + int ret;
>> +
>> + path_buf = kmalloc(PATH_MAX, GFP_KERNEL);
>> + if (!path_buf) {
>> + ret = -ENOMEM;
>> + goto out;
>> + }
>> +
>> + ret = kern_path(dev_path, LOOKUP_FOLLOW, &path);
>> + if (ret) {
>> + pr_info("path lookup failed for %s: %d\n", dev_path, ret);
>
> Why not btrfs_info(), or better yet, btrfs_warn()?
Oh, it's some debug code not removed.
In fact there should be no need to output anything.
We can always fallback to the super weird name.
>
> It accepts a NULL fs_info argument and allows for a more standardized
> btrfs message, with a proper prefix, etc.
>
>
>> + goto out;
>> + }
>> + resolved_path = d_path(&path, path_buf, PATH_MAX);
>> + strncpy(canonical, resolved_path, PATH_MAX - 1);
>
> Please don't use strncpy(). This is strongly discouraged due to
> security issues, see:
>
> https://github.com/KSPP/linux/issues/90
> https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Thanks a lot for pointing to this.
Thanks,
Qu
>
>> +out:
>> + kfree(path_buf);
>> + path_put(&path);
>> + return ret;
>> +}
>> +
>> static bool is_same_device(struct btrfs_device *device, const char *new_path)
>> {
>> struct path old = { .mnt = NULL, .dentry = NULL };
>> @@ -1408,12 +1472,23 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
>> bool new_device_added = false;
>> struct btrfs_device *device = NULL;
>> struct file *bdev_file;
>> + char *canonical_path = NULL;
>> u64 bytenr;
>> dev_t devt;
>> int ret;
>>
>> lockdep_assert_held(&uuid_mutex);
>>
>> + if (!is_good_dev_path(path)) {
>> + canonical_path = kmalloc(PATH_MAX, GFP_KERNEL);
>> + if (canonical_path) {
>> + ret = get_canonical_dev_path(path, canonical_path);
>> + if (ret < 0) {
>> + kfree(canonical_path);
>> + canonical_path = NULL;
>> + }
>> + }
>> + }
>> /*
>> * Avoid an exclusive open here, as the systemd-udev may initiate the
>> * device scan which may race with the user's mount or mkfs command,
>> @@ -1458,7 +1533,8 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
>> goto free_disk_super;
>> }
>>
>> - device = device_list_add(path, disk_super, &new_device_added);
>> + device = device_list_add(canonical_path ? canonical_path : path,
>
> Can use the shortcut: canonical_path ?: path
>
> The rest looks fine, thanks.
>
>
>> + disk_super, &new_device_added);
>> if (!IS_ERR(device) && new_device_added)
>> btrfs_free_stale_devices(device->devt, device);
>>
>> @@ -1467,6 +1543,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
>>
>> error_bdev_put:
>> fput(bdev_file);
>> + kfree(canonical_path);
>>
>> return device;
>> }
>> --
>> 2.46.1
>>
>>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-09-24 21:16 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-24 5:17 [PATCH 0/2] btrfs: enhance btrfs device path rename Qu Wenruo
2024-09-24 5:17 ` [PATCH 1/2] btrfs: avoid unnecessary device path update for the same device Qu Wenruo
2024-09-24 11:47 ` Filipe Manana
2024-09-24 21:12 ` Qu Wenruo
2024-09-24 5:17 ` [PATCH 2/2] btrfs: canonicalize the device path before adding it Qu Wenruo
2024-09-24 12:12 ` Filipe Manana
2024-09-24 16:21 ` Roman Mamedov
2024-09-24 21:16 ` Qu Wenruo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).