* [PATCH 0/3] btrfs-progs: hunt down the stray fd close causing race with udev scan
@ 2024-02-02 0:59 Qu Wenruo
2024-02-02 0:59 ` [PATCH 1/3] btrfs-progs: rescue: properly close the fs for clear-ino-cache Qu Wenruo
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-02-02 0:59 UTC (permalink / raw)
To: linux-btrfs
Although my previous flock() based solution is preventing udev scan to
get pre-mature super blocks, it in fact masks the root problem:
There is a stray close() on writeable fd.
Commit b2a1be83b85f ("btrfs-progs: mkfs: keep file descriptors open
during whole time") tries to solve the problem by extending the lifespan
of writeable fds, so that when the fds are closed, the fs is ensured to
be properly populated.
The problem is, that patch is not covering all cases, there is a stray
fd close just under our noses: open_ctree_fs_info().
The function would open the initial device, then use that initial fd to
open the btrfs, then we immediately close the initial fd, as later IO
would all go with the device fd.
That close() call is causing problem, especially for mkfs, as at that
stage the fs is still using a temporary super block, not using the valid
btrfs super magic number.
Thus udev scan would race with mkfs, and if udev wins the race, it would
get the temporary super block, making libblk not to detect the new
btrfs.
This patchset would address the problem by:
- Make sure open_ctree*() calls all have corresponding close_ctree()
The first patch, as later we will only close the initial fd caused by
open_ctree_fs_info() during close_ctree().
- Save the initial fd into btrfs_fs_info for open_ctree_fs_info()
And later close the initial fd during close_ctree().
- Make sure open_ctree_fd() callers to properly close the fd
Just an extra cleanup.
This patchset would work even without the usage of flock() to block udev
scan, and since without flock() calls, the procedure is much simpler.
Qu Wenruo (3):
btrfs-progs: rescue: properly close the fs for clear-ino-cache
btrfs-progs: tune: fix the missing close()
btrfs-progs: fix the stray fd close in open_ctree_fs_info()
cmds/rescue.c | 1 +
kernel-shared/ctree.h | 9 +++++++++
kernel-shared/disk-io.c | 8 +++++++-
tune/main.c | 1 +
4 files changed, 18 insertions(+), 1 deletion(-)
--
2.43.0
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH 1/3] btrfs-progs: rescue: properly close the fs for clear-ino-cache 2024-02-02 0:59 [PATCH 0/3] btrfs-progs: hunt down the stray fd close causing race with udev scan Qu Wenruo @ 2024-02-02 0:59 ` Qu Wenruo 2024-02-02 8:48 ` Anand Jain 2024-02-02 0:59 ` [PATCH 2/3] btrfs-progs: tune: fix the missing close() Qu Wenruo ` (2 subsequent siblings) 3 siblings, 1 reply; 10+ messages in thread From: Qu Wenruo @ 2024-02-02 0:59 UTC (permalink / raw) To: linux-btrfs [BUG] In cmd_rescue_clear_ino_cache(), we opened the fs, but without closing it using close_ctree(). [CAUSE] All my bad, I forgot it, as the original code is inside btrfs check, and had a "goto out_close;" to properly close the fs. [FIX] Manually call close_ctree() on the fs_info->tree_root. Signed-off-by: Qu Wenruo <wqu@suse.com> --- cmds/rescue.c | 1 + 1 file changed, 1 insertion(+) diff --git a/cmds/rescue.c b/cmds/rescue.c index 621e243b2073..6d7d526df145 100644 --- a/cmds/rescue.c +++ b/cmds/rescue.c @@ -451,6 +451,7 @@ static int cmd_rescue_clear_ino_cache(const struct cmd_struct *cmd, } else { pr_verbose(LOG_DEFAULT, "Successfully cleared ino cache"); } + close_ctree(fs_info->tree_root); out: return !!ret; } -- 2.43.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] btrfs-progs: rescue: properly close the fs for clear-ino-cache 2024-02-02 0:59 ` [PATCH 1/3] btrfs-progs: rescue: properly close the fs for clear-ino-cache Qu Wenruo @ 2024-02-02 8:48 ` Anand Jain 0 siblings, 0 replies; 10+ messages in thread From: Anand Jain @ 2024-02-02 8:48 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs On 2/2/24 06:29, Qu Wenruo wrote: > [BUG] > In cmd_rescue_clear_ino_cache(), we opened the fs, but without > closing it using close_ctree(). > > [CAUSE] > All my bad, I forgot it, as the original code is inside btrfs check, and > had a "goto out_close;" to properly close the fs. Nit: Forgot to add commit id that missed this? Reviewed-by: Anand Jain <anand.jain@oracle.com> Thx. Anand > > [FIX] > Manually call close_ctree() on the fs_info->tree_root. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > cmds/rescue.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/cmds/rescue.c b/cmds/rescue.c > index 621e243b2073..6d7d526df145 100644 > --- a/cmds/rescue.c > +++ b/cmds/rescue.c > @@ -451,6 +451,7 @@ static int cmd_rescue_clear_ino_cache(const struct cmd_struct *cmd, > } else { > pr_verbose(LOG_DEFAULT, "Successfully cleared ino cache"); > } > + close_ctree(fs_info->tree_root); > out: > return !!ret; > } ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/3] btrfs-progs: tune: fix the missing close() 2024-02-02 0:59 [PATCH 0/3] btrfs-progs: hunt down the stray fd close causing race with udev scan Qu Wenruo 2024-02-02 0:59 ` [PATCH 1/3] btrfs-progs: rescue: properly close the fs for clear-ino-cache Qu Wenruo @ 2024-02-02 0:59 ` Qu Wenruo 2024-02-02 9:01 ` Anand Jain 2024-02-02 0:59 ` [PATCH 3/3] btrfs-progs: fix the stray fd close in open_ctree_fs_info() Qu Wenruo 2024-02-07 9:52 ` [PATCH 0/3] btrfs-progs: hunt down the stray fd close causing race with udev scan David Sterba 3 siblings, 1 reply; 10+ messages in thread From: Qu Wenruo @ 2024-02-02 0:59 UTC (permalink / raw) To: linux-btrfs [BUG] In "btrfs tune" subcomammand, we're using open_ctree_fd(), which requires passing a valid fd, and the caller is responsible to properly handling the lifespan of the fd. But we just call close_ctree() and btrfs_close_all_devices(), without closing the fd. [FIX] Just manually close the fd. Signed-off-by: Qu Wenruo <wqu@suse.com> --- tune/main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tune/main.c b/tune/main.c index 9dcb55952b59..0fbf37dd4800 100644 --- a/tune/main.c +++ b/tune/main.c @@ -535,6 +535,7 @@ out: } close_ctree(root); btrfs_close_all_devices(); + close(fd); free_out: return ret; -- 2.43.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] btrfs-progs: tune: fix the missing close() 2024-02-02 0:59 ` [PATCH 2/3] btrfs-progs: tune: fix the missing close() Qu Wenruo @ 2024-02-02 9:01 ` Anand Jain 0 siblings, 0 replies; 10+ messages in thread From: Anand Jain @ 2024-02-02 9:01 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs On 2/2/24 06:29, Qu Wenruo wrote: > [BUG] > In "btrfs tune" subcomammand, we're using open_ctree_fd(), which > requires passing a valid fd, and the caller is responsible to properly > handling the lifespan of the fd. > > But we just call close_ctree() and btrfs_close_all_devices(), without > closing the fd. > > [FIX] > Just manually close the fd. > > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Thx. Anand ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 3/3] btrfs-progs: fix the stray fd close in open_ctree_fs_info() 2024-02-02 0:59 [PATCH 0/3] btrfs-progs: hunt down the stray fd close causing race with udev scan Qu Wenruo 2024-02-02 0:59 ` [PATCH 1/3] btrfs-progs: rescue: properly close the fs for clear-ino-cache Qu Wenruo 2024-02-02 0:59 ` [PATCH 2/3] btrfs-progs: tune: fix the missing close() Qu Wenruo @ 2024-02-02 0:59 ` Qu Wenruo 2024-02-07 16:36 ` Anand Jain 2024-02-07 9:52 ` [PATCH 0/3] btrfs-progs: hunt down the stray fd close causing race with udev scan David Sterba 3 siblings, 1 reply; 10+ messages in thread From: Qu Wenruo @ 2024-02-02 0:59 UTC (permalink / raw) To: linux-btrfs [BUG] Although commit b2a1be83b85f ("btrfs-progs: mkfs: keep file descriptors open during whole time") is making sure we're only closing the writeable fds after the fs is properly created, there is still a missing fd not following the requirement. And this explains the issue why sometimes after mkfs.btrfs, lsblk still doesn't give a valid uuid. Shown by the strace output (the command is "mkfs.btrfs -f /dev/test/scratch1"): openat(AT_FDCWD, "/dev/test/scratch1", O_RDWR) = 5 <<< Writeable open fadvise64(5, 0, 0, POSIX_FADV_DONTNEED) = 0 sysinfo({uptime=2529, loads=[8704, 6272, 2496], totalram=4104548352, freeram=3376611328, sharedram=9211904, bufferram=43016192, totalswap=3221221376, freeswap=3221221376, procs=190, totalhigh=0, freehigh=0, mem_unit=1}) = 0 lseek(5, 0, SEEK_END) = 10737418240 lseek(5, 0, SEEK_SET) = 0 ...... close(5) = 0 <<< Closed now pwrite64(6, "O\250\22\261\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1163264) = 16384 pwrite64(6, "\201\316\272\342\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1179648) = 16384 pwrite64(6, "K}S\t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1196032) = 16384 pwrite64(6, "\207j$\265\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1212416) = 16384 pwrite64(6, "q\267;\336\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 5242880) = 16384 fsync(6) <<< But we're still writing into the disk. [CAUSE] After more digging, it turns out we have a very obvious escape in open_ctree_fs_info(): open_ctree_fs_info() |- fp = open(oca->filename, flags); |- info = __open_ctree_fd(); |- close(fp); As later we only do IO using the device fd, this close() seems fine. But the truth is, for mkfs usage, this fs_info is a temporary one, with a special magic number for the disk. And since mkfs is doing writeable operations, this close() would immediately trigger udev scan. And since at this stage, the fs is not yet fully created, udev can race with mkfs, and may get the invalid temporary superblock. [FIX] Introduce a new btrfs_fs_info member, initial_fd, for open_ctree_fs_info() to record the fd. And on close_ctree(), if we find fs_info::initial_fd is a valid fd, then close it. By this, we make sure all writeable fds are only closed after we have written valid super blocks into the disk. Signed-off-by: Qu Wenruo <wqu@suse.com> --- kernel-shared/ctree.h | 9 +++++++++ kernel-shared/disk-io.c | 8 +++++++- 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h index bcf11d870061..944632226baa 100644 --- a/kernel-shared/ctree.h +++ b/kernel-shared/ctree.h @@ -404,6 +404,15 @@ struct btrfs_fs_info { u32 sectorsize; u32 stripesize; u32 leaf_data_size; + + /* + * For open_ctree_fs_info() to hold the initial fd until close. + * + * For writeable open_ctree_fs_info() call, we should not close + * the fd until the fs_info is properly closed, or it will trigger + * udev scan while our fs is not properly initialized. + */ + int initial_fd; u16 csum_type; u16 csum_size; diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c index c053319200cb..05323b2cd393 100644 --- a/kernel-shared/disk-io.c +++ b/kernel-shared/disk-io.c @@ -913,6 +913,7 @@ struct btrfs_fs_info *btrfs_new_fs_info(int writable, u64 sb_bytenr) fs_info->metadata_alloc_profile = (u64)-1; fs_info->system_alloc_profile = fs_info->metadata_alloc_profile; fs_info->nr_global_roots = 1; + fs_info->initial_fd = -1; return fs_info; @@ -1690,7 +1691,10 @@ struct btrfs_fs_info *open_ctree_fs_info(struct open_ctree_args *oca) return NULL; } info = __open_ctree_fd(fp, oca); - close(fp); + if (info) + info->initial_fd = fp; + else + close(fp); return info; } @@ -2297,6 +2301,8 @@ skip_commit: btrfs_release_all_roots(fs_info); ret = btrfs_close_devices(fs_info->fs_devices); + if (fs_info->initial_fd >= 0) + close(fs_info->initial_fd); btrfs_cleanup_all_caches(fs_info); btrfs_free_fs_info(fs_info); if (!err) -- 2.43.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] btrfs-progs: fix the stray fd close in open_ctree_fs_info() 2024-02-02 0:59 ` [PATCH 3/3] btrfs-progs: fix the stray fd close in open_ctree_fs_info() Qu Wenruo @ 2024-02-07 16:36 ` Anand Jain 2024-02-07 21:05 ` Qu Wenruo 0 siblings, 1 reply; 10+ messages in thread From: Anand Jain @ 2024-02-07 16:36 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs > index c053319200cb..05323b2cd393 100644 > --- a/kernel-shared/disk-io.c > +++ b/kernel-shared/disk-io.c > @@ -913,6 +913,7 @@ struct btrfs_fs_info *btrfs_new_fs_info(int writable, u64 sb_bytenr) > fs_info->metadata_alloc_profile = (u64)-1; > fs_info->system_alloc_profile = fs_info->metadata_alloc_profile; > fs_info->nr_global_roots = 1; > + fs_info->initial_fd = -1; > > return fs_info; > > @@ -1690,7 +1691,10 @@ struct btrfs_fs_info *open_ctree_fs_info(struct open_ctree_args *oca) > return NULL; > } > info = __open_ctree_fd(fp, oca); > - close(fp); > + if (info) > + info->initial_fd = fp; Why not do this in __open_ctree_fd()? then in the parent function, __open_ctree_fd you could: if (!info) close(fp); > + else > + close(fp); > return info; > } > > @@ -2297,6 +2301,8 @@ skip_commit: > > btrfs_release_all_roots(fs_info); > ret = btrfs_close_devices(fs_info->fs_devices); > + if (fs_info->initial_fd >= 0) > + close(fs_info->initial_fd); > btrfs_cleanup_all_caches(fs_info); > btrfs_free_fs_info(fs_info); > if (!err) Essentially this patch converts the following nested open close fd1 open fd1 write zero fd1 write temp-sb fd2 open fd2 read super fd3 open devices fd2 close fd1 write temp-sb-to-remaining-dev-if-any fd3 write good-sb fd3 close fd1 close to [1] fd1 open fd1 write zero fd1 write temp-sb fd2 open fd2 read super fd3 open devices fd1 write temp-sb-to-remaining-dev-if-any fd3 write good-sb fd3 close fd2 close fd1 close However the patch [PATCH RFC] btrfs-progs: mkfs: optimize file descriptor usage in mkfs.btrfs achieved: (And reduced one less fd) fd1 open fd1 write zero fd1 write temp-sb fd1 read super fd2 open devices fd1 write temp-sb-to-remaining-dev-if-any fd2 write good-sb fd2 close fd1 close This patch saves the temporary fd (which is a helper fd to read the sb. fd2 in [1]) into a new member fs_info::initial_fd, does not make much sense to me. This fix is a kind of a quick, patchy solution rather than a ground-up clean design, imo. Thx. Anand ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] btrfs-progs: fix the stray fd close in open_ctree_fs_info() 2024-02-07 16:36 ` Anand Jain @ 2024-02-07 21:05 ` Qu Wenruo 2024-02-08 2:53 ` Anand Jain 0 siblings, 1 reply; 10+ messages in thread From: Qu Wenruo @ 2024-02-07 21:05 UTC (permalink / raw) To: Anand Jain, Qu Wenruo, linux-btrfs On 2024/2/8 03:06, Anand Jain wrote: > >> index c053319200cb..05323b2cd393 100644 >> --- a/kernel-shared/disk-io.c >> +++ b/kernel-shared/disk-io.c >> @@ -913,6 +913,7 @@ struct btrfs_fs_info *btrfs_new_fs_info(int >> writable, u64 sb_bytenr) >> fs_info->metadata_alloc_profile = (u64)-1; >> fs_info->system_alloc_profile = fs_info->metadata_alloc_profile; >> fs_info->nr_global_roots = 1; >> + fs_info->initial_fd = -1; >> return fs_info; >> @@ -1690,7 +1691,10 @@ struct btrfs_fs_info *open_ctree_fs_info(struct >> open_ctree_args *oca) >> return NULL; >> } > > >> info = __open_ctree_fd(fp, oca); >> - close(fp); >> + if (info) > > >> + info->initial_fd = fp; > > Why not do this in __open_ctree_fd()? Because we want to keep open()/close() in the same layer when possible. > then in the parent function, __open_ctree_fd you could: > > if (!info) > close(fp); > > >> + else >> + close(fp); > >> return info; >> } >> @@ -2297,6 +2301,8 @@ skip_commit: >> btrfs_release_all_roots(fs_info); >> ret = btrfs_close_devices(fs_info->fs_devices); >> + if (fs_info->initial_fd >= 0) >> + close(fs_info->initial_fd); >> btrfs_cleanup_all_caches(fs_info); >> btrfs_free_fs_info(fs_info); >> if (!err) > > > Essentially this patch converts the following nested open close > > > fd1 open > fd1 write zero > fd1 write temp-sb > fd2 open > fd2 read super > fd3 open devices > fd2 close > fd1 write temp-sb-to-remaining-dev-if-any > fd3 write good-sb > fd3 close > fd1 close > > to > > [1] > fd1 open > fd1 write zero > fd1 write temp-sb > fd2 open > fd2 read super > fd3 open devices > fd1 write temp-sb-to-remaining-dev-if-any > fd3 write good-sb > fd3 close > fd2 close > fd1 close > > > However the patch > [PATCH RFC] btrfs-progs: mkfs: optimize file descriptor usage in > mkfs.btrfs > > > achieved: (And reduced one less fd) > > fd1 open > fd1 write zero > fd1 write temp-sb > fd1 read super > fd2 open devices > fd1 write temp-sb-to-remaining-dev-if-any > fd2 write good-sb > fd2 close > fd1 close The problem we are hitting don't care how many fds we opened or whatever, as long as the final close is later than the final fs commit. > > > This patch saves the temporary fd (which is a helper fd to read the sb. > fd2 in [1]) into a new member fs_info::initial_fd, does not make much > sense to me. This fix is a kind of a quick, patchy solution rather > than a ground-up clean design, imo. Nope, your version needs all caller of open_ctree_fs_info() to converted to the new interface. This one keeps the interface untouched and still solves the problem. Yes, this can be enhanced, but not in the way you did. The proper way is to make open_ctree_fs_info() to not to rely on any external fd, but really make open_ctree_fs_info() to only open fds for its devices. This would make open_ctree_fs_info() to get rid of __open_ctree_fd() completely. That would be a large work and does not provide really much benefit compared to the current workaround. Thanks, Qu > > Thx. Anand > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] btrfs-progs: fix the stray fd close in open_ctree_fs_info() 2024-02-07 21:05 ` Qu Wenruo @ 2024-02-08 2:53 ` Anand Jain 0 siblings, 0 replies; 10+ messages in thread From: Anand Jain @ 2024-02-08 2:53 UTC (permalink / raw) To: Qu Wenruo, Qu Wenruo, linux-btrfs On 2/8/24 02:35, Qu Wenruo wrote: > > > On 2024/2/8 03:06, Anand Jain wrote: >> >>> index c053319200cb..05323b2cd393 100644 >>> --- a/kernel-shared/disk-io.c >>> +++ b/kernel-shared/disk-io.c >>> @@ -913,6 +913,7 @@ struct btrfs_fs_info *btrfs_new_fs_info(int >>> writable, u64 sb_bytenr) >>> fs_info->metadata_alloc_profile = (u64)-1; >>> fs_info->system_alloc_profile = fs_info->metadata_alloc_profile; >>> fs_info->nr_global_roots = 1; >>> + fs_info->initial_fd = -1; >>> return fs_info; >>> @@ -1690,7 +1691,10 @@ struct btrfs_fs_info *open_ctree_fs_info(struct >>> open_ctree_args *oca) >>> return NULL; >>> } >> >> >>> info = __open_ctree_fd(fp, oca); >>> - close(fp); >>> + if (info) >> >> >>> + info->initial_fd = fp; >> >> Why not do this in __open_ctree_fd()? > > Because we want to keep open()/close() in the same layer when possible. > >> then in the parent function, __open_ctree_fd you could: >> >> if (!info) >> close(fp); >> >> >>> + else >>> + close(fp); >> >>> return info; >>> } >>> @@ -2297,6 +2301,8 @@ skip_commit: >>> btrfs_release_all_roots(fs_info); >>> ret = btrfs_close_devices(fs_info->fs_devices); >>> + if (fs_info->initial_fd >= 0) >>> + close(fs_info->initial_fd); >>> btrfs_cleanup_all_caches(fs_info); >>> btrfs_free_fs_info(fs_info); >>> if (!err) >> >> >> Essentially this patch converts the following nested open close >> >> >> fd1 open >> fd1 write zero >> fd1 write temp-sb >> fd2 open >> fd2 read super >> fd3 open devices >> fd2 close >> fd1 write temp-sb-to-remaining-dev-if-any >> fd3 write good-sb >> fd3 close >> fd1 close >> >> to >> >> [1] >> fd1 open >> fd1 write zero >> fd1 write temp-sb >> fd2 open >> fd2 read super >> fd3 open devices >> fd1 write temp-sb-to-remaining-dev-if-any >> fd3 write good-sb >> fd3 close >> fd2 close >> fd1 close >> >> >> However the patch >> [PATCH RFC] btrfs-progs: mkfs: optimize file descriptor usage in >> mkfs.btrfs >> >> >> achieved: (And reduced one less fd) >> >> fd1 open >> fd1 write zero >> fd1 write temp-sb >> fd1 read super >> fd2 open devices >> fd1 write temp-sb-to-remaining-dev-if-any >> fd2 write good-sb >> fd2 close >> fd1 close > > The problem we are hitting don't care how many fds we opened or > whatever, as long as the final close is later than the final fs commit. Indeed, reusing fewer fd is a good cleanup. > >> >> >> This patch saves the temporary fd (which is a helper fd to read the sb. >> fd2 in [1]) into a new member fs_info::initial_fd, does not make much >> sense to me. This fix is a kind of a quick, patchy solution rather >> than a ground-up clean design, imo. > > Nope, your version needs all caller of open_ctree_fs_info() to converted > to the new interface. It is a step in the right direction. > This one keeps the interface untouched and still solves the problem. > > Yes, this can be enhanced, but not in the way you did. > > The proper way is to make open_ctree_fs_info() to not to rely on any > external fd, but really make open_ctree_fs_info() to only open fds for > its devices. It depends on when the parent's fd is closed. Sequential is better imo (See Josef comments in my patch), keeping only one fd open at any time. David's concerns about excess udev events and calls to btrfs dev scan should be fine for mkfs since no good-sb written yet. > This would make open_ctree_fs_info() to get rid of __open_ctree_fd() > completely. It's a step forward, but adding another member to fs_info for a helper fd as in this patch isn't wise. > That would be a large work and does not provide really much benefit > compared to the current workaround. Improving open_ctree are more in line with my patch's goal. Thanks, Anand > > Thanks, > Qu > >> >> Thx. Anand >> >> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] btrfs-progs: hunt down the stray fd close causing race with udev scan 2024-02-02 0:59 [PATCH 0/3] btrfs-progs: hunt down the stray fd close causing race with udev scan Qu Wenruo ` (2 preceding siblings ...) 2024-02-02 0:59 ` [PATCH 3/3] btrfs-progs: fix the stray fd close in open_ctree_fs_info() Qu Wenruo @ 2024-02-07 9:52 ` David Sterba 3 siblings, 0 replies; 10+ messages in thread From: David Sterba @ 2024-02-07 9:52 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Fri, Feb 02, 2024 at 11:29:18AM +1030, Qu Wenruo wrote: > Although my previous flock() based solution is preventing udev scan to > get pre-mature super blocks, it in fact masks the root problem: > > There is a stray close() on writeable fd. > > Commit b2a1be83b85f ("btrfs-progs: mkfs: keep file descriptors open > during whole time") tries to solve the problem by extending the lifespan > of writeable fds, so that when the fds are closed, the fs is ensured to > be properly populated. > > The problem is, that patch is not covering all cases, there is a stray > fd close just under our noses: open_ctree_fs_info(). > > The function would open the initial device, then use that initial fd to > open the btrfs, then we immediately close the initial fd, as later IO > would all go with the device fd. > > That close() call is causing problem, especially for mkfs, as at that > stage the fs is still using a temporary super block, not using the valid > btrfs super magic number. > > Thus udev scan would race with mkfs, and if udev wins the race, it would > get the temporary super block, making libblk not to detect the new > btrfs. > > This patchset would address the problem by: > > - Make sure open_ctree*() calls all have corresponding close_ctree() > The first patch, as later we will only close the initial fd caused by > open_ctree_fs_info() during close_ctree(). > > - Save the initial fd into btrfs_fs_info for open_ctree_fs_info() > And later close the initial fd during close_ctree(). > > - Make sure open_ctree_fd() callers to properly close the fd > Just an extra cleanup. > > This patchset would work even without the usage of flock() to block udev > scan, and since without flock() calls, the procedure is much simpler. > > Qu Wenruo (3): > btrfs-progs: rescue: properly close the fs for clear-ino-cache > btrfs-progs: tune: fix the missing close() > btrfs-progs: fix the stray fd close in open_ctree_fs_info() Thanks, added to devel. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-02-08 2:53 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-02-02 0:59 [PATCH 0/3] btrfs-progs: hunt down the stray fd close causing race with udev scan Qu Wenruo 2024-02-02 0:59 ` [PATCH 1/3] btrfs-progs: rescue: properly close the fs for clear-ino-cache Qu Wenruo 2024-02-02 8:48 ` Anand Jain 2024-02-02 0:59 ` [PATCH 2/3] btrfs-progs: tune: fix the missing close() Qu Wenruo 2024-02-02 9:01 ` Anand Jain 2024-02-02 0:59 ` [PATCH 3/3] btrfs-progs: fix the stray fd close in open_ctree_fs_info() Qu Wenruo 2024-02-07 16:36 ` Anand Jain 2024-02-07 21:05 ` Qu Wenruo 2024-02-08 2:53 ` Anand Jain 2024-02-07 9:52 ` [PATCH 0/3] btrfs-progs: hunt down the stray fd close causing race with udev scan David Sterba
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox