* [PATCH AUTOSEL 6.6 1/9] epoll: annotate racy check
@ 2024-11-24 12:40 Sasha Levin
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 2/9] kselftest/arm64: Log fp-stress child startup errors to stdout Sasha Levin
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Sasha Levin @ 2024-11-24 12:40 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Christian Brauner, Jan Kara, syzbot+3b6b32dc50537a49bb4a,
Sasha Levin, viro, linux-fsdevel
From: Christian Brauner <brauner@kernel.org>
[ Upstream commit 6474353a5e3d0b2cf610153cea0c61f576a36d0a ]
Epoll relies on a racy fastpath check during __fput() in
eventpoll_release() to avoid the hit of pointlessly acquiring a
semaphore. Annotate that race by using WRITE_ONCE() and READ_ONCE().
Link: https://lore.kernel.org/r/66edfb3c.050a0220.3195df.001a.GAE@google.com
Link: https://lore.kernel.org/r/20240925-fungieren-anbauen-79b334b00542@brauner
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: syzbot+3b6b32dc50537a49bb4a@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
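Note: as a rough userspace analogue of the annotation described above, the sketch below uses C11 relaxed atomics where the kernel uses WRITE_ONCE() and READ_ONCE(). The toy_file type and the toy_* helpers are hypothetical names for illustration only, and relaxed atomics only approximate the kernel macros.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; only the field of interest. */
struct toy_file {
	_Atomic(void *) f_ep;	/* non-NULL while the file is watched */
};

/* Writer side: plays the role of WRITE_ONCE(file->f_ep, head). */
static void toy_set_ep(struct toy_file *file, void *head)
{
	atomic_store_explicit(&file->f_ep, head, memory_order_relaxed);
}

/* Reader side: plays the role of the READ_ONCE() fastpath check. */
static bool toy_needs_release(struct toy_file *file)
{
	return atomic_load_explicit(&file->f_ep, memory_order_relaxed) != NULL;
}
```

The check stays racy by design: a reader may see a stale value. The annotation only makes the access a single, untorn load or store that tools can recognise as intentional.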
fs/eventpoll.c | 6 ++++--
include/linux/eventpoll.h | 2 +-
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 0ed73bc7d4652..bcaad495930c3 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -741,7 +741,8 @@ static bool __ep_remove(struct eventpoll *ep, struct epitem *epi, bool force)
to_free = NULL;
head = file->f_ep;
if (head->first == &epi->fllink && !epi->fllink.next) {
- file->f_ep = NULL;
+ /* See eventpoll_release() for details. */
+ WRITE_ONCE(file->f_ep, NULL);
if (!is_file_epoll(file)) {
struct epitems_head *v;
v = container_of(head, struct epitems_head, epitems);
@@ -1498,7 +1499,8 @@ static int attach_epitem(struct file *file, struct epitem *epi)
spin_unlock(&file->f_lock);
goto allocate;
}
- file->f_ep = head;
+ /* See eventpoll_release() for details. */
+ WRITE_ONCE(file->f_ep, head);
to_free = NULL;
}
hlist_add_head_rcu(&epi->fllink, file->f_ep);
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 3337745d81bd6..0c0d00fcd131f 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -42,7 +42,7 @@ static inline void eventpoll_release(struct file *file)
* because the file in on the way to be removed and nobody ( but
* eventpoll ) has still a reference to this file.
*/
- if (likely(!file->f_ep))
+ if (likely(!READ_ONCE(file->f_ep)))
return;
/*
--
2.43.0
* [PATCH AUTOSEL 6.6 2/9] kselftest/arm64: Log fp-stress child startup errors to stdout
2024-11-24 12:40 [PATCH AUTOSEL 6.6 1/9] epoll: annotate racy check Sasha Levin
@ 2024-11-24 12:40 ` Sasha Levin
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 3/9] s390/cpum_sf: Handle CPU hotplug remove during sampling Sasha Levin
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2024-11-24 12:40 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Mark Brown, Catalin Marinas, Sasha Levin, will, shuah,
mark.rutland, linux-arm-kernel, linux-kselftest
From: Mark Brown <broonie@kernel.org>
[ Upstream commit dca93d29845dfed60910ba13dbfb6ae6a0e19f6d ]
Currently if we encounter an error between fork() and exec() of a child
process we log the error to stderr. This means that the errors don't get
annotated with the child information which makes diagnostics harder and
means that if we miss the exit signal from the child we can deadlock
waiting for output from the child. Improve robustness and output quality
by logging to stdout instead.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241023-arm64-fp-stress-exec-fail-v1-1-ee3c62932c15@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
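Note: a minimal sketch of the idea in the commit message, assuming a simplified child that fails before exec. Once the child's stdout is the pipe, an error written with printf() reaches the parent on the same channel as normal output. capture_child_error() and the message text are hypothetical and not part of fp-stress.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child whose stdout is a pipe, let it report a simulated startup
 * error with printf(), and copy what the parent reads into buf.
 * Returns the number of bytes read, or -1 on failure.
 */
static ssize_t capture_child_error(char *buf, size_t len)
{
	int pipefd[2];
	ssize_t n, total = 0;
	pid_t pid;

	if (len == 0 || pipe(pipefd) == -1)
		return -1;

	fflush(stdout);	/* avoid duplicating buffered parent output */
	pid = fork();
	if (pid == -1) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: make the pipe our stdout, as child_start() does. */
		close(pipefd[0]);
		if (dup2(pipefd[1], 1) == -1)
			_exit(EXIT_FAILURE);
		close(pipefd[1]);
		printf("execl(%s) failed: %d (%s)\n", "demo", 2, "simulated");
		fflush(stdout);
		_exit(EXIT_FAILURE);
	}

	/* Parent: the message arrives on the same pipe as normal output. */
	close(pipefd[1]);
	while ((size_t)total < len - 1 &&
	       (n = read(pipefd[0], buf + total, len - 1 - total)) > 0)
		total += n;
	buf[total] = '\0';
	close(pipefd[0]);
	waitpid(pid, NULL, 0);
	return total;
}
```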
tools/testing/selftests/arm64/fp/fp-stress.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c
index dd31647b00a22..cf9d7b2e4630c 100644
--- a/tools/testing/selftests/arm64/fp/fp-stress.c
+++ b/tools/testing/selftests/arm64/fp/fp-stress.c
@@ -79,7 +79,7 @@ static void child_start(struct child_data *child, const char *program)
*/
ret = dup2(pipefd[1], 1);
if (ret == -1) {
- fprintf(stderr, "dup2() %d\n", errno);
+ printf("dup2() %d\n", errno);
exit(EXIT_FAILURE);
}
@@ -89,7 +89,7 @@ static void child_start(struct child_data *child, const char *program)
*/
ret = dup2(startup_pipe[0], 3);
if (ret == -1) {
- fprintf(stderr, "dup2() %d\n", errno);
+ printf("dup2() %d\n", errno);
exit(EXIT_FAILURE);
}
@@ -107,16 +107,15 @@ static void child_start(struct child_data *child, const char *program)
*/
ret = read(3, &i, sizeof(i));
if (ret < 0)
- fprintf(stderr, "read(startp pipe) failed: %s (%d)\n",
- strerror(errno), errno);
+ printf("read(startp pipe) failed: %s (%d)\n",
+ strerror(errno), errno);
if (ret > 0)
- fprintf(stderr, "%d bytes of data on startup pipe\n",
- ret);
+ printf("%d bytes of data on startup pipe\n", ret);
close(3);
ret = execl(program, program, NULL);
- fprintf(stderr, "execl(%s) failed: %d (%s)\n",
- program, errno, strerror(errno));
+ printf("execl(%s) failed: %d (%s)\n",
+ program, errno, strerror(errno));
exit(EXIT_FAILURE);
} else {
--
2.43.0
* [PATCH AUTOSEL 6.6 3/9] s390/cpum_sf: Handle CPU hotplug remove during sampling
2024-11-24 12:40 [PATCH AUTOSEL 6.6 1/9] epoll: annotate racy check Sasha Levin
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 2/9] kselftest/arm64: Log fp-stress child startup errors to stdout Sasha Levin
@ 2024-11-24 12:40 ` Sasha Levin
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 4/9] btrfs: don't take dev_replace rwsem on task already holding it Sasha Levin
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2024-11-24 12:40 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Thomas Richter, Hendrik Brueckner, Heiko Carstens, Sasha Levin,
gor, agordeev, sumanthk, linux-s390
From: Thomas Richter <tmricht@linux.ibm.com>
[ Upstream commit a0bd7dacbd51c632b8e2c0500b479af564afadf3 ]
CPU hotplug remove handling triggers the following function
call sequence:
CPUHP_AP_PERF_S390_SF_ONLINE --> s390_pmu_sf_offline_cpu()
...
CPUHP_AP_PERF_ONLINE --> perf_event_exit_cpu()
The s390 CPUMF sampling CPU hotplug handler invokes:
s390_pmu_sf_offline_cpu()
+--> cpusf_pmu_setup()
+--> setup_pmc_cpu()
+--> deallocate_buffers()
This function de-allocates all sampling data buffers (SDBs) allocated
for that CPU at event initialization. It also clears the
PMU_F_RESERVED bit. The CPU is gone and can not be sampled.
With the event still being active on the removed CPU, the CPU event
hotplug support in kernel performance subsystem triggers the
following function calls on the removed CPU:
perf_event_exit_cpu()
+--> perf_event_exit_cpu_context()
+--> __perf_event_exit_context()
+--> __perf_remove_from_context()
+--> event_sched_out()
+--> cpumsf_pmu_del()
+--> cpumsf_pmu_stop()
+--> hw_perf_event_update()
to stop and remove the event. During removal of the event, the
sampling device driver tries to read out the remaining samples from
the sample data buffers (SDBs). But they have already been freed
(and may have been re-assigned). This may lead to a use after free
situation in which case the samples are most likely invalid. In the
best case the memory has not been reassigned and still contains
valid data.
Remedy this situation and check if the CPU is still in reserved
state (bit PMU_F_RESERVED set). In this case the SDBs have not been
released and contain valid data. This is always the case when
the event is removed (and no CPU hotplug off occurred).
If the PMU_F_RESERVED bit is not set, the SDB buffers are gone.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
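Note: the guard described above can be modelled in userspace roughly as follows. This is a toy under stated assumptions: toy_cpuhw, TOY_F_RESERVED and the helpers are hypothetical names, and a plain int array stands in for the SDBs.

```c
#include <stdlib.h>

#define TOY_F_RESERVED 0x1	/* models PMU_F_RESERVED */

/* Hypothetical per-CPU state: a flag word plus one sample buffer. */
struct toy_cpuhw {
	unsigned int flags;
	int *sdb;		/* NULL once released */
	int nr_samples;
};

/* Models the hotplug-off path: free the buffer and drop the flag. */
static void toy_offline_cpu(struct toy_cpuhw *cpuhw)
{
	free(cpuhw->sdb);
	cpuhw->sdb = NULL;
	cpuhw->flags &= ~TOY_F_RESERVED;
}

/*
 * Models the stop path: read the samples only while the buffer is
 * still reserved. Returns the sum of the samples read, 0 otherwise.
 */
static int toy_pmu_stop(const struct toy_cpuhw *cpuhw)
{
	int i, sum = 0;

	if (cpuhw->flags & TOY_F_RESERVED) {
		for (i = 0; i < cpuhw->nr_samples; i++)
			sum += cpuhw->sdb[i];
	}
	return sum;
}
```

After toy_offline_cpu() the stop path never dereferences the freed buffer, because the flag is the only thing it consults first.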
arch/s390/kernel/perf_cpum_sf.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index a3169193775f7..e52c89739bc9a 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -1922,7 +1922,9 @@ static void cpumsf_pmu_stop(struct perf_event *event, int flags)
event->hw.state |= PERF_HES_STOPPED;
if ((flags & PERF_EF_UPDATE) && !(event->hw.state & PERF_HES_UPTODATE)) {
- hw_perf_event_update(event, 1);
+ /* CPU hotplug off removes SDBs. No samples to extract. */
+ if (cpuhw->flags & PMU_F_RESERVED)
+ hw_perf_event_update(event, 1);
event->hw.state |= PERF_HES_UPTODATE;
}
perf_pmu_enable(event->pmu);
--
2.43.0
* [PATCH AUTOSEL 6.6 4/9] btrfs: don't take dev_replace rwsem on task already holding it
2024-11-24 12:40 [PATCH AUTOSEL 6.6 1/9] epoll: annotate racy check Sasha Levin
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 2/9] kselftest/arm64: Log fp-stress child startup errors to stdout Sasha Levin
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 3/9] s390/cpum_sf: Handle CPU hotplug remove during sampling Sasha Levin
@ 2024-11-24 12:40 ` Sasha Levin
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 5/9] btrfs: avoid unnecessary device path update for the same device Sasha Levin
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2024-11-24 12:40 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Johannes Thumshirn, Filipe Manana, David Sterba, Sasha Levin, clm,
josef, linux-btrfs
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
[ Upstream commit 8cca35cb29f81eba3e96ec44dad8696c8a2f9138 ]
Running fstests btrfs/011 with MKFS_OPTIONS="-O rst" to force the usage of
the RAID stripe-tree, we get the following splat from lockdep:
BTRFS info (device sdd): dev_replace from /dev/sdd (devid 1) to /dev/sdb started
============================================
WARNING: possible recursive locking detected
6.11.0-rc3-btrfs-for-next #599 Not tainted
--------------------------------------------
btrfs/2326 is trying to acquire lock:
ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250
but task is already holding lock:
ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&fs_info->dev_replace.rwsem);
lock(&fs_info->dev_replace.rwsem);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by btrfs/2326:
#0: ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250
stack backtrace:
CPU: 1 UID: 0 PID: 2326 Comm: btrfs Not tainted 6.11.0-rc3-btrfs-for-next #599
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0x5b/0x80
__lock_acquire+0x2798/0x69d0
? __pfx___lock_acquire+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
lock_acquire+0x19d/0x4a0
? btrfs_map_block+0x39f/0x2250
? __pfx_lock_acquire+0x10/0x10
? find_held_lock+0x2d/0x110
? lock_is_held_type+0x8f/0x100
down_read+0x8e/0x440
? btrfs_map_block+0x39f/0x2250
? __pfx_down_read+0x10/0x10
? do_raw_read_unlock+0x44/0x70
? _raw_read_unlock+0x23/0x40
btrfs_map_block+0x39f/0x2250
? btrfs_dev_replace_by_ioctl+0xd69/0x1d00
? btrfs_bio_counter_inc_blocked+0xd9/0x2e0
? __kasan_slab_alloc+0x6e/0x70
? __pfx_btrfs_map_block+0x10/0x10
? __pfx_btrfs_bio_counter_inc_blocked+0x10/0x10
? kmem_cache_alloc_noprof+0x1f2/0x300
? mempool_alloc_noprof+0xed/0x2b0
btrfs_submit_chunk+0x28d/0x17e0
? __pfx_btrfs_submit_chunk+0x10/0x10
? bvec_alloc+0xd7/0x1b0
? bio_add_folio+0x171/0x270
? __pfx_bio_add_folio+0x10/0x10
? __kasan_check_read+0x20/0x20
btrfs_submit_bio+0x37/0x80
read_extent_buffer_pages+0x3df/0x6c0
btrfs_read_extent_buffer+0x13e/0x5f0
read_tree_block+0x81/0xe0
read_block_for_search+0x4bd/0x7a0
? __pfx_read_block_for_search+0x10/0x10
btrfs_search_slot+0x78d/0x2720
? __pfx_btrfs_search_slot+0x10/0x10
? lock_is_held_type+0x8f/0x100
? kasan_save_track+0x14/0x30
? __kasan_slab_alloc+0x6e/0x70
? kmem_cache_alloc_noprof+0x1f2/0x300
btrfs_get_raid_extent_offset+0x181/0x820
? __pfx_lock_acquire+0x10/0x10
? __pfx_btrfs_get_raid_extent_offset+0x10/0x10
? down_read+0x194/0x440
? __pfx_down_read+0x10/0x10
? do_raw_read_unlock+0x44/0x70
? _raw_read_unlock+0x23/0x40
btrfs_map_block+0x5b5/0x2250
? __pfx_btrfs_map_block+0x10/0x10
scrub_submit_initial_read+0x8fe/0x11b0
? __pfx_scrub_submit_initial_read+0x10/0x10
submit_initial_group_read+0x161/0x3a0
? lock_release+0x20e/0x710
? __pfx_submit_initial_group_read+0x10/0x10
? __pfx_lock_release+0x10/0x10
scrub_simple_mirror.isra.0+0x3eb/0x580
scrub_stripe+0xe4d/0x1440
? lock_release+0x20e/0x710
? __pfx_scrub_stripe+0x10/0x10
? __pfx_lock_release+0x10/0x10
? do_raw_read_unlock+0x44/0x70
? _raw_read_unlock+0x23/0x40
scrub_chunk+0x257/0x4a0
scrub_enumerate_chunks+0x64c/0xf70
? __mutex_unlock_slowpath+0x147/0x5f0
? __pfx_scrub_enumerate_chunks+0x10/0x10
? bit_wait_timeout+0xb0/0x170
? __up_read+0x189/0x700
? scrub_workers_get+0x231/0x300
? up_write+0x490/0x4f0
btrfs_scrub_dev+0x52e/0xcd0
? create_pending_snapshots+0x230/0x250
? __pfx_btrfs_scrub_dev+0x10/0x10
btrfs_dev_replace_by_ioctl+0xd69/0x1d00
? lock_acquire+0x19d/0x4a0
? __pfx_btrfs_dev_replace_by_ioctl+0x10/0x10
? lock_release+0x20e/0x710
? btrfs_ioctl+0xa09/0x74f0
? __pfx_lock_release+0x10/0x10
? do_raw_spin_lock+0x11e/0x240
? __pfx_do_raw_spin_lock+0x10/0x10
btrfs_ioctl+0xa14/0x74f0
? lock_acquire+0x19d/0x4a0
? find_held_lock+0x2d/0x110
? __pfx_btrfs_ioctl+0x10/0x10
? lock_release+0x20e/0x710
? do_sigaction+0x3f0/0x860
? __pfx_do_vfs_ioctl+0x10/0x10
? do_raw_spin_lock+0x11e/0x240
? lockdep_hardirqs_on_prepare+0x270/0x3e0
? _raw_spin_unlock_irq+0x28/0x50
? do_sigaction+0x3f0/0x860
? __pfx_do_sigaction+0x10/0x10
? __x64_sys_rt_sigaction+0x18e/0x1e0
? __pfx___x64_sys_rt_sigaction+0x10/0x10
? __x64_sys_close+0x7c/0xd0
__x64_sys_ioctl+0x137/0x190
do_syscall_64+0x71/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f0bd1114f9b
Code: Unable to access opcode bytes at 0x7f0bd1114f71.
RSP: 002b:00007ffc8a8c3130 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f0bd1114f9b
RDX: 00007ffc8a8c35e0 RSI: 00000000ca289435 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000007
R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffc8a8c6c85
R13: 00000000398e72a0 R14: 0000000000004361 R15: 0000000000000004
</TASK>
This happens because on RAID stripe-tree filesystems we recurse back into
btrfs_map_block() on scrub to perform the logical to device physical
mapping.
But as the device replace task is already holding the dev_replace::rwsem
we deadlock.
So don't take the dev_replace::rwsem in case our task is the task performing
the device replace.
Suggested-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
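Note: a single-threaded toy model of the recursion described above, to show why skipping the lock for the replace task avoids the self-deadlock. It does not model real rwsem semantics. All toy_* names and the integer task ids are hypothetical, and a failed toy_down_read() stands in for the point where lockdep would warn.

```c
#include <stdbool.h>

/* Toy rwsem: remembers one read holder so recursion can be detected. */
struct toy_rwsem {
	int read_holder;	/* task id holding it for read, 0 if none */
};

struct toy_dev_replace {
	struct toy_rwsem rwsem;
	int replace_task;	/* task id running the replace, 0 if none */
};

/* Returns false where the same task would take the lock recursively. */
static bool toy_down_read(struct toy_rwsem *sem, int task)
{
	if (sem->read_holder == task)
		return false;
	sem->read_holder = task;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	sem->read_holder = 0;
}

/* depth > 0 models the stripe-tree lookup re-entering the mapping code. */
static bool toy_map_block(struct toy_dev_replace *dr, int task, int depth)
{
	bool locked = false, ok = true;

	if (dr->replace_task != task) {
		if (!toy_down_read(&dr->rwsem, task))
			return false;
		locked = true;
	}
	if (depth > 0)
		ok = toy_map_block(dr, task, depth - 1);
	if (locked)
		toy_up_read(&dr->rwsem);
	return ok;
}
```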
fs/btrfs/dev-replace.c | 2 ++
fs/btrfs/fs.h | 2 ++
fs/btrfs/volumes.c | 8 +++++---
3 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 8400e212e3304..f77ef719a3b11 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -644,6 +644,7 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
return ret;
down_write(&dev_replace->rwsem);
+ dev_replace->replace_task = current;
switch (dev_replace->replace_state) {
case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
@@ -976,6 +977,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
list_add(&tgt_device->dev_alloc_list, &fs_devices->alloc_list);
fs_devices->rw_devices++;
+ dev_replace->replace_task = NULL;
up_write(&dev_replace->rwsem);
btrfs_rm_dev_replace_blocked(fs_info);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index a523d64d54912..d24d41f7811a6 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -271,6 +271,8 @@ struct btrfs_dev_replace {
struct percpu_counter bio_counter;
wait_queue_head_t replace_wait;
+
+ struct task_struct *replace_task;
};
/*
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d2285c9726e7b..790e30e2101a6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6313,13 +6313,15 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
&stripe_offset, &raid56_full_stripe_start);
*length = min_t(u64, em->len - map_offset, max_len);
- down_read(&dev_replace->rwsem);
+ if (dev_replace->replace_task != current)
+ down_read(&dev_replace->rwsem);
+
dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace);
/*
* Hold the semaphore for read during the whole operation, write is
* requested at commit time but must wait.
*/
- if (!dev_replace_is_ongoing)
+ if (!dev_replace_is_ongoing && dev_replace->replace_task != current)
up_read(&dev_replace->rwsem);
num_stripes = 1;
@@ -6509,7 +6511,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
bioc->mirror_num = mirror_num;
out:
- if (dev_replace_is_ongoing) {
+ if (dev_replace_is_ongoing && dev_replace->replace_task != current) {
lockdep_assert_held(&dev_replace->rwsem);
/* Unlock and let waiting writers proceed */
up_read(&dev_replace->rwsem);
--
2.43.0
* [PATCH AUTOSEL 6.6 5/9] btrfs: avoid unnecessary device path update for the same device
2024-11-24 12:40 [PATCH AUTOSEL 6.6 1/9] epoll: annotate racy check Sasha Levin
` (2 preceding siblings ...)
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 4/9] btrfs: don't take dev_replace rwsem on task already holding it Sasha Levin
@ 2024-11-24 12:40 ` Sasha Levin
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 6/9] btrfs: do not clear read-only when adding sprout device Sasha Levin
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2024-11-24 12:40 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Qu Wenruo, Filipe Manana, Fabian Vogt, David Sterba, Sasha Levin,
clm, josef, linux-btrfs
From: Qu Wenruo <wqu@suse.com>
[ Upstream commit 2e8b6bc0ab41ce41e6dfcc204b6cc01d5abbc952 ]
[PROBLEM]
It is very common for udev to trigger a device scan, and every time a
mounted btrfs device gets re-scanned through a different soft link, we get
unnecessary device path updates. This is especially common
for LVM based storage:
# lvs
scratch1 test -wi-ao---- 10.00g
scratch2 test -wi-a----- 10.00g
scratch3 test -wi-a----- 10.00g
scratch4 test -wi-a----- 10.00g
scratch5 test -wi-a----- 10.00g
test test -wi-a----- 10.00g
# mkfs.btrfs -f /dev/test/scratch1
# mount /dev/test/scratch1 /mnt/btrfs
# dmesg -c
[ 205.705234] BTRFS: device fsid 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9 devid 1 transid 6 /dev/mapper/test-scratch1 (253:4) scanned by mount (1154)
[ 205.710864] BTRFS info (device dm-4): first mount of filesystem 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9
[ 205.711923] BTRFS info (device dm-4): using crc32c (crc32c-intel) checksum algorithm
[ 205.713856] BTRFS info (device dm-4): using free-space-tree
[ 205.722324] BTRFS info (device dm-4): checking UUID tree
So far so good, but if we merely touch any soft link of
"dm-4", we get several unnecessary device path updates.
# touch /dev/mapper/test-scratch1
# dmesg -c
[ 469.295796] BTRFS info: devid 1 device path /dev/mapper/test-scratch1 changed to /dev/dm-4 scanned by (udev-worker) (1221)
[ 469.300494] BTRFS info: devid 1 device path /dev/dm-4 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (1221)
Such device path renames are unnecessary and can lead to random path
changes due to the udev race.
[CAUSE]
Inside device_list_add(), we use a very primitive way of checking whether
the device has changed: strcmp().
This can never handle links well, whether they are hard or soft links.
So every different link to the same device is treated as a different
device, causing the unnecessary device path update.
[FIX]
Introduce a helper, is_same_device(), and use path_equal() to properly
detect the same block device, so that different soft links won't trigger
the rename race.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641
Reported-by: Fabian Vogt <fvogt@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
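Note: a userspace analogue of the comparison, for illustration only. Instead of kern_path() and path_equal(), this sketch resolves both names with stat(), which follows symbolic links, and compares file identity. same_file() is a hypothetical helper and is not what the kernel patch does internally.

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Return true if both paths resolve to the same file. stat() follows
 * symbolic links, so two links to one device node compare equal even
 * though the path strings differ.
 */
static bool same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return false;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

A strcmp() of "/dev/dm-4" and "/dev/mapper/test-scratch1" reports a difference, while a resolution-based check like this one would treat them as the same node when one is a link to the other.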
fs/btrfs/volumes.c | 38 +++++++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 790e30e2101a6..fdd392334916f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -689,6 +689,42 @@ u8 *btrfs_sb_fsid_ptr(struct btrfs_super_block *sb)
return has_metadata_uuid ? sb->metadata_uuid : sb->fsid;
}
+static bool is_same_device(struct btrfs_device *device, const char *new_path)
+{
+ struct path old = { .mnt = NULL, .dentry = NULL };
+ struct path new = { .mnt = NULL, .dentry = NULL };
+ char *old_path = NULL;
+ bool is_same = false;
+ int ret;
+
+ if (!device->name)
+ goto out;
+
+ old_path = kzalloc(PATH_MAX, GFP_NOFS);
+ if (!old_path)
+ goto out;
+
+ rcu_read_lock();
+ ret = strscpy(old_path, rcu_str_deref(device->name), PATH_MAX);
+ rcu_read_unlock();
+ if (ret < 0)
+ goto out;
+
+ ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
+ if (ret)
+ goto out;
+ ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
+ if (ret)
+ goto out;
+ if (path_equal(&old, &new))
+ is_same = true;
+out:
+ kfree(old_path);
+ path_put(&old);
+ path_put(&new);
+ return is_same;
+}
+
/*
* Handle scanned device having its CHANGING_FSID_V2 flag set and the fs_devices
* being created with a disk that has already completed its fsid change. Such
@@ -888,7 +924,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
disk_super->fsid, devid, found_transid, path,
current->comm, task_pid_nr(current));
- } else if (!device->name || strcmp(device->name->str, path)) {
+ } else if (!device->name || !is_same_device(device, path)) {
/*
* When FS is already mounted.
* 1. If you are here and if the device->name is NULL that
--
2.43.0
* [PATCH AUTOSEL 6.6 6/9] btrfs: do not clear read-only when adding sprout device
2024-11-24 12:40 [PATCH AUTOSEL 6.6 1/9] epoll: annotate racy check Sasha Levin
` (3 preceding siblings ...)
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 5/9] btrfs: avoid unnecessary device path update for the same device Sasha Levin
@ 2024-11-24 12:40 ` Sasha Levin
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 7/9] btrfs: fix warning on PTR_ERR() against NULL device at btrfs_control_ioctl() Sasha Levin
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2024-11-24 12:40 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Boris Burkov, Qu Wenruo, David Sterba, Sasha Levin, clm, josef,
linux-btrfs
From: Boris Burkov <boris@bur.io>
[ Upstream commit 70958a949d852cbecc3d46127bf0b24786df0130 ]
If you follow the seed/sprout wiki, it suggests the following workflow:
btrfstune -S 1 seed_dev
mount seed_dev mnt
btrfs device add sprout_dev
mount -o remount,rw mnt
The first mount mounts the FS readonly, which results in not setting
BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
somewhat surprisingly clears the readonly bit on the sb (though the
mount is still practically readonly, from the user's perspective...).
Finally, the remount checks the readonly bit on the sb against the flag
and sees no change, so it does not run the code intended to run on
ro->rw transitions, leaving BTRFS_FS_OPEN unset.
As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
does no work. This results in leaking deleted snapshots until we run out
of space.
I propose fixing it at the first departure from what feels reasonable:
when we clear the readonly bit on the sb during device add.
A new fstest I have written reproduces the bug and confirms the fix.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
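Note: the sequence in the commit message can be followed with a small state model. This is a sketch under stated assumptions: toy_fs and its helpers are hypothetical, two booleans stand in for the sb read-only bit and BTRFS_FS_OPEN, and the clear_rdonly argument selects the old or the patched device-add behaviour.

```c
#include <stdbool.h>

struct toy_fs {
	bool sb_rdonly;	/* models the read-only bit on the sb */
	bool fs_open;	/* models BTRFS_FS_OPEN */
};

/* Mounting a seed device: read-only, BTRFS_FS_OPEN not set. */
static void toy_mount_seed(struct toy_fs *fs)
{
	fs->sb_rdonly = true;
	fs->fs_open = false;
}

/* clear_rdonly == true models the behaviour before the patch. */
static void toy_device_add(struct toy_fs *fs, bool clear_rdonly)
{
	if (clear_rdonly)
		fs->sb_rdonly = false;
}

/* Remount rw: the ro->rw work only runs if the sb still looks read-only. */
static void toy_remount_rw(struct toy_fs *fs)
{
	if (!fs->sb_rdonly)
		return;		/* no transition seen, nothing to do */
	fs->sb_rdonly = false;
	fs->fs_open = true;
}
```

With clear_rdonly set, the remount sees no change and fs_open stays false, which is the state in which the cleaner does no work.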
fs/btrfs/volumes.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fdd392334916f..b9a0b26d08e1c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2738,8 +2738,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
if (seeding_dev) {
- btrfs_clear_sb_rdonly(sb);
-
/* GFP_KERNEL allocation must not be under device_list_mutex */
seed_devices = btrfs_init_sprout(fs_info);
if (IS_ERR(seed_devices)) {
@@ -2882,8 +2880,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
mutex_unlock(&fs_info->chunk_mutex);
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
error_trans:
- if (seeding_dev)
- btrfs_set_sb_rdonly(sb);
if (trans)
btrfs_end_transaction(trans);
error_free_zone:
--
2.43.0
* [PATCH AUTOSEL 6.6 7/9] btrfs: fix warning on PTR_ERR() against NULL device at btrfs_control_ioctl()
2024-11-24 12:40 [PATCH AUTOSEL 6.6 1/9] epoll: annotate racy check Sasha Levin
` (4 preceding siblings ...)
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 6/9] btrfs: do not clear read-only when adding sprout device Sasha Levin
@ 2024-11-24 12:40 ` Sasha Levin
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 8/9] kselftest/arm64: Corrupt P0 in the irritator when testing SSVE Sasha Levin
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 9/9] kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all() Sasha Levin
7 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2024-11-24 12:40 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Filipe Manana, Qu Wenruo, David Sterba, Sasha Levin, clm, josef,
linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
[ Upstream commit 2342d6595b608eec94187a17dc112dd4c2a812fa ]
Smatch complains about calling PTR_ERR() against a NULL pointer:
fs/btrfs/super.c:2272 btrfs_control_ioctl() warn: passing zero to 'PTR_ERR'
Fix this by calling PTR_ERR() against the device pointer only if it
contains an error.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
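Note: a userspace re-implementation of the error-pointer convention, to show why PTR_ERR() on a NULL pointer yields 0 rather than an errno. The toy_* helpers are hypothetical and only mirror the shape of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(). The value 4095 is assumed to match the kernel's MAX_ERRNO.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static inline void *toy_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode it again; on NULL this is simply 0. */
static inline long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers live in the top TOY_MAX_ERRNO values of the range. */
static inline bool toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* The patched logic: only decode an errno from a real error pointer. */
static long toy_scan_result(const void *device)
{
	if (toy_is_err(device))
		return toy_ptr_err(device);
	return 0;
}
```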
fs/btrfs/super.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index e33587a814098..4c98eb5230184 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2216,7 +2216,10 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ);
if (IS_ERR(device)) {
mutex_unlock(&uuid_mutex);
- ret = PTR_ERR(device);
+ if (IS_ERR(device))
+ ret = PTR_ERR(device);
+ else
+ ret = 0;
break;
}
ret = !(device->fs_devices->num_devices ==
--
2.43.0
* [PATCH AUTOSEL 6.6 8/9] kselftest/arm64: Corrupt P0 in the irritator when testing SSVE
2024-11-24 12:40 [PATCH AUTOSEL 6.6 1/9] epoll: annotate racy check Sasha Levin
` (5 preceding siblings ...)
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 7/9] btrfs: fix warning on PTR_ERR() against NULL device at btrfs_control_ioctl() Sasha Levin
@ 2024-11-24 12:40 ` Sasha Levin
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 9/9] kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all() Sasha Levin
7 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2024-11-24 12:40 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Mark Brown, Catalin Marinas, Sasha Levin, will, shuah,
mark.rutland, thiago.bauermann, linux-arm-kernel, linux-kselftest
From: Mark Brown <broonie@kernel.org>
[ Upstream commit 3e360ef0c0a1fb6ce9a302e40b8057c41ba8a9d2 ]
When building for streaming SVE, the irritator for SVE skips updates of both
P0 and FFR. While FFR is skipped since it might not be present, there is no
reason to skip corrupting P0, so switch to an instruction valid in streaming
mode and move the ifdef.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241107-arm64-fp-stress-irritator-v2-3-c4b9622e36ee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/testing/selftests/arm64/fp/sve-test.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-test.S b/tools/testing/selftests/arm64/fp/sve-test.S
index 4328895dfc876..ff60360a97f80 100644
--- a/tools/testing/selftests/arm64/fp/sve-test.S
+++ b/tools/testing/selftests/arm64/fp/sve-test.S
@@ -304,9 +304,9 @@ function irritator_handler
movi v0.8b, #1
movi v9.16b, #2
movi v31.8b, #3
-#ifndef SSVE
// And P0
- rdffr p0.b
+ ptrue p0.d
+#ifndef SSVE
// And FFR
wrffr p15.b
#endif
--
2.43.0
* [PATCH AUTOSEL 6.6 9/9] kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all()
2024-11-24 12:40 [PATCH AUTOSEL 6.6 1/9] epoll: annotate racy check Sasha Levin
` (6 preceding siblings ...)
2024-11-24 12:40 ` [PATCH AUTOSEL 6.6 8/9] kselftest/arm64: Corrupt P0 in the irritator when testing SSVE Sasha Levin
@ 2024-11-24 12:40 ` Sasha Levin
7 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2024-11-24 12:40 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Mark Brown, Catalin Marinas, Sasha Levin, will, shuah,
linux-arm-kernel, linux-kselftest
From: Mark Brown <broonie@kernel.org>
[ Upstream commit 27141b690547da5650a420f26ec369ba142a9ebb ]
The PAC exec_sign_all() test spawns some child processes, creating pipes
to be stdin and stdout for the child. It cleans up most of the file
descriptors that are created as part of this but neglects to clean up the
parent end of the child stdin and stdout. Add the missing close() calls.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241111-arm64-pac-test-collisions-v1-1-171875f37e44@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
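Note: a minimal sketch of the descriptor bookkeeping described above, assuming a child that exits immediately instead of exec'ing a worker. spawn_and_cleanup() and count_open_fds() are hypothetical helpers. The pipe array names follow the patch.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Count descriptors currently open in this process (first 256 only). */
static int count_open_fds(void)
{
	int fd, n = 0;

	for (fd = 0; fd < 256; fd++)
		if (fcntl(fd, F_GETFD) != -1)
			n++;
	return n;
}

/*
 * Spawn a child with pipes for its stdin and stdout, then release every
 * descriptor the parent created. Returns 0 on success, -1 on failure.
 */
static int spawn_and_cleanup(void)
{
	int new_stdin[2], new_stdout[2];
	pid_t pid;

	if (pipe(new_stdin) == -1)
		return -1;
	if (pipe(new_stdout) == -1) {
		close(new_stdin[0]);
		close(new_stdin[1]);
		return -1;
	}

	pid = fork();
	if (pid == -1) {
		close(new_stdin[0]);
		close(new_stdin[1]);
		close(new_stdout[0]);
		close(new_stdout[1]);
		return -1;
	}
	if (pid == 0)
		_exit(EXIT_SUCCESS);	/* a real test would dup2() and exec */

	/* Child ends of the pipes: the parent never uses these. */
	close(new_stdin[0]);
	close(new_stdout[1]);

	waitpid(pid, NULL, 0);

	/* Parent ends: the close() calls the patch adds. */
	close(new_stdin[1]);
	close(new_stdout[0]);
	return 0;
}
```

Without the last two close() calls, each invocation would leave two descriptors open in the parent, which is the leak the commit message describes.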
tools/testing/selftests/arm64/pauth/pac.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/arm64/pauth/pac.c b/tools/testing/selftests/arm64/pauth/pac.c
index b743daa772f55..5a07b3958fbf2 100644
--- a/tools/testing/selftests/arm64/pauth/pac.c
+++ b/tools/testing/selftests/arm64/pauth/pac.c
@@ -182,6 +182,9 @@ int exec_sign_all(struct signatures *signed_vals, size_t val)
return -1;
}
+ close(new_stdin[1]);
+ close(new_stdout[0]);
+
return 0;
}
--
2.43.0