* [PATCH 0/2] cleanup: Conditional locking support
@ 2023-11-02 10:44 Peter Zijlstra
2023-11-02 10:44 ` [PATCH 1/2] cleanup: Add conditional guard support Peter Zijlstra
0 siblings, 3 replies; 10+ messages in thread
From: Peter Zijlstra @ 2023-11-02 10:44 UTC (permalink / raw)
To: Linus Torvalds
Cc: x86, linux-kernel, peterz, Jonathan Cameron, Greg Kroah-Hartman,
Oleg Nesterov, Eric Biederman
Hi,
Two patches: the first extends the cleanup/guard stuff to better handle
conditional locks (mutex_trylock(), mutex_lock_interruptible(), etc.), and the
second is a small ptrace conversion that I've been using as a test-case.
The normal scoped_guard() is changed to simply skip the body when a conditional
lock fails to acquire. A new scoped_cond_guard() is added that takes an extra
statement argument to provide an explicit 'fail' action.
The ptrace patch has:
	scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR,
			   &task->signal->cred_guard_mutex) {
		...
	}
	...
	return 0;
If the lock acquire fails, it does 'return -ERESTARTNOINTR' instead of running
the body.
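For reference, with the definitions from patch 1 that construct expands to
roughly the following (a sketch of the shape, not the literal preprocessor
output):

```c
/* scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR, m) { body } */
for (class_mutex_intr_t scope = class_mutex_intr_constructor(m),
     *done = NULL; !done; done = (void *)1)
	if (!class_mutex_intr_lock_ptr(&scope))
		return -ERESTARTNOINTR;	/* the 'fail' statement */
	else {
		/* body, with the mutex held */
	}
```

The destructor attached to 'scope' releases the mutex on any exit from the
body, including the early returns.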
The crazy perf thing then becomes:
	scoped_cond_guard (rwsem_read_intr, goto no_lock,
			   task ? &task->signal->exec_update_lock : NULL) {

		if (0) {
no_lock:
			if (task)
				return -EINTR;
		}

		... body with or without lock ...
	}
Specifically, that thing needs the lock when there is a task, but otherwise
needs to still do the body without the lock.
IIO also wanted something along these lines, although they have a custom
'trylock' thing.
Barring objections, I'm planning to merge this into tip/locking/cleanup which
I'll merge into tip/locking/core (and tip/perf/core when the time comes).
My plan is to post the perf patches in 3 batches of roughly 10 patches each,
the simpler ones first and the crazier ones (including the above) last.
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH 1/2] cleanup: Add conditional guard support 2023-11-02 10:44 [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra @ 2023-11-02 10:44 ` Peter Zijlstra 2023-11-02 14:40 ` Oleg Nesterov 2023-11-02 10:44 ` [PATCH 2/2] ptrace: Convert ptrace_attach() to use lock guards Peter Zijlstra 2023-11-02 15:34 ` [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra 2 siblings, 1 reply; 10+ messages in thread From: Peter Zijlstra @ 2023-11-02 10:44 UTC (permalink / raw) To: Linus Torvalds Cc: x86, linux-kernel, peterz, Jonathan Cameron, Greg Kroah-Hartman, Oleg Nesterov, Eric Biederman Adds: - DEFINE_GUARD_COND() / DEFINE_LOCK_GUARD_1_COND() to extend existing guards with conditional lock primitives, eg. mutex_trylock(), mutex_lock_interruptible(). nb. both primitives allow NULL 'locks', which cause the lock to fail (obviously). - extends scoped_guard() to not take the body when the the conditional guard 'fails'. eg. scoped_guard (mutex_intr, &task->signal_cred_guard_mutex) { ... } will only execute the body when the mutex is held. - provides scoped_cond_guard(name, fail, args...); which extends scoped_guard() to do fail when the lock-acquire fails. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- --- include/linux/cleanup.h | 52 ++++++++++++++++++++++++++++++++++++++++++++--- include/linux/mutex.h | 3 +- include/linux/rwsem.h | 8 +++---- include/linux/spinlock.h | 15 +++++++++++++ 4 files changed, 70 insertions(+), 8 deletions(-) --- a/include/linux/cleanup.h +++ b/include/linux/cleanup.h @@ -125,25 +125,55 @@ static inline class_##_name##_t class_## * trivial wrapper around DEFINE_CLASS() above specifically * for locks. * + * DEFINE_GUARD_COND(name, ext, condlock) + * wrapper around EXTEND_CLASS above to add conditional lock + * variants to a base class, eg. mutex_trylock() or + * mutex_lock_interruptible(). 
+ * * guard(name): - * an anonymous instance of the (guard) class + * an anonymous instance of the (guard) class, not recommended for + * conditional locks. * * scoped_guard (name, args...) { }: * similar to CLASS(name, scope)(args), except the variable (with the * explicit name 'scope') is declard in a for-loop such that its scope is * bound to the next (compound) statement. * + * for conditional locks the loop body is skipped when the lock is not + * acquired. + * + * scoped_cond_guard (name, fail, args...) { }: + * similar to scoped_guard(), except it does fail when the lock + * acquire fails. + * */ #define DEFINE_GUARD(_name, _type, _lock, _unlock) \ - DEFINE_CLASS(_name, _type, _unlock, ({ _lock; _T; }), _type _T) + DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \ + static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \ + { return *_T; } + +#define DEFINE_GUARD_COND(_name, _ext, _condlock) \ + EXTEND_CLASS(_name, _ext, \ + ({ void *_t = _T; if (_T && !(_condlock)) _t = NULL; _t; }), \ + class_##_name##_t _T) \ + static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \ + { return class_##_name##_lock_ptr(_T); } #define guard(_name) \ CLASS(_name, __UNIQUE_ID(guard)) +#define __guard_ptr(_name) class_##_name##_lock_ptr + #define scoped_guard(_name, args...) \ for (CLASS(_name, scope)(args), \ - *done = NULL; !done; done = (void *)1) + *done = NULL; __guard_ptr(_name)(&scope) && !done; done = (void *)1) + +#define scoped_cond_guard(_name, _fail, args...) \ + for (CLASS(_name, scope)(args), \ + *done = NULL; !done; done = (void *)1) \ + if (!__guard_ptr(_name)(&scope)) _fail; \ + else /* * Additional helper macros for generating lock guards with types, either for @@ -152,6 +182,7 @@ static inline class_##_name##_t class_## * * DEFINE_LOCK_GUARD_0(name, lock, unlock, ...) * DEFINE_LOCK_GUARD_1(name, type, lock, unlock, ...) 
+ * DEFINE_LOCK_GUARD_1_COND(name, ext, condlock) * * will result in the following type: * @@ -173,6 +204,11 @@ typedef struct { \ static inline void class_##_name##_destructor(class_##_name##_t *_T) \ { \ if (_T->lock) { _unlock; } \ +} \ + \ +static inline void *class_##_name##_lock_ptr(class_##_name##_t *_T) \ +{ \ + return _T->lock; \ } @@ -201,4 +237,14 @@ __DEFINE_LOCK_GUARD_1(_name, _type, _loc __DEFINE_UNLOCK_GUARD(_name, void, _unlock, __VA_ARGS__) \ __DEFINE_LOCK_GUARD_0(_name, _lock) +#define DEFINE_LOCK_GUARD_1_COND(_name, _ext, _condlock) \ + EXTEND_CLASS(_name, _ext, \ + ({ class_##_name##_t _t = { .lock = l }, *_T = &_t;\ + if (_T->lock && !(_condlock)) _T->lock = NULL; \ + _t; }), \ + typeof_member(class_##_name##_t, lock) l) \ + static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \ + { return class_##_name##_lock_ptr(_T); } + + #endif /* __LINUX_GUARDS_H */ --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -221,6 +221,7 @@ extern void mutex_unlock(struct mutex *l extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); DEFINE_GUARD(mutex, struct mutex *, mutex_lock(_T), mutex_unlock(_T)) -DEFINE_FREE(mutex, struct mutex *, if (_T) mutex_unlock(_T)) +DEFINE_GUARD_COND(mutex, _try, mutex_trylock(_T)) +DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T) == 0) #endif /* __LINUX_MUTEX_H */ --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -203,11 +203,11 @@ extern void up_read(struct rw_semaphore extern void up_write(struct rw_semaphore *sem); DEFINE_GUARD(rwsem_read, struct rw_semaphore *, down_read(_T), up_read(_T)) -DEFINE_GUARD(rwsem_write, struct rw_semaphore *, down_write(_T), up_write(_T)) - -DEFINE_FREE(up_read, struct rw_semaphore *, if (_T) up_read(_T)) -DEFINE_FREE(up_write, struct rw_semaphore *, if (_T) up_write(_T)) +DEFINE_GUARD_COND(rwsem_read, _try, down_read_trylock(_T)) +DEFINE_GUARD_COND(rwsem_read, _intr, down_read_interruptible(_T) == 0) +DEFINE_GUARD(rwsem_write, 
struct rw_semaphore *, down_write(_T), up_write(_T)) +DEFINE_GUARD_COND(rwsem_write, _try, down_write_trylock(_T)) /* * downgrade write lock to read lock --- a/include/linux/spinlock.h +++ b/include/linux/spinlock.h @@ -507,6 +507,8 @@ DEFINE_LOCK_GUARD_1(raw_spinlock, raw_sp raw_spin_lock(_T->lock), raw_spin_unlock(_T->lock)) +DEFINE_LOCK_GUARD_1_COND(raw_spinlock, _try, raw_spin_trylock(_T->lock)) + DEFINE_LOCK_GUARD_1(raw_spinlock_nested, raw_spinlock_t, raw_spin_lock_nested(_T->lock, SINGLE_DEPTH_NESTING), raw_spin_unlock(_T->lock)) @@ -515,23 +517,36 @@ DEFINE_LOCK_GUARD_1(raw_spinlock_irq, ra raw_spin_lock_irq(_T->lock), raw_spin_unlock_irq(_T->lock)) +DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irq, _try, raw_spin_trylock_irq(_T->lock)) + DEFINE_LOCK_GUARD_1(raw_spinlock_irqsave, raw_spinlock_t, raw_spin_lock_irqsave(_T->lock, _T->flags), raw_spin_unlock_irqrestore(_T->lock, _T->flags), unsigned long flags) +DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irqsave, _try, + raw_spin_trylock_irqsave(_T->lock, _T->flags)) + DEFINE_LOCK_GUARD_1(spinlock, spinlock_t, spin_lock(_T->lock), spin_unlock(_T->lock)) +DEFINE_LOCK_GUARD_1_COND(spinlock, _try, spin_trylock(_T->lock)) + DEFINE_LOCK_GUARD_1(spinlock_irq, spinlock_t, spin_lock_irq(_T->lock), spin_unlock_irq(_T->lock)) +DEFINE_LOCK_GUARD_1_COND(spinlock_irq, _try, + spin_trylock_irq(_T->lock)) + DEFINE_LOCK_GUARD_1(spinlock_irqsave, spinlock_t, spin_lock_irqsave(_T->lock, _T->flags), spin_unlock_irqrestore(_T->lock, _T->flags), unsigned long flags) +DEFINE_LOCK_GUARD_1_COND(spinlock_irqsave, _try, + spin_trylock_irqsave(_T->lock, _T->flags)) + #undef __LINUX_INSIDE_SPINLOCK_H #endif /* __LINUX_SPINLOCK_H */ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] cleanup: Add conditional guard support 2023-11-02 10:44 ` [PATCH 1/2] cleanup: Add conditional guard support Peter Zijlstra @ 2023-11-02 14:40 ` Oleg Nesterov 2023-11-02 15:55 ` Oleg Nesterov 2023-11-03 9:30 ` Peter Zijlstra 0 siblings, 2 replies; 10+ messages in thread From: Oleg Nesterov @ 2023-11-02 14:40 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, x86, linux-kernel, Jonathan Cameron, Greg Kroah-Hartman, Eric Biederman On 11/02, Peter Zijlstra wrote: > > include/linux/cleanup.h | 52 ++++++++++++++++++++++++++++++++++++++++++++--- interesting... I don't know anything about cleanup.h, will read this code and the patch later, but I guess I understand the idea. Stupid/offtopic question... Can't we change guard() -#define guard(_name) \ - CLASS(_name, __UNIQUE_ID(guard)) +#define guard(_name, args...) \ + CLASS(_name, __UNIQUE_ID(guard))(args) and update the current users? To me guard(rcu); guard(spinlock, &lock); looks better than guard(rcu)(); // doesn't match scoped_guard(spinlock, &lock) guard(spinlock)(&lock); And this will make guard() consistent with scoped_guard(). No? Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] cleanup: Add conditional guard support 2023-11-02 14:40 ` Oleg Nesterov @ 2023-11-02 15:55 ` Oleg Nesterov 2023-11-03 9:30 ` Peter Zijlstra 1 sibling, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2023-11-02 15:55 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, x86, linux-kernel, Jonathan Cameron, Greg Kroah-Hartman, Eric Biederman On 11/02, Oleg Nesterov wrote: > > On 11/02, Peter Zijlstra wrote: > > > > include/linux/cleanup.h | 52 ++++++++++++++++++++++++++++++++++++++++++++--- > > interesting... I don't know anything about cleanup.h, will > read this code and the patch later, but I guess I understand > the idea. > > Stupid/offtopic question... Can't we change guard() > > -#define guard(_name) \ > - CLASS(_name, __UNIQUE_ID(guard)) > +#define guard(_name, args...) \ > + CLASS(_name, __UNIQUE_ID(guard))(args) > > and update the current users? > > To me > > guard(rcu); > guard(spinlock, &lock); > > looks better than > > guard(rcu)(); > // doesn't match scoped_guard(spinlock, &lock) > guard(spinlock)(&lock); > > And this will make guard() consistent with scoped_guard(). Just in case the kernel builds and botts with the patch below. The .c files were changed by perl -wpi~ -e 's/\bguard\(\w+\K\)\( (\))?/$1 || ", "/ex' kernel/sched/core.c drivers/gpio/gpio-sim.c drivers/hv/hv_balloon.c lib/locking-selftest.c Oleg. 
--- diff --git a/drivers/gpio/gpio-sim.c b/drivers/gpio/gpio-sim.c index 44bf1709a648..9f659a966ed9 100644 --- a/drivers/gpio/gpio-sim.c +++ b/drivers/gpio/gpio-sim.c @@ -70,7 +70,7 @@ static int gpio_sim_apply_pull(struct gpio_sim_chip *chip, gc = &chip->gc; desc = &gc->gpiodev->descs[offset]; - guard(mutex)(&chip->lock); + guard(mutex, &chip->lock); if (test_bit(FLAG_REQUESTED, &desc->flags) && !test_bit(FLAG_IS_OUT, &desc->flags)) { @@ -113,7 +113,7 @@ static int gpio_sim_get(struct gpio_chip *gc, unsigned int offset) { struct gpio_sim_chip *chip = gpiochip_get_data(gc); - guard(mutex)(&chip->lock); + guard(mutex, &chip->lock); return !!test_bit(offset, chip->value_map); } @@ -663,7 +663,7 @@ static ssize_t gpio_sim_device_config_dev_name_show(struct config_item *item, struct gpio_sim_device *dev = to_gpio_sim_device(item); struct platform_device *pdev; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); pdev = dev->pdev; if (pdev) @@ -965,7 +965,7 @@ gpio_sim_device_config_live_store(struct config_item *item, if (ret) return ret; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (live == gpio_sim_device_is_live_unlocked(dev)) ret = -EPERM; @@ -1011,7 +1011,7 @@ static ssize_t gpio_sim_bank_config_chip_name_show(struct config_item *item, struct gpio_sim_device *dev = gpio_sim_bank_get_device(bank); struct gpio_sim_chip_name_ctx ctx = { bank->swnode, page }; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return device_for_each_child(&dev->pdev->dev, &ctx, @@ -1028,7 +1028,7 @@ gpio_sim_bank_config_label_show(struct config_item *item, char *page) struct gpio_sim_bank *bank = to_gpio_sim_bank(item); struct gpio_sim_device *dev = gpio_sim_bank_get_device(bank); - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); return sprintf(page, "%s\n", bank->label ?: ""); } @@ -1040,7 +1040,7 @@ static ssize_t gpio_sim_bank_config_label_store(struct config_item *item, struct gpio_sim_device *dev = 
gpio_sim_bank_get_device(bank); char *trimmed; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return -EBUSY; @@ -1063,7 +1063,7 @@ gpio_sim_bank_config_num_lines_show(struct config_item *item, char *page) struct gpio_sim_bank *bank = to_gpio_sim_bank(item); struct gpio_sim_device *dev = gpio_sim_bank_get_device(bank); - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); return sprintf(page, "%u\n", bank->num_lines); } @@ -1084,7 +1084,7 @@ gpio_sim_bank_config_num_lines_store(struct config_item *item, if (num_lines == 0) return -EINVAL; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return -EBUSY; @@ -1109,7 +1109,7 @@ gpio_sim_line_config_name_show(struct config_item *item, char *page) struct gpio_sim_line *line = to_gpio_sim_line(item); struct gpio_sim_device *dev = gpio_sim_line_get_device(line); - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); return sprintf(page, "%s\n", line->name ?: ""); } @@ -1121,7 +1121,7 @@ static ssize_t gpio_sim_line_config_name_store(struct config_item *item, struct gpio_sim_device *dev = gpio_sim_line_get_device(line); char *trimmed; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return -EBUSY; @@ -1149,7 +1149,7 @@ static ssize_t gpio_sim_hog_config_name_show(struct config_item *item, struct gpio_sim_hog *hog = to_gpio_sim_hog(item); struct gpio_sim_device *dev = gpio_sim_hog_get_device(hog); - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); return sprintf(page, "%s\n", hog->name ?: ""); } @@ -1161,7 +1161,7 @@ static ssize_t gpio_sim_hog_config_name_store(struct config_item *item, struct gpio_sim_device *dev = gpio_sim_hog_get_device(hog); char *trimmed; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return -EBUSY; @@ -1216,7 +1216,7 @@ gpio_sim_hog_config_direction_store(struct config_item *item, struct 
gpio_sim_device *dev = gpio_sim_hog_get_device(hog); int dir; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return -EBUSY; @@ -1276,7 +1276,7 @@ gpio_sim_line_config_make_hog_item(struct config_group *group, const char *name) if (strcmp(name, "hog") != 0) return ERR_PTR(-EINVAL); - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); hog = kzalloc(sizeof(*hog), GFP_KERNEL); if (!hog) @@ -1334,7 +1334,7 @@ gpio_sim_bank_config_make_line_group(struct config_group *group, if (ret != 1 || nchar != strlen(name)) return ERR_PTR(-EINVAL); - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return ERR_PTR(-EBUSY); @@ -1387,7 +1387,7 @@ gpio_sim_device_config_make_bank_group(struct config_group *group, struct gpio_sim_device *dev = to_gpio_sim_device(&group->cg_item); struct gpio_sim_bank *bank; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return ERR_PTR(-EBUSY); diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index e000fa3b9f97..a8954db4cb1c 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -784,7 +784,7 @@ static void hv_online_page(struct page *pg, unsigned int order) struct hv_hotadd_state *has; unsigned long pfn = page_to_pfn(pg); - guard(spinlock_irqsave)(&dm_device.ha_lock); + guard(spinlock_irqsave, &dm_device.ha_lock); list_for_each_entry(has, &dm_device.ha_region_list, list) { /* The page belongs to a different HAS. 
*/ if ((pfn < has->start_pfn) || @@ -803,7 +803,7 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt) unsigned long residual, new_inc; int ret = 0; - guard(spinlock_irqsave)(&dm_device.ha_lock); + guard(spinlock_irqsave, &dm_device.ha_lock); list_for_each_entry(has, &dm_device.ha_region_list, list) { /* * If the pfn range we are dealing with is not in the current @@ -2068,7 +2068,7 @@ static void balloon_remove(struct hv_device *dev) #endif } - guard(spinlock_irqsave)(&dm_device.ha_lock); + guard(spinlock_irqsave, &dm_device.ha_lock); list_for_each_entry_safe(has, tmp, &dm->ha_region_list, list) { list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) { list_del(&gap->list); diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h index 53f1a7a932b0..1d13792a3d85 100644 --- a/include/linux/cleanup.h +++ b/include/linux/cleanup.h @@ -105,8 +105,8 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \ #define DEFINE_GUARD(_name, _type, _lock, _unlock) \ DEFINE_CLASS(_name, _type, _unlock, ({ _lock; _T; }), _type _T) -#define guard(_name) \ - CLASS(_name, __UNIQUE_ID(guard)) +#define guard(_name, args...) \ + CLASS(_name, __UNIQUE_ID(guard))(args) #define scoped_guard(_name, args...) 
\ for (CLASS(_name, scope)(args), \ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 802551e0009b..81acd7811db3 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1097,7 +1097,7 @@ int get_nohz_timer_target(void) hk_mask = housekeeping_cpumask(HK_TYPE_TIMER); - guard(rcu)(); + guard(rcu); for_each_domain(cpu, sd) { for_each_cpu_and(i, sched_domain_span(sd), hk_mask) { @@ -1827,7 +1827,7 @@ static int sysctl_sched_uclamp_handler(struct ctl_table *table, int write, int old_min, old_max, old_min_rt; int result; - guard(mutex)(&uclamp_mutex); + guard(mutex, &uclamp_mutex); old_min = sysctl_sched_uclamp_util_min; old_max = sysctl_sched_uclamp_util_max; @@ -3440,8 +3440,8 @@ static int migrate_swap_stop(void *data) src_rq = cpu_rq(arg->src_cpu); dst_rq = cpu_rq(arg->dst_cpu); - guard(double_raw_spinlock)(&arg->src_task->pi_lock, &arg->dst_task->pi_lock); - guard(double_rq_lock)(src_rq, dst_rq); + guard(double_raw_spinlock, &arg->src_task->pi_lock, &arg->dst_task->pi_lock); + guard(double_rq_lock, src_rq, dst_rq); if (task_cpu(arg->dst_task) != arg->dst_cpu) return -EAGAIN; @@ -3734,7 +3734,7 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags) __schedstat_inc(p->stats.nr_wakeups_remote); - guard(rcu)(); + guard(rcu); for_each_domain(rq->cpu, sd) { if (cpumask_test_cpu(cpu, sched_domain_span(sd))) { __schedstat_inc(sd->ttwu_wake_remote); @@ -3940,9 +3940,9 @@ void wake_up_if_idle(int cpu) { struct rq *rq = cpu_rq(cpu); - guard(rcu)(); + guard(rcu); if (is_idle_task(rcu_dereference(rq->curr))) { - guard(rq_lock_irqsave)(rq); + guard(rq_lock_irqsave, rq); if (is_idle_task(rq->curr)) resched_curr(rq); } @@ -4198,7 +4198,7 @@ bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success) */ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) { - guard(preempt)(); + guard(preempt); int cpu, success = 0; if (p == current) { @@ -5730,7 +5730,7 @@ static void sched_tick_remote(struct work_struct *work) * of 
when exactly it is running. */ if (tick_nohz_tick_stopped_cpu(cpu)) { - guard(rq_lock_irq)(rq); + guard(rq_lock_irq, rq); struct task_struct *curr = rq->curr; if (cpu_online(cpu)) { @@ -6297,8 +6297,8 @@ static bool try_steal_cookie(int this, int that) unsigned long cookie; bool success = false; - guard(irq)(); - guard(double_rq_lock)(dst, src); + guard(irq); + guard(double_rq_lock, dst, src); cookie = dst->core->core_cookie; if (!cookie) @@ -6410,7 +6410,7 @@ static void sched_core_cpu_starting(unsigned int cpu) struct rq *rq = cpu_rq(cpu), *core_rq = NULL; int t; - guard(core_lock)(&cpu); + guard(core_lock, &cpu); WARN_ON_ONCE(rq->core != rq); @@ -6449,7 +6449,7 @@ static void sched_core_cpu_deactivate(unsigned int cpu) struct rq *rq = cpu_rq(cpu), *core_rq = NULL; int t; - guard(core_lock)(&cpu); + guard(core_lock, &cpu); /* if we're the last man standing, nothing to do */ if (cpumask_weight(smt_mask) == 1) { diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c index 6f6a5fc85b42..724132f6109e 100644 --- a/lib/locking-selftest.c +++ b/lib/locking-selftest.c @@ -2527,8 +2527,8 @@ DEFINE_LOCK_GUARD_0(RCU_SCHED, rcu_read_lock_sched(), rcu_read_unlock_sched()) static void __maybe_unused inner##_in_##outer(void) \ { \ /* Relies the reversed clean-up ordering: inner first */ \ - guard(outer)(outer_lock); \ - guard(inner)(inner_lock); \ + guard(outer, outer_lock); \ + guard(inner, inner_lock); \ } /* ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] cleanup: Add conditional guard support 2023-11-02 14:40 ` Oleg Nesterov 2023-11-02 15:55 ` Oleg Nesterov @ 2023-11-03 9:30 ` Peter Zijlstra 2023-11-03 18:17 ` Linus Torvalds 1 sibling, 1 reply; 10+ messages in thread From: Peter Zijlstra @ 2023-11-03 9:30 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, x86, linux-kernel, Jonathan Cameron, Greg Kroah-Hartman, Eric Biederman On Thu, Nov 02, 2023 at 03:40:11PM +0100, Oleg Nesterov wrote: > On 11/02, Peter Zijlstra wrote: > > > > include/linux/cleanup.h | 52 ++++++++++++++++++++++++++++++++++++++++++++--- > > interesting... I don't know anything about cleanup.h, will > read this code and the patch later, but I guess I understand > the idea. > > Stupid/offtopic question... Can't we change guard() > > -#define guard(_name) \ > - CLASS(_name, __UNIQUE_ID(guard)) > +#define guard(_name, args...) \ > + CLASS(_name, __UNIQUE_ID(guard))(args) > > and update the current users? > > To me > > guard(rcu); > guard(spinlock, &lock); > > looks better than > > guard(rcu)(); > // doesn't match scoped_guard(spinlock, &lock) > guard(spinlock)(&lock); > > And this will make guard() consistent with scoped_guard(). > > No? Yes (and you're not the only one to have noticed), I think an earlier version actually had that. The current form came about in a fairly long thread with Linus. Most notably here: https://lkml.kernel.org/r/CAHk-%3DwgXN1YxGMUFeuC135aeUvqduF8zJJiZZingzS1Pao5h0A%40mail.gmail.com And I don't actually dislike the current guard form, I've been reading it like: guard<mutex>(&my_mutex); But that is arguably because I've done a fair few years of C++ systems programming before I got involved with this kernel thing. Also, we use a very similar syntax for the static_call thing: static_call(x86_pmu_enable)(event); That said; if we were to do this, then something like: #define __cond_guard(_name, _inst, _fail, args...) 
\ CLASS(_name, _inst)(args); \ if (!__guard_ptr(_name)(&_inst)) _fail #define cond_guard(_name, _fail, args...) \ __cond_guard(_name, __UNIQUE_ID(guard), _fail, args) cond_guard(spinlock_try, return -EBUSY, &my_lock); Becomes possible. Linus, do you like that enough to suffer a flag day patch as proposed by Oleg? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] cleanup: Add conditional guard support 2023-11-03 9:30 ` Peter Zijlstra @ 2023-11-03 18:17 ` Linus Torvalds 2023-11-03 18:51 ` Oleg Nesterov 0 siblings, 1 reply; 10+ messages in thread From: Linus Torvalds @ 2023-11-03 18:17 UTC (permalink / raw) To: Peter Zijlstra Cc: Oleg Nesterov, x86, linux-kernel, Jonathan Cameron, Greg Kroah-Hartman, Eric Biederman On Thu, 2 Nov 2023 at 23:30, Peter Zijlstra <peterz@infradead.org> wrote: > > On Thu, Nov 02, 2023 at 03:40:11PM +0100, Oleg Nesterov wrote: > > > > To me > > > > guard(rcu); > > guard(spinlock, &lock); > > > > looks better than > > > > guard(rcu)(); > > // doesn't match scoped_guard(spinlock, &lock) > > guard(spinlock)(&lock); > > > > And this will make guard() consistent with scoped_guard(). [...] > That said; if we were to do this, then something like: > > #define __cond_guard(_name, _inst, _fail, args...) \ > CLASS(_name, _inst)(args); \ > if (!__guard_ptr(_name)(&_inst)) _fail > > #define cond_guard(_name, _fail, args...) \ > __cond_guard(_name, __UNIQUE_ID(guard), _fail, args) > > cond_guard(spinlock_try, return -EBUSY, &my_lock); > > Becomes possible. > > Linus, do you like that enough to suffer a flag day patch as proposed by > Oleg? I don't find myself caring too much whether we have that "double grouping" of the guard type-vs-arguments or the "(type, arg...)" syntax. I honestly think that "guard(spinlock)(&lock)" makes it more visually obvious that the first argument is the "type of guard", while "guard(spinlock, &lock)" makes it look like the two arguments are somehow at the same level, which they most definitely aren't. But I also can't find it in myself to care too much about something that is so purely syntactic, and that I suspect should be abstracted away anyway to just become "guard_spinlock(&lock)" with a trivial helper macro. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] cleanup: Add conditional guard support 2023-11-03 18:17 ` Linus Torvalds @ 2023-11-03 18:51 ` Oleg Nesterov 0 siblings, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2023-11-03 18:51 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Zijlstra, x86, linux-kernel, Jonathan Cameron, Greg Kroah-Hartman, Eric Biederman On 11/03, Linus Torvalds wrote: > > On Thu, 2 Nov 2023 at 23:30, Peter Zijlstra <peterz@infradead.org> wrote: > > > > Linus, do you like that enough to suffer a flag day patch as proposed by > > Oleg? > > I don't find myself caring too much whether we have that "double > grouping" of the guard type-vs-arguments or the "(type, arg...)" > syntax. Neither me, > I honestly think that "guard(spinlock)(&lock)" makes it more visually > obvious that the first argument is the "type of guard", while > "guard(spinlock, &lock)" makes it look like the two arguments are > somehow at the same level, which they most definitely aren't. My point was that guard(spinlock)(&lock); doesn't match scoped_guard(spinlock, &lock); but I agree this purely cosmetic, so lets forget it. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] ptrace: Convert ptrace_attach() to use lock guards 2023-11-02 10:44 [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra 2023-11-02 10:44 ` [PATCH 1/2] cleanup: Add conditional guard support Peter Zijlstra @ 2023-11-02 10:44 ` Peter Zijlstra 2023-11-02 15:17 ` Oleg Nesterov 2023-11-02 15:34 ` [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra 2 siblings, 1 reply; 10+ messages in thread From: Peter Zijlstra @ 2023-11-02 10:44 UTC (permalink / raw) To: Linus Torvalds Cc: x86, linux-kernel, peterz, Jonathan Cameron, Greg Kroah-Hartman, Oleg Nesterov, Eric Biederman Created as testing for the conditional guard infrastructure. Specifically this makes use of the following form: scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR, &task->signal->cred_guard_mutex) { ... } ... return 0; Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- include/linux/sched/task.h | 2 include/linux/spinlock.h | 26 ++++++++ kernel/ptrace.c | 138 +++++++++++++++++++++------------------------ 3 files changed, 94 insertions(+), 72 deletions(-) --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -226,4 +226,6 @@ static inline void task_unlock(struct ta spin_unlock(&p->alloc_lock); } +DEFINE_GUARD(task_lock, struct task_struct *, task_lock(_T), task_unlock(_T)) + #endif /* _LINUX_SCHED_TASK_H */ --- a/include/linux/spinlock.h +++ b/include/linux/spinlock.h @@ -548,5 +548,31 @@ DEFINE_LOCK_GUARD_1(spinlock_irqsave, sp DEFINE_LOCK_GUARD_1_COND(spinlock_irqsave, _try, spin_trylock_irqsave(_T->lock, _T->flags)) +DEFINE_LOCK_GUARD_1(read_lock, rwlock_t, + read_lock(_T->lock), + read_unlock(_T->lock)) + +DEFINE_LOCK_GUARD_1(read_lock_irq, rwlock_t, + read_lock_irq(_T->lock), + read_unlock_irq(_T->lock)) + +DEFINE_LOCK_GUARD_1(read_lock_irqsave, rwlock_t, + read_lock_irqsave(_T->lock, _T->flags), + read_unlock_irqrestore(_T->lock, _T->flags), + unsigned long flags) + +DEFINE_LOCK_GUARD_1(write_lock, rwlock_t, + write_lock(_T->lock), + 
write_unlock(_T->lock)) + +DEFINE_LOCK_GUARD_1(write_lock_irq, rwlock_t, + write_lock_irq(_T->lock), + write_unlock_irq(_T->lock)) + +DEFINE_LOCK_GUARD_1(write_lock_irqsave, rwlock_t, + write_lock_irqsave(_T->lock, _T->flags), + write_unlock_irqrestore(_T->lock, _T->flags), + unsigned long flags) + #undef __LINUX_INSIDE_SPINLOCK_H #endif /* __LINUX_SPINLOCK_H */ --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -386,6 +386,34 @@ static int check_ptrace_options(unsigned return 0; } +static inline void ptrace_set_stopped(struct task_struct *task) +{ + guard(spinlock)(&task->sighand->siglock); + + /* + * If the task is already STOPPED, set JOBCTL_TRAP_STOP and + * TRAPPING, and kick it so that it transits to TRACED. TRAPPING + * will be cleared if the child completes the transition or any + * event which clears the group stop states happens. We'll wait + * for the transition to complete before returning from this + * function. + * + * This hides STOPPED -> RUNNING -> TRACED transition from the + * attaching thread but a different thread in the same group can + * still observe the transient RUNNING state. IOW, if another + * thread's WNOHANG wait(2) on the stopped tracee races against + * ATTACH, the wait(2) may fail due to the transient RUNNING. + * + * The following task_is_stopped() test is safe as both transitions + * in and out of STOPPED are protected by siglock. 
+ */ + if (task_is_stopped(task) && + task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) { + task->jobctl &= ~JOBCTL_STOPPED; + signal_wake_up_state(task, __TASK_STOPPED); + } +} + static int ptrace_attach(struct task_struct *task, long request, unsigned long addr, unsigned long flags) @@ -393,17 +421,17 @@ static int ptrace_attach(struct task_str bool seize = (request == PTRACE_SEIZE); int retval; - retval = -EIO; if (seize) { if (addr != 0) - goto out; + return -EIO; /* * This duplicates the check in check_ptrace_options() because * ptrace_attach() and ptrace_setoptions() have historically * used different error codes for unknown ptrace options. */ if (flags & ~(unsigned long)PTRACE_O_MASK) - goto out; + return -EIO; + retval = check_ptrace_options(flags); if (retval) return retval; @@ -414,88 +442,54 @@ static int ptrace_attach(struct task_str audit_ptrace(task); - retval = -EPERM; if (unlikely(task->flags & PF_KTHREAD)) - goto out; + return -EPERM; if (same_thread_group(task, current)) - goto out; + return -EPERM; /* * Protect exec's credential calculations against our interference; * SUID, SGID and LSM creds get determined differently * under ptrace. 
 	 */
-	retval = -ERESTARTNOINTR;
-	if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
-		goto out;
-
-	task_lock(task);
-	retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
-	task_unlock(task);
-	if (retval)
-		goto unlock_creds;
-
-	write_lock_irq(&tasklist_lock);
-	retval = -EPERM;
-	if (unlikely(task->exit_state))
-		goto unlock_tasklist;
-	if (task->ptrace)
-		goto unlock_tasklist;
-
-	task->ptrace = flags;
-
-	ptrace_link(task, current);
-
-	/* SEIZE doesn't trap tracee on attach */
-	if (!seize)
-		send_sig_info(SIGSTOP, SEND_SIG_PRIV, task);
+	scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR,
+			   &task->signal->cred_guard_mutex) {
 
-	spin_lock(&task->sighand->siglock);
+		scoped_guard (task_lock, task) {
+			retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
+			if (retval)
+				return retval;
+		}
+
+		scoped_guard (write_lock, &tasklist_lock) {
+			if (unlikely(task->exit_state))
+				return -EPERM;
+			if (task->ptrace)
+				return -EPERM;
+
+			task->ptrace = flags;
+
+			ptrace_link(task, current);
+
+			/* SEIZE doesn't trap tracee on attach */
+			if (!seize)
+				send_sig_info(SIGSTOP, SEND_SIG_PRIV, task);
 
-	/*
-	 * If the task is already STOPPED, set JOBCTL_TRAP_STOP and
-	 * TRAPPING, and kick it so that it transits to TRACED. TRAPPING
-	 * will be cleared if the child completes the transition or any
-	 * event which clears the group stop states happens. We'll wait
-	 * for the transition to complete before returning from this
-	 * function.
-	 *
-	 * This hides STOPPED -> RUNNING -> TRACED transition from the
-	 * attaching thread but a different thread in the same group can
-	 * still observe the transient RUNNING state. IOW, if another
-	 * thread's WNOHANG wait(2) on the stopped tracee races against
-	 * ATTACH, the wait(2) may fail due to the transient RUNNING.
-	 *
-	 * The following task_is_stopped() test is safe as both transitions
-	 * in and out of STOPPED are protected by siglock.
-	 */
-	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
-		task->jobctl &= ~JOBCTL_STOPPED;
-		signal_wake_up_state(task, __TASK_STOPPED);
+			ptrace_set_stopped(task);
+		}
 	}
-	spin_unlock(&task->sighand->siglock);
-
-	retval = 0;
-unlock_tasklist:
-	write_unlock_irq(&tasklist_lock);
-unlock_creds:
-	mutex_unlock(&task->signal->cred_guard_mutex);
-out:
-	if (!retval) {
-		/*
-		 * We do not bother to change retval or clear JOBCTL_TRAPPING
-		 * if wait_on_bit() was interrupted by SIGKILL. The tracer will
-		 * not return to user-mode, it will exit and clear this bit in
-		 * __ptrace_unlink() if it wasn't already cleared by the tracee;
-		 * and until then nobody can ptrace this task.
-		 */
-		wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT, TASK_KILLABLE);
-		proc_ptrace_connector(task, PTRACE_ATTACH);
-	}
+	/*
+	 * We do not bother to change retval or clear JOBCTL_TRAPPING
+	 * if wait_on_bit() was interrupted by SIGKILL. The tracer will
+	 * not return to user-mode, it will exit and clear this bit in
+	 * __ptrace_unlink() if it wasn't already cleared by the tracee;
+	 * and until then nobody can ptrace this task.
+	 */
+	wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT, TASK_KILLABLE);
+	proc_ptrace_connector(task, PTRACE_ATTACH);
 
-	return retval;
+	return 0;
 }
 
 /**

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] ptrace: Convert ptrace_attach() to use lock guards
  2023-11-02 10:44 ` [PATCH 2/2] ptrace: Convert ptrace_attach() to use lock guards Peter Zijlstra
@ 2023-11-02 15:17   ` Oleg Nesterov
  0 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2023-11-02 15:17 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linus Torvalds, x86, linux-kernel, Jonathan Cameron,
	Greg Kroah-Hartman, Eric Biederman

On 11/02, Peter Zijlstra wrote:
>
> Created as testing for the conditional guard infrastructure.

This patch scares me ;) I need to get used to guard/etc. But looks
correct.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>

> +	/* SEIZE doesn't trap tracee on attach */
> +	if (!seize)
> +		send_sig_info(SIGSTOP, SEND_SIG_PRIV, task);

This is offtopic, but with or without this patch it is a bit ugly to drop
->siglock and take it again right after that. We can do (later) a minor
cleanup on top of this change.

--- x/kernel/ptrace.c	2023-11-02 16:03:37.646838530 +0100
+++ x/kernel/ptrace.c	2023-11-02 16:05:52.171052506 +0100
@@ -386,10 +386,14 @@
 	return 0;
 }
 
-static inline void ptrace_set_stopped(struct task_struct *task)
+static inline void ptrace_set_stopped(struct task_struct *task, bool seize)
 {
 	guard(spinlock)(&task->sighand->siglock);
 
+	/* SEIZE doesn't trap tracee on attach */
+	if (!seize)
+		send_signal_locked(SIGSTOP, SEND_SIG_PRIV, task, PIDTYPE_PID);
+
 	/*
 	 * If the task is already STOPPED, set JOBCTL_TRAP_STOP and
 	 * TRAPPING, and kick it so that it transits to TRACED. TRAPPING
@@ -470,11 +474,6 @@
 		task->ptrace = flags;
 
 		ptrace_link(task, current);
-
-		/* SEIZE doesn't trap tracee on attach */
-		if (!seize)
-			send_sig_info(SIGSTOP, SEND_SIG_PRIV, task);
-
 		ptrace_set_stopped(task);
 	}
 }

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] cleanup: Conditional locking support
  2023-11-02 10:44 [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra
  2023-11-02 10:44 ` [PATCH 1/2] cleanup: Add conditional guard support Peter Zijlstra
  2023-11-02 10:44 ` [PATCH 2/2] ptrace: Convert ptrace_attach() to use lock guards Peter Zijlstra
@ 2023-11-02 15:34 ` Peter Zijlstra
  2 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2023-11-02 15:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: x86, linux-kernel, Jonathan Cameron, Greg Kroah-Hartman,
	Oleg Nesterov, Eric Biederman

On Thu, Nov 02, 2023 at 11:44:29AM +0100, Peter Zijlstra wrote:

> My plan is to post the perf patches in 3 batches of roughly 10 patches each,
> the simpler first and the more crazy ones (including the above) last.

For those interested, the first batch just went out and can be found here:

  https://lkml.kernel.org/r/20231102150919.719936610@infradead.org

^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-11-03 18:53 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2023-11-02 10:44 [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra
2023-11-02 10:44 ` [PATCH 1/2] cleanup: Add conditional guard support Peter Zijlstra
2023-11-02 14:40   ` Oleg Nesterov
2023-11-02 15:55     ` Oleg Nesterov
2023-11-03  9:30       ` Peter Zijlstra
2023-11-03 18:17         ` Linus Torvalds
2023-11-03 18:51           ` Oleg Nesterov
2023-11-02 10:44 ` [PATCH 2/2] ptrace: Convert ptrace_attach() to use lock guards Peter Zijlstra
2023-11-02 15:17   ` Oleg Nesterov
2023-11-02 15:34 ` [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox