* [PATCH 0/2] cleanup: Conditional locking support
@ 2023-11-02 10:44 Peter Zijlstra
2023-11-02 10:44 ` [PATCH 1/2] cleanup: Add conditional guard support Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Peter Zijlstra @ 2023-11-02 10:44 UTC (permalink / raw)
To: Linus Torvalds
Cc: x86, linux-kernel, peterz, Jonathan Cameron, Greg Kroah-Hartman,
Oleg Nesterov, Eric Biederman
Hi,
Two patches, the first extending the cleanup/guard stuff to better handle
conditional locks, mutex_trylock(), mutex_lock_interruptible() etc.. And the
second a small ptrace conversion that I've been using as a test-case.
The normal scoped_guard() is changed to simply skip the body when a conditional
lock fails to acquire. A new scoped_cond_guard() is added that takes an extra
statement argument to provide an explicit 'fail' action.
The ptrace patch has:
scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR,
&task->signal->cred_guard_mutex) {
...
}
...
return 0;
Where if the lock acquire fails, it does 'return -ERESTARTNOINTR'.
The crazy perf thing then becomes:
scoped_cond_guard (rwsem_read_intr, goto no_lock,
task ? &task->signal->exec_update_lock : NULL) {
if (0) {
no_lock:
if (task)
return -EINTR;
}
... body with or without lock ...
}
Specifically, that thing needs the lock when there is a task, but otherwise
needs to still do the body without the lock.
IIO also wanted something along these lines, although they have a custom
'trylock' thing.
Barring objections, I'm planning to merge this into tip/locking/cleanup which
I'll merge into tip/locking/core (and tip/perf/core when times comes).
My plan is to post the perf patches in 3 batches of roughly 10 patches each,
the simpler first and the more crazy ones (including the above) last.
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH 1/2] cleanup: Add conditional guard support 2023-11-02 10:44 [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra @ 2023-11-02 10:44 ` Peter Zijlstra 2023-11-02 14:40 ` Oleg Nesterov 2023-11-02 10:44 ` [PATCH 2/2] ptrace: Convert ptrace_attach() to use lock guards Peter Zijlstra 2023-11-02 15:34 ` [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra 2 siblings, 1 reply; 10+ messages in thread From: Peter Zijlstra @ 2023-11-02 10:44 UTC (permalink / raw) To: Linus Torvalds Cc: x86, linux-kernel, peterz, Jonathan Cameron, Greg Kroah-Hartman, Oleg Nesterov, Eric Biederman Adds: - DEFINE_GUARD_COND() / DEFINE_LOCK_GUARD_1_COND() to extend existing guards with conditional lock primitives, eg. mutex_trylock(), mutex_lock_interruptible(). nb. both primitives allow NULL 'locks', which cause the lock to fail (obviously). - extends scoped_guard() to not take the body when the the conditional guard 'fails'. eg. scoped_guard (mutex_intr, &task->signal_cred_guard_mutex) { ... } will only execute the body when the mutex is held. - provides scoped_cond_guard(name, fail, args...); which extends scoped_guard() to do fail when the lock-acquire fails. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- --- include/linux/cleanup.h | 52 ++++++++++++++++++++++++++++++++++++++++++++--- include/linux/mutex.h | 3 +- include/linux/rwsem.h | 8 +++---- include/linux/spinlock.h | 15 +++++++++++++ 4 files changed, 70 insertions(+), 8 deletions(-) --- a/include/linux/cleanup.h +++ b/include/linux/cleanup.h @@ -125,25 +125,55 @@ static inline class_##_name##_t class_## * trivial wrapper around DEFINE_CLASS() above specifically * for locks. * + * DEFINE_GUARD_COND(name, ext, condlock) + * wrapper around EXTEND_CLASS above to add conditional lock + * variants to a base class, eg. mutex_trylock() or + * mutex_lock_interruptible(). + * * guard(name): - * an anonymous instance of the (guard) class + * an anonymous instance of the (guard) class, not recommended for + * conditional locks. * * scoped_guard (name, args...) { }: * similar to CLASS(name, scope)(args), except the variable (with the * explicit name 'scope') is declard in a for-loop such that its scope is * bound to the next (compound) statement. * + * for conditional locks the loop body is skipped when the lock is not + * acquired. + * + * scoped_cond_guard (name, fail, args...) { }: + * similar to scoped_guard(), except it does fail when the lock + * acquire fails. + * */ #define DEFINE_GUARD(_name, _type, _lock, _unlock) \ - DEFINE_CLASS(_name, _type, _unlock, ({ _lock; _T; }), _type _T) + DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \ + static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \ + { return *_T; } + +#define DEFINE_GUARD_COND(_name, _ext, _condlock) \ + EXTEND_CLASS(_name, _ext, \ + ({ void *_t = _T; if (_T && !(_condlock)) _t = NULL; _t; }), \ + class_##_name##_t _T) \ + static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \ + { return class_##_name##_lock_ptr(_T); } #define guard(_name) \ CLASS(_name, __UNIQUE_ID(guard)) +#define __guard_ptr(_name) class_##_name##_lock_ptr + #define scoped_guard(_name, args...) \ for (CLASS(_name, scope)(args), \ - *done = NULL; !done; done = (void *)1) + *done = NULL; __guard_ptr(_name)(&scope) && !done; done = (void *)1) + +#define scoped_cond_guard(_name, _fail, args...) \ + for (CLASS(_name, scope)(args), \ + *done = NULL; !done; done = (void *)1) \ + if (!__guard_ptr(_name)(&scope)) _fail; \ + else /* * Additional helper macros for generating lock guards with types, either for @@ -152,6 +182,7 @@ static inline class_##_name##_t class_## * * DEFINE_LOCK_GUARD_0(name, lock, unlock, ...) * DEFINE_LOCK_GUARD_1(name, type, lock, unlock, ...) + * DEFINE_LOCK_GUARD_1_COND(name, ext, condlock) * * will result in the following type: * @@ -173,6 +204,11 @@ typedef struct { \ static inline void class_##_name##_destructor(class_##_name##_t *_T) \ { \ if (_T->lock) { _unlock; } \ +} \ + \ +static inline void *class_##_name##_lock_ptr(class_##_name##_t *_T) \ +{ \ + return _T->lock; \ } @@ -201,4 +237,14 @@ __DEFINE_LOCK_GUARD_1(_name, _type, _loc __DEFINE_UNLOCK_GUARD(_name, void, _unlock, __VA_ARGS__) \ __DEFINE_LOCK_GUARD_0(_name, _lock) +#define DEFINE_LOCK_GUARD_1_COND(_name, _ext, _condlock) \ + EXTEND_CLASS(_name, _ext, \ + ({ class_##_name##_t _t = { .lock = l }, *_T = &_t;\ + if (_T->lock && !(_condlock)) _T->lock = NULL; \ + _t; }), \ + typeof_member(class_##_name##_t, lock) l) \ + static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \ + { return class_##_name##_lock_ptr(_T); } + + #endif /* __LINUX_GUARDS_H */ --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -221,6 +221,7 @@ extern void mutex_unlock(struct mutex *l extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); DEFINE_GUARD(mutex, struct mutex *, mutex_lock(_T), mutex_unlock(_T)) -DEFINE_FREE(mutex, struct mutex *, if (_T) mutex_unlock(_T)) +DEFINE_GUARD_COND(mutex, _try, mutex_trylock(_T)) +DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T) == 0) #endif /* __LINUX_MUTEX_H */ --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -203,11 +203,11 @@ extern void up_read(struct rw_semaphore extern void up_write(struct rw_semaphore *sem); DEFINE_GUARD(rwsem_read, struct rw_semaphore *, down_read(_T), up_read(_T)) -DEFINE_GUARD(rwsem_write, struct rw_semaphore *, down_write(_T), up_write(_T)) - -DEFINE_FREE(up_read, struct rw_semaphore *, if (_T) up_read(_T)) -DEFINE_FREE(up_write, struct rw_semaphore *, if (_T) up_write(_T)) +DEFINE_GUARD_COND(rwsem_read, _try, down_read_trylock(_T)) +DEFINE_GUARD_COND(rwsem_read, _intr, down_read_interruptible(_T) == 0) +DEFINE_GUARD(rwsem_write, struct rw_semaphore *, down_write(_T), up_write(_T)) +DEFINE_GUARD_COND(rwsem_write, _try, down_write_trylock(_T)) /* * downgrade write lock to read lock --- a/include/linux/spinlock.h +++ b/include/linux/spinlock.h @@ -507,6 +507,8 @@ DEFINE_LOCK_GUARD_1(raw_spinlock, raw_sp raw_spin_lock(_T->lock), raw_spin_unlock(_T->lock)) +DEFINE_LOCK_GUARD_1_COND(raw_spinlock, _try, raw_spin_trylock(_T->lock)) + DEFINE_LOCK_GUARD_1(raw_spinlock_nested, raw_spinlock_t, raw_spin_lock_nested(_T->lock, SINGLE_DEPTH_NESTING), raw_spin_unlock(_T->lock)) @@ -515,23 +517,36 @@ DEFINE_LOCK_GUARD_1(raw_spinlock_irq, ra raw_spin_lock_irq(_T->lock), raw_spin_unlock_irq(_T->lock)) +DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irq, _try, raw_spin_trylock_irq(_T->lock)) + DEFINE_LOCK_GUARD_1(raw_spinlock_irqsave, raw_spinlock_t, raw_spin_lock_irqsave(_T->lock, _T->flags), raw_spin_unlock_irqrestore(_T->lock, _T->flags), unsigned long flags) +DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irqsave, _try, + raw_spin_trylock_irqsave(_T->lock, _T->flags)) + DEFINE_LOCK_GUARD_1(spinlock, spinlock_t, spin_lock(_T->lock), spin_unlock(_T->lock)) +DEFINE_LOCK_GUARD_1_COND(spinlock, _try, spin_trylock(_T->lock)) + DEFINE_LOCK_GUARD_1(spinlock_irq, spinlock_t, spin_lock_irq(_T->lock), spin_unlock_irq(_T->lock)) +DEFINE_LOCK_GUARD_1_COND(spinlock_irq, _try, + spin_trylock_irq(_T->lock)) + DEFINE_LOCK_GUARD_1(spinlock_irqsave, spinlock_t, spin_lock_irqsave(_T->lock, _T->flags), spin_unlock_irqrestore(_T->lock, _T->flags), unsigned long flags) +DEFINE_LOCK_GUARD_1_COND(spinlock_irqsave, _try, + spin_trylock_irqsave(_T->lock, _T->flags)) + #undef __LINUX_INSIDE_SPINLOCK_H #endif /* __LINUX_SPINLOCK_H */ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] cleanup: Add conditional guard support 2023-11-02 10:44 ` [PATCH 1/2] cleanup: Add conditional guard support Peter Zijlstra @ 2023-11-02 14:40 ` Oleg Nesterov 2023-11-02 15:55 ` Oleg Nesterov 2023-11-03 9:30 ` Peter Zijlstra 0 siblings, 2 replies; 10+ messages in thread From: Oleg Nesterov @ 2023-11-02 14:40 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, x86, linux-kernel, Jonathan Cameron, Greg Kroah-Hartman, Eric Biederman On 11/02, Peter Zijlstra wrote: > > include/linux/cleanup.h | 52 ++++++++++++++++++++++++++++++++++++++++++++--- interesting... I don't know anything about cleanup.h, will read this code and the patch later, but I guess I understand the idea. Stupid/offtopic question... Can't we change guard() -#define guard(_name) \ - CLASS(_name, __UNIQUE_ID(guard)) +#define guard(_name, args...) \ + CLASS(_name, __UNIQUE_ID(guard))(args) and update the current users? To me guard(rcu); guard(spinlock, &lock); looks better than guard(rcu)(); // doesn't match scoped_guard(spinlock, &lock) guard(spinlock)(&lock); And this will make guard() consistent with scoped_guard(). No? Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] cleanup: Add conditional guard support 2023-11-02 14:40 ` Oleg Nesterov @ 2023-11-02 15:55 ` Oleg Nesterov 2023-11-03 9:30 ` Peter Zijlstra 1 sibling, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2023-11-02 15:55 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, x86, linux-kernel, Jonathan Cameron, Greg Kroah-Hartman, Eric Biederman On 11/02, Oleg Nesterov wrote: > > On 11/02, Peter Zijlstra wrote: > > > > include/linux/cleanup.h | 52 ++++++++++++++++++++++++++++++++++++++++++++--- > > interesting... I don't know anything about cleanup.h, will > read this code and the patch later, but I guess I understand > the idea. > > Stupid/offtopic question... Can't we change guard() > > -#define guard(_name) \ > - CLASS(_name, __UNIQUE_ID(guard)) > +#define guard(_name, args...) \ > + CLASS(_name, __UNIQUE_ID(guard))(args) > > and update the current users? > > To me > > guard(rcu); > guard(spinlock, &lock); > > looks better than > > guard(rcu)(); > // doesn't match scoped_guard(spinlock, &lock) > guard(spinlock)(&lock); > > And this will make guard() consistent with scoped_guard(). Just in case the kernel builds and botts with the patch below. The .c files were changed by perl -wpi~ -e 's/\bguard\(\w+\K\)\( (\))?/$1 || ", "/ex' kernel/sched/core.c drivers/gpio/gpio-sim.c drivers/hv/hv_balloon.c lib/locking-selftest.c Oleg. --- diff --git a/drivers/gpio/gpio-sim.c b/drivers/gpio/gpio-sim.c index 44bf1709a648..9f659a966ed9 100644 --- a/drivers/gpio/gpio-sim.c +++ b/drivers/gpio/gpio-sim.c @@ -70,7 +70,7 @@ static int gpio_sim_apply_pull(struct gpio_sim_chip *chip, gc = &chip->gc; desc = &gc->gpiodev->descs[offset]; - guard(mutex)(&chip->lock); + guard(mutex, &chip->lock); if (test_bit(FLAG_REQUESTED, &desc->flags) && !test_bit(FLAG_IS_OUT, &desc->flags)) { @@ -113,7 +113,7 @@ static int gpio_sim_get(struct gpio_chip *gc, unsigned int offset) { struct gpio_sim_chip *chip = gpiochip_get_data(gc); - guard(mutex)(&chip->lock); + guard(mutex, &chip->lock); return !!test_bit(offset, chip->value_map); } @@ -663,7 +663,7 @@ static ssize_t gpio_sim_device_config_dev_name_show(struct config_item *item, struct gpio_sim_device *dev = to_gpio_sim_device(item); struct platform_device *pdev; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); pdev = dev->pdev; if (pdev) @@ -965,7 +965,7 @@ gpio_sim_device_config_live_store(struct config_item *item, if (ret) return ret; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (live == gpio_sim_device_is_live_unlocked(dev)) ret = -EPERM; @@ -1011,7 +1011,7 @@ static ssize_t gpio_sim_bank_config_chip_name_show(struct config_item *item, struct gpio_sim_device *dev = gpio_sim_bank_get_device(bank); struct gpio_sim_chip_name_ctx ctx = { bank->swnode, page }; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return device_for_each_child(&dev->pdev->dev, &ctx, @@ -1028,7 +1028,7 @@ gpio_sim_bank_config_label_show(struct config_item *item, char *page) struct gpio_sim_bank *bank = to_gpio_sim_bank(item); struct gpio_sim_device *dev = gpio_sim_bank_get_device(bank); - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); return sprintf(page, "%s\n", bank->label ?: ""); } @@ -1040,7 +1040,7 @@ static ssize_t gpio_sim_bank_config_label_store(struct config_item *item, struct gpio_sim_device *dev = gpio_sim_bank_get_device(bank); char *trimmed; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return -EBUSY; @@ -1063,7 +1063,7 @@ gpio_sim_bank_config_num_lines_show(struct config_item *item, char *page) struct gpio_sim_bank *bank = to_gpio_sim_bank(item); struct gpio_sim_device *dev = gpio_sim_bank_get_device(bank); - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); return sprintf(page, "%u\n", bank->num_lines); } @@ -1084,7 +1084,7 @@ gpio_sim_bank_config_num_lines_store(struct config_item *item, if (num_lines == 0) return -EINVAL; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return -EBUSY; @@ -1109,7 +1109,7 @@ gpio_sim_line_config_name_show(struct config_item *item, char *page) struct gpio_sim_line *line = to_gpio_sim_line(item); struct gpio_sim_device *dev = gpio_sim_line_get_device(line); - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); return sprintf(page, "%s\n", line->name ?: ""); } @@ -1121,7 +1121,7 @@ static ssize_t gpio_sim_line_config_name_store(struct config_item *item, struct gpio_sim_device *dev = gpio_sim_line_get_device(line); char *trimmed; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return -EBUSY; @@ -1149,7 +1149,7 @@ static ssize_t gpio_sim_hog_config_name_show(struct config_item *item, struct gpio_sim_hog *hog = to_gpio_sim_hog(item); struct gpio_sim_device *dev = gpio_sim_hog_get_device(hog); - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); return sprintf(page, "%s\n", hog->name ?: ""); } @@ -1161,7 +1161,7 @@ static ssize_t gpio_sim_hog_config_name_store(struct config_item *item, struct gpio_sim_device *dev = gpio_sim_hog_get_device(hog); char *trimmed; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return -EBUSY; @@ -1216,7 +1216,7 @@ gpio_sim_hog_config_direction_store(struct config_item *item, struct gpio_sim_device *dev = gpio_sim_hog_get_device(hog); int dir; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return -EBUSY; @@ -1276,7 +1276,7 @@ gpio_sim_line_config_make_hog_item(struct config_group *group, const char *name) if (strcmp(name, "hog") != 0) return ERR_PTR(-EINVAL); - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); hog = kzalloc(sizeof(*hog), GFP_KERNEL); if (!hog) @@ -1334,7 +1334,7 @@ gpio_sim_bank_config_make_line_group(struct config_group *group, if (ret != 1 || nchar != strlen(name)) return ERR_PTR(-EINVAL); - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return ERR_PTR(-EBUSY); @@ -1387,7 +1387,7 @@ gpio_sim_device_config_make_bank_group(struct config_group *group, struct gpio_sim_device *dev = to_gpio_sim_device(&group->cg_item); struct gpio_sim_bank *bank; - guard(mutex)(&dev->lock); + guard(mutex, &dev->lock); if (gpio_sim_device_is_live_unlocked(dev)) return ERR_PTR(-EBUSY); diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index e000fa3b9f97..a8954db4cb1c 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -784,7 +784,7 @@ static void hv_online_page(struct page *pg, unsigned int order) struct hv_hotadd_state *has; unsigned long pfn = page_to_pfn(pg); - guard(spinlock_irqsave)(&dm_device.ha_lock); + guard(spinlock_irqsave, &dm_device.ha_lock); list_for_each_entry(has, &dm_device.ha_region_list, list) { /* The page belongs to a different HAS. */ if ((pfn < has->start_pfn) || @@ -803,7 +803,7 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt) unsigned long residual, new_inc; int ret = 0; - guard(spinlock_irqsave)(&dm_device.ha_lock); + guard(spinlock_irqsave, &dm_device.ha_lock); list_for_each_entry(has, &dm_device.ha_region_list, list) { /* * If the pfn range we are dealing with is not in the current @@ -2068,7 +2068,7 @@ static void balloon_remove(struct hv_device *dev) #endif } - guard(spinlock_irqsave)(&dm_device.ha_lock); + guard(spinlock_irqsave, &dm_device.ha_lock); list_for_each_entry_safe(has, tmp, &dm->ha_region_list, list) { list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) { list_del(&gap->list); diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h index 53f1a7a932b0..1d13792a3d85 100644 --- a/include/linux/cleanup.h +++ b/include/linux/cleanup.h @@ -105,8 +105,8 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \ #define DEFINE_GUARD(_name, _type, _lock, _unlock) \ DEFINE_CLASS(_name, _type, _unlock, ({ _lock; _T; }), _type _T) -#define guard(_name) \ - CLASS(_name, __UNIQUE_ID(guard)) +#define guard(_name, args...) \ + CLASS(_name, __UNIQUE_ID(guard))(args) #define scoped_guard(_name, args...) \ for (CLASS(_name, scope)(args), \ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 802551e0009b..81acd7811db3 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1097,7 +1097,7 @@ int get_nohz_timer_target(void) hk_mask = housekeeping_cpumask(HK_TYPE_TIMER); - guard(rcu)(); + guard(rcu); for_each_domain(cpu, sd) { for_each_cpu_and(i, sched_domain_span(sd), hk_mask) { @@ -1827,7 +1827,7 @@ static int sysctl_sched_uclamp_handler(struct ctl_table *table, int write, int old_min, old_max, old_min_rt; int result; - guard(mutex)(&uclamp_mutex); + guard(mutex, &uclamp_mutex); old_min = sysctl_sched_uclamp_util_min; old_max = sysctl_sched_uclamp_util_max; @@ -3440,8 +3440,8 @@ static int migrate_swap_stop(void *data) src_rq = cpu_rq(arg->src_cpu); dst_rq = cpu_rq(arg->dst_cpu); - guard(double_raw_spinlock)(&arg->src_task->pi_lock, &arg->dst_task->pi_lock); - guard(double_rq_lock)(src_rq, dst_rq); + guard(double_raw_spinlock, &arg->src_task->pi_lock, &arg->dst_task->pi_lock); + guard(double_rq_lock, src_rq, dst_rq); if (task_cpu(arg->dst_task) != arg->dst_cpu) return -EAGAIN; @@ -3734,7 +3734,7 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags) __schedstat_inc(p->stats.nr_wakeups_remote); - guard(rcu)(); + guard(rcu); for_each_domain(rq->cpu, sd) { if (cpumask_test_cpu(cpu, sched_domain_span(sd))) { __schedstat_inc(sd->ttwu_wake_remote); @@ -3940,9 +3940,9 @@ void wake_up_if_idle(int cpu) { struct rq *rq = cpu_rq(cpu); - guard(rcu)(); + guard(rcu); if (is_idle_task(rcu_dereference(rq->curr))) { - guard(rq_lock_irqsave)(rq); + guard(rq_lock_irqsave, rq); if (is_idle_task(rq->curr)) resched_curr(rq); } @@ -4198,7 +4198,7 @@ bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success) */ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) { - guard(preempt)(); + guard(preempt); int cpu, success = 0; if (p == current) { @@ -5730,7 +5730,7 @@ static void sched_tick_remote(struct work_struct *work) * of when exactly it is running. */ if (tick_nohz_tick_stopped_cpu(cpu)) { - guard(rq_lock_irq)(rq); + guard(rq_lock_irq, rq); struct task_struct *curr = rq->curr; if (cpu_online(cpu)) { @@ -6297,8 +6297,8 @@ static bool try_steal_cookie(int this, int that) unsigned long cookie; bool success = false; - guard(irq)(); - guard(double_rq_lock)(dst, src); + guard(irq); + guard(double_rq_lock, dst, src); cookie = dst->core->core_cookie; if (!cookie) @@ -6410,7 +6410,7 @@ static void sched_core_cpu_starting(unsigned int cpu) struct rq *rq = cpu_rq(cpu), *core_rq = NULL; int t; - guard(core_lock)(&cpu); + guard(core_lock, &cpu); WARN_ON_ONCE(rq->core != rq); @@ -6449,7 +6449,7 @@ static void sched_core_cpu_deactivate(unsigned int cpu) struct rq *rq = cpu_rq(cpu), *core_rq = NULL; int t; - guard(core_lock)(&cpu); + guard(core_lock, &cpu); /* if we're the last man standing, nothing to do */ if (cpumask_weight(smt_mask) == 1) { diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c index 6f6a5fc85b42..724132f6109e 100644 --- a/lib/locking-selftest.c +++ b/lib/locking-selftest.c @@ -2527,8 +2527,8 @@ DEFINE_LOCK_GUARD_0(RCU_SCHED, rcu_read_lock_sched(), rcu_read_unlock_sched()) static void __maybe_unused inner##_in_##outer(void) \ { \ /* Relies the reversed clean-up ordering: inner first */ \ - guard(outer)(outer_lock); \ - guard(inner)(inner_lock); \ + guard(outer, outer_lock); \ + guard(inner, inner_lock); \ } /* ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] cleanup: Add conditional guard support 2023-11-02 14:40 ` Oleg Nesterov 2023-11-02 15:55 ` Oleg Nesterov @ 2023-11-03 9:30 ` Peter Zijlstra 2023-11-03 18:17 ` Linus Torvalds 1 sibling, 1 reply; 10+ messages in thread From: Peter Zijlstra @ 2023-11-03 9:30 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, x86, linux-kernel, Jonathan Cameron, Greg Kroah-Hartman, Eric Biederman On Thu, Nov 02, 2023 at 03:40:11PM +0100, Oleg Nesterov wrote: > On 11/02, Peter Zijlstra wrote: > > > > include/linux/cleanup.h | 52 ++++++++++++++++++++++++++++++++++++++++++++--- > > interesting... I don't know anything about cleanup.h, will > read this code and the patch later, but I guess I understand > the idea. > > Stupid/offtopic question... Can't we change guard() > > -#define guard(_name) \ > - CLASS(_name, __UNIQUE_ID(guard)) > +#define guard(_name, args...) \ > + CLASS(_name, __UNIQUE_ID(guard))(args) > > and update the current users? > > To me > > guard(rcu); > guard(spinlock, &lock); > > looks better than > > guard(rcu)(); > // doesn't match scoped_guard(spinlock, &lock) > guard(spinlock)(&lock); > > And this will make guard() consistent with scoped_guard(). > > No? Yes (and you're not the only one to have noticed), I think an earlier version actually had that. The current form came about in a fairly long thread with Linus. Most notably here: https://lkml.kernel.org/r/CAHk-%3DwgXN1YxGMUFeuC135aeUvqduF8zJJiZZingzS1Pao5h0A%40mail.gmail.com And I don't actually dislike the current guard form, I've been reading it like: guard<mutex>(&my_mutex); But that is arguably because I've done a fair few years of C++ systems programming before I got involved with this kernel thing. Also, we use a very similar syntax for the static_call thing: static_call(x86_pmu_enable)(event); That said; if we were to do this, then something like: #define __cond_guard(_name, _inst, _fail, args...) \ CLASS(_name, _inst)(args); \ if (!__guard_ptr(_name)(&_inst)) _fail #define cond_guard(_name, _fail, args...) \ __cond_guard(_name, __UNIQUE_ID(guard), _fail, args) cond_guard(spinlock_try, return -EBUSY, &my_lock); Becomes possible. Linus, do you like that enough to suffer a flag day patch as proposed by Oleg? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] cleanup: Add conditional guard support 2023-11-03 9:30 ` Peter Zijlstra @ 2023-11-03 18:17 ` Linus Torvalds 2023-11-03 18:51 ` Oleg Nesterov 0 siblings, 1 reply; 10+ messages in thread From: Linus Torvalds @ 2023-11-03 18:17 UTC (permalink / raw) To: Peter Zijlstra Cc: Oleg Nesterov, x86, linux-kernel, Jonathan Cameron, Greg Kroah-Hartman, Eric Biederman On Thu, 2 Nov 2023 at 23:30, Peter Zijlstra <peterz@infradead.org> wrote: > > On Thu, Nov 02, 2023 at 03:40:11PM +0100, Oleg Nesterov wrote: > > > > To me > > > > guard(rcu); > > guard(spinlock, &lock); > > > > looks better than > > > > guard(rcu)(); > > // doesn't match scoped_guard(spinlock, &lock) > > guard(spinlock)(&lock); > > > > And this will make guard() consistent with scoped_guard(). [...] > That said; if we were to do this, then something like: > > #define __cond_guard(_name, _inst, _fail, args...) \ > CLASS(_name, _inst)(args); \ > if (!__guard_ptr(_name)(&_inst)) _fail > > #define cond_guard(_name, _fail, args...) \ > __cond_guard(_name, __UNIQUE_ID(guard), _fail, args) > > cond_guard(spinlock_try, return -EBUSY, &my_lock); > > Becomes possible. > > Linus, do you like that enough to suffer a flag day patch as proposed by > Oleg? I don't find myself caring too much whether we have that "double grouping" of the guard type-vs-arguments or the "(type, arg...)" syntax. I honestly think that "guard(spinlock)(&lock)" makes it more visually obvious that the first argument is the "type of guard", while "guard(spinlock, &lock)" makes it look like the two arguments are somehow at the same level, which they most definitely aren't. But I also can't find it in myself to care too much about something that is so purely syntactic, and that I suspect should be abstracted away anyway to just become "guard_spinlock(&lock)" with a trivial helper macro. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] cleanup: Add conditional guard support 2023-11-03 18:17 ` Linus Torvalds @ 2023-11-03 18:51 ` Oleg Nesterov 0 siblings, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2023-11-03 18:51 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Zijlstra, x86, linux-kernel, Jonathan Cameron, Greg Kroah-Hartman, Eric Biederman On 11/03, Linus Torvalds wrote: > > On Thu, 2 Nov 2023 at 23:30, Peter Zijlstra <peterz@infradead.org> wrote: > > > > Linus, do you like that enough to suffer a flag day patch as proposed by > > Oleg? > > I don't find myself caring too much whether we have that "double > grouping" of the guard type-vs-arguments or the "(type, arg...)" > syntax. Neither me, > I honestly think that "guard(spinlock)(&lock)" makes it more visually > obvious that the first argument is the "type of guard", while > "guard(spinlock, &lock)" makes it look like the two arguments are > somehow at the same level, which they most definitely aren't. My point was that guard(spinlock)(&lock); doesn't match scoped_guard(spinlock, &lock); but I agree this purely cosmetic, so lets forget it. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] ptrace: Convert ptrace_attach() to use lock guards 2023-11-02 10:44 [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra 2023-11-02 10:44 ` [PATCH 1/2] cleanup: Add conditional guard support Peter Zijlstra @ 2023-11-02 10:44 ` Peter Zijlstra 2023-11-02 15:17 ` Oleg Nesterov 2023-11-02 15:34 ` [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra 2 siblings, 1 reply; 10+ messages in thread From: Peter Zijlstra @ 2023-11-02 10:44 UTC (permalink / raw) To: Linus Torvalds Cc: x86, linux-kernel, peterz, Jonathan Cameron, Greg Kroah-Hartman, Oleg Nesterov, Eric Biederman Created as testing for the conditional guard infrastructure. Specifically this makes use of the following form: scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR, &task->signal->cred_guard_mutex) { ... } ... return 0; Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- include/linux/sched/task.h | 2 include/linux/spinlock.h | 26 ++++++++ kernel/ptrace.c | 138 +++++++++++++++++++++------------------------ 3 files changed, 94 insertions(+), 72 deletions(-) --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -226,4 +226,6 @@ static inline void task_unlock(struct ta spin_unlock(&p->alloc_lock); } +DEFINE_GUARD(task_lock, struct task_struct *, task_lock(_T), task_unlock(_T)) + #endif /* _LINUX_SCHED_TASK_H */ --- a/include/linux/spinlock.h +++ b/include/linux/spinlock.h @@ -548,5 +548,31 @@ DEFINE_LOCK_GUARD_1(spinlock_irqsave, sp DEFINE_LOCK_GUARD_1_COND(spinlock_irqsave, _try, spin_trylock_irqsave(_T->lock, _T->flags)) +DEFINE_LOCK_GUARD_1(read_lock, rwlock_t, + read_lock(_T->lock), + read_unlock(_T->lock)) + +DEFINE_LOCK_GUARD_1(read_lock_irq, rwlock_t, + read_lock_irq(_T->lock), + read_unlock_irq(_T->lock)) + +DEFINE_LOCK_GUARD_1(read_lock_irqsave, rwlock_t, + read_lock_irqsave(_T->lock, _T->flags), + read_unlock_irqrestore(_T->lock, _T->flags), + unsigned long flags) + +DEFINE_LOCK_GUARD_1(write_lock, rwlock_t, + write_lock(_T->lock), + write_unlock(_T->lock)) + +DEFINE_LOCK_GUARD_1(write_lock_irq, rwlock_t, + write_lock_irq(_T->lock), + write_unlock_irq(_T->lock)) + +DEFINE_LOCK_GUARD_1(write_lock_irqsave, rwlock_t, + write_lock_irqsave(_T->lock, _T->flags), + write_unlock_irqrestore(_T->lock, _T->flags), + unsigned long flags) + #undef __LINUX_INSIDE_SPINLOCK_H #endif /* __LINUX_SPINLOCK_H */ --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -386,6 +386,34 @@ static int check_ptrace_options(unsigned return 0; } +static inline void ptrace_set_stopped(struct task_struct *task) +{ + guard(spinlock)(&task->sighand->siglock); + + /* + * If the task is already STOPPED, set JOBCTL_TRAP_STOP and + * TRAPPING, and kick it so that it transits to TRACED. TRAPPING + * will be cleared if the child completes the transition or any + * event which clears the group stop states happens. We'll wait + * for the transition to complete before returning from this + * function. + * + * This hides STOPPED -> RUNNING -> TRACED transition from the + * attaching thread but a different thread in the same group can + * still observe the transient RUNNING state. IOW, if another + * thread's WNOHANG wait(2) on the stopped tracee races against + * ATTACH, the wait(2) may fail due to the transient RUNNING. + * + * The following task_is_stopped() test is safe as both transitions + * in and out of STOPPED are protected by siglock. + */ + if (task_is_stopped(task) && + task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) { + task->jobctl &= ~JOBCTL_STOPPED; + signal_wake_up_state(task, __TASK_STOPPED); + } +} + static int ptrace_attach(struct task_struct *task, long request, unsigned long addr, unsigned long flags) @@ -393,17 +421,17 @@ static int ptrace_attach(struct task_str bool seize = (request == PTRACE_SEIZE); int retval; - retval = -EIO; if (seize) { if (addr != 0) - goto out; + return -EIO; /* * This duplicates the check in check_ptrace_options() because * ptrace_attach() and ptrace_setoptions() have historically * used different error codes for unknown ptrace options. */ if (flags & ~(unsigned long)PTRACE_O_MASK) - goto out; + return -EIO; + retval = check_ptrace_options(flags); if (retval) return retval; @@ -414,88 +442,54 @@ static int ptrace_attach(struct task_str audit_ptrace(task); - retval = -EPERM; if (unlikely(task->flags & PF_KTHREAD)) - goto out; + return -EPERM; if (same_thread_group(task, current)) - goto out; + return -EPERM; /* * Protect exec's credential calculations against our interference; * SUID, SGID and LSM creds get determined differently * under ptrace. */ - retval = -ERESTARTNOINTR; - if (mutex_lock_interruptible(&task->signal->cred_guard_mutex)) - goto out; - - task_lock(task); - retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS); - task_unlock(task); - if (retval) - goto unlock_creds; - - write_lock_irq(&tasklist_lock); - retval = -EPERM; - if (unlikely(task->exit_state)) - goto unlock_tasklist; - if (task->ptrace) - goto unlock_tasklist; - - task->ptrace = flags; - - ptrace_link(task, current); - - /* SEIZE doesn't trap tracee on attach */ - if (!seize) - send_sig_info(SIGSTOP, SEND_SIG_PRIV, task); + scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR, + &task->signal->cred_guard_mutex) { - spin_lock(&task->sighand->siglock); + scoped_guard (task_lock, task) { + retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS); + if (retval) + return retval; + } + + scoped_guard (write_lock, &tasklist_lock) { + if (unlikely(task->exit_state)) + return -EPERM; + if (task->ptrace) + return -EPERM; + + task->ptrace = flags; + + ptrace_link(task, current); + + /* SEIZE doesn't trap tracee on attach */ + if (!seize) + send_sig_info(SIGSTOP, SEND_SIG_PRIV, task); - /* - * If the task is already STOPPED, set JOBCTL_TRAP_STOP and - * TRAPPING, and kick it so that it transits to TRACED. TRAPPING - * will be cleared if the child completes the transition or any - * event which clears the group stop states happens. We'll wait - * for the transition to complete before returning from this - * function. - * - * This hides STOPPED -> RUNNING -> TRACED transition from the - * attaching thread but a different thread in the same group can - * still observe the transient RUNNING state. IOW, if another - * thread's WNOHANG wait(2) on the stopped tracee races against - * ATTACH, the wait(2) may fail due to the transient RUNNING. - * - * The following task_is_stopped() test is safe as both transitions - * in and out of STOPPED are protected by siglock. - */ - if (task_is_stopped(task) && - task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) { - task->jobctl &= ~JOBCTL_STOPPED; - signal_wake_up_state(task, __TASK_STOPPED); + ptrace_set_stopped(task); + } } - spin_unlock(&task->sighand->siglock); - - retval = 0; -unlock_tasklist: - write_unlock_irq(&tasklist_lock); -unlock_creds: - mutex_unlock(&task->signal->cred_guard_mutex); -out: - if (!retval) { - /* - * We do not bother to change retval or clear JOBCTL_TRAPPING - * if wait_on_bit() was interrupted by SIGKILL. The tracer will - * not return to user-mode, it will exit and clear this bit in - * __ptrace_unlink() if it wasn't already cleared by the tracee; - * and until then nobody can ptrace this task. - */ - wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT, TASK_KILLABLE); - proc_ptrace_connector(task, PTRACE_ATTACH); - } + /* + * We do not bother to change retval or clear JOBCTL_TRAPPING + * if wait_on_bit() was interrupted by SIGKILL. The tracer will + * not return to user-mode, it will exit and clear this bit in + * __ptrace_unlink() if it wasn't already cleared by the tracee; + * and until then nobody can ptrace this task. + */ + wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT, TASK_KILLABLE); + proc_ptrace_connector(task, PTRACE_ATTACH); - return retval; + return 0; } /** ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] ptrace: Convert ptrace_attach() to use lock guards 2023-11-02 10:44 ` [PATCH 2/2] ptrace: Convert ptrace_attach() to use lock guards Peter Zijlstra @ 2023-11-02 15:17 ` Oleg Nesterov 0 siblings, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2023-11-02 15:17 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, x86, linux-kernel, Jonathan Cameron, Greg Kroah-Hartman, Eric Biederman On 11/02, Peter Zijlstra wrote: > > Created as testing for the conditional guard infrastructure. This patch scares me ;) I need to get used to guard/etc. But looks correct. Reviewed-by: Oleg Nesterov <oleg@redhat.com> > + /* SEIZE doesn't trap tracee on attach */ > + if (!seize) > + send_sig_info(SIGSTOP, SEND_SIG_PRIV, task); This is offtopic, but with or without this patch it is a bit ugly to drop ->siglock and take it again right after that. We can do (later) a minor cleanup on top of this change. --- x/kernel/ptrace.c 2023-11-02 16:03:37.646838530 +0100 +++ x/kernel/ptrace.c 2023-11-02 16:05:52.171052506 +0100 @@ -386,10 +386,14 @@ return 0; } -static inline void ptrace_set_stopped(struct task_struct *task) +static inline void ptrace_set_stopped(struct task_struct *task, bool seize) { guard(spinlock)(&task->sighand->siglock); + /* SEIZE doesn't trap tracee on attach */ + if (!seize) + send_signal_locked(SIGSTOP, SEND_SIG_PRIV, task, PIDTYPE_PID); + /* * If the task is already STOPPED, set JOBCTL_TRAP_STOP and * TRAPPING, and kick it so that it transits to TRACED. TRAPPING @@ -470,11 +474,6 @@ task->ptrace = flags; ptrace_link(task, current); - - /* SEIZE doesn't trap tracee on attach */ - if (!seize) - send_sig_info(SIGSTOP, SEND_SIG_PRIV, task); - ptrace_set_stopped(task); } } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] cleanup: Conditional locking support 2023-11-02 10:44 [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra 2023-11-02 10:44 ` [PATCH 1/2] cleanup: Add conditional guard support Peter Zijlstra 2023-11-02 10:44 ` [PATCH 2/2] ptrace: Convert ptrace_attach() to use lock guards Peter Zijlstra @ 2023-11-02 15:34 ` Peter Zijlstra 2 siblings, 0 replies; 10+ messages in thread From: Peter Zijlstra @ 2023-11-02 15:34 UTC (permalink / raw) To: Linus Torvalds Cc: x86, linux-kernel, Jonathan Cameron, Greg Kroah-Hartman, Oleg Nesterov, Eric Biederman On Thu, Nov 02, 2023 at 11:44:29AM +0100, Peter Zijlstra wrote: > My plan is to post the perf patches in 3 batches of roughly 10 patches each, > the simpler first and the more crazy ones (including the above) last. For those interested, the first batch just went out and can be found here: https://lkml.kernel.org/r/20231102150919.719936610@infradead.org ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-11-03 18:53 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-11-02 10:44 [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra 2023-11-02 10:44 ` [PATCH 1/2] cleanup: Add conditional guard support Peter Zijlstra 2023-11-02 14:40 ` Oleg Nesterov 2023-11-02 15:55 ` Oleg Nesterov 2023-11-03 9:30 ` Peter Zijlstra 2023-11-03 18:17 ` Linus Torvalds 2023-11-03 18:51 ` Oleg Nesterov 2023-11-02 10:44 ` [PATCH 2/2] ptrace: Convert ptrace_attach() to use lock guards Peter Zijlstra 2023-11-02 15:17 ` Oleg Nesterov 2023-11-02 15:34 ` [PATCH 0/2] cleanup: Conditional locking support Peter Zijlstra
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.