Re: Thread Safety Analysis and the Linux kernel

public inbox for linux-toolchains@vger.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: "Puchert, Aaron" <aaron.puchert@sap.com>
Cc: Marco Elver <elver@google.com>,
	Aaron Ballman <aaron@aaronballman.com>,
	"linux-toolchains@vger.kernel.org"
	<linux-toolchains@vger.kernel.org>,
	"llvm@lists.linux.dev" <llvm@lists.linux.dev>,
	Bart Van Assche <bvanassche@acm.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>
Subject: Re: Thread Safety Analysis and the Linux kernel
Date: Fri, 7 Mar 2025 09:52:04 +0100	[thread overview]
Message-ID: <20250307085204.GJ16878@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <DB7PR02MB3626056525E95B643D87CBC4E7CA2@DB7PR02MB3626.eurprd02.prod.outlook.com>

On Thu, Mar 06, 2025 at 11:14:21PM +0000, Puchert, Aaron wrote:

> So it boils down to
> 
> if (expr)
> 	acquire(mutex);
> do_something();
> if (expr)
> 	release(mutex);
> 
> If the "do_something()" block is rather short, as in the second example, it might also be an alternative to write:
> 
>          if (parent) {
>                  device_lock(parent);
>                  device_release_driver(dev);
>                  device_unlock(parent);
>          } else {
>                  device_release_driver(dev);
>          }
> 
> In any event, since the lock is only conditionally held, the block
> cannot require the mutex. So we can't be trying to satisfy
> "guarded_by" or "requires" attributes. We're just trying to avoid
> warnings about conditionally held capabilities. So at this point you
> might also be using "no_thread_safety_analysis". That's what we have
> typically done when we had this pattern.
> 
> > And I think we can expect a fair amount of resistance from maintainers
> > if we're going to go around rewriting their -- perfectly fine code,
> > thank you very much -- to please a static analyzer :/
> 
> I think there are some good arguments outside of the static analyzer.
> The pattern repeats the "if (expr)", and especially if "expr" is a bit
> more complex, this can be a source of errors:
> 
>                  if (!prog->sleepable)
>                          rcu_read_lock();
>  
>                  // ...
>  
>                  if (prog->sleepable)
>                          rcu_read_unlock();
> 
> This might slip through in a review, and could easily happen by
> copy-and-paste. (Of course you'll notice when testing this.) If you
> build some kind of conditional scope, it might look like this:
> 
>         scoped_conditional_guard (<rcu lock>, !prog->sleepable) {
>                 // ...
>         }
> 
> Another aspect is that even if the condition is symbolically the same,
> its value might change between locking and unlocking. If you write the
> condition once and evaluate it once, you can exclude both sources of
> errors.

Yeah, so IIRC I once proposed a guard that takes a NULL pointer to mean
not take the lock, but people had a bit of a fit.

It would've allowed writing the thing like:

	{
		guard(device)(parent);
		device_release_driver(dev);
	}

Anyway, yes I mostly agree, but this isn't necessarily code I have much
to say over. And while I can argue with the relevant maintainers, at
some point its their call if they are willing to accept a code change.

> > One of the more common patterns that I've talked about with Marco is
> > where modification requires two locks, while access requires either
> > lock.
> > 
> > The current __guarded_by() annotation cannot express this.
> > 
> > One possible way would be to construct a fake / 0-size capability
> > structure and have each of these locks acquire it in 'shared' mode (but
> > then you run into the recursion thing) and have a special annotation
> > that upgrades it to exclusive.
> > 
> > [...]
> > 
> > I would much rather write something like:
> > 
> > struct task_struct {
> >         raw_spinlock_t pi_lock;
> > 
> >         cpumask_t       cpus_allowed __guarded_by(pi_lock, rq->__lock);
> > }
> 
> There were ideas to support arbitrary Boolean expressions, but they
> wouldn't have allowed different expressions for read and write. So
> this would have been both too broad and not broad enough for your use
> case.

Right, that would work if you then also introduce __guarded_shared_by()
to express the requirements for access, it then becomes:

	cpumask_t	cpus_allowed __guarded_by(pi_lock && rq->__lock)
				     __guarded_shared_by(pi_lock || rq->__lock);

Which is a little more verbose over all, but entirely more flexible.

+- naming, I'm occasionally rubbish at naming :-)

> Your idea on the other hand sounds pretty reasonable to me. We already
> support multiple arguments in many other attributes, so it seems like
> a consistent extension. And semantically it makes sense to me as well:
> if a writer is required to hold all capabilities, then a reader should
> be fine if they hold any one. I'll have to think a bit more about
> whether this is always the right thing to do, but if it is, we have an
> easy way out.

If not all required locks are exclusive it comes apart.

Say, it needs a shared capability to satisfy existence (refcount, rcu,
whatever) and an object internal lock to serialize state.

But for exclusive locks, yes, holding all for modification implies
stability while holding any (except the empty) subset.

> > And now I realize this case is doubly annoying because (rq) isn't
> > something that is in lexical scope at this point :-( These are the
> > locking rules though, and it is vexing not being able to express things
> > like this.
> 
> If it's not in scope, how do we know we have the right thing? Is
> task_struct always contained in some larger object that also has "rq",
> and never outside of that?

See that __task_rq_lock() example I used for the return value. A task is
always on _a_ runqueue, but unless we hold that rq's lock, the relation
is not stable.

> The language doesn't really prevent you from having a lonely
> task_struct, and then it wouldn't be clear what we're referring to.

Yeah, this isn't something that can be expressed.

> A way out might be to consider the task_struct field in the larger
> object as guarded_by, and find a way to exclude the contained mutex.

> We could also introduce some kind of guarded_by_member(lock, member)
> to be added in the larger object. This would say that "lock" guards
> only the "member" subfield of the field. However, both would only
> apply if we're accessing the member through the parent. If you hand
> the task_struct by itself to some function and let that access
> cpus_allowed, we don't know what you're talking about.
> 
> But maybe you can explain a bit how the objects are coupled before I
> speculate too much into this.

:-)

Right, so we have a per-cpu struct rq, and a task has a member
indicating which cpu it currently belongs to.

struct rq {
	raw_spinlock_t __lock;
};

DEFINE_PER_CPU(struct rq, runqueues);

#define cpu_rq(cpu)	(&per_cpu(runqueues, (cpu)))

/*
 * And I suppose we can imagine runqueues to simply be an array
 * and per_cpu() to index into said array. Although the reality is that
 * it is much more like thread-local-storage in userspace.
 */

struct task_struct {
	raw_spinlock_t pi_lock;
	int cpu;
}

int task_cpu(struct task_struct *p)
{
	return p->cpu;
}

#define task_rq(p) cpu_rq(task_cpu(p))

struct rq *__task_rq_lock(struct task_struct *p)
	__must_hold(p->pi_lock)
	__acquires(_Return->__lock)
{
	struct rq *rq;

	lockdep_assert_held(&p->pi_lock);

	for (;;) {
		rq = task_rq(p);
		raw_spin_rq_lock(rq);
		if (likely(rq == task_rq(p) && !task_on_rq_migrating(p)))
			return rq;
		raw_spin_rq_unlock(rq);

		while (unlikelt(task_on_rq_migrating(p)))
			cpu_relax();
	}
}

And then we have a pile of variables, like the allowed CPU mask, that
are only to be modified while holding both pi_lock and the correct
rq->lock.

And yes, this is all somewhat of a pain :-)

Now, luck will have it that I am in fact one of the scheduler
maintainers, so I can muck about with this a bit -- within reason.

One thing we could do is add a struct rq pointer -- duplicating the cpu
member state, like:

struct task_struct {
	raw_spinlock_t pikk
	int cpu;
	struct rq *rq;
};

Possibly just for verification builds, but migrations are expensive
enough that I don't think tracking this one extra variable is going to
be noticed much.

This would give us means of writing what we want:

	cpumask	cpus_allowed __guarded_by(pi_lock, rq->__lock);

next prev parent reply	other threads:[~2025-03-07  8:52 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-05 11:47 Thread Safety Analysis and the Linux kernel Marco Elver
2025-03-05 23:54 ` Puchert, Aaron
2025-03-06  9:47   ` Peter Zijlstra
2025-03-06 16:18     ` Bart Van Assche
2025-03-07  8:07       ` Peter Zijlstra
2025-03-07 21:50       ` Puchert, Aaron
2025-03-07 21:46     ` Puchert, Aaron
2025-03-06 10:08   ` Peter Zijlstra
2025-03-06 22:18     ` Puchert, Aaron
2025-03-07  7:59       ` Peter Zijlstra
2025-03-07 14:13         ` Peter Zijlstra
2025-03-06 10:37   ` Peter Zijlstra
2025-03-06 23:14     ` Puchert, Aaron
2025-03-07  8:52       ` Peter Zijlstra [this message]
2025-03-07 12:52         ` Peter Zijlstra
2025-03-07 14:22           ` Greg Kroah-Hartman
2025-03-07 14:35             ` Peter Zijlstra
2025-03-08  6:06               ` Greg Kroah-Hartman
2025-03-07 23:03         ` Puchert, Aaron
2025-03-06 17:11   ` Paul E. McKenney
2025-03-06 23:24     ` Puchert, Aaron
2025-03-06 23:44       ` Paul E. McKenney
2025-03-07 17:59         ` Puchert, Aaron
2025-03-07 18:24           ` Paul E. McKenney
2025-03-07 12:00   ` Marco Elver
2025-05-05 13:44 ` Marco Elver
2025-06-05 12:44   ` Marco Elver
2025-09-18 10:37     ` Marco Elver
2025-09-18 11:10       ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250307085204.GJ16878@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=aaron.puchert@sap.com \
    --cc=aaron@aaronballman.com \
    --cc=boqun.feng@gmail.com \
    --cc=bvanassche@acm.org \
    --cc=elver@google.com \
    --cc=linux-toolchains@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=paulmck@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox