From mboxrd@z Thu Jan  1 00:00:00 1970
From: Gregory Haskins <gregory.haskins@gmail.com>
Subject: Re: [PATCH RT RFC v4 1/8] add generalized priority-inheritance interface
Date: Fri, 15 Aug 2008 16:32:43 -0400
Message-ID: <48A5E7EB.6020000@gmail.com>
References: <20080815202408.668.23736.stgit@dev.haskins.net> <20080815202823.668.26199.stgit@dev.haskins.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8;
	format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: mingo@elte.hu, paulmck@linux.vnet.ibm.com, peterz@infradead.org,
	tglx@linutronix.de, rostedt@goodmis.org,
	linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org,
	David.Holmes@sun.com, jkacur@gmail.com
To: Gregory Haskins <ghaskins@novell.com>
Return-path: <linux-rt-users-owner@vger.kernel.org>
Received: from py-out-1112.google.com ([64.233.166.177]:47928 "EHLO
	py-out-1112.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1761460AbYHOUe6 (ORCPT
	<rfc822;linux-rt-users@vger.kernel.org>);
	Fri, 15 Aug 2008 16:34:58 -0400
Received: by py-out-1112.google.com with SMTP id p76so988579pyb.10
        for <linux-rt-users@vger.kernel.org>; Fri, 15 Aug 2008 13:34:57 -0700 (PDT)
In-Reply-To: <20080815202823.668.26199.stgit@dev.haskins.net>
Sender: linux-rt-users-owner@vger.kernel.org
List-ID: <linux-rt-users.vger.kernel.org>

Gregory Haskins wrote:
> The kernel currently addresses priority-inversion through priority-
> inheritance.  However, all of the priority-inheritance logic is
> integrated into the Real-Time Mutex infrastructure.  This causes a few
> problems:
>
>  1) This tightly coupled relationship makes it difficult to extend PI to
>     other areas of the kernel (for instance, pi-aware wait-queues may
>     be desirable).
>  2) Enhancing the rtmutex infrastructure becomes challenging because
>     there is no separation between the locking code and the pi-code.
>
> This patch aims to rectify these shortcomings by designing a stand-alone
> pi framework which can then be used to replace the rtmutex-specific
> version.  The goal of this framework is to provide similar functionality
> to the existing subsystem, but with sole focus on PI and the
> relationships between objects that can boost priority and the objects
> that get boosted.
>
> We introduce the concepts of a "pi_source" and a "pi_sink" which, as
> the names suggest, model the basic relationship between a priority
> source and its boosted target.  A pi_source acts as a reference to some
> arbitrary source of priority, and a pi_sink can be boosted (or
> deboosted) by a pi_source.  For more details, please read the library
> documentation.
>
> There are currently no users of this interface.
>
> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
> ---
>
>  Documentation/libpi.txt |   59 ++++++
>  include/linux/pi.h      |  293 ++++++++++++++++++++++++++++
>  lib/Makefile            |    3
>  lib/pi.c                |  489 +++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 843 insertions(+), 1 deletions(-)
>  create mode 100644 Documentation/libpi.txt
>  create mode 100644 include/linux/pi.h
>  create mode 100644 lib/pi.c
>
> diff --git a/Documentation/libpi.txt b/Documentation/libpi.txt
> new file mode 100644
> index 0000000..197b21a
> --- /dev/null
> +++ b/Documentation/libpi.txt
> @@ -0,0 +1,59 @@
> +lib/pi.c - Priority Inheritance library
> +
> +Sources and sinks:
> +------------------
> +
> +This library introduces the concepts of a "pi_source" and a "pi_sink" which, as the names suggest, model the basic relationship between a priority source and its boosted target.
> +
> +A pi_source is simply a reference to some arbitrary priority value that may range from 0 (highest prio) to MAX_PRIO (currently 140, lowest prio).  A pi_source calls pi_sink.boost() whenever it wishes to boost the sink to (at least minimally) the priority value that the source represents.  It uses pi_sink.boost() both for the initial boost and for any subsequent refresh of the value (even if the value is decreasing in logical priority).  The policy of the sink dictates what happens as a result of that boost.  Likewise, a pi_source calls pi_sink.deboost() to stop contributing to the sink's minimum priority.
> +
> +It is important to note that a source is a reference to a priority value, not a value itself.  This is one of the concepts that allows the interface to be idempotent, which is important for updating a chain of sources and sinks in the proper order.  If we passed the priority by value on the stack, the value finally set could depend on the order in which concurrent updates happen to execute (i.e. it would race).
> +
> +Nodes:
> +
> +A pi_node is a convenience object which is simultaneously a source and a sink.  As its name suggests, it would typically be deployed as a node in a pi-chain.  Other pi_sources can boost a node via its pi_sink.boost() interface.  Likewise, a node can boost a fixed number of sinks via the node.add_sink() interface.
> +
> +Generally speaking, a node takes care of many common operations associated with being a “link in the chain”, such as:
> +
> +	1) determining the current priority of the node based on the (logically) highest priority source that is boosting the node.
> +	2) boosting/deboosting upstream sinks whenever the node locally changes priority.
> +	3) taking care to avoid deadlock during a chain update.
> +
> +Design details:
> +
> +Destruction:
> +
> +The pi-library objects are designed to be implicitly destructible (meaning they do not require an explicit “free()” operation when they are no longer used).  This is important considering their intended use (spinlock_t objects, which are also implicitly destructible).  As such, any allocations needed for operation must come from internal structure storage, as there will be no opportunity to free them later.
> +
> +Multiple sinks per Node:
> +
> +We allow multiple sinks to be associated with a node.  This is a slight departure from the previous implementation, which had the notion of only a single sink (i.e. “task->pi_blocked_on”).  The ability to add more than one sink was not added to change the default chaining model (i.e. to allow multiple boost targets), but rather to provide a flexible notification mechanism that is peripheral to the chain, through what are informally called “leaf sinks”.
> +
> +Leaf-sinks are boostable objects that do not perpetuate a chain per se.  Rather, they act as endpoints of a priority boost.  Ultimately, every chain ends with a leaf-sink, which presumably will act on the new priority information.  However, there may be any number of leaf-sinks along a chain as well.  Each one will act on its localized priority in its own implementation-specific way.  For instance, a task_struct leaf-sink may change the priority of the task and reschedule it if necessary, whereas an rwlock leaf-sink may boost a list of reader-owners.
> +
> +The following diagram depicts an example relationship (warning: cheesy ASCII art):
> +
> +                   ---------       ---------
> +                   | leaf  |       | leaf  |
> +                   ---------       ---------
> +                  /               /
> +     ---------   / ----------    / ---------     ---------
> +  ->-| node  |->---| node   |-->---| node  |->---| leaf  |
> +     ---------     ----------      ---------     ---------
> +
> +This was done to unify the notion of a “sink” under a single interface, rather than having something like task->pi_blocked_on and a separate callback for the leaf action.  Instead, any downstream object can be represented by a sink, and the implementation details are hidden (e.g. I'm a task, I'm a lock, I'm a node, I'm a work-item, I'm a wait-queue, etc).
> +
> +Sinkrefs:
> +
> +Each pi_sink.boost() operation is represented by a unique pi_source, to properly support a relationship of one node to many sinks.  Therefore, if a pi_node is to act as an aggregator for multiple sinks, it implicitly must have one internal pi_source object for every sink that is added (via node.add_sink()).  This pi_source object has to be internally managed for the lifetime of the sink reference.
> +
> +Recall that due to the implicit-destruction requirement above, and the fact that we will typically be executing in a preempt-disabled region, we have to be very careful about how we allocate references to those sinks.  Long story short, we limit the number of sinks per node to MAX_PI_DEPENDENCIES (currently 5) and allocate the references from a fixed pool embedded in the node.
> +
> +Locking:
> +
> +(work in progress....)
> +
> +
> +
> +
> +
> diff --git a/include/linux/pi.h b/include/linux/pi.h
> new file mode 100644
> index 0000000..5535474
> --- /dev/null
> +++ b/include/linux/pi.h
> @@ -0,0 +1,293 @@
> +/*
> + * see Documentation/libpi.txt for details
> + */
> +
> +#ifndef _LINUX_PI_H
> +#define _LINUX_PI_H
> +
> +#include <linux/list.h>
> +#include <linux/plist.h>
> +#include <asm/atomic.h>
> +
> +#define MAX_PI_DEPENDENCIES 5
> +
> +struct pi_source {
> +	struct plist_node  list;
> +	int               *prio;
> +	int                boosted;
> +};
> +
> +
> +#define PI_FLAG_DEFER_UPDATE     (1 << 0)
> +#define PI_FLAG_ALREADY_BOOSTED  (1 << 1)
> +
> +struct pi_sink;
> +
> +struct pi_sink_ops {
> +	int (*boost)(struct pi_sink *sink, struct pi_source *src,
> +		     unsigned int flags);
> +	int (*deboost)(struct pi_sink *sink, struct pi_source *src,
> +		       unsigned int flags);
> +	int (*update)(struct pi_sink *sink,
> +		      unsigned int flags);
> +	int (*free)(struct pi_sink *sink,
> +		      unsigned int flags);
> +};
> +
> +struct pi_sink {
> +	atomic_t            refs;
> +	struct pi_sink_ops *ops;
> +};
> +
> +enum pi_state {
> +	pi_state_boost,
> +	pi_state_boosted,
> +	pi_state_deboost,
> +	pi_state_free,
> +};
> +
> +/*
> + * NOTE: PI must always use a true (i.e. raw) spinlock, since it is used by
> + * the rtmutex infrastructure.
> + */
> +
> +struct pi_sinkref {
> +	raw_spinlock_t         lock;
> +	struct list_head       list;
> +	enum pi_state          state;
> +	struct pi_sink        *sink;
> +	struct pi_source       src;
> +	atomic_t               refs;
> +};
> +
> +struct pi_sinkref_pool {
> +	struct list_head       free;
> +	struct pi_sinkref      data[MAX_PI_DEPENDENCIES];
> +};
> +
> +struct pi_node {
> +	raw_spinlock_t         lock;
> +	int                    prio;
> +	struct pi_sink         sink;
> +	struct pi_sinkref_pool sinkref_pool;
> +	struct list_head       sinks;
> +	struct plist_head      srcs;
> +};
> +
> +/**
> + * pi_node_init - initialize a pi_node before use
> + * @node: a node context
> + */
> +extern void pi_node_init(struct pi_node *node);
> +
> +/**
> + * pi_add_sink - add a sink as a downstream object
> + * @node: the node context
> + * @sink: the sink context to add to the node
> + * @flags: optional flags to modify behavior
> + *   PI_FLAG_DEFER_UPDATE    - Do not perform sync update
> + *   PI_FLAG_ALREADY_BOOSTED - Do not perform initial boosting
> + *
> + * This function registers a sink to get notified whenever the
> + * node changes priority.
> + *
> + * Note: By default, this function will schedule the newly added sink
> + * to get an initial boost notification on the next update (even
> + * without the presence of a priority transition).  However, if the
> + * ALREADY_BOOSTED flag is specified, the sink is initially marked as
> + * BOOSTED and will only get notified if the node changes priority
> + * in the future.
> + *
> + * Note: By default, this function will synchronously update the
> + * chain unless the DEFER_UPDATE flag is specified.
> + *
> + * Returns: (int)
> + *   0 = success
> + *   any other value = failure
> + */
> +extern int pi_add_sink(struct pi_node *node, struct pi_sink *sink,
> +		       unsigned int flags);
> +
> +/**
> + * pi_del_sink - del a sink from the current downstream objects
> + * @node: the node context
> + * @sink: the sink context to delete from the node
> + * @flags: optional flags to modify behavior
> + *   PI_FLAG_DEFER_UPDATE    - Do not perform sync update
> + *
> + * This function unregisters a sink from the node.
> + *
> + * Note: The sink will not actually become fully deboosted until
> + * a call to node.update() successfully returns.
> + *
> + * Note: By default, this function will synchronously update the
> + * chain unless the DEFER_UPDATE flag is specified.
> + *
> + * Returns: (int)
> + *   0 = success
> + *   any other value = failure
> + */
> +extern int pi_del_sink(struct pi_node *node, struct pi_sink *sink,
> +		       unsigned int flags);
> +
> +/**
> + * pi_sink_init - initialize a pi_sink before use
> + * @sink: a sink context
> + * @ops: pointer to a pi_sink_ops structure
> + */
> +static inline void
> +pi_sink_init(struct pi_sink *sink, struct pi_sink_ops *ops)
> +{
> +	atomic_set(&sink->refs, 0);
> +	sink->ops = ops;
> +}
> +
> +/**
> + * pi_source_init - initialize a pi_source before use
> + * @src: a src context
> + * @prio: pointer to a priority value
> + *
> + * A pointer to a priority value is used so that boost and update
> + * are fully idempotent.
> + */
> +static inline void
> +pi_source_init(struct pi_source *src, int *prio)
> +{
> +	plist_node_init(&src->list, *prio);
> +	src->prio = prio;
> +	src->boosted = 0;
> +}
> +
> +/**
> + * pi_boost - boost a node with a pi_source
> + * @node: the node context
> + * @src: the src context to boost the node with
> + * @flags: optional flags to modify behavior
> + *   PI_FLAG_DEFER_UPDATE    - Do not perform sync update
> + *
> + * This function registers a priority source with the node, possibly
> + * boosting its value if the new source is the highest registered source.
> + *
> + * This function is used both to initially register a source and
> + * to notify the node if the value changes in the future (even if the
> + * priority is decreasing).
> + *
> + * Note: By default, this function will synchronously update the
> + * chain unless the DEFER_UPDATE flag is specified.
> + *
> + * Returns: (int)
> + *   0 = success
> + *   any other value = failure
> + */
> +static inline int
> +pi_boost(struct pi_node *node, struct pi_source *src, unsigned int flags)
> +{
> +	struct pi_sink *sink = &node->sink;
> +
> +	if (sink->ops->boost)
> +		return sink->ops->boost(sink, src, flags);
> +
> +	return 0;
> +}
> +
> +/**
> + * pi_deboost - deboost a pi_source from a node
> + * @node: the node context
> + * @src: the src context to deboost from the node
> + * @flags: optional flags to modify behavior
> + *   PI_FLAG_DEFER_UPDATE    - Do not perform sync update
> + *
> + * This function unregisters a priority source from the node, possibly
> + * deboosting its value if the departing source was the highest
> + * registered source.
> + *
> + * Note: By default, this function will synchronously update the
> + * chain unless the DEFER_UPDATE flag is specified.
> + *
> + * Returns: (int)
> + *   0 = success
> + *   any other value = failure
> + */
> +static inline int
> +pi_deboost(struct pi_node *node, struct pi_source *src, unsigned int flags)
> +{
> +	struct pi_sink *sink = &node->sink;
> +
> +	if (sink->ops->deboost)
> +		return sink->ops->deboost(sink, src, flags);
> +
> +	return 0;
> +}
> +
> +/**
> + * pi_update - force a manual chain update
> + * @node: the node context
> + * @flags: optional flags to modify behavior.  Reserved, must be 0.
> + *
> + * This function will push any priority changes (as a result of
> + * boost/deboost or add_sink/del_sink) down through the chain.
> + * If no changes are necessary, this function is a no-op.
> + *
> + * Returns: (int)
> + *   0 = success
> + *   any other value = failure
> + */
> +static inline int
> +pi_update(struct pi_node *node, unsigned int flags)
> +{
> +	struct pi_sink *sink = &node->sink;
> +
> +	if (sink->ops->update)
> +		return sink->ops->update(sink, flags);
> +
> +	return 0;
> +}
> +
> +/**
> + * pi_sink_put - decrement the reference count, freeing the sink if 0
> + * @sink: the sink context
> + * @flags: optional flags to modify behavior.  Reserved, must be 0.
> + *
> + * Returns: none
> + */
> +static inline void
> +pi_sink_put(struct pi_sink *sink, unsigned int flags)
> +{
> +	if (atomic_dec_and_test(&sink->refs)) {
> +		if (sink->ops->free)
> +			sink->ops->free(sink, flags);
> +	}
> +}
> +
> +
> +/**
> + * pi_get - up the reference count
> + * @node: the node context
> + * @flags: optional flags to modify behavior.  Reserved, must be 0.
> + *
> + * Returns: none
> + */
> +static inline void
> +pi_get(struct pi_node *node, unsigned int flags)
> +{
> +	struct pi_sink *sink = &node->sink;
> +
> +	atomic_inc(&sink->refs);
> +}
> +
> +/**
> + * pi_put - down the reference count, freeing the node if 0
> + * @node: the node context
> + * @flags: optional flags to modify behavior.  Reserved, must be 0.
> + *
> + * Returns: none
> + */
> +static inline void
> +pi_put(struct pi_node *node, unsigned int flags)
> +{
> +	struct pi_sink *sink = &node->sink;
> +
> +	pi_sink_put(sink, flags);
> +}
> +
> +#endif /* _LINUX_PI_H */
> diff --git a/lib/Makefile b/lib/Makefile
> index 5187924..df81ad7 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -23,7 +23,8 @@ lib-$(CONFIG_SMP) += cpumask.o
>  lib-y	+= kobject.o kref.o klist.o
>  
>  obj-y += div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
> -	 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o
> +	 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
> +	 pi.o
>  
>  ifeq ($(CONFIG_DEBUG_KOBJECT),y)
>  CFLAGS_kobject.o +=3D -DDEBUG
> diff --git a/lib/pi.c b/lib/pi.c
> new file mode 100644
> index 0000000..d00042c
> --- /dev/null
> +++ b/lib/pi.c
> @@ -0,0 +1,489 @@
> +/*
> + *  lib/pi.c
> + *
> + *  Priority-Inheritance library
> + *
> + *  Copyright (C) 2008 Novell
> + *
> + *  Author: Gregory Haskins <ghaskins@novell.com>
> + *
> + *  This code provides a generic framework for preventing priority
> + *  inversion by means of priority-inheritance. (see Documentation/libpi.txt
> + *  for details)
> + *
> + *  This library is free software; you can redistribute it and/or
> + *  modify it under the terms of the GNU General Public License
> + *  as published by the Free Software Foundation; version 2
> + *  of the License.
> + */
> +
> +#include <linux/sched.h>
> +#include <linux/module.h>
> +#include <linux/pi.h>
> +
> +
> +struct updater {
> +	int update;
> +	struct pi_sinkref *sinkref;
> +	struct pi_sink *sink;
> +};
> +
> +/*
> + *-----------------------------------------------------------
> + * pi_sinkref_pool
> + *-----------------------------------------------------------
> + */
> +
> +static void
> +pi_sinkref_pool_init(struct pi_sinkref_pool *pool)
> +{
> +	int i;
> +
> +	INIT_LIST_HEAD(&pool->free);
> +
> +	for (i = 0; i < MAX_PI_DEPENDENCIES; ++i) {
> +		struct pi_sinkref *sinkref = &pool->data[i];
> +
> +		memset(sinkref, 0, sizeof(*sinkref));
> +		INIT_LIST_HEAD(&sinkref->list);
> +		list_add_tail(&sinkref->list, &pool->free);
> +	}
> +}
> +
> +static struct pi_sinkref *
> +pi_sinkref_alloc(struct pi_sinkref_pool *pool)
> +{
> +	struct pi_sinkref *sinkref;
> +
> +	if (list_empty(&pool->free))
> +		return NULL;
> +
> +	sinkref = list_first_entry(&pool->free, struct pi_sinkref, list);
> +	list_del(&sinkref->list);
> +	memset(sinkref, 0, sizeof(*sinkref));
> +
> +	return sinkref;
> +}
> +
> +static void
> +pi_sinkref_free(struct pi_sinkref_pool *pool,
> +		  struct pi_sinkref *sinkref)
> +{
> +	list_add_tail(&sinkref->list, &pool->free);
> +}
> +
> +/*
> + *-----------------------------------------------------------
> + * pi_sinkref
> + *-----------------------------------------------------------
> + */
> +
> +static inline void
> +_pi_sink_get(struct pi_sinkref *sinkref)
> +{
> +	atomic_inc(&sinkref->sink->refs);
> +	atomic_inc(&sinkref->refs);
> +}
> +
> +static inline void
> +_pi_sink_put_local(struct pi_node *node, struct pi_sinkref *sinkref)
> +{
> +	if (atomic_dec_and_lock(&sinkref->refs, &node->lock)) {
> +		list_del(&sinkref->list);
> +		pi_sinkref_free(&node->sinkref_pool, sinkref);
> +		spin_unlock(&node->lock);
> +	}
> +}
> +
> +static inline void
> +_pi_sink_put_all(struct pi_node *node, struct pi_sinkref *sinkref)
> +{
> +	struct pi_sink *sink = sinkref->sink;
> +
> +	_pi_sink_put_local(node, sinkref);
> +	pi_sink_put(sink, 0);
> +}
> +
> +/*
> + *-----------------------------------------------------------
> + * pi_node
> + *-----------------------------------------------------------
> + */
> +
> +static struct pi_node *node_of(struct pi_sink *sink)
> +{
> +	return container_of(sink, struct pi_node, sink);
> +}
> +
> +static inline void
> +__pi_boost(struct pi_node *node, struct pi_source *src)
> +{
> +	BUG_ON(src->boosted);
> +
> +	plist_node_init(&src->list, *src->prio);
> +	plist_add(&src->list, &node->srcs);
> +	src->boosted = 1;
> +}
> +
> +static inline void
> +__pi_deboost(struct pi_node *node, struct pi_source *src)
> +{
> +	BUG_ON(!src->boosted);
> +
> +	plist_del(&src->list, &node->srcs);
> +	src->boosted = 0;
> +}
> +
> +/*
> + * _pi_node_update - update the chain
> + *
> + * We loop through up to MAX_PI_DEPENDENCIES times looking for stale entries
> + * that need to propagate up the chain.  This is a step-wise process where we
> + * have to be careful about locking and preemption.  By trying MAX_PI_DEPs
> + * times, we guarantee that this update routine is an effective barrier...
> + * all modifications made prior to the call to this barrier will have completed.
> + *
> + * Deadlock avoidance: This node may participate in a chain of nodes which
> + * form a graph of arbitrary structure.  While the graph should technically
> + * never close on itself barring any bugs, we still want to protect against
> + * a theoretical ABBA deadlock (if for nothing else, to prevent lockdep
> + * from detecting this potential).  To do this, we employ a dual-locking
> + * scheme where we can carefully control the order.  That is: node->lock
> + * protects most of the node's internal state, but it will never be held
> + * across a chain update.  sinkref->lock, on the other hand, can be held
> + * across a boost/deboost, and also guarantees proper execution order. Also
> + * note that no locks are held across a sink->update().
> + */
> +static int
> +_pi_node_update(struct pi_sink *sink, unsigned int flags)
> +{
> +	struct pi_node    *node = node_of(sink);
> +	struct pi_sinkref *sinkref;
> +	unsigned long      iflags;
> +	int                count = 0;
> +	int                i;
> +	int                pprio;
> +	struct updater     updaters[MAX_PI_DEPENDENCIES];
> +
> +	spin_lock_irqsave(&node->lock, iflags);
> +
> +	pprio = node->prio;
> +
> +	if (!plist_head_empty(&node->srcs))
> +		node->prio = plist_first(&node->srcs)->prio;
> +	else
> +		node->prio = MAX_PRIO;
> +
> +	list_for_each_entry(sinkref, &node->sinks, list) {
> +		/*
> +		 * If the priority is changing, or if this is a
> +		 * BOOST/DEBOOST, we consider this sink "stale"
> +		 */
> +		if (pprio != node->prio
> +		    || sinkref->state != pi_state_boosted) {
> +			struct updater *iter = &updaters[count++];
> +
> +			BUG_ON(!atomic_read(&sinkref->sink->refs));
> +			_pi_sink_get(sinkref);
> +
> +			iter->update  = 1;
> +			iter->sinkref = sinkref;
> +			iter->sink    = sinkref->sink;
> +		}
> +	}
> +
> +	spin_unlock(&node->lock);
> +
> +	for (i = 0; i < count; ++i) {
> +		struct updater    *iter = &updaters[i];
> +		unsigned int       lflags = PI_FLAG_DEFER_UPDATE;
> +		struct pi_sink    *sink;
> +
> +		sinkref = iter->sinkref;
> +		sink = iter->sink;
> +
> +		spin_lock(&sinkref->lock);
> +
> +		switch (sinkref->state) {
> +		case pi_state_boost:
> +			sinkref->state = pi_state_boosted;
> +			/* Fall through */
> +		case pi_state_boosted:
> +			sink->ops->boost(sink, &sinkref->src, lflags);
> +			break;
> +		case pi_state_deboost:
> +			sink->ops->deboost(sink, &sinkref->src, lflags);
> +			sinkref->state = pi_state_free;
> +
> +			/*
> +			 * drop the ref that we took when the sinkref
> +			 * was allocated.  We still hold a ref from
> +			 * above.
> +			 */
> +			_pi_sink_put_all(node, sinkref);
> +			break;
> +		case pi_state_free:
> +			iter->update = 0;
> +			break;
> +		default:
> +			panic("illegal sinkref type: %d", sinkref->state);
> +		}
> +
> +		spin_unlock(&sinkref->lock);
> +
> +		/*
> +		 * We drop the sinkref reference while preemption and irqs are
> +		 * still disabled so that its memory is returned synchronously
> +		 * to the node's sinkref pool.
> +		 */
> +		_pi_sink_put_local(node, sinkref);
> +	}
> +
> +	local_irq_restore(iflags);
> +
> +	/*
> +	 * Note: At this point, sinkref is invalid since we dropped our
> +	 * reference to it above, but sink is valid since we still hold the
> +	 * remote reference.  This is key to the design because it allows us
> +	 * to synchronously free the sinkref object, yet maintain a
> +	 * reference to the sink across the update.
> +	 */
> +	for (i = 0; i < count; ++i) {
> +		struct updater *iter = &updaters[i];
> +
> +		if (iter->update && iter->sink->ops->update)
> +			iter->sink->ops->update(iter->sink, 0);
> +	}
> +
> +	/*
> +	 * We perform all the free operations together at the end, using
> +	 * only automatic/stack variables, since any one of these operations
> +	 * could result in our node object being deallocated.
> +	 */
> +	for (i = 0; i < count; ++i) {
> +		struct updater *iter = &updaters[i];
> +
> +		pi_sink_put(iter->sink, 0);
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +_pi_del_sink(struct pi_node *node, struct pi_sink *sink, unsigned int flags)
> +{
> +	struct pi_sinkref *sinkref;
> +	struct updater     updaters[MAX_PI_DEPENDENCIES];
> +	unsigned long      iflags;
> +	int                count = 0;
> +	int                i;
> +
> +	local_irq_save(iflags);
> +	spin_lock(&node->lock);
> +
> +	list_for_each_entry(sinkref, &node->sinks, list) {
> +		if (!sink || sink == sinkref->sink) {
> +			struct updater *iter = &updaters[count++];
> +
> +			_pi_sink_get(sinkref);
> +			iter->sinkref = sinkref;
> +			iter->sink    = sinkref->sink;
> +		}
> +	}
> +
> +	spin_unlock(&node->lock);
> +
> +	for (i = 0; i < count; ++i) {
> +		struct updater *iter   = &updaters[i];
> +		int             remove = 0;
> +
> +		sinkref = iter->sinkref;
> +
> +		spin_lock(&sinkref->lock);
> +
> +		switch (sinkref->state) {
> +		case pi_state_boost:
> +			/*
> +			 * This state indicates the sink was never formally
> +			 * boosted so we can just delete it immediately
> +			 */
> +			remove = 1;
> +			break;
> +		case pi_state_boosted:
> +			if (sinkref->sink->ops->deboost)
> +				/*
> +				 * If the sink supports deboost notification,
> +				 * schedule it for deboost at the next update
> +				 */
> +				sinkref->state = pi_state_deboost;
> +			else
> +				/*
> +				 * ..otherwise schedule it for immediate
> +				 * removal
> +				 */
> +				remove = 1;
> +			break;
> +		default:
> +			break;
> +		}
> +
> +		if (remove) {
> +			/*
> +			 * drop the ref that we took when the sinkref
> +			 * was allocated.  We still hold a ref from
> +			 * above
> +			 */
> +			_pi_sink_put_all(node, sinkref);
> +			sinkref->state = pi_state_free;
> +		}
> +
> +		spin_unlock(&sinkref->lock);
> +
> +		_pi_sink_put_local(node, sinkref);
> +	}
> +
> +	local_irq_restore(iflags);
> +
> +	for (i = 0; i < count; ++i)
> +		pi_sink_put(updaters[i].sink, 0);
> +
> +	if (!(flags & PI_FLAG_DEFER_UPDATE))
> +		_pi_node_update(&node->sink, 0);
> +
> +	return 0;
> +}
> +
> +static int
> +_pi_node_boost(struct pi_sink *sink, struct pi_source *src,
> +	       unsigned int flags)
> +{
> +	struct pi_node *node = node_of(sink);
> +	unsigned long   iflags;
> +
> +	spin_lock_irqsave(&node->lock, iflags);
> +	if (src->boosted)
> +		__pi_deboost(node, src);
> +	__pi_boost(node, src);
> +	spin_unlock_irqrestore(&node->lock, iflags);
> +
> +	if (!(flags & PI_FLAG_DEFER_UPDATE))
> +		_pi_node_update(sink, 0);
> +
> +	return 0;
> +}
> +
> +static int
> +_pi_node_deboost(struct pi_sink *sink, struct pi_source *src,
> +		 unsigned int flags)
> +{
> +	struct pi_node *node = node_of(sink);
> +	unsigned long   iflags;
> +
> +	spin_lock_irqsave(&node->lock, iflags);
> +	__pi_deboost(node, src);
> +	spin_unlock_irqrestore(&node->lock, iflags);
> +
> +	if (!(flags & PI_FLAG_DEFER_UPDATE))
> +		_pi_node_update(sink, 0);
> +
> +	return 0;
> +}
> +
> +static int
> +_pi_node_free(struct pi_sink *sink, unsigned int flags)
> +{
> +	struct pi_node *node = node_of(sink);
> +
> +	/*
> +	 * When the node is freed, we should perform an implicit
> +	 * del_sink on any remaining sinks we may have.
> +	 */
> +	return _pi_del_sink(node, NULL, flags);
> +}
> +
> +static struct pi_sink_ops pi_node_sink = {
> +	.boost   = _pi_node_boost,
> +	.deboost = _pi_node_deboost,
> +	.update  = _pi_node_update,
> +	.free    = _pi_node_free,
> +};
> +
> +void
> +pi_node_init(struct pi_node *node)
> +{
> +	spin_lock_init(&node->lock);
> +	node->prio = MAX_PRIO;
> +	atomic_set(&node->sink.refs, 1);
> +	node->sink.ops = &pi_node_sink;
>
                  ^^^^^^

Note to self:  this should use pi_sink_init()


> +	pi_sinkref_pool_init(&node->sinkref_pool);
> +	INIT_LIST_HEAD(&node->sinks);
> +	plist_head_init(&node->srcs, &node->lock);
> +}
> +
> +int
> +pi_add_sink(struct pi_node *node, struct pi_sink *sink, unsigned int flags)
> +{
> +	struct pi_sinkref *sinkref;
> +	int                ret = 0;
> +	unsigned long      iflags;
> +
> +	spin_lock_irqsave(&node->lock, iflags);
> +
> +	if (!atomic_read(&node->sink.refs)) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	sinkref =3D pi_sinkref_alloc(&node->sinkref_pool);
> +	if (!sinkref) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	spin_lock_init(&sinkref->lock);
> +	INIT_LIST_HEAD(&sinkref->list);
> +
> +	if (flags & PI_FLAG_ALREADY_BOOSTED)
> +		sinkref->state = pi_state_boosted;
> +	else
> +		/*
> +		 * Schedule it for addition at the next update
> +		 */
> +		sinkref->state = pi_state_boost;
> +
> +	pi_source_init(&sinkref->src, &node->prio);
> +	sinkref->sink = sink;
> +
> +	/* set one ref from ourselves.  It will be dropped on del_sink */
> +	atomic_inc(&sinkref->sink->refs);
> +	atomic_set(&sinkref->refs, 1);
> +
> +	list_add_tail(&sinkref->list, &node->sinks);
> +
> +	spin_unlock_irqrestore(&node->lock, iflags);
> +
> +	if (!(flags & PI_FLAG_DEFER_UPDATE))
> +		_pi_node_update(&node->sink, 0);
> +
> +	return 0;
> +
> + out:
> +	spin_unlock_irqrestore(&node->lock, iflags);
> +
> +	return ret;
> +}
> +
> +int
> +pi_del_sink(struct pi_node *node, struct pi_sink *sink, unsigned int flags)
> +{
> +	/*
> +	 * There may be multiple matches for sink because a deboost/free
> +	 * may still be pending an update when the same sink has been
> +	 * added again.  So we want to process any and all instances
> +	 * that match our target.
> +	 */
> +	return _pi_del_sink(node, sink, flags);
> +}
> +
> +
> +
>
