* [OSSTEST PATCH] README.planner: Document the resource planning system
From: Ian Jackson @ 2014-11-20 18:07 UTC
  To: xen-devel; +Cc: Ian Jackson, Ian Campbell

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 README.planner |  181 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 180 insertions(+), 1 deletion(-)

diff --git a/README.planner b/README.planner
index de8b962..ec4dce8 100644
--- a/README.planner
+++ b/README.planner
@@ -1,4 +1,183 @@
-Resource planner / scheduler
+RESOURCE PLANNER (IE SCHEDULER)
+===============================
+
+Overall architecture
+--------------------
+
+Resources (eg hosts) are owned by `tasks'.  As resources are allocated
+and deallocated, their `owntaskid' in the database is updated.
+
+When a process wishes to allocate resources, it does as follows:
+
+ - Select an appropriate task.  For command-line use, the user@host
+   static task is usually used (as specified by the OSSTEST_TASK env var)
+   and things fail if it doesn't actually exist.
+
+   Automatic runs create a new ownd task for each job (in become-task
+   in JobDB-Executive.tcl, in sg-run-job).
+
+ - Connect to the queue daemon and participate in the planning
+   process.
+
+
+Planning
+--------
+
+The queue daemon sequences the planning of resource use and the
+allocation of resources.  This is done in a periodic planning cycle.
+Planning cycles are prompted by newly available resources, new
+requests for participation, and periodically.
+
+During each planning cycle we construct, from scratch, a complete plan
+for which resources are to be used, when, by which tasks.  Resources
+which are free and suitable for allocation right away are planned and
+allocated for immediate use.
+
+But the plan extends far enough into the future to cover all
+currently-foreseeable requirements for resources.  This provides the
+planning algorithms the most complete information about available
+tradeoffs, and also provides useful output (the resource plan) for
+administrators and users.
+
+Each planning cycle starts with the existing allocated resources.  The
+planning daemon records (on disk, not in the database) what expected
+duration was declared with each of those allocations.  (A task that
+has allocated the resources it needs does not any longer participate
+in the planning process, although it will retain a liveness connection
+to the ms-ownerdaemon.)
+
+Then each interested client of ms-queuedaemon is asked - one by one,
+in turn - to fill in the plan-under-construction with the resources it
+intends to use, and when.  Clients specify the expected duration of their
+use (but there is no mechanism for enforcing accuracy of these
+estimates).  ms-queuedaemon collates and records the provided
+information and passes it on to the next client.
+
+If there are resources which are available right now which a client
+wants to use, the client will allocate them there and then during its
+planning slot.
+
+The queueing order is determined by the job priority value.  Each
+client declares its own priority.  The usual basis for the priority is
+the client's starting time_t.  So by and large jobs execute in order.
+
+The main client in the planning process is
+ts-hosts-allocate-Executive.  That program contains the heuristics for
+choosing good tests hosts under various conditions.
+
+Command-line users can use mg-allocate -U to obtain resources through
+the planning process.  mg-allocate participates with a high queue
+priority so that command-line allocations will take precedence over
+automatic test runs.  (mg-allocate without -U bypasses the planner and
+can be used to `grab' resources which happen currently to be free.)
+
+The distinction between `idle' and `allocatable' resources exists so
+that newly-freed resources are properly offered first to the tasks at
+the front of the queue.  ms-ownerdaemon sets all idle resources to
+allocatable at the start of each planning cycle.
+
+
+ms-ownerdaemon and `ownd' tasks
+-------------------------------
+
+ms-ownerdaemon helps with cleanup and does nothing else.  Test runs
+connect to it and obtain ephemeral `task' ids.  All of the processes
+which are part of the test run retain a descriptor onto the
+socket connection to ms-ownerdaemon.  When the last holder of a copy
+of the socket connection fd dies, ms-ownerdaemon sees the connection
+close.  It then sets the task to `not live' in the database.
+
+This means that there is no need for any explicit cleanup: tasks
+which just crash have their resources freed automatically.
+
+If the ms-ownerdaemon fails and is restarted, the tasks which were
+clients of the previous ms-ownerdaemon cannot be automatically cleaned
+up.  The new ms-ownerdaemon will annotate them with `previous'.  The
+administrator can then clean them up manually, if she knows that all
+the corresponding actual processes are no longer running.
+
+
+Types of task
+-------------
+
+ * static tasks.  Usual for command-line use.  They are manually
+   created (with ./mg-hosts manual-task-create) and not normally ever
+   destroyed.
+
+ * `ownd' tasks.  These are used for production runs from cron and
+   some other mostly-automatic invocations of osstest (eg
+   mg-execute-flight).  They are automatically created and destroyed -
+   see above.
+
+ * magic task numbers with special meanings:
+
+     magic/allocatable
+
+        The resource is free and a process which is participating in
+        the planning process may allocate it to itself by updating
+        the `owntaskid' in the resources table to refer to its own
+        task.
+
+     magic/idle
+
+        The resource is free but has perhaps only recently become so.
+        It can be allocated outside the planning process, but proceses
+        participating in planning should regard the resource as
+        unavailable.
+
+     magic/shared
+
+        The resource has been divided into shares.  It is unavailable
+        in its own right without being unshared first.  The individual
+        shares have their own owners.
+
+     magic/preparing
+
+        Applies only to shares of a divided resource.  The share is
+        unavailable because the process handling the division is still
+        putting the resource into the proper state implied by the
+        sharing information (see below).
+
+
+Sharing
+-------
+
+Hosts can be shared between multiple clients.  The first client to
+decide to set up a host for sharing:
+
+ - `Divides' the resource in the database
+    * allocates the host to the taskid `shared' and creates a set
+      of new rows in the resources table to represent the shares
+      (the number of shares is fixed at this point)
+    * initially, sets all but one of those shares to be owned by
+      magic/preparing
+    * sets the remaining share to be owned by itself
+ - Performs whatever actions are necessary to get the host into
+   a suitable state for it and others to use it (eg, installing
+   the OS)
+ - Sets the other shares (those owned by magic/preparing) to `idle'
+   so that others can allocate them
+
+(During planning - ie, for resources not yet available immediately -
+the intent to do this can be part of the plan so that other tasks can
+see and take account of it.  The time necessary for preparing the host
+is not currently modelled during planning.)
+
+Likewise a process which finds a shared resource completely idle can
+unshare it.  That is:
+    * Check that all the shares are allocatable
+    * Delete all the rows representing the shares
+    * Claim ownership of the main resource by changing the owntaskid
+      from `shared' to the process's own task.
+
+Shared resources also have a `wear' counter, which is there to arrange
+that shared systems get regrooved occasionally even if nothing decides
+to unshare them.
+
+
+
+DETAILED PROTOCOL NOTES
+=======================
 
 ms-queuedaemon commands
 
-- 
1.7.10.4


* Re: [OSSTEST PATCH] README.planner: Document the resource planning system
From: Ian Campbell @ 2014-11-21 13:44 UTC
  To: Ian Jackson; +Cc: xen-devel

On Thu, 2014-11-20 at 18:07 +0000, Ian Jackson wrote:

> +   Automatic runs create a new ownd task for each job (in become-task

At first I thought ownd was a typo for owned, but I see that it is
called this in the source...

> +The main client in the planning process is
> +ts-hosts-allocate-Executive.  That program contains the heuristics for
> +choosing good tests hosts under various conditions.

s/tests/test/ I think, or maybe an apostrophe?

> +
> +Command-line users can use mg-allocate -U to obtain resources through
> +the planning process.  mg-allocate participates with a high queue
> +priority so that command-line allocations will take precedence over
> +automatic test runs.  (mg-allocate without -U bypasses the planner and
> +can be used to `grab' resources which happen currently to be free.)
> +
> +The distinction between `idle' and `allocatable' resources exists so
> +that newly-freed resources are properly offered first to the tasks at
> +the front of the queue.  ms-ownerdaemon sets all idle resources to
> +allocatable at the start of each planning cycle.

This paragraph makes the first reference to this idle vs. allocatable
concept, but seems to expect that these have already been discussed.

Perhaps s/The/A/ at the start would help but I think more is needed. It
is implied (I think) that the allocation strategy described in all of
the preceding paragraphs operates only on allocatable resources and not
idle ones, or maybe vice versa? Or maybe one or the other is only
available to tasks at the front of the queue? Anyway, can this be made
explicit please.

Having read further I now see this is described somewhat in the 'types of
task' section. So perhaps all which is needed is a forward reference, or
some rejigging of the ordering of the doc.
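
One possible reading of the idle/allocatable distinction, written out as
a minimal Python sketch.  Only the two magic task names and the
promotion at the start of a planning cycle come from the patch text; the
class, the function names and the use of strings for task ids are
hypothetical, and letting a non-planner `grab' take either kind of free
resource is an assumption.

```python
from dataclasses import dataclass

IDLE = "magic/idle"
ALLOCATABLE = "magic/allocatable"


@dataclass
class Resource:
    name: str
    owntaskid: str  # magic task name or a real task id (strings, for brevity)


def start_planning_cycle(resources):
    """Promote every idle resource to allocatable (the step the patch
    attributes to the start of each planning cycle)."""
    for res in resources:
        if res.owntaskid == IDLE:
            res.owntaskid = ALLOCATABLE


def planner_allocate(resources, taskid):
    """A planning participant treats idle resources as unavailable."""
    for res in resources:
        if res.owntaskid == ALLOCATABLE:
            res.owntaskid = taskid
            return res
    return None


def grab(resources, taskid):
    """Outside the planner (cf. mg-allocate without -U).
    Assumption: any free resource, idle or allocatable, may be taken."""
    for res in resources:
        if res.owntaskid in (IDLE, ALLOCATABLE):
            res.owntaskid = taskid
            return res
    return None


# A just-freed host is invisible to planners until the next cycle starts.
hosts = [Resource("host-a", IDLE)]
before_cycle = planner_allocate(hosts, "task-1")
start_planning_cycle(hosts)
after_cycle = planner_allocate(hosts, "task-1")
```

If that reading is right, the allocation strategy in the preceding
paragraphs operates on allocatable resources only, and a sentence saying
so would settle the question.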

> +If the ms-ownerdaemon fails and is restarted, the tasks which were
> +clients of the previous ms-ownerdaemon cannot be automatically cleaned
> +up.  The new ms-ownerdaemon will annotate them with `previous'.  The
> +administrator can then clean them up manually, if she knows that all
> +the corresponding actual processes are no longer running.

Perhaps the recipe for this cleanup could be added to README.dev?
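
For context on why a restarted daemon cannot clean these up: as the
patch describes it, liveness is tied to the socket connection itself,
and the connection only closes once every copy of the descriptor is
gone.  The Python sketch below illustrates just that property of
sockets.  It is not osstest code; the names are made up, and dup()
stands in for a descriptor inherited by a child process.

```python
import socket


def connection_is_live(daemon_side):
    """Return False once every copy of the peer's descriptor is closed.

    Note: this consumes one byte if the peer has sent data, which is
    acceptable here because the sketch never sends anything.
    """
    daemon_side.setblocking(False)
    try:
        return daemon_side.recv(1) != b""  # b"" is EOF: all holders gone
    except BlockingIOError:
        return True  # nothing to read, but the peer end is still open


daemon_side, task_fd = socket.socketpair()
inherited_copy = task_fd.dup()  # stand-in for a child's inherited fd

task_fd.close()
still_live = connection_is_live(daemon_side)    # one holder remains
inherited_copy.close()
now_dead = not connection_is_live(daemon_side)  # last holder has gone
daemon_side.close()
```

A new daemon process has no such connection to the old tasks, so it has
nothing to observe closing.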

> +
> +
> +Types of task
> +-------------
> +
> + * static tasks.  Usual for command-line use.  They are manually
> +   created (with ./mg-hosts manual-task-create) and not normally ever
> +   destroyed.

FWIW there is an explicit example of this in README.dev.

> +     magic/idle
> +
> +        The resource is free but has perhaps only recently become so.
> +        It can be allocated outside the planning process, but proceses

"processes".

> +        participating in planning should regard the resource as
> +        unavailable.
> +
> +     magic/shared
> +
> +        The resource has been divided into shares.  It is unavailable
> +        in its own right without being unshared first.  The individual
> +        shares have their own owners.

"See below for more information on sharing".

(I was just about to ask something which is answered down there...)

> +
> +Sharing
> +-------
> +Likewise a process which finds a shared resource completely idle can
> +unshare it.  That is:
> +    * Check that all the shares are allocatable
> +    * Delete all the rows representing the shares

Is there a race here? Or does the check actually imply allocating each of
them to itself?
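
To make the question concrete: if the check and the delete were separate
steps, another process could claim a share in between.  Below is a
minimal sketch of one way to close that window, doing the check, the
delete and the claim inside a single transaction.  It uses Python's
sqlite3 purely for illustration.  The `resources' table and `owntaskid'
column are named in the patch; the `name' and `shareof' columns, task
ids stored as strings, and the function itself are assumptions, and
nothing here is claimed to be how osstest does it.

```python
import sqlite3


def unshare(conn, host, taskid):
    """Try to unshare `host` on behalf of `taskid`; return True on success.

    Hypothetical schema: resources(name, shareof, owntaskid).
    `conn` must be opened with isolation_level=None so that the explicit
    BEGIN/COMMIT below are the only transaction control in effect.
    """
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before checking
    try:
        busy = conn.execute(
            "SELECT COUNT(*) FROM resources"
            " WHERE shareof = ? AND owntaskid != 'magic/allocatable'",
            (host,),
        ).fetchone()[0]
        is_shared = conn.execute(
            "SELECT COUNT(*) FROM resources"
            " WHERE name = ? AND shareof IS NULL"
            " AND owntaskid = 'magic/shared'",
            (host,),
        ).fetchone()[0]
        if busy or not is_shared:
            conn.execute("ROLLBACK")
            return False
        # Still holding the lock, so no share can change owner here.
        conn.execute("DELETE FROM resources WHERE shareof = ?", (host,))
        conn.execute(
            "UPDATE resources SET owntaskid = ?"
            " WHERE name = ? AND shareof IS NULL",
            (taskid, host),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Whether the real code relies on a lock like this, or on the planning
slot being exclusive, is exactly what the text should say.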
 
Ian.
