All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust
@ 2011-11-23 14:55 Dario Faggioli
  2011-11-23 15:07 ` [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers Dario Faggioli
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Dario Faggioli @ 2011-11-23 14:55 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross, Keir Fraser


[-- Attachment #1.1.1: Type: text/plain, Size: 1412 bytes --]

Hi everyone,

This series changes how locks are dealt with while adjusting domains'
scheduling parameters.

I've done and am still doing tests for credit and credit2, and it's
surviving to all I threw at it up to now. Unfortunately, I can't test
the sedf part yet, since it is not working on my test boxes due to other
issues (which I'm also trying to track down). I'm sending the series out
anyway, so that at least you can have a look at it, and maybe give a
spin on the sedf part, if you don't have anything more interesting to
do. ;-P

Juergen series on xl scheduling support attached to this mail, in the
form of a single patch, for testing convenience.

--

xen/common/sched_credit.c  |   10 +++++++---
xen/common/sched_credit2.c |   21 +++++++++++----------
xen/common/sched_sedf.c    |  131 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------
xen/common/schedule.c      |   34 ++--------------------------------
4 files changed, 130 insertions(+), 66 deletions(-)

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)



[-- Attachment #1.1.2: update-xl-sched-interface-from-Juergen.patch --]
[-- Type: text/x-patch, Size: 25627 bytes --]

# HG changeset patch
# Parent a80a577c34ca421fe97141eee329fb14b100faa2
Improved xl scheduling support from Juergen.

This is here as a single patch, just for convenience, in case
one wants to test the series. For details see
http://osdir.com/ml/xen-development/2011-11/msg01097.html .

diff -r a80a577c34ca docs/man/xl.pod.1
--- a/docs/man/xl.pod.1	Wed Nov 23 15:36:22 2011 +0100
+++ b/docs/man/xl.pod.1	Wed Nov 23 15:41:32 2011 +0100
@@ -579,25 +579,30 @@ default B<credit> is used for scheduling
 
 =over 4
 
-=item B<sched-credit> [ B<-d> I<domain-id> [ B<-w>[B<=>I<WEIGHT>] | B<-c>[B<=>I<CAP>] ] ]
+=item B<sched-credit> [I<OPTIONS>]
 
-Set credit scheduler parameters.  The credit scheduler is a
+Set or get credit scheduler parameters.  The credit scheduler is a
 proportional fair share CPU scheduler built from the ground up to be
 work conserving on SMP hosts.
 
 Each domain (including Domain0) is assigned a weight and a cap.
 
-B<PARAMETERS>
+B<OPTIONS>
 
 =over 4
 
-=item I<WEIGHT>
+=item B<-d DOMAIN>, B<--domain=DOMAIN>
+
+Specify domain for which scheduler parameters are to be modified or retrieved.
+Mandatory for modifying scheduler parameters.
+
+=item B<-w WEIGHT>, B<--weight=WEIGHT>
 
 A domain with a weight of 512 will get twice as much CPU as a domain
 with a weight of 256 on a contended host. Legal weights range from 1
 to 65535 and the default is 256.
 
-=item I<CAP>
+=item B<-c CAP>, B<--cap=CAP>
 
 The cap optionally fixes the maximum amount of CPU a domain will be
 able to consume, even if the host system has idle CPU cycles. The cap
@@ -605,6 +610,81 @@ is expressed in percentage of one physic
 50 is half a CPU, 400 is 4 CPUs, etc. The default, 0, means there is
 no upper cap.
 
+=item B<-p CPUPOOL>, B<--cpupool=CPUPOOL>
+
+Restrict output to domains in the specified cpupool.
+
+=back
+
+=item B<sched-credit2> [I<OPTIONS>]
+
+Set or get credit2 scheduler parameters.  The credit2 scheduler is a
+proportional fair share CPU scheduler built from the ground up to be
+work conserving on SMP hosts.
+
+Each domain (including Domain0) is assigned a weight.
+
+B<OPTIONS>
+
+=over 4
+
+=item B<-d DOMAIN>, B<--domain=DOMAIN>
+
+Specify domain for which scheduler parameters are to be modified or retrieved.
+Mandatory for modifying scheduler parameters.
+
+=item B<-w WEIGHT>, B<--weight=WEIGHT>
+
+A domain with a weight of 512 will get twice as much CPU as a domain
+with a weight of 256 on a contended host. Legal weights range from 1
+to 65535 and the default is 256.
+
+=item B<-p CPUPOOL>, B<--cpupool=CPUPOOL>
+
+Restrict output to domains in the specified cpupool.
+
+=back
+
+=item B<sched-sedf> [I<OPTIONS>]
+
+Set or get Simple EDF (Earliest Deadline First) scheduler parameters. This
+scheduler provides weighted CPU sharing in an intuitive way and uses
+realtime-algorithms to ensure time guarantees.  For more information see
+docs/misc/sedf_scheduler_mini-HOWTO.txt in the Xen distribution.
+
+B<OPTIONS>
+
+=over 4
+
+=item B<-d DOMAIN>, B<--domain=DOMAIN>
+
+Specify domain for which scheduler parameters are to be modified or retrieved.
+Mandatory for modifying scheduler parameters.
+
+=item B<-p PERIOD>, B<--period=PERIOD>
+
+The normal EDF scheduling usage in milliseconds.
+
+=item B<-s SLICE>, B<--slice=SLICE>
+
+The normal EDF scheduling usage in milliseconds.
+
+=item B<-l LATENCY>, B<--latency=LATENCY>
+
+Scaled period if domain is doing heavy I/O.
+
+=item B<-e EXTRA>, B<--extra=EXTRA>
+
+Flag for allowing domain to run in extra time (0 or 1).
+
+=item B<-w WEIGHT>, B<--weight=WEIGHT>
+
+Another way of setting CPU slice.
+
+=item B<-c CPUPOOL>, B<--cpupool=CPUPOOL>
+
+Restrict output to domains in the specified cpupool.
+
 =back
 
 =back
diff -r a80a577c34ca tools/libxl/libxl.c
--- a/tools/libxl/libxl.c	Wed Nov 23 15:36:22 2011 +0100
+++ b/tools/libxl/libxl.c	Wed Nov 23 15:41:32 2011 +0100
@@ -361,6 +361,7 @@ static void xcinfo2xlinfo(const xc_domai
     xlinfo->cpu_time = xcinfo->cpu_time;
     xlinfo->vcpu_max_id = xcinfo->max_vcpu_id;
     xlinfo->vcpu_online = xcinfo->nr_online_vcpus;
+    xlinfo->cpupool = xcinfo->cpupool;
 }
 
 libxl_dominfo * libxl_list_domain(libxl_ctx *ctx, int *nb_domain)
@@ -2678,7 +2679,7 @@ int libxl_sched_credit_domain_get(libxl_
 
     rc = xc_sched_credit_domain_get(ctx->xch, domid, &sdom);
     if (rc != 0) {
-        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting domain sched credit");
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting domain sched credit");
         return ERROR_FAIL;
     }
 
@@ -2728,6 +2729,103 @@ int libxl_sched_credit_domain_set(libxl_
     return 0;
 }
 
+int libxl_sched_credit2_domain_get(libxl_ctx *ctx, uint32_t domid, libxl_sched_credit2 *scinfo)
+{
+    struct xen_domctl_sched_credit2 sdom;
+    int rc;
+
+    rc = xc_sched_credit2_domain_get(ctx->xch, domid, &sdom);
+    if (rc != 0) {
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting domain sched credit2");
+        return ERROR_FAIL;
+    }
+
+    scinfo->weight = sdom.weight;
+
+    return 0;
+}
+
+int libxl_sched_credit2_domain_set(libxl_ctx *ctx, uint32_t domid, libxl_sched_credit2 *scinfo)
+{
+    struct xen_domctl_sched_credit2 sdom;
+    xc_domaininfo_t domaininfo;
+    int rc;
+
+    rc = xc_domain_getinfolist(ctx->xch, domid, 1, &domaininfo);
+    if (rc < 0) {
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting domain info list");
+        return ERROR_FAIL;
+    }
+    if (rc != 1 || domaininfo.domain != domid)
+        return ERROR_INVAL;
+
+
+    if (scinfo->weight < 1 || scinfo->weight > 65535) {
+        LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc,
+            "Cpu weight out of range, valid values are within range from 1 to 65535");
+        return ERROR_INVAL;
+    }
+
+    sdom.weight = scinfo->weight;
+
+    rc = xc_sched_credit2_domain_set(ctx->xch, domid, &sdom);
+    if ( rc < 0 ) {
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting domain sched credit2");
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+int libxl_sched_sedf_domain_get(libxl_ctx *ctx, uint32_t domid, libxl_sched_sedf *scinfo)
+{
+    uint64_t period;
+    uint64_t slice;
+    uint64_t latency;
+    uint16_t extratime;
+    uint16_t weight;
+    int rc;
+
+    rc = xc_sedf_domain_get(ctx->xch, domid, &period, &slice, &latency, &extratime, &weight);
+    if (rc != 0) {
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting domain sched sedf");
+        return ERROR_FAIL;
+    }
+
+    scinfo->period = period / 1000000;
+    scinfo->slice = slice / 1000000;
+    scinfo->latency = latency / 1000000;
+    scinfo->extratime = extratime;
+    scinfo->weight = weight;
+
+    return 0;
+}
+
+int libxl_sched_sedf_domain_set(libxl_ctx *ctx, uint32_t domid, libxl_sched_sedf *scinfo)
+{
+    xc_domaininfo_t domaininfo;
+    int rc;
+
+    rc = xc_domain_getinfolist(ctx->xch, domid, 1, &domaininfo);
+    if (rc < 0) {
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting domain info list");
+        return ERROR_FAIL;
+    }
+    if (rc != 1 || domaininfo.domain != domid)
+        return ERROR_INVAL;
+
+
+    rc = xc_sedf_domain_set(ctx->xch, domid, scinfo->period * 1000000,
+                            scinfo->slice * 1000000, scinfo->latency * 1000000,
+                            scinfo->extratime, scinfo->weight);
+    if ( rc < 0 ) {
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting domain sched sedf");
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
 static int trigger_type_from_string(char *trigger_name)
 {
     if (!strcmp(trigger_name, "nmi"))
diff -r a80a577c34ca tools/libxl/libxl.h
--- a/tools/libxl/libxl.h	Wed Nov 23 15:36:22 2011 +0100
+++ b/tools/libxl/libxl.h	Wed Nov 23 15:41:32 2011 +0100
@@ -567,6 +567,14 @@ int libxl_sched_credit_domain_get(libxl_
                                   libxl_sched_credit *scinfo);
 int libxl_sched_credit_domain_set(libxl_ctx *ctx, uint32_t domid,
                                   libxl_sched_credit *scinfo);
+int libxl_sched_credit2_domain_get(libxl_ctx *ctx, uint32_t domid,
+                                   libxl_sched_credit2 *scinfo);
+int libxl_sched_credit2_domain_set(libxl_ctx *ctx, uint32_t domid,
+                                   libxl_sched_credit2 *scinfo);
+int libxl_sched_sedf_domain_get(libxl_ctx *ctx, uint32_t domid,
+                                libxl_sched_sedf *scinfo);
+int libxl_sched_sedf_domain_set(libxl_ctx *ctx, uint32_t domid,
+                                libxl_sched_sedf *scinfo);
 int libxl_send_trigger(libxl_ctx *ctx, uint32_t domid,
                        char *trigger_name, uint32_t vcpuid);
 int libxl_send_sysrq(libxl_ctx *ctx, uint32_t domid, char sysrq);
diff -r a80a577c34ca tools/libxl/libxl_types.idl
--- a/tools/libxl/libxl_types.idl	Wed Nov 23 15:36:22 2011 +0100
+++ b/tools/libxl/libxl_types.idl	Wed Nov 23 15:41:32 2011 +0100
@@ -109,6 +109,7 @@ SHUTDOWN_* constant."""),
     ("cpu_time",    uint64),
     ("vcpu_max_id", uint32),
     ("vcpu_online", uint32),
+    ("cpupool",     uint32),
     ], dispose_fn=None)
 
 libxl_cpupoolinfo = Struct("cpupoolinfo", [
@@ -374,3 +375,15 @@ libxl_sched_credit = Struct("sched_credi
     ("weight", integer),
     ("cap", integer),
     ], dispose_fn=None)
+
+libxl_sched_credit2 = Struct("sched_credit2", [
+    ("weight", integer),
+    ], dispose_fn=None)
+
+libxl_sched_sedf = Struct("sched_sedf", [
+    ("period", integer),
+    ("slice", integer),
+    ("latency", integer),
+    ("extratime", integer),
+    ("weight", integer),
+    ], dispose_fn=None)
diff -r a80a577c34ca tools/libxl/xl.h
--- a/tools/libxl/xl.h	Wed Nov 23 15:36:22 2011 +0100
+++ b/tools/libxl/xl.h	Wed Nov 23 15:41:32 2011 +0100
@@ -55,6 +55,8 @@ int main_vcpuset(int argc, char **argv);
 int main_memmax(int argc, char **argv);
 int main_memset(int argc, char **argv);
 int main_sched_credit(int argc, char **argv);
+int main_sched_credit2(int argc, char **argv);
+int main_sched_sedf(int argc, char **argv);
 int main_domid(int argc, char **argv);
 int main_domname(int argc, char **argv);
 int main_rename(int argc, char **argv);
diff -r a80a577c34ca tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c	Wed Nov 23 15:36:22 2011 +0100
+++ b/tools/libxl/xl_cmdimpl.c	Wed Nov 23 15:41:32 2011 +0100
@@ -3699,29 +3699,204 @@ static int sched_credit_domain_set(
     return rc;
 }
 
-static void sched_credit_domain_output(
-    int domid, libxl_sched_credit *scinfo)
+static int sched_credit_domain_output(
+    int domid)
 {
     char *domname;
+    libxl_sched_credit scinfo;
+    int rc;
+
+    if (domid < 0) {
+        printf("%-33s %4s %6s %4s\n", "Name", "ID", "Weight", "Cap");
+        return 0;
+    }
+    rc = sched_credit_domain_get(domid, &scinfo);
+    if (rc)
+        return rc;
     domname = libxl_domid_to_name(ctx, domid);
     printf("%-33s %4d %6d %4d\n",
         domname,
         domid,
-        scinfo->weight,
-        scinfo->cap);
+        scinfo.weight,
+        scinfo.cap);
     free(domname);
+    return 0;
+}
+
+static int sched_credit2_domain_get(
+    int domid, libxl_sched_credit2 *scinfo)
+{
+    int rc;
+
+    rc = libxl_sched_credit2_domain_get(ctx, domid, scinfo);
+    if (rc)
+        fprintf(stderr, "libxl_sched_credit2_domain_get failed.\n");
+
+    return rc;
+}
+
+static int sched_credit2_domain_set(
+    int domid, libxl_sched_credit2 *scinfo)
+{
+    int rc;
+
+    rc = libxl_sched_credit2_domain_set(ctx, domid, scinfo);
+    if (rc)
+        fprintf(stderr, "libxl_sched_credit2_domain_set failed.\n");
+
+    return rc;
+}
+
+static int sched_credit2_domain_output(
+    int domid)
+{
+    char *domname;
+    libxl_sched_credit2 scinfo;
+    int rc;
+
+    if (domid < 0) {
+        printf("%-33s %4s %6s\n", "Name", "ID", "Weight");
+        return 0;
+    }
+    rc = sched_credit2_domain_get(domid, &scinfo);
+    if (rc)
+        return rc;
+    domname = libxl_domid_to_name(ctx, domid);
+    printf("%-33s %4d %6d\n",
+        domname,
+        domid,
+        scinfo.weight);
+    free(domname);
+    return 0;
+}
+
+static int sched_sedf_domain_get(
+    int domid, libxl_sched_sedf *scinfo)
+{
+    int rc;
+
+    rc = libxl_sched_sedf_domain_get(ctx, domid, scinfo);
+    if (rc)
+        fprintf(stderr, "libxl_sched_sedf_domain_get failed.\n");
+
+    return rc;
+}
+
+static int sched_sedf_domain_set(
+    int domid, libxl_sched_sedf *scinfo)
+{
+    int rc;
+
+    rc = libxl_sched_sedf_domain_set(ctx, domid, scinfo);
+    if (rc)
+        fprintf(stderr, "libxl_sched_sedf_domain_set failed.\n");
+
+    return rc;
+}
+
+static int sched_sedf_domain_output(
+    int domid)
+{
+    char *domname;
+    libxl_sched_sedf scinfo;
+    int rc;
+
+    if (domid < 0) {
+        printf("%-33s %4s %6s %-6s %7s %5s %6s\n", "Name", "ID", "Period", "Slice", "Latency", "Extra", "Weight");
+        return 0;
+    }
+    rc = sched_sedf_domain_get(domid, &scinfo);
+    if (rc)
+        return rc;
+    domname = libxl_domid_to_name(ctx, domid);
+    printf("%-33s %4d %6d %6d %7d %5d %6d\n",
+        domname,
+        domid,
+        scinfo.period,
+        scinfo.slice,
+        scinfo.latency,
+        scinfo.extratime,
+        scinfo.weight);
+    free(domname);
+    return 0;
+}
+
+static int sched_domain_output(
+    uint32_t sched, int (*output)(int), const char *cpupool)
+{
+    libxl_dominfo *info;
+    libxl_cpupoolinfo *poolinfo = NULL;
+    uint32_t poolid;
+    char *poolname;
+    int nb_domain, n_pools = 0, i, p;
+    int rc = 0;
+
+    if (cpupool) {
+        if (cpupool_qualifier_to_cpupoolid(cpupool, &poolid, NULL) ||
+            !libxl_cpupoolid_to_name(ctx, poolid)) {
+            fprintf(stderr, "unknown cpupool \'%s\'\n", cpupool);
+            return -ERROR_FAIL;
+        }
+    }
+
+    info = libxl_list_domain(ctx, &nb_domain);
+    if (!info) {
+        fprintf(stderr, "libxl_domain_infolist failed.\n");
+        return 1;
+    }
+    poolinfo = libxl_list_cpupool(ctx, &n_pools);
+    if (!poolinfo) {
+        fprintf(stderr, "error getting cpupool info\n");
+        return -ERROR_NOMEM;
+    }
+
+    for (p = 0; !rc && (p < n_pools); p++) {
+        if ((poolinfo[p].sched_id != sched) ||
+            (cpupool && (poolid != poolinfo[p].poolid)))
+            continue;
+
+        poolname = libxl_cpupoolid_to_name(ctx, poolinfo[p].poolid);
+        printf("Cpupool %s:\n", poolname);
+        free(poolname);
+
+        output(-1);
+        for (i = 0; i < nb_domain; i++) {
+            if (info[i].cpupool != poolinfo[p].poolid)
+                continue;
+            rc = output(info[i].domid);
+            if (rc)
+                break;
+        }
+    }
+    if (poolinfo) {
+        for (p = 0; p < n_pools; p++) {
+            libxl_cpupoolinfo_dispose(poolinfo + p);
+        }
+    }
+    return 0;
 }
 
 int main_sched_credit(int argc, char **argv)
 {
-    libxl_dominfo *info;
     libxl_sched_credit scinfo;
-    int nb_domain, i;
     const char *dom = NULL;
+    const char *cpupool = NULL;
     int weight = 256, cap = 0, opt_w = 0, opt_c = 0;
     int opt, rc;
-
-    while ((opt = def_getopt(argc, argv, "d:w:c:", "sched-credit", 0)) != -1) {
+    int option_index = 0;
+    static struct option long_options[] = {
+        {"domain", 1, 0, 'd'},
+        {"weight", 1, 0, 'w'},
+        {"cap", 1, 0, 'c'},
+        {"cpupool", 1, 0, 'p'},
+        {"help", 0, 0, 'h'},
+        {0, 0, 0, 0}
+    };
+
+    while (1) {
+        opt = getopt_long(argc, argv, "d:w:c:p:h", long_options, &option_index);
+        if (opt == -1)
+            break;
         switch (opt) {
         case 0: case 2:
             return opt;
@@ -3736,28 +3911,26 @@ int main_sched_credit(int argc, char **a
             cap = strtol(optarg, NULL, 10);
             opt_c = 1;
             break;
+        case 'p':
+            cpupool = optarg;
+            break;
+        case 'h':
+            help("sched-credit");
+            return 0;
         }
     }
 
+    if (cpupool && (dom || opt_w || opt_c)) {
+        fprintf(stderr, "Specifying a cpupool is not allowed with other options.\n");
+        return 1;
+    }
     if (!dom && (opt_w || opt_c)) {
         fprintf(stderr, "Must specify a domain.\n");
         return 1;
     }
 
     if (!dom) { /* list all domain's credit scheduler info */
-        info = libxl_list_domain(ctx, &nb_domain);
-        if (!info) {
-            fprintf(stderr, "libxl_domain_infolist failed.\n");
-            return 1;
-        }
-
-        printf("%-33s %4s %6s %4s\n", "Name", "ID", "Weight", "Cap");
-        for (i = 0; i < nb_domain; i++) {
-            rc = sched_credit_domain_get(info[i].domid, &scinfo);
-            if (rc)
-                return -rc;
-            sched_credit_domain_output(info[i].domid, &scinfo);
-        }
+        return -sched_domain_output(XEN_SCHEDULER_CREDIT, sched_credit_domain_output, cpupool);
     } else {
         find_domain(dom);
 
@@ -3766,8 +3939,8 @@ int main_sched_credit(int argc, char **a
             return -rc;
 
         if (!opt_w && !opt_c) { /* output credit scheduler info */
-            printf("%-33s %4s %6s %4s\n", "Name", "ID", "Weight", "Cap");
-            sched_credit_domain_output(domid, &scinfo);
+            sched_credit_domain_output(-1);
+            return -sched_credit_domain_output(domid);
         } else { /* set credit scheduler paramaters */
             if (opt_w)
                 scinfo.weight = weight;
@@ -3782,6 +3955,191 @@ int main_sched_credit(int argc, char **a
     return 0;
 }
 
+int main_sched_credit2(int argc, char **argv)
+{
+    libxl_sched_credit2 scinfo;
+    const char *dom = NULL;
+    const char *cpupool = NULL;
+    int weight = 256, opt_w = 0;
+    int opt, rc;
+    int option_index = 0;
+    static struct option long_options[] = {
+        {"domain", 1, 0, 'd'},
+        {"weight", 1, 0, 'w'},
+        {"cpupool", 1, 0, 'p'},
+        {"help", 0, 0, 'h'},
+        {0, 0, 0, 0}
+    };
+
+    while (1) {
+        opt = getopt_long(argc, argv, "d:w:p:h", long_options, &option_index);
+        if (opt == -1)
+            break;
+        switch (opt) {
+        case 0: case 2:
+            return opt;
+        case 'd':
+            dom = optarg;
+            break;
+        case 'w':
+            weight = strtol(optarg, NULL, 10);
+            opt_w = 1;
+            break;
+        case 'p':
+            cpupool = optarg;
+            break;
+        case 'h':
+            help("sched-credit");
+            return 0;
+        }
+    }
+
+    if (cpupool && (dom || opt_w)) {
+        fprintf(stderr, "Specifying a cpupool is not allowed with other options.\n");
+        return 1;
+    }
+    if (!dom && opt_w) {
+        fprintf(stderr, "Must specify a domain.\n");
+        return 1;
+    }
+
+    if (!dom) { /* list all domain's credit scheduler info */
+        return -sched_domain_output(XEN_SCHEDULER_CREDIT2, sched_credit2_domain_output, cpupool);
+    } else {
+        find_domain(dom);
+
+        rc = sched_credit2_domain_get(domid, &scinfo);
+        if (rc)
+            return -rc;
+
+        if (!opt_w) { /* output credit2 scheduler info */
+            sched_credit2_domain_output(-1);
+            return -sched_credit2_domain_output(domid);
+        } else { /* set credit2 scheduler paramaters */
+            if (opt_w)
+                scinfo.weight = weight;
+            rc = sched_credit2_domain_set(domid, &scinfo);
+            if (rc)
+                return -rc;
+        }
+    }
+
+    return 0;
+}
+
+int main_sched_sedf(int argc, char **argv)
+{
+    libxl_sched_sedf scinfo;
+    const char *dom = NULL;
+    const char *cpupool = NULL;
+    int period = 0, opt_p = 0;
+    int slice = 0, opt_s = 0;
+    int latency = 0, opt_l = 0;
+    int extra = 0, opt_e = 0;
+    int weight = 0, opt_w = 0;
+    int opt, rc;
+    int option_index = 0;
+    static struct option long_options[] = {
+        {"period", 1, 0, 'p'},
+        {"slice", 1, 0, 's'},
+        {"latency", 1, 0, 'l'},
+        {"extra", 1, 0, 'e'},
+        {"weight", 1, 0, 'w'},
+        {"cpupool", 1, 0, 'c'},
+        {"help", 0, 0, 'h'},
+        {0, 0, 0, 0}
+    };
+
+    while (1) {
+        opt = getopt_long(argc, argv, "d:p:s:l:e:w:c:h", long_options, &option_index);
+        if (opt == -1)
+            break;
+        switch (opt) {
+        case 0: case 2:
+            return opt;
+        case 'd':
+            dom = optarg;
+            break;
+        case 'p':
+            period = strtol(optarg, NULL, 10);
+            opt_p = 1;
+            break;
+        case 's':
+            slice = strtol(optarg, NULL, 10);
+            opt_s = 1;
+            break;
+        case 'l':
+            latency = strtol(optarg, NULL, 10);
+            opt_l = 1;
+            break;
+        case 'e':
+            extra = strtol(optarg, NULL, 10);
+            opt_e = 1;
+            break;
+        case 'w':
+            weight = strtol(optarg, NULL, 10);
+            opt_w = 1;
+            break;
+        case 'c':
+            cpupool = optarg;
+            break;
+        case 'h':
+            help("sched-sedf");
+            return 0;
+        }
+    }
+
+    if (cpupool && (dom || opt_p || opt_s || opt_l || opt_e || opt_w)) {
+        fprintf(stderr, "Specifying a cpupool is not allowed with other options.\n");
+        return 1;
+    }
+    if (!dom && (opt_p || opt_s || opt_l || opt_e || opt_w)) {
+        fprintf(stderr, "Must specify a domain.\n");
+        return 1;
+    }
+    if (opt_w && (opt_p || opt_s)) {
+        fprintf(stderr, "Specifying a weight AND period or slice is not allowed.\n");
+    }
+
+    if (!dom) { /* list all domain's credit scheduler info */
+        return -sched_domain_output(XEN_SCHEDULER_SEDF, sched_sedf_domain_output, cpupool);
+    } else {
+        find_domain(dom);
+
+        rc = sched_sedf_domain_get(domid, &scinfo);
+        if (rc)
+            return -rc;
+
+        if (!opt_p && !opt_s && !opt_l && !opt_e && !opt_w) { /* output sedf scheduler info */
+            sched_sedf_domain_output(-1);
+            return -sched_sedf_domain_output(domid);
+        } else { /* set sedf scheduler paramaters */
+            if (opt_p) {
+                scinfo.period = period;
+                scinfo.weight = 0;
+            }
+            if (opt_s) {
+                scinfo.slice = slice;
+                scinfo.weight = 0;
+            }
+            if (opt_l)
+                scinfo.latency = latency;
+            if (opt_e)
+                scinfo.extratime = extra;
+            if (opt_w) {
+                scinfo.weight = weight;
+                scinfo.period = 0;
+                scinfo.slice = 0;
+            }
+            rc = sched_sedf_domain_set(domid, &scinfo);
+            if (rc)
+                return -rc;
+        }
+    }
+
+    return 0;
+}
+
 int main_domid(int argc, char **argv)
 {
     int opt;
diff -r a80a577c34ca tools/libxl/xl_cmdtable.c
--- a/tools/libxl/xl_cmdtable.c	Wed Nov 23 15:36:22 2011 +0100
+++ b/tools/libxl/xl_cmdtable.c	Wed Nov 23 15:41:32 2011 +0100
@@ -192,10 +192,35 @@ struct cmd_spec cmd_table[] = {
     { "sched-credit",
       &main_sched_credit, 0,
       "Get/set credit scheduler parameters",
-      "[-d <Domain> [-w[=WEIGHT]|-c[=CAP]]]",
+      "[-d <Domain> [-w[=WEIGHT]|-c[=CAP]]] [-p CPUPOOL]",
       "-d DOMAIN, --domain=DOMAIN     Domain to modify\n"
       "-w WEIGHT, --weight=WEIGHT     Weight (int)\n"
-      "-c CAP, --cap=CAP              Cap (int)"
+      "-c CAP, --cap=CAP              Cap (int)\n"
+      "-p CPUPOOL, --cpupool=CPUPOOL  Restrict output to CPUPOOL"
+    },
+    { "sched-credit2",
+      &main_sched_credit2, 0,
+      "Get/set credit2 scheduler parameters",
+      "[-d <Domain> [-w[=WEIGHT]]] [-p CPUPOOL]",
+      "-d DOMAIN, --domain=DOMAIN     Domain to modify\n"
+      "-w WEIGHT, --weight=WEIGHT     Weight (int)\n"
+      "-p CPUPOOL, --cpupool=CPUPOOL  Restrict output to CPUPOOL"
+    },
+    { "sched-sedf",
+      &main_sched_sedf, 0,
+      "Get/set sedf scheduler parameters",
+      "[options]",
+      "-d DOMAIN, --domain=DOMAIN     Domain to modify\n"
+      "-p MS, --period=MS             Relative deadline(ms)\n"
+      "-s MS, --slice=MS              Worst-case execution time(ms). (slice <\n"
+      "                               period)\n"
+      "-l MS, --latency=MS            Scaled period (ms) when domain performs\n"
+      "                               heavy I/O\n"
+      "-e FLAG, --extra=FLAG          Flag (0 or 1) controls if domain can run\n"
+      "                               in extra time\n"
+      "-w FLOAT, --weight=FLOAT       CPU Period/slice (do not set with\n"
+      "                               --period/--slice)\n"
+      "-c CPUPOOL, --cpupool=CPUPOOL  Restrict output to CPUPOOL"
     },
     { "domid",
       &main_domid, 0,

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers.
  2011-11-23 14:55 [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust Dario Faggioli
@ 2011-11-23 15:07 ` Dario Faggioli
  2011-11-23 16:24   ` Ian Campbell
  2011-11-23 15:09 ` [RFC/RFT][PATCH 2 of 3] Remove VCPU pausing while adjusting domain scheduling parameters Dario Faggioli
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Dario Faggioli @ 2011-11-23 15:07 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross, Keir Fraser


[-- Attachment #1.1: Type: text/plain, Size: 8386 bytes --]

Pluggable schedulers' code knows what (and when) to lock better than
generic code, let's delegate to them all the concurrency related issues.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff -r 84b3e46aa7d2 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Wed Nov 23 12:03:37 2011 +0000
+++ b/xen/common/sched_credit.c	Wed Nov 23 15:09:14 2011 +0100
@@ -161,6 +161,7 @@ struct csched_dom {
  * System-wide private data
  */
 struct csched_private {
+    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
     spinlock_t lock;
     struct list_head active_sdom;
     uint32_t ncpus;
@@ -800,6 +801,10 @@ csched_dom_cntl(
     struct csched_private *prv = CSCHED_PRIV(ops);
     unsigned long flags;
 
+    /* Protect both get and put branches with the pluggable scheduler
+     * lock. Runq lock not needed anywhere in here. */
+    spin_lock_irqsave(&prv->lock, flags);
+
     if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
     {
         op->u.credit.weight = sdom->weight;
@@ -809,8 +814,6 @@ csched_dom_cntl(
     {
         ASSERT(op->cmd == XEN_DOMCTL_SCHEDOP_putinfo);
 
-        spin_lock_irqsave(&prv->lock, flags);
-
         if ( op->u.credit.weight != 0 )
         {
             if ( !list_empty(&sdom->active_sdom_elem) )
@@ -824,9 +827,10 @@ csched_dom_cntl(
         if ( op->u.credit.cap != (uint16_t)~0U )
             sdom->cap = op->u.credit.cap;
 
-        spin_unlock_irqrestore(&prv->lock, flags);
     }
 
+    spin_unlock_irqrestore(&prv->lock, flags);
+
     return 0;
 }
 
diff -r 84b3e46aa7d2 xen/common/sched_credit2.c
--- a/xen/common/sched_credit2.c	Wed Nov 23 12:03:37 2011 +0000
+++ b/xen/common/sched_credit2.c	Wed Nov 23 15:09:14 2011 +0100
@@ -1384,6 +1384,10 @@ csched_dom_cntl(
     struct csched_private *prv = CSCHED_PRIV(ops);
     unsigned long flags;
 
+    /* Must hold csched_priv lock to read and update sdom,
+     * runq lock to update csvcs. */
+    spin_lock_irqsave(&prv->lock, flags);
+
     if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
     {
         op->u.credit2.weight = sdom->weight;
@@ -1397,10 +1401,6 @@ csched_dom_cntl(
             struct list_head *iter;
             int old_weight;
 
-            /* Must hold csched_priv lock to update sdom, runq lock to
-             * update csvcs. */
-            spin_lock_irqsave(&prv->lock, flags);
-
             old_weight = sdom->weight;
 
             sdom->weight = op->u.credit2.weight;
@@ -1411,22 +1411,23 @@ csched_dom_cntl(
                 struct csched_vcpu *svc = list_entry(iter, struct csched_vcpu, sdom_elem);
 
                 /* NB: Locking order is important here.  Because we grab this lock here, we
-                 * must never lock csched_priv.lock if we're holding a runqueue
-                 * lock. */
-                vcpu_schedule_lock_irq(svc->vcpu);
+                 * must never lock csched_priv.lock if we're holding a runqueue lock.
+                 * Also, calling vcpu_schedule_lock() is enough, since IRQs have already
+                 * been disabled. */
+                vcpu_schedule_lock(svc->vcpu);
 
                 BUG_ON(svc->rqd != RQD(ops, svc->vcpu->processor));
 
                 svc->weight = sdom->weight;
                 update_max_weight(svc->rqd, svc->weight, old_weight);
 
-                vcpu_schedule_unlock_irq(svc->vcpu);
+                vcpu_schedule_unlock(svc->vcpu);
             }
-
-            spin_unlock_irqrestore(&prv->lock, flags);
         }
     }
 
+    spin_unlock_irqrestore(&prv->lock, flags);
+
     return 0;
 }
 
diff -r 84b3e46aa7d2 xen/common/sched_sedf.c
--- a/xen/common/sched_sedf.c	Wed Nov 23 12:03:37 2011 +0000
+++ b/xen/common/sched_sedf.c	Wed Nov 23 15:09:14 2011 +0100
@@ -1397,18 +1397,28 @@ static int sedf_adjust_weights(struct cp
 static int sedf_adjust(const struct scheduler *ops, struct domain *p, struct xen_domctl_scheduler_op *op)
 {
     struct vcpu *v;
-    int rc;
+    int rc = 0;
 
     PRINT(2,"sedf_adjust was called, domain-id %i new period %"PRIu64" "
           "new slice %"PRIu64"\nlatency %"PRIu64" extra:%s\n",
           p->domain_id, op->u.sedf.period, op->u.sedf.slice,
           op->u.sedf.latency, (op->u.sedf.extratime)?"yes":"no");
 
+    /* Serialize everything against runq lock to prevent concurrent update
+     * (notice all non-current VCPUs of the domain have been paused in the
+     * caller). */
+    if ( p == current->domain )
+        vcpu_schedule_lock_irq(current);
+
     if ( op->cmd == XEN_DOMCTL_SCHEDOP_putinfo )
     {
         /* Check for sane parameters. */
         if ( !op->u.sedf.period && !op->u.sedf.weight )
-            return -EINVAL;
+        {
+            rc = -EINVAL;
+            goto out;
+        }
+
         if ( op->u.sedf.weight )
         {
             if ( (op->u.sedf.extratime & EXTRA_AWARE) &&
@@ -1443,7 +1453,11 @@ static int sedf_adjust(const struct sche
                      (op->u.sedf.period < PERIOD_MIN) ||
                      (op->u.sedf.slice  > op->u.sedf.period) ||
                      (op->u.sedf.slice  < SLICE_MIN) )
-                    return -EINVAL;
+                {
+                    rc = -EINVAL;
+                    goto out;
+                }
+
                 EDOM_INFO(v)->weight = 0;
                 EDOM_INFO(v)->extraweight = 0;
                 EDOM_INFO(v)->period_orig = 
@@ -1455,7 +1469,7 @@ static int sedf_adjust(const struct sche
 
         rc = sedf_adjust_weights(p->cpupool, op);
         if ( rc )
-            return rc;
+            goto out;
 
         for_each_vcpu ( p, v )
         {
@@ -1469,7 +1483,11 @@ static int sedf_adjust(const struct sche
     else if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
     {
         if ( p->vcpu[0] == NULL )
-            return -EINVAL;
+        {
+            rc = -EINVAL;
+            goto out;
+        }
+
         op->u.sedf.period    = EDOM_INFO(p->vcpu[0])->period;
         op->u.sedf.slice     = EDOM_INFO(p->vcpu[0])->slice;
         op->u.sedf.extratime = EDOM_INFO(p->vcpu[0])->status & EXTRA_AWARE;
@@ -1477,8 +1495,12 @@ static int sedf_adjust(const struct sche
         op->u.sedf.weight    = EDOM_INFO(p->vcpu[0])->weight;
     }
 
-    PRINT(2,"sedf_adjust_finished\n");
-    return 0;
+out:
+    if ( p == current->domain )
+        vcpu_schedule_unlock_irq(current);
+
+    PRINT(2,"sedf_adjust_finished with return code %d\n", rc);
+    return rc;
 }
 
 const struct scheduler sched_sedf_def = {
diff -r 84b3e46aa7d2 xen/common/schedule.c
--- a/xen/common/schedule.c	Wed Nov 23 12:03:37 2011 +0000
+++ b/xen/common/schedule.c	Wed Nov 23 15:09:14 2011 +0100
@@ -1015,15 +1015,8 @@ long sched_adjust(struct domain *d, stru
 
     /*
      * Most VCPUs we can simply pause. If we are adjusting this VCPU then
-     * we acquire the local schedule_lock to guard against concurrent updates.
-     *
-     * We only acquire the local schedule lock after we have paused all other
-     * VCPUs in this domain. There are two reasons for this:
-     * 1- We don't want to hold up interrupts as pausing a VCPU can
-     *    trigger a tlb shootdown.
-     * 2- Pausing other VCPUs involves briefly locking the schedule
-     *    lock of the CPU they are running on. This CPU could be the
-     *    same as ours.
+     * concurrent updates shall be prevented within the actual pluggable
+     * scheduler.
      */
 
     for_each_vcpu ( d, v )
@@ -1032,15 +1025,9 @@ long sched_adjust(struct domain *d, stru
             vcpu_pause(v);
     }
 
-    if ( d == current->domain )
-        vcpu_schedule_lock_irq(current);
-
     if ( (ret = SCHED_OP(DOM2OP(d), adjust, d, op)) == 0 )
         TRACE_1D(TRC_SCHED_ADJDOM, d->domain_id);
 
-    if ( d == current->domain )
-        vcpu_schedule_unlock_irq(current);
-
     for_each_vcpu ( d, v )
     {
         if ( v != current )

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)



[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC/RFT][PATCH 2 of 3] Remove VCPU pausing while adjusting domain scheduling parameters.
  2011-11-23 14:55 [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust Dario Faggioli
  2011-11-23 15:07 ` [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers Dario Faggioli
@ 2011-11-23 15:09 ` Dario Faggioli
  2011-11-23 15:11 ` [RFC/RFT][PATCH 3 of 3] Introduce proper locking in sedf Dario Faggioli
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Dario Faggioli @ 2011-11-23 15:09 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross, Keir Fraser


[-- Attachment #1.1: Type: text/plain, Size: 2096 bytes --]

Pausing all the non-current VCPUs of a domain while changing its
scheduling parameters is dangerous (what if two VCPUs try to pause
all the other VCPUs but themselves at the same time?). Moreover, it
does not appear to be needed, since:
 - races are (or should be) avoided within each pluggable scheduler
   by means of proper locking;
 - changes in values using during actual scheduling are (or should be)
   prevented by runq locking.

Just remove it.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff -r 6b5e2eb81706 xen/common/schedule.c
--- a/xen/common/schedule.c	Wed Nov 23 15:09:14 2011 +0100
+++ b/xen/common/schedule.c	Wed Nov 23 15:20:28 2011 +0100
@@ -1005,7 +1005,6 @@ int sched_id(void)
 /* Adjust scheduling parameter for a given domain. */
 long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
 {
-    struct vcpu *v;
     long ret;
     
     if ( (op->sched_id != DOM2OP(d)->sched_id) ||
@@ -1013,27 +1012,11 @@ long sched_adjust(struct domain *d, stru
           (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
         return -EINVAL;
 
-    /*
-     * Most VCPUs we can simply pause. If we are adjusting this VCPU then
-     * concurrent updates shall be prevented within the actual pluggable
-     * scheduler.
-     */
-
-    for_each_vcpu ( d, v )
-    {
-        if ( v != current )
-            vcpu_pause(v);
-    }
-
+    /* NB: the pluggable scheduler code needs to take care
+     * of locking by itself. */
     if ( (ret = SCHED_OP(DOM2OP(d), adjust, d, op)) == 0 )
         TRACE_1D(TRC_SCHED_ADJDOM, d->domain_id);
 
-    for_each_vcpu ( d, v )
-    {
-        if ( v != current )
-            vcpu_unpause(v);
-    }
-
     return ret;
 }
 

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)



[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC/RFT][PATCH 3 of 3] Introduce proper locking in sedf.
  2011-11-23 14:55 [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust Dario Faggioli
  2011-11-23 15:07 ` [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers Dario Faggioli
  2011-11-23 15:09 ` [RFC/RFT][PATCH 2 of 3] Remove VCPU pausing while adjusting domain scheduling parameters Dario Faggioli
@ 2011-11-23 15:11 ` Dario Faggioli
  2011-12-06  8:38 ` [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust Juergen Gross
  2011-12-06 12:24 ` George Dunlap
  4 siblings, 0 replies; 21+ messages in thread
From: Dario Faggioli @ 2011-11-23 15:11 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross, Keir Fraser


[-- Attachment #1.1: Type: text/plain, Size: 10009 bytes --]

Add a global scheduler lock in sedf, as we have in credit and credit2,
mostly for preventing races while adjusting scheduling parameters. Also,
add runq locking within sedf_adjust_weights since we are touching
scheduling sensitive values in there.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff -r 17653f5c2995 xen/common/sched_sedf.c
--- a/xen/common/sched_sedf.c	Wed Nov 23 15:20:28 2011 +0100
+++ b/xen/common/sched_sedf.c	Wed Nov 23 15:36:22 2011 +0100
@@ -61,6 +61,11 @@ struct sedf_dom_info {
     struct domain  *domain;
 };
 
+struct sedf_priv_info {
+    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
+    spinlock_t lock;
+};
+
 struct sedf_vcpu_info {
     struct vcpu *vcpu;
     struct list_head list;
@@ -115,6 +120,8 @@ struct sedf_cpu_info {
     s_time_t         current_slice_expires;
 };
 
+#define SEDF_PRIV(_ops) \
+    ((struct sedf_priv_info *)((_ops)->sched_data))
 #define EDOM_INFO(d)   ((struct sedf_vcpu_info *)((d)->sched_priv))
 #define CPU_INFO(cpu)  \
     ((struct sedf_cpu_info *)per_cpu(schedule_data, cpu).sched_priv)
@@ -772,6 +779,31 @@ static struct task_slice sedf_do_extra_s
 }
 

+static int sedf_init(struct scheduler *ops)
+{
+    struct sedf_priv_info *prv;
+
+    prv = xzalloc(struct sedf_priv_info);
+    if ( prv == NULL )
+        return -ENOMEM;
+
+    ops->sched_data = prv;
+    spin_lock_init(&prv->lock);
+
+    return 0;
+}
+
+
+static void sedf_deinit(const struct scheduler *ops)
+{
+    struct sedf_priv_info *prv;
+
+    prv = SEDF_PRIV(ops);
+    if ( prv != NULL )
+        xfree(prv);
+}
+
+
 /* Main scheduling function
    Reasons for calling this function are:
    -timeslice for the current period used up
@@ -1247,11 +1279,18 @@ static void sedf_dump_domain(struct vcpu
 /* dumps all domains on the specified cpu */
 static void sedf_dump_cpu_state(const struct scheduler *ops, int i)
 {
+    struct sedf_priv_info *prv = SEDF_PRIV(ops);
     struct list_head      *list, *queue, *tmp;
     struct sedf_vcpu_info *d_inf;
     struct domain         *d;
     struct vcpu    *ed;
     int loop = 0;
+    unsigned long flags;
+
+    /* Lock the scheduler, we want to be sure we dump a
+     * consistent set of parameters.
+     */
+    spin_lock_irqsave(&prv->lock, flags);
  
     printk("now=%"PRIu64"\n",NOW());
     queue = RUNQ(i);
@@ -1316,6 +1355,8 @@ static void sedf_dump_cpu_state(const st
         }
     }
     rcu_read_unlock(&domlist_read_lock);
+
+    spin_unlock_irqrestore(&prv->lock, flags);
 }
 

@@ -1335,7 +1376,9 @@ static int sedf_adjust_weights(struct cp
         return -ENOMEM;
     }
 
-    /* Sum across all weights. */
+    /* Sum across all weights. Notice that no runq locking is needed
+     * here: the caller holds sedf_priv_info->lock and we're not changing
+     * anything that is accessed during scheduling. */
     rcu_read_lock(&domlist_read_lock);
     for_each_domain_in_cpupool( d, c )
     {
@@ -1365,7 +1408,9 @@ static int sedf_adjust_weights(struct cp
     }
     rcu_read_unlock(&domlist_read_lock);
 
-    /* Adjust all slices (and periods) to the new weight. */
+    /* Adjust all slices (and periods) to the new weight. Unlike above, we
+     * need to take thr runq lock for the various VCPUs: we're modyfing
+     * slice and period which are referenced during scheduling. */
     rcu_read_lock(&domlist_read_lock);
     for_each_domain_in_cpupool( d, c )
     {
@@ -1375,12 +1420,15 @@ static int sedf_adjust_weights(struct cp
                 continue;
             if ( EDOM_INFO(p)->weight )
             {
+                /* Interrupts already off */
+                vcpu_schedule_lock(p);
                 EDOM_INFO(p)->period_orig = 
                     EDOM_INFO(p)->period  = WEIGHT_PERIOD;
                 EDOM_INFO(p)->slice_orig  =
                     EDOM_INFO(p)->slice   = 
                     (EDOM_INFO(p)->weight *
                      (WEIGHT_PERIOD - WEIGHT_SAFETY - sumt[cpu])) / sumw[cpu];
+                vcpu_schedule_unlock(p);
             }
         }
     }
@@ -1396,6 +1444,8 @@ static int sedf_adjust_weights(struct cp
 /* set or fetch domain scheduling parameters */
 static int sedf_adjust(const struct scheduler *ops, struct domain *p, struct xen_domctl_scheduler_op *op)
 {
+    struct sedf_priv_info *prv = SEDF_PRIV(ops);
+    unsigned long flags;
     struct vcpu *v;
     int rc = 0;
 
@@ -1404,11 +1454,12 @@ static int sedf_adjust(const struct sche
           p->domain_id, op->u.sedf.period, op->u.sedf.slice,
           op->u.sedf.latency, (op->u.sedf.extratime)?"yes":"no");
 
-    /* Serialize everything against runq lock to prevent concurrent update
-     * (notice all non-current VCPUs of the domain have been paused in the
-     * caller). */
-    if ( p == current->domain )
-        vcpu_schedule_lock_irq(current);
+    /* Serialize against the pluggable scheduler lock to protect from
+     * concurrent updates. We need to take the runq lock for the VCPUs
+     * as well, since we are touching extraweight, weight, slice and
+     * period. As in sched_credit2.c, runq locks nest inside the
+     * pluggable scheduler lock. */
+    spin_lock_irqsave(&prv->lock, flags);
 
     if ( op->cmd == XEN_DOMCTL_SCHEDOP_putinfo )
     {
@@ -1427,43 +1478,52 @@ static int sedf_adjust(const struct sche
                 /* Weight-driven domains with extratime only. */
                 for_each_vcpu ( p, v )
                 {
+                    /* (Here and everywhere in the following) IRQs are already off,
+                     * hence vcpu_spin_lock() is the one. */
+                    vcpu_schedule_lock(v);
                     EDOM_INFO(v)->extraweight = op->u.sedf.weight;
                     EDOM_INFO(v)->weight = 0;
                     EDOM_INFO(v)->slice = 0;
                     EDOM_INFO(v)->period = WEIGHT_PERIOD;
+                    vcpu_schedule_unlock(v);
                 }
             }
             else
             {
                 /* Weight-driven domains with real-time execution. */
-                for_each_vcpu ( p, v )
+                for_each_vcpu ( p, v ) {
+                    vcpu_schedule_lock(v);
                     EDOM_INFO(v)->weight = op->u.sedf.weight;
+                    vcpu_schedule_lock(v);
+                }
             }
         }
         else
         {
+            /*
+             * Sanity checking: note that disabling extra weight requires
+             * that we set a non-zero slice.
+             */
+            if ( (op->u.sedf.period > PERIOD_MAX) ||
+                 (op->u.sedf.period < PERIOD_MIN) ||
+                 (op->u.sedf.slice  > op->u.sedf.period) ||
+                 (op->u.sedf.slice  < SLICE_MIN) )
+            {
+                rc = -EINVAL;
+                goto out;
+            }
+
             /* Time-driven domains. */
             for_each_vcpu ( p, v )
             {
-                /*
-                 * Sanity checking: note that disabling extra weight requires
-                 * that we set a non-zero slice.
-                 */
-                if ( (op->u.sedf.period > PERIOD_MAX) ||
-                     (op->u.sedf.period < PERIOD_MIN) ||
-                     (op->u.sedf.slice  > op->u.sedf.period) ||
-                     (op->u.sedf.slice  < SLICE_MIN) )
-                {
-                    rc = -EINVAL;
-                    goto out;
-                }
-
+                vcpu_schedule_lock(v);
                 EDOM_INFO(v)->weight = 0;
                 EDOM_INFO(v)->extraweight = 0;
                 EDOM_INFO(v)->period_orig = 
                     EDOM_INFO(v)->period  = op->u.sedf.period;
                 EDOM_INFO(v)->slice_orig  = 
                     EDOM_INFO(v)->slice   = op->u.sedf.slice;
+                vcpu_schedule_unlock(v);
             }
         }
 
@@ -1473,11 +1533,13 @@ static int sedf_adjust(const struct sche
 
         for_each_vcpu ( p, v )
         {
+            vcpu_schedule_lock(v);
             EDOM_INFO(v)->status  = 
                 (EDOM_INFO(v)->status &
                  ~EXTRA_AWARE) | (op->u.sedf.extratime & EXTRA_AWARE);
             EDOM_INFO(v)->latency = op->u.sedf.latency;
             extraq_check(v);
+            vcpu_schedule_lock(v);
         }
     }
     else if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
@@ -1496,17 +1558,19 @@ static int sedf_adjust(const struct sche
     }
 
 out:
-    if ( p == current->domain )
-        vcpu_schedule_unlock_irq(current);
+    spin_unlock_irqrestore(&prv->lock, flags);
 
     PRINT(2,"sedf_adjust_finished with return code %d\n", rc);
     return rc;
 }
 
+static struct sedf_priv_info _sedf_priv;
+
 const struct scheduler sched_sedf_def = {
-    .name     = "Simple EDF Scheduler",
-    .opt_name = "sedf",
-    .sched_id = XEN_SCHEDULER_SEDF,
+    .name           = "Simple EDF Scheduler",
+    .opt_name       = "sedf",
+    .sched_id       = XEN_SCHEDULER_SEDF,
+    .sched_data     = &_sedf_priv,
     
     .init_domain    = sedf_init_domain,
     .destroy_domain = sedf_destroy_domain,
@@ -1520,6 +1584,9 @@ const struct scheduler sched_sedf_def = 
     .alloc_domdata  = sedf_alloc_domdata,
     .free_domdata   = sedf_free_domdata,
 
+    .init           = sedf_init,
+    .deinit         = sedf_deinit,
+
     .do_schedule    = sedf_do_schedule,
     .pick_cpu       = sedf_pick_cpu,
     .dump_cpu_state = sedf_dump_cpu_state,

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)



[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers.
  2011-11-23 15:07 ` [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers Dario Faggioli
@ 2011-11-23 16:24   ` Ian Campbell
  2011-11-23 17:09     ` Dario Faggioli
                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Ian Campbell @ 2011-11-23 16:24 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: George Dunlap, Juergen Gross, xen-devel@lists.xensource.com,
	Keir (Xen.org)

On Wed, 2011-11-23 at 15:07 +0000, Dario Faggioli wrote:
> Pluggable schedulers' code knows what (and when) to lock better than
> generic code, let's delegate to them all the concurrency related issues.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> 
> diff -r 84b3e46aa7d2 xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c	Wed Nov 23 12:03:37 2011 +0000
> +++ b/xen/common/sched_credit.c	Wed Nov 23 15:09:14 2011 +0100
> @@ -161,6 +161,7 @@ struct csched_dom {
>   * System-wide private data
>   */
>  struct csched_private {
> +    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
>      spinlock_t lock;
>      struct list_head active_sdom;
>      uint32_t ncpus;

Given that every scheduler plugin is going to need this lock perhaps it
could be provided globally (but still have the responsibility for using
it fall on the plugin)?

I was mainly thinking of the sedf case where you add and maintain the
whole structure for just that lock. Perhaps you have future plans which
involve having to do that anyway in which case maybe my suggestion
doesn't make sense.

Ian.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers.
  2011-11-23 16:24   ` Ian Campbell
@ 2011-11-23 17:09     ` Dario Faggioli
  2011-11-23 17:30       ` Ian Campbell
  2011-12-06 10:34     ` George Dunlap
  2011-12-07 14:49     ` [PATCHv2 0 of 1] Rework locking for sched_adjust Dario Faggioli
  2 siblings, 1 reply; 21+ messages in thread
From: Dario Faggioli @ 2011-11-23 17:09 UTC (permalink / raw)
  To: Ian Campbell
  Cc: George Dunlap, Juergen Gross, xen-devel@lists.xensource.com,
	Keir (Xen.org)


[-- Attachment #1.1: Type: text/plain, Size: 1878 bytes --]

On Wed, 2011-11-23 at 16:24 +0000, Ian Campbell wrote:
> >  struct csched_private {
> > +    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
> >      spinlock_t lock;
> >      struct list_head active_sdom;
> >      uint32_t ncpus;
> 
> Given that every scheduler plugin is going to need this lock perhaps it
> could be provided globally (but still have the responsibility for using
> it fall on the plugin)?
> 
Makes sense to me, and it should be something pretty easy to do, if you
were thinking of just moving the lock to general code.
I'm saying this because both credit and credit2 has much more payload in
their `struct csched_private', and if we also want to get rid of the
struct for them as well, that would be a different story! :-)

> I was mainly thinking of the sedf case where you add and maintain the
> whole structure for just that lock. Perhaps you have future plans which
> involve having to do that anyway in which case maybe my suggestion
> doesn't make sense.
>
I know and I agree, that 1-field-struct is just as ugly as hell! Reason
why I went for it are homogeneity with the current code of all the
schedulers, and yes, also, what you're saying above, i.e., it might turn
useful in future to have some scheduler-wide repository in sedf as it is
now for credit*. But no, I don't have _specific_ plans yet, so your
comment do make sense.

Anyway, even if we'd stay with what's in this patch, I think this at
least need some commenting...

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers.
  2011-11-23 17:09     ` Dario Faggioli
@ 2011-11-23 17:30       ` Ian Campbell
  0 siblings, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2011-11-23 17:30 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: George Dunlap, Juergen Gross, xen-devel@lists.xensource.com,
	Keir (Xen.org)

On Wed, 2011-11-23 at 17:09 +0000, Dario Faggioli wrote:
> On Wed, 2011-11-23 at 16:24 +0000, Ian Campbell wrote:
> > >  struct csched_private {
> > > +    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
> > >      spinlock_t lock;
> > >      struct list_head active_sdom;
> > >      uint32_t ncpus;
> > 
> > Given that every scheduler plugin is going to need this lock perhaps it
> > could be provided globally (but still have the responsibility for using
> > it fall on the plugin)?
> > 
> Makes sense to me, and it should be something pretty easy to do, if you
> were thinking of just moving the lock to general code.
> I'm saying this because both credit and credit2 has much more payload in
> their `struct csched_private', and if we also want to get rid of the
> struct for them as well, that would be a different story! :-)

No, I just meant the lock.

> 
> > I was mainly thinking of the sedf case where you add and maintain the
> > whole structure for just that lock. Perhaps you have future plans which
> > involve having to do that anyway in which case maybe my suggestion
> > doesn't make sense.
> >
> I know and I agree, that 1-field-struct is just as ugly as hell! Reason
> why I went for it are homogeneity with the current code of all the
> schedulers, and yes, also, what you're saying above, i.e., it might turn
> useful in future to have some scheduler-wide repository in sedf as it is
> now for credit*. But no, I don't have _specific_ plans yet, so your
> comment do make sense.
> 
> Anyway, even if we'd stay with what's in this patch, I think this at
> least need some commenting...
> 
> Thanks and Regards,
> Dario
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust
  2011-11-23 14:55 [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust Dario Faggioli
                   ` (2 preceding siblings ...)
  2011-11-23 15:11 ` [RFC/RFT][PATCH 3 of 3] Introduce proper locking in sedf Dario Faggioli
@ 2011-12-06  8:38 ` Juergen Gross
  2011-12-06 10:10   ` Dario Faggioli
  2011-12-06 12:30   ` George Dunlap
  2011-12-06 12:24 ` George Dunlap
  4 siblings, 2 replies; 21+ messages in thread
From: Juergen Gross @ 2011-12-06  8:38 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Keir Fraser

On 11/23/2011 03:55 PM, Dario Faggioli wrote:
> Hi everyone,
>
> This series changes how locks are dealt with while adjusting domains'
> scheduling parameters.
>
> I've done and am still doing tests for credit and credit2, and it's
> surviving to all I threw at it up to now. Unfortunately, I can't test
> the sedf part yet, since it is not working on my test boxes due to other
> issues (which I'm also trying to track down). I'm sending the series out
> anyway, so that at least you can have a look at it, and maybe give a
> spin on the sedf part, if you don't have anything more interesting to
> do. ;-P

Sorry, does not work with sched=sedf:

nehalem1 login: (XEN) Xen BUG at spinlock.c:47
(XEN) ----[ Xen-4.2-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c480124e84>] check_lock+0x44/0x50
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff83033eb17868   rcx: 0000000000000001
(XEN) rdx: 0000000000000000   rsi: 0000000000000001   rdi: ffff83033eb1786c
(XEN) rbp: ffff82c4802afc10   rsp: ffff82c4802afc10   r8:  0000000000000004
(XEN) r9:  000000000000003c   r10: ffff82c4802285b0   r11: 0000000000000212
(XEN) r12: ffff83033eb16000   r13: 0000000000000010   r14: ffff82c4802e3c60
(XEN) r15: ffff83033eb17868   cr0: 0000000080050033   cr4: 00000000000026f0
(XEN) cr3: 000000032b937000   cr2: 00007f09a92342c0
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802afc10:
(XEN)    ffff82c4802afc28 ffff82c480124ec1 0000000000000010 ffff82c4802afc68
(XEN)    ffff82c48012c085 7400000000000002 0000000000000010 0000000000000010
(XEN)    0000000000000020 ffff82c4802e3c60 ffff83033eaf8fb0 ffff82c4802afca8
(XEN)    ffff82c48012c557 ffff82c4802afd18 0000000000000010 0000000000000003
(XEN)    0000000000000004 ffff82c4802e3c60 ffff83033eaf8fb0 ffff82c4802afcc8
(XEN)    ffff82c48012c660 ffff82c480124ec1 0000000000000004 ffff82c4802afd58
(XEN)    ffff82c48011f8ab ffff82c480124ec1 ffff83033ea97048 ffff82c4802afd18
(XEN)    ffff82c480121084 ffff82c4802afe48 ffff83033ea92000 ffff83033eb11f60
(XEN)    0000000000000296 ffff82c4802afd38 ffff82c480121aab 00000004bf2fb000
(XEN)    ffff83033ea92000 000000000064b004 ffff83033ea92000 0000000000000000
(XEN)    00007fff37431340 ffff82c4802afd88 ffff82c480120fec ffff82c480125104
(XEN)    fffffffffffffff3 ffff82c4802afd88 fffffffffffffff3 ffff82c4802afef8
(XEN)    ffff82c4801037e3 000000000061f004 0000000000000000 000000000061f004
(XEN)    0000000000000001 ffff82c4802afef8 ffff82c480126cb8 ffff82c4802afdf8
(XEN)    ffff82c480106f0e ffff002000000000 00000000000c0000 00000000ffffffff
(XEN)    0000000000000000 0000000000000000 00000000000bf7b3 00000018d63a6d18
(XEN)    0000000300000004 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffff82c4802afe88 0000000800000010 00007f09a9890000 0000000000000004
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000a00000001
(XEN)    0000000000000000 00007fff374314e0 00007fff37431540 000000000061e050
(XEN) Xen call trace:
(XEN)    [<ffff82c480124e84>] check_lock+0x44/0x50
(XEN)    [<ffff82c480124ec1>] _spin_lock+0x11/0x5d
(XEN)    [<ffff82c48012c085>] xmem_pool_alloc+0x138/0x4d2
(XEN)    [<ffff82c48012c557>] _xmalloc+0x138/0x230
(XEN)    [<ffff82c48012c660>] _xzalloc+0x11/0x2d
(XEN)    [<ffff82c48011f8ab>] sedf_adjust+0x37c/0x9b2
(XEN)    [<ffff82c480120fec>] sched_adjust+0x5f/0xb7
(XEN)    [<ffff82c4801037e3>] do_domctl+0xf32/0x1a9f
(XEN)    [<ffff82c48021f128>] syscall_enter+0xc8/0x122
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at spinlock.c:47
(XEN) ****************************************
(XEN)

-- 
Juergen Gross                 Principal Developer Operating Systems
PDG ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust
  2011-12-06  8:38 ` [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust Juergen Gross
@ 2011-12-06 10:10   ` Dario Faggioli
  2011-12-06 11:03     ` Juergen Gross
  2011-12-06 12:30   ` George Dunlap
  1 sibling, 1 reply; 21+ messages in thread
From: Dario Faggioli @ 2011-12-06 10:10 UTC (permalink / raw)
  To: Juergen Gross; +Cc: George Dunlap, xen-devel, Keir Fraser


[-- Attachment #1.1: Type: text/plain, Size: 704 bytes --]

On Tue, 2011-12-06 at 09:38 +0100, Juergen Gross wrote:
> Sorry, does not work with sched=sedf:
> 
Hi Juergen,

Thanks for reporting. I'm reworking this as per Ian's suggestions, but
I'll check this out. Can I as how you made this happening? Just trying
to change a sedf-domain scheduling parameter I guess... Is that right?

Thanks again and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers.
  2011-11-23 16:24   ` Ian Campbell
  2011-11-23 17:09     ` Dario Faggioli
@ 2011-12-06 10:34     ` George Dunlap
  2011-12-06 16:35       ` Dario Faggioli
  2011-12-07 14:49     ` [PATCHv2 0 of 1] Rework locking for sched_adjust Dario Faggioli
  2 siblings, 1 reply; 21+ messages in thread
From: George Dunlap @ 2011-12-06 10:34 UTC (permalink / raw)
  To: Ian Campbell
  Cc: George Dunlap, Keir (Xen.org), Juergen Gross,
	xen-devel@lists.xensource.com, Dario Faggioli

On Wed, 2011-11-23 at 16:24 +0000, Ian Campbell wrote:
> On Wed, 2011-11-23 at 15:07 +0000, Dario Faggioli wrote:
> > Pluggable schedulers' code knows what (and when) to lock better than
> > generic code, let's delegate to them all the concurrency related issues.
> > 
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> > 
> > diff -r 84b3e46aa7d2 xen/common/sched_credit.c
> > --- a/xen/common/sched_credit.c	Wed Nov 23 12:03:37 2011 +0000
> > +++ b/xen/common/sched_credit.c	Wed Nov 23 15:09:14 2011 +0100
> > @@ -161,6 +161,7 @@ struct csched_dom {
> >   * System-wide private data
> >   */
> >  struct csched_private {
> > +    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
> >      spinlock_t lock;
> >      struct list_head active_sdom;
> >      uint32_t ncpus;
> 
> Given that every scheduler plugin is going to need this lock perhaps it
> could be provided globally (but still have the responsibility for using
> it fall on the plugin)?

Sorry for the long delay in responding... I don't really like this idea.
For one thing, that would make the generic scheduler code responsible
for making per-cpupool locks, which could look messy, and adds to code
complexity. There's already code to allocate per-pool data structures
for the schedulers -- it seems like just leveraging that would be
easiest.  I also think that logically, having each scheduler in charge
of its own locking makes more sense; having the generic scheduling code
do stuff on behalf of the pluggable scheduler is what caused this
problem in the first place.

If we think having a one-item struct is ugly, could we just use
spinlock_t as the type we allocate / cast to in the sedf scheduler?

 -George

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust
  2011-12-06 10:10   ` Dario Faggioli
@ 2011-12-06 11:03     ` Juergen Gross
  0 siblings, 0 replies; 21+ messages in thread
From: Juergen Gross @ 2011-12-06 11:03 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Keir Fraser

Hi Dario,

On 12/06/2011 11:10 AM, Dario Faggioli wrote:
> On Tue, 2011-12-06 at 09:38 +0100, Juergen Gross wrote:
>> Sorry, does not work with sched=sedf:
>>
> Hi Juergen,
>
> Thanks for reporting. I'm reworking this as per Ian's suggestions, but
> I'll check this out. Can I as how you made this happening? Just trying
> to change a sedf-domain scheduling parameter I guess... Is that right?

I booted with sched=sedf and tried to change Domain-0 parameters with:

xl sched-sedf -d Domain-0 -w 10


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
PDG ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust
  2011-11-23 14:55 [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust Dario Faggioli
                   ` (3 preceding siblings ...)
  2011-12-06  8:38 ` [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust Juergen Gross
@ 2011-12-06 12:24 ` George Dunlap
  2011-12-06 16:46   ` Dario Faggioli
  4 siblings, 1 reply; 21+ messages in thread
From: George Dunlap @ 2011-12-06 12:24 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Juergen Gross, xen-devel, Keir Fraser

On Wed, Nov 23, 2011 at 2:55 PM, Dario Faggioli <raistlin@linux.it> wrote:
> Hi everyone,
>
> This series changes how locks are dealt with while adjusting domains'
> scheduling parameters.
>
> I've done and am still doing tests for credit and credit2, and it's
> surviving to all I threw at it up to now. Unfortunately, I can't test
> the sedf part yet, since it is not working on my test boxes due to other
> issues (which I'm also trying to track down). I'm sending the series out
> anyway, so that at least you can have a look at it, and maybe give a
> spin on the sedf part, if you don't have anything more interesting to
> do. ;-P
>
> Juergen series on xl scheduling support attached to this mail, in the
> form of a single patch, for testing convenience.

Hey Dario,  sorry for the long delay in responding.

Overall the patch series looks good to me.  The only issue I see is
that it looks like you introduce a regression between patch 2 and 3 --
that is, if you apply patches 1 and 2, but not 3, then the lock you
grab in sched_sedf.c:sedf_adjust() won't protect against concurrent
accesses from other vcpus; nor will it correctly handle updates to the
per-vcpu EDOM_INFO() variables.

Regressions in mid-patch series are bad because it can mess up bisection.

I think a better way of breaking it down would be:
* Add scheduler global locks, and have them called in the pluggable
scheduler *_adjust() function.  This is redundant, but shouldn't be
harmful.
* Atomically remove both the pause and the lock in schedule.c, and add
the scheduling locks around the per-vcpu EDOM_INFO variables in
sched_sedf.c, all in one patch.  This makes the patch bigger, but it
avoids a regression.  The two things are related anwyay: the reason
you now need scheduling locks around EDOM_INFO variables is because
you're getting rid of the pausing and the lock in schedule.c.

Thoughts?

 -George

>
> --
>
> xen/common/sched_credit.c  |   10 +++++++---
> xen/common/sched_credit2.c |   21 +++++++++++----------
> xen/common/sched_sedf.c    |  131 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------
> xen/common/schedule.c      |   34 ++--------------------------------
> 4 files changed, 130 insertions(+), 66 deletions(-)
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -------------------------------------------------------------------
> Dario Faggioli, http://retis.sssup.it/people/faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
> PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust
  2011-12-06  8:38 ` [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust Juergen Gross
  2011-12-06 10:10   ` Dario Faggioli
@ 2011-12-06 12:30   ` George Dunlap
  2011-12-06 12:39     ` Juergen Gross
  1 sibling, 1 reply; 21+ messages in thread
From: George Dunlap @ 2011-12-06 12:30 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Keir Fraser, xen-devel, Dario Faggioli

On Tue, Dec 6, 2011 at 8:38 AM, Juergen Gross
<juergen.gross@ts.fujitsu.com> wrote:
> (XEN) Xen BUG at spinlock.c:47
[snip]
> (XEN) Xen call trace:
> (XEN)    [<ffff82c480124e84>] check_lock+0x44/0x50
> (XEN)    [<ffff82c480124ec1>] _spin_lock+0x11/0x5d
> (XEN)    [<ffff82c48012c085>] xmem_pool_alloc+0x138/0x4d2
> (XEN)    [<ffff82c48012c557>] _xmalloc+0x138/0x230
> (XEN)    [<ffff82c48012c660>] _xzalloc+0x11/0x2d
> (XEN)    [<ffff82c48011f8ab>] sedf_adjust+0x37c/0x9b2
> (XEN)    [<ffff82c480120fec>] sched_adjust+0x5f/0xb7
> (XEN)    [<ffff82c4801037e3>] do_domctl+0xf32/0x1a9f
> (XEN)    [<ffff82c48021f128>] syscall_enter+0xc8/0x122

Hmm, looks like the problem is that we assert that locks must be
called with IRQs enabled all the time, or never.  From
xen/common/spinlock.c:

     * We partition locks into IRQ-safe (always held with IRQs disabled) and
     * IRQ-unsafe (always held with IRQs enabled) types. The convention for
     * every lock must be consistently observed else we can deadlock in
     * IRQ-context rendezvous functions (a rendezvous which gets every CPU
     * into IRQ context before any CPU is released from the rendezvous).

sedf_adj() grabs the private lock with irqs disabled, then calls
sedf_adjust_weights(), which calls xmalloc to allocate some local
scratch space, which ultimately grabs a non-IRQ spinlock.

Not sure the best thing to do here...

 -George

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust
  2011-12-06 12:30   ` George Dunlap
@ 2011-12-06 12:39     ` Juergen Gross
  2011-12-06 16:39       ` Dario Faggioli
  0 siblings, 1 reply; 21+ messages in thread
From: Juergen Gross @ 2011-12-06 12:39 UTC (permalink / raw)
  To: George Dunlap; +Cc: Dario Faggioli, xen-devel, Keir Fraser

On 12/06/2011 01:30 PM, George Dunlap wrote:
> On Tue, Dec 6, 2011 at 8:38 AM, Juergen Gross
> <juergen.gross@ts.fujitsu.com>  wrote:
>> (XEN) Xen BUG at spinlock.c:47
> [snip]
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82c480124e84>] check_lock+0x44/0x50
>> (XEN)    [<ffff82c480124ec1>] _spin_lock+0x11/0x5d
>> (XEN)    [<ffff82c48012c085>] xmem_pool_alloc+0x138/0x4d2
>> (XEN)    [<ffff82c48012c557>] _xmalloc+0x138/0x230
>> (XEN)    [<ffff82c48012c660>] _xzalloc+0x11/0x2d
>> (XEN)    [<ffff82c48011f8ab>] sedf_adjust+0x37c/0x9b2
>> (XEN)    [<ffff82c480120fec>] sched_adjust+0x5f/0xb7
>> (XEN)    [<ffff82c4801037e3>] do_domctl+0xf32/0x1a9f
>> (XEN)    [<ffff82c48021f128>] syscall_enter+0xc8/0x122
> Hmm, looks like the problem is that we assert that locks must be
> called with IRQs enabled all the time, or never.  From
> xen/common/spinlock.c:
>
>       * We partition locks into IRQ-safe (always held with IRQs disabled) and
>       * IRQ-unsafe (always held with IRQs enabled) types. The convention for
>       * every lock must be consistently observed else we can deadlock in
>       * IRQ-context rendezvous functions (a rendezvous which gets every CPU
>       * into IRQ context before any CPU is released from the rendezvous).
>
> sedf_adj() grabs the private lock with irqs disabled, then calls
> sedf_adjust_weights(), which calls xmalloc to allocate some local
> scratch space, which ultimately grabs a non-IRQ spinlock.
>
> Not sure the best thing to do here...
sedf_adjust_weights() is called only from sedf_adj(). The easiest solution
would be to xmalloc the scratch space in sedf_adj() before grabbing the lock
and passing the xmalloced area to sedf_adjust_weights().


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
PDG ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers.
  2011-12-06 10:34     ` George Dunlap
@ 2011-12-06 16:35       ` Dario Faggioli
  0 siblings, 0 replies; 21+ messages in thread
From: Dario Faggioli @ 2011-12-06 16:35 UTC (permalink / raw)
  To: George Dunlap
  Cc: George Dunlap, Juergen Gross, xen-devel@lists.xensource.com,
	Keir (Xen.org), Ian Campbell

On Tue, Dec 6, 2011 at 11:34 AM, George Dunlap <george.dunlap@citrix.com> wrote:
>> Given that every scheduler plugin is going to need this lock perhaps it
>> could be provided globally (but still have the responsibility for using
>> it fall on the plugin)?
>
> Sorry for the long delay in responding... I don't really like this idea.
> For one thing, that would make the generic scheduler code responsible
> for making per-cpupool locks, which could look messy, and adds to code
> complexity.
>
Right. I was looking into this and facing right this issue, i.e., how to make
that lock a per-cpupool thing, and wasn't quite succeeding in doing that
in a clean way.

> I also think that logically, having each scheduler in charge
> of its own locking makes more sense; having the generic scheduling code
> do stuff on behalf of the pluggable scheduler is what caused this
> problem in the first place.
>
Indeed.

> If we think having a one-item struct is ugly, could we just use
> spinlock_t as the type we allocate / cast to in the sedf scheduler?
>
Well, I put that struct there in the first place so I definitely can live
with it. I certainly don't think it is the most beautiful piece of code
I've ever wrote but, considering I'd have to dynamically allocate the
lock anyway (for the reasons above), and access it through
ops->sched_data, having it pointing directly to the lock is not that
much better than the struct to me.

Therefore, I think I'll let you decide here, and will keep or kill the
struct basing on what you think is nicer! :-)

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa  (Italy)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust
  2011-12-06 12:39     ` Juergen Gross
@ 2011-12-06 16:39       ` Dario Faggioli
  0 siblings, 0 replies; 21+ messages in thread
From: Dario Faggioli @ 2011-12-06 16:39 UTC (permalink / raw)
  To: Juergen Gross; +Cc: George Dunlap, xen-devel, Keir Fraser

On Tue, Dec 6, 2011 at 1:39 PM, Juergen Gross
<juergen.gross@ts.fujitsu.com> wrote:
> sedf_adjust_weights() is called only from sedf_adj(). The easiest solution
> would be to xmalloc the scratch space in sedf_adj() before grabbing the lock
> and passing the xmalloced area to sedf_adjust_weights().
>
Done, and it seems to be working... It might be a bit weird to read,
but I couldn't
find any other equally easy and effective solution so.

Thanks a lot and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa  (Italy)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust
  2011-12-06 12:24 ` George Dunlap
@ 2011-12-06 16:46   ` Dario Faggioli
  0 siblings, 0 replies; 21+ messages in thread
From: Dario Faggioli @ 2011-12-06 16:46 UTC (permalink / raw)
  To: George Dunlap; +Cc: Juergen Gross, xen-devel, Keir Fraser

On Tue, Dec 6, 2011 at 1:24 PM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> Hey Dario,  sorry for the long delay in responding.
>
Hi, and no problem at all :-)

> Regressions in mid-patch series are bad because it can mess up bisection.
>
I strongly agree. I didn't think much about that for this series since
the mechanism
I was trying to fix was (especially for sedf) already broken in
various ways but,
indeed, patches could be better split.

> I think a better way of breaking it down would be:
> * Add scheduler global locks, and have them called in the pluggable
> scheduler *_adjust() function.  This is redundant, but shouldn't be
> harmful.
> * Atomically remove both the pause and the lock in schedule.c, and add
> the scheduling locks around the per-vcpu EDOM_INFO variables in
> sched_sedf.c, all in one patch.  This makes the patch bigger, but it
> avoids a regression.
>
Ok, I'll go for this.

> The two things are related anwyay: the reason
> you now need scheduling locks around EDOM_INFO variables is because
> you're getting rid of the pausing and the lock in schedule.c.
>
Yeah, what I was trying to do was leaving a clear footstep about why
pausing was going and, yes, also making patches smaller, but again, I
see your point and will follow your advice. :-)

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa  (Italy)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCHv2 0 of 1] Rework locking for sched_adjust.
  2011-11-23 16:24   ` Ian Campbell
  2011-11-23 17:09     ` Dario Faggioli
  2011-12-06 10:34     ` George Dunlap
@ 2011-12-07 14:49     ` Dario Faggioli
  2011-12-07 15:02       ` [PATCHv2 1 " Dario Faggioli
  2011-12-07 15:04       ` [PATCHv2 0 " Dario Faggioli
  2 siblings, 2 replies; 21+ messages in thread
From: Dario Faggioli @ 2011-12-07 14:49 UTC (permalink / raw)
  To: xen-devel@lists.xensource.com
  Cc: George Dunlap, Juergen Gross, Keir (Xen.org), Ian Campbell


[-- Attachment #1.1: Type: text/plain, Size: 1666 bytes --]

Hi everyone,

Here it is v2 of the lock reworking around and within sched-adjust.

With respect to the first posting [1]:
 - I _did_not_ move the per-pluggable scheduler lock toward schedule.c,
   as agreed with George during review;
 - I fixed the bug in sedf spotted by Juergen the way he suggested;
 - I've finally been able to test it under all the three schedulers, 
   and it is doing its job, at least here;

Notice the series "collapsed" in one signle patch, as it was being hard
to find a breakdown of it that does not introduce regressions and/or
transient deadlock situations worse than the ones it's trying to cure...
I hope it's still readable and comfortable to review. :-)

Thanks to everyone who provided his feedback on the first version of
this.

Regards,
Dario

[1] http://xen.1045712.n5.nabble.com/RFC-RFT-PATCH-0-of-3-rework-locking-in-sched-adjust-td5016899.html

--
 xen/common/sched_credit.c  |   10 ++++++--
 xen/common/sched_credit2.c |   21 ++++++++++---------
 xen/common/sched_sedf.c    |  156 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------
 xen/common/schedule.c      |   34 +-------------------------------
 4 files changed, 140 insertions(+), 81 deletions(-)

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)



[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCHv2 1 of 1] Rework locking for sched_adjust.
  2011-12-07 14:49     ` [PATCHv2 0 of 1] Rework locking for sched_adjust Dario Faggioli
@ 2011-12-07 15:02       ` Dario Faggioli
  2011-12-14 10:24         ` George Dunlap
  2011-12-07 15:04       ` [PATCHv2 0 " Dario Faggioli
  1 sibling, 1 reply; 21+ messages in thread
From: Dario Faggioli @ 2011-12-07 15:02 UTC (permalink / raw)
  To: xen-devel@lists.xensource.com
  Cc: George Dunlap, Juergen Gross, Keir (Xen.org), Ian Campbell


[-- Attachment #1.1: Type: text/plain, Size: 16646 bytes --]

The main idea is to move (as much as possible) locking logic
from generic code to the various pluggable schedulers.

While at it, the following is also accomplished:
 - pausing all the non-current VCPUs of a domain while changing its
   scheduling parameters is not effective in avoiding races and it is
   prone to deadlock, so that is removed.
 - sedf needs a global lock for preventing races while adjusting
   domains' scheduling parameters (as it is for credit and credit2),
   so that is added.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff -r 38eb74c01d9d xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Tue Dec 06 21:16:56 2011 +0000
+++ b/xen/common/sched_credit.c	Wed Dec 07 15:45:55 2011 +0100
@@ -161,6 +161,7 @@ struct csched_dom {
  * System-wide private data
  */
 struct csched_private {
+    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
     spinlock_t lock;
     struct list_head active_sdom;
     uint32_t ncpus;
@@ -800,6 +801,10 @@ csched_dom_cntl(
     struct csched_private *prv = CSCHED_PRIV(ops);
     unsigned long flags;
 
+    /* Protect both get and put branches with the pluggable scheduler
+     * lock. Runq lock not needed anywhere in here. */
+    spin_lock_irqsave(&prv->lock, flags);
+
     if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
     {
         op->u.credit.weight = sdom->weight;
@@ -809,8 +814,6 @@ csched_dom_cntl(
     {
         ASSERT(op->cmd == XEN_DOMCTL_SCHEDOP_putinfo);
 
-        spin_lock_irqsave(&prv->lock, flags);
-
         if ( op->u.credit.weight != 0 )
         {
             if ( !list_empty(&sdom->active_sdom_elem) )
@@ -824,9 +827,10 @@ csched_dom_cntl(
         if ( op->u.credit.cap != (uint16_t)~0U )
             sdom->cap = op->u.credit.cap;
 
-        spin_unlock_irqrestore(&prv->lock, flags);
     }
 
+    spin_unlock_irqrestore(&prv->lock, flags);
+
     return 0;
 }
 
diff -r 38eb74c01d9d xen/common/sched_credit2.c
--- a/xen/common/sched_credit2.c	Tue Dec 06 21:16:56 2011 +0000
+++ b/xen/common/sched_credit2.c	Wed Dec 07 15:45:55 2011 +0100
@@ -1384,6 +1384,10 @@ csched_dom_cntl(
     struct csched_private *prv = CSCHED_PRIV(ops);
     unsigned long flags;
 
+    /* Must hold csched_priv lock to read and update sdom,
+     * runq lock to update csvcs. */
+    spin_lock_irqsave(&prv->lock, flags);
+
     if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
     {
         op->u.credit2.weight = sdom->weight;
@@ -1397,10 +1401,6 @@ csched_dom_cntl(
             struct list_head *iter;
             int old_weight;
 
-            /* Must hold csched_priv lock to update sdom, runq lock to
-             * update csvcs. */
-            spin_lock_irqsave(&prv->lock, flags);
-
             old_weight = sdom->weight;
 
             sdom->weight = op->u.credit2.weight;
@@ -1411,22 +1411,23 @@ csched_dom_cntl(
                 struct csched_vcpu *svc = list_entry(iter, struct csched_vcpu, sdom_elem);
 
                 /* NB: Locking order is important here.  Because we grab this lock here, we
-                 * must never lock csched_priv.lock if we're holding a runqueue
-                 * lock. */
-                vcpu_schedule_lock_irq(svc->vcpu);
+                 * must never lock csched_priv.lock if we're holding a runqueue lock.
+                 * Also, calling vcpu_schedule_lock() is enough, since IRQs have already
+                 * been disabled. */
+                vcpu_schedule_lock(svc->vcpu);
 
                 BUG_ON(svc->rqd != RQD(ops, svc->vcpu->processor));
 
                 svc->weight = sdom->weight;
                 update_max_weight(svc->rqd, svc->weight, old_weight);
 
-                vcpu_schedule_unlock_irq(svc->vcpu);
+                vcpu_schedule_unlock(svc->vcpu);
             }
-
-            spin_unlock_irqrestore(&prv->lock, flags);
         }
     }
 
+    spin_unlock_irqrestore(&prv->lock, flags);
+
     return 0;
 }
 
diff -r 38eb74c01d9d xen/common/sched_sedf.c
--- a/xen/common/sched_sedf.c	Tue Dec 06 21:16:56 2011 +0000
+++ b/xen/common/sched_sedf.c	Wed Dec 07 15:45:55 2011 +0100
@@ -61,6 +61,11 @@ struct sedf_dom_info {
     struct domain  *domain;
 };
 
+struct sedf_priv_info {
+    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
+    spinlock_t lock;
+};
+
 struct sedf_vcpu_info {
     struct vcpu *vcpu;
     struct list_head list;
@@ -115,6 +120,8 @@ struct sedf_cpu_info {
     s_time_t         current_slice_expires;
 };
 
+#define SEDF_PRIV(_ops) \
+    ((struct sedf_priv_info *)((_ops)->sched_data))
 #define EDOM_INFO(d)   ((struct sedf_vcpu_info *)((d)->sched_priv))
 #define CPU_INFO(cpu)  \
     ((struct sedf_cpu_info *)per_cpu(schedule_data, cpu).sched_priv)
@@ -772,6 +779,31 @@ static struct task_slice sedf_do_extra_s
 }
 
 
+static int sedf_init(struct scheduler *ops)
+{
+    struct sedf_priv_info *prv;
+
+    prv = xzalloc(struct sedf_priv_info);
+    if ( prv == NULL )
+        return -ENOMEM;
+
+    ops->sched_data = prv;
+    spin_lock_init(&prv->lock);
+
+    return 0;
+}
+
+
+static void sedf_deinit(const struct scheduler *ops)
+{
+    struct sedf_priv_info *prv;
+
+    prv = SEDF_PRIV(ops);
+    if ( prv != NULL )
+        xfree(prv);
+}
+
+
 /* Main scheduling function
    Reasons for calling this function are:
    -timeslice for the current period used up
@@ -1320,22 +1352,15 @@ static void sedf_dump_cpu_state(const st
 
 
 /* Adjusts periods and slices of the domains accordingly to their weights. */
-static int sedf_adjust_weights(struct cpupool *c, struct xen_domctl_scheduler_op *cmd)
+static int sedf_adjust_weights(struct cpupool *c, int nr_cpus, int *sumw, s_time_t *sumt)
 {
     struct vcpu *p;
     struct domain      *d;
-    unsigned int        cpu, nr_cpus = cpumask_last(&cpu_online_map) + 1;
-    int                *sumw = xzalloc_array(int, nr_cpus);
-    s_time_t           *sumt = xzalloc_array(s_time_t, nr_cpus);
+    unsigned int        cpu;
 
-    if ( !sumw || !sumt )
-    {
-        xfree(sumt);
-        xfree(sumw);
-        return -ENOMEM;
-    }
-
-    /* Sum across all weights. */
+    /* Sum across all weights. Notice that no runq locking is needed
+     * here: the caller holds sedf_priv_info.lock and we're not changing
+     * anything that is accessed during scheduling. */
     rcu_read_lock(&domlist_read_lock);
     for_each_domain_in_cpupool( d, c )
     {
@@ -1365,7 +1390,9 @@ static int sedf_adjust_weights(struct cp
     }
     rcu_read_unlock(&domlist_read_lock);
 
-    /* Adjust all slices (and periods) to the new weight. */
+    /* Adjust all slices (and periods) to the new weight. Unlike above, we
+     * need to take thr runq lock for the various VCPUs: we're modyfing
+     * slice and period which are referenced during scheduling. */
     rcu_read_lock(&domlist_read_lock);
     for_each_domain_in_cpupool( d, c )
     {
@@ -1375,20 +1402,20 @@ static int sedf_adjust_weights(struct cp
                 continue;
             if ( EDOM_INFO(p)->weight )
             {
+                /* Interrupts already off */
+                vcpu_schedule_lock(p);
                 EDOM_INFO(p)->period_orig = 
                     EDOM_INFO(p)->period  = WEIGHT_PERIOD;
                 EDOM_INFO(p)->slice_orig  =
                     EDOM_INFO(p)->slice   = 
                     (EDOM_INFO(p)->weight *
                      (WEIGHT_PERIOD - WEIGHT_SAFETY - sumt[cpu])) / sumw[cpu];
+                vcpu_schedule_unlock(p);
             }
         }
     }
     rcu_read_unlock(&domlist_read_lock);
 
-    xfree(sumt);
-    xfree(sumw);
-
     return 0;
 }
 
@@ -1396,19 +1423,45 @@ static int sedf_adjust_weights(struct cp
 /* set or fetch domain scheduling parameters */
 static int sedf_adjust(const struct scheduler *ops, struct domain *p, struct xen_domctl_scheduler_op *op)
 {
+    struct sedf_priv_info *prv = SEDF_PRIV(ops);
+    unsigned long flags;
+    unsigned int nr_cpus = cpumask_last(&cpu_online_map) + 1;
+    int *sumw = xzalloc_array(int, nr_cpus);
+    s_time_t *sumt = xzalloc_array(s_time_t, nr_cpus);
     struct vcpu *v;
-    int rc;
+    int rc = 0;
 
     PRINT(2,"sedf_adjust was called, domain-id %i new period %"PRIu64" "
           "new slice %"PRIu64"\nlatency %"PRIu64" extra:%s\n",
           p->domain_id, op->u.sedf.period, op->u.sedf.slice,
           op->u.sedf.latency, (op->u.sedf.extratime)?"yes":"no");
 
+    /* Serialize against the pluggable scheduler lock to protect from
+     * concurrent updates. We need to take the runq lock for the VCPUs
+     * as well, since we are touching extraweight, weight, slice and
+     * period. As in sched_credit2.c, runq locks nest inside the
+     * pluggable scheduler lock. */
+    spin_lock_irqsave(&prv->lock, flags);
+
     if ( op->cmd == XEN_DOMCTL_SCHEDOP_putinfo )
     {
+        /* These are used in sedf_adjust_weights() but have to be allocated in
+         * this function, as we need to avoid nesting xmem_pool_alloc's lock
+         * within our prv->lock. */
+        if ( !sumw || !sumt )
+        {
+            /* Check for errors here, the _getinfo branch doesn't care */
+            rc = -ENOMEM;
+            goto out;
+        }
+
         /* Check for sane parameters. */
         if ( !op->u.sedf.period && !op->u.sedf.weight )
-            return -EINVAL;
+        {
+            rc = -EINVAL;
+            goto out;
+        }
+
         if ( op->u.sedf.weight )
         {
             if ( (op->u.sedf.extratime & EXTRA_AWARE) &&
@@ -1417,59 +1470,78 @@ static int sedf_adjust(const struct sche
                 /* Weight-driven domains with extratime only. */
                 for_each_vcpu ( p, v )
                 {
+                    /* (Here and everywhere in the following) IRQs are already off,
+                     * hence vcpu_spin_lock() is the one. */
+                    vcpu_schedule_lock(v);
                     EDOM_INFO(v)->extraweight = op->u.sedf.weight;
                     EDOM_INFO(v)->weight = 0;
                     EDOM_INFO(v)->slice = 0;
                     EDOM_INFO(v)->period = WEIGHT_PERIOD;
+                    vcpu_schedule_unlock(v);
                 }
             }
             else
             {
                 /* Weight-driven domains with real-time execution. */
-                for_each_vcpu ( p, v )
+                for_each_vcpu ( p, v ) {
+                    vcpu_schedule_lock(v);
                     EDOM_INFO(v)->weight = op->u.sedf.weight;
+                    vcpu_schedule_unlock(v);
+                }
             }
         }
         else
         {
+            /*
+             * Sanity checking: note that disabling extra weight requires
+             * that we set a non-zero slice.
+             */
+            if ( (op->u.sedf.period > PERIOD_MAX) ||
+                 (op->u.sedf.period < PERIOD_MIN) ||
+                 (op->u.sedf.slice  > op->u.sedf.period) ||
+                 (op->u.sedf.slice  < SLICE_MIN) )
+            {
+                rc = -EINVAL;
+                goto out;
+            }
+
             /* Time-driven domains. */
             for_each_vcpu ( p, v )
             {
-                /*
-                 * Sanity checking: note that disabling extra weight requires
-                 * that we set a non-zero slice.
-                 */
-                if ( (op->u.sedf.period > PERIOD_MAX) ||
-                     (op->u.sedf.period < PERIOD_MIN) ||
-                     (op->u.sedf.slice  > op->u.sedf.period) ||
-                     (op->u.sedf.slice  < SLICE_MIN) )
-                    return -EINVAL;
+                vcpu_schedule_lock(v);
                 EDOM_INFO(v)->weight = 0;
                 EDOM_INFO(v)->extraweight = 0;
                 EDOM_INFO(v)->period_orig = 
                     EDOM_INFO(v)->period  = op->u.sedf.period;
                 EDOM_INFO(v)->slice_orig  = 
                     EDOM_INFO(v)->slice   = op->u.sedf.slice;
+                vcpu_schedule_unlock(v);
             }
         }
 
-        rc = sedf_adjust_weights(p->cpupool, op);
+        rc = sedf_adjust_weights(p->cpupool, nr_cpus, sumw, sumt);
         if ( rc )
-            return rc;
+            goto out;
 
         for_each_vcpu ( p, v )
         {
+            vcpu_schedule_lock(v);
             EDOM_INFO(v)->status  = 
                 (EDOM_INFO(v)->status &
                  ~EXTRA_AWARE) | (op->u.sedf.extratime & EXTRA_AWARE);
             EDOM_INFO(v)->latency = op->u.sedf.latency;
             extraq_check(v);
+            vcpu_schedule_unlock(v);
         }
     }
     else if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
     {
         if ( p->vcpu[0] == NULL )
-            return -EINVAL;
+        {
+            rc = -EINVAL;
+            goto out;
+        }
+
         op->u.sedf.period    = EDOM_INFO(p->vcpu[0])->period;
         op->u.sedf.slice     = EDOM_INFO(p->vcpu[0])->slice;
         op->u.sedf.extratime = EDOM_INFO(p->vcpu[0])->status & EXTRA_AWARE;
@@ -1477,14 +1549,23 @@ static int sedf_adjust(const struct sche
         op->u.sedf.weight    = EDOM_INFO(p->vcpu[0])->weight;
     }
 
-    PRINT(2,"sedf_adjust_finished\n");
-    return 0;
+out:
+    spin_unlock_irqrestore(&prv->lock, flags);
+
+    xfree(sumt);
+    xfree(sumw);
+
+    PRINT(2,"sedf_adjust_finished with return code %d\n", rc);
+    return rc;
 }
 
+static struct sedf_priv_info _sedf_priv;
+
 const struct scheduler sched_sedf_def = {
-    .name     = "Simple EDF Scheduler",
-    .opt_name = "sedf",
-    .sched_id = XEN_SCHEDULER_SEDF,
+    .name           = "Simple EDF Scheduler",
+    .opt_name       = "sedf",
+    .sched_id       = XEN_SCHEDULER_SEDF,
+    .sched_data     = &_sedf_priv,
     
     .init_domain    = sedf_init_domain,
     .destroy_domain = sedf_destroy_domain,
@@ -1498,6 +1579,9 @@ const struct scheduler sched_sedf_def = 
     .alloc_domdata  = sedf_alloc_domdata,
     .free_domdata   = sedf_free_domdata,
 
+    .init           = sedf_init,
+    .deinit         = sedf_deinit,
+
     .do_schedule    = sedf_do_schedule,
     .pick_cpu       = sedf_pick_cpu,
     .dump_cpu_state = sedf_dump_cpu_state,
diff -r 38eb74c01d9d xen/common/schedule.c
--- a/xen/common/schedule.c	Tue Dec 06 21:16:56 2011 +0000
+++ b/xen/common/schedule.c	Wed Dec 07 15:45:55 2011 +0100
@@ -1005,7 +1005,6 @@ int sched_id(void)
 /* Adjust scheduling parameter for a given domain. */
 long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
 {
-    struct vcpu *v;
     long ret;
     
     if ( (op->sched_id != DOM2OP(d)->sched_id) ||
@@ -1013,40 +1012,11 @@ long sched_adjust(struct domain *d, stru
           (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
         return -EINVAL;
 
-    /*
-     * Most VCPUs we can simply pause. If we are adjusting this VCPU then
-     * we acquire the local schedule_lock to guard against concurrent updates.
-     *
-     * We only acquire the local schedule lock after we have paused all other
-     * VCPUs in this domain. There are two reasons for this:
-     * 1- We don't want to hold up interrupts as pausing a VCPU can
-     *    trigger a tlb shootdown.
-     * 2- Pausing other VCPUs involves briefly locking the schedule
-     *    lock of the CPU they are running on. This CPU could be the
-     *    same as ours.
-     */
-
-    for_each_vcpu ( d, v )
-    {
-        if ( v != current )
-            vcpu_pause(v);
-    }
-
-    if ( d == current->domain )
-        vcpu_schedule_lock_irq(current);
-
+    /* NB: the pluggable scheduler code needs to take care
+     * of locking by itself. */
     if ( (ret = SCHED_OP(DOM2OP(d), adjust, d, op)) == 0 )
         TRACE_1D(TRC_SCHED_ADJDOM, d->domain_id);
 
-    if ( d == current->domain )
-        vcpu_schedule_unlock_irq(current);
-
-    for_each_vcpu ( d, v )
-    {
-        if ( v != current )
-            vcpu_unpause(v);
-    }
-
     return ret;
 }
 

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 0 of 1] Rework locking for sched_adjust.
  2011-12-07 14:49     ` [PATCHv2 0 of 1] Rework locking for sched_adjust Dario Faggioli
  2011-12-07 15:02       ` [PATCHv2 1 " Dario Faggioli
@ 2011-12-07 15:04       ` Dario Faggioli
  1 sibling, 0 replies; 21+ messages in thread
From: Dario Faggioli @ 2011-12-07 15:04 UTC (permalink / raw)
  To: xen-devel@lists.xensource.com
  Cc: George Dunlap, Juergen Gross, Keir (Xen.org), Ian Campbell


[-- Attachment #1.1: Type: text/plain, Size: 629 bytes --]

On Wed, 2011-12-07 at 15:49 +0100, Dario Faggioli wrote:
> Hi everyone,
>
Ok, apparently Evolution (with maybe a little help from me too! :-P)
messed up with threading... For the next time, I'll take a serious look
at that thing called patchbomb, I promise! :-P

Sorry for this,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 1 of 1] Rework locking for sched_adjust.
  2011-12-07 15:02       ` [PATCHv2 1 " Dario Faggioli
@ 2011-12-14 10:24         ` George Dunlap
  0 siblings, 0 replies; 21+ messages in thread
From: George Dunlap @ 2011-12-14 10:24 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: George Dunlap, Juergen Gross, xen-devel@lists.xensource.com,
	Keir (Xen.org), Ian Campbell

Dario, this patch won't apply due to wordwrap damage.  Try using "hg
email", or sending as an attachment.

Thanks,
 -George

On Wed, 2011-12-07 at 15:02 +0000, Dario Faggioli wrote:
> The main idea is to move (as much as possible) locking logic
> from generic code to the various pluggable schedulers.
> 
> While at it, the following is also accomplished:
>  - pausing all the non-current VCPUs of a domain while changing its
>    scheduling parameters is not effective in avoiding races and it is
>    prone to deadlock, so that is removed.
>  - sedf needs a global lock for preventing races while adjusting
>    domains' scheduling parameters (as it is for credit and credit2),
>    so that is added.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> 
> diff -r 38eb74c01d9d xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c	Tue Dec 06 21:16:56 2011 +0000
> +++ b/xen/common/sched_credit.c	Wed Dec 07 15:45:55 2011 +0100
> @@ -161,6 +161,7 @@ struct csched_dom {
>   * System-wide private data
>   */
>  struct csched_private {
> +    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
>      spinlock_t lock;
>      struct list_head active_sdom;
>      uint32_t ncpus;
> @@ -800,6 +801,10 @@ csched_dom_cntl(
>      struct csched_private *prv = CSCHED_PRIV(ops);
>      unsigned long flags;
>  
> +    /* Protect both get and put branches with the pluggable scheduler
> +     * lock. Runq lock not needed anywhere in here. */
> +    spin_lock_irqsave(&prv->lock, flags);
> +
>      if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
>      {
>          op->u.credit.weight = sdom->weight;
> @@ -809,8 +814,6 @@ csched_dom_cntl(
>      {
>          ASSERT(op->cmd == XEN_DOMCTL_SCHEDOP_putinfo);
>  
> -        spin_lock_irqsave(&prv->lock, flags);
> -
>          if ( op->u.credit.weight != 0 )
>          {
>              if ( !list_empty(&sdom->active_sdom_elem) )
> @@ -824,9 +827,10 @@ csched_dom_cntl(
>          if ( op->u.credit.cap != (uint16_t)~0U )
>              sdom->cap = op->u.credit.cap;
>  
> -        spin_unlock_irqrestore(&prv->lock, flags);
>      }
>  
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +
>      return 0;
>  }
>  
> diff -r 38eb74c01d9d xen/common/sched_credit2.c
> --- a/xen/common/sched_credit2.c	Tue Dec 06 21:16:56 2011 +0000
> +++ b/xen/common/sched_credit2.c	Wed Dec 07 15:45:55 2011 +0100
> @@ -1384,6 +1384,10 @@ csched_dom_cntl(
>      struct csched_private *prv = CSCHED_PRIV(ops);
>      unsigned long flags;
>  
> +    /* Must hold csched_priv lock to read and update sdom,
> +     * runq lock to update csvcs. */
> +    spin_lock_irqsave(&prv->lock, flags);
> +
>      if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
>      {
>          op->u.credit2.weight = sdom->weight;
> @@ -1397,10 +1401,6 @@ csched_dom_cntl(
>              struct list_head *iter;
>              int old_weight;
>  
> -            /* Must hold csched_priv lock to update sdom, runq lock to
> -             * update csvcs. */
> -            spin_lock_irqsave(&prv->lock, flags);
> -
>              old_weight = sdom->weight;
>  
>              sdom->weight = op->u.credit2.weight;
> @@ -1411,22 +1411,23 @@ csched_dom_cntl(
>                  struct csched_vcpu *svc = list_entry(iter, struct csched_vcpu, sdom_elem);
>  
>                  /* NB: Locking order is important here.  Because we grab this lock here, we
> -                 * must never lock csched_priv.lock if we're holding a runqueue
> -                 * lock. */
> -                vcpu_schedule_lock_irq(svc->vcpu);
> +                 * must never lock csched_priv.lock if we're holding a runqueue lock.
> +                 * Also, calling vcpu_schedule_lock() is enough, since IRQs have already
> +                 * been disabled. */
> +                vcpu_schedule_lock(svc->vcpu);
>  
>                  BUG_ON(svc->rqd != RQD(ops, svc->vcpu->processor));
>  
>                  svc->weight = sdom->weight;
>                  update_max_weight(svc->rqd, svc->weight, old_weight);
>  
> -                vcpu_schedule_unlock_irq(svc->vcpu);
> +                vcpu_schedule_unlock(svc->vcpu);
>              }
> -
> -            spin_unlock_irqrestore(&prv->lock, flags);
>          }
>      }
>  
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +
>      return 0;
>  }
>  
> diff -r 38eb74c01d9d xen/common/sched_sedf.c
> --- a/xen/common/sched_sedf.c	Tue Dec 06 21:16:56 2011 +0000
> +++ b/xen/common/sched_sedf.c	Wed Dec 07 15:45:55 2011 +0100
> @@ -61,6 +61,11 @@ struct sedf_dom_info {
>      struct domain  *domain;
>  };
>  
> +struct sedf_priv_info {
> +    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
> +    spinlock_t lock;
> +};
> +
>  struct sedf_vcpu_info {
>      struct vcpu *vcpu;
>      struct list_head list;
> @@ -115,6 +120,8 @@ struct sedf_cpu_info {
>      s_time_t         current_slice_expires;
>  };
>  
> +#define SEDF_PRIV(_ops) \
> +    ((struct sedf_priv_info *)((_ops)->sched_data))
>  #define EDOM_INFO(d)   ((struct sedf_vcpu_info *)((d)->sched_priv))
>  #define CPU_INFO(cpu)  \
>      ((struct sedf_cpu_info *)per_cpu(schedule_data, cpu).sched_priv)
> @@ -772,6 +779,31 @@ static struct task_slice sedf_do_extra_s
>  }
>  
> 
> +static int sedf_init(struct scheduler *ops)
> +{
> +    struct sedf_priv_info *prv;
> +
> +    prv = xzalloc(struct sedf_priv_info);
> +    if ( prv == NULL )
> +        return -ENOMEM;
> +
> +    ops->sched_data = prv;
> +    spin_lock_init(&prv->lock);
> +
> +    return 0;
> +}
> +
> +
> +static void sedf_deinit(const struct scheduler *ops)
> +{
> +    struct sedf_priv_info *prv;
> +
> +    prv = SEDF_PRIV(ops);
> +    if ( prv != NULL )
> +        xfree(prv);
> +}
> +
> +
>  /* Main scheduling function
>     Reasons for calling this function are:
>     -timeslice for the current period used up
> @@ -1320,22 +1352,15 @@ static void sedf_dump_cpu_state(const st
>  
> 
>  /* Adjusts periods and slices of the domains accordingly to their weights. */
> -static int sedf_adjust_weights(struct cpupool *c, struct xen_domctl_scheduler_op *cmd)
> +static int sedf_adjust_weights(struct cpupool *c, int nr_cpus, int *sumw, s_time_t *sumt)
>  {
>      struct vcpu *p;
>      struct domain      *d;
> -    unsigned int        cpu, nr_cpus = cpumask_last(&cpu_online_map) + 1;
> -    int                *sumw = xzalloc_array(int, nr_cpus);
> -    s_time_t           *sumt = xzalloc_array(s_time_t, nr_cpus);
> +    unsigned int        cpu;
>  
> -    if ( !sumw || !sumt )
> -    {
> -        xfree(sumt);
> -        xfree(sumw);
> -        return -ENOMEM;
> -    }
> -
> -    /* Sum across all weights. */
> +    /* Sum across all weights. Notice that no runq locking is needed
> +     * here: the caller holds sedf_priv_info.lock and we're not changing
> +     * anything that is accessed during scheduling. */
>      rcu_read_lock(&domlist_read_lock);
>      for_each_domain_in_cpupool( d, c )
>      {
> @@ -1365,7 +1390,9 @@ static int sedf_adjust_weights(struct cp
>      }
>      rcu_read_unlock(&domlist_read_lock);
>  
> -    /* Adjust all slices (and periods) to the new weight. */
> +    /* Adjust all slices (and periods) to the new weight. Unlike above, we
> +     * need to take thr runq lock for the various VCPUs: we're modyfing
> +     * slice and period which are referenced during scheduling. */
>      rcu_read_lock(&domlist_read_lock);
>      for_each_domain_in_cpupool( d, c )
>      {
> @@ -1375,20 +1402,20 @@ static int sedf_adjust_weights(struct cp
>                  continue;
>              if ( EDOM_INFO(p)->weight )
>              {
> +                /* Interrupts already off */
> +                vcpu_schedule_lock(p);
>                  EDOM_INFO(p)->period_orig = 
>                      EDOM_INFO(p)->period  = WEIGHT_PERIOD;
>                  EDOM_INFO(p)->slice_orig  =
>                      EDOM_INFO(p)->slice   = 
>                      (EDOM_INFO(p)->weight *
>                       (WEIGHT_PERIOD - WEIGHT_SAFETY - sumt[cpu])) / sumw[cpu];
> +                vcpu_schedule_unlock(p);
>              }
>          }
>      }
>      rcu_read_unlock(&domlist_read_lock);
>  
> -    xfree(sumt);
> -    xfree(sumw);
> -
>      return 0;
>  }
>  
> @@ -1396,19 +1423,45 @@ static int sedf_adjust_weights(struct cp
>  /* set or fetch domain scheduling parameters */
>  static int sedf_adjust(const struct scheduler *ops, struct domain *p, struct xen_domctl_scheduler_op *op)
>  {
> +    struct sedf_priv_info *prv = SEDF_PRIV(ops);
> +    unsigned long flags;
> +    unsigned int nr_cpus = cpumask_last(&cpu_online_map) + 1;
> +    int *sumw = xzalloc_array(int, nr_cpus);
> +    s_time_t *sumt = xzalloc_array(s_time_t, nr_cpus);
>      struct vcpu *v;
> -    int rc;
> +    int rc = 0;
>  
>      PRINT(2,"sedf_adjust was called, domain-id %i new period %"PRIu64" "
>            "new slice %"PRIu64"\nlatency %"PRIu64" extra:%s\n",
>            p->domain_id, op->u.sedf.period, op->u.sedf.slice,
>            op->u.sedf.latency, (op->u.sedf.extratime)?"yes":"no");
>  
> +    /* Serialize against the pluggable scheduler lock to protect from
> +     * concurrent updates. We need to take the runq lock for the VCPUs
> +     * as well, since we are touching extraweight, weight, slice and
> +     * period. As in sched_credit2.c, runq locks nest inside the
> +     * pluggable scheduler lock. */
> +    spin_lock_irqsave(&prv->lock, flags);
> +
>      if ( op->cmd == XEN_DOMCTL_SCHEDOP_putinfo )
>      {
> +        /* These are used in sedf_adjust_weights() but have to be allocated in
> +         * this function, as we need to avoid nesting xmem_pool_alloc's lock
> +         * within our prv->lock. */
> +        if ( !sumw || !sumt )
> +        {
> +            /* Check for errors here, the _getinfo branch doesn't care */
> +            rc = -ENOMEM;
> +            goto out;
> +        }
> +
>          /* Check for sane parameters. */
>          if ( !op->u.sedf.period && !op->u.sedf.weight )
> -            return -EINVAL;
> +        {
> +            rc = -EINVAL;
> +            goto out;
> +        }
> +
>          if ( op->u.sedf.weight )
>          {
>              if ( (op->u.sedf.extratime & EXTRA_AWARE) &&
> @@ -1417,59 +1470,78 @@ static int sedf_adjust(const struct sche
>                  /* Weight-driven domains with extratime only. */
>                  for_each_vcpu ( p, v )
>                  {
> +                    /* (Here and everywhere in the following) IRQs are already off,
> +                     * hence vcpu_spin_lock() is the one. */
> +                    vcpu_schedule_lock(v);
>                      EDOM_INFO(v)->extraweight = op->u.sedf.weight;
>                      EDOM_INFO(v)->weight = 0;
>                      EDOM_INFO(v)->slice = 0;
>                      EDOM_INFO(v)->period = WEIGHT_PERIOD;
> +                    vcpu_schedule_unlock(v);
>                  }
>              }
>              else
>              {
>                  /* Weight-driven domains with real-time execution. */
> -                for_each_vcpu ( p, v )
> +                for_each_vcpu ( p, v ) {
> +                    vcpu_schedule_lock(v);
>                      EDOM_INFO(v)->weight = op->u.sedf.weight;
> +                    vcpu_schedule_unlock(v);
> +                }
>              }
>          }
>          else
>          {
> +            /*
> +             * Sanity checking: note that disabling extra weight requires
> +             * that we set a non-zero slice.
> +             */
> +            if ( (op->u.sedf.period > PERIOD_MAX) ||
> +                 (op->u.sedf.period < PERIOD_MIN) ||
> +                 (op->u.sedf.slice  > op->u.sedf.period) ||
> +                 (op->u.sedf.slice  < SLICE_MIN) )
> +            {
> +                rc = -EINVAL;
> +                goto out;
> +            }
> +
>              /* Time-driven domains. */
>              for_each_vcpu ( p, v )
>              {
> -                /*
> -                 * Sanity checking: note that disabling extra weight requires
> -                 * that we set a non-zero slice.
> -                 */
> -                if ( (op->u.sedf.period > PERIOD_MAX) ||
> -                     (op->u.sedf.period < PERIOD_MIN) ||
> -                     (op->u.sedf.slice  > op->u.sedf.period) ||
> -                     (op->u.sedf.slice  < SLICE_MIN) )
> -                    return -EINVAL;
> +                vcpu_schedule_lock(v);
>                  EDOM_INFO(v)->weight = 0;
>                  EDOM_INFO(v)->extraweight = 0;
>                  EDOM_INFO(v)->period_orig = 
>                      EDOM_INFO(v)->period  = op->u.sedf.period;
>                  EDOM_INFO(v)->slice_orig  = 
>                      EDOM_INFO(v)->slice   = op->u.sedf.slice;
> +                vcpu_schedule_unlock(v);
>              }
>          }
>  
> -        rc = sedf_adjust_weights(p->cpupool, op);
> +        rc = sedf_adjust_weights(p->cpupool, nr_cpus, sumw, sumt);
>          if ( rc )
> -            return rc;
> +            goto out;
>  
>          for_each_vcpu ( p, v )
>          {
> +            vcpu_schedule_lock(v);
>              EDOM_INFO(v)->status  = 
>                  (EDOM_INFO(v)->status &
>                   ~EXTRA_AWARE) | (op->u.sedf.extratime & EXTRA_AWARE);
>              EDOM_INFO(v)->latency = op->u.sedf.latency;
>              extraq_check(v);
> +            vcpu_schedule_unlock(v);
>          }
>      }
>      else if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
>      {
>          if ( p->vcpu[0] == NULL )
> -            return -EINVAL;
> +        {
> +            rc = -EINVAL;
> +            goto out;
> +        }
> +
>          op->u.sedf.period    = EDOM_INFO(p->vcpu[0])->period;
>          op->u.sedf.slice     = EDOM_INFO(p->vcpu[0])->slice;
>          op->u.sedf.extratime = EDOM_INFO(p->vcpu[0])->status & EXTRA_AWARE;
> @@ -1477,14 +1549,23 @@ static int sedf_adjust(const struct sche
>          op->u.sedf.weight    = EDOM_INFO(p->vcpu[0])->weight;
>      }
>  
> -    PRINT(2,"sedf_adjust_finished\n");
> -    return 0;
> +out:
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +
> +    xfree(sumt);
> +    xfree(sumw);
> +
> +    PRINT(2,"sedf_adjust_finished with return code %d\n", rc);
> +    return rc;
>  }
>  
> +static struct sedf_priv_info _sedf_priv;
> +
>  const struct scheduler sched_sedf_def = {
> -    .name     = "Simple EDF Scheduler",
> -    .opt_name = "sedf",
> -    .sched_id = XEN_SCHEDULER_SEDF,
> +    .name           = "Simple EDF Scheduler",
> +    .opt_name       = "sedf",
> +    .sched_id       = XEN_SCHEDULER_SEDF,
> +    .sched_data     = &_sedf_priv,
>      
>      .init_domain    = sedf_init_domain,
>      .destroy_domain = sedf_destroy_domain,
> @@ -1498,6 +1579,9 @@ const struct scheduler sched_sedf_def = 
>      .alloc_domdata  = sedf_alloc_domdata,
>      .free_domdata   = sedf_free_domdata,
>  
> +    .init           = sedf_init,
> +    .deinit         = sedf_deinit,
> +
>      .do_schedule    = sedf_do_schedule,
>      .pick_cpu       = sedf_pick_cpu,
>      .dump_cpu_state = sedf_dump_cpu_state,
> diff -r 38eb74c01d9d xen/common/schedule.c
> --- a/xen/common/schedule.c	Tue Dec 06 21:16:56 2011 +0000
> +++ b/xen/common/schedule.c	Wed Dec 07 15:45:55 2011 +0100
> @@ -1005,7 +1005,6 @@ int sched_id(void)
>  /* Adjust scheduling parameter for a given domain. */
>  long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
>  {
> -    struct vcpu *v;
>      long ret;
>      
>      if ( (op->sched_id != DOM2OP(d)->sched_id) ||
> @@ -1013,40 +1012,11 @@ long sched_adjust(struct domain *d, stru
>            (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
>          return -EINVAL;
>  
> -    /*
> -     * Most VCPUs we can simply pause. If we are adjusting this VCPU then
> -     * we acquire the local schedule_lock to guard against concurrent updates.
> -     *
> -     * We only acquire the local schedule lock after we have paused all other
> -     * VCPUs in this domain. There are two reasons for this:
> -     * 1- We don't want to hold up interrupts as pausing a VCPU can
> -     *    trigger a tlb shootdown.
> -     * 2- Pausing other VCPUs involves briefly locking the schedule
> -     *    lock of the CPU they are running on. This CPU could be the
> -     *    same as ours.
> -     */
> -
> -    for_each_vcpu ( d, v )
> -    {
> -        if ( v != current )
> -            vcpu_pause(v);
> -    }
> -
> -    if ( d == current->domain )
> -        vcpu_schedule_lock_irq(current);
> -
> +    /* NB: the pluggable scheduler code needs to take care
> +     * of locking by itself. */
>      if ( (ret = SCHED_OP(DOM2OP(d), adjust, d, op)) == 0 )
>          TRACE_1D(TRC_SCHED_ADJDOM, d->domain_id);
>  
> -    if ( d == current->domain )
> -        vcpu_schedule_unlock_irq(current);
> -
> -    for_each_vcpu ( d, v )
> -    {
> -        if ( v != current )
> -            vcpu_unpause(v);
> -    }
> -
>      return ret;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2011-12-14 10:24 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-23 14:55 [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust Dario Faggioli
2011-11-23 15:07 ` [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers Dario Faggioli
2011-11-23 16:24   ` Ian Campbell
2011-11-23 17:09     ` Dario Faggioli
2011-11-23 17:30       ` Ian Campbell
2011-12-06 10:34     ` George Dunlap
2011-12-06 16:35       ` Dario Faggioli
2011-12-07 14:49     ` [PATCHv2 0 of 1] Rework locking for sched_adjust Dario Faggioli
2011-12-07 15:02       ` [PATCHv2 1 " Dario Faggioli
2011-12-14 10:24         ` George Dunlap
2011-12-07 15:04       ` [PATCHv2 0 " Dario Faggioli
2011-11-23 15:09 ` [RFC/RFT][PATCH 2 of 3] Remove VCPU pausing while adjusting domain scheduling parameters Dario Faggioli
2011-11-23 15:11 ` [RFC/RFT][PATCH 3 of 3] Introduce proper locking in sedf Dario Faggioli
2011-12-06  8:38 ` [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust Juergen Gross
2011-12-06 10:10   ` Dario Faggioli
2011-12-06 11:03     ` Juergen Gross
2011-12-06 12:30   ` George Dunlap
2011-12-06 12:39     ` Juergen Gross
2011-12-06 16:39       ` Dario Faggioli
2011-12-06 12:24 ` George Dunlap
2011-12-06 16:46   ` Dario Faggioli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.