public inbox for openembedded-core@lists.openembedded.org
 help / color / mirror / Atom feed
* [PATCHv2 0/2] Multi-threaded RPM support
@ 2017-06-08 14:42 Alexander Kanavin
  2017-06-08 14:42 ` [PATCHv2 1/2] package_rpm.bbclass: use multithreaded xz compression Alexander Kanavin
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Alexander Kanavin @ 2017-06-08 14:42 UTC (permalink / raw)
  To: openembedded-core

Changes in version 2: use openmp for multi-processing instead of NSPR library, as
requested by upstream. Add a link to the upstream submission to patches.

These two patches add multi-threaded features to RPM to speed up the do_package_write_rpm task.

Specifically:

1) Multi-threaded xz compressor is used instead of default single-threaded gz. This has the most
dramatic effect when a recipe produces a smaller number of large-sized packages.

2) Packages creation is run in multiple threads via thread pools. This is most beneficial when
a recipe produces a large amount of small packages.

Some not very scientific benchmarks for time and .rpm sizes
(time is measured for do_package_write_rpm tasks only):

webkitgtk
before: 9m12s 1550M
after: 1m40s 858M

glibc-locale
before: 2m52s 125M
after: 30s 56M

glibc
before: 46s 54M
after: 13s 38M

perl
before: 1m09s 63M
after: 45s 42M

python3
before: 30s 38M
after: 18s 24M

The following changes since commit 576821ea0a7558b626ccc87e9ae0e9ee40864956:

  bitbake: bitbake-layers: check layer dependencies before adding (2017-06-06 19:52:51 +0100)

are available in the git repository at:

  git://git.yoctoproject.org/poky-contrib akanavin/parallel-rpm
  http://git.yoctoproject.org/cgit.cgi/poky-contrib/log/?h=akanavin/parallel-rpm

Alexander Kanavin (2):
  package_rpm.bbclass: use multithreaded xz compression
  rpm: run binary package generation via thread pools

 meta/classes/package_rpm.bbclass                   |   2 +
 ...y-package-building-into-a-separate-functi.patch |  83 ++++++++
 ...-binary-package-creation-via-thread-pools.patch | 125 ++++++++++++
 ...c-make-operations-over-string-pools-threa.patch | 207 ++++++++++++++++++++
 ...c-remove-static-local-variables-from-buil.patch | 216 +++++++++++++++++++++
 meta/recipes-devtools/rpm/rpm_git.bb               |   4 +
 6 files changed, 637 insertions(+)
 create mode 100644 meta/recipes-devtools/rpm/files/0001-Split-binary-package-building-into-a-separate-functi.patch
 create mode 100644 meta/recipes-devtools/rpm/files/0002-Run-binary-package-creation-via-thread-pools.patch
 create mode 100644 meta/recipes-devtools/rpm/files/0003-rpmstrpool.c-make-operations-over-string-pools-threa.patch
 create mode 100644 meta/recipes-devtools/rpm/files/0004-build-pack.c-remove-static-local-variables-from-buil.patch

-- 
2.11.0



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCHv2 1/2] package_rpm.bbclass: use multithreaded xz compression
  2017-06-08 14:42 [PATCHv2 0/2] Multi-threaded RPM support Alexander Kanavin
@ 2017-06-08 14:42 ` Alexander Kanavin
  2017-06-08 14:42 ` [PATCHv2 2/2] rpm: run binary package generation via thread pools Alexander Kanavin
  2017-06-08 21:15 ` [PATCHv2 0/2] Multi-threaded RPM support Leonardo Sandoval
  2 siblings, 0 replies; 10+ messages in thread
From: Alexander Kanavin @ 2017-06-08 14:42 UTC (permalink / raw)
  To: openembedded-core

RPM's default is single-threaded gz; the change greatly helps with
both buildtimes (when there is a small number of large-sized packages)
and disk space taken by resulting rpms.

Signed-off-by: Alexander Kanavin <alexander.kanavin@linux.intel.com>
---
 meta/classes/package_rpm.bbclass | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/meta/classes/package_rpm.bbclass b/meta/classes/package_rpm.bbclass
index a844c4d4360..dc241975c3b 100644
--- a/meta/classes/package_rpm.bbclass
+++ b/meta/classes/package_rpm.bbclass
@@ -644,6 +644,8 @@ python do_package_rpm () {
     cmd = cmd + " --define '_build_name_fmt %%{NAME}-%%{VERSION}-%%{RELEASE}.%%{ARCH}.rpm'"
     cmd = cmd + " --define '_use_internal_dependency_generator 0'"
     cmd = cmd + " --define '_binaries_in_noarch_packages_terminate_build 0'"
+    cmd = cmd + " --define '_binary_payload w6T.xzdio'"
+    cmd = cmd + " --define '_source_payload w6T.xzdio'"
     if perfiledeps:
         cmd = cmd + " --define '__find_requires " + outdepends + "'"
         cmd = cmd + " --define '__find_provides " + outprovides + "'"
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCHv2 2/2] rpm: run binary package generation via thread pools
  2017-06-08 14:42 [PATCHv2 0/2] Multi-threaded RPM support Alexander Kanavin
  2017-06-08 14:42 ` [PATCHv2 1/2] package_rpm.bbclass: use multithreaded xz compression Alexander Kanavin
@ 2017-06-08 14:42 ` Alexander Kanavin
  2017-06-08 16:55   ` Mark Hatle
  2017-06-08 21:15 ` [PATCHv2 0/2] Multi-threaded RPM support Leonardo Sandoval
  2 siblings, 1 reply; 10+ messages in thread
From: Alexander Kanavin @ 2017-06-08 14:42 UTC (permalink / raw)
  To: openembedded-core

This greatly reduces build times when there is a large amount of small
rpm packages to produce. The patches are rather invasive,
and so will be submitted upstream.

Signed-off-by: Alexander Kanavin <alexander.kanavin@linux.intel.com>
---
 ...y-package-building-into-a-separate-functi.patch |  83 ++++++++
 ...-binary-package-creation-via-thread-pools.patch | 125 ++++++++++++
 ...c-make-operations-over-string-pools-threa.patch | 207 ++++++++++++++++++++
 ...c-remove-static-local-variables-from-buil.patch | 216 +++++++++++++++++++++
 meta/recipes-devtools/rpm/rpm_git.bb               |   4 +
 5 files changed, 635 insertions(+)
 create mode 100644 meta/recipes-devtools/rpm/files/0001-Split-binary-package-building-into-a-separate-functi.patch
 create mode 100644 meta/recipes-devtools/rpm/files/0002-Run-binary-package-creation-via-thread-pools.patch
 create mode 100644 meta/recipes-devtools/rpm/files/0003-rpmstrpool.c-make-operations-over-string-pools-threa.patch
 create mode 100644 meta/recipes-devtools/rpm/files/0004-build-pack.c-remove-static-local-variables-from-buil.patch

diff --git a/meta/recipes-devtools/rpm/files/0001-Split-binary-package-building-into-a-separate-functi.patch b/meta/recipes-devtools/rpm/files/0001-Split-binary-package-building-into-a-separate-functi.patch
new file mode 100644
index 00000000000..3d8b12144e7
--- /dev/null
+++ b/meta/recipes-devtools/rpm/files/0001-Split-binary-package-building-into-a-separate-functi.patch
@@ -0,0 +1,83 @@
+From b841b699e519438a66b661247c94efff63d0700e Mon Sep 17 00:00:00 2001
+From: Alexander Kanavin <alex.kanavin@gmail.com>
+Date: Thu, 25 May 2017 18:15:27 +0300
+Subject: [PATCH 01/14] Split binary package building into a separate function
+
+So that it can be run as a thread pool task.
+
+Upstream-Status: Submitted [https://github.com/rpm-software-management/rpm/pull/226]
+Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com>
+---
+ build/pack.c | 33 +++++++++++++++++++++------------
+ 1 file changed, 21 insertions(+), 12 deletions(-)
+
+diff --git a/build/pack.c b/build/pack.c
+index 497300b96..891e6bdc3 100644
+--- a/build/pack.c
++++ b/build/pack.c
+@@ -546,18 +546,13 @@ static rpmRC checkPackages(char *pkgcheck)
+     return RPMRC_OK;
+ }
+ 
+-rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
++static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int cheating, char** filename)
+ {
+-    rpmRC rc;
+-    const char *errorString;
+-    Package pkg;
+-    char *pkglist = NULL;
+-
+-    for (pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
+-	char *fn;
++        const char *errorString;
++        rpmRC rc = RPMRC_OK;
+ 
+ 	if (pkg->fileList == NULL)
+-	    continue;
++	    return rc;
+ 
+ 	if ((rc = processScriptFiles(spec, pkg)))
+ 	    return rc;
+@@ -591,7 +586,7 @@ rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
+ 		     headerGetString(pkg->header, RPMTAG_NAME), errorString);
+ 		return RPMRC_FAIL;
+ 	    }
+-	    fn = rpmGetPath("%{_rpmdir}/", binRpm, NULL);
++	    *filename = rpmGetPath("%{_rpmdir}/", binRpm, NULL);
+ 	    if ((binDir = strchr(binRpm, '/')) != NULL) {
+ 		struct stat st;
+ 		char *dn;
+@@ -613,14 +608,28 @@ rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
+ 	    free(binRpm);
+ 	}
+ 
+-	rc = writeRPM(pkg, NULL, fn, NULL);
++	rc = writeRPM(pkg, NULL, *filename, NULL);
+ 	if (rc == RPMRC_OK) {
+ 	    /* Do check each written package if enabled */
+-	    char *pkgcheck = rpmExpand("%{?_build_pkgcheck} ", fn, NULL);
++	    char *pkgcheck = rpmExpand("%{?_build_pkgcheck} ", *filename, NULL);
+ 	    if (pkgcheck[0] != ' ') {
+ 		rc = checkPackages(pkgcheck);
+ 	    }
+ 	    free(pkgcheck);
++	}
++        return rc;
++}
++
++rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
++{
++    rpmRC rc;
++    Package pkg;
++    char *pkglist = NULL;
++
++    for (pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
++	char *fn = NULL;
++        rc = packageBinary(spec, pkg, cookie, cheating, &fn);
++	if (rc == RPMRC_OK) {
+ 	    rstrcat(&pkglist, fn);
+ 	    rstrcat(&pkglist, " ");
+ 	}
+-- 
+2.11.0
+
diff --git a/meta/recipes-devtools/rpm/files/0002-Run-binary-package-creation-via-thread-pools.patch b/meta/recipes-devtools/rpm/files/0002-Run-binary-package-creation-via-thread-pools.patch
new file mode 100644
index 00000000000..549148930fa
--- /dev/null
+++ b/meta/recipes-devtools/rpm/files/0002-Run-binary-package-creation-via-thread-pools.patch
@@ -0,0 +1,125 @@
+From f8a05339a1e7b307e8de8c858e6e963782fc5925 Mon Sep 17 00:00:00 2001
+From: Alexander Kanavin <alex.kanavin@gmail.com>
+Date: Thu, 25 May 2017 19:30:20 +0300
+Subject: [PATCH 1/3] Run binary package creation via thread pools.
+
+Upstream-Status: Submitted [https://github.com/rpm-software-management/rpm/pull/226]
+Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com>
+---
+ build/pack.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
+ configure.ac |  3 +++
+ 2 files changed, 69 insertions(+), 14 deletions(-)
+
+diff --git a/build/pack.c b/build/pack.c
+index 5898a491b..f85d8dc90 100644
+--- a/build/pack.c
++++ b/build/pack.c
+@@ -616,25 +616,77 @@ static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int ch
+         return rc;
+ }
+ 
+-rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
++struct binaryPackageTaskData
+ {
+-    rpmRC rc;
+     Package pkg;
++    char *filename;
++    rpmRC result;
++    struct binaryPackageTaskData *next;
++};
++
++static struct binaryPackageTaskData* runBinaryPackageTasks(rpmSpec spec, const char *cookie, int cheating)
++{
++    struct binaryPackageTaskData *tasks = NULL;
++    struct binaryPackageTaskData *task = NULL;
++    struct binaryPackageTaskData *prev = NULL;
++
++    for (Package pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
++        task = rcalloc(1, sizeof(*task));
++        task->pkg = pkg;
++        if (pkg == spec->packages) {
++            // the first package needs to be processed ahead of others, as they copy
++            // changelog data from it, and so otherwise data races would happen
++            task->result = packageBinary(spec, pkg, cookie, cheating, &(task->filename));
++            rpmlog(RPMLOG_NOTICE, _("Finished binary package job, result %d, filename %s\n"), task->result, task->filename);
++            tasks = task;
++        }
++        if (prev != NULL) {
++            prev->next = task;
++        }
++        prev = task;
++    }
++
++    #pragma omp parallel
++    #pragma omp single
++    for (task = tasks; task != NULL; task = task->next) {
++        if (task != tasks)
++        #pragma omp task
++        {
++            task->result = packageBinary(spec, task->pkg, cookie, cheating, &(task->filename));
++            rpmlog(RPMLOG_NOTICE, _("Finished binary package job, result %d, filename %s\n"), task->result, task->filename);
++        }
++    }
++
++    return tasks;
++}
++
++static void freeBinaryPackageTasks(struct binaryPackageTaskData* tasks)
++{
++    while (tasks != NULL) {
++        struct binaryPackageTaskData* next = tasks->next;
++        rfree(tasks->filename);
++        rfree(tasks);
++        tasks = next;
++    }
++}
++
++rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
++{
+     char *pkglist = NULL;
+ 
+-    for (pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
+-	char *fn = NULL;
+-        rc = packageBinary(spec, pkg, cookie, cheating, &fn);
+-	if (rc == RPMRC_OK) {
+-	    rstrcat(&pkglist, fn);
+-	    rstrcat(&pkglist, " ");
+-	}
+-	free(fn);
+-	if (rc != RPMRC_OK) {
+-	    pkglist = _free(pkglist);
+-	    return rc;
+-	}
++    struct binaryPackageTaskData *tasks = runBinaryPackageTasks(spec, cookie, cheating);
++
++    for (struct binaryPackageTaskData *task = tasks; task != NULL; task = task->next) {
++        if (task->result == RPMRC_OK) {
++            rstrcat(&pkglist, task->filename);
++            rstrcat(&pkglist, " ");
++        } else {
++            _free(pkglist);
++            freeBinaryPackageTasks(tasks);
++            return RPMRC_FAIL;
++        }
+     }
++    freeBinaryPackageTasks(tasks);
+ 
+     /* Now check the package set if enabled */
+     if (pkglist != NULL) {
+diff --git a/configure.ac b/configure.ac
+index a506ec819..59fa0acaf 100644
+--- a/configure.ac
++++ b/configure.ac
+@@ -17,6 +17,9 @@ AC_DISABLE_STATIC
+ 
+ PKG_PROG_PKG_CONFIG
+ 
++AC_OPENMP
++RPMCFLAGS="$OPENMP_CFLAGS $RPMCFLAGS"
++
+ dnl Checks for programs.
+ AC_PROG_CXX
+ AC_PROG_AWK
+-- 
+2.11.0
+
diff --git a/meta/recipes-devtools/rpm/files/0003-rpmstrpool.c-make-operations-over-string-pools-threa.patch b/meta/recipes-devtools/rpm/files/0003-rpmstrpool.c-make-operations-over-string-pools-threa.patch
new file mode 100644
index 00000000000..cbfa44e268f
--- /dev/null
+++ b/meta/recipes-devtools/rpm/files/0003-rpmstrpool.c-make-operations-over-string-pools-threa.patch
@@ -0,0 +1,207 @@
+From 92a130c944ce79d082b94e42ea5a5edb2ac2237b Mon Sep 17 00:00:00 2001
+From: Alexander Kanavin <alex.kanavin@gmail.com>
+Date: Tue, 30 May 2017 13:58:30 +0300
+Subject: [PATCH 2/3] rpmstrpool.c: make operations over string pools
+ thread-safe
+
+Otherwise multithreaded rpm building explodes in various ways due
+to data races.
+
+Upstream-Status: Submitted [https://github.com/rpm-software-management/rpm/pull/226]
+Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com>
+
+---
+ rpmio/rpmstrpool.c | 56 +++++++++++++++++++++++++++++++++++++++++++++---------
+ 1 file changed, 47 insertions(+), 9 deletions(-)
+
+diff --git a/rpmio/rpmstrpool.c b/rpmio/rpmstrpool.c
+index 30a57eb10..58ba95a02 100644
+--- a/rpmio/rpmstrpool.c
++++ b/rpmio/rpmstrpool.c
+@@ -113,6 +113,8 @@ static poolHash poolHashCreate(int numBuckets)
+     return ht;
+ }
+ 
++static const char * rpmstrPoolStrNoLock(rpmstrPool pool, rpmsid sid);
++
+ static void poolHashResize(rpmstrPool pool, int numBuckets)
+ {
+     poolHash ht = pool->hash;
+@@ -120,7 +122,7 @@ static void poolHashResize(rpmstrPool pool, int numBuckets)
+ 
+     for (int i=0; i<ht->numBuckets; i++) {
+         if (!ht->buckets[i].keyid) continue;
+-        unsigned int keyHash = rstrhash(rpmstrPoolStr(pool, ht->buckets[i].keyid));
++        unsigned int keyHash = rstrhash(rpmstrPoolStrNoLock(pool, ht->buckets[i].keyid));
+         for (unsigned int j=0;;j++) {
+             unsigned int hash = hashbucket(keyHash, j) % numBuckets;
+             if (!buckets[hash].keyid) {
+@@ -149,7 +151,7 @@ static void poolHashAddHEntry(rpmstrPool pool, const char * key, unsigned int ke
+             ht->buckets[hash].keyid = keyid;
+             ht->keyCount++;
+             break;
+-        } else if (!strcmp(rpmstrPoolStr(pool, ht->buckets[hash].keyid), key)) {
++        } else if (!strcmp(rpmstrPoolStrNoLock(pool, ht->buckets[hash].keyid), key)) {
+             return;
+         }
+     }
+@@ -191,7 +193,7 @@ static void poolHashPrintStats(rpmstrPool pool)
+     int maxcollisions = 0;
+ 
+     for (i=0; i<ht->numBuckets; i++) {
+-        unsigned int keyHash = rstrhash(rpmstrPoolStr(pool, ht->buckets[i].keyid));
++        unsigned int keyHash = rstrhash(rpmstrPoolStrNoLock(pool, ht->buckets[i].keyid));
+         for (unsigned int j=0;;j++) {
+             unsigned int hash = hashbucket(keyHash, i) % ht->numBuckets;
+             if (hash==i) {
+@@ -221,7 +223,7 @@ static void rpmstrPoolRehash(rpmstrPool pool)
+ 
+     pool->hash = poolHashCreate(sizehint);
+     for (int i = 1; i <= pool->offs_size; i++)
+-	poolHashAddEntry(pool, rpmstrPoolStr(pool, i), i);
++	poolHashAddEntry(pool, rpmstrPoolStrNoLock(pool, i), i);
+ }
+ 
+ rpmstrPool rpmstrPoolCreate(void)
+@@ -245,6 +247,8 @@ rpmstrPool rpmstrPoolCreate(void)
+ 
+ rpmstrPool rpmstrPoolFree(rpmstrPool pool)
+ {
++    #pragma omp critical(rpmstrpool)
++    {
+     if (pool) {
+ 	if (pool->nrefs > 1) {
+ 	    pool->nrefs--;
+@@ -260,18 +264,24 @@ rpmstrPool rpmstrPoolFree(rpmstrPool pool)
+ 	    free(pool);
+ 	}
+     }
++    }
+     return NULL;
+ }
+ 
+ rpmstrPool rpmstrPoolLink(rpmstrPool pool)
+ {
++    #pragma omp critical(rpmstrpool)
++    {
+     if (pool)
+ 	pool->nrefs++;
++    }
+     return pool;
+ }
+ 
+ void rpmstrPoolFreeze(rpmstrPool pool, int keephash)
+ {
++    #pragma omp critical(rpmstrpool)
++    {
+     if (pool && !pool->frozen) {
+ 	if (!keephash) {
+ 	    pool->hash = poolHashFree(pool->hash);
+@@ -281,16 +291,20 @@ void rpmstrPoolFreeze(rpmstrPool pool, int keephash)
+ 			      pool->offs_alloced * sizeof(*pool->offs));
+ 	pool->frozen = 1;
+     }
++    }
+ }
+ 
+ void rpmstrPoolUnfreeze(rpmstrPool pool)
+ {
++    #pragma omp critical(rpmstrpool)
++    {
+     if (pool) {
+ 	if (pool->hash == NULL) {
+ 	    rpmstrPoolRehash(pool);
+ 	}
+ 	pool->frozen = 0;
+     }
++    }
+ }
+ 
+ static rpmsid rpmstrPoolPut(rpmstrPool pool, const char *s, size_t slen, unsigned int hash)
+@@ -350,7 +364,7 @@ static rpmsid rpmstrPoolGet(rpmstrPool pool, const char * key, size_t keylen,
+             return 0;
+         }
+ 
+-	s = rpmstrPoolStr(pool, ht->buckets[hash].keyid);
++	s = rpmstrPoolStrNoLock(pool, ht->buckets[hash].keyid);
+ 	/* pool string could be longer than keylen, require exact matche */
+ 	if (strncmp(s, key, keylen) == 0 && s[keylen] == '\0')
+ 	    return ht->buckets[hash].keyid;
+@@ -373,27 +387,31 @@ static inline rpmsid strn2id(rpmstrPool pool, const char *s, size_t slen,
+ rpmsid rpmstrPoolIdn(rpmstrPool pool, const char *s, size_t slen, int create)
+ {
+     rpmsid sid = 0;
+-
++    #pragma omp critical(rpmstrpool)
++    {
+     if (s != NULL) {
+ 	unsigned int hash = rstrnhash(s, slen);
+ 	sid = strn2id(pool, s, slen, hash, create);
+     }
++    }
+     return sid;
+ }
+ 
+ rpmsid rpmstrPoolId(rpmstrPool pool, const char *s, int create)
+ {
+     rpmsid sid = 0;
+-
++    #pragma omp critical(rpmstrpool)
++    {
+     if (s != NULL) {
+ 	size_t slen;
+ 	unsigned int hash = rstrlenhash(s, &slen);
+ 	sid = strn2id(pool, s, slen, hash, create);
+     }
++    }
+     return sid;
+ }
+ 
+-const char * rpmstrPoolStr(rpmstrPool pool, rpmsid sid)
++static const char * rpmstrPoolStrNoLock(rpmstrPool pool, rpmsid sid)
+ {
+     const char *s = NULL;
+     if (pool && sid > 0 && sid <= pool->offs_size)
+@@ -401,12 +419,25 @@ const char * rpmstrPoolStr(rpmstrPool pool, rpmsid sid)
+     return s;
+ }
+ 
++const char * rpmstrPoolStr(rpmstrPool pool, rpmsid sid)
++{
++    const char *s = NULL;
++    #pragma omp critical(rpmstrpool)
++    {
++    s = rpmstrPoolStrNoLock(pool, sid);
++    }
++    return s;
++}
++
+ size_t rpmstrPoolStrlen(rpmstrPool pool, rpmsid sid)
+ {
+     size_t slen = 0;
++    #pragma omp critical(rpmstrpool)
++    {
+     if (pool && sid > 0 && sid <= pool->offs_size) {
+ 	slen = strlen(pool->offs[sid]);
+     }
++    }
+     return slen;
+ }
+ 
+@@ -421,5 +452,12 @@ int rpmstrPoolStreq(rpmstrPool poolA, rpmsid sidA,
+ 
+ rpmsid rpmstrPoolNumStr(rpmstrPool pool)
+ {
+-    return (pool != NULL) ? pool->offs_size : 0;
++    rpmsid id = 0;
++    #pragma omp critical(rpmstrpool)
++    {
++    if (pool) {
++	id = pool->offs_size;
++    }
++    }
++    return id;
+ }
+-- 
+2.11.0
+
diff --git a/meta/recipes-devtools/rpm/files/0004-build-pack.c-remove-static-local-variables-from-buil.patch b/meta/recipes-devtools/rpm/files/0004-build-pack.c-remove-static-local-variables-from-buil.patch
new file mode 100644
index 00000000000..9d43c3767b1
--- /dev/null
+++ b/meta/recipes-devtools/rpm/files/0004-build-pack.c-remove-static-local-variables-from-buil.patch
@@ -0,0 +1,216 @@
+From 869d60f6cf68c2c83cc799f051c064e762ae9814 Mon Sep 17 00:00:00 2001
+From: Alexander Kanavin <alex.kanavin@gmail.com>
+Date: Thu, 8 Jun 2017 17:08:09 +0300
+Subject: [PATCH 3/3] build/pack.c: remove static local variables from
+ buildHost() and getBuildTime()
+
+Their use is causing difficult to diagnoze data races when building multiple
+packages in parallel, and is a bad idea in general, as it also makes it more
+difficult to reason about code.
+
+Upstream-Status: Submitted [https://github.com/rpm-software-management/rpm/pull/226]
+Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com>
+
+---
+ build/pack.c | 92 ++++++++++++++++++++++++++++++------------------------------
+ 1 file changed, 46 insertions(+), 46 deletions(-)
+
+diff --git a/build/pack.c b/build/pack.c
+index f85d8dc90..12b1ceb76 100644
+--- a/build/pack.c
++++ b/build/pack.c
+@@ -151,54 +151,47 @@ exit:
+     return rc;
+ }
+ 
+-static rpm_time_t * getBuildTime(void)
++static rpm_time_t getBuildTime(void)
+ {
+-    static rpm_time_t buildTime[1];
++    rpm_time_t buildTime = 0;
+     char *srcdate;
+     time_t epoch;
+     char *endptr;
+ 
+-    if (buildTime[0] == 0) {
+-        srcdate = getenv("SOURCE_DATE_EPOCH");
+-        if (srcdate) {
+-            errno = 0;
+-            epoch = strtol(srcdate, &endptr, 10);
+-            if (srcdate == endptr || *endptr || errno != 0)
+-                rpmlog(RPMLOG_ERR, _("unable to parse SOURCE_DATE_EPOCH\n"));
+-            else
+-                buildTime[0] = (int32_t) epoch;
+-        } else
+-            buildTime[0] = (int32_t) time(NULL);
+-    }
++    srcdate = getenv("SOURCE_DATE_EPOCH");
++    if (srcdate) {
++        errno = 0;
++        epoch = strtol(srcdate, &endptr, 10);
++        if (srcdate == endptr || *endptr || errno != 0)
++            rpmlog(RPMLOG_ERR, _("unable to parse SOURCE_DATE_EPOCH\n"));
++        else
++            buildTime = (int32_t) epoch;
++    } else
++        buildTime = (int32_t) time(NULL);
+ 
+     return buildTime;
+ }
+ 
+-static const char * buildHost(void)
++static char * buildHost(void)
+ {
+-    static char hostname[1024];
+-    static int oneshot = 0;
++    char* hostname;
+     struct hostent *hbn;
+     char *bhMacro;
+ 
+-    if (! oneshot) {
+-        bhMacro = rpmExpand("%{?_buildhost}", NULL);
+-        if (strcmp(bhMacro, "") != 0 && strlen(bhMacro) < 1024) {
+-            strcpy(hostname, bhMacro);
+-        } else {
+-            if (strcmp(bhMacro, "") != 0)
+-                rpmlog(RPMLOG_WARNING, _("The _buildhost macro is too long\n"));
+-            (void) gethostname(hostname, sizeof(hostname));
+-            hbn = gethostbyname(hostname);
+-            if (hbn)
+-                strcpy(hostname, hbn->h_name);
+-            else
+-                rpmlog(RPMLOG_WARNING,
+-                        _("Could not canonicalize hostname: %s\n"), hostname);
+-        }
+-        free(bhMacro);
+-        oneshot = 1;
+-    }
++    bhMacro = rpmExpand("%{?_buildhost}", NULL);
++    if (strcmp(bhMacro, "") != 0) {
++        rasprintf(&hostname, "%s", bhMacro);
++    } else {
++        hostname = rcalloc(1024, sizeof(*hostname));
++        (void) gethostname(hostname, 1024);
++        hbn = gethostbyname(hostname);
++        if (hbn)
++            strcpy(hostname, hbn->h_name);
++        else
++            rpmlog(RPMLOG_WARNING,
++                    _("Could not canonicalize hostname: %s\n"), hostname);
++    }
++    free(bhMacro);
+     return(hostname);
+ }
+ 
+@@ -308,7 +301,8 @@ static int haveRichDep(Package pkg)
+ }
+ 
+ static rpmRC writeRPM(Package pkg, unsigned char ** pkgidp,
+-		      const char *fileName, char **cookie)
++		      const char *fileName, char **cookie,
++		      rpm_time_t buildTime, const char* buildHost)
+ {
+     FD_t fd = NULL;
+     char * rpmio_flags = NULL;
+@@ -397,7 +391,7 @@ static rpmRC writeRPM(Package pkg, unsigned char ** pkgidp,
+ 
+     /* Create and add the cookie */
+     if (cookie) {
+-	rasprintf(cookie, "%s %d", buildHost(), (int) (*getBuildTime()));
++	rasprintf(cookie, "%s %d", buildHost, buildTime);
+ 	headerPutString(pkg->header, RPMTAG_COOKIE, *cookie);
+     }
+     
+@@ -546,7 +540,7 @@ static rpmRC checkPackages(char *pkgcheck)
+     return RPMRC_OK;
+ }
+ 
+-static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int cheating, char** filename)
++static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int cheating, char** filename, rpm_time_t buildTime, const char* buildHost)
+ {
+         const char *errorString;
+         rpmRC rc = RPMRC_OK;
+@@ -565,8 +559,8 @@ static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int ch
+ 	headerCopyTags(spec->packages->header, pkg->header, copyTags);
+ 	
+ 	headerPutString(pkg->header, RPMTAG_RPMVERSION, VERSION);
+-	headerPutString(pkg->header, RPMTAG_BUILDHOST, buildHost());
+-	headerPutUint32(pkg->header, RPMTAG_BUILDTIME, getBuildTime(), 1);
++	headerPutString(pkg->header, RPMTAG_BUILDHOST, buildHost);
++	headerPutUint32(pkg->header, RPMTAG_BUILDTIME, &buildTime, 1);
+ 
+ 	if (spec->sourcePkgId != NULL) {
+ 	    headerPutBin(pkg->header, RPMTAG_SOURCEPKGID, spec->sourcePkgId,16);
+@@ -604,7 +598,7 @@ static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int ch
+ 	    free(binRpm);
+ 	}
+ 
+-	rc = writeRPM(pkg, NULL, *filename, NULL);
++	rc = writeRPM(pkg, NULL, *filename, NULL, buildTime, buildHost);
+ 	if (rc == RPMRC_OK) {
+ 	    /* Do check each written package if enabled */
+ 	    char *pkgcheck = rpmExpand("%{?_build_pkgcheck} ", *filename, NULL);
+@@ -629,6 +623,8 @@ static struct binaryPackageTaskData* runBinaryPackageTasks(rpmSpec spec, const c
+     struct binaryPackageTaskData *tasks = NULL;
+     struct binaryPackageTaskData *task = NULL;
+     struct binaryPackageTaskData *prev = NULL;
++    rpm_time_t buildTime = getBuildTime();
++    char *host = buildHost();
+ 
+     for (Package pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
+         task = rcalloc(1, sizeof(*task));
+@@ -636,7 +632,7 @@ static struct binaryPackageTaskData* runBinaryPackageTasks(rpmSpec spec, const c
+         if (pkg == spec->packages) {
+             // the first package needs to be processed ahead of others, as they copy
+             // changelog data from it, and so otherwise data races would happen
+-            task->result = packageBinary(spec, pkg, cookie, cheating, &(task->filename));
++            task->result = packageBinary(spec, pkg, cookie, cheating, &(task->filename), buildTime, buildHost);
+             rpmlog(RPMLOG_NOTICE, _("Finished binary package job, result %d, filename %s\n"), task->result, task->filename);
+             tasks = task;
+         }
+@@ -652,11 +648,12 @@ static struct binaryPackageTaskData* runBinaryPackageTasks(rpmSpec spec, const c
+         if (task != tasks)
+         #pragma omp task
+         {
+-            task->result = packageBinary(spec, task->pkg, cookie, cheating, &(task->filename));
++            task->result = packageBinary(spec, task->pkg, cookie, cheating, &(task->filename), buildTime, buildHost);
+             rpmlog(RPMLOG_NOTICE, _("Finished binary package job, result %d, filename %s\n"), task->result, task->filename);
+         }
+     }
+ 
++    free(host);
+     return tasks;
+ }
+ 
+@@ -707,16 +704,18 @@ rpmRC packageSources(rpmSpec spec, char **cookie)
+     rpmRC rc;
+ 
+     /* Add some cruft */
++    rpm_time_t buildTime = getBuildTime();
++    char* host = buildHost();
+     headerPutString(sourcePkg->header, RPMTAG_RPMVERSION, VERSION);
+-    headerPutString(sourcePkg->header, RPMTAG_BUILDHOST, buildHost());
+-    headerPutUint32(sourcePkg->header, RPMTAG_BUILDTIME, getBuildTime(), 1);
++    headerPutString(sourcePkg->header, RPMTAG_BUILDHOST, (const char*)buildHost);
++    headerPutUint32(sourcePkg->header, RPMTAG_BUILDTIME, &buildTime, 1);
+ 
+     /* XXX this should be %_srpmdir */
+     {	char *fn = rpmGetPath("%{_srcrpmdir}/", spec->sourceRpmName,NULL);
+ 	char *pkgcheck = rpmExpand("%{?_build_pkgcheck_srpm} ", fn, NULL);
+ 
+ 	spec->sourcePkgId = NULL;
+-	rc = writeRPM(sourcePkg, &spec->sourcePkgId, fn, cookie);
++	rc = writeRPM(sourcePkg, &spec->sourcePkgId, fn, cookie, buildTime, host);
+ 
+ 	/* Do check SRPM package if enabled */
+ 	if (rc == RPMRC_OK && pkgcheck[0] != ' ') {
+@@ -726,5 +725,6 @@ rpmRC packageSources(rpmSpec spec, char **cookie)
+ 	free(pkgcheck);
+ 	free(fn);
+     }
++    free(host);
+     return rc;
+ }
+-- 
+2.11.0
+
diff --git a/meta/recipes-devtools/rpm/rpm_git.bb b/meta/recipes-devtools/rpm/rpm_git.bb
index 2310ee6b09e..b95b4719c19 100644
--- a/meta/recipes-devtools/rpm/rpm_git.bb
+++ b/meta/recipes-devtools/rpm/rpm_git.bb
@@ -35,6 +35,10 @@ SRC_URI = "git://github.com/rpm-software-management/rpm \
            file://0001-Fix-build-with-musl-C-library.patch \
            file://0001-Add-a-color-setting-for-mips64_n32-binaries.patch \
            file://0001-Add-PYTHON_ABI-when-searching-for-python-libraries.patch \
+           file://0001-Split-binary-package-building-into-a-separate-functi.patch \
+           file://0002-Run-binary-package-creation-via-thread-pools.patch \
+           file://0003-rpmstrpool.c-make-operations-over-string-pools-threa.patch \
+           file://0004-build-pack.c-remove-static-local-variables-from-buil.patch \
            "
 
 PV = "4.13.90+git${SRCPV}"
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 2/2] rpm: run binary package generation via thread pools
  2017-06-08 14:42 ` [PATCHv2 2/2] rpm: run binary package generation via thread pools Alexander Kanavin
@ 2017-06-08 16:55   ` Mark Hatle
  2017-06-08 17:23     ` Alexander Kanavin
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Hatle @ 2017-06-08 16:55 UTC (permalink / raw)
  To: Alexander Kanavin, openembedded-core

On 6/8/17 9:42 AM, Alexander Kanavin wrote:
> This greatly reduces build times when there is a large amount of small
> rpm packages to produce. The patches are rather invasive,
> and so will be submitted upstream.
> 
> Signed-off-by: Alexander Kanavin <alexander.kanavin@linux.intel.com>
> ---
>  ...y-package-building-into-a-separate-functi.patch |  83 ++++++++
>  ...-binary-package-creation-via-thread-pools.patch | 125 ++++++++++++
>  ...c-make-operations-over-string-pools-threa.patch | 207 ++++++++++++++++++++
>  ...c-remove-static-local-variables-from-buil.patch | 216 +++++++++++++++++++++
>  meta/recipes-devtools/rpm/rpm_git.bb               |   4 +
>  5 files changed, 635 insertions(+)
>  create mode 100644 meta/recipes-devtools/rpm/files/0001-Split-binary-package-building-into-a-separate-functi.patch
>  create mode 100644 meta/recipes-devtools/rpm/files/0002-Run-binary-package-creation-via-thread-pools.patch
>  create mode 100644 meta/recipes-devtools/rpm/files/0003-rpmstrpool.c-make-operations-over-string-pools-threa.patch
>  create mode 100644 meta/recipes-devtools/rpm/files/0004-build-pack.c-remove-static-local-variables-from-buil.patch
> 
> diff --git a/meta/recipes-devtools/rpm/files/0001-Split-binary-package-building-into-a-separate-functi.patch b/meta/recipes-devtools/rpm/files/0001-Split-binary-package-building-into-a-separate-functi.patch
> new file mode 100644
> index 00000000000..3d8b12144e7
> --- /dev/null
> +++ b/meta/recipes-devtools/rpm/files/0001-Split-binary-package-building-into-a-separate-functi.patch
> @@ -0,0 +1,83 @@
> +From b841b699e519438a66b661247c94efff63d0700e Mon Sep 17 00:00:00 2001
> +From: Alexander Kanavin <alex.kanavin@gmail.com>
> +Date: Thu, 25 May 2017 18:15:27 +0300
> +Subject: [PATCH 01/14] Split binary package building into a separate function
> +
> +So that it can be run as a thread pool task.
> +
> +Upstream-Status: Submitted [https://github.com/rpm-software-management/rpm/pull/226]
> +Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com>
> +---
> + build/pack.c | 33 +++++++++++++++++++++------------
> + 1 file changed, 21 insertions(+), 12 deletions(-)
> +
> +diff --git a/build/pack.c b/build/pack.c
> +index 497300b96..891e6bdc3 100644
> +--- a/build/pack.c
> ++++ b/build/pack.c
> +@@ -546,18 +546,13 @@ static rpmRC checkPackages(char *pkgcheck)
> +     return RPMRC_OK;
> + }
> + 
> +-rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
> ++static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int cheating, char** filename)
> + {
> +-    rpmRC rc;
> +-    const char *errorString;
> +-    Package pkg;
> +-    char *pkglist = NULL;
> +-
> +-    for (pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
> +-	char *fn;
> ++        const char *errorString;
> ++        rpmRC rc = RPMRC_OK;

The above change appears to be changing the spacing of the code.  RPM is
somewhat unique in how it does spacing.

Generally it's:

four spaces
   tab
       tab + four spaces
           tab + tab
repeat....

(Note there are a lot of inconsistencies here, so I'd say follow what the
function itself is using, as much as you can.)

> + 
> + 	if (pkg->fileList == NULL)
> +-	    continue;
> ++	    return rc;
> + 
> + 	if ((rc = processScriptFiles(spec, pkg)))
> + 	    return rc;
> +@@ -591,7 +586,7 @@ rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
> + 		     headerGetString(pkg->header, RPMTAG_NAME), errorString);
> + 		return RPMRC_FAIL;
> + 	    }
> +-	    fn = rpmGetPath("%{_rpmdir}/", binRpm, NULL);
> ++	    *filename = rpmGetPath("%{_rpmdir}/", binRpm, NULL);

I'm not sure if it would be a cleaner change if you did (top of function):

char *fn = *filename;

and then kept this code as it is.  Might be up to the RPM community for that
type of comment.

> + 	    if ((binDir = strchr(binRpm, '/')) != NULL) {
> + 		struct stat st;
> + 		char *dn;
> +@@ -613,14 +608,28 @@ rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
> + 	    free(binRpm);
> + 	}
> + 
> +-	rc = writeRPM(pkg, NULL, fn, NULL);
> ++	rc = writeRPM(pkg, NULL, *filename, NULL);
> + 	if (rc == RPMRC_OK) {
> + 	    /* Do check each written package if enabled */
> +-	    char *pkgcheck = rpmExpand("%{?_build_pkgcheck} ", fn, NULL);
> ++	    char *pkgcheck = rpmExpand("%{?_build_pkgcheck} ", *filename, NULL);
> + 	    if (pkgcheck[0] != ' ') {
> + 		rc = checkPackages(pkgcheck);
> + 	    }
> + 	    free(pkgcheck);
> ++	}
> ++        return rc;
> ++}
> ++
> ++rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
> ++{
> ++    rpmRC rc;
> ++    Package pkg;
> ++    char *pkglist = NULL;
> ++
> ++    for (pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
> ++	char *fn = NULL;
> ++        rc = packageBinary(spec, pkg, cookie, cheating, &fn);
> ++	if (rc == RPMRC_OK) {

tab/spacing above is inconsistent.

> + 	    rstrcat(&pkglist, fn);
> + 	    rstrcat(&pkglist, " ");
> + 	}
> +-- 
> +2.11.0
> +
> diff --git a/meta/recipes-devtools/rpm/files/0002-Run-binary-package-creation-via-thread-pools.patch b/meta/recipes-devtools/rpm/files/0002-Run-binary-package-creation-via-thread-pools.patch
> new file mode 100644
> index 00000000000..549148930fa
> --- /dev/null
> +++ b/meta/recipes-devtools/rpm/files/0002-Run-binary-package-creation-via-thread-pools.patch
> @@ -0,0 +1,125 @@
> +From f8a05339a1e7b307e8de8c858e6e963782fc5925 Mon Sep 17 00:00:00 2001
> +From: Alexander Kanavin <alex.kanavin@gmail.com>
> +Date: Thu, 25 May 2017 19:30:20 +0300
> +Subject: [PATCH 1/3] Run binary package creation via thread pools.
> +
> +Upstream-Status: Submitted [https://github.com/rpm-software-management/rpm/pull/226]
> +Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com>
> +---
> + build/pack.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
> + configure.ac |  3 +++
> + 2 files changed, 69 insertions(+), 14 deletions(-)
> +
> +diff --git a/build/pack.c b/build/pack.c
> +index 5898a491b..f85d8dc90 100644
> +--- a/build/pack.c
> ++++ b/build/pack.c
> +@@ -616,25 +616,77 @@ static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int ch
> +         return rc;
> + }
> + 
> +-rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
> ++struct binaryPackageTaskData
> + {
> +-    rpmRC rc;
> +     Package pkg;
> ++    char *filename;
> ++    rpmRC result;
> ++    struct binaryPackageTaskData *next;
> ++};
> ++
> ++static struct binaryPackageTaskData* runBinaryPackageTasks(rpmSpec spec, const char *cookie, int cheating)
> ++{
> ++    struct binaryPackageTaskData *tasks = NULL;
> ++    struct binaryPackageTaskData *task = NULL;
> ++    struct binaryPackageTaskData *prev = NULL;
> ++
> ++    for (Package pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
> ++        task = rcalloc(1, sizeof(*task));
> ++        task->pkg = pkg;
> ++        if (pkg == spec->packages) {
> ++            // the first package needs to be processed ahead of others, as they copy
> ++            // changelog data from it, and so otherwise data races would happen
> ++            task->result = packageBinary(spec, pkg, cookie, cheating, &(task->filename));
> ++            rpmlog(RPMLOG_NOTICE, _("Finished binary package job, result %d, filename %s\n"), task->result, task->filename);
> ++            tasks = task;
> ++        }
> ++        if (prev != NULL) {
> ++            prev->next = task;
> ++        }
> ++        prev = task;
> ++    }
> ++
> ++    #pragma omp parallel
> ++    #pragma omp single
> ++    for (task = tasks; task != NULL; task = task->next) {
> ++        if (task != tasks)
> ++        #pragma omp task
> ++        {
> ++            task->result = packageBinary(spec, task->pkg, cookie, cheating, &(task->filename));
> ++            rpmlog(RPMLOG_NOTICE, _("Finished binary package job, result %d, filename %s\n"), task->result, task->filename);
> ++        }
> ++    }
> ++
> ++    return tasks;
> ++}
> ++
> ++static void freeBinaryPackageTasks(struct binaryPackageTaskData* tasks)
> ++{
> ++    while (tasks != NULL) {
> ++        struct binaryPackageTaskData* next = tasks->next;
> ++        rfree(tasks->filename);
> ++        rfree(tasks);
> ++        tasks = next;
> ++    }
> ++}
> ++
> ++rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
> ++{
> +     char *pkglist = NULL;
> + 
> +-    for (pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
> +-	char *fn = NULL;
> +-        rc = packageBinary(spec, pkg, cookie, cheating, &fn);
> +-	if (rc == RPMRC_OK) {
> +-	    rstrcat(&pkglist, fn);
> +-	    rstrcat(&pkglist, " ");
> +-	}
> +-	free(fn);
> +-	if (rc != RPMRC_OK) {
> +-	    pkglist = _free(pkglist);
> +-	    return rc;
> +-	}
> ++    struct binaryPackageTaskData *tasks = runBinaryPackageTasks(spec, cookie, cheating);
> ++
> ++    for (struct binaryPackageTaskData *task = tasks; task != NULL; task = task->next) {
> ++        if (task->result == RPMRC_OK) {
> ++            rstrcat(&pkglist, task->filename);
> ++            rstrcat(&pkglist, " ");
> ++        } else {
> ++            _free(pkglist);
> ++            freeBinaryPackageTasks(tasks);
> ++            return RPMRC_FAIL;
> ++        }
> +     }
> ++    freeBinaryPackageTasks(tasks);
> + 
> +     /* Now check the package set if enabled */
> +     if (pkglist != NULL) {
> +diff --git a/configure.ac b/configure.ac
> +index a506ec819..59fa0acaf 100644
> +--- a/configure.ac
> ++++ b/configure.ac
> +@@ -17,6 +17,9 @@ AC_DISABLE_STATIC
> + 
> + PKG_PROG_PKG_CONFIG
> + 
> ++AC_OPENMP
> ++RPMCFLAGS="$OPENMP_CFLAGS $RPMCFLAGS"
> ++
> + dnl Checks for programs.
> + AC_PROG_CXX
> + AC_PROG_AWK
> +-- 
> +2.11.0
> +
> diff --git a/meta/recipes-devtools/rpm/files/0003-rpmstrpool.c-make-operations-over-string-pools-threa.patch b/meta/recipes-devtools/rpm/files/0003-rpmstrpool.c-make-operations-over-string-pools-threa.patch
> new file mode 100644
> index 00000000000..cbfa44e268f
> --- /dev/null
> +++ b/meta/recipes-devtools/rpm/files/0003-rpmstrpool.c-make-operations-over-string-pools-threa.patch
> @@ -0,0 +1,207 @@
> +From 92a130c944ce79d082b94e42ea5a5edb2ac2237b Mon Sep 17 00:00:00 2001
> +From: Alexander Kanavin <alex.kanavin@gmail.com>
> +Date: Tue, 30 May 2017 13:58:30 +0300
> +Subject: [PATCH 2/3] rpmstrpool.c: make operations over string pools
> + thread-safe
> +
> +Otherwise multithreaded rpm building explodes in various ways due
> +to data races.
> +
> +Upstream-Status: Submitted [https://github.com/rpm-software-management/rpm/pull/226]
> +Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com>
> +
> +---
> + rpmio/rpmstrpool.c | 56 +++++++++++++++++++++++++++++++++++++++++++++---------
> + 1 file changed, 47 insertions(+), 9 deletions(-)
> +
> +diff --git a/rpmio/rpmstrpool.c b/rpmio/rpmstrpool.c
> +index 30a57eb10..58ba95a02 100644
> +--- a/rpmio/rpmstrpool.c
> ++++ b/rpmio/rpmstrpool.c
> +@@ -113,6 +113,8 @@ static poolHash poolHashCreate(int numBuckets)
> +     return ht;
> + }
> + 
> ++static const char * rpmstrPoolStrNoLock(rpmstrPool pool, rpmsid sid);
> ++
> + static void poolHashResize(rpmstrPool pool, int numBuckets)
> + {
> +     poolHash ht = pool->hash;
> +@@ -120,7 +122,7 @@ static void poolHashResize(rpmstrPool pool, int numBuckets)
> + 
> +     for (int i=0; i<ht->numBuckets; i++) {
> +         if (!ht->buckets[i].keyid) continue;
> +-        unsigned int keyHash = rstrhash(rpmstrPoolStr(pool, ht->buckets[i].keyid));
> ++        unsigned int keyHash = rstrhash(rpmstrPoolStrNoLock(pool, ht->buckets[i].keyid));
> +         for (unsigned int j=0;;j++) {
> +             unsigned int hash = hashbucket(keyHash, j) % numBuckets;
> +             if (!buckets[hash].keyid) {
> +@@ -149,7 +151,7 @@ static void poolHashAddHEntry(rpmstrPool pool, const char * key, unsigned int ke
> +             ht->buckets[hash].keyid = keyid;
> +             ht->keyCount++;
> +             break;
> +-        } else if (!strcmp(rpmstrPoolStr(pool, ht->buckets[hash].keyid), key)) {
> ++        } else if (!strcmp(rpmstrPoolStrNoLock(pool, ht->buckets[hash].keyid), key)) {
> +             return;
> +         }
> +     }
> +@@ -191,7 +193,7 @@ static void poolHashPrintStats(rpmstrPool pool)
> +     int maxcollisions = 0;
> + 
> +     for (i=0; i<ht->numBuckets; i++) {
> +-        unsigned int keyHash = rstrhash(rpmstrPoolStr(pool, ht->buckets[i].keyid));
> ++        unsigned int keyHash = rstrhash(rpmstrPoolStrNoLock(pool, ht->buckets[i].keyid));
> +         for (unsigned int j=0;;j++) {
> +             unsigned int hash = hashbucket(keyHash, i) % ht->numBuckets;
> +             if (hash==i) {
> +@@ -221,7 +223,7 @@ static void rpmstrPoolRehash(rpmstrPool pool)
> + 
> +     pool->hash = poolHashCreate(sizehint);
> +     for (int i = 1; i <= pool->offs_size; i++)
> +-	poolHashAddEntry(pool, rpmstrPoolStr(pool, i), i);
> ++	poolHashAddEntry(pool, rpmstrPoolStrNoLock(pool, i), i);
> + }
> + 
> + rpmstrPool rpmstrPoolCreate(void)
> +@@ -245,6 +247,8 @@ rpmstrPool rpmstrPoolCreate(void)
> + 
> + rpmstrPool rpmstrPoolFree(rpmstrPool pool)
> + {
> ++    #pragma omp critical(rpmstrpool)
> ++    {
> +     if (pool) {
> + 	if (pool->nrefs > 1) {
> + 	    pool->nrefs--;
> +@@ -260,18 +264,24 @@ rpmstrPool rpmstrPoolFree(rpmstrPool pool)
> + 	    free(pool);
> + 	}
> +     }
> ++    }
> +     return NULL;
> + }
> + 
> + rpmstrPool rpmstrPoolLink(rpmstrPool pool)
> + {
> ++    #pragma omp critical(rpmstrpool)
> ++    {
> +     if (pool)
> + 	pool->nrefs++;
> ++    }
> +     return pool;
> + }
> + 
> + void rpmstrPoolFreeze(rpmstrPool pool, int keephash)
> + {
> ++    #pragma omp critical(rpmstrpool)
> ++    {
> +     if (pool && !pool->frozen) {
> + 	if (!keephash) {
> + 	    pool->hash = poolHashFree(pool->hash);
> +@@ -281,16 +291,20 @@ void rpmstrPoolFreeze(rpmstrPool pool, int keephash)
> + 			      pool->offs_alloced * sizeof(*pool->offs));
> + 	pool->frozen = 1;
> +     }
> ++    }
> + }
> + 
> + void rpmstrPoolUnfreeze(rpmstrPool pool)
> + {
> ++    #pragma omp critical(rpmstrpool)
> ++    {
> +     if (pool) {
> + 	if (pool->hash == NULL) {
> + 	    rpmstrPoolRehash(pool);
> + 	}
> + 	pool->frozen = 0;
> +     }
> ++    }
> + }
> + 
> + static rpmsid rpmstrPoolPut(rpmstrPool pool, const char *s, size_t slen, unsigned int hash)
> +@@ -350,7 +364,7 @@ static rpmsid rpmstrPoolGet(rpmstrPool pool, const char * key, size_t keylen,
> +             return 0;
> +         }
> + 
> +-	s = rpmstrPoolStr(pool, ht->buckets[hash].keyid);
> ++	s = rpmstrPoolStrNoLock(pool, ht->buckets[hash].keyid);
> + 	/* pool string could be longer than keylen, require exact matche */
> + 	if (strncmp(s, key, keylen) == 0 && s[keylen] == '\0')
> + 	    return ht->buckets[hash].keyid;
> +@@ -373,27 +387,31 @@ static inline rpmsid strn2id(rpmstrPool pool, const char *s, size_t slen,
> + rpmsid rpmstrPoolIdn(rpmstrPool pool, const char *s, size_t slen, int create)
> + {
> +     rpmsid sid = 0;
> +-
> ++    #pragma omp critical(rpmstrpool)
> ++    {
> +     if (s != NULL) {
> + 	unsigned int hash = rstrnhash(s, slen);
> + 	sid = strn2id(pool, s, slen, hash, create);
> +     }
> ++    }
> +     return sid;
> + }
> + 
> + rpmsid rpmstrPoolId(rpmstrPool pool, const char *s, int create)
> + {
> +     rpmsid sid = 0;
> +-
> ++    #pragma omp critical(rpmstrpool)
> ++    {
> +     if (s != NULL) {
> + 	size_t slen;
> + 	unsigned int hash = rstrlenhash(s, &slen);
> + 	sid = strn2id(pool, s, slen, hash, create);
> +     }
> ++    }
> +     return sid;
> + }
> + 
> +-const char * rpmstrPoolStr(rpmstrPool pool, rpmsid sid)
> ++static const char * rpmstrPoolStrNoLock(rpmstrPool pool, rpmsid sid)
> + {
> +     const char *s = NULL;
> +     if (pool && sid > 0 && sid <= pool->offs_size)
> +@@ -401,12 +419,25 @@ const char * rpmstrPoolStr(rpmstrPool pool, rpmsid sid)
> +     return s;
> + }
> + 
> ++const char * rpmstrPoolStr(rpmstrPool pool, rpmsid sid)
> ++{
> ++    const char *s = NULL;
> ++    #pragma omp critical(rpmstrpool)
> ++    {
> ++    s = rpmstrPoolStrNoLock(pool, sid);
> ++    }
> ++    return s;
> ++}
> ++
> + size_t rpmstrPoolStrlen(rpmstrPool pool, rpmsid sid)
> + {
> +     size_t slen = 0;
> ++    #pragma omp critical(rpmstrpool)
> ++    {
> +     if (pool && sid > 0 && sid <= pool->offs_size) {
> + 	slen = strlen(pool->offs[sid]);
> +     }
> ++    }
> +     return slen;
> + }
> + 
> +@@ -421,5 +452,12 @@ int rpmstrPoolStreq(rpmstrPool poolA, rpmsid sidA,
> + 
> + rpmsid rpmstrPoolNumStr(rpmstrPool pool)
> + {
> +-    return (pool != NULL) ? pool->offs_size : 0;
> ++    rpmsid id = 0;
> ++    #pragma omp critical(rpmstrpool)
> ++    {
> ++    if (pool) {
> ++	id = pool->offs_size;
> ++    }
> ++    }
> ++    return id;
> + }
> +-- 
> +2.11.0
> +
> diff --git a/meta/recipes-devtools/rpm/files/0004-build-pack.c-remove-static-local-variables-from-buil.patch b/meta/recipes-devtools/rpm/files/0004-build-pack.c-remove-static-local-variables-from-buil.patch
> new file mode 100644
> index 00000000000..9d43c3767b1
> --- /dev/null
> +++ b/meta/recipes-devtools/rpm/files/0004-build-pack.c-remove-static-local-variables-from-buil.patch
> @@ -0,0 +1,216 @@
> +From 869d60f6cf68c2c83cc799f051c064e762ae9814 Mon Sep 17 00:00:00 2001
> +From: Alexander Kanavin <alex.kanavin@gmail.com>
> +Date: Thu, 8 Jun 2017 17:08:09 +0300
> +Subject: [PATCH 3/3] build/pack.c: remove static local variables from
> + buildHost() and getBuildTime()
> +
> +Their use is causing difficult to diagnoze data races when building multiple
> +packages in parallel, and is a bad idea in general, as it also makes it more
> +difficult to reason about code.
> +
> +Upstream-Status: Submitted [https://github.com/rpm-software-management/rpm/pull/226]
> +Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com>
> +
> +---
> + build/pack.c | 92 ++++++++++++++++++++++++++++++------------------------------
> + 1 file changed, 46 insertions(+), 46 deletions(-)
> +
> +diff --git a/build/pack.c b/build/pack.c
> +index f85d8dc90..12b1ceb76 100644
> +--- a/build/pack.c
> ++++ b/build/pack.c
> +@@ -151,54 +151,47 @@ exit:
> +     return rc;
> + }
> + 
> +-static rpm_time_t * getBuildTime(void)
> ++static rpm_time_t getBuildTime(void)
> + {
> +-    static rpm_time_t buildTime[1];
> ++    rpm_time_t buildTime = 0;
> +     char *srcdate;
> +     time_t epoch;
> +     char *endptr;
> + 
> +-    if (buildTime[0] == 0) {
> +-        srcdate = getenv("SOURCE_DATE_EPOCH");
> +-        if (srcdate) {
> +-            errno = 0;
> +-            epoch = strtol(srcdate, &endptr, 10);
> +-            if (srcdate == endptr || *endptr || errno != 0)
> +-                rpmlog(RPMLOG_ERR, _("unable to parse SOURCE_DATE_EPOCH\n"));
> +-            else
> +-                buildTime[0] = (int32_t) epoch;
> +-        } else
> +-            buildTime[0] = (int32_t) time(NULL);
> +-    }
> ++    srcdate = getenv("SOURCE_DATE_EPOCH");
> ++    if (srcdate) {
> ++        errno = 0;
> ++        epoch = strtol(srcdate, &endptr, 10);
> ++        if (srcdate == endptr || *endptr || errno != 0)
> ++            rpmlog(RPMLOG_ERR, _("unable to parse SOURCE_DATE_EPOCH\n"));
> ++        else
> ++            buildTime = (int32_t) epoch;
> ++    } else
> ++        buildTime = (int32_t) time(NULL);

I think there is a concern here that the getBuildTime() function should capture
the time once and then re-use it for the life of that specific RPM session.  If
the getBuildTime is being called by a thread, I can see why this may not work.
Moving the initial call (and thus setting of the static value earlier would be
desirable.)

> + 
> +     return buildTime;
> + }
> + 
> +-static const char * buildHost(void)
> ++static char * buildHost(void)
> + {
> +-    static char hostname[1024];
> +-    static int oneshot = 0;
> ++    char* hostname;

The same for the buildHost.  Canonicalizing it each time and other things have
caused problems in the past.  So doing it once and caching the result is likely
the right answer.  Doing it earlier enough to cache the result BEFORE threading
is advisable.

> +     struct hostent *hbn;
> +     char *bhMacro;
> + 
> +-    if (! oneshot) {
> +-        bhMacro = rpmExpand("%{?_buildhost}", NULL);
> +-        if (strcmp(bhMacro, "") != 0 && strlen(bhMacro) < 1024) {
> +-            strcpy(hostname, bhMacro);
> +-        } else {
> +-            if (strcmp(bhMacro, "") != 0)
> +-                rpmlog(RPMLOG_WARNING, _("The _buildhost macro is too long\n"));
> +-            (void) gethostname(hostname, sizeof(hostname));
> +-            hbn = gethostbyname(hostname);
> +-            if (hbn)
> +-                strcpy(hostname, hbn->h_name);
> +-            else
> +-                rpmlog(RPMLOG_WARNING,
> +-                        _("Could not canonicalize hostname: %s\n"), hostname);
> +-        }
> +-        free(bhMacro);
> +-        oneshot = 1;
> +-    }
> ++    bhMacro = rpmExpand("%{?_buildhost}", NULL);
> ++    if (strcmp(bhMacro, "") != 0) {
> ++        rasprintf(&hostname, "%s", bhMacro);
> ++    } else {
> ++        hostname = rcalloc(1024, sizeof(*hostname));
> ++        (void) gethostname(hostname, 1024);
> ++        hbn = gethostbyname(hostname);
> ++        if (hbn)
> ++            strcpy(hostname, hbn->h_name);
> ++        else
> ++            rpmlog(RPMLOG_WARNING,
> ++                    _("Could not canonicalize hostname: %s\n"), hostname);
> ++    }
> ++    free(bhMacro);
> +     return(hostname);
> + }
> + 
> +@@ -308,7 +301,8 @@ static int haveRichDep(Package pkg)
> + }
> + 
> + static rpmRC writeRPM(Package pkg, unsigned char ** pkgidp,
> +-		      const char *fileName, char **cookie)
> ++		      const char *fileName, char **cookie,
> ++		      rpm_time_t buildTime, const char* buildHost)
> + {
> +     FD_t fd = NULL;
> +     char * rpmio_flags = NULL;
> +@@ -397,7 +391,7 @@ static rpmRC writeRPM(Package pkg, unsigned char ** pkgidp,
> + 
> +     /* Create and add the cookie */
> +     if (cookie) {
> +-	rasprintf(cookie, "%s %d", buildHost(), (int) (*getBuildTime()));
> ++	rasprintf(cookie, "%s %d", buildHost, buildTime);
> + 	headerPutString(pkg->header, RPMTAG_COOKIE, *cookie);
> +     }
> +     
> +@@ -546,7 +540,7 @@ static rpmRC checkPackages(char *pkgcheck)
> +     return RPMRC_OK;
> + }
> + 
> +-static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int cheating, char** filename)
> ++static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int cheating, char** filename, rpm_time_t buildTime, const char* buildHost)
> + {
> +         const char *errorString;
> +         rpmRC rc = RPMRC_OK;
> +@@ -565,8 +559,8 @@ static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int ch
> + 	headerCopyTags(spec->packages->header, pkg->header, copyTags);
> + 	
> + 	headerPutString(pkg->header, RPMTAG_RPMVERSION, VERSION);
> +-	headerPutString(pkg->header, RPMTAG_BUILDHOST, buildHost());
> +-	headerPutUint32(pkg->header, RPMTAG_BUILDTIME, getBuildTime(), 1);
> ++	headerPutString(pkg->header, RPMTAG_BUILDHOST, buildHost);
> ++	headerPutUint32(pkg->header, RPMTAG_BUILDTIME, &buildTime, 1);

I don't see what the advantage of this change over the current approach is --
unless the first call is in a thread so the variable can clash.  If that is the
case, it may make more sense to ensure that these two are run very early in the
build process before it threads.

> + 
> + 	if (spec->sourcePkgId != NULL) {
> + 	    headerPutBin(pkg->header, RPMTAG_SOURCEPKGID, spec->sourcePkgId,16);
> +@@ -604,7 +598,7 @@ static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int ch
> + 	    free(binRpm);
> + 	}
> + 
> +-	rc = writeRPM(pkg, NULL, *filename, NULL);
> ++	rc = writeRPM(pkg, NULL, *filename, NULL, buildTime, buildHost);
> + 	if (rc == RPMRC_OK) {
> + 	    /* Do check each written package if enabled */
> + 	    char *pkgcheck = rpmExpand("%{?_build_pkgcheck} ", *filename, NULL);
> +@@ -629,6 +623,8 @@ static struct binaryPackageTaskData* runBinaryPackageTasks(rpmSpec spec, const c
> +     struct binaryPackageTaskData *tasks = NULL;
> +     struct binaryPackageTaskData *task = NULL;
> +     struct binaryPackageTaskData *prev = NULL;
> ++    rpm_time_t buildTime = getBuildTime();
> ++    char *host = buildHost();
> + 
> +     for (Package pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
> +         task = rcalloc(1, sizeof(*task));
> +@@ -636,7 +632,7 @@ static struct binaryPackageTaskData* runBinaryPackageTasks(rpmSpec spec, const c
> +         if (pkg == spec->packages) {
> +             // the first package needs to be processed ahead of others, as they copy
> +             // changelog data from it, and so otherwise data races would happen
> +-            task->result = packageBinary(spec, pkg, cookie, cheating, &(task->filename));
> ++            task->result = packageBinary(spec, pkg, cookie, cheating, &(task->filename), buildTime, buildHost);
> +             rpmlog(RPMLOG_NOTICE, _("Finished binary package job, result %d, filename %s\n"), task->result, task->filename);
> +             tasks = task;
> +         }
> +@@ -652,11 +648,12 @@ static struct binaryPackageTaskData* runBinaryPackageTasks(rpmSpec spec, const c
> +         if (task != tasks)
> +         #pragma omp task
> +         {
> +-            task->result = packageBinary(spec, task->pkg, cookie, cheating, &(task->filename));
> ++            task->result = packageBinary(spec, task->pkg, cookie, cheating, &(task->filename), buildTime, buildHost);
> +             rpmlog(RPMLOG_NOTICE, _("Finished binary package job, result %d, filename %s\n"), task->result, task->filename);
> +         }
> +     }
> + 
> ++    free(host);
> +     return tasks;
> + }
> + 
> +@@ -707,16 +704,18 @@ rpmRC packageSources(rpmSpec spec, char **cookie)
> +     rpmRC rc;
> + 
> +     /* Add some cruft */
> ++    rpm_time_t buildTime = getBuildTime();
> ++    char* host = buildHost();
> +     headerPutString(sourcePkg->header, RPMTAG_RPMVERSION, VERSION);
> +-    headerPutString(sourcePkg->header, RPMTAG_BUILDHOST, buildHost());
> +-    headerPutUint32(sourcePkg->header, RPMTAG_BUILDTIME, getBuildTime(), 1);
> ++    headerPutString(sourcePkg->header, RPMTAG_BUILDHOST, (const char*)buildHost);
> ++    headerPutUint32(sourcePkg->header, RPMTAG_BUILDTIME, &buildTime, 1);
> + 
> +     /* XXX this should be %_srpmdir */
> +     {	char *fn = rpmGetPath("%{_srcrpmdir}/", spec->sourceRpmName,NULL);
> + 	char *pkgcheck = rpmExpand("%{?_build_pkgcheck_srpm} ", fn, NULL);
> + 
> + 	spec->sourcePkgId = NULL;
> +-	rc = writeRPM(sourcePkg, &spec->sourcePkgId, fn, cookie);
> ++	rc = writeRPM(sourcePkg, &spec->sourcePkgId, fn, cookie, buildTime, host);
> + 
> + 	/* Do check SRPM package if enabled */
> + 	if (rc == RPMRC_OK && pkgcheck[0] != ' ') {
> +@@ -726,5 +725,6 @@ rpmRC packageSources(rpmSpec spec, char **cookie)
> + 	free(pkgcheck);
> + 	free(fn);
> +     }
> ++    free(host);
> +     return rc;
> + }
> +-- 
> +2.11.0
> +
> diff --git a/meta/recipes-devtools/rpm/rpm_git.bb b/meta/recipes-devtools/rpm/rpm_git.bb
> index 2310ee6b09e..b95b4719c19 100644
> --- a/meta/recipes-devtools/rpm/rpm_git.bb
> +++ b/meta/recipes-devtools/rpm/rpm_git.bb
> @@ -35,6 +35,10 @@ SRC_URI = "git://github.com/rpm-software-management/rpm \
>             file://0001-Fix-build-with-musl-C-library.patch \
>             file://0001-Add-a-color-setting-for-mips64_n32-binaries.patch \
>             file://0001-Add-PYTHON_ABI-when-searching-for-python-libraries.patch \
> +           file://0001-Split-binary-package-building-into-a-separate-functi.patch \
> +           file://0002-Run-binary-package-creation-via-thread-pools.patch \
> +           file://0003-rpmstrpool.c-make-operations-over-string-pools-threa.patch \
> +           file://0004-build-pack.c-remove-static-local-variables-from-buil.patch \
>             "
>  
>  PV = "4.13.90+git${SRCPV}"
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 2/2] rpm: run binary package generation via thread pools
  2017-06-08 16:55   ` Mark Hatle
@ 2017-06-08 17:23     ` Alexander Kanavin
  2017-06-08 20:29       ` Mark Hatle
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Kanavin @ 2017-06-08 17:23 UTC (permalink / raw)
  To: Mark Hatle, openembedded-core

On 06/08/2017 07:55 PM, Mark Hatle wrote:
>> +@@ -565,8 +559,8 @@ static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int ch
>> + 	headerCopyTags(spec->packages->header, pkg->header, copyTags);
>> + 	
>> + 	headerPutString(pkg->header, RPMTAG_RPMVERSION, VERSION);
>> +-	headerPutString(pkg->header, RPMTAG_BUILDHOST, buildHost());
>> +-	headerPutUint32(pkg->header, RPMTAG_BUILDTIME, getBuildTime(), 1);
>> ++	headerPutString(pkg->header, RPMTAG_BUILDHOST, buildHost);
>> ++	headerPutUint32(pkg->header, RPMTAG_BUILDTIME, &buildTime, 1);
>
> I don't see what the advantage of this change over the current approach is --
> unless the first call is in a thread so the variable can clash.  If that is the
> case, it may make more sense to ensure that these two are run very early in the
> build process before it threads.

They're run from threads, and they do clash, and I had spent a whole day 
tracing data corruption coredumps back to these two.

This is explained in the commit message:

"Their use is causing difficult to diagnoze data races when building 
multiple packages in parallel..."

And I had changed the code so that they're run early in the build 
process, and just once. See below...


>> +@@ -629,6 +623,8 @@ static struct binaryPackageTaskData* runBinaryPackageTasks(rpmSpec spec, const c
>> +     struct binaryPackageTaskData *tasks = NULL;
>> +     struct binaryPackageTaskData *task = NULL;
>> +     struct binaryPackageTaskData *prev = NULL;
>> ++    rpm_time_t buildTime = getBuildTime();
>> ++    char *host = buildHost();
>> +
>> +     for (Package pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
>> +         task = rcalloc(1, sizeof(*task));


^^^^ right there.

Please read the code more carefully :-)


Alex


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 2/2] rpm: run binary package generation via thread pools
  2017-06-08 17:23     ` Alexander Kanavin
@ 2017-06-08 20:29       ` Mark Hatle
  2017-06-09  9:02         ` Alexander Kanavin
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Hatle @ 2017-06-08 20:29 UTC (permalink / raw)
  To: Alexander Kanavin, openembedded-core

On 6/8/17 12:23 PM, Alexander Kanavin wrote:
> On 06/08/2017 07:55 PM, Mark Hatle wrote:
>>> +@@ -565,8 +559,8 @@ static rpmRC packageBinary(rpmSpec spec, Package pkg, const char *cookie, int ch
>>> + 	headerCopyTags(spec->packages->header, pkg->header, copyTags);
>>> + 	
>>> + 	headerPutString(pkg->header, RPMTAG_RPMVERSION, VERSION);
>>> +-	headerPutString(pkg->header, RPMTAG_BUILDHOST, buildHost());
>>> +-	headerPutUint32(pkg->header, RPMTAG_BUILDTIME, getBuildTime(), 1);
>>> ++	headerPutString(pkg->header, RPMTAG_BUILDHOST, buildHost);
>>> ++	headerPutUint32(pkg->header, RPMTAG_BUILDTIME, &buildTime, 1);
>>
>> I don't see what the advantage of this change over the current approach is --
>> unless the first call is in a thread so the variable can clash.  If that is the
>> case, it may make more sense to ensure that these two are run very early in the
>> build process before it threads.
> 
> They're run from threads, and they do clash, and I had spent a whole day 
> tracing data corruption coredumps back to these two.
> 
> This is explained in the commit message:
> 
> "Their use is causing difficult to diagnoze data races when building 
> multiple packages in parallel..."
> 
> And I had changed the code so that they're run early in the build 
> process, and just once. See below...

Yes, but you were no longer calling them and moved to passing in a variable.
The point of my comment is two fold.

One, by changing the behavior of the function you now HAVE to record the value
or if you run it again you will get a different result.  (Likely not what is
desired in the general case...)

..and two you also had to change the function calls to pass in this single value.

Assuming that moving the call earlier (like you did) works, and the static
values are retrievable by the function.  You could have fixed this with the
snipped below (without storing the values).  This would simplify the patch.

(If there is something in the way the threads work with openmp that makes the
static variable not transfer from the main execution thread to the children
threads -- that is a different issue and should be stated in the commit.)

> 
>>> +@@ -629,6 +623,8 @@ static struct binaryPackageTaskData* runBinaryPackageTasks(rpmSpec spec, const c
>>> +     struct binaryPackageTaskData *tasks = NULL;
>>> +     struct binaryPackageTaskData *task = NULL;
>>> +     struct binaryPackageTaskData *prev = NULL;
>>> ++    rpm_time_t buildTime = getBuildTime();
>>> ++    char *host = buildHost();
>>> +
>>> +     for (Package pkg = spec->packages; pkg != NULL; pkg = pkg->next) {
>>> +         task = rcalloc(1, sizeof(*task));
> 
> 
> ^^^^ right there.
> 
> Please read the code more carefully :-)
> 
> 
> Alex
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 0/2] Multi-threaded RPM support
  2017-06-08 14:42 [PATCHv2 0/2] Multi-threaded RPM support Alexander Kanavin
  2017-06-08 14:42 ` [PATCHv2 1/2] package_rpm.bbclass: use multithreaded xz compression Alexander Kanavin
  2017-06-08 14:42 ` [PATCHv2 2/2] rpm: run binary package generation via thread pools Alexander Kanavin
@ 2017-06-08 21:15 ` Leonardo Sandoval
  2017-06-09  9:13   ` Alexander Kanavin
  2 siblings, 1 reply; 10+ messages in thread
From: Leonardo Sandoval @ 2017-06-08 21:15 UTC (permalink / raw)
  To: Alexander Kanavin; +Cc: openembedded-core

On Thu, 2017-06-08 at 17:42 +0300, Alexander Kanavin wrote:
> Changes in version 2: use openmp for multi-processing instead of NSPR library, as
> requested by upstream. Add a link to the upstream submission to patches.
> 
> These two patches add multi-threaded features to RPM to speed up the do_package_write_rpm task.
> 
> Specifically:
> 
> 1) Multi-threaded xz compressor is used instead of default single-threaded gz. This has the most
> dramatic effect when a recipe produces a smaller number of large-sized packages.
> 
> 2) Packages creation is run in multiple threads via thread pools. This is most beneficial when
> a recipe produces a large amount of small packages.
> 
> Some not very scientific benchmarks for time and .rpm sizes
> (time is measured for do_package_write_rpm tasks only):
> 

Tried v2 again and I am getting this problem (segmentation faults):

http://errors.yoctoproject.org/Errors/Build/38812/


The idea I have is to get some numbers from buildstats and have better
insight of the impact of this series. BTW, I tried core-image-sato.



> webkitgtk
> before: 9m12s 1550M
> after: 1m40s 858M
> 
> glibc-locale
> before: 2m52s 125M
> after: 30s 56M
> 
> glibc
> before: 46s 54M
> after: 13s 38M
> 
> perl
> before: 1m09s 63M
> after: 45s 42M
> 
> python3
> before: 30s 38M
> after: 18s 24M
> 
> The following changes since commit 576821ea0a7558b626ccc87e9ae0e9ee40864956:
> 
>   bitbake: bitbake-layers: check layer dependencies before adding (2017-06-06 19:52:51 +0100)
> 
> are available in the git repository at:
> 
>   git://git.yoctoproject.org/poky-contrib akanavin/parallel-rpm
>   http://git.yoctoproject.org/cgit.cgi/poky-contrib/log/?h=akanavin/parallel-rpm
> 
> Alexander Kanavin (2):
>   package_rpm.bbclass: use multithreaded xz compression
>   rpm: run binary package generation via thread pools
> 
>  meta/classes/package_rpm.bbclass                   |   2 +
>  ...y-package-building-into-a-separate-functi.patch |  83 ++++++++
>  ...-binary-package-creation-via-thread-pools.patch | 125 ++++++++++++
>  ...c-make-operations-over-string-pools-threa.patch | 207 ++++++++++++++++++++
>  ...c-remove-static-local-variables-from-buil.patch | 216 +++++++++++++++++++++
>  meta/recipes-devtools/rpm/rpm_git.bb               |   4 +
>  6 files changed, 637 insertions(+)
>  create mode 100644 meta/recipes-devtools/rpm/files/0001-Split-binary-package-building-into-a-separate-functi.patch
>  create mode 100644 meta/recipes-devtools/rpm/files/0002-Run-binary-package-creation-via-thread-pools.patch
>  create mode 100644 meta/recipes-devtools/rpm/files/0003-rpmstrpool.c-make-operations-over-string-pools-threa.patch
>  create mode 100644 meta/recipes-devtools/rpm/files/0004-build-pack.c-remove-static-local-variables-from-buil.patch
> 
> -- 
> 2.11.0
> 




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 2/2] rpm: run binary package generation via thread pools
  2017-06-08 20:29       ` Mark Hatle
@ 2017-06-09  9:02         ` Alexander Kanavin
  2017-06-09 11:18           ` Alexander Kanavin
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Kanavin @ 2017-06-09  9:02 UTC (permalink / raw)
  To: Mark Hatle, openembedded-core

On 06/08/2017 11:29 PM, Mark Hatle wrote:
> Assuming that moving the call earlier (like you did) works, and the static
> values are retrievable by the function.  You could have fixed this with the
> snipped below (without storing the values).  This would simplify the patch.
>
> (If there is something in the way the threads work with openmp that makes the
> static variable not transfer from the main execution thread to the children
> threads -- that is a different issue and should be stated in the commit.)

Ah, now I get you. You are saying that if these functions are executed 
once at the start to pre-populate the static data, then they 
subsequently become thread-safe. Unfortunately this is not the case: I 
have no idea why this happens, as they would only read static data, but 
when I did that (in the first iteration of the patch which used nspr 
instead of openmp), there was a very rare, but still happening data 
corruption issue. There was no explicit first call, but the first 
package was built ahead of others (for other reasons, and it still is), 
and that's where the first call happened.

Then I commented out the calls, and the data corruption disappeared. 
Then I rewrote the function to eliminate the static data, and it's still 
working as expected. Weird! I don't want to spend any more time on this 
though.

Then there's my other point: use of static data makes it harder to 
reason about code. You need to take into account not only the function 
inputs, but also the history of function calls up to the current 
execution point. Static data is really a 70s C relic from before threads 
were invented.

Alex


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 0/2] Multi-threaded RPM support
  2017-06-08 21:15 ` [PATCHv2 0/2] Multi-threaded RPM support Leonardo Sandoval
@ 2017-06-09  9:13   ` Alexander Kanavin
  0 siblings, 0 replies; 10+ messages in thread
From: Alexander Kanavin @ 2017-06-09  9:13 UTC (permalink / raw)
  To: Leonardo Sandoval; +Cc: openembedded-core

On 06/09/2017 12:15 AM, Leonardo Sandoval wrote:
> On Thu, 2017-06-08 at 17:42 +0300, Alexander Kanavin wrote:
>> Changes in version 2: use openmp for multi-processing instead of NSPR library, as
>> requested by upstream. Add a link to the upstream submission to patches.
>>
>> These two patches add multi-threaded features to RPM to speed up the do_package_write_rpm task.
>>
>> Specifically:
>>
>> 1) Multi-threaded xz compressor is used instead of default single-threaded gz. This has the most
>> dramatic effect when a recipe produces a smaller number of large-sized packages.
>>
>> 2) Packages creation is run in multiple threads via thread pools. This is most beneficial when
>> a recipe produces a large amount of small packages.
>>
>> Some not very scientific benchmarks for time and .rpm sizes
>> (time is measured for do_package_write_rpm tasks only):
>>
>
> Tried v2 again and I am getting this problem (segmentation faults):
>
> http://errors.yoctoproject.org/Errors/Build/38812/
>
>
> The idea I have is to get some numbers from buildstats and have better
> insight of the impact of this series. BTW, I tried core-image-sato.

Now I see the issue locally, even though it does not crash, but I think 
it's the same issue :) Bear with me, it's my first openmp adventure ever.

Alex



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 2/2] rpm: run binary package generation via thread pools
  2017-06-09  9:02         ` Alexander Kanavin
@ 2017-06-09 11:18           ` Alexander Kanavin
  0 siblings, 0 replies; 10+ messages in thread
From: Alexander Kanavin @ 2017-06-09 11:18 UTC (permalink / raw)
  To: Mark Hatle, openembedded-core

On 06/09/2017 12:02 PM, Alexander Kanavin wrote:
> Ah, now I get you. You are saying that if these functions are executed
> once at the start to pre-populate the static data, then they
> subsequently become thread-safe. Unfortunately this is not the case: I
> have no idea why this happens, as they would only read static data, but
> when I did that (in the first iteration of the patch which used nspr
> instead of openmp), there was a very rare, but still happening data
> corruption issue. There was no explicit first call, but the first
> package was built ahead of others (for other reasons, and it still is),
> and that's where the first call happened.

Actually, looking at code again, that may not be true - building a first 
package may not necessarily guarantee the pre-population of static data. 
See what I mean about code that is hard to reason about? It's best to 
get rid of static data altogether, even if it makes for a larger patch.


Alex


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-06-09 11:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-06-08 14:42 [PATCHv2 0/2] Multi-threaded RPM support Alexander Kanavin
2017-06-08 14:42 ` [PATCHv2 1/2] package_rpm.bbclass: use multithreaded xz compression Alexander Kanavin
2017-06-08 14:42 ` [PATCHv2 2/2] rpm: run binary package generation via thread pools Alexander Kanavin
2017-06-08 16:55   ` Mark Hatle
2017-06-08 17:23     ` Alexander Kanavin
2017-06-08 20:29       ` Mark Hatle
2017-06-09  9:02         ` Alexander Kanavin
2017-06-09 11:18           ` Alexander Kanavin
2017-06-08 21:15 ` [PATCHv2 0/2] Multi-threaded RPM support Leonardo Sandoval
2017-06-09  9:13   ` Alexander Kanavin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox