From: Saul Wold <sgw@linux.intel.com>
To: Robert Yang <liezhi.yang@windriver.com>,
openembedded-core@lists.openembedded.org
Subject: Re: [PATCH 1/1] useradd_base.bbclass: sleep more and more seconds (up to 10)
Date: Thu, 03 Apr 2014 13:42:27 -0700
Message-ID: <533DC7B3.3060108@linux.intel.com>
In-Reply-To: <a3a8a29aaf2c8349e71f20cd42e3a1627dd27a6c.1396519133.git.liezhi.yang@windriver.com>
On 04/03/2014 02:59 AM, Robert Yang wrote:
> Currently, we sleep 1 second when adding the user fails. This may not be
> enough when we use the sstate cache. As my test below shows, nearly all
> the useradd actions happen within the same minute when mirroring from the
> sstate cache, and they fail when the load is high. I got these times
> by adding strace before the useradd for debugging:
>
> 2014-03-31 14:48:22.978079781 +0800 /tmp/log/pulseaudio.4.c
> 2014-03-31 14:48:22.028079813 +0800 /tmp/log/pulseaudio.1.c
> 2014-03-31 14:48:21.949079816 +0800 /tmp/log/pulseaudio.3.c
> 2014-03-31 14:48:20.903079852 +0800 /tmp/log/pulseaudio.2.c
> 2014-03-31 14:48:20.006079883 +0800 /tmp/log/nfs-utils.9.c
> 2014-03-31 14:48:18.876079923 +0800 /tmp/log/xuser-account.9.c
> 2014-03-31 14:48:18.824079924 +0800 /tmp/log/pulseaudio.0.c
> 2014-03-31 14:48:17.826079959 +0800 /tmp/log/xuser-account.8.c
> 2014-03-31 14:48:17.766079961 +0800 /tmp/log/nfs-utils.8.c
> 2014-03-31 14:48:16.794079995 +0800 /tmp/log/xuser-account.7.c
> 2014-03-31 14:48:16.735079997 +0800 /tmp/log/nfs-utils.7.c
> 2014-03-31 14:48:14.719080066 +0800 /tmp/log/xuser-account.5.c
> 2014-03-31 14:48:14.677080068 +0800 /tmp/log/nfs-utils.5.c
> 2014-03-31 14:48:12.621080139 +0800 /tmp/log/nfs-utils.3.c
> 2014-03-31 14:48:11.589080175 +0800 /tmp/log/nfs-utils.2.c
> 2014-03-31 14:48:10.242080221 +0800 /tmp/log/builder.0.c
> 2014-03-31 14:48:09.523080246 +0800 /tmp/log/nfs-utils.0.c
> 2014-03-31 14:48:09.488080248 +0800 /tmp/log/openssh.0.c
> 2014-03-31 14:48:09.485080248 +0800 /tmp/log/rpcbind.1.c
> 2014-03-31 14:48:07.590080313 +0800 /tmp/log/rpcbind.0.c
> 2014-03-31 14:28:15.437121590 +0800 /tmp/log/avahi.0.c
> 2014-03-31 14:18:19.067142238 +0800 /tmp/log/dbus.0.c
>
> The nfs-utils and xuser-account recipes failed to add the user.
>
> The useradd command needs two locks, passwd.lock and group.lock. Looking
> into these .c files, it may get one lock but fail to get the other.
> Sleeping 1 second is not enough; it needs longer. The reasoning is that
> if the command succeeds, a longer sleep has no side effects, and if it
> fails, we need to wait longer rather than add to the crowding.
>
> I've tried "sleep 5", but it didn't help much since the processes would
> sleep and wake up at nearly the same time. I also tried
> "sleep <RANDOM seconds between 1 and 10>", but that didn't help much
> either.
>
> I think a better way is to sleep longer and longer (up to 10 seconds)
> after each failure. This can't fix the problem that they may act at the
> same time, but the logic is: if it is not crowded, a short sleep should
> be OK; otherwise sleep for longer and longer.
>
> Here is the testing result which seems much better:
> 2014-04-03 14:09:56.605185284 +0800 dbus.0.c
> 2014-04-03 14:09:39.899185862 +0800 rpcbind.5.c
> 2014-04-03 14:09:38.400185914 +0800 distcc.4.c
> 2014-04-03 14:09:35.206186025 +0800 pulseaudio.1.c
> 2014-04-03 14:09:33.979186067 +0800 rpcbind.4.c
> 2014-04-03 14:09:33.364186089 +0800 pulseaudio.0.c
> 2014-04-03 14:09:33.360186089 +0800 distcc.3.c
> 2014-04-03 14:09:30.996186171 +0800 avahi-ui.0.c
> 2014-04-03 14:09:30.298186195 +0800 distcc.2.c
> 2014-04-03 14:09:29.905186208 +0800 rpcbind.3.c
> 2014-04-03 14:09:29.410186226 +0800 avahi-ui.2.c
> 2014-04-03 14:09:28.239186266 +0800 distcc.1.c
> 2014-04-03 14:09:27.298186299 +0800 xuser-account.0.c
> 2014-04-03 14:09:27.032186308 +0800 distcc.0.c
> 2014-04-03 14:09:26.836186315 +0800 rpcbind.2.c
> 2014-04-03 14:09:25.846186349 +0800 nfs-utils.1.c
> 2014-04-03 14:09:25.752186352 +0800 avahi-ui.1.c
> 2014-04-03 14:09:24.779186386 +0800 builder.0.c
> 2014-04-03 14:09:24.746186387 +0800 rpcbind.1.c
> 2014-04-03 14:09:23.916186416 +0800 openssh.1.c
> 2014-04-03 14:09:23.848186418 +0800 nfs-utils.0.c
> 2014-04-03 14:09:23.594186427 +0800 rpcbind.0.c
> 2014-04-03 14:09:22.609186461 +0800 ppp-dialin.0.c
> 2014-04-03 14:09:21.817186488 +0800 openssh.0.c
>
> [YOCTO #6085]
>
> Signed-off-by: Robert Yang <liezhi.yang@windriver.com>
> ---
> meta/classes/useradd_base.bbclass | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/meta/classes/useradd_base.bbclass b/meta/classes/useradd_base.bbclass
> index 7aafe29..01d2e99 100644
> --- a/meta/classes/useradd_base.bbclass
> +++ b/meta/classes/useradd_base.bbclass
> @@ -24,7 +24,7 @@ perform_groupadd () {
> group_exists="`grep "^$groupname:" $rootdir/etc/group || true`"
> if test "x$group_exists" = "x"; then
> bbwarn "groupadd command did not succeed. Retrying..."
> - sleep 1
> + sleep `expr $count + 1`

Why not move the count assignment that is below the fi (not visible in
this diff) to above the test, and then check for count > retries? This
will save one call to expr.

Sau!

> else
> break
> fi
> @@ -52,7 +52,7 @@ perform_useradd () {
> user_exists="`grep "^$username:" $rootdir/etc/passwd || true`"
> if test "x$user_exists" = "x"; then
> bbwarn "useradd command did not succeed. Retrying..."
> - sleep 1
> + sleep `expr $count + 1`
> else
> break
> fi
> @@ -90,7 +90,7 @@ perform_groupmems () {
> mem_exists="`grep "^$groupname:[^:]*:[^:]*:\([^,]*,\)*$username\(,[^,]*\)*" $rootdir/etc/group || true`"
> if test "x$mem_exists" = "x"; then
> bbwarn "groupmems command did not succeed. Retrying..."
> - sleep 1
> + sleep `expr $count + 1`
> else
> break
> fi
> @@ -126,7 +126,7 @@ perform_groupdel () {
> group_exists="`grep "^$groupname:" $rootdir/etc/group || true`"
> if test "x$group_exists" != "x"; then
> bbwarn "groupdel command did not succeed. Retrying..."
> - sleep 1
> + sleep `expr $count + 1`
> else
> break
> fi
> @@ -154,7 +154,7 @@ perform_userdel () {
> user_exists="`grep "^$username:" $rootdir/etc/passwd || true`"
> if test "x$user_exists" != "x"; then
> bbwarn "userdel command did not succeed. Retrying..."
> - sleep 1
> + sleep `expr $count + 1`
> else
> break
> fi
> @@ -184,7 +184,7 @@ perform_groupmod () {
> eval $PSEUDO groupmod $opts
> if test $? != 0; then
> bbwarn "groupmod command did not succeed. Retrying..."
> - sleep 1
> + sleep `expr $count + 1`
> else
> break
> fi
> @@ -214,7 +214,7 @@ perform_usermod () {
> eval $PSEUDO usermod $opts
> if test $? != 0; then
> bbwarn "usermod command did not succeed. Retrying..."
> - sleep 1
> + sleep `expr $count + 1`
> else
> break
> fi
>
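
The suggestion above (increment count once, then reuse it for both the
sleep and the retry check) could look roughly like the sketch below. This
is only an illustration under assumptions, not the code in
useradd_base.bbclass: retry_with_backoff, its arguments, and the SLEEP_CMD
override are hypothetical names introduced here, and the give-up condition
is one possible reading of "check for count > retries".

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of useradd_base.bbclass.
# SLEEP_CMD is an assumed override so the delay can be stubbed out in tests.
SLEEP_CMD="${SLEEP_CMD:-sleep}"

# Usage: retry_with_backoff RETRIES MAX_DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds or RETRIES attempts have failed.
# After the Nth failure it sleeps N seconds, capped at MAX_DELAY.
# Note: POSIX sh has no "local", so these variables are global.
retry_with_backoff () {
    retries=$1
    max_delay=$2
    shift 2
    count=0
    while true; do
        if "$@"; then
            return 0
        fi
        # Single expr call per failed attempt; the result serves as both
        # the sleep length and the attempt counter.
        count=`expr $count + 1`
        if test $count -ge $retries; then
            echo "Tried $count times without success, giving up" >&2
            return 1
        fi
        delay=$count
        if test $delay -gt $max_delay; then
            delay=$max_delay
        fi
        echo "Command did not succeed. Retrying in $delay second(s)..." >&2
        $SLEEP_CMD $delay
    done
}
```

A call such as "retry_with_backoff 10 10 some_command" would then wait 1,
2, 3, ... seconds between attempts, never more than 10 at a time.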