Message-ID: <54C0BE76.7030601@windriver.com>
Date: Thu, 22 Jan 2015 17:10:14 +0800
From: Robert Yang
To: Richard Purdie
Cc: bitbake-devel
In-Reply-To: <54BDBC83.9030007@windriver.com>
Subject: Re: [PATCH] bitbake: Add pyinotify to lib/

Hi RP,

I think I can confirm that it is caused by exceeding the max user
processes limit. The default value on my host is:

$ ulimit -u
5000

When the "cannot fork" error happens, top shows more than 4000
processes. Here is the data from "top -d 2 -b":

Tasks: 4547 total, 1070 running, 3443 sleeping, 6 stopped, 28 zombie
Tasks: 4654 total, 1109 running, 3499 sleeping, 6 stopped, 40 zombie
Tasks: 4682 total, 1114 running, 3531 sleeping, 6 stopped, 31 zombie
Tasks: 4753 total, 1110 running, 3597 sleeping, 6 stopped, 40 zombie
Tasks: 4519 total, 1056 running, 3417 sleeping, 6 stopped, 40 zombie
Tasks: 4547 total, 1096 running, 3424 sleeping, 6 stopped, 21 zombie
Tasks: 4632 total, 1140 running, 3453 sleeping, 6 stopped, 33 zombie
Tasks: 4633 total, 1039 running, 3563 sleeping, 6 stopped, 25 zombie
Tasks: 4737 total, 1089 running, 3611 sleeping, 6 stopped, 31 zombie
Tasks: 4670 total, 1121 running, 3512 sleeping, 6 stopped, 31 zombie
Tasks: 4506 total, 1045 running, 3433 sleeping, 6 stopped, 22 zombie
Tasks: 4522 total, 1056 running, 3427 sleeping, 6 stopped, 33 zombie
Tasks: 4491 total, 1098 running, 3363 sleeping, 6 stopped, 24 zombie
Tasks: 4565 total, 1101 running, 3432 sleeping, 6 stopped, 26 zombie
Tasks: 4559 total, 1112 running, 3406 sleeping, 6 stopped, 35 zombie
Tasks: 4775 total, 1119 running, 3620 sleeping, 6 stopped, 30 zombie
Tasks: 4677 total, 1109 running, 3545 sleeping, 6 stopped, 17 zombie
Tasks: 4618 total, 1093 running, 3486 sleeping, 6 stopped, 33 zombie
Tasks: 4518 total, 1100 running, 3385 sleeping, 6 stopped, 26 zombie

I run 5 builds on the same host, each with BB_NUMBER_THREADS=32 and
PARALLEL_MAKE="-j32", and I get the error every time I run
"bitbake image".

After I ran "ulimit -u 10000", "bitbake image" works well, and the world
build is now running.

I had never seen this problem before 2015/01/02 (on the same host). Did
we improve bitbake's parallelism recently?

I have two rough ideas to fix the problem:
1) Let bitbake check the remaining process count before starting new
   processes.
2) Try to reduce process forking in meta/classes, for example:
   | grep | sed
   We can get rid of "grep" to reduce forking.

What's your opinion?

// Robert

On 01/20/2015 10:25 AM, Robert Yang wrote:
>
> Hello RP,
>
> I've got several errors like the following when the system's load is high,
> not sure whether related to pyinotify.
>
> for example, when do_configure:
> sh: 0: Cannot fork
>
> when do_package or others:
> Exception: OSError: [Errno 11] Resource temporarily unavailable
>
> // Robert
>
> On 01/19/2015 06:28 PM, Richard Purdie wrote:
>> On Mon, 2015-01-19 at 16:10 +0800, Robert Yang wrote:
>>> The number of inotify watches needs to stay below "sysctl -n
>>> fs.inotify.max_user_watches",
>>> otherwise we may get errors like:
>>> WatchManagerError: add_watch: cannot watch /path/to/build/conf/bblayers.conf
>>> WD=-1, Errno=No space left on device (ENOSPC),
>>>
>>> It's easy to hit this error if we run many builds at the same time.
>>> On Ubuntu 12.04.3 x86_64, the default value is "8192".
>>>
>>> Can we add some counters in cooker.py (or other files) to check the
>>> value and print ERRORS/WARNINGS, please? The current "ENOSPC" errors
>>> are not easy to debug.
>>>
>>> I'd like to work on it if that makes sense.
>>
>> Surely we should just trap the ENOSPC error and translate it into a
>> human readable error message? I don't like the idea of adding counters
>> into the system.
>>
>> To improve the situation from a variety of perspectives, I'm thinking we
>> should perhaps just place watches on the directories containing the
>> files rather than the files themselves since this would drastically
>> reduce the number of watches we need. The downside is we may have to be
>> more careful about how we invalidate the caches.
>>
>> Cheers,
>>
>> Richard
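
For reference, Richard's suggestion of trapping ENOSPC and translating it
into a readable message could look roughly like the sketch below. It is
only an illustration under some assumptions: the helper names
(read_watch_limit, explain_watch_error) are made up and exist in neither
bitbake nor pyinotify, and it assumes the caller already has an OSError
carrying the errno. How to get from pyinotify's WatchManagerError to that
errno is left out here.

```python
import errno

# Hypothetical helpers for illustration; not part of bitbake or pyinotify.
INOTIFY_LIMIT_FILE = "/proc/sys/fs/inotify/max_user_watches"


def read_watch_limit(path=INOTIFY_LIMIT_FILE):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def explain_watch_error(err, watched_path, limit=None):
    """Turn an OSError raised while adding a watch into a readable message."""
    if err.errno != errno.ENOSPC:
        # Not the watch-limit case: pass the system's own text through.
        return "Cannot watch %s: %s" % (watched_path, err.strerror)
    if limit is None:
        limit = read_watch_limit()
    limit_text = "unknown" if limit is None else str(limit)
    return ("Cannot watch %s: the per-user inotify watch limit (%s) is "
            "exhausted. Raise it with "
            "'sysctl -w fs.inotify.max_user_watches=<n>' "
            "or run fewer builds at once." % (watched_path, limit_text))


if __name__ == "__main__":
    demo = OSError(errno.ENOSPC, "No space left on device")
    print(explain_watch_error(demo, "/path/to/build/conf/bblayers.conf",
                              limit=8192))
```

This keeps counters out of the system, as Richard prefers: the limit is
only read once the failure has already happened, and only to make the
message more useful.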