From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jiri Slaby
Subject: Re: Regression from 2.6.36
Date: Thu, 07 Apr 2011 12:19:22 +0200
Message-ID: <4D9D8FAA.9080405@suse.cz>
References: <20110315132527.130FB80018F1@mail1005.cent> <20110317001519.GB18911@kroah.com> <20110407120112.E08DCA03@pobox.sk>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Cc: linux-kernel@vger.kernel.org, Changli Gao, Andrew Morton, linux-mm@kvack.org, Eric Dumazet, linux-fsdevel@vger.kernel.org, Jiri Slaby
To: azurIt
Return-path:
In-Reply-To: <20110407120112.E08DCA03@pobox.sk>
Sender: owner-linux-mm@kvack.org
List-Id: linux-fsdevel.vger.kernel.org

Cc'ed a few people.

Also, the series which introduced this was discussed at:
http://lkml.org/lkml/2010/5/3/53

On 04/07/2011 12:01 PM, azurIt wrote:
>
> I have finally completed bisection, here are the results:
>
>
> a892e2d7dcdfa6c76e60c50a8c7385c65587a2a6 is first bad commit
> commit a892e2d7dcdfa6c76e60c50a8c7385c65587a2a6
> Author: Changli Gao
> Date:   Tue Aug 10 18:01:35 2010 -0700
>
>     vfs: use kmalloc() to allocate fdmem if possible
>
>     Use kmalloc() to allocate fdmem if possible.
>
>     vmalloc() is used as a fallback solution for fdmem allocation.  A new
>     helper function __free_fdtable() is introduced to reduce the lines of
>     code.
>
>     A potential bug, vfree() a memory allocated by kmalloc(), is fixed.
>
>     [akpm@linux-foundation.org: use __GFP_NOWARN, uninline alloc_fdmem() and free_fdmem()]
>     Signed-off-by: Changli Gao
>     Cc: Alexander Viro
>     Cc: Jiri Slaby
>     Cc: "Paul E. McKenney"
>     Cc: Alexey Dobriyan
>     Cc: Ingo Molnar
>     Cc: Peter Zijlstra
>     Cc: Avi Kivity
>     Cc: Tetsuo Handa
>     Signed-off-by: Andrew Morton
>     Signed-off-by: Linus Torvalds
>
> :040000 040000 a7b3997bc754f573b4a309cda1a0774ea95c235e 4241a4f2115c60e5c1dc1879c85c9911fa077807 M	fs
>
>
> ______________________________________________________________
> > From: "Greg KH"
> > To: azurIt
> > Date: 17.03.2011 01:15
> > Subject: Re: Regression from 2.6.36
> >
> > CC: linux-kernel@vger.kernel.org
> On Tue, Mar 15, 2011 at 02:25:27PM +0100, azurIt wrote:
> >
> > Hi,
> >
> > we are successfully running several very busy web servers on 2.6.32.* and
> > a few days ago I decided to upgrade to 2.6.37 (mainly because of blkio cgroup).
> > I installed 2.6.37.2 on one of the servers and very strange things started to
> > happen with the Apache web server.
> >
> > We are using Apache with MPM-ITK ( http://mpm-itk.sesse.net/ ) so it is doing
> > lots of 'fork' and lots of 'setuid'. I have also noticed that the problem is
> > happening only on very busy servers.
> >
> > Everything is ok when Apache is started but as time passes, its 'root'
> > processes (Apache processes running under root) consume more and more CPU.
> > Finally, the whole server becomes very unstable and Apache must be restarted.
> > This repeats until the load on the web sites is much lower (usually at 22:00).
> > Sometimes it takes 3 hours until a restart is needed, sometimes only 1 hour (again,
> > depends on the load on the web sites). Here is the graph of CPU utilization showing the
> > problem (red color), Apache was restarted at 8:11 and 9:35:
> > http://watchdog.sk/lkml/cpu-problem.png
> >
> > Here is how it looks in htop:
> > http://watchdog.sk/lkml/htop.jpg
> >
> > And finally here is how it looks with older kernels (yes, when I install an older
> > kernel, the problem is gone), notice also that I/O wait is much lower and nicer
> > (blue color):
> > http://watchdog.sk/lkml/cpu-ok.png
> >
> > I was also strace-ing the Apache processes which were causing problems, here it is:
> > http://watchdog.sk/lkml/strace.txt
> >
> > I'm not 100% sure but I think that CPU was consumed on 'futex' lines.
> >
> > I tried several kernel versions and found out that everything BEFORE 2.6.36 is
> > NOT affected and everything AFTER 2.6.36 (included) is affected.
> >
> > Versions which I tried and were NOT affected by this problem:
> > 2.6.32.*
> > 2.6.35.11
> >
> > Versions which I tried and were affected by this problem:
> > 2.6.36
> > 2.6.36.4
> > 2.6.37.2
> > 2.6.37.3
> > 2.6.38-rc8 (final version was not released yet)
> >
> > All tests were made on vanilla kernels on Debian Lenny with this config:
> > http://watchdog.sk/lkml/config
> >
> > Do you need any other information from me? I'm able to try other versions or
> > patches but, please, take into account that I have to do this on a _production_
> > server (I failed to reproduce it in a testing environment). Also, I'm able to try
> > only one kernel per day.
>
> Ick, one kernel per day might make this a bit difficult, but if there
> was any way you could use 'git bisect' to try to narrow this down to the
> patch that caused this problem, it would be great.
>
> You can mark 2.6.35 as working and 2.6.36 as bad and git will go from
> there and try to offer you different chances to find the problem.
>
> thanks,
>
> greg k-h

thanks,
-- 
js
suse labs