* 3.13.5 : rm -rf running forever, one cpu at approx 100% @ 2014-02-27 0:52 Ken Moffat 2014-02-27 1:28 ` Gene Heskett 2014-02-27 3:26 ` Mike Galbraith 0 siblings, 2 replies; 10+ messages in thread From: Ken Moffat @ 2014-02-27 0:52 UTC (permalink / raw) To: linux-kernel Hi, Short summary : on 3.13.5, rm -rf of an application source directory on an ext4 filesystem sometimes takes forever (probably isn't going anywhere), with one CPU pegged at all-but 100% utilization. I've nearly finished building a new system from source, to check various desktop packages in linuxfromscratch. On this build, much of it is things I don't normally use and I needed to upgrade my buildscripts, so most of it was built in chroot using 3.10.32. But late last night I booted the new system using 3.13.5 to finish the build. This morning I discovered that rm -rf for the icedtea source directory was still running, and had taken over 5 hours of CPU time (one CPU seemd to be running at close to 100%, the others had dropped to their slowest frequency). That script was running as root (yeah, but it's a new system) and it looks as if /etc/passwd~ had got trashed, because I could no longer su or login. Not sure if that is related, at this stage it might just be a side-effect of my scripts. Booted another system, chrooted, fixed up passwords. Started again after commenting out icedtea - I hadn't intended to build what was an old version, I'd just forgotten it was in this script - that's why I do things in userspace, not the kernel :-( Continued with remaining packages, but a couple of hours later I saw a similar "one CPU at 100%, rm -rf GConf source taking forever" problem. Dumped all the processes with Alt-SysRQ-T [ huge log ] but at that point 'rm' was merely 'ready' so I doubt there is anything useful to see in the log. Built 3.13.4, booted to that. So far, everything looks good - but I'm now building the _current_ version of icedtea, so if this isn't a new 3.13.5 problem I guess I'm fairly likely to see it tomorrow. Meanwhile, any suggestions about how I can debug this if I hit it again, please ? ĸen -- das eine Mal als Tragödie, dieses Mal als Farce ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 3.13.5 : rm -rf running forever, one cpu at approx 100% 2014-02-27 0:52 3.13.5 : rm -rf running forever, one cpu at approx 100% Ken Moffat @ 2014-02-27 1:28 ` Gene Heskett 2014-02-27 1:47 ` Ken Moffat 2014-02-27 3:26 ` Mike Galbraith 1 sibling, 1 reply; 10+ messages in thread From: Gene Heskett @ 2014-02-27 1:28 UTC (permalink / raw) To: Ken Moffat; +Cc: linux-kernel On Wednesday 26 February 2014, Ken Moffat wrote: >Hi, > > Short summary : on 3.13.5, rm -rf of an application source >directory on an ext4 filesystem sometimes takes forever (probably >isn't going anywhere), with one CPU pegged at all-but 100% utilization. > > I've nearly finished building a new system from source, to check >various desktop packages in linuxfromscratch. On this build, much of >it is things I don't normally use and I needed to upgrade my >buildscripts, so most of it was built in chroot using 3.10.32. But >late last night I booted the new system using 3.13.5 to finish the >build. This morning I discovered that rm -rf for the icedtea source >directory was still running, and had taken over 5 hours of CPU time >(one CPU seemd to be running at close to 100%, the others had dropped >to their slowest frequency). That script was running as root (yeah, >but it's a new system) and it looks as if /etc/passwd~ had got >trashed, because I could no longer su or login. Not sure if that is >related, at this stage it might just be a side-effect of my scripts. > > Booted another system, chrooted, fixed up passwords. Started >again after commenting out icedtea - I hadn't intended to build >what was an old version, I'd just forgotten it was in this script - >that's why I do things in userspace, not the kernel :-( > > Continued with remaining packages, but a couple of hours later I >saw a similar "one CPU at 100%, rm -rf GConf source taking forever" >problem. Dumped all the processes with Alt-SysRQ-T [ huge log ] but >at that point 'rm' was merely 'ready' so I doubt there is anything >useful to see in the log. > > Built 3.13.4, booted to that. So far, everything looks good - but >I'm now building the _current_ version of icedtea, so if this isn't >a new 3.13.5 problem I guess I'm fairly likely to see it tomorrow. > > Meanwhile, any suggestions about how I can debug this if I hit it >again, please ? > >ؤ¸en I don't have any help to offer Ken, but this walks and quacks much like a duck I'm encountering in 3.13.5, with the backup program amanda, which uses gnu tar. To facilitate intelligent guesses as to the size of the various levels of backup, amanda does a dummy collection using tar, sent to /dev/null, using only the size it reports on the first pass. Version 1.22, quite old, works on 3.12.9, but not on 3.13.5. I have now pulled in, built and installed tar-1.27, and rebuilt amanda to let it know that the tar its using is not in /usr/bin, but in /usr/local/bin. Next run at 1:30AM This freeze, using 100% of a core, but causing no visible disk activity has killed my backups 3 nights running. At this point I've no clue as to the cause, but I will be watching this thread closely, it has a similar description. Cheers, Gene -- "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) Genes Web page <http://geneslinuxbox.net:6309/gene> NOTICE: Will pay 100 USD for an HP-4815A defective but complete probe assembly. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 3.13.5 : rm -rf running forever, one cpu at approx 100% 2014-02-27 1:28 ` Gene Heskett @ 2014-02-27 1:47 ` Ken Moffat 2014-02-27 1:57 ` Gene Heskett 0 siblings, 1 reply; 10+ messages in thread From: Ken Moffat @ 2014-02-27 1:47 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel On Wed, Feb 26, 2014 at 08:28:29PM -0500, Gene Heskett wrote: > On Wednesday 26 February 2014, Ken Moffat wrote: > > I don't have any help to offer Ken, but this walks and quacks much like a > duck I'm encountering in 3.13.5, with the backup program amanda, which uses > gnu tar. To facilitate intelligent guesses as to the size of the various > levels of backup, amanda does a dummy collection using tar, sent to > /dev/null, using only the size it reports on the first pass. Version 1.22, > quite old, works on 3.12.9, but not on 3.13.5. I have now pulled in, built > and installed tar-1.27, and rebuilt amanda to let it know that the tar its > using is not in /usr/bin, but in /usr/local/bin. Next run at 1:30AM > > This freeze, using 100% of a core, but causing no visible disk activity has > killed my backups 3 nights running. At this point I've no clue as to the > cause, but I will be watching this thread closely, it has a similar > description. > > Cheers, Gene My box where I see this is a phenom (my intel is still on 3.13.4), and from past threads I suspect yours is as well ? ĸen -- das eine Mal als Tragödie, dieses Mal als Farce ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 3.13.5 : rm -rf running forever, one cpu at approx 100% 2014-02-27 1:47 ` Ken Moffat @ 2014-02-27 1:57 ` Gene Heskett 0 siblings, 0 replies; 10+ messages in thread From: Gene Heskett @ 2014-02-27 1:57 UTC (permalink / raw) To: Ken Moffat; +Cc: linux-kernel On Wednesday 26 February 2014, Ken Moffat wrote: >On Wed, Feb 26, 2014 at 08:28:29PM -0500, Gene Heskett wrote: >> On Wednesday 26 February 2014, Ken Moffat wrote: >> >> I don't have any help to offer Ken, but this walks and quacks much like >> a duck I'm encountering in 3.13.5, with the backup program amanda, >> which uses gnu tar. To facilitate intelligent guesses as to the size >> of the various levels of backup, amanda does a dummy collection using >> tar, sent to /dev/null, using only the size it reports on the first >> pass. Version 1.22, quite old, works on 3.12.9, but not on 3.13.5. I >> have now pulled in, built and installed tar-1.27, and rebuilt amanda >> to let it know that the tar its using is not in /usr/bin, but in >> /usr/local/bin. Next run at 1:30AM >> >> This freeze, using 100% of a core, but causing no visible disk activity >> has killed my backups 3 nights running. At this point I've no clue as >> to the cause, but I will be watching this thread closely, it has a >> similar description. >> >> Cheers, Gene > > My box where I see this is a phenom (my intel is still on 3.13.4), >and from past threads I suspect yours is as well ? > >ؤ¸en Yes, it is Ken, an old slow 2.1 Ghz 9550 phenom, with 8Gb of ram. 32 bit PAE build. I wonder if that is also connected... But my crystal ball is broke, in addition to the wet ram starting to rust out. Can't be the years, I've only 40 years practice at being 39. ;-) Cheers, Gene -- "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) Genes Web page <http://geneslinuxbox.net:6309/gene> NOTICE: Will pay 100 USD for an HP-4815A defective but complete probe assembly. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 3.13.5 : rm -rf running forever, one cpu at approx 100% 2014-02-27 0:52 3.13.5 : rm -rf running forever, one cpu at approx 100% Ken Moffat 2014-02-27 1:28 ` Gene Heskett @ 2014-02-27 3:26 ` Mike Galbraith 2014-02-27 3:45 ` Ken Moffat 1 sibling, 1 reply; 10+ messages in thread From: Mike Galbraith @ 2014-02-27 3:26 UTC (permalink / raw) To: Ken Moffat; +Cc: linux-kernel On Thu, 2014-02-27 at 00:52 +0000, Ken Moffat wrote: > Hi, > > Short summary : on 3.13.5, rm -rf of an application source > directory on an ext4 filesystem sometimes takes forever (probably > isn't going anywhere), with one CPU pegged at all-but 100% utilization. > > I've nearly finished building a new system from source, to check > various desktop packages in linuxfromscratch. On this build, much of > it is things I don't normally use and I needed to upgrade my > buildscripts, so most of it was built in chroot using 3.10.32. But > late last night I booted the new system using 3.13.5 to finish the > build. This morning I discovered that rm -rf for the icedtea source > directory was still running, and had taken over 5 hours of CPU time > (one CPU seemd to be running at close to 100%, the others had dropped > to their slowest frequency). That script was running as root (yeah, > but it's a new system) and it looks as if /etc/passwd~ had got > trashed, because I could no longer su or login. Not sure if that is > related, at this stage it might just be a side-effect of my scripts. > > Booted another system, chrooted, fixed up passwords. Started > again after commenting out icedtea - I hadn't intended to build > what was an old version, I'd just forgotten it was in this script - > that's why I do things in userspace, not the kernel :-( > > Continued with remaining packages, but a couple of hours later I > saw a similar "one CPU at 100%, rm -rf GConf source taking forever" > problem. Dumped all the processes with Alt-SysRQ-T [ huge log ] but > at that point 'rm' was merely 'ready' so I doubt there is anything > useful to see in the log. > > Built 3.13.4, booted to that. So far, everything looks good - but > I'm now building the _current_ version of icedtea, so if this isn't > a new 3.13.5 problem I guess I'm fairly likely to see it tomorrow. > > Meanwhile, any suggestions about how I can debug this if I hit it > again, please ? I would start with strace to see if a task is looping in userspace, then move on to perf top -g -p <pid> (or perf record/report) to peek at what it's up to in the kernel. Once you have the where, trace_printk() is the best thing since sliced bread (which ranks just below printk()). -Mike ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 3.13.5 : rm -rf running forever, one cpu at approx 100% 2014-02-27 3:26 ` Mike Galbraith @ 2014-02-27 3:45 ` Ken Moffat 2014-02-27 4:19 ` Gene Heskett 2014-02-27 4:52 ` Mike Galbraith 0 siblings, 2 replies; 10+ messages in thread From: Ken Moffat @ 2014-02-27 3:45 UTC (permalink / raw) To: Mike Galbraith; +Cc: linux-kernel On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote: > > I would start with strace to see if a task is looping in userspace, then > move on to perf top -g -p <pid> (or perf record/report) to peek at what > it's up to in the kernel. Once you have the where, trace_printk() is > the best thing since sliced bread (which ranks just below printk()). > > -Mike Thanks. I'll need to build perf. ĸen -- das eine Mal als Tragödie, dieses Mal als Farce ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 3.13.5 : rm -rf running forever, one cpu at approx 100% 2014-02-27 3:45 ` Ken Moffat @ 2014-02-27 4:19 ` Gene Heskett 2014-02-27 4:52 ` Mike Galbraith 1 sibling, 0 replies; 10+ messages in thread From: Gene Heskett @ 2014-02-27 4:19 UTC (permalink / raw) To: Ken Moffat; +Cc: Mike Galbraith, linux-kernel On Wednesday 26 February 2014, Ken Moffat wrote: >On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote: >> I would start with strace to see if a task is looping in userspace, >> then move on to perf top -g -p <pid> (or perf record/report) to peek >> at what it's up to in the kernel. Once you have the where, >> trace_printk() is the best thing since sliced bread (which ranks just >> below printk()). >> >> -Mike > > Thanks. I'll need to build perf. > >ؤ¸en I probably will too, but I don't have a huge amount of tracing turned on in this kernel. We'll see what happens tonight & go from there. FWIW, about all I see in htop is the command line that launched it. Cheers, Gene -- "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) Genes Web page <http://geneslinuxbox.net:6309/gene> NOTICE: Will pay 100 USD for an HP-4815A defective but complete probe assembly. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 3.13.5 : rm -rf running forever, one cpu at approx 100% 2014-02-27 3:45 ` Ken Moffat 2014-02-27 4:19 ` Gene Heskett @ 2014-02-27 4:52 ` Mike Galbraith 2014-02-27 11:30 ` Gene Heskett 2014-03-03 21:16 ` Ken Moffat 1 sibling, 2 replies; 10+ messages in thread From: Mike Galbraith @ 2014-02-27 4:52 UTC (permalink / raw) To: Ken Moffat; +Cc: linux-kernel On Thu, 2014-02-27 at 03:45 +0000, Ken Moffat wrote: > On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote: > > > > I would start with strace to see if a task is looping in userspace, then > > move on to perf top -g -p <pid> (or perf record/report) to peek at what > > it's up to in the kernel. Once you have the where, trace_printk() is > > the best thing since sliced bread (which ranks just below printk()). > > > > -Mike > Thanks. I'll need to build perf. You may want to build the kernel with frame-pointers too, for easy gdb list *0x(hexnum) of *func()+0x(hexoffset) use. Crash is also pretty handy both for rummaging live via crash vmlinux /proc/kcore, and for leisurely postmortem analysis if you set the box up to crashdump in advance, and force a dump (poke sysrq-c or echo c > /proc/sysrq-trigger) when you see the bad thing happen. Crash has all kinds of goodies, including invocation of gdb. -Mike ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 3.13.5 : rm -rf running forever, one cpu at approx 100% 2014-02-27 4:52 ` Mike Galbraith @ 2014-02-27 11:30 ` Gene Heskett 2014-03-03 21:16 ` Ken Moffat 1 sibling, 0 replies; 10+ messages in thread From: Gene Heskett @ 2014-02-27 11:30 UTC (permalink / raw) To: Mike Galbraith; +Cc: Ken Moffat, linux-kernel On Wednesday 26 February 2014, Mike Galbraith wrote: >On Thu, 2014-02-27 at 03:45 +0000, Ken Moffat wrote: >> On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote: >> > I would start with strace to see if a task is looping in userspace, >> > then move on to perf top -g -p <pid> (or perf record/report) to peek >> > at what it's up to in the kernel. Once you have the where, >> > trace_printk() is the best thing since sliced bread (which ranks >> > just below printk()). >> > >> > -Mike >> >> Thanks. I'll need to build perf. > >You may want to build the kernel with frame-pointers too, for easy gdb >list *0x(hexnum) of *func()+0x(hexoffset) use. Crash is also pretty >handy both for rummaging live via crash vmlinux /proc/kcore, and for >leisurely postmortem analysis if you set the box up to crashdump in >advance, and force a dump (poke sysrq-c or echo c > /proc/sysrq-trigger) >when you see the bad thing happen. Crash has all kinds of goodies, >including invocation of gdb. > >-Mike It doesn't look as if I'll have to guys. Amanda is about 1/2 done and running normally by switching from tar-1.22 to tar-1.27. So I'm going to toddle off in the general direction of a bed and let amanda send me an email when its done. And it did, including some "strange" reports I have yet to decipher. Diffs in the tar's probably. But it did not hang. Cheers, Gene -- "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) Genes Web page <http://geneslinuxbox.net:6309/gene> NOTICE: Will pay 100 USD for an HP-4815A defective but complete probe assembly. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 3.13.5 : rm -rf running forever, one cpu at approx 100% 2014-02-27 4:52 ` Mike Galbraith 2014-02-27 11:30 ` Gene Heskett @ 2014-03-03 21:16 ` Ken Moffat 1 sibling, 0 replies; 10+ messages in thread From: Ken Moffat @ 2014-03-03 21:16 UTC (permalink / raw) To: Mike Galbraith; +Cc: linux-kernel On Thu, Feb 27, 2014 at 05:52:42AM +0100, Mike Galbraith wrote: > On Thu, 2014-02-27 at 03:45 +0000, Ken Moffat wrote: > > On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote: > > > > > > I would start with strace to see if a task is looping in userspace, then > > > move on to perf top -g -p <pid> (or perf record/report) to peek at what > > > it's up to in the kernel. Once you have the where, trace_printk() is > > > the best thing since sliced bread (which ranks just below printk()). > > > > > > -Mike > > Thanks. I'll need to build perf. > > You may want to build the kernel with frame-pointers too, for easy gdb > list *0x(hexnum) of *func()+0x(hexoffset) use. Crash is also pretty > handy both for rummaging live via crash vmlinux /proc/kcore, and for > leisurely postmortem analysis if you set the box up to crashdump in > advance, and force a dump (poke sysrq-c or echo c > /proc/sysrq-trigger) > when you see the bad thing happen. Crash has all kinds of goodies, > including invocation of gdb. > > -Mike In fact, it seems I had already enabled perf, but not built the userspace part. However, I was motivated to build crash, and a kernel with frame pointers plus a _lot_ of debug options. Unsurprisingly, that ran like a dog, and many things seemed to use a lot of CPU according to top (e.g. 'X', at least for the first several minutes after it started), but I couldn't replicate the problem. Also proved that my OpenJDK script doesn't normally trash /etc/passwd- :) Then, I went back to the original 3.13.5, proved that perf was working, and built a new LFS system in chroot, again with 3.13.5, in chroot. Booted it as soon as I'd built xorg, then went on through glib/gtk, firefox, print/photo programs, libreoffice. No problems at all, apart from the common "needed to use -j3 for gcc, because it's a phonon". NB I forgot to mention that this is all 64-bit, and the box has 8GB. If something does happen here in the next few days (I've still got to sort out my kde scripts and build that) then I'll reply to this. But otherwise, I guess it must be cosmic rays, or maybe Gaffer forgot to feed and muck-out the imps. ĸen -- das eine Mal als Tragödie, dieses Mal als Farce ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-03-03 21:16 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-02-27 0:52 3.13.5 : rm -rf running forever, one cpu at approx 100% Ken Moffat 2014-02-27 1:28 ` Gene Heskett 2014-02-27 1:47 ` Ken Moffat 2014-02-27 1:57 ` Gene Heskett 2014-02-27 3:26 ` Mike Galbraith 2014-02-27 3:45 ` Ken Moffat 2014-02-27 4:19 ` Gene Heskett 2014-02-27 4:52 ` Mike Galbraith 2014-02-27 11:30 ` Gene Heskett 2014-03-03 21:16 ` Ken Moffat
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox