Vivek Goyal wrote:
> On Wed, May 06, 2009 at 04:11:05PM +0800, Gui Jianfeng wrote:
>> Vivek Goyal wrote:
>>> Hi All,
>>>
>>> Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
>>> First version of the patches was posted here.
>> Hi Vivek,
>>
>> I did some simple tests for V2, and triggered a kernel panic.
>> The following script can reproduce this bug. It seems that the cgroup
>> is already removed, but the IO Controller still tries to access it.
>>
>
> Hi Gui,
>
> Thanks for the report. I use cgroup_path() for debugging. I guess that
> cgroup_path() was passed a null cgrp pointer, which is why it crashed.
>
> If yes, then it is strange though. I call cgroup_path() only after
> grabbing a reference to the css object. (I am assuming that if I have a
> valid reference to the css object then css->cgrp can't be null).
>

Yes, css->cgrp shouldn't be NULL. I suspect we hit a bug in cgroup here.
The code dealing with css refcnt and cgroup rmdir has changed quite a lot,
and is much more complex than it was.

> Anyway, can you please try out the following patch and see if it fixes
> your crash.
...
> BTW, I tried the following equivalent script and I can't see the crash on
> my system. Are you able to hit it regularly?
>

I modified the script like this:

======================
#!/bin/bash
# The for ((...)) loop below is bash syntax, so run this under bash.
echo 1 > /proc/sys/vm/drop_caches
mkdir /cgroup 2> /dev/null
mount -t cgroup -o io,blkio io /cgroup
mkdir /cgroup/test1
mkdir /cgroup/test2
echo 100 > /cgroup/test1/io.weight
echo 500 > /cgroup/test2/io.weight

dd if=/dev/zero bs=4096 count=128000 of=500M.1 &
pid1=$!
echo $pid1 > /cgroup/test1/tasks

dd if=/dev/zero bs=4096 count=128000 of=500M.2 &
pid2=$!
echo $pid2 > /cgroup/test2/tasks

sleep 5
kill -9 $pid1
kill -9 $pid2

# Retry until both groups have been removed.
count=0
for ((;count != 2;))
{
    rmdir /cgroup/test1 > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        count=$(( $count + 1 ))
    fi

    rmdir /cgroup/test2 > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        count=$(( $count + 1 ))
    fi
}

umount /cgroup
rmdir /cgroup
======================

I ran this script and got a lockdep BUG. Full log and my config are
attached.

Actually this can be triggered with the following steps on my box:
# mount -t cgroup -o blkio,io xxx /mnt
# mkdir /mnt/0
# echo $$ > /mnt/0/tasks
# echo 3 > /proc/sys/vm/drop_caches
# echo $$ > /mnt/tasks
# rmdir /mnt/0

And when I ran the script a second time, my box froze and I had to
reset it.

> Instead of killing the tasks I also tried moving the tasks into root cgroup
> and then deleting test1 and test2 groups, that also did not produce any crash.
> (Hit a different bug though after 5-6 attempts :-)
>
> As I mentioned in the patchset, currently we do have issues with group
> refcounting and cgroup/group going away. Hopefully in next version they
> all should be fixed up. But still, it is nice to hear back...
>
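
For anyone who wants to retry the six manual steps quoted above without
typing them by hand, here is a minimal sketch that wraps them in a script.
The DRY_RUN switch, the MNT variable and the run/put/repro helper names are
my own assumptions and are not part of the patchset or of the original
report. By default it only prints the commands, since the real steps need
root and a kernel with the io controller patches applied.

```shell
#!/bin/sh
# Sketch of the manual reproduction steps, wrapped in helpers.
# Hypothetical names (not from the original mail): DRY_RUN, MNT, run, put, repro.

MNT=${MNT:-/mnt}
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, anything else = execute

# Run a command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Write a value into a file (covers the "echo X > file" steps).
put() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ echo $1 > $2"
    else
        echo "$1" > "$2"
    fi
}

repro() {
    run mount -t cgroup -o blkio,io xxx "$MNT" || return 1
    run mkdir "$MNT/0" || return 1
    put $$ "$MNT/0/tasks"             # move this shell into the new group
    put 3 /proc/sys/vm/drop_caches    # drop page cache, dentries and inodes
    put $$ "$MNT/tasks"               # move the shell back to the root group
    run rmdir "$MNT/0"                # remove the now-empty group
}

repro
```

Running it with DRY_RUN=0 as root executes the steps for real. Given the
freeze reported above, that is probably best done only on a test box that
can be reset.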