Vivek Goyal wrote:
> On Wed, May 06, 2009 at 04:11:05PM +0800, Gui Jianfeng wrote:
>> Vivek Goyal wrote:
>>> Hi All,
>>>
>>> Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
>>> First version of the patches was posted here.
>> Hi Vivek,
>>
>> I did some simple tests on V2 and triggered a kernel panic.
>> The following script can reproduce this bug. It seems that the cgroup
>> has already been removed, but the IO Controller still tries to access it.
>>
> 
> Hi Gui,
> 
> Thanks for the report. I use cgroup_path() for debugging. I guess that
> cgroup_path() was passed a NULL cgrp pointer, which is why it crashed.
> 
> If so, it is strange, though. I call cgroup_path() only after
> grabbing a reference to the css object. (I am assuming that if I have a
> valid reference to the css object then css->cgrp can't be NULL.)
> 

Yes, css->cgrp shouldn't be NULL. I suspect we hit a bug in cgroup here.
The code dealing with css refcnt and cgroup rmdir has changed quite a lot,
and is much more complex than it was.

> Anyway, can you please try out the following patch and see if it fixes
> your crash?
...
> BTW, I tried the following equivalent script and I can't see the crash on
> my system. Are you able to hit it regularly?
> 

I modified the script like this:

======================
#!/bin/bash
echo 1 > /proc/sys/vm/drop_caches
mkdir /cgroup 2> /dev/null
mount -t cgroup -o io,blkio io /cgroup
mkdir /cgroup/test1
mkdir /cgroup/test2
echo 100 > /cgroup/test1/io.weight
echo 500 > /cgroup/test2/io.weight

dd if=/dev/zero bs=4096 count=128000 of=500M.1 &
pid1=$!
echo $pid1 > /cgroup/test1/tasks

dd if=/dev/zero bs=4096 count=128000 of=500M.2 &
pid2=$!
echo $pid2 > /cgroup/test2/tasks

sleep 5
kill -9 $pid1
kill -9 $pid2

count=0
for ((;count != 2;))
{
        rmdir /cgroup/test1 > /dev/null 2>&1
        if [ $? -eq 0 ]; then
                count=$(( $count + 1 ))
        fi

        rmdir /cgroup/test2 > /dev/null 2>&1
        if [ $? -eq 0 ]; then
                count=$(( $count + 1 ))
        fi
}

umount /cgroup
rmdir /cgroup
======================

I ran this script and got a lockdep BUG. Full log and my config are attached.

Actually this can be triggered with the following steps on my box:
# mount -t cgroup -o blkio,io xxx /mnt
# mkdir /mnt/0
# echo $$ > /mnt/0/tasks
# echo 3 > /proc/sys/vm/drop_caches
# echo $$ > /mnt/tasks
# rmdir /mnt/0
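
For convenience, the manual steps above could be wrapped in a small script,
roughly like the sketch below. The step() and repro_steps() helpers and the
RUN variable are names made up for illustration, not anything from the
patchset. By default it only prints the commands; it is assumed to be run as
root with RUN=1 on a kernel that has the io and blkio controllers when the
steps should actually be executed.

```shell
#!/bin/sh
# Sketch only: step(), repro_steps() and RUN are illustrative names and are
# not part of the IO controller patches.
# Dry run by default (commands are printed); RUN=1 executes them as root.

step() {
    if [ "${RUN:-0}" = 1 ]; then
        eval "$1"        # run in this shell, so $$ below is this script's PID
    else
        echo "+ $1"
    fi
}

repro_steps() {
    mnt=${1:-/mnt}       # mount point, /mnt as in the manual steps
    step "mount -t cgroup -o blkio,io xxx $mnt" &&
    step "mkdir $mnt/0" &&
    step "echo $$ > $mnt/0/tasks" &&             # move this shell into group 0
    step "echo 3 > /proc/sys/vm/drop_caches" &&
    step "echo $$ > $mnt/tasks" &&               # move it back to the root group
    step "rmdir $mnt/0"
}

repro_steps "$@"
```

Running it with no arguments just lists the six commands, so the sequence
can be checked before trying it with RUN=1 on a test box.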

And when I ran the modified script a second time, my box froze
and I had to reset it.

> Instead of killing the tasks I also tried moving the tasks into root cgroup
> and then deleting test1 and test2 groups, that also did not produce any crash.
> (Hit a different bug though after 5-6 attempts :-)
> 
> As I mentioned in the patchset, we currently do have issues with group
> refcounting and cgroup/group going away. Hopefully they will all be
> fixed in the next version. But still, it is nice to hear back...
>