From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: syscall rmdir hangs with autofs
Date: Mon, 19 Jul 2010 15:21:44 +0300
Message-ID: <4C444358.8010500@redhat.com>
References: <20100719083932.187C6303001B@mail.linux-ag.de> <4C44333B.5090709@redhat.com> <20100719114034.62BDD30303F5@mail.linux-ag.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Sebastian Hetze
Return-path: 
Received: from mx1.redhat.com ([209.132.183.28]:18149 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760669Ab0GSMVy (ORCPT ); Mon, 19 Jul 2010 08:21:54 -0400
In-Reply-To: <20100719114034.62BDD30303F5@mail.linux-ag.de>
Sender: kvm-owner@vger.kernel.org
List-ID: 

On 07/19/2010 02:40 PM, Sebastian Hetze wrote:
> On Mon, Jul 19, 2010 at 02:12:59PM +0300, Avi Kivity wrote:
>
>> On 07/19/2010 11:39 AM, Sebastian Hetze wrote:
>>
>>> Hi *,
>>>
>>> we are encountering occasional problems with autofs running inside
>>> a KVM guest.
>>>
>>> [1387441.969106] INFO: task automount:26560 blocked for more than 120 seconds.
>>> [1387441.969110] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [1387441.969112] automount     D e8510198     0 26560   2702 0x00000000
>>> [1387441.969117]  db0a1ef4 00000082 80000000 e8510198 0004ed69 c8266000 f6e85a40 00000000
>>> [1387441.969123]  c08455e0 c08455e0 f41157f0 f4115a88 c55315e0 00000000 c0207c0a db0a1ef0
>>> [1387441.969128]  f4115a88 f7222bbc f7222bb8 ffffffff db0a1f20 c05976ae db0a1f14 f41157f0
>>> [1387441.969133] Call Trace:
>>> [1387441.969140]  [] ? mntput_no_expire+0x1a/0xd0
>>> [1387441.969146]  [] __mutex_lock_slowpath+0xbe/0x120
>>> [1387441.969149]  [] mutex_lock+0x20/0x40
>>> [1387441.969152]  [] do_rmdir+0x52/0xe0
>>> [1387441.969155]  [] ? do_page_fault+0x1d7/0x3a0
>>> [1387441.969158]  [] sys_rmdir+0x10/0x20
>>> [1387441.969161]  [] syscall_call+0x7/0xb
>>>
>>> The block always occurs in sys_rmdir when automount tries to remove the
>>> mountpoint right after umounting the filesystem. There is a successful lstat()
>>> on the mountpoint directly preceding the rmdir call.
>>>
>>> It looks like we are triggering some sort of race condition here.
>>>
>>> We are currently using the 2.6.31-20-generic-pae Ubuntu kernel in the 6-CPU guest,
>>> 2.6.34 vanilla and qemu-kvm-0.12.4 in the host. But the problem existed
>>> long before with all different combinations of guest/host/qemu versions.
>>> The virtual HD is if=ide,format=host_device,cache=none on a DRBD container
>>> on top of an LVM device. FS is ext3.
>>>
>>> Unfortunately, the problem is not easily reproducible. It occurs every one
>>> or two weeks. But since the hanging system call blocks the whole filesystem,
>>> we have to reboot the guest to get it into a usable state again.
>>>
>>> Any ideas what's going wrong here?
>>>
>>
>> Is there substantial I/O going on?
>>
>> If not, it may be an autofs bug unrelated to kvm.
>>
> the autofs expire event occurred at 01:15:01
>
> sar shows
>
> 00:00:01     CPU    %user   %nice  %system  %iowait   %steal    %idle
> 00:35:02     all     0,31    1,90     3,88    29,78     0,00    64,12
> 00:45:02     all     0,72    1,99     3,56    23,93     0,00    69,80
> 00:55:01     all     1,35    1,49     4,13    23,76     0,00    69,27
> 01:05:01     all     0,77    1,84     4,43    28,34     0,00    64,62
> 01:15:01     all     0,29    1,46     3,41    44,07     0,00    50,77
> 01:25:02     all     0,22    1,25     2,63    45,34     0,00    50,56
> 01:35:02     all     0,34    1,33     2,87    46,74     0,00    48,72
> 01:45:02     all     0,30    0,90     2,57    40,03     0,00    56,20
> 01:55:02     all     0,26    0,43     2,29     9,79     0,00    87,23
>
> 00:00:01       tps     rtps     wtps   bread/s   bwrtn/s
> 00:35:02    461,69   407,75    53,94  35196,06  32673,83
> 00:45:02    298,29   238,30    59,99  38553,34  33062,97
> 00:55:01    294,81   241,08    53,73  35469,66  25948,30
> 01:05:01    338,62   279,97    58,66  36164,27  31109,18
> 01:15:01    462,22   406,24    55,97  28428,26  25725,05
> 01:25:02    366,88   331,82    35,07  24160,53  22284,83
> 01:35:02    394,73   358,21    36,52  25770,79  23516,81
> 01:45:02    409,83   379,66    30,17  17874,79  15608,74
> 01:55:02    453,18   448,62     4,56   3754,82     79,47
>
> so, yes, there is substantial I/O going on.
>

Looks like a false alarm, then. The rmdir is waiting for the mount to 
flush everything to disk, which is slow and takes a while. Does it 
return eventually?

Perhaps it should do a lazy unmount.

-- 
error compiling committee.c: too many arguments to function
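[Editor's note: the lazy unmount Avi suggests corresponds to the MNT_DETACH flag of umount2(2): the mount is detached from the namespace immediately and the kernel finishes flushing dirty data in the background, so a following rmdir() on the mountpoint does not serialize behind the flush. A minimal sketch, assuming a hypothetical mountpoint path; real use requires CAP_SYS_ADMIN:]

```c
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/* Lazily unmount a mountpoint, then remove the directory.
 * With MNT_DETACH the mount is detached from the namespace at once;
 * the kernel completes the flush asynchronously once the mount is no
 * longer in use, so rmdir() does not block waiting for I/O.
 * Requires CAP_SYS_ADMIN; returns 0 on success, -1 on error. */
int lazy_umount_rmdir(const char *mountpoint)
{
    if (umount2(mountpoint, MNT_DETACH) != 0) {
        perror("umount2");
        return -1;
    }
    if (rmdir(mountpoint) != 0) {
        perror("rmdir");
        return -1;
    }
    return 0;
}
```

[Invoked on a busy or slow mount, umount2() with MNT_DETACH returns immediately; the trade-off is that the caller gets no confirmation of when the data actually reached disk.]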