From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sebastian Hetze
Subject: Re: syscall rmdir hangs with autofs
Date: Mon, 19 Jul 2010 13:40:33 +0200
Message-ID: <20100719114034.92ECB30303F5@mail.linux-ag.de>
References: <20100719083932.187C6303001B@mail.linux-ag.de> <4C44333B.5090709@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Sebastian Hetze, kvm@vger.kernel.org
To: Avi Kivity
Return-path:
Received: from ironport.linux-ag.com ([62.245.157.240]:13920 "EHLO ironport.linux-ag.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760590Ab0GSLkf (ORCPT ); Mon, 19 Jul 2010 07:40:35 -0400
Received: from localhost (mail.linux-ag.de [62.245.157.206]) by mail.linux-ag.de (Postfix) with ESMTP id 92ECB30303F5 for ; Mon, 19 Jul 2010 13:40:34 +0200 (CEST)
Content-Disposition: inline
In-Reply-To: <4C44333B.5090709@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Mon, Jul 19, 2010 at 02:12:59PM +0300, Avi Kivity wrote:
> On 07/19/2010 11:39 AM, Sebastian Hetze wrote:
>> Hi *,
>>
>> we are encountering occasional problems with autofs running inside
>> an KVM guest.
>>
>> [1387441.969106] INFO: task automount:26560 blocked for more than 120 seconds.
>> [1387441.969110] "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [1387441.969112] automount D e8510198 0 26560 2702 0x00000000
>> [1387441.969117] db0a1ef4 00000082 80000000 e8510198 0004ed69 c8266000 f6e85a40 00000000
>> [1387441.969123] c08455e0 c08455e0 f41157f0 f4115a88 c55315e0 00000000 c0207c0a db0a1ef0
>> [1387441.969128] f4115a88 f7222bbc f7222bb8 ffffffff db0a1f20 c05976ae db0a1f14 f41157f0
>> [1387441.969133] Call Trace:
>> [1387441.969140] [] ? mntput_no_expire+0x1a/0xd0
>> [1387441.969146] [] __mutex_lock_slowpath+0xbe/0x120
>> [1387441.969149] [] mutex_lock+0x20/0x40
>> [1387441.969152] [] do_rmdir+0x52/0xe0
>> [1387441.969155] [] ? do_page_fault+0x1d7/0x3a0
>> [1387441.969158] [] sys_rmdir+0x10/0x20
>> [1387441.969161] [] syscall_call+0x7/0xb
>>
>> The block always occurs in sys_rmdir when automount tries to remove the
>> mountpoint right after umounting the filesystem. There is an successful lstat()
>> on the mountpoint directly precceeding the rmdir call.
>>
>> It looks like we are triggering some sort of race condition here.
>>
>> We are currently using 2.6.31-20-generic-pae ubuntu kernel in the 6 CPU guest,
>> 2.6.34 vanilla and qemu-kvm-0.12.4 in the host. But the problem existed
>> long before with all different combinations of guest/host/qemu versions.
>> The virtual HD is if=ide,format=host_device,cache=none on an DRBD container
>> on top of an LVM device. FS is ext3.
>>
>> Unfortunately, the problem is not easy reproduceable. It occurs every one
>> or two weeks. But since the hanging system call blocks the whole filesystem
>> we have to reboot the guest to get it into an useable state again.
>>
>> Any ideas what's going wrong here?
>>
>>
>
> Is there substantial I/O going on?
>
> If not, it may be an autofs bug unrelated to kvm.

The autofs expire event occurred at 01:15:01. sar shows:

00:00:01       CPU     %user     %nice   %system   %iowait    %steal     %idle
00:35:02       all      0,31      1,90      3,88     29,78      0,00     64,12
00:45:02       all      0,72      1,99      3,56     23,93      0,00     69,80
00:55:01       all      1,35      1,49      4,13     23,76      0,00     69,27
01:05:01       all      0,77      1,84      4,43     28,34      0,00     64,62
01:15:01       all      0,29      1,46      3,41     44,07      0,00     50,77
01:25:02       all      0,22      1,25      2,63     45,34      0,00     50,56
01:35:02       all      0,34      1,33      2,87     46,74      0,00     48,72
01:45:02       all      0,30      0,90      2,57     40,03      0,00     56,20
01:55:02       all      0,26      0,43      2,29      9,79      0,00     87,23

00:00:01       tps      rtps      wtps   bread/s   bwrtn/s
00:35:02    461,69    407,75     53,94  35196,06  32673,83
00:45:02    298,29    238,30     59,99  38553,34  33062,97
00:55:01    294,81    241,08     53,73  35469,66  25948,30
01:05:01    338,62    279,97     58,66  36164,27  31109,18
01:15:01    462,22    406,24     55,97  28428,26  25725,05
01:25:02    366,88    331,82     35,07  24160,53  22284,83
01:35:02    394,73    358,21     36,52  25770,79  23516,81
01:45:02    409,83    379,66     30,17  17874,79  15608,74
01:55:02    453,18    448,62      4,56   3754,82     79,47

So yes, there is substantial I/O going on.
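
For anyone who wants to compare the %iowait figures around the expire event without reading the columns by eye, here is a minimal Python sketch. It only re-reads the CPU rows of the sar output above (decimal commas as printed by the locale). The names SAR_CPU, EXPIRE_AT, parse_iowait and mean are made up for this illustration and are not part of sar or autofs. Note that each sar row is an average over the interval ending at its timestamp, so this is a rough summary, not a diagnosis.

```python
# CPU rows copied verbatim from the sar output in this mail.
# Columns: time, CPU, %user, %nice, %system, %iowait, %steal, %idle
SAR_CPU = """\
00:35:02 all 0,31 1,90 3,88 29,78 0,00 64,12
00:45:02 all 0,72 1,99 3,56 23,93 0,00 69,80
00:55:01 all 1,35 1,49 4,13 23,76 0,00 69,27
01:05:01 all 0,77 1,84 4,43 28,34 0,00 64,62
01:15:01 all 0,29 1,46 3,41 44,07 0,00 50,77
01:25:02 all 0,22 1,25 2,63 45,34 0,00 50,56
01:35:02 all 0,34 1,33 2,87 46,74 0,00 48,72
01:45:02 all 0,30 0,90 2,57 40,03 0,00 56,20
01:55:02 all 0,26 0,43 2,29 9,79 0,00 87,23
"""

# Time of the autofs expire event as stated in this mail.
EXPIRE_AT = "01:15:01"


def parse_iowait(text):
    """Return a list of (timestamp, %iowait) pairs from sar CPU rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        # sar printed decimal commas here; convert them for float().
        rows.append((fields[0], float(fields[5].replace(",", "."))))
    return rows


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


rows = parse_iowait(SAR_CPU)
# Zero-padded HH:MM:SS strings from the same night compare correctly as text.
before = [v for t, v in rows if t < EXPIRE_AT]
after = [v for t, v in rows if t >= EXPIRE_AT]
peak_time, peak_value = max(rows, key=lambda r: r[1])

print("mean %%iowait before %s: %.2f" % (EXPIRE_AT, mean(before)))
print("mean %%iowait from %s on: %.2f" % (EXPIRE_AT, mean(after)))
print("peak %%iowait: %.2f at %s" % (peak_value, peak_time))
```

The same approach would work on the tps/bread/s table by changing the expected field count and the column index.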