syscall rmdir hangs with autofs

public inbox for kvm@vger.kernel.org

* syscall rmdir hangs with autofs
@ 2010-07-19  8:39 Sebastian Hetze
  2010-07-19 11:12 ` Avi Kivity
  0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Hetze @ 2010-07-19  8:39 UTC (permalink / raw)
  To: kvm

Hi *,

We are encountering occasional problems with autofs running inside
a KVM guest.

[1387441.969106] INFO: task automount:26560 blocked for more than 120 seconds.
[1387441.969110] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1387441.969112] automount     D e8510198     0 26560   2702 0x00000000
[1387441.969117]  db0a1ef4 00000082 80000000 e8510198 0004ed69 c8266000 f6e85a40 00000000
[1387441.969123]  c08455e0 c08455e0 f41157f0 f4115a88 c55315e0 00000000 c0207c0a db0a1ef0
[1387441.969128]  f4115a88 f7222bbc f7222bb8 ffffffff db0a1f20 c05976ae db0a1f14 f41157f0
[1387441.969133] Call Trace:
[1387441.969140]  [<c0207c0a>] ? mntput_no_expire+0x1a/0xd0
[1387441.969146]  [<c05976ae>] __mutex_lock_slowpath+0xbe/0x120
[1387441.969149]  [<c05975d0>] mutex_lock+0x20/0x40
[1387441.969152]  [<c01fbc82>] do_rmdir+0x52/0xe0
[1387441.969155]  [<c059ae47>] ? do_page_fault+0x1d7/0x3a0
[1387441.969158]  [<c01fbd70>] sys_rmdir+0x10/0x20
[1387441.969161]  [<c01033cc>] syscall_call+0x7/0xb

The block always occurs in sys_rmdir when automount tries to remove the
mountpoint right after unmounting the filesystem. There is a successful lstat()
on the mountpoint directly preceding the rmdir call.

It looks like we are triggering some sort of race condition here.

We are currently using the 2.6.31-20-generic-pae Ubuntu kernel in the 6-CPU guest,
and vanilla 2.6.34 with qemu-kvm-0.12.4 on the host. But the problem existed
long before, across all the different combinations of guest/host/qemu versions.
The virtual HD is if=ide,format=host_device,cache=none on a DRBD container
on top of an LVM device. The FS is ext3.

Unfortunately, the problem is not easily reproducible. It occurs every one
or two weeks. But since the hanging system call blocks the whole filesystem,
we have to reboot the guest to get it back into a usable state.
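
One way to collect more data the next time the hang occurs is to list every
thread that sits in uninterruptible sleep (state D) together with the kernel
symbol it is waiting in. The following is only a sketch: the function names
are made up for illustration, it assumes a Python 3 interpreter is available
in the guest, and it assumes /proc/<pid>/task/<tid>/wchan is readable there.

```python
import os


def parse_stat(stat_line):
    """Return (id, comm, state) from the contents of a /proc stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    task_id = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return task_id, comm, state


def blocked_tasks(proc="/proc"):
    """Yield (tid, comm, wchan) for every thread in uninterruptible sleep."""
    for pid in filter(str.isdigit, os.listdir(proc)):
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    task_id, comm, state = parse_stat(f.read())
            except (OSError, ValueError):
                continue  # thread exited or stat was unreadable
            if state != "D":
                continue
            try:
                with open(os.path.join(task_dir, tid, "wchan")) as f:
                    wchan = f.read().strip() or "?"
            except OSError:
                wchan = "?"
            yield task_id, comm, wchan
```

Printing the result of blocked_tasks() while automount is stuck would show
whether other threads are blocked as well, which might hint at who holds the
mutex that do_rmdir is waiting for.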

Any ideas what's going wrong here?

Best regards,

  Sebastian


* Re: syscall rmdir hangs with autofs
  2010-07-19  8:39 syscall rmdir hangs with autofs Sebastian Hetze
@ 2010-07-19 11:12 ` Avi Kivity
  2010-07-19 11:40   ` Sebastian Hetze
       [not found]   ` <20100719114034.62BDD30303F5@mail.linux-ag.de>
  0 siblings, 2 replies; 14+ messages in thread
From: Avi Kivity @ 2010-07-19 11:12 UTC (permalink / raw)
  To: Sebastian Hetze; +Cc: kvm

On 07/19/2010 11:39 AM, Sebastian Hetze wrote:
> Any ideas what's going wrong here?
>
>    

Is there substantial I/O going on?

If not, it may be an autofs bug unrelated to kvm.

-- 
error compiling committee.c: too many arguments to function



* Re: syscall rmdir hangs with autofs
  2010-07-19 11:12 ` Avi Kivity
@ 2010-07-19 11:40   ` Sebastian Hetze
       [not found]   ` <20100719114034.62BDD30303F5@mail.linux-ag.de>
  1 sibling, 0 replies; 14+ messages in thread
From: Sebastian Hetze @ 2010-07-19 11:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Sebastian Hetze, kvm

On Mon, Jul 19, 2010 at 02:12:59PM +0300, Avi Kivity wrote:
> Is there substantial I/O going on?
>
> If not, it may be an autofs bug unrelated to kvm.

The autofs expire event occurred at 01:15:01.

sar shows

00:00:01   CPU     %user     %nice   %system   %iowait  %steal   %idle
00:35:02   all      0,31      1,90      3,88     29,78    0,00   64,12
00:45:02   all      0,72      1,99      3,56     23,93    0,00   69,80
00:55:01   all      1,35      1,49      4,13     23,76    0,00   69,27
01:05:01   all      0,77      1,84      4,43     28,34    0,00   64,62
01:15:01   all      0,29      1,46      3,41     44,07    0,00   50,77
01:25:02   all      0,22      1,25      2,63     45,34    0,00   50,56
01:35:02   all      0,34      1,33      2,87     46,74    0,00   48,72
01:45:02   all      0,30      0,90      2,57     40,03    0,00   56,20
01:55:02   all      0,26      0,43      2,29      9,79    0,00   87,23

00:00:01          tps      rtps      wtps   bread/s   bwrtn/s
00:35:02       461,69    407,75     53,94  35196,06  32673,83
00:45:02       298,29    238,30     59,99  38553,34  33062,97
00:55:01       294,81    241,08     53,73  35469,66  25948,30
01:05:01       338,62    279,97     58,66  36164,27  31109,18
01:15:01       462,22    406,24     55,97  28428,26  25725,05
01:25:02       366,88    331,82     35,07  24160,53  22284,83
01:35:02       394,73    358,21     36,52  25770,79  23516,81
01:45:02       409,83    379,66     30,17  17874,79  15608,74
01:55:02       453,18    448,62      4,56   3754,82     79,47

So yes, there is substantial I/O going on.
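
For anyone who wants to pull such numbers out of sar output programmatically,
here is a small sketch. It assumes the CPU report layout shown above
(timestamp, CPU column, then percentages) and the decimal commas produced by
the locale in use; the function name and the SAMPLE excerpt are only for
illustration.

```python
def parse_sar_cpu(text):
    """Parse sar CPU rows into {timestamp: {column: float}}."""
    rows = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "%iowait" in fields:
            # Header row: first field is a timestamp, second is "CPU".
            header = fields[2:]
            continue
        if header is None or len(fields) != len(header) + 2:
            continue  # not a data row that matches the header
        # The report uses decimal commas, so convert them before parsing.
        values = [float(v.replace(",", ".")) for v in fields[2:]]
        rows[fields[0]] = dict(zip(header, values))
    return rows


# Three rows copied from the report above.
SAMPLE = """\
00:00:01   CPU     %user     %nice   %system   %iowait  %steal   %idle
01:05:01   all      0,77      1,84      4,43     28,34    0,00   64,62
01:15:01   all      0,29      1,46      3,41     44,07    0,00   50,77
01:25:02   all      0,22      1,25      2,63     45,34    0,00   50,56
"""

rows = parse_sar_cpu(SAMPLE)
print(rows["01:15:01"]["%iowait"])
```

The lookup at 01:15:01 corresponds to the interval in which the expire event
happened.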


* Re: syscall rmdir hangs with autofs
       [not found]   ` <20100719114034.62BDD30303F5@mail.linux-ag.de>
@ 2010-07-19 12:21     ` Avi Kivity
  2010-07-19 12:48       ` Sebastian Hetze
  0 siblings, 1 reply; 14+ messages in thread
From: Avi Kivity @ 2010-07-19 12:21 UTC (permalink / raw)
  To: Sebastian Hetze; +Cc: kvm

On 07/19/2010 02:40 PM, Sebastian Hetze wrote:
> The autofs expire event occurred at 01:15:01.
>
> sar shows
>
> 00:00:01   CPU     %user     %nice   %system   %iowait  %steal   %idle
> 00:35:02   all      0,31      1,90      3,88     29,78    0,00   64,12
> 00:45:02   all      0,72      1,99      3,56     23,93    0,00   69,80
> 00:55:01   all      1,35      1,49      4,13     23,76    0,00   69,27
> 01:05:01   all      0,77      1,84      4,43     28,34    0,00   64,62
> 01:15:01   all      0,29      1,46      3,41     44,07    0,00   50,77
> 01:25:02   all      0,22      1,25      2,63     45,34    0,00   50,56
> 01:35:02   all      0,34      1,33      2,87     46,74    0,00   48,72
> 01:45:02   all      0,30      0,90      2,57     40,03    0,00   56,20
> 01:55:02   all      0,26      0,43      2,29      9,79    0,00   87,23
>
> 00:00:01          tps      rtps      wtps   bread/s   bwrtn/s
> 00:35:02       461,69    407,75     53,94  35196,06  32673,83
> 00:45:02       298,29    238,30     59,99  38553,34  33062,97
> 00:55:01       294,81    241,08     53,73  35469,66  25948,30
> 01:05:01       338,62    279,97     58,66  36164,27  31109,18
> 01:15:01       462,22    406,24     55,97  28428,26  25725,05
> 01:25:02       366,88    331,82     35,07  24160,53  22284,83
> 01:35:02       394,73    358,21     36,52  25770,79  23516,81
> 01:45:02       409,83    379,66     30,17  17874,79  15608,74
> 01:55:02       453,18    448,62      4,56   3754,82     79,47
>
> So yes, there is substantial I/O going on.
>    

Looks like a false alarm then.  The rmdir is waiting for the mount to 
flush everything to disk, which is slow and takes a while.

Does it return eventually?

Perhaps it should do a lazy unmount.

-- 
error compiling committee.c: too many arguments to function



* Re: syscall rmdir hangs with autofs
  2010-07-19 12:21     ` Avi Kivity
@ 2010-07-19 12:48       ` Sebastian Hetze
  2010-07-19 13:09         ` Avi Kivity
  0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Hetze @ 2010-07-19 12:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Sebastian Hetze, kvm

On Mon, Jul 19, 2010 at 03:21:44PM +0300, Avi Kivity wrote:
> Looks like a false alarm then.  The rmdir is waiting for the mount to  
> flush everything to disk, which is slow and takes a while.
>
> Does it return eventually?

No, it does not return within hours (>10). And the problem occurs only
once in a while although the system is busy every day (and night)
and automount is mounting/expiring frequently.

I would expect the "/bin/umount dir" process (which is forked by
automount if I read the code correctly) to return only after the flush
is complete. So I expect the rmdir to be called afterwards on a plain
empty directory.

BTW: the mount is just a bind mount, so no flush should be necessary
anyway.
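
To make the sequence under discussion concrete, here is a sketch of the three
steps as described in this thread: unmount, lstat, rmdir. This is not
automount's actual code. The function names are hypothetical, and the unmount
step is injectable so the rest can be exercised without root privileges.

```python
import os
import subprocess


def umount(path):
    """Unmount `path` by running the umount binary (needs privileges)."""
    subprocess.run(["umount", path], check=True)


def expire_mountpoint(path, do_umount=umount):
    """Mimic the reported sequence: umount, lstat, then rmdir."""
    do_umount(path)   # expected to return only once the unmount is done
    os.lstat(path)    # this call succeeds in the reported case
    os.rmdir(path)    # this is the call seen blocking in do_rmdir
```

Running expire_mountpoint() in a loop against a freshly created bind mount
would be one way to try to reproduce the hang under load, although nothing
here guarantees that it triggers the problem.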


* Re: syscall rmdir hangs with autofs
  2010-07-19 12:48       ` Sebastian Hetze
@ 2010-07-19 13:09         ` Avi Kivity
  2010-07-19 13:45           ` Sebastian Hetze
       [not found]           ` <20100719134558.A0CD2A005F@mail.linux-ag.de>
  0 siblings, 2 replies; 14+ messages in thread
From: Avi Kivity @ 2010-07-19 13:09 UTC (permalink / raw)
  To: Sebastian Hetze; +Cc: kvm

On 07/19/2010 03:48 PM, Sebastian Hetze wrote:
>
>> Looks like a false alarm then.  The rmdir is waiting for the mount to
>> flush everything to disk, which is slow and takes a while.
>>
>> Does it return eventually?
>>      
> No, it does not return within hours (>10). And the problem occurs only
> once in a while although the system is busy every day (and night)
> and automount is mounting/expiring frequently.
>
> I would expect the "/bin/umount dir" process (which is forked by
> automount if I read the code correctly) to return only after the flush
> is complete. So I expect the rmdir to be called afterwards on a plain
> empty directory.
>
> BTW: the mount is just a bind mount, so no flush should be necessary
> anyway.
>    

It looks like VFS breakage.  Does this happen with older guest kernels?

-- 
error compiling committee.c: too many arguments to function



* Re: syscall rmdir hangs with autofs
  2010-07-19 13:09         ` Avi Kivity
@ 2010-07-19 13:45           ` Sebastian Hetze
       [not found]           ` <20100719134558.A0CD2A005F@mail.linux-ag.de>
  1 sibling, 0 replies; 14+ messages in thread
From: Sebastian Hetze @ 2010-07-19 13:45 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Sebastian Hetze, kvm

On Mon, Jul 19, 2010 at 04:09:12PM +0300, Avi Kivity wrote:
> On 07/19/2010 03:48 PM, Sebastian Hetze wrote:
>>
>>> Looks like a false alarm then.  The rmdir is waiting for the mount to
>>> flush everything to disk, which is slow and takes a while.
>>>
>>> Does it return eventually?
>>>      
>> No, it does not return within hours (>10). And the problem occurs only
>> once in a while although the system is busy every day (and night)
>> and automount is mounting/expiring frequently.
>>
>> I would expect the "/bin/umount dir" process (which is forked by
>> automount if I read the code correctly) to return only after the flush
>> is complete. So I expect the rmdir to be called afterwards on a plain
>> empty directory.
>>
>> BTW: the mount is just a bind mount, so no flush should be necessary
>> anyway.
>>    
>
> It looks like VFS breakage.  Does this happen with older guest kernels?

Currently I can confirm this problem with 2.6.31-22-generic-pae back to
2.6.31-16-generic-pae. What guest kernel version would you suggest?


* Re: syscall rmdir hangs with autofs
       [not found]           ` <20100719134558.A0CD2A005F@mail.linux-ag.de>
@ 2010-07-19 14:00             ` Avi Kivity
  2010-07-19 14:47               ` Sebastian Hetze
       [not found]               ` <20100719144750.334F2303001B@mail.linux-ag.de>
  0 siblings, 2 replies; 14+ messages in thread
From: Avi Kivity @ 2010-07-19 14:00 UTC (permalink / raw)
  To: Sebastian Hetze; +Cc: kvm

On 07/19/2010 04:45 PM, Sebastian Hetze wrote:
>
>> It looks like VFS breakage.  Does this happen with older guest kernels?
>>      
> currently I can confirm this problem with 2.6.31-22-generic-pae back to
> 2.6.31-16-generic-pae. What guest kernel version would you suggest?
>    

Can you try it on a non-virtualized system?

-- 
error compiling committee.c: too many arguments to function



* Re: syscall rmdir hangs with autofs
  2010-07-19 14:00             ` Avi Kivity
@ 2010-07-19 14:47               ` Sebastian Hetze
       [not found]               ` <20100719144750.334F2303001B@mail.linux-ag.de>
  1 sibling, 0 replies; 14+ messages in thread
From: Sebastian Hetze @ 2010-07-19 14:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Sebastian Hetze, kvm

On Mon, Jul 19, 2010 at 05:00:52PM +0300, Avi Kivity wrote:
> On 07/19/2010 04:45 PM, Sebastian Hetze wrote:
>>
>>> It looks like VFS breakage.  Does this happen with older guest kernels?
>>>      
>> currently I can confirm this problem with 2.6.31-22-generic-pae back to
>> 2.6.31-16-generic-pae. What guest kernel version would you suggest?
>>    
>
> Can you try it on a non-virtualized system?

This automount setup ran non-virtualized for quite a while, without
problems, until last year. We never saw or heard of similar hangs
with automount before.
We have not been able to reproduce this error under laboratory conditions.
The production system has 2TB of data and the problem occurs totally
unpredictably under real-life workload.
And there is no easy way to switch back to non-virtualized unless we
declare this whole project a failure.


* Re: syscall rmdir hangs with autofs
       [not found]               ` <20100719144750.334F2303001B@mail.linux-ag.de>
@ 2010-07-19 15:03                 ` Avi Kivity
  2010-07-19 15:23                   ` Sebastian Hetze
       [not found]                   ` <20100719152518.641BAB001A@mail.linux-ag.de>
  0 siblings, 2 replies; 14+ messages in thread
From: Avi Kivity @ 2010-07-19 15:03 UTC (permalink / raw)
  To: Sebastian Hetze; +Cc: kvm

On 07/19/2010 05:47 PM, Sebastian Hetze wrote:
> On Mon, Jul 19, 2010 at 05:00:52PM +0300, Avi Kivity wrote:
>    
>> On 07/19/2010 04:45 PM, Sebastian Hetze wrote:
>>      
>>>        
>>>> It looks like VFS breakage.  Does this happen with older guest kernels?
>>>>
>>>>          
>>> currently I can confirm this problem with 2.6.31-22-generic-pae back to
>>> 2.6.31-16-generic-pae. What guest kernel version would you suggest?
>>>
>>>        
>> Can you try it on a non-virtualized system?
>>      
> This automount setup ran non-virtualized for quite a while, without
> problems, until last year. We never saw or heard of similar hangs
> with automount before.
> We have not been able to reproduce this error under laboratory conditions.
> The production system has 2TB of data and the problem occurs totally
> unpredictably under real-life workload.
> And there is no easy way to switch back to non-virtualized unless we
> declare this whole project a failure.
>    

Ahem.

What's your hardware platform?  EPT/NPT capable?  Host kernel version?

-- 
error compiling committee.c: too many arguments to function



* Re: syscall rmdir hangs with autofs
  2010-07-19 15:03                 ` Avi Kivity
@ 2010-07-19 15:23                   ` Sebastian Hetze
       [not found]                   ` <20100719152518.641BAB001A@mail.linux-ag.de>
  1 sibling, 0 replies; 14+ messages in thread
From: Sebastian Hetze @ 2010-07-19 15:23 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Sebastian Hetze, kvm

On Mon, Jul 19, 2010 at 06:03:30PM +0300, Avi Kivity wrote:
> On 07/19/2010 05:47 PM, Sebastian Hetze wrote:
>> On Mon, Jul 19, 2010 at 05:00:52PM +0300, Avi Kivity wrote:
>>    
>>> On 07/19/2010 04:45 PM, Sebastian Hetze wrote:
>>>      
>>>>        
>>>>> It looks like VFS breakage.  Does this happen with older guest kernels?
>>>>>
>>>>>          
>>>> currently I can confirm this problem with 2.6.31-22-generic-pae back to
>>>> 2.6.31-16-generic-pae. What guest kernel version would you suggest?
>>>>
>>>>        
>>> Can you try it on a non-virtualized system?
>>>      
>> This automount setup ran non-virtualized for quite a while, without
>> problems, until last year. We never saw or heard of similar hangs
>> with automount before.
>> We have not been able to reproduce this error under laboratory conditions.
>> The production system has 2TB of data and the problem occurs totally
>> unpredictably under real-life workload.
>> And there is no easy way to switch back to non-virtualized unless we
>> declare this whole project a failure.
>>    
>
> Ahem.
>
> What's your hardware platform?  EPT/NPT capable?  Host kernel version?

Intel S5520HC Board with 2 Xeon CPU E5520, HT enabled
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2
ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow
vnmi flexpriority ept vpid

(so yes, EPT is available)

host kernel is vanilla 2.6.34 
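
The same check can be scripted. The sketch below assumes the usual
/proc/cpuinfo layout with a single "flags" line per CPU, and that nested
paging shows up as the "ept" flag on Intel or "npt" on AMD. The function
names are only illustrative.

```python
def cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def virt_support(flags):
    """Summarise hardware virtualization features found in a flag set."""
    return {
        # vmx is Intel VT-x, svm is AMD-V
        "hw_virt": bool(flags & {"vmx", "svm"}),
        # ept is Intel's nested paging, npt is AMD's
        "nested_paging": bool(flags & {"ept", "npt"}),
    }
```

On the host, passing the contents of /proc/cpuinfo through cpu_flags() and
then virt_support() should report both entries as true for the CPU above.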




* Re: syscall rmdir hangs with autofs
       [not found]                   ` <20100719152518.641BAB001A@mail.linux-ag.de>
@ 2010-07-19 15:28                     ` Avi Kivity
  2010-07-19 15:38                       ` Sebastian Hetze
       [not found]                       ` <20100719153816.1E33FB0016@mail.linux-ag.de>
  0 siblings, 2 replies; 14+ messages in thread
From: Avi Kivity @ 2010-07-19 15:28 UTC (permalink / raw)
  To: Sebastian Hetze; +Cc: kvm

On 07/19/2010 06:23 PM, Sebastian Hetze wrote:
>
>> What's your hardware platform?  EPT/NPT capable?  Host kernel version?
>>      
> Intel S5520HC Board with 2 Xeon CPU E5520, HT enabled
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
> syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
> xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2
> ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow
> vnmi flexpriority ept vpid
>
> (so yes, EPT is available)
>
> host kernel is vanilla 2.6.34
>
>    

Well, EPT makes it unlikely that there's an mmu bug involved.

Is the guest smp?  Can you try UP?

Can you try 2.6.32.latest in the guest?

-- 
error compiling committee.c: too many arguments to function



* Re: syscall rmdir hangs with autofs
  2010-07-19 15:28                     ` Avi Kivity
@ 2010-07-19 15:38                       ` Sebastian Hetze
       [not found]                       ` <20100719153816.1E33FB0016@mail.linux-ag.de>
  1 sibling, 0 replies; 14+ messages in thread
From: Sebastian Hetze @ 2010-07-19 15:38 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Sebastian Hetze, kvm

On Mon, Jul 19, 2010 at 06:28:32PM +0300, Avi Kivity wrote:
> On 07/19/2010 06:23 PM, Sebastian Hetze wrote:
>>
>>> What's your hardware platform?  EPT/NPT capable?  Host kernel version?
>>>      
>> Intel S5520HC Board with 2 Xeon CPU E5520, HT enabled
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
>> syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
>> xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2
>> ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow
>> vnmi flexpriority ept vpid
>>
>> (so yes, EPT is available)
>>
>> host kernel is vanilla 2.6.34
>>
>>    
>
> Well, EPT makes it unlikely that there's an mmu bug involved.
>
> Is the guest smp?  Can you try UP?

Yes, the guest starts with -smp 6.

Since the system is quite busy and the problem occurs only every two
weeks or so, UP is not an option.

>
> Can you try 2.6.32.latest in the guest?

This we can certainly do. Is 2.6.32.16 your recommendation,
or will any newer kernel do as well?



* Re: syscall rmdir hangs with autofs
       [not found]                       ` <20100719153816.1E33FB0016@mail.linux-ag.de>
@ 2010-07-19 17:55                         ` Avi Kivity
  0 siblings, 0 replies; 14+ messages in thread
From: Avi Kivity @ 2010-07-19 17:55 UTC (permalink / raw)
  To: Sebastian Hetze; +Cc: kvm

On 07/19/2010 06:38 PM, Sebastian Hetze wrote:
>
>> Well, EPT makes it unlikely that there's an mmu bug involved.
>>
>> Is the guest smp?  Can you try UP?
>>      
> yes, the guest starts with -smp 6
>
> Since the system is quite busy and the problem occurs only every two
> weeks or so, UP is not an option.
>    

I see.

>> Can you try 2.6.32.latest in the guest?
>>      
> This we can certainly do. Is 2.6.32.16 your recommendation
> or will any newer kernel do also?
>
>    

2.6.32.16 is probably the most stable one out there.  Note that there's a
problem exposed by certain versions of gcc; I recommend compiling with
CONFIG_DEBUG_RODATA=n, or the kernel may fail to boot.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

