linux-lvm.redhat.com archive mirror
* [linux-lvm] DM suspend locks up under load?
@ 2012-01-04 22:50 David Shaw
From: David Shaw @ 2012-01-04 22:50 UTC
  To: linux-lvm


Hi,

I'm using some code that creates a snapshot directly with DM (we aren't using LVM). Essentially, it does:

  suspend linear device X
  reload X as a "snapshot-origin" device
  create "snapshot" device
  resume original X device (which is now a snapshot-origin)

This has worked fine for several years.  Recently, however, we upgraded to a newer system and to ext4, and are seeing something odd: under load, the process above freezes at the first suspend step and locks up the device in question, which then requires a reboot.

I wrote the attached program to demonstrate the problem.  All it does is call DM_DEVICE_SUSPEND and DM_DEVICE_RESUME over and over on a DM device.  To reproduce, run the test program on any mounted linear DM target in one shell, then delete a lot of data from a directory on that device in another shell.  On my systems this leaves both the test program and the rm stuck in D state, and only a reboot recovers.

I've tried several kernels; at the moment I'm using kernel-PAE-2.6.35.6-45.fc14.i686 and device-mapper-libs-1.02.63-2.fc14.i686.

One clue I can add is that it only seems to happen if the filesystem on the device is ext4.  It does not happen with ext3.

Any ideas on where I should look next?

Thanks,

David


[-- Attachment #2: suspendtest.c --]

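/* suspendtest.c: repeatedly suspend and resume a device-mapper device.
   Build with: gcc -o suspendtest suspendtest.c -ldevmapper */
#include <stdio.h>
#include <libdevmapper.h>

/* Run a single DM ioctl (e.g. DM_DEVICE_SUSPEND or DM_DEVICE_RESUME)
   against the named device.  Returns nonzero on success, 0 on failure. */
static int
dm_command(int command,const char *device)
{
  int ret=0;
  struct dm_task *dmt;

  dmt=dm_task_create(command);
  if(!dmt)
    return 0;

  if(!dm_task_set_name(dmt,device))
    goto fail;

  /* Issue the ioctl; dm_task_run() returns 1 on success, 0 on error. */
  ret=dm_task_run(dmt);

 fail:

  dm_task_destroy(dmt);

  return ret;
}

int
main(int argc,char *argv[])
{
  if(argc<2)
    {
      printf("%s <DM name>\n",argv[0]);
      return 1;
    }

  /* Disable udev synchronisation so libdevmapper does not wait for
     udev to process the event generated by each suspend/resume. */
  dm_udev_set_sync_support(0);

  printf("Suspending and resuming %s\n",argv[1]);

  /* Loop until an ioctl fails.  The spinner ("/" after each suspend,
     "\" after each resume) stops moving if a suspend hangs. */
  for(;;)
    {
      if(!dm_command(DM_DEVICE_SUSPEND,argv[1]))
        {
          fprintf(stderr,"Unable to suspend %s\n",argv[1]);
          break;
        }

      printf("/\r");
      fflush(stdout);

      if(!dm_command(DM_DEVICE_RESUME,argv[1]))
        {
          fprintf(stderr,"Unable to resume %s\n",argv[1]);
          break;
        }

      printf("\\\r");
      fflush(stdout);
    }

  return 0;
}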


* Re: [linux-lvm] DM suspend locks up under load?
@ 2012-01-05 10:44 ` Zdenek Kabelac
From: Zdenek Kabelac @ 2012-01-05 10:44 UTC
  To: LVM general discussion and development

On 4.1.2012 23:50, David Shaw wrote:
> [...]
> One clue I can add is that it only seems to happen if the filesystem on the device is ext4.  It does not happen with ext3.
>
> Any ideas on where I should look next?
>

If there's no problem with DM and ext3, maybe you should suspect ext4?

I guess you need to get a stack trace of where the system locks up
(echo t > /proc/sysrq-trigger, or SysRq+T).

You should probably also try a different kernel.

Zdenek


* Re: [linux-lvm] DM suspend locks up under load?
@ 2012-01-10 23:20   ` David Shaw
From: David Shaw @ 2012-01-10 23:20 UTC
  To: Zdenek Kabelac; +Cc: LVM general discussion and development

On Jan 5, 2012, at 5:44 AM, Zdenek Kabelac wrote:

> On 4.1.2012 23:50, David Shaw wrote:
>> [...]
>> One clue I can add is that it only seems to happen if the filesystem on the device is ext4.  It does not happen with ext3.
>> [...]
>
> If there's no problem with DM and ext3, maybe you should suspect ext4?
>
> I guess you need to get a stack trace of where the system locks up
> (echo t > /proc/sysrq-trigger, or SysRq+T).
>
> You should probably also try a different kernel.

Thanks for the tip!  It did indeed turn out to be ext4, and it was already fixed: http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=be4f27d324e8ddd57cc0d4d604fe85ee0425cba9

David
