linux-raid.vger.kernel.org archive mirror
* Single-drive RAID0
@ 2011-02-08 16:15 Wojcik, Krzysztof
  2011-02-09  2:45 ` NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: Wojcik, Krzysztof @ 2011-02-08 16:15 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid@vger.kernel.org

Neil,

I am trying to create a single-drive RAID0 array with data preservation.
That is, I have a single disk with partitions and data on them.
I would like to migrate this single disk to a RAID0 array and then grow it to N devices.
I expect the partitions holding my data to be mapped to the corresponding MD block devices (/dev/md126p1, /dev/md126p2, etc.). Indeed they are, but when I try to mount the new block device it disappears from the /dev directory and the mount does not succeed.
/var/log/messages keeps showing:
md126: p1
md126: detected capacity change from 0 to xxxxxxxxxxxxx
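
For clarity, the migration I am attempting looks roughly like this (device names are examples; the --grow step is only illustrative, since RAID0 reshape support depends on the mdadm and kernel version):

  mdadm -C /dev/md/vol_1 -e 0.9 -l 0 -n 1 /dev/sda --force   # single-drive RAID0 over the existing disk
  # later, expand to more member devices (illustrative):
  # mdadm --grow /dev/md126 --raid-devices=2 --add /dev/sdb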

It is not always reproducible, but it happens very often.
I have tried this with both 0.9 and imsm metadata; the symptoms are similar.

Do you have any suggestions on how to fix this issue?

Regards
Krzysztof


* Re: Single-drive RAID0
  2011-02-08 16:15 Single-drive RAID0 Wojcik, Krzysztof
@ 2011-02-09  2:45 ` NeilBrown
  2011-02-09 14:33   ` Wojcik, Krzysztof
  0 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2011-02-09  2:45 UTC (permalink / raw)
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Tue, 8 Feb 2011 16:15:53 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> Neil,
> 
> I try to create a single-drive RAID0 array with data preservation.
> It means I have a single disk with partitions and data on them. 
> I would like to migrate this single disk to RAID0 array and then grow it to N devices.
> Of course I expect partitions with my useful data will be mapped to corresponding MD block devices (/dev/md126p1, /dev/md126p2 etc.). Indeed they are but when I try to mount new block device it disappears from /dev directory and mount is not succeeded. 
> /var/log/messages shows all the time: 
> md126: p1
> md126: detected capacity change from 0 to xxxxxxxxxxxxx
> 
> It is not always reproducible but very often.
> I try this operation on 0.9 and imsm metadata. Symptoms are similar.
> 
> Do you have any suggestions? How to fix this issue?
> 
> Regards
> Krzysztof

It isn't clear to me what the issue is and when I try something that might be
what you are suggesting it works perfectly every time.

Maybe if you could provide something more detailed and specific.
e.g. a series of steps that I can try together with all the messages you get
(both from the kernel and from mdadm) throughout the process.

That will probably help.

NeilBrown


* RE: Single-drive RAID0
  2011-02-09  2:45 ` NeilBrown
@ 2011-02-09 14:33   ` Wojcik, Krzysztof
  2011-02-14  2:11     ` NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: Wojcik, Krzysztof @ 2011-02-09 14:33 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2750 bytes --]


Steps to reproduce:
1. Create a partition on the disk (for example: 1 partition, sized at half of /dev/sda)
2. Create a filesystem on the partition (ext3)
3. Mount the partition and write some control data to it
4. Unmount the partition
5. Create a RAID0 array on the disk (mdadm -C /dev/md/vol_1 -amd -e 0.9 -c 64 -l 0 -n 1 /dev/sda --force)
6. md127 and md127p1 appear in /dev as expected
7. Try to mount md127p1 and verify the control data on it
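
The same steps as a shell transcript (partition layout, mount point and file names are illustrative):

  echo ',50000,L' | sfdisk -uM /dev/sda                 # one partition (size illustrative)
  mkfs -t ext3 /dev/sda1
  mount /dev/sda1 /mnt/test && echo "control" > /mnt/test/control.txt
  umount /mnt/test
  mdadm -C /dev/md/vol_1 -amd -e 0.9 -c 64 -l 0 -n 1 /dev/sda --force
  ls /dev/md127 /dev/md127p1                            # both should appear
  mount /dev/md127p1 /mnt/test                          # fails intermittently as described below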

Expected result:
The md127p1 device mounts and the control data is not corrupted

Actual result:
1. The mount command returns an error:
mount: special device /dev/md127p1 does not exist
2. md127p1 disappears from /dev/
3. The kernel prints indefinitely (until the array is stopped):
md126: p1
md126: detected capacity change from 0 to xxxxxxxxxxxxx
md126: p1
md126: detected capacity change from 0 to xxxxxxxxxxxxx
...


Additional Info:
Reproduced on:
- SLES11 SP1
- mdadm 3.2 from the top of your devel-3.2 branch
- kernel 2.6.38-rc2

I've attached the script I use to reproduce this issue.
For native metadata run:
./raid_ready_drive native
For imsm metadata just:
./raid_ready_drive
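
The script expects the target disk in the "disk" variable (either uncomment the disk= line in the script or pass it via the environment), for example:
disk=/dev/sdc ./raid_ready_drive native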

Regards
Krzysztof



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Wednesday, February 09, 2011 3:46 AM
> To: Wojcik, Krzysztof
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Single-drive RAID0
> 
> On Tue, 8 Feb 2011 16:15:53 +0000 "Wojcik, Krzysztof"
> <krzysztof.wojcik@intel.com> wrote:
> 
> > Neil,
> >
> > I try to create a single-drive RAID0 array with data preservation.
> > It means I have a single disk with partitions and data on them.
> > I would like to migrate this single disk to RAID0 array and then grow
> it to N devices.
> > Of course I expect partitions with my useful data will be mapped to
> corresponding MD block devices (/dev/md126p1, /dev/md126p2 etc.).
> Indeed they are but when I try to mount new block device it disappears
> from /dev directory and mount is not succeeded.
> 
> > /var/log/messages shows all the time:
> > md126: p1
> > md126: detected capacity change from 0 to xxxxxxxxxxxxx
> >
> > It is not always reproducible but very often.
> > I try this operation on 0.9 and imsm metadata. Symptoms are similar.
> >
> > Do you have any suggestions? How to fix this issue?
> >
> > Regards
> > Krzysztof
> 
> It isn't clear to me what the issue is and when I try something that
> might be
> what you are suggesting it works perfectly every time.
> 
> Maybe if you could provide something more detailed and specific.
> e.g. a series of steps that I can try together with all the messages
> you get
> (both from the kernel and from mdadm) throughout the process.
> 
> That will probably help.
> 
> NeilBrown

[-- Attachment #2: raid_ready_drive --]
[-- Type: application/octet-stream, Size: 2635 bytes --]

#!/bin/bash

V1MNT=/mnt/vol1
V2MNT=/mnt/vol2
CONT=/dev/md/imsm0
FILE_SIZE=100000000
CHUNK=128
SUM=./checksums.txt	# file used to record the checksums of the test data

# disk used for the test; set it here or pass it via the environment,
# e.g.: disk=/dev/sdc ./raid_ready_drive native
#disk=/dev/sda

N1=/dev/md/vol_1

# compare the actual exit status ($1) with the expected one ($2); abort on mismatch
function stat() {
#	set +x
	if [ $1 != $2 ] ; then
	    echo -e "\nCommand $3 returned wrong status (expected: $2, got: $1)\n"
	    exit 1
	fi
#	set -x
}

if [ -z $disk ] ; then
	echo "Disk for test not specified. Please set \"disk\" variable."
	exit
fi
echo "Disk to test= $disk"

if [ ! -d $V1MNT ] ; then
    mkdir $V1MNT
fi

if [ ! -d $V2MNT ] ; then
    mkdir $V2MNT
fi

#set -x

################## cleanup ############################
#unmount
umount $V1MNT
umount $V2MNT

#stop
res=1
while [ $res != 0 ] ; do
	mdadm -Ss
	res=$?
done

#clean
mdadm --zero-superblock $disk


#########################################################
######################### MAIN ##########################
#########################################################

################## prepare disk for tests ###############

sfdisk -uM $disk << EOF
,50000,L
,50000,L
EOF

udevadm settle

	P1=$disk'1'
	P2=$disk'2'
	mkfs $P1
	mkfs $P2

	mount $P1 $V1MNT
	mount $P2 $V2MNT

	openssl rand -out $V1MNT/test1.bin $FILE_SIZE
	SUM1=$(md5sum $V1MNT/test1.bin | gawk '{print $1}')
	openssl rand -out $V2MNT/test1.bin $FILE_SIZE
	SUM2=$(md5sum $V2MNT/test1.bin | gawk '{print $1}')

	echo "SUM1=$SUM1" > $SUM
	echo "SUM2=$SUM2" >> $SUM

	umount $V1MNT
	umount $V2MNT

######################## run test ########################
if [[ $1 = "native" ]] ; then
	#create vol
	mdadm -C $N1 -e 0.9 -amd -l 0 --chunk $CHUNK -n 1 $disk -R --force
	stat $? 0 create_vol1
else
	#create container
	mdadm -C $CONT -amd -e imsm -n 1 $disk -R --force
	stat $? 0 create_container

	#create vol
	mdadm -C $N1 -amd -l 0 --chunk $CHUNK -n 1 $disk -R --force
	stat $? 0 create_vol1
fi

#check status
cat /proc/mdstat

udevadm settle

ls /dev | grep md
ls /dev/md

ls /dev/md | grep vol_1p1 > /dev/null
if [ $? != 0 ] ; then
	echo "Error. Partition #1 not mapped!"
	exit
fi
ls /dev/md | grep vol_1p2 > /dev/null
if [ $? != 0 ] ; then
	echo "Error. Partition #2 not mapped!"
	exit
fi

#mount
mount $N1'p1' $V1MNT
stat $? 0 mount_vol1

mount $N1'p2' $V2MNT
stat $? 0 mount_vol2

SUM1_A=$(md5sum $V1MNT/test1.bin | gawk '{print $1}')
SUM2_A=$(md5sum $V2MNT/test1.bin | gawk '{print $1}')

if [[ $SUM1 != $SUM1_A ]] || [[ $SUM2 != $SUM2_A ]] ; then
    	echo "Error. Test failed. md5sum of files not equal!"
else
    echo "Test PASSED!"
fi

#############################################################
######################## END ################################
#############################################################



* Re: Single-drive RAID0
  2011-02-09 14:33   ` Wojcik, Krzysztof
@ 2011-02-14  2:11     ` NeilBrown
  2011-02-14 17:04       ` Wojcik, Krzysztof
  0 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2011-02-14  2:11 UTC (permalink / raw)
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Wed, 9 Feb 2011 14:33:01 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> 
> Steps to reproduce:
> 1. Create partition on disk (for example: 1 partition, size=half of /dev/sda disk)
> 2. Create filesystem on partition (ext3)
> 3. Mount partition and create some control data on partition
> 4. Unmount partition
> 5. Create RAID0 on disk (mdadm -C /dev/md/vol_1 -amd -e 0.9 -c 64 -l 0 -n 1 /dev/sda --force)
> 6. md127 and md127p1 appears in /dev as expected
> 7. Try to mount md127p1 and verify control data on them

I couldn't get this to fail for me.
It is possible that there is something subtle about the precise device
geometry.
Could you send me
   sfdisk -l /dev/sda
(or whatever device you are using)

However when I ran the script without requesting native metadata I did
get some results somewhat similar to yours - and they were not consistent,
which is an important similarity.

This patch should fix that particular problem.  Let me know if you
can still produce any of these errors with the patch applied.

Thanks,
NeilBrown



diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0cc30ec..6818ff4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4138,10 +4138,10 @@ array_size_store(mddev_t *mddev, const char *buf, size_t len)
 	}
 
 	mddev->array_sectors = sectors;
-	set_capacity(mddev->gendisk, mddev->array_sectors);
-	if (mddev->pers)
+	if (mddev->pers) {
+		set_capacity(mddev->gendisk, mddev->array_sectors);
 		revalidate_disk(mddev->gendisk);
-
+	}
 	return len;
 }
 


* RE: Single-drive RAID0
  2011-02-14  2:11     ` NeilBrown
@ 2011-02-14 17:04       ` Wojcik, Krzysztof
  2011-02-15  0:01         ` NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: Wojcik, Krzysztof @ 2011-02-14 17:04 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Monday, February 14, 2011 3:12 AM
> To: Wojcik, Krzysztof
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Single-drive RAID0
> 
> On Wed, 9 Feb 2011 14:33:01 +0000 "Wojcik, Krzysztof"
> <krzysztof.wojcik@intel.com> wrote:
> 
> >
> > Steps to reproduce:
> > 1. Create partition on disk (for example: 1 partition, size=half of
> /dev/sda disk)
> > 2. Create filesystem on partition (ext3)
> > 3. Mount partition and create some control data on partition
> > 4. Unmount partition
> > 5. Create RAID0 on disk (mdadm -C /dev/md/vol_1 -amd -e 0.9 -c 64 -l
> 0 -n 1 /dev/sda --force)
> > 6. md127 and md127p1 appears in /dev as expected
> > 7. Try to mount md127p1 and verify control data on them
> 
> I couldn't get this to fail for me.
> It is possible that there is something subtle about the precise device
> geometry.
> Could you send me
>    sfdisk -l /dev/sda
> (or whatever device you are using)
> 
> However when I ran the script without requesting native metadata I did
> get some results somewhat similar to yours - and they were not
> consistent
> which is an important similarity.
> 
> This patch should fix that particular problem.  Let me know if you
> can still produce any of these errors with the patch applied.


Unfortunately the issue is still reproducible with the patch applied.
I also tried to reproduce it on another setup (different PC and disks)... the issue is still there :(

An interesting observation is that when I stop the array just after creation and then reassemble it, everything works fine.
On an older kernel (I tried 2.6.34) the issue is NOT reproducible either.
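
The stop/re-assemble workaround, for the native-metadata case, is roughly (device names illustrative):
mdadm -S /dev/md127
mdadm -A /dev/md/vol_1 /dev/sda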

Regards
Krzysztof

> 
> Thanks,
> NeilBrown
> 
> 
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 0cc30ec..6818ff4 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4138,10 +4138,10 @@ array_size_store(mddev_t *mddev, const char
> *buf, size_t len)
>  	}
> 
>  	mddev->array_sectors = sectors;
> -	set_capacity(mddev->gendisk, mddev->array_sectors);
> -	if (mddev->pers)
> +	if (mddev->pers) {
> +		set_capacity(mddev->gendisk, mddev->array_sectors);
>  		revalidate_disk(mddev->gendisk);
> -
> +	}
>  	return len;
>  }
> 


* Re: Single-drive RAID0
  2011-02-14 17:04       ` Wojcik, Krzysztof
@ 2011-02-15  0:01         ` NeilBrown
  2011-02-15 16:30           ` Wojcik, Krzysztof
  0 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2011-02-15  0:01 UTC (permalink / raw)
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Mon, 14 Feb 2011 17:04:38 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:
> > It is possible that there is something subtle about the precise device
> > geometry.
> > Could you send me
> >    sfdisk -l /dev/sda
> > (or whatever device you are using)

You didn't include this information...







> > This patch should fix that particular problem.  Let me know if you
> > can still produce any of these errors with the patch applied.
> 
> 
> Unfortunately issue is reproducible with path applied.
> I tried to reproduce it on other setup (other PC and disks) also... issue exists :(
> 
> Interesting observation is that when I stop array just after creation and then reassemble it again, everything work fine.
> On older kernel version (I tried 2.6.34) issue is NOT reproducible also.
> 

All very strange.  And as I cannot reproduce it, it is very hard to debug.

Maybe some daemon is accessing the array at some awkward time and causing
different behaviour for you...

If it is repeatable enough that you could try 'git bisect' to find which
commit introduced the problem, that would be helpful but I suspect it would
be very time-consuming.

It might help to put a "WARN_ON(1)" in the place where it prints "detected
capacity change ..." so we get a stack trace and can see how it got there.
That might give a hint as to what is looping.
Also a printk in md_open if it returns ERESTARTSYS would be interesting.
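
For reference, one quick way to locate that message in the source, assuming a git checkout, is:
  git grep -n "detected capacity change"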

Thanks,
NeilBrown


* RE: Single-drive RAID0
  2011-02-15  0:01         ` NeilBrown
@ 2011-02-15 16:30           ` Wojcik, Krzysztof
  2011-02-16  0:42             ` NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: Wojcik, Krzysztof @ 2011-02-15 16:30 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2621 bytes --]



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Tuesday, February 15, 2011 1:02 AM
> To: Wojcik, Krzysztof
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Single-drive RAID0
> 
> On Mon, 14 Feb 2011 17:04:38 +0000 "Wojcik, Krzysztof"
> <krzysztof.wojcik@intel.com> wrote:
> > > It is possible that there is something subtle about the precise
> device
> > > geometry.
> > > Could you send me
> > >    sfdisk -l /dev/sda
> > > (or whatever device you are using)
> 
> You didn't include this information...

Disk /dev/sdc: 30515 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdc1          0+   6374    6375-  51207187   83  Linux
/dev/sdc2       6375   12749    6375   51207187+  83  Linux
/dev/sdc3          0       -       0          0    0  Empty
/dev/sdc4          0       -       0          0    0  Empty

> 
> > > This patch should fix that particular problem.  Let me know if you
> > > can still produce any of these errors with the patch applied.
> >
> >
> > Unfortunately issue is reproducible with path applied.
> > I tried to reproduce it on other setup (other PC and disks) also...
> issue exists :(
> >
> > Interesting observation is that when I stop array just after creation
> and then reassemble it again, everything work fine.
> > On older kernel version (I tried 2.6.34) issue is NOT reproducible
> also.
> >
> 
> All very strange.  And as I cannot reproduce it, it is very hard to
> debug.
> 
> Maybe some daemon is accessing the array at some awkward time and
> causing
> different behaviour for you...
> 
> If it is repeatable enough that you could try 'git bisect' to find
> which
> commit introduced the problem, that would be helpful but I suspect it
> would
> be very time-consuming.
> 
> It might help to put a "WARN_ON(1)" in the place where it prints
> "detected
> capacity change ..." so we get a stack trace and can see how it got
> there.
> That might git a hint to what is looping.
> Also a printk in md_open if it returns ERESTARTSYS would be
> interesting.

Attached is part of the kernel log with WARN_ON(1) and the value returned by md_open()
(lines prefixed with "##### KW: err= x").

I am also trying to look in new areas. I've run:
udevd --debug --debug-traces

The udev and kernel logs are attached.
Maybe they will help to find a solution...
It seems that udev adds and removes the device in a loop...
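
One way to watch that add/remove loop directly (assuming udevadm from the same udev) is:
udevadm monitor --kernel --udev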

Regards
Krzysztof

> 
> Thanks,
> NeilBrown

[-- Attachment #2: kernel_log.tgz --]
[-- Type: application/x-compressed, Size: 1396 bytes --]

[-- Attachment #3: kernel_with_udev.tgz --]
[-- Type: application/x-compressed, Size: 2855 bytes --]

[-- Attachment #4: udev_logs.tgz --]
[-- Type: application/x-compressed, Size: 5751 bytes --]


* Re: Single-drive RAID0
  2011-02-15 16:30           ` Wojcik, Krzysztof
@ 2011-02-16  0:42             ` NeilBrown
  2011-02-16 16:57               ` Wojcik, Krzysztof
  0 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2011-02-16  0:42 UTC (permalink / raw)
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Tue, 15 Feb 2011 16:30:05 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> > It might help to put a "WARN_ON(1)" in the place where it prints
> > "detected
> > capacity change ..." so we get a stack trace and can see how it got
> > there.
> > That might git a hint to what is looping.
> > Also a printk in md_open if it returns ERESTARTSYS would be
> > interesting.
> 
> In attachment part of logs from kernel with WARN_ON(1) and value returned by md_open() 
> (line preceded with "##### KW: err= x").
> 
> I am trying to look for in new areas. I've run:
> Udevd --debug --dedug-traces
> 
> Logs from udev and kernel in attachment.
> Maybe it will help to find solution...
> It seems to udev adds and removes device in loop...

Thanks for the extra logs... they help a bit, but I'm not a lot closer.

I've seen something a little bit like this which was fixed by adding

TEST!="md/array_state", GOTO="md_end"

just after

# container devices have a metadata version of e.g. 'external:ddf' and
# never leave state 'inactive'

in /lib/udev/rules.d/64-md-raid.rules
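
i.e. a sketch of the placement (the surrounding, distribution-specific rules are omitted):

# container devices have a metadata version of e.g. 'external:ddf' and
# never leave state 'inactive'
TEST!="md/array_state", GOTO="md_end"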


Could you try that??

It looks like the 'bdev' passed to md_open keeps changing, which it shouldn't.

If the above doesn't help, please add:

	printk("bdev=%p, mddev=%p, disk=%p dev=%x\n", bdev, mddev, mddev->gendisk, bdev->bd_dev);

at the top of 'md_open', and see what it produces.

Thanks,
NeilBrown



* RE: Single-drive RAID0
  2011-02-16  0:42             ` NeilBrown
@ 2011-02-16 16:57               ` Wojcik, Krzysztof
  2011-02-17  0:38                 ` NeilBrown
  2011-02-21  2:15                 ` NeilBrown
  0 siblings, 2 replies; 16+ messages in thread
From: Wojcik, Krzysztof @ 2011-02-16 16:57 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1938 bytes --]



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Wednesday, February 16, 2011 1:43 AM
> To: Wojcik, Krzysztof
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Single-drive RAID0
> 
> On Tue, 15 Feb 2011 16:30:05 +0000 "Wojcik, Krzysztof"
> <krzysztof.wojcik@intel.com> wrote:
> 
> > > It might help to put a "WARN_ON(1)" in the place where it prints
> > > "detected
> > > capacity change ..." so we get a stack trace and can see how it got
> > > there.
> > > That might git a hint to what is looping.
> > > Also a printk in md_open if it returns ERESTARTSYS would be
> > > interesting.
> >
> > In attachment part of logs from kernel with WARN_ON(1) and value
> returned by md_open()
> > (line preceded with "##### KW: err= x").
> >
> > I am trying to look for in new areas. I've run:
> > Udevd --debug --dedug-traces
> >
> > Logs from udev and kernel in attachment.
> > Maybe it will help to find solution...
> > It seems to udev adds and removes device in loop...
> 
> Thanks for the extra logs.... it helps a bit, but I'm not a lot closer.
> 
> I've seen something a little bit like this which was fixed by adding
> 
> TEST!="md/array_state", GOTO="md_end"
> 
> just after
> 
> # container devices have a metadata version of e.g. 'external:ddf' and
> # never leave state 'inactive'
> 
> in /lib/udev/rules.d/64-md-raid.rules
> 
> 
> Could you try that??

I tried it. It doesn't help :(

> 
> It looks like the 'bdev' passed to md_open keeps changing, which it
> shouldn't.
> 
> If the above doesn't help, please add:
> 
> 	printk("bdev=%p, mddev=%p, disk=%p dev=%x\n", bdev, mddev, mddev-
> >gendisk, bdev->bd_dev);
> 
> at the top of 'md_open', and see what it produces.

I tried this as well, but I can't see any useful information in it that gives me an idea.
Could you look at the logs in the attachment?

Regards
Krzysztof

> 
> Thanks,
> NeilBrown


[-- Attachment #2: kernel_log.tgz --]
[-- Type: application/x-compressed, Size: 1830 bytes --]


* Re: Single-drive RAID0
  2011-02-16 16:57               ` Wojcik, Krzysztof
@ 2011-02-17  0:38                 ` NeilBrown
  2011-02-21  2:15                 ` NeilBrown
  1 sibling, 0 replies; 16+ messages in thread
From: NeilBrown @ 2011-02-17  0:38 UTC (permalink / raw)
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Wed, 16 Feb 2011 16:57:06 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> > 
> > It looks like the 'bdev' passed to md_open keeps changing, which it
> > shouldn't.
> > 
> > If the above doesn't help, please add:
> > 
> > 	printk("bdev=%p, mddev=%p, disk=%p dev=%x\n", bdev, mddev, mddev-
> > >gendisk, bdev->bd_dev);
> > 
> > at the top of 'md_open', and see what it produces.
> 
> I tried this also but I can't see any useful information that can make me some idea.
> Could you look at logs in attachment?
> 

Thanks.

That at least confirms that the bdev is often changing while the mddev and
disk stay unchanged.  Having that confirmation is useful.

I didn't think that could happen - the bdev should remain pinned while the
mddev exists.  Obviously it doesn't.

I'll see what I can find out.

NeilBrown



* Re: Single-drive RAID0
  2011-02-16 16:57               ` Wojcik, Krzysztof
  2011-02-17  0:38                 ` NeilBrown
@ 2011-02-21  2:15                 ` NeilBrown
  2011-02-21 14:11                   ` Wojcik, Krzysztof
  1 sibling, 1 reply; 16+ messages in thread
From: NeilBrown @ 2011-02-21  2:15 UTC (permalink / raw)
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Wed, 16 Feb 2011 16:57:06 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> > 
> > It looks like the 'bdev' passed to md_open keeps changing, which it
> > shouldn't.
> > 
> > If the above doesn't help, please add:
> > 
> > 	printk("bdev=%p, mddev=%p, disk=%p dev=%x\n", bdev, mddev, mddev-  
> > >gendisk, bdev->bd_dev);  
> > 
> > at the top of 'md_open', and see what it produces.  
> 
> I tried this also but I can't see any useful information that can make me some idea.
> Could you look at logs in attachment?
> 


I've made some progress.


The struct block_device is definitely getting freed and re-allocated.

This can happen if you

  mknod /dev/.tmp b 9 0
  open /dev/.tmp and do stuff, then close
  rm /dev/.tmp
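
Concretely, a sketch of that sequence (blockdev is just one convenient way to open and close the node; major 9, minor 0 is md0):

  mknod /dev/.tmp b 9 0
  blockdev --getsize64 /dev/.tmp   # opens the device, reads its size, closes it
  rm /dev/.tmp                     # last reference dropped; the block_device may now be freed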


If you open something in /dev which stays there, e.g. /dev/md0, then as long
as the inode for /dev/md0 remains in the inode cache it will hold a reference
to the block_device and so the same one will get reused.
But when you 'rm /dev/.tmp', the inode gets flushed from the cache and the
reference to the block_device is dropped.  If that was the only reference
then the block_device is discarded.

Each time a new block_device is created for a given block device, it will
re-do the partition scan which will mean that the partition devices get
deleted and recreated.

Maybe you are racing with this somehow and that is how the partition devices
appear to not be there - udev is removing and recreating them.

Something you could try would be to change fs/block_dev.c.
At about line 470 you should find:

static const struct super_operations bdev_sops = {
	.statfs = simple_statfs,
	.alloc_inode = bdev_alloc_inode,
	.destroy_inode = bdev_destroy_inode,
	.drop_inode = generic_delete_inode,
	.evict_inode = bdev_evict_inode,
};

If you change the 'drop_inode' line to
        .drop_inode = do_not_drop_inode,

and define a function:


static int do_not_drop_inode(struct inode *inode)
{
	return 0;
}

That will cause the block_device to remain in a cache for a while after the
last reference is dropped.
That should make your symptoms go away.

I'm not sure that is the "right" fix, but it should confirm what I suspect
is happening.

If it does, then we can explore more...
Did you say that this works better on earlier kernels?  If so, what is the
latest kernel that you have tried that you don't have this problem on?

Thanks,
NeilBrown



* RE: Single-drive RAID0
  2011-02-21  2:15                 ` NeilBrown
@ 2011-02-21 14:11                   ` Wojcik, Krzysztof
  2011-02-22  0:50                     ` NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: Wojcik, Krzysztof @ 2011-02-21 14:11 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Monday, February 21, 2011 3:15 AM
> To: Wojcik, Krzysztof
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Single-drive RAID0
> 
> On Wed, 16 Feb 2011 16:57:06 +0000 "Wojcik, Krzysztof"
> <krzysztof.wojcik@intel.com> wrote:
> 
> > >
> > > It looks like the 'bdev' passed to md_open keeps changing, which it
> > > shouldn't.
> > >
> > > If the above doesn't help, please add:
> > >
> > > 	printk("bdev=%p, mddev=%p, disk=%p dev=%x\n", bdev, mddev, mddev-
> > > >gendisk, bdev->bd_dev);
> > >
> > > at the top of 'md_open', and see what it produces.
> >
> > I tried this also but I can't see any useful information that can
> make me some idea.
> > Could you look at logs in attachment?
> >
> 
> 
> I've made some progress.
> 
> 
> The struct block_device is definitely getting freed and re-allocated.
> 
> This can happen if you
> 
>   mknod /dev/.tmp b 9 0
>   open /dev/.tmp and do stuff, then close
>   rm /dev/.tmp
> 
> 
> If you open something in /dev which stays there, e.g. /dev/md0, then as
> long
> as the inode for /dev/md0 remains in the inode cache it will hold a
> reference
> to the block_device and so the same one will get reused.
> But when to 'rm /dev/.tmp', the inode gets flushed from the cache and
> the
> reference of the block_device is dropped.  If that was the only
> reference
> then the block_device is discarded.
> 
> Each time a new block_device is created for a given block device, it
> will
> re-do the partition scan which will mean that the partition devices get
> deleted and recreated.
> 
> Maybe you are racing with this somehow and that is how the partition
> devices
> appear to not be there - udev is removing and recreating them.
> 
> Something you could try would be to change fs/block_dev.c.
> At about line 470 you should find:
> 
> static const struct super_operations bdev_sops = {
> 	.statfs = simple_statfs,
> 	.alloc_inode = bdev_alloc_inode,
> 	.destroy_inode = bdev_destroy_inode,
> 	.drop_inode = generic_delete_inode,
> 	.evict_inode = bdev_evict_inode,
> };
> 
> If you change the 'drop_inode' line to
>         .drop_inode = do_not_drop_inode,
> 
> and define a function:
> 
> 
> static int do_not_drop_inode(struct inode *inode)
> {
> 	return 0;
> }
> 
> That will cause the block_device to remain in a cache for a while after
> the
> last reference is dropped.
> That should make your symptoms go away.
> 
> I'm not sure that is the "right" fix, but it should confirm what I
> suspect
> is happening.

Hi,

I tested your proposal.
It resolves the problem.
What are your plans now?
Do you need to discuss the patch with someone involved in the block device code?
When can I expect the final patch?

> 
> If it does, then we can explore more...
> Did you say that this works better on earlier kernels?  If so, what is
> the
> latest kernel that you have tried that you don't have this problem on?

I've tested kernel 2.6.32 from RHEL 6.0 and, if I remember correctly, 2.6.34 - they work properly.

Regards
Krzysztof

> 
> Thanks,
> NeilBrown




* Re: Single-drive RAID0
  2011-02-21 14:11                   ` Wojcik, Krzysztof
@ 2011-02-22  0:50                     ` NeilBrown
  2011-02-22 11:50                       ` Wojcik, Krzysztof
  2011-02-22 16:42                       ` Jan Ceuleers
  0 siblings, 2 replies; 16+ messages in thread
From: NeilBrown @ 2011-02-22  0:50 UTC (permalink / raw)
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Mon, 21 Feb 2011 14:11:51 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> Hi,
> 
> I tested your proposal.
> It resolve the problem.

Thanks!

> What are your future plans?
> Do you need to consult the patch with someone involved in the block devices code?
> When can I expect final patch?

Here it is.  As you can see it is quite different - I figured out what is
really going on.


> 
> > 
> > If it does, then we can explore more...
> > Did you say that this works better on earlier kernels?  If so, what is
> > the
> > latest kernel that you have tried that you don't have this problem on?
> 
> I've tested on kernel 2.6.32 from RHEL6.0 and, if I remember correctly, 2.6.34- they work properly.

That fits - I discovered that I broke it in 2.6.35.

Thanks,
NeilBrown

From b7fea734c7780ca80f58fb87a6c9f55aa83314d7 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Tue, 22 Feb 2011 11:48:03 +1100
Subject: [PATCH] md: Fix - again - partition detection when array becomes active

Revert
    b821eaa572fd737faaf6928ba046e571526c36c6
and
    f3b99be19ded511a1bf05a148276239d9f13eefa

When I wrote the first of these I had a wrong idea about the
lifetime of 'struct block_device'.  It can disappear at any time that
the block device is not open if it falls out of the inode cache.

So relying on the 'size' recorded with it to detect when the
device size has changed and so we need to revalidate, is wrong.

Rather, we really do need the 'changed' attribute stored directly in
the mddev and set/tested as appropriate.

Without this patch, a sequence of:
   mknod / open / close / unlink

(which can cause a block_device to be created and then destroyed)
will result in a rescan of the partition table and consequent removal
and addition of partitions.
Several of these in a row can get udev racing to create and unlink and
other code can get confused.

With the patch, the rescan is only performed when needed and so there
are no races.

This is suitable for any stable kernel from 2.6.35.

Reported-by: "Wojcik, Krzysztof" <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
---
 drivers/md/md.c |   22 +++++++++++++++++++++-
 drivers/md/md.h |    2 ++
 2 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 330addf..818313e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4627,6 +4627,7 @@ static int do_md_run(mddev_t *mddev)
 	}
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	revalidate_disk(mddev->gendisk);
+	mddev->changed = 1;
 	kobject_uevent(&disk_to_dev(mddev->gendisk)->kobj, KOBJ_CHANGE);
 out:
 	return err;
@@ -4715,6 +4716,7 @@ static void md_clean(mddev_t *mddev)
 	mddev->sync_speed_min = mddev->sync_speed_max = 0;
 	mddev->recovery = 0;
 	mddev->in_sync = 0;
+	mddev->changed = 0;
 	mddev->degraded = 0;
 	mddev->safemode = 0;
 	mddev->bitmap_info.offset = 0;
@@ -4830,6 +4832,7 @@ static int do_md_stop(mddev_t * mddev, int mode, int is_open)
 
 		set_capacity(disk, 0);
 		mutex_unlock(&mddev->open_mutex);
+		mddev->changed = 1;
 		revalidate_disk(disk);
 
 		if (mddev->ro)
@@ -6014,7 +6017,7 @@ static int md_open(struct block_device *bdev, fmode_t mode)
 	atomic_inc(&mddev->openers);
 	mutex_unlock(&mddev->open_mutex);
 
-	check_disk_size_change(mddev->gendisk, bdev);
+	check_disk_change(bdev);
  out:
 	return err;
 }
@@ -6029,6 +6032,21 @@ static int md_release(struct gendisk *disk, fmode_t mode)
 
 	return 0;
 }
+
+static int md_media_changed(struct gendisk *disk)
+{
+	mddev_t *mddev = disk->private_data;
+
+	return mddev->changed;
+}
+
+static int md_revalidate(struct gendisk *disk)
+{
+	mddev_t *mddev = disk->private_data;
+
+	mddev->changed = 0;
+	return 0;
+}
 static const struct block_device_operations md_fops =
 {
 	.owner		= THIS_MODULE,
@@ -6039,6 +6057,8 @@ static const struct block_device_operations md_fops =
 	.compat_ioctl	= md_compat_ioctl,
 #endif
 	.getgeo		= md_getgeo,
+	.media_changed  = md_media_changed,
+	.revalidate_disk= md_revalidate,
 };
 
 static int md_thread(void * arg)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 7e90b85..12215d4 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -274,6 +274,8 @@ struct mddev_s
 	atomic_t			active;		/* general refcount */
 	atomic_t			openers;	/* number of active opens */
 
+	int				changed;	/* True if we might need to
+							 * reread partition info */
 	int				degraded;	/* whether md should consider
 							 * adding a spare
 							 */
-- 
1.7.1



* RE: Single-drive RAID0
  2011-02-22  0:50                     ` NeilBrown
@ 2011-02-22 11:50                       ` Wojcik, Krzysztof
  2011-02-22 16:42                       ` Jan Ceuleers
  1 sibling, 0 replies; 16+ messages in thread
From: Wojcik, Krzysztof @ 2011-02-22 11:50 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Tuesday, February 22, 2011 1:50 AM
> To: Wojcik, Krzysztof
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Single-drive RAID0
> 
> On Mon, 21 Feb 2011 14:11:51 +0000 "Wojcik, Krzysztof"
> <krzysztof.wojcik@intel.com> wrote:
> 
> > Hi,
> >
> > I tested your proposal.
> > It resolve the problem.
> 
> Thanks!
> 
> > What are your future plans?
> > Do you need to consult the patch with someone involved in the block
> devices code?
> > When can I expect final patch?
> 
> Here it is.  As you can see it is quite different - I figured out what
> is
> really going on..
> 
> 
> >
> > >
> > > If it does, then we can explore more...
> > > Did you say that this works better on earlier kernels?  If so, what
> is
> > > the
> > > latest kernel that you have tried that you don't have this problem
> on?
> >
> > I've tested on kernel 2.6.32 from RHEL6.0 and, if I remember
> correctly, 2.6.34- they work properly.
> 
> That fits - I discovered that I broke it in 2.6.35.
> 
> Thanks,
> NeilBrown

Hi,

I confirm: this patch resolves the problem.
I ran tests on the current top of your for-next branch and the issue is not reproducible.
Thanks for your cooperation! :-)

Regards
Krzysztof

> 
> From b7fea734c7780ca80f58fb87a6c9f55aa83314d7 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@suse.de>
> Date: Tue, 22 Feb 2011 11:48:03 +1100
> Subject: [PATCH] md: Fix - again - partition detection when array
> becomes active
> 
> Revert
>     b821eaa572fd737faaf6928ba046e571526c36c6
> and
>     f3b99be19ded511a1bf05a148276239d9f13eefa
> 
> When I wrote the first of these I had a wrong idea about the
> lifetime of 'struct block_device'.  It can disappear at any time that
> the block device is not open if it falls out of the inode cache.
> 
> So relying on the 'size' recorded with it to detect when the
> device size has changed and so we need to revalidate, is wrong.
> 
> Rather, we really do need the 'changed' attribute stored directly in
> the mddev and set/tested as appropriate.
> 
> Without this patch, a sequence of:
>    mknod / open / close / unlink
> 
> (which can cause a block_device to be created and then destroyed)
> will result in a rescan of the partition table and consequence removal
> and addition of partitions.
> Several of these in a row can get udev racing to create and unlink and
> other code can get confused.
> 
> With the patch, the rescan is only performed when needed and so there
> are no races.
> 
> This is suitable for any stable kernel from 2.6.35.
> 
> Reported-by: "Wojcik, Krzysztof" <krzysztof.wojcik@intel.com>
> Signed-off-by: NeilBrown <neilb@suse.de>
> Cc: stable@kernel.org
> ---
>  drivers/md/md.c |   22 +++++++++++++++++++++-
>  drivers/md/md.h |    2 ++
>  2 files changed, 23 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 330addf..818313e 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4627,6 +4627,7 @@ static int do_md_run(mddev_t *mddev)
>  	}
>  	set_capacity(mddev->gendisk, mddev->array_sectors);
>  	revalidate_disk(mddev->gendisk);
> +	mddev->changed = 1;
>  	kobject_uevent(&disk_to_dev(mddev->gendisk)->kobj, KOBJ_CHANGE);
>  out:
>  	return err;
> @@ -4715,6 +4716,7 @@ static void md_clean(mddev_t *mddev)
>  	mddev->sync_speed_min = mddev->sync_speed_max = 0;
>  	mddev->recovery = 0;
>  	mddev->in_sync = 0;
> +	mddev->changed = 0;
>  	mddev->degraded = 0;
>  	mddev->safemode = 0;
>  	mddev->bitmap_info.offset = 0;
> @@ -4830,6 +4832,7 @@ static int do_md_stop(mddev_t * mddev, int mode,
> int is_open)
> 
>  		set_capacity(disk, 0);
>  		mutex_unlock(&mddev->open_mutex);
> +		mddev->changed = 1;
>  		revalidate_disk(disk);
> 
>  		if (mddev->ro)
> @@ -6014,7 +6017,7 @@ static int md_open(struct block_device *bdev,
> fmode_t mode)
>  	atomic_inc(&mddev->openers);
>  	mutex_unlock(&mddev->open_mutex);
> 
> -	check_disk_size_change(mddev->gendisk, bdev);
> +	check_disk_change(bdev);
>   out:
>  	return err;
>  }
> @@ -6029,6 +6032,21 @@ static int md_release(struct gendisk *disk,
> fmode_t mode)
> 
>  	return 0;
>  }
> +
> +static int md_media_changed(struct gendisk *disk)
> +{
> +	mddev_t *mddev = disk->private_data;
> +
> +	return mddev->changed;
> +}
> +
> +static int md_revalidate(struct gendisk *disk)
> +{
> +	mddev_t *mddev = disk->private_data;
> +
> +	mddev->changed = 0;
> +	return 0;
> +}
>  static const struct block_device_operations md_fops =
>  {
>  	.owner		= THIS_MODULE,
> @@ -6039,6 +6057,8 @@ static const struct block_device_operations
> md_fops =
>  	.compat_ioctl	= md_compat_ioctl,
>  #endif
>  	.getgeo		= md_getgeo,
> +	.media_changed  = md_media_changed,
> +	.revalidate_disk= md_revalidate,
>  };
> 
>  static int md_thread(void * arg)
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 7e90b85..12215d4 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -274,6 +274,8 @@ struct mddev_s
>  	atomic_t			active;		/* general refcount */
>  	atomic_t			openers;	/* number of active opens */
> 
> +	int				changed;	/* True if we might need to
> +							 * reread partition info */
>  	int				degraded;	/* whether md should consider
>  							 * adding a spare
>  							 */
> --
> 1.7.1



* Re: Single-drive RAID0
  2011-02-22  0:50                     ` NeilBrown
  2011-02-22 11:50                       ` Wojcik, Krzysztof
@ 2011-02-22 16:42                       ` Jan Ceuleers
  2011-02-23  2:20                         ` NeilBrown
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Ceuleers @ 2011-02-22 16:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: Wojcik, Krzysztof, linux-raid@vger.kernel.org

On 22/02/11 01:50, NeilBrown wrote:
> With the patch, the rescan is only performed when needed and so there
> are no races.

Neil,

If I've properly understood your commit message you are reducing the 
frequency of the rescans, but they can still occur. I submit that this 
merely reduces the probability but does not remove the possibility of 
races with udev.

If my interpretation is correct, then is this something that should be 
fixed (to remove the possibility of races altogether), or merely documented?

Thanks, Jan


* Re: Single-drive RAID0
  2011-02-22 16:42                       ` Jan Ceuleers
@ 2011-02-23  2:20                         ` NeilBrown
  0 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2011-02-23  2:20 UTC (permalink / raw)
  To: Jan Ceuleers; +Cc: Wojcik, Krzysztof, linux-raid@vger.kernel.org

On Tue, 22 Feb 2011 17:42:14 +0100 Jan Ceuleers <jan.ceuleers@computer.org>
wrote:

> On 22/02/11 01:50, NeilBrown wrote:
> > With the patch, the rescan is only performed when needed and so there
> > are no races.
> 
> Neil,
> 
> If I've properly understood your commit message you are reducing the 
> frequency of the rescans, but they can still occur. I submit that this 
> merely reduces the probability but does not remove the possibility of 
> races with udev.
> 
> If my interpretation is correct, then is this something that should be 
> fixed (to remove the possibility of races altogether), or merely documented?
> 
> Thanks, Jan

Thanks for reviewing the code!!

The patch should reduce the frequency of rescans to just those times when it
is actually needed.
i.e. when the array data first becomes available, and when it stops being
available.
So it will only happen as a direct result of an administrative action.

Previously it could happen at arbitrary times based on open /close/unlink
patterns.

So it isn't really "reducing the frequency" so much as "making it happen only
at well defined times, not arbitrary times".

So this does remove any possibility of races.

Thanks,
NeilBrown


