* Single-drive RAID0
  From: Wojcik, Krzysztof @ 2011-02-08 16:15 UTC
  To: Neil Brown; +Cc: linux-raid@vger.kernel.org

Neil,

I am trying to create a single-drive RAID0 array with data preservation.
That is, I have a single disk with partitions and data on them.
I would like to migrate this single disk to a RAID0 array and then grow the
array to N devices.
Of course, I expect the partitions with my useful data to be mapped to the
corresponding MD block devices (/dev/md126p1, /dev/md126p2, etc.). Indeed
they are, but when I try to mount a new block device it disappears from the
/dev directory and the mount does not succeed.

/var/log/messages keeps showing:

  md126: p1
  md126: detected capacity change from 0 to xxxxxxxxxxxxx

It is not always reproducible, but it happens very often.
I have tried this operation with both 0.9 and imsm metadata; the symptoms
are similar.

Do you have any suggestions? How can this issue be fixed?

Regards
Krzysztof
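For context, a minimal sketch of the migration flow being described. The
device names and the grow step are illustrative assumptions rather than
commands taken from this thread, and the exact grow syntax and its support
depend on the mdadm and kernel versions:

  # Assumed setup: /dev/sda already holds the partitions and data.
  # Create a one-disk RAID0 over the whole disk; --force is needed because
  # mdadm normally refuses a single-member RAID0. With 0.9 metadata the
  # superblock lives near the end of the disk, so the data at the start
  # (including the partition table) is preserved.
  mdadm --create /dev/md/vol_1 --metadata=0.9 --level=0 --raid-devices=1 \
        --force /dev/sda
  # Later, grow the array onto a second (hypothetical) disk:
  mdadm --grow /dev/md126 --raid-devices=2 --add /dev/sdb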
* Re: Single-drive RAID0
  From: NeilBrown @ 2011-02-09  2:45 UTC
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Tue, 8 Feb 2011 16:15:53 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> Neil,
>
> I am trying to create a single-drive RAID0 array with data preservation.
> That is, I have a single disk with partitions and data on them.
> I would like to migrate this single disk to a RAID0 array and then grow
> the array to N devices.
> Of course, I expect the partitions with my useful data to be mapped to
> the corresponding MD block devices (/dev/md126p1, /dev/md126p2, etc.).
> Indeed they are, but when I try to mount a new block device it disappears
> from the /dev directory and the mount does not succeed.
>
> /var/log/messages keeps showing:
>   md126: p1
>   md126: detected capacity change from 0 to xxxxxxxxxxxxx
>
> It is not always reproducible, but it happens very often.
> I have tried this operation with both 0.9 and imsm metadata; the symptoms
> are similar.
>
> Do you have any suggestions? How can this issue be fixed?
>
> Regards
> Krzysztof

It isn't clear to me what the issue is, and when I try something that might
be what you are suggesting it works perfectly every time.

Maybe you could provide something more detailed and specific, e.g. a series
of steps that I can try, together with all the messages you get (both from
the kernel and from mdadm) throughout the process.

That would probably help.

NeilBrown
* RE: Single-drive RAID0
  From: Wojcik, Krzysztof @ 2011-02-09 14:33 UTC
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2750 bytes --]

Steps to reproduce:
1. Create a partition on the disk (for example: 1 partition, size = half of
   the /dev/sda disk)
2. Create a filesystem on the partition (ext3)
3. Mount the partition and create some control data on it
4. Unmount the partition
5. Create RAID0 on the disk
   (mdadm -C /dev/md/vol_1 -amd -e 0.9 -c 64 -l 0 -n 1 /dev/sda --force)
6. md127 and md127p1 appear in /dev as expected
7. Try to mount md127p1 and verify the control data on it

Expected result:
  md127p1 device mounted and control data not corrupted

Actual result:
1. The mount command returns an error:
     mount: special device /dev/md127p1 does not exist
2. md127p1 disappears from /dev/
3. The kernel prints indefinitely (until the array is stopped):
     md126: p1
     md126: detected capacity change from 0 to xxxxxxxxxxxxx
     md126: p1
     md126: detected capacity change from 0 to xxxxxxxxxxxxx
     ...

Additional info - reproduced on:
- SLES11 SP1
- mdadm 3.2 from the top of your devel-3.2 branch
- kernel 2.6.38-rc2

I've attached the script that I use to reproduce this issue.
For native metadata run:
  ./raid_ready_drive native
For imsm metadata just:
  ./raid_ready_drive

Regards
Krzysztof

> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Wednesday, February 09, 2011 3:46 AM
> To: Wojcik, Krzysztof
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Single-drive RAID0
>
> On Tue, 8 Feb 2011 16:15:53 +0000 "Wojcik, Krzysztof"
> <krzysztof.wojcik@intel.com> wrote:
>
> > Neil,
> >
> > I am trying to create a single-drive RAID0 array with data preservation.
> > That is, I have a single disk with partitions and data on them.
> > I would like to migrate this single disk to a RAID0 array and then grow
> > the array to N devices.
> > Of course, I expect the partitions with my useful data to be mapped to
> > the corresponding MD block devices (/dev/md126p1, /dev/md126p2, etc.).
> > Indeed they are, but when I try to mount a new block device it
> > disappears from the /dev directory and the mount does not succeed.
> >
> > /var/log/messages keeps showing:
> >   md126: p1
> >   md126: detected capacity change from 0 to xxxxxxxxxxxxx
> >
> > It is not always reproducible, but it happens very often.
> > I have tried this operation with both 0.9 and imsm metadata; the
> > symptoms are similar.
> >
> > Do you have any suggestions? How can this issue be fixed?
> >
> > Regards
> > Krzysztof
>
> It isn't clear to me what the issue is, and when I try something that
> might be what you are suggesting it works perfectly every time.
>
> Maybe you could provide something more detailed and specific, e.g. a
> series of steps that I can try, together with all the messages you get
> (both from the kernel and from mdadm) throughout the process.
>
> That would probably help.
>
> NeilBrown

[-- Attachment #2: raid_ready_drive --]
[-- Type: application/octet-stream, Size: 2635 bytes --]

#!/bin/bash

V1MNT=/mnt/vol1
V2MNT=/mnt/vol2
CONT=/dev/md/imsm0
FILE_SIZE=100000000
CHUNK=128
SUM=/tmp/sums	# checksum record file (must be defined for the redirects below)

# define disk for tests
#disk=/dev/sda
N1=/dev/md/vol_1

# check that a command returned the expected status
function stat()
{
	if [ $1 != $2 ] ; then
		echo -e "\nCommand $3 returned wrong status (expected: $2, got: $1)\n"
		exit 1
	fi
}

if [ -z $disk ] ; then
	echo "Disk for test not specified. Please set \"disk\" variable."
	exit
fi
echo "Disk to test= $disk"

if [ ! -d $V1MNT ] ; then
	mkdir $V1MNT
fi
if [ ! -d $V2MNT ] ; then
	mkdir $V2MNT
fi

################## cleanup ############################
# unmount
umount $V1MNT
umount $V2MNT

# stop all md arrays (retry until none are left)
res=1
while [ $res != 0 ] ; do
	mdadm -Ss
	res=$?
done

# clean
mdadm --zero-superblock $disk

#########################################################
######################### MAIN ##########################
#########################################################
################## prepare disk for tests ###############
sfdisk -uM $disk << EOF
,50000,L
,50000,L
EOF
udevadm settle

P1=$disk'1'
P2=$disk'2'

mkfs $P1
mkfs $P2

mount $P1 $V1MNT
mount $P2 $V2MNT

# write random control data and record its checksums
openssl rand -out $V1MNT/test1.bin $FILE_SIZE
SUM1=$(md5sum $V1MNT/test1.bin | gawk '{print $1}')
openssl rand -out $V2MNT/test1.bin $FILE_SIZE
SUM2=$(md5sum $V2MNT/test1.bin | gawk '{print $1}')
echo "SUM1=$SUM1" > $SUM
echo "SUM2=$SUM2" >> $SUM

umount $V1MNT
umount $V2MNT

######################## run test ########################
if [[ $1 = "native" ]] ; then
	# create volume
	mdadm -C $N1 -e 0.9 -amd -l 0 --chunk $CHUNK -n 1 $disk -R --force
	stat $? 0 create_vol1
else
	# create container
	mdadm -C $CONT -amd -e imsm -n 1 $disk -R --force
	stat $? 0 create_container
	# create volume
	mdadm -C $N1 -amd -l 0 --chunk $CHUNK -n 1 $disk -R --force
	stat $? 0 create_vol1
fi

# check status
cat /proc/mdstat
udevadm settle
ls /dev | grep md
ls /dev/md

ls /dev/md | grep vol_1p1 > /dev/null
if [ $? != 0 ] ; then
	echo "Error. Partition #1 not mapped!"
	exit
fi
ls /dev/md | grep vol_1p2 > /dev/null
if [ $? != 0 ] ; then
	echo "Error. Partition #2 not mapped!"
	exit
fi

# mount the partitions of the new array and re-check the control data
mount $N1'p1' $V1MNT
stat $? 0 mount_vol1
mount $N1'p2' $V2MNT
stat $? 0 mount_vol2

SUM1_A=$(md5sum $V1MNT/test1.bin | gawk '{print $1}')
SUM2_A=$(md5sum $V2MNT/test1.bin | gawk '{print $1}')

if [[ $SUM1 != $SUM1_A ]] || [[ $SUM2 != $SUM2_A ]] ; then
	echo "Error. Test failed. md5sum of files not equal!"
else
	echo "Test PASSED!"
fi
#############################################################
######################## END ################################
#############################################################
* Re: Single-drive RAID0
  From: NeilBrown @ 2011-02-14  2:11 UTC
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Wed, 9 Feb 2011 14:33:01 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> Steps to reproduce:
> 1. Create a partition on the disk (for example: 1 partition, size = half
>    of the /dev/sda disk)
> 2. Create a filesystem on the partition (ext3)
> 3. Mount the partition and create some control data on it
> 4. Unmount the partition
> 5. Create RAID0 on the disk
>    (mdadm -C /dev/md/vol_1 -amd -e 0.9 -c 64 -l 0 -n 1 /dev/sda --force)
> 6. md127 and md127p1 appear in /dev as expected
> 7. Try to mount md127p1 and verify the control data on it

I couldn't get this to fail for me.
It is possible that there is something subtle about the precise device
geometry. Could you send me

  sfdisk -l /dev/sda

(or whatever device you are using)?

However, when I ran the script without requesting native metadata I did get
some results somewhat similar to yours - and they were not consistent,
which is an important similarity.

This patch should fix that particular problem. Let me know if you can still
produce any of these errors with the patch applied.

Thanks,
NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0cc30ec..6818ff4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4138,10 +4138,10 @@ array_size_store(mddev_t *mddev, const char *buf, size_t len)
 	}
 
 	mddev->array_sectors = sectors;
-	set_capacity(mddev->gendisk, mddev->array_sectors);
-	if (mddev->pers)
+	if (mddev->pers) {
+		set_capacity(mddev->gendisk, mddev->array_sectors);
 		revalidate_disk(mddev->gendisk);
-
+	}
 	return len;
 }
* RE: Single-drive RAID0
  From: Wojcik, Krzysztof @ 2011-02-14 17:04 UTC
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Monday, February 14, 2011 3:12 AM
> To: Wojcik, Krzysztof
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Single-drive RAID0
>
> On Wed, 9 Feb 2011 14:33:01 +0000 "Wojcik, Krzysztof"
> <krzysztof.wojcik@intel.com> wrote:
>
> > Steps to reproduce:
> > 1. Create a partition on the disk (for example: 1 partition, size =
> >    half of the /dev/sda disk)
> > 2. Create a filesystem on the partition (ext3)
> > 3. Mount the partition and create some control data on it
> > 4. Unmount the partition
> > 5. Create RAID0 on the disk
> >    (mdadm -C /dev/md/vol_1 -amd -e 0.9 -c 64 -l 0 -n 1 /dev/sda --force)
> > 6. md127 and md127p1 appear in /dev as expected
> > 7. Try to mount md127p1 and verify the control data on it
>
> I couldn't get this to fail for me.
> It is possible that there is something subtle about the precise device
> geometry. Could you send me
>   sfdisk -l /dev/sda
> (or whatever device you are using)?
>
> However, when I ran the script without requesting native metadata I did
> get some results somewhat similar to yours - and they were not
> consistent, which is an important similarity.
>
> This patch should fix that particular problem. Let me know if you can
> still produce any of these errors with the patch applied.

Unfortunately the issue is reproducible with the patch applied.
I tried to reproduce it on another setup (a different PC and disks) as
well... the issue exists :(

An interesting observation is that when I stop the array just after
creation and then reassemble it, everything works fine.
On an older kernel version (I tried 2.6.34) the issue is also NOT
reproducible.

Regards
Krzysztof

> Thanks,
> NeilBrown
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 0cc30ec..6818ff4 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> [...]
* Re: Single-drive RAID0
  From: NeilBrown @ 2011-02-15  0:01 UTC
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Mon, 14 Feb 2011 17:04:38 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> > It is possible that there is something subtle about the precise device
> > geometry. Could you send me
> >   sfdisk -l /dev/sda
> > (or whatever device you are using)?

You didn't include this information...

> > This patch should fix that particular problem. Let me know if you can
> > still produce any of these errors with the patch applied.
>
> Unfortunately the issue is reproducible with the patch applied.
> I tried to reproduce it on another setup (a different PC and disks) as
> well... the issue exists :(
>
> An interesting observation is that when I stop the array just after
> creation and then reassemble it, everything works fine.
> On an older kernel version (I tried 2.6.34) the issue is also NOT
> reproducible.

All very strange. And as I cannot reproduce it, it is very hard to debug.

Maybe some daemon is accessing the array at some awkward time and causing
different behaviour for you...

If it is repeatable enough, you could try 'git bisect' to find which commit
introduced the problem. That would be helpful, but I suspect it would be
very time-consuming.

It might help to put a "WARN_ON(1)" at the place where it prints "detected
capacity change ..." so we get a stack trace and can see how it got there.
That might give a hint as to what is looping.
Also a printk in md_open, if it returns ERESTARTSYS, would be interesting.

Thanks,
NeilBrown
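For reference, a hypothetical bisect session for this report; the good and
bad versions are taken from the kernels mentioned in the thread, and the
exact tag names are assumptions:

  # run from a Linux kernel git tree
  git bisect start
  git bisect bad v2.6.38-rc2     # kernel reported to show the problem
  git bisect good v2.6.34        # kernel reported to work
  # for each commit git checks out: build, boot, run the reproducer
  # (./raid_ready_drive), then report the outcome with one of:
  git bisect good
  git bisect bad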
* RE: Single-drive RAID0
  From: Wojcik, Krzysztof @ 2011-02-15 16:30 UTC
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2621 bytes --]

> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Tuesday, February 15, 2011 1:02 AM
> To: Wojcik, Krzysztof
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Single-drive RAID0
>
> On Mon, 14 Feb 2011 17:04:38 +0000 "Wojcik, Krzysztof"
> <krzysztof.wojcik@intel.com> wrote:
>
> > > It is possible that there is something subtle about the precise
> > > device geometry. Could you send me
> > >   sfdisk -l /dev/sda
> > > (or whatever device you are using)?
>
> You didn't include this information...

Disk /dev/sdc: 30515 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdc1          0+   6374    6375-  51207187   83  Linux
/dev/sdc2       6375   12749    6375   51207187+  83  Linux
/dev/sdc3          0      -       0           0    0  Empty
/dev/sdc4          0      -       0           0    0  Empty

> > > This patch should fix that particular problem. Let me know if you can
> > > still produce any of these errors with the patch applied.
> >
> > Unfortunately the issue is reproducible with the patch applied.
> > I tried to reproduce it on another setup (a different PC and disks) as
> > well... the issue exists :(
> >
> > An interesting observation is that when I stop the array just after
> > creation and then reassemble it, everything works fine.
> > On an older kernel version (I tried 2.6.34) the issue is also NOT
> > reproducible.
>
> All very strange. And as I cannot reproduce it, it is very hard to debug.
>
> Maybe some daemon is accessing the array at some awkward time and causing
> different behaviour for you...
>
> If it is repeatable enough, you could try 'git bisect' to find which
> commit introduced the problem. That would be helpful, but I suspect it
> would be very time-consuming.
>
> It might help to put a "WARN_ON(1)" at the place where it prints
> "detected capacity change ..." so we get a stack trace and can see how it
> got there. That might give a hint as to what is looping.
> Also a printk in md_open, if it returns ERESTARTSYS, would be
> interesting.

In the attachment is part of the kernel log with WARN_ON(1) and the value
returned by md_open() (lines preceded with "##### KW: err= x").

I am also trying to look in new areas. I've run:

  udevd --debug --debug-trace

Logs from udev and the kernel are in the attachments.
Maybe they will help to find a solution...
It seems udev adds and removes the device in a loop...

Regards
Krzysztof

> Thanks,
> NeilBrown

[-- Attachment #2: kernel_log.tgz --]
[-- Type: application/x-compressed, Size: 1396 bytes --]

[-- Attachment #3: kernel_with_udev.tgz --]
[-- Type: application/x-compressed, Size: 2855 bytes --]

[-- Attachment #4: udev_logs.tgz --]
[-- Type: application/x-compressed, Size: 5751 bytes --]
* Re: Single-drive RAID0
  From: NeilBrown @ 2011-02-16  0:42 UTC
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Tue, 15 Feb 2011 16:30:05 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> > It might help to put a "WARN_ON(1)" at the place where it prints
> > "detected capacity change ..." so we get a stack trace and can see how
> > it got there. That might give a hint as to what is looping.
> > Also a printk in md_open, if it returns ERESTARTSYS, would be
> > interesting.
>
> In the attachment is part of the kernel log with WARN_ON(1) and the value
> returned by md_open() (lines preceded with "##### KW: err= x").
>
> I am also trying to look in new areas. I've run:
>   udevd --debug --debug-trace
>
> Logs from udev and the kernel are in the attachments.
> Maybe they will help to find a solution...
> It seems udev adds and removes the device in a loop...

Thanks for the extra logs.... they help a bit, but I'm not a lot closer.

I've seen something a little bit like this which was fixed by adding

   TEST!="md/array_state", GOTO="md_end"

just after

   # container devices have a metadata version of e.g. 'external:ddf' and
   # never leave state 'inactive'

in /lib/udev/rules.d/64-md-raid.rules

Could you try that?

It looks like the 'bdev' passed to md_open keeps changing, which it
shouldn't.

If the above doesn't help, please add:

   printk("bdev=%p, mddev=%p, disk=%p dev=%x\n",
          bdev, mddev, mddev->gendisk, bdev->bd_dev);

at the top of 'md_open', and see what it produces.

Thanks,
NeilBrown
* RE: Single-drive RAID0
  From: Wojcik, Krzysztof @ 2011-02-16 16:57 UTC
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1938 bytes --]

> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Wednesday, February 16, 2011 1:43 AM
> To: Wojcik, Krzysztof
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Single-drive RAID0
>
> On Tue, 15 Feb 2011 16:30:05 +0000 "Wojcik, Krzysztof"
> <krzysztof.wojcik@intel.com> wrote:
>
> > > It might help to put a "WARN_ON(1)" at the place where it prints
> > > "detected capacity change ..." so we get a stack trace and can see
> > > how it got there. That might give a hint as to what is looping.
> > > Also a printk in md_open, if it returns ERESTARTSYS, would be
> > > interesting.
> >
> > In the attachment is part of the kernel log with WARN_ON(1) and the
> > value returned by md_open() (lines preceded with "##### KW: err= x").
> >
> > I am also trying to look in new areas. I've run:
> >   udevd --debug --debug-trace
> >
> > Logs from udev and the kernel are in the attachments.
> > Maybe they will help to find a solution...
> > It seems udev adds and removes the device in a loop...
>
> Thanks for the extra logs.... they help a bit, but I'm not a lot closer.
>
> I've seen something a little bit like this which was fixed by adding
>
>    TEST!="md/array_state", GOTO="md_end"
>
> just after
>
>    # container devices have a metadata version of e.g. 'external:ddf' and
>    # never leave state 'inactive'
>
> in /lib/udev/rules.d/64-md-raid.rules
>
> Could you try that?

I tried. It does not help :(

> It looks like the 'bdev' passed to md_open keeps changing, which it
> shouldn't.
>
> If the above doesn't help, please add:
>
>    printk("bdev=%p, mddev=%p, disk=%p dev=%x\n",
>           bdev, mddev, mddev->gendisk, bdev->bd_dev);
>
> at the top of 'md_open', and see what it produces.

I tried this as well, but I can't see any useful information there that
gives me an idea. Could you look at the logs in the attachment?

Regards
Krzysztof

> Thanks,
> NeilBrown

[-- Attachment #2: kernel_log.tgz --]
[-- Type: application/x-compressed, Size: 1830 bytes --]
* Re: Single-drive RAID0
  From: NeilBrown @ 2011-02-17  0:38 UTC
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Wed, 16 Feb 2011 16:57:06 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> > It looks like the 'bdev' passed to md_open keeps changing, which it
> > shouldn't.
> >
> > If the above doesn't help, please add:
> >
> >    printk("bdev=%p, mddev=%p, disk=%p dev=%x\n",
> >           bdev, mddev, mddev->gendisk, bdev->bd_dev);
> >
> > at the top of 'md_open', and see what it produces.
>
> I tried this as well, but I can't see any useful information there that
> gives me an idea. Could you look at the logs in the attachment?

Thanks. That at least confirms that the bdev is often changing while the
mddev and disk stay unchanged. Having that confirmation is useful.

I didn't think that could happen - the bdev should remain pinned while the
mddev exists. Obviously it doesn't. I'll see what I can find out.

NeilBrown
* Re: Single-drive RAID0
  From: NeilBrown @ 2011-02-21  2:15 UTC
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Wed, 16 Feb 2011 16:57:06 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> > It looks like the 'bdev' passed to md_open keeps changing, which it
> > shouldn't.
> >
> > If the above doesn't help, please add:
> >
> >    printk("bdev=%p, mddev=%p, disk=%p dev=%x\n",
> >           bdev, mddev, mddev->gendisk, bdev->bd_dev);
> >
> > at the top of 'md_open', and see what it produces.
>
> I tried this as well, but I can't see any useful information there that
> gives me an idea. Could you look at the logs in the attachment?

I've made some progress.

The struct block_device is definitely getting freed and re-allocated.

This can happen if you

  mknod /dev/.tmp b 9 0
  open /dev/.tmp and do stuff, then close
  rm /dev/.tmp

If you open something in /dev which stays there, e.g. /dev/md0, then as
long as the inode for /dev/md0 remains in the inode cache it will hold a
reference to the block_device, and so the same one will get reused.
But when you 'rm /dev/.tmp', the inode gets flushed from the cache and its
reference to the block_device is dropped. If that was the only reference,
the block_device is discarded.

Each time a new block_device is created for a given block device, it will
re-do the partition scan, which means that the partition devices get
deleted and recreated.

Maybe you are racing with this somehow, and that is how the partition
devices appear to not be there - udev is removing and recreating them.

Something you could try would be to change fs/block_dev.c.
At about line 470 you should find:

static const struct super_operations bdev_sops = {
	.statfs = simple_statfs,
	.alloc_inode = bdev_alloc_inode,
	.destroy_inode = bdev_destroy_inode,
	.drop_inode = generic_delete_inode,
	.evict_inode = bdev_evict_inode,
};

If you change the 'drop_inode' line to

	.drop_inode = do_not_drop_inode,

and define a function:

static int do_not_drop_inode(struct inode *inode)
{
	return 0;
}

that will cause the block_device to remain in a cache for a while after the
last reference is dropped.
That should make your symptoms go away.

I'm not sure that is the "right" fix, but it should confirm what I suspect
is happening.

If it does, then we can explore more...
Did you say that this works better on earlier kernels? If so, what is the
latest kernel you have tried on which you don't have this problem?

Thanks,
NeilBrown
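To make the described lifecycle concrete, here is a hypothetical shell
sequence, following the mknod/open/close/unlink pattern above, that would
free and re-allocate the array's block_device; minor 126 is assumed from
the md126 device earlier in the thread:

  mknod /dev/.tmp b 9 126                       # transient node for the md array
  dd if=/dev/.tmp of=/dev/null bs=4k count=1    # open, read, close
  rm /dev/.tmp                                  # inode flushed from the cache
  # if nothing else holds the device open, the struct block_device is freed;
  # the next open allocates a new one and re-runs the partition scan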
* RE: Single-drive RAID0
  From: Wojcik, Krzysztof @ 2011-02-21 14:11 UTC
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Monday, February 21, 2011 3:15 AM
> To: Wojcik, Krzysztof
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Single-drive RAID0
>
> I've made some progress.
>
> The struct block_device is definitely getting freed and re-allocated.
>
> This can happen if you
>
>   mknod /dev/.tmp b 9 0
>   open /dev/.tmp and do stuff, then close
>   rm /dev/.tmp
>
> If you open something in /dev which stays there, e.g. /dev/md0, then as
> long as the inode for /dev/md0 remains in the inode cache it will hold a
> reference to the block_device, and so the same one will get reused.
> But when you 'rm /dev/.tmp', the inode gets flushed from the cache and
> its reference to the block_device is dropped. If that was the only
> reference, the block_device is discarded.
>
> Each time a new block_device is created for a given block device, it will
> re-do the partition scan, which means that the partition devices get
> deleted and recreated.
>
> Maybe you are racing with this somehow, and that is how the partition
> devices appear to not be there - udev is removing and recreating them.
>
> Something you could try would be to change fs/block_dev.c.
> At about line 470 you should find:
>
> static const struct super_operations bdev_sops = {
> 	.statfs = simple_statfs,
> 	.alloc_inode = bdev_alloc_inode,
> 	.destroy_inode = bdev_destroy_inode,
> 	.drop_inode = generic_delete_inode,
> 	.evict_inode = bdev_evict_inode,
> };
>
> If you change the 'drop_inode' line to
>
> 	.drop_inode = do_not_drop_inode,
>
> and define a function:
>
> static int do_not_drop_inode(struct inode *inode)
> {
> 	return 0;
> }
>
> that will cause the block_device to remain in a cache for a while after
> the last reference is dropped.
> That should make your symptoms go away.
>
> I'm not sure that is the "right" fix, but it should confirm what I
> suspect is happening.

Hi,

I tested your proposal.
It resolves the problem.
What are your future plans?
Do you need to discuss the patch with someone involved in the block device
code? When can I expect a final patch?

> If it does, then we can explore more...
> Did you say that this works better on earlier kernels? If so, what is the
> latest kernel you have tried on which you don't have this problem?

I've tested kernel 2.6.32 from RHEL 6.0 and, if I remember correctly,
2.6.34 - they work properly.

Regards
Krzysztof

> Thanks,
> NeilBrown
* Re: Single-drive RAID0
  From: NeilBrown @ 2011-02-22  0:50 UTC
  To: Wojcik, Krzysztof; +Cc: linux-raid@vger.kernel.org

On Mon, 21 Feb 2011 14:11:51 +0000 "Wojcik, Krzysztof"
<krzysztof.wojcik@intel.com> wrote:

> Hi,
>
> I tested your proposal.
> It resolves the problem.

Thanks!

> What are your future plans?
> Do you need to discuss the patch with someone involved in the block
> device code? When can I expect a final patch?

Here it is. As you can see it is quite different - I figured out what is
really going on.

> > If it does, then we can explore more...
> > Did you say that this works better on earlier kernels? If so, what is
> > the latest kernel you have tried on which you don't have this problem?
>
> I've tested kernel 2.6.32 from RHEL 6.0 and, if I remember correctly,
> 2.6.34 - they work properly.

That fits - I discovered that I broke it in 2.6.35.

Thanks,
NeilBrown

From b7fea734c7780ca80f58fb87a6c9f55aa83314d7 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Tue, 22 Feb 2011 11:48:03 +1100
Subject: [PATCH] md: Fix - again - partition detection when array becomes active

Revert
    b821eaa572fd737faaf6928ba046e571526c36c6
and
    f3b99be19ded511a1bf05a148276239d9f13eefa

When I wrote the first of these I had a wrong idea about the lifetime
of 'struct block_device'.  It can disappear at any time that the block
device is not open, if it falls out of the inode cache.

So relying on the 'size' recorded with it to detect when the device
size has changed, and hence when we need to revalidate, is wrong.

Rather, we really do need the 'changed' attribute stored directly in
the mddev and set/tested as appropriate.

Without this patch, a sequence of:
   mknod / open / close / unlink

(which can cause a block_device to be created and then destroyed)
will result in a rescan of the partition table and the consequent
removal and addition of partitions.
Several of these in a row can get udev racing to create and unlink,
and other code can get confused.

With the patch, the rescan is only performed when needed and so there
are no races.

This is suitable for any stable kernel from 2.6.35.

Reported-by: "Wojcik, Krzysztof" <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
---
 drivers/md/md.c |   22 +++++++++++++++++++++-
 drivers/md/md.h |    2 ++
 2 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 330addf..818313e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4627,6 +4627,7 @@ static int do_md_run(mddev_t *mddev)
 	}
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	revalidate_disk(mddev->gendisk);
+	mddev->changed = 1;
 	kobject_uevent(&disk_to_dev(mddev->gendisk)->kobj, KOBJ_CHANGE);
 out:
 	return err;
@@ -4715,6 +4716,7 @@ static void md_clean(mddev_t *mddev)
 	mddev->sync_speed_min = mddev->sync_speed_max = 0;
 	mddev->recovery = 0;
 	mddev->in_sync = 0;
+	mddev->changed = 0;
 	mddev->degraded = 0;
 	mddev->safemode = 0;
 	mddev->bitmap_info.offset = 0;
@@ -4830,6 +4832,7 @@ static int do_md_stop(mddev_t * mddev, int mode, int is_open)
 
 		set_capacity(disk, 0);
 		mutex_unlock(&mddev->open_mutex);
+		mddev->changed = 1;
 		revalidate_disk(disk);
 
 		if (mddev->ro)
@@ -6014,7 +6017,7 @@ static int md_open(struct block_device *bdev, fmode_t mode)
 	atomic_inc(&mddev->openers);
 	mutex_unlock(&mddev->open_mutex);
 
-	check_disk_size_change(mddev->gendisk, bdev);
+	check_disk_change(bdev);
  out:
 	return err;
 }
@@ -6029,6 +6032,21 @@ static int md_release(struct gendisk *disk, fmode_t mode)
 
 	return 0;
 }
+
+static int md_media_changed(struct gendisk *disk)
+{
+	mddev_t *mddev = disk->private_data;
+
+	return mddev->changed;
+}
+
+static int md_revalidate(struct gendisk *disk)
+{
+	mddev_t *mddev = disk->private_data;
+
+	mddev->changed = 0;
+	return 0;
+}
 static const struct block_device_operations md_fops =
 {
 	.owner		= THIS_MODULE,
@@ -6039,6 +6057,8 @@ static const struct block_device_operations md_fops =
 	.compat_ioctl	= md_compat_ioctl,
 #endif
 	.getgeo		= md_getgeo,
+	.media_changed	= md_media_changed,
+	.revalidate_disk= md_revalidate,
 };
 
 static int md_thread(void * arg)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 7e90b85..12215d4 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -274,6 +274,8 @@ struct mddev_s
 	atomic_t			active;		/* general refcount */
 	atomic_t			openers;	/* number of active opens */
 
+	int				changed;	/* True if we might need to
+							 * reread partition info */
 	int				degraded;	/* whether md should consider
 							 * adding a spare
 							 */
-- 
1.7.1
* RE: Single-drive RAID0
  From: Wojcik, Krzysztof @ 2011-02-22 11:50 UTC
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Tuesday, February 22, 2011 1:50 AM
> To: Wojcik, Krzysztof
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Single-drive RAID0
>
> On Mon, 21 Feb 2011 14:11:51 +0000 "Wojcik, Krzysztof"
> <krzysztof.wojcik@intel.com> wrote:
>
> > Hi,
> >
> > I tested your proposal.
> > It resolves the problem.
>
> Thanks!
>
> > What are your future plans?
> > Do you need to discuss the patch with someone involved in the block
> > device code? When can I expect a final patch?
>
> Here it is. As you can see it is quite different - I figured out what is
> really going on.
>
> > > If it does, then we can explore more...
> > > Did you say that this works better on earlier kernels? If so, what is
> > > the latest kernel you have tried on which you don't have this
> > > problem?
> >
> > I've tested kernel 2.6.32 from RHEL 6.0 and, if I remember correctly,
> > 2.6.34 - they work properly.
>
> That fits - I discovered that I broke it in 2.6.35.

Hi,

I confirm. This patch resolves the problem.
I ran tests on the current top of your for-next branch, and the issue is
not reproducible.
Thanks for your cooperation! :-)

Regards
Krzysztof

> Thanks,
> NeilBrown
>
> From b7fea734c7780ca80f58fb87a6c9f55aa83314d7 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@suse.de>
> Date: Tue, 22 Feb 2011 11:48:03 +1100
> Subject: [PATCH] md: Fix - again - partition detection when array
>  becomes active
> [...]
* Re: Single-drive RAID0
  From: Jan Ceuleers @ 2011-02-22 16:42 UTC
  To: NeilBrown; +Cc: Wojcik, Krzysztof; linux-raid@vger.kernel.org

On 22/02/11 01:50, NeilBrown wrote:
> With the patch, the rescan is only performed when needed and so there
> are no races.

Neil,

If I've properly understood your commit message, you are reducing the
frequency of the rescans, but they can still occur. I submit that this
merely reduces the probability but does not remove the possibility of races
with udev.

If my interpretation is correct, then is this something that should be
fixed (to remove the possibility of races altogether), or merely
documented?

Thanks, Jan
* Re: Single-drive RAID0
  From: NeilBrown @ 2011-02-23  2:20 UTC
  To: Jan Ceuleers; +Cc: Wojcik, Krzysztof; linux-raid@vger.kernel.org

On Tue, 22 Feb 2011 17:42:14 +0100 Jan Ceuleers
<jan.ceuleers@computer.org> wrote:

> On 22/02/11 01:50, NeilBrown wrote:
> > With the patch, the rescan is only performed when needed and so there
> > are no races.
>
> Neil,
>
> If I've properly understood your commit message, you are reducing the
> frequency of the rescans, but they can still occur. I submit that this
> merely reduces the probability but does not remove the possibility of
> races with udev.
>
> If my interpretation is correct, then is this something that should be
> fixed (to remove the possibility of races altogether), or merely
> documented?

Thanks for reviewing the code!!

The patch should reduce the frequency of rescans to just those times when
one is actually needed: i.e. when the array data first becomes available,
and when it stops being available. So a rescan will only happen as a direct
result of an administrative action. Previously it could happen at arbitrary
times based on open/close/unlink patterns.

So it isn't really "reducing the frequency" so much as "making it happen
only at well-defined times, not arbitrary times".

So this does remove any possibility of races.

Thanks,
NeilBrown
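To illustrate the distinction being drawn, a hedged sketch using the md
device from earlier in the thread; the commands are illustrative
assumptions, not taken from this exchange:

  # with the patch, a partition rescan happens only at points like these:
  mdadm -R /dev/md126    # array becomes active: capacity set, partitions rescanned
  mdadm -S /dev/md126    # array stopped: capacity zeroed, partitions dropped
  # and no longer at arbitrary device-node churn such as:
  mknod /dev/.tmp b 9 126; : < /dev/.tmp; rm /dev/.tmp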
Thread overview: 16+ messages (newest: 2011-02-23 2:20 UTC)

  2011-02-08 16:15 Single-drive RAID0 Wojcik, Krzysztof
  2011-02-09  2:45 ` NeilBrown
  2011-02-09 14:33   ` Wojcik, Krzysztof
  2011-02-14  2:11     ` NeilBrown
  2011-02-14 17:04       ` Wojcik, Krzysztof
  2011-02-15  0:01         ` NeilBrown
  2011-02-15 16:30           ` Wojcik, Krzysztof
  2011-02-16  0:42             ` NeilBrown
  2011-02-16 16:57               ` Wojcik, Krzysztof
  2011-02-17  0:38                 ` NeilBrown
  2011-02-21  2:15                 ` NeilBrown
  2011-02-21 14:11                   ` Wojcik, Krzysztof
  2011-02-22  0:50                     ` NeilBrown
  2011-02-22 11:50                       ` Wojcik, Krzysztof
  2011-02-22 16:42                       ` Jan Ceuleers
  2011-02-23  2:20                         ` NeilBrown