* to be or not to be...
From: gelma @ 2006-04-23 12:58 UTC
To: linux-raid

Hi all,
to make a long story very very short:

a) I create /dev/md1, kernel latest rc-2-git4 and mdadm-2.4.1.tgz,
with this command:
/root/mdadm -Cv /dev/.static/dev/.static/dev/.static/dev/md1 \
--bitmap-chunk=1024 --chunk=256 --assume-clean --bitmap=internal \
-l5 -n5 /dev/hda2 /dev/hdb2 /dev/hde2 /dev/hdf2 missing

b) dm-encrypt /dev/md1

c) create fs with:
mkfs.ext3 -O dir_index -L 'tritone' -i 256000 /dev/mapper/raidone

d) export it via nfs (mounting /dev/mapper/raidone as ext2)

e) start to cp-ing files

f) after 1 TB of written data, with no problem/warning, one of the
not-in-raid-array HD freeze

g) reboot, check with:
fsck -C -D -y /dev/mapper/raidone

   a) first run: lot of strange errors report about impossible i_size
      values, duplicated blocks, and so on, but it ends without
      complaint, saying everything is fixed.

   b) mount it (as ext3): everything, at first glance, seems good
      (I will check checksums tomorrow) as to number/size/filename/directory
      place of files. In /lost+found there are some files, but nothing
      "real". I mean, special files/devices that never were on that fs,
      with giga/tera sizes (holes, of course), with chattr bits randomly
      set. When I try to remove them I get a kernel complaint about
      an offset in dir /lost+found.

   c) fsck again; after that, everything is fine.

Now the cloning from the old storage is going on, and I'm wondering
if "--assume-clean" could be the reason for what happened.

btw, hardware passed the usual tests (memtest, cpuburn, ecc).

thanks a lot for your time and sorry for my terrible english,
gelma
* Re: to be or not to be...
From: Molle Bestefich @ 2006-04-23 15:14 UTC
To: gelma
Cc: linux-raid

gelma wrote:
> first run: lot of strange errors report about impossible i_size
> values, duplicated blocks, and so on

You mention only filesystem errors, no block device related errors.
In this case, I'd say that it's more likely that dm-crypt is to blame
rather than MD.

I think you should try the dm-devel mailing list. Posting a complete
log of everything that has happened would probably be a good thing.

I have no experience with dm-crypt, but I do have experience with
another dm target (dm-snapshot), which is very good at destroying my
data.

If you want a stable solution for encrypting your files, I can
recommend loop-aes. loop-aes has very well thought-through security,
the docs are concise but have wide coverage, it has good backwards
compatibility - probably not your biggest concern right now, but it is
*very* nice to know that your data is accessible, in the future as
well as now - etc.. I've been using it for a couple of years now,
since the 2.2 or 2.4 days (can't remember), and I've had nothing short
of an absolutely *brilliant* experience with it.

Enough propaganda for now, hope that you get your problem solved :-).
* Re: to be or not to be...
From: Neil Brown @ 2006-04-23 21:45 UTC
To: gelma
Cc: linux-raid

On Sunday April 23, dislessico@gmail.com wrote:
> Hi all,
> to make a long story very very short:
> a) I create /dev/md1, kernel latest rc-2-git4 and mdadm-2.4.1.tgz,
> with this command:
> /root/mdadm -Cv /dev/.static/dev/.static/dev/.static/dev/md1 \
> --bitmap-chunk=1024 --chunk=256 --assume-clean --bitmap=internal \
                                  ^^^^^^^^^^^^^^
> -l5 -n5 /dev/hda2 /dev/hdb2 /dev/hde2 /dev/hdf2 missing
>

From the man page:

    --assume-clean
        Tell mdadm that the array pre-existed and is known to be clean.
        It can be useful when trying to recover from a major failure as
        you can be sure that no data will be affected unless you actually
        write to the array. It can also be used when creating a RAID1 or
        RAID10 if you want to avoid the initial resync, however this
        practice - while normally safe - is not recommended.
        Use this only if you really know what you are doing.
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

So presumably you know what you are doing, and I wonder why you bother
to ask us :-)
Of course, if you don't know what you are doing, then I suggest
dropping the --assume-clean.
Incorrect use of this flag can lead to data corruption. This is
particularly true if your array goes degraded, but is also true while
your array isn't degraded. In this case it is (I think) very unusual
and may not be the cause of your corruption, but you should avoid
using the flag anyway.

> b) dm-encrypt /dev/md1
>
> c) create fs with:
> mkfs.ext3 -O dir_index -L 'tritone' -i 256000 /dev/mapper/raidone
>
> d) export it via nfs (mounting /dev/mapper/raidone as ext2)
                                                        ^^^^

Why not ext3?

>
> e) start to cp-ing files
>
> f) after 1 TB of written data, with no problem/warning, one of the
> not-in-raid-array HD freeze

This could signal a bad controller. If it does, then you cannot trust
any drives.

NeilBrown
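The warning about --assume-clean can be illustrated with a toy model of one RAID5 stripe. This is only a sketch, not md's code: it assumes four data blocks plus one XOR parity block and a read-modify-write update path, and the helper names (`xor_blocks`, `rmw_write`, `reconstruct`) and block values are made up for illustration. It compares rebuilding a lost block when parity was computed by an initial resync against the case where the resync was skipped and the parity block holds leftovers.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def rmw_write(data, parity, index, new_block):
    """Read-modify-write: new parity = old parity ^ old data ^ new data."""
    new_parity = xor_blocks([parity, data[index], new_block])
    new_data = data[:index] + [new_block] + data[index + 1:]
    return new_data, new_parity


def reconstruct(data, parity, lost):
    """Rebuild the block at position `lost` from the survivors and parity."""
    survivors = [b for i, b in enumerate(data) if i != lost]
    return xor_blocks(survivors + [parity])


# Whatever happened to be on the member disks at creation time (toy values).
data = [b"\x11" * 4, b"\x22" * 4, b"\x33" * 4, b"\x44" * 4]

# Case 1: parity was computed by an initial resync.
synced_parity = xor_blocks(data)
d1, p1 = rmw_write(data, synced_parity, 0, b"\xaa" * 4)
clean_ok = reconstruct(d1, p1, lost=1) == d1[1]

# Case 2: resync skipped, the parity block holds unrelated leftovers.
stale_parity = b"\x00" * 4
d2, p2 = rmw_write(data, stale_parity, 0, b"\xaa" * 4)
stale_ok = reconstruct(d2, p2, lost=1) == d2[1]

print("synced parity, block rebuilt correctly:", clean_ok)
print("stale parity, block rebuilt correctly:", stale_ok)
```

In this model the first check prints True and the second False: the read-modify-write update carries the original parity error forward, so the damage only shows up once a block has to be rebuilt from parity. A write path that recomputes parity from all data blocks would repair the stripe instead, which is one reason the real-world effect depends on the access pattern.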
* Re: to be or not to be...
From: gelma @ 2006-04-24 14:34 UTC
To: Neil Brown, linux-raid

On Mon, Apr 24, 2006 at 07:45:27AM +1000, Neil Brown wrote:
> your array isn't degraded. In this case it is (I think) very unusual
> and may not be the cause of your corruption, but you should avoid
> using the flag anyway.

thanks a lot for your time and your attention, Neil.
Your support is fast and valuable, as usual.

well, I wasted a lot of hours, after my post, trying to find the
reason for the corruption I've got.
Well, the problem is funny... I mean... I can cp hundreds of giga, in
ext2, without complaint in dmesg/log, but if I umount the fs and run
fsck I get a lot of incredible problems (duplicated blocks, and so
on). with ext3 it can work for hours; seldom I get ext3-journal
corruption. anyway, after fsck, the checksum of the files is always
good, and lost+found is full of monsters (some files need debugfs to be
eliminated (lsattr/chattr failed working with them)).

after checking hardware, changing controllers, now I have changed even
the hd cables. at home I will re-run all the tests.
I don't think it's a problem of raid software, of course.

> > b) dm-encrypt /dev/md1
> >
> > c) create fs with:
> > mkfs.ext3 -O dir_index -L 'tritone' -i 256000 /dev/mapper/raidone
> >
> > d) export it via nfs (mounting /dev/mapper/raidone as ext2)
>                                                         ^^^^
>
> Why not ext3?

Well, because I had to clone 1.5 TB of data, spread over a lot of
disks, in one shot, and to avoid journal seeks I've done so.

> > e) start to cp-ing files
> >
> > f) after 1 TB of written data, with no problem/warning, one of the
> > not-in-raid-array HD freeze
>
> This could signal a bad controller. If it does, then you cannot trust
> any drives.

well, it was my fault... I mean, I've got a Dell server, without
enough internal room for all the disks. The source disk was out of the
server, and I moved it... it wasn't happy...
anyway, I'm using HPT ATA PCI controllers (well tested, I mean, I have
used the ones in the server since 2000).
btw, 5 Maxtor disks, 500 giga each.

The problem isn't MD related, but it's the first time I've had so
much trouble finding the culprit of data corruption. Usually it's a
RAM/CPU fault, a few times I've had problems with a controller... but
this time I'm going slightly mad... also, why meta and not data (files
are checked with a stupid python script I wrote)... is there an ATA
command triggered only with metadata? uhm... maybe mounting the array
in synchronous mode I could gather more info, uhm...

at the end, Neil, thanks a lot for your work.
If you'll be in Italy, some day, I'll be happy to be your host.

ciao,
gelma
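The Python script gelma used to check file contents is not shown in the thread. A minimal sketch of that kind of check, assuming the old storage and the new copy are both mounted as directory trees, could look like the following; the function names and the choice of SHA-256 are illustrative assumptions, not gelma's actual script.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file data is compared.
            if os.path.isfile(full) and not os.path.islink(full):
                digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(source_root, copy_root):
    """Return (missing, extra, mismatched) relative paths, each sorted."""
    src = tree_digests(source_root)
    dst = tree_digests(copy_root)
    missing = sorted(set(src) - set(dst))
    extra = sorted(set(dst) - set(src))
    mismatched = sorted(p for p in src.keys() & dst.keys() if src[p] != dst[p])
    return missing, extra, mismatched
```

Calling `compare_trees("/mnt/old", "/mnt/raidone")` (both paths hypothetical) returns three lists that are all empty when the copy matches. A check like this only covers file data, which fits the observation above that checksums stayed good while fsck kept finding metadata damage; it says nothing about inodes, directories, or the journal.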
* Re: to be or not to be...
From: gelma @ 2006-05-03 15:36 UTC
To: linux-raid
Cc: Neil Brown

On Mon, Apr 24, 2006 at 07:45:27 +1000, Neil Brown wrote:
> This could signal a bad controller. If it does, then you cannot trust
> any drives.

Hi all,
just to tell the end of the story: my tests, and others' [1], confirm a
problem in the MD + dm-crypt interaction.
It would be good to put an advisory in menuconfig.

Thanks a lot for your time,
gelma

-------
[1] http://episteme.arstechnica.com/groupee/forums/a/tpc/f/96509133/m/282007248731