Update -- so far, we've not managed to gain any confidence that we'll
ever be able to re-mount that disk. The general consensus seems to be
that we should fish all the data off the disk using rsync, and then
move off XFS to ext4. Not a very helpful message for y'all to hear, I
know. But if it's any help in prioritising your future work, I think
the dealbreaker for us was the inescapable quotacheck on mount, which
means that any time a fileserver goes down unexpectedly, we have an
unavoidable, indeterminate-but-long period of downtime...
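In case it's useful context, the rough shape of that evacuation would
be something like the sketch below. Illustrative only -- /dev/drbd0 and
/mnt/log_storage are from our setup, and /mnt/evacuation is a stand-in
for wherever the data lands; we haven't finalised the details:

    # Copy everything off the XFS volume while it's still readable
    # (-a archive, -H hard links, -A ACLs, -X extended attributes,
    # --numeric-ids to avoid uid/gid name mapping).
    rsync -aHAX --numeric-ids /mnt/log_storage/ /mnt/evacuation/

    # Rebuild the volume as ext4, then copy the data back.
    umount /mnt/log_storage
    mkfs.ext4 /dev/drbd0
    mount /dev/drbd0 /mnt/log_storage
    rsync -aHAX --numeric-ids /mnt/evacuation/ /mnt/log_storage/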
hp

On 26/02/15 13:07, Harry wrote:
> Thanks Dave,
>
> * The main filesystem is currently online and seems OK, but quotas
>   are not active.
> * We want to estimate how long the quotacheck will take when we
>   reboot/remount.
> * We're even a bit worried the disk might be in a broken state, such
>   that the quotacheck won't actually complete successfully at all.
>
> A brief description of our setup:
> - we're on AWS
> - we're using mdadm (plus LVM) to make a RAID array out of 8x 200GB
>   SSD EBS drives
> - we're using DRBD to make a live backup of all writes to another
>   instance with a similar RAID array
>
> We're not doing our experiments on our live system. Instead, we're
> using the drives from the DRBD target system. We take DRBD offline,
> so it's no longer writing, then we take snapshots of the drives, then
> remount those elsewhere so we can experiment without disturbing the
> live system.
>
> We've managed to mount the backup drives OK, with the 'noquota'
> option. Files look OK. But, so far, we haven't been able to get a
> quotacheck to complete. We've waited 12 hours+. Do you think it's
> possible DRBD is giving us copies of the live disks that are
> inconsistent somehow?
>
> How can we reassure ourselves that this live disk *will* mount
> successfully if we reboot the machine, and can we estimate how long
> it will take?
>
> mount | grep log_storage
> /dev/drbd0 on /mnt/log_storage type xfs (rw,prjquota,allocsize=64k,_netdev)
>
> df -i /mnt/log_storage
> Filesystem        Inodes    IUsed     IFree IUse% Mounted on
> /dev/drbd0     938210704 72929413 865281291    8% /mnt/log_storage
>
> df -h /mnt/log_storage
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/drbd0      1.6T  1.4T  207G  88% /mnt/log_storage
>
> xfs_info /mnt/log_storage
> meta-data=/dev/drbd0             isize=256    agcount=64, agsize=6553600 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=418906112, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=12800, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> The missing-path errors are, I think, from folders we've deleted but
> not yet removed from the projid/projects files. I *think* they're a
> red herring here.
>
> We've also tried running xfs_repair on the backup drives. It takes
> about 3 hours, and shows a lot of errors about incorrect directory
> flags on inodes. Here's one from the bottom of the log of a recent
> attempt:
>
> directory flags set on non-directory inode 268702898
>
> rgds,
> Confused in London.
>
> On 24/02/15 21:59, Dave Chinner wrote:
>> On Tue, Feb 24, 2015 at 03:15:26PM +0000, Harry wrote:
>>> Hi there,
>>>
>>> We've got a moderately large disk (~2TB) into an inconsistent
>>> state, such that it's going to want a quotacheck the next time we
>>> mount it (it's currently mounted with quota accounting inactive).
>>> Our tests suggest this is going to take several hours, and cause an
>>> outage we can't afford.
>> What tests are you performing to suggest a quotacheck of a small
>> filesystem will take hours? (Yes, 2TB is a *small* filesystem.)
>>
>> (xfs_info, df -i, df -h, storage hardware, etc. are all relevant
>> here.)
>>
>>> We're wondering whether there's a 'nuke the site from orbit' option
>>> that will let us avoid it. The plan would be to:
>>> - switch off quotas and delete them completely, using the commands:
>>>   -- disable
>>>   -- off
>>>   -- remove
>>> - remount the drive with -o prjquota, hoping that there will not be
>>>   a quotacheck, because we've deleted all the old quota data
>> Mounting with a quota enabled *forces* a quota check if quotas
>> aren't currently enabled. You cannot avoid it; it's the way quota
>> consistency is created.
>>
>>> - run a script to gradually restore all the quotas, one by one and
>>>   in good time, from our own external backups (we've got the quotas
>>>   in a database, basically)
>> Can't be done - quotas need to be consistent with what is currently
>> on disk, not what you have in a backup somewhere.
>>
>>> So the questions are:
>>> - is there a way to remove all quota information from a mounted
>>>   drive?
>> Mount with quotas on and turn them off via xfs_quota, or mount
>> without quota options at all. Then run the remove command in
>> xfs_quota.
>>
>>> (the current mount status seems to be that it tried to mount it
>>> with -o prjquota but that quota accounting is *not* active)
>> Not possible.
>>
>>> - will it work and let us remount the drive with -o prjquota
>>>   without causing a quotacheck?
>> No.
>>
>> Cheers,
>>
>> Dave.

Rgds,
Harry + the PythonAnywhere team.

--
Harry Percival
Developer
harry@pythonanywhere.com

PythonAnywhere - a fully browser-based Python development and hosting
environment

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK
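P.S. For anyone who finds this in the archives: the quota-removal
sequence Dave describes above, as we understand it. A sketch only --
we haven't run it end-to-end, and the device name and mount point are
from our own setup:

    # Mount without any quota options (mounting with -o prjquota would
    # itself force the quotacheck we're trying to avoid)...
    umount /mnt/log_storage
    mount -o noquota /dev/drbd0 /mnt/log_storage

    # ...or, alternatively, mount with quotas on and then switch them
    # off first:
    #   xfs_quota -x -c 'off -p' /mnt/log_storage

    # Either way, once quotas are off, remove the on-disk project
    # quota metadata:
    xfs_quota -x -c 'remove -p' /mnt/log_storage

Note that, per Dave's answers above, this doesn't buy you a
quotacheck-free remount: the next mount with -o prjquota will still
force one.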