* Problem with mkfs.xfs on a regular file
@ 2013-11-27 2:31 Phil White
2013-11-27 2:36 ` Phil White
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Phil White @ 2013-11-27 2:31 UTC (permalink / raw)
To: xfs
Gents:
I was making an image for a VM using everyone's favorite fs with a line
that looked something like this:
-------------
dd if=/dev/zero of=~/image bs=1024 count=1048576 && ./mkfs/mkfs.xfs && mount -o loop ~/image /mnt/loop
-------------
mkfs.xfs gave me this output:
-------------
meta-data=/root/image isize=256 agcount=4, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
existing superblock read failed: Invalid argument
mkfs.xfs: pwrite64 failed: Invalid argument
mkfs.xfs: read failed: Invalid argument
-------------
And, of course, it failed to mount.
Knowing that this was using a possibly old version of xfsprogs (3.1.10, it
turned out), I grabbed the git tree and tried that. The same problem occurred.
The "existing superblock read" problem was from mkfs/xfs_mkfs.c @ 806 in
zero_old_xfs_structures():
-------------
if (pread(xi->dfd, buf, new_sb->sb_sectsize, 0) != new_sb->sb_sectsize) {
fprintf(stderr, _("existing superblock read failed: %s\n"),
strerror(errno));
free(buf);
return;
}
-------------
The pwrite64 problem was from libxfs/rdwr.c @ 806 in __write_buf():
-------------
sts = pwrite64(fd, buf, len, offset);
if (sts < 0) {
int error = errno;
fprintf(stderr, _("%s: pwrite64 failed: %s\n"),
progname, strerror(error));
if (flags & LIBXFS_B_EXIT)
exit(1);
return error;
} else if (sts != len) {
fprintf(stderr, _("%s: error - pwrite64 only %d of %d bytes\n"),
progname, sts, len);
if (flags & LIBXFS_B_EXIT)
exit(1);
return EIO;
}
-------------
While it occurred to me that the problem might just be line 806 of some files
in xfsprogs, I threw it under a debugger and took a closer look. The file
descriptor value in xi->dfd pointed at ~/image. errno was set to 22. I
thought that might indicate a problem with lseek(), so I rewrote the pwrite64()
and pread() as lseek()s and read()/write()
As you may have guessed, this did me no good at all.
It's trying to read/write 512 bytes at the beginning of the file which seems
reasonably innocuous. I double checked the man page which says that under
2.6, O_DIRECT writes can be aligned to 512 bytes without a problem. sbp comes
out with 4096 in blocksize and 512 in sectsize when zero_old_xfs_structures()
is called and the first error comes up, so I'm at a loss for what's going wrong.
-Phil
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: Problem with mkfs.xfs on a regular file 2013-11-27 2:31 Problem with mkfs.xfs on a regular file Phil White @ 2013-11-27 2:36 ` Phil White 2013-11-27 2:38 ` Nathan Scott 2013-11-27 2:47 ` Dave Chinner 2 siblings, 0 replies; 15+ messages in thread From: Phil White @ 2013-11-27 2:36 UTC (permalink / raw) To: xfs On Tue, Nov 26, 2013 at 06:31:19PM -0800, Phil White wrote: > Gents: > > I was making an image for a VM using everyone's favorite fs with a line > that looked something like this: > ------------- > dd if=/dev/zero of=~/image bs=1024 count=1048576 && ./mkfs/mkfs.xfs && mount -o loop ~/image /mnt/loop > ------------- Obviously, I intended "mkfs.xfs -f /root/image". Why is proofreading *AFTER* you send an email so much more effective? -Phil _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file 2013-11-27 2:31 Problem with mkfs.xfs on a regular file Phil White 2013-11-27 2:36 ` Phil White @ 2013-11-27 2:38 ` Nathan Scott 2013-11-27 2:41 ` Phil White 2013-11-27 2:47 ` Dave Chinner 2 siblings, 1 reply; 15+ messages in thread From: Nathan Scott @ 2013-11-27 2:38 UTC (permalink / raw) To: Phil White; +Cc: xfs ----- Original Message ----- > ... > It's trying to read/write 512 bytes at the beginning of the file which seems > reasonably innocuous. I double checked the man page which says that under > 2.6, O_DIRECT writes can be aligned to 512 bytes without a problem. sbp > comes > out with 4096 in blocksize and 512 in sectsize when zero_old_xfs_structures() > is called and the first error comes up, so I'm at a loss for what's going > wrong. The filesystem backing the new /root/image file doesn't support direct I/O? cheers. -- Nathan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file 2013-11-27 2:38 ` Nathan Scott @ 2013-11-27 2:41 ` Phil White 2013-11-27 2:47 ` Dave Chinner 0 siblings, 1 reply; 15+ messages in thread From: Phil White @ 2013-11-27 2:41 UTC (permalink / raw) To: Nathan Scott; +Cc: Phil White, xfs On Tue, Nov 26, 2013 at 09:38:20PM -0500, Nathan Scott wrote: > ----- Original Message ----- > > ... > > It's trying to read/write 512 bytes at the beginning of the file which seems > > reasonably innocuous. I double checked the man page which says that under > > 2.6, O_DIRECT writes can be aligned to 512 bytes without a problem. sbp > > comes > > out with 4096 in blocksize and 512 in sectsize when zero_old_xfs_structures() > > is called and the first error comes up, so I'm at a loss for what's going > > wrong. > > The filesystem backing the new /root/image file doesn't support direct I/O? > > cheers. The filesystem backing /root/image is xfs. Good guess though. For the record: caliban mnt # xfs_info / meta-data=/dev/root isize=256 agcount=4, agsize=25685952 blks = sectsz=4096 attr=2 data = bsize=4096 blocks=102743808, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal bsize=4096 blocks=50167, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 -Phil _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file 2013-11-27 2:41 ` Phil White @ 2013-11-27 2:47 ` Dave Chinner 0 siblings, 0 replies; 15+ messages in thread From: Dave Chinner @ 2013-11-27 2:47 UTC (permalink / raw) To: Phil White; +Cc: Nathan Scott, Phil White, xfs On Tue, Nov 26, 2013 at 06:41:19PM -0800, Phil White wrote: > On Tue, Nov 26, 2013 at 09:38:20PM -0500, Nathan Scott wrote: > > ----- Original Message ----- > > > ... > > > It's trying to read/write 512 bytes at the beginning of the file which seems > > > reasonably innocuous. I double checked the man page which says that under > > > 2.6, O_DIRECT writes can be aligned to 512 bytes without a problem. sbp > > > comes > > > out with 4096 in blocksize and 512 in sectsize when zero_old_xfs_structures() > > > is called and the first error comes up, so I'm at a loss for what's going > > > wrong. > > > > The filesystem backing the new /root/image file doesn't support direct I/O? > > > > cheers. > > The filesystem backing /root/image is xfs. Good guess though. > > For the record: > caliban mnt # xfs_info / > meta-data=/dev/root isize=256 agcount=4, agsize=25685952 blks > = sectsz=4096 attr=2 ^^^^^^^^^^^ Yup, there's your problem. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file 2013-11-27 2:31 Problem with mkfs.xfs on a regular file Phil White 2013-11-27 2:36 ` Phil White 2013-11-27 2:38 ` Nathan Scott @ 2013-11-27 2:47 ` Dave Chinner 2013-11-28 2:39 ` Eric Sandeen 2 siblings, 1 reply; 15+ messages in thread From: Dave Chinner @ 2013-11-27 2:47 UTC (permalink / raw) To: Phil White; +Cc: xfs On Tue, Nov 26, 2013 at 06:31:19PM -0800, Phil White wrote: > Gents: > > I was making an image for a VM using everyone's favorite fs with a line > that looked something like this: > ------------- > dd if=/dev/zero of=~/image bs=1024 count=1048576 && ./mkfs/mkfs.xfs && mount -o loop ~/image /mnt/loop > ------------- > > > mkfs.xfs gave me this output: > ------------- > meta-data=/root/image isize=256 agcount=4, agsize=65536 blks > = sectsz=512 attr=2, projid32bit=0 > data = bsize=4096 blocks=262144, imaxpct=25 > = sunit=0 swidth=0 blks > naming =version 2 bsize=4096 ascii-ci=0 > log =internal log bsize=4096 blocks=2560, version=2 > = sectsz=512 sunit=0 blks, lazy-count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > existing superblock read failed: Invalid argument > mkfs.xfs: pwrite64 failed: Invalid argument > mkfs.xfs: read failed: Invalid argument > ------------- ..... > > While it occurred to me that the problem might just be line 806 of some files > in xfsprogs, I threw it under a debugger and took a closer look. The file > descriptor value in xi->dfd pointed at ~/image. errno was set to 22. I > thought that might indicate a problem with lseek(), so I rewrote the pwrite64() > and pread() as lseek()s and read()/write() > > As you may have guessed, this did me no good at all. > > It's trying to read/write 512 bytes at the beginning of the file which seems > reasonably innocuous. I double checked the man page which says that under > 2.6, O_DIRECT writes can be aligned to 512 bytes without a problem. That doesn't mean it is correct, because the man page also says: " In Linux alignment restrictions vary by filesystem and kernel version and might be absent entirely." So, I bet that your underlying filesystem (i.e. the host filesystem) has a sector size of 4k, and that's why direct Io on 512 byte alignment is failing. In that case, run "mkfs.xfs -s size=4k ..." and mkfs should just work fine... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file 2013-11-27 2:47 ` Dave Chinner @ 2013-11-28 2:39 ` Eric Sandeen 2013-11-28 5:16 ` Dave Chinner 0 siblings, 1 reply; 15+ messages in thread From: Eric Sandeen @ 2013-11-28 2:39 UTC (permalink / raw) To: Dave Chinner, Phil White; +Cc: xfs On 11/26/13, 8:47 PM, Dave Chinner wrote: > On Tue, Nov 26, 2013 at 06:31:19PM -0800, Phil White wrote: >> Gents: >> >> I was making an image for a VM using everyone's favorite fs with a line >> that looked something like this: >> ------------- >> dd if=/dev/zero of=~/image bs=1024 count=1048576 && ./mkfs/mkfs.xfs && mount -o loop ~/image /mnt/loop >> ------------- >> >> >> mkfs.xfs gave me this output: >> ------------- >> meta-data=/root/image isize=256 agcount=4, agsize=65536 blks >> = sectsz=512 attr=2, projid32bit=0 >> data = bsize=4096 blocks=262144, imaxpct=25 >> = sunit=0 swidth=0 blks >> naming =version 2 bsize=4096 ascii-ci=0 >> log =internal log bsize=4096 blocks=2560, version=2 >> = sectsz=512 sunit=0 blks, lazy-count=1 >> realtime =none extsz=4096 blocks=0, rtextents=0 >> existing superblock read failed: Invalid argument >> mkfs.xfs: pwrite64 failed: Invalid argument >> mkfs.xfs: read failed: Invalid argument >> ------------- > ..... >> >> While it occurred to me that the problem might just be line 806 of some files >> in xfsprogs, I threw it under a debugger and took a closer look. The file >> descriptor value in xi->dfd pointed at ~/image. errno was set to 22. I >> thought that might indicate a problem with lseek(), so I rewrote the pwrite64() >> and pread() as lseek()s and read()/write() >> >> As you may have guessed, this did me no good at all. >> >> It's trying to read/write 512 bytes at the beginning of the file which seems >> reasonably innocuous. I double checked the man page which says that under >> 2.6, O_DIRECT writes can be aligned to 512 bytes without a problem. > > That doesn't mean it is correct, because the man page also says: > > " In Linux alignment restrictions vary by filesystem and kernel > version and might be absent entirely." > > So, I bet that your underlying filesystem (i.e. the host filesystem) > has a sector size of 4k, and that's why direct Io on 512 byte > alignment is failing. In that case, run "mkfs.xfs -s size=4k ..." > and mkfs should just work fine... Sadly, no. Or at least, probably not. __initbuf memalign(libxfs_device_alignment(), bytes); where libxfs_device_alignment() does: platform_align_blockdev if (!max_block_alignment) return getpagesize(); return max_block_alignment; and through twisty paths through platform_findsizes(), max_block_alignment is BBSIZE, or 512. IOWS: xfsprogs is a braindead package that doesn't know how to properly handle non-512-aligned DIO. ;) </snark> We should fix xfsprogs, but I'm also looking making 4k "sector" xfs able to do 512 DIOs as long as the device under it can do 512 DIOS. We got into the 4k DIO alignment because mkfs.xfs saw a 512 logical / 4k physical drive, and chose 4k for its own internal sector size, which is great in terms of doing metadata IO efficiently, but not so great in terms of rejecting 512 byte IOs that the drive could otherwise do. -Eric _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file 2013-11-28 2:39 ` Eric Sandeen @ 2013-11-28 5:16 ` Dave Chinner 2013-11-28 5:34 ` Eric Sandeen 0 siblings, 1 reply; 15+ messages in thread From: Dave Chinner @ 2013-11-28 5:16 UTC (permalink / raw) To: Eric Sandeen; +Cc: Phil White, xfs On Wed, Nov 27, 2013 at 08:39:55PM -0600, Eric Sandeen wrote: > On 11/26/13, 8:47 PM, Dave Chinner wrote: > > On Tue, Nov 26, 2013 at 06:31:19PM -0800, Phil White wrote: > >> Gents: > >> > >> I was making an image for a VM using everyone's favorite fs with a line > >> that looked something like this: > >> ------------- > >> dd if=/dev/zero of=~/image bs=1024 count=1048576 && ./mkfs/mkfs.xfs && mount -o loop ~/image /mnt/loop > >> ------------- > >> > >> > >> mkfs.xfs gave me this output: > >> ------------- > >> meta-data=/root/image isize=256 agcount=4, agsize=65536 blks > >> = sectsz=512 attr=2, projid32bit=0 > >> data = bsize=4096 blocks=262144, imaxpct=25 > >> = sunit=0 swidth=0 blks > >> naming =version 2 bsize=4096 ascii-ci=0 > >> log =internal log bsize=4096 blocks=2560, version=2 > >> = sectsz=512 sunit=0 blks, lazy-count=1 > >> realtime =none extsz=4096 blocks=0, rtextents=0 > >> existing superblock read failed: Invalid argument > >> mkfs.xfs: pwrite64 failed: Invalid argument > >> mkfs.xfs: read failed: Invalid argument > >> ------------- > > ..... > >> > >> While it occurred to me that the problem might just be line 806 of some files > >> in xfsprogs, I threw it under a debugger and took a closer look. The file > >> descriptor value in xi->dfd pointed at ~/image. errno was set to 22. I > >> thought that might indicate a problem with lseek(), so I rewrote the pwrite64() > >> and pread() as lseek()s and read()/write() > >> > >> As you may have guessed, this did me no good at all. > >> > >> It's trying to read/write 512 bytes at the beginning of the file which seems > >> reasonably innocuous. I double checked the man page which says that under > >> 2.6, O_DIRECT writes can be aligned to 512 bytes without a problem. > > > > That doesn't mean it is correct, because the man page also says: > > > > " In Linux alignment restrictions vary by filesystem and kernel > > version and might be absent entirely." > > > > So, I bet that your underlying filesystem (i.e. the host filesystem) > > has a sector size of 4k, and that's why direct Io on 512 byte > > alignment is failing. In that case, run "mkfs.xfs -s size=4k ..." > > and mkfs should just work fine... > > Sadly, no. Or at least, probably not. > > __initbuf > memalign(libxfs_device_alignment(), bytes); > > where libxfs_device_alignment() does: Yeah, that's for memory buffer alignment, though, not IO alignment. It's busted because that should always default to page size, not sector size. But that's not the problem - for example: # xfs_info /storage meta-data=/dev/md0 isize=256 agcount=32, agsize=21503744 blks = sectsz=4096 attr=2, projid32bit=0 = crc=0 data = bsize=4096 blocks=688119680, imaxpct=5 = sunit=32 swidth=320 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal bsize=4096 blocks=335995, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 That's a 4k sector filesystem, and: # dd if=/dev/zero of=/storage/fubar.img bs=1024 count=1048576 && mkfs.xfs -d file,size=1g,name=/storage/fubar.img 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB) copied, 4.18106 s, 257 MB/s meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks = sectsz=512 attr=2, projid32bit=1 = crc=0 data = bsize=4096 blocks=262144, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal log bsize=4096 blocks=7344, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 # mkfs works fine on it. As does xfs_repair: # xfs_repair -f /storage/fubar.img Phase 1 - find and verify superblock... Phase 2 - using internal log - zero log... - scan filesystem freespace and inode maps... - found root inode chunk Phase 3 - for each AG... - scan and clear agi unlinked lists... - process known inodes and perform inode discovery... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - process newly discovered inodes... Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - check for inodes claiming duplicate blocks... - agno = 1 - agno = 0 - agno = 2 - agno = 3 Phase 5 - rebuild AG headers and trees... - reset superblock... Phase 6 - check inode connectivity... - resetting contents of realtime bitmap and summary inodes - traversing filesystem ... - traversal finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify and correct link counts... And xfs_db works just fine, too: $ sudo xfs_db -f /storage/fubar.img xfs_db> sb 0 xfs_db> p magicnum = 0x58465342 blocksize = 4096 dblocks = 262144 rblocks = 0 rextents = 0 uuid = 73d16c96-df35-4f1f-b781-34da486f089c logstart = 131076 rootino = 128 rbmino = 129 .... because it doesn't set the LIBXFS_DIRECT flag on the device instantiation structures yet and so is using buffered IO. > IOWS: xfsprogs is a braindead package that doesn't know how to > properly handle non-512-aligned DIO. ;) </snark> Yeah, it doesn't know how to handle it but it avoids the problem completely by using buffered IO instead. It works just fine. ;) So, let's recreate the problem knowing that: $ sudo dd if=/dev/zero of=/storage/fubar.img bs=1024 count=1048576 && sudo strace -f -o t.t mkfs.xfs -d size=1g,name=/storage/fubar.img 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB) copied, 4.52546 s, 237 MB/s meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks = sectsz=512 attr=2, projid32bit=1 = crc=0 data = bsize=4096 blocks=262144, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal log bsize=4096 blocks=7344, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 mkfs.xfs: pwrite64 failed: Invalid argument mkfs.xfs: read failed: Invalid argument So, it failed to write using direct IO because of IO alignment because I didn't tell mkfs that it was running on a file. i.e. I forgot the "-d file" option. $ sudo mkfs.xfs -d size=1g,name=/storage/fubar.img meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks = sectsz=512 attr=2, projid32bit=1 = crc=0 data = bsize=4096 blocks=262144, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal log bsize=4096 blocks=7344, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 mkfs.xfs: pwrite64 failed: Invalid argument mkfs.xfs: read failed: Invalid argument Yup, still fails. Let's force it! $ sudo mkfs.xfs -f -d size=1g,name=/storage/fubar.img meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks = sectsz=512 attr=2, projid32bit=1 = crc=0 data = bsize=4096 blocks=262144, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal log bsize=4096 blocks=7344, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 existing superblock read failed: Invalid argument mkfs.xfs: pwrite64 failed: Invalid argument mkfs.xfs: read failed: Invalid argument And there's the identical failure to what was reported. So, user error - the user is telling mkfs.xfs that it is making a filesystem on a block device named "/storage/fubar.img". The same thing happens with the normal method of specifying the block device: sudo mkfs.xfs -f -d size=1g /storage/fubar.img meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks = sectsz=512 attr=2, projid32bit=1 = crc=0 data = bsize=4096 blocks=262144, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal log bsize=4096 blocks=7344, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 existing superblock read failed: Invalid argument mkfs.xfs: pwrite64 failed: Invalid argument mkfs.xfs: read failed: Invalid argument But if we remove the image file: $ sudo mkfs.xfs -f -d size=1g /storage/fubar.img /storage/fubar.img: No such file or directory Usage: mkfs.xfs .... It's pretty clear that we need the "-d file" when the file doesn't actually exist. IOWs, mkfs does not expect a block device to lie about it's sector sizes, but that's exactly what treating an image file like a block device leads to. This isn't the DIO sector size problem you were looking for, Eric ;) FWIW, an strace shows: 12256 ioctl(3, BLKDISCARD, 0x7fff76f4ea50) = -1 ENOTTY (Inappropriate ioctl for device) ... that we make that same mistake in several places in mkfs. What mkfs needs to do is reject devices that are files when "-d file", "-l file" and "-r file" is not specified, and the problem will go away because it will catch users who forget to tell mkfs that it is supposed to be operating on an image file... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file 2013-11-28 5:16 ` Dave Chinner @ 2013-11-28 5:34 ` Eric Sandeen 2013-11-28 10:01 ` Dave Chinner 0 siblings, 1 reply; 15+ messages in thread From: Eric Sandeen @ 2013-11-28 5:34 UTC (permalink / raw) To: Dave Chinner; +Cc: Phil White, xfs On 11/27/13, 11:16 PM, Dave Chinner wrote: <snip> > So, it failed to write using direct IO because of IO alignment > because I didn't tell mkfs that it was running on a file. i.e. I > forgot the "-d file" option. > > $ sudo mkfs.xfs -d size=1g,name=/storage/fubar.img > meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks > = sectsz=512 attr=2, projid32bit=1 > = crc=0 > data = bsize=4096 blocks=262144, imaxpct=25 > = sunit=0 swidth=0 blks > naming =version 2 bsize=4096 ascii-ci=0 > log =internal log bsize=4096 blocks=7344, version=2 > = sectsz=512 sunit=0 blks, lazy-count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > mkfs.xfs: pwrite64 failed: Invalid argument > mkfs.xfs: read failed: Invalid argument > > Yup, still fails. Let's force it! > > $ sudo mkfs.xfs -f -d size=1g,name=/storage/fubar.img > meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks > = sectsz=512 attr=2, projid32bit=1 > = crc=0 > data = bsize=4096 blocks=262144, imaxpct=25 > = sunit=0 swidth=0 blks > naming =version 2 bsize=4096 ascii-ci=0 > log =internal log bsize=4096 blocks=7344, version=2 > = sectsz=512 sunit=0 blks, lazy-count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > existing superblock read failed: Invalid argument > mkfs.xfs: pwrite64 failed: Invalid argument > mkfs.xfs: read failed: Invalid argument > > And there's the identical failure to what was reported. > > So, user error - the user is telling mkfs.xfs that it is making a > filesystem on a block device named "/storage/fubar.img". The same > thing happens with the normal method of specifying the block device: If only we had some way to tell, programatically, whether the mkfs target was a regular file or a block device, eh? ;) Seriously, I always thought the requirment to specify "-d file" was silly. And now I think it's even more silly, if it actually is required for proper behavior... > What mkfs needs to do is reject devices that are files when "-d > file", "-l file" and "-r file" is not specified, and the problem > will go away because it will catch users who forget to tell mkfs > that it is supposed to be operating on an image file... Or maybe just stat() it, and DTRT? -Eric _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file 2013-11-28 5:34 ` Eric Sandeen @ 2013-11-28 10:01 ` Dave Chinner 2013-11-28 11:47 ` Phil White 2013-11-28 15:32 ` Eric Sandeen 0 siblings, 2 replies; 15+ messages in thread From: Dave Chinner @ 2013-11-28 10:01 UTC (permalink / raw) To: Eric Sandeen; +Cc: Phil White, xfs On Wed, Nov 27, 2013 at 11:34:35PM -0600, Eric Sandeen wrote: > On 11/27/13, 11:16 PM, Dave Chinner wrote: > > So, it failed to write using direct IO because of IO alignment > > because I didn't tell mkfs that it was running on a file. i.e. I > > forgot the "-d file" option. > > > > $ sudo mkfs.xfs -d size=1g,name=/storage/fubar.img > > meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks > > = sectsz=512 attr=2, projid32bit=1 > > = crc=0 > > data = bsize=4096 blocks=262144, imaxpct=25 > > = sunit=0 swidth=0 blks > > naming =version 2 bsize=4096 ascii-ci=0 > > log =internal log bsize=4096 blocks=7344, version=2 > > = sectsz=512 sunit=0 blks, lazy-count=1 > > realtime =none extsz=4096 blocks=0, rtextents=0 > > mkfs.xfs: pwrite64 failed: Invalid argument > > mkfs.xfs: read failed: Invalid argument > > > > Yup, still fails. Let's force it! > > > > $ sudo mkfs.xfs -f -d size=1g,name=/storage/fubar.img > > meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks > > = sectsz=512 attr=2, projid32bit=1 > > = crc=0 > > data = bsize=4096 blocks=262144, imaxpct=25 > > = sunit=0 swidth=0 blks > > naming =version 2 bsize=4096 ascii-ci=0 > > log =internal log bsize=4096 blocks=7344, version=2 > > = sectsz=512 sunit=0 blks, lazy-count=1 > > realtime =none extsz=4096 blocks=0, rtextents=0 > > existing superblock read failed: Invalid argument > > mkfs.xfs: pwrite64 failed: Invalid argument > > mkfs.xfs: read failed: Invalid argument > > > > And there's the identical failure to what was reported. > > > > So, user error - the user is telling mkfs.xfs that it is making a > > filesystem on a block device named "/storage/fubar.img". The same > > thing happens with the normal method of specifying the block device: > > If only we had some way to tell, programatically, whether the mkfs target > was a regular file or a block device, eh? ;) > > Seriously, I always thought the requirment to specify "-d file" was silly. > And now I think it's even more silly, if it actually is required for > proper behavior... It has always been required if you want mkfs to create the file for you. And given that doing stuff like ioctl(BLKDISCARD) on files is completely wrong, so I think it really is needed... > > What mkfs needs to do is reject devices that are files when "-d > > file", "-l file" and "-r file" is not specified, and the problem > > will go away because it will catch users who forget to tell mkfs > > that it is supposed to be operating on an image file... > > Or maybe just stat() it, and DTRT? Well, we need to stat it to make sure that it's a file if "-d file" is specified, and a block device if it's not. That will prevent this problem. Every other xfsprogs utility has to be told that it is being pointed at an image file rather than a block device, so why should mkfs be any different? Indeed, if we don't require users to tell mkfs that it's a file, what do we do with non-existent device names when they are provided by the user? Just create the file rather than returning ENOENT? So suddenly /dev/ fills up with fileystem images because of typos? Principle of Least Surprise says that ENOENT is the correct behaviour, hence it follows that "-d file" is needed and should be properly checked and enforced. I'll add this to the start of the patch set I'm currently working on that fixes all of the mkfs input parameter validation problems I've found over the past couple of weeks... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file 2013-11-28 10:01 ` Dave Chinner @ 2013-11-28 11:47 ` Phil White 2013-11-28 15:38 ` Eric Sandeen 2013-11-28 15:32 ` Eric Sandeen 1 sibling, 1 reply; 15+ messages in thread From: Phil White @ 2013-11-28 11:47 UTC (permalink / raw) To: Dave Chinner; +Cc: Phil White, Eric Sandeen, xfs On Thu, Nov 28, 2013 at 09:01:07PM +1100, Dave Chinner wrote: > On Wed, Nov 27, 2013 at 11:34:35PM -0600, Eric Sandeen wrote: > > If only we had some way to tell, programatically, whether the mkfs target > > was a regular file or a block device, eh? ;) > > > > Seriously, I always thought the requirment to specify "-d file" was silly. > > And now I think it's even more silly, if it actually is required for > > proper behavior... > > It has always been required if you want mkfs to create the file for > you. And given that doing stuff like ioctl(BLKDISCARD) on files is > completely wrong, so I think it really is needed... > > > > What mkfs needs to do is reject devices that are files when "-d > > > file", "-l file" and "-r file" is not specified, and the problem > > > will go away because it will catch users who forget to tell mkfs > > > that it is supposed to be operating on an image file... > > > > Or maybe just stat() it, and DTRT? > > Well, we need to stat it to make sure that it's a file if "-d file" > is specified, and a block device if it's not. That will prevent this > problem. Every other xfsprogs utility has to be told that it is > being pointed at an image file rather than a block device, so why > should mkfs be any different? FWIW, I have a patch to just stat() and discard LIBXFS_DIRECT if the target is not a block device. It worked for what I was doing and I wouldn't mind cleaning it up if need be. The main thing is that it seems to me that mkfs mandates that the situation I outlined shouldn't ever fail. That's probably something worth adding to xfstests as well. And writing that test, I suppose, is something I don't mind doing either. -Phil -Phil _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file 2013-11-28 11:47 ` Phil White @ 2013-11-28 15:38 ` Eric Sandeen 0 siblings, 0 replies; 15+ messages in thread From: Eric Sandeen @ 2013-11-28 15:38 UTC (permalink / raw) To: Phil White, Dave Chinner; +Cc: xfs On 11/28/13, 5:47 AM, Phil White wrote: > On Thu, Nov 28, 2013 at 09:01:07PM +1100, Dave Chinner wrote: >> On Wed, Nov 27, 2013 at 11:34:35PM -0600, Eric Sandeen wrote: >>> If only we had some way to tell, programatically, whether the mkfs target >>> was a regular file or a block device, eh? ;) >>> >>> Seriously, I always thought the requirment to specify "-d file" was silly. >>> And now I think it's even more silly, if it actually is required for >>> proper behavior... >> >> It has always been required if you want mkfs to create the file for >> you. And given that doing stuff like ioctl(BLKDISCARD) on files is >> completely wrong, so I think it really is needed... >> >>>> What mkfs needs to do is reject devices that are files when "-d >>>> file", "-l file" and "-r file" is not specified, and the problem >>>> will go away because it will catch users who forget to tell mkfs >>>> that it is supposed to be operating on an image file... >>> >>> Or maybe just stat() it, and DTRT? >> >> Well, we need to stat it to make sure that it's a file if "-d file" >> is specified, and a block device if it's not. That will prevent this >> problem. Every other xfsprogs utility has to be told that it is >> being pointed at an image file rather than a block device, so why >> should mkfs be any different? > > FWIW, I have a patch to just stat() and discard LIBXFS_DIRECT if the > target is not a block device. It worked for what I was doing and I > wouldn't mind cleaning it up if need be. I wonder if that's a little too surgical; today if we specify -d file, we get xi.disfile=1, and set xi.dcreat=1 as well as long as -N wasn't specified. Having xi.disfile set affects a few other behaviors after all, so I just wonder if we should stat it early, set xi.disfile, and let all the normal paths take it from there. I'm not quite sure. > The main thing is that it seems to me that mkfs mandates that the situation > I outlined shouldn't ever fail. That's probably something worth adding to > xfstests as well. > > And writing that test, I suppose, is something I don't mind doing either. Tests are always welcome, that'd be great! Thanks, -Eric _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file 2013-11-28 10:01 ` Dave Chinner 2013-11-28 11:47 ` Phil White @ 2013-11-28 15:32 ` Eric Sandeen 2013-11-28 21:12 ` Dave Chinner 1 sibling, 1 reply; 15+ messages in thread From: Eric Sandeen @ 2013-11-28 15:32 UTC (permalink / raw) To: Dave Chinner; +Cc: Phil White, xfs On 11/28/13, 4:01 AM, Dave Chinner wrote: > On Wed, Nov 27, 2013 at 11:34:35PM -0600, Eric Sandeen wrote: <snip> >> Or maybe just stat() it, and DTRT? > > Well, we need to stat it to make sure that it's a file if "-d file" > is specified, and a block device if it's not. That will prevent this > problem. Every other xfsprogs utility has to be told that it is > being pointed at an image file rather than a block device, so why > should mkfs be any different? The option is there but again I never really knew why. They work fine without -f, at least in general: $ xfs_db fsfile xfs_db> $ xfs_repair fsfile Phase 1 - find and verify superblock... Phase 2 - using internal log - zero log... - scan filesystem freespace and inode maps... ... $ xfs_metadump fsfile fsfile.meta $ file fsfile.meta fsfile.meta: XFS filesystem metadump image etc > Indeed, if we don't require users to tell mkfs that it's a file, > what do we do with non-existent device names when they are provided > by the user? Just create the file rather than returning ENOENT? So > suddenly /dev/ fills up with fileystem images because of typos? That won't happen because it doesn't create a new file unless -d file is specified, so I guess that's one difference. i.e. with -d file it'll create a file of the requested size; without it, it will mkfs it to whatever size the file already is, or if it doesn't exist, return -ENOENT. > Principle of Least Surprise says that ENOENT is the correct > behaviour, hence it follows that "-d file" is needed and should be > properly checked and enforced. I'll add this to the start of the > patch set I'm currently working on that fixes all of the mkfs input > parameter validation problems I've found over the past couple of > weeks... Well, I hope it doesn't stop mkfs.xfs from mkfs'ing an existing file image, which has always worked before... Thanks, -Eric > Cheers, > > Dave. > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file 2013-11-28 15:32 ` Eric Sandeen @ 2013-11-28 21:12 ` Dave Chinner 2013-11-29 1:28 ` Dave Chinner 0 siblings, 1 reply; 15+ messages in thread From: Dave Chinner @ 2013-11-28 21:12 UTC (permalink / raw) To: Eric Sandeen; +Cc: Phil White, xfs On Thu, Nov 28, 2013 at 09:32:32AM -0600, Eric Sandeen wrote: > On 11/28/13, 4:01 AM, Dave Chinner wrote: > > On Wed, Nov 27, 2013 at 11:34:35PM -0600, Eric Sandeen wrote: > > <snip> > > >> Or maybe just stat() it, and DTRT? > > > > Well, we need to stat it to make sure that it's a file if "-d file" > > is specified, and a block device if it's not. That will prevent this > > problem. Every other xfsprogs utility has to be told that it is > > being pointed at an image file rather than a block device, so why > > should mkfs be any different? > > The option is there but again I never really knew why. They work > fine without -f, at least in general: Just like mkfs works fine, *in general*. That doesn't mean they will always work, though: $ sudo xfs_repair -n /storage/broken.img Phase 1 - find and verify superblock... xfs_repair: read failed: Invalid argument Repair fails on the file which has a smaller sector size than the host filesystem, unless you tell it is working on a file, not a block device: $ sudo xfs_repair -n -f /storage/broken.img Phase 1 - find and verify superblock... Phase 2 - using internal log - scan filesystem freespace and inode maps... - found root inode chunk Phase 3 - for each AG... - scan (but don't clear) agi unlinked lists... - process known inodes and perform inode discovery... - agno = 0 would have corrected attribute entry count in inode 649642 from 40 to 0 local inode 649790 attr too small (size = 1, min size = 4) bad attribute fork in inode 649790, would clear attr fork would have cleared inode 649790 .... And so behaviour is identical to mkfs... > - scan filesystem freespace and inode maps... > ... > > $ xfs_metadump fsfile fsfile.meta > > $ file fsfile.meta > fsfile.meta: XFS filesystem metadump image > > etc > > > Indeed, if we don't require users to tell mkfs that it's a file, > > what do we do with non-existent device names when they are provided > > by the user? Just create the file rather than returning ENOENT? So > > suddenly /dev/ fills up with fileystem images because of typos? > > That won't happen because it doesn't create a new file unless -d file > is specified, so I guess that's one difference. i.e. with -d file > it'll create a file of the requested size; without it, it will mkfs > it to whatever size the file already is, or if it doesn't exist, > return -ENOENT. > > > Principle of Least Surprise says that ENOENT is the correct > > behaviour, hence it follows that "-d file" is needed and should be > > properly checked and enforced. I'll add this to the start of the > > patch set I'm currently working on that fixes all of the mkfs input > > parameter validation problems I've found over the past couple of > > weeks... > > Well, I hope it doesn't stop mkfs.xfs from mkfs'ing an existing > file image, which has always worked before... that's what stat() is for. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file 2013-11-28 21:12 ` Dave Chinner @ 2013-11-29 1:28 ` Dave Chinner 0 siblings, 0 replies; 15+ messages in thread From: Dave Chinner @ 2013-11-29 1:28 UTC (permalink / raw) To: Eric Sandeen; +Cc: Phil White, xfs On Fri, Nov 29, 2013 at 08:12:13AM +1100, Dave Chinner wrote: > On Thu, Nov 28, 2013 at 09:32:32AM -0600, Eric Sandeen wrote: > > On 11/28/13, 4:01 AM, Dave Chinner wrote: > > > On Wed, Nov 27, 2013 at 11:34:35PM -0600, Eric Sandeen wrote: > > > > <snip> > > > > >> Or maybe just stat() it, and DTRT? > > > > > > Well, we need to stat it to make sure that it's a file if "-d file" > > > is specified, and a block device if it's not. That will prevent this > > > problem. Every other xfsprogs utility has to be told that it is > > > being pointed at an image file rather than a block device, so why > > > should mkfs be any different? > > > > The option is there but again I never really knew why. They work > > fine without -f, at least in general: > > Just like mkfs works fine, *in general*. That doesn't mean they will always > work, though: FYI, here's the list of stuff on top of making mkfs detect files and set the proper flags and avoid direct IO that I found while doing this: Other file/blockdev issues fixed: - use getstr to detect specifying the data device name twice. - check file/size/name parameters before anything else. - overwrite checks need to be done before the image file is opened and potentially truncated. - blkid_get_topology() should not be called for image files, so warn when it is called that way. - zero_old_xfs_structures() emits a spurious error: "existing superblock read failed: Success" when it is run on a truncated image file. Don't warn if we see this problem on an image file. - Don't issue discards on image files. - Use fsync() for image files, not BLKFLSBUF in platform_flush_device() for Linux. And so now "-d file" is only needed to trigger creation of the image file, or if you want to truncate the old file away completely first... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2013-11-29 1:28 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-11-27 2:31 Problem with mkfs.xfs on a regular file Phil White 2013-11-27 2:36 ` Phil White 2013-11-27 2:38 ` Nathan Scott 2013-11-27 2:41 ` Phil White 2013-11-27 2:47 ` Dave Chinner 2013-11-27 2:47 ` Dave Chinner 2013-11-28 2:39 ` Eric Sandeen 2013-11-28 5:16 ` Dave Chinner 2013-11-28 5:34 ` Eric Sandeen 2013-11-28 10:01 ` Dave Chinner 2013-11-28 11:47 ` Phil White 2013-11-28 15:38 ` Eric Sandeen 2013-11-28 15:32 ` Eric Sandeen 2013-11-28 21:12 ` Dave Chinner 2013-11-29 1:28 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox