* Problem with mkfs.xfs on a regular file
@ 2013-11-27 2:31 Phil White
2013-11-27 2:36 ` Phil White
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Phil White @ 2013-11-27 2:31 UTC (permalink / raw)
To: xfs
Gents:
I was making an image for a VM using everyone's favorite fs with a line
that looked something like this:
-------------
dd if=/dev/zero of=~/image bs=1024 count=1048576 && ./mkfs/mkfs.xfs && mount -o loop ~/image /mnt/loop
-------------
mkfs.xfs gave me this output:
-------------
meta-data=/root/image isize=256 agcount=4, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
existing superblock read failed: Invalid argument
mkfs.xfs: pwrite64 failed: Invalid argument
mkfs.xfs: read failed: Invalid argument
-------------
And, of course, it failed to mount.
Knowing that this was using a possibly old version of xfsprogs (3.1.10, it
turned out), I grabbed the git tree and tried that. The same problem occurred.
The "existing superblock read" problem was from mkfs/xfs_mkfs.c @ 806 in
zero_old_xfs_structures():
-------------
if (pread(xi->dfd, buf, new_sb->sb_sectsize, 0) != new_sb->sb_sectsize) {
fprintf(stderr, _("existing superblock read failed: %s\n"),
strerror(errno));
free(buf);
return;
}
-------------
The pwrite64 problem was from libxfs/rdwr.c @ 806 in __write_buf():
-------------
sts = pwrite64(fd, buf, len, offset);
if (sts < 0) {
int error = errno;
fprintf(stderr, _("%s: pwrite64 failed: %s\n"),
progname, strerror(error));
if (flags & LIBXFS_B_EXIT)
exit(1);
return error;
} else if (sts != len) {
fprintf(stderr, _("%s: error - pwrite64 only %d of %d bytes\n"),
progname, sts, len);
if (flags & LIBXFS_B_EXIT)
exit(1);
return EIO;
}
-------------
While it occurred to me that the problem might just be line 806 of some files
in xfsprogs, I threw it under a debugger and took a closer look. The file
descriptor value in xi->dfd pointed at ~/image. errno was set to 22. I
thought that might indicate a problem with lseek(), so I rewrote the pwrite64()
and pread() as lseek()s and read()/write()
As you may have guessed, this did me no good at all.
It's trying to read/write 512 bytes at the beginning of the file which seems
reasonably innocuous. I double checked the man page which says that under
2.6, O_DIRECT writes can be aligned to 512 bytes without a problem. sbp comes
out with 4096 in blocksize and 512 in sectsize when zero_old_xfs_structures()
is called and the first error comes up, so I'm at a loss for what's going wrong.
-Phil
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-27 2:31 Problem with mkfs.xfs on a regular file Phil White
@ 2013-11-27 2:36 ` Phil White
2013-11-27 2:38 ` Nathan Scott
2013-11-27 2:47 ` Dave Chinner
2 siblings, 0 replies; 15+ messages in thread
From: Phil White @ 2013-11-27 2:36 UTC (permalink / raw)
To: xfs
On Tue, Nov 26, 2013 at 06:31:19PM -0800, Phil White wrote:
> Gents:
>
> I was making an image for a VM using everyone's favorite fs with a line
> that looked something like this:
> -------------
> dd if=/dev/zero of=~/image bs=1024 count=1048576 && ./mkfs/mkfs.xfs && mount -o loop ~/image /mnt/loop
> -------------
Obviously, I intended "mkfs.xfs -f /root/image".
Why is proofreading *AFTER* you send an email so much more effective?
-Phil
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-27 2:31 Problem with mkfs.xfs on a regular file Phil White
2013-11-27 2:36 ` Phil White
@ 2013-11-27 2:38 ` Nathan Scott
2013-11-27 2:41 ` Phil White
2013-11-27 2:47 ` Dave Chinner
2 siblings, 1 reply; 15+ messages in thread
From: Nathan Scott @ 2013-11-27 2:38 UTC (permalink / raw)
To: Phil White; +Cc: xfs
----- Original Message -----
> ...
> It's trying to read/write 512 bytes at the beginning of the file which seems
> reasonably innocuous. I double checked the man page which says that under
> 2.6, O_DIRECT writes can be aligned to 512 bytes without a problem. sbp
> comes
> out with 4096 in blocksize and 512 in sectsize when zero_old_xfs_structures()
> is called and the first error comes up, so I'm at a loss for what's going
> wrong.
The filesystem backing the new /root/image file doesn't support direct I/O?
cheers.
--
Nathan
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-27 2:38 ` Nathan Scott
@ 2013-11-27 2:41 ` Phil White
2013-11-27 2:47 ` Dave Chinner
0 siblings, 1 reply; 15+ messages in thread
From: Phil White @ 2013-11-27 2:41 UTC (permalink / raw)
To: Nathan Scott; +Cc: Phil White, xfs
On Tue, Nov 26, 2013 at 09:38:20PM -0500, Nathan Scott wrote:
> ----- Original Message -----
> > ...
> > It's trying to read/write 512 bytes at the beginning of the file which seems
> > reasonably innocuous. I double checked the man page which says that under
> > 2.6, O_DIRECT writes can be aligned to 512 bytes without a problem. sbp
> > comes
> > out with 4096 in blocksize and 512 in sectsize when zero_old_xfs_structures()
> > is called and the first error comes up, so I'm at a loss for what's going
> > wrong.
>
> The filesystem backing the new /root/image file doesn't support direct I/O?
>
> cheers.
The filesystem backing /root/image is xfs. Good guess though.
For the record:
caliban mnt # xfs_info /
meta-data=/dev/root isize=256 agcount=4, agsize=25685952 blks
= sectsz=4096 attr=2
data = bsize=4096 blocks=102743808, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=50167, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
-Phil
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-27 2:31 Problem with mkfs.xfs on a regular file Phil White
2013-11-27 2:36 ` Phil White
2013-11-27 2:38 ` Nathan Scott
@ 2013-11-27 2:47 ` Dave Chinner
2013-11-28 2:39 ` Eric Sandeen
2 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2013-11-27 2:47 UTC (permalink / raw)
To: Phil White; +Cc: xfs
On Tue, Nov 26, 2013 at 06:31:19PM -0800, Phil White wrote:
> Gents:
>
> I was making an image for a VM using everyone's favorite fs with a line
> that looked something like this:
> -------------
> dd if=/dev/zero of=~/image bs=1024 count=1048576 && ./mkfs/mkfs.xfs && mount -o loop ~/image /mnt/loop
> -------------
>
>
> mkfs.xfs gave me this output:
> -------------
> meta-data=/root/image isize=256 agcount=4, agsize=65536 blks
> = sectsz=512 attr=2, projid32bit=0
> data = bsize=4096 blocks=262144, imaxpct=25
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0
> log =internal log bsize=4096 blocks=2560, version=2
> = sectsz=512 sunit=0 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
> existing superblock read failed: Invalid argument
> mkfs.xfs: pwrite64 failed: Invalid argument
> mkfs.xfs: read failed: Invalid argument
> -------------
.....
>
> While it occurred to me that the problem might just be line 806 of some files
> in xfsprogs, I threw it under a debugger and took a closer look. The file
> descriptor value in xi->dfd pointed at ~/image. errno was set to 22. I
> thought that might indicate a problem with lseek(), so I rewrote the pwrite64()
> and pread() as lseek()s and read()/write()
>
> As you may have guessed, this did me no good at all.
>
> It's trying to read/write 512 bytes at the beginning of the file which seems
> reasonably innocuous. I double checked the man page which says that under
> 2.6, O_DIRECT writes can be aligned to 512 bytes without a problem.
That doesn't mean it is correct, because the man page also says:
" In Linux alignment restrictions vary by filesystem and kernel
version and might be absent entirely."
So, I bet that your underlying filesystem (i.e. the host filesystem)
has a sector size of 4k, and that's why direct Io on 512 byte
alignment is failing. In that case, run "mkfs.xfs -s size=4k ..."
and mkfs should just work fine...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-27 2:41 ` Phil White
@ 2013-11-27 2:47 ` Dave Chinner
0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2013-11-27 2:47 UTC (permalink / raw)
To: Phil White; +Cc: Nathan Scott, Phil White, xfs
On Tue, Nov 26, 2013 at 06:41:19PM -0800, Phil White wrote:
> On Tue, Nov 26, 2013 at 09:38:20PM -0500, Nathan Scott wrote:
> > ----- Original Message -----
> > > ...
> > > It's trying to read/write 512 bytes at the beginning of the file which seems
> > > reasonably innocuous. I double checked the man page which says that under
> > > 2.6, O_DIRECT writes can be aligned to 512 bytes without a problem. sbp
> > > comes
> > > out with 4096 in blocksize and 512 in sectsize when zero_old_xfs_structures()
> > > is called and the first error comes up, so I'm at a loss for what's going
> > > wrong.
> >
> > The filesystem backing the new /root/image file doesn't support direct I/O?
> >
> > cheers.
>
> The filesystem backing /root/image is xfs. Good guess though.
>
> For the record:
> caliban mnt # xfs_info /
> meta-data=/dev/root isize=256 agcount=4, agsize=25685952 blks
> = sectsz=4096 attr=2
^^^^^^^^^^^
Yup, there's your problem.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-27 2:47 ` Dave Chinner
@ 2013-11-28 2:39 ` Eric Sandeen
2013-11-28 5:16 ` Dave Chinner
0 siblings, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2013-11-28 2:39 UTC (permalink / raw)
To: Dave Chinner, Phil White; +Cc: xfs
On 11/26/13, 8:47 PM, Dave Chinner wrote:
> On Tue, Nov 26, 2013 at 06:31:19PM -0800, Phil White wrote:
>> Gents:
>>
>> I was making an image for a VM using everyone's favorite fs with a line
>> that looked something like this:
>> -------------
>> dd if=/dev/zero of=~/image bs=1024 count=1048576 && ./mkfs/mkfs.xfs && mount -o loop ~/image /mnt/loop
>> -------------
>>
>>
>> mkfs.xfs gave me this output:
>> -------------
>> meta-data=/root/image isize=256 agcount=4, agsize=65536 blks
>> = sectsz=512 attr=2, projid32bit=0
>> data = bsize=4096 blocks=262144, imaxpct=25
>> = sunit=0 swidth=0 blks
>> naming =version 2 bsize=4096 ascii-ci=0
>> log =internal log bsize=4096 blocks=2560, version=2
>> = sectsz=512 sunit=0 blks, lazy-count=1
>> realtime =none extsz=4096 blocks=0, rtextents=0
>> existing superblock read failed: Invalid argument
>> mkfs.xfs: pwrite64 failed: Invalid argument
>> mkfs.xfs: read failed: Invalid argument
>> -------------
> .....
>>
>> While it occurred to me that the problem might just be line 806 of some files
>> in xfsprogs, I threw it under a debugger and took a closer look. The file
>> descriptor value in xi->dfd pointed at ~/image. errno was set to 22. I
>> thought that might indicate a problem with lseek(), so I rewrote the pwrite64()
>> and pread() as lseek()s and read()/write()
>>
>> As you may have guessed, this did me no good at all.
>>
>> It's trying to read/write 512 bytes at the beginning of the file which seems
>> reasonably innocuous. I double checked the man page which says that under
>> 2.6, O_DIRECT writes can be aligned to 512 bytes without a problem.
>
> That doesn't mean it is correct, because the man page also says:
>
> " In Linux alignment restrictions vary by filesystem and kernel
> version and might be absent entirely."
>
> So, I bet that your underlying filesystem (i.e. the host filesystem)
> has a sector size of 4k, and that's why direct Io on 512 byte
> alignment is failing. In that case, run "mkfs.xfs -s size=4k ..."
> and mkfs should just work fine...
Sadly, no. Or at least, probably not.
__initbuf
memalign(libxfs_device_alignment(), bytes);
where libxfs_device_alignment() does:
platform_align_blockdev
if (!max_block_alignment)
return getpagesize();
return max_block_alignment;
and through twisty paths through platform_findsizes(), max_block_alignment is BBSIZE, or 512.
IOWS: xfsprogs is a braindead package that doesn't know how to properly handle non-512-aligned DIO. ;) </snark>
We should fix xfsprogs, but I'm also looking making 4k "sector" xfs able to do 512 DIOs as long as the device under it can do 512 DIOS.
We got into the 4k DIO alignment because mkfs.xfs saw a 512 logical / 4k physical drive, and chose 4k for its own internal sector size, which is great in terms of doing metadata IO efficiently, but not so great in terms of rejecting 512 byte IOs that the drive could otherwise do.
-Eric
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-28 2:39 ` Eric Sandeen
@ 2013-11-28 5:16 ` Dave Chinner
2013-11-28 5:34 ` Eric Sandeen
0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2013-11-28 5:16 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Phil White, xfs
On Wed, Nov 27, 2013 at 08:39:55PM -0600, Eric Sandeen wrote:
> On 11/26/13, 8:47 PM, Dave Chinner wrote:
> > On Tue, Nov 26, 2013 at 06:31:19PM -0800, Phil White wrote:
> >> Gents:
> >>
> >> I was making an image for a VM using everyone's favorite fs with a line
> >> that looked something like this:
> >> -------------
> >> dd if=/dev/zero of=~/image bs=1024 count=1048576 && ./mkfs/mkfs.xfs && mount -o loop ~/image /mnt/loop
> >> -------------
> >>
> >>
> >> mkfs.xfs gave me this output:
> >> -------------
> >> meta-data=/root/image isize=256 agcount=4, agsize=65536 blks
> >> = sectsz=512 attr=2, projid32bit=0
> >> data = bsize=4096 blocks=262144, imaxpct=25
> >> = sunit=0 swidth=0 blks
> >> naming =version 2 bsize=4096 ascii-ci=0
> >> log =internal log bsize=4096 blocks=2560, version=2
> >> = sectsz=512 sunit=0 blks, lazy-count=1
> >> realtime =none extsz=4096 blocks=0, rtextents=0
> >> existing superblock read failed: Invalid argument
> >> mkfs.xfs: pwrite64 failed: Invalid argument
> >> mkfs.xfs: read failed: Invalid argument
> >> -------------
> > .....
> >>
> >> While it occurred to me that the problem might just be line 806 of some files
> >> in xfsprogs, I threw it under a debugger and took a closer look. The file
> >> descriptor value in xi->dfd pointed at ~/image. errno was set to 22. I
> >> thought that might indicate a problem with lseek(), so I rewrote the pwrite64()
> >> and pread() as lseek()s and read()/write()
> >>
> >> As you may have guessed, this did me no good at all.
> >>
> >> It's trying to read/write 512 bytes at the beginning of the file which seems
> >> reasonably innocuous. I double checked the man page which says that under
> >> 2.6, O_DIRECT writes can be aligned to 512 bytes without a problem.
> >
> > That doesn't mean it is correct, because the man page also says:
> >
> > " In Linux alignment restrictions vary by filesystem and kernel
> > version and might be absent entirely."
> >
> > So, I bet that your underlying filesystem (i.e. the host filesystem)
> > has a sector size of 4k, and that's why direct Io on 512 byte
> > alignment is failing. In that case, run "mkfs.xfs -s size=4k ..."
> > and mkfs should just work fine...
>
> Sadly, no. Or at least, probably not.
>
> __initbuf
> memalign(libxfs_device_alignment(), bytes);
>
> where libxfs_device_alignment() does:
Yeah, that's for memory buffer alignment, though, not IO alignment.
It's busted because that should always default to page size, not
sector size. But that's not the problem - for example:
# xfs_info /storage
meta-data=/dev/md0 isize=256 agcount=32, agsize=21503744 blks
= sectsz=4096 attr=2, projid32bit=0
= crc=0
data = bsize=4096 blocks=688119680, imaxpct=5
= sunit=32 swidth=320 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=335995, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
That's a 4k sector filesystem, and:
# dd if=/dev/zero of=/storage/fubar.img bs=1024 count=1048576 && mkfs.xfs -d file,size=1g,name=/storage/fubar.img
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 4.18106 s, 257 MB/s
meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=1
= crc=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=7344, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
#
mkfs works fine on it. As does xfs_repair:
# xfs_repair -f /storage/fubar.img
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 1
- agno = 0
- agno = 2
- agno = 3
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
And xfs_db works just fine, too:
$ sudo xfs_db -f /storage/fubar.img
xfs_db> sb 0
xfs_db> p
magicnum = 0x58465342
blocksize = 4096
dblocks = 262144
rblocks = 0
rextents = 0
uuid = 73d16c96-df35-4f1f-b781-34da486f089c
logstart = 131076
rootino = 128
rbmino = 129
....
because it doesn't set the LIBXFS_DIRECT flag on the device
instantiation structures yet and so is using buffered IO.
> IOWS: xfsprogs is a braindead package that doesn't know how to
> properly handle non-512-aligned DIO. ;) </snark>
Yeah, it doesn't know how to handle it but it avoids the problem
completely by using buffered IO instead. It works just fine. ;)
So, let's recreate the problem knowing that:
$ sudo dd if=/dev/zero of=/storage/fubar.img bs=1024 count=1048576 && sudo strace -f -o t.t mkfs.xfs -d size=1g,name=/storage/fubar.img
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 4.52546 s, 237 MB/s
meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=1
= crc=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=7344, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
mkfs.xfs: pwrite64 failed: Invalid argument
mkfs.xfs: read failed: Invalid argument
So, it failed to write using direct IO because of IO alignment
because I didn't tell mkfs that it was running on a file. i.e. I
forgot the "-d file" option.
$ sudo mkfs.xfs -d size=1g,name=/storage/fubar.img
meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=1
= crc=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=7344, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
mkfs.xfs: pwrite64 failed: Invalid argument
mkfs.xfs: read failed: Invalid argument
Yup, still fails. Let's force it!
$ sudo mkfs.xfs -f -d size=1g,name=/storage/fubar.img
meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=1
= crc=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=7344, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
existing superblock read failed: Invalid argument
mkfs.xfs: pwrite64 failed: Invalid argument
mkfs.xfs: read failed: Invalid argument
And there's the identical failure to what was reported.
So, user error - the user is telling mkfs.xfs that it is making a
filesystem on a block device named "/storage/fubar.img". The same
thing happens with the normal method of specifying the block device:
sudo mkfs.xfs -f -d size=1g /storage/fubar.img
meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=1
= crc=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=7344, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
existing superblock read failed: Invalid argument
mkfs.xfs: pwrite64 failed: Invalid argument
mkfs.xfs: read failed: Invalid argument
But if we remove the image file:
$ sudo mkfs.xfs -f -d size=1g /storage/fubar.img
/storage/fubar.img: No such file or directory
Usage: mkfs.xfs
....
It's pretty clear that we need the "-d file" when the file doesn't
actually exist.
IOWs, mkfs does not expect a block device to lie about it's sector
sizes, but that's exactly what treating an image file like a block
device leads to. This isn't the DIO sector size problem you were
looking for, Eric ;)
FWIW, an strace shows:
12256 ioctl(3, BLKDISCARD, 0x7fff76f4ea50) = -1 ENOTTY (Inappropriate ioctl for device)
... that we make that same mistake in several places in mkfs.
What mkfs needs to do is reject devices that are files when "-d
file", "-l file" and "-r file" is not specified, and the problem
will go away because it will catch users who forget to tell mkfs
that it is supposed to be operating on an image file...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-28 5:16 ` Dave Chinner
@ 2013-11-28 5:34 ` Eric Sandeen
2013-11-28 10:01 ` Dave Chinner
0 siblings, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2013-11-28 5:34 UTC (permalink / raw)
To: Dave Chinner; +Cc: Phil White, xfs
On 11/27/13, 11:16 PM, Dave Chinner wrote:
<snip>
> So, it failed to write using direct IO because of IO alignment
> because I didn't tell mkfs that it was running on a file. i.e. I
> forgot the "-d file" option.
>
> $ sudo mkfs.xfs -d size=1g,name=/storage/fubar.img
> meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks
> = sectsz=512 attr=2, projid32bit=1
> = crc=0
> data = bsize=4096 blocks=262144, imaxpct=25
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0
> log =internal log bsize=4096 blocks=7344, version=2
> = sectsz=512 sunit=0 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
> mkfs.xfs: pwrite64 failed: Invalid argument
> mkfs.xfs: read failed: Invalid argument
>
> Yup, still fails. Let's force it!
>
> $ sudo mkfs.xfs -f -d size=1g,name=/storage/fubar.img
> meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks
> = sectsz=512 attr=2, projid32bit=1
> = crc=0
> data = bsize=4096 blocks=262144, imaxpct=25
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0
> log =internal log bsize=4096 blocks=7344, version=2
> = sectsz=512 sunit=0 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
> existing superblock read failed: Invalid argument
> mkfs.xfs: pwrite64 failed: Invalid argument
> mkfs.xfs: read failed: Invalid argument
>
> And there's the identical failure to what was reported.
>
> So, user error - the user is telling mkfs.xfs that it is making a
> filesystem on a block device named "/storage/fubar.img". The same
> thing happens with the normal method of specifying the block device:
If only we had some way to tell, programatically, whether the mkfs target
was a regular file or a block device, eh? ;)
Seriously, I always thought the requirment to specify "-d file" was silly.
And now I think it's even more silly, if it actually is required for
proper behavior...
> What mkfs needs to do is reject devices that are files when "-d
> file", "-l file" and "-r file" is not specified, and the problem
> will go away because it will catch users who forget to tell mkfs
> that it is supposed to be operating on an image file...
Or maybe just stat() it, and DTRT?
-Eric
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-28 5:34 ` Eric Sandeen
@ 2013-11-28 10:01 ` Dave Chinner
2013-11-28 11:47 ` Phil White
2013-11-28 15:32 ` Eric Sandeen
0 siblings, 2 replies; 15+ messages in thread
From: Dave Chinner @ 2013-11-28 10:01 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Phil White, xfs
On Wed, Nov 27, 2013 at 11:34:35PM -0600, Eric Sandeen wrote:
> On 11/27/13, 11:16 PM, Dave Chinner wrote:
> > So, it failed to write using direct IO because of IO alignment
> > because I didn't tell mkfs that it was running on a file. i.e. I
> > forgot the "-d file" option.
> >
> > $ sudo mkfs.xfs -d size=1g,name=/storage/fubar.img
> > meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks
> > = sectsz=512 attr=2, projid32bit=1
> > = crc=0
> > data = bsize=4096 blocks=262144, imaxpct=25
> > = sunit=0 swidth=0 blks
> > naming =version 2 bsize=4096 ascii-ci=0
> > log =internal log bsize=4096 blocks=7344, version=2
> > = sectsz=512 sunit=0 blks, lazy-count=1
> > realtime =none extsz=4096 blocks=0, rtextents=0
> > mkfs.xfs: pwrite64 failed: Invalid argument
> > mkfs.xfs: read failed: Invalid argument
> >
> > Yup, still fails. Let's force it!
> >
> > $ sudo mkfs.xfs -f -d size=1g,name=/storage/fubar.img
> > meta-data=/storage/fubar.img isize=256 agcount=4, agsize=65536 blks
> > = sectsz=512 attr=2, projid32bit=1
> > = crc=0
> > data = bsize=4096 blocks=262144, imaxpct=25
> > = sunit=0 swidth=0 blks
> > naming =version 2 bsize=4096 ascii-ci=0
> > log =internal log bsize=4096 blocks=7344, version=2
> > = sectsz=512 sunit=0 blks, lazy-count=1
> > realtime =none extsz=4096 blocks=0, rtextents=0
> > existing superblock read failed: Invalid argument
> > mkfs.xfs: pwrite64 failed: Invalid argument
> > mkfs.xfs: read failed: Invalid argument
> >
> > And there's the identical failure to what was reported.
> >
> > So, user error - the user is telling mkfs.xfs that it is making a
> > filesystem on a block device named "/storage/fubar.img". The same
> > thing happens with the normal method of specifying the block device:
>
> If only we had some way to tell, programatically, whether the mkfs target
> was a regular file or a block device, eh? ;)
>
> Seriously, I always thought the requirment to specify "-d file" was silly.
> And now I think it's even more silly, if it actually is required for
> proper behavior...
It has always been required if you want mkfs to create the file for
you. And given that doing stuff like ioctl(BLKDISCARD) on files is
completely wrong, so I think it really is needed...
> > What mkfs needs to do is reject devices that are files when "-d
> > file", "-l file" and "-r file" is not specified, and the problem
> > will go away because it will catch users who forget to tell mkfs
> > that it is supposed to be operating on an image file...
>
> Or maybe just stat() it, and DTRT?
Well, we need to stat it to make sure that it's a file if "-d file"
is specified, and a block device if it's not. That will prevent this
problem. Every other xfsprogs utility has to be told that it is
being pointed at an image file rather than a block device, so why
should mkfs be any different?
Indeed, if we don't require users to tell mkfs that it's a file,
what do we do with non-existent device names when they are provided
by the user? Just create the file rather than returning ENOENT? So
suddenly /dev/ fills up with fileystem images because of typos?
Principle of Least Surprise says that ENOENT is the correct
behaviour, hence it follows that "-d file" is needed and should be
properly checked and enforced. I'll add this to the start of the
patch set I'm currently working on that fixes all of the mkfs input
parameter validation problems I've found over the past couple of
weeks...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-28 10:01 ` Dave Chinner
@ 2013-11-28 11:47 ` Phil White
2013-11-28 15:38 ` Eric Sandeen
2013-11-28 15:32 ` Eric Sandeen
1 sibling, 1 reply; 15+ messages in thread
From: Phil White @ 2013-11-28 11:47 UTC (permalink / raw)
To: Dave Chinner; +Cc: Phil White, Eric Sandeen, xfs
On Thu, Nov 28, 2013 at 09:01:07PM +1100, Dave Chinner wrote:
> On Wed, Nov 27, 2013 at 11:34:35PM -0600, Eric Sandeen wrote:
> > If only we had some way to tell, programatically, whether the mkfs target
> > was a regular file or a block device, eh? ;)
> >
> > Seriously, I always thought the requirment to specify "-d file" was silly.
> > And now I think it's even more silly, if it actually is required for
> > proper behavior...
>
> It has always been required if you want mkfs to create the file for
> you. And given that doing stuff like ioctl(BLKDISCARD) on files is
> completely wrong, so I think it really is needed...
>
> > > What mkfs needs to do is reject devices that are files when "-d
> > > file", "-l file" and "-r file" is not specified, and the problem
> > > will go away because it will catch users who forget to tell mkfs
> > > that it is supposed to be operating on an image file...
> >
> > Or maybe just stat() it, and DTRT?
>
> Well, we need to stat it to make sure that it's a file if "-d file"
> is specified, and a block device if it's not. That will prevent this
> problem. Every other xfsprogs utility has to be told that it is
> being pointed at an image file rather than a block device, so why
> should mkfs be any different?
FWIW, I have a patch to just stat() and discard LIBXFS_DIRECT if the
target is not a block device. It worked for what I was doing and I
wouldn't mind cleaning it up if need be.
The main thing is that it seems to me that mkfs mandates that the situation
I outlined shouldn't ever fail. That's probably something worth adding to
xfstests as well.
And writing that test, I suppose, is something I don't mind doing either.
-Phil
-Phil
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-28 10:01 ` Dave Chinner
2013-11-28 11:47 ` Phil White
@ 2013-11-28 15:32 ` Eric Sandeen
2013-11-28 21:12 ` Dave Chinner
1 sibling, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2013-11-28 15:32 UTC (permalink / raw)
To: Dave Chinner; +Cc: Phil White, xfs
On 11/28/13, 4:01 AM, Dave Chinner wrote:
> On Wed, Nov 27, 2013 at 11:34:35PM -0600, Eric Sandeen wrote:
<snip>
>> Or maybe just stat() it, and DTRT?
>
> Well, we need to stat it to make sure that it's a file if "-d file"
> is specified, and a block device if it's not. That will prevent this
> problem. Every other xfsprogs utility has to be told that it is
> being pointed at an image file rather than a block device, so why
> should mkfs be any different?
The option is there but again I never really knew why. They work
fine without -f, at least in general:
$ xfs_db fsfile
xfs_db>
$ xfs_repair fsfile
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
...
$ xfs_metadump fsfile fsfile.meta
$ file fsfile.meta
fsfile.meta: XFS filesystem metadump image
etc
> Indeed, if we don't require users to tell mkfs that it's a file,
> what do we do with non-existent device names when they are provided
> by the user? Just create the file rather than returning ENOENT? So
> suddenly /dev/ fills up with fileystem images because of typos?
That won't happen because it doesn't create a new file unless -d file
is specified, so I guess that's one difference. i.e. with -d file
it'll create a file of the requested size; without it, it will mkfs
it to whatever size the file already is, or if it doesn't exist,
return -ENOENT.
> Principle of Least Surprise says that ENOENT is the correct
> behaviour, hence it follows that "-d file" is needed and should be
> properly checked and enforced. I'll add this to the start of the
> patch set I'm currently working on that fixes all of the mkfs input
> parameter validation problems I've found over the past couple of
> weeks...
Well, I hope it doesn't stop mkfs.xfs from mkfs'ing an existing
file image, which has always worked before...
Thanks,
-Eric
> Cheers,
>
> Dave.
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-28 11:47 ` Phil White
@ 2013-11-28 15:38 ` Eric Sandeen
0 siblings, 0 replies; 15+ messages in thread
From: Eric Sandeen @ 2013-11-28 15:38 UTC (permalink / raw)
To: Phil White, Dave Chinner; +Cc: xfs
On 11/28/13, 5:47 AM, Phil White wrote:
> On Thu, Nov 28, 2013 at 09:01:07PM +1100, Dave Chinner wrote:
>> On Wed, Nov 27, 2013 at 11:34:35PM -0600, Eric Sandeen wrote:
>>> If only we had some way to tell, programatically, whether the mkfs target
>>> was a regular file or a block device, eh? ;)
>>>
>>> Seriously, I always thought the requirment to specify "-d file" was silly.
>>> And now I think it's even more silly, if it actually is required for
>>> proper behavior...
>>
>> It has always been required if you want mkfs to create the file for
>> you. And given that doing stuff like ioctl(BLKDISCARD) on files is
>> completely wrong, so I think it really is needed...
>>
>>>> What mkfs needs to do is reject devices that are files when "-d
>>>> file", "-l file" and "-r file" is not specified, and the problem
>>>> will go away because it will catch users who forget to tell mkfs
>>>> that it is supposed to be operating on an image file...
>>>
>>> Or maybe just stat() it, and DTRT?
>>
>> Well, we need to stat it to make sure that it's a file if "-d file"
>> is specified, and a block device if it's not. That will prevent this
>> problem. Every other xfsprogs utility has to be told that it is
>> being pointed at an image file rather than a block device, so why
>> should mkfs be any different?
>
> FWIW, I have a patch to just stat() and discard LIBXFS_DIRECT if the
> target is not a block device. It worked for what I was doing and I
> wouldn't mind cleaning it up if need be.
I wonder if that's a little too surgical; today if we specify -d file,
we get xi.disfile=1, and set xi.dcreat=1 as well as long as -N wasn't specified.
Having xi.disfile set affects a few other behaviors after all, so
I just wonder if we should stat it early, set xi.disfile, and let all the
normal paths take it from there. I'm not quite sure.
> The main thing is that it seems to me that mkfs mandates that the situation
> I outlined shouldn't ever fail. That's probably something worth adding to
> xfstests as well.
>
> And writing that test, I suppose, is something I don't mind doing either.
Tests are always welcome, that'd be great!
Thanks,
-Eric
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-28 15:32 ` Eric Sandeen
@ 2013-11-28 21:12 ` Dave Chinner
2013-11-29 1:28 ` Dave Chinner
0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2013-11-28 21:12 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Phil White, xfs
On Thu, Nov 28, 2013 at 09:32:32AM -0600, Eric Sandeen wrote:
> On 11/28/13, 4:01 AM, Dave Chinner wrote:
> > On Wed, Nov 27, 2013 at 11:34:35PM -0600, Eric Sandeen wrote:
>
> <snip>
>
> >> Or maybe just stat() it, and DTRT?
> >
> > Well, we need to stat it to make sure that it's a file if "-d file"
> > is specified, and a block device if it's not. That will prevent this
> > problem. Every other xfsprogs utility has to be told that it is
> > being pointed at an image file rather than a block device, so why
> > should mkfs be any different?
>
> The option is there but again I never really knew why. They work
> fine without -f, at least in general:
Just like mkfs works fine, *in general*. That doesn't mean they will always
work, though:
$ sudo xfs_repair -n /storage/broken.img
Phase 1 - find and verify superblock...
xfs_repair: read failed: Invalid argument
Repair fails on the file which has a smaller sector size than the
host filesystem, unless you tell it is working on a file, not a
block device:
$ sudo xfs_repair -n -f /storage/broken.img
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
would have corrected attribute entry count in inode 649642 from 40 to 0
local inode 649790 attr too small (size = 1, min size = 4)
bad attribute fork in inode 649790, would clear attr fork
would have cleared inode 649790
....
And so behaviour is identical to mkfs...
> - scan filesystem freespace and inode maps...
> ...
>
> $ xfs_metadump fsfile fsfile.meta
>
> $ file fsfile.meta
> fsfile.meta: XFS filesystem metadump image
>
> etc
>
> > Indeed, if we don't require users to tell mkfs that it's a file,
> > what do we do with non-existent device names when they are provided
> > by the user? Just create the file rather than returning ENOENT? So
> > suddenly /dev/ fills up with fileystem images because of typos?
>
> That won't happen because it doesn't create a new file unless -d file
> is specified, so I guess that's one difference. i.e. with -d file
> it'll create a file of the requested size; without it, it will mkfs
> it to whatever size the file already is, or if it doesn't exist,
> return -ENOENT.
>
> > Principle of Least Surprise says that ENOENT is the correct
> > behaviour, hence it follows that "-d file" is needed and should be
> > properly checked and enforced. I'll add this to the start of the
> > patch set I'm currently working on that fixes all of the mkfs input
> > parameter validation problems I've found over the past couple of
> > weeks...
>
> Well, I hope it doesn't stop mkfs.xfs from mkfs'ing an existing
> file image, which has always worked before...
that's what stat() is for.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Problem with mkfs.xfs on a regular file
2013-11-28 21:12 ` Dave Chinner
@ 2013-11-29 1:28 ` Dave Chinner
0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2013-11-29 1:28 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Phil White, xfs
On Fri, Nov 29, 2013 at 08:12:13AM +1100, Dave Chinner wrote:
> On Thu, Nov 28, 2013 at 09:32:32AM -0600, Eric Sandeen wrote:
> > On 11/28/13, 4:01 AM, Dave Chinner wrote:
> > > On Wed, Nov 27, 2013 at 11:34:35PM -0600, Eric Sandeen wrote:
> >
> > <snip>
> >
> > >> Or maybe just stat() it, and DTRT?
> > >
> > > Well, we need to stat it to make sure that it's a file if "-d file"
> > > is specified, and a block device if it's not. That will prevent this
> > > problem. Every other xfsprogs utility has to be told that it is
> > > being pointed at an image file rather than a block device, so why
> > > should mkfs be any different?
> >
> > The option is there but again I never really knew why. They work
> > fine without -f, at least in general:
>
> Just like mkfs works fine, *in general*. That doesn't mean they will always
> work, though:
FYI, here's the list of stuff on top of making mkfs detect files and
set the proper flags and avoid direct IO that I found while doing
this:
Other file/blockdev issues fixed:
- use getstr to detect specifying the data device name
twice.
- check file/size/name parameters before anything else.
- overwrite checks need to be done before the image file is
opened and potentially truncated.
- blkid_get_topology() should not be called for image files,
so warn when it is called that way.
- zero_old_xfs_structures() emits a spurious error:
"existing superblock read failed: Success"
when it is run on a truncated image file. Don't warn if we
see this problem on an image file.
- Don't issue discards on image files.
- Use fsync() for image files, not BLKFLSBUF in
platform_flush_device() for Linux.
And so now "-d file" is only needed to trigger creation of the
image file, or if you want to truncate the old file away completely
first...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2013-11-29 1:28 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-27 2:31 Problem with mkfs.xfs on a regular file Phil White
2013-11-27 2:36 ` Phil White
2013-11-27 2:38 ` Nathan Scott
2013-11-27 2:41 ` Phil White
2013-11-27 2:47 ` Dave Chinner
2013-11-27 2:47 ` Dave Chinner
2013-11-28 2:39 ` Eric Sandeen
2013-11-28 5:16 ` Dave Chinner
2013-11-28 5:34 ` Eric Sandeen
2013-11-28 10:01 ` Dave Chinner
2013-11-28 11:47 ` Phil White
2013-11-28 15:38 ` Eric Sandeen
2013-11-28 15:32 ` Eric Sandeen
2013-11-28 21:12 ` Dave Chinner
2013-11-29 1:28 ` Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox