* mkfs.ext4 vs. e2fsck discard oddities
From: Eric Sandeen @ 2012-02-28 17:34 UTC
To: ext4 development, Lukáš Czerner
I've been testing Lukas' last 2 patches for e2fsck discard, and noticed something a little odd.
If I make a 512M file, loopback mount it, and mkfs.ext4 it with discard, it uses about 17M at that point.
If I then run fsstress on it with a known seed, then run e2fsck -E discard on it, it uses about 52M.
If I repeat the above test telling mkfs.ext4 NOT to discard, I'm left with about 94M after the discarding e2fsck.
So it seems that perhaps e2fsck is not discarding everything that it could; after a discarding fsck, we should be left with the same (minimal) nr. of blocks "in use" no?
I guess that's better than discarding _more_ than it should though. ;)
(I suppose it is possible that this is the underlying filesystem being selective about which discards it accepts, but it behaves the same way on ext4 and xfs backing filesystems)
-Eric
FWIW, sequence of events here, tested with and without "-K" on mkfs.ext4:
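# Create a 512M backing file and attach it to a loop device
dd if=/dev/zero of=fsfile bs=1M count=512
losetup /dev/loop0 fsfile
# Add -K here for the "no discard at mkfs time" run
mkfs.ext4 -F /dev/loop0 &>/dev/null
mount /dev/loop0 mnt/
# Fixed seed, so both runs perform the same operations
/root/git/xfstests/ltp/fsstress -s 1 -d mnt/ -n 2000 -p 4
umount mnt/
e2fsck/e2fsck.static -fy -E discard /dev/loop0 > fsck1.out || exit
# Report how much of the backing file is still allocated
du -hc fsfile
losetup -d /dev/loop0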
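
The "M in use" figures above come from du on the backing file, which reports allocated space rather than the file's logical size. As I understand it, the loop driver turns discards into holes punched in the backing file, so allocation shrinks while the logical size stays at 512M. A minimal sketch for reading both numbers; the helper names apparent_kib, allocated_kib and report are made up for illustration:

```sh
# apparent_kib FILE: logical file size in KiB (what "ls -l" would show)
apparent_kib() {
    echo $(( $(wc -c < "$1") / 1024 ))
}

# allocated_kib FILE: KiB actually backed by storage (what "du" reports)
allocated_kib() {
    du -k "$1" | cut -f1
}

# report FILE: print both figures on one line
report() {
    echo "$1: apparent $(apparent_kib "$1") KiB, allocated $(allocated_kib "$1") KiB"
}
```

Running "report fsfile" after each step of the sequence should show where the allocation goes up (fsstress) and where it comes back down (the discarding e2fsck).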
* Re: mkfs.ext4 vs. e2fsck discard oddities
From: Lukas Czerner @ 2012-02-29 7:12 UTC
To: Eric Sandeen; +Cc: ext4 development, Lukáš Czerner
On Tue, 28 Feb 2012, Eric Sandeen wrote:
> I've been testing Lukas' last 2 patches for e2fsck discard, and noticed something a little odd.
>
> If I make a 512M file, loopback mount it, and mkfs.ext4 it with discard, it uses about 17M at that point.
> If I then run fsstress on it with a known seed, then run e2fsck -E discard on it, it uses about 52M.
>
> If I repeat the above test telling mkfs.ext4 NOT to discard, I'm left with about 94M after the discarding e2fsck.
>
> So it seems that perhaps e2fsck is not discarding everything that it could; after a discarding fsck, we should be left with the same (minimal) nr. of blocks "in use" no?
The reason is (as I commented in patch #2) that we will not discard
BLOCK_UNINIT groups. We use BLOCK_UNINIT as an optimization to
skip groups which are likely to be non-provisioned, because we have
never written anything there since the mkfs.
If you create a file system without discard, then obviously nothing is
discarded, the image is fully provisioned, and e2fsck discards *only*
initialized groups. So you'll end up with a bigger image if your image was
not sparse.
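
To see how many groups fall under that rule on a given image, one can count the groups that dumpe2fs lists with the BLOCK_UNINIT flag. This is only a sketch: the function name count_block_uninit is made up, and it assumes the usual "Group N: ... [FLAGS]" lines in dumpe2fs output.

```sh
# count_block_uninit: read dumpe2fs output on stdin and print
# "<groups flagged BLOCK_UNINIT> <total groups>".
count_block_uninit() {
    awk '/^Group [0-9]+:/ { total++; if ($0 ~ /BLOCK_UNINIT/) uninit++ }
         END { printf "%d %d\n", uninit, total }'
}

# Usage (needs read access to the device or image file):
#   dumpe2fs /dev/loop0 2>/dev/null | count_block_uninit
```

The first number is the count of groups a discarding e2fsck would skip under the rule described above; the free space in those groups stays allocated if mkfs did not discard it.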
I hope that makes sense.
Actually I want to make the same optimization for fitrim. We discussed
it with Ted and Phillip (see the discussion under "[RESEND] [PATCH 2/2
v2] e2fsck: Do not forget to discard last block group"). They did not seem
to be convinced by that, however I think it is the right thing to do for the
reasons I gave in that thread.
Thanks!
-Lukas
* Re: mkfs.ext4 vs. e2fsck discard oddities
From: Eric Sandeen @ 2012-02-29 16:01 UTC
To: Lukas Czerner; +Cc: ext4 development
On 2/29/12 1:12 AM, Lukas Czerner wrote:
> On Tue, 28 Feb 2012, Eric Sandeen wrote:
>
>> I've been testing Lukas' last 2 patches for e2fsck discard, and noticed something a little odd.
>>
>> If I make a 512M file, loopback mount it, and mkfs.ext4 it with discard, it uses about 17M at that point.
>> If I then run fsstress on it with a known seed, then run e2fsck -E discard on it, it uses about 52M.
>>
>> If I repeat the above test telling mkfs.ext4 NOT to discard, I'm left with about 94M after the discarding e2fsck.
>>
>> So it seems that perhaps e2fsck is not discarding everything that it could; after a discarding fsck, we should be left with the same (minimal) nr. of blocks "in use" no?
>
> The reason is (as I commented in patch #2) that we will not discard
> BLOCK_UNINIT groups. We use BLOCK_UNINIT as an optimization to
> skip groups which are likely to be non-provisioned, because we have
> never written anything there since the mkfs.
>
> If you create a file system without discard, then obviously nothing is
> discarded, the image is fully provisioned, and e2fsck discards *only*
> initialized groups. So you'll end up with a bigger image if your image was
> not sparse.
>
> I hope that makes sense.
It does, sorry, I had been focusing too much on patch #1 ;)
-Eric
* Re: mkfs.ext4 vs. e2fsck discard oddities
From: Theodore Tso @ 2012-03-01 4:47 UTC
To: Lukas Czerner; +Cc: Theodore Tso, Eric Sandeen, ext4 development
On Feb 29, 2012, at 2:12 AM, Lukas Czerner wrote:
>
> The reason is (as I commented in patch #2) that we will not discard
> BLOCK_UNINIT groups. We use BLOCK_UNINIT as an optimization to
> skip groups which are likely to be non-provisioned, because we have
> never written anything there since the mkfs.
>
> If you create a file system without discard, then obviously nothing is
> discarded, the image is fully provisioned, and e2fsck discards *only*
> initialized groups. So you'll end up with a bigger image if your image was
> not sparse.
I still think it makes sense to have an option where we discard everything,
including BLOCK_UNINIT blocks. Mke2fs doesn't discard blocks by default
because of a fear of crappy SSD drives, and while that fear may be
overstated, assuming that all of the unused blocks will *always* have been
discarded at mkfs time isn't necessarily a good thing to assume. I'll grant
that it might be a fine default, but there needs to be *some* way to discard
everything that's unused...
-- Ted
* Re: mkfs.ext4 vs. e2fsck discard oddities
From: Lukas Czerner @ 2012-03-01 7:12 UTC
To: Theodore Tso; +Cc: Lukas Czerner, Eric Sandeen, ext4 development
On Wed, 29 Feb 2012, Theodore Tso wrote:
>
> On Feb 29, 2012, at 2:12 AM, Lukas Czerner wrote:
> >
> > The reason is (as I commented in patch #2) that we will not discard
> > BLOCK_UNINIT groups. We use BLOCK_UNINIT as an optimization to
> > skip groups which are likely to be non-provisioned, because we have
> > never written anything there since the mkfs.
> >
> > If you create a file system without discard, then obviously nothing is
> > discarded, the image is fully provisioned, and e2fsck discards *only*
> > initialized groups. So you'll end up with a bigger image if your image was
> > not sparse.
>
> I still think it makes sense to have an option where we discard everything,
> including BLOCK_UNINIT blocks. Mke2fs doesn't discard blocks by default
> because of a fear of crappy SSD drives, and while that fear may be
> overstated, assuming that all of the unused blocks will *always* have been
> discarded at mkfs time isn't necessarily a good thing to assume. I'll grant
> that it might be a fine default, but there needs to be *some* way to discard
> everything that's unused...
>
> -- Ted
>
>
Hi Ted,
actually mke2fs does discard blocks by default. It has been like that
since the beginning. Back then we only had the '-K' argument to 'keep'
blocks and not attempt to discard. Nowadays the user can pass '-E
nodiscard', but it is the user's choice. That is one of the premises of my
patch to skip BLOCK_UNINIT.
I hope that the 'age' of crappy SSDs is over now, and even though
there are surely some of them still running, we do not want to optimize
for them, but rather for the better quality SSDs, right?
Thanks!
-Lukas
* Re: mkfs.ext4 vs. e2fsck discard oddities
From: Ted Ts'o @ 2012-03-01 14:38 UTC
To: Lukas Czerner; +Cc: Eric Sandeen, ext4 development
On Thu, Mar 01, 2012 at 08:12:44AM +0100, Lukas Czerner wrote:
>
> actually mke2fs does discard blocks by default. It has been like that
> since the beginning. Back then we only had the '-K' argument to 'keep'
> blocks and not attempt to discard. Nowadays the user can pass '-E
> nodiscard', but it is the user's choice.
Ah, you're right. The defaults had changed back and forth a couple of
times over time and I had lost track of how things had been settled
for mke2fs (which is different from e2fsck). At least at one point it
was _not_ the default, and in fact the man page was out of sync with
the behavior of mke2fs.
The point remains the same, though, if the file system was created
with mke2fs -E nodiscard, how do you undo that decision if there's no
way to force the discard of BLOCK_UNINIT blocks?
- Ted
* Re: mkfs.ext4 vs. e2fsck discard oddities
From: Lukas Czerner @ 2012-03-01 14:54 UTC
To: Ted Ts'o; +Cc: Lukas Czerner, Eric Sandeen, ext4 development
On Thu, 1 Mar 2012, Ted Ts'o wrote:
> On Thu, Mar 01, 2012 at 08:12:44AM +0100, Lukas Czerner wrote:
> >
> > actually mke2fs does discard blocks by default. It has been like that
> > since the beginning. Back then we only had the '-K' argument to 'keep'
> > blocks and not attempt to discard. Nowadays the user can pass '-E
> > nodiscard', but it is the user's choice.
>
> Ah, you're right. The defaults had changed back and forth a couple of
> times over time and I had lost track of how things had been settled
> for mke2fs (which is different from e2fsck). At least at one point it
> was _not_ the default, and in fact the man page was out of sync with
> the behavior of mke2fs.
I am really not sure about that, I know that there was a discussion
whether to disable it by default, but I think that we never did that.
But that's not important.
>
> The point remains the same, though, if the file system was created
> with mke2fs -E nodiscard, how do you undo that decision if there's no
> way to force the discard of BLOCK_UNINIT blocks?
>
> - Ted
>
Well, it is not the default, right? So the user had better know what
he is doing. Moreover, it is not the end of the world if we do not
provide that option, since SSDs will handle over-provisioning to some
extent even without slowdown, and as for thin-provisioned devices you
should know why you're overriding defaults and what it means for you.
Anyway, if people really want another option to discard all the
block groups, including the UNINIT ones, I guess I cannot resist that
:). '-E discard_all' maybe?
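
The difference between the two policies can be sketched with a toy model. This is not e2fsck code: the function name would_discard, the "flag:free_mb" notation and the numbers are all made up, and "all" merely stands in for a discard-everything option like the one floated above.

```sh
# would_discard POLICY GROUP...
# Each GROUP is "init:<free MB>" or "uninit:<free MB>".
# POLICY "skip_uninit" skips groups flagged uninit (BLOCK_UNINIT);
# POLICY "all" discards free space in every group.
# Prints the total MB of free space that would be discarded.
would_discard() (
    policy=$1
    shift
    total=0
    for g in "$@"; do
        flag=${g%%:*}   # text before the colon
        free=${g##*:}   # text after the colon
        if [ "$policy" = skip_uninit ] && [ "$flag" = uninit ]; then
            continue
        fi
        total=$((total + free))
    done
    echo "$total"
)
```

With "would_discard skip_uninit init:40 uninit:128 uninit:128" the result is 40, while the "all" policy gives 296. On an image made with mkfs discard, the uninit groups were already released, so the skipped 256 costs nothing; on an image made with -K or -E nodiscard it stays provisioned.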
Thanks!
-Lukas
* Re: mkfs.ext4 vs. e2fsck discard oddities
From: Phillip Susi @ 2012-03-08 16:48 UTC
To: Lukas Czerner; +Cc: Ted Ts'o, Eric Sandeen, ext4 development
On 3/1/2012 9:54 AM, Lukas Czerner wrote:
> Well, it is not the default, right? So the user had better know what
> he is doing. Moreover, it is not the end of the world if we do not
> provide that option, since SSDs will handle over-provisioning to some
> extent even without slowdown, and as for thin-provisioned devices you
> should know why you're overriding defaults and what it means for you.
>
> Anyway, if people really want another option to discard all the
> block groups, including the UNINIT ones, I guess I cannot resist that
> :). '-E discard_all' maybe?
I think the option is a little more generic than discard. The uninit
groups are not discarded because they are not checked in the first
place. A bad group descriptor checksum will force the group to be
checked, and thus discarded as well. I think what is needed is an
option to trigger the same thing: force all groups to be checked, even
if they are uninit and have good descriptor checksums. Maybe -E thorough?
* Re: mkfs.ext4 vs. e2fsck discard oddities
From: Lukas Czerner @ 2012-03-09 8:59 UTC
To: Phillip Susi; +Cc: Lukas Czerner, Ted Ts'o, Eric Sandeen, ext4 development
On Thu, 8 Mar 2012, Phillip Susi wrote:
> On 3/1/2012 9:54 AM, Lukas Czerner wrote:
> > Well, it is not the default, right? So the user had better know what
> > he is doing. Moreover, it is not the end of the world if we do not
> > provide that option, since SSDs will handle over-provisioning to some
> > extent even without slowdown, and as for thin-provisioned devices you
> > should know why you're overriding defaults and what it means for you.
> >
> > Anyway, if people really want another option to discard all the
> > block groups, including the UNINIT ones, I guess I cannot resist that
> > :). '-E discard_all' maybe?
>
> I think the option is a little more generic than discard. The uninit groups
> are not discarded because they are not checked in the first place. A bad
> group descriptor checksum will force the group to be checked, and thus
> discarded as well. I think what is needed is an option to trigger the same
> thing: force all groups to be checked, even if they are uninit and have good
> descriptor checksums. Maybe -E thorough?
>
Why would we try to check UNINIT groups with valid descriptor checksums?
I think that this problem will be solved with the BLOCK_DISCARDED flag, as
we discussed with Ted in another thread. No need to have yet another
option, so it is win-win :)
Thanks!
-Lukas
* Re: mkfs.ext4 vs. e2fsck discard oddities
From: Phillip Susi @ 2012-03-09 15:14 UTC
To: Lukas Czerner; +Cc: Ted Ts'o, Eric Sandeen, ext4 development
On 3/9/2012 3:59 AM, Lukas Czerner wrote:
> Why would we try to check UNINIT groups with valid descriptor checksums?
> I think that this problem will be solved with the BLOCK_DISCARDED flag, as
> we discussed with Ted in another thread. No need to have yet another
> option, so it is win-win :)
Because not skipping a specific action on an uninitialized group
(discard) is a specific case of the more general form of not skipping
uninitialized groups. I thought that it might sometimes be useful to
actually verify the group is correct instead of trusting the uninit
flag, especially if you are about to discard it. Also any other things
that are added in the future and skipped for uninit groups would not
need yet another flag to specifically not skip that action, since it
will be covered by the more general flag already.
Also the way the code was structured it looked like it would be much
simpler to bypass the skip and do the full check of the uninit group
than to modify it to discard the group even though checking it was skipped.