* BUG relating to fstrim on btrfs partitions
From: Mike Audia @ 2013-10-10 10:20 UTC (permalink / raw)
To: linux-btrfs
I think I have found a bug affecting btrfs filesystems when fstrim is used to discard unused blocks: if I run `fstrim -v /` twice, the amount trimmed does not change on the 2nd invocation AND it takes just as long as the first. Why do I think this is a bug? When I do the same on an ext4 partition I get different behavior: the 2nd run reports 0 B trimmed and completes instantaneously. When I contacted the fstrim developer, he stated that the userspace part (fstrim) does only one thing, which is to invoke an ioctl (FITRIM); it is the job of the filesystem to implement this properly.
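For readers who want to see how little the userspace side does, here is a minimal sketch of a FITRIM call in Python. It is not the fstrim source. The helper name `fitrim` is hypothetical, and the request number is computed assuming the common Linux ioctl encoding (as on x86 and ARM); the three-field `struct fstrim_range` layout (start, len, minlen) comes from `<linux/fs.h>`.

```python
import fcntl
import os
import struct

# struct fstrim_range from <linux/fs.h>: three __u64 fields (start, len, minlen).
FSTRIM_RANGE = struct.Struct("=QQQ")

# FITRIM is _IOWR('X', 121, struct fstrim_range).
# Assumption: the generic ioctl bit layout (direction << 30, size << 16),
# which holds on x86 and ARM but not on every architecture.
FITRIM = (3 << 30) | (FSTRIM_RANGE.size << 16) | (ord("X") << 8) | 121

U64_MAX = 2**64 - 1


def fitrim(mountpoint, start=0, length=U64_MAX, minlen=0):
    """Hypothetical helper: issue FITRIM on a mounted filesystem.

    Returns the byte count the filesystem writes back into the len field,
    which is the figure `fstrim -v` prints.
    """
    buf = bytearray(FSTRIM_RANGE.pack(start, length, minlen))
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        # With a mutable buffer, fcntl.ioctl lets the kernel update it in place.
        fcntl.ioctl(fd, FITRIM, buf)
    finally:
        os.close(fd)
    _start, trimmed, _minlen = FSTRIM_RANGE.unpack(buf)
    return trimmed
```

Calling `fitrim("/")` needs root privileges. Everything beyond this call (which ranges are discarded, and what number is reported back) is decided by the filesystem.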
Supporting data
----------------
Example on a btrfs partition:
The 1st time:
% time sudo fstrim -v /
/: 5.2 GiB (5575192576 bytes) trimmed
sudo fstrim -v / 0.00s user 0.05s system 2% cpu 2.084 total
The 2nd time:
% time sudo fstrim -v /
/: 5.2 GiB (5575192576 bytes) trimmed
sudo fstrim -v / 0.00s user 0.06s system 2% cpu 2.107 total
If I run the command twice on an ext4 filesystem, it does go to zero and the 2nd invocation is instantaneous:
The 1st time:
% time sudo fstrim -v /
/: 15.4 GiB (16481087488 bytes) trimmed
sudo fstrim -v / 0.00s user 0.08s system 1% cpu 6.268 total
The 2nd time:
% time sudo fstrim -v /
/: 0 B (0 bytes) trimmed
sudo fstrim -v / 0.00s user 0.00s system 48% cpu 0.007 total
* Re: BUG relating to fstrim on btrfs partitions
From: Duncan @ 2013-10-10 11:39 UTC (permalink / raw)
To: linux-btrfs
Mike Audia posted on Thu, 10 Oct 2013 06:20:42 -0400 as excerpted:
> I think I found a bug affecting btrfs filesystems and users invoking
> fstrim to discard unused blocks: if I execute a `fstrim -v /` twice, the
> amount trimmed does not change on the 2nd invocation AND it takes just
> as long as the first. Why do I think this is a bug? When I do the same
> on an ext4 partition I get different behavior: the output shows 0 B
> trimmed and it does so instantaneously when I run it a 2nd time. After
> contacting the fstrim developer, he stated that the userspace part
> (fstrim) does only one thing, which is to invoke an ioctl (FITRIM); it is
> the job of the filesystem to properly implement this.
This behavior is documented in the fstrim manpage under -v/--verbose:
>>> When [--verbose is] specified fstrim will output the number of bytes
>>> passed from the filesystem down the block stack to the device for
>>> potential discard. This number is a maximum discard amount from the
>>> storage device's perspective, because FITRIM ioctl called repeated
>>> will keep sending the same sectors for discard repeatedly.
>>>
>>> fstrim will report the same potential discard bytes each time, but
>>> only sectors which had been written to between the discards would
>>> actually be discarded by the storage device.
Why ext4 behavior doesn't conform to that fstrim documentation I can't
say (except by stating the obvious that the ext4 filesystem
implementation of that ioctl obviously does it differently, but why...
you'd have to either ask the ext4 folks or read its docs/sources), but
given that fstrim documentation, the btrfs behavior is certainly NOTABUG
as it's simply conforming to the documentation.
* Re: BUG relating to fstrim on btrfs partitions
From: Eric Sandeen @ 2013-10-11 14:44 UTC (permalink / raw)
To: Duncan; +Cc: linux-btrfs
On 10/10/13 6:39 AM, Duncan wrote:
> Mike Audia posted on Thu, 10 Oct 2013 06:20:42 -0400 as excerpted:
>
>> I think I found a bug affecting btrfs filesystems and users invoking
>> fstrim to discard unused blocks: if I execute a `fstrim -v /` twice, the
>> amount trimmed does not change on the 2nd invocation AND it takes just
>> as long as the first. Why do I think this is a bug? When I do the same
>> on an ext4 partition I get different behavior: the output shows 0 B
>> trimmed and it does so instantaneously when I run it a 2nd time. After
>> contacting the fstrim developer, he stated that the userspace part
>> (fstrim) does only one thing, which is to invoke an ioctl (FITRIM); it is
>> the job of the filesystem to properly implement this.
>
> This behavior is documented in the fstrim manpage under -v/--verbose:
>
>>>> When [--verbose is] specified fstrim will output the number of bytes
>>>> passed from the filesystem down the block stack to the device for
>>>> potential discard. This number is a maximum discard amount from the
>>>> storage device's perspective, because FITRIM ioctl called repeated
>>>> will keep sending the same sectors for discard repeatedly.
>>>>
>>>> fstrim will report the same potential discard bytes each time, but
>>>> only sectors which had been written to between the discards would
>>>> actually be discarded by the storage device.
>
> Why ext4 behavior doesn't conform to that fstrim documentation I can't
> say (except by stating the obvious that the ext4 filesystem
> implementation of that ioctl obviously does it differently, but why...
> you'd have to either ask the ext4 folks or read its docs/sources), but
> given that fstrim documentation, the btrfs behavior is certainly NOTABUG
> as it's simply conforming to the documentation.
ext4 is conforming just fine.
"fstrim will output the number of bytes passed from the filesystem down
the block stack to the device for potential discard."
It reports the number of bytes passed *from the filesystem* to the block
device for discard, not the total range requested by the user.
If the filesystem is clever enough to know that the range in question has
not been written to since the last discard, then it takes no action, and
reports zero bytes.
So it sounds like btrfs doesn't maintain this "already discarded" state,
and will "re-discard" unused regions every time fstrim is issued.
Not a bug per se, but not really optimized.
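
The difference between the two behaviors can be sketched with a toy model. This is an illustration only, not code from either filesystem: `ToyFilesystem`, its block groups, and the `remember_trims` switch are all hypothetical names invented for the example.

```python
class ToyFilesystem:
    """Toy model: block groups holding some free bytes each.

    With remember_trims=True the model keeps an "already discarded" flag
    per group and skips flagged groups; with False it resends everything.
    """

    def __init__(self, free_bytes_per_group, remember_trims):
        self.free = list(free_bytes_per_group)
        self.remember_trims = remember_trims
        self.was_trimmed = [False] * len(self.free)

    def release(self, group, nbytes):
        """Free more space in a group; the group must be trimmed again."""
        self.free[group] += nbytes
        self.was_trimmed[group] = False

    def trim(self):
        """Return the number of bytes passed down for discard."""
        sent = 0
        for group, nbytes in enumerate(self.free):
            if self.remember_trims and self.was_trimmed[group]:
                continue  # nothing changed here since the last trim
            sent += nbytes
            self.was_trimmed[group] = True
        return sent


stateless = ToyFilesystem([100, 200], remember_trims=False)
tracking = ToyFilesystem([100, 200], remember_trims=True)
print(stateless.trim(), stateless.trim())  # 300 300
print(tracking.trim(), tracking.trim())    # 300 0
tracking.release(1, 50)
print(tracking.trim())                     # 250
```

The stateless variant matches the repeated "5.2 GiB trimmed" output reported above, and the tracking variant matches the drop to 0 B on the second run.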
-Eric
* Re: BUG relating to fstrim on btrfs partitions
From: Emil Karlson @ 2013-10-11 15:14 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Duncan, Linux Btrfs
> If the filesystem is clever enough to know that the range in question has
> not been written to since the last discard, then it takes no action, and
> reports zero bytes.
Filesystem images can be copied onto new media, so there is a
drawback to that.
Best Regards
-Emil
* Re: BUG relating to fstrim on btrfs partitions
From: Eric Sandeen @ 2013-10-11 15:21 UTC (permalink / raw)
To: Emil Karlson; +Cc: Duncan, Linux Btrfs
On 10/11/13 10:14 AM, Emil Karlson wrote:
>> If the filesystem is clever enough to know that the range in question has
>> not been written to since the last discard, then it takes no action, and
>> reports zero bytes.
>
> Filesystem images can be copied onto new media, so there is a
> drawback to that.
It's in-memory for the mounted filesystem, not on disk.
It checks the EXT4_GROUP_INFO_WAS_TRIMMED_BIT flag stored in bb_state
in the ext4_group_info structure.
So when you mount a dd'd copy, it takes a fresh look and does the right thing (DTRT).
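
As a rough illustration of why in-memory state avoids the copied-image problem, here is a small Python sketch. It is a simplification under stated assumptions, not the ext4 code: `MountedFs` is a hypothetical class, the "image" is just a list of free bytes per group, and the `was_trimmed` set stands in for the per-group EXT4_GROUP_INFO_WAS_TRIMMED_BIT flag.

```python
class MountedFs:
    """Hypothetical mount of an on-disk image (free bytes per group)."""

    def __init__(self, image):
        self.image = image         # persistent state, survives a copy
        self.was_trimmed = set()   # in-memory only, starts empty on every mount

    def trim(self):
        """Return bytes sent for discard, skipping groups trimmed this mount."""
        sent = 0
        for group, free in enumerate(self.image):
            if group not in self.was_trimmed:
                sent += free
                self.was_trimmed.add(group)
        return sent


original = MountedFs([100, 200])
print(original.trim(), original.trim())  # 300 0

# "dd" the image to new media and mount the copy: the flags are not in the
# image, so the new mount trims everything once.
clone = MountedFs(list(original.image))
print(clone.trim())                      # 300
```

The same reset happens on any remount, so under this model the first fstrim after each mount sends the full free space again.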
-Eric
> Best Regards
> -Emil
>