* fstrim on BTRFS
@ 2011-12-28 16:57 Martin Steigerwald
2011-12-29 4:02 ` Li Zefan
2011-12-29 4:29 ` Fajar A. Nugraha
0 siblings, 2 replies; 15+ messages in thread
From: Martin Steigerwald @ 2011-12-28 16:57 UTC (permalink / raw)
To: linux-btrfs
Hi!
With 3.2-rc4 (probably earlier), Ext4 seems to remember what areas it
trimmed:
merkaba:~> fstrim -v /boot
/boot: 224657408 bytes were trimmed
merkaba:~> fstrim -v /boot
/boot: 0 bytes were trimmed
But BTRFS does not:
merkaba:~> fstrim -v /
/: 4431613952 bytes were trimmed
merkaba:~> fstrim -v /
/: 4341846016 bytes were trimmed
Is it planned to add this feature to BTRFS as well?
I wish you a relaxing time between Christmas and New Year,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: fstrim on BTRFS
2011-12-28 16:57 fstrim on BTRFS Martin Steigerwald
@ 2011-12-29 4:02 ` Li Zefan
2011-12-29 4:21 ` Fajar A. Nugraha
` (2 more replies)
2011-12-29 4:29 ` Fajar A. Nugraha
1 sibling, 3 replies; 15+ messages in thread
From: Li Zefan @ 2011-12-29 4:02 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-btrfs
Martin Steigerwald wrote:
> Hi!
>
> With 3.2-rc4 (probably earlier), Ext4 seems to remember what areas it
> trimmed:
>
> merkaba:~> fstrim -v /boot
> /boot: 224657408 bytes were trimmed
> merkaba:~> fstrim -v /boot
> /boot: 0 bytes were trimmed
>
>
> But BTRFS does not:
>
> merkaba:~> fstrim -v /
> /: 4431613952 bytes were trimmed
> merkaba:~> fstrim -v /
> /: 4341846016 bytes were trimmed
>
>
> Is it planned to add this feature to BTRFS as well?
>
There's no such plan, but it's do-able, and I can take care of it.
There's an issue though.
Do we want to store TRIMMED information on disk? ext4 doesn't
do this, so the first fstrim will be slow even though you've done fstrim
in a previous mount.
For btrfs this issue can't be solved without a disk format change that
will break older kernels, but only 3.2-rcX kernels will be affected if
we push the following change into mainline before the 3.2 release.
---
ctree.h | 4 ++--
free-space-cache.c | 5 +++--
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6738503..919e055 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -278,8 +278,8 @@ struct btrfs_chunk {
/* additional stripes go here */
} __attribute__ ((__packed__));
-#define BTRFS_FREE_SPACE_EXTENT 1
-#define BTRFS_FREE_SPACE_BITMAP 2
+#define BTRFS_FREE_SPACE_EXTENT 0
+#define BTRFS_FREE_SPACE_BITMAP 1
struct btrfs_free_space_entry {
__le64 offset;
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index ec23d43..8a7c0e0 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -481,6 +481,7 @@ static int io_ctl_add_entry(struct io_ctl *io_ctl, u64 offset, u64 bytes,
entry->bytes = cpu_to_le64(bytes);
entry->type = (bitmap) ? BTRFS_FREE_SPACE_BITMAP :
BTRFS_FREE_SPACE_EXTENT;
+ entry->type = 1 << entry->type;
io_ctl->cur += sizeof(struct btrfs_free_space_entry);
io_ctl->size -= sizeof(struct btrfs_free_space_entry);
@@ -669,7 +670,7 @@ int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
goto free_cache;
}
- if (type == BTRFS_FREE_SPACE_EXTENT) {
+ if (type & BTRFS_FREE_SPACE_EXTENT) {
spin_lock(&ctl->tree_lock);
ret = link_free_space(ctl, e);
spin_unlock(&ctl->tree_lock);
@@ -679,7 +680,7 @@ int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
kmem_cache_free(btrfs_free_space_cachep, e);
goto free_cache;
}
- } else {
+ } else if (type & BTRFS_FREE_SPACE_BITMAP) {
BUG_ON(!num_bitmaps);
num_bitmaps--;
e->bitmap = kzalloc(PAGE_CACHE_SIZE, GFP_NOFS);
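Side note on the encoding: turning the two constants into bit positions and
storing "1 << type" on disk makes the type byte a flags field, presumably so
that a later kernel could OR in additional per-entry state such as a trimmed
bit without another incompatible change. A small standalone sketch of that
idea follows, purely for illustration; FREE_SPACE_TRIMMED and this program
are hypothetical and not part of btrfs or of the patch above.
/* Standalone illustration of the bit-flag encoding, plus a hypothetical
 * TRIMMED bit that such an encoding would leave room for. */
#include <stdio.h>
#define FREE_SPACE_EXTENT  0   /* mirrors BTRFS_FREE_SPACE_EXTENT above */
#define FREE_SPACE_BITMAP  1   /* mirrors BTRFS_FREE_SPACE_BITMAP above */
#define FREE_SPACE_TRIMMED 2   /* hypothetical; does not exist in btrfs */
int main(void)
{
        unsigned char type = 1 << FREE_SPACE_EXTENT;   /* an extent entry */
        type |= 1 << FREE_SPACE_TRIMMED;               /* mark it trimmed */
        printf("is extent:  %d\n", !!(type & (1 << FREE_SPACE_EXTENT)));
        printf("is bitmap:  %d\n", !!(type & (1 << FREE_SPACE_BITMAP)));
        printf("is trimmed: %d\n", !!(type & (1 << FREE_SPACE_TRIMMED)));
        return 0;
}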
* Re: fstrim on BTRFS
2011-12-29 4:02 ` Li Zefan
@ 2011-12-29 4:21 ` Fajar A. Nugraha
2011-12-29 4:32 ` Fajar A. Nugraha
2011-12-29 4:37 ` Roman Mamedov
2011-12-29 10:52 ` Martin Steigerwald
2012-01-03 21:05 ` Chris Mason
2 siblings, 2 replies; 15+ messages in thread
From: Fajar A. Nugraha @ 2011-12-29 4:21 UTC (permalink / raw)
To: linux-btrfs
On Thu, Dec 29, 2011 at 11:02 AM, Li Zefan <lizf@cn.fujitsu.com> wrote:
> Martin Steigerwald wrote:
>> With 3.2-rc4 (probably earlier), Ext4 seems to remember what areas it
>> trimmed:
>> But BTRFS does not:
> There's no such plan, but it's do-able, and I can take care of it.
> There's an issue though.
> For btrfs this issue can't be solved without a disk format change that
> will break older kernels, but only 3.2-rcX kernels will be affected if
> we push the following change into mainline before the 3.2 release.
Slightly off-topic, how useful would trim be for btrfs when using
newer SSDs which have their own garbage collection and wear leveling
(e.g. sandforce-based)?
I'm trying fstrim and my disk is now pegged at write IOPS. Just
wondering if maybe a "btrfs fi balance" would be more useful, since:
- with trim, used space will remain used. Thus future writes will only
utilize space marked as "free", making them wear faster
- with "btrfs fi balance", btrfs will move the data around so (to some
degree) the currently-unused space will be used, and currently-used
space will be unused, which will improve wear leveling.
--
Fajar
* Re: fstrim on BTRFS
2011-12-28 16:57 fstrim on BTRFS Martin Steigerwald
2011-12-29 4:02 ` Li Zefan
@ 2011-12-29 4:29 ` Fajar A. Nugraha
2011-12-29 9:39 ` Li Zefan
1 sibling, 1 reply; 15+ messages in thread
From: Fajar A. Nugraha @ 2011-12-29 4:29 UTC (permalink / raw)
To: linux-btrfs
On Wed, Dec 28, 2011 at 11:57 PM, Martin Steigerwald
<Martin@lichtvoll.de> wrote:
> But BTRFS does not:
>
> merkaba:~> fstrim -v /
> /: 4431613952 bytes were trimmed
> merkaba:~> fstrim -v /
> /: 4341846016 bytes were trimmed
.... and apparently it can't trim everything. Or maybe my kernel is
just too old.
$ sudo fstrim -v /
2258165760 Bytes was trimmed
$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/sda6 50G 34G 12G 75% /
$ mount | grep "/ "
/dev/sda6 on / type btrfs (rw,noatime,subvolid=258,compress-force=lzo)
so only about 2G out of 12G can be trimmed. This is on kernel 3.1.4.
--
Fajar
* Re: fstrim on BTRFS
2011-12-29 4:21 ` Fajar A. Nugraha
@ 2011-12-29 4:32 ` Fajar A. Nugraha
2011-12-29 4:37 ` Roman Mamedov
1 sibling, 0 replies; 15+ messages in thread
From: Fajar A. Nugraha @ 2011-12-29 4:32 UTC (permalink / raw)
To: linux-btrfs
On Thu, Dec 29, 2011 at 11:21 AM, Fajar A. Nugraha <list@fajar.net> wrote:
> I'm trying fstrim and my disk is now pegged at write IOPS. Just
> wondering if maybe a "btrfs fi balance" would be more useful,
Sorry, I meant "btrfs fi defrag"
--
Fajar
* Re: fstrim on BTRFS
2011-12-29 4:21 ` Fajar A. Nugraha
2011-12-29 4:32 ` Fajar A. Nugraha
@ 2011-12-29 4:37 ` Roman Mamedov
2011-12-29 4:42 ` Fajar A. Nugraha
1 sibling, 1 reply; 15+ messages in thread
From: Roman Mamedov @ 2011-12-29 4:37 UTC (permalink / raw)
To: Fajar A. Nugraha; +Cc: linux-btrfs
On Thu, 29 Dec 2011 11:21:14 +0700
"Fajar A. Nugraha" <list@fajar.net> wrote:
> Slightly off-topic, how useful would trim be for btrfs when using
> newer SSDs which have their own garbage collection and wear leveling
> (e.g. sandforce-based)?
>
> I'm trying fstrim and my disk is now pegged at write IOPS. Just
> wondering if maybe a "btrfs fi balance" would be more useful, since:
> - with trim, used space will remain used. Thus future writes will only
> utilize space marked as "free", making them wear faster
> - with "btrfs fi balance", btrfs will move the data around so (to some
> degree) the currently-unused space will be used, and currently-used
> space will be unused, which will improve wear leveling.
Modern controllers (like the SandForce you mentioned) do their own wear leveling 'under the hood', i.e. the same user-visible sectors DO NOT necessarily map to the same locations on the flash at all times; and introducing 'manual' wear leveling by additional rewriting is not a good idea, it's just going to wear it out more.
--
With respect,
Roman
~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
* Re: fstrim on BTRFS
2011-12-29 4:37 ` Roman Mamedov
@ 2011-12-29 4:42 ` Fajar A. Nugraha
2011-12-29 5:29 ` cwillu
0 siblings, 1 reply; 15+ messages in thread
From: Fajar A. Nugraha @ 2011-12-29 4:42 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-btrfs
On Thu, Dec 29, 2011 at 11:37 AM, Roman Mamedov <rm@romanrm.ru> wrote:
> On Thu, 29 Dec 2011 11:21:14 +0700
> "Fajar A. Nugraha" <list@fajar.net> wrote:
>
>> I'm trying fstrim and my disk is now pegged at write IOPS. Just
>> wondering if maybe a "btrfs fi balance" would be more useful, since:
> Modern controllers (like the SandForce you mentioned) do their own wear leveling 'under the hood', i.e. the same user-visible sectors DO NOT necessarily map to the same locations on the flash at all times; and introducing 'manual' wear leveling by additional rewriting is not a good idea, it's just going to wear it out more.
I know that modern controllers have their own wear leveling, but AFAIK
they basically:
(1) have reserved a certain size for wear leveling purposes
(2) when a write request comes, they basically use new sectors from
the pool, and return the "old" sectors to the pool (doing garbage
collection like trim/rewrite in the process)
(3) they can't re-use sectors that are currently being used and not
rewritten (e.g. sectors used by OS files)
If (3) is still valid, then the only way to reuse the sectors is by
forcing a rewrite (e.g. using "btrfs fi defrag"). So the question is,
is (3) still valid?
--
Fajar
* Re: fstrim on BTRFS
2011-12-29 4:42 ` Fajar A. Nugraha
@ 2011-12-29 5:29 ` cwillu
0 siblings, 0 replies; 15+ messages in thread
From: cwillu @ 2011-12-29 5:29 UTC (permalink / raw)
To: Fajar A. Nugraha; +Cc: Roman Mamedov, linux-btrfs
On Wed, Dec 28, 2011 at 10:42 PM, Fajar A. Nugraha <list@fajar.net> wrote:
> On Thu, Dec 29, 2011 at 11:37 AM, Roman Mamedov <rm@romanrm.ru> wrote:
>> On Thu, 29 Dec 2011 11:21:14 +0700
>> "Fajar A. Nugraha" <list@fajar.net> wrote:
>>
>>> I'm trying fstrim and my disk is now pegged at write IOPS. Just
>>> wondering if maybe a "btrfs fi balance" would be more useful, since:
>
>
>> Modern controllers (like the SandForce you mentioned) do their own wear leveling 'under the hood', i.e. the same user-visible sectors DO NOT necessarily map to the same locations on the flash at all times; and introducing 'manual' wear leveling by additional rewriting is not a good idea, it's just going to wear it out more.
>
> I know that modern controllers have their own wear leveling, but AFAIK
> they basically:
> (1) have reserved a certain size for wear leveling purposes
> (2) when a write request comes, they basically use new sectors from
> the pool, and return the "old" sectors to the pool (doing garbage
> collection like trim/rewrite in the process)
> (3) they can't re-use sectors that are currently being used and not
> rewritten (e.g. sectors used by OS files)
>
> If (3) is still valid, then the only way to reuse the sectors is by
> forcing a rewrite (e.g. using "btrfs fi defrag"). So the question is,
> is (3) still valid?
Erase blocks are generally much larger than logical sectors. There's nothing
stopping an SSD from shuffling around logical sectors as much as it wants, at
any time, and virtually all SSDs do this behind the scenes already, sufficiently
to maintain adequate wear levelling.
The problem isn't levelling, but rather that once the pool of erase blocks with
remaining clear space is gone, any further writes require the SSD to do a
read/erase/rewrite shuffle of the valid data in an erase block to reclaim and
compact the scattered overwritten sectors. Early SSDs ended up operating in
this mode continuously, which is why their performance would drop off over
time: every little 512 byte write would require reading several hundred
kilobytes (if not megabytes) first, so that it could be rewritten with the new
data after erasing the whole block (cutting the power during this process would
often cause additional hilarity; SD cards have been especially bad for this).
The later controllers gained some intelligence, such that they would set aside
some erase blocks to perform that compaction in the background, allowing them
to maintain a pool of free erase blocks. Note that it's trivial at that point
for the drive to move the data from a relatively unworn erase block to one from
the pool if necessary, although I don't know that this is actually used, as
wear levelling really isn't a big deal in practice.
What TRIM does in this mix is tell the SSD that various logical blocks can be
considered to be overwritten (so to speak), and as such don't need to be (and
shouldn't be!) rewritten if and when the erase block that holds them is
compacted. This allows the SSD to compact those sectors into the pool earlier
than it might have been able to otherwise (in the best case), and in the worst
case can prevent that data from being needlessly copied again and again.
Consider if you filled a somewhat naive SSD (specifically, one which held no
spare erase blocks for compaction) to capacity, deleted everything, and then
overwrote the same logical sector repeatedly: without trim, the ssd has no way
of knowing that the rest of the blocks are garbage that can be reused, and so
it'll be stuck reading an entire erase block's worth of garbage, clearing the
erase block, and writing that garbage back out with the changed 512 bytes.
Even with wear-levelling, you'll still suffer a horrendous write-performance
loss, and will wear through the drive far faster than one might otherwise
expect.
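To put rough numbers on that pathological mode, here is a tiny sketch with
purely assumed geometry (a 512 KiB erase block and 512-byte writes; real
drives vary):
/* Back-of-the-envelope sketch of the worst case described above; the
 * erase-block size is an assumption, not a property of any drive. */
#include <stdio.h>
int main(void)
{
        long erase_block = 512 * 1024;  /* assumed erase block size: 512 KiB */
        long sector      = 512;         /* one logical-sector write          */
        /* With no free erase blocks in reserve, every sector write forces
         * the drive to read, erase and rewrite an entire erase block. */
        printf("bytes rewritten per %ld-byte write: %ld (%ldx amplification)\n",
               sector, erase_block, erase_block / sector);
        return 0;
}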
This is why some have said that TRIM support is just a crutch for poor
firmware, and is why many devices (all, the last time I checked :p) have poorly
performing TRIM commands: with a couple erase blocks set aside, that
pathological case won't occur; instead you'll have a couple erase blocks that
gradually get filled up with old copies of the only logical sector that's
changing, which can be efficiently erased and returned to the pool. Add in
some transparent compression (e.g., OCZ's), and you can probably get away with
very few erase blocks in the free pool and still maintain acceptable write
performance.
In light of this, the problem with just using btrfs's defrag/balance as
currently implemented becomes more apparent: we're not actually freeing up any
space, we're just overwriting logical sectors with data that was already stored
elsewhere. In the mythical best case, a magical SSD will notice the duplicated
blocks and just store a reference; in the common case of a half-decent
firmware, the SSD will still get along okay (it's basically the same situation
as the previous example); in the worst case of a naive or misguided SSD, you're
pretty much guaranteeing the worst case behaviour: filling up the drive with
garbage, at which point the writes from the balance/defrag will likely hit the
wear-amplification case described above.
Or something like that anyway :p
--Carey Underwood
* Re: fstrim on BTRFS
2011-12-29 4:29 ` Fajar A. Nugraha
@ 2011-12-29 9:39 ` Li Zefan
2011-12-29 9:52 ` Fajar A. Nugraha
0 siblings, 1 reply; 15+ messages in thread
From: Li Zefan @ 2011-12-29 9:39 UTC (permalink / raw)
To: Fajar A. Nugraha; +Cc: linux-btrfs
Fajar A. Nugraha wrote:
> On Wed, Dec 28, 2011 at 11:57 PM, Martin Steigerwald
> <Martin@lichtvoll.de> wrote:
>> But BTRFS does not:
>>
>> merkaba:~> fstrim -v /
>> /: 4431613952 bytes were trimmed
>> merkaba:~> fstrim -v /
>> /: 4341846016 bytes were trimmed
>
> .... and apparently it can't trim everything. Or maybe my kernel is
> just too old.
>
>
> $ sudo fstrim -v /
> 2258165760 Bytes was trimmed
>
> $ df -h /
> Filesystem Size Used Avail Use% Mounted on
> /dev/sda6 50G 34G 12G 75% /
>
> $ mount | grep "/ "
> /dev/sda6 on / type btrfs (rw,noatime,subvolid=258,compress-force=lzo)
>
> so only about 2G out of 12G can be trimmed. This is on kernel 3.1.4.
>
That's because only free space in block groups will be trimmed. Btrfs
allocates space from block groups, and when there's no space available,
it will allocate a new block group from the pool. In your case there's
~10G in the pool.
You can do a "btrfs fi df /", and you'll see the total size of existing
block groups.
You can empty the pool by:
# dd if=/dev/zero of=/mytmpfile bs=1M
Then release the space (but it won't return back to the pool):
# rm /mytmpfile
# sync
and try "btrfs fi df /" and trim again.
* Re: fstrim on BTRFS
2011-12-29 9:39 ` Li Zefan
@ 2011-12-29 9:52 ` Fajar A. Nugraha
2011-12-30 6:19 ` Li Zefan
0 siblings, 1 reply; 15+ messages in thread
From: Fajar A. Nugraha @ 2011-12-29 9:52 UTC (permalink / raw)
To: Li Zefan; +Cc: linux-btrfs
On Thu, Dec 29, 2011 at 4:39 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
> Fajar A. Nugraha wrote:
>> On Wed, Dec 28, 2011 at 11:57 PM, Martin Steigerwald
>> <Martin@lichtvoll.de> wrote:
>>> But BTRFS does not:
>>>
>>> merkaba:~> fstrim -v /
>>> /: 4431613952 bytes were trimmed
>>> merkaba:~> fstrim -v /
>>> /: 4341846016 bytes were trimmed
>>
>> .... and apparently it can't trim everything. Or maybe my kernel is
>> just too old.
>>
>>
>> $ sudo fstrim -v /
>> 2258165760 Bytes was trimmed
>>
>> $ df -h /
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/sda6              50G   34G   12G  75% /
>>
>> $ mount | grep "/ "
>> /dev/sda6 on / type btrfs (rw,noatime,subvolid=258,compress-force=lzo)
>>
>> so only about 2G out of 12G can be trimmed. This is on kernel 3.1.4.
>>
>
> That's because only free space in block groups will be trimmed. Btrfs
> allocates space from block groups, and when there's no space available,
> it will allocate a new block group from the pool. In your case there's
> ~10G in the pool.
Thanks for your response.
>
> You can do a "btrfs fi df /", and you'll see the total size of existing
> block groups.
$ sudo btrfs fi df /
> Data: total=43.47GB, used=31.88GB
> System, DUP: total=8.00MB, used=12.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=3.25GB, used=619.88MB
> Metadata: total=8.00MB, used=0.00
That should mean existing block groups total at least 46GB, right? In
which case my pool (a 50G partition) should only have about 4GB of
space not allocated to block groups. The numbers don't seem to match.
>
> You can empty the pool by:
>
>        # dd if=/dev/zero of=/mytmpfile bs=1M
>
> Then release the space (but it won't return back to the pool):
>
>        # rm /mytmpfile
>        # sync
Is there a bad side effect of doing so? For example, since all free
space in the pool would be allocated to data block group, would that
mean my metadata block group is capped at 3.25GB? Or could some data
block groups be converted to metadata, and vice versa?
--
Fajar
* Re: fstrim on BTRFS
2011-12-29 4:02 ` Li Zefan
2011-12-29 4:21 ` Fajar A. Nugraha
@ 2011-12-29 10:52 ` Martin Steigerwald
2012-01-03 21:05 ` Chris Mason
2 siblings, 0 replies; 15+ messages in thread
From: Martin Steigerwald @ 2011-12-29 10:52 UTC (permalink / raw)
To: linux-btrfs
On Thursday, 29 December 2011, Li Zefan wrote:
> Martin Steigerwald wrote:
> > Hi!
> >
> > With 3.2-rc4 (probably earlier), Ext4 seems to remember what areas it
> > trimmed:
> >
> > merkaba:~> fstrim -v /boot
> > /boot: 224657408 bytes were trimmed
> > merkaba:~> fstrim -v /boot
> > /boot: 0 bytes were trimmed
> >
> >
> > But BTRFS does not:
> >
> > merkaba:~> fstrim -v /
> > /: 4431613952 bytes were trimmed
> > merkaba:~> fstrim -v /
> > /: 4341846016 bytes were trimmed
> >
> >
> > Is it planned to add this feature to BTRFS as well?
>
> There's no such plan, but it's do-able, and I can take care of it.
> There's an issue though.
>
> Do we want to store TRIMMED information on disk? ext4 doesn't
> do this, so the first fstrim will be slow even though you've done fstrim
> in a previous mount.
>
> For btrfs this issue can't be solved without a disk format change that
> will break older kernels, but only 3.2-rcX kernels will be affected if
> we push the following change into mainline before the 3.2 release.
I can't comment on the disk format change. But if it is accepted, I can
give your patchset a spin before the 3.3 merge window. Tell me when you'd
like that.
If not, then AFAIK there is another disk format change necessary to raise
the hard link limit. So maybe it makes sense to combine both disk format
changes in some future kernel. Better an early one, before adoption rises
even more.
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: fstrim on BTRFS
2011-12-29 9:52 ` Fajar A. Nugraha
@ 2011-12-30 6:19 ` Li Zefan
2011-12-30 6:35 ` Fajar A. Nugraha
0 siblings, 1 reply; 15+ messages in thread
From: Li Zefan @ 2011-12-30 6:19 UTC (permalink / raw)
To: Fajar A. Nugraha; +Cc: linux-btrfs
Fajar A. Nugraha wrote:
> On Thu, Dec 29, 2011 at 4:39 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>> Fajar A. Nugraha wrote:
>>> On Wed, Dec 28, 2011 at 11:57 PM, Martin Steigerwald
>>> <Martin@lichtvoll.de> wrote:
>>>> But BTRFS does not:
>>>>
>>>> merkaba:~> fstrim -v /
>>>> /: 4431613952 bytes were trimmed
>>>> merkaba:~> fstrim -v /
>>>> /: 4341846016 bytes were trimmed
>>>
>>> .... and apparently it can't trim everything. Or maybe my kernel is
>>> just too old.
>>>
>>>
>>> $ sudo fstrim -v /
>>> 2258165760 Bytes was trimmed
>>>
>>> $ df -h /
>>> Filesystem Size Used Avail Use% Mounted on
>>> /dev/sda6 50G 34G 12G 75% /
>>>
>>> $ mount | grep "/ "
>>> /dev/sda6 on / type btrfs (rw,noatime,subvolid=258,compress-force=lzo)
>>>
>>> so only about 2G out of 12G can be trimmed. This is on kernel 3.1.4.
>>>
>>
>> That's because only free space in block groups will be trimmed. Btrfs
>> allocates space from block groups, and when there's no space available,
>> it will allocate a new block group from the pool. In your case there's
>> ~10G in the pool.
>
> Thanks for your response.
>
>>
>> You can do a "btrfs fi df /", and you'll see the total size of existing
>> block groups.
>
> $ sudo btrfs fi df /
> Data: total=43.47GB, used=31.88GB
> System, DUP: total=8.00MB, used=12.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=3.25GB, used=619.88MB
This is DUP, so the actual physical size is (3.25 * 2) = 6.5G
> Metadata: total=8.00MB, used=0.00
>
> That should mean existing block groups total at least 46GB, right? In
so the sum is 50G.
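(Spelled out with the numbers quoted above: 43.47 + 2*3.25 + 2*0.008 +
0.004 + 0.008 comes to roughly 50.0 GB, i.e. essentially the whole 50G
partition is already allocated to block groups.)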
> which case my pool (a 50G partition) should only have about 4GB of
> space not allocated to block groups. The numbers don't seem to match.
>
The pool has been emptied, so there must be another reason you had only
~2GB trimmed, and the likely one is that fstrim in btrfs is buggy.
I sent a fix weeks ago, which is not merged yet:
http://marc.info/?l=linux-btrfs&m=132212530410572&w=2
>>
>> You can empty the pool by:
>>
>> # dd if=/dev/zero of=/mytmpfile bs=1M
>>
>> Then release the space (but it won't return back to the pool):
>>
>> # rm /mytmpfile
>> # sync
>
> Is there a bad side effect of doing so? For example, since all free
> space in the pool would be allocated to data block group, would that
> mean my metadata block group is capped at 3.25GB?
You can configure the ratio of data block groups to metadata block groups
via the "metadata_ratio=" mount option.
> Or could some data
> block groups be converted to metadata, and vice versa?
>
This won't happen. Also, empty block groups won't be reclaimed yet, but
that's on the TODO list.
* Re: fstrim on BTRFS
2011-12-30 6:19 ` Li Zefan
@ 2011-12-30 6:35 ` Fajar A. Nugraha
0 siblings, 0 replies; 15+ messages in thread
From: Fajar A. Nugraha @ 2011-12-30 6:35 UTC (permalink / raw)
To: Li Zefan; +Cc: linux-btrfs
On Fri, Dec 30, 2011 at 1:19 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>> Or would some data
>> block group can be converted to metadata, and vice versa?
>>
>
> This won't happen. Also empty block groups won't be reclaimed, but it's
> in TODO list.
Ah, OK.
6G for metadata out of 50G total seems a bit much, but I can live with
it for now.
Thanks,
Fajar
* Re: fstrim on BTRFS
2011-12-29 4:02 ` Li Zefan
2011-12-29 4:21 ` Fajar A. Nugraha
2011-12-29 10:52 ` Martin Steigerwald
@ 2012-01-03 21:05 ` Chris Mason
2 siblings, 0 replies; 15+ messages in thread
From: Chris Mason @ 2012-01-03 21:05 UTC (permalink / raw)
To: Li Zefan; +Cc: Martin Steigerwald, linux-btrfs
On Thu, Dec 29, 2011 at 12:02:48PM +0800, Li Zefan wrote:
> Martin Steigerwald wrote:
> > Hi!
> >
> > With 3.2-rc4 (probably earlier), Ext4 seems to remember what areas it
> > trimmed:
> >
> > merkaba:~> fstrim -v /boot
> > /boot: 224657408 bytes were trimmed
> > merkaba:~> fstrim -v /boot
> > /boot: 0 bytes were trimmed
> >
> >
> > But BTRFS does not:
> >
> > merkaba:~> fstrim -v /
> > /: 4431613952 bytes were trimmed
> > merkaba:~> fstrim -v /
> > /: 4341846016 bytes were trimmed
> >
> >
> > Is it planned to add this feature to BTRFS as well?
> >
>
> There's no such plan, but it's do-able, and I can take care of it.
> There's an issue though.
>
> Do we want to store TRIMMED information on disk? ext4 doesn't
> do this, so the first fstrim will be slow even though you've done fstrim
> in a previous mount.
I'd rather not store the trim status on disk. The extra trims
don't have a huge cost, and since some devices have a large granularity
for trims, they may ignore the trim until it tosses a larger contiguous
area of the disk.
I'd be fine with a flag to the in-memory free extent struct that
indicates if it has been trimmed down to the device.
-chris
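A minimal sketch of what such an in-memory flag could look like; the struct,
field names and layout below are illustrative only, not the actual
btrfs_free_space definition:
/* Illustration only, not btrfs code: an in-memory free-extent record
 * carrying a per-range "trimmed" flag. */
#include <stdbool.h>
#include <stdint.h>
struct free_extent_sketch {
        uint64_t offset;   /* start of the free range */
        uint64_t bytes;    /* length of the free range */
        bool     trimmed;  /* a discard for this range already reached the
                              device; cleared when the range is reused */
};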
* fstrim on BTRFS
@ 2014-10-31 0:21 Noah Massey
0 siblings, 0 replies; 15+ messages in thread
From: Noah Massey @ 2014-10-31 0:21 UTC (permalink / raw)
To: linux-btrfs
Hello,
I am looking for some clarification on TRIM / SSD maintenance.
The wiki [1] suggests periodic fstrim, but fstrim does not discard
unallocated blocks[2].
Which makes sense, given that mkfs issues a device wide trim, so they
shouldn't have data.
But it seems like both a balance and a pending patch
(47ab2a6 "Btrfs: remove empty block groups automatically")
can deallocate block groups without TRIM, leading to the SSD retaining
data it doesn't need to.
Is there a better way to trigger a more thorough discard than
fallocate, rm, fstrim, balance -dusage=0?
And are there plans to support trimming unallocated space, or is this
not possible with the current FS format?
Thank you,
Noah
[1] https://btrfs.wiki.kernel.org/index.php/FAQ#Does_Btrfs_support_TRIM.2Fdiscard.3F
[2] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg14195.html