* BTRFS w/ quotas hangs on read-write mount using all available RAM - rev2
@ 2022-10-09 11:03 admiral
2022-10-09 11:13 ` Qu Wenruo
0 siblings, 1 reply; 8+ messages in thread
From: admiral @ 2022-10-09 11:03 UTC (permalink / raw)
To: linux-btrfs
Dear btrfs team,
thanks for all your great work!
I have been running btrfs for several years now and really like its
robustness and ease of use!
Last week I experienced almost exactly the same thing as described here by
Loren M. Lang:
https://www.spinics.net/lists/linux-btrfs/msg81173.html
The only difference: this is not my / but a 40TB storage volume mounted at
/media/btrfs1/
Quick summary of what happened:
- enabled quotas to better understand where all my space had gone
- started a balance
- the system got completely stuck, for the by now well-understood reasons
- pushed the reset button
I can mount my btrfs filesystem read-only without any problem and access the
data. As soon as I try to mount it rw, my system slows down extremely and
memory fills up until I finally end up with a kernel panic.
So booting works fine with the fstab entries set to ro or commented out.
admiral@server:/$ uname -a
Linux server.domain.loc 4.19.0-21-amd64 #1 SMP Debian 4.19.249-2
(2022-06-30) x86_64 GNU/Linux
admiral@server:/$ btrfs --version
btrfs-progs v5.10.1
Here is my question:
I am looking for an option to disable quotas on an unmounted btrfs, as
described here:
https://patchwork.kernel.org/project/linux-btrfs/patch/20180812013358.16431-1-wqu@suse.com/
All my trials and checks were performed with btrfs-progs v4.20.1-2, the
latest version in Debian buster:
https://packages.debian.org/de/buster/btrfs-progs
I have already upgraded btrfs-progs to the Debian backports version v5.10.1,
but I cannot find any option to disable quotas offline there either:
https://packages.debian.org/buster-backports/btrfs-progs
Can you point me in a direction for recovering the btrfs?
Thanks a lot,
admiralbulli
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: BTRFS w/ quotas hangs on read-write mount using all available RAM - rev2
2022-10-09 11:03 BTRFS w/ quotas hangs on read-write mount using all available RAM - rev2 admiral
@ 2022-10-09 11:13 ` Qu Wenruo
2022-10-09 11:37 ` Qu Wenruo
0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2022-10-09 11:13 UTC (permalink / raw)
To: admiral, linux-btrfs
On 2022/10/9 19:03, admiral@admiralbulli.de wrote:
> admiral@server:/$ uname -a
> Linux server.domain.loc 4.19.0-21-amd64 #1 SMP Debian 4.19.249-2
> (2022-06-30) x86_64 GNU/Linux
Your kernel is just one version too old...
In fact, in the v5.0 kernel we introduced a lot of qgroup optimizations to
address the slow performance (including hangs and huge memory usage) of
balance with qgroups enabled.
Although those optimizations also introduced some regressions, all the
known regressions should have been fixed and backported.
But older kernels, like your 4.x kernel, don't have the optimizations at
all.
Thus in your case, you may want to use at least the latest LTS kernel
(v5.15.x).
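A quick way to see which side of that line a machine is on: the minimal POSIX shell sketch below (assuming GNU coreutils `sort -V`) compares a `uname -r` string against the two versions named in this thread, v5.1 (where the qgroup balance optimizations landed, per the follow-up correction) and v5.15 (the suggested LTS). `version_ge` and `check_kernel` are hypothetical helper names, not btrfs-progs commands.

```shell
#!/bin/sh
# Sketch: classify a kernel release string against the versions named in
# this thread. Assumes GNU coreutils `sort -V`; helper names are hypothetical.

# version_ge A B: succeed if version A >= version B
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_kernel RELEASE: RELEASE is `uname -r` output, e.g. "4.19.0-21-amd64"
check_kernel() {
    kver=${1%%-*}    # drop the distro suffix: "4.19.0-21-amd64" -> "4.19.0"
    if ! version_ge "$kver" 5.1; then
        echo "$kver: predates the v5.1 qgroup balance optimizations"
    elif ! version_ge "$kver" 5.15; then
        echo "$kver: has the optimizations but is older than 5.15 LTS"
    else
        echo "$kver: OK"
    fi
}

check_kernel "$(uname -r)"
```

On the Debian buster kernel from the report, `check_kernel 4.19.0-21-amd64` prints `4.19.0: predates the v5.1 qgroup balance optimizations`. Distribution kernels may backport fixes, so treat this only as a rough first check.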
Thanks,
Qu
* Re: BTRFS w/ quotas hangs on read-write mount using all available RAM - rev2
2022-10-09 11:13 ` Qu Wenruo
@ 2022-10-09 11:37 ` Qu Wenruo
2022-10-10 21:55 ` admiral
0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2022-10-09 11:37 UTC (permalink / raw)
To: admiral, linux-btrfs
On 2022/10/9 19:13, Qu Wenruo wrote:
> On 2022/10/9 19:03, admiral@admiralbulli.de wrote:
>> admiral@server:/$ uname -a
>> Linux server.domain.loc 4.19.0-21-amd64 #1 SMP Debian 4.19.249-2
>> (2022-06-30) x86_64 GNU/Linux
>
> Your kernel is just one version too old...
My bad, two versions too old.
>
> In fact, v5.0 kernel we have introduced a lot of qgroup optimization to
Per git describe --contains, the optimization actually landed in v5.1.
> address the slow performance (including hang, huge memory usage) of
> balance with qgroup enabled.
* Re: BTRFS w/ quotas hangs on read-write mount using all available RAM - rev2
2022-10-09 11:37 ` Qu Wenruo
@ 2022-10-10 21:55 ` admiral
0 siblings, 0 replies; 8+ messages in thread
From: admiral @ 2022-10-10 21:55 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
Dear Qu,
thank you so much for your quick and well-directed feedback.
I really appreciate this!
Thanks,
admiralbulli
P.S.:
For the sake of simplicity, I booted into an Ubuntu 22.04 (kernel
5.15) live system.
Mounting the btrfs rw worked like a charm.
Disabled quotas:
btrfs quota disable /mymount
Did some cleanup:
btrfs balance start -dusage=5 /mymount
btrfs balance start -musage=20 /mymount
btrfs scrub start -BdR /mymount
Rebooted into the old Debian system.
Able to mount everything rw again.
THANKS AGAIN!
* Re: BTRFS w/ quotas hangs on read-write mount using all available RAM - rev2
@ 2024-05-05 3:55 O'Brien Dave
2024-05-05 6:09 ` Qu Wenruo
0 siblings, 1 reply; 8+ messages in thread
From: O'Brien Dave @ 2024-05-05 3:55 UTC (permalink / raw)
To: linux-btrfs
Dear BTRFS team,
I've run into a hang similar to the one described in the previous email in this thread: https://lore.kernel.org/linux-btrfs/133101d8dbce$c666a030$5333e090$@admiralbulli.de/
The situation:
/home is on 2x 6TB HDDs in btrfs with RAID0 data and RAID1 metadata. A nightly cron job takes a daily snapshot, so there are about 1000 snapshots on it. (/ is on a separate SSD.)
To see where all the space was going, I enabled quotas with `btrfs quota enable /home`, and the initial qgroup scan started. When it was nearly complete, I deleted one of the subvolumes with `btrfs subvol delete /home/BACKUP....` (one of the earlier backups, about 117MB exclusive according to the qgroup), realised it would take a while to complete, and left it alone.
Later that same day there was a power outage. When I restarted the box, everything came up as normal, but a `btrfs-cleaner` process started that eventually consumed all memory (32GB) and made the machine unresponsive. I tried to disable quotas with `btrfs quota disable /home` while this was happening, but the command never returned.
I rebooted into single-user mode with `/home` unmounted, set up 128GB of swap on a USB 3.0 flash drive, then ran `btrfs check -p -Q /home`. It took 75 hours, used at most about 80GB of RAM+swap, and reported no errors. I tried to mount the filesystem normally again, and once more `btrfs-cleaner` spun up, took all memory and made everything unresponsive, with constant OOM kills of all processes, until the system eventually crashed. It didn't use the swap much, which might be relevant. Throughout, `btrfs-orphan-cleanup-progress` reported one orphan still to be deleted, corresponding to the snapshot I deleted, and it never went away.
`btrfs qgroup show /home` shows the deleted subvol as <stale>.
I can mount the volume read-only, and with `ro,rescue=all`, without any drama, and nothing unusual appears in the system logs, but mounting with `defaults` eventually crashes the machine as described above.
I cannot run `btrfs quota disable /home`, as the command doesn't return and the system eventually locks up when mounted RW.
My current kernel is 6.8.7-fc200, which should include all of the optimisations discussed earlier in this thread. The filesystem is about 3 years old (2021/04). I don't remember which kernel was running then, but it should have been at least 5.8 according to https://en.wikipedia.org/wiki/Fedora_Linux_release_history.
Is there a way to disable quotas with the device unmounted? (I don't really need that information, and I can always rescan later.) I made a start at patching the `disable-quota` command into btrfs-progs, but when run it reports an open transaction.
Any advice on how to proceed (apart from backing up everything, of course)?
thanks and regards,
dave
* Re: BTRFS w/ quotas hangs on read-write mount using all available RAM - rev2
2024-05-05 3:55 O'Brien Dave
@ 2024-05-05 6:09 ` Qu Wenruo
0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2024-05-05 6:09 UTC (permalink / raw)
To: O'Brien Dave, linux-btrfs
On 2024/5/5 13:25, O'Brien Dave wrote:
> Dear BTRFS team,
>
> I’ve had a weird hanging situation as described by the previous email in this chain: https://lore.kernel.org/linux-btrfs/133101d8dbce$c666a030$5333e090$@admiralbulli.de/
> The situation:
> I have /home as 2x6TB hdd in BTRFS Raid0/data, RAID1/MData. I make a daily snapshot by cronjob overnight, so there's about 1000 snapshots on it. (/ is on a separated ssd)
>
> To see where all the space was going, I enabled quotas: `btrfs quota enable /home`, and it started doing its thing. When it was nearly complete, I deleted one of the subvols with `btrfs subvol delete /home/BACKUP....` (one of the earlier backups, about 117MB exclusive, according to the qgroup), and realised it would take a while to complete, so I left it alone.
Deleting a snapshot is extremely qgroup-heavy: all involved data extents
must be re-marked for qgroup to rescan, and furthermore that rescan has to
be done in just one transaction, which is likely to hang the whole system.
>
> Later that same day, there was a power outage, and when I restarted the box, everything came up as normal, but a `btrfs-cleaner` process started that eventually took all of memory (32GB) and then eventually made the machine non-responsive. I tried to disable the quotas with `btrfs quota disable /home` while this was happening, but the command didn't return.
That's the same thing: the cleaner is still dropping the same subvolume.
And unfortunately there is no proper way to handle it without marking
qgroups inconsistent.
So the only way out of this situation is the newer sysfs interface
"/sys/fs/btrfs/<uuid>/qgroups/drop_subtree_threshold".
A lower value like 2 or 3 should be good enough to address the
situation: it automatically marks qgroups inconsistent if a large enough
subtree is dropped.
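The sysfs change above can be wrapped in a small shell function. This is a minimal sketch assuming quotas are still enabled (so the `qgroups/` directory exists) and the running kernel provides `drop_subtree_threshold`; `set_drop_threshold` and the `SYSFS_BTRFS` override (there only so the function can be tried against a scratch directory) are hypothetical.

```shell
#!/bin/sh
# Sketch: write qgroups/drop_subtree_threshold for one btrfs filesystem.
# Layout from the thread: /sys/fs/btrfs/<uuid>/qgroups/drop_subtree_threshold
# SYSFS_BTRFS is a hypothetical override for testing against a scratch dir.
SYSFS_BTRFS=${SYSFS_BTRFS:-/sys/fs/btrfs}

# set_drop_threshold UUID LEVEL
set_drop_threshold() {
    knob="$SYSFS_BTRFS/$1/qgroups/drop_subtree_threshold"
    if [ ! -w "$knob" ]; then
        echo "cannot write $knob (quotas disabled, kernel too old, or not root?)" >&2
        return 1
    fi
    echo "$2" > "$knob" || return 1
    # read the value back to confirm the kernel accepted it
    echo "drop_subtree_threshold is now $(cat "$knob")"
}
```

For example, as root: `set_drop_threshold "$(findmnt -n -o UUID /home)" 2`. This assumes the UUID reported by `findmnt` is the filesystem UUID that btrfs uses as its sysfs directory name; `btrfs filesystem show /home` shows the same UUID if in doubt.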
Thanks,
Qu
* Re: BTRFS w/ quotas hangs on read-write mount using all available RAM - rev2
@ 2024-05-07 13:43 O'Brien Dave
2024-05-07 20:44 ` Qu Wenruo
0 siblings, 1 reply; 8+ messages in thread
From: O'Brien Dave @ 2024-05-07 13:43 UTC (permalink / raw)
To: quwenruo.btrfs; +Cc: linux-btrfs
> So the only way to get rid of the situation is using the newer sysfs
> interface "/sys/fs/btrfs/<uuid>/qgroups/drop_subtree_threshold".
>
> Some lower value like 2 or 3 would be good enough to address the
> situation, which would automatically change qgroup to inconsistent if a
> larger enough subtree is dropped.
Setting the threshold to 2 or 3 didn't work (the machine ran until OOM failure in both cases), but setting it to 1 or 0 did. (I'm not sure which one fixed it: I set it to 1, then 0, there was a flurry of disk activity, and the qgroups were immediately marked as inconsistent.)
So, after rebooting into single-user mode with /home mounted ro:
$ vim /etc/fstab # to change /home back to the defaults
$ mount /home
$ echo "0" >/sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold
$ cat /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold # to check
$ btrfs qgroup show -pcre /home
$ btrfs quota disable /home
Thanks for your help!
* Re: BTRFS w/ quotas hangs on read-write mount using all available RAM - rev2
2024-05-07 13:43 O'Brien Dave
@ 2024-05-07 20:44 ` Qu Wenruo
0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2024-05-07 20:44 UTC (permalink / raw)
To: O'Brien Dave; +Cc: linux-btrfs
On 2024/5/7 23:13, O'Brien Dave wrote:
>> So the only way to get rid of the situation is using the newer sysfs
>> interface "/sys/fs/btrfs/<uuid>/qgroups/drop_subtree_threshold".
>>
>> Some lower value like 2 or 3 would be good enough to address the
>> situation, which would automatically change qgroup to inconsistent if a
>> larger enough subtree is dropped.
>
> Setting the threshold to 2 or 3 didn't work - the machine ran until OOM failure in both cases - but what did work was setting it to 1 or 0. (I’m not sure which fixed it, as I set it to 1, then 0, there was a flurry of disk activity and the qgroups were immediately marked as inconsistent.)
>
> So, after rebooting into single user mode with /home in ro:
>
> $ vim /etc/fstab # to change /home back to the defaults
> $ mount /home
I guess there is some timing problem involved.
Normally a subtree with level 2 or 3 isn't that large (hundreds of
extents), and should not cause a huge problem.
But it's possible that some huge subtree was already queued for scanning,
so by the time drop_subtree_threshold was set, it was too late.
So I'd recommend mounting it RO first, setting the value, then remounting
it RW, so that no huge subtree can be queued before the value is set.
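The order recommended above (mount read-only, set the threshold, then remount read-write) can be sketched as one shell function. It assumes the mount point has an /etc/fstab entry, is not already mounted, and that the `qgroups/` sysfs directory is already present during the read-only mount; `recover_mount` is a hypothetical name, and the default level of 1 simply follows the report above rather than any official guidance.

```shell
#!/bin/sh
# Sketch: mount RO, lower drop_subtree_threshold, then remount RW, so that no
# large subtree drop is queued before the threshold takes effect.
# Run as root. Assumes MNT has an /etc/fstab entry and is not mounted yet.

# recover_mount MNT [LEVEL]
recover_mount() {
    mnt=$1
    level=${2:-1}    # report above: 2 and 3 were not enough, 1 then 0 worked
    mount -o ro "$mnt" || return 1
    # filesystem UUID, assumed to match the /sys/fs/btrfs/<uuid> directory
    uuid=$(findmnt -n -o UUID "$mnt") || return 1
    [ -n "$uuid" ] || return 1
    # if this write fails, the filesystem simply stays mounted read-only
    echo "$level" > "/sys/fs/btrfs/$uuid/qgroups/drop_subtree_threshold" || return 1
    mount -o remount,rw "$mnt" || return 1
    echo "$mnt remounted rw with drop_subtree_threshold=$level"
}
```

For example, as root: `recover_mount /home 1`, then optionally `btrfs quota disable /home` as in the report above.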
Anyway, glad to help. I really believe we need a more persistent way to
set this option, so that everyone can avoid such inconvenience.
Thanks,
Qu
Thread overview: 8+ messages
2022-10-09 11:03 BTRFS w/ quotas hangs on read-write mount using all available RAM - rev2 admiral
2022-10-09 11:13 ` Qu Wenruo
2022-10-09 11:37 ` Qu Wenruo
2022-10-10 21:55 ` admiral
-- strict thread matches above, loose matches on Subject: below --
2024-05-05 3:55 O'Brien Dave
2024-05-05 6:09 ` Qu Wenruo
2024-05-07 13:43 O'Brien Dave
2024-05-07 20:44 ` Qu Wenruo