public inbox for linux-bcachefs@vger.kernel.org
From: Ahmad Draidi <a.r.draidi@redscript.org>
To: linux-bcachefs@vger.kernel.org
Cc: syzbot+7bf808f7fe4a6549f36e@syzkaller.appspotmail.com
Subject: Re: [PATCH] bcachefs: Allocator now directly wakes up copygc when necessary
Date: Tue, 3 Dec 2024 10:06:29 +0400
Message-ID: <eaa30a2b-3d96-4249-983b-79cb0348d16d@redscript.org>
In-Reply-To: <92dce846-d110-4c97-afd1-0b198c1fdf4d@redscript.org>

Hello,


On 10/24/24 07:46, Ahmad Draidi wrote:
> Greetings,
>
>
> On 10/20/24 01:56, Kent Overstreet wrote:
>> copygc tries to wait in a way that balances waiting for work to
>> accumulate with running before we run out of free space - but for a
>> variety of reasons (multiple devices, io clock slop, the vagaries of
>> fragmentation) this isn't completely reliable.
>>
>> So to avoid getting stuck, add direct wakeups from the allocator to the
>> copygc thread when we start to notice we're low on free buckets.
>
> Since I switched from 6.10.x to 6.11.x, I've been getting "Allocator stuck?
> Waited for 30 seconds" messages, and I/O to the FS stops. Reads, for
> example, do not time out; they just hang for hours until I
> reboot. I can trigger this quickly and reliably with my workload.
>
>
> I applied this patch on top of 6.11.4 but still see "Allocator
> stuck" in dmesg. I see the following both before and after the patch:
>
> "BUG: unable to handle page fault for address: fffffffffffff81b
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page"
>
> ...
>
> "RIP: 0010:bch2_btree_path_peek_slot+0x64/0x210 [bcachefs]"
>
>
> A longer log snippet of "allocator stuck" and the above are at: 
> https://pastebin.com/ptuzaryi

Just a quick update for anyone reading this. The issue is solved for me 
after upgrading to 6.12.1.


>
>
> I ran fsck after the FS got stuck; errors were found and fixed, but the
> issue still recurs, both before and after the patch.
>
> Some info that might be needed: I'm using ECC RAM, 2x SAS SSDs, 2x 
> SATA HDDs, LUKS, and the following opts:
>
> starting version 1.12: rebalance_work_acct_fix
> opts=metadata_replicas=2,data_replicas=2,metadata_replicas_required=2,data_replicas_required=2,metadata_checksum=xxhash,data_checksum=xxhash,compression=lz4,background_compression=gzip,metadata_target=ssd,foreground_target=ssd,background_target=hdd,promote_target=ssd
>
>
> Let me know if I can help.
>
>
> Thanks!
>
> Ahmad
>
>


Thread overview: 3+ messages
2024-10-19 21:56 [PATCH] bcachefs: Allocator now directly wakes up copygc when necessary Kent Overstreet
2024-10-24  3:46 ` Ahmad Draidi
2024-12-03  6:06   ` Ahmad Draidi [this message]
