public inbox for linux-bcachefs@vger.kernel.org
From: Ahmad Draidi <a.r.draidi@redscript.org>
To: Kent Overstreet <kent.overstreet@linux.dev>,
	linux-bcachefs@vger.kernel.org
Cc: syzbot+7bf808f7fe4a6549f36e@syzkaller.appspotmail.com
Subject: Re: [PATCH] bcachefs: Allocator now directly wakes up copygc when necessary
Date: Thu, 24 Oct 2024 07:46:07 +0400
Message-ID: <92dce846-d110-4c97-afd1-0b198c1fdf4d@redscript.org>
In-Reply-To: <20241019215605.160125-1-kent.overstreet@linux.dev>

Greetings,


On 10/20/24 01:56, Kent Overstreet wrote:
> copygc tries to wait in a way that balances waiting for work to
> accumulate with running before we run out of free space - but for a
> variety of reasons (multiple devices, io clock slop, the vagaries of
> fragmentation) this isn't completely reliable.
>
> So to avoid getting stuck, add direct wakeups from the allocator to the
> copygc thread when we start to notice we're low on free buckets.

Since I switched to 6.11.x from 6.10.x, I've been getting "Allocator
stuck? Waited for 30 seconds" messages, and I/O to the FS stops. Reads,
for example, don't time out; they just hang for hours until I reboot. I
can quickly and reliably trigger this with my workload.


I applied this patch on top of 6.11.4 but still see "Allocator stuck"
in dmesg. I see the following both before and after the patch:

"BUG: unable to handle page fault for address: fffffffffffff81b
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page"

...

"RIP: 0010:bch2_btree_path_peek_slot+0x64/0x210 [bcachefs]"


A longer log snippet showing the "Allocator stuck" messages and the
above is at: https://pastebin.com/ptuzaryi


I ran fsck after the FS got stuck; errors were found and fixed, but the
issue recurs, both before and after the patch.

Some info that might be needed: I'm using ECC RAM, 2x SAS SSDs, 2x SATA 
HDDs, LUKS, and the following opts:

starting version 1.12: rebalance_work_acct_fix opts=metadata_replicas=2,data_replicas=2,metadata_replicas_required=2,data_replicas_required=2,metadata_checksum=xxhash,data_checksum=xxhash,compression=lz4,background_compression=gzip,metadata_target=ssd,foreground_target=ssd,background_target=hdd,promote_target=ssd


Let me know if I can help.


Thanks!

Ahmad



Thread overview: 3+ messages
2024-10-19 21:56 [PATCH] bcachefs: Allocator now directly wakes up copygc when necessary Kent Overstreet
2024-10-24  3:46 ` Ahmad Draidi [this message]
2024-12-03  6:06   ` Ahmad Draidi
