* [PATCH RFC 0/2] btrfs: make ticket wait uninterruptible to address unexpected RO during balance @ 2020-07-06 7:44 Qu Wenruo 2020-07-06 7:44 ` [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance Qu Wenruo 2020-07-06 7:44 ` [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting Qu Wenruo 0 siblings, 2 replies; 12+ messages in thread From: Qu Wenruo @ 2020-07-06 7:44 UTC (permalink / raw) To: linux-btrfs There is a report that, unlucky signal timing during balance can cause btrfs to remounted into RO mode. This is caused by the fact that, most btrfs_start_transaction() or delalloc metadata reserve are interruptible. That would return -EINTR to a lot of critical code section, and under most case, our way to handle such error is just to abort transaction, without any consideration for -EINTR. This is never a good idea to allow random Ctrl-C to make btrfs RO, even if the window is pretty small for regular operations. This patchset will address it in a different direction, since most operations are pretty fast, we don't need that signal check in waiting ticket. For those long running operations, signal should be checked in their call sites. E.g. __generic_block_fiemap() calls fatal_signal_pending() to check if it needs to exit, so does btrfs_clone(). We shouldn't check the signal, and just throw a -EINTR for all ticketing system callers, they don't really want to handle that damn -EINTR. Only long executing operations really need that signal checking, and let them to check, not the infrastructure. Reason for RFC: I'm not yet completely sure if uninterruptible ticketing system would cause extra problems. Any advice on that would be great. Qu Wenruo (2): btrfs: relocation: Allow signal to cancel balance btrfs: space-info: Don't allow signal to interrupt ticket waiting fs/btrfs/relocation.c | 3 ++- fs/btrfs/space-info.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) -- 2.27.0 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance 2020-07-06 7:44 [PATCH RFC 0/2] btrfs: make ticket wait uninterruptible to address unexpected RO during balance Qu Wenruo @ 2020-07-06 7:44 ` Qu Wenruo 2020-07-06 13:45 ` Josef Bacik 2020-07-06 18:19 ` Hans van Kranenburg 2020-07-06 7:44 ` [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting Qu Wenruo 1 sibling, 2 replies; 12+ messages in thread From: Qu Wenruo @ 2020-07-06 7:44 UTC (permalink / raw) To: linux-btrfs Although btrfs balance can be canceled with "btrfs balance cancel" command, it's still almost muscle memory to press Ctrl-C to cancel a long running btrfs balance. So allow btrfs balance to check signal to determine if it should exit. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/relocation.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 523d2e5fab8f..2b869fb2e62c 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2656,7 +2656,8 @@ int setup_extent_mapping(struct inode *inode, u64 start, u64 end, */ int btrfs_should_cancel_balance(struct btrfs_fs_info *fs_info) { - return atomic_read(&fs_info->balance_cancel_req); + return atomic_read(&fs_info->balance_cancel_req) || + fatal_signal_pending(current); } ALLOW_ERROR_INJECTION(btrfs_should_cancel_balance, TRUE); -- 2.27.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance 2020-07-06 7:44 ` [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance Qu Wenruo @ 2020-07-06 13:45 ` Josef Bacik 2020-07-06 18:19 ` Hans van Kranenburg 1 sibling, 0 replies; 12+ messages in thread From: Josef Bacik @ 2020-07-06 13:45 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs On 7/6/20 3:44 AM, Qu Wenruo wrote: > Although btrfs balance can be canceled with "btrfs balance cancel" > command, it's still almost muscle memory to press Ctrl-C to cancel a > long running btrfs balance. > > So allow btrfs balance to check signal to determine if it should exit. > > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance 2020-07-06 7:44 ` [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance Qu Wenruo 2020-07-06 13:45 ` Josef Bacik @ 2020-07-06 18:19 ` Hans van Kranenburg 2020-07-06 22:43 ` Qu Wenruo 1 sibling, 1 reply; 12+ messages in thread From: Hans van Kranenburg @ 2020-07-06 18:19 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs Hi, On 7/6/20 9:44 AM, Qu Wenruo wrote: > Although btrfs balance can be canceled with "btrfs balance cancel" > command, it's still almost muscle memory to press Ctrl-C to cancel a > long running btrfs balance. Thanks for investigating all of this. Has it actually been unsafe (read: undefined behaviour) forever, or only since some recent change? Or did it just by accident not cause real damage earlier on in the past? [ 1041.810963] BTRFS info (device xvdb): relocating block group 91423244288 flags metadata <- ^C made it stop here [ 1076.189766] BTRFS info (device xvdb): relocating block group 91423244288 flags metadata [ 1081.300131] BTRFS info (device xvdb): found 6689 extents [ 1081.311138] BTRFS info (device xvdb): relocating block group 90349502464 flags data [ 1089.776066] BTRFS info (device xvdb): found 215 extents The above is with 4.19.118. Now I'm trying this again and looking just a little better at the logging, I see that what I thought (it always stopped after doing 1 block group) is not true. With ^C I can make it stop halfway and then next time it again starts at 91423244288. Related question: should, additionally, the btrfs balance in progs also be changed to catch the SIGINT while it's doing the balance ioctl, to prevent the signal from leaking to the kernel space? I mean, kernels with the bug are already 'out there' now... Thanks > So allow btrfs balance to check signal to determine if it should exit. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/relocation.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index 523d2e5fab8f..2b869fb2e62c 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -2656,7 +2656,8 @@ int setup_extent_mapping(struct inode *inode, u64 start, u64 end, > */ > int btrfs_should_cancel_balance(struct btrfs_fs_info *fs_info) > { > - return atomic_read(&fs_info->balance_cancel_req); > + return atomic_read(&fs_info->balance_cancel_req) || > + fatal_signal_pending(current); > } > ALLOW_ERROR_INJECTION(btrfs_should_cancel_balance, TRUE); > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance 2020-07-06 18:19 ` Hans van Kranenburg @ 2020-07-06 22:43 ` Qu Wenruo 0 siblings, 0 replies; 12+ messages in thread From: Qu Wenruo @ 2020-07-06 22:43 UTC (permalink / raw) To: Hans van Kranenburg, Qu Wenruo, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 2916 bytes --] On 2020/7/7 上午2:19, Hans van Kranenburg wrote: > Hi, > > On 7/6/20 9:44 AM, Qu Wenruo wrote: >> Although btrfs balance can be canceled with "btrfs balance cancel" >> command, it's still almost muscle memory to press Ctrl-C to cancel a >> long running btrfs balance. > > Thanks for investigating all of this. > > Has it actually been unsafe (read: undefined behaviour) forever, or only > since some recent change? Forever. That -EINTR from metadata reservation path is there for a long long time. > > Or did it just by accident not cause real damage earlier on in the past? > > [ 1041.810963] BTRFS info (device xvdb): relocating block group > 91423244288 flags metadata > > <- ^C made it stop here > > [ 1076.189766] BTRFS info (device xvdb): relocating block group > 91423244288 flags metadata > [ 1081.300131] BTRFS info (device xvdb): found 6689 extents > [ 1081.311138] BTRFS info (device xvdb): relocating block group > 90349502464 flags data > [ 1089.776066] BTRFS info (device xvdb): found 215 extents > > The above is with 4.19.118. Now I'm trying this again and looking just a > little better at the logging, I see that what I thought (it always > stopped after doing 1 block group) is not true. With ^C I can make it > stop halfway and then next time it again starts at 91423244288. > > Related question: should, additionally, the btrfs balance in progs also > be changed to catch the SIGINT while it's doing the balance ioctl, to > prevent the signal from leaking to the kernel space? I mean, kernels > with the bug are already 'out there' now... It has no way to catch signal while trapped into kernel space. My previous assumption of the whole ioctl thing is still right, when we're in kernel space, we can't catch any signal. It's just the metadata reservation code manually checking the pending signal and return -EINTR. So even if we make btrfs-progs to catch that signal, it won't work. And even if it seems to work, it's because in btrfs module we're checking signal explicitly. Thanks, Qu > > Thanks > >> So allow btrfs balance to check signal to determine if it should exit. >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/relocation.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c >> index 523d2e5fab8f..2b869fb2e62c 100644 >> --- a/fs/btrfs/relocation.c >> +++ b/fs/btrfs/relocation.c >> @@ -2656,7 +2656,8 @@ int setup_extent_mapping(struct inode *inode, u64 start, u64 end, >> */ >> int btrfs_should_cancel_balance(struct btrfs_fs_info *fs_info) >> { >> - return atomic_read(&fs_info->balance_cancel_req); >> + return atomic_read(&fs_info->balance_cancel_req) || >> + fatal_signal_pending(current); >> } >> ALLOW_ERROR_INJECTION(btrfs_should_cancel_balance, TRUE); >> >> > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting 2020-07-06 7:44 [PATCH RFC 0/2] btrfs: make ticket wait uninterruptible to address unexpected RO during balance Qu Wenruo 2020-07-06 7:44 ` [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance Qu Wenruo @ 2020-07-06 7:44 ` Qu Wenruo 2020-07-06 13:45 ` Josef Bacik 1 sibling, 1 reply; 12+ messages in thread From: Qu Wenruo @ 2020-07-06 7:44 UTC (permalink / raw) To: linux-btrfs; +Cc: Hans van Kranenburg [BUG] When balance receive a fatal signal, it can make the fs to read-only mode if the timing is unlucky enough: BTRFS info (device xvdb): balance: start -d -m -s BTRFS info (device xvdb): relocating block group 73001861120 flags metadata BTRFS info (device xvdb): found 12236 extents, stage: move data extents BTRFS info (device xvdb): relocating block group 71928119296 flags data BTRFS info (device xvdb): found 3 extents, stage: move data extents BTRFS info (device xvdb): found 3 extents, stage: update data pointers BTRFS info (device xvdb): relocating block group 60922265600 flags metadata BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 unknown BTRFS info (device xvdb): forced readonly BTRFS info (device xvdb): balance: ended with status: -4 [CAUSE] This is caused by the fact that btrfs ticketing space system can be interrupted, and cause all kind of -EINTR returned to various critical section, where we never thought of -EINTR at all. Even for things like btrfs_start_transaction() can be affected by signal: btrfs_start_transaction() |- start_transaction(flush = FLUSH_ALL) |- btrfs_block_rsv_add() |- btrfs_reserve_metadata_bytes() |- __reserve_metadata_bytes() |- handle_reserve_ticket() |- wait_reserve_ticket() |- prepare_to_wait_event(TASK_KILLABLE) |- ticket->error = -EINTR; And all related callers get -EINTR error. In fact, there are really very limited call sites can really handle that -EINTR properly, above btrfs_drop_snapshot() is one case. [FIX] Things like metadata allocation is really a critical section for btrfs, we don't really want it to be that killable by some impatient users. In fact, for really long duration calls, it should have their own checks on signal, like balance, reflink, generic fiemap calls. So this patch will make ticket waiting uninterruptible, relying on each long duration calls to handle their signals more properly. Reported-by: Hans van Kranenburg <hans@knorrie.org> Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/space-info.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index c7bd3fdd7792..c5cfc759b804 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -1099,7 +1099,8 @@ static void wait_reserve_ticket(struct btrfs_fs_info *fs_info, spin_lock(&space_info->lock); while (ticket->bytes > 0 && ticket->error == 0) { - ret = prepare_to_wait_event(&ticket->wait, &wait, TASK_KILLABLE); + ret = prepare_to_wait_event(&ticket->wait, &wait, + TASK_UNINTERRUPTIBLE); if (ret) { /* * Delete us from the list. After we unlock the space -- 2.27.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting 2020-07-06 7:44 ` [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting Qu Wenruo @ 2020-07-06 13:45 ` Josef Bacik 2020-07-06 13:50 ` Qu Wenruo 0 siblings, 1 reply; 12+ messages in thread From: Josef Bacik @ 2020-07-06 13:45 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs; +Cc: Hans van Kranenburg On 7/6/20 3:44 AM, Qu Wenruo wrote: > [BUG] > When balance receive a fatal signal, it can make the fs to read-only > mode if the timing is unlucky enough: > > BTRFS info (device xvdb): balance: start -d -m -s > BTRFS info (device xvdb): relocating block group 73001861120 flags metadata > BTRFS info (device xvdb): found 12236 extents, stage: move data extents > BTRFS info (device xvdb): relocating block group 71928119296 flags data > BTRFS info (device xvdb): found 3 extents, stage: move data extents > BTRFS info (device xvdb): found 3 extents, stage: update data pointers > BTRFS info (device xvdb): relocating block group 60922265600 flags metadata > BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 unknown > BTRFS info (device xvdb): forced readonly > BTRFS info (device xvdb): balance: ended with status: -4 > > [CAUSE] > This is caused by the fact that btrfs ticketing space system can be > interrupted, and cause all kind of -EINTR returned to various critical > section, where we never thought of -EINTR at all. > > Even for things like btrfs_start_transaction() can be affected by > signal: > btrfs_start_transaction() > |- start_transaction(flush = FLUSH_ALL) > |- btrfs_block_rsv_add() > |- btrfs_reserve_metadata_bytes() > |- __reserve_metadata_bytes() > |- handle_reserve_ticket() > |- wait_reserve_ticket() > |- prepare_to_wait_event(TASK_KILLABLE) > |- ticket->error = -EINTR; > > And all related callers get -EINTR error. > > In fact, there are really very limited call sites can really handle that > -EINTR properly, above btrfs_drop_snapshot() is one case. > > [FIX] > Things like metadata allocation is really a critical section for btrfs, > we don't really want it to be that killable by some impatient users. > > In fact, for really long duration calls, it should have their own checks > on signal, like balance, reflink, generic fiemap calls. > > So this patch will make ticket waiting uninterruptible, relying on each > long duration calls to handle their signals more properly. > Nope, everybody that calls start_transaction() should be able to handle -EINTR, so we need to find those callsites and fix them, not make it so we hang the box because we don't like fixing error paths. Thanks, Josef ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting 2020-07-06 13:45 ` Josef Bacik @ 2020-07-06 13:50 ` Qu Wenruo 2020-07-06 13:53 ` Josef Bacik 0 siblings, 1 reply; 12+ messages in thread From: Qu Wenruo @ 2020-07-06 13:50 UTC (permalink / raw) To: Josef Bacik, linux-btrfs; +Cc: Hans van Kranenburg On 2020/7/6 下午9:45, Josef Bacik wrote: > On 7/6/20 3:44 AM, Qu Wenruo wrote: >> [BUG] >> When balance receive a fatal signal, it can make the fs to read-only >> mode if the timing is unlucky enough: >> >> BTRFS info (device xvdb): balance: start -d -m -s >> BTRFS info (device xvdb): relocating block group 73001861120 flags >> metadata >> BTRFS info (device xvdb): found 12236 extents, stage: move data >> extents >> BTRFS info (device xvdb): relocating block group 71928119296 flags >> data >> BTRFS info (device xvdb): found 3 extents, stage: move data extents >> BTRFS info (device xvdb): found 3 extents, stage: update data pointers >> BTRFS info (device xvdb): relocating block group 60922265600 flags >> metadata >> BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 >> unknown >> BTRFS info (device xvdb): forced readonly >> BTRFS info (device xvdb): balance: ended with status: -4 >> >> [CAUSE] >> This is caused by the fact that btrfs ticketing space system can be >> interrupted, and cause all kind of -EINTR returned to various critical >> section, where we never thought of -EINTR at all. >> >> Even for things like btrfs_start_transaction() can be affected by >> signal: >> btrfs_start_transaction() >> |- start_transaction(flush = FLUSH_ALL) >> |- btrfs_block_rsv_add() >> |- btrfs_reserve_metadata_bytes() >> |- __reserve_metadata_bytes() >> |- handle_reserve_ticket() >> |- wait_reserve_ticket() >> |- prepare_to_wait_event(TASK_KILLABLE) >> |- ticket->error = -EINTR; >> >> And all related callers get -EINTR error. >> >> In fact, there are really very limited call sites can really handle that >> -EINTR properly, above btrfs_drop_snapshot() is one case. >> >> [FIX] >> Things like metadata allocation is really a critical section for btrfs, >> we don't really want it to be that killable by some impatient users. >> >> In fact, for really long duration calls, it should have their own checks >> on signal, like balance, reflink, generic fiemap calls. >> >> So this patch will make ticket waiting uninterruptible, relying on each >> long duration calls to handle their signals more properly. >> > > Nope, everybody that calls start_transaction() should be able to handle > -EINTR, so we need to find those callsites and fix them, not make it so > we hang the box because we don't like fixing error paths. Thanks, Then we also need to handle btrfs_delalloc_reserve_metadata(), btrfs_block_rsv_refill(), btrfs_use_block_rsv(), btrfs_block_rsv_add(). Are you really willing to go to that rabbit hole? To me, there are only limited call sites would benefit from signal checking, while we need to handle tons of unnecessary possible -EINTR errors just for no obvious benefits for other calls sites? That doesn't sound sane to me. Thanks, Qu > > Josef > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting 2020-07-06 13:50 ` Qu Wenruo @ 2020-07-06 13:53 ` Josef Bacik 2020-07-06 14:05 ` Qu Wenruo 0 siblings, 1 reply; 12+ messages in thread From: Josef Bacik @ 2020-07-06 13:53 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs; +Cc: Hans van Kranenburg On 7/6/20 9:50 AM, Qu Wenruo wrote: > > > On 2020/7/6 下午9:45, Josef Bacik wrote: >> On 7/6/20 3:44 AM, Qu Wenruo wrote: >>> [BUG] >>> When balance receive a fatal signal, it can make the fs to read-only >>> mode if the timing is unlucky enough: >>> >>> BTRFS info (device xvdb): balance: start -d -m -s >>> BTRFS info (device xvdb): relocating block group 73001861120 flags >>> metadata >>> BTRFS info (device xvdb): found 12236 extents, stage: move data >>> extents >>> BTRFS info (device xvdb): relocating block group 71928119296 flags >>> data >>> BTRFS info (device xvdb): found 3 extents, stage: move data extents >>> BTRFS info (device xvdb): found 3 extents, stage: update data pointers >>> BTRFS info (device xvdb): relocating block group 60922265600 flags >>> metadata >>> BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 >>> unknown >>> BTRFS info (device xvdb): forced readonly >>> BTRFS info (device xvdb): balance: ended with status: -4 >>> >>> [CAUSE] >>> This is caused by the fact that btrfs ticketing space system can be >>> interrupted, and cause all kind of -EINTR returned to various critical >>> section, where we never thought of -EINTR at all. >>> >>> Even for things like btrfs_start_transaction() can be affected by >>> signal: >>> btrfs_start_transaction() >>> |- start_transaction(flush = FLUSH_ALL) >>> |- btrfs_block_rsv_add() >>> |- btrfs_reserve_metadata_bytes() >>> |- __reserve_metadata_bytes() >>> |- handle_reserve_ticket() >>> |- wait_reserve_ticket() >>> |- prepare_to_wait_event(TASK_KILLABLE) >>> |- ticket->error = -EINTR; >>> >>> And all related callers get -EINTR error. >>> >>> In fact, there are really very limited call sites can really handle that >>> -EINTR properly, above btrfs_drop_snapshot() is one case. >>> >>> [FIX] >>> Things like metadata allocation is really a critical section for btrfs, >>> we don't really want it to be that killable by some impatient users. >>> >>> In fact, for really long duration calls, it should have their own checks >>> on signal, like balance, reflink, generic fiemap calls. >>> >>> So this patch will make ticket waiting uninterruptible, relying on each >>> long duration calls to handle their signals more properly. >>> >> >> Nope, everybody that calls start_transaction() should be able to handle >> -EINTR, so we need to find those callsites and fix them, not make it so >> we hang the box because we don't like fixing error paths. Thanks, > > Then we also need to handle btrfs_delalloc_reserve_metadata(), > btrfs_block_rsv_refill(), btrfs_use_block_rsv(), btrfs_block_rsv_add(). > This only needs to be handled for FLUSH_ALL and FLUSH_STEAL. Anybody doing btrfs_start_transaction() should be able to fail with -EINTR, because they should be close to the syscall path. Balance needs to be fixed to deal with it, and I assume there might be a few other places, but by-in-large none of these places should flip read only. Thanks, Josef ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting 2020-07-06 13:53 ` Josef Bacik @ 2020-07-06 14:05 ` Qu Wenruo 2020-07-06 14:33 ` Josef Bacik 0 siblings, 1 reply; 12+ messages in thread From: Qu Wenruo @ 2020-07-06 14:05 UTC (permalink / raw) To: Josef Bacik, Qu Wenruo, linux-btrfs; +Cc: Hans van Kranenburg [-- Attachment #1.1: Type: text/plain, Size: 4025 bytes --] On 2020/7/6 下午9:53, Josef Bacik wrote: > On 7/6/20 9:50 AM, Qu Wenruo wrote: >> >> >> On 2020/7/6 下午9:45, Josef Bacik wrote: >>> On 7/6/20 3:44 AM, Qu Wenruo wrote: >>>> [BUG] >>>> When balance receive a fatal signal, it can make the fs to read-only >>>> mode if the timing is unlucky enough: >>>> >>>> BTRFS info (device xvdb): balance: start -d -m -s >>>> BTRFS info (device xvdb): relocating block group 73001861120 flags >>>> metadata >>>> BTRFS info (device xvdb): found 12236 extents, stage: move data >>>> extents >>>> BTRFS info (device xvdb): relocating block group 71928119296 flags >>>> data >>>> BTRFS info (device xvdb): found 3 extents, stage: move data extents >>>> BTRFS info (device xvdb): found 3 extents, stage: update data >>>> pointers >>>> BTRFS info (device xvdb): relocating block group 60922265600 flags >>>> metadata >>>> BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 >>>> unknown >>>> BTRFS info (device xvdb): forced readonly >>>> BTRFS info (device xvdb): balance: ended with status: -4 >>>> >>>> [CAUSE] >>>> This is caused by the fact that btrfs ticketing space system can be >>>> interrupted, and cause all kind of -EINTR returned to various critical >>>> section, where we never thought of -EINTR at all. >>>> >>>> Even for things like btrfs_start_transaction() can be affected by >>>> signal: >>>> btrfs_start_transaction() >>>> |- start_transaction(flush = FLUSH_ALL) >>>> |- btrfs_block_rsv_add() >>>> |- btrfs_reserve_metadata_bytes() >>>> |- __reserve_metadata_bytes() >>>> |- handle_reserve_ticket() >>>> |- wait_reserve_ticket() >>>> |- prepare_to_wait_event(TASK_KILLABLE) >>>> |- ticket->error = -EINTR; >>>> >>>> And all related callers get -EINTR error. >>>> >>>> In fact, there are really very limited call sites can really handle >>>> that >>>> -EINTR properly, above btrfs_drop_snapshot() is one case. >>>> >>>> [FIX] >>>> Things like metadata allocation is really a critical section for btrfs, >>>> we don't really want it to be that killable by some impatient users. >>>> >>>> In fact, for really long duration calls, it should have their own >>>> checks >>>> on signal, like balance, reflink, generic fiemap calls. >>>> >>>> So this patch will make ticket waiting uninterruptible, relying on each >>>> long duration calls to handle their signals more properly. >>>> >>> >>> Nope, everybody that calls start_transaction() should be able to handle >>> -EINTR, so we need to find those callsites and fix them, not make it so >>> we hang the box because we don't like fixing error paths. Thanks, >> >> Then we also need to handle btrfs_delalloc_reserve_metadata(), >> btrfs_block_rsv_refill(), btrfs_use_block_rsv(), btrfs_block_rsv_add(). >> > > This only needs to be handled for FLUSH_ALL and FLUSH_STEAL. Anybody > doing btrfs_start_transaction() should be able to fail with -EINTR, > because they should be close to the syscall path. Balance needs to be > fixed to deal with it, and I assume there might be a few other places, > but by-in-large none of these places should flip read only. Thanks, There are already ones existing, for btrfs_drop_snapshot(). This is mostly caused by btrfs_delalloc_reserve_metadata(), which always use FLUSH_ALL unless there is a running trans or its space cache inode. But still, for a lot of relocation code, we don't really want to bother the EINTR and just want uninterruptible at least for now. Any idea for that? Or just rework how we handle errors in a lot of places? It doesn't look correct for such a low level mechanism to return -EINTR while most of callers doesn't really want to bother. Thanks, Qu > > Josef [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting 2020-07-06 14:05 ` Qu Wenruo @ 2020-07-06 14:33 ` Josef Bacik 2020-07-07 16:16 ` David Sterba 0 siblings, 1 reply; 12+ messages in thread From: Josef Bacik @ 2020-07-06 14:33 UTC (permalink / raw) To: Qu Wenruo, Qu Wenruo, linux-btrfs; +Cc: Hans van Kranenburg On 7/6/20 10:05 AM, Qu Wenruo wrote: > > > On 2020/7/6 下午9:53, Josef Bacik wrote: >> On 7/6/20 9:50 AM, Qu Wenruo wrote: >>> >>> >>> On 2020/7/6 下午9:45, Josef Bacik wrote: >>>> On 7/6/20 3:44 AM, Qu Wenruo wrote: >>>>> [BUG] >>>>> When balance receive a fatal signal, it can make the fs to read-only >>>>> mode if the timing is unlucky enough: >>>>> >>>>> BTRFS info (device xvdb): balance: start -d -m -s >>>>> BTRFS info (device xvdb): relocating block group 73001861120 flags >>>>> metadata >>>>> BTRFS info (device xvdb): found 12236 extents, stage: move data >>>>> extents >>>>> BTRFS info (device xvdb): relocating block group 71928119296 flags >>>>> data >>>>> BTRFS info (device xvdb): found 3 extents, stage: move data extents >>>>> BTRFS info (device xvdb): found 3 extents, stage: update data >>>>> pointers >>>>> BTRFS info (device xvdb): relocating block group 60922265600 flags >>>>> metadata >>>>> BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 >>>>> unknown >>>>> BTRFS info (device xvdb): forced readonly >>>>> BTRFS info (device xvdb): balance: ended with status: -4 >>>>> >>>>> [CAUSE] >>>>> This is caused by the fact that btrfs ticketing space system can be >>>>> interrupted, and cause all kind of -EINTR returned to various critical >>>>> section, where we never thought of -EINTR at all. >>>>> >>>>> Even for things like btrfs_start_transaction() can be affected by >>>>> signal: >>>>> btrfs_start_transaction() >>>>> |- start_transaction(flush = FLUSH_ALL) >>>>> |- btrfs_block_rsv_add() >>>>> |- btrfs_reserve_metadata_bytes() >>>>> |- __reserve_metadata_bytes() >>>>> |- handle_reserve_ticket() >>>>> |- wait_reserve_ticket() >>>>> |- prepare_to_wait_event(TASK_KILLABLE) >>>>> |- ticket->error = -EINTR; >>>>> >>>>> And all related callers get -EINTR error. >>>>> >>>>> In fact, there are really very limited call sites can really handle >>>>> that >>>>> -EINTR properly, above btrfs_drop_snapshot() is one case. >>>>> >>>>> [FIX] >>>>> Things like metadata allocation is really a critical section for btrfs, >>>>> we don't really want it to be that killable by some impatient users. >>>>> >>>>> In fact, for really long duration calls, it should have their own >>>>> checks >>>>> on signal, like balance, reflink, generic fiemap calls. >>>>> >>>>> So this patch will make ticket waiting uninterruptible, relying on each >>>>> long duration calls to handle their signals more properly. >>>>> >>>> >>>> Nope, everybody that calls start_transaction() should be able to handle >>>> -EINTR, so we need to find those callsites and fix them, not make it so >>>> we hang the box because we don't like fixing error paths. Thanks, >>> >>> Then we also need to handle btrfs_delalloc_reserve_metadata(), >>> btrfs_block_rsv_refill(), btrfs_use_block_rsv(), btrfs_block_rsv_add(). >>> >> >> This only needs to be handled for FLUSH_ALL and FLUSH_STEAL. Anybody >> doing btrfs_start_transaction() should be able to fail with -EINTR, >> because they should be close to the syscall path. Balance needs to be >> fixed to deal with it, and I assume there might be a few other places, >> but by-in-large none of these places should flip read only. Thanks, > > There are already ones existing, for btrfs_drop_snapshot(). > > This is mostly caused by btrfs_delalloc_reserve_metadata(), which always > use FLUSH_ALL unless there is a running trans or its space cache inode. > > But still, for a lot of relocation code, we don't really want to bother > the EINTR and just want uninterruptible at least for now. > > Any idea for that? > Or just rework how we handle errors in a lot of places? > > It doesn't look correct for such a low level mechanism to return -EINTR > while most of callers doesn't really want to bother. > That's the point, most callers shouldn't have to, because most callers shouldn't be far enough into their operations that -EINTR causes problems. We should probably just change btrfs_drop_snapshot to use join, and then have it do any space reservation it needs outside of the trans handle. The other option is a FLUSH_ALL_UNKILLABLE. Thanks, Josef ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting 2020-07-06 14:33 ` Josef Bacik @ 2020-07-07 16:16 ` David Sterba 0 siblings, 0 replies; 12+ messages in thread From: David Sterba @ 2020-07-07 16:16 UTC (permalink / raw) To: Josef Bacik; +Cc: Qu Wenruo, Qu Wenruo, linux-btrfs, Hans van Kranenburg On Mon, Jul 06, 2020 at 10:33:56AM -0400, Josef Bacik wrote: > On 7/6/20 10:05 AM, Qu Wenruo wrote: > >> This only needs to be handled for FLUSH_ALL and FLUSH_STEAL. Anybody > >> doing btrfs_start_transaction() should be able to fail with -EINTR, > >> because they should be close to the syscall path. Balance needs to be > >> fixed to deal with it, and I assume there might be a few other places, > >> but by-in-large none of these places should flip read only. Thanks, > > > > There are already ones existing, for btrfs_drop_snapshot(). > > > > This is mostly caused by btrfs_delalloc_reserve_metadata(), which always > > use FLUSH_ALL unless there is a running trans or its space cache inode. > > > > But still, for a lot of relocation code, we don't really want to bother > > the EINTR and just want uninterruptible at least for now. > > > > Any idea for that? > > Or just rework how we handle errors in a lot of places? > > > > It doesn't look correct for such a low level mechanism to return -EINTR > > while most of callers doesn't really want to bother. > > > > That's the point, most callers shouldn't have to, because most callers shouldn't > be far enough into their operations that -EINTR causes problems. Agreed, that's a sane pattern to follow so we should convert the remaining cases, perhaps also auditing all existing btrfs_start_transaction calls but after a quick look I haven't spotted anything else than the reloc and drop snapshot. > We should probably just change btrfs_drop_snapshot to use join, and then have it > do any space reservation it needs outside of the trans handle. The other option > is a FLUSH_ALL_UNKILLABLE. Thanks, I'd rather not push the killable semantics to the flushing state machine and let it up to the caller context to decide. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2020-07-07 16:17 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-07-06 7:44 [PATCH RFC 0/2] btrfs: make ticket wait uninterruptible to address unexpected RO during balance Qu Wenruo 2020-07-06 7:44 ` [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance Qu Wenruo 2020-07-06 13:45 ` Josef Bacik 2020-07-06 18:19 ` Hans van Kranenburg 2020-07-06 22:43 ` Qu Wenruo 2020-07-06 7:44 ` [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting Qu Wenruo 2020-07-06 13:45 ` Josef Bacik 2020-07-06 13:50 ` Qu Wenruo 2020-07-06 13:53 ` Josef Bacik 2020-07-06 14:05 ` Qu Wenruo 2020-07-06 14:33 ` Josef Bacik 2020-07-07 16:16 ` David Sterba
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox