[BUG REPORT] General protection fault while discarding extents on XFS on next-20240305

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [BUG REPORT] General protection fault while discarding extents on XFS on next-20240305
@ 2024-03-06  7:19 Chandan Babu R
  2024-03-06 12:35 ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Chandan Babu R @ 2024-03-06  7:19 UTC (permalink / raw)
  To: kbusch; +Cc: linux-block, linux-xfs

Hi,

Executing generic/251 on an XFS filesystem with next-20240305 kernel caused
the following call trace,

[ 6105.092156] XFS (loop5): discard failed for extent [0x344,4], error -4
[ 6105.094267] general protection fault, probably for non-canonical address 0xdffffc000000002a: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 6105.097056] KASAN: null-ptr-deref in range [0x0000000000000150-0x0000000000000157]
[ 6105.098639] CPU: 1 PID: 906401 Comm: fstrim Kdump: loaded Not tainted 6.8.0-rc7-next-20240305+ #1
[ 6105.100368] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.6.6 08/22/2023
[ 6105.102049] RIP: 0010:submit_bio_noacct+0x3bc/0x17e0
[ 6105.103441] Code: 00 00 41 89 c5 41 83 e5 01 0f 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 4d 63 ed 4a 8d bc 2b 56 01 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 d3 0f 00 00
[ 6105.107067] RSP: 0018:ffa00000056a7898 EFLAGS: 00010203
[ 6105.108547] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 1fe2200032629a49
[ 6105.110107] RDX: 000000000000002a RSI: 00000000007fffff RDI: 0000000000000157
[ 6105.111686] RBP: ff1100019314d200 R08: 0000000000000000 R09: ff110001026e0880
[ 6105.113281] R10: ff110001026e0887 R11: 0000000000000001 R12: ff1100019314d210
[ 6105.114871] R13: 0000000000000001 R14: ff1100019314d208 R15: ff110001026e0860
[ 6105.116446] FS:  00007f6f4bbfd800(0000) GS:ff110003ee480000(0000) knlGS:0000000000000000
[ 6105.118185] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6105.119630] CR2: 000055997361d4a8 CR3: 000000016f144004 CR4: 0000000000771ef0
[ 6105.121230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6105.122772] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6105.124510] PKRU: 55555554
[ 6105.125601] Call Trace:
3[ 6105.126672]  <TASK>
[ 6105.133971]  xfs_discard_extents+0x340/0x860 [xfs]
[ 6105.139534]  xfs_ioc_trim+0x4b1/0x960 [xfs]
[ 6105.150011]  xfs_file_ioctl+0xc49/0x1370 [xfs]
[ 6105.167691]  __x64_sys_ioctl+0x132/0x1a0
[ 6105.168725]  do_syscall_64+0x69/0x170
[ 6105.169681]  entry_SYSCALL_64_after_hwframe+0x6c/0x74

The above *probably* occured because __blkdev_issue_discard() noticed a pending
signal, processed the bio, freed the bio and returned a non-NULL bio pointer
to the caller (i.e. xfs_discard_extents()).

xfs_discard_extents() then tries to process the freed bio once again.

-- 
Chandan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG REPORT] General protection fault while discarding extents on XFS on next-20240305
  2024-03-06  7:19 [BUG REPORT] General protection fault while discarding extents on XFS on next-20240305 Chandan Babu R
@ 2024-03-06 12:35 ` Christoph Hellwig
  2024-03-06 14:36   ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2024-03-06 12:35 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: kbusch, linux-block, linux-xfs

On Wed, Mar 06, 2024 at 12:49:29PM +0530, Chandan Babu R wrote:
> The above *probably* occured because __blkdev_issue_discard() noticed a pending
> signal, processed the bio, freed the bio and returned a non-NULL bio pointer
> to the caller (i.e. xfs_discard_extents()).
> 
> xfs_discard_extents() then tries to process the freed bio once again.

Yes, __blkdev_issue_discard really needs to clear *biop to NULL for
this case, i.e.:

diff --git a/block/blk-lib.c b/block/blk-lib.c
index dc8e35d0a51d6d..26850d4895cdaf 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -99,6 +99,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		cond_resched();
 		if (fatal_signal_pending(current)) {
 			await_bio_chain(bio);
+			*biop = NULL;
 			return -EINTR;
 		}
 	}

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [BUG REPORT] General protection fault while discarding extents on XFS on next-20240305
  2024-03-06 12:35 ` Christoph Hellwig
@ 2024-03-06 14:36   ` Keith Busch
  2024-03-06 14:40     ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2024-03-06 14:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, linux-block, linux-xfs

On Wed, Mar 06, 2024 at 04:35:09AM -0800, Christoph Hellwig wrote:
> On Wed, Mar 06, 2024 at 12:49:29PM +0530, Chandan Babu R wrote:
> > The above *probably* occured because __blkdev_issue_discard() noticed a pending
> > signal, processed the bio, freed the bio and returned a non-NULL bio pointer
> > to the caller (i.e. xfs_discard_extents()).
> > 
> > xfs_discard_extents() then tries to process the freed bio once again.
> 
> Yes, __blkdev_issue_discard really needs to clear *biop to NULL for
> this case, i.e.:
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index dc8e35d0a51d6d..26850d4895cdaf 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -99,6 +99,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  		cond_resched();
>  		if (fatal_signal_pending(current)) {
>  			await_bio_chain(bio);
> +			*biop = NULL;
>  			return -EINTR;
>  		}
>  	}

But everyone who calls this already sets their local bio to NULL by
default, and __blkdev_issue_discard updates *biop only on success, so
'*biop' should already be NULL here. ?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG REPORT] General protection fault while discarding extents on XFS on next-20240305
  2024-03-06 14:36   ` Keith Busch
@ 2024-03-06 14:40     ` Keith Busch
  2024-03-06 14:45       ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2024-03-06 14:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, linux-block, linux-xfs

On Wed, Mar 06, 2024 at 07:36:43AM -0700, Keith Busch wrote:
> On Wed, Mar 06, 2024 at 04:35:09AM -0800, Christoph Hellwig wrote:
> > On Wed, Mar 06, 2024 at 12:49:29PM +0530, Chandan Babu R wrote:
> > > The above *probably* occured because __blkdev_issue_discard() noticed a pending
> > > signal, processed the bio, freed the bio and returned a non-NULL bio pointer
> > > to the caller (i.e. xfs_discard_extents()).
> > > 
> > > xfs_discard_extents() then tries to process the freed bio once again.
> > 
> > Yes, __blkdev_issue_discard really needs to clear *biop to NULL for
> > this case, i.e.:
> > 
> > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > index dc8e35d0a51d6d..26850d4895cdaf 100644
> > --- a/block/blk-lib.c
> > +++ b/block/blk-lib.c
> > @@ -99,6 +99,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> >  		cond_resched();
> >  		if (fatal_signal_pending(current)) {
> >  			await_bio_chain(bio);
> > +			*biop = NULL;
> >  			return -EINTR;
> >  		}
> >  	}
> 
> But everyone who calls this already sets their local bio to NULL by
> default, and __blkdev_issue_discard updates *biop only on success, so
> '*biop' should already be NULL here. ?

Oh my mistake: xfs_discard_extents() does this in a loop and chains
along the previous iteration's bio. Your update is needed and looks
good.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG REPORT] General protection fault while discarding extents on XFS on next-20240305
  2024-03-06 14:40     ` Keith Busch
@ 2024-03-06 14:45       ` Christoph Hellwig
  2024-03-06 15:18         ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2024-03-06 14:45 UTC (permalink / raw)
  To: Keith Busch; +Cc: Christoph Hellwig, Chandan Babu R, linux-block, linux-xfs

On Wed, Mar 06, 2024 at 07:40:02AM -0700, Keith Busch wrote:
> Oh my mistake: xfs_discard_extents() does this in a loop and chains
> along the previous iteration's bio. Your update is needed and looks
> good.

Yeah, the whole point for the in/out biop parameter is to allow
this kind of chaining.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG REPORT] General protection fault while discarding extents on XFS on next-20240305
  2024-03-06 14:45       ` Christoph Hellwig
@ 2024-03-06 15:18         ` Christoph Hellwig
  2024-03-06 20:51           ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2024-03-06 15:18 UTC (permalink / raw)
  To: Keith Busch; +Cc: Christoph Hellwig, Chandan Babu R, linux-block, linux-xfs

Lookings at this a bit more I'm not sure my fix is enough as the error
handling is really complex.  Also given that some discard callers are
from kernel threads messing with interruptibility I'm not entirely
sure that having this check in the common helper is a good idea.

Let me think of a better way to deal with this.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG REPORT] General protection fault while discarding extents on XFS on next-20240305
  2024-03-06 15:18         ` Christoph Hellwig
@ 2024-03-06 20:51           ` Dave Chinner
  2024-03-06 22:16             ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2024-03-06 20:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, Chandan Babu R, linux-block, linux-xfs

On Wed, Mar 06, 2024 at 07:18:02AM -0800, Christoph Hellwig wrote:
> Lookings at this a bit more I'm not sure my fix is enough as the error
> handling is really complex.  Also given that some discard callers are
> from kernel threads messing with interruptibility I'm not entirely
> sure that having this check in the common helper is a good idea.

Yeah, this seems like a problem. The only places that userspace
should be issuing discards directly and hence be interruptible from
are FITRIM, BLKDISCARD and fallocate() on block devices.

Filesystems already handle fatal signals in FITRIM (e.g. see
xfs_trim_should_stop(), ext4_trim_interrupted(),
btrfs_trim_free_extents(), etc), so it seems to me that the only
non-interruptible call from userspace are operations directly on
block devices which have no higher level iteration over the range to
discard and the user controls the range directly.

Perhaps the solution is to change BLKDISCARD/fallocate() on bdev to
look more like xfs_discard_extents() where it breaks the range up
into smaller chunks and intersperses bio chaining with signal
checks.

I suspect the same solution is necessary for blkdev_issue_zeroout()
and blkdev_issue_secure_erase(), because both of them have user
controlled lengths...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG REPORT] General protection fault while discarding extents on XFS on next-20240305
  2024-03-06 20:51           ` Dave Chinner
@ 2024-03-06 22:16             ` Christoph Hellwig
  2024-03-07  9:21               ` Nilay Shroff
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2024-03-06 22:16 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Keith Busch, Chandan Babu R, linux-block,
	linux-xfs

On Thu, Mar 07, 2024 at 07:51:35AM +1100, Dave Chinner wrote:
> On Wed, Mar 06, 2024 at 07:18:02AM -0800, Christoph Hellwig wrote:
> > Lookings at this a bit more I'm not sure my fix is enough as the error
> > handling is really complex.  Also given that some discard callers are
> > from kernel threads messing with interruptibility I'm not entirely
> > sure that having this check in the common helper is a good idea.
> 
> Yeah, this seems like a problem. The only places that userspace
> should be issuing discards directly and hence be interruptible from
> are FITRIM, BLKDISCARD and fallocate() on block devices.

Yes.

> Filesystems already handle fatal signals in FITRIM (e.g. see
> xfs_trim_should_stop(), ext4_trim_interrupted(),
> btrfs_trim_free_extents(), etc), so it seems to me that the only
> non-interruptible call from userspace are operations directly on
> block devices which have no higher level iteration over the range to
> discard and the user controls the range directly.

Yeah.

> Perhaps the solution is to change BLKDISCARD/fallocate() on bdev to
> look more like xfs_discard_extents() where it breaks the range up
> into smaller chunks and intersperses bio chaining with signal
> checks.

Well, xfs_discard_extents has different extents from the higher
layers.  __blkdev_issue_discard than breaks it up based on what
fits into the bio (and does some alignment against our normal
rule of leaving that to the splitting code).  But I suspect moving
the loop in __blkdev_issue_discard into the callers could really
help with this.

> 
> I suspect the same solution is necessary for blkdev_issue_zeroout()
> and blkdev_issue_secure_erase(), because both of them have user
> controlled lengths...

Yes.  (or rather two sub cases of the former and the latter)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG REPORT] General protection fault while discarding extents on XFS on next-20240305
  2024-03-06 22:16             ` Christoph Hellwig
@ 2024-03-07  9:21               ` Nilay Shroff
  0 siblings, 0 replies; 9+ messages in thread
From: Nilay Shroff @ 2024-03-07  9:21 UTC (permalink / raw)
  To: Christoph Hellwig, Dave Chinner
  Cc: Keith Busch, Chandan Babu R, linux-block, linux-xfs



On 3/7/24 03:46, Christoph Hellwig wrote:
> On Thu, Mar 07, 2024 at 07:51:35AM +1100, Dave Chinner wrote:
>> On Wed, Mar 06, 2024 at 07:18:02AM -0800, Christoph Hellwig wrote:
>>> Lookings at this a bit more I'm not sure my fix is enough as the error
>>> handling is really complex.  Also given that some discard callers are
>>> from kernel threads messing with interruptibility I'm not entirely
>>> sure that having this check in the common helper is a good idea.
>>
>> Yeah, this seems like a problem. The only places that userspace
>> should be issuing discards directly and hence be interruptible from
>> are FITRIM, BLKDISCARD and fallocate() on block devices.
> 
> Yes.
> 
>> Filesystems already handle fatal signals in FITRIM (e.g. see
>> xfs_trim_should_stop(), ext4_trim_interrupted(),
>> btrfs_trim_free_extents(), etc), so it seems to me that the only
>> non-interruptible call from userspace are operations directly on
>> block devices which have no higher level iteration over the range to
>> discard and the user controls the range directly.
> 
> Yeah.
> 
>> Perhaps the solution is to change BLKDISCARD/fallocate() on bdev to
>> look more like xfs_discard_extents() where it breaks the range up
>> into smaller chunks and intersperses bio chaining with signal
>> checks.
> 
> Well, xfs_discard_extents has different extents from the higher
> layers.  __blkdev_issue_discard than breaks it up based on what
> fits into the bio (and does some alignment against our normal
> rule of leaving that to the splitting code).  But I suspect moving
> the loop in __blkdev_issue_discard into the callers could really
> help with this.
> 
>>
>> I suspect the same solution is necessary for blkdev_issue_zeroout()
>> and blkdev_issue_secure_erase(), because both of them have user
>> controlled lengths...
> 
> Yes.  (or rather two sub cases of the former and the latter)
> 
How about adding a new parameter named "interruptible" to __blkdev_issue_discard()
and then using that parameter to deduce whether we need to intercept fatal
signal or not? We can ensure that it's only when __blkdev_issue_discard() is
invoked from the userspace (BLKDISCARD/fallocate()) we would have "interruptible"
set to "1" otherwise for all other code path it could be set to "0".

Yes we may need one helper which would help set the "interruptible" to "1" 
when invoked from BLKDISCARD/fallocate(). 

Probably the code should look as below (not tested):

diff --git a/block/blk-lib.c b/block/blk-lib.c
index dc8e35d0a51d..4b17f8b9dec1 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -56,7 +56,7 @@ static void await_bio_chain(struct bio *bio)
 }
 
 int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
-               sector_t nr_sects, gfp_t gfp_mask, struct bio **biop)
+               sector_t nr_sects, gfp_t gfp_mask, struct bio **biop, int interruptible)
 {
        struct bio *bio = *biop;
        sector_t bs_mask;
@@ -97,8 +97,9 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
                 * is disabled.
                 */
                cond_resched();
-               if (fatal_signal_pending(current)) {
+               if (interruptible && fatal_signal_pending(current)) {
                        await_bio_chain(bio);
+                       *biop = NULL;
                        return -EINTR;
                }
        }
@@ -126,7 +127,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
        int ret;
 
        blk_start_plug(&plug);
-       ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, &bio);
+       ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, &bio, 0);
        if (!ret && bio) {
                ret = submit_bio_wait(bio);
                if (ret == -EOPNOTSUPP)
@@ -139,6 +140,26 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 }
 EXPORT_SYMBOL(blkdev_issue_discard);
 
+int blkdev_issue_discard_interruptible(struct block_device *bdev, sector_t sector,
+               sector_t nr_sects, gfp_t gfp_mask)
+{
+       struct bio *bio = NULL;
+       struct blk_plug plug;
+       int ret;
+
+       blk_start_plug(&plug);
+       ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, &bio, 1);
+       if (!ret && bio) {
+               ret = submit_bio_wait(bio);
+               if (ret == -EOPNOTSUPP)
+                       ret = 0;
+               bio_put(bio);
+       }
+       blk_finish_plug(&plug);
+
+       return ret;
+}
+
 static int __blkdev_issue_write_zeroes(struct block_device *bdev,
                sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
                struct bio **biop, unsigned flags)
diff --git a/block/fops.c b/block/fops.c
index 0cf8cf72cdfa..f9399a59cf4e 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -828,7 +828,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
                if (error)
                        goto fail;
 
-               error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
+               error = blkdev_issue_discard_interruptible(bdev, start >> SECTOR_SHIFT,
                                             len >> SECTOR_SHIFT, GFP_KERNEL);
                break;
        default:
diff --git a/block/ioctl.c b/block/ioctl.c
index 438f79c564cf..e869f2859eb1 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -117,7 +117,8 @@ static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode,
        err = truncate_bdev_range(bdev, mode, start, start + len - 1);
        if (err)
                goto fail;
-       err = blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_KERNEL);
+       err = blkdev_issue_discard_interruptible(bdev, start >> 9,
+                       len >> 9, GFP_KERNEL);
 fail:
        filemap_invalidate_unlock(inode->i_mapping);
        return err;


And yes we would need similar changes for  blkdev_issue_zeroout() and 
blkdev_issue_secure_erase().

Thanks,
--Nilay






^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-03-07  9:22 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-06  7:19 [BUG REPORT] General protection fault while discarding extents on XFS on next-20240305 Chandan Babu R
2024-03-06 12:35 ` Christoph Hellwig
2024-03-06 14:36   ` Keith Busch
2024-03-06 14:40     ` Keith Busch
2024-03-06 14:45       ` Christoph Hellwig
2024-03-06 15:18         ` Christoph Hellwig
2024-03-06 20:51           ` Dave Chinner
2024-03-06 22:16             ` Christoph Hellwig
2024-03-07  9:21               ` Nilay Shroff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).