[PATCH] blk-lib: let user kill a zereout process

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] blk-lib: let user kill a zereout process
@ 2024-02-20 20:41 Keith Busch
  2024-02-21  3:05 ` Chaitanya Kulkarni
  2024-02-21  8:46 ` Nilay Shroff
  0 siblings, 2 replies; 10+ messages in thread
From: Keith Busch @ 2024-02-20 20:41 UTC (permalink / raw)
  To: linux-block; +Cc: axboe, Keith Busch

From: Keith Busch <kbusch@kernel.org>

If a user runs something like `blkdiscard -z /dev/sda`, and the device
does not have an efficient write zero offload, the kernel will dispatch
long chains of bio's using the ZERO_PAGE for the entire capacity of the
device. If the capacity is very large, this process could take a long
time in an uninterruptable state, which the user may want to abort.
Check between batches for the user's request to kill the process so they
don't need to wait potentially many hours.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 block/blk-lib.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index e59c3069e8351..d5c334aa98e0d 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
 				break;
 		}
 		cond_resched();
+		if (fatal_signal_pending(current))
+			break;
 	}

 	*biop = bio;
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-lib: let user kill a zereout process
  2024-02-20 20:41 [PATCH] blk-lib: let user kill a zereout process Keith Busch
@ 2024-02-21  3:05 ` Chaitanya Kulkarni
  2024-02-21  3:16   ` Keith Busch
  2024-02-21  8:46 ` Nilay Shroff
  1 sibling, 1 reply; 10+ messages in thread
From: Chaitanya Kulkarni @ 2024-02-21  3:05 UTC (permalink / raw)
  To: Keith Busch, linux-block@vger.kernel.org; +Cc: axboe@kernel.org, Keith Busch

On 2/20/24 12:41, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> If a user runs something like `blkdiscard -z /dev/sda`, and the device
> does not have an efficient write zero offload, the kernel will dispatch
> long chains of bio's using the ZERO_PAGE for the entire capacity of the
> device. If the capacity is very large, this process could take a long
> time in an uninterruptable state, which the user may want to abort.
> Check between batches for the user's request to kill the process so they
> don't need to wait potentially many hours.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   block/blk-lib.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index e59c3069e8351..d5c334aa98e0d 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
>   				break;
>   		}
>   		cond_resched();
> +		if (fatal_signal_pending(current))
> +			break;
>   	}
>   
>   	*biop = bio;

We are exiting before completion of the whole I/O request, does it makes
sense to return 0 == success if I/O is interrupted by the signal ?
that means I/O not completed, hence it is actually an error, can we return
the -EINTR when you are handling outstanding signal ?

something like this untested :-

linux-block (for-next) # git diff
diff --git a/block/blk-lib.c b/block/blk-lib.c
index e59c3069e835..bfadd9eecc6e 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -172,6 +172,7 @@ static int __blkdev_issue_zero_pages(struct 
block_device *bdev,
         struct bio *bio = *biop;
         int bi_size = 0;
         unsigned int sz;
+       int ret = 0;

         if (bdev_read_only(bdev))
                 return -EPERM;
@@ -190,10 +191,14 @@ static int __blkdev_issue_zero_pages(struct 
block_device *bdev,
                                 break;
                 }
                 cond_resched();
+               if (fatal_signal_pending(current)) {
+                       ret = -EINTR;
+                       break;
+               }
         }

         *biop = bio;
-       return 0;
+       return ret;
  }

  /**


-ck



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-lib: let user kill a zereout process
  2024-02-21  3:05 ` Chaitanya Kulkarni
@ 2024-02-21  3:16   ` Keith Busch
  2024-02-21  3:21     ` Keith Busch
  2024-02-21  8:54     ` Ming Lei
  0 siblings, 2 replies; 10+ messages in thread
From: Keith Busch @ 2024-02-21  3:16 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Keith Busch, linux-block@vger.kernel.org, axboe@kernel.org

On Wed, Feb 21, 2024 at 03:05:30AM +0000, Chaitanya Kulkarni wrote:
> On 2/20/24 12:41, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
> >   				break;
> >   		}
> >   		cond_resched();
> > +		if (fatal_signal_pending(current))
> > +			break;
> >   	}
> >   
> >   	*biop = bio;
> 
> We are exiting before completion of the whole I/O request, does it makes
> sense to return 0 == success if I/O is interrupted by the signal ?
> that means I/O not completed, hence it is actually an error, can we return
> the -EINTR when you are handling outstanding signal ?

I initially thought so too, but it doesn't matter. Once the process
returns to user space, the signal kills it and the exit status reflects
accordingly. That's true even before this patch, but it would just take
longer for the exit.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-lib: let user kill a zereout process
  2024-02-21  3:16   ` Keith Busch
@ 2024-02-21  3:21     ` Keith Busch
  2024-02-21  5:32       ` Chaitanya Kulkarni
  2024-02-21  8:54     ` Ming Lei
  1 sibling, 1 reply; 10+ messages in thread
From: Keith Busch @ 2024-02-21  3:21 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Keith Busch, linux-block@vger.kernel.org, axboe@kernel.org

On Tue, Feb 20, 2024 at 08:16:29PM -0700, Keith Busch wrote:
> On Wed, Feb 21, 2024 at 03:05:30AM +0000, Chaitanya Kulkarni wrote:
> > On 2/20/24 12:41, Keith Busch wrote:
> > > From: Keith Busch <kbusch@kernel.org>
> > > @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
> > >   				break;
> > >   		}
> > >   		cond_resched();
> > > +		if (fatal_signal_pending(current))
> > > +			break;
> > >   	}
> > >   
> > >   	*biop = bio;
> > 
> > We are exiting before completion of the whole I/O request, does it makes
> > sense to return 0 == success if I/O is interrupted by the signal ?
> > that means I/O not completed, hence it is actually an error, can we return
> > the -EINTR when you are handling outstanding signal ?
> 
> I initially thought so too, but it doesn't matter. Once the process
> returns to user space, the signal kills it and the exit status reflects
> accordingly. That's true even before this patch, but it would just take
> longer for the exit.

Also consider that we have bio's in flight here, and an error return
will skip waiting for them. The kill signal handling here doesn't abort
inflight requests (that's too hard); it just prevents creating and
waiting for the rest of them, which could be millions.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-lib: let user kill a zereout process
  2024-02-21  3:21     ` Keith Busch
@ 2024-02-21  5:32       ` Chaitanya Kulkarni
  0 siblings, 0 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2024-02-21  5:32 UTC (permalink / raw)
  To: Keith Busch; +Cc: Keith Busch, linux-block@vger.kernel.org, axboe@kernel.org

On 2/20/24 19:21, Keith Busch wrote:
> On Tue, Feb 20, 2024 at 08:16:29PM -0700, Keith Busch wrote:
>> On Wed, Feb 21, 2024 at 03:05:30AM +0000, Chaitanya Kulkarni wrote:
>>> On 2/20/24 12:41, Keith Busch wrote:
>>>> From: Keith Busch <kbusch@kernel.org>
>>>> @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
>>>>    				break;
>>>>    		}
>>>>    		cond_resched();
>>>> +		if (fatal_signal_pending(current))
>>>> +			break;
>>>>    	}
>>>>    
>>>>    	*biop = bio;
>>> We are exiting before completion of the whole I/O request, does it makes
>>> sense to return 0 == success if I/O is interrupted by the signal ?
>>> that means I/O not completed, hence it is actually an error, can we return
>>> the -EINTR when you are handling outstanding signal ?
>> I initially thought so too, but it doesn't matter. Once the process
>> returns to user space, the signal kills it and the exit status reflects
>> accordingly. That's true even before this patch, but it would just take
>> longer for the exit.
> Also consider that we have bio's in flight here, and an error return
> will skip waiting for them. The kill signal handling here doesn't abort
> inflight requests (that's too hard); it just prevents creating and
> waiting for the rest of them, which could be millions.

comment would be nice but not necessary, irrespective of that,
looks good :-

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-lib: let user kill a zereout process
  2024-02-20 20:41 [PATCH] blk-lib: let user kill a zereout process Keith Busch
  2024-02-21  3:05 ` Chaitanya Kulkarni
@ 2024-02-21  8:46 ` Nilay Shroff
  2024-02-21  9:46   ` Chaitanya Kulkarni
  2024-02-21 16:27   ` Keith Busch
  1 sibling, 2 replies; 10+ messages in thread
From: Nilay Shroff @ 2024-02-21  8:46 UTC (permalink / raw)
  To: Keith Busch, linux-block; +Cc: axboe, Keith Busch



On 2/21/24 02:11, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> If a user runs something like `blkdiscard -z /dev/sda`, and the device
> does not have an efficient write zero offload, the kernel will dispatch
> long chains of bio's using the ZERO_PAGE for the entire capacity of the
> device. If the capacity is very large, this process could take a long
> time in an uninterruptable state, which the user may want to abort.
> Check between batches for the user's request to kill the process so they
> don't need to wait potentially many hours.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  block/blk-lib.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index e59c3069e8351..d5c334aa98e0d 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
>  				break;
>  		}
>  		cond_resched();
> +		if (fatal_signal_pending(current))
> +			break;
>  	}
>  
>  	*biop = bio;

I have an NVMe device which supports write offload. The total size of this disk is ~1.5 TB.
When I tried zero out the whole NVMe drive it took more than 10 minutes. Please see below:

# cat /sys/block/nvme0n1/queue/write_zeroes_max_bytes 
8388608

# nvme id-ns /dev/nvme0n1 -H 
NVME Identify Namespace 1:
nsze    : 0x1749a956
ncap    : 0x1749a956
nuse    : 0x1749a956
<snip>
nvmcap  : 1600321314816
<snip>
LBA Format  0 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use)
<snip>

# time blkdiscard -z /dev/nvme0n1 

real	10m27.514s
user	0m0.000s
sys	0m0.369s

So shouldn't we need to add the same code (allowing user to kill the process) under 
__blkdev_issue_write_zeroes()? I think even though a drive supports "write zero offload", if
drive has a very large capacity then it would take up a lot of time to zero out the complete drive. 
Yes the time required may not be in hours in this case but it could be in tens of minutes depending 
on the drive capacity.

Thanks,
--Nilay


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-lib: let user kill a zereout process
  2024-02-21  3:16   ` Keith Busch
  2024-02-21  3:21     ` Keith Busch
@ 2024-02-21  8:54     ` Ming Lei
  2024-02-21  9:45       ` Chaitanya Kulkarni
  1 sibling, 1 reply; 10+ messages in thread
From: Ming Lei @ 2024-02-21  8:54 UTC (permalink / raw)
  To: Keith Busch
  Cc: Chaitanya Kulkarni, Keith Busch, linux-block@vger.kernel.org,
	axboe@kernel.org

On Wed, Feb 21, 2024 at 11:16 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Wed, Feb 21, 2024 at 03:05:30AM +0000, Chaitanya Kulkarni wrote:
> > On 2/20/24 12:41, Keith Busch wrote:
> > > From: Keith Busch <kbusch@kernel.org>
> > > @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
> > >                             break;
> > >             }
> > >             cond_resched();
> > > +           if (fatal_signal_pending(current))
> > > +                   break;
> > >     }
> > >
> > >     *biop = bio;
> >
> > We are exiting before completion of the whole I/O request, does it makes
> > sense to return 0 == success if I/O is interrupted by the signal ?
> > that means I/O not completed, hence it is actually an error, can we return
> > the -EINTR when you are handling outstanding signal ?
>
> I initially thought so too, but it doesn't matter. Once the process
> returns to user space, the signal kills it and the exit status reflects
> accordingly. That's true even before this patch, but it would just take
> longer for the exit.

The zeroout API could be called from FS code in user context, and this way
may confuse FS code, given it returns successfully, but actually it does not.

Thanks,


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-lib: let user kill a zereout process
  2024-02-21  8:54     ` Ming Lei
@ 2024-02-21  9:45       ` Chaitanya Kulkarni
  0 siblings, 0 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2024-02-21  9:45 UTC (permalink / raw)
  To: Ming Lei, Keith Busch
  Cc: Keith Busch, linux-block@vger.kernel.org, axboe@kernel.org

On 2/21/24 00:54, Ming Lei wrote:
> On Wed, Feb 21, 2024 at 11:16 AM Keith Busch <kbusch@kernel.org> wrote:
>> On Wed, Feb 21, 2024 at 03:05:30AM +0000, Chaitanya Kulkarni wrote:
>>> On 2/20/24 12:41, Keith Busch wrote:
>>>> From: Keith Busch <kbusch@kernel.org>
>>>> @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
>>>>                              break;
>>>>              }
>>>>              cond_resched();
>>>> +           if (fatal_signal_pending(current))
>>>> +                   break;
>>>>      }
>>>>
>>>>      *biop = bio;
>>> We are exiting before completion of the whole I/O request, does it makes
>>> sense to return 0 == success if I/O is interrupted by the signal ?
>>> that means I/O not completed, hence it is actually an error, can we return
>>> the -EINTR when you are handling outstanding signal ?
>> I initially thought so too, but it doesn't matter. Once the process
>> returns to user space, the signal kills it and the exit status reflects
>> accordingly. That's true even before this patch, but it would just take
>> longer for the exit.
> The zeroout API could be called from FS code in user context, and this way
> may confuse FS code, given it returns successfully, but actually it does not.
>
> Thanks,
>

that is why I asked for returning -EINTR initially, but it seems like it 
will
not have any effect ? given that process is about to exit ?

note that that it may also get called from other places e.g. NVMeOF target
or any future callers :-

nvmet_bdev_execute_write_zeroes()
     __blkdev_issue_zeroout()
         __blkdev_issue_zero_pages()

-ck



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-lib: let user kill a zereout process
  2024-02-21  8:46 ` Nilay Shroff
@ 2024-02-21  9:46   ` Chaitanya Kulkarni
  2024-02-21 16:27   ` Keith Busch
  1 sibling, 0 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2024-02-21  9:46 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: axboe@kernel.org, Keith Busch, linux-block@vger.kernel.org,
	Keith Busch

On 2/21/24 00:46, Nilay Shroff wrote:
>
> On 2/21/24 02:11, Keith Busch wrote:
>> From: Keith Busch <kbusch@kernel.org>
>>
>> If a user runs something like `blkdiscard -z /dev/sda`, and the device
>> does not have an efficient write zero offload, the kernel will dispatch
>> long chains of bio's using the ZERO_PAGE for the entire capacity of the
>> device. If the capacity is very large, this process could take a long
>> time in an uninterruptable state, which the user may want to abort.
>> Check between batches for the user's request to kill the process so they
>> don't need to wait potentially many hours.
>>
>> Signed-off-by: Keith Busch <kbusch@kernel.org>
>> ---
>>   block/blk-lib.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>> index e59c3069e8351..d5c334aa98e0d 100644
>> --- a/block/blk-lib.c
>> +++ b/block/blk-lib.c
>> @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
>>   				break;
>>   		}
>>   		cond_resched();
>> +		if (fatal_signal_pending(current))
>> +			break;
>>   	}
>>   
>>   	*biop = bio;
> I have an NVMe device which supports write offload. The total size of this disk is ~1.5 TB.
> When I tried zero out the whole NVMe drive it took more than 10 minutes. Please see below:
>
> # cat /sys/block/nvme0n1/queue/write_zeroes_max_bytes
> 8388608
>
> # nvme id-ns /dev/nvme0n1 -H
> NVME Identify Namespace 1:
> nsze    : 0x1749a956
> ncap    : 0x1749a956
> nuse    : 0x1749a956
> <snip>
> nvmcap  : 1600321314816
> <snip>
> LBA Format  0 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use)
> <snip>
>
> # time blkdiscard -z /dev/nvme0n1
>
> real	10m27.514s
> user	0m0.000s
> sys	0m0.369s
>
> So shouldn't we need to add the same code (allowing user to kill the process) under
> __blkdev_issue_write_zeroes()? I think even though a drive supports "write zero offload", if
> drive has a very large capacity then it would take up a lot of time to zero out the complete drive.
> Yes the time required may not be in hours in this case but it could be in tens of minutes depending
> on the drive capacity.
>
> Thanks,
> --Nilay
>
>

this is long standing problem with discard and write-zeroes, if we are 
going this route
then we might as well make this change for rest of the callers in 
blk-lib.c ..

-ck



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-lib: let user kill a zereout process
  2024-02-21  8:46 ` Nilay Shroff
  2024-02-21  9:46   ` Chaitanya Kulkarni
@ 2024-02-21 16:27   ` Keith Busch
  1 sibling, 0 replies; 10+ messages in thread
From: Keith Busch @ 2024-02-21 16:27 UTC (permalink / raw)
  To: Nilay Shroff; +Cc: Keith Busch, linux-block, axboe

On Wed, Feb 21, 2024 at 02:16:23PM +0530, Nilay Shroff wrote:
> On 2/21/24 02:11, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> 
> # time blkdiscard -z /dev/nvme0n1 
> 
> real	10m27.514s
> user	0m0.000s
> sys	0m0.369s
> 
> So shouldn't we need to add the same code (allowing user to kill the process) under 
> __blkdev_issue_write_zeroes()? I think even though a drive supports "write zero offload", if
> drive has a very large capacity then it would take up a lot of time to zero out the complete drive. 
> Yes the time required may not be in hours in this case but it could be in tens of minutes depending 
> on the drive capacity.

Yeah, that's long enough to change your mind and not want to wait around
for it to proceed anyway. Between that and the filesystem usage, looks
like I have more things to consider for v2.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-02-21 16:28 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-20 20:41 [PATCH] blk-lib: let user kill a zereout process Keith Busch
2024-02-21  3:05 ` Chaitanya Kulkarni
2024-02-21  3:16   ` Keith Busch
2024-02-21  3:21     ` Keith Busch
2024-02-21  5:32       ` Chaitanya Kulkarni
2024-02-21  8:54     ` Ming Lei
2024-02-21  9:45       ` Chaitanya Kulkarni
2024-02-21  8:46 ` Nilay Shroff
2024-02-21  9:46   ` Chaitanya Kulkarni
2024-02-21 16:27   ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).