* [PATCH] blk-lib: let user kill a zereout process @ 2024-02-20 20:41 Keith Busch 2024-02-21 3:05 ` Chaitanya Kulkarni 2024-02-21 8:46 ` Nilay Shroff 0 siblings, 2 replies; 10+ messages in thread From: Keith Busch @ 2024-02-20 20:41 UTC (permalink / raw) To: linux-block; +Cc: axboe, Keith Busch From: Keith Busch <kbusch@kernel.org> If a user runs something like `blkdiscard -z /dev/sda`, and the device does not have an efficient write zero offload, the kernel will dispatch long chains of bio's using the ZERO_PAGE for the entire capacity of the device. If the capacity is very large, this process could take a long time in an uninterruptable state, which the user may want to abort. Check between batches for the user's request to kill the process so they don't need to wait potentially many hours. Signed-off-by: Keith Busch <kbusch@kernel.org> --- block/blk-lib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block/blk-lib.c b/block/blk-lib.c index e59c3069e8351..d5c334aa98e0d 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev, break; } cond_resched(); + if (fatal_signal_pending(current)) + break; } *biop = bio; -- 2.34.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-lib: let user kill a zereout process 2024-02-20 20:41 [PATCH] blk-lib: let user kill a zereout process Keith Busch @ 2024-02-21 3:05 ` Chaitanya Kulkarni 2024-02-21 3:16 ` Keith Busch 2024-02-21 8:46 ` Nilay Shroff 1 sibling, 1 reply; 10+ messages in thread From: Chaitanya Kulkarni @ 2024-02-21 3:05 UTC (permalink / raw) To: Keith Busch, linux-block@vger.kernel.org; +Cc: axboe@kernel.org, Keith Busch On 2/20/24 12:41, Keith Busch wrote: > From: Keith Busch <kbusch@kernel.org> > > If a user runs something like `blkdiscard -z /dev/sda`, and the device > does not have an efficient write zero offload, the kernel will dispatch > long chains of bio's using the ZERO_PAGE for the entire capacity of the > device. If the capacity is very large, this process could take a long > time in an uninterruptable state, which the user may want to abort. > Check between batches for the user's request to kill the process so they > don't need to wait potentially many hours. > > Signed-off-by: Keith Busch <kbusch@kernel.org> > --- > block/blk-lib.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/block/blk-lib.c b/block/blk-lib.c > index e59c3069e8351..d5c334aa98e0d 100644 > --- a/block/blk-lib.c > +++ b/block/blk-lib.c > @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev, > break; > } > cond_resched(); > + if (fatal_signal_pending(current)) > + break; > } > > *biop = bio; We are exiting before completion of the whole I/O request, does it makes sense to return 0 == success if I/O is interrupted by the signal ? that means I/O not completed, hence it is actually an error, can we return the -EINTR when you are handling outstanding signal ? something like this untested :- linux-block (for-next) # git diff diff --git a/block/blk-lib.c b/block/blk-lib.c index e59c3069e835..bfadd9eecc6e 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -172,6 +172,7 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev, struct bio *bio = *biop; int bi_size = 0; unsigned int sz; + int ret = 0; if (bdev_read_only(bdev)) return -EPERM; @@ -190,10 +191,14 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev, break; } cond_resched(); + if (fatal_signal_pending(current)) { + ret = -EINTR; + break; + } } *biop = bio; - return 0; + return ret; } /** -ck ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-lib: let user kill a zereout process 2024-02-21 3:05 ` Chaitanya Kulkarni @ 2024-02-21 3:16 ` Keith Busch 2024-02-21 3:21 ` Keith Busch 2024-02-21 8:54 ` Ming Lei 0 siblings, 2 replies; 10+ messages in thread From: Keith Busch @ 2024-02-21 3:16 UTC (permalink / raw) To: Chaitanya Kulkarni Cc: Keith Busch, linux-block@vger.kernel.org, axboe@kernel.org On Wed, Feb 21, 2024 at 03:05:30AM +0000, Chaitanya Kulkarni wrote: > On 2/20/24 12:41, Keith Busch wrote: > > From: Keith Busch <kbusch@kernel.org> > > @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev, > > break; > > } > > cond_resched(); > > + if (fatal_signal_pending(current)) > > + break; > > } > > > > *biop = bio; > > We are exiting before completion of the whole I/O request, does it makes > sense to return 0 == success if I/O is interrupted by the signal ? > that means I/O not completed, hence it is actually an error, can we return > the -EINTR when you are handling outstanding signal ? I initially thought so too, but it doesn't matter. Once the process returns to user space, the signal kills it and the exit status reflects accordingly. That's true even before this patch, but it would just take longer for the exit. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-lib: let user kill a zereout process 2024-02-21 3:16 ` Keith Busch @ 2024-02-21 3:21 ` Keith Busch 2024-02-21 5:32 ` Chaitanya Kulkarni 2024-02-21 8:54 ` Ming Lei 1 sibling, 1 reply; 10+ messages in thread From: Keith Busch @ 2024-02-21 3:21 UTC (permalink / raw) To: Chaitanya Kulkarni Cc: Keith Busch, linux-block@vger.kernel.org, axboe@kernel.org On Tue, Feb 20, 2024 at 08:16:29PM -0700, Keith Busch wrote: > On Wed, Feb 21, 2024 at 03:05:30AM +0000, Chaitanya Kulkarni wrote: > > On 2/20/24 12:41, Keith Busch wrote: > > > From: Keith Busch <kbusch@kernel.org> > > > @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev, > > > break; > > > } > > > cond_resched(); > > > + if (fatal_signal_pending(current)) > > > + break; > > > } > > > > > > *biop = bio; > > > > We are exiting before completion of the whole I/O request, does it makes > > sense to return 0 == success if I/O is interrupted by the signal ? > > that means I/O not completed, hence it is actually an error, can we return > > the -EINTR when you are handling outstanding signal ? > > I initially thought so too, but it doesn't matter. Once the process > returns to user space, the signal kills it and the exit status reflects > accordingly. That's true even before this patch, but it would just take > longer for the exit. Also consider that we have bio's in flight here, and an error return will skip waiting for them. The kill signal handling here doesn't abort inflight requests (that's too hard); it just prevents creating and waiting for the rest of them, which could be millions. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-lib: let user kill a zereout process 2024-02-21 3:21 ` Keith Busch @ 2024-02-21 5:32 ` Chaitanya Kulkarni 0 siblings, 0 replies; 10+ messages in thread From: Chaitanya Kulkarni @ 2024-02-21 5:32 UTC (permalink / raw) To: Keith Busch; +Cc: Keith Busch, linux-block@vger.kernel.org, axboe@kernel.org On 2/20/24 19:21, Keith Busch wrote: > On Tue, Feb 20, 2024 at 08:16:29PM -0700, Keith Busch wrote: >> On Wed, Feb 21, 2024 at 03:05:30AM +0000, Chaitanya Kulkarni wrote: >>> On 2/20/24 12:41, Keith Busch wrote: >>>> From: Keith Busch <kbusch@kernel.org> >>>> @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev, >>>> break; >>>> } >>>> cond_resched(); >>>> + if (fatal_signal_pending(current)) >>>> + break; >>>> } >>>> >>>> *biop = bio; >>> We are exiting before completion of the whole I/O request, does it makes >>> sense to return 0 == success if I/O is interrupted by the signal ? >>> that means I/O not completed, hence it is actually an error, can we return >>> the -EINTR when you are handling outstanding signal ? >> I initially thought so too, but it doesn't matter. Once the process >> returns to user space, the signal kills it and the exit status reflects >> accordingly. That's true even before this patch, but it would just take >> longer for the exit. > Also consider that we have bio's in flight here, and an error return > will skip waiting for them. The kill signal handling here doesn't abort > inflight requests (that's too hard); it just prevents creating and > waiting for the rest of them, which could be millions. comment would be nice but not necessary, irrespective of that, looks good :- Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> -ck ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-lib: let user kill a zereout process 2024-02-21 3:16 ` Keith Busch 2024-02-21 3:21 ` Keith Busch @ 2024-02-21 8:54 ` Ming Lei 2024-02-21 9:45 ` Chaitanya Kulkarni 1 sibling, 1 reply; 10+ messages in thread From: Ming Lei @ 2024-02-21 8:54 UTC (permalink / raw) To: Keith Busch Cc: Chaitanya Kulkarni, Keith Busch, linux-block@vger.kernel.org, axboe@kernel.org On Wed, Feb 21, 2024 at 11:16 AM Keith Busch <kbusch@kernel.org> wrote: > > On Wed, Feb 21, 2024 at 03:05:30AM +0000, Chaitanya Kulkarni wrote: > > On 2/20/24 12:41, Keith Busch wrote: > > > From: Keith Busch <kbusch@kernel.org> > > > @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev, > > > break; > > > } > > > cond_resched(); > > > + if (fatal_signal_pending(current)) > > > + break; > > > } > > > > > > *biop = bio; > > > > We are exiting before completion of the whole I/O request, does it makes > > sense to return 0 == success if I/O is interrupted by the signal ? > > that means I/O not completed, hence it is actually an error, can we return > > the -EINTR when you are handling outstanding signal ? > > I initially thought so too, but it doesn't matter. Once the process > returns to user space, the signal kills it and the exit status reflects > accordingly. That's true even before this patch, but it would just take > longer for the exit. The zeroout API could be called from FS code in user context, and this way may confuse FS code, given it returns successfully, but actually it does not. Thanks, ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-lib: let user kill a zereout process 2024-02-21 8:54 ` Ming Lei @ 2024-02-21 9:45 ` Chaitanya Kulkarni 0 siblings, 0 replies; 10+ messages in thread From: Chaitanya Kulkarni @ 2024-02-21 9:45 UTC (permalink / raw) To: Ming Lei, Keith Busch Cc: Keith Busch, linux-block@vger.kernel.org, axboe@kernel.org On 2/21/24 00:54, Ming Lei wrote: > On Wed, Feb 21, 2024 at 11:16 AM Keith Busch <kbusch@kernel.org> wrote: >> On Wed, Feb 21, 2024 at 03:05:30AM +0000, Chaitanya Kulkarni wrote: >>> On 2/20/24 12:41, Keith Busch wrote: >>>> From: Keith Busch <kbusch@kernel.org> >>>> @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev, >>>> break; >>>> } >>>> cond_resched(); >>>> + if (fatal_signal_pending(current)) >>>> + break; >>>> } >>>> >>>> *biop = bio; >>> We are exiting before completion of the whole I/O request, does it makes >>> sense to return 0 == success if I/O is interrupted by the signal ? >>> that means I/O not completed, hence it is actually an error, can we return >>> the -EINTR when you are handling outstanding signal ? >> I initially thought so too, but it doesn't matter. Once the process >> returns to user space, the signal kills it and the exit status reflects >> accordingly. That's true even before this patch, but it would just take >> longer for the exit. > The zeroout API could be called from FS code in user context, and this way > may confuse FS code, given it returns successfully, but actually it does not. > > Thanks, > that is why I asked for returning -EINTR initially, but it seems like it will not have any effect ? given that process is about to exit ? note that that it may also get called from other places e.g. NVMeOF target or any future callers :- nvmet_bdev_execute_write_zeroes() __blkdev_issue_zeroout() __blkdev_issue_zero_pages() -ck ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-lib: let user kill a zereout process 2024-02-20 20:41 [PATCH] blk-lib: let user kill a zereout process Keith Busch 2024-02-21 3:05 ` Chaitanya Kulkarni @ 2024-02-21 8:46 ` Nilay Shroff 2024-02-21 9:46 ` Chaitanya Kulkarni 2024-02-21 16:27 ` Keith Busch 1 sibling, 2 replies; 10+ messages in thread From: Nilay Shroff @ 2024-02-21 8:46 UTC (permalink / raw) To: Keith Busch, linux-block; +Cc: axboe, Keith Busch On 2/21/24 02:11, Keith Busch wrote: > From: Keith Busch <kbusch@kernel.org> > > If a user runs something like `blkdiscard -z /dev/sda`, and the device > does not have an efficient write zero offload, the kernel will dispatch > long chains of bio's using the ZERO_PAGE for the entire capacity of the > device. If the capacity is very large, this process could take a long > time in an uninterruptable state, which the user may want to abort. > Check between batches for the user's request to kill the process so they > don't need to wait potentially many hours. > > Signed-off-by: Keith Busch <kbusch@kernel.org> > --- > block/blk-lib.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/block/blk-lib.c b/block/blk-lib.c > index e59c3069e8351..d5c334aa98e0d 100644 > --- a/block/blk-lib.c > +++ b/block/blk-lib.c > @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev, > break; > } > cond_resched(); > + if (fatal_signal_pending(current)) > + break; > } > > *biop = bio; I have an NVMe device which supports write offload. The total size of this disk is ~1.5 TB. When I tried zero out the whole NVMe drive it took more than 10 minutes. Please see below: # cat /sys/block/nvme0n1/queue/write_zeroes_max_bytes 8388608 # nvme id-ns /dev/nvme0n1 -H NVME Identify Namespace 1: nsze : 0x1749a956 ncap : 0x1749a956 nuse : 0x1749a956 <snip> nvmcap : 1600321314816 <snip> LBA Format 0 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use) <snip> # time blkdiscard -z /dev/nvme0n1 real 10m27.514s user 0m0.000s sys 0m0.369s So shouldn't we need to add the same code (allowing user to kill the process) under __blkdev_issue_write_zeroes()? I think even though a drive supports "write zero offload", if drive has a very large capacity then it would take up a lot of time to zero out the complete drive. Yes the time required may not be in hours in this case but it could be in tens of minutes depending on the drive capacity. Thanks, --Nilay ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-lib: let user kill a zereout process 2024-02-21 8:46 ` Nilay Shroff @ 2024-02-21 9:46 ` Chaitanya Kulkarni 2024-02-21 16:27 ` Keith Busch 1 sibling, 0 replies; 10+ messages in thread From: Chaitanya Kulkarni @ 2024-02-21 9:46 UTC (permalink / raw) To: Nilay Shroff Cc: axboe@kernel.org, Keith Busch, linux-block@vger.kernel.org, Keith Busch On 2/21/24 00:46, Nilay Shroff wrote: > > On 2/21/24 02:11, Keith Busch wrote: >> From: Keith Busch <kbusch@kernel.org> >> >> If a user runs something like `blkdiscard -z /dev/sda`, and the device >> does not have an efficient write zero offload, the kernel will dispatch >> long chains of bio's using the ZERO_PAGE for the entire capacity of the >> device. If the capacity is very large, this process could take a long >> time in an uninterruptable state, which the user may want to abort. >> Check between batches for the user's request to kill the process so they >> don't need to wait potentially many hours. >> >> Signed-off-by: Keith Busch <kbusch@kernel.org> >> --- >> block/blk-lib.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/block/blk-lib.c b/block/blk-lib.c >> index e59c3069e8351..d5c334aa98e0d 100644 >> --- a/block/blk-lib.c >> +++ b/block/blk-lib.c >> @@ -190,6 +190,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev, >> break; >> } >> cond_resched(); >> + if (fatal_signal_pending(current)) >> + break; >> } >> >> *biop = bio; > I have an NVMe device which supports write offload. The total size of this disk is ~1.5 TB. > When I tried zero out the whole NVMe drive it took more than 10 minutes. Please see below: > > # cat /sys/block/nvme0n1/queue/write_zeroes_max_bytes > 8388608 > > # nvme id-ns /dev/nvme0n1 -H > NVME Identify Namespace 1: > nsze : 0x1749a956 > ncap : 0x1749a956 > nuse : 0x1749a956 > <snip> > nvmcap : 1600321314816 > <snip> > LBA Format 0 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use) > <snip> > > # time blkdiscard -z /dev/nvme0n1 > > real 10m27.514s > user 0m0.000s > sys 0m0.369s > > So shouldn't we need to add the same code (allowing user to kill the process) under > __blkdev_issue_write_zeroes()? I think even though a drive supports "write zero offload", if > drive has a very large capacity then it would take up a lot of time to zero out the complete drive. > Yes the time required may not be in hours in this case but it could be in tens of minutes depending > on the drive capacity. > > Thanks, > --Nilay > > this is long standing problem with discard and write-zeroes, if we are going this route then we might as well make this change for rest of the callers in blk-lib.c .. -ck ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-lib: let user kill a zereout process 2024-02-21 8:46 ` Nilay Shroff 2024-02-21 9:46 ` Chaitanya Kulkarni @ 2024-02-21 16:27 ` Keith Busch 1 sibling, 0 replies; 10+ messages in thread From: Keith Busch @ 2024-02-21 16:27 UTC (permalink / raw) To: Nilay Shroff; +Cc: Keith Busch, linux-block, axboe On Wed, Feb 21, 2024 at 02:16:23PM +0530, Nilay Shroff wrote: > On 2/21/24 02:11, Keith Busch wrote: > > From: Keith Busch <kbusch@kernel.org> > > # time blkdiscard -z /dev/nvme0n1 > > real 10m27.514s > user 0m0.000s > sys 0m0.369s > > So shouldn't we need to add the same code (allowing user to kill the process) under > __blkdev_issue_write_zeroes()? I think even though a drive supports "write zero offload", if > drive has a very large capacity then it would take up a lot of time to zero out the complete drive. > Yes the time required may not be in hours in this case but it could be in tens of minutes depending > on the drive capacity. Yeah, that's long enough to change your mind and not want to wait around for it to proceed anyway. Between that and the filesystem usage, looks like I have more things to consider for v2. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-02-21 16:28 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-02-20 20:41 [PATCH] blk-lib: let user kill a zereout process Keith Busch 2024-02-21 3:05 ` Chaitanya Kulkarni 2024-02-21 3:16 ` Keith Busch 2024-02-21 3:21 ` Keith Busch 2024-02-21 5:32 ` Chaitanya Kulkarni 2024-02-21 8:54 ` Ming Lei 2024-02-21 9:45 ` Chaitanya Kulkarni 2024-02-21 8:46 ` Nilay Shroff 2024-02-21 9:46 ` Chaitanya Kulkarni 2024-02-21 16:27 ` Keith Busch
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox