From: Pavel Begunkov
Date: Fri, 16 Aug 2024 02:59:49 +0100
Subject: Re: [RFC 5/5] block: implement io_uring discard cmd
To: Ming Lei, Jens Axboe
Cc: io-uring@vger.kernel.org, Conrad Meyer, linux-block@vger.kernel.org, linux-mm@kvack.org
References: <6ecd7ab3386f63f1656dc766c1b5b038ff5353c2.1723601134.git.asml.silence@gmail.com> <4d016a30-d258-4d0e-b3bc-18bf0bd48e32@kernel.dk>

On 8/16/24 02:45, Ming Lei wrote:
> On Thu, Aug 15, 2024 at 07:24:16PM -0600, Jens Axboe wrote:
>> On 8/15/24 5:44 PM, Ming Lei wrote:
>>> On Thu, Aug 15, 2024 at 06:11:13PM +0100, Pavel Begunkov wrote:
>>>> On 8/15/24 15:33, Jens Axboe wrote:
>>>>> On 8/14/24 7:42 PM, Ming Lei wrote:
>>>>>> On Wed, Aug 14, 2024 at 6:46 PM Pavel Begunkov wrote:
>>>>>>>
>>>>>>> Add ->uring_cmd callback for block device files and use it to implement
>>>>>>> asynchronous discard. Normally, it first tries to execute the command
>>>>>>> from non-blocking context, which we limit to a single bio because
>>>>>>> otherwise one of sub-bios may need to wait for other bios, and we don't
>>>>>>> want to deal with partial IO. If non-blocking attempt fails, we'll retry
>>>>>>> it in a blocking context.
>>>>>>>
>>>>>>> Suggested-by: Conrad Meyer
>>>>>>> Signed-off-by: Pavel Begunkov
>>>>>>> ---
>>>>>>>  block/blk.h             |  1 +
>>>>>>>  block/fops.c            |  2 +
>>>>>>>  block/ioctl.c           | 94 +++++++++++++++++++++++++++++++++++++++++
>>>>>>>  include/uapi/linux/fs.h |  2 +
>>>>>>>  4 files changed, 99 insertions(+)
>>>>>>>
>>>>>>> diff --git a/block/blk.h b/block/blk.h
>>>>>>> index e180863f918b..5178c5ba6852 100644
>>>>>>> --- a/block/blk.h
>>>>>>> +++ b/block/blk.h
>>>>>>> @@ -571,6 +571,7 @@ blk_mode_t file_to_blk_mode(struct file *file);
>>>>>>>  int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
>>>>>>>                          loff_t lstart, loff_t lend);
>>>>>>>  long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
>>>>>>> +int blkdev_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
>>>>>>>  long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
>>>>>>>
>>>>>>>  extern const struct address_space_operations def_blk_aops;
>>>>>>> diff --git a/block/fops.c b/block/fops.c
>>>>>>> index 9825c1713a49..8154b10b5abf 100644
>>>>>>> --- a/block/fops.c
>>>>>>> +++ b/block/fops.c
>>>>>>> @@ -17,6 +17,7 @@
>>>>>>>  #include
>>>>>>>  #include
>>>>>>>  #include
>>>>>>> +#include
>>>>>>>  #include "blk.h"
>>>>>>>
>>>>>>>  static inline struct inode *bdev_file_inode(struct file *file)
>>>>>>> @@ -873,6 +874,7 @@ const struct file_operations def_blk_fops = {
>>>>>>>      .splice_read = filemap_splice_read,
>>>>>>>      .splice_write = iter_file_splice_write,
>>>>>>>      .fallocate = blkdev_fallocate,
>>>>>>> +    .uring_cmd = blkdev_uring_cmd,
>>>>>>
>>>>>> Just be curious, we have IORING_OP_FALLOCATE already for sending
>>>>>> discard to block device, why is .uring_cmd added for this purpose?
>>>>
>>>> Which is a good question, I haven't thought about it, but I tend to
>>>> agree with Jens. Because vfs_fallocate is created synchronous
>>>> IORING_OP_FALLOCATE is slow for anything but pretty large requests.
>>>> Probably can be patched up, which would involve changing the
>>>> fops->fallocate prototype, but I'm not sure async there makes sense
>>>> outside of bdev (?), and cmd approach is simpler, can be made
>>>> somewhat more efficient (1 less layer in the way), and it's not
>>>> really something completely new since we have it in ioctl.
>>>
>>> Yeah, we have ioctl(DISCARD), which acquires filemap_invalidate_lock,
>>> same with blkdev_fallocate().
>>>
>>> But this patch drops this exclusive lock, so it becomes async friendly,
>>> but may cause stale page cache. However, if the lock is required, it can't
>>> be efficient anymore and io-wq may be inevitable, :-)
>>
>> If you want to grab the lock, you can still opportunistically grab it.
>> For (by far) the common case, you'll get it, and you can still do it
>> inline.
>
> If the lock is grabbed in the whole cmd lifetime, it is basically one sync
> interface cause there is at most one async discard cmd in-flight for each
> device.
>
> Meantime the handling has to move to io-wq for avoiding to block current
> context, the interface becomes same with IORING_OP_FALLOCATE?

Right, and I agree that we can't just trylock, because we'd need to
keep it locked until the IO completes; at least the sync version does
that.

But I think *invalidate_pages() in the patch should be enough. That's
what the write path does, so it shouldn't cause any problems for the
kernel. As for user space, that'd be more relaxed than the ioctl, just
as writes are, so nothing new to the user.
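
For illustration only, here is a toy userspace model of the two schemes
discussed above. It is plain C, not kernel code, and every name in it is
made up for this sketch. Scheme A holds an exclusive lock from submission
until completion, as the sync path does; scheme B only invalidates the
range at submission and holds nothing afterwards. The model just counts
how many commands can be accepted inline before any of them completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_bdev {
	bool locked;            /* stands in for an exclusive invalidate lock */
	unsigned in_flight;     /* discards submitted but not yet completed */
	unsigned invalidations; /* how many times the range was invalidated */
};

/* Scheme A: take the lock at submit time and keep it until completion. */
static bool toy_submit_locked(struct toy_bdev *b)
{
	if (b->locked)
		return false;   /* would have to block or be punted to a worker */
	b->locked = true;
	b->in_flight++;
	return true;
}

/* Completion for scheme A: only now is the lock released. */
static void toy_complete_locked(struct toy_bdev *b)
{
	assert(b->locked && b->in_flight > 0);
	b->in_flight--;
	b->locked = false;
}

/* Scheme B: invalidate the cached range, then submit without any lock. */
static bool toy_submit_invalidate(struct toy_bdev *b)
{
	b->invalidations++;
	b->in_flight++;
	return true;
}

/*
 * Submit n commands back to back with no completions in between and
 * return how many were accepted inline.
 */
static unsigned toy_burst(struct toy_bdev *b, unsigned n, bool hold_lock)
{
	unsigned accepted = 0;

	for (unsigned i = 0; i < n; i++) {
		bool ok = hold_lock ? toy_submit_locked(b)
				    : toy_submit_invalidate(b);
		if (ok)
			accepted++;
	}
	return accepted;
}
```

With the lock held for the whole command lifetime, a burst of submissions
gets exactly one command accepted inline and the rest have to wait, which
is the "at most one async discard cmd in-flight" point. The model says
nothing about whether invalidation alone is sufficient for page cache
coherency; that is the open question.
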
I hope someone with better filemap understanding can confirm it (or not).

--
Pavel Begunkov