Message-ID: <574578e0-ed5c-488e-b4f7-71da59651fc9@gmail.com>
Date: Wed, 4 Sep 2024 15:08:41 +0100
Subject: Re: [PATCH v2 5/7] block: implement async discard as io_uring cmd
To: Christoph Hellwig
Cc: io-uring@vger.kernel.org, Jens Axboe, Conrad Meyer, linux-block@vger.kernel.org, linux-mm@kvack.org, dchinner@redhat.com
From: Pavel Begunkov

On 8/23/24 12:59, Christoph Hellwig wrote:
> On Thu, Aug 22, 2024 at 02:07:16PM +0100, Pavel Begunkov wrote:
>>>> Note, unlike ioctl(BLKDISCARD) with stronger guarantees against races,
>>>> we only do a best effort attempt to invalidate page cache, and it can
>>>> race with any writes and reads and leave page cache stale. It's the
>>>> same kind of races we allow to direct writes.
>>>
>>> Can you please write up a man page for this that clearly documents the
>>> expected semantics?
>>
>> Do we have it documented anywhere how O_DIRECT writes interact
>> with page cache, so I can refer to it?
>
> I can't find a good writeup. Adding Dave as he tends to do long
> emails on topics like this so he might have one hiding somewhere.
>
>>> GFP_KERNEL can and often will block. You'll probably want a GFP_NOWAIT
>>> allocation here for the nowait case.
>>
>> I can change it for clarity, but I don't think it's much of a concern
>> since the read/write path and pretty sure a bunch of other places never
>> cared about it. It does the main thing, propagating it down e.g. for
>> tag allocation.
>
> True, we're only doing the nowait allocation for larger data
> structures. Which is a bit odd indeed.

That's widespread; last time I looked into it, no amount of patching
saved io_uring and tasks being killed by the oom reaper under memory
pressure.

>> I'd rather avoid calling bio_discard_limit() an extra time, it does
>> too much stuff inside, when the expected case is a single bio and
>> for multi-bio that overhead would really matter.
>
> Compared to a memory allocation it's not really doing all that much.
> In the long run we really should move splitting discard bios down
> the stack like we do for normal I/O anyway.
>
>> Maybe I should uniline blk_alloc_discard_bio() and dedup it with
>
> uniline? I read that as uninline, but as it's not inline I don't
> understand what you mean either.

"Hand code" if you wish, but you can just ignore it.

>>>> +#define BLOCK_URING_CMD_DISCARD 0
>>>
>>> Is fs.h the right place for this?
>>
>> Arguable, but I can move it to io_uring, makes things simpler
>> for me.
>
> I would have expected a uapi/linux/blkdev.h for it (and I'm kinda
> surprised we don't have that yet).

I think that would be overkill, we don't need it for just these
commands, and it only adds pain with probing the header with
autotools or so. If there is a future vision for it I'd say we
can drop a patch on top.

>>> Curious: how do we deal with conflicting uring cmds on different
>>> devices and how do we probe for them? The NVMe uring_cmds
>>> use the ioctl-style _IO* encoding which at least helps a bit with
>>> that and which seems like a good idea. Maybe someone needs to write
>>> up a few loose rules on uring commands?
>>
>> My concern is that we're sacrificing compiler optimisations
>> (well, jump tables are disabled IIRC) for something that doesn't even
>> guarantee uniqueness. I'd like to see some degree of reflection,
>> like user querying a file class in terms of what operations it
>> supports, but that's beyond the scope of the series.
>
> We can't guarantee uniqueness, but between the class, the direction,
> and the argument size we get a pretty good one. There is a reason
> pretty much all ioctls added in the last 25 years are using this scheme.

Which is likely because some people insisted on it and not because
the scheme is so great that everyone became acolytes. Not to mention
only 256 possible "types" and the endless mess of sharing them and
trying to find a range to use.

I'll convert to have less headache, but either way we're just
propagating the problem into the future.

-- 
Pavel Begunkov
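
For reference, a minimal sketch of the ioctl-style _IO* encoding argued about above: it shows how the type (class), direction, and argument size are packed into one command number, and why the type field allows only 256 values. The struct, the type value 0xEE, and the EXAMPLE_* command names are hypothetical and used only for illustration; they are not the numbers used by the series or by any real driver.

```c
#include <linux/ioctl.h>	/* _IO, _IOW and the _IOC_* decode helpers */

/* Hypothetical argument struct, for illustration only. */
struct example_range {
	unsigned long long start;
	unsigned long long len;
};

/* Hypothetical 8-bit type (class) and command numbers. */
#define EXAMPLE_TYPE		0xEE
#define EXAMPLE_CMD_DISCARD	_IO(EXAMPLE_TYPE, 0)
#define EXAMPLE_CMD_SET_RANGE	_IOW(EXAMPLE_TYPE, 1, struct example_range)

/* The four fields packed into an ioctl-style command number. */
struct decoded_cmd {
	unsigned int dir;	/* _IOC_NONE, _IOC_READ and/or _IOC_WRITE */
	unsigned int type;	/* 8-bit class, hence 256 possible values */
	unsigned int nr;	/* command number within the class */
	unsigned int size;	/* sizeof() the argument, 0 for _IO() */
};

/* Split a command number back into its fields. */
static struct decoded_cmd decode_cmd(unsigned int cmd)
{
	struct decoded_cmd d = {
		.dir  = _IOC_DIR(cmd),
		.type = _IOC_TYPE(cmd),
		.nr   = _IOC_NR(cmd),
		.size = _IOC_SIZE(cmd),
	};
	return d;
}
```

Two commands collide only if type, number, direction and argument size all match, which is the "pretty good" but not guaranteed uniqueness mentioned in the thread.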