From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6F6DC3DA4A for ; Thu, 22 Aug 2024 13:06:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1E1566B02CA; Thu, 22 Aug 2024 09:06:57 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1912E6B02CB; Thu, 22 Aug 2024 09:06:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 032BC6B02CC; Thu, 22 Aug 2024 09:06:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id D60556B02CA for ; Thu, 22 Aug 2024 09:06:56 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 8F8ACA9DDA for ; Thu, 22 Aug 2024 13:06:56 +0000 (UTC) X-FDA: 82479906432.20.5196110 Received: from mail-ej1-f48.google.com (mail-ej1-f48.google.com [209.85.218.48]) by imf25.hostedemail.com (Postfix) with ESMTP id 19FF7A0026 for ; Thu, 22 Aug 2024 13:06:53 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=IG861tBG; spf=pass (imf25.hostedemail.com: domain of asml.silence@gmail.com designates 209.85.218.48 as permitted sender) smtp.mailfrom=asml.silence@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1724331933; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=JY6WD/cdLiYuTThhGylTjTMuiY5wE2mMjm5oQKT0gq0=; b=JlXKjZxL9e1lU98bGhqL+GAc/3QzmPepZpYphMdo/FyVAHWRVz/S7pEMwznubqzT8oB1ki xWU2M9gSotgrVFnvzgeS71tfJuQcT8udiFP7G1AvfSdzHDSKW0onX8yeEB9xvJWfje3Lc8 sGUwdAUkt1VY6p6MU4gecQIy7WnEPiE= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1724331933; a=rsa-sha256; cv=none; b=hBc0NSbmKkf+8vkaNLiZWHpAWYbufgufOq02x+MH5LAyAqxaIimB3irFpQVgJrTIOK2Y7r LJv8QytaTZEjfuyuWhhLsKk0ycJm9O7yxX6V+NVFoIFtc+nmRWIEH+iW1otaElTIAF36rY Zx3Y0SETLaaoCWHYoGnrvyFqNUkicuw= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=IG861tBG; spf=pass (imf25.hostedemail.com: domain of asml.silence@gmail.com designates 209.85.218.48 as permitted sender) smtp.mailfrom=asml.silence@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-ej1-f48.google.com with SMTP id a640c23a62f3a-a8696e9bd24so48829966b.0 for ; Thu, 22 Aug 2024 06:06:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1724332012; x=1724936812; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=JY6WD/cdLiYuTThhGylTjTMuiY5wE2mMjm5oQKT0gq0=; b=IG861tBG5ShLBoIVLuqNs38qW4o3QeaKksXyWODv8ggwFk4RVX7aI3SePXJWNHWpxx 3I2sVmZbAarhCLWLnbSmpM+uOXeA6jZ9ZiV3GM68mEp8cqTg7v7RbKZlgh7oiBU+M9J+ LQiH8YKbyBbnL70TpJmAtqaaFFcI59/gIXSuz9MQ8GXaSRYwuTS8lNNtN+18hcLC7QSk VXmBmeOBSKiGKpbEyUUdhydcrnaOFbqz5/y1uGlUqC2bZmw+cEYqELAvk/0Cve7K+PSv BuVfo1pTT9M5J6XlQGd3boL07R7Hq4zsXXZ/EY6wOYVKlStDFqWsppFcsecRstCB3iEz RZ7w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724332012; x=1724936812; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=JY6WD/cdLiYuTThhGylTjTMuiY5wE2mMjm5oQKT0gq0=; b=kPpfrnTJvPgE+VdDA5jLufQXlsHYqN2RiBSH7bXC98vGf8zNq9y+PQuUrx2vuIM6HN tzmlpOXst8fisemrCLet61cZQ9OJYYRWo61I/36J8H9d6vxFuqhvAvyyD9WWD/u6PrcM tOqeohgrRxsSdDBkN6IQ5KvFp7HO9QW7MPXZk9Su2QDNPFURyqc8N9gn5DK6v0ZSRI7F UOKrb+fwjwNaYVp1CGL6kBKjebHH9IrvPJxvOLtqxfTYLgSy8GKN2w1tgfIq58s1H7LB 3MUfjHAeRqC1wAI8M1PCCDhD4l09PAp8o6y0jXTGRsxU/DSXEFWWdrulUepw9/YRe+uR jCPw== X-Forwarded-Encrypted: i=1; AJvYcCUYiM7e3Dx3d/8zFpVkJvOeIcqUPLDNRdeUzMjJSAEbZp8E/FBS+KNqs77IHmjtsB8iAJFqK/BRiA==@kvack.org X-Gm-Message-State: AOJu0YyS6CUyKEtnSx3Xxl5xSx/lX/XtmLYL8C6+JJLNmLDri5ajj/YB bnkZFIpX1Du+1JuhdHfeFv/9UTB5/BEKTvS9S36tG00NZwwIXieQ X-Google-Smtp-Source: AGHT+IHyui9FV1zbhnVYWjCICzl8/4W4vLQP4cDkVJaCJwV36cV+fodq9sVe/ysFJ0Feghsq34W3LA== X-Received: by 2002:a17:906:eec4:b0:a7a:3928:3529 with SMTP id a640c23a62f3a-a866f110c45mr457228366b.13.1724332011466; Thu, 22 Aug 2024 06:06:51 -0700 (PDT) Received: from [192.168.42.32] ([163.114.131.193]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-5c044ddc3e4sm898651a12.13.2024.08.22.06.06.50 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 22 Aug 2024 06:06:51 -0700 (PDT) Message-ID: Date: Thu, 22 Aug 2024 14:07:16 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 5/7] block: implement async discard as io_uring cmd To: Christoph Hellwig Cc: io-uring@vger.kernel.org, Jens Axboe , Conrad Meyer , linux-block@vger.kernel.org, linux-mm@kvack.org References: Content-Language: en-US From: Pavel Begunkov In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 19FF7A0026 X-Stat-Signature: bijjrgx4s366wezo9hwk4zpswhwx6dtd X-HE-Tag: 1724332013-601238 X-HE-Meta: U2FsdGVkX1+hm4kFnRJpqBf+S4kFp6jSWq7gIOnRuMAjxolYo5r367Hs/QwS8uVLvk9yboxfzSDUPEjqgouAKRT8O/FrjoqUaZj9MCnOyYQswWy1c9OF8nMru/ot2Us1gJ5PJ6zaFEoWNW2va6+eO+uVq+XWCpdX/HgAarJKx8ksM5o+Qusnt8HDzqMFJA47MlPxQ8sBF33sGyI2vglvi0PHcWIHYPfJdFxlsSfE3lkdDCAr2Iq4XZgiJlbFIRnbJrhWVpn/dM6SGkmJWSZ8VoQJRipnlUlky8hFntdAMZEe5h3Zmct/QSMuXO8QfKIOdYNJLpkYcT179z5+PZqBVWOQrZmQNbBZVFZdz87PUT7PqyNcxPtoNkVHA7ymJLqO2hd7Uz8oNFWihMnIdvmyyINUN2ZcqEbKRJHG+r1OCuldKZ3IROUC4ESAzmnwaj7zW/L8e+R3Ct0TzYWctFTIE8j6KlgoDRMVQb+PEV8koAzd7p0YxkSR21tc38g2BTtonmAQKCMAMF2rXFLYsEJau9dhFVUXkX84WHG1yTth2WtRqWGa5x4OnntMX3I7nyjsc6yTN93q+3Yb45TqryO20xpY8UVk1CVjEYg9FT81YWFqbBxhXou3xx6YjnPeBuzhqwvCB1wcYsQshTr6lIQ8EZc7Dqq1xfUNSmXf9yeFpfzXNK7Yu5wOMZI68zIBBmlPamBT6ArthtE4vCauEtgvh7Nf2ahtNeG6CUIcUVxyiRbnD1bBqqIesZuDMuNQAY3gszXLPRmSusKaTxg9Ka+Go7xkA5aOUPN3SLRNM6rfL/0svu/iK/M7s+kThjlortHk7XBOF7gVW/DYpurUnn7yYo1uMLv8qYT1ke1MqOW0bHaMxbZemo/8AkAREr2Jp66HpXBIbH93otxLA1BIr5r0QTgdOrpXeAD6Pjjyx+dmUp+OmfKl3zOrBexbp1fBn3+2571NPZu8ST8JUKlv0GH sZIlENUv 6wOrHZoR+TnUrL9RCBwleChQGq2Tawwj2QOrgPp/B4WplHLDfDbIZjHE1htm/HGw1N2SPDvM6rp+j1ckvRRMmGW+Ip0CUvVWqjSvatpThL87SjoZGj78DOcan7y0X+on2QKjXJCgEJfzgBc2tTf2+kqj7ev0XTg7IZf0pRmrhtb+B0Y20uOkfN0purwJafgUyLOXWnTd2dAbk9RTZBWPbmJ9bXwq67NbF7lDyfPWqPKikOFrH/4yYDypGQhEb9tIZBh7ZlYQeyrOxv2PGQtQ4hk77BQJtw16bmEfx6XInmViDQEJlxNqnybtqhiQQ2oBvpBxkaroAvdJibZKxLyOFfLSEplS10sY3ZfGqaANPSms9r75Bypy+2cQ3n/bejT7n3TWkPhks3TR/ZSxPukUe7GP1fos1ziNY7HurlJJkOsDopIIoAarCTVok70+gp4xh9mcRpyaoBW47vxc= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 8/22/24 07:46, Christoph Hellwig wrote: > On Thu, Aug 22, 2024 at 04:35:55AM +0100, Pavel Begunkov wrote: >> io_uring allows to implement custom file specific operations via >> fops->uring_cmd callback. Use it to wire up asynchronous discard >> commands. Normally, first it tries to do a non-blocking issue, and if >> fails we'd retry from a blocking context by returning -EAGAIN to >> core io_uring. >> >> Note, unlike ioctl(BLKDISCARD) with stronger guarantees against races, >> we only do a best effort attempt to invalidate page cache, and it can >> race with any writes and reads and leave page cache stale. It's the >> same kind of races we allow to direct writes. > > Can you please write up a man page for this that clear documents the > expecvted semantics? Do we have it documented anywhere how O_DIRECT writes interact with page cache, so I can refer to it? >> +static void bio_cmd_end(struct bio *bio) > > This is really weird function name. blk_cmd_end_io or > blk_cmd_bio_end_io would be the usual choices. Will change with other cosmetics. >> + while ((bio = blk_alloc_discard_bio(bdev, §or, &nr_sects, >> + GFP_KERNEL))) { > > GFP_KERNEL can often will block. You'll probably want a GFP_NOWAIT > allocation here for the nowait case. I can change it for clarity, but I don't think it's much of a concern since the read/write path and pretty sure a bunch of other places never cared about it. It does the main thing, propagating it down e.g. for tag allocation. >> + if (nowait) { >> + /* >> + * Don't allow multi-bio non-blocking submissions as >> + * subsequent bios may fail but we won't get direct >> + * feedback about that. Normally, the caller should >> + * retry from a blocking context. >> + */ >> + if (unlikely(nr_sects)) { >> + bio_put(bio); >> + return -EAGAIN; >> + } >> + bio->bi_opf |= REQ_NOWAIT; >> + } > > And this really looks weird. It first allocates a bio, potentially That's what the write path does. > blocking, and then gives up? I think you're much better off with > something like: I'd rather avoid calling bio_discard_limit() an extra time, it does too much stuff inside, when the expected case is a single bio and for multi-bio that overhead would really matter. Maybe I should uniline blk_alloc_discard_bio() and dedup it with write zeroes, I'll see if can be done after other write zeroes changes. > > if (nowait) { > if (nr_sects > bio_discard_limit(bdev, sector)) > return -EAGAIN; > bio = blk_alloc_discard_bio(bdev, §or, &nr_sects, > GFP_NOWAIT); > if (!bio) > return -EAGAIN > bio->bi_opf |= REQ_NOWAIT; > goto submit; > } > > /* submission loop here */ > >> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h >> index 753971770733..0016e38ed33c 100644 >> --- a/include/uapi/linux/fs.h >> +++ b/include/uapi/linux/fs.h >> @@ -208,6 +208,8 @@ struct fsxattr { >> * (see uapi/linux/blkzoned.h) >> */ >> >> +#define BLOCK_URING_CMD_DISCARD 0 > > Is fs.h the reight place for this? Arguable, but I can move it to io_uring, makes things simpler for me. > Curious: how to we deal with conflicting uring cmds on different > device and how do we probe for them? The NVMe uring_cmds > use the ioctl-style _IO* encoding which at least helps a bit with > that and which seem like a good idea. Maybe someone needs to write > up a few lose rules on uring commands? My concern is that we're sacrificing compiler optimisations (well, jump tables are disabled IIRC) for something that doesn't even guarantee uniqueness. I'd like to see some degree of reflection, like user querying a file class in terms of what operations it supports, but that's beyond the scope of the series. -- Pavel Begunkov