From: Pavel Begunkov
Date: Tue, 10 Sep 2024 21:10:34 +0100
Subject: Re: [PATCH v4 8/8] block: implement async write zero pages command
To: Christoph Hellwig
Cc: io-uring@vger.kernel.org, Jens Axboe, Conrad Meyer,
    linux-block@vger.kernel.org, linux-mm@kvack.org

On 9/10/24 15:20, Christoph Hellwig wrote:
> On Tue, Sep 10, 2024 at 01:17:48PM +0100, Pavel Begunkov wrote:
>>>> Add a command that writes the zero page to the drive.
>>>> Apart from passing
>>>> the zero page instead of actual data it uses the normal write path and
>>>> doesn't do any further acceleration, nor does it require any special
>>>> hardware support. The intended use is to have a fallback when
>>>> BLOCK_URING_CMD_WRITE_ZEROES is not supported.
>>>
>>> That's just a horrible API. The user should not have to care if the
>>> kernel is using different kinds of implementations.
>>
>> It's rather not a good api when instead of issuing a presumably low
>> overhead fast command the user expects sending a good bunch of actual
>> writes with different performance characteristics.
>
> The normal use case (at least the ones I've been involved with) are
> simply zero these blocks or the entire device, and please do it as
> good as you can. Needing asynchronous error handling in userspace
> for that is extremely counter productive.

If we expect any error handling from user space at all (we do),
it'll have to be asynchronous: these are async commands and io_uring.
Asking the user to reissue a command in some form is normal.

>> In my experience,
>> such fallbacks cause more pain when a more explicit approach is
>> possible. And let me note that it's already exposed via fallocate, even
>> though in a bit different way.
>
> Do you mean the FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE case in
> blkdev_fallocate? As far as I can tell this is actually a really bad
> example, as even a hardware offloaded write zeroes can and often does
> write physical zeroes to the media, and does so from a firmware path
> that is often slower than the kernel loop.

That's a shame, I agree, which is why I call it "presumably" faster,
but that actually gives more reasons why you might want this cmd
separately from write zeroes, considering that the user might know
their hardware and the kernel doesn't try to choose which approach
is faster.

> But you have an actual use case where you want to send a write zeroes
> command but never a loop of writes, it would be good to document that
> and add a flag for it. And if we don't have that case it would still

Users who know more about their hw and, e.g., prefer writes with the
zero page as per above. Users with lots of devices who care about
pcie / memory bandwidth, there are enough of those, they might want to
do something different like adjusting algorithms and throttling.
Better/easier testing, though of lesser importance. Those I made up
just now on the spot, but the reporter did specifically ask about
some way to differentiate fallbacks.

> be good to have a reserved flags field to add it later if needed.

	if (unlikely(sqe->ioprio || sqe->__pad1 || sqe->len ||
		     sqe->rw_flags || sqe->file_index))
		return -EINVAL;

There is a good bunch of sqe fields that can be used for that later.

> Btw, do you have API documentation (e.g. in the form of a man page)
> for these new calls somewhere?

Mentioned in the cover:

tests and docs:
https://github.com/isilence/liburing.git discard-cmd

man page specifically:
https://github.com/isilence/liburing/commit/a6fa2bc2400bf7fcb80496e322b5db4c8b3191f0

I'll send them once the kernel side is set in place.

-- 
Pavel Begunkov