From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 389DC241CA6 for ; Mon, 24 Feb 2025 08:56:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740387414; cv=none; b=Z8mVvvXDAP+vBllb0is8fNRp8RsBvhoqn3uaKYMxfopd1SK/otUm2qNY0IvWLL8m46VJvRSOYGPVJkwsn5nB7MeZCA+ladlHhRk9+toXMvlACXGI1prvdtGssddnOArD4j+q5rCu/t62NsS5OEYP+Ij72JoSPskPsuIXImzaTp8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740387414; c=relaxed/simple; bh=xq0dUbIsObricEpN11IBCCHbKywJQ6xYZeXU20Lj/+A=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=eqYHfgJX7R7LGqo45uoSeXt6gLvHbM01a1xCwYOdIOkwWbpgFcd8kz6zqDMlhCSP78NzICuFlsw3ekAkT09adNIXogebEGNt44mZUno5DKsa178eAP9r32AL6kL//a05bGH05DdD3GmqAmq9TZLzUaZLWltZbXPZ8X1aXB6Ijas= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=SLFAt6s/; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="SLFAt6s/" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1740387412; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uIqgti8JhQLr3gDPSlEgz2+yjZL8zWhP7BiX1sRX38w=; b=SLFAt6s/1JuE7UtCkDY9nwp5gjG5G+7IoyQFguB0pklt39jfzgR6hdQgPf1T5C8dMj5qj9 5/jyfjwqS5Ej80x9aBsSwv3R5Phc5Jm6R7K8AAtWZ7WMAPd52dq63Ro9jQuRTHHfBzR5bB Zil6N3GpebesH8oWQHmWXgKgq6lTs0Y= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-201-OwCJXigTMgC20E-ONEoOvA-1; Mon, 24 Feb 2025 03:56:48 -0500 X-MC-Unique: OwCJXigTMgC20E-ONEoOvA-1 X-Mimecast-MFC-AGG-ID: OwCJXigTMgC20E-ONEoOvA_1740387407 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 31E171800871; Mon, 24 Feb 2025 08:56:46 +0000 (UTC) Received: from fedora (unknown [10.72.120.33]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id D38351800358; Mon, 24 Feb 2025 08:56:37 +0000 (UTC) Date: Mon, 24 Feb 2025 16:56:31 +0800 From: Ming Lei To: Yu Kuai Cc: tj@kernel.org, josef@toxicpanda.com, axboe@kernel.dk, cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yi.zhang@huawei.com, yangerkun@huawei.com, "yukuai (C)" Subject: Re: [PATCH 2/2] blk-throttle: fix off-by-one jiffies wait_time Message-ID: References: <20250222092823.210318-1-yukuai1@huaweicloud.com> <20250222092823.210318-3-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: cgroups@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 On Mon, Feb 24, 2025 at 03:03:18PM +0800, Yu Kuai wrote: > Hi, Ming! > > 在 2025/02/24 11:28, Ming Lei 写道: > > throtl_trim_slice() returns immediately if throtl_slice_used() > > is true. > > > > And throtl_slice_used() checks jiffies in [start, end] via time_in_range(), > > so if `start <= jiffies <= end', it still returns false. > > Yes, I misread the code, by thinking throtl_slice_used() will return > true if the slice is still used. :( > > > > > BTW, throtl_trim_slice() looks like problematic: > > > > > > - if (bytes_trim <= 0 && io_trim <= 0) > > > + if (bytes_trim <= 0 || io_trim <= 0 || > > > + tg->bytes_disp[rw] < bytes_trim || tg->io_disp[rw] < io_trim) > > > return; > > That is exactly what my patch is doing, just taking deviation and > > timeout into account, also U64_MAX limit has to be excluded. > Yes, perhaps you can add some comments in the last two conditions of > your patch. Yes, we need to add comment on the check, how about the following words? ``` If actually rate doesn't match with expected rate, do not trim slice otherwise the present rate control info is lost, we don't have chance to compensate it in the following period of this slice any more. Add one deviation threshold since bio size is usually not divisible by present rate, and bio dispatch should be done or nothing Also limit max slice size for avoiding to extend the window too big. ``` > I think maybe you're concerned about the case IO is > throttled by IOs and bytes_disp should really erase extra bytes that > does not reach bps_limit. > > > > > If you test the patch, it works just fine. My patch controls bytes_exp > > <= 1.5 * disp, then throtl/001 can be completed in 1.5sec, and if it is > > changed to 1.25 * disp, the test can be completed in 1.25sec. > > > > With this fix, why do we have to play the complicated carryover > > trick? > > > > I understand your code now. carryover_bytes in this case is wrong, as > long as new slice is not started, and trim slice doesn't erase extra > bytes by mistake, throttle time should be accurate over time because > bytes_disp is accurate. Yes. More or less wait for handling single bio can just affect instantaneous rate, but the algorithm is adaptive so it can adjust the following wait if the slice isn't over. > > And root cause really is throtl_trim_slice(). > > However, by code review, I think throtl_start_new_slice() should be > handled as well, the same as throtl_trim_slice(), if the old bio is > dispatched with one more jiffies wait time, issue a new bio in the same > jiffies will have the same problem. For example: throtl_start_new_slice() is called when nothing is queued and the current slice is used, so probably it is fine. throtl_start_new_slice_with_credit() is called when dispatching one bio, and it is still called when the current slice is used. > > Caller do sync IO, with io size: (bps_limit / (HZ / throtl_slice) + 1), This will cause wait for every single IO. But once the IO is throttled because of the wait, throtl_start_new_slice() won't succeed any more, meantime throtl_trim_slice() will try to fix the rate control for the extra 1 jiffies. So in-tree code with trim fix works just fine if consecutive IOs are coming. > and caller will issue new IO when old IO is done. Then in this case, > each IO will start a new slice and wait for throtl_slice + 1 jiffies. I > believe this can be fixed as well so that BIOs after the first one will > only wait for throtl_silce jiffies. I guess the above case only exists in theory when you submit new IO after the old IO is dispatched immediately. Not sure this case is really meaningful because submission period/latency is added/faked by user. Thanks, Ming