From: Gabriel Krisman Bertazi <krisman@suse.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	Hugh Dickins <hughd@google.com>, Keith Busch <kbusch@kernel.org>,
	Liu Song <liusong@linux.alibaba.com>, Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] sbitmap: Use single per-bitmap counting to wake up queued tags
Date: Wed, 09 Nov 2022 17:48:08 -0500
Message-ID: <87wn83eod3.fsf@suse.de>
In-Reply-To: <cd88f306-1da4-a243-ec23-fea033142fbb@kernel.dk> (Jens Axboe's message of "Wed, 9 Nov 2022 15:06:52 -0700")

Jens Axboe <axboe@kernel.dk> writes:

> On 11/5/22 5:10 PM, Gabriel Krisman Bertazi wrote:
>> Performance-wise, one should expect very similar performance to the
>> original algorithm for the case where there is no queueing.  In both the
>> old algorithm and this implementation, the first thing is to check
>> ws_active, which bails out if there is no queueing to be managed. In the
>> new code, we took care to avoid accounting completions and wakeups when
>> there is no queueing, to not pay the cost of atomic operations
>> unnecessarily, since it doesn't skew the numbers.
>> 
>> For more interesting cases, where there is queueing, we need to take
>> into account the cross-communication of the atomic operations.  I've
>> been benchmarking by running parallel fio jobs against a single hctx
>> nullb in different hardware queue depth scenarios, and verifying both
>> IOPS and queueing.
>> 
>> Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel
>> jobs. fio was issuing fixed-size randwrites with qd=64 against nullb,
>> varying only the hardware queue length per test.
>> 
>> queue size 2                 4                 8                 16                 32                 64
>> 6.1-rc2    1681.1K (1.6K)    2633.0K (12.7K)   6940.8K (16.3K)   8172.3K (617.5K)   8391.7K (367.1K)   8606.1K (351.2K)
>> patched    1721.8K (15.1K)   3016.7K (3.8K)    7543.0K (89.4K)   8132.5K (303.4K)   8324.2K (230.6K)   8401.8K (284.7K)
>> 
>> The following is a similar experiment, ran against a nullb with a single
>> bitmap shared by 20 hctx spread across 2 NUMA nodes. This has 40
>> parallel fio jobs operating on the same device
>> 
>> queue size 2                 4                 8                 16                 32                 64
>> 6.1-rc2    1081.0K (2.3K)    957.2K (1.5K)     1699.1K (5.7K)    6178.2K (124.6K)   12227.9K (37.7K)   13286.6K (92.9K)
>> patched    1081.8K (2.8K)    1316.5K (5.4K)    2364.4K (1.8K)    6151.4K (20.0K)    11893.6K (17.5K)   12385.6K (18.4K)
>
> What's the queue depth of these devices? That's the interesting question
> here, as it'll tell us if any of these are actually hitting the slower
> path where you made changes. 
>

Hi Jens,

The hardware queue depth is the parameter being varied in this experiment.
Each column of the tables corresponds to a different queue depth, given
in the first row (queue size) of each table.  For instance, in the
first table, for a device with hardware queue depth=2, 6.1-rc2 gave
1681.1K IOPS and the patched version gave 1721.8K IOPS.
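
To make the column-by-column reading concrete, here is a small sketch
that computes the relative IOPS change per queue depth for the first
table.  The numbers are copied from the table; the values in
parentheses are left out, and the helper names (relative_change,
compare) are made up for this illustration only.

```python
# IOPS (in thousands) copied from the first table: single-hctx nullb.
QUEUE_DEPTHS = [2, 4, 8, 16, 32, 64]
BASELINE = [1681.1, 2633.0, 6940.8, 8172.3, 8391.7, 8606.1]  # 6.1-rc2
PATCHED = [1721.8, 3016.7, 7543.0, 8132.5, 8324.2, 8401.8]   # patched


def relative_change(baseline, patched):
    """Return the change of patched over baseline, in percent."""
    return (patched - baseline) / baseline * 100.0


def compare(depths, baseline, patched):
    """Map each queue depth to its relative IOPS change in percent."""
    return {
        qd: relative_change(b, p)
        for qd, b, p in zip(depths, baseline, patched)
    }


if __name__ == "__main__":
    for qd, delta in compare(QUEUE_DEPTHS, BASELINE, PATCHED).items():
        print(f"QD={qd:<2} {delta:+.2f}%")
```

The same two helpers can be fed the rows of the second table to get
the per-depth deltas for the shared-bitmap case.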

As mentioned, I monitored the size of the sbitmap wait queues during the
benchmark execution to confirm it was indeed hitting the slow path and
queueing.  I observed less queueing at higher QDs (16, 32) and
even less at QD=64.  For QD<=8, there was extensive queueing
throughout the execution.

I should provide the queue size over time alongside the latency numbers.
I have to rerun the benchmarks anyway to collect the information
Chaitanya requested.

> I suspect you are for the second set of numbers, but not for the first
> one?

No, both tables show some level of queueing.  The shared bitmap in
table 2 certainly has much more intensive queueing, though.

> Anything that isn't hitting the wait path for tags isn't a very useful
> test, as I would not expect any changes there.

Even when there is little to no queueing (QD=64 in this data), we still
enter sbitmap_queue_wake_up and bail out on the first line, the
!ws_active check.  This is why I think it is important to include QD=64
here.  It is less interesting data, as I mentioned, but it shows no
regressions in the fastpath.
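
As a rough illustration of that early exit, the toy model below
mimics the ws_active check mentioned in the quoted patch description:
when nobody is waiting for a tag, the wake-up call returns before any
accounting happens.  This is a userspace sketch, not the kernel code;
the class, the completion_cnt field and slow_path_entries are
hypothetical names chosen for the example, and plain integers stand in
for the atomic counters.

```python
class ToySbitmapQueue:
    """Toy model only; it does not reflect the kernel's data layout."""

    def __init__(self):
        self.ws_active = 0          # wait queues that currently have waiters
        self.completion_cnt = 0     # hypothetical completion counter
        self.slow_path_entries = 0  # how often the slow path was taken

    def wake_up(self, nr=1):
        """Account nr completions, but only if someone is queued."""
        # Fast path: no waiters, so skip all accounting and return.
        if not self.ws_active:
            return False
        # Slow path: stand-in for the atomic accounting of completions.
        self.slow_path_entries += 1
        self.completion_cnt += nr
        return True


if __name__ == "__main__":
    q = ToySbitmapQueue()
    q.wake_up()        # no waiters: nothing is accounted
    q.ws_active = 1    # pretend one wait queue became active
    q.wake_up(4)       # now the completions are accounted
    print(q.completion_cnt, q.slow_path_entries)
```

In this model a QD=64 style run, where ws_active stays at zero, never
touches completion_cnt, which is the property the QD=64 column is
meant to check.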

Thanks,

-- 
Gabriel Krisman Bertazi
