Linux Device Mapper development
From: Ken Raeburn <raeburn@redhat.com>
To: Mike Snitzer <snitzer@kernel.org>
Cc: linux-block@vger.kernel.org, vdo-devel@redhat.com,
	dm-devel@redhat.com, ebiggers@kernel.org, tj@kernel.org
Subject: Re: [dm-devel] [vdo-devel] [PATCH v2 00/39] Add the dm-vdo deduplication and compression device mapper target.
Date: Mon, 24 Jul 2023 14:03:45 -0400
Message-ID: <87mszl9ofy.fsf@redhat.com>
In-Reply-To: <ZLa086NuWiMkJKJE@redhat.com> (Mike Snitzer's message of "Tue, 18 Jul 2023 11:51:15 -0400")


(Apologies for the re-send ... I neglected to turn off HTML and so
linux-block bounced the email as spam.)

On Tue, Jul 18, 2023 at 11:51 AM Mike Snitzer <snitzer@kernel.org> wrote:

> But the long-standing dependency on VDO's work-queue data
> struct is still lingering (drivers/md/dm-vdo/work-queue.c). At a
> minimum we need to work toward pinning down _exactly_ why that is, and
> I think the best way to answer that is by simply converting the VDO
> code over to using Linux's workqueues.  If doing so causes serious
> inherent performance (or functionality) loss then we need to
> understand why -- and fix Linux's workqueue code accordingly. (I've
> cc'd Tejun so he is aware).

We tried this experiment and did indeed see significant performance
differences: nearly a 7x slowdown in some cases.

VDO can be pretty CPU-intensive. In addition to hashing and
compression, it scans some big in-memory data structures as part of
the deduplication process. Some data structures are split across
multiple "zones" to enable concurrency (usually split based on bits
of an address or something like that), but some are not, and a couple
of the threads that own those unsplit structures can sometimes exceed
50% CPU utilization, even 90%, depending on the system and test data
configuration. (Usually this is while pushing over 1 GB/s through the
deduplication and compression processing on a system with fast
storage. On a slow VM with spinning storage, the CPU load is much
smaller.)

We use a sort of message-passing arrangement where a worker thread is
responsible for updating certain data structures as needed for the
I/Os in progress, rather than having the processing of each I/O
contend for locks on those data structures. It gives us good
throughput under load, but it does mean upwards of a dozen handoffs
per 4 kB write, depending on compressibility, whether the block is a
duplicate, and various other factors. So processing 1 GB/s means
handling over 3 million messages per second, though each step of
processing is generally lightweight. For our dedicated worker threads,
it's not unusual for a thread to wake up and process a few tens or
even hundreds of updates to its data structures (likely benefiting
from CPU caching of those structures) before running out of available
work and going back to sleep.
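
(Again purely as an illustrative sketch with invented names: the
handoff pattern is roughly a lock-free inbox per dedicated thread,
drained in batches before the thread sleeps, along these lines.)

#include <linux/kthread.h>
#include <linux/llist.h>
#include <linux/wait.h>

struct message {
	struct llist_node node;
	/* ... the update to apply to this thread's structures ... */
};

static LLIST_HEAD(inbox);
static DECLARE_WAIT_QUEUE_HEAD(inbox_wait);

static void process_message(struct message *msg); /* hypothetical handler */

/* producer side: one "handoff" is a push plus a wakeup */
static void post_message(struct message *msg)
{
	llist_add(&msg->node, &inbox);
	wake_up(&inbox_wait);
}

/* dedicated worker: drain everything available, then sleep */
static int worker_fn(void *unused)
{
	while (!kthread_should_stop()) {
		struct llist_node *batch = llist_del_all(&inbox);
		struct message *msg, *next;

		/* note: llist_del_all hands back the batch in LIFO order */
		llist_for_each_entry_safe(msg, next, batch, node)
			process_message(msg);

		wait_event_interruptible(inbox_wait,
					 !llist_empty(&inbox) ||
					 kthread_should_stop());
	}
	return 0;
}

The batching is where the cache benefit comes from: the worker's data
structures stay hot across tens or hundreds of updates per wakeup.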

The experiment I ran was to create an ordered workqueue in place of
each dedicated thread where we need serialization, and unordered
workqueues where concurrency is allowed. On our slower test systems
(a >10-year-old Supermicro with a Xeon E5-1650 v2 and RAID-0 storage
using SSDs or HDDs), the slowdown was less significant (under 2x),
but on our faster system (a 4-5? year old Supermicro 1029P-WTR, 2x
Xeon Gold 6128 = 12 cores, NVMe storage) we got nearly a 7x slowdown
overall. I haven't yet dug deeply into _why_ the kernel workqueues
are slower in this sort of setup. I did run "perf top" briefly during
one test with kernel workqueues, and the largest single use of CPU
cycles was in spin lock acquisition, but I didn't get call graphs.
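
(Schematically, the conversion looked like the sketch below; the
names are invented, but the workqueue calls are the stock kernel API.)

#include <linux/errno.h>
#include <linux/workqueue.h>

/*
 * Sketch of the experiment: an ordered workqueue stands in for each
 * dedicated thread that needs serialization, and a plain unbound
 * workqueue is used where concurrency is allowed.
 */
static struct workqueue_struct *serial_wq;
static struct workqueue_struct *parallel_wq;

struct vdo_msg {
	struct work_struct work;
	/* ... payload ... */
};

static void apply_update(struct vdo_msg *msg); /* hypothetical */

static void msg_handler(struct work_struct *work)
{
	struct vdo_msg *msg = container_of(work, struct vdo_msg, work);

	apply_update(msg); /* lightweight update, possibly another handoff */
}

static int setup_queues(void)
{
	serial_wq = alloc_ordered_workqueue("vdo_serial", 0);
	parallel_wq = alloc_workqueue("vdo_parallel", WQ_UNBOUND, 0);
	if (!serial_wq || !parallel_wq)
		return -ENOMEM;
	return 0;
}

/* each queue_work() here replaces one thread-to-thread handoff */
static void post(struct vdo_msg *msg)
{
	INIT_WORK(&msg->work, msg_handler);
	queue_work(serial_wq, &msg->work);
}

At over 3 million messages per second, even a small per-queue_work()
cost gets multiplied quickly, which would be consistent with the spin
lock time showing up in perf.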

(This was with Fedora 37 6.2.12-200 and 6.2.15-200 kernels, without
the latest submissions from Tejun, which look interesting, though I
suspect we care more about cache locality for some of our
thread-specific data structures than for access to the I/O
structures.)

Ken

