Linux-NVME Archive on lore.kernel.org
From: Chaitanya Kulkarni <chaitanyak@nvidia.com>
To: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Ramanan Govindarajan <ramanan.govindarajan@oracle.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	Paul Webb <paul.x.webb@oracle.com>,
	Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>,
	"axboe@kernel.dk" <axboe@kernel.dk>
Subject: Re: [bug-report] 5-9% FIO randomwrite ext4 perf regression on 6.12.y kernel
Date: Thu, 21 Nov 2024 00:00:20 +0000
Message-ID: <0cfbfcf6-08f5-4d1b-82c4-729db9198896@nvidia.com>
In-Reply-To: <392209D9-5AC6-4FDE-8D84-FB8A82AD9AEF@oracle.com>

On 11/20/24 13:35, Saeed Mirzamohammadi wrote:
> Hi,
>
> I’m reporting a performance regression of up to 9-10% with FIO randomwrite benchmark on ext4 comparing 6.12.0-rc2 kernel and v5.15.161. Also, standard deviation after this change grows up to 5-6%.
>
> Bisect root cause commit
> ===================
> - commit 63dfa1004322 ("nvme: move NVME_QUIRK_DEALLOCATE_ZEROES out of nvme_config_discard”)
>
>
> Test details
> =========
> - readwrite=randwrite bs=4k size=1G ioengine=libaio iodepth=16 direct=1 time_based=1 ramp_time=180 runtime=1800 randrepeat=1 gtod_reduce=1
> - Test is on ext4 filesystem
> - System has 4 NVMe disks
>

Thanks a lot for the report. To narrow down this problem, could you
please:

1. Run the same test on the raw NVMe device (/dev/nvme0n1) that you
    used for this benchmark?
2. Run the same test on an XFS-formatted NVMe device instead of ext4?

This way we will know whether the issue is specific to ext4, whether
other file systems suffer from it too, or whether it lies below the file
system layer, e.g. in the block layer or the NVMe PCI driver.

It would also help if you could repeat these measurements with fio's
io_uring engine (ioengine=io_uring), to find out whether the issue is
specific to the I/O engine.

Looking at the commit [1], quoted below, it sets the maximum
write-zeroes sectors limit to UINT_MAX only if
NVME_QUIRK_DEALLOCATE_ZEROES is set; otherwise it uses the controller's
max write-zeroes value (ctrl->max_zeroes_sectors).
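To make that limit selection concrete, here is a small userspace C model (not kernel code). MODEL_QUIRK_DEALLOCATE_ZEROES, model_max_write_zeroes() and model_nr_write_zeroes_cmds() are hypothetical names, and the quirk's real bit value is not reproduced. The second function only illustrates how the chosen limit caps the size of each write-zeroes command; it ignores every other cap the block layer applies, so treat it as an illustration, not a prediction of real command counts.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for NVME_QUIRK_DEALLOCATE_ZEROES; the real bit
 * value in the NVMe driver is not reproduced here. */
#define MODEL_QUIRK_DEALLOCATE_ZEROES (1UL << 0)

/* Limit the patched nvme_update_disk_info() would pass on:
 * UINT_MAX when the quirk is set, otherwise the controller's
 * max_zeroes_sectors. */
static unsigned int model_max_write_zeroes(unsigned long quirks,
					   unsigned int ctrl_max_zeroes)
{
	if (quirks & MODEL_QUIRK_DEALLOCATE_ZEROES)
		return UINT_MAX;
	return ctrl_max_zeroes;
}

/* Number of write-zeroes commands needed for nr_sects sectors if each
 * command were capped only by 'limit' (all other caps ignored). */
static uint64_t model_nr_write_zeroes_cmds(uint64_t nr_sects,
					   unsigned int limit)
{
	/* A limit of 0 means "no offload"; the fallback is not modeled. */
	assert(limit > 0);
	return (nr_sects + limit - 1) / limit;
}
```

For example, with a controller limit of 256 sectors this model splits a 2048-sector zeroout into 8 commands, while the quirk's UINT_MAX lets it go out as one.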

So I'm not sure how this commit can slow things down, unless the
behavior of write-zeroes has changed: instead of being offloaded
(REQ_OP_WRITE_ZEROES), it may now be falling back to REQ_OP_WRITE with
the ZERO_PAGE when called from ext4 via sb_issue_zeroout():

fs/ext4/ialloc.c ext4_init_inode_table        sb_issue_zeroout()
fs/ext4/inode.c  ext4_issue_zeroout           sb_issue_zeroout()
fs/ext4/resize.c setup_new_flex_group_blocks  sb_issue_zeroout()
fs/ext4/resize.c setup_new_flex_group_blocks  sb_issue_zeroout()
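The two possible paths for these callers can be sketched as a rough userspace model. It assumes, as we read the block layer, that sb_issue_zeroout() converts file-system blocks to 512-byte sectors and calls blkdev_issue_zeroout() with no flags, and that blkdev_issue_zeroout() only offloads when the queue's max_write_zeroes_sectors is non-zero. MODEL_ZERO_NOFALLBACK stands in for BLKDEV_ZERO_NOFALLBACK, and every name below is hypothetical. The model also ignores a write-zeroes command failing at run time, which may also trigger the zero-page fallback.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SECTOR_SHIFT 9            /* 512-byte sectors */
#define MODEL_ZERO_NOFALLBACK (1u << 0) /* hypothetical stand-in flag */

enum model_zero_path {
	MODEL_PATH_WRITE_ZEROES, /* offload: REQ_OP_WRITE_ZEROES */
	MODEL_PATH_ZERO_PAGES,   /* fallback: REQ_OP_WRITE of ZERO_PAGE */
	MODEL_PATH_UNSUPPORTED,  /* no offload and fallback not allowed */
};

/* File-system blocks to sectors, the shift sb_issue_zeroout() applies
 * using the superblock's s_blocksize_bits. */
static uint64_t model_blocks_to_sectors(uint64_t nr_blocks,
					unsigned int blocksize_bits)
{
	assert(blocksize_bits >= MODEL_SECTOR_SHIFT);
	return nr_blocks << (blocksize_bits - MODEL_SECTOR_SHIFT);
}

/* Which path a zeroout would take, given the queue's
 * max_write_zeroes_sectors and the caller's flags. */
static enum model_zero_path model_pick_zero_path(unsigned int max_wz_sectors,
						 unsigned int flags)
{
	if (max_wz_sectors > 0)
		return MODEL_PATH_WRITE_ZEROES;
	if (flags & MODEL_ZERO_NOFALLBACK)
		return MODEL_PATH_UNSUPPORTED;
	return MODEL_PATH_ZERO_PAGES;
}
```

With 4 KiB ext4 blocks (s_blocksize_bits = 12), each block maps to 8 sectors; in this model a zero max_write_zeroes_sectors with no flags selects the zero-page write path.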

-ck

From 63dfa1004322d596417f23da43cdc43cf6298c71 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 4 Mar 2024 07:04:46 -0700
Subject: [PATCH] nvme: move NVME_QUIRK_DEALLOCATE_ZEROES out of
  nvme_config_discard

Move the handling of the NVME_QUIRK_DEALLOCATE_ZEROES quirk out of
nvme_config_discard so that it is combined with the normal write_zeroes
limit handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6ae9aedf7bc2..a6c0b2f4cf79 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1816,9 +1816,6 @@ static void nvme_config_discard(struct nvme_ctrl *ctrl, struct gendisk *disk,
 	else
 		blk_queue_max_discard_segments(queue, NVME_DSM_MAX_RANGES);
 	queue->limits.discard_granularity = queue_logical_block_size(queue);
-
-	if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
-		blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
 }
 
 static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
@@ -2029,8 +2026,12 @@ static void nvme_update_disk_info(struct nvme_ctrl *ctrl, struct gendisk *disk,
 	set_capacity_and_notify(disk, capacity);
 
 	nvme_config_discard(ctrl, disk, head);
-	blk_queue_max_write_zeroes_sectors(disk->queue,
-			ctrl->max_zeroes_sectors);
+
+	if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
+		blk_queue_max_write_zeroes_sectors(disk->queue, UINT_MAX);
+	else
+		blk_queue_max_write_zeroes_sectors(disk->queue,
+				ctrl->max_zeroes_sectors);
 }
 
 static bool nvme_ns_is_readonly(struct nvme_ns *ns, struct nvme_ns_info *info)
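Reading only the two hunks above, the call order matters: before the patch, the UINT_MAX set at the tail of nvme_config_discard() appears to be overwritten by the unconditional call that follows it in nvme_update_disk_info(). A minimal userspace model of both orderings follows, assuming nvme_config_discard() reaches its tail and nothing else touches the limit in between. All names are hypothetical, the quirk bit value is invented, and model_set_max_wz() is treated as a plain store.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_QUIRK_DEALLOCATE_ZEROES (1UL << 0) /* hypothetical bit */

struct model_limits {
	unsigned int max_write_zeroes_sectors;
};

/* Stand-in for blk_queue_max_write_zeroes_sectors(), modeled as a
 * plain store, so the last call wins. */
static void model_set_max_wz(struct model_limits *lim, unsigned int v)
{
	assert(lim);
	lim->max_write_zeroes_sectors = v;
}

/* Call order before the patch. */
static unsigned int model_before(unsigned long quirks, unsigned int ctrl_max)
{
	struct model_limits lim = { 0 };

	/* tail of the old nvme_config_discard() */
	if (quirks & MODEL_QUIRK_DEALLOCATE_ZEROES)
		model_set_max_wz(&lim, UINT_MAX);
	/* old unconditional call in nvme_update_disk_info() */
	model_set_max_wz(&lim, ctrl_max);
	return lim.max_write_zeroes_sectors;
}

/* Call order after the patch. */
static unsigned int model_after(unsigned long quirks, unsigned int ctrl_max)
{
	struct model_limits lim = { 0 };

	if (quirks & MODEL_QUIRK_DEALLOCATE_ZEROES)
		model_set_max_wz(&lim, UINT_MAX);
	else
		model_set_max_wz(&lim, ctrl_max);
	return lim.max_write_zeroes_sectors;
}
```

If that reading holds, only controllers with NVME_QUIRK_DEALLOCATE_ZEROES would see a different effective limit after the commit; without the quirk both orderings yield ctrl->max_zeroes_sectors.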



Thread overview: 19+ messages
2024-11-20 21:35 [bug-report] 5-9% FIO randomwrite ext4 perf regression on 6.12.y kernel Saeed Mirzamohammadi
2024-11-21  0:00 ` Chaitanya Kulkarni [this message]
2024-11-21  1:20   ` Jens Axboe
2024-11-21  4:57     ` Christoph Hellwig
2024-11-21 14:48       ` Jens Axboe
2024-11-21 11:30     ` Phil Auld
2024-11-21 14:49       ` Jens Axboe
     [not found]         ` <181bcb70-e0bf-4024-80b7-e79276d6eaf7@oracle.com>
2024-11-21 21:19           ` [External] : " Phil Auld
2024-11-22 12:13           ` Christoph Hellwig
2024-11-22 17:18             ` Paul Webb
2024-11-22 18:26               ` Saeed Mirzamohammadi
2024-11-22 21:09                 ` Keith Busch
2024-11-25  6:46                   ` Christoph Hellwig
2024-11-25 18:28                   ` Saeed Mirzamohammadi
2024-11-26  4:55                     ` Christoph Hellwig
2024-11-26 18:06                       ` Saeed Mirzamohammadi
2024-11-26 18:09                         ` Christoph Hellwig
2024-11-26 18:13                           ` Saeed Mirzamohammadi
2024-11-22 17:13         ` Paul Webb
