From: Pratyush Yadav <ptyadav@amazon.de>
To: Keith Busch <kbusch@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>,
	"Jens Axboe" <axboe@kernel.dk>, <linux-nvme@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] nvme-pci: do not set the NUMA node of device if it has none
Date: Wed, 26 Jul 2023 21:32:33 +0200
Message-ID: <mafs08rb28o4u.fsf_-_@amazon.de>
In-Reply-To: <ZMFHEK95WGwtYbid@kbusch-mbp.dhcp.thefacebook.com> (Keith Busch's message of "Wed, 26 Jul 2023 10:17:20 -0600")

On Wed, Jul 26 2023, Keith Busch wrote:

> On Wed, Jul 26, 2023 at 05:30:33PM +0200, Pratyush Yadav wrote:
>> On Wed, Jul 26 2023, Christoph Hellwig wrote:
>> > On Wed, Jul 26, 2023 at 10:58:36AM +0300, Sagi Grimberg wrote:
>> >>>> For example, AWS EC2's i3.16xlarge instance does not expose NUMA
>> >>>> information for the NVMe devices. This means all NVMe devices have
>> >>>> NUMA_NO_NODE by default. Without this patch, random 4k read performance
>> >>>> measured via fio on CPUs from node 1 (around 165k IOPS) is almost 50%
>> >>>> less than CPUs from node 0 (around 315k IOPS). With this patch, CPUs on
>> >>>> both nodes get similar performance (around 315k IOPS).
>> >>>
>> >>> irqbalance doesn't work with this driver though: the interrupts are
>> >>> managed by the kernel. Is there some other reason to explain the perf
>> >>> difference?
>>
>> Hmm, I did not know that. I have not gone and looked at the code, but I
>> think the same reasoning should hold, just with s/irqbalance/kernel/. If
>> the kernel IRQ balancer sees the device is on node 0, it would deliver
>> its interrupts to CPUs on node 0.
>>
>> In my tests I can see that the interrupts for NVMe queues are sent only
>> to CPUs from node 0 without this patch. With this patch CPUs from both
>> nodes get the interrupts.
>
> Could you send the output of:
>
>   numactl --hardware

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 0 size: 245847 MB
node 0 free: 245211 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 1 size: 245932 MB
node 1 free: 245328 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
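The node-to-CPU mapping that numactl prints above is also exported by the
kernel in sysfs as /sys/devices/system/node/nodeN/cpulist, in range-list
form (e.g. "0-15,32-47"). Below is a minimal bash sketch for answering
"which node is CPU N on?" from those files. The function names are
hypothetical, and it assumes the sysfs node directory is present and that
each cpulist contains only plain "a" or "a-b" ranges.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: map a CPU number to its NUMA node using the
# cpulist files the kernel exports under /sys/devices/system/node.

# cpu_in_list CPU LIST
# Returns 0 if CPU appears in LIST, a cpulist such as "0-15,32-47".
cpu_in_list() {
    local cpu=$1 list=$2 range lo hi
    local IFS=,                 # split LIST on commas only
    for range in $list; do
        lo=${range%-*}          # "16-31" -> "16", "5" -> "5"
        hi=${range#*-}          # "16-31" -> "31", "5" -> "5"
        if (( cpu >= lo && cpu <= hi )); then
            return 0
        fi
    done
    return 1
}

# cpu_to_node CPU [NODE_DIR]
# Prints the node whose cpulist contains CPU, or -1 if none does.
# NODE_DIR defaults to the real sysfs location.
cpu_to_node() {
    local cpu=$1 dir=${2:-/sys/devices/system/node} f n
    for f in "$dir"/node[0-9]*/cpulist; do
        [[ -r $f ]] || continue         # skip an unmatched glob
        n=${f%/cpulist}                 # ".../node1/cpulist" -> ".../node1"
        n=${n##*/node}                  # ".../node1" -> "1"
        if cpu_in_list "$cpu" "$(<"$f")"; then
            echo "$n"
            return 0
        fi
    done
    echo -1
    return 1
}
```

On the machine above, "cpu_to_node 40" would be expected to print 0 and
"cpu_to_node 52" to print 1.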

>
> and then with and without your patch:
>
>   for i in $(cat /proc/interrupts | grep nvme0 | sed "s/^ *//g" | cut -d":" -f 1); do \
>     cat /proc/irq/$i/{smp,effective}_affinity_list; \
>   done
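The loop above prints two bare lines per IRQ (smp_affinity_list, then
effective_affinity_list), so its output has to be read in pairs. A variant
that prints one labelled line per IRQ may be easier to compare across runs.
This is a sketch only: the function name is hypothetical, and the optional
second argument exists just so it can be pointed at a copy of procfs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "IRQ <n>: smp=<list> effective=<list>" for every
# line of /proc/interrupts whose text matches DEV (same filter as grep DEV).
# Usage: irq_affinity DEV [PROC_ROOT]
irq_affinity() {
    local dev=$1 proc=${2:-/proc} irq smp eff
    # The first field of each matching line is the IRQ number plus ':'.
    for irq in $(awk -v d="$dev" '$0 ~ d { sub(":", "", $1); print $1 }' \
                     "$proc/interrupts"); do
        smp=$(<"$proc/irq/$irq/smp_affinity_list")
        eff=$(<"$proc/irq/$irq/effective_affinity_list")
        printf 'IRQ %s: smp=%s effective=%s\n' "$irq" "$smp" "$eff"
    done
}
```

For example, "irq_affinity nvme0" would list the controller's queue
interrupts with both affinity values side by side.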

Without my patch:

    $   for i in $(cat /proc/interrupts | grep nvme0 | sed "s/^ *//g" | cut -d":" -f 1); do \
    >     cat /proc/irq/$i/{smp,effective}_affinity_list; \
    >   done
    40
    40
    33
    33
    44
    44
    9
    9
    32
    32
    2
    2
    6
    6
    11
    11
    1
    1
    35
    35
    39
    39
    13
    13
    42
    42
    46
    46
    41
    41
    46
    46
    15
    15
    5
    5
    43
    43
    0
    0
    14
    14
    8
    8
    12
    12
    7
    7
    10
    10
    47
    47
    38
    38
    36
    36
    3
    3
    34
    34
    45
    45
    5
    5

With my patch:

    $   for i in $(cat /proc/interrupts | grep nvme0 | sed "s/^ *//g" | cut -d":" -f 1); do \
    >     cat /proc/irq/$i/{smp,effective}_affinity_list; \
    >   done
    9
    9
    15
    15
    5
    5
    23
    23
    38
    38
    52
    52
    21
    21
    36
    36
    13
    13
    56
    56
    44
    44
    42
    42
    31
    31
    48
    48
    5
    5
    3
    3
    1
    1
    11
    11
    28
    28
    18
    18
    34
    34
    29
    29
    58
    58
    46
    46
    54
    54
    59
    59
    32
    32
    7
    7
    56
    56
    62
    62
    49
    49
    57
    57
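To summarise the two listings above, one can count how many IRQs have their
effective affinity on each node. The sketch below assumes the topology that
numactl reported earlier in this mail (node 0 = CPUs 0-15,32-47, node 1 =
CPUs 16-31,48-63), for which the node of CPU n is simply (n / 16) % 2; that
shortcut is specific to this machine. It also assumes every effective
affinity is a single CPU, as in both listings. The function name is
hypothetical; it reads the raw loop output on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tally effective IRQ affinities per NUMA node.
# stdin: output of the smp/effective affinity loop, i.e. pairs of lines
#   <smp_affinity_list>
#   <effective_affinity_list>
# Assumes single-CPU effective affinities and the 2-node layout above,
# where node = (cpu / 16) % 2.
tally_effective_by_node() {
    awk 'NR % 2 == 0 { count[int($1 / 16) % 2]++ }   # every 2nd line = effective
         END { printf "node0=%d node1=%d\n", count[0], count[1] }'
}
```

Applied to the two listings above, this gives "node0=32 node1=0" without
the patch and "node0=16 node1=16" with it.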

-- 
Regards,
Pratyush Yadav



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879





Thread overview: 16+ messages
2023-07-25 11:06 [PATCH] nvme-pci: do not set the NUMA node of device if it has none Pratyush Yadav
2023-07-25 14:35 ` Keith Busch
2023-07-26  7:58   ` Sagi Grimberg
2023-07-26 13:14     ` Christoph Hellwig
2023-07-26 15:30       ` Pratyush Yadav
2023-07-26 16:17         ` Keith Busch
2023-07-26 19:32           ` Pratyush Yadav [this message]
2023-07-26 22:25             ` Keith Busch
2023-07-28 18:09               ` Pratyush Yadav
2023-07-28 19:34                 ` Keith Busch
2023-08-04 14:50                   ` Pratyush Yadav
2023-08-04 15:19                     ` Keith Busch
2023-08-08 15:51                       ` Pratyush Yadav
2023-08-08 16:35                         ` Keith Busch
2024-07-23  9:49 ` Maurizio Lombardi
2024-07-23 14:39   ` Christoph Hellwig
