All of lore.kernel.org
 help / color / mirror / Atom feed
From: Keith Busch <kbusch@kernel.org>
To: Hannes Reinecke <hare@suse.de>
Cc: Nilay Shroff <nilay@linux.ibm.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org, hch@lst.de, gjoyce@linux.ibm.com,
	axboe@fb.com
Subject: Re: [PATCH] nvme: find numa distance only if controller has valid numa id
Date: Mon, 15 Apr 2024 10:56:07 -0600	[thread overview]
Message-ID: <Zh1cJwMo4cntWoHS@kbusch-mbp> (raw)
In-Reply-To: <05dbae65-2cc2-40d7-9066-a83cdfdc47be@suse.de>

On Mon, Apr 15, 2024 at 04:39:45PM +0200, Hannes Reinecke wrote:
> > For calculating the distance between two nodes we invoke the function __node_distance().
> > This function would then access the numa distance table, which is typically an array with
> > valid index starting from 0. So obviously accessing this table with index of -1 would
> > deference incorrect memory location. De-referencing incorrect memory location might have
> > side effects including panic (though I didn't encounter panic). Furthermore in such a case,
> > the calculated node distance could potentially be incorrect and that might cause the nvme
> > multipath to choose a suboptimal IO path.
> > 
> > This patch may not help choosing the optimal IO path (as we assume that the node distance would be
> > LOCAL_DISTANCE in case nvme controller numa node id is -1) but it ensures that we don't access the
> > invalid memory location for calculating node distance.
> > 
> Hmm. One wonders: how does such a system work?
> The systems I know always have the PCI slots attached to the CPU
> sockets, so if the CPU is not present the NVMe device on that
> slot will be non-functional. In fact, it wouldn't be visible at
> all as the PCI lanes are not powered up.
> In your system the PCI lanes clearly are powered up, as the NVMe
> device shows up in the PCI enumeration.
> Which means you are running a rather different PCI configuration.
> Question now is: does the NVMe device _work_?
> If it does, shouldn't the NUMA node continue to be present (some kind of
> memory-less, CPU-less NUMA node ...)?
> As a side-note, we'll need these kind of configuration anyway once
> CXL switches become available...

I recall systems with IO controller attached in a shared manner to all
sockets, so memory is UMA from IO device perspecitve (it may still be
NUMA from CPU). I don't think you need to consider memory-only NUMA
nodes unless there are additional distances to consider (at which point
it's no longer UMA).


  reply	other threads:[~2024-04-15 16:56 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-13  9:04 [PATCH] nvme: find numa distance only if controller has valid numa id Nilay Shroff
2024-04-14  8:30 ` Sagi Grimberg
2024-04-14 11:02   ` Nilay Shroff
2024-04-15  8:55     ` Sagi Grimberg
2024-04-15  9:30       ` Nilay Shroff
2024-04-15 10:04         ` Sagi Grimberg
2024-04-15 14:39         ` Hannes Reinecke
2024-04-15 16:56           ` Keith Busch [this message]
2024-04-16  8:06           ` Nilay Shroff
2024-04-15  7:25 ` Christoph Hellwig
2024-04-15  7:54   ` Nilay Shroff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zh1cJwMo4cntWoHS@kbusch-mbp \
    --to=kbusch@kernel.org \
    --cc=axboe@fb.com \
    --cc=gjoyce@linux.ibm.com \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=linux-nvme@lists.infradead.org \
    --cc=nilay@linux.ibm.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.