public inbox for linux-nvme@lists.infradead.org
* race between nvme device creation and discovery?
From: Daniel Wagner @ 2024-02-02 15:16 UTC
  To: linux-nvme@lists.infradead.org
  Cc: Keith Busch, Christoph Hellwig, Sagi Grimberg

I am trying to figure out why some of the blktests fail randomly when
running with FC as transport. These failures only appear when
autoconnect is running in the background, a clear indication that it
still interferes with the tests in some way.

nvme/030 fails a bit more often than the rest, which might just be
because it issues several 'nvme discover' commands, whereas many other
tests issue only one.

When a test fails, it fails with

  failed to lookup subsystem for controller nvme0

which comes from libnvme when it iterates over sysfs to gather information:

        subsysname = nvme_ctrl_lookup_subsystem_name(r, name);
        if (!subsysname) {
                nvme_msg(r, LOG_ERR,
                         "failed to lookup subsystem for controller %s\n",
                         name);
                errno = ENXIO;
                return NULL;
        }

My current theory is that adding a new controller is not atomic from
userland's point of view, so libnvme can observe a state in which the
controller exists but the matching subsystem is not yet visible.

So something like:

  nvme_init_ctrl
    cdev_device_add

  // libnvme iterates over sysfs

  nvme_init_ctrl_finish
    nvme_init_identify
      nvme_init_subsystem
         device_add          // nvme-subsys%d
         sysfs_create_link   // subsys->dev -> ctrl-device
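
To make the window concrete, here is a minimal sketch of the kind of
check that fails in that state. It assumes the subsystem is found by
scanning a subsystem directory (/sys/class/nvme-subsystem on a real
system) for an entry that contains the controller name. The helper
find_subsys_for_ctrl() and its parameters are made-up names for
illustration, not the libnvme implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: scan subsys_dir for a subsystem entry that
 * contains ctrl. Returns 0 and copies the subsystem name into out on
 * success. Returns -1 if the directory cannot be opened, or -1 with
 * errno = ENXIO if no subsystem lists the controller (yet).
 */
static int find_subsys_for_ctrl(const char *subsys_dir, const char *ctrl,
				char *out, size_t outlen)
{
	char path[4096];
	struct dirent *de;
	struct stat st;
	DIR *d;

	assert(subsys_dir && ctrl && out && outlen > 0);

	d = opendir(subsys_dir);
	if (!d)
		return -1;

	while ((de = readdir(d)) != NULL) {
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;

		snprintf(path, sizeof(path), "%s/%s/%s",
			 subsys_dir, de->d_name, ctrl);

		/* The entry only has to exist, so do not follow symlinks. */
		if (lstat(path, &st) == 0) {
			snprintf(out, outlen, "%s", de->d_name);
			closedir(d);
			return 0;
		}
	}

	closedir(d);
	errno = ENXIO;
	return -1;
}
```

If this runs between cdev_device_add and sysfs_create_link, the
controller is already visible but no subsystem entry contains it, so
the lookup ends in the ENXIO path.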

Does this make any sense? And if so, what could be done? Should we add
some retry logic to libnvme?
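
If retrying turns out to be the way to go, the lookup could be wrapped
in a short, bounded retry loop. The following is only a sketch under
that assumption: lookup_fn, retry_lookup(), fake_lookup() and the
attempt and delay values are invented for illustration and are not
existing libnvme API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical lookup callback: returns a name, or NULL if not visible yet. */
typedef const char *(*lookup_fn)(void *ctx, const char *ctrl);

/*
 * Call lookup up to max_attempts times, sleeping delay_ms between
 * failed attempts. Returns the name, or NULL with errno = ENXIO if the
 * subsystem never shows up.
 */
static const char *retry_lookup(lookup_fn lookup, void *ctx, const char *ctrl,
				int max_attempts, long delay_ms)
{
	struct timespec ts;
	const char *name;
	int i;

	assert(lookup && max_attempts > 0 && delay_ms >= 0);

	ts.tv_sec = delay_ms / 1000;
	ts.tv_nsec = (delay_ms % 1000) * 1000000L;

	for (i = 0; i < max_attempts; i++) {
		name = lookup(ctx, ctrl);
		if (name)
			return name;
		/* No point in sleeping after the last attempt. */
		if (i + 1 < max_attempts)
			nanosleep(&ts, NULL);
	}

	errno = ENXIO;
	return NULL;
}

/* Stand-in for the race: fails until the counter in ctx reaches zero. */
static const char *fake_lookup(void *ctx, const char *ctrl)
{
	int *remaining_failures = ctx;

	(void)ctrl;
	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return NULL;
	}
	return "nvme-subsys0";
}
```

A retry like this only papers over the window; it does not close it,
and a controller that never gets a subsystem still ends in ENXIO after
the last attempt.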

Daniel

