NVDIMM Device and Persistent Memory development
* [ndctl RFC 0/3] Support poison list retrieval
From: alison.schofield @ 2022-10-13 23:39 UTC
  To: Dan Williams, Ira Weiny, Vishal Verma, Dave Jiang, Ben Widawsky
  Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

This is labeled RFC because it builds on in-flight patchsets, which
makes it unlikely that others can try it out. It depends on the
tracing support in Dave's monitor patchset [1] and on the kernel
driver support for poison in patchset [2].

The first patch adds a libcxl API for triggering the read of a
poison list from a memory device. Users of that API will need to
trace the kernel events to collect the error records.

Patches 2 & 3 offer a more convenient option, --media-errors, to
'cxl list'. With it, the poison list is read, the results are
collected and parsed, and the media error records are included in
the JSON list output.

The JSON output of 'cxl list' does not include all the same fields
that are available in the 'cxl_poison' trace event.

Trace events of 'cxl_poison' always include these fields:
region: memdev: pcidev: hpa: dpa: length: source: flags: overflow_time:

'cxl list --media-errors' omits fields that are redundant in the
context of the cxl list command:
- Do not repeat the memdev, region, or pcidev when it is
  already included in the list output.
- Only include 'hpa' when media errors are listed by region.
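
The omission rules above can be sketched as a small filter. This is a
Python illustration only, not the code in the patches (which are C):
the helper name to_list_record and its by_region parameter are
hypothetical, the event is modeled as a plain dict keyed by the trace
field names, and the 'pcidev' value is a placeholder. Keeping 'memdev'
in per-region records follows the region example in this letter.

```python
# Field names as listed for the 'cxl_poison' trace event.
TRACE_FIELDS = ("region", "memdev", "pcidev", "hpa", "dpa",
                "length", "source", "flags", "overflow_time")


def to_list_record(event, by_region):
    """Reduce a cxl_poison event (a dict) to the fields 'cxl list' shows.

    Hypothetical helper: it only illustrates the omission rules.
    """
    # Fields carried by every media-error record in the examples.
    keep = ["dpa", "length", "source", "flags", "overflow_time"]
    if by_region:
        # The enclosing object names the region but not the memdev,
        # so the record keeps 'memdev' and gains 'hpa'.
        keep = ["memdev", "hpa"] + keep
    return {name: event[name] for name in keep}


# Sample event; the 'pcidev' value is a placeholder, not real output.
event = {
    "region": "region5", "memdev": "mem5", "pcidev": "placeholder",
    "hpa": 0, "dpa": 384, "length": 256, "source": "Injected",
    "flags": "", "overflow_time": 0,
}

per_memdev = to_list_record(event, by_region=False)
per_region = to_list_record(event, by_region=True)
```

Here per_memdev has no 'hpa', 'memdev', 'region', or 'pcidev' key,
while per_region carries 'memdev' and 'hpa' ahead of the common fields.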

Examples:
# cxl list -m mem2 --media-errors
[
  {
    "memdev":"mem2",
    "pmem_size":1073741824,
    "ram_size":0,
    "serial":2,
    "host":"cxl_mem.2",
    "media_errors":{
      "nr media-errors":2,
      "media-error records":[
        {
          "dpa":64,
          "length":128,
          "source":"Injected",
          "flags":"Overflow,",
          "overflow_time":1656711046
        },
        {
          "dpa":192,
          "length":192,
          "source":"Internal",
          "flags":"Overflow,",
          "overflow_time":1656711046
        }
      ]
    }
  }
]
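
A consumer of this output might summarize the records per memdev. The
following is a minimal Python sketch under stated assumptions: the
input is abbreviated from the example above and written as strict
JSON, the key names are taken from that example (an RFC, so they may
change), and the helpers split_flags and summarize are hypothetical
names, not part of ndctl.

```python
import json
from collections import Counter

# Abbreviated from the example above, written as strict JSON.
OUTPUT = """
[
  {
    "memdev":"mem2",
    "media_errors":{
      "nr media-errors":2,
      "media-error records":[
        {"dpa":64, "length":128, "source":"Injected",
         "flags":"Overflow,", "overflow_time":1656711046},
        {"dpa":192, "length":192, "source":"Internal",
         "flags":"Overflow,", "overflow_time":1656711046}
      ]
    }
  }
]
"""


def split_flags(flags):
    """Turn a flags string such as "Overflow," into ["Overflow"]."""
    return [flag for flag in flags.split(",") if flag]


def summarize(output):
    """Return {memdev: Counter of record sources} for each memdev."""
    summary = {}
    for dev in json.loads(output):
        # A memdev without media errors may lack the key entirely;
        # that is an assumption, so default to an empty record list.
        records = dev.get("media_errors", {}).get("media-error records", [])
        summary[dev["memdev"]] = Counter(rec["source"] for rec in records)
    return summary


summary = summarize(OUTPUT)
```

For the sample input, summary maps "mem2" to one "Injected" and one
"Internal" record, and split_flags drops the empty element left by the
trailing comma in the flags string.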

# cxl list -r region5 --media-errors
[
  {
    "region":"region5",
    "resource":1035623989248,
    "size":2147483648,
    "interleave_ways":2,
    "interleave_granularity":4096,
    "decode_state":"commit",
    "media_errors":{
      "nr media-errors":2,
      "media-error records":[
        {
          "memdev":"mem2",
          "hpa":0,
          "dpa":0,
          "length":64,
          "source":"Reserved",
          "flags":"",
          "overflow_time":0
        },
        {
          "memdev":"mem5",
          "hpa":0,
          "dpa":384,
          "length":256,
          "source":"Injected",
          "flags":"",
          "overflow_time":0
        }
      ]
    }
  }
]
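
Because a region spans several memdevs, each record in the per-region
output names its memdev. A reader of that output could regroup the
records by device, as in this Python sketch. The helper name
records_by_memdev is hypothetical, the input is trimmed from the
example above, and the (dpa, length) tuples are just one convenient
shape for the result.

```python
import json
from collections import defaultdict

# Trimmed from the region example above.
REGION_OUTPUT = """
[
  {
    "region":"region5",
    "media_errors":{
      "nr media-errors":2,
      "media-error records":[
        {"memdev":"mem2", "hpa":0, "dpa":0, "length":64,
         "source":"Reserved", "flags":"", "overflow_time":0},
        {"memdev":"mem5", "hpa":0, "dpa":384, "length":256,
         "source":"Injected", "flags":"", "overflow_time":0}
      ]
    }
  }
]
"""


def records_by_memdev(region):
    """Group one region's media-error records as {memdev: [(dpa, length)]}."""
    grouped = defaultdict(list)
    for rec in region["media_errors"]["media-error records"]:
        grouped[rec["memdev"]].append((rec["dpa"], rec["length"]))
    return dict(grouped)


regions = json.loads(REGION_OUTPUT)
grouped = records_by_memdev(regions[0])
```

With the sample input, grouped holds one (dpa, length) pair for each
of mem2 and mem5.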

[1] https://lore.kernel.org/nvdimm/166363103019.3861186.3067220004819656109.stgit@djiang5-desk3.ch.intel.com/
[2] https://lore.kernel.org/linux-cxl/cover.1665606782.git.alison.schofield@intel.com/

Alison Schofield (3):
  libcxl: add interfaces for GET_POISON_LIST mailbox commands
  cxl/list: collect and parse the poison list records
  cxl/list: add --media-errors option to cxl list

 Documentation/cxl/cxl-list.txt |  66 +++++++++++
 cxl/filter.c                   |   2 +
 cxl/filter.h                   |   1 +
 cxl/json.c                     | 197 +++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.c               |  40 +++++++
 cxl/lib/libcxl.sym             |   6 +
 cxl/libcxl.h                   |   2 +
 cxl/list.c                     |   2 +
 8 files changed, 316 insertions(+)

-- 
2.37.3

