Linux CXL
From: Alison Schofield <alison.schofield@intel.com>
To: Vishal Verma <vishal.l.verma@intel.com>
Cc: nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org
Subject: Re: [ndctl PATCH v11 0/7] Support poison list retrieval
Date: Wed, 20 Mar 2024 13:42:08 -0700
Message-ID: <ZftKIJSSIm4cZUje@aschofie-mobl2>
In-Reply-To: <cover.1710386468.git.alison.schofield@intel.com>

On Wed, Mar 13, 2024 at 09:05:16PM -0700, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>

Asking folks to share this with future users of the poison list
feature of ndctl, i.e. cxl list --media-errors

I'd like to get additional 'user' input on the json output provided by
this --media-errors option to cxl-list. After a few iterations of what
should be included in the cxl-list output, I'm not so sure that we've
captured sufficient input from potential users. (Since they typically
won't use this until it's released in ndctl.)

To guide your thinking, recall that users can retrieve a device's poison
list now without any cxl-cli (ndctl) tool support. Users can trigger
the collection via sysfs and see the results in the trace logs like:

this:
- echo 1 > /sys/kernel/tracing/events/cxl/cxl_poison/enable
- echo 1 > /sys/bus/cxl/devices/memX/trigger_poison_list
- Examine the cxl_poison events in the trace file at 

or this:
- cxl monitor --daemon --log=<poison-log-path>
- echo 1 > /sys/bus/cxl/devices/memX/trigger_poison_list
- Examine the cxl_poison events in the monitor log

or this:
- enable tp_printk
- echo 1 > /sys/kernel/tracing/events/cxl/cxl_poison/enable
- echo 1 > /sys/bus/cxl/devices/memX/trigger_poison_list
- Examine the cxl_poison events in the dmesg log

So, a few ways to get at this cxl_poison trace data:
memdev=mem9
host=cxl_mem.5 
serial=5 
trace_type=List 
region=region5 
region_uuid=99352a43-44cb-405d-85c9-fdbd971455d8
hpa=0xf110001000
dpa=0x40000000
dpa_length=0x40
source=Injected
flags=
overflow_time=0
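
For anyone consuming these events directly, the key=value fields above
can be turned into a dictionary with a few lines of Python. This is only
a sketch, not part of ndctl: the function name parse_poison_fields is
hypothetical, and it assumes the fields arrive as whitespace-separated
key=value tokens as listed above.

```python
def parse_poison_fields(text):
    """Split whitespace-separated key=value tokens into a dict.

    Tokens without '=' are ignored; an empty value (e.g. 'flags=')
    is kept as an empty string.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


# Sample taken from the cxl_poison fields listed above.
sample = """memdev=mem9 host=cxl_mem.5 serial=5 trace_type=List region=region5
region_uuid=99352a43-44cb-405d-85c9-fdbd971455d8 hpa=0xf110001000
dpa=0x40000000 dpa_length=0x40 source=Injected flags= overflow_time=0"""

record = parse_poison_fields(sample)

# The numeric fields are hex strings; convert them explicitly.
dpa = int(record["dpa"], 16)
dpa_length = int(record["dpa_length"], 16)
print(record["memdev"], hex(dpa), dpa_length)
```

Note that dpa_length=0x40 in the trace is the same 64 bytes that the
json listings below report as "length":64.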

The tool should be providing a better experience than the sysfs/trace.
The tool does look up memdevs contributing to a region and triggers
the needed poison list reads, so that's a small convenience. Its
usefulness needs to extend to the json listing.

Here's the history of json output pulled from the patch cover letters.
It's long, but I didn't want to omit any detail.

I've appended here the history of changes to the output.
Only including samples where the json output actually changed.
I'm including it to spur conversation, not as a guideline.

Subject: [ndctl PATCH v11 0/7] Support poison list retrieval

           # cxl list -m mem9 --media-errors -u
           {
             "memdev":"mem9",
             "pmem_size":"1024.00 MiB (1073.74 MB)",
             "pmem_qos_class":42,
             "ram_size":"1024.00 MiB (1073.74 MB)",
             "ram_qos_class":42,
             "serial":"0x5",
             "numa_node":1,
             "host":"cxl_mem.5",
             "media_errors":[
               {
                 "offset":"0x40000000",
                 "length":64,
                 "source":"Injected"
               }
             ]
           }


           # cxl list -r region5 --media-errors -u
           {
             "region":"region5",
             "resource":"0xf110000000",
             "size":"2.00 GiB (2.15 GB)",
             "type":"pmem",
             "interleave_ways":2,
             "interleave_granularity":4096,
             "decode_state":"commit",
             "media_errors":[
               {
                 "offset":"0x1000",
                 "length":64,
                 "source":"Injected"
               },
               {
                 "offset":"0x2000",
                 "length":64,
                 "source":"Injected"
               }
             ]
           }
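
A possible way to relate this listing back to the trace data: if the
region "offset" is taken to be relative to the region "resource" (an
assumption here, not something the sample itself states), then adding
the two gives an absolute address. The sketch below is illustrative
only; the helper name to_int is hypothetical and the json is trimmed
to the fields it uses.

```python
import json


def to_int(value):
    """Accept either a plain integer or a string such as "0x1000"."""
    if isinstance(value, int):
        return value
    return int(value, 0)  # base 0 honours the 0x prefix


# Trimmed copy of the v11 region listing shown above (-u output).
listing = json.loads("""
{
  "region":"region5",
  "resource":"0xf110000000",
  "media_errors":[
    {"offset":"0x1000", "length":64, "source":"Injected"},
    {"offset":"0x2000", "length":64, "source":"Injected"}
  ]
}
""")

base = to_int(listing["resource"])

# Assumption: offset is region-relative, so base + offset is an HPA.
hpas = [base + to_int(err["offset"]) for err in listing["media_errors"]]
for hpa, err in zip(hpas, listing["media_errors"]):
    print(hex(hpa), err["length"], err["source"])
```

Under that assumption the first record works out to 0xf110001000, which
is the hpa value shown in the cxl_poison trace fields earlier.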


Subject: [ndctl PATCH v7 0/7] Support poison list retrieval

# cxl list -m mem1 --media-errors
[
  {
    "memdev":"mem1",
    "pmem_size":1073741824,
    "ram_size":1073741824,
    "serial":1,
    "numa_node":1,
    "host":"cxl_mem.1",
    "media_errors":[
      {
        "dpa":0,
        "length":64,
        "source":"Internal"
      },
      {
        "decoder":"decoder10.0",
        "hpa":1035355557888,
        "dpa":1073741824,
        "length":64,
        "source":"External"
      },
      {
        "decoder":"decoder10.0",
        "hpa":1035355566080,
        "dpa":1073745920,
        "length":64,
        "source":"Injected"
      }
    ]
  }
]

# cxl list -r region5 --media-errors
[
  {
    "region":"region5",
    "resource":1035355553792,
    "size":2147483648,
    "type":"pmem",
    "interleave_ways":2,
    "interleave_granularity":4096,
    "decode_state":"commit",
    "media_errors":[
      {
        "decoder":"decoder10.0",
        "hpa":1035355557888,
        "dpa":1073741824,
        "length":64,
        "source":"External"
      },
      {
        "decoder":"decoder8.1",
        "hpa":1035355566080,
        "dpa":1073745920,
        "length":64,
        "source":"Internal"
      }
    ]
  }
]
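
Because the v7 output without -u reports plain decimal integers, a
consumer can use the values directly. The following sketch (not part of
ndctl; the helper name in_region is hypothetical, and the json is
trimmed to the fields it uses) checks that each reported hpa range lies
inside the region and prints the addresses in hex for comparison with
the -u style listings.

```python
import json

# Trimmed copy of the v7 region listing shown above.
listing = json.loads("""
[
  {
    "region":"region5",
    "resource":1035355553792,
    "size":2147483648,
    "media_errors":[
      {"decoder":"decoder10.0", "hpa":1035355557888,
       "dpa":1073741824, "length":64, "source":"External"},
      {"decoder":"decoder8.1", "hpa":1035355566080,
       "dpa":1073745920, "length":64, "source":"Internal"}
    ]
  }
]
""")


def in_region(region, err):
    """True if [hpa, hpa + length) lies within the region's HPA range."""
    start = region["resource"]
    end = start + region["size"]
    return start <= err["hpa"] and err["hpa"] + err["length"] <= end


results = []
for region in listing:
    for err in region["media_errors"]:
        ok = in_region(region, err)
        results.append(ok)
        print(err["decoder"], hex(err["hpa"]), hex(err["dpa"]), ok)
```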

Subject: [ndctl PATCH v6 0/7] Support poison list retrieval

# cxl list -m mem1 --media-errors
[
  {
    "memdev":"mem1",
    "pmem_size":1073741824,
    "ram_size":1073741824,
    "serial":1,
    "numa_node":1,
    "host":"cxl_mem.1",
    "media_errors":[
      {
        "dpa":0,
        "dpa_length":64,
        "source":"Injected"
      },
      {
        "region":"region5",
        "dpa":1073741824,
        "dpa_length":64,
        "hpa":1035355557888,
        "source":"Injected"
      },
      {
        "region":"region5",
        "dpa":1073745920,
        "dpa_length":64,
        "hpa":1035355566080,
        "source":"Injected"
      }
    ]
  }
]

# cxl list -r region5 --media-errors
[
  {
    "region":"region5",
    "resource":1035355553792,
    "size":2147483648,
    "type":"pmem",
    "interleave_ways":2,
    "interleave_granularity":4096,
    "decode_state":"commit",
    "media_errors":[
      {
        "memdev":"mem1",
        "dpa":1073741824,
        "dpa_length":64,
        "hpa":1035355557888,
        "source":"Injected"
      },
      {
        "memdev":"mem1",
        "dpa":1073745920,
        "dpa_length":64,
        "hpa":1035355566080,
        "source":"Injected"
      }
    ]
  }
]

Subject: [ndctl PATCH v2 0/3] Support poison list retrieval

Example: By memdev
cxl list -m mem1 --poison -u
{
  "memdev":"mem1",
  "pmem_size":"1024.00 MiB (1073.74 MB)",
  "ram_size":"1024.00 MiB (1073.74 MB)",
  "serial":"0x1",
  "numa_node":1,
  "host":"cxl_mem.1",
  "poison":{
    "nr_poison_records":4,
    "poison_records":[
      {
        "dpa":"0x40000000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      },
      {
        "dpa":"0x40001000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      },
      {
        "dpa":"0",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      },
      {
        "dpa":"0x600",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      }
    ]
  }
}

Example: By region
cxl list -r region5 --poison -u
{
  "region":"region5",
  "resource":"0xf110000000",
  "size":"2.00 GiB (2.15 GB)",
  "type":"pmem",
  "interleave_ways":2,
  "interleave_granularity":4096,
  "decode_state":"commit",
  "poison":{
    "nr_poison_records":2,
    "poison_records":[
      {
        "memdev":"mem1",
        "region":"region5",
        "hpa":"0xf110001000",
        "dpa":"0x40000000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      },
      {
        "memdev":"mem0",
        "region":"region5",
        "hpa":"0xf110000000",
        "dpa":"0x40000000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      }
    ]
  }
}


Example: By memdev and coincidentally in a region
# cxl list -m mem0 --poison -u
{
  "memdev":"mem0",
  "pmem_size":"1024.00 MiB (1073.74 MB)",
  "ram_size":"1024.00 MiB (1073.74 MB)",
  "serial":"0",
  "numa_node":0,
  "host":"cxl_mem.0",
  "poison":{
    "nr_poison_records":1,
    "poison_records":[
      {
        "region":"region5",
        "hpa":"0xf110000000",
        "dpa":"0x40000000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      }
    ]
  }
}


Example: No poison found
cxl list -m mem9 --poison -u
{
  "memdev":"mem9",
  "pmem_size":"1024.00 MiB (1073.74 MB)",
  "ram_size":"1024.00 MiB (1073.74 MB)",
  "serial":"0x9",
  "numa_node":1,
  "host":"cxl_mem.9",
  "poison":{
    "nr_poison_records":0
  }
}


Thread overview: 29+ messages
2024-03-14  4:05 [ndctl PATCH v11 0/7] Support poison list retrieval alison.schofield
2024-03-14  4:05 ` [ndctl PATCH v11 1/7] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
2024-03-18 17:51   ` fan
2024-03-18 20:11     ` Alison Schofield
2024-03-18 21:01       ` Dan Williams
2024-03-19 16:43         ` Alison Schofield
2024-03-14  4:05 ` [ndctl PATCH v11 2/7] cxl/event_trace: add an optional pid check to event parsing alison.schofield
2024-03-14  4:05 ` [ndctl PATCH v11 3/7] cxl/event_trace: support poison context in " alison.schofield
2024-03-14  4:05 ` [ndctl PATCH v11 4/7] cxl/event_trace: add helpers to retrieve tep fields by type alison.schofield
2024-03-15 15:44   ` Dave Jiang
2024-03-15 17:39   ` Dan Williams
2024-03-18 17:28     ` Alison Schofield
2024-03-18 21:21   ` fan
2024-03-14  4:05 ` [ndctl PATCH v11 5/7] cxl/list: collect and parse media_error records alison.schofield
2024-03-15 16:16   ` Dave Jiang
2024-03-20 20:24     ` Alison Schofield
2024-03-14  4:05 ` [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list alison.schofield
2024-03-15  1:09   ` Wonjae Lee
2024-03-15  2:36     ` Alison Schofield
2024-03-15  3:35       ` Dan Williams
2024-03-20 20:40         ` Alison Schofield
2024-03-27 19:48         ` Alison Schofield
2024-04-18 20:12           ` Alison Schofield
2024-03-15 16:41   ` Dave Jiang
2024-03-14  4:05 ` [ndctl PATCH v11 7/7] cxl/test: add cxl-poison.sh unit test alison.schofield
2024-03-15 17:03   ` Dave Jiang
2024-03-15 23:03   ` Wonjae Lee
2024-03-18 17:17     ` Alison Schofield
2024-03-20 20:42 ` Alison Schofield [this message]
