From: Gregory Price <gregory.price@memverge.com>
To: "Verma, Vishal L" <vishal.l.verma@intel.com>
Cc: "Williams, Dan J" <dan.j.williams@intel.com>,
	"Jonathan.Cameron@huawei.com" <Jonathan.Cameron@huawei.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>
Subject: Re: [GIT preview] for-6.3/cxl-ram-region
Date: Tue, 31 Jan 2023 18:03:53 -0500
Message-ID: <Y9meWfDiCGbca4nP@memverge.com>
In-Reply-To: <73ef066b15c5551087da3667398f462d427d3204.camel@intel.com>

On Tue, Jan 31, 2023 at 08:24:19PM +0000, Verma, Vishal L wrote:
> On Tue, 2023-01-31 at 19:46 +0000, Verma, Vishal L wrote:
> > On Tue, 2023-01-31 at 14:03 -0500, Gregory Price wrote:
> > > 
> > > 
> > > Right now I believe this is failing due to the interleave and size not
> > > having default values
> > > 
> > > ./cxl create-region -m -t ram -d decoder0.0 -w 1 -g 4096 mem0
> > > cxl region: create_region: create_region: unable to determine region size
> > > cxl region: cmd_create_region: created 0 regions
> > > 
> > > 
> > > appears to be due to this code
> > > static int create_region(struct cxl_ctx *ctx, int *count,
> > >              struct parsed_params *p)
> > > {
> > > // ... snip ...
> > >     rc = create_region_validate_config(ctx, p);
> > >     if (rc)
> > >         return rc;
> > > 
> > >     if (p->size) {
> > >         size = p->size;
> > >         default_size = false;
> > >     } else if (p->ep_min_size) {
> > >         size = p->ep_min_size * p->ways;
> > > **    } else {
> > > **        log_err(&rl, "%s: unable to determine region size\n", __func__);
> > > **        return -ENXIO;
> > > **    }
> > > 
> > > So both size and ep_min_size are 0 here
> > > 
> > > echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_region
> > > cat /sys/bus/cxl/devices/region0/interleave_ways
> > > 0
> > > cat /sys/bus/cxl/devices/region0/interleave_granularity
> > > 0
> > > cat /sys/bus/cxl/devices/region0/size
> > > 0
> > 
> > Ah - this revealed an actual bug in these commits - the size and
> > ep_min_size don't refer to the region's size, it is the capacity of the
> > component memdevs. Right after create_ram_region, the region size is
> > expected to be zero.
> > 
> > However the bug here was a pmem assumption I had missed. When
> > determining sizes, we only look at pmem capacity, which is wrong. It
> > happened to work in my testing because the memdevs I used had both pmem
> > and ram capacity. I'll update with a fix shortly. Thanks for trying it
> > out and reporting this!
> 
> I've updated the branch now with a fix for this.
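
For anyone following along, the sizing problem described above (only pmem
capacity being considered when deriving a default region size) can be
sketched roughly as below. This is a minimal illustration under stated
assumptions, not the actual cxl-cli fix: enum region_type, struct
memdev_caps, memdev_capacity(), ep_min_size() and region_size() are all
hypothetical names made up for this example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real cxl-cli types. */
enum region_type { REGION_RAM, REGION_PMEM };

struct memdev_caps {
	uint64_t ram_size;	/* volatile capacity, in bytes */
	uint64_t pmem_size;	/* persistent capacity, in bytes */
};

/* Capacity of one memdev that is usable for the requested region type. */
static uint64_t memdev_capacity(const struct memdev_caps *m,
				enum region_type type)
{
	return type == REGION_RAM ? m->ram_size : m->pmem_size;
}

/* Smallest usable capacity across all targets (0 if there are none). */
static uint64_t ep_min_size(const struct memdev_caps *devs, int ways,
			    enum region_type type)
{
	uint64_t min = 0;

	for (int i = 0; i < ways; i++) {
		uint64_t cap = memdev_capacity(&devs[i], type);

		if (i == 0 || cap < min)
			min = cap;
	}
	return min;
}

/*
 * Pick the region size: an explicit request wins, otherwise fall back to
 * the smallest endpoint capacity of the matching type times the ways.
 */
static int region_size(uint64_t requested, const struct memdev_caps *devs,
		       int ways, enum region_type type, uint64_t *size)
{
	uint64_t min;

	if (ways <= 0)
		return -EINVAL;
	if (requested) {
		*size = requested;
		return 0;
	}
	min = ep_min_size(devs, ways, type);
	if (!min)
		return -ENXIO;	/* no usable capacity of this type */
	*size = min * (uint64_t)ways;
	return 0;
}
```

With a ram-only memdev, asking for a REGION_RAM size succeeds, while asking
for REGION_PMEM with no explicit size returns -ENXIO, which matches the
"unable to determine region size" failure I hit earlier.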

Progress! But now I've found a kernel segfault :D
(sorry about the jumble here, looks like multiple issues)

[root@fedora cxl]# ./cxl create-region -m -t ram -d decoder0.0 -w 1 -g 4096 mem0
[  170.675334] cxl_region region0: Failed to synchronize CPU cache state
libcxl: cxl_region_enable: region0: failed to enable
cxl region: create_region: region0: failed to enable: No such device or address
[  170.682496] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  170.691163] #PF: supervisor instruction fetch in kernel mode
[  170.703916] #PF: error_code(0x0010) - not-present page
[  170.719709] PGD 800000004d25d067 P4D 800000004d25d067 PUD 4cdf3067 PMD 0
[  170.725436] Oops: 0010 [#1] PREEMPT SMP PTI
[  170.734510] CPU: 0 PID: 717 Comm: cxl Not tainted 6.2.0-rc2+ #19
[  170.739750] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[  170.747119] RIP: 0010:0x0
[  170.751110] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  170.757699] RSP: 0018:ffffb9a3c0e97c60 EFLAGS: 00010296
[  170.766091] RAX: 0000000000000000 RBX: ffff9c38e459de60 RCX: 0000000000000000
[  170.772499] RDX: 0000000000000000 RSI: ffff9c38e42ecdb0 RDI: ffff9c390f11d400
[  170.778300] RBP: ffff9c38eed38000 R08: 0000000000000001 R09: ffffb9a3c0e97b38
[  170.783787] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9c393d8c8c00
[  170.788009] R13: ffff9c390f141c00 R14: ffff9c38eed38340 R15: ffff9c38c1a01400
[  170.795938] FS:  00007ff89ca037c0(0000) GS:ffff9c393dc00000(0000) knlGS:0000000000000000
[  170.802891] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  170.806705] CR2: ffffffffffffffd6 CR3: 0000000024c8e000 CR4: 00000000000006f0
[  170.817025] Call Trace:
[  170.818831]  <TASK>
[  170.820589]  cxl_region_decode_reset+0xb8/0x110
[  170.823893]  cxl_region_detach+0xda/0x1e0
[  170.829457]  detach_target.part.0+0x29/0x80
[  170.833503]  unregister_region+0x42/0x90
[  170.836813]  devm_release_action+0x3d/0x70
[  170.840128]  ? __pfx_unregister_region+0x10/0x10
[  170.843899]  delete_region_store+0x69/0x80
[  170.847680]  kernfs_fop_write_iter+0x11e/0x200
[  170.851217]  vfs_write+0x222/0x3e0
[  170.854141]  ksys_write+0x5b/0xd0
[  170.856695]  do_syscall_64+0x5b/0x80
[  170.859678]  ? kmem_cache_free+0x15/0x3b0
[  170.862234]  ? do_sys_openat2+0x77/0x150
[  170.865560]  ? syscall_exit_to_user_mode+0x17/0x40
[  170.870920]  ? do_syscall_64+0x67/0x80
[  170.874726]  ? syscall_exit_to_user_mode+0x17/0x40
[  170.879464]  ? do_syscall_64+0x67/0x80
[  170.881634]  ? __irq_exit_rcu+0x3d/0x140
[  170.884720]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  170.888810] RIP: 0033:0x7ff89c901c37
[  170.891435] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 4
[  170.905803] RSP: 002b:00007fff0e843a68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  170.913373] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007ff89c901c37
[  170.920868] RDX: 0000000000000008 RSI: 0000000001290ee6 RDI: 0000000000000003
[  170.931402] RBP: 00007fff0e843aa0 R08: 000000000000fee0 R09: 0000000000000073
[  170.936639] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  170.942484] R13: 00007fff0e844000 R14: 000000000041fdc8 R15: 00007ff89cbdf000
[  170.954794]  </TASK>
[  170.957649] Modules linked in: rfkill vfat fat snd_pcm iTCO_wdt snd_timer intel_pmc_bxt ppdev iTCO_vendor_support snd cxl_pmem soundcore bochg
[  170.980623] CR2: 0000000000000000
[  170.984137] ---[ end trace 0000000000000000 ]---
[  170.989062] RIP: 0010:0x0
[  170.991505] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  170.996401] RSP: 0018:ffffb9a3c0e97c60 EFLAGS: 00010296
[  170.999716] RAX: 0000000000000000 RBX: ffff9c38e459de60 RCX: 0000000000000000
[  171.006146] RDX: 0000000000000000 RSI: ffff9c38e42ecdb0 RDI: ffff9c390f11d400
[  171.018226] RBP: ffff9c38eed38000 R08: 0000000000000001 R09: ffffb9a3c0e97b38
[  171.024812] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9c393d8c8c00
[  171.036512] R13: ffff9c390f141c00 R14: ffff9c38eed38340 R15: ffff9c38c1a01400
[  171.042400] FS:  00007ff89ca037c0(0000) GS:ffff9c393dc00000(0000) knlGS:0000000000000000
[  171.050182] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  171.055740] CR2: ffffffffffffffd6 CR3: 0000000024c8e000 CR4: 00000000000006f0
Killed
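
Reading the trace above: "#PF: supervisor instruction fetch", "RIP: 0010:0x0"
and cxl_region_decode_reset+0xb8 as the top frame are consistent with an
indirect call through a NULL function pointer. That is only a guess from the
oops, I have not confirmed which pointer it is. The usual defensive pattern
looks something like the sketch below; struct decoder, its reset member,
decoder_reset() and mark_reset() are hypothetical names for illustration,
not the kernel's actual structures or the eventual fix.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical decoder with an optional reset callback. */
struct decoder {
	int id;
	int (*reset)(struct decoder *d);	/* may be NULL if never wired up */
};

/*
 * Call the reset callback only if one was installed. Jumping through a
 * NULL callback is what produces an instruction fetch fault at address 0.
 */
static int decoder_reset(struct decoder *d)
{
	if (!d)
		return -EINVAL;
	if (!d->reset)
		return 0;	/* assumption: nothing to undo without a callback */
	return d->reset(d);
}

/* Example callback used only to exercise the path above. */
static int mark_reset(struct decoder *d)
{
	d->id = -1;
	return 0;
}
```

Whether a missing callback should be treated as success (as here) or as an
error is a design choice; returning 0 simply lets teardown continue.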
