* don't control-c during ndctl create-namespace?
@ 2017-07-24 23:15 Linda Knippers
2017-07-24 23:35 ` Dan Williams
2017-08-16 16:40 ` Jeff Moyer
0 siblings, 2 replies; 7+ messages in thread
From: Linda Knippers @ 2017-07-24 23:15 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-nvdimm@lists.01.org
[-- Attachment #1: Type: text/plain, Size: 1964 bytes --]
Hi Dan,
I've got 4 NVDIMMs in an interleave set in a configuration that supports labels.
I'm running a 4.12 kernel with the latest ndctl.
I had three namespaces configured and all seemed well. When I configured the
fourth one, I made a mistake in the name so I hit control-c. I wasn't sure what
state I was in but according to what I could see with ndctl, it had created the
namespace but not enabled it, so I enabled it manually with ndctl and that
seemed ok.
Then I tried to use ndctl create-namespace to change the name, which failed
because the namespace was enabled so I disabled it and tried again. At some
point, not really sure where, I got this kernel warning:
# [ 5224.196085] nd namespace4.3: failed to track label: 4
(details in the attached file)
At this point I rebooted the system. When it came back up, nmem0 was disabled.
I dumped the labels (also attached) and I see that nmem0 has some extra labels
that correspond to the namespace that I was struggling with.
I think my troubles started with the control-c. It doesn't look like ndctl traps
signals when creating namespaces so perhaps we can get into an inconsistent
state.
It also seems like that kernel warning is a bit more important than a
WARN_ONCE would imply. I think that was the beginning of the end of my
configuration. It might have been better to just panic.
I was trying to figure out if I could fix my configuration without
losing the good namespaces but I don't see a way. The check-labels option
isn't very helpful because I think it only looks at the info blocks,
which are fine, even though the labels on nmem0 are not. The destroy-namespace
option doesn't help because it only works with a good namespace.
I'm going to wipe my nvdimms and start over. I suspect the problem is
reproducible but it could depend on the timing of the control-c, unless
the root cause was actually trying to rename a namespace. Maybe I'll try
that again but not today.
-- ljk
[-- Attachment #2: kernelwarning.txt --]
[-- Type: text/plain, Size: 4043 bytes --]
# [ 5224.196085] nd namespace4.3: failed to track label: 4
[ 5224.201190] ------------[ cut here ]------------
[ 5224.205833] WARNING: CPU: 41 PID: 2784 at drivers/nvdimm/label.c:719 nd_pmem_namespace_label_update+0x6d6/0x6e0 [libnvdimm]
[ 5224.217009] Modules linked in: xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ipt_REJECT nf_reject_ipv4
ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute
bridge ebtable_nat ip6table_security ip6table_raw ip6table_mangle
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
iptable_security iptable_raw iptable_mangle iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
libcrc32c ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter ip_tables sunrpc vfat fat dm_mirror dm_region_hash dm_log
dm_mod dax_pmem nd_pmem nd_blk device_dax nd_btt intel_rapl
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
[ 5224.287988] pcbc aesni_intel crypto_simd glue_helper cryptd
ipmi_ssif ses ipmi_si enclosure pcspkr nfit hpwdt hpilo ioatdma
ipmi_devintf shpchp dca wmi ipmi_msghandler libnvdimm acpi_power_meter
sch_fq_codel ext4 mbcache jbd2 sd_mod mgag200 8021q garp i2c_algo_bit
stp drm_kms_helper llc mrp syscopyarea sysfillrect sysimgblt fb_sys_fops
ttm tg3 ptp drm uas crc32c_intel smartpqi pps_core usb_storage
scsi_transport_sas i2c_core
[ 5224.325853] CPU: 41 PID: 2784 Comm: ndctl Not tainted 4.12.0+ #1
[ 5224.331881] Hardware name: HPE
[ 5224.340435] task: ffff91d7c75a5a00 task.stack: ffffb25006b08000
[ 5224.346380] RIP: 0010:nd_pmem_namespace_label_update+0x6d6/0x6e0 [libnvdimm]
[ 5224.353453] RSP: 0018:ffffb25006b0bca0 EFLAGS: 00010246
[ 5224.358695] RAX: 0000000000000029 RBX: ffff91d7c69937c0 RCX: 0000000000000000
[ 5224.365856] RDX: 0000000000000000 RSI: ffff91e63f84df98 RDI: ffff91e63f84df98
[ 5224.373016] RBP: ffffb25006b0bd58 R08: 00000000fffffffe R09: 0000000000000fb9
[ 5224.380178] R10: 0000000000000005 R11: 0000000000000fb8 R12: 0000000000000004
[ 5224.387338] R13: ffff91e63ecbfb80 R14: ffff91e636da0400 R15: ffff91e6082edc00
[ 5224.394499] FS: 00007fdf83eaad80(0000) GS:ffff91e63f840000(0000) knlGS:0000000000000000
[ 5224.402617] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5224.408383] CR2: 00007f2736a87b00 CR3: 0000000fa2228000 CR4: 00000000007406e0
[ 5224.415545] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5224.422704] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5224.429866] PKRU: 55555554
[ 5224.432582] Call Trace:
[ 5224.435040] ? kfree+0x133/0x180
[ 5224.438283] nd_namespace_label_update+0xec/0x130 [libnvdimm]
[ 5224.444052] uuid_store+0x17c/0x1a0 [libnvdimm]
[ 5224.448602] ? _copy_to_user+0x26/0x40
[ 5224.452369] dev_attr_store+0x18/0x30
[ 5224.456047] sysfs_kf_write+0x3a/0x50
[ 5224.459721] kernfs_fop_write+0xff/0x180
[ 5224.463660] __vfs_write+0x37/0x170
[ 5224.467165] ? selinux_file_permission+0xe5/0x120
[ 5224.471887] ? security_file_permission+0x3b/0xc0
[ 5224.476609] vfs_write+0xb2/0x1b0
[ 5224.479936] SyS_write+0x55/0xc0
[ 5224.483177] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 5224.487811] RIP: 0033:0x7fdf831c7b50
[ 5224.491399] RSP: 002b:00007fff147a2348 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 5224.498996] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fdf831c7b50
[ 5224.506157] RDX: 0000000000000025 RSI: 00007fff147a23f0 RDI: 0000000000000003
[ 5224.513318] RBP: 00007fff147a1fd0 R08: 000000000000e4f0 R09: 0000000000000024
[ 5224.520479] R10: 0000000000000873 R11: 0000000000000246 R12: 0000000000405500
[ 5224.527640] R13: 00007fff147a2880 R14: 0000000000000000 R15: 0000000000000000
[ 5224.534801] Code: 85 db 49 89 c4 75 04 49 8b 5f 18 49 8d 7f 08 e8 11
e4 f6 cd 44 89 e1 48 89 c6 48 89 da 48 c7 c7 38 ae 57 c0 31 c0 e8 5d eb
b6 cd <0f> ff eb a3 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 85 d2 48 89
[ 5224.553745] ---[ end trace 0608553751bfd078 ]---
[-- Attachment #3: labels.txt --]
[-- Type: text/plain, Size: 6164 bytes --]
$ sudo ndctl read-labels -j all
[
{
"dev":"nmem1",
"index":[
{
"signature":"NAMESPACE_INDEX",
"seq":2,
"nslot":1016
},
{
"signature":"NAMESPACE_INDEX",
"seq":3,
"nslot":1016
}
],
"label":[
{
"uuid":"359c68f7-efd9-410b-a727-1def27d574b5",
"name":"number1",
"slot":0,
"position":1,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":268435456,
"rawsize":3221225472
},
{
"uuid":"c4047c15-7fa1-4f59-b60f-676f23326989",
"name":"number2",
"slot":1,
"position":1,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":3489660928,
"rawsize":3221225472
},
{
"uuid":"56252c66-b1f1-435c-821a-67becef82f4c",
"name":"number3",
"slot":2,
"position":1,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":6710886400,
"rawsize":3221225472
},
{
"uuid":"5b6e072c-5abb-43ff-ba8a-3bca6ee19068",
"name":"number3",
"slot":3,
"position":1,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":9932111872,
"rawsize":3221225472
}
]
},
{
"dev":"nmem3",
"index":[
{
"signature":"NAMESPACE_INDEX",
"seq":2,
"nslot":1016
},
{
"signature":"NAMESPACE_INDEX",
"seq":3,
"nslot":1016
}
],
"label":[
{
"uuid":"359c68f7-efd9-410b-a727-1def27d574b5",
"name":"number1",
"slot":0,
"position":3,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":268435456,
"rawsize":3221225472
},
{
"uuid":"c4047c15-7fa1-4f59-b60f-676f23326989",
"name":"number2",
"slot":1,
"position":3,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":3489660928,
"rawsize":3221225472
},
{
"uuid":"56252c66-b1f1-435c-821a-67becef82f4c",
"name":"number3",
"slot":2,
"position":3,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":6710886400,
"rawsize":3221225472
},
{
"uuid":"5b6e072c-5abb-43ff-ba8a-3bca6ee19068",
"name":"number3",
"slot":3,
"position":3,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":9932111872,
"rawsize":3221225472
}
]
},
{
"dev":"nmem0",
"index":[
{
"signature":"NAMESPACE_INDEX",
"seq":3,
"nslot":1016
},
{
"signature":"NAMESPACE_INDEX",
"seq":1,
"nslot":1016
}
],
"label":[
{
"uuid":"359c68f7-efd9-410b-a727-1def27d574b5",
"name":"number1",
"slot":0,
"position":0,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":268435456,
"rawsize":3221225472
},
{
"uuid":"c4047c15-7fa1-4f59-b60f-676f23326989",
"name":"number2",
"slot":1,
"position":0,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":3489660928,
"rawsize":3221225472
},
{
"uuid":"56252c66-b1f1-435c-821a-67becef82f4c",
"name":"number3",
"slot":2,
"position":0,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":6710886400,
"rawsize":3221225472
},
{
"uuid":"5b6e072c-5abb-43ff-ba8a-3bca6ee19068",
"name":"number3",
"slot":3,
"position":0,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":9932111872,
"rawsize":3221225472
},
{
"uuid":"07a28334-2937-4c07-8c84-1a4df9958fb5",
"name":"number3",
"slot":4,
"position":0,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":9932111872,
"rawsize":3221225472
},
{
"uuid":"769236aa-0438-49c1-8769-67fcbf088469",
"name":"number3",
"slot":5,
"position":0,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":9932111872,
"rawsize":3221225472
},
{
"uuid":"17b47468-9a92-418c-af07-05e5a7627de9",
"name":"number3",
"slot":6,
"position":0,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":9932111872,
"rawsize":3221225472
},
{
"uuid":"752cc95d-8fb9-4ea4-a43b-6ce9c81f11f2",
"name":"number4",
"slot":7,
"position":0,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":9932111872,
"rawsize":3221225472
}
]
},
{
"dev":"nmem2",
"index":[
{
"signature":"NAMESPACE_INDEX",
"seq":2,
"nslot":1016
},
{
"signature":"NAMESPACE_INDEX",
"seq":3,
"nslot":1016
}
],
"label":[
{
"uuid":"359c68f7-efd9-410b-a727-1def27d574b5",
"name":"number1",
"slot":0,
"position":2,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":268435456,
"rawsize":3221225472
},
{
"uuid":"c4047c15-7fa1-4f59-b60f-676f23326989",
"name":"number2",
"slot":1,
"position":2,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":3489660928,
"rawsize":3221225472
},
{
"uuid":"56252c66-b1f1-435c-821a-67becef82f4c",
"name":"number3",
"slot":2,
"position":2,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":6710886400,
"rawsize":3221225472
},
{
"uuid":"5b6e072c-5abb-43ff-ba8a-3bca6ee19068",
"name":"number3",
"slot":3,
"position":2,
"nlabel":4,
"isetcookie":5125243788704424131,
"dpa":9932111872,
"rawsize":3221225472
}
]
}
]
read 4 nmems
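The dump above can be checked mechanically. On nmem0, slots 3 through 7 all start at dpa 9932111872 with the same rawsize, while nmem1, nmem2 and nmem3 each have a single label at that offset. Below is a minimal Python sketch, not part of ndctl, that takes the JSON printed by `ndctl read-labels -j` (assumed to be saved to a file without the trailing `read N nmems` line) and reports label slots on the same DIMM whose [dpa, dpa + rawsize) ranges overlap. The function names are made up for illustration.

```python
import json


def overlapping_labels(dimms):
    """Return {dev: [(slot_a, slot_b), ...]} for labels on the same DIMM
    whose [dpa, dpa + rawsize) ranges overlap."""
    report = {}
    for dimm in dimms:
        # Sort by starting DPA so each label only needs to be compared
        # with the labels that start before it ends.
        labels = sorted(dimm.get("label", []), key=lambda l: (l["dpa"], l["slot"]))
        pairs = []
        for i, a in enumerate(labels):
            a_end = a["dpa"] + a["rawsize"]
            for b in labels[i + 1:]:
                if b["dpa"] >= a_end:
                    break  # every later label starts at or after a_end
                pairs.append((a["slot"], b["slot"]))
        if pairs:
            report[dimm["dev"]] = pairs
    return report


def load_dump(path):
    """Load read-labels JSON; the dumps in this thread show one DIMM printed
    as an object and several DIMMs printed as a list."""
    with open(path) as f:
        data = json.load(f)
    return [data] if isinstance(data, dict) else data
```

Run against this attachment it should report every pair among slots 3 to 7 on nmem0 and nothing for the other three DIMMs, since slots 0 to 2 tile the DPA space back to back. It only finds overlaps; it does not decide which of the overlapping labels is the valid one.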
[-- Attachment #4: Type: text/plain, Size: 151 bytes --]
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
* Re: don't control-c during ndctl create-namespace?
2017-07-24 23:15 don't control-c during ndctl create-namespace? Linda Knippers
@ 2017-07-24 23:35 ` Dan Williams
2017-07-24 23:43 ` Linda Knippers
2017-08-16 16:40 ` Jeff Moyer
1 sibling, 1 reply; 7+ messages in thread
From: Dan Williams @ 2017-07-24 23:35 UTC (permalink / raw)
To: Linda Knippers; +Cc: linux-nvdimm@lists.01.org
On Mon, Jul 24, 2017 at 4:15 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
> Hi Dan,
>
> I've got 4 NVDIMMs in an interleave set in a configuration that supports labels.
> I'm running a 4.12 kernel with the latest ndctl.
>
> I had three namespaces configured and all seemed well. When I configured the
> fourth one, I made a mistake in the name so I hit control-c. I wasn't sure what
> state I was in but according to what I could see with ndctl, it had created the
> namespace but not enabled it, so I enabled it manually with ndctl and that
> seemed ok.
>
> Then I tried to use ndctl create-namespace to change the name, which failed
> because the namespace was enabled so I disabled it and tried again. At some
> point, not really sure where, I got this kernel warning:
>
> # [ 5224.196085] nd namespace4.3: failed to track label: 4
>
> (details in the attached file)
>
> At this point I rebooted the system. When it came back up, nmem0 was disabled.
> I dumped the labels (also attached) and I see that nmem0 has some extra labels
> that correspond to the namespace that I was struggling with.
>
> I think my troubles started with the control-c. It doesn't look like ndctl traps
> signals when creating namespaces so perhaps we can get into an inconsistent
> state.
>
> It also seems like that kernel warning is a bit more important than a
> WARN_ONCE would imply. I think that was the beginning of the end of my
> configuration. It might have been better to just panic.
In general if the system is even remotely recoverable we don't panic.
In this case it is recoverable. The WARN_ONCE() is really there as a
loud, "this is a kernel bug, but we'll do our best to keep going".
> I was trying to figure out if I could fix my configuration without
> losing the good namespaces but I don't see a way. The check-labels option
> isn't very helpful because I think it only looks at the info blocks,
> which are fine, even though the labels on nmem0 are not. The destroy-namespace
> option doesn't help because it only works with a good namespace.
>
> I'm going to wipe my nvdimms and start over. I suspect the problem is
> reproducible but it could depend on the timing of the control-c, unless
> the root cause was actually trying to rename a namespace. Maybe I'll try
> that again but not today.
The recovery method when the labels are corrupted is:
ndctl disable-region all
ndctl zero-labels all
ndctl enable-region all
...and that should get you back to square one.
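The three commands are meant to run in that order. A minimal Python sketch that runs the same steps and stops at the first one that fails, so the labels are never zeroed if the regions could not be disabled. It defaults to a dry run because zeroing the labels discards every namespace, not just the damaged one. The wrapper and its names are hypothetical; only the ndctl commands come from this message.

```python
import subprocess

# The three recovery commands from the message above, in order.
RECOVERY_STEPS = [
    ["ndctl", "disable-region", "all"],
    ["ndctl", "zero-labels", "all"],
    ["ndctl", "enable-region", "all"],
]


def run_recovery(steps=RECOVERY_STEPS, dry_run=True):
    """Run each step in order. check=True raises CalledProcessError on the
    first non-zero exit, so the remaining steps are skipped. With
    dry_run=True the commands are only returned, not executed."""
    executed = []
    for argv in steps:
        if not dry_run:
            subprocess.run(argv, check=True)
        executed.append(" ".join(argv))
    return executed
```

Calling `run_recovery()` just lists the commands; `run_recovery(dry_run=False)` actually executes them and needs root.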
If you are able to reproduce I'd like to see the state of the DIMM
label areas. You can dump them in json format with the following:
ndctl read-labels -j all
* Re: don't control-c during ndctl create-namespace?
2017-07-24 23:35 ` Dan Williams
@ 2017-07-24 23:43 ` Linda Knippers
2017-07-24 23:57 ` Dan Williams
0 siblings, 1 reply; 7+ messages in thread
From: Linda Knippers @ 2017-07-24 23:43 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-nvdimm@lists.01.org
On 07/24/2017 07:35 PM, Dan Williams wrote:
> On Mon, Jul 24, 2017 at 4:15 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>> Hi Dan,
>>
>> I've got 4 NVDIMMs in an interleave set in a configuration that supports labels.
>> I'm running a 4.12 kernel with the latest ndctl.
>>
>> I had three namespaces configured and all seemed well. When I configured the
>> fourth one, I made a mistake in the name so I hit control-c. I wasn't sure what
>> state I was in but according to what I could see with ndctl, it had created the
>> namespace but not enabled it, so I enabled it manually with ndctl and that
>> seemed ok.
>>
>> Then I tried to use ndctl create-namespace to change the name, which failed
>> because the namespace was enabled so I disabled it and tried again. At some
>> point, not really sure where, I got this kernel warning:
>>
>> # [ 5224.196085] nd namespace4.3: failed to track label: 4
>>
>> (details in the attached file)
>>
>> At this point I rebooted the system. When it came back up, nmem0 was disabled.
>> I dumped the labels (also attached) and I see that nmem0 has some extra labels
>> that correspond to the namespace that I was struggling with.
>>
>> I think my troubles started with the control-c. It doesn't look like ndctl traps
>> signals when creating namespaces so perhaps we can get into an inconsistent
>> state.
>>
>> It also seems like that kernel warning is a bit more important than a
>> WARN_ONCE would imply. I think that was the beginning of the end of my
>> configuration. It might have been better to just panic.
>
> In general if the system is even remotely recoverable we don't panic.
> In this case it is recoverable. The WARN_ONCE() is really there as a
> loud, "this is a kernel bug, but we'll do our best to keep going".
Keeping going is ok unless you're risking data.
>> I was trying to figure out if I could fix my configuration without
>> losing the good namespaces but I don't see a way. The check-labels option
>> isn't very helpful because I think it only looks at the info blocks,
>> which are fine, even though the labels on nmem0 are not. The destroy-namespace
>> option doesn't help because it only works with a good namespace.
>>
>> I'm going to wipe my nvdimms and start over. I suspect the problem is
>> reproducible but it could depend on the timing of the control-c, unless
>> the root cause was actually trying to rename a namespace. Maybe I'll try
>> that again but not today.
>
> The recovery method when the labels are corrupted is:
>
> ndctl disable-region all
> ndctl zero-labels all
> ndctl enable-region all
>
> ...and that should get you back to square one.
Right, but that blows away all my namespaces. I was hoping to find a way
to just fix up (delete) what appeared to be extraneous labels.
>
> If you are able to reproduce I'd like to see the state of the DIMM
> label areas. You can dump them in json format with the following:
>
> ndctl read-labels -j all
That was in one of the attachments.
-- ljk
* Re: don't control-c during ndctl create-namespace?
2017-07-24 23:43 ` Linda Knippers
@ 2017-07-24 23:57 ` Dan Williams
0 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2017-07-24 23:57 UTC (permalink / raw)
To: Linda Knippers; +Cc: linux-nvdimm@lists.01.org
On Mon, Jul 24, 2017 at 4:43 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
> On 07/24/2017 07:35 PM, Dan Williams wrote:
>> On Mon, Jul 24, 2017 at 4:15 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>>> Hi Dan,
>>>
>>> I've got 4 NVDIMMs in an interleave set in a configuration that supports labels.
>>> I'm running a 4.12 kernel with the latest ndctl.
>>>
>>> I had three namespaces configured and all seemed well. When I configured the
>>> fourth one, I made a mistake in the name so I hit control-c. I wasn't sure what
>>> state I was in but according to what I could see with ndctl, it had created the
>>> namespace but not enabled it, so I enabled it manually with ndctl and that
>>> seemed ok.
>>>
>>> Then I tried to use ndctl create-namespace to change the name, which failed
>>> because the namespace was enabled so I disabled it and tried again. At some
>>> point, not really sure where, I got this kernel warning:
>>>
>>> # [ 5224.196085] nd namespace4.3: failed to track label: 4
>>>
>>> (details in the attached file)
>>>
>>> At this point I rebooted the system. When it came back up, nmem0 was disabled.
>>> I dumped the labels (also attached) and I see that nmem0 has some extra labels
>>> that correspond to the namespace that I was struggling with.
>>>
>>> I think my troubles started with the control-c. It doesn't look like ndctl traps
>>> signals when creating namespaces so perhaps we can get into an inconsistent
>>> state.
>>>
>>> It also seems like that kernel warning is a bit more important than a
>>> WARN_ONCE would imply. I think that was the beginning of the end of my
>>> configuration. It might have been better to just panic.
>>
>> In general if the system is even remotely recoverable we don't panic.
>> In this case it is recoverable. The WARN_ONCE() is really there as a
>> loud, "this is a kernel bug, but we'll do our best to keep going".
>
> Keeping going is ok unless you're risking data.
>
>>> I was trying to figure out if I could fix my configuration without
>>> losing the good namespaces but I don't see a way. The check-labels option
>>> isn't very helpful because I think it only looks at the info blocks,
>>> which are fine, even though the labels on nmem0 are not. The destroy-namespace
>>> option doesn't help because it only works with a good namespace.
>>>
>>> I'm going to wipe my nvdimms and start over. I suspect the problem is
>>> reproducible but it could depend on the timing of the control-c, unless
>>> the root cause was actually trying to rename a namespace. Maybe I'll try
>>> that again but not today.
>>
>> The recovery method when the labels are corrupted is:
>>
>> ndctl disable-region all
>> ndctl zero-labels all
>> ndctl enable-region all
>>
>> ...and that should get you back to square one.
>
> Right, but that blows away all my namespaces. I was hoping to find a way
> to just fix up (delete) what appeared to be extraneous labels.
That's something I've been thinking about as well, like a 'slot'
option to ndctl zero-labels to just try to limit the zeroing to a single
bad slot. That said, since the kernel was in charge of the label area
it's a bug that it generated a layout that itself couldn't parse.
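A per-slot option would need a rule for picking the bad slots. One plausible heuristic, assumed here and not taken from ndctl or the kernel: every label carries `nlabel`, the number of DIMMs the namespace is meant to span, so a uuid found on fewer DIMMs than that is a candidate for removal. In Linda's dump this singles out nmem0 slots 4 through 7, whose uuids appear on no other DIMM, while slot 3 (uuid 5b6e072c-5abb-43ff-ba8a-3bca6ee19068) is present on all four. The sketch only reports candidates, the `orphan_slots` name is hypothetical, and the check is meaningful only when the dump covers every DIMM in the interleave set.

```python
from collections import defaultdict


def orphan_slots(dimms):
    """Return {dev: [slot, ...]} for labels whose uuid is found on fewer
    DIMMs than the label's own nlabel field says it should span."""
    devs_by_uuid = defaultdict(set)
    for dimm in dimms:
        for label in dimm.get("label", []):
            devs_by_uuid[label["uuid"]].add(dimm["dev"])

    report = {}
    for dimm in dimms:
        slots = sorted(
            label["slot"]
            for label in dimm.get("label", [])
            if len(devs_by_uuid[label["uuid"]]) < label["nlabel"]
        )
        if slots:
            report[dimm["dev"]] = slots
    return report
```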
>> If you are able to reproduce I'd like to see the state of the DIMM
>> label areas. You can dump them in json format with the following:
>>
>> ndctl read-labels -j all
>
> That was in one of the attachments.
Ah, sorry, I overlooked that.
* Re: don't control-c during ndctl create-namespace?
2017-07-24 23:15 don't control-c during ndctl create-namespace? Linda Knippers
2017-07-24 23:35 ` Dan Williams
@ 2017-08-16 16:40 ` Jeff Moyer
2017-08-16 20:12 ` Linda Knippers
1 sibling, 1 reply; 7+ messages in thread
From: Jeff Moyer @ 2017-08-16 16:40 UTC (permalink / raw)
To: Linda Knippers, Dan Williams; +Cc: linux-nvdimm@lists.01.org
Hi, Linda and Dan,
Linda Knippers <linda.knippers@hpe.com> writes:
> Hi Dan,
>
> I've got 4 NVDIMMs in an interleave set in a configuration that supports labels.
> I'm running a 4.12 kernel with the latest ndctl.
>
> I had three namespaces configured and all seemed well. When I configured the
> fourth one, I made a mistake in the name so I hit control-c. I wasn't sure what
> state I was in but according to what I could see with ndctl, it had created the
> namespace but not enabled it, so I enabled it manually with ndctl and that
> seemed ok.
>
> Then I tried to use ndctl create-namespace to change the name, which failed
> because the namespace was enabled so I disabled it and tried again. At some
> point, not really sure where, I got this kernel warning:
>
> # [ 5224.196085] nd namespace4.3: failed to track label: 4
I think I know how to reproduce this part reliably. Simply try to
create multiple namespaces in a single region at the same time:
# ndctl create-namespace -r regionN -m memory & ndctl create-namespace -r regionN -m memory
That will lead to the dev_WARN_ONCE Linda mentioned. Then, the DIMM
will have an invalid label layout. On reboot, the dimm will be disabled
(these messages are printed when I reboot in this state):
[ 24.311419] nvdimm nmem1: nvdimm_init_config_data: len: 131072 rc: 0
[ 24.311420] nvdimm nmem1: config data size: 131072
[ 24.311421] nvdimm nmem1: __nd_label_validate: nsindex0 labelsize 1 invalid
[ 24.311422] nvdimm nmem1: __nd_label_validate: nsindex1 labelsize 1 invalid
[ 24.311425] nvdimm nmem1: : pmem-9221e8a3: 0x1f80000000 @ 0x10000000 reserve
[ 24.311427] nvdimm nmem1: : null: 0x0 @ 0x0 reserve
[ 24.311428] nvdimm nmem1: nvdimm_drvdata_release
[ 24.311430] nd_bus ndbus0: nvdimm.probe(nmem1) = -16
[ 24.311442] nvdimm: probe of nmem1 failed with error -16
Trying to enable nmem1 will result in EBUSY, since we're trying to
reserve address 0 (see the null entry above).
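The hex values in these boot messages match the decimal fields in the read-labels dump below: the pmem-9221e8a3 reservation is 135291469824 bytes starting at dpa 268435456, and every label in that dump claims exactly that range. The -16 returned by the probe is a negated Linux errno. A quick Python check of both conversions; the helper name is illustrative only.

```python
import errno


def decode_reserve(size_hex, start_hex):
    """Convert the 'SIZE @ START' pair of a reserve message to integers."""
    return int(size_hex, 16), int(start_hex, 16)


# From: "pmem-9221e8a3: 0x1f80000000 @ 0x10000000 reserve"
size, start = decode_reserve("0x1f80000000", "0x10000000")
print(size, start)          # 135291469824 268435456
print(errno.errorcode[16])  # EBUSY
```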
Unlike Linda's case, I can recover by zeroing the label space. However,
I don't have interleave enabled.
I've attached the result of read-labels for nmem1 below.
-Jeff
# ndctl read-labels -j nmem1
{
"dev":"nmem1",
"index":[
{
"signature":"NAMESPACE_INDEX",
"major":1,
"minor":2,
"labelsize":256,
"seq":1,
"nslot":510
},
{
"signature":"NAMESPACE_INDEX",
"major":1,
"minor":2,
"labelsize":256,
"seq":2,
"nslot":510
}
],
"label":[
{
"uuid":"9221e8a3-f43a-4204-86b1-e4bcd977ae27",
"name":"",
"slot":0,
"position":0,
"nlabel":1,
"isetcookie":62413126465469009,
"lbasize":0,
"dpa":268435456,
"rawsize":135291469824,
"type_guid":"79d3f066-f3b4-7440-ac43-0d3318b78cdb",
"abstraction_guid":"00000000-0000-0000-0000-000000000000"
},
{
"uuid":"4c812805-e736-4876-bab2-eab15a847a9f",
"name":"",
"slot":1,
"position":0,
"nlabel":1,
"isetcookie":62413126465469009,
"lbasize":0,
"dpa":268435456,
"rawsize":135291469824,
"type_guid":"79d3f066-f3b4-7440-ac43-0d3318b78cdb",
"abstraction_guid":"00000000-0000-0000-0000-000000000000"
},
{
"uuid":"4c812805-e736-4876-bab2-eab15a847a9f",
"name":"",
"slot":2,
"position":0,
"nlabel":1,
"isetcookie":62413126465469009,
"lbasize":0,
"dpa":268435456,
"rawsize":135291469824,
"type_guid":"79d3f066-f3b4-7440-ac43-0d3318b78cdb",
"abstraction_guid":"00000000-0000-0000-0000-000000000000"
},
{
"uuid":"f7be9e94-1ba7-4f52-9090-12bcc08839fb",
"name":"",
"slot":3,
"position":0,
"nlabel":1,
"isetcookie":62413126465469009,
"lbasize":0,
"dpa":268435456,
"rawsize":135291469824,
"type_guid":"79d3f066-f3b4-7440-ac43-0d3318b78cdb",
"abstraction_guid":"00000000-0000-0000-0000-000000000000"
},
{
"uuid":"f7be9e94-1ba7-4f52-9090-12bcc08839fb",
"name":"",
"slot":4,
"position":0,
"nlabel":1,
"isetcookie":62413126465469009,
"lbasize":0,
"dpa":268435456,
"rawsize":135291469824,
"type_guid":"79d3f066-f3b4-7440-ac43-0d3318b78cdb",
"abstraction_guid":"00000000-0000-0000-0000-000000000000"
},
{
"uuid":"f7be9e94-1ba7-4f52-9090-12bcc08839fb",
"name":"",
"slot":5,
"position":0,
"nlabel":1,
"isetcookie":62413126465469009,
"lbasize":0,
"dpa":268435456,
"rawsize":135291469824,
"type_guid":"79d3f066-f3b4-7440-ac43-0d3318b78cdb",
"abstraction_guid":"00000000-0000-0000-0000-000000000000"
},
{
"uuid":"b8bf5176-cc34-4b39-8d03-3a912e715366",
"name":"",
"slot":6,
"position":0,
"nlabel":1,
"isetcookie":62413126465469009,
"lbasize":0,
"dpa":268435456,
"rawsize":135291469824,
"type_guid":"79d3f066-f3b4-7440-ac43-0d3318b78cdb",
"abstraction_guid":"00000000-0000-0000-0000-000000000000"
},
{
"uuid":"bd340ce8-5774-402b-b1ac-b82d590665d7",
"name":"",
"slot":7,
"position":0,
"nlabel":1,
"isetcookie":62413126465469009,
"lbasize":0,
"dpa":268435456,
"rawsize":135291469824,
"type_guid":"79d3f066-f3b4-7440-ac43-0d3318b78cdb",
"abstraction_guid":"00000000-0000-0000-0000-000000000000"
}
]
}
read 1 nmem
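Besides every label claiming the same DPA range, this dump has uuids repeated across slots of the same DIMM: 4c812805-e736-4876-bab2-eab15a847a9f in slots 1 and 2, and f7be9e94-1ba7-4f52-9090-12bcc08839fb in slots 3, 4 and 5. A small Python sketch that lists such repeats from one DIMM's read-labels object; it is an illustrative helper, not an ndctl feature, and it does not say which copy, if any, is valid.

```python
from collections import defaultdict


def repeated_uuids(dimm):
    """Return {uuid: [slot, ...]} for uuids that occupy more than one slot
    in a single DIMM's label area."""
    slots_by_uuid = defaultdict(list)
    for label in dimm.get("label", []):
        slots_by_uuid[label["uuid"]].append(label["slot"])
    return {uuid: sorted(slots) for uuid, slots in slots_by_uuid.items()
            if len(slots) > 1}
```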
* Re: don't control-c during ndctl create-namespace?
2017-08-16 16:40 ` Jeff Moyer
@ 2017-08-16 20:12 ` Linda Knippers
2017-08-16 20:30 ` Jeff Moyer
0 siblings, 1 reply; 7+ messages in thread
From: Linda Knippers @ 2017-08-16 20:12 UTC (permalink / raw)
To: Jeff Moyer, Dan Williams; +Cc: Saldivar, Maurice A., linux-nvdimm@lists.01.org
On 08/16/2017 12:40 PM, Jeff Moyer wrote:
> Hi, Linda and Dan,
>
> Linda Knippers <linda.knippers@hpe.com> writes:
>
>> Hi Dan,
>>
>> I've got 4 NVDIMMs in an interleave set in a configuration that supports labels.
>> I'm running a 4.12 kernel with the latest ndctl.
>>
>> I had three namespaces configured and all seemed well. When I configured the
>> fourth one, I made a mistake in the name so I hit control-c. I wasn't sure what
>> state I was in but according to what I could see with ndctl, it had created the
>> namespace but not enabled it, so I enabled it manually with ndctl and that
>> seemed ok.
>>
>> Then I tried to use ndctl create-namespace to change the name, which failed
>> because the namespace was enabled so I disabled it and tried again. At some
>> point, not really sure where, I got this kernel warning:
>>
>> # [ 5224.196085] nd namespace4.3: failed to track label: 4
>
> I think I know how to reproduce this part reliably. Simply try to
> create multiple namespaces in a single region at the same time:
>
> # ndctl create-namespace -r regionN -m memory & ndctl create-namespace -r regionN -m memory
>
> That will lead to the dev_WARN_ONCE Linda mentioned. Then, the DIMM
> will have an invalid label layout. On reboot, the dimm will be disabled
> (these messages are printed when I reboot in this state):
>
> [ 24.311419] nvdimm nmem1: nvdimm_init_config_data: len: 131072 rc: 0
> [ 24.311420] nvdimm nmem1: config data size: 131072
> [ 24.311421] nvdimm nmem1: __nd_label_validate: nsindex0 labelsize 1 invalid
> [ 24.311422] nvdimm nmem1: __nd_label_validate: nsindex1 labelsize 1 invalid
> [ 24.311425] nvdimm nmem1: : pmem-9221e8a3: 0x1f80000000 @ 0x10000000 reserve
> [ 24.311427] nvdimm nmem1: : null: 0x0 @ 0x0 reserve
> [ 24.311428] nvdimm nmem1: nvdimm_drvdata_release
> [ 24.311430] nd_bus ndbus0: nvdimm.probe(nmem1) = -16
> [ 24.311442] nvdimm: probe of nmem1 failed with error -16
>
> Trying to enable nmem1 will result in EBUSY, since we're trying to
> reserve address 0 (see the null entry above).
>
> Unlike Linda's case, I can recover by zeroing the label space.
If you have to zero your labels, it's not really recovering. Or are you
able to recreate labels and not lose data that might have been in those
pmem ranges?
> However, I don't have interleave enabled.
Perhaps Maurice can try this with interleave enabled.
-- ljk
* Re: don't control-c during ndctl create-namespace?
2017-08-16 20:12 ` Linda Knippers
@ 2017-08-16 20:30 ` Jeff Moyer
0 siblings, 0 replies; 7+ messages in thread
From: Jeff Moyer @ 2017-08-16 20:30 UTC (permalink / raw)
To: Linda Knippers; +Cc: Saldivar, Maurice A., linux-nvdimm@lists.01.org
Linda Knippers <linda.knippers@hpe.com> writes:
> On 08/16/2017 12:40 PM, Jeff Moyer wrote:
>> [ 24.311419] nvdimm nmem1: nvdimm_init_config_data: len: 131072 rc: 0
>> [ 24.311420] nvdimm nmem1: config data size: 131072
>> [ 24.311421] nvdimm nmem1: __nd_label_validate: nsindex0 labelsize 1 invalid
>> [ 24.311422] nvdimm nmem1: __nd_label_validate: nsindex1 labelsize 1 invalid
>> [ 24.311425] nvdimm nmem1: : pmem-9221e8a3: 0x1f80000000 @ 0x10000000 reserve
>> [ 24.311427] nvdimm nmem1: : null: 0x0 @ 0x0 reserve
>> [ 24.311428] nvdimm nmem1: nvdimm_drvdata_release
>> [ 24.311430] nd_bus ndbus0: nvdimm.probe(nmem1) = -16
>> [ 24.311442] nvdimm: probe of nmem1 failed with error -16
>>
>> Trying to enable nmem1 will result in EBUSY, since we're trying to
>> reserve address 0 (see the null entry above).
>>
>> Unlike Linda's case, I can recover by zeroing the label space.
>
> If you have to zero your labels, it's not really recovering. Or are you
> able to recreate labels and not lose data that might have been in those
> pmem ranges?
Good point. Due to the corrupted label, I lose everything on that DIMM.
I think Dan mentioned maybe adding the ability to clear out bad labels.
We'd be better off not getting into this state in the first place,
obviously. ;-)
-Jeff
Thread overview: 7+ messages
2017-07-24 23:15 don't control-c during ndctl create-namespace? Linda Knippers
2017-07-24 23:35 ` Dan Williams
2017-07-24 23:43 ` Linda Knippers
2017-07-24 23:57 ` Dan Williams
2017-08-16 16:40 ` Jeff Moyer
2017-08-16 20:12 ` Linda Knippers
2017-08-16 20:30 ` Jeff Moyer