All of lore.kernel.org
 help / color / mirror / Atom feed
From: Walker, Benjamin <benjamin.walker at intel.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] Helloworld program fails during device initialization.
Date: Tue, 14 Nov 2017 21:44:29 +0000	[thread overview]
Message-ID: <1510695867.2168.39.camel@intel.com> (raw)
In-Reply-To: 82C9F782B054C94B9FC04A331649C77A9D42AE11@FMSMSX108.amr.corp.intel.com

[-- Attachment #1: Type: text/plain, Size: 17382 bytes --]

SPDK performs parallel initialization of all devices for performance reasons, so
the log below is both of the initialization paths of your devices muxed
together. The part Paul highlighted is for your non-functioning device, while
all of the other log messages are for your functioning device, I think. It would
be easy to tell if you unplugged the working one and only left the one that
wasn't working - can you try that and send the new log to confirm?

When SPDK first reads the registers for your non-functioning SSD to figure out
what state it is in, it sees that the CC.EN bit is set to 1, but the CSTS.RDY
bit is 0. That means the device is performing some internal operation - i.e.
it's already in the process of enabling itself. It is invalid for SPDK to write
any registers until CSTS.RDY flips to 1 in this case. However, it never does and
the code just times out. Unless I'm interpreting something incorrectly, I
believe your device to be out of spec (probably because that particular device
is not functioning, not because all models of that type are out of spec).

Most commonly, problems like this are resolved by power cycling the device. If
you haven't tried rebooting, definitely give that a shot. Did you say that the
device does work with the kernel? When you load the kernel driver can you see
/dev/sdX and dd to it without errors?

Thanks,
Ben

On Tue, 2017-11-14 at 21:07 +0000, Luse, Paul E wrote:
> So in your output the highlighted lines below show one extra state that your
> SSD goes through during init vs the SSD I have, looks like
> NVME_CTRLR_STATE_DISABLE_WAIT_FOR_READY_1 is the state which is also state 1
> in the enum which matches the error listed further below in red.  Having not
> spent much time debugging this driver’s init I can’t say for sure if this (the
> additional output/steps between the extra state and the timeout) is expected
> or not.
>  
> You can wait for Daniel or someone w/more experience to reply or if you want
> to experiment you can either step through the hello_world code with gdb
> putting a break point on nvme_ctrlr_process_init() with a good drive then with
> this one and get an idea of the correct flow vs what you are seeing you can
> maybe mes around with the timeout being set in the condition below which is
> seen on your drive but not on mine.  Up to you, my suggestion is just a shot
> in the dark.
>  
>                                           if (csts.bits.rdy == 0) {
>                                                        
> SPDK_DEBUGLOG(SPDK_TRACE_NVME, "CC.EN = 1 && CSTS.RDY = 0 - waiting for reset
> to complete\n");
>                                                        
> nvme_ctrlr_set_state(ctrlr, NVME_CTRLR_STATE_DISABLE_WAIT_FOR_READY_1,
> ready_timeout_in_ms);
>                                                         return 0;
>                                           }
>  
> Thx
> Paul
>  
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sreeni (Sreenivasa)
> Busam (Stellus)
> Sent: Tuesday, November 14, 2017 1:14 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Helloworld program fails during device initialization.
>  
> Thanks Daniel and Paul for helping debug it.
> This is great, I know how to debug SPDK code J
> Here is the debug log.
> nvme_ctrlr.c:1372:nvme_ctrlr_process_init: *INFO*: CC.EN = 1
> nvme_ctrlr.c:1386:nvme_ctrlr_process_init: *INFO*: Setting CC.EN = 0
> nvme_ctrlr.c: 617:nvme_ctrlr_set_state: *INFO*: setting state to disable and
> wait for CSTS.RDY = 0 (timeout 30000 ms)
> nvme_ctrlr.c:1372:nvme_ctrlr_process_init: *INFO*: CC.EN = 1
> nvme_ctrlr.c:1380:nvme_ctrlr_process_init: *INFO*: CC.EN = 1 && CSTS.RDY = 0 -
> waiting for reset to complete
> nvme_ctrlr.c: 617:nvme_ctrlr_set_state: *INFO*: setting state to disable and
> wait for CSTS.RDY = 1 (timeout 30000 ms)
> nvme_ctrlr.c:1432:nvme_ctrlr_process_init: *INFO*: CC.EN = 0 && CSTS.RDY = 0
> nvme_ctrlr.c: 617:nvme_ctrlr_set_state: *INFO*: setting state to enable
> controller by writing CC.EN = 1 (timeout 30000 ms)
> nvme_ctrlr.c:1444:nvme_ctrlr_process_init: *INFO*: Setting CC.EN = 1
> nvme_ctrlr.c: 617:nvme_ctrlr_set_state: *INFO*: setting state to wait for
> CSTS.RDY = 1 (timeout 30000 ms)
> nvme_ctrlr.c:1451:nvme_ctrlr_process_init: *INFO*: CC.EN = 1 && CSTS.RDY = 1 -
> controller is ready
> nvme_pcie.c:1521:nvme_pcie_prp_list_append: *INFO*: prp_index:0
> virt_addr:0x100ff4c000 len:4096
> nvme_pcie.c:1548:nvme_pcie_prp_list_append: *INFO*: prp1 = 0x3fb0d4c000
> nvme_ctrlr.c: 713:nvme_ctrlr_identify: *INFO*: transport max_xfer_size 2072576
> nvme_pcie.c:1521:nvme_pcie_prp_list_append: *INFO*: prp_index:0
> virt_addr:0x100ff4b000 len:4096
> nvme_pcie.c:1548:nvme_pcie_prp_list_append: *INFO*: prp1 = 0x3fb0d4b000
> nvme_ctrlr.c: 813:nvme_ctrlr_set_keep_alive_timeout: *INFO*: Controller KAS is
> 0 - not enabling Keep Alive
> nvme_ctrlr.c: 883:nvme_ctrlr_set_host_id: *INFO*: Using 64-bit host identifier
> nvme_ctrlr.c: 899:nvme_ctrlr_set_host_id: *INFO*: User did not specify host ID
> - not sending Set Features - Host ID
> nvme_ctrlr.c: 613:nvme_ctrlr_set_state: *INFO*: setting state to ready (no
> timeout)
> Attached to 0000:ad:00.0
> Using controller SAMSUNG MZQWV480HCGM (S1TRNYAFA00148      ) with 1
> namespaces.
>   Namespace ID: 1 size: 480GB
> nvme_ctrlr.c:1475:nvme_ctrlr_process_init: *ERROR*: Initialization timed out
> in state 1
> nvme_ctrlr.c: 523:nvme_ctrlr_shutdown: *ERROR*: did not shutdown within 5
> seconds
> spdk_nvme_probe() failed
> nvme_ctrlr.c: 515:nvme_ctrlr_shutdown: *INFO*: shutdown complete
>  
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Verkamp, Daniel
> Sent: Tuesday, November 14, 2017 12:06 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Helloworld program fails during device initialization.
>  
> Sorry, forgot to mention that you also need to compile with debug enabled
> (make CONFIG_DEBUG=y or ./configure --enable-debug); otherwise, the trace flag
> call doesn’t do anything.
>  
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Luse, Paul E
> Sent: Tuesday, November 14, 2017 12:56 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Helloworld program fails during device initialization.
>  
> Sreeni,
>  
> Are you sure you got those statements added (also need to add #include
> “spdk/log.h” as I didn’t see some expected output in your text.  Here’s what
> it should look like:
>  
> peluse(a)pels2-64:~/spdk/examples/nvme/hello_world$ sudo ./hello_world
> Starting DPDK 17.08.0 initialization...
> [ DPDK EAL parameters: hello_world -c 0x1 --file-prefix=spdk0 --base-
> virtaddr=0x1000000000 --proc-type=auto ]
> EAL: Detected 72 lcore(s)
> EAL: Auto-detected process type: PRIMARY
> EAL: No free hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> Initializing NVMe Controllers
> EAL: PCI device 0000:06:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:953 spdk_nvme
> Attaching to 0000:06:00.0
> nvme_ctrlr.c: 623:nvme_ctrlr_set_state: *INFO*: setting state to init (no
> timeout)
> nvme_ctrlr.c:1382:nvme_ctrlr_process_init: *INFO*: CC.EN = 1
> nvme_ctrlr.c:1396:nvme_ctrlr_process_init: *INFO*: Setting CC.EN = 0
> nvme_ctrlr.c: 627:nvme_ctrlr_set_state: *INFO*: setting state to disable and
> wait for CSTS.RDY = 0 (timeout 20000 ms)
> nvme_ctrlr.c:1442:nvme_ctrlr_process_init: *INFO*: CC.EN = 0 && CSTS.RDY = 0
> nvme_ctrlr.c: 627:nvme_ctrlr_set_state: *INFO*: setting state to enable
> controller by writing CC.EN = 1 (timeout 20000 ms)
> nvme_ctrlr.c:1454:nvme_ctrlr_process_init: *INFO*: Setting CC.EN = 1
> nvme_ctrlr.c: 627:nvme_ctrlr_set_state: *INFO*: setting state to wait for
> CSTS.RDY = 1 (timeout 20000 ms)
> nvme_ctrlr.c:1461:nvme_ctrlr_process_init: *INFO*: CC.EN = 1 && CSTS.RDY = 1 -
> controller is ready
> nvme_pcie.c:1541:nvme_pcie_prp_list_append: *INFO*: prp_index:0
> virt_addr:0x103ff9b000 len:4096
> nvme_pcie.c:1568:nvme_pcie_prp_list_append: *INFO*: prp1 = 0x45179b000
> nvme_ctrlr.c: 723:nvme_ctrlr_identify: *INFO*: transport max_xfer_size 2072576
> nvme_ctrlr.c: 727:nvme_ctrlr_identify: *INFO*: MDTS max_xfer_size 131072
> nvme_pcie.c:1541:nvme_pcie_prp_list_append: *INFO*: prp_index:0
> virt_addr:0x103ff9a000 len:4096
> nvme_pcie.c:1568:nvme_pcie_prp_list_append: *INFO*: prp1 = 0x45179a000
> nvme_ns.c:  98:nvme_ns_identify_update: *INFO*: ns 1 stripe size quirk 256
> blocks
> nvme_pcie.c:1541:nvme_pcie_prp_list_append: *INFO*: prp_index:0
> virt_addr:0x103ff9b000 len:512
> nvme_pcie.c:1568:nvme_pcie_prp_list_append: *INFO*: prp1 = 0x45179b000
> nvme_ctrlr.c: 823:nvme_ctrlr_set_keep_alive_timeout: *INFO*: Controller KAS is
> 0 - not enabling Keep Alive
> nvme_ctrlr.c: 893:nvme_ctrlr_set_host_id: *INFO*: Using 64-bit host identifier
> nvme_ctrlr.c: 909:nvme_ctrlr_set_host_id: *INFO*: User did not specify host ID
> - not sending Set Features - Host ID
> nvme_ctrlr.c: 623:nvme_ctrlr_set_state: *INFO*: setting state to ready (no
> timeout)
> Attached to 0000:06:00.0
> Using controller INTEL SSDPEDMD400G4  (CVFT7203005M400LGN  ) with 1
> namespaces.
>   Namespace ID: 1 size: 400GB
> Initialization complete.
> nvme_pcie.c:1541:nvme_pcie_prp_list_append: *INFO*: prp_index:0
> virt_addr:0x103fe73000 len:512
> nvme_pcie.c:1568:nvme_pcie_prp_list_append: *INFO*: prp1 = 0x451673000
> nvme_pcie.c:1541:nvme_pcie_prp_list_append: *INFO*: prp_index:0
> virt_addr:0x103fe73000 len:512
> nvme_pcie.c:1568:nvme_pcie_prp_list_append: *INFO*: prp1 = 0x451673000
> Hello world!
> nvme_ctrlr.c: 512:nvme_ctrlr_shutdown: *INFO*: RTD3E = 0 us
> nvme_ctrlr.c: 515:nvme_ctrlr_shutdown: *INFO*: shutdown timeout = 10000 ms
> nvme_ctrlr.c: 525:nvme_ctrlr_shutdown: *INFO*: shutdown complete in 1808
> milliseconds
>  
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sreeni (Sreenivasa)
> Busam (Stellus)
> Sent: Tuesday, November 14, 2017 12:44 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Helloworld program fails during device initialization.
>  
> Hi Daniel,
>  
> I added the debug flag you mentioned in the program and tested it. There is
> not much of any log which is helpful for the probe of the device.
> Please ignore the printf from func pci_probe_all_drivers, it was printf added
> by me.
> I have attached the file.
>  
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sreeni (Sreenivasa)
> Busam (Stellus)
> Sent: Tuesday, November 14, 2017 10:43 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Helloworld program fails during device initialization.
>  
> Thanks Daniel for your comments. I think the SSD devices are of same model
> looking at lspci output.
> Let me confirm it with my lab administrator.
> I will add the debug code and give you the output.
>  
> Sreeni
>  
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Verkamp, Daniel
> Sent: Tuesday, November 14, 2017 9:43 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Helloworld program fails during device initialization.
>  
> It looks to me like the device at 0000:ad:00.0 worked in both tests, and the
> device at 0000:ae:00.0 failed during identify and wasn’t used in perf due to
> the -r argument specifying only 0000:ad:00.0.
>  
> Are these two devices the same model/firmware of NVMe controller, or is there
> some difference between them?
>  
> It would also be useful to see the debug output from the NVMe library for the
> failing controller; you can enable this in hello_world.c by adding these two
> lines at the beginning of main():
>  
>   spdk_log_set_print_level(SPDK_LOG_DEBUG);
>   spdk_log_set_trace_flag(“all”);
>  
> Thanks,
> -- Daniel
>  
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Luse, Paul E
> Sent: Tuesday, November 14, 2017 6:46 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Helloworld program fails during device initialization.
>  
> Well, maybe not.  Could very well be an issue in the driver that’s only
> exposed via this hardware especially given that hello_world failed and perf
> did not.
>  
> Sreeni, do you have any other data points that you can share? Like, for
> example, have you tried multiple SSDs of that same model or any other model?
> Does hello consistently fail and perf consistently pass? I haven’t done a ton
> of debug with either but with a little more info, if someone else doesn’t jump
> in with the answer, I can pretty easily help compare the init sequence in a
> few of these and maybe start to narrow down what’s different.
>  
> Thx
> Paul
>  
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Wu, WenzhongX
> Sent: Tuesday, November 14, 2017 2:13 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Helloworld program fails during device initialization.
>  
> Hi Sreeni,
>  
> Verify with newest spdk source code with intel P3700 ,run helloword everything
> is ok:
> root(a)waikiki5-target:~/workspace/spdk1102/examples/nvme/hello_world#
> ./hello_world
> Starting DPDK 17.08.0 initialization...
> [ DPDK EAL parameters: hello_world -c 0x1 --file-prefix=spdk0 --base-
> virtaddr=0x1000000000 --proc-type=auto ]
> EAL: Detected 36 lcore(s)
> EAL: Auto-detected process type: PRIMARY
> EAL: No free hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> Initializing NVMe Controllers
> EAL: PCI device 0000:81:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:953 spdk_nvme
> Attaching to 0000:81:00.0
> EAL: PCI device 0000:84:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:953 spdk_nvme
> Attaching to 0000:84:00.0
> EAL: PCI device 0000:85:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:953 spdk_nvme
> Attaching to 0000:85:00.0
> Attached to 0000:85:00.0
> Using controller INTEL SSDPEDMD800G4  (PHFT6401011B800CGN  ) with 1
> namespaces.
>   Namespace ID: 1 size: 800GB
> Attached to 0000:81:00.0
> Using controller INTEL SSDPEDMD800G4  (PHFT5373002D800CGN  ) with 1
> namespaces.
>   Namespace ID: 1 size: 800GB
> Attached to 0000:84:00.0
> Using controller INTEL SSDPEDMD800G4  (PHFT53640003800CGN  ) with 1
> namespaces.
>   Namespace ID: 1 size: 800GB
> Initialization complete.
> Hello world!
> Hello world!
> Hello world!
>  
> Seem  it’s your hardware cause this issues.
>  
> Thanks
> Wenzhong
>  
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sreeni (Sreenivasa)
> Busam (Stellus)
> Sent: Tuesday, November 14, 2017 2:36 PM
> To: spdk(a)lists.01.org
> Subject: [SPDK] Helloworld program fails during device initialization.
>  
> I was testing the helloworld program on ubuntu with NVMe SSD and the probe()
> is failing. Please let me know if this is a known problem.
> Let me know if there is workaround.
> The perf program did not have any problem.
> I have attached the output for both tests.
> lspci| grep Non
> ad:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD
> Controller (rev 01)
> ae:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD
> Controller (rev 01)
> uname -a
> Linux sjca5radevct13 4.10.0-38-generic #42~16.04.1-Ubuntu SMP Tue Oct 10
> 16:32:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> 1. helloworld output:
> ---------------------------
> Attached to 0000:ad:00.0
> Using controller SAMSUNG MZQWV480HCGM (S1TRNYAFA00148      ) with 1
> namespaces.
>   Namespace ID: 1 size: 480GB
> nvme_ctrlr.c:1475:nvme_ctrlr_process_init: *ERROR*: Initialization timed out
> in state 1
> nvme_ctrlr.c: 523:nvme_ctrlr_shutdown: *ERROR*: did not shutdown within 5
> seconds
> spdk_nvme_probe() failed
>  
> 2. perf output:
> sudo ./perf -q 1 -s 4096 -t 10 -w read -r "trtype:PCIe traddr:0000:ad:00.0"
> ---------------
> Starting DPDK 17.08.0 initialization...
> [ DPDK EAL parameters: perf -c 0x1 --file-prefix=spdk_pid7647 ]
> EAL: Detected 32 lcore(s)
> EAL: No free hugepages reported in hugepages-1048576kB
> Attached to NVMe Controller at 0000:ad:00.0 [144d:a802]
> Associating SAMSUNG MZQWV480HCGM (S1TRNYAFA00148      ) with lcore 0
> Initialization complete. Launching workers.
> Starting thread on core 0
> ========================================================
>       
>                                                                               
>        Latency(us)
> Device Information                                     :       IOPS      
> MB/s    Average        min        max
> SAMSUNG MZQWV480HCGM (S1TRNYAFA00148      ) from core 0:   13773.50     
> 53.80      72.59      60.20     243.82
> ========================================================
> Total                                                  :   13773.50     
> 53.80      72.59      60.20     243.82
>  
> Thanks for your help,
> Sreeni
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 3274 bytes --]

             reply	other threads:[~2017-11-14 21:44 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-14 21:44 Walker, Benjamin [this message]
  -- strict thread matches above, loose matches on Subject: below --
2017-11-15  0:50 [SPDK] Helloworld program fails during device initialization Luse, Paul E
2017-11-14 23:48 Sreeni Busam
2017-11-14 22:46 Luse, Paul E
2017-11-14 22:25 Sreeni Busam
2017-11-14 21:55 Sreeni Busam
2017-11-14 21:38 Sreeni Busam
2017-11-14 21:07 Luse, Paul E
2017-11-14 20:14 Sreeni Busam
2017-11-14 20:06 Verkamp, Daniel
2017-11-14 20:05 Sreeni Busam
2017-11-14 19:55 Luse, Paul E
2017-11-14 19:49 Sreeni Busam
2017-11-14 19:43 Sreeni Busam
2017-11-14 18:42 Sreeni Busam
2017-11-14 18:39 Sreeni Busam
2017-11-14 18:09 Sreeni Busam
2017-11-14 17:42 Verkamp, Daniel
2017-11-14 13:45 Luse, Paul E
2017-11-14  9:12 Wu, WenzhongX
2017-11-14  6:36 Sreeni Busam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1510695867.2168.39.camel@intel.com \
    --to=spdk@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.