From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marc Sune
Subject: Memory corruption in librte_ether?
Date: Fri, 17 Oct 2014 23:16:47 +0200
Message-ID: <5441873F.90500@bisdn.de>
List-Id: patches and discussions about DPDK

Hi all,

I was rebasing the KNI mempool v4 patch (I have it finalised, but wanted to check) onto the latest master HEAD (075e064089e1c2b6899db58c69be1a387eb5ffa7) when I ran into problems with the current KNI example using em interfaces in a VM. I then switched to master's HEAD and retried (so without the KNI mempool patch!) and got the *same behaviour*. The behaviour listed here is with master HEAD, so it has nothing to do with the patch I am working on.

The *VM*, emulated with qemu, has 4 e1000 interfaces attached to several bridges. qemu version 1.1.2, running on Debian 7 64-bit. With this setup I get the error:

(gdb) r
Starting program: /home/marc/dpdk_vanilla/examples/kni/build/kni -c 0x3 -n 2 -- -p 0x3 -P --config=\(0,1,1,1\),\(1,0,0,0\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Support maximum 64 logical core(s) by configuration.
EAL: Detected 2 lcore(s)
EAL: Setting up memory...
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7ffff6e00000 (size = 0x200000)
EAL: Ask a virtual area of 0x800000 bytes
EAL: Virtual area found at 0x7ffff6400000 (size = 0x800000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7ffff5e00000 (size = 0x400000)
EAL: Ask a virtual area of 0x17000000 bytes
EAL: Virtual area found at 0x7fffdec00000 (size = 0x17000000)
EAL: Ask a virtual area of 0x1e00000 bytes
EAL: Virtual area found at 0x7fffdcc00000 (size = 0x1e00000)
EAL: Ask a virtual area of 0x1400000 bytes
EAL: Virtual area found at 0x7fffdb600000 (size = 0x1400000)
EAL: Ask a virtual area of 0x800000 bytes
EAL: Virtual area found at 0x7fffdac00000 (size = 0x800000)
EAL: Ask a virtual area of 0x2000000 bytes
EAL: Virtual area found at 0x7fffd8a00000 (size = 0x2000000)
EAL: Ask a virtual area of 0x2c00000 bytes
EAL: Virtual area found at 0x7fffd5c00000 (size = 0x2c00000)
EAL: Ask a virtual area of 0x7c00000 bytes
EAL: Virtual area found at 0x7fffcde00000 (size = 0x7c00000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7fffcd800000 (size = 0x400000)
EAL: Ask a virtual area of 0xc00000 bytes
EAL: Virtual area found at 0x7fffcca00000 (size = 0xc00000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7fffcc400000 (size = 0x400000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7fffcc000000 (size = 0x200000)
EAL: Requesting 331 pages of size 2MB from socket 0
[New Thread 0x7fffcbfff700 (LWP 19279)]
EAL: TSC frequency is ~2494343 KHz
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: Master core 0 is ready (tid=f7ff0800)
[New Thread 0x7fffcb7fc700 (LWP 19280)]
EAL: Core 1 is ready (tid=cb7fc700)
EAL: PCI device 0000:00:03.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: 0000:00:03.0 not managed by UIO driver, skipping
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f9a000
PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:07.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f7a000
PMD: eth_em_dev_init(): port_id 1 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:08.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f5a000
PMD: eth_em_dev_init(): port_id 2 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:09.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f3a000
PMD: eth_em_dev_init(): port_id 3 vendorID=0x8086 deviceID=0x100e
APP: Port ID: 0
APP: Rx lcore ID: 1, Tx lcore ID: 1
APP: Kernel thread lcore ID: 1
APP: Port ID: 1
APP: Rx lcore ID: 0, Tx lcore ID: 0
APP: Kernel thread lcore ID: 0
APP: Initialising port 0 ...
PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e7d00 hw_ring=0x7ffff6fdaac0 dma_addr=0x5daac0
PMD: eth_em_tx_queue_setup(): sw_ring=0x7fffcd4e5c00 hw_ring=0x7ffff6feaac0 dma_addr=0x5eaac0
PMD: eth_em_start(): <<
KNI: pci: 00:06:00 8086:100e
APP: Initialising port 1 ...
PMD: eth_em_rx_queue_setup(): drop_en functionality not supported by device
EAL: Error - exiting with code: 1
  Cause: Could not setup up RX queue for port1 (-22)
[Thread 0x7fffcb7fc700 (LWP 19280) exited]
[Thread 0x7ffff7ff0800 (LWP 19278) exited]

The default rx_conf in librte_pmd_e1000/igb_ethdev.c seems OK, setting drop_en to 0.
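
To make the failure above concrete, here is a minimal stand-alone sketch of the kind of check that appears to produce the -22. It is only a model: model_rxconf and model_rx_queue_setup are hypothetical names, not DPDK symbols, the struct mirrors just the rx_conf fields relevant here, and it assumes the PMD rejects any non-zero rx_drop_en with -EINVAL, which is what the log message and the return code suggest.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the rx_conf fields discussed in this mail. */
struct model_rxconf {
    uint16_t rx_free_thresh;
    uint8_t  rx_drop_en;
    uint8_t  rx_deferred_start;
};

/*
 * Assumed behaviour: the device has no drop_en support, so any
 * non-zero rx_drop_en is rejected with -EINVAL (-22 on Linux).
 */
static int model_rx_queue_setup(const struct model_rxconf *conf)
{
    assert(conf != NULL);
    if (conf->rx_drop_en != 0) {
        fprintf(stderr, "drop_en functionality not supported by device\n");
        return -EINVAL;
    }
    return 0;
}
```

A zeroed configuration passes this check. Any non-zero rx_drop_en, whether set on purpose or left over from stale memory, is rejected with -EINVAL.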

Debugging the e1000 PMD (the 4 NICs are emulating the exact same device):

marc@dpdk:~/dpdk/lib$ git diff
diff --git a/lib/librte_pmd_e1000/Makefile b/lib/librte_pmd_e1000/Makefile
index 14bc4a2..e50b715 100644
--- a/lib/librte_pmd_e1000/Makefile
+++ b/lib/librte_pmd_e1000/Makefile
@@ -36,7 +36,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 #
 LIB = librte_pmd_e1000.a
-CFLAGS += -O3
+CFLAGS += -g -O0
 CFLAGS += $(WERROR_FLAGS)

it seems something is wrong:

First iface (PCI 0:6.0):

(gdb) print dev->data->name
$4 = "0:6.0", '\000'
(gdb) print *rx_conf
$5 = {rx_thresh = {pthresh = 0 '\000', hthresh = 0 '\000', wthresh = 0 '\000'}, rx_free_thresh = 0, rx_drop_en = 0 '\000', rx_deferred_start = 0 '\000'}
(gdb)

Second iface (PCI 0:7.0):

(gdb) print dev->data->name
$6 = "0:7.0", '\000'
(gdb) print *rx_conf
$7 = {rx_thresh = {pthresh = 0 '\000', hthresh = 0 '\000', wthresh = 0 '\000'}, rx_free_thresh = 33088, rx_drop_en = 176 '\260', rx_deferred_start = 44 ','}

Note that the fields from rx_free_thresh onwards have polluted values.

However, when also adding -g -O0 in ethdev:

marc@dpdk:~/dpdk/lib$ git diff
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index b310f8b..ec385ef 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -36,7 +36,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 #
 LIB = libethdev.a
-CFLAGS += -O3
+CFLAGS += -g -O0
 CFLAGS += $(WERROR_FLAGS)
 SRCS-y += rte_ethdev.c
diff --git a/lib/librte_pmd_e1000/Makefile b/lib/librte_pmd_e1000/Makefile
index 14bc4a2..e50b715 100644
--- a/lib/librte_pmd_e1000/Makefile
+++ b/lib/librte_pmd_e1000/Makefile
@@ -36,7 +36,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 #
 LIB = librte_pmd_e1000.a
-CFLAGS += -O3
+CFLAGS += -g -O0
 CFLAGS += $(WERROR_FLAGS)
 ifeq ($(CC), icc)

Now the rx queue has been set up correctly (memory corruption!),
so the rx_conf appears to be OK, although now tx_conf seems wrong:

(gdb) r
Starting program: /home/marc/dpdk_vanilla/examples/kni/build/kni -c 0x3 -n 2 -- -p 0x3 -P --config=\(0,1,1,1\),\(1,0,0,0\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Support maximum 64 logical core(s) by configuration.
EAL: Detected 2 lcore(s)
EAL: Setting up memory...
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7ffff6e00000 (size = 0x200000)
EAL: Ask a virtual area of 0x800000 bytes
EAL: Virtual area found at 0x7ffff6400000 (size = 0x800000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7ffff5e00000 (size = 0x400000)
EAL: Ask a virtual area of 0x17000000 bytes
EAL: Virtual area found at 0x7fffdec00000 (size = 0x17000000)
EAL: Ask a virtual area of 0x1e00000 bytes
EAL: Virtual area found at 0x7fffdcc00000 (size = 0x1e00000)
EAL: Ask a virtual area of 0x1400000 bytes
EAL: Virtual area found at 0x7fffdb600000 (size = 0x1400000)
EAL: Ask a virtual area of 0x800000 bytes
EAL: Virtual area found at 0x7fffdac00000 (size = 0x800000)
EAL: Ask a virtual area of 0x2000000 bytes
EAL: Virtual area found at 0x7fffd8a00000 (size = 0x2000000)
EAL: Ask a virtual area of 0x2c00000 bytes
EAL: Virtual area found at 0x7fffd5c00000 (size = 0x2c00000)
EAL: Ask a virtual area of 0x7c00000 bytes
EAL: Virtual area found at 0x7fffcde00000 (size = 0x7c00000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7fffcd800000 (size = 0x400000)
EAL: Ask a virtual area of 0xc00000 bytes
EAL: Virtual area found at 0x7fffcca00000 (size = 0xc00000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7fffcc400000 (size = 0x400000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7fffcc000000 (size = 0x200000)
EAL: Requesting 331 pages of size 2MB from socket 0
[New Thread 0x7fffcbfff700 (LWP 22143)]
EAL: TSC frequency is ~2494343 KHz
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: Master core 0 is ready (tid=f7ff0800)
[New Thread 0x7fffcb7fc700 (LWP 22144)]
EAL: Core 1 is ready (tid=cb7fc700)
EAL: PCI device 0000:00:03.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: 0000:00:03.0 not managed by UIO driver, skipping
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f9a000
PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:07.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f7a000
PMD: eth_em_dev_init(): port_id 1 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:08.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f5a000
PMD: eth_em_dev_init(): port_id 2 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:09.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f3a000
PMD: eth_em_dev_init(): port_id 3 vendorID=0x8086 deviceID=0x100e
APP: Port ID: 0
APP: Rx lcore ID: 1, Tx lcore ID: 1
APP: Kernel thread lcore ID: 1
APP: Port ID: 1
APP: Rx lcore ID: 0, Tx lcore ID: 0
APP: Kernel thread lcore ID: 0
APP: Initialising port 0 ...
PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e7d00 hw_ring=0x7ffff6fdaac0 dma_addr=0x5daac0
PMD: eth_em_tx_queue_setup(): sw_ring=0x7fffcd4e5c00 hw_ring=0x7ffff6feaac0 dma_addr=0x5eaac0
PMD: eth_em_start(): <<
KNI: pci: 00:06:00 8086:100e
APP: Initialising port 1 ...
PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e5600 hw_ring=0x7fffcd50c1c0 dma_addr=0x2cb0c1c0
PMD: eth_em_tx_queue_setup(): tx_free_thresh must be less than the number of TX descriptors minus 3.
(tx_free_thresh=65535 port=1 queue=0)
EAL: Error - exiting with code: 1
  Cause: Could not setup up TX queue for port1 (-22)
[Thread 0x7fffcbfff700 (LWP 22143) exited]
[Thread 0x7ffff7ff0800 (LWP 22140) exited]
[Inferior 1 (process 22140) exited with code 01]

Debugging it:

PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e7d00 hw_ring=0x7ffff6fdaac0 dma_addr=0x5daac0

Breakpoint 1, eth_em_tx_queue_setup (dev=0x796420, queue_idx=0, nb_desc=512, socket_id=4294967295, tx_conf=0x7fffffffe39c)
    at /home/marc/dpdk_vanilla/lib/librte_pmd_e1000/em_rxtx.c:1208
1208 hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
(gdb) print dev->data->name
$1 = "0:6.0", '\000'
(gdb) print tx_conf
$2 = (const struct rte_eth_txconf *) 0x7fffffffe39c
(gdb) print *tx_conf
$3 = {tx_thresh = {pthresh = 0 '\000', hthresh = 0 '\000', wthresh = 0 '\000'}, tx_rs_thresh = 0, tx_free_thresh = 0, txq_flags = 0, tx_deferred_start = 0 '\000'}
(gdb) c
Continuing.
PMD: eth_em_tx_queue_setup(): sw_ring=0x7fffcd4e5c00 hw_ring=0x7ffff6feaac0 dma_addr=0x5eaac0
PMD: eth_em_start(): <<
KNI: pci: 00:06:00 8086:100e
APP: Initialising port 1 ...
PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e5600 hw_ring=0x7fffcd50c1c0 dma_addr=0x2cb0c1c0

Breakpoint 1, eth_em_tx_queue_setup (dev=0x796460, queue_idx=0, nb_desc=512, socket_id=4294967295, tx_conf=0x7fffffffe39c)
    at /home/marc/dpdk_vanilla/lib/librte_pmd_e1000/em_rxtx.c:1208
1208 hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
(gdb) print dev->data->name
$4 = "0:7.0", '\000'
(gdb) print *tx_conf
$5 = {tx_thresh = {pthresh = 0 '\000', hthresh = 0 '\000', wthresh = 0 '\000'}, tx_rs_thresh = 58608, tx_free_thresh = 65535, txq_flags = 32767, tx_deferred_start = 0 '\000'}

The KNI example runs *perfectly* in the VM with the same launch parameters on v1.7.1, and seems to work fine up to 27b31ee33fa5e7cc9a086c690b98ed8e1a153c6a.
So the commit that breaks it (that is, the commit that breaks the example, which is not necessarily the commit that is wrong) seems to be:

commit 81f7ecd934372fc9f592d1322f8eff86350fa4f5
Author: Pablo de Lara
Date:   Wed Oct 1 10:49:05 2014 +0100

    examples: use factorized default Rx/Tx configuration

    For apps that were using default rte_eth_rxconf and rte_eth_txconf
    structures, these have been removed and now they are obtained by
    calling rte_eth_dev_info_get, just before setting up RX/TX queues.

    Signed-off-by: Pablo de Lara
    Acked-by: David Marchand

This seems to indicate that rte_eth_dev_info_get() is somehow corrupting memory(?), but I haven't figured out the problem (yet). I suspect one of:

commit fbde27f19ab8f1d386868275bd8c016e693cf073
Author: Pablo de Lara
Date:   Wed Oct 1 10:49:04 2014 +0100

    ethdev: get default Rx/Tx configuration from dev info

    Many sample apps use duplicated code to set rte_eth_txconf and
    rte_eth_rxconf structures.
    This patch allows the user to get a default optimal RX/TX configuration
    through rte_eth_dev_info get, and still any parameters may be
    tweaked as wished, before setting up queues.
    Besides, if a NULL pointer is passed to rte_eth_rx_queue_setup or
    rte_eth_tx_queue_setup, these functions get internally the default
    RX/TX configuration for the user.

    Signed-off-by: Pablo de Lara
    Reviewed-by: Bruce Richardson
    Acked-by: David Marchand
    [Thomas: split patch]

commit a30268e9a2d0618902e8cf96b90b27db4fb02d54
Author: Pablo de Lara
Date:   Wed Oct 1 10:49:03 2014 +0100

    ethdev: reset whole dev info structure before filling

    To guarantee that RX/TX configuration structures are reseted before
    modifying them, plus the other dev info fields, dev info structure
    is zeroed beforehand.

    Signed-off-by: Pablo de Lara
    Acked-by: David Marchand

Can anyone confirm it?

Marc

P.S. Has anyone managed to run a DPDK app with valgrind?
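
For anyone trying to reproduce this, here is a minimal stand-alone sketch of the TX-side rule that rejects the polluted tx_conf seen on port 1. It is a model under stated assumptions, not the PMD code: model_txconf and model_tx_conf_check are hypothetical names, the rule is taken only from the error text ("tx_free_thresh must be less than the number of TX descriptors minus 3"), and the guard for nb_desc <= 3 is an addition of this sketch. Any defaulting the driver may apply when tx_free_thresh is 0 is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the tx_conf fields printed in the gdb session. */
struct model_txconf {
    uint16_t tx_rs_thresh;
    uint16_t tx_free_thresh;
    uint32_t txq_flags;
};

/*
 * Assumed rule, taken from the error text only: tx_free_thresh must be
 * less than the number of TX descriptors minus 3.
 * Returns 0 when the value is acceptable, -EINVAL otherwise.
 */
static int model_tx_conf_check(const struct model_txconf *conf, uint16_t nb_desc)
{
    assert(conf != NULL);
    if (nb_desc <= 3)                 /* guard added by this sketch */
        return -EINVAL;
    if (conf->tx_free_thresh >= (uint16_t)(nb_desc - 3))
        return -EINVAL;
    return 0;
}
```

With nb_desc = 512, the values printed for port 0 (tx_free_thresh = 0) pass this check, while the values printed for port 1 (tx_free_thresh = 65535) fail with -EINVAL, which matches the -22 in the log.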