* [PATCH 0/1] mlx4: mlx4_core failed to load
From: clsoto @ 2014-04-28 18:33 UTC
To: netdev; +Cc: clsoto, brking

This is for a case where mlx4_core fails to load.

 drivers/net/ethernet/mellanox/mlx4/main.c | 7 -------
 include/linux/mlx4/device.h               | 1 -
 2 files changed, 8 deletions(-)

Carol

* [PATCH 1/1] mlx4: mlx4_core failed to load if use_prio argument is used
From: clsoto @ 2014-04-28 18:33 UTC
To: netdev; +Cc: clsoto, brking

[-- Attachment #1: mlx4_load_fail_use_prio.patch --]
[-- Type: text/plain, Size: 3987 bytes --]

Loading the module with insmod mlx4_core.ko use_prio=1 fails and
produces this stack trace:

Call Trace:
[c0000000f777f1f0] [d0000000008b110c] .mlx4_bitmap_init+0x7c/0xe0 [mlx4_core]
[c0000000f777f270] [d0000000008ccca4] .mlx4_init_qp_table+0x1a4/0x400 [mlx4_core]
[c0000000f777f310] [d0000000008c1ba4] .mlx4_setup_hca+0x454/0x680 [mlx4_core]
[c0000000f777f3e0] [d0000000008c28fc] .__mlx4_init_one+0xb2c/0x11a0 [mlx4_core]
[c0000000f777f4e0] [c00000000042e6f0] .local_pci_probe+0x60/0xb0
[c0000000f777f570] [c00000000042e9d8] .pci_device_probe+0x198/0x1a0
[c0000000f777f620] [c0000000004cc730] .driver_probe_device+0xd0/0x450
[c0000000f777f6b0] [c0000000004ccc3c] .__driver_attach+0xfc/0x100
[c0000000f777f740] [c0000000004c99a4] .bus_for_each_dev+0x84/0xf0
[c0000000f777f7e0] [c0000000004cbf24] .driver_attach+0x24/0x40
[c0000000f777f850] [c0000000004cb8c8] .bus_add_driver+0x298/0x3b0
[c0000000f777f8f0] [c0000000004cd5bc] .driver_register+0x8c/0x170
[c0000000f777f970] [c00000000042e424] .__pci_register_driver+0x44/0x60
[c0000000f777f9e0] [d0000000008d7fc4] .mlx4_init+0x154/0x1b0 [mlx4_core]
[c0000000f777fa70] [c00000000000bfa4] .do_one_initcall+0x144/0x1f0
[c0000000f777fb60] [c000000000144a10] .load_module+0x1c00/0x21a0
[c0000000f777fd30] [c0000000001451b0] .SyS_finit_module+0xb0/0x100
[c0000000f777fe30] [c000000000009efc] syscall_exit+0x0/0x7c
Instruction dump:
7863e8c2 3929dbd8 7c6307b4 7d2918ae 7d290774 4bffffac 7888bfe3 39200000
40e2ffbc 3d42fff1 892ab182 69290001 <0b090000> 2fa90000 39200000 41feffa0
---[ end trace 09b5aa84365cea39 ]---
mlx4_core 0001:00:00.0: Failed to initialize queue pair table, aborting.
mlx4_core: probe of 0001:00:00.0 failed with error -12

Using the use_prio argument increases the number of reserved QPs, which
in turn makes the bitmap so large that kzalloc fails.

Signed-off-by: Carol Soto <clsoto@linux.vnet.ibm.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 7 -------
 include/linux/mlx4/device.h               | 1 -
 2 files changed, 8 deletions(-)

Index: b/drivers/net/ethernet/mellanox/mlx4/main.c
===================================================================
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -132,11 +132,6 @@ MODULE_PARM_DESC(log_num_vlan, "Log2 max
 /* Log2 max number of VLANs per ETH port (0-7) */
 #define MLX4_LOG_NUM_VLANS 7
 
-static bool use_prio;
-module_param_named(use_prio, use_prio, bool, 0444);
-MODULE_PARM_DESC(use_prio, "Enable steering by VLAN priority on ETH ports "
-		 "(0/1, default 0)");
-
 int log_mtts_per_seg = ilog2(MLX4_MTT_ENTRY_PER_SEG);
 module_param_named(log_mtts_per_seg, log_mtts_per_seg, int, 0444);
 MODULE_PARM_DESC(log_mtts_per_seg, "Log2 number of MTT entries per segment (1-7)");
@@ -296,7 +291,6 @@ static int mlx4_dev_cap(struct mlx4_dev
 
 	dev->caps.log_num_macs = log_num_mac;
 	dev->caps.log_num_vlans = MLX4_LOG_NUM_VLANS;
-	dev->caps.log_num_prios = use_prio ? 3 : 0;
 
 	for (i = 1; i <= dev->caps.num_ports; ++i) {
 		dev->caps.port_type[i] = MLX4_PORT_TYPE_NONE;
@@ -366,7 +360,6 @@ static int mlx4_dev_cap(struct mlx4_dev
 	dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_ADDR] =
 		(1 << dev->caps.log_num_macs) *
 		(1 << dev->caps.log_num_vlans) *
-		(1 << dev->caps.log_num_prios) *
 		dev->caps.num_ports;
 	dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] = MLX4_NUM_FEXCH;
 
Index: b/include/linux/mlx4/device.h
===================================================================
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -449,7 +449,6 @@ struct mlx4_caps {
 	int reserved_qps_base[MLX4_NUM_QP_REGION];
 	int log_num_macs;
 	int log_num_vlans;
-	int log_num_prios;
 	enum mlx4_port_type port_type[MLX4_MAX_PORTS + 1];
 	u8 supported_type[MLX4_MAX_PORTS + 1];
 	u8 suggested_type[MLX4_MAX_PORTS + 1];
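
Editor's sketch (not part of the patch): the expression touched in the
third hunk is a product of four factors, so use_prio=1 (log_num_prios = 3)
scales the reserved count for that region by 2^3 = 8. The standalone C
function below reproduces only that arithmetic. The name
fc_addr_reserved_qps is hypothetical, and the inputs log_num_macs = 7 and
num_ports = 2 are assumptions chosen for illustration; only
log_num_vlans = 7 (MLX4_LOG_NUM_VLANS) and log_num_prios = 3 or 0 come
from the diff.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the pre-patch expression
 *   (1 << log_num_macs) * (1 << log_num_vlans) *
 *   (1 << log_num_prios) * num_ports
 * from mlx4_dev_cap(). It is not a driver function.
 */
static uint32_t fc_addr_reserved_qps(unsigned int log_num_macs,
                                     unsigned int log_num_vlans,
                                     unsigned int log_num_prios,
                                     uint32_t num_ports)
{
    return (UINT32_C(1) << log_num_macs) *
           (UINT32_C(1) << log_num_vlans) *
           (UINT32_C(1) << log_num_prios) *
           num_ports;
}
```

Under those assumed inputs, fc_addr_reserved_qps(7, 7, 0, 2) gives 0x8000
and fc_addr_reserved_qps(7, 7, 3, 2) gives 0x40000. These figures cover a
single QP region, so totals across all reserved regions will differ.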

* Re: [PATCH 0/1] mlx4: mlx4_core failed to load
From: David Miller @ 2014-04-28 19:59 UTC
To: clsoto; +Cc: netdev, brking

From: clsoto@linux.vnet.ibm.com
Date: Mon, 28 Apr 2014 13:33:30 -0500

> This is for a case where mlx4_core fails to load.

You cannot just willy-nilly delete module parameters that you decide
you don't want to support any more.

Once you add a module parameter, you are stuck with it forever once
it makes it into a released kernel. It is a user visible interface.

I'm not applying this patch, you have to actually fix the bug rather
than wholesale remove the facility altogether.

* Re: [PATCH 0/1] mlx4: mlx4_core failed to load
From: Carol Soto @ 2014-05-13 15:06 UTC
To: David Miller; +Cc: netdev, brking

On 4/28/2014 2:59 PM, David Miller wrote:
> From: clsoto@linux.vnet.ibm.com
> Date: Mon, 28 Apr 2014 13:33:30 -0500
>
>> This is for a case where mlx4_core fails to load.
> You cannot just willy-nilly delete module parameters that you decide
> you don't want to support any more.
>
> Once you add a module parameter, you are stuck with it forever once
> it makes it into a released kernel. It is a user visible interface.
>
> I'm not applying this patch, you have to actually fix the bug rather
> than wholesale remove the facility altogether.

The problem is that when the use_prio argument is used, the number of
reserved QPs increases from 0x20000 to 0x90000. When execution reaches
mlx4_bitmap_init, the reserved_top argument is then much larger than the
num argument, so the size computed for the kzalloc is very large. The
num argument is the number of QPs that the adapter supports, so this
looks like a bug to me: with use_prio we cannot have more QPs reserved
than the number of QPs the adapter supports. That is why I took the path
of removing the argument in this patch. Any other suggestions?
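
Editor's sketch (not from the thread): one way the size computation can
become very large is unsigned wraparound. The code below assumes, without
driver source in this thread to confirm it, that the bitmap length is
derived as num - reserved_top in 32-bit unsigned arithmetic and that one
bit is allocated per entry in 64-bit words. The function names and the
value num = 0x80000 are hypothetical; only 0x20000 and 0x90000 are taken
from the message above.

```c
#include <stdint.h>

/* Assumed: usable length = num - reserved_top, computed as a u32. */
static uint32_t bitmap_len(uint32_t num, uint32_t reserved_top)
{
    /* Wraps modulo 2^32 when reserved_top > num. */
    return num - reserved_top;
}

/* Bytes needed for one bit per entry, rounded up to whole 64-bit words. */
static uint64_t bitmap_bytes(uint32_t num, uint32_t reserved_top)
{
    uint64_t bits = bitmap_len(num, reserved_top);
    uint64_t words = (bits + 63) / 64;

    return words * 8;
}
```

With the assumed num = 0x80000, reserved_top = 0x20000 needs 49152 bytes,
while reserved_top = 0x90000 wraps to a length of 0xFFFF0000 bits, roughly
512 MiB. An allocation of that size would plausibly fail, which would be
consistent with the -12 (ENOMEM) probe error in the patch description.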

* Re: [PATCH 0/1] mlx4: mlx4_core failed to load
From: David Miller @ 2014-05-13 16:32 UTC
To: clsoto; +Cc: netdev, brking

From: Carol Soto <clsoto@linux.vnet.ibm.com>
Date: Tue, 13 May 2014 10:06:54 -0500

>
> On 4/28/2014 2:59 PM, David Miller wrote:
>> From: clsoto@linux.vnet.ibm.com
>> Date: Mon, 28 Apr 2014 13:33:30 -0500
>>
>>> This is for a case where mlx4_core fails to load.
>> You cannot just willy-nilly delete module parameters that you decide
>> you don't want to support any more.
>>
>> Once you add a module parameter, you are stuck with it forever once
>> it makes it into a released kernel. It is a user visible interface.
>>
>> I'm not applying this patch, you have to actually fix the bug rather
>> than wholesale remove the facility altogether.
>
> The problem is that when the use_prio argument is used, the number of
> reserved QPs increases from 0x20000 to 0x90000. When execution reaches
> mlx4_bitmap_init, the reserved_top argument is then much larger than the
> num argument, so the size computed for the kzalloc is very large. The
> num argument is the number of QPs that the adapter supports, so this
> looks like a bug to me: with use_prio we cannot have more QPs reserved
> than the number of QPs the adapter supports. That is why I took the path
> of removing the argument in this patch. Any other suggestions?

It is not my job to fix bugs in your driver.

But it is my job to make sure you do not break things that are
user visible, and that means you cannot delete module parameters
that are "too difficult to fix".

You should have considered more carefully the semantics of this
module option when it was added.

* Re: [PATCH 0/1] mlx4: mlx4_core failed to load
From: Carol Soto @ 2014-05-13 18:14 UTC
To: David Miller; +Cc: netdev, brking

On 5/13/2014 11:32 AM, David Miller wrote:
> From: Carol Soto <clsoto@linux.vnet.ibm.com>
> Date: Tue, 13 May 2014 10:06:54 -0500
>
>> On 4/28/2014 2:59 PM, David Miller wrote:
>>> From: clsoto@linux.vnet.ibm.com
>>> Date: Mon, 28 Apr 2014 13:33:30 -0500
>>>
>>>> This is for a case where mlx4_core fails to load.
>>> You cannot just willy-nilly delete module parameters that you decide
>>> you don't want to support any more.
>>>
>>> Once you add a module parameter, you are stuck with it forever once
>>> it makes it into a released kernel. It is a user visible interface.
>>>
>>> I'm not applying this patch, you have to actually fix the bug rather
>>> than wholesale remove the facility altogether.
>> The problem is that when the use_prio argument is used, the number of
>> reserved QPs increases from 0x20000 to 0x90000. When execution reaches
>> mlx4_bitmap_init, the reserved_top argument is then much larger than the
>> num argument, so the size computed for the kzalloc is very large. The
>> num argument is the number of QPs that the adapter supports, so this
>> looks like a bug to me: with use_prio we cannot have more QPs reserved
>> than the number of QPs the adapter supports. That is why I took the path
>> of removing the argument in this patch. Any other suggestions?
> It is not my job to fix bugs in your driver.
>
> But it is my job to make sure you do not break things that are
> user visible, and that means you cannot delete module parameters
> that are "too difficult to fix".
>
> You should have considered more carefully the semantics of this
> module option when it was added.

This is not my driver. I do not know how this argument made it upstream
in the first place. It may have been functional at some point, but I do
not have that information. That may be a question for Mellanox. From
debugging the code on my system, and based on my previous comment, I do
not see how this argument is useful. Maybe we need Mellanox to confirm
what this argument is used for and whether it is needed.