* [PATCH 0/1] mlx4: mlx4_core failed to load
@ 2014-04-28 18:33 clsoto
2014-04-28 18:33 ` [PATCH 1/1] mlx4: mlx4_core failed to load if use_prio argument is used clsoto
2014-04-28 19:59 ` [PATCH 0/1] mlx4: mlx4_core failed to load David Miller
0 siblings, 2 replies; 6+ messages in thread
From: clsoto @ 2014-04-28 18:33 UTC (permalink / raw)
To: netdev; +Cc: clsoto, brking
This is for a case where mlx4_core fails to load.
drivers/net/ethernet/mellanox/mlx4/main.c | 7 -------
include/linux/mlx4/device.h | 1 -
2 files changed, 8 deletions(-)
Carol
--
* [PATCH 1/1] mlx4: mlx4_core failed to load if use_prio argument is used
2014-04-28 18:33 [PATCH 0/1] mlx4: mlx4_core failed to load clsoto
@ 2014-04-28 18:33 ` clsoto
2014-04-28 19:59 ` [PATCH 0/1] mlx4: mlx4_core failed to load David Miller
1 sibling, 0 replies; 6+ messages in thread
From: clsoto @ 2014-04-28 18:33 UTC (permalink / raw)
To: netdev; +Cc: clsoto, brking
[-- Attachment #1: mlx4_load_fail_use_prio.patch --]
[-- Type: text/plain, Size: 3987 bytes --]
Running "insmod mlx4_core.ko use_prio=1" fails to load the module and
produces this stack trace:
Call Trace:
[c0000000f777f1f0] [d0000000008b110c] .mlx4_bitmap_init+0x7c/0xe0 [mlx4_core]
[c0000000f777f270] [d0000000008ccca4] .mlx4_init_qp_table+0x1a4/0x400 [mlx4_core]
[c0000000f777f310] [d0000000008c1ba4] .mlx4_setup_hca+0x454/0x680 [mlx4_core]
[c0000000f777f3e0] [d0000000008c28fc] .__mlx4_init_one+0xb2c/0x11a0 [mlx4_core]
[c0000000f777f4e0] [c00000000042e6f0] .local_pci_probe+0x60/0xb0
[c0000000f777f570] [c00000000042e9d8] .pci_device_probe+0x198/0x1a0
[c0000000f777f620] [c0000000004cc730] .driver_probe_device+0xd0/0x450
[c0000000f777f6b0] [c0000000004ccc3c] .__driver_attach+0xfc/0x100
[c0000000f777f740] [c0000000004c99a4] .bus_for_each_dev+0x84/0xf0
[c0000000f777f7e0] [c0000000004cbf24] .driver_attach+0x24/0x40
[c0000000f777f850] [c0000000004cb8c8] .bus_add_driver+0x298/0x3b0
[c0000000f777f8f0] [c0000000004cd5bc] .driver_register+0x8c/0x170
[c0000000f777f970] [c00000000042e424] .__pci_register_driver+0x44/0x60
[c0000000f777f9e0] [d0000000008d7fc4] .mlx4_init+0x154/0x1b0 [mlx4_core]
[c0000000f777fa70] [c00000000000bfa4] .do_one_initcall+0x144/0x1f0
[c0000000f777fb60] [c000000000144a10] .load_module+0x1c00/0x21a0
[c0000000f777fd30] [c0000000001451b0] .SyS_finit_module+0xb0/0x100
[c0000000f777fe30] [c000000000009efc] syscall_exit+0x0/0x7c
Instruction dump:
7863e8c2 3929dbd8 7c6307b4 7d2918ae 7d290774 4bffffac 7888bfe3 39200000
40e2ffbc 3d42fff1 892ab182 69290001 <0b090000> 2fa90000 39200000 41feffa0
---[ end trace 09b5aa84365cea39 ]---
mlx4_core 0001:00:00.0: Failed to initialize queue pair table, aborting.
mlx4_core: probe of 0001:00:00.0 failed with error -12
Using the use_prio argument increases the number of reserved QPs, which in
turn makes the bitmap so large that the kzalloc for it fails.
Signed-off-by: Carol Soto <clsoto@linux.vnet.ibm.com>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 7 -------
include/linux/mlx4/device.h | 1 -
2 files changed, 8 deletions(-)
Index: b/drivers/net/ethernet/mellanox/mlx4/main.c
===================================================================
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -132,11 +132,6 @@ MODULE_PARM_DESC(log_num_vlan, "Log2 max
/* Log2 max number of VLANs per ETH port (0-7) */
#define MLX4_LOG_NUM_VLANS 7
-static bool use_prio;
-module_param_named(use_prio, use_prio, bool, 0444);
-MODULE_PARM_DESC(use_prio, "Enable steering by VLAN priority on ETH ports "
- "(0/1, default 0)");
-
int log_mtts_per_seg = ilog2(MLX4_MTT_ENTRY_PER_SEG);
module_param_named(log_mtts_per_seg, log_mtts_per_seg, int, 0444);
MODULE_PARM_DESC(log_mtts_per_seg, "Log2 number of MTT entries per segment (1-7)");
@@ -296,7 +291,6 @@ static int mlx4_dev_cap(struct mlx4_dev
dev->caps.log_num_macs = log_num_mac;
dev->caps.log_num_vlans = MLX4_LOG_NUM_VLANS;
- dev->caps.log_num_prios = use_prio ? 3 : 0;
for (i = 1; i <= dev->caps.num_ports; ++i) {
dev->caps.port_type[i] = MLX4_PORT_TYPE_NONE;
@@ -366,7 +360,6 @@ static int mlx4_dev_cap(struct mlx4_dev
dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_ADDR] =
(1 << dev->caps.log_num_macs) *
(1 << dev->caps.log_num_vlans) *
- (1 << dev->caps.log_num_prios) *
dev->caps.num_ports;
dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] = MLX4_NUM_FEXCH;
Index: b/include/linux/mlx4/device.h
===================================================================
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -449,7 +449,6 @@ struct mlx4_caps {
int reserved_qps_base[MLX4_NUM_QP_REGION];
int log_num_macs;
int log_num_vlans;
- int log_num_prios;
enum mlx4_port_type port_type[MLX4_MAX_PORTS + 1];
u8 supported_type[MLX4_MAX_PORTS + 1];
u8 suggested_type[MLX4_MAX_PORTS + 1];
--
* Re: [PATCH 0/1] mlx4: mlx4_core failed to load
2014-04-28 18:33 [PATCH 0/1] mlx4: mlx4_core failed to load clsoto
2014-04-28 18:33 ` [PATCH 1/1] mlx4: mlx4_core failed to load if use_prio argument is used clsoto
@ 2014-04-28 19:59 ` David Miller
2014-05-13 15:06 ` Carol Soto
1 sibling, 1 reply; 6+ messages in thread
From: David Miller @ 2014-04-28 19:59 UTC (permalink / raw)
To: clsoto; +Cc: netdev, brking
From: clsoto@linux.vnet.ibm.com
Date: Mon, 28 Apr 2014 13:33:30 -0500
> This is for a case where mlx4_core fails to load.
You cannot just willy-nilly delete module parameters that you decide
you don't want to support any more.
Once you add a module parameter, you are stuck with it forever once
it makes it into a released kernel. It is a user visible interface.
I'm not applying this patch, you have to actually fix the bug rather
than wholesale remove the facility altogether.
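One way to read "fix the bug" here is to keep use_prio but decline to honour it when the reservation it implies cannot fit in the device's QP space. The sketch below only illustrates that idea in plain userspace C. It is not driver code and not something proposed in this thread: pick_log_num_prios() and all of its parameters are hypothetical names, and silently falling back to 0 is an assumed policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of mlx4: decide which priority exponent
 * to use given how many QPs the device has.
 *
 *   other_reserved  reserved QPs that do not depend on use_prio
 *   fc_addr_qps     size of the FC_ADDR region without use_prio
 *   num_qps         total number of QPs the device supports
 */
static unsigned pick_log_num_prios(bool use_prio, uint64_t other_reserved,
                                   uint64_t fc_addr_qps, uint64_t num_qps)
{
    /* Same mapping as the line the patch removes: use_prio ? 3 : 0. */
    const unsigned requested = use_prio ? 3 : 0;

    assert(num_qps > 0);

    /* The FC_ADDR reservation scales by 2^requested; the rest is fixed. */
    const uint64_t total = other_reserved + (fc_addr_qps << requested);

    /* Assumed policy: ignore the request if no QPs would be left over. */
    return total < num_qps ? requested : 0;
}
```

For example, pick_log_num_prios(true, 0x10000, 0x10000, 0x40000) returns 0 because the 0x90000 QPs it would reserve exceed the 0x40000 total, while the same call with num_qps = 0x100000 returns 3. All of these figures are illustrative.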
* Re: [PATCH 0/1] mlx4: mlx4_core failed to load
2014-04-28 19:59 ` [PATCH 0/1] mlx4: mlx4_core failed to load David Miller
@ 2014-05-13 15:06 ` Carol Soto
2014-05-13 16:32 ` David Miller
0 siblings, 1 reply; 6+ messages in thread
From: Carol Soto @ 2014-05-13 15:06 UTC (permalink / raw)
To: David Miller; +Cc: netdev, brking
On 4/28/2014 2:59 PM, David Miller wrote:
> From: clsoto@linux.vnet.ibm.com
> Date: Mon, 28 Apr 2014 13:33:30 -0500
>
>> This is for a case where mlx4_core fails to load.
> You cannot just willy-nilly delete module parameters that you decide
> you don't want to support any more.
>
> Once you add a module parameter, you are stuck with it forever once
> it makes it into a released kernel. It is a user visible interface.
>
> I'm not applying this patch, you have to actually fix the bug rather
> than wholesale remove the facility altogether.
The problem here is that when the use_prio argument is used, the number
of reserved QPs increases from 0x20000 to 0x90000. When it reaches
mlx4_bitmap_init, the reserved_top argument is then much larger than the
num argument, so the size computed for the kzalloc becomes very large.
The num argument is the number of QPs that the adapter supports, so this
looks like a bug to me: using use_prio should not reserve more QPs than
the adapter supports. That is why I went down the path of removing the
argument in this patch. Any other suggestions?
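The numbers above can be checked with a small sketch. Only the FC_ADDR region in the patch depends on log_num_prios, and use_prio sets that exponent to 3, so the region grows by a factor of 8. A jump of 0x70000 therefore implies a region of 0x10000 growing to 0x80000. The code below is plain userspace C, not driver code. log_macs = 8 and num_ports = 2 are values picked only because they reproduce that size, num = 0x40000 is hypothetical, and the assumption that the bitmap length is num - reserved_top in unsigned 32-bit arithmetic over 64-bit words is inferred from the description above rather than quoted from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * FC_ADDR reserved-QP count, following the expression in the patch:
 * 2^log_macs * 2^log_vlans * 2^log_prios * num_ports.
 */
static uint32_t fc_addr_region(unsigned log_macs, unsigned log_vlans,
                               unsigned log_prios, uint32_t num_ports)
{
    /* Keep the combined shift well inside 32 bits. */
    assert(log_macs + log_vlans + log_prios < 31);
    return (1u << log_macs) * (1u << log_vlans) * (1u << log_prios) * num_ports;
}

/*
 * Assumed bitmap length in bits: num - reserved_top as a 32-bit unsigned
 * value. When reserved_top > num the subtraction wraps modulo 2^32.
 */
static uint32_t bitmap_bits(uint32_t num, uint32_t reserved_top)
{
    return num - reserved_top;
}

/* Bytes needed for that many bits, rounded up to whole 64-bit words. */
static uint64_t bitmap_bytes(uint32_t bits)
{
    const uint64_t bits_per_word = 64;
    return ((uint64_t)bits + bits_per_word - 1) / bits_per_word * 8;
}
```

With these inputs, fc_addr_region(8, 7, 0, 2) is 0x10000 and fc_addr_region(8, 7, 3, 2) is 0x80000. bitmap_bits(0x40000, 0x20000) is 0x20000 bits, or 0x4000 bytes, whereas bitmap_bits(0x40000, 0x90000) wraps to 0xFFFB0000 bits, for which bitmap_bytes() returns 0x1FFF6000 (roughly 512 MiB). An allocation request of that size would be expected to fail.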
* Re: [PATCH 0/1] mlx4: mlx4_core failed to load
2014-05-13 15:06 ` Carol Soto
@ 2014-05-13 16:32 ` David Miller
2014-05-13 18:14 ` Carol Soto
0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2014-05-13 16:32 UTC (permalink / raw)
To: clsoto; +Cc: netdev, brking
From: Carol Soto <clsoto@linux.vnet.ibm.com>
Date: Tue, 13 May 2014 10:06:54 -0500
>
> On 4/28/2014 2:59 PM, David Miller wrote:
>> From: clsoto@linux.vnet.ibm.com
>> Date: Mon, 28 Apr 2014 13:33:30 -0500
>>
>>> This is for a case where mlx4_core fails to load.
>> You cannot just willy-nilly delete module parameters that you decide
>> you don't want to support any more.
>>
>> Once you add a module parameter, you are stuck with it forever once
>> it makes it into a released kernel. It is a user visible interface.
>>
>> I'm not applying this patch, you have to actually fix the bug rather
>> than wholesale remove the facility altogether.
>
> The problem here is that when the use_prio argument is used, the
> number of reserved QPs increases from 0x20000 to 0x90000. When it
> reaches mlx4_bitmap_init, the reserved_top argument is then much
> larger than the num argument, so the size computed for the kzalloc
> becomes very large. The num argument is the number of QPs that the
> adapter supports, so this looks like a bug to me: using use_prio
> should not reserve more QPs than the adapter supports. That is why I
> went down the path of removing the argument in this patch. Any other
> suggestions?
It is not my job to fix bugs in your driver.
But it is my job to make sure you do not break things that are
user visible, and that means you cannot delete module parameters
that are "too difficult to fix".
You should have considered more carefully the semantics of this
module option when it was added.
* Re: [PATCH 0/1] mlx4: mlx4_core failed to load
2014-05-13 16:32 ` David Miller
@ 2014-05-13 18:14 ` Carol Soto
0 siblings, 0 replies; 6+ messages in thread
From: Carol Soto @ 2014-05-13 18:14 UTC (permalink / raw)
To: David Miller; +Cc: netdev, brking
On 5/13/2014 11:32 AM, David Miller wrote:
> From: Carol Soto <clsoto@linux.vnet.ibm.com>
> Date: Tue, 13 May 2014 10:06:54 -0500
>
>> On 4/28/2014 2:59 PM, David Miller wrote:
>>> From: clsoto@linux.vnet.ibm.com
>>> Date: Mon, 28 Apr 2014 13:33:30 -0500
>>>
>>>> This is for a case where mlx4_core fails to load.
>>> You cannot just willy-nilly delete module parameters that you decide
>>> you don't want to support any more.
>>>
>>> Once you add a module parameter, you are stuck with it forever once
>>> it makes it into a released kernel. It is a user visible interface.
>>>
>>> I'm not applying this patch, you have to actually fix the bug rather
>>> than wholesale remove the facility altogether.
>> The problem here is that when the use_prio argument is used, the
>> number of reserved QPs increases from 0x20000 to 0x90000. When it
>> reaches mlx4_bitmap_init, the reserved_top argument is then much
>> larger than the num argument, so the size computed for the kzalloc
>> becomes very large. The num argument is the number of QPs that the
>> adapter supports, so this looks like a bug to me: using use_prio
>> should not reserve more QPs than the adapter supports. That is why I
>> went down the path of removing the argument in this patch. Any other
>> suggestions?
> It is not my job to fix bugs in your driver.
>
> But it is my job to make sure you do not break things that are
> user visible, and that means you cannot delete module parameters
> that are "too difficult to fix".
>
> You should have considered more carefully the semantics of this
> module option when it was added.
This is not my driver. I do not know how this argument made it upstream
in the first place. It may have been functional at some point, but I do
not have that information. That may be a question for Mellanox. From
debugging the code on my system, I do not see how this argument is
useful, based on my previous comment. Maybe we need Mellanox to confirm
what this argument is used for and whether it is needed.