netdev.vger.kernel.org archive mirror
* mlx4: "failed to allocate default counter port 1"
@ 2015-06-30 10:45 Sebastian Ott
  2015-06-30 12:21 ` Or Gerlitz
  2015-06-30 12:53 ` Or Gerlitz
  0 siblings, 2 replies; 14+ messages in thread
From: Sebastian Ott @ 2015-06-30 10:45 UTC (permalink / raw)
  To: Eran Ben Elisha, Or Gerlitz, Jack Morgenstein, Hadar Hen Zion
  Cc: netdev, linux-kernel

Hello,

after the latest mellanox update the mlx4 driver fails to probe a VF:
[   88.909562] mlx4_core 0000:00:00.0: mlx4_allocate_default_counters: failed to allocate default counter port 1 err -22
[   88.909564] mlx4_core 0000:00:00.0: Failed to allocate default counters, aborting
[   88.961735] mlx4_core: probe of 0000:00:00.0 failed with error -22

PFs still work. See below for more dmesg output - I also added a line of
debug output...maybe this helps.

Regards,
Sebastian

# git diff
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 8204013..e0c41c3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -565,6 +565,9 @@ static int mlx4_slave_cmd(struct mlx4_dev *dev, u64 in_param, u64 *out_param,
                                }
                        }
                        ret = mlx4_status_to_errno(vhcr->status);
+                       if (ret)
+                               printk(KERN_WARNING"%s op=%d, ret=%d, status=%d\n",
+                                      __func__, op, ret, vhcr->status);
                } else {
                        if (dev->persist->state &
                            MLX4_DEVICE_STATE_INTERNAL_ERROR)
# git describe
v4.1-11355-g6aaf0da
# dmesg
[   88.518946] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[   88.518967] mlx4_core: Initializing 0000:00:00.0
[   88.519101] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[   88.519661] mlx4_core 0000:00:00.0: enabling bus mastering
[   88.520279] mlx4_core 0000:00:00.0: Detected virtual function - running in slave mode
[   88.520404] mlx4_core 0000:00:00.0: Sending reset
[   88.526726] mlx4_core 0000:00:00.0: Sending vhcr0
[   88.539676] mlx4_core 0000:00:00.0: BlueFlame not available
[   88.539678] mlx4_core 0000:00:00.0: Base MM extensions: flags 31104ec2, rsvd L_Key 00008000
[   88.539680] mlx4_core 0000:00:00.0: Max ICM size 4294967296 MB
[   88.539682] mlx4_core 0000:00:00.0: Max QPs: 16777216, reserved QPs: 64, entry size: 256
[   88.539683] mlx4_core 0000:00:00.0: Max SRQs: 16777216, reserved SRQs: 64, entry size: 128
[   88.539685] mlx4_core 0000:00:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 128
[   88.539687] mlx4_core 0000:00:00.0: Num sys EQs: 1024, max EQs: 512, reserved EQs: 8, entry size: 128
[   88.539688] mlx4_core 0000:00:00.0: reserved MPTs: 256, reserved MTTs: 64
[   88.539690] mlx4_core 0000:00:00.0: Max PDs: 131072, reserved PDs: 4, reserved UARs: 2
[   88.539691] mlx4_core 0000:00:00.0: Max QP/MCG: 131072, reserved MGMs: 0
[   88.539693] mlx4_core 0000:00:00.0: Max CQEs: 4194304, max WQEs: 16384, max SRQ WQEs: 16384
[   88.539695] mlx4_core 0000:00:00.0: Local CA ACK delay: 15, max MTU: 4096, port width cap: 3
[   88.539696] mlx4_core 0000:00:00.0: Max SQ desc size: 1008, max SQ S/G: 62
[   88.539698] mlx4_core 0000:00:00.0: Max RQ desc size: 512, max RQ S/G: 32
[   88.539699] mlx4_core 0000:00:00.0: Max GSO size: 131072
[   88.539701] mlx4_core 0000:00:00.0: Max counters: 256
[   88.539702] mlx4_core 0000:00:00.0: Max RSS Table size: 256
[   88.539704] mlx4_core 0000:00:00.0: DMFS high rate steer QPn base: 64
[   88.539705] mlx4_core 0000:00:00.0: DMFS high rate steer QPn range: 254
[   88.539707] mlx4_core 0000:00:00.0: QP Rate-Limit: #rates 1024, unit/val max 3/40, min 1/512
[   88.539709] mlx4_core 0000:00:00.0: DEV_CAP flags:
[   88.539710] mlx4_core 0000:00:00.0:     RC transport
[   88.539711] mlx4_core 0000:00:00.0:     UC transport
[   88.539713] mlx4_core 0000:00:00.0:     UD transport
[   88.539714] mlx4_core 0000:00:00.0:     XRC transport
[   88.539716] mlx4_core 0000:00:00.0:     SRQ support
[   88.539717] mlx4_core 0000:00:00.0:     IPoIB checksum offload
[   88.539719] mlx4_core 0000:00:00.0:     P_Key violation counter
[   88.539720] mlx4_core 0000:00:00.0:     Q_Key violation counter
[   88.539722] mlx4_core 0000:00:00.0:     Big LSO headers
[   88.539723] mlx4_core 0000:00:00.0:     MW support
[   88.539724] mlx4_core 0000:00:00.0:     APM support
[   88.539726] mlx4_core 0000:00:00.0:     Atomic ops support
[   88.539727] mlx4_core 0000:00:00.0:     Address vector port checking support
[   88.539729] mlx4_core 0000:00:00.0:     UD multicast support
[   88.539730] mlx4_core 0000:00:00.0:     IBoE support
[   88.539732] mlx4_core 0000:00:00.0:     Unicast loopback support
[   88.539733] mlx4_core 0000:00:00.0:     FCS header control
[   88.539735] mlx4_core 0000:00:00.0:     UDP RSS support
[   88.539736] mlx4_core 0000:00:00.0:     Unicast VEP steering support
[   88.539738] mlx4_core 0000:00:00.0:     Multicast VEP steering support
[   88.539739] mlx4_core 0000:00:00.0:     Counters support
[   88.539741] mlx4_core 0000:00:00.0:     RSS IP fragments support
[   88.539742] mlx4_core 0000:00:00.0:     Port ETS Scheduler support
[   88.539744] mlx4_core 0000:00:00.0:     Port link type sensing support
[   88.539745] mlx4_core 0000:00:00.0:     Port management change event support
[   88.539747] mlx4_core 0000:00:00.0:     64 byte EQE support
[   88.539748] mlx4_core 0000:00:00.0:     64 byte CQE support
[   88.539749] mlx4_core 0000:00:00.0:     RSS support
[   88.539751] mlx4_core 0000:00:00.0:     RSS Toeplitz Hash Function support
[   88.539752] mlx4_core 0000:00:00.0:     RSS XOR Hash Function support
[   88.539754] mlx4_core 0000:00:00.0:     Device managed flow steering support
[   88.539755] mlx4_core 0000:00:00.0:     Automatic MAC reassignment support
[   88.539757] mlx4_core 0000:00:00.0:     Time stamping support
[   88.539758] mlx4_core 0000:00:00.0:     VST (control vlan insertion/stripping) support
[   88.539760] mlx4_core 0000:00:00.0:     FSM (MAC anti-spoofing) support
[   88.539761] mlx4_core 0000:00:00.0:     Dynamic QP updates support
[   88.539763] mlx4_core 0000:00:00.0:     MAD DEMUX (Secure-Host) support
[   88.539764] mlx4_core 0000:00:00.0:     Large cache line (>64B) CQE stride support
[   88.539766] mlx4_core 0000:00:00.0:     Large cache line (>64B) EQE stride support
[   88.539767] mlx4_core 0000:00:00.0:     Ethernet protocol control support
[   88.539769] mlx4_core 0000:00:00.0:     Ethernet Backplane autoneg support
[   88.539770] mlx4_core 0000:00:00.0:     CONFIG DEV support
[   88.539771] mlx4_core 0000:00:00.0:     Asymmetric EQs support
[   88.539773] mlx4_core 0000:00:00.0:     More than 80 VFs support
[   88.539774] mlx4_core 0000:00:00.0:     Recoverable error events support
[   88.539776] mlx4_core 0000:00:00.0:     Port Remap support
[   88.539777] mlx4_core 0000:00:00.0:     QCN support
[   88.539779] mlx4_core 0000:00:00.0:     QP rate limiting support
[   88.539780] mlx4_core 0000:00:00.0:     Ethernet Flow control statistics support
[   88.539782] mlx4_core 0000:00:00.0:     Granular QoS per VF support
[   88.539783] mlx4_core 0000:00:00.0:     Port beacon support
[   88.540492] mlx4_core 0000:00:00.0: HCA minimum page size:512
[   88.543436] mlx4_core 0000:00:00.0: Timestamping is not supported in slave mode
[   88.543438] mlx4_core 0000:00:00.0: Steering mode is: Device managed flow steering
[   88.543440] mlx4_core 0000:00:00.0: RSS support for IP fragments is off
[   88.543441] mlx4_core 0000:00:00.0: Failed to map blue flame area
[   88.909056] mlx4_core 0000:00:00.0: NOP command IRQ test passed
[   88.909558] mlx4_slave_cmd op=3840, ret=-22, status=3
[   88.909562] mlx4_core 0000:00:00.0: mlx4_allocate_default_counters: failed to allocate default counter port 1 err -22
[   88.909564] mlx4_core 0000:00:00.0: Failed to allocate default counters, aborting
[   88.961735] mlx4_core: probe of 0000:00:00.0 failed with error -22
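
The added debug line (op=3840, ret=-22, status=3) can be decoded by hand. The following is only a sketch: it assumes that opcode 0xf00 is the resource-allocation command (MLX4_CMD_ALLOC_RES) and that status 0x03 is the "bad parameter" status, which the driver translates to -EINVAL. Both values are recalled from the mlx4 sources rather than stated in this thread, and the table covers only the codes needed here. All `sketch_` names are hypothetical.

```c
#include <errno.h>
#include <stdio.h>

/* Assumed value: the ALLOC_RES opcode, 0xf00 == 3840 decimal. */
enum { SKETCH_CMD_ALLOC_RES = 0xf00 };

/* Assumed subset of the command status codes. */
enum {
	SKETCH_STAT_OK        = 0x00,
	SKETCH_STAT_BAD_PARAM = 0x03,
};

/* Hypothetical helper: translate a command status to a negative errno. */
static int sketch_status_to_errno(unsigned int status)
{
	switch (status) {
	case SKETCH_STAT_OK:
		return 0;
	case SKETCH_STAT_BAD_PARAM:
		return -EINVAL;	/* -22 on Linux */
	default:
		return -EIO;	/* placeholder for codes not modelled here */
	}
}

/*
 * Hypothetical helper: parse the message part of the debug line
 * (without the dmesg timestamp). Returns 0 on success, -1 otherwise.
 */
static int sketch_parse_debug_line(const char *msg, int *op, int *ret,
				   int *status)
{
	int n = sscanf(msg, "mlx4_slave_cmd op=%d, ret=%d, status=%d",
		       op, ret, status);

	return n == 3 ? 0 : -1;
}
```

Read this way, the failing request is the VF asking the PF to allocate a resource (here, a counter), and the PF side rejecting it as an invalid request.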


* Re: mlx4: "failed to allocate default counter port 1"
  2015-06-30 10:45 mlx4: "failed to allocate default counter port 1" Sebastian Ott
@ 2015-06-30 12:21 ` Or Gerlitz
  2015-06-30 13:19   ` Sebastian Ott
  2015-06-30 12:53 ` Or Gerlitz
  1 sibling, 1 reply; 14+ messages in thread
From: Or Gerlitz @ 2015-06-30 12:21 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: Eran Ben Elisha, Or Gerlitz, Jack Morgenstein, Hadar Hen Zion,
	Linux Netdev List, Linux Kernel

On Tue, Jun 30, 2015 at 1:45 PM, Sebastian Ott
<sebott@linux.vnet.ibm.com> wrote:
> after the latest mellanox update the mlx4 driver fails to probe a VF:
> [   88.909562] mlx4_core 0000:00:00.0: mlx4_allocate_default_counters: failed to allocate default counter port 1 err -22
> [   88.909564] mlx4_core 0000:00:00.0: Failed to allocate default counters, aborting
> [   88.961735] mlx4_core: probe of 0000:00:00.0 failed with error -22
>
> PFs still work. See below for more dmesg output - I also added a line of
> debug output...maybe this helps.

Can you please send your "lspci | grep nox" listing? Also, what
firmware version do you have there? E.g. when you probe the PF with
mlx4_core debug_level=1, can you send us the lines that follow the PF
probe, as here, plus a dump of all caps as you did for the VF:

[  952.367911] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[  952.374606] mlx4_core: Initializing 0000:06:00.0
[  953.384332] mlx4_core 0000:06:00.0: FW version 2.34.5000 (cmd intf rev 3), max commands 16
[...]

Also send us the output of "dmesg | grep -i counter" after such verbose load.

thanks,

Or.


* Re: mlx4: "failed to allocate default counter port 1"
  2015-06-30 10:45 mlx4: "failed to allocate default counter port 1" Sebastian Ott
  2015-06-30 12:21 ` Or Gerlitz
@ 2015-06-30 12:53 ` Or Gerlitz
  2015-06-30 13:24   ` Sebastian Ott
  1 sibling, 1 reply; 14+ messages in thread
From: Or Gerlitz @ 2015-06-30 12:53 UTC (permalink / raw)
  To: Sebastian Ott; +Cc: Eran Ben Elisha, Jack Morgenstein, Hadar Hen Zion, netdev

On 6/30/2015 1:45 PM, Sebastian Ott wrote:
> [   88.909558] mlx4_slave_cmd op=3840, ret=-22, status=3
> [   88.909562] mlx4_core 0000:00:00.0: mlx4_allocate_default_counters: failed to allocate default counter port 1 err -22
> [   88.909564] mlx4_core 0000:00:00.0: Failed to allocate default counters, aborting
> [   88.961735] mlx4_core: probe of 0000:00:00.0 failed with error -22

Do you run the VF on the same system/kernel as the PF, or is the VF
probed in a VM which runs the latest kernel while the PF runs an older
kernel (which one)?

Can you also hook the PF code that serves this flow to see where we 
actually fail? basically, we should be going this way 
mlx4_ALLOC_RES_wrapper --> counter_alloc_res -- so I'd like to see which 
of the branches in counter_alloc_res fails...

Or.


* Re: mlx4: "failed to allocate default counter port 1"
  2015-06-30 12:21 ` Or Gerlitz
@ 2015-06-30 13:19   ` Sebastian Ott
  0 siblings, 0 replies; 14+ messages in thread
From: Sebastian Ott @ 2015-06-30 13:19 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Eran Ben Elisha, Or Gerlitz, Jack Morgenstein, Hadar Hen Zion,
	Linux Netdev List, Linux Kernel

On Tue, 30 Jun 2015, Or Gerlitz wrote:
> On Tue, Jun 30, 2015 at 1:45 PM, Sebastian Ott
> <sebott@linux.vnet.ibm.com> wrote:
> > after the latest mellanox update the mlx4 driver fails to probe a VF:
> > [   88.909562] mlx4_core 0000:00:00.0: mlx4_allocate_default_counters: failed to allocate default counter port 1 err -22
> > [   88.909564] mlx4_core 0000:00:00.0: Failed to allocate default counters, aborting
> > [   88.961735] mlx4_core: probe of 0000:00:00.0 failed with error -22
> >
> > PFs still work. See below for more dmesg output - I also added a line of
> > debug output...maybe this helps.
> 
> Can you please send your "lspci | grep nox" listing? also what

0000:00:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

> Firmware version you have there? e.g when you probe the PF with
> mlx4_core debug_level=1 can you send us the lines that follow the PF
> probe, e.g as here + dump of all caps as you did for the VF

I have access to 2 machines and run a guest instance on both machines:
* on one the guest has access to a PF, but VF enablement is disallowed
* on the other the hypervisor controls the PF and the guests have only
  access to the VFs - so I cannot say much about the PF here

At least I found out the FW version - it's: 2.33.5100

Regards,
Sebastian

> 
> [  952.367911] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
> [  952.374606] mlx4_core: Initializing 0000:06:00.0
> [  953.384332] mlx4_core 0000:06:00.0: FW version 2.34.5000 (cmd intf rev 3), max commands 16
> [...]
> 
> Also send us the output of "dmesg | grep -i counter" after such verbose load.
> 
> thanks,
> 
> Or.
> 
> 


* Re: mlx4: "failed to allocate default counter port 1"
  2015-06-30 12:53 ` Or Gerlitz
@ 2015-06-30 13:24   ` Sebastian Ott
  2015-06-30 13:52     ` Or Gerlitz
  0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Ott @ 2015-06-30 13:24 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Eran Ben Elisha, Jack Morgenstein, Hadar Hen Zion, netdev

On Tue, 30 Jun 2015, Or Gerlitz wrote:
> On 6/30/2015 1:45 PM, Sebastian Ott wrote:
> > [   88.909558] mlx4_slave_cmd op=3840, ret=-22, status=3
> > [   88.909562] mlx4_core 0000:00:00.0: mlx4_allocate_default_counters: failed to allocate default counter port 1 err -22
> > [   88.909564] mlx4_core 0000:00:00.0: Failed to allocate default counters, aborting
> > [   88.961735] mlx4_core: probe of 0000:00:00.0 failed with error -22
> 
> Do you run the VF on the same system/kernel as the PF, or the VF is probed to
> VM which runs the latest kernel and the PF runs older kernel (which?)

The latter case. The PF is driven by a much older Kernel running OFED
2.3.2.0.0.1

> Can you also hook the PF code that serves this flow to see where we actually
> fail? basically, we should be going this way mlx4_ALLOC_RES_wrapper -->
> counter_alloc_res -- so I'd like to see which of the branches in
> counter_alloc_res fails...

Nope, I don't have direct access to the PF, sry.

Sebastian
> 
> Or.
> 
> 
> 
> 


* Re: mlx4: "failed to allocate default counter port 1"
  2015-06-30 13:24   ` Sebastian Ott
@ 2015-06-30 13:52     ` Or Gerlitz
  2015-06-30 14:17       ` Sebastian Ott
  0 siblings, 1 reply; 14+ messages in thread
From: Or Gerlitz @ 2015-06-30 13:52 UTC (permalink / raw)
  To: Sebastian Ott; +Cc: Eran Ben Elisha, Jack Morgenstein, Hadar Hen Zion, netdev

On 6/30/2015 4:24 PM, Sebastian Ott wrote:
>> >Do you run the VF on the same system/kernel as the PF, or the VF is probed to
>> >VM which runs the latest kernel and the PF runs older kernel (which?)
> The latter case. The PF is driven by a much older Kernel running OFED 2.3.2.0.0.1
>

Can you try running the inbox PF driver that comes with the PF kernel 
(what kernel is that?) I'd like to see we're OK there.

Or.


* Re: mlx4: "failed to allocate default counter port 1"
  2015-06-30 13:52     ` Or Gerlitz
@ 2015-06-30 14:17       ` Sebastian Ott
  2015-07-01 13:18         ` Or Gerlitz
  0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Ott @ 2015-06-30 14:17 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Eran Ben Elisha, Jack Morgenstein, Hadar Hen Zion, netdev

On Tue, 30 Jun 2015, Or Gerlitz wrote:
> On 6/30/2015 4:24 PM, Sebastian Ott wrote:
> > > >Do you run the VF on the same system/kernel as the PF, or the VF is
> > > >probed to
> > > >VM which runs the latest kernel and the PF runs older kernel (which?)
> > The latter case. The PF is driven by a much older Kernel running OFED
> > 2.3.2.0.0.1
> >
> 
> Can you try running the inbox PF driver that comes with the PF kernel (what
> kernel is that?) I'd like to see we're OK there.

Frankly, I don't know. Plus I also don't know how to build an ofed kernel.

Regards,
Sebastian


* Re: mlx4: "failed to allocate default counter port 1"
  2015-06-30 14:17       ` Sebastian Ott
@ 2015-07-01 13:18         ` Or Gerlitz
  2015-07-01 13:59           ` Sebastian Ott
  2015-07-29 12:50           ` Sebastian Ott
  0 siblings, 2 replies; 14+ messages in thread
From: Or Gerlitz @ 2015-07-01 13:18 UTC (permalink / raw)
  To: Sebastian Ott; +Cc: Eran Ben Elisha, Jack Morgenstein, Hadar Hen Zion, netdev

On 6/30/2015 5:17 PM, Sebastian Ott wrote:
> On Tue, 30 Jun 2015, Or Gerlitz wrote:
>> On 6/30/2015 4:24 PM, Sebastian Ott wrote:
>>>>> Do you run the VF on the same system/kernel as the PF, or the VF is
>>>>> probed to
>>>>> VM which runs the latest kernel and the PF runs older kernel (which?)
>>> The latter case. The PF is driven by a much older Kernel running OFED
>>> 2.3.2.0.0.1
>>>
>> Can you try running the inbox PF driver that comes with the PF kernel (what
>> kernel is that?) I'd like to see we're OK there.
> Frankly, I don't know. Plus I also don't know how to build an ofed kernel.
>

I didn't want you to build that package, but rather the other way around:
see what happens if you uninstall this package and run with the inbox mlx4
PF driver from the kernel provided by your distro of choice, or from an
upstream kernel installed by you. Anyway, I hope the below patch will
provide a quick band-aid and let you continue running upstream VFs over
that PF config. Let me know (I will be OOO till Thu-Sun). Once we see how
this behaves, we will take it from there.

Or.


diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 12fbfcb..a66cc6e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2273,6 +2273,10 @@ static int mlx4_allocate_default_counters(struct mlx4_dev *dev)
 		} else if (err == -ENOENT) {
 			err = 0;
 			continue;
+		} else if (mlx4_is_slave(dev) && err == -EINVAL) {
+			priv->def_counter[port] = MLX4_SINK_COUNTER_INDEX(dev);
+			mlx4_warn(dev, "can't allocate counter from old PF driver, using index %d\n",
+				  MLX4_SINK_COUNTER_INDEX(dev));
 		} else {
 			mlx4_err(dev, "%s: failed to allocate default counter port %d err %d\n",
 				 __func__, port + 1, err);


* Re: mlx4: "failed to allocate default counter port 1"
  2015-07-01 13:18         ` Or Gerlitz
@ 2015-07-01 13:59           ` Sebastian Ott
  2015-07-01 14:18             ` Sebastian Ott
  2015-07-29 12:50           ` Sebastian Ott
  1 sibling, 1 reply; 14+ messages in thread
From: Sebastian Ott @ 2015-07-01 13:59 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Eran Ben Elisha, Jack Morgenstein, Hadar Hen Zion, netdev

On Wed, 1 Jul 2015, Or Gerlitz wrote:
> On 6/30/2015 5:17 PM, Sebastian Ott wrote:
> > On Tue, 30 Jun 2015, Or Gerlitz wrote:
> > > On 6/30/2015 4:24 PM, Sebastian Ott wrote:
> > > > > > Do you run the VF on the same system/kernel as the PF, or the VF is
> > > > > > probed to
> > > > > > VM which runs the latest kernel and the PF runs older kernel (which?)
> > > > The latter case. The PF is driven by a much older Kernel running OFED
> > > > 2.3.2.0.0.1
> > > >
> > > Can you try running the inbox PF driver that comes with the PF kernel
> > > (what
> > > kernel is that?) I'd like to see we're OK there.
> > Frankly, I don't know. Plus I also don't know how to build an ofed kernel.
> >
> 
> I didn't want you to build that package, but rather the other way around:
> see what happens if you uninstall this package and run with the inbox mlx4
> PF driver from the kernel provided by your distro of choice, or from an
> upstream kernel installed by you. Anyway, I hope the below patch will
> provide a quick band-aid and let you continue running upstream VFs over
> that PF config. Let me know (I will be OOO till Thu-Sun). Once we see how
> this behaves, we will take it from there.

Thanks for the patch. Unfortunately, that didn't work:

[  170.531076] mlx4_core 0000:00:00.0: NOP command IRQ test passed
[  170.531291] mlx4_core 0000:00:00.0: can't allocate counter from old PF driver, using index 255
[  170.531294] mlx4_core 0000:00:00.0: mlx4_allocate_default_counters: default counter index 255 for port 1
[  170.531531] mlx4_core 0000:00:00.0: can't allocate counter from old PF driver, using index 255
[  170.531534] mlx4_core 0000:00:00.0: mlx4_allocate_default_counters: default counter index 255 for port 2
[  170.531535] mlx4_core 0000:00:00.0: Failed to allocate default counters, aborting
[  170.587306] mlx4_core: probe of 0000:00:00.0 failed with error -22

Regards,
Sebastian

> 
> Or.
> 
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
> index 12fbfcb..a66cc6e 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
> @@ -2273,6 +2273,10 @@ static int mlx4_allocate_default_counters(struct mlx4_dev *dev)
>  		} else if (err == -ENOENT) {
>  			err = 0;
>  			continue;
> +		} else if (mlx4_is_slave(dev) && err == -EINVAL) {
> +			priv->def_counter[port] = MLX4_SINK_COUNTER_INDEX(dev);
> +			mlx4_warn(dev, "can't allocate counter from old PF driver, using index %d\n",
> +				  MLX4_SINK_COUNTER_INDEX(dev));
>  		} else {
>  			mlx4_err(dev, "%s: failed to allocate default counter port %d err %d\n",
>  				 __func__, port + 1, err);
> 
> 
> 


* Re: mlx4: "failed to allocate default counter port 1"
  2015-07-01 13:59           ` Sebastian Ott
@ 2015-07-01 14:18             ` Sebastian Ott
  2015-07-01 20:38               ` Or Gerlitz
  0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Ott @ 2015-07-01 14:18 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Eran Ben Elisha, Jack Morgenstein, Hadar Hen Zion, netdev

On Wed, 1 Jul 2015, Sebastian Ott wrote:

> On Wed, 1 Jul 2015, Or Gerlitz wrote:
> > On 6/30/2015 5:17 PM, Sebastian Ott wrote:
> > > On Tue, 30 Jun 2015, Or Gerlitz wrote:
> > > > On 6/30/2015 4:24 PM, Sebastian Ott wrote:
> > > > > > > Do you run the VF on the same system/kernel as the PF, or the VF is
> > > > > > > probed to
> > > > > > > VM which runs the latest kernel and the PF runs older kernel (which?)
> > > > > The latter case. The PF is driven by a much older Kernel running OFED
> > > > > 2.3.2.0.0.1
> > > > >
> > > > Can you try running the inbox PF driver that comes with the PF kernel
> > > > (what
> > > > kernel is that?) I'd like to see we're OK there.
> > > Frankly, I don't know. Plus I also don't know how to build an ofed kernel.
> > >
> > 
> > I didn't want you to build that package, but rather the other way around:
> > see what happens if you uninstall this package and run with the inbox mlx4
> > PF driver from the kernel provided by your distro of choice, or from an
> > upstream kernel installed by you. Anyway, I hope the below patch will
> > provide a quick band-aid and let you continue running upstream VFs over
> > that PF config. Let me know (I will be OOO till Thu-Sun). Once we see how
> > this behaves, we will take it from there.
> 
> Thanks for the patch. Unfortunately, that didn't work:
> 

OK, using this patch it worked:

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 12fbfcb..29c2a01 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2273,6 +2273,11 @@ static int mlx4_allocate_default_counters(struct mlx4_dev *dev)
 		} else if (err == -ENOENT) {
 			err = 0;
 			continue;
+		} else if (mlx4_is_slave(dev) && err == -EINVAL) {
+			priv->def_counter[port] = MLX4_SINK_COUNTER_INDEX(dev);
+			mlx4_warn(dev, "can't allocate counter from old PF driver, using index %d\n",
+				  MLX4_SINK_COUNTER_INDEX(dev));
+			err = 0;
 		} else {
 			mlx4_err(dev, "%s: failed to allocate default counter port %d err %d\n",
 				 __func__, port + 1, err);
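
Why the added `err = 0;` matters can be modelled outside the kernel. This is a simplified sketch, not the driver code. It assumes the function returns whatever `err` holds after the last loop iteration, which is inferred from the earlier log, where both ports were given the sink index and the probe still aborted with -22. The sink index 255 is taken from that log. All `sketch_` names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_NUM_PORTS  2
#define SKETCH_SINK_INDEX 255	/* value reported in the warning message */

/* Stand-in for the VF's allocation request; the old PF always rejects it. */
static int sketch_counter_alloc(int *idx)
{
	(void)idx;
	return -EINVAL;
}

/*
 * Simplified model of the per-port loop. reset_err == false mimics the
 * first patch, reset_err == true the second one.
 */
static int sketch_allocate_default_counters(int def_counter[SKETCH_NUM_PORTS],
					    bool reset_err)
{
	int port, idx = 0, err = 0;

	for (port = 0; port < SKETCH_NUM_PORTS; port++) {
		err = sketch_counter_alloc(&idx);
		if (!err) {
			def_counter[port] = idx;
		} else if (err == -EINVAL) {
			/* Fall back to the shared sink counter. */
			def_counter[port] = SKETCH_SINK_INDEX;
			if (reset_err)
				err = 0;
		} else {
			return err;
		}
	}

	/* The caller sees the err left over from the last iteration. */
	return err;
}
```

With `reset_err` false, the fallback index is stored for every port but the function still reports -EINVAL, so the caller aborts the probe. With `reset_err` true it returns 0.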


* Re: mlx4: "failed to allocate default counter port 1"
  2015-07-01 14:18             ` Sebastian Ott
@ 2015-07-01 20:38               ` Or Gerlitz
  2015-07-02  9:20                 ` Sebastian Ott
  0 siblings, 1 reply; 14+ messages in thread
From: Or Gerlitz @ 2015-07-01 20:38 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: Or Gerlitz, Eran Ben Elisha, Jack Morgenstein, Hadar Hen Zion,
	Linux Netdev List

On Wed, Jul 1, 2015 at 5:18 PM, Sebastian Ott <sebott@linux.vnet.ibm.com> wrote:
> OK, using this patch it worked:

yep, I forgot to reset err to zero.

By "it worked" you mean the VF is live and kicking, all functionality
you had before the 4.2 merge window is back again?


* Re: mlx4: "failed to allocate default counter port 1"
  2015-07-01 20:38               ` Or Gerlitz
@ 2015-07-02  9:20                 ` Sebastian Ott
  0 siblings, 0 replies; 14+ messages in thread
From: Sebastian Ott @ 2015-07-02  9:20 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, Eran Ben Elisha, Jack Morgenstein, Hadar Hen Zion,
	Linux Netdev List

On Wed, 1 Jul 2015, Or Gerlitz wrote:
> On Wed, Jul 1, 2015 at 5:18 PM, Sebastian Ott <sebott@linux.vnet.ibm.com> wrote:
> > OK, using this patch it worked:
> 
> yep, I forgot to reset err to zero.
> 
> By "it worked" you mean the VF is live and kicking, all functionality
> you had before the 4.2 merge window is back again?
> 

Yes.


* Re: mlx4: "failed to allocate default counter port 1"
  2015-07-01 13:18         ` Or Gerlitz
  2015-07-01 13:59           ` Sebastian Ott
@ 2015-07-29 12:50           ` Sebastian Ott
  2015-07-29 13:25             ` Or Gerlitz
  1 sibling, 1 reply; 14+ messages in thread
From: Sebastian Ott @ 2015-07-29 12:50 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Eran Ben Elisha, Jack Morgenstein, Hadar Hen Zion, netdev

On Wed, 1 Jul 2015, Or Gerlitz wrote:
> On 6/30/2015 5:17 PM, Sebastian Ott wrote:
> > On Tue, 30 Jun 2015, Or Gerlitz wrote:
> > > On 6/30/2015 4:24 PM, Sebastian Ott wrote:
> > > > > > Do you run the VF on the same system/kernel as the PF, or the VF is
> > > > > > probed to
> > > > > > VM which runs the latest kernel and the PF runs older kernel (which?)
> > > > The latter case. The PF is driven by a much older Kernel running OFED
> > > > 2.3.2.0.0.1
> > > >
> > > Can you try running the inbox PF driver that comes with the PF kernel
> > > (what
> > > kernel is that?) I'd like to see we're OK there.
> > Frankly, I don't know. Plus I also don't know how to build an ofed kernel.
> >
> 
> I didn't want you to build that package, but rather the other way around:
> see what happens if you uninstall this package and run with the inbox mlx4
> PF driver from the kernel provided by your distro of choice, or from an
> upstream kernel installed by you. Anyway, I hope the below patch will
> provide a quick band-aid and let you continue running upstream VFs over
> that PF config. Let me know (I will be OOO till Thu-Sun). Once we see how
> this behaves, we will take it from there.

Any updates on this one?

Regards,
Sebastian


* Re: mlx4: "failed to allocate default counter port 1"
  2015-07-29 12:50           ` Sebastian Ott
@ 2015-07-29 13:25             ` Or Gerlitz
  0 siblings, 0 replies; 14+ messages in thread
From: Or Gerlitz @ 2015-07-29 13:25 UTC (permalink / raw)
  To: Sebastian Ott; +Cc: Eran Ben Elisha, Jack Morgenstein, Hadar Hen Zion, netdev

On 7/29/2015 3:50 PM, Sebastian Ott wrote:
> Any updates on this one?

yep, we submitted the patch, see upstream commit 
178d23e3cd4811ebe702d60ac31e8bee389a5847
"net/mlx4_core: Use sink counter for the VF default as fallback"

Or.


