Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Benjamin Herrenschmidt @ 2011-10-09  7:21 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin,
	netdev@vger.kernel.org, Eli Cohen, linuxppc-dev
In-Reply-To: <20111005081502.GB2681@mtldesk30>

On Wed, 2011-10-05 at 10:15 +0200, Eli Cohen wrote:
> On Tue, Oct 04, 2011 at 05:26:20PM -0300, Thadeu Lima de Souza Cascardo wrote:
> 
> I believe we have an endianess problem here. The source buffer is in
> big endian - in x86 archs, it will rich the pci device unswapped since
> both x86 and pci are little endian. In ppc, it wil be swapped by the
> chipset 
  ^^^^^^^ ugh ?

> so it will reach the device in little endian which is wrong.
> So, in mlx4_bf_copy, you could loop over the buffer and swap32 the all
> the dwords before the call to __iowrite64_copy. Of course which should
> fix this in an arch independent manner. Let me know this works for
> you.

That's generally the wrong way to handle endian (to swap32 a whole
buffer).

But I need to know more about the structure of those descriptors etc...
to judge properly.

Cheers,
Ben.

> > On Tue, Oct 04, 2011 at 08:02:12AM +0200, Benjamin Herrenschmidt wrote:
> > > On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:
> > > 
> > >  .../...
> > > 
> > > > > Can you also send me the output of ethtool -i?
> > > > > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > > > > 
> > > > > Yevgeny
> > > > 
> > > > Hello, Yevgeny.
> > > > 
> > > > You will find the output of ethtool -i below.
> > > > 
> > > > I am copying Ben and powerpc list, in case this is an issue with Power
> > > > processors. They can provide us some more insight into this.
> > > 
> > > May I get some background please ? :-)
> > > 
> > > I'm not aware of a specific issue with write combining but I'd need to
> > > know more about what you are doing and the code to do it to comment on
> > > whether it should work or not.
> > > 
> > > Cheers,
> > > Ben.
> > > 
> > > 
> > 
> > Hello, Ben.
> > 
> > Sorry for that. I am testing mlx4_en driver on a POWER. Yevgeny has
> > added blue frame support, that does not require writing to the device
> > memory to indicate a new packet (the doorbell register as it is called).
> > 
> > Well, the ring is getting full with no interrupt or packets transmitted.
> > I simply added a write to the doorbell register and it works for me.
> > Yevgeny says this is not the right fix, claiming there is a problem with
> > write combining on POWER. The code uses memory barriers, so I don't know
> > why there is any problem.
> > 
> > I am posting the code here to show better what the situation is.
> > Yevgeny can tell more about the device and the driver.
> > 
> > The code below is the driver as of now, including a diff with what I
> > changed and had resulted OK for me. Before the blue frame support, the
> > only code executed was the else part. I can't tell much what the device
> > should be seeing and doing after the blue frame part of the code is
> > executed. But it does send the packet if I write to the doorbell
> > register.
> > 
> > Yevgeny, can you tell us what the device should be doing and why you
> > think this is a problem on POWER? Is it possible that this is simply a
> > problem with the firmware version?
> > 
> > Thanks,
> > Cascardo.
> > 
> > ---
> >         if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
> > !vlan_tag) {
> >                 *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
> > ring->doorbell_qpn;
> >                 op_own |= htonl((bf_index & 0xffff) << 8);
> >                 /* Ensure new descirptor hits memory
> >                 * before setting ownership of this descriptor to HW */
> >                 wmb();
> >                 tx_desc->ctrl.owner_opcode = op_own;
> > 
> >                 wmb();
> > 
> >                 mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
> > long *) &tx_desc->ctrl,
> >                      desc_size);
> > 
> >                 wmb();
> > 
> >                 ring->bf.offset ^= ring->bf.buf_size;
> >         } else {
> >                 /* Ensure new descirptor hits memory
> >                 * before setting ownership of this descriptor to HW */
> >                 wmb();
> >                 tx_desc->ctrl.owner_opcode = op_own;
> > -               wmb();
> > -               writel(ring->doorbell_qpn, ring->bf.uar->map +
> > MLX4_SEND_DOORBELL);
> >         }
> > 
> > +       wmb();
> > +       writel(ring->doorbell_qpn, ring->bf.uar->map +
> > MLX4_SEND_DOORBELL);
> > +
> > ---

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Benjamin Herrenschmidt @ 2011-10-09  7:25 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin,
	netdev@vger.kernel.org, Eli Cohen, linuxppc-dev
In-Reply-To: <20111006135759.GH2681@mtldesk30>

On Thu, 2011-10-06 at 15:57 +0200, Eli Cohen wrote:
> On Wed, Oct 05, 2011 at 10:15:02AM +0200, Eli Cohen wrote:
> 
> How about this patch - can you give it a try?
> 
> 
> >From dee60547aa9e35a02835451d9e694cd80dd3072f Mon Sep 17 00:00:00 2001
> From: Eli Cohen <eli@mellanox.co.il>
> Date: Thu, 6 Oct 2011 15:50:02 +0200
> Subject: [PATCH] mlx4_en: Fix blue flame on powerpc
> 
> The source buffer used for copying into the blue flame register is already in
> big endian. However, when copying to device on powerpc, the endianess is
> swapped so the data reaches th device in little endian which is wrong. On x86
> based platform no swapping occurs so it reaches the device with the correct
> endianess. Fix this by calling le32_to_cpu() on the buffer. On LE systems there
> is no change; on BE there will be a swap.

That looks wrong.

What is this __iowrite64_copy... oh I see

Nice, somebody _AGAIN_ added a bunch of "generic" IO accessors that are
utterly wrong on all archs except x86 (ok, -almost-). There isn't a
single bloody memory barrier in there !

So, __iowrite64_copy is doing raw_writel which will -not- swap, so your
buffer is going to have the same endianness in the destination than it
has in the source. This is _NOT_ the right place to do a swap.

It's the original construction of the descriptor that needs change. The
data itself should never need to be affected accross a copy operation
(unless your HW is terminally busted).

Cheers,
Ben.

> Signed-off-by: Eli Cohen <eli@mellanox.co.il>
> ---
>  drivers/net/mlx4/en_tx.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/mlx4/en_tx.c b/drivers/net/mlx4/en_tx.c
> index 16337fb..3743acc 100644
> --- a/drivers/net/mlx4/en_tx.c
> +++ b/drivers/net/mlx4/en_tx.c
> @@ -601,6 +601,16 @@ u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb)
>  
>  static void mlx4_bf_copy(unsigned long *dst, unsigned long *src, unsigned bytecnt)
>  {
> +	int i;
> +	__le32 *psrc = (__le32 *)src;
> +
> +	/*
> +	 * the buffer is already in big endian. For little endian machines that's
> +	 * fine. For big endain machines we must swap since the chipset swaps again
> +	 */
> +	for (i = 0; i < bytecnt / 4; ++i)
> +		psrc[i] = le32_to_cpu(psrc[i]);
> +
>  	__iowrite64_copy(dst, src, bytecnt / 8);
>  }
>  
> -- 
> 1.7.7.rc0.70.g82660
> 
> 
> 
> > On Tue, Oct 04, 2011 at 05:26:20PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > 
> > I believe we have an endianess problem here. The source buffer is in
> > big endian - in x86 archs, it will rich the pci device unswapped since
> > both x86 and pci are little endian. In ppc, it wil be swapped by the
> > chipset so it will reach the device in little endian which is wrong.
> > So, in mlx4_bf_copy, you could loop over the buffer and swap32 the all
> > the dwords before the call to __iowrite64_copy. Of course which should
> > fix this in an arch independent manner. Let me know this works for
> > you.
> > 
> > > On Tue, Oct 04, 2011 at 08:02:12AM +0200, Benjamin Herrenschmidt wrote:
> > > > On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:
> > > > 
> > > >  .../...
> > > > 
> > > > > > Can you also send me the output of ethtool -i?
> > > > > > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > > > > > 
> > > > > > Yevgeny
> > > > > 
> > > > > Hello, Yevgeny.
> > > > > 
> > > > > You will find the output of ethtool -i below.
> > > > > 
> > > > > I am copying Ben and powerpc list, in case this is an issue with Power
> > > > > processors. They can provide us some more insight into this.
> > > > 
> > > > May I get some background please ? :-)
> > > > 
> > > > I'm not aware of a specific issue with write combining but I'd need to
> > > > know more about what you are doing and the code to do it to comment on
> > > > whether it should work or not.
> > > > 
> > > > Cheers,
> > > > Ben.
> > > > 
> > > > 
> > > 
> > > Hello, Ben.
> > > 
> > > Sorry for that. I am testing mlx4_en driver on a POWER. Yevgeny has
> > > added blue frame support, that does not require writing to the device
> > > memory to indicate a new packet (the doorbell register as it is called).
> > > 
> > > Well, the ring is getting full with no interrupt or packets transmitted.
> > > I simply added a write to the doorbell register and it works for me.
> > > Yevgeny says this is not the right fix, claiming there is a problem with
> > > write combining on POWER. The code uses memory barriers, so I don't know
> > > why there is any problem.
> > > 
> > > I am posting the code here to show better what the situation is.
> > > Yevgeny can tell more about the device and the driver.
> > > 
> > > The code below is the driver as of now, including a diff with what I
> > > changed and had resulted OK for me. Before the blue frame support, the
> > > only code executed was the else part. I can't tell much what the device
> > > should be seeing and doing after the blue frame part of the code is
> > > executed. But it does send the packet if I write to the doorbell
> > > register.
> > > 
> > > Yevgeny, can you tell us what the device should be doing and why you
> > > think this is a problem on POWER? Is it possible that this is simply a
> > > problem with the firmware version?
> > > 
> > > Thanks,
> > > Cascardo.
> > > 
> > > ---
> > >         if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
> > > !vlan_tag) {
> > >                 *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
> > > ring->doorbell_qpn;
> > >                 op_own |= htonl((bf_index & 0xffff) << 8);
> > >                 /* Ensure new descirptor hits memory
> > >                 * before setting ownership of this descriptor to HW */
> > >                 wmb();
> > >                 tx_desc->ctrl.owner_opcode = op_own;
> > > 
> > >                 wmb();
> > > 
> > >                 mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
> > > long *) &tx_desc->ctrl,
> > >                      desc_size);
> > > 
> > >                 wmb();
> > > 
> > >                 ring->bf.offset ^= ring->bf.buf_size;
> > >         } else {
> > >                 /* Ensure new descirptor hits memory
> > >                 * before setting ownership of this descriptor to HW */
> > >                 wmb();
> > >                 tx_desc->ctrl.owner_opcode = op_own;
> > > -               wmb();
> > > -               writel(ring->doorbell_qpn, ring->bf.uar->map +
> > > MLX4_SEND_DOORBELL);
> > >         }
> > > 
> > > +       wmb();
> > > +       writel(ring->doorbell_qpn, ring->bf.uar->map +
> > > MLX4_SEND_DOORBELL);
> > > +
> > > ---

^ permalink raw reply

* RE: [PATCH] mlx4_en: fix transmit of packages when blue frame isenabled
From: Benjamin Herrenschmidt @ 2011-10-09  7:28 UTC (permalink / raw)
  To: David Laight
  Cc: Eli Cohen, Thadeu Lima de Souza Cascardo, Eli Cohen,
	Yevgeny Petrilin, linuxppc-dev, netdev
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6D8AE76@saturn3.aculab.com>

On Thu, 2011-10-06 at 15:10 +0100, David Laight wrote:
> horrid...
> 1) I'm not sure the caller expects the buffer to be corrupted.
> 2) It contains a lot of memory cycles.
> 3) It looked from the calls that this code is copying descriptors,
>    so the transfer length is probably 1 or 2 words - so the loop
>    is inefficient.
> 4) ppc doesn't have a fast byteswap instruction (very new gcc might
>    use the byteswapping memery access for the le32_to_cpu() though),
>    so it would be better getting the byteswap done inside
>    __iowrite64_copy() - since that is probably requesting a byteswap
>    anyway.
> OTOH I'm not at all clear about the 64bit xfers....

And it's just plain wrong anyway. You should never have to byteswap a
copy.

Ben.

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Eli Cohen @ 2011-10-09  7:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin,
	netdev@vger.kernel.org, Eli Cohen, linuxppc-dev
In-Reply-To: <1318145118.29415.371.camel@pasglop>

On Sun, Oct 09, 2011 at 09:25:18AM +0200, Benjamin Herrenschmidt wrote:
> On Thu, 2011-10-06 at 15:57 +0200, Eli Cohen wrote:
> > On Wed, Oct 05, 2011 at 10:15:02AM +0200, Eli Cohen wrote:
> > 
> > How about this patch - can you give it a try?
> > 
> > 
> > >From dee60547aa9e35a02835451d9e694cd80dd3072f Mon Sep 17 00:00:00 2001
> > From: Eli Cohen <eli@mellanox.co.il>
> > Date: Thu, 6 Oct 2011 15:50:02 +0200
> > Subject: [PATCH] mlx4_en: Fix blue flame on powerpc
> > 
> > The source buffer used for copying into the blue flame register is already in
> > big endian. However, when copying to device on powerpc, the endianess is
> > swapped so the data reaches th device in little endian which is wrong. On x86
> > based platform no swapping occurs so it reaches the device with the correct
> > endianess. Fix this by calling le32_to_cpu() on the buffer. On LE systems there
> > is no change; on BE there will be a swap.
> 
> That looks wrong.
Not sure I understand: are you saying that on ppc, when you call
__iowrite64_copy, it will not reach the device swapped?

The point is that we must always have the buffer ready in big endian
in memory. In the case of blue flame, we must also copy it to the
device registers in pci memory space. So if we use the buffer we
already prepared, we must have another swap. I can think of a nicer
way to implement this functionality but the question is do you think
my observation above is wrong and why.

> 
> What is this __iowrite64_copy... oh I see
> 
> Nice, somebody _AGAIN_ added a bunch of "generic" IO accessors that are
> utterly wrong on all archs except x86 (ok, -almost-). There isn't a
> single bloody memory barrier in there !
> 
> So, __iowrite64_copy is doing raw_writel which will -not- swap, so your
> buffer is going to have the same endianness in the destination than it
> has in the source. This is _NOT_ the right place to do a swap.
> 
> It's the original construction of the descriptor that needs change. The
> data itself should never need to be affected accross a copy operation
> (unless your HW is terminally busted).
> 
> Cheers,
> Ben.
> 
> > Signed-off-by: Eli Cohen <eli@mellanox.co.il>
> > ---
> >  drivers/net/mlx4/en_tx.c |   10 ++++++++++
> >  1 files changed, 10 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/net/mlx4/en_tx.c b/drivers/net/mlx4/en_tx.c
> > index 16337fb..3743acc 100644
> > --- a/drivers/net/mlx4/en_tx.c
> > +++ b/drivers/net/mlx4/en_tx.c
> > @@ -601,6 +601,16 @@ u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb)
> >  
> >  static void mlx4_bf_copy(unsigned long *dst, unsigned long *src, unsigned bytecnt)
> >  {
> > +	int i;
> > +	__le32 *psrc = (__le32 *)src;
> > +
> > +	/*
> > +	 * the buffer is already in big endian. For little endian machines that's
> > +	 * fine. For big endain machines we must swap since the chipset swaps again
> > +	 */
> > +	for (i = 0; i < bytecnt / 4; ++i)
> > +		psrc[i] = le32_to_cpu(psrc[i]);
> > +
> >  	__iowrite64_copy(dst, src, bytecnt / 8);
> >  }
> >  
> > -- 
> > 1.7.7.rc0.70.g82660
> > 
> > 
> > 
> > > On Tue, Oct 04, 2011 at 05:26:20PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > > 
> > > I believe we have an endianess problem here. The source buffer is in
> > > big endian - in x86 archs, it will rich the pci device unswapped since
> > > both x86 and pci are little endian. In ppc, it wil be swapped by the
> > > chipset so it will reach the device in little endian which is wrong.
> > > So, in mlx4_bf_copy, you could loop over the buffer and swap32 the all
> > > the dwords before the call to __iowrite64_copy. Of course which should
> > > fix this in an arch independent manner. Let me know this works for
> > > you.
> > > 
> > > > On Tue, Oct 04, 2011 at 08:02:12AM +0200, Benjamin Herrenschmidt wrote:
> > > > > On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:
> > > > > 
> > > > >  .../...
> > > > > 
> > > > > > > Can you also send me the output of ethtool -i?
> > > > > > > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > > > > > > 
> > > > > > > Yevgeny
> > > > > > 
> > > > > > Hello, Yevgeny.
> > > > > > 
> > > > > > You will find the output of ethtool -i below.
> > > > > > 
> > > > > > I am copying Ben and powerpc list, in case this is an issue with Power
> > > > > > processors. They can provide us some more insight into this.
> > > > > 
> > > > > May I get some background please ? :-)
> > > > > 
> > > > > I'm not aware of a specific issue with write combining but I'd need to
> > > > > know more about what you are doing and the code to do it to comment on
> > > > > whether it should work or not.
> > > > > 
> > > > > Cheers,
> > > > > Ben.
> > > > > 
> > > > > 
> > > > 
> > > > Hello, Ben.
> > > > 
> > > > Sorry for that. I am testing mlx4_en driver on a POWER. Yevgeny has
> > > > added blue frame support, that does not require writing to the device
> > > > memory to indicate a new packet (the doorbell register as it is called).
> > > > 
> > > > Well, the ring is getting full with no interrupt or packets transmitted.
> > > > I simply added a write to the doorbell register and it works for me.
> > > > Yevgeny says this is not the right fix, claiming there is a problem with
> > > > write combining on POWER. The code uses memory barriers, so I don't know
> > > > why there is any problem.
> > > > 
> > > > I am posting the code here to show better what the situation is.
> > > > Yevgeny can tell more about the device and the driver.
> > > > 
> > > > The code below is the driver as of now, including a diff with what I
> > > > changed and had resulted OK for me. Before the blue frame support, the
> > > > only code executed was the else part. I can't tell much what the device
> > > > should be seeing and doing after the blue frame part of the code is
> > > > executed. But it does send the packet if I write to the doorbell
> > > > register.
> > > > 
> > > > Yevgeny, can you tell us what the device should be doing and why you
> > > > think this is a problem on POWER? Is it possible that this is simply a
> > > > problem with the firmware version?
> > > > 
> > > > Thanks,
> > > > Cascardo.
> > > > 
> > > > ---
> > > >         if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
> > > > !vlan_tag) {
> > > >                 *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
> > > > ring->doorbell_qpn;
> > > >                 op_own |= htonl((bf_index & 0xffff) << 8);
> > > >                 /* Ensure new descirptor hits memory
> > > >                 * before setting ownership of this descriptor to HW */
> > > >                 wmb();
> > > >                 tx_desc->ctrl.owner_opcode = op_own;
> > > > 
> > > >                 wmb();
> > > > 
> > > >                 mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
> > > > long *) &tx_desc->ctrl,
> > > >                      desc_size);
> > > > 
> > > >                 wmb();
> > > > 
> > > >                 ring->bf.offset ^= ring->bf.buf_size;
> > > >         } else {
> > > >                 /* Ensure new descirptor hits memory
> > > >                 * before setting ownership of this descriptor to HW */
> > > >                 wmb();
> > > >                 tx_desc->ctrl.owner_opcode = op_own;
> > > > -               wmb();
> > > > -               writel(ring->doorbell_qpn, ring->bf.uar->map +
> > > > MLX4_SEND_DOORBELL);
> > > >         }
> > > > 
> > > > +       wmb();
> > > > +       writel(ring->doorbell_qpn, ring->bf.uar->map +
> > > > MLX4_SEND_DOORBELL);
> > > > +
> > > > ---
> 

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Benjamin Herrenschmidt @ 2011-10-09  8:00 UTC (permalink / raw)
  To: Eli Cohen
  Cc: netdev@vger.kernel.org, Eli Cohen, linuxppc-dev,
	Thadeu Lima de Souza Cascardo, Yevgeny Petrilin
In-Reply-To: <20111009073546.GI2681@mtldesk30>

On Sun, 2011-10-09 at 09:35 +0200, Eli Cohen wrote:
> On Sun, Oct 09, 2011 at 09:25:18AM +0200, Benjamin Herrenschmidt wrote:
> > On Thu, 2011-10-06 at 15:57 +0200, Eli Cohen wrote:
> > > On Wed, Oct 05, 2011 at 10:15:02AM +0200, Eli Cohen wrote:
> > > 
> > > How about this patch - can you give it a try?
> > > 
> > > 
> > > >From dee60547aa9e35a02835451d9e694cd80dd3072f Mon Sep 17 00:00:00 2001
> > > From: Eli Cohen <eli@mellanox.co.il>
> > > Date: Thu, 6 Oct 2011 15:50:02 +0200
> > > Subject: [PATCH] mlx4_en: Fix blue flame on powerpc
> > > 
> > > The source buffer used for copying into the blue flame register is already in
> > > big endian. However, when copying to device on powerpc, the endianess is
> > > swapped so the data reaches th device in little endian which is wrong. On x86
> > > based platform no swapping occurs so it reaches the device with the correct
> > > endianess. Fix this by calling le32_to_cpu() on the buffer. On LE systems there
> > > is no change; on BE there will be a swap.
> > 
> > That looks wrong.
> Not sure I understand: are you saying that on ppc, when you call
> __iowrite64_copy, it will not reach the device swapped?

Well, first, what do you mean by "swapped" ? :-) But no, it won't for
all intend and purpose, this is a copy routine, copy routines never
swap, neither do fifo accesses for example.

> The point is that we must always have the buffer ready in big endian
> in memory. In the case of blue flame, we must also copy it to the
> device registers in pci memory space. So if we use the buffer we
> already prepared, we must have another swap. I can think of a nicer
> way to implement this functionality but the question is do you think
> my observation above is wrong and why.

No. If it's in memory BE then the copy routine will keep it BE. A copy
routine doesn't swap and doesn't affect endianness.

Additionally, a swapping phase like you proposed doing 32-bit swaps
means that you know for sure that the buffer is made of 32-bit
quantities, is that the case ? Even if you had needed that swap, if your
buffer had contained 16-bit or 64-bit quantities, you're toast.

What is this buffer anyway ? A descriptor or a network packet ?

If it's a packet, then it's data, endianness has no meaning (or rather
it has for individual fields of the packets but they are already in the
right format and a 32-bit swap will never be right).
 
It's almost never right to perform swapping when copying data (or
reading/writing a FIFO).

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Eli Cohen @ 2011-10-09  8:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin,
	netdev@vger.kernel.org, Eli Cohen, linuxppc-dev
In-Reply-To: <1318147254.29415.377.camel@pasglop>

On Sun, Oct 09, 2011 at 10:00:54AM +0200, Benjamin Herrenschmidt wrote:
> On Sun, 2011-10-09 at 09:35 +0200, Eli Cohen wrote:
> > On Sun, Oct 09, 2011 at 09:25:18AM +0200, Benjamin Herrenschmidt wrote:
> > > On Thu, 2011-10-06 at 15:57 +0200, Eli Cohen wrote:
> > > > On Wed, Oct 05, 2011 at 10:15:02AM +0200, Eli Cohen wrote:
> > > > 
> > > > How about this patch - can you give it a try?
> > > > 
> > > > 
> > > > >From dee60547aa9e35a02835451d9e694cd80dd3072f Mon Sep 17 00:00:00 2001
> > > > From: Eli Cohen <eli@mellanox.co.il>
> > > > Date: Thu, 6 Oct 2011 15:50:02 +0200
> > > > Subject: [PATCH] mlx4_en: Fix blue flame on powerpc
> > > > 
> > > > The source buffer used for copying into the blue flame register is already in
> > > > big endian. However, when copying to device on powerpc, the endianess is
> > > > swapped so the data reaches th device in little endian which is wrong. On x86
> > > > based platform no swapping occurs so it reaches the device with the correct
> > > > endianess. Fix this by calling le32_to_cpu() on the buffer. On LE systems there
> > > > is no change; on BE there will be a swap.
> > > 
> > > That looks wrong.
> > Not sure I understand: are you saying that on ppc, when you call
> > __iowrite64_copy, it will not reach the device swapped?
> 
> Well, first, what do you mean by "swapped" ? :-) But no, it won't for
> all intend and purpose, this is a copy routine, copy routines never
> swap, neither do fifo accesses for example.
When I say swapped, I mean not necessairliy by software. I think that
the chipset will swap the the data. The reason I think so is that the
CPU arch is big endian, while PCI bus is defined as little endian.
That's why I think a swap will occur in ppc and not in x86.
> 
> > The point is that we must always have the buffer ready in big endian
> > in memory. In the case of blue flame, we must also copy it to the
> > device registers in pci memory space. So if we use the buffer we
> > already prepared, we must have another swap. I can think of a nicer
> > way to implement this functionality but the question is do you think
> > my observation above is wrong and why.
> 
> No. If it's in memory BE then the copy routine will keep it BE. A copy
> routine doesn't swap and doesn't affect endianness.
> 
> Additionally, a swapping phase like you proposed doing 32-bit swaps
> means that you know for sure that the buffer is made of 32-bit
> quantities, is that the case ?
Yes

> Even if you had needed that swap, if your
> buffer had contained 16-bit or 64-bit quantities, you're toast.
> 
> What is this buffer anyway ? A descriptor or a network packet ?
It's a special descriptor that resides both in memory and also written
to the device's register. An it contains both data and control
informartion.
> 
> If it's a packet, then it's data, endianness has no meaning (or rather
> it has for individual fields of the packets but they are already in the
> right format and a 32-bit swap will never be right).
>  
> It's almost never right to perform swapping when copying data (or
> reading/writing a FIFO).
> 
> Cheers,
> Ben.

^ permalink raw reply

* [PATCH] gro: refetch inet6_protos[] after pulling ext headers
From: Yan, Zheng @ 2011-10-09  8:34 UTC (permalink / raw)
  To: netdev@vger.kernel.org, Herbert Xu, davem@davemloft.net

ipv6_gro_receive() doesn't update the protocol ops after pulling
the ext headers. It looks like a typo.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 3b5669a..d27c797 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -875,6 +875,7 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
 		skb_reset_transport_header(skb);
 		__skb_push(skb, skb_gro_offset(skb));
 
+		ops = rcu_dereference(inet6_protos[proto]);
 		if (!ops || !ops->gro_receive)
 			goto out_unlock;
 

^ permalink raw reply related

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Benjamin Herrenschmidt @ 2011-10-09  8:38 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin,
	netdev@vger.kernel.org, Eli Cohen, linuxppc-dev
In-Reply-To: <20111009080747.GJ2681@mtldesk30>

On Sun, 2011-10-09 at 10:07 +0200, Eli Cohen wrote:

> > Well, first, what do you mean by "swapped" ? :-) But no, it won't for
> > all intend and purpose, this is a copy routine, copy routines never
> > swap, neither do fifo accesses for example.
> When I say swapped, I mean not necessairliy by software. I think that
> the chipset will swap the the data. The reason I think so is that the
> CPU arch is big endian, while PCI bus is defined as little endian.
> That's why I think a swap will occur in ppc and not in x86.

No it won't "swap the data". The wiring between PCI and the CPU bus is
done in a way called "byte address invariant", and there is some kind of
flip of byte lanes related essentially to ensure that, but for all
intend and purpose it's transparent.

> It's a special descriptor that resides both in memory and also written
> to the device's register. An it contains both data and control
> informartion.

Data should not be swapped then. Only individual bits of control
information. In any case, the buffer should be generated in the right
format in memory to start with. The copy routine doesn't need to swap.

To go back to the driver code, the statements that ring a "bell" are:

	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;

This doesn't look right unless "doorbell_qpn" itself is already somewhat
in the appropriate byte order.

Is that vlan_tag a big or little endian quantity ? Either way, this
looks broken in either x86 or ppc unless doorbell_qpn itself is already
in the right endian.

But since later I see

	writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);

That rings a nasty bell, it looks like doorbell_pqn is in CPU (native)
endian, so it should have to be swapped before being OR'ed into the
descriptor, either that or the HW does some black magic I don't
understand.

Cheers,
Ben.

^ permalink raw reply

* [PATCH] r8169: Fix WOL in power down case
From: Marc Ballarin @ 2011-10-09  8:49 UTC (permalink / raw)
  To: Francois Romieu; +Cc: hayeswang, netdev
In-Reply-To: <20111007232834.GA29770@electric-eye.fr.zoreil.com>

Commit 92fc43b4159b518f5baae57301f26d770b0834c9 ("r8169: modify the flow of the
hw reset.") breaks WOL on some versions of the hardware.

Commit 106633897e086e1b47126996aac1a427eb80eb1b ("r8169: fix WOL setting for
8105 and 8111evl") tries to fix this, but only does so in the standby case.

This patch applies an analogous fix to the shutdown path.

Signed-off-by: Marc Ballarin <ballarin.marc@gmx.de>

---

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index c236670..a2d6e3a 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -5817,6 +5817,15 @@ static void rtl_shutdown(struct pci_dev *pdev)
 			/* PCI commit */
 			RTL_R8(ChipCmd);
 		}
+		/* rtl8105, rtl8111E, and rtl8111evl need the following bits
+		 * set for WOL to work */
+		if ((tp->mac_version == RTL_GIGA_MAC_VER_32 ||
+		     tp->mac_version == RTL_GIGA_MAC_VER_33 ||
+		     tp->mac_version == RTL_GIGA_MAC_VER_34) &&
+		    (__rtl8169_get_wol(tp) & WAKE_ANY)) {
+			RTL_W32(RxConfig, RTL_R32(RxConfig) | AcceptBroadcast |
+			  AcceptMulticast | AcceptMyPhys);
+		}
 
 		pci_wake_from_d3(pdev, true);
 		pci_set_power_state(pdev, PCI_D3hot);

^ permalink raw reply related

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Eli Cohen @ 2011-10-09  9:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin,
	netdev@vger.kernel.org, Eli Cohen, linuxppc-dev
In-Reply-To: <1318149536.29415.384.camel@pasglop>

On Sun, Oct 09, 2011 at 10:38:56AM +0200, Benjamin Herrenschmidt wrote:
> On Sun, 2011-10-09 at 10:07 +0200, Eli Cohen wrote:
> 
> > > Well, first, what do you mean by "swapped" ? :-) But no, it won't for
> > > all intend and purpose, this is a copy routine, copy routines never
> > > swap, neither do fifo accesses for example.
> > When I say swapped, I mean not necessairliy by software. I think that
> > the chipset will swap the the data. The reason I think so is that the
> > CPU arch is big endian, while PCI bus is defined as little endian.
> > That's why I think a swap will occur in ppc and not in x86.
> 
> No it won't "swap the data". The wiring between PCI and the CPU bus is
> done in a way called "byte address invariant", and there is some kind of
> flip of byte lanes related essentially to ensure that, but for all
> intend and purpose it's transparent.
> 
> > It's a special descriptor that resides both in memory and also written
> > to the device's register. An it contains both data and control
> > informartion.
> 
> Data should not be swapped then. Only individual bits of control
> information. In any case, the buffer should be generated in the right
> format in memory to start with. The copy routine doesn't need to swap.
> 
> To go back to the driver code, the statements that ring a "bell" are:
> 
> 	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
> 
> This doesn't look right unless "doorbell_qpn" itself is already somewhat
> in the appropriate byte order.
This is something that supports my claim that the chipset swaps
endianess in ppc.
Look at mlx4_en_activate_tx_ring():
ring->doorbell_qpn = swab32(ring->qp.qpn << 8);
so in LE machines it layed as big endian in memory while in BE machines
it is layed as little endian in memory.
Then we write this value to the device registers which must get it in
big endian otherwise it won't work - and we know this works in both
ppc and x86. You can ignore the case of blue flame:

        } else if (nreq) {
                qp->sq.head += nreq;

                /*
                 * Make sure that descriptors are written before
                 * doorbell record.
                 */
                wmb();

                writel(qp->doorbell_qpn, qp->bf.uar->map +
MLX4_SEND_DOORBELL); <==  remember that it is layed in little endian
but the device must get it in big endian.

                /*
                 * Make sure doorbells don't leak out of SQ spinlock
                 * and reach the HCA out of order.
                 */
                mmiowb();

        }




> 
> Is that vlan_tag a big or little endian quantity ? Either way, this
> looks broken in either x86 or ppc unless doorbell_qpn itself is already
> in the right endian.
> 
> But since later I see
> 
> 	writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
> 
> That rings a nasty bell, it looks like doorbell_pqn is in CPU (native)
> endian, so it should have to be swapped before being OR'ed into the
> descriptor, either that or the HW does some black magic I don't
> understand.
> 
> Cheers,
> Ben.
> 

^ permalink raw reply

* Re: [PATCH] gro: refetch inet6_protos[] after pulling ext headers
From: Eric Dumazet @ 2011-10-09  9:33 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: netdev@vger.kernel.org, Herbert Xu, davem@davemloft.net
In-Reply-To: <4E915C9B.4000807@intel.com>

Le dimanche 09 octobre 2011 à 16:34 +0800, Yan, Zheng a écrit :
> ipv6_gro_receive() doesn't update the protocol ops after pulling
> the ext headers. It looks like a typo.
> 
> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
> ---
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index 3b5669a..d27c797 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -875,6 +875,7 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
>  		skb_reset_transport_header(skb);
>  		__skb_push(skb, skb_gro_offset(skb));
>  
> +		ops = rcu_dereference(inet6_protos[proto]);
>  		if (!ops || !ops->gro_receive)
>  			goto out_unlock;

Good catch !

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Benjamin Herrenschmidt @ 2011-10-09  9:52 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin,
	netdev@vger.kernel.org, Eli Cohen, linuxppc-dev
In-Reply-To: <20111009092102.GK2681@mtldesk30>


> > To go back to the driver code, the statements that ring a "bell" are:
> > 
> > 	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
> > 
> > This doesn't look right unless "doorbell_qpn" itself is already somewhat
> > in the appropriate byte order.

> This is something that supports my claim that the chipset swaps
> endianess in ppc.

No the chipset doesn't swap

> Look at mlx4_en_activate_tx_ring():
> ring->doorbell_qpn = swab32(ring->qp.qpn << 8);

That looks gross, I think somebody writing this driver doesn't
understand endianness.

> so in LE machines it layed as big endian in memory while in BE machines
> it is layed as little endian in memory.

Yes which is very odd, it should be layed out the same in memory
regardless of the machine. However in this case, this isn't accessed
directly via DMA (this field at least), so what you appear to be doing
here is to artificially "undo" what writel does later on (see below).

> Then we write this value to the device registers which must get it in
> big endian otherwise it won't work - and we know this works in both
> ppc and x86. You can ignore the case of blue flame:

Well it works because your device is odd and has BE registers :-) It's
however not the right way to do and it's broken in your blue flame case
(for what is now obvious reasons, see below).

What's happening basically here is that you are swapping once in
swab32 , store that swapped value, then writel will re-swap on power and
not swap on x86 (because the writel accessor performs swapping). So on
x86 you basically do LE -> BE and on ppc you do BE -> LE -> BE :-)
Pretty inefficient.

None of this has anything to do with the chipset which doesn't swap
anything behind your back.

Ideally you want to avoid that swapping altogether and use the right
accessor that indicates that your register is BE to start with. IE.
remove the swab32 completely and then use something like 
iowrite32be() instead of writel().

Basically, the problem you have is that writel() has an implicit "write
to LE register" semantic. Your register is BE. the "iomap" variants
provide you with more fine grained "be" variants to use in that case.
There's also writel_be() but that one doesn't exist on every
architecture afaik.

Now, once the mmio problem is out of the way, let's look back at how you
then use that qpn.

With the current code, you've generated something in memory which is
byte reversed, so essentially "LE" on ppc and "BE" on x86.

Then, this statement:

*(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;

Will essentially write it out as-is in memory for use by the chip. The chip,
from what you say, expects BE, so this will be broken on PPC.

Here too, the right solution is to instead not do that swab32 to begin
with (ring->doorbell_qpn remains a native endian value) and instead do,
in addition to the above mentioned change to the writel:

	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= cpu_to_be32(ring->doorbell_qpn);

(Also get rid of that cast and define vlan_tag as a __be32 to start
with).

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Eli Cohen @ 2011-10-09 10:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin,
	netdev@vger.kernel.org, Eli Cohen, linuxppc-dev
In-Reply-To: <1318153939.29415.401.camel@pasglop>

On Sun, Oct 09, 2011 at 11:52:19AM +0200, Benjamin Herrenschmidt wrote:
> 
> > > To go back to the driver code, the statements that ring a "bell" are:
> > > 
> > > 	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
> > > 
> > > This doesn't look right unless "doorbell_qpn" itself is already somewhat
> > > in the appropriate byte order.
> 
> > This is something that supports my claim that the chipset swaps
> > endianess in ppc.
> 
> No the chipset doesn't swap
> 
> > Look at mlx4_en_activate_tx_ring():
> > ring->doorbell_qpn = swab32(ring->qp.qpn << 8);
> 
> That looks gross, I think somebody writing this driver doesn't
> understand endianness.
> 
> > so in LE machines it layed as big endian in memory while in BE machines
> > it is layed as little endian in memory.
> 
> Yes which is very odd, it should be layed out the same in memory
> regardless of the machine. However in this case, this isn't accessed
> directly via DMA (this field at least), so what you appear to be doing
> here is to artificially "undo" what writel does later on (see below).
> 
> > Then we write this value to the device registers which must get it in
> > big endian otherwise it won't work - and we know this works in both
> > ppc and x86. You can ignore the case of blue flame:
> 
> Well it works because your device is odd and has BE registers :-) It's
> however not the right way to do and it's broken in your blue flame case
> (for what is now obvious reasons, see below).
> 
> What's happening basically here is that you are swapping once in
> swab32 , store that swapped value, then writel will re-swap on power and
> not swap on x86 (because the writel accessor performs swapping). So on
> x86 you basically do LE -> BE and on ppc you do BE -> LE -> BE :-)
> Pretty inefficient.
> 
> None of this has anything to do with the chipset which doesn't swap
> anything behind your back.
> 
> Ideally you want to avoid that swapping altogether and use the right
> accessor that indicates that your register is BE to start with. IE.
> remove the swab32 completely and then use something like 
> iowrite32be() instead of writel().
I agree, this looks better but does it work on memory mapped io or
only on io pci space? All our registers are memory mapped...

> 
> Basically, the problem you have is that writel() has an implicit "write
> to LE register" semantic. Your register is BE. the "iomap" variants
> provide you with more fine grained "be" variants to use in that case.
> There's also writel_be() but that one doesn't exist on every
> architecture afaik.
So writel_be is the function I should use for memory mapped io? If it
does not exist for all platforms it's a pitty :-(
> 
> Now, once the mmio problem is out of the way, let's look back at how you
> then use that qpn.
> 
> With the current code, you've generated something in memory which is
> byte reversed, so essentially "LE" on ppc and "BE" on x86.
> 
> Then, this statement:
> 
> *(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
> 
> Will essentially write it out as-is in memory for use by the chip. The chip,
> from what you say, expects BE, so this will be broken on PPC.
I see. So this field is layed in le for ppc and the rest of the
descriptor is be. so I assum that __iowrite64_copy() does not swap
anything but we still have tx_desc->ctrl.vlan_tag in the wrong
endianess.

> 
> Here too, the right solution is to instead not do that swab32 to begin
> with (ring->doorbell_qpn remains a native endian value) and instead do,
> in addition to the above mentioned change to the writel:
> 
> 	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= cpu_to_be32(ring->doorbell_qpn);
> 
> (Also get rid of that cast and define vlan_tag as a __be32 to start
> with).
> 
> Cheers,
> Ben.

Thanks for your review. I will send another patch which should fix the
deficiencies.

^ permalink raw reply

* Re: [PATCH] r8169: Fix WOL in power down case
From: Francois Romieu @ 2011-10-09 11:54 UTC (permalink / raw)
  To: Marc Ballarin; +Cc: hayeswang, netdev
In-Reply-To: <20111009104907.0c6cd439900ca5eb7789295f@gmx.de>

Marc Ballarin <ballarin.marc@gmx.de> :
> Commit 92fc43b4159b518f5baae57301f26d770b0834c9 ("r8169: modify the flow of the
> hw reset.") breaks WOL on some versions of the hardware.
> 
> Commit 106633897e086e1b47126996aac1a427eb80eb1b ("r8169: fix WOL setting for
> 8105 and 8111evl") tries to fix this, but only does so in the standby case.
> 
> This patch applies an analogous fix to the shutdown path.

Nonononono, you must dig for similar occurrences of this bug. Take a look at
r810x_pll_power_down and you will realize that RTL_GIGA_MAC_VER_{29, 30} have
the exact same problem.

Moreover, the patch duplicates a chunk of conditional code:
1) it bloats the driver (ok, we have seen worse...).
2) you can bet that both instances won't be updated if / when
   RTL_GIGA_MAC_VER_xy appears.

The patch below should fix it.

It looks like PM savings needs some love when WoL is enabled but this is a
different topic.

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index c236670..b53c83d 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3328,22 +3328,36 @@ static void r810x_phy_power_up(struct rtl8169_private *tp)
 	rtl_writephy(tp, MII_BMCR, BMCR_ANENABLE);
 }
 
-static void r810x_pll_power_down(struct rtl8169_private *tp)
+static bool rtl_test_wol_and_enable_packets_receive(struct rtl8169_private *tp)
 {
 	void __iomem *ioaddr = tp->mmio_addr;
 
-	if (__rtl8169_get_wol(tp) & WAKE_ANY) {
-		rtl_writephy(tp, 0x1f, 0x0000);
-		rtl_writephy(tp, MII_BMCR, 0x0000);
+	if (!(__rtl8169_get_wol(tp) & WAKE_ANY))
+		return false;
 
-		if (tp->mac_version == RTL_GIGA_MAC_VER_29 ||
-		    tp->mac_version == RTL_GIGA_MAC_VER_30)
-			RTL_W32(RxConfig, RTL_R32(RxConfig) | AcceptBroadcast |
-				AcceptMulticast | AcceptMyPhys);
-		return;
+	rtl_writephy(tp, 0x1f, 0x0000);
+	rtl_writephy(tp, MII_BMCR, 0x0000);
+
+	switch (tp->mac_version) {
+	case RTL_GIGA_MAC_VER_29:
+	case RTL_GIGA_MAC_VER_30:
+	case RTL_GIGA_MAC_VER_32:
+	case RTL_GIGA_MAC_VER_33:
+	case RTL_GIGA_MAC_VER_34:
+		RTL_W32(RxConfig, RTL_R32(RxConfig) |
+			AcceptBroadcast | AcceptMulticast | AcceptMyPhys);
+		break;
+	default:
+		break;
 	}
 
-	r810x_phy_power_down(tp);
+	return true;
+}
+
+static void r810x_pll_power_down(struct rtl8169_private *tp)
+{
+	if (!rtl_test_wol_and_enable_packets_receive(tp))
+		r810x_phy_power_down(tp);
 }
 
 static void r810x_pll_power_up(struct rtl8169_private *tp)
@@ -3430,17 +3444,8 @@ static void r8168_pll_power_down(struct rtl8169_private *tp)
 	    tp->mac_version == RTL_GIGA_MAC_VER_33)
 		rtl_ephy_write(ioaddr, 0x19, 0xff64);
 
-	if (__rtl8169_get_wol(tp) & WAKE_ANY) {
-		rtl_writephy(tp, 0x1f, 0x0000);
-		rtl_writephy(tp, MII_BMCR, 0x0000);
-
-		if (tp->mac_version == RTL_GIGA_MAC_VER_32 ||
-		    tp->mac_version == RTL_GIGA_MAC_VER_33 ||
-		    tp->mac_version == RTL_GIGA_MAC_VER_34)
-			RTL_W32(RxConfig, RTL_R32(RxConfig) | AcceptBroadcast |
-				AcceptMulticast | AcceptMyPhys);
+	if (rtl_test_wol_and_enable_packets_receive(tp))
 		return;
-	}
 
 	r8168_phy_power_down(tp);
 
@@ -5807,10 +5812,10 @@ static void rtl_shutdown(struct pci_dev *pdev)
 
 	if (system_state == SYSTEM_POWER_OFF) {
 		/* WoL fails with 8168b when the receiver is disabled. */
-		if ((tp->mac_version == RTL_GIGA_MAC_VER_11 ||
+		if (rtl_test_wol_and_enable_packets_receive(tp) &&
+		    (tp->mac_version == RTL_GIGA_MAC_VER_11 ||
 		     tp->mac_version == RTL_GIGA_MAC_VER_12 ||
-		     tp->mac_version == RTL_GIGA_MAC_VER_17) &&
-		    (tp->features & RTL_FEATURE_WOL)) {
+		     tp->mac_version == RTL_GIGA_MAC_VER_17)) {
 			pci_clear_master(pdev);
 
 			RTL_W8(ChipCmd, CmdRxEnb);

^ permalink raw reply related

* [PATCH 0/7] mlx4_en: bug fixes and performance enhancement
From: Alexander Guller @ 2011-10-09 15:26 UTC (permalink / raw)
  To: davem; +Cc: netdev, alexg, yevgenyp

Hi Dave,

This is a series of patches for the mlx4_en driver generated against net-next tree.
The patches include several bug fixes and a fix for a better initialization of TX interrupts.

 en_cq.c      |   31 +++++++++-------
 en_ethtool.c |   14 ++++++-
 en_main.c    |    6 ++-
 en_netdev.c  |  112 ++++++++++++++++++++---------------------------------------
 en_port.c    |   18 +++++++--
 en_port.h    |   11 ++++-
 mlx4_en.h    |   15 +++----
 7 files changed, 103 insertions(+), 104 deletions(-)


Best regards,
Alex.

^ permalink raw reply

* [PATCH 1/7] mlx4_en: Assigning TX irq per ring
From: Alexander Guller @ 2011-10-09 15:26 UTC (permalink / raw)
  To: davem; +Cc: netdev, alexg, yevgenyp

Until now only RX rings used irq per ring
and TX used only one per port.
>From now on, both of them will use the
irq per ring while RX & TX ring[i] will
use the same irq.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
Signed-off-by: Sharon Cohen <sharonc@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c     |   26 ++++++++++++++---------
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   14 +-----------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |    4 +-
 3 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index ec4b6d0..70ec529 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -74,7 +74,8 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv,
 	return err;
 }
 
-int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq)
+int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
+			int cq_idx)
 {
 	struct mlx4_en_dev *mdev = priv->mdev;
 	int err = 0;
@@ -90,13 +91,15 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq)
 	if (cq->is_tx == RX) {
 		if (mdev->dev->caps.comp_pool) {
 			if (!cq->vector) {
-				sprintf(name , "%s-rx-%d", priv->dev->name, cq->ring);
+				sprintf(name, "%s-%d", priv->dev->name,
+					cq->ring);
+				/* Set IRQ for specific name (per ring) */
 				if (mlx4_assign_eq(mdev->dev, name, &cq->vector)) {
-					cq->vector = (cq->ring + 1 + priv->port) %
-						mdev->dev->caps.num_comp_vectors;
+					cq->vector = (cq->ring + 1 + priv->port)
+					    % mdev->dev->caps.num_comp_vectors;
 					mlx4_warn(mdev, "Failed Assigning an EQ to "
-						  "%s_rx-%d ,Falling back to legacy EQ's\n",
-						  priv->dev->name, cq->ring);
+						  "%s ,Falling back to legacy EQ's\n",
+						  name);
 				}
 			}
 		} else {
@@ -104,10 +107,13 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq)
 				mdev->dev->caps.num_comp_vectors;
 		}
 	} else {
-		if (!cq->vector || !mdev->dev->caps.comp_pool) {
-			/*Fallback to legacy pool in case of error*/
-			cq->vector   = 0;
-		}
+		/* For TX we use the same irq per
+		ring we assigned for the RX    */
+		struct mlx4_en_cq *rx_cq;
+
+		cq_idx = cq_idx % priv->rx_ring_num;
+		rx_cq = &priv->rx_cq[cq_idx];
+		cq->vector = rx_cq->vector;
 	}
 
 	if (!cq->is_tx)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 27789be..b42c6aa 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -587,7 +587,6 @@ int mlx4_en_start_port(struct net_device *dev)
 	int i;
 	int j;
 	u8 mc_list[16] = {0};
-	char name[32];
 
 	if (priv->port_up) {
 		en_dbg(DRV, priv, "start port called while port already up\n");
@@ -608,7 +607,7 @@ int mlx4_en_start_port(struct net_device *dev)
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		cq = &priv->rx_cq[i];
 
-		err = mlx4_en_activate_cq(priv, cq);
+		err = mlx4_en_activate_cq(priv, cq, i);
 		if (err) {
 			en_err(priv, "Failed activating Rx CQ\n");
 			goto cq_err;
@@ -642,20 +641,11 @@ int mlx4_en_start_port(struct net_device *dev)
 		goto mac_err;
 	}
 
-	if (mdev->dev->caps.comp_pool && !priv->tx_vector) {
-		sprintf(name , "%s-tx", priv->dev->name);
-		if (mlx4_assign_eq(mdev->dev , name, &priv->tx_vector)) {
-			mlx4_warn(mdev, "Failed Assigning an EQ to "
-					"%s_tx ,Falling back to legacy "
-					"EQ's\n", priv->dev->name);
-		}
-	}
 	/* Configure tx cq's and rings */
 	for (i = 0; i < priv->tx_ring_num; i++) {
 		/* Configure cq */
 		cq = &priv->tx_cq[i];
-		cq->vector = priv->tx_vector;
-		err = mlx4_en_activate_cq(priv, cq);
+		err = mlx4_en_activate_cq(priv, cq, i);
 		if (err) {
 			en_err(priv, "Failed allocating Tx CQ\n");
 			goto tx_err;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index ed84811..115784d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -470,7 +470,6 @@ struct mlx4_en_priv {
 	u16 log_rx_info;
 
 	struct mlx4_en_tx_ring tx_ring[MAX_TX_RINGS];
-	int tx_vector;
 	struct mlx4_en_rx_ring rx_ring[MAX_RX_RINGS];
 	struct mlx4_en_cq tx_cq[MAX_TX_RINGS];
 	struct mlx4_en_cq rx_cq[MAX_RX_RINGS];
@@ -510,7 +509,8 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 		      int entries, int ring, enum cq_type mode);
 void mlx4_en_destroy_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 			bool reserve_vectors);
-int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq);
+int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
+			int cq_idx);
 void mlx4_en_deactivate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq);
 int mlx4_en_set_cq_moder(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq);
 int mlx4_en_arm_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq);
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 2/7] mlx4_en: Removing reserve vectors
From: Alexander Guller @ 2011-10-09 15:26 UTC (permalink / raw)
  To: davem; +Cc: netdev, alexg, yevgenyp

Fixed a bug where ring size change caused insufficient memory
upon driver restart due to unreleased EQs.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c      |    5 ++---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |    8 ++++----
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |    5 ++---
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 70ec529..227997d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -139,14 +139,13 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 	return 0;
 }
 
-void mlx4_en_destroy_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
-			bool reserve_vectors)
+void mlx4_en_destroy_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq)
 {
 	struct mlx4_en_dev *mdev = priv->mdev;
 
 	mlx4_en_unmap_buffer(&cq->wqres.buf);
 	mlx4_free_hwq_res(mdev->dev, &cq->wqres, cq->buf_size);
-	if (priv->mdev->dev->caps.comp_pool && cq->vector && !reserve_vectors)
+	if (priv->mdev->dev->caps.comp_pool && cq->vector)
 		mlx4_release_eq(priv->mdev->dev, cq->vector);
 	cq->buf_size = 0;
 	cq->buf = NULL;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index eb09625..e247bd7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -416,7 +416,7 @@ static int mlx4_en_set_ringparam(struct net_device *dev,
 		mlx4_en_stop_port(dev);
 	}
 
-	mlx4_en_free_resources(priv, true);
+	mlx4_en_free_resources(priv);
 
 	priv->prof->tx_ring_size = tx_size;
 	priv->prof->rx_ring_size = rx_size;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index b42c6aa..8402982 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -876,7 +876,7 @@ static int mlx4_en_close(struct net_device *dev)
 	return 0;
 }
 
-void mlx4_en_free_resources(struct mlx4_en_priv *priv, bool reserve_vectors)
+void mlx4_en_free_resources(struct mlx4_en_priv *priv)
 {
 	int i;
 
@@ -884,14 +884,14 @@ void mlx4_en_free_resources(struct mlx4_en_priv *priv, bool reserve_vectors)
 		if (priv->tx_ring[i].tx_info)
 			mlx4_en_destroy_tx_ring(priv, &priv->tx_ring[i]);
 		if (priv->tx_cq[i].buf)
-			mlx4_en_destroy_cq(priv, &priv->tx_cq[i], reserve_vectors);
+			mlx4_en_destroy_cq(priv, &priv->tx_cq[i]);
 	}
 
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		if (priv->rx_ring[i].rx_info)
 			mlx4_en_destroy_rx_ring(priv, &priv->rx_ring[i]);
 		if (priv->rx_cq[i].buf)
-			mlx4_en_destroy_cq(priv, &priv->rx_cq[i], reserve_vectors);
+			mlx4_en_destroy_cq(priv, &priv->rx_cq[i]);
 	}
 }
 
@@ -961,7 +961,7 @@ void mlx4_en_destroy_netdev(struct net_device *dev)
 	mdev->pndev[priv->port] = NULL;
 	mutex_unlock(&mdev->state_lock);
 
-	mlx4_en_free_resources(priv, false);
+	mlx4_en_free_resources(priv);
 	free_netdev(dev);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 115784d..fe8146d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -502,13 +502,12 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 int mlx4_en_start_port(struct net_device *dev);
 void mlx4_en_stop_port(struct net_device *dev);
 
-void mlx4_en_free_resources(struct mlx4_en_priv *priv, bool reserve_vectors);
+void mlx4_en_free_resources(struct mlx4_en_priv *priv);
 int mlx4_en_alloc_resources(struct mlx4_en_priv *priv);
 
 int mlx4_en_create_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 		      int entries, int ring, enum cq_type mode);
-void mlx4_en_destroy_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
-			bool reserve_vectors);
+void mlx4_en_destroy_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq);
 int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 			int cq_idx);
 void mlx4_en_deactivate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq);
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 4/7] mlx4_en: Added missing iounmap upon releasing a device
From: Alexander Guller @ 2011-10-09 15:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, alexg, yevgenyp

Fixed a memory leak caused by missing iounmap when device
is being released.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
Signed-off-by: Sharon Cohen <sharonc@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_main.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index 6bfea23..a06096f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -176,6 +176,7 @@ static void mlx4_en_remove(struct mlx4_dev *dev, void *endev_ptr)
 	flush_workqueue(mdev->workqueue);
 	destroy_workqueue(mdev->workqueue);
 	mlx4_mr_free(dev, &mdev->mr);
+	iounmap(mdev->uar_map);
 	mlx4_uar_free(dev, &mdev->priv_uar);
 	mlx4_pd_free(dev, mdev->priv_pdn);
 	kfree(mdev);
@@ -223,7 +224,7 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
 			 MLX4_PERM_LOCAL_WRITE |  MLX4_PERM_LOCAL_READ,
 			 0, 0, &mdev->mr)) {
 		mlx4_err(mdev, "Failed allocating memory region\n");
-		goto err_uar;
+		goto err_map;
 	}
 	if (mlx4_mr_enable(mdev->dev, &mdev->mr)) {
 		mlx4_err(mdev, "Failed enabling memory region\n");
@@ -282,6 +283,9 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
 
 err_mr:
 	mlx4_mr_free(dev, &mdev->mr);
+err_map:
+	if (!mdev->uar_map)
+		iounmap(mdev->uar_map);
 err_uar:
 	mlx4_uar_free(dev, &mdev->priv_uar);
 err_pd:
-- 
1.7.5.4

Until now only RX rings used irq per ring
and TX used only one per port.
>From now on, both of them will use the
irq per ring while RX & TX ring[i] will
use the same irq.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
Signed-off-by: Sharon Cohen <sharonc@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c     |   26 ++++++++++++++---------
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   14 +-----------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |    4 +-
 3 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index ec4b6d0..70ec529 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -74,7 +74,8 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv,
 	return err;
 }
 
-int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq)
+int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
+			int cq_idx)
 {
 	struct mlx4_en_dev *mdev = priv->mdev;
 	int err = 0;
@@ -90,13 +91,15 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq)
 	if (cq->is_tx == RX) {
 		if (mdev->dev->caps.comp_pool) {
 			if (!cq->vector) {
-				sprintf(name , "%s-rx-%d", priv->dev->name, cq->ring);
+				sprintf(name, "%s-%d", priv->dev->name,
+					cq->ring);
+				/* Set IRQ for specific name (per ring) */
 				if (mlx4_assign_eq(mdev->dev, name, &cq->vector)) {
-					cq->vector = (cq->ring + 1 + priv->port) %
-						mdev->dev->caps.num_comp_vectors;
+					cq->vector = (cq->ring + 1 + priv->port)
+					    % mdev->dev->caps.num_comp_vectors;
 					mlx4_warn(mdev, "Failed Assigning an EQ to "
-						  "%s_rx-%d ,Falling back to legacy EQ's\n",
-						  priv->dev->name, cq->ring);
+						  "%s ,Falling back to legacy EQ's\n",
+						  name);
 				}
 			}
 		} else {
@@ -104,10 +107,13 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq)
 				mdev->dev->caps.num_comp_vectors;
 		}
 	} else {
-		if (!cq->vector || !mdev->dev->caps.comp_pool) {
-			/*Fallback to legacy pool in case of error*/
-			cq->vector   = 0;
-		}
+		/* For TX we use the same irq per
+		ring we assigned for the RX    */
+		struct mlx4_en_cq *rx_cq;
+
+		cq_idx = cq_idx % priv->rx_ring_num;
+		rx_cq = &priv->rx_cq[cq_idx];
+		cq->vector = rx_cq->vector;
 	}
 
 	if (!cq->is_tx)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 27789be..b42c6aa 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -587,7 +587,6 @@ int mlx4_en_start_port(struct net_device *dev)
 	int i;
 	int j;
 	u8 mc_list[16] = {0};
-	char name[32];
 
 	if (priv->port_up) {
 		en_dbg(DRV, priv, "start port called while port already up\n");
@@ -608,7 +607,7 @@ int mlx4_en_start_port(struct net_device *dev)
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		cq = &priv->rx_cq[i];
 
-		err = mlx4_en_activate_cq(priv, cq);
+		err = mlx4_en_activate_cq(priv, cq, i);
 		if (err) {
 			en_err(priv, "Failed activating Rx CQ\n");
 			goto cq_err;
@@ -642,20 +641,11 @@ int mlx4_en_start_port(struct net_device *dev)
 		goto mac_err;
 	}
 
-	if (mdev->dev->caps.comp_pool && !priv->tx_vector) {
-		sprintf(name , "%s-tx", priv->dev->name);
-		if (mlx4_assign_eq(mdev->dev , name, &priv->tx_vector)) {
-			mlx4_warn(mdev, "Failed Assigning an EQ to "
-					"%s_tx ,Falling back to legacy "
-					"EQ's\n", priv->dev->name);
-		}
-	}
 	/* Configure tx cq's and rings */
 	for (i = 0; i < priv->tx_ring_num; i++) {
 		/* Configure cq */
 		cq = &priv->tx_cq[i];
-		cq->vector = priv->tx_vector;
-		err = mlx4_en_activate_cq(priv, cq);
+		err = mlx4_en_activate_cq(priv, cq, i);
 		if (err) {
 			en_err(priv, "Failed allocating Tx CQ\n");
 			goto tx_err;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index ed84811..115784d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -470,7 +470,6 @@ struct mlx4_en_priv {
 	u16 log_rx_info;
 
 	struct mlx4_en_tx_ring tx_ring[MAX_TX_RINGS];
-	int tx_vector;
 	struct mlx4_en_rx_ring rx_ring[MAX_RX_RINGS];
 	struct mlx4_en_cq tx_cq[MAX_TX_RINGS];
 	struct mlx4_en_cq rx_cq[MAX_RX_RINGS];
@@ -510,7 +509,8 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 		      int entries, int ring, enum cq_type mode);
 void mlx4_en_destroy_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 			bool reserve_vectors);
-int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq);
+int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
+			int cq_idx);
 void mlx4_en_deactivate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq);
 int mlx4_en_set_cq_moder(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq);
 int mlx4_en_arm_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq);
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 5/7] mlx4_en: Fix QP number calculation according to module param
From: Alexander Guller @ 2011-10-09 15:29 UTC (permalink / raw)
  To: davem; +Cc: netdev, alexg, yevgenyp

Number of bits taken from mac table index in QP
calculation should be based on log_num_mac parameter.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_port.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c
index 5ada5b4..8824309 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
@@ -128,7 +128,7 @@ int mlx4_SET_PORT_qpn_calc(struct mlx4_dev *dev, u8 port, u32 base_qpn,
 	memset(context, 0, sizeof *context);
 
 	context->base_qpn = cpu_to_be32(base_qpn);
-	context->n_mac = 0x2;
+	context->n_mac = dev->caps.log_num_macs;
 	context->promisc = cpu_to_be32(promisc << SET_PORT_PROMISC_SHIFT |
 				       base_qpn);
 	context->mcast = cpu_to_be32(m_promisc << SET_PORT_MC_PROMISC_SHIFT |
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 6/7] mlx4_en: Fix crash upon device initialization error
From: Alexander Guller @ 2011-10-09 15:29 UTC (permalink / raw)
  To: davem; +Cc: netdev, alexg, yevgenyp

Netdevice was being freed without being unregistered first if
mlx4_SET_PORT_general or mlx4_INIT_PORT failed.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index b82db4a..c4c4be4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1097,6 +1097,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 		en_err(priv, "Netdev registration failed for port %d\n", port);
 		goto out;
 	}
+	priv->registered = 1;
 
 	en_warn(priv, "Using %d TX rings\n", prof->tx_ring_num);
 	en_warn(priv, "Using %d RX rings\n", prof->rx_ring_num);
@@ -1118,7 +1119,6 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 		en_err(priv, "Failed Initializing port\n");
 		goto out;
 	}
-	priv->registered = 1;
 	mlx4_en_set_default_moderation(priv);
 	queue_delayed_work(mdev->workqueue, &priv->stats_task, STATS_DELAY);
 	return 0;
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 7/7] mlx4_en: Adding 40gb speed report for ethtool
From: Alexander Guller @ 2011-10-09 15:29 UTC (permalink / raw)
  To: davem; +Cc: netdev, alexg, yevgenyp

Query port will now identify a 40G Ethernet speed.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_port.c |   16 +++++++++++++---
 drivers/net/ethernet/mellanox/mlx4/en_port.h |   11 +++++++++--
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c
index 8824309..9d27555 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
@@ -167,11 +167,21 @@ int mlx4_en_QUERY_PORT(struct mlx4_en_dev *mdev, u8 port)
 	/* This command is always accessed from Ethtool context
 	 * already synchronized, no need in locking */
 	state->link_state = !!(qport_context->link_up & MLX4_EN_LINK_UP_MASK);
-	if ((qport_context->link_speed & MLX4_EN_SPEED_MASK) ==
-	    MLX4_EN_1G_SPEED)
+	switch (qport_context->link_speed & MLX4_EN_SPEED_MASK) {
+	case MLX4_EN_1G_SPEED:
 		state->link_speed = 1000;
-	else
+		break;
+	case MLX4_EN_10G_SPEED_XAUI:
+	case MLX4_EN_10G_SPEED_XFI:
 		state->link_speed = 10000;
+		break;
+	case MLX4_EN_40G_SPEED:
+		state->link_speed = 40000;
+		break;
+	default:
+		state->link_speed = -1;
+		break;
+	}
 	state->transciver = qport_context->transceiver;
 
 out:
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.h b/drivers/net/ethernet/mellanox/mlx4/en_port.h
index e3d73e4..19eb244 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.h
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.h
@@ -94,6 +94,14 @@ enum {
 	MLX4_MCAST_ENABLE       = 2,
 };
 
+enum {
+	MLX4_EN_1G_SPEED	= 0x02,
+	MLX4_EN_10G_SPEED_XFI	= 0x01,
+	MLX4_EN_10G_SPEED_XAUI	= 0x00,
+	MLX4_EN_40G_SPEED	= 0x40,
+	MLX4_EN_OTHER_SPEED	= 0x0f,
+};
+
 struct mlx4_en_query_port_context {
 	u8 link_up;
 #define MLX4_EN_LINK_UP_MASK	0x80
@@ -101,8 +109,7 @@ struct mlx4_en_query_port_context {
 	__be16 mtu;
 	u8 reserved2;
 	u8 link_speed;
-#define MLX4_EN_SPEED_MASK	0x3
-#define MLX4_EN_1G_SPEED	0x2
+#define MLX4_EN_SPEED_MASK	0x43
 	u16 reserved3[5];
 	__be64 mac;
 	u8 transceiver;
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 3/7] mlx4_en: Adjusting moderation per each ring
From: Alexander Guller @ 2011-10-09 15:38 UTC (permalink / raw)
  To: davem; +Cc: netdev, alexg, yevgenyp

Moderation is now done per ring and coalescing is enabled
by set_ring_param in ethtool.

Signed-off-by: Alexander Guller <alexg@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |   12 +++-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |   88 ++++++++---------------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |    6 +-
 3 files changed, 45 insertions(+), 61 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index e247bd7..74e2a2a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -342,13 +342,13 @@ static int mlx4_en_set_coalesce(struct net_device *dev,
 	priv->rx_usecs_high = coal->rx_coalesce_usecs_high;
 	priv->sample_interval = coal->rate_sample_interval;
 	priv->adaptive_rx_coal = coal->use_adaptive_rx_coalesce;
-	priv->last_moder_time = MLX4_EN_AUTO_CONF;
 	if (priv->adaptive_rx_coal)
 		return 0;
 
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		priv->rx_cq[i].moder_cnt = priv->rx_frames;
 		priv->rx_cq[i].moder_time = priv->rx_usecs;
+		priv->last_moder_time[i] = MLX4_EN_AUTO_CONF;
 		err = mlx4_en_set_cq_moder(priv, &priv->rx_cq[i]);
 		if (err)
 			return err;
@@ -394,6 +394,7 @@ static int mlx4_en_set_ringparam(struct net_device *dev,
 	u32 rx_size, tx_size;
 	int port_up = 0;
 	int err = 0;
+	int i;
 
 	if (param->rx_jumbo_pending || param->rx_mini_pending)
 		return -EINVAL;
@@ -432,6 +433,15 @@ static int mlx4_en_set_ringparam(struct net_device *dev,
 			en_err(priv, "Failed starting port\n");
 	}
 
+	for (i = 0; i < priv->rx_ring_num; i++) {
+		priv->rx_cq[i].moder_cnt = priv->rx_frames;
+		priv->rx_cq[i].moder_time = priv->rx_usecs;
+		priv->last_moder_time[i] = MLX4_EN_AUTO_CONF;
+		err = mlx4_en_set_cq_moder(priv, &priv->rx_cq[i]);
+		if (err)
+			goto out;
+	}
+
 out:
 	mutex_unlock(&mdev->state_lock);
 	return err;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 8402982..b82db4a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -415,6 +415,9 @@ static void mlx4_en_set_default_moderation(struct mlx4_en_priv *priv)
 		cq = &priv->rx_cq[i];
 		cq->moder_cnt = priv->rx_frames;
 		cq->moder_time = priv->rx_usecs;
+		priv->last_moder_time[i] = MLX4_EN_AUTO_CONF;
+		priv->last_moder_packets[i] = 0;
+		priv->last_moder_bytes[i] = 0;
 	}
 
 	for (i = 0; i < priv->tx_ring_num; i++) {
@@ -430,11 +433,8 @@ static void mlx4_en_set_default_moderation(struct mlx4_en_priv *priv)
 	priv->rx_usecs_high = MLX4_EN_RX_COAL_TIME_HIGH;
 	priv->sample_interval = MLX4_EN_SAMPLE_INTERVAL;
 	priv->adaptive_rx_coal = 1;
-	priv->last_moder_time = MLX4_EN_AUTO_CONF;
 	priv->last_moder_jiffies = 0;
-	priv->last_moder_packets = 0;
 	priv->last_moder_tx_packets = 0;
-	priv->last_moder_bytes = 0;
 }
 
 static void mlx4_en_auto_moderation(struct mlx4_en_priv *priv)
@@ -446,43 +446,30 @@ static void mlx4_en_auto_moderation(struct mlx4_en_priv *priv)
 	unsigned long avg_pkt_size;
 	unsigned long rx_packets;
 	unsigned long rx_bytes;
-	unsigned long tx_packets;
-	unsigned long tx_pkt_diff;
 	unsigned long rx_pkt_diff;
 	int moder_time;
-	int i, err;
+	int ring, err;
 
 	if (!priv->adaptive_rx_coal || period < priv->sample_interval * HZ)
 		return;
 
-	spin_lock_bh(&priv->stats_lock);
-	rx_packets = priv->stats.rx_packets;
-	rx_bytes = priv->stats.rx_bytes;
-	tx_packets = priv->stats.tx_packets;
-	spin_unlock_bh(&priv->stats_lock);
-
-	if (!priv->last_moder_jiffies || !period)
-		goto out;
-
-	tx_pkt_diff = ((unsigned long) (tx_packets -
-					priv->last_moder_tx_packets));
-	rx_pkt_diff = ((unsigned long) (rx_packets -
-					priv->last_moder_packets));
-	packets = max(tx_pkt_diff, rx_pkt_diff);
-	rate = packets * HZ / period;
-	avg_pkt_size = packets ? ((unsigned long) (rx_bytes -
-				 priv->last_moder_bytes)) / packets : 0;
-
-	/* Apply auto-moderation only when packet rate exceeds a rate that
-	 * it matters */
-	if (rate > MLX4_EN_RX_RATE_THRESH && avg_pkt_size > MLX4_EN_AVG_PKT_SMALL) {
-		/* If tx and rx packet rates are not balanced, assume that
-		 * traffic is mainly BW bound and apply maximum moderation.
-		 * Otherwise, moderate according to packet rate */
-		if (2 * tx_pkt_diff > 3 * rx_pkt_diff ||
-		    2 * rx_pkt_diff > 3 * tx_pkt_diff) {
-			moder_time = priv->rx_usecs_high;
-		} else {
+	for (ring = 0; ring < priv->rx_ring_num; ring++) {
+		spin_lock_bh(&priv->stats_lock);
+		rx_packets = priv->rx_ring[ring].packets;
+		rx_bytes = priv->rx_ring[ring].bytes;
+		spin_unlock_bh(&priv->stats_lock);
+
+		rx_pkt_diff = ((unsigned long) (rx_packets -
+				priv->last_moder_packets[ring]));
+		packets = rx_pkt_diff;
+		rate = packets * HZ / period;
+		avg_pkt_size = packets ? ((unsigned long) (rx_bytes -
+				priv->last_moder_bytes[ring])) / packets : 0;
+
+		/* Apply auto-moderation only when packet rate
+		 * exceeds a rate that it matters */
+		if (rate > (MLX4_EN_RX_RATE_THRESH / priv->rx_ring_num) &&
+		    avg_pkt_size > MLX4_EN_AVG_PKT_SMALL) {
 			if (rate < priv->pkt_rate_low)
 				moder_time = priv->rx_usecs_low;
 			else if (rate > priv->pkt_rate_high)
@@ -492,36 +479,23 @@ static void mlx4_en_auto_moderation(struct mlx4_en_priv *priv)
 					(priv->rx_usecs_high - priv->rx_usecs_low) /
 					(priv->pkt_rate_high - priv->pkt_rate_low) +
 					priv->rx_usecs_low;
+		} else {
+			moder_time = priv->rx_usecs_low;
 		}
-	} else {
-		moder_time = priv->rx_usecs_low;
-	}
-
-	en_dbg(INTR, priv, "tx rate:%lu rx_rate:%lu\n",
-	       tx_pkt_diff * HZ / period, rx_pkt_diff * HZ / period);
 
-	en_dbg(INTR, priv, "Rx moder_time changed from:%d to %d period:%lu "
-	       "[jiff] packets:%lu avg_pkt_size:%lu rate:%lu [p/s])\n",
-		 priv->last_moder_time, moder_time, period, packets,
-		 avg_pkt_size, rate);
-
-	if (moder_time != priv->last_moder_time) {
-		priv->last_moder_time = moder_time;
-		for (i = 0; i < priv->rx_ring_num; i++) {
-			cq = &priv->rx_cq[i];
+		if (moder_time != priv->last_moder_time[ring]) {
+			priv->last_moder_time[ring] = moder_time;
+			cq = &priv->rx_cq[ring];
 			cq->moder_time = moder_time;
 			err = mlx4_en_set_cq_moder(priv, cq);
-			if (err) {
-				en_err(priv, "Failed modifying moderation for cq:%d\n", i);
-				break;
-			}
+			if (err)
+				en_err(priv, "Failed modifying moderation "
+					     "for cq:%d\n", ring);
 		}
+		priv->last_moder_packets[ring] = rx_packets;
+		priv->last_moder_bytes[ring] = rx_bytes;
 	}
 
-out:
-	priv->last_moder_packets = rx_packets;
-	priv->last_moder_tx_packets = tx_packets;
-	priv->last_moder_bytes = rx_bytes;
 	priv->last_moder_jiffies = jiffies;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index fe8146d..3b753f7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -426,11 +426,11 @@ struct mlx4_en_priv {
 	struct mlx4_en_port_state port_state;
 	spinlock_t stats_lock;
 
-	unsigned long last_moder_packets;
+	unsigned long last_moder_packets[MAX_RX_RINGS];
 	unsigned long last_moder_tx_packets;
-	unsigned long last_moder_bytes;
+	unsigned long last_moder_bytes[MAX_RX_RINGS];
 	unsigned long last_moder_jiffies;
-	int last_moder_time;
+	int last_moder_time[MAX_RX_RINGS];
 	u16 rx_usecs;
 	u16 rx_frames;
 	u16 tx_usecs;
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH] af_packet: tpacket_destruct_skb, deref skb after BUG_ON assertion
From: danborkmann @ 2011-10-09 15:19 UTC (permalink / raw)
  To: David S. Miller, netdev

This tiny patch derefs the skb only after BUG_ON(skb==NULL) was evaluated
and not before. Patched against latest Linus tree.

Thanks,
Daniel

Signed-off-by: Daniel Borkmann <danborkmann@iogearbox.net>

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index fabb4fa..d9d833b 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1167,11 +1167,12 @@ ring_is_full:

  static void tpacket_destruct_skb(struct sk_buff *skb)
  {
-	struct packet_sock *po = pkt_sk(skb->sk);
+	struct packet_sock *po;
  	void *ph;

  	BUG_ON(skb == NULL);

+	po = pkt_sk(skb->sk);
  	if (likely(po->tx_ring.pg_vec)) {
  		ph = skb_shinfo(skb)->destructor_arg;
  		BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);

^ permalink raw reply related

* Re: [PATCH] r8169: Fix WOL in power down case
From: Marc Ballarin @ 2011-10-09 17:13 UTC (permalink / raw)
  To: Francois Romieu; +Cc: hayeswang, netdev
In-Reply-To: <20111009115438.GA12343@electric-eye.fr.zoreil.com>

On Sun, 9 Oct 2011 13:54:38 +0200
Francois Romieu <romieu@fr.zoreil.com> wrote:

> ...
> Moreover, the patch duplicates a chunk of conditional code:
> 1) it bloats the driver (ok, we have seen worse...).
> 2) you can bet that both instances won't be updated if / when
>    RTL_GIGA_MAC_VER_xy appears.

I thought so ;-)

> 
> The patch below should fix it.

Yes, it works fine here. Thanks a lot.

^ permalink raw reply

* Re: [net-next 02/11] igb: Use node specific allocations for the q_vectors and rings
From: Andi Kleen @ 2011-10-09 18:08 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Alexander Duyck, netdev, gospo, sassmann
In-Reply-To: <1318056461-19562-3-git-send-email-jeffrey.t.kirsher@intel.com>

Jeff Kirsher <jeffrey.t.kirsher@intel.com> writes:
>  
>  	for (i = 0; i < adapter->num_tx_queues; i++) {
> -		ring = kzalloc(sizeof(struct igb_ring), GFP_KERNEL);
> +		if (orig_node == -1) {
> +			int cur_node = next_online_node(adapter->node);
> +			if (cur_node == MAX_NUMNODES)
> +				cur_node = first_online_node;

RR seems quite arbitrary. Who guarantees those nodes have any
relationship with the CPUs submitting on those queues? Or the node
the device is on.

Anyways if it's a good idea probably need to add a
dma_alloc_coherent_node() too

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox