Netdev List


* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Oleg V. Ukhno @ 2011-01-18 17:21 UTC (permalink / raw)
  To: John Fastabend; +Cc: Jay Vosburgh, netdev@vger.kernel.org, David S. Miller
In-Reply-To: <4D35C2D7.6090008@intel.com>

On 01/18/2011 07:41 PM, John Fastabend wrote:

>>
>> John, what is your opinion on such a load balancing method in general,
>> without referring to particular use cases?
>>
>
> This seems reasonable to me, but I'll defer to Jay on this. As long as the
> limitations are documented and it looks like they are this may be fine.
>
> Mostly I was interested to know what led you down this path and why MPIO
> was not working as at least I expected it should. When I get some time I'll
> see if we can address at least some of these issues. Even so it seems like
> this bonding mode may still be useful for some use cases, perhaps even
> non-storage use cases.
>
>>

I was addressing several problems with my patch:
  - I was unable to consume the whole bandwidth with multipath: with four
1Gbit "paths" throughput was only slightly above 2Gbit/s.
  - Link failures quite often caused disk failures, which led to Oracle
ASM rebalance, especially with versions below 11.
  - It is not always possible to autogenerate multipathd.conf with
human-readable device names, because the iSCSI session id and the SCSI device
bus/channel/etc. do not match (they usually differ by 1, but not necessarily).
With the bonding solution I can just look into /dev/disk/by-path to find out
where a device, let's say /dev/sdab, is physically located (it's just a
free bonus I got, so to speak :)).
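
The /dev/disk/by-path lookup mentioned in the last item could be scripted roughly as follows. This is only a sketch and not part of the patch: the helper name `by_path_names` is hypothetical, and it assumes that udev populates the directory with symlinks pointing at the device nodes.

```python
import os

def by_path_names(device, by_path_dir="/dev/disk/by-path"):
    """Return the by-path symlink names that resolve to `device`."""
    target = os.path.realpath(device)
    names = []
    if not os.path.isdir(by_path_dir):
        return names                     # directory absent (e.g. no udev)
    for name in sorted(os.listdir(by_path_dir)):
        link = os.path.join(by_path_dir, name)
        if os.path.realpath(link) == target:
            names.append(name)
    return names

if __name__ == "__main__":
    # /dev/sdab is the example device from the text above.
    print(by_path_names("/dev/sdab"))
```

For an iSCSI disk the returned names normally encode the portal address, target name and LUN, which is what makes the physical location readable.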



-- 
Best regards,
Head of Commercial and Financial Services Operations
Yandex LLC

Oleg Ukhno



^ permalink raw reply

* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Oleg V. Ukhno @ 2011-01-18 16:57 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: John Fastabend, Jay Vosburgh, David S. Miller,
	netdev@vger.kernel.org, Sébastien Barré,
	Christophe Paasch
In-Reply-To: <4D35BED5.7040301@gmail.com>

On 01/18/2011 07:24 PM, Nicolas de Pesloüan wrote:
> On 18/01/2011 16:28, Oleg V. Ukhno wrote:
>> On 01/18/2011 05:54 PM, Nicolas de Pesloüan wrote:
>>> I remember a topology (described by Jay, for as far as I remember),
>>> where two hosts were connected through two distinct VLANs. In such
>>> topology:
>>> - it is possible to detect path failure using arp monitoring instead of
>>> miimon.
>>> - changing the destination MAC address of egress packets is not
>>> necessary, because egress path selection forces ingress path selection
>>> due to the VLAN.
>>
>> In case with two VLANs - yes, this shouldn't be necessary(but needs to
>> be tested, I am not sure), but within one - it is essential for correct
>> rx load striping.
>
> Changing the destination MAC address is definitely not required if you
> segregate each path in a distinct VLAN.
Yes, such an L2 network topology should provide the necessary high availability
and load striping without the need to change MAC addresses. But in my opinion it
is more difficult to maintain and to understand (when there are just several
configurations like this it's ok, but when you have 50 or more?), which is
why I've chosen 802.3ad.

> Even in the presence of ISL between some switches, a packet sent through
> host A's interface connected to vlan 100 will only enter host B using the
> interface connected to vlan 100. So every slave of the bonding
> interface can use the same MAC address.
>
> Of course, changing the destination address would be required in order
> to achieve ingress load balancing on a *single* LAN. But, as Jay noted
> at the beginning of this thread, this would violate 802.3ad.
>

I think receiving the same MAC addresses on different ports of the same host
will just make any troubleshooting much harder, won't it? With different
MACs it usually takes little time to find out where the problem is.
I think that adding an option to choose between a single MAC
address for the etherchannel and the slaves' real MAC addresses won't
harm anything for either 802.3ad or balance-rr mode, but will simplify
usage without doing any evil, when documented properly.

>
> You are right, but such a LAN setup needs to be carefully designed and
> built. I'm not sure that an automatic channel aggregation system is the
> right way to do it. Hence the reason why I suggest using balance-rr
> with VLANs.
>
>>> Oleg, would you mind trying the above "two VLAN" topology with
>>> mode=balance-rr and reporting any results? For high-availability purposes,
>>> it's obviously necessary to set up those VLANs on distinct switches.
>> I'll do it, but it will take some time to set up a test environment,
>> maybe several days.
>
> Thanks. For testing purpose, it is enough to setup those VLAN on a
> single switch if it is easier for you to do.
Well, I'll do it with 2 switches :)
>
>> You mean following topology:
>
> See above.
>
>> (i'm sure it will work as desired if each host is connected to each
>> switch with only one slave link, if there are more slaves in each switch
>> - unsure)?
>
> If you want to use more than 2 slaves per host, then you need more than
> 2 VLAN.

That's what I don't like in this solution. Within one LAN it is simpler
and requires less configuration effort.

> You also need to have the exact same number of slaves on all
> hosts, as egress path selection cause ingress path selection at the
> other side.
>

Well, and here's one difference from bonding with my patch. With
my patch applied, it is not required to have an equal number of slaves; it
is enough to have an *even* number of slaves, which almost always (so far I
haven't seen the opposite) guarantees good rx (ingress) load striping.

>
> Nicolas.
>


-- 
Best regards,
Head of Commercial and Financial Services Operations
Yandex LLC

Oleg Ukhno



^ permalink raw reply

* Re: [PATCH] ns83820: Avoid bad pointer deref in ns83820_init_one().
From: Benjamin LaHaise @ 2011-01-18 16:42 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: netdev, linux-ns83820, linux-kernel, Tejun Heo, Kulikov Vasiliy,
	Denis Kirjanov, David S. Miller
In-Reply-To: <alpine.LNX.2.00.1101172116330.27021@swampdragon.chaosbits.net>

On Mon, Jan 17, 2011 at 09:24:57PM +0100, Jesper Juhl wrote:
> In drivers/net/ns83820.c::ns83820_init_one() we dynamically allocate 
> memory via alloc_etherdev(). We then call PRIV() on the returned storage 
> which is 'return netdev_priv()'. netdev_priv() takes the pointer it is 
> passed and adds 'ALIGN(sizeof(struct net_device), NETDEV_ALIGN)' to it and 
> returns it. Then we test the resulting pointer for NULL, which it is 
> unlikely to be at this point, and later dereference it. This will go bad 
> if alloc_etherdev() actually returned NULL.
> 
> This patch reworks the code slightly so that we test for a NULL pointer 
> (and return -ENOMEM) directly after calling alloc_etherdev().
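
The pointer arithmetic described in the quoted commit message can be modelled with plain integers. This is a sketch only: `SIZEOF_NET_DEVICE` is a made-up value (the real size depends on the kernel version and configuration), and `NETDEV_ALIGN = 32` is assumed to match kernels of this era.

```python
NETDEV_ALIGN = 32           # assumed value of the kernel's NETDEV_ALIGN
SIZEOF_NET_DEVICE = 1000    # hypothetical sizeof(struct net_device)

def align(x, a):
    """Round x up to the next multiple of a (a must be a power of two)."""
    return (x + a - 1) & ~(a - 1)

def netdev_priv(ndev_addr):
    """Model of netdev_priv(): private data follows the aligned net_device."""
    return ndev_addr + align(SIZEOF_NET_DEVICE, NETDEV_ALIGN)

NULL = 0
dev = netdev_priv(NULL)
print(hex(dev))             # 0x400: non-zero, so an `if (!dev)` test cannot catch it
```

Because the offset is always added, the result is never NULL even when the allocation failed, which is why the check has to be made on the pointer returned by alloc_etherdev() itself.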

Looks good.

		-ben

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>

> Signed-off-by: Jesper Juhl <jj@chaosbits.net>
> ---
>  ns83820.c |    5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
>   Compile tested only. I have no way to test this for real.
> 
> diff --git a/drivers/net/ns83820.c b/drivers/net/ns83820.c
> index 84134c7..a41b2cf 100644
> --- a/drivers/net/ns83820.c
> +++ b/drivers/net/ns83820.c
> @@ -1988,12 +1988,11 @@ static int __devinit ns83820_init_one(struct pci_dev *pci_dev,
>  	}
>  
>  	ndev = alloc_etherdev(sizeof(struct ns83820));
> -	dev = PRIV(ndev);
> -
>  	err = -ENOMEM;
> -	if (!dev)
> +	if (!ndev)
>  		goto out;
>  
> +	dev = PRIV(ndev);
>  	dev->ndev = ndev;
>  
>  	spin_lock_init(&dev->rx_info.lock);
> 
> 
> -- 
> Jesper Juhl <jj@chaosbits.net>            http://www.chaosbits.net/
> Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
> Plain text mails only, please.

^ permalink raw reply

* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: John Fastabend @ 2011-01-18 16:41 UTC (permalink / raw)
  To: Oleg V. Ukhno; +Cc: Jay Vosburgh, netdev@vger.kernel.org, David S. Miller
In-Reply-To: <4D358A47.4020009@yandex-team.ru>

On 1/18/2011 4:40 AM, Oleg V. Ukhno wrote:
> On 01/18/2011 06:16 AM, John Fastabend wrote:
>> On 1/14/2011 4:05 PM, Jay Vosburgh wrote:
>>> 	Can somebody (John?) more knowledgeable than I about dm-multipath
>>> comment on the above?
>>
>> Here I'll give it a go.
>>
>> I don't think detecting L2 link failure this way is very robust. If there
>> is a failure farther away than your immediate link, you're going to break
>> completely? Your bonding hash will continue to round robin the iscsi
>> packets and half of them will get dropped on the floor. dm-multipath handles
>> this reasonably gracefully. Also in this bonding environment you seem to
>> be very sensitive to RTT times on the network. Maybe not bad outright but
>> I wouldn't consider this robust either.
> 
> John, I agree - this bonding mode should be used in a quite limited number
> of situations, but as for a failure farther away than the immediate link -
> every bonding mode will suffer the same problems in this case - bonding
> detects only L2 failures, the rest is done by upper-layer mechanisms. And
> almost all bonding modes depend on equal RTT on slaves. And there is
> already a similar load balancing mode - balance-alb - what I did is
> approximately the same, but for 802.3ad bonding mode, and it provides
> "better" (more equal and non-conditional layer 2) load striping for tx
> and _rx_ .
> 
> I think I shouldn't mention the particular use case of this patch - when 
> I wrote it I tried to make a more general solution - my goal was "make 
> equal or near-equal load striping for TX and (most important part) RX 
> within single ethernet(layer 2) domain for  TCP transmission". This 
> bonding mode  just introduces ability to stripe rx and tx load for 
> single TCP connection between hosts inside of one ethernet segment. 
> iSCSI is just an example. It is possible to stripe load between a 
> linux-based router and linux-based web/ftp/etc server as well in the 
> same manner. I think this feature will be useful in some number of 
> network configurations.
> 
>   Also, I looked into net-next code - it seems to me that it can be 
> implemented(adapted to net-next bonding code) without any difficulties 
> and hashing function change makes no problem here.
> 
> What I've written below is just my personal experience and opinion after 
> 5 years of using Oracle +iSCSI +mpath(later - patched bonding).
> 
>  From my personal experience I can just say that most iSCSI failures are
> caused by link failures, and also I would never send any significant
> iSCSI traffic via a router - the router would be a bottleneck in this case.
> So, in my case iSCSI traffic flows within one ethernet domain and in
> case of link failure the bonding driver simply fails one slave (in case of
> bonding), instead of checking and failing hundreds of paths (in case
> of mpath), and the first case is significantly less cpu, net and time
> consuming (if using the default mpath checker - readsector0).
> Mpath is good for me when I use it to "merge" drbd mirrors from
> different hosts, but for just doing simple load striping within a single
> L2 network switch between 2 .. 16 hosts it is some overkill (particularly
> in maintaining human-readable device naming) :).
> 
> John, what is your opinion on such a load balancing method in general,
> without referring to particular use cases?
> 

This seems reasonable to me, but I'll defer to Jay on this. As long as the
limitations are documented and it looks like they are this may be fine.

Mostly I was interested to know what led you down this path and why MPIO
was not working as at least I expected it should. When I get some time I'll
see if we can address at least some of these issues. Even so it seems like
this bonding mode may still be useful for some use cases, perhaps even
non-storage use cases.

> 
>>
>> You could tweak your scsi timeout values and fail_fast values, set the io
>> retry to 0 to cause the fail over to occur faster. I suspect you already
>> did this and still it is too slow? Maybe adding a checker in multipathd to
>> listen for link events would be fast enough. The checker could then fail
>> the path immediately.
>>
>> I'll try to address your comments from the other thread here. In general I
>> wonder if it would be better to solve the problems in dm-multipath rather than
>> add another bonding mode?
> Of course I did this, but mpath is fine when device quantity is below 
> 30-40 devices with two paths, 150-200 devices with 2+ paths can make 
> life far more interesting :)

OK admittedly this gets ugly fast.

>>
>> OVU - it is slow(I am using ISCSI for Oracle , so I need to minimize latency)
>>
>> The dm-multipath layer is adding latency? How much? If this is really true
>> maybe it's best to address the real issue here and not avoid it by
>> using the bonding layer.
> 
> I do not remember exact number now, but switching one of my databases , 
> about 2 years ago to bonding increased read throughput for the entire db 
> from 15-20 Tb/day to approximately 30-35 Tb/day (4 iscsi initiators and 
> 8 iscsi targets, 4 ethernet links for iSCSI on each host, all plugged in 
> one switch) because of "full" bandwidth use. Also, bonding usage 
> simplifies network and application setup greatly(compared to mpath)
> 
>>
>> OVU - it handles any link failures badly, because of its command queue
>> limitation (all queued commands above 32 are discarded in case of path
>> failure, as I remember)
>>
>> Maybe true but only link failures with the immediate peer are handled
>> with a bonding strategy. By working at the block layer we can detect
>> failures throughout the path. I would need to look into this again I
>> know when we were looking at this sometime ago there was some talk about
>> improving this behavior. I need to take some time to go back through the
>> error recovery stuff to remember how this works.
>>
>> OVU - it performs very badly when there are many devices and many paths (I was
>> unable to utilize more than 2Gbps of 4 even with 100 disks with 4 paths
>> per each disk)
> 
> Well, I think that behavior can be explained in the following way:
> when balancing by the number of I/Os per path (rr_min_io), and there is a huge
> number of devices, mpath is doing load balancing per device, and it is
> not possible to guarantee equal use of all devices, so there
> will be imbalance over the network interface (mpath is unaware of its
> existence, etc.), and it is likely to become more imbalanced when there
> are many devices. Also, counting I/Os for many devices and paths
> consumes some CPU resources and can also cause excessive context switches.
> 

hmm I'll get something setup here and see if this is the case.
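
The imbalance described in the quoted explanation can be illustrated with a toy model. It is a deliberate simplification, not a description of how dm-multipath is implemented: it assumes that every device rotates over its own paths independently, that every rotation starts on the same path, and that path N of each device uses NIC N. The value rr_min_io=100 is arbitrary.

```python
def per_device_rr(io_per_device, n_paths, rr_min_io):
    """Per-device round robin: switch path after rr_min_io I/Os on a device.

    Returns the total number of I/Os that ended up on each path (NIC).
    """
    load = [0] * n_paths
    for n_io in io_per_device:
        for i in range(n_io):
            # The counter is private to the device, so every device
            # begins its rotation on path 0 in this model.
            load[(i // rr_min_io) % n_paths] += 1
    return load

# 100 devices with 4 paths each; every device issues a burst of 150 I/Os.
print(per_device_rr([150] * 100, n_paths=4, rr_min_io=100))
# [10000, 5000, 0, 0]
```

With short bursts per device, the per-device counters never reach the later paths, so two of the four links stay idle in this model even though each device on its own is balanced "correctly".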

>>
>> Hmm well that seems like something is broken. I'll try this setup when
>> I get some time next few days. This really shouldn't be the case; dm-multipath
>> should not add a bunch of extra latency or affect throughput significantly.
>> By the way what are you seeing without mpio?
> 
> And one more observation from my two-year-old tests - reading a device (using
> dd) (rhel 5 update 1 kernel, ramdisk via iSCSI via loopback) as an mpath
> device with a single path was done at approximately 120-150mb/s, and the same
> test on a non-mpath device at 800-900mb/s. Here I am quite sure; it was a
> kind of revelation to me at that time.
> 

Similarly I'll have a look. Thanks for the info.

>>
>> Thanks,
>> John
>>
> 
> 
 

^ permalink raw reply

* [PATCH] af_unix: implement socket filter
From: Ian Molton @ 2011-01-18 16:39 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, eric.dumazet, ebiederm, xemul, davidel,
	Alban Crequy

From: Alban Crequy <alban.crequy@collabora.co.uk>

Linux Socket Filters can already be successfully attached and detached on unix
sockets with setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...).
See: Documentation/networking/filter.txt

But the filter was never used in the unix socket code so it did not work. This
patch uses sk_filter() to filter buffers before delivery.

This short program demonstrates the problem on SOCK_DGRAM.

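#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <poll.h>
#include <sys/socket.h>
#include <linux/filter.h>

int main(void) {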
  int i, j, ret;
  int sv[2];
  struct pollfd fds[2];
  char *message = "Hello world!";
  char buffer[64];
  struct sock_filter ins[32] = {{0,},};
  struct sock_fprog filter;

  socketpair(AF_UNIX, SOCK_DGRAM, 0, sv);

  for (i = 0 ; i < 2 ; i++) {
    fds[i].fd = sv[i];
    fds[i].events = POLLIN;
    fds[i].revents = 0;
  }

  for(j = 1 ; j < 13 ; j++) {

    /* Set a socket filter to truncate the message */
    memset(ins, 0, sizeof(ins));
    ins[0].code = BPF_RET|BPF_K;
    ins[0].k = j;
    filter.len = 1;
    filter.filter = ins;
    setsockopt(sv[1], SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter));

    /* send a message */
    send(sv[0], message, strlen(message) + 1, 0);

    /* The filter should let the message pass but truncated. */
    poll(fds, 2, 0);

    /* Receive the truncated message*/
    ret = recv(sv[1], buffer, 64, 0);
    printf("received %d bytes, expected %d\n", ret, j);
  }

  for (i = 0 ; i < 2 ; i++)
    close(sv[i]);

  return 0;
}
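
The output expected from the program above can be modelled in a few lines. This is only a sketch of the semantics of a one-instruction BPF_RET|BPF_K filter (keep at most k bytes, 0 means drop); the function name `ret_k_filter` is hypothetical and no socket is involved.

```python
def ret_k_filter(payload, k):
    """Model of a single BPF_RET|BPF_K instruction applied to one datagram."""
    if k == 0:
        return None          # a return value of 0 drops the packet
    return payload[:k]       # otherwise the packet is trimmed to k bytes

message = b"Hello world!\x00"    # strlen(message) + 1 == 13 bytes

for j in range(1, 13):
    received = ret_k_filter(message, j)
    print("received %d bytes, expected %d" % (len(received), j))
```

If the filter is attached but never run, recv() returns the full 13 bytes on every iteration instead, which is the behaviour the test program is meant to expose.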

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
---
 net/unix/af_unix.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index dd419d2..8d9bbba 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1475,6 +1475,12 @@ restart:
 			goto out_free;
 	}
 
+	if (sk_filter(other, skb) < 0) {
+		/* Toss the packet but do not return any error to the sender */
+		err = len;
+		goto out_free;
+	}
+
 	unix_state_lock(other);
 	err = -EPERM;
 	if (!unix_may_send(sk, other))
-- 
1.7.2.3

^ permalink raw reply related

* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Nicolas de Pesloüan @ 2011-01-18 16:24 UTC (permalink / raw)
  To: Oleg V. Ukhno
  Cc: John Fastabend, Jay Vosburgh, David S. Miller,
	netdev@vger.kernel.org, Sébastien Barré,
	Christophe Paasch
In-Reply-To: <4D35B1B0.2090905@yandex-team.ru>

On 18/01/2011 16:28, Oleg V. Ukhno wrote:
> On 01/18/2011 05:54 PM, Nicolas de Pesloüan wrote:
>> I remember a topology (described by Jay, for as far as I remember),
>> where two hosts were connected through two distinct VLANs. In such
>> topology:
>> - it is possible to detect path failure using arp monitoring instead of
>> miimon.
>> - changing the destination MAC address of egress packets is not
>> necessary, because egress path selection forces ingress path selection
>> due to the VLAN.
>
> In case with two VLANs - yes, this shouldn't be necessary(but needs to
> be tested, I am not sure), but within one - it is essential for correct
> rx load striping.

Changing the destination MAC address is definitely not required if you segregate each path in a 
distinct VLAN.

             +-------------------+     +-------------------+
     +-------|switch 1 - vlan 100|-----|switch 2 - vlan 100|-------+
     |       +-------------------+     +-------------------+       |
+------+              |                         |              +------+
|host A|              |                         |              |host B|
+------+              |                         |              +------+
     |       +-------------------+     +-------------------+       |
     +-------|switch 3 - vlan 200|-----|switch 4 - vlan 200|-------+
             +-------------------+     +-------------------+

Even in the presence of ISL between some switches, a packet sent through host A's interface connected to 
vlan 100 will only enter host B using the interface connected to vlan 100. So every slave of the 
bonding interface can use the same MAC address.
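
The way egress path selection fixes the ingress path in this topology can be sketched as below. The slave names are hypothetical, and the model assumes exactly one slave per VLAN on each host with no frame loss.

```python
from itertools import cycle

def stripe(n_packets, vlans):
    """Round-robin egress on host A; the VLAN decides the ingress slave on B."""
    # One slave per VLAN on host B (names are made up for the example).
    b_slave = {vlan: "B-eth%d" % i for i, vlan in enumerate(vlans)}
    rx = {name: 0 for name in b_slave.values()}
    egress = cycle(vlans)
    for _ in range(n_packets):
        vlan = next(egress)        # balance-rr picks the next slave, i.e. VLAN
        rx[b_slave[vlan]] += 1     # the frame can only arrive in that VLAN
    return rx

print(stripe(1000, [100, 200]))
# {'B-eth0': 500, 'B-eth1': 500}
```

No destination MAC rewriting appears anywhere in the model: the receive-side striping follows purely from which VLAN the sender chose.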

Of course, changing the destination address would be required in order to achieve ingress load 
balancing on a *single* LAN. But, as Jay noted at the beginning of this thread, this would violate 
802.3ad.

>> I think the only point is whether we need a new xmit_hash_policy for
>> mode=802.3ad or whether mode=balance-rr could be enough.
> Maybe, but it seems fair enough to me not to restrict this feature only
> to non-LACP aggregate links; dynamic aggregation may be useful(it helps
> to avoid switch misconfiguration(misconfigured slaves on switch side)
> sometimes without loss of service).

You are right, but such a LAN setup needs to be carefully designed and built. I'm not sure that an 
automatic channel aggregation system is the right way to do it. Hence the reason why I suggest using 
balance-rr with VLANs.

>> Oleg, would you mind trying the above "two VLAN" topology with
>> mode=balance-rr and report any results ? For high-availability purpose,
>> it's obviously necessary to setup those VLAN on distinct switches.
> I'll do it, but it will take some time to setup test environment,
> several days may be.

Thanks. For testing purposes, it is enough to set up those VLANs on a single switch if that is easier for 
you.

> You mean following topology:

See above.

> (i'm sure it will work as desired if each host is connected to each
> switch with only one slave link, if there are more slaves in each switch
> - unsure)?

If you want to use more than 2 slaves per host, then you need more than 2 VLANs. You also need to 
have the exact same number of slaves on all hosts, as egress path selection causes ingress path 
selection at the other side.

             +-------------------+     +-------------------+
     +-------|switch 1 - vlan 100|-----|switch 2 - vlan 100|-------+
     |       +-------------------+     +-------------------+       |
+------+              |                         |              +------+
|host A|              |                         |              |host B|
+------+              |                         |              +------+
   | |       +-------------------+     +-------------------+       | |
   | +-------|switch 3 - vlan 200|-----|switch 4 - vlan 200|-------+ |
   |         +-------------------+     +-------------------+         |
   |                   |                         |                   |
   |                   |                         |                   |
   |         +-------------------+     +-------------------+         |
   +---------|switch 5 - vlan 300|-----|switch 6 - vlan 300|---------+
             +-------------------+     +-------------------+

Of course, you can add other hosts to vlan 100, 200 and 300, with the exact same configuration as 
host A or host B.
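
The constraint stated above (one slave per VLAN, and the same set of VLANs on every host) is easy to check mechanically. The function below is a hypothetical helper operating on a hand-written description of the hosts; it does not query any real bonding or switch configuration.

```python
def check_hosts(hosts):
    """hosts maps a host name to a list of VLAN ids, one entry per slave.

    Returns a list of human-readable problems (empty if the layout is fine).
    """
    problems = []
    reference = None
    for name, vlans in sorted(hosts.items()):
        if len(set(vlans)) != len(vlans):
            problems.append("%s: more than one slave in a VLAN" % name)
        if reference is None:
            reference = set(vlans)          # first host defines the layout
        elif set(vlans) != reference:
            problems.append("%s: VLAN set differs from the first host" % name)
    return problems

print(check_hosts({"A": [100, 200, 300], "B": [100, 200, 300]}))   # []
print(check_hosts({"A": [100, 200, 300], "B": [100, 200]}))
```

In the second call host B has no slave in vlan 300, so every third frame that host A sends round-robin has nowhere to arrive.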

	Nicolas.

^ permalink raw reply

* Re: [PATCH net-next 5/8] vmxnet3: Make ethtool handlers multiqueue aware
From: Ben Hutchings @ 2011-01-18 16:05 UTC (permalink / raw)
  To: Shreyas N Bhatewara; +Cc: netdev, linux-kernel, pv-drivers
In-Reply-To: <20110115005946.1064.97955.stgit@sbhatewara-dev1.eng.vmware.com>

On Fri, 2011-01-14 at 16:59 -0800, Shreyas N Bhatewara wrote:
> Show per-queue stats in ethtool -S output for vmxnet3 interface. Register dump
> of ethtool should dump registers for all tx and rx queues.
> 
> Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
> ---
>  drivers/net/vmxnet3/vmxnet3_ethtool.c |  259 ++++++++++++++++++---------------
>  1 files changed, 145 insertions(+), 114 deletions(-)
> 
> diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c b/drivers/net/vmxnet3/vmxnet3_ethtool.c
> index 8e17fc8..d70cee1 100644
> --- a/drivers/net/vmxnet3/vmxnet3_ethtool.c
> +++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
> @@ -68,76 +68,78 @@ vmxnet3_set_rx_csum(struct net_device *netdev, u32 val)
>  static const struct vmxnet3_stat_desc
>  vmxnet3_tq_dev_stats[] = {
>  	/* description,         offset */
> -	{ "TSO pkts tx",        offsetof(struct UPT1_TxStats, TSOPktsTxOK) },
> -	{ "TSO bytes tx",       offsetof(struct UPT1_TxStats, TSOBytesTxOK) },
> -	{ "ucast pkts tx",      offsetof(struct UPT1_TxStats, ucastPktsTxOK) },
> -	{ "ucast bytes tx",     offsetof(struct UPT1_TxStats, ucastBytesTxOK) },
> -	{ "mcast pkts tx",      offsetof(struct UPT1_TxStats, mcastPktsTxOK) },
> -	{ "mcast bytes tx",     offsetof(struct UPT1_TxStats, mcastBytesTxOK) },
> -	{ "bcast pkts tx",      offsetof(struct UPT1_TxStats, bcastPktsTxOK) },
> -	{ "bcast bytes tx",     offsetof(struct UPT1_TxStats, bcastBytesTxOK) },
> -	{ "pkts tx err",        offsetof(struct UPT1_TxStats, pktsTxError) },
> -	{ "pkts tx discard",    offsetof(struct UPT1_TxStats, pktsTxDiscard) },
> +	{ "Tx Queue#",        0 },
> +	{ "  TSO pkts tx",	offsetof(struct UPT1_TxStats, TSOPktsTxOK) },
> +	{ "  TSO bytes tx",	offsetof(struct UPT1_TxStats, TSOBytesTxOK) },
[...]

I really don't like this.  You're making the assumption that these stats
will always be displayed as they are now by the ethtool command, but
that is not the only user of the ethtool API.

I expect that some people have scripts that involve reading ethtool
stats into a hash/dictionary.  (In fact, I wrote a diagnostic script for
Solarflare that does that.)  After this change to your driver, they
would get results from only one TX queue (with different names from
before).

So please:
- Don't use leading or trailing spaces in names
- Keep the global statistics, as most users will be more interested in
these
- If you think users actually want per-queue statistics, add them with
unique names (like bnx2x does)
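
The kind of script mentioned above might look like the sketch below. The sample text is invented to mimic the layout the patch would produce; it is not real vmxnet3 output, and `parse_stats` is a hypothetical helper.

```python
def parse_stats(text):
    """Parse 'name: value' lines (as printed by ethtool -S) into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep or not value.strip().isdigit():
            continue                     # header line or malformed line
        stats[name.strip()] = int(value)
    return stats

sample = """NIC statistics:
     Tx Queue#: 0
       TSO pkts tx: 10
     Tx Queue#: 1
       TSO pkts tx: 7
"""

print(parse_stats(sample))
# {'Tx Queue#': 1, 'TSO pkts tx': 7}
```

Because the names repeat for every queue, later entries overwrite earlier ones and only the last queue survives in the dictionary; unique per-queue names avoid this.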

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH v4] netfilter: ipt_CLUSTERIP: remove "no conntrack!"
From: Patrick McHardy @ 2011-01-18 15:28 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Eric Dumazet, Jan Engelhardt, Netfilter Development Mailinglist,
	netdev
In-Reply-To: <4D2F28B9.50407@netfilter.org>

On 13.01.2011 17:30, Pablo Neira Ayuso wrote:
> On 13/01/11 15:39, Eric Dumazet wrote:
> hey hey, I'm fine with fixing things. Patch v4 is OK.
> 
> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Oleg V. Ukhno @ 2011-01-18 15:28 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: John Fastabend, Jay Vosburgh, David S. Miller,
	netdev@vger.kernel.org, Sébastien Barré,
	Christophe Paasch
In-Reply-To: <4D35A9B4.7030701@gmail.com>

On 01/18/2011 05:54 PM, Nicolas de Pesloüan wrote:
> On 18/01/2011 13:40, Oleg V. Ukhno wrote:
>
> The fact that there exist many situations where it simply doesn't work,
> should not cause the idea of Oleg to be rejected.
>
> In Documentation/networking/bonding.txt, tuning tcp_reordering on
> receiving side is already documented as a possible workaround for out of
> order delivery due to load balancing of a single TCP session, using
> mode=balance-rr.
>
> This might work reasonably well in a pure LAN topology, without any
> router between both ends of the TCP session, even if this is limited to
> Linux hosts. The uses are not uncommon and not limited to iSCSI:
> - between an application server and a database server,
> - between members of a cluster, for replication purpose,
> - between a server and a backup system,
> - ...
Nicolas, thank you for your opinion - this is exactly what I mean - 
iSCSI is just one particular use case, but there are many cases where 
this load balancing method will be useful
>
> Of course, for longer paths, with routers and variable RTT, we would
> need something different (possibly MultiPathTCP:
> http://datatracker.ietf.org/wg/mptcp/).
>
> I remember a topology (described by Jay, for as far as I remember),
> where two hosts were connected through two distinct VLANs. In such
> topology:
> - it is possible to detect path failure using arp monitoring instead of
> miimon.
> - changing the destination MAC address of egress packets is not
> necessary, because egress path selection forces ingress path selection
> due to the VLAN.

In the case of two VLANs - yes, this shouldn't be necessary (but it needs to
be tested, I am not sure), but within one VLAN it is essential for correct
rx load striping.
>
> I think the only point is whether we need a new xmit_hash_policy for
> mode=802.3ad or whether mode=balance-rr could be enough.
Maybe, but it seems fair enough to me not to restrict this feature only 
to non-LACP aggregate links; dynamic aggregation may be useful(it helps 
to avoid switch misconfiguration(misconfigured slaves on switch side) 
sometimes without loss of service).
>
> Oleg, would you mind trying the above "two VLAN" topology" with
> mode=balance-rr and report any results ? For high-availability purpose,
> it's obviously necessary to setup those VLAN on distinct switches.
I'll do it, but it will take some time to set up a test environment, 
maybe several days.
Do you mean the following topology:
           switch 1
        /           \
host A                host B
        \  switch 2 /

(i'm sure it will work as desired if each host is connected to each 
switch with only one slave link, if there are more slaves in each switch 
- unsure)?
>
> Nicolas
>
>
>


-- 
Best regards,
Oleg Ukhno.
ITO Team Lead,
Yandex LLC.




^ permalink raw reply

* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Nicolas de Pesloüan @ 2011-01-18 14:54 UTC (permalink / raw)
  To: Oleg V. Ukhno, John Fastabend, Jay Vosburgh, David S. Miller
  Cc: netdev@vger.kernel.org, Sébastien Barré,
	Christophe Paasch
In-Reply-To: <4D358A47.4020009@yandex-team.ru>

On 18/01/2011 13:40, Oleg V. Ukhno wrote:

The fact that there are many situations where it simply doesn't work should not cause Oleg's idea 
to be rejected.

In Documentation/networking/bonding.txt, tuning tcp_reordering on receiving side is already 
documented as a possible workaround for out of order delivery due to load balancing of a single TCP 
session, using mode=balance-rr.
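
Why striping one TCP session over several links produces out-of-order delivery can be seen in a small simulation. The latencies and the inter-packet gap are arbitrary illustrative numbers, and the metric is a simplification, not the kernel's own reordering estimate.

```python
def arrival_order(n_packets, latencies, gap=1.0):
    """Send packet i at time i*gap on slave i % n; return seq numbers as received."""
    n = len(latencies)
    arrivals = [(i * gap + latencies[i % n], i) for i in range(n_packets)]
    return [seq for _, seq in sorted(arrivals)]

def max_reorder(order):
    """Largest number of later packets that overtook an earlier one."""
    worst = 0
    for pos, seq in enumerate(order):
        ahead = sum(1 for s in order[:pos] if s > seq)
        worst = max(worst, ahead)
    return worst

order = arrival_order(8, latencies=[0.0, 2.5])
print(order, max_reorder(order))
# [0, 2, 1, 4, 3, 6, 5, 7] 1
```

The larger the latency difference between slaves relative to the packet spacing, the further packets are displaced, and the more reordering the receiver has to tolerate (which is what raising the tcp_reordering sysctl allows) before it treats the gap as loss.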

This might work reasonably well in a pure LAN topology, without any router between both ends of the 
TCP session, even if this is limited to Linux hosts. The uses are not uncommon and not limited to iSCSI:
- between an application server and a database server,
- between members of a cluster, for replication purpose,
- between a server and a backup system,
- ...

Of course, for longer paths, with routers and variable RTT, we would need something different 
(possibly MultiPathTCP: http://datatracker.ietf.org/wg/mptcp/).

I remember a topology (described by Jay, for as far as I remember), where two hosts were connected 
through two distinct VLANs. In such topology:
- it is possible to detect path failure using arp monitoring instead of miimon.
- changing the destination MAC address of egress packets is not necessary, because egress path 
selection forces ingress path selection due to the VLAN.

I think the only point is whether we need a new xmit_hash_policy for mode=802.3ad or whether 
mode=balance-rr could be enough.

Oleg, would you mind trying the above "two VLAN" topology with mode=balance-rr and reporting any 
results? For high-availability purposes, it's obviously necessary to set up those VLANs on distinct 
switches.

	Nicolas

^ permalink raw reply

* Re: [PATCH net-2.6 0/8] bnx2x: Minor link related fixes
From: Yaniv Rosner @ 2011-01-18 14:44 UTC (permalink / raw)
  To: davem; +Cc: netdev, eilong
In-Reply-To: <1295361191.5281.83.camel@lb-tlvb-dmitry>

Hi Dave,
I meant net-2.6, and not net-next-2.6

Thanks,
Yaniv

On Tue, 2011-01-18 at 16:33 +0200, Yaniv Rosner wrote:
> Hi Dave,
> The following patch series describe some link fixes in bnx2x driver
> Please consider applying it to net-next-2.6.
> 
> Thanks,
> Yaniv
> 




^ permalink raw reply

* [PATCH net-2.6 8/8] bnx2x: Update bnx2x version to 1.62.00-4
From: Yaniv Rosner @ 2011-01-18 14:33 UTC (permalink / raw)
  To: davem; +Cc: netdev, eilong

Update bnx2x version to 1.62.00-4

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x.h b/drivers/net/bnx2x/bnx2x.h
index 6a858a2..e56a45c 100644
--- a/drivers/net/bnx2x/bnx2x.h
+++ b/drivers/net/bnx2x/bnx2x.h
@@ -22,8 +22,8 @@
  * (you will need to reboot afterwards) */
 /* #define BNX2X_STOP_ON_ERROR */
 
-#define DRV_MODULE_VERSION      "1.62.00-3"
-#define DRV_MODULE_RELDATE      "2010/12/21"
+#define DRV_MODULE_VERSION      "1.62.00-4"
+#define DRV_MODULE_RELDATE      "2011/01/18"
 #define BNX2X_BC_VER            0x040200
 
 #define BNX2X_MULTI_QUEUE
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-2.6 7/8] bnx2x: Fix AER setting for BCM57712
From: Yaniv Rosner @ 2011-01-18 14:33 UTC (permalink / raw)
  To: davem; +Cc: netdev, eilong

Fix the AER setting for BCM57712 to allow access to the full device address range in CL45 MDC/MDIO.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x_link.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_link.c b/drivers/net/bnx2x/bnx2x_link.c
index bb1c811..7160ec5 100644
--- a/drivers/net/bnx2x/bnx2x_link.c
+++ b/drivers/net/bnx2x/bnx2x_link.c
@@ -1573,7 +1573,7 @@ static void bnx2x_set_aer_mmd_xgxs(struct link_params *params,
 
 	offset = phy->addr + ser_lane;
 	if (CHIP_IS_E2(bp))
-		aer_val = 0x2800 + offset - 1;
+		aer_val = 0x3800 + offset - 1;
 	else
 		aer_val = 0x3800 + offset;
 	CL45_WR_OVER_CL22(bp, phy,
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-2.6 6/8] bnx2x: Fix BCM84823 LED behavior
From: Yaniv Rosner @ 2011-01-18 14:33 UTC (permalink / raw)
  To: davem; +Cc: netdev, eilong

Fix incorrect BCM84823 LED behavior that may appear on some systems.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x_link.c |   18 +++++++++++++++++-
 drivers/net/bnx2x/bnx2x_reg.h  |    4 ++++
 2 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_link.c b/drivers/net/bnx2x/bnx2x_link.c
index f5fd33e..bb1c811 100644
--- a/drivers/net/bnx2x/bnx2x_link.c
+++ b/drivers/net/bnx2x/bnx2x_link.c
@@ -5972,10 +5972,26 @@ static void bnx2x_848xx_set_led(struct bnx2x *bp,
 			 MDIO_PMA_REG_8481_LED2_MASK,
 			 0x18);
 
+	/* Select activity source by Tx and Rx, as suggested by PHY AE */
 	bnx2x_cl45_write(bp, phy,
 			 MDIO_PMA_DEVAD,
 			 MDIO_PMA_REG_8481_LED3_MASK,
-			 0x0040);
+			 0x0006);
+
+	/* Select the closest activity blink rate to that in 10/100/1000 */
+	bnx2x_cl45_write(bp, phy,
+			MDIO_PMA_DEVAD,
+			MDIO_PMA_REG_8481_LED3_BLINK,
+			0);
+
+	bnx2x_cl45_read(bp, phy,
+			MDIO_PMA_DEVAD,
+			MDIO_PMA_REG_84823_CTL_LED_CTL_1, &val);
+	val |= MDIO_PMA_REG_84823_LED3_STRETCH_EN; /* stretch_en for LED3*/
+
+	bnx2x_cl45_write(bp, phy,
+			 MDIO_PMA_DEVAD,
+			 MDIO_PMA_REG_84823_CTL_LED_CTL_1, val);
 
 	/* 'Interrupt Mask' */
 	bnx2x_cl45_write(bp, phy,
diff --git a/drivers/net/bnx2x/bnx2x_reg.h b/drivers/net/bnx2x/bnx2x_reg.h
index 38ef7ca..73efc9b 100644
--- a/drivers/net/bnx2x/bnx2x_reg.h
+++ b/drivers/net/bnx2x/bnx2x_reg.h
@@ -6194,7 +6194,11 @@ Theotherbitsarereservedandshouldbezero*/
 #define MDIO_CTL_REG_84823_MEDIA_PRIORITY_COPPER	0x0000
 #define MDIO_CTL_REG_84823_MEDIA_PRIORITY_FIBER		0x0100
 #define MDIO_CTL_REG_84823_MEDIA_FIBER_1G			0x1000
+#define MDIO_CTL_REG_84823_USER_CTRL_REG		0x4005
+#define MDIO_CTL_REG_84823_USER_CTRL_CMS		0x0080
 
+#define MDIO_PMA_REG_84823_CTL_LED_CTL_1		0xa8e3
+#define MDIO_PMA_REG_84823_LED3_STRETCH_EN		0x0080
 
 #define IGU_FUNC_BASE			0x0400
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-2.6 5/8] bnx2x: Mark full duplex on some external PHYs
From: Yaniv Rosner @ 2011-01-18 14:33 UTC (permalink / raw)
  To: davem; +Cc: netdev, eilong

The driver may report an incorrect duplex mode on devices with an external PHY. Set full duplex explicitly for these PHYs.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x_link.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_link.c b/drivers/net/bnx2x/bnx2x_link.c
index 500258d..f5fd33e 100644
--- a/drivers/net/bnx2x/bnx2x_link.c
+++ b/drivers/net/bnx2x/bnx2x_link.c
@@ -4408,6 +4408,7 @@ static u8 bnx2x_8073_read_status(struct bnx2x_phy *phy,
 		}
 		bnx2x_ext_phy_10G_an_resolve(bp, phy, vars);
 		bnx2x_8073_resolve_fc(phy, params, vars);
+		vars->duplex = DUPLEX_FULL;
 	}
 	return link_up;
 }
@@ -5154,6 +5155,7 @@ static u8 bnx2x_8706_8726_read_status(struct bnx2x_phy *phy,
 		else
 			vars->line_speed = SPEED_10000;
 		bnx2x_ext_phy_resolve_fc(phy, params, vars);
+		vars->duplex = DUPLEX_FULL;
 	}
 	return link_up;
 }
@@ -5850,8 +5852,11 @@ static u8 bnx2x_8727_read_status(struct bnx2x_phy *phy,
 		DP(NETIF_MSG_LINK, "port %x: External link is down\n",
 			   params->port);
 	}
-	if (link_up)
+	if (link_up) {
 		bnx2x_ext_phy_resolve_fc(phy, params, vars);
+		vars->duplex = DUPLEX_FULL;
+		DP(NETIF_MSG_LINK, "duplex = 0x%x\n", vars->duplex);
+	}
 
 	if ((DUAL_MEDIA(params)) &&
 	    (phy->req_line_speed == SPEED_1000)) {
@@ -6218,6 +6223,7 @@ static u8 bnx2x_848xx_read_status(struct bnx2x_phy *phy,
 	/* Check link 10G */
 	if (val2 & (1<<11)) {
 		vars->line_speed = SPEED_10000;
+		vars->duplex = DUPLEX_FULL;
 		link_up = 1;
 		bnx2x_ext_phy_10G_an_resolve(bp, phy, vars);
 	} else { /* Check Legacy speed link */
@@ -6581,6 +6587,7 @@ static u8 bnx2x_7101_read_status(struct bnx2x_phy *phy,
 				MDIO_AN_DEVAD, MDIO_AN_REG_MASTER_STATUS,
 				&val2);
 		vars->line_speed = SPEED_10000;
+		vars->duplex = DUPLEX_FULL;
 		DP(NETIF_MSG_LINK, "SFX7101 AN status 0x%x->Master=%x\n",
 			   val2, (val2 & (1<<14)));
 		bnx2x_ext_phy_10G_an_resolve(bp, phy, vars);
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-2.6 4/8] bnx2x: Fix BCM8073/BCM8727 microcode loading
From: Yaniv Rosner @ 2011-01-18 14:33 UTC (permalink / raw)
  To: davem; +Cc: netdev, eilong

Improve microcode loading verification before proceeding to the next stage.
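The polling approach in the diff below can be sketched in isolation as follows. This is a minimal sketch, not the driver code: wait_for_fw_version, read_fw_ver_fn, fake_phy and fake_read_ver are hypothetical names, the register access is replaced by a callback, and the extra BCM8073 MSGOUT check and the msleep(1) between polls are left out. Only the 0 and 0x4321 "not ready" values and the idea of a retry limit come from the patch.

```c
#include <stdint.h>

/* Hypothetical callback standing in for the MDIO read of the firmware version word. */
typedef uint16_t (*read_fw_ver_fn)(void *ctx);

/*
 * Poll until the version word is neither 0 nor the 0x4321 placeholder,
 * giving up after max_polls reads. Returns 0 on success, -1 on timeout.
 * The last value read is always stored in *ver_out.
 * A real driver would sleep between reads; this sketch does not.
 */
static int wait_for_fw_version(read_fw_ver_fn read_ver, void *ctx,
                               unsigned int max_polls, uint16_t *ver_out)
{
    uint16_t ver = 0;
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        ver = read_ver(ctx);
        if (ver != 0 && ver != 0x4321) {
            *ver_out = ver;
            return 0;
        }
    }
    *ver_out = ver;
    return -1;
}

/* Hypothetical test double: reports "not ready" for a fixed number of reads. */
struct fake_phy {
    unsigned int polls_until_ready;
    uint16_t final_ver;
};

static uint16_t fake_read_ver(void *ctx)
{
    struct fake_phy *p = ctx;

    if (p->polls_until_ready > 0) {
        p->polls_until_ready--;
        return 0x4321;
    }
    return p->final_ver;
}
```

Called as wait_for_fw_version(fake_read_ver, &phy, 300, &ver), the sketch returns 0 once the fake PHY reports a real version and -1 if the limit is reached first, which mirrors how the reworked boot function returns an error code to its callers.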

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x_link.c |   73 +++++++++++++++++++++++----------------
 1 files changed, 43 insertions(+), 30 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_link.c b/drivers/net/bnx2x/bnx2x_link.c
index 36a8844..500258d 100644
--- a/drivers/net/bnx2x/bnx2x_link.c
+++ b/drivers/net/bnx2x/bnx2x_link.c
@@ -3870,11 +3870,14 @@ static void bnx2x_8073_resolve_fc(struct bnx2x_phy *phy,
 			   pause_result);
 	}
 }
-
-static void bnx2x_8073_8727_external_rom_boot(struct bnx2x *bp,
+static u8 bnx2x_8073_8727_external_rom_boot(struct bnx2x *bp,
 					      struct bnx2x_phy *phy,
 					      u8 port)
 {
+	u32 count = 0;
+	u16 fw_ver1, fw_msgout;
+	u8 rc = 0;
+
 	/* Boot port from external ROM  */
 	/* EDC grst */
 	bnx2x_cl45_write(bp, phy,
@@ -3904,14 +3907,45 @@ static void bnx2x_8073_8727_external_rom_boot(struct bnx2x *bp,
 		       MDIO_PMA_REG_GEN_CTRL,
 		       MDIO_PMA_REG_GEN_CTRL_ROM_RESET_INTERNAL_MP);
 
-	/* wait for 120ms for code download via SPI port */
-	msleep(120);
+	/* Delay 100ms per the PHY specifications */
+	msleep(100);
+
+	/* 8073 sometimes taking longer to download */
+	do {
+		count++;
+		if (count > 300) {
+			DP(NETIF_MSG_LINK,
+				 "bnx2x_8073_8727_external_rom_boot port %x:"
+				 "Download failed. fw version = 0x%x\n",
+				 port, fw_ver1);
+			rc = -EINVAL;
+			break;
+		}
+
+		bnx2x_cl45_read(bp, phy,
+				MDIO_PMA_DEVAD,
+				MDIO_PMA_REG_ROM_VER1, &fw_ver1);
+		bnx2x_cl45_read(bp, phy,
+				MDIO_PMA_DEVAD,
+				MDIO_PMA_REG_M8051_MSGOUT_REG, &fw_msgout);
+
+		msleep(1);
+	} while (fw_ver1 == 0 || fw_ver1 == 0x4321 ||
+			((fw_msgout & 0xff) != 0x03 && (phy->type ==
+			PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM8073)));
 
 	/* Clear ser_boot_ctl bit */
 	bnx2x_cl45_write(bp, phy,
 		       MDIO_PMA_DEVAD,
 		       MDIO_PMA_REG_MISC_CTRL1, 0x0000);
 	bnx2x_save_bcm_spirom_ver(bp, phy, port);
+
+	DP(NETIF_MSG_LINK,
+		 "bnx2x_8073_8727_external_rom_boot port %x:"
+		 "Download complete. fw version = 0x%x\n",
+		 port, fw_ver1);
+
+	return rc;
 }
 
 static void bnx2x_8073_set_xaui_low_power_mode(struct bnx2x *bp,
@@ -7721,7 +7755,6 @@ static u8 bnx2x_8073_common_init_phy(struct bnx2x *bp,
 
 	/* PART2 - Download firmware to both phys */
 	for (port = PORT_MAX - 1; port >= PORT_0; port--) {
-		u16 fw_ver1;
 		if (CHIP_IS_E2(bp))
 			port_of_path = 0;
 		else
@@ -7729,19 +7762,9 @@ static u8 bnx2x_8073_common_init_phy(struct bnx2x *bp,
 
 		DP(NETIF_MSG_LINK, "Loading spirom for phy address 0x%x\n",
 			   phy_blk[port]->addr);
-		bnx2x_8073_8727_external_rom_boot(bp, phy_blk[port],
-						  port_of_path);
-
-		bnx2x_cl45_read(bp, phy_blk[port],
-			      MDIO_PMA_DEVAD,
-			      MDIO_PMA_REG_ROM_VER1, &fw_ver1);
-		if (fw_ver1 == 0 || fw_ver1 == 0x4321) {
-			DP(NETIF_MSG_LINK,
-				 "bnx2x_8073_common_init_phy port %x:"
-				 "Download failed. fw version = 0x%x\n",
-				 port, fw_ver1);
+		if (bnx2x_8073_8727_external_rom_boot(bp, phy_blk[port],
+						      port_of_path))
 			return -EINVAL;
-		}
 
 		/* Only set bit 10 = 1 (Tx power down) */
 		bnx2x_cl45_read(bp, phy_blk[port],
@@ -7906,27 +7929,17 @@ static u8 bnx2x_8727_common_init_phy(struct bnx2x *bp,
 	}
 	/* PART2 - Download firmware to both phys */
 	for (port = PORT_MAX - 1; port >= PORT_0; port--) {
-		u16 fw_ver1;
 		 if (CHIP_IS_E2(bp))
 			port_of_path = 0;
 		else
 			port_of_path = port;
 		DP(NETIF_MSG_LINK, "Loading spirom for phy address 0x%x\n",
 			   phy_blk[port]->addr);
-		bnx2x_8073_8727_external_rom_boot(bp, phy_blk[port],
-						  port_of_path);
-		bnx2x_cl45_read(bp, phy_blk[port],
-			      MDIO_PMA_DEVAD,
-			      MDIO_PMA_REG_ROM_VER1, &fw_ver1);
-		if (fw_ver1 == 0 || fw_ver1 == 0x4321) {
-			DP(NETIF_MSG_LINK,
-				 "bnx2x_8727_common_init_phy port %x:"
-				 "Download failed. fw version = 0x%x\n",
-				 port, fw_ver1);
+		if (bnx2x_8073_8727_external_rom_boot(bp, phy_blk[port],
+						      port_of_path))
 			return -EINVAL;
-		}
-	}
 
+	}
 	return 0;
 }
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-2.6 3/8] bnx2x: LED fix for BCM8727 over BCM57712
From: Yaniv Rosner @ 2011-01-18 14:33 UTC (permalink / raw)
  To: davem; +Cc: netdev, eilong

The LED on BCM57712+BCM8727 systems requires different settings.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x_link.c |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_link.c b/drivers/net/bnx2x/bnx2x_link.c
index bdf3c67..36a8844 100644
--- a/drivers/net/bnx2x/bnx2x_link.c
+++ b/drivers/net/bnx2x/bnx2x_link.c
@@ -3166,7 +3166,23 @@ u8 bnx2x_set_led(struct link_params *params,
 		if (!vars->link_up)
 			break;
 	case LED_MODE_ON:
-		if (SINGLE_MEDIA_DIRECT(params)) {
+		if (params->phy[EXT_PHY1].type ==
+		    PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM8727 &&
+		    CHIP_IS_E2(bp) && params->num_phys == 2) {
+			/**
+			* This is a work-around for E2+8727 Configurations
+			*/
+			if (mode == LED_MODE_ON ||
+				speed == SPEED_10000){
+				REG_WR(bp, NIG_REG_LED_MODE_P0 + port*4, 0);
+				REG_WR(bp, NIG_REG_LED_10G_P0 + port*4, 1);
+
+				tmp = EMAC_RD(bp, EMAC_REG_EMAC_LED);
+				EMAC_WR(bp, EMAC_REG_EMAC_LED,
+					(tmp | EMAC_LED_OVERRIDE));
+				return rc;
+			}
+		} else if (SINGLE_MEDIA_DIRECT(params)) {
 			/**
 			* This is a work-around for HW issue found when link
 			* is up in CL73
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-2.6 2/8] bnx2x: Common init will be executed only once after POR
From: Yaniv Rosner @ 2011-01-18 14:33 UTC (permalink / raw)
  To: davem; +Cc: netdev, eilong

Common init used to be called by the driver when the first port came up, mainly to reset and reload the external PHY microcode.
However, if a management driver is active on the other port, traffic would be halted. So limit the common init to run only once after POR (power-on reset).

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x_link.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_link.c b/drivers/net/bnx2x/bnx2x_link.c
index 77f9eb1..bdf3c67 100644
--- a/drivers/net/bnx2x/bnx2x_link.c
+++ b/drivers/net/bnx2x/bnx2x_link.c
@@ -7958,6 +7958,7 @@ u8 bnx2x_common_init_phy(struct bnx2x *bp, u32 shmem_base_path[],
 			 u32 shmem2_base_path[], u32 chip_id)
 {
 	u8 rc = 0;
+	u32 phy_ver;
 	u8 phy_index;
 	u32 ext_phy_type, ext_phy_config;
 	DP(NETIF_MSG_LINK, "Begin common phy init\n");
@@ -7965,6 +7966,16 @@ u8 bnx2x_common_init_phy(struct bnx2x *bp, u32 shmem_base_path[],
 	if (CHIP_REV_IS_EMUL(bp))
 		return 0;
 
+	/* Check if common init was already done */
+	phy_ver = REG_RD(bp, shmem_base_path[0] +
+			 offsetof(struct shmem_region,
+				  port_mb[PORT_0].ext_phy_fw_version));
+	if (phy_ver) {
+		DP(NETIF_MSG_LINK, "Not doing common init; phy ver is 0x%x\n",
+			       phy_ver);
+		return 0;
+	}
+
 	/* Read the ext_phy_type for arbitrary port(0) */
 	for (phy_index = EXT_PHY1; phy_index < MAX_PHYS;
 	      phy_index++) {
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-2.6 1/8] bnx2x: Swap BCM8073 PHY polarity if required
From: Yaniv Rosner @ 2011-01-18 14:33 UTC (permalink / raw)
  To: davem; +Cc: netdev, eilong

Enable control of the BCM8073 PN polarity swap through NVM configuration, which is required on certain systems.
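The 1G handling that the diff below adds to bnx2x_8073_read_status() is a read-modify-write of a single bit. A minimal sketch of just that bit logic, assuming hypothetical names (DEMO_RX_INVERT_1G, demo_update_rx_polarity) and a plain integer in place of the MDIO register access:

```c
#include <stdint.h>

/* Bit 3 is the Rx invert bit according to the comment in the patch. */
#define DEMO_RX_INVERT_1G (1u << 3)

/*
 * Return the updated register value: set the invert bit when the link
 * runs at 1G, clear it for any other speed. All other bits are preserved.
 */
static uint16_t demo_update_rx_polarity(uint16_t reg, int speed_mbps)
{
    if (speed_mbps == 1000)
        reg |= DEMO_RX_INVERT_1G;
    else
        reg &= (uint16_t)~DEMO_RX_INVERT_1G;
    return reg;
}
```

In the driver the value would be read with bnx2x_cl45_read(), updated, and written back with bnx2x_cl45_write(); the sketch only covers the update step.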

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x_hsi.h  |    4 +++
 drivers/net/bnx2x/bnx2x_link.c |   42 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_hsi.h b/drivers/net/bnx2x/bnx2x_hsi.h
index 6238d4f..548f563 100644
--- a/drivers/net/bnx2x/bnx2x_hsi.h
+++ b/drivers/net/bnx2x/bnx2x_hsi.h
@@ -352,6 +352,10 @@ struct port_hw_cfg {			    /* port 0: 0x12c  port 1: 0x2bc */
 #define PORT_HW_CFG_LANE_SWAP_CFG_31203120	    0x0000d8d8
 	/* forced only */
 #define PORT_HW_CFG_LANE_SWAP_CFG_32103210	    0x0000e4e4
+    /*	Indicate whether to swap the external phy polarity */
+#define PORT_HW_CFG_SWAP_PHY_POLARITY_MASK	       0x00010000
+#define PORT_HW_CFG_SWAP_PHY_POLARITY_DISABLED	    0x00000000
+#define PORT_HW_CFG_SWAP_PHY_POLARITY_ENABLED	    0x00010000
 
 	u32 external_phy_config;
 #define PORT_HW_CFG_SERDES_EXT_PHY_TYPE_MASK	    0xff000000
diff --git a/drivers/net/bnx2x/bnx2x_link.c b/drivers/net/bnx2x/bnx2x_link.c
index 43b0de2..77f9eb1 100644
--- a/drivers/net/bnx2x/bnx2x_link.c
+++ b/drivers/net/bnx2x/bnx2x_link.c
@@ -4108,6 +4108,25 @@ static u8 bnx2x_8073_config_init(struct bnx2x_phy *phy,
 
 	DP(NETIF_MSG_LINK, "Before rom RX_ALARM(port1): 0x%x\n", tmp1);
 
+	/**
+	 * If this is forced speed, set to KR or KX (all other are not
+	 * supported)
+	 */
+	/* Swap polarity if required - Must be done only in non-1G mode */
+	if (params->lane_config & PORT_HW_CFG_SWAP_PHY_POLARITY_ENABLED) {
+		/* Configure the 8073 to swap _P and _N of the KR lines */
+		DP(NETIF_MSG_LINK, "Swapping polarity for the 8073\n");
+		/* 10G Rx/Tx and 1G Tx signal polarity swap */
+		bnx2x_cl45_read(bp, phy,
+				MDIO_PMA_DEVAD,
+				MDIO_PMA_REG_8073_OPT_DIGITAL_CTRL, &val);
+		bnx2x_cl45_write(bp, phy,
+				 MDIO_PMA_DEVAD,
+				 MDIO_PMA_REG_8073_OPT_DIGITAL_CTRL,
+				 (val | (3<<9)));
+	}
+
+
 	/* Enable CL37 BAM */
 	if (REG_RD(bp, params->shmem_base +
 			 offsetof(struct shmem_region, dev_info.
@@ -4314,6 +4333,29 @@ static u8 bnx2x_8073_read_status(struct bnx2x_phy *phy,
 	}
 
 	if (link_up) {
+		/* Swap polarity if required */
+		if (params->lane_config &
+		    PORT_HW_CFG_SWAP_PHY_POLARITY_ENABLED) {
+			/* Configure the 8073 to swap P and N of the KR lines */
+			bnx2x_cl45_read(bp, phy,
+					MDIO_XS_DEVAD,
+					MDIO_XS_REG_8073_RX_CTRL_PCIE, &val1);
+			/**
+			* Set bit 3 to invert Rx in 1G mode and clear this bit
+			* when it's in 10G mode.
+			*/
+			if (vars->line_speed == SPEED_1000) {
+				DP(NETIF_MSG_LINK, "Swapping 1G polarity for "
+					      "the 8073\n");
+				val1 |= (1<<3);
+			} else
+				val1 &= ~(1<<3);
+
+			bnx2x_cl45_write(bp, phy,
+					 MDIO_XS_DEVAD,
+					 MDIO_XS_REG_8073_RX_CTRL_PCIE,
+					 val1);
+		}
 		bnx2x_ext_phy_10G_an_resolve(bp, phy, vars);
 		bnx2x_8073_resolve_fc(phy, params, vars);
 	}
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-2.6 0/8] bnx2x: Minor link related fixes
From: Yaniv Rosner @ 2011-01-18 14:33 UTC (permalink / raw)
  To: davem; +Cc: netdev, eilong

Hi Dave,
The following patch series describes some link fixes in the bnx2x driver.
Please consider applying it to net-next-2.6.

Thanks,
Yaniv

^ permalink raw reply

* [patch 4/4] ipset: fix build with NDEBUG defined
From: holger @ 2011-01-18 14:21 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netfilter-devel, netdev
In-Reply-To: <20110118142154.697547841@eitzenberger.org>

[-- Attachment #1: ipset-fix-NDEBUG.diff --]
[-- Type: text/plain, Size: 1021 bytes --]

The gcc option -Wunused-parameter interferes badly with the assert()
macros.  If -DNDEBUG is specified, the build fails with:

  cc1: warnings being treated as errors
  print.c: In function 'ipset_print_family':
  print.c:92: error: unused parameter 'opt'
  print.c: In function 'ipset_print_port':
  print.c:413: error: unused parameter 'opt'
  print.c: In function 'ipset_print_proto':

A possible fix is simply to remove -Wunused, as -Wextra together with
-Wunused enables -Wunused-parameter.
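The interaction can be reproduced with a small function whose parameter is used only inside assert(). A minimal sketch, assuming hypothetical names (demo_opt, demo_print_family) rather than the real ipset print functions: when NDEBUG is defined, assert() expands to ((void)0), the parameter is no longer referenced, and -Wunused-parameter reports it. The (void)opt cast shown here is one alternative way to keep such code warning-free; it is not what this patch does, which removes -Wunused from configure.ac instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical option identifiers, standing in for the 'opt' parameter. */
enum demo_opt { DEMO_OPT_FAMILY = 1, DEMO_OPT_PORT = 2 };

/*
 * 'opt' is checked only by assert(). With -DNDEBUG the assert disappears,
 * so without the cast below the compiler would see an unused parameter.
 */
static int demo_print_family(char *buf, size_t len, enum demo_opt opt)
{
    assert(opt == DEMO_OPT_FAMILY);
    (void)opt; /* explicit use, keeps -Wunused-parameter quiet under NDEBUG */

    return snprintf(buf, len, "inet");
}
```

Removing the (void)opt line and compiling with -DNDEBUG -Wextra -Wunused -Werror should give an "unused parameter 'opt'" error of the kind quoted above.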

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ipset/configure.ac
===================================================================
--- ipset.orig/configure.ac	2011-01-18 14:47:46.000000000 +0100
+++ ipset/configure.ac	2011-01-18 14:56:11.000000000 +0100
@@ -144,7 +144,6 @@
 AX_CFLAGS_GCC_OPTION(-Wstrict-prototypes)
 AX_CFLAGS_GCC_OPTION(-Wswitch-default)
 AX_CFLAGS_GCC_OPTION(-Wundef)
-AX_CFLAGS_GCC_OPTION(-Wunused)
 AX_CFLAGS_GCC_OPTION(-Wwrite-strings)
 
 dnl Checks for library functions.

-- 

^ permalink raw reply

* [patch 2/4] ipset: make IPv4 and IPv6 address handling similar
From: holger @ 2011-01-18 14:21 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netfilter-devel, netdev
In-Reply-To: <20110118142154.697547841@eitzenberger.org>

[-- Attachment #1: ipset-fix-ipv6-netmask-parsing.diff --]
[-- Type: text/plain, Size: 760 bytes --]

While the following works for AF_INET:

 ipset add foo 192.168.1.1/32

this does not work for AF_INET6:

 ipset add foo6 20a1:1:2:3:4:5:6:7/128
 ipset v5.2: Syntax error: plain IP address must be supplied: 20a1:1:2:3:4:5:6:7/128
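For illustration, accepting an optional /PREFIX for both families can be sketched with inet_pton(). This is not ipset's parser: demo_parse_ip_prefix is a hypothetical helper, and treating a missing prefix as a host address (32 or 128 bits) is an assumption of the sketch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Parse "ADDR" or "ADDR/PREFIX" for AF_INET or AF_INET6.
 * Returns 0 and stores the prefix length on success, -1 on a syntax error.
 */
static int demo_parse_ip_prefix(int family, const char *str, unsigned int *prefix)
{
    unsigned char addr[sizeof(struct in6_addr)];
    char buf[INET6_ADDRSTRLEN + 4];
    unsigned long max_bits, bits;
    char *slash, *end;

    if (family != AF_INET && family != AF_INET6)
        return -1;
    max_bits = (family == AF_INET) ? 32 : 128;
    bits = max_bits;

    if (strlen(str) >= sizeof(buf))
        return -1;
    strcpy(buf, str);

    /* Split off the optional prefix length and validate it. */
    slash = strchr(buf, '/');
    if (slash != NULL) {
        *slash++ = '\0';
        if (*slash < '0' || *slash > '9')
            return -1;
        bits = strtoul(slash, &end, 10);
        if (*end != '\0' || bits > max_bits)
            return -1;
    }

    /* The address part must be valid for the requested family. */
    if (inet_pton(family, buf, addr) != 1)
        return -1;

    *prefix = (unsigned int)bits;
    return 0;
}
```

With this helper the same code path handles 192.168.1.1/32 and 20a1:1:2:3:4:5:6:7/128, which is the symmetry the patch below aims for by always calling ipset_parse_ip().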
 
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ipset/lib/parse.c
===================================================================
--- ipset.orig/lib/parse.c	2011-01-14 11:14:41.000000000 +0100
+++ ipset/lib/parse.c	2011-01-14 11:15:06.000000000 +0100
@@ -960,9 +960,7 @@
 		ipset_data_set(data, IPSET_OPT_FAMILY, &family);
 	}
 	
-	return family == AF_INET ? ipset_parse_ip(session, opt, str)
-				 : ipset_parse_single_ip(session, opt, str);
-
+	return ipset_parse_ip(session, opt, str);
 }
 
 /**

-- 

^ permalink raw reply

* [patch 1/4] ipset: show correct line numbers in restore output
From: holger @ 2011-01-18 14:21 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netfilter-devel, netdev
In-Reply-To: <20110118142154.697547841@eitzenberger.org>

[-- Attachment #1: ipset-show-correct-line-number.diff --]
[-- Type: text/plain, Size: 779 bytes --]

When passing something like

  create foo6 hash:ip hashsize 64 family inet6
  add foo6 20a1:1234:5678::/64
  add foo6 20a1:1234:5679::/64

you get:

  ipset v5.2: Error in line 1: Syntax error: plain IP address must be supplied: 20a1:1234:5678::/64

It should be line 2, though.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ipset/lib/session.c
===================================================================
--- ipset.orig/lib/session.c	2011-01-05 18:59:59.000000000 +0100
+++ ipset/lib/session.c	2011-01-07 13:11:33.000000000 +0100
@@ -194,7 +194,7 @@
 
 	if (session->lineno != 0 && type == IPSET_ERROR) {
 		sprintf(session->report, "Error in line %u: ",
-			session->lineno);
+			session->lineno + 1);
 	}
 	offset = strlen(session->report);
 	

-- 

^ permalink raw reply

* [patch 3/4] ipset: do session initialization once
From: holger @ 2011-01-18 14:21 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netfilter-devel, netdev
In-Reply-To: <20110118142154.697547841@eitzenberger.org>

[-- Attachment #1: ipset-one-time-session-init.diff --]
[-- Type: text/plain, Size: 988 bytes --]

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ipset/src/ipset.c
===================================================================
--- ipset.orig/src/ipset.c	2011-01-05 12:05:31.000000000 +0100
+++ ipset/src/ipset.c	2011-01-05 12:07:02.000000000 +0100
@@ -431,14 +431,6 @@
 	const struct ipset_commands *command;
 	const struct ipset_type *type;
 
-	/* Initialize session */
-	if (session == NULL) {
-		session = ipset_session_init(printf);
-		if (session == NULL)
-			return exit_error(OTHER_PROBLEM,
-				"Cannot initialize ipset session, aborting.");
-	}
-
 	/* Commandline parsing, somewhat similar to that of 'ip' */
 
 	/* First: parse core options */
@@ -743,5 +735,10 @@
 	ipset_type_add(&ipset_hash_ipportnet0);
 	ipset_type_add(&ipset_list_set0);
 
+	session = ipset_session_init(printf);
+	if (session == NULL)
+		return exit_error(OTHER_PROBLEM,
+						  "Cannot initialize ipset session, aborting.");
+
 	return parse_commandline(argc, argv);
 }

-- 

^ permalink raw reply

* [patch 0/4] Ipset fixes
From: holger @ 2011-01-18 14:21 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netfilter-devel, netdev

Hi Jozsef,

what follows are some small improvements and fixes for ipset 5.

Please take a look.  Thanks!

 /holger

-- 

^ permalink raw reply
