Netdev List

* 5717 support commit is buggy
From: David Miller @ 2009-09-11 22:15 UTC (permalink / raw)
  To: mcarlson; +Cc: benli, netdev


The change:

commit f6eb9b1fc1411d22c073f5264e5630a541d0f7df
Author: Matt Carlson <mcarlson@broadcom.com>
Date:   Tue Sep 1 13:19:53 2009 +0000

    tg3: Add 5717 asic rev
    
    This patch adds the 5717 asic rev.
    
    Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
    Reviewed-by: Benjamin Li <benli@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

breaks 5703 chips on my workstation.

I suspect it breaks a lot of other chips too.

I'm about to do some tests, but I suspect it's this change:

@@ -111,7 +111,8 @@
  * replace things like '% foo' with '& (foo - 1)'.
  */
 #define TG3_RX_RCB_RING_SIZE(tp)	\
-	((tp->tg3_flags2 & TG3_FLG2_5705_PLUS) ?  512 : 1024)
+	(((tp->tg3_flags & TG3_FLAG_JUMBO_CAPABLE) && \
+	  !(tp->tg3_flags2 & TG3_FLG2_5780_CLASS)) ? 512 : 1024)
 
 #define TG3_TX_RING_SIZE		512
 #define TG3_DEF_TX_RING_PENDING		(TG3_TX_RING_SIZE - 1)

and thus an incorrect RCB ring size is being used which eventually
locks up the card.
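
To make the suspicion concrete, here is a minimal user-space sketch that evaluates the old and new macro bodies side by side. Only the two macro bodies come from the diff above; the struct, the helper names and the flag bit values are hypothetical stand-ins (the real ones live in tg3.h). It assumes a chip that is jumbo capable but neither 5705_PLUS nor 5780 class, which may or may not match the 5703 exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; the real definitions live in tg3.h. */
#define TG3_FLAG_JUMBO_CAPABLE  0x00000001u
#define TG3_FLG2_5705_PLUS      0x00000001u
#define TG3_FLG2_5780_CLASS     0x00000002u

/* Stand-in for struct tg3, reduced to the two flag words. */
struct tg3_sketch {
	uint32_t tg3_flags;
	uint32_t tg3_flags2;
};

/* Macro body before the 5717 commit. */
#define OLD_RX_RCB_RING_SIZE(tp) \
	(((tp)->tg3_flags2 & TG3_FLG2_5705_PLUS) ? 512 : 1024)

/* Macro body after the 5717 commit. */
#define NEW_RX_RCB_RING_SIZE(tp) \
	((((tp)->tg3_flags & TG3_FLAG_JUMBO_CAPABLE) && \
	  !((tp)->tg3_flags2 & TG3_FLG2_5780_CLASS)) ? 512 : 1024)

int old_ring_size(const struct tg3_sketch *tp)
{
	assert(tp != 0);
	return OLD_RX_RCB_RING_SIZE(tp);
}

int new_ring_size(const struct tg3_sketch *tp)
{
	assert(tp != 0);
	return NEW_RX_RCB_RING_SIZE(tp);
}

/* Nonzero when the two macros disagree for this flag combination. */
int ring_size_changed(const struct tg3_sketch *tp)
{
	return old_ring_size(tp) != new_ring_size(tp);
}
```

With tg3_flags = TG3_FLAG_JUMBO_CAPABLE and tg3_flags2 = 0, the old body yields 1024 and the new one 512, so any chip with that combination would see its RCB ring size change.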

Also:

@@ -13486,7 +13556,8 @@ static void __devinit tg3_init_link_config(struct tg3 *tp)
 
 static void __devinit tg3_init_bufmgr_config(struct tg3 *tp)
 {
-	if (tp->tg3_flags2 & TG3_FLG2_5705_PLUS) {
+	if (tp->tg3_flags2 & TG3_FLG2_5705_PLUS &&
+	    GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5717) {
 		tp->bufmgr_config.mbuf_read_dma_low_water =
 			DEFAULT_MB_RDMA_LOW_WATER_5705;
 		tp->bufmgr_config.mbuf_mac_rx_low_water =

I wonder what that does with C precedence rules.  Probably need
parentheses around "tp->tg3_flags2 & TG3_FLG2_5705_PLUS" for
safety.
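
On the precedence question: in C, "!=" binds tighter than "&", and "&" binds tighter than "&&", so the condition as written should already group as "(a & b) && (c != d)". The sketch below checks that with two small functions. The constant values and function names are hypothetical stand-ins, not the real tg3 definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the tg3 constants; values are illustrative. */
#define FLG2_5705_PLUS  0x4u
#define REV_5717        0xfu

/* The condition exactly as written in the patch, without extra parentheses. */
int cond_as_written(unsigned flags2, unsigned rev)
{
	return flags2 & FLG2_5705_PLUS && rev != REV_5717;
}

/* The same condition with the intended grouping made explicit. */
int cond_parenthesized(unsigned flags2, unsigned rev)
{
	return (flags2 & FLG2_5705_PLUS) && (rev != REV_5717);
}

/* Compare both forms over a small range of inputs. */
void check_equivalent(void)
{
	unsigned f, r;

	for (f = 0; f < 16; f++)
		for (r = 0; r < 32; r++)
			assert(cond_as_written(f, r) == cond_parenthesized(f, r));
}
```

So the extra parentheses would be for readability rather than correctness. The form that does bite is "a & b != c", which parses as "a & (b != c)".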


* Re: net_sched 07/07: add classful multiqueue dummy scheduler
From: David Miller @ 2009-09-11 22:10 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, eric.dumazet, netdev
In-Reply-To: <20090911213812.GA3965@ami.dom.local>

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Fri, 11 Sep 2009 23:38:13 +0200

> On Thu, Sep 10, 2009 at 01:28:59PM +0200, Patrick McHardy wrote:
> ...
>> I'll remove the misleading (and unnecessary) line of code, thanks Jarek.
> 
> Btw, I guess David owes you one classful(!) dummy(?) scheduler...

Did I forget to add the sch_mq.c file to the tree?  What are
you saying? :-)


* Re: [PATCH 1/8] networking/fanotify: declare fanotify socket numbers
From: Eric Paris @ 2009-09-11 21:51 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: David Miller, linux-kernel, linux-fsdevel, netdev, viro, alan,
	hch
In-Reply-To: <20090911212731.GA19901@shareable.org>

On Fri, 2009-09-11 at 22:27 +0100, Jamie Lokier wrote:
> Eric Paris wrote:
> > On Fri, 2009-09-11 at 21:46 +0100, Jamie Lokier wrote:
> > > Eric Paris wrote:
> > > > > I would really prefer if you worked on eliminating the problem that
> > > > > prevents you from using netlink instead.
> > > > 
> > > > I'm not really sure if I can, although I'd love to hear input from
> > > > someone who knows the netlink code on how I can make it do what I need.
> > > > I'm really not duplicating much other than the NLMSG_OK and NLMSG_NEXT
> > > > macros.  My code doesn't even use skbs and I'm not savvy enough to really
> > > > know how I could.  I'm more than willing to work on it if someone can
> > > > point me to how it might work.
> > > 
> > > Let's turn the question around.
> > > 
> > > Since you're doing lots of non-sockety things, and can't tolerate
> > > dropped packets - why isn't it a character device?  What's the reason
> > > for using a socket at all?
> > > 
> > > (I'm reminded of /dev/poll, /dev/epoll and /dev/inotify :-)
> > 
> > Originally it was a char device and I was told to use a socket protocol
> > so I could use get/set sockopt rather than ioctl, because ioctl is the
> > devil (even if those aren't THAT much better).
> > 
> > The queuing being done using events instead of skbs was done reusing
> > inotify code, reusing network code would be just as good with me.  What
> > I really need is a way to convey a pointer from one process to another.
> > That's why I claim loss is not an option, since I'm holding a reference
> > to the pointer I can't have that conveyance disappear under us.
> 
> It's fine as long as the disappearing knows to release the reference.

Absolutely.

> But I suspect fanotify would be awfully hard to use if messages were
> unreliable.

For some things yes, some things no.  I'd have to understand where loss
can happen to know if it's feasible.  If I know loss happens in the
sender context that's great.  If it's somewhere in the middle and the
sender doesn't immediately know it'll never be delivered, yes, I don't
think it can solve all my needs.  How many places can an skb get lost
between the sender and the receiver?

> Does fanotify need "lots of ioctls", or could it fit comfortably into
> say 2-5 strongly typed system calls, like inotify and epoll do?

Certainly not a lot.  I do it now with 5 setsockopt calls.  I could do
it with a limited number of syscalls (what I have today is about 4) but
I still don't believe I know all the future use cases or how to define
them in syscalls.  People are still coming up with things they want
inotify to do but we can't implement them in the syscalls we have.  I
think we all know it's a hell of a lot easier to implement a new
setsockopt call than a new syscall.

I don't think I know everything everyone is going to want out of a new
notification system yet, so I'm reticent to go down that path.  I know
how to solve some of the suck of inotify, I know how to solve a number
of people's problems (and have all the patches to do it) but when it
comes to the organic nature of things and what I hope to be the growing
future of fanotify to solve everything for everyone, I don't think I can
define it all yet.  I just don't want to end up with another inotify, it
works, the syscalls do what they do very very well, but people want new
things out of it, and we can't reasonably give it to them.

Maybe syscalls are the right thing, and I know I can solve a lot of use
cases with syscalls, but I don't even know if I know all the use cases.

-Eric


* Re: [PATCH 4/4] bonding: add sysfs files to display tlb and alb hash table contents
From: Jay Vosburgh @ 2009-09-11 21:48 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev, bonding-devel
In-Reply-To: <20090911211317.GT8515@gospo.rdu.redhat.com>

Andy Gospodarek <andy@greyhouse.net> wrote:

>bonding: add sysfs files to display tlb and alb hash table contents

	Isn't it considered bad form to have sysfs files that kick out
large amounts of data like this?  Not that I think this is a bad
facility to have, just checking on the mechanism.
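
One concrete angle on the size concern: a sysfs show() buffer is a single page, and the sprintf() calls in the patch below never check how much room is left. The following is a minimal user-space sketch of bounded row formatting, not the bonding code itself. append_row() and its row layout are hypothetical; in the kernel one would presumably reach for scnprintf() with PAGE_SIZE instead of plain snprintf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Append one formatted row to buf without writing past buf[size - 1].
 * len is the number of bytes already used. Returns the new length;
 * a row that does not fit is truncated and the buffer is reported full.
 */
size_t append_row(char *buf, size_t size, size_t len,
		  unsigned hash, const char *dev, unsigned load)
{
	int n;

	assert(len < size);
	n = snprintf(buf + len, size - len, "%-4x %-8s %u\n", hash, dev, load);
	if (n < 0)
		return len;             /* formatting error: report no growth */
	if ((size_t)n >= size - len)
		return size - 1;        /* truncated: buffer is now full */
	return len + (size_t)n;
}
```

A caller would stop walking the table once the returned length reaches size - 1, so a large hash table gets cut short rather than overrunning the page.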

>While debugging some problems with alb (mode 6) bonding I realized that
>being able to output the contents of both hash tables would be helpful.
>This is what the output looks like for the two files:
>
>device  load
>eth1    491
>eth2    491
>hash device   last device   tx bytes       load        next previous
>2    eth1     eth1          2254           491         0    0
>3    eth2     eth2          2744           491         0    0
>6             eth2          0              488         0    0
>8             eth2          0              461698      0    0
>1b            eth2          0              249         0    0
>eb            eth2          0              21          0    0
>ff            eth2          0              22          0    0
>
>hash ip_src          ip_dst          mac_dst           slave assign ntt
>2    10.0.3.2        10.0.3.11       00:e0:81:71:ee:a9 eth1  1      0
>3    10.0.3.2        10.0.3.10       00:e0:81:71:ee:a9 eth2  1      0
>8    10.0.3.2        10.0.3.1        00:e0:81:71:ee:a9 eth2  1      0
>
>These were a great help debugging the fixes I have just posted and they
>might be helpful for others, so I decided to include them in my
>patchset.
>
>Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
>
>---
> drivers/net/bonding/bond_alb.c   |   61 ++++++++++++++++++++++++++++++++++++++
> drivers/net/bonding/bond_alb.h   |    2 +
> drivers/net/bonding/bond_sysfs.c |   40 +++++++++++++++++++++++++
> 3 files changed, 103 insertions(+), 0 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 7db8835..4e930e3 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -778,6 +778,67 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
> 	return tx_slave;
> }
>
>+int rlb_print_rx_hashtbl(struct bonding *bond, char *buf)
>+{
>+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>+	struct rlb_client_info *client_info;
>+	u32 hash_index;
>+	u32 count = 0;
>+	
>+	_lock_rx_hashtbl(bond);
>+
>+	count = sprintf(buf, "hash ip_src          ip_dst          mac_dst           slave assign ntt\n");
>+	hash_index = bond_info->rx_hashtbl_head;
>+	for (; hash_index != RLB_NULL_INDEX; hash_index = client_info->next) {
>+		client_info = &(bond_info->rx_hashtbl[hash_index]);
>+		count += sprintf(buf + count,"%-4x %-15pi4 %-15pi4 %pM %-5s %-6d %d\n",
>+				 hash_index,
>+				 &client_info->ip_src,
>+				 &client_info->ip_dst,
>+				 client_info->mac_dst,
>+				 client_info->slave->dev->name,
>+				 client_info->assigned,
>+				 client_info->ntt);
>+	}
>+
>+	_unlock_rx_hashtbl(bond);
>+	return count;
>+}
>+
>+int tlb_print_tx_hashtbl(struct bonding *bond, char *buf)
>+{
>+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>+	u32 hash_index;
>+	u32 count = 0;
>+	struct slave *slave;
>+	int i;
>+	
>+	_lock_tx_hashtbl(bond);
>+
>+	count += sprintf(buf, "device  load\n");
>+	bond_for_each_slave(bond, slave, i) {
>+		struct tlb_slave_info *slave_info = &(SLAVE_TLB_INFO(slave));
>+		count += sprintf(buf + count,"%-7s %d\n",slave->dev->name,slave_info->load);
>+	}
>+	count += sprintf(buf + count, "hash device   last device   tx bytes       load        next previous\n");
>+	for (hash_index = 0; hash_index < TLB_HASH_TABLE_SIZE; hash_index++) {
>+		struct tlb_client_info *client_info = &(bond_info->tx_hashtbl[hash_index]);
>+		if (client_info->tx_slave || client_info->last_slave) {
>+			count += sprintf(buf + count,"%-4x %-8s %-13s %-14d %-11d %-4x %d\n",
>+					 hash_index,
>+					 (client_info->tx_slave) ? client_info->tx_slave->dev->name : "",
>+					 (client_info->last_slave) ? client_info->last_slave->dev->name : "",
>+					 client_info->tx_bytes,
>+					 client_info->load_history,
>+					 (client_info->next != TLB_NULL_INDEX) ? client_info->next : 0,
>+					 (client_info->prev != TLB_NULL_INDEX) ? client_info->prev : 0);
>+		}
>+	}
>+
>+	_unlock_tx_hashtbl(bond);
>+	return count;
>+}
>+
> /* Caller must hold rx_hashtbl lock */
> static void rlb_init_table_entry(struct rlb_client_info *entry)
> {
>diff --git a/drivers/net/bonding/bond_alb.h b/drivers/net/bonding/bond_alb.h
>index b65fd29..8543447 100644
>--- a/drivers/net/bonding/bond_alb.h
>+++ b/drivers/net/bonding/bond_alb.h
>@@ -132,5 +132,7 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev);
> void bond_alb_monitor(struct work_struct *);
> int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr);
> void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id);
>+int rlb_print_rx_hashtbl(struct bonding *bond, char *buf);
>+int tlb_print_tx_hashtbl(struct bonding *bond, char *buf);
> #endif /* __BOND_ALB_H__ */
>
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 55bf34f..1123e1f 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -1480,6 +1480,44 @@ static ssize_t bonding_show_ad_partner_mac(struct device *d,
> static DEVICE_ATTR(ad_partner_mac, S_IRUGO, bonding_show_ad_partner_mac, NULL);
>
>
>+/*
>+ * Show current tlb/alb tx hash table.
>+ */
>+static ssize_t bonding_show_tlb_tx_hash(struct device *d,
>+					   struct device_attribute *attr,
>+					   char *buf)
>+{
>+	int count = 0;
>+	struct bonding *bond = to_bond(d);
>+
>+	if (bond->params.mode == BOND_MODE_ALB ||
>+	    bond->params.mode == BOND_MODE_TLB) {
>+		count = tlb_print_tx_hashtbl(bond, buf);
>+	}
>+
>+	return count;
>+}
>+static DEVICE_ATTR(tlb_tx_hash, S_IRUGO, bonding_show_tlb_tx_hash, NULL);

	Should the mode here be S_IRUSR (0400, instead of 0444)?
Otherwise, a nefarious user could "while 1 cat /sys/.../tlb_tx_hash" and
keep the hash table lock fairly busy.  Since the lock is acquired for
every packet on tx, that's probably a bad thing.
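
For reference, the difference between the two modes comes down to the group and other read bits. A small user-space sketch follows. S_IRUGO is kernel-only shorthand, so it is rebuilt here from the POSIX bits under a hypothetical name, and the helper function is likewise made up for illustration.

```c
#include <assert.h>
#include <sys/stat.h>

/* S_IRUGO is kernel-only shorthand; rebuilt here from the POSIX bits. */
#define SKETCH_S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)

/* Nonzero when users other than the owner may read a file with this mode. */
int readable_by_non_owner(unsigned mode)
{
	assert((mode & ~07777u) == 0);  /* permission bits only */
	return (mode & (S_IRGRP | S_IROTH)) != 0;
}
```

With S_IRUSR (0400) only the owner, root for sysfs files, can trigger the read and therefore the lock acquisition. With 0444 any local user can.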

>+
>+/*
>+ * Show current alb rx hash table.
>+ */
>+static ssize_t bonding_show_alb_rx_hash(struct device *d,
>+					   struct device_attribute *attr,
>+					   char *buf)
>+{
>+	int count = 0;
>+	struct bonding *bond = to_bond(d);
>+
>+	if (bond->params.mode == BOND_MODE_ALB) {
>+		count = rlb_print_rx_hashtbl(bond, buf);
>+	}
>+
>+	return count;
>+}
>+static DEVICE_ATTR(alb_rx_hash, S_IRUGO, bonding_show_alb_rx_hash, NULL);

	Same comment as for the mode of the tlb_tx_hash, although the rx
hash table lock is much more lightly used, so it might not be a real
problem.

>
> static struct attribute *per_bond_attrs[] = {
> 	&dev_attr_slaves.attr,
>@@ -1505,6 +1543,8 @@ static struct attribute *per_bond_attrs[] = {
> 	&dev_attr_ad_actor_key.attr,
> 	&dev_attr_ad_partner_key.attr,
> 	&dev_attr_ad_partner_mac.attr,
>+	&dev_attr_alb_rx_hash.attr,
>+	&dev_attr_tlb_tx_hash.attr,
> 	NULL,
> };
>
>-- 
>1.5.5.6
>

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [PATCH 1/8] networking/fanotify: declare fanotify socket numbers
From: Jamie Lokier @ 2009-09-11 21:42 UTC (permalink / raw)
  To: jamal
  Cc: Eric Paris, David Miller, linux-kernel, linux-fsdevel, netdev,
	viro, alan, hch, balbir
In-Reply-To: <1252704102.25158.36.camel@dogo.mojatatu.com>

jamal wrote:
> 1) Netlink messages won't get lost unless the listener is not keeping up
> and the kernel sending it messages ends up filling its queues. In such a
> case your event message will be delivered to the 49 other users but
> not the overloaded one. You can add sequence numbers to the event
> messages you send to the listeners and any gaps in sequences on received
> events imply lost events. You can add a mechanism to query your user
> space kernel when something like that gets lost.
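
The sequence-number idea in the quoted paragraph can be sketched in a few lines of user-space C. This is only an illustration: struct seq_tracker and seq_observe() are hypothetical names, and the sketch assumes events arrive in order, so any jump in the sequence means loss rather than reordering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical listener state. */
struct seq_tracker {
	uint32_t next_seq;  /* sequence number expected next */
	uint64_t lost;      /* running count of events known to be missing */
};

/*
 * Record the sequence number of an incoming event.
 * Returns how many events were skipped just before it (0 if none).
 */
uint32_t seq_observe(struct seq_tracker *t, uint32_t seq)
{
	uint32_t gap;

	assert(t != 0);
	gap = seq - t->next_seq;    /* unsigned arithmetic handles wraparound */
	t->lost += gap;
	t->next_seq = seq + 1;
	return gap;
}
```

This tells the listener that something was lost and how much, but not what, which is the limitation the reply below is about.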

One of the uses of fanotify is as a security or auditing mechanism.
That can't tolerate gaps.

It's fundamentally different from inotify in one important respect:
inotify apps can recover from losing events by checking what they are
watching.

The fanotify application will know that it missed events, but what
happens to the other application which _caused_ those events?  Does it
get to do things it shouldn't, or hide them from the fanotify app, by
simply overloading the system?  Or the opposite, does it get access
denied - spurious file errors when the system is overloaded?

There's no way to handle that by dropping events.  A transport
mechanism can be dropped (say skbs), but the event itself has to be
kept, and then retried.

Since you have to keep an event object around until it's handled,
there's no point tying it to an unreliable delivery mechanism which
you'd have to wrap a retry mechanism around.

In other words, that part of netlink is a poor match.  It would match
inotify much better.

> 2) Your architecture has to take care of maintaining the state of what
> you want to deliver. So your editing has nothing to do with skbs.
> i.e an event happens, you update your state. If you need to send the
> event to the listeners, you alloc an skb - populate it with the info;
> multicast it to all the listeners. If something else happens, i would
> suggest for sake of simplicity you rinse and repeat. Sure, the listener
> may get contradicting events - but they should be able to handle it.

Speaking of skbs, how fast and compact are they for this?

Eric's explained that it would be normal for _every_ file operation on
some systems to trigger a fanotify event and possibly wait on the
response, or at least in major directory trees on the filesystem.
Even if it's just for the fanotify app to say "oh I don't care about
that file, carry on".

File performance is one of those things which really needs to be fast
for a good user experience - and it's not unusual to grep the odd
10,000 files here or there (just think of what a kernel developer
does), or to replace a few thousand quickly (rpm/dpkg) and things like
that.

While skbs and netlink aren't that slow, I suspect they're an order of
magnitude or two slower than, say, epoll or inotify at passing events
around.

-- Jamie


* Re: net_sched 07/07: add classful multiqueue dummy scheduler
From: Jarek Poplawski @ 2009-09-11 21:38 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Eric Dumazet, netdev, David Miller
In-Reply-To: <4AA8E2FB.3040809@trash.net>

On Thu, Sep 10, 2009 at 01:28:59PM +0200, Patrick McHardy wrote:
...
> I'll remove the misleading (and unnecessary) line of code, thanks Jarek.

Btw, I guess David owes you one classful(!) dummy(?) scheduler...

Jarek P.

commit 6ec1c69a8f6492fd25722f4762721921da074c12
Author: David S. Miller <davem@davemloft.net>
Date:   Sun Sep 6 01:58:51 2009 -0700

    net_sched: add classful multiqueue dummy scheduler
    
    This patch adds a classful dummy scheduler which can be used as root qdisc
    for multiqueue devices and exposes each device queue as a child class.
    
    This allows to address queues individually and graft them similar to regular
    classes. Additionally it presents an accumulated view of the statistics of
    all real root qdiscs in the dummy root.
    
    Two new callbacks are added to the qdisc_ops and qdisc_class_ops:
    
    - cl_ops->select_queue selects the tx queue number for new child classes.
    
    - qdisc_ops->attach() overrides root qdisc device grafting to attach
      non-shared qdiscs to the queues.
    
    Signed-off-by: Patrick McHardy <kaber@trash.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
new file mode 100644
index 0000000..c84dec9
--- /dev/null
+++ b/net/sched/sch_mq.c



* Re: [iproute2] tc action mirred    question
From: jamal @ 2009-09-11 21:28 UTC (permalink / raw)
  To: Xiaofei Wu; +Cc: linux netdev
In-Reply-To: <510605.34044.qm@web111611.mail.gq1.yahoo.com>

On Fri, 2009-09-11 at 11:45 -0700, Xiaofei Wu wrote:
> I ran your example (mirror lo -> eth0) on Sep. 10th and got almost the same result (in my last email) as yours.
> I think interface 'lo' is very special.
> 
> When I do the following (eth0 -> lo), the results are very strange.
> 1> run 'tc qdisc add dev eth0 handle 1: root prio'
> 
> 2>  tc filter add dev eth0 parent 1: protocol ip prio 10 u32 \
> match ip src 192.168.1.0/32 flowid 1:16 \
> action pedit munge offset -14 u16 set 0x0023 \
> munge offset -12 u32 set 0xcdafecda \
> munge offset -8 u32 set 0x0023cdaf \
> munge offset -4 u32 set 0xd0740800 pipe \
> action mirred egress mirror dev lo
> 
> window1  run ' ping 192.168.1.1'
> window2  'tcpdump -i lo -e', I can not capture any packets.
> 

I think you are doing something wrong. Are there really packets
being generated with that source address?
I just did:
----
tc qdisc add dev eth0 handle 1: root prio

tc filter add dev eth0 parent 1: protocol ip prio 10 u32 match ip dst
10.0.0.27 flowid 1:16 action pedit munge offset -14 u16 set 0x0023 munge
offset -12 u32 set 0xcdafecda munge offset -8 u32 set 0x0023cdaf munge
offset -4 u32 set 0xd0740800 pipe action mirred egress mirror dev lo
----

I then ping 10.0.0.27 and I can see the packets on tcpdump lo.

> mirror  lo -> eth0 ok, eth0 -> lo  can not work ???
> 
> 2'> change 'action mirred egress mirror dev lo' to 'action mirred egress mirror dev eth1' ,
> 'tcpdump -i eth1 -e'    also capture nothing.
> Does this mean something wrong with ' action pedit ...' ?   ("offset must be on 32 bit boundaries"?)
> 


Just make sure it all works first. Perhaps you need to run tcpdump with
-n to avoid name lookup or make sure you are not just arping and not
issuing icmp etc.


>  
> >> lo -> eth0
> >> But I want to only modify the dst MAC, src MAC of the mirroring packets, transmit them to next hop. 
> >> (not modify the dst,src MAC of the packets to 'lo').  What should I do?
> 
> >Ok, so modifying then mirroring won't work on ingress;->
> >One thing you can try is first to mirror lo->eth0, then pedit only
> >specific flow on eth0 that came from lo.
> 
> How do I do this? Could you show me the example commands?   Thank you.
> 

Add the rule to mirror on lo
Add the rule to pedit for mirrored packet on eth0

cheers,
jamal




* [PATCH 5/5] netxen: update copyright
From: Dhananjay Phadke @ 2009-09-11 21:28 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1252704495-16118-1-git-send-email-dhananjay@netxen.com>

o Add QLogic copyright, add linux-driver@qlogic.com to
  MAINTAINERS.
o Delete old contact information.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
 drivers/net/netxen/Makefile             |    7 +------
 drivers/net/netxen/netxen_nic.h         |    7 +------
 drivers/net/netxen/netxen_nic_ctx.c     |    7 +------
 drivers/net/netxen/netxen_nic_ethtool.c |    7 +------
 drivers/net/netxen/netxen_nic_hdr.h     |    7 +------
 drivers/net/netxen/netxen_nic_hw.c      |    7 +------
 drivers/net/netxen/netxen_nic_hw.h      |    7 +------
 drivers/net/netxen/netxen_nic_init.c    |    7 +------
 drivers/net/netxen/netxen_nic_main.c    |    7 +------
 9 files changed, 9 insertions(+), 54 deletions(-)

diff --git a/drivers/net/netxen/Makefile b/drivers/net/netxen/Makefile
index a70b682..11d94e2 100644
--- a/drivers/net/netxen/Makefile
+++ b/drivers/net/netxen/Makefile
@@ -1,4 +1,5 @@
 # Copyright (C) 2003 - 2009 NetXen, Inc.
+# Copyright (C) 2009 - QLogic Corporation.
 # All rights reserved.
 # 
 # This program is free software; you can redistribute it and/or
@@ -19,12 +20,6 @@
 # The full GNU General Public License is included in this distribution
 # in the file called LICENSE.
 # 
-# Contact Information:
-#    info@netxen.com
-# NetXen Inc,
-# 18922 Forge Drive
-# Cupertino, CA 95014-0701
-#
 #
 
 
diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index 1ae46e8..7384f59 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2003 - 2009 NetXen, Inc.
+ * Copyright (C) 2009 - QLogic Corporation.
  * All rights reserved.
  *
  * This program is free software; you can redistribute it and/or
@@ -20,12 +21,6 @@
  * The full GNU General Public License is included in this distribution
  * in the file called LICENSE.
  *
- * Contact Information:
- *    info@netxen.com
- * NetXen Inc,
- * 18922 Forge Drive
- * Cupertino, CA 95014-0701
- *
  */
 
 #ifndef _NETXEN_NIC_H_
diff --git a/drivers/net/netxen/netxen_nic_ctx.c b/drivers/net/netxen/netxen_nic_ctx.c
index 33f82db..9cb8f68 100644
--- a/drivers/net/netxen/netxen_nic_ctx.c
+++ b/drivers/net/netxen/netxen_nic_ctx.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2003 - 2009 NetXen, Inc.
+ * Copyright (C) 2009 - QLogic Corporation.
  * All rights reserved.
  *
  * This program is free software; you can redistribute it and/or
@@ -20,12 +21,6 @@
  * The full GNU General Public License is included in this distribution
  * in the file called LICENSE.
  *
- * Contact Information:
- *    info@netxen.com
- * NetXen Inc,
- * 18922 Forge Drive
- * Cupertino, CA 95014-0701
- *
  */
 
 #include "netxen_nic_hw.h"
diff --git a/drivers/net/netxen/netxen_nic_ethtool.c b/drivers/net/netxen/netxen_nic_ethtool.c
index d18832c..714f387 100644
--- a/drivers/net/netxen/netxen_nic_ethtool.c
+++ b/drivers/net/netxen/netxen_nic_ethtool.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2003 - 2009 NetXen, Inc.
+ * Copyright (C) 2009 - QLogic Corporation.
  * All rights reserved.
  *
  * This program is free software; you can redistribute it and/or
@@ -20,12 +21,6 @@
  * The full GNU General Public License is included in this distribution
  * in the file called LICENSE.
  *
- * Contact Information:
- *    info@netxen.com
- * NetXen Inc,
- * 18922 Forge Drive
- * Cupertino, CA 95014-0701
- *
  */
 
 #include <linux/types.h>
diff --git a/drivers/net/netxen/netxen_nic_hdr.h b/drivers/net/netxen/netxen_nic_hdr.h
index 26188b4..7a71774 100644
--- a/drivers/net/netxen/netxen_nic_hdr.h
+++ b/drivers/net/netxen/netxen_nic_hdr.h
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2003 - 2009 NetXen, Inc.
+ * Copyright (C) 2009 - QLogic Corporation.
  * All rights reserved.
  *
  * This program is free software; you can redistribute it and/or
@@ -20,12 +21,6 @@
  * The full GNU General Public License is included in this distribution
  * in the file called LICENSE.
  *
- * Contact Information:
- *    info@netxen.com
- * NetXen Inc,
- * 18922 Forge Drive
- * Cupertino, CA 95014-0701
- *
  */
 
 #ifndef __NETXEN_NIC_HDR_H_
diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c
index 018cf42..3231400 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2003 - 2009 NetXen, Inc.
+ * Copyright (C) 2009 - QLogic Corporation.
  * All rights reserved.
  *
  * This program is free software; you can redistribute it and/or
@@ -20,12 +21,6 @@
  * The full GNU General Public License is included in this distribution
  * in the file called LICENSE.
  *
- * Contact Information:
- *    info@netxen.com
- * NetXen Inc,
- * 18922 Forge Drive
- * Cupertino, CA 95014-0701
- *
  */
 
 #include "netxen_nic.h"
diff --git a/drivers/net/netxen/netxen_nic_hw.h b/drivers/net/netxen/netxen_nic_hw.h
index 98e4b95..3fd1dcb 100644
--- a/drivers/net/netxen/netxen_nic_hw.h
+++ b/drivers/net/netxen/netxen_nic_hw.h
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2003 - 2009 NetXen, Inc.
+ * Copyright (C) 2009 - QLogic Corporation.
  * All rights reserved.
  *
  * This program is free software; you can redistribute it and/or
@@ -20,12 +21,6 @@
  * The full GNU General Public License is included in this distribution
  * in the file called LICENSE.
  *
- * Contact Information:
- *    info@netxen.com
- * NetXen Inc,
- * 18922 Forge Drive
- * Cupertino, CA 95014-0701
- *
  */
 
 #ifndef __NETXEN_NIC_HW_H_
diff --git a/drivers/net/netxen/netxen_nic_init.c b/drivers/net/netxen/netxen_nic_init.c
index 128d1b6..91c2bc6 100644
--- a/drivers/net/netxen/netxen_nic_init.c
+++ b/drivers/net/netxen/netxen_nic_init.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2003 - 2009 NetXen, Inc.
+ * Copyright (C) 2009 - QLogic Corporation.
  * All rights reserved.
  *
  * This program is free software; you can redistribute it and/or
@@ -20,12 +21,6 @@
  * The full GNU General Public License is included in this distribution
  * in the file called LICENSE.
  *
- * Contact Information:
- *    info@netxen.com
- * NetXen Inc,
- * 18922 Forge Drive
- * Cupertino, CA 95014-0701
- *
  */
 
 #include <linux/netdevice.h>
diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index 2b1b939..c4f2427 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2003 - 2009 NetXen, Inc.
+ * Copyright (C) 2009 - QLogic Corporation.
  * All rights reserved.
  *
  * This program is free software; you can redistribute it and/or
@@ -20,12 +21,6 @@
  * The full GNU General Public License is included in this distribution
  * in the file called LICENSE.
  *
- * Contact Information:
- *    info@netxen.com
- * NetXen Inc,
- * 18922 Forge Drive
- * Cupertino, CA 95014-0701
- *
  */
 
 #include <linux/vmalloc.h>
-- 
1.6.0.2



* [PATCH 1/5] netxen: change firmware write size
From: Dhananjay Phadke @ 2009-09-11 21:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, Amit Kumar Salecha, Amit Kumar Salecha
In-Reply-To: <1252704495-16118-1-git-send-email-dhananjay@netxen.com>

From: Amit Kumar Salecha <amit@qlogic.com>

Use 8 byte strides for firmware download into card
memory since oncard memory controller needs 8 byte
(64 bit) accesses. This avoids unnecessary rmw cycles.

Signed-off-by: Amit Kumar Salecha <amit@netxen.com>
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
 drivers/net/netxen/netxen_nic_init.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic_init.c b/drivers/net/netxen/netxen_nic_init.c
index 485b947..8926b0e 100644
--- a/drivers/net/netxen/netxen_nic_init.c
+++ b/drivers/net/netxen/netxen_nic_init.c
@@ -727,21 +727,28 @@ netxen_load_firmware(struct netxen_adapter *adapter)
 			flashaddr += 8;
 		}
 	} else {
-		u32 data;
+		u64 data;
+		u32 hi, lo;
 
-		size = (NETXEN_IMAGE_START - NETXEN_BOOTLD_START) / 4;
+		size = (NETXEN_IMAGE_START - NETXEN_BOOTLD_START) / 8;
 		flashaddr = NETXEN_BOOTLD_START;
 
 		for (i = 0; i < size; i++) {
 			if (netxen_rom_fast_read(adapter,
-					flashaddr, (int *)&data) != 0)
+					flashaddr, &lo) != 0)
+				return -EIO;
+			if (netxen_rom_fast_read(adapter,
+					flashaddr + 4, &hi) != 0)
 				return -EIO;
 
+			/* hi, lo are already in host endian byteorder */
+			data = (((u64)hi << 32) | lo);
+
 			if (adapter->pci_mem_write(adapter,
-						flashaddr, &data, 4))
+						flashaddr, &data, 8))
 				return -EIO;
 
-			flashaddr += 4;
+			flashaddr += 8;
 		}
 	}
 	msleep(1);
-- 
1.6.0.2
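
The core of the change above is assembling two 32-bit flash reads into one 64-bit value and halving the iteration count. A minimal user-space sketch of just that arithmetic follows. combine_words() and stride_count() are hypothetical helpers, not functions from the netxen driver, and the address range in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the words read at flashaddr (lo) and flashaddr + 4 (hi). */
uint64_t combine_words(uint32_t hi, uint32_t lo)
{
	/* The cast must come before the shift: shifting a 32-bit value
	 * by 32 bits is undefined in C. */
	return ((uint64_t)hi << 32) | lo;
}

/* Number of 8-byte writes needed to cover the range [start, end). */
uint32_t stride_count(uint32_t start, uint32_t end)
{
	assert(end >= start && (end - start) % 8 == 0);
	return (end - start) / 8;
}
```

For example, a 64-byte region starting at 0x10000 takes 8 strides of 8 bytes, where the old 4-byte loop would have needed 16.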



* [PATCH 0/5] netxen: driver update for 2.6.32
From: Dhananjay Phadke @ 2009-09-11 21:28 UTC (permalink / raw)
  To: davem; +Cc: netdev

Dave,
	Since you have delayed net-next tree submission, can you
	please include these patches as well? Couple of these
	are bug fixes, so they need to go anyway.

	Thanks,
	Dhananjay




* [PATCH 2/5] netxen: improve pci memory access
From: Dhananjay Phadke @ 2009-09-11 21:28 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1252704495-16118-1-git-send-email-dhananjay@netxen.com>

o Access on card memory through memory controller (agent)
  rather than moving small pci window around. Clean up the
  code for moving windows around.

o Restrict memory access to 64 bit, currently only firmware
  download uses this.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
 drivers/net/netxen/netxen_nic.h      |    4 +-
 drivers/net/netxen/netxen_nic_hw.c   |  348 +++++++++-------------------------
 drivers/net/netxen/netxen_nic_main.c |    7 +-
 3 files changed, 93 insertions(+), 266 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index 2371888..7e3d2b9 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -552,8 +552,8 @@ struct netxen_hardware_context {
 
 	int qdr_sn_window;
 	int ddr_mn_window;
-	unsigned long mn_win_crb;
-	unsigned long ms_win_crb;
+	u32 mn_win_crb;
+	u32 ms_win_crb;
 
 	u8 cut_through;
 	u8 revision_id;
diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c
index 555bc4a..018cf42 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -1279,25 +1279,6 @@ netxen_nic_hw_read_wx_2M(struct netxen_adapter *adapter, ulong off)
 	return data;
 }
 
-/*
- * check memory access boundary.
- * used by test agent. support ddr access only for now
- */
-static unsigned long
-netxen_nic_pci_mem_bound_check(struct netxen_adapter *adapter,
-		unsigned long long addr, int size)
-{
-	if (!ADDR_IN_RANGE(addr,
-			NETXEN_ADDR_DDR_NET, NETXEN_ADDR_DDR_NET_MAX) ||
-		!ADDR_IN_RANGE(addr+size-1,
-			NETXEN_ADDR_DDR_NET, NETXEN_ADDR_DDR_NET_MAX) ||
-		((size != 1) && (size != 2) && (size != 4) && (size != 8))) {
-		return 0;
-	}
-
-	return 1;
-}
-
 static int netxen_pci_set_window_warning_count;
 
 static unsigned long
@@ -1424,10 +1405,8 @@ netxen_nic_pci_set_window_2M(struct netxen_adapter *adapter,
 		/* DDR network side */
 		window = MN_WIN(addr);
 		adapter->ahw.ddr_mn_window = window;
-		NXWR32(adapter, adapter->ahw.mn_win_crb | NETXEN_PCI_CRBSPACE,
-				window);
-		win_read = NXRD32(adapter,
-				adapter->ahw.mn_win_crb | NETXEN_PCI_CRBSPACE);
+		NXWR32(adapter, adapter->ahw.mn_win_crb, window);
+		win_read = NXRD32(adapter, adapter->ahw.mn_win_crb);
 		if ((win_read << 17) != window) {
 			printk(KERN_INFO "Written MNwin (0x%x) != "
 				"Read MNwin (0x%x)\n", window, win_read);
@@ -1442,10 +1421,8 @@ netxen_nic_pci_set_window_2M(struct netxen_adapter *adapter,
 
 		window = OCM_WIN(addr);
 		adapter->ahw.ddr_mn_window = window;
-		NXWR32(adapter, adapter->ahw.mn_win_crb | NETXEN_PCI_CRBSPACE,
-				window);
-		win_read = NXRD32(adapter,
-				adapter->ahw.mn_win_crb | NETXEN_PCI_CRBSPACE);
+		NXWR32(adapter, adapter->ahw.mn_win_crb, window);
+		win_read = NXRD32(adapter, adapter->ahw.mn_win_crb);
 		if ((win_read >> 7) != window) {
 			printk(KERN_INFO "%s: Written OCMwin (0x%x) != "
 					"Read OCMwin (0x%x)\n",
@@ -1458,10 +1435,8 @@ netxen_nic_pci_set_window_2M(struct netxen_adapter *adapter,
 		/* QDR network side */
 		window = MS_WIN(addr);
 		adapter->ahw.qdr_sn_window = window;
-		NXWR32(adapter, adapter->ahw.ms_win_crb | NETXEN_PCI_CRBSPACE,
-				window);
-		win_read = NXRD32(adapter,
-				adapter->ahw.ms_win_crb | NETXEN_PCI_CRBSPACE);
+		NXWR32(adapter, adapter->ahw.ms_win_crb, window);
+		win_read = NXRD32(adapter, adapter->ahw.ms_win_crb);
 		if (win_read != window) {
 			printk(KERN_INFO "%s: Written MSwin (0x%x) != "
 					"Read MSwin (0x%x)\n",
@@ -1484,177 +1459,6 @@ netxen_nic_pci_set_window_2M(struct netxen_adapter *adapter,
 	return addr;
 }
 
-static int netxen_nic_pci_is_same_window(struct netxen_adapter *adapter,
-				      unsigned long long addr)
-{
-	int window;
-	unsigned long long qdr_max;
-
-	if (NX_IS_REVISION_P2(adapter->ahw.revision_id))
-		qdr_max = NETXEN_ADDR_QDR_NET_MAX_P2;
-	else
-		qdr_max = NETXEN_ADDR_QDR_NET_MAX_P3;
-
-	if (ADDR_IN_RANGE(addr,
-			NETXEN_ADDR_DDR_NET, NETXEN_ADDR_DDR_NET_MAX)) {
-		/* DDR network side */
-		BUG();	/* MN access can not come here */
-	} else if (ADDR_IN_RANGE(addr,
-			NETXEN_ADDR_OCM0, NETXEN_ADDR_OCM0_MAX)) {
-		return 1;
-	} else if (ADDR_IN_RANGE(addr,
-				NETXEN_ADDR_OCM1, NETXEN_ADDR_OCM1_MAX)) {
-		return 1;
-	} else if (ADDR_IN_RANGE(addr, NETXEN_ADDR_QDR_NET, qdr_max)) {
-		/* QDR network side */
-		window = ((addr - NETXEN_ADDR_QDR_NET) >> 22) & 0x3f;
-		if (adapter->ahw.qdr_sn_window == window)
-			return 1;
-	}
-
-	return 0;
-}
-
-static int netxen_nic_pci_mem_read_direct(struct netxen_adapter *adapter,
-			u64 off, void *data, int size)
-{
-	unsigned long flags;
-	void __iomem *addr, *mem_ptr = NULL;
-	int ret = 0;
-	u64 start;
-	unsigned long mem_base;
-	unsigned long mem_page;
-
-	write_lock_irqsave(&adapter->adapter_lock, flags);
-
-	/*
-	 * If attempting to access unknown address or straddle hw windows,
-	 * do not access.
-	 */
-	start = adapter->pci_set_window(adapter, off);
-	if ((start == -1UL) ||
-		(netxen_nic_pci_is_same_window(adapter, off+size-1) == 0)) {
-		write_unlock_irqrestore(&adapter->adapter_lock, flags);
-		printk(KERN_ERR "%s out of bound pci memory access. "
-			"offset is 0x%llx\n", netxen_nic_driver_name,
-			(unsigned long long)off);
-		return -1;
-	}
-
-	addr = pci_base_offset(adapter, start);
-	if (!addr) {
-		write_unlock_irqrestore(&adapter->adapter_lock, flags);
-		mem_base = pci_resource_start(adapter->pdev, 0);
-		mem_page = start & PAGE_MASK;
-		/* Map two pages whenever user tries to access addresses in two
-		consecutive pages.
-		*/
-		if (mem_page != ((start + size - 1) & PAGE_MASK))
-			mem_ptr = ioremap(mem_base + mem_page, PAGE_SIZE * 2);
-		else
-			mem_ptr = ioremap(mem_base + mem_page, PAGE_SIZE);
-		if (mem_ptr == NULL) {
-			*(uint8_t  *)data = 0;
-			return -1;
-		}
-		addr = mem_ptr;
-		addr += start & (PAGE_SIZE - 1);
-		write_lock_irqsave(&adapter->adapter_lock, flags);
-	}
-
-	switch (size) {
-	case 1:
-		*(uint8_t  *)data = readb(addr);
-		break;
-	case 2:
-		*(uint16_t *)data = readw(addr);
-		break;
-	case 4:
-		*(uint32_t *)data = readl(addr);
-		break;
-	case 8:
-		*(uint64_t *)data = readq(addr);
-		break;
-	default:
-		ret = -1;
-		break;
-	}
-	write_unlock_irqrestore(&adapter->adapter_lock, flags);
-
-	if (mem_ptr)
-		iounmap(mem_ptr);
-	return ret;
-}
-
-static int
-netxen_nic_pci_mem_write_direct(struct netxen_adapter *adapter, u64 off,
-		void *data, int size)
-{
-	unsigned long flags;
-	void __iomem *addr, *mem_ptr = NULL;
-	int ret = 0;
-	u64 start;
-	unsigned long mem_base;
-	unsigned long mem_page;
-
-	write_lock_irqsave(&adapter->adapter_lock, flags);
-
-	/*
-	 * If attempting to access unknown address or straddle hw windows,
-	 * do not access.
-	 */
-	start = adapter->pci_set_window(adapter, off);
-	if ((start == -1UL) ||
-		(netxen_nic_pci_is_same_window(adapter, off+size-1) == 0)) {
-		write_unlock_irqrestore(&adapter->adapter_lock, flags);
-		printk(KERN_ERR "%s out of bound pci memory access. "
-			"offset is 0x%llx\n", netxen_nic_driver_name,
-			(unsigned long long)off);
-		return -1;
-	}
-
-	addr = pci_base_offset(adapter, start);
-	if (!addr) {
-		write_unlock_irqrestore(&adapter->adapter_lock, flags);
-		mem_base = pci_resource_start(adapter->pdev, 0);
-		mem_page = start & PAGE_MASK;
-		/* Map two pages whenever user tries to access addresses in two
-		 * consecutive pages.
-		 */
-		if (mem_page != ((start + size - 1) & PAGE_MASK))
-			mem_ptr = ioremap(mem_base + mem_page, PAGE_SIZE*2);
-		else
-			mem_ptr = ioremap(mem_base + mem_page, PAGE_SIZE);
-		if (mem_ptr == NULL)
-			return -1;
-		addr = mem_ptr;
-		addr += start & (PAGE_SIZE - 1);
-		write_lock_irqsave(&adapter->adapter_lock, flags);
-	}
-
-	switch (size) {
-	case 1:
-		writeb(*(uint8_t *)data, addr);
-		break;
-	case 2:
-		writew(*(uint16_t *)data, addr);
-		break;
-	case 4:
-		writel(*(uint32_t *)data, addr);
-		break;
-	case 8:
-		writeq(*(uint64_t *)data, addr);
-		break;
-	default:
-		ret = -1;
-		break;
-	}
-	write_unlock_irqrestore(&adapter->adapter_lock, flags);
-	if (mem_ptr)
-		iounmap(mem_ptr);
-	return ret;
-}
-
 #define MAX_CTL_CHECK   1000
 
 static int
@@ -1667,19 +1471,28 @@ netxen_nic_pci_mem_write_128M(struct netxen_adapter *adapter,
 	uint64_t      off8, tmpw, word[2] = {0, 0};
 	void __iomem *mem_crb;
 
-	/*
-	 * If not MN, go check for MS or invalid.
-	 */
-	if (netxen_nic_pci_mem_bound_check(adapter, off, size) == 0)
-		return netxen_nic_pci_mem_write_direct(adapter,
-				off, data, size);
+	if (size != 8)
+		return -EIO;
+
+	if (ADDR_IN_RANGE(off, NETXEN_ADDR_QDR_NET,
+				NETXEN_ADDR_QDR_NET_MAX_P2)) {
+		mem_crb = pci_base_offset(adapter, NETXEN_CRB_QDR_NET);
+		goto correct;
+	}
+
+	if (ADDR_IN_RANGE(off, NETXEN_ADDR_DDR_NET, NETXEN_ADDR_DDR_NET_MAX)) {
+		mem_crb = pci_base_offset(adapter, NETXEN_CRB_DDR_NET);
+		goto correct;
+	}
+
+	return -EIO;
 
+correct:
 	off8 = off & 0xfffffff8;
 	off0 = off & 0x7;
 	sz[0] = (size < (8 - off0)) ? size : (8 - off0);
 	sz[1] = size - sz[0];
 	loop = ((off0 + size - 1) >> 3) + 1;
-	mem_crb = pci_base_offset(adapter, NETXEN_CRB_DDR_NET);
 
 	if ((size != 8) || (off0 != 0))  {
 		for (i = 0; i < loop; i++) {
@@ -1760,20 +1573,29 @@ netxen_nic_pci_mem_read_128M(struct netxen_adapter *adapter,
 	uint64_t      off8, val, word[2] = {0, 0};
 	void __iomem *mem_crb;
 
+	if (size != 8)
+		return -EIO;
 
-	/*
-	 * If not MN, go check for MS or invalid.
-	 */
-	if (netxen_nic_pci_mem_bound_check(adapter, off, size) == 0)
-		return netxen_nic_pci_mem_read_direct(adapter, off, data, size);
+	if (ADDR_IN_RANGE(off, NETXEN_ADDR_QDR_NET,
+				NETXEN_ADDR_QDR_NET_MAX_P2)) {
+		mem_crb = pci_base_offset(adapter, NETXEN_CRB_QDR_NET);
+		goto correct;
+	}
 
+	if (ADDR_IN_RANGE(off, NETXEN_ADDR_DDR_NET, NETXEN_ADDR_DDR_NET_MAX)) {
+		mem_crb = pci_base_offset(adapter, NETXEN_CRB_DDR_NET);
+		goto correct;
+	}
+
+	return -EIO;
+
+correct:
 	off8 = off & 0xfffffff8;
 	off0[0] = off & 0x7;
 	off0[1] = 0;
 	sz[0] = (size < (8 - off0[0])) ? size : (8 - off0[0]);
 	sz[1] = size - sz[0];
 	loop = ((off0[0] + size - 1) >> 3) + 1;
-	mem_crb = pci_base_offset(adapter, NETXEN_CRB_DDR_NET);
 
 	write_lock_irqsave(&adapter->adapter_lock, flags);
 	netxen_nic_pci_change_crbwindow_128M(adapter, 0);
@@ -1847,20 +1669,26 @@ netxen_nic_pci_mem_write_2M(struct netxen_adapter *adapter,
 {
 	int i, j, ret = 0, loop, sz[2], off0;
 	uint32_t temp;
-	uint64_t off8, mem_crb, tmpw, word[2] = {0, 0};
+	uint64_t off8, tmpw, word[2] = {0, 0};
+	void __iomem *mem_crb;
 
-	/*
-	 * If not MN, go check for MS or invalid.
-	 */
-	if (off >= NETXEN_ADDR_QDR_NET && off <= NETXEN_ADDR_QDR_NET_MAX_P3)
-		mem_crb = NETXEN_CRB_QDR_NET;
-	else {
-		mem_crb = NETXEN_CRB_DDR_NET;
-		if (netxen_nic_pci_mem_bound_check(adapter, off, size) == 0)
-			return netxen_nic_pci_mem_write_direct(adapter,
-					off, data, size);
+	if (size != 8)
+		return -EIO;
+
+	if (ADDR_IN_RANGE(off, NETXEN_ADDR_QDR_NET,
+				NETXEN_ADDR_QDR_NET_MAX_P3)) {
+		mem_crb = netxen_get_ioaddr(adapter, NETXEN_CRB_QDR_NET);
+		goto correct;
 	}
 
+	if (ADDR_IN_RANGE(off, NETXEN_ADDR_DDR_NET, NETXEN_ADDR_DDR_NET_MAX)) {
+		mem_crb = netxen_get_ioaddr(adapter, NETXEN_CRB_DDR_NET);
+		goto correct;
+	}
+
+	return -EIO;
+
+correct:
 	off8 = off & 0xfffffff8;
 	off0 = off & 0x7;
 	sz[0] = (size < (8 - off0)) ? size : (8 - off0);
@@ -1906,21 +1734,18 @@ netxen_nic_pci_mem_write_2M(struct netxen_adapter *adapter,
 	 */
 
 	for (i = 0; i < loop; i++) {
-		temp = off8 + (i << 3);
-		NXWR32(adapter, mem_crb+MIU_TEST_AGT_ADDR_LO, temp);
-		temp = 0;
-		NXWR32(adapter, mem_crb+MIU_TEST_AGT_ADDR_HI, temp);
-		temp = word[i] & 0xffffffff;
-		NXWR32(adapter, mem_crb+MIU_TEST_AGT_WRDATA_LO, temp);
-		temp = (word[i] >> 32) & 0xffffffff;
-		NXWR32(adapter, mem_crb+MIU_TEST_AGT_WRDATA_HI, temp);
-		temp = MIU_TA_CTL_ENABLE | MIU_TA_CTL_WRITE;
-		NXWR32(adapter, mem_crb+MIU_TEST_AGT_CTRL, temp);
-		temp = MIU_TA_CTL_START | MIU_TA_CTL_ENABLE | MIU_TA_CTL_WRITE;
-		NXWR32(adapter, mem_crb+MIU_TEST_AGT_CTRL, temp);
+		writel(off8 + (i << 3), mem_crb+MIU_TEST_AGT_ADDR_LO);
+		writel(0, mem_crb+MIU_TEST_AGT_ADDR_HI);
+		writel(word[i] & 0xffffffff, mem_crb+MIU_TEST_AGT_WRDATA_LO);
+		writel((word[i] >> 32) & 0xffffffff,
+				mem_crb+MIU_TEST_AGT_WRDATA_HI);
+		writel((MIU_TA_CTL_ENABLE | MIU_TA_CTL_WRITE),
+				mem_crb+MIU_TEST_AGT_CTRL);
+		writel(MIU_TA_CTL_START | MIU_TA_CTL_ENABLE | MIU_TA_CTL_WRITE,
+				mem_crb+MIU_TEST_AGT_CTRL);
 
 		for (j = 0; j < MAX_CTL_CHECK; j++) {
-			temp = NXRD32(adapter, mem_crb + MIU_TEST_AGT_CTRL);
+			temp = readl(mem_crb + MIU_TEST_AGT_CTRL);
 			if ((temp & MIU_TA_CTL_BUSY) == 0)
 				break;
 		}
@@ -1947,21 +1772,26 @@ netxen_nic_pci_mem_read_2M(struct netxen_adapter *adapter,
 {
 	int i, j = 0, k, start, end, loop, sz[2], off0[2];
 	uint32_t      temp;
-	uint64_t      off8, val, mem_crb, word[2] = {0, 0};
+	uint64_t      off8, val, word[2] = {0, 0};
+	void __iomem *mem_crb;
 
-	/*
-	 * If not MN, go check for MS or invalid.
-	 */
+	if (size != 8)
+		return -EIO;
+
+	if (ADDR_IN_RANGE(off, NETXEN_ADDR_QDR_NET,
+				NETXEN_ADDR_QDR_NET_MAX_P3)) {
+		mem_crb = netxen_get_ioaddr(adapter, NETXEN_CRB_QDR_NET);
+		goto correct;
+	}
 
-	if (off >= NETXEN_ADDR_QDR_NET && off <= NETXEN_ADDR_QDR_NET_MAX_P3)
-		mem_crb = NETXEN_CRB_QDR_NET;
-	else {
-		mem_crb = NETXEN_CRB_DDR_NET;
-		if (netxen_nic_pci_mem_bound_check(adapter, off, size) == 0)
-			return netxen_nic_pci_mem_read_direct(adapter,
-					off, data, size);
+	if (ADDR_IN_RANGE(off, NETXEN_ADDR_DDR_NET, NETXEN_ADDR_DDR_NET_MAX)) {
+		mem_crb = netxen_get_ioaddr(adapter, NETXEN_CRB_DDR_NET);
+		goto correct;
 	}
 
+	return -EIO;
+
+correct:
 	off8 = off & 0xfffffff8;
 	off0[0] = off & 0x7;
 	off0[1] = 0;
@@ -1976,17 +1806,14 @@ netxen_nic_pci_mem_read_2M(struct netxen_adapter *adapter,
 	 */
 
 	for (i = 0; i < loop; i++) {
-		temp = off8 + (i << 3);
-		NXWR32(adapter, mem_crb + MIU_TEST_AGT_ADDR_LO, temp);
-		temp = 0;
-		NXWR32(adapter, mem_crb + MIU_TEST_AGT_ADDR_HI, temp);
-		temp = MIU_TA_CTL_ENABLE;
-		NXWR32(adapter, mem_crb + MIU_TEST_AGT_CTRL, temp);
-		temp = MIU_TA_CTL_START | MIU_TA_CTL_ENABLE;
-		NXWR32(adapter, mem_crb + MIU_TEST_AGT_CTRL, temp);
+		writel(off8 + (i << 3), mem_crb + MIU_TEST_AGT_ADDR_LO);
+		writel(0, mem_crb + MIU_TEST_AGT_ADDR_HI);
+		writel(MIU_TA_CTL_ENABLE, mem_crb + MIU_TEST_AGT_CTRL);
+		writel(MIU_TA_CTL_START | MIU_TA_CTL_ENABLE,
+				mem_crb + MIU_TEST_AGT_CTRL);
 
 		for (j = 0; j < MAX_CTL_CHECK; j++) {
-			temp = NXRD32(adapter, mem_crb + MIU_TEST_AGT_CTRL);
+			temp = readl(mem_crb + MIU_TEST_AGT_CTRL);
 			if ((temp & MIU_TA_CTL_BUSY) == 0)
 				break;
 		}
@@ -2001,8 +1828,7 @@ netxen_nic_pci_mem_read_2M(struct netxen_adapter *adapter,
 		start = off0[i] >> 2;
 		end   = (off0[i] + sz[i] - 1) >> 2;
 		for (k = start; k <= end; k++) {
-			temp = NXRD32(adapter,
-				mem_crb + MIU_TEST_AGT_RDDATA(k));
+			temp = readl(mem_crb + MIU_TEST_AGT_RDDATA(k));
 			word[i] |= ((uint64_t)temp << (32 * k));
 		}
 	}
diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index 81c24a7..c6f32e5 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -643,9 +643,10 @@ netxen_setup_pci_map(struct netxen_adapter *adapter)
 		adapter->ahw.ddr_mn_window = 0;
 		adapter->ahw.qdr_sn_window = 0;
 
-		adapter->ahw.mn_win_crb = 0x100000 + PCIX_MN_WINDOW +
-			(pci_func * 0x20);
-		adapter->ahw.ms_win_crb = 0x100000 + PCIX_SN_WINDOW;
+		adapter->ahw.mn_win_crb = NETXEN_PCI_CRBSPACE +
+			0x100000 + PCIX_MN_WINDOW + (pci_func * 0x20);
+		adapter->ahw.ms_win_crb = NETXEN_PCI_CRBSPACE +
+			0x100000 + PCIX_SN_WINDOW;
 		if (pci_func < 4)
 			adapter->ahw.ms_win_crb += (pci_func * 0x20);
 		else
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH 4/5] netxen: fix tx timeout recovery
From: Dhananjay Phadke @ 2009-09-11 21:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, Amit Kumar Salecha, Amit Kumar Salecha
In-Reply-To: <1252704495-16118-1-git-send-email-dhananjay@netxen.com>

From: Amit Kumar Salecha <amit@qlogic.com>

Redesign tx timeout handling in line with new firmware
reset design that co-ordinates with other PCI function
drivers.

o For NX3031, first try to reset PCI function's own
  context before requesting firmware reset.

o For NX2031, since firmware heartbeat is not supported,
  directly request firmware reset.
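
The escalation order can be sketched as a small pure function. This is
only an illustration modelled on the control flow of
netxen_tx_timeout_task() in the diff below; the enum and the helper
tx_timeout_action() are hypothetical names and are not part of the
driver. Note that in the diff the NX2031 (P2) path first re-arms NAPI
and only requests a firmware reset once the timeout count reaches
NX_MAX_TX_TIMEOUTS.

```c
#include <stdbool.h>

#define NX_MAX_TX_TIMEOUTS 2	/* same value the patch adds to netxen_nic.h */

/* Hypothetical result type, for illustration only. */
enum tx_recovery {
	TX_SCRUB_INTERRUPT,	/* P2: disable/enable NAPI, wake the queue */
	TX_RESET_CONTEXT,	/* P3: reset this PCI function's own context */
	TX_REQUEST_FW_RESET,	/* escalate: set need_fw_reset */
};

/*
 * Decide what to do for one tx timeout. The counter is bumped on every
 * timeout; the patch clears it again when the tx queue is woken.
 * If TX_RESET_CONTEXT is returned and the context reset fails, the
 * caller is expected to fall through to a firmware reset request.
 */
static enum tx_recovery tx_timeout_action(unsigned int *timeo_cnt, bool is_p2)
{
	if (++(*timeo_cnt) >= NX_MAX_TX_TIMEOUTS)
		return TX_REQUEST_FW_RESET;

	return is_p2 ? TX_SCRUB_INTERRUPT : TX_RESET_CONTEXT;
}
```

With a fresh counter, the first timeout yields the light-weight action
for the chip revision and the second consecutive timeout escalates.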

Signed-off-by: Amit Kumar Salecha <amit@netxen.com>
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
 drivers/net/netxen/netxen_nic.h      |    4 +-
 drivers/net/netxen/netxen_nic_init.c |    4 +-
 drivers/net/netxen/netxen_nic_main.c |   69 ++++++++++++++++++++++++++++-----
 3 files changed, 64 insertions(+), 13 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index 7e3d2b9..1ae46e8 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -178,6 +178,7 @@
 
 #define MAX_BUFFERS_PER_CMD	32
 #define TX_STOP_THRESH		((MAX_SKB_FRAGS >> 2) + 4)
+#define NX_MAX_TX_TIMEOUTS	2
 
 /*
  * Following are the states of the Phantom. Phantom will set them and
@@ -1145,7 +1146,8 @@ struct netxen_adapter {
 	u8 link_changed;
 	u8 fw_wait_cnt;
 	u8 fw_fail_cnt;
-	u16 resv4;
+	u8 tx_timeo_cnt;
+	u8 need_fw_reset;
 
 	u8 has_link_events;
 	u8 fw_type;
diff --git a/drivers/net/netxen/netxen_nic_init.c b/drivers/net/netxen/netxen_nic_init.c
index 8926b0e..128d1b6 100644
--- a/drivers/net/netxen/netxen_nic_init.c
+++ b/drivers/net/netxen/netxen_nic_init.c
@@ -1434,8 +1434,10 @@ int netxen_process_cmd_ring(struct netxen_adapter *adapter)
 
 		if (netif_queue_stopped(netdev) && netif_carrier_ok(netdev)) {
 			__netif_tx_lock(tx_ring->txq, smp_processor_id());
-			if (netxen_tx_avail(tx_ring) > TX_STOP_THRESH)
+			if (netxen_tx_avail(tx_ring) > TX_STOP_THRESH) {
 				netif_wake_queue(netdev);
+				adapter->tx_timeo_cnt = 0;
+			}
 			__netif_tx_unlock(tx_ring->txq);
 		}
 	}
diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index 4668a72..2b1b939 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -66,7 +66,7 @@ static int netxen_nic_close(struct net_device *netdev);
 static netdev_tx_t netxen_nic_xmit_frame(struct sk_buff *,
 					       struct net_device *);
 static void netxen_tx_timeout(struct net_device *netdev);
-static void netxen_reset_task(struct work_struct *work);
+static void netxen_tx_timeout_task(struct work_struct *work);
 static void netxen_fw_poll_work(struct work_struct *work);
 static void netxen_schedule_work(struct netxen_adapter *adapter,
 		work_func_t func, int delay);
@@ -875,6 +875,8 @@ wait_init:
 
 	netxen_check_options(adapter);
 
+	adapter->need_fw_reset = 0;
+
 	/* fall through and release firmware */
 
 err_out:
@@ -1183,7 +1185,7 @@ netxen_setup_netdev(struct netxen_adapter *adapter,
 
 	netdev->irq = adapter->msix_entries[0].vector;
 
-	INIT_WORK(&adapter->tx_timeout_task, netxen_reset_task);
+	INIT_WORK(&adapter->tx_timeout_task, netxen_tx_timeout_task);
 
 	if (netxen_read_mac_addr(adapter))
 		dev_warn(&pdev->dev, "failed to read mac addr\n");
@@ -1882,7 +1884,7 @@ static void netxen_tx_timeout(struct net_device *netdev)
 	schedule_work(&adapter->tx_timeout_task);
 }
 
-static void netxen_reset_task(struct work_struct *work)
+static void netxen_tx_timeout_task(struct work_struct *work)
 {
 	struct netxen_adapter *adapter =
 		container_of(work, struct netxen_adapter, tx_timeout_task);
@@ -1890,15 +1892,37 @@ static void netxen_reset_task(struct work_struct *work)
 	if (!netif_running(adapter->netdev))
 		return;
 
-	if (test_bit(__NX_RESETTING, &adapter->state))
+	if (test_and_set_bit(__NX_RESETTING, &adapter->state))
 		return;
 
-	netxen_napi_disable(adapter);
+	if (++adapter->tx_timeo_cnt >= NX_MAX_TX_TIMEOUTS)
+		goto request_reset;
+
+	if (NX_IS_REVISION_P2(adapter->ahw.revision_id)) {
+		/* try to scrub interrupt */
+		netxen_napi_disable(adapter);
 
-	adapter->netdev->trans_start = jiffies;
+		adapter->netdev->trans_start = jiffies;
 
-	netxen_napi_enable(adapter);
-	netif_wake_queue(adapter->netdev);
+		netxen_napi_enable(adapter);
+
+		netif_wake_queue(adapter->netdev);
+
+		goto done;
+
+	} else {
+		if (!netxen_nic_reset_context(adapter)) {
+			adapter->netdev->trans_start = jiffies;
+			goto done;
+		}
+
+		/* context reset failed, fall through for fw reset */
+	}
+
+request_reset:
+	adapter->need_fw_reset = 1;
+done:
+	clear_bit(__NX_RESETTING, &adapter->state);
 }
 
 struct net_device_stats *netxen_nic_get_stats(struct net_device *netdev)
@@ -2048,6 +2072,22 @@ nx_decr_dev_ref_cnt(struct netxen_adapter *adapter)
 	return count;
 }
 
+static void
+nx_dev_request_reset(struct netxen_adapter *adapter)
+{
+	u32 state;
+
+	if (netxen_api_lock(adapter))
+		return;
+
+	state = NXRD32(adapter, NX_CRB_DEV_STATE);
+
+	if (state != NX_DEV_INITALIZING)
+		NXWR32(adapter, NX_CRB_DEV_STATE, NX_DEV_NEED_RESET);
+
+	netxen_api_unlock(adapter);
+}
+
 static int
 netxen_can_start_firmware(struct netxen_adapter *adapter)
 {
@@ -2133,9 +2173,11 @@ netxen_fwinit_work(struct work_struct *work)
 	switch (dev_state) {
 	case NX_DEV_COLD:
 	case NX_DEV_READY:
-		netxen_start_firmware(adapter);
-		netxen_schedule_work(adapter, netxen_attach_work, 0);
-		return;
+		if (!netxen_start_firmware(adapter)) {
+			netxen_schedule_work(adapter, netxen_attach_work, 0);
+			return;
+		}
+		break;
 
 	case NX_DEV_INITALIZING:
 		if (++adapter->fw_wait_cnt < FW_POLL_THRESH) {
@@ -2195,6 +2237,11 @@ netxen_check_health(struct netxen_adapter *adapter)
 	if (netxen_nic_check_temp(adapter))
 		goto detach;
 
+	if (adapter->need_fw_reset) {
+		nx_dev_request_reset(adapter);
+		goto detach;
+	}
+
 	state = NXRD32(adapter, NX_CRB_DEV_STATE);
 	if (state == NX_DEV_NEED_RESET)
 		goto detach;
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH 3/5] netxen: fix file firmware leak
From: Dhananjay Phadke @ 2009-09-11 21:28 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1252704495-16118-1-git-send-email-dhananjay@netxen.com>

Release file firmware when no firmware reset is required.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
 drivers/net/netxen/netxen_nic_main.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index c6f32e5..4668a72 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -817,7 +817,7 @@ netxen_start_firmware(struct netxen_adapter *adapter)
 	if (err < 0)
 		goto err_out;
 	if (err == 0)
-		goto wait_init;
+		goto ready;
 
 	if (first_boot != 0x55555555) {
 		NXWR32(adapter, CRB_CMDPEG_STATE, 0);
@@ -860,6 +860,7 @@ netxen_start_firmware(struct netxen_adapter *adapter)
 		| (_NETXEN_NIC_LINUX_SUBVERSION);
 	NXWR32(adapter, CRB_DRIVER_VERSION, val);
 
+ready:
 	NXWR32(adapter, NX_CRB_DEV_STATE, NX_DEV_READY);
 
 wait_init:
@@ -874,7 +875,7 @@ wait_init:
 
 	netxen_check_options(adapter);
 
-	return 0;
+	/* fall through and release firmware */
 
 err_out:
 	netxen_release_firmware(adapter);
-- 
1.6.0.2


^ permalink raw reply related

* Re: [PATCH 1/8] networking/fanotify: declare fanotify socket numbers
From: Jamie Lokier @ 2009-09-11 21:27 UTC (permalink / raw)
  To: Eric Paris
  Cc: David Miller, linux-kernel, linux-fsdevel, netdev, viro, alan,
	hch
In-Reply-To: <1252703626.2305.50.camel@dhcp231-106.rdu.redhat.com>

Eric Paris wrote:
> On Fri, 2009-09-11 at 21:46 +0100, Jamie Lokier wrote:
> > Eric Paris wrote:
> > > > I would really prefer if you worked on eliminating the problem that
> > > > prevents you from using netlink instead.
> > > 
> > > I'm not really sure if I can, although I'd love to hear input from
> > > someone who knows the netlink code on how I can make it do what I need.
> > > I'm really not duplicating much other than the NLMSG_OK and NLMSG_NEXT
> > > macros.  My code doesn't even use skbs and I'm not savy enough to really
> > > know how I could.  I'm more than willing to work on it if someone can
> > > point me to how it might work.
> > 
> > Let's turn the question around.
> > 
> > Since you're doing lots of non-sockety things, and can't tolerate
> > dropped packets - why isn't it a character device?  What's the reason
> > for using a socket at all?
> > 
> > (I'm reminded of /dev/poll, /dev/epoll and /dev/inotify :-)
> 
> Originally it was a char device and I was told to use a socket protocol
> so I could use get/set sockopt rather than ioctl, because ioctl is the
> devil (even if those aren't THAT much better).
> 
> The queuing being done using events instead of skbs was done reusing
> inotify code, reusing network code would be just as good with me.  What
> I really need is a way to convey a pointer from one process to another.
> That's why I claim loss is not an option, since I'm holding a reference
> to the pointer I can't have that conveyance disappear under us.

It's fine as long as the disappearing knows to release the reference.
But I suspect fanotify would be awfully hard to use if messages were
unreliable.

> If network people want me to get back out of the network system I can go
> back to a char file with lots of ioctls.  I'd love to reuse code, I just
> don't know what's possible...

Ok.  I understand you're pushed in different directions by different
schools of thought.

Let's look at some history.  What happened to /dev/epoll.  It worked
very well (and several OSes have /dev/poll which is similar).  There
was no technical reason to change the interface.

But when it came to mainlining it, Linus objected, and forced it to
become a small set of system calls.  It's quite a nice interface to
use now.

Then /dev/inotify.  You know what happened.  The history was similar:
Linus objected to the device, and forced it to use a few system calls.

More recently, people skipped over the /dev path, having seen how it
went before, and just implemented things like timerfd, eventfd and
signalfd system calls.

That seems to be the Linux way - if the interface can be exposed as a
small set of sensible system calls, and it's really a core kernel
facility.

Does fanotify need "lots of ioctls", or could it fit comfortably into
say 2-5 strongly typed system calls, like inotify and epoll do?
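
For reference, this is roughly what the epoll shape looks like from
user space: three calls (create, ctl, wait) and one small struct.  A
minimal sketch only; wait_readable() is a made-up helper name, and a
real program would keep the epoll fd around instead of creating one
per call.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Returns 1 if fd becomes readable within timeout_ms, 0 on timeout,
 * -1 on error.  Hypothetical helper, for illustration only.
 */
static int wait_readable(int fd, int timeout_ms)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
	struct epoll_event out;
	int epfd, n;

	epfd = epoll_create1(0);		/* call 1: create the instance */
	if (epfd < 0)
		return -1;

	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {	/* call 2: register */
		close(epfd);
		return -1;
	}

	n = epoll_wait(epfd, &out, 1, timeout_ms);	/* call 3: wait */
	close(epfd);
	return n;
}
```

Every argument is typed, there is no request-number multiplexing as
with ioctl, and that is the property I am asking about for fanotify.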

-- Jamie

^ permalink raw reply

* Re: [PATCH 1/8] networking/fanotify: declare fanotify socket numbers
From: jamal @ 2009-09-11 21:21 UTC (permalink / raw)
  To: Eric Paris
  Cc: David Miller, linux-kernel, linux-fsdevel, netdev, viro, alan,
	hch, balbir
In-Reply-To: <1252697613.2305.38.camel@dhcp231-106.rdu.redhat.com>

On Fri, 2009-09-11 at 15:33 -0400, Eric Paris wrote:

> 
> I'm willing to try to make this happen, I'm just sure I see the benefit
> and I don't know anyone who knows the net/netlink code well enough who
> owuld be interested to help.....
> 

I can help you on the netlink side but maybe we need to define the
problem scope properly first. I don't know anything about fanotify, but
then I don't need to. Can I re-phrase your problem as follows:

You have kernel code which wishes to communicate information to multiple
listeners in user space. The listeners register some form of filter
(that you term as a monitor) for an object of interest.  For a specific
filter you wish to send them events when something happens to
the object they registered for.
Would the above be accurate?

One good place to start is the taskstats code in Linux. Look at
Documentation/accounting in the kernel code. I've also CCed Balbir, who
was involved in that work, since I think it bears similarity to what you
are trying to achieve conceptually.

To your other statements:
0) A single skb can be used to send to all the listeners.
1) Netlink messages won't get lost unless the listener is not keeping up
and the kernel sending it messages ends up filling its queues. In such a
case your event message will be delivered to the 49 other users but
not the overloaded one. You can add sequence numbers to the event
messages you send to the listeners; any gap in the sequence of received
events implies lost events. You can add a mechanism for user space to
query the kernel when something like that gets lost.
2) Your architecture has to take care of maintaining the state of what
you want to deliver. So your editing has nothing to do with skbs,
i.e. an event happens, you update your state. If you need to send the
event to the listeners, you alloc an skb, populate it with the info, and
multicast it to all the listeners. If something else happens, I would
suggest for the sake of simplicity you rinse and repeat. Sure, the listener
may get contradicting events, but they should be able to handle it.
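
The sequence number idea in 1) could look something like this on the
listener side. It is only a sketch in plain C with no netlink calls;
struct ev_listener and ev_listener_receive() are hypothetical names,
and it assumes the kernel side stamps each event with a 32-bit counter
that increases by one per event.

```c
#include <stdint.h>

/* Hypothetical per-listener bookkeeping, for illustration only. */
struct ev_listener {
	uint32_t last_seq;	/* sequence number of the last event seen */
	int have_seq;		/* 0 until the first event arrives */
	uint64_t lost;		/* running total of missed events */
};

/*
 * Record one received event and return how many events were missed
 * since the previous one.  Unsigned arithmetic makes the subtraction
 * work across counter wrap-around.  Duplicated or reordered events are
 * not handled here and would show up as a very large gap.
 */
static uint32_t ev_listener_receive(struct ev_listener *l, uint32_t seq)
{
	uint32_t missed = 0;

	if (l->have_seq)
		missed = seq - l->last_seq - 1;

	l->last_seq = seq;
	l->have_seq = 1;
	l->lost += missed;
	return missed;
}
```

A non-zero return is the cue for the listener to go and query the
kernel for current state.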

I hope this helps...

cheers,
jamal

^ permalink raw reply

* Re: [net-next PATCH] etherdevice.h: random_ether_addr update
From: Stephen Hemminger @ 2009-09-11 21:15 UTC (permalink / raw)
  To: Joe Perches
  Cc: David Miller, jeffrey.t.kirsher, netdev, gospo, gregory.v.rose,
	donald.c.skidmore
In-Reply-To: <1252700442.15292.62.camel@Joe-Laptop.home>

On Fri, 11 Sep 2009 13:20:42 -0700
Joe Perches <joe@perches.com> wrote:

> On Fri, 2009-09-11 at 12:15 -0700, David Miller wrote:
> > From: Joe Perches <joe@perches.com>
> > Date: Thu, 10 Sep 2009 20:02:43 -0700
> > > On Thu, 2009-09-10 at 19:07 -0700, Stephen Hemminger wrote:
> > >> On Thu, 10 Sep 2009 18:48:27 -0700
> > >> Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> > >> > From: Gregory Rose <gregory.v.rose@intel.com>
> > >> > This patch changes the default VF MAC address generation to use an Intel
> > >> > Organizational Unit Identifier (OUI), instead of a fully randomized
> > >> > Ethernet address.  This is to help prevent accidental MAC address
> > >> > collisions.
> > > I think this not a very good idea.
> > I also completely agree that this patch is not a wise move.
> 
> Perhaps this?
> 
> random_ether_address should not assign an "0x02" leading octet.
> 
> "02" has the local assignment bit set,
> but is actually a value assigned via OUI.
> 
> Do not use get_random_bytes to avoid drawing down entropy pool.
> 

Getting 6 bytes once is not going to be enough of a problem
to drain the pool. I prefer not to weaken the randomness here.
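
For anyone following along, the two bits under discussion live in the
first octet. A minimal user-space sketch, assuming the caller has
already filled the buffer with random bytes (in the kernel that would
be get_random_bytes, as mentioned above); make_local_unicast() is a
made-up name for illustration:

```c
#include <stdint.h>

#define ETH_ALEN 6

/*
 * Turn six arbitrary bytes into a locally administered unicast
 * address by fixing up the two low bits of the first octet.
 */
static void make_local_unicast(uint8_t addr[ETH_ALEN])
{
	addr[0] &= 0xfe;	/* clear the multicast (group) bit */
	addr[0] |= 0x02;	/* set the locally administered bit */
}
```

The other six bits of the first octet stay random, so the result is
not limited to a literal 0x02 leading octet.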


-- 

^ permalink raw reply

* Re: [PATCH 1/8] networking/fanotify: declare fanotify socket numbers
From: Eric Paris @ 2009-09-11 21:13 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: David Miller, linux-kernel, linux-fsdevel, netdev, viro, alan,
	hch
In-Reply-To: <20090911204602.GA19371@shareable.org>

On Fri, 2009-09-11 at 21:46 +0100, Jamie Lokier wrote:
> Eric Paris wrote:
> > > I would really prefer if you worked on eliminating the problem that
> > > prevents you from using netlink instead.
> > 
> > I'm not really sure if I can, although I'd love to hear input from
> > someone who knows the netlink code on how I can make it do what I need.
> > I'm really not duplicating much other than the NLMSG_OK and NLMSG_NEXT
> > macros.  My code doesn't even use skbs and I'm not savy enough to really
> > know how I could.  I'm more than willing to work on it if someone can
> > point me to how it might work.
> 
> Let's turn the question around.
> 
> Since you're doing lots of non-sockety things, and can't tolerate
> dropped packets - why isn't it a character device?  What's the reason
> for using a socket at all?
> 
> (I'm reminded of /dev/poll, /dev/epoll and /dev/inotify :-)

Originally it was a char device and I was told to use a socket protocol
so I could use get/set sockopt rather than ioctl, because ioctl is the
devil (even if those aren't THAT much better).

The queuing being done using events instead of skbs was done reusing
inotify code, reusing network code would be just as good with me.  What
I really need is a way to convey a pointer from one process to another.
That's why I claim loss is not an option, since I'm holding a reference
to the pointer I can't have that conveyance disappear under us.

If network people want me to get back out of the network system I can go
back to a char file with lots of ioctls.  I'd love to reuse code, I just
don't know what's possible...

-Eric

^ permalink raw reply

* [PATCH 4/4] bonding: add sysfs files to display tlb and alb hash table contents
From: Andy Gospodarek @ 2009-09-11 21:13 UTC (permalink / raw)
  To: netdev, fubar, bonding-devel


bonding: add sysfs files to display tlb and alb hash table contents

While debugging some problems with alb (mode 6) bonding I realized that
being able to output the contents of both hash tables would be helpful.
This is what the output looks like for the two files:

device  load
eth1    491
eth2    491
hash device   last device   tx bytes       load        next previous
2    eth1     eth1          2254           491         0    0
3    eth2     eth2          2744           491         0    0
6             eth2          0              488         0    0
8             eth2          0              461698      0    0
1b            eth2          0              249         0    0
eb            eth2          0              21          0    0
ff            eth2          0              22          0    0

hash ip_src          ip_dst          mac_dst           slave assign ntt
2    10.0.3.2        10.0.3.11       00:e0:81:71:ee:a9 eth1  1      0
3    10.0.3.2        10.0.3.10       00:e0:81:71:ee:a9 eth2  1      0
8    10.0.3.2        10.0.3.1        00:e0:81:71:ee:a9 eth2  1      0

These were a great help debugging the fixes I have just posted and they
might be helpful for others, so I decided to include them in my
patchset.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>

---
 drivers/net/bonding/bond_alb.c   |   61 ++++++++++++++++++++++++++++++++++++++
 drivers/net/bonding/bond_alb.h   |    2 +
 drivers/net/bonding/bond_sysfs.c |   40 +++++++++++++++++++++++++
 3 files changed, 103 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 7db8835..4e930e3 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -778,6 +778,67 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
 	return tx_slave;
 }
 
+int rlb_print_rx_hashtbl(struct bonding *bond, char *buf)
+{
+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
+	struct rlb_client_info *client_info;
+	u32 hash_index;
+	u32 count = 0;
+
+	_lock_rx_hashtbl(bond);
+
+	count = sprintf(buf, "hash ip_src          ip_dst          mac_dst           slave assign ntt\n");
+	hash_index = bond_info->rx_hashtbl_head;
+	for (; hash_index != RLB_NULL_INDEX; hash_index = client_info->next) {
+		client_info = &(bond_info->rx_hashtbl[hash_index]);
+		count += sprintf(buf + count,"%-4x %-15pi4 %-15pi4 %pM %-5s %-6d %d\n",
+				 hash_index,
+				 &client_info->ip_src,
+				 &client_info->ip_dst,
+				 client_info->mac_dst,
+				 client_info->slave->dev->name,
+				 client_info->assigned,
+				 client_info->ntt);
+	}
+
+	_unlock_rx_hashtbl(bond);
+	return count;
+}
+
+int tlb_print_tx_hashtbl(struct bonding *bond, char *buf)
+{
+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
+	u32 hash_index;
+	u32 count = 0;
+	struct slave *slave;
+	int i;
+
+	_lock_tx_hashtbl(bond);
+
+	count += sprintf(buf, "device  load\n");
+	bond_for_each_slave(bond, slave, i) {
+		struct tlb_slave_info *slave_info = &(SLAVE_TLB_INFO(slave));
+		count += sprintf(buf + count,"%-7s %d\n",slave->dev->name,slave_info->load);
+	}
+	count += sprintf(buf + count, "hash device   last device   tx bytes       load        next previous\n");
+	for (hash_index = 0; hash_index < TLB_HASH_TABLE_SIZE; hash_index++) {
+		struct tlb_client_info *client_info = &(bond_info->tx_hashtbl[hash_index]);
+		if (client_info->tx_slave || client_info->last_slave) {
+			count += sprintf(buf + count,"%-4x %-8s %-13s %-14d %-11d %-4x %d\n",
+					 hash_index,
+					 (client_info->tx_slave) ? client_info->tx_slave->dev->name : "",
+					 (client_info->last_slave) ? client_info->last_slave->dev->name : "",
+					 client_info->tx_bytes,
+					 client_info->load_history,
+					 (client_info->next != TLB_NULL_INDEX) ? client_info->next : 0,
+					 (client_info->prev != TLB_NULL_INDEX) ? client_info->prev : 0);
+		}
+	}
+
+	_unlock_tx_hashtbl(bond);
+	return count;
+}
+
 /* Caller must hold rx_hashtbl lock */
 static void rlb_init_table_entry(struct rlb_client_info *entry)
 {
diff --git a/drivers/net/bonding/bond_alb.h b/drivers/net/bonding/bond_alb.h
index b65fd29..8543447 100644
--- a/drivers/net/bonding/bond_alb.h
+++ b/drivers/net/bonding/bond_alb.h
@@ -132,5 +132,7 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev);
 void bond_alb_monitor(struct work_struct *);
 int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr);
 void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id);
+int rlb_print_rx_hashtbl(struct bonding *bond, char *buf);
+int tlb_print_tx_hashtbl(struct bonding *bond, char *buf);
 #endif /* __BOND_ALB_H__ */
 
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 55bf34f..1123e1f 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1480,6 +1480,44 @@ static ssize_t bonding_show_ad_partner_mac(struct device *d,
 static DEVICE_ATTR(ad_partner_mac, S_IRUGO, bonding_show_ad_partner_mac, NULL);
 
 
+/*
+ * Show current tlb/alb tx hash table.
+ */
+static ssize_t bonding_show_tlb_tx_hash(struct device *d,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	int count = 0;
+	struct bonding *bond = to_bond(d);
+
+	if (bond->params.mode == BOND_MODE_ALB ||
+	    bond->params.mode == BOND_MODE_TLB) {
+		count = tlb_print_tx_hashtbl(bond, buf);
+	}
+
+	return count;
+}
+static DEVICE_ATTR(tlb_tx_hash, S_IRUGO, bonding_show_tlb_tx_hash, NULL);
+
+
+/*
+ * Show current alb rx hash table.
+ */
+static ssize_t bonding_show_alb_rx_hash(struct device *d,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	int count = 0;
+	struct bonding *bond = to_bond(d);
+
+	if (bond->params.mode == BOND_MODE_ALB) {
+		count = rlb_print_rx_hashtbl(bond, buf);
+	}
+
+	return count;
+}
+static DEVICE_ATTR(alb_rx_hash, S_IRUGO, bonding_show_alb_rx_hash, NULL);
+
 
 static struct attribute *per_bond_attrs[] = {
 	&dev_attr_slaves.attr,
@@ -1505,6 +1543,8 @@ static struct attribute *per_bond_attrs[] = {
 	&dev_attr_ad_actor_key.attr,
 	&dev_attr_ad_partner_key.attr,
 	&dev_attr_ad_partner_mac.attr,
+	&dev_attr_alb_rx_hash.attr,
+	&dev_attr_tlb_tx_hash.attr,
 	NULL,
 };
 
-- 
1.5.5.6


^ permalink raw reply related

* RE: [net-next PATCH V2] etherdevice.h: random_ether_addr update
From: Rose, Gregory V @ 2009-09-11 21:13 UTC (permalink / raw)
  To: Joe Perches, David Miller
  Cc: shemminger@vyatta.com, Kirsher, Jeffrey T, netdev@vger.kernel.org,
	gospo@redhat.com, Skidmore, Donald C
In-Reply-To: <1252701862.15292.73.camel@Joe-Laptop.home>

The addresses generated here will likely never be used in a production environment.  They are placeholders so that the device will function before any management console has done what 99.9% of all vendors will do, which is assign their own address.  OK, that 99.9% is not a real number, but consider it illustrative of the fact that almost all VMM vendors will assign their own MAC addresses.

They're useful for test and development and that's about it.  We're in discussion with partners about how best to approach the management interface for this and each vendor has his own preferred approach (which complicates the solution) but for now the accepted method is for the igbvf driver to set its own assigned MAC address in the guest.

The discussion is useful but I think we should keep in mind that the addresses in question will be transitory in nature, likely never to be used except for test and development.

One thing: the original patch described it as necessary to help prevent collisions among MAC addresses, and this is incorrect.  It was intended to help reduce collisions with other vendors' MAC addresses.  If the list doesn't consider that a worthwhile goal then we can withdraw the patch or take one of the suggested permutations.

Thanks,

- Greg Rose
LAN Access Division
Intel Corp.

-----Original Message-----
From: Joe Perches [mailto:joe@perches.com] 
Sent: Friday, September 11, 2009 1:44 PM
To: David Miller
Cc: shemminger@vyatta.com; Kirsher, Jeffrey T; netdev@vger.kernel.org; gospo@redhat.com; Rose, Gregory V; Skidmore, Donald C
Subject: Re: [net-next PATCH V2] etherdevice.h: random_ether_addr update

Perhaps this is slightly better, it doesn't call
random32 for each octet and makes sure the leading
octet is >= 0x04.

random_ether_addr should assign a leading octet >= "0x04"

Does not use get_random_bytes to avoid drawing down entropy pool.
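
An editorial sketch, not part of the patch: the same address rules can be
tried in userspace. demo_random_ether_addr() is a hypothetical name, and
rand() stands in for the kernel's random32(); it is only a placeholder
generator. The sketch redraws the first octet until it is at least 0x04,
then clears the multicast bit and sets the locally administered bit.

```c
#include <stdint.h>
#include <stdlib.h>

/* Fill addr with a random unicast, locally administered MAC address. */
static void demo_random_ether_addr(uint8_t addr[6])
{
	int i;

	/* redraw the leading octet until it is >= 0x04 */
	do {
		addr[0] = (uint8_t)(rand() & 0xff);
	} while (addr[0] < 0x04);

	addr[0] &= 0xfe;	/* clear multicast bit */
	addr[0] |= 0x02;	/* set local assignment bit (IEEE802) */

	for (i = 1; i < 6; i++)
		addr[i] = (uint8_t)(rand() & 0xff);
}
```

Because the two bit operations run after the range check, the first octet
ends up even, with bit 1 set, and still no smaller than 0x04.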

Signed-off-by: Joe Perches <joe@perches.com>

diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index 3d7a668..fddcabf 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -121,9 +121,26 @@ static inline int is_valid_ether_addr(const u8 *addr)
  */
 static inline void random_ether_addr(u8 *addr)
 {
-	get_random_bytes (addr, ETH_ALEN);
-	addr [0] &= 0xfe;	/* clear multicast bit */
-	addr [0] |= 0x02;	/* set local assignment bit (IEEE802) */
+	u32 val;
+
+	/* not calling get_random_bytes to avoid using entropy */
+	do {
+		val = random32();
+		addr[0] = val;
+	} while (addr[0] < 4);
+	addr[0] &= 0xfe;	/* clear multicast bit */
+	addr[0] |= 0x02;	/* set local assignment bit (IEEE802) */
+
+	val >>= 8;
+	addr[1] = val;
+	val >>= 8;
+	addr[2] = val;
+	val >>= 8;
+	addr[3] = val;
+	val = random32();
+	addr[4] = val;
+	val >>= 8;
+	addr[5] = val;
 }

 /**

^ permalink raw reply related

* [PATCH 3/4] bonding: send ARP requests on interfaces other than the primary for tlb/alb
From: Andy Gospodarek @ 2009-09-11 21:11 UTC (permalink / raw)
  To: netdev, fubar, bonding-devel


Subject: [PATCH] bonding: send ARP requests on interfaces other than the primary for tlb/alb

This patch sends ARP requests on the correct destination output interface
rather than always sending them on the primary interface.  I've also
added some bits to make sure that the source and destination address in
the ARP header are correct since simply changing the source MAC and
output interface will not be that helpful.
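
An editorial sketch, not part of the patch: the rewrite described above
comes down to two compare-and-copy steps on the ARP payload. The struct
and function names below are hypothetical, userspace stand-ins; the driver
itself works on struct arp_pkt and uses compare_ether_addr_64bits().

```c
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* Hypothetical stand-in for the two MAC fields of an ARP payload. */
struct demo_arp {
	uint8_t mac_src[DEMO_ETH_ALEN];
	uint8_t mac_dst[DEMO_ETH_ALEN];
};

/*
 * If either hardware address in the ARP payload is the bond's own
 * address, replace it with the address of the slave that will actually
 * transmit the frame. Other addresses are left untouched.
 */
static void demo_rewrite_arp_macs(struct demo_arp *arp,
				  const uint8_t *bond_mac,
				  const uint8_t *slave_mac)
{
	if (memcmp(arp->mac_src, bond_mac, DEMO_ETH_ALEN) == 0)
		memcpy(arp->mac_src, slave_mac, DEMO_ETH_ALEN);

	if (memcmp(arp->mac_dst, bond_mac, DEMO_ETH_ALEN) == 0)
		memcpy(arp->mac_dst, slave_mac, DEMO_ETH_ALEN);
}
```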

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>

---
 drivers/net/bonding/bond_alb.c |   45 ++++++++++++++++++---------------------
 1 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index a88d0ec..7db8835 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -747,35 +747,32 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
 	struct arp_pkt *arp = arp_pkt(skb);
 	struct slave *tx_slave = NULL;
 
-	if (arp->op_code == htons(ARPOP_REPLY)) {
-		/* the arp must be sent on the selected
-		* rx channel
-		*/
-		tx_slave = rlb_choose_channel(skb, bond);
-		if (tx_slave) {
-			memcpy(arp->mac_src,tx_slave->dev->dev_addr, ETH_ALEN);
-		}
-		pr_debug("Server sent ARP Reply packet\n");
-	} else if (arp->op_code == htons(ARPOP_REQUEST)) {
-		/* Create an entry in the rx_hashtbl for this client as a
-		 * place holder.
-		 * When the arp reply is received the entry will be updated
-		 * with the correct unicast address of the client.
-		 */
-		rlb_choose_channel(skb, bond);
+	/* Choose an output channel for the ARP frame */
+	tx_slave = rlb_choose_channel(skb, bond);
 
-		/* The ARP relpy packets must be delayed so that
-		 * they can cancel out the influence of the ARP request.
-		 */
+	/* If a valid interface is returned, make sure the sender and target MAC
+	 * addresses are correct based on the interface that will be transmitting
+	 * the frame. */
+	if (tx_slave) {
+		/* If sender mac is the bond's address, rewrite */
+		if (!compare_ether_addr_64bits(arp->mac_src,bond->dev->dev_addr))
+			memcpy(arp->mac_src,tx_slave->dev->dev_addr,bond->dev->addr_len);
+
+		/* If target mac is the bond's address, rewrite */
+		if (!compare_ether_addr_64bits(arp->mac_dst,bond->dev->dev_addr))
+			memcpy(arp->mac_dst,tx_slave->dev->dev_addr,bond->dev->addr_len);
+
+	} else if (arp->op_code == htons(ARPOP_REQUEST)) {
+		/* if tx_slave is NULL, the periodic ARP replies must
+		 * be delayed so they can cancel out the influence of
+		 * the ARP request. */
 		bond->alb_info.rlb_update_delay_counter = RLB_UPDATE_DELAY;
 
-		/* arp requests are broadcast and are sent on the primary
-		 * the arp request will collapse all clients on the subnet to
+		/* ARP requests are broadcast and are sent on the primary
+		 * the ARP request will collapse all clients on the subnet to
 		 * the primary slave. We must register these clients to be
-		 * updated with their assigned mac.
-		 */
+		 * updated with their assigned MAC. */
 		rlb_req_update_subnet_clients(bond, arp->ip_src);
-		pr_debug("Server sent ARP Request packet\n");
 	}
 
 	return tx_slave;
-- 
1.5.5.6


^ permalink raw reply related

* [PATCH 2/4] bonding: make sure tx and rx hash tables stay in sync when using alb mode
From: Andy Gospodarek @ 2009-09-11 21:11 UTC (permalink / raw)
  To: netdev, fubar, bonding-devel


Subject: [PATCH] bonding: make sure tx and rx hash tables stay in sync when using alb mode

I noticed that it was easy for alb (mode 6) bonding to get into a state
where the tx hash-table and rx hash-table are out of sync (there is
really nothing to keep them synchronized), and we will transmit traffic
destined for a host on one slave and send ARP frames to the same slave
from another interface using a different source MAC.

There is no compelling reason to do this, so this patch makes sure the
rx hash-table changes whenever the tx hash-table is updated based on
device load.  This patch also drops the code that does rlb re-balancing
since the balancing will now be controlled by the tx hash-table based on
transmit load.

Long-term it would be nice to reduce these two tables into one, but
until that is done (as well as some significant re-factoring on the alb
code) they should be kept in sync.
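
An editorial sketch, not part of the patch: the invariant being enforced
is that whenever a tx bucket is given a slave, the rx bucket with the same
hash index follows it. The model below uses integer slave ids and
hypothetical names (demo_tables, demo_init, demo_assign), and it leaves
out the locking that the real code needs around both tables.

```c
#include <stddef.h>

#define DEMO_TABLE_SIZE 256
#define DEMO_NO_SLAVE   (-1)

/* Hypothetical model: one slave id per hash bucket in each table. */
struct demo_tables {
	int tx_slave[DEMO_TABLE_SIZE];
	int rx_slave[DEMO_TABLE_SIZE];
	int rlb_enabled;	/* receive load balancing on (alb) or off (tlb) */
};

static void demo_init(struct demo_tables *t, int rlb_enabled)
{
	size_t i;

	for (i = 0; i < DEMO_TABLE_SIZE; i++) {
		t->tx_slave[i] = DEMO_NO_SLAVE;
		t->rx_slave[i] = DEMO_NO_SLAVE;
	}
	t->rlb_enabled = rlb_enabled;
}

/*
 * Record the slave chosen for a tx bucket. When rlb is enabled and a
 * real slave was chosen, the matching rx bucket is updated to agree.
 */
static void demo_assign(struct demo_tables *t, unsigned int hash, int slave_id)
{
	hash %= DEMO_TABLE_SIZE;
	t->tx_slave[hash] = slave_id;

	if (t->rlb_enabled && slave_id != DEMO_NO_SLAVE)
		t->rx_slave[hash] = slave_id;
}
```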

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>

---
 drivers/net/bonding/bond_alb.c |  123 ++++++++++++++++------------------------
 1 files changed, 49 insertions(+), 74 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index bcf25c6..a88d0ec 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -111,6 +111,7 @@ static inline struct arp_pkt *arp_pkt(const struct sk_buff *skb)
 
 /* Forward declaration */
 static void alb_send_learning_packets(struct slave *slave, u8 mac_addr[]);
+static struct slave *rlb_update_rx_table(struct bonding *bond, struct slave *next_slave, u32 hash_index);
 
 static inline u8 _simple_hash(const u8 *hash_start, int hash_size)
 {
@@ -124,7 +125,18 @@ static inline u8 _simple_hash(const u8 *hash_start, int hash_size)
 	return hash;
 }
 
-/*********************** tlb specific functions ***************************/
+
+/********************* rlb and tlb lock functions *************************/
+static inline void _lock_rx_hashtbl(struct bonding *bond)
+{
+	spin_lock_bh(&(BOND_ALB_INFO(bond).rx_hashtbl_lock));
+}
+
+static inline void _unlock_rx_hashtbl(struct bonding *bond)
+{
+	spin_unlock_bh(&(BOND_ALB_INFO(bond).rx_hashtbl_lock));
+}
+
 
 static inline void _lock_tx_hashtbl(struct bonding *bond)
 {
@@ -136,6 +148,7 @@ static inline void _unlock_tx_hashtbl(struct bonding *bond)
 	spin_unlock_bh(&(BOND_ALB_INFO(bond).tx_hashtbl_lock));
 }
 
+/*********************** tlb specific functions ***************************/
 /* Caller must hold tx_hashtbl lock */
 static inline void tlb_init_table_entry(struct tlb_client_info *entry, int save_load)
 {
@@ -296,6 +309,12 @@ static struct slave *tlb_choose_channel(struct bonding *bond, u32 hash_index, u3
 	if (!assigned_slave) {
 		assigned_slave = tlb_get_best_slave(bond, hash_index);
 
+		if (bond_info->rlb_enabled) {
+			_lock_rx_hashtbl(bond);
+			rlb_update_rx_table(bond, assigned_slave, hash_index);
+			_unlock_rx_hashtbl(bond);
+		}
+
 		if (assigned_slave) {
 			struct tlb_slave_info *slave_info =
 				&(SLAVE_TLB_INFO(assigned_slave));
@@ -325,14 +344,37 @@ static struct slave *tlb_choose_channel(struct bonding *bond, u32 hash_index, u3
 }
 
 /*********************** rlb specific functions ***************************/
-static inline void _lock_rx_hashtbl(struct bonding *bond)
+
+/* Caller must hold bond lock for read and hashtbl lock */
+static struct slave *rlb_update_rx_table(struct bonding *bond, struct slave *next_slave, u32 hash_index)
 {
-	spin_lock_bh(&(BOND_ALB_INFO(bond).rx_hashtbl_lock));
+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
+
+	/* check rlb table and correct it if wrong */
+	if (bond_info->rlb_enabled) {
+		struct rlb_client_info *rx_client_info = &(bond_info->rx_hashtbl[hash_index]);
+
+		/* if the new slave computed by tlb checks doesn't match rlb, stop rlb from using it */
+		if (next_slave && (next_slave != rx_client_info->slave))
+			rx_client_info->slave = next_slave;
+	}
+	return next_slave;
 }
 
-static inline void _unlock_rx_hashtbl(struct bonding *bond)
+/* Caller must hold bond lock for read and hashtbl lock */
+static struct slave *alb_get_best_slave(struct bonding *bond, u32 hash_index)
 {
-	spin_unlock_bh(&(BOND_ALB_INFO(bond).rx_hashtbl_lock));
+	struct slave *next_slave = NULL;
+
+	_lock_tx_hashtbl(bond);
+
+	next_slave = tlb_get_best_slave(bond, hash_index);
+
+	_unlock_tx_hashtbl(bond);
+
+	rlb_update_rx_table(bond, next_slave, hash_index);
+
+	return next_slave;
 }
 
 /* when an ARP REPLY is received from a client update its info
@@ -402,38 +444,6 @@ out:
 	return res;
 }
 
-/* Caller must hold bond lock for read */
-static struct slave *rlb_next_rx_slave(struct bonding *bond)
-{
-	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
-	struct slave *rx_slave, *slave, *start_at;
-	int i = 0;
-
-	if (bond_info->next_rx_slave) {
-		start_at = bond_info->next_rx_slave;
-	} else {
-		start_at = bond->first_slave;
-	}
-
-	rx_slave = NULL;
-
-	bond_for_each_slave_from(bond, slave, i, start_at) {
-		if (SLAVE_IS_OK(slave)) {
-			if (!rx_slave) {
-				rx_slave = slave;
-			} else if (slave->speed > rx_slave->speed) {
-				rx_slave = slave;
-			}
-		}
-	}
-
-	if (rx_slave) {
-		bond_info->next_rx_slave = rx_slave->next;
-	}
-
-	return rx_slave;
-}
-
 /* teach the switch the mac of a disabled slave
  * on the primary for fault tolerance
  *
@@ -475,7 +485,7 @@ static void rlb_clear_slave(struct bonding *bond, struct slave *slave)
 	for (; index != RLB_NULL_INDEX; index = next_index) {
 		next_index = rx_hash_table[index].next;
 		if (rx_hash_table[index].slave == slave) {
-			struct slave *assigned_slave = rlb_next_rx_slave(bond);
+			struct slave *assigned_slave = alb_get_best_slave(bond, index);
 
 			if (assigned_slave) {
 				rx_hash_table[index].slave = assigned_slave;
@@ -687,7 +697,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
 		}
 	}
 	/* assign a new slave */
-	assigned_slave = rlb_next_rx_slave(bond);
+	assigned_slave = alb_get_best_slave(bond, hash_index);
 
 	if (assigned_slave) {
 		client_info->ip_src = arp->ip_src;
@@ -771,36 +781,6 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
 	return tx_slave;
 }
 
-/* Caller must hold bond lock for read */
-static void rlb_rebalance(struct bonding *bond)
-{
-	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
-	struct slave *assigned_slave;
-	struct rlb_client_info *client_info;
-	int ntt;
-	u32 hash_index;
-
-	_lock_rx_hashtbl(bond);
-
-	ntt = 0;
-	hash_index = bond_info->rx_hashtbl_head;
-	for (; hash_index != RLB_NULL_INDEX; hash_index = client_info->next) {
-		client_info = &(bond_info->rx_hashtbl[hash_index]);
-		assigned_slave = rlb_next_rx_slave(bond);
-		if (assigned_slave && (client_info->slave != assigned_slave)) {
-			client_info->slave = assigned_slave;
-			client_info->ntt = 1;
-			ntt = 1;
-		}
-	}
-
-	/* update the team's flag only after the whole iteration */
-	if (ntt) {
-		bond_info->rx_ntt = 1;
-	}
-	_unlock_rx_hashtbl(bond);
-}
-
 /* Caller must hold rx_hashtbl lock */
 static void rlb_init_table_entry(struct rlb_client_info *entry)
 {
@@ -1521,11 +1501,6 @@ void bond_alb_monitor(struct work_struct *work)
 			read_lock(&bond->lock);
 		}
 
-		if (bond_info->rlb_rebalance) {
-			bond_info->rlb_rebalance = 0;
-			rlb_rebalance(bond);
-		}
-
 		/* check if clients need updating */
 		if (bond_info->rx_ntt) {
 			if (bond_info->rlb_update_delay_counter) {
-- 
1.5.5.6


^ permalink raw reply related

* [PATCH 1/4] bonding: allow previous slave to be used when re-balancing traffic on tlb/alb interfaces
From: Andy Gospodarek @ 2009-09-11 21:10 UTC (permalink / raw)
  To: netdev, fubar, bonding-devel

[PATCH] bonding: allow previous slave to be used when re-balancing traffic on tlb/alb interfaces

When using tlb (mode 5) or alb (mode 6) bonding, a task runs every 10s
and re-balances the output devices based on load.  I was trying to
diagnose some connectivity issues and realized that a high-traffic host
would often switch output interfaces every 10s.  I discovered this
happened because the 'least loaded interface' was chosen as the next
output interface for any given stream and quite often some lower load
traffic would slip in and take the interface previously used by our
stream.  This meant the 'least loaded interface' was no longer the one
we used during the last interval.

The switching of streams to another interface was not extremely helpful
as it would force the destination host or router to update its ARP
tables and produce some additional ARP traffic as the destination host
verified that it was using the MAC address it expected.  Having the
destination MAC for a given IP change every 10s seems undesirable.

The decision was made to use the same slave during this interval if the
current load on that interface was < 10.  A load of < 10 indicates that
during the last 10s sample, roughly 100 bytes were sent by all streams
currently assigned to that interface.  This essentially means the
interface is unloaded, but allows for a few frames that will probably
have minimal impact to slip into the same interface we were using in the
past.
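
An editorial sketch, not part of the patch: the selection rule described
above can be modelled in a few lines of userspace C. All names here are
hypothetical, and the fallback simply picks the smallest load; the
driver's own least-loaded search is more involved than that.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's per-slave state. */
struct demo_slave {
	const char *name;
	int link_ok;		/* models SLAVE_IS_OK() */
	unsigned int load;	/* models SLAVE_TLB_INFO(slave).load */
};

#define DEMO_UNLOADED_THRESHOLD 10	/* the "< 10" cutoff described above */

/* Return the usable slave with the smallest load, or NULL if none. */
static struct demo_slave *demo_least_loaded(struct demo_slave *slaves,
					    size_t n)
{
	struct demo_slave *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!slaves[i].link_ok)
			continue;
		if (!best || slaves[i].load < best->load)
			best = &slaves[i];
	}
	return best;
}

/* Keep the previous slave while it is essentially idle, else fall back. */
static struct demo_slave *demo_best_slave(struct demo_slave *slaves, size_t n,
					  struct demo_slave *last)
{
	if (last && last->link_ok && last->load < DEMO_UNLOADED_THRESHOLD)
		return last;
	return demo_least_loaded(slaves, n);
}
```

With this rule a stream stays on its previous interface as long as only a
trickle of other traffic has landed there, which is what avoids the 10s
flapping described above.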

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>

---
 drivers/net/bonding/bond_alb.c |   21 ++++++++++++++++++++-
 drivers/net/bonding/bond_alb.h |    4 ++++
 2 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 46d312b..bcf25c6 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -143,6 +143,7 @@ static inline void tlb_init_table_entry(struct tlb_client_info *entry, int save_
 		entry->load_history = 1 + entry->tx_bytes /
 				      BOND_TLB_REBALANCE_INTERVAL;
 		entry->tx_bytes = 0;
+		entry->last_slave = entry->tx_slave;
 	}

 	entry->tx_slave = NULL;
@@ -263,6 +264,24 @@ static struct slave *tlb_get_least_loaded_slave(struct bonding *bond)
 	return least_loaded;
 }

+/* Caller must hold bond lock for read and hashtbl lock */
+static struct slave *tlb_get_best_slave(struct bonding *bond, u32 hash_index)
+{
+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
+	struct tlb_client_info *tx_hash_table = bond_info->tx_hashtbl;
+	struct slave *last_slave = tx_hash_table[hash_index].last_slave;
+	struct slave *next_slave = NULL;
+
+	if (last_slave && SLAVE_IS_OK(last_slave)) {
+		/* Use the last slave listed in the tx hashtbl if:
+		   the last slave currently is essentially unloaded. */
+		if (SLAVE_TLB_INFO(last_slave).load < 10)
+			next_slave = last_slave;
+	}
+
+	return next_slave ? next_slave : tlb_get_least_loaded_slave(bond);
+}
+
 /* Caller must hold bond lock for read */
 static struct slave *tlb_choose_channel(struct bonding *bond, u32 hash_index, u32 skb_len)
 {
@@ -275,7 +294,7 @@ static struct slave *tlb_choose_channel(struct bonding *bond, u32 hash_index, u3
 	hash_table = bond_info->tx_hashtbl;
 	assigned_slave = hash_table[hash_index].tx_slave;
 	if (!assigned_slave) {
-		assigned_slave = tlb_get_least_loaded_slave(bond);
+		assigned_slave = tlb_get_best_slave(bond, hash_index);

 		if (assigned_slave) {
 			struct tlb_slave_info *slave_info =
diff --git a/drivers/net/bonding/bond_alb.h b/drivers/net/bonding/bond_alb.h
index 50968f8..b65fd29 100644
--- a/drivers/net/bonding/bond_alb.h
+++ b/drivers/net/bonding/bond_alb.h
@@ -36,6 +36,10 @@ struct tlb_client_info {
 				 * packets to a Client that the Hash function
 				 * gave this entry index.
 				 */
+	struct slave *last_slave; /* Pointer to last slave used for transmitting
+				 * packets to a Client that the Hash function
+				 * gave this entry index.
+				 */
 	u32 tx_bytes;		/* Each Client acumulates the BytesTx that
 				 * were tranmitted to it, and after each
 				 * CallBack the LoadHistory is devided
-- 
1.5.5.6

^ permalink raw reply related

* Re: bisect results of MSI-X related panic (help!)
From: Jesper Juhl @ 2009-09-11 21:05 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: linux-kernel, netdev
In-Reply-To: <1252699744.3877.15.camel@jbrandeb-hc.jf.intel.com>

On Fri, 11 Sep 2009, Jesse Brandeburg wrote:

> I've been attempting to isolate a problem that we see on x86_64, when we
> have many (6 or more) MSI-X enabled LAN ports with 33 MSI-X vectors
> each.
>
> The system panics, but with almost random panic traces, usually
> somewhere around something to do with an interrupt. 2.6.29 is fine,
> 2.6.30-rc1 is not, and 2.6.31-rc8 fails as well.
>
> The test I am using to reproduce is
> rmmod ixgbe
> modprobe ixgbe
> ip l set ethX up (X = 1 8 9 10 11 12 13 14 15)
> run set_irq_affinity script (binds rx0/tx0 to cpu0, rx1/tx1 to cpu1, for
> each ethX)
> ping -f -c 5000 host
>
> I've bisected, here is my bisect log, problem is that the commit
> identified is a merge commit, and *I don't know what to revert to test*.
> It appears the parent of the merge:
> 6e15cf04860074ad032e88c306bea656bbdd0f22 is marked good, but looks to be
> in a possibly related area to the panic.
>
> Can someone please help me figure out what to do next?

I don't know if I can help, but I'll try. At least I can tell you what I'd 
do if I had no other input - perhaps it'll help you, perhaps not...

First thing I'd do would be to test with the final 2.6.31 and the latest 
git kernel. Who knows, if you're lucky it may already be fixed.

Second thing I'd do would be to try and cut down my .config to the bare 
minimum needed to boot and reproduce the bug on the box in question.
I'd do this for two reasons; 1) perhaps you'll discover that 
disabling/enabling a certain kernel option makes the problem go away. 
That would be useful info. 2) having a bare minimum .config makes it 
faster to re-build kernels when doing a bisect.

Third thing I'd do would be to re-do the bisect using the 2.6.31 (or 
latest git) kernel as the starting point. The new bisect will pick 
different patches as the test points and may lead to a better result (at 
least it sometimes has for me).

Fourth thing I'd do (assuming the above did not produce anything useful) 
would be to take my minimal config and enable every single debug option 
(no matter how irrelevant it seemed) I could on top of it, and hope that 
one of them would catch something that would help me identify the problem.

If all of the above failed to produce any clue I'd ask for help on the 
mailing lists :)   Sorry, but that's all I can think of.  Hope it helps.

-- 
Jesper Juhl <jj@chaosbits.net>             http://www.chaosbits.net/
Plain text mails only, please      http://www.expita.com/nomime.html
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html

^ permalink raw reply

* net-next-2.6 submission delayed...
From: David Miller @ 2009-09-11 21:03 UTC (permalink / raw)
  To: netdev

Some tg3 driver change craps out the card on my workstation
after just a few TCP packets back and forth, so I have to
bisect and debug this before I can ask Linus to pull.

I believe there was a similar report made a week or so
ago.

^ permalink raw reply

* lockup with 2.6.31 while running sfuzz.
From: Dave Jones @ 2009-09-11 20:54 UTC (permalink / raw)
  To: netdev

Just before locking up completely, I managed to capture this ..
Repeated it twice. Happens within a few minutes of running.

	Dave

BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
IP: [<ffffffff81096d61>] __lock_acquire+0xae/0xc0e
PGD 3088f067 PUD 3146c067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
CPU 0 
Modules linked in: ip_queue sctp libcrc32c ip6_queue can_bcm sco cmtp kernelcapi bnep can_raw hidp l2cap rds rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr pppoe pppox ppp_generic slhc atm appletalk can af_key rose ax25 bluetooth rfkill ipx p8022 psnap llc p8023 decnet irda crc_ccitt gfs2 dlm configfs nfsd lockd nfs_acl auth_rpcgss sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 p4_clockmod freq_table speedstep_lib xfs exportfs vfat fat ext2 dm_multipath snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer e1000 i2c_i801 snd iTCO_wdt shpchp iTCO_vendor_support e752x_edac ppdev edac_core parport_pc soundcore snd_page_alloc dcdbas parport raid1 raid0 floppy radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: freq_table]
Pid: 1859, comm: sfuzz Not tainted 2.6.31-2.fc12.x86_64 #1 Precision WorkStation 470    
RIP: 0010:[<ffffffff81096d61>]  [<ffffffff81096d61>] __lock_acquire+0xae/0xc0e
RSP: 0018:ffff88003085fb68  EFLAGS: 00010046
RAX: 0000000000000046 RBX: ffff88003143a4a0 RCX: ffffffff81439f9c
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000068
RBP: ffff88003085fbe8 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246
R13: 0000000000000068 R14: 0000000000000002 R15: 0000000000000000
FS:  00007f3df0ecd700(0000) GS:ffff880004600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000070 CR3: 0000000030da4000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sfuzz (pid: 1859, threadinfo ffff88003085e000, task ffff88003143a4a0)
Stack:
 ffff88003085fb78 ffffffff81019a3b ffff88003085fb88 0000000067f0b452
<0> 000000003085fbb8 0000000067f0b452 ffff880000000000 ffffffff815041ee
<0> ffffffff817e5da8 0000000067f0b452 ffff88003085fbe8 0000000000000002
Call Trace:
 [<ffffffff81019a3b>] ? native_sched_clock+0x2d/0x62
 [<ffffffff815041ee>] ? __mutex_unlock_slowpath+0x12f/0x158
 [<ffffffff810979af>] lock_acquire+0xee/0x12e
 [<ffffffff81439f9c>] ? lock_sock_nested+0x4d/0x12d
 [<ffffffff814552b5>] ? rtnl_lock+0x2a/0x40
 [<ffffffff81439f9c>] ? lock_sock_nested+0x4d/0x12d
 [<ffffffff815062a2>] _spin_lock_bh+0x4a/0x93
 [<ffffffff81439f9c>] ? lock_sock_nested+0x4d/0x12d
 [<ffffffff81439f9c>] lock_sock_nested+0x4d/0x12d
 [<ffffffffa06b20c2>] lock_sock+0x23/0x39 [can_raw]
 [<ffffffffa06b2b98>] raw_release+0x3c/0x12f [can_raw]
 [<ffffffff81436b96>] sock_release+0x32/0x98
 [<ffffffff81436c34>] sock_close+0x38/0x50
 [<ffffffff81143e21>] __fput+0x137/0x200
 [<ffffffff81143f17>] fput+0x2d/0x43
 [<ffffffff81438007>] sys_accept4+0x1f4/0x224
 [<ffffffff81141fa4>] ? fsnotify_modify+0x7b/0x9a
 [<ffffffff81011f7a>] ? sysret_check+0x2e/0x69
 [<ffffffff810c3bae>] ? audit_syscall_entry+0x12d/0x16d
 [<ffffffff81438202>] sys_accept+0x23/0x39
 [<ffffffff81011f42>] system_call_fastpath+0x16/0x1b
Code: 00 be f4 09 00 00 0f 85 0c 0b 00 00 e9 a4 0a 00 00 83 fe 07 76 11 e8 e7 aa 1e 00 48 c7 c7 3e c1 66 81 e9 c0 0a 00 00 85 f6 75 09 <49> 8b 45 08 48 85 c0 75 2b 31 d2 4c 89 ef 48 89 4d 98 4c 89 4d 
RIP  [<ffffffff81096d61>] __lock_acquire+0xae/0xc0e
 RSP <ffff88003085fb68>
CR2: 0000000000000070
---[ end trace 6d2b85c48fdea652 ]---


^ permalink raw reply
