Netdev List


* [PATCH] Freeing alive inet6 address
From: Denis V. Lunev @ 2007-09-07 10:21 UTC (permalink / raw)
  To: den, adobriyan, xemul, dev, kuznet, yoshfuji, davem; +Cc: netdev, devel

From: Denis V. Lunev <den@openvz.org>

addrconf_dad_failure calls addrconf_dad_stop, which takes the referenced address
and drops the count. So the in6_ifa_put performed at out: is extra. This
results in the message "Freeing alive inet6 address" and unreleased dst entries.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>

--- ./net/ipv6/ndisc.c.ipv6dad	2007-09-03 16:54:32.000000000 +0400
+++ ./net/ipv6/ndisc.c	2007-09-07 13:34:30.000000000 +0400
@@ -736,7 +736,7 @@ static void ndisc_recv_ns(struct sk_buff
 				 * so fail our DAD process
 				 */
 				addrconf_dad_failure(ifp);
-				goto out;
+				return;
 			} else {
 				/*
 				 * This is not a dad solicitation.
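
The reference-count imbalance described in the changelog can be illustrated with a small userspace sketch. Everything here is a hypothetical stand-in (toy_addr, toy_hold, toy_put and the two recv functions are not kernel APIs); the only behaviour taken from the patch description is that the DAD failure path already drops the reference the caller holds, so a second put on the way out is one too many.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for an address object: a refcount plus flags. */
struct toy_addr {
	int refcnt;       /* number of outstanding references */
	int alive;        /* 1 while the address is still in use */
	int freed_alive;  /* set if the count hits zero while alive */
};

static void toy_hold(struct toy_addr *a)
{
	a->refcnt++;
}

static void toy_put(struct toy_addr *a)
{
	assert(a->refcnt > 0);
	if (--a->refcnt == 0 && a->alive) {
		a->freed_alive = 1;
		printf("Freeing alive address\n");
	}
}

/* Models the DAD failure path: it consumes the caller's reference. */
static void toy_dad_failure(struct toy_addr *a)
{
	toy_put(a);
}

/* Before the patch: the failure path falls through to a second put. */
static void toy_recv_buggy(struct toy_addr *a)
{
	toy_hold(a);         /* the lookup takes a reference */
	toy_dad_failure(a);  /* the failure path drops it */
	toy_put(a);          /* the extra put at "out:" */
}

/* After the patch: return right after the failure path. */
static void toy_recv_fixed(struct toy_addr *a)
{
	toy_hold(a);
	toy_dad_failure(a);
}
```

Starting from a count of one (the reference held by whatever list owns the address), the fixed path leaves the count at one, while the buggy path drives it to zero on an address that is still marked alive.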


* Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
From: jamal @ 2007-09-07 13:22 UTC (permalink / raw)
  To: James Chapman
  Cc: netdev, davem, jeff, mandeep.baines, ossthema, Stephen Hemminger
In-Reply-To: <46E11A61.9030409@katalix.com>

On Fri, 2007-07-09 at 10:31 +0100, James Chapman wrote:
> Not really. I used 3-year-old, single CPU x86 boxes with e100 
> interfaces. 
> The idle poll change keeps them in polled mode. Without idle 
> poll, I get twice as many interrupts as packets, one for txdone and one 
> for rx. NAPI is continuously scheduled in/out.

Certainly faster than the machine in the paper (which was about 2 years
old in 2005).
I could never get ping -f to do that for me - so things must be getting
worse with newer machines then.

> No. Since I did a flood ping from the machine under test, the improved 
> latency meant that the ping response was handled more quickly, causing 
> the next packet to be sent sooner. So more packets were transmitted in 
> the allotted time (10 seconds).

ok.

> With current NAPI:
> rtt min/avg/max/mdev = 0.902/1.843/101.727/4.659 ms, pipe 9, ipg/ewma 
> 1.611/1.421 ms
> 
> With idle poll changes:
> rtt min/avg/max/mdev = 0.898/1.117/28.371/0.689 ms, pipe 3, ipg/ewma 
> 1.175/1.236 ms

Not bad in terms of latency. The deviation certainly looks better.

> But the CPU has done more work. 

I am going to be the devil's advocate[1]:
If the problem i am trying to solve is "reduce cpu use at lower rate",
then this is not the right answer because your cpu use has gone up.
Your latency numbers have not improved that much (looking at the avg)
and your throughput is not that much higher. Will i be willing to pay
more cpu (of an already piggish cpu use by NAPI at that rate with 2
interrupts per packet)?

Another test: try a simple ping and compare the rtts.

> The problem I started thinking about was the one where NAPI thrashes 
> in/out of polled mode at higher and higher rates as network interface 
> speeds and CPU speeds increase. A flood ping demonstrates this even on 
> 100M links on my boxes. 

things must be getting worse in the state of average hardware out there.
It will be a worthwhile exercise to compare on an even faster machine
and see what transpires there.

> Networking boxes want consistent 
> performance/latency for all traffic patterns and they need to avoid 
> interrupt livelock. Current practice seems to be to use hardware 
> interrupt mitigation or timers to limit interrupt rate but this just 
> hurts latency, as you noted. So I'm trying to find a way to limit the 
> NAPI interrupt rate without increasing latency. My comment about this 
> approach being suitable for routers and networked servers is that these 
> boxes care more about minimizing packet latency than they do about 
> wasting CPU cycles by polling idle devices.

I think the argument of "who cares about a little more cpu" is valid
for the case of routers. It is a double edged sword, because it applies
to the case of "who cares if NAPI uses a little more cpu at low rates"
and "who cares if James turns on polling and abuses a little more-more
cpu". Since NAPI is the incumbent, the onus(sp?) is to do better. You
must do better sir!

Look at the timers, she said - that way you may be able to cut the cpu
abuse.

cheers,
jamal

[1] historically the devils advocate was a farce really ;->


* Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170
From: Johannes Berg @ 2007-09-07 13:27 UTC (permalink / raw)
  To: paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  Cc: Herbert Xu, satyam-wEGCiKHe2LqWVfeAwA7xHQ,
	flo-BCn6idZOOBwdnm+yROfE0A, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	michal.k.k.piotrowski-Re5JQEeQqe8AvxtiuMwx3w,
	ipw3945-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	yi.zhu-ral2JQCrhuEAvxtiuMwx3w, flamingice-R9e9/4HEdknk1uMJSBkQmQ
In-Reply-To: <20070906154612.GD8030-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1624 bytes --]

On Thu, 2007-09-06 at 08:46 -0700, Paul E. McKenney wrote:

> Looks good to me from an RCU viewpoint.  I cannot claim familiarity with
> this code.  I therefore especially like the indications of where RTNL
> is held and not!!!

:)

> Some questions below based on a quick scan.  And a global question:
> should the comments about RTNL being held be replaced by ASSERT_RTNL()?

I don't like ASSERT_RTNL() much because it actually tries to lock it.
I'd be much happier if it was WARN_ON(!mutex_locked(&rtnl_mutex)) or
something equivalent.

In any case, I have an updated patch I'll be sending soon, and it
requires a new list walking primitive I'll also send.

> > -	write_lock_bh(&local->sub_if_lock);
> > +	/* we're under RTNL so all this is fine */
> >  	if (unlikely(local->reg_state == IEEE80211_DEV_UNREGISTERED)) {
> > -		write_unlock_bh(&local->sub_if_lock);
> >  		__ieee80211_if_del(local, sdata);
> >  		return -ENODEV;
> >  	}
> > -	list_add(&sdata->list, &local->sub_if_list);
> > +	list_add_tail_rcu(&sdata->list, &local->interfaces);
> 
> The _rcu is required because this list isn't protected by RTNL?

Yes, not all walkers of the list are protected by the RTNL.

> > @@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi
> >  		/* Remove all virtual interfaces that use this BSS
> >  		 * as their sdata->bss */
> >  		struct ieee80211_sub_if_data *tsdata, *n;
> > -		LIST_HEAD(tmp_list);
> > 
> > -		write_lock_bh(&local->sub_if_lock);
> 
> This code is also protected by RTNL?

Yes.

> >  	ASSERT_RTNL();
> 
> I -like- this!!!  ;-)

:)

johannes



* Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170
From: Paul E. McKenney @ 2007-09-07 14:25 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Herbert Xu, satyam, flo, linux-kernel, netdev, linux-wireless,
	michal.k.k.piotrowski, ipw3945-devel, yi.zhu, flamingice
In-Reply-To: <1189171635.28781.134.camel@johannes.berg>

On Fri, Sep 07, 2007 at 03:27:15PM +0200, Johannes Berg wrote:
> On Thu, 2007-09-06 at 08:46 -0700, Paul E. McKenney wrote:
> 
> > Looks good to me from an RCU viewpoint.  I cannot claim familiarity with
> > this code.  I therefore especially like the indications of where RTNL
> > is held and not!!!
> 
> :)
> 
> > Some questions below based on a quick scan.  And a global question:
> > should the comments about RTNL being held be replaced by ASSERT_RTNL()?
> 
> I don't like ASSERT_RTNL() much because it actually tries to lock it.
> I'd be much happier if it was WARN_ON(!mutex_locked(&rtnl_mutex)) or
> something equivalent.

Ah!  It would indeed be nice to have a lower-overhead ASSERT_RTNL_LIGHT()
or whatever.

> In any case, I have an updated patch I'll be sending soon, and it
> requires a new list walking primitive I'll also send.

Look forward to seeing it!

> > > -	write_lock_bh(&local->sub_if_lock);
> > > +	/* we're under RTNL so all this is fine */
> > >  	if (unlikely(local->reg_state == IEEE80211_DEV_UNREGISTERED)) {
> > > -		write_unlock_bh(&local->sub_if_lock);
> > >  		__ieee80211_if_del(local, sdata);
> > >  		return -ENODEV;
> > >  	}
> > > -	list_add(&sdata->list, &local->sub_if_list);
> > > +	list_add_tail_rcu(&sdata->list, &local->interfaces);
> > 
> > The _rcu is required because this list isn't protected by RTNL?
> 
> Yes, not all walkers of the list are protected by the RTNL.

K.

> > > @@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi
> > >  		/* Remove all virtual interfaces that use this BSS
> > >  		 * as their sdata->bss */
> > >  		struct ieee80211_sub_if_data *tsdata, *n;
> > > -		LIST_HEAD(tmp_list);
> > > 
> > > -		write_lock_bh(&local->sub_if_lock);
> > 
> > This code is also protected by RTNL?
> 
> Yes.

Comment?  (Or is it in the function header?)

> > >  	ASSERT_RTNL();
> > 
> > I -like- this!!!  ;-)
> 
> :)

							Thanx, Paul


* Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170
From: Johannes Berg @ 2007-09-07 14:30 UTC (permalink / raw)
  To: paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  Cc: Herbert Xu, satyam-wEGCiKHe2LqWVfeAwA7xHQ,
	flo-BCn6idZOOBwdnm+yROfE0A, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	michal.k.k.piotrowski-Re5JQEeQqe8AvxtiuMwx3w,
	ipw3945-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	yi.zhu-ral2JQCrhuEAvxtiuMwx3w, flamingice-R9e9/4HEdknk1uMJSBkQmQ
In-Reply-To: <20070907142538.GC8864-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 651 bytes --]

On Fri, 2007-09-07 at 07:25 -0700, Paul E. McKenney wrote:

> > I don't like ASSERT_RTNL() much because it actually tries to lock it.
> > I'd be much happier if it was WARN_ON(!mutex_locked(&rtnl_mutex)) or
> > something equivalent.
> 
> Ah!  It would indeed be nice to have a lower-overhead ASSERT_RTNL_LIGHT()
> or whatever.

I don't know why it tries that anyway. Maybe it's from semaphore days
where you couldn't check _is_locked()?

> > In any case, I have an updated patch I'll be sending soon, and it
> > requires a new list walking primitive I'll also send.
> 
> Look forward to seeing it!

Will send in a minute.

johannes



* Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170
From: Johannes Berg @ 2007-09-07 14:35 UTC (permalink / raw)
  To: paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  Cc: Herbert Xu, satyam-wEGCiKHe2LqWVfeAwA7xHQ,
	flo-BCn6idZOOBwdnm+yROfE0A, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	michal.k.k.piotrowski-Re5JQEeQqe8AvxtiuMwx3w,
	ipw3945-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	yi.zhu-ral2JQCrhuEAvxtiuMwx3w, flamingice-R9e9/4HEdknk1uMJSBkQmQ
In-Reply-To: <20070907142538.GC8864-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 596 bytes --]

On Fri, 2007-09-07 at 07:25 -0700, Paul E. McKenney wrote:

> > > > @@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi
> > > >  		/* Remove all virtual interfaces that use this BSS
> > > >  		 * as their sdata->bss */
> > > >  		struct ieee80211_sub_if_data *tsdata, *n;
> > > > -		LIST_HEAD(tmp_list);
> > > > 
> > > > -		write_lock_bh(&local->sub_if_lock);
> > > 
> > > This code is also protected by RTNL?
> > 
> > Yes.
> 
> Comment?  (Or is it in the function header?)

Oh, forgot to say: yes, there is a comment further up and even an
ASSERT_RTNL()

johannes



* [RFC] mac80211: fix virtual interface locking
From: Johannes Berg @ 2007-09-07 14:38 UTC (permalink / raw)
  To: paulmck
  Cc: Herbert Xu, satyam, flo, linux-kernel, netdev, linux-wireless,
	michal.k.k.piotrowski, ipw3945-devel, yi.zhu, flamingice,
	John W. Linville
In-Reply-To: <20070907142538.GC8864@linux.vnet.ibm.com>

Florian Lohoff noticed a bug in mac80211: when bringing the
master interface down while other virtual interfaces are up
we call dev_close() under a spinlock which is not allowed.
This patch removes the sub_if_lock used by mac80211 in favour
of using an RCU list. All list manipulations are already done
under rtnl so are well protected against each other, and the
read-side locks we took in the RX and TX code are already in
RCU read-side critical sections.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Florian Lohoff <flo@rfc822.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Cc: Satyam Sharma <satyam@infradead.org>

---
If you want to test this you'll need to get the other pending patches;
as John is at KS, he isn't pushing to Dave, who is at KS too anyhow. Grab
patches 002-011 from
http://johannes.sipsolutions.net/patches/net-2.6.24/all/2007-09-06-13:43/
They are slated to go into net-2.6.24 if timing works out. I'll backport
this fix to -stable when we actually get around to verifying it.

 net/mac80211/ieee80211.c       |  100 ++++++++++++++++++++---------------------
 net/mac80211/ieee80211_i.h     |    5 --
 net/mac80211/ieee80211_iface.c |   31 +++++-------
 net/mac80211/ieee80211_sta.c   |   12 ++--
 net/mac80211/rx.c              |    9 +--
 net/mac80211/tx.c              |   10 ++--
 6 files changed, 84 insertions(+), 83 deletions(-)

--- wireless-dev.orig/net/mac80211/ieee80211.c	2007-09-07 10:52:12.604441281 +0200
+++ wireless-dev/net/mac80211/ieee80211.c	2007-09-07 16:30:34.044429746 +0200
@@ -88,24 +88,31 @@ static struct dev_mc_list *ieee80211_get
 		return NULL;
 	}
 
-	/* start of iteration, both unassigned */
-	if (!mcd->cur && !mcd->sdata) {
-		mcd->sdata = list_entry(local->sub_if_list.next,
-					struct ieee80211_sub_if_data, list);
-		mcd->cur = mcd->sdata->dev->mc_list;
-	}
+	/*
+	 * Prepare for iteration if not done already.
+	 */
+	list_prepare_entry(mcd->sdata, &local->interfaces, list);
 
-	if (mcd->cur)
+	if (mcd->cur) {
+		/*
+		 * Iterate over the multicast addresses in
+		 * the current device (mcd->sdata).
+		 */
 		mcd->cur = mcd->cur->next;
+	}
 
-	while (!mcd->cur) {
-		/* reached end of interface list? */
-		if (mcd->sdata->list.next == &local->sub_if_list)
-			break;
-		/* otherwise try next interface */
-		mcd->sdata = list_entry(mcd->sdata->list.next,
-					struct ieee80211_sub_if_data, list);
-		mcd->cur = mcd->sdata->dev->mc_list;
+	if (!mcd->cur) {
+		/*
+		 * Iterate over the devices until finding one (the
+		 * first or the next) with multicast addresses.
+		 */
+		list_for_each_entry_continue_rcu(mcd->sdata,
+						 &local->interfaces,
+						 list) {
+			mcd->cur = mcd->sdata->dev->mc_list;
+			if (mcd->cur)
+				break;
+		}
 	}
 
 	return mcd->cur;
@@ -145,9 +152,10 @@ static void ieee80211_configure_filter(s
 
 	/*
 	 * We can iterate through the device list for the multicast
-	 * address list so need to lock it.
+	 * address list so need to be in a RCU read-side section,
+	 * the RTNL isn't held in this function.
 	 */
-	read_lock(&local->sub_if_lock);
+	rcu_read_lock();
 
 	/* be a bit nasty */
 	new_flags |= (1<<31);
@@ -163,7 +171,7 @@ static void ieee80211_configure_filter(s
 	WARN_ON(mcd.cur);
 
 	local->filter_flags = new_flags & ~(1<<31);
-	read_unlock(&local->sub_if_lock);
+	rcu_read_unlock();
 
 	netif_tx_unlock(local->mdev);
 }
@@ -176,14 +184,13 @@ static int ieee80211_master_open(struct 
 	struct ieee80211_sub_if_data *sdata;
 	int res = -EOPNOTSUPP;
 
-	read_lock(&local->sub_if_lock);
-	list_for_each_entry(sdata, &local->sub_if_list, list) {
+	/* we hold the RTNL here so can safely walk the list */
+	list_for_each_entry(sdata, &local->interfaces, list) {
 		if (sdata->dev != dev && netif_running(sdata->dev)) {
 			res = 0;
 			break;
 		}
 	}
-	read_unlock(&local->sub_if_lock);
 	return res;
 }
 
@@ -192,11 +199,10 @@ static int ieee80211_master_stop(struct 
 	struct ieee80211_local *local = wdev_priv(dev->ieee80211_ptr);
 	struct ieee80211_sub_if_data *sdata;
 
-	read_lock(&local->sub_if_lock);
-	list_for_each_entry(sdata, &local->sub_if_list, list)
+	/* we hold the RTNL here so can safely walk the list */
+	list_for_each_entry(sdata, &local->interfaces, list)
 		if (sdata->dev != dev && netif_running(sdata->dev))
 			dev_close(sdata->dev);
-	read_unlock(&local->sub_if_lock);
 
 	return 0;
 }
@@ -395,8 +401,8 @@ static int ieee80211_open(struct net_dev
 
 	sdata = IEEE80211_DEV_TO_SUB_IF(dev);
 
-	read_lock(&local->sub_if_lock);
-	list_for_each_entry(nsdata, &local->sub_if_list, list) {
+	/* we hold the RTNL here so can safely walk the list */
+	list_for_each_entry(nsdata, &local->interfaces, list) {
 		struct net_device *ndev = nsdata->dev;
 
 		if (ndev != dev && ndev != local->mdev && netif_running(ndev) &&
@@ -405,10 +411,8 @@ static int ieee80211_open(struct net_dev
 			 * check whether it may have the same address
 			 */
 			if (!identical_mac_addr_allowed(sdata->type,
-							nsdata->type)) {
-				read_unlock(&local->sub_if_lock);
+							nsdata->type))
 				return -ENOTUNIQ;
-			}
 
 			/*
 			 * can only add VLANs to enabled APs
@@ -419,7 +423,6 @@ static int ieee80211_open(struct net_dev
 				sdata->u.vlan.ap = nsdata;
 		}
 	}
-	read_unlock(&local->sub_if_lock);
 
 	switch (sdata->type) {
 	case IEEE80211_IF_TYPE_WDS:
@@ -541,14 +544,13 @@ static int ieee80211_stop(struct net_dev
 		del_timer_sync(&sdata->u.sta.timer);
 		del_timer_sync(&sdata->u.sta.admit_timer);
 		/*
-		 * Holding the sub_if_lock for writing here blocks
-		 * out the receive path and makes sure it's not
-		 * currently processing a packet that may get
-		 * added to the queue.
+		 * When we get here, the interface is marked down.
+		 * Call synchronize_rcu() to wait for the RX path
+		 * should it be using the interface and enqueuing
+		 * frames at this very time on another CPU.
 		 */
-		write_lock_bh(&local->sub_if_lock);
+		synchronize_rcu();
 		skb_queue_purge(&sdata->u.sta.skb_queue);
-		write_unlock_bh(&local->sub_if_lock);
 
 		if (!local->ops->hw_scan &&
 		    local->scan_dev == sdata->dev) {
@@ -1101,9 +1103,9 @@ void ieee80211_tx_status(struct ieee8021
 
 	rthdr->data_retries = status->retry_count;
 
-	read_lock(&local->sub_if_lock);
+	rcu_read_lock();
 	monitors = local->monitors;
-	list_for_each_entry(sdata, &local->sub_if_list, list) {
+	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
 		/*
 		 * Using the monitors counter is possibly racy, but
 		 * if the value is wrong we simply either clone the skb
@@ -1119,7 +1121,7 @@ void ieee80211_tx_status(struct ieee8021
 				continue;
 			monitors--;
 			if (monitors)
-				skb2 = skb_clone(skb, GFP_KERNEL);
+				skb2 = skb_clone(skb, GFP_ATOMIC);
 			else
 				skb2 = NULL;
 			skb->dev = sdata->dev;
@@ -1134,7 +1136,7 @@ void ieee80211_tx_status(struct ieee8021
 		}
 	}
  out:
-	read_unlock(&local->sub_if_lock);
+	rcu_read_unlock();
 	if (skb)
 		dev_kfree_skb(skb);
 }
@@ -1222,8 +1224,7 @@ struct ieee80211_hw *ieee80211_alloc_hw(
 
 	INIT_LIST_HEAD(&local->modes_list);
 
-	rwlock_init(&local->sub_if_lock);
-	INIT_LIST_HEAD(&local->sub_if_list);
+	INIT_LIST_HEAD(&local->interfaces);
 
 	INIT_DELAYED_WORK(&local->scan_work, ieee80211_sta_scan_work);
 	ieee80211_rx_bss_list_init(mdev);
@@ -1242,7 +1243,8 @@ struct ieee80211_hw *ieee80211_alloc_hw(
 	sdata->u.ap.force_unicast_rateidx = -1;
 	sdata->u.ap.max_ratectrl_rateidx = -1;
 	ieee80211_if_sdata_init(sdata);
-	list_add_tail(&sdata->list, &local->sub_if_list);
+	/* no RCU needed since we're still during init phase */
+	list_add_tail(&sdata->list, &local->interfaces);
 
 	tasklet_init(&local->tx_pending_tasklet, ieee80211_tx_pending,
 		     (unsigned long)local);
@@ -1401,7 +1403,6 @@ void ieee80211_unregister_hw(struct ieee
 {
 	struct ieee80211_local *local = hw_to_local(hw);
 	struct ieee80211_sub_if_data *sdata, *tmp;
-	struct list_head tmp_list;
 	int i;
 
 	tasklet_kill(&local->tx_pending_tasklet);
@@ -1415,11 +1416,12 @@ void ieee80211_unregister_hw(struct ieee
 	if (local->apdev)
 		ieee80211_if_del_mgmt(local);
 
-	write_lock_bh(&local->sub_if_lock);
-	list_replace_init(&local->sub_if_list, &tmp_list);
-	write_unlock_bh(&local->sub_if_lock);
-
-	list_for_each_entry_safe(sdata, tmp, &tmp_list, list)
+	/*
+	 * At this point, interface list manipulations are fine
+	 * because the driver cannot be handing us frames any
+	 * more and the tasklet is killed.
+	 */
+	list_for_each_entry_safe(sdata, tmp, &local->interfaces, list)
 		__ieee80211_if_del(local, sdata);
 
 	rtnl_unlock();
--- wireless-dev.orig/net/mac80211/ieee80211_i.h	2007-09-07 10:52:12.604441281 +0200
+++ wireless-dev/net/mac80211/ieee80211_i.h	2007-09-07 16:30:33.974429746 +0200
@@ -548,9 +548,8 @@ struct ieee80211_local {
 	ieee80211_rx_handler *rx_handlers;
 	ieee80211_tx_handler *tx_handlers;
 
-	rwlock_t sub_if_lock; /* Protects sub_if_list. Cannot be taken under
-			       * sta_bss_lock or sta_lock. */
-	struct list_head sub_if_list;
+	struct list_head interfaces;
+
 	int sta_scanning;
 	int scan_channel_idx;
 	enum { SCAN_SET_CHANNEL, SCAN_SEND_PROBE } scan_state;
--- wireless-dev.orig/net/mac80211/ieee80211_iface.c	2007-09-07 10:52:12.604441281 +0200
+++ wireless-dev/net/mac80211/ieee80211_iface.c	2007-09-07 16:30:34.244429746 +0200
@@ -79,16 +79,15 @@ int ieee80211_if_add(struct net_device *
 	ieee80211_debugfs_add_netdev(sdata);
 	ieee80211_if_set_type(ndev, type);
 
-	write_lock_bh(&local->sub_if_lock);
+	/* we're under RTNL so all this is fine */
 	if (unlikely(local->reg_state == IEEE80211_DEV_UNREGISTERED)) {
-		write_unlock_bh(&local->sub_if_lock);
 		__ieee80211_if_del(local, sdata);
 		return -ENODEV;
 	}
-	list_add(&sdata->list, &local->sub_if_list);
+	list_add_tail_rcu(&sdata->list, &local->interfaces);
+
 	if (new_dev)
 		*new_dev = ndev;
-	write_unlock_bh(&local->sub_if_lock);
 
 	return 0;
 
@@ -242,22 +241,22 @@ void ieee80211_if_reinit(struct net_devi
 		/* Remove all virtual interfaces that use this BSS
 		 * as their sdata->bss */
 		struct ieee80211_sub_if_data *tsdata, *n;
-		LIST_HEAD(tmp_list);
 
-		write_lock_bh(&local->sub_if_lock);
-		list_for_each_entry_safe(tsdata, n, &local->sub_if_list, list) {
+		list_for_each_entry_safe(tsdata, n, &local->interfaces, list) {
 			if (tsdata != sdata && tsdata->bss == &sdata->u.ap) {
 				printk(KERN_DEBUG "%s: removing virtual "
 				       "interface %s because its BSS interface"
 				       " is being removed\n",
 				       sdata->dev->name, tsdata->dev->name);
-				list_move_tail(&tsdata->list, &tmp_list);
+				list_del_rcu(&tsdata->list);
+				/*
+				 * We have lots of time and can afford
+				 * to sync for each interface
+				 */
+				synchronize_rcu();
+				__ieee80211_if_del(local, tsdata);
 			}
 		}
-		write_unlock_bh(&local->sub_if_lock);
-
-		list_for_each_entry_safe(tsdata, n, &tmp_list, list)
-			__ieee80211_if_del(local, tsdata);
 
 		kfree(sdata->u.ap.beacon_head);
 		kfree(sdata->u.ap.beacon_tail);
@@ -334,18 +333,16 @@ int ieee80211_if_remove(struct net_devic
 
 	ASSERT_RTNL();
 
-	write_lock_bh(&local->sub_if_lock);
-	list_for_each_entry_safe(sdata, n, &local->sub_if_list, list) {
+	list_for_each_entry_safe(sdata, n, &local->interfaces, list) {
 		if ((sdata->type == id || id == -1) &&
 		    strcmp(name, sdata->dev->name) == 0 &&
 		    sdata->dev != local->mdev) {
-			list_del(&sdata->list);
-			write_unlock_bh(&local->sub_if_lock);
+			list_del_rcu(&sdata->list);
+			synchronize_rcu();
 			__ieee80211_if_del(local, sdata);
 			return 0;
 		}
 	}
-	write_unlock_bh(&local->sub_if_lock);
 	return -ENODEV;
 }
 
--- wireless-dev.orig/net/mac80211/ieee80211_sta.c	2007-09-07 10:52:12.634441281 +0200
+++ wireless-dev/net/mac80211/ieee80211_sta.c	2007-09-07 10:57:23.574441281 +0200
@@ -3597,8 +3597,8 @@ void ieee80211_scan_completed(struct iee
 	memset(&wrqu, 0, sizeof(wrqu));
 	wireless_send_event(dev, SIOCGIWSCAN, &wrqu, NULL);
 
-	read_lock(&local->sub_if_lock);
-	list_for_each_entry(sdata, &local->sub_if_list, list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
 
 		/* No need to wake the master device. */
 		if (sdata->dev == local->mdev)
@@ -3612,7 +3612,7 @@ void ieee80211_scan_completed(struct iee
 
 		netif_wake_queue(sdata->dev);
 	}
-	read_unlock(&local->sub_if_lock);
+	rcu_read_unlock();
 
 	sdata = IEEE80211_DEV_TO_SUB_IF(dev);
 	if (sdata->type == IEEE80211_IF_TYPE_IBSS) {
@@ -3749,8 +3749,8 @@ static int ieee80211_sta_start_scan(stru
 
 	local->sta_scanning = 1;
 
-	read_lock(&local->sub_if_lock);
-	list_for_each_entry(sdata, &local->sub_if_list, list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
 
 		/* Don't stop the master interface, otherwise we can't transmit
 		 * probes! */
@@ -3762,7 +3762,7 @@ static int ieee80211_sta_start_scan(stru
 		    (sdata->u.sta.flags & IEEE80211_STA_ASSOCIATED))
 			ieee80211_send_nullfunc(local, sdata, 1);
 	}
-	read_unlock(&local->sub_if_lock);
+	rcu_read_unlock();
 
 	if (ssid) {
 		local->scan_ssid_len = ssid_len;
--- wireless-dev.orig/net/mac80211/rx.c	2007-09-07 10:52:12.654441281 +0200
+++ wireless-dev/net/mac80211/rx.c	2007-09-07 16:30:34.144429746 +0200
@@ -1522,8 +1522,9 @@ void __ieee80211_rx(struct ieee80211_hw 
 	}
 
 	/*
-	 * key references are protected using RCU and this requires that
-	 * we are in a read-site RCU section during receive processing
+	 * key references and virtual interfaces are protected using RCU
+	 * and this requires that we are in a read-side RCU section during
+	 * receive processing
 	 */
 	rcu_read_lock();
 
@@ -1578,8 +1579,7 @@ void __ieee80211_rx(struct ieee80211_hw 
 
 	bssid = ieee80211_get_bssid(hdr, skb->len - radiotap_len);
 
-	read_lock(&local->sub_if_lock);
-	list_for_each_entry(sdata, &local->sub_if_list, list) {
+	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
 		rx.flags |= IEEE80211_TXRXD_RXRA_MATCH;
 
 		if (!netif_running(sdata->dev))
@@ -1632,7 +1632,6 @@ void __ieee80211_rx(struct ieee80211_hw 
 					     &rx, sta);
 	} else
 		dev_kfree_skb(skb);
-	read_unlock(&local->sub_if_lock);
 
  end:
 	rcu_read_unlock();
--- wireless-dev.orig/net/mac80211/tx.c	2007-09-07 10:52:12.674441281 +0200
+++ wireless-dev/net/mac80211/tx.c	2007-09-07 12:20:51.174437343 +0200
@@ -291,8 +291,12 @@ static void purge_old_ps_buffers(struct 
 	struct ieee80211_sub_if_data *sdata;
 	struct sta_info *sta;
 
-	read_lock(&local->sub_if_lock);
-	list_for_each_entry(sdata, &local->sub_if_list, list) {
+	/*
+	 * virtual interfaces are protected by RCU
+	 */
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
 		struct ieee80211_if_ap *ap;
 		if (sdata->dev == local->mdev ||
 		    sdata->type != IEEE80211_IF_TYPE_AP)
@@ -305,7 +309,7 @@ static void purge_old_ps_buffers(struct 
 		}
 		total += skb_queue_len(&ap->ps_bc_buf);
 	}
-	read_unlock(&local->sub_if_lock);
+	rcu_read_unlock();
 
 	read_lock_bh(&local->sta_lock);
 	list_for_each_entry(sta, &local->sta_list, list) {
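
The publish side of the scheme in this patch (writers serialised by one lock, readers walking the list without taking it) can be sketched in userspace with C11 atomics. This is only an analogy under stated assumptions: the toy_iface types and functions are hypothetical, insertion is at the head rather than the tail for brevity, and nothing here models grace periods, so there is no equivalent of synchronize_rcu() or list_del_rcu().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical interface node; "next" is read by lockless readers. */
struct toy_iface {
	int id;
	_Atomic(struct toy_iface *) next;
};

struct toy_iface_list {
	_Atomic(struct toy_iface *) head;
};

static void toy_list_init(struct toy_iface_list *l)
{
	atomic_init(&l->head, NULL);
}

static void toy_iface_init(struct toy_iface *n, int id)
{
	n->id = id;
	atomic_init(&n->next, NULL);
}

/*
 * Writer side. Callers are assumed to be serialised among themselves
 * by an external lock (the patch relies on the RTNL for this).
 * The node is fully initialised first, then made reachable with a
 * release store so readers never observe a half-built node.
 */
static void toy_iface_publish(struct toy_iface_list *l, struct toy_iface *n)
{
	assert(n != NULL);
	atomic_store_explicit(&n->next,
			      atomic_load_explicit(&l->head,
						   memory_order_relaxed),
			      memory_order_relaxed);
	atomic_store_explicit(&l->head, n, memory_order_release);
}

/* Reader side: walks the list without taking any lock. */
static int toy_iface_count(struct toy_iface_list *l)
{
	int count = 0;
	struct toy_iface *p;

	for (p = atomic_load_explicit(&l->head, memory_order_acquire);
	     p != NULL;
	     p = atomic_load_explicit(&p->next, memory_order_acquire))
		count++;
	return count;
}
```

Removal is the hard part that this sketch leaves out: a node may only be freed once no reader can still be looking at it, which is the job synchronize_rcu() does in the patch.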




* problems with lockd in 2.6.22.6
From: Wolfgang Walter @ 2007-09-07 15:49 UTC (permalink / raw)
  To: neilb; +Cc: netdev, nfs

Hello,

we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since then we get the message

lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from ^\\236^\É^D

1) These random characters in the second line are caused by a bug in svc_tcp_accept.
I already posted this patch on netdev@vger.kernel.org:

Signed-off-by: Wolfgang Walter <wolfgang.walter@studentenwerk.mhn.de>
--- linux-2.6.22.6/net/sunrpc/svcsock.c	2007-08-27 18:10:14.000000000 +0200
+++ linux-2.6.22.6w/net/sunrpc/svcsock.c	2007-09-03 18:27:30.000000000 +0200
@@ -1090,7 +1090,7 @@
 						   serv->sv_name);
 				printk(KERN_NOTICE
 				       "%s: last TCP connect from %s\n",
-				       serv->sv_name, buf);
+				       serv->sv_name, __svc_print_addr(sin, buf, sizeof(buf)));
 			}
 			/*
 			 * Always select the oldest socket. It's not fair,

with this patch applied one gets something like

lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from 10.11.0.12, port=784

2) The number of nfsd threads we are running on the machine is 1024. So this is not
the problem. It seems, though, that in the case of lockd, svc_tcp_accept does not
check the number of nfsd threads but the number of lockd threads, which is one.
As soon as the number of open lockd sockets surpasses 80, this message gets logged.
This usually happens every evening when a lot of people shut down their workstations.

3) For an unknown reason these sockets then remain open. In the morning, when people
start their workstations again, we therefore not only get a lot of these messages
again, but often the nfs-server does not work properly any more. Restarting the
nfs-daemon is a workaround.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


* Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170
From: Michael Buesch @ 2007-09-07 16:01 UTC (permalink / raw)
  To: Johannes Berg
  Cc: paulmck, Herbert Xu, satyam, flo, linux-kernel, netdev,
	linux-wireless, michal.k.k.piotrowski, ipw3945-devel, yi.zhu,
	flamingice
In-Reply-To: <1189171635.28781.134.camel@johannes.berg>

On Friday 07 September 2007, Johannes Berg wrote:
> On Thu, 2007-09-06 at 08:46 -0700, Paul E. McKenney wrote:
> 
> > Looks good to me from an RCU viewpoint.  I cannot claim familiarity with
> > this code.  I therefore especially like the indications of where RTNL
> > is held and not!!!
> 
> :)
> 
> > Some questions below based on a quick scan.  And a global question:
> > should the comments about RTNL being held be replaced by ASSERT_RTNL()?
> 
> I don't like ASSERT_RTNL() much because it actually tries to lock it.
> I'd be much happier if it was WARN_ON(!mutex_locked(&rtnl_mutex)) or
> something equivalent.

What's the problem with trying to lock it?
In the paths where you insert this assertion, you will be locked.
So the trylock will fail and not cause any blocking or something else.
It's basically not more expensive than your mutex_locked test.
And the !mutex_locked test might not work on UP (not sure about
the current implementation).


* PATCH  to bug #8876
From: Nikolay Kopitonenko @ 2007-09-07 14:46 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

Hi there!

Below is a fix for this:
http://bugzilla.kernel.org/show_bug.cgi?id=8876


Applies to any version from 2.6.22 up to the latest, 2.6.23-rc5-git1.

please apply :)


-------------------------CUT---------------------
diff -urN a/net/ipv4/devinet.c b/net/ipv4/devinet.c
--- a/net/ipv4/devinet.c	2007-07-09 02:32:17.000000000 +0300
+++ b/net/ipv4/devinet.c	2007-08-10 20:33:22.000000000 +0300
@@ -1193,7 +1193,7 @@
 		for (ifa = in_dev->ifa_list, ip_idx = 0; ifa;
 		     ifa = ifa->ifa_next, ip_idx++) {
 			if (ip_idx < s_ip_idx)
-				goto cont;
+				continue;
 			if (inet_fill_ifaddr(skb, ifa, NETLINK_CB(cb->skb).pid,
 					     cb->nlh->nlmsg_seq,
 					     RTM_NEWADDR, NLM_F_MULTI) <= 0)
-------------------------/CUT---------------------

Signed-off-by: Nikolay.Kopitonenko@yourserveradmin.com


Thanks

Nikolay Kopitonenko
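
The difference between the two statements in the patch can be shown with a toy model of a resumable dump. The hunk does not show where the cont: label sits, so this sketch assumes it is at the end of the enclosing per-device loop; the function names, the array of per-device address counts and the resume index are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many addresses get emitted when a dump resumes at
 * address index s_ip_idx. Jumping to the outer label abandons the
 * whole inner loop as soon as one entry is below the resume index.
 */
static int toy_dump_goto(const int *addrs_per_dev, int ndev, int s_ip_idx)
{
	int emitted = 0;
	int idx, ip_idx;

	assert(addrs_per_dev != NULL);
	for (idx = 0; idx < ndev; idx++) {
		for (ip_idx = 0; ip_idx < addrs_per_dev[idx]; ip_idx++) {
			if (ip_idx < s_ip_idx)
				goto cont;  /* leaves the inner loop */
			emitted++;
		}
cont:
		;
	}
	return emitted;
}

/* With "continue", only the already-dumped entries are skipped. */
static int toy_dump_continue(const int *addrs_per_dev, int ndev, int s_ip_idx)
{
	int emitted = 0;
	int idx, ip_idx;

	assert(addrs_per_dev != NULL);
	for (idx = 0; idx < ndev; idx++) {
		for (ip_idx = 0; ip_idx < addrs_per_dev[idx]; ip_idx++) {
			if (ip_idx < s_ip_idx)
				continue;  /* skip this entry only */
			emitted++;
		}
	}
	return emitted;
}
```

For one device with five addresses and a resume index of two, the goto variant emits nothing while the continue variant emits the remaining three; with a resume index of zero both emit all five.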


* Re: [NFS] problems with lockd in 2.6.22.6
From: J. Bruce Fields @ 2007-09-07 16:19 UTC (permalink / raw)
  To: Wolfgang Walter; +Cc: neilb, netdev, nfs
In-Reply-To: <200709071749.55760.wolfgang.walter@studentenwerk.mhn.de>

On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> Hello,
> 
> we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since then we get the message
> 
> lockd: too many open TCP sockets, consider increasing the number of nfsd threads
> lockd: last TCP connect from ^\\236^\É^D
> 
> 1) These random characters in the second line are caused by a bug in svc_tcp_accept.
> I already posted this patch on netdev@vger.kernel.org:

Thanks, I've applied that.  (The bug is a little subtle: there are
actually two previous __svc_print_addr() calls which might have
initialized "buf" correctly, and it's not obvious that the second isn't
always called (since it's in a dprintk, which is a macro that expands
into a printk inside a conditional)).
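
A minimal userspace sketch of that subtlety, assuming a debug macro
that wraps its print call in a conditional. toy_debug, toy_dprintk(),
toy_print_addr() and toy_accept() are hypothetical names, not the
sunrpc code. The buffer is zero-initialized here so the sketch has
defined behaviour; in the reported bug it was not.

```c
#include <assert.h>
#include <stdio.h>

static int toy_debug;	/* hypothetical debug switch, off by default */

/* dprintk-like macro: its arguments are evaluated only when enabled. */
#define toy_dprintk(...) \
	do { if (toy_debug) printf(__VA_ARGS__); } while (0)

/* Fills buf as a side effect and returns it. */
static char *toy_print_addr(char *buf, size_t len)
{
	snprintf(buf, len, "10.11.0.12, port=784");
	return buf;
}

/* Returns 1 if the debug statement filled in buf, 0 otherwise. */
static int toy_accept(void)
{
	char buf[64] = "";

	toy_dprintk("connect from %s\n", toy_print_addr(buf, sizeof(buf)));
	return buf[0] != '\0';
}
```

With the switch off, the helper is never called, so any later code that
prints buf cannot rely on the debug statement having initialized it.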

> with this patch applied one gets something like
> 
> lockd: too many open TCP sockets, consider increasing the number of
> nfsd threads lockd: last TCP connect from 10.11.0.12, port=784
> 
> 
> 2) The number of nfsd threads we are running on the machine is 1024.
> So this is not the problem. It seems, though, that in the case of
> lockd svc_tcp_accept does not check the number of nfsd threads but the
> number of lockd threads which is one.  As soon as the number of open
> lockd sockets surpasses 80 this message gets logged.  This usually
> happens every evening when a lot of people shut down their workstations.

So to be clear: there's not an actual problem here other than that the
logs are getting spammed?  (Not that that isn't a problem in itself.)

> 3) For an unknown reason these sockets then remain open. In the morning
> when people start their workstations again we therefore not only get a
> lot of these messages again but often the nfs-server does not work
> properly any more. Restarting the nfs-daemon is a workaround.

Hm, thanks.

--b.

^ permalink raw reply

* Re: 2.6.23-rc4-mm1: e1000e napi lockup
From: Kok, Auke @ 2007-09-07 16:24 UTC (permalink / raw)
  To: David Miller, jirislaby; +Cc: akpm, netdev, e1000-devel
In-Reply-To: <20070907.010338.41638771.davem@davemloft.net>

David Miller wrote:
> From: Jiri Slaby <jirislaby@gmail.com>
> Date: Fri, 07 Sep 2007 09:19:30 +0200
> 
>> I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver.
>> napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.
> 
> Yes, the semantics changed slightly in the net-2.6.24 tree the
> other week and someone needs to fix it up.
> 
> The netif_napi_add() implicitly does a napi_disable() call.  Device
> open must explicitly napi_enable() and device close must explicitly
> napi_disable(), and if done elsewhere these calls must be strictly
> balanced.
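
The balancing rule quoted above can be modelled with a toy state
machine. This is only an illustration of the described semantics, not
the kernel implementation; struct toy_napi and the toy_napi_*()
helpers are hypothetical, and the "would hang" case is reported as a
return value instead of actually waiting.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; not the kernel's struct napi_struct. */
struct toy_napi {
	bool disabled;
};

/* Registration leaves the context disabled, as the message describes. */
static void toy_napi_add(struct toy_napi *n)
{
	n->disabled = true;
}

static void toy_napi_enable(struct toy_napi *n)
{
	n->disabled = false;
}

/*
 * Returns false for an unbalanced call: the context is already
 * disabled, so there is no earlier enable for this disable to pair
 * with.  In this model that is where the reported freeze would occur.
 */
static bool toy_napi_disable(struct toy_napi *n)
{
	if (n->disabled)
		return false;
	n->disabled = true;
	return true;
}
```

In this model a disable issued right after registration, as in the
probe path reported above, is the unbalanced case; enable in open and
disable in close pair up cleanly.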

I'll fix it... it's my patch that adds the new napi code to it and I need to get
it ready for the merge window anyway.

Thanks for testing.

Auke

^ permalink raw reply

* Re: [PATCH] Fix e100 on systems that have cache incoherent DMA
From: Kok, Auke @ 2007-09-07 16:31 UTC (permalink / raw)
  To: David Acker
  Cc: John Ronciak, Jesse Brandeburg, Jeff Kirsher, Milton Miller,
	Jeff Garzik, netdev, e1000-devel, Scott Feldman
In-Reply-To: <20070831205430.7209E46C20E@localhost>

David Acker wrote:
> On the systems that have cache incoherent DMA, including ARM, there is a
> race condition between software allocating a new receive buffer and hardware
> writing into a buffer.  The two race on touching the last Receive Frame
> Descriptor (RFD).  It has its el-bit set and its next link equal to 0.
> When hardware encounters this buffer it attempts to write data to it and
> then update Status Word bits and Actual Count in the RFD.  At the same time
> software may try to clear the el-bit and set the link address to a new buffer.
> 
> Since the entire RFD is one cache line, the two write operations can collide.
> This can lead to the receive unit stalling or interpreting random memory as
> its receive area.
> 
> The fix is to set the el-bit on and the size to 0 on the next to last buffer
> in the chain.  When the hardware encounters this buffer it stops and does not
> write to it at all.  The hardware issues an RNR interrupt with the receive
> unit in the No Resources state.  Software can write to the tail of the list
> because it knows hardware will stop on the previous descriptor that was
> marked as the end of list.
> 
> Once it has a new next to last buffer prepared, it can clear the el-bit and
> set the size on the previous one.  The race on this buffer is safe since
> the link already points to a valid next buffer and the software can handle
> the race setting the size (assuming aligned 16 bit writes are atomic with
> respect to the DMA read). If the hardware sees the el-bit cleared without
> the size set, it will move on to the next buffer and skip this one.  If it
> sees the size set but the el-bit still set, it will complete that buffer
> and then RNR interrupt and wait.
> 
> Flags are kept in the software descriptor to note if the el bit is set and if
> the size was 0.  When software clears the RFD's el bit and sets its size, it
> also clears the el flag but leaves the size-was-0 flag set.  This way software
> can identify descriptors where the race may have occurred when cleaning the ring.
> On these descriptors, it looks ahead and if the next one is complete then
> hardware must have skipped the current one.  Logic is added to prevent two
> packets in a row being marked while the receiver is running to avoid running
> in lockstep with the hardware and thereby limiting the required lookahead.
> 
> This is a patch for 2.6.23-rc4.
> 
> Signed-off-by: David Acker <dacker@roinet.com>
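
The hardware behaviour the description relies on can be summarized as
a small decision table. This sketch only encodes the four cases as
stated in the text above; struct toy_rfd, enum toy_hw_action and
toy_hw_sees() are hypothetical names and not part of the e100 driver
or the device documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_hw_action {
	TOY_USE,		/* fill the buffer and move on */
	TOY_STOP,		/* stop without writing, raise RNR */
	TOY_SKIP,		/* move on to the next buffer, leaving this one unused */
	TOY_USE_THEN_STOP,	/* complete this buffer, then RNR and wait */
};

/* Only the two fields the race is about. */
struct toy_rfd {
	bool el;	/* end-of-list bit */
	uint16_t size;	/* 0 means hardware must not write here */
};

/* What hardware does with a descriptor, per the description above. */
static enum toy_hw_action toy_hw_sees(const struct toy_rfd *rfd)
{
	if (rfd->el)
		return rfd->size ? TOY_USE_THEN_STOP : TOY_STOP;
	return rfd->size ? TOY_USE : TOY_SKIP;
}
```

The two mixed states (el cleared with size 0, el set with size
restored) are the intermediate views hardware can catch while software
is unmarking a descriptor, which is why the cleanup path has to look
ahead.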


First impressions are not good: pings are erratic and shoot up to 3 seconds. In
an overnight stress test, the receive unit went offline and never came back up
(TX still working).

It sounds like something in the logic is suspending the RU too much, but I
haven't had time to look deeply into the code yet.

Auke


> 
> ---
> 
> --- linux-2.6.23-rc4/drivers/net/e100.c.orig	2007-08-30 13:32:10.000000000 -0400
> +++ linux-2.6.23-rc4/drivers/net/e100.c	2007-08-30 15:42:07.000000000 -0400
> @@ -106,6 +106,13 @@
>   *	the RFD, the RFD must be dma_sync'ed to maintain a consistent
>   *	view from software and hardware.
>   *
> + *	In order to keep updates to the RFD link field from colliding with
> + *	hardware writes to mark packets complete, we use the feature that
> + *	hardware will not write to a size 0 descriptor and mark the previous
> + *	packet as end-of-list (EL).   After updating the link, we remove EL
> + *	and only then restore the size such that hardware may use the
> + *	previous-to-end RFD. 
> + *
>   *	Under typical operation, the  receive unit (RU) is start once,
>   *	and the controller happily fills RFDs as frames arrive.  If
>   *	replacement RFDs cannot be allocated, or the RU goes non-active,
> @@ -281,14 +288,14 @@ struct csr {
>  };
>  
>  enum scb_status {
> +	rus_no_res       = 0x08,
>  	rus_ready        = 0x10,
>  	rus_mask         = 0x3C,
>  };
>  
>  enum ru_state  {
> -	RU_SUSPENDED = 0,
> -	RU_RUNNING	 = 1,
> -	RU_UNINITIALIZED = -1,
> +	ru_stopped = 0,
> +	ru_running = 1,
>  };
>  
>  enum scb_stat_ack {
> @@ -401,10 +408,16 @@ struct rfd {
>  	u16 size;
>  };
>  
> +enum rx_flags {
> +	rx_el = 0x01,
> +	rx_s0 = 0x02,
> +};
> +
>  struct rx {
>  	struct rx *next, *prev;
>  	struct sk_buff *skb;
>  	dma_addr_t dma_addr;
> +	u8 flags;
>  };
>  
>  #if defined(__BIG_ENDIAN_BITFIELD)
> @@ -952,7 +965,7 @@ static void e100_get_defaults(struct nic
>  		((nic->mac >= mac_82558_D101_A4) ? cb_cid : cb_i));
>  
>  	/* Template for a freshly allocated RFD */
> -	nic->blank_rfd.command = cpu_to_le16(cb_el);
> +	nic->blank_rfd.command = 0;
>  	nic->blank_rfd.rbd = 0xFFFFFFFF;
>  	nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN);
>  
> @@ -1753,18 +1766,48 @@ static int e100_alloc_cbs(struct nic *ni
>  	return 0;
>  }
>  
> -static inline void e100_start_receiver(struct nic *nic, struct rx *rx)
> +static void e100_find_mark_el(struct nic *nic, struct rx *marked_rx, int is_running)
>  {
> -	if(!nic->rxs) return;
> -	if(RU_SUSPENDED != nic->ru_running) return;
> +	struct rx *rx = nic->rx_to_use->prev->prev;
> +	struct rfd *rfd;
> +
> +	if (marked_rx == rx)
> +		return;
> +
> +	rfd = (struct rfd *) rx->skb->data;
> +	rfd->command |= cpu_to_le16(cb_el);
> +	rfd->size = 0;
> +	pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr,
> +		sizeof(struct rfd), PCI_DMA_BIDIRECTIONAL);
> +	rx->flags |= (rx_el | rx_s0);
> +
> +	if (!marked_rx)
> +		return;
> +
> +	rfd = (struct rfd *) marked_rx->skb->data;
> +	rfd->command &= ~cpu_to_le16(cb_el);
> +	pci_dma_sync_single_for_device(nic->pdev, marked_rx->dma_addr,
> +		sizeof(struct rfd), PCI_DMA_BIDIRECTIONAL);
> +
> +	rfd->size = cpu_to_le16(VLAN_ETH_FRAME_LEN);
> +	pci_dma_sync_single_for_device(nic->pdev, marked_rx->dma_addr,
> +		sizeof(struct rfd), PCI_DMA_BIDIRECTIONAL);
>  
> -	/* handle init time starts */
> -	if(!rx) rx = nic->rxs;
> +	if (is_running)
> +		marked_rx->flags &= ~rx_el;
> +	else
> +		marked_rx->flags &= ~(rx_el | rx_s0);
> +}
> +
> +static inline void e100_start_receiver(struct nic *nic)
> +{
> +	if(!nic->rxs) return;
> +	if (ru_stopped != nic->ru_running) return;
>  
>  	/* (Re)start RU if suspended or idle and RFA is non-NULL */
> -	if(rx->skb) {
> -		e100_exec_cmd(nic, ruc_start, rx->dma_addr);
> -		nic->ru_running = RU_RUNNING;
> +	if (nic->rx_to_clean->skb) {
> +		e100_exec_cmd(nic, ruc_start, nic->rx_to_clean->dma_addr);
> +		nic->ru_running = ru_running;
>  	}
>  }
>  
> @@ -1793,8 +1836,6 @@ static int e100_rx_alloc_skb(struct nic 
>  		struct rfd *prev_rfd = (struct rfd *)rx->prev->skb->data;
>  		put_unaligned(cpu_to_le32(rx->dma_addr),
>  			(u32 *)&prev_rfd->link);
> -		wmb();
> -		prev_rfd->command &= ~cpu_to_le16(cb_el);
>  		pci_dma_sync_single_for_device(nic->pdev, rx->prev->dma_addr,
>  			sizeof(struct rfd), PCI_DMA_TODEVICE);
>  	}
> @@ -1808,6 +1849,7 @@ static int e100_rx_indicate(struct nic *
>  	struct sk_buff *skb = rx->skb;
>  	struct rfd *rfd = (struct rfd *)skb->data;
>  	u16 rfd_status, actual_size;
> +	u8 status;
>  
>  	if(unlikely(work_done && *work_done >= work_to_do))
>  		return -EAGAIN;
> @@ -1819,9 +1861,47 @@ static int e100_rx_indicate(struct nic *
>  
>  	DPRINTK(RX_STATUS, DEBUG, "status=0x%04X\n", rfd_status);
>  
> -	/* If data isn't ready, nothing to indicate */
> -	if(unlikely(!(rfd_status & cb_complete)))
> +	/* 
> +	 * If data isn't ready, nothing to indicate
> +	 * If both the el and s0 rx flags are set, we have hit the marked
> +	 * buffer but we don't know if hardware has seen it so we check
> +	 * the status.
> +	 * If only the s0 flag is set, we check the next buffer.
> +	 * If it is complete, we know that hardware saw the rfd el bit
> +	 * get cleared but did not see the rfd size get set so it
> +	 * skipped this buffer.  We just return 0 and look at the
> +	 * next buffer.
> +	 * If only the s0 flag is set but the next buffer is
> +	 * not complete, we cleared the el flag as hardware
> +	 * hit this buffer.
> +	 */
> +	if (unlikely(!(rfd_status & cb_complete))) {
> +		u8 maskedFlags = rx->flags & (rx_el | rx_s0);
> +		if (maskedFlags == (rx_el | rx_s0)) {
> +			status = readb(&nic->csr->scb.status);
> +			if (status & rus_no_res)
> +				nic->ru_running = ru_stopped;
> +		} else if (maskedFlags == rx_s0) {
> +			struct rx *next_rx = rx->next;
> +			struct rfd *next_rfd = (struct rfd *)next_rx->skb->data;
> +			pci_dma_sync_single_for_cpu(nic->pdev,
> +				next_rx->dma_addr, sizeof(struct rfd),
> +				PCI_DMA_FROMDEVICE);
> +			if (next_rfd->status & cpu_to_le16(cb_complete)) {
> +				pci_unmap_single(nic->pdev, rx->dma_addr,
> +					RFD_BUF_LEN, PCI_DMA_FROMDEVICE);
> +				dev_kfree_skb_any(skb);
> +				rx->skb = NULL;
> +				rx->flags &= ~rx_s0;
> +				return 0;
> +			} else {
> +				status = readb(&nic->csr->scb.status);
> +				if (status & rus_no_res)
> +					nic->ru_running = ru_stopped;
> +			}
> +		}
>  		return -ENODATA;
> +	}
>  
>  	/* Get actual data size */
>  	actual_size = le16_to_cpu(rfd->actual_size) & 0x3FFF;
> @@ -1832,9 +1912,15 @@ static int e100_rx_indicate(struct nic *
>  	pci_unmap_single(nic->pdev, rx->dma_addr,
>  		RFD_BUF_LEN, PCI_DMA_FROMDEVICE);
>  
> -	/* this allows for a fast restart without re-enabling interrupts */
> -	if(le16_to_cpu(rfd->command) & cb_el)
> -		nic->ru_running = RU_SUSPENDED;
> +	/*
> +	 * This happens when hardware sees the rfd el flag set
> +	 * but then sees the rfd size set as well
> +	 */
> +	if (le16_to_cpu(rfd->command) & cb_el) {
> +		status = readb(&nic->csr->scb.status);
> +		if (status & rus_no_res)
> +			nic->ru_running = ru_stopped;
> +	}
>  
>  	/* Pull off the RFD and put the actual data (minus eth hdr) */
>  	skb_reserve(skb, sizeof(struct rfd));
> @@ -1865,32 +1951,34 @@ static int e100_rx_indicate(struct nic *
>  static void e100_rx_clean(struct nic *nic, unsigned int *work_done,
>  	unsigned int work_to_do)
>  {
> -	struct rx *rx;
> +	struct rx *rx, *marked_rx;
>  	int restart_required = 0;
> -	struct rx *rx_to_start = NULL;
> -
> -	/* are we already rnr? then pay attention!!! this ensures that
> -	 * the state machine progression never allows a start with a
> -	 * partially cleaned list, avoiding a race between hardware
> -	 * and rx_to_clean when in NAPI mode */
> -	if(RU_SUSPENDED == nic->ru_running)
> -		restart_required = 1;
> +	int err = 0;
>  
>  	/* Indicate newly arrived packets */
>  	for(rx = nic->rx_to_clean; rx->skb; rx = nic->rx_to_clean = rx->next) {
> -		int err = e100_rx_indicate(nic, rx, work_done, work_to_do);
> -		if(-EAGAIN == err) {
> -			/* hit quota so have more work to do, restart once
> -			 * cleanup is complete */
> -			restart_required = 0;
> +		err = e100_rx_indicate(nic, rx, work_done, work_to_do);
> +		/* Hit quota or no more to clean */
> +		if(-EAGAIN == err || -ENODATA == err)
>  			break;
> -		} else if(-ENODATA == err)
> -			break; /* No more to clean */
>  	}
>  
> -	/* save our starting point as the place we'll restart the receiver */
> -	if(restart_required)
> -		rx_to_start = nic->rx_to_clean;
> +	/*
> +	 * On EAGAIN, hit quota so have more work to do, restart once
> +	 * cleanup is complete.
> +	 * Else, are we already rnr? then pay attention!!! this ensures that
> +	 * the state machine progression never allows a start with a
> +	 * partially cleaned list, avoiding a race between hardware
> +	 * and rx_to_clean when in NAPI mode
> +	 */
> +	if(-EAGAIN != err && ru_stopped == nic->ru_running)
> +		restart_required = 1;
> +
> +	marked_rx = nic->rx_to_use->prev->prev;
> +	if (!(marked_rx->flags & rx_el)) {
> +		marked_rx = marked_rx->prev;
> +		BUG_ON(!(marked_rx->flags & rx_el));
> +	}
>  
>  	/* Alloc new skbs to refill list */
>  	for(rx = nic->rx_to_use; !rx->skb; rx = nic->rx_to_use = rx->next) {
> @@ -1898,10 +1986,12 @@ static void e100_rx_clean(struct nic *ni
>  			break; /* Better luck next time (see watchdog) */
>  	}
>  
> +	e100_find_mark_el(nic, marked_rx, !restart_required);
> +
>  	if(restart_required) {
>  		// ack the rnr?
>  		writeb(stat_ack_rnr, &nic->csr->scb.stat_ack);
> -		e100_start_receiver(nic, rx_to_start);
> +		e100_start_receiver(nic);
>  		if(work_done)
>  			(*work_done)++;
>  	}
> @@ -1912,8 +2002,6 @@ static void e100_rx_clean_list(struct ni
>  	struct rx *rx;
>  	unsigned int i, count = nic->params.rfds.count;
>  
> -	nic->ru_running = RU_UNINITIALIZED;
> -
>  	if(nic->rxs) {
>  		for(rx = nic->rxs, i = 0; i < count; rx++, i++) {
>  			if(rx->skb) {
> @@ -1935,7 +2023,6 @@ static int e100_rx_alloc_list(struct nic
>  	unsigned int i, count = nic->params.rfds.count;
>  
>  	nic->rx_to_use = nic->rx_to_clean = NULL;
> -	nic->ru_running = RU_UNINITIALIZED;
>  
>  	if(!(nic->rxs = kcalloc(count, sizeof(struct rx), GFP_ATOMIC)))
>  		return -ENOMEM;
> @@ -1950,7 +2037,9 @@ static int e100_rx_alloc_list(struct nic
>  	}
>  
>  	nic->rx_to_use = nic->rx_to_clean = nic->rxs;
> -	nic->ru_running = RU_SUSPENDED;
> +	nic->ru_running = ru_stopped;
> +
> +	e100_find_mark_el(nic, NULL, 0);
>  
>  	return 0;
>  }
> @@ -1971,8 +2060,8 @@ static irqreturn_t e100_intr(int irq, vo
>  	iowrite8(stat_ack, &nic->csr->scb.stat_ack);
>  
>  	/* We hit Receive No Resource (RNR); restart RU after cleaning */
> -	if(stat_ack & stat_ack_rnr)
> -		nic->ru_running = RU_SUSPENDED;
> +	if (stat_ack & stat_ack_rnr)
> +		nic->ru_running = ru_stopped;
>  
>  	if(likely(netif_rx_schedule_prep(netdev))) {
>  		e100_disable_irq(nic);
> @@ -2065,7 +2154,7 @@ static int e100_up(struct nic *nic)
>  	if((err = e100_hw_init(nic)))
>  		goto err_clean_cbs;
>  	e100_set_multicast_list(nic->netdev);
> -	e100_start_receiver(nic, NULL);
> +	e100_start_receiver(nic);
>  	mod_timer(&nic->watchdog, jiffies);
>  	if((err = request_irq(nic->pdev->irq, e100_intr, IRQF_SHARED,
>  		nic->netdev->name, nic->netdev)))
> @@ -2146,7 +2235,7 @@ static int e100_loopback_test(struct nic
>  		mdio_write(nic->netdev, nic->mii.phy_id, MII_BMCR,
>  			BMCR_LOOPBACK);
>  
> -	e100_start_receiver(nic, NULL);
> +	e100_start_receiver(nic);
>  
>  	if(!(skb = netdev_alloc_skb(nic->netdev, ETH_DATA_LEN))) {
>  		err = -ENOMEM;

^ permalink raw reply

* BUG: skge ethernet breakage (PCI: Unable to reserve mem region)
From: Jan Gukelberger @ 2007-09-07 16:42 UTC (permalink / raw)
  To: shemminger; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 1688 bytes --]

Hi,

I originally reported this bug to the Debian BTS:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=441232
There I was told to talk directly to upstream.

I am pasting the original bug report below. The referenced text files
can be found at the mentioned BTS URL. 
Additionally, I have just tried Linux 2.6.23-rc5 and am attaching the
corresponding dmesg output.

Thanks in advance for any help,
Jan


With recent kernels my on-board network adapter does not work any more.
This is a Marvell Gigabit Ethernet Controller on an Asus P5B-V
mainboard.

This is the same bug as #428452 reported earlier against
linux-image-2.6.21-1-amd64. (Sorry for the duplicate, I didn't know how
to extend the old report to newer kernel versions.)
In fact, the network adapter hasn't been working since then, i.e. the
last working kernel image was 2.6.20-1-amd64 (which I am using now). The
following images 2.6.2[12]-[12]-amd64 have all exposed the same problem.

The key problem seems to be the following lines in dmesg:
------------------------------------------------------------------------
ACPI: PCI Interrupt 0000:04:04.0[A] -> GSI 19 (level, low) -> IRQ 19
PCI: Unable to reserve mem region #1:4000@ff9f8000 for device 0000:04:04.0
skge 0000:04:04.0: cannot obtain PCI resources
ACPI: PCI interrupt for device 0000:04:04.0 disabled
skge: probe of 0000:04:04.0 failed with error -16
------------------------------------------------------------------------

I'm attaching full 'dmesg' output from working (2.6.20-1) and broken
(2.6.22-2) kernel versions as well as the output of 'lspci -vvv' on the
most recent kernel.

If you need any other information or I can try something please let me
know.

[-- Attachment #2: dmesg-2.6.23-rc5 --]
[-- Type: text/plain, Size: 23223 bytes --]

Linux version 2.6.23-rc5-amd64 (Debian 2.6.23~rc5-1~experimental.1~snapshot.9462) (waldi@debian.org) (gcc version 4.1.3 20070718 (prerelease) (Debian 4.1.2-14+1)) #1 SMP Thu Sep 6 06:18:52 UTC 2007
Command line: root=/dev/sda6 ro quiet vga=791 
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
 BIOS-e820: 000000000009ec00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ef90000 (usable)
 BIOS-e820: 000000007ef90000 - 000000007ef9e000 (ACPI data)
 BIOS-e820: 000000007ef9e000 - 000000007efe0000 (ACPI NVS)
 BIOS-e820: 000000007efe0000 - 000000007f000000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
Entering add_active_range(0, 0, 158) 0 entries of 3200 used
Entering add_active_range(0, 256, 520080) 1 entries of 3200 used
end_pfn_map = 1048576
DMI 2.4 present.
ACPI: RSDP 000FAD70, 0024 (r2 ACPIAM)
ACPI: XSDT 7EF90100, 006C (r1 NEC              5000730 MSFT       97)
ACPI: FACP 7EF90290, 00F4 (r3 MSTEST OEMFACP   5000730 MSFT       97)
ACPI: DSDT 7EF905C0, 9C85 (r1  A0579 A0579000        0 INTL 20060113)
ACPI: FACS 7EF9E000, 0040
ACPI: APIC 7EF90390, 006C (r1 MSTEST OEMAPIC   5000730 MSFT       97)
ACPI: MCFG 7EF90400, 003C (r1 MSTEST OEMMCFG   5000730 MSFT       97)
ACPI: SLIC 7EF90440, 0176 (r1 NEC              5000730 MSFT       97)
ACPI: OEMB 7EF9E040, 007B (r1 MSTEST AMI_OEM   5000730 MSFT       97)
ACPI: HPET 7EF9A250, 0038 (r1 MSTEST OEMHPET   5000730 MSFT       97)
ACPI: GSCI 7EF9E0C0, 2024 (r1 MSTEST GMCHSCI   5000730 MSFT       97)
ACPI: SSDT 7EFA00F0, 01BF (r1    AMI   CPU1PM        1 INTL 20060113)
ACPI: SSDT 7EFA02B0, 0133 (r1    AMI   CPU2PM        1 INTL 20060113)
No NUMA configuration found
Faking a node at 0000000000000000-000000007ef90000
Entering add_active_range(0, 0, 158) 0 entries of 3200 used
Entering add_active_range(0, 256, 520080) 1 entries of 3200 used
Bootmem setup node 0 0000000000000000-000000007ef90000
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0:        0 ->      158
    0:      256 ->   520080
On node 0 totalpages: 519982
  DMA zone: 56 pages used for memmap
  DMA zone: 1044 pages reserved
  DMA zone: 2898 pages, LIFO batch:0
  DMA32 zone: 7054 pages used for memmap
  DMA32 zone: 508930 pages, LIFO batch:31
  Normal zone: 0 pages used for memmap
  Movable zone: 0 pages used for memmap
ACPI: PM-Timer IO Port: 0x808
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Setting APIC routing to flat
ACPI: HPET id: 0x8086a202 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
swsusp: Registered nosave memory region: 000000000009e000 - 000000000009f000
swsusp: Registered nosave memory region: 000000000009f000 - 00000000000a0000
swsusp: Registered nosave memory region: 00000000000a0000 - 00000000000e4000
swsusp: Registered nosave memory region: 00000000000e4000 - 0000000000100000
Allocating PCI resources starting at 80000000 (gap: 7f000000:7fe00000)
SMP: Allowing 4 CPUs, 2 hotplug CPUs
PERCPU: Allocating 35176 bytes of per cpu data
Built 1 zonelists in Node order.  Total pages: 511828
Policy zone: DMA32
Kernel command line: root=/dev/sda6 ro quiet vga=791 
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Extended CMOS year: 2000
time.c: Detected 2399.997 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
Checking aperture...
Calgary: detecting Calgary via BIOS EBDA area
Calgary: Unable to locate Rio Grande table in EBDA - bailing!
Memory: 2041096k/2080320k available (2066k kernel code, 38832k reserved, 977k data, 304k init)
Calibrating delay using timer specific routine.. 4803.34 BogoMIPS (lpj=9606690)
Security Framework v1.0.0 initialized
SELinux:  Disabled at boot.
Capability LSM initialized
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 0/0 -> Node 0
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM2)
SMP alternatives: switching to UP code
ACPI: Core revision 20070126
Using local APIC timer interrupts.
result 16666629
Detected 16.666 MHz APIC timer.
SMP alternatives: switching to SMP code
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 4800.03 BogoMIPS (lpj=9600060)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 1/1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU1: Thermal monitoring enabled (TM2)
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz stepping 06
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
Brought up 2 CPUs
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved
PCI: Not using MMCONFIG.
PCI: Using configuration type 1
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S1 S3)
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI quirk: region 0800-087f claimed by ICH6 ACPI/GPIO/TCO
PCI quirk: region 0480-04bf claimed by ICH6 GPIO
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P2._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P8._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P4._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 11 12 14 *15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 *14 15)
ACPI: PCI Interrupt Link [LNKG] (IRQs *3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 *7 10 11 12 14 15)
ACPI Warning (tbutils-0217): Incorrect checksum in table [OEMB] -  CD, should be C4 [20070126]
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: ACPI device : hid PNP0A08
pnp: ACPI device : hid PNP0C01
pnp: ACPI device : hid PNP0200
pnp: ACPI device : hid PNP0B00
pnp: ACPI device : hid PNP0800
pnp: ACPI device : hid PNP0C04
pnp: ACPI device : hid PNP0501
pnp: ACPI device : hid PNP0700
pnp: ACPI device : hid PNP0C02
pnp: ACPI device : hid PNP0C02
pnp: ACPI device : hid PNP0103
pnp: ACPI device : hid PNP0C02
pnp: ACPI device : hid PNP0303
pnp: ACPI device : hid PNP0F03
pnp: ACPI device : hid PNP0C02
pnp: ACPI device : hid PNP0C01
pnp: PnP ACPI: found 16 devices
ACPI: ACPI bus type pnp unregistered
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
NET: Registered protocol family 8
NET: Registered protocol family 20
PCI-GART: No AMD northbridge found.
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
hpet0: 3 64-bit timers, 14318180 Hz
ACPI: RTC can wake from S4
Time: tsc clocksource has been installed.
pnp: the driver 'system' has been registered
pnp: match found with the PnP device '00:01' and the driver 'system'
pnp: 00:01: iomem range 0xfed14000-0xfed19fff has been reserved
pnp: match found with the PnP device '00:08' and the driver 'system'
pnp: 00:08: ioport range 0x290-0x297 has been reserved
pnp: match found with the PnP device '00:09' and the driver 'system'
pnp: 00:09: iomem range 0xfed1c000-0xfed1ffff has been reserved
pnp: 00:09: iomem range 0xfed20000-0xfed8ffff has been reserved
pnp: 00:09: iomem range 0xff9fa000-0xff9fafff has been reserved
pnp: 00:09: iomem range 0xfff00000-0xfffffffe could not be reserved
pnp: match found with the PnP device '00:0b' and the driver 'system'
pnp: 00:0b: iomem range 0xfec00000-0xfec00fff has been reserved
pnp: 00:0b: iomem range 0xfee00000-0xfee00fff could not be reserved
pnp: match found with the PnP device '00:0e' and the driver 'system'
pnp: 00:0e: iomem range 0xe0000000-0xefffffff has been reserved
pnp: match found with the PnP device '00:0f' and the driver 'system'
pnp: 00:0f: iomem range 0x0-0x9ffff could not be reserved
pnp: 00:0f: iomem range 0xc0000-0xcffff has been reserved
pnp: 00:0f: iomem range 0xe0000-0xfffff could not be reserved
pnp: 00:0f: iomem range 0x100000-0x7effffff could not be reserved
PCI: Bridge: 0000:00:01.0
  IO window: 8000-afff
  MEM window: ff700000-ff7fffff
  PREFETCH window: bfe00000-dfdfffff
PCI: Bridge: 0000:00:1c.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: dfe00000-dfefffff
PCI: Bridge: 0000:00:1c.4
  IO window: b000-bfff
  MEM window: ff800000-ff8fffff
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:1e.0
  IO window: c000-cfff
  MEM window: ff900000-ff9fffff
  PREFETCH window: 80000000-800fffff
ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:01.0 to 64
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:00:1c.0 to 64
ACPI: PCI Interrupt 0000:00:1c.4[A] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:00:1c.4 to 64
PCI: Setting latency timer of device 0000:00:1e.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
TCP established hash table entries: 262144 (order: 10, 6291456 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
checking if image is initramfs... it is
Freeing initrd memory: 5950k freed
audit: initializing netlink socket (disabled)
audit(1189181505.780:1): initialized
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
Boot video device is 0000:01:00.0
PCI: Setting latency timer of device 0000:00:01.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:01.0:pcie00]
PCI: Setting latency timer of device 0000:00:1c.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:1c.0:pcie00]
Allocate Port Service[0000:00:1c.0:pcie02]
PCI: Setting latency timer of device 0000:00:1c.4 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:1c.4:pcie00]
Allocate Port Service[0000:00:1c.4:pcie02]
vesafb: framebuffer at 0xc0000000, mapped to 0xffffc20000b00000, using 3072k, total 16384k
vesafb: mode is 1024x768x16, linelength=2048, pages=9
vesafb: scrolling: redraw
vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
Console: switching to colour frame buffer device 128x48
fb0: VESA VGA frame buffer device
Real Time Clock Driver v1.12ac
hpet_resources: 0xfed00000 is busy
Linux agpgart interface v0.102
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
pnp: the driver 'serial' has been registered
pnp: match found with the PnP device '00:06' and the driver 'serial'
00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 65536K size 1024 blocksize
pnp: the driver 'i8042 kbd' has been registered
pnp: match found with the PnP device '00:0c' and the driver 'i8042 kbd'
pnp: the driver 'i8042 aux' has been registered
pnp: match found with the PnP device '00:0d' and the driver 'i8042 aux'
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
Freeing unused kernel memory: 304k freed
input: AT Translated Set 2 keyboard as /class/input/input0
ACPI Exception (processor_core-0797): AE_NOT_FOUND, Processor Device is not present [20070126]
ACPI Exception (processor_core-0797): AE_NOT_FOUND, Processor Device is not present [20070126]
USB Universal Host Controller Interface driver v3.0
ACPI: PCI Interrupt 0000:00:1a.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:1a.0 to 64
uhci_hcd 0000:00:1a.0: UHCI Host Controller
uhci_hcd 0000:00:1a.0: new USB bus registered, assigned bus number 1
uhci_hcd 0000:00:1a.0: irq 16, io base 0x0000e000
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
pnp: the driver 'ide' has been registered
Floppy drive(s): fd0 is 1.44M
SCSI subsystem initialized
libata version 2.21 loaded.
FDC 0 is a post-1991 82077
ACPI: PCI Interrupt 0000:00:1a.1[B] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:00:1a.1 to 64
uhci_hcd 0000:00:1a.1: UHCI Host Controller
uhci_hcd 0000:00:1a.1: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:1a.1: irq 17, io base 0x0000e080
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 23 (level, low) -> IRQ 23
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1d.0: irq 23, io base 0x0000d800
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 19
PCI: Setting latency timer of device 0000:00:1d.1 to 64
uhci_hcd 0000:00:1d.1: UHCI Host Controller
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1d.1: irq 19, io base 0x0000d880
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:00:1d.2 to 64
uhci_hcd 0000:00:1d.2: UHCI Host Controller
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:1d.2: irq 18, io base 0x0000dc00
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:04:04.0[A] -> GSI 19 (level, low) -> IRQ 19
PCI: Unable to reserve mem region #1:4000@ff9f8000 for device 0000:04:04.0
skge 0000:04:04.0: cannot obtain PCI resources
ACPI: PCI interrupt for device 0000:04:04.0 disabled
skge: probe of 0000:04:04.0 failed with error -16
ACPI: PCI Interrupt 0000:04:03.0[A] -> GSI 21 (level, low) -> IRQ 21
firewire_ohci: Added fw-ohci device 0000:04:03.0, OHCI version 1.10
ahci 0000:00:1f.2: version 2.3
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 19
firewire_core: created new fw device fw0 (0 config rom retries, S400)
ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0x33 impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq sntf ilck stag pm led clo pmp pio slum part 
PCI: Setting latency timer of device 0000:00:1f.2 to 64
scsi0 : ahci
scsi1 : ahci
scsi2 : ahci
scsi3 : ahci
scsi4 : ahci
scsi5 : ahci
ata1: SATA max UDMA/133 cmd 0xffffc20000ac2900 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 1276
ata2: SATA max UDMA/133 cmd 0xffffc20000ac2980 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 1276
ata3: DUMMY
ata4: DUMMY
ata5: SATA max UDMA/133 cmd 0xffffc20000ac2b00 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 1276
ata6: SATA max UDMA/133 cmd 0xffffc20000ac2b80 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 1276
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: SAMSUNG HD401LJ, ZZ100-15, max UDMA7
ata1.00: 781422768 sectors, multi 0: LBA48 NCQ (not used)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATAPI: HL-DT-STDVD-RAM GSA-H30N, 1.01, max UDMA/100
ata2.00: configured for UDMA/100
ata5: SATA link down (SStatus 0 SControl 300)
ata6: SATA link down (SStatus 0 SControl 300)
scsi 0:0:0:0: Direct-Access     ATA      SAMSUNG HD401LJ  ZZ10 PQ: 0 ANSI: 5
scsi 1:0:0:0: CD-ROM            HL-DT-ST DVD-RAM GSA-H30N 1.01 PQ: 0 ANSI: 5
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
ACPI: PCI Interrupt 0000:00:1a.7[C] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:00:1a.7 to 64
ehci_hcd 0000:00:1a.7: EHCI Host Controller
ehci_hcd 0000:00:1a.7: new USB bus registered, assigned bus number 6
ehci_hcd 0000:00:1a.7: debug port 1
PCI: cache line size of 32 is not supported by device 0000:00:1a.7
ehci_hcd 0000:00:1a.7: irq 18, io mem 0xffaff000
ehci_hcd 0000:00:1a.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb6: configuration #1 chosen from 1 choice
hub 6-0:1.0: USB hub found
hub 6-0:1.0: 4 ports detected
ahci 0000:02:00.0: AHCI 0001.0000 32 slots 2 ports 3 Gbps 0x3 impl SATA mode
ahci 0000:02:00.0: flags: 64bit ncq pm led clo pmp pio slum part 
PCI: Setting latency timer of device 0000:02:00.0 to 64
scsi6 : ahci
scsi7 : ahci
ata7: SATA max UDMA/133 cmd 0xffffc20000ac4100 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 16
ata8: SATA max UDMA/133 cmd 0xffffc20000ac4180 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 16
ata7: SATA link down (SStatus 0 SControl 300)
ata8: SATA link down (SStatus 0 SControl 300)
ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 23 (level, low) -> IRQ 23
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 7
ehci_hcd 0000:00:1d.7: debug port 1
PCI: cache line size of 32 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: irq 23, io mem 0xffafec00
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb7: configuration #1 chosen from 1 choice
hub 7-0:1.0: USB hub found
hub 7-0:1.0: 6 ports detected
sd 0:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 sda3 < sda5 sda6 sda7 >
sd 0:0:0:0: [sda] Attached SCSI disk
JMB363: IDE controller at PCI slot 0000:02:00.1
PCI: Enabling device 0000:02:00.1 (0000 -> 0001)
ACPI: PCI Interrupt 0000:02:00.1[B] -> GSI 17 (level, low) -> IRQ 17
JMB363: chipset revision 2
JMB363: 100% native mode on irq 17
PCI: Setting latency timer of device 0000:02:00.1 to 64
    ide0: BM-DMA at 0xb400-0xb407, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xb408-0xb40f, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
sr0: scsi3-mmc drive: 48x/48x writer dvd-ram cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 1:0:0:0: Attached scsi CD-ROM sr0
sd 0:0:0:0: Attached scsi generic sg0 type 0
sr 1:0:0:0: Attached scsi generic sg1 type 5
hdb: WDC WD400BB-00AUA1, ATA DISK drive
hdb: selected mode 0x45
ide0 at 0xbc00-0xbc07,0xb882 on irq 17
Probing IDE interface ide1...
hdb: max request size: 128KiB
hdb: 78165360 sectors (40020 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hdb: cache flushes not supported
 hdb: hdb1 hdb2 < hdb5 hdb6 hdb7 hdb8 hdb9 hdb10 hdb11 >
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
input: PC Speaker as /class/input/input1
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.02 (26-Jul-2007)
iTCO_wdt: Found a ICH8 or ICH8R TCO device (Version=2, TCOBASE=0x0860)
iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
input: Power Button (FF) as /class/input/input2
ACPI: Power Button (FF) [PWRF]
input: Power Button (CM) as /class/input/input3
ACPI: Power Button (CM) [PWRB]
ACPI: PCI Interrupt 0000:00:1b.0[A] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:1b.0 to 64
hda_codec: Unknown model for AD1988, trying auto-probe from BIOS...
Linux video capture interface: v2.00
PCI: Enabling device 0000:00:1f.3 (0001 -> 0003)
ACPI: PCI Interrupt 0000:00:1f.3[C] -> GSI 18 (level, low) -> IRQ 18
logips2pp: Detected unknown logitech mouse model 95
saa7146: register extension 'dvb'.
ACPI: PCI Interrupt 0000:04:01.0[A] -> GSI 22 (level, low) -> IRQ 22
saa7146: found saa7146 @ mem ffffc20000ace800 (revision 1, irq 22) (0x13c2,0x0003).
DVB: registering new adapter (Technotrend/Hauppauge WinTV Nexus-S rev2.X)
adapter has MAC addr = 00:d0:5c:22:68:cc
dvb-ttpci: gpioirq unknown type=0 len=0
dvb-ttpci: info @ card 0: firm f0240009, rtsl b0250018, vid 71010068, app 80002622
dvb-ttpci: firmware @ card 0 supports CI link layer interface
input: ImExPS/2 Logitech Explorer Mouse as /class/input/input4
dvb-ttpci: adac type set to 0 @ card 0
saa7146_vv: saa7146 (0): registered device video0 [v4l2]
saa7146_vv: saa7146 (0): registered device vbi0 [v4l2]
DVB: registering frontend 0 (ST STV0299 DVB-S)...
input: DVB on-card IR receiver as /class/input/input5
dvb-ttpci: found av7110-0.
EXT3 FS on sda6, internal journal
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3 FS on sda7, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
pnp: the driver 'parport_pc' has been registered
lp: driver loaded but no devices found
ppdev: user-space parallel port driver
[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
[drm] Initialized radeon 1.28.0 20060524 on minor 0
[drm] Setting GART location based on new memory map
[drm] Loading R300 Microcode
[drm] writeback test succeeded in 1 usecs

^ permalink raw reply

* Re: auto recycling of TIME_WAIT connections
From: Rick Jones @ 2007-09-07 17:04 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: netdev
In-Reply-To: <46E11829.4090607@draigBrady.com>

> The first issue requires a large timeout, and
> the TIME_WAIT timeout is currently 60 seconds on Linux.
> That timeout effectively limits the connection rate between
> local TCP clients and a server to 32k/60s or around 500 connections/second.

Actually, it would be more like 60k/60s if the application were making 
explicit calls to bind() as arguably it should if it is going to be 
churning through so many connections.

This was an issue over a decade ago with SPECweb96 benchmarking.  The 
initial solution was to make the explicit bind() calls and not rely on 
the anonymous/ephemeral port space.  After that, one starts adding 
additional IP addresses into the mix (at least where possible).  And if that 
fails, one has to go back to the beginning and ask oneself exactly why a 
client is trying to churn through so many connections per second in the 
first place.
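
To make the explicit bind() idea concrete, here is a minimal sketch. The helper name `connect_from_port` and its parameters are hypothetical, chosen only for illustration, and the SO_REUSEADDR behaviour noted in the comment is what one would typically expect on Linux rather than a guarantee:

```python
import socket


def connect_from_port(server, local_port, local_ip="0.0.0.0"):
    """Connect to `server` (host, port) from a caller-chosen local port.

    Hypothetical helper for illustration; not taken from any benchmark code.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Typically lets bind() succeed while older connections that used
        # this local port are still sitting in TIME_WAIT.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # The explicit bind() picks the source port instead of leaving the
        # choice to the kernel's anonymous/ephemeral range.
        sock.bind((local_ip, local_port))
        sock.connect(server)
    except OSError:
        sock.close()
        raise
    return sock
```

A client that walks through most of the non-privileged port space this way is what gets it from roughly 32k usable ports to something closer to 60k. connect() can still fail if the exact address/port 4-tuple is in TIME_WAIT, so a real client would need to handle that and move on to the next port.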

If we were slavishly conformant to the RFCs :) that 60 seconds would be 
240 seconds...
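
The figures above can be checked with some back-of-the-envelope arithmetic. This is only a sketch: the port counts are the round numbers used in this thread (32k and 60k), not measured values, and the function name is made up for the example.

```python
# Rough ceiling on new connections per second between one client IP and one
# server IP:port when every closed connection parks its local port in
# TIME_WAIT. Port counts are the round figures from the discussion.

TIME_WAIT_SECONDS = 60  # Linux TIME_WAIT length quoted in the thread


def max_connection_rate(usable_ports, time_wait=TIME_WAIT_SECONDS):
    """Each local port can be reused only once per TIME_WAIT period."""
    return usable_ports / time_wait


ephemeral = max_connection_rate(32_000)      # anonymous/ephemeral ports only
explicit_bind = max_connection_rate(60_000)  # ports picked via explicit bind()
# RFC 793 asks for 2*MSL with MSL = 2 minutes, i.e. 240 seconds.
rfc_strict = max_connection_rate(60_000, time_wait=240)

print(f"ephemeral: {ephemeral:.0f}/s, explicit bind: {explicit_bind:.0f}/s, "
      f"RFC timeout: {rfc_strict:.0f}/s")
```

So the "around 500 connections/second" figure corresponds to the 32k case, explicit bind() roughly doubles it, and a strict 240 second TIME_WAIT would cut even the 60k case to a quarter.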

> But that issue can't really happen when the client
> and server are on the same machine, can it? And
> even if it could, the timeouts involved would be shorter.
> 
> Now Linux does have an (undocumented) /proc/sys/net/ipv4/tcp_tw_recycle flag
> to enable recycling of TIME_WAIT connections. This is global, however, and could cause
> problems in general for external connections.

Rampant speculation begins...

If the client can be convinced to just call shutdown(SHUT_RDWR) rather 
than close(), and be the first to do so, ahead of the server, I think it 
will retain a link to the TCP endpoint in TIME_WAIT.  It could then, in 
TCP theory, call connect() again, and go through a path that allows 
transition from TIME_WAIT to ESTABLISHED if all the right things wrt 
Initial Sequence Number selection happen.  Whether randomization of the 
ISN allows that today is questionable.

> So how about auto enabling recycling for local connections?

I think the standard response is that one can never _really_ know what 
is local and what not, particularly in the presence of netfilter and the 
rewriting of headers behind one's back.

rick jones

^ permalink raw reply

* Re: Linksys Gigabit USB2.0 adapter (asix) regression
From: David Hollis @ 2007-09-07 17:18 UTC (permalink / raw)
  To: Erik Slagter; +Cc: netdev
In-Reply-To: <46D59717.3080609@slagter.name>

On Wed, 2007-08-29 at 17:56 +0200, Erik Slagter wrote:
> Never mind, I plugged the adapter into a windows machine once more
> (another one) and now it acts exactly like it's plugged into my linux
> laptop, so I guess it's actually broken. I will send it back for repair.
> 
> Thanks for your effort and I apologise for being a pain ;-)

Interesting.  I hope that that actually is the case (short of the PITA
for you having to deal with RMA and all of that stuff).  It definitely
wasn't making any sense with only the minor changes that have taken
place with the driver in the past few months.

-- 
David Hollis <dhollis@davehollis.com>


^ permalink raw reply

* Re: problems with lockd in 2.6.22.6
From: Wolfgang Walter @ 2007-09-07 18:05 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: neilb, netdev, nfs
In-Reply-To: <20070907161945.GI24638@fieldses.org>

Am Freitag, 7. September 2007 18:19 schrieben Sie:
> On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> > Hello,
> >
> > we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since
> > then we get the message
> >
> > lockd: too many open TCP sockets, consider increasing the number of nfsd threads
> > lockd: last TCP connect from ^\\236^\É^D

> >
> > 2) The number of nfsd threads we are running on the machine is 1024.
> > So this is not the problem. It seems, though, that in the case of
> > lockd svc_tcp_accept does not check the number of nfsd threads but the
> > number of lockd threads which is one.  As soon as the number of open
> > lockd sockets surpasses 80 this message gets logged.  This usually
> > happens every evening when a lot of people shut down their workstations.
>
> So to be clear: there's not an actual problem here other than that the
> logs are getting spammed?  (Not that that isn't a problem in itself.)
>

If more than 80 NFS clients tried to lock files at the same time, then it
probably would be a real problem.
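
The figure of 80 is consistent with a per-service limit of the form (threads + 3) * 20 on temporary TCP sockets. That formula is an assumption here and should be checked against svc_tcp_accept in net/sunrpc/svcsock.c for the kernel in question; the sketch below only shows what it would imply for a single-threaded lockd versus 1024 nfsd threads.

```python
def max_temp_sockets(nr_threads):
    """Assumed limit on temporary sockets: (service threads + 3) * 20.

    The formula is an assumption to be verified against the kernel source.
    """
    return (nr_threads + 3) * 20


lockd_limit = max_temp_sockets(1)     # lockd runs a single thread
nfsd_limit = max_temp_sockets(1024)   # nfsd thread count from the report

print(f"lockd: {lockd_limit} sockets, nfsd: {nfsd_limit} sockets")
```

If the assumption holds, raising the nfsd thread count cannot help, because the check for lockd connections is made against lockd's own thread count.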

> > 3) For an unknown reason these sockets then remain open. In the morning
> > when people start their workstations again, we therefore not only get a
> > lot of these messages again, but often the nfs-server no longer works
> > properly. Restarting the nfs-daemon is a workaround.
>
> Hm, thanks.
>

I don't know if the lockd thing is the reason, though.

2.6.22.6 in itself is stable (no oops, no crash, etc.), but kernel nfs seems
to be a little bit unstable. 2.6.17.11 ran for months without any nfsd-related 
problems, whereas with 2.6.22.6 nfs needs to be restarted almost every day. 
Sometimes this fails with

lockd_down: lockd failed to exit, clearing pid
nfsd: last server has exited
nfsd: unexporting all filesystems
lockd_up: makesock failed, error=-98

after which the server must be rebooted.
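
As an aside on reading that last log line: kernel messages report errors as negated errno values, so error=-98 can be decoded with a few lines of script. This is a sketch; the helper name is made up, and the mapping of 98 to EADDRINUSE assumes Linux errno numbering on a common architecture such as x86.

```python
import errno
import os


def describe_kernel_error(code):
    """Translate a negative kernel return code such as -98 into its
    errno name and message, using the errno table of the local platform."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{code}: {name} ({os.strerror(number)})"


print(describe_kernel_error(-98))
```

On such a system this reports EADDRINUSE ("Address already in use"), which would fit the preceding "lockd failed to exit" message: the old lockd is still holding its port when the new one tries to create its socket.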

I think lockd is involved because there are no problems during the 
day. It is in the morning, when a lot of people log into their machines and 
start their desktops (I think KDE locks its config files when it reads them).

Regards
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply

* Re: [PATCH] bonding: update some distro-specific documentation
From: Jay Vosburgh @ 2007-09-07 18:27 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev
In-Reply-To: <20070830212429.GA12676@gospo.rdu.redhat.com>

Andy Gospodarek <andy@greyhouse.net> wrote:

	This all looks fine except for one nit (well, request for extra
detail, really):

>@@ -802,15 +802,20 @@ BROADCAST=192.168.1.255
> ONBOOT=yes
> BOOTPROTO=none
> USERCTL=no
>+BONDING_OPTS="mode=balance-alb miimon=100"
>
> 	Be sure to change the networking specific lines (IPADDR,
> NETMASK, NETWORK and BROADCAST) to match your network configuration.
>+You also need to set the BONDING_OPTS= line to specify the desired
>+options for your bond0 interface.  Specifying bonding options in this
>+way is the preferred method for configuring bonding interfaces.

	Can you add something here that mentions that, for the
arp_ip_target option, it has to be supplied as "arp_ip_target=+10.0.0.1"
and not just "arp_ip_target=10.0.0.1"?  Also, multiple targets require
multiple instances of the arp_ip_target option; it doesn't work to put
multiple IP addresses as in the module option (i.e.,
"arp_ip_target=10.0.0.1,10.0.0.2").

	This is necessary because ifup-eth isn't adding the "+" when it
translates the option for use with sysfs or parsing the multiple IP
address syntax.
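
	To illustrate the difference between the two syntaxes, here is a small sketch that rewrites the module-option form into the form described above for BONDING_OPTS. The function name is hypothetical and this is not part of ifup-eth or any distro tool; it only shows the transformation.

```python
def bonding_opts_arp_targets(module_option):
    """Rewrite 'arp_ip_target=a,b' as one '+'-prefixed option per address.

    Hypothetical helper for illustration only.
    """
    key, _, addresses = module_option.partition("=")
    if key != "arp_ip_target" or not addresses:
        raise ValueError(f"not an arp_ip_target option: {module_option!r}")
    targets = [addr.strip() for addr in addresses.split(",") if addr.strip()]
    # One option instance per target, each with the leading '+'.
    return " ".join(f"arp_ip_target=+{addr}" for addr in targets)


print(bonding_opts_arp_targets("arp_ip_target=10.0.0.1,10.0.0.2"))
```

	The printed string is what would go inside the quotes of the BONDING_OPTS= line, next to whatever mode and arp_interval settings the bond uses.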

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170
From: Johannes Berg @ 2007-09-07 19:17 UTC (permalink / raw)
  To: Michael Buesch
  Cc: paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Herbert Xu,
	satyam-wEGCiKHe2LqWVfeAwA7xHQ, flo-BCn6idZOOBwdnm+yROfE0A,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	michal.k.k.piotrowski-Re5JQEeQqe8AvxtiuMwx3w,
	ipw3945-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	yi.zhu-ral2JQCrhuEAvxtiuMwx3w, flamingice-R9e9/4HEdknk1uMJSBkQmQ
In-Reply-To: <200709071801.34909.mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>

On Fri, 2007-09-07 at 18:01 +0200, Michael Buesch wrote:

> What's the problem with trying to lock it?

I think I had a problem with it once when I inserted it into some code
that was atomic and it all blew up badly ;) Nothing important really but
it sort of made me not like it much.

johannes

^ permalink raw reply

* Re: [PATCH] Fix e100 on systems that have cache incoherent DMA
From: David Acker @ 2007-09-07 20:41 UTC (permalink / raw)
  To: Kok, Auke
  Cc: John Ronciak, Jesse Brandeburg, Jeff Kirsher, Milton Miller,
	Jeff Garzik, netdev, e1000-devel, Scott Feldman
In-Reply-To: <46E17CD7.8080605@intel.com>

Kok, Auke wrote:
> first impressions are not good: pings are erratic and shoot up to 3 
> seconds. In an overnight stress test, the receive unit went offline and 
> never came back up (TX still working).
> 
> it sounds like something in the logic is suspending the ru too much, but 
> I haven't had time to look deeply into the code yet.

I don't have an e100-enabled x86 box handy, but I will look into getting one set up.

I just applied this patch to my PXA255 based system http://www.compulab.co.il/x255/html/x255-cm-datasheet.htm .
It is running 2.6.18.4 plus compulab patches plus some hostap patches plus the e100 patch.  I get:

pings going from the embedded system to a desktop machine.
100 packets transmitted, 100 received, 0% packet loss, time 98996ms
rtt min/avg/max/mdev = 0.239/0.728/1.512/0.571 ms

Pings going from the desktop machine to the embedded system:
100 packets transmitted, 100 received, 0% packet loss, time 99217ms
rtt min/avg/max/mdev = 0.206/0.876/1.473/0.575 ms


iperf tcp from embedded to desktop gets:
[  5]  0.0-100.0 sec  1007 MBytes  84.4 Mbits/sec
iperf udp from the embedded to the desktop gets (embedded told to send at 100 Mbps):
[  5] Server Report:
[  5]  0.0-100.0 sec    947 MBytes  79.4 Mbits/sec  0.068 ms   16/675645 (0.0024%)
[  5]  0.0-100.0 sec  1 datagrams received out-of-order

iperf tcp from the desktop to the embedded gets:
[  6]  0.0-100.0 sec  1.01 GBytes  86.4 Mbits/sec
iperf udp from the desktop to the embedded gets the following when the desktop sent at 100 Mbps:
[  5]  0.0-100.0 sec    964 MBytes  80.8 Mbits/sec  0.359 ms 126467/813760 (16%)
[  5]  0.0-100.0 sec  1 datagrams received out-of-order


Boot messages for my e100 are:
e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
PCI: enabling device 0000:00:09.0 (0000 -> 0003)
PCI: Setting latency timer of device 0000:00:09.0 to 64
e100: eth0: e100_probe: addr 0x10131000, irq 111, MAC addr 00:09:30:FF:F2:F6
cat /sys/bus/pci/drivers/e100/0000\:00\:09.0/{device,vendor,subsystem_device,subsystem_vendor}
0x1209
0x8086
0x0000
0x0000

It's on its own interrupt line:
cm-debian:~# cat /proc/interrupts |grep eth0
111:     402428           -  eth0

lspci shows:
00:09.0 Ethernet controller: Intel Corporation 8255xER/82551IT Fast Ethernet Controller (rev 09)

Let me know if there is any other information I can provide you.  I will look through the code to see what could be 
going on with your machine.  I will also look into reproducing these results with a newer kernel.  This may be tricky 
since compulab's patches are pretty stale and don't always apply easily.

-Ack

^ permalink raw reply

* Re: [PATCH] Fix e100 on systems that have cache incoherent DMA
From: Kok, Auke @ 2007-09-07 21:03 UTC (permalink / raw)
  To: David Acker
  Cc: Kok, Auke, e1000-devel, netdev, Jesse Brandeburg, Milton Miller,
	Scott Feldman, John Ronciak, Jeff Kirsher, Jeff Garzik
In-Reply-To: <46E1B75C.6090208@roinet.com>

David Acker wrote:
> Kok, Auke wrote:
>> first impressions are not good: pings are erratic and shoot up to 3 
>> seconds. In an overnight stress test, the receive unit went offline and 
>> never came back up (TX still working).
>>
>> it sounds like something in the logic is suspending the ru too much, but 
>> I haven't had time to look deeply into the code yet.
> 
> I don't have an e100-enabled x86 box handy, but I will look into getting one set up.
> 
> I just applied this patch to my PXA255 based system http://www.compulab.co.il/x255/html/x255-cm-datasheet.htm .
> It is running 2.6.18.4 plus compulab patches plus some hostap patches plus the e100 patch.  I get:
> 
> pings going from the embedded system to a desktop machine.
> 100 packets transmitted, 100 received, 0% packet loss, time 98996ms
> rtt min/avg/max/mdev = 0.239/0.728/1.512/0.571 ms
> 
> Pings going from the desktop machine to the embedded system:
> 100 packets transmitted, 100 received, 0% packet loss, time 99217ms
> rtt min/avg/max/mdev = 0.206/0.876/1.473/0.575 ms

ok, I just got a note from our lab saying that that particular system has the 
freak ping times even without your patch applied 8)

ignoring the ping issue, we still have the ru offline, but that could have 
possibly been caused by whatever is causing this ping issue... More testing is 
needed, and I'll try to find a system without the ping issue here first.

Auke

^ permalink raw reply

* Re: [PATCH] Fix e100 on systems that have cache incoherent DMA
From: Kok, Auke @ 2007-09-07 21:18 UTC (permalink / raw)
  To: Kok, Auke
  Cc: David Acker, John Ronciak, Jesse Brandeburg, Jeff Kirsher,
	Milton Miller, Jeff Garzik, netdev, e1000-devel, Scott Feldman
In-Reply-To: <46E1BCA3.5030201@intel.com>

Kok, Auke wrote:
> David Acker wrote:
>> Kok, Auke wrote:
>>> first impressions are not good: pings are erratic and shoot up to 3 
>>> seconds. In an overnight stress test, the receive unit went offline and 
>>> never came back up (TX still working).
>>>
>>> it sounds like something in the logic is suspending the ru too much, but 
>>> I haven't had time to look deeply into the code yet.
>> I don't have an e100-enabled x86 box handy, but I will look into getting one set up.
>>
>> I just applied this patch to my PXA255 based system http://www.compulab.co.il/x255/html/x255-cm-datasheet.htm .
>> It is running 2.6.18.4 plus compulab patches plus some hostap patches plus the e100 patch.  I get:
>>
>> pings going from the embedded system to a desktop machine.
>> 100 packets transmitted, 100 received, 0% packet loss, time 98996ms
>> rtt min/avg/max/mdev = 0.239/0.728/1.512/0.571 ms
>>
>> Pings going from the desktop machine to the embedded system:
>> 100 packets transmitted, 100 received, 0% packet loss, time 99217ms
>> rtt min/avg/max/mdev = 0.206/0.876/1.473/0.575 ms
> 
> ok, I just got a note from our lab saying that that particular system has the 
> freak ping times even without your patch applied 8)
> 
> ignoring the ping issue, we still have the ru offline, but that could have 
> possibly been caused by whatever is causing this ping issue... More testing is 
> needed, and I'll try to find a system without the ping issue here first.

update: Emil reports that the unit with the RU hang did not have bad ping times 
to begin with, pointing to a problem with the patch for sure now...

Auke

^ permalink raw reply

* error(s) in 2.6.23-rc5 bonding.txt ?
From: Rick Jones @ 2007-09-07 22:02 UTC (permalink / raw)
  To: Linux Network Development list

I was perusing Documentation/networking/bonding.txt in a 2.6.23-rc5 tree 
and came across the following discussing the round-robin scheduling:

>         Note that this out of order delivery occurs when both the
>         sending and receiving systems are utilizing a multiple
>         interface bond.  Consider a configuration in which a
>         balance-rr bond feeds into a single higher capacity network
>         channel (e.g., multiple 100Mb/sec ethernets feeding a single
>         gigabit ethernet via an etherchannel capable switch).  In this
>         configuration, traffic sent from the multiple 100Mb devices to
>         a destination connected to the gigabit device will not see
>         packets out of order.  

My first reaction was that this was incorrect - it didn't matter if the 
receiver was using a single link or not because the packets flowing 
across the multiple 100Mb links could hit the intermediate device out of 
order and so stay that way across the GbE link.
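
A toy model of that concern, under the assumption that each 100Mb link adds its own (possibly different) delay and that the intermediate device forwards packets onto the GbE link in the order they arrive. The delay values are made up purely for illustration.

```python
def arrival_order(num_packets, link_delays, send_interval=1.0):
    """Stripe packets round-robin over the links and return the sequence
    numbers in the order they reach the intermediate device."""
    arrivals = []
    for seq in range(num_packets):
        link = seq % len(link_delays)          # balance-rr style striping
        arrival_time = seq * send_interval + link_delays[link]
        arrivals.append((arrival_time, seq))
    return [seq for _, seq in sorted(arrivals)]


# Two links with equal delay keep the stream in order ...
print(arrival_order(6, [0.5, 0.5]))
# ... but one slower (e.g. more heavily queued) link reorders it before
# the traffic ever reaches the single gigabit link.
print(arrival_order(6, [5.0, 0.5]))
```

In this model the second call yields [1, 3, 0, 5, 2, 4], so the receiver behind the single GbE link still sees packets out of order.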

Before I go and patch-out that text I thought I'd double check.

rick jones

^ permalink raw reply

* Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
From: Jason Lunz @ 2007-09-07 21:20 UTC (permalink / raw)
  To: James Chapman
  Cc: netdev, davem, jeff, mandeep.baines, ossthema, hadi,
	Stephen Hemminger
In-Reply-To: <46E11A61.9030409@katalix.com>

In gmane.linux.network, you wrote:
> But the CPU has done more work. The flood ping will always show 
> increased CPU with these changes because the driver always stays in the 
> NAPI poll list. For typical LAN traffic, the average CPU usage doesn't 
> increase as much, though more measurements would be useful.

I'd be particularly interested to see what happens to your latency when
other apps are hogging the cpu. I assume from your description that your
cpu is mostly free to schedule the niced softirqd for the device polling
duration, but this won't always be the case. If other tasks are running
at high priority, it could be nearly a full jiffy before softirqd gets
to check the poll list again and the latency introduced could be much
higher than you've yet measured.
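
To put a number on "nearly a full jiffy", here is a quick sketch, assuming the extra delay is bounded by one timer tick and using the common CONFIG_HZ choices. The function name is made up for the example.

```python
def worst_case_poll_delay_ms(hz):
    """Length of one jiffy in milliseconds for a given timer frequency."""
    return 1000.0 / hz


for hz in (100, 250, 1000):
    print(f"HZ={hz}: up to ~{worst_case_poll_delay_ms(hz):.0f} ms added latency")
```

Compared with the sub-millisecond round-trip times typical of a LAN, even the HZ=1000 case would be a noticeable increase.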

Jason

^ permalink raw reply

* Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.
From: Mike Christie @ 2007-09-07 22:23 UTC (permalink / raw)
  To: Anil Veerabhadrappa
  Cc: Mike Christie, Michael Chan, davem, netdev, open-iscsi, talm,
	lusinsky, uri, SCSI Mailing List
In-Reply-To: <1189027622.19638.42.camel@dhcp-10-13-106-205.broadcom.com>

Anil Veerabhadrappa wrote:
>>
>>> +
>>> +/* iSCSI stages */
>>> +#define ISCSI_STAGE_SECURITY_NEGOTIATION (0)
>>> +#define ISCSI_STAGE_LOGIN_OPERATIONAL_NEGOTIATION (1)
>>> +#define ISCSI_STAGE_FULL_FEATURE_PHASE (3)
>>> +/* Logout response codes */
>>> +#define ISCSI_LOGOUT_RESPONSE_CONNECTION_CLOSED (0)
>>> +#define ISCSI_LOGOUT_RESPONSE_CID_NOT_FOUND (1)
>>> +#define ISCSI_LOGOUT_RESPONSE_CLEANUP_FAILED (3)
>>> +
>>> +/* iSCSI task types */
>>> +#define ISCSI_TASK_TYPE_READ    (0)
>>> +#define ISCSI_TASK_TYPE_WRITE   (1)
>>> +#define ISCSI_TASK_TYPE_MPATH   (2)
>>
>>
>>
>> All of these iSCSI codes should be in iscsi_proto.h or should be added 
>> there.
> This is a very tricky proposal as this header file is automatically
> generated by a well-defined process and is shared between various drivers
> supporting multiple platforms/OSes and the firmware. If it is not a big
> issue I would like to keep it the way it is.

The values that are iscsi RFC values should come from the iscsi_proto.h 
file and not be duplicated for each driver.


>>> +/*
>>> + * hardware reset
>>> + */
>>> +int bnx2i_reset(struct scsi_cmnd *sc)
>>> +{
>>> +	return 0;
>>> +}
>>
>> So what is up with this one? It seems like if there is a way to reset 
>> hardware then you would want it as the scsi eh host reset callout 
>> instead of dropping the session. We could add some transport level 
>> recovery callouts for the iscsi specifics.
> 
> We may not be able to support HBA cold reset as the bnx2 driver is the
> primary owner of chip reset and initialization. This is the drawback of
> sharing the network interface with the NIC driver. If an administrator
> needs to reset the iSCSI port, the same can be achieved by running
> 'ifdown eth#' and 'ifup eth#'.
> The current driver even allows an Ethernet interface reset while there are
> active iSCSI connections; all active iSCSI sessions will be reinstated
> when the network link comes back up.
>  
> 

If you cannot support it or it does not make sense just remove the stub 
then. I say it is not a big deal now, but hopefully we do not hit fun 
like with qla3xxx and qla4xxx :)

>>> +
>>> +void bnx2i_sysfs_cleanup(void)
>>> +{
>>> +	class_device_unregister(&port_class_dev);
>>> +	class_unregister(&bnx2i_class);
>>> +}
>> The sysfs bits related to the HBA should use one of the SCSI sysfs 
>> facilities or, if they are related to iSCSI bits and are generic, go 
>> through the iSCSI HBA.
> 
> bnx2i needs 2 sysfs entries -
> 1. QP size info - this is used to size the per-connection shared data
> structures used to issue work requests to the chip (login, scsi cmd, tmf,
> nopin) and get completions from the chip (scsi completions, async
> messages, etc.). This is an iSCSI HBA attribute
> 2. port mapper - we can be more flexible on classifying this as either
> iSCSI HBA attribute or bnx2i driver global attribute
> Can hooks be added to iSCSI transport class to include these?
> 

Which ones were they exactly? I think JamesB wanted only common 
transport values in the transport class. If it is driver specific then 
it should go on the host or target or device with the scsi_host_template 
attrs.

^ permalink raw reply
