Netdev List
* Re: [patch 2/4] ipset: make IPv4 and IPv6 address handling similar
From: Jozsef Kadlecsik @ 2011-01-18 20:37 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: holger, netfilter-devel, netdev
In-Reply-To: <alpine.LNX.2.01.1101182121310.19007@obet.zrqbmnf.qr>

On Tue, 18 Jan 2011, Jan Engelhardt wrote:

> On Tuesday 2011-01-18 21:18, Jozsef Kadlecsik wrote:
> >On Tue, 18 Jan 2011, holger@eitzenberger.org wrote:
> >
> >> While the following works for AF_INET:
> >> 
> >>  ipset add foo 192.168.1.1/32
> >> 
> >> this does not work for AF_INET6:
> >> 
> >>  ipset add foo6 20a1:1:2:3:4:5:6:7/128
> >>  ipset v5.2: Syntax error: plain IP address must be supplied: 20a1:1:2:3:4:5:6:7/128
> >
> >Yeah, the usual issue: should IPv4/32 and IPv6/128 be handled as a plain 
> >IPv4/v6 address when the manual says "enter a plain IPv4/v6 address" :-).
> 
> (Assuming this was a question, heuristically based on the word order
> you used:) I don't think so. iptables, resp. its modules, do not
> allow that either.

I know, but the situation is a little bit more complicated: the set type 
in question works differently with IPv4 and IPv6. In the IPv4 case, a 
range of IP addresses as IPv4/prefix is accepted as input (thus 
192.168.1.1/32 too), while for IPv6, only plain IPv6 addresses are allowed 
and therefore 20a1:1:2:3:4:5:6:7/128 was rejected.

That looks really odd, so I added the feature (but could not resist adding 
my comment as a pseudo-question :-).
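
To illustrate the behaviour described above, here is a minimal user-space sketch in Python. It is not the actual ipset parser; the function name parse_plain_address is hypothetical, and it only assumes that a prefix equal to the full address length (32 for IPv4, 128 for IPv6) should be treated as a plain address:

```python
import ipaddress


def parse_plain_address(text):
    """Return one IP address, accepting an optional full-length prefix.

    "192.168.1.1/32" and "20a1:1:2:3:4:5:6:7/128" name a single host, so
    they are accepted; any shorter prefix names a range and is rejected.
    """
    addr_text, sep, prefix_text = text.partition("/")
    addr = ipaddress.ip_address(addr_text)  # raises ValueError if malformed
    if sep:
        if not prefix_text.isdigit():
            raise ValueError("invalid prefix length: " + text)
        # max_prefixlen is 32 for IPv4 addresses and 128 for IPv6 addresses.
        if int(prefix_text) != addr.max_prefixlen:
            raise ValueError("plain IP address must be supplied: " + text)
    return addr
```

With this rule both address families behave the same way: a full-length prefix is accepted and anything shorter is still refused for a set type that stores single addresses.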

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: WARNING in module rt2870sta stable kernel 2.6.37
From: Eric Dumazet @ 2011-01-18 20:33 UTC (permalink / raw)
  To: Denis Kirjanov; +Cc: Giangiacomo Mariotti, Greg KH, devel, linux-kernel, netdev
In-Reply-To: <AANLkTi=-4k0hbM9gESLBfzJW_reonqX_GMiVzXzRJFqu@mail.gmail.com>

On Tuesday, 18 January 2011 at 23:25 +0300, Denis Kirjanov wrote:
> I have sent a patch to fix this problem: https://lkml.org/lkml/2011/1/10/329
> It also fixes bug #26472: https://bugzilla.kernel.org/show_bug.cgi?id=26472

OK, next time CC netdev so that we can Ack your patches ;)





* Re: inbound connection problems when "netlink: test for all flags of the NLM_F_DUMP composite" commit applied
From: Pablo Neira Ayuso @ 2011-01-18 20:31 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, arthur.marsh, jengelh, eric.dumazet, netdev, hadi
In-Reply-To: <20110118102437.GB7520@ff.dom.local>

On 18/01/11 11:24, Jarek Poplawski wrote:
> On Tue, Jan 18, 2011 at 02:07:02AM -0800, David Miller wrote:
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Tue, 18 Jan 2011 09:38:11 +0000
>>
>>> Even if I'm wrong, this change added to stable will break many configs.
>>> My proposal is to revert commit 0ab03c2b147 until a proper fix is found.
>>
>> The flag combination is, at best, ambiguous; it has no proper
>> definition without the check we added.
> 
> Do you all expect all users to manage to upgrade the avahi app before
> changing their stable kernel? I mean "own distro" users especially.

The combination that avahi uses makes no sense.

I've been auditing user-space tools that may have problems with this change:

* iw (it uses libnl)
* acpid (it uses a mangled version of libnetlink shipped in iproute)
* tstime, for taskstats, it uses libnl
* wimax-tools, it uses libnl
* quota-tools, it uses libnl
* keepalived, no libs used

Well, I can keep looking for more, but I think that avahi is the only
one doing this incorrectly.

Please, fix avahi instead.
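
For readers following along, here is a small Python sketch of why a composite flag needs a full test. The flag values are the ones from linux/netlink.h; the two helper names are hypothetical, and the example request (NLM_F_ROOT set without NLM_F_MATCH) is an assumption for illustration, since this thread does not spell out exactly which flags avahi sends:

```python
# Values as defined in linux/netlink.h.
NLM_F_REQUEST = 0x001
NLM_F_ROOT = 0x100
NLM_F_MATCH = 0x200
NLM_F_DUMP = NLM_F_ROOT | NLM_F_MATCH  # composite of two bits


def is_dump_any_bit(flags):
    """Loose test: true if either bit of the composite is set."""
    return bool(flags & NLM_F_DUMP)


def is_dump_all_bits(flags):
    """Strict test: true only if every bit of the composite is set."""
    return (flags & NLM_F_DUMP) == NLM_F_DUMP


# Hypothetical request that sets only half of the composite.
partial = NLM_F_REQUEST | NLM_F_ROOT
full = NLM_F_REQUEST | NLM_F_DUMP
```

The loose test treats the partial request as a dump, while the strict test does not, which is how a sender that sets only one of the two bits can change behaviour once all flags of the composite are tested.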


* Re: WARNING in module rt2870sta stable kernel 2.6.37
From: Eric Dumazet @ 2011-01-18 20:30 UTC (permalink / raw)
  To: Giangiacomo Mariotti; +Cc: devel, netdev, Greg KH, linux-kernel
In-Reply-To: <AANLkTi=HsE+h9GU8rM6zW2BVtoayEGfKOmPCsrrNHiz5@mail.gmail.com>

On Tuesday, 18 January 2011 at 21:16 +0100, Giangiacomo Mariotti wrote:
> Hi, the following message was logged on a 2.6.37 kernel (it says
> tainted, but it's actually a micro patch I wrote and applied on top of
> current 2.6.37 vanilla, patch appended, but it should be irrelevant, I
> just shut up a bunch of useless debug output for this staging driver):

That's a known problem.

Please try the following patch:

[PATCH] staging, rt2860: fix panic

It's now illegal to call netif_stop_queue() before register_netdev()
(commit e6484930d7 ("net: allocate tx queues in register_netdevice")).
    
Reported-by: Giangiacomo Mariotti <gg.mariotti@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/staging/rt2860/rt_main_dev.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/staging/rt2860/rt_main_dev.c b/drivers/staging/rt2860/rt_main_dev.c
index 701561d..236dd36 100644
--- a/drivers/staging/rt2860/rt_main_dev.c
+++ b/drivers/staging/rt2860/rt_main_dev.c
@@ -484,8 +484,6 @@ struct net_device *RtmpPhyNetDevInit(struct rt_rtmp_adapter *pAd,
 	net_dev->ml_priv = (void *)pAd;
 	pAd->net_dev = net_dev;
 
-	netif_stop_queue(net_dev);
-
 	return net_dev;
 
 }




* Re: [PATCH] vhost: rcu annotation fixup
From: Paul E. McKenney @ 2011-01-18 20:28 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20110118201031.GC18760@redhat.com>

On Tue, Jan 18, 2011 at 10:10:31PM +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 18, 2011 at 11:02:33AM -0800, Paul E. McKenney wrote:
> > On Tue, Jan 18, 2011 at 07:55:00PM +0200, Michael S. Tsirkin wrote:
> > > On Tue, Jan 18, 2011 at 09:48:34AM -0800, Paul E. McKenney wrote:
> > > > On Tue, Jan 18, 2011 at 01:08:45PM +0200, Michael S. Tsirkin wrote:
> > > > > When built with rcu checks enabled, vhost triggers
> > > > > bogus warnings as vhost features are read without
> > > > > dev->mutex sometimes.
> > > > > Fixing it properly is not trivial as vhost.h does not
> > > > > know which lockdep classes it will be used under.
> > > > > Disable the warning by stubbing out the check for now.
> > > > > 
> > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > ---
> > > > >  drivers/vhost/vhost.h |    4 +---
> > > > >  1 files changed, 1 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> > > > > index 2af44b7..2d03a31 100644
> > > > > --- a/drivers/vhost/vhost.h
> > > > > +++ b/drivers/vhost/vhost.h
> > > > > @@ -173,9 +173,7 @@ static inline int vhost_has_feature(struct vhost_dev *dev, int bit)
> > > > >  {
> > > > >  	unsigned acked_features;
> > > > > 
> > > > > -	acked_features =
> > > > > -		rcu_dereference_index_check(dev->acked_features,
> > > > > -					    lockdep_is_held(&dev->mutex));
> > > > > +	acked_features = rcu_dereference_index_check(dev->acked_features, 1);
> > > > 
> > > > Ouch!!!
> > > > 
> > > > Could you please at least add a comment?
> > > 
> > > Yes, OK.
> > > 
> > > > Alternatively, pass in the lock that is held and check for that?  Given
> > > > that this is a static inline, the compiler should be able to optimize
> > > > the argument away when !PROVE_RCU, correct?
> > > > 
> > > > 							Thanx, Paul
> > > 
> > > Hopefully, yes. We don't always have a lock: the idea was
> > > to create a lockdep for these cases. But we can't pass
> > > the pointer to that ...
> > 
> > I suppose you could pass a pointer to the lockdep map structure.
> > Not sure if this makes sense, but it would handle the situation.
> 
> Will it compile with lockdep disabled too? What will the pointer be?

One (crude) approach would be to make the pointer void* if lockdep
is disabled.

> > Alternatively, create a helper function that checks the possibilities
> > and screams if none of them are in effect.
> > 
> > 							Thanx, Paul
> 
> The problem here is the callee needs to know about all callers.

As does the guy reading the code.  ;-)

							Thanx, Paul

> > > > >  	return acked_features & (1 << bit);
> > > > >  }
> > > > > 
> > > > > -- 
> > > > > 1.7.3.2.91.g446ac
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > > > the body of a message to majordomo@vger.kernel.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > Please read the FAQ at  http://www.tux.org/lkml/


* Re: [patch 2/4] ipset: make IPv4 and IPv6 address handling similar
From: Jan Engelhardt @ 2011-01-18 20:25 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: holger, netfilter-devel, netdev
In-Reply-To: <alpine.DEB.2.00.1101182116150.28203@blackhole.kfki.hu>


On Tuesday 2011-01-18 21:18, Jozsef Kadlecsik wrote:
>On Tue, 18 Jan 2011, holger@eitzenberger.org wrote:
>
>> While the following works for AF_INET:
>> 
>>  ipset add foo 192.168.1.1/32
>> 
>> this does not work for AF_INET6:
>> 
>>  ipset add foo6 20a1:1:2:3:4:5:6:7/128
>>  ipset v5.2: Syntax error: plain IP address must be supplied: 20a1:1:2:3:4:5:6:7/128
>
>Yeah, the usual issue: should IPv4/32 and IPv6/128 be handled as a plain 
>IPv4/v6 address when the manual says "enter a plain IPv4/v6 address" :-).

(Assuming this was a question, heuristically based on the word order
you used:) I don't think so. iptables, resp. its modules, do not
allow that either.


* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Jay Vosburgh @ 2011-01-18 20:24 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Oleg V. Ukhno, John Fastabend, David S. Miller,
	netdev@vger.kernel.org, Sébastien Barré,
	Christophe Paasch
In-Reply-To: <4D35BED5.7040301@gmail.com>

Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> wrote:

>On 18/01/2011 16:28, Oleg V. Ukhno wrote:
>> On 01/18/2011 05:54 PM, Nicolas de Pesloüan wrote:
>>> I remember a topology (described by Jay, as far as I remember),
>>> where two hosts were connected through two distinct VLANs. In such
>>> topology:
>>> - it is possible to detect path failure using arp monitoring instead of
>>> miimon.

	I don't think this is true, at least not for the case of
balance-rr.  Using ARP monitoring with any sort of load balance scheme
is problematic, because the replies may be balanced to a different slave
than the sender.

>>> - changing the destination MAC address of egress packets is not
>>> necessary, because egress path selection forces ingress path selection
>>> due to the VLAN.

	This is true, with one comment: Oleg's proposal we're discussing
changes the source MAC address of outgoing packets, not the destination.
The purpose being to manipulate the src-mac balancing algorithm on the
switch when the packets are hashed at the egress port channel group.
The packets (for a particular destination) all bear the same destination
MAC, but (as I understand it) are manually assigned tailored source MAC
addresses that hash to sequential values.
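
As a rough illustration of that idea, here is a Python sketch. Everything in it is an assumption made for the example: the switch hash (low byte of the source MAC modulo the number of ports) is a toy stand-in, since real switches use vendor-specific hashes, and the helper names and the locally administered MAC prefix are hypothetical:

```python
def toy_src_mac_hash(mac, n_ports):
    """Hypothetical switch hash: low byte of the source MAC modulo port count."""
    return int(mac.split(":")[-1], 16) % n_ports


def tailored_source_macs(prefix, n_ports):
    """Build one source MAC per port so the toy hash yields 0, 1, 2, ..."""
    return [f"{prefix}:{i:02x}" for i in range(n_ports)]


def assign_source_macs(packets, macs):
    """Rotate through the tailored source MACs, one per outgoing packet."""
    return [(pkt, macs[i % len(macs)]) for i, pkt in enumerate(packets)]


macs = tailored_source_macs("02:00:00:00:00", 3)
egress_ports = [toy_src_mac_hash(mac, 3)
                for _, mac in assign_source_macs(range(6), macs)]
```

Under this toy hash, consecutive packets to the same destination MAC are spread over ports 0, 1, 2 in turn, which is the round-robin effect the tailored source addresses are meant to produce at the egress port channel group.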

>> In the case with two VLANs - yes, this shouldn't be necessary (but needs to
>> be tested, I am not sure), but within one - it is essential for correct
>> rx load striping.
>
>Changing the destination MAC address is definitely not required if you
>segregate each path in a distinct VLAN.
>
>            +-------------------+     +-------------------+
>    +-------|switch 1 - vlan 100|-----|switch 2 - vlan 100|-------+
>    |       +-------------------+     +-------------------+       |
>+------+              |                         |              +------+
>|host A|              |                         |              |host B|
>+------+              |                         |              +------+
>    |       +-------------------+     +-------------------+       |
>    +-------|switch 3 - vlan 200|-----|switch 4 - vlan 200|-------+
>            +-------------------+     +-------------------+
>
>Even in the presence of an ISL between some switches, a packet sent through the
>host A interface connected to vlan 100 will only enter host B using the
>interface connected to vlan 100. So every slave of the bonding interface
>can use the same MAC address.

	That's true.  The big problem with the "VLAN tunnel" approach is
that it's not tolerant of link failures.

>Of course, changing the destination address would be required in order to
>achieve ingress load balancing on a *single* LAN. But, as Jay noted at the
>beginning of this thread, this would violate 802.3ad.
>
>>> I think the only point is whether we need a new xmit_hash_policy for
>>> mode=802.3ad or whether mode=balance-rr could be enough.
>> Maybe, but it seems fair enough to me not to restrict this feature only
>> to non-LACP aggregate links; dynamic aggregation may be useful (it sometimes
>> helps to avoid switch misconfiguration (misconfigured slaves on the switch
>> side) without loss of service).
>
>You are right, but such a LAN setup needs to be carefully designed and
>built. I'm not sure that an automatic channel aggregation system is the
>right way to do it. Hence the reason why I suggest using balance-rr with
>VLANs.

	The "VLAN tunnel" approach is a derivative of an actual switch
topology that balance-rr was originally intended for, many moons ago.
This is described in the current bonding.txt; I'll cut & paste a bit
here:

12.2 Maximum Throughput in a Multiple Switch Topology
-----------------------------------------------------

        Multiple switches may be utilized to optimize for throughput
when they are configured in parallel as part of an isolated network
between two or more systems, for example:

                       +-----------+
                       |  Host A   | 
                       +-+---+---+-+
                         |   |   |
                +--------+   |   +---------+
                |            |             |
         +------+---+  +-----+----+  +-----+----+
         | Switch A |  | Switch B |  | Switch C |
         +------+---+  +-----+----+  +-----+----+
                |            |             |
                +--------+   |   +---------+
                         |   |   |
                       +-+---+---+-+
                       |  Host B   | 
                       +-----------+

        In this configuration, the switches are isolated from one
another.  One reason to employ a topology such as this is for an
isolated network with many hosts (a cluster configured for high
performance, for example), using multiple smaller switches can be more
cost effective than a single larger switch, e.g., on a network with 24
hosts, three 24 port switches can be significantly less expensive than
a single 72 port switch.

        If access beyond the network is required, an individual host
can be equipped with an additional network device connected to an
external network; this host then additionally acts as a gateway.

	[end of cut]

	This was described to me some time ago as an early usage model
for balance-rr using multiple 10 Mb/sec switches.  It has the same link
monitoring problems as the "VLAN tunnel" approach, although modern
switches with "trunk failover" type of functionality may be able to
mitigate the problem.

>>> Oleg, would you mind trying the above "two VLAN" topology with
>>> mode=balance-rr and reporting any results? For high-availability purposes,
>>> it's obviously necessary to set up those VLANs on distinct switches.
>> I'll do it, but it will take some time to set up the test environment,
>> maybe several days.
>
>Thanks. For testing purposes, it is enough to set up those VLANs on a single
>switch if it is easier for you to do.
>
>> You mean the following topology:
>
>See above.
>
>> (I'm sure it will work as desired if each host is connected to each
>> switch with only one slave link; if there are more slaves on each switch
>> - unsure)?
>
>If you want to use more than 2 slaves per host, then you need more than 2
>VLANs. You also need to have the exact same number of slaves on all hosts,
>as egress path selection causes ingress path selection at the other side.
>
>            +-------------------+     +-------------------+
>    +-------|switch 1 - vlan 100|-----|switch 2 - vlan 100|-------+
>    |       +-------------------+     +-------------------+       |
>+------+              |                         |              +------+
>|host A|              |                         |              |host B|
>+------+              |                         |              +------+
>  | |       +-------------------+     +-------------------+       | |
>  | +-------|switch 3 - vlan 200|-----|switch 4 - vlan 200|-------+ |
>  |         +-------------------+     +-------------------+         |
>  |                   |                         |                   |
>  |                   |                         |                   |
>  |         +-------------------+     +-------------------+         |
>  +---------|switch 5 - vlan 300|-----|switch 6 - vlan 300|---------+
>            +-------------------+     +-------------------+
>
>Of course, you can add other hosts to vlan 100, 200 and 300, with the
>exact same configuration as host A or host B.

	This is essentially the same thing as the diagram I pasted in up
above, except with VLANs and an additional layer of switches between the
hosts.  The multiple VLANs take the place of multiple discrete switches.

	This could also be accomplished via bridge groups (in
Cisco-speak).  For example, instead of VLAN 100, that could be bridge
group X, VLAN 200 is bridge group Y, and so on.

	Neither the VLAN nor the bridge group methods handle link
failures very well; if, in the above diagram, the link from "switch 2
vlan 100" to "host B" fails, there's no way for host A to know to stop
sending to "switch 1 vlan 100," and there's no backup path for VLAN 100
to "host B."

	One item I'd like to see some more data on is the level of
reordering at the receiver in Oleg's system.

	One of the reasons round robin isn't as useful as it once was is
due to the rise of NAPI and interrupt coalescing, both of which will
tend to increase the reordering of packets at the receiver when the
packets are evenly striped.  In the old days, it was one interrupt, one
packet.  Now, it's one interrupt or NAPI poll, many packets.  With the
packets striped across interfaces, this will tend to increase
reordering.  E.g.,

	slave 1		slave 2		slave 3
	Packet 1	P2		P3
	P4		P5		P6
	P7		P8		P9

	and so on.  A poll of slave 1 will get packets 1, 4 and 7 (and
probably several more), then a poll of slave 2 will get 2, 5 and 8, etc.
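
The effect can be reproduced with a small Python simulation. This is only a sketch under simplifying assumptions: packets are striped strictly round-robin, each slave is polled in turn, and the per-poll budget (a hypothetical parameter here) stands in for how many packets one interrupt or NAPI poll hands up:

```python
def stripe_round_robin(packets, n_slaves):
    """Distribute packets across slaves in strict round-robin order."""
    queues = [[] for _ in range(n_slaves)]
    for i, pkt in enumerate(packets):
        queues[i % n_slaves].append(pkt)
    return queues


def poll_in_batches(queues, budget):
    """Visit each slave in turn, taking up to `budget` packets per visit."""
    queues = [list(q) for q in queues]  # work on copies
    delivered = []
    while any(queues):
        for q in queues:
            delivered.extend(q[:budget])
            del q[:budget]
    return delivered


queues = stripe_round_robin(list(range(1, 10)), 3)
one_per_interrupt = poll_in_batches(queues, budget=1)
many_per_poll = poll_in_batches(queues, budget=3)
```

With a budget of 1 (the old one interrupt, one packet model) the receiver sees 1 through 9 in order; with a budget of 3 it sees 1, 4, 7, 2, 5, 8, 3, 6, 9, matching the table above.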

	I haven't done much testing with this lately, but I suspect this
behavior hasn't really changed.  Raising the tcp_reordering sysctl value
can mitigate this somewhat (by making TCP more tolerant of this), but
that doesn't help non-TCP protocols.

	Barring evidence to the contrary, I presume that Oleg's system
delivers out of order at the receiver.  That's not automatically a
reason to reject it, but this entire proposal is sufficiently complex to
configure that very explicit documentation will be necessary.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* pull request: wireless-2.6 2011-01-18
From: John W. Linville @ 2011-01-18 20:21 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Dave,

Here is another batch of fixes intended for 2.6.38.  Included are a
locking fix, an endian fix, a lockdep fix, an error code fix, a memory
leak fix, a type fix, and a couple of others.

Please let me know if there are problems!

Thanks,

John

---

The following changes since commit 16c0f9362433a76f01d174bb8b9c87b9a96198ee:

  qeth: l3 hw tx csum circumvent hw bug (2011-01-15 20:45:57 -0800)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git master

Amitkumar Karwar (1):
      ieee80211: correct IEEE80211_ADDBA_PARAM_BUF_SIZE_MASK macro

Axel Lin (1):
      iwmc3200wifi: Return proper error for iwm_if_alloc

Bob Copeland (1):
      ath5k: fix locking in tx_complete_poll_work

Jesper Juhl (1):
      rt2x00: Don't leak mem in error path of rt2x00lib_request_firmware()

Johannes Berg (1):
      mac80211: fix lockdep warning

Luciano Coelho (1):
      mac80211: use maximum number of AMPDU frames as default in BA RX

Luis R. Rodriguez (1):
      ath9k_hw: ASPM interoperability fix for AR9380/AR9382

Rajkumar Manoharan (2):
      ath9k_htc: Fix endian issue in tx header
      ath9k_hw: do PA offset calibration only on longcal interval

Wey-Yi Guy (1):
      iwlwifi: fix valid chain reading from EEPROM

 drivers/net/wireless/ath/ath5k/base.c              |    4 ++++
 drivers/net/wireless/ath/ath9k/ar9002_calib.c      |   10 +++++-----
 .../net/wireless/ath/ath9k/ar9003_2p2_initvals.h   |    2 +-
 drivers/net/wireless/ath/ath9k/ar9003_hw.c         |    4 ++--
 drivers/net/wireless/ath/ath9k/htc.h               |    2 +-
 drivers/net/wireless/ath/ath9k/htc_drv_txrx.c      |    8 +++++---
 drivers/net/wireless/iwlwifi/iwl-agn-eeprom.c      |    2 +-
 drivers/net/wireless/iwmc3200wifi/netdev.c         |    2 ++
 drivers/net/wireless/rt2x00/rt2x00firmware.c       |    1 +
 include/linux/ieee80211.h                          |    2 +-
 net/mac80211/agg-rx.c                              |   11 ++---------
 net/mac80211/main.c                                |   12 +++++++++++-
 12 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/drivers/net/wireless/ath/ath5k/base.c b/drivers/net/wireless/ath/ath5k/base.c
index 019a74d..09ae4ef 100644
--- a/drivers/net/wireless/ath/ath5k/base.c
+++ b/drivers/net/wireless/ath/ath5k/base.c
@@ -2294,6 +2294,8 @@ ath5k_tx_complete_poll_work(struct work_struct *work)
 	int i;
 	bool needreset = false;
 
+	mutex_lock(&sc->lock);
+
 	for (i = 0; i < ARRAY_SIZE(sc->txqs); i++) {
 		if (sc->txqs[i].setup) {
 			txq = &sc->txqs[i];
@@ -2321,6 +2323,8 @@ ath5k_tx_complete_poll_work(struct work_struct *work)
 		ath5k_reset(sc, NULL, true);
 	}
 
+	mutex_unlock(&sc->lock);
+
 	ieee80211_queue_delayed_work(sc->hw, &sc->tx_complete_work,
 		msecs_to_jiffies(ATH5K_TX_COMPLETE_POLL_INT));
 }
diff --git a/drivers/net/wireless/ath/ath9k/ar9002_calib.c b/drivers/net/wireless/ath/ath9k/ar9002_calib.c
index ea2e7d7..5e300bd 100644
--- a/drivers/net/wireless/ath/ath9k/ar9002_calib.c
+++ b/drivers/net/wireless/ath/ath9k/ar9002_calib.c
@@ -679,10 +679,6 @@ static bool ar9002_hw_calibrate(struct ath_hw *ah,
 
 	/* Do NF cal only at longer intervals */
 	if (longcal || nfcal_pending) {
-		/* Do periodic PAOffset Cal */
-		ar9002_hw_pa_cal(ah, false);
-		ar9002_hw_olc_temp_compensation(ah);
-
 		/*
 		 * Get the value from the previous NF cal and update
 		 * history buffer.
@@ -697,8 +693,12 @@ static bool ar9002_hw_calibrate(struct ath_hw *ah,
 			ath9k_hw_loadnf(ah, ah->curchan);
 		}
 
-		if (longcal)
+		if (longcal) {
 			ath9k_hw_start_nfcal(ah, false);
+			/* Do periodic PAOffset Cal */
+			ar9002_hw_pa_cal(ah, false);
+			ar9002_hw_olc_temp_compensation(ah);
+		}
 	}
 
 	return iscaldone;
diff --git a/drivers/net/wireless/ath/ath9k/ar9003_2p2_initvals.h b/drivers/net/wireless/ath/ath9k/ar9003_2p2_initvals.h
index 81f9cf2..9ecca93 100644
--- a/drivers/net/wireless/ath/ath9k/ar9003_2p2_initvals.h
+++ b/drivers/net/wireless/ath/ath9k/ar9003_2p2_initvals.h
@@ -1842,7 +1842,7 @@ static const u32 ar9300_2p2_soc_preamble[][2] = {
 
 static const u32 ar9300PciePhy_pll_on_clkreq_disable_L1_2p2[][2] = {
 	/* Addr      allmodes  */
-	{0x00004040, 0x08212e5e},
+	{0x00004040, 0x0821265e},
 	{0x00004040, 0x0008003b},
 	{0x00004044, 0x00000000},
 };
diff --git a/drivers/net/wireless/ath/ath9k/ar9003_hw.c b/drivers/net/wireless/ath/ath9k/ar9003_hw.c
index 6137634..06fb2c8 100644
--- a/drivers/net/wireless/ath/ath9k/ar9003_hw.c
+++ b/drivers/net/wireless/ath/ath9k/ar9003_hw.c
@@ -146,8 +146,8 @@ static void ar9003_hw_init_mode_regs(struct ath_hw *ah)
 		/* Sleep Setting */
 
 		INIT_INI_ARRAY(&ah->iniPcieSerdesLowPower,
-				ar9300PciePhy_clkreq_enable_L1_2p2,
-				ARRAY_SIZE(ar9300PciePhy_clkreq_enable_L1_2p2),
+				ar9300PciePhy_pll_on_clkreq_disable_L1_2p2,
+				ARRAY_SIZE(ar9300PciePhy_pll_on_clkreq_disable_L1_2p2),
 				2);
 
 		/* Fast clock modal settings */
diff --git a/drivers/net/wireless/ath/ath9k/htc.h b/drivers/net/wireless/ath/ath9k/htc.h
index 1ce506f..780ac5e 100644
--- a/drivers/net/wireless/ath/ath9k/htc.h
+++ b/drivers/net/wireless/ath/ath9k/htc.h
@@ -78,7 +78,7 @@ struct tx_frame_hdr {
 	u8 node_idx;
 	u8 vif_idx;
 	u8 tidno;
-	u32 flags; /* ATH9K_HTC_TX_* */
+	__be32 flags; /* ATH9K_HTC_TX_* */
 	u8 key_type;
 	u8 keyix;
 	u8 reserved[26];
diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
index 33f3602..7a5ffca 100644
--- a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
+++ b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
@@ -113,6 +113,7 @@ int ath9k_htc_tx_start(struct ath9k_htc_priv *priv, struct sk_buff *skb)
 
 	if (ieee80211_is_data(fc)) {
 		struct tx_frame_hdr tx_hdr;
+		u32 flags = 0;
 		u8 *qc;
 
 		memset(&tx_hdr, 0, sizeof(struct tx_frame_hdr));
@@ -136,13 +137,14 @@ int ath9k_htc_tx_start(struct ath9k_htc_priv *priv, struct sk_buff *skb)
 		/* Check for RTS protection */
 		if (priv->hw->wiphy->rts_threshold != (u32) -1)
 			if (skb->len > priv->hw->wiphy->rts_threshold)
-				tx_hdr.flags |= ATH9K_HTC_TX_RTSCTS;
+				flags |= ATH9K_HTC_TX_RTSCTS;
 
 		/* CTS-to-self */
-		if (!(tx_hdr.flags & ATH9K_HTC_TX_RTSCTS) &&
+		if (!(flags & ATH9K_HTC_TX_RTSCTS) &&
 		    (priv->op_flags & OP_PROTECT_ENABLE))
-			tx_hdr.flags |= ATH9K_HTC_TX_CTSONLY;
+			flags |= ATH9K_HTC_TX_CTSONLY;
 
+		tx_hdr.flags = cpu_to_be32(flags);
 		tx_hdr.key_type = ath9k_cmn_get_hw_crypto_keytype(skb);
 		if (tx_hdr.key_type == ATH9K_KEY_TYPE_CLEAR)
 			tx_hdr.keyix = (u8) ATH9K_TXKEYIX_INVALID;
diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-eeprom.c b/drivers/net/wireless/iwlwifi/iwl-agn-eeprom.c
index 97906dd..14ceb4d 100644
--- a/drivers/net/wireless/iwlwifi/iwl-agn-eeprom.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn-eeprom.c
@@ -168,7 +168,7 @@ int iwl_eeprom_check_sku(struct iwl_priv *priv)
 		/* not using .cfg overwrite */
 		radio_cfg = iwl_eeprom_query16(priv, EEPROM_RADIO_CONFIG);
 		priv->cfg->valid_tx_ant = EEPROM_RF_CFG_TX_ANT_MSK(radio_cfg);
-		priv->cfg->valid_rx_ant = EEPROM_RF_CFG_TX_ANT_MSK(radio_cfg);
+		priv->cfg->valid_rx_ant = EEPROM_RF_CFG_RX_ANT_MSK(radio_cfg);
 		if (!priv->cfg->valid_tx_ant || !priv->cfg->valid_rx_ant) {
 			IWL_ERR(priv, "Invalid chain (0X%x, 0X%x)\n",
 				priv->cfg->valid_tx_ant,
diff --git a/drivers/net/wireless/iwmc3200wifi/netdev.c b/drivers/net/wireless/iwmc3200wifi/netdev.c
index 13a69eb..5091d77 100644
--- a/drivers/net/wireless/iwmc3200wifi/netdev.c
+++ b/drivers/net/wireless/iwmc3200wifi/netdev.c
@@ -126,6 +126,7 @@ void *iwm_if_alloc(int sizeof_bus, struct device *dev,
 	ndev = alloc_netdev_mq(0, "wlan%d", ether_setup, IWM_TX_QUEUES);
 	if (!ndev) {
 		dev_err(dev, "no memory for network device instance\n");
+		ret = -ENOMEM;
 		goto out_priv;
 	}
 
@@ -138,6 +139,7 @@ void *iwm_if_alloc(int sizeof_bus, struct device *dev,
 				    GFP_KERNEL);
 	if (!iwm->umac_profile) {
 		dev_err(dev, "Couldn't alloc memory for profile\n");
+		ret = -ENOMEM;
 		goto out_profile;
 	}
 
diff --git a/drivers/net/wireless/rt2x00/rt2x00firmware.c b/drivers/net/wireless/rt2x00/rt2x00firmware.c
index f0e1eb7..be0ff78 100644
--- a/drivers/net/wireless/rt2x00/rt2x00firmware.c
+++ b/drivers/net/wireless/rt2x00/rt2x00firmware.c
@@ -58,6 +58,7 @@ static int rt2x00lib_request_firmware(struct rt2x00_dev *rt2x00dev)
 
 	if (!fw || !fw->size || !fw->data) {
 		ERROR(rt2x00dev, "Failed to read Firmware.\n");
+		release_firmware(fw);
 		return -ENOENT;
 	}
 
diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h
index 6042228..294169e 100644
--- a/include/linux/ieee80211.h
+++ b/include/linux/ieee80211.h
@@ -959,7 +959,7 @@ struct ieee80211_ht_info {
 /* block-ack parameters */
 #define IEEE80211_ADDBA_PARAM_POLICY_MASK 0x0002
 #define IEEE80211_ADDBA_PARAM_TID_MASK 0x003C
-#define IEEE80211_ADDBA_PARAM_BUF_SIZE_MASK 0xFFA0
+#define IEEE80211_ADDBA_PARAM_BUF_SIZE_MASK 0xFFC0
 #define IEEE80211_DELBA_PARAM_TID_MASK 0xF000
 #define IEEE80211_DELBA_PARAM_INITIATOR_MASK 0x0800
 
diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
index f138b19..227ca82 100644
--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -185,8 +185,6 @@ void ieee80211_process_addba_request(struct ieee80211_local *local,
 				     struct ieee80211_mgmt *mgmt,
 				     size_t len)
 {
-	struct ieee80211_hw *hw = &local->hw;
-	struct ieee80211_conf *conf = &hw->conf;
 	struct tid_ampdu_rx *tid_agg_rx;
 	u16 capab, tid, timeout, ba_policy, buf_size, start_seq_num, status;
 	u8 dialog_token;
@@ -231,13 +229,8 @@ void ieee80211_process_addba_request(struct ieee80211_local *local,
 		goto end_no_lock;
 	}
 	/* determine default buffer size */
-	if (buf_size == 0) {
-		struct ieee80211_supported_band *sband;
-
-		sband = local->hw.wiphy->bands[conf->channel->band];
-		buf_size = IEEE80211_MIN_AMPDU_BUF;
-		buf_size = buf_size << sband->ht_cap.ampdu_factor;
-	}
+	if (buf_size == 0)
+		buf_size = IEEE80211_MAX_AMPDU_BUF;
 
 
 	/* examine state machine */
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 485d36b..a46ff06 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -39,6 +39,8 @@ module_param(ieee80211_disable_40mhz_24ghz, bool, 0644);
 MODULE_PARM_DESC(ieee80211_disable_40mhz_24ghz,
 		 "Disable 40MHz support in the 2.4GHz band");
 
+static struct lock_class_key ieee80211_rx_skb_queue_class;
+
 void ieee80211_configure_filter(struct ieee80211_local *local)
 {
 	u64 mc;
@@ -569,7 +571,15 @@ struct ieee80211_hw *ieee80211_alloc_hw(size_t priv_data_len,
 	spin_lock_init(&local->filter_lock);
 	spin_lock_init(&local->queue_stop_reason_lock);
 
-	skb_queue_head_init(&local->rx_skb_queue);
+	/*
+	 * The rx_skb_queue is only accessed from tasklets,
+	 * but other SKB queues are used from within IRQ
+	 * context. Therefore, this one needs a different
+	 * locking class so our direct, non-irq-safe use of
+	 * the queue's lock doesn't throw lockdep warnings.
+	 */
+	skb_queue_head_init_class(&local->rx_skb_queue,
+				  &ieee80211_rx_skb_queue_class);
 
 	INIT_DELAYED_WORK(&local->scan_work, ieee80211_scan_work);
 
-- 
John W. Linville		Someday the world will need a hero, and you
linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org			might be all we have.  Be ready.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [patch 4/4] ipset: fix build with NDEBUG defined
From: Jozsef Kadlecsik @ 2011-01-18 20:20 UTC (permalink / raw)
  To: holger; +Cc: netfilter-devel, netdev
In-Reply-To: <20110118142248.489615870@eitzenberger.org>

On Tue, 18 Jan 2011, holger@eitzenberger.org wrote:

> The usage of the gcc option -Wunused-parameter interferes badly with
> the assert() macros.  If -DNDEBUG is specified, the build fails with:
> 
>   cc1: warnings being treated as errors
>   print.c: In function 'ipset_print_family':
>   print.c:92: error: unused parameter 'opt'
>   print.c: In function 'ipset_print_port':
>   print.c:413: error: unused parameter 'opt'
>   print.c: In function 'ipset_print_proto':
> 
> A possible fix is just to remove -Wunused, as -Wextra + -Wunused enables
> -Wunused-parameter.

I chose to keep the compiler flags and add the required attribute to the 
function parameters instead.
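
A minimal sketch of that approach, assuming GCC or Clang. The function
name and the UNUSED macro are illustrative and are not taken from the
ipset source. With -DNDEBUG, assert() expands to nothing, so a parameter
that is referenced only inside assert() becomes unused and trips
-Wunused-parameter; marking the parameter keeps the warning flags intact:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper macro; GCC and Clang both accept this attribute. */
#define UNUSED __attribute__((unused))

/*
 * 'opt' is only referenced inside assert().  With -DNDEBUG the assert()
 * disappears, so without the attribute -Wunused-parameter would fire.
 */
static int print_family(char *buf, size_t len, int family, int opt UNUSED)
{
	assert(opt >= 0);
	return snprintf(buf, len, "%s", family == 6 ? "inet6" : "inet");
}
```

The attribute only suppresses the warning; it does not forbid using the
parameter, so the same source builds with and without -DNDEBUG.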

Many thanks again, Holger!

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply

* Re: [PATCH] af_unix: implement socket filter
From: Eric Dumazet @ 2011-01-18 20:19 UTC (permalink / raw)
  To: Alban Crequy
  Cc: Ian Molton, netdev, linux-kernel, davem, ebiederm, xemul, davidel
In-Reply-To: <20110118175143.3a164669@chocolatine.cbg.collabora.co.uk>

On Tuesday 18 January 2011 at 17:51 +0000, Alban Crequy wrote:
> On Tue, 18 Jan 2011 18:22:41 +0100,
> Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > Any idea on performance cost adding sk_filter() call ?
> 
> Ian will write a performance test and repost the patch with some stats.
> I don't know about the performance cost.

Don't spend time on this, it was more a question for myself ;)

The cost should be very small unless a complex filter is used. I also have
a JIT compiler for BPF on x86_64 and will post it when net-next-2.6 reopens.

^ permalink raw reply

* Re: [patch 2/4] ipset: make IPv4 and IPv6 address handling similar
From: Jozsef Kadlecsik @ 2011-01-18 20:18 UTC (permalink / raw)
  To: holger; +Cc: netfilter-devel, netdev
In-Reply-To: <20110118142247.498399684@eitzenberger.org>

On Tue, 18 Jan 2011, holger@eitzenberger.org wrote:

> While the following works for AF_INET:
> 
>  ipset add foo 192.168.1.1/32
> 
> this does not work for AF_INET6:
> 
>  ipset add foo6 20a1:1:2:3:4:5:6:7/128
>  ipset v5.2: Syntax error: plain IP address must be supplied: 20a1:1:2:3:4:5:6:7/128

Yeah, the usual issue: should IPv4/32 and IPv6/128 be handled as a plain 
IPv4/v6 address when the manual says "enter a plain IPv4/v6 address" :-).

The complete fix was to add the exception to the generic IP address parser 
function.
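
A rough user-space sketch of such an exception, not the actual ipset
parser: parse_plain_ip() is a hypothetical name, and the rule it models is
only "a full-length prefix (/32 or /128) is accepted as a plain address".
Range handling for set types that accept IPv4/prefix is left out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical parser: accept "ADDR" or "ADDR/PREFIX" where PREFIX is the
 * full address length (32 for AF_INET, 128 for AF_INET6).
 * Returns 0 on success, -1 otherwise.
 */
static int parse_plain_ip(int family, const char *str, void *addr)
{
	char tmp[INET6_ADDRSTRLEN + 5];
	unsigned long full = (family == AF_INET6) ? 128 : 32;
	char *slash, *end;

	if (strlen(str) >= sizeof(tmp))
		return -1;
	strcpy(tmp, str);

	slash = strchr(tmp, '/');
	if (slash != NULL) {
		*slash++ = '\0';
		/* Only the full-length prefix counts as a plain address. */
		if (!isdigit((unsigned char)*slash) ||
		    strtoul(slash, &end, 10) != full || *end != '\0')
			return -1;
	}
	return inet_pton(family, tmp, addr) == 1 ? 0 : -1;
}
```

With this rule both 192.168.1.1/32 and 20a1:1:2:3:4:5:6:7/128 parse as
plain addresses, while 20a1:1234:5678::/64 is still rejected.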

Best regards,
Jozsef  
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply

* Re: [patch 3/4] ipset: do session initialization once
From: Jozsef Kadlecsik @ 2011-01-18 20:16 UTC (permalink / raw)
  To: holger; +Cc: netfilter-devel, netdev
In-Reply-To: <20110118142247.991392505@eitzenberger.org>

On Tue, 18 Jan 2011, holger@eitzenberger.org wrote:

> Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
> 
> Index: ipset/src/ipset.c
> ===================================================================
> --- ipset.orig/src/ipset.c	2011-01-05 12:05:31.000000000 +0100
> +++ ipset/src/ipset.c	2011-01-05 12:07:02.000000000 +0100
> @@ -431,14 +431,6 @@
>  	const struct ipset_commands *command;
>  	const struct ipset_type *type;
>  
> -	/* Initialize session */
> -	if (session == NULL) {
> -		session = ipset_session_init(printf);
> -		if (session == NULL)
> -			return exit_error(OTHER_PROBLEM,
> -				"Cannot initialize ipset session, aborting.");
> -	}
> -
>  	/* Commandline parsing, somewhat similar to that of 'ip' */
>  
>  	/* First: parse core options */
> @@ -743,5 +735,10 @@
>  	ipset_type_add(&ipset_hash_ipportnet0);
>  	ipset_type_add(&ipset_list_set0);
>  
> +	session = ipset_session_init(printf);
> +	if (session == NULL)
> +		return exit_error(OTHER_PROBLEM,
> +						  "Cannot initialize ipset session, aborting.");
> +
>  	return parse_commandline(argc, argv);
>  }
> 
> -- 
> 

Applied, thanks!

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply

* Re: [patch 1/4] ipset: show correct line numbers in restore output
From: Jozsef Kadlecsik @ 2011-01-18 20:15 UTC (permalink / raw)
  To: holger; +Cc: netfilter-devel, netdev
In-Reply-To: <20110118142247.001984763@eitzenberger.org>

Hi Holger,

First of all, thanks for the patches and reports.

On Tue, 18 Jan 2011, holger@eitzenberger.org wrote:

> When passing something like
> 
>   create foo6 hash:ip hashsize 64 family inet6
>   add foo6 20a1:1234:5678::/64
>   add foo6 20a1:1234:5679::/64
> 
> you get:
> 
>   ipset v5.2: Error in line 1: Syntax error: plain IP address must be supplied: 20a1:1234:5678::/64
> 
> Should be line 2 though.

Yes, good catch! Unfortunately your patch overwrites the correct line 
number when reported by the kernel. The proper fix was to add the missing 
session line number setting *before* the parser is called.
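
To illustrate the ordering only: the sketch below uses a made-up session
structure and a toy parser (none of these names come from ipset). The
point is that the restore loop records the line number before the parser
runs, so a parse error reports the line that caused it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical session state; the real ipset session structure differs. */
struct session {
	unsigned int lineno;	/* line currently being processed */
	char report[128];	/* last error message */
};

/* Toy parser: rejects any line containing '/', reporting s->lineno. */
static int parse_line(struct session *s, const char *line)
{
	if (strchr(line, '/') != NULL) {
		snprintf(s->report, sizeof(s->report),
			 "Error in line %u: Syntax error", s->lineno);
		return -1;
	}
	return 0;
}

/* Restore loop: record the line number before the parser is called. */
static int restore(struct session *s, const char *const *lines, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		s->lineno = i + 1;	/* set first, so parse errors see it */
		if (parse_line(s, lines[i]) < 0)
			return -1;
	}
	return 0;
}
```

Fed the three lines from the report, this toy loop stops at the first
"add" and reports line 2.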

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply

* Re: Flow Control and Port Mirroring Revisited
From: Michael S. Tsirkin @ 2011-01-18 20:13 UTC (permalink / raw)
  To: Rick Jones
  Cc: Simon Horman, Jesse Gross, Eric Dumazet, Rusty Russell,
	virtualization, dev, virtualization, netdev, kvm
In-Reply-To: <4D35ECE2.4040901@hp.com>

On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
> >So it won't be all that simple to implement well, and before we try,
> >I'd like to know whether there are applications that are helped
> >by it. For example, we could try to measure latency at various
> >pps and see whether the backpressure helps. netperf has -b, -w
> >flags which might help these measurements.
> 
> Those options are enabled when one adds --enable-burst to the
> pre-compilation ./configure  of netperf (one doesn't have to
> recompile netserver).  However, if one is also looking at latency
> statistics via the -j option in the top-of-trunk, or simply at the
> histogram with --enable-histogram on the ./configure and a verbosity
> level of 2 (global -v 2) then one wants the very top of trunk
> netperf from:
> 
> http://www.netperf.org/svn/netperf2/trunk
> 
> to get the recently added support for accurate (netperf level) RTT
> measurements on burst-mode request/response tests.
> 
> happy benchmarking,
> 
> rick jones
> 
> PS - the enhanced latency statistics from -j are only available in
> the "omni" version of the TCP_RR test.  To get that add a
> --enable-omni to the ./configure - and in this case both netperf and
> netserver have to be recompiled.


Is this TCP only? I would love to get latency data from UDP as well.

>  For very basic output one can
> peruse the output of:
> 
> src/netperf -t omni -- -O /?
> 
> and then pick those outputs of interest and put them into an output
> selection file which one then passes to either (test-specific) -o,
> -O or -k to get CSV, "Human" or keyval output respectively.  E.G.
> 
> raj@tardy:~/netperf2_trunk$ cat foo
> THROUGHPUT,THROUGHPUT_UNITS
> RT_LATENCY,MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY
> P50_LATENCY,P90_LATENCY,P99_LATENCY,STDDEV_LATENCY
> 
> when foo is passed to -o one will get those all on one line of CSV.
> To -O one gets three lines of more netperf-classic-like "human"
> readable output, and when one passes that to -k one gets a string of
> keyval output a la:
> 
> raj@tardy:~/netperf2_trunk$ src/netperf -t omni -j -v 2 -- -r 1 -d rr -k foo
> OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost
> (127.0.0.1) port 0 AF_INET : histogram
> THROUGHPUT=29454.12
> THROUGHPUT_UNITS=Trans/s
> RT_LATENCY=33.951
> MIN_LATENCY=19
> MEAN_LATENCY=32.00
> MAX_LATENCY=126
> P50_LATENCY=32
> P90_LATENCY=38
> P99_LATENCY=41
> STDDEV_LATENCY=5.46
> 
> Histogram of request/response times
> UNIT_USEC     :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
> TEN_USEC      :    0: 3553: 45244: 237790: 7859:   86:   10:    3:    0:    0
> HUNDRED_USEC  :    0:    2:    0:    0:    0:    0:    0:    0:    0:    0
> UNIT_MSEC     :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
> TEN_MSEC      :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
> HUNDRED_MSEC  :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
> UNIT_SEC      :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
> TEN_SEC       :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
> >100_SECS: 0
> HIST_TOTAL:      294547

^ permalink raw reply

* Re: [PATCH] vhost: rcu annotation fixup
From: Michael S. Tsirkin @ 2011-01-18 20:10 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Jason Wang, kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20110118190232.GM2193@linux.vnet.ibm.com>

On Tue, Jan 18, 2011 at 11:02:33AM -0800, Paul E. McKenney wrote:
> On Tue, Jan 18, 2011 at 07:55:00PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Jan 18, 2011 at 09:48:34AM -0800, Paul E. McKenney wrote:
> > > On Tue, Jan 18, 2011 at 01:08:45PM +0200, Michael S. Tsirkin wrote:
> > > > When built with rcu checks enabled, vhost triggers
> > > > bogus warnings as vhost features are read without
> > > > dev->mutex sometimes.
> > > > Fixing it properly is not trivial as vhost.h does not
> > > > know which lockdep classes it will be used under.
> > > > Disable the warning by stubbing out the check for now.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > >  drivers/vhost/vhost.h |    4 +---
> > > >  1 files changed, 1 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> > > > index 2af44b7..2d03a31 100644
> > > > --- a/drivers/vhost/vhost.h
> > > > +++ b/drivers/vhost/vhost.h
> > > > @@ -173,9 +173,7 @@ static inline int vhost_has_feature(struct vhost_dev *dev, int bit)
> > > >  {
> > > >  	unsigned acked_features;
> > > > 
> > > > -	acked_features =
> > > > -		rcu_dereference_index_check(dev->acked_features,
> > > > -					    lockdep_is_held(&dev->mutex));
> > > > +	acked_features = rcu_dereference_index_check(dev->acked_features, 1);
> > > 
> > > Ouch!!!
> > > 
> > > Could you please at least add a comment?
> > 
> > Yes, OK.
> > 
> > > Alternatively, pass in the lock that is held and check for that?  Given
> > > that this is a static inline, the compiler should be able to optimize
> > > the argument away when !PROVE_RCU, correct?
> > > 
> > > 							Thanx, Paul
> > 
> > Hopefully, yes. We don't always have a lock: the idea was
> > to create a lockdep for these cases. But we can't pass
> > the pointer to that ...
> 
> I suppose you could pass a pointer to the lockdep map structure.
> Not sure if this makes sense, but it would handle the situation.

Will it compile with lockdep disabled too? What will the pointer be?

> Alternatively, create a helper function that checks the possibilities
> and screams if none of them are in effect.
> 
> 							Thanx, Paul

The problem here is the callee needs to know about all callers.

> > > >  	return acked_features & (1 << bit);
> > > >  }
> > > > 
> > > > -- 
> > > > 1.7.3.2.91.g446ac
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: inbound connection problems when "netlink: test for all flags of the NLM_F_DUMP composite" commit applied
From: Jarek Poplawski @ 2011-01-18 20:07 UTC (permalink / raw)
  To: Alessandro Suardi
  Cc: Jan Engelhardt, jamal, David Miller, pablo, arthur.marsh,
	eric.dumazet, netdev
In-Reply-To: <AANLkTi=00FRr08NjeMLN75U_N-cb9F78=E26LmMY7-uV@mail.gmail.com>

On Tue, Jan 18, 2011 at 08:26:42PM +0100, Alessandro Suardi wrote:
> On Tue, Jan 18, 2011 at 7:47 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> > On Tue, Jan 18, 2011 at 07:28:52PM +0100, Jarek Poplawski wrote:
> >> On Tue, Jan 18, 2011 at 07:24:40PM +0100, Jan Engelhardt wrote:
> >> >
> >> > On Tuesday 2011-01-18 19:10, Alessandro Suardi wrote:
> >> > >On Tue, Jan 18, 2011 at 6:23 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> >> > >>
> >> > >> NLM_F_DUMP flags should be applied to GET requests only, eg. rtnetlink
> >> > >> tests message type to verify this. Since genetlink can't do the same
> >> > >> use "practical" test for ops->dumpit (assuming NEW request won't be
> >> > >> mixed with GET).
> >> ...
> >> > >2.6.37-git18 + netlink revert + this patch
> >> > > - fixes Avahi
> >> > > - breaks acpid
> >> > >Starting acpi daemon: RTNETLINK1 answers: Operation not supported
> >> > >acpid: error talking to the kernel via netlink
> >> >
> >> > Deducing from that, it is a GET-like request that was sent by acpid,
> >> > and the message type is one that has both a dumpit and a doit function.
> >> > So if EOPNOTSUPP now occurs on all message types that have both dumpit
> >> > and doit, you should have broken a lot more than just acpid.
> >>
> >> Right, we need something better here.
> >
> > On the other hand, until there is something better, we might try to
> > fix it at least for "normal" dumpit cases?
> >
> > Alessandro, could you try (with the netlink revert)?
...
> 2.6.37-git18 + netlink revert + this 2nd attempt
> 
>  appears to be good for me - both Avahi and acpid start up fine and I
>  can't see any other program misbehaving.
> 
> 
Alessandro, thanks for testing!

Jan, feel free to NAK if it can't help for your problem.

Jarek P.
---------------->
[PATCH v2] netlink: Fix possible NLM_F_DUMP misuse in genetlink

NLM_F_DUMP flags should be applied to GET requests only, eg. rtnetlink
tests message type to verify this. Since genetlink can't do the same
use "practical" test for ops->dumpit, assuming NEW request won't be
mixed with GET. Otherwise, it should work old way. Since, as reported
by Alessandro, apps like acpid use messages with ops->dumpit without
NLM_F_DUMP flags, there is no error reporting for this case.

Original patch by: Jan Engelhardt <jengelh@medozas.de>

Tested-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
---

diff -Nurp a/net/netlink/genetlink.c b/net/netlink/genetlink.c
--- a/net/netlink/genetlink.c	2011-01-18 16:58:16.000000000 +0100
+++ b/net/netlink/genetlink.c	2011-01-18 19:36:25.000000000 +0100
@@ -519,15 +519,14 @@ static int genl_rcv_msg(struct sk_buff *
 	    security_netlink_recv(skb, CAP_NET_ADMIN))
 		return -EPERM;
 
-	if (nlh->nlmsg_flags & NLM_F_DUMP) {
-		if (ops->dumpit == NULL)
-			return -EOPNOTSUPP;
-
-		genl_unlock();
-		err = netlink_dump_start(net->genl_sock, skb, nlh,
-					 ops->dumpit, ops->done);
-		genl_lock();
-		return err;
+	if (ops->dumpit) {
+		if (nlh->nlmsg_flags & NLM_F_DUMP) {
+			genl_unlock();
+			err = netlink_dump_start(net->genl_sock, skb, nlh,
+						 ops->dumpit, ops->done);
+			genl_lock();
+			return err;
+		}
 	}
 
 	if (ops->doit == NULL)
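
The control flow of the v2 patch can be modelled in user space as follows.
This is a simplified sketch: struct ops, dispatch() and the two handlers
are stand-ins invented for illustration, and only the flag values are the
ones from <linux/netlink.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag values as defined in <linux/netlink.h>. */
#define NLM_F_ROOT	0x100
#define NLM_F_MATCH	0x200
#define NLM_F_DUMP	(NLM_F_ROOT | NLM_F_MATCH)

/* Simplified stand-in for struct genl_ops: only the two handlers. */
struct ops {
	int (*doit)(void);
	int (*dumpit)(void);
};

enum { RAN_DOIT = 1, RAN_DUMPIT = 2 };

static int handle_doit(void)   { return RAN_DOIT; }
static int handle_dumpit(void) { return RAN_DUMPIT; }

/* Models the v2 logic: the dump flags matter only if dumpit exists. */
static int dispatch(const struct ops *ops, unsigned int nlmsg_flags)
{
	if (ops->dumpit) {
		if (nlmsg_flags & NLM_F_DUMP)
			return ops->dumpit();
	}
	/* No dump started: fall through to the normal request path. */
	if (ops->doit == NULL)
		return -EOPNOTSUPP;
	return ops->doit();
}
```

In this model a message type without a dumpit handler no longer gets
-EOPNOTSUPP just because its flags overlap NLM_F_DUMP, and a type with
both handlers still reaches doit when the dump flags are absent.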

^ permalink raw reply

* Re: [regression] 2.6.37+ commit 0363466866d9.... breaks tcp ipv6
From: Jesse Gross @ 2011-01-18 20:06 UTC (permalink / raw)
  To: Hans de Bruin; +Cc: LKML, netdev
In-Reply-To: <4D35EF64.1040906@xmsnet.nl>

On Tue, Jan 18, 2011 at 11:52 AM, Hans de Bruin <jmdebruin@xmsnet.nl> wrote:
> On 01/16/2011 09:24 PM, Hans de Bruin wrote:
>>
>> After last night's compile I lost the ability to connect via ssh and
>> http over IPv6. Outgoing connections stop at syn_sent; connections to my
>> machine end in syn_recv. ping6 still works.
>>
>
> The bisect ended in:
>
> 0363466866d901fbc658f4e63dd61e7cc93dd0af is the first bad commit
> commit 0363466866d901fbc658f4e63dd61e7cc93dd0af
> Author: Jesse Gross <jesse@nicira.com>
> Date:   Sun Jan 9 06:23:35 2011 +0000
>
>    net offloading: Convert checksums to use centrally computed features.
>
>    In order to compute the features for other offloads (primarily
>    scatter/gather), we need to first check the ability of the NIC to
>    offload the checksum for the packet.  Since we have already computed
>    this, we can directly use the result instead of figuring it out
>    again.
>
>    Signed-off-by: Jesse Gross <jesse@nicira.com>
>    Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
> ssh ::1  still works. And since dns still works I guess udp is not affected.
> My nic is a:
>
> 09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5752 Gigabit
> Ethernet PCI Express (rev 02)

Are you using vlans?  If so, can you please test this patch?
http://patchwork.ozlabs.org/patch/79264/

^ permalink raw reply

* Re: Flow Control and Port Mirroring Revisited
From: Rick Jones @ 2011-01-18 19:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Simon Horman, Jesse Gross, Eric Dumazet, Rusty Russell,
	virtualization, dev, virtualization, netdev, kvm
In-Reply-To: <20110117102655.GH23479@redhat.com>

> So it won't be all that simple to implement well, and before we try,
> I'd like to know whether there are applications that are helped
> by it. For example, we could try to measure latency at various
> pps and see whether the backpressure helps. netperf has -b, -w
> flags which might help these measurements.

Those options are enabled when one adds --enable-burst to the pre-compilation 
./configure  of netperf (one doesn't have to recompile netserver).  However, if 
one is also looking at latency statistics via the -j option in the top-of-trunk, 
or simply at the histogram with --enable-histogram on the ./configure and a 
verbosity level of 2 (global -v 2) then one wants the very top of trunk netperf 
from:

http://www.netperf.org/svn/netperf2/trunk

to get the recently added support for accurate (netperf level) RTT measurements 
on burst-mode request/response tests.

happy benchmarking,

rick jones

PS - the enhanced latency statistics from -j are only available in the "omni" 
version of the TCP_RR test.  To get that add a --enable-omni to the ./configure 
- and in this case both netperf and netserver have to be recompiled.  For very 
basic output one can peruse the output of:

src/netperf -t omni -- -O /?

and then pick those outputs of interest and put them into an output selection 
file which one then passes to either (test-specific) -o, -O or -k to get CSV, 
"Human" or keyval output respectively.  E.G.

raj@tardy:~/netperf2_trunk$ cat foo
THROUGHPUT,THROUGHPUT_UNITS
RT_LATENCY,MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY
P50_LATENCY,P90_LATENCY,P99_LATENCY,STDDEV_LATENCY

when foo is passed to -o one will get those all on one line of CSV.  To -O one 
gets three lines of more netperf-classic-like "human" readable output, and when 
one passes that to -k one gets a string of keyval output a la:

raj@tardy:~/netperf2_trunk$ src/netperf -t omni -j -v 2 -- -r 1 -d rr -k foo
OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 
AF_INET : histogram
THROUGHPUT=29454.12
THROUGHPUT_UNITS=Trans/s
RT_LATENCY=33.951
MIN_LATENCY=19
MEAN_LATENCY=32.00
MAX_LATENCY=126
P50_LATENCY=32
P90_LATENCY=38
P99_LATENCY=41
STDDEV_LATENCY=5.46

Histogram of request/response times
UNIT_USEC     :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
TEN_USEC      :    0: 3553: 45244: 237790: 7859:   86:   10:    3:    0:    0
HUNDRED_USEC  :    0:    2:    0:    0:    0:    0:    0:    0:    0:    0
UNIT_MSEC     :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
TEN_MSEC      :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
HUNDRED_MSEC  :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
UNIT_SEC      :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
TEN_SEC       :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
 >100_SECS: 0
HIST_TOTAL:      294547


^ permalink raw reply

* Re: [PATCH] CHOKe flow scheduler (0.9)
From: Eric Dumazet @ 2011-01-18 19:34 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Patrick McHardy, David Miller, netdev
In-Reply-To: <20110118110634.7386c757@nehalam>

On Tuesday 18 January 2011 at 11:06 -0800, Stephen Hemminger wrote:

> +static bool choke_match_flow(struct sk_buff *skb1, struct sk_buff *skb2)
> +{
> +	int off1, off2, poff;
> +	u8 ip_proto;
> +	u32 ihl;
> +
> +	if (skb1->protocol != skb2->protocol)
> +		return false;
> +
> +	off1 = skb_network_offset(skb1);
> +	off2 = skb_network_offset(skb2);

> +
> +	switch (skb1->protocol) {
> +	case __constant_htons(ETH_P_IP): {
> +		struct iphdr *ip1, *ip2;
> +
> +		if (!pskb_may_pull(skb1, sizeof(struct iphdr) + off1))
> +			return false;
> +
	pskb_network_may_pull() might be cleaner


> +		ip1 = (struct iphdr *) (skb1->data + off1);
	ip1 = ip_hdr(skb1);

> +		if (ip1->frag_off & htons(IP_MF | IP_OFFSET))
> +			return false;	/* don't compare fragments */
> +

Hmm, we should compare fragments if possible.

saddr/daddr are available, not the ports.

> +		if (!pskb_may_pull(skb2, sizeof(struct iphdr) + off2))
> +			return false;
> +
> +		ip2 = (struct iphdr *) (skb2->data + off2);
> +		if (ip2->frag_off & htons(IP_MF | IP_OFFSET))
> +			return false;
> +
> +		if (ip1->protocol != ip2->protocol ||
> +		    ip1->saddr != ip2->saddr || ip1->daddr != ip2->daddr)
> +			return false;
> +


	What happens if ip1->ihl != ip2->ihl  here ?

Here I would add the fragment test :

	if ((ip1->frag_off | ip2->frag_off) & htons(IP_MF | IP_OFFSET))
		return true;
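
As a user-space illustration of that mask test (the helper name is made
up; IP_MF and IP_OFFSET carry the values the kernel uses in <net/ip.h>):
OR-ing the two frag_off fields and masking once detects a fragment on
either side, while the DF bit alone does not count.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as in the kernel's <net/ip.h>. */
#define IP_MF		0x2000	/* "more fragments" flag */
#define IP_OFFSET	0x1FFF	/* fragment offset mask */

/* frag_off values are in network byte order, as in struct iphdr. */
static bool either_is_fragment(uint16_t frag_off1, uint16_t frag_off2)
{
	return ((frag_off1 | frag_off2) & htons(IP_MF | IP_OFFSET)) != 0;
}
```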

> +		ip_proto = ip1->protocol;
> +		ihl = ip1->ihl;
> +		break;
> +	}
> +
> +	case __constant_htons(ETH_P_IPV6): {
> +		struct ipv6hdr *ip1, *ip2;
> +
> +		if (!pskb_may_pull(skb1, sizeof(struct ipv6hdr *) + off1))
> +			return false;

ouch... sizeof(struct ipv6hdr *) is not what you want, but
sizeof(struct ipv6hdr) is.

So just use :

	pskb_network_may_pull(skb1, sizeof(*ip1))
> +
> +		if (!pskb_may_pull(skb2, sizeof(struct ipv6hdr *) + off2))
> +			return false;
> +
> +		ip1 = (struct ipv6hdr *) (skb1->data + off1);
	ip1 = ipv6_hdr(skb1);
> +		ip2 = (struct ipv6hdr *) (skb2->data + off2);
> +
> +		if (ip1->nexthdr != ip2->nexthdr ||
> +		    ipv6_addr_cmp(&ip1->saddr, &ip2->saddr) != 0 ||
> +		    ipv6_addr_cmp(&ip1->daddr, &ip2->daddr))
> +			return false;
> +
> +		ihl = (40 >> 2);
> +		ip_proto = ip1->nexthdr;
> +		break;
> +	}
> +
> +	default:
> +		return false;
> +	}
> +
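
On the sizeof point in the IPv6 branch, a small stand-alone sketch shows
the difference. The struct is a stand-in with the same 40-byte size as an
IPv6 header, not the kernel definition, and the function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in with the same 40-byte size as a fixed IPv6 header. */
struct ipv6hdr_sketch {
	uint8_t  vtc_flow[4];	/* version, traffic class, flow label */
	uint16_t payload_len;
	uint8_t  nexthdr;
	uint8_t  hop_limit;
	uint8_t  saddr[16];
	uint8_t  daddr[16];
};

/* Bytes that must be pulled: the header itself, via sizeof(*ptr). */
static size_t pull_len_fixed(void)
{
	const struct ipv6hdr_sketch *ip1 = NULL;

	return sizeof(*ip1);	/* sizeof does not evaluate its operand */
}

/* What the quoted hunk computes: the size of a pointer. */
static size_t pull_len_buggy(void)
{
	return sizeof(struct ipv6hdr_sketch *);
}
```

With the pointer size, only 4 or 8 bytes beyond the network offset would
be guaranteed linear, which is too few to cover the addresses that are
compared afterwards.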




^ permalink raw reply

* Re: inbound connection problems when "netlink: test for all flags of the NLM_F_DUMP composite" commit applied
From: Alessandro Suardi @ 2011-01-18 19:26 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Jan Engelhardt, jamal, David Miller, pablo, arthur.marsh,
	eric.dumazet, netdev
In-Reply-To: <20110118184730.GD4202@del.dom.local>

On Tue, Jan 18, 2011 at 7:47 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> On Tue, Jan 18, 2011 at 07:28:52PM +0100, Jarek Poplawski wrote:
>> On Tue, Jan 18, 2011 at 07:24:40PM +0100, Jan Engelhardt wrote:
>> >
>> > On Tuesday 2011-01-18 19:10, Alessandro Suardi wrote:
>> > >On Tue, Jan 18, 2011 at 6:23 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
>> > >>
>> > >> NLM_F_DUMP flags should be applied to GET requests only, eg. rtnetlink
>> > >> tests message type to verify this. Since genetlink can't do the same
>> > >> use "practical" test for ops->dumpit (assuming NEW request won't be
>> > >> mixed with GET).
>> ...
>> > >2.6.37-git18 + netlink revert + this patch
>> > > - fixes Avahi
>> > > - breaks acpid
>> > >Starting acpi daemon: RTNETLINK1 answers: Operation not supported
>> > >acpid: error talking to the kernel via netlink
>> >
>> > Deducing from that, it is a GET-like request that was sent by acpid,
>> > and the message type is one that has both a dumpit and a doit function.
>> > So if EOPNOTSUPP now occurs on all message types that have both dumpit
>> > and doit, you should have broken a lot more than just acpid.
>>
>> Right, we need something better here.
>
> On the other hand, until there is something better, we might try to
> fix it at least for "normal" dumpit cases?
>
> Alessandro, could you try (with the netlink revert)?
>
> Thanks,
> Jarek P.
>
> ---
> diff -Nurp a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> --- a/net/netlink/genetlink.c   2011-01-18 16:58:16.000000000 +0100
> +++ b/net/netlink/genetlink.c   2011-01-18 19:36:25.000000000 +0100
> @@ -519,15 +519,14 @@ static int genl_rcv_msg(struct sk_buff *
>            security_netlink_recv(skb, CAP_NET_ADMIN))
>                return -EPERM;
>
> -       if (nlh->nlmsg_flags & NLM_F_DUMP) {
> -               if (ops->dumpit == NULL)
> -                       return -EOPNOTSUPP;
> -
> -               genl_unlock();
> -               err = netlink_dump_start(net->genl_sock, skb, nlh,
> -                                        ops->dumpit, ops->done);
> -               genl_lock();
> -               return err;
> +       if (ops->dumpit) {
> +               if (nlh->nlmsg_flags & NLM_F_DUMP) {
> +                       genl_unlock();
> +                       err = netlink_dump_start(net->genl_sock, skb, nlh,
> +                                                ops->dumpit, ops->done);
> +                       genl_lock();
> +                       return err;
> +               }
>        }
>
>        if (ops->doit == NULL)
>

Sure enough :)


2.6.37-git18 + netlink revert + this 2nd attempt

 appears to be good for me - both Avahi and acpid start up fine and I
 can't see any other program misbehaving.


Thanks, ciao,

--alessandro

 "There's always a siren singing you to shipwreck"

   (Radiohead, "There There")

^ permalink raw reply

* Re: rps testing questions
From: Rick Jones @ 2011-01-18 19:10 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: mi wake, netdev
In-Reply-To: <1295375676.3537.83.camel@bwh-desktop>

Ben Hutchings wrote:
> On Tue, 2011-01-18 at 10:23 -0800, Rick Jones wrote:
> 
>>Ben Hutchings wrote:
>>
>>>On Mon, 2011-01-17 at 17:43 +0800, mi wake wrote:
> 
> [...]
> 
>>>>With ab and tbench I also see lower tps with RPS enabled, although CPU
>>>>usage is higher. With RPS enabled, softirqs are balanced across the
>>>>CPUs.
>>>>
>>>>Is there something wrong with my test?
>>>
>>>
>>>In addition to what Eric said, check the interrupt moderation settings
>>>(ethtool -c/-C options).  One-way latency for a single request/response
>>>test will be at least the interrupt moderation value.
>>>
>>>I haven't tested RPS by itself (Solarflare NICs have plenty of hardware
>>>queues) so I don't know whether it can improve latency.  However, RFS
>>>certainly does when there are many flows.
>>
>>Is there actually an expectation that either RPS or RFS would improve *latency*? 
>>  Multiple-stream throughput certainly, but with the additional work done to 
>>spread things around, I wouldn't expect either to improve latency.
> 
> 
> Yes, it seems to make a big improvement to latency when many flows are
> active. 

OK, you and I were using different definitions.  I was speaking to single-stream 
latency, but didn't say it explicitly (I may have subconsciously thought it was 
implicit given the OP used a single instance of netperf :).

happy benchmarking,

rick jones

> Tom told me that one of his benchmarks was 200 * netperf TCP_RR
> in parallel, and I've seen over 40% reduction in latency for that. That
> said, allocating more RX queues might also help (sfc currently defaults
> to one per processor package rather than one per processor thread, due
> to concerns about CPU efficiency).
> 
> Ben.
> 


^ permalink raw reply

* Re: [PATCH] CHOKe flow scheduler (0.9)
From: Stephen Hemminger @ 2011-01-18 19:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, David Miller, netdev
In-Reply-To: <1295286851.3335.36.camel@edumazet-laptop>

On Mon, 17 Jan 2011 18:54:11 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Monday 17 January 2011 at 09:25 -0800, Stephen Hemminger wrote:
> 
> > I rolled in your changes. But there is one more change I want to make.
> > The existing flow match based on hash is vulnerable to side-channel DoS attack.
> > It is possible for a hostile flow to send packets that match the same
> > hash value which would effectively kill a targeted flow.
> > 
> > The solution is to match based on full source and destination, not hash value.
> > Still coding that up.
> 
> I see, but you only want to make this full test if (!q->filter_list)  ?
> 
> (or precisely only if skb_get_rxhash() was used to get the cookie )

This is what I am starting to retest. The code can probably be simplified
to avoid the may_pull() on the packet already in queue.

Subject: sched: CHOKe flow scheduler

CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
packet scheduler based on the Random Early Detection (RED) algorithm.

The core idea is:
  For every packet arrival:
  	Calculate Qave
	if (Qave < minth) 
	     Queue the new packet
	else 
	     Select randomly a packet from the queue 
	     if (both packets from same flow)
	     then Drop both the packets
	     else if (Qave > maxth)
	          Drop packet
	     else
	       	  Admit packet with probability p (same as RED)
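
The decision steps above can be sketched as a pure function. This is an
illustration of the pseudocode only, not the code in sch_choke.c: the
names are hypothetical, the thresholds are plain doubles, and the random
sample u is passed in so the result is deterministic.

```c
#include <assert.h>
#include <stdbool.h>

enum choke_verdict { CHOKE_ENQUEUE, CHOKE_DROP_BOTH, CHOKE_DROP_NEW };

/*
 * qave, minth, maxth: average queue length and the two RED thresholds.
 * same_flow: whether the randomly picked queued packet matches the new one.
 * p: admission probability in [0,1]; u: uniform sample in [0,1).
 */
static enum choke_verdict choke_decide(double qave, double minth, double maxth,
				       bool same_flow, double p, double u)
{
	if (qave < minth)
		return CHOKE_ENQUEUE;
	if (same_flow)
		return CHOKE_DROP_BOTH;	/* penalize the matching flow */
	if (qave > maxth)
		return CHOKE_DROP_NEW;
	return u < p ? CHOKE_ENQUEUE : CHOKE_DROP_NEW;
}
```

Note that the flow comparison happens only once the average queue length
has reached minth, so an uncongested queue pays no extra cost.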

See also:
  Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
   queue management scheme for approximating fair bandwidth allocation", 
  Proceedings of INFOCOM 2000, March 2000.

Help from:
     Eric Dumazet <eric.dumazet@gmail.com>
     Patrick McHardy <kaber@trash.net>

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
This version is based on net-next, and assumes Eric's patch for
corrected bstats is already applied.

0.9 incorporate patches from Patrick/Eric
    rework the peek_random and drop code to simplify and fix bug where
    random_N needs to be called with full length (including holes).

 include/linux/pkt_sched.h |   29 ++
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    1 
 net/sched/sch_choke.c     |  579 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 620 insertions(+)

--- a/net/sched/Kconfig	2011-01-14 10:43:19.062537393 -0800
+++ b/net/sched/Kconfig	2011-01-16 13:42:45.938919517 -0800
@@ -205,6 +205,17 @@ config NET_SCH_DRR
 
 	  If unsure, say N.
 
+config NET_SCH_CHOKE
+	tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
+	help
+	  Say Y here if you want to use the CHOKe packet scheduler (CHOose
+	  and Keep for responsive flows, CHOose and Kill for unresponsive
+	  flows). This is a variation of RED which tries to penalize flows
+	  that monopolize the queue.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_choke.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_CLS_ACT
--- a/net/sched/Makefile	2011-01-14 10:43:19.072538228 -0800
+++ b/net/sched/Makefile	2011-01-16 13:42:45.946919793 -0800
@@ -32,6 +32,7 @@ obj-$(CONFIG_NET_SCH_MULTIQ)	+= sch_mult
 obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
 obj-$(CONFIG_NET_SCH_DRR)	+= sch_drr.o
+obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
 obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/net/sched/sch_choke.c	2011-01-17 09:18:42.271211633 -0800
@@ -0,0 +1,686 @@
+/*
+ * net/sched/sch_choke.c	CHOKE scheduler
+ *
+ * Copyright (c) 2011 Stephen Hemminger <shemminger@vyatta.com>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/reciprocal_div.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <net/red.h>
+#include <linux/ip.h>
+#include <net/ip.h>
+#include <linux/ipv6.h>
+#include <net/ipv6.h>
+
+/*
+   CHOKe stateless AQM for fair bandwidth allocation
+   =================================================
+
+   CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
+   unresponsive flows) is a variant of RED that penalizes misbehaving flows but
+   maintains no flow state. The difference from RED is an additional step
+   during the enqueuing process. If average queue size is over the
+   low threshold (qmin), a packet is chosen at random from the queue.
+   If both the new and chosen packet are from the same flow, both
+   are dropped. Unlike RED, CHOKe is not really a "classful" qdisc because it
+   needs to access packets in queue randomly. It has a minimal class
+   interface to allow overriding the builtin flow classifier with
+   filters.
+
+   Source:
+   R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless
+   Active Queue Management Scheme for Approximating Fair Bandwidth Allocation",
+   IEEE INFOCOM, 2000.
+
+   A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial
+   Characteristics", IEEE/ACM Transactions on Networking, 2004
+
+ */
+
+/* Upper bound on size of sk_buff table (packets) */
+#define CHOKE_MAX_QUEUE	(128*1024 - 1)
+
+struct choke_sched_data {
+/* Parameters */
+	u32		 limit;
+	unsigned char	 flags;
+
+	struct red_parms parms;
+
+/* Variables */
+	struct tcf_proto *filter_list;
+	struct {
+		u32	prob_drop;	/* Early probability drops */
+		u32	prob_mark;	/* Early probability marks */
+		u32	forced_drop;	/* Forced drops, qavg > max_thresh */
+		u32	forced_mark;	/* Forced marks, qavg > max_thresh */
+		u32	pdrop;          /* Drops due to queue limits */
+		u32	other;          /* Drops due to drop() calls */
+		u32	matched;	/* Drops due to flow match */
+	} stats;
+
+	unsigned int	 head;
+	unsigned int	 tail;
+
+	unsigned int	 tab_mask; /* size - 1 */
+
+	struct sk_buff **tab;
+};
+
+/* deliver a random number between 0 and N - 1 */
+static u32 random_N(unsigned int N)
+{
+	return reciprocal_divide(random32(), N);
+}
+
+/* number of elements in queue including holes */
+static unsigned int choke_len(const struct choke_sched_data *q)
+{
+	return (q->tail - q->head) & q->tab_mask;
+}
+
+/* Is ECN parameter configured */
+static int use_ecn(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_ECN;
+}
+
+/* Should packets over max just be dropped (versus marked) */
+static int use_harddrop(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_HARDDROP;
+}
+
+/* Move head pointer forward to skip over holes */
+static void choke_zap_head_holes(struct choke_sched_data *q)
+{
+	do {
+		q->head = (q->head + 1) & q->tab_mask;
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->head] == NULL);
+}
+
+/* Move tail pointer backwards to reuse holes */
+static void choke_zap_tail_holes(struct choke_sched_data *q)
+{
+	do {
+		q->tail = (q->tail - 1) & q->tab_mask;
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->tail] == NULL);
+}
+
+/* Drop packet from queue array by creating a "hole" */
+static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb = q->tab[idx];
+
+	q->tab[idx] = NULL;
+
+	if (idx == q->head)
+		choke_zap_head_holes(q);
+	if (idx == q->tail)
+		choke_zap_tail_holes(q);
+
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	qdisc_drop(skb, sch);
+	qdisc_tree_decrease_qlen(sch, 1);
+	--sch->q.qlen;
+}
+
+/*
+ * Compare flow of two packets
+ *  Returns true only if source and destination address and port match.
+ *          false for special cases
+ */
+static bool choke_match_flow(struct sk_buff *skb1, struct sk_buff *skb2)
+{
+	int off1, off2, poff;
+	u8 ip_proto;
+	u32 ihl;
+
+	if (skb1->protocol != skb2->protocol)
+		return false;
+
+	off1 = skb_network_offset(skb1);
+	off2 = skb_network_offset(skb2);
+
+	switch (skb1->protocol) {
+	case __constant_htons(ETH_P_IP): {
+		struct iphdr *ip1, *ip2;
+
+		if (!pskb_may_pull(skb1, sizeof(struct iphdr) + off1))
+			return false;
+
+		ip1 = (struct iphdr *) (skb1->data + off1);
+		if (ip1->frag_off & htons(IP_MF | IP_OFFSET))
+			return false;	/* don't compare fragments */
+
+		if (!pskb_may_pull(skb2, sizeof(struct iphdr) + off2))
+			return false;
+
+		ip2 = (struct iphdr *) (skb2->data + off2);
+		if (ip2->frag_off & htons(IP_MF | IP_OFFSET))
+			return false;
+
+		if (ip1->protocol != ip2->protocol ||
+		    ip1->saddr != ip2->saddr || ip1->daddr != ip2->daddr)
+			return false;
+
+		ip_proto = ip1->protocol;
+		ihl = ip1->ihl;
+		break;
+	}
+
+	case __constant_htons(ETH_P_IPV6): {
+		struct ipv6hdr *ip1, *ip2;
+
+		if (!pskb_may_pull(skb1, sizeof(struct ipv6hdr) + off1))
+			return false;
+
+		if (!pskb_may_pull(skb2, sizeof(struct ipv6hdr) + off2))
+			return false;
+
+		ip1 = (struct ipv6hdr *) (skb1->data + off1);
+		ip2 = (struct ipv6hdr *) (skb2->data + off2);
+
+		if (ip1->nexthdr != ip2->nexthdr ||
+		    ipv6_addr_cmp(&ip1->saddr, &ip2->saddr) != 0 ||
+		    ipv6_addr_cmp(&ip1->daddr, &ip2->daddr))
+			return false;
+
+		ihl = (40 >> 2);
+		ip_proto = ip1->nexthdr;
+		break;
+	}
+
+	default:
+		return false;
+	}
+
+	poff = proto_ports_offset(ip_proto);
+	if (poff >= 0) {
+		u32 *ports1, *ports2;
+
+		off1 += ihl * 4 + poff;
+		if (!pskb_may_pull(skb1, off1 + 4))
+			return false;
+
+		off2 += ihl * 4 + poff;
+		if (!pskb_may_pull(skb2, off2 + 4))
+			return false;
+
+		ports1 = (__force u32 *) (skb1->data + off1);
+		ports2 = (__force u32 *) (skb2->data + off2);
+
+		return *ports1 == *ports2;
+	}
+
+	return true;
+}
+
+static inline void choke_set_classid(struct sk_buff *skb, u16 classid)
+{
+	*(unsigned int *)(qdisc_skb_cb(skb)->data) = classid;
+}
+
+static u16 choke_get_classid(const struct sk_buff *skb)
+{
+	return *(unsigned int *)(qdisc_skb_cb(skb)->data);
+}
+
+/*
+ * Classify flow using either:
+ *  1. pre-existing classification result in skb
+ *  2. fast internal classification
+ *  3. use TC filter based classification
+ */
+static bool choke_classify(struct sk_buff *skb,
+			   struct Qdisc *sch, int *qerr)
+
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tcf_result res;
+	int result;
+
+	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return false;
+		}
+#endif
+		choke_set_classid(skb, TC_H_MIN(res.classid));
+		return true;
+	}
+
+	return false;
+}
+
+/* Select a packet at random from the queue */
+static struct sk_buff *choke_peek_random(const struct choke_sched_data *q,
+					 unsigned int *pidx)
+{
+	struct sk_buff *skb;
+	int retrys = 3;
+
+	do {
+		*pidx = (q->head + random_N(choke_len(q))) & q->tab_mask;
+		skb = q->tab[*pidx];
+		if (skb)
+			return skb;
+	} while (--retrys > 0);
+
+	/* Queue has lots of holes; use the head, which is known to exist.
+	 * Note: result can still be NULL if q->head == q->tail
+	 */
+	return q->tab[*pidx = q->head];
+}
+
+/* Select a packet at random from the queue and compare flow */
+static bool choke_match_random(const struct choke_sched_data *q,
+			       struct sk_buff *nskb,
+			       unsigned int *pidx)
+{
+	struct sk_buff *oskb;
+
+	if (q->head == q->tail)
+		return false;
+
+	oskb = choke_peek_random(q, pidx);
+	if (q->filter_list)
+		return choke_get_classid(nskb) == choke_get_classid(oskb);
+
+
+	return choke_match_flow(oskb, nskb);
+}
+
+static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct red_parms *p = &q->parms;
+	int uninitialized_var(ret);
+
+	/* If using external classifiers, get result and record it. */
+	if (q->filter_list &&
+	    !choke_classify(skb, sch, &ret)) {
+		/* Packet was eaten by filter */
+		if (ret & __NET_XMIT_BYPASS)
+			sch->qstats.drops++;
+		kfree_skb(skb);
+		return ret;
+	}
+
+	/* Compute average queue usage (see RED) */
+	p->qavg = red_calc_qavg(p, sch->q.qlen);
+	if (red_is_idling(p))
+		red_end_of_idle_period(p);
+
+	/* Is queue small? */
+	if (p->qavg <= p->qth_min)
+		p->qcount = -1;
+	else {
+		unsigned int idx;
+
+		/* Draw a packet at random from queue and compare flow */
+		if (choke_match_random(q, skb, &idx)) {
+			q->stats.matched++;
+			choke_drop_by_idx(sch, idx);
+			goto congestion_drop;
+		}
+
+		/* Queue is large, always mark/drop */
+		if (p->qavg > p->qth_max) {
+			p->qcount = -1;
+
+			sch->qstats.overlimits++;
+			if (use_harddrop(q) || !use_ecn(q) ||
+			    !INET_ECN_set_ce(skb)) {
+				q->stats.forced_drop++;
+				goto congestion_drop;
+			}
+
+			q->stats.forced_mark++;
+		} else if (++p->qcount) {
+			if (red_mark_probability(p, p->qavg)) {
+				p->qcount = 0;
+				p->qR = red_random(p);
+
+				sch->qstats.overlimits++;
+				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
+					q->stats.prob_drop++;
+					goto congestion_drop;
+				}
+
+				q->stats.prob_mark++;
+			}
+		} else
+			p->qR = red_random(p);
+	}
+
+	/* Admit new packet */
+	if (sch->q.qlen < q->limit) {
+		q->tab[q->tail] = skb;
+		q->tail = (q->tail + 1) & q->tab_mask;
+		++sch->q.qlen;
+		sch->qstats.backlog += qdisc_pkt_len(skb);
+		return NET_XMIT_SUCCESS;
+	}
+
+	q->stats.pdrop++;
+	sch->qstats.drops++;
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+
+ congestion_drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+}
+
+static struct sk_buff *choke_dequeue(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+
+	if (q->head == q->tail) {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+		return NULL;
+	}
+
+	skb = q->tab[q->head];
+	q->tab[q->head] = NULL;
+	choke_zap_head_holes(q);
+	--sch->q.qlen;
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	qdisc_bstats_update(sch, skb);
+
+	return skb;
+}
+
+static unsigned int choke_drop(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	unsigned int len;
+
+	len = qdisc_queue_drop(sch);
+	if (len > 0)
+		q->stats.other++;
+	else {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+	}
+
+	return len;
+}
+
+static void choke_reset(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	red_restart(&q->parms);
+}
+
+static const struct nla_policy choke_policy[TCA_CHOKE_MAX + 1] = {
+	[TCA_CHOKE_PARMS]	= { .len = sizeof(struct tc_red_qopt) },
+	[TCA_CHOKE_STAB]	= { .len = RED_STAB_SIZE },
+};
+
+
+static void choke_free(void *addr)
+{
+	if (addr) {
+		if (is_vmalloc_addr(addr))
+			vfree(addr);
+		else
+			kfree(addr);
+	}
+}
+
+static int choke_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_CHOKE_MAX + 1];
+	const struct tc_red_qopt *ctl;
+	int err;
+	struct sk_buff **old = NULL;
+	unsigned int mask;
+
+	if (opt == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CHOKE_MAX, opt, choke_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_CHOKE_PARMS] == NULL ||
+	    tb[TCA_CHOKE_STAB] == NULL)
+		return -EINVAL;
+
+	ctl = nla_data(tb[TCA_CHOKE_PARMS]);
+
+	if (ctl->limit > CHOKE_MAX_QUEUE)
+		return -EINVAL;
+
+	mask = roundup_pow_of_two(ctl->limit + 1) - 1;
+	if (mask != q->tab_mask) {
+		struct sk_buff **ntab;
+
+		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *), GFP_KERNEL);
+		if (!ntab)
+			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		if (!ntab)
+			return -ENOMEM;
+
+		sch_tree_lock(sch);
+		old = q->tab;
+		if (old) {
+			unsigned int oqlen = sch->q.qlen, tail = 0;
+
+			while (q->head != q->tail) {
+				struct sk_buff *skb = q->tab[q->head];
+
+				q->head = (q->head + 1) & q->tab_mask;
+				if (!skb)
+					continue;
+				if (tail < mask) {
+					ntab[tail++] = skb;
+					continue;
+				}
+				sch->qstats.backlog -= qdisc_pkt_len(skb);
+				--sch->q.qlen;
+				qdisc_drop(skb, sch);
+			}
+			qdisc_tree_decrease_qlen(sch, oqlen - sch->q.qlen);
+			q->head = 0;
+			q->tail = tail;
+		}
+
+		q->tab_mask = mask;
+		q->tab = ntab;
+	} else
+		sch_tree_lock(sch);
+
+	q->flags = ctl->flags;
+	q->limit = ctl->limit;
+
+	red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog,
+		      ctl->Plog, ctl->Scell_log,
+		      nla_data(tb[TCA_CHOKE_STAB]));
+
+	if (q->head == q->tail)
+		red_end_of_idle_period(&q->parms);
+
+	sch_tree_unlock(sch);
+	choke_free(old);
+	return 0;
+}
+
+static int choke_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	return choke_change(sch, opt);
+}
+
+static int choke_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts = NULL;
+	struct tc_red_qopt opt = {
+		.limit		= q->limit,
+		.flags		= q->flags,
+		.qth_min	= q->parms.qth_min >> q->parms.Wlog,
+		.qth_max	= q->parms.qth_max >> q->parms.Wlog,
+		.Wlog		= q->parms.Wlog,
+		.Plog		= q->parms.Plog,
+		.Scell_log	= q->parms.Scell_log,
+	};
+
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	if (opts == NULL)
+		goto nla_put_failure;
+
+	NLA_PUT(skb, TCA_CHOKE_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tc_choke_xstats st = {
+		.early	= q->stats.prob_drop + q->stats.forced_drop,
+		.marked	= q->stats.prob_mark + q->stats.forced_mark,
+		.pdrop	= q->stats.pdrop,
+		.other	= q->stats.other,
+		.matched = q->stats.matched,
+	};
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static void choke_destroy(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	choke_free(q->tab);
+}
+
+static struct Qdisc *choke_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	return NULL;
+}
+
+static unsigned long choke_get(struct Qdisc *sch, u32 classid)
+{
+	return 0;
+}
+
+static void choke_put(struct Qdisc *q, unsigned long cl)
+{
+}
+
+static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
+				u32 classid)
+{
+	return 0;
+}
+
+static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static int choke_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	return 0;
+}
+
+static void choke_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	if (!arg->stop) {
+		if (arg->fn(sch, 1, arg) < 0) {
+			arg->stop = 1;
+			return;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops choke_class_ops = {
+	.leaf		=	choke_leaf,
+	.get		=	choke_get,
+	.put		=	choke_put,
+	.tcf_chain	=	choke_find_tcf,
+	.bind_tcf	=	choke_bind,
+	.unbind_tcf	=	choke_put,
+	.dump		=	choke_dump_class,
+	.walk		=	choke_walk,
+};
+
+static struct sk_buff *choke_peek_head(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	return (q->head != q->tail) ? q->tab[q->head] : NULL;
+}
+
+static struct Qdisc_ops choke_qdisc_ops __read_mostly = {
+	.id		=	"choke",
+	.priv_size	=	sizeof(struct choke_sched_data),
+
+	.enqueue	=	choke_enqueue,
+	.dequeue	=	choke_dequeue,
+	.peek		=	choke_peek_head,
+	.drop		=	choke_drop,
+	.init		=	choke_init,
+	.destroy	=	choke_destroy,
+	.reset		=	choke_reset,
+	.change		=	choke_change,
+	.dump		=	choke_dump,
+	.dump_stats	=	choke_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init choke_module_init(void)
+{
+	return register_qdisc(&choke_qdisc_ops);
+}
+
+static void __exit choke_module_exit(void)
+{
+	unregister_qdisc(&choke_qdisc_ops);
+}
+
+module_init(choke_module_init)
+module_exit(choke_module_exit)
+
+MODULE_LICENSE("GPL");
--- a/include/linux/pkt_sched.h	2011-01-14 10:43:19.092539898 -0800
+++ b/include/linux/pkt_sched.h	2011-01-16 13:42:45.926919103 -0800
@@ -247,6 +247,35 @@ struct tc_gred_sopt {
 	__u16		pad1;
 };
 
+/* CHOKe section */
+
+enum {
+	TCA_CHOKE_UNSPEC,
+	TCA_CHOKE_PARMS,
+	TCA_CHOKE_STAB,
+	__TCA_CHOKE_MAX,
+};
+
+#define TCA_CHOKE_MAX (__TCA_CHOKE_MAX - 1)
+
+struct tc_choke_qopt {
+	__u32		limit;		/* Hard queue length (packets)	*/
+	__u32		qth_min;	/* Min average threshold (packets) */
+	__u32		qth_max;	/* Max average threshold (packets) */
+	unsigned char   Wlog;		/* log(W)		*/
+	unsigned char   Plog;		/* log(P_max/(qth_max-qth_min))	*/
+	unsigned char   Scell_log;	/* cell size for idle damping */
+	unsigned char	flags;		/* see RED flags */
+};
+
+struct tc_choke_xstats {
+	__u32		early;          /* Early drops */
+	__u32		pdrop;          /* Drops due to queue limits */
+	__u32		other;          /* Drops due to drop() calls */
+	__u32		marked;         /* Marked packets */
+	__u32		matched;	/* Drops due to flow match */
+};
+
 /* HTB section */
 #define TC_HTB_NUMPRIO		8
 #define TC_HTB_MAXDEPTH		8
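
[Editorial sketch, not part of the patch.] The queue in sch_choke.c is a
power-of-two ring of pointers in which a dropped packet leaves a NULL "hole",
so the length computed from head and tail counts holes as well. The user-space
sketch below illustrates that idea only. The names struct ring, ring_push,
ring_pop and ring_drop_by_idx, and the fixed TAB_SIZE, are assumptions made
for illustration; the patch sizes the table from the configured limit and
stores sk_buff pointers.

```c
#include <assert.h>
#include <stddef.h>

#define TAB_SIZE 8u			/* hypothetical; must be a power of two */
#define TAB_MASK (TAB_SIZE - 1u)

struct ring {
	const void *tab[TAB_SIZE];	/* NULL marks a hole */
	unsigned int head;		/* oldest live entry */
	unsigned int tail;		/* next free slot */
};

/* Number of slots between head and tail, holes included. */
static unsigned int ring_len(const struct ring *r)
{
	return (r->tail - r->head) & TAB_MASK;
}

/* Append an item; one slot is always left free so head == tail means empty. */
static void ring_push(struct ring *r, const void *item)
{
	assert(item != NULL);
	assert(ring_len(r) < TAB_MASK);
	r->tab[r->tail] = item;
	r->tail = (r->tail + 1) & TAB_MASK;
}

/* Advance head past the slot just vacated and past any holes behind it. */
static void zap_head_holes(struct ring *r)
{
	do {
		r->head = (r->head + 1) & TAB_MASK;
		if (r->head == r->tail)
			break;
	} while (r->tab[r->head] == NULL);
}

/* Remove the entry at idx, leaving a hole unless it was the head. */
static const void *ring_drop_by_idx(struct ring *r, unsigned int idx)
{
	const void *item = r->tab[idx];

	r->tab[idx] = NULL;
	if (idx == r->head)
		zap_head_holes(r);
	return item;
}

/* Remove and return the oldest live entry, or NULL if the ring is empty. */
static const void *ring_pop(struct ring *r)
{
	const void *item;

	if (r->head == r->tail)
		return NULL;
	item = r->tab[r->head];
	r->tab[r->head] = NULL;
	zap_head_holes(r);
	return item;
}
```

Dropping from the middle is O(1) because nothing is shifted; the cost is that
a random pick can land on a hole, which is why choke_peek_random() retries a
few times before falling back to the head.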

^ permalink raw reply

* Re: [PATCH] vhost: rcu annotation fixup
From: Paul E. McKenney @ 2011-01-18 19:02 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20110118175500.GA6935@redhat.com>

On Tue, Jan 18, 2011 at 07:55:00PM +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 18, 2011 at 09:48:34AM -0800, Paul E. McKenney wrote:
> > On Tue, Jan 18, 2011 at 01:08:45PM +0200, Michael S. Tsirkin wrote:
> > > When built with rcu checks enabled, vhost triggers
> > > bogus warnings as vhost features are read without
> > > dev->mutex sometimes.
> > > Fixing it properly is not trivial as vhost.h does not
> > > know which lockdep classes it will be used under.
> > > Disable the warning by stubbing out the check for now.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  drivers/vhost/vhost.h |    4 +---
> > >  1 files changed, 1 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> > > index 2af44b7..2d03a31 100644
> > > --- a/drivers/vhost/vhost.h
> > > +++ b/drivers/vhost/vhost.h
> > > @@ -173,9 +173,7 @@ static inline int vhost_has_feature(struct vhost_dev *dev, int bit)
> > >  {
> > >  	unsigned acked_features;
> > > 
> > > -	acked_features =
> > > -		rcu_dereference_index_check(dev->acked_features,
> > > -					    lockdep_is_held(&dev->mutex));
> > > +	acked_features = rcu_dereference_index_check(dev->acked_features, 1);
> > 
> > Ouch!!!
> > 
> > Could you please at least add a comment?
> 
> Yes, OK.
> 
> > Alternatively, pass in the lock that is held and check for that?  Given
> > that this is a static inline, the compiler should be able to optimize
> > the argument away when !PROVE_RCU, correct?
> > 
> > 							Thanx, Paul
> 
> Hopefully, yes. We don't always have a lock: the idea was
> to create a lockdep for these cases. But we can't pass
> the pointer to that ...

I suppose you could pass a pointer to the lockdep map structure.
Not sure if this makes sense, but it would handle the situation.

Alternatively, create a helper function that checks the possibilities
and screams if none of them are in effect.

							Thanx, Paul

> > >  	return acked_features & (1 << bit);
> > >  }
> > > 
> > > -- 
> > > 1.7.3.2.91.g446ac
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at  http://www.tux.org/lkml/
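
[Editorial sketch, not from the thread.] Paul's second suggestion, a helper
that checks each legitimate access context and complains if none applies,
might look roughly like the user-space model below. Everything here is
hypothetical: struct fake_dev, the in_worker flag and the warnings counter
stand in for whatever contexts and reporting vhost would really use, and no
kernel API is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device; the flags model "which context
 * is the caller in", which the real code would ask lockdep about. */
struct fake_dev {
	bool mutex_held;	/* caller holds the device mutex */
	bool in_worker;		/* assumed second legitimate context */
	unsigned int acked_features;
};

static int warnings;		/* counts complaints, for demonstration */

/* True if at least one of the accepted contexts is in effect. */
static bool access_is_safe(const struct fake_dev *dev)
{
	return dev->mutex_held || dev->in_worker;
}

/* Read the feature word, complaining when no accepted context applies. */
static unsigned int read_features(const struct fake_dev *dev)
{
	if (!access_is_safe(dev)) {
		warnings++;
		fprintf(stderr, "suspicious read of acked_features\n");
	}
	return dev->acked_features;
}

/* Same shape as vhost_has_feature(), built on the checked read. */
static bool has_feature(const struct fake_dev *dev, int bit)
{
	assert(bit >= 0 && bit < 32);
	return read_features(dev) & (1u << bit);
}
```

The point of the design is that the list of accepted contexts lives in one
place, so the check stays meaningful instead of being stubbed out with a
constant 1.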

^ permalink raw reply

* Re: inbound connection problems when "netlink: test for all flags of the NLM_F_DUMP composite" commit applied
From: Jarek Poplawski @ 2011-01-18 18:47 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Alessandro Suardi, jamal, David Miller, pablo, arthur.marsh,
	eric.dumazet, netdev
In-Reply-To: <20110118182852.GC4202@del.dom.local>

On Tue, Jan 18, 2011 at 07:28:52PM +0100, Jarek Poplawski wrote:
> On Tue, Jan 18, 2011 at 07:24:40PM +0100, Jan Engelhardt wrote:
> > 
> > On Tuesday 2011-01-18 19:10, Alessandro Suardi wrote:
> > >On Tue, Jan 18, 2011 at 6:23 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> > >>
> > >> NLM_F_DUMP flags should be applied to GET requests only, eg. rtnetlink
> > >> tests message type to verify this. Since genetlink can't do the same
> > >> use "practical" test for ops->dumpit (assuming NEW request won't be
> > >> mixed with GET).
> ...
> > >2.6.37-git18 + netlink revert + this patch
> > > - fixes Avahi
> > > - breaks acpid
> > >Starting acpi daemon: RTNETLINK1 answers: Operation not supported
> > >acpid: error talking to the kernel via netlink
> > 
> > Deducing from that, it is a GET-like request that was sent by acpid, 
> > and the message type is one that has both a dumpit and a doit function.
> > So if EOPNOTSUPP now occurs on all message types that have both dumpit 
> > and doit, you should have broken a lot more than just acpid.
> 
> Right, we need something better here.

On the other hand, until there is something better, we might try to
fix it at least for "normal" dumpit cases?

Alessandro, could you try (with the netlink revert)?

Thanks,
Jarek P.

---
diff -Nurp a/net/netlink/genetlink.c b/net/netlink/genetlink.c
--- a/net/netlink/genetlink.c	2011-01-18 16:58:16.000000000 +0100
+++ b/net/netlink/genetlink.c	2011-01-18 19:36:25.000000000 +0100
@@ -519,15 +519,14 @@ static int genl_rcv_msg(struct sk_buff *
 	    security_netlink_recv(skb, CAP_NET_ADMIN))
 		return -EPERM;
 
-	if (nlh->nlmsg_flags & NLM_F_DUMP) {
-		if (ops->dumpit == NULL)
-			return -EOPNOTSUPP;
-
-		genl_unlock();
-		err = netlink_dump_start(net->genl_sock, skb, nlh,
-					 ops->dumpit, ops->done);
-		genl_lock();
-		return err;
+	if (ops->dumpit) {
+		if (nlh->nlmsg_flags & NLM_F_DUMP) {
+			genl_unlock();
+			err = netlink_dump_start(net->genl_sock, skb, nlh,
+						 ops->dumpit, ops->done);
+			genl_lock();
+			return err;
+		}
 	}
 
 	if (ops->doit == NULL)
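
[Editorial sketch, not part of the patch.] NLM_F_DUMP is a composite of two
bits, and those bits are reused with a different meaning on NEW requests, so
"any bit set" and "all bits set" give different answers. The user-space model
below uses the flag values from <linux/netlink.h>; the dispatch() function and
its enum are hypothetical and only mirror the order of checks in the patch
above, assuming the reverted (any-bit) test is in effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in <linux/netlink.h>. */
#define NLM_F_ROOT	0x100	/* GET modifier */
#define NLM_F_MATCH	0x200	/* GET modifier */
#define NLM_F_DUMP	(NLM_F_ROOT | NLM_F_MATCH)
#define NLM_F_EXCL	0x200	/* NEW modifier, same bit as NLM_F_MATCH */

/* Loose test: true if any bit of the composite is set. */
static bool dump_requested_loose(uint16_t flags)
{
	return (flags & NLM_F_DUMP) != 0;
}

/* Strict test: true only if every bit of the composite is set. */
static bool dump_requested_strict(uint16_t flags)
{
	return (flags & NLM_F_DUMP) == NLM_F_DUMP;
}

/* Hypothetical outcome of the dispatch, for illustration only. */
enum action { ACT_DUMP, ACT_DOIT, ACT_EOPNOTSUPP };

/* Mirrors the patched order: dump only if the op can dump, else try doit. */
static enum action dispatch(uint16_t flags, bool has_dumpit, bool has_doit)
{
	if (has_dumpit && dump_requested_loose(flags))
		return ACT_DUMP;
	return has_doit ? ACT_DOIT : ACT_EOPNOTSUPP;
}

/* A NEW request with NLM_F_EXCL looks like a dump to the loose test only. */
static void check_flag_tests(void)
{
	assert(dump_requested_loose(NLM_F_DUMP));
	assert(dump_requested_strict(NLM_F_DUMP));
	assert(dump_requested_loose(NLM_F_EXCL));
	assert(!dump_requested_strict(NLM_F_EXCL));
}
```

With this ordering, an op that has no dumpit handler never returns
-EOPNOTSUPP merely because one of the two bits happened to be set; the
request falls through to doit instead.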

^ permalink raw reply

* Re: rps testing questions
From: Ben Hutchings @ 2011-01-18 18:34 UTC (permalink / raw)
  To: Rick Jones; +Cc: mi wake, netdev
In-Reply-To: <4D35DAB0.9030201@hp.com>

On Tue, 2011-01-18 at 10:23 -0800, Rick Jones wrote:
> Ben Hutchings wrote:
> > On Mon, 2011-01-17 at 17:43 +0800, mi wake wrote:
[...]
> >>I ran ab and tbench tests and also found fewer tps with rps enabled,
> >>but more CPU usage. With rps enabled, softirqs are balanced across
> >>the CPUs.
> >>
> >>is there something wrong with my test?
> > 
> > 
> > In addition to what Eric said, check the interrupt moderation settings
> > (ethtool -c/-C options).  One-way latency for a single request/response
> > test will be at least the interrupt moderation value.
> > 
> > I haven't tested RPS by itself (Solarflare NICs have plenty of hardware
> > queues) so I don't know whether it can improve latency.  However, RFS
> > certainly does when there are many flows.
> 
> Is there actually an expectation that either RPS or RFS would improve *latency*? 
>   Multiple-stream throughput certainly, but with the additional work done to 
> spread things around, I wouldn't expect either to improve latency.

Yes, it seems to make a big improvement to latency when many flows are
active.  Tom told me that one of his benchmarks was 200 * netperf TCP_RR
in parallel, and I've seen over 40% reduction in latency for that.  That
said, allocating more RX queues might also help (sfc currently defaults
to one per processor package rather than one per processor thread, due
to concerns about CPU efficiency).

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

