Netdev List
 help / color / mirror / Atom feed
* RE: [PATCH 11/12] sfc: Only count bad packets in rx_errors
From: Dmitry Kravkov @ 2010-06-08  9:20 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Ben Hutchings
In-Reply-To: <1275427265.2114.35.camel@achroite.uk.solarflarecom.com>

Hi,

Different drivers include different errors in stats->rx_errors:

  bxn2	  bnx2x	  ixgbx	  sfc		  dnet	  atlx
----------------------------------------------------------------------
rx_length	rx_length	rx_length	rx_length	rx_length	rx_length
rx_over	rx_over							rx_over
rx_frame	rx_frame			rx_frame	rx_frame	rx_frame
rx_crc	rx_crc	rx_crc	rx_crc			rx_crc
		rx_fifo							rx_fifo
		rx_missed							rx_missed
						rx_symbol
								unsup_opcd
----------------------------------------------------------------------
Is there any documentation which defines what should this statistic include?

Thanks
Dmitry


-----Original Message-----
From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Ben Hutchings
Sent: Wednesday, June 02, 2010 12:21 AM
To: David Miller
Cc: netdev@vger.kernel.org; linux-net-drivers@solarflare.com
Subject: [PATCH 11/12] sfc: Only count bad packets in rx_errors

rx_errors is defined as 'bad packets received', but we are currently
including various overflow errors as well.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/efx.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index aae3347..26b0cc2 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -1518,11 +1518,8 @@ static struct net_device_stats *efx_net_stats(struct net_device *net_dev)
 	stats->tx_window_errors = mac_stats->tx_late_collision;
 
 	stats->rx_errors = (stats->rx_length_errors +
-			    stats->rx_over_errors +
 			    stats->rx_crc_errors +
 			    stats->rx_frame_errors +
-			    stats->rx_fifo_errors +
-			    stats->rx_missed_errors +
 			    mac_stats->rx_symbol_error);
 	stats->tx_errors = (stats->tx_window_errors +
 			    mac_stats->tx_bad);
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply related

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to  classes
From: Johannes Berg @ 2010-06-08  9:27 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <AANLkTilqB2sAyfBTLhYT_a2_TySGay4FFG6SGJUZ1o_4@mail.gmail.com>

On Mon, 2010-06-07 at 14:54 +0200, Kay Sievers wrote:

> > I see you remove the driver. Does this mean that in sysfs these devices
> > wouldn't have a driver symlink? ISTR that we needed that for userspace,
> > but I'm not entirely sure, and I don't have all the relevant userspace
> > in my test setup.
> 
> Yeah, if you need the driver, you need to use bus device, as you do
> now. The driver needs to be registered with the core, and the probe
> function needs to tell to bind to the devices while registered. You
> can not manually assign this.

So I spoke to Jouni and he says that he doesn't remember needing the
driver, so I guess we don't care. Can I get you to submit that patch
properly? Send it to linux-wireless@vger.kernel.org please.

johannes


^ permalink raw reply

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to classes
From: Kay Sievers @ 2010-06-08  9:30 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <1275989260.3706.115.camel@jlt3.sipsolutions.net>

On Tue, Jun 8, 2010 at 11:27, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Mon, 2010-06-07 at 14:54 +0200, Kay Sievers wrote:
>
>> > I see you remove the driver. Does this mean that in sysfs these devices
>> > wouldn't have a driver symlink? ISTR that we needed that for userspace,
>> > but I'm not entirely sure, and I don't have all the relevant userspace
>> > in my test setup.
>>
>> Yeah, if you need the driver, you need to use bus device, as you do
>> now. The driver needs to be registered with the core, and the probe
>> function needs to tell to bind to the devices while registered. You
>> can not manually assign this.
>
> So I spoke to Jouni and he says that he doesn't remember needing the
> driver, so I guess we don't care. Can I get you to submit that patch
> properly? Send it to linux-wireless@vger.kernel.org please.

Sure, you mean the patch that changes the hwsim code, or the one I
talked about to properly create a virtual bus parent?

Kay

^ permalink raw reply

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to  classes
From: Johannes Berg @ 2010-06-08  9:45 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <AANLkTinMauFsJTKS3eT-pT44KQ8PSf3HJLS8CZIeHmxP@mail.gmail.com>

On Tue, 2010-06-08 at 11:30 +0200, Kay Sievers wrote:
> On Tue, Jun 8, 2010 at 11:27, Johannes Berg <johannes@sipsolutions.net> wrote:
> > On Mon, 2010-06-07 at 14:54 +0200, Kay Sievers wrote:
> >
> >> > I see you remove the driver. Does this mean that in sysfs these devices
> >> > wouldn't have a driver symlink? ISTR that we needed that for userspace,
> >> > but I'm not entirely sure, and I don't have all the relevant userspace
> >> > in my test setup.
> >>
> >> Yeah, if you need the driver, you need to use bus device, as you do
> >> now. The driver needs to be registered with the core, and the probe
> >> function needs to tell to bind to the devices while registered. You
> >> can not manually assign this.
> >
> > So I spoke to Jouni and he says that he doesn't remember needing the
> > driver, so I guess we don't care. Can I get you to submit that patch
> > properly? Send it to linux-wireless@vger.kernel.org please.
> 
> Sure, you mean the patch that changes the hwsim code, or the one I
> talked about to properly create a virtual bus parent?

The hwsim one you pasted earlier.

johannes


^ permalink raw reply

* Re: [PATCHv2 1/2] net: Enable 64-bit net device statistics on 32-bit architectures
From: Ben Hutchings @ 2010-06-08 11:35 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, arnd, netdev, linux-net-drivers
In-Reply-To: <20100607.211829.63035366.davem@davemloft.net>

On Mon, 2010-06-07 at 21:18 -0700, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Fri, 04 Jun 2010 23:04:53 +0100
> 
> > +#if BITS_PER_LONG == 64
> > +#define NET_DEVICE_STATS_DEFINE(name)	u64 name
> > +#elif defined(__LITTLE_ENDIAN)
> > +#define NET_DEVICE_STATS_DEFINE(name)	u32 name, pad_ ## name
> > +#else
> > +#define NET_DEVICE_STATS_DEFINE(name)	u32 pad_ ## name, name
> > +#endif
> > +
>  ...
> > +	NET_DEVICE_STATS_DEFINE(rx_packets);
> > +	NET_DEVICE_STATS_DEFINE(tx_packets);
> > +	NET_DEVICE_STATS_DEFINE(rx_bytes);
>  ...
> >  	static const char fmt[] = "%30s %12lu\n";
> > +	static const char fmt64[] = "%30s %12llu\n";
>  ...
> > +	seq_printf(seq, fmt64, "total frames received", stats->rx_packets);
> > +	seq_printf(seq, fmt64, "total bytes received", stats->rx_bytes);
> > +	seq_printf(seq, fmt64, "Broadcast/Multicast Rcvd", stats->multicast);
> 
> I guess you only built this on a 64-bit platform that defines u64 as a
> long long type.

There was some disussion of formatting u64 / unsigned long long in the
past and I thought the outcome of that was that u64 should always be
defined as unsigned long long.  (See commits fe33332 and 9018113.)

[...]
> And the whole tree needs to be inspected to make sure there isn't going
> to be fallout in areas your patch didn't take care of wrt. printf format
> strings and the like.
> 
> What was always "unsigned long" is now a variable type, therefore using
> a fixed printf format string is impossible unless you always cast these
> things when passed in as printf arguments.

Yes, that's true if there are drivers out there printing members of
net_device_stats.  I admit I haven't checked for that.  (Hmm, might be
time to try Coccinelle.)

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: 2.6.35-rc2-git1 - net/mac80211/sta_info.c:125 invoked rcu_dereference_check() without protection!
From: Miles Lane @ 2010-06-08 11:29 UTC (permalink / raw)
  To: Johannes Berg
  Cc: paulmck, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	Peter Zijlstra, LKML, nauman, eric.dumazet, netdev, Jens Axboe,
	Gui Jianfeng, Li Zefan
In-Reply-To: <1275981969.3706.1.camel@jlt3.sipsolutions.net>

On Tue, Jun 8, 2010 at 3:26 AM, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Mon, 2010-06-07 at 16:59 -0700, Paul E. McKenney wrote:
>> On Mon, Jun 07, 2010 at 02:25:44PM -0400, Miles Lane wrote:
>> > [   43.478812] [ INFO: suspicious rcu_dereference_check() usage. ]
>> > [   43.478815] ---------------------------------------------------
>> > [   43.478820] net/mac80211/sta_info.c:125 invoked
>> > rcu_dereference_check() without protection!
>> > [   43.478824]
>> > [   43.478824] other info that might help us debug this:
>> > [   43.478826]
>> > [   43.478829]
>> > [   43.478830] rcu_scheduler_active = 1, debug_locks = 1
>> > [   43.478834] no locks held by NetworkManager/4017.
>>
>> Hmmm...  Johannes's update has been merged, and it requires that callers
>> either be in an RCU read-side critical section or hold either the
>> ->sta_lock or the ->sta_mtx, and this thread does none of this.
>>
>> Johannes, any thoughts?
>>
>>                                                       Thanx, Paul
>>
>> > [   43.478837] stack backtrace:
>> > [   43.478842] Pid: 4017, comm: NetworkManager Not tainted 2.6.35-rc2-git1 #8
>> > [   43.478846] Call Trace:
>> > [   43.478849]  <IRQ>  [<ffffffff81064e9c>] lockdep_rcu_dereference+0x9d/0xa5
>> > [   43.478876]  [<ffffffffa010cb3c>] sta_info_get_bss+0x71/0x12d [mac80211]
>> > [   43.478889]  [<ffffffffa010cc0d>] ieee80211_find_sta+0x15/0x2f [mac80211]
>> > [   43.478902]  [<ffffffffa019ae16>] iwlagn_tx_queue_reclaim+0xe7/0x1bb [iwlagn]
>
> iwlwifi wasn't using rcu protection here -- already sent a patch fixing
> it. My mistake, I  think. Thanks for checking :)
>
> johannes
>
>

Yes, your patch works for me.

^ permalink raw reply

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to  classes
From: Kay Sievers @ 2010-06-08 11:55 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <1275990325.3706.116.camel@jlt3.sipsolutions.net>

On Tue, 2010-06-08 at 11:45 +0200, Johannes Berg wrote:
> On Tue, 2010-06-08 at 11:30 +0200, Kay Sievers wrote:
> > On Tue, Jun 8, 2010 at 11:27, Johannes Berg <johannes@sipsolutions.net> wrote:
> > > On Mon, 2010-06-07 at 14:54 +0200, Kay Sievers wrote:
> > >
> > >> > I see you remove the driver. Does this mean that in sysfs these devices
> > >> > wouldn't have a driver symlink? ISTR that we needed that for userspace,
> > >> > but I'm not entirely sure, and I don't have all the relevant userspace
> > >> > in my test setup.
> > >>
> > >> Yeah, if you need the driver, you need to use bus device, as you do
> > >> now. The driver needs to be registered with the core, and the probe
> > >> function needs to tell to bind to the devices while registered. You
> > >> can not manually assign this.
> > >
> > > So I spoke to Jouni and he says that he doesn't remember needing the
> > > driver, so I guess we don't care. Can I get you to submit that patch
> > > properly? Send it to linux-wireless@vger.kernel.org please.
> > 
> > Sure, you mean the patch that changes the hwsim code, or the one I
> > talked about to properly create a virtual bus parent?
> 
> The hwsim one you pasted earlier.

Ok, this needs testing.

Please let me know, if that works for you so far.

Thanks,
Kay


From: Kay Sievers <kay.sievers@vrfy.org>
Subject: mac80211_hwsim - convert class devices to bus devices

Class devices must never have child devices of a different class,
so use bus devices.

Embed the 'struct device' into mac80211_hwsim_data, to do proper
ressource lifetime management at device release time.

Registers a driver, to binds devices to the bus.

Use a virtual root as the parent in /sys/devices/.

Puts the monitor device under the virtual parent.

This creates the following devices (udevadm monitor):
  /module/mac80211_hwsim (module)
  /bus/mac80211_hwsim (bus)
  /bus/mac80211_hwsim/drivers/mac80211_hwsim (drivers)
  /devices/mac80211_hwsim/hwsim0 (mac80211_hwsim)
  /devices/mac80211_hwsim/hwsim0/ieee80211/phy17 (ieee80211)
  /devices/mac80211_hwsim/hwsim0/ieee80211/phy17/rfkill19 (rfkill)
  /devices/mac80211_hwsim/hwsim0/net/wlan9 (net)
  /devices/mac80211_hwsim/hwsim0/net/wlan9/queues/rx-0 (queues)
  /devices/mac80211_hwsim/hwsim0/net/wlan9/queues/rx-1 (queues)
  /devices/mac80211_hwsim/hwsim0/net/wlan9/queues/rx-2 (queues)
  /devices/mac80211_hwsim/hwsim0/net/wlan9/queues/rx-3 (queues)
  /devices/mac80211_hwsim/hwsim1 (mac80211_hwsim)
  /devices/mac80211_hwsim/hwsim1/ieee80211/phy18 (ieee80211)
  /devices/mac80211_hwsim/hwsim1/ieee80211/phy18/rfkill20 (rfkill)
  /devices/mac80211_hwsim/hwsim1/net/wlan10 (net)
  /devices/mac80211_hwsim/hwsim1/net/wlan10/queues/rx-0 (queues)
  /devices/mac80211_hwsim/hwsim1/net/wlan10/queues/rx-1 (queues)
  /devices/mac80211_hwsim/hwsim1/net/wlan10/queues/rx-2 (queues)
  /devices/mac80211_hwsim/hwsim1/net/wlan10/queues/rx-3 (queues)
  /devices/mac80211_hwsim/net/hwsim0 (net)
  /devices/mac80211_hwsim/net/hwsim0/queues/rx-0 (queues)

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
---

diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 6f8cb3e..2543e74 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -194,7 +194,16 @@ static inline void hwsim_clear_sta_magic(struct ieee80211_sta *sta)
 	sp->magic = 0;
 }
 
-static struct class *hwsim_class;
+static struct bus_type hwsim_bus = {
+	.name = "mac80211_hwsim",
+};
+
+static struct device_driver hwsim_driver = {
+	.name = "mac80211_hwsim",
+	.bus = &hwsim_bus,
+};
+
+static struct device *hwsim_virtual_parent;
 
 static struct net_device *hwsim_mon; /* global monitor netdev */
 
@@ -280,7 +289,7 @@ static struct list_head hwsim_radios;
 struct mac80211_hwsim_data {
 	struct list_head list;
 	struct ieee80211_hw *hw;
-	struct device *dev;
+	struct device dev;
 	struct ieee80211_supported_band bands[2];
 	struct ieee80211_channel channels_2ghz[ARRAY_SIZE(hwsim_channels_2ghz)];
 	struct ieee80211_channel channels_5ghz[ARRAY_SIZE(hwsim_channels_5ghz)];
@@ -1063,22 +1072,13 @@ static void mac80211_hwsim_free(void)
 		list_move(i, &tmplist);
 	spin_unlock_bh(&hwsim_radio_lock);
 
-	list_for_each_entry_safe(data, tmpdata, &tmplist, list) {
-		debugfs_remove(data->debugfs_group);
-		debugfs_remove(data->debugfs_ps);
-		debugfs_remove(data->debugfs);
-		ieee80211_unregister_hw(data->hw);
-		device_unregister(data->dev);
-		ieee80211_free_hw(data->hw);
-	}
-	class_destroy(hwsim_class);
+	list_for_each_entry_safe(data, tmpdata, &tmplist, list)
+		device_unregister(&data->dev);
+	device_unregister(hwsim_virtual_parent);
+	driver_unregister(&hwsim_driver);
+	bus_unregister(&hwsim_bus);
 }
 
-
-static struct device_driver mac80211_hwsim_driver = {
-	.name = "mac80211_hwsim"
-};
-
 static const struct net_device_ops hwsim_netdev_ops = {
 	.ndo_start_xmit 	= hwsim_mon_xmit,
 	.ndo_change_mtu		= eth_change_mtu,
@@ -1095,6 +1095,7 @@ static void hwsim_mon_setup(struct net_device *dev)
 	dev->type = ARPHRD_IEEE80211_RADIOTAP;
 	memset(dev->dev_addr, 0, ETH_ALEN);
 	dev->dev_addr[0] = 0x12;
+	dev->dev.parent = hwsim_virtual_parent;
 }
 
 
@@ -1231,6 +1232,17 @@ DEFINE_SIMPLE_ATTRIBUTE(hwsim_fops_group,
 			hwsim_fops_group_read, hwsim_fops_group_write,
 			"%llx\n");
 
+static void hwsim_dev_release(struct device* hwsim_dev) {
+	struct mac80211_hwsim_data *data =
+	  container_of(hwsim_dev, struct mac80211_hwsim_data, dev);
+
+	debugfs_remove(data->debugfs_group);
+	debugfs_remove(data->debugfs_ps);
+	debugfs_remove(data->debugfs);
+	ieee80211_unregister_hw(data->hw);
+	ieee80211_free_hw(data->hw);
+}
+
 static int __init init_mac80211_hwsim(void)
 {
 	int i, err = 0;
@@ -1251,9 +1263,22 @@ static int __init init_mac80211_hwsim(void)
 	spin_lock_init(&hwsim_radio_lock);
 	INIT_LIST_HEAD(&hwsim_radios);
 
-	hwsim_class = class_create(THIS_MODULE, "mac80211_hwsim");
-	if (IS_ERR(hwsim_class))
-		return PTR_ERR(hwsim_class);
+	err = bus_register(&hwsim_bus);
+	if (err)
+		return err;
+
+	err = driver_register(&hwsim_driver);
+	if (err) {
+		bus_unregister(&hwsim_bus);
+		return err;
+	}
+
+	hwsim_virtual_parent = root_device_register("mac80211_hwsim");
+	if (IS_ERR(hwsim_virtual_parent)) {
+		driver_unregister(&hwsim_driver);
+		bus_unregister(&hwsim_bus);
+		return PTR_ERR(hwsim_virtual_parent);
+	}
 
 	memset(addr, 0, ETH_ALEN);
 	addr[0] = 0x02;
@@ -1271,18 +1296,19 @@ static int __init init_mac80211_hwsim(void)
 		data = hw->priv;
 		data->hw = hw;
 
-		data->dev = device_create(hwsim_class, NULL, 0, hw,
-					  "hwsim%d", i);
-		if (IS_ERR(data->dev)) {
+		dev_set_name(&data->dev, "hwsim%d", i);
+		data->dev.parent = hwsim_virtual_parent;
+		data->dev.bus = &hwsim_bus;
+		data->dev.release = hwsim_dev_release;
+		err = device_register(&data->dev);
+		if (err) {
 			printk(KERN_DEBUG
-			       "mac80211_hwsim: device_create "
-			       "failed (%ld)\n", PTR_ERR(data->dev));
-			err = -ENOMEM;
+			       "mac80211_hwsim: device_register failed (%d)\n",
+			       err);
 			goto failed_drvdata;
 		}
-		data->dev->driver = &mac80211_hwsim_driver;
 
-		SET_IEEE80211_DEV(hw, data->dev);
+		SET_IEEE80211_DEV(hw, &data->dev);
 		addr[3] = i >> 8;
 		addr[4] = i;
 		memcpy(data->addresses[0].addr, addr, ETH_ALEN);
@@ -1513,7 +1539,7 @@ failed_mon:
 	return err;
 
 failed_hw:
-	device_unregister(data->dev);
+	device_unregister(&data->dev);
 failed_drvdata:
 	ieee80211_free_hw(hw);
 failed:



^ permalink raw reply related

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to  classes
From: Johannes Berg @ 2010-06-08 12:03 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <1275998127.1899.38.camel@yio.site>

On Tue, 2010-06-08 at 13:55 +0200, Kay Sievers wrote:

> Ok, this needs testing.
> 
> Please let me know, if that works for you so far.

Will do. Just a question: none of this seems to pin the module? So can I
be sure if I rmmod the module that the release function will still be
around etc.?

johannes


^ permalink raw reply

* Re: [PATCH nf-next-2.6] xt_rateest: Better struct xt_rateest layout
From: Patrick McHardy @ 2010-06-08 12:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Netfilter Development Mailinglist
In-Reply-To: <1275941837.2775.7.camel@edumazet-laptop>

On 07.06.2010 22:17, Eric Dumazet wrote:
> We currently dirty two cache lines in struct xt_rateest, this hurts SMP
> performance.
> 
> This patch moves lock/bstats/rstats at beginning of structure so that
> they share a single cache line.

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH net-next-2.6 v2] pkt_sched: gen_estimator: kill est_lock rwlock
From: Jarek Poplawski @ 2010-06-08 12:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, David Miller, netdev, Stephen Hemminger,
	Patrick McHardy
In-Reply-To: <1275929761.2545.159.camel@edumazet-laptop>

Eric Dumazet wrote:
...
> [PATCH net-next-2.6] pkt_sched: gen_kill_estimator() rcu fixes
> 
> gen_kill_estimator() API is fundamentaly wrong, since caller should make
> sure an RCU grace period is respected before freeing bstats or lock.
> 
> This was partially addressed in commit 5d944c640b4 (gen_estimator:
> deadlock fix), but same problem exist for all gen_kill_estimator()
> users.
> 
> Change its name to gen_kill_estimator_rcu() and change all callers to
> use RCU grace period before freeing bstats container.
> 
> As a bonus, est_lock rwlock can be killed for good.
> 
> Special thanks to Changli Gao for commenting on a previous patch, this
> made this bug clear to me.
> 

Actually, I guess, Changli meant the bug introduced by your previous
patch by removing the est_lock. With this lock (and your commit 5d944)
bstats (and API) seem "fundamentaly" safe.

Jarek P.

^ permalink raw reply

* Re: [PATCH net-next-2.6 v2] pkt_sched: gen_estimator: kill est_lock rwlock
From: Eric Dumazet @ 2010-06-08 12:27 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Changli Gao, David Miller, netdev, Stephen Hemminger,
	Patrick McHardy
In-Reply-To: <20100608121546.GA9392@ff.dom.local>

Le mardi 08 juin 2010 à 12:15 +0000, Jarek Poplawski a écrit :

> 
> Actually, I guess, Changli meant the bug introduced by your previous
> patch by removing the est_lock. With this lock (and your commit 5d944)
> bstats (and API) seem "fundamentaly" safe.
> 

Sorry, I have no idea of what you want to say, I cant find commit 5d944.




^ permalink raw reply

* Re: [PATCH net-next-2.6 v2] pkt_sched: gen_estimator: kill est_lock rwlock
From: Jarek Poplawski @ 2010-06-08 12:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, David Miller, netdev, Stephen Hemminger,
	Patrick McHardy
In-Reply-To: <1276000052.2475.307.camel@edumazet-laptop>

On Tue, Jun 08, 2010 at 02:27:32PM +0200, Eric Dumazet wrote:
> Le mardi 08 juin 2010 ?? 12:15 +0000, Jarek Poplawski a écrit :
> 
> > 
> > Actually, I guess, Changli meant the bug introduced by your previous
> > patch by removing the est_lock. With this lock (and your commit 5d944)
> > bstats (and API) seem "fundamentaly" safe.
> > 
> 
> Sorry, I have no idea of what you want to say, I cant find commit 5d944.
> 

Sorry, I meant the commit mentioned in your changelog which was quoted.

> This was partially addressed in commit 5d944c640b4 (gen_estimator:
> deadlock fix), but same problem exist for all gen_kill_estimator()
> users.

Jarek P.

^ permalink raw reply

* AF_PACKET meets AF_UNIX
From: Sebastian Krahmer @ 2010-06-08 12:43 UTC (permalink / raw)
  To: netdev

Hi

I made a small fix for 2.6.34 to have something like
"tcpdump -i unix". For debugging purposes (DBUS etc)
it would be really helpfull. The sockaddr_ll struct
contains the socket's ino and the senders PID in
ifindex, so all necessary infos can be found
out via /proc/net/unix in userland.

The patch goes here: http://www.suse.de/~krahmer/unix-monitor-2.6.34.tgz

If you dont like any constructions in it please let me know,
so I can fix it and it can be accepted upstream :)

cya,
Sebastian

-- 
~
~ perl self.pl
~ $_='print"\$_=\47$_\47;eval"';eval
~ krahmer@suse.de - SuSE Security Team
~ SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nuernberg)


^ permalink raw reply

* Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!
From: Miles Lane @ 2010-06-08 13:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulmck, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	LKML, nauman, eric.dumazet, netdev, Jens Axboe, Gui Jianfeng,
	Li Zefan, Johannes Berg
In-Reply-To: <1275986441.5408.111.camel@twins>

On Tue, Jun 8, 2010 at 4:40 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2010-06-08 at 00:16 -0400, Miles Lane wrote:
>> On Mon, Jun 7, 2010 at 8:19 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>> > On Mon, Jun 07, 2010 at 02:14:25PM -0400, Miles Lane wrote:
>> >> Hi All,
>> >>
>> >> I just reproduced a warning I reported quite a while ago.  Is a patch
>> >> for this in the pipeline?
>> >
>> > I proposed a patch, thinking that it was a false positive.  Peter Zijlstra
>> > pointed out that there was a real race, and proposed an alternative patch,
>> > which may be found at http://lkml.org/lkml/2010/4/22/603.
>> >
>> > Could you please test Peter's patch and let us know if it cures the problem?
>> >
>
> Gah, this task_group() stuff is annoying, how about something like the
> below which teaches task_group() about the task_rq()->lock rule?
>
> ---
>  include/linux/cgroup.h |   20 +++++++++++----
>  kernel/sched.c         |   61 +++++++++++++++++++++++++----------------------
>  2 files changed, 46 insertions(+), 35 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 0c62160..1efd212 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -525,13 +525,21 @@ static inline struct cgroup_subsys_state *cgroup_subsys_state(
>        return cgrp->subsys[subsys_id];
>  }
>
> -static inline struct cgroup_subsys_state *task_subsys_state(
> -       struct task_struct *task, int subsys_id)
> +/*
> + * function to get the cgroup_subsys_state which allows for extra
> + * rcu_dereference_check() conditions, such as locks used during the
> + * cgroup_subsys::attach() methods.
> + */
> +#define task_subsys_state_check(task, subsys_id, __c)                  \
> +       rcu_dereference_check(task->cgroups->subsys[subsys_id],         \
> +                             rcu_read_lock_held() ||                   \
> +                             lockdep_is_held(&task->alloc_lock) ||     \
> +                             cgroup_lock_is_held() || (__c))
> +
> +static inline struct cgroup_subsys_state *
> +task_subsys_state(struct task_struct *task, int subsys_id)
>  {
> -       return rcu_dereference_check(task->cgroups->subsys[subsys_id],
> -                                    rcu_read_lock_held() ||
> -                                    lockdep_is_held(&task->alloc_lock) ||
> -                                    cgroup_lock_is_held());
> +       return task_subsys_state_check(task, subsys_id, false);
>  }
>
>  static inline struct cgroup* task_cgroup(struct task_struct *task,
> diff --git a/kernel/sched.c b/kernel/sched.c
> index f8b8996..e01bb45 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -306,32 +306,26 @@ static int init_task_group_load = INIT_TASK_GROUP_LOAD;
>  */
>  struct task_group init_task_group;
>
> -/* return group to which a task belongs */
> +/*
> + * Return the group to which this tasks belongs.
> + *
> + * We use task_subsys_state_check() and extend the RCU verification
> + * with lockdep_is_held(&task_rq(p)->lock) because cpu_cgroup_attach()
> + * holds that lock for each task it moves into the cgroup. Therefore
> + * by holding that lock, we pin the task to the current cgroup.
> + */
>  static inline struct task_group *task_group(struct task_struct *p)
>  {
> -       struct task_group *tg;
> +       struct cgroup_subsys_state *css;
>
> -#ifdef CONFIG_CGROUP_SCHED
> -       tg = container_of(task_subsys_state(p, cpu_cgroup_subsys_id),
> -                               struct task_group, css);
> -#else
> -       tg = &init_task_group;
> -#endif
> -       return tg;
> +       css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
> +                       lockdep_is_held(&task_rq(p)->lock));
> +       return container_of(css, struct task_group, css);
>  }
>
>  /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
>  static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
>  {
> -       /*
> -        * Strictly speaking this rcu_read_lock() is not needed since the
> -        * task_group is tied to the cgroup, which in turn can never go away
> -        * as long as there are tasks attached to it.
> -        *
> -        * However since task_group() uses task_subsys_state() which is an
> -        * rcu_dereference() user, this quiets CONFIG_PROVE_RCU.
> -        */
> -       rcu_read_lock();
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>        p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
>        p->se.parent = task_group(p)->se[cpu];
> @@ -341,7 +335,6 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
>        p->rt.rt_rq  = task_group(p)->rt_rq[cpu];
>        p->rt.parent = task_group(p)->rt_se[cpu];
>  #endif
> -       rcu_read_unlock();
>  }
>
>  #else
> @@ -4465,16 +4458,6 @@ recheck:
>        }
>
>        if (user) {
> -#ifdef CONFIG_RT_GROUP_SCHED
> -               /*
> -                * Do not allow realtime tasks into groups that have no runtime
> -                * assigned.
> -                */
> -               if (rt_bandwidth_enabled() && rt_policy(policy) &&
> -                               task_group(p)->rt_bandwidth.rt_runtime == 0)
> -                       return -EPERM;
> -#endif
> -
>                retval = security_task_setscheduler(p, policy, param);
>                if (retval)
>                        return retval;
> @@ -4490,6 +4473,26 @@ recheck:
>         * runqueue lock must be held.
>         */
>        rq = __task_rq_lock(p);
> +
> +       retval = 0;
> +#ifdef CONFIG_RT_GROUP_SCHED
> +       if (user) {
> +               /*
> +                * Do not allow realtime tasks into groups that have no runtime
> +                * assigned.
> +                */
> +               if (rt_bandwidth_enabled() && rt_policy(policy) &&
> +                               task_group(p)->rt_bandwidth.rt_runtime == 0)
> +                       retval = -EPERM;
> +
> +               if (retval) {
> +                       __task_rq_unlock(rq);
> +                       raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +                       return retval;
> +               }
> +       }
> +#endif
> +
>        /* recheck policy now with rq lock held */
>        if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
>                policy = oldpolicy = -1;
>
>

  CC      kernel/sched.o
kernel/sched.c: In function ‘task_group’:
kernel/sched.c:321: error: implicit declaration of function ‘task_rq’
kernel/sched.c:321: error: invalid type argument of ‘->’ (have ‘int’)
make[1]: *** [kernel/sched.o] Error 1

I had to apply with fuzz.  Did it mess up?

static inline struct task_group *task_group(struct task_struct *p)
{
       struct cgroup_subsys_state *css;

       css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
                       lockdep_is_held(&task_rq(p)->lock));
       return container_of(css, struct task_group, css);
}

^ permalink raw reply

* Re: BUG: double spinlock in "drivers/net/3c505.c"
From: Christoph Fritz @ 2010-06-08 13:26 UTC (permalink / raw)
  To: Alexander Strakh
  Cc: Philip Blundell, Craig Southeren, Andrew Tridgell, Alan Cox,
	netdev, linux-kernel
In-Reply-To: <201006071517.28744.strakh@ispras.ru>

On Mon, 2010-06-07 at 15:17 +0400, Alexander Strakh wrote: 
> 	KERNEL_VERSION: 2.6.35-rc1
>         SUBJECT: duble spinlock  in function elp_start_xmit

Not only in elp_start_xmit. This driver is for a pretty old and slow isa
ethernet card and I think nobody cares. To quote a comment from the
source: "[...] the concurrency protection is particularly awful".

Thanks, 
   Christoph

^ permalink raw reply

* Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!
From: Peter Zijlstra @ 2010-06-08 13:34 UTC (permalink / raw)
  To: Miles Lane
  Cc: paulmck, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	LKML, nauman, eric.dumazet, netdev, Jens Axboe, Gui Jianfeng,
	Li Zefan, Johannes Berg
In-Reply-To: <AANLkTimXVlgzHivCQberLdrFP8BH_nJzt8_ozUNHG_aI@mail.gmail.com>

On Tue, 2010-06-08 at 09:14 -0400, Miles Lane wrote:

>   CC      kernel/sched.o
> kernel/sched.c: In function ‘task_group’:
> kernel/sched.c:321: error: implicit declaration of function ‘task_rq’
> kernel/sched.c:321: error: invalid type argument of ‘->’ (have ‘int’)
> make[1]: *** [kernel/sched.o] Error 1
> 
> I had to apply with fuzz.  Did it mess up?


No, I probably did.. task_rq() is defined on line 636 or thereabouts,
and this function landed around line 320.

Ahh, and it compiled here because I have CGROUP_SCHED=y, but
PROVE_RCU=n, so that whole check expression disappears and is never
evaluated...

/me fixes

---
Subject: sched: PROVE_RCU vs cpu_cgroup
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Tue Jun 08 11:40:42 CEST 2010

PROVE_RCU has a few issues with the cpu_cgroup because the scheduler
typically holds rq->lock around the css rcu derefs but the generic
cgroup code doesn't (and can't) know about that lock.

Provide means to add extra checks to the css dereference and use that
in the scheduler to annotate its users.

The addition of rq->lock to these checks is correct because the
cgroup_subsys::attach() method takes the rq->lock for each task it
moves, therefore by holding that lock, we ensure the task is pinned to
the current cgroup and the RCU dereference is valid.

That leaves one genuine race in __sched_setscheduler() where we used
task_group() without holding any of the required locks and thus raced
with the cgroup code. Solve this by moving the check under the rq->lock.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/cgroup.h |   20 +++++---
 kernel/sched.c         |  115 +++++++++++++++++++++++++------------------------
 2 files changed, 73 insertions(+), 62 deletions(-)

Index: linux-2.6/include/linux/cgroup.h
===================================================================
--- linux-2.6.orig/include/linux/cgroup.h
+++ linux-2.6/include/linux/cgroup.h
@@ -525,13 +525,21 @@ static inline struct cgroup_subsys_state
 	return cgrp->subsys[subsys_id];
 }
 
-static inline struct cgroup_subsys_state *task_subsys_state(
-	struct task_struct *task, int subsys_id)
+/*
+ * function to get the cgroup_subsys_state which allows for extra
+ * rcu_dereference_check() conditions, such as locks used during the
+ * cgroup_subsys::attach() methods.
+ */
+#define task_subsys_state_check(task, subsys_id, __c)			\
+	rcu_dereference_check(task->cgroups->subsys[subsys_id],		\
+			      rcu_read_lock_held() ||			\
+			      lockdep_is_held(&task->alloc_lock) ||	\
+			      cgroup_lock_is_held() || (__c))
+
+static inline struct cgroup_subsys_state *
+task_subsys_state(struct task_struct *task, int subsys_id)
 {
-	return rcu_dereference_check(task->cgroups->subsys[subsys_id],
-				     rcu_read_lock_held() ||
-				     lockdep_is_held(&task->alloc_lock) ||
-				     cgroup_lock_is_held());
+	return task_subsys_state_check(task, subsys_id, false);
 }
 
 static inline struct cgroup* task_cgroup(struct task_struct *task,
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -306,52 +306,6 @@ static int init_task_group_load = INIT_T
  */
 struct task_group init_task_group;
 
-/* return group to which a task belongs */
-static inline struct task_group *task_group(struct task_struct *p)
-{
-	struct task_group *tg;
-
-#ifdef CONFIG_CGROUP_SCHED
-	tg = container_of(task_subsys_state(p, cpu_cgroup_subsys_id),
-				struct task_group, css);
-#else
-	tg = &init_task_group;
-#endif
-	return tg;
-}
-
-/* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
-static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
-{
-	/*
-	 * Strictly speaking this rcu_read_lock() is not needed since the
-	 * task_group is tied to the cgroup, which in turn can never go away
-	 * as long as there are tasks attached to it.
-	 *
-	 * However since task_group() uses task_subsys_state() which is an
-	 * rcu_dereference() user, this quiets CONFIG_PROVE_RCU.
-	 */
-	rcu_read_lock();
-#ifdef CONFIG_FAIR_GROUP_SCHED
-	p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
-	p->se.parent = task_group(p)->se[cpu];
-#endif
-
-#ifdef CONFIG_RT_GROUP_SCHED
-	p->rt.rt_rq  = task_group(p)->rt_rq[cpu];
-	p->rt.parent = task_group(p)->rt_se[cpu];
-#endif
-	rcu_read_unlock();
-}
-
-#else
-
-static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { }
-static inline struct task_group *task_group(struct task_struct *p)
-{
-	return NULL;
-}
-
 #endif	/* CONFIG_CGROUP_SCHED */
 
 /* CFS-related fields in a runqueue */
@@ -644,6 +598,49 @@ static inline int cpu_of(struct rq *rq)
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 #define raw_rq()		(&__raw_get_cpu_var(runqueues))
 
+#ifdef CONFIG_CGROUP_SCHED
+
+/*
+ * Return the group to which this tasks belongs.
+ *
+ * We use task_subsys_state_check() and extend the RCU verification
+ * with lockdep_is_held(&task_rq(p)->lock) because cpu_cgroup_attach()
+ * holds that lock for each task it moves into the cgroup. Therefore
+ * by holding that lock, we pin the task to the current cgroup.
+ */
+static inline struct task_group *task_group(struct task_struct *p)
+{
+	struct cgroup_subsys_state *css;
+
+	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
+			lockdep_is_held(&task_rq(p)->lock));
+	return container_of(css, struct task_group, css);
+}
+
+/* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
+static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
+{
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
+	p->se.parent = task_group(p)->se[cpu];
+#endif
+
+#ifdef CONFIG_RT_GROUP_SCHED
+	p->rt.rt_rq  = task_group(p)->rt_rq[cpu];
+	p->rt.parent = task_group(p)->rt_se[cpu];
+#endif
+}
+
+#else /* CONFIG_CGROUP_SCHED */
+
+static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { }
+static inline struct task_group *task_group(struct task_struct *p)
+{
+	return NULL;
+}
+
+#endif /* CONFIG_CGROUP_SCHED */
+
 inline void update_rq_clock(struct rq *rq)
 {
 	if (!rq->skip_clock_update)
@@ -4465,16 +4462,6 @@ recheck:
 	}
 
 	if (user) {
-#ifdef CONFIG_RT_GROUP_SCHED
-		/*
-		 * Do not allow realtime tasks into groups that have no runtime
-		 * assigned.
-		 */
-		if (rt_bandwidth_enabled() && rt_policy(policy) &&
-				task_group(p)->rt_bandwidth.rt_runtime == 0)
-			return -EPERM;
-#endif
-
 		retval = security_task_setscheduler(p, policy, param);
 		if (retval)
 			return retval;
@@ -4490,6 +4477,22 @@ recheck:
 	 * runqueue lock must be held.
 	 */
 	rq = __task_rq_lock(p);
+
+#ifdef CONFIG_RT_GROUP_SCHED
+	if (user) {
+		/*
+		 * Do not allow realtime tasks into groups that have no runtime
+		 * assigned.
+		 */
+		if (rt_bandwidth_enabled() && rt_policy(policy) &&
+				task_group(p)->rt_bandwidth.rt_runtime == 0) {
+			__task_rq_unlock(rq);
+			raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+			return -EPERM;
+		}
+	}
+#endif
+
 	/* recheck policy now with rq lock held */
 	if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
 		policy = oldpolicy = -1;

^ permalink raw reply

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to classes
From: Kay Sievers @ 2010-06-08 13:54 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <1275998603.3706.118.camel@jlt3.sipsolutions.net>

On Tue, Jun 8, 2010 at 14:03, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Tue, 2010-06-08 at 13:55 +0200, Kay Sievers wrote:
>
>> Ok, this needs testing.
>>
>> Please let me know, if that works for you so far.
>
> Will do. Just a question: none of this seems to pin the module? So can I
> be sure if I rmmod the module that the release function will still be
> around etc.?

Oh, sorry, that's something very specific to network devices, that the
module can be unloaded while the devices it has created are still in
use. I am not sure right now what's needed to make this work in a
single module.

Kay

^ permalink raw reply

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to  classes
From: Johannes Berg @ 2010-06-08 14:06 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <AANLkTinev96yxFCFSdDdT89jjgYR4ChUCMU2-aSj8kkh@mail.gmail.com>

On Tue, 2010-06-08 at 15:54 +0200, Kay Sievers wrote:
> On Tue, Jun 8, 2010 at 14:03, Johannes Berg <johannes@sipsolutions.net> wrote:
> > On Tue, 2010-06-08 at 13:55 +0200, Kay Sievers wrote:
> >
> >> Ok, this needs testing.
> >>
> >> Please let me know, if that works for you so far.
> >
> > Will do. Just a question: none of this seems to pin the module? So can I
> > be sure if I rmmod the module that the release function will still be
> > around etc.?
> 
> Oh, sorry, that's something very specific to network devices, that the
> module can be unloaded while the devices it has created are still in
> use. I am not sure right now what's needed to make this work in a
> single module.

Well they will be unregistered and everything, but once all the netdevs
are gone etc. the devices you create in the patch might stick around
because somebody has them open in sysfs, and I see nothing that would
pin the module in that case?

johannes


^ permalink raw reply

* Re: [PATCH nf-next-2.6] conntrack: IPS_UNTRACKED bit
From: Patrick McHardy @ 2010-06-08 14:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, Netfilter Developers, netdev
In-Reply-To: <1275668732.2482.201.camel@edumazet-laptop>

On 04.06.2010 18:25, Eric Dumazet wrote:
> [PATCH nf-next-2.6] conntrack: IPS_UNTRACKED bit
> 
> NOTRACK makes all cpus share a cache line on nf_conntrack_untracked
> twice per packet. This is bad for performance.
> __read_mostly annotation is also a bad choice.
> 
> This patch introduces IPS_UNTRACKED bit so that we can use later a
> per_cpu untrack structure more easily.
> 
> A new helper, nf_ct_untracked_get() returns a pointer to
> nf_conntrack_untracked.
> 
> Another one, nf_ct_untracked_status_or() is used by nf_nat_init() to add
> IPS_NAT_DONE_MASK bits to untracked status.
> 
> nf_ct_is_untracked() prototype is changed to work on a nf_conn pointer.

Applied, thanks Eric.

^ permalink raw reply

* Re: RFS seems to have incompatiblities with bridged vlans
From: Eric Dumazet @ 2010-06-08 14:18 UTC (permalink / raw)
  To: John Fastabend
  Cc: Stephen Hemminger, Peter Lieven, davem@davemloft.net,
	netdev@vger.kernel.org
In-Reply-To: <4C0D7312.1020300@intel.com>

Le lundi 07 juin 2010 à 15:30 -0700, John Fastabend a écrit :

> There is always a possibility that the underlying device sets the 
> queue_mapping to be greater then num_cpus.  Also I suspect the same 
> issue exists with bonding devices.  Maybe something like the following 
> is worth while? compile tested only,
> 
> [PATCH] 8021q: vlan reassigns dev without check queue_mapping
> 
> recv path reassigns skb->dev without sanity checking the
> queue_mapping field.  This can result in the queue_mapping
> field being set incorrectly if the new dev supports less
> queues then the underlying device.
> 
> This patch just resets the queue_mapping to 0 which should
> resolve this issue?  Any thoughts?
> 
> The same issue could happen on bonding devices as well.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
> 
>   net/8021q/vlan_core.c |    6 ++++++
>   1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
> index bd537fc..ad309f8 100644
> --- a/net/8021q/vlan_core.c
> +++ b/net/8021q/vlan_core.c
> @@ -21,6 +21,9 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct 
> vlan_group *grp,
>   	if (!skb->dev)
>   		goto drop;
> 
> +	if (unlikely(skb->queue_mapping >= skb->dev->real_num_tx_queues))
> +		skb_set_queue_mapping(skb, 0);
> +
>   	return (polling ? netif_receive_skb(skb) : netif_rx(skb));
> 
>   drop:
> @@ -93,6 +96,9 @@ vlan_gro_common(struct napi_struct *napi, struct 
> vlan_group *grp,
>   	if (!skb->dev)
>   		goto drop;
> 
> +	if (unlikely(skb->queue_mapping >= skb->dev->real_num_tx_queues))
> +		skb_set_queue_mapping(skb, 0);
> +
>   	for (p = napi->gro_list; p; p = p->next) {
>   		NAPI_GRO_CB(p)->same_flow =
>   			p->dev == skb->dev && !compare_ether_header(
> --

Only a workaround, added in hot path in a otherwise 'good' driver
(multiqueue enabled and ready)

eth0  -------> bond / bridge ---------> vlan.id
(nbtxq=8)      (ntxbq=1)        (nbtxq=X)

X is capped to 1 because of bond/bridge, while bond has no "queue"
(LLTX driver)

Solutions :

1) queue_mapping could be silently tested in get_rps_cpu()...

diff --git a/net/core/dev.c b/net/core/dev.c
index 6f330ce..3a3f7f6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2272,14 +2272,11 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 
 	if (skb_rx_queue_recorded(skb)) {
 		u16 index = skb_get_rx_queue(skb);
-		if (unlikely(index >= dev->num_rx_queues)) {
-			if (net_ratelimit()) {
-				pr_warning("%s received packet on queue "
-					"%u, but number of RX queues is %u\n",
-					dev->name, index, dev->num_rx_queues);
-			}
-			goto done;
-		}
+		if (WARN_ONCE(index >= dev->num_rx_queues,
+				KERN_WARNING "%s received packet on queue %u, "
+				"but number of RX queues is %u\n",
+				dev->name, index, dev->num_rx_queues))
+			index %= dev->num_rx_queues;
 		rxqueue = dev->_rx + index;
 	} else
 		rxqueue = dev->_rx;



2) bond/bridge should setup more queues, just in case.
   We probably need to be able to make things more dynamic,
   (propagate nbtxq between layers) but not for 2.6.35



diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5e12462..ce813dd 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5012,8 +5012,8 @@ int bond_create(struct net *net, const char *name)
 
 	rtnl_lock();
 
-	bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
-				bond_setup);
+	bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
+				   bond_setup, max(64, nr_cpu_ids));
 	if (!bond_dev) {
 		pr_err("%s: eek! can't alloc netdev!\n", name);
 		rtnl_unlock();



^ permalink raw reply related

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to classes
From: Kay Sievers @ 2010-06-08 14:21 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <1276005970.3706.132.camel@jlt3.sipsolutions.net>

On Tue, Jun 8, 2010 at 16:06, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Tue, 2010-06-08 at 15:54 +0200, Kay Sievers wrote:
>> On Tue, Jun 8, 2010 at 14:03, Johannes Berg <johannes@sipsolutions.net> wrote:
>> > On Tue, 2010-06-08 at 13:55 +0200, Kay Sievers wrote:
>> >
>> >> Ok, this needs testing.
>> >>
>> >> Please let me know, if that works for you so far.
>> >
>> > Will do. Just a question: none of this seems to pin the module? So can I
>> > be sure if I rmmod the module that the release function will still be
>> > around etc.?
>>
>> Oh, sorry, that's something very specific to network devices, that the
>> module can be unloaded while the devices it has created are still in
>> use. I am not sure right now what's needed to make this work in a
>> single module.
>
> Well they will be unregistered and everything, but once all the netdevs
> are gone etc. the devices you create in the patch might stick around
> because somebody has them open in sysfs, and I see nothing that would
> pin the module in that case?

That's what I mean, I have no idea how to solve that with network
devices. I don't think any other subsytem allows to unload the module,
when devices, the module has created, are in use.

With this patch, it is likely to fail, even without sysfs, just when
the netif is still in up, and the module is unloaded.

The current code uses the in-core class_create() logic, which was only
meant for devices with a device node, and which is cleaned up by the
core itself. That's why this issue never appeared before.

But as said, I have no idea how to solve that with a single module. It
might not work at all without moving stuff into the core. That people
use device_create() with no major/minor might indicate that something
else is needed here. :)

Kay

^ permalink raw reply

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to  classes
From: Johannes Berg @ 2010-06-08 14:26 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <AANLkTikB0Amzi61JLxdY5HL-2s_nbYo3JH8vXooXzHdQ@mail.gmail.com>

On Tue, 2010-06-08 at 16:21 +0200, Kay Sievers wrote:

> > Well they will be unregistered and everything, but once all the netdevs
> > are gone etc. the devices you create in the patch might stick around
> > because somebody has them open in sysfs, and I see nothing that would
> > pin the module in that case?
> 
> That's what I mean, I have no idea how to solve that with network
> devices. I don't think any other subsytem allows to unload the module,
> when devices, the module has created, are in use.

You're right ... the would only be unregistered from your release, which
would happen after the module is long gone ...

> The current code uses the in-core class_create() logic, which was only
> meant for devices with a device node, and which is cleaned up by the
> core itself. That's why this issue never appeared before.
> 
> But as said, I have no idea how to solve that with a single module. It
> might not work at all without moving stuff into the core. That people
> use device_create() with no major/minor might indicate that something
> else is needed here. :)

So ... can we apply Eric's patch for now then?

johannes


^ permalink raw reply

* Re: [PATCH nf-next-2.6 2/2] conntrack: per_cpu untracking
From: Patrick McHardy @ 2010-06-08 14:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, Netfilter Developers, netdev
In-Reply-To: <1275682522.2490.6.camel@edumazet-laptop>

On 04.06.2010 22:15, Eric Dumazet wrote:
> NOTRACK makes all cpus share a cache line on nf_conntrack_untracked
> twice per packet, slowing down performance.
> 
> This patch converts it to a per_cpu variable.
> 
> We assume same cpu is used for a given packet, entering and exiting the
> NOTRACK state.

That doesn't seem to be a valid assumption, the conntrack entry is
attached to the skb and processing in the output path might get
preempted and rescheduled to a different CPU.

^ permalink raw reply

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to classes
From: Kay Sievers @ 2010-06-08 14:47 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <1276007174.3706.133.camel@jlt3.sipsolutions.net>

On Tue, Jun 8, 2010 at 16:26, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Tue, 2010-06-08 at 16:21 +0200, Kay Sievers wrote:
>
>> > Well they will be unregistered and everything, but once all the netdevs
>> > are gone etc. the devices you create in the patch might stick around
>> > because somebody has them open in sysfs, and I see nothing that would
>> > pin the module in that case?
>>
>> That's what I mean, I have no idea how to solve that with network
>> devices. I don't think any other subsytem allows to unload the module,
>> when devices, the module has created, are in use.
>
> You're right ... the would only be unregistered from your release, which
> would happen after the module is long gone ...
>
>> The current code uses the in-core class_create() logic, which was only
>> meant for devices with a device node, and which is cleaned up by the
>> core itself. That's why this issue never appeared before.
>>
>> But as said, I have no idea how to solve that with a single module. It
>> might not work at all without moving stuff into the core. That people
>> use device_create() with no major/minor might indicate that something
>> else is needed here. :)
>
> So ... can we apply Eric's patch for now then?

It might break other stuff we don't know about yet. Just like we did
not know about what things hwsim is doing here. :)

Eric's patch change the sysfs layout and insert directories into the
device path, and stuff which hardcodes things, or accesses devices
with ../<name> will break.

I don't mind trying it, but I wouldn't be surprised if that needs to
be reverted when issues caused by this appear.


The hwsim issues are caused by the current hwsim code, by doing things
it should not do. Class devices of different classes must never be
stacked (the core should not allow that in the first place). Class
devices must never have a driver assigned behind its back. Also
device_create() should not be used for devices without a major/minor
(but that seems to be done in several other places too).

To fix the hwsim driver core interaction, core changes will probably
be needed to allow network modules to be removed while their devices
are active. That's something which seems not to work for bus devices
currently.

Kay

^ permalink raw reply

* Re: [PATCH nf-next-2.6 2/2] conntrack: per_cpu untracking
From: Eric Dumazet @ 2010-06-08 14:52 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Changli Gao, Netfilter Developers, netdev
In-Reply-To: <4C0E53C4.7090308@trash.net>

Le mardi 08 juin 2010 à 16:29 +0200, Patrick McHardy a écrit :
> On 04.06.2010 22:15, Eric Dumazet wrote:
> > NOTRACK makes all cpus share a cache line on nf_conntrack_untracked
> > twice per packet, slowing down performance.
> > 
> > This patch converts it to a per_cpu variable.
> > 
> > We assume same cpu is used for a given packet, entering and exiting the
> > NOTRACK state.
> 
> That doesn't seem to be a valid assumption, the conntrack entry is
> attached to the skb and processing in the output path might get
> preempted and rescheduled to a different CPU.

Thats unfortunate.

Ok, only choice then is to not change refcount on the untracked ct, and
keep a shared (read only after setup time) untrack structure.





--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox